Installing Ruby 2.3 on Archlinux

I’ve been running Archlinux for a few years now. I ran Ubuntu for 8 years before that and frequently ran into issues with old packages, which eventually spurred me to jump to Arch, where I get to deal with issues in new packages instead. “Pick your poison” as the saying goes.

Today I needed to get an app running that required Ruby 2.3.3 and, true to form, the poison of the day was all about the libraries installed on my system being too new to compile Ruby 2.3.

I’m a long-time user of Rbenv. It’s nice and clean and its ruby-build plugin makes installing new versions of Ruby as easy as rbenv install 2.3.3… which is exactly what kicked off the fun.

[mike@longshot identity-idp]$ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
*** Error in `./miniruby': malloc(): memory corruption: 0x00007637497798d8 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72bdd)[0x66e27048fbdd]
...
./miniruby(+0x2470b)[0x80e03b1670b]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x66e27043d4ca]
./miniruby(_start+0x2a)[0x80e03b1673a]
======= Memory map: ========
80e03af2000-80e03de0000 r-xp 00000000 00:27 154419
...
66e2715e7000-66e2715e8000 rw-p 00000000 00:00 0
763748f81000-763749780000 rw-p 00000000 00:00 0                          [stack]

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170828122031.16671
Results logged to /tmp/ruby-build.20170828122031.16671.log

Last 10 log lines:
generating enc.mk
creating verconf.h
./template/encdb.h.tmpl:86:in `<main>': undefined local variable or method `encidx' for main:Object (NameError)
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `eval'
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `result'
	from ./tool/generic_erb.rb:38:in `<main>'
make: *** [uncommon.mk:818: encdb.h] Error 1
make: *** Waiting for unfinished jobs....
verconf.h updated
make: *** [uncommon.mk:655: enc.mk] Aborted (core dumped)

The issues here are twofold: Ruby 2.3 won’t build with either GCC 7 or OpenSSL 1.1, and Arch as it stands today has both by default.

[mike@longshot ~]$ openssl version
OpenSSL 1.1.0f  25 May 2017
[mike@longshot ~]$ gcc -v
gcc version 7.1.1 20170630 (GCC)

To solve the OpenSSL problem we need 1.0 installed (sudo pacman -S openssl-1.0, but it’s probably installed already), and we need to tell ruby-build where to find both the header files and the openssl directory itself.

Helping compilers find header files is the job of pkg-config. On Arch the config files that do that are typically in /usr/lib/pkgconfig/, but in this case we want to point to the pkg-config file in /usr/lib/openssl-1.0/pkgconfig before searching there. To do that we assign a colon-delimited set of paths to PKG_CONFIG_PATH.

Then we need to tell Ruby where the openssl directory is, which is done via RUBY_CONFIGURE_OPTS.

[mike@longshot ~]$ PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170829103308.24191
Results logged to /tmp/ruby-build.20170829103308.24191.log

Last 10 log lines:
  R8: 0x0000016363058550  R9: 0x0000016362cc3dd8 R10: 0x0000016362fafe80
 R11: 0x000000000000001b R12: 0x0000000000000031 R13: 0x0000016363059a40
 R14: 0x0000000000000000 R15: 0x00000163630599a0 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
linking static-library libruby-static.a
ar: `u' modifier ignored since `D' is the default (see `U')
verifying static-library libruby-static.a
make: *** [uncommon.mk:655: enc.mk] Segmentation fault (core dumped)
make: *** Waiting for unfinished jobs....

Our OpenSSL errors fixed, we now get the segfault that comes from GCC 7. So we need to install an earlier GCC (sudo pacman -S gcc5) and add two more variables (CC and CXX) to specify the C and C++ compilers we want used.

[mike@longshot ~]$ CC=gcc-5 CXX=g++-5 PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
Installed ruby-2.3.3 to /home/mike/.rbenv/versions/2.3.3

With that done, you should now have a working Ruby 2.3:

[mike@longshot ~]$ rbenv global 2.3.3
[mike@longshot ~]$ ruby -e "puts 'hello world'"
hello world

On working with Capybara

I’ve been writing Ruby since 2009 and while TDD as a process has long been clear to me, the morass of testing terminology has not. For a recent project I made pretty significant use of Capybara, and through it, Selenium. While it solved some problems I am not sure I could have solved any other way, it created others and on the way shed some light on some murky terminology.

I think mine was a pretty common pathway to working with Capybara, finding it via Rails and Rspec and the need to do an integration test.

The Rspec equivalent of an integration test is the Request Spec, and it’s often pointed out that its intended use is API testing, which wasn’t what I was doing. What’s held up as the user-focused complement to request specs are feature specs using Capybara.

The sudden appearance of “the feature” as the focus of these specs, and the brief description of “Acceptance test framework for web applications” at the top of the Capybara Github page should be our first signs that things have shifted a little.

This shift in terminology has implications technically, which are not immediately obvious. The intent of Acceptance testing “is to guarantee that a customer’s requirements have been met and the system is acceptable”. Importantly, since “Acceptance tests are black box system tests”, this means testing the system from the outside via the UI.

It’s this “via the UI” part that should stand out, since it’s a far cry from the other kinds of tests common with Rails. Uncle Bob has said that testing via the UI “is always a bad idea” and I got a taste of why pretty much right away. Let’s take a feature spec like this as an example:

    it "asks for user email" do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

Notice that suddenly I am stating expectations about the HTML page contents and looking for and manipulating page elements like forms and buttons.
The code for the user_submit button above would typically look like this in most Rails apps:

<%= f.submit "Submit", :class => "btn snazzy" %>

In Rails 3.0.19 that code would use the User class and the input type to create the id attribute automatically. Our click_button 'user_submit' from above finds the element by id and our test passes:

<input class="btn snazzy" id="user_submit" name="commit" type="submit" value="Submit">

In Rails 3.1.12, the same code outputs this:

<input class="btn snazzy" name="commit" type="submit" value="Submit">

In Rails 3.1 they decided to remove the id attribute from the form submit helper so our click_button "user_submit" from the example test above now finds nothing and the test fails.

Rails view helpers exist to abstract away the details of shifting preferences in HTML, form element naming and the tweakery required for cross-browser support. In spite of that I appear to be violating the boundary that abstraction is supposed to provide by making my tests depend on the specific output of the helper.

There are patterns like page objects that can reduce the brittleness but testing via the UI is something that only makes sense when the system under test really is a black box. If you are the developer of said system, it isn’t and you have better options available.
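As a sketch of the page-object idea (all names here are hypothetical, and the class wraps a session object passed in rather than including Capybara::DSL, so it runs anywhere):

```ruby
# A minimal page object: the spec talks to SignupPage, and only SignupPage
# knows about ids like "user_submit". It delegates to whatever session
# object it is given (e.g. Capybara's page).
class SignupPage
  def initialize(session)
    @session = session
  end

  def visit_page
    @session.visit '/'
    self
  end

  def submit_email(email)
    @session.fill_in "user_email", with: email
    @session.click_button "user_submit"
  end

  def thanked?
    @session.has_content?("Thanks for your email!")
  end
end
```

Now when Rails changes how ids are generated, only this one class needs updating rather than every spec that submits the form.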

Contrary to the black box assumption of Acceptance tests, Rails integration tests access system internals like cookies, session and the assigns hash as well as asserting specific templates have been rendered. All of that is done via HTTP requests without reliance on clicking UI elements to move between the controllers under test.

Another problem comes from the use of Selenium. By default Capybara uses rack-test as its driver, but adding js: true switches to the javascript driver which defaults to Selenium:

    it "asks for user email", js: true do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

The unexpected consequences of this seemingly innocuous option come a month or two later when I try to run my test suite:

     Failure/Error: visit '/'
     Selenium::WebDriver::Error::WebDriverError:
       unable to obtain stable firefox connection in 60 seconds (127.0.0.1:7055)

What happened? Well, my package manager has updated my Firefox version to 26.0, and the selenium-webdriver version specified in my Gemfile.lock is 2.35.1.

Yes, with “js: true” I have made that test dependent on a particular version of Firefox, which is living outside of my version control and gets auto-updated by my package manager every six weeks.

While workarounds like bundle update selenium-webdriver or simply skipping tests tagged with js: true using rspec -t ~js:true are available, your default rspec command will always run all the tests. The need to use special options to avoid breakage is unlikely to be remembered/known by future selves/developers, so the solution seems to be keeping some sort of strict separation between your regular test suite and, minimally, any test that uses js, or ideally all Acceptance tests. I’m not sure what that might look like yet, but I’m thinking about it.
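One way to get that separation, sketched below, is to exclude js-tagged specs from the default run in spec_helper.rb; note that the RUN_JS environment variable name here is my own invention:

```ruby
# spec/spec_helper.rb -- skip anything tagged js: true unless explicitly
# asked for, so "rspec" alone never touches Selenium.
RSpec.configure do |config|
  config.filter_run_excluding js: true unless ENV['RUN_JS']
end
```

With this in place, plain `rspec` stays fast and browser-free, and `RUN_JS=1 rspec` runs everything.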

Acceptance testing differs far more than I initially appreciated from other types of testing, and like most things, when used as intended it’s pretty fabulous. The unsettling thing in all this was how easy it was to drift into Acceptance testing without explicitly meaning to.

Including the Capybara DSL in integration tests certainly blurs the boundaries between two things that are already pretty easily confused.

Capybara’s ostensible raison d’être seems to be largely glossed over in most other places as well. Matthew Robbins’s otherwise great book “Application Testing with Capybara” is not only conspicuously not called “Acceptance Testing with Capybara”, it only mentions the words “acceptance testing” twice. Not exactly a recipe for a clear-eyed look at the tradeoffs of adopting acceptance testing.

Capybara is certainly nice to work with, and being able to translate a client’s “when I click here it breaks” almost verbatim into a test is amazing. I feel like I now have a better idea of how to enjoy that upside without the downside.

Web scraping with Ruby

Writing a web scraper seems to be almost a rite of passage as a programmer, and recently I found it was my turn to write one. Judging from the libraries I looked at, the term “web scraping” seems to refer to a spectrum of functionality: web crawling on one end, processing the results on the other, and often some combination of the two.

I looked at a few libraries and all of them seemed to either do too much or not enough relative to what I had in mind, so as millions of programmers before me have done, I wrote my own.

The idea was very much oriented around extracting a set of structured information from a given page, and to that end I wrote a little DSL to get it done.
The library is called (very creatively) Skrape.

Assuming you have a page like this at the address example.com:

<html><body><h1>I am a title</h1></body></html>

You can scrape the title off the page with this:

results = Skrape::Page.new("http://example.com").extract do
  extract_title with: 'h1'
end

The results would be:

{title: "I am a title"}

The calls to “extract_*” are caught with method_missing and whatever follows the “extract_” is used as the key in the hash of results that is returned.
To deal with the inevitable edge cases that come up so often in scraping, you can also pass a block which will be handed whatever the CSS selector found, so you can do some further processing.
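Skrape’s internals may well differ, but the method_missing technique described above can be sketched in a few lines; to keep this self-contained, the value is passed in directly here rather than scraped from a page:

```ruby
# Anything called inside the block that starts with "extract_" becomes a
# key in the results hash; everything else falls through to super.
class ExtractorContext
  def initialize
    @results = {}
  end

  def run(&block)
    instance_eval(&block)
    @results
  end

  def method_missing(name, *args)
    if name.to_s.start_with?("extract_")
      key = name.to_s.sub("extract_", "").to_sym
      @results[key] = args.first
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("extract_") || super
  end
end

results = ExtractorContext.new.run do
  extract_title "I am a title"
end
# results => {title: "I am a title"}
```

The instance_eval is what lets the bare extract_title call inside the block land on the context object instead of raising NameError.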

I’ve needed to use it for picking out the href attribute of a link:

results = Skrape::Page.new(url).extract do
  extract_link_href with: 'a', and_run: proc {|link| link.attr('href').value }
end

And also removing problematic <br> tags from elements:

results = Skrape::Page.new(url).extract do
  extract_locations with: '.address', and_run: proc {|address| address.map{|a| a.inner_html.gsub('<br>', ', ')} }
end

While there are still some improvements I would like to make, so far I am pretty happy with the library. It feels readable and does not do too much. If you have some scraping to do, check it out. Pull requests welcome.

An intro to injection attacks

I’ve found myself explaining SQL injection attacks to people a few times lately and thought I would write up something I can just point to instead.

For illustration purposes let’s make a toy authentication system.

Let’s say you have a database with a table for all your users that looks like this:

The structure of table "users":
+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   1 | mike      | catfish        |
+-----+-----------+----------------+
|   2 | sally     | floride        |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

Let’s say a user wants to use your application and you ask them for their username and password, and they give you ‘akira’ and ‘pickles’.

The next step in authenticating this user is to check in the database to see if we have a user with both a username of ‘akira’ and a password of ‘pickles’ which we can do with a SQL statement like this:

select * from users where username = 'akira' and password = 'pickles';

Since we know we have one row that satisfies both of the conditions we set (value in username must equal ‘akira’ and the value in password must equal ‘pickles’), if we hand that string of text to the database and ask it to execute it, we would expect the database to return the following data:

+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

If the database returns a row we can let the user go ahead and use our application.

Of course we need our SQL statement to work for everyone, so we can’t just write ‘akira’ in there. Let’s replace the username and password with variables (PHP style):

select * from users where username = '$username' and password = '$password';

Now if someone logs in with ‘mike’ and ‘catfish’ our application is going to place the value ‘mike’ in the variable $username  and ‘catfish’ in the variable $password and the PHP interpreter will be responsible for substituting the variable names for the actual values so it can create the finished string that looks like this:

select * from users where username = 'mike' and password = 'catfish';

This will be passed to the database, which executes the command and returns a row.

Unfortunately mixing data supplied by the user with our pre-existing SQL commands and passing the result to the database as one single string has set the stage for some bad behaviour:

$username = "";
$password = "' or username like '%m%";

select * from users where username = '$username' and password = '$password';

Once the final string is assembled suddenly the meaning is very different:

select * from users where username = '' and password = '' or username like '%m%';

Now our database will return a row if the username AND password are both empty strings OR if any of the usernames contains the letter ‘m’. Suddenly the bar for logging into our application is a lot lower.

There are tonnes of possible variations on this, and it’s a common enough problem that it’s the stuff of jokes among programmers:

XKCD: Little bobby tables.

The root of the problem is the commingling of user-supplied data with commands intended to be executed. The user-supplied password value of “' or username like '%m%” entirely changed the meaning of our SQL command. If we had been able to make it clear that this was just a string to search the database for, we would have had the expected behaviour of comparing the string “' or username like '%m%” to the strings in our list of passwords (‘pickles’, ‘catfish’ and ‘floride’).

If you think about it like that you realize that this is not just a problem with SQL, but a problem that shows up everywhere data and commands are mixed.

Keeping these things separate is the only way to stay safe. When the data is kept separate it can be sanitized, or singled out for whatever special treatment is appropriate for the situation by libraries/drivers or whatever else.

Separating data from commands can look pretty different depending on what you are working on. Keeping data and commands separate when executing system commands in Ruby or Node.js looks different from keeping them separate using prepared statements to safely query a database using JRuby/Java. The same rules apply to NoSQL things like Mongodb as well.
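To make the contrast concrete in Ruby (no database required), here is the vulnerable string-splicing approach from the example above; the parameterized alternative is shown only as a comment since the exact call depends on your driver:

```ruby
# The vulnerable version splices user input straight into the command string,
# so the input can change the command's meaning.
def vulnerable_query(username, password)
  "select * from users where username = '#{username}' and password = '#{password}'"
end

puts vulnerable_query("", "' or username like '%m%")
# => select * from users where username = '' and password = '' or username like '%m%'

# With a parameterizing driver the data travels separately from the command,
# e.g. with the sqlite3 gem (shape only, not run here):
#   db.execute("select * from users where username = ? and password = ?",
#              [username, password])
```

In the parameterized form the driver never splices the values into the SQL text; they are handed over as data, so the “like '%m%'” trick is just an odd-looking password that matches nothing.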

People far smarter than I am still get caught by this stuff pretty regularly, so there is no magic bullet to solve it. If you are working with Rails, security tools like Brakeman can help catch things, but the only real solution is awareness and practice.

Ruby Redos: Rspec

I have always found Rspec syntax fascinating. There is something incredible about being able to type readable English sentences and have them do exactly what you think they will from reading them.
I wanted to take a stab at recreating something like Rspec and it was a great exercise. Test::Unit has always felt foreign and this little experiment was a great reminder of why: It ignores Ruby’s blocks.

Test::Unit works by gathering all the descendants of Test::Unit::TestCase and calling every method that starts with “test_”:

class TestMyClass < Test::Unit::TestCase
  def test_my_method1
    ... 
  end 

  def test_my_method2
    ... 
  end 
end

This approach, to my eyes, is a relative of the command pattern, which is essentially a workaround for the lack of language support for blocks. It does not make much sense to use the command pattern in Ruby but you still see it pop up all over the place.
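A tiny illustration of that point: the same “package up some work to run later” idea, first command-pattern style and then with a plain block:

```ruby
# Command pattern: a whole class exists just to carry one method
# around until someone calls it.
class PrintGreeting
  def execute
    "hello"
  end
end

command = PrintGreeting.new
command.execute  # => "hello"

# With blocks, the language does the packaging for us.
greet = proc { "hello" }
greet.call       # => "hello"
```

In Ruby the block version says everything the command object does, with none of the ceremony.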

Rspec syntax, on the other hand, is really just a bunch of Ruby blocks:

[14] pry(main)> load 'test.rb'
=> true
[15] pry(main)> describe "Thing" do                                                                       
[15] pry(main)*   it "is nil" do                                                                          
[15] pry(main)*     expect(nil).to be_nil                                                                 
[15] pry(main)*   end  
[15] pry(main)* end  
A Thing:
  is nil
-----

You will notice that describe is defined on the main object and that everything else I have put in a TestContext class to house the methods that I am expecting to be called in the blocks. Then I am using instance_eval to execute the block as though it were part of the TestContext class. One other thing to point out is that calling Proc.new will grab any block that has been passed to the method:

require 'benchmark'

def describe subject
  puts "A #{subject}:"
  block = Proc.new
  Benchmark.measure do
  TestContext.new.run(block)
  end.real
end


class TestContext

  def initialize
    @examples = {}
  end

  def run block
    instance_eval &block
    @examples.each_pair do |desc, code|
      if code.call
        puts "  \033[32m#{desc}\033[0m\n"
      else
        puts "  \033[31m#{desc}\033[0m\n"
      end
      puts "-----"
    end
  end

  def it its_description
    @examples[its_description] = Proc.new
  end

  def expect thing
    Assertion.new(thing)
  end

  def be_nil
    NilMatcher.new
  end

end

class NilMatcher

  def match subject
    subject.nil?
  end

end

class Assertion

  def initialize subject
    @subject = subject
  end

  def to what
    what.match(@subject)
  end

end

I like the fact that the emphasis is on how a few methods read when chained (describe, it, expect, to, be_nil) and using those as a thin layer around blocks. This is some of the core stuff that makes Ruby great to work with and, in my opinion, a great example of what Matz is talking about when he says “Languages can be weapons against complexity”.
He has given us blocks in the language so that things as common as “wanting to execute code later” are actually part of the language. No design pattern required. The flexibility of the language also means we can write code with a very small distance between what is said and what is meant, further reducing the complexity. Both of those things are on display in Rspec.

Does your predicate return what you think it does?

def has_query?
  self.param_search_query && self.param_search_query != "Eventname"
end

I found this in some code I was working on today. This is a pretty standard “predicate method” which, by convention, should always return either true or false. Strangely, as I was exploring the code with Pry, I noticed the method returning nil.

It turned out that param_search_query was actually nil, which, when mixed with the logical AND (&&) operator returns something the author clearly did not expect:

mike@sleepycat:~☺  irb
irb(main):001:0> nil && true
=> nil

This was actually surprising to me too. Since nil is falsey, I would have expected the expression to return false rather than nil. Definitely something I am going to remember next time I am using the && operator.
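What && actually does is return its first operand if that operand is falsy, and its second operand otherwise; it never coerces the result into true or false:

```ruby
# && evaluates left to right and returns an operand, not a boolean.
nil && true       # => nil       (first operand is falsy, so it is returned)
false && true     # => false
1 && "two"        # => "two"     (first operand truthy, second is returned)
nil || "default"  # => "default" (|| mirrors this behaviour)
```

This is also why `a ||= b` works as a default-assignment idiom: the operators pass actual values through rather than flattening everything to true/false.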

So the next question is, what to do about it.
One way to ensure a boolean return value that I have seen is to use the bang operator twice. This works because !nil is true and !!nil is false:

irb(main):003:0> !!nil
=> false

So we could use that to fix the method like so:

def has_query?
  !!(self.param_search_query && self.param_search_query != "Eventname")
end

I’m not a fan of this because I think it obscures the intention of the author.
In the end, since this was in a Rails app I went with the present? method:

def has_query?
  self.param_search_query.present? && self.param_search_query != "Eventname"
end
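It is worth noting that present? comes from ActiveSupport rather than Ruby itself; roughly speaking it is !blank?. A minimal re-implementation (just to show the semantics, since the real version handles many more types) might look like this:

```ruby
# A rough sketch of ActiveSupport's present? semantics: nil, false, and
# empty/whitespace-only strings are not "present"; everything else is.
def present?(value)
  case value
  when nil, false then false
  when String then !value.strip.empty?
  else true
  end
end

present?(nil)        # => false
present?("")         # => false
present?("pickles")  # => true
```

The bonus over a plain truthiness check is that whitespace-only strings also count as not present, which is usually what you want for user-supplied search queries.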

The tests pass and I’m back to hunting bigger game…

Factory not registered

I have been working on adding tests to an existing Rails app for one of my clients. The app was written by some well-intentioned person who included tonnes of testing libraries but just never got around to writing tests.

group :development, :test do
  gem 'capybara'
  gem 'cucumber'
  gem 'cucumber-rails', :require => false
  gem 'database_cleaner'
  gem 'factory_girl'
  gem 'factory_girl_rails'
  gem 'rspec-core'
  gem 'rspec-expectations'
  gem 'rspec-mocks'
  gem 'rspec'
  gem 'rspec-rails'
  gem 'ZenTest'
end

As I started adding specs I needed to create factories, and that is when the errors began: first “Factory already registered” errors and then “Factory not registered” errors. Very annoying.
The root of the problem seems to be that factory_girl_rails and factory_girl both look for factories when loaded. The solution for me was to remove factory_girl and let factory_girl_rails do the right thing.
This triggered a major purge of unneeded gems which left me with a Gemfile that looks like this:

group :development, :test do
  gem 'factory_girl_rails'
  gem 'rspec-rails'
  gem 'capybara'
end

It also left me with passing tests and no further errors.

I think the take away for me here is to let bundler/rubygems dependency management system do the right thing. Over-specifying stuff is a recipe for trouble. Over-specifying stuff in the Gemfile might look like explicitly including factory_girl when it will obviously be a dependency of factory_girl_rails. I’ve also seen problems arise from habitually including version numbers in the Gemfile, which makes things so specific that Bundler no longer has the latitude to figure out something that works.

In any case it’s a good reminder that less is more.