Installing Ruby 2.3 on Archlinux

I’ve been running Archlinux for a few years now. I ran Ubuntu for a 8 years before that and frequently ran into issues with old packages that eventually spurred me to jump to Arch where I get to deal with issues in new packages instead. “Pick your poison” as the saying goes.

Today I needed to get an app running that required Ruby 2.3.3 and, true to form, the poison of the day was all about the libraries installed on my system being to new to compile Ruby 2.3.

I’m a long time user of Rbenv. It’s nice and clean and it’s ruby-build plugin makes installing new versions of Ruby as easy as rbenv install 2.3.3… which is exactly what kicked off the fun.

[mike@longshot identity-idp]$ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
*** Error in `./miniruby': malloc(): memory corruption: 0x00007637497798d8 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72bdd)[0x66e27048fbdd]
...
./miniruby(+0x2470b)[0x80e03b1670b]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x66e27043d4ca]
./miniruby(_start+0x2a)[0x80e03b1673a]
======= Memory map: ========
80e03af2000-80e03de0000 r-xp 00000000 00:27 154419
...
66e2715e7000-66e2715e8000 rw-p 00000000 00:00 0
763748f81000-763749780000 rw-p 00000000 00:00 0                          [stack]

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170828122031.16671
Results logged to /tmp/ruby-build.20170828122031.16671.log

Last 10 log lines:
generating enc.mk
creating verconf.h
./template/encdb.h.tmpl:86:in `<main>': undefined local variable or method `encidx' for main:Object (NameError)
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `eval'
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `result'
	from ./tool/generic_erb.rb:38:in `<main>'
make: *** [uncommon.mk:818: encdb.h] Error 1
make: *** Waiting for unfinished jobs....
verconf.h updated
make: *** [uncommon.mk:655: enc.mk] Aborted (core dumped)

The issues here are twofold; Ruby 2.3 won’t build with GCC 7 or OpenSSL 1.1. Arch as it stands today has both by default.

[mike@longshot ~]$ openssl version
OpenSSL 1.1.0f  25 May 2017
[mike@longshot ~]$ gcc -v
gcc version 7.1.1 20170630 (GCC)

To solve the OpenSSL problem we need 1.0 installed (sudo pacman -S openssl-1.0, but it’s probably installed already), and we need to tell ruby-build where to find both the header files, and the openssl directory itself.

Helping compilers find header files is the job of pkg-config. On Arch the config files that do that are typically in /usr/lib/pkgconfig/ but in this case we want to point to the pkg-config file in /usr/lib/openssl/1.0/pkgconfig before searching there. To do that we assign a colon-delimited set of paths to PKG_CONFIG_PATH.

Then we need to tell Ruby where the openssl directory is which is done via RUBY_CONFIGURE_OPTS.

[mike@longshot ~]$ PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170829103308.24191
Results logged to /tmp/ruby-build.20170829103308.24191.log

Last 10 log lines:
  R8: 0x0000016363058550  R9: 0x0000016362cc3dd8 R10: 0x0000016362fafe80
 R11: 0x000000000000001b R12: 0x0000000000000031 R13: 0x0000016363059a40
 R14: 0x0000000000000000 R15: 0x00000163630599a0 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
linking static-library libruby-static.a
ar: `u' modifier ignored since `D' is the default (see `U')
verifying static-library libruby-static.a
make: *** [uncommon.mk:655: enc.mk] Segmentation fault (core dumped)
make: *** Waiting for unfinished jobs....

Our OpenSSL errors fixed we now get the segfault that comes from GCC 7. So we need to install an earlier gcc (sudo pacman -S gcc5) add two more variables (CC and CXX) to specify the C and C++ compilers to we want used.

[mike@longshot ~]$ CC=gcc-5 CXX=g++-5 PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
Installed ruby-2.3.3 to /home/mike/.rbenv/versions/2.3.3

With that done, you should now have a working Ruby 2.3:

[mike@longshot ~]$ rbenv global 2.3.3
[mike@longshot ~]$ ruby -e "puts 'hello world'"
hello world

Installing R-Studio on Ubuntu 16.10

rstudioInstalling things on Linux is either really easy, or a yak shave with surprisingly little between those extremes.

It seems that Ubuntu 16.10 has removed Gstreamer 0.10 from the repos and replaced it with Gstreamer 1.0, which is great… until you need to install R-Studio.

While the R-Studio people are aiming to drop the Gstreamer dependency, for the moment, as of 16.10, installing it has fallen into the yak-shave category.

Installing R-Studio works fine, but if you try to run (from the terminal) it you will get the error:

rstudio: error while loading shared libraries: libgstreamer-0.10.so.0: cannot open shared object file: No such file or directory

We can see that it’s failing to load Gstreamer, but since it’s been removed from the Ubuntu repos fixing this will mean getting those packages elsewhere.

To start with, we can download the latest R-studio daily build and install it using dpkg:

$ wget https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-1.0.124-amd64.deb
$ sudo dpkg -i rstudio-1.0.124-amd64.deb

The dpkg command can also query the package to display information about it. If we use the uppercase I option we can confirm that this package requires exactly version 0.10 of libgstreamer:

dpkg -I rstudio-1.0.124-amd64.deb 
 new debian package, version 2.0.
 size 98840122 bytes: control archive=42847 bytes.
     554 bytes,    12 lines      control              
  163246 bytes,  1548 lines      md5sums              
     198 bytes,    10 lines   *  postinst             #!/bin/sh
     158 bytes,    10 lines   *  postrm               #!/bin/sh
 Package: rstudio
 Version: 1.0.124
 Section: devel
 Priority: optional
 Architecture: amd64
 Depends: libjpeg62, libedit2, libgstreamer0.10-0, libgstreamer-plugins-base0.10-0, libssl1.0.0,  libc6 (>= 2.7)
 Recommends: r-base (>= 2.11.1)
 Installed-Size: 526019
 Maintainer: RStudio <info@rstudio.com>
 Description: RStudio
  RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, and workspace management.

Debian (which Ubuntu is based on) has the old Gstreamer packages we need to satisfy those dependencies, so we can get them from there. If you need something other than the AMD64 see here and here. The if you have a 64bit machine, you can download and install like this:

# download with wget
$ wget http://ftp.ca.debian.org/debian/pool/main/g/gstreamer0.10/libgstreamer0.10-0_0.10.36-1.5_amd64.deb
$ wget http://ftp.ca.debian.org/debian/pool/main/g/gst-plugins-base0.10/libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb

# Now install with dpkg
$ sudo dpkg -i libgstreamer0.10-0_0.10.36-1.5_amd64.deb
$ sudo dpkg -i libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb

While that solves R’s problems, we now have one of our own. We’ve purposefully installed old packages and don’t want Ubuntu’s package manager to enthusiastically upgrade them next time we update.
To resolve that problem will put a hold on them with apt-mark:

$ sudo apt-mark hold libgstreamer-plugins-base0.10-0
libgstreamer-plugins-base0.10-0 set on hold.
$ sudo apt-mark hold libgstreamer0.10
libgstreamer0.10-0 set on hold.

And we can check the packages that are on hold with:

$ sudo apt-mark showhold
libgstreamer-plugins-base0.10-0
libgstreamer0.10-0

Hopefully that saves someone some Googling.
Now that’s working, it’s time to play with some R!

Running Gephi on Ubuntu 15.10

A while ago I gave a talk at the Ottawa graph meetup about getting started doing graph data visualizations with Gephi. Ever the optimist, I invited people to install Gephi on their machines and then follow along as I walked through doing various things with the program.

java_install

What trying to get a room of 20 people to install a Java program has taught me is that the installer’s “Java is found everywhere” is not advertising; it’s a warning. I did indeed experience the power of Java, and after about ten minutes of old/broken/multiple  Java versions, broken classpaths and Java 7/8 compatiblity drama, I gave up and completed the rest of the talk as a demo.

All of this was long forgotten until my wife and I started a little open data project recently and needed to use Gephi to visualize the data. The Gephi install she had attempted the day of the talk was still lingering on her Ubuntu system and so it was time to actually figure out how to get it going.

The instructions for installing Gephi are pretty straight forward:

  1. Update your distribution with the last official JRE 7 or 8 packages.
  2. After the download completes, unzip and untar the file in a directory.
  3. Run it by executing ./bin/gephi script file.

The difficulty was that after doing that, Gephi would show its splash screen and then hang as the loading bar said “Starting modules…“.

If you have every downloaded plugins for Gephi, you will have noticed that they have an .nbm extension, which indicates they, and (if you will pardon the pun) by extension, Gephi itself is built on top of the Netbeans IDE.
So the next question was, does Netbeans itself work?

sudo apt-get install netbeans
netbeans

Wouldn’t you know it, that Netbeans also freezes while loading modules.

Installing Oracle’s version of Java was suggested and the place to get that is the Webupd8 Team’s ppa:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer oracle-java8-set-default
# The java version that got installed:
java -version
java version &quot;1.8.0_72&quot;
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

That finally left us with a working version of gephi.

Gephi 0.9.1 running on Ubuntu 15.10
Gephi 0.9.1 running on Ubuntu 15.10

Installing Gephi on Arch Linux was (thankfully) drama-free, but interestingly installs the OpenJDK, they very thing that seemed to causing the problems on Ubuntu:

yaourt -S gephi
java -version
openjdk version &quot;1.8.0_74&quot;
OpenJDK Runtime Environment (build 1.8.0_74-b02)
OpenJDK 64-Bit Server VM (build 25.74-b02, mixed mode)

It’s a mystery to me why Gephi on Ubuntu seems to require Oracle’s Java but on Arch I can run it on OpenJDK.
With a little luck it can remain a mystery.

Changing keyboard layout options in Ubuntu 14.04

Back in 2012 I switched my caps-lock key to act as a second ESC key. This made a big impact in my Vim usage, and you can understand why when you see the keyboard vi was created with.

Having become reliant on that little tweak, it was a little disconcerting to realize that the keyboard layout options I had used to switch my caps-lock were nowhere to be found in Ubuntu 14.04. It turns out that Gnome (upstream from Ubuntu) removed the settings from the system settings entirely.

Fortunately this is still accessible via the Gnome Tweak Tool.
You can install that like this:

sudo apt-get install gnome-tweak-tool

Once installed you can launch the tool from the terminal:

gnome-tweak-tool

You can find the all the old options under the “typing” option.
gnome tweak tool

Its a little weird to have such useful stuff suddenly removed from the system settings. Hopefully they will find there way back in a future version, for the moment, my Vim crisis has been averted and that’s enough.

On working with Capybara

I’ve been writing Ruby since 2009 and while TDD as a process has long been clear to me, the morass of testing terminology has not. For a recent project I made pretty significant use of Capybara, and through it, Selenium. While it solved some problems I am not sure I could have solved any other way, it created others and on the way shed some light on some murky terminology.

I think mine was a pretty common pathway to working with Capybara, finding it via Rails and Rspec and the need to do an integration test.

The Rspec equivalent of an integration test is the Request Spec, and its often pointed out that its intended use is API testing, which wasn’t what I was doing. What’s held up as the user focused compliment to request specs are feature specs using Capybara.

The sudden appearance of “the feature” as the focus of these specs, and the brief description of “Acceptance test framework for web applications” at the top of the Capybara Github page should be our first signs that things have shifted a little.

This shift in terminology has implications technically, which are not immediately obvious. The intent of Acceptance testing “is to guarantee that a customers requirements have been met and the system is acceptable“. Importantly, since “Acceptance tests are black box system tests”, this means testing the system from the outside via the UI.

Its this “via the UI” part that should stand out, since its a far cry from the other kinds of tests of tests common with Rails. Uncle Bob has said that testing via the UI “is always a bad idea” and I got a taste of why pretty much right away. Lets take a feature spec like this as an example:

    it "asks for user email" do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

Notice that suddenly I am stating expectations about the HTML page contents and looking for and manipulating page elements like forms and buttons.
The code for the user_submit button above would typically look like this in most Rails apps:

<%= f.submit "Submit", :class => "btn snazzy" %>

In Rails 3.0.19 that code would use the User class and the input type to create the id attribute automatically. Our click_button 'user_submit' from above finds the element by id and our test passes:

<input class="btn snazzy" id="user_submit" name="commit" type="submit" value="Submit">

In Rails 3.1.12, the same code outputs this:

<input class="btn snazzy" name="commit" type="submit" value="Submit">

In Rails 3.1 they decided to remove the id attribute from the form submit helper so our click_button "user_submit" from the example test above now finds nothing and the test fails.

Rails view helpers exist to abstract away the details of shifting preferences in HTML, form element naming and the tweakery required for cross-browser support. In spite of that I appear to be violating the boundary that abstraction is supposed to provide by making my tests depend on the specific output of the helper.

There are patterns like page objects that can reduce the brittleness but testing via the UI is something that only makes sense when the system under test really is a black box. If you are the developer of said system, it isn’t and you have better options available.

Contrary to the black box assumption of Acceptance tests, Rails integration tests access system internals like cookies, session and the assigns hash as well as asserting specific templates have been rendered. All of that is done via HTTP requests without reliance on clicking UI elements to move between the controllers under test.

Another problem comes from the use of Selenium. By default Capybara uses rack-test as its driver, but adding js: true switches to the javascript driver which defaults to Selenium:

    it "asks for user email", js: true do
      visit '/'
      fill_in "user_email", with: "foo@example.com"
      click_button "user_submit"
      expect(page).to have_content "Thanks for your email!"
    end

The unexpected consequences of this seemingly innocuous option come a month or two later when I try to run my test suite:

     Failure/Error: visit '/'
     Selenium::WebDriver::Error::WebDriverError:
       unable to obtain stable firefox connection in 60 seconds (127.0.0.1:7055)

What happened? Well, my package manager has updated my Firefox version to 26.0, and the selenium-webdriver version specified in my Gemfile.lock is 2.35.1.

Yes, with “js: true” I have made that test dependent on a particular version of Firefox, which is living outside of my version control and gets auto-updated by my package manager every six weeks.

While workarounds like bundle update selenium-webdriver or simply skipping tests tagged with js:true using rspec -t ~js:true are available, your default rspec command will always run all the tests. The need to use special options to avoid breakage is unlikely to be remembered/known by future selves/developers, so the solution seems to be keeping some sort of strict separation between your regular test suite and, minimally any test that uses js, or ideally all Acceptance tests. I’m not sure that that might look like yet, but I’m thinking about it.

Acceptance testing differs far more that I initially appreciated from other types of testing, and like most things, when used as intended its pretty fabulous. The unsettling thing in all this was how easy it was to drift into Acceptance testing without explicitly meaning to.

Including the Capybara DSL in integration tests certainly blurs the boundaries between two things that are already pretty easily confused.

Capybara’s ostensible raison d’etre seems to be largely glossed over in most other places as well. Matthew Robbins otherwise great book “Application Testing with Capybara“, is not only conspicuously not called “Acceptance testing with Capybara”, it only mentions the words “Acceptance testing” twice. Not exactly a recipe for a clear-eyed look at the tradeoffs of adopting acceptances testing.

Cabybara is certainly nice to work with, and being able to translate a clients “when I click here it breaks” almost verbatim into a test is amazing. I feel like I now have a better idea of how to enjoy that upside without the downside.

Web scraping with Ruby

Writing a web scraper seems to be almost a right of passage as a programmer, and recently I found it was my turn to write one. Judging from the libraries I looked at, the term “web scraping” seems to refer to a spectrum of functionality; web crawling on one end to processing the results on the other, and often doing some combination of the two.

I looked at a few libraries and all of them seemed to either do to much or not enough relative to what I had in mind, so as millions of programmers before me have done, I wrote my own.

The idea was very much oriented around extracting a set of structured information from a given page, and to that end I wrote a little DSL to get it done.
The library is called (very creatively) Skrape.

Assuming you have a page like this at the address example.com:

<html><body><h1>I am a title</h1></body></html>

You can scrape the title off the page with this:

results = Skrape::Page.new("http://example.com").extract do
  extract_title with: 'h1'
end

The results would be:

{title: "I am a title"}

The calls to “extract_*” are caught with method_missing and whatever follows the “extract_” is used as the key in the hash of results that is returned.
To deal with the inevitable edge cases that come up so often in scraping you can all so pass a block which will be handed whatever the CSS selector found so you can do some further processing.

I’ve needed to use it for picking out the href attribute of a link:

results = Skrape::Page.new(url).extract do
  extract_link_href with: 'a', and_run: proc {|link| link.attr('href').value }
end

And also removing problematic <br> tags from elements:

results = Skrape::Page.new(url).extract do
  extract_locations with: '.address', and_run: proc {|address| address.map{|a| a.inner_html.gsub('<br>', ', ')} }
end

While there are still some improvements I would like to make, so far I am pretty happy with the library. It feels readable and does not do to much. If you have some scraping to do, check it out. Pull requests welcome.

An intro to injection attacks

I’ve found myself explaining SQL injection attacks to people a few times lately and thought I would write up something I can just point to instead.

For illustration purposes lets make a toy authentication system.

Lets say you have a database with table for all your users that looks like this:

The structure of table "users":
+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   1 | mike      | catfish        |
+-----+-----------+----------------+
|   2 | sally     | floride        |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

Lets say a user wants to use your application and you ask them for their username and password, and they give you ‘akira’ and ‘pickles’.

The next step in authenticating this user is to check in the database to see if we have a user with both a username of ‘akira’ and a password of ‘pickles’ which we can do with a SQL statement like this:

select * from user where username = 'akira' and password = 'pickles';

Since we know we have one row that satisfies both of the conditions we set (value in username must equal ‘akira’ and the value in password must equal ‘pickles’), if we hand that string of text to the database and ask it to execute it, we would expect the database to return the following data:

+-----+-----------+----------------+
| uid | username  | password       |
+-----+-----------+----------------+
|   3 | akira     | pickles        |
+-----+-----------+----------------+

If the database returns a row we can let the user go ahead and use our application.

Of course if we need our SQL statement to work for everyone and so we can’t just write ‘akira’ in there. So lets replace the username and password with variables (PHP style):

select * from user where username = '$username' and password = '$password';

Now if someone logs in with ‘mike’ and ‘catfish’ our application is going to place the value ‘mike’ in the variable $username  and ‘catfish’ in the variable $password and the PHP interpreter will be responsible for substituting the variable names for the actual values so it can create the finished string that looks like this:

select * from user where username = 'mike' and password = 'catfish';

This will be passed to the database which executes the command and return a row.

Unfortunately mixing data supplied by the user with our pre-exisisting SQL commands and passing the result to the database as one single string has set the stage for some bad behaviour:

$username = "";
$password = " ' or user_name like '%m% ";

select * from user where username = '$username' and password = '$password';

Once the final string is assembled suddenly the meaning is very different:

select * from user where username = ' ' and password = '' or username like '%m%';

Now our database will return a row if the username AND password are both empty strings OR if any of the usernames contains the letter ‘m’. Suddenly the bar for logging into our application is a lot lower.

There are tonnes of possible variations on this, and it’s a common enough problem that its the stuff of jokes among programmers:

XKCD: Little bobby tables.

The root of the problem is the commingling of user supplied data with commands intended  to be executed. The user supplied password value of “ ‘ or username like ‘%m%” entirely changed the meaning of our SQL command where if we had been able make it clear that this was just a string to search the database for we would have had the expected behaviour of comparing the string ” ‘ or username like ‘%m%” to strings in our list of passwords (‘pickles’, ‘catfish’ and ‘floride’).

If you think about it like that you realize that this is not just a problem with SQL, but a problem that shows up everywhere data and commands are mixed.

Keeping these things separate is the only way to stay safe. When the data is kept separate it can be sanitized, or singled out for whatever special treatment is appropriate for the situation by libraries/drivers or whatever else.

Separating data from commands can look pretty different depending on the what you are working on. Keeping data and commands separate when executing system commands in Ruby or Node.js looks different from keeping them separate using prepared statements to safely query a database using JRuby/Java. The same rules apply to NoSQL things like Mongodb as well.

People far smarter than I am still get caught by this stuff pretty regularly, so there is no magic bullet to solve it. If you are working with Rails, security tools like Breakman can help catch things but the only real solution is awareness and practice.