Setting up Oracle-XE on Arch Linux

Currently I’m working on a project that needs to pull data from an Oracle database. My normal development setup is to install the database locally and develop the application TDD-style with a test database, so it seemed reasonable to do the same with Oracle. The fact that this became fodder for a blog post suggests it wasn’t as easy as I expected.

First up was the basic decision about what to install. The Oracle database itself is a multi-gigabyte monster, seemingly designed to sell support contracts, so I was glad to discover that an Express Edition exists. Last released in 2014, and aimed at whatever “easy development” means in the world of Oracle, XE promises that “Applications developed with XE may be immediately used with other editions of the Oracle Database”. This sounds like the right thing to me.

So “easy development” obviously begins with logging into your Oracle account and downloading the Oracle XE zip file.

Next we want to package this up so that it can be cleanly installed and removed from our system. The oracle-xe package on the AUR uses the zip file we just downloaded to build a package we can install, so let’s get that happening:

[mike@longshot ~]$ git clone https://aur.archlinux.org/oracle-xe.git
Cloning into 'oracle-xe'...
remote: Counting objects: 22, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 22 (delta 8), reused 22 (delta 8)
Unpacking objects: 100% (22/22), done.
[mike@longshot ~]$ cd oracle-xe                           
[mike@longshot oracle-xe]$ cp ~/Downloads/oracle-xe-11.2.0-1.0.x86_64.rpm.zip .
[mike@longshot oracle-xe]$ ls
listener.ora  oracle_env.csh  oracle_env.sh  oracle.install  oracle-xe  oracle-xe-11.2.0-1.0.x86_64.rpm.zip  oracle-xe.conf  oracle-xe.service  PKGBUILD
[mike@longshot oracle-xe]$ makepkg
==> Making package: oracle-xe 11.2.0_1.0-4 (Tue Oct 17 14:51:00 EDT 2017)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
  -> Found oracle-xe-11.2.0-1.0.x86_64.rpm.zip
  -> Found oracle_env.csh
  -> Found oracle_env.sh
  -> Found oracle-xe
  -> Found oracle-xe.conf
  -> Found listener.ora
  -> Found oracle-xe.service
==> Validating source files with md5sums...
    oracle-xe-11.2.0-1.0.x86_64.rpm.zip ... Passed
    oracle_env.csh ... Passed
    oracle_env.sh ... Passed
    oracle-xe ... Passed
    oracle-xe.conf ... Passed
    listener.ora ... Passed
    oracle-xe.service ... Passed
==> Extracting sources...
  -> Extracting oracle-xe-11.2.0-1.0.x86_64.rpm.zip with bsdtar
==> Starting build()...
==> Entering fakeroot environment...
==> Starting package()...
==> Tidying install...
  -> Removing libtool files...
  -> Purging unwanted files...
  -> Removing static library files...
  -> Compressing man and info pages...
==> Checking for packaging issue...
==> Creating package "oracle-xe"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Adding install file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
==> Finished making: oracle-xe 11.2.0_1.0-4 (Tue Oct 17 14:54:35 EDT 2017)
[mike@longshot oracle-xe]$ ls
listener.ora  oracle_env.csh  oracle_env.sh  oracle.install  oracle-xe  oracle-xe-11.2.0_1.0-4-x86_64.pkg.tar.xz  oracle-xe-11.2.0-1.0.x86_64.rpm.zip  oracle-xe.conf  oracle-xe.service  pkg  PKGBUILD  src
[mike@longshot oracle-xe]$ sudo pacman -U oracle-xe-11.2.0_1.0-4-x86_64.pkg.tar.xz 
[sudo] password for mike: 
loading packages...
resolving dependencies...
looking for conflicting packages...

Packages (1) oracle-xe-11.2.0_1.0-4

Total Installed Size:  564.61 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring                                                                                                 [#############################################################################] 100%
(1/1) checking package integrity                                                                                               [#############################################################################] 100%
(1/1) loading package files                                                                                                    [#############################################################################] 100%
(1/1) checking for file conflicts                                                                                              [#############################################################################] 100%
(1/1) checking available disk space                                                                                            [#############################################################################] 100%
:: Processing package changes...
(1/1) installing oracle-xe                                                                                                     [#############################################################################] 100%

creating group "dba" ...done

creating user "oracle" ...done

change directory rights ...done

set sticky bit to oracle executable ...done

creating /etc/sysconfig ...done

creating /var/log/oracle ...done


add your user to the "dba" group in order to use the oracle tools

:: Running post-transaction hooks...
(1/2) Arming ConditionNeedsUpdate...
(2/2) Updating the desktop file MIME type cache...
[mike@longshot oracle-xe]$ sudo usermod -aG dba $USER

Above we built and installed the Oracle-XE package, and added the dba group to the current user’s existing groups.
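
One gotcha: usermod only affects new login sessions, so the dba group won’t be visible in your current shell. Logging out and back in works, or you can pick it up immediately:

# start a subshell whose primary group is dba
$ newgrp dba
# confirm the group list now includes dba
$ id -nG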

To get a sense of what we just installed it’s good to look at what that package put into the /etc directory.

[mike@longshot node_oracle]$ pacman -Ql oracle-xe | grep "etc"
oracle-xe /etc/
oracle-xe /etc/ld.so.conf.d/
oracle-xe /etc/ld.so.conf.d/oracle-xe.conf
oracle-xe /etc/profile.d/
oracle-xe /etc/profile.d/oracle_env.csh
oracle-xe /etc/profile.d/oracle_env.sh
oracle-xe /etc/rc.d/
oracle-xe /etc/rc.d/oracle-xe
oracle-xe /etc/systemd/
oracle-xe /etc/systemd/system/
oracle-xe /etc/systemd/system/oracle-xe.service
[mike@longshot node_oracle]$ cat /etc/ld.so.conf.d/oracle-xe.conf
/usr/lib/oracle/product/11.2.0/xe/lib
[mike@longshot node_oracle]$ cat /etc/profile.d/oracle_env.sh 
export ORACLE_HOME=/usr/lib/oracle/product/11.2.0/xe
export ORACLE_SID=XE
export NLS_LANG=`$ORACLE_HOME/bin/nls_lang.sh`
export PATH=$PATH:$ORACLE_HOME/bin

Here we can see that this package installed an entry in our library search path (/etc/ld.so.conf.d/oracle-xe.conf), added some env vars for us (/etc/profile.d/oracle_env.sh), added a run script (/etc/rc.d/oracle-xe) and a systemd service (/etc/systemd/system/oracle-xe.service).
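
Since profile.d scripts are only sourced at login, picking the changes up in the current shell looks something like this (ldconfig refreshes the linker cache so the new search path entry takes effect):

# pull the Oracle env vars into this shell
$ source /etc/profile.d/oracle_env.sh
# rebuild the shared library cache
$ sudo ldconfig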

In theory we should be able to install our node driver and have it work.

Installing node-oracledb

Oracle has thoughtfully released a Node.js driver, which can be installed with npm install oracledb. This driver installs and compiles a bunch of stuff with node-gyp and expects some libraries and headers to be available for that process. Let’s see!

[mike@longshot node_oracle]$ npm i oracledb

> oracledb@1.13.1 install /home/mike/projects/play/node_oracle/node_modules/oracledb
> node-gyp rebuild

node-oracledb ERR! Error: Cannot find $OCI_LIB_DIR/libclntsh.so
node-oracledb ERR! Error: See https://github.com/oracle/node-oracledb/blob/master/INSTALL.md

gyp: Call to 'INSTURL="https://github.com/oracle/node-oracledb/blob/master/INSTALL.md"; ERR="node-oracledb ERR! Error:"; if [ -z $OCI_LIB_DIR ]; then OCI_LIB_DIR=`ls -d /usr/lib/oracle/*/client*/lib/libclntsh.* 2> /dev/null | tail -1 | sed -e 's#/libclntsh[^/]*##'`; if [ -z $OCI_LIB_DIR ]; then if [ -z "$ORACLE_HOME" ]; then if [ -f /opt/oracle/instantclient/libclntsh.so ]; then echo "/opt/oracle/instantclient/"; else echo "$ERR Cannot find Oracle library libclntsh.so" >&2; echo "$ERR See $INSTURL" >&2; echo "" >&2; fi; else if [ -f "$ORACLE_HOME/lib/libclntsh.so" ]; then echo $ORACLE_HOME/lib; else echo "$ERR Cannot find \$ORACLE_HOME/lib/libclntsh.so" >&2; echo "$ERR See $INSTURL" >&2; echo "" >&2; fi; fi; else if [ -f "$OCI_LIB_DIR/libclntsh.so" ]; then echo $OCI_LIB_DIR; else echo "$ERR Cannot find \$OCI_LIB_DIR/libclntsh.so" >&2; echo "$ERR See $INSTURL" >&2; echo "" >&2; fi; fi; else if [ -f "$OCI_LIB_DIR/libclntsh.so" ]; then echo $OCI_LIB_DIR; else echo "$ERR Cannot find \$OCI_LIB_DIR/libclntsh.so" >&2; echo "$ERR See $INSTURL" >&2; echo "" >&2; fi; fi;' returned exit status 0 while in binding.gyp. while trying to load binding.gyp
gyp ERR! configure error 
gyp ERR! stack Error: `gyp` failed with exit code: 1
gyp ERR! stack     at ChildProcess.onCpExit (/home/mike/.nodenv/versions/8.7.0/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:336:16)
gyp ERR! stack     at emitTwo (events.js:125:13)
gyp ERR! stack     at ChildProcess.emit (events.js:213:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:200:12)
gyp ERR! System Linux 4.13.8-1-hardened
gyp ERR! command "/home/mike/.nodenv/versions/8.7.0/bin/node" "/home/mike/.nodenv/versions/8.7.0/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /home/mike/projects/play/node_oracle/node_modules/oracledb
gyp ERR! node -v v8.7.0
gyp ERR! node-gyp -v v3.6.2
gyp ERR! not ok 
npm WARN node_oracle@1.0.0 No description
npm WARN node_oracle@1.0.0 No repository field.

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! oracledb@1.13.1 install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the oracledb@1.13.1 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/mike/.npm/_logs/2017-10-19T21_01_24_226Z-debug.log

Not being a C/C++ programmer, I find these moments pretty perplexing. It looks a lot like ORACLE_HOME=/usr/lib/oracle/product/11.2.0/xe OCI_LIB_DIR=$ORACLE_HOME/lib OCI_INC_DIR=$ORACLE_HOME/xdk/include npm i oracledb should work, but it doesn’t.

ldconfig -N -v | grep libclntsh.so prints out libclntsh.so.11.1 -> libclntsh.so.11.1 so the library seems to be findable, just not by the driver.

Plan B

It turns out that the headers and libraries we need are also available in Oracle’s instantclient. This would mean more downloading/packaging silliness, except someone has gone to the effort of packaging these instantclient libraries and providing them as a pacman repo. Since the world is a beautiful place and everyone is friends on the internet, I am going to pull my packages from them by adding these lines to my pacman.conf:

[mike@longshot node_oracle]$ tail -n 3 /etc/pacman.conf 
[oracle]
SigLevel = Optional TrustAll
Server = http://linux.shikadi.net/arch/$repo/$arch/

Then we update and install.

[mike@longshot node_oracle]$ sudo pacman -Sy
[mike@longshot node_oracle]$ sudo pacman -S oracle-instantclient-sdk oracle-instantclient-basic

Looking at the contents, pacman -Ql oracle-instantclient-sdk shows a bunch of files being put into /usr/include, while pacman -Ql oracle-instantclient-basic shows our much sought after libclntsh.so going into /usr/lib. It looks like we finally have some plausible values for OCI_LIB_DIR and OCI_INC_DIR.

[mike@longshot node_oracle]$ OCI_LIB_DIR=/usr/lib OCI_INC_DIR=/usr/include npm i oracledb

> oracledb@1.13.1 install /home/mike/projects/play/node_oracle/node_modules/oracledb
> node-gyp rebuild

make: Entering directory '/home/mike/projects/play/node_oracle/node_modules/oracledb/build'
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsOracle.o
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsPool.o
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsConnection.o
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsResultSet.o
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsMessages.o
  CXX(target) Release/obj.target/oracledb/src/njs/src/njsIntLob.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiEnv.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiEnvImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiException.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiExceptionImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiConnImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiDateTimeArrayImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiPoolImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiStmtImpl.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiUtils.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiLob.o
  CXX(target) Release/obj.target/oracledb/src/dpi/src/dpiCommon.o
  SOLINK_MODULE(target) Release/obj.target/oracledb.node
  COPY Release/oracledb.node
make: Leaving directory '/home/mike/projects/play/node_oracle/node_modules/oracledb/build'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN node_oracle@1.0.0 No description
npm WARN node_oracle@1.0.0 No repository field.

+ oracledb@1.13.1
added 2 packages in 11.894s

Talking to Oracle-XE from Node

After installing XE, instantclient-basic and the sdk, the full set of environment variables that made this thing work is:

export ORACLE_HOME=/usr/lib/oracle/product/11.2.0/xe
export ORACLE_SID=XE
export NLS_LANG=`$ORACLE_HOME/bin/nls_lang.sh`
export PATH=$PATH:$ORACLE_HOME/bin
export OCI_INC_DIR=/usr/include
export OCI_LIB_DIR=/usr/lib

Next up I want to configure XE (which seems to need those vars set). Below I’ll use sudo -E to ensure that all of those variables still exist when I sudo:

[mike@longshot node_oracle]$ sudo -E /etc/rc.d/oracle-xe configure
[sudo] password for mike: 

Oracle Database 11g Express Edition Configuration
-------------------------------------------------
This will configure on-boot properties of Oracle Database 11g Express 
Edition.  The following questions will determine whether the database should 
be starting upon system boot, the ports it will use, and the passwords that 
will be used for database accounts.  Press <Enter> to accept the defaults. 
Ctrl-C will abort.

Specify the HTTP port that will be used for Oracle Application Express [8080]:

Specify a port that will be used for the database listener [1521]:

Specify a password to be used for database accounts.  Note that the same
password will be used for SYS and SYSTEM.  Oracle recommends the use of 
different passwords for each database account.  This can be done after 
initial configuration:
Confirm the password:

Do you want Oracle Database 11g Express Edition to be started on boot (y/n) [y]:y

Starting Oracle Net Listener...Done
Configuring database...Done
Starting Oracle Database 11g Express Edition instance...Done
Installation completed successfully.
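
Configuration also started the instance for us. On future boots, the oracle-xe.service unit the package installed is the way to manage it; a minimal sketch with systemd:

# enable at boot and start now (--now does both in one step)
$ sudo systemctl enable --now oracle-xe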

In theory XE is configured and running, and the node-oracledb README suggests that we run one of the examples to test it. What they don’t mention is that the example relies on sample data in an “hr” account that needs to be unlocked first.

[mike@longshot oracle-xe]$ sqlplus /nolog

SQL*Plus: Release 11.2.0.2.0 Production on Fri Oct 20 14:20:30 2017

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL> connect system/yourpassword as sysdba
Connected.
SQL> ALTER USER hr ACCOUNT UNLOCK;          
User altered.
SQL> ALTER USER hr IDENTIFIED BY password;         
User altered.

The example script reads its config from a file, so I created that using the terrible password I assigned to the hr account above:

[mike@longshot node_oracle]$ cat dbconfig.js 
module.exports = {
    user: "hr",
    password: "password",
    connectString: "localhost/XE",
};
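
For reference, the select1.js example ships with the driver, and it boils down to something like this sketch (using the 1.x callback API; the real example file differs in details):

// sketch of select1.js: query the sample HR schema using dbconfig.js
var oracledb = require('oracledb');
var dbConfig = require('./dbconfig.js');

oracledb.getConnection(dbConfig, function (err, connection) {
  if (err) { console.error(err.message); return; }
  connection.execute(
    "SELECT department_id, department_name " +
    "FROM departments WHERE department_id = :did",
    [180],  // bind value for :did
    function (err, result) {
      if (err) { console.error(err.message); return; }
      console.log(result.metaData);  // column metadata
      console.log(result.rows);     // the rows themselves
      connection.release(function (err) {
        if (err) { console.error(err.message); }
      });
    });
});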

So now we should be able to run the example:

[mike@longshot node_oracle]$ node select1.js 
[ { name: 'DEPARTMENT_ID' }, { name: 'DEPARTMENT_NAME' } ]
[ [ 180, 'Construction' ] ]

What’s next

The next logical step here is to start exploring the capabilities of the Node driver. There is also the Simple-oracledb package which is suddenly sounding very interesting to me.
Hopefully this will save someone else some time.


invalid value for parameter “TimeZone”

While working on standing up a Rails app I ran into a pretty weird error that really had me scratching my head.

[mike@longshot identity-idp]$ rake db:create
PG::InvalidParameterValue: ERROR:  invalid value for parameter "TimeZone": "UTC"
: SET time zone 'UTC'
Couldn't create database for {"pool"=>5, "timeout"=>5000, "host"=>"localhost", "adapter"=>"postgresql", "encoding"=>"utf8", "database"=>"upaya_development", "port"=>5432}
rake aborted!
ActiveRecord::StatementInvalid: PG::InvalidParameterValue: ERROR:  invalid value for parameter "TimeZone": "UTC"
: SET time zone 'UTC'

PG::InvalidParameterValue: ERROR:  invalid value for parameter "TimeZone": "UTC"

Tasks: TOP => db:create
(See full trace by running task with --trace)

The output of timedatectl status looked OK, but just to be sure, I updated the timezone to EDT. No difference. When I tried rake db:migrate I got a far more instructive error:

[mike@longshot identity-idp]$ rake db:migrate
rake aborted!
ArgumentError: Invalid Timezone: UTC
/home/mike/projects/identity-idp/config/environment.rb:5:in `<top (required)>'
TZInfo::InvalidTimezoneIdentifier: Expected 44 bytes reading '/usr/share/zoneinfo/UTC', but got 0 bytes
/home/mike/projects/identity-idp/config/environment.rb:5:in `<top (required)>'
TZInfo::InvalidZoneinfoFile: Expected 44 bytes reading '/usr/share/zoneinfo/UTC', but got 0 bytes
/home/mike/projects/identity-idp/config/environment.rb:5:in `<top (required)>'
Tasks: TOP => log => environment
(See full trace by running task with --trace)

/usr/share/zoneinfo/UTC is 0 bytes? A quick look shows it to be true, and the package that supplied this file is tzdata.

[mike@longshot identity-idp]$ cat /usr/share/zoneinfo/UTC 
[mike@longshot identity-idp]$ ls -l /usr/share/zoneinfo/UTC 
-rw-r--r-- 6 root root 0 Jul 26 18:01 /usr/share/zoneinfo/UTC
[mike@longshot identity-idp]$ pacman -Qo /usr/share/zoneinfo/UTC
/usr/share/zoneinfo/UTC is owned by tzdata 2017b-1
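
Before fixing it, it’s worth checking whether UTC is the only empty file in there; something like this (GNU find) will list any others:

$ find /usr/share/zoneinfo -type f -size 0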

That doesn’t seem right, let’s reinstall…

[mike@longshot identity-idp]$ sudo pacman -S tzdata
warning: tzdata-2017b-1 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (1) tzdata-2017b-1

Total Installed Size:  1.81 MiB
Net Upgrade Size:      0.00 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring                                               [###########################################] 100%
(1/1) checking package integrity                                             [###########################################] 100%
(1/1) loading package files                                                  [###########################################] 100%
(1/1) checking for file conflicts                                            [###########################################] 100%
(1/1) checking available disk space                                          [###########################################] 100%
:: Processing package changes...
(1/1) reinstalling tzdata                                                    [###########################################] 100%
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...
[mike@longshot identity-idp]$ ls -l /usr/share/zoneinfo/UTC 
-rw-r--r-- 6 root root 127 Mar 24 12:38 /usr/share/zoneinfo/UTC
[mike@longshot identity-idp]$ cat /usr/share/zoneinfo/UTC 
TZif2UTCTZif2�UTC
UTC0

After that, creating and migrating worked again without problems. I’m not sure what happened there, but hopefully this will save people (or future me) from wasting a bunch more time on it.

Installing Ruby 2.3 on Arch Linux

I’ve been running Arch Linux for a few years now. I ran Ubuntu for 8 years before that and frequently ran into issues with old packages, which eventually spurred me to jump to Arch, where I get to deal with issues in new packages instead. “Pick your poison” as the saying goes.

Today I needed to get an app running that required Ruby 2.3.3 and, true to form, the poison of the day was all about the libraries installed on my system being too new to compile Ruby 2.3.

I’m a long time user of Rbenv. It’s nice and clean, and its ruby-build plugin makes installing new versions of Ruby as easy as rbenv install 2.3.3… which is exactly what kicked off the fun.

[mike@longshot identity-idp]$ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
*** Error in `./miniruby': malloc(): memory corruption: 0x00007637497798d8 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72bdd)[0x66e27048fbdd]
...
./miniruby(+0x2470b)[0x80e03b1670b]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x66e27043d4ca]
./miniruby(_start+0x2a)[0x80e03b1673a]
======= Memory map: ========
80e03af2000-80e03de0000 r-xp 00000000 00:27 154419
...
66e2715e7000-66e2715e8000 rw-p 00000000 00:00 0
763748f81000-763749780000 rw-p 00000000 00:00 0                          [stack]

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170828122031.16671
Results logged to /tmp/ruby-build.20170828122031.16671.log

Last 10 log lines:
generating enc.mk
creating verconf.h
./template/encdb.h.tmpl:86:in `<main>': undefined local variable or method `encidx' for main:Object (NameError)
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `eval'
	from /tmp/ruby-build.20170828122031.16671/ruby-2.3.3/lib/erb.rb:864:in `result'
	from ./tool/generic_erb.rb:38:in `<main>'
make: *** [uncommon.mk:818: encdb.h] Error 1
make: *** Waiting for unfinished jobs....
verconf.h updated
make: *** [uncommon.mk:655: enc.mk] Aborted (core dumped)

The issues here are twofold: Ruby 2.3 won’t build with GCC 7, and it won’t build with OpenSSL 1.1. Arch as it stands today has both by default.

[mike@longshot ~]$ openssl version
OpenSSL 1.1.0f  25 May 2017
[mike@longshot ~]$ gcc -v
gcc version 7.1.1 20170630 (GCC)

To solve the OpenSSL problem we need 1.0 installed (sudo pacman -S openssl-1.0, but it’s probably installed already), and we need to tell ruby-build where to find both the header files and the openssl directory itself.

Helping compilers find header files is the job of pkg-config. On Arch the config files that do that are typically in /usr/lib/pkgconfig/, but in this case we want to point to the pkg-config file in /usr/lib/openssl-1.0/pkgconfig before searching there. To do that we assign a colon-delimited set of paths to PKG_CONFIG_PATH.
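
We can sanity-check that pkg-config will resolve the 1.0 series first with that path in place:

# should report a 1.0.x version rather than 1.1.x
$ PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ pkg-config --modversion openssl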

Then we need to tell Ruby where the openssl directory is, which is done via RUBY_CONFIGURE_OPTS.

[mike@longshot ~]$ PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...

BUILD FAILED (Arch Linux using ruby-build 20170726-9-g86909bf)

Inspect or clean up the working tree at /tmp/ruby-build.20170829103308.24191
Results logged to /tmp/ruby-build.20170829103308.24191.log

Last 10 log lines:
  R8: 0x0000016363058550  R9: 0x0000016362cc3dd8 R10: 0x0000016362fafe80
 R11: 0x000000000000001b R12: 0x0000000000000031 R13: 0x0000016363059a40
 R14: 0x0000000000000000 R15: 0x00000163630599a0 EFL: 0x0000000000010202

-- C level backtrace information -------------------------------------------
linking static-library libruby-static.a
ar: `u' modifier ignored since `D' is the default (see `U')
verifying static-library libruby-static.a
make: *** [uncommon.mk:655: enc.mk] Segmentation fault (core dumped)
make: *** Waiting for unfinished jobs....

With our OpenSSL errors fixed, we now get the segfault that comes from GCC 7. So we need to install an earlier GCC (sudo pacman -S gcc5) and add two more variables (CC and CXX) to specify the C and C++ compilers we want used.

[mike@longshot ~]$ CC=gcc-5 CXX=g++-5 PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig/:/usr/lib/pkgconfig/ RUBY_CONFIGURE_OPTS=--with-openssl-dir=/usr/lib/openssl-1.0/ rbenv install 2.3.3
Downloading ruby-2.3.3.tar.bz2...
-> https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.3.tar.bz2
Installing ruby-2.3.3...
Installed ruby-2.3.3 to /home/mike/.rbenv/versions/2.3.3

With that done, you should now have a working Ruby 2.3:

[mike@longshot ~]$ rbenv global 2.3.3
[mike@longshot ~]$ ruby -e "puts 'hello world'"
hello world

Dealing with supernodes in ArangoDB

About a year ago I wrote about data modeling in ArangoDB. The main takeaway is to avoid the temptation to make unnecessary vertices (and all the attendant edges), since traversing a high-degree vertex (a vertex with lots of edges pointing at it) is an expensive process.

Graphs are cool, and it’s easy to forget that ArangoDB is a great document store. Treating it as such means “embedding”, to borrow a term from MongoDB, which lets you keep your traversals fast.

But while good data modeling can prevent you from creating some high-degree vertices, you will run into them eventually, and ArangoDB’s new “vertex-centric” indexes are a feature that is there for exactly those moments.

First we can try to get a sense of the impact these high-degree vertices have on a traversal. To do that I wrote a script that would generate a star graph starting with three “seed” vertices: a start, a middle, and an end.

The goal was to walk across the middle vertex to get from start to end while changing the number of vertices connected to the middle.

// the seed vertices
let seedVertices = [
  {name: 'start', _key: '1', type: "seed"},
  {name: 'middle', _key: '2', type: "seed"},
  {name: 'end', _key: '3', type: "seed"}
]
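
The rest of the script is little more than a loop. A sketch of the idea (assuming the vertices and edges collections already exist, and running in arangosh):

// insert the seed vertices and connect start -> middle -> end
seedVertices.forEach(v => db.vertices.insert(v))
db.edges.insert({_from: 'vertices/1', _to: 'vertices/2', type: 'seed'})
db.edges.insert({_from: 'vertices/2', _to: 'vertices/3', type: 'seed'})

// crowd the middle vertex with `count` extra neighbours
function crowdMiddle (count) {
  for (let i = 0; i < count; i++) {
    let filler = db.vertices.insert({name: `filler_${i}`, type: 'filler'})
    db.edges.insert({_from: filler._id, _to: 'vertices/2', type: 'filler'})
  }
}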

With just 10 vertices, this is nowhere near deserving the name “supernode”, but it’s pretty clear why these things are called star-graphs.

[Image: star_graph]
A baby “supernode”.

Next we crank up the number of vertices so we can see the effect on traversal time.

[Image: star_graph_traversal]

By the time we get to a vertex surrounded by 100,000 to 1,000,000 other vertices we are starting to get into full-blown supernode territory. You can see that by the time ArangoDB is sorting through a million incident edges, it takes 4.3 seconds to traverse across that middle vertex.

A “vertex-centric” index is one that includes either _to or _from plus some other edge attribute. In this case I’ve added a type attribute, which I’ll combine with _to to make my index. (Note that if I were allowing “any” as a direction I would need a second index that combines type and _from.)
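
Creating that index from arangosh is a one-liner (the screenshot below shows the same thing):

db.edges.ensureIndex({type: 'hash', fields: ['type', '_to']})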

[Image: creating_a_hash_index]

ArangoDB calculates the path and offers it to you as you do a traversal. You can access the path by declaring a variable to receive it. In the query below, p contains the path. If we use the ALL array comparison operator on p.edges to say that all the edges in the path should have a type of “seed”, that should be enough to get Arango to use our new index.

    FOR v,e,p IN 1..2 ANY 'vertices/1' edges
      FILTER p.edges[*].type ALL == 'seed' &&  v.name == 'end'
        RETURN v
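
To see whether the optimizer will actually pick the index up, arangosh’s explain helper is handy:

db._explain(`
  FOR v,e,p IN 1..2 ANY 'vertices/1' edges
    FILTER p.edges[*].type ALL == 'seed' && v.name == 'end'
      RETURN v
`)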

The selectivity score shown by explain doesn’t leave you very hopeful that Arango will use the index…

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
  2   hash   edges        false    false         0.00 %   [ `type`, `_to` ]    base INBOUND
  2   edge   edges        false    false        50.00 %   [ `_from`, `_to` ]   base OUTBOUND

but having our query execute in 0.2 milliseconds instead of 4.3 seconds is a pretty good indication it’s working.

[Image: arango_query_explained]

Back to modeling

For me, this little experiment has underscored the importance of good data modeling. You don’t have to worry about the number of edges if you don’t create a vertex in the first place. If you are conservative about what you are willing to make a vertex, and make good use of indexes, ArangoDB is going to be able to gracefully handle some pretty hefty data.

With vertex-centric indexes, and other features like the new Smartgraphs, ArangoDB has gone from being a great document database with a trick up its sleeve (joins!) to being a really solid graph database, and there are new features landing regularly. I’m curious to see where they go next.

Making requests in vanilla js with Apollo

There are lots of good reasons to be running GraphQL on the server. It’s clean, needs no ORMs or frameworks, and has some interesting security properties too. But just because you are rockin’ the new hotness on the server side doesn’t mean you want it on the client side too. Sometimes the right thing is the simplest thing that can possibly work.

The Apollo Client is a GraphQL client made by the people behind Meteor. It aims to be an advanced and capable client that plays nice with the rest of the ecosystem. It has a lot going on, and sadly doesn’t seem to spend much time advertising that it’s actually a pretty great fit for those “simplest thing that can possibly work” moments as well.

Installing it is roughly what you might expect, but you also need the graphql-tag library so you can create queries with JavaScript’s new tagged template literals.

npm install --save apollo-client graphql-tag

So here, in all its glory, is the “simplest thing that can possibly work”:

import ApolloClient from 'apollo-client'
import gql from 'graphql-tag'

const client = new ApolloClient();

let query = gql`
  query {
    foo {
      bar
    }
  }
`
client.query({query}).then((results) => {
  //do something useful
})

I think this is actually even simpler than Lokka, which bills itself as the “Simple JavaScript Client for GraphQL”.

If you need to specify your endpoint as something other than the host the js came from, then you get to add just a little extra:

import ApolloClient, { createNetworkInterface } from 'apollo-client'

const opts = {uri: 'http://example.com:8080/graphql'}
const networkInterface = createNetworkInterface(opts)
const client = new ApolloClient({
  networkInterface,
});

But simple doesn’t mean we are restricted to queries only. Mutations can be simple too:

let mutation = gql`
  mutation ($foo: [FooInput] $bar: String!) {
    addFoo(
      foo: $foo
      bar: $bar
    ){
      foo
      bar
    }
  }
`

client.mutate({mutation, variables: {foo: [1,2,3], bar: "baz"}}).then((results) => {
  //do something with result
})

Obviously you will need the server side schema to support that, but that is all that is needed on the client.

Apollo has a tonne of features and integrates nicely with Redux (it does caching with its own internal Redux store unless you want it to use yours). While simplicity doesn’t appear to be its focus, the Apollo client is certainly capable of it. You’d just never guess from the documentation. Hopefully this will make it a little easier to appreciate the simple side of Apollo.

Installing R-Studio on Ubuntu 16.10

Installing things on Linux is either really easy, or a yak shave with surprisingly little between those extremes.

It seems that Ubuntu 16.10 has removed Gstreamer 0.10 from the repos and replaced it with Gstreamer 1.0, which is great… until you need to install R-Studio.

While the R-Studio people are aiming to drop the Gstreamer dependency, for the moment, as of 16.10, installing it has fallen into the yak-shave category.

Installing R-Studio works fine, but if you try to run it (from the terminal) you will get the error:

rstudio: error while loading shared libraries: libgstreamer-0.10.so.0: cannot open shared object file: No such file or directory

We can see that it’s failing to load Gstreamer, but since it’s been removed from the Ubuntu repos fixing this will mean getting those packages elsewhere.

To start with, we can download the latest R-studio daily build and install it using dpkg:

$ wget https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-1.0.124-amd64.deb
$ sudo dpkg -i rstudio-1.0.124-amd64.deb

The dpkg command can also query the package to display information about it. If we use the uppercase I option we can confirm that this package requires exactly version 0.10 of libgstreamer:

$ dpkg -I rstudio-1.0.124-amd64.deb 
 new debian package, version 2.0.
 size 98840122 bytes: control archive=42847 bytes.
     554 bytes,    12 lines      control              
  163246 bytes,  1548 lines      md5sums              
     198 bytes,    10 lines   *  postinst             #!/bin/sh
     158 bytes,    10 lines   *  postrm               #!/bin/sh
 Package: rstudio
 Version: 1.0.124
 Section: devel
 Priority: optional
 Architecture: amd64
 Depends: libjpeg62, libedit2, libgstreamer0.10-0, libgstreamer-plugins-base0.10-0, libssl1.0.0,  libc6 (>= 2.7)
 Recommends: r-base (>= 2.11.1)
 Installed-Size: 526019
 Maintainer: RStudio <info@rstudio.com>
 Description: RStudio
  RStudio is a set of integrated tools designed to help you be more productive with R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, and workspace management.

Debian (which Ubuntu is based on) has the old Gstreamer packages we need to satisfy those dependencies, so we can get them from there. If you need something other than AMD64, see here and here. Then, if you have a 64-bit machine, you can download and install like this:

# download with wget
$ wget http://ftp.ca.debian.org/debian/pool/main/g/gstreamer0.10/libgstreamer0.10-0_0.10.36-1.5_amd64.deb
$ wget http://ftp.ca.debian.org/debian/pool/main/g/gst-plugins-base0.10/libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb

# Now install with dpkg
$ sudo dpkg -i libgstreamer0.10-0_0.10.36-1.5_amd64.deb
$ sudo dpkg -i libgstreamer-plugins-base0.10-0_0.10.36-2_amd64.deb

While that solves R’s problems, we now have one of our own. We’ve purposefully installed old packages and don’t want Ubuntu’s package manager to enthusiastically upgrade them next time we update. To resolve that problem we’ll put a hold on them with apt-mark:

$ sudo apt-mark hold libgstreamer-plugins-base0.10-0
libgstreamer-plugins-base0.10-0 set on hold.
$ sudo apt-mark hold libgstreamer0.10
libgstreamer0.10-0 set on hold.

And we can check the packages that are on hold with:

$ sudo apt-mark showhold
libgstreamer-plugins-base0.10-0
libgstreamer0.10-0
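
And if a future R-Studio build drops the Gstreamer dependency, releasing the hold is just the reverse:

$ sudo apt-mark unhold libgstreamer0.10-0 libgstreamer-plugins-base0.10-0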

Hopefully that saves someone some Googling.
Now that that’s working, it’s time to play with some R!

GraphQL and security

Imagine you have a web application that allows people to view widgets by name. Somewhere deep in your codebase, a programmer has thoughtfully updated the existing ES5 SQL injection to this stylish new ES6 SQL injection:

`select * from widgets where name = '${name}';`

Injection attacks just like this one persist in spite of the fact that a search for “sql injection tutorial” returns around 3,740,000 results on Google. They made the top of the OWASP Top 10 in 2010 and 2013, and probably will again in 2016 and for the foreseeable future. Motherboard even calls it “the hack that will never go away”.

The standard answer to this sort of problem is input sanitization. It’s standard enough that it shows up in jokes.

[Image: exploits_of_a_mom]
xkcd’s famous “Exploits of a Mom”

But when faced with such consistent failure, it’s reasonable to ask if there isn’t something systemic going on.

There is a sub-field of security research known as Language Theoretic Security (Langsec) that is asking exactly that question.

Exploitation is unexpected computation caused reliably or probabilistically by some crafted inputs. — Meredith Patterson

Dropping the students table in the comic is exactly the kind of “unexpected computation” they are talking about. To combat it, Langsec encourages programmers to consider their inputs as a language, and the sum of the ad-hoc checks on those inputs as a parser for that language.

They advocate a clean separation of concerns between the recognition and processing of inputs, and bringing a recognizer of appropriate strength to bear on the input you are parsing (i.e., not validating HTML/XML with a regex).

Where this starts intersecting with GraphQL is in what this actually implies:

This design and programming paradigm begins with a description of valid inputs to a program as a formal language (such as a grammar or a DFA). The purpose of such a disciplined specification is to cleanly separate the input-handling code and processing code.

A LangSec-compliant design properly transforms input-handling code into a recognizer for the input language; this recognizer rejects non-conforming inputs and transforms conforming inputs to structured data (such as an object or a tree structure, ready for type or value-based pattern matching).

The processing code can then access the structured data (but not the raw inputs or parsers’ temporary data artifacts) under a set of assumptions regarding the accepted inputs that are enforced by the recognizer.

If all that starts sounding kind of familiar, well it did to me too.

GraphQL

GraphQL allows you to define a formal language via its type system. As queries arrive, they are lexed, parsed, matched against the user-defined types, and formed into an Abstract Syntax Tree (AST). The contents of that AST are then made available to processing code via resolve functions.

The promise of Langsec is “software free from broad and currently dominant classes of bugs and vulnerabilities related to incorrect parsing and interpretation of messages between software components”, and GraphQL seems poised to put this within reach of every developer.

Turning back to the example we started with, how could we use GraphQL to protect against that SQL injection? Let’s get this going in a test.

require("babel-polyfill")

import expect from 'expect'
import {
  graphql,
  GraphQLSchema,
  GraphQLScalarType,
  GraphQLObjectType,
  GraphQLString,
  GraphQLNonNull
} from 'graphql'
import { GraphQLError } from 'graphql/error';
import { Kind } from 'graphql/language';

describe('SQL injection', () => {

  it('returns a SQL injected string', async () => {

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(GraphQLString)
              }
            },
            resolve: (source, {name}) => `select * from widgets where name = '${name}';`
          }
        })
      })
    })

    let query = `
      query widgetByName($name: String!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.data.widgets).toEqual("select * from widgets where name = 'foo'; drop table widgets; --'")
  })

})

GraphQL brings lots of benefits (no need for API versioning, all data in a single round trip, etc…), and while those are compelling, simply passing strings into our backend systems misses an opportunity to do something different.

Let’s create a custom type that is more specific than just a “string”: an AlphabeticString.

  it('is fixed with custom types', async () => {

    let AlphabeticString = new GraphQLScalarType({
      name: 'AlphabeticString',
      description: 'represents a string with no special characters.',
      serialize: String,
      parseValue: (value) => {
        if(value.match(/^([A-Za-z]|\s)+$/)) {
          return value
        }
        return null
      },
      parseLiteral: (ast) => {
        // ast.value holds the raw string from the query document
        if(ast.value.match(/^([A-Za-z]|\s)+$/)) {
          return ast.value
        }
        return null
      }
    });

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(AlphabeticString)
              }
            },
            resolve: (source, {name}) => {
              return `select * from widgets where name = '${name}';`
            }
          }
        })
      })
    })

    let query = `
    query widgetByName($name: AlphabeticString!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.errors).toExist()
    expect(result.errors[0].message).toInclude("got invalid value")
  })

This test now passes; the SQL string is rejected during parsing before ever reaching the resolve function.

On making custom types

There are already libraries out there that provide custom types for things like URLs, datetimes and other things, but you will definitely want to be able to make your own.

To start defining your own types you will need to understand the role the various functions play:

    let MyType = new GraphQLScalarType({
      name: 'MyType',
      description: 'this will show up in the documentation!',
      serialize: (value) => { /* ... */ },
      parseValue: (value) => { /* ... */ },
      parseLiteral: (ast) => { /* ... */ }
    });

The serialize function is the easiest to understand: GraphQL responses are serialized to JSON, and if this type needs special treatment before being included in a response, this is the place to do it.

Understanding parseValue and parseLiteral requires a quick look at two different queries:

//name is a string literal, so parseLiteral is called 
query {
  widgets(name: "Soft squishy widget")
}

//name is a value so parseValue is called
query widgetByName($name: AlphabeticString!) {
  widgets(name: $name)
}

In the first query, name is a string literal, so your type’s parseLiteral function will be called. In the second query, the name is supplied as the value of a variable, so parseValue is called.

Your type could end up being used in either of those scenarios so it’s important to do your validation in both of those functions.

The standard GraphQL types (GraphQLInt, GraphQLFloat, etc.) also implement those same functions.

Putting the whole thing together, the process then looks something like this:

A query arrives and gets tokenized, then the parser builds the AST, calling parseValue and parseLiteral as needed. When the AST is complete, execution recurses down it calling resolve, and serialize is called on each result to create the response.

Where to go from here

Langsec is a deep topic, with implications wider than just what is discussed here. While GraphQL is certainly not “Langsec in a box”, it not only seems to be making the design patterns that follow from Langsec’s insights a reality, it has a shot at making them mainstream. I’d love to see the Langsec lens turned on GraphQL and see how it can guide the evolution of the spec and the practices around it.

I would encourage you to dig into the ideas of Langsec, and the best place to start is here: