GraphQL and security

Imagine you have a web application that allows people to view widgets by name. Somewhere deep in your codebase, a programmer has thoughtfully updated the existing ES5 SQL injection to this stylish new ES6 SQL injection:

`select * from widgets where name = '${name}';`

Injection attacks just like this one persist in spite of the fact that a search for “sql injection tutorial” returns around 3,740,000 results on Google. They topped the OWASP Top 10 in 2010 and 2013 and probably will again in 2016, and likely for the foreseeable future. Motherboard even calls it “the hack that will never go away”.

The standard answer to this sort of problem is input sanitization. It’s standard enough that it shows up in jokes.

xkcd’s famous “Exploits of a Mom”

But when faced with such consistent failure, it’s reasonable to ask if there isn’t something systemic going on.

There is a sub-field of security research known as Language Theoretic Security (Langsec) that is asking exactly that question.

Exploitation is unexpected computation caused reliably or probabilistically by some crafted inputs. — Meredith Patterson

Dropping the students table in the comic is exactly the kind of “unexpected computation” they are talking about. To combat it, Langsec encourages programmers to consider their inputs as a language, and the sum of the ad hoc checks on those inputs as a parser for that language.

They advocate a clean separation of concerns between the recognition and processing of inputs, and bringing a recognizer of appropriate strength to bear on the input you are parsing (i.e. not validating HTML/XML with a regex).
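To make that separation concrete, here is a minimal sketch of the pattern in JavaScript. This is my own illustration rather than anything from the Langsec papers, and recognizeWidgetName is a hypothetical recognizer for a toy language of alphabetic widget names:

// A recognizer for the "language" of alphabetic widget names: it rejects
// non-conforming inputs and turns conforming inputs into structured data.
function recognizeWidgetName(input) {
  if (typeof input === 'string' && /^[A-Za-z ]+$/.test(input)) {
    return { name: input } // structured, validated data
  }
  throw new Error('Unrecognized input: not a valid widget name')
}

// Processing code only ever sees the recognizer's structured output,
// never the raw input.
function findWidget(rawInput) {
  const { name } = recognizeWidgetName(rawInput)
  return `widget named ${name}` // stand-in for the real processing
}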

Where this starts intersecting with GraphQL is in what this actually implies:

This design and programming paradigm begins with a description of valid inputs to a program as a formal language (such as a grammar or a DFA). The purpose of such a disciplined specification is to cleanly separate the input-handling code and processing code.

A LangSec-compliant design properly transforms input-handling code into a recognizer for the input language; this recognizer rejects non-conforming inputs and transforms conforming inputs to structured data (such as an object or a tree structure, ready for type or value-based pattern matching).

The processing code can then access the structured data (but not the raw inputs or parsers’ temporary data artifacts) under a set of assumptions regarding the accepted inputs that are enforced by the recognizer.

If all that starts sounding kind of familiar, well, it did to me too.

GraphQL

GraphQL allows you to define a formal language via its type system. As queries arrive, they are lexed, parsed, matched against the user-defined types and formed into an Abstract Syntax Tree (AST). The contents of that AST are then made available to processing code via resolve functions.

The promise of Langsec is “software free from broad and currently dominant classes of bugs and vulnerabilities related to incorrect parsing and interpretation of messages between software components”, and GraphQL seems poised to put this within reach of every developer.

Turning back to the example we started with, how could we use GraphQL to protect against that SQL injection? Let’s get this going in a test.

require("babel-polyfill")

import expect from 'expect'
import {
  graphql,
  GraphQLSchema,
  GraphQLScalarType,
  GraphQLObjectType,
  GraphQLString,
  GraphQLNonNull
} from 'graphql'
import { GraphQLError } from 'graphql/error';
import { Kind } from 'graphql/language';

describe('SQL injection', () => {

  it('returns a SQL injected string', async () => {

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(GraphQLString)
              }
            },
            resolve: (source, {name}) => `select * from widgets where name = '${name}';`
          }
        })
      })
    })

    let query = `
      query widgetByName($name: String!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.data.widgets).toEqual("select * from widgets where name = 'foo'; drop table widgets; --'")
  })

})

GraphQL brings lots of benefits (no need for API versioning, all data in a single round trip, etc…), and while those are compelling, simply passing strings into our backend systems misses an opportunity to do something different.

Let’s create a custom type that is more specific than just a “string”; an AlphabeticString.

  it('is fixed with custom types', async () => {

    let AlphabeticString = new GraphQLScalarType({
      name: 'AlphabeticString',
      description: 'represents a string with no special characters.',
      serialize: String,
      parseValue: (value) => {
        if(value.match(/^([A-Za-z]|\s)+$/)) {
          return value
        }
        return null
      },
      parseLiteral: (ast) => {
        if(ast.value.match(/^([A-Za-z]|\s)+$/)) {
          return ast.value
        }
        return null
      }
    });

    let schema = new GraphQLSchema({
      query: new GraphQLObjectType({
        name: 'Query',
        fields: () => ({
          widgets: {
            type: GraphQLString,
            args: {
              name: {
                description: 'The name of the widget',
                type: new GraphQLNonNull(AlphabeticString)
              }
            },
            resolve: (source, {name}) => {
              return `select * from widgets where name = '${name}';`
            }
          }
        })
      })
    })

    let query = `
      query widgetByName($name: AlphabeticString!) {
        widgets(name: $name)
      }
    `

    //args: schema, query, rootValue, contextValue, variables
    let result = await graphql(schema, query, null, null, {name: "foo'; drop table widgets; --"})
    expect(result.errors).toExist()
    expect(result.errors[0].message).toInclude("got invalid value")
  })

This test now passes; the SQL string is rejected during parsing before ever reaching the resolve function.

On making custom types

There are already libraries out there that provide custom types for things like URLs and datetimes, but you will definitely want to be able to make your own.

To start defining your own types you will need to understand the role the various functions play:

    let MyType = new GraphQLScalarType({
      name: 'MyType',
      description: 'this will show up in the documentation!',
      serialize: (value) => { /* ... */ },
      parseValue: (value) => { /* ... */ },
      parseLiteral: (ast) => { /* ... */ }
    });

The serialize function is the easiest to understand: GraphQL responses are serialized to JSON, and if this type needs special treatment before being included in a response, this is the place to do it.
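For example, a hypothetical ISODate scalar (a sketch of mine, not a type that ships with graphql-js) might work with Date objects internally but serialize them to ISO 8601 strings for the JSON response:

let ISODate = new GraphQLScalarType({
  name: 'ISODate',
  description: 'A date, serialized as an ISO 8601 string.',
  // called on values returned from resolve functions, just before
  // they are included in the JSON response
  serialize: (value) => value.toISOString(),
  parseValue: (value) => new Date(value),
  // a production version would also check ast.kind here
  parseLiteral: (ast) => new Date(ast.value)
})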

Understanding parseValue and parseLiteral requires a quick look at two different queries:

# name is a string literal, so parseLiteral is called
query {
  widgets(name: "Soft squishy widget")
}

# name is supplied as a variable, so parseValue is called
query widgetByName($name: AlphabeticString!) {
  widgets(name: $name)
}

In the first query, name is a string literal, so your type’s parseLiteral function will be called. In the second query, the name is supplied as the value of a variable, so parseValue is called.

Your type could end up being used in either of those scenarios, so it’s important to do your validation in both of those functions.

The standard GraphQL types (GraphQLInt, GraphQLFloat, etc.) also implement those same functions.
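The test file above imports Kind from graphql/language, and one use for it is a more defensive parseLiteral that checks which kind of literal it was handed before inspecting its value. A sketch (mine) of what that might look like for AlphabeticString:

parseLiteral: (ast) => {
  // only a string literal can possibly be an alphabetic string;
  // reject ints, floats, enums and the rest outright
  if (ast.kind !== Kind.STRING) {
    return null
  }
  return /^([A-Za-z]|\s)+$/.test(ast.value) ? ast.value : null
}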

Putting the whole thing together, the process then looks something like this:

A query arrives and gets tokenized, then the parser builds the AST. As the query is validated and executed, parseLiteral is called for inline literals and parseValue for variable values. Execution then walks the AST calling resolve for each field, and serialize is called on the resolved values to create the response.
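If you want to watch that order of operations yourself, one quick way (my own debugging sketch, not anything official) is a scalar that logs every call:

let NoisyString = new GraphQLScalarType({
  name: 'NoisyString',
  serialize: (value) => { console.log('serialize:', value); return String(value) },
  parseValue: (value) => { console.log('parseValue:', value); return value },
  parseLiteral: (ast) => { console.log('parseLiteral:', ast.value); return ast.value }
})

Running the variable-based query from earlier against a schema using this type prints parseValue before the resolve function runs, and serialize after it returns.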

Where to go from here

Langsec is a deep topic, with implications wider than just what is discussed here. While GraphQL is certainly not “Langsec in a box”, it not only seems to be making the design patterns that follow from Langsec’s insights a reality, it has a shot at making them mainstream. I’d love to see the Langsec lens turned on GraphQL and see how it can guide the evolution of the spec and the practices around it.

I would encourage you to dig into the ideas of Langsec, and the best place to start is here:

That stonewalling thing

There is a meme in the current crypto “debate” that makes me cringe whenever I read it: the idea of “stonewalling”. It’s come up in the Apple vs FBI case as Forbes, the LA Times, Jacobin magazine and others all described Apple as “stonewalling” the FBI.

Wired’s recent WhatsApp story mentioned that “WhatsApp is, in practice, stonewalling the federal government”, and while Foreign Policy magazine avoided the word, they captured the essence when they described WhatsApp as “a service willing to adopt technological solutions to prevent compliance with many types of court orders”.

All of these articles make it sound like Apple/WhatsApp has the data but is unwilling to give it to the government.

�
  !��s�����|Ǧ�2}|q�h�J�,�^��=&/
                                    _,e�r%����/D@�1f��"�
                                                                ]�?c�,��y�l?��3�lF�'���ǘ��IA��O�Y�i�����ё�R��`�[�]�H���P�1'��������S����~tF\�������^��f@��<P�g�	!X���6eh�U�rN���=d@܉eQe���B�lk����\ҠcE��
�$�d&���_xor�s�-���l,v���44�E����n�[���1YL�o�ޜ�g�m�����Tx�f	܁�����å+e�LR�E1���ޅx
                                                                                              �a*�Զ\l�ϫ)4&���or�-�4���C���q��|-2[͘7 ��
��0�ǹ����+�5b!�wV����������3\n�꨻�R�,Ĝ�

\F����P�IJ<Ը$�`Q/���D�w��̣���v"|��z�g/I��@!�(�z������]ɹ3}+f1�
                                                                  ju��vw�y~#7�w��K������M\g�.uW�i
                                                                                                    TYc���I@�s�;�/��
                                                                                                                        �����s�c�ݮ���C�
                                                                                                                                         �6~�e

Blobs of encrypted text like the one above are useless to anyone but the holder of the decryption key. Where a company holds the decryption key and refuses to give it up, it seems reasonable to call that “stonewalling”.

Without the decryption key, you may be in possession of such a blob but you can’t meaningfully be described as “having” the data within it. Calls of “stonewalling” in cases like that are either grandstanding or reveal an opinion-disqualifying level of ignorance.

These accusations of stonewalling obscure what I think is the real appeal of encryption and tools such as Tor: it’s not that these technologies prevent compliance, it’s that companies can prevent the collection of certain types of data in the first place.

The authors of a recent paper called “Cryptopolitik and the Darknet” did exactly that when they crawled the darknet for data:

“In order to avoid illegal material, such as media files of child pornography or publications by terrorist organisations, only textual content was harvested automatically. Any other material was either filtered out or immediately discarded.”

Nobody would think to accuse them of stonewalling or adopting “technological solutions to prevent compliance” for finding a way to do their darknet crawl without accumulating a bunch of data that is going to bring with it complicated law enforcement dealings.

When WhatsApp wants to “avoid illegal material” while still remaining in the messaging business, they do it with end-to-end encryption.

Why end-to-end? In end-to-end encryption, the end users hold the decryption keys. Companies who decide to keep the keys themselves become a target of every spy agency on the planet and run the risk of becoming the next Gemalto.

That technologies and architectural choices exist which allow companies to filter the data they are exposed to, and therefore their level of legal liability and obligation, feels new. Or maybe what’s new is companies’ willingness to actually implement them.

No one is interested in obstructing investigations, just in managing risk and legal exposure. “If you collect it, they will come” is becoming a common phrase among programmers and security people. For companies who don’t want to end up holding data related to a crime, painting a giant target on themselves, dedicating resources to servicing government requests, or having awkward public relations moments, end-to-end encryption starts to look like good risk management. Doubly so when you are dealing with multiple governments.

In that context, governments pushing back against end-to-end encryption seem to be working from an existing idea that companies are somehow obligated to collect data on behalf of the government, and that using encryption to limit your collection is not OK. This is visible in the issue of the government conscripting companies to do its work, raised by the FBI’s recent use of the 1789 All Writs Act to try to force Apple to build software to hack its own phone.

With many years of enthusiastic support from companies like AT&T, it’s easy to see where that idea might have come from. As the American government oscillates between attacking tech companies and asking them to do its job, and authoritarian governments and international customers look on, it’s not hard to see why many tech companies are far less enthusiastic about facilitating government aims. So far “stonewalling” seems to be a deliberately provocative framing for the “we’ve stopped collecting that data, leave us out of this” reality that end-to-end encryption creates.

Seeing that kind of inflammatory rhetoric from the FBI or Congress is one thing, but its widespread use by journalists is very disconcerting.

As cries of “stonewalling” turn into accusations of tech companies thinking they are “above the law”, and now draft anti-encryption legislation, it’s probably good to remember that blob of encrypted text. It’s not that these companies are getting in the way of the FBI getting data; they are trying to get themselves out of the way by removing their own access to it.

Of all people, former NSA director Michael Hayden recently observed “America is simply more secure with unbreakable end-to-end encryption”. I never thought I would be hoping more people would listen to him.

gpg and signing your own Arch Linux packages

One of the first things I wanted to install on my system after switching to Arch was ArangoDB. Sadly it wasn’t in the official repos. I’ve had mixed success installing things from the AUR, and the Arango packages there didn’t improve my ratio.

Using those packages as a starting point, I did a little tinkering and got it all working the way I like. I’ve been following the work being done on reproducible builds and why it’s needed, and with all that going on it seems anyone dabbling in making packages should at the very least be signing them. With that as my baseline, I figured I might as well start with mine.

Of course package signing involves learning about gpg/pgp, whose “ease of use” is legendary.

Before we get to package signing, a little about gpg.

gpg --list-keys
gpg -k
gpg --list-public-keys

All of these commands list the contents of ~/.gnupg/pubring.gpg.

gpg --list-secret-keys
gpg -K

Both list all keys from ~/.gnupg/secring.gpg.

The pacman package manager also has its own gpg databases which you can explore with:

gpg --homedir /etc/pacman.d/gnupg --list-keys

So the task at hand is getting my public key into the list of public keys that pacman trusts. To do that we will need to do more than just list keys; we need to reference them individually. gpg has a few ways to do that, by passing an argument to one of our list-keys commands above. I’ll do a quick search through the list of keys that pacman trusts:

mike@longshot:~/projects/arangodb_pkg☺ gpg --homedir /etc/pacman.d/gnupg -k pierre
pub   rsa3072/6AC6A4C2 2011-11-18 [SC]
uid         [  full  ] Pierre Schmitz (Arch Linux Master Key) <pierre@master-key.archlinux.org>
sub   rsa1024/86872C2F 2011-11-18 [E]
sub   rsa3072/1B516B59 2011-11-18 [A]

pub   rsa2048/9741E8AC 2011-04-10 [SC]
uid         [  full  ] Pierre Schmitz <pierre@archlinux.de>
sub   rsa2048/54211796 2011-04-10 [E]

mike@longshot:~/projects/arangodb_pkg☺  gpg --homedir /etc/pacman.d/gnupg --list-keys stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
uid         [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/FDC576A9 2011-10-30 [E]

If you look at the output there you can see what is called an OpenPGP short key ID. We can use those to refer to individual keys, but we can also use long IDs and fingerprints:

gpg --homedir /etc/pacman.d/gnupg -k --keyid-format long stephane
pub   rsa2048/EA6836E1AB441196 2011-10-30 [SC]
uid                 [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/4ABE673EFDC576A9 2011-10-30 [E]


gpg --homedir /etc/pacman.d/gnupg -k --fingerprint stephane
pub   rsa2048/AB441196 2011-10-30 [SC]
      Key fingerprint = 0B20 CA19 31F5 DA3A 70D0  F8D2 EA68 36E1 AB44 1196
uid         [ unknown] Stéphane Gaudreault <stephane@archlinux.org>
sub   rsa2048/FDC576A9 2011-10-30 [E]

So we can identify Stéphane’s specific key using either the short ID, the long ID or the fingerprint:

gpg --homedir /etc/pacman.d/gnupg -k AB441196
gpg --homedir /etc/pacman.d/gnupg -k EA6836E1AB441196
gpg --homedir /etc/pacman.d/gnupg -k 0B20CA1931F5DA3A70D0F8D2EA6836E1AB441196

Armed with a way to identify the key I want pacman to trust, I need to do the transfer. Though not initially obvious, gpg can push and pull keys from designated key servers. The file at ~/.gnupg/gpg.conf tells me that my keyserver is keys.gnupg.net, while pacman’s file at /etc/pacman.d/gnupg/gpg.conf says it is using pool.sks-keyservers.net.

Using my key’s long id I’ll push it to my default keyserver and tell pacman to pull it and then sign it.

#send my key
gpg --send-key F77636AC51B71B99
#tell pacman to pull that key from my keyserver
sudo pacman-key --keyserver keys.gnupg.net -r F77636AC51B71B99
#sign the key it received and start trusting it
sudo pacman-key --lsign-key F77636AC51B71B99

With all that done, I should be able to sign my package with:

makepkg --sign --key F77636AC51B71B99

We can also shorten that by setting the default-key option in ~/.gnupg/gpg.conf.

# If you have more than 1 secret key in your keyring, you may want to
# uncomment the following option and set your preferred keyid.

default-key F77636AC51B71B99

With my default key set, I’m able to make and install the package with this:

mike@longshot:~/projects/arangodb_pkg☺ makepkg --sign
mike@longshot:~/projects/arangodb_pkg☺ sudo pacman -U arangodb-2.8.1-1-x86_64.pkg.tar.xz
loading packages...
resolving dependencies...
looking for conflicting packages...

Packages (1) arangodb-2.8.1-1

Total Installed Size:  146.92 MiB
Net Upgrade Size:        0.02 MiB

:: Proceed with installation? [Y/n] y
(1/1) checking keys in keyring
[##################################################] 100%
(1/1) checking package integrity
[##################################################] 100%

The ease with which I can make my own packages is a very appealing part of Arch Linux for me. Signing them was the next logical step and I’m looking forward to exploring some related topics like running my own repo, digging deeper into GPG, the Web of Trust, and reproducible builds. It’s all fun stuff, if you can only find the time.

Private communications with Signal

The internet is a global network. You would think that the fact that every message sent over it passes through many legal jurisdictions would make the need for encryption obvious and uncontroversial. Sadly, that is not the case.

The need for something more than legal safeguards becomes obvious when you see a request from a Toronto home to toronto.com (whose server is in Toronto!) leave Canadian legal jurisdiction, passing through both New York and Chicago before finally reaching its Toronto destination.

An example of a boomerang route from the ixmaps.ca project. About 25% of traffic with both a start and end point in Canada is routed this way.

Applications that deliver technical safeguards, like end-to-end encryption, offer that “something more” that protects my data beyond the border. One of these applications is Signal, a project of Open Whisper Systems.

In an offline world, privacy was the default, a product of things you didn’t say or do, and probably also a byproduct of how hard it was to move information around. As things like chatting with friends and family or reading a newspaper moved online, those activities suddenly involved sending data in plain text over public infrastructure. Privacy became something that existed only for those who found a way to avoid the default of sending plain text. Privacy became a product of action rather than inaction.

Signal and its predecessor projects TextSecure and RedPhone are part of an effort to make privacy the default again by rolling high-end encryption into apps polished for mainstream adoption.


Signal does two main things: sending text messages and making calls. What Signal actually does for secure communications is very different from what it does for insecure ones and is worth understanding.

Text messages

When sending a text message to someone who does not have Signal, the application sends a standard SMS message. The details of what constitutes an SMS message were hashed out in 1988, long before security was a thing. Consequently, a related specification notes that “SMS messages are transported without any provisions for privacy or integrity”, but importantly, they are transported over the telephone network.

When sending secure text messages, Signal uses your mobile data to send the message using the TextSecure protocol v2.

The distinction between the two is worth making, since coverage for data vs telephone can vary, as can the costs, say if you are travelling and turn off mobile data.

The first time you send or receive a secure message with someone, behind the scenes you exchange cryptographic keys. Everything Signal does is focused on ensuring secure communication with the holder of that key. If the key for one of your contacts changes, it should be considered an event worth a quick phone call. This can happen innocently enough, say if they uninstall and then reinstall the app, but since all the other security measures are built on that key, it’s worth asking about.

After the first text message has been sent or received, Signal uses those keys to generate new keys for each individual message (described in detail here). This ensures that even if one message were to be decrypted, every other message would still be safe.
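The real TextSecure v2 ratchet is considerably more involved (it also mixes in fresh Diffie-Hellman exchanges), but the core idea of deriving a fresh key per message from a running chain can be sketched in a few lines of Node.js. This is a toy illustration of the concept only, not the actual protocol:

const crypto = require('crypto')

// Toy hash ratchet: derive a per-message key from the current chain key,
// then advance the chain. Learning one message key reveals neither past
// nor future message keys.
function ratchet(chainKey) {
  const messageKey = crypto.createHmac('sha256', chainKey).update('message').digest()
  const nextChainKey = crypto.createHmac('sha256', chainKey).update('chain').digest()
  return { messageKey, nextChainKey }
}

let chainKey = crypto.randomBytes(32) // stands in for the shared secret from the key exchange
for (let i = 0; i < 3; i++) {
  const { messageKey, nextChainKey } = ratchet(chainKey)
  console.log(`key for message ${i}:`, messageKey.toString('hex'))
  chainKey = nextChainKey
}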

Calling

Calling follows a similar pattern: for insecure calls Signal simply launches your phone’s standard phone app, while encrypted calls it handles itself. Like secure text messages, encrypted calls also use your mobile data rather than routing through the phone network.

Secure calls are placed using the ZRTP protocol, the details of which are hidden from the user with one important exception: when you make a secure call, you will notice two words displayed on screen. These words are part of the ZRTP protocol and are generated from the key that both parties used to encrypt the call.


Both parties should see the same two words. If you say one and ask your contact to read the other and they don’t match up, the keys you have agreed upon are not the same. If the keys are not the same, it suggests someone has tampered with the connection information in flight and inserted themselves into your conversation.

Verifying keys

Part of the whole key exchange process that allows users to identify each other involves pulling your contact’s public key from a central key directory server. The use of a central server means that I now have to trust that server not to maliciously give me a key for someone else. Open Whisper Systems’ Trevor Perrin addressed the problem of trusting unauthenticated keys in a talk at the NSEC security conference. It’s just a few minutes, but it’s an interesting insight into the balancing act involved in bringing private communications to the masses:

For the interested or the paranoid, Signal lets you verify a contact’s key by touching your contact’s name/number at the top of your conversation. This brings up the details for that contact, which includes a “Verify identity” option. With that, and your own identity details, found under Settings (three vertical dots on the top right of the main screen) > “My Identity Key”, you are able to either read a key fingerprint or, if you have a QR/barcode scanner, use that to verify your contact’s key.


Open Source

Establishing that there is no secret behaviour or hidden flaws somewhere in the code is critical in this world where we put a significant amount of trust in computers (and where the definition of computer is expanding to encompass everything from voting machines to Volkswagens).

Signal establishes trust by developing the code in the open so that it can be reviewed (like this review of Signal’s predecessor RedPhone by cryptographer Matthew Green). Former Google security researcher Morgan Marquis-Boire has endorsed Signal, as has Edward Snowden.

But even if you believe the experts that Signal works as advertised, it’s common for “free” apps to seem significantly less “free” once you realize what they do to turn a profit. With that in mind, another component of trust is understanding the business model behind the products you use.

When asked about the business model on a popular tech forum, Open Whisper Systems founder Moxie Marlinspike explained “in general, Open Whisper Systems is a project rather than a company, and the project’s objective is not financial profit.”

The project is funded by a combination of grants and donations from the Freedom of the Press Foundation and the Shuttleworth Foundation, among others. It is worked on by a core group of contributors and a supporting cast of volunteers.

Final thoughts

Signal does a great job of making encrypting your communications a non-event. Encrypted as they travel the network, our messages are now secure against tampering and interception, no matter whose servers/routers they pass through. The result: privacy.

The fact that applying security measures results in privacy should tell you that the oft-quoted choice between “security vs privacy” is a false one. As pointed out by Timothy Mitchener-Nissen, assuming a balance between these things only results in sacrificing increments of privacy in pursuit of the unachievable idea of total security, ultimately reducing privacy to zero. Signal is just one way to grab back one of those increments of privacy.

All of that said, my interest in privacy technologies and encryption is that of a technologist. If you are counting on technologies like Signal to protect you from anyone serious (like a nation-state), the information above is barely a beginning. I would suggest reading these best practices for Tor and the grugq’s article on signals intelligence. Actually, anything and everything by the grugq.