GraphQL i18n

One of the things I love about GraphQL is that it is “self documenting”, but of course, here in Canada the obvious question that follows is “in both languages?”. Since GraphQL is one of my core technologies, I really wanted a decent answer for when people ask about i18n, but it’s never been clear to me how to handle it.

Since I’m familiar with GraphQL-js, I’ll be using that here, but Apollo Server is on my list to explore and may have different answers. I’ll also be using my favourite i18n library, Lingui, though any other i18n library could easily be substituted.

Just to be clear, the issue at hand is not i18n for the data (you’d handle that with something like ArangoDB’s TRANSLATE); it’s i18n for the description attributes that can be attached to all your schema objects.
Once description strings are added, anyone (and “anyone” could and probably will include developer tools, IDEs or developers themselves) can introspect on the schema by querying the __schema or __type fields and find the descriptions with a query like this:

{
  __schema {
    queryType {
      fields {
        name
        description
        args {
          name
          description
        }
      }
    }
  }
}

To help think this through, let’s create a basic schema with a single field that returns the current time as a DateTime type, and use it to explore i18n.

const { GraphQLObjectType, GraphQLString } = require('graphql')

const DateTime = new GraphQLObjectType({
  name: 'DateTime',
  description: 'An example date/time object.',
  fields: () => ({
    date: {
      description: 'The current date in DD/MM/YYYY format.',
      type: GraphQLString,
    },
    time: {
      description: 'The current time in HH:MM:SS AM/PM format.',
      type: GraphQLString,
    },
  }),
})

const query = new GraphQLObjectType({
  name: 'Query',
  fields: {
    now: {
      description: 'Returns current time and date values.',
      type: DateTime, // what the resolve function will produce
      resolve: (root, args, context) => {
        let now = new Date()
        let date = now.toLocaleDateString(context.language, {
          timeZone: 'America/Toronto',
        })
        let time = now.toLocaleTimeString(context.language, {
          timeZone: 'America/Toronto',
        })
        return { date, time }
      },
    },
  },
})

With our query type and its return type created, we just need to wrap them in a schema and pass it to the express-graphql middleware. It will mount the schema on the URL we specify and pass the request object to all our resolvers as the third argument (aka “the context”).

Adding the requestLanguage middleware from the express-request-language library before express-graphql means that incoming requests will be checked for the Accept-Language header, and the best match among the languages you specify will be attached to the request as request.language. Remember that express-graphql passes the request to our resolvers as the context, which means we can access request.language as context.language.

const schema = new GraphQLSchema({ query })

let server = express()

server
  .use(
    requestLanguage({
      languages: ['en', 'fr'], // First locale becomes the default
    }),
  )
  .use('/graphql', graphqlHTTP({ schema }))

server.listen(3000)

With this basic setup, we can already see the outline of the issue: our schema and types with their accompanying descriptions are defined once when the script is run, but the language we want is whatever is appropriate for each request.

It probably won’t surprise you that express-graphql can also be passed a function, which it will execute per request to produce the configuration it needs. With that in place we have a pretty clear path forward.

server
  .use(
    requestLanguage({
      languages: ['en', 'fr'], // First locale becomes the default
    }),
  )
  .use(
    '/graphql',
    graphqlHTTP((request, response, graphQLParams) => {
      return {
        schema: new GraphQLSchema({
          query: // define a schema and types using the request language and pass it in
        }),
      }
    }),
  )

My first step was to define a function that receives an i18n object and builds our query type (which we’ll wrap in a schema for each request).

const createSchema = i18n => {
  // Define a type that describes the data
  const DateTime = new GraphQLObjectType({
    name: 'DateTime',
    description: i18n.t`An example date/time object.`,
    fields: () => ({
      date: {
        description: i18n.t`The current date in DD/MM/YYYY format.`,
        type: GraphQLString,
      },
      time: {
        description: i18n.t`The current time in HH:MM:SS AM/PM format.`,
        type: GraphQLString,
      },
    }),
  })

  const query = new GraphQLObjectType({
    name: 'Query',
    fields: {
      now: {
        description: i18n.t`Returns current time and date values.`,
        type: DateTime, // what the resolve function will produce
        resolve: (root, args, context) => {
          let now = new Date()
          let date = now.toLocaleDateString(context.language, {
            timeZone: 'America/Toronto',
          })
          let time = now.toLocaleTimeString(context.language, {
            timeZone: 'America/Toronto',
          })
          return { date, time }
        },
      },
    },
  })

  return query
}

With that defined I can use it to produce a schema, but because Lingui works by scanning my code for things like i18n.t`...` and extracting them, I have to remember not to rename that function, otherwise lingui extract won’t find my translations. Additionally, I don’t want variable shadowing, so I import i18n under a different name and rename it to what Lingui expects only when I go to create the schema:

const express = require('express')
const graphqlHTTP = require('express-graphql')
const { GraphQLSchema, GraphQLObjectType, GraphQLString } = require('graphql')
const { i18n: internationalization, unpackCatalog } = require('lingui-i18n') // import i18n as something else
const requestLanguage = require('express-request-language')

internationalization.load({  // Load our language files
  fr: unpackCatalog(require('./locale/fr/messages.js')),
  en: unpackCatalog(require('./locale/en/messages.js')),
})

// Our function that creates the schema
const createSchema = i18n => {
...
}

let server = express()

server
  .use(
    requestLanguage({
      languages: internationalization.availableLanguages.sort(), // First locale becomes the default
    }),
  )
  .use(
    '/graphql',
    graphqlHTTP(async (request, response, graphQLParams) => {
      internationalization.activate(request.language)
      return {
        schema: new GraphQLSchema({
          query: createSchema(internationalization),
        }),
      }
    }),
  )

server.listen(3000)

Getting Lingui properly integrated took a little fiddling. It brings along its own copy of Babel, which doesn’t seem to see your other Babel plugins but does read your .babelrc.
Configuring my project’s Babel using only command line options solved the clashes with Lingui’s Babel, and making sure Lingui would only look for translations in the src folder was all I needed to get Lingui working alongside the usual “transpile to dist” workflow (the finished code is available here).
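
Concretely, that boils down to a build script driven entirely by CLI flags plus a “lingui” section in package.json that points only at src. This is a from-memory sketch rather than the exact finished config (the key names are the ones lingui-cli used at the time, and it assumes babel-cli and babel-preset-env are installed), so treat it as illustrative:

// package.json (excerpt): illustrative sketch, not the exact finished config
{
  "scripts": {
    "build": "babel src --out-dir dist --presets env",
    "extract": "lingui extract",
    "compile": "lingui compile"
  },
  "lingui": {
    "localeDir": "./locale",
    "srcPathDirs": ["<rootDir>/src"]
  }
}
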
After running lingui extract and doing my translations, I’m now able to hit my endpoint and see the translated descriptions:

# Ask for English!
mike@sleepycat:~$ curl -H "Accept-Language: en" -H "Content-Type: application/graphql" -d "{ __schema {queryType { fields { name description } } } }" "localhost:3000/graphql"
{"data":{"__schema":{"queryType":{"fields":[{"name":"now","description":"Returns current time and date values."}]}}}}
# Ask for French!
mike@sleepycat:~$ curl -H "Accept-Language: fr" -H "Content-Type: application/graphql" -d "{ __schema {queryType { fields { name description } } } }" "localhost:3000/graphql"
{"data":{"__schema":{"queryType":{"fields":[{"name":"now","description":"Renvoie les valeurs actuelles de date et d'heure."}]}}}}

Performance

Obviously defining the schema and types before processing each request is going to cost something, but it would be good to know how much. Some load testing with wrk2 can give us insight into that (with the caveat that both the server and the load testing program were running on the same laptop, so take these numbers with a large grain of salt).

First, a version without the schema per request:

mike@sleepycat:~$ wrk2 -t4 -c100 -d30s -R2000 --latency "http://localhost:3000/graphql?query=%7B%0A%20%20now%20%7B%0A%20%20%20%20date%0A%20%20%20%20time%0A%20%20%7D%0A%7D"
Running 30s test @ http://localhost:3000/graphql?query=%7B%0A%20%20now%20%7B%0A%20%20%20%20date%0A%20%20%20%20time%0A%20%20%7D%0A%7D
  4 threads and 100 connections
  Thread calibration: mean lat.: 3309.908ms, rate sampling interval: 11272ms
  Thread calibration: mean lat.: 3310.066ms, rate sampling interval: 11280ms
  Thread calibration: mean lat.: 3308.845ms, rate sampling interval: 11280ms
  Thread calibration: mean lat.: 3310.191ms, rate sampling interval: 11280ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.83s     3.26s   17.47s    57.94%
    Req/Sec   214.00      0.00   214.00    100.00%

And now the one with our i18n schema per request:

mike@sleepycat:~$ wrk2 -t4 -c100 -d30s -R2000 --latency "http://localhost:3000/graphql?query=%7B%0A%20%20now%20%7B%0A%20%20%20%20date%0A%20%20%20%20time%0A%20%20%7D%0A%7D"
Running 30s test @ http://localhost:3000/graphql?query=%7B%0A%20%20now%20%7B%0A%20%20%20%20date%0A%20%20%20%20time%0A%20%20%7D%0A%7D
  4 threads and 100 connections
  Thread calibration: mean lat.: 3196.913ms, rate sampling interval: 11296ms
  Thread calibration: mean lat.: 3196.537ms, rate sampling interval: 11288ms
  Thread calibration: mean lat.: 3115.362ms, rate sampling interval: 10493ms
  Thread calibration: mean lat.: 3196.062ms, rate sampling interval: 11288ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.91s     3.35s   17.81s    57.65%
    Req/Sec   207.00      0.00   207.00    100.00%

So generating our schema/types per request drops us from 214 requests per second down to 207. Clearly it’s not free, but in this little example it’s pretty reasonable, and in this world of microservices there are a fair number of services that aren’t much bigger than this example. That said, a ~3% drop for something so simple is something you would want to watch carefully; a larger schema with more types and imports might well be far more costly. Clearly this little performance test is far from rigorous, but it’s nice to have some vague sense of the impact.
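
The most obvious way to claw that back (which I haven’t measured) would be to pay the cost once per language instead of once per request: build a schema for each available language at startup and pick the right one inside the graphqlHTTP callback. A rough sketch, reusing the createSchema function and internationalization object from above:

// Sketch only: one schema per language, built up front and reused on every request.
const schemas = {}
for (const language of internationalization.availableLanguages) {
  internationalization.activate(language)
  const schema = new GraphQLSchema({ query: createSchema(internationalization) })
  // Force any lazy `fields: () => ({...})` thunks to run now, while this
  // language is still the active one.
  Object.values(schema.getTypeMap()).forEach(type => {
    if (typeof type.getFields === 'function') type.getFields()
  })
  schemas[language] = schema
}

server.use(
  '/graphql',
  graphqlHTTP(request => ({ schema: schemas[request.language] })),
)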

i18n and GraphQL

In the end this blog post is as much a question as it is an answer. I’m pretty certain there are further refinements to be made here and that there are ways to avoid some or maybe all of the performance penalty highlighted above.
I’m also pretty curious about what it would look like to get a similar thing happening with Apollo Server.
Hopefully this will help other people who are trying to do i18n with GraphQL and maybe surface better options.


Javascript i18n with Lingui

Living in a country with two official languages means that you don’t get far into a project before the question of internationalization (aka i18n to anyone who has to type it more than a few times) comes up.

There are a few options for dealing with this in Javascript, but it’s taken a while to find one I like. First, I expect to use the same library on the server and on the client, and I expect to be able to use it with libraries like React.

React-Intl works OK on the client side, but using the underlying Intl on the server looks under-documented and deeply clunky. I18next is reasonable on the server and has integrations with most client side frameworks. While it’s a decent choice, there is something about the way it works which rubs me the wrong way.

const i18next = require('i18next')

i18next.init({
  lng: 'en',
  fallbackLng: 'en',
  resources: {
    en: {
      translation: {
        person: {
          firstName: "First name",
          lastName: "Last name",
        },
      },
    },
    fr: {
      translation: {
        person: {
          firstName: "Prénom",
          lastName: "Nom de famille",
        },
      },
    },
  },
})

The above code shows how to use it. It’s a pretty standard setup (very familiar if you’ve ever done i18n in Rails): a singleton object (you can make others if you need to) with an internal collection of messages stored as a JSON object.

One of the things that I dislike about this approach is that the translations stored in that JSON object tend to accumulate and hang around long after the code that needs that message is gone.

The other thing that doesn’t sit well with me is the way you access those messages: i18n.t('mutation.fields.purchase.args.expiryYear').

What you are looking at is a function call that assumes the existence of an object like {translation:{mutation:{fields:{purchase:{args:{expiryYear: "Expiry year"}}}}}}. This is an example of structural coupling: my code depends on the structure of that object. This sort of thing is normally considered an anti-pattern, a violation of the “Law of Demeter”, but it’s pretty common among i18n libraries. I have to decide on the structure to start with, and after that, changing that structure (say, if I decide I didn’t make the right call about how to structure it originally) is going to break a lot of things.

Poking around I stumbled on a library that takes a different approach: Lingui.

Lingui is interesting because it builds a nice translation workflow by leveraging the now ubiquitous infrastructure of Babel.

Aside from the core code in lingui-i18n (and other packages dealing with React), the heart of Lingui’s approach is two Babel plugins: babel-plugin-lingui-extract-messages and babel-plugin-lingui-transform-js.

We can install what we need for the server side like this.

yarn add lingui-cli lingui-i18n

The babel-plugin-lingui-extract-messages plugin does what it says on the tin. First we need a little test code to extract.

const { i18n } = require('lingui-i18n')

i18n.t`I do like a bit of gorgonzola.`
i18n.t`Not even Wensleydale?`

Then we need to create some locales using the helper supplied by lingui-cli:

[mike@longshot lingui]$ lingui add-locale en fr
Added locale en.
Added locale fr.

(use "lingui extract" to extract messages)
[mike@longshot lingui]$ tree
.
├── index.js
├── locale
│   ├── en
│   │   └── messages.json
│   └── fr
│       └── messages.json
├── package.json
└── yarn.lock

Next we use babel-plugin-lingui-extract-messages via the Lingui CLI command lingui extract to scan our code for those internationalized strings and extract them into translation files.

[mike@longshot lingui]$ lingui extract
Extracting messages from source files…
Collecting all messages…
Writing message catalogues…
Messages extracted!

Catalog statistics:
┌──────────┬─────────────┬─────────┐
│ Language │ Total count │ Missing │
├──────────┼─────────────┼─────────┤
│ en       │      2      │    2    │
│ fr       │      2      │    2    │
└──────────┴─────────────┴─────────┘

(use "lingui add-locale <language>" to add more locales)
(use "lingui extract" to update catalogs with new messages)
(use "lingui compile" to compile catalogs for production)

Lingui prints out a nice summary of the state of my translations.
A look at the translation files shows how Lingui can solve that coupling problem; it generates an object with the content of the strings used as the keys. This way my translations are looked up by content, rather than via their location in some structure. Since Lingui defaults to showing the message id (which is actually the English content string from the source), we’ll just edit the French messages file.

[mike@longshot lingui]$ cat locale/fr/messages.json 
{
  "I do like a bit of gorgonzola.": {
    "translation": "Je aime un peu de gorgonzola.",
    "origin": [
      [
        "index.js",
        3
      ]
    ]
  },
  "Not even Wensleydale?": {
    "translation": "Pas même Wensleydale?",
    "origin": [
      [
        "index.js",
        4
      ]
    ]
  }
}

With that done, we need to compile the JSON into JS files with lingui compile. The missing piece now is how those i18n.t tagged template literals are going to produce our translated strings at runtime, and the answer is babel-plugin-lingui-transform-js.

Since a picture is worth a thousand words, I think the best way to explain it is this:

[mike@longshot lingui]$ cat index.js | babel --plugins lingui-transform-js
const { i18n } = require('lingui-i18n');

i18n._('I do like a bit of gorgonzola.');
i18n._('Not even Wensleydale?');

As you can see, all the calls to i18n.t`` are replaced with calls to i18n._(). This underscore function is the low-level API that Lingui uses to actually give you the translated strings.

Now that we know that, let’s take a look at what using the library looks like.

[mike@longshot lingui]$ node
> var { i18n, unpackCatalog } = require('lingui-i18n')
undefined
> i18n.load({fr: unpackCatalog(require('./locale/fr/messages.js')), en: unpackCatalog(require('./locale/en/messages.js'))})
undefined
> i18n.availableLanguages
[ 'fr', 'en' ]
> i18n.activate('en')
undefined
> i18n._('I do like a bit of gorgonzola.')
'I do like a bit of gorgonzola.'
> i18n.activate('fr')
undefined
> i18n._('I do like a bit of gorgonzola.')
'Je aime un peu de gorgonzola.'

Lingui has some more tricks up its sleeve, like pluralization, but one of the things I’m happiest about is that this approach also solves that “unused messages” problem that I mentioned.

If we delete our “Not even Wensleydale?” message and run lingui extract again we can see the benefits of this static analysis style approach: Lingui knows when there is nothing referencing a message, and marks it as obsolete.

[mike@longshot lingui]$ cat locale/fr/messages.json 
{
  "I do like a bit of gorgonzola.": {
    "translation": "Je aime un peu de gorgonzola.",
    "origin": [
      [
        "index.js",
        3
      ]
    ]
  },
  "Not even Wensleydale?": {
    "translation": "Pas même Wensleydale?",
    "origin": [
      [
        "index.js",
        4
      ]
    ],
    "obsolete": true
  }
}

Better still, Lingui will clean out the obsolete messages for you with lingui extract --clean.

[mike@longshot lingui]$ lingui extract --clean
Extracting messages from source files…
Collecting all messages…
Writing message catalogues…
Messages extracted!

Catalog statistics:
┌──────────┬─────────────┬─────────┐
│ Language │ Total count │ Missing │
├──────────┼─────────────┼─────────┤
│ en       │      1      │    1    │
│ fr       │      1      │    0    │
└──────────┴─────────────┴─────────┘

(use "lingui add-locale <language>" to add more locales)
(use "lingui extract" to update catalogs with new messages)
(use "lingui compile" to compile catalogs for production)
[mike@longshot lingui]$ cat locale/fr/messages.json 
{
  "I do like a bit of gorgonzola.": {
    "translation": "Je aime un peu de gorgonzola.",
    "origin": [
      [
        "index.js",
        3
      ]
    ]
  }
}

For me this is pretty much the holy grail of i18n. Here I’ve focused on using Lingui without any other libraries, but it’s just as awesome with React. With locale files that can plausibly be handed over to a translator, and tooling that can both find and remove translations, Lingui has become my go-to i18n library.