Working with the Google Vision API

I remember hearing a story about a developer whose contract with the military specified the number of kilos of documentation that were required to accompany the system they were building. I think of that story from time to time when I use Google products.

Google’s Vision API gives access to legit state-of-the-art Artificial Intelligence and is amazing for extracting text from images, but a concise modern example doesn’t seem to exist in spite of the huge volume of documentation.

The example they give is in the classic callback style:

var vision = require('@google-cloud/vision');

var visionClient = vision({
  projectId: 'grape-spaceship-123',
  keyFilename: '/path/to/keyfile.json'

visionClient.detectText('./image.jpg', function(err, text) {
  // text = [
  //   'This was text found in the image',
  //   'This was more text found in the image'
  // ]

With all that has been written about the inversion of control problems of callbacks and ES2015 support nearly complete and in wide use thanks to Babel, examples like this are feeling distinctly retro.

Also painful for anyone working with Docker is that the authentication appears to require me to include a keyfile.json somewhere in my container, where what I actually want is to store that stuff in the environment.

After a bit of experimentation, it turns out that that the google-cloud-node library doesn’t let us down. It’s filled with all the promisey goodness we scripters-of-java have come to expect. If you are using jest this test should get you going:

import Vision from '@google-cloud/vision'

describe('Google Vision client', () => {

  it('successfully connects', async () => {
    let client = Vision({
      projectId: process.env.GOOGLE_VISION_PROJECT_ID,
      credentials: {
	      private_key: process.env.GOOGLE_VISION_PRIVATE_KEY.replace(/\\n/g, '\n'),
        client_email: process.env.GOOGLE_VISION_CLIENT_EMAIL

    let [[text, ...words], annotations] = await client.detectText(__dirname + '/data/foo.jpg')
    expect(text).toEqual("foo bar\n")
    expect(words).toContain("foo", "bar")


The project id is easy enough to find, but the environment variables used to avoid the keyfile.json are actually found within the keyfile.

  "type": "service_account",
  "project_id": "...",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "",
  "client_id": "...",
  "auth_uri": "",
  "token_uri": "",
  "auth_provider_x509_cert_url": "",
  "client_x509_cert_url": ""

The keyfile above was created by going to the credentials console and following the instructions here.

Note the replace(/\\n/g, '\n') happening on the GOOGLE_VISION_PRIVATE_KEY. This is from issue 1173 and without it you end up with the error
Error: error:0906D06C:PEM routines:PEM_read_bio:no start line. Replacing new lines with new lines seems silly but you gotta do what you gotta do.

The last missing piece is an image with some text. I created a quick test image in Gimp with the words “foo bar”:


While it wasn’t clear at first glance, google-cloud-node is a pretty sophisticated and capable library, despite being theoretically “alpha”. Google is remaking itself as “the AI company” and the boundary pushing stuff it’s doing means I’m probably going to be using this client a lot. I really was hoping to find a small amount of the “right” documentation instead of the huge volume of partial answers spread across their sprawling empire. Hopefully this is a useful contribution towards that reality.