Sikuli tips

Lately I have been working a lot with Sikuli. In case you have not heard of it, it's a project that uses the computer vision library OpenCV to automate GUI interactions. While Sikuli itself is new to me, this also marks my first exposure to its scripting language, Python.

For the most part it's nice to use and has some decent documentation, but there have been a few things that I have tripped over, some related to Sikuli itself and some due to my lack of familiarity with Python. So here is a compilation of those little things that I had to stop and scratch my head over.

Java 7:

Don’t. It crashes every time you click one of the image functions in the IDE. Use Java 6 and life is good.

Special Keys:

All the special keys can be accessed through the Key class.

type('Sikuli rocks' + Key.ENTER)

The rest of the keys follow the same pattern… Key.TAB, Key.BACKSPACE and so on.
All of that might lead you to believe that copying some text uses Key.CTRL, but you actually need to use the KeyModifier class instead. So Ctrl+C looks like this:

type('c', KeyModifier.CTRL)

Killing a script:

You will run your script a lot, and when it goes badly you need to kill it. There is no obvious way to do that, since the IDE disappears when you run the script. Kill your script with this:


Sometimes you need to capture an image of a menu (like a context menu) that disappears when you move away. To catch an image of the menu, use the shortcut:


Launching an app:
Sure, you could double-click some icon somewhere, but it's much faster to launch most apps with a command. To save yourself from having to escape all the backslashes in a path, you should use Python's raw strings (note the r in front of the path):

firefox = App(r'C:\Program Files\Mozilla\firefox.exe')
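Here is a small sketch, in plain Python with no Sikuli calls, of what the r prefix buys you. The variable names are just for illustration. The interesting part is the tail of the path: without the r, the "\f" in "\firefox.exe" is read as a form feed character, so the path silently loses its backslash.

```python
# Raw string: backslashes are kept exactly as typed.
raw_path = r'C:\Program Files\Mozilla\firefox.exe'

# The same path without the r prefix needs every backslash doubled.
escaped_path = 'C:\\Program Files\\Mozilla\\firefox.exe'

print(raw_path == escaped_path)

# Forgetting both the r and the doubling: "\f" becomes one form feed character.
raw_tail = r'Mozilla\firefox.exe'
broken_tail = 'Mozilla\firefox.exe'

print(len(raw_tail))
print(len(broken_tail))
print('\\' in broken_tail)
```

The first print shows the two spellings are the same string, and the last three show that the unescaped version is one character shorter and contains no backslash at all.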

Sikuli has some pretty good documentation, but there are some areas where it feels a little thin, specifically when you want a complete list of methods for a particular class. Fortunately, Python's dir() function does that job.

firefox = App(r'C:\Program Files\Mozilla\firefox.exe')
screenRegion = firefox.window()
print dir(screenRegion)

['ROI', 'above', 'autoWaitTimeout', 'below', 'bottomLeft', 'bottomRight'…]
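The raw dir() list mixes methods, plain attributes and internal names, so it can help to filter it. This is a minimal sketch in plain Python. FakeRegion is a hypothetical stand-in I made up so the example runs outside Sikuli, and public_methods is my own helper name, not part of Sikuli. Inside a script you would pass the real region instead.

```python
class FakeRegion(object):
    """Hypothetical stand-in for a Sikuli Region, for illustration only."""
    ROI = None  # plain attribute, not a method

    def above(self):
        pass

    def below(self):
        pass

    def _internal(self):
        pass


def public_methods(obj):
    """Return the sorted names of callable attributes that do not start with '_'."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith('_') and callable(getattr(obj, name))
    )


print(public_methods(FakeRegion()))
```

For the stand-in class this prints only above and below, since ROI is not callable and _internal is skipped.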

Child windows:
Let's say that Firefox opens a child window, like the downloads window. To get a handle on that, you just need to do this:

downloadsWindow = firefox.window(0)

Unfortunately, what you will find in downloadsWindow is an object of the Region class. I was expecting to get an App instance so I could call methods like focus() on it. It does the job though.

One other thing is that sometimes you will want to check that a region actually contains what you think it does. For that you will probably want to capture an image of that area during execution. I drop in code like this whenever I need to get a visual of a region:

import shutil
captureimage = capture(resultsBox)
shutil.move(captureimage, r'C:\somefolder\bounds.png')
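If you grab several of these in one run, a fixed file name gets overwritten each time. Below is a rough sketch of a helper that moves the captured file into a folder under a timestamped name. The name save_capture and its parameters are my own invention, not part of Sikuli, and the temporary file at the bottom is only a stand-in for the path that capture() returns, so the example can run outside Sikuli.

```python
import os
import shutil
import tempfile
import time


def save_capture(temp_image_path, folder, label):
    """Move a captured image into folder as '<label>-<timestamp>.png'."""
    if not os.path.isdir(folder):
        os.makedirs(folder)
    stamp = time.strftime('%Y%m%d-%H%M%S')
    target = os.path.join(folder, '%s-%s.png' % (label, stamp))
    shutil.move(temp_image_path, target)
    return target


# Stand-in for capture(resultsBox): an empty temporary .png file.
demo_dir = tempfile.mkdtemp()
handle, fake_capture = tempfile.mkstemp(suffix='.png')
os.close(handle)

saved = save_capture(fake_capture, os.path.join(demo_dir, 'captures'), 'resultsBox')
print(saved)
```

In a real script the call would look like save_capture(capture(resultsBox), r'C:\somefolder', 'resultsBox'). The timestamp only has one-second resolution, so two captures with the same label in the same second would still collide.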

These are a few of the things I have bumped into so far and I wanted to put them up here so I don’t forget them. Hopefully I’ll get to do more work with Sikuli. I think with a little practice I could make some scripts that are pretty resilient to a lot of the changes that normally break these types of scripts (unexpected popups for instance, or slow networks). I’m looking forward to learning more about it.


