Why graphs? Why now?

Buyer behavior analysis, protein-protein interactions, the human brain, fraud detection, financial analysis; if you sketch any of these out on a whiteboard, you will most likely end up with a series of circles connected by lines.

This simple representation we all intuitively use to map out the relationships between things. Though simple, under the name “graph” or “network graph”, it is the subject of study for an entire branch of mathematics (graph theory), and the burgeoning field of Social Network Analysis (SNA).

SNA is the study of the network of relationships among a set of things rather than the things themselves. This type of analysis is increasing common in academia across a huge number of fields. Google Scholar gives a fairly clear indication that the term is increasingly common among the academic databases it crawls.

The growth of Social Network Analysis
The growth of Social Network Analysis.

The technique surfaces in many domains, used to identify which actors within a given network are “interesting”. In a biology context “interesting” actors might be the genes that are interacting the most with other genes given a certain stimulus.

 Transcriptome-Based Network Analysis Reveals a Spectrum Model of Human Macrophage Activation Xue, Jia et al.
Transcriptome-Based Network Analysis Reveals a Spectrum Model of Human Macrophage Activation – Xue, Jia et al.

In the context of epidemiology, if the actors are fish farms, the movement of fish between then forms a network which can be analysed. Aquaculture sites that have the highest number of incoming and outgoing connections become “interesting” since the movements mean a high vulnerability to infection and likelihood to spread disease.

Image from the paper Application of network analysis to farmed salmonid movement data from Scotland
Image from the paper Application of network analysis to farmed salmonid
movement data from Scotland

An “interesting” financial institution might be one whose financial ties with other institutions indicate that it’s failure might have a domino effect.

Figure from DebtRank: Too Central to Fail? Financial Networks, the FED and Systemic Risk
Figure from DebtRank: Too Central to Fail? Financial
Networks, the FED and Systemic Risk

While Social Network analysis has been steadily growing, shifts in industry are underway that promise to make this type of analysis more common outside academia.

In 2001, with computers spreading everywhere, e-commerce heating up and internet usage around 500 million, Doug Laney, an eagle-eyed analyst for MetaGroup (now Gartner), notices a trend; data is changing. He described how data was increasing along 3 axes: increasing in volume, velocity and variety which eventually became known as “the 3 V’s of Big Data”.

An amazing visualization of the 3 V's from Datasciencecentral.com
An amazing vizualization of the 3 V’s from Datasciencecentral.com

This changing characteristics of data itself has touched off what is often called a “Cambrian explosion” of non-relational databases that offer the speed, flexibility and horizontal scalability needed to accommodate it. These databases and collectively known as NoSQL databases.

A non-exhaustive but representative timeline of NoSQL databases.
A non-exhaustive but representative timeline of NoSQL databases. From the first database (a graph database) till today. Can you see the explosion? Explore the timeline yourself.

Since the launch of Google’s Bigtable in 2005, More than 28 NoSQL databases have been launched. The majority fall into one of the 3 main sub-categories: key/value store, document store or graph database.

Anywhere a graph database is used is an obvious place to use SNA, but the ability to build a graph atop either key/value stores or document databases means that the majority of NoSQL databases are amenable to being analysed with SNA.

There are also many relational databases struggling to contain and query graphy datasets that developers have dutifully pounded into the shape of a table. As graph databases gain more traction we will likely see some of these converted in their entirety to graphs using tools like R2G wherever developers end up struggling with recursion or an explosion of join tables, or something like routing.

In addition to the steady pressure of the 3 V’s and the growth of SNA as an analytical and even predictive, tool, there are many companies (RunkeeperYahooLinkedIn) whose data model is a graph. Facebook and Netflix both fall into this category and have each released tools, both of which are pitched as alternatives to REST architecture style most web applications are based on, to make building graph backed applications easier.

Circling back to the original question of “why graphs?”, hopefully the answer is clearer. For anyone with an interest in data analysis, paying attention to this space gives access to powerful tools and a growing number of opportunities to apply them. For developers, understanding graphs allows better data modelling and architectural decisions.

Beyond the skills needed to design and tend to these new databases and make sense of their contents, knowledge of graphs will also increasingly be required to make sense of the world around us.

Understanding why you got turned down for a loan will require understanding a graph, why you are/aren’t fat, and eventually who gets insurance and at what price will too.

Facebook patents technology to help lenders discriminate against borrowers based on social connections
Facebook patents technology to help lenders discriminate against borrowers based on social connections

Proper security increasingly requires thinking in graphs, as even the most “boring” of us can be a useful stepping stone on the way to compromising someone “interesting”; perhaps a client, an acquaintance, an employer, or a user of something we create.

Graphs will be used to find and eliminate key terrorists, map out criminal networks, and route you to that new vegan restaurant across town.

With talk of storage capacities actually surpassing Moore’s law, SNA growing nearly linearly, NoSQL growing, interest in graphs on the way up, and application development tools finally appearing, the answer to “why now?” is that this is only the beginning. We are all connected, and understanding how is the future.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s