Buyer behavior analysis, protein-protein interactions, the human brain, fraud detection, financial analysis: if you sketch any of these out on a whiteboard, you will most likely end up with a series of circles connected by lines.
This is the simple representation we all intuitively use to map out the relationships between things. Though simple, under the name “graph” or “network graph” it is the subject of study for an entire branch of mathematics (graph theory) and for the burgeoning field of Social Network Analysis (SNA).
SNA is the study of the network of relationships among a set of things rather than the things themselves. This type of analysis is increasingly common in academia across a huge number of fields. Google Scholar gives a fairly clear indication that the term is appearing more and more often in the academic databases it crawls.
The technique surfaces in many domains, where it is used to identify which actors within a given network are “interesting”. In a biology context, “interesting” actors might be the genes that are interacting the most with other genes given a certain stimulus.
In the context of epidemiology, if the actors are fish farms, the movement of fish between them forms a network which can be analysed. Aquaculture sites that have the highest number of incoming and outgoing connections become “interesting” since the movements mean a high vulnerability to infection and likelihood to spread disease.
An “interesting” financial institution might be one whose financial ties with other institutions indicate that its failure might have a domino effect.
While Social Network Analysis has been steadily growing, shifts in industry are underway that promise to make this type of analysis more common outside academia.
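The fish farm example above boils down to counting incoming and outgoing connections per site. Here is a minimal sketch of that idea in Python, using only the standard library. The site names and movements are invented for illustration, and ranking by total degree is just one simple way to score “interesting”; real analyses often use richer centrality measures.

```python
from collections import Counter

# Hypothetical fish movements between aquaculture sites: (source, destination).
movements = [
    ("farm_a", "farm_b"),
    ("farm_a", "farm_c"),
    ("farm_b", "farm_c"),
    ("farm_c", "farm_d"),
    ("farm_d", "farm_a"),
    ("farm_e", "farm_c"),
]


def degree_counts(edges):
    """Return {node: (in_degree, out_degree)} for a list of directed edges."""
    in_degree = Counter()
    out_degree = Counter()
    for source, destination in edges:
        out_degree[source] += 1
        in_degree[destination] += 1
    nodes = set(in_degree) | set(out_degree)
    # Counter returns 0 for nodes it has never seen, so no special-casing is needed.
    return {node: (in_degree[node], out_degree[node]) for node in nodes}


def rank_by_total_degree(edges):
    """Sort nodes by in + out degree, highest first; ties are broken by name."""
    degrees = degree_counts(edges)
    return sorted(degrees, key=lambda node: (-sum(degrees[node]), node))


degrees = degree_counts(movements)
ranking = rank_by_total_degree(movements)

for site in ranking:
    incoming, outgoing = degrees[site]
    print(f"{site}: {incoming} in, {outgoing} out")
```

With this made-up data, the site receiving fish from three other farms rises to the top of the list, which is the kind of site an epidemiologist would want to look at first.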
In 2001, with computers spreading everywhere, e-commerce heating up and internet usage at around 500 million people, Doug Laney, an eagle-eyed analyst for META Group (now part of Gartner), noticed a trend: data was changing. He described how data was increasing along three axes: volume, velocity and variety, which eventually became known as “the 3 V’s of Big Data”.
These changing characteristics of data itself have touched off what is often called a “Cambrian explosion” of non-relational databases that offer the speed, flexibility and horizontal scalability needed to accommodate it. These databases are collectively known as NoSQL databases.
Since the launch of Google’s Bigtable in 2005, more than 28 NoSQL databases have been launched. The majority fall into one of the 3 main sub-categories: key/value store, document store or graph database.
Anywhere a graph database is used is an obvious place to use SNA, but the ability to build a graph atop either key/value stores or document databases means that the majority of NoSQL databases are amenable to being analysed with SNA.
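To make the idea of building a graph atop a key/value store concrete, here is a rough sketch. A plain Python dict stands in for the store, and the `adj:<node>` key scheme, the JSON-encoded neighbour lists and the people's names are all assumptions made for illustration, not the convention of any particular database. Once the adjacency lists are in place, an ordinary breadth-first search is enough to answer questions like “how is A connected to B?”.

```python
import json
from collections import deque

# A plain dict stands in for a key/value store (string keys, string values).
# The "adj:<node>" key scheme is a hypothetical convention for this sketch.
store = {}


def add_edge(store, source, destination):
    """Append destination to the JSON-encoded neighbour list stored for source."""
    key = f"adj:{source}"
    neighbours = json.loads(store.get(key, "[]"))
    if destination not in neighbours:
        neighbours.append(destination)
    store[key] = json.dumps(neighbours)


def get_neighbours(store, node):
    """Read a node's neighbour list; nodes with no outgoing edges yield []."""
    return json.loads(store.get(f"adj:{node}", "[]"))


def shortest_path(store, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in get_neighbours(store, node):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


for source, destination in [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "dave"),
    ("dave", "carol"),
    ("carol", "erin"),
]:
    add_edge(store, source, destination)

print(shortest_path(store, "alice", "erin"))
```

Each step of the search costs one key lookup, so the same approach carries over to any store that offers get and put by key, at the price of doing the traversal in application code rather than in the database.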
There are also many relational databases struggling to contain and query graphy datasets that developers have dutifully pounded into the shape of a table. As graph databases gain more traction, we will likely see some of these converted in their entirety to graphs using tools like R2G, wherever developers end up struggling with recursion, an explosion of join tables, or a problem like routing.
In addition to the steady pressure of the 3 V’s and the growth of SNA as an analytical, and even predictive, tool, there are many companies (Runkeeper, Yahoo, LinkedIn) whose data model is a graph. Facebook and Netflix both fall into this category, and each has released a tool to make building graph-backed applications easier. Both tools are pitched as alternatives to the REST architectural style that most web applications are based on.
Circling back to the original question of “why graphs?”, hopefully the answer is clearer. For anyone with an interest in data analysis, paying attention to this space gives access to powerful tools and a growing number of opportunities to apply them. For developers, understanding graphs allows better data modelling and architectural decisions.
Beyond the skills needed to design and tend to these new databases and make sense of their contents, knowledge of graphs will also increasingly be required to make sense of the world around us.
Understanding why you got turned down for a loan will require understanding a graph, as will understanding why you are or aren’t fat, and eventually who gets insurance and at what price.
Proper security increasingly requires thinking in graphs, as even the most “boring” of us can be a useful stepping stone on the way to compromising someone “interesting”; perhaps a client, an acquaintance, an employer, or a user of something we create.
With talk of storage capacities actually surpassing Moore’s law, SNA growing nearly linearly, NoSQL growing, interest in graphs on the way up, and application development tools finally appearing, the answer to “why now?” is that this is only the beginning. We are all connected, and understanding how is the future.