Graph Databases - DBMS

In a relational database, relationships are secondary: we represent them with foreign keys and reconstruct them with JOINs at query time. In a graph database, relationships are first-class citizens. They’re stored directly, traversed without JOINs, and queried naturally.

Think of it like a social network. We don’t think “User #42 has a row in the friends table pointing to User #99.” We think “Manish is friends with Priya.” Graph databases store data the way we naturally think about connections.

Core Concepts

A graph database has three building blocks:

Nodes — the entities (people, products, locations). Like rows in a table.

Edges (Relationships) — the connections between nodes. They have a type and direction. Like foreign keys, but stored as actual links.

Properties — key-value pairs on both nodes and edges. Like columns.
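To make the three building blocks concrete, here is a minimal in-memory sketch in Python. The class names Node and Edge and the helper connect are made up for illustration; this is not how Neo4j or any other product stores data, only a model of the idea that edges are stored objects with a type, a direction, and their own properties.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """An entity with a label and free-form key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)  # edges that start at this node


@dataclass(eq=False)
class Edge:
    """A typed, directed connection that carries its own properties."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def connect(start, edge_type, end, **properties):
    """Create an edge and attach it to its start node (hypothetical helper)."""
    edge = Edge(edge_type, start, end, properties)
    start.outgoing.append(edge)
    return edge


manish = Node("Person", {"name": "Manish"})
priya = Node("Person", {"name": "Priya"})
friendship = connect(manish, "FRIENDS_WITH", priya, since=2020)

# The relationship is a stored object we can follow, not a key we have to look up.
friend_names = [edge.end.properties["name"] for edge in manish.outgoing]
```

Here friend_names is built by walking manish.outgoing directly, which is the in-memory analogue of "stored as actual links."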

Graph Structure

(Manish :Person) ──FRIENDS_WITH {since: 2020}──→ (Priya :Person)

WORKS_AT edges (↓) point from a person down to a company node. The company nodes are (Acme :Company) and (TechCo :Company).

Neo4j and Cypher

Neo4j is one of the most widely used graph databases. Its query language, Cypher, describes graph patterns with ASCII-art syntax: parentheses for nodes, square brackets and arrows for relationships.

// Create nodes
CREATE (m:Person {name: "Manish", age: 25})
CREATE (p:Person {name: "Priya", age: 24})
CREATE (a:Company {name: "Acme Corp"})

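// Create relationships (run in the same query as the CREATE statements above,
// so that m, p, and a still refer to the nodes just created)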
CREATE (m)-[:FRIENDS_WITH {since: 2020}]->(p)
CREATE (m)-[:WORKS_AT {role: "Backend Dev"}]->(a)

// Find a person
MATCH (p:Person {name: "Manish"})
RETURN p

// Find Manish's friends
MATCH (m:Person {name: "Manish"})-[:FRIENDS_WITH]->(friend)
RETURN friend.name

// Find friends of friends (2 hops)
MATCH (m:Person {name: "Manish"})-[:FRIENDS_WITH*2]->(fof)
RETURN DISTINCT fof.name

// Find the shortest path between two people
MATCH path = shortestPath(
  (a:Person {name: "Manish"})-[*]-(b:Person {name: "Rahul"})
)
RETURN path

The beauty of Cypher is that it reads almost like a sentence. (m)-[:FRIENDS_WITH]->(friend) literally means “m is friends with friend.”
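For readers who want to see what the variable-length and shortest-path queries compute, the following Python sketch mimics them over a plain adjacency list. The sample data (including Anita) and the function names k_hops and shortest_path are assumptions for illustration. The sketch also simplifies Cypher's semantics: it does not enforce the rule that a single path cannot reuse a relationship, and it is not how Neo4j executes queries internally.

```python
from collections import deque

# Hypothetical sample data: directed FRIENDS_WITH edges as an adjacency list.
FRIENDS = {
    "Manish": ["Priya"],
    "Priya": ["Rahul", "Anita"],
    "Rahul": [],
    "Anita": ["Rahul"],
}


def k_hops(graph, start, k):
    """Names reached by following exactly k outgoing edges, like [:FRIENDS_WITH*k]."""
    frontier = {start}
    for _ in range(k):
        frontier = {nxt for name in frontier for nxt in graph.get(name, [])}
    return frontier


def shortest_path(graph, start, goal):
    """Breadth-first search that ignores edge direction, like shortestPath with -[*]-."""
    neighbours = {}
    for src, targets in graph.items():
        for dst in targets:
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk back to the start
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None  # no connection between the two names
```

Calling k_hops(FRIENDS, "Manish", 2) corresponds to the friends-of-friends query, and shortest_path(FRIENDS, "Manish", "Rahul") to the shortestPath query.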

Why Graphs Beat JOINs

In a relational database, finding “friends of friends of friends” requires multiple self-JOINs:

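-- SQL: friends of friends of friends (3 levels deep)
-- Assumes tables friendships(user_id, friend_id) and users(id, name)
SELECT DISTINCT u.name
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
JOIN users u ON u.id = f3.friend_id
WHERE f1.user_id = 42;
-- Every extra level adds another self-JOIN, and the intermediate result can grow multiplicatively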

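In a graph database, the same query is a single pattern, and the cost of each hop depends on the relationships attached to the nodes being visited rather than on the total size of the data: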

// Cypher: friends of friends of friends
MATCH (m:Person {id: 42})-[:FRIENDS_WITH*3]->(fof)
RETURN DISTINCT fof.name
// Traverses the graph directly — no JOINs needed

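The key difference: a relational database resolves relationships at query time by matching keys through JOINs, typically with an index lookup per row. A native graph database such as Neo4j stores each relationship as a direct reference between its two nodes. Following one relationship is then just following a pointer, which is O(1) regardless of the total data size. The cost of a whole traversal still grows with the number of relationships it visits, but not with the size of the graph.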
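The contrast can be sketched in Python by storing the same friendships two ways and counting how much data each lookup touches. All names and data here are hypothetical. The scan version is the unindexed worst case; a real relational database would normally use an index and pay roughly logarithmic cost per lookup rather than reading every row, and the dictionary merely stands in for the direct references a native graph store keeps.

```python
# Hypothetical data: (user_id, friend_id) rows, as in a friendships table.
FRIENDSHIP_ROWS = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (6, 7), (7, 8)]

# The same relationships grouped by their start node, as a graph store would keep them.
ADJACENCY = {}
for user_id, friend_id in FRIENDSHIP_ROWS:
    ADJACENCY.setdefault(user_id, []).append(friend_id)


def friends_by_scan(rows, user_id):
    """Join-style lookup without an index: inspect every row of the table."""
    inspected = 0
    friends = []
    for row_user, row_friend in rows:
        inspected += 1
        if row_user == user_id:
            friends.append(row_friend)
    return friends, inspected


def friends_by_pointer(adjacency, user_id):
    """Graph-style lookup: touch only the relationships attached to this node."""
    friends = adjacency.get(user_id, [])
    return list(friends), len(friends)


scan_result = friends_by_scan(FRIENDSHIP_ROWS, 1)
pointer_result = friends_by_pointer(ADJACENCY, 1)
```

Both calls find the same friends, but the scan inspects every row in the table while the adjacency lookup touches only the two relationships that belong to user 1. Adding unrelated friendships makes the first number grow and leaves the second unchanged.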

Use Cases

Social networks — who knows whom, mutual friends, connection suggestions. LinkedIn’s “People You May Know” is a graph problem.

Recommendation engines — “People who bought X also bought Y.” We traverse the graph: User → bought → Product ← bought ← Other Users → bought → Other Products.

Fraud detection — find circular money transfers, identify suspicious patterns in transaction networks.

Knowledge graphs — Google’s Knowledge Graph, or Wikidata, the structured-data companion to Wikipedia. “What is the capital of the country where the Eiffel Tower is located?”

Access control — “Does user X have permission to resource Y through group Z?” This is a graph traversal.

Network topology — mapping computer networks, dependencies between microservices, infrastructure relationships.
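As a rough illustration of the recommendation traversal described above (User → bought → Product ← bought ← Other Users → bought → Other Products), here is a small Python sketch. The purchase data and the function name recommend are invented for the example, and the scoring (count how many overlapping users bought each product) is just one simple choice, not a production ranking method.

```python
from collections import Counter

# Hypothetical purchase data: user -> set of products bought.
PURCHASES = {
    "manish": {"keyboard", "mouse"},
    "priya": {"keyboard", "monitor"},
    "rahul": {"mouse", "monitor", "webcam"},
    "anita": {"desk"},
}


def recommend(purchases, user):
    """Rank products bought by users who share at least one purchase with `user`."""
    mine = purchases.get(user, set())
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user or not (mine & theirs):
            continue  # skip the user themselves and users with no shared product
        for product in theirs - mine:
            scores[product] += 1  # one vote per overlapping user
    # Sort by score (highest first), then by name, so ties come out in a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))


suggestions = recommend(PURCHASES, "manish")
```

Each loop level corresponds to one hop in the traversal: from the user to their products, across to other buyers of those products, and out to what else those buyers bought.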

When NOT to Use Graph Databases

Simple CRUD — if our data is mostly flat tables with simple relationships, a relational database is simpler and more battle-tested.
Heavy aggregations — summing up sales by month, calculating averages. SQL engines are built for this kind of set-based work.
High write throughput — graph databases are generally optimized for read-heavy, traversal-heavy workloads.
Large-scale analytics — for data warehousing and analytics, columnar databases are usually a better fit.

Other Graph Databases

Amazon Neptune — managed graph database supporting both property graphs (Gremlin, openCypher) and RDF (SPARQL)
ArangoDB — multi-model database that supports graphs, documents, and key-value
Dgraph — distributed graph database with native GraphQL support
JanusGraph — open-source, distributed, built on top of existing storage backends (Cassandra, HBase)

In short, graph databases are the right tool when the connections between our data matter as much as the data itself. If we’re constantly asking “who is connected to whom” or “how do we get from A to B”, a graph database can be far faster than joining relational tables, especially as the number of hops grows. For most other workloads, a relational database is usually still the better choice.