A graph G = (V, E) consists of a set of vertices (nodes) and a set of edges (links) connecting them.
| Term | Definition |
|---|---|
| Vertex (node) | A fundamental unit -- represents an entity |
| Edge (link) | A connection between two vertices |
| Degree | Number of edges incident to a vertex |
| Path | Sequence of vertices connected by edges |
| Cycle | A path that starts and ends at the same vertex |
| Connected | Every vertex reachable from every other |
| Component | A maximal connected subgraph |
5 vertices, 6 edges, connected graph
Graphs are the most general-purpose data structure -- trees, linked lists, and arrays are all special cases.
Edges have no direction. If (u, v) exists, then (v, u) is implied.
Each edge has a source and a target. (u, v) does not imply (v, u).
Most real-world graphs are directed. Even "undirected" relationships are often stored as two directed edges for implementation simplicity.
A weighted graph assigns a numeric value (weight, cost, capacity) to each edge. Weights represent distance, latency, bandwidth, probability, or any domain metric.
An unweighted graph is a weighted graph where every edge has weight 1.
For a graph with n vertices, allocate an n x n matrix M. M[i][j] = 1 if edge (i, j) exists, 0 otherwise. For weighted graphs, store the weight.
Vertices: A=0 B=1 C=2 D=3
A B C D
A [ 0 1 1 0 ]
B [ 1 0 0 1 ]
C [ 1 0 0 1 ]
D [ 0 1 1 0 ]
| Property | Value |
|---|---|
| Space | O(V2) -- regardless of edge count |
| Edge lookup | O(1) -- direct array access |
| Add edge | O(1) |
| Iterate neighbours | O(V) -- must scan entire row |
| Add vertex | O(V2) -- reallocate matrix |
Best for dense graphs where |E| approaches |V|2. Wastes memory on sparse graphs.
Each vertex stores a list (array, linked list, or hash set) of its neighbours. The most common representation in practice.
A -> [B, C]
B -> [A, D]
C -> [A, D]
D -> [B, C]
Default choice for most graph algorithms. BFS and DFS run in O(V + E) -- optimal.
| Property | Value |
|---|---|
| Space | O(V + E) -- proportional to graph size |
| Edge lookup | O(degree) or O(1) with hash set |
| Add edge | O(1) -- append to list |
| Iterate neighbours | O(degree) -- direct traversal |
| Add vertex | O(1) -- append new list |
| Criterion | Adjacency Matrix | Adjacency List |
|---|---|---|
| Space complexity | O(V2) | O(V + E) |
| Edge existence check | O(1) | O(degree) |
| Iterate all neighbours | O(V) | O(degree) |
| Add / remove edge | O(1) | O(1) add / O(degree) remove |
| Dense graph performance | Excellent | Wasteful pointer overhead |
| Sparse graph performance | Wasteful memory | Excellent |
| Cache locality | Good (contiguous) | Poor (pointer chasing) |
| Matrix operations | Natural (multiply, transpose) | Requires conversion |
|E| > |V|2/4)Store edges as a flat list of (source, target, weight) tuples. Simplest possible representation.
[(A,B,5), (A,C,2), (B,D,3),
(C,D,7), (D,E,1)]
O(E)O(E) -- linear scanAn |V| x |E| matrix. Column j has +1 at source, -1 at target (directed), or 1 at both endpoints (undirected).
e1 e2 e3 e4
A [ 1 1 0 0 ]
B [ 1 0 1 0 ]
C [ 0 1 0 1 ]
D [ 0 0 1 1 ]
O(V * E)Edge lists are underrated for batch processing. Many distributed graph frameworks (Spark GraphX) use edge-list partitioning internally.
CSR stores a sparse adjacency structure using three flat arrays, eliminating pointer overhead and maximising cache locality.
Graph: 0->[1,3] 1->[2] 2->[3] 3->[]
row_ptr: [0, 2, 3, 4, 4]
col_idx: [1, 3, 2, 3]
values: [1, 1, 1, 1]
| Array | Size | Purpose |
|---|---|---|
row_ptr | V + 1 | Start index in col_idx for each vertex |
col_idx | E | Concatenated sorted neighbour indices |
values | E | Edge weights (omit for unweighted) |
O(V + E) with minimal overheadO(log degree)Standard format for high-performance graph libraries (Boost Graph, SuiteSparse, cuGraph) and GPU graph processing. CSC (Compressed Sparse Column) enables efficient in-neighbour access.
An implicit graph generates vertices and edges on demand via a function rather than storing them in memory. The graph exists logically but is never fully materialised.
Vertex (r, c) connects to (r+/-1, c) and (r, c+/-1). An n x n grid has n2 vertices but the adjacency structure is never allocated.
Puzzle states (Rubik's cube, 15-puzzle) are vertices; legal moves are edges. Billions of nodes but BFS explores only a fraction.
Chess positions are vertices; legal moves are edges. ~1047 positions -- fully materialising the tree is impossible.
def neighbours(state):
for move in legal_moves(state):
yield apply(move, state)
# BFS with implicit graph
from collections import deque
def bfs(start, goal):
queue = deque([start])
visited = {start}
while queue:
node = queue.popleft()
if node == goal:
return True
for nbr in neighbours(node):
if nbr not in visited:
visited.add(nbr)
queue.append(nbr)
return False
If your graph has more than ~108 vertices, implicit representation combined with BFS, DFS, or A* is the only practical option.
Allow multiple edges (parallel edges) between the same pair of vertices, and optionally self-loops.
A ==[flight_101]==> B
A ==[flight_202]==> B
A ==[flight_303]==> B
A hyperedge connects an arbitrary number of vertices (not just two). Generalises the edge concept.
Standard graph algorithms do not apply directly. Common approach: expand to a bipartite graph.
A graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to a vertex in V. No edge within the same set.
Time complexity: O(V + E). A graph is bipartite iff it contains no odd-length cycle.
A directed acyclic graph is a directed graph with no cycles. Every DAG has at least one topological ordering.
O(V + E)Run DFS and check for back edges -- an edge to a vertex currently on the recursion stack. If found, the graph has a cycle and is not a DAG.
| Application | Vertices | Edges |
|---|---|---|
| Build systems (Make, Bazel) | Tasks | Dependencies |
| Package managers (npm, pip) | Packages | Version deps |
| Data pipelines (Airflow) | Stages | Data flow |
| Git commit history | Commits | Parent ptrs |
| Spreadsheet formulas | Cells | Cell refs |
| Neural network layers | Layers | Data flow |
Topological order: A, B, C, D, E
Produces an ordering of vertices such that for every directed edge (u, v), vertex u appears before v. Only possible on DAGs.
1. Compute in-degree for every vertex
2. Enqueue all vertices with in-degree 0
3. While queue is not empty:
a. Dequeue vertex u, append to result
b. For each neighbour v of u:
- Decrement in-degree of v
- If in-degree becomes 0, enqueue v
4. If result length != |V|, cycle exists
Time: O(V + E) Space: O(V)
1. Run DFS from each unvisited vertex
2. On finishing a vertex (all descendants
explored), push it onto a stack
3. Pop the stack to get topological order
Time: O(V + E) Space: O(V)
Multiple valid orderings may exist. Kahn's can be adapted to find all orderings or detect a unique one.
Store vertices and edges as first-class objects with index-free adjacency -- each node physically points to its neighbours.
CREATE TABLE vertices (
id INTEGER PRIMARY KEY,
label VARCHAR(64),
props JSONB
);
CREATE TABLE edges (
src INTEGER REFERENCES vertices(id),
dst INTEGER REFERENCES vertices(id),
label VARCHAR(64),
weight FLOAT,
PRIMARY KEY (src, dst, label)
);
| Feature | Native Graph DB | Relational |
|---|---|---|
| Multi-hop traversal | Fast (pointer chasing) | Slow (recursive JOINs) |
| Ad-hoc analytics | Limited | Excellent (SQL) |
| ACID transactions | Varies | Mature |
| Ecosystem / tooling | Smaller | Enormous |
| Schema flexibility | High | Rigid |
| Aggregation queries | Weak | Strong |
Use a native graph DB when your core access pattern is multi-hop traversal. Use relational when you also need joins, aggregation, and mature tooling.
Vertices and edges carry labels and arbitrary key-value properties. Dominant model in industry.
(:Person {name:"Alice", age:30})
-[:FOLLOWS {since:2023}]->
(:Person {name:"Bob", age:28})
Query languages:
Data as triples: (subject, predicate, object). Every entity is a URI. W3C standard for the Semantic Web.
<:Alice> <:follows> <:Bob> .
<:Alice> <:age> "30"^^xsd:integer .
<:Bob> <:worksAt> <:Acme> .
Query language:
| Aspect | Property Graph | RDF |
|---|---|---|
| Schema | Optional labels + constraints | Ontologies (OWL, RDFS) |
| Strength | Application development | Data integration, linked data |
| Ecosystem | Neo4j, TigerGraph, Memgraph | Wikidata, DBpedia, knowledge graphs |
| Need | Library |
|---|---|
| Quick prototyping, <100K edges | NetworkX |
| Large graphs in Python (>1M edges) | igraph or graph-tool |
| Production Java application | JGraphT |
| High-performance C++ pipeline | Boost Graph Library |
| Billion-edge distributed processing | Spark GraphX / Pregel |
| GPU-accelerated analytics | NVIDIA cuGraph |
| Scenario | Representation | Reason |
|---|---|---|
| Sparse, dynamic graph | Adjacency list (hash map) | O(V+E) space, fast traversal, easy mutation |
| Dense graph or matrix algebra | Adjacency matrix | O(1) edge lookup, spectral methods |
| Static analysis, HPC, GPU | CSR / CSC | Cache-friendly, minimal memory |
| Edge-centric algorithms (Kruskal) | Edge list | Sort by weight, union-find |
| Enormous state space (AI search) | Implicit graph | Generate on demand, never materialise |
| Persistent storage with queries | Property graph DB | Index-free adjacency, Cypher/Gremlin |
| Linked data / knowledge graph | RDF triple store | Standards-based, federated queries |
| Multiple edge types, same nodes | Multigraph adj. list | Edge IDs distinguish parallel edges |
Start with an adjacency list. Move to CSR when profiling shows memory or cache bottlenecks. Move to a graph database when you need persistence and multi-hop queries at scale.
Vertices = users; edges = follows/friendships. Graph algorithms power friend suggestions (common neighbours), influence scoring (PageRank), and community detection (Louvain).
Intersections = vertices; road segments = weighted edges. Dijkstra, A*, and contraction hierarchies enable real-time routing in Google Maps, Waze, and OSRM.
Packages, modules, or build targets as vertices; imports as edges. Topological sort determines build order; cycle detection prevents circular dependencies.
Entities and relationships modelled as a graph -- Google Knowledge Graph, Wikidata. Power semantic search, question answering, and recommendation.
Control flow graphs, data flow graphs, SSA form. Graph analyses drive dead code elimination, register allocation, and instruction scheduling.
Protein interaction networks, genome assembly (de Bruijn graphs). Transaction graphs reveal suspicious patterns via subgraph matching in fraud detection.
G = (V, E) is the most flexible data structure -- trees, lists, and matrices are all special casesO(V + E) space, optimal traversal| Source | Description |
|---|---|
| Cormen et al. | Introduction to Algorithms (CLRS) -- chapters 22-26 |
| Skiena | The Algorithm Design Manual -- practical graph problem taxonomy |
| Needham & Hodler | Graph Algorithms (O'Reilly) -- Neo4j and Spark |
| NetworkX Docs | networkx.org |
| Boost Graph | boost.org/libs/graph |
| Neo4j Academy | graphacademy.neo4j.com |