A Binary Search Tree is a rooted binary tree where every node satisfies the BST invariant:
```python
class Node:
    key: int
    value: any
    left: "Node | None"
    right: "Node | None"
```
The invariant gives us O(log n) expected search time — each comparison eliminates roughly half the remaining nodes, just like binary search on a sorted array.
Starting at the root, compare the target key k with the current node's key:
- k == node.key — found, return node
- k < node.key — recurse into the left subtree
- k > node.key — recurse into the right subtree
- node is None — key not present

```python
def search(node, key):
    if node is None:
        return None
    if key == node.key:
        return node
    elif key < node.key:
        return search(node.left, key)
    else:
        return search(node.right, key)
```
```python
def search_iterative(node, key):
    while node is not None:
        if key == node.key:
            return node
        elif key < node.key:
            node = node.left
        else:
            node = node.right
    return None
```
Iterative search avoids call-stack overhead — preferred in production for deep trees.
Searching for key 7 in the example tree:
3 comparisons — log2(9) ≈ 3.17
Insertion follows the same path as search. When we reach a null pointer, we create a new node there.
```python
def insert(node, key, value):
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value  # duplicate: update
    return node
```
Key insight: the shape of a BST is entirely determined by insertion order. This is the fundamental motivation for self-balancing trees.
Insertion order [4, 2, 6, 1, 3, 5, 7]:

```
        4
      /   \
     2     6
    / \   / \
   1   3 5   7
```

Height = 2

Insertion order [1, 2, 3, 4, 5, 6, 7]:

```
  1
   \
    2
     \
      3
       \
        ...
```

Height = 6
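This contrast is easy to verify empirically. A minimal sketch with a key-only Node and hypothetical `build` and `height` helpers (not part of the text above):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(node, key):
    # Standard BST insert; returns the (possibly new) subtree root.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def height(node):
    # Height counted in edges: empty tree is -1, a single node is 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

balanced = build([4, 2, 6, 1, 3, 5, 7])    # height 2
degenerate = build([1, 2, 3, 4, 5, 6, 7])  # height 6
```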
Deletion has three cases:

- No children (leaf): remove the node directly.
- One child: replace the node with its single child.
- Two children: replace with the in-order successor, then delete the successor.
Deletion with two children is O(h) where h is the tree height — finding the successor requires walking down the right subtree.
```python
def find_min(node):
    # Leftmost node of a subtree (smallest key).
    while node.left:
        node = node.left
    return node

def delete(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy the in-order successor, then delete it
        succ = find_min(node.right)
        node.key = succ.key
        node.value = succ.value
        node.right = delete(node.right, succ.key)
    return node
```
The in-order successor of a node is the node with the smallest key greater than that node's key.
| Case | Method |
|---|---|
| Right subtree exists | Leftmost node in right subtree |
| No right subtree | Walk up until node is in a left subtree |
```python
def find_successor(node):
    # Assumes each node stores a parent pointer.
    # find_min returns the leftmost (smallest-key) node of a subtree.
    if node.right:
        return find_min(node.right)
    parent = node.parent
    while parent and node == parent.right:
        node = parent
        parent = parent.parent
    return parent
```
The in-order predecessor is the mirror case: the node with the largest key smaller than the given node's key.
| Case | Method |
|---|---|
| Left subtree exists | Rightmost node in left subtree |
| No left subtree | Walk up until node is in a right subtree |
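The predecessor cases above mirror find_successor exactly; a sketch, assuming the same parent pointers (a minimal node class with a parent field is declared here so the snippet stands alone):

```python
class PNode:
    # Minimal node with a parent pointer (required by the walk-up case).
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent

def find_max(node):
    # Rightmost node of a subtree (largest key).
    while node.right:
        node = node.right
    return node

def find_predecessor(node):
    if node.left:
        return find_max(node.left)
    # No left subtree: climb until we leave a right subtree.
    parent = node.parent
    while parent and node == parent.left:
        node = parent
        parent = parent.parent
    return parent
```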
Successor and predecessor are essential for deletion (two-child case), range queries, and iterator design.
| Order | Visit sequence | Output (example tree) |
|---|---|---|
| In-order | left, node, right | 1, 3, 4, 6, 7, 8, 10, 13, 14 |
| Pre-order | node, left, right | 8, 3, 1, 6, 4, 7, 10, 14, 13 |
| Post-order | left, right, node | 1, 4, 7, 6, 3, 13, 14, 10, 8 |
| Level-order | BFS by depth | 8, 3, 10, 1, 6, 14, 4, 7, 13 |
```python
def inorder(node, result):
    if node is None:
        return
    inorder(node.left, result)
    result.append(node.key)
    inorder(node.right, result)
```
In-order traversal of a BST always produces keys in ascending sorted order. This is the foundation of tree-sort.
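The traversal table above also lists level-order, for which no code is shown; a queue-based sketch (with a minimal key-only Node so it runs on its own):

```python
from collections import deque

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def level_order(root):
    # Breadth-first: visit nodes depth by depth, left to right.
    result = []
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        result.append(node.key)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return result
```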
- O(n log n) average — balanced insertions
- O(n²) worst — degenerate tree
- O(n) in-order traversal after construction

Given a sorted array, build a height-balanced BST (minimise height).
Pick the middle element as root. Recursively build the left subtree from the left half and the right subtree from the right half.
```python
def sorted_array_to_bst(arr, lo, hi):
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(arr[mid])
    node.left = sorted_array_to_bst(arr, lo, mid - 1)
    node.right = sorted_array_to_bst(arr, mid + 1, hi)
    return node
```
| Metric | Value |
|---|---|
| Time | O(n) — each element visited once |
| Space | O(log n) — recursion stack |
| Result height | floor(log2(n)) |
This is the optimal BST construction for uniformly distributed queries. For skewed distributions, see optimal BSTs (Knuth's algorithm).
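A quick self-contained check of the construction (re-sketching the function with a key-only Node and a height helper defined here for the test):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def sorted_array_to_bst(arr, lo, hi):
    # Middle element becomes the root; the halves become the subtrees.
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = Node(arr[mid])
    node.left = sorted_array_to_bst(arr, lo, mid - 1)
    node.right = sorted_array_to_bst(arr, mid + 1, hi)
    return node

def height(node):
    # Height in edges; floor(log2(7)) == 2 for a 7-element array.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

root = sorted_array_to_bst([1, 2, 3, 4, 5, 6, 7], 0, 6)
```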
| Operation | Best | Average | Worst |
|---|---|---|---|
| Search | O(1) | O(log n) | O(n) |
| Insert | O(1) | O(log n) | O(n) |
| Delete | O(1) | O(log n) | O(n) |
| Min / Max | O(1) | O(log n) | O(n) |
| Successor | O(1) | O(log n) | O(n) |
| In-order traversal | O(n) | O(n) | O(n) |
- Space: O(n) for n nodes
- Recursion: O(h) stack space
- Balanced tree: h = O(log n)
- Degenerate tree: h = O(n)

A BST built from n random insertions has expected height O(log n). Specifically, the expected depth of a node is approximately 1.39 × log2(n) — the same constant as quicksort's expected comparisons.
The O(log n) average case assumes random insertion order. In practice, data often arrives sorted or nearly sorted, producing the worst case. This is why we need balancing.
When keys are inserted in sorted order, each new node becomes a right child. The tree degenerates into a linked list.
Sorted insertions:

```
1
 \
  2
   \
    3
     \
      4
       \
        5
```

Height = n - 1

A balanced arrangement of the same keys:

```
    3
   / \
  1   4
   \   \
    2   5
```

Height = 2
| Tree shape | Height | Search |
|---|---|---|
| Perfectly balanced | log2(n) | O(log n) |
| Random insertions | ~1.39 × log2(n) | O(log n) |
| Sorted insertions | n - 1 | O(n) |
A degenerate BST offers no advantage over a linked list but uses more memory (two child pointers per node).
A balanced BST maintains h = O(log n) after every insertion and deletion, ensuring worst-case O(log n) operations.
| Structure | Criterion | Height bound |
|---|---|---|
| AVL tree | Child heights differ by ≤ 1 | 1.44 × log2(n) |
| Red-Black | No path > 2× shortest | 2 × log2(n) |
| B-tree | All leaves same depth | logB(n) |
| Treap | Random priorities | O(log n) expected |
| Splay tree | No explicit criterion | O(log n) amortised |
Balancing requires extra work on each insert/delete — typically rotations:
- Single rotation: a constant number of pointer updates — O(1)
- Double rotation: two single rotations — O(1)
- Rebalancing pass: up to O(log n) rotations — amortised
The overhead is small: rotations are constant-time pointer operations. The payoff is enormous: guaranteed logarithmic performance regardless of input order.
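The two rotation primitives can be sketched directly; each is a handful of pointer updates (a minimal key-only node class is included so the snippet runs on its own):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_left(x):
    # Promote x's right child y; x becomes y's left child.
    y = x.right
    x.right = y.left
    y.left = x
    return y  # new subtree root

def rotate_right(y):
    # Mirror image: promote y's left child x.
    x = y.left
    y.left = x.right
    x.right = y
    return x
```

Both preserve the BST invariant: an in-order traversal yields the same key sequence before and after a rotation.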
If insertion order determines tree shape, and random order gives O(log n) expected height, can we simulate random order?
Randomly permute the keys before insertion. Simple but requires all keys up front (offline).
On inserting a key into a subtree of size n, with probability 1/(n+1) insert it at the root of the subtree (using rotations). Otherwise, recurse normally.
```python
from random import random

def rand_insert(node, key):
    if node is None:
        return Node(key)
    # With probability 1/(subtree size + 1), place key at this root.
    if random() < 1.0 / (size(node) + 1):
        return insert_at_root(node, key)
    if key < node.key:
        node.left = rand_insert(node.left, key)
    else:
        node.right = rand_insert(node.right, key)
    return node
```
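The sketch above leans on `size` and `insert_at_root`, which the text does not show. Plausible versions follow (insert_at_root inserts as usual, then rotates the new key up to the subtree root; rotation helpers are re-declared so the block stands alone):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    return x

def size(node):
    # O(n) here; storing a size field per node makes this O(1).
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)

def insert_at_root(node, key):
    # Insert recursively, then rotate the new node up to the root.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert_at_root(node.left, key)
        return rotate_right(node)
    else:
        node.right = insert_at_root(node.right, key)
        return rotate_left(node)
```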
Expected height: O(log n)

Randomised BSTs are elegant in theory. In practice, treaps achieve the same probabilistic guarantees with a cleaner implementation.
A treap is a BST where each node has a key (BST order) and a random priority (heap order). The tree simultaneously satisfies:

- BST property on keys: left subtree < node < right subtree
- Max-heap property on priorities: a node's priority is at least its children's
| Operation | Method | Time |
|---|---|---|
| Insert | BST insert, rotate up to restore heap | O(log n) |
| Delete | Set priority → -∞, rotate to leaf, remove | O(log n) |
| Split | Insert key with priority ∞ | O(log n) |
| Merge | Compare root priorities, recurse | O(log n) |
Treaps are simpler to implement than AVL or Red-Black trees. Used in competitive programming and functional data structures.
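A sketch of treap insertion under these rules, with random priorities from the standard library (rotation helpers re-declared so the block stands alone):

```python
import random

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random heap priority
        self.left = None
        self.right = None

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    return x

def treap_insert(node, key):
    # BST insert, then rotate the child up whenever it out-prioritises
    # its parent (restoring the max-heap property on priorities).
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = treap_insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = treap_insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)
```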
An augmented BST stores additional information in each node to support extra queries efficiently. The augmented data must be maintainable during rotations in O(1).
Each node stores size — the number of nodes in its subtree (including itself).
| Operation | Description | Time |
|---|---|---|
| `select(k)` | Find the k-th smallest element | O(log n) |
| `rank(key)` | Find the rank (position) of a key | O(log n) |
```python
def select(node, k):
    # size(x): subtree size stored in each node (0 for None)
    left_size = size(node.left)
    if k == left_size + 1:
        return node
    elif k <= left_size:
        return select(node.left, k)
    else:
        return select(node.right, k - left_size - 1)
```
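rank is the inverse of select. A self-contained sketch with a size-augmented node (field and helper names are illustrative), maintaining the size field during insertion:

```python
class SNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # subtree size, maintained on insert

def size(node):
    return node.size if node else 0

def os_insert(node, key):
    # Plain BST insert that keeps each size field up to date.
    if node is None:
        return SNode(key)
    if key < node.key:
        node.left = os_insert(node.left, key)
    elif key > node.key:
        node.right = os_insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def rank(node, key):
    # 1-based position of key in sorted order (assumes key is present).
    if node is None:
        return 0
    if key < node.key:
        return rank(node.left, key)
    elif key > node.key:
        return size(node.left) + 1 + rank(node.right, key)
    else:
        return size(node.left) + 1
```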
Given a set of intervals [lo, hi], efficiently find all intervals that overlap a query interval or point.
Each node stores an interval [lo, hi] as its key (ordered by lo), plus max_hi — the maximum hi value in its subtree.
```python
def overlaps(a, b):
    # Closed intervals (lo, hi) overlap iff each starts before the other ends.
    return a[0] <= b[1] and b[0] <= a[1]

def query_overlap(node, lo, hi):
    # node.interval is a (lo, hi) pair; node.max_hi is the subtree maximum hi.
    if node is None:
        return []
    results = []
    if overlaps(node.interval, (lo, hi)):
        results.append(node.interval)
    # Descend left only if some interval there reaches the query start.
    if node.left and node.left.max_hi >= lo:
        results += query_overlap(node.left, lo, hi)
    # Right-subtree intervals all start after this node's lo, so they
    # can only match if that lo has not already passed the query end.
    if node.right and node.interval[0] <= hi:
        results += query_overlap(node.right, lo, hi)
    return results
```
- Calendar conflict detection
- Overlapping gene regions
- Segment intersection
- IP range lookups
Overlap queries: O(log n + k) where k = matches. Without augmentation: O(n).
An AVL tree (Adelson-Velsky and Landis, 1962) is a BST where for every node, the heights of the left and right subtrees differ by at most 1.
```
balance_factor(node) = height(left) - height(right)
# Valid values: -1, 0, +1
# Any other value triggers rebalancing
```
- Height bound: h < 1.44 × log2(n + 2)
- Search: O(log n) worst case
- Insert: O(log n) — at most 2 rotations
- Delete: O(log n) — up to O(log n) rotations

| Imbalance | Rotation |
|---|---|
| Left-Left | Right rotation (single) |
| Right-Right | Left rotation (single) |
| Left-Right | Left then right (double) |
| Right-Left | Right then left (double) |
AVL trees provide stricter balance than Red-Black trees — searches are slightly faster. Trade-off: more rotations on delete.
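The four-case dispatch in the table can be driven by the balance factor; a sketch with heights stored per node (an empty subtree has height 0 here, so a leaf has height 1):

```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

def h(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))
    return node

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    return update(x)

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    return update(y)

def balance(node):
    return h(node.left) - h(node.right)

def rebalance(node):
    update(node)
    if balance(node) > 1:
        if balance(node.left) < 0:           # Left-Right: double
            node.left = rotate_left(node.left)
        return rotate_right(node)            # Left-Left: single
    if balance(node) < -1:
        if balance(node.right) > 0:          # Right-Left: double
            node.right = rotate_right(node.right)
        return rotate_left(node)             # Right-Right: single
    return node

def avl_insert(node, key):
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = avl_insert(node.left, key)
    elif key > node.key:
        node.right = avl_insert(node.right, key)
    return rebalance(node)
```

Inserting the keys 1..7 in sorted order, which would degenerate a plain BST, leaves this tree perfectly balanced with 4 at the root.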
A Red-Black tree colours each node red or black so that the longest root-to-leaf path (alternating red and black) is at most 2× the shortest (all black):
h ≤ 2 × log2(n + 1)
Red-Black trees are the default balanced BST in most standard libraries: std::map (C++), TreeMap (Java), SortedDictionary (C#).
| Feature | AVL | Red-Black |
|---|---|---|
| Height bound | 1.44 log n | 2 log n |
| Search speed | Slightly faster | Slightly slower |
| Insert rotations | Up to 2 | Up to 2 |
| Delete rotations | O(log n) | Up to 3 |
| Best for | Read-heavy | Insert/delete-heavy |
A self-adjusting BST that moves the most recently accessed node to the root via rotations called splaying. No balance metadata stored.
| Pattern | Name | Action |
|---|---|---|
| Child of root | Zig | Single rotation |
| Same side (LL/RR) | Zig-zig | Rotate parent first, then node |
| Opposite side (LR/RL) | Zig-zag | Rotate node twice |
- Access: O(log n) amortised
- m operations on an n-node tree: O(m × log n) total
- Insert/delete: O(log n) amortised
- No height, colour, size, or priority fields needed
Frequently accessed elements stay near the root — automatically adapts to access patterns.
Accessing k distinct elements in a window costs O(log k) amortised, not O(log n).
Splay trees excel in caches and access-pattern-skewed workloads. Suboptimal for uniform random access due to constant restructuring.
Provide an iterator that returns BST elements in sorted (in-order) order, using O(h) space and O(1) amortised time per next() call.
```python
class BSTIterator:
    def __init__(self, root):
        self.stack = []
        self._push_left(root)

    def _push_left(self, node):
        while node:
            self.stack.append(node)
            node = node.left

    def has_next(self):
        return len(self.stack) > 0

    def next(self):
        node = self.stack.pop()
        self._push_left(node.right)
        return node.key
```
| Metric | Value |
|---|---|
| Space | O(h) — stack holds at most one node per level |
| Amortised `next()` | O(1) — each node pushed and popped once |
| Worst single `next()` | O(h) — descending a left spine |
Morris traversal uses O(1) extra space by temporarily modifying tree pointers (threading). It restores the tree after traversal and is not suitable for concurrent access.
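The threading idea is the classic Morris in-order traversal; a sketch (minimal Node included so the block runs on its own):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def morris_inorder(root):
    # O(1) extra space: thread each left subtree's rightmost node to
    # its in-order successor, then remove the thread after visiting.
    result = []
    node = root
    while node:
        if node.left is None:
            result.append(node.key)
            node = node.right
        else:
            # Find node's in-order predecessor in its left subtree.
            pred = node.left
            while pred.right and pred.right is not node:
                pred = pred.right
            if pred.right is None:
                pred.right = node       # create thread
                node = node.left
            else:
                pred.right = None       # remove thread, restore tree
                result.append(node.key)
                node = node.right
    return result
```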
The stack-based iterator is a classic interview question and the standard implementation in most BST libraries. It decouples traversal from processing.
BSTs implement the ordered symbol table ADT: put, get, delete, min, max, floor, ceiling, rank, select.
Used in compilers, interpreters, configuration stores.
B-trees (a generalisation of BSTs with high branching factor) are the default index structure in all major relational databases.
BST concepts — invariant, rotations, balance — directly underpin B-tree operations.
BSTs on strings support prefix-based lookup. "What words come between X and Y?" in O(log n + k).
Tries are a related ordered-tree structure specialised for string keys.
BSTs are not just an academic exercise. They are the backbone of ordered data access in virtually every system you use.
- Plain BSTs: O(log n) search, insert, and delete — but only when balanced
- Degenerate trees: O(n) operations — insertion order is everything
- Self-balancing trees: O(log n) guarantees at minimal overhead
- Randomised BSTs and treaps: expected O(log n) height
- Splay trees: amortised O(log n) through access-driven restructuring
- Iterators: O(h) space, O(1) amortised traversal

| Source | Description |
|---|---|
| Sedgewick & Wayne | Algorithms, 4th ed. — definitive BST treatment with Java implementations |
| CLRS | Introduction to Algorithms — formal proofs for Red-Black trees, augmented BSTs |
| Skiena | The Algorithm Design Manual — practical perspective on when to use which tree |
| Sleator & Tarjan | "Self-Adjusting Binary Search Trees" — the original splay tree paper (1985) |
| MIT 6.006 | Erik Demaine's lectures on BSTs — free on YouTube, excellent visualisations |