COMPUTER SCIENCE FUNDAMENTALS SERIES

Divide and Conquer

Master theorem · Merge sort · Quick sort ·
Strassen's algorithm · Closest pair · FFT
Mid-level software engineer track  ·  20 slides
02

The Paradigm: Divide, Conquer, Combine

Three steps

Every divide-and-conquer algorithm follows the same recursive blueprint:

1 Divide — break the problem into smaller, independent subproblems of the same type
2 Conquer — solve each subproblem recursively; if small enough, solve directly (base case)
3 Combine — merge the subproblem solutions into a solution for the original problem

When to use D&C

  • Problem has optimal substructure — solution builds from sub-solutions
  • Subproblems are independent — no shared mutable state
  • Combining is efficient — merging cost must not dominate
  • Examples: sorting, searching, geometric problems, algebraic transforms

D&C vs Dynamic Programming: both decompose problems. D&C subproblems are independent (no overlap); DP subproblems overlap and require memoisation.

03

Recurrence Relations

Defining runtime recursively

A D&C algorithm with a subproblems of size n/b and O(n^d) combine work has recurrence:

T(n) = a · T(n/b) + O(n^d)

Solution methods

  • Substitution — guess the form, prove by induction
  • Recursion tree — draw the tree, sum work per level
  • Master theorem — closed-form for T(n) = aT(n/b) + O(n^d)

Common recurrence examples

Algorithm       Recurrence               Solution
Binary search   T(n) = T(n/2) + O(1)     O(log n)
Merge sort      T(n) = 2T(n/2) + O(n)    O(n log n)
Karatsuba       T(n) = 3T(n/2) + O(n)    O(n^1.585)
Strassen        T(n) = 7T(n/2) + O(n^2)  O(n^2.807)
Naive multiply  T(n) = 4T(n/2) + O(n)    O(n^2)
04

The Master Theorem

For recurrence T(n) = a · T(n/b) + O(n^d) where a ≥ 1, b > 1, d ≥ 0:

Case 1 — Leaves dominate

d < log_b(a)

T(n) = O(n^(log_b a))

Recursion branches faster than work shrinks

Case 2 — Balanced

d = log_b(a)

T(n) = O(n^d · log n)

Each level does equal work; log n levels total

Case 3 — Root dominates

d > log_b(a)

T(n) = O(n^d)

Combine cost dwarfs recursive cost

Worked examples

Algorithm      a, b, d  log_b(a)  Case    Result
Merge sort     2, 2, 1  1         Case 2  O(n log n)
Binary search  1, 2, 0  0         Case 2  O(log n)
Strassen       7, 2, 2  2.807     Case 1  O(n^2.807)
Karatsuba      3, 2, 1  1.585     Case 1  O(n^1.585)
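
A toy Python classifier for the three cases (the function name is illustrative; float equality is fine for these small textbook inputs):

import math

def master(a, b, d):
    """Classify T(n) = a·T(n/b) + O(n^d) and return its solution."""
    crit = math.log(a, b)                    # critical exponent log_b(a)
    if d < crit:
        return f"Case 1: O(n^{crit:.3f})"    # leaves dominate
    if d == crit:
        return f"Case 2: O(n^{d} log n)"     # balanced
    return f"Case 3: O(n^{d})"               # root dominates

# master(2, 2, 1) -> 'Case 2: O(n^1 log n)'  (merge sort)
# master(7, 2, 2) -> 'Case 1: O(n^2.807)'    (Strassen)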

The Master Theorem does not cover all recurrences. Akra–Bazzi generalises it to unequal subproblem sizes.

05

Merge Sort

The canonical D&C sort

1 Divide — split array into two halves
2 Conquer — recursively sort each half
3 Combine — merge two sorted halves in O(n)

Properties

  • Time: O(n log n) worst, average, and best case
  • Space: O(n) auxiliary (merge buffer)
  • Stable: yes — equal elements preserve original order
  • Not in-place: requires extra memory proportional to input

Pseudocode

merge_sort(A, lo, hi):              # sorts A[lo..hi), hi exclusive
    if hi - lo <= 1: return         # base case: 0 or 1 elements
    mid = (lo + hi) // 2            # integer division
    merge_sort(A, lo, mid)
    merge_sort(A, mid, hi)
    merge(A, lo, mid, hi)

merge(A, lo, mid, hi):
    L = A[lo..mid]                  # copy left half
    R = A[mid..hi]                  # copy right half
    i = j = 0, k = lo
    while i < |L| and j < |R|:
        if L[i] <= R[j]:            # <= keeps the sort stable
            A[k++] = L[i++]
        else:
            A[k++] = R[j++]
    copy remaining L or R into A
06

Merge Sort — Analysis

Recursion tree

Level 0:     n work  (1 merge of n)
Level 1:     n work  (2 merges of n/2)
Level 2:     n work  (4 merges of n/4)
  ...
Level log n: n work  (n merges of 1)

Total: n work per level × (log n + 1) levels = O(n log n)

Inversion counting

Merge sort can count inversions during the merge step. When a right-side element is placed before the remaining left-side elements, it forms an inversion with every element still in L, so add |L| - i to the running count (see the sketch below).
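
A minimal Python sketch (the function name is illustrative):

def sort_and_count(A):
    """Return (sorted copy of A, number of inversions in A)."""
    if len(A) <= 1:
        return A, 0
    mid = len(A) // 2
    left, inv_left = sort_and_count(A[:mid])
    right, inv_right = sort_and_count(A[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            inv += len(left) - i    # right[j] jumps the remaining left elements
            merged.append(right[j]); j += 1
    merged += left[i:]
    merged += right[j:]
    return merged, inv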

Practical variants

Timsort

Python and Java's hybrid: natural merge sort + insertion sort for small runs. Exploits partially-sorted input for O(n) best case.

Bottom-up merge sort

Iterative variant. Merge pairs of size 1, then 2, then 4, ... Better cache locality, no recursion overhead.

External merge sort

Standard algorithm for sorting data that does not fit in RAM. Split into sorted runs, merge runs from disk using a priority queue.

Merge sort is the comparison sort baseline — guaranteed O(n log n) regardless of input distribution.

07

Quick Sort

Algorithm

1 Divide — choose a pivot, partition the array around it
2 Conquer — recursively sort elements less than and greater than pivot
3 Combine — nothing; partitioning is in-place

Complexity

Case     Time        When
Best     O(n log n)  Balanced partitions
Average  O(n log n)  Random pivot
Worst    O(n^2)      Sorted input + bad pivot

Pseudocode (Lomuto)

quicksort(A, lo, hi):
    if lo < hi:
        p = partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)

Why Quick Sort wins in practice

  • In-place: O(log n) stack space vs O(n) for merge sort
  • Cache-friendly: sequential access during partitioning
  • Small constant: fewer data movements on average
  • Randomised pivot eliminates adversarial worst case
08

Partition Schemes — Lomuto & Hoare

Lomuto partition

partition(A, lo, hi):
  pivot = A[hi]
  i = lo
  for j = lo to hi-1:
    if A[j] <= pivot:
      swap(A[i], A[j])
      i++
  swap(A[i], A[hi])
  return i
  • Simple, easy to prove correct
  • ~n comparisons, up to n swaps
  • Poor on many duplicates

Hoare partition

partition(A, lo, hi):
  pivot = A[lo]
  i = lo-1, j = hi+1
  loop:
    do i++ while A[i]<pivot
    do j-- while A[j]>pivot
    if i >= j: return j
    swap(A[i], A[j])
  • ~n/2 swaps on average
  • Does not place pivot in final position
  • Better cache behaviour

Three-way (DNF)

Partitions into < pivot, = pivot, > pivot. Essential when many duplicates are present.

Three pointers lo, mid, hi; scan with mid:
    A[mid] < pivot → swap A[lo], A[mid]; advance lo and mid
    A[mid] = pivot → advance mid
    A[mid] > pivot → swap A[mid], A[hi]; retreat hi

Used by std::sort variants and pdqsort.
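
A Python sketch of the three-way partition (inclusive bounds; names are illustrative):

def partition3(A, lo, hi, pivot):
    """Dutch-national-flag partition of A[lo..hi] around pivot.
    Afterwards: A[lo:lt] < pivot, A[lt:gt+1] == pivot, A[gt+1:hi+1] > pivot."""
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if A[i] < pivot:
            A[lt], A[i] = A[i], A[lt]
            lt += 1; i += 1
        elif A[i] > pivot:
            A[i], A[gt] = A[gt], A[i]
            gt -= 1                 # swapped-in element is still unexamined
        else:
            i += 1
    return lt, gt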

09

Binary Search & Variants

Classic binary search

binary_search(A, target):
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2   # overflow-safe midpoint
        if A[mid] == target: return mid
        elif A[mid] < target: lo = mid + 1
        else: hi = mid - 1
    return -1

Time: O(log n) — recurrence T(n) = T(n/2) + O(1), Master Case 2

Space: O(1) iterative, O(log n) recursive

Off-by-one pitfalls

  • Use lo + (hi - lo) / 2 not (lo + hi) / 2 to avoid overflow
  • Clarify inclusive vs exclusive bounds before coding
  • Test on arrays of size 0, 1, 2 — most bugs hide there

Variants

Variant               Description
Lower bound           First index where A[i] ≥ target
Upper bound           First index where A[i] > target
Exponential           Double the range, then binary search — O(log k) for target index k
Interpolation         Estimate position from value distribution — O(log log n) on uniform data
Fractional cascading  Search multiple sorted lists — amortised O(log n + k)
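
A Python sketch of the two bound variants over half-open ranges (these match bisect_left and bisect_right in the standard library):

def lower_bound(A, target):
    """First index i with A[i] >= target (len(A) if none)."""
    lo, hi = 0, len(A)              # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(A, target):
    """First index i with A[i] > target (len(A) if none)."""
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
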
10

Strassen's Matrix Multiplication

Strassen's insight (1969)

Divide each n×n matrix into four n/2 × n/2 submatrices. Naive block multiply needs 8 recursive multiplications. Strassen reduces this to 7.

The seven products

M1 = (A11 + A22)(B11 + B22)
M2 = (A21 + A22) B11
M3 = A11 (B12 - B22)
M4 = A22 (B21 - B11)
M5 = (A11 + A12) B22
M6 = (A21 - A11)(B11 + B12)
M7 = (A12 - A22)(B21 + B22)

The result blocks are computed from M1..M7 using only additions and subtractions:
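
C11 = M1 + M4 - M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 - M2 + M3 + M6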

Complexity comparison

Method       Multiplications  Time
Naive        8 per level      O(n^3)
Strassen     7 per level      O(n^2.807)
CW variants  —                O(n^2.372) (galactic)

Strassen is practical for n ≥ 64 or so. Below that threshold, the constant factor from extra additions makes naive faster. Libraries like BLAS switch strategies based on matrix size.

11

Closest Pair of Points

Problem

Given n points in the plane, find the pair with minimum Euclidean distance. Brute force: O(n^2).

D&C approach — O(n log n)

  1. Sort points by x-coordinate
  2. Divide — split into left and right halves by median x
  3. Conquer — recursively find closest pair in each half: dL, dR
  4. Combine — let d = min(dL, dR). Check pairs straddling the dividing line within a strip of width 2d
T(n) = 2T(n/2) + O(n)  →  O(n log n)

The strip trick

[Diagram: left and right halves separated by the dividing line, with a vertical strip of width 2d around it]

For each point in the strip, compare to at most 7 subsequent points (sorted by y). Strip processing is O(n), not O(n2).
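
A Python sketch of the strip scan, assuming strip is a list of (x, y) tuples already sorted by y and d is min(dL, dR):

import math

def strip_closest(strip, d):
    """Minimum distance among strip points, floored at baseline d."""
    best = d
    for i, p in enumerate(strip):
        j = i + 1
        # points further than `best` away in y cannot improve the answer,
        # so this inner loop examines O(1) candidates (at most 7)
        while j < len(strip) and strip[j][1] - p[1] < best:
            best = min(best, math.dist(p, strip[j]))
            j += 1
    return best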

12

Karatsuba Multiplication

Problem

Multiply two n-digit integers. School algorithm: O(n^2).

Karatsuba's trick (1960)

Split each number into high and low halves:

x = x_H · B^m + x_L
y = y_H · B^m + y_L

Naive: 4 multiplications of n/2-digit numbers. Karatsuba uses 3:

z0 = x_L · y_L
z2 = x_H · y_H
z1 = (x_L + x_H)(y_L + y_H) - z0 - z2

x·y = z2·B^2m + z1·B^m + z0
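
A minimal Python sketch over base-10 integers (real implementations split big integers limb-wise; the string-length split here is purely illustrative):

def karatsuba(x, y):
    """Multiply non-negative integers using 3 recursive products."""
    if x < 10 or y < 10:                       # base case: single digit
        return x * y
    m = max(len(str(x)), len(str(y))) // 2     # split position
    B = 10 ** m
    x_hi, x_lo = divmod(x, B)
    y_hi, y_lo = divmod(y, B)
    z0 = karatsuba(x_lo, y_lo)
    z2 = karatsuba(x_hi, y_hi)
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi) - z0 - z2
    return z2 * B * B + z1 * B + z0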

Complexity

Recurrence: T(n) = 3T(n/2) + O(n)

Master Case 1: log_2(3) ≈ 1.585 > d = 1  →  O(n^1.585)

Multiplication algorithm hierarchy

Algorithm   Complexity            Crossover
Schoolbook  O(n^2)                n < 20–80 digits
Karatsuba   O(n^1.585)            20–80 digits
Toom-Cook   O(n^1.465)            ~100+ digits
FFT-based   O(n log n log log n)  ~10,000+ digits

Python's int and GMP chain these algorithms as n grows.

13

Maximum Subarray Problem

D&C approach — O(n log n)

  1. Divide — split array at midpoint
  2. Conquer — recursively find max subarray in left and right halves
  3. Combine — find max subarray crossing the midpoint (linear scan)
  4. Return maximum of left, right, and crossing
T(n) = 2T(n/2) + O(n)  →  O(n log n)
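
A Python sketch of the D&C version (inclusive bounds; assumes a non-empty array):

def max_subarray(A, lo, hi):
    """Maximum subarray sum within A[lo..hi]."""
    if lo == hi:
        return A[lo]
    mid = (lo + hi) // 2
    # best sum of a subarray ending exactly at mid, growing leftwards
    best_left = cur = A[mid]
    for i in range(mid - 1, lo - 1, -1):
        cur += A[i]
        best_left = max(best_left, cur)
    # best sum of a subarray starting exactly at mid + 1, growing rightwards
    best_right = cur = A[mid + 1]
    for i in range(mid + 2, hi + 1):
        cur += A[i]
        best_right = max(best_right, cur)
    return max(max_subarray(A, lo, mid),
               max_subarray(A, mid + 1, hi),
               best_left + best_right)    # crossing subarray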

Kadane's algorithm — O(n)

kadane(A):
    max_here = max_so_far = A[0]
    for i = 1 to n-1:
        max_here = max(A[i],
                       max_here + A[i])
        max_so_far = max(max_so_far,
                         max_here)
    return max_so_far

Single pass, O(1) space. Essentially dynamic programming.

Comparison

                D&C                        Kadane
Time            O(n log n)                 O(n)
Parallelism     Naturally parallel         Sequential
Teaching value  Demonstrates combine step  Demonstrates DP

The maximum subarray is a case where D&C is instructive but not optimal. Kadane wins for serial execution; D&C wins for parallel.

14

Median of Medians / Selection

The selection problem

Find the k-th smallest element in an unsorted array.

Quickselect — expected O(n)

Partition around a random pivot. Recurse only into the side containing index k. Expected O(n), worst case O(n2).
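
A minimal Python sketch with a random pivot (it copies sublists for clarity; an in-place version partitions like quicksort):

import random

def quickselect(A, k):
    """Return the k-th smallest element of A (0-indexed), expected O(n)."""
    pivot = random.choice(A)
    less  = [x for x in A if x < pivot]
    equal = [x for x in A if x == pivot]
    if k < len(less):
        return quickselect(less, k)
    if k < len(less) + len(equal):
        return pivot
    greater = [x for x in A if x > pivot]
    return quickselect(greater, k - len(less) - len(equal))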

Median of medians — worst-case O(n)

  1. Divide array into groups of 5
  2. Find the median of each group (brute force)
  3. Recursively find the median of those medians → pivot
  4. Partition around this pivot
  5. Recurse into the correct side

Why groups of 5?

The pivot is guaranteed to be between the 30th and 70th percentile. Each recursive call eliminates at least 30% of elements:

T(n) = T(n/5) + T(7n/10) + O(n)

This solves to O(n) because 1/5 + 7/10 = 9/10 < 1.

In practice

std::nth_element in C++ uses Introselect: Quickselect with median-of-medians fallback.

The constant factor is large — Quickselect with random pivot is faster in practice. Median of medians is primarily of theoretical importance: it proves linear selection is possible.

15

FFT — Fast Fourier Transform

The Discrete Fourier Transform (DFT)

Given a sequence of n complex numbers, the DFT computes n frequency components. Naive evaluation: O(n^2).

Cooley–Tukey FFT (1965)

Divide the DFT into two half-size DFTs on even-indexed and odd-indexed elements:

X[k]     = E[k] + ω^k · O[k]
X[k+n/2] = E[k] - ω^k · O[k]

where ω = e^{-2πi/n}
E = DFT of even elements
O = DFT of odd elements
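
A recursive radix-2 sketch in Python (n must be a power of two; production FFTs are iterative, in-place, and use precomputed twiddle tables):

import cmath

def fft(a):
    """Cooley–Tukey DFT of a; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    E = fft(a[0::2])                            # DFT of even-indexed elements
    O = fft(a[1::2])                            # DFT of odd-indexed elements
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor ω^k
        out[k]          = E[k] + w * O[k]
        out[k + n // 2] = E[k] - w * O[k]
    return out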

Complexity

T(n) = 2T(n/2) + O(n)  →  O(n log n)

Master theorem Case 2. Reduces the DFT from O(n^2) to O(n log n).

The butterfly operation

[Diagram: butterfly. Inputs a and b produce outputs a + ωb and a − ωb via the twiddle factor ω^k]

One addition and one subtraction, multiplied by a twiddle factor. Maps naturally to hardware and SIMD.

16

FFT — Applications & Complexity

Applications

  • Polynomial multiplication — multiply two degree-n polynomials in O(n log n) instead of O(n^2) (see the sketch after this list)
  • Big integer multiplication — Schönhage–Strassen uses FFT for O(n log n log log n)
  • Signal processing — spectral analysis, filtering, convolution
  • Audio/image compression — MP3, JPEG rely on DCT (closely related)
  • String matching — convolution-based pattern matching
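
Polynomial multiplication is convolution of coefficient vectors. A sketch using NumPy's FFT (zero-padding avoids circular wrap-around; results are exact only up to floating-point rounding):

import numpy as np

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    size = len(p) + len(q) - 1          # number of product coefficients
    n = 1
    while n < size:                     # pad to a power of two
        n *= 2
    fp = np.fft.fft(p, n)               # evaluate at n roots of unity
    fq = np.fft.fft(q, n)
    coeffs = np.fft.ifft(fp * fq).real  # pointwise multiply, then interpolate
    return [round(c) for c in coeffs[:size]]

# poly_mul([1, 2], [3, 4]) -> [3, 10, 8]   i.e. (1 + 2x)(3 + 4x)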

Related transforms

Transform       Domain           Use
FFT             Complex numbers  General signal processing
NTT             Integers mod p   Exact polynomial arithmetic
DCT             Real numbers     Compression (JPEG, MP3)
Walsh–Hadamard  Binary/Boolean   Subset-sum convolutions

Number-Theoretic Transform (NTT)

FFT over finite fields (integers mod prime p) instead of complex numbers. Avoids floating-point errors entirely. Used in competitive programming and cryptographic polynomial multiplication.

17

Parallelism in D&C

Natural parallelism

D&C subproblems are independent — they can run on separate cores, threads, or machines without synchronisation until the combine step.

Fork-join model

fork-join quicksort(A, lo, hi):
    if hi-lo < THRESHOLD:
        insertion_sort(A, lo, hi)
        return
    p = partition(A, lo, hi)
    fork: quicksort(A, lo, p-1)
    fork: quicksort(A, p+1, hi)
    join

Work and span

Metric       Definition                Merge sort
Work W(n)    Total operations          O(n log n)
Span S(n)    Longest dependency chain  O(n)
Parallelism  W(n) / S(n)               O(log n)

Practical frameworks

Fork/Join — Java

RecursiveTask, work-stealing thread pool

TBB — C++

parallel_for, parallel_reduce

Cilk — C/C++

cilk_spawn, cilk_sync

Rayon — Rust

par_iter, work-stealing

multiprocessing — Python

Process pools (GIL workaround)
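
A one-level fork-join sketch in Python with concurrent.futures (a real implementation would recurse inside a work-stealing pool; two workers are enough to show the shape):

from concurrent.futures import ProcessPoolExecutor
from heapq import merge

def parallel_sort(A):
    """Sort the two halves in separate processes, then merge."""
    mid = len(A) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        left  = pool.submit(sorted, A[:mid])    # fork
        right = pool.submit(sorted, A[mid:])    # fork
        # .result() blocks until each half is done (join), then combine
        return list(merge(left.result(), right.result()))

if __name__ == "__main__":
    print(parallel_sort([5, 2, 9, 1, 7, 3]))    # [1, 2, 3, 5, 7, 9]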

Amdahl's Law: speedup is limited by the sequential fraction. In merge sort, the final merge is sequential and dominates at high core counts.

18

Common Pitfalls & Optimisations

Pitfalls

  • Stack overflow — deep recursion on large inputs. Use iterative bottom-up or increase stack size
  • Small subproblem overhead — recursive calls on tiny arrays waste function-call overhead. Switch to insertion sort below 16–32 elements
  • Worst-case pivot — deterministic Quick Sort on sorted input is O(n2). Always randomise or use median-of-three
  • Unnecessary copying — naive merge sort copies arrays at every level. Use index-based merging with a shared buffer
  • Integer overflow — (lo + hi) / 2 overflows for large arrays in fixed-width languages. Use lo + (hi - lo) / 2

Optimisations

Technique        Applies to     Benefit
Hybrid cutoff    All D&C sorts  Insertion sort for n < 16
Tail recursion   Quick Sort     O(log n) stack depth
Introsort        Quick Sort     Heap sort fallback at 2 log n depth
Bottom-up merge  Merge Sort     Better cache locality
Bit-reversal     FFT            In-place iterative FFT
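
A sketch combining the first two rows in Python, assuming the partition from slide 08 and an insertion_sort helper (the cutoff constant is illustrative):

CUTOFF = 16  # tune per platform

def hybrid_quicksort(A, lo, hi):
    """Quicksort with a small-array cutoff; recursing only on the
    smaller side keeps the stack depth O(log n)."""
    while hi - lo > CUTOFF:
        p = partition(A, lo, hi)
        if p - lo < hi - p:
            hybrid_quicksort(A, lo, p - 1)   # recurse on smaller left side
            lo = p + 1                       # loop on larger right side
        else:
            hybrid_quicksort(A, p + 1, hi)   # recurse on smaller right side
            hi = p - 1                       # loop on larger left side
    insertion_sort(A, lo, hi)                # finish the small range
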
19

D&C in Practice

Standard library sort implementations

Language  Algorithm                Notes
C++       Introsort                QS + Heap + Insertion
Python    Timsort                  Merge + Insertion; exploits runs
Java      Dual-pivot QS / Timsort  QS for primitives, Timsort for objects
Rust      pdqsort                  Pattern-defeating QS
Go        pdqsort (1.19+)          Replaced introsort

D&C beyond sorting

Domain          Algorithm              D&C idea
Geometry        Convex hull            Merge two convex hulls
Linear algebra  Strassen, FFT solvers  Block decomposition
Databases       External merge sort    Divide across disk pages
Distributed     MapReduce              Map = divide, Reduce = combine
Graphics        BSP trees, k-d trees   Spatial subdivision

If your problem decomposes into independent subproblems of the same type, D&C is likely the right paradigm.

20

Summary & Further Reading

Key takeaways

  • Divide and conquer is a fundamental paradigm: divide, conquer, combine
  • The Master Theorem provides closed-form O-notation for most D&C recurrences
  • Merge Sort is the stable O(n log n) baseline; Quick Sort wins in practice
  • Strassen, Karatsuba, and FFT beat "obvious" lower bounds via clever decomposition
  • Binary search is the simplest D&C — master its variants and off-by-one traps
  • D&C is naturally parallel: independent subproblems map to cores/machines
  • Real-world implementations are hybrid: switch strategies at small sizes and degenerate inputs

Recommended reading

CLRS — Introduction to Algorithms: chapters 2, 4, 7, 9, 33 cover all major D&C topics
Kleinberg & Tardos — Algorithm Design: excellent D&C chapter with closest pair, FFT
Sedgewick — Algorithms: practical Quick Sort and Merge Sort analysis
Jeff Erickson — jeffe.cs.illinois.edu: free textbook, outstanding recursion chapter
MIT 6.046 — Design and Analysis of Algorithms: D&C lectures on YouTube
