COMPUTER SCIENCE FUNDAMENTALS SERIES

Divide and Conquer

Master theorem · Merge sort · Quick sort ·
Strassen's algorithm · Closest pair · FFT
Mid-level software engineer track  ·  20 slides
02

The Paradigm: Divide, Conquer, Combine

Three steps

Every divide-and-conquer algorithm follows the same recursive blueprint:

1 Divide — break the problem into smaller, independent subproblems of the same type
2 Conquer — solve each subproblem recursively; if small enough, solve directly (base case)
3 Combine — merge the subproblem solutions into a solution for the original problem

When to use D&C

  • Problem has optimal substructure — solution builds from sub-solutions
  • Subproblems are independent — no shared mutable state
  • Combining is efficient — merging cost must not dominate
  • Examples: sorting, searching, geometric problems, algebraic transforms

D&C vs Dynamic Programming: both decompose problems. D&C subproblems are independent (no overlap); DP subproblems overlap and require memoisation.

03

Recurrence Relations

Defining runtime recursively

A D&C algorithm with a subproblems of size n/b and O(n^d) combine work has recurrence:

T(n) = a · T(n/b) + O(n^d)

Solution methods

  • Substitution — guess the form, prove by induction
  • Recursion tree — draw the tree, sum work per level
  • Master theorem — closed-form for T(n) = aT(n/b) + O(n^d)

Common recurrence examples

Algorithm       Recurrence               Solution
Binary search   T(n) = T(n/2) + O(1)     O(log n)
Merge sort      T(n) = 2T(n/2) + O(n)    O(n log n)
Karatsuba       T(n) = 3T(n/2) + O(n)    O(n^1.585)
Strassen        T(n) = 7T(n/2) + O(n^2)  O(n^2.807)
Naive multiply  T(n) = 4T(n/2) + O(n)    O(n^2)
04

The Master Theorem

For recurrence T(n) = a · T(n/b) + O(n^d) where a ≥ 1, b > 1, d ≥ 0:

Case 1 — Leaves dominate

d < log_b(a)

T(n) = O(n^(log_b a))

Recursion branches faster than work shrinks

Case 2 — Balanced

d = log_b(a)

T(n) = O(n^d · log n)

Each level does equal work; log n levels total

Case 3 — Root dominates

d > log_b(a)

T(n) = O(n^d)

Combine cost dwarfs recursive cost

Worked examples

Algorithm      a, b, d  log_b(a)  Case    Result
Merge sort     2, 2, 1  1         Case 2  O(n log n)
Binary search  1, 2, 0  0         Case 2  O(log n)
Strassen       7, 2, 2  2.807     Case 1  O(n^2.807)
Karatsuba      3, 2, 1  1.585     Case 1  O(n^1.585)
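
A toy Python classifier for the three cases (the function name is illustrative; float equality is fine for these small textbook inputs):

import math

def master(a, b, d):
    """Classify T(n) = a·T(n/b) + O(n^d) and return its solution."""
    crit = math.log(a, b)                    # critical exponent log_b(a)
    if d < crit:
        return f"Case 1: O(n^{crit:.3f})"    # leaves dominate
    if d == crit:
        return f"Case 2: O(n^{d} log n)"     # balanced
    return f"Case 3: O(n^{d})"               # root dominates

# master(2, 2, 1) -> 'Case 2: O(n^1 log n)'  (merge sort)
# master(7, 2, 2) -> 'Case 1: O(n^2.807)'    (Strassen)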

The Master Theorem does not cover all recurrences. Akra–Bazzi generalises it to unequal subproblem sizes.

05

Merge Sort

The canonical D&C sort

1 Divide — split array into two halves
2 Conquer — recursively sort each half
3 Combine — merge two sorted halves in O(n)

Properties

  • Time: O(n log n) worst, average, and best case
  • Space: O(n) auxiliary (merge buffer)
  • Stable: yes — equal elements preserve original order
  • Not in-place: requires extra memory proportional to input

Pseudocode

merge_sort(A, lo, hi):              # sorts A[lo..hi), hi exclusive
    if hi - lo <= 1: return         # base case: 0 or 1 elements
    mid = (lo + hi) // 2            # integer division
    merge_sort(A, lo, mid)
    merge_sort(A, mid, hi)
    merge(A, lo, mid, hi)

merge(A, lo, mid, hi):
    L = A[lo..mid]                  # copy left half
    R = A[mid..hi]                  # copy right half
    i = j = 0, k = lo
    while i < |L| and j < |R|:
        if L[i] <= R[j]:            # <= keeps the sort stable
            A[k++] = L[i++]
        else:
            A[k++] = R[j++]
    copy remaining L or R into A
06

Merge Sort — Analysis

Recursion tree

Level 0:     n work  (1 merge of n)
Level 1:     n work  (2 merges of n/2)
Level 2:     n work  (4 merges of n/4)
  ...
Level log n: n work  (n merges of 1)

Total: n work per level × (log n + 1) levels = O(n log n)

Inversion counting

Merge sort can count inversions during the merge step. When a right-side element is placed before the remaining left-side elements, it forms an inversion with every element still in L, so add |L| - i to the running count (see the sketch below).
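
A minimal Python sketch (the function name is illustrative):

def sort_and_count(A):
    """Return (sorted copy of A, number of inversions in A)."""
    if len(A) <= 1:
        return A, 0
    mid = len(A) // 2
    left, inv_left = sort_and_count(A[:mid])
    right, inv_right = sort_and_count(A[mid:])
    merged, inv = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            inv += len(left) - i    # right[j] jumps the remaining left elements
            merged.append(right[j]); j += 1
    merged += left[i:]
    merged += right[j:]
    return merged, inv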

Practical variants

Timsort

Python and Java's hybrid: natural merge sort + insertion sort for small runs. Exploits partially-sorted input for O(n) best case.

Bottom-up merge sort

Iterative variant. Merge pairs of size 1, then 2, then 4, ... Better cache locality, no recursion overhead.

External merge sort

Standard algorithm for sorting data that does not fit in RAM. Split into sorted runs, merge runs from disk using a priority queue.

Merge sort is the comparison sort baseline — guaranteed O(n log n) regardless of input distribution.

07

Quick Sort

Algorithm

1 Divide — choose a pivot, partition the array around it
2 Conquer — recursively sort elements less than and greater than pivot
3 Combine — nothing; partitioning is in-place

Complexity

Case     Time        When
Best     O(n log n)  Balanced partitions
Average  O(n log n)  Random pivot
Worst    O(n^2)      Sorted input + bad pivot

Pseudocode (Lomuto)

quicksort(A, lo, hi):
    if lo < hi:
        p = partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)

Why Quick Sort wins in practice

  • In-place: O(log n) stack space vs O(n) for merge sort
  • Cache-friendly: sequential access during partitioning
  • Small constant: fewer data movements on average
  • Randomised pivot eliminates adversarial worst case
08

Partition Schemes — Lomuto & Hoare

Lomuto partition

partition(A, lo, hi):
  pivot = A[hi]
  i = lo
  for j = lo to hi-1:
    if A[j] <= pivot:
      swap(A[i], A[j])
      i++
  swap(A[i], A[hi])
  return i
  • Simple, easy to prove correct
  • ~n comparisons, up to n swaps
  • Poor on many duplicates

Hoare partition

partition(A, lo, hi):
  pivot = A[lo]
  i = lo-1, j = hi+1
  loop:
    do i++ while A[i]<pivot
    do j-- while A[j]>pivot
    if i >= j: return j
    swap(A[i], A[j])
  • ~n/2 swaps on average
  • Does not place pivot in final position
  • Better cache behaviour

Three-way (DNF)

Partitions into < pivot, = pivot, > pivot. Essential when many duplicates are present.

Three pointers lo, mid, hi; scan with mid:
    A[mid] < pivot → swap A[lo], A[mid]; advance lo and mid
    A[mid] = pivot → advance mid
    A[mid] > pivot → swap A[mid], A[hi]; retreat hi

Used by std::sort variants and pdqsort.
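
A Python sketch of the three-way partition (inclusive bounds; names are illustrative):

def partition3(A, lo, hi, pivot):
    """Dutch-national-flag partition of A[lo..hi] around pivot.
    Afterwards: A[lo:lt] < pivot, A[lt:gt+1] == pivot, A[gt+1:hi+1] > pivot."""
    lt, i, gt = lo, lo, hi
    while i <= gt:
        if A[i] < pivot:
            A[lt], A[i] = A[i], A[lt]
            lt += 1; i += 1
        elif A[i] > pivot:
            A[i], A[gt] = A[gt], A[i]
            gt -= 1                 # swapped-in element is still unexamined
        else:
            i += 1
    return lt, gt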

09

Binary Search & Variants

Classic binary search

binary_search(A, target):
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2   # overflow-safe midpoint
        if A[mid] == target: return mid
        elif A[mid] < target: lo = mid + 1
        else: hi = mid - 1
    return -1

Time: O(log n) — recurrence T(n) = T(n/2) + O(1), Master Case 2

Space: O(1) iterative, O(log n) recursive

Off-by-one pitfalls

  • Use lo + (hi - lo) / 2 not (lo + hi) / 2 to avoid overflow
  • Clarify inclusive vs exclusive bounds before coding
  • Test on arrays of size 0, 1, 2 — most bugs hide there

Variants

Variant               Description
Lower bound           First index where A[i] ≥ target
Upper bound           First index where A[i] > target
Exponential           Double the range, then binary search — O(log k) for target index k
Interpolation         Estimate position from value distribution — O(log log n) on uniform data
Fractional cascading  Search multiple sorted lists — amortised O(log n + k)
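
A Python sketch of the two bound variants over half-open ranges (these match bisect_left and bisect_right in the standard library):

def lower_bound(A, target):
    """First index i with A[i] >= target (len(A) if none)."""
    lo, hi = 0, len(A)              # search the half-open range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(A, target):
    """First index i with A[i] > target (len(A) if none)."""
    lo, hi = 0, len(A)
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo
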
10

Strassen's Matrix Multiplication

Strassen's insight (1969)

Divide each n×n matrix into four n/2 × n/2 submatrices. Naive block multiply needs 8 recursive multiplications. Strassen reduces this to 7.

The seven products

M1 = (A11 + A22)(B11 + B22)
M2 = (A21 + A22) B11
M3 = A11 (B12 - B22)
M4 = A22 (B21 - B11)
M5 = (A11 + A12) B22
M6 = (A21 - A11)(B11 + B12)
M7 = (A12 - A22)(B21 + B22)

The result blocks are computed from M1..M7 using only additions and subtractions:
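
C11 = M1 + M4 - M5 + M7
C12 = M3 + M5
C21 = M2 + M4
C22 = M1 - M2 + M3 + M6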

Complexity comparison

Method       Multiplications  Time
Naive        8 per level      O(n^3)
Strassen     7 per level      O(n^2.807)
CW variants  —                O(n^2.372) (galactic)

Strassen is practical for n ≥ 64 or so. Below that threshold, the constant factor from extra additions makes naive faster. Libraries like BLAS switch strategies based on matrix size.

11

Closest Pair of Points

Problem

Given n points in the plane, find the pair with minimum Euclidean distance. Brute force: O(n^2).

D&C approach — O(n log n)

  1. Sort points by x-coordinate
  2. Divide — split into left and right halves by median x
  3. Conquer — recursively find closest pair in each half: dL, dR
  4. Combine — let d = min(dL, dR). Check pairs straddling the dividing line within a strip of width 2d
T(n) = 2T(n/2) + O(n)  →  O(n log n)

The strip trick

[Diagram: left and right halves separated by the dividing line, with a vertical strip of width 2d around it]

For each point in the strip, compare to at most 7 subsequent points (sorted by y). Strip processing is O(n), not O(n2).
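
A Python sketch of the strip scan, assuming strip is a list of (x, y) tuples already sorted by y and d is min(dL, dR):

import math

def strip_closest(strip, d):
    """Minimum distance among strip points, floored at baseline d."""
    best = d
    for i, p in enumerate(strip):
        j = i + 1
        # points further than `best` away in y cannot improve the answer,
        # so this inner loop examines O(1) candidates (at most 7)
        while j < len(strip) and strip[j][1] - p[1] < best:
            best = min(best, math.dist(p, strip[j]))
            j += 1
    return best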

12

Karatsuba Multiplication

Problem

Multiply two n-digit integers. School algorithm: O(n^2).

Karatsuba's trick (1960)

Split each number into high and low halves:

x = x_H · B^m + x_L
y = y_H · B^m + y_L

Naive: 4 multiplications of n/2-digit numbers. Karatsuba uses 3:

z0 = x_L · y_L
z2 = x_H · y_H
z1 = (x_L + x_H)(y_L + y_H) - z0 - z2

x·y = z2·B^2m + z1·B^m + z0
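
A minimal Python sketch over base-10 integers (real implementations split big integers limb-wise; the string-length split here is purely illustrative):

def karatsuba(x, y):
    """Multiply non-negative integers using 3 recursive products."""
    if x < 10 or y < 10:                       # base case: single digit
        return x * y
    m = max(len(str(x)), len(str(y))) // 2     # split position
    B = 10 ** m
    x_hi, x_lo = divmod(x, B)
    y_hi, y_lo = divmod(y, B)
    z0 = karatsuba(x_lo, y_lo)
    z2 = karatsuba(x_hi, y_hi)
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi) - z0 - z2
    return z2 * B * B + z1 * B + z0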

Complexity

Recurrence: T(n) = 3T(n/2) + O(n)

Master Case 1: log_2(3) ≈ 1.585 > d = 1  →  O(n^1.585)

Multiplication algorithm hierarchy

Algorithm   Complexity            Crossover
Schoolbook  O(n^2)                n < 20–80 digits
Karatsuba   O(n^1.585)            20–80 digits
Toom-Cook   O(n^1.465)            ~100+ digits
FFT-based   O(n log n log log n)  ~10,000+ digits

Python's int and GMP chain these algorithms as n grows.

13

Maximum Subarray Problem

D&C approach — O(n log n)

  1. Divide — split array at midpoint
  2. Conquer — recursively find max subarray in left and right halves
  3. Combine — find max subarray crossing the midpoint (linear scan)
  4. Return maximum of left, right, and crossing
T(n) = 2T(n/2) + O(n)  →  O(n log n)
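
A Python sketch of the D&C version (inclusive bounds; assumes a non-empty array):

def max_subarray(A, lo, hi):
    """Maximum subarray sum within A[lo..hi]."""
    if lo == hi:
        return A[lo]
    mid = (lo + hi) // 2
    # best sum of a subarray ending exactly at mid, growing leftwards
    best_left = cur = A[mid]
    for i in range(mid - 1, lo - 1, -1):
        cur += A[i]
        best_left = max(best_left, cur)
    # best sum of a subarray starting exactly at mid + 1, growing rightwards
    best_right = cur = A[mid + 1]
    for i in range(mid + 2, hi + 1):
        cur += A[i]
        best_right = max(best_right, cur)
    return max(max_subarray(A, lo, mid),
               max_subarray(A, mid + 1, hi),
               best_left + best_right)    # crossing subarray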

Kadane's algorithm — O(n)

kadane(A):
    max_here = max_so_far = A[0]
    for i = 1 to n-1:
        max_here = max(A[i],
                       max_here + A[i])
        max_so_far = max(max_so_far,
                         max_here)
    return max_so_far

Single pass, O(1) space. Essentially dynamic programming.

Comparison

                D&C                        Kadane
Time            O(n log n)                 O(n)
Parallelism     Naturally parallel         Sequential
Teaching value  Demonstrates combine step  Demonstrates DP

The maximum subarray is a case where D&C is instructive but not optimal. Kadane wins for serial execution; D&C wins for parallel.

14

Median of Medians / Selection

The selection problem

Find the k-th smallest element in an unsorted array.

Quickselect — expected O(n)

Partition around a random pivot. Recurse only into the side containing index k. Expected O(n), worst case O(n2).
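
A minimal Python sketch with a random pivot (it copies sublists for clarity; an in-place version partitions like quicksort):

import random

def quickselect(A, k):
    """Return the k-th smallest element of A (0-indexed), expected O(n)."""
    pivot = random.choice(A)
    less  = [x for x in A if x < pivot]
    equal = [x for x in A if x == pivot]
    if k < len(less):
        return quickselect(less, k)
    if k < len(less) + len(equal):
        return pivot
    greater = [x for x in A if x > pivot]
    return quickselect(greater, k - len(less) - len(equal))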

Median of medians — worst-case O(n)

  1. Divide array into groups of 5
  2. Find the median of each group (brute force)
  3. Recursively find the median of those medians → pivot
  4. Partition around this pivot
  5. Recurse into the correct side

Why groups of 5?

The pivot is guaranteed to be between the 30th and 70th percentile. Each recursive call eliminates at least 30% of elements:

T(n) = T(n/5) + T(7n/10) + O(n)

This solves to O(n) because 1/5 + 7/10 = 9/10 < 1.

In practice

std::nth_element in C++ uses Introselect: Quickselect with median-of-medians fallback.

The constant factor is large — Quickselect with random pivot is faster in practice. Median of medians is primarily of theoretical importance: it proves linear selection is possible.

15

FFT — Fast Fourier Transform

The Discrete Fourier Transform (DFT)

Given a sequence of n complex numbers, the DFT computes n frequency components. Naive evaluation: O(n^2).

Cooley–Tukey FFT (1965)

Divide the DFT into two half-size DFTs on even-indexed and odd-indexed elements:

X[k]     = E[k] + ω^k · O[k]
X[k+n/2] = E[k] - ω^k · O[k]

where ω = e^{-2πi/n}
E = DFT of even elements
O = DFT of odd elements
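
A recursive radix-2 sketch in Python (n must be a power of two; production FFTs are iterative, in-place, and use precomputed twiddle tables):

import cmath

def fft(a):
    """Cooley–Tukey DFT of a; len(a) must be a power of 2."""
    n = len(a)
    if n == 1:
        return list(a)
    E = fft(a[0::2])                            # DFT of even-indexed elements
    O = fft(a[1::2])                            # DFT of odd-indexed elements
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor ω^k
        out[k]          = E[k] + w * O[k]
        out[k + n // 2] = E[k] - w * O[k]
    return out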

Complexity

T(n) = 2T(n/2) + O(n)  →  O(n log n)

Master theorem Case 2. Reduces the DFT from O(n^2) to O(n log n).

The butterfly operation

[Diagram: butterfly. Inputs a and b produce outputs a + ωb and a − ωb via the twiddle factor ω^k]

One addition and one subtraction, multiplied by a twiddle factor. Maps naturally to hardware and SIMD.

16

FFT — Applications & Complexity

Applications

  • Polynomial multiplication — multiply two degree-n polynomials in O(n log n) instead of O(n^2) (see the sketch after this list)
  • Big integer multiplication — Schönhage–Strassen uses FFT for O(n log n log log n)
  • Signal processing — spectral analysis, filtering, convolution
  • Audio/image compression — MP3, JPEG rely on DCT (closely related)
  • String matching — convolution-based pattern matching
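
Polynomial multiplication is convolution of coefficient vectors. A sketch using NumPy's FFT (zero-padding avoids circular wrap-around; results are exact only up to floating-point rounding):

import numpy as np

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    size = len(p) + len(q) - 1          # number of product coefficients
    n = 1
    while n < size:                     # pad to a power of two
        n *= 2
    fp = np.fft.fft(p, n)               # evaluate at n roots of unity
    fq = np.fft.fft(q, n)
    coeffs = np.fft.ifft(fp * fq).real  # pointwise multiply, then interpolate
    return [round(c) for c in coeffs[:size]]

# poly_mul([1, 2], [3, 4]) -> [3, 10, 8]   i.e. (1 + 2x)(3 + 4x)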

Related transforms

Transform       Domain           Use
FFT             Complex numbers  General signal processing
NTT             Integers mod p   Exact polynomial arithmetic
DCT             Real numbers     Compression (JPEG, MP3)
Walsh–Hadamard  Binary/Boolean   Subset-sum convolutions

Number-Theoretic Transform (NTT)

FFT over finite fields (integers mod prime p) instead of complex numbers. Avoids floating-point errors entirely. Used in competitive programming and cryptographic polynomial multiplication.

17

Parallelism in D&C

Natural parallelism

D&C subproblems are independent — they can run on separate cores, threads, or machines without synchronisation until the combine step.

Fork-join model

fork-join quicksort(A, lo, hi):
    if hi-lo < THRESHOLD:
        insertion_sort(A, lo, hi)
        return
    p = partition(A, lo, hi)
    fork: quicksort(A, lo, p-1)
    fork: quicksort(A, p+1, hi)
    join

Work and span

Metric       Definition                Merge sort
Work W(n)    Total operations          O(n log n)
Span S(n)    Longest dependency chain  O(n)
Parallelism  W(n) / S(n)               O(log n)

Practical frameworks

Fork/Join — Java

RecursiveTask, work-stealing thread pool

TBB — C++

parallel_for, parallel_reduce

Cilk — C/C++

cilk_spawn, cilk_sync

Rayon — Rust

par_iter, work-stealing

multiprocessing — Python

Process pools (GIL workaround)
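
A one-level fork-join sketch in Python with concurrent.futures (a real implementation would recurse inside a work-stealing pool; two workers are enough to show the shape):

from concurrent.futures import ProcessPoolExecutor
from heapq import merge

def parallel_sort(A):
    """Sort the two halves in separate processes, then merge."""
    mid = len(A) // 2
    with ProcessPoolExecutor(max_workers=2) as pool:
        left  = pool.submit(sorted, A[:mid])    # fork
        right = pool.submit(sorted, A[mid:])    # fork
        # .result() blocks until each half is done (join), then combine
        return list(merge(left.result(), right.result()))

if __name__ == "__main__":
    print(parallel_sort([5, 2, 9, 1, 7, 3]))    # [1, 2, 3, 5, 7, 9]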

Amdahl's Law: speedup is limited by the sequential fraction. In merge sort, the final merge is sequential and dominates at high core counts.

18

Common Pitfalls & Optimisations

Pitfalls

  • Stack overflow — deep recursion on large inputs. Use iterative bottom-up or increase stack size
  • Small subproblem overhead — recursive calls on tiny arrays waste function-call overhead. Switch to insertion sort below 16–32 elements
  • Worst-case pivot — deterministic Quick Sort on sorted input is O(n2). Always randomise or use median-of-three
  • Unnecessary copying — naive merge sort copies arrays at every level. Use index-based merging with a shared buffer
  • Integer overflow — (lo + hi) / 2 overflows for large arrays in fixed-width languages. Use lo + (hi - lo) / 2

Optimisations

Technique        Applies to     Benefit
Hybrid cutoff    All D&C sorts  Insertion sort for n < 16
Tail recursion   Quick Sort     O(log n) stack depth
Introsort        Quick Sort     Heap sort fallback at 2 log n depth
Bottom-up merge  Merge Sort     Better cache locality
Bit-reversal     FFT            In-place iterative FFT
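
A sketch combining the first two rows in Python, assuming the partition from slide 08 and an insertion_sort helper (the cutoff constant is illustrative):

CUTOFF = 16  # tune per platform

def hybrid_quicksort(A, lo, hi):
    """Quicksort with a small-array cutoff; recursing only on the
    smaller side keeps the stack depth O(log n)."""
    while hi - lo > CUTOFF:
        p = partition(A, lo, hi)
        if p - lo < hi - p:
            hybrid_quicksort(A, lo, p - 1)   # recurse on smaller left side
            lo = p + 1                       # loop on larger right side
        else:
            hybrid_quicksort(A, p + 1, hi)   # recurse on smaller right side
            hi = p - 1                       # loop on larger left side
    insertion_sort(A, lo, hi)                # finish the small range
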
19

D&C in Practice

Standard library sort implementations

Language  Algorithm                Notes
C++       Introsort                QS + Heap + Insertion
Python    Timsort                  Merge + Insertion; exploits runs
Java      Dual-pivot QS / Timsort  QS for primitives, Timsort for objects
Rust      pdqsort                  Pattern-defeating QS
Go        pdqsort (1.19+)          Replaced introsort

D&C beyond sorting

Domain          Algorithm              D&C idea
Geometry        Convex hull            Merge two convex hulls
Linear algebra  Strassen, FFT solvers  Block decomposition
Databases       External merge sort    Divide across disk pages
Distributed     MapReduce              Map = divide, Reduce = combine
Graphics        BSP trees, k-d trees   Spatial subdivision

If your problem decomposes into independent subproblems of the same type, D&C is likely the right paradigm.

20

Summary & Further Reading

Key takeaways

  • Divide and conquer is a fundamental paradigm: divide, conquer, combine
  • The Master Theorem provides closed-form O-notation for most D&C recurrences
  • Merge Sort is the stable O(n log n) baseline; Quick Sort wins in practice
  • Strassen, Karatsuba, and FFT beat "obvious" lower bounds via clever decomposition
  • Binary search is the simplest D&C — master its variants and off-by-one traps
  • D&C is naturally parallel: independent subproblems map to cores/machines
  • Real-world implementations are hybrid: switch strategies at small sizes and degenerate inputs

Recommended reading

CLRS — Introduction to Algorithms: chapters 2, 4, 7, 9, 33 cover all major D&C topics
Kleinberg & Tardos — Algorithm Design: excellent D&C chapter with closest pair, FFT
Sedgewick — Algorithms: practical Quick Sort and Merge Sort analysis
Jeff Erickson — jeffe.cs.illinois.edu: free textbook, outstanding recursion chapter
MIT 6.046 — Design and Analysis of Algorithms: D&C lectures on YouTube
