Appearance
Catalan Numbers
- Difficulty: [Intermediate] CF 1400-1700
- Prerequisites: Combinatorics, DP Intro
- Key idea: A single integer sequence that counts dozens of seemingly unrelated combinatorial structures -- parenthesizations, binary trees, triangulations, lattice paths, and more.
Quick Revisit
- USE WHEN: counting balanced brackets, BSTs, triangulations, Dyck paths, or non-crossing partitions
- INVARIANT: C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i} (convolution / recursive split)
- TIME: O(n) via C(2n,n)/(n+1) with modular inverse | SPACE: O(n) precomputed
- CLASSIC BUG: dividing C(2n,n) by (n+1) without modular inverse
- PRACTICE: see practice problems in this chapter
Contents
- Intuition
- Definitions and Formulas
- Applications Table
- Implementation
- Variants and Extensions
- Worked Example: Polygon Triangulations
- Worked Example: Count BSTs
- Worked Example: Dyck Paths via Reflection
- Common Pitfalls
- Complexity Notes
- Self-Test
- Practice Problems
- Summary
Intuition
"Count the number of valid parenthesizations of
pairs."
The brute-force approach: generate all
n = 3 --> 2^6 = 64 candidate strings
Valid: ((())), (()()), (())(), ()(()), ()()()
Answer: 5 out of 64For
The number of valid parenthesizations of
This is the
n : 0 1 2 3 4 5 6 7 8 9 10
C_n: 1 1 2 5 14 42 132 429 1430 4862 16796Visual Walkthrough
All
1) ( ( ( ) ) ) 2) ( ( ) ( ) ) 3) ( ( ) ) ( )
| | | | | | | | | | | | | | | | | |
0 1 2 2 1 0 0 1 2 1 2 0 0 1 2 1 0 1
4) ( ) ( ( ) ) 5) ( ) ( ) ( )
| | | | | | | | | | | |
0 1 0 1 2 1 0 1 0 1 0 1Each valid string decomposes uniquely by first-match: the opening ( at position ) at position ) is
For
((())) = ( + (()) + ) + "" i=2, n-1-i=0 C_2 * C_0 = 2*1
(()()) = ( + ()() + ) + "" i=2, n-1-i=0 (same bucket)
(())() = ( + () + ) + () i=1, n-1-i=1 C_1 * C_1 = 1*1
()(()) = ( + + ) + (()) i=0, n-1-i=2 C_0 * C_2 = 1*2
()()() = ( + + ) + ()() i=0, n-1-i=2 (same bucket)Sum:
The Invariant
At every prefix of a valid parenthesization, the count of
(isthe count of ).
Equivalently, if we map ( to ) to
What the Code Won't Teach You
The code gives you
BIJECTION WEB -- each arrow is a 1-to-1 mapping:
Bracket sequences <--> Dyck paths <--> Binary trees
↕ ↕ ↕
Stack-sortable Lattice paths Triangulations
permutations below diagonal of (n+2)-gonThe code also won't teach you the reflection principle: the key trick that turns "paths that never cross a boundary" into "total paths minus bad paths reflected across the boundary." This one insight derives the closed form and generalizes to ballot problems and higher dimensions.
Finally, computing
When to Reach for This
Trigger phrases in problem statements:
| Trigger | Likely model |
|---|---|
| "count balanced/valid parenthesizations" | Direct Catalan |
| "number of distinct BSTs with | Catalan |
| "triangulations of a convex polygon" | Catalan |
| "monotone lattice paths that stay below diagonal" | Catalan |
| "stack-sortable permutations" | Catalan |
| "non-crossing partitions" | Catalan |
| "number of full binary trees with | Catalan |
If the answer for small
Definitions and Formulas
Closed-Form
Warning (modular). Under a modulus
, the factor is not an integer division — it is a multiplication by , which only exists when . For the standard contest prime this is always fine, but if the problem hands you a non-prime modulus, or one where shares a factor with , this formula silently produces a wrong answer with no runtime error.
Alternatively, using the reflection / ballot argument:
This form is often easier to compute modulo a prime since it avoids division by
The Catalan numbers live along the central column of Pascal's triangle, adjusted by one entry to the right:
PASCAL'S TRIANGLE -- Catalan numbers as a difference of neighbors:
Row 0: 1
Row 1: 1 1
Row 2: 1 2 1
Row 3: 1 3 3 1
Row 4: 1 4 6 4 1 C_2 = 6 - 4 = 2
Row 5: 1 5 10 10 5 1
Row 6: 1 6 15 20 15 6 1 C_3 = 20 - 15 = 5
Row 7: 1 7 21 35 35 21 7 1
Row 8: 1 8 28 56 70 56 28 8 1 C_4 = 70 - 56 = 14
^ ^
C(2n,n) C(2n,n+1)
C_n = C(2n, n) - C(2n, n+1)DP Recurrence
This is a convolution -- it comes directly from the recursive decomposition ( A ) B described above.
| Split terms | ||
|---|---|---|
| 0 | (base) | 1 |
| 1 | 1 | |
| 2 | 2 | |
| 3 | 5 | |
| 4 | 14 |
Time:
Generating Function (Brief)
Let
Solving the quadratic:
This is useful in advanced generating-function arguments but rarely needed directly in CP implementations.
Applications Table
| # | Structure | Size parameter | |
|---|---|---|---|
| 1 | Balanced parenthesizations | valid strings | |
| 2 | Rooted full binary trees | distinct trees | |
| 3 | Rooted binary trees (BSTs) | structurally distinct BSTs | |
| 4 | Triangulations of convex polygon | triangulations | |
| 5 | Lattice paths (Dyck paths) | monotone paths | |
| 6 | Stack-sortable permutations | permutation of | sortable by 1 stack |
| 7 | Non-crossing partitions | partitions | |
| 8 | Ballot sequences | valid sequences |
Key relationships:
- BSTs: choosing root
splits keys into (left subtree, nodes) and (right subtree, nodes). Same recurrence as Catalan. - Triangulations: choosing the triangle containing a fixed edge of an
-gon splits the remaining polygon into two smaller ones. Same recurrence. - Lattice paths: a monotone path from
to using steps R and U that never goes above the diagonal corresponds 1-to-1 with a Dyck word (R (, U)).
Implementation
Catalan via DP --
Use the recurrence directly when
cpp
#include <bits/stdc++.h>
using namespace std;
const int MOD = 1e9 + 7;
vector<long long> catalan_dp(int n) {
vector<long long> c(n + 1);
c[0] = 1;
for (int i = 1; i <= n; i++)
for (int j = 0; j < i; j++)
c[i] = (c[i] + c[j] % MOD * c[i - 1 - j]) % MOD;
return c;
}Catalan via Closed-Form -- precompute
Preferred when
cpp
#include <bits/stdc++.h>
using namespace std;
const int MOD = 1e9 + 7;
const int MAXN = 2'000'001;
long long fac[MAXN], inv_fac[MAXN];
long long power(long long a, long long b, long long mod) {
long long res = 1;
a %= mod;
for (; b > 0; b >>= 1) {
if (b & 1) res = res * a % mod;
a = a * a % mod;
}
return res;
}
void precompute() {
fac[0] = 1;
for (int i = 1; i < MAXN; i++)
fac[i] = fac[i - 1] * i % MOD;
inv_fac[MAXN - 1] = power(fac[MAXN - 1], MOD - 2, MOD);
for (int i = MAXN - 2; i >= 0; i--)
inv_fac[i] = inv_fac[i + 1] * (i + 1) % MOD;
}
long long C(int n, int k) {
if (k < 0 || k > n) return 0;
return fac[n] % MOD * inv_fac[k] % MOD * inv_fac[n - k] % MOD;
}
long long catalan(int n) {
// C_n = C(2n, n) - C(2n, n+1)
return (C(2 * n, n) - C(2 * n, n + 1) + MOD) % MOD;
}Enumerating All Balanced Parenthesizations
Sometimes you need not just the count but the actual strings (for small
cpp
#include <bits/stdc++.h>
using namespace std;
void generate(int n, int open, int close, string& cur,
vector<string>& result) {
if ((int)cur.size() == 2 * n) {
result.push_back(cur);
return;
}
if (open < n) {
cur.push_back('(');
generate(n, open + 1, close, cur, result);
cur.pop_back();
}
if (close < open) {
cur.push_back(')');
generate(n, open, close + 1, cur, result);
cur.pop_back();
}
}
vector<string> all_parenthesizations(int n) {
vector<string> result;
string cur;
generate(n, 0, 0, cur, result);
return result;
}Time:
Variants and Extensions
Ballot Problem / Bertrand's Ballot Theorem
In an election, candidate A gets
votes and candidate B gets votes with . The probability that A is strictly ahead throughout the counting is .
This connects to Catalan via the reflection principle: the number of paths from
Catalan and Non-Crossing Partitions
A non-crossing partition of
n = 4, C_4 = 14 non-crossing partitions:
{1}{2}{3}{4} {12}{3}{4} {1}{23}{4} {1}{2}{34}
{13}{2}{4} (CROSSING - invalid!)
{14}{2}{3} {12}{34} {1}{234} {123}{4}
{124}{3} {134}{2} {1234}
... (14 total valid ones)The bijection to Dyck paths: scan positions (, and merging/closing maps to ).
Generalized Catalan / Ballot Numbers
The number of lattice paths from
When
Super Catalan (Schroeder Numbers)
If you also allow a "double step" (diagonal
Catalan Convolution
Counts the number of ways to parenthesize
Worked Example: Polygon Triangulations
Problem: In how many ways can a convex
Solution: Label the vertices
This triangle splits the polygon into:
- A convex polygon on vertices
(a -gon, which has triangulations, with the convention for a triangle or degenerate edge). - A convex polygon on vertices
(an -gon, which has triangulations).
For a hexagon (
Worked Example: Count BSTs
Problem: How many structurally distinct BSTs can be formed with keys
Solution: Pick root
Substituting
With
For
ALL C_3 = 5 BSTs with keys {1, 2, 3}:
1 1 2 3 3
\ \ / \ / /
2 3 1 3 1 2
\ / \ /
3 2 2 1
root=1 root=1 root=2 root=3 root=3
left=0 left=0 left=1 left=2 left=2
right=2 right=2 right=1 right=0 right=0
C_0*C_2 + C_1*C_1 + C_2*C_0 = 1*2 + 1*1 + 2*1 = 5 OKWorked Example: Dyck Paths via Reflection
Problem: Count monotone lattice paths from
Total paths (no restriction):
Bad paths (touch
Good paths:
n=3: Good paths = C(6,3) - C(6,4) = 20 - 15 = 5 = C_3
Example good path (R=right, U=up):
R R U R U U
3 . . . X
2 . . X .
1 . X . . "X" marks the path
0 X . . .
0 1 2 3All 5 Dyck paths for
DYCK PATHS for n=3 (must stay on or below diagonal y=x):
y y=x
3 * * * ■ /
| | /
2 * * ■──■ /
| | /
1 * ■──■ /
| | /
0 ■──■ * *
0 1 2 3 x
Path 1: RRRUUU Path 2: RRUURU Path 3: RRUURU
stays on diagonal dips at step 4 hugs the diagonal
y=x y=x y=x
3 * * * ■ 3 * * ■──■ 3 * * * ■
2 * * ■ * 2 * * ■ * 2 * ■──■──■
1 * ■ * * 1 * ■──■ * 1 * ■ * *
0 ■──■ * * 0 ■──■ * * 0 ■──■ * *
Path 4: RURURU Path 5: RURURU
staircase pattern staircase offset
y=x y=x
3 * * * ■ 3 * * ■──■
2 * * ■──■ 2 * ■──■ *
1 * ■──■ * 1 ■──■ * *
0 ■──■ * * 0 ■ * * *Common Pitfalls
Mental Traps
"It matches 1, 1, 2, 5, 14 -- must be Catalan." Many sequences agree with Catalan for small
TRAP: Empirical matching
Your sequence: 1 1 2 5 14 ... ???
Catalan numbers: 1 1 2 5 14 42 132
Motzkin numbers: 1 1 2 4 9 21 51
^ ^
diverge at n=3 and n=4
but agree at n=0..2!"The ballot problem is just the bracket problem rephrased." Both use the reflection principle, but the conditions differ. Brackets require prefix sums
"Division form works everywhere."
SAFE vs UNSAFE computation:
UNSAFE: C_n = C(2n, n) / (n+1) <-- needs modular inverse
fails if p | (n+1)
SAFE: C_n = C(2n, n) - C(2n, n+1) <-- pure subtraction
always correct| Pitfall | Fix |
|---|---|
| Off-by-one in polygon triangulations | An |
| Using | Use |
| Forgetting | The empty structure is always counted as 1 valid configuration |
| Overflow with raw factorials | Always reduce mod |
| Confusing "full binary tree" vs "binary tree" | Full = every node has 0 or 2 children; |
| Not recognizing a Catalan variant | Check small cases against |
Writing answer = catalan(n) for triangulations of an | Always verify your Catalan index against the smallest case: a 3-gon has exactly 1 triangulation = |
Debug Drills
Drill 1. Division in modular arithmetic.
cpp
long long catalan(int n) {
return fact[2 * n] / fact[n] / fact[n] / (n + 1); // What's wrong?
}What's wrong?
You can't use integer division for modular arithmetic! fact[2n] / fact[n] doesn't give the right result mod p. Use modular inverse: fact[2*n] * inv_fact[n] % MOD * inv_fact[n] % MOD * power(n+1, MOD-2, MOD) % MOD. Or better, use the subtraction form C(2n,n) - C(2n,n+1) to avoid dividing by (n+1).
Drill 2. Off-by-one in Catalan recurrence.
cpp
vector<long long> C(n + 1);
C[0] = 1;
for (int i = 1; i <= n; i++)
for (int j = 0; j < i; j++)
C[i] = (C[i] + C[j] * C[i - j]) % MOD; // What's wrong?What's wrong?
The recurrence is Cₙ = Σ Cⱼ * C_{n-1-j} for j from 0 to n-1, but the code computes C[j] * C[i-j] (using i-j instead of i-1-j). Fix: change to C[j] * C[i - 1 - j].
Drill 3. Catalan for wrong problem size.
cpp
// Count BSTs with nodes labeled 1..n
long long count_bst(int n) {
return catalan(n - 1); // Correct?
}What's wrong?
No -- the number of structurally distinct BSTs with n nodes is exactly Cₙ, not C_{n-1}. The Catalan number Cₙ counts BSTs with n nodes: C₁ = 1, C₂ = 2, C₃ = 5. The offset n-1 is wrong here (that's for triangulations of an (n+2)-gon, not BSTs).
Complexity Notes
| Method | Time | Space | When to use |
|---|---|---|---|
| DP recurrence | |||
| Closed-form (factorial precompute) | |||
| Lucas' theorem | Modulus is small prime |
Self-Test
Before moving to practice problems, verify you can do each of these without looking at the notes above:
- [ ] State the closed-form formula for
in both division and subtraction forms - [ ] Compute
by hand using the subtraction form - [ ] Write the Catalan DP recurrence from memory and explain the "first return" decomposition
- [ ] Name five canonical Catalan structures and sketch a bijection between any two
- [ ] Explain why the subtraction form is safer than the division form for modular arithmetic
- [ ] Given a convex hexagon, determine that it has
triangulations (not ) - [ ] Identify a problem that looks like Catalan but isn't (e.g., ballot problem with strict inequality)
Practice Problems
| Rating | You should be able to... |
|---|---|
| CF 1200 | Recognize that balanced bracket sequences are counted by Catalan numbers |
| CF 1500 | Compute Cₙ mod p using the formula C(2n,n)/(n+1) with modular inverse |
| CF 1800 | Identify non-obvious Catalan instances (triangulations, non-crossing partitions); use the ballot problem |
| CF 2100 | Generalize Catalan to k-ary trees, multi-type Catalan, and weighted Catalan recurrences via generating functions |
| # | Problem | Source | Difficulty | Key idea |
|---|---|---|---|---|
| 1 | Distinct BSTs | LeetCode 96 | CF ~1200 | Direct Catalan DP |
| 2 | Valid Parentheses Strings | LeetCode 22 | CF ~1300 | Backtracking with invariant |
| 3 | Dyck Path | CSES | CF ~1400 | |
| 4 | Bracket Sequences I | CSES | CF ~1500 | Catalan closed-form |
| 5 | Bracket Sequences II | CSES | CF ~1700 | Prefix-fixed Catalan via reflection |
| 6 | CF 1264C -- Beautiful Mirrors | CF | CF 1400 | Ballot / Catalan variant |
| 7 | CF 1862F -- Magic Will Save the World | CF | CF 1500 | Catalan-adjacent counting |
| 8 | CF 1lobsters -- Catalan + DP | CF | CF 1700 | Catalan recurrence in DP |
| 9 | Polygon Triangulation Count | SPOJ | CF ~1600 | |
| 10 | CF 1767C -- Count Binary Strings | CF | CF 1700 | Catalan + prefix constraints |
Summary
Catalan Number C_n
==================
Closed form: C_n = C(2n, n) / (n + 1)
= C(2n, n) - C(2n, n + 1)
Recurrence: C_0 = 1
C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i}
First values: 1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...
Trigger: counting balanced structures, BSTs, non-crossing
partitions, triangulations, Dyck paths
Pitfall: off-by-one in polygon size; forgetting C_0 = 1;
division mod prime needs inverse (use subtraction form)Cross-Links
- Combinatorics Basics -- binomial coefficients and modular inverse needed for Catalan computation
- DP -- 1D Introduction -- Catalan recurrence is a classic DP problem
- DP Thinking Framework -- recognizing Catalan structure in DP state design