Skip to content

Catalan Numbers

  • Difficulty: [Intermediate] CF 1400-1700
  • Prerequisites: Combinatorics, DP Intro
  • Key idea: A single integer sequence that counts dozens of seemingly unrelated combinatorial structures -- parenthesizations, binary trees, triangulations, lattice paths, and more.

Quick Revisit

  • USE WHEN: counting balanced brackets, BSTs, triangulations, Dyck paths, or non-crossing partitions
  • INVARIANT: C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i} (convolution / recursive split)
  • TIME: O(n) via C(2n,n)/(n+1) with modular inverse | SPACE: O(n) precomputed
  • CLASSIC BUG: dividing C(2n,n) by (n+1) without modular inverse
  • PRACTICE: see practice problems in this chapter

Contents


Intuition

"Count the number of valid parenthesizations of n pairs."

The brute-force approach: generate all 22n binary strings of length 2n, then filter to those that are balanced.

n = 3  -->  2^6 = 64 candidate strings
Valid: ((())), (()()), (())(), ()(()), ()()()
Answer: 5 out of 64

For n=10, that is 220106 candidates and only 16796 valid ones. For n=15, it is 230109 -- too slow even to enumerate. We need a closed-form formula or a fast recurrence.

The number of valid parenthesizations of n pairs equals C(2n,n)/(n+1) -- and this same number appears in binary trees, grid paths, polygon triangulations, and stack-sortable permutations.

This is the n-th Catalan number, denoted Cn:

n :  0  1  2   3   4    5    6     7      8      9      10
C_n: 1  1  2   5  14   42  132   429   1430   4862   16796

Visual Walkthrough

All C3=5 valid parenthesizations of 3 pairs, shown with nesting depth:

1)  ( ( ( ) ) )          2)  ( ( ) ( ) )          3)  ( ( ) ) ( )
    | | | | | |              | | | | | |              | | | | | |
    0 1 2 2 1 0              0 1 2 1 2 0              0 1 2 1 0 1

4)  ( ) ( ( ) )          5)  ( ) ( ) ( )
    | | | | | |              | | | | | |
    0 1 0 1 2 1              0 1 0 1 0 1

Each valid string decomposes uniquely by first-match: the opening ( at position 0 matches some ) at position 2i+1. Everything strictly inside that pair is A (a valid string of i pairs), and everything after the matching ) is B (a valid string of n1i pairs):

string of n pairs=(A)B,|A|=i,|B|=n1i.

For n=3, the five strings split as:

  ((()))   = ( + (()) + ) + ""       i=2, n-1-i=0   C_2 * C_0 = 2*1
  (()())   = ( + ()() + ) + ""       i=2, n-1-i=0   (same bucket)
  (())()   = ( + ()   + ) + ()       i=1, n-1-i=1   C_1 * C_1 = 1*1
  ()(())   = ( +      + ) + (())     i=0, n-1-i=2   C_0 * C_2 = 1*2
  ()()()   = ( +      + ) + ()()     i=0, n-1-i=2   (same bucket)

Sum: C0C2+C1C1+C2C0=2+1+2=5. Checks out.

The Invariant

At every prefix of a valid parenthesization, the count of ( is the count of ).

Equivalently, if we map ( to +1 and ) to 1, the prefix-sum sequence never goes negative and ends at 0. This is exactly the condition for a Dyck path -- a lattice path from (0,0) to (2n,0) using steps (+1,+1) and (+1,1) that never dips below the x-axis.

What the Code Won't Teach You

The code gives you Cn -- a single integer. What it hides is why so many unrelated structures share the same count. The answer is bijections: every Catalan structure can be mapped 1-to-1 to every other. Understanding the bijection lets you transform an unfamiliar problem into a familiar one.

BIJECTION WEB -- each arrow is a 1-to-1 mapping:

  Bracket sequences <--> Dyck paths <--> Binary trees
        ↕                  ↕                ↕
  Stack-sortable     Lattice paths     Triangulations
  permutations       below diagonal    of (n+2)-gon

The code also won't teach you the reflection principle: the key trick that turns "paths that never cross a boundary" into "total paths minus bad paths reflected across the boundary." This one insight derives the closed form and generalizes to ballot problems and higher dimensions.

Finally, computing Cn with the division form (2nn)/(n+1) silently fails when (n+1) shares a factor with the modulus -- the code produces a wrong answer with no runtime error. The subtraction form (2nn)(2nn+1) is always safe.

When to Reach for This

Trigger phrases in problem statements:

TriggerLikely model
"count balanced/valid parenthesizations"Direct Catalan
"number of distinct BSTs with n nodes"Catalan
"triangulations of a convex polygon"Catalan
"monotone lattice paths that stay below diagonal"Catalan
"stack-sortable permutations"Catalan
"non-crossing partitions"Catalan
"number of full binary trees with n+1 leaves"Catalan

If the answer for small n matches 1,1,2,5,14,42,, you are almost certainly looking at Catalan numbers (or a direct transform of them).


Definitions and Formulas

Closed-Form

Cn=1n+1(2nn)=(2n)!(n+1)!n!

Warning (modular). Under a modulus p, the 1n+1 factor is not an integer division — it is a multiplication by (n+1)1modp, which only exists when gcd(n+1,p)=1. For the standard contest prime p=109+7 this is always fine, but if the problem hands you a non-prime modulus, or one where n+1 shares a factor with p, this formula silently produces a wrong answer with no runtime error.

Alternatively, using the reflection / ballot argument:

Cn=(2nn)(2nn+1)

This form is often easier to compute modulo a prime since it avoids division by n+1 inside the binomial — preferred for contest code unless you specifically need the multiplicative form.

The Catalan numbers live along the central column of Pascal's triangle, adjusted by one entry to the right:

PASCAL'S TRIANGLE -- Catalan numbers as a difference of neighbors:

Row 0:                    1
Row 1:                  1   1
Row 2:                1   2   1
Row 3:              1   3   3   1
Row 4:            1   4   6   4   1         C_2 = 6 - 4 = 2
Row 5:          1   5  10  10   5   1
Row 6:        1   6  15  20  15   6   1     C_3 = 20 - 15 = 5
Row 7:      1   7  21  35  35  21   7   1
Row 8:    1   8  28  56  70  56  28   8  1  C_4 = 70 - 56 = 14
                       ^   ^
                  C(2n,n) C(2n,n+1)

  C_n = C(2n, n) - C(2n, n+1)

DP Recurrence

C0=1,Cn=i=0n1CiCn1i

This is a convolution -- it comes directly from the recursive decomposition ( A ) B described above.

nSplit termsCn
0(base)1
1C0C01
2C0C1+C1C02
3C0C2+C1C1+C2C05
4C0C3+C1C2+C2C1+C3C014

Time: O(n2) to compute C0Cn.

Generating Function (Brief)

Let c(x)=n0Cnxn. The recurrence gives:

c(x)=1+xc(x)2

Solving the quadratic: c(x)=114x2x.

This is useful in advanced generating-function arguments but rarely needed directly in CP implementations.


Applications Table

#StructureSize parameterCn counts
1Balanced parenthesizationsn pairs of parensvalid strings
2Rooted full binary treesn internal nodes (n+1 leaves)distinct trees
3Rooted binary trees (BSTs)n nodes (keys 1..n)structurally distinct BSTs
4Triangulations of convex polygon(n+2)-gontriangulations
5Lattice paths (Dyck paths)(0,0) to (n,n) below diagonalmonotone paths
6Stack-sortable permutationspermutation of [n]sortable by 1 stack
7Non-crossing partitionsn-element setpartitions
8Ballot sequences2n votes, A always aheadvalid sequences

Key relationships:

  • BSTs: choosing root k splits keys into [1..k1] (left subtree, k1 nodes) and [k+1..n] (right subtree, nk nodes). Same recurrence as Catalan.
  • Triangulations: choosing the triangle containing a fixed edge of an (n+2)-gon splits the remaining polygon into two smaller ones. Same recurrence.
  • Lattice paths: a monotone path from (0,0) to (n,n) using steps R and U that never goes above the diagonal corresponds 1-to-1 with a Dyck word (R (, U )).

Implementation

Catalan via DP -- O(n2)

Use the recurrence directly when n is small (n5000).

cpp
#include <bits/stdc++.h>
using namespace std;

const int MOD = 1e9 + 7;

vector<long long> catalan_dp(int n) {
    vector<long long> c(n + 1);
    c[0] = 1;
    for (int i = 1; i <= n; i++)
        for (int j = 0; j < i; j++)
            c[i] = (c[i] + c[j] % MOD * c[i - 1 - j]) % MOD;
    return c;
}

Catalan via Closed-Form -- O(n) precompute

Preferred when n can be up to 106 (or even 107) and the modulus is prime.

cpp
#include <bits/stdc++.h>
using namespace std;

const int MOD = 1e9 + 7;
const int MAXN = 2'000'001;

long long fac[MAXN], inv_fac[MAXN];

long long power(long long a, long long b, long long mod) {
    long long res = 1;
    a %= mod;
    for (; b > 0; b >>= 1) {
        if (b & 1) res = res * a % mod;
        a = a * a % mod;
    }
    return res;
}

void precompute() {
    fac[0] = 1;
    for (int i = 1; i < MAXN; i++)
        fac[i] = fac[i - 1] * i % MOD;
    inv_fac[MAXN - 1] = power(fac[MAXN - 1], MOD - 2, MOD);
    for (int i = MAXN - 2; i >= 0; i--)
        inv_fac[i] = inv_fac[i + 1] * (i + 1) % MOD;
}

long long C(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fac[n] % MOD * inv_fac[k] % MOD * inv_fac[n - k] % MOD;
}

long long catalan(int n) {
    // C_n = C(2n, n) - C(2n, n+1)
    return (C(2 * n, n) - C(2 * n, n + 1) + MOD) % MOD;
}

Enumerating All Balanced Parenthesizations

Sometimes you need not just the count but the actual strings (for small n). Backtracking with the prefix invariant:

cpp
#include <bits/stdc++.h>
using namespace std;

void generate(int n, int open, int close, string& cur,
              vector<string>& result) {
    if ((int)cur.size() == 2 * n) {
        result.push_back(cur);
        return;
    }
    if (open < n) {
        cur.push_back('(');
        generate(n, open + 1, close, cur, result);
        cur.pop_back();
    }
    if (close < open) {
        cur.push_back(')');
        generate(n, open, close + 1, cur, result);
        cur.pop_back();
    }
}

vector<string> all_parenthesizations(int n) {
    vector<string> result;
    string cur;
    generate(n, 0, 0, cur, result);
    return result;
}

Time: O(Cnn) -- exponential, but optimal for enumeration since you produce every valid string exactly once.


Variants and Extensions

Ballot Problem / Bertrand's Ballot Theorem

In an election, candidate A gets a votes and candidate B gets b votes with a>b. The probability that A is strictly ahead throughout the counting is (ab)/(a+b).

This connects to Catalan via the reflection principle: the number of paths from (0,0) to (a+b,ab) that touch or cross the x-axis can be counted by reflecting the bad prefix.

Catalan and Non-Crossing Partitions

A non-crossing partition of {1,2,,n} arranges the elements on a circle and groups them into blocks such that no two blocks "cross" -- i.e., there are no a<b<c<d with a,c in one block and b,d in another.

n = 4, C_4 = 14 non-crossing partitions:

  {1}{2}{3}{4}    {12}{3}{4}     {1}{23}{4}     {1}{2}{34}
  {13}{2}{4}  (CROSSING - invalid!)
  {14}{2}{3}      {12}{34}       {1}{234}       {123}{4}
  {124}{3}        {134}{2}       {1234}
  ... (14 total valid ones)

The bijection to Dyck paths: scan positions 1n; opening a new block maps to (, and merging/closing maps to ).

Generalized Catalan / Ballot Numbers

The number of lattice paths from (0,0) to (m,n) with mn that never go above the diagonal is:

mn+1m+1(m+nn)

When m=n, this reduces to Cn.

Super Catalan (Schroeder Numbers)

If you also allow a "double step" (diagonal (+1,+1)), you get the large Schroeder numbers. These count lattice paths with steps R, U, D (diagonal) that stay below the diagonal:

Sn:1,2,6,22,90,394,

Catalan Convolution

Cn(r)=r2n+r(2n+rn)

Counts the number of ways to parenthesize n+r factors into groups of 2, i.e., sequences of n up-steps and n+r down-steps that stay non-negative. When r=1, this is the standard Catalan number.


Worked Example: Polygon Triangulations

Problem: In how many ways can a convex (n+2)-gon be divided into triangles by drawing non-crossing diagonals?

Solution: Label the vertices 0,1,,n+1. Fix edge (0,n+1). Every triangulation contains exactly one triangle using this edge -- say triangle (0,k,n+1) for some 1kn.

This triangle splits the polygon into:

  • A convex polygon on vertices 0,1,,k (a (k+1)-gon, which has Ck1 triangulations, with the convention C0=1 for a triangle or degenerate edge).
  • A convex polygon on vertices k,k+1,,n+1 (an (nk+2)-gon, which has Cnk triangulations).
T(n)=k=1nCk1Cnk=i=0n1CiCn1i=Cn

For a hexagon (n=4): C4=14 triangulations.


Worked Example: Count BSTs

Problem: How many structurally distinct BSTs can be formed with keys 1,2,,n?

Solution: Pick root k (1kn). Left subtree has keys 1k1 (k1 nodes), right subtree has keys k+1n (nk nodes). The number of BSTs is:

T(n)=k=1nT(k1)T(nk)

Substituting i=k1:

T(n)=i=0n1T(i)T(n1i)

With T(0)=1, this is exactly the Catalan recurrence. So T(n)=Cn.

For n=5: C5=42 distinct BSTs.

ALL C_3 = 5 BSTs with keys {1, 2, 3}:

  1           1         2         3       3
   \           \       / \       /       /
    2           3     1   3     1       2
     \         /                 \     /
      3       2                   2   1

root=1      root=1   root=2   root=3   root=3
left=0      left=0   left=1   left=2   left=2
right=2     right=2  right=1  right=0  right=0

C_0*C_2 + C_1*C_1 + C_2*C_0 = 1*2 + 1*1 + 2*1 = 5 OK

Worked Example: Dyck Paths via Reflection

Problem: Count monotone lattice paths from (0,0) to (n,n) using steps R=(1,0) and U=(0,1) that never go strictly above the diagonal y=x.

Total paths (no restriction): (2nn) -- choose which n of the 2n steps are R.

Bad paths (touch y=x+1 at some point): reflect the portion of the path after the first touch across the line y=x+1. This maps each bad path to a unique path from (0,0) to (n1,n+1), and vice versa. Count: (2nn1)=(2nn+1).

Good paths: (2nn)(2nn+1)=Cn.

n=3:  Good paths = C(6,3) - C(6,4) = 20 - 15 = 5 = C_3

Example good path (R=right, U=up):
  R R U R U U

  3 . . . X
  2 . . X .
  1 . X . .    "X" marks the path
  0 X . . .
    0 1 2 3

All 5 Dyck paths for n=3 on the lattice grid:

DYCK PATHS for n=3 (must stay on or below diagonal y=x):

y                         y=x
3 *  *  *  ■              /
  |        |             /
2 *  *  ■──■            /
  |     |              /
1 *  ■──■             /
  |  |               /
0 ■──■  *  *
  0  1  2  3  x

Path 1: RRRUUU          Path 2: RRUURU         Path 3: RRUURU
  stays on diagonal       dips at step 4         hugs the diagonal

       y=x                    y=x                    y=x
  3 *  *  *  ■           3 *  *  ■──■          3 *  *  *  ■
  2 *  *  ■  *           2 *  *  ■  *          2 *  ■──■──■
  1 *  ■  *  *           1 *  ■──■  *          1 *  ■  *  *
  0 ■──■  *  *           0 ■──■  *  *          0 ■──■  *  *

Path 4: RURURU           Path 5: RURURU
  staircase pattern        staircase offset

       y=x                    y=x
  3 *  *  *  ■           3 *  *  ■──■
  2 *  *  ■──■           2 *  ■──■  *
  1 *  ■──■  *           1 ■──■  *  *
  0 ■──■  *  *           0 ■  *  *  *

Common Pitfalls

Mental Traps

"It matches 1, 1, 2, 5, 14 -- must be Catalan." Many sequences agree with Catalan for small n but diverge later. Always construct a bijection to a canonical Catalan structure (brackets, Dyck paths, binary trees) rather than pattern-matching on values.

TRAP: Empirical matching

  Your sequence:     1  1  2  5  14  ...  ???
  Catalan numbers:   1  1  2  5  14  42   132
  Motzkin numbers:   1  1  2  4   9  21    51
                                ^    ^
                        diverge at n=3 and n=4
                        but agree at n=0..2!

"The ballot problem is just the bracket problem rephrased." Both use the reflection principle, but the conditions differ. Brackets require prefix sums 0 ending at 0; the ballot problem requires prefix sums >0 throughout. One word (strictly ahead vs. at least ahead) changes the formula from Cn to (ab)/(a+b)(a+ba).

"Division form works everywhere."Cn=(2nn)/(n+1) needs the modular inverse of (n+1). If gcd(n+1,p)>1, the inverse doesn't exist. No error -- just a wrong answer.

SAFE vs UNSAFE computation:

  UNSAFE:  C_n = C(2n, n) / (n+1)  <-- needs modular inverse
                                         fails if p | (n+1)

  SAFE:    C_n = C(2n, n) - C(2n, n+1)  <-- pure subtraction
                                              always correct
PitfallFix
Off-by-one in polygon triangulationsAn (n+2)-gon has Cn triangulations, not Cn+2
Using C(2n,n)/(n+1) without modular inverseUse C(2n,n)C(2n,n+1) form to avoid division
Forgetting C0=1 in DPThe empty structure is always counted as 1 valid configuration
Overflow with raw factorialsAlways reduce mod p at every multiplication
Confusing "full binary tree" vs "binary tree"Full = every node has 0 or 2 children; Cn counts full trees with n internal nodes
Not recognizing a Catalan variantCheck small cases against 1,1,2,5,14,42 before assuming Catalan
Writing answer = catalan(n) for triangulations of an n-gonAlways verify your Catalan index against the smallest case: a 3-gon has exactly 1 triangulation = C1, so an n-gon has Cn2, not Cn. The off-by-one (or off-by-two) is the most common Catalan mistake

Debug Drills

Drill 1. Division in modular arithmetic.

cpp
long long catalan(int n) {
    return fact[2 * n] / fact[n] / fact[n] / (n + 1);  // What's wrong?
}
What's wrong?

You can't use integer division for modular arithmetic! fact[2n] / fact[n] doesn't give the right result mod p. Use modular inverse: fact[2*n] * inv_fact[n] % MOD * inv_fact[n] % MOD * power(n+1, MOD-2, MOD) % MOD. Or better, use the subtraction form C(2n,n) - C(2n,n+1) to avoid dividing by (n+1).

Drill 2. Off-by-one in Catalan recurrence.

cpp
vector<long long> C(n + 1);
C[0] = 1;
for (int i = 1; i <= n; i++)
    for (int j = 0; j < i; j++)
        C[i] = (C[i] + C[j] * C[i - j]) % MOD;  // What's wrong?
What's wrong?

The recurrence is Cₙ = Σ Cⱼ * C_{n-1-j} for j from 0 to n-1, but the code computes C[j] * C[i-j] (using i-j instead of i-1-j). Fix: change to C[j] * C[i - 1 - j].

Drill 3. Catalan for wrong problem size.

cpp
// Count BSTs with nodes labeled 1..n
long long count_bst(int n) {
    return catalan(n - 1);  // Correct?
}
What's wrong?

No -- the number of structurally distinct BSTs with n nodes is exactly Cₙ, not C_{n-1}. The Catalan number Cₙ counts BSTs with n nodes: C₁ = 1, C₂ = 2, C₃ = 5. The offset n-1 is wrong here (that's for triangulations of an (n+2)-gon, not BSTs).


Complexity Notes

MethodTimeSpaceWhen to use
DP recurrenceO(n2)O(n)n5000, no prime modulus needed
Closed-form (factorial precompute)O(n) precompute, O(1) queryO(n)n107, modulus is prime
Lucas' theoremO(plogpn)O(p)Modulus is small prime

Self-Test

Before moving to practice problems, verify you can do each of these without looking at the notes above:

  • [ ] State the closed-form formula for Cn in both division and subtraction forms
  • [ ] Compute C4=14 by hand using the subtraction form
  • [ ] Write the Catalan DP recurrence from memory and explain the "first return" decomposition
  • [ ] Name five canonical Catalan structures and sketch a bijection between any two
  • [ ] Explain why the subtraction form is safer than the division form for modular arithmetic
  • [ ] Given a convex hexagon, determine that it has C4=14 triangulations (not C6)
  • [ ] Identify a problem that looks like Catalan but isn't (e.g., ballot problem with strict inequality)

Practice Problems

RatingYou should be able to...
CF 1200Recognize that balanced bracket sequences are counted by Catalan numbers
CF 1500Compute Cₙ mod p using the formula C(2n,n)/(n+1) with modular inverse
CF 1800Identify non-obvious Catalan instances (triangulations, non-crossing partitions); use the ballot problem
CF 2100Generalize Catalan to k-ary trees, multi-type Catalan, and weighted Catalan recurrences via generating functions
#ProblemSourceDifficultyKey idea
1Distinct BSTsLeetCode 96CF ~1200Direct Catalan DP
2Valid Parentheses StringsLeetCode 22CF ~1300Backtracking with invariant
3Dyck PathCSESCF ~1400C(2n,n)/(n+1) mod prime
4Bracket Sequences ICSESCF ~1500Catalan closed-form
5Bracket Sequences IICSESCF ~1700Prefix-fixed Catalan via reflection
6CF 1264C -- Beautiful MirrorsCFCF 1400Ballot / Catalan variant
7CF 1862F -- Magic Will Save the WorldCFCF 1500Catalan-adjacent counting
8CF 1lobsters -- Catalan + DPCFCF 1700Catalan recurrence in DP
9Polygon Triangulation CountSPOJCF ~1600(n+2)-gon Cn
10CF 1767C -- Count Binary StringsCFCF 1700Catalan + prefix constraints

Summary

Catalan Number C_n
==================

Closed form:   C_n = C(2n, n) / (n + 1)
                    = C(2n, n) - C(2n, n + 1)

Recurrence:    C_0 = 1
               C_n = sum_{i=0}^{n-1} C_i * C_{n-1-i}

First values:  1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...

Trigger:       counting balanced structures, BSTs, non-crossing
               partitions, triangulations, Dyck paths

Pitfall:       off-by-one in polygon size; forgetting C_0 = 1;
               division mod prime needs inverse (use subtraction form)


Built for the climb from Codeforces 1100 to 2100.