DP -- Interval

AL-30 | Optimize over all ways to split a contiguous range $[l, r]$ -- the go-to framework for matrix chain multiplication, polygon games, and merge-cost problems.

Difficulty: [Intermediate]Prerequisites: DP -- 1D IntroductionCF Rating: 1600-2000

Quick Revisit
USE WHEN: optimally split or merge a contiguous range [l, r]
INVARIANT: dp[i][j] = optimal answer for the subarray/interval [i..j]
TIME: O(n^3) | SPACE: O(n^2)
CLASSIC BUG: iterating by endpoints instead of by interval length
PRACTICE: DP Practice Ladder

Intuition
C++ STL Reference
Implementation (Contest-Ready)
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging
Self-Test
Practice Problems
Further Reading

Intuition

2.1 The Pain

You have 4 matrices to multiply:

$A_{0} (10 \times 30), A_{1} (30 \times 5), A_{2} (5 \times 60), A_{3} (60 \times 10)$

Matrix multiplication is associative, so the product $A_{0} A_{1} A_{2} A_{3}$ is the same regardless of parenthesization -- but the number of scalar multiplications is not. Every ordering gives a different cost:

Parenthesization              Scalar multiplications
-----------------------------------------------------
((A0 A1) A2) A3                        10500
(A0 (A1 A2)) A3                        33000
(A0 A1)(A2 A3)                          5000  <-- best
A0 ((A1 A2) A3)                        30000
A0 (A1 (A2 A3))                         7500

The worst choice costs 6.6x more than the best. And we only had 5 options here. The number of distinct parenthesizations for $n$ matrices is the Catalan number $C_{n - 1}$ :

n (matrices)    C_{n-1} (parenthesizations)
--------------------------------------------
 4                        5
10                     4862
20               1767263190

$C_{n}$ grows as $Θ (4^{n} / n^{3 / 2})$ . Brute-forcing every parenthesization is hopeless. We need structure.

2.2 The Key Insight

Every optimal solution splits the interval $[i, j]$ at some point $k$ -- try all splits, take the best.

Define $d p [i] [j]$ = minimum scalar multiplications to compute $A_{i} \cdot A_{i + 1} \dots A_{j}$ .

Any parenthesization of $[i, j]$ makes a final multiplication that combines $A_{i} \dots A_{k}$ (a single $d_{i} \times d_{k + 1}$ matrix) with $A_{k + 1} \dots A_{j}$ (a single $d_{k + 1} \times d_{j + 1}$ matrix). The cost of that final step is $d_{i} \cdot d_{k + 1} \cdot d_{j + 1}$ . We don't know which $k$ is best, so we try every $k \in [i, j)$ :

$d p [i] [j] = min_{i \leq k < j} (d p [i] [k] + d p [k + 1] [j] + d_{i} \cdot d_{k + 1} \cdot d_{j + 1})$

Base case: $d p [i] [i] = 0$ -- a single matrix requires zero multiplications.

The same structure generalizes to any problem where you combine adjacent subranges: replace the cost term, and swap $min$ for $max$ if you are maximizing.

2.3 Visual Walkthrough

We trace the 4-matrix example with dimensions $d = [10, 30, 5, 60, 10]$ .

Step 1 -- length = 1  (base cases, no multiplication needed)

  dp[0][0] = 0    dp[1][1] = 0    dp[2][2] = 0    dp[3][3] = 0

Step 2 -- length = 2  (one possible split each)

  dp[0][1]  k=0: 0 + 0 + 10*30*5  = 1500
  dp[1][2]  k=1: 0 + 0 + 30*5*60  = 9000
  dp[2][3]  k=2: 0 + 0 + 5*60*10  = 3000

Step 3 -- length = 3  (two candidate splits each, pick the min)

  dp[0][2]  k=0: dp[0][0]+dp[1][2] + 10*30*60 = 0+9000+18000   = 27000
            k=1: dp[0][1]+dp[2][2] + 10*5*60  = 1500+0+3000    =  4500 *
  dp[1][3]  k=1: dp[1][1]+dp[2][3] + 30*5*10  = 0+3000+1500    =  4500 *
            k=2: dp[1][2]+dp[3][3] + 30*60*10 = 9000+0+18000   = 27000

Step 4 -- length = 4  (three candidate splits -- the final answer)

  dp[0][3]  k=0: dp[0][0]+dp[1][3] + 10*30*10 = 0+4500+3000    =  7500
            k=1: dp[0][1]+dp[2][3] + 10*5*10  = 1500+3000+500  =  5000 *
            k=2: dp[0][2]+dp[3][3] + 10*60*10 = 4500+0+6000    = 10500

The filled table (upper triangle, row $= i$ , column $= j$ ):

         j=0     j=1     j=2     j=3
  i=0 [    0    1500    4500    5000 ]     <-- dp[0][3] = 5000
  i=1 [   --       0    9000    4500 ]
  i=2 [   --      --       0    3000 ]
  i=3 [   --      --      --       0 ]

Reading back the optimal split: $k = 1$ at the top level gives $(A_{0} A_{1}) (A_{2} A_{3})$ :

  A0*A1 : 10x30 * 30x5  = 1500 mults   -->  10x5  matrix
  A2*A3 :  5x60 * 60x10 = 3000 mults   -->   5x10 matrix
  final : 10x5  *  5x10 =  500 mults   -->  10x10 matrix
  total                  = 5000 mults

Notice how every cell we read ( $d p [0] [1], d p [2] [3]$ , etc.) was already computed in an earlier step. That is the whole point of filling by increasing length.

2.4 The Invariant

+------------------------------------------------------------------+
| INVARIANT                                                        |
|                                                                  |
| dp[i][j] = optimal cost for the subproblem on range [i, j],     |
| computed by examining every split:                               |
|                                                                  |
|   dp[i][j] = best over k in [i, j) of                           |
|              dp[i][k] + dp[k+1][j] + cost(i, k, j)              |
|                                                                  |
| Fill order: increasing interval length (len = 1, 2, ..., n).    |
| This guarantees that when we compute dp[i][j] of length L,      |
| every dp[i][k] (length <= L-1) and dp[k+1][j] (length <= L-1)  |
| is already finalized.                                            |
+------------------------------------------------------------------+

What the Code Won't Teach You

"Last operation" thinking is the soul of interval DP. The template says "try all split points k" -- but why does splitting work? Because we're asking: what was the last split / merge / removal performed on interval [i, j]? The last operation divides the problem into two independent subproblems because nothing else interacts with them after that point.

This is why burst balloons is hard: the natural "first operation" framing creates dependencies (bursting balloon k changes the neighbors of k-1 and k+1). The "last operation" framing eliminates them -- when k is the last balloon burst in (l, r), the boundaries l and r are guaranteed present.

"Last Operation" Thinking: Burst Balloons

  Balloons: ... [l]  b1  b2  b3  [r] ...
                 ^   elements     ^
               fixed            fixed
              boundary         boundary

  If k = b2 is the LAST balloon burst in (l, r):

    Step 1: Burst everything in (l, k)  ->  dp[l][k]
            Balloons left of k handled; only l and k remain

    Step 2: Burst everything in (k, r)  ->  dp[k][r]
            Balloons right of k handled; only k and r remain

    Step 3: Burst k itself              ->  a[l] * a[k] * a[r]
            k is the last one, so l and r are its neighbors

  Timeline (reading bottom -> top):

    LAST:     burst k            neighbors are l and r  <- guaranteed!
                ↑
    MIDDLE:   burst all in (k,r) only k, r remain in right half
                ↑
    FIRST:    burst all in (l,k) only l, k remain in left half

  Total: dp[l][k] + dp[k][r] + a[l] * a[k] * a[r]

Matrix chain multiplication is THE canonical interval DP problem. Every interval DP problem is, at its core, a variant of MCM. The structure is always:

Define dp[i][j] = optimal answer for range [i, j]
Try every split point k
Combine: dp[i][j] = best over k of (dp[i][k] ⊕ dp[k+1][j] ⊕ cost(i, k, j))

The only things that change between problems are: what dp[i][j] means, what the cost function is, and whether k splits [i,j] into [i,k]/[k+1,j] (MCM style) or fixes k as a boundary (burst balloons style).

MCM: dp[i][j] tries all split points k

  Matrices: A_i, A_{i+1}, ..., A_j

  For dp[1][4], try k = 1, 2, 3:

    k=1:  (A1) | (A2 A3 A4)    ->  dp[1][1] + dp[2][4] + d1*d2*d5
    k=2:  (A1 A2) | (A3 A4)    ->  dp[1][2] + dp[3][4] + d1*d3*d5
    k=3:  (A1 A2 A3) | (A4)    ->  dp[1][3] + dp[4][4] + d1*d4*d5

  Each split decomposes [i, j] into two shorter intervals:

    [============i======================j============]
         [====i=======k====]  [====k+1======j====]
              left half            right half
            (already               (already
             computed)              computed)

When transitions are O(n) vs O(1) per state. Most interval DP problems try all split points k ∈ [i, j), giving O(n) work per state and O(n³) total. But some problems have O(1) transitions:

Palindrome problems: if s[i] == s[j], then dp[i][j] = dp[i+1][j-1] -- no split-point loop needed.
Problems where only the endpoints matter (not the internal structure) -- the transition depends only on dp[i+1][j], dp[i][j-1], or dp[i+1][j-1].

If you find yourself writing the k-loop but the answer works without it, check whether the recurrence only references adjacent sub-intervals.

The "gap method" -- the standard iteration pattern:

cpp

for (int len = 2; len <= n; len++)         // gap = interval length
    for (int i = 0; i + len - 1 < n; i++) // left endpoint
    {
        int j = i + len - 1;              // right endpoint
        // ... try all splits k in [i, j)
    }

This ensures that when computing dp[i][j] of length L, all subproblems of length < L are already solved. Memorize this loop -- it's the skeleton of every bottom-up interval DP.

Interval DP Table Fill Order (n = 5)

  The table is upper-triangular: dp[i][j] exists only for i <= j.
  Fill diagonal by diagonal (by interval length):

         j=0   j=1   j=2   j=3   j=4
  i=0  [ d=1   d=2   d=3   d=4   d=5 ]     d = diagonal number
  i=1  [  .    d=1   d=2   d=3   d=4 ]         = interval length
  i=2  [  .     .    d=1   d=2   d=3 ]
  i=3  [  .     .     .    d=1   d=2 ]     Fill order:
  i=4  [  .     .     .     .    d=1 ]      d=1 -> d=2 -> d=3 -> d=4 -> d=5

  Dependency arrows for dp[1][4] (on diagonal d=4):

    dp[1][4] depends on:
      k=1: dp[1][1] (d=1) + dp[2][4] (d=3)  <- earlier diagonal OK
      k=2: dp[1][2] (d=2) + dp[3][4] (d=2)  <- earlier diagonal OK
      k=3: dp[1][3] (d=3) + dp[4][4] (d=1)  <- earlier diagonal OK

         j=0   j=1   j=2   j=3   j=4
  i=0  [  .     .     .     .     .  ]
  i=1  [  .    [●]   [●]   [●]--->*  ]   * = dp[1][4] (being computed)
  i=2  [  .     .     .     . ╱   .  ]   ● = dependencies on row i=1
  i=3  [  .     .     .    [●]    .  ]   ○ = dependencies on column j=4
  i=4  [  .     .     .     .   [○]  ]   All on earlier diagonals
               d=1   d=2   d=3  d=1

  Every arrow points from a later diagonal to an earlier one.
  This is why filling by increasing length guarantees correctness.

2.5 When to Reach for This

Trigger phrases -- if the problem says any of these, think interval DP first:

"merge adjacent elements" / "combine neighboring piles"
"optimal parenthesization" / "matrix chain"
"minimum cost to reduce range $[i, j]$ to a single element"
"burst balloons" / "remove stones" from a contiguous segment
"triangulate a polygon"
$n \leq 500$ with a range-based cost function

Counter-examples -- these look similar but are NOT interval DP:

Partition into exactly $k$ contiguous groups: the groups don't recursively split an interval at every possible point. Use 1D DP with an extra dimension for $k$ (or divide-and-conquer optimization).
Remove elements to make the array sorted / pick a subsequence: no merging or splitting of contiguous ranges. This is subsequence DP, not interval DP.
Range queries (min, max, sum): contiguous ranges appear, but there is no optimization over split points. Use a segment tree or sparse table instead.

C++ STL Reference

Function / Class	Header	What it does	Time Complexity
`vector<vector<T>>`	`<vector>`	2D DP table `dp[n][n]`	$O (n^{2})$ to allocate
`fill(first, last, val)`	`<algorithm>`	Initialize DP table to INF or 0	$O (n^{2})$
`partial_sum(first, last, dest)`	`<numeric>`	Build prefix sums for range-cost queries	$O (n)$
`min(a, b)` / `max(a, b)`	`<algorithm>`	Pick optimal split	$O (1)$
`numeric_limits<T>::max()`	`<limits>`	Sentinel for min-DP initialization	$O (1)$
`array<array<T,N>,N>`	`<array>`	Stack-allocated DP table (faster for small $n$ )	$O (1)$ alloc

Implementation (Contest-Ready)

Version 1: Minimal Contest Template -- Stone Merging (MCM Pattern)

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
    auto cost = [&](int l, int r) -> long long { return pre[r + 1] - pre[l]; };

    vector dp(n, vector<long long>(n, 0));
    for (int len = 2; len <= n; len++) {
        for (int l = 0; l + len - 1 < n; l++) {
            int r = l + len - 1;
            dp[l][r] = LLONG_MAX;
            for (int k = l; k < r; k++)
                dp[l][r] = min(dp[l][r], dp[l][k] + dp[k + 1][r] + cost(l, r));
        }
    }
    cout << dp[0][n - 1] << "\n";
}

Version 2: Explained -- Optimal BST / General Interval DP

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n);
    for (auto& x : a) cin >> x;

    // Prefix sums for O(1) range sum queries.
    // cost(l, r) = sum of a[l..r] = pre[r+1] - pre[l].
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++)
        pre[i + 1] = pre[i] + a[i];

    // dp[l][r] = minimum cost to merge all elements in [l, r] into one.
    // Base case: a single element costs nothing to "merge".
    vector dp(n, vector<long long>(n, 0));

    // Iterate by interval length.  Length 1 is already handled (dp[i][i] = 0).
    // For length 2, 3, ..., n:
    for (int len = 2; len <= n; len++) {
        for (int l = 0; l + len - 1 < n; l++) {
            int r = l + len - 1;

            // Initialize to infinity -- we want the minimum.
            dp[l][r] = LLONG_MAX;

            // Try every split point k: left part [l, k], right part [k+1, r].
            // The cost of this merge is: solve left + solve right + cost of
            // combining the two results (here, the sum of the whole range).
            for (int k = l; k < r; k++) {
                long long candidate = dp[l][k] + dp[k + 1][r]
                                    + (pre[r + 1] - pre[l]);
                dp[l][r] = min(dp[l][r], candidate);
            }
        }
    }

    // dp[0][n-1] holds the answer for the entire array.
    cout << dp[0][n - 1] << "\n";
}

Bonus: Burst Balloons Variant (Knuth-style Interval DP)

This variant defines $d p [l] [r]$ as the answer for the open interval $(l, r)$ -- elements $l$ and $r$ are boundaries, not burst. The "last balloon to burst" in $(l, r)$ is chosen as $k$ .

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> a(n + 2);
    a[0] = a[n + 1] = 1;
    for (int i = 1; i <= n; i++) cin >> a[i];

    // dp[l][r] = max coins from bursting all balloons strictly between l and r.
    vector dp(n + 2, vector<long long>(n + 2, 0));

    for (int len = 2; len <= n + 1; len++) {
        for (int l = 0; l + len <= n + 1; l++) {
            int r = l + len;
            for (int k = l + 1; k < r; k++) {
                long long val = dp[l][k] + dp[k][r]
                              + (long long)a[l] * a[k] * a[r];
                dp[l][r] = max(dp[l][r], val);
            }
        }
    }
    cout << dp[0][n + 1] << "\n";
}

Operations Reference

The $O (n^{3})$ core dominates runtime, but knowing where each constant factor comes from helps you judge whether Knuth's optimization or other tricks are worth applying.

Operation	Time	Space
Build prefix sums	$O (n)$	$O (n)$
Fill base cases (length 1)	$O (n)$	--
Compute all DP states	$O (n^{3})$	$O (n^{2})$
Query single state $d p [l] [r]$	$O (1)$	--
Reconstruct optimal split (with parent table)	$O (n)$	$O (n^{2})$
With Knuth optimization	$O (n^{2})$	$O (n^{2})$

Problem Patterns & Tricks

Pattern 1: Matrix Chain Multiplication (MCM)

Split a sequence of matrices at every possible boundary, minimize total scalar multiplications. The classic interval DP pattern. Cost of merging $[l, k]$ and $[k + 1, r]$ depends on the dimensions at $l$ , $k$ , and $r$ .

cpp

// dims[i] = rows of matrix i, dims[n] = cols of matrix n-1
for (int k = l; k < r; k++)
    dp[l][r] = min(dp[l][r], dp[l][k] + dp[k+1][r]
               + (long long)dims[l] * dims[k+1] * dims[r+1]);

CF examples: CF 149D (Coloring Brackets)

Pattern 2: Stone Merging / Pile Merging

Merge $n$ adjacent piles; each merge costs the sum of the two groups. Directly fits the MCM pattern with $cost (l, r) = \sum a [l . . r]$ (use prefix sums). Circular variant: duplicate the array to length $2 n$ and take the min over all length- $n$ windows.

CF examples: CF 1312E (Array Shrinking), CF 1132F (Clear the String)

Pattern 3: Burst Balloons / Last-to-Choose

Instead of choosing the first split, choose the last element to process in a range. Boundaries remain fixed; the chosen element $k$ is the "survivor." Recurrence becomes $d p [l] [r] = max_{k} (d p [l] [k] + d p [k] [r] + f (l, k, r))$ where $l < k < r$ .

cpp

// k is the last balloon burst in open interval (l, r)
for (int k = l + 1; k < r; k++)
    dp[l][r] = max(dp[l][r], dp[l][k] + dp[k][r]
               + (long long)a[l] * a[k] * a[r]);

CF examples: CF 607B (Zuma -- palindrome removal variant)

Pattern 4: Palindrome Partitioning / Removal

Given a string, remove palindromic substrings to clear it with minimum operations. Or: partition into minimum palindromes. Interval DP where $d p [l] [r]$ = min removals for substring $[l, r]$ . Key optimization: if $s [l] == s [r]$ , then $d p [l] [r] = d p [l + 1] [r - 1]$ (they can be removed together with the inner part).

cpp

dp[l][r] = dp[l+1][r-1]; // if s[l] == s[r]
for (int k = l; k < r; k++)
    dp[l][r] = min(dp[l][r], dp[l][k] + dp[k+1][r]);

CF examples: CF 607B (Zuma), CF 1132F (Clear the String)

Pattern 5: Optimal Binary Search Tree

Given keys with frequencies, build a BST minimizing total search cost. $d p [l] [r]$ = min cost for keys $l . . r$ , try each key $k$ as root. Cost adds the sum of frequencies (every node in the subtree pays one extra level). Admits Knuth's optimization: optimal $k$ is monotone, reducing to $O (n^{2})$ .

cpp

// opt[l][r] = optimal split point for [l, r]
for (int k = opt[l][r-1]; k <= opt[l+1][r]; k++)
    if (dp[l][k-1] + dp[k+1][r] + freq_sum(l, r) < dp[l][r]) {
        dp[l][r] = dp[l][k-1] + dp[k+1][r] + freq_sum(l, r);
        opt[l][r] = k;
    }

CF examples: CF 1197E (Culture), CF 1025D (Recovering BST)

Pattern 6: Polygon Triangulation / Cutting

Triangulate a convex polygon to minimize total cost of triangle edges. Label vertices $0. . n - 1$ . Fix edge $(0, n - 1)$ as base, try each third vertex $k$ . This is MCM with geometric cost.

CF examples: CF 1025D (Recovering BST)

Pattern 7: Circular Interval DP

When the array is circular (first and last are adjacent), duplicate the array: $b [i] = a [i mod n]$ for $i \in [0, 2 n)$ . Run standard interval DP on length- $n$ windows, answer = ${min}_{i = 0}^{n - 1} d p [i] [i + n - 1]$ .

cpp

// Circular stone merge: double the array
vector<int> b(2 * n);
for (int i = 0; i < 2 * n; i++) b[i] = a[i % n];
// Run interval DP on b[], then:
long long ans = LLONG_MAX;
for (int i = 0; i < n; i++) ans = min(ans, dp[i][i + n - 1]);

CF examples: CF 1114D (Flood Fill)

Contest Cheat Sheet

+--------------------------------------------------------------+
|                   INTERVAL DP CHEAT SHEET                    |
+--------------------------------------------------------------+
| WHEN TO USE:                                                 |
|   - Merge/split contiguous ranges optimally                  |
|   - "Matrix chain" pattern: combine [l,k] and [k+1,r]       |
|   - n <= 500 and O(n^3) fits in time                         |
|   - Circular arrays (double and take n-length window)        |
+--------------------------------------------------------------+
| TEMPLATE:                                                    |
|   for (int len = 2; len <= n; len++)                         |
|     for (int l = 0; l + len - 1 < n; l++) {                 |
|       int r = l + len - 1;                                   |
|       dp[l][r] = INF;                                        |
|       for (int k = l; k < r; k++)                            |
|         dp[l][r] = min(dp[l][r],                             |
|           dp[l][k] + dp[k+1][r] + cost(l, r));              |
|     }                                                        |
+--------------------------------------------------------------+
| VARIANTS:                                                    |
|   - "Last chosen":  k in (l,r), dp[l][k]+dp[k][r]+f(l,k,r) |
|   - Palindrome:     if s[l]==s[r], dp[l][r]=dp[l+1][r-1]    |
|   - Circular:       double array, min over n windows         |
+--------------------------------------------------------------+
| COMPLEXITY:  Time O(n^3)   Space O(n^2)                      |
| KNUTH OPT:  Time O(n^2)   (when cost is convex/monotone)     |
+--------------------------------------------------------------+

Gotchas & Debugging

Loop Order -- The #1 Mistake

Wrong:

cpp

for (int l = 0; l < n; l++)
    for (int r = l + 1; r < n; r++)  // dp[l+1][r] might not be computed yet!

Right:

cpp

for (int len = 2; len <= n; len++)
    for (int l = 0; l + len - 1 < n; l++)

Always iterate by length. This is non-negotiable.

Base Cases

For merge-style: $d p [i] [i] = 0$ (a single pile costs nothing to merge).
For removal-style: $d p [i] [i] = 1$ (one element needs one operation to remove).
For "last chosen" (burst balloons): boundaries are not part of the range. Table is $(n + 2) \times (n + 2)$ with sentinels at index $0$ and $n + 1$ .

Initialization

Min-DP: initialize to LLONG_MAX or 1e18. Do not use INT_MAX if you add to it (overflow).
Max-DP: initialize to 0 or LLONG_MIN as appropriate.
Always set base cases before the length loop, or handle len = 1 explicitly.

Off-by-One in Split Range

MCM style: $k \in [l, r - 1]$ . Left = $[l, k]$ , right = $[k + 1, r]$ .
Last-chosen style: $k \in [l + 1, r - 1]$ . Both ends are boundaries, not elements.
Mixing these up produces subtle WA.

Circular Arrays

Array must be exactly $2 n$ elements, not $2 n - 1$ .
Final answer: scan all windows $d p [i] [i + n - 1]$ for $i \in [0, n)$ .

Memory

$d p [500] [500]$ of long long = $500^{2} \times 8 = 2$ MB. Fine.
$d p [5000] [5000]$ = 200 MB. Will MLE. If $n > 700$ , reconsider.

Debugging Tips

Print the DP table for small $n$ (say $n = 4$ ) and verify by hand.
Check that $d p [i] [i]$ values match your base case.
If WA, verify the split-point range: k < r vs k <= r vs k < r - 1.

Mental Traps

Trap: "Interval DP is just 2D DP with i and j."

The table is 2D, yes -- but the fill order is fundamentally different. In grid DP you fill row-by-row (or column-by-column). In interval DP you must fill by interval length, diagonal by diagonal. This is the single most common source of bugs:

WRONG: Row-by-row fill            CORRECT: Diagonal (by length) fill
(standard 2D DP order)            (interval DP order)

     j: 0  1  2  3                     j: 0  1  2  3
i:0  [  1-> 2-> 3-> 4 ]             i:0  [  1  2  4  7 ]
  1  [     5-> 6-> 7 ]               1  [     1  3  6 ]
  2  [        8-> 9 ]               2  [        1  5 ]
  3  [          10 ]               3  [           1 ]

Row-by-row: dp[0][3] is cell 4,   Diagonal: dp[0][3] is cell 7,
computed BEFORE dp[1][3] (cell 7)  computed AFTER dp[1][3] (cell 6)
-- but dp[0][3] NEEDS dp[1][3]!    -- all dependencies satisfied OK

Numbers show computation order. In row-by-row, cell 4 (dp[0][3])
runs before cells 6-7 (dp[1][3], dp[2][3]) that it depends on.
In diagonal order, every dependency is on an earlier diagonal.

Trap: "The split point k ranges from i to j."

Almost always wrong. In MCM-style problems, k ranges from i to j-1 (i.e., k < r), because k is the last index of the left half. If k = j, the left half is [i, j] (the whole interval) and the right half [j+1, j] is empty -- producing an infinite loop in top-down or a wrong answer in bottom-up. In burst-balloons-style problems, k ranges from i+1 to j-1 (l < k < r), because i and j are boundaries, not elements.

Rule of thumb: both sub-intervals created by the split must be non-empty (or must be valid base cases).

Trap: "I can use top-down to avoid thinking about fill order."

True -- memoized recursion handles the order automatically. But this masks your understanding of the dependency structure. In contests this matters because:

Bottom-up is faster (no recursion overhead, better cache locality).
Knuth's optimization requires bottom-up iteration in a specific order.
If you can't explain why length-first iteration works, you'll struggle to adapt the pattern to non-standard variants (circular, tree-shaped intervals).

Top-down is fine for getting AC, but bottom-up forces you to internalize the diagonal dependency pattern -- which is the real learning goal.

Common Mistakes

Iterating by endpoints instead of by length. dp[i][j] depends on shorter intervals. The outer loop must be over length len = 1..n, inner loop over starting index i. If you loop for l ; for r by left endpoint, when computing dp[l][r] the subproblems dp[l][k] and dp[k+1][r] haven't been computed yet.
Off-by-one in split point range. For interval [i..j], the split point k typically ranges i <= k < j (or i < k < j depending on the formulation). Getting boundaries wrong misses cases.
Forgetting the merge cost. Splitting [i..j] into [i..k] and [k+1..j] often adds a combining cost (e.g., prefix[j+1] - prefix[i] in matrix chain). Omitting it gives wrong answers.
Base case mismatch. Single-element intervals dp[i][i] are usually 0 or the element itself. Wrong initialization propagates to every larger interval.

Debug Drills

Drill 1. Matrix chain multiplication returns 0.

cpp

int dp[505][505] = {};
for (int len = 2; len <= n; len++)
    for (int i = 1; i + len - 1 <= n; i++) {
        int j = i + len - 1;
        for (int k = i; k < j; k++)
            dp[i][j] = min(dp[i][j], dp[i][k] + dp[k+1][j] + p[i-1]*p[k]*p[j]);
    }

What's wrong?

dp[i][j] starts at 0 (from = {}), and min(0, anything_positive) is always 0. Initialize dp[i][j] = INF for all i < j before the loop, or set dp[i][j] = INF at the start of the len loop before trying splits.

Drill 2. Stone merging gives wrong answer on circular arrangement.

cpp

// Only considers linear arrangement
for (int len = 2; len <= n; len++)
    for (int l = 0; l + len - 1 < n; l++) {
        int r = l + len - 1;
        dp[l][r] = INF;
        for (int k = l; k < r; k++)
            dp[l][r] = min(dp[l][r], dp[l][k] + dp[k+1][r] + sum(l, r));
    }

What's wrong?

For circular stone merging, you need to double the array: concatenate a with itself, run interval DP on the $2 n$ -length array, then take min(dp[i][i+n-1]) for i = 0..n-1. The linear-only approach misses merges that wrap around.

Drill 3. Interval DP on bracket sequence gives wrong count.

cpp

for (int len = 2; len <= n; len++)
    for (int i = 0; i + len - 1 < n; i++) {
        int j = i + len - 1;
        if (s[i] == '(' && s[j] == ')')
            dp[i][j] = dp[i+1][j-1] + 2;
        for (int k = i; k < j; k++)
            dp[i][j] = max(dp[i][j], dp[i][k] + dp[k+1][j]);
    }

What's wrong?

Missing the case where s[i] == '[' and s[j] == ']' (or any other bracket type). Also, the matching check should come before the split loop but its result should participate in the max -- currently it overwrites the split-loop result when brackets match. Combine: compute the match case, then take max over splits including the match.

Self-Test

Before moving to practice problems, verify you can do each of these without looking at the notes above:

[ ] Implement matrix chain multiplication using interval DP -- write the full bottom-up solution from memory
[ ] Explain the fill order: why we iterate by interval length, not by (i, j) row-by-row -- what breaks if you use row order?
[ ] State what dp[i][j] represents in a burst-balloons-type problem -- why are i and j boundaries, not elements?
[ ] Trace through a 4-element interval DP by hand, filling the table diagonal by diagonal (length 1 -> 2 -> 3 -> 4)
[ ] Convert between top-down recursive interval DP and bottom-up iterative -- what changes besides the loop structure?
[ ] Identify interval DP problems from these keywords: "merge", "remove", "partition contiguous range", "parenthesize optimally"

Practice Problems

CF Rating	How Interval DP Appears
1200	Rarely -- most interval DPs start at 1500+
1500	Palindrome partitioning, basic matrix chain multiplication
1800	Stone merging, bracket coloring, merge-equal-values problems
2100	Interval DP with Knuth optimization, interval DP on circular arrays, combined with segment trees

#	Problem	Source	Difficulty	Key Idea
1	Matrix Chain Multiplication	CSES	Easy	Standard MCM template
2	Zuma	CF 607B	1900	Palindrome removal, interval DP
3	Array Shrinking	CF 1312E	1800	Merge equal-valued adjacent elements
4	Clear the String	CF 1132F	1800	Remove same-char substrings, interval DP
5	Flood Fill	CF 1114D	1900	Circular interval DP on colors
6	Coloring Brackets	CF 149D	1900	Interval DP on bracket sequences
7	Recovering BST	CF 1025D	2400	Interval DP on BST structure with GCD
8	Yet Another Minimization Problem	CF 868F	2500	Interval DP + divide-and-conquer optimization

Recap

Loop structure: outer loop over interval length, inner loop over left endpoint, innermost loop over split point.
Transition: dp[i][j] = min/max over k of (dp[i][k] + dp[k+1][j] + merge_cost).
O(n^3) time, O(n^2) space. Knuth's optimization reduces to O(n^2) when the cost function satisfies the quadrangle inequality.
Classic problems: matrix chain multiplication, burst balloons, stone merging, palindrome partitioning.

Cross-Links

DP -- 1D Introduction -- 1D DP foundation before tackling intervals
DP Thinking Framework -- interval is one of the 6 state archetypes
DP vs Greedy Guide -- greedy splitting usually fails; interval DP is the safe choice
DP Practice Ladder -- graded problem set

DP -- Interval ​

Contents ​

Intuition ​

2.1 The Pain ​

2.2 The Key Insight ​

2.3 Visual Walkthrough ​

2.4 The Invariant ​

What the Code Won't Teach You ​

2.5 When to Reach for This ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1: Minimal Contest Template -- Stone Merging (MCM Pattern) ​

Version 2: Explained -- Optimal BST / General Interval DP ​

Bonus: Burst Balloons Variant (Knuth-style Interval DP) ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Matrix Chain Multiplication (MCM) ​

Pattern 2: Stone Merging / Pile Merging ​

Pattern 3: Burst Balloons / Last-to-Choose ​

Pattern 4: Palindrome Partitioning / Removal ​

Pattern 5: Optimal Binary Search Tree ​

Pattern 6: Polygon Triangulation / Cutting ​

Pattern 7: Circular Interval DP ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Loop Order -- The #1 Mistake ​

Base Cases ​

Initialization ​

Off-by-One in Split Range ​

Circular Arrays ​

Memory ​

Debugging Tips ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Recap ​

Cross-Links ​

DP -- Interval

Contents

Intuition

2.1 The Pain

2.2 The Key Insight

2.3 Visual Walkthrough

2.4 The Invariant

What the Code Won't Teach You

2.5 When to Reach for This

C++ STL Reference

Implementation (Contest-Ready)

Version 1: Minimal Contest Template -- Stone Merging (MCM Pattern)

Version 2: Explained -- Optimal BST / General Interval DP

Bonus: Burst Balloons Variant (Knuth-style Interval DP)

Operations Reference

Problem Patterns & Tricks

Pattern 1: Matrix Chain Multiplication (MCM)

Pattern 2: Stone Merging / Pile Merging

Pattern 3: Burst Balloons / Last-to-Choose

Pattern 4: Palindrome Partitioning / Removal

Pattern 5: Optimal Binary Search Tree

Pattern 6: Polygon Triangulation / Cutting

Pattern 7: Circular Interval DP

Contest Cheat Sheet

Gotchas & Debugging

Loop Order -- The #1 Mistake

Base Cases

Initialization

Off-by-One in Split Range

Circular Arrays

Memory

Debugging Tips

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Recap

Cross-Links