Skip to content

DP -- Divide and Conquer Optimization

Quick Revisit

  • USE WHEN: Partition DP where opt(i,j)opt(i,j+1) — optimal split point is monotone (cost satisfies quadrangle inequality)
  • INVARIANT: For the midpoint jm, find opt(i,jm); left half searches mopt, right half searches mopt
  • TIME: O(kn log n) | SPACE: O(n)
  • CLASSIC BUG: Passing wrong m-search bounds into recursive calls (must be [mlo,opt] and [opt,mhi], not the j bounds)
  • PRACTICE: 04-ladder-dp

AL-32 | When the optimal split point is monotone, cut the DP transition from O(kn2) to O(knlogn) -- the standard trick for partition-cost DPs on Codeforces Div 1 C/D.

Difficulty: [Advanced]Prerequisites: DP -- 1D Introduction, Binary SearchCF Rating: 2000-2300+

Contents


Intuition

You want to split a row of n data points into exactly k contiguous segments, minimizing a total cost. The natural DP tries every possible split point for every position -- and that inner loop blows up.

You have a recurrence that partitions n elements into k contiguous groups:

dp[i][j]=min0m<j(dp[i1][m]+C(m+1,j))

Layer i represents "use exactly i groups to cover positions 1..j." For each of the k layers you try every split point m for every position j, giving O(n) work per state and O(kn2) total.

When n=105 and k=20, that is 2×1011 operations -- hopelessly slow. You need to cut the inner loop from O(n) to O(logn) per state.

Let opt(i,j) denote the smallest split point m that achieves the minimum for state (i,j).

If the optimal split point opt(i,j) is monotone non-decreasing in j (i.e., opt(i,j)opt(i,j+1) for all j), you can solve each row of the DP with divide and conquer, reducing the total cost from O(kn2) to O(knlogn).

Why does monotonicity help? Once you know opt(i,jm) for the midpoint jm of some range [jlo,jhi], you know:

  • Every j<jm has opt(i,j)opt(i,jm).
  • Every j>jm has opt(i,j)opt(i,jm).

So you split the problem into two halves, each with a strictly smaller search range for m. This is divide and conquer on the j-axis, and each "level" of the recursion scans a total of O(n) split points, across O(logn) levels.

The monotonicity condition holds whenever the cost function C(l,r) satisfies the quadrangle inequality (QI):

C(a,c)+C(b,d)C(a,d)+C(b,c)for abcd

Intuitively, "nested intervals cost no more than crossing intervals." Squared range sums, convex functions of prefix-sum differences, and many combinatorial costs satisfy QI.

Naive vs D&C: Scanning Pattern Comparison

  NAIVE O(kn²): For each j, scan ALL possible split points m.
  Every cell scans the full range independently.

  j=1:  [x]
  j=2:  [x x]
  j=3:  [x x x]
  j=4:  [x x x x]
  j=5:  [x x x x x]
  j=6:  [x x x x x x]
  j=7:  [x x x x x x x]
  j=8:  [x x x x x x x x]
        ─────────────────────
        m=0 1 2 3 4 5 6 7        Total x's ~= n²/2

  D&C OPTIMIZATION O(kn log n): Solve midpoint first, then
  constrain each half's search range using monotonicity.

  Level 0 -- solve j=4:    [. . . x x x x .]       scan m in [0..7]
                                                           ↓ opt(4)=2
  Level 1 -- solve j=2,6:  [. x x . . . . .]       m in [0..2]
                           [. . . . . x x x]       m in [2..7]

  Level 2 -- solve j=1,3,  [x . . . . . . .]       m in [0..1]
             5,7:          [. . x . . . . .]       m in [1..2]
                           [. . . . x x . .]       m in [2..5]
                           [. . . . . . x x]       m in [5..7]

  Each level's x's partition [0..n], so total per level = O(n).
  Levels = O(log n).  Grand total = O(n log n) per layer.

Visual Walkthrough

Setup: n=8, k=2 (two groups). Layer i=2: compute dp[2][j] for j=2..8 using values from dp[1][]. Suppose the optimal split points turn out to be:

  j:      2   3   4   5   6   7   8
  opt:    1   1   2   2   3   5   6
                          ^
          non-decreasing -- monotonicity holds

Step 1 -- Solve the midpoint. Range j[2,8], search m[1,7]. Midpoint jm=5. Scan m=1..4 and find opt(2,5)=2.

  j:    [2 ---- 5 ---- 8]       m-range: [1 .. 7]
               ^
            solve j=5 -> opt=2

Step 2 -- Recurse left. Range j[2,4], search m[1,2] (narrowed: mopt(2,5)=2). Midpoint jm=3. Scan m=1..2 and find opt(2,3)=1.

  j:    [2 -- 3 -- 4]           m-range: [1 .. 2]
             ^
          solve j=3 -> opt=1

Then recurse into j=2 with m[1,1] and j=4 with m[1,2].

Step 3 -- Recurse right. Range j[6,8], search m[2,7] (narrowed: mopt(2,5)=2). Midpoint jm=7. Scan m=2..6 and find opt(2,7)=5.

  j:    [6 -- 7 -- 8]           m-range: [2 .. 7]
             ^
          solve j=7 -> opt=5

Then recurse into j=6 with m[2,5] and j=8 with m[5,7].

Step 4 -- Leaf calls. Each remaining sub-range has one element. Scan the narrowed m-range to fill in dp[2][2], dp[2][4], dp[2][6], dp[2][8].

  Full recursion tree (level -> total m-scans = O(n)):

  Level 0:  j in [2..8],  m in [1..7]      scan 4 values for j=5
                /                \
  Level 1:  [2..4] m:[1..2]    [6..8] m:[2..7]
            scan 2 for j=3      scan 5 for j=7
            /      \            /      \
  Level 2: [2]    [4]        [6]     [8]
           m:[1,1] m:[1,2]   m:[2,5]  m:[5,7]
           scan 1  scan 2    scan 4   scan 3

  Total scans across all levels: O(n) per level * O(log n) levels

The Invariant

Invariant: When computing dp[i][j] for j[jlo,jhi], the optimal split point lies in [mlo,mhi]. Solving the midpoint jmid and finding opt(i,jmid) splits both the j-range and the m-range:

  • Left half: j[jlo,jmid1], m[mlo,opt(i,jmid)]
  • Right half: j[jmid+1,jhi], m[opt(i,jmid),mhi]

Each recursive level partitions [mlo,mhi] into non-overlapping scans (plus shared endpoints), so total work per level is O(n). With O(logn) levels, one layer costs O(nlogn) and the full DP costs O(knlogn).

General Recursion Tree of solve(l, r, opt_lo, opt_hi)

  solve(l, r, opt_lo, opt_hi) splits the j-range [l..r] at its
  midpoint and narrows the opt-range for each half.

  solve(1, 8, 0, 7)                        j-range=[1..8], opt-range=[0..7]
       │  mid=4, scan m∈[0..7] -> opt(4)=2

       ├── solve(1, 3, 0, 2)               j=[1..3], opt=[0..2]  <- narrowed!
       │       │  mid=2, scan m∈[0..2] -> opt(2)=1
       │       ├── solve(1, 1, 0, 1)       j=[1], opt=[0..1]
       │       │       scan m∈[0..1] -> done (leaf)
       │       └── solve(3, 3, 1, 2)       j=[3], opt=[1..2]
       │               scan m∈[1..2] -> done (leaf)

       └── solve(5, 8, 2, 7)               j=[5..8], opt=[2..7]  <- narrowed!
               │  mid=6, scan m∈[2..7] -> opt(6)=4
               ├── solve(5, 5, 2, 4)       j=[5], opt=[2..4]
               │       scan m∈[2..4] -> done (leaf)
               └── solve(7, 8, 4, 7)       j=[7..8], opt=[4..7]
                       │  mid=7, scan m∈[4..7] -> opt(7)=5
                       ├── solve(8, 8, 5, 7)   (leaf)
                       └── (empty left)

  KEY: At each level, the opt-ranges are non-overlapping (up to shared
  endpoints), so total scan work per level = O(n). Depth = O(log n).
  Rows = layers (k), Columns = positions (j).
  Each row is filled by one D&C pass over the previous row.

         j=1   j=2   j=3   j=4   j=5   j=6   j=7   j=8
  k=1:  [base] [base] [base] [base] [base] [base] [base] [base]
         │      │      │      │      │      │      │      │
         ▼      ▼      ▼      ▼      ▼      ▼      ▼      ▼
  k=2:  [    ] [    ] [    ] [MID ] [    ] [    ] [    ] [    ]
                                ↑ solve mid first
                             opt=2
                        ┌───────┴───────┐
                   left half         right half
                   j∈[1..3]          j∈[5..8]
                   opt∈[0..2]        opt∈[2..7]
                        │                 │
                        ▼                 ▼
  k=2:  [    ] [MID ] [    ] [done] [    ] [MID ] [    ] [    ]
                  ↑                          ↑
               opt=1                      opt=4
           ┌────┴────┐               ┌────┴──────┐
          j=1       j=3             j=5     j∈[7..8]
         opt∈[0,1] opt∈[1,2]      opt∈[2,4] opt∈[4,7]

  k=3:  (same D&C process, using completed k=2 row as "prv")

  Arrow pattern: each layer reads the FULL previous layer (prv[]),
  but the D&C recursion constrains which m-values are scanned.

Worked Example: Cell-by-Cell Trace for n = 6, k = 2

  Computing dp[2][j] for j = 1..6.   opt-range starts as [0..5].
  Suppose cost(m+1, j) = (j - m)².   dp[1][j] = j² (base layer).

  Level 0:  solve(1, 6, 0, 5)
    mid j=3, scan m ∈ {0,1,2,3,4,5}:
      m=0: dp[1][0]+cost(1,3) = 0+9  = 9
      m=1: dp[1][1]+cost(2,3) = 1+4  = 5
      m=2: dp[1][2]+cost(3,3) = 4+1  = 5  <- tie, take m=2
    -> dp[2][3] = 5,  opt(2,3) = 2

  Level 1L: solve(1, 2, 0, 2)
    mid j=1, scan m ∈ {0}:
      m=0: 0+1 = 1
    -> dp[2][1] = 1,  opt(2,1) = 0
    Then j=2, scan m ∈ {0,1,2}:
      m=0: 0+4 = 4    m=1: 1+1 = 2    m=2: 4+0 = 4
    -> dp[2][2] = 2,  opt(2,2) = 1

  Level 1R: solve(4, 6, 2, 5)
    mid j=5, scan m ∈ {2,3,4}:
      m=2: 4+9 = 13   m=3: 9+4 = 13   m=4: 16+1 = 17
    -> dp[2][5] = 13,  opt(2,5) = 2
    Then j=4, scan m ∈ {2}:
      m=2: 4+4 = 8
    -> dp[2][4] = 8,  opt(2,4) = 2
    Then j=6, scan m ∈ {2,3,4,5}:
      m=2: 4+16 = 20  m=3: 9+9 = 18  m=4: 16+4 = 20  m=5: 25+1 = 26
    -> dp[2][6] = 18,  opt(2,6) = 3

  Summary:  j:   1   2   3   4   5   6
           dp:   1   2   5   8  13  18
          opt:   0   1   2   2   2   3    <- non-decreasing OK

  Total m-scans: 1 + 3 + 6 + 1 + 3 + 4 = 18  (vs 6x6 = 36 naive)
  Savings grow with n:  O(n log n) vs O(n²).

What the Code Won't Teach You

How to verify the monotonicity condition in practice. Before applying D&C optimization to a new problem, implement the naive O(kn2) DP first. Add a line that records opt[i][j] -- the split point m that achieves the minimum for each state (i,j). Print opt[i][j] for a fixed row i across all j. If the values are non-decreasing, the precondition holds. If they're not, D&C optimization will silently give wrong answers. This takes 5 minutes and saves hours of debugging.

cpp
// Verification snippet -- add to your naive O(kn²) DP:
for (int j = 1; j <= n; j++) {
    for (int m = 0; m < j; m++) {
        if (prv[m] + cost(m+1, j) < cur[j]) {
            cur[j] = prv[m] + cost(m+1, j);
            opt[j] = m;   // record optimal split
        }
    }
}
// After filling row i, print opt[j] values:
for (int j = 1; j <= n; j++) cerr << "opt[" << j << "]=" << opt[j] << " ";
cerr << "\n";
// Expected: non-decreasing sequence. If not, D&C optimization does NOT apply.

The relationship between D&C optimization and Knuth's optimization. Both exploit the same underlying phenomenon -- monotonicity of optimal split points. The difference is the DP shape they apply to:

  • Knuth's optimization: Interval DP of the form dp[l][r]=minlm<r(dp[l][m]+dp[m+1][r]+C(l,r)). Reduces O(n3)O(n2). The split point m satisfies opt(l,r1)opt(l,r)opt(l+1,r).
  • D&C optimization: Layered/partitioning DP of the form dp[i][j]=minm(dp[i1][m]+C(m+1,j)). Reduces O(kn2)O(knlogn). The split point satisfies opt(i,j)opt(i,j+1).

Both require the quadrangle inequality on the cost function, but they apply it to different recurrence structures. Using D&C on an interval DP (or Knuth's on a partitioning DP) won't work.

Why the recursion depth is O(logn). Each call to solve(jl, jr, kl, kr) picks the midpoint jm=(jl+jr)/2 and recurses on [jl,jm1] and [jm+1,jr]. The j-range halves at each level, so the recursion has depth log2n. At each level, the m-ranges across all recursive calls at that level partition [0,n] (up to shared endpoints), so total scan work per level is O(n). Multiply: O(logn) levels × O(n) per level =O(nlogn) per layer.

With those deeper insights in hand, here is how to recognize D&C optimization problems in practice.

When to Reach for This

Trigger phrases in problem statements:

  • "Partition array into exactly k contiguous groups, minimize total cost."
  • You write a 1D/1D DP and notice (or can prove) that opt is monotone.
  • Cost function is a concave or convex function of a prefix-sum difference (squared sums, absolute deviations, etc.).

Quick QI test: if C(l,r) depends only on rl and is convex, QI usually holds. When in doubt, brute-force verify on small n:

cpp
for (int a = 1; a <= n; a++)
  for (int b = a; b <= n; b++)
    for (int c = b; c <= n; c++)
      for (int d = c; d <= n; d++)
        assert(cost(a,c) + cost(b,d) <= cost(a,d) + cost(b,c));

Counter-examples -- when this does NOT apply:

  • opt is not monotone (e.g., cost is the number of distinct elements in a range -- QI fails in general). Use a different technique.
  • Cost is linear in the split point (no curvature). Use the Convex Hull Trick (CHT) instead.
  • Cost has no structure at all. Fall back to O(kn2) or consider Knuth's optimization if the Knuth conditions hold.

C++ STL Reference

The implementation uses only basic containers and algorithms:

Function / ClassHeaderWhat it doesTime Complexity
vector<long long><vector>DP rows prv[], cur[]O(n) alloc
fill(first, last, val)<algorithm>Reset DP row to LLONG_MAXO(n)
partial_sum(first, last, dest)<numeric>Build prefix sums for cost queriesO(n)
min(a, b)<algorithm>Pick best transitionO(1)
LLONG_MAX<climits>Infinity sentinel for min-DP--
function<void(...)><functional>Recursive lambda (if needed)--

Implementation (Contest-Ready)

Version 1: Minimal Contest Template

⚠️ Monotonicity is a PRECONDITION, not a hope. The D&C optimization requires that the optimal split point opt(i, j) satisfies opt(i, j) <= opt(i, j+1) (or >=). If this doesn't hold, the optimization gives wrong answers silently.

How to verify: Before applying D&C optimization, implement the O(n²) DP and print opt(i, j) for each j in a fixed row i. If the values are non-decreasing, proceed. If not, D&C optimization does NOT apply.

Sufficient condition (Knuth/SMAWK): If the cost function C(l, r) satisfies the quadrangle inequality: C(a, c) + C(b, d) <= C(a, d) + C(b, c) for a <= b <= c <= d, then monotonicity holds.

Solves: partition array a[1..n] into exactly K contiguous groups, minimize C(l,r) where C(l,r)=(sum of a[l..r])2.

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, K;
    cin >> n >> K;
    vector<long long> a(n + 1), pre(n + 1, 0);
    for (int i = 1; i <= n; i++) { cin >> a[i]; pre[i] = pre[i - 1] + a[i]; }

    auto cost = [&](int l, int r) -> long long {
        long long s = pre[r] - pre[l - 1];
        return s * s;
    };

    const long long INF = 1e18;
    vector<long long> prv(n + 1, INF), cur(n + 1, INF);

    // Base layer: 1 group covering [1..j]
    for (int j = 1; j <= n; j++) prv[j] = cost(1, j);

    for (int i = 2; i <= K; i++) {
        fill(cur.begin(), cur.end(), INF);
        // D&C: compute cur[jl..jr] with k in [kl..kr]
        auto solve = [&](auto& self, int jl, int jr, int kl, int kr) -> void {
            if (jl > jr) return;
            int jm = (jl + jr) / 2;
            int best_k = kl;
            for (int k = kl; k <= min(jm - 1, kr); k++) {
                long long val = prv[k] + cost(k + 1, jm);
                if (val < cur[jm]) { cur[jm] = val; best_k = k; }
            }
            self(self, jl, jm - 1, kl, best_k);
            self(self, jm + 1, jr, best_k, kr);
        };
        solve(solve, i, n, i - 1, n - 1);
        swap(prv, cur);
    }
    cout << prv[n] << "\n";
}

Version 2: Explained

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n, K;
    cin >> n >> K;
    vector<long long> a(n + 1), pre(n + 1, 0);
    for (int i = 1; i <= n; i++) {
        cin >> a[i];
        pre[i] = pre[i - 1] + a[i];
    }

    // Cost of one group covering [l..r].
    // Here: squared range sum.  Replace with your problem's cost.
    auto cost = [&](int l, int r) -> long long {
        long long s = pre[r] - pre[l - 1];
        return s * s;
    };

    const long long INF = 1e18;
    // prv[j] = dp[i-1][j], cur[j] = dp[i][j].
    // Only two rows needed -- previous layer and current layer.
    vector<long long> prv(n + 1, INF), cur(n + 1, INF);

    // Layer 1: exactly one group, so dp[1][j] = cost(1, j).
    for (int j = 1; j <= n; j++)
        prv[j] = cost(1, j);

    // Layers 2..K: use divide and conquer on j.
    for (int i = 2; i <= K; i++) {
        fill(cur.begin(), cur.end(), INF);

        // solve(jl, jr, kl, kr):
        //   Compute cur[j] for j in [jl..jr].
        //   The optimal k for these j-values lies in [kl..kr].
        //
        // Strategy:
        //   1. Pick midpoint jm = (jl+jr)/2.
        //   2. Brute-force scan k in [kl..min(jm-1, kr)] to find opt(jm).
        //      (k < jm because group i must be non-empty: [k+1..jm].)
        //   3. Recurse left  [jl..jm-1] with k in [kl..opt(jm)].
        //   4. Recurse right [jm+1..jr] with k in [opt(jm)..kr].
        //
        // Why correct: opt is monotone, so left half needs k <= opt(jm)
        // and right half needs k >= opt(jm).
        auto solve = [&](auto& self, int jl, int jr, int kl, int kr) -> void {
            if (jl > jr) return;
            int jm = (jl + jr) / 2;
            int best_k = kl;

            // Scan all feasible k for position jm.
            // k can be at most jm-1 (so that group [k+1..jm] is non-empty),
            // and at most kr (upper bound from monotonicity).
            for (int k = kl; k <= min(jm - 1, kr); k++) {
                long long val = prv[k] + cost(k + 1, jm);
                if (val < cur[jm]) {
                    cur[jm] = val;
                    best_k = k;
                }
            }

            // Left half: j < jm, so opt <= best_k.
            self(self, jl, jm - 1, kl, best_k);
            // Right half: j > jm, so opt >= best_k.
            self(self, jm + 1, jr, best_k, kr);
        };

        // j ranges from i (need at least i elements for i groups) to n.
        // k ranges from i-1 (previous layer needs at least i-1 elements) to n-1.
        solve(solve, i, n, i - 1, n - 1);
        swap(prv, cur);
    }

    // After K layers, answer is in prv[n] (swapped from cur).
    cout << prv[n] << "\n";
}

Operations Reference

OperationTimeSpace
Build prefix sumsO(n)O(n)
Base layer (1 group)O(n)--
One D&C layerO(nlogn)O(n)
Full DP (K layers)O(Knlogn)O(n)
Naive DP (no optimization)O(Kn2)O(n)
Cost function query (prefix sums)O(1)--

Problem Patterns & Tricks

Pattern 1: Partition into K Groups -- Squared Cost

Partition array into exactly K contiguous groups, minimize (group sum)2. The squared-sum cost satisfies QI. Direct application of the template.

cpp
auto cost = [&](int l, int r) -> long long {
    long long s = pre[r] - pre[l - 1];
    return s * s;
};

CF examples: CF 868F (Yet Another Minimization Problem)

Pattern 2: Partition into K Groups -- Count of Distinct Pairs

Cost = number of pairs (x,y) with x<y and a[x]=a[y] in the group. Maintain a frequency array and update incrementally as k moves (similar to Mo's algorithm). Satisfies QI.

CF examples: CF 868F

Pattern 3: Minimum Cost to Cut -- 1D Facility Location

Place K facilities on a line; each client goes to the nearest one. Cost of a segment is |a[j]median| or a similar convex function of the group. Monotone opt applies.

CF examples: CF 1527E (Partition Game)

Pattern 4: CSES-Style "Divide into K Segments"

Minimize the maximum segment sum, or minimize a convex function of segment sums. When the objective is convex in the partition, D&C optimization applies. Sometimes combined with binary search on the answer ("aliens trick" / lambda optimization).

CF examples: CSES - Dividing into Segments

Pattern 5: Building Bridges / Weighted Intervals

dp[j]=mink<j(dp[k]+C(k,j)) -- a 1D DP where the cost function has monotone opt. This is the single-layer variant (K=1 outer loop, just one D&C call). Often appears as "building bridges" or "placing checkpoints."

cpp
// Single-layer D&C optimization
auto solve = [&](auto& self, int jl, int jr, int kl, int kr) -> void {
    if (jl > jr) return;
    int jm = (jl + jr) / 2, best_k = kl;
    for (int k = kl; k <= min(jm - 1, kr); k++)
        if (dp[k] + cost(k + 1, jm) < dp[jm])
            { dp[jm] = dp[k] + cost(k + 1, jm); best_k = k; }
    self(self, jl, jm - 1, kl, best_k);
    self(self, jm + 1, jr, best_k, kr);
};

CF examples: CF 321E (Ciel and Gondola)

Pattern 6: Online Cost with Incremental Updates

When C(l,r) cannot be computed in O(1) but can be extended left/right in O(1) (like a frequency counter), scan k from kl to kr while maintaining a sliding data structure. Total work is still O(nlogn) per layer because the k-ranges at each recursion level form a partition of [0,n].

CF examples: CF 868F


Contest Cheat Sheet

+----------------------------------------------------------------+
|          DP DIVIDE AND CONQUER OPTIMIZATION                    |
+----------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   - dp[i][j] = min over k { dp[i-1][k] + C(k+1,j) }         |
|   - opt(i,j) is non-decreasing in j (monotone split)          |
|   - C(l,r) satisfies quadrangle inequality                    |
|   - Reduces O(Kn^2) to O(Kn log n)                            |
+----------------------------------------------------------------+
| TEMPLATE CORE:                                                 |
|   solve(jl, jr, kl, kr):                                      |
|     jm = (jl+jr)/2                                             |
|     scan k in [kl..min(jm-1, kr)] -> find best_k              |
|     recurse left:  (jl, jm-1, kl, best_k)                     |
|     recurse right: (jm+1, jr, best_k, kr)                     |
+----------------------------------------------------------------+
| KEY BOUNDS:                                                    |
|   j in [i .. n],  k in [i-1 .. n-1]                           |
|   Two rows only: prv[] and cur[], swap after each layer        |
+----------------------------------------------------------------+
| COMPLEXITY:  Time O(Kn log n)   Space O(n)                     |
| VERIFY QI:   C(a,c)+C(b,d) <= C(a,d)+C(b,c)  for a<=b<=c<=d  |
+----------------------------------------------------------------+

Gotchas & Debugging

Two-Row Requirement

You must have the previous layer fully computed before starting D&C on the current layer. Do not try to compute in-place -- the recursion reads prv[k] values that must not be overwritten.

k-Range Bounds

The most common bug: passing wrong kl / kr to recursive calls.

  • Left call gets kl to best_k (not best_k - 1).
  • Right call gets best_k to kr (not best_k + 1).

best_k is included in both halves. This is correct because the left half constrains optbest_k and the right half constrains optbest_k.

k < jm Constraint

In the inner scan, k must satisfy k <= jm - 1 so that group [k+1,jm] is non-empty. Use min(jm - 1, kr) as the loop bound. Forgetting this causes index-out-of-range or empty groups.

Cost Function Must Be Correct

The D&C trick only helps with the transition scan -- if C(l,r) is buggy, no amount of optimization saves you. Test C independently on small cases.

Verifying QI

If unsure whether your cost function satisfies QI:

cpp
// Brute-force QI check on small n (debugging only)
for (int a = 1; a <= n; a++)
  for (int b = a; b <= n; b++)
    for (int c = b; c <= n; c++)
      for (int d = c; d <= n; d++)
        assert(cost(a,c) + cost(b,d) <= cost(a,d) + cost(b,c));

Overflow

If C(l,r) involves squared sums and a[i]109, the sum can reach 1014 and the square 1028 -- this overflows long long. Use __int128 or restructure the cost.

Debugging Tips

  • Print best_k for a small example and verify monotonicity by hand.
  • Compare output against the O(Kn2) naive solution on random inputs up to n=50.
  • If TLE, check that your cost function is truly O(1) per query (or that the incremental version is amortized correctly).

Off-by-one in the opt-search bound

The single tricky line in the entire template is the inner-loop bound: for (int k = kl; k <= min(jm - 1, kr); k++). Writing min(jm, kr) instead of min(jm - 1, kr) lets k = jm, which reads an uncomputed value from the previous layer (or an empty group [jm+1, jm]) and silently corrupts dp[i][jm]. Every D&C optimization bug I've hit lived in this single line -- comment it with // k must be strictly less than jm and re-read it before submitting.

Mental Traps

"The monotonicity condition is obvious." It is not. You must prove that opt(i,j)opt(i,j+1) for your specific cost function. The quadrangle inequality is a sufficient condition, but it is not always easy to verify by inspection. Some cost functions that "look" monotone are not (e.g., cost based on the number of distinct elements in a range). Always verify empirically on small inputs or prove it analytically before committing to D&C optimization.

  THE MONOTONICITY PROPERTY YOU MUST VERIFY:

  For a fixed layer i:

    opt(i,1) <= opt(i,2) <= opt(i,3) <= ... <= opt(i,n)
       │          │          │                    │
       ▼          ▼          ▼                    ▼
    best m     best m     best m              best m
    for j=1    for j=2    for j=3             for j=n

  OK VALID (monotone):    opt = [0, 0, 1, 2, 2, 3, 5, 6]
  X INVALID (not mono):  opt = [0, 2, 1, 3, 2, 5, 4, 6]
                                    ↑ ↑        ↑ ↑
                               2>1 violates    5>4 violates

  If INVALID: D&C optimization restricts the search range for some j
  to NOT include its true optimal m -> silent wrong answers.

"D&C optimization always applies when cost is concave/convex." Concavity or convexity of C(l,r) as a function of the interval is sufficient for the quadrangle inequality (and therefore monotone split points), but it is not the only way monotonicity arises. The key condition is monotonicity of opt(i,j) -- not concavity itself. There exist cost functions that are neither concave nor convex but still have monotone optimal splits. Conversely, there are cost functions with local concavity that still violate QI. Always check the actual condition (monotone opt), not a proxy.

"I can just use this whenever the naive DP is O(kn2)." The O(kn2)O(knlogn) reduction only works when the QBSM (Quadrangle inequality / optimal Break-point Sorted Monotonicity) condition holds. Without it, the D&C recursion prunes the search range incorrectly. Other O(kn2) DPs may need entirely different optimizations: the Convex Hull Trick (when the transition is linear in m), Knuth's optimization (for interval DPs), or the SMAWK algorithm (for totally monotone matrices).


Debug Drills

Drill 1: Scanning opt in wrong range

What's wrong?
cpp
void solve(int k, int lo, int hi, int opt_lo, int opt_hi) {
    if (lo > hi) return;
    int mid = (lo + hi) / 2;
    int best_j = opt_lo;
    for (int j = opt_lo; j <= opt_hi; j++) {  // BUG: j can exceed mid!
        long long val = dp[k-1][j] + cost(j + 1, mid);
        if (val < dp[k][mid]) {
            dp[k][mid] = val;
            best_j = j;
        }
    }
    solve(k, lo, mid - 1, opt_lo, best_j);
    solve(k, mid + 1, hi, best_j, opt_hi);
}

The loop bound j <= opt_hi doesn't enforce j < mid. When opt_hi >= mid, the loop considers j = mid, which means cost(mid + 1, mid) is evaluated (empty range) -- silent corruption. Fix: Change the loop bound to j <= min(opt_hi, mid - 1).

Drill 2: Forgot to initialize dp to infinity

What's wrong?
cpp
long long dp[K_MAX + 1][N_MAX + 1]; // uninitialized!

void solve_all() {
    for (int i = 1; i <= n; i++)
        dp[1][i] = cost(1, i); // base case
    for (int k = 2; k <= K; k++)
        solve(k, 1, n, 0, n - 1);
    cout << dp[K][n] << endl;
}

dp[k][i] for k >= 2 is never initialized to infinity. The solve routine compares val < dp[k][mid] to update, but dp[k][mid] starts as 0 (or garbage). Since val is positive, no update fires, and the answer is 0. Fix: Before each solve(k, ...) call, initialize dp[k][1..n] = INF (e.g., 1e18).

Drill 3: Cost function not O(1)

What's wrong?
cpp
// cost(l, r) = number of distinct elements in a[l..r]
int cost(int l, int r) {
    set<int> s;
    for (int i = l; i <= r; i++) s.insert(a[i]);
    return s.size();
    // BUG: O(n log n) per call -> total O(n^2 log^2 n)
}

The D&C optimization makes O(n log n) calls to cost(). An O(n) cost function blows the complexity back up past the naive O(Kn²). Fix: Use incremental cost with a global frequency array and maintain cost(l, r) by adding/removing one element at a time as pointers shift. Total pointer movement across the recursion is O(n log n), so amortized updates stay within budget (the CF 868F pattern).


Self-Test

Before using this in a contest, verify you can do all of the following without looking at the notes:

  • [ ] State the D&C optimization precondition: the optimal split point opt(i,j) must be monotone non-decreasing in j for each fixed layer i.
  • [ ] Explain how the solve(l, r, opt_lo, opt_hi) recursion reduces O(kn2) to O(knlogn): solving the midpoint narrows the opt-range for each half, and the total scan work per recursion level is O(n) across O(logn) levels.
  • [ ] Verify the monotonicity condition on a small example (n8) by hand: compute opt(i,j) for all j in a fixed row and check they're non-decreasing.
  • [ ] Implement the D&C optimization for a partitioning problem from scratch in under 15 minutes (write solve, the layer loop, and the cost function).
  • [ ] Distinguish when to use D&C optimization vs Knuth's optimization vs Convex Hull Trick: D&C for layered partition DPs with monotone opt; Knuth's for interval DPs dp[l][r] with QI; CHT when the transition is a linear function of the split point.
  • [ ] Explain why this technique requires computing C(l,r) efficiently: the inner scan evaluates C(m+1,j) for each candidate m, so O(1) cost queries are needed to keep the per-level work at O(n).

Practice Problems

CF RatingWhat you should be comfortable with
1200Understand the O(Kn²) partition DP; write the brute-force three-nested-loop version
1500Know what "monotonicity of opt" means; verify the monotonicity condition on small cases
1800Implement the D&C template cleanly; precompute prefix sums for O(1) cost queries; solve CF 321E
2100Prove the quadrangle inequality for non-trivial cost functions; combine D&C with incremental cost updates (CF 868F); distinguish D&C from CHT and Knuth
#ProblemSourceDifficultyKey Idea
1Ciel and GondolaCF 321E2000Partition into K groups, direct D&C opt template
2Partition GameCF 1527E2100Partition cost based on repeated elements
3Yet Another Minimization ProblemCF 868F2500D&C opt with incremental cost (frequency pairs)
4Tear Up the FloorCF 1609F2400Partition rows, quadrangle inequality cost
5LancesCF 1707C2400D&C opt on geometric cost
6APIO 2014 -- SequenceAPIOHardClassic K-partition, squared cost
7IOI 2014 -- HolidayIOIHardD&C optimization with segment tree cost
8Guardians of the LunaticsSPOJMediumPartition into K groups, absolute cost

Further Reading

Built for the climb from Codeforces 1100 to 2100.