CDQ Divide and Conquer

Quick Revisit
USE WHEN: multi-dimensional partial order (2D/3D inversions, updates-before-queries in time)
INVARIANT: left-half contributions to right-half computed at each recursion level; cross-half only
TIME: O(n log² n) | SPACE: O(n)
CLASSIC BUG: not resetting the BIT/data structure after processing each recursive call
PRACTICE: 08-ladder-mixed

Offline technique that solves multi-dimensional partial-order problems by recursively splitting one dimension and computing cross-contributions between halves. Replaces persistent or 2D data structures with a simple recursive framework.

ID: AL-42 Difficulty: [Advanced]Prerequisites: Divide and Conquer, Fenwick Tree, Offline QueriesCF Rating Range: 1900--2100

2. Intuition
3. How It Works
4. Implementation
5. Complexity Analysis
6. Problem Patterns & Tricks
7. Contest Cheat Sheet
8. Gotchas & Debugging
9. Self-Test
10. Practice Problems
11. Further Reading

Intuition

2.1 The Pain

Count 2D inversions: given $n$ points $(x_{i}, y_{i})$ , count pairs $(i, j)$ where $x_{i} < x_{j}$ and $y_{i} > y_{j}$ .

Brute force: check every pair.

  Points:  A(1,5)  B(2,3)  C(3,6)  D(4,1)  E(5,4)  F(6,2)

  15 pairs to check, 10 satisfy both conditions.  O(n^2).

For $n = 2 \times 10^{5}$ that is $4 \times 10^{10}$ -- far too slow. Online alternatives (persistent BIT, 2D segment tree) are complex and memory-heavy. We want something simpler.

2.2 The Key Insight

Split the index/time dimension in half. All contributions from the left half to the right half can be computed efficiently (e.g. with a BIT over the second dimension), then recurse on each half independently.

The trick is dimensional reduction. A 2D partial order has two conditions: $x_{i} < x_{j}$ and $y_{i} > y_{j}$ . If we sort by $x$ and recurse on the array index, every left-right pair automatically satisfies $x_{i} < x_{j}$ . That eliminates one dimension. The remaining condition ( $y_{i} > y_{j}$ ) is a 1D problem solvable with a BIT.

2.3 Visual Walkthrough

Six points sorted by $x$ :

  Index:   0       1       2       3       4       5
  Point: (1,5)   (2,3)   (3,6)   (4,1)   (5,4)   (6,2)
           A       B       C       D       E       F

Level 0 -- Split [0,5] at mid=2:

  LEFT  [0..2]: A(1,5)  B(2,3)  C(3,6)
  RIGHT [3..5]: D(4,1)  E(5,4)  F(6,2)

  Cross-contributions: pairs (i in LEFT, j in RIGHT) with y_i > y_j.

    D(y=1): A(5>1) B(3>1) C(6>1) => 3
    E(y=4): A(5>4)        C(6>4) => 2
    F(y=2): A(5>2) B(3>2) C(6>2) => 3
                              cross = 8

Recurse on each half. The full recursion tree:

                    [0..5]  cross=8
                   /              \
            [0..2]  cross=0     [3..5]  cross=1
           /       \           /       \
      [0..1] c=1  [2..2]  [3..4] c=0  [5..5]
      /    \                /    \
   [0] [1]              [3]   [4]

Total = 8 + 0 + 1 + 1 + 0 = 10. Matches brute force.

Generalized CDQ recursion -- left half contributes to right half:

  CDQ Recursion Tree -- left half contributes to right half:
  
              [1..........8]
             /              \
        [1..4]  ======>  [5..8]     left contributes to right
       /     \          /     \
    [1,2] => [3,4]   [5,6] => [7,8]
     / \      / \     / \      / \
    1   2    3   4   5   6    7   8
    
  ======> means "process contributions from left to right"
  
  Key: at every level, cross-contributions flow left --> right.
  Contributions *within* each half are handled by deeper recursion.
  Every pair (i < j) is counted exactly once at the LCA of their leaves.

Worked Example: Counting Inversions via CDQ (Merge-Sort Style)

  Array A = [5, 3, 6, 1, 4, 2]       (count pairs i<j with A[i]>A[j])

  ── Level 2: split into pairs ──────────────────────────────
  [5, 3]     [6, 1]     [4, 2]
   cross:1    cross:1    cross:1       (5>3, 6>1, 4>2)
   sorted->[3,5]  sorted->[1,6]  sorted->[2,4]

  ── Level 1: merge pairs into quads ────────────────────────
  Merge [3, 5] with [1, 6]:
    Compare 3 vs 1: take 1, inversions += 2  (both 3,5 > 1)
    Compare 3 vs 6: take 3, inversions += 0
    Compare 5 vs 6: take 5, inversions += 0
    Take 6.                    cross = 2
    sorted -> [1, 3, 5, 6]

  Merge [2, 4] (only one pair, becomes right half at next level)

  ── Level 0: merge [1, 3, 5, 6] with [2, 4] ───────────────
    ptr_L -> 1, ptr_R -> 2
    | Take | From  | remaining left | += inversions |
    |------|-------|----------------|---------------|
    |  1   | left  |   [3, 5, 6]    |      0        |
    |  2   | right |   --            |  += 3 (3,5,6) |
    |  3   | left  |   [5, 6]       |      0        |
    |  4   | right |   --            |  += 2 (5,6)   |
    |  5   | left  |   [6]          |      0        |
    |  6   | left  |   []           |      0        |
                                       cross = 5

  Total inversions = 1 + 1 + 1 + 2 + 5 = 10
  Brute force check: (5,3)(5,1)(5,4)(5,2)(3,1)(3,2)(6,1)(6,4)(6,2)(4,2) = 10 OK

2.4 The Invariant

After processing interval $[l, r]$ , all contributions from elements in $[l, mid]$ to elements in $[mid + 1, r]$ have been counted. Recursion on the two halves counts contributions within each half.

Together, every pair $(i, j)$ with $l \leq i < j \leq r$ is counted exactly once: either in the cross-contribution of some ancestor call, or within a recursive sub-call.

2.5 When to Reach for This

Trigger	Example
2D partial order / 2D inversion counting	"Count pairs with $a_{i} < a_{j}$ and $b_{i} > b_{j}$ "
3D partial order (CDQ + BIT)	"Count points dominated in all three coordinates"
Offline point updates + range queries	"Process updates and queries in order, but offline"
DP optimization with prefix constraints	" $d p [j] = min_{i < j, a_{i} \leq a_{j}} d p [i] + \dots$ "
Replacing persistent/2D data structures	Any offline 2D problem where online is overkill

Rule of thumb: If you see two or three ordering conditions on pairs and the problem is offline, think CDQ.

Dimension Role Assignment

In a k-dimensional problem (e.g., 3D partial order / CDQ on BIT):

Dimension	Handled by	When
1st	Global sort	Before anything (sort input by dim 1)
2nd	CDQ merge	During divide-and-conquer (left-to-right cross contribution)
3rd	BIT/segment tree	Inside the cross-contribution step

Dimension reduction pipeline -- CDQ peels off one dimension at each stage:

  3D partial order:  (a_i <= a_j) AND (b_i <= b_j) AND (c_i <= c_j)
  
  CDQ eliminates one dimension at each level:
  
  +------------------+     +------------------+     +------------------+
  | 3 dimensions     | --> | 2 dimensions     | --> | 1 dimension      |
  | sort by dim a    |     | CDQ on dim b     |     | BIT/merge on c   |
  | (preprocessing)  |     | (divide-conquer) |     | (data structure) |
  +------------------+     +------------------+     +------------------+
        ^                        ^                        ^
        |                        |                        |
    Global sort             Merge-sort by b          BIT update/query
    eliminates a            eliminates b             handles c

This assignment is NOT arbitrary. The 1st dimension must be the one that, once sorted, doesn't need to be "unsorted." The 2nd dimension is the one CDQ's merge handles. The 3rd dimension needs a data structure because it's neither sorted nor merged.

What the Code Won't Teach You

When to use CDQ vs. alternatives. CDQ is the right tool when:

You have a 3D partial order problem, or
You have a DP optimization where the transition is $d p [i] = min_{j < i, b_{j} \leq b_{i}} {d p [j] + f (c_{j}, c_{i})}$ -- a 2D constraint on the transition.

The alternative to CDQ for (1) is a persistent segment tree or a merge sort tree, which are often simpler to implement but use more memory. The alternative for (2) is usually nothing -- CDQ D&C DP optimization fills a niche that few other techniques handle cleanly.

Why CDQ is $O (n \log^{2} n)$ . The BIT contributes $O (\log n)$ per element. At each level of the D&C tree, each element is processed once. The D&C tree has $O (\log n)$ levels. Total: $O (n \log^{2} n)$ . This is better than a naive $O (n^{2})$ but worse than a segment tree solution if one exists. Knowing when $O (n \log^{2} n)$ fits the time limit requires checking against $n$ and TL.

The merge is load-bearing, not cleanup. Reading the code, the merge-sort step at the end of cdq(l, r) looks like cleanup. It is not. If you remove it, the parent call's two-pointer sweep over the second dimension breaks because it assumes both halves are sorted. The algorithm is a merge sort that counts things along the way.

3. How It Works

Given $n$ elements with attributes $(a_{i}, b_{i})$ , count pairs $(i, j)$ where $i < j$ , $a_{i} \leq a_{j}$ , and $b_{i} \leq b_{j}$ .

Sort by $a$ (or use the original index as first dimension).
Recurse with cdq(l, r):
- Base: $l = r$ , return.
- $mid = ⌊ (l + r) / 2 ⌋$ .
- cdq(l, mid), cdq(mid+1, r).
- Merge step: compute cross-contributions using $b$ .
Merge-sort $[l, r]$ by $b$ so parent calls can two-pointer sweep.

Complexity: $T (n) = 2 T (n / 2) + O (n)$ for two-pointer merge = $O (n \log n)$ . With a BIT in the merge step: $O (n \log^{2} n)$ .

Implementation

4.1 CDQ for 2D Inversion Count (merge-sort style)

Count pairs $(i, j)$ with $i < j$ and $a [i] > a [j]$ -- classic inversion count reframed as CDQ.

The 1D ancestor. Standard merge-sort inversion counting is the one-dimensional CDQ: split the index dimension at the midpoint, count pairs that cross (left $i$ , right $j$ with $a [i] > a [j]$ ), recurse. The "second dimension" is the value $a [i]$ , handled by the merge-sort sweep itself. Once you see this, 3D partial order is just two more dimensions stacked on top: the merge sort handles one of them, and a BIT handles the third. CDQ is merge-sort with a counting hook.

How the merge-sort cross-counting works:

  CDQ merge-sort pass:
  
  Left half (sorted):     Right half (sorted):
  +---+---+---+---+       +---+---+---+---+
  | 1 | 3 | 5 | 7 |       | 2 | 4 | 6 | 8 |
  +---+---+---+---+       +---+---+---+---+
    ^                         ^
    i                         j
    
  For each j, count how many i's in left half are GREATER than a[j].
  Since both halves are sorted, use two-pointer:
  
  When a[i] <= a[j]:  advance i (no inversion with this j)
  When a[i] >  a[j]:  ALL of a[i..mid] > a[j]  -->  add (mid - i + 1)
  
  After counting, merge both halves into sorted order for parent level.

cpp

#include <bits/stdc++.h>
using namespace std;

long long ans;
vector<int> tmp;

// After cdq(l,r), a[l..r] is sorted (merge-sort invariant).
void cdq(vector<int>& a, int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(a, l, mid);
    cdq(a, mid + 1, r);

    // Count cross-contributions: for each j in [mid+1,r],
    // count elements in [l,mid] that are greater than a[j].
    int i = l, j = mid + 1, k = 0;
    tmp.resize(r - l + 1);
    while (i <= mid && j <= r) {
        if (a[i] <= a[j]) {
            tmp[k++] = a[i++];
        } else {
            // All a[i..mid] > a[j], contributing (mid - i + 1) inversions.
            ans += mid - i + 1;
            tmp[k++] = a[j++];
        }
    }
    while (i <= mid) tmp[k++] = a[i++];
    while (j <= r)   tmp[k++] = a[j++];
    copy(tmp.begin(), tmp.begin() + k, a.begin() + l);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    vector<int> a(n);
    for (int& x : a) cin >> x;

    ans = 0;
    cdq(a, 0, n - 1);
    cout << ans << "\n";
}

4.2 CDQ for 2D Partial Order with BIT

Given $n$ points $(x_{i}, y_{i})$ , count for each point $j$ how many points $i$ satisfy $x_{i} < x_{j}$ and $y_{i} < y_{j}$ . (This is the 2D dominance counting problem.)

cpp

#include <bits/stdc++.h>
using namespace std;

struct Point {
    int x, y, idx;
    int result;
};

struct BIT {
    int n;
    vector<int> tree;
    vector<int> modified; // track modified indices for cleanup
    BIT(int n) : n(n), tree(n + 1, 0) {}
    void update(int i, int delta) {
        modified.push_back(i);
        for (; i <= n; i += i & (-i))
            tree[i] += delta;
    }
    int query(int i) {
        int s = 0;
        for (; i > 0; i -= i & (-i))
            s += tree[i];
        return s;
    }
    void clear() {
        for (int i : modified)
            for (; i <= n; i += i & (-i))
                tree[i] = 0;
        modified.clear();
    }
};

void cdq(vector<Point>& pts, int l, int r, BIT& bit) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(pts, l, mid, bit);
    cdq(pts, mid + 1, r, bit);

    // Both halves sorted by y (merge-sort invariant from recursion).
    // Two-pointer: insert left elements into BIT by y, query for right.
    int i = l;
    for (int j = mid + 1; j <= r; j++) {
        while (i <= mid && pts[i].y <= pts[j].y) {
            bit.update(pts[i].y, 1);
            i++;
        }
        pts[j].result += bit.query(pts[j].y - 1);
    }
    bit.clear();

    // Merge-sort by y for parent calls.
    inplace_merge(pts.begin() + l, pts.begin() + mid + 1,
                  pts.begin() + r + 1,
                  [](const Point& a, const Point& b) {
                      return a.y < b.y;
                  });
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);
    int n; cin >> n;
    vector<Point> pts(n);
    for (int i = 0; i < n; i++) {
        cin >> pts[i].x >> pts[i].y;
        pts[i].idx = i; pts[i].result = 0;
    }
    // Coordinate compress y to [1..m].
    vector<int> ys(n);
    for (int i = 0; i < n; i++) ys[i] = pts[i].y;
    sort(ys.begin(), ys.end());
    ys.erase(unique(ys.begin(), ys.end()), ys.end());
    for (auto& p : pts)
        p.y = (int)(lower_bound(ys.begin(), ys.end(), p.y) - ys.begin()) + 1;

    sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    int maxY = (int)ys.size();
    BIT fenwick(maxY);
    cdq(pts, 0, n - 1, fenwick);

    vector<int> res(n);
    for (auto& p : pts) res[p.idx] = p.result;
    for (int i = 0; i < n; i++) cout << res[i] << "\n";
}

4.3 CDQ for 3D Partial Order (CDQ + nested BIT)

The classic "3D partial order" problem: given $n$ elements with attributes $(a_{i}, b_{i}, c_{i})$ , count for each element $j$ the number of elements $i$ with $a_{i} \leq a_{j}$ , $b_{i} \leq b_{j}$ , $c_{i} \leq c_{j}$ .

Pattern: track touched indices, undo at the end of each merge step. Every CDQ implementation that uses a BIT (or any auxiliary structure) needs to leave that structure clean for the next merge step at the same level. The pattern, used in the BIT below, is: every update pushes the touched index onto a stack; the undo() walks the stack and zeros only those indices. This makes cleanup $O (elements inserted)$ instead of $O (maxval)$ . Forgetting this is the single most common CDQ bug. The naive memset at the end of each call destroys the $O (n \log^{2} n)$ complexity and can mask off-by-one bugs in the merge pointer (see "the mistake that teaches" below).

Strategy:

Sort by $a$ .
CDQ on the array index eliminates the $a$ dimension.
In the merge step, sweep by $b$ (two-pointer), use a BIT on $c$ .

3D Partial Order Walkthrough

Concrete example with 5 elements, each having attributes $(a, b, c)$ :

  Raw input:
    E0 = (3, 2, 5)
    E1 = (1, 4, 3)
    E2 = (2, 1, 2)
    E3 = (1, 3, 1)
    E4 = (2, 2, 4)

Step 1 -- Sort globally by $a$ , breaking ties by $(b, c)$ :

  After sort by (a, b, c):
    idx  a  b  c
    ---- --------
     0   1  3  1   (was E3)
     1   1  4  3   (was E1)
     2   2  1  2   (was E2)
     3   2  2  4   (was E4)
     4   3  2  5   (was E0)

  No duplicates here, so no deduplication needed.
  All elements have cnt = 1.

Step 2 -- CDQ(0, 4): split at mid = 2

  LEFT  [0..2]:  (1,3,1)  (1,4,3)  (2,1,2)
  RIGHT [3..4]:  (2,2,4)  (3,2,5)

  Recurse: CDQ(0,2), then CDQ(3,4).

Step 2a -- CDQ(0, 2): split at mid = 1

  LEFT  [0..1]:  (1,3,1)  (1,4,3)
  RIGHT [2..2]:  (2,1,2)

  Recurse: CDQ(0,1), then CDQ(2,2) [base case].

Step 2a-i -- CDQ(0, 1): split at mid = 0

  LEFT  [0..0]:  (1,3,1)   -- base case
  RIGHT [1..1]:  (1,4,3)   -- base case

  Cross-contribution (left -> right):
    Two-pointer sweep by b:
      ptr=0: e[0].b=3 <= e[1].b=4 --> BIT.update(c=1, cnt=1)
      Query for e[1]: BIT.query(c=3) = 1
      e[1].result += 1  -->  result[1] = 1

  BIT.undo()
  Merge-sort by b: [(1,3,1), (1,4,3)]  (already sorted by b)

Back to CDQ(0, 2): cross-contribution

  LEFT  [0..1] sorted by b: (1,3,1)  (1,4,3)
  RIGHT [2..2]:              (2,1,2)

  Two-pointer sweep by b:
    j=2: e[2].b=1.  ptr=0: e[0].b=3 > 1, so no insertions.
    Query for e[2]: BIT.query(c=2) = 0
    e[2].result += 0  -->  result[2] = 0

  BIT.undo()
  Merge-sort by b: [(2,1,2), (1,3,1), (1,4,3)]
  State of [0..2] sorted by b:  b=1, b=3, b=4

Step 2b -- CDQ(3, 4): split at mid = 3

  LEFT  [3..3]:  (2,2,4)   -- base case
  RIGHT [4..4]:  (3,2,5)   -- base case

  Cross-contribution:
    ptr=3: e[3].b=2 <= e[4].b=2 --> BIT.update(c=4, cnt=1)
    Query for e[4]: BIT.query(c=5) = 1
    e[4].result += 1  -->  result[4] = 1

  BIT.undo()
  Merge-sort by b: [(2,2,4), (3,2,5)]  (tie in b=2, stable)

Back to CDQ(0, 4): cross-contribution

  LEFT  [0..2] sorted by b: (2,1,2)  (1,3,1)  (1,4,3)
  RIGHT [3..4] sorted by b: (2,2,4)  (3,2,5)

  Two-pointer sweep by b:
    j=3, e[3].b=2:
      ptr=0: e[0].b=1 <= 2 --> BIT.update(c=2, cnt=1), ptr=1
      ptr=1: e[1].b=3 > 2, stop.
      Query: BIT.query(c=4) = 1
      e[3].result += 1  -->  result[3] = 0+1 = 1

    j=4, e[4].b=2:
      ptr=1: e[1].b=3 > 2, stop.
      Query: BIT.query(c=5) = 1
      e[4].result += 1  -->  result[4] = 1+1 = 2

  BIT.undo()
  Merge-sort by b: [(2,1,2), (2,2,4), (3,2,5), (1,3,1), (1,4,3)]

Step 3 -- Final results:

  Element    (a,b,c)    result    d = result+cnt-1    Meaning
  -------    -------    ------    ----------------    -------
  E3         (1,3,1)      0            0              0 elements dominate E3
  E1         (1,4,3)      1            1              1 element dominates E1
  E2         (2,1,2)      0            0              0 elements dominate E2
  E4         (2,2,4)      1            1              1 element dominates E4
  E0         (3,2,5)      2            2              2 elements dominate E0

  Verification (brute force):
    E3(1,3,1): no element has all coords <=  --> 0  *
    E1(1,4,3): E3(1<=1, 3<=4, 1<=3)         --> 1  *
    E2(2,1,2): no element has all coords <=  --> 0  *
    E4(2,2,4): E2(2<=2, 1<=2, 2<=4)         --> 1  *
    E0(3,2,5): E2(2<=3, 1<=2, 2<=5)
               E4(2<=3, 2<=2, 4<=5)         --> 2  *

  f[0]=2, f[1]=2, f[2]=1  (output for n=5)

cpp

#include <bits/stdc++.h>
using namespace std;

struct Element {
    int a, b, c, cnt, result;
    bool operator==(const Element& o) const {
        return a == o.a && b == o.b && c == o.c;
    }
};

struct BIT {
    int n;
    vector<int> tree;
    vector<int> stk;
    BIT() : n(0) {}
    BIT(int n) : n(n), tree(n + 1, 0) {}
    void update(int i, int v) {
        stk.push_back(i);
        for (; i <= n; i += i & (-i))
            tree[i] += v;
    }
    int query(int i) {
        int s = 0;
        for (; i > 0; i -= i & (-i))
            s += tree[i];
        return s;
    }
    void undo() {
        for (int i : stk)
            for (int j = i; j <= n; j += j & (-j))
                tree[j] = 0;
        stk.clear();
    }
};

BIT bit;
vector<Element> tmp;

void cdq(vector<Element>& e, int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(e, l, mid);
    cdq(e, mid + 1, r);

    // Cross-contribution: two-pointer sweep by b.
    // Insert left elements' c into BIT, query for right elements.
    int ptr = l;
    for (int j = mid + 1; j <= r; j++) {
        while (ptr <= mid && e[ptr].b <= e[j].b) {
            bit.update(e[ptr].c, e[ptr].cnt);
            ptr++;
        }
        e[j].result += bit.query(e[j].c);
    }
    bit.undo();

    // Merge sort by b for parent calls.
    tmp.clear();
    int li = l, ri = mid + 1;
    while (li <= mid && ri <= r) {
        if (e[li].b <= e[ri].b) tmp.push_back(e[li++]);
        else                    tmp.push_back(e[ri++]);
    }
    while (li <= mid) tmp.push_back(e[li++]);
    while (ri <= r)   tmp.push_back(e[ri++]);
    for (int k = 0; k < (int)tmp.size(); k++)
        e[l + k] = tmp[k];
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, k;
    cin >> n >> k;
    vector<Element> raw(n);
    for (int i = 0; i < n; i++) {
        cin >> raw[i].a >> raw[i].b >> raw[i].c;
        raw[i].cnt = 1;
        raw[i].result = 0;
    }

    // Sort by (a, b, c).
    sort(raw.begin(), raw.end(), [](const Element& x, const Element& y) {
        if (x.a != y.a) return x.a < y.a;
        if (x.b != y.b) return x.b < y.b;
        return x.c < y.c;
    });

    // Deduplicate identical elements.
    vector<Element> e;
    for (int i = 0; i < n; ) {
        int j = i;
        while (j < n && raw[j] == raw[i]) j++;
        Element cur = raw[i];
        cur.cnt = j - i;
        cur.result = 0;
        e.push_back(cur);
        i = j;
    }

    int m = (int)e.size();
    bit = BIT(k);
    cdq(e, 0, m - 1);

    // f[d] = number of elements with exactly d elements dominating them.
    // An element with count c contributes: result + (c-1) own-group pairs.
    vector<int> f(n, 0);
    for (auto& el : e) {
        int d = el.result + el.cnt - 1;
        f[d] += el.cnt;
    }
    for (int i = 0; i < n; i++)
        cout << f[i] << "\n";
}

Merge Trace ASCII -- 3D CDQ Cross-Contribution Step

Below is a step-by-step trace of the cross-contribution phase for the top-level CDQ call from the walkthrough above. The LEFT half [0..2] and RIGHT half [3..4] are already sorted by $b$ from their recursive calls.

  LEFT  (sorted by b):   R0=(2,1,2)  R1=(1,3,1)  R2=(1,4,3)
  RIGHT (sorted by b):   R3=(2,2,4)  R4=(3,2,5)
  BIT range: [1..5] (all zeros initially)

  Two-pointer: ptr starts at left[0], j sweeps right[0..1].

  ---------------------------------------------------------------
  STEP 1:  j -> R3  (b=2)
  ---------------------------------------------------------------
    ptr -> R0 (b=1):  1 <= 2?  YES
      BIT.update(c=2, cnt=1)
      BIT state:
        index: 1   2   3   4   5
        value: 0   1   0   1   0
               +-------+---+
               |  c=2 inserted, propagated to index 4
      ptr advances to R1

    ptr -> R1 (b=3):  3 <= 2?  NO, stop.

    Query for R3:  BIT.query(c=4)
      Traverse: 4 -> 0  (picks up index 4: val=1)
      Result = 1
      R3.result += 1
                                     +---+---+---+---+---+
      BIT snapshot after step 1:     | 0 | 1 | 0 | 1 | 0 |
                                     +---+---+---+---+---+
                                       1   2   3   4   5

  ---------------------------------------------------------------
  STEP 2:  j -> R4  (b=2)
  ---------------------------------------------------------------
    ptr -> R1 (b=3):  3 <= 2?  NO, stop.
    (No new insertions.)

    Query for R4:  BIT.query(c=5)
      Traverse: 5 -> 4 -> 0
        index 5: val=0
        index 4: val=1
      Result = 1
      R4.result += 1
                                     +---+---+---+---+---+
      BIT snapshot after step 2:     | 0 | 1 | 0 | 1 | 0 |
                                     +---+---+---+---+---+
                                       1   2   3   4   5

  ---------------------------------------------------------------
  CLEANUP:  BIT.undo()
  ---------------------------------------------------------------
    Undo stack: [2]  (only index 2 was inserted)
    Reset index 2 -> 0, propagate to index 4 -> 0
                                     +---+---+---+---+---+
      BIT after undo:                | 0 | 0 | 0 | 0 | 0 |
                                     +---+---+---+---+---+
                                       1   2   3   4   5

  ---------------------------------------------------------------
  MERGE by b:
  ---------------------------------------------------------------
    Left:   R0(b=1)  R1(b=3)  R2(b=4)
    Right:  R3(b=2)  R4(b=2)

    Merge result:  R0(b=1)  R3(b=2)  R4(b=2)  R1(b=3)  R2(b=4)
                     ^         ^        ^         ^        ^
                    ptr       ptr      ptr       ptr      ptr
                   left      right    right     left     left

  Cross-contribution summary:
    R3.result = 1   (dominated by R0 via left half)
    R4.result = 1   (dominated by R0 via left half)
    (Combined with results from recursive calls:
     R3 total = 1, R4 total = 1+1 = 2)

4.4 CDQ for Offline Point Updates + Range Queries

Convert online "update point, query prefix sum" into an offline CDQ problem. Each operation gets a timestamp $t$ , a type (update/query), and a position $x$ . CDQ on $t$ eliminates the time dimension; the merge step sweeps by $x$ and uses a BIT (or running sum) for prefix queries.

cpp

struct Op { int t, type, x, val, qidx; };
// type: 0=update, 1=query.  val: update delta or query sign (+1/-1).

void cdq(vector<Op>& ops, int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(ops, l, mid);
    cdq(ops, mid + 1, r);

    // Collect left-updates and right-queries, both sorted by x.
    vector<Op*> upd, qry;
    for (int i = l; i <= mid; i++)
        if (ops[i].type == 0) upd.push_back(&ops[i]);
    for (int i = mid + 1; i <= r; i++)
        if (ops[i].type == 1) qry.push_back(&ops[i]);

    // Cross-contribution: updates in [l,mid] affect queries in [mid+1,r].
    int ptr = 0;
    for (auto* q : qry) {
        while (ptr < (int)upd.size() && upd[ptr]->x <= q->x) {
            bit.update(upd[ptr]->x, upd[ptr]->val);
            ptr++;
        }
        ans[q->qidx] += (long long)q->val * bit.query(q->x);
    }
    bit.undo();

    // Merge-sort by x for parent calls.
    inplace_merge(ops.begin() + l, ops.begin() + mid + 1,
                  ops.begin() + r + 1,
                  [](const Op& a, const Op& b) { return a.x < b.x; });
}

Complexity Analysis

Time

Variant	Merge step	Recurrence	Total
2D merge-sort CDQ	$O (n)$ two-pointer	$T (n) = 2 T (n / 2) + O (n)$	$O (n \log n)$
2D CDQ + BIT	$O (n \log n)$ BIT ops	$T (n) = 2 T (n / 2) + O (n \log n)$	$O (n \log^{2} n)$
3D CDQ + BIT	$O (n \log n)$ BIT in merge	$T (n) = 2 T (n / 2) + O (n \log n)$	$O (n \log^{2} n)$
$k$ -D nested CDQ	$O (n \log^{k - 2} n)$	...	$O (n \log^{k - 1} n)$

$O (n + V)$ total space where $V$ is the coordinate range after compression.

$O (n \log^{2} n)$ with $n = 10^{5}$ : roughly $3 \times 10^{7}$ ops, well under 2s.
BIT cleanup must be $O (elements touched)$ , not $O (maxval)$ .

Problem Patterns & Tricks

Pattern 1 -- 2D Inversion / Dominance Counting

Count pairs $(i, j)$ with $i < j$ , $a_{i} \leq a_{j}$ , $b_{i} \leq b_{j}$ (or the reverse). CDQ on index, merge-sort or BIT on the second dimension.

Recognize it: "Count pairs satisfying two simultaneous ordering conditions."

CF 1042D  -- Pair of Topics (count pairs a_i+a_j > b_i+b_j, rephrase as 1D)
Luogu P3810 -- 3D partial order (classic template problem)

Pattern 2 -- 3D Partial Order

Each element has three attributes $(a, b, c)$ . Count how many elements are dominated by each element. Sort by $a$ , CDQ on index (eliminates $a$ ), merge-sort by $b$ , BIT on $c$ .

Recognize it: "For each element, count how many have all three attributes $\leq$ ."

Luogu P3810 -- Three-dimensional partial order (the canonical problem)
CF 12D     -- Ball (pairs with all three conditions, with care for ties)

Pattern 3 -- Offline Dynamic 2D Prefix Sums

Point updates and rectangle queries arrive in a stream. Process offline: assign timestamps, CDQ on time, BIT on one spatial dimension, sweep on the other. A rectangle query decomposes into four prefix queries.

Recognize it: "Update a 2D grid point, query a rectangle sum, and the problem does not force online processing."

CF 848C -- Goodbye Souvenir (offline, CDQ on time + BIT)
CF 1093E -- Intersection of Permutations (offline approach)

Pattern 4 -- DP Optimization with CDQ

The DP transition $d p [j] = min_{i < j} {d p [i] + cost (i, j)}$ where $cost$ depends on attributes of $i$ and $j$ with partial-order constraints. CDQ computes contributions from already-computed states to future states.

Key difference: Must process left half first (to finalize its DP values) before computing cross-contributions to the right half.

cpp

void cdq(int l, int r) {
    if (l == r) {
        // Finalize dp[l] here (e.g., dp[l] += base_cost[l]).
        return;
    }
    int mid = (l + r) / 2;
    cdq(l, mid);          // Compute dp values for left half.
    // Cross-contribution: use dp[i] for i in [l,mid] to update
    // dp[j] for j in [mid+1,r].
    compute_cross(l, mid, mid + 1, r);
    cdq(mid + 1, r);      // Now compute dp values for right half.
}

CF 1530F -- Bingo (CDQ-style DP optimization)
IOI 2016 -- Aliens (CDQ + convex hull trick variant)

Pattern 5 -- Converting Online to Offline

Any problem with "update, then query" can be made offline by recording all operations with timestamps. CDQ on the timestamp dimension replaces a persistent data structure with a simple recursive framework.

When it fails: The problem forces online (each query answer affects the next query). Look for "decode the query using the last answer" -- that kills CDQ.

CF 848C -- Goodbye Souvenir (classic CDQ offline conversion)
CF 1527E -- Partition Game (DP + offline, CDQ-amenable)

Contest Cheat Sheet

+--------------------------------------------------------------------+
|  CDQ DIVIDE AND CONQUER -- QUICK REFERENCE                         |
+--------------------------------------------------------------------+
|  WHEN: 2D/3D partial order, offline updates+queries, DP with       |
|        partial-order transitions.  Problem must be OFFLINE.         |
|  TIME: O(n log^2 n) typical    SPACE: O(n + V)                     |
+--------------------------------------------------------------------+
|  TEMPLATE (3D partial order):                                      |
|    1. Sort by a.                                                   |
|    2. cdq(l, r):                                                   |
|       if l==r: return                                              |
|       mid = (l+r)/2                                                |
|       cdq(l, mid);  cdq(mid+1, r);                                |
|       // cross: two-pointer by b, BIT on c                         |
|       ptr = l                                                      |
|       for j in [mid+1, r]:                                         |
|         while ptr<=mid and e[ptr].b <= e[j].b:                     |
|           bit.update(e[ptr].c, e[ptr].cnt); ptr++                  |
|         e[j].result += bit.query(e[j].c)                           |
|       bit.undo()                                                   |
|       merge_sort [l,r] by b                                        |
+--------------------------------------------------------------------+
|  CHECKLIST:                                                        |
|  [ ] Sorted by first dimension before CDQ                          |
|  [ ] Merge-sort by second dimension inside CDQ                     |
|  [ ] BIT cleanup uses stack (not memset!)                          |
|  [ ] Coordinate compress all dimensions                            |
|  [ ] Handle ties correctly (use <= not <, or count duplicates)     |
|  [ ] For DP variant: left-then-cross-then-right order              |
+--------------------------------------------------------------------+

Gotchas & Debugging

BIT must be reset between merge steps. After computing cross-contributions from the left half to the right half, undo all BIT updates (set back to 0). If you don't, the BIT accumulates stale values from previous merge steps, causing overcounting. Use a "cleanup" loop that reverses the updates, NOT memset(bit, 0, sizeof(bit)) (which is O(n) and destroys the complexity).

1. BIT cleanup must be O(elements touched), not O(maxval).memset after each merge turns $O (n \log n)$ into $O (V \cdot \log n)$ . Always use a stack of modified indices and zero only those.

2. Tie-breaking in the first dimension. Elements with the same $a$ on different sides of the split cause false contributions. Break ties by $b$ then $c$ when sorting by $a$ .

3. Forgetting to merge-sort by the second dimension. The two-pointer sweep at the parent level requires sorted data. The merge-sort step is not optional.

4. DP variant: wrong recursion order. For CDQ-based DP, compute left, then cross-contribution, then right. Standard CDQ is order-independent; the DP variant is not.

5. Duplicate elements. Deduplicate and store a count field. An element with count $c$ contributes $c - 1$ self-pairs.

6. Off-by-one in BIT queries. Strict $c_{i} < c_{j}$ : query bit.query(c_j - 1). Non-strict $\leq$ : query bit.query(c_j). Mixing these shifts every answer.

7. Coordinate compression. Compress each dimension independently before using as BIT indices.

Mental Traps

Trap: "CDQ is just divide and conquer, I'll figure out the details." The word "divide and conquer" is accurate but underspecifies the key invariant: after CDQ processes a range $[l, r]$ , the elements must be sorted by the second dimension within that range. This is what enables the next level's merge step. CDQ is simultaneously a D&C counting algorithm and a merge sort -- the merge is not optional cleanup, it's load-bearing.

Trap: "CDQ and merge sort are the same thing." CDQ uses merge sort internally but they are not the same. Merge sort sorts by one key. CDQ sorts by one key while counting contributions that cross the midpoint in a second key. The counting step (using a BIT on the third dimension) is what makes CDQ CDQ.

Trap: "I can replace CDQ with a 2D BIT." Sometimes you can, but often you can't. A 2D BIT has $O (\log^{2} n)$ per operation, which is fine for static 2D range counting. CDQ achieves $O (n \log^{2} n)$ total for dynamic (online-ordered) 3D partial order problems. The difference matters when the second dimension has updates that depend on previous queries (like in DP optimization).

Trap: "CDQ is only for 3D partial order." CDQ is used for 3D partial order, but it's also used to optimize DP transitions with two-dimensional dependencies. The structure is the same -- divide on one dimension, merge-sort on another, BIT on the third -- but the semantic roles (what "dimension" means) differ. Conflating the two use cases leads to applying the wrong version.

The mistake that teaches

3D partial order cleanup with memset. I spent three hours on WA-large-test. My BIT was accumulating stale values. The fix was not memset(bit, 0, sizeof(bit)) after each cross-merge -- that's O(MAXVAL) per call, and on recursive calls it wipes out parent-level values. Worse, memset masked an off-by-one in my merge pointer that had accidentally inserted a right-half element into the BIT. Correct cleanup: undo only the inserts you actually made by walking the same left-half elements and subtracting. This is O(right - left) per call, totals O(n log n), and exposes the kind of bugs memset hides.

cpp

for (auto& e : left_half) bit.update(e.dim3, +1);  // insert
for (auto& e : right_half) ans[e.id] += bit.query(e.dim3);  // query
for (auto& e : left_half) bit.update(e.dim3, -1);  // undo

In recursive algorithms with shared global state, always undo your modifications. Never rely on bulk resets.

Debug Drills

Drill 1: Cross-half merge counts wrong direction.

cpp

void solve(int l, int r) {
    if (r - l <= 1) return;
    int mid = (l + r) / 2;
    solve(l, mid);
    solve(mid, r);
    // Cross-half: left contributes to right
    sort(pts + l, pts + mid, by_y);
    sort(pts + mid, pts + r, by_y);
    int j = l;
    for (int i = mid; i < r; i++) {
        while (j < mid && pts[j].y <= pts[i].y)
            bit_update(pts[j].z, 1), j++;
        pts[i].ans += bit_query(pts[i].z);
    }
    // cleanup
    for (int k = l; k < j; k++)
        bit_update(pts[k].z, -1);
}

What's wrong?

The two recursive calls solve(l, mid) and solve(mid, r) are both executed before the cross-half merge. But in DP-style CDQ, the right half depends on the left half's results. You must solve the left half first, then do the cross-half merge (so left-half answers propagate to the right half), and then solve the right half. Fix: move solve(mid, r) to after the cross-half step. For pure counting (non-DP) CDQ, the order doesn't matter -- but for DP optimization, it's critical.

Drill 2: Tie-breaking causes missed contributions.

cpp

// Sorting by a, then by b, then by c
sort(pts, pts + n, [](auto& a, auto& b) {
    if (a.x != b.x) return a.x < b.x;
    if (a.y != b.y) return a.y < b.y;
    return a.z < b.z;
});
// Then CDQ processes indices [0, n)

What's wrong?

When elements have equal x values, the CDQ split at mid may place element j in the right half and element i in the left half even though x[j] == x[i]. The cross-half step then counts j as being "dominated" by i in the x-dimension, but the problem requires strict or non-strict inequality -- and equal x-values on opposite sides of mid get treated as if left < right. Fix: after sorting, deduplicate elements with identical (x, y, z) by counting multiplicities, or ensure that elements with equal x-values are handled correctly by the cross-half step (e.g., by not contributing across halves when x-values are equal).

Drill 3: BIT cleanup misses some entries.

cpp

// After cross-half merge:
for (int k = l; k < mid; k++)
    bit_update(pts[k].z, -1);

What's wrong?

The cleanup loop iterates over all left-half elements (l to mid), but during the merge, the pointer j only advanced to some j <= mid -- meaning only elements from l to j-1 were actually inserted into the BIT. Removing elements that were never inserted causes the BIT to go negative, corrupting future queries. Fix: for (int k = l; k < j; k++) bit_update(pts[k].z, -1); -- only undo what was actually done.

Drill 4: Cross-contribution direction in comments.

cpp

void cdq(int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(l, mid);
    cdq(mid + 1, r);
    // Cross: compute how RIGHT affects LEFT
    sort(a + l, a + mid + 1, by_dim2);
    sort(a + mid + 1, a + r + 1, by_dim2);
    // merge and accumulate...
}

What's wrong?

The cross-contribution should compute how the LEFT half affects the RIGHT half, not the other way around. In a partial order counting problem, element j in the left half (earlier in dim1) can dominate element i in the right half. The code comment says "RIGHT affects LEFT" -- if the merge logic follows this comment, it counts contributions backwards. Fix the direction: insert left-half elements into the BIT, query right-half elements against it.

Drill 5: BIT cleanup with memset.

cpp

void cdq(int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(l, mid);
    cdq(mid + 1, r);
    // cross step: insert left into BIT, query right
    for (int i = l; i <= mid; i++)
        bit.update(a[i].z, 1);
    for (int i = mid + 1; i <= r; i++)
        ans[a[i].id] += bit.query(a[i].z);
    memset(bit.tree, 0, sizeof(bit.tree));  // nuclear reset
}

What's wrong?

memset resets the entire BIT array, which is O(MAX_VAL) per CDQ call. Across all O(n log n) recursive calls, this gives O(n * MAX_VAL * log n) total work -- potentially orders of magnitude slower than intended. Worse, if the BIT is shared and a parent call had pending values, they're destroyed. Fix: undo only the values you inserted:

cpp

for (int i = l; i <= mid; i++)
    bit.update(a[i].z, -1);

This is O(right - left) per call, totaling O(n log n) across all calls.

Drill 6: Forgetting to stable-sort by dim2 in the merge step.

cpp

void cdq(int l, int r) {
    if (l >= r) return;
    int mid = (l + r) / 2;
    cdq(l, mid);
    cdq(mid + 1, r);
    // Just sort both halves by dim2 and two-pointer merge
    sort(a + l, a + mid + 1, by_dim2);
    sort(a + mid + 1, a + r + 1, by_dim2);
    int i = l, j = mid + 1;
    while (i <= mid && j <= r) {
        if (a[i].y <= a[j].y) {
            bit.update(a[i].z, 1);
            i++;
        } else {
            ans[a[j].id] += bit.query(a[j].z);
            j++;
        }
    }
    while (j <= r) { ans[a[j].id] += bit.query(a[j].z); j++; }
    // cleanup...
}

What's wrong?

The sort by dim2 destroys the dim1 ordering within each half, which the recursive calls relied on. After returning from cdq(l, mid) and cdq(mid+1, r), the elements within each half should still be sorted by dim1 for parent-level processing. The fix: use a merge-sort approach where the CDQ function itself returns elements sorted by dim2 (like merge sort returns sorted arrays), or use a temporary buffer and restore the original order after the cross step. Alternatively, sort by dim1 at the start of each CDQ call instead of relying on the caller.

Debugging checklist:

Test on $n = 1, 2, 3$ . Compare against $O (n^{2})$ brute force for $n \leq 100$ .
Verify BIT is clean at each merge start: assert(bit.query(maxval) == 0).
Print elements at each level -- are halves sorted by the right attribute?

Self-Test

Before moving on, verify you can answer these without looking at the notes:

[ ] Dimension roles: Describe the role of each dimension in a 3D partial order CDQ -- which dimension is handled by global sort, which by CDQ's merge, and which by the BIT?
[ ] CDQ skeleton: Write the skeleton of cdq(l, r) for DP optimization -- the four steps (left recursion, cross contribution, right recursion, merge) in the correct order.
[ ] BIT reset reasoning: Explain why the BIT must be reset between merge steps. Give a concrete example of what goes wrong (overcounting) if you skip the reset.
[ ] Complexity breakdown: State the time complexity -- $O (n \log^{2} n)$ -- and explain where each log factor comes from (D&C depth vs. BIT operations).
[ ] Applicability judgment: Name one problem where CDQ D&C applies and one where it looks applicable but isn't (e.g., forced-online queries).
[ ] Merge is load-bearing: Explain why the merge-sort step after cross-contribution is mandatory, not optional cleanup. What breaks at the parent level if you skip it?
[ ] CDQ vs. alternatives: When would you prefer a persistent segment tree or merge sort tree over CDQ, and vice versa?

Practice Problems

Rating Progression

CF Rating	What to Master
1200	Understand merge sort inversion counting as the simplest 2D partial-order problem (a precursor to CDQ thinking).
1500	Formalize the 2D partial-order template: sort by dimension 1, divide-and-conquer, use a BIT for dimension 2 in the cross-half merge. Understand the divide-and-conquer framework: solve left, solve right, compute cross-contributions.
1800	Full 3D partial order (sort by dim 1, CDQ on dim 2, BIT on dim 3); recognize when a DP recurrence can be framed as a partial-order problem. Handle the BIT cleanup correctly.
2100	Nested CDQ (CDQ inside CDQ) for 4D+ problems; combining CDQ with other offline techniques (e.g., CDQ + parallel binary search); handling duplicates and equal elements correctly across all dimensions. CDQ for DP optimization, CDQ + sweep.

#	Problem	Source	Difficulty	Key Idea
1	Inversions	CF EDU	Easy	Classic inversion count; CDQ = merge sort
2	Ball	CF 12D	Medium	2D dominance with tie-breaking
3	Three-dimensional Partial Order	Luogu P3810	Medium	Canonical 3D CDQ template
4	Pair of Topics	CF 1042D	Medium	Rephrase as inversion count with CDQ
5	Goodbye Souvenir	CF 848C	Medium-Hard	CDQ on time + BIT for offline queries
6	Partition Game	CF 1527E	Medium-Hard	DP optimization with CDQ
7	Flowers	CF 840C	Hard	2D problem reducible to CDQ
8	Intersecting Segments	CF 1093E	Hard	Offline 2D with CDQ approach
9	Guard Mark	Luogu P4169	Hard	4D CDQ (nested CDQ layers)
10	Mokia	Luogu P4390	Hard	CDQ on time for 2D point update + rect query

CDQ Divide and Conquer ​

Contents ​

Intuition ​

2.1 The Pain ​

2.2 The Key Insight ​

2.3 Visual Walkthrough ​

Worked Example: Counting Inversions via CDQ (Merge-Sort Style) ​

2.4 The Invariant ​

2.5 When to Reach for This ​

Dimension Role Assignment ​

What the Code Won't Teach You ​

3. How It Works ​

Implementation ​

4.1 CDQ for 2D Inversion Count (merge-sort style) ​

4.2 CDQ for 2D Partial Order with BIT ​

4.3 CDQ for 3D Partial Order (CDQ + nested BIT) ​

3D Partial Order Walkthrough ​

Merge Trace ASCII -- 3D CDQ Cross-Contribution Step ​

4.4 CDQ for Offline Point Updates + Range Queries ​

Complexity Analysis ​

Time ​

Problem Patterns & Tricks ​

Pattern 1 -- 2D Inversion / Dominance Counting ​

Pattern 2 -- 3D Partial Order ​

Pattern 3 -- Offline Dynamic 2D Prefix Sums ​

Pattern 4 -- DP Optimization with CDQ ​

Pattern 5 -- Converting Online to Offline ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mental Traps ​

The mistake that teaches ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

CDQ Divide and Conquer

Contents

Intuition

2.1 The Pain

2.2 The Key Insight

2.3 Visual Walkthrough

Worked Example: Counting Inversions via CDQ (Merge-Sort Style)

2.4 The Invariant

2.5 When to Reach for This

Dimension Role Assignment

What the Code Won't Teach You

3. How It Works

Implementation

4.1 CDQ for 2D Inversion Count (merge-sort style)

4.2 CDQ for 2D Partial Order with BIT

4.3 CDQ for 3D Partial Order (CDQ + nested BIT)

3D Partial Order Walkthrough

Merge Trace ASCII -- 3D CDQ Cross-Contribution Step

4.4 CDQ for Offline Point Updates + Range Queries

Complexity Analysis

Time

Problem Patterns & Tricks

Pattern 1 -- 2D Inversion / Dominance Counting

Pattern 2 -- 3D Partial Order

Pattern 3 -- Offline Dynamic 2D Prefix Sums

Pattern 4 -- DP Optimization with CDQ

Pattern 5 -- Converting Online to Offline

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

The mistake that teaches

Debug Drills

Self-Test

Practice Problems

Further Reading