2D Fenwick Tree (2D BIT)

Quick Revisit: Point updates and rectangle sum queries on a grid in $O (\log n \cdot \log m)$ —the natural 2D extension of the 1D Fenwick tree. Under the hood: two nested BITs, one over rows, one over columns. Both loops climb i += i & -i on update, both descend i -= i & -i on query. The classic trap: passing index 0 in either dimension hangs the loop forever. (Practice problems)

Difficulty: CF 1800–2100 · Prereqs: Fenwick Tree (1D), Prefix Sums

Why This Exists

A 2D Fenwick tree is just the row-column product of the 1D trick: one BIT along rows, one along columns. It exists because 2D prefix sums are elegant until updates arrive, at which point one edit dirties an entire quadrant.

You have a 4×4 grid of integers:

text

       c=1  c=2  c=3  c=4
r=1  |  3  |  1  |  2  |  5  |
r=2  |  4  |  2  |  1  |  3  |
r=3  |  1  |  6  |  3  |  2  |
r=4  |  2  |  4  |  5  |  1  |

You need to support two operations, intermixed in any order:

update(r, c, delta): add $δ$ to cell $(r, c)$ .
rect_sum(r1, c1, r2, c2): return the sum of all cells in the rectangle $[r_{1} . . r_{2}] \times [c_{1} . . c_{2}]$ .

Your first instinct is to precompute a 2D prefix sum array $P$ where $P [r] [c] = \sum_{i = 1}^{r} \sum_{j = 1}^{c} a [i] [j]$ .

text

P:     c=0  c=1  c=2  c=3  c=4
r=0  |  0  |  0  |  0  |  0  |  0  |
r=1  |  0  |  3  |  4  |  6  | 11  |
r=2  |  0  |  7  | 10  | 13  | 21  |
r=3  |  0  |  8  | 17  | 23  | 33  |
r=4  |  0  | 10  | 23  | 34  | 45  |

Queries are fast: rect_sum(2, 2, 3, 4) via inclusion-exclusion:

text

P[3][4] - P[1][4] - P[3][1] + P[1][1] = 33 - 11 - 8 + 3 = 17
check: 2+1+3 + 6+3+2 = 17    <-- correct

Four lookups, done. But now run update(2, 3, +10)—adding 10 to $a [2] [3]$ . Every $P [r] [c]$ with $r \geq 2$ and $c \geq 3$ is stale:

text

Stale entries:  P[2][3], P[2][4], P[3][3], P[3][4], P[4][3], P[4][4]
                                                      ^-- 6 entries

That's $(n - r + 1) \times (m - c + 1)$ entries. In the worst case, a single update at $(1, 1)$ on an $n \times m$ grid dirties all $n \cdot m$ prefix cells. With $n = m = 2000$ and $q = 2 \times 10^{5}$ mixed operations, that's up to $8 \times 10^{11}$ work—hopeless TLE.

We need a structure that handles both point updates and rectangle queries in sublinear time, per dimension.

The way out is to nest a 1D BIT inside another 1D BIT: the outer structure handles rows, and each outer node contains a 1D BIT over columns. An update at $(r, c)$ climbs the outer BIT over rows ( $O (\log n)$ steps), and at each step climbs the inner BIT over columns ( $O (\log m)$ steps). Total: $O (\log n \cdot \log m)$ . Queries work the same way: outer prefix sum over rows, inner prefix sum over columns.

Concretely, tree[i][j] stores the sum of all cells $a [x] [y]$ where $x$ falls in the 1D-BIT responsibility range of index $i$ —rows $[i - lowbit (i) + 1, i]$ —and $y$ falls in the 1D-BIT responsibility range of index $j$ —columns $[j - lowbit (j) + 1, j]$ . The 2D structure is the Cartesian product of two independent 1D decompositions.

If you think of the 1D BIT as a hierarchical reporting structure along a corridor of offices, the 2D BIT is the same idea over a floor plan: one hierarchy for rows of desks and another for columns. Each "area manager" at position $(i, j)$ is responsible for a rectangular block of desks, and the block size in each dimension is determined independently by the lowest set bit of $i$ and $j$ .

Walkthrough

Starting grid (4×4):

text

       c=1  c=2  c=3  c=4
r=1  |  3  |  1  |  2  |  5  |
r=2  |  4  |  2  |  1  |  3  |
r=3  |  1  |  6  |  3  |  2  |
r=4  |  2  |  4  |  5  |  1  |

Step 1: Responsibility rectangles

Each tree[i][j] covers the rectangle of rows $[i - lowbit (i) + 1, i]$ and columns $[j - lowbit (j) + 1, j]$ :

text

tree[i][j]    row-range    col-range    rectangle         value
----------    ---------    ---------    ---------         -----
tree[1][1]    [1,1]        [1,1]        {(1,1)}             3
tree[1][2]    [1,1]        [1,2]        {(1,1),(1,2)}       4
tree[1][3]    [1,1]        [3,3]        {(1,3)}             2
tree[1][4]    [1,1]        [1,4]        {(1,1)..(1,4)}     11
tree[2][1]    [1,2]        [1,1]        {(1,1),(2,1)}       7
tree[2][2]    [1,2]        [1,2]        {(1,1)..(2,2)}     10
tree[2][3]    [1,2]        [3,3]        {(1,3),(2,3)}       3
tree[2][4]    [1,2]        [1,4]        {(1,1)..(2,4)}     21
tree[3][1]    [3,3]        [1,1]        {(3,1)}             1
tree[3][2]    [3,3]        [1,2]        {(3,1),(3,2)}       7
  ...           ...          ...           ...             ...
tree[4][4]    [1,4]        [1,4]        all 16 cells       45

The pattern: tree[i][j] responsibility = (1D-BIT range of $i$ ) $\times$ (1D-BIT range of $j$ ).

Step 2: Point update—update(3, 2, +10)

Add 10 to cell $(3, 2)$ . The outer BIT climbs rows: $i = 3 \to 4$ (since $3 + lowbit (3) = 4$ ), then $4 + lowbit (4) = 8 > 4$ , stop. For each row index $i$ , the inner BIT climbs columns: $j = 2 \to 4$ (since $2 + lowbit (2) = 4$ ), then $4 + lowbit (4) = 8 > 4$ , stop.

text

Outer i=3:
    Inner j=2:  tree[3][2] += 10    (covers rows [3,3], cols [1,2])
    Inner j=4:  tree[3][4] += 10    (covers rows [3,3], cols [1,4])

Outer i=4:
    Inner j=2:  tree[4][2] += 10    (covers rows [1,4], cols [1,2])
    Inner j=4:  tree[4][4] += 10    (covers rows [1,4], cols [1,4])

Total: 4 cells updated    (log2(4) * log2(4) = 2 * 2 = 4)

Visually, the updated entries form a grid pattern:

text

       j=1    j=2    j=3    j=4
i=1  |  .   |  .   |  .   |  .   |
i=2  |  .   |  .   |  .   |  .   |
i=3  |  .   | +10  |  .   | +10  |
i=4  |  .   | +10  |  .   | +10  |
              ^^^^          ^^^^
         (j climbs: 2 -> 4 for each i that climbs: 3 -> 4)

Step 3: Rectangle query—query(r1=2, c1=2, r2=3, c2=4)

Use inclusion-exclusion on 2D prefix sums:

text

rect_sum(2,2,3,4) = prefix(3,4) - prefix(1,4) - prefix(3,1) + prefix(1,1)

Each prefix(r, c) call descends both BIT dimensions. For example, prefix(3, 4):

text

Outer i=3 (0011):
    Inner j=4 (0100):  += tree[3][4]
    j = 4 - 4 = 0, stop inner

    i strip: 3 - 1 = 2

Outer i=2 (0010):
    Inner j=4 (0100):  += tree[2][4]
    j = 4 - 4 = 0, stop inner

    i strip: 2 - 2 = 0, stop outer

prefix(3,4) = tree[3][4] + tree[2][4]

After the update, tree[3][4] = 12 + 10 = 22, tree[2][4] = 21 (unchanged):

text

prefix(3,4) = 22 + 21 = 43
prefix(1,4) = tree[1][4] = 11
prefix(3,1) = tree[3][1] + tree[2][1] = 1 + 7 = 8
prefix(1,1) = tree[1][1] = 3

rect_sum = 43 - 11 - 8 + 3 = 27
check: (4+2+1+3) + (1+16+3+2) = 10 + 22 = 32... wait
       original sum was 17, added 10 to (3,2), answer = 27   <-- correct!

The examples confirm the structure works. The property that guarantees correctness in general is this: query(r, c) returns $\sum_{i = 1}^{r} \sum_{j = 1}^{c} a [i] [j]$ —the 2D prefix sum over the rectangle $[1. . r] [1. . c]$ . Any rectangle $[r_{1} . . r_{2}] [c_{1} . . c_{2}]$ then decomposes via inclusion-exclusion:

rect_sum (r_{1}, c_{1}, r_{2}, c_{2}) = Q (r_{2}, c_{2}) - Q (r_{1} - 1, c_{2}) - Q (r_{2}, c_{1} - 1) + Q (r_{1} - 1, c_{1} - 1)

where $Q (r, c) = query (r, c)$ .

Visually, the four corners of the inclusion-exclusion:

text

          col 0    c1-1       c2
            ┌────────┬──────────┐
   row 0    │        │          │
            │   A    │    B     │     A = Q(r1-1, c1-1)  <- add back
   r1-1     │        │          │     B = Q(r1-1, c2)    <- subtract
            ├────────┼──────────┤
            │        │ XXXXXXXX │     C = Q(r2, c1-1)    <- subtract
   r2       │   C    │ XXXXDXXX │     D = Q(r2, c2)      <- start here
            │        │ XXXXXXXX │
            └────────┴──────────┘

   answer = D - B - C + A

   Signs:  +D covers everything from (1,1) to (r2,c2).
           -B removes the top strip.
           -C removes the left strip.
           +A adds back the top-left corner (subtracted twice).

This mirrors the 2D prefix sum formula exactly. The BIT just makes both query and update logarithmic in each dimension.

What the Code Won't Teach You

Under the Hood

1D BIT must be automatic before you attempt 2D. The 2D BIT is mechanically two nested 1D BIT loops. If you hesitate at all writing 1D update/query, that hesitation compounds—you'll end up debugging the nesting and the 1D logic simultaneously. You should be able to write 1D BIT in under two minutes from memory before touching 2D.

Many "2D" problems don't actually need a 2D BIT. Before reaching for one, ask: can I sort by one dimension and use a 1D BIT for the other? Dominance counting, for instance, often reduces to sorting by $x$ and querying a 1D BIT on $y$ . The 2D BIT has a larger constant factor ( $\log n \cdot \log m$ with poor cache locality) and uses $O (n m)$ memory—the 1D approach is faster and lighter whenever it applies.

The constant factor is worse than you'd expect. For $n = m = 2000$ , a single update touches up to $\log (2000)^{2} \approx 121$ cells. The inner loop jumps by lowbit(j), skipping around in memory and causing cache misses. On tight time limits (2–3 seconds with $2 \times 10^{5}$ operations), this constant can matter. Profile before assuming $O (\log^{2})$ is fast enough.

2D prefix-sum inclusion-exclusion must feel automatic. The formula $Q (r_{2}, c_{2}) - Q (r_{1} - 1, c_{2}) - Q (r_{2}, c_{1} - 1) + Q (r_{1} - 1, c_{1} - 1)$ should come out without thought. If you have to re-derive it during a contest, you'll be debugging two layers at once: the BIT traversal and the inclusion-exclusion signs.

Use this when you see "2D point update + rectangle sum query" or "online 2D queries with modifications" and the grid fits in memory (typically $n, m \leq 2000$ – $4000$ ). Don't use it when the grid is too large for $O (n m)$ memory, or when you need 2D range min/max—2D segment trees or offline techniques handle those better.

The Algorithm

A 2D BIT is a 2D array tree[1..n][1..m] where index $(i, j)$ stores the partial sum over the responsibility rectangle determined by both dimensions independently. All 1D BIT operations generalize: just nest the column loop inside the row loop.

text

FUNCTION update(r, c, delta):
    i = r
    WHILE i <= n:
        j = c
        WHILE j <= m:
            tree[i][j] += delta
            j += j AND (-j)
        i += i AND (-i)

FUNCTION query(r, c):
    sum = 0
    i = r
    WHILE i > 0:
        j = c
        WHILE j > 0:
            sum += tree[i][j]
            j -= j AND (-j)
        i -= i AND (-i)
    RETURN sum

FUNCTION rect_sum(r1, c1, r2, c2):
    RETURN query(r2, c2) - query(r1-1, c2)
         - query(r2, c1-1) + query(r1-1, c1-1)

The nested loop structure visualized for update(3, 2, +5) on an 8×8 grid:

text

Outer loop: i traverses row-ancestors of r=3
┌──────────────────────────────────────────┐
│ i = 3  (011₂, lowbit = 1)               │
│   Inner loop: j traverses col-ancestors  │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[3][2] += 5                  │   │
│   │ tree[3][4] += 5                  │   │
│   │ tree[3][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(3) = 4                       │
│                                          │
│ i = 4  (100₂, lowbit = 4)               │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[4][2] += 5                  │   │
│   │ tree[4][4] += 5                  │   │
│   │ tree[4][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(4) = 8                       │
│                                          │
│ i = 8  (1000₂, lowbit = 8)              │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[8][2] += 5                  │   │
│   │ tree[8][4] += 5                  │   │
│   │ tree[8][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(8) = 16 > 8, STOP           │
└──────────────────────────────────────────┘

Cells touched: 3 x 3 = 9 = log₂(8) x log₂(8)
Each outer step spawns a complete inner loop.

Both update and query run in $O (\log n \cdot \log m)$ —the outer loop runs $O (\log n)$ times, the inner loop $O (\log m)$ per outer step. Space is $O (n \cdot m)$ . Building from a grid takes $O (n m \log n \log m)$ via per-cell updates, or $O (n m)$ with the propagate trick (see the implementation below).

Implementation

cpp

#include <bits/stdc++.h>
using namespace std;

// --- 2D BIT: Point update + Rectangle query ---
struct BIT2D {
    int n, m;
    vector<vector<long long>> tree;

    BIT2D(int n, int m)
        : n(n), m(m), tree(n + 1, vector<long long>(m + 1, 0)) {}

    // Build from existing grid in O(n*m) using propagation
    BIT2D(const vector<vector<long long>>& grid)
        : n((int)grid.size() - 1),
          m((int)grid[0].size() - 1),
          tree(grid.size(), vector<long long>(grid[0].size(), 0)) {
        // Copy values into tree, then propagate
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                tree[i][j] = grid[i][j];
        // Propagate along columns first
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int nj = j + (j & (-j));
                if (nj <= m) tree[i][nj] += tree[i][j];
            }
        // Then propagate along rows
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int ni = i + (i & (-i));
                if (ni <= n) tree[ni][j] += tree[i][j];
            }
    }

    void update(int r, int c, long long delta) {
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j))
                tree[i][j] += delta;
    }

    // Prefix sum query: sum of [1..r][1..c]
    long long query(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += tree[i][j];
        return s;
    }

    // Rectangle sum query: sum of [r1..r2][c1..c2]
    long long query(int r1, int c1, int r2, int c2) {
        return query(r2, c2) - query(r1 - 1, c2)
             - query(r2, c1 - 1) + query(r1 - 1, c1 - 1);
    }
};

// --- 2D BIT: Range update + Point query ---
// Uses the 2D difference array idea.
// range_update(r1, c1, r2, c2, v) adds v to all cells in the rectangle.
// point_query(r, c) returns the current value at (r, c).
struct BIT2D_RangePoint {
    int n, m;
    vector<vector<long long>> tree;

    BIT2D_RangePoint(int n, int m)
        : n(n), m(m), tree(n + 2, vector<long long>(m + 2, 0)) {}

    void bit_update(int r, int c, long long delta) {
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j))
                tree[i][j] += delta;
    }

    // Add delta to all cells in [r1..r2][c1..c2]
    void range_update(int r1, int c1, int r2, int c2, long long delta) {
        bit_update(r1, c1, delta);
        bit_update(r1, c2 + 1, -delta);
        bit_update(r2 + 1, c1, -delta);
        bit_update(r2 + 1, c2 + 1, delta);
    }

    // Query the current value at cell (r, c)
    long long point_query(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += tree[i][j];
        return s;
    }
};

// --- 2D BIT: Range update + Range query ---
// Supports both adding v to a rectangle and querying the sum of a rectangle.
// Uses four 2D BITs based on the identity:
//   sum_{i=1}^{r} sum_{j=1}^{c} a[i][j]
//     = (r+1)(c+1)*B1 - (c+1)*B2 - (r+1)*B3 + B4
// where B1..B4 are BITs storing D[i][j], i*D[i][j], j*D[i][j], i*j*D[i][j].
struct BIT2D_RangeRange {
    int n, m;
    vector<vector<long long>> b1, b2, b3, b4;

    BIT2D_RangeRange(int n, int m)
        : n(n), m(m),
          b1(n + 2, vector<long long>(m + 2, 0)),
          b2(n + 2, vector<long long>(m + 2, 0)),
          b3(n + 2, vector<long long>(m + 2, 0)),
          b4(n + 2, vector<long long>(m + 2, 0)) {}

    void bit_add(int r, int c, long long v) {
        long long rv = (long long)r * v, cv = (long long)c * v;
        long long rcv = (long long)r * c * v;
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j)) {
                b1[i][j] += v;
                b2[i][j] += rv;
                b3[i][j] += cv;
                b4[i][j] += rcv;
            }
    }

    void range_update(int r1, int c1, int r2, int c2, long long v) {
        bit_add(r1, c1, v);
        bit_add(r1, c2 + 1, -v);
        bit_add(r2 + 1, c1, -v);
        bit_add(r2 + 1, c2 + 1, v);
    }

    long long prefix_sum(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += (long long)(r + 1) * (c + 1) * b1[i][j]
                   - (long long)(c + 1) * b2[i][j]
                   - (long long)(r + 1) * b3[i][j]
                   + b4[i][j];
        return s;
    }

    long long range_sum(int r1, int c1, int r2, int c2) {
        return prefix_sum(r2, c2) - prefix_sum(r1 - 1, c2)
             - prefix_sum(r2, c1 - 1) + prefix_sum(r1 - 1, c1 - 1);
    }
};

int main() {
    // Example: point update + rectangle query on n x m grid
    int n, m, q;
    scanf("%d %d %d", &n, &m, &q);

    BIT2D bit(n, m);

    // Read initial grid and populate
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            long long v;
            scanf("%lld", &v);
            bit.update(i, j, v);
        }

    while (q--) {
        int t;
        scanf("%d", &t);
        if (t == 1) {
            int r, c;
            long long delta;
            scanf("%d %d %lld", &r, &c, &delta);
            bit.update(r, c, delta);
        } else {
            int r1, c1, r2, c2;
            scanf("%d %d %d %d", &r1, &c1, &r2, &c2);
            printf("%lld\n", bit.query(r1, c1, r2, c2));
        }
    }
}

Pre-Code Sanity Check

Before you start typing, confirm: are $N$ and $M$ small enough for an $N \times M$ array (typically $\leq 2000$ – $4000$ )? Have you shifted all coordinates to 1-based indexing? Is every update a point update, or do you need the 4-BIT range-update extension? Are rectangle sums going through the 4-term inclusion-exclusion? If coordinates reach $10^{9}$ , have you compressed them or switched to offline CDQ?

The whole thing fits in one sentence: outer walks rows, inner walks cols—same bit trick, both directions.

Operations Reference

Each operation's cost reflects the nested BIT structure—one logarithmic factor per dimension.

Operation	Time	Space	Notes
Build (n*m updates)	$O (n m \log n \log m)$	$O (n m)$	Naive: call update for each cell
Build (propagation)	$O (n m)$	$O (n m)$	Copy + propagate along each dim
Point update	$O (\log n \cdot \log m)$	--	`update(r, c, delta)`
Prefix query	$O (\log n \cdot \log m)$	--	`query(r, c)` = sum of $[1. . r] [1. . c]$
Rectangle query	$O (\log n \cdot \log m)$	--	Inclusion-exclusion on 4 prefix queries
Range update (point query)	$O (\log n \cdot \log m)$	$O (n m)$	2D difference array in BIT
Range update + range query	$O (\log n \cdot \log m)$	$O (4 n m)$	Four 2D BITs

Problem Patterns & Tricks

Pattern 1: 2D Rectangle Sum Queries

The bread-and-butter application. Given an $n \times m$ grid with point updates and rectangle sum queries, use BIT2D directly. The inclusion-exclusion formula handles arbitrary axis-aligned rectangles.

cpp

BIT2D bit(n, m);
bit.update(r, c, delta);
long long ans = bit.query(r1, c1, r2, c2);

Problems: CSES Forest Queries II, CF 869E

Pattern 2: 2D Range Update + Point Query

Need to add $v$ to all cells in a rectangle $[r_{1} . . r_{2}] [c_{1} . . c_{2}]$ and later query individual cells? Use the 2D difference array trick. Four BIT updates per range update, one BIT prefix query per point query.

text

range_update(r1, c1, r2, c2, v):
    bit_update(r1,   c1,   +v)
    bit_update(r1,   c2+1, -v)
    bit_update(r2+1, c1,   -v)
    bit_update(r2+1, c2+1, +v)

point_query(r, c):
    return prefix_sum(r, c)

This mirrors the 1D trick: 1D difference array becomes a 2D difference matrix with four corner updates.

Problems: CF 341D, Gym 2D Range Update

Pattern 3: 2D Inversion Counting / Dominance Counting

Count pairs where point $A$ dominates point $B$ in both coordinates (2D dominance counting). Sort by one coordinate, sweep, and use a 2D BIT (or a 1D BIT on the second coordinate after compression) to count.

For strictly 2D problems where both coordinates matter simultaneously, the 2D BIT avoids coordinate-by-coordinate decomposition.

cpp

// Sort points by x, then for each point, query BIT for count of
// points already inserted with y <= current y
sort(pts.begin(), pts.end());
BIT bit(max_y);
long long count = 0;
for (auto& [x, y] : pts) {
    count += bit.query(y);  // 1D BIT often suffices after sorting
    bit.update(y, 1);
}

When a true 2D structure is needed (e.g., both dimensions require online updates), use BIT2D.

Problems: CF 1540B, CF 848C

Pattern 4: Online 2D Queries (No Offline Tricks)

When queries arrive online—each answer depending on the previous output—you can't sort or reorder them. A 2D BIT handles this naturally: each operation costs $O (\log^{2})$ with no batching. This is where the 2D BIT beats CDQ divide-and-conquer and merge sort tree approaches.

Problems: CF 1093E

Pattern 5: Grid Toggle + Rectangle Count

Classic: grid of 0/1, toggle a cell, query "how many 1s in rectangle?" This is point update (+1 or -1) plus rectangle sum—direct 2D BIT application.

cpp

BIT2D bit(n, m);
// toggle cell (r, c):
if (grid[r][c] == 0) { grid[r][c] = 1; bit.update(r, c, +1); }
else                  { grid[r][c] = 0; bit.update(r, c, -1); }
// count 1s in rectangle:
long long ones = bit.query(r1, c1, r2, c2);

Problems: CSES Forest Queries II, POJ 2155

Pattern 6: Coordinate Compression for Sparse Grids

When the grid is conceptually huge ( $10^{9} \times 10^{9}$ ) but only $q \leq 2 \times 10^{5}$ cells are ever accessed, compress both coordinates to $[1. . q]$ and use a 2D BIT of size $q \times q$ .

cpp

// Collect all row/col coordinates from updates and queries
vector<int> rs, cs;
for (auto& op : ops) { rs.push_back(op.r); cs.push_back(op.c); }
sort(rs.begin(), rs.end()); rs.erase(unique(rs.begin(), rs.end()), rs.end());
sort(cs.begin(), cs.end()); cs.erase(unique(cs.begin(), cs.end()), cs.end());
auto cr = [&](int r) { return (int)(lower_bound(rs.begin(), rs.end(), r) - rs.begin()) + 1; };
auto cc = [&](int c) { return (int)(lower_bound(cs.begin(), cs.end(), c) - cs.begin()) + 1; };
BIT2D bit((int)rs.size(), (int)cs.size());

Coordinate compression visualized:

text

Before compression (sparse, up to 10^9 x 10^9):

  Points: (5, 100000), (42, 3), (999999, 500)

  999999 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ● (999999, 500)
     :
    42 ─ ● (42, 3)
     :
     5 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ● (5, 100000)
         |                 |               |
         3               500           100000

After compression (dense 3x3 BIT):

  Rows: 5 -> 1,  42 -> 2,  999999 -> 3
  Cols: 3 -> 1, 500 -> 2, 100000 -> 3

     3 ─ ─ ● (3, 2)           BIT size: 3 x 3
     2 ─ ● (2, 1)             instead of 10^9 x 10^9
     1 ─ ─ ─ ─ ● (1, 3)
         |   |   |
         1   2   3

Problems: CF 1530E

Gotchas & Debugging

1. 1-Indexed in Both Dimensions

Like 1D BIT, indices must start at 1 in both row and column dimensions. update(0, c, v) or update(r, 0, v) triggers an infinite loop. If your grid is 0-indexed, add 1 to both coordinates everywhere.

2. Inclusion-Exclusion Off-By-One

The rectangle sum formula uses $r_{1} - 1$ and $c_{1} - 1$ . When $r_{1} = 1$ or $c_{1} = 1$ , these become 0, and query(0, c) should return 0—which it does, since the BIT loop exits immediately when $i = 0$ . But if you accidentally pass negative indices, you get undefined behavior.

3. Memory: $O (n m)$ Can Be Tight

For $n = m = 4000$ with long long, the tree consumes $4000 \times 4000 \times 8 = 128$ MB. For range-update-range-query (four BITs), that's 512 MB—likely MLE. Solutions:

Use int instead of long long if values fit in 32 bits.
Use coordinate compression if the grid is sparse.
Consider offline approaches (CDQ divide-and-conquer) for very large grids.

2D difference array: four corner updates. For range update + point query, you need four BIT updates per range operation. Missing any of the four corners gives wrong answers. The signs alternate in a checkerboard pattern:

text

    c1        c2+1
r1  [+v]      [-v]
r2+1[-v]      [+v]

Build efficiently. Don't build by calling update $n \times m$ times—that's $O (n m \log n \log m)$ . Instead, copy values into tree[][], then propagate along columns (for each row), then propagate along rows (for each column). Total: $O (n m)$ .

Swapped loop bounds on non-square grids. Allocating an $N \times M$ array but then writing for (j = y; j <= N; j += j & (-j)) (using N instead of M) skips valid columns or writes out of bounds. Keep the two dimensions distinct.

With $n = m = 2000$ and values up to $10^{9}$ , the maximum prefix sum is $4 \times 10^{15}$ . Use long long. With range updates adding up to $10^{9}$ across $2 \times 10^{5}$ operations, sums can be even larger—be careful.

4. Debugging Tips

Verify against brute-force 2D prefix sums for small grids ( $n, m \leq 5$ ).
Print the reconstructed grid: for each (i,j): print query(i,j,i,j).
If WA, check: are both dimensions 1-indexed? Is the BIT allocated large enough ( $n + 2$ for diff array)? Are the inclusion-exclusion signs correct?

Mental Traps

Trap: "2D BIT must be fundamentally harder than 1D"

It isn't. The 2D BIT is mechanically the 1D BIT applied twice: the outer loop is a 1D BIT traversal over rows, and the inner loop is a 1D BIT traversal over columns. If 1D BIT feels natural, 2D is almost direct:

text

1D BIT update:                 2D BIT update:

for i = r; i <= n; i+=low(i)   for i = r; i <= n; i+=low(i)
    tree[i] += delta               for j = c; j <= m; j+=low(j)
                                       tree[i][j] += delta

Identical structure -- just nest the column loop inside.

The trap is attempting 2D before the 1D version is automatic. Any hesitation in the 1D loops compounds when they're nested.

Trap: "I need a 2D BIT for this 2D problem"

Many problems that appear to require a 2D structure can be solved with a 1D BIT by processing one dimension sorted and using the BIT for the other. Before reaching for the 2D BIT (which costs $O (\log n \cdot \log m)$ per operation and $O (n m)$ memory), check whether sorting eliminates one dimension:

text

Problem: count points (x,y) dominated by each query

Approach A -- 2D BIT:              Approach B -- sort + 1D BIT:

  All points in 2D BIT              Sort all points by x
  BIT size: max_x x max_y          Sweep left -> right:
  Update: O(log X * log Y)           insert y into 1D BIT
  Memory: O(X * Y)                   query BIT for y <= qy
                                    Update: O(log Y)
                                    Memory: O(Y)

  Approach B is faster and uses far less memory.
  Only use 2D BIT when both dimensions need online updates.

Trap: "Update goes up, query goes down—easy, I won't mix them up"

Under contest pressure it's surprisingly common to use the wrong direction in one of the two nested loops. Both loops in update go UP; both loops in query go DOWN. Mixing directions in just the inner loop produces plausible-looking wrong answers:

text

UPDATE: both loops go UP (+= lowbit)

  outer: i=3 ──► i=4 ──► i=8 ──► STOP
  inner: j=2 ──► j=4 ──► j=8 ──► STOP
              ▲        ▲        ▲
           +lowbit  +lowbit  +lowbit

QUERY: both loops go DOWN (-= lowbit)

  outer: i=7 ──► i=6 ──► i=4 ──► i=0 STOP
  inner: j=5 ──► j=4 ──► j=0 STOP
              ▼        ▼        ▼
           -lowbit  -lowbit  -lowbit

[X] WRONG: outer UP, inner DOWN (or vice versa)
[OK] RIGHT: both loops same direction per operation

Spot the Bug

Bug 1—Silent zero index:

cpp

void update(int x, int y, int val) {
    for (int i = x; i <= N; i += i & (-i))
        for (int j = y; j <= M; j += j & (-j))
            tree[i][j] += val;
}
// Called with: update(0, 3, 1);

Fix

x = 0 makes i & (-i) == 0, so the outer loop increments by 0 → infinite loop. Always use 1-based indices; add assert(x >= 1 && y >= 1); during debugging.

Bug 2—Inclusion-exclusion sign error:

cpp

int query(int x1, int y1, int x2, int y2) {
    return prefix(x2, y2) - prefix(x1 - 1, y2)
         - prefix(x2, y1 - 1) - prefix(x1 - 1, y1 - 1);
}

Fix

The last term should be added, not subtracted. Correct: + prefix(x1-1, y1-1).

Bug 3—Swapped loop bounds:

cpp

void update(int x, int y, int val) {
    for (int i = x; i <= N; i += i & (-i))
        for (int j = y; j <= N; j += j & (-j))  // <- N should be M
            tree[i][j] += val;
}

Fix

Inner loop uses N instead of M. When the grid is non-square, this skips columns or writes out of bounds.

Can You...

Before moving on: write update(r, c, delta) from memory—two nested loops, both += lowbit. Write query(r, c) from memory—two nested loops, both -= lowbit. State the 4-term inclusion-exclusion formula for rect_sum(r1, c1, r2, c2) and explain the sign of each term. Explain why passing index 0 to a BIT hangs the loop (what's lowbit(0)?). Name one problem type where a true 2D BIT is needed, and one where 1D BIT plus sorting suffices. Draw the 2D difference-array sign pattern that turns four BIT updates into a rectangle range update.

Practice Problems

#	Problem	Source	Difficulty	Key Idea
1	Forest Queries II	CSES	Medium	2D point toggle + rectangle count
2	CF 869E—The Untended Antiquity	CF	Medium	2D range update, hashing + BIT
3	CF 341D—Iahub and Xors	CF	Hard	2D range update + range query with XOR
4	POJ 2155—Matrix	POJ	Medium	2D range toggle + point query
5	CF 1093E—Intersection of Permutations	CF	Hard	2D BIT / merge sort tree
6	CF 1540B -- Tree Array	CF	Medium-Hard	Dominance counting on tree
7	CF 848C -- Goodbye Souvenir	CF	Hard	CDQ or 2D BIT approach
8	SPOJ MATSUM -- Matrix Summation	SPOJ	Medium	Classic 2D point update + rect query
9	CF 1530E -- Encryption	CF	Hard	Coordinate compression + 2D BIT
10	AtCoder ABC 339F	AtCoder	Medium-Hard	2D counting with BIT

2D Fenwick Tree (2D BIT) ​

Why This Exists ​

Walkthrough ​

What the Code Won't Teach You ​

Under the Hood ​

The Algorithm ​

Implementation ​

Pre-Code Sanity Check ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: 2D Rectangle Sum Queries ​

Pattern 2: 2D Range Update + Point Query ​

Pattern 3: 2D Inversion Counting / Dominance Counting ​

Pattern 4: Online 2D Queries (No Offline Tricks) ​

Pattern 5: Grid Toggle + Rectangle Count ​

Pattern 6: Coordinate Compression for Sparse Grids ​

Gotchas & Debugging ​

1. 1-Indexed in Both Dimensions ​

2. Inclusion-Exclusion Off-By-One ​

3. Memory: O(nm) Can Be Tight ​

4. Debugging Tips ​

Mental Traps ​

Spot the Bug ​

Can You... ​

Practice Problems ​

Further Reading ​

2D Fenwick Tree (2D BIT)

Why This Exists

Walkthrough

What the Code Won't Teach You

Under the Hood

The Algorithm

Implementation

Pre-Code Sanity Check

Operations Reference

Problem Patterns & Tricks

Pattern 1: 2D Rectangle Sum Queries

Pattern 2: 2D Range Update + Point Query

Pattern 3: 2D Inversion Counting / Dominance Counting

Pattern 4: Online 2D Queries (No Offline Tricks)

Pattern 5: Grid Toggle + Rectangle Count

Pattern 6: Coordinate Compression for Sparse Grids

Gotchas & Debugging

1. 1-Indexed in Both Dimensions

2. Inclusion-Exclusion Off-By-One

3. Memory: $O (n m)$ Can Be Tight

4. Debugging Tips

Mental Traps

Spot the Bug

Can You...

Practice Problems

Further Reading