Skip to content

2D Fenwick Tree (2D BIT)

Quick Revisit: Point updates and rectangle sum queries on a grid in O(lognlogm)—the natural 2D extension of the 1D Fenwick tree. Under the hood: two nested BITs, one over rows, one over columns. Both loops climb i += i & -i on update, both descend i -= i & -i on query. The classic trap: passing index 0 in either dimension hangs the loop forever. (Practice problems)

Difficulty: CF 1800–2100 · Prereqs: Fenwick Tree (1D), Prefix Sums


Why This Exists

A 2D Fenwick tree is just the row-column product of the 1D trick: one BIT along rows, one along columns. It exists because 2D prefix sums are elegant until updates arrive, at which point one edit dirties an entire quadrant.

You have a 4×4 grid of integers:

text
       c=1  c=2  c=3  c=4
r=1  |  3  |  1  |  2  |  5  |
r=2  |  4  |  2  |  1  |  3  |
r=3  |  1  |  6  |  3  |  2  |
r=4  |  2  |  4  |  5  |  1  |

You need to support two operations, intermixed in any order:

  1. update(r, c, delta): add δ to cell (r,c).
  2. rect_sum(r1, c1, r2, c2): return the sum of all cells in the rectangle [r1..r2]×[c1..c2].

Your first instinct is to precompute a 2D prefix sum array P where P[r][c]=i=1rj=1ca[i][j].

text
P:     c=0  c=1  c=2  c=3  c=4
r=0  |  0  |  0  |  0  |  0  |  0  |
r=1  |  0  |  3  |  4  |  6  | 11  |
r=2  |  0  |  7  | 10  | 13  | 21  |
r=3  |  0  |  8  | 17  | 23  | 33  |
r=4  |  0  | 10  | 23  | 34  | 45  |

Queries are fast: rect_sum(2, 2, 3, 4) via inclusion-exclusion:

text
P[3][4] - P[1][4] - P[3][1] + P[1][1] = 33 - 11 - 8 + 3 = 17
check: 2+1+3 + 6+3+2 = 17    <-- correct

Four lookups, done. But now run update(2, 3, +10)—adding 10 to a[2][3]. Every P[r][c] with r2 and c3 is stale:

text
Stale entries:  P[2][3], P[2][4], P[3][3], P[3][4], P[4][3], P[4][4]
                                                      ^-- 6 entries

That's (nr+1)×(mc+1) entries. In the worst case, a single update at (1,1) on an n×m grid dirties all nm prefix cells. With n=m=2000 and q=2×105 mixed operations, that's up to 8×1011 work—hopeless TLE.

We need a structure that handles both point updates and rectangle queries in sublinear time, per dimension.

The way out is to nest a 1D BIT inside another 1D BIT: the outer structure handles rows, and each outer node contains a 1D BIT over columns. An update at (r,c) climbs the outer BIT over rows (O(logn) steps), and at each step climbs the inner BIT over columns (O(logm) steps). Total: O(lognlogm). Queries work the same way: outer prefix sum over rows, inner prefix sum over columns.

Concretely, tree[i][j] stores the sum of all cells a[x][y] where x falls in the 1D-BIT responsibility range of index i—rows [ilowbit(i)+1,i]—and y falls in the 1D-BIT responsibility range of index j—columns [jlowbit(j)+1,j]. The 2D structure is the Cartesian product of two independent 1D decompositions.

If you think of the 1D BIT as a hierarchical reporting structure along a corridor of offices, the 2D BIT is the same idea over a floor plan: one hierarchy for rows of desks and another for columns. Each "area manager" at position (i,j) is responsible for a rectangular block of desks, and the block size in each dimension is determined independently by the lowest set bit of i and j.

Walkthrough

Starting grid (4×4):

text
       c=1  c=2  c=3  c=4
r=1  |  3  |  1  |  2  |  5  |
r=2  |  4  |  2  |  1  |  3  |
r=3  |  1  |  6  |  3  |  2  |
r=4  |  2  |  4  |  5  |  1  |

Step 1: Responsibility rectangles

Each tree[i][j] covers the rectangle of rows [ilowbit(i)+1,i] and columns [jlowbit(j)+1,j]:

text
tree[i][j]    row-range    col-range    rectangle         value
----------    ---------    ---------    ---------         -----
tree[1][1]    [1,1]        [1,1]        {(1,1)}             3
tree[1][2]    [1,1]        [1,2]        {(1,1),(1,2)}       4
tree[1][3]    [1,1]        [3,3]        {(1,3)}             2
tree[1][4]    [1,1]        [1,4]        {(1,1)..(1,4)}     11
tree[2][1]    [1,2]        [1,1]        {(1,1),(2,1)}       7
tree[2][2]    [1,2]        [1,2]        {(1,1)..(2,2)}     10
tree[2][3]    [1,2]        [3,3]        {(1,3),(2,3)}       3
tree[2][4]    [1,2]        [1,4]        {(1,1)..(2,4)}     21
tree[3][1]    [3,3]        [1,1]        {(3,1)}             1
tree[3][2]    [3,3]        [1,2]        {(3,1),(3,2)}       7
  ...           ...          ...           ...             ...
tree[4][4]    [1,4]        [1,4]        all 16 cells       45

The pattern: tree[i][j] responsibility = (1D-BIT range of i) × (1D-BIT range of j).

Step 2: Point update—update(3, 2, +10)

Add 10 to cell (3,2). The outer BIT climbs rows: i=34 (since 3+lowbit(3)=4), then 4+lowbit(4)=8>4, stop. For each row index i, the inner BIT climbs columns: j=24 (since 2+lowbit(2)=4), then 4+lowbit(4)=8>4, stop.

text
Outer i=3:
    Inner j=2:  tree[3][2] += 10    (covers rows [3,3], cols [1,2])
    Inner j=4:  tree[3][4] += 10    (covers rows [3,3], cols [1,4])

Outer i=4:
    Inner j=2:  tree[4][2] += 10    (covers rows [1,4], cols [1,2])
    Inner j=4:  tree[4][4] += 10    (covers rows [1,4], cols [1,4])

Total: 4 cells updated    (log2(4) * log2(4) = 2 * 2 = 4)

Visually, the updated entries form a grid pattern:

text
       j=1    j=2    j=3    j=4
i=1  |  .   |  .   |  .   |  .   |
i=2  |  .   |  .   |  .   |  .   |
i=3  |  .   | +10  |  .   | +10  |
i=4  |  .   | +10  |  .   | +10  |
              ^^^^          ^^^^
         (j climbs: 2 -> 4 for each i that climbs: 3 -> 4)

Step 3: Rectangle query—query(r1=2, c1=2, r2=3, c2=4)

Use inclusion-exclusion on 2D prefix sums:

text
rect_sum(2,2,3,4) = prefix(3,4) - prefix(1,4) - prefix(3,1) + prefix(1,1)

Each prefix(r, c) call descends both BIT dimensions. For example, prefix(3, 4):

text
Outer i=3 (0011):
    Inner j=4 (0100):  += tree[3][4]
    j = 4 - 4 = 0, stop inner

    i strip: 3 - 1 = 2

Outer i=2 (0010):
    Inner j=4 (0100):  += tree[2][4]
    j = 4 - 4 = 0, stop inner

    i strip: 2 - 2 = 0, stop outer

prefix(3,4) = tree[3][4] + tree[2][4]

After the update, tree[3][4] = 12 + 10 = 22, tree[2][4] = 21 (unchanged):

text
prefix(3,4) = 22 + 21 = 43
prefix(1,4) = tree[1][4] = 11
prefix(3,1) = tree[3][1] + tree[2][1] = 1 + 7 = 8
prefix(1,1) = tree[1][1] = 3

rect_sum = 43 - 11 - 8 + 3 = 27
check: (4+2+1+3) + (1+16+3+2) = 10 + 22 = 32... wait
       original sum was 17, added 10 to (3,2), answer = 27   <-- correct!

The examples confirm the structure works. The property that guarantees correctness in general is this: query(r, c) returns i=1rj=1ca[i][j]—the 2D prefix sum over the rectangle [1..r][1..c]. Any rectangle [r1..r2][c1..c2] then decomposes via inclusion-exclusion:

rect_sum(r1,c1,r2,c2)=Q(r2,c2)Q(r11,c2)Q(r2,c11)+Q(r11,c11)

where Q(r,c)=query(r,c).

Visually, the four corners of the inclusion-exclusion:

text
          col 0    c1-1       c2
            ┌────────┬──────────┐
   row 0    │        │          │
            │   A    │    B     │     A = Q(r1-1, c1-1)  <- add back
   r1-1     │        │          │     B = Q(r1-1, c2)    <- subtract
            ├────────┼──────────┤
            │        │ XXXXXXXX │     C = Q(r2, c1-1)    <- subtract
   r2       │   C    │ XXXXDXXX │     D = Q(r2, c2)      <- start here
            │        │ XXXXXXXX │
            └────────┴──────────┘

   answer = D - B - C + A

   Signs:  +D covers everything from (1,1) to (r2,c2).
           -B removes the top strip.
           -C removes the left strip.
           +A adds back the top-left corner (subtracted twice).

This mirrors the 2D prefix sum formula exactly. The BIT just makes both query and update logarithmic in each dimension.

What the Code Won't Teach You

Under the Hood

1D BIT must be automatic before you attempt 2D. The 2D BIT is mechanically two nested 1D BIT loops. If you hesitate at all writing 1D update/query, that hesitation compounds—you'll end up debugging the nesting and the 1D logic simultaneously. You should be able to write 1D BIT in under two minutes from memory before touching 2D.

Many "2D" problems don't actually need a 2D BIT. Before reaching for one, ask: can I sort by one dimension and use a 1D BIT for the other? Dominance counting, for instance, often reduces to sorting by x and querying a 1D BIT on y. The 2D BIT has a larger constant factor (lognlogm with poor cache locality) and uses O(nm) memory—the 1D approach is faster and lighter whenever it applies.

The constant factor is worse than you'd expect. For n=m=2000, a single update touches up to log(2000)2121 cells. The inner loop jumps by lowbit(j), skipping around in memory and causing cache misses. On tight time limits (2–3 seconds with 2×105 operations), this constant can matter. Profile before assuming O(log2) is fast enough.

2D prefix-sum inclusion-exclusion must feel automatic. The formula Q(r2,c2)Q(r11,c2)Q(r2,c11)+Q(r11,c11) should come out without thought. If you have to re-derive it during a contest, you'll be debugging two layers at once: the BIT traversal and the inclusion-exclusion signs.

Use this when you see "2D point update + rectangle sum query" or "online 2D queries with modifications" and the grid fits in memory (typically n,m20004000). Don't use it when the grid is too large for O(nm) memory, or when you need 2D range min/max—2D segment trees or offline techniques handle those better.


The Algorithm

A 2D BIT is a 2D array tree[1..n][1..m] where index (i,j) stores the partial sum over the responsibility rectangle determined by both dimensions independently. All 1D BIT operations generalize: just nest the column loop inside the row loop.

text
FUNCTION update(r, c, delta):
    i = r
    WHILE i <= n:
        j = c
        WHILE j <= m:
            tree[i][j] += delta
            j += j AND (-j)
        i += i AND (-i)

FUNCTION query(r, c):
    sum = 0
    i = r
    WHILE i > 0:
        j = c
        WHILE j > 0:
            sum += tree[i][j]
            j -= j AND (-j)
        i -= i AND (-i)
    RETURN sum

FUNCTION rect_sum(r1, c1, r2, c2):
    RETURN query(r2, c2) - query(r1-1, c2)
         - query(r2, c1-1) + query(r1-1, c1-1)

The nested loop structure visualized for update(3, 2, +5) on an 8×8 grid:

text
Outer loop: i traverses row-ancestors of r=3
┌──────────────────────────────────────────┐
│ i = 3  (011₂, lowbit = 1)               │
│   Inner loop: j traverses col-ancestors  │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[3][2] += 5                  │   │
│   │ tree[3][4] += 5                  │   │
│   │ tree[3][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(3) = 4                       │
│                                          │
│ i = 4  (100₂, lowbit = 4)               │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[4][2] += 5                  │   │
│   │ tree[4][4] += 5                  │   │
│   │ tree[4][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(4) = 8                       │
│                                          │
│ i = 8  (1000₂, lowbit = 8)              │
│   ┌──────────────────────────────────┐   │
│   │ j=2 ──► j=4 ──► j=8 ──► STOP   │   │
│   │ tree[8][2] += 5                  │   │
│   │ tree[8][4] += 5                  │   │
│   │ tree[8][8] += 5                  │   │
│   └──────────────────────────────────┘   │
│ i += lowbit(8) = 16 > 8, STOP           │
└──────────────────────────────────────────┘

Cells touched: 3 x 3 = 9 = log₂(8) x log₂(8)
Each outer step spawns a complete inner loop.

Both update and query run in O(lognlogm)—the outer loop runs O(logn) times, the inner loop O(logm) per outer step. Space is O(nm). Building from a grid takes O(nmlognlogm) via per-cell updates, or O(nm) with the propagate trick (see the implementation below).


Implementation

cpp
#include <bits/stdc++.h>
using namespace std;

// --- 2D BIT: Point update + Rectangle query ---
struct BIT2D {
    int n, m;
    vector<vector<long long>> tree;

    BIT2D(int n, int m)
        : n(n), m(m), tree(n + 1, vector<long long>(m + 1, 0)) {}

    // Build from existing grid in O(n*m) using propagation
    BIT2D(const vector<vector<long long>>& grid)
        : n((int)grid.size() - 1),
          m((int)grid[0].size() - 1),
          tree(grid.size(), vector<long long>(grid[0].size(), 0)) {
        // Copy values into tree, then propagate
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                tree[i][j] = grid[i][j];
        // Propagate along columns first
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int nj = j + (j & (-j));
                if (nj <= m) tree[i][nj] += tree[i][j];
            }
        // Then propagate along rows
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int ni = i + (i & (-i));
                if (ni <= n) tree[ni][j] += tree[i][j];
            }
    }

    void update(int r, int c, long long delta) {
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j))
                tree[i][j] += delta;
    }

    // Prefix sum query: sum of [1..r][1..c]
    long long query(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += tree[i][j];
        return s;
    }

    // Rectangle sum query: sum of [r1..r2][c1..c2]
    long long query(int r1, int c1, int r2, int c2) {
        return query(r2, c2) - query(r1 - 1, c2)
             - query(r2, c1 - 1) + query(r1 - 1, c1 - 1);
    }
};

// --- 2D BIT: Range update + Point query ---
// Uses the 2D difference array idea.
// range_update(r1, c1, r2, c2, v) adds v to all cells in the rectangle.
// point_query(r, c) returns the current value at (r, c).
struct BIT2D_RangePoint {
    int n, m;
    vector<vector<long long>> tree;

    BIT2D_RangePoint(int n, int m)
        : n(n), m(m), tree(n + 2, vector<long long>(m + 2, 0)) {}

    void bit_update(int r, int c, long long delta) {
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j))
                tree[i][j] += delta;
    }

    // Add delta to all cells in [r1..r2][c1..c2]
    void range_update(int r1, int c1, int r2, int c2, long long delta) {
        bit_update(r1, c1, delta);
        bit_update(r1, c2 + 1, -delta);
        bit_update(r2 + 1, c1, -delta);
        bit_update(r2 + 1, c2 + 1, delta);
    }

    // Query the current value at cell (r, c)
    long long point_query(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += tree[i][j];
        return s;
    }
};

// --- 2D BIT: Range update + Range query ---
// Supports both adding v to a rectangle and querying the sum of a rectangle.
// Uses four 2D BITs based on the identity:
//   sum_{i=1}^{r} sum_{j=1}^{c} a[i][j]
//     = (r+1)(c+1)*B1 - (c+1)*B2 - (r+1)*B3 + B4
// where B1..B4 are BITs storing D[i][j], i*D[i][j], j*D[i][j], i*j*D[i][j].
struct BIT2D_RangeRange {
    int n, m;
    vector<vector<long long>> b1, b2, b3, b4;

    BIT2D_RangeRange(int n, int m)
        : n(n), m(m),
          b1(n + 2, vector<long long>(m + 2, 0)),
          b2(n + 2, vector<long long>(m + 2, 0)),
          b3(n + 2, vector<long long>(m + 2, 0)),
          b4(n + 2, vector<long long>(m + 2, 0)) {}

    void bit_add(int r, int c, long long v) {
        long long rv = (long long)r * v, cv = (long long)c * v;
        long long rcv = (long long)r * c * v;
        for (int i = r; i <= n; i += i & (-i))
            for (int j = c; j <= m; j += j & (-j)) {
                b1[i][j] += v;
                b2[i][j] += rv;
                b3[i][j] += cv;
                b4[i][j] += rcv;
            }
    }

    void range_update(int r1, int c1, int r2, int c2, long long v) {
        bit_add(r1, c1, v);
        bit_add(r1, c2 + 1, -v);
        bit_add(r2 + 1, c1, -v);
        bit_add(r2 + 1, c2 + 1, v);
    }

    long long prefix_sum(int r, int c) {
        long long s = 0;
        for (int i = r; i > 0; i -= i & (-i))
            for (int j = c; j > 0; j -= j & (-j))
                s += (long long)(r + 1) * (c + 1) * b1[i][j]
                   - (long long)(c + 1) * b2[i][j]
                   - (long long)(r + 1) * b3[i][j]
                   + b4[i][j];
        return s;
    }

    long long range_sum(int r1, int c1, int r2, int c2) {
        return prefix_sum(r2, c2) - prefix_sum(r1 - 1, c2)
             - prefix_sum(r2, c1 - 1) + prefix_sum(r1 - 1, c1 - 1);
    }
};

int main() {
    // Example: point update + rectangle query on n x m grid
    int n, m, q;
    scanf("%d %d %d", &n, &m, &q);

    BIT2D bit(n, m);

    // Read initial grid and populate
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            long long v;
            scanf("%lld", &v);
            bit.update(i, j, v);
        }

    while (q--) {
        int t;
        scanf("%d", &t);
        if (t == 1) {
            int r, c;
            long long delta;
            scanf("%d %d %lld", &r, &c, &delta);
            bit.update(r, c, delta);
        } else {
            int r1, c1, r2, c2;
            scanf("%d %d %d %d", &r1, &c1, &r2, &c2);
            printf("%lld\n", bit.query(r1, c1, r2, c2));
        }
    }
}

Pre-Code Sanity Check

Before you start typing, confirm: are N and M small enough for an N×M array (typically 20004000)? Have you shifted all coordinates to 1-based indexing? Is every update a point update, or do you need the 4-BIT range-update extension? Are rectangle sums going through the 4-term inclusion-exclusion? If coordinates reach 109, have you compressed them or switched to offline CDQ?

The whole thing fits in one sentence: outer walks rows, inner walks cols—same bit trick, both directions.


Operations Reference

Each operation's cost reflects the nested BIT structure—one logarithmic factor per dimension.

OperationTimeSpaceNotes
Build (n*m updates)O(nmlognlogm)O(nm)Naive: call update for each cell
Build (propagation)O(nm)O(nm)Copy + propagate along each dim
Point updateO(lognlogm)--update(r, c, delta)
Prefix queryO(lognlogm)--query(r, c) = sum of [1..r][1..c]
Rectangle queryO(lognlogm)--Inclusion-exclusion on 4 prefix queries
Range update (point query)O(lognlogm)O(nm)2D difference array in BIT
Range update + range queryO(lognlogm)O(4nm)Four 2D BITs

Problem Patterns & Tricks

Pattern 1: 2D Rectangle Sum Queries

The bread-and-butter application. Given an n×m grid with point updates and rectangle sum queries, use BIT2D directly. The inclusion-exclusion formula handles arbitrary axis-aligned rectangles.

cpp
BIT2D bit(n, m);
bit.update(r, c, delta);
long long ans = bit.query(r1, c1, r2, c2);

Problems: CSES Forest Queries II, CF 869E

Pattern 2: 2D Range Update + Point Query

Need to add v to all cells in a rectangle [r1..r2][c1..c2] and later query individual cells? Use the 2D difference array trick. Four BIT updates per range update, one BIT prefix query per point query.

text
range_update(r1, c1, r2, c2, v):
    bit_update(r1,   c1,   +v)
    bit_update(r1,   c2+1, -v)
    bit_update(r2+1, c1,   -v)
    bit_update(r2+1, c2+1, +v)

point_query(r, c):
    return prefix_sum(r, c)

This mirrors the 1D trick: 1D difference array becomes a 2D difference matrix with four corner updates.

Problems: CF 341D, Gym 2D Range Update

Pattern 3: 2D Inversion Counting / Dominance Counting

Count pairs where point A dominates point B in both coordinates (2D dominance counting). Sort by one coordinate, sweep, and use a 2D BIT (or a 1D BIT on the second coordinate after compression) to count.

For strictly 2D problems where both coordinates matter simultaneously, the 2D BIT avoids coordinate-by-coordinate decomposition.

cpp
// Sort points by x, then for each point, query BIT for count of
// points already inserted with y <= current y
sort(pts.begin(), pts.end());
BIT bit(max_y);
long long count = 0;
for (auto& [x, y] : pts) {
    count += bit.query(y);  // 1D BIT often suffices after sorting
    bit.update(y, 1);
}

When a true 2D structure is needed (e.g., both dimensions require online updates), use BIT2D.

Problems: CF 1540B, CF 848C

Pattern 4: Online 2D Queries (No Offline Tricks)

When queries arrive online—each answer depending on the previous output—you can't sort or reorder them. A 2D BIT handles this naturally: each operation costs O(log2) with no batching. This is where the 2D BIT beats CDQ divide-and-conquer and merge sort tree approaches.

Problems: CF 1093E

Pattern 5: Grid Toggle + Rectangle Count

Classic: grid of 0/1, toggle a cell, query "how many 1s in rectangle?" This is point update (+1 or -1) plus rectangle sum—direct 2D BIT application.

cpp
BIT2D bit(n, m);
// toggle cell (r, c):
if (grid[r][c] == 0) { grid[r][c] = 1; bit.update(r, c, +1); }
else                  { grid[r][c] = 0; bit.update(r, c, -1); }
// count 1s in rectangle:
long long ones = bit.query(r1, c1, r2, c2);

Problems: CSES Forest Queries II, POJ 2155

Pattern 6: Coordinate Compression for Sparse Grids

When the grid is conceptually huge (109×109) but only q2×105 cells are ever accessed, compress both coordinates to [1..q] and use a 2D BIT of size q×q.

cpp
// Collect all row/col coordinates from updates and queries
vector<int> rs, cs;
for (auto& op : ops) { rs.push_back(op.r); cs.push_back(op.c); }
sort(rs.begin(), rs.end()); rs.erase(unique(rs.begin(), rs.end()), rs.end());
sort(cs.begin(), cs.end()); cs.erase(unique(cs.begin(), cs.end()), cs.end());
auto cr = [&](int r) { return (int)(lower_bound(rs.begin(), rs.end(), r) - rs.begin()) + 1; };
auto cc = [&](int c) { return (int)(lower_bound(cs.begin(), cs.end(), c) - cs.begin()) + 1; };
BIT2D bit((int)rs.size(), (int)cs.size());

Coordinate compression visualized:

text
Before compression (sparse, up to 10^9 x 10^9):

  Points: (5, 100000), (42, 3), (999999, 500)

  999999 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ● (999999, 500)
     :
    42 ─ ● (42, 3)
     :
     5 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ● (5, 100000)
         |                 |               |
         3               500           100000

After compression (dense 3x3 BIT):

  Rows: 5 -> 1,  42 -> 2,  999999 -> 3
  Cols: 3 -> 1, 500 -> 2, 100000 -> 3

     3 ─ ─ ● (3, 2)           BIT size: 3 x 3
     2 ─ ● (2, 1)             instead of 10^9 x 10^9
     1 ─ ─ ─ ─ ● (1, 3)
         |   |   |
         1   2   3

Problems: CF 1530E

Gotchas & Debugging

1. 1-Indexed in Both Dimensions

Like 1D BIT, indices must start at 1 in both row and column dimensions. update(0, c, v) or update(r, 0, v) triggers an infinite loop. If your grid is 0-indexed, add 1 to both coordinates everywhere.

2. Inclusion-Exclusion Off-By-One

The rectangle sum formula uses r11 and c11. When r1=1 or c1=1, these become 0, and query(0, c) should return 0—which it does, since the BIT loop exits immediately when i=0. But if you accidentally pass negative indices, you get undefined behavior.

3. Memory: O(nm) Can Be Tight

For n=m=4000 with long long, the tree consumes 4000×4000×8=128 MB. For range-update-range-query (four BITs), that's 512 MB—likely MLE. Solutions:

  • Use int instead of long long if values fit in 32 bits.
  • Use coordinate compression if the grid is sparse.
  • Consider offline approaches (CDQ divide-and-conquer) for very large grids.

2D difference array: four corner updates. For range update + point query, you need four BIT updates per range operation. Missing any of the four corners gives wrong answers. The signs alternate in a checkerboard pattern:

text
    c1        c2+1
r1  [+v]      [-v]
r2+1[-v]      [+v]

Build efficiently. Don't build by calling update n×m times—that's O(nmlognlogm). Instead, copy values into tree[][], then propagate along columns (for each row), then propagate along rows (for each column). Total: O(nm).

Swapped loop bounds on non-square grids. Allocating an N×M array but then writing for (j = y; j <= N; j += j & (-j)) (using N instead of M) skips valid columns or writes out of bounds. Keep the two dimensions distinct.

With n=m=2000 and values up to 109, the maximum prefix sum is 4×1015. Use long long. With range updates adding up to 109 across 2×105 operations, sums can be even larger—be careful.

4. Debugging Tips

  • Verify against brute-force 2D prefix sums for small grids (n,m5).
  • Print the reconstructed grid: for each (i,j): print query(i,j,i,j).
  • If WA, check: are both dimensions 1-indexed? Is the BIT allocated large enough (n+2 for diff array)? Are the inclusion-exclusion signs correct?

Mental Traps

Trap: "2D BIT must be fundamentally harder than 1D"

It isn't. The 2D BIT is mechanically the 1D BIT applied twice: the outer loop is a 1D BIT traversal over rows, and the inner loop is a 1D BIT traversal over columns. If 1D BIT feels natural, 2D is almost direct:

text
1D BIT update:                 2D BIT update:

for i = r; i <= n; i+=low(i)   for i = r; i <= n; i+=low(i)
    tree[i] += delta               for j = c; j <= m; j+=low(j)
                                       tree[i][j] += delta

Identical structure -- just nest the column loop inside.

The trap is attempting 2D before the 1D version is automatic. Any hesitation in the 1D loops compounds when they're nested.

Trap: "I need a 2D BIT for this 2D problem"

Many problems that appear to require a 2D structure can be solved with a 1D BIT by processing one dimension sorted and using the BIT for the other. Before reaching for the 2D BIT (which costs O(lognlogm) per operation and O(nm) memory), check whether sorting eliminates one dimension:

text
Problem: count points (x,y) dominated by each query

Approach A -- 2D BIT:              Approach B -- sort + 1D BIT:

  All points in 2D BIT              Sort all points by x
  BIT size: max_x x max_y          Sweep left -> right:
  Update: O(log X * log Y)           insert y into 1D BIT
  Memory: O(X * Y)                   query BIT for y <= qy
                                    Update: O(log Y)
                                    Memory: O(Y)

  Approach B is faster and uses far less memory.
  Only use 2D BIT when both dimensions need online updates.

Trap: "Update goes up, query goes down—easy, I won't mix them up"

Under contest pressure it's surprisingly common to use the wrong direction in one of the two nested loops. Both loops in update go UP; both loops in query go DOWN. Mixing directions in just the inner loop produces plausible-looking wrong answers:

text
UPDATE: both loops go UP (+= lowbit)

  outer: i=3 ──► i=4 ──► i=8 ──► STOP
  inner: j=2 ──► j=4 ──► j=8 ──► STOP
              ▲        ▲        ▲
           +lowbit  +lowbit  +lowbit

QUERY: both loops go DOWN (-= lowbit)

  outer: i=7 ──► i=6 ──► i=4 ──► i=0 STOP
  inner: j=5 ──► j=4 ──► j=0 STOP
              ▼        ▼        ▼
           -lowbit  -lowbit  -lowbit

[X] WRONG: outer UP, inner DOWN (or vice versa)
[OK] RIGHT: both loops same direction per operation

Spot the Bug

Bug 1—Silent zero index:

cpp
void update(int x, int y, int val) {
    for (int i = x; i <= N; i += i & (-i))
        for (int j = y; j <= M; j += j & (-j))
            tree[i][j] += val;
}
// Called with: update(0, 3, 1);
Fixx = 0 makes i & (-i) == 0, so the outer loop increments by 0 → infinite loop. Always use 1-based indices; add assert(x >= 1 && y >= 1); during debugging.

Bug 2—Inclusion-exclusion sign error:

cpp
int query(int x1, int y1, int x2, int y2) {
    return prefix(x2, y2) - prefix(x1 - 1, y2)
         - prefix(x2, y1 - 1) - prefix(x1 - 1, y1 - 1);
}
Fix The last term should be added, not subtracted. Correct: + prefix(x1-1, y1-1).

Bug 3—Swapped loop bounds:

cpp
void update(int x, int y, int val) {
    for (int i = x; i <= N; i += i & (-i))
        for (int j = y; j <= N; j += j & (-j))  // <- N should be M
            tree[i][j] += val;
}
Fix Inner loop uses N instead of M. When the grid is non-square, this skips columns or writes out of bounds.

Can You...

Before moving on: write update(r, c, delta) from memory—two nested loops, both += lowbit. Write query(r, c) from memory—two nested loops, both -= lowbit. State the 4-term inclusion-exclusion formula for rect_sum(r1, c1, r2, c2) and explain the sign of each term. Explain why passing index 0 to a BIT hangs the loop (what's lowbit(0)?). Name one problem type where a true 2D BIT is needed, and one where 1D BIT plus sorting suffices. Draw the 2D difference-array sign pattern that turns four BIT updates into a rectangle range update.


Practice Problems

#ProblemSourceDifficultyKey Idea
1Forest Queries IICSESMedium2D point toggle + rectangle count
2CF 869E—The Untended AntiquityCFMedium2D range update, hashing + BIT
3CF 341D—Iahub and XorsCFHard2D range update + range query with XOR
4POJ 2155—MatrixPOJMedium2D range toggle + point query
5CF 1093E—Intersection of PermutationsCFHard2D BIT / merge sort tree
6CF 1540B -- Tree ArrayCFMedium-HardDominance counting on tree
7CF 848C -- Goodbye SouvenirCFHardCDQ or 2D BIT approach
8SPOJ MATSUM -- Matrix SummationSPOJMediumClassic 2D point update + rect query
9CF 1530E -- EncryptionCFHardCoordinate compression + 2D BIT
10AtCoder ABC 339FAtCoderMedium-Hard2D counting with BIT

Further Reading


Next Up: Segment Tree—when you need more than sums, or when the grid is too large for O(nm) memory.

Built for the climb from Codeforces 1100 to 2100.