Skip to content

Binary Indexed Tree (Fenwick Tree)

A Fenwick tree is the smallest useful dynamic prefix-sum structure: it supports point updates and prefix sum queries in O(logn) time with very little code. If a segment tree would work but feels heavier than the problem deserves, this is usually the first thing to try.

Difficulty: Intermediate | Prerequisites: Union-Find (DSU)

Contest Frequency: Regular—common alternative to segment tree


Intuition

You have this array of 8 elements:

text
a: | 2 | 1 | 4 | 3 | 5 | 2 | 7 | 1 |
     1   2   3   4   5   6   7   8

You need to support two operations, intermixed in any order:

  1. prefix_sum(i): return a[1]+a[2]++a[i].
  2. update(i, delta): add δ to a[i].

Your first instinct is to precompute a prefix sum array p where p[i]=a[1]+a[2]++a[i].

text
p: | 0 | 2 | 3 | 7 | 10 | 15 | 17 | 24 | 25 |
     0   1   2   3    4    5    6    7    8

Queries are instant—prefix_sum(5) = p[5] = 15. One lookup, done.

Now run update(3, +10), adding 10 to a[3]. The array becomes:

text
a: | 2 | 1 | 14 | 3 | 5 | 2 | 7 | 1 |
               ^
             changed

But p[3] through p[8] are all stale. One update now forces six recomputations:

text
p[3]:  7 -> 17     p[4]: 10 -> 20     p[5]: 15 -> 25
p[6]: 17 -> 27     p[7]: 24 -> 34     p[8]: 25 -> 35
                                       ^-- 6 entries touched

Six operations for one update. Next, update(1, +5):

text
p[1]:  2 -> 7      p[2]:  3 -> 8      p[3]: 17 -> 22    p[4]: 20 -> 25
p[5]: 25 -> 30     p[6]: 27 -> 32     p[7]: 34 -> 39    p[8]: 35 -> 40
                                                          ^-- all 8 entries touched

Eight operations. Every single prefix sum entry changed.

Let's count the total work for a mixed sequence of 6 operations:

text
Operation          | Work (prefix sum array approach)
-------------------+----------------------------------
prefix_sum(5)      |  1  (read p[5])
update(3, +10)     |  6  (recompute p[3..8])
prefix_sum(7)      |  1  (read p[7])
update(1, +5)      |  8  (recompute p[1..8])
prefix_sum(4)      |  1  (read p[4])
update(6, +3)      |  3  (recompute p[6..8])
-------------------+----------------------------------
Total              | 20 operations

On average, each update touches n/2 entries. Flip the strategy and keep only the raw array, and queries become O(n) instead. Either way, one side of the interface is linear. For n=2×105 and q=2×105 mixed operations, that is up to 4×1010 work—TLE.

The problem is that we're rebuilding the entire prefix sum array on every single update. We need a smarter decomposition.

Instead of one monolithic prefix sum, break the array into power-of-two chunks. Each index i stores the sum of exactly i&(i) consecutive elements ending at position i, so any prefix decomposes into at most log2n chunks and any point update climbs through at most that many ancestors.

The cleanest mental model is a reporting hierarchy. Employee 1 reports only their own sales. Employee 2 reports the combined total for employees 1–2. Employee 4 covers 1–4. Employee 8 covers 1–8. Other employees cover smaller blocks determined by the lowest set bit in their ID. To get the company total for employees 1–7, you don't ask all 7 people. You ask employee 7, then 6, then 4—three lookups instead of seven.

The analogy breaks in one specific place: the BIT does not store an explicit tree. Its parent relation is computed on the fly as i+(i&(i)), so the hierarchy is implicit and entirely binary-arithmetic driven. No pointers, just a flat array and the lowbit trick.

Walking Through the Mechanics

Same 8-element array:

text
a: | 2 | 1 | 4 | 3 | 5 | 2 | 7 | 1 |
     1   2   3   4   5   6   7   8

Step 1: Assign Responsibility Ranges Using the Lowest Set Bit

For each index i, compute lowbit(i)=i&(i). Index i is responsible for the sum of lowbit(i) elements ending at position i—that is, the range [ilowbit(i)+1,i].

text
Index  Binary  lowbit  Responsible range   bit[i] stores
-----  ------  ------  -----------------   ----------------------
  1    0001      1     [1, 1]              a[1]           = 2
  2    0010      2     [1, 2]              a[1]+a[2]      = 3
  3    0011      1     [3, 3]              a[3]           = 4
  4    0100      4     [1, 4]              a[1]+..+a[4]   = 10
  5    0101      1     [5, 5]              a[5]           = 5
  6    0110      2     [5, 6]              a[5]+a[6]      = 7
  7    0111      1     [7, 7]              a[7]           = 7
  8    1000      8     [1, 8]              a[1]+..+a[8]   = 25

These ranges tile the array with no gaps and controlled overlap:

text
bit[1]: |--|
bit[2]: |-----|
bit[3]:       |--|
bit[4]: |-----------|
bit[5]:             |--|
bit[6]:             |-----|
bit[7]:                   |--|
bit[8]: |-----------------------|
        1  2  3  4  5  6  7  8

Notice the pattern: indices with lowbit = 1 (odd indices: 1, 3, 5, 7) cover single elements. Indices with lowbit = 2 (indices 2, 6) cover pairs. Index 4 (lowbit = 4) covers a group of four. Index 8 (lowbit = 8) covers all eight. The binary representation dictates the hierarchy.

Step 2: Query—Compute prefix_sum(7)

Start at i=7. Add bit[i], then strip the lowest set bit: i=i&(i). Repeat until i=0.

text
i = 7  (0111)  -->  add bit[7] = 7    covers [7, 7]
     strip lowest bit: 7 - 1 = 6

i = 6  (0110)  -->  add bit[6] = 7    covers [5, 6]
     strip lowest bit: 6 - 2 = 4

i = 4  (0100)  -->  add bit[4] = 10   covers [1, 4]
     strip lowest bit: 4 - 4 = 0

i = 0  -->  STOP

result = 7 + 7 + 10 = 24
check:   a[1]+..+a[7] = 2+1+4+3+5+2+7 = 24   <-- correct!

3 additions. The ranges [7,7], [5,6], [1,4] partition [1,7] perfectly—no overlap, no gap. This always works because stripping set bits of 7=01112 peels off chunks of size 1, 2, and 4, summing to 7.

Step 3: Update—Add +10 at Position 3

Start at i=3. Add 10 to bit[i], then add the lowest set bit: i+=i&(i). Repeat until i>n.

text
i = 3  (0011)  -->  bit[3] += 10   covers [3, 3]
     add lowest bit: 3 + 1 = 4

i = 4  (0100)  -->  bit[4] += 10   covers [1, 4]
     add lowest bit: 4 + 4 = 8

i = 8  (1000)  -->  bit[8] += 10   covers [1, 8]
     add lowest bit: 8 + 8 = 16 > 8  -->  STOP

Only 3 entries updated—exactly the entries whose responsible ranges contain position 3. Compare: the prefix sum array needed 6 updates for this same operation.

text
Before update:                     After update(3, +10):

bit[1] = 2                        bit[1] = 2
bit[2] = 3                        bit[2] = 3
bit[3] = 4           ------>      bit[3] = 14            *
bit[4] = 10                       bit[4] = 20            *
bit[5] = 5                        bit[5] = 5
bit[6] = 7                        bit[6] = 7
bit[7] = 7                        bit[7] = 7
bit[8] = 25                       bit[8] = 35            *

Step 4: Verify—Query prefix_sum(7) After the Update

After adding 10 to a[3], the true prefix sum through index 7 should be 24+10=34.

text
i = 7  (0111)  -->  bit[7] = 7     (unchanged)
     strip: 7 - 1 = 6

i = 6  (0110)  -->  bit[6] = 7     (unchanged)
     strip: 6 - 2 = 4

i = 4  (0100)  -->  bit[4] = 20    (was 10, got +10 in Step 3)
     strip: 4 - 4 = 0

i = 0  -->  STOP

result = 7 + 7 + 20 = 34   <-- correct!

Still 3 additions for the query. The update in Step 3 correctly propagated the change to bit[4], which the query in Step 4 picks up. Entries not on the query path (bit[3], bit[8]) were also updated—they will be picked up by other queries that pass through them (e.g., prefix_sum(3) uses bit[3]; prefix_sum(8) uses bit[8]).

Step 5: Operation Count Comparison

Running the same 6-operation sequence from before, now using the BIT:

text
Operation          | Naive (prefix array) | BIT
-------------------+----------------------+-----------------
prefix_sum(5)      |  1                   |  2  (5->4->0)
update(3, +10)     |  6                   |  3  (3->4->8)
prefix_sum(7)      |  1                   |  3  (7->6->4->0)
update(1, +5)      |  8                   |  4  (1->2->4->8)
prefix_sum(4)      |  1                   |  1  (4->0)
update(6, +3)      |  3                   |  2  (6->8)
-------------------+----------------------+-----------------
Total              | 20                   | 15

At n=8, the gap is modest: 20 vs. 15. But each BIT operation does at most log2n=3 steps regardless of which position is targeted. At n=2×105, a single naive update can touch 200,000 entries. A BIT update touches at most 17. The gap doesn't just widen—it explodes.

The Invariant That Holds It Together

text
+------------------------------------------------------------------+
| INVARIANT: bit[i] stores the sum of elements a[j] for            |
|   j in [i - lowbit(i) + 1,  i]  where lowbit(i) = i & (-i).    |
|                                                                  |
| Equivalently: bit[i] = a[i - lowbit(i) + 1] + ... + a[i].      |
+------------------------------------------------------------------+

Why this is maintained on update: When we execute update(k, delta), we visit every index i whose responsible range contains position k—that is, every i such that k[ilowbit(i)+1,i]. The update loop i+=i&(i) generates exactly these indices and no others. Look at Step 3 above: updating position 3 touched bit[3] (range [3,3]), bit[4] (range [1,4]), and bit[8] (range [1,8])—exactly the three entries whose responsible ranges include position 3. Indices like bit[2] (range [1,2]) and bit[6] (range [5,6]) were correctly skipped because their ranges do not contain position 3.

Why this is maintained on query: When we compute prefix_sum(i), we need a[1]++a[i]. Stripping the lowest set bit (i=i&(i)) partitions the range [1,i] into non-overlapping chunks whose sizes are the set bits in the binary representation of i. Look at Step 2: querying prefix_sum(7) decomposed [1,7] into [7,7]+[5,6]+[1,4]—three chunks corresponding to the three set bits in 7=01112 (sizes 1, 2, and 4, summing to 7). No element is counted twice, no element is missed.

Connection to the walkthrough: In Step 4, after the update, bit[4] had changed from 10 to 20. If it had not been updated in Step 3, the invariant would break: bit[4] would still claim to store a[1]+a[2]+a[3]+a[4]=10, but the true value of that sum is now 20 (because a[3] grew by 10). The query in Step 4 would return 7+7+10=24 instead of the correct 34. The update loop—climbing from i=3 to its ancestors 4 and 8—is precisely what keeps the invariant alive.

Deeper Principles

With the mechanics in place, here are the deeper insights that separate understanding from mere pattern-matching.

What the Code Won't Teach You

lowbit is geometric—visualize the coverage pattern.

The expression i & (-i) does more than compute a number. It tells you the width of the range stored at that index. Lay out the BIT entries by coverage width and a staircase pattern appears:

text
         Coverage width of tree[i] (= lowbit(i))

Index:   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
lowbit:  1    2    1    4    1    2    1    8    1    2    1    4    1    2    1   16

         #         #         #         #         #         #         #         #
         #    #    #    #    #    #    #    #    #    #    #    #    #    #    #
              #         #         #         #         #         #         #
              #         #         #         #              #         #
                        #                   #                        #
                        #                   #
                        #                   #
                                            #
         1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16

Each column's height = lowbit(i) = how many elements tree[i] covers.
Odd indices always have height 1 (single element).
Powers of 2 form the "spine" -- each covers twice as much as the last.

This is self-similar, not random. Every time you add lowbit during an update, you jump to the next taller column. Every time you subtract lowbit during a query, you step back to the previous taller column. That geometry is exactly why both loops stay O(logn).

BIT cannot do min/max—the subtraction trick breaks.

BIT computes range queries via subtraction: sum(l,r)=prefix(r)prefix(l1). That only works when the operation has an inverse. Sum has subtraction; XOR has XOR. Min and max have no inverses, so once a prefix minimum drops to 2 you cannot "un-min" the earlier elements. The cancellation that makes a BIT work simply isn't there, which is why non-invertible operations belong to a segment tree.

When BIT beats segment tree—and when it doesn't.

BIT wins on constant factor, code length, cache behavior, and memory. The loops are short, the array is flat, and there is no tree recursion to manage. Segment trees win when you need non-invertible operations, lazy propagation, or actual tree walking. BIT can still do k-th element queries with binary lifting, but only for sum-like operations.

Reflex rule: if the operation is invertible and you only need point updates plus prefix/range queries, start with BIT. Everything else points to a segment tree.

Prerequisites you must have cold.

  • Binary representation: you must be able to compute i & (-i) by hand. Know that -i in two's complement flips all bits and adds 1, so i & (-i) isolates the lowest set bit. If this is opaque, drill it before touching BIT code.
  • Prefix-sum paradigm: BIT is a dynamic prefix sum array. The mental model—build prefix sums, answer range queries by subtraction, but now updates are fast too—must be second nature.

Once the mechanics and the binary trick are clear, the only remaining question is when this is the right tool.

When to Reach for This

Trigger phrases—if you see these in a problem, think BIT:

  • "prefix sum with point updates"—textbook BIT territory.
  • "point update, range sum query"—use query(l,r)=query(r)query(l1).
  • "count inversions" or "count pairs (i,j) with a[i]>a[j] and i<j"—process right-to-left with a BIT over values as a frequency table.
  • "dynamic order statistics" or "k-th smallest with insertions/deletions"—use the bit-descent trick for O(logn) k-th element queries.
  • "2D grid with point updates and rectangle sum queries"—use the nested 2D BIT.

Counter-examples—problems that look like they need a BIT but actually don't:

  • "Range update + range query" with set operations (for example, "set all elements in [l,r] to v, then query range sums")—a BIT can handle range-add + range-sum via the two-BIT trick (BIT_RangeRange in the Implementation section below), but set operations destroy the additive structure. You need a segment tree with lazy propagation. See: Segment Tree (Lazy).
  • "Range minimum/maximum query with updates"—BIT only works for operations that have an inverse. Min and max have none, so you cannot recover a subrange from prefix data. Use a segment tree. See: Segment Tree.
  • "Static range queries, no updates at all"—if the array never changes, don't pay for update support. A plain prefix sum array gives O(1) queries after O(n) preprocessing, which is strictly better than BIT's O(logn). For static min/max, use a sparse table for O(1) queries. See: Sparse Table / RMQ.

C++ STL Reference

No STL container implements a Fenwick tree. This is a manual implementation. The only STL dependency is <vector> for storage.

Function / ClassHeaderWhat it doesTime Complexity
vector<long long><vector>Backing storage for BIT--
__builtin_ctz(x)built-inCount trailing zeros (= log2 of lowest set bit)O(1)
x & (-x)--Extract lowest set bit of xO(1)

Implementation (Contest-Ready)

Minimal Contest Template

cpp
#include <bits/stdc++.h>
using namespace std;

struct BIT {
    int n;
    vector<long long> tree;
    BIT(int n) : n(n), tree(n + 1, 0) {}

    void update(int i, long long delta) {
        for (; i <= n; i += i & (-i))
            tree[i] += delta;
    }

    long long query(int i) {
        long long s = 0;
        for (; i > 0; i -= i & (-i))
            s += tree[i];
        return s;
    }

    long long query(int l, int r) {
        return query(r) - query(l - 1);
    }
};

int main() {
    int n, q;
    scanf("%d %d", &n, &q);
    BIT bit(n);
    for (int i = 1; i <= n; i++) {
        long long x;
        scanf("%lld", &x);
        bit.update(i, x);
    }
    while (q--) {
        int t;
        scanf("%d", &t);
        if (t == 1) {
            int i; long long v;
            scanf("%d %lld", &i, &v);
            bit.update(i, v);
        } else {
            int l, r;
            scanf("%d %d", &l, &r);
            printf("%lld\n", bit.query(l, r));
        }
    }
}

Explained Version with O(n) Build

cpp
#include <bits/stdc++.h>
using namespace std;

struct BIT {
    int n;
    vector<long long> tree;

    // Initialize BIT of size n (1-indexed: positions 1..n)
    BIT(int n) : n(n), tree(n + 1, 0) {}

    // O(n) construction from an existing array a[1..n].
    // Instead of n separate O(log n) updates, we propagate each
    // node's value to its direct parent exactly once.
    BIT(const vector<long long>& a) : n((int)a.size() - 1), tree(a) {
        // a must be 1-indexed: a[0] is unused, a[1..n] has values.
        for (int i = 1; i <= n; i++) {
            int parent = i + (i & (-i));
            if (parent <= n)
                tree[parent] += tree[i];
        }
    }

    // Add delta to position i. Propagates up to all ancestors.
    // Ancestors of i are: i, i + lowbit(i), i + lowbit(i + lowbit(i)), ...
    void update(int i, long long delta) {
        for (; i <= n; i += i & (-i))
            tree[i] += delta;
    }

    // Returns sum of a[1..i] (prefix sum up to index i).
    // Walks down by stripping the lowest set bit each step.
    long long query(int i) {
        long long s = 0;
        for (; i > 0; i -= i & (-i))
            s += tree[i];
        return s;
    }

    // Returns sum of a[l..r] using two prefix queries.
    long long query(int l, int r) {
        return query(r) - query(l - 1);
    }

    // Point set: set a[i] = val (not add). Requires knowing current value.
    // Use when the problem says "set a[i] to x" rather than "add x to a[i]".
    // To use this, maintain a shadow array of current values.
    // void set(int i, long long val, vector<long long>& a) {
    //     update(i, val - a[i]);
    //     a[i] = val;
    // }
};

// --- Range update + Point query BIT ---
// Trick: to add v to all elements in [l, r], do:
//   bit.update(l, +v);  bit.update(r+1, -v);
// Then bit.query(i) gives the current value at position i.

struct BIT_RangeUpdate {
    int n;
    vector<long long> tree;
    BIT_RangeUpdate(int n) : n(n), tree(n + 2, 0) {}

    void update(int i, long long delta) {
        for (; i <= n; i += i & (-i))
            tree[i] += delta;
    }

    // Add delta to all positions in [l, r]
    void range_add(int l, int r, long long delta) {
        update(l, delta);
        update(r + 1, -delta);
    }

    // Query the current value at position i (prefix sum of difference array)
    long long point_query(int i) {
        long long s = 0;
        for (; i > 0; i -= i & (-i))
            s += tree[i];
        return s;
    }
};

// --- Range update + Range query BIT ---
// Uses two BITs to support both range add and range sum.
// If D is the difference array, then:
//   a[1]+a[2]+...+a[k] = sum_{i=1}^{k} D[i]*(k-i+1)
//                       = (k+1)*sum(D[i]) - sum(i*D[i])

struct BIT_RangeRange {
    int n;
    vector<long long> b1, b2; // b1 stores D[i], b2 stores i*D[i]

    BIT_RangeRange(int n) : n(n), b1(n + 2, 0), b2(n + 2, 0) {}

    void bit_add(vector<long long>& b, int i, long long delta) {
        for (; i <= n; i += i & (-i))
            b[i] += delta;
    }

    long long bit_sum(vector<long long>& b, int i) {
        long long s = 0;
        for (; i > 0; i -= i & (-i))
            s += b[i];
        return s;
    }

    // Add delta to all positions in [l, r]
    void range_add(int l, int r, long long delta) {
        bit_add(b1, l, delta);
        bit_add(b1, r + 1, -delta);
        bit_add(b2, l, delta * l);
        bit_add(b2, r + 1, -delta * (r + 1));
    }

    // Prefix sum query: a[1] + a[2] + ... + a[i]
    long long prefix_sum(int i) {
        return bit_sum(b1, i) * (i + 1) - bit_sum(b2, i);
    }

    // Range sum query: a[l] + a[l+1] + ... + a[r]
    long long range_sum(int l, int r) {
        return prefix_sum(r) - prefix_sum(l - 1);
    }
};

int main() {
    // Example: CSES Dynamic Range Sum Queries
    int n, q;
    scanf("%d %d", &n, &q);
    vector<long long> a(n + 1);
    for (int i = 1; i <= n; i++) scanf("%lld", &a[i]);
    BIT bit(a); // O(n) construction
    while (q--) {
        int t;
        scanf("%d", &t);
        if (t == 1) {
            int i; long long val;
            scanf("%d %lld", &i, &val);
            bit.update(i, val - a[i]);
            a[i] = val;
        } else {
            int l, r;
            scanf("%d %d", &l, &r);
            printf("%lld\n", bit.query(l, r));
        }
    }
}

Update Traversal Visualized

How does update(5, +delta) propagate through a 16-element BIT? Each step adds lowbit(i) to jump to the next ancestor whose range contains index 5:

text
update(5, +delta)  -- trace every tree[] entry that gets modified:

Step  i (binary)   tree[i] += delta   lowbit(i)   range covered by tree[i]
----  ----------   ----------------   ---------   ------------------------
  1   5  (00101)   tree[5]  += D         1        [5, 5]
  2   6  (00110)   tree[6]  += D         2        [5, 6]
  3   8  (01000)   tree[8]  += D         8        [1, 8]
  4   16 (10000)   tree[16] += D        16        [1, 16]
      32 > n=16 -> STOP

Visually (which entries light up for update at position 5):

tree[16]: [================================================]  <-- hit
tree[15]:                                              [==]
tree[14]:                                        [========]
tree[13]:                                  [==]
tree[12]:                            [=================]
tree[11]:                      [==]
tree[10]:                [========]
tree[9]:           [==]
tree[8]:  [==========================]                         <-- hit
tree[7]:                       [==]
tree[6]:                 [========]                             <-- hit
tree[5]:           [==]                                         <-- hit (start)
tree[4]:  [=================]
tree[3]:              [==]
tree[2]:  [========]
tree[1]:  [==]
          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

Path: 5 -> 6 -> 8 -> 16  (each hop: i += i & (-i))
Notice: each hit covers a strictly larger range than the previous.

Operations Reference

Use this table to pick the right BIT variant for your problem—each row is a different capability with its cost.

OperationTimeSpaceNotes
Build (from array)O(n)O(n)Propagate-to-parent method
Build (n updates)O(nlogn)O(n)Naive: call update n times
Point updateO(logn)--update(i, delta)
Prefix queryO(logn)--query(i) = sum of a[1..i]
Range queryO(logn)--query(l, r) = two prefix queries
Range update (point query)O(logn)O(n)Difference array trick
Range update + range queryO(logn)O(2n)Two BITs
2D point updateO(log2n)O(n2)Nested BIT
2D prefix queryO(log2n)--Nested BIT

Where Does the Complexity Come From?

A BIT uses the binary representation of indices. The key operations are:

  • Query (prefix sum): Start at index i, add tree[i], then remove the lowest set bit: i=i&(i). Each step clears one bit, so at most log2n steps.
  • Update (point add): Start at index i, add to tree[i], then add the lowest set bit: i+=i&(i). Each step sets a higher bit, so at most log2n steps.

Why i&(i)? In two's complement, i flips all bits and adds 1. So i&(i) isolates the lowest set bit. This is the "responsible range length" for position i in the BIT.

0-Indexed vs. 1-Indexed BIT

1-indexed (standard): Indices go from 1 to n. Simpler bit operations.

cpp
void update(int i, int delta) { for (; i <= n; i += i & (-i)) bit[i] += delta; }
int query(int i) { int s = 0; for (; i > 0; i -= i & (-i)) s += bit[i]; return s; }

0-indexed: Indices go from 0 to n1. Transform: internally use i+1.

cpp
void update(int i, int delta) { for (++i; i <= n; i += i & (-i)) bit[i] += delta; }
int query(int i) { int s = 0; for (++i; i > 0; i -= i & (-i)) s += bit[i]; return s; }

The only difference is ++i at the start of each function. Both give the same asymptotic complexity. Use 1-indexed in contests (less error-prone). If your problem uses 0-indexed arrays, just add 1 when interfacing with the BIT.

Problem Patterns & Tricks

Inversion Counting

Count pairs (i,j) where i<j and a[i]>a[j]. Process elements right-to-left. For each a[i], query how many elements smaller than a[i] are already in the BIT (those are to the right), then insert a[i].

cpp
// Coordinate compress a[] to [1..n], then:
long long inversions = 0;
BIT bit(n);
for (int i = n - 1; i >= 0; i--) {
    inversions += bit.query(a[i] - 1); // count elements < a[i] already inserted
    bit.update(a[i], 1);
}

Problems: CF 459D, CF 1430E

Range Update + Point Query (Difference Array)

When you need to add v to all elements in [l,r] and later query individual elements, use a BIT over a difference array. update(l, +v) and update(r+1, -v), then query(i) gives the value at position i.

Problems: CF 276C, CSES Range Update Queries

Dynamic Order Statistics (k-th Smallest)

Maintain a BIT as a frequency array. To find the k-th smallest element, binary search on the BIT prefix sums—or use the O(logn) bit-descent trick:

cpp
// Find smallest pos such that prefix_sum(pos) >= k
int kth(BIT& bit, int k) {
    int pos = 0;
    for (int pw = 1 << __lg(bit.n); pw > 0; pw >>= 1) {
        if (pos + pw <= bit.n && bit.tree[pos + pw] < k) {
            pos += pw;
            k -= bit.tree[pos];
        }
    }
    return pos + 1;
}

This walks the BIT top-down in O(logn) without any binary search.

Problems: CF 1042D, CF 61E

2D BIT (2D Prefix Sums with Updates)

Nest one BIT inside another to handle 2D point updates and 2D prefix queries. Each update(x, y, delta) does O(log2n) work. Useful for grid problems with dynamic updates.

cpp
struct BIT2D {
    int n, m;
    vector<vector<long long>> tree;
    BIT2D(int n, int m) : n(n), m(m), tree(n + 1, vector<long long>(m + 1, 0)) {}

    void update(int x, int y, long long delta) {
        for (int i = x; i <= n; i += i & (-i))
            for (int j = y; j <= m; j += j & (-j))
                tree[i][j] += delta;
    }

    long long query(int x, int y) {
        long long s = 0;
        for (int i = x; i > 0; i -= i & (-i))
            for (int j = y; j > 0; j -= j & (-j))
                s += tree[i][j];
        return s;
    }

    long long query(int x1, int y1, int x2, int y2) {
        return query(x2, y2) - query(x1 - 1, y2)
             - query(x2, y1 - 1) + query(x1 - 1, y1 - 1);
    }
};

Problems: CF 869E, CSES Forest Queries II

Offline Processing with BIT

Sort queries/events by some coordinate, sweep through them, and use BIT for the other dimension. Classic in problems combining sorting with range queries.

Problems: CF 1311F, CF 1042D

BIT on Values (Frequency Counting)

Instead of indexing BIT by position, index by value. Insert/remove elements and query "how many elements in range [l,r]?" Requires coordinate compression if values are large.

cpp
// Count elements in [1..v]
bit.update(val, +1);          // insert val
long long cnt = bit.query(v); // how many elements <= v

Problems: CF 540E, CF 1355E

Range Update + Range Query (Two BITs)

When you need both range add and range sum, use the two-BIT trick from BIT_RangeRange above. Avoids segment tree + lazy propagation entirely.

Problems: CSES Range Updates and Sums (partial—full version needs segtree for set operations)

Contest Cheat Sheet

text
+----------------------------------------------------------------+
| FENWICK TREE (BIT) CHEAT SHEET                                 |
+----------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   - Point update + prefix/range sum queries                    |
|   - Inversion counting                                         |
|   - Dynamic order statistics (k-th element)                    |
|   - 2D range sums with updates                                 |
|   - Anything segment tree does for sum/xor (simpler code)      |
+----------------------------------------------------------------+
| CORE (1-indexed, n elements):                                  |
|   void upd(int i, ll d) { for(;i<=n;i+=i&-i) t[i]+=d; }      |
|   ll qry(int i) { ll s=0; for(;i>0;i-=i&-i) s+=t[i]; return s;}|
|   ll qry(int l,int r) { return qry(r)-qry(l-1); }            |
+----------------------------------------------------------------+
| RANGE UPDATE (difference array):                               |
|   add v to [l,r]: upd(l,+v), upd(r+1,-v)                     |
|   point query i:  qry(i)                                      |
+----------------------------------------------------------------+
| K-TH SMALLEST (frequency BIT):                                 |
|   Walk top-down: pw = 1<<__lg(n), halve pw each step          |
+----------------------------------------------------------------+
| COMPLEXITY:                                                    |
|   Build: O(n)   Update: O(log n)   Query: O(log n)            |
|   Space: O(n)   Constant factor: ~2-3x better than segtree    |
+----------------------------------------------------------------+
| GOTCHA: 1-indexed! Index 0 = infinite loop.                   |
+----------------------------------------------------------------+

Gotchas & Debugging

1-Indexed—This Is Non-Negotiable

BIT must be 1-indexed. update(0, v) causes an infinite loop because 0 & (-0) == 0, so i += 0 never terminates. If your input is 0-indexed, add 1 everywhere.

Off-By-One in Range Queries

query(l, r) = query(r) - query(l - 1). If you write query(r) - query(l), you exclude element l. This is the #1 BIT bug.

update Is Additive, Not Absolute

update(i, v) adds v to position i. To set position i to value x, you must do update(i, x - current[i]) and maintain a shadow array.

Integer Overflow

With n=2×105 and values up to 109, prefix sums reach 2×1014. Use long long.

Coordinate Compression

When values go up to 109 but n2×105, compress values to [1,n] before using BIT. Sort, deduplicate, and use lower_bound to map.

cpp
vector<int> vals(a.begin(), a.end());
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
auto compress = [&](int x) {
    return (int)(lower_bound(vals.begin(), vals.end(), x) - vals.begin()) + 1;
};

Range Update: Don't Forget r + 1

In the difference-array trick, you must call update(r + 1, -delta). If r=n, that accesses index n+1—so allocate size n+2.

Debugging Tips

  • Print the BIT array raw: for (int i = 1; i <= n; i++) cerr << bit.query(i, i) << " ";
  • Verify against brute-force prefix sums for small n.
  • If you get WA, check: is the BIT large enough? Did you compress values? Is the shadow array in sync?

Thinking tree[i] Stores a[i]

This is the most common conceptual error. tree[i] stores the sum of a range of elements, not a single element. The range length is lowbit(i).

text
What beginners imagine:            What tree[i] actually stores:

tree[1] = a[1]  WRONG              tree[1] = a[1]              (range [1,1], len 1)
tree[2] = a[2]  WRONG              tree[2] = a[1]+a[2]         (range [1,2], len 2)
tree[3] = a[3]  WRONG              tree[3] = a[3]              (range [3,3], len 1)
tree[4] = a[4]  WRONG              tree[4] = a[1]+a[2]+a[3]+a[4]  (range [1,4], len 4)
tree[5] = a[5]  WRONG              tree[5] = a[5]              (range [5,5], len 1)
tree[6] = a[6]  WRONG              tree[6] = a[5]+a[6]         (range [5,6], len 2)

Rule: tree[i] = sum of a[j] for j in [i - lowbit(i) + 1 .. i]
      Only when lowbit(i) = 1 (odd indices) does tree[i] = a[i].

Thinking BIT Is "a Simpler Segment Tree"

BIT and segment tree have fundamentally different structures. A segment tree has explicit nodes for every interval and can walk its tree to answer arbitrary range queries. A BIT is a flat array with an implicit structure that only natively supports prefix queries. You derive range queries by subtraction, which requires an invertible operation.

text
Segment tree:                    BIT:
        [1..8]                   Just a flat array: tree[1..8]
       /      \                  "Tree" structure lives entirely
   [1..4]    [5..8]             in the bit manipulation.
   /    \    /    \              No nodes, no children, no pointers.
 [1,2] [3,4] [5,6] [7,8]
 / \   / \   / \   / \          Can answer: prefix queries (natively)
1   2 3   4 5   6 7   8         Cannot answer: arbitrary range min/max
                                Cannot do: lazy propagation

Off-by-One in Range Queries—the Silent Killer

The correct formula is query(r) - query(l-1), not query(r) - query(l). The second version silently drops element a[l]. This bug is hard to catch because it gives the right answer whenever a[l] = 0.

text
Example: a = [_, 2, 5, 3, 7, 1]   (1-indexed, a[0] unused)
Goal: sum of a[2..4] = 5 + 3 + 7 = 15

Prefix sums:  query(1)=2  query(2)=7  query(3)=10  query(4)=17

CORRECT:     query(4) - query(1)  =  17 - 2  = 15
             query(r) - query(l-1)    r=4, l-1=1
                                             ^ cuts off everything BEFORE l

WRONG:       query(4) - query(2)  =  17 - 7  = 10
             query(r) - query(l)      r=4, l=2
                                           ^ cuts off l itself!

Visually, what each subtraction removes:

a:       [2]  [5]  [3]  [7]  [1]
          1    2    3    4    5

query(1): [##]                       <-- correct: remove prefix before l=2
query(2): [##] [##]                  <-- WRONG: removes a[2] too!
query(4): [##] [##] [##] [##]       <-- always the same

Correct:  query(4) - query(1) = [5] + [3] + [7] = 15
Wrong:    query(4) - query(2) =       [3] + [7] = 10   -- a[2] is GONE

Self-Test

Before moving on, verify you own this data structure—not just recognize it:

  • [ ] Can you write update() and query() from memory with correct traversal directions? (+= lowbit for update, -= lowbit for query)
  • [ ] Can you compute lowbit(6) by hand? (6 = 0110, -6 = 1010, 6 & -6 = 0010 = 2) What about lowbit(12)? (12 = 1100, -12 = 0100, 12 & -12 = 0100 = 4)
  • [ ] Can you explain in one sentence why BIT cannot compute range min?
  • [ ] Can you state the correct range query formula and identify the off-by-one trap?
  • [ ] Can you describe the O(n) build method? (Propagate each tree[i] to its parent i + lowbit(i))
  • [ ] Given a problem that says "range add, point query," can you immediately see the difference-array BIT trick?
  • [ ] Can you name one problem where BIT is clearly better than segment tree, and one where it clearly isn't?

Practice Problems

#ProblemSourceDifficultyKey Idea
1Dynamic Range Sum QueriesCSESEasyBasic point update + range query
2Range Update QueriesCSESEasyDifference array BIT
3Distinct Values QueriesCSESMediumOffline + BIT
4Inversion CountCSESMediumClassic inversion counting
5CF 459D -- Pashmak and Flower BouquetsCFMediumInversion counting variant
6CF 1042D -- Petya and ArrayCFMediumPrefix sums + BIT on values
7CF 61E -- Enemy is WeakCFMediumTriple inversions
8CF 1430E -- String ReversalCFMedium-HardInversion count via BIT
9Forest Queries IICSESHard2D BIT
10CF 1311F -- Moving PointsCFHardSweep line + BIT

Further Reading

Built for the climb from Codeforces 1100 to 2100.