Skip to content

Bitset Optimization

Quick Revisit

  • USE WHEN: O(n^2) boolean DP is barely TLE and operations are AND/OR/XOR on bit vectors
  • INVARIANT: bitset packs N bits into N/64 words; each bitwise op processes 64 bits per instruction
  • TIME: O(N/64) per bitwise op | SPACE: O(N/64)
  • CLASSIC BUG: bitset size N must be a compile-time constant—cannot use runtime variable
  • PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md

std::bitset gives you O(n/64) speedups on bitwise operations, turning borderline TLE into comfortable AC.

Scope note: This file teaches when and how std::bitset works—memory layout, the /64 speedup, and core patterns (transitive closure, subset-sum, Gaussian elimination). For a contest-ready cheat sheet of bitset recipes and quick-reference snippets, see Bitset Tricks.

Difficulty: Intermediate | CF Rating: 1500–1800 | Prerequisites: Bit Manipulation

Bit-parallel techniques date to 1960s IBM mainframes, where word-level boolean operations on packed bit vectors were a primary optimization. The C++ std::bitset standardized this idea in C++98, giving competitive programmers a one-line way to exploit 64-wide parallelism. Think of a bitset as a row of 64 light switches on a single panel: you flip all 64 with one hand instead of flipping each bulb individually. The fingerprint to burn into memory: O(n²) TLE + boolean operations → bitset /64. A useful mnemonic: "Pack it, OR it, ship it—64 at a time."


Contents


When O(n2) is barely too slow

Here is a problem you will meet sooner or later:

Given a directed graph on n2000 nodes, determine for every pair (u,v) whether u can reach v.

The naive plan: run a DFS/BFS from each node. Each traversal touches O(n+m) edges. In the worst case m=O(n2), so the total is O(nn2)=O(n3). For n=2000 that is 8×109—way too slow.

A smarter plan: process nodes in reverse topological order and propagate reachability sets. For each edge ji, merge reachability: reach[i] = reach[i] U reach[j]. If we store reach[i] as a std::set or std::vector<bool>, each union costs O(n). With O(n2) edges, we get O(n3) again—but with a much smaller constant. Still borderline for n=2000.

The question: can we shave off a constant factor of 64?


64 Bits at a Time

A std::bitset<N> packs N bits into N/64 64-bit words. When you write a |= b, the compiler emits one OR instruction per word. For N=2000:

2000/64=32 words

That is 32 OR instructions instead of 2000 individual flag updates. The speedup factor is roughly 64×—and that turns O(n3) with a constant of 1 into O(n3/64), which for n=2000 is about 1.25×108. Comfortable under 1 second.

Memory Layout: Bitset vs. Bool Array

text
  bool reach[2000]:
  +---+---+---+---+---+---+---+---+---+---+---+--     --+---+
  | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 |
  +---+---+---+---+---+---+---+---+---+---+---+--     --+---+
    1 byte each  -->  2000 bytes total (wasted: 7 bits per byte)

  bitset<2000> reach:
  +----------------+----------------+----------------+---  ---+----------------+
  | word 0 (64b)   | word 1 (64b)   | word 2 (64b)   |  ...  | word 31 (64b)  |
  | bits 0..63     | bits 64..127   | bits 128..191  |       | bits 1984..2047|
  +----------------+----------------+----------------+---  ---+----------------+
    8 bytes each  -->  256 bytes total (8x denser)

The density matters for two reasons: less memory used, and better cache utilization. But the real win is that bitwise OR/AND/XOR on the entire set costs O(N/64) instead of O(N).

How a |= b Works Under the Hood

text
  a:   |0110...1001|1010...0011|0001...1110|  ...  |1100...0001|
       word 0       word 1       word 2              word 31

  b:   |1001...0110|0101...1100|1110...0001|  ...  |0011...1110|
       word 0       word 1       word 2              word 31

  a|=b:|1111...1111|1111...1111|1111...1111|  ...  |1111...1111|
       1 CPU OR     1 CPU OR     1 CPU OR            1 CPU OR
       = 32 total OR instructions for 2000 bits

The Aha Moment

The key insight: a boolean array is already a bitset—you are just wasting 63 bits per element. Every bool in a bool[] occupies 8 bits but stores only 1 bit of information.

When you replace bool reach[2000] with bitset<2000>, you are not changing the algorithm. You are telling the compiler to stop wasting 7/8 of every byte and to process 64 booleans per CPU instruction instead of one.

The "aha" is recognizing that any inner loop doing a[i] = a[i] | b[i] across a boolean array is secretly 64 independent OR operations packed into one—if only you let the compiler see it.


Let's see that 64× speedup in action, starting with the motivating problem: transitive closure.

Transitive Closure in 32 Operations per Edge

Process nodes in reverse topological order. For each edge ji, do reach[i] |= reach[j]. Every node is reachable from itself, so initialize reach[i].set(i).

This is the bitset-accelerated variant of Floyd-Warshall transitive closure. For the full per-element bit tricks used in pivot selection, see Bit Manipulation.

Walk through a small example—6 nodes, edges in a DAG:

text
  Graph:
        0 ---> 1 ---> 3
        |             ^
        v            /
        2 ---+----> 4 ---> 5

Edges: 0->1, 0->2, 1->3, 2->4, 4->3, 4->5

Reverse topological order: 3, 5, 4, 1, 2, 0

text
  Step 1: node 3 (no outgoing edges)
    reach[3] = {3}                    = 001000

  Step 2: node 5 (no outgoing edges)
    reach[5] = {5}                    = 100000

  Step 3: node 4 (edges 4->3, 4->5)
    reach[4] = {4}                    = 010000
    reach[4] |= reach[3]  ->  {3,4}  = 011000
    reach[4] |= reach[5]  ->  {3,4,5}= 111000

  Step 4: node 1 (edge 1->3)
    reach[1] = {1}                    = 000010
    reach[1] |= reach[3]  ->  {1,3}  = 001010

  Step 5: node 2 (edge 2->4)
    reach[2] = {2}                    = 000100
    reach[2] |= reach[4]  ->  {2,3,4,5} = 111100

  Step 6: node 0 (edges 0->1, 0->2)
    reach[0] = {0}                        = 000001
    reach[0] |= reach[1]  ->  {0,1,3}    = 001011
    reach[0] |= reach[2]  ->  {0,1,2,3,4,5} = 111111

Total work: 6 merge operations, each touching 6/64=1 word. For n=2000 with O(n2) edges: O(n2)×O(n/64)=O(n3/64).


Subset-Sum in O(nW/64)

Classic problem: given n weights, can we make exactly sum W?

The subset-sum DP here is a special case of DP—Knapsack. For bitmask-based DP where the mask itself is the state (not a bitset optimization), see DP—Bitmask.

Standard DP: dp[i] = true if sum i is achievable. For each weight w: iterate i from W down to w and set dp[i] |= dp[i-w]. That is O(nW).

With bitset: represent dp as a bitset<W+1>. Bit i is set iff sum i is achievable. For each weight w:

dp|=dpw

One line. The left-shift moves every achievable sum up by w. The OR merges old and new achievable sums. Cost: O(W/64) per weight. Total: O(nW/64).

The Shift in Action

Start with weights [2,3,5] and W=10. Bitset indices 0..10:

text
  Initial:    dp = 10000000000   (only sum 0 is achievable)
                   ^
                   bit 0

  Weight w=2: dp << 2 = 00100000000
                           ^
                           bit 2
              dp |= (dp << 2):
              dp = 10100000000   (sums: 0, 2)

  Weight w=3: dp << 3 = 00010100000
              dp |= (dp << 3):
              dp = 10110100000   (sums: 0, 2, 3, 5)

  Weight w=5: dp << 5 = 00000101101
              dp |= (dp << 5):
              dp = 10110101101   (sums: 0, 2, 3, 5, 7, 8, 10)
                                        *              *
                                     we can make 10!

Reading right to left: position 0 = achievable, position 2 = achievable, and so on. The final bitset tells us every reachable sum in one shot.


Gaussian Elimination, but Everything is XOR

A system of linear equations over GF(2) (the field with elements {0, 1}, addition = XOR, multiplication = AND) arises in problems about:

  • toggling switches (flip a light, and its neighbors flip too)
  • linear dependence of bitmasks
  • counting solutions to XOR systems

For the general (non-GF(2)) version of Gaussian elimination, see Gaussian Elimination. For building an XOR basis without the full matrix, see Linear Basis (XOR Basis).

Each equation is a row of 0/1 coefficients. Elimination step: XOR one row into another to clear a column. With bitset rows, one elimination step costs O(n/64). Full Gaussian elimination: O(n) pivots, each touching O(n) rows → O(n3/64).

Row Operations on Bitset Rows

text
  Eliminate column 2 from row B using row A:

  Row A:  |1 0 1 1 0 0 | 1|     <-- pivot row (column 2 set)
  Row B:  |0 1 1 0 1 0 | 0|     <-- column 2 is set, need to clear

  Row B ^= Row A:
  Row B': |1 1 0 1 1 0 | 1|     <-- column 2 cleared

  With bitset: rows[b] ^= rows[a];   // O(n/64) per row op

The augmented column (rightmost) is included in the bitset, so the entire elimination needs no special-casing.


The three examples above—transitive closure, subset-sum, and Gaussian elimination—share the same structure. Here is how to tell when a new problem fits the pattern.

When to Reach for Bitset

Bitset optimization is not universal. It helps when:

  1. The problem is already O(n2) or O(n3) and the inner loop does bitwise/set operations on boolean arrays.
  2. n is in the sweet spot: 1000–5000. Below 1000, even O(n3) is fine. Above 5000, even O(n3/64) might TLE.
  3. The operations are bitwise—OR, AND, XOR, shift, popcount. You cannot use bitset to speed up addition or min/max operations.

Common sweet spots:

text
  +--------------------+----------+------------+------------------+
  | Problem            | Naive    | With bitset| n sweet spot     |
  +--------------------+----------+------------+------------------+
  | Transitive closure | O(n^3)   | O(n^3/64)  | n <= 2000-4000   |
  | Subset-sum         | O(nW)    | O(nW/64)   | W <= 10^5-10^6   |
  | Gaussian elim GF2  | O(n^3)   | O(n^3/64)  | n <= 2000-4000   |
  | Hamming distance   | O(n*m)   | O(n*m/64)  | n*m <= 10^8      |
  | Set intersection   | O(n)     | O(n/64)    | many queries     |
  +--------------------+----------+------------+------------------+

Trigger Phrases

These problem-statement patterns should make you think "bitset":

  • "reachability in a dense graph" with n4000—transitive closure via reach[u] |= reach[v]
  • "subset-sum with large W" (W up to 105106)—shift-and-OR DP
  • "XOR system of equations" or "toggle puzzle"—Gaussian elimination over GF(2)
  • "Hamming distance between binary strings"(a ^ b).count()
  • "count common elements" across many set pairs—(a & b).count()
  • "bipartite matching on dense graph" with n3000—bitset-accelerated augmenting paths
  • boolean DP that is barely TLE at O(n2) or O(nW)—the /64 constant shaves it to AC

The unifying signal: an inner loop that does boolean operations on a vector of bits, and N is in the 1000–5000 range where the 64× factor is the difference between TLE and AC.

What the Code Won't Teach You

The implementations above show how to apply bitset optimization. What they cannot convey is the judgment call: should I use this here?

The 64× speedup is real but the constant matters. At N=5000, O(N2/64)390,000 operations—very fast. But O(N3/64)2×109—still likely TLE.

You save a factor of 64, not a change in the exponent. Before reaching for bitset, check whether the reduced complexity actually fits within the time limit.

text
  N = 5000 complexity check

  +------------------+-----------------+----------+
  |    Algorithm     |   Operations    |  Verdict |
  +------------------+-----------------+----------+
  | n*n / 64         |      390K       |  FAST    |
  +------------------+-----------------+----------+
  | n*n*n / 64       |    2 * 10^9     |  TLE     |
  +------------------+-----------------+----------+
  | n*n              |     25 * 10^6   |  OK      |
  +------------------+-----------------+----------+
  | n*n*n            |    125 * 10^9   |  TLE     |
  +------------------+-----------------+----------+

If you see N up to 5000, a boolean DP or reachability problem, and a time limit that would TLE at O(N2) but pass at O(N2/64)—that is the bitset signal. Similarly, knapsack with N up to 104 and W up to 105 screams bitset. When none of these structural hints are present, bitset is almost certainly the wrong tool.

text
  Signal detection for bitset problems

  +--- Is the inner loop boolean ops ---+
  |                                     |
  v YES                           v NO
  |                               |
  +--- N in 1000 to 5000 ---+    +---> skip bitset
  |                          |
  v YES                v NO
  |                    |
  * TRY BITSET *       +---> probably not needed

Bitset optimization is a competitive programming trick, not a production technique. In real software, you would use SIMD intrinsics, parallelism, or algorithmic improvements instead of std::bitset. In contests, bitset is the fastest path from TLE to AC for boolean DP problems with large N—but recognizing when it applies is more valuable than knowing the syntax.

See also: Data Structure Selection Guide for choosing between bitset, bitmask DP, and plain arrays under contest pressure.


When NOT to Use Bitset

Knowing when bitset doesn't help is as important as knowing when it does. Compare the decision table in the Data Structure Selection Guide.

When n is Small Enough That Regular Arrays Are Fine

If n500, even O(n3) runs in under a second. If n1000, O(n2) is trivial. The /64 speedup is pointless when the naive approach already fits comfortably in the time limit. Adding bitset to a problem that doesn't need it just makes your code harder to read and debug.

text
  n = 500:   O(n^3) = 1.25 * 10^8   <-- fits in ~1 sec
  n = 1000:  O(n^2) = 10^6           <-- trivial
  --> no bitset needed for these sizes

When You Need Individual Element Access More Than Bulk Operations

Bitset shines at bulk operations (|=, &=, count(), <<=). But if your inner loop does if (bs.test(i)) { ... complex logic per element ... }, you are not benefiting from the 64-wide parallelism—you are just using a memory-efficient bool array. If most operations are single-bit reads/writes with per-element branching, a plain bool[] or vector<bool> is simpler and equally fast.

When Memory Isn't the Bottleneck

Bitset saves memory (8× denser than bool[]). But if your arrays are small or you have plenty of memory headroom, the density advantage is irrelevant. Don't use bitset purely for memory savings unless you are actually near the memory limit.

When the Constant Factor of Bitset Overhead Isn't Worth It

std::bitset operations carry overhead: shifts involve copying and shifting across word boundaries, count() calls popcount on every word. For small N (say N<128), these overheads can make bitset slower than a simple int or long long bitmask. If your mask fits in 64 bits, use uint64_t with raw bit ops—it will be faster and simpler.

text
  N <= 64:   use uint64_t with __builtin_popcountll()
  N <= 128:  consider two uint64_t values
  N > 128:   bitset<N> starts to pay off

When Bit Manipulation per Element Is Needed

std::bitset does not support per-element arithmetic, comparisons, or conditional operations across packed words. You cannot do "for each set bit, add its index to a sum" in O(N/64)—that requires iterating over set bits one by one. If your algorithm needs:

  • Per-element min/max/addition across two arrays
  • Conditional updates based on each element's value
  • Arithmetic operations (not boolean) on packed data

...then bitset is the wrong tool. You need SIMD intrinsics or just a plain array.

Summary: Skip Bitset When...

text
  +------------------------------------------------------+----------+
  | Situation                                            | Use what |
  +------------------------------------------------------+----------+
  | n <= 500, O(n^3) fits                                | plain [] |
  | n <= 64, need bit ops                                | uint64_t |
  | Inner loop is per-element branching                  | bool[]   |
  | Operations are min/max/addition, not boolean         | int[]    |
  | Memory is not a concern, n is small                  | plain [] |
  | Need to iterate set bits with complex per-bit logic  | uint64_t |
  +------------------------------------------------------+----------+

std::bitset API Reference

All in <bitset> (included via <bits/stdc++.h>).

Function / OperatorWhat it doesTime
bitset<N>()Construct all-zero bitset of N bitsO(N/64)
bitset<N>(unsigned long long v)Construct from integerO(N/64)
bitset<N>(string s)Construct from "010110" stringO(N)
a & b, a &= bBitwise ANDO(N/64)
a | b, a |= bBitwise ORO(N/64)
a ^ b, a ^= bBitwise XORO(N/64)
~aBitwise NOT (flip all)O(N/64)
a << k, a <<= kLeft shift by kO(N/64)
a >> k, a >>= kRight shift by kO(N/64)
a == b, a != bEquality comparisonO(N/64)
a.count()Number of set bits (popcount)O(N/64)
a.any()True if any bit is setO(N/64)
a.none()True if no bit is setO(N/64)
a.all()True if all bits are setO(N/64)
a.test(pos)Check bit at position posO(1)
a.set()Set all bits to 1O(N/64)
a.set(pos)Set bit at pos to 1O(1)
a.set(pos, val)Set bit at pos to valO(1)
a.reset()Clear all bits to 0O(N/64)
a.reset(pos)Clear bit at posO(1)
a.flip()Flip all bitsO(N/64)
a.flip(pos)Flip bit at posO(1)
a[pos]Access bit (returns proxy reference)O(1)
a.size()Returns N (compile-time constant)O(1)
a.to_string()Convert to "010110..." stringO(N)
a.to_ulong()Convert to unsigned longO(1)
a.to_ullong()Convert to unsigned long longO(1)
a._Find_first()Index of lowest set bit (GCC only)O(N/64)
a._Find_next(pos)Index of next set bit after pos (GCC)O(N/64)

_Find_first() and _Find_next() return N (the template size) when no set bit is found. They are essential for iterating over set bits:

cpp
for (int i = bs._Find_first(); i < (int)bs.size(); i = bs._Find_next(i)) {
    // process set bit i
}

Implementation: Transitive Closure

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 2005;
bitset<MAXN> reach[MAXN];

int main() {
    int n, m;
    scanf("%d %d", &n, &m);

    vector<vector<int>> adj(n);
    vector<int> indeg(n, 0);
    for (int i = 0; i < m; i++) {
        int u, v;
        scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        indeg[v]++;
    }

    // topological sort (Kahn's)
    queue<int> q;
    // Invariant: q contains all nodes with current indeg == 0
    for (int i = 0; i < n; i++)
        if (indeg[i] == 0) q.push(i);
    vector<int> order;
    // Invariant: order[0..len-1] is a valid topological prefix
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        // Invariant: decrementing indeg for u's neighbors; new zero-indeg nodes enqueued
        for (int v : adj[u])
            if (--indeg[v] == 0) q.push(v);
    }

    // propagate in reverse topological order
    // Invariant: reach[u] is complete for all u processed so far (later in topo order)
    for (int i = (int)order.size() - 1; i >= 0; i--) {
        int u = order[i];
        reach[u].set(u);                   // u reaches itself
        // Invariant: reach[v] is already final for each neighbor v (processed earlier)
        for (int v : adj[u])
            reach[u] |= reach[v];          // O(n/64) per edge
    }

    // answer queries: reach[u].test(v)
    // Invariant: row i is fully printed before moving to row i+1
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            putchar(reach[i].test(j) ? '1' : '0');
        putchar('\n');
    }
}

Total: O(n2/64) per node for the OR, across n nodes → O(n3/64).


Implementation: Subset-Sum

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXW = 100001;

int main() {
    int n, W;
    scanf("%d %d", &n, &W);

    bitset<MAXW> dp;
    dp.set(0);                              // sum 0 is achievable

    // Invariant: dp[s] == 1 iff sum s is achievable using items 0..i-1
    for (int i = 0; i < n; i++) {
        int w;
        scanf("%d", &w);
        dp |= (dp << w);                   // O(W/64) per weight
    }

    puts(dp.test(W) ? "YES" : "NO");
}

Total: O(nW/64). For n=1000,W=100000: about 1.5×106 word operations—essentially instant.

Recovering the Subset

If you need to print which items were chosen, track a bitset per item (or work backwards):

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXW = 100001;

int main() {
    int n, W;
    scanf("%d %d", &n, &W);
    vector<int> w(n);
    for (int i = 0; i < n; i++) scanf("%d", &w[i]);

    bitset<MAXW> dp;
    dp.set(0);

    // remember which items were used
    vector<bitset<MAXW>> after(n);
    // Invariant: after[i] = dp state after considering items 0..i
    for (int i = 0; i < n; i++) {
        dp |= (dp << w[i]);
        after[i] = dp;
    }

    if (!dp.test(W)) { puts("NO"); return 0; }

    // backtrack
    puts("YES");
    int cur = W;
    // Invariant: cur = remaining sum to reconstruct using items 0..i
    for (int i = n - 1; i >= 0; i--) {
        // was item i used? check if cur-w[i] was achievable before item i
        bitset<MAXW>& prev = (i > 0) ? after[i - 1] : (dp.reset(), dp.set(0), dp);
        if (cur >= w[i] && prev.test(cur - w[i])) {
            printf("%d ", w[i]);
            cur -= w[i];
        }
    }
    puts("");
}

Implementation: Gaussian Elimination over GF(2)

Find the rank (or a solution) of a system of XOR equations. Each equation: ai,0ai,1ai,n1=bi.

cpp
#include <bits/stdc++.h>
using namespace std;

const int MAXN = 2005;
bitset<MAXN> row_bs[MAXN];  // row_bs[i] stores coefficients + augmented bit

int gauss(int n, int m) {
    // n equations, m variables
    // augmented bit is at position m
    int rank = 0;
    // Invariant: rows 0..rank-1 are pivot rows with leading 1 in distinct columns
    for (int col = 0; col < m; col++) {
        // find pivot: a row in [rank..n) with bit col set
        int pivot = -1;
        // Invariant: rows 0..rank-1 already have pivots; search only rank..n-1
        for (int r = rank; r < n; r++) {
            if (row_bs[r].test(col)) { pivot = r; break; }
        }
        if (pivot == -1) continue;  // no pivot for this column, skip

        swap(row_bs[rank], row_bs[pivot]);

        // Safety: verify the pivot row is non-zero (guards against subtle bugs
        // if test() and swap logic are ever refactored)
        if (!row_bs[rank].any()) continue;

        // Invariant: row_bs[rank] has bit col set; eliminate col from all other rows
        for (int r = 0; r < n; r++) {
            if (r != rank && row_bs[r].test(col))
                row_bs[r] ^= row_bs[rank];     // O(m/64)
        }
        rank++;
    }
    return rank;  // rank may be 0 if all rows were zero
}

int main() {
    int n, m;
    scanf("%d %d", &n, &m);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            int x; scanf("%d", &x);
            if (x) row_bs[i].set(j);
        }
        int b; scanf("%d", &b);
        if (b) row_bs[i].set(m);             // augmented column
    }

    int rank = gauss(n, m);
    printf("Rank = %d\n", rank);

    if (rank == 0) {
        // All rows are zero (or were zeroed out). Check if any had a
        // non-zero augmented bit -- that means 0 = 1, inconsistent.
        for (int r = 0; r < n; r++) {
            if (row_bs[r].test(m)) {
                puts("No solution");
                return 0;
            }
        }
        puts("Solution exists (all variables are free)");
        return 0;
    }

    // check consistency: if any row has all-zero coefficients but set augmented bit
    for (int r = rank; r < n; r++) {
        if (row_bs[r].test(m)) {
            puts("No solution");
            return 0;
        }
    }
    puts("Solution exists");
}

Complexity Reference

The table below summarizes the cost and space of every bitset operation and the three canonical algorithms that benefit from the /64 speedup.

OperationTimeSpace
Construct bitset<N>O(N/64)O(N/64) words
a |= b, a &= b, a ^= bO(N/64)--
a <<= k, a >>= kO(N/64)--
a.count()O(N/64)--
_Find_first(), _Find_next()O(N/64) worst--
Transitive closure (n nodes)O(n3/64)O(n2/64)
Subset-sum (n items, target W)O(nW/64)O(W/64)
Gaussian elimination GF(2) (n×m)O(nmn/64)O(nm/64)
Hamming distance (two bitsets)O(N/64)--
Set intersection sizeO(N/64)--

With the reference table as a guide, here are the recurring problem shapes where bitset optimization pays off. For the underlying bit-level tricks used inside these patterns, see Bit Manipulation.

Problem Patterns

Transitive Closure / Reachability

Pattern: Propagate reachability sets along edges using |=.

cpp
reach[u] |= reach[v]; // for each edge u -> v

Problems:


Subset-Sum / Knapsack

Pattern: Shift-and-OR to propagate achievable sums.

cpp
dp |= (dp << w);

Works for 0/1 knapsack feasibility. For bounded quantities, repeat shifts or use the standard binary decomposition trick.

Problems:


Gaussian Elimination over GF(2)

Pattern: XOR row operations on bitset rows.

cpp
if (row_bs[r].test(col)) row_bs[r] ^= row_bs[pivot];

Use cases: counting XOR basis size, solving toggle puzzles, finding linear dependencies among bitmasks.

Problems:


Hamming Distance Computation

Pattern: (a ^ b).count() gives the Hamming distance between two binary strings in O(N/64).

cpp
int dist = (a ^ b).count();

Useful for nearest-neighbor searches or clustering on binary vectors.

text
  Hamming distance with bitset XOR

  a         | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
            +---+---+---+---+---+---+---+---+
                    XOR
  b         | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
            +---+---+---+---+---+---+---+---+
                     |
                     v
  a XOR b   | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
            +---+---+---+---+---+---+---+---+
                  ^   ^           ^
                  *   *           *
  popcount --> 3 bits differ

Problems:


Set Intersection Counting

Pattern: (a & b).count() gives |AB| in O(N/64).

Useful when you have many sets and need pairwise intersection sizes. For example: counting common friends between all pairs of users.

cpp
int common = (friends[u] & friends[v]).count();
text
  Set intersection with bitset AND

  friends of u  | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
                +---+---+---+---+---+---+---+---+
                        AND
  friends of v  | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
                +---+---+---+---+---+---+---+---+
                         |
                         v
  u AND v       | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
                +---+---+---+---+---+---+---+---+
                         ^               ^
                         * common        *
  popcount --> 2 friends in common

Problems:


Bipartite Matching Speedup

Pattern: In Hopcroft-Karp or Hungarian algorithm, use bitset to represent which vertices on the right side are still unmatched. Finding an available neighbor becomes (adj[u] & unmatched)._Find_first().

cpp
bitset<MAXN> unmatched;
unmatched.set(); // all initially unmatched
// ...
int v = (adj[u] & unmatched)._Find_first();
if (v < n) { /* match u to v */ unmatched.reset(v); }
text
  Bitset matching -- find first available neighbor

  adj of u     | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                       AND
  unmatched    | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                        |
                        v
  result       | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                         ^
                         |
               _Find_first() --> bit 3
               match u to vertex 3

Problems:


DP over Subsets with Fast Iteration

Pattern: Iterate over set bits of a bitset using _Find_first() / _Find_next() instead of checking every position.

cpp
for (int i = bs._Find_first(); i < N; i = bs._Find_next(i)) {
    // only visits set bits -- skips 64 zeros at a time
}

This replaces the for (int i = 0; i < N; i++) if (bs.test(i)) pattern and is significantly faster when the bitset is sparse.


Contest Cheat Sheet

text
+------------------------------------------------------------------+
|  BITSET OPTIMIZATION -- QUICK REFERENCE                          |
+------------------------------------------------------------------+
|  #include <bitset>          // or bits/stdc++.h                  |
|  bitset<N> bs;              // N must be compile-time constant    |
|                                                                  |
|  CORE OPS (all O(N/64)):                                         |
|    a |= b    a &= b    a ^= b    ~a    a <<= k    a >>= k       |
|    a.count()   a.any()   a.none()   a.all()                     |
|                                                                  |
|  SINGLE BIT (all O(1)):                                          |
|    a.set(i)  a.reset(i)  a.flip(i)  a.test(i)  a[i]            |
|                                                                  |
|  GCC EXTENSIONS:                                                 |
|    a._Find_first()         // lowest set bit index               |
|    a._Find_next(pos)       // next set bit after pos             |
|    both return N if no bit found                                 |
|                                                                  |
|  PATTERNS:                                                       |
|    Reachability:   reach[u] |= reach[v]       O(n^3/64)         |
|    Subset-sum:     dp |= (dp << w)            O(nW/64)          |
|    Gauss GF(2):    row[r] ^= row[pivot]       O(n^3/64)         |
|    Hamming dist:   (a ^ b).count()            O(N/64)           |
|    Intersection:   (a & b).count()            O(N/64)           |
+------------------------------------------------------------------+

If I Had 5 Minutes

If you have 5 minutes before a contest round and need a bitset refresher:

  1. bitset<N> packs N bools into N/64 words. All bitwise ops (|=, &=, ^=, <<=, count()) run in O(N/64).
  2. Subset-sum: dp |= (dp << w)—one line replaces the entire inner loop. Shift goes left (toward higher sums).
  3. Reachability / transitive closure: reach[u] |= reach[v] in reverse topological order. O(n3/64).
  4. GCC-only iteration: _Find_first() / _Find_next(pos) to walk set bits. Both return N when exhausted.
  5. Template size N must be a compile-time constant. Use const int MAXN = ...; and bitset<MAXN>.

Before You Code Checklist

Before writing bitset code in a contest, run through these five checks:

  • [ ] Is the inner loop boolean? The operations must be OR, AND, XOR, or shift—not min, max, or addition.
  • [ ] Is N in the sweet spot (1000–5000)? Below 500, brute force works. Above 5000, even O(N3/64) may TLE.
  • [ ] Does the reduced complexity fit? Calculate: N3/64 or NW/64. Does it fit in 108 operations?
  • [ ] Is memory OK? bitset<N> arr[N] uses N2/8 bytes. For N=5000: ~3 MB (fine). For N=50000: ~300 MB (too much).
  • [ ] Is the template size a compile-time constant? You cannot write bitset<n> where n is read from input.

Gotchas and Debugging

Template Parameter Must Be a Compile-Time Constant

cpp
int n;
cin >> n;
bitset<n> bs;  // COMPILE ERROR: n is not constexpr

You must use a fixed upper bound: bitset<MAXN>. If you need runtime-sized bit arrays, use vector<uint64_t> and do the bit packing yourself (or use vector<bitset<BLOCK>>, but that is rarely worth the complexity).

No Dynamic Resizing

Unlike vector<bool>, a bitset has fixed size. You cannot push_back or resize. Pick MAXN large enough, but be aware of memory: bitset<100000> uses about 12.5 KB, and an array of 5000 of them uses 61 MB.

_Find_first() and _Find_next() Are GCC-Only

These do not exist in MSVC or standard C++. On Codeforces and most OJs (which use GCC), they work fine. If you are on MSVC, you need to write your own using the underlying word array (not portable).

operator[] Returns a Proxy, Not a Bool

cpp
bitset<100> bs;
auto x = bs[5];  // x is bitset::reference, not bool
bool y = bs[5];  // OK: implicit conversion

This rarely causes bugs, but can confuse auto type deduction. Prefer bs.test(5) when you need a plain bool.

Shifts Move Bits, Not the Object

bs << 3 shifts all bits left by 3 positions (higher indices). It does NOT shift the bitset object in memory. Bit 0 becomes bit 3, bit 1 becomes bit 5, etc. Bits shifted past position N1 are lost.

text
  Before: bit positions  [4][3][2][1][0]
          values          1  0  1  1  0

  After << 2:            [4][3][2][1][0]
                          1  1  0  0  0
                          ^  ^
                          bits 0,1 moved to 2,3; old bit 4 lost

Performance Is O(N/64), Not O(1)

A common mistake: assuming a |= b is O(1). It is not—it is O(N/64). For bitset<64> it behaves like O(1), but for bitset<100000> it touches 1563 words. Factor this into your complexity analysis.

Memory: Arrays of Bitsets Add Up Fast

text
  bitset<5000> reach[5000];
  Memory: 5000 * ceil(5000/64) * 8 bytes
        = 5000 * 79 * 8
        = 3,160,000 bytes ~ 3 MB       <-- usually fine

  bitset<100000> dp[100000];
  Memory: 100000 * 1563 * 8 bytes
        ~ 1.25 GB                       <-- way over ML

Always estimate memory before declaring large bitset arrays.

Mental Traps

Trap 1—Applying bitset to non-boolean operations. You see an O(n2) inner loop and reach for bitset. But the loop computes dp[i] = min(dp[i], dp[j] + cost[j])—a min operation with data dependencies. Bitset only speeds up independent boolean (AND/OR/XOR) operations across packed words. If the loop has min, max, addition, or depends on the previous iteration's result, bitset cannot help.

text
  WRONG -- data dependent min/max loop
  +----+----+----+----+----+----+----+----+
  | d0 | d1 | d2 | d3 | d4 | d5 | d6 | d7 |
  +----+----+----+----+----+----+----+----+
           |
           v d1 needs d0 result first
  +----+----+----+----+----+----+----+----+
  | d0 | ?? | d2 | d3 | d4 | d5 | d6 | d7 |
  +----+----+----+----+----+----+----+----+
                |
                v d2 needs d1 -- serial chain
  cannot pack into 64-bit parallel ops

  RIGHT -- independent boolean OR loop
  +--------+--------+--------+--------+
  | word 0 | word 1 | word 2 | word 3 |  <-- a
  +--------+--------+--------+--------+
      |        |        |        |
      v OR     v OR     v OR     v OR
  +--------+--------+--------+--------+
  | word 0 | word 1 | word 2 | word 3 |  <-- b
  +--------+--------+--------+--------+
  all 4 words OR in parallel

Trap 2—Assuming bitset operations are O(1). When N=64, a bitset fits in one machine word and a |= b is truly one instruction. But at N=10000, the same operation touches 157 words. Treating a |= b as O(1) in your complexity analysis leads to wrong time estimates and surprise TLEs.

text
  N = 64 -- fits in 1 word
  +----------------------------------+
  |          single 64-bit word      |  1 OR instruction
  +----------------------------------+

  N = 10000 -- spans 157 words
  +------+------+------+------+------+---   ---+-------+
  |  w0  |  w1  |  w2  |  w3  |  w4  |   ...   | w156  |
  +------+------+------+------+------+---   ---+-------+
     |      |      |      |      |                 |
     v OR   v OR   v OR   v OR   v OR    ...       v OR
  157 OR instructions -- NOT constant time

Trap 3—Shifting the wrong direction in knapsack. The bitset knapsack uses dp |= (dp << w). Students often write dp >> w by mistake. Left shift moves bits to higher indices—adding weight. Right shift moves to lower indices—subtracting weight, which is wrong for 0/1 knapsack.

text
  dp before        bit0  bit1  bit2  bit3  bit4
                   +-----+-----+-----+-----+-----+
                   |  1  |  0  |  1  |  0  |  0  |
                   +-----+-----+-----+-----+-----+
                   sums reachable: 0 and 2

  WRONG  dp >> 2              RIGHT  dp << 2
  +-----+-----+-----+         +-----+-----+-----+-----+-----+
  |  1  |  0  |  0  |         |  0  |  0  |  1  |  0  |  1  |
  +-----+-----+-----+         +-----+-----+-----+-----+-----+
  bit2 lost                   sums shifted UP by 2
  bits go DOWN (subtract w)   bits go UP (add w -- correct)

The Mistake That Teaches

You're solving a problem: given N3000 items, find if subset-sum W is achievable. You write:

cpp
bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++)
    dp |= (dp << w[i]);

It passes the samples. You submit—Wrong Answer on test 3.

You stare at the code. The logic is correct. You add debug prints: the DP bitset looks right for small cases. You stress test against a brute-force—and finally catch it: one of the weights is 0. When wi=0, dp << 0 is just dp, so dp |= dp is a no-op. That's fine—weight 0 doesn't change achievable sums. But your input-reading code skips the scanf for items with w[i] == 0, causing all subsequent weights to be read incorrectly.

The lesson: the bitset part was correct. The bug was in ordinary I/O logic that you stopped checking because you trusted the "clever" part. In bitset problems, the bug is almost never in the bitset operation—it's in the setup, indexing, or edge cases around it. Always stress test the boring parts.


Self-Test

Before moving on, verify you can do these without looking at the topic file or reference material:

  • [ ] Write the 3-line bitset knapsack DP (dp[0] = 1; for each item: dp |= (dp << w[i])), explain why the shift goes left, and verify it on a small example by hand.
  • [ ] Implement Floyd-Warshall transitive closure using bitset<N> row arrays—get the reach[i] |= reach[k] propagation direction right without looking it up.
  • [ ] Explain the difference between bitset<1000> and vector<bool> in terms of what operations are O(1) vs O(N/64) vs O(N).
  • [ ] State under what N and W conditions bitset knapsack is likely to pass vs TLE—give concrete threshold numbers.
  • [ ] Explain why bitset<n> where n is a runtime variable is a compile error, and state the standard workaround.
  • [ ] Estimate the memory usage of bitset<5000> arr[5000] without a calculator—get within 2× of the real answer.
  • [ ] Given a problem with N3000 and boolean row operations, identify whether bitset optimization applies and estimate the resulting complexity.

Scenario Drills

Scenario 1: You have a graph with N=4000 nodes and need all-pairs shortest paths (not just reachability). The edge weights are 1 or 2. Can you use bitset?

Answer: Not directly. Bitset optimizes boolean operations, and shortest-path requires min/addition. However, you could use BFS layers: maintain bitset<4000> layer[k] for "nodes at distance exactly k". Propagate layer[k+1] = (neighbors of layer[k]) & ~visited. This gives all-pairs distances in O(N2D/64) where D is the diameter. If D is small, this works.

Scenario 2: You need to count the number of distinct subset-sums (not just check if W is achievable). N=1000, W=100000.

Answer: The bitset knapsack dp |= (dp << w[i]) already computes all achievable sums. After processing all items, dp.count() gives the number of distinct achievable sums in O(W/64). This is a direct application.

Scenario 3: You're solving a 2-SAT problem with N=5000 variables. The implication graph has O(N2) edges. Transitive closure via DFS is too slow. Can bitset help?

Answer: Yes. Build the implication graph and compute transitive closure with bitset reachability in O(N3/64). For N=5000: 50003/642×109—tight but possible with the low constant of bitset ops. See SCC and 2-SAT for the standard approach.


Debug This Exercises

Find the bug in each snippet. Answers are below (hidden by a horizontal rule).

Bug 1: Subset-Sum Shift Direction

cpp
bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++) {
    dp |= (dp >> w[i]);  // <-- is this right?
}
printf("%s\n", dp.test(W) ? "YES" : "NO");

Bug 2: Transitive Closure Propagation Order

cpp
// Process in topological order (NOT reverse)
for (int i = 0; i < (int)order.size(); i++) {
    int u = order[i];
    reach[u].set(u);
    for (int v : adj[u])
        reach[u] |= reach[v];  // <-- will this work?
}

Bug 3: Iterating Set Bits—Off-by-One

cpp
bitset<1000> bs;
// ... fill bs ...
for (int i = bs._Find_first(); i <= 1000; i = bs._Find_next(i)) {
    process(i);  // <-- any problem?
}

Bug 4: Gaussian Elimination—Missing Augmented Column

cpp
int gauss(int n, int m) {
    int rank = 0;
    for (int col = 0; col <= m; col++) {  // <-- note the <= m
        int pivot = -1;
        for (int r = rank; r < n; r++)
            if (row_bs[r].test(col)) { pivot = r; break; }
        if (pivot == -1) continue;
        swap(row_bs[rank], row_bs[pivot]);
        for (int r = 0; r < n; r++)
            if (r != rank && row_bs[r].test(col))
                row_bs[r] ^= row_bs[rank];
        rank++;
    }
    return rank;
}

Debug This—Answers

Bug 1: dp >> w[i] shifts bits right (toward lower indices), which subtracts weight instead of adding it. Fix: dp |= (dp << w[i]). Left shift moves achievable sums to higher values.

Bug 2: Processing in forward topological order means when you process node u, its successors v haven't been processed yet—reach[v] is incomplete. Fix: process in reverse topological order so successors are finalized before predecessors.

Bug 3: The loop condition i <= 1000 should be i < 1000. _Find_first() and _Find_next() return N (the template size, 1000) when no set bit is found. With <=, you'd process index 1000, which is out of bounds. Fix: i < 1000 (or i < (int)bs.size()).

Bug 4: The loop iterates col from 0 to m inclusive. Column m is the augmented column (the right-hand side). You must NOT pivot on the augmented column—that would mix the RHS into the elimination and corrupt the system. Fix: col < m (strict less-than).


Practice Problems

#ProblemSourceDifficultyKey Idea
1Reachability QueriesCSES1600Transitive closure with bitset
2Square SubsetsCF 895C1900Gaussian elimination, count XOR bases
3(Zero XOR Subset)-lessCF 1101G1700XOR linear basis with bitset
40-1 MSTCF 1242B1900BFS with bitset complement graph
5PermuTree (hard)CF 1856E22100Subset-sum with bitset for partition
6Alex and a TV ShowCF 1097F2400Bitset Mobius / divisor operations
7CoinsCSES1500Knapsack feasibility with bitset
8Developing SkillsCF 581D1500Greedy + bitset DP variant

What to Solve Next

If you are comfortable with the three core patterns (transitive closure, subset-sum, Gaussian GF(2)), here is where to go:

  1. Quick-reference recipesBitset Tricks is the contest-ready cheat sheet companion to this file. Skim it before your next round.
  2. Solidify recognition—work through the Practice Ladder (1100–1400) problems that involve boolean DP. Practice spotting the "/64 signal" before reading the editorial.
  3. Deepen bit-level fluency—revisit Bit Manipulation for the per-element tricks (isolating lowest set bit, popcount idioms) that complement bitset bulk operations.
  4. Bitset meets knapsackDP—Knapsack uses bitset left-shift to turn O(nW) feasibility DP into O(nW/64).
  5. Learn bitmask DPDP—Bitmask covers the O(2n) state-space approach where the mask is the DP state, not a packed boolean array. Know which is which.
  6. Explore graph structures—bitset reachability connects directly to SCC and 2-SAT and Floyd-Warshall transitive closure. Understanding those algorithms makes the bitset acceleration feel natural.
  7. Contest decision-making—the Data Structure Selection Guide helps you decide between bitset, segment tree, and other tools under time pressure.

Further Reading

Built for the climb from Codeforces 1100 to 2100.