Bitset Optimization

Quick Revisit
USE WHEN: O(n^2) boolean DP is barely TLE and operations are AND/OR/XOR on bit vectors
INVARIANT: bitset packs N bits into N/64 words; each bitwise op processes 64 bits per instruction
TIME: O(N/64) per bitwise op | SPACE: O(N/64)
CLASSIC BUG: bitset size N must be a compile-time constant—cannot use runtime variable
PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md

std::bitset gives you $O (n / 64)$ speedups on bitwise operations, turning borderline TLE into comfortable AC.

Scope note: This file teaches when and how std::bitset works—memory layout, the /64 speedup, and core patterns (transitive closure, subset-sum, Gaussian elimination). For a contest-ready cheat sheet of bitset recipes and quick-reference snippets, see Bitset Tricks.

Difficulty: Intermediate | CF Rating: 1500–1800 | Prerequisites: Bit Manipulation

Bit-parallel techniques date to 1960s IBM mainframes, where word-level boolean operations on packed bit vectors were a primary optimization. The C++ std::bitset standardized this idea in C++98, giving competitive programmers a one-line way to exploit 64-wide parallelism. Think of a bitset as a row of 64 light switches on a single panel: you flip all 64 with one hand instead of flipping each bulb individually. The fingerprint to burn into memory: O(n²) TLE + boolean operations → bitset /64. A useful mnemonic: "Pack it, OR it, ship it—64 at a time."

When O(n^2) is barely too slow
64 Bits at a Time
Transitive Closure in 32 Operations per Edge
Subset-Sum in O(nW/64)
Gaussian Elimination, but Everything is XOR
When to Reach for Bitset
When NOT to Use Bitset
std::bitset API Reference
Implementation: Transitive Closure
Implementation: Subset-Sum
Implementation: Gaussian Elimination over GF(2)
Complexity Reference
Problem Patterns
Contest Cheat Sheet
If I Had 5 Minutes
Before You Code Checklist
Gotchas and Debugging
The Mistake That Teaches
Self-Test
Scenario Drills
Debug This Exercises
Practice Problems
What to Solve Next
Further Reading

When $O (n^{2})$ is barely too slow

Here is a problem you will meet sooner or later:

Given a directed graph on $n \leq 2000$ nodes, determine for every pair $(u, v)$ whether $u$ can reach $v$ .

The naive plan: run a DFS/BFS from each node. Each traversal touches $O (n + m)$ edges. In the worst case $m = O (n^{2})$ , so the total is $O (n \cdot n^{2}) = O (n^{3})$ . For $n = 2000$ that is $8 \times 10^{9}$ —way too slow.

A smarter plan: process nodes in reverse topological order and propagate reachability sets. For each edge $j \to i$ , merge reachability: reach[i] = reach[i] U reach[j]. If we store reach[i] as a std::set or std::vector<bool>, each union costs $O (n)$ . With $O (n^{2})$ edges, we get $O (n^{3})$ again—but with a much smaller constant. Still borderline for $n = 2000$ .

The question: can we shave off a constant factor of 64?

64 Bits at a Time

A std::bitset<N> packs $N$ bits into $⌈ N / 64 ⌉$ 64-bit words. When you write a |= b, the compiler emits one OR instruction per word. For $N = 2000$ :

⌈ 2000 / 64 ⌉ = 32 words

That is 32 OR instructions instead of 2000 individual flag updates. The speedup factor is roughly 64×—and that turns $O (n^{3})$ with a constant of $1$ into $O (n^{3} / 64)$ , which for $n = 2000$ is about $1.25 \times 10^{8}$ . Comfortable under 1 second.

Memory Layout: Bitset vs. Bool Array

text

  bool reach[2000]:
  +---+---+---+---+---+---+---+---+---+---+---+--     --+---+
  | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 |
  +---+---+---+---+---+---+---+---+---+---+---+--     --+---+
    1 byte each  -->  2000 bytes total (wasted: 7 bits per byte)

  bitset<2000> reach:
  +----------------+----------------+----------------+---  ---+----------------+
  | word 0 (64b)   | word 1 (64b)   | word 2 (64b)   |  ...  | word 31 (64b)  |
  | bits 0..63     | bits 64..127   | bits 128..191  |       | bits 1984..2047|
  +----------------+----------------+----------------+---  ---+----------------+
    8 bytes each  -->  256 bytes total (8x denser)

The density matters for two reasons: less memory used, and better cache utilization. But the real win is that bitwise OR/AND/XOR on the entire set costs $O (N / 64)$ instead of $O (N)$ .

How `a |= b` Works Under the Hood

text

  a:   |0110...1001|1010...0011|0001...1110|  ...  |1100...0001|
       word 0       word 1       word 2              word 31

  b:   |1001...0110|0101...1100|1110...0001|  ...  |0011...1110|
       word 0       word 1       word 2              word 31

  a|=b:|1111...1111|1111...1111|1111...1111|  ...  |1111...1111|
       1 CPU OR     1 CPU OR     1 CPU OR            1 CPU OR
       = 32 total OR instructions for 2000 bits

The Aha Moment

The key insight: a boolean array is already a bitset—you are just wasting 63 bits per element. Every bool in a bool[] occupies 8 bits but stores only 1 bit of information.

When you replace bool reach[2000] with bitset<2000>, you are not changing the algorithm. You are telling the compiler to stop wasting 7/8 of every byte and to process 64 booleans per CPU instruction instead of one.

The "aha" is recognizing that any inner loop doing a[i] = a[i] | b[i] across a boolean array is secretly 64 independent OR operations packed into one—if only you let the compiler see it.

Let's see that 64× speedup in action, starting with the motivating problem: transitive closure.

Transitive Closure in 32 Operations per Edge

Process nodes in reverse topological order. For each edge $j \to i$ , do reach[i] |= reach[j]. Every node is reachable from itself, so initialize reach[i].set(i).

This is the bitset-accelerated variant of Floyd-Warshall transitive closure. For the full per-element bit tricks used in pivot selection, see Bit Manipulation.

Walk through a small example—6 nodes, edges in a DAG:

text

  Graph:
        0 ---> 1 ---> 3
        |             ^
        v            /
        2 ---+----> 4 ---> 5

Edges: 0->1, 0->2, 1->3, 2->4, 4->3, 4->5

Reverse topological order: 3, 5, 4, 1, 2, 0

text

  Step 1: node 3 (no outgoing edges)
    reach[3] = {3}                    = 001000

  Step 2: node 5 (no outgoing edges)
    reach[5] = {5}                    = 100000

  Step 3: node 4 (edges 4->3, 4->5)
    reach[4] = {4}                    = 010000
    reach[4] |= reach[3]  ->  {3,4}  = 011000
    reach[4] |= reach[5]  ->  {3,4,5}= 111000

  Step 4: node 1 (edge 1->3)
    reach[1] = {1}                    = 000010
    reach[1] |= reach[3]  ->  {1,3}  = 001010

  Step 5: node 2 (edge 2->4)
    reach[2] = {2}                    = 000100
    reach[2] |= reach[4]  ->  {2,3,4,5} = 111100

  Step 6: node 0 (edges 0->1, 0->2)
    reach[0] = {0}                        = 000001
    reach[0] |= reach[1]  ->  {0,1,3}    = 001011
    reach[0] |= reach[2]  ->  {0,1,2,3,4,5} = 111111

Total work: 6 merge operations, each touching $⌈ 6 / 64 ⌉ = 1$ word. For $n = 2000$ with $O (n^{2})$ edges: $O (n^{2}) \times O (n / 64) = O (n^{3} / 64)$ .

Subset-Sum in $O (n W / 64)$

Classic problem: given $n$ weights, can we make exactly sum $W$ ?

The subset-sum DP here is a special case of DP—Knapsack. For bitmask-based DP where the mask itself is the state (not a bitset optimization), see DP—Bitmask.

Standard DP: dp[i] = true if sum $i$ is achievable. For each weight $w$ : iterate $i$ from $W$ down to $w$ and set dp[i] |= dp[i-w]. That is $O (n W)$ .

With bitset: represent dp as a bitset<W+1>. Bit $i$ is set iff sum $i$ is achievable. For each weight $w$ :

dp | = dp ≪ w

One line. The left-shift moves every achievable sum up by $w$ . The OR merges old and new achievable sums. Cost: $O (W / 64)$ per weight. Total: $O (n W / 64)$ .

The Shift in Action

Start with weights $[2, 3, 5]$ and $W = 10$ . Bitset indices 0..10:

text

  Initial:    dp = 10000000000   (only sum 0 is achievable)
                   ^
                   bit 0

  Weight w=2: dp << 2 = 00100000000
                           ^
                           bit 2
              dp |= (dp << 2):
              dp = 10100000000   (sums: 0, 2)

  Weight w=3: dp << 3 = 00010100000
              dp |= (dp << 3):
              dp = 10110100000   (sums: 0, 2, 3, 5)

  Weight w=5: dp << 5 = 00000101101
              dp |= (dp << 5):
              dp = 10110101101   (sums: 0, 2, 3, 5, 7, 8, 10)
                                        *              *
                                     we can make 10!

Reading right to left: position 0 = achievable, position 2 = achievable, and so on. The final bitset tells us every reachable sum in one shot.

Gaussian Elimination, but Everything is XOR

A system of linear equations over GF(2) (the field with elements {0, 1}, addition = XOR, multiplication = AND) arises in problems about:

toggling switches (flip a light, and its neighbors flip too)
linear dependence of bitmasks
counting solutions to XOR systems

For the general (non-GF(2)) version of Gaussian elimination, see Gaussian Elimination. For building an XOR basis without the full matrix, see Linear Basis (XOR Basis).

Each equation is a row of 0/1 coefficients. Elimination step: XOR one row into another to clear a column. With bitset rows, one elimination step costs $O (n / 64)$ . Full Gaussian elimination: $O (n)$ pivots, each touching $O (n)$ rows → $O (n^{3} / 64)$ .

Row Operations on Bitset Rows

text

  Eliminate column 2 from row B using row A:

  Row A:  |1 0 1 1 0 0 | 1|     <-- pivot row (column 2 set)
  Row B:  |0 1 1 0 1 0 | 0|     <-- column 2 is set, need to clear

  Row B ^= Row A:
  Row B': |1 1 0 1 1 0 | 1|     <-- column 2 cleared

  With bitset: rows[b] ^= rows[a];   // O(n/64) per row op

The augmented column (rightmost) is included in the bitset, so the entire elimination needs no special-casing.

The three examples above—transitive closure, subset-sum, and Gaussian elimination—share the same structure. Here is how to tell when a new problem fits the pattern.

When to Reach for Bitset

Bitset optimization is not universal. It helps when:

The problem is already $O (n^{2})$ or $O (n^{3})$ and the inner loop does bitwise/set operations on boolean arrays.
$n$ is in the sweet spot: 1000–5000. Below 1000, even $O (n^{3})$ is fine. Above 5000, even $O (n^{3} / 64)$ might TLE.
The operations are bitwise—OR, AND, XOR, shift, popcount. You cannot use bitset to speed up addition or min/max operations.

Common sweet spots:

text

  +--------------------+----------+------------+------------------+
  | Problem            | Naive    | With bitset| n sweet spot     |
  +--------------------+----------+------------+------------------+
  | Transitive closure | O(n^3)   | O(n^3/64)  | n <= 2000-4000   |
  | Subset-sum         | O(nW)    | O(nW/64)   | W <= 10^5-10^6   |
  | Gaussian elim GF2  | O(n^3)   | O(n^3/64)  | n <= 2000-4000   |
  | Hamming distance   | O(n*m)   | O(n*m/64)  | n*m <= 10^8      |
  | Set intersection   | O(n)     | O(n/64)    | many queries     |
  +--------------------+----------+------------+------------------+

Trigger Phrases

These problem-statement patterns should make you think "bitset":

"reachability in a dense graph" with $n \leq 4000$ —transitive closure via reach[u] |= reach[v]
"subset-sum with large W" ( $W$ up to $10^{5}$ – $10^{6}$ )—shift-and-OR DP
"XOR system of equations" or "toggle puzzle"—Gaussian elimination over GF(2)
"Hamming distance between binary strings"—(a ^ b).count()
"count common elements" across many set pairs—(a & b).count()
"bipartite matching on dense graph" with $n \leq 3000$ —bitset-accelerated augmenting paths
boolean DP that is barely TLE at $O (n^{2})$ or $O (n W)$ —the /64 constant shaves it to AC

The unifying signal: an inner loop that does boolean operations on a vector of bits, and $N$ is in the 1000–5000 range where the 64× factor is the difference between TLE and AC.

What the Code Won't Teach You

The implementations above show how to apply bitset optimization. What they cannot convey is the judgment call: should I use this here?

The 64× speedup is real but the constant matters. At $N = 5000$ , $O (N^{2} / 64) \approx 390,000$ operations—very fast. But $O (N^{3} / 64) \approx 2 \times 10^{9}$ —still likely TLE.

You save a factor of 64, not a change in the exponent. Before reaching for bitset, check whether the reduced complexity actually fits within the time limit.

text

  N = 5000 complexity check

  +------------------+-----------------+----------+
  |    Algorithm     |   Operations    |  Verdict |
  +------------------+-----------------+----------+
  | n*n / 64         |      390K       |  FAST    |
  +------------------+-----------------+----------+
  | n*n*n / 64       |    2 * 10^9     |  TLE     |
  +------------------+-----------------+----------+
  | n*n              |     25 * 10^6   |  OK      |
  +------------------+-----------------+----------+
  | n*n*n            |    125 * 10^9   |  TLE     |
  +------------------+-----------------+----------+

If you see $N$ up to 5000, a boolean DP or reachability problem, and a time limit that would TLE at $O (N^{2})$ but pass at $O (N^{2} / 64)$ —that is the bitset signal. Similarly, knapsack with $N$ up to $10^{4}$ and $W$ up to $10^{5}$ screams bitset. When none of these structural hints are present, bitset is almost certainly the wrong tool.

text

  Signal detection for bitset problems

  +--- Is the inner loop boolean ops ---+
  |                                     |
  v YES                           v NO
  |                               |
  +--- N in 1000 to 5000 ---+    +---> skip bitset
  |                          |
  v YES                v NO
  |                    |
  * TRY BITSET *       +---> probably not needed

Bitset optimization is a competitive programming trick, not a production technique. In real software, you would use SIMD intrinsics, parallelism, or algorithmic improvements instead of std::bitset. In contests, bitset is the fastest path from TLE to AC for boolean DP problems with large $N$ —but recognizing when it applies is more valuable than knowing the syntax.

See also: Data Structure Selection Guide for choosing between bitset, bitmask DP, and plain arrays under contest pressure.

When NOT to Use Bitset

Knowing when bitset doesn't help is as important as knowing when it does. Compare the decision table in the Data Structure Selection Guide.

When $n$ is Small Enough That Regular Arrays Are Fine

If $n \leq 500$ , even $O (n^{3})$ runs in under a second. If $n \leq 1000$ , $O (n^{2})$ is trivial. The /64 speedup is pointless when the naive approach already fits comfortably in the time limit. Adding bitset to a problem that doesn't need it just makes your code harder to read and debug.

text

  n = 500:   O(n^3) = 1.25 * 10^8   <-- fits in ~1 sec
  n = 1000:  O(n^2) = 10^6           <-- trivial
  --> no bitset needed for these sizes

When You Need Individual Element Access More Than Bulk Operations

Bitset shines at bulk operations (|=, &=, count(), <<=). But if your inner loop does if (bs.test(i)) { ... complex logic per element ... }, you are not benefiting from the 64-wide parallelism—you are just using a memory-efficient bool array. If most operations are single-bit reads/writes with per-element branching, a plain bool[] or vector<bool> is simpler and equally fast.

When Memory Isn't the Bottleneck

Bitset saves memory (8× denser than bool[]). But if your arrays are small or you have plenty of memory headroom, the density advantage is irrelevant. Don't use bitset purely for memory savings unless you are actually near the memory limit.

When the Constant Factor of Bitset Overhead Isn't Worth It

std::bitset operations carry overhead: shifts involve copying and shifting across word boundaries, count() calls popcount on every word. For small $N$ (say $N < 128$ ), these overheads can make bitset slower than a simple int or long long bitmask. If your mask fits in 64 bits, use uint64_t with raw bit ops—it will be faster and simpler.

text

  N <= 64:   use uint64_t with __builtin_popcountll()
  N <= 128:  consider two uint64_t values
  N > 128:   bitset<N> starts to pay off

When Bit Manipulation per Element Is Needed

std::bitset does not support per-element arithmetic, comparisons, or conditional operations across packed words. You cannot do "for each set bit, add its index to a sum" in $O (N / 64)$ —that requires iterating over set bits one by one. If your algorithm needs:

Per-element min/max/addition across two arrays
Conditional updates based on each element's value
Arithmetic operations (not boolean) on packed data

...then bitset is the wrong tool. You need SIMD intrinsics or just a plain array.

Summary: Skip Bitset When...

text

  +------------------------------------------------------+----------+
  | Situation                                            | Use what |
  +------------------------------------------------------+----------+
  | n <= 500, O(n^3) fits                                | plain [] |
  | n <= 64, need bit ops                                | uint64_t |
  | Inner loop is per-element branching                  | bool[]   |
  | Operations are min/max/addition, not boolean         | int[]    |
  | Memory is not a concern, n is small                  | plain [] |
  | Need to iterate set bits with complex per-bit logic  | uint64_t |
  +------------------------------------------------------+----------+

std::bitset API Reference

All in <bitset> (included via <bits/stdc++.h>).

Function / Operator	What it does	Time
`bitset<N>()`	Construct all-zero bitset of $N$ bits	$O (N / 64)$
`bitset<N>(unsigned long long v)`	Construct from integer	$O (N / 64)$
`bitset<N>(string s)`	Construct from `"010110"` string	$O (N)$
`a & b`, `a &= b`	Bitwise AND	$O (N / 64)$
`a \| b`, `a \|= b`	Bitwise OR	$O (N / 64)$
`a ^ b`, `a ^= b`	Bitwise XOR	$O (N / 64)$
`~a`	Bitwise NOT (flip all)	$O (N / 64)$
`a << k`, `a <<= k`	Left shift by $k$	$O (N / 64)$
`a >> k`, `a >>= k`	Right shift by $k$	$O (N / 64)$
`a == b`, `a != b`	Equality comparison	$O (N / 64)$
`a.count()`	Number of set bits (popcount)	$O (N / 64)$
`a.any()`	True if any bit is set	$O (N / 64)$
`a.none()`	True if no bit is set	$O (N / 64)$
`a.all()`	True if all bits are set	$O (N / 64)$
`a.test(pos)`	Check bit at position `pos`	$O (1)$
`a.set()`	Set all bits to 1	$O (N / 64)$
`a.set(pos)`	Set bit at `pos` to 1	$O (1)$
`a.set(pos, val)`	Set bit at `pos` to `val`	$O (1)$
`a.reset()`	Clear all bits to 0	$O (N / 64)$
`a.reset(pos)`	Clear bit at `pos`	$O (1)$
`a.flip()`	Flip all bits	$O (N / 64)$
`a.flip(pos)`	Flip bit at `pos`	$O (1)$
`a[pos]`	Access bit (returns proxy reference)	$O (1)$
`a.size()`	Returns $N$ (compile-time constant)	$O (1)$
`a.to_string()`	Convert to `"010110..."` string	$O (N)$
`a.to_ulong()`	Convert to `unsigned long`	$O (1)$
`a.to_ullong()`	Convert to `unsigned long long`	$O (1)$
`a._Find_first()`	Index of lowest set bit (GCC only)	$O (N / 64)$
`a._Find_next(pos)`	Index of next set bit after `pos` (GCC)	$O (N / 64)$

_Find_first() and _Find_next() return N (the template size) when no set bit is found. They are essential for iterating over set bits:

cpp

for (int i = bs._Find_first(); i < (int)bs.size(); i = bs._Find_next(i)) {
    // process set bit i
}

Implementation: Transitive Closure

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 2005;
bitset<MAXN> reach[MAXN];

int main() {
    int n, m;
    scanf("%d %d", &n, &m);

    vector<vector<int>> adj(n);
    vector<int> indeg(n, 0);
    for (int i = 0; i < m; i++) {
        int u, v;
        scanf("%d %d", &u, &v);
        adj[u].push_back(v);
        indeg[v]++;
    }

    // topological sort (Kahn's)
    queue<int> q;
    // Invariant: q contains all nodes with current indeg == 0
    for (int i = 0; i < n; i++)
        if (indeg[i] == 0) q.push(i);
    vector<int> order;
    // Invariant: order[0..len-1] is a valid topological prefix
    while (!q.empty()) {
        int u = q.front(); q.pop();
        order.push_back(u);
        // Invariant: decrementing indeg for u's neighbors; new zero-indeg nodes enqueued
        for (int v : adj[u])
            if (--indeg[v] == 0) q.push(v);
    }

    // propagate in reverse topological order
    // Invariant: reach[u] is complete for all u processed so far (later in topo order)
    for (int i = (int)order.size() - 1; i >= 0; i--) {
        int u = order[i];
        reach[u].set(u);                   // u reaches itself
        // Invariant: reach[v] is already final for each neighbor v (processed earlier)
        for (int v : adj[u])
            reach[u] |= reach[v];          // O(n/64) per edge
    }

    // answer queries: reach[u].test(v)
    // Invariant: row i is fully printed before moving to row i+1
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            putchar(reach[i].test(j) ? '1' : '0');
        putchar('\n');
    }
}

Total: $O (n^{2} / 64)$ per node for the OR, across $n$ nodes → $O (n^{3} / 64)$ .

Implementation: Subset-Sum

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXW = 100001;

int main() {
    int n, W;
    scanf("%d %d", &n, &W);

    bitset<MAXW> dp;
    dp.set(0);                              // sum 0 is achievable

    // Invariant: dp[s] == 1 iff sum s is achievable using items 0..i-1
    for (int i = 0; i < n; i++) {
        int w;
        scanf("%d", &w);
        dp |= (dp << w);                   // O(W/64) per weight
    }

    puts(dp.test(W) ? "YES" : "NO");
}

Total: $O (n W / 64)$ . For $n = 1000, W = 100 000$ : about $1.5 \times 10^{6}$ word operations—essentially instant.

Recovering the Subset

If you need to print which items were chosen, track a bitset per item (or work backwards):

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXW = 100001;

int main() {
    int n, W;
    scanf("%d %d", &n, &W);
    vector<int> w(n);
    for (int i = 0; i < n; i++) scanf("%d", &w[i]);

    bitset<MAXW> dp;
    dp.set(0);

    // remember which items were used
    vector<bitset<MAXW>> after(n);
    // Invariant: after[i] = dp state after considering items 0..i
    for (int i = 0; i < n; i++) {
        dp |= (dp << w[i]);
        after[i] = dp;
    }

    if (!dp.test(W)) { puts("NO"); return 0; }

    // backtrack
    puts("YES");
    int cur = W;
    // Invariant: cur = remaining sum to reconstruct using items 0..i
    for (int i = n - 1; i >= 0; i--) {
        // was item i used? check if cur-w[i] was achievable before item i
        bitset<MAXW>& prev = (i > 0) ? after[i - 1] : (dp.reset(), dp.set(0), dp);
        if (cur >= w[i] && prev.test(cur - w[i])) {
            printf("%d ", w[i]);
            cur -= w[i];
        }
    }
    puts("");
}

Implementation: Gaussian Elimination over GF(2)

Find the rank (or a solution) of a system of XOR equations. Each equation: $a_{i, 0} \oplus a_{i, 1} \oplus \dots \oplus a_{i, n - 1} = b_{i}$ .

cpp

#include <bits/stdc++.h>
using namespace std;

const int MAXN = 2005;
bitset<MAXN> row_bs[MAXN];  // row_bs[i] stores coefficients + augmented bit

int gauss(int n, int m) {
    // n equations, m variables
    // augmented bit is at position m
    int rank = 0;
    // Invariant: rows 0..rank-1 are pivot rows with leading 1 in distinct columns
    for (int col = 0; col < m; col++) {
        // find pivot: a row in [rank..n) with bit col set
        int pivot = -1;
        // Invariant: rows 0..rank-1 already have pivots; search only rank..n-1
        for (int r = rank; r < n; r++) {
            if (row_bs[r].test(col)) { pivot = r; break; }
        }
        if (pivot == -1) continue;  // no pivot for this column, skip

        swap(row_bs[rank], row_bs[pivot]);

        // Safety: verify the pivot row is non-zero (guards against subtle bugs
        // if test() and swap logic are ever refactored)
        if (!row_bs[rank].any()) continue;

        // Invariant: row_bs[rank] has bit col set; eliminate col from all other rows
        for (int r = 0; r < n; r++) {
            if (r != rank && row_bs[r].test(col))
                row_bs[r] ^= row_bs[rank];     // O(m/64)
        }
        rank++;
    }
    return rank;  // rank may be 0 if all rows were zero
}

int main() {
    int n, m;
    scanf("%d %d", &n, &m);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            int x; scanf("%d", &x);
            if (x) row_bs[i].set(j);
        }
        int b; scanf("%d", &b);
        if (b) row_bs[i].set(m);             // augmented column
    }

    int rank = gauss(n, m);
    printf("Rank = %d\n", rank);

    if (rank == 0) {
        // All rows are zero (or were zeroed out). Check if any had a
        // non-zero augmented bit -- that means 0 = 1, inconsistent.
        for (int r = 0; r < n; r++) {
            if (row_bs[r].test(m)) {
                puts("No solution");
                return 0;
            }
        }
        puts("Solution exists (all variables are free)");
        return 0;
    }

    // check consistency: if any row has all-zero coefficients but set augmented bit
    for (int r = rank; r < n; r++) {
        if (row_bs[r].test(m)) {
            puts("No solution");
            return 0;
        }
    }
    puts("Solution exists");
}

Complexity Reference

The table below summarizes the cost and space of every bitset operation and the three canonical algorithms that benefit from the $/ 64$ speedup.

Operation	Time	Space
Construct `bitset<N>`	$O (N / 64)$	$O (N / 64)$ words
`a \|= b`, `a &= b`, `a ^= b`	$O (N / 64)$	--
`a <<= k`, `a >>= k`	$O (N / 64)$	--
`a.count()`	$O (N / 64)$	--
`_Find_first()`, `_Find_next()`	$O (N / 64)$ worst	--
Transitive closure ( $n$ nodes)	$O (n^{3} / 64)$	$O (n^{2} / 64)$
Subset-sum ( $n$ items, target $W$ )	$O (n W / 64)$	$O (W / 64)$
Gaussian elimination GF(2) ( $n \times m$ )	$O (n \cdot m \cdot n / 64)$	$O (n \cdot m / 64)$
Hamming distance (two bitsets)	$O (N / 64)$	--
Set intersection size	$O (N / 64)$	--

With the reference table as a guide, here are the recurring problem shapes where bitset optimization pays off. For the underlying bit-level tricks used inside these patterns, see Bit Manipulation.

Problem Patterns

Transitive Closure / Reachability

Pattern: Propagate reachability sets along edges using |=.

cpp

reach[u] |= reach[v]; // for each edge u -> v

Problems:

Subset-Sum / Knapsack

Pattern: Shift-and-OR to propagate achievable sums.

cpp

dp |= (dp << w);

Works for 0/1 knapsack feasibility. For bounded quantities, repeat shifts or use the standard binary decomposition trick.

Problems:

CF 1856E2 -- PermuTree (hard version)
AtCoder ABC221 -- Subset Sum
CSES -- Coin Combinations I (bitset variant)

Gaussian Elimination over GF(2)

Pattern: XOR row operations on bitset rows.

cpp

if (row_bs[r].test(col)) row_bs[r] ^= row_bs[pivot];

Use cases: counting XOR basis size, solving toggle puzzles, finding linear dependencies among bitmasks.

Problems:

Hamming Distance Computation

Pattern: (a ^ b).count() gives the Hamming distance between two binary strings in $O (N / 64)$ .

cpp

int dist = (a ^ b).count();

Useful for nearest-neighbor searches or clustering on binary vectors.

text

  Hamming distance with bitset XOR

  a         | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
            +---+---+---+---+---+---+---+---+
                    XOR
  b         | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
            +---+---+---+---+---+---+---+---+
                     |
                     v
  a XOR b   | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
            +---+---+---+---+---+---+---+---+
                  ^   ^           ^
                  *   *           *
  popcount --> 3 bits differ

Problems:

CF 1097F -- Alex and a TV Show

Set Intersection Counting

Pattern: (a & b).count() gives $| A \cap B |$ in $O (N / 64)$ .

Useful when you have many sets and need pairwise intersection sizes. For example: counting common friends between all pairs of users.

cpp

int common = (friends[u] & friends[v]).count();

text

  Set intersection with bitset AND

  friends of u  | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
                +---+---+---+---+---+---+---+---+
                        AND
  friends of v  | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
                +---+---+---+---+---+---+---+---+
                         |
                         v
  u AND v       | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
                +---+---+---+---+---+---+---+---+
                         ^               ^
                         * common        *
  popcount --> 2 friends in common

Problems:

CF 1242B -- 0-1 MST

Bipartite Matching Speedup

Pattern: In Hopcroft-Karp or Hungarian algorithm, use bitset to represent which vertices on the right side are still unmatched. Finding an available neighbor becomes (adj[u] & unmatched)._Find_first().

cpp

bitset<MAXN> unmatched;
unmatched.set(); // all initially unmatched
// ...
int v = (adj[u] & unmatched)._Find_first();
if (v < n) { /* match u to v */ unmatched.reset(v); }

text

  Bitset matching -- find first available neighbor

  adj of u     | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                       AND
  unmatched    | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                        |
                        v
  result       | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
               +---+---+---+---+---+---+---+---+
                         ^
                         |
               _Find_first() --> bit 3
               match u to vertex 3

Problems:

CF 1089L -- Lazyland

DP over Subsets with Fast Iteration

Pattern: Iterate over set bits of a bitset using _Find_first() / _Find_next() instead of checking every position.

cpp

for (int i = bs._Find_first(); i < N; i = bs._Find_next(i)) {
    // only visits set bits -- skips 64 zeros at a time
}

This replaces the for (int i = 0; i < N; i++) if (bs.test(i)) pattern and is significantly faster when the bitset is sparse.

Contest Cheat Sheet

text

+------------------------------------------------------------------+
|  BITSET OPTIMIZATION -- QUICK REFERENCE                          |
+------------------------------------------------------------------+
|  #include <bitset>          // or bits/stdc++.h                  |
|  bitset<N> bs;              // N must be compile-time constant    |
|                                                                  |
|  CORE OPS (all O(N/64)):                                         |
|    a |= b    a &= b    a ^= b    ~a    a <<= k    a >>= k       |
|    a.count()   a.any()   a.none()   a.all()                     |
|                                                                  |
|  SINGLE BIT (all O(1)):                                          |
|    a.set(i)  a.reset(i)  a.flip(i)  a.test(i)  a[i]            |
|                                                                  |
|  GCC EXTENSIONS:                                                 |
|    a._Find_first()         // lowest set bit index               |
|    a._Find_next(pos)       // next set bit after pos             |
|    both return N if no bit found                                 |
|                                                                  |
|  PATTERNS:                                                       |
|    Reachability:   reach[u] |= reach[v]       O(n^3/64)         |
|    Subset-sum:     dp |= (dp << w)            O(nW/64)          |
|    Gauss GF(2):    row[r] ^= row[pivot]       O(n^3/64)         |
|    Hamming dist:   (a ^ b).count()            O(N/64)           |
|    Intersection:   (a & b).count()            O(N/64)           |
+------------------------------------------------------------------+

If I Had 5 Minutes

If you have 5 minutes before a contest round and need a bitset refresher:

bitset<N> packs $N$ bools into $⌈ N / 64 ⌉$ words. All bitwise ops (|=, &=, ^=, <<=, count()) run in $O (N / 64)$ .
Subset-sum: dp |= (dp << w)—one line replaces the entire inner loop. Shift goes left (toward higher sums).
Reachability / transitive closure: reach[u] |= reach[v] in reverse topological order. $O (n^{3} / 64)$ .
GCC-only iteration: _Find_first() / _Find_next(pos) to walk set bits. Both return $N$ when exhausted.
Template size $N$ must be a compile-time constant. Use const int MAXN = ...; and bitset<MAXN>.

Before You Code Checklist

Before writing bitset code in a contest, run through these five checks:

[ ] Is the inner loop boolean? The operations must be OR, AND, XOR, or shift—not min, max, or addition.
[ ] Is $N$ in the sweet spot (1000–5000)? Below 500, brute force works. Above 5000, even $O (N^{3} / 64)$ may TLE.
[ ] Does the reduced complexity fit? Calculate: $N^{3} / 64$ or $N \cdot W / 64$ . Does it fit in $\sim 10^{8}$ operations?
[ ] Is memory OK? bitset<N> arr[N] uses $N^{2} / 8$ bytes. For $N = 5000$ : ~3 MB (fine). For $N = 50000$ : ~300 MB (too much).
[ ] Is the template size a compile-time constant? You cannot write bitset<n> where n is read from input.

Gotchas and Debugging

Template Parameter Must Be a Compile-Time Constant

cpp

int n;
cin >> n;
bitset<n> bs;  // COMPILE ERROR: n is not constexpr

You must use a fixed upper bound: bitset<MAXN>. If you need runtime-sized bit arrays, use vector<uint64_t> and do the bit packing yourself (or use vector<bitset<BLOCK>>, but that is rarely worth the complexity).

No Dynamic Resizing

Unlike vector<bool>, a bitset has fixed size. You cannot push_back or resize. Pick MAXN large enough, but be aware of memory: bitset<100000> uses about 12.5 KB, and an array of 5000 of them uses 61 MB.

`_Find_first()` and `_Find_next()` Are GCC-Only

These do not exist in MSVC or standard C++. On Codeforces and most OJs (which use GCC), they work fine. If you are on MSVC, you need to write your own using the underlying word array (not portable).

`operator[]` Returns a Proxy, Not a Bool

cpp

bitset<100> bs;
auto x = bs[5];  // x is bitset::reference, not bool
bool y = bs[5];  // OK: implicit conversion

This rarely causes bugs, but can confuse auto type deduction. Prefer bs.test(5) when you need a plain bool.

Shifts Move Bits, Not the Object

bs << 3 shifts all bits left by 3 positions (higher indices). It does NOT shift the bitset object in memory. Bit 0 becomes bit 3, bit 1 becomes bit 5, etc. Bits shifted past position $N - 1$ are lost.

text

  Before: bit positions  [4][3][2][1][0]
          values          1  0  1  1  0

  After << 2:            [4][3][2][1][0]
                          1  1  0  0  0
                          ^  ^
                          bits 0,1 moved to 2,3; old bit 4 lost

Performance Is $O (N / 64)$ , Not $O (1)$

A common mistake: assuming a |= b is $O (1)$ . It is not—it is $O (N / 64)$ . For bitset<64> it behaves like $O (1)$ , but for bitset<100000> it touches 1563 words. Factor this into your complexity analysis.

Memory: Arrays of Bitsets Add Up Fast

text

  bitset<5000> reach[5000];
  Memory: 5000 * ceil(5000/64) * 8 bytes
        = 5000 * 79 * 8
        = 3,160,000 bytes ~ 3 MB       <-- usually fine

  bitset<100000> dp[100000];
  Memory: 100000 * 1563 * 8 bytes
        ~ 1.25 GB                       <-- way over ML

Always estimate memory before declaring large bitset arrays.

Mental Traps

Trap 1—Applying bitset to non-boolean operations. You see an $O (n^{2})$ inner loop and reach for bitset. But the loop computes dp[i] = min(dp[i], dp[j] + cost[j])—a min operation with data dependencies. Bitset only speeds up independent boolean (AND/OR/XOR) operations across packed words. If the loop has min, max, addition, or depends on the previous iteration's result, bitset cannot help.

text

  WRONG -- data dependent min/max loop
  +----+----+----+----+----+----+----+----+
  | d0 | d1 | d2 | d3 | d4 | d5 | d6 | d7 |
  +----+----+----+----+----+----+----+----+
           |
           v d1 needs d0 result first
  +----+----+----+----+----+----+----+----+
  | d0 | ?? | d2 | d3 | d4 | d5 | d6 | d7 |
  +----+----+----+----+----+----+----+----+
                |
                v d2 needs d1 -- serial chain
  cannot pack into 64-bit parallel ops

  RIGHT -- independent boolean OR loop
  +--------+--------+--------+--------+
  | word 0 | word 1 | word 2 | word 3 |  <-- a
  +--------+--------+--------+--------+
      |        |        |        |
      v OR     v OR     v OR     v OR
  +--------+--------+--------+--------+
  | word 0 | word 1 | word 2 | word 3 |  <-- b
  +--------+--------+--------+--------+
  all 4 words OR in parallel

Trap 2—Assuming bitset operations are $O (1)$ . When $N = 64$ , a bitset fits in one machine word and a |= b is truly one instruction. But at $N = 10000$ , the same operation touches 157 words. Treating a |= b as $O (1)$ in your complexity analysis leads to wrong time estimates and surprise TLEs.

text

  N = 64 -- fits in 1 word
  +----------------------------------+
  |          single 64-bit word      |  1 OR instruction
  +----------------------------------+

  N = 10000 -- spans 157 words
  +------+------+------+------+------+---   ---+-------+
  |  w0  |  w1  |  w2  |  w3  |  w4  |   ...   | w156  |
  +------+------+------+------+------+---   ---+-------+
     |      |      |      |      |                 |
     v OR   v OR   v OR   v OR   v OR    ...       v OR
  157 OR instructions -- NOT constant time

Trap 3—Shifting the wrong direction in knapsack. The bitset knapsack uses dp |= (dp << w). Students often write dp >> w by mistake. Left shift moves bits to higher indices—adding weight. Right shift moves to lower indices—subtracting weight, which is wrong for 0/1 knapsack.

text

  dp before        bit0  bit1  bit2  bit3  bit4
                   +-----+-----+-----+-----+-----+
                   |  1  |  0  |  1  |  0  |  0  |
                   +-----+-----+-----+-----+-----+
                   sums reachable: 0 and 2

  WRONG  dp >> 2              RIGHT  dp << 2
  +-----+-----+-----+         +-----+-----+-----+-----+-----+
  |  1  |  0  |  0  |         |  0  |  0  |  1  |  0  |  1  |
  +-----+-----+-----+         +-----+-----+-----+-----+-----+
  bit2 lost                   sums shifted UP by 2
  bits go DOWN (subtract w)   bits go UP (add w -- correct)

The Mistake That Teaches

You're solving a problem: given $N \leq 3000$ items, find if subset-sum $W$ is achievable. You write:

cpp

bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++)
    dp |= (dp << w[i]);

It passes the samples. You submit—Wrong Answer on test 3.

You stare at the code. The logic is correct. You add debug prints: the DP bitset looks right for small cases. You stress test against a brute-force—and finally catch it: one of the weights is 0. When $w_{i} = 0$ , dp << 0 is just dp, so dp |= dp is a no-op. That's fine—weight 0 doesn't change achievable sums. But your input-reading code skips the scanf for items with w[i] == 0, causing all subsequent weights to be read incorrectly.

The lesson: the bitset part was correct. The bug was in ordinary I/O logic that you stopped checking because you trusted the "clever" part. In bitset problems, the bug is almost never in the bitset operation—it's in the setup, indexing, or edge cases around it. Always stress test the boring parts.

Self-Test

Before moving on, verify you can do these without looking at the topic file or reference material:

[ ] Write the 3-line bitset knapsack DP (dp[0] = 1; for each item: dp |= (dp << w[i])), explain why the shift goes left, and verify it on a small example by hand.
[ ] Implement Floyd-Warshall transitive closure using bitset<N> row arrays—get the reach[i] |= reach[k] propagation direction right without looking it up.
[ ] Explain the difference between bitset<1000> and vector<bool> in terms of what operations are $O (1)$ vs $O (N / 64)$ vs $O (N)$ .
[ ] State under what $N$ and $W$ conditions bitset knapsack is likely to pass vs TLE—give concrete threshold numbers.
[ ] Explain why bitset<n> where n is a runtime variable is a compile error, and state the standard workaround.
[ ] Estimate the memory usage of bitset<5000> arr[5000] without a calculator—get within 2× of the real answer.
[ ] Given a problem with $N \leq 3000$ and boolean row operations, identify whether bitset optimization applies and estimate the resulting complexity.

Scenario Drills

Scenario 1: You have a graph with $N = 4000$ nodes and need all-pairs shortest paths (not just reachability). The edge weights are 1 or 2. Can you use bitset?

Answer: Not directly. Bitset optimizes boolean operations, and shortest-path requires min/addition. However, you could use BFS layers: maintain bitset<4000> layer[k] for "nodes at distance exactly $k$ ". Propagate layer[k+1] = (neighbors of layer[k]) & ~visited. This gives all-pairs distances in $O (N^{2} \cdot D / 64)$ where $D$ is the diameter. If $D$ is small, this works.

Scenario 2: You need to count the number of distinct subset-sums (not just check if $W$ is achievable). $N = 1000$ , $W = 100000$ .

Answer: The bitset knapsack dp |= (dp << w[i]) already computes all achievable sums. After processing all items, dp.count() gives the number of distinct achievable sums in $O (W / 64)$ . This is a direct application.

Scenario 3: You're solving a 2-SAT problem with $N = 5000$ variables. The implication graph has $O (N^{2})$ edges. Transitive closure via DFS is too slow. Can bitset help?

Answer: Yes. Build the implication graph and compute transitive closure with bitset reachability in $O (N^{3} / 64)$ . For $N = 5000$ : $5000^{3} / 64 \approx 2 \times 10^{9}$ —tight but possible with the low constant of bitset ops. See SCC and 2-SAT for the standard approach.

Debug This Exercises

Find the bug in each snippet. Answers are below (hidden by a horizontal rule).

Bug 1: Subset-Sum Shift Direction

cpp

bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++) {
    dp |= (dp >> w[i]);  // <-- is this right?
}
printf("%s\n", dp.test(W) ? "YES" : "NO");

Bug 2: Transitive Closure Propagation Order

cpp

// Process in topological order (NOT reverse)
for (int i = 0; i < (int)order.size(); i++) {
    int u = order[i];
    reach[u].set(u);
    for (int v : adj[u])
        reach[u] |= reach[v];  // <-- will this work?
}

Bug 3: Iterating Set Bits—Off-by-One

cpp

bitset<1000> bs;
// ... fill bs ...
for (int i = bs._Find_first(); i <= 1000; i = bs._Find_next(i)) {
    process(i);  // <-- any problem?
}

Bug 4: Gaussian Elimination—Missing Augmented Column

cpp

int gauss(int n, int m) {
    int rank = 0;
    for (int col = 0; col <= m; col++) {  // <-- note the <= m
        int pivot = -1;
        for (int r = rank; r < n; r++)
            if (row_bs[r].test(col)) { pivot = r; break; }
        if (pivot == -1) continue;
        swap(row_bs[rank], row_bs[pivot]);
        for (int r = 0; r < n; r++)
            if (r != rank && row_bs[r].test(col))
                row_bs[r] ^= row_bs[rank];
        rank++;
    }
    return rank;
}

Debug This—Answers

Bug 1: dp >> w[i] shifts bits right (toward lower indices), which subtracts weight instead of adding it. Fix: dp |= (dp << w[i]). Left shift moves achievable sums to higher values.

Bug 2: Processing in forward topological order means when you process node $u$ , its successors $v$ haven't been processed yet—reach[v] is incomplete. Fix: process in reverse topological order so successors are finalized before predecessors.

Bug 3: The loop condition i <= 1000 should be i < 1000. _Find_first() and _Find_next() return N (the template size, 1000) when no set bit is found. With <=, you'd process index 1000, which is out of bounds. Fix: i < 1000 (or i < (int)bs.size()).

Bug 4: The loop iterates col from 0 to $m$ inclusive. Column $m$ is the augmented column (the right-hand side). You must NOT pivot on the augmented column—that would mix the RHS into the elimination and corrupt the system. Fix: col < m (strict less-than).

Practice Problems

#	Problem	Source	Difficulty	Key Idea
1	Reachability Queries	CSES	1600	Transitive closure with bitset
2	Square Subsets	CF 895C	1900	Gaussian elimination, count XOR bases
3	(Zero XOR Subset)-less	CF 1101G	1700	XOR linear basis with bitset
4	0-1 MST	CF 1242B	1900	BFS with bitset complement graph
5	PermuTree (hard)	CF 1856E2	2100	Subset-sum with bitset for partition
6	Alex and a TV Show	CF 1097F	2400	Bitset Mobius / divisor operations
7	Coins	CSES	1500	Knapsack feasibility with bitset
8	Developing Skills	CF 581D	1500	Greedy + bitset DP variant

What to Solve Next

If you are comfortable with the three core patterns (transitive closure, subset-sum, Gaussian GF(2)), here is where to go:

Quick-reference recipes—Bitset Tricks is the contest-ready cheat sheet companion to this file. Skim it before your next round.
Solidify recognition—work through the Practice Ladder (1100–1400) problems that involve boolean DP. Practice spotting the "/64 signal" before reading the editorial.
Deepen bit-level fluency—revisit Bit Manipulation for the per-element tricks (isolating lowest set bit, popcount idioms) that complement bitset bulk operations.
Bitset meets knapsack—DP—Knapsack uses bitset left-shift to turn $O (n W)$ feasibility DP into $O (n W / 64)$ .
Learn bitmask DP—DP—Bitmask covers the $O (2^{n})$ state-space approach where the mask is the DP state, not a packed boolean array. Know which is which.
Explore graph structures—bitset reachability connects directly to SCC and 2-SAT and Floyd-Warshall transitive closure. Understanding those algorithms makes the bitset acceleration feel natural.
Contest decision-making—the Data Structure Selection Guide helps you decide between bitset, segment tree, and other tools under time pressure.

Bitset Optimization ​

Contents ​

When O(n2) is barely too slow ​

64 Bits at a Time ​

Memory Layout: Bitset vs. Bool Array ​

How a |= b Works Under the Hood ​

The Aha Moment ​

Transitive Closure in 32 Operations per Edge ​

Subset-Sum in O(nW/64) ​

The Shift in Action ​

Gaussian Elimination, but Everything is XOR ​

Row Operations on Bitset Rows ​

When to Reach for Bitset ​

Trigger Phrases ​

What the Code Won't Teach You ​

When NOT to Use Bitset ​

When n is Small Enough That Regular Arrays Are Fine ​

When You Need Individual Element Access More Than Bulk Operations ​

When Memory Isn't the Bottleneck ​

When the Constant Factor of Bitset Overhead Isn't Worth It ​

When Bit Manipulation per Element Is Needed ​

Summary: Skip Bitset When... ​

std::bitset API Reference ​

Implementation: Transitive Closure ​

Implementation: Subset-Sum ​

Recovering the Subset ​

Implementation: Gaussian Elimination over GF(2) ​

Complexity Reference ​

Problem Patterns ​

Transitive Closure / Reachability ​

Subset-Sum / Knapsack ​

Gaussian Elimination over GF(2) ​

Hamming Distance Computation ​

Set Intersection Counting ​

Bipartite Matching Speedup ​

DP over Subsets with Fast Iteration ​

Contest Cheat Sheet ​

If I Had 5 Minutes ​

Before You Code Checklist ​

Gotchas and Debugging ​

Template Parameter Must Be a Compile-Time Constant ​

No Dynamic Resizing ​

_Find_first() and _Find_next() Are GCC-Only ​

operator[] Returns a Proxy, Not a Bool ​

Shifts Move Bits, Not the Object ​

Performance Is O(N/64), Not O(1) ​

Memory: Arrays of Bitsets Add Up Fast ​

Mental Traps ​

The Mistake That Teaches ​

Self-Test ​

Scenario Drills ​

Debug This Exercises ​

Bug 1: Subset-Sum Shift Direction ​

Bug 2: Transitive Closure Propagation Order ​

Bug 3: Iterating Set Bits—Off-by-One ​

Bug 4: Gaussian Elimination—Missing Augmented Column ​

Debug This—Answers ​

Practice Problems ​

What to Solve Next ​

Further Reading ​

Bitset Optimization

Contents

When $O (n^{2})$ is barely too slow

64 Bits at a Time

Memory Layout: Bitset vs. Bool Array

How `a |= b` Works Under the Hood

The Aha Moment

Transitive Closure in 32 Operations per Edge

Subset-Sum in $O (n W / 64)$

The Shift in Action

Gaussian Elimination, but Everything is XOR

Row Operations on Bitset Rows

When to Reach for Bitset

Trigger Phrases

What the Code Won't Teach You

When NOT to Use Bitset

When $n$ is Small Enough That Regular Arrays Are Fine

When You Need Individual Element Access More Than Bulk Operations

When Memory Isn't the Bottleneck

When the Constant Factor of Bitset Overhead Isn't Worth It

When Bit Manipulation per Element Is Needed

Summary: Skip Bitset When...

std::bitset API Reference

Implementation: Transitive Closure

Implementation: Subset-Sum

Recovering the Subset

Implementation: Gaussian Elimination over GF(2)

Complexity Reference

Problem Patterns

Transitive Closure / Reachability

Subset-Sum / Knapsack

Gaussian Elimination over GF(2)

Hamming Distance Computation

Set Intersection Counting

Bipartite Matching Speedup

DP over Subsets with Fast Iteration

Contest Cheat Sheet

If I Had 5 Minutes

Before You Code Checklist

Gotchas and Debugging

Template Parameter Must Be a Compile-Time Constant

No Dynamic Resizing

`_Find_first()` and `_Find_next()` Are GCC-Only

`operator[]` Returns a Proxy, Not a Bool

Shifts Move Bits, Not the Object

Performance Is $O (N / 64)$ , Not $O (1)$

Memory: Arrays of Bitsets Add Up Fast

Mental Traps

The Mistake That Teaches

Self-Test

Scenario Drills

Debug This Exercises

Bug 1: Subset-Sum Shift Direction

Bug 2: Transitive Closure Propagation Order

Bug 3: Iterating Set Bits—Off-by-One

Bug 4: Gaussian Elimination—Missing Augmented Column

Debug This—Answers

Practice Problems

What to Solve Next

Further Reading