Appearance
Bitset Optimization
Quick Revisit
- USE WHEN: O(n^2) boolean DP is barely TLE and operations are AND/OR/XOR on bit vectors
- INVARIANT: bitset packs N bits into N/64 words; each bitwise op processes 64 bits per instruction
- TIME: O(N/64) per bitwise op | SPACE: O(N/64)
- CLASSIC BUG: bitset size N must be a compile-time constant—cannot use runtime variable
- PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md
std::bitset gives you
Scope note: This file teaches when and how
std::bitsetworks—memory layout, the /64 speedup, and core patterns (transitive closure, subset-sum, Gaussian elimination). For a contest-ready cheat sheet of bitset recipes and quick-reference snippets, see Bitset Tricks.
Difficulty: Intermediate | CF Rating: 1500–1800 | Prerequisites: Bit Manipulation
Bit-parallel techniques date to 1960s IBM mainframes, where word-level boolean operations on packed bit vectors were a primary optimization. The C++ std::bitset standardized this idea in C++98, giving competitive programmers a one-line way to exploit 64-wide parallelism. Think of a bitset as a row of 64 light switches on a single panel: you flip all 64 with one hand instead of flipping each bulb individually. The fingerprint to burn into memory: O(n²) TLE + boolean operations → bitset /64. A useful mnemonic: "Pack it, OR it, ship it—64 at a time."
Contents
- When O(n^2) is barely too slow
- 64 Bits at a Time
- Transitive Closure in 32 Operations per Edge
- Subset-Sum in O(nW/64)
- Gaussian Elimination, but Everything is XOR
- When to Reach for Bitset
- When NOT to Use Bitset
- std::bitset API Reference
- Implementation: Transitive Closure
- Implementation: Subset-Sum
- Implementation: Gaussian Elimination over GF(2)
- Complexity Reference
- Problem Patterns
- Contest Cheat Sheet
- If I Had 5 Minutes
- Before You Code Checklist
- Gotchas and Debugging
- The Mistake That Teaches
- Self-Test
- Scenario Drills
- Debug This Exercises
- Practice Problems
- What to Solve Next
- Further Reading
When is barely too slow
Here is a problem you will meet sooner or later:
Given a directed graph on
nodes, determine for every pair whether can reach .
The naive plan: run a DFS/BFS from each node. Each traversal touches
A smarter plan: process nodes in reverse topological order and propagate reachability sets. For each edge reach[i] = reach[i] U reach[j]. If we store reach[i] as a std::set or std::vector<bool>, each union costs
The question: can we shave off a constant factor of 64?
64 Bits at a Time
A std::bitset<N> packs a |= b, the compiler emits one OR instruction per word. For
That is 32 OR instructions instead of 2000 individual flag updates. The speedup factor is roughly 64×—and that turns
Memory Layout: Bitset vs. Bool Array
text
bool reach[2000]:
+---+---+---+---+---+---+---+---+---+---+---+-- --+---+
| 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 |
+---+---+---+---+---+---+---+---+---+---+---+-- --+---+
1 byte each --> 2000 bytes total (wasted: 7 bits per byte)
bitset<2000> reach:
+----------------+----------------+----------------+--- ---+----------------+
| word 0 (64b) | word 1 (64b) | word 2 (64b) | ... | word 31 (64b) |
| bits 0..63 | bits 64..127 | bits 128..191 | | bits 1984..2047|
+----------------+----------------+----------------+--- ---+----------------+
8 bytes each --> 256 bytes total (8x denser)The density matters for two reasons: less memory used, and better cache utilization. But the real win is that bitwise OR/AND/XOR on the entire set costs
How a |= b Works Under the Hood
text
a: |0110...1001|1010...0011|0001...1110| ... |1100...0001|
word 0 word 1 word 2 word 31
b: |1001...0110|0101...1100|1110...0001| ... |0011...1110|
word 0 word 1 word 2 word 31
a|=b:|1111...1111|1111...1111|1111...1111| ... |1111...1111|
1 CPU OR 1 CPU OR 1 CPU OR 1 CPU OR
= 32 total OR instructions for 2000 bitsThe Aha Moment
The key insight: a boolean array is already a bitset—you are just wasting 63 bits per element. Every bool in a bool[] occupies 8 bits but stores only 1 bit of information.
When you replace bool reach[2000] with bitset<2000>, you are not changing the algorithm. You are telling the compiler to stop wasting 7/8 of every byte and to process 64 booleans per CPU instruction instead of one.
The "aha" is recognizing that any inner loop doing a[i] = a[i] | b[i] across a boolean array is secretly 64 independent OR operations packed into one—if only you let the compiler see it.
Let's see that 64× speedup in action, starting with the motivating problem: transitive closure.
Transitive Closure in 32 Operations per Edge
Process nodes in reverse topological order. For each edge reach[i] |= reach[j]. Every node is reachable from itself, so initialize reach[i].set(i).
This is the bitset-accelerated variant of Floyd-Warshall transitive closure. For the full per-element bit tricks used in pivot selection, see Bit Manipulation.
Walk through a small example—6 nodes, edges in a DAG:
text
Graph:
0 ---> 1 ---> 3
| ^
v /
2 ---+----> 4 ---> 5Edges: 0->1, 0->2, 1->3, 2->4, 4->3, 4->5
Reverse topological order: 3, 5, 4, 1, 2, 0
text
Step 1: node 3 (no outgoing edges)
reach[3] = {3} = 001000
Step 2: node 5 (no outgoing edges)
reach[5] = {5} = 100000
Step 3: node 4 (edges 4->3, 4->5)
reach[4] = {4} = 010000
reach[4] |= reach[3] -> {3,4} = 011000
reach[4] |= reach[5] -> {3,4,5}= 111000
Step 4: node 1 (edge 1->3)
reach[1] = {1} = 000010
reach[1] |= reach[3] -> {1,3} = 001010
Step 5: node 2 (edge 2->4)
reach[2] = {2} = 000100
reach[2] |= reach[4] -> {2,3,4,5} = 111100
Step 6: node 0 (edges 0->1, 0->2)
reach[0] = {0} = 000001
reach[0] |= reach[1] -> {0,1,3} = 001011
reach[0] |= reach[2] -> {0,1,2,3,4,5} = 111111Total work: 6 merge operations, each touching
Subset-Sum in
Classic problem: given
The subset-sum DP here is a special case of DP—Knapsack. For bitmask-based DP where the mask itself is the state (not a bitset optimization), see DP—Bitmask.
Standard DP: dp[i] = true if sum dp[i] |= dp[i-w]. That is
With bitset: represent dp as a bitset<W+1>. Bit
One line. The left-shift moves every achievable sum up by
The Shift in Action
Start with weights
text
Initial: dp = 10000000000 (only sum 0 is achievable)
^
bit 0
Weight w=2: dp << 2 = 00100000000
^
bit 2
dp |= (dp << 2):
dp = 10100000000 (sums: 0, 2)
Weight w=3: dp << 3 = 00010100000
dp |= (dp << 3):
dp = 10110100000 (sums: 0, 2, 3, 5)
Weight w=5: dp << 5 = 00000101101
dp |= (dp << 5):
dp = 10110101101 (sums: 0, 2, 3, 5, 7, 8, 10)
* *
we can make 10!Reading right to left: position 0 = achievable, position 2 = achievable, and so on. The final bitset tells us every reachable sum in one shot.
Gaussian Elimination, but Everything is XOR
A system of linear equations over GF(2) (the field with elements {0, 1}, addition = XOR, multiplication = AND) arises in problems about:
- toggling switches (flip a light, and its neighbors flip too)
- linear dependence of bitmasks
- counting solutions to XOR systems
For the general (non-GF(2)) version of Gaussian elimination, see Gaussian Elimination. For building an XOR basis without the full matrix, see Linear Basis (XOR Basis).
Each equation is a row of 0/1 coefficients. Elimination step: XOR one row into another to clear a column. With bitset rows, one elimination step costs
Row Operations on Bitset Rows
text
Eliminate column 2 from row B using row A:
Row A: |1 0 1 1 0 0 | 1| <-- pivot row (column 2 set)
Row B: |0 1 1 0 1 0 | 0| <-- column 2 is set, need to clear
Row B ^= Row A:
Row B': |1 1 0 1 1 0 | 1| <-- column 2 cleared
With bitset: rows[b] ^= rows[a]; // O(n/64) per row opThe augmented column (rightmost) is included in the bitset, so the entire elimination needs no special-casing.
The three examples above—transitive closure, subset-sum, and Gaussian elimination—share the same structure. Here is how to tell when a new problem fits the pattern.
When to Reach for Bitset
Bitset optimization is not universal. It helps when:
- The problem is already
or and the inner loop does bitwise/set operations on boolean arrays. is in the sweet spot: 1000–5000. Below 1000, even is fine. Above 5000, even might TLE. - The operations are bitwise—OR, AND, XOR, shift, popcount. You cannot use bitset to speed up addition or min/max operations.
Common sweet spots:
text
+--------------------+----------+------------+------------------+
| Problem | Naive | With bitset| n sweet spot |
+--------------------+----------+------------+------------------+
| Transitive closure | O(n^3) | O(n^3/64) | n <= 2000-4000 |
| Subset-sum | O(nW) | O(nW/64) | W <= 10^5-10^6 |
| Gaussian elim GF2 | O(n^3) | O(n^3/64) | n <= 2000-4000 |
| Hamming distance | O(n*m) | O(n*m/64) | n*m <= 10^8 |
| Set intersection | O(n) | O(n/64) | many queries |
+--------------------+----------+------------+------------------+Trigger Phrases
These problem-statement patterns should make you think "bitset":
- "reachability in a dense graph" with
—transitive closure via reach[u] |= reach[v] - "subset-sum with large W" (
up to – )—shift-and-OR DP - "XOR system of equations" or "toggle puzzle"—Gaussian elimination over GF(2)
- "Hamming distance between binary strings"—
(a ^ b).count() - "count common elements" across many set pairs—
(a & b).count() - "bipartite matching on dense graph" with
—bitset-accelerated augmenting paths - boolean DP that is barely TLE at
or —the /64 constant shaves it to AC
The unifying signal: an inner loop that does boolean operations on a vector of bits, and
What the Code Won't Teach You
The implementations above show how to apply bitset optimization. What they cannot convey is the judgment call: should I use this here?
The 64× speedup is real but the constant matters. At
You save a factor of 64, not a change in the exponent. Before reaching for bitset, check whether the reduced complexity actually fits within the time limit.
text
N = 5000 complexity check
+------------------+-----------------+----------+
| Algorithm | Operations | Verdict |
+------------------+-----------------+----------+
| n*n / 64 | 390K | FAST |
+------------------+-----------------+----------+
| n*n*n / 64 | 2 * 10^9 | TLE |
+------------------+-----------------+----------+
| n*n | 25 * 10^6 | OK |
+------------------+-----------------+----------+
| n*n*n | 125 * 10^9 | TLE |
+------------------+-----------------+----------+If you see
text
Signal detection for bitset problems
+--- Is the inner loop boolean ops ---+
| |
v YES v NO
| |
+--- N in 1000 to 5000 ---+ +---> skip bitset
| |
v YES v NO
| |
* TRY BITSET * +---> probably not neededBitset optimization is a competitive programming trick, not a production technique. In real software, you would use SIMD intrinsics, parallelism, or algorithmic improvements instead of std::bitset. In contests, bitset is the fastest path from TLE to AC for boolean DP problems with large
See also: Data Structure Selection Guide for choosing between bitset, bitmask DP, and plain arrays under contest pressure.
When NOT to Use Bitset
Knowing when bitset doesn't help is as important as knowing when it does. Compare the decision table in the Data Structure Selection Guide.
When is Small Enough That Regular Arrays Are Fine
If /64 speedup is pointless when the naive approach already fits comfortably in the time limit. Adding bitset to a problem that doesn't need it just makes your code harder to read and debug.
text
n = 500: O(n^3) = 1.25 * 10^8 <-- fits in ~1 sec
n = 1000: O(n^2) = 10^6 <-- trivial
--> no bitset needed for these sizesWhen You Need Individual Element Access More Than Bulk Operations
Bitset shines at bulk operations (|=, &=, count(), <<=). But if your inner loop does if (bs.test(i)) { ... complex logic per element ... }, you are not benefiting from the 64-wide parallelism—you are just using a memory-efficient bool array. If most operations are single-bit reads/writes with per-element branching, a plain bool[] or vector<bool> is simpler and equally fast.
When Memory Isn't the Bottleneck
Bitset saves memory (8× denser than bool[]). But if your arrays are small or you have plenty of memory headroom, the density advantage is irrelevant. Don't use bitset purely for memory savings unless you are actually near the memory limit.
When the Constant Factor of Bitset Overhead Isn't Worth It
std::bitset operations carry overhead: shifts involve copying and shifting across word boundaries, count() calls popcount on every word. For small int or long long bitmask. If your mask fits in 64 bits, use uint64_t with raw bit ops—it will be faster and simpler.
text
N <= 64: use uint64_t with __builtin_popcountll()
N <= 128: consider two uint64_t values
N > 128: bitset<N> starts to pay offWhen Bit Manipulation per Element Is Needed
std::bitset does not support per-element arithmetic, comparisons, or conditional operations across packed words. You cannot do "for each set bit, add its index to a sum" in
- Per-element min/max/addition across two arrays
- Conditional updates based on each element's value
- Arithmetic operations (not boolean) on packed data
...then bitset is the wrong tool. You need SIMD intrinsics or just a plain array.
Summary: Skip Bitset When...
text
+------------------------------------------------------+----------+
| Situation | Use what |
+------------------------------------------------------+----------+
| n <= 500, O(n^3) fits | plain [] |
| n <= 64, need bit ops | uint64_t |
| Inner loop is per-element branching | bool[] |
| Operations are min/max/addition, not boolean | int[] |
| Memory is not a concern, n is small | plain [] |
| Need to iterate set bits with complex per-bit logic | uint64_t |
+------------------------------------------------------+----------+std::bitset API Reference
All in <bitset> (included via <bits/stdc++.h>).
| Function / Operator | What it does | Time |
|---|---|---|
bitset<N>() | Construct all-zero bitset of | |
bitset<N>(unsigned long long v) | Construct from integer | |
bitset<N>(string s) | Construct from "010110" string | |
a & b, a &= b | Bitwise AND | |
a | b, a |= b | Bitwise OR | |
a ^ b, a ^= b | Bitwise XOR | |
~a | Bitwise NOT (flip all) | |
a << k, a <<= k | Left shift by | |
a >> k, a >>= k | Right shift by | |
a == b, a != b | Equality comparison | |
a.count() | Number of set bits (popcount) | |
a.any() | True if any bit is set | |
a.none() | True if no bit is set | |
a.all() | True if all bits are set | |
a.test(pos) | Check bit at position pos | |
a.set() | Set all bits to 1 | |
a.set(pos) | Set bit at pos to 1 | |
a.set(pos, val) | Set bit at pos to val | |
a.reset() | Clear all bits to 0 | |
a.reset(pos) | Clear bit at pos | |
a.flip() | Flip all bits | |
a.flip(pos) | Flip bit at pos | |
a[pos] | Access bit (returns proxy reference) | |
a.size() | Returns | |
a.to_string() | Convert to "010110..." string | |
a.to_ulong() | Convert to unsigned long | |
a.to_ullong() | Convert to unsigned long long | |
a._Find_first() | Index of lowest set bit (GCC only) | |
a._Find_next(pos) | Index of next set bit after pos (GCC) |
_Find_first() and _Find_next() return N (the template size) when no set bit is found. They are essential for iterating over set bits:
cpp
for (int i = bs._Find_first(); i < (int)bs.size(); i = bs._Find_next(i)) {
// process set bit i
}Implementation: Transitive Closure
cpp
#include <bits/stdc++.h>
using namespace std;
const int MAXN = 2005;
bitset<MAXN> reach[MAXN];
int main() {
int n, m;
scanf("%d %d", &n, &m);
vector<vector<int>> adj(n);
vector<int> indeg(n, 0);
for (int i = 0; i < m; i++) {
int u, v;
scanf("%d %d", &u, &v);
adj[u].push_back(v);
indeg[v]++;
}
// topological sort (Kahn's)
queue<int> q;
// Invariant: q contains all nodes with current indeg == 0
for (int i = 0; i < n; i++)
if (indeg[i] == 0) q.push(i);
vector<int> order;
// Invariant: order[0..len-1] is a valid topological prefix
while (!q.empty()) {
int u = q.front(); q.pop();
order.push_back(u);
// Invariant: decrementing indeg for u's neighbors; new zero-indeg nodes enqueued
for (int v : adj[u])
if (--indeg[v] == 0) q.push(v);
}
// propagate in reverse topological order
// Invariant: reach[u] is complete for all u processed so far (later in topo order)
for (int i = (int)order.size() - 1; i >= 0; i--) {
int u = order[i];
reach[u].set(u); // u reaches itself
// Invariant: reach[v] is already final for each neighbor v (processed earlier)
for (int v : adj[u])
reach[u] |= reach[v]; // O(n/64) per edge
}
// answer queries: reach[u].test(v)
// Invariant: row i is fully printed before moving to row i+1
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++)
putchar(reach[i].test(j) ? '1' : '0');
putchar('\n');
}
}Total:
Implementation: Subset-Sum
cpp
#include <bits/stdc++.h>
using namespace std;
const int MAXW = 100001;
int main() {
int n, W;
scanf("%d %d", &n, &W);
bitset<MAXW> dp;
dp.set(0); // sum 0 is achievable
// Invariant: dp[s] == 1 iff sum s is achievable using items 0..i-1
for (int i = 0; i < n; i++) {
int w;
scanf("%d", &w);
dp |= (dp << w); // O(W/64) per weight
}
puts(dp.test(W) ? "YES" : "NO");
}Total:
Recovering the Subset
If you need to print which items were chosen, track a bitset per item (or work backwards):
cpp
#include <bits/stdc++.h>
using namespace std;
const int MAXW = 100001;
int main() {
int n, W;
scanf("%d %d", &n, &W);
vector<int> w(n);
for (int i = 0; i < n; i++) scanf("%d", &w[i]);
bitset<MAXW> dp;
dp.set(0);
// remember which items were used
vector<bitset<MAXW>> after(n);
// Invariant: after[i] = dp state after considering items 0..i
for (int i = 0; i < n; i++) {
dp |= (dp << w[i]);
after[i] = dp;
}
if (!dp.test(W)) { puts("NO"); return 0; }
// backtrack
puts("YES");
int cur = W;
// Invariant: cur = remaining sum to reconstruct using items 0..i
for (int i = n - 1; i >= 0; i--) {
// was item i used? check if cur-w[i] was achievable before item i
bitset<MAXW>& prev = (i > 0) ? after[i - 1] : (dp.reset(), dp.set(0), dp);
if (cur >= w[i] && prev.test(cur - w[i])) {
printf("%d ", w[i]);
cur -= w[i];
}
}
puts("");
}Implementation: Gaussian Elimination over GF(2)
Find the rank (or a solution) of a system of XOR equations. Each equation:
cpp
#include <bits/stdc++.h>
using namespace std;
const int MAXN = 2005;
bitset<MAXN> row_bs[MAXN]; // row_bs[i] stores coefficients + augmented bit
int gauss(int n, int m) {
// n equations, m variables
// augmented bit is at position m
int rank = 0;
// Invariant: rows 0..rank-1 are pivot rows with leading 1 in distinct columns
for (int col = 0; col < m; col++) {
// find pivot: a row in [rank..n) with bit col set
int pivot = -1;
// Invariant: rows 0..rank-1 already have pivots; search only rank..n-1
for (int r = rank; r < n; r++) {
if (row_bs[r].test(col)) { pivot = r; break; }
}
if (pivot == -1) continue; // no pivot for this column, skip
swap(row_bs[rank], row_bs[pivot]);
// Safety: verify the pivot row is non-zero (guards against subtle bugs
// if test() and swap logic are ever refactored)
if (!row_bs[rank].any()) continue;
// Invariant: row_bs[rank] has bit col set; eliminate col from all other rows
for (int r = 0; r < n; r++) {
if (r != rank && row_bs[r].test(col))
row_bs[r] ^= row_bs[rank]; // O(m/64)
}
rank++;
}
return rank; // rank may be 0 if all rows were zero
}
int main() {
int n, m;
scanf("%d %d", &n, &m);
for (int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
int x; scanf("%d", &x);
if (x) row_bs[i].set(j);
}
int b; scanf("%d", &b);
if (b) row_bs[i].set(m); // augmented column
}
int rank = gauss(n, m);
printf("Rank = %d\n", rank);
if (rank == 0) {
// All rows are zero (or were zeroed out). Check if any had a
// non-zero augmented bit -- that means 0 = 1, inconsistent.
for (int r = 0; r < n; r++) {
if (row_bs[r].test(m)) {
puts("No solution");
return 0;
}
}
puts("Solution exists (all variables are free)");
return 0;
}
// check consistency: if any row has all-zero coefficients but set augmented bit
for (int r = rank; r < n; r++) {
if (row_bs[r].test(m)) {
puts("No solution");
return 0;
}
}
puts("Solution exists");
}Complexity Reference
The table below summarizes the cost and space of every bitset operation and the three canonical algorithms that benefit from the
| Operation | Time | Space |
|---|---|---|
Construct bitset<N> | ||
a |= b, a &= b, a ^= b | -- | |
a <<= k, a >>= k | -- | |
a.count() | -- | |
_Find_first(), _Find_next() | -- | |
| Transitive closure ( | ||
| Subset-sum ( | ||
| Gaussian elimination GF(2) ( | ||
| Hamming distance (two bitsets) | -- | |
| Set intersection size | -- |
With the reference table as a guide, here are the recurring problem shapes where bitset optimization pays off. For the underlying bit-level tricks used inside these patterns, see Bit Manipulation.
Problem Patterns
Transitive Closure / Reachability
Pattern: Propagate reachability sets along edges using |=.
cpp
reach[u] |= reach[v]; // for each edge u -> vProblems:
Subset-Sum / Knapsack
Pattern: Shift-and-OR to propagate achievable sums.
cpp
dp |= (dp << w);Works for 0/1 knapsack feasibility. For bounded quantities, repeat shifts or use the standard binary decomposition trick.
Problems:
- CF 1856E2 -- PermuTree (hard version)
- AtCoder ABC221 -- Subset Sum
- CSES -- Coin Combinations I (bitset variant)
Gaussian Elimination over GF(2)
Pattern: XOR row operations on bitset rows.
cpp
if (row_bs[r].test(col)) row_bs[r] ^= row_bs[pivot];Use cases: counting XOR basis size, solving toggle puzzles, finding linear dependencies among bitmasks.
Problems:
Hamming Distance Computation
Pattern: (a ^ b).count() gives the Hamming distance between two binary strings in
cpp
int dist = (a ^ b).count();Useful for nearest-neighbor searches or clustering on binary vectors.
text
Hamming distance with bitset XOR
a | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
XOR
b | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+
|
v
a XOR b | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
+---+---+---+---+---+---+---+---+
^ ^ ^
* * *
popcount --> 3 bits differProblems:
Set Intersection Counting
Pattern: (a & b).count() gives
Useful when you have many sets and need pairwise intersection sizes. For example: counting common friends between all pairs of users.
cpp
int common = (friends[u] & friends[v]).count();text
Set intersection with bitset AND
friends of u | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
AND
friends of v | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
+---+---+---+---+---+---+---+---+
|
v
u AND v | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
^ ^
* common *
popcount --> 2 friends in commonProblems:
Bipartite Matching Speedup
Pattern: In Hopcroft-Karp or Hungarian algorithm, use bitset to represent which vertices on the right side are still unmatched. Finding an available neighbor becomes (adj[u] & unmatched)._Find_first().
cpp
bitset<MAXN> unmatched;
unmatched.set(); // all initially unmatched
// ...
int v = (adj[u] & unmatched)._Find_first();
if (v < n) { /* match u to v */ unmatched.reset(v); }text
Bitset matching -- find first available neighbor
adj of u | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
AND
unmatched | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
|
v
result | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
+---+---+---+---+---+---+---+---+
^
|
_Find_first() --> bit 3
match u to vertex 3Problems:
DP over Subsets with Fast Iteration
Pattern: Iterate over set bits of a bitset using _Find_first() / _Find_next() instead of checking every position.
cpp
for (int i = bs._Find_first(); i < N; i = bs._Find_next(i)) {
// only visits set bits -- skips 64 zeros at a time
}This replaces the for (int i = 0; i < N; i++) if (bs.test(i)) pattern and is significantly faster when the bitset is sparse.
Contest Cheat Sheet
text
+------------------------------------------------------------------+
| BITSET OPTIMIZATION -- QUICK REFERENCE |
+------------------------------------------------------------------+
| #include <bitset> // or bits/stdc++.h |
| bitset<N> bs; // N must be compile-time constant |
| |
| CORE OPS (all O(N/64)): |
| a |= b a &= b a ^= b ~a a <<= k a >>= k |
| a.count() a.any() a.none() a.all() |
| |
| SINGLE BIT (all O(1)): |
| a.set(i) a.reset(i) a.flip(i) a.test(i) a[i] |
| |
| GCC EXTENSIONS: |
| a._Find_first() // lowest set bit index |
| a._Find_next(pos) // next set bit after pos |
| both return N if no bit found |
| |
| PATTERNS: |
| Reachability: reach[u] |= reach[v] O(n^3/64) |
| Subset-sum: dp |= (dp << w) O(nW/64) |
| Gauss GF(2): row[r] ^= row[pivot] O(n^3/64) |
| Hamming dist: (a ^ b).count() O(N/64) |
| Intersection: (a & b).count() O(N/64) |
+------------------------------------------------------------------+If I Had 5 Minutes
If you have 5 minutes before a contest round and need a bitset refresher:
bitset<N>packsbools into words. All bitwise ops ( |=,&=,^=,<<=,count()) run in. - Subset-sum:
dp |= (dp << w)—one line replaces the entire inner loop. Shift goes left (toward higher sums). - Reachability / transitive closure:
reach[u] |= reach[v]in reverse topological order.. - GCC-only iteration:
_Find_first()/_Find_next(pos)to walk set bits. Both returnwhen exhausted. - Template size
must be a compile-time constant. Use const int MAXN = ...;andbitset<MAXN>.
Before You Code Checklist
Before writing bitset code in a contest, run through these five checks:
- [ ] Is the inner loop boolean? The operations must be OR, AND, XOR, or shift—not min, max, or addition.
- [ ] Is
in the sweet spot (1000–5000)? Below 500, brute force works. Above 5000, even may TLE. - [ ] Does the reduced complexity fit? Calculate:
or . Does it fit in operations? - [ ] Is memory OK?
bitset<N> arr[N]usesbytes. For : ~3 MB (fine). For : ~300 MB (too much). - [ ] Is the template size a compile-time constant? You cannot write
bitset<n>wherenis read from input.
Gotchas and Debugging
Template Parameter Must Be a Compile-Time Constant
cpp
int n;
cin >> n;
bitset<n> bs; // COMPILE ERROR: n is not constexprYou must use a fixed upper bound: bitset<MAXN>. If you need runtime-sized bit arrays, use vector<uint64_t> and do the bit packing yourself (or use vector<bitset<BLOCK>>, but that is rarely worth the complexity).
No Dynamic Resizing
Unlike vector<bool>, a bitset has fixed size. You cannot push_back or resize. Pick MAXN large enough, but be aware of memory: bitset<100000> uses about 12.5 KB, and an array of 5000 of them uses 61 MB.
_Find_first() and _Find_next() Are GCC-Only
These do not exist in MSVC or standard C++. On Codeforces and most OJs (which use GCC), they work fine. If you are on MSVC, you need to write your own using the underlying word array (not portable).
operator[] Returns a Proxy, Not a Bool
cpp
bitset<100> bs;
auto x = bs[5]; // x is bitset::reference, not bool
bool y = bs[5]; // OK: implicit conversionThis rarely causes bugs, but can confuse auto type deduction. Prefer bs.test(5) when you need a plain bool.
Shifts Move Bits, Not the Object
bs << 3 shifts all bits left by 3 positions (higher indices). It does NOT shift the bitset object in memory. Bit 0 becomes bit 3, bit 1 becomes bit 5, etc. Bits shifted past position
text
Before: bit positions [4][3][2][1][0]
values 1 0 1 1 0
After << 2: [4][3][2][1][0]
1 1 0 0 0
^ ^
bits 0,1 moved to 2,3; old bit 4 lostPerformance Is , Not
A common mistake: assuming a |= b is bitset<64> it behaves like bitset<100000> it touches 1563 words. Factor this into your complexity analysis.
Memory: Arrays of Bitsets Add Up Fast
text
bitset<5000> reach[5000];
Memory: 5000 * ceil(5000/64) * 8 bytes
= 5000 * 79 * 8
= 3,160,000 bytes ~ 3 MB <-- usually fine
bitset<100000> dp[100000];
Memory: 100000 * 1563 * 8 bytes
~ 1.25 GB <-- way over MLAlways estimate memory before declaring large bitset arrays.
Mental Traps
Trap 1—Applying bitset to non-boolean operations. You see an bitset. But the loop computes dp[i] = min(dp[i], dp[j] + cost[j])—a min operation with data dependencies. Bitset only speeds up independent boolean (AND/OR/XOR) operations across packed words. If the loop has min, max, addition, or depends on the previous iteration's result, bitset cannot help.
text
WRONG -- data dependent min/max loop
+----+----+----+----+----+----+----+----+
| d0 | d1 | d2 | d3 | d4 | d5 | d6 | d7 |
+----+----+----+----+----+----+----+----+
|
v d1 needs d0 result first
+----+----+----+----+----+----+----+----+
| d0 | ?? | d2 | d3 | d4 | d5 | d6 | d7 |
+----+----+----+----+----+----+----+----+
|
v d2 needs d1 -- serial chain
cannot pack into 64-bit parallel ops
RIGHT -- independent boolean OR loop
+--------+--------+--------+--------+
| word 0 | word 1 | word 2 | word 3 | <-- a
+--------+--------+--------+--------+
| | | |
v OR v OR v OR v OR
+--------+--------+--------+--------+
| word 0 | word 1 | word 2 | word 3 | <-- b
+--------+--------+--------+--------+
all 4 words OR in parallelTrap 2—Assuming bitset operations are a |= b is truly one instruction. But at a |= b as
text
N = 64 -- fits in 1 word
+----------------------------------+
| single 64-bit word | 1 OR instruction
+----------------------------------+
N = 10000 -- spans 157 words
+------+------+------+------+------+--- ---+-------+
| w0 | w1 | w2 | w3 | w4 | ... | w156 |
+------+------+------+------+------+--- ---+-------+
| | | | | |
v OR v OR v OR v OR v OR ... v OR
157 OR instructions -- NOT constant timeTrap 3—Shifting the wrong direction in knapsack. The bitset knapsack uses dp |= (dp << w). Students often write dp >> w by mistake. Left shift moves bits to higher indices—adding weight. Right shift moves to lower indices—subtracting weight, which is wrong for 0/1 knapsack.
text
dp before bit0 bit1 bit2 bit3 bit4
+-----+-----+-----+-----+-----+
| 1 | 0 | 1 | 0 | 0 |
+-----+-----+-----+-----+-----+
sums reachable: 0 and 2
WRONG dp >> 2 RIGHT dp << 2
+-----+-----+-----+ +-----+-----+-----+-----+-----+
| 1 | 0 | 0 | | 0 | 0 | 1 | 0 | 1 |
+-----+-----+-----+ +-----+-----+-----+-----+-----+
bit2 lost sums shifted UP by 2
bits go DOWN (subtract w) bits go UP (add w -- correct)The Mistake That Teaches
You're solving a problem: given
cpp
bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++)
dp |= (dp << w[i]);It passes the samples. You submit—Wrong Answer on test 3.
You stare at the code. The logic is correct. You add debug prints: the DP bitset looks right for small cases. You stress test against a brute-force—and finally catch it: one of the weights is 0. When dp << 0 is just dp, so dp |= dp is a no-op. That's fine—weight 0 doesn't change achievable sums. But your input-reading code skips the scanf for items with w[i] == 0, causing all subsequent weights to be read incorrectly.
The lesson: the bitset part was correct. The bug was in ordinary I/O logic that you stopped checking because you trusted the "clever" part. In bitset problems, the bug is almost never in the bitset operation—it's in the setup, indexing, or edge cases around it. Always stress test the boring parts.
Self-Test
Before moving on, verify you can do these without looking at the topic file or reference material:
- [ ] Write the 3-line bitset knapsack DP (
dp[0] = 1; for each item: dp |= (dp << w[i])), explain why the shift goes left, and verify it on a small example by hand. - [ ] Implement Floyd-Warshall transitive closure using
bitset<N>row arrays—get thereach[i] |= reach[k]propagation direction right without looking it up. - [ ] Explain the difference between
bitset<1000>andvector<bool>in terms of what operations arevs vs . - [ ] State under what
and conditions bitset knapsack is likely to pass vs TLE—give concrete threshold numbers. - [ ] Explain why
bitset<n>wherenis a runtime variable is a compile error, and state the standard workaround. - [ ] Estimate the memory usage of
bitset<5000> arr[5000]without a calculator—get within 2× of the real answer. - [ ] Given a problem with
and boolean row operations, identify whether bitset optimization applies and estimate the resulting complexity.
Scenario Drills
Scenario 1: You have a graph with
Answer: Not directly. Bitset optimizes boolean operations, and shortest-path requires min/addition. However, you could use BFS layers: maintain
bitset<4000> layer[k]for "nodes at distance exactly". Propagate layer[k+1] = (neighbors of layer[k]) & ~visited. This gives all-pairs distances inwhere is the diameter. If is small, this works.
Scenario 2: You need to count the number of distinct subset-sums (not just check if
Answer: The bitset knapsack
dp |= (dp << w[i])already computes all achievable sums. After processing all items,dp.count()gives the number of distinct achievable sums in. This is a direct application.
Scenario 3: You're solving a 2-SAT problem with
Answer: Yes. Build the implication graph and compute transitive closure with bitset reachability in
. For : —tight but possible with the low constant of bitset ops. See SCC and 2-SAT for the standard approach.
Debug This Exercises
Find the bug in each snippet. Answers are below (hidden by a horizontal rule).
Bug 1: Subset-Sum Shift Direction
cpp
bitset<100001> dp;
dp.set(0);
for (int i = 0; i < n; i++) {
dp |= (dp >> w[i]); // <-- is this right?
}
printf("%s\n", dp.test(W) ? "YES" : "NO");Bug 2: Transitive Closure Propagation Order
cpp
// Process in topological order (NOT reverse)
for (int i = 0; i < (int)order.size(); i++) {
int u = order[i];
reach[u].set(u);
for (int v : adj[u])
reach[u] |= reach[v]; // <-- will this work?
}Bug 3: Iterating Set Bits—Off-by-One
cpp
bitset<1000> bs;
// ... fill bs ...
for (int i = bs._Find_first(); i <= 1000; i = bs._Find_next(i)) {
process(i); // <-- any problem?
}Bug 4: Gaussian Elimination—Missing Augmented Column
cpp
int gauss(int n, int m) {
int rank = 0;
for (int col = 0; col <= m; col++) { // <-- note the <= m
int pivot = -1;
for (int r = rank; r < n; r++)
if (row_bs[r].test(col)) { pivot = r; break; }
if (pivot == -1) continue;
swap(row_bs[rank], row_bs[pivot]);
for (int r = 0; r < n; r++)
if (r != rank && row_bs[r].test(col))
row_bs[r] ^= row_bs[rank];
rank++;
}
return rank;
}Debug This—Answers
Bug 1: dp >> w[i] shifts bits right (toward lower indices), which subtracts weight instead of adding it. Fix: dp |= (dp << w[i]). Left shift moves achievable sums to higher values.
Bug 2: Processing in forward topological order means when you process node reach[v] is incomplete. Fix: process in reverse topological order so successors are finalized before predecessors.
Bug 3: The loop condition i <= 1000 should be i < 1000. _Find_first() and _Find_next() return N (the template size, 1000) when no set bit is found. With <=, you'd process index 1000, which is out of bounds. Fix: i < 1000 (or i < (int)bs.size()).
Bug 4: The loop iterates col from 0 to col < m (strict less-than).
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Reachability Queries | CSES | 1600 | Transitive closure with bitset |
| 2 | Square Subsets | CF 895C | 1900 | Gaussian elimination, count XOR bases |
| 3 | (Zero XOR Subset)-less | CF 1101G | 1700 | XOR linear basis with bitset |
| 4 | 0-1 MST | CF 1242B | 1900 | BFS with bitset complement graph |
| 5 | PermuTree (hard) | CF 1856E2 | 2100 | Subset-sum with bitset for partition |
| 6 | Alex and a TV Show | CF 1097F | 2400 | Bitset Mobius / divisor operations |
| 7 | Coins | CSES | 1500 | Knapsack feasibility with bitset |
| 8 | Developing Skills | CF 581D | 1500 | Greedy + bitset DP variant |
What to Solve Next
If you are comfortable with the three core patterns (transitive closure, subset-sum, Gaussian GF(2)), here is where to go:
- Quick-reference recipes—Bitset Tricks is the contest-ready cheat sheet companion to this file. Skim it before your next round.
- Solidify recognition—work through the Practice Ladder (1100–1400) problems that involve boolean DP. Practice spotting the "/64 signal" before reading the editorial.
- Deepen bit-level fluency—revisit Bit Manipulation for the per-element tricks (isolating lowest set bit, popcount idioms) that complement bitset bulk operations.
- Bitset meets knapsack—DP—Knapsack uses bitset left-shift to turn
feasibility DP into . - Learn bitmask DP—DP—Bitmask covers the
state-space approach where the mask is the DP state, not a packed boolean array. Know which is which. - Explore graph structures—bitset reachability connects directly to SCC and 2-SAT and Floyd-Warshall transitive closure. Understanding those algorithms makes the bitset acceleration feel natural.
- Contest decision-making—the Data Structure Selection Guide helps you decide between bitset, segment tree, and other tools under time pressure.
Further Reading
- cp-algorithms: Bitset—submask enumeration and related techniques
- Errichto's bitset tutorial (CF Blog)—thorough walkthrough with problems
- neal's bitset tricks (CF Blog)—advanced applications including matching
- GCC bitset source—understanding
_Find_first()and_Find_next()internals