DP -- Bitmask

AL-29 | Use an integer to represent a subset of $n$ elements, then DP over all $2^{n}$ subsets. The go-to technique when $n \leq 20$ and you need to track "which elements have been used."

Difficulty: [Intermediate]Prerequisites: DP -- 1D IntroductionCF Rating Range: 1600-2000

Quick Revisit
USE WHEN: n <= 20 elements and you need to track which elements have been used
INVARIANT: dp[mask] = optimal answer when the set of processed elements equals mask
TIME: O(2^n * n) | SPACE: O(2^n)
CLASSIC BUG: using (1 << n) with int when n >= 31 -- use (1LL << n)
PRACTICE: DP Practice Ladder

Intuition
C++ STL Reference
Implementation (Contest-Ready)
- TSP Minimal * TSP Explained
- Subset Sum / Partition * Min Cost Assignment
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging * Mental Traps
Self-Test
Practice Problems
Further Reading

Contest Frequency: ** Regular -- appears in Div 2 D/E when n <= 20

Intuition

Consider the Travelling Salesman Problem on 5 cities. The naive approach tries every ordering: $5! = 120$ permutations. Manageable. But scale to $n = 20$ and you face $20! \approx 2.4 \times 10^{18}$ permutations -- utterly impossible even at $10^{9}$ operations per second.

The root of the explosion is that brute force distinguishes the order you visited cities in, even when the only things that matter for future decisions are (a) which cities you have already visited, and (b) where you are right now. Two histories that visit the same set and end at the same city are interchangeable. Brute force ignores that and recounts every permutation separately.

Encode which cities are visited as a bitmask -- dp[mask][last] = best cost to visit exactly the cities in mask, ending at last. This reduces permutations to $2^{n} \times n$ states.

Think of a checklist with $n$ boxes. Each box is either ticked (visited) or unticked. An integer's binary representation is exactly such a checklist: bit $i$ is 1 if city $i$ is ticked. Two travellers who have ticked the same boxes and stand in the same city are in identical situations -- we only need to keep the best cost among them. The bitmask is the checklist.

For $n = 20$ : $2^{20} \times 20 \approx 2 \times 10^{7}$ states, each with at most $n$ transitions. That is roughly $4 \times 10^{8}$ work -- tight but feasible in 1-2 seconds.

Visual Walkthrough

Five cities: $0, 1, 2, 3, 4$ . Start at city 0. The bitmask has 5 bits; bit $i$ set means city $i$ visited.

  Step 0 -- Base case
  mask = 00001 (decimal 1), last = 0
  Meaning: only city 0 visited, standing at 0.  Cost = 0.

  Step 1 -- Expand from (00001, 0)
  Try adding city 1:  mask = 00011 (3),  last = 1,  cost = dist[0][1]
  Try adding city 2:  mask = 00101 (5),  last = 2,  cost = dist[0][2]
  Try adding city 3:  mask = 01001 (9),  last = 3,  cost = dist[0][3]
  Try adding city 4:  mask = 10001 (17), last = 4,  cost = dist[0][4]

  Step 2 -- Expand from (00011, 1)   [cities {0,1} visited, at city 1]
  Try adding city 2:  mask = 00111 (7),  last = 2,  cost = dp[00011][1] + dist[1][2]
  Try adding city 3:  mask = 01011 (11), last = 3,  cost = dp[00011][1] + dist[1][3]
  Try adding city 4:  mask = 10011 (19), last = 4,  cost = dp[00011][1] + dist[1][4]
  (and similarly from (00101, 2), (01001, 3), (10001, 4) ...)

  Step 3 -- Masks with 3 bits set, e.g. (00111, 2)  [cities {0,1,2}, at 2]
  Try adding city 3:  mask = 01111, last = 3
  Try adding city 4:  mask = 10111, last = 4
  Multiple paths may arrive at the same (mask, last) -- keep min cost.

  Step 4 -- Masks with 4 bits set, e.g. (01111, 3)  [cities {0,1,2,3}, at 3]
  Only city 4 remains:  mask = 11111, last = 4
  Cost = dp[01111][3] + dist[3][4]

  Final -- mask = 11111 (all visited).
  Answer = min over all last cities i of dp[11111][i] + dist[i][0].

  State count: 2^5 x 5 = 160  vs  5! = 120 permutations.
  At n = 20:  2^20 x 20 ~ 2*10^7  vs  20! ~ 2.4*10^18.

The savings come from merging all orderings that produce the same (mask, last) pair into a single DP cell.

Bitmask State-Space Lattice (n = 3)

  Each node is a mask. Arrows show transitions (setting one bit).
  In bitmask DP, you process nodes top-to-bottom (increasing mask value).
  Every dependency points upward -- to a numerically smaller mask.

                          111 (7)  <- all elements used (answer here)
                         / | \
                       /   |   \
                     /     |     \
                  011(3) 101(5) 110(6)   <- 2 elements used
                   /\     /\     /\
                  /  \   /  \   /  \
               001  010 001 100 010 100
                 \  |    |  /    |  /
                  \ |    | /     | /
                 001(1) 010(2) 100(4)    <- 1 element used
                    \    |    /
                     \   |   /
                      \  |  /
                      000 (0)            <- empty set (base case)

  Transition: mask -> mask | (1 << bit)   (move DOWN in diagram)
  Dependency: dp[mask] depends on dp[mask with one bit removed] (UP)

  Processing order: 000, 001, 010, 011, 100, 101, 110, 111
  = 0, 1, 2, 3, 4, 5, 6, 7  <- simple ascending loop!

The lattice makes the dependency structure explicit. We can now formalize the DP recurrence it implies.

The Invariant

+-----------------------------------------------------------------------+
| dp[mask][i] = optimal cost over ALL orderings of the cities in mask   |
|               that start at city 0 and end at city i.                 |
|                                                                       |
| Transition:                                                           |
|   dp[mask | (1 << j)][j] = min(dp[mask][i] + cost[i][j])             |
|   for every j NOT in mask (i.e., (mask >> j) & 1 == 0).              |
|                                                                       |
| Base case:  dp[1 << 0][0] = 0   (start at city 0, only city 0 done). |
| Answer:     min over i of dp[(1 << n) - 1][i] + cost[i][0].          |
+-----------------------------------------------------------------------+

What the Code Won't Teach You

These are conceptual insights that reading the implementation alone won't give you:

Why subset enumeration order matters. Iterating mask = 0, 1, 2, ..., (1<<n)-1 processes masks in increasing integer order. Any submask of mask is numerically smaller than mask (removing a bit always decreases the value). So by the time you compute dp[mask], every dp[submask] it depends on is already final. This is not obvious from the code -- it looks like "just a for-loop" -- but it's the reason the DP is correct without topological sorting.

The connection to inclusion-exclusion. SOS DP (Sum over Subsets) is essentially the inclusion-exclusion principle automated via bitmask DP. When you compute $f [mask] = \sum_{s \subseteq mask} a [s]$ , you're summing contributions of all subsets -- the same structure as inclusion-exclusion over set membership. Möbius inversion on the subset lattice is inclusion-exclusion. If you understand one, you understand the other.

When to add a second dimension beyond the mask. dp[mask] (1D) works when the mask alone determines the optimal value -- e.g., assignment problems where the worker index is popcount(mask). You need dp[mask][last] (2D) when future transitions depend on which specific element was last -- e.g., TSP, where the next edge cost depends on the current city. The question to ask: "given the same set of used elements, does it matter how I got here?" If yes, add the extra dimension.

When to Reach for This

Trigger phrases in the problem statement:

" $n \leq 20$ " (or $n \leq 15$ , $n \leq 18$ ) -- the single strongest signal.
"visit all nodes", "use every element exactly once", "assign each worker to a job."
"TSP", "Hamiltonian path/cycle", "assignment problem", "matching."
"partition into groups", "minimum number of subsets."

Counter-examples -- bitmask DP is NOT the answer when:

$n > 23$ -- $2^{23} \approx 8 \times 10^{6}$ states is borderline; $2^{24}$ with a factor of $n$ is too slow.
Items are independent (no interaction between choices) -- plain knapsack or greedy suffices.
The structure is a DAG or tree -- use tree DP or topological-order DP instead.
You need subset sums but $n$ is large and values are small -- use value-based knapsack DP.

C++ STL Reference

Function / Builtin	Header	What it does	Notes
`__builtin_popcount(x)`	built-in	Count of set bits in `unsigned int`	$O (1)$ . Use `__builtin_popcountll` for `long long`
`__builtin_ctz(x)`	built-in	Count trailing zeros (index of lowest set bit)	Undefined for `x == 0`. Use `__builtin_ctzll` for `long long`
`__builtin_clz(x)`	built-in	Count leading zeros	Undefined for `x == 0`
`1 << i`	--	Bitmask with only bit `i` set	Use `1LL << i` when `i >= 31`
`mask & (1 << i)`	--	Test if bit `i` is set	Nonzero (not necessarily 1) if set
`mask \| (1 << i)`	--	Set bit `i`	--
`mask ^ (1 << i)`	--	Toggle bit `i`	Only use to remove if bit is known to be set
`mask & (mask - 1)`	--	Clear lowest set bit	Useful for iterating set bits
`bit_ceil(x)`	`<bit>`	Smallest power of 2 $\geq x$ (C++20)	--
`popcount(x)`	`<bit>`	C++20 standard popcount for `unsigned` types	Prefer `__builtin_popcount` for contest portability

Implementation (Contest-Ready)

Version 1: TSP -- Minimum Cost Hamiltonian Cycle (Minimal)

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> dist(n, vector<int>(n));
    for (auto& row : dist)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask][i] = min cost to visit exactly the cities in mask, ending at i
    vector<vector<int>> dp(1 << n, vector<int>(n, INF));
    dp[1][0] = 0;  // start at city 0

    for (int mask = 1; mask < (1 << n); ++mask)
        for (int u = 0; u < n; ++u) {
            if (dp[mask][u] >= INF) continue;
            if (!(mask & (1 << u))) continue;
            for (int v = 0; v < n; ++v) {
                if (mask & (1 << v)) continue;  // already visited
                int nmask = mask | (1 << v);
                dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v]);
            }
        }

    int full = (1 << n) - 1;
    int ans = INF;
    for (int u = 0; u < n; ++u)
        ans = min(ans, dp[full][u] + dist[u][0]);
    cout << ans << "\n";
}

Version 1: TSP -- Explained

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> dist(n, vector<int>(n));
    for (auto& row : dist)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask][i]: minimum cost to have visited exactly the set of cities
    // represented by mask, with the last city visited being i.
    // mask is a bitmask of n bits. Bit j set means city j has been visited.
    vector<vector<int>> dp(1 << n, vector<int>(n, INF));

    // Base case: start at city 0, only city 0 visited, cost 0
    dp[1][0] = 0;

    for (int mask = 1; mask < (1 << n); ++mask)
        for (int u = 0; u < n; ++u) {
            if (dp[mask][u] >= INF) continue;  // unreachable state
            if (!(mask & (1 << u))) continue;   // u must be in mask
            // Try extending to unvisited city v
            for (int v = 0; v < n; ++v) {
                if (mask & (1 << v)) continue;  // v already visited
                int nmask = mask | (1 << v);    // mark v as visited
                dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v]);
            }
        }

    // Close the cycle: from every possible last city, return to city 0
    int full = (1 << n) - 1;
    int ans = INF;
    for (int u = 0; u < n; ++u)
        ans = min(ans, dp[full][u] + dist[u][0]);
    cout << ans << "\n";
}

Version 2: Subset Sum / Partition -- Assign Items to Groups (Minimal)

Given $n$ items with weights, check if they can be split into two groups with equal sum.

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> w(n);
    for (auto& x : w) cin >> x;

    int total = 0;
    for (int x : w) total += x;
    if (total % 2 != 0) { cout << "NO\n"; return 0; }

    int half = total / 2;
    // dp[mask] = sum of elements in the subset represented by mask
    // We want to find any mask with dp[mask] == half
    vector<int> dp(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask) {
        if (mask == 0) { dp[0] = 0; continue; }
        // Find any set bit, compute sum incrementally
        int lsb = __builtin_ctz(mask);
        dp[mask] = dp[mask ^ (1 << lsb)] + w[lsb];
    }

    for (int mask = 0; mask < (1 << n); ++mask) {
        if (dp[mask] == half) {
            cout << "YES\n";
            return 0;
        }
    }
    cout << "NO\n";
}

Version 2b: Minimum Cost Assignment -- Explained

$n$ workers, $n$ jobs. cost[i][j] = cost of assigning worker $i$ to job $j$ . Each worker gets exactly one job. Minimize total cost. (Hungarian algorithm is $O (n^{3})$ , but for $n \leq 20$ , bitmask DP is simpler.)

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> cost(n, vector<int>(n));
    for (auto& row : cost)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask] = min cost to assign jobs (the set bits in mask) to the
    // first popcount(mask) workers.
    // Worker index = popcount(mask) - 1 when we add the newest job.
    vector<int> dp(1 << n, INF);
    dp[0] = 0;

    for (int mask = 0; mask < (1 << n); ++mask) {
        if (dp[mask] >= INF) continue;
        int i = __builtin_popcount(mask);  // next worker to assign
        if (i >= n) continue;
        // Try assigning job j to worker i
        for (int j = 0; j < n; ++j) {
            if (mask & (1 << j)) continue;  // job j already taken
            int nmask = mask | (1 << j);
            dp[nmask] = min(dp[nmask], dp[mask] + cost[i][j]);
        }
    }
    cout << dp[(1 << n) - 1] << "\n";
}

Operations Reference

The dominant factor in bitmask DP is the $2^{n}$ state space. This table shows how the choice of pattern affects the constant factor multiplied against that exponential base.

Operation	Time	Space
TSP (visit all $n$ nodes, min cost path)	$O (2^{n} \cdot n^{2})$	$O (2^{n} \cdot n)$
Assignment problem ( $n$ workers, $n$ jobs)	$O (2^{n} \cdot n)$	$O (2^{n})$
Subset-of-subset DP (iterate all submasks of all masks)	$O (3^{n})$	$O (2^{n})$
SOS DP (Sum over Subsets)	$O (2^{n} \cdot n)$	$O (2^{n})$
Profile DP (grid with bitmask row state)	$O (m \cdot 2^{n} \cdot 2^{n})$	$O (2^{n})$
Computing `dp[mask]` incrementally from lowest bit	$O (2^{n})$ total	$O (2^{n})$

Practical limits: $n \leq 20$ for $O (2^{n} \cdot n)$ , $n \leq 15$ for $O (3^{n})$ , $n \leq 12$ for $O (4^{n})$ or heavy-constant methods.

Problem Patterns & Tricks

Pattern 1: TSP and Hamiltonian Path

Visit all $n$ nodes with minimum cost. State: dp[mask][last]. The classic bitmask DP.

Variant -- Hamiltonian path (no return): Answer is $min_{u} d p [(1 ≪ n) - 1] [u]$ without adding the return edge.

Variant -- Multiple starting cities: Initialize dp[1 << s][s] = 0 for all valid starts $s$ .

Examples: CF 1185G1 -- Playlist for Polycarp (Easy), CSES -- Hamiltonian Flights

TSP DP Table Concept

  TSP with 4 cities (0,1,2,3).  dp[mask][last] = min cost.
  Only reachable states shown.  Start: dp[0001][0] = 0.

           last ->   0       1       2       3
  mask ↓
  0001 (1)       [  0  ]    -       -       -
  0011 (3)          -    [ d01 ]    -       -
  0101 (5)          -       -    [ d02 ]    -
  1001 (9)          -       -       -    [ d03 ]
  0111 (7)          -    [  *  ] [  *  ]    -
  1011 (11)         -    [  *  ]    -    [  *  ]
  1101 (13)         -       -    [  *  ] [  *  ]
  1111 (15)         -    [  *  ] [  *  ] [  *  ]

  [*] = min over all predecessors:
    dp[mask][v] = min over u∈mask, u!=v of { dp[mask\v][u] + dist[u][v] }

  Example: dp[0111][2] = min( dp[0011][1] + dist[1][2],
                               dp[0101][0] + dist[0][2]  -- but 0 is start,
                               ... check all u in {0,1} with bit set )

  Answer: min over v of dp[1111][v] + dist[v][0]  (return to start)

  Dimensions: 2^4 x 4 = 64 cells  (vs 4! = 24 permutations)
  At n=20:   2^20 x 20 ~= 20M cells (vs 20! ~= 2.4 x 10^18)

Pattern 2: Assignment Problem (Workers to Jobs)

$n$ workers, $n$ jobs, minimize total cost. dp[mask] = min cost to assign the set of jobs in mask to the first popcount(mask) workers. Transition: try every unassigned job for the next worker.

Key insight: You don't need a 2D state because the worker index is determined by popcount(mask).

Examples: CF 1238E -- Keyboard Purchase, CSES -- General Matching

Pattern 3: SOS DP (Sum over Subsets)

Compute for each mask: $f [mask] = \sum_{s \subseteq mask} a [s]$ .

Naive: iterate submasks of each mask $\to O (3^{n})$ . SOS DP does it in $O (2^{n} \cdot n)$ using inclusion along each bit dimension:

cpp

vector<long long> f(a.begin(), a.end());  // copy of input array
for (int i = 0; i < n; ++i)
    for (int mask = 0; mask < (1 << n); ++mask)
        if (mask & (1 << i))
            f[mask] += f[mask ^ (1 << i)];
// now f[mask] = sum of a[s] for all s that are subsets of mask

This is essentially a "high-dimensional prefix sum." Inverse (Mobius) reverses the direction.

Examples: CF 165E -- Compatible Numbers, CF 449D -- Jzzhu and Numbers

Pattern 4: Profile DP (Broken Profile / Plug DP)

Fill a grid column by column (or row by row). The bitmask encodes which cells in the current column boundary are "occupied" by shapes from the previous column (e.g., dominoes, L-shapes).

State: dp[col][mask] where mask represents the profile at the boundary between column col and col+1.

Classic problem: Count ways to tile an $m \times n$ grid with $1 \times 2$ dominoes. $m \leq 12$ , $n$ can be large. State: bitmask of $m$ bits.

Examples: CF 1102F -- Tiling with Dominoes, CSES -- Counting Tilings

Pattern 5: Subset Partitioning

Partition $n$ items into $k$ groups (or any number of groups) optimizing some criterion. Iterate over submasks to try all ways to form the next group.

cpp

// dp[mask] = min groups to cover all items in mask
// group_ok[s] = true if subset s can form a valid group
for (int mask = 1; mask < (1 << n); ++mask)
    for (int s = mask; s > 0; s = (s - 1) & mask)
        if (group_ok[s])
            dp[mask] = min(dp[mask], dp[mask ^ s] + 1);

Complexity: $O (3^{n})$ (each element is in current group, in a previous group, or not yet assigned).

Examples: CF 1550E -- Stringforces, CF 1316E -- Team Building

Submask Enumeration Visualized

  Enumerating all non-empty submasks of mask = 1101 (13):

  Start: s = 1101 (13)
         ↓ (s-1) & mask = 1100 & 1101 = 1100
  s = 1100 (12)
         ↓ (s-1) & mask = 1011 & 1101 = 1001
  s = 1001 (9)
         ↓ (s-1) & mask = 1000 & 1101 = 1000
  s = 1000 (8)
         ↓ (s-1) & mask = 0111 & 1101 = 0101
  s = 0101 (5)
         ↓ (s-1) & mask = 0100 & 1101 = 0100
  s = 0100 (4)
         ↓ (s-1) & mask = 0011 & 1101 = 0001
  s = 0001 (1)
         ↓ (s-1) & mask = 0000 & 1101 = 0000
  s = 0000 -> STOP (loop condition s > 0)

  Total submasks visited: 7 (= 2^3 - 1, where 3 = popcount(1101))

  Why (s-1) & mask works:
  • (s-1) flips the lowest set bit and sets all lower bits
  • & mask keeps only bits that exist in the original mask
  • This "steps down" through submasks without visiting non-submasks

  Complexity over ALL masks: Σ 2^popcount(mask) = 3^n
  (each bit is: in mask & submask, in mask & not submask, or not in mask)

Pattern 6: Matching / Pairing with Bitmask

Pair up $2 k$ elements (from $n$ total) optimally. dp[mask] = best cost for pairing the elements in mask. Transition: pick the lowest unmatched element, try pairing it with every other unmatched element.

cpp

for (int mask = 0; mask < (1 << n); ++mask) {
    int i = __builtin_ctz(~mask);  // first unpaired element
    if (i >= n) continue;
    for (int j = i + 1; j < n; ++j)
        if (!(mask & (1 << j)))
            dp[mask | (1 << i) | (1 << j)] = min(
                dp[mask | (1 << i) | (1 << j)],
                dp[mask] + cost[i][j]);
}

Examples: CF 1579G -- Minimal Coverage, CF 1152F2 -- Neko Rules the Catniverse (Hard)

Contest Cheat Sheet

+------------------------------------------------------------------+
|                   DP BITMASK CHEAT SHEET (AL-29)                 |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - n <= 20 (or n <= 15 for O(3^n) subset iteration)            |
|   - "Assign/visit/partition ALL elements optimally"              |
|   - Problem says "permutation", "subset", "Hamiltonian"          |
+------------------------------------------------------------------+
| BIT OPERATIONS:                                                  |
|   test:   mask & (1 << i)                                        |
|   set:    mask | (1 << i)                                        |
|   clear:  mask & ~(1 << i)                                       |
|   toggle: mask ^ (1 << i)                                        |
|   lowest: __builtin_ctz(mask)                                    |
|   count:  __builtin_popcount(mask)                               |
+------------------------------------------------------------------+
| TSP:  dp[mask][u] = min cost visiting mask, ending at u          |
|   dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v])    |
| ASSIGN: dp[mask] = min cost, worker = popcount(mask)             |
| SOS:  for i in [0,n): for mask: if bit i set, f[m]+=f[m^(1<<i)]|
+------------------------------------------------------------------+
| SUBMASK ITERATION:                                               |
|   for (int s = mask; s > 0; s = (s-1) & mask) { ... }           |
|   // handles all non-empty submasks; check s=0 separately        |
+------------------------------------------------------------------+
| TIME:  O(2^n * n) assign | O(2^n * n^2) TSP | O(3^n) submasks   |
| SPACE: O(2^n) or O(2^n * n)                                     |
+------------------------------------------------------------------+
| WATCH: 1LL << i if i >= 31 | n <= 20 hard limit | base cases    |
+------------------------------------------------------------------+

Gotchas & Debugging

1. The $n \leq 20$ constraint

$2^{20} = 1,048,576 \approx 10^{6}$ . Multiplied by $n = 20$ , that's $\sim 2 \times 10^{7}$ -- tight but fine. For $n = 24$ , $2^{24} \cdot 24 \approx 4 \times 10^{8}$ -- too slow. If $n > 20$ , bitmask DP is almost certainly not the intended approach.

For $O (3^{n})$ : $3^{15} \approx 1.4 \times 10^{7}$ is fine. $3^{20} \approx 3.5 \times 10^{9}$ -- way too slow. Check which complexity your approach actually needs.

2. `1 << i` vs `1LL << i`

1 << 31 is undefined behavior in C++ (signed integer overflow). Even for n = 20, if you compute 1 << n to get the full mask, that's $2^{20}$ which fits in int, but 1 << 31 does not. Safer: always use 1 << i with int when $n \leq 30$ , but if any intermediate value or the mask itself is long long, use 1LL << i.

cpp

// BAD: undefined behavior when i >= 31
int mask = 1 << 31;

// GOOD
long long mask = 1LL << 31;

3. Iterating submasks correctly

The loop for (int s = mask; s > 0; s = (s - 1) & mask) skips s = 0. If the empty subset is a valid state, handle it outside the loop. If s is unsigned, s > 0 after decrementing from 0 wraps to $2^{32} - 1$ -- infinite loop. Keep s as int.

4. Forgetting to check `dp[mask][u] < INF` before transitioning

Without this guard, you process unreachable states and pollute downstream values with INF + cost (potential overflow). Always skip states that haven't been reached.

5. Memory for `dp[1 << n][n]`

$2^{20} \times 20 \times 4$ bytes $= 80$ MB. This exceeds the typical 256 MB memory limit. Solutions:

Use dp[1 << n] (1D) when the second dimension can be derived (e.g., from popcount).
Use short instead of int if values fit.
Process masks in order so you only need two "layers" (rare for bitmask DP).

6. Wrong bit indexing

If cities/items are 1-indexed in the input but your bits are 0-indexed, every (1 << i) is off by one. Pick one convention and stick to it. The simplest: read into 0-indexed arrays, use 0-indexed bits.

7. Debug checklist

Print dp[mask] for all masks with small $n$ (3 or 4). Compare to brute-force.
Verify base cases: dp[0] = 0 (or dp[1 << start] = 0 for TSP).
Check that you never set a bit that's already set in mask.
Verify the answer location: dp[(1 << n) - 1] for full-set answers.

Mental Traps

These are the thinking errors that survive even after you "know" the technique:

Trap 1: "I need to try all permutations." Bitmask DP tracks which elements are used, not the order. Two orderings that produce the same (mask, last) pair are merged into one cell. Unless order is explicitly part of your state (e.g., dp[mask][last] in TSP), you do NOT enumerate permutations -- the mask already collapses them.

Trap 2: " $2^{n}$ is always too slow." Reality check: $2^{20} \approx 10^{6}$ , and $2^{20} \times 20 \approx 2 \times 10^{7}$ -- easily fits in a 2-second time limit. The wall is at $n \approx 23$ - $25$ . If the constraint says $n \leq 20$ , bitmask DP is designed for this.

Trap 3: "I'll just iterate all subsets inside the mask loop." Iterating all submasks of each mask is $O (3^{n})$ , NOT $O (4^{n})$ -- but $3^{20} \approx 3.5 \times 10^{9}$ is too slow. If you need submask enumeration, you need $n \leq 15$ . Confusing $O (2^{n} \cdot n)$ patterns (TSP, assignment) with $O (3^{n})$ patterns (subset partitioning) is a common source of TLE.

  Bitmask state transition -- adding element 2 to mask 01011:

  Before:    01011   (elements {0, 1, 3} used)
                ^
                bit 2 is 0 -> element 2 is available

  Set bit 2:  01011
             | 00100   (1 << 2)
             --------
              01111   (elements {0, 1, 2, 3} used)

  In code:   int nmask = mask | (1 << 2);   // 11 | 4 = 15
             //  01011 | 00100 = 01111

  WRONG if bit already set:
              01111 | 00100 = 01111  <- mask unchanged!
              This creates a silent self-loop. Always guard:
              if (mask & (1 << v)) continue;  // v already in mask

Common Mistakes

Int overflow on bit shift. (1 << n) overflows int when n >= 31. Always use (1LL << n).
Wrong initialization. For minimization, memset(dp, 0x3f, sizeof dp) and set dp[0] = 0. All-zeros initialization gives wrong answers because min(0, anything_positive) = 0.
Inconsistent indexing. Using (mask >> i) & 1 with 0-indexed elements but 1-indexed input causes silent off-by-one across the entire bitmask.
Forgetting the popcount check. Some problems need the k-th element assigned to a specific position. Use __builtin_popcount(mask) to determine which position you are filling.
Exceeding memory. dp[1 << 20] is about 4 MB for int, but dp[1 << 20][20] is 80 MB. Watch the state size.
mask & j instead of mask & (1 << j). The #1 bitmask DP bug. mask & j treats j as a bitmask, not a bit index, so city 2 (binary 10) collides with city 1's bit. Always convert an index to a mask with (1 << j) before testing.

Debug Drills

Drill 1. TSP bitmask DP gives a tour shorter than any single edge.

cpp

int dp[1 << 20][20];
memset(dp, 0x3f, sizeof dp);
dp[1][0] = 0; // start at city 0
for (int mask = 1; mask < (1 << n); mask++)
    for (int u = 0; u < n; u++) {
        if (!(mask & (1 << u))) continue;
        for (int v = 0; v < n; v++) {
            if (mask & (1 << v)) continue;
            dp[mask | (1 << v)][v] = min(dp[mask | (1 << v)][v], dp[mask][u] + dist[u][v]);
        }
    }
int ans = dp[(1 << n) - 1][0]; // return to city 0

What's wrong?

The final answer dp[(1<<n)-1][0] reads the cost of ending at city 0, but the DP only stores the cost of the last city visited, not including the return edge. Fix: ans = min over all u of (dp[(1<<n)-1][u] + dist[u][0]).

Drill 2. Subset enumeration misses the empty subset.

cpp

int total = 0;
for (int sub = mask; sub > 0; sub = (sub - 1) & mask)
    total += value[sub];

What's wrong?

The loop condition sub > 0 skips sub = 0 (the empty subset). If you need to include the empty set, add total += value[0]; after the loop. The idiom for (sub = mask; ; sub = (sub-1) & mask) { process(sub); if (sub == 0) break; } handles all subsets including empty.

Drill 3. Assignment DP assigns person 0 correctly but all others get cost 0.

cpp

int dp[1 << 20] = {};
for (int mask = 0; mask < (1 << n); mask++) {
    int person = __builtin_popcount(mask);
    if (person >= n) continue;
    for (int task = 0; task < n; task++) {
        if (mask & (1 << task)) continue;
        dp[mask | (1 << task)] = min(dp[mask | (1 << task)], dp[mask] + cost[person][task]);
    }
}

What's wrong?

dp is initialized to all zeros, so min(0, anything_positive) is always 0 for states beyond the first. Initialize memset(dp, 0x3f, sizeof dp) and set dp[0] = 0. Only the empty-set base case should be zero.

Self-Test

Before moving to the practice problems, make sure you can do each of these without looking anything up:

[ ] Explain what dp[mask] represents in a bitmask DP and why iterating masks in increasing order guarantees all dependencies (submasks) are already computed.
[ ] Implement TSP with bitmask DP from scratch: dp[mask][i] = min cost to visit the set mask ending at city i. Include correct base case, transition, and answer extraction.
[ ] Convert between "iterate over all submasks of mask" (for (int s = mask; s > 0; s = (s-1) & mask)) and "iterate over all supermasks of mask" (flip perspective: for each mask, check if your target is a submask).
[ ] State the time complexity of bitmask DP for TSP ( $O (2^{n} \cdot n^{2})$ ) and for assignment ( $O (2^{n} \cdot n)$ ). Explain why $n > 20$ - $23$ is too large.
[ ] Explain why dp[(1<<n)-1] gives the answer for "use all elements" problems -- the mask (1<<n)-1 has all $n$ bits set, representing the full set.
[ ] Debug scenario: mask already has bit i set but you try to add element i again via mask | (1 << i). What goes wrong? (The mask doesn't change, so you create a self-loop that may silently produce an incorrect shorter "path" that revisits a city.)
[ ] Write the submask enumeration idiom and state its total complexity when summed over all masks ( $O (3^{n})$ ).

Practice Problems

CF Rating	How bitmask DP tends to appear
1200	Rare — brute force handles $n \leq 10$
1500	Small TSP, assignment, "match $n$ items to $n$ slots"
1800	Bitmask DP + profile DP on grids, Hamiltonian path counting
2100	Broken profile DP, SOS DP (sum over subsets), bitmask + other DP dimensions

#	Problem	Source	Difficulty	Key Idea
1	Hamiltonian Flights	CSES	1700	Count Hamiltonian paths with bitmask DP
2	Matching	AtCoder DP-O	1600	Assignment problem, count perfect matchings
3	Compatible Numbers	CF 165E	1800	SOS DP -- find a compatible pair via superset sums
4	Team Building	CF 1316E	1800	Assignment + leftover scoring with bitmask
5	Jzzhu and Numbers	CF 449D	2000	SOS DP + inclusion-exclusion
6	Counting Tilings	CSES	2000	Profile DP on grid with dominoes
7	Elevator Rides	CSES	1800	Partition items into minimum groups with weight limit
8	Keyboard Purchase	CF 1238E	2000	SOS DP on adjacency bitmask
9	Little Elephant and Xor	CF 258D	2100	Digit DP meets bitmask reasoning
10	Fish	AtCoder DP-Y	1900	Probability + bitmask DP over elimination order

Recap

Core idea: an integer is a set. Bit i set means element i is included. Enumerate all 2^n subsets.
Transition pattern: for each mask, try adding each element i not in mask: dp[mask | (1 << i)].
Feasibility bound: n <= 20 (maybe 23 with careful optimization). Beyond that, look for other structure.
Classic problems: TSP, assignment, minimum-cost Hamiltonian path, subset partitioning.

Cross-Links

DP -- 1D Introduction -- DP foundations before subset enumeration
DP Thinking Framework -- bitmask is one of the 6 state archetypes
DP vs Greedy Guide -- assignment problems resist greedy; bitmask DP is the answer
DP Practice Ladder -- graded problem set

DP -- Bitmask ​

Contents ​

Intuition ​

Visual Walkthrough ​

Bitmask State-Space Lattice (n = 3) ​

The Invariant ​

What the Code Won't Teach You ​

When to Reach for This ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1: TSP -- Minimum Cost Hamiltonian Cycle (Minimal) ​

Version 1: TSP -- Explained ​

Version 2: Subset Sum / Partition -- Assign Items to Groups (Minimal) ​

Version 2b: Minimum Cost Assignment -- Explained ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: TSP and Hamiltonian Path ​

TSP DP Table Concept ​

Pattern 2: Assignment Problem (Workers to Jobs) ​

Pattern 3: SOS DP (Sum over Subsets) ​

Pattern 4: Profile DP (Broken Profile / Plug DP) ​

Pattern 5: Subset Partitioning ​

Submask Enumeration Visualized ​

Pattern 6: Matching / Pairing with Bitmask ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

1. The n≤20 constraint ​

2. 1 << i vs 1LL << i ​

3. Iterating submasks correctly ​

4. Forgetting to check dp[mask][u] < INF before transitioning ​

5. Memory for dp[1 << n][n] ​

6. Wrong bit indexing ​

7. Debug checklist ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Recap ​

Cross-Links ​

DP -- Bitmask

Contents

Intuition

Visual Walkthrough

Bitmask State-Space Lattice (n = 3)

The Invariant

What the Code Won't Teach You

When to Reach for This

C++ STL Reference

Implementation (Contest-Ready)

Version 1: TSP -- Minimum Cost Hamiltonian Cycle (Minimal)

Version 1: TSP -- Explained

Version 2: Subset Sum / Partition -- Assign Items to Groups (Minimal)

Version 2b: Minimum Cost Assignment -- Explained

Operations Reference

Problem Patterns & Tricks

Pattern 1: TSP and Hamiltonian Path

TSP DP Table Concept

Pattern 2: Assignment Problem (Workers to Jobs)

Pattern 3: SOS DP (Sum over Subsets)

Pattern 4: Profile DP (Broken Profile / Plug DP)

Pattern 5: Subset Partitioning

Submask Enumeration Visualized

Pattern 6: Matching / Pairing with Bitmask

Contest Cheat Sheet

Gotchas & Debugging

1. The $n \leq 20$ constraint

2. `1 << i` vs `1LL << i`

3. Iterating submasks correctly

4. Forgetting to check `dp[mask][u] < INF` before transitioning

5. Memory for `dp[1 << n][n]`

6. Wrong bit indexing

7. Debug checklist

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Recap

Cross-Links