Skip to content

DP -- Bitmask

AL-29 | Use an integer to represent a subset of n elements, then DP over all 2n subsets. The go-to technique when n20 and you need to track "which elements have been used."

Difficulty: [Intermediate]Prerequisites: DP -- 1D IntroductionCF Rating Range: 1600-2000

Quick Revisit

  • USE WHEN: n <= 20 elements and you need to track which elements have been used
  • INVARIANT: dp[mask] = optimal answer when the set of processed elements equals mask
  • TIME: O(2^n * n) | SPACE: O(2^n)
  • CLASSIC BUG: using (1 << n) with int when n >= 31 -- use (1LL << n)
  • PRACTICE: DP Practice Ladder

Contents

Contest Frequency: ** Regular -- appears in Div 2 D/E when n <= 20


Intuition

Consider the Travelling Salesman Problem on 5 cities. The naive approach tries every ordering: 5!=120 permutations. Manageable. But scale to n=20 and you face 20!2.4×1018 permutations -- utterly impossible even at 109 operations per second.

The root of the explosion is that brute force distinguishes the order you visited cities in, even when the only things that matter for future decisions are (a) which cities you have already visited, and (b) where you are right now. Two histories that visit the same set and end at the same city are interchangeable. Brute force ignores that and recounts every permutation separately.

Encode which cities are visited as a bitmask -- dp[mask][last] = best cost to visit exactly the cities in mask, ending at last. This reduces permutations to 2n×n states.

Think of a checklist with n boxes. Each box is either ticked (visited) or unticked. An integer's binary representation is exactly such a checklist: bit i is 1 if city i is ticked. Two travellers who have ticked the same boxes and stand in the same city are in identical situations -- we only need to keep the best cost among them. The bitmask is the checklist.

For n=20: 220×202×107 states, each with at most n transitions. That is roughly 4×108 work -- tight but feasible in 1-2 seconds.

Visual Walkthrough

Five cities: 0,1,2,3,4. Start at city 0. The bitmask has 5 bits; bit i set means city i visited.

  Step 0 -- Base case
  mask = 00001 (decimal 1), last = 0
  Meaning: only city 0 visited, standing at 0.  Cost = 0.

  Step 1 -- Expand from (00001, 0)
  Try adding city 1:  mask = 00011 (3),  last = 1,  cost = dist[0][1]
  Try adding city 2:  mask = 00101 (5),  last = 2,  cost = dist[0][2]
  Try adding city 3:  mask = 01001 (9),  last = 3,  cost = dist[0][3]
  Try adding city 4:  mask = 10001 (17), last = 4,  cost = dist[0][4]

  Step 2 -- Expand from (00011, 1)   [cities {0,1} visited, at city 1]
  Try adding city 2:  mask = 00111 (7),  last = 2,  cost = dp[00011][1] + dist[1][2]
  Try adding city 3:  mask = 01011 (11), last = 3,  cost = dp[00011][1] + dist[1][3]
  Try adding city 4:  mask = 10011 (19), last = 4,  cost = dp[00011][1] + dist[1][4]
  (and similarly from (00101, 2), (01001, 3), (10001, 4) ...)

  Step 3 -- Masks with 3 bits set, e.g. (00111, 2)  [cities {0,1,2}, at 2]
  Try adding city 3:  mask = 01111, last = 3
  Try adding city 4:  mask = 10111, last = 4
  Multiple paths may arrive at the same (mask, last) -- keep min cost.

  Step 4 -- Masks with 4 bits set, e.g. (01111, 3)  [cities {0,1,2,3}, at 3]
  Only city 4 remains:  mask = 11111, last = 4
  Cost = dp[01111][3] + dist[3][4]

  Final -- mask = 11111 (all visited).
  Answer = min over all last cities i of dp[11111][i] + dist[i][0].

  State count: 2^5 x 5 = 160  vs  5! = 120 permutations.
  At n = 20:  2^20 x 20 ~ 2*10^7  vs  20! ~ 2.4*10^18.

The savings come from merging all orderings that produce the same (mask, last) pair into a single DP cell.

Bitmask State-Space Lattice (n = 3)

  Each node is a mask. Arrows show transitions (setting one bit).
  In bitmask DP, you process nodes top-to-bottom (increasing mask value).
  Every dependency points upward -- to a numerically smaller mask.

                          111 (7)  <- all elements used (answer here)
                         / | \
                       /   |   \
                     /     |     \
                  011(3) 101(5) 110(6)   <- 2 elements used
                   /\     /\     /\
                  /  \   /  \   /  \
               001  010 001 100 010 100
                 \  |    |  /    |  /
                  \ |    | /     | /
                 001(1) 010(2) 100(4)    <- 1 element used
                    \    |    /
                     \   |   /
                      \  |  /
                      000 (0)            <- empty set (base case)

  Transition: mask -> mask | (1 << bit)   (move DOWN in diagram)
  Dependency: dp[mask] depends on dp[mask with one bit removed] (UP)

  Processing order: 000, 001, 010, 011, 100, 101, 110, 111
  = 0, 1, 2, 3, 4, 5, 6, 7  <- simple ascending loop!

The lattice makes the dependency structure explicit. We can now formalize the DP recurrence it implies.

The Invariant

+-----------------------------------------------------------------------+
| dp[mask][i] = optimal cost over ALL orderings of the cities in mask   |
|               that start at city 0 and end at city i.                 |
|                                                                       |
| Transition:                                                           |
|   dp[mask | (1 << j)][j] = min(dp[mask][i] + cost[i][j])             |
|   for every j NOT in mask (i.e., (mask >> j) & 1 == 0).              |
|                                                                       |
| Base case:  dp[1 << 0][0] = 0   (start at city 0, only city 0 done). |
| Answer:     min over i of dp[(1 << n) - 1][i] + cost[i][0].          |
+-----------------------------------------------------------------------+

What the Code Won't Teach You

These are conceptual insights that reading the implementation alone won't give you:

Why subset enumeration order matters. Iterating mask = 0, 1, 2, ..., (1<<n)-1 processes masks in increasing integer order. Any submask of mask is numerically smaller than mask (removing a bit always decreases the value). So by the time you compute dp[mask], every dp[submask] it depends on is already final. This is not obvious from the code -- it looks like "just a for-loop" -- but it's the reason the DP is correct without topological sorting.

The connection to inclusion-exclusion. SOS DP (Sum over Subsets) is essentially the inclusion-exclusion principle automated via bitmask DP. When you compute f[mask]=smaska[s], you're summing contributions of all subsets -- the same structure as inclusion-exclusion over set membership. Möbius inversion on the subset lattice is inclusion-exclusion. If you understand one, you understand the other.

When to add a second dimension beyond the mask. dp[mask] (1D) works when the mask alone determines the optimal value -- e.g., assignment problems where the worker index is popcount(mask). You need dp[mask][last] (2D) when future transitions depend on which specific element was last -- e.g., TSP, where the next edge cost depends on the current city. The question to ask: "given the same set of used elements, does it matter how I got here?" If yes, add the extra dimension.

When to Reach for This

Trigger phrases in the problem statement:

  • "n20" (or n15, n18) -- the single strongest signal.
  • "visit all nodes", "use every element exactly once", "assign each worker to a job."
  • "TSP", "Hamiltonian path/cycle", "assignment problem", "matching."
  • "partition into groups", "minimum number of subsets."

Counter-examples -- bitmask DP is NOT the answer when:

  • n>23 -- 2238×106 states is borderline; 224 with a factor of n is too slow.
  • Items are independent (no interaction between choices) -- plain knapsack or greedy suffices.
  • The structure is a DAG or tree -- use tree DP or topological-order DP instead.
  • You need subset sums but n is large and values are small -- use value-based knapsack DP.

C++ STL Reference

Function / BuiltinHeaderWhat it doesNotes
__builtin_popcount(x)built-inCount of set bits in unsigned intO(1). Use __builtin_popcountll for long long
__builtin_ctz(x)built-inCount trailing zeros (index of lowest set bit)Undefined for x == 0. Use __builtin_ctzll for long long
__builtin_clz(x)built-inCount leading zerosUndefined for x == 0
1 << i--Bitmask with only bit i setUse 1LL << i when i >= 31
mask & (1 << i)--Test if bit i is setNonzero (not necessarily 1) if set
mask | (1 << i)--Set bit i--
mask ^ (1 << i)--Toggle bit iOnly use to remove if bit is known to be set
mask & (mask - 1)--Clear lowest set bitUseful for iterating set bits
bit_ceil(x)<bit>Smallest power of 2 x (C++20)--
popcount(x)<bit>C++20 standard popcount for unsigned typesPrefer __builtin_popcount for contest portability

Implementation (Contest-Ready)

Version 1: TSP -- Minimum Cost Hamiltonian Cycle (Minimal)

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> dist(n, vector<int>(n));
    for (auto& row : dist)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask][i] = min cost to visit exactly the cities in mask, ending at i
    vector<vector<int>> dp(1 << n, vector<int>(n, INF));
    dp[1][0] = 0;  // start at city 0

    for (int mask = 1; mask < (1 << n); ++mask)
        for (int u = 0; u < n; ++u) {
            if (dp[mask][u] >= INF) continue;
            if (!(mask & (1 << u))) continue;
            for (int v = 0; v < n; ++v) {
                if (mask & (1 << v)) continue;  // already visited
                int nmask = mask | (1 << v);
                dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v]);
            }
        }

    int full = (1 << n) - 1;
    int ans = INF;
    for (int u = 0; u < n; ++u)
        ans = min(ans, dp[full][u] + dist[u][0]);
    cout << ans << "\n";
}

Version 1: TSP -- Explained

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> dist(n, vector<int>(n));
    for (auto& row : dist)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask][i]: minimum cost to have visited exactly the set of cities
    // represented by mask, with the last city visited being i.
    // mask is a bitmask of n bits. Bit j set means city j has been visited.
    vector<vector<int>> dp(1 << n, vector<int>(n, INF));

    // Base case: start at city 0, only city 0 visited, cost 0
    dp[1][0] = 0;

    for (int mask = 1; mask < (1 << n); ++mask)
        for (int u = 0; u < n; ++u) {
            if (dp[mask][u] >= INF) continue;  // unreachable state
            if (!(mask & (1 << u))) continue;   // u must be in mask
            // Try extending to unvisited city v
            for (int v = 0; v < n; ++v) {
                if (mask & (1 << v)) continue;  // v already visited
                int nmask = mask | (1 << v);    // mark v as visited
                dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v]);
            }
        }

    // Close the cycle: from every possible last city, return to city 0
    int full = (1 << n) - 1;
    int ans = INF;
    for (int u = 0; u < n; ++u)
        ans = min(ans, dp[full][u] + dist[u][0]);
    cout << ans << "\n";
}

Version 2: Subset Sum / Partition -- Assign Items to Groups (Minimal)

Given n items with weights, check if they can be split into two groups with equal sum.

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> w(n);
    for (auto& x : w) cin >> x;

    int total = 0;
    for (int x : w) total += x;
    if (total % 2 != 0) { cout << "NO\n"; return 0; }

    int half = total / 2;
    // dp[mask] = sum of elements in the subset represented by mask
    // We want to find any mask with dp[mask] == half
    vector<int> dp(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask) {
        if (mask == 0) { dp[0] = 0; continue; }
        // Find any set bit, compute sum incrementally
        int lsb = __builtin_ctz(mask);
        dp[mask] = dp[mask ^ (1 << lsb)] + w[lsb];
    }

    for (int mask = 0; mask < (1 << n); ++mask) {
        if (dp[mask] == half) {
            cout << "YES\n";
            return 0;
        }
    }
    cout << "NO\n";
}

Version 2b: Minimum Cost Assignment -- Explained

n workers, n jobs. cost[i][j] = cost of assigning worker i to job j. Each worker gets exactly one job. Minimize total cost. (Hungarian algorithm is O(n3), but for n20, bitmask DP is simpler.)

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<vector<int>> cost(n, vector<int>(n));
    for (auto& row : cost)
        for (auto& x : row) cin >> x;

    const int INF = 1e9;
    // dp[mask] = min cost to assign jobs (the set bits in mask) to the
    // first popcount(mask) workers.
    // Worker index = popcount(mask) - 1 when we add the newest job.
    vector<int> dp(1 << n, INF);
    dp[0] = 0;

    for (int mask = 0; mask < (1 << n); ++mask) {
        if (dp[mask] >= INF) continue;
        int i = __builtin_popcount(mask);  // next worker to assign
        if (i >= n) continue;
        // Try assigning job j to worker i
        for (int j = 0; j < n; ++j) {
            if (mask & (1 << j)) continue;  // job j already taken
            int nmask = mask | (1 << j);
            dp[nmask] = min(dp[nmask], dp[mask] + cost[i][j]);
        }
    }
    cout << dp[(1 << n) - 1] << "\n";
}

Operations Reference

The dominant factor in bitmask DP is the 2n state space. This table shows how the choice of pattern affects the constant factor multiplied against that exponential base.

OperationTimeSpace
TSP (visit all n nodes, min cost path)O(2nn2)O(2nn)
Assignment problem (n workers, n jobs)O(2nn)O(2n)
Subset-of-subset DP (iterate all submasks of all masks)O(3n)O(2n)
SOS DP (Sum over Subsets)O(2nn)O(2n)
Profile DP (grid with bitmask row state)O(m2n2n)O(2n)
Computing dp[mask] incrementally from lowest bitO(2n) totalO(2n)

Practical limits: n20 for O(2nn), n15 for O(3n), n12 for O(4n) or heavy-constant methods.


Problem Patterns & Tricks

Pattern 1: TSP and Hamiltonian Path

Visit all n nodes with minimum cost. State: dp[mask][last]. The classic bitmask DP.

Variant -- Hamiltonian path (no return): Answer is minudp[(1n)1][u] without adding the return edge.

Variant -- Multiple starting cities: Initialize dp[1 << s][s] = 0 for all valid starts s.

Examples: CF 1185G1 -- Playlist for Polycarp (Easy), CSES -- Hamiltonian Flights

TSP DP Table Concept

  TSP with 4 cities (0,1,2,3).  dp[mask][last] = min cost.
  Only reachable states shown.  Start: dp[0001][0] = 0.

           last ->   0       1       2       3
  mask ↓
  0001 (1)       [  0  ]    -       -       -
  0011 (3)          -    [ d01 ]    -       -
  0101 (5)          -       -    [ d02 ]    -
  1001 (9)          -       -       -    [ d03 ]
  0111 (7)          -    [  *  ] [  *  ]    -
  1011 (11)         -    [  *  ]    -    [  *  ]
  1101 (13)         -       -    [  *  ] [  *  ]
  1111 (15)         -    [  *  ] [  *  ] [  *  ]

  [*] = min over all predecessors:
    dp[mask][v] = min over u∈mask, u!=v of { dp[mask\v][u] + dist[u][v] }

  Example: dp[0111][2] = min( dp[0011][1] + dist[1][2],
                               dp[0101][0] + dist[0][2]  -- but 0 is start,
                               ... check all u in {0,1} with bit set )

  Answer: min over v of dp[1111][v] + dist[v][0]  (return to start)

  Dimensions: 2^4 x 4 = 64 cells  (vs 4! = 24 permutations)
  At n=20:   2^20 x 20 ~= 20M cells (vs 20! ~= 2.4 x 10^18)

Pattern 2: Assignment Problem (Workers to Jobs)

n workers, n jobs, minimize total cost. dp[mask] = min cost to assign the set of jobs in mask to the first popcount(mask) workers. Transition: try every unassigned job for the next worker.

Key insight: You don't need a 2D state because the worker index is determined by popcount(mask).

Examples: CF 1238E -- Keyboard Purchase, CSES -- General Matching


Pattern 3: SOS DP (Sum over Subsets)

Compute for each mask: f[mask]=smaska[s].

Naive: iterate submasks of each mask O(3n). SOS DP does it in O(2nn) using inclusion along each bit dimension:

cpp
vector<long long> f(a.begin(), a.end());  // copy of input array
for (int i = 0; i < n; ++i)
    for (int mask = 0; mask < (1 << n); ++mask)
        if (mask & (1 << i))
            f[mask] += f[mask ^ (1 << i)];
// now f[mask] = sum of a[s] for all s that are subsets of mask

This is essentially a "high-dimensional prefix sum." Inverse (Mobius) reverses the direction.

Examples: CF 165E -- Compatible Numbers, CF 449D -- Jzzhu and Numbers


Pattern 4: Profile DP (Broken Profile / Plug DP)

Fill a grid column by column (or row by row). The bitmask encodes which cells in the current column boundary are "occupied" by shapes from the previous column (e.g., dominoes, L-shapes).

State: dp[col][mask] where mask represents the profile at the boundary between column col and col+1.

Classic problem: Count ways to tile an m×n grid with 1×2 dominoes. m12, n can be large. State: bitmask of m bits.

Examples: CF 1102F -- Tiling with Dominoes, CSES -- Counting Tilings


Pattern 5: Subset Partitioning

Partition n items into k groups (or any number of groups) optimizing some criterion. Iterate over submasks to try all ways to form the next group.

cpp
// dp[mask] = min groups to cover all items in mask
// group_ok[s] = true if subset s can form a valid group
for (int mask = 1; mask < (1 << n); ++mask)
    for (int s = mask; s > 0; s = (s - 1) & mask)
        if (group_ok[s])
            dp[mask] = min(dp[mask], dp[mask ^ s] + 1);

Complexity: O(3n) (each element is in current group, in a previous group, or not yet assigned).

Examples: CF 1550E -- Stringforces, CF 1316E -- Team Building

Submask Enumeration Visualized

  Enumerating all non-empty submasks of mask = 1101 (13):

  Start: s = 1101 (13)
         ↓ (s-1) & mask = 1100 & 1101 = 1100
  s = 1100 (12)
         ↓ (s-1) & mask = 1011 & 1101 = 1001
  s = 1001 (9)
         ↓ (s-1) & mask = 1000 & 1101 = 1000
  s = 1000 (8)
         ↓ (s-1) & mask = 0111 & 1101 = 0101
  s = 0101 (5)
         ↓ (s-1) & mask = 0100 & 1101 = 0100
  s = 0100 (4)
         ↓ (s-1) & mask = 0011 & 1101 = 0001
  s = 0001 (1)
         ↓ (s-1) & mask = 0000 & 1101 = 0000
  s = 0000 -> STOP (loop condition s > 0)

  Total submasks visited: 7 (= 2^3 - 1, where 3 = popcount(1101))

  Why (s-1) & mask works:
  • (s-1) flips the lowest set bit and sets all lower bits
  • & mask keeps only bits that exist in the original mask
  • This "steps down" through submasks without visiting non-submasks

  Complexity over ALL masks: Σ 2^popcount(mask) = 3^n
  (each bit is: in mask & submask, in mask & not submask, or not in mask)

Pattern 6: Matching / Pairing with Bitmask

Pair up 2k elements (from n total) optimally. dp[mask] = best cost for pairing the elements in mask. Transition: pick the lowest unmatched element, try pairing it with every other unmatched element.

cpp
for (int mask = 0; mask < (1 << n); ++mask) {
    int i = __builtin_ctz(~mask);  // first unpaired element
    if (i >= n) continue;
    for (int j = i + 1; j < n; ++j)
        if (!(mask & (1 << j)))
            dp[mask | (1 << i) | (1 << j)] = min(
                dp[mask | (1 << i) | (1 << j)],
                dp[mask] + cost[i][j]);
}

Examples: CF 1579G -- Minimal Coverage, CF 1152F2 -- Neko Rules the Catniverse (Hard)


Contest Cheat Sheet

+------------------------------------------------------------------+
|                   DP BITMASK CHEAT SHEET (AL-29)                 |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - n <= 20 (or n <= 15 for O(3^n) subset iteration)            |
|   - "Assign/visit/partition ALL elements optimally"              |
|   - Problem says "permutation", "subset", "Hamiltonian"          |
+------------------------------------------------------------------+
| BIT OPERATIONS:                                                  |
|   test:   mask & (1 << i)                                        |
|   set:    mask | (1 << i)                                        |
|   clear:  mask & ~(1 << i)                                       |
|   toggle: mask ^ (1 << i)                                        |
|   lowest: __builtin_ctz(mask)                                    |
|   count:  __builtin_popcount(mask)                               |
+------------------------------------------------------------------+
| TSP:  dp[mask][u] = min cost visiting mask, ending at u          |
|   dp[nmask][v] = min(dp[nmask][v], dp[mask][u] + dist[u][v])    |
| ASSIGN: dp[mask] = min cost, worker = popcount(mask)             |
| SOS:  for i in [0,n): for mask: if bit i set, f[m]+=f[m^(1<<i)]|
+------------------------------------------------------------------+
| SUBMASK ITERATION:                                               |
|   for (int s = mask; s > 0; s = (s-1) & mask) { ... }           |
|   // handles all non-empty submasks; check s=0 separately        |
+------------------------------------------------------------------+
| TIME:  O(2^n * n) assign | O(2^n * n^2) TSP | O(3^n) submasks   |
| SPACE: O(2^n) or O(2^n * n)                                     |
+------------------------------------------------------------------+
| WATCH: 1LL << i if i >= 31 | n <= 20 hard limit | base cases    |
+------------------------------------------------------------------+

Gotchas & Debugging

1. The n20 constraint

220=1,048,576106. Multiplied by n=20, that's 2×107 -- tight but fine. For n=24, 224244×108 -- too slow. If n>20, bitmask DP is almost certainly not the intended approach.

For O(3n): 3151.4×107 is fine. 3203.5×109 -- way too slow. Check which complexity your approach actually needs.

2. 1 << i vs 1LL << i

1 << 31 is undefined behavior in C++ (signed integer overflow). Even for n = 20, if you compute 1 << n to get the full mask, that's 220 which fits in int, but 1 << 31 does not. Safer: always use 1 << i with int when n30, but if any intermediate value or the mask itself is long long, use 1LL << i.

cpp
// BAD: undefined behavior when i >= 31
int mask = 1 << 31;

// GOOD
long long mask = 1LL << 31;

3. Iterating submasks correctly

The loop for (int s = mask; s > 0; s = (s - 1) & mask) skips s = 0. If the empty subset is a valid state, handle it outside the loop. If s is unsigned, s > 0 after decrementing from 0 wraps to 2321 -- infinite loop. Keep s as int.

4. Forgetting to check dp[mask][u] < INF before transitioning

Without this guard, you process unreachable states and pollute downstream values with INF + cost (potential overflow). Always skip states that haven't been reached.

5. Memory for dp[1 << n][n]

220×20×4 bytes =80 MB. This exceeds the typical 256 MB memory limit. Solutions:

  • Use dp[1 << n] (1D) when the second dimension can be derived (e.g., from popcount).
  • Use short instead of int if values fit.
  • Process masks in order so you only need two "layers" (rare for bitmask DP).

6. Wrong bit indexing

If cities/items are 1-indexed in the input but your bits are 0-indexed, every (1 << i) is off by one. Pick one convention and stick to it. The simplest: read into 0-indexed arrays, use 0-indexed bits.

7. Debug checklist

  1. Print dp[mask] for all masks with small n (3 or 4). Compare to brute-force.
  2. Verify base cases: dp[0] = 0 (or dp[1 << start] = 0 for TSP).
  3. Check that you never set a bit that's already set in mask.
  4. Verify the answer location: dp[(1 << n) - 1] for full-set answers.

Mental Traps

These are the thinking errors that survive even after you "know" the technique:

Trap 1: "I need to try all permutations." Bitmask DP tracks which elements are used, not the order. Two orderings that produce the same (mask, last) pair are merged into one cell. Unless order is explicitly part of your state (e.g., dp[mask][last] in TSP), you do NOT enumerate permutations -- the mask already collapses them.

Trap 2: "2n is always too slow." Reality check: 220106, and 220×202×107 -- easily fits in a 2-second time limit. The wall is at n23-25. If the constraint says n20, bitmask DP is designed for this.

Trap 3: "I'll just iterate all subsets inside the mask loop." Iterating all submasks of each mask is O(3n), NOT O(4n) -- but 3203.5×109 is too slow. If you need submask enumeration, you need n15. Confusing O(2nn) patterns (TSP, assignment) with O(3n) patterns (subset partitioning) is a common source of TLE.

  Bitmask state transition -- adding element 2 to mask 01011:

  Before:    01011   (elements {0, 1, 3} used)
                ^
                bit 2 is 0 -> element 2 is available

  Set bit 2:  01011
             | 00100   (1 << 2)
             --------
              01111   (elements {0, 1, 2, 3} used)

  In code:   int nmask = mask | (1 << 2);   // 11 | 4 = 15
             //  01011 | 00100 = 01111

  WRONG if bit already set:
              01111 | 00100 = 01111  <- mask unchanged!
              This creates a silent self-loop. Always guard:
              if (mask & (1 << v)) continue;  // v already in mask

Common Mistakes

  1. Int overflow on bit shift. (1 << n) overflows int when n >= 31. Always use (1LL << n).
  2. Wrong initialization. For minimization, memset(dp, 0x3f, sizeof dp) and set dp[0] = 0. All-zeros initialization gives wrong answers because min(0, anything_positive) = 0.
  3. Inconsistent indexing. Using (mask >> i) & 1 with 0-indexed elements but 1-indexed input causes silent off-by-one across the entire bitmask.
  4. Forgetting the popcount check. Some problems need the k-th element assigned to a specific position. Use __builtin_popcount(mask) to determine which position you are filling.
  5. Exceeding memory. dp[1 << 20] is about 4 MB for int, but dp[1 << 20][20] is 80 MB. Watch the state size.
  6. mask & j instead of mask & (1 << j). The #1 bitmask DP bug. mask & j treats j as a bitmask, not a bit index, so city 2 (binary 10) collides with city 1's bit. Always convert an index to a mask with (1 << j) before testing.

Debug Drills

Drill 1. TSP bitmask DP gives a tour shorter than any single edge.

cpp
int dp[1 << 20][20];
memset(dp, 0x3f, sizeof dp);
dp[1][0] = 0; // start at city 0
for (int mask = 1; mask < (1 << n); mask++)
    for (int u = 0; u < n; u++) {
        if (!(mask & (1 << u))) continue;
        for (int v = 0; v < n; v++) {
            if (mask & (1 << v)) continue;
            dp[mask | (1 << v)][v] = min(dp[mask | (1 << v)][v], dp[mask][u] + dist[u][v]);
        }
    }
int ans = dp[(1 << n) - 1][0]; // return to city 0
What's wrong?

The final answer dp[(1<<n)-1][0] reads the cost of ending at city 0, but the DP only stores the cost of the last city visited, not including the return edge. Fix: ans = min over all u of (dp[(1<<n)-1][u] + dist[u][0]).

Drill 2. Subset enumeration misses the empty subset.

cpp
int total = 0;
for (int sub = mask; sub > 0; sub = (sub - 1) & mask)
    total += value[sub];
What's wrong?

The loop condition sub > 0 skips sub = 0 (the empty subset). If you need to include the empty set, add total += value[0]; after the loop. The idiom for (sub = mask; ; sub = (sub-1) & mask) { process(sub); if (sub == 0) break; } handles all subsets including empty.

Drill 3. Assignment DP assigns person 0 correctly but all others get cost 0.

cpp
int dp[1 << 20] = {};
for (int mask = 0; mask < (1 << n); mask++) {
    int person = __builtin_popcount(mask);
    if (person >= n) continue;
    for (int task = 0; task < n; task++) {
        if (mask & (1 << task)) continue;
        dp[mask | (1 << task)] = min(dp[mask | (1 << task)], dp[mask] + cost[person][task]);
    }
}
What's wrong?

dp is initialized to all zeros, so min(0, anything_positive) is always 0 for states beyond the first. Initialize memset(dp, 0x3f, sizeof dp) and set dp[0] = 0. Only the empty-set base case should be zero.


Self-Test

Before moving to the practice problems, make sure you can do each of these without looking anything up:

  • [ ] Explain what dp[mask] represents in a bitmask DP and why iterating masks in increasing order guarantees all dependencies (submasks) are already computed.
  • [ ] Implement TSP with bitmask DP from scratch: dp[mask][i] = min cost to visit the set mask ending at city i. Include correct base case, transition, and answer extraction.
  • [ ] Convert between "iterate over all submasks of mask" (for (int s = mask; s > 0; s = (s-1) & mask)) and "iterate over all supermasks of mask" (flip perspective: for each mask, check if your target is a submask).
  • [ ] State the time complexity of bitmask DP for TSP (O(2nn2)) and for assignment (O(2nn)). Explain why n>20-23 is too large.
  • [ ] Explain why dp[(1<<n)-1] gives the answer for "use all elements" problems -- the mask (1<<n)-1 has all n bits set, representing the full set.
  • [ ] Debug scenario: mask already has bit i set but you try to add element i again via mask | (1 << i). What goes wrong? (The mask doesn't change, so you create a self-loop that may silently produce an incorrect shorter "path" that revisits a city.)
  • [ ] Write the submask enumeration idiom and state its total complexity when summed over all masks (O(3n)).

Practice Problems

CF RatingHow bitmask DP tends to appear
1200Rare — brute force handles n10
1500Small TSP, assignment, "match n items to n slots"
1800Bitmask DP + profile DP on grids, Hamiltonian path counting
2100Broken profile DP, SOS DP (sum over subsets), bitmask + other DP dimensions
#ProblemSourceDifficultyKey Idea
1Hamiltonian FlightsCSES1700Count Hamiltonian paths with bitmask DP
2MatchingAtCoder DP-O1600Assignment problem, count perfect matchings
3Compatible NumbersCF 165E1800SOS DP -- find a compatible pair via superset sums
4Team BuildingCF 1316E1800Assignment + leftover scoring with bitmask
5Jzzhu and NumbersCF 449D2000SOS DP + inclusion-exclusion
6Counting TilingsCSES2000Profile DP on grid with dominoes
7Elevator RidesCSES1800Partition items into minimum groups with weight limit
8Keyboard PurchaseCF 1238E2000SOS DP on adjacency bitmask
9Little Elephant and XorCF 258D2100Digit DP meets bitmask reasoning
10FishAtCoder DP-Y1900Probability + bitmask DP over elimination order

Further Reading


Recap

  • Core idea: an integer is a set. Bit i set means element i is included. Enumerate all 2^n subsets.
  • Transition pattern: for each mask, try adding each element i not in mask: dp[mask | (1 << i)].
  • Feasibility bound: n <= 20 (maybe 23 with careful optimization). Beyond that, look for other structure.
  • Classic problems: TSP, assignment, minimum-cost Hamiltonian path, subset partitioning.

Built for the climb from Codeforces 1100 to 2100.