Appearance
SOS DP (Sum Over Subsets)
Quick Revisit
- USE WHEN: Need
for all masks (subset-sum aggregation, AND/OR convolution, counting divisor-like relations on bitmasks) - INVARIANT: After processing bit
, accounts for all subsets of that differ only in bits - TIME: O(n · 2ⁿ) | SPACE: O(2ⁿ)
- CLASSIC BUG: Iterating masks inside the bit loop instead of bit inside masks — breaks the dimension-by-dimension sweep
- PRACTICE: 04-ladder-dp
AL-30 | For an array
Difficulty: [Advanced]Prerequisites: DP -- Bitmask, Bit ManipulationCF Rating Range: 1800-2100
Contents
Intuition
- C++ STL Reference
- Implementation (Contest-Ready)
- Operations Reference
- Problem Patterns & Tricks
- Contest Cheat Sheet
- Gotchas & Debugging
- Self-Test
- Practice Problems
- Further Reading
Intuition
Imagine you are building a team-composition analyzer for a game with
The brute-force approach iterates over all submasks of each mask:
cpp
for (int mask = 0; mask < (1 << n); ++mask)
for (int s = mask; ; s = (s - 1) & mask) {
F[mask] += A[s];
if (s == 0) break;
}Each mask
For
Subset Lattice (Hasse Diagram) for n = 3
=========================================
Each node is a bitmask. An edge from A up to B means A ⊂ B
(A is a subset of B, obtained by setting one more bit).
SOS DP computes, for each node, the sum of values at ALL
nodes reachable going DOWNWARD.
111
/ | \
110 101 011
|\ /\ /|
100 010 001
\ | /
000
Subsets of 101: Subsets of 110:
101 110
/ \ / \
100 001 100 010
\ / \ /
000 000
F[101] = A[000]+A[001]+A[100]+A[101]
F[110] = A[000]+A[010]+A[100]+A[110]
F[111] = sum of ALL A[mask] (every mask is a subset of 111)Process bits one at a time: for each bit position, either include it or don't -- this "Zeta transform" computes all
Think of it as a high-dimensional prefix sum. If you have a 2D grid, you compute prefix sums first along rows, then along columns -- two separate passes. SOS DP does exactly the same thing, but in
For each bit
cpp
for (int i = 0; i < n; ++i) // one pass per bit
for (int mask = 0; mask < (1 << n); ++mask)
if (mask & (1 << i))
F[mask] += F[mask ^ (1 << i)]; // add "bit i off" versionTotal work:
Visual Walkthrough
Initial array (copy of
index(bin) 000 001 010 011 100 101 110 111
F[]: 2 3 5 7 11 13 17 19Layer 0 -- process bit 0 (the
mask 011: bit 0 set -> F[011] += F[010] => 7 + 5 = 12
mask 001: bit 0 set -> F[001] += F[000] => 3 + 2 = 5
mask 101: bit 0 set -> F[101] += F[100] => 13 + 11 = 24
mask 111: bit 0 set -> F[111] += F[110] => 19 + 17 = 36
After layer 0:
index(bin) 000 001 010 011 100 101 110 111
F[]: 2 5 5 12 11 24 17 36Meaning:
Layer 1 -- process bit 1 (the
mask 010: bit 1 set -> F[010] += F[000] => 5 + 2 = 7
mask 011: bit 1 set -> F[011] += F[001] => 12 + 5 = 17
mask 110: bit 1 set -> F[110] += F[100] => 17 + 11 = 28
mask 111: bit 1 set -> F[111] += F[101] => 36 + 24 = 60
After layer 1:
index(bin) 000 001 010 011 100 101 110 111
F[]: 2 5 7 17 11 24 28 60Layer 2 -- process bit 2 (the
mask 100: bit 2 set -> F[100] += F[000] => 11 + 2 = 13
mask 101: bit 2 set -> F[101] += F[001] => 24 + 5 = 29
mask 110: bit 2 set -> F[110] += F[010] => 28 + 7 = 35
mask 111: bit 2 set -> F[111] += F[011] => 60 + 17 = 77
After layer 2 (FINAL):
index(bin) 000 001 010 011 100 101 110 111
F[]: 2 5 7 17 13 29 35 77Verification:
Layer-by-layer diagram (n = 3):
A[mask] --bit 0--> sum over --bit 1--> sum over --bit 2--> sum over
subsets subsets ALL subsets
differing differing = F[mask]
in bit 0 in bits 0-1
Each layer doubles the "freedom" by one more bit position.
After all n layers, every submask has been counted exactly once.SOS DP Dimension Sweep -- State After Each Pass (n = 3)
========================================================
Shows which submasks' A-values are included in dp[mask] after each pass.
INIT After dim 0 After dim 1 After dim 2
mask (= A[mask]) (bit 0 free) (bits 0-1 free) (ALL free)
───────────────────────────────────────────────────────────────────────────────
000 {000} {000} {000} {000}
001 {001} {000,001} {000,001} {000,001}
010 {010} {010} {000,010} {000,010}
011 {011} {010,011} {000,001, {000,001,
010,011} 010,011}
100 {100} {100} {100} {000,100}
101 {101} {100,101} {100,101} {000,001,
100,101}
110 {110} {110} {100,110} {000,010,
100,110}
111 {111} {110,111} {100,101, {ALL 8
110,111} masks}
Each dimension pass DOUBLES the submasks covered (on average).
After all 3 passes, dp[mask] = sum over ALL submasks of mask.
State transitions for mask = 111:
INIT: {111}
dim 0: {111} ∪ dp[110] = {111} ∪ {110} = {110,111}
dim 1: {110,111} ∪ dp[101] = {110,111} ∪ {100,101}
= {100,101,110,111}
dim 2: {100,101,110,111} ∪ dp[011] = ... ∪ {000,001,010,011}
= {ALL 8 masks} OKThe Invariant
+-----------------------------------------------------------------------+
| After processing bit j, dp[mask] contains the sum of A[s] over all |
| masks s such that: |
| - s agrees with mask on bits j+1, j+2, ..., n-1 |
| - s is a submask of mask on bits 0, 1, ..., j |
| |
| In other words, bits 0..j are "free" (can be 0 even if mask has 1), |
| while bits j+1..n-1 are "locked" (must match mask exactly). |
| |
| After processing all n bits, every bit is free => full subset sum. |
+-----------------------------------------------------------------------+Why SOS Beats Naive: Eliminating Redundant Work (n = 3)
=========================================================
NAIVE O(3^n): Each mask independently enumerates its own submasks.
mask=111 visits: 000 001 010 011 100 101 110 111 (8 submasks)
mask=110 visits: 000 010 100 110 (4 submasks)
mask=101 visits: 000 001 100 101 (4 submasks)
mask=011 visits: 000 001 010 011 (4 submasks)
mask=100 visits: 000 100 (2 submasks)
mask=010 visits: 000 010 (2 submasks)
mask=001 visits: 000 001 (2 submasks)
mask=000 visits: 000 (1 submask)
Total: 27 = 3^3
Notice: A[000] is visited 8 times! A[001] visited 4 times!
Massive redundancy -- each contribution recomputed from scratch.
SOS O(n * 2^n): Three passes, each touching ALL 2^n masks once.
Pass 0 (bit 0): 8 masks x 1 check/add = 8 ops
Pass 1 (bit 1): 8 masks x 1 check/add = 8 ops
Pass 2 (bit 2): 8 masks x 1 check/add = 8 ops
Total: 24 = 3 x 2^3
The trick: pass 0 propagates bit-0 contributions to ALL masks
that need them -- ONCE. Pass 1 reuses that work, adding bit-1
contributions. Each subsequent pass builds on all previous ones.
It's the same idea as separable convolution:
2D brute force: O(K² per pixel)
Separable: O(K per pixel) in each of 2 passes = O(2K)
Here: n "dimensions" (bits), each pass costs O(2^n).
Total: O(n * 2^n) instead of O(3^n).What the Code Won't Teach You
The Zeta/Möbius transform connection. SOS DP is not just an algorithm -- it is the Zeta transform over the subset lattice (a partially ordered set where += vs -=. Recognizing "this is a lattice transform" unlocks the entire family -- superset sums, AND/OR convolution, and inclusion-exclusion all become instances of the same algebraic structure.
Why dimension-by-dimension works. The subset relation decomposes bit-by-bit:
When SOS DP applies. Any time you need
The complexity insight. Naive subset enumeration over all masks costs
With the deeper structure understood, here are the practical signals that point to SOS DP.
When to Reach for This
Trigger phrases in the problem statement:
- "Sum over all subsets" / "aggregate over submasks" / "for every mask, combine contributions from submasks."
- "Zeta transform" / "Mobius transform" / "inclusion-exclusion over bitmasks."
- "AND convolution" / "OR convolution" -- these reduce to SOS DP.
- The problem involves
states and asks for a transform that relates masks to their submasks or supermasks. - You already have an
submask-enumeration solution and need to speed it up.
Counter-examples -- SOS DP is NOT the answer when:
- You need to iterate submasks with individual processing per submask (not just summation) --
may be unavoidable. - The relationship is between masks and their complements, not submasks -- use direct XOR tricks.
-- even becomes too slow. - The aggregation is not decomposable bit-by-bit (e.g., you need the product over a non-commutative structure).
C++ STL Reference
SOS DP relies on standard bit-manipulation builtins rather than dedicated containers:
| Function / Builtin | Header | What it does | Notes |
|---|---|---|---|
__builtin_popcount(x) | built-in | Count of set bits | Use __builtin_popcountll for long long |
__builtin_ctz(x) | built-in | Index of lowest set bit | Undefined for x == 0 |
1 << i | -- | Bitmask with only bit | Use 1LL << i when i >= 31 |
mask & (1 << i) | -- | Test if bit | -- |
mask ^ (1 << i) | -- | Toggle bit | -- |
mask | (1 << i) | -- | Set bit | -- |
mask & (mask - 1) | -- | Clear lowest set bit | Useful for iterating set bits |
popcount(x) | <bit> | C++20 standard popcount | -- |
__lg(x) | built-in | Floor of | Index of highest set bit |
Implementation (Contest-Ready)
Version 1: SOS DP -- Zeta Transform (Minimal)
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> A(1 << n);
for (auto& x : A) cin >> x;
// F[mask] = sum of A[s] for all s that are submasks of mask
vector<long long> F(A.begin(), A.end());
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (mask & (1 << i))
F[mask] += F[mask ^ (1 << i)];
for (int mask = 0; mask < (1 << n); ++mask)
cout << F[mask] << " \n"[mask == (1 << n) - 1];
}Loop Order Warning
Loop order is critical: bit-outer, mask-inner. The correct SOS DP iterates:
cpp
for (int bit = 0; bit < N; bit++) // OUTER: which dimension
for (int mask = 0; mask < (1 << N); mask++) // INNER: which subset
if (mask & (1 << bit))
dp[mask] += dp[mask ^ (1 << bit)];Why not mask-outer? If you iterate masks in the outer loop, when processing mask M at bit B, the value dp[M ^ (1 << B)] may already include contributions from higher bits that shouldn't be counted yet. The bit-outer loop ensures each dimension is processed independently across ALL masks before moving to the next dimension.
This is analogous to how the knapsack loop direction matters -- same principle of "which row am I reading from?"
Version 1: SOS DP -- Zeta Transform (Explained)
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> A(1 << n);
for (auto& x : A) cin >> x;
// Goal: for each mask, compute F[mask] = sum of A[s] for all submasks s of mask.
//
// Approach: process one bit dimension at a time.
// After processing bit i, F[mask] accounts for all submasks that may
// differ from mask on bits 0..i (but must agree on bits i+1..n-1).
vector<long long> F(A.begin(), A.end());
for (int i = 0; i < n; ++i) {
for (int mask = 0; mask < (1 << n); ++mask) {
if (mask & (1 << i)) {
// mask has bit i set. The submask with bit i cleared
// is mask ^ (1 << i). Accumulate its (partial) sum.
F[mask] += F[mask ^ (1 << i)];
}
// If bit i is not set in mask, no submask of mask can have
// bit i set, so there is nothing to add.
}
}
for (int mask = 0; mask < (1 << n); ++mask)
cout << F[mask] << " \n"[mask == (1 << n) - 1];
}Version 2: Mobius Transform -- Inverse of SOS (Minimal)
Undo the Zeta transform: given
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> F(1 << n);
for (auto& x : F) cin >> x;
// Mobius inversion: subtract instead of add, same loop structure
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (mask & (1 << i))
F[mask] -= F[mask ^ (1 << i)];
// Now F[mask] = original A[mask]
for (int mask = 0; mask < (1 << n); ++mask)
cout << F[mask] << " \n"[mask == (1 << n) - 1];
}Version 3: Superset Sum (Zeta Transform -- Upward)
Compute
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> A(1 << n);
for (auto& x : A) cin >> x;
// Superset sum: flip the condition -- add when bit i is NOT set
vector<long long> G(A.begin(), A.end());
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (!(mask & (1 << i)))
G[mask] += G[mask | (1 << i)];
for (int mask = 0; mask < (1 << n); ++mask)
cout << G[mask] << " \n"[mask == (1 << n) - 1];
}Version 4: AND Convolution
Given arrays
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> A(1 << n), B(1 << n);
for (auto& x : A) cin >> x;
for (auto& x : B) cin >> x;
// Step 1: Superset-sum (Zeta) both arrays
auto zeta_super = [&](vector<long long>& f) {
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (!(mask & (1 << i)))
f[mask] += f[mask | (1 << i)];
};
zeta_super(A);
zeta_super(B);
// Step 2: Pointwise multiply
vector<long long> C(1 << n);
for (int mask = 0; mask < (1 << n); ++mask)
C[mask] = A[mask] * B[mask];
// Step 3: Inverse (Mobius) -- superset version: subtract upward
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (!(mask & (1 << i)))
C[mask] -= C[mask | (1 << i)];
for (int mask = 0; mask < (1 << n); ++mask)
cout << C[mask] << " \n"[mask == (1 << n) - 1];
}Version 5: OR Convolution
Given arrays
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
int n;
cin >> n;
vector<long long> A(1 << n), B(1 << n);
for (auto& x : A) cin >> x;
for (auto& x : B) cin >> x;
// Step 1: Subset-sum (Zeta) both arrays
auto zeta_sub = [&](vector<long long>& f) {
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (mask & (1 << i))
f[mask] += f[mask ^ (1 << i)];
};
zeta_sub(A);
zeta_sub(B);
// Step 2: Pointwise multiply
vector<long long> C(1 << n);
for (int mask = 0; mask < (1 << n); ++mask)
C[mask] = A[mask] * B[mask];
// Step 3: Inverse (Mobius) -- subset version: subtract downward
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (mask & (1 << i))
C[mask] -= C[mask ^ (1 << i)];
for (int mask = 0; mask < (1 << n); ++mask)
cout << C[mask] << " \n"[mask == (1 << n) - 1];
}Operations Reference
| Operation | Time | Space | When |
|---|---|---|---|
| Subset sum (Zeta transform) | |||
| Superset sum (upward Zeta) | |||
| Mobius inversion (subset) | Recover | ||
| Mobius inversion (superset) | Recover | ||
| AND convolution | |||
| OR convolution | $C[m] = \sum_{i \mathbin{ | ||
| Brute-force submask enumeration | Iterate all submasks of all masks |
Practical limits:
Memory:
Problem Patterns & Tricks
Pattern 1: Subset Sum Transform
Compute
Use case: You have values indexed by bitmasks and need aggregate information over all submasks. Common in counting problems: "how many elements have a property compatible with mask?"
Key trick: To find, for each mask, whether any submask has a nonzero value (not just the sum), store the value itself and check
Examples: CF 165E -- Compatible Numbers
Pattern 2: Superset Sum and Complementary Counting
Compute
Use case: "Count elements whose bits are a superset of mask" -- i.e., elements that have at least the bits in mask set.
Examples: CF 449D -- Jzzhu and Numbers
Pattern 3: AND Convolution
Recipe: (1) Superset-Zeta both
Why it works: Superset sum of
Examples: CF 1620F -- Bipartite Array
Pattern 4: OR Convolution
Recipe: (1) Subset-Zeta both
Why it works: Symmetric to AND convolution. Subset sum turns OR into pointwise product because
Examples: CF 800D -- Varying Kibibits
Pattern 5: Inclusion-Exclusion via Mobius
Many counting problems ask: "how many subsets satisfy ALL of several constraints?" Encode constraints as bits. The superset sum counts subsets satisfying at least the given constraints. Mobius inversion recovers the count for exactly the given constraints.
Classic pattern:
Count subsets with AND = 0:
1. Let cnt[mask] = number of array elements equal to mask.
2. Superset-Zeta: S[mask] = number of elements with all bits of mask set.
3. The number of k-tuples with AND = mask (exactly) is recovered by
Mobius inversion of S[mask]^k.Examples: CF 449D -- Jzzhu and Numbers, CF 1036F -- Relatively Prime Powers
Pattern 6: Finding a Compatible Element
"For each
Rephrase: find
cpp
vector<int> witness(1 << n, -1);
for (int i = 0; i < sz; ++i)
witness[A[i]] = i; // store index of some element equal to this mask
for (int i = 0; i < n; ++i)
for (int mask = 0; mask < (1 << n); ++mask)
if (!(mask & (1 << i)) && witness[mask] == -1)
witness[mask] = witness[mask | (1 << i)];
// witness[mask] = index of some element that is a submask of mask (or -1)
// For element A[i], check witness[complement of A[i]].Examples: CF 165E -- Compatible Numbers
Pattern 7: Counting Subsets with a Given Popcount
Combine SOS DP with popcount tracking. Instead of a single sum, maintain a vector of sums by popcount:
cpp
// dp[mask][k] = sum of A[s] over submasks s of mask with popcount(s) = kThis costs
Contest Cheat Sheet
+------------------------------------------------------------------+
| SOS DP CHEAT SHEET (AL-30) |
+------------------------------------------------------------------+
| WHAT IT COMPUTES: |
| F[mask] = sum of A[s] for all submasks s of mask |
| (also: superset sums, AND/OR convolution, Mobius inversion) |
+------------------------------------------------------------------+
| SUBSET SUM (Zeta): |
| for (int i = 0; i < n; ++i) |
| for (int m = 0; m < (1<<n); ++m) |
| if (m & (1<<i)) F[m] += F[m ^ (1<<i)]; |
+------------------------------------------------------------------+
| SUPERSET SUM: |
| for (int i = 0; i < n; ++i) |
| for (int m = 0; m < (1<<n); ++m) |
| if (!(m & (1<<i))) G[m] += G[m | (1<<i)]; |
+------------------------------------------------------------------+
| MOBIUS (inverse of Zeta): same loops, subtract instead of add |
+------------------------------------------------------------------+
| AND CONVOLUTION: super-Zeta -> multiply -> super-Mobius |
| OR CONVOLUTION: sub-Zeta -> multiply -> sub-Mobius |
+------------------------------------------------------------------+
| TIME: O(n * 2^n) n <= 20 comfortable |
| SPACE: O(2^n) 8 MB for n=20 with long long |
+------------------------------------------------------------------+
| INVARIANT: after bit j, bits 0..j are "free", j+1..n-1 "locked" |
+------------------------------------------------------------------+
| TRIGGERS: "sum over subsets", "AND/OR convolution", n <= 20, |
| "Zeta/Mobius transform", speed up O(3^n) submask enum |
+------------------------------------------------------------------+Gotchas & Debugging
1. Loop order matters
The outer loop must be over bit positions, the inner loop over masks. Reversing them computes something entirely wrong -- you would process all bits for mask 0 before touching mask 1, breaking the layer-by-layer invariant.
cpp
// CORRECT // WRONG
for (int i = 0; i < n; ++i) for (int m = 0; m < (1<<n); ++m)
for (int m = 0; m < (1<<n); ++m) for (int i = 0; i < n; ++i)
... ...2. Subset vs. superset direction
Subset sum: condition is if (mask & (1 << i)), operation is += F[mask ^ (1 << i)] (bit off). Superset sum: condition is if (!(mask & (1 << i))), operation is += G[mask | (1 << i)] (bit on).
Mixing these up produces the transform in the wrong direction. Double-check with
3. Mobius is just subtraction
The inverse of the subset-sum Zeta transform is identical code with -= instead of +=. Do not reverse the loop order for inversion -- the same loop order with subtraction works because the transform is its own structural inverse under sign change.
4. Overflow with pointwise multiplication
In AND/OR convolution, after Zeta transform the values can be as large as long long. Use modular arithmetic if the problem requires it, or __int128 for intermediate products.
5. Off-by-one in
If values range from
6. Forgetting the initial copy
SOS DP modifies the array in place. If you need the original
7. Debug checklist
- Test with
or -- small enough to verify every entry by hand. - Verify
(full mask is the supermask of everything). - Verify
(empty mask has only one submask: itself). - After Mobius inversion of a Zeta-transformed array, you should recover the original.
- For convolution: verify
(AND) or (OR).
Mental Traps
"SOS DP iterates over all subsets." No -- it iterates over all bitmasks and all dimensions (bit positions). The magic is that it never explicitly enumerates the subsets of any single mask. Each pass processes one dimension across every mask, and the composition of
"SOS is just inclusion-exclusion." SOS DP computes the Zeta transform (subset sum):
"The dimension loop order doesn't matter." It absolutely does. Each dimension (bit position) must be fully processed across ALL masks before moving to the next dimension. Think of it as dp[mask] has correctly accumulated contributions from submasks differing only in bits
Mental Model: SOS DP State Transitions for n = 3
===================================================
Think of each dp[mask] as a "bucket" being filled in 3 sweeps.
Sweep 0 (bit 0): Sweep 1 (bit 1): Sweep 2 (bit 2):
dp[111]: dp[111]: dp[111]:
+= dp[110] += dp[101] += dp[011]
{111,110} {111,110,101,100} {ALL 8}
dp[101]: dp[101]: dp[101]:
+= dp[100] (bit 1 off, skip) += dp[001]
{101,100} {101,100} {101,100,001,000}
dp[011]: dp[011]: dp[011]:
+= dp[010] += dp[001] (bit 2 off, skip)
{011,010} {011,010,001,000} {011,010,001,000}
Each sweep: bit i MUST be fully processed for ALL masks
before sweep i+1 begins. This is non-negotiable.Common Mistakes
- Swapping loop order. The outer loop must be bit-position, inner must be mask. Reversing them breaks the layer-by-layer invariant -- each dimension must be fully processed across all masks before the next.
- Wrong direction (subset vs superset). Subset-sum:
if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)]. Superset-sum: flip the condition --if (!(mask & (1 << i))) f[mask] += f[mask | (1 << i)]. Sweeping in the wrong direction gives the dual problem and the numbers look plausible on small tests. Always comment the intended direction above the loop. - Mobius confused with re-running Zeta. Mobius inversion is the same loop structure with
-=instead of+=; do not reverse the loop order for inversion. - Overflow in AND/OR convolution. After the Zeta transform values grow to
sum |A[i]|. Multiplying two such values overflowslong long. Use modular arithmetic or__int128. - Forgetting the destructive nature of SOS. SOS modifies the array in place. If you later need the original counts (e.g., to subtract self-contribution), save a copy before running the sweep.
Debug Drills
Drill 1. Off-by-one in bit loop.
cpp
// SOS DP: for each mask, compute sum of f[s] for all s that are subsets of mask
int f[1 << 20];
int n = 20;
// populate f[mask] with initial values...
for (int i = 0; i <= n; i++) { // BUG: i <= n goes one bit too far
for (int mask = 0; mask < (1 << n); mask++) {
if (mask & (1 << i))
f[mask] += f[mask ^ (1 << i)];
}
}What's wrong?
The outer loop goes i = 0 to i = n (inclusive), which is n+1 iterations. When i = n, (1 << i) equals (1 << 20) = 1048576, which is outside the valid mask range [0, 2²⁰ - 1]. This reads/writes out of bounds. Fix: Change the condition to i < n (i.e., for (int i = 0; i < n; i++)). The loop should iterate over bit positions 0 through n-1.
Drill 2. Wrong subset direction (superset instead of subset).
cpp
// Goal: f[mask] = sum of original f[s] for all s ⊆ mask
// (subset sum / zeta transform)
for (int i = 0; i < n; i++) {
for (int mask = 0; mask < (1 << n); mask++) {
if (!(mask & (1 << i))) // BUG: this is superset direction!
f[mask] += f[mask | (1 << i)];
}
}What's wrong?
This code performs the superset-sum (for each mask, sum over all supermasks), not subset-sum. The condition !(mask & (1 << i)) selects masks that don't have bit i, then adds the value from the mask that does have bit i. This propagates values downward (supersets -> subsets). Fix: For subset-sum, flip the condition: if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)];. This propagates values upward (subsets -> supersets).
Drill 3. Forgetting to save the original array before SOS sweep.
cpp
// Count, for each mask, how many of the n input values are subsets of mask
int cnt[1 << 20];
memset(cnt, 0, sizeof(cnt));
int n_vals;
cin >> n_vals;
for (int i = 0; i < n_vals; i++) {
int x; cin >> x;
cnt[x]++; // only increment, not set -- but what if x appears multiple times?
}
// SOS sweep
for (int i = 0; i < 20; i++)
for (int mask = 0; mask < (1 << 20); mask++)
if (mask & (1 << i))
cnt[mask] += cnt[mask ^ (1 << i)];
// Now use cnt[mask] as "number of input values that are subsets of mask"
// BUG: works correctly... until someone adds cnt[x]-- later thinking they can undoWhat's wrong?
This code is actually correct for the stated purpose -- but the subtle trap is that the SOS sweep is destructive: after the sweep, cnt[mask] no longer holds the original count of values equal to mask; it holds the cumulative subset sum. If any later code tries to read or modify the original cnt[x] values (e.g., subtracting self-contribution for "count subsets of x among other values"), it will use the wrong (cumulative) value. Fix: Save a copy of the original array before the SOS sweep: int orig[1 << 20]; memcpy(orig, cnt, sizeof(cnt));. Then orig[x] gives the raw count and cnt[x] gives the subset-sum count.
Self-Test
Before moving on, verify you can do the following without referencing the notes:
- [ ] Implement SOS DP from scratch: for each mask, compute the sum of
over all . Write the correct nested loop (bit-outer, mask-inner) with the right condition and update. - [ ] Explain the complexity: why is SOS DP
instead of ? State the key insight (dimension decomposition eliminates redundant subset enumeration). - [ ] State the dimension-sweep invariant: after processing bit
, dp[mask]has accumulated all submasks that agree withmaskon bitsand are free on bits . - [ ] Implement the superset sum: flip the condition -- process masks where bit
is NOT set, and add dp[mask | (1 << i)]. - [ ] Apply SOS DP to a concrete problem: given an array of integers, for each element
, count how many other elements satisfy (i.e., is an AND-submask of ). Set up the frequency array and run the Zeta transform. - [ ] Describe the Zeta/Möbius relationship: SOS DP computes the Zeta transform; Möbius inversion (same loops, subtract instead of add) undoes it. State when you need each.
Practice Problems
| CF Rating | What You Should Be Comfortable With |
|---|---|
| 1200 | Understand bitmask basics; enumerate all submasks of a mask in O(3ⁿ); know what "subset sum over lattice" means conceptually |
| 1500 | Implement the standard SOS DP loop (iterate over bits, then over masks); solve "for each mask, count how many input values are subsets of it" |
| 1800 | Apply SOS to AND/OR convolutions; use superset-sum (flip inclusion direction); combine SOS with Möbius inversion to get exact counts (not cumulative); solve CSES Bit Problem |
| 2100 | XOR convolution (Walsh-Hadamard Transform); SOS in non-binary bases (ternary zeta for "divisibility" lattices); SOS on large alphabets with dimension-by-dimension sweeping |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Compatible Numbers | CF 165E | 1800 | SOS DP with witness propagation -- find |
| 2 | Jzzhu and Numbers | CF 449D | 2000 | Superset sum + inclusion-exclusion for AND = 0 |
| 3 | Bits And Pieces | CF 1208F | 2100 | Superset SOS to greedily maximize $a_i \mathbin{ |
| 4 | Relatively Prime Powers | CF 1036F | 2000 | Mobius function over exponent masks |
| 5 | Vowels | CF 383E | 1900 | SOS DP to count triples with at least one vowel |
| 6 | Varying Kibibits | CF 800D | 2100 | Generalized SOS in non-binary base (ternary Zeta) |
| 7 | Prefix Function Queries | CF 1553F | 2100 | Bitmask transform to handle string matching constraints |
| 8 | AND Closure | CF 1680F | 2000 | Subset-sum transform to find AND-closed sets |
| 9 | Special Pairs | CF 1761E | 1900 | OR convolution to count pairs |
| 10 | Expected Power | CF 2020E | 1900 | SOS DP with probability over XOR subsets |
| 11 | Bit Problem | CSES | 1800 | For each x, count elements where x AND y = y (subset sum) |
| 12 | Reachable Nodes | CSES | 2100 | Bitset reachability via SOS-like propagation |
Further Reading
- CF Blog -- SOS Dynamic Programming -- the definitive tutorial by usaxena95. Start here.
- CF Blog -- Sum over Subsets and Mobius Inversion -- in-depth treatment of Mobius inversion on the subset lattice.
- cp-algorithms -- Sum over Subsets -- covers
submask enumeration and comparison with SOS. - CF Blog -- Bitwise Convolutions (AND, OR, XOR) -- unifies AND/OR/XOR convolution using subset/superset/Walsh-Hadamard transforms.
- See: DP -- Bitmask -- prerequisite; covers the
submask enumeration that SOS DP improves upon. - See: Bit Manipulation -- fundamental bit operations used throughout.