Skip to content

SOS DP (Sum Over Subsets)

Quick Revisit

  • USE WHEN: Need F[mask]=smaskA[s] for all masks (subset-sum aggregation, AND/OR convolution, counting divisor-like relations on bitmasks)
  • INVARIANT: After processing bit i, F[m] accounts for all subsets of m that differ only in bits 0..i
  • TIME: O(n · 2ⁿ) | SPACE: O(2ⁿ)
  • CLASSIC BUG: Iterating masks inside the bit loop instead of bit inside masks — breaks the dimension-by-dimension sweep
  • PRACTICE: 04-ladder-dp

AL-30 | For an array A[02n1] indexed by bitmasks, compute F[mask]=smaskA[s] for every mask -- in O(n2n) instead of O(3n). The "Zeta transform" over the subset lattice.

Difficulty: [Advanced]Prerequisites: DP -- Bitmask, Bit ManipulationCF Rating Range: 1800-2100

Contents


Intuition

Imagine you are building a team-composition analyzer for a game with n skills. Each player subset has a power score, and for every team you want the total power of all sub-teams it contains. In bitmask terms: you have an array A of 2n values, one per bitmask of n bits, and you need to compute, for every mask:

F[mask]=smaskA[s]

The brute-force approach iterates over all submasks of each mask:

cpp
for (int mask = 0; mask < (1 << n); ++mask)
    for (int s = mask; ; s = (s - 1) & mask) {
        F[mask] += A[s];
        if (s == 0) break;
    }

Each mask m with k set bits has 2k submasks. Summing over all masks:

k=0n(nk)2k=(1+2)n=3n

For n=20: 3203.5×109 -- far too slow. We need something fundamentally better.

Subset Lattice (Hasse Diagram) for n = 3
=========================================

Each node is a bitmask. An edge from A up to B means A ⊂ B
(A is a subset of B, obtained by setting one more bit).
SOS DP computes, for each node, the sum of values at ALL
nodes reachable going DOWNWARD.

                         111
                        / | \
                     110  101  011
                      |\ /\ /|
                     100 010 001
                        \ | /
                         000

Subsets of 101:               Subsets of 110:
     101                           110
     / \                           / \
   100   001                     100   010
     \   /                         \  /
      000                          000

F[101] = A[000]+A[001]+A[100]+A[101]
F[110] = A[000]+A[010]+A[100]+A[110]
F[111] = sum of ALL A[mask]  (every mask is a subset of 111)

Process bits one at a time: for each bit position, either include it or don't -- this "Zeta transform" computes all F[mask] in O(n2n) instead of O(3n).

Think of it as a high-dimensional prefix sum. If you have a 2D grid, you compute prefix sums first along rows, then along columns -- two separate passes. SOS DP does exactly the same thing, but in n dimensions, one per bit position.

For each bit i, we ask: "What if bit i were turned off?" If bit i is set in the mask, we accumulate the contribution from the version of the mask with bit i cleared. After sweeping through all n bits, every submask's contribution has been included exactly once.

cpp
for (int i = 0; i < n; ++i)                        // one pass per bit
    for (int mask = 0; mask < (1 << n); ++mask)
        if (mask & (1 << i))
            F[mask] += F[mask ^ (1 << i)];          // add "bit i off" version

Total work: n2n. For n=20: 202202×107. Excellent.

Visual Walkthrough

n=3, so there are 23=8 values. We label masks in binary: 000,001,010,,111.

Initial array (copy of A):

index(bin)  000  001  010  011  100  101  110  111
F[]:          2    3    5    7   11   13   17   19

Layer 0 -- process bit 0 (the 1-bit): For every mask with bit 0 set, add the value from the mask with bit 0 cleared.

mask  011: bit 0 set -> F[011] += F[010]  =>  7 + 5  = 12
mask  001: bit 0 set -> F[001] += F[000]  =>  3 + 2  =  5
mask  101: bit 0 set -> F[101] += F[100]  => 13 + 11 = 24
mask  111: bit 0 set -> F[111] += F[110]  => 19 + 17 = 36

After layer 0:
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    5   12   11   24   17   36

Meaning: F[m] now includes sums where bit 0 may or may not be set.

Layer 1 -- process bit 1 (the 2-bit): For every mask with bit 1 set, add the value from the mask with bit 1 cleared.

mask  010: bit 1 set -> F[010] += F[000]  =>  5 + 2  =  7
mask  011: bit 1 set -> F[011] += F[001]  => 12 + 5  = 17
mask  110: bit 1 set -> F[110] += F[100]  => 17 + 11 = 28
mask  111: bit 1 set -> F[111] += F[101]  => 36 + 24 = 60

After layer 1:
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    7   17   11   24   28   60

Layer 2 -- process bit 2 (the 4-bit): For every mask with bit 2 set, add the value from the mask with bit 2 cleared.

mask  100: bit 2 set -> F[100] += F[000]  => 11 + 2  = 13
mask  101: bit 2 set -> F[101] += F[001]  => 24 + 5  = 29
mask  110: bit 2 set -> F[110] += F[010]  => 28 + 7  = 35
mask  111: bit 2 set -> F[111] += F[011]  => 60 + 17 = 77

After layer 2 (FINAL):
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    7   17   13   29   35   77

Verification: F[111] should be s111A[s]=2+3+5+7+11+13+17+19=77. Correct.

F[101] should be A[000]+A[001]+A[100]+A[101]=2+3+11+13=29. Correct.

Layer-by-layer diagram (n = 3):

    A[mask]  --bit 0-->  sum over   --bit 1-->  sum over    --bit 2-->  sum over
                         subsets                subsets                  ALL subsets
                         differing              differing               = F[mask]
                         in bit 0               in bits 0-1

    Each layer doubles the "freedom" by one more bit position.
    After all n layers, every submask has been counted exactly once.
SOS DP Dimension Sweep -- State After Each Pass (n = 3)
========================================================

Shows which submasks' A-values are included in dp[mask] after each pass.

                 INIT          After dim 0      After dim 1       After dim 2
mask            (= A[mask])   (bit 0 free)     (bits 0-1 free)   (ALL free)
───────────────────────────────────────────────────────────────────────────────
000             {000}          {000}            {000}             {000}
001             {001}          {000,001}        {000,001}         {000,001}
010             {010}          {010}            {000,010}         {000,010}
011             {011}          {010,011}        {000,001,         {000,001,
                                                010,011}          010,011}
100             {100}          {100}            {100}             {000,100}
101             {101}          {100,101}        {100,101}         {000,001,
                                                                  100,101}
110             {110}          {110}            {100,110}         {000,010,
                                                                  100,110}
111             {111}          {110,111}        {100,101,         {ALL 8
                                                110,111}          masks}

Each dimension pass DOUBLES the submasks covered (on average).
After all 3 passes, dp[mask] = sum over ALL submasks of mask.

State transitions for mask = 111:
  INIT:    {111}
  dim 0:   {111} ∪ dp[110] = {111} ∪ {110}         = {110,111}
  dim 1:   {110,111} ∪ dp[101] = {110,111} ∪ {100,101}
                                                     = {100,101,110,111}
  dim 2:   {100,101,110,111} ∪ dp[011] = ... ∪ {000,001,010,011}
                                                     = {ALL 8 masks}  OK

The Invariant

+-----------------------------------------------------------------------+
| After processing bit j, dp[mask] contains the sum of A[s] over all   |
| masks s such that:                                                    |
|   - s agrees with mask on bits j+1, j+2, ..., n-1                    |
|   - s is a submask of mask on bits 0, 1, ..., j                      |
|                                                                       |
| In other words, bits 0..j are "free" (can be 0 even if mask has 1),  |
| while bits j+1..n-1 are "locked" (must match mask exactly).          |
|                                                                       |
| After processing all n bits, every bit is free => full subset sum.    |
+-----------------------------------------------------------------------+
Why SOS Beats Naive: Eliminating Redundant Work (n = 3)
=========================================================

NAIVE O(3^n): Each mask independently enumerates its own submasks.

  mask=111 visits: 000 001 010 011 100 101 110 111  (8 submasks)
  mask=110 visits: 000     010     100     110       (4 submasks)
  mask=101 visits: 000 001         100 101           (4 submasks)
  mask=011 visits: 000 001 010 011                   (4 submasks)
  mask=100 visits: 000             100               (2 submasks)
  mask=010 visits: 000     010                       (2 submasks)
  mask=001 visits: 000 001                           (2 submasks)
  mask=000 visits: 000                               (1 submask)
                                             Total:  27 = 3^3

  Notice: A[000] is visited 8 times! A[001] visited 4 times!
  Massive redundancy -- each contribution recomputed from scratch.

SOS O(n * 2^n): Three passes, each touching ALL 2^n masks once.

  Pass 0 (bit 0):  8 masks x 1 check/add  =  8 ops
  Pass 1 (bit 1):  8 masks x 1 check/add  =  8 ops
  Pass 2 (bit 2):  8 masks x 1 check/add  =  8 ops
                                    Total:   24 = 3 x 2^3

  The trick: pass 0 propagates bit-0 contributions to ALL masks
  that need them -- ONCE. Pass 1 reuses that work, adding bit-1
  contributions. Each subsequent pass builds on all previous ones.

  It's the same idea as separable convolution:
    2D brute force:   O(K² per pixel)
    Separable:        O(K per pixel) in each of 2 passes = O(2K)

  Here: n "dimensions" (bits), each pass costs O(2^n).
  Total: O(n * 2^n) instead of O(3^n).

What the Code Won't Teach You

The Zeta/Möbius transform connection. SOS DP is not just an algorithm -- it is the Zeta transform over the subset lattice (a partially ordered set where AB iff AB). Given g(mask), the Zeta transform produces f(mask)=smaskg(s). The Möbius inversion undoes it: given f, recover g. In code the only difference is += vs -=. Recognizing "this is a lattice transform" unlocks the entire family -- superset sums, AND/OR convolution, and inclusion-exclusion all become instances of the same algebraic structure.

Why dimension-by-dimension works. The subset relation decomposes bit-by-bit: sm iff for every bit i, simi. This means the n-dimensional sum factors into n independent 1-dimensional sums -- exactly like separable convolution in image processing. A 2D Gaussian blur is O(K2) per pixel naively, but O(2K) with two 1D passes. SOS DP is the same principle in n binary dimensions: each "pass" handles one bit-dimension, and the passes compose to give the full subset sum.

When SOS DP applies. Any time you need f(mask)=smaskg(s) computed for ALL masks simultaneously. The key word is simultaneously -- if you only need the subset sum for a single specific mask, direct enumeration in O(2k) (where k = popcount) may be faster. SOS DP's advantage is the batch computation: all 2n subset sums at once.

The complexity insight. Naive subset enumeration over all masks costs O(3n) because each of n bits independently falls into one of three categories: (1) in mask and in submask, (2) in mask but not in submask, (3) not in mask. SOS DP reduces this to O(n2n) by eliminating the redundant re-enumeration -- each bit-dimension pass costs O(2n), and there are n passes. For n=20: 3203.5×109 vs 20×2202×107 -- a 175x speedup.

With the deeper structure understood, here are the practical signals that point to SOS DP.

When to Reach for This

Trigger phrases in the problem statement:

  • "Sum over all subsets" / "aggregate over submasks" / "for every mask, combine contributions from submasks."
  • "Zeta transform" / "Mobius transform" / "inclusion-exclusion over bitmasks."
  • "AND convolution" / "OR convolution" -- these reduce to SOS DP.
  • The problem involves 2n states and asks for a transform that relates masks to their submasks or supermasks.
  • You already have an O(3n) submask-enumeration solution and need to speed it up.

Counter-examples -- SOS DP is NOT the answer when:

  • You need to iterate submasks with individual processing per submask (not just summation) -- O(3n) may be unavoidable.
  • The relationship is between masks and their complements, not submasks -- use direct XOR tricks.
  • n>23 -- even O(n2n) becomes too slow.
  • The aggregation is not decomposable bit-by-bit (e.g., you need the product over a non-commutative structure).

C++ STL Reference

SOS DP relies on standard bit-manipulation builtins rather than dedicated containers:

Function / BuiltinHeaderWhat it doesNotes
__builtin_popcount(x)built-inCount of set bitsUse __builtin_popcountll for long long
__builtin_ctz(x)built-inIndex of lowest set bitUndefined for x == 0
1 << i--Bitmask with only bit i setUse 1LL << i when i >= 31
mask & (1 << i)--Test if bit i is set--
mask ^ (1 << i)--Toggle bit i--
mask | (1 << i)--Set bit i--
mask & (mask - 1)--Clear lowest set bitUseful for iterating set bits
popcount(x)<bit>C++20 standard popcount--
__lg(x)built-inFloor of log2(x)Index of highest set bit

Implementation (Contest-Ready)

Version 1: SOS DP -- Zeta Transform (Minimal)

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // F[mask] = sum of A[s] for all s that are submasks of mask
    vector<long long> F(A.begin(), A.end());
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                F[mask] += F[mask ^ (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Loop Order Warning

Loop order is critical: bit-outer, mask-inner. The correct SOS DP iterates:

cpp
for (int bit = 0; bit < N; bit++)          // OUTER: which dimension
    for (int mask = 0; mask < (1 << N); mask++)  // INNER: which subset
        if (mask & (1 << bit))
            dp[mask] += dp[mask ^ (1 << bit)];

Why not mask-outer? If you iterate masks in the outer loop, when processing mask M at bit B, the value dp[M ^ (1 << B)] may already include contributions from higher bits that shouldn't be counted yet. The bit-outer loop ensures each dimension is processed independently across ALL masks before moving to the next dimension.

This is analogous to how the knapsack loop direction matters -- same principle of "which row am I reading from?"

Version 1: SOS DP -- Zeta Transform (Explained)

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // Goal: for each mask, compute F[mask] = sum of A[s] for all submasks s of mask.
    //
    // Approach: process one bit dimension at a time.
    // After processing bit i, F[mask] accounts for all submasks that may
    // differ from mask on bits 0..i (but must agree on bits i+1..n-1).
    vector<long long> F(A.begin(), A.end());

    for (int i = 0; i < n; ++i) {
        for (int mask = 0; mask < (1 << n); ++mask) {
            if (mask & (1 << i)) {
                // mask has bit i set. The submask with bit i cleared
                // is mask ^ (1 << i). Accumulate its (partial) sum.
                F[mask] += F[mask ^ (1 << i)];
            }
            // If bit i is not set in mask, no submask of mask can have
            // bit i set, so there is nothing to add.
        }
    }

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Version 2: Mobius Transform -- Inverse of SOS (Minimal)

Undo the Zeta transform: given F[mask]=smaskA[s], recover A from F.

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> F(1 << n);
    for (auto& x : F) cin >> x;

    // Mobius inversion: subtract instead of add, same loop structure
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                F[mask] -= F[mask ^ (1 << i)];

    // Now F[mask] = original A[mask]
    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Version 3: Superset Sum (Zeta Transform -- Upward)

Compute G[mask]=smaskA[s] -- sum over all supermasks.

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // Superset sum: flip the condition -- add when bit i is NOT set
    vector<long long> G(A.begin(), A.end());
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (!(mask & (1 << i)))
                G[mask] += G[mask | (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << G[mask] << " \n"[mask == (1 << n) - 1];
}

Version 4: AND Convolution

Given arrays A and B of size 2n, compute C[mask]=i&j=maskA[i]B[j].

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n), B(1 << n);
    for (auto& x : A) cin >> x;
    for (auto& x : B) cin >> x;

    // Step 1: Superset-sum (Zeta) both arrays
    auto zeta_super = [&](vector<long long>& f) {
        for (int i = 0; i < n; ++i)
            for (int mask = 0; mask < (1 << n); ++mask)
                if (!(mask & (1 << i)))
                    f[mask] += f[mask | (1 << i)];
    };
    zeta_super(A);
    zeta_super(B);

    // Step 2: Pointwise multiply
    vector<long long> C(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask)
        C[mask] = A[mask] * B[mask];

    // Step 3: Inverse (Mobius) -- superset version: subtract upward
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (!(mask & (1 << i)))
                C[mask] -= C[mask | (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << C[mask] << " \n"[mask == (1 << n) - 1];
}

Version 5: OR Convolution

Given arrays A and B of size 2n, compute C[mask]=i|j=maskA[i]B[j].

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n), B(1 << n);
    for (auto& x : A) cin >> x;
    for (auto& x : B) cin >> x;

    // Step 1: Subset-sum (Zeta) both arrays
    auto zeta_sub = [&](vector<long long>& f) {
        for (int i = 0; i < n; ++i)
            for (int mask = 0; mask < (1 << n); ++mask)
                if (mask & (1 << i))
                    f[mask] += f[mask ^ (1 << i)];
    };
    zeta_sub(A);
    zeta_sub(B);

    // Step 2: Pointwise multiply
    vector<long long> C(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask)
        C[mask] = A[mask] * B[mask];

    // Step 3: Inverse (Mobius) -- subset version: subtract downward
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                C[mask] -= C[mask ^ (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << C[mask] << " \n"[mask == (1 << n) - 1];
}

Operations Reference

OperationTimeSpaceWhen
Subset sum (Zeta transform)O(n2n)O(2n)F[m]=smA[s]
Superset sum (upward Zeta)O(n2n)O(2n)G[m]=smA[s]
Mobius inversion (subset)O(n2n)O(2n)Recover A from subset sums
Mobius inversion (superset)O(n2n)O(2n)Recover A from superset sums
AND convolutionO(n2n)O(2n)C[m]=i&j=mA[i]B[j]
OR convolutionO(n2n)O(2n)$C[m] = \sum_{i \mathbin{
Brute-force submask enumerationO(3n)O(2n)Iterate all submasks of all masks

Practical limits: n20 for O(n2n); n15 for O(3n).

Memory: 220×8 bytes (long long) =8 MB -- very comfortable.


Problem Patterns & Tricks

Pattern 1: Subset Sum Transform

Compute F[mask]=smaskA[s]. The direct application of SOS DP.

Use case: You have values indexed by bitmasks and need aggregate information over all submasks. Common in counting problems: "how many elements have a property compatible with mask?"

Key trick: To find, for each mask, whether any submask has a nonzero value (not just the sum), store the value itself and check F[mask]0 after the transform. Or store a specific witness (e.g., the index) instead of a sum.

Examples: CF 165E -- Compatible Numbers


Pattern 2: Superset Sum and Complementary Counting

Compute G[mask]=smaskA[s] -- flip the direction. Equivalently, G[mask]=F[mask] where F is the subset sum of A[m]=A[m] (complement all masks). But the direct "superset Zeta" is simpler.

Use case: "Count elements whose bits are a superset of mask" -- i.e., elements that have at least the bits in mask set.

Examples: CF 449D -- Jzzhu and Numbers


Pattern 3: AND Convolution

C[m]=i&j=mA[i]B[j].

Recipe: (1) Superset-Zeta both A and B. (2) Pointwise multiply. (3) Superset-Mobius the result.

Why it works: Superset sum of A at mask m equals A^[m]=smA[s]. The property sm and tm iff s&tm means A^[m]B^[m]=C^[m].

Examples: CF 1620F -- Bipartite Array


Pattern 4: OR Convolution

C[m]=i|j=mA[i]B[j].

Recipe: (1) Subset-Zeta both A and B. (2) Pointwise multiply. (3) Subset-Mobius the result.

Why it works: Symmetric to AND convolution. Subset sum turns OR into pointwise product because sm and tm iff s|tm.

Examples: CF 800D -- Varying Kibibits


Pattern 5: Inclusion-Exclusion via Mobius

Many counting problems ask: "how many subsets satisfy ALL of several constraints?" Encode constraints as bits. The superset sum counts subsets satisfying at least the given constraints. Mobius inversion recovers the count for exactly the given constraints.

Classic pattern:

Count subsets with AND = 0:
  1. Let cnt[mask] = number of array elements equal to mask.
  2. Superset-Zeta: S[mask] = number of elements with all bits of mask set.
  3. The number of k-tuples with AND = mask (exactly) is recovered by
     Mobius inversion of S[mask]^k.

Examples: CF 449D -- Jzzhu and Numbers, CF 1036F -- Relatively Prime Powers


Pattern 6: Finding a Compatible Element

"For each A[i], find any A[j] such that A[i]&A[j]=0."

Rephrase: find j where A[j] is a submask of A[i] (the complement). Use SOS DP storing a witness instead of a sum: propagate any valid index instead of adding values.

cpp
vector<int> witness(1 << n, -1);
for (int i = 0; i < sz; ++i)
    witness[A[i]] = i;  // store index of some element equal to this mask

for (int i = 0; i < n; ++i)
    for (int mask = 0; mask < (1 << n); ++mask)
        if (!(mask & (1 << i)) && witness[mask] == -1)
            witness[mask] = witness[mask | (1 << i)];

// witness[mask] = index of some element that is a submask of mask (or -1)
// For element A[i], check witness[complement of A[i]].

Examples: CF 165E -- Compatible Numbers


Pattern 7: Counting Subsets with a Given Popcount

Combine SOS DP with popcount tracking. Instead of a single sum, maintain a vector of sums by popcount:

cpp
// dp[mask][k] = sum of A[s] over submasks s of mask with popcount(s) = k

This costs O(n22n) -- feasible for n15. Useful when the answer depends not just on which bits are present but on how many.


Contest Cheat Sheet

+------------------------------------------------------------------+
|                SOS DP CHEAT SHEET (AL-30)                        |
+------------------------------------------------------------------+
| WHAT IT COMPUTES:                                                |
|   F[mask] = sum of A[s] for all submasks s of mask               |
|   (also: superset sums, AND/OR convolution, Mobius inversion)    |
+------------------------------------------------------------------+
| SUBSET SUM (Zeta):                                               |
|   for (int i = 0; i < n; ++i)                                   |
|     for (int m = 0; m < (1<<n); ++m)                             |
|       if (m & (1<<i))  F[m] += F[m ^ (1<<i)];                   |
+------------------------------------------------------------------+
| SUPERSET SUM:                                                    |
|   for (int i = 0; i < n; ++i)                                   |
|     for (int m = 0; m < (1<<n); ++m)                             |
|       if (!(m & (1<<i)))  G[m] += G[m | (1<<i)];                |
+------------------------------------------------------------------+
| MOBIUS (inverse of Zeta):  same loops, subtract instead of add   |
+------------------------------------------------------------------+
| AND CONVOLUTION:  super-Zeta -> multiply -> super-Mobius         |
| OR  CONVOLUTION:  sub-Zeta   -> multiply -> sub-Mobius           |
+------------------------------------------------------------------+
| TIME:   O(n * 2^n)       n <= 20 comfortable                    |
| SPACE:  O(2^n)           8 MB for n=20 with long long            |
+------------------------------------------------------------------+
| INVARIANT: after bit j, bits 0..j are "free", j+1..n-1 "locked" |
+------------------------------------------------------------------+
| TRIGGERS: "sum over subsets", "AND/OR convolution", n <= 20,     |
|           "Zeta/Mobius transform", speed up O(3^n) submask enum  |
+------------------------------------------------------------------+

Gotchas & Debugging

1. Loop order matters

The outer loop must be over bit positions, the inner loop over masks. Reversing them computes something entirely wrong -- you would process all bits for mask 0 before touching mask 1, breaking the layer-by-layer invariant.

cpp
// CORRECT                           // WRONG
for (int i = 0; i < n; ++i)          for (int m = 0; m < (1<<n); ++m)
  for (int m = 0; m < (1<<n); ++m)     for (int i = 0; i < n; ++i)
    ...                                   ...

2. Subset vs. superset direction

Subset sum: condition is if (mask & (1 << i)), operation is += F[mask ^ (1 << i)] (bit off). Superset sum: condition is if (!(mask & (1 << i))), operation is += G[mask | (1 << i)] (bit on).

Mixing these up produces the transform in the wrong direction. Double-check with n=2.

3. Mobius is just subtraction

The inverse of the subset-sum Zeta transform is identical code with -= instead of +=. Do not reverse the loop order for inversion -- the same loop order with subtraction works because the transform is its own structural inverse under sign change.

4. Overflow with pointwise multiplication

In AND/OR convolution, after Zeta transform the values can be as large as |A[i]|. Multiplying two such values can overflow long long. Use modular arithmetic if the problem requires it, or __int128 for intermediate products.

5. Off-by-one in n

If values range from 0 to V and you want bitmasks over log2V bits, make sure n is large enough. n=20 handles values up to 2201=1,048,575.

6. Forgetting the initial copy

SOS DP modifies the array in place. If you need the original A later, copy it into F before running the transform. A common bug: running SOS on A directly, then trying to use A for something else.

7. Debug checklist

  1. Test with n=2 or n=3 -- small enough to verify every entry by hand.
  2. Verify F[(1n)1]=A[i] (full mask is the supermask of everything).
  3. Verify F[0]=A[0] (empty mask has only one submask: itself).
  4. After Mobius inversion of a Zeta-transformed array, you should recover the original.
  5. For convolution: verify C[0]=A[0]B[0] (AND) or C[(1n)1]=i|j=fullA[i]B[j] (OR).

Mental Traps

"SOS DP iterates over all subsets." No -- it iterates over all bitmasks and all dimensions (bit positions). The magic is that it never explicitly enumerates the subsets of any single mask. Each pass processes one dimension across every mask, and the composition of n passes covers all subsets implicitly. If you think of it as "visiting subsets," you'll misunderstand the loop structure and the invariant.

"SOS is just inclusion-exclusion." SOS DP computes the Zeta transform (subset sum): f(m)=smg(s). Möbius inversion undoes it: given f, recover g by subtracting overcounted terms. They are inverses of each other -- related but not the same operation. Confusing them leads to applying the wrong one (e.g., running Möbius when you need Zeta, and getting the original values back instead of the sums).

"The dimension loop order doesn't matter." It absolutely does. Each dimension (bit position) must be fully processed across ALL masks before moving to the next dimension. Think of it as n sequential sweeps, one per bit. After sweep i, every dp[mask] has correctly accumulated contributions from submasks differing only in bits 0i. If you swap the loops (mask-outer, bit-inner), a single mask gets all its bit-updates before other masks are updated, breaking the transitive propagation that makes the algorithm correct.

Mental Model: SOS DP State Transitions for n = 3
===================================================

Think of each dp[mask] as a "bucket" being filled in 3 sweeps.

Sweep 0 (bit 0):     Sweep 1 (bit 1):     Sweep 2 (bit 2):

dp[111]:              dp[111]:              dp[111]:
  += dp[110]            += dp[101]            += dp[011]
  {111,110}             {111,110,101,100}     {ALL 8}

dp[101]:              dp[101]:              dp[101]:
  += dp[100]            (bit 1 off, skip)     += dp[001]
  {101,100}             {101,100}             {101,100,001,000}

dp[011]:              dp[011]:              dp[011]:
  += dp[010]            += dp[001]            (bit 2 off, skip)
  {011,010}             {011,010,001,000}     {011,010,001,000}

Each sweep: bit i MUST be fully processed for ALL masks
before sweep i+1 begins. This is non-negotiable.

Common Mistakes

  1. Swapping loop order. The outer loop must be bit-position, inner must be mask. Reversing them breaks the layer-by-layer invariant -- each dimension must be fully processed across all masks before the next.
  2. Wrong direction (subset vs superset). Subset-sum: if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)]. Superset-sum: flip the condition -- if (!(mask & (1 << i))) f[mask] += f[mask | (1 << i)]. Sweeping in the wrong direction gives the dual problem and the numbers look plausible on small tests. Always comment the intended direction above the loop.
  3. Mobius confused with re-running Zeta. Mobius inversion is the same loop structure with -= instead of +=; do not reverse the loop order for inversion.
  4. Overflow in AND/OR convolution. After the Zeta transform values grow to sum |A[i]|. Multiplying two such values overflows long long. Use modular arithmetic or __int128.
  5. Forgetting the destructive nature of SOS. SOS modifies the array in place. If you later need the original counts (e.g., to subtract self-contribution), save a copy before running the sweep.

Debug Drills

Drill 1. Off-by-one in bit loop.

cpp
// SOS DP: for each mask, compute sum of f[s] for all s that are subsets of mask
int f[1 << 20];
int n = 20;

// populate f[mask] with initial values...

for (int i = 0; i <= n; i++) {           // BUG: i <= n goes one bit too far
    for (int mask = 0; mask < (1 << n); mask++) {
        if (mask & (1 << i))
            f[mask] += f[mask ^ (1 << i)];
    }
}
What's wrong?

The outer loop goes i = 0 to i = n (inclusive), which is n+1 iterations. When i = n, (1 << i) equals (1 << 20) = 1048576, which is outside the valid mask range [0, 2²⁰ - 1]. This reads/writes out of bounds. Fix: Change the condition to i < n (i.e., for (int i = 0; i < n; i++)). The loop should iterate over bit positions 0 through n-1.

Drill 2. Wrong subset direction (superset instead of subset).

cpp
// Goal: f[mask] = sum of original f[s] for all s ⊆ mask
// (subset sum / zeta transform)
for (int i = 0; i < n; i++) {
    for (int mask = 0; mask < (1 << n); mask++) {
        if (!(mask & (1 << i)))               // BUG: this is superset direction!
            f[mask] += f[mask | (1 << i)];
    }
}
What's wrong?

This code performs the superset-sum (for each mask, sum over all supermasks), not subset-sum. The condition !(mask & (1 << i)) selects masks that don't have bit i, then adds the value from the mask that does have bit i. This propagates values downward (supersets -> subsets). Fix: For subset-sum, flip the condition: if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)];. This propagates values upward (subsets -> supersets).

Drill 3. Forgetting to save the original array before SOS sweep.

cpp
// Count, for each mask, how many of the n input values are subsets of mask
int cnt[1 << 20];
memset(cnt, 0, sizeof(cnt));
int n_vals;
cin >> n_vals;
for (int i = 0; i < n_vals; i++) {
    int x; cin >> x;
    cnt[x]++;   // only increment, not set -- but what if x appears multiple times?
}
// SOS sweep
for (int i = 0; i < 20; i++)
    for (int mask = 0; mask < (1 << 20); mask++)
        if (mask & (1 << i))
            cnt[mask] += cnt[mask ^ (1 << i)];

// Now use cnt[mask] as "number of input values that are subsets of mask"
// BUG: works correctly... until someone adds cnt[x]-- later thinking they can undo
What's wrong?

This code is actually correct for the stated purpose -- but the subtle trap is that the SOS sweep is destructive: after the sweep, cnt[mask] no longer holds the original count of values equal to mask; it holds the cumulative subset sum. If any later code tries to read or modify the original cnt[x] values (e.g., subtracting self-contribution for "count subsets of x among other values"), it will use the wrong (cumulative) value. Fix: Save a copy of the original array before the SOS sweep: int orig[1 << 20]; memcpy(orig, cnt, sizeof(cnt));. Then orig[x] gives the raw count and cnt[x] gives the subset-sum count.


Self-Test

Before moving on, verify you can do the following without referencing the notes:

  • [ ] Implement SOS DP from scratch: for each mask, compute the sum of A[sub] over all submask. Write the correct nested loop (bit-outer, mask-inner) with the right condition and update.
  • [ ] Explain the complexity: why is SOS DP O(n2n) instead of O(3n)? State the key insight (dimension decomposition eliminates redundant subset enumeration).
  • [ ] State the dimension-sweep invariant: after processing bit i, dp[mask] has accumulated all submasks that agree with mask on bits i+1n1 and are free on bits 0i.
  • [ ] Implement the superset sum: flip the condition -- process masks where bit i is NOT set, and add dp[mask | (1 << i)].
  • [ ] Apply SOS DP to a concrete problem: given an array of integers, for each element x, count how many other elements y satisfy (x&y)=y (i.e., y is an AND-submask of x). Set up the frequency array and run the Zeta transform.
  • [ ] Describe the Zeta/Möbius relationship: SOS DP computes the Zeta transform; Möbius inversion (same loops, subtract instead of add) undoes it. State when you need each.

Practice Problems

CF RatingWhat You Should Be Comfortable With
1200Understand bitmask basics; enumerate all submasks of a mask in O(3ⁿ); know what "subset sum over lattice" means conceptually
1500Implement the standard SOS DP loop (iterate over bits, then over masks); solve "for each mask, count how many input values are subsets of it"
1800Apply SOS to AND/OR convolutions; use superset-sum (flip inclusion direction); combine SOS with Möbius inversion to get exact counts (not cumulative); solve CSES Bit Problem
2100XOR convolution (Walsh-Hadamard Transform); SOS in non-binary bases (ternary zeta for "divisibility" lattices); SOS on large alphabets with dimension-by-dimension sweeping
#ProblemSourceDifficultyKey Idea
1Compatible NumbersCF 165E1800SOS DP with witness propagation -- find j s.t. A[i]&A[j]=0
2Jzzhu and NumbersCF 449D2000Superset sum + inclusion-exclusion for AND = 0
3Bits And PiecesCF 1208F2100Superset SOS to greedily maximize $a_i \mathbin{
4Relatively Prime PowersCF 1036F2000Mobius function over exponent masks
5VowelsCF 383E1900SOS DP to count triples with at least one vowel
6Varying KibibitsCF 800D2100Generalized SOS in non-binary base (ternary Zeta)
7Prefix Function QueriesCF 1553F2100Bitmask transform to handle string matching constraints
8AND ClosureCF 1680F2000Subset-sum transform to find AND-closed sets
9Special PairsCF 1761E1900OR convolution to count pairs
10Expected PowerCF 2020E1900SOS DP with probability over XOR subsets
11Bit ProblemCSES1800For each x, count elements where x AND y = y (subset sum)
12Reachable NodesCSES2100Bitset reachability via SOS-like propagation

Further Reading


Built for the climb from Codeforces 1100 to 2100.