SOS DP (Sum Over Subsets)

Quick Revisit
USE WHEN: Need $F [mask] = \sum_{s \subseteq mask} A [s]$ for all masks (subset-sum aggregation, AND/OR convolution, counting divisor-like relations on bitmasks)
INVARIANT: After processing bit $i$ , $F [m]$ accounts for all subsets of $m$ that differ only in bits $0. . i$
TIME: O(n · 2ⁿ) | SPACE: O(2ⁿ)
CLASSIC BUG: Iterating masks inside the bit loop instead of bit inside masks — breaks the dimension-by-dimension sweep
PRACTICE: 04-ladder-dp

AL-30 | For an array $A [0 \dots 2^{n} - 1]$ indexed by bitmasks, compute $F [mask] = \sum_{s \subseteq mask} A [s]$ for every mask -- in $O (n \cdot 2^{n})$ instead of $O (3^{n})$ . The "Zeta transform" over the subset lattice.

Difficulty: [Advanced]Prerequisites: DP -- Bitmask, Bit ManipulationCF Rating Range: 1800-2100

Intuition
C++ STL Reference
Implementation (Contest-Ready)
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging
- Mental Traps
Self-Test
Practice Problems
Further Reading

Intuition

Imagine you are building a team-composition analyzer for a game with $n$ skills. Each player subset has a power score, and for every team you want the total power of all sub-teams it contains. In bitmask terms: you have an array $A$ of $2^{n}$ values, one per bitmask of $n$ bits, and you need to compute, for every mask:

F [mask] = \sum_{s \subseteq mask} A [s]

The brute-force approach iterates over all submasks of each mask:

cpp

for (int mask = 0; mask < (1 << n); ++mask)
    for (int s = mask; ; s = (s - 1) & mask) {
        F[mask] += A[s];
        if (s == 0) break;
    }

Each mask $m$ with $k$ set bits has $2^{k}$ submasks. Summing over all masks:

\sum_{k = 0}^{n} (\binom{n}{k}) \cdot 2^{k} = (1 + 2)^{n} = 3^{n}

For $n = 20$ : $3^{20} \approx 3.5 \times 10^{9}$ -- far too slow. We need something fundamentally better.

Subset Lattice (Hasse Diagram) for n = 3
=========================================

Each node is a bitmask. An edge from A up to B means A ⊂ B
(A is a subset of B, obtained by setting one more bit).
SOS DP computes, for each node, the sum of values at ALL
nodes reachable going DOWNWARD.

                         111
                        / | \
                     110  101  011
                      |\ /\ /|
                     100 010 001
                        \ | /
                         000

Subsets of 101:               Subsets of 110:
     101                           110
     / \                           / \
   100   001                     100   010
     \   /                         \  /
      000                          000

F[101] = A[000]+A[001]+A[100]+A[101]
F[110] = A[000]+A[010]+A[100]+A[110]
F[111] = sum of ALL A[mask]  (every mask is a subset of 111)

Process bits one at a time: for each bit position, either include it or don't -- this "Zeta transform" computes all $F [mask]$ in $O (n \cdot 2^{n})$ instead of $O (3^{n})$ .

Think of it as a high-dimensional prefix sum. If you have a 2D grid, you compute prefix sums first along rows, then along columns -- two separate passes. SOS DP does exactly the same thing, but in $n$ dimensions, one per bit position.

For each bit $i$ , we ask: "What if bit $i$ were turned off?" If bit $i$ is set in the mask, we accumulate the contribution from the version of the mask with bit $i$ cleared. After sweeping through all $n$ bits, every submask's contribution has been included exactly once.

cpp

for (int i = 0; i < n; ++i)                        // one pass per bit
    for (int mask = 0; mask < (1 << n); ++mask)
        if (mask & (1 << i))
            F[mask] += F[mask ^ (1 << i)];          // add "bit i off" version

Total work: $n \cdot 2^{n}$ . For $n = 20$ : $20 \cdot 2^{20} \approx 2 \times 10^{7}$ . Excellent.

Visual Walkthrough

$n = 3$ , so there are $2^{3} = 8$ values. We label masks in binary: $000, 001, 010, \dots, 111$ .

Initial array (copy of $A$ ):

index(bin)  000  001  010  011  100  101  110  111
F[]:          2    3    5    7   11   13   17   19

Layer 0 -- process bit 0 (the $1$ -bit): For every mask with bit 0 set, add the value from the mask with bit 0 cleared.

mask  011: bit 0 set -> F[011] += F[010]  =>  7 + 5  = 12
mask  001: bit 0 set -> F[001] += F[000]  =>  3 + 2  =  5
mask  101: bit 0 set -> F[101] += F[100]  => 13 + 11 = 24
mask  111: bit 0 set -> F[111] += F[110]  => 19 + 17 = 36

After layer 0:
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    5   12   11   24   17   36

Meaning: $F [m]$ now includes sums where bit 0 may or may not be set.

Layer 1 -- process bit 1 (the $2$ -bit): For every mask with bit 1 set, add the value from the mask with bit 1 cleared.

mask  010: bit 1 set -> F[010] += F[000]  =>  5 + 2  =  7
mask  011: bit 1 set -> F[011] += F[001]  => 12 + 5  = 17
mask  110: bit 1 set -> F[110] += F[100]  => 17 + 11 = 28
mask  111: bit 1 set -> F[111] += F[101]  => 36 + 24 = 60

After layer 1:
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    7   17   11   24   28   60

Layer 2 -- process bit 2 (the $4$ -bit): For every mask with bit 2 set, add the value from the mask with bit 2 cleared.

mask  100: bit 2 set -> F[100] += F[000]  => 11 + 2  = 13
mask  101: bit 2 set -> F[101] += F[001]  => 24 + 5  = 29
mask  110: bit 2 set -> F[110] += F[010]  => 28 + 7  = 35
mask  111: bit 2 set -> F[111] += F[011]  => 60 + 17 = 77

After layer 2 (FINAL):
index(bin)  000  001  010  011  100  101  110  111
F[]:          2    5    7   17   13   29   35   77

Verification: $F [111]$ should be $\sum_{s \subseteq 111} A [s] = 2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 = 77$ . Correct.

$F [101]$ should be $A [000] + A [001] + A [100] + A [101] = 2 + 3 + 11 + 13 = 29$ . Correct.

Layer-by-layer diagram (n = 3):

    A[mask]  --bit 0-->  sum over   --bit 1-->  sum over    --bit 2-->  sum over
                         subsets                subsets                  ALL subsets
                         differing              differing               = F[mask]
                         in bit 0               in bits 0-1

    Each layer doubles the "freedom" by one more bit position.
    After all n layers, every submask has been counted exactly once.

SOS DP Dimension Sweep -- State After Each Pass (n = 3)
========================================================

Shows which submasks' A-values are included in dp[mask] after each pass.

                 INIT          After dim 0      After dim 1       After dim 2
mask            (= A[mask])   (bit 0 free)     (bits 0-1 free)   (ALL free)
───────────────────────────────────────────────────────────────────────────────
000             {000}          {000}            {000}             {000}
001             {001}          {000,001}        {000,001}         {000,001}
010             {010}          {010}            {000,010}         {000,010}
011             {011}          {010,011}        {000,001,         {000,001,
                                                010,011}          010,011}
100             {100}          {100}            {100}             {000,100}
101             {101}          {100,101}        {100,101}         {000,001,
                                                                  100,101}
110             {110}          {110}            {100,110}         {000,010,
                                                                  100,110}
111             {111}          {110,111}        {100,101,         {ALL 8
                                                110,111}          masks}

Each dimension pass DOUBLES the submasks covered (on average).
After all 3 passes, dp[mask] = sum over ALL submasks of mask.

State transitions for mask = 111:
  INIT:    {111}
  dim 0:   {111} ∪ dp[110] = {111} ∪ {110}         = {110,111}
  dim 1:   {110,111} ∪ dp[101] = {110,111} ∪ {100,101}
                                                     = {100,101,110,111}
  dim 2:   {100,101,110,111} ∪ dp[011] = ... ∪ {000,001,010,011}
                                                     = {ALL 8 masks}  OK

The Invariant

+-----------------------------------------------------------------------+
| After processing bit j, dp[mask] contains the sum of A[s] over all   |
| masks s such that:                                                    |
|   - s agrees with mask on bits j+1, j+2, ..., n-1                    |
|   - s is a submask of mask on bits 0, 1, ..., j                      |
|                                                                       |
| In other words, bits 0..j are "free" (can be 0 even if mask has 1),  |
| while bits j+1..n-1 are "locked" (must match mask exactly).          |
|                                                                       |
| After processing all n bits, every bit is free => full subset sum.    |
+-----------------------------------------------------------------------+

Why SOS Beats Naive: Eliminating Redundant Work (n = 3)
=========================================================

NAIVE O(3^n): Each mask independently enumerates its own submasks.

  mask=111 visits: 000 001 010 011 100 101 110 111  (8 submasks)
  mask=110 visits: 000     010     100     110       (4 submasks)
  mask=101 visits: 000 001         100 101           (4 submasks)
  mask=011 visits: 000 001 010 011                   (4 submasks)
  mask=100 visits: 000             100               (2 submasks)
  mask=010 visits: 000     010                       (2 submasks)
  mask=001 visits: 000 001                           (2 submasks)
  mask=000 visits: 000                               (1 submask)
                                             Total:  27 = 3^3

  Notice: A[000] is visited 8 times! A[001] visited 4 times!
  Massive redundancy -- each contribution recomputed from scratch.

SOS O(n * 2^n): Three passes, each touching ALL 2^n masks once.

  Pass 0 (bit 0):  8 masks x 1 check/add  =  8 ops
  Pass 1 (bit 1):  8 masks x 1 check/add  =  8 ops
  Pass 2 (bit 2):  8 masks x 1 check/add  =  8 ops
                                    Total:   24 = 3 x 2^3

  The trick: pass 0 propagates bit-0 contributions to ALL masks
  that need them -- ONCE. Pass 1 reuses that work, adding bit-1
  contributions. Each subsequent pass builds on all previous ones.

  It's the same idea as separable convolution:
    2D brute force:   O(K² per pixel)
    Separable:        O(K per pixel) in each of 2 passes = O(2K)

  Here: n "dimensions" (bits), each pass costs O(2^n).
  Total: O(n * 2^n) instead of O(3^n).

What the Code Won't Teach You

The Zeta/Möbius transform connection. SOS DP is not just an algorithm -- it is the Zeta transform over the subset lattice (a partially ordered set where $A \leq B$ iff $A \subseteq B$ ). Given $g (mask)$ , the Zeta transform produces $f (mask) = \sum_{s \subseteq mask} g (s)$ . The Möbius inversion undoes it: given $f$ , recover $g$ . In code the only difference is += vs -=. Recognizing "this is a lattice transform" unlocks the entire family -- superset sums, AND/OR convolution, and inclusion-exclusion all become instances of the same algebraic structure.

Why dimension-by-dimension works. The subset relation decomposes bit-by-bit: $s \subseteq m$ iff for every bit $i$ , $s_{i} \leq m_{i}$ . This means the $n$ -dimensional sum factors into $n$ independent 1-dimensional sums -- exactly like separable convolution in image processing. A 2D Gaussian blur is $O (K^{2})$ per pixel naively, but $O (2 K)$ with two 1D passes. SOS DP is the same principle in $n$ binary dimensions: each "pass" handles one bit-dimension, and the passes compose to give the full subset sum.

When SOS DP applies. Any time you need $f (mask) = \sum_{s \subseteq mask} g (s)$ computed for ALL masks simultaneously. The key word is simultaneously -- if you only need the subset sum for a single specific mask, direct enumeration in $O (2^{k})$ (where $k$ = popcount) may be faster. SOS DP's advantage is the batch computation: all $2^{n}$ subset sums at once.

The complexity insight. Naive subset enumeration over all masks costs $O (3^{n})$ because each of $n$ bits independently falls into one of three categories: (1) in mask and in submask, (2) in mask but not in submask, (3) not in mask. SOS DP reduces this to $O (n \cdot 2^{n})$ by eliminating the redundant re-enumeration -- each bit-dimension pass costs $O (2^{n})$ , and there are $n$ passes. For $n = 20$ : $3^{20} \approx 3.5 \times 10^{9}$ vs $20 \times 2^{20} \approx 2 \times 10^{7}$ -- a 175x speedup.

With the deeper structure understood, here are the practical signals that point to SOS DP.

When to Reach for This

Trigger phrases in the problem statement:

"Sum over all subsets" / "aggregate over submasks" / "for every mask, combine contributions from submasks."
"Zeta transform" / "Mobius transform" / "inclusion-exclusion over bitmasks."
"AND convolution" / "OR convolution" -- these reduce to SOS DP.
The problem involves $2^{n}$ states and asks for a transform that relates masks to their submasks or supermasks.
You already have an $O (3^{n})$ submask-enumeration solution and need to speed it up.

Counter-examples -- SOS DP is NOT the answer when:

You need to iterate submasks with individual processing per submask (not just summation) -- $O (3^{n})$ may be unavoidable.
The relationship is between masks and their complements, not submasks -- use direct XOR tricks.
$n > 23$ -- even $O (n \cdot 2^{n})$ becomes too slow.
The aggregation is not decomposable bit-by-bit (e.g., you need the product over a non-commutative structure).

C++ STL Reference

SOS DP relies on standard bit-manipulation builtins rather than dedicated containers:

Function / Builtin	Header	What it does	Notes
`__builtin_popcount(x)`	built-in	Count of set bits	Use `__builtin_popcountll` for `long long`
`__builtin_ctz(x)`	built-in	Index of lowest set bit	Undefined for `x == 0`
`1 << i`	--	Bitmask with only bit $i$ set	Use `1LL << i` when `i >= 31`
`mask & (1 << i)`	--	Test if bit $i$ is set	--
`mask ^ (1 << i)`	--	Toggle bit $i$	--
`mask \| (1 << i)`	--	Set bit $i$	--
`mask & (mask - 1)`	--	Clear lowest set bit	Useful for iterating set bits
`popcount(x)`	`<bit>`	C++20 standard popcount	--
`__lg(x)`	built-in	Floor of $\log_{2} (x)$	Index of highest set bit

Implementation (Contest-Ready)

Version 1: SOS DP -- Zeta Transform (Minimal)

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // F[mask] = sum of A[s] for all s that are submasks of mask
    vector<long long> F(A.begin(), A.end());
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                F[mask] += F[mask ^ (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Loop Order Warning

Loop order is critical: bit-outer, mask-inner. The correct SOS DP iterates:

cpp

for (int bit = 0; bit < N; bit++)          // OUTER: which dimension
    for (int mask = 0; mask < (1 << N); mask++)  // INNER: which subset
        if (mask & (1 << bit))
            dp[mask] += dp[mask ^ (1 << bit)];

Why not mask-outer? If you iterate masks in the outer loop, when processing mask M at bit B, the value dp[M ^ (1 << B)] may already include contributions from higher bits that shouldn't be counted yet. The bit-outer loop ensures each dimension is processed independently across ALL masks before moving to the next dimension.

This is analogous to how the knapsack loop direction matters -- same principle of "which row am I reading from?"

Version 1: SOS DP -- Zeta Transform (Explained)

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // Goal: for each mask, compute F[mask] = sum of A[s] for all submasks s of mask.
    //
    // Approach: process one bit dimension at a time.
    // After processing bit i, F[mask] accounts for all submasks that may
    // differ from mask on bits 0..i (but must agree on bits i+1..n-1).
    vector<long long> F(A.begin(), A.end());

    for (int i = 0; i < n; ++i) {
        for (int mask = 0; mask < (1 << n); ++mask) {
            if (mask & (1 << i)) {
                // mask has bit i set. The submask with bit i cleared
                // is mask ^ (1 << i). Accumulate its (partial) sum.
                F[mask] += F[mask ^ (1 << i)];
            }
            // If bit i is not set in mask, no submask of mask can have
            // bit i set, so there is nothing to add.
        }
    }

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Version 2: Mobius Transform -- Inverse of SOS (Minimal)

Undo the Zeta transform: given $F [mask] = \sum_{s \subseteq mask} A [s]$ , recover $A$ from $F$ .

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> F(1 << n);
    for (auto& x : F) cin >> x;

    // Mobius inversion: subtract instead of add, same loop structure
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                F[mask] -= F[mask ^ (1 << i)];

    // Now F[mask] = original A[mask]
    for (int mask = 0; mask < (1 << n); ++mask)
        cout << F[mask] << " \n"[mask == (1 << n) - 1];
}

Version 3: Superset Sum (Zeta Transform -- Upward)

Compute $G [mask] = \sum_{s \supseteq mask} A [s]$ -- sum over all supermasks.

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n);
    for (auto& x : A) cin >> x;

    // Superset sum: flip the condition -- add when bit i is NOT set
    vector<long long> G(A.begin(), A.end());
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (!(mask & (1 << i)))
                G[mask] += G[mask | (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << G[mask] << " \n"[mask == (1 << n) - 1];
}

Version 4: AND Convolution

Given arrays $A$ and $B$ of size $2^{n}$ , compute $C [mask] = \sum_{i & j = mask} A [i] \cdot B [j]$ .

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n), B(1 << n);
    for (auto& x : A) cin >> x;
    for (auto& x : B) cin >> x;

    // Step 1: Superset-sum (Zeta) both arrays
    auto zeta_super = [&](vector<long long>& f) {
        for (int i = 0; i < n; ++i)
            for (int mask = 0; mask < (1 << n); ++mask)
                if (!(mask & (1 << i)))
                    f[mask] += f[mask | (1 << i)];
    };
    zeta_super(A);
    zeta_super(B);

    // Step 2: Pointwise multiply
    vector<long long> C(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask)
        C[mask] = A[mask] * B[mask];

    // Step 3: Inverse (Mobius) -- superset version: subtract upward
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (!(mask & (1 << i)))
                C[mask] -= C[mask | (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << C[mask] << " \n"[mask == (1 << n) - 1];
}

Version 5: OR Convolution

Given arrays $A$ and $B$ of size $2^{n}$ , compute $C [mask] = \sum_{i | j = mask} A [i] \cdot B [j]$ .

cpp

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> A(1 << n), B(1 << n);
    for (auto& x : A) cin >> x;
    for (auto& x : B) cin >> x;

    // Step 1: Subset-sum (Zeta) both arrays
    auto zeta_sub = [&](vector<long long>& f) {
        for (int i = 0; i < n; ++i)
            for (int mask = 0; mask < (1 << n); ++mask)
                if (mask & (1 << i))
                    f[mask] += f[mask ^ (1 << i)];
    };
    zeta_sub(A);
    zeta_sub(B);

    // Step 2: Pointwise multiply
    vector<long long> C(1 << n);
    for (int mask = 0; mask < (1 << n); ++mask)
        C[mask] = A[mask] * B[mask];

    // Step 3: Inverse (Mobius) -- subset version: subtract downward
    for (int i = 0; i < n; ++i)
        for (int mask = 0; mask < (1 << n); ++mask)
            if (mask & (1 << i))
                C[mask] -= C[mask ^ (1 << i)];

    for (int mask = 0; mask < (1 << n); ++mask)
        cout << C[mask] << " \n"[mask == (1 << n) - 1];
}

Operations Reference

Operation	Time	Space	When
Subset sum (Zeta transform)	$O (n \cdot 2^{n})$	$O (2^{n})$	$F [m] = \sum_{s \subseteq m} A [s]$
Superset sum (upward Zeta)	$O (n \cdot 2^{n})$	$O (2^{n})$	$G [m] = \sum_{s \supseteq m} A [s]$
Mobius inversion (subset)	$O (n \cdot 2^{n})$	$O (2^{n})$	Recover $A$ from subset sums
Mobius inversion (superset)	$O (n \cdot 2^{n})$	$O (2^{n})$	Recover $A$ from superset sums
AND convolution	$O (n \cdot 2^{n})$	$O (2^{n})$	$C [m] = \sum_{i & j = m} A [i] B [j]$
OR convolution	$O (n \cdot 2^{n})$	$O (2^{n})$	$C[m] = \sum_{i \mathbin{
Brute-force submask enumeration	$O (3^{n})$	$O (2^{n})$	Iterate all submasks of all masks

Practical limits: $n \leq 20$ for $O (n \cdot 2^{n})$ ; $n \leq 15$ for $O (3^{n})$ .

Memory: $2^{20} \times 8$ bytes (long long) $= 8$ MB -- very comfortable.

Problem Patterns & Tricks

Pattern 1: Subset Sum Transform

Compute $F [mask] = \sum_{s \subseteq mask} A [s]$ . The direct application of SOS DP.

Use case: You have values indexed by bitmasks and need aggregate information over all submasks. Common in counting problems: "how many elements have a property compatible with mask?"

Key trick: To find, for each mask, whether any submask has a nonzero value (not just the sum), store the value itself and check $F [mask] \neq 0$ after the transform. Or store a specific witness (e.g., the index) instead of a sum.

Examples: CF 165E -- Compatible Numbers

Pattern 2: Superset Sum and Complementary Counting

Compute $G [mask] = \sum_{s \supseteq mask} A [s]$ -- flip the direction. Equivalently, $G [mask] = F^{'} [\overset{―}{mask}]$ where $F^{'}$ is the subset sum of $A^{'} [m] = A [\overset{―}{m}]$ (complement all masks). But the direct "superset Zeta" is simpler.

Use case: "Count elements whose bits are a superset of mask" -- i.e., elements that have at least the bits in mask set.

Examples: CF 449D -- Jzzhu and Numbers

Pattern 3: AND Convolution

$C [m] = \sum_{i & j = m} A [i] \cdot B [j]$ .

Recipe: (1) Superset-Zeta both $A$ and $B$ . (2) Pointwise multiply. (3) Superset-Mobius the result.

Why it works: Superset sum of $A$ at mask $m$ equals $\hat{A} [m] = \sum_{s \supseteq m} A [s]$ . The property $s \supseteq m$ and $t \supseteq m$ iff $s & t \supseteq m$ means $\hat{A} [m] \cdot \hat{B} [m] = \hat{C} [m]$ .

Examples: CF 1620F -- Bipartite Array

Pattern 4: OR Convolution

$C [m] = \sum_{i | j = m} A [i] \cdot B [j]$ .

Recipe: (1) Subset-Zeta both $A$ and $B$ . (2) Pointwise multiply. (3) Subset-Mobius the result.

Why it works: Symmetric to AND convolution. Subset sum turns OR into pointwise product because $s \subseteq m$ and $t \subseteq m$ iff $s | t \subseteq m$ .

Examples: CF 800D -- Varying Kibibits

Pattern 5: Inclusion-Exclusion via Mobius

Many counting problems ask: "how many subsets satisfy ALL of several constraints?" Encode constraints as bits. The superset sum counts subsets satisfying at least the given constraints. Mobius inversion recovers the count for exactly the given constraints.

Classic pattern:

Count subsets with AND = 0:
  1. Let cnt[mask] = number of array elements equal to mask.
  2. Superset-Zeta: S[mask] = number of elements with all bits of mask set.
  3. The number of k-tuples with AND = mask (exactly) is recovered by
     Mobius inversion of S[mask]^k.

Examples: CF 449D -- Jzzhu and Numbers, CF 1036F -- Relatively Prime Powers

Pattern 6: Finding a Compatible Element

"For each $A [i]$ , find any $A [j]$ such that $A [i] & A [j] = 0$ ."

Rephrase: find $j$ where $A [j]$ is a submask of $\overset{―}{A [i]}$ (the complement). Use SOS DP storing a witness instead of a sum: propagate any valid index instead of adding values.

cpp

vector<int> witness(1 << n, -1);
for (int i = 0; i < sz; ++i)
    witness[A[i]] = i;  // store index of some element equal to this mask

for (int i = 0; i < n; ++i)
    for (int mask = 0; mask < (1 << n); ++mask)
        if (!(mask & (1 << i)) && witness[mask] == -1)
            witness[mask] = witness[mask | (1 << i)];

// witness[mask] = index of some element that is a submask of mask (or -1)
// For element A[i], check witness[complement of A[i]].

Examples: CF 165E -- Compatible Numbers

Pattern 7: Counting Subsets with a Given Popcount

Combine SOS DP with popcount tracking. Instead of a single sum, maintain a vector of sums by popcount:

cpp

// dp[mask][k] = sum of A[s] over submasks s of mask with popcount(s) = k

This costs $O (n^{2} \cdot 2^{n})$ -- feasible for $n \leq 15$ . Useful when the answer depends not just on which bits are present but on how many.

Contest Cheat Sheet

+------------------------------------------------------------------+
|                SOS DP CHEAT SHEET (AL-30)                        |
+------------------------------------------------------------------+
| WHAT IT COMPUTES:                                                |
|   F[mask] = sum of A[s] for all submasks s of mask               |
|   (also: superset sums, AND/OR convolution, Mobius inversion)    |
+------------------------------------------------------------------+
| SUBSET SUM (Zeta):                                               |
|   for (int i = 0; i < n; ++i)                                   |
|     for (int m = 0; m < (1<<n); ++m)                             |
|       if (m & (1<<i))  F[m] += F[m ^ (1<<i)];                   |
+------------------------------------------------------------------+
| SUPERSET SUM:                                                    |
|   for (int i = 0; i < n; ++i)                                   |
|     for (int m = 0; m < (1<<n); ++m)                             |
|       if (!(m & (1<<i)))  G[m] += G[m | (1<<i)];                |
+------------------------------------------------------------------+
| MOBIUS (inverse of Zeta):  same loops, subtract instead of add   |
+------------------------------------------------------------------+
| AND CONVOLUTION:  super-Zeta -> multiply -> super-Mobius         |
| OR  CONVOLUTION:  sub-Zeta   -> multiply -> sub-Mobius           |
+------------------------------------------------------------------+
| TIME:   O(n * 2^n)       n <= 20 comfortable                    |
| SPACE:  O(2^n)           8 MB for n=20 with long long            |
+------------------------------------------------------------------+
| INVARIANT: after bit j, bits 0..j are "free", j+1..n-1 "locked" |
+------------------------------------------------------------------+
| TRIGGERS: "sum over subsets", "AND/OR convolution", n <= 20,     |
|           "Zeta/Mobius transform", speed up O(3^n) submask enum  |
+------------------------------------------------------------------+

Gotchas & Debugging

1. Loop order matters

The outer loop must be over bit positions, the inner loop over masks. Reversing them computes something entirely wrong -- you would process all bits for mask 0 before touching mask 1, breaking the layer-by-layer invariant.

cpp

// CORRECT                           // WRONG
for (int i = 0; i < n; ++i)          for (int m = 0; m < (1<<n); ++m)
  for (int m = 0; m < (1<<n); ++m)     for (int i = 0; i < n; ++i)
    ...                                   ...

2. Subset vs. superset direction

Subset sum: condition is if (mask & (1 << i)), operation is += F[mask ^ (1 << i)] (bit off). Superset sum: condition is if (!(mask & (1 << i))), operation is += G[mask | (1 << i)] (bit on).

Mixing these up produces the transform in the wrong direction. Double-check with $n = 2$ .

3. Mobius is just subtraction

The inverse of the subset-sum Zeta transform is identical code with -= instead of +=. Do not reverse the loop order for inversion -- the same loop order with subtraction works because the transform is its own structural inverse under sign change.

4. Overflow with pointwise multiplication

In AND/OR convolution, after Zeta transform the values can be as large as $\sum | A [i] |$ . Multiplying two such values can overflow long long. Use modular arithmetic if the problem requires it, or __int128 for intermediate products.

5. Off-by-one in $n$

If values range from $0$ to $V$ and you want bitmasks over $⌈ \log_{2} V ⌉$ bits, make sure $n$ is large enough. $n = 20$ handles values up to $2^{20} - 1 = 1,048,575$ .

6. Forgetting the initial copy

SOS DP modifies the array in place. If you need the original $A$ later, copy it into $F$ before running the transform. A common bug: running SOS on $A$ directly, then trying to use $A$ for something else.

7. Debug checklist

Test with $n = 2$ or $n = 3$ -- small enough to verify every entry by hand.
Verify $F [(1 ≪ n) - 1] = \sum A [i]$ (full mask is the supermask of everything).
Verify $F [0] = A [0]$ (empty mask has only one submask: itself).
After Mobius inversion of a Zeta-transformed array, you should recover the original.
For convolution: verify $C [0] = A [0] \cdot B [0]$ (AND) or $C [(1 ≪ n) - 1] = \sum_{i | j = full} A [i] \cdot B [j]$ (OR).

Mental Traps

"SOS DP iterates over all subsets." No -- it iterates over all bitmasks and all dimensions (bit positions). The magic is that it never explicitly enumerates the subsets of any single mask. Each pass processes one dimension across every mask, and the composition of $n$ passes covers all subsets implicitly. If you think of it as "visiting subsets," you'll misunderstand the loop structure and the invariant.

"SOS is just inclusion-exclusion." SOS DP computes the Zeta transform (subset sum): $f (m) = \sum_{s \subseteq m} g (s)$ . Möbius inversion undoes it: given $f$ , recover $g$ by subtracting overcounted terms. They are inverses of each other -- related but not the same operation. Confusing them leads to applying the wrong one (e.g., running Möbius when you need Zeta, and getting the original values back instead of the sums).

"The dimension loop order doesn't matter." It absolutely does. Each dimension (bit position) must be fully processed across ALL masks before moving to the next dimension. Think of it as $n$ sequential sweeps, one per bit. After sweep $i$ , every dp[mask] has correctly accumulated contributions from submasks differing only in bits $0 \dots i$ . If you swap the loops (mask-outer, bit-inner), a single mask gets all its bit-updates before other masks are updated, breaking the transitive propagation that makes the algorithm correct.

Mental Model: SOS DP State Transitions for n = 3
===================================================

Think of each dp[mask] as a "bucket" being filled in 3 sweeps.

Sweep 0 (bit 0):     Sweep 1 (bit 1):     Sweep 2 (bit 2):

dp[111]:              dp[111]:              dp[111]:
  += dp[110]            += dp[101]            += dp[011]
  {111,110}             {111,110,101,100}     {ALL 8}

dp[101]:              dp[101]:              dp[101]:
  += dp[100]            (bit 1 off, skip)     += dp[001]
  {101,100}             {101,100}             {101,100,001,000}

dp[011]:              dp[011]:              dp[011]:
  += dp[010]            += dp[001]            (bit 2 off, skip)
  {011,010}             {011,010,001,000}     {011,010,001,000}

Each sweep: bit i MUST be fully processed for ALL masks
before sweep i+1 begins. This is non-negotiable.

Common Mistakes

Swapping loop order. The outer loop must be bit-position, inner must be mask. Reversing them breaks the layer-by-layer invariant -- each dimension must be fully processed across all masks before the next.
Wrong direction (subset vs superset). Subset-sum: if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)]. Superset-sum: flip the condition -- if (!(mask & (1 << i))) f[mask] += f[mask | (1 << i)]. Sweeping in the wrong direction gives the dual problem and the numbers look plausible on small tests. Always comment the intended direction above the loop.
Mobius confused with re-running Zeta. Mobius inversion is the same loop structure with -= instead of +=; do not reverse the loop order for inversion.
Overflow in AND/OR convolution. After the Zeta transform values grow to sum |A[i]|. Multiplying two such values overflows long long. Use modular arithmetic or __int128.
Forgetting the destructive nature of SOS. SOS modifies the array in place. If you later need the original counts (e.g., to subtract self-contribution), save a copy before running the sweep.

Debug Drills

Drill 1. Off-by-one in bit loop.

cpp

// SOS DP: for each mask, compute sum of f[s] for all s that are subsets of mask
int f[1 << 20];
int n = 20;

// populate f[mask] with initial values...

for (int i = 0; i <= n; i++) {           // BUG: i <= n goes one bit too far
    for (int mask = 0; mask < (1 << n); mask++) {
        if (mask & (1 << i))
            f[mask] += f[mask ^ (1 << i)];
    }
}

What's wrong?

The outer loop goes i = 0 to i = n (inclusive), which is n+1 iterations. When i = n, (1 << i) equals (1 << 20) = 1048576, which is outside the valid mask range [0, 2²⁰ - 1]. This reads/writes out of bounds. Fix: Change the condition to i < n (i.e., for (int i = 0; i < n; i++)). The loop should iterate over bit positions 0 through n-1.

Drill 2. Wrong subset direction (superset instead of subset).

cpp

// Goal: f[mask] = sum of original f[s] for all s ⊆ mask
// (subset sum / zeta transform)
for (int i = 0; i < n; i++) {
    for (int mask = 0; mask < (1 << n); mask++) {
        if (!(mask & (1 << i)))               // BUG: this is superset direction!
            f[mask] += f[mask | (1 << i)];
    }
}

What's wrong?

This code performs the superset-sum (for each mask, sum over all supermasks), not subset-sum. The condition !(mask & (1 << i)) selects masks that don't have bit i, then adds the value from the mask that does have bit i. This propagates values downward (supersets -> subsets). Fix: For subset-sum, flip the condition: if (mask & (1 << i)) f[mask] += f[mask ^ (1 << i)];. This propagates values upward (subsets -> supersets).

Drill 3. Forgetting to save the original array before SOS sweep.

cpp

// Count, for each mask, how many of the n input values are subsets of mask
int cnt[1 << 20];
memset(cnt, 0, sizeof(cnt));
int n_vals;
cin >> n_vals;
for (int i = 0; i < n_vals; i++) {
    int x; cin >> x;
    cnt[x]++;   // only increment, not set -- but what if x appears multiple times?
}
// SOS sweep
for (int i = 0; i < 20; i++)
    for (int mask = 0; mask < (1 << 20); mask++)
        if (mask & (1 << i))
            cnt[mask] += cnt[mask ^ (1 << i)];

// Now use cnt[mask] as "number of input values that are subsets of mask"
// BUG: works correctly... until someone adds cnt[x]-- later thinking they can undo

What's wrong?

This code is actually correct for the stated purpose -- but the subtle trap is that the SOS sweep is destructive: after the sweep, cnt[mask] no longer holds the original count of values equal to mask; it holds the cumulative subset sum. If any later code tries to read or modify the original cnt[x] values (e.g., subtracting self-contribution for "count subsets of x among other values"), it will use the wrong (cumulative) value. Fix: Save a copy of the original array before the SOS sweep: int orig[1 << 20]; memcpy(orig, cnt, sizeof(cnt));. Then orig[x] gives the raw count and cnt[x] gives the subset-sum count.

Self-Test

Before moving on, verify you can do the following without referencing the notes:

[ ] Implement SOS DP from scratch: for each mask, compute the sum of $A [sub]$ over all $sub \subseteq mask$ . Write the correct nested loop (bit-outer, mask-inner) with the right condition and update.
[ ] Explain the complexity: why is SOS DP $O (n \cdot 2^{n})$ instead of $O (3^{n})$ ? State the key insight (dimension decomposition eliminates redundant subset enumeration).
[ ] State the dimension-sweep invariant: after processing bit $i$ , dp[mask] has accumulated all submasks that agree with mask on bits $i + 1 \dots n - 1$ and are free on bits $0 \dots i$ .
[ ] Implement the superset sum: flip the condition -- process masks where bit $i$ is NOT set, and add dp[mask | (1 << i)].
[ ] Apply SOS DP to a concrete problem: given an array of integers, for each element $x$ , count how many other elements $y$ satisfy $(x & y) = y$ (i.e., $y$ is an AND-submask of $x$ ). Set up the frequency array and run the Zeta transform.
[ ] Describe the Zeta/Möbius relationship: SOS DP computes the Zeta transform; Möbius inversion (same loops, subtract instead of add) undoes it. State when you need each.

Practice Problems

CF Rating	What You Should Be Comfortable With
1200	Understand bitmask basics; enumerate all submasks of a mask in O(3ⁿ); know what "subset sum over lattice" means conceptually
1500	Implement the standard SOS DP loop (iterate over bits, then over masks); solve "for each mask, count how many input values are subsets of it"
1800	Apply SOS to AND/OR convolutions; use superset-sum (flip inclusion direction); combine SOS with Möbius inversion to get exact counts (not cumulative); solve CSES Bit Problem
2100	XOR convolution (Walsh-Hadamard Transform); SOS in non-binary bases (ternary zeta for "divisibility" lattices); SOS on large alphabets with dimension-by-dimension sweeping

#	Problem	Source	Difficulty	Key Idea
1	Compatible Numbers	CF 165E	1800	SOS DP with witness propagation -- find $j$ s.t. $A [i] & A [j] = 0$
2	Jzzhu and Numbers	CF 449D	2000	Superset sum + inclusion-exclusion for AND = 0
3	Bits And Pieces	CF 1208F	2100	Superset SOS to greedily maximize $a_i \mathbin{
4	Relatively Prime Powers	CF 1036F	2000	Mobius function over exponent masks
5	Vowels	CF 383E	1900	SOS DP to count triples with at least one vowel
6	Varying Kibibits	CF 800D	2100	Generalized SOS in non-binary base (ternary Zeta)
7	Prefix Function Queries	CF 1553F	2100	Bitmask transform to handle string matching constraints
8	AND Closure	CF 1680F	2000	Subset-sum transform to find AND-closed sets
9	Special Pairs	CF 1761E	1900	OR convolution to count pairs
10	Expected Power	CF 2020E	1900	SOS DP with probability over XOR subsets
11	Bit Problem	CSES	1800	For each x, count elements where x AND y = y (subset sum)
12	Reachable Nodes	CSES	2100	Bitset reachability via SOS-like propagation

SOS DP (Sum Over Subsets) ​

Contents ​

Intuition ​

Intuition ​

Visual Walkthrough ​

The Invariant ​

What the Code Won't Teach You ​

When to Reach for This ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1: SOS DP -- Zeta Transform (Minimal) ​

Loop Order Warning ​

Version 1: SOS DP -- Zeta Transform (Explained) ​

Version 2: Mobius Transform -- Inverse of SOS (Minimal) ​

Version 3: Superset Sum (Zeta Transform -- Upward) ​

Version 4: AND Convolution ​

Version 5: OR Convolution ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Subset Sum Transform ​

Pattern 2: Superset Sum and Complementary Counting ​

Pattern 3: AND Convolution ​

Pattern 4: OR Convolution ​

Pattern 5: Inclusion-Exclusion via Mobius ​

Pattern 6: Finding a Compatible Element ​

Pattern 7: Counting Subsets with a Given Popcount ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

1. Loop order matters ​

2. Subset vs. superset direction ​

3. Mobius is just subtraction ​

4. Overflow with pointwise multiplication ​

5. Off-by-one in n ​

6. Forgetting the initial copy ​

7. Debug checklist ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

SOS DP (Sum Over Subsets)

Contents

Intuition

Intuition

Visual Walkthrough

The Invariant

What the Code Won't Teach You

When to Reach for This

C++ STL Reference

Implementation (Contest-Ready)

Version 1: SOS DP -- Zeta Transform (Minimal)

Loop Order Warning

Version 1: SOS DP -- Zeta Transform (Explained)

Version 2: Mobius Transform -- Inverse of SOS (Minimal)

Version 3: Superset Sum (Zeta Transform -- Upward)

Version 4: AND Convolution

Version 5: OR Convolution

Operations Reference

Problem Patterns & Tricks

Pattern 1: Subset Sum Transform

Pattern 2: Superset Sum and Complementary Counting

Pattern 3: AND Convolution

Pattern 4: OR Convolution

Pattern 5: Inclusion-Exclusion via Mobius

Pattern 6: Finding a Compatible Element

Pattern 7: Counting Subsets with a Given Popcount

Contest Cheat Sheet

Gotchas & Debugging

1. Loop order matters

2. Subset vs. superset direction

3. Mobius is just subtraction

4. Overflow with pointwise multiplication

5. Off-by-one in $n$

6. Forgetting the initial copy

7. Debug checklist

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading