The Art of Choosing What to Remember

Quick Revisit
REACH FOR THIS WHEN: you know the problem is DP but can't figure out what the state should be
KEY DECISION: what is the minimum information from the past needed to make optimal future decisions?
QUICK RULE: if your state space is too large, compress it -- bitmask, last-k, rank, or modular residue
SEE ALSO: DP vs Greedy Guide, DS Augmentation

DP isn't about filling tables. It's about deciding what information from the past you need to make decisions about the future.
Prerequisites: DP Thinking Framework

Why State Design Is the Hardest Part

Writing the recurrence is mechanical. Implementing the transitions is straightforward. But choosing the state -- that's where problems are won or lost.

A DP state encodes exactly the minimum information needed to make all future decisions optimally, without knowing the specific history that led there. Too much information and your state space explodes. Too little and you can't distinguish situations that need different handling.

The mantra: A state captures enough past to determine the future, and nothing more.

Four Techniques for Finding the Right State

1. Profile DP -- When the Grid Is Too Wide

Situation. Tiling a grid, placing non-attacking pieces, or any problem where decisions in one column depend on the shape of the previous column.

The state. Instead of tracking every cell, track only the profile -- the pattern along the boundary between "processed" and "unprocessed."

Classic example: Domino tiling of a $3 \times n$ grid. The profile is which cells in the current column are already covered by a horizontal domino started in the previous column. For a grid with 3 rows, the profile is a 3-bit mask -- only 8 states. Process column by column: for each profile, enumerate which dominos you can place and what profile they leave for the next column.

Column i-1    Column i
  ██░         ░██       Profile = 100 (top cell pokes into column i)
  ░░░    ->    ░░░       Only 3 rows -> only 2³ = 8 possible profiles
  ░░░         ░░░

For an $m \times n$ grid, the profile has $2^{m}$ states, so this works when $m$ is small — commonly $m \leq 10 -- 20$ , depending on how many transitions per profile you need to enumerate. A bare $2^{m} \cdot n$ pass is fine at $m = 20$ ; if each profile spawns another exponential of subset transitions, you may be capped closer to $m = 12 -- 15$ .

2. Bitmask DP -- When Tracking a Subset

Situation. You're assigning items to slots, visiting cities, or partitioning into groups, and you need to know which items have been used.

The state. A bitmask of $n$ bits, where bit $i$ = 1 means item $i$ has been used/visited.

Classic example: Traveling Salesman (TSP). State: dp[mask][v] = minimum cost to visit exactly the cities in mask, ending at city $v$ . Transition: for each unvisited city $u$ , dp[mask | (1<<u)][u] = min(dp[mask][v] + cost[v][u]).

The key insight for reduction: if a problem says "choose a subset" or "assign $n$ things ( $n \leq 20$ )," the bitmask is the state.

cpp

// TSP core: O(2^n * n^2)
for (int mask = 0; mask < (1 << n); mask++)
    for (int v = 0; v < n; v++) if (dp[mask][v] < INF)
        for (int u = 0; u < n; u++) if (!(mask & (1 << u)))
            dp[mask | (1 << u)][u] = min(dp[mask | (1 << u)][u],
                                          dp[mask][v] + cost[v][u]);

3. "Last Few Elements" -- When Only Recent History Matters

Situation. The constraint involves consecutive elements, runs, or patterns of bounded length.

The state. Track only the last $k$ decisions (or some compressed summary of them), not the entire history.

Example: No three consecutive identical elements. State: dp[i][last_value][run_length] where run_length ∈ {1, 2}. You don't need to know what happened at position 0 when you're at position 1000 -- you only need the current run.

Example: Longest alternating subsequence. State: dp[i][last_direction] where direction ∈ {up, down}. The entire history compresses to a single bit.

4. Coordinate Compression in the State

Situation. The values in the problem are huge ( $10^{9}$ ), but there are few distinct values ( $\leq 10^{5}$ ).

The state. Replace raw values with their rank among all distinct values. A state over ranks has manageable size.

Example: Count rectangles with sum <= K in a grid. If the prefix sums range up to $10^{18}$ , you can't index by value. But if you compress the $O (n)$ distinct prefix sums to ranks $0 \dots n - 1$ , a Fenwick tree over ranks solves it.

Choosing Your State Design

Technique	When to use	State size	Typical constraint
Profile DP	Grid tiling, boundary-dependent decisions	O(2^m * n)	m small, often 10-20 (transitions can tighten this)
Bitmask DP	Assign/visit/partition n items	O(2^n * n)	n <= 20
Last-k elements	Constraints involve consecutive elements or bounded runs	O(n * k * alphabet)	k small (1-3)
Coordinate compression	Huge value range, few distinct values	O(n) or O(n log n)	values up to 10^9, distinct count <= 10^5

Decision cues:

"Assign n things to n slots" (n <= 20) --> bitmask DP
"Tiling an m x n grid" (m small, often <= 10-20) --> profile DP
"No three consecutive same elements" --> last-k with run length
"Values up to 10^9 but only 10^5 distinct" --> coordinate compression

Worked Example: State Discovery in Action

Problem. Given a string of digits, insert + signs between some digits to minimize the total sum. You must use exactly $k$ plus signs.

Step 1: What are the decisions? Where to place each + sign.

Step 2: What changes after each decision? The position in the string and how many + signs remain.

Step 3: What's the minimal state? dp[i][j] = minimum sum achievable using the first $i$ digits with exactly $j$ plus signs. Transition: the last number starts at some position $p$ , so dp[i][j] = min over p of (dp[p][j-1] + number(p+1..i)).

Step 4: Sanity check. State space: $O (n \cdot k)$ . Transition cost: $O (n)$ per state. Total: $O (n^{2} k)$ . For $n, k \leq 1000$ , that's fast enough.

Step 5: Implementation watch. The transition relies on fast number(p+1..i) evaluation. Two practical issues: (a) the substring's numeric value can overflow 64-bit integers for long blocks, so use big integers, modular arithmetic, or work with sums of digits if the recurrence allows it; (b) if you naively recompute the value inside the inner loop you double the constant factor — precompute it incrementally or via prefix tables. State design forces these issues to surface; they don't go away by ignoring them.

Notice the process: we didn't start with the recurrence. We started by asking what we need to remember.

Common Mistakes in Choosing

Too much state. "Let me track the entire configuration." If your state is the full grid or the complete sequence, you've just rewritten brute force with memoization. Ask: which parts of the configuration actually matter for future decisions?

Forgetting a dimension. Your DP gives wrong answers because states that should be distinct are merged. Common culprit: forgetting to track parity, remaining budget, or which "mode" you're in (e.g., currently holding vs. not holding a stock).

Wrong direction. Sometimes dp[i] = "best answer for the first $i$ items" doesn't work, but dp[i] = "best answer for items $i \dots n$ " does (or vice versa). If transitions feel unnatural, try reversing.

Inventing unnecessary dimensions. Tracking information that doesn't affect future decisions bloats the state space. Ask: "if I remove this dimension, can two states that are currently distinct actually make different optimal choices?" If not, drop the dimension.

Not sanity-checking the state space size. Before coding, compute |states| x |transitions per state|. If the product exceeds ~10^8, the DP will TLE. Either compress the state or find a different approach.

Bad State vs Better State

State design is easiest to learn by contrast. A few examples where the obvious state is wrong, and a tightening fixes it:

Problem	Bad state	Why it fails	Better state
Buy/sell stock with cooldown	`dp[i]` = best profit using first `i` days	Loses whether you currently hold a share or are in cooldown	`dp[i][hold]` with `hold ∈ {free, holding, cooldown}`
LIS by ending value	`dp[i]` = LIS ending at index `i`	$O (n^{2})$ scan unavoidable	`dp[len]` = smallest ending value of any LIS of length `len`, then binary search ( $O (n \log n)$ )
No 3 consecutive identical chars	`dp[i][full history]`	State explodes in `i`	`dp[i][last_char][run_length ∈ {1,2}]`
Knapsack with item-count cap	`dp[i][weight]` only	Forgets items used; over-counts	`dp[i][weight][k]` or 0/1 iteration over items
Path count on grid with k obstacles passable	`dp[i][j]`	Can't tell whether obstacle was already used	`dp[i][j][used]`

The pattern: when two situations that should diverge in the future are merged in the present, the state is too coarse. When the state contains information no future transition reads, it is too fine.

The State Design Checklist

When you're stuck designing a DP state, walk through these questions:

What decision am I making at each step?
After making this decision, what information from the past do I need to evaluate future decisions?
Can I compress that information? (Bitmask? Last-k? Rank? Modular residue?)
Is the state space small enough? Compute $| states | \times | transitions per state |$ and check.
If the state space is too large, what information can I drop or approximate?

Cross-References

DP Thinking Framework -- the prerequisite for state design
Bitmask DP -- detailed bitmask DP techniques
DP vs Greedy Guide -- when to use DP at all
Problem Transformations -- coordinate compression and other reductions
Reduction Patterns -- the full "if X, think Y" catalog

Practice Problems

#	Problem	Source	Rating	State Design Insight
1	Hamiltonian Flights	CSES	1800	Bitmask + endpoint
2	Broken Profile	CF Blog	2000	Profile DP on grid boundary
3	Buy and Sell Stock with Cooldown	LeetCode	1600	State =
4	Digit Sum	AtCoder	1700	Position + remainder + tight flag
5	Partition into Subsets	AtCoder	2100	Bitmask over subset of elements

Recap

A state captures enough past to determine the future, and nothing more.
Four compression techniques: profile (boundary shape), bitmask (subset), last-k (recent history), coordinate compression (rank among distinct values).
Discovery process: what decisions am I making? What past info do future decisions need? Can I compress it? Is the state space small enough?
Too much state = brute force with memoization. Too little state = wrong answers from merged distinct situations.
Always sanity-check: |states| x |transitions| must fit in the time limit before you start coding.

The Art of Choosing What to Remember ​

Why State Design Is the Hardest Part ​

Four Techniques for Finding the Right State ​

1. Profile DP -- When the Grid Is Too Wide ​

2. Bitmask DP -- When Tracking a Subset ​

3. "Last Few Elements" -- When Only Recent History Matters ​

4. Coordinate Compression in the State ​

Choosing Your State Design ​

Worked Example: State Discovery in Action ​

Common Mistakes in Choosing ​

Bad State vs Better State ​

The State Design Checklist ​

Cross-References ​

Practice Problems ​

Recap ​