Change the Problem, Not the Algorithm

Quick Revisit
REACH FOR THIS WHEN: the problem seems hard but might become easy if you rephrase it
KEY DECISION: which transformation simplifies the problem -- compression, complement, dual, difference, contribution, or offline?
QUICK RULE: don't change the algorithm, change the problem so a simple algorithm works
SEE ALSO: Binary Search Reductions, Reduction Patterns

Sometimes the smartest move isn't finding a better algorithm. It's rephrasing the question so a simple algorithm works.

Coordinate Compression

The situation. Values range up to $10^{9}$ , but there are only $n \leq 10^{5}$ distinct values. You need to index by value (in a Fenwick tree, DP table, or array).

The transformation. Replace each value with its rank among all distinct values. Sort the values, assign ranks 0 through $k - 1$ .

cpp

vector<int> vals = a; // copy
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
for (int& x : a)
    x = lower_bound(vals.begin(), vals.end(), x) - vals.begin();
// Now all values are in [0, vals.size()-1]

When to use it. Whenever a data structure needs values as indices and the value range is too large but the count of distinct values is manageable. Inversions, 2D sweep line, Fenwick-based counting -- coordinate compression is often the first step.

The Complement Trick

The situation. Counting "valid" configurations directly is complicated because the constraints interact.

The transformation. Count the total configurations (usually simple), then subtract the invalid ones.

Example: Count pairs $(i, j)$ with $a_{i} + a_{j} \neq k$ . Total pairs: $(\binom{n}{2})$ . Pairs with sum equal to $k$ : count using a hash map. Answer = total - bad.

Example: Probability of at least one success. $P (at least one) = 1 - P (none)$ . Often $P (none)$ is a simple product.

When it backfires. The complement is only useful when one side is simpler than the other. If both "valid" and "invalid" are equally complex, you've gained nothing.

The Dual Problem

The situation. The problem as stated leads to a hard optimization, but there's an equivalent formulation that's easier.

König's theorem (the classic dual): In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover. So "minimum vertex cover" reduces to "maximum bipartite matching," which we know how to solve with augmenting paths or Hopcroft-Karp.

Min-cut = Max-flow. You want the cheapest way to disconnect source from sink? Solve max-flow instead. Same answer, completely different algorithm.

Minimum path cover in a DAG = $n$ - maximum matching. Each unmatched node starts a new path.

These dualities aren't tricks you invent on the spot. They're theorems you know, and recognizing when a problem is secretly the dual of something familiar is the skill.

Difference Arrays and Prefix Sums

The situation. You need to perform many range additions, then query individual values.

The transformation. Instead of updating every element in $[l, r]$ , update only the endpoints: $d [l]$ += $v$ , $d [r + 1]$ -= $v$ . After all updates, the prefix sum of $d$ gives the actual array.

cpp

// Range add [l, r] by v
diff[l] += v;
diff[r + 1] -= v;

// Reconstruct after all operations
for (int i = 1; i < n; i++)
    diff[i] += diff[i - 1];

The deeper principle. Prefix sums and difference arrays are inverses. If your problem involves one, consider switching to the other. Range sum queries with point updates -> Fenwick tree on values. Range updates with point queries -> Fenwick tree on differences.

Contribution Technique

The situation. You need the sum of some quantity over all subsets, subarrays, pairs, or structures. Computing each structure individually is too slow.

The transformation. Instead of iterating over structures, iterate over elements and ask: "how many structures does this element contribute to?"

Example: Sum of all subarray minimums. Don't iterate over subarrays. For each element $a [i]$ , find how many subarrays have $a [i]$ as their minimum. If $a [i]$ is the minimum of subarrays that start in $[L, i]$ and end in $[i, R]$ (boundaries determined by previous/next smaller elements), then $a [i]$ contributes $a [i] \times (i - L + 1) \times (R - i + 1)$ .

Use a monotone stack to find $L$ and $R$ for all elements in $O (n)$ .

Example: Sum of GCDs over all pairs. For each value $g$ , count how many pairs have $gcd = g$ . Use inclusion-exclusion: pairs where $g | a_{i}$ and $g | a_{j}$ minus pairs where $2 g$ , $3 g$ , etc. divide both.

The contribution technique turns an $O (n^{2})$ or $O (2^{n})$ summation into $O (n \log n)$ or $O (n)$ by changing what you iterate over.

Going Offline

Precondition. Going offline is allowed only when answering one query does not change the inputs of later queries. The classic blocker is "the next query's parameters are XORed with the previous answer" — that forces online processing. If queries depend on past answers but the dependency can be decoded safely (e.g., the dependency is on results of operations, not on judge-supplied query bodies), you can sometimes still reorder; check before committing.

The situation. Queries arrive online, and answering each one independently is expensive. But the problem doesn't actually require online processing.

The transformation. Read all queries first, reorder them so they're cheaper to answer together.

Sorting queries by right endpoint. Process elements left to right. When you reach position $r$ , answer all queries with right endpoint $r$ . You maintain a data structure that grows incrementally -- much easier than handling arbitrary ranges.

Mo's algorithm. Sort queries by $(⌊ l / \sqrt{n} ⌋, r)$ . Process them in this order, maintaining a current window $[c u r L, c u r R]$ and adding/removing elements one at a time. Total movement: $O (n \sqrt{n})$ .

CDQ divide-and-conquer. Split the timeline in half. Recursively solve each half. Then handle the cross-half interactions. Turns online 2D problems into offline 1D problems.

When you can't go offline. Some problems explicitly force online processing (e.g., each query depends on the previous answer via XOR). In those cases, you need persistent or link-cut data structures instead.

Recognition Cues

If you see in the problem...	Transformation to try
Values up to 10^9 but few distinct values	Coordinate compression
"Count valid X" with complex constraints	Complement: total - invalid
"Minimum vertex cover" on bipartite graph	Dual: maximum matching (Konig's)
"Disconnect source from sink at minimum cost"	Dual: max-flow = min-cut
Many range additions, then query individual values	Difference array
"Sum of f(x) over all subarrays/pairs/subsets"	Contribution: iterate over elements, count appearances
Many range queries, order doesn't matter	Go offline: sort queries, sweep
"Probability of at least one"	Complement: 1 - P(none)

The Transformation Toolbox (Summary)

Transformation	You Have	You Get
Coordinate compression	Values up to $10^{9}$	Indices up to $n$
Complement	"Count valid" (hard)	"Total - invalid" (easier)
Dual problem	Hard optimization	Equivalent easier optimization
Difference array	Range updates	Endpoint updates + prefix reconstruction at the end
Contribution	Sum over structures	Sum over elements
Offline	Arbitrary query order	Sorted/batched queries

None of these change the underlying algorithm. They change the problem so that a simple algorithm suffices.

Common Mistakes in Choosing

Using the complement when both sides are equally hard. The complement trick only helps when one side is simpler. If "count valid" and "count invalid" are both complex, you've gained nothing.
Forgetting to undo coordinate compression. After solving with compressed coordinates, the answer may need to be mapped back to original values. Don't output ranks when the problem asks for values.
Applying difference arrays when updates are not uniform. Difference arrays handle "add v to range [l,r]." If the update varies by position (e.g., add i*v to position i), you need a different technique.
Assuming you can go offline. Some problems force online processing (e.g., each query depends on the previous answer via XOR). Read the problem carefully for such constraints.
Missing the contribution technique. When you see "sum over all subarrays" and think O(n^2), stop and ask: "how many subarrays does element i appear in?" This is the most commonly missed transformation.

Cross-References

Prefix Sums -- the inverse of difference arrays
Difference Arrays -- range updates via endpoint marking
Coordinate Compression -- the full technique
Mo's Algorithm -- offline query reordering
Contribution Technique -- element-wise counting
Graph Modeling -- another reduction approach
Binary Search Reductions -- optimization to decision

Practice Problems

#	Problem	Source	Rating	Transformation
1	Sum of Subarray Minimums	LeetCode	1700	Contribution + monotone stack
2	Range Update Queries	CSES	1400	Difference array
3	DQUERY	SPOJ	1700	Offline + BIT by right endpoint
4	Count Inversions	SPOJ	1500	Coordinate compression + BIT
5	Maximum Flow	CSES	1800	Min-cut duality
6	XOR and Favorite Number	CF 617E	2100	Mo's algorithm (offline reordering)

Recap

Six transformations: coordinate compression, complement, dual problem, difference array, contribution, offline processing.
None change the algorithm -- they change the problem so a simple algorithm suffices.
Coordinate compression: when values are huge but distinct count is small, replace with ranks.
Complement trick: count total minus invalid when "valid" is complex but "invalid" is simple.
Contribution technique: iterate over elements, not structures -- turns O(n^2) sums into O(n log n) or O(n).
Going offline: sort or batch queries for cheaper processing. Mo's, sweep, CDQ divide-and-conquer.

Change the Problem, Not the Algorithm ​

Coordinate Compression ​

The Complement Trick ​

The Dual Problem ​

Difference Arrays and Prefix Sums ​

Contribution Technique ​

Going Offline ​

Recognition Cues ​

The Transformation Toolbox (Summary) ​

Common Mistakes in Choosing ​

Cross-References ​

Practice Problems ​

Recap ​