Skip to content

Change the Problem, Not the Algorithm

Quick Revisit

  • REACH FOR THIS WHEN: the problem seems hard but might become easy if you rephrase it
  • KEY DECISION: which transformation simplifies the problem -- compression, complement, dual, difference, contribution, or offline?
  • QUICK RULE: don't change the algorithm, change the problem so a simple algorithm works
  • SEE ALSO: Binary Search Reductions, Reduction Patterns

Sometimes the smartest move isn't finding a better algorithm. It's rephrasing the question so a simple algorithm works.

Coordinate Compression

The situation. Values range up to 109, but there are only n105 distinct values. You need to index by value (in a Fenwick tree, DP table, or array).

The transformation. Replace each value with its rank among all distinct values. Sort the values, assign ranks 0 through k1.

cpp
vector<int> vals = a; // copy
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
for (int& x : a)
    x = lower_bound(vals.begin(), vals.end(), x) - vals.begin();
// Now all values are in [0, vals.size()-1]

When to use it. Whenever a data structure needs values as indices and the value range is too large but the count of distinct values is manageable. Inversions, 2D sweep line, Fenwick-based counting -- coordinate compression is often the first step.

The Complement Trick

The situation. Counting "valid" configurations directly is complicated because the constraints interact.

The transformation. Count the total configurations (usually simple), then subtract the invalid ones.

Example: Count pairs (i,j) with ai+ajk. Total pairs: (n2). Pairs with sum equal to k: count using a hash map. Answer = total - bad.

Example: Probability of at least one success.P(at least one)=1P(none). Often P(none) is a simple product.

When it backfires. The complement is only useful when one side is simpler than the other. If both "valid" and "invalid" are equally complex, you've gained nothing.

The Dual Problem

The situation. The problem as stated leads to a hard optimization, but there's an equivalent formulation that's easier.

König's theorem (the classic dual): In a bipartite graph, the size of the maximum matching equals the size of the minimum vertex cover. So "minimum vertex cover" reduces to "maximum bipartite matching," which we know how to solve with augmenting paths or Hopcroft-Karp.

Min-cut = Max-flow. You want the cheapest way to disconnect source from sink? Solve max-flow instead. Same answer, completely different algorithm.

Minimum path cover in a DAG = n - maximum matching. Each unmatched node starts a new path.

These dualities aren't tricks you invent on the spot. They're theorems you know, and recognizing when a problem is secretly the dual of something familiar is the skill.

Difference Arrays and Prefix Sums

The situation. You need to perform many range additions, then query individual values.

The transformation. Instead of updating every element in [l,r], update only the endpoints: d[l] += v, d[r+1] -= v. After all updates, the prefix sum of d gives the actual array.

cpp
// Range add [l, r] by v
diff[l] += v;
diff[r + 1] -= v;

// Reconstruct after all operations
for (int i = 1; i < n; i++)
    diff[i] += diff[i - 1];

The deeper principle. Prefix sums and difference arrays are inverses. If your problem involves one, consider switching to the other. Range sum queries with point updates -> Fenwick tree on values. Range updates with point queries -> Fenwick tree on differences.

Contribution Technique

The situation. You need the sum of some quantity over all subsets, subarrays, pairs, or structures. Computing each structure individually is too slow.

The transformation. Instead of iterating over structures, iterate over elements and ask: "how many structures does this element contribute to?"

Example: Sum of all subarray minimums. Don't iterate over subarrays. For each element a[i], find how many subarrays have a[i] as their minimum. If a[i] is the minimum of subarrays that start in [L,i] and end in [i,R] (boundaries determined by previous/next smaller elements), then a[i] contributes a[i]×(iL+1)×(Ri+1).

Use a monotone stack to find L and R for all elements in O(n).

Example: Sum of GCDs over all pairs. For each value g, count how many pairs have gcd=g. Use inclusion-exclusion: pairs where g|ai and g|aj minus pairs where 2g, 3g, etc. divide both.

The contribution technique turns an O(n2) or O(2n) summation into O(nlogn) or O(n) by changing what you iterate over.

Going Offline

Precondition. Going offline is allowed only when answering one query does not change the inputs of later queries. The classic blocker is "the next query's parameters are XORed with the previous answer" — that forces online processing. If queries depend on past answers but the dependency can be decoded safely (e.g., the dependency is on results of operations, not on judge-supplied query bodies), you can sometimes still reorder; check before committing.

The situation. Queries arrive online, and answering each one independently is expensive. But the problem doesn't actually require online processing.

The transformation. Read all queries first, reorder them so they're cheaper to answer together.

Sorting queries by right endpoint. Process elements left to right. When you reach position r, answer all queries with right endpoint r. You maintain a data structure that grows incrementally -- much easier than handling arbitrary ranges.

Mo's algorithm. Sort queries by (l/n,r). Process them in this order, maintaining a current window [curL,curR] and adding/removing elements one at a time. Total movement: O(nn).

CDQ divide-and-conquer. Split the timeline in half. Recursively solve each half. Then handle the cross-half interactions. Turns online 2D problems into offline 1D problems.

When you can't go offline. Some problems explicitly force online processing (e.g., each query depends on the previous answer via XOR). In those cases, you need persistent or link-cut data structures instead.

Recognition Cues

If you see in the problem...Transformation to try
Values up to 10^9 but few distinct valuesCoordinate compression
"Count valid X" with complex constraintsComplement: total - invalid
"Minimum vertex cover" on bipartite graphDual: maximum matching (Konig's)
"Disconnect source from sink at minimum cost"Dual: max-flow = min-cut
Many range additions, then query individual valuesDifference array
"Sum of f(x) over all subarrays/pairs/subsets"Contribution: iterate over elements, count appearances
Many range queries, order doesn't matterGo offline: sort queries, sweep
"Probability of at least one"Complement: 1 - P(none)

The Transformation Toolbox (Summary)

TransformationYou HaveYou Get
Coordinate compressionValues up to 109Indices up to n
Complement"Count valid" (hard)"Total - invalid" (easier)
Dual problemHard optimizationEquivalent easier optimization
Difference arrayRange updatesEndpoint updates + prefix reconstruction at the end
ContributionSum over structuresSum over elements
OfflineArbitrary query orderSorted/batched queries

None of these change the underlying algorithm. They change the problem so that a simple algorithm suffices.

Common Mistakes in Choosing

  • Using the complement when both sides are equally hard. The complement trick only helps when one side is simpler. If "count valid" and "count invalid" are both complex, you've gained nothing.
  • Forgetting to undo coordinate compression. After solving with compressed coordinates, the answer may need to be mapped back to original values. Don't output ranks when the problem asks for values.
  • Applying difference arrays when updates are not uniform. Difference arrays handle "add v to range [l,r]." If the update varies by position (e.g., add i*v to position i), you need a different technique.
  • Assuming you can go offline. Some problems force online processing (e.g., each query depends on the previous answer via XOR). Read the problem carefully for such constraints.
  • Missing the contribution technique. When you see "sum over all subarrays" and think O(n^2), stop and ask: "how many subarrays does element i appear in?" This is the most commonly missed transformation.

Cross-References

Practice Problems

#ProblemSourceRatingTransformation
1Sum of Subarray MinimumsLeetCode1700Contribution + monotone stack
2Range Update QueriesCSES1400Difference array
3DQUERYSPOJ1700Offline + BIT by right endpoint
4Count InversionsSPOJ1500Coordinate compression + BIT
5Maximum FlowCSES1800Min-cut duality
6XOR and Favorite NumberCF 617E2100Mo's algorithm (offline reordering)

Recap

  • Six transformations: coordinate compression, complement, dual problem, difference array, contribution, offline processing.
  • None change the algorithm -- they change the problem so a simple algorithm suffices.
  • Coordinate compression: when values are huge but distinct count is small, replace with ranks.
  • Complement trick: count total minus invalid when "valid" is complex but "invalid" is simple.
  • Contribution technique: iterate over elements, not structures -- turns O(n^2) sums into O(n log n) or O(n).
  • Going offline: sort or batch queries for cheaper processing. Mo's, sweep, CDQ divide-and-conquer.

Built for the climb from Codeforces 1100 to 2100.