Skip to content

Sorting and Searching

Quick Revisit

  • USE WHEN: Problem involves finding pairs, counting duplicates, or answering repeated queries on an array
  • INVARIANT: After sorting, equal elements are adjacent and binary search works in O(log n)
  • TIME: O(n log n) sort + O(log n) per query | SPACE: O(n) or O(1) if in-place
  • CLASSIC BUG: Using <= in comparator instead of < (violates strict weak ordering, causes UB)
  • PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md

Sort first, ask questions later. O(nlogn) sorting unlocks binary search, two pointers, greedy, and half the problems on Codeforces Div 2 A–C. Sorting transforms chaos into structure—once sorted, problems that seemed impossible (finding pairs, counting inversions, detecting duplicates) become trivial with a single scan.

Difficulty: Beginner | Prerequisites: Arrays and Strings | CF Rating Relevance: 800–1200 | Contest Frequency: Common—fundamental to nearly every contest


Contents


Intuition

You have this array of 8 elements and need to answer repeated "find" queries:

text
  a = | 5 | 2 | 8 | 2 | 9 | 1 | 2 | 7 |
        0   1   2   3   4   5   6   7

Query 1: "Is 7 in the array?" You scan left to right: check index 0 (5, no), 1 (2, no), 2 (8, no), 3 (2, no), 4 (9, no), 5 (1, no), 6 (2, no), 7 (7, yes). That is 7 comparisons.

Query 2: "Is 3 in the array?" Scan all 8 elements, never find it. 8 comparisons.

Query 3: "Is 9 in the array?" Check 0..4—found at index 4. 5 comparisons.

Three queries, roughly 8+8+5=21 comparisons. Each query costs O(n) in the worst case. If you have q queries, total work is O(nq). With n=105 and q=105, that is 1010 operations—far too slow for a 2-second time limit.

We are scanning the entire array from scratch every single time. There has to be a better way.

Sorting pays a one-time O(nlogn) upfront cost so that every subsequent search costs only O(logn) via binary search.

Think of a messy desk buried in unsorted papers. Every time someone asks "do you have the invoice for April 3rd?" you rummage through the entire pile—that is linear search. Now imagine spending twenty minutes organizing everything by date into a sorted file cabinet. That upfront effort is real, but afterward every lookup is near-instant: flip to the right section, narrow down, done. Sorting is building that file cabinet. Binary search is the fast lookup it enables.

The analogy maps well: the papers are array elements, the date ordering is the sort key, and "flip to the middle, decide which half" is exactly binary search. One place it breaks down: a file cabinet lets you slip new papers into the right spot easily, but a sorted array does not—insertion costs O(n). For dynamic data, reach for a set or multiset instead.

Visual Walkthrough

Same array. First, pay the upfront cost—sort it:

text
  Before:  | 5 | 2 | 8 | 2 | 9 | 1 | 2 | 7 |
  After:   | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
             0   1   2   3   4   5   6   7

  Cost of sorting: O(n log n) = roughly 8 * 3 = 24 comparisons

Now answer the same three queries with binary search:

Query 1: "Is 7 in the array?"

text
  Step 1:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
      ^               ^               ^
      lo            mid              hi
      lo=0, hi=7, mid=3.  a[3]=2 < 7  -->  lo = 4

  Step 2:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                        ^   ^       ^
                        lo mid      hi
      lo=4, hi=7, mid=5.  a[5]=7 == 7  -->  FOUND.  2 comparisons.

Query 2: "Is 3 in the array?"

text
  Step 1:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
      ^               ^               ^
      lo            mid              hi
      lo=0, hi=7, mid=3.  a[3]=2 < 3  -->  lo = 4

  Step 2:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                        ^   ^       ^
                        lo mid      hi
      lo=4, hi=7, mid=5.  a[5]=7 > 3  -->  hi = 4

  Step 3:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                        ^
                      lo,hi
      lo=4, hi=4, mid=4.  a[4]=5 > 3  -->  hi = 3

  lo > hi  -->  NOT FOUND.  3 comparisons.

Query 3: "Is 9 in the array?"

text
  Step 1:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
      ^               ^               ^
      lo            mid              hi
      lo=0, hi=7, mid=3.  a[3]=2 < 9  -->  lo = 4

  Step 2:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                        ^       ^   ^
                        lo    mid   hi
      lo=4, hi=7, mid=5.  a[5]=7 < 9  -->  lo = 6

  Step 3:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                                ^   ^
                               lo  hi
                                 ^
                                mid
      lo=6, hi=7, mid=6.  a[6]=8 < 9  -->  lo = 7

  Step 4:
    | 1 | 2 | 2 | 2 | 5 | 7 | 8 | 9 |
                                        ^
                                      lo,hi
      lo=7, hi=7, mid=7.  a[7]=9 == 9  -->  FOUND.  4 comparisons.

Comparison tally:

text
  +--------------------------------------------------+
  |             Unsorted (linear)  Sorted (binary)    |
  |  Query 1:       7 ops              2 ops          |
  |  Query 2:       8 ops              3 ops          |
  |  Query 3:       5 ops              4 ops          |
  |  ------------------------------------------       |
  |  Queries:       20 ops             9 ops          |
  |  Sort cost:      0 ops            ~24 ops         |
  |  TOTAL:         20 ops            ~33 ops         |
  +--------------------------------------------------+

With just 3 queries, the upfront sort nearly breaks even. At q=10 queries the sorted approach already wins. At q=105, it is unsorted 105×8=800,000 ops vs sorted 24+105×3300,024 ops. The gap widens as n grows: unsorted is O(nq); sorted is O(nlogn+qlogn).

The Invariant

text
+------------------------------------------------------------------+
| INVARIANT: After sorting, a[i] <= a[i+1] for all valid i.        |
| This monotone ordering guarantees that if a[mid] < target, then  |
| every element in a[0..mid] is also < target, so the entire left  |
| half can be eliminated. Each comparison discards half the search  |
| space, yielding O(log n) per query.                              |
+------------------------------------------------------------------+

Why is the invariant maintained? std::sort establishes it, and binary search is read-only—it never modifies the array. As long as you do not insert, remove, or overwrite elements between the sort and the searches, the invariant holds.

Connect back to the walkthrough: look at Query 1, Step 1. We check a[3] = 2 < 7. Because the array is sorted, we know a[0], a[1], a[2] are all 2, hence all <7. That single comparison eliminates 4 elements at once. Without the sorted invariant, a[3] = 2 tells us nothing about any other position—we would still have to check them one by one.

What the Code Won't Teach You

Sorting is an enabler, not a goal. You rarely sort because the problem says "sort this array." You sort because it unlocks binary search, two pointers, greedy, sweep line, or deduplication. The real skill is recognizing that a problem becomes tractable after sorting—and then choosing what to sort by.

Custom comparator design is where the bugs hide. The call to std::sort is one line; the comparator is where you encode problem-specific logic. Get it wrong and you trigger undefined behavior—not just a wrong answer. Train yourself to always verify strict weak ordering: never return true for equal elements.

text
  WHAT SORTING UNLOCKS -- THE ENABLING MAP
  =========================================

  sort(a)
    |
    +---> Binary search     O(log n) lookups
    |       lower_bound / upper_bound
    |
    +---> Two pointers      O(n) pair finding
    |       converge from both ends
    |
    +---> Greedy            process "best first"
    |       interval scheduling, knapsack
    |
    +---> Sweep line        sorted events
    |       overlap counting, union of intervals
    |
    +---> Deduplication     sort + unique
    |       coordinate compression
    |
    '---> Adjacent scan     O(n) closest pair
            min difference must be neighbors

nth_element is the secret weapon nobody uses. When a problem asks for the k-th smallest element (single query), you do not need a full O(nlogn) sort—std::nth_element does it in O(n) average. Yet in contest after contest, competitors sort the entire array out of habit. Know your tools.

When to Reach for This

Trigger phrases—if you see these in a problem statement, think sort + search:

  • "Find whether element X exists in the array" or "how many elements satisfy..."—sort, then lower_bound / upper_bound for O(logn) lookups.
  • "Sort by a custom key" or "minimize/maximize by processing in some order"—custom comparator sort as a preprocessing step for greedy.
  • "Coordinate compression" or "values up to 109 but only n105 distinct values"—sort + unique to compress coordinates.
  • "Count pairs with sum/difference equal to K"—sort enables two-pointer or binary search pairing.
  • "Answer q offline queries about membership or rank"—sort the array once, answer all queries in O(qlogn).

Counter-examples—looks like sort + search but is not:

  • "Find the K-th smallest element (single query, no repeated lookups)." You do not need a full sort. nth_element solves this in O(n) average—cheaper than O(nlogn). See the STL reference below for nth_element.
  • "Elements are inserted and deleted between queries." A static sorted array cannot handle dynamic updates efficiently (O(n) per insert). Use a set, multiset, or a balanced BST structure instead. See: 08-set-and-multiset.md.
  • "The array is already given in sorted order." Do not re-sort—skip straight to binary search. This sounds obvious, but in contest stress it is easy to add a redundant sort call that wastes time and—worse—might destroy a secondary ordering you need.

How Sorting Works Under the Hood

You call sort() and it just works—but knowing what it does helps you pick the right tool and predict performance.

Merge Sort

Idea: Divide the array in half, recursively sort each half, then merge the two sorted halves into one.

text
  Input:  [5, 2, 8, 1, 9, 3]

  Split:  [5, 2, 8]          [1, 9, 3]
  Split:  [5] [2, 8]         [1] [9, 3]
  Split:  [5] [2] [8]        [1] [9] [3]

  Merge:  [5] [2, 8] -> [2, 5, 8]     [1] [3, 9] -> [1, 3, 9]
  Merge:  [2, 5, 8] + [1, 3, 9]
          compare 2 vs 1 -> take 1
          compare 2 vs 3 -> take 2
          compare 5 vs 3 -> take 3
          compare 5 vs 9 -> take 5
          take 8, take 9
  Result: [1, 2, 3, 5, 8, 9]

Complexity: O(nlogn) worst case—always, no exceptions. Uses O(n) extra space for the merge buffer. Stable—equal elements keep their original relative order.

In C++, stable_sort() uses merge sort (or a variant). Use it when stability matters—for example, sorting by one key while preserving the order from a previous sort on another key.

Quicksort

Idea: Pick a pivot element, partition the array into elements < pivot and elements pivot, then recurse on each partition.

  • Average case: O(nlogn)—when the pivot splits the array roughly in half each time.
  • Worst case: O(n2)—when the pivot is always the smallest or largest element (e.g., already-sorted input with a naive first-element pivot).

Why std::sort is safe: The C++ standard library does not use plain quicksort. It uses introsort—quicksort that monitors recursion depth and switches to heapsort if it gets too deep. This guarantees O(nlogn) worst case while preserving quicksort's excellent average-case cache performance. Introsort also drops into insertion sort for small subarrays (n16 or so), since insertion sort has low overhead for tiny inputs.

So when you call sort(), you get the best of three worlds: quicksort's speed, heapsort's worst-case guarantee, and insertion sort's efficiency on small ranges.

Merge sort was invented by John von Neumann in 1945 for the EDVAC computer. Quicksort followed from Tony Hoare in 1959. The O(nlogn) lower bound for comparison-based sorting was established in the 1960s via information theory: n! possible permutations require at least log2(n!)nlogn comparisons to distinguish.

When to Use What

NeedUseWhy
General sortingsort()—introsortO(nlogn) guaranteed, fastest in practice
Stable sort (preserve order of equals)stable_sort()—merge sortO(nlogn), needs O(n) extra space
Partial sort (top k elements)partial_sort()O(nlogk), only sorts [first, mid)
k-th element onlynth_element()O(n) average, no full sort needed
Already nearly sortedsort() still fineIntrosort handles it; stable_sort is also fast here
Small integer range [0,MAX]Counting sort (see below)O(n+MAX), beats comparison sorts

Counting Sort and Radix Sort

Comparison-based sorts have a theoretical lower bound of Ω(nlogn). But if you know something extra about your data—like the values are small integers—you can sort in linear time.

Counting Sort

When it applies: All values are integers in range [0,MAX] where MAX106 or so (must fit in memory).

Idea: Count how many times each value appears, then reconstruct the sorted array from the counts.

Complexity: O(n+MAX) time, O(MAX) space.

Complexity Proof: Counting sort performs exactly three passes:

  1. One pass over n elements to count occurrences—O(n).
  2. One pass over the MAX+1 buckets to accumulate prefix sums—O(MAX).
  3. One pass over n elements to place each into its sorted position—O(n).

Total: O(n)+O(MAX)+O(n)=O(n+MAX). No comparisons are used—each element is placed directly by its value, bypassing the Ω(nlogn) comparison-sort lower bound.

Simple version (in-place, not stable—fine for most CP problems):

cpp
void counting_sort(vector<int>& a, int max_val) {
    vector<int> cnt(max_val + 1, 0);
    for (int x : a) cnt[x]++;  // Invariant: cnt[v] = number of occurrences of v seen so far
    int idx = 0;
    for (int v = 0; v <= max_val; v++)  // Invariant: idx = total elements placed so far
        while (cnt[v]--) a[idx++] = v;
}

Stable version (preserves relative order of equal elements—needed when elements carry extra data):

cpp
// Stable counting sort: elements with the same key appear in their original order
vector<int> counting_sort_stable(const vector<int>& a, int max_val) {
    int n = a.size();
    vector<int> cnt(max_val + 1, 0);
    for (int x : a) cnt[x]++;  // Invariant: cnt[v] = occurrences of v in a
    // Convert counts to starting positions
    for (int v = 1; v <= max_val; v++)  // Invariant: cnt[v] = first output index for value v
        cnt[v] += cnt[v - 1];
    vector<int> out(n);
    for (int i = n - 1; i >= 0; i--) {  // Invariant: elements a[i+1..n-1] are already placed
        out[--cnt[a[i]]] = a[i];
    }
    return out;
}

When you will see it in CP:

  • Sorting by frequency—count occurrences, then iterate values in order.
  • Bucket-style problems where values are bounded (e.g., grades 0–100, characters 'a'–'z').
  • Coordinate compression when the value range is already small.
  • Building frequency arrays for string problems (character histograms).

Important: If MAX is large (e.g., 109), counting sort is not viable—the array would be too big. Use coordinate compression first (sort + unique), or just use sort().

Radix Sort

Idea: Sort integers by processing one digit at a time, from least significant to most significant. Each digit-level sort uses counting sort as a subroutine.

Complexity: O(d×(n+b)) where d is the number of digits and b is the base (e.g., 10 for decimal, 256 for byte-level).

Complexity Proof: Each of the d digit passes runs a counting sort on n elements with b possible values, costing O(n+b). There are d passes, so total is O(d×(n+b)). For 32-bit integers using base b=256: d=4 bytes, so cost is O(4×(n+256))=O(n). This is truly linear for fixed-width integers.

Working C++ implementation (base-256 LSD radix sort for non-negative integers):

cpp
void radix_sort(vector<int>& a) {
    int n = a.size();
    vector<int> buf(n);
    for (int shift = 0; shift < 32; shift += 8) {  // Invariant: elements are sorted by bits [0..shift-1]
        int cnt[256] = {};
        for (int i = 0; i < n; i++)  // Invariant: cnt[d] = count of elements with digit d at current byte
            cnt[(a[i] >> shift) & 0xFF]++;
        for (int i = 1; i < 256; i++)  // Invariant: cnt[i] = starting output position for digit i
            cnt[i] += cnt[i - 1];
        for (int i = n - 1; i >= 0; i--)  // Invariant: elements a[i+1..n-1] already placed (stable)
            buf[--cnt[(a[i] >> shift) & 0xFF]] = a[i];
        swap(a, buf);
    }
}

When to use each:

  • Counting sort: When all values are in a small range [0,K] where K106. Perfect for character frequencies, grades, small bounded values.
  • Radix sort: When you have many large integers (e.g., n=107 values up to 109) and need to beat std::sort's constant factor. Rarely needed in CP, but essential for extreme time limits.
  • std::sort: Default choice. Only switch to counting/radix when profiling shows you need the constant-factor speedup or when values are naturally bounded.

Radix sort is rarely coded from scratch in competitive programming—sort() is almost always fast enough, and radix sort's constant-factor advantage only materializes for very large n with small keys. But it is worth knowing it exists: it breaks the O(nlogn) comparison-sort barrier, it is why you can sort 107 32-bit integers in roughly 4×n passes using base-256, and some problems with extreme time limits hint at needing linear-time sorts.


C++ STL Reference

The full toolkit for sorting and searching in the C++ standard library. You will use sort and lower_bound constantly; the rest are situational but worth knowing.

Function / ClassHeaderWhat it doesTime Complexity
sort(first, last)<algorithm>IntroSort. Sorts range in ascending order.O(nlogn)
sort(first, last, cmp)<algorithm>Sort with custom comparator cmp.O(nlogn)
stable_sort(first, last)<algorithm>Merge sort. Preserves relative order of equal elements.O(nlogn) (with extra memory) or O(nlog2n)
stable_sort(first, last, cmp)<algorithm>Stable sort with custom comparator.Same as above
partial_sort(first, mid, last)<algorithm>Sorts only [first, mid)—the smallest mid - first elements.O(nlogk) where k = mid - first
nth_element(first, nth, last)<algorithm>Rearranges so element at nth is what would be there if sorted. Elements before nth are it, after are . Not sorted within halves.O(n) average
lower_bound(first, last, val)<algorithm>Iterator to first element val. Range must be sorted.O(logn)
upper_bound(first, last, val)<algorithm>Iterator to first element > val. Range must be sorted.O(logn)
equal_range(first, last, val)<algorithm>Returns {lower_bound, upper_bound} as a pair.O(logn)
binary_search(first, last, val)<algorithm>Returns true if val exists in sorted range.O(logn)
find(first, last, val)<algorithm>Linear search. Returns iterator to first match or last.O(n)
count(first, last, val)<algorithm>Counts occurrences of val in range.O(n)
ranges::sort(r, cmp, proj)<algorithm>C++20 ranges sort with optional projection.O(nlogn)
is_sorted(first, last)<algorithm>Checks if range is sorted.O(n)
merge(f1, l1, f2, l2, out)<algorithm>Merges two sorted ranges into out.O(n+m)

Lambda comparator examples:

cpp
// Sort descending
sort(a.begin(), a.end(), [](int x, int y){ return x > y; });

// Sort pairs by second element, break ties by first
sort(v.begin(), v.end(), [](const auto& a, const auto& b){
    return a.second != b.second ? a.second < b.second : a.first < b.first;
});

// C++20 ranges with projection (sort structs by field)
ranges::sort(events, {}, &Event::end_time);  // sort by end_time ascending

Implementation (Contest-Ready)

Minimal Contest Template

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    vector<int> a(n);
    for(auto& x : a) cin >> x;

    sort(a.begin(), a.end());

    int q; cin >> q;
    while(q--){
        int x; cin >> x;
        // Invariant: a is sorted, so lower_bound/upper_bound are valid
        // count of elements equal to x
        auto lo = lower_bound(a.begin(), a.end(), x);
        auto hi = upper_bound(a.begin(), a.end(), x);
        int cnt = hi - lo;
        // index of first element >= x  (-1 if none)
        int pos = (lo != a.end()) ? (int)(lo - a.begin()) : -1;
        cout << cnt << " " << pos << "\n";
    }
}

Sort + Binary Search with Custom Comparator (Explained)

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n; cin >> n;
    // Store (value, original_index) so we can recover positions after sorting
    vector<pair<int,int>> a(n);
    for(int i = 0; i < n; i++){
        cin >> a[i].first;
        a[i].second = i;  // Invariant: a[i].second = original index of element i
    }

    // Sort by value ascending; ties broken by original index
    // The comparator must satisfy strict weak ordering:
    //   cmp(x, x) must be false
    //   if cmp(x, y) then !cmp(y, x)
    sort(a.begin(), a.end(), [](const pair<int,int>& x, const pair<int,int>& y){
        if(x.first != y.first) return x.first < y.first;
        return x.second < y.second;
    });

    int q; cin >> q;
    while(q--){
        int x; cin >> x;

        // lower_bound: iterator to first element with .first >= x
        // We search using a custom comparator that compares only .first
        auto lo = lower_bound(a.begin(), a.end(), make_pair(x, INT_MIN));

        // upper_bound: iterator to first element with .first > x
        auto hi = upper_bound(a.begin(), a.end(), make_pair(x, INT_MAX));

        int cnt = (int)(hi - lo);

        if(cnt > 0){
            // The original index of the first occurrence
            int orig_idx = lo->second;
            cout << cnt << " " << orig_idx << "\n";
        } else {
            cout << "0 -1\n";
        }
    }
}

Operations Reference

Time and space costs for every sorting and searching operation covered above.

OperationTimeSpaceNotes
sortO(nlogn)O(logn) stackIntroSort, in-place
stable_sortO(nlogn)O(n)Extra memory for merge
nth_elementO(n) avgO(logn) stackIn-place, partial order
partial_sort (top k)O(nlogk)O(1)Heapselect
lower_bound / upper_boundO(logn)O(1)Requires sorted range
binary_searchO(logn)O(1)Requires sorted range
find (unsorted)O(n)O(1)Linear scan
count (unsorted)O(n)O(1)Linear scan
count via upper - lower (sorted)O(logn)O(1)Two binary searches
merge two sorted rangesO(n+m)O(n+m)Into output range
Build sorted structure (set/multiset)O(nlogn)O(n)Alternative to sort+BS

Where Does the O(n log n) Come From?

Merge sort divides the array in half recursively, then merges the sorted halves.

  • How many levels? Each level halves the problem: nn/2n/41. That's log2n levels.
  • Work per level? At each level, every element participates in exactly one merge. Merging two sorted halves costs O(n) total across all merges at that level.
  • Total: log2n levels × O(n) per level =O(nlogn).

This also proves the lower bound: any comparison-based sort must make Ω(nlogn) comparisons. There are n! permutations; each comparison halves the possibilities, so log2(n!)=Θ(nlogn) comparisons are necessary.


Problem Patterns & Tricks

"Find/count pairs with property X" + N <= 10^5—reach for Sort + Two Pointers or Sort + Binary Search.

Sort + Binary Search (Existence / Count)

Sort once, then answer "is X in the array?" or "how many elements fall in [L,R]?" with lower_bound / upper_boundO(logn) per query after O(nlogn) preprocessing.

cpp
// Count elements in [L, R]
int cnt = upper_bound(a.begin(), a.end(), R) - lower_bound(a.begin(), a.end(), L);

CF examples: CF 706B—Interesting drink (direct lower/upper bound), CF 474B—Worms.


Sort + Two Pointers (Pair / Triplet Finding)

Sort, then converge two pointers from both ends to find pairs summing to a target. Generalizes to 3SUM by fixing one element and running two pointers on the rest.

cpp
sort(a.begin(), a.end());
int lo = 0, hi = n - 1;
while(lo < hi){
    // Invariant: answer (if it exists) has one index in [lo..] and one in [..hi]
    int s = a[lo] + a[hi];
    if(s == target) { /* found */ lo++; hi--; }
    else if(s < target) lo++;
    else hi--;
}

CF examples: CF 1850E—Cardboard for Pictures, CF 1915D—Unnatural Language Processing.


Custom Sort for Greedy

Many greedy problems reduce to "sort by the right criterion, then sweep." The art is choosing what to sort by.

  • Interval scheduling: sort by end time, greedily pick non-overlapping intervals.
  • Job scheduling with deadlines: sort by deadline.
  • Fractional knapsack: sort by value/weight ratio descending.
cpp
// Interval scheduling: sort by end time
sort(intervals.begin(), intervals.end(), [](const auto& a, const auto& b){
    return a.second < b.second;
});

CF examples: CF 1472D—Even-Odd Game, CF 1353D—Constructing the Array.


Sort + Unique (Deduplication / Coordinate Compression)

Sort, then remove duplicates with erase(unique(...), end()). The fundamental preprocessing step for coordinate compression and counting distinct values.

cpp
sort(a.begin(), a.end());
a.erase(unique(a.begin(), a.end()), a.end());
// a now contains sorted unique elements

CF examples: CF 1etz C—various compression problems, CF 1560C—Infinity Table.


Binary Search on Answer

Not searching in an array—searching over a value space. "What is the minimum X such that check(X) is true?" Sort is usually a subroutine inside check().

cpp
long long lo = 0, hi = 2e18;
while(lo < hi){
    // Invariant: answer is in [lo, hi]
    long long mid = lo + (hi - lo) / 2;
    if(check(mid)) hi = mid;
    else lo = mid + 1;
}
// lo == smallest valid answer

CF examples: CF 1623C—Balance the Bits, CF 1760E—Binary Inversions.


Sort to Detect / Count Inversions

Merge sort counts inversions in O(nlogn) as a side effect of sorting. Alternatively, sort indices by value and use a BIT. Appears in problems asking "how disordered" a permutation is.

cpp
// Merge sort inversion count (sketch)
long long merge_count(vector<int>& a, int l, int r){
    if(r - l <= 1) return 0;
    int m = (l + r) / 2;
    long long cnt = merge_count(a, l, m) + merge_count(a, m, r);
    vector<int> buf;
    int i = l, j = m;
    while(i < m && j < r){
        // Invariant: buf contains merged elements from a[l..i-1] and a[m..j-1]
        if(a[i] <= a[j]) buf.push_back(a[i++]);
        else { buf.push_back(a[j++]); cnt += m - i; }
    }
    while(i < m) buf.push_back(a[i++]);  // Invariant: remaining left-half elements are all > buf.back()
    while(j < r) buf.push_back(a[j++]);  // Invariant: remaining right-half elements are all > buf.back()
    copy(buf.begin(), buf.end(), a.begin() + l);
    return cnt;
}

CF examples: CF 1490F—Equalize the Array, CF 1430E—String Reversal.


Sort Events (Sweep Line)

Convert intervals to (start, +1) and (end, −1) events, sort by coordinate, then sweep left to right tracking the active count. The pattern handles overlap counting, maximum simultaneous events, and interval union.

cpp
vector<pair<int,int>> events; // {coordinate, +1 or -1}
for(auto& [l, r] : intervals){
    events.push_back({l, +1});
    events.push_back({r + 1, -1}); // +1 for closed intervals
}
sort(events.begin(), events.end());
int cur = 0, mx = 0;
for(auto& [pos, delta] : events){
    cur += delta;  // Invariant: cur = number of active intervals at position pos
    mx = max(mx, cur);
}

CF examples: CF 1355C—Count Triangles, CF 1462D—Add to Neighbour and Remove.


When NOT to Sort

  • When order matters—if the problem asks about the original array order (e.g., longest increasing subsequence), sorting destroys the information you need. See Divide and Conquer for merge-sort-based approaches that preserve index information.
  • When N is very small—for N20, brute force is often simpler and fast enough. See Complete Search.
  • When you need online processing—if elements arrive one at a time, consider a balanced BST or heap instead of re-sorting.
  • When value range is tiny—if all values are in [0,1000], counting sort or direct array indexing beats O(NlogN). See Counting Sort and Radix Sort above.

Before You Code—Sorting Checklist

  • [ ] I confirmed the sort order matches my algorithm's needs (ascending? descending? custom?)
  • [ ] My comparator uses strict <, NOT <=
  • [ ] I checked if sort stability matters (does relative order of equal elements affect the answer?)
  • [ ] I verified N*log(N) fits in the time limit (N <= 10^6 → ~20 million ops, fine)
  • [ ] If using binary search after sort: am I searching on the right property?

Contest Cheat Sheet

Mnemonic: "SCUB"—Sort first, Compare with strict-less-than, Use lower_bound, Beware stability.

If I Had 5 Minutes

  • Default to sort(a.begin(), a.end())—it's O(N log N) and enough for N <= 10^6
  • Pair/struct sorting: use tie() for multi-key, not manual comparators
  • Need stability? stable_sort(). Need partial? nth_element() is O(N)
  • After sorting: binary search with lower_bound/upper_bound, NOT linear scan
  • Custom comparator must be strict (< not <=) or you get UB
text
+--------------------------------------------------------------+
|                SORTING & SEARCHING CHEAT SHEET                |
+--------------------------------------------------------------+
| WHEN TO USE:                                                  |
|   - Need to find/count elements quickly    --> sort + BS      |
|   - Need pairs with target sum             --> sort + 2ptr    |
|   - Greedy needs "best first" ordering     --> custom sort    |
|   - Count distinct / compress coords       --> sort + unique  |
|   - Count elements in range [L,R]          --> upper - lower  |
+--------------------------------------------------------------+
| KEY STL CALLS:                                                |
|   sort(a.begin(), a.end());                                   |
|   sort(a.begin(), a.end(), [](int x, int y){ return x>y; }); |
|   auto it = lower_bound(a.begin(), a.end(), x);              |
|   int cnt = upper_bound(all, R) - lower_bound(all, L);       |
|   nth_element(a.begin(), a.begin()+k, a.end()); // kth elem  |
+--------------------------------------------------------------+
| COMPLEXITY:                                                   |
|   sort:         O(n log n) time,  O(log n) space             |
|   stable_sort:  O(n log n) time,  O(n) space                 |
|   lower/upper:  O(log n) per query                            |
|   nth_element:  O(n) average                                  |
+--------------------------------------------------------------+
| WATCH OUT:                                                    |
|   - Comparator must be strict weak ordering (no cmp(x,x))    |
|   - lower_bound/upper_bound need SORTED range                 |
|   - Use (long long) for products that overflow int            |
+--------------------------------------------------------------+

Common Mistakes & Debugging

Strict Weak Ordering Violations

The leading cause of mysterious runtime errors with sort. Your comparator cmp must satisfy:

  1. Irreflexivity: cmp(a, a) is false.
  2. Asymmetry: if cmp(a, b) then !cmp(b, a).
  3. Transitivity: if cmp(a, b) and cmp(b, c) then cmp(a, c).

Wrong: return a.x <= b.x;—returns true for equal elements. Right: return a.x < b.x;

On GCC with _GLIBCXX_DEBUG, this triggers an assertion. On judges, it can cause a wrong answer, an infinite loop, or a segfault.

Custom Comparator Gotchas

cpp
// WRONG: causes undefined behavior (not strict weak ordering)
sort(a.begin(), a.end(), [](int x, int y) { return x <= y; });

// RIGHT: use strict less-than
sort(a.begin(), a.end(), [](int x, int y) { return x < y; });

// WRONG: overflow when computing a - b for large values
sort(a.begin(), a.end(), [](int x, int y) { return x - y < 0; }); // overflow!

// RIGHT: direct comparison
sort(a.begin(), a.end(), [](int x, int y) { return x < y; });

// WRONG: lambda captures dangling reference
int key;
cin >> key;
auto cmp = [&key](int x, int y) { return abs(x - key) < abs(y - key); };
// Safe here, but if 'key' goes out of scope before sort finishes, UB.
// Prefer capture by value [key] when the captured variable might not outlive the lambda.

// Multi-key sort using tie (clean and safe):
sort(events.begin(), events.end(), [](const auto& a, const auto& b) {
    return tie(a.time, a.type) < tie(b.time, b.type);
});
// tie() compares lexicographically -- first by time, then by type on ties.
// This is MUCH cleaner and safer than:
//   if (a.time != b.time) return a.time < b.time; return a.type < b.type;

Signed / Unsigned Comparison

lower_bound returns an iterator. Computing it - a.begin() gives a signed ptrdiff_t, but a.size() is size_t (unsigned). Comparing signed and unsigned can produce bugs when the signed value is negative.

cpp
// Dangerous:
if(it - a.begin() < a.size()) // mixed sign comparison

// Safe:
int pos = (int)(it - a.begin());
if(pos < n) // both int

When stable_sort Matters

Sort by key A, then sort by key B with stable_sort, and elements with equal B retain their A ordering. With plain sort, that prior ordering is lost. This matters for multi-criteria sorts done in multiple passes.

Sort Stability—Visual

text
Input:  [(3,'a'), (1,'b'), (3,'c'), (2,'d')]

Stable sort by first element:
  [(1,'b'), (2,'d'), (3,'a'), (3,'c')]
                     ^^^^^^^^^^^^^^^^
                     Original relative order preserved!

Unstable sort by first element:
  [(1,'b'), (2,'d'), (3,'c'), (3,'a')]  <-- 'c' and 'a' may swap
                     ^^^^^^^^^^^^^^^^
                     Relative order NOT guaranteed!

+----------+------------+------------------+
| Sort     | Stable?    | When to use      |
+----------+------------+------------------+
| sort()   | NO         | Default, fastest |
| stable_  | YES        | Multi-key sort,  |
| sort()   |            | preserve order   |
+----------+------------+------------------+

lower_bound and upper_bound require a sorted range. Calling them on an unsorted range compiles without complaint but returns garbage. If binary search gives wrong answers, first check that you actually sorted.

Off-by-One with upper_bound

upper_bound(a.begin(), a.end(), x) returns an iterator past the last element equal to x. To get the index of the last element x, subtract 1:

cpp
auto it = upper_bound(a.begin(), a.end(), x);
if(it != a.begin()){
    --it;
    // *it is the largest element <= x
}

Integer Overflow in Comparators

Comparing by difference (e.g., return a - b < 0) overflows int for large values. Use direct comparison (return a < b) or cast to long long.

Mental Traps

Trap 1: "sort is just sort." Sorting is not the destination—it is the departure gate. The moment you call sort(), you should already know what technique comes next: binary search? two pointers? greedy sweep? If you sort without a follow-up plan, you are spending O(nlogn) for nothing.

text
  THE SORT -> THEN-WHAT PIPELINE
  ==============================

  Problem: "find all pairs with sum = target"

                 sort(a)
                    |
        +---------  ?  ---------+
        |                       |
   Binary search          Two pointers
   for each a[i],         lo=0, hi=n-1
   search target-a[i]     converge inward
        |                       |
   O(n log n)              O(n)
   total                   total

  Both work. Two pointers is faster.
  But BOTH require the sort step first.

Trap 2: "lower_bound works on any container."std::lower_bound(s.begin(), s.end(), val) on a std::set compiles but runs in O(n)—it uses iterator increment, not the tree structure. Use the member function s.lower_bound(val) for O(logn). This silent performance trap has caused countless TLEs.

The Mistake That Teaches

In a Div2-C, I needed to sort intervals by end time for activity selection. I wrote return a.end <= b.end; as my comparator. It passed locally with GCC, cleared the sample tests, and then crashed with Runtime Error on test 5. The <= comparator violates strict weak ordering: when two intervals share the same end time, the comparator claims both a < b and b < a—a contradiction. Introsort relies on that property and can segfault or infinite-loop when it's broken. Changing <= to < fixed everything. Lesson: strict weak ordering is not a suggestion; it is a contract.

What Would You Do If...?

  1. You need to sort by multiple keys? Use tie(key1, key2) in your comparator. It is cleaner and less error-prone than writing if (a.key1 != b.key1) return a.key1 < b.key1; return a.key2 < b.key2;.
  2. Sort + binary search gives TLE? Check whether you are sorting inside a loop. Sort once outside, then binary search inside.
  3. You need the k-th smallest element but not a full sort? Use nth_element()O(N) average. It rearranges so the element at position k is what would be there if fully sorted; elements before it are it, elements after are it.

Self-Test

Close the file and answer these from memory. If any gives you pause, re-read the relevant section before continuing.

  • [ ] Write a custom comparator for sorting (deadline, duration) pairs by deadline ascending, ties broken by duration ascending—with valid strict weak ordering.
  • [ ] Explain why std::lower_bound on std::set iterators is O(n) and how to get O(logn).
  • [ ] Count pairs (i, j) with a[i] + a[j] = target in O(nlogn) using sort + binary search.
  • [ ] Name three algorithms or techniques that require sorted input as a prerequisite.
  • [ ] State when nth_element is preferable to sort and what complexity advantage it gives.

Debug This!

Bug 1: Why does this give the wrong count of values x?

cpp
vector<int> a = {1, 3, 5, 7, 9};
int x = 5;
int count = lower_bound(a.begin(), a.end(), x) - a.begin();
// Expected: 3 (values 1, 3, 5)
// Got: 2

Answer: lower_bound returns an iterator to the first element x, so it returns 2 here (skipping 5 itself). For the count of elements x, use upper_bound: upper_bound(a.begin(), a.end(), x) - a.begin() returns 3.

Bug 2: Why does this RE on some test cases?

cpp
sort(a.begin(), a.end(), [](pair<int,int> x, pair<int,int> y) {
    return x.first * y.second <= y.first * x.second;  // Two bugs!
});

Answer: Two bugs: (1) <= violates strict weak ordering—when x.first * y.second == y.first * x.second, both sides return true, which is a contradiction; (2) the multiplication can overflow int for large values. Fix: use < and cast to long long.

Bug 3: Why does sorting not help here?

cpp
// Find longest increasing subsequence
vector<int> a = {3, 1, 4, 1, 5, 9};
sort(a.begin(), a.end());
// Now find longest run of increasing... but it's ALL increasing after sort!

Answer: Sorting destroys the original positions. LIS depends on which elements appear in order, not which values are adjacent after sorting. You need patience sorting or DP on the original sequence, not sort().

Bug 4: nth_element gives wrong result?

cpp
vector<int> a = {5, 3, 1, 4, 2};
nth_element(a.begin(), a.begin() + 2, a.end());
cout << a[2]; // Expected: median (3)
// This is actually correct! But people often expect a[k] = k-th smallest
// when nth_element uses 0-indexed position. a[2] IS the 3rd smallest = 3.

Answer: This actually works correctly. The gotcha: people sometimes pass a.begin() + k expecting the k-th smallest (1-indexed), but the position is 0-indexed. For the k-th smallest in the 1-indexed sense, use a.begin() + k - 1.


Practice Problems

#ProblemSourceDifficultyKey Idea
1Vanya and LanternsCF 492B1200Sort + check gaps
2Interesting drinkCF 706B1100Sort + upper_bound per query
3Factory RobotsCSESEasySort + unique count
4TowersCSESEasySort / multiset greedy
5Even-Odd GameCF 1472D1200Sort descending + greedy
6Count TrianglesCF 1355C1500Sort + two pointers / binary search
7String ReversalCF 1430E1700Sort-based inversion counting
8Magic Powder - 2CF 670D21500Binary search on answer
9PlaylistCSESMediumSort + sliding window / two pointers
10TasksCSESMediumSort by deadline, greedy scheduling

Rating Progression

  • At CF 1200: Sort arrays and use lower_bound for existence checks.
  • At CF 1500: Design custom comparators for greedy problems; use sort + two pointers.
  • At CF 1800: Exploit sort stability for multi-key problems; use counting sort for O(N+K) when the value range is small.
  • At CF 2100: Combine sort with sweep line, merge sort for inversions, or sort events for complex simulations.

Further Reading

  • Binary Search—sorting enables O(logn) lookups and "binary search on the answer"
  • Two Pointers—sorted arrays make two-pointer sweeps possible for pair/sum problems
  • Greedy—many greedy strategies begin with "sort by some key, then iterate"
  • Coordinate Compression—using sort + unique as a preprocessing step to map large value ranges to small indices
  • Prefix Sums—combining sorted arrays with prefix sums for range counting
  • Sliding Window—sliding window techniques that often pair with sorted order
  • Data Structure Selection Guide—when to pick sort + search vs. set / map / heap
  • Practice Ladder 1100–1400—curated problems at the right difficulty for drilling these patterns

What to Solve Next: Work through the practice problems above at your current rating level, then move on to Binary Search, which builds directly on the sorted-array foundation from this chapter. Once comfortable with both, Two Pointers is the natural next technique to pair with sorting.

Built for the climb from Codeforces 1100 to 2100.