Randomization Techniques

Quick Revisit
USE WHEN: Deterministic solution can be hacked (adversarial input), or you need random pivots/hashes/bases
INVARIANT: Randomize the part the adversary controls (pivot choice, hash base/mod) so worst-case inputs don't exist
TIME: Expected O(n log n) for randomized quicksort/select; O(n) for hashing | SPACE: O(n)
CLASSIC BUG: Using a fixed seed or fixed hash constants—adversary reverse-engineers them and forces collisions/worst-case
PRACTICE: 01-ladder-1100-to-1400

Difficulty: Intermediate | CF Rating Range: 1400–1800 | Prerequisites: Math Fundamentals, String Hashing | Applies to: CF 1100–2100

Intuition
Core Mechanics
C++ STL Reference -- The <random> Toolkit
Randomized Hashing
Randomized Algorithms
- Las Vegas vs Monte Carlo
- Randomized Quickselect
- Treap
- Reservoir Sampling and Fisher-Yates
- Anti-Hash Attack
The Birthday Paradox and Collision Analysis
Anti-Hack Strategies for Codeforces
Common Patterns
Complexity Analysis
Edge Cases and Pitfalls
Contest Cheat Sheet
Self-Test
Practice Problems
Probabilistic Method in Competitive Programming
If I Had 5 Minutes
Pattern Fingerprint
Rating Progression
Before You Code Checklist
What Would You Do If...?
Historical Note
The Mistake That Teaches
When NOT to Use Randomization
Debug This
Cross-References
Extended Practice Problems
Further Reading
Next Up
Appendix: Quick Reference

Intuition

You submit a clean deterministic solution on Codeforces. It passes pretests. Then during the hacking phase, someone drops a crafted input and your solution gets TLE or WA.

Consider quicksort with the classic "always pick the first element as pivot" strategy. Against a sorted array of $n = 10^{5}$ elements, the $O (n^{2})$ worst case triggers instantly:

text

Input:  [1, 2, 3, 4, 5, ..., 100000]
Pivot:   1
           \--- partition splits into (0 elements) | (99999 elements)
Pivot:   2
           \--- partition splits into (0 elements) | (99998 elements)
...
Result: n + (n-1) + (n-2) + ... + 1 = n*(n+1)/2 comparisons --> TLE

Or consider polynomial string hashing with a fixed base and modulus. An adversary reverse-engineers your constants and constructs colliding strings, turning your $O (n)$ hash comparison into a guaranteed WA.

Deterministic algorithms have knowable worst cases—and on competitive programming platforms with open hacking, the adversary will find them.

"Adding randomness makes adversarial attacks impossible—the enemy can't predict your random choices."

When you pick a random pivot, no input arrangement is consistently bad. When you pick a random hash base at runtime, no pre-built collision set works. The adversary would need to predict your random seed—which they cannot.

The shift in thinking: instead of guaranteeing $O (f (n))$ for all inputs, we guarantee expected $O (f (n))$ for any input. The randomness lives in the algorithm, not in the input distribution.

Visual Walkthrough—Random Pivot Selection

text

Array: [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

--- Deterministic (first element) on sorted input ---
Sorted: [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
Pivot=1 -> split: [] | [1,2,3,3,4,5,5,6,9]      depth grows linearly
Pivot=1 -> split: [] | [2,3,3,4,5,5,6,9]
...  O(n) depth --> O(n^2) total

--- Randomized pivot on the SAME sorted input ---
Sorted: [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
Random pivot=5 -> split: [1,1,2,3,3,4] | [5,6,9]   ~balanced!
  Left:  random pivot=2 -> [1,1] | [3,3,4]           ~balanced!
  Right: random pivot=6 -> [5] | [9]                  ~balanced!
...  O(log n) expected depth --> O(n log n) expected total

Each random pivot has at least a $1 / 2$ probability of landing in the "middle half" of the current subarray, producing a split no worse than $1 / 4$ to $3 / 4$ . After $O (\log n)$ such good splits we reach base cases.

The Invariant

"Expected running time is $O (n \log n)$ regardless of input distribution."

For randomized quicksort, the expected number of comparisons is $2 n \ln n \approx 1.39 n \log_{2} n$ for any permutation. No adversarial input can break this expectation—only luck (which averages out over many runs).

For randomized hashing with base chosen uniformly from $[2, p - 1]$ , the collision probability for two distinct strings of length $L$ is at most $L / p$ . With $p \approx 10^{9} + 7$ and $L \leq 10^{6}$ , that is $< 10^{- 3}$ per comparison.

What the Code Won't Teach You

The templates show how to randomize, but not why each choice matters. Here are insights that only come from debugging wrong submissions:

Hashing safety is about the modulus, not the base. Polynomial rolling hash treats a string as a number in base $b$ mod $M$ . Choosing a random base prevents adversaries from constructing collisions—but the size of $M$ determines how safe you are against accidental collisions. With $n$ hash comparisons, the birthday paradox gives collision probability $\approx n^{2} / (2 M)$ . For $n = 10^{5}$ :

text

Modulus M          Collision prob (n=10^5 comparisons)
+--------------------------------------------------+
| 10^9 + 7         ~10^10 / 2*10^9  ~ 5.0          |  <-- UNSAFE (expect collisions!)
| 2^61 - 1         ~10^10 / 2^62    ~ 2*10^{-9}    |  <-- Safe
| Double hash       ~10^10 / 2^122  ~ negligible    |  <-- Very safe
+--------------------------------------------------+

unordered_map is a landmine. The default hash function on Codeforces is known. An adversary constructs $n$ keys that all land in the same bucket, turning every lookup into $O (n)$ . The fix is the SafeHash struct from the Anti-Hack Strategies section, which XORs each key with a runtime random constant—shifting the collision distribution from adversary-controlled to uniformly random.

text

Default unordered_map with adversarial keys:
  Bucket 0: key1 -> key2 -> key3 -> ... -> keyN    O(n) lookup!
  Bucket 1: (empty)
  Bucket 2: (empty)
  ...

SafeHash unordered_map (same keys, random salt):
  Bucket 0: key7 -> key42                           O(1) expected
  Bucket 1: key3 -> key99 -> key15                  O(1) expected
  Bucket 2: key1                                    O(1) expected
  ...      (keys distributed ~uniformly)

When randomization is the intended solution, look for signals: $n$ is large, the answer is hard to compute exactly, or the problem asks for "any valid answer." Randomized selection (quickselect) is $O (n)$ expected—for safety in contests, shuffle the input first or use introselect (which falls back to median-of-medians).

When to Reach for This

Trigger phrases and situations:

"anti-hack"—you need a solution that cannot be broken by adversarial input
"probabilistic"—the problem explicitly allows small error probability
"expected value solution"—you need expected $O (f (n))$ , not worst-case
Hash collisions—your deterministic hash keeps getting hacked
Randomized algorithms—Miller-Rabin, Pollard rho, Freivalds' algorithm
Large search spaces—random sampling finds a witness faster than enumeration

Core Mechanics

Random Number Generation in C++

Never use rand()—it is slow, low quality, and only 15-bit on some platforms. Use the Mersenne Twister mt19937 or mt19937_64:

cpp

#include <bits/stdc++.h>
using namespace std;

// Seed with chrono to prevent predictable output
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());

// Generate a random integer in [a, b] inclusive
int rand_int(int a, int b) {
    return uniform_int_distribution<int>(a, b)(rng);
}

// Generate a random long long in [a, b] inclusive
long long rand_ll(long long a, long long b) {
    return uniform_int_distribution<long long>(a, b)(rng64);
}

Why chrono seed? A fixed seed lets a hacker replicate your random sequence. A chrono-based seed changes with every submission.

Random Shuffling

The simplest and most powerful anti-hack technique—shuffling the input before processing destroys any adversarial ordering:

cpp

vector<int> a(n);
for (int i = 0; i < n; i++) cin >> a[i];
shuffle(a.begin(), a.end(), rng);
// Now process a[] -- no adversarial ordering possible

Use it for quicksort, treap insertion, or any algorithm whose worst case depends on input order.

Random Pivot Selection (Quicksort / Quickselect)

cpp

int partition(vector<int>& a, int lo, int hi) {
    int pivot_idx = rand_int(lo, hi);
    swap(a[pivot_idx], a[hi]);
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] <= pivot) {
            swap(a[i], a[j]);
            i++;
        }
    }
    swap(a[i], a[hi]);
    return i;
}

void quicksort(vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}

Expected comparisons: $\approx 1.39 n \log_{2} n$ . Worst case is still $O (n^{2})$ but occurs with probability $\leq 1 / n!$ —astronomically unlikely.

C++ STL Reference -- The `<random>` Toolkit

Everything you need lives in <random> (plus shuffle from <algorithm>). Forget rand() and srand()—15-bit range on MSVC, global state, non-uniform distribution. The modern C++ facilities are fast, high-quality, and safe when seeded properly.

Class / Function	Header	What It Does	Complexity
`mt19937`	`<random>`	32-bit Mersenne Twister engine; period $2^{19937} - 1$	$O (1)$ per call
`mt19937_64`	`<random>`	64-bit variant for `long long` ranges	$O (1)$ per call
`uniform_int_distribution<T>(a, b)`	`<random>`	Uniform integer in $[a, b]$ inclusive	$O (1)$
`uniform_real_distribution<T>(a, b)`	`<random>`	Uniform real in $[a, b)$	$O (1)$
`discrete_distribution<T>({w₀, w₁, ...})`	`<random>`	Weighted random index; returns $i$ with prob $\propto w_{i}$	$O (k)$ setup, $O (\log k)$ per call
`bernoulli_distribution(p)`	`<random>`	Returns `true` with probability $p$	$O (1)$
`shuffle(first, last, gen)`	`<algorithm>`	Uniform random permutation (Fisher-Yates internally)	$O (n)$
`sample(first, last, out, n, gen)`	`<algorithm>`	Choose $n$ elements without replacement (C++17)	$O (n)$
`nth_element(first, nth, last)`	`<algorithm>`	Rearranges so element at `nth` is in sorted position; $O (n)$ average	$O (n)$ avg

Usage patterns you'll actually reach for in contest:

cpp

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

// Random int in [lo, hi]
int x = uniform_int_distribution<int>(lo, hi)(rng);

// Random long long -- use the 64-bit engine
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());
long long y = uniform_int_distribution<long long>(lo, hi)(rng64);

// Shuffle a vector -- kills adversarial ordering
shuffle(v.begin(), v.end(), rng);

// Pick k random elements without replacement (C++17)
vector<int> chosen;
sample(v.begin(), v.end(), back_inserter(chosen), k, rng);

// k-th smallest element without full sort
nth_element(v.begin(), v.begin() + k, v.end());
// v[k] is now the (k+1)-th smallest; elements before it are <= v[k]

text

  Distribution overview:

  uniform_int_distribution(1, 6)     ->  1  2  3  4  5  6   (each 1/6)
  uniform_real_distribution(0, 1)    ->  [0.0 ................. 1.0)
  bernoulli_distribution(0.3)        ->  false (70%)  |  true (30%)
  discrete_distribution({3, 1, 1})   ->  0 (60%)  1 (20%)  2 (20%)

Two things that bite people:

random_shuffle was removed in C++17. Always use shuffle with an explicit engine as the third argument. If you see random_shuffle in old code or editorials, replace it immediately.
rng() % n is biased when rng.max() + 1 is not divisible by n. Use uniform_int_distribution instead—it handles rejection sampling internally so every value in the range is equally likely.

Randomized Hashing

Polynomial Hashing with Random Base

The standard polynomial hash of string $s$ with base $b$ and modulus $m$ :

h (s) = \sum_{i = 0}^{| s | - 1} s [i] \cdot b^{| s | - 1 - i} (\mod m)

Anti-hack version: choose $b$ randomly at runtime.

cpp

const long long MOD = (1LL << 61) - 1; // Mersenne prime, very fast modular arithmetic

long long mod_mul(long long a, long long b) {
    // Compute (a * b) % MOD without overflow using __int128
    return (__int128)a * b % MOD;
}

long long mod_add(long long a, long long b) {
    a += b;
    if (a >= MOD) a -= MOD;
    return a;
}

struct RollingHash {
    int n;
    long long base;
    vector<long long> h, pw;

    RollingHash(const string& s) : n(s.size()), h(n + 1), pw(n + 1) {
        base = rand_ll(256, MOD - 1);
        pw[0] = 1;
        for (int i = 0; i < n; i++) {
            pw[i + 1] = mod_mul(pw[i], base);
            h[i + 1] = mod_add(mod_mul(h[i], base), s[i]);
        }
    }

    // Hash of s[l..r] (0-indexed, inclusive)
    long long query(int l, int r) {
        return mod_add(h[r + 1], MOD - mod_mul(h[l], pw[r - l + 1]));
    }
};

Why $2^{61} - 1$ ? It is a Mersenne prime, so modular reduction is fast via bitwise operations. The collision probability for two distinct strings of length $L$ is at most $L / (2^{61} - 1) < 10^{- 12}$ per query—far safer than $10^{9} + 7$ .

Double Hashing (Belt and Suspenders)

For problems where even a single collision is fatal, use two independent random bases:

cpp

struct DoubleHash {
    RollingHash h1, h2;
    DoubleHash(const string& s) : h1(s), h2(s) {}

    pair<long long, long long> query(int l, int r) {
        return {h1.query(l, r), h2.query(l, r)};
    }
};

Collision probability drops to $\approx L^{2} / (2^{61} - 1)^{2}$ —negligible for any practical input size.

Randomized Algorithms

Miller-Rabin Primality Test

Deterministic primality testing typically uses trial division up to $\sqrt{n}$ — $O (\sqrt{n})$ , fine up to about $10^{12}$ but far too slow for 64-bit integers. Miller-Rabin runs in $O (k \log^{2} n)$ for $k$ rounds, making it practical for $n \leq 10^{18}$ .

cpp

long long power(long long a, long long d, long long mod) {
    long long result = 1;
    a %= mod;
    while (d > 0) {
        if (d & 1) result = (__int128)result * a % mod;
        a = (__int128)a * a % mod;
        d >>= 1;
    }
    return result;
}

bool miller_rabin(long long n, long long a) {
    if (n % a == 0) return n == a;
    long long d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; r++; }
    long long x = power(a, d, n);
    if (x == 1 || x == n - 1) return true;
    for (int i = 0; i < r - 1; i++) {
        x = (__int128)x * x % n;
        if (x == n - 1) return true;
    }
    return false;
}

bool is_prime(long long n) {
    if (n < 2) return false;
    // Deterministic for n < 3.317e23 with these bases
    for (long long a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}) {
        if (!miller_rabin(n, a)) return false;
    }
    return true;
}

The fixed witness set ${2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}$ makes this deterministic for all $n < 3.317 \times 10^{23}$ —every 64-bit integer.

Freivalds' Algorithm (Matrix Multiplication Verification)

Given three $n \times n$ matrices $A$ , $B$ , $C$ , verify whether $A \times B = C$ in $O (n^{2})$ instead of $O (n^{3})$ .

Pick a random binary vector $r \in {0, 1}^{n}$ . Compute $A \cdot (B \cdot r)$ and $C \cdot r$ . If $A B = C$ , they are always equal. If $A B \neq C$ , they differ with probability $\geq 1 / 2$ .

cpp

bool freivalds(vector<vector<long long>>& A,
               vector<vector<long long>>& B,
               vector<vector<long long>>& C, int n, int iterations = 20) {
    for (int iter = 0; iter < iterations; iter++) {
        vector<long long> r(n);
        for (int i = 0; i < n; i++) r[i] = rng() & 1;

        // Compute Br
        vector<long long> Br(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                Br[i] += B[i][j] * r[j];

        // Compute A(Br)
        vector<long long> ABr(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ABr[i] += A[i][j] * Br[j];

        // Compute Cr
        vector<long long> Cr(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                Cr[i] += C[i][j] * r[j];

        if (ABr != Cr) return false;
    }
    return true; // Correct with probability >= 1 - 2^{-iterations}
}

Error probability after $k$ iterations: $\leq 2^{- k}$ . At $k = 20$ , the chance of a false positive is $< 10^{- 6}$ .

Random Sampling / Witness Search

Many problems reduce to: find an element with property $P$ , where at least a constant fraction of elements satisfy $P$ . Random sampling finds such an element in $O (1)$ expected attempts.

Example: given a multiset where the majority element appears $> n / 2$ times, a random sample hits it with probability $> 1 / 2$ . After $k$ samples, the probability of missing it entirely is $< 2^{- k}$ .

cpp

// Find an element appearing > n/2 times (guaranteed to exist)
int find_majority(vector<int>& a) {
    int n = a.size();
    while (true) {
        int candidate = a[rand_int(0, n - 1)];
        int count = 0;
        for (int x : a) count += (x == candidate);
        if (count > n / 2) return candidate;
    }
    // Expected iterations: < 2
}

Las Vegas vs Monte Carlo — A Deeper Look

Las Vegas algorithms always produce the correct answer; only the running time is random. Think of them as "keep trying until you succeed."

Monte Carlo algorithms always terminate within a fixed time bound, but the answer may be wrong with some bounded probability $ε$ . Reduce $ε$ by repeating.

text

+------------------------------------------------------------------+
| Property        | Las Vegas              | Monte Carlo            |
+------------------------------------------------------------------+
| Correctness     | ALWAYS correct         | Correct w.p. >= 1-eps  |
| Runtime         | Random (expected bound) | Deterministic bound   |
| Error control   | Run longer / retry     | Repeat k times:       |
|                 |                        |   error <= eps^k       |
| CP examples     | Randomized Quicksort   | Miller-Rabin (random   |
|                 | Randomized Quickselect |   witnesses)           |
|                 | Treap insert/delete    | Freivalds' algorithm   |
|                 | Random pivot partition | Randomized hashing     |
| When to use     | Need exact answer,     | Approximate answer OK, |
|                 | can tolerate variable  | need guaranteed speed  |
|                 | runtime                |                        |
+------------------------------------------------------------------+

Converting between the two:

Monte Carlo → Las Vegas: if you can verify the answer cheaply, run the MC algorithm, verify, and retry on failure. For example, Miller-Rabin identifies a candidate; trial division up to small bounds confirms it.
Las Vegas → Monte Carlo: run the LV algorithm for a fixed time budget. If it hasn't finished, output a default answer. You sacrifice correctness for a timing guarantee.

text

  Las Vegas runtime distribution         Monte Carlo error reduction
                                          by repetition
  Probability                            Error prob
  ^                                      ^
  |  *                                   |  *
  |  * *                                 |    *
  |  *   *                               |      *
  |  *     * *                           |        *  *
  |  *         * * * *                   |              *  *  *  *
  +-------------------> Time             +--------------------> k (rounds)
  Expected = O(n log n)                  eps^k shrinks fast
  (but could be longer)                  (20 rounds -> ~10^{-6})

Contest rule of thumb: if the problem says "print any valid answer" or has $n \leq 10^{5}$ with a 2–3 second time limit, a Las Vegas approach with expected $O (n \log n)$ is safe. For property-checking problems (primality, matrix equality), Monte Carlo with 20–30 rounds is standard.

Cross-reference: Miller-Rabin (above) is the classic Monte Carlo example; randomized quicksort (Random Pivot Selection) is the classic Las Vegas example.

Randomized Quickselect — Expected O(n) Median Finding

Quickselect finds the $k$ -th smallest element in expected $O (n)$ time using a random pivot, compared to $O (n \log n)$ for sorting.

Partition around a random pivot—exactly like quicksort—but recurse into only one side: the side that contains index $k$ .

text

Find the 4th smallest element (k=3, 0-indexed) in [7, 2, 9, 1, 5, 8, 3]:

Step 1: Random pivot = 5
  Partition: [2, 1, 3] | 5 | [7, 9, 8]
  pivot is at index 3 == k  -->  FOUND! Answer = 5

  (Lucky case. Usually takes a few rounds.)

Step 2 (if pivot had landed at index 5):
  Random pivot = 8
  Partition: [7, 2, 1, 5, 3] | 8 | [9]
  pivot at index 5, k=3 < 5  -->  recurse LEFT on [7, 2, 1, 5, 3]

Step 3: Random pivot = 2
  Partition: [1] | 2 | [7, 5, 3]
  pivot at index 1, k=3 > 1  -->  recurse RIGHT on [7, 5, 3], k'=3-2=1

Step 4: Random pivot = 5
  Partition: [3] | 5 | [7]
  pivot at index 1 == k'  -->  FOUND! Answer = 5

Full C++ implementation:

cpp

#include <bits/stdc++.h>
using namespace std;

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

// Returns the k-th smallest element (0-indexed) in a[lo..hi]
int quickselect(vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        // Random pivot
        int pivot_idx = uniform_int_distribution<int>(lo, hi)(rng);
        swap(a[pivot_idx], a[hi]);
        int pivot = a[hi];

        // Three-way partition (handles duplicates efficiently)
        int lt = lo, gt = hi;  // a[lo..lt-1] < pivot, a[gt+1..hi] > pivot
        int i = lo;
        while (i <= gt) {
            if (a[i] < pivot)       swap(a[i++], a[lt++]);
            else if (a[i] > pivot)  swap(a[i], a[gt--]);
            else                    i++;
        }
        // Now a[lt..gt] == pivot

        if (k < lt)      hi = lt - 1;
        else if (k > gt) lo = gt + 1;
        else              return a[k];  // pivot is the answer
    }
    return a[lo];
}

// Convenience: find the k-th smallest in the entire array
int kth_smallest(vector<int> a, int k) {
    // Note: pass by value to avoid modifying the original
    return quickselect(a, 0, (int)a.size() - 1, k);
}

Complexity analysis:

Expected comparisons: $\leq 4 n$ (proof via linearity of expectation).
Each pivot lands in the middle half with probability $\geq 1 / 2$ —a "good" pivot that reduces the subarray by at least $1 / 4$ .
After a good pivot, remaining size $\leq 3 n / 4$ . Expected good pivots before the next recursive call: $2$ .
Total work: $n + 3 n / 4 + (3 / 4)^{2} n + \dots = 4 n$ .
Worst case is $O (n^{2})$ , but the probability of exceeding $c n$ is exponentially small.

Contest tip: for safety, shuffle the array before calling quickselect, or use nth_element from the STL, which uses introselect (quickselect with a fallback to median-of-medians for guaranteed $O (n)$ worst case).

Treap — Randomized Binary Search Tree

A treap (tree + heap) is a BST where each node also carries a random priority. It maintains BST order on keys and max-heap order on priorities simultaneously.

Why it works: random priorities force the tree into the shape of a random BST, which has expected depth $O (\log n)$ . That gives $O (\log n)$ expected time for insert, delete, split, and merge—without the complex rotations of AVL or red-black trees.

text

Treap with keys {1, 3, 5, 7, 9} and random priorities:

BST property: left < root < right (on keys)
Heap property: parent priority > child priority

              (5, pri=97)
              /           \
        (3, pri=84)     (9, pri=91)
        /               /
  (1, pri=32)     (7, pri=45)

Verify BST on keys:  1 < 3 < 5 < 7 < 9     <-- correct
Verify heap on pri:  97 > 84, 97 > 91       <-- root highest
                     84 > 32                 <-- left subtree
                     91 > 45                 <-- right subtree

Core Operations—Split and Merge:

All other treap operations reduce to two primitives:

text

SPLIT(t, key) --> (L, R)
  L contains all nodes with key <= key
  R contains all nodes with key > key

        (5,97)                   (3,84)          (5,97)
        /    \      split(4)     /               /    \
   (3,84)  (9,91)  -------->  (1,32)        (empty) (9,91)
   /        /                                        /
 (1,32)  (7,45)                L              (7,45)
                                                 R

MERGE(L, R) --> T   (all keys in L < all keys in R)
  Pick root by higher priority, recursively merge the rest.

  (3,84)       (9,91)             (9,91)
  /            /       merge      /    \
(1,32)      (7,45)  -------->  (3,84) (empty)
                               /    \
                            (1,32) (7,45)

  91 > 84, so 9 is root.
  Left of 9 = merge(L, left_child_of_9) = merge({1,3}, {7})
  84 > 45, so 3 is root of that subtree.

  Final:       (9, pri=91)
               /
          (3, pri=84)
          /         \
    (1, pri=32)  (7, pri=45)

Full C++ implementation (implicit treap with split/merge):

cpp

#include <bits/stdc++.h>
using namespace std;

mt19937 rng_treap(chrono::steady_clock::now().time_since_epoch().count());

struct Node {
    int key, priority, size;
    Node *left, *right;
    Node(int k) : key(k), priority(rng_treap()), size(1),
                  left(nullptr), right(nullptr) {}
};

int sz(Node* t) { return t ? t->size : 0; }

void update(Node* t) {
    if (t) t->size = 1 + sz(t->left) + sz(t->right);
}

// Split treap t into L (keys <= key) and R (keys > key)
void split(Node* t, int key, Node*& L, Node*& R) {
    if (!t) { L = R = nullptr; return; }
    if (t->key <= key) {
        split(t->right, key, t->right, R);
        L = t;
    } else {
        split(t->left, key, L, t->left);
        R = t;
    }
    update(t);
}

// Merge two treaps L and R (all keys in L < all keys in R)
Node* merge(Node* L, Node* R) {
    if (!L || !R) return L ? L : R;
    if (L->priority > R->priority) {
        L->right = merge(L->right, R);
        update(L);
        return L;
    } else {
        R->left = merge(L, R->left);
        update(R);
        return R;
    }
}

// Insert key into treap rooted at t
void insert(Node*& t, int key) {
    Node *L, *R;
    split(t, key, L, R);
    t = merge(merge(L, new Node(key)), R);
}

// Erase one occurrence of key from treap rooted at t
void erase(Node*& t, int key) {
    if (!t) return;
    if (t->key == key) {
        Node* tmp = merge(t->left, t->right);
        delete t;
        t = tmp;
    } else if (key < t->key) {
        erase(t->left, key);
    } else {
        erase(t->right, key);
    }
    update(t);
}

// Find k-th smallest (0-indexed)
int kth(Node* t, int k) {
    int left_size = sz(t->left);
    if (k < left_size) return kth(t->left, k);
    if (k == left_size) return t->key;
    return kth(t->right, k - left_size - 1);
}

// Count elements strictly less than key
int count_less(Node* t, int key) {
    if (!t) return 0;
    if (t->key < key) return 1 + sz(t->left) + count_less(t->right, key);
    return count_less(t->left, key);
}

Why Treap over std::set?

std::set is fine for insert/find/erase, but it cannot:

Find the $k$ -th smallest element in $O (\log n)$
Count elements in a range in $O (\log n)$
Split and merge in $O (\log n)$ —essential for interval operations
Serve as a sequence with split/merge at arbitrary positions (implicit treap)

text

Operation comparison:

+----------------------------------------------------------+
| Operation         | std::set   | Treap      | Notes      |
+----------------------------------------------------------+
| Insert/Delete     | O(log n)   | O(log n)*  | *expected  |
| Find k-th element | O(n)       | O(log n)*  | treap wins |
| Split at key      | O(n)       | O(log n)*  | treap wins |
| Merge two sets    | O(n log n) | O(log n)*  | treap wins |
| Count in range    | O(n)       | O(log n)*  | treap wins |
+----------------------------------------------------------+
*expected time with random priorities

Cross-reference: Treaps combine ideas from Sorting and Searching (BST property) and randomization (heap with random priorities).

Reservoir Sampling & Fisher-Yates Shuffle

Reservoir Sampling (Algorithm R)

Problem: given a stream of $n$ elements (where $n$ may be unknown or too large to store), select $k$ elements uniformly at random.

Maintain a "reservoir" of $k$ items. For the $i$ -th element ( $i > k$ ), include it with probability $k / i$ , replacing a randomly chosen existing element.

text

Stream: a, b, c, d, e, f, g    (k=3)

Step 1: reservoir = [a, b, c]              (first k items)
Step 2: i=4, include d with prob 3/4=0.75
        Say YES -> replace random slot -> [a, d, c]
Step 3: i=5, include e with prob 3/5=0.60
        Say NO  -> reservoir unchanged  -> [a, d, c]
Step 4: i=6, include f with prob 3/6=0.50
        Say YES -> replace random slot -> [f, d, c]
Step 5: i=7, include g with prob 3/7=0.43
        Say NO  -> reservoir unchanged  -> [f, d, c]

Each of the 7 elements had exactly 3/7 probability of being in the final set.

Full C++ implementation:

cpp

#include <bits/stdc++.h>
using namespace std;

mt19937 rng_reservoir(chrono::steady_clock::now().time_since_epoch().count());

// Reservoir sampling: select k elements uniformly from stream
vector<int> reservoir_sample(vector<int>& stream, int k) {
    int n = stream.size();
    vector<int> reservoir(stream.begin(), stream.begin() + min(k, n));

    for (int i = k; i < n; i++) {
        int j = uniform_int_distribution<int>(0, i)(rng_reservoir);
        if (j < k) {
            reservoir[j] = stream[i];
        }
    }
    return reservoir;
}

Correctness proof sketch: by induction. After processing $i$ elements, each of the first $i$ is in the reservoir with probability $k / i$ .

Base case ( $i = k$ ): all $k$ elements are in the reservoir, $P = k / k = 1$ .
Inductive step: element $i + 1$ enters with probability $k / (i + 1)$ . An existing element survives if either: (a) the new element is not chosen: probability $(i + 1 - k) / (i + 1)$ , or (b) the new element is chosen but replaces a different slot: probability $k / (i + 1) \cdot (k - 1) / k = (k - 1) / (i + 1)$ . Total survival probability: $(i + 1 - k) / (i + 1) + (k - 1) / (i + 1) = i / (i + 1)$ . By the inductive hypothesis, the element was present with probability $k / i$ . New probability: $(k / i) \cdot (i / (i + 1)) = k / (i + 1)$ . ∎

Fisher-Yates Shuffle (Knuth Shuffle)

Problem: generate a uniformly random permutation of an array in $O (n)$ time.

cpp

void fisher_yates(vector<int>& a) {
    int n = a.size();
    for (int i = n - 1; i > 0; i--) {
        int j = uniform_int_distribution<int>(0, i)(rng);
        swap(a[i], a[j]);
    }
}

Correctness: at step $i$ , the element placed at position $i$ is chosen uniformly from $i + 1$ candidates. Total distinct outcomes: $n \cdot (n - 1) \dots 1 = n!$ , and each permutation corresponds to exactly one sequence of choices—so all $n!$ are equally likely.

text

Fisher-Yates on [A, B, C, D]:

Step i=3: swap a[3] with a[rand(0..3)]
  4 choices: [A,B,C,D] [A,B,D,C] [A,D,C,B] [D,B,C,A]
             (swap w/0) (swap w/1) (swap w/2) (swap w/3=noop)

Step i=2: swap a[2] with a[rand(0..2)]
  3 choices for each of the 4 above = 12 states

Step i=1: swap a[1] with a[rand(0..1)]
  2 choices for each = 24 states = 4! permutations  <-- all equally likely

Warning: random_shuffle was deprecated in C++14 and removed in C++17. Always use shuffle(a.begin(), a.end(), rng) instead.

Anti-Hash Attack — Why Random Bases Are Essential

This section digs into the mechanics of hash attacks, complementing the overview in the Anti-Hack Strategies section.

How Fixed Bases Get Hacked

With a fixed hash base (e.g., base = 31, mod = 1e9+7), an attacker can reverse-engineer collisions two ways:

Polynomial root attack: a collision between strings $s_{1} \neq s_{2}$ means $h (s_{1}) - h (s_{2}) \equiv 0 (\mod M)$ —a polynomial of degree $\leq L$ in $b$ , with at most $L$ roots mod $M$ . Since $b$ is known, the attacker constructs strings that are roots of that polynomial.
Thue-Morse attack: strings built from the Thue-Morse sequence produce collisions for any base against a given modulus. This is the method behind the well-known unordered_map hack on Codeforces.

text

Attack on fixed base=31, mod=10^9+7:

Attacker knows: hash("ab") = 31*97 + 98 = 3105 (mod 10^9+7)
Attacker constructs s2 with hash("s2") = 3105 (mod 10^9+7)
  --> guaranteed collision!

With random base chosen at RUNTIME:
Attacker prepares collision for base=31... but you picked base=7392841.
  --> collision probability = L / (2^61 - 1) ~ 10^{-13}

  +-------------------------------------------------------+
  |  "A fixed base is a lock whose key is publicly known.  |
  |   A random base is a lock whose key is chosen after    |
  |   the attacker has already left."                      |
  +-------------------------------------------------------+

Birthday Paradox Collision Probability Table

How many hash comparisons $Q$ until you expect a collision?

text

+----------------------------------------------------------------+
| Hash space       | Size           | Q for 50%    | Q for safe  |
|                  |                | collision    | (< 10^{-6}) |
+----------------------------------------------------------------+
| mod 10^9+7       | ~10^9          | ~31,623      | ~44         |
| mod 10^9+9       | ~10^9          | ~31,623      | ~44         |
| mod 2^32         | ~4*10^9        | ~65,536      | ~92         |
| mod 2^61-1       | ~2.3*10^18     | ~1.5*10^9    | ~2,147      |
| Double hash      | ~5.3*10^36     | ~2.3*10^18   | ~3*10^15    |
| (2^61-1)^2       |                |              |             |
+----------------------------------------------------------------+
  "Q for safe" = max Q such that collision prob < 10^{-6}
  Formula: Q < sqrt(2 * N * 10^{-6})

Practical rules for Codeforces:

Never use mod = 10^9 + 7 alone with $> 100$ hash comparisons.
mod = 2^{61} - 1 is the safe default—good up to ~ $10^{9}$ comparisons.
For interactive or hacking-enabled problems, use double hash.
Always choose the base randomly at runtime.

Cross-reference: see String Hashing for the full polynomial hashing tutorial, and Hash Maps for unordered_map anti-hack patterns.

The Birthday Paradox and Collision Analysis

Among $k$ items drawn uniformly from a set of size $N$ , the collision probability is approximately:

P (collision) \approx 1 - e^{- k^{2} / (2 N)}

A collision becomes likely ( $> 50 %$ ) when $k \approx \sqrt{N}$ .

Implications for competitive programming:

Hash space size $N$	Collision likely after $k$ values
$10^{9} + 7$	$\approx 31,623$ hashes
$2^{32}$	$\approx 65,536$ hashes
$2^{61} - 1$	$\approx 1.52 \times 10^{9}$
$2^{64}$	$\approx 4.29 \times 10^{9}$

Rule of thumb: if you are making $Q$ hash comparisons, you need $N ≫ Q^{2}$ . For $Q = 10^{6}$ comparisons that means $N ≫ 10^{12}$ —so $10^{9} + 7$ is not safe, but $2^{61} - 1$ is.

Anti-Hack Strategies for Codeforces

Strategy 1: Shuffle the Input

If your algorithm depends on element order—BST insertion, quicksort, convex hull trick ordering—shuffle first:

cpp

shuffle(a.begin(), a.end(), rng);

Strategy 2: Randomize Hash Parameters

Never hardcode hash bases or moduli—choose them at runtime:

cpp

const long long MOD = (1LL << 61) - 1;
const long long BASE = rand_ll(256, MOD - 1);

Strategy 3: Randomize Map/Set Hash

unordered_map on Codeforces is vulnerable to collision attacks via the known default hash. Use a custom hash with a random component:

cpp

struct SafeHash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

unordered_map<long long, int, SafeHash> safe_map;

Strategy 4: Random Perturbation for Geometry

Floating-point geometry problems can fail on degenerate cases—collinear points, overlapping coordinates. A tiny random perturbation resolves them:

cpp

// Perturb points slightly to avoid degeneracies
for (auto& [x, y] : points) {
    x += uniform_real_distribution<double>(-1e-9, 1e-9)(rng);
    y += uniform_real_distribution<double>(-1e-9, 1e-9)(rng);
}

Strategy 5: Multiple Random Runs

When a single run might fail, repeat and take the best result:

cpp

int best = INT_MAX;
for (int attempt = 0; attempt < 30; attempt++) {
    shuffle(a.begin(), a.end(), rng);
    int result = solve(a);
    best = min(best, result);
}
cout << best << "\n";

Common Patterns

Random Hashing for String Comparison

Problem shape: compare $Q$ pairs of substrings for equality.

Build a rolling hash with a random base. Each comparison is $O (1)$ after $O (n)$ preprocessing. Use the RollingHash struct from the Randomized Hashing section.

Random Sampling for Witness Finding

Problem shape: find an element satisfying property $P$ , where $\geq n / k$ elements satisfy $P$ .

Sample uniformly at random and verify. Expected attempts: $k$ . Useful for:

Finding a vertex in the minimum cut (sample $O (\sqrt{n})$ vertices)
Finding a color that appears frequently
Finding a random edge in a dense subgraph

Birthday Paradox for Collision Detection

Problem shape: find two elements that share a property (e.g., same hash, same remainder).

Generate $O (\sqrt{N})$ random elements and check all pairs. The birthday paradox guarantees a collision is likely. Applications:

Pollard's rho factorization
Finding cycles in iterated functions
Collision search in meet-in-the-middle

Randomized Reduction to Simpler Problem

Problem shape: a complex problem that becomes tractable for random inputs.

Randomly transform the input to destroy adversarial structure, then solve the expected easy case. Examples:

Random coordinate rotation to avoid axis-aligned edge cases in geometry
Random edge contraction for minimum cut (Karger's algorithm)
Random XOR basis for linear algebra over GF(2)

Complexity Analysis

Each randomized technique trades a deterministic guarantee for a probabilistic one. The table below shows both the time cost and the error bound you accept.

Technique	Time Complexity	Error Probability
Random quicksort	$O (n \log n)$ expected	N/A (always correct)
Random quickselect	$O (n)$ expected	N/A (always correct)
Polynomial hash check	$O (n)$ per query	$\leq L / p$ per comparison
Miller-Rabin (12 bases)	$O (\log^{2} n)$	$0$ for $n < 3.3 \times 10^{23}$
Freivalds' ( $k$ rounds)	$O (k n^{2})$	$\leq 2^{- k}$
Random sampling ( $1 / c$ fraction)	$O (c \cdot T_{verify})$ expected	Tunable

Las Vegas vs. Monte Carlo:

Las Vegas: always correct, expected runtime bound—randomized quicksort, quickselect, random sampling with verification.
Monte Carlo: always fast, bounded error probability—Miller-Rabin (with random witnesses), Freivalds' algorithm, randomized hashing.

Monte Carlo solutions are accepted in competitive programming when the error probability is below $\approx 10^{- 6}$ per test case.

Edge Cases and Pitfalls

Pitfall 1: Using `rand()` Instead of `mt19937`

rand() has too many problems to count: 15-bit range on many platforms, poor distribution, global state, and slow speed. Always use mt19937.

Pitfall 2: Fixed Seeds

cpp

// BAD: Hacker can replicate your random sequence
mt19937 rng(42);

// GOOD: Unpredictable seed
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

Pitfall 3: Insufficient Hash Space

Using mod = 10^9 + 7 with $10^{5}$ hash comparisons gives collision probability $\approx 10^{5} / 10^{9} = 10^{- 4}$ per comparison. Over $10^{5}$ comparisons, expected collisions $\approx 10$ . Switch to $2^{61} - 1$ or double hashing.

Pitfall 4: Not Enough Iterations

For Monte Carlo algorithms, always compute the required repetitions. If each attempt succeeds with probability $p$ , you need $k = ⌈ \ln (10^{- 6}) / \ln (1 - p) ⌉$ attempts for $10^{- 6}$ failure probability.

Success prob $p$	Attempts for $< 10^{- 6}$ failure
$1 / 2$	$20$
$1 / 3$	$30$
$1 / 10$	$132$
$1 / 100$	$1,382$

Pitfall 5: Modular Arithmetic Overflow

When MOD is near $2^{61}$ , the intermediate product of a * b overflows 64-bit. Use __int128:

cpp

long long safe_mul(long long a, long long b, long long mod) {
    return (__int128)a * b % mod;
}

Pitfall 6: Biased Random Ranges

cpp

// BAD: biased if RAND_MAX+1 is not divisible by n
int x = rand() % n;

// GOOD: uniform distribution
int x = uniform_int_distribution<int>(0, n - 1)(rng);

Contest Cheat Sheet

text

+----------------------------------------------------------------------+
|                    RANDOMIZATION CHEAT SHEET                         |
+----------------------------------------------------------------------+
| SETUP (paste at top of every solution):                              |
|                                                                      |
|   mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
|                                                                      |
|   int rand_int(int a, int b) {                                      |
|       return uniform_int_distribution<int>(a, b)(rng);              |
|   }                                                                  |
+----------------------------------------------------------------------+
| ANTI-HACK PATTERNS:                                                  |
|                                                                      |
|   Adversarial order:   shuffle(a.begin(), a.end(), rng);            |
|   Hash collision:      base = rand_int(256, MOD - 1);               |
|   Map blowup:          unordered_map<K, V, SafeHash> mp;            |
+----------------------------------------------------------------------+
| COMMON TECHNIQUES:                                                   |
|                                                                      |
|   Random sampling: pick random element, check property               |
|     -> if >= 1/k of elements satisfy, expected k attempts            |
|     -> 30 attempts gives < 10^{-6} failure for k = 2                |
|                                                                      |
|   Birthday paradox: O(sqrt(N)) samples -> likely collision in N      |
|     -> mod 10^9+7: collisions after ~sqrt(10^9) ~= 31623            |
|     -> mod 2^61-1: safe up to ~sqrt(2^61) ~= 1.5x10^9              |
|                                                                      |
|   Monte Carlo: repeat k times for error <= 2^{-k}                   |
|     -> k = 20 gives error < 10^{-6} (when each round p >= 1/2)     |
+----------------------------------------------------------------------+
| KEY COMPLEXITIES:                                                    |
|                                                                      |
|   Randomized quicksort:   O(n log n) expected, always correct       |
|   Randomized quickselect: O(n) expected, always correct             |
|   nth_element (STL):      O(n) expected                             |
|   Miller-Rabin (12 bases): O(log^2 n), exact for n < 3.3x10^23     |
|   Treap (split/merge):    O(log n) expected per operation           |
|   Poly hash check:        O(n), error <= L/p per comparison         |
|   Freivalds' (k rounds):  O(k*n^2), error <= 2^{-k}               |
+----------------------------------------------------------------------+
| SAFE HASH FOR unordered_map (splitmix64):                            |
|                                                                      |
|   struct SafeHash {                                                  |
|     static uint64_t splitmix64(uint64_t x) {                        |
|       x += 0x9e3779b97f4a7c15;                                      |
|       x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;                   |
|       x = (x ^ (x >> 27)) * 0x94d049bb133111eb;                   |
|       return x ^ (x >> 31);                                         |
|     }                                                                |
|     size_t operator()(uint64_t x) const {                            |
|       static const uint64_t FIXED_RANDOM =                           |
|         chrono::steady_clock::now().time_since_epoch().count();      |
|       return splitmix64(x + FIXED_RANDOM);                           |
|     }                                                                |
|   };                                                                 |
|   // Usage: unordered_map<long long, int, SafeHash> mp;              |
+----------------------------------------------------------------------+
| LAS VEGAS vs MONTE CARLO:                                            |
|                                                                      |
|   Las Vegas:  always correct, random runtime                        |
|               (quicksort, quickselect, treap)                        |
|   Monte Carlo: always fast, bounded error probability               |
|               (Miller-Rabin, Freivalds', hash comparison)            |
+----------------------------------------------------------------------+

Self-Test

Verify you can do all of these without looking at the notes:

[ ] Write the three-line mt19937 seeding pattern plus a rand_int(a, b) helper from memory. Explain why chrono seeding matters on Codeforces.
[ ] Given $10^{5}$ hash comparisons using mod $10^{9} + 7$ , estimate the collision probability via the birthday paradox. State why switching to mod $2^{61} - 1$ or double hashing fixes it.
[ ] Distinguish Las Vegas from Monte Carlo: give one example of each from this chapter and state precisely what guarantee each provides (correctness vs runtime).
[ ] State the birthday paradox threshold: how many random samples from a space of size $N$ before a collision is expected? Write the formula.
[ ] Write SafeHash (the splitmix64 variant) for unordered_map from memory. Explain what attack it defends against and why the default std::hash<int> is vulnerable.
[ ] A problem says " $\geq n / 3$ of elements satisfy property $P$ ." How many random samples are needed to find one with probability $> 1 - 10^{- 6}$ ? Show the calculation.
[ ] Explain why rng() % n is biased and what to use instead.
[ ] State the expected number of comparisons in randomized quicksort and why the worst case $O (n^{2})$ is astronomically unlikely.

Practice Problems

#	Problem	Key Technique	Difficulty
1	CF 364D - Ghd	Random sampling + GCD	CF 2000
2	CF 869E - The Untended Antiquity	2D hashing with random values	CF 2100
3	CF 1746F - Kazaee	Randomized parity check	CF 2100
4	CF 585F - Digits of Number Pi	Aho-Corasick + random hashing	CF 2100
5	CF 1178F2 - Long Colorful Strip	Random shuffle + brute force	CF 1900
6	CF 1523D - Love-Hate	Random sampling + SOS DP	CF 2000
7	CF 1477C - Nezzar and Nice Beatmap	Random permutation construction	CF 1400
8	CF 727F - Polycarp's Problems	Randomized binary search	CF 1600
9	CF 1354D - Multiset	Binary search + anti-hack shuffle	CF 1400
10	CF 1514D - Cut and Stick	Random sampling for majority element	CF 1600
11	Playlist	Sliding window with hash set for distinct elements	CSES 1400
12	String Matching	Can solve with randomized polynomial hashing	CSES 1500

Suggested Progression

Start: #7, #9 (direct application of shuffle / random selection)
Build: #8, #10 (random sampling with verification)
Strengthen: #1, #6 (random sampling + complex verification)
Master: #2, #3 (randomized hashing in advanced settings)

Probabilistic Method in Competitive Programming

The probabilistic method proves that an object with a desired property exists by showing that a random construction achieves it with positive probability. In CP this translates directly: randomly generate a solution and it is likely to be valid.

Random Coloring Example

Problem: given a graph with $m$ edges, find a 2-coloring that cuts at least $m / 2$ edges (an edge is "cut" if its endpoints get different colors).

Color each vertex independently, each color with probability $1 / 2$ . Each edge is cut with probability $1 / 2$ , so by linearity of expectation the expected number of cut edges is $m / 2$ . Since the expectation equals $m / 2$ , some coloring must achieve at least $m / 2$ cuts. A random coloring lands there in expectation, and repeating 30 times makes it overwhelmingly likely.

cpp

// Random 2-coloring that cuts >= m/2 edges in expectation
// Repeat a few times, take the best
int best_cut = 0;
vector<int> best_color;

for (int attempt = 0; attempt < 30; attempt++) {
    vector<int> color(n);
    for (int i = 0; i < n; i++) color[i] = rng() & 1;

    int cut = 0;
    for (auto& [u, v] : edges)
        cut += (color[u] != color[v]);

    if (cut > best_cut) {
        best_cut = cut;
        best_color = color;
    }
}
// best_cut is very likely >= m/2

text

Random 2-coloring on a 6-vertex graph:

  (W)----(B)          W = white, B = black
  / \    / \          Edges between different colors = CUT
(B) (W)(B) (W)       Edges between same colors = UNCUT
  \  |  |  /
   (B)--(W)           Cut edges: 7 out of 10  (>= m/2 = 5)  OK

If I Had 5 Minutes

Randomization doesn't make your algorithm lucky—it makes the adversary blind. By injecting randomness into your decisions, you transform worst-case inputs into average-case inputs: no fixed input can be consistently bad against unpredictable choices. Think of it as shuffling a deck before dealing—once the cards are randomized, any stacked arrangement is neutralized.

The mnemonic to burn in: SHUFFLE—SEED—SPACE—the three S's of competitive-programming randomization:

Shuffle the input to destroy adversarial ordering
Seed with chrono to prevent replication across submissions
Space must be large ( $mod \geq 2^{61} - 1$ ) to survive the birthday paradox

Last-minute review before a contest:

Seed: mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
Shuffle: shuffle(a.begin(), a.end(), rng);—kills adversarial order
Safe hash: random base + mod $2^{61} - 1$ —kills collision attacks
Safe map: SafeHash with splitmix64 + FIXED_RANDOM—kills unordered_map hacks
Repeat: for Monte Carlo, 20–30 iterations gives error $< 10^{- 6}$

Pattern Fingerprint

Constraint Pattern	Technique	Why
"Find any valid answer" + large $n$	Random sampling + verify	Expected $O (1)$ attempts if fraction is constant
String comparison, $Q > 10^{4}$ queries	Random-base polynomial hash	Prevents adversarial collision construction
`unordered_map` with $n > 10^{4}$	SafeHash (splitmix64 + random salt)	Default hash is publicly known and hackable
Sorted/structured input + TLE risk	`shuffle()` before processing	Destroys adversarial ordering
$k$ -th element, $n > 10^{6}$	Randomized quickselect / `nth_element`	Expected $O (n)$ vs $O (n \log n)$ sort
Dynamic ordered set with $k$ -th query	Treap with split/merge	$O (\log n)$ expected for all operations
Primality test, $n \leq 10^{18}$	Miller-Rabin with 12 fixed witnesses	$O (\log^{2} n)$ , deterministic for 64-bit
Matrix product verification $A B = C$	Freivalds' algorithm	$O (k n^{2})$ vs $O (n^{3})$ multiplication
"Majority element in subarray"	Random sampling + count	$> 1 / 2$ chance per sample, repeat 30x
Interactive / hacking-enabled problem	Double hash + chrono seed	Maximum protection against adversary

Rating Progression

CF 1200 (Beginner)

text

+----------------------------------------------------------+
| What you'll see:                                         |
|   - Problems solvable with sort + greedy                 |
|   - Randomization not required, but shuffle() can save   |
|     you from adversarial TLE on quicksort-based ideas    |
| Key skill: know how to seed mt19937 and call shuffle()   |
+----------------------------------------------------------+

CF 1500 (Intermediate)

text

+----------------------------------------------------------+
| What you'll see:                                         |
|   - Hashing problems where random base prevents hacks    |
|   - "Find any valid element" via random sampling         |
|   - unordered_map TLE from collision attacks              |
| Key skill: SafeHash, random-base polynomial hashing,     |
|            random sampling for majority/witness finding   |
+----------------------------------------------------------+

CF 1800 (Advanced)

text

+----------------------------------------------------------+
| What you'll see:                                         |
|   - Treap as ordered set with k-th element queries       |
|   - Miller-Rabin for large primality checks              |
|   - Randomized algorithms as the intended solution       |
| Key skill: treap split/merge, quickselect, Freivalds',   |
|            probability analysis for error bounds          |
+----------------------------------------------------------+

CF 2100 (Expert)

text

+----------------------------------------------------------+
| What you'll see:                                         |
|   - Probabilistic method arguments (random coloring)     |
|   - Pollard rho factorization                            |
|   - Randomized complexity reductions                     |
|   - Problems where randomization IS the editorial        |
| Key skill: designing custom randomized algorithms,       |
|            analyzing expected value and error probability |
+----------------------------------------------------------+

Before You Code Checklist

Before writing a randomized solution, verify:
[ ] Seed is unpredictable: using chrono::steady_clock, not a fixed constant
[ ] Distribution is uniform: using uniform_int_distribution, not rng() % n
[ ] Enough iterations: for Monte Carlo, calculate $k$ needed for $< 10^{- 6}$ error
[ ] Hash space is large enough: $N ≫ Q^{2}$ where $Q$ = number of comparisons
[ ] Solution is testable: you can stress-test against a brute-force solution

What Would You Do If...?

Scenario 1: your unordered_map solution passes on your machine but gets TLE on Codeforces with $n = 2 \times 10^{5}$ .

The default hash is being attacked. The attacker constructed keys that all land in the same bucket, turning $O (1)$ lookups into $O (n)$ . Replace with SafeHash (Strategy 3 in the Anti-Hack Strategies section).

Scenario 2: your rolling hash solution gets WA on test 42, but passes all other tests including stress tests with $n = 10^{6}$ .

Likely a hash collision. The birthday paradox says $\sim 10^{5}$ comparisons against a space of $10^{9} + 7$ is already in dangerous territory. Switch to double hash with $2^{61} - 1$ .

Scenario 3: you need the median of $10^{7}$ elements in under 1 second, but sorting takes 1.5 seconds.

Use randomized quickselect ( $O (n)$ expected) or nth_element from the STL. Don't sort the full array when you only need one order statistic.

Historical Note

The systematic study of randomized algorithms began with Michael Rabin's 1976 paper introducing the Miller-Rabin primality test and randomized closest-pair algorithms. László Babai coined the term "Las Vegas algorithm" in 1979. Cecilia Aragon and Raimund Seidel invented treaps in 1989, elegantly fusing two classical data structures with a dash of randomness.

The Mistake That Teaches

The Debugging Story: "Why Does My Hash Keep Colliding?"

Alice writes a string matching solution for a Codeforces round. She uses polynomial hashing with base = 31 and mod = 1e9 + 7. Her solution passes pretests. During hacking, Bob submits a test with 200,000 short strings.

Alice's solution gets Wrong Answer.

She's confused—her hash logic is correct. She tests locally with random strings: no issues. She tests with Bob's input: collisions everywhere.

What happened? Bob ran a script that generates strings with identical hashes for base 31 mod $10^{9} + 7$ . Because Alice's base was hardcoded, Bob precomputed the collision strings offline.

The fix (three characters changed):

cpp

// BEFORE (hackable):
const long long BASE = 31;

// AFTER (safe):
const long long BASE = rand_ll(256, MOD - 1);

The lesson: a hardcoded base is a publicly visible lock combination. A runtime-random base is a lock that changes with every submission. The adversary arrives with the old key and finds it doesn't fit.

text

Alice's timeline:
  t=0    Submit with BASE=31         --> AC on pretests
  t=30   Bob downloads her code, sees BASE=31
  t=31   Bob generates collision strings for BASE=31
  t=32   Bob hacks Alice              --> WA

Alice v2:
  t=0    Submit with BASE=rand_ll()  --> AC on pretests
  t=30   Bob downloads her code, sees BASE=rand_ll()
  t=31   Bob: "I can't predict the base..."  --> hack FAILS

When NOT to Use Randomization

Randomization is powerful, but not always the right call:

When deterministic $O (n \log n)$ exists and suffices: don't add complexity. std::sort is already introsort—quicksort with a heapsort fallback—giving guaranteed $O (n \log n)$ .
When the problem requires exact answers with a proof: some problems, especially interactive ones, need deterministic guarantees. A Monte Carlo approach with $10^{- 6}$ error probability may fail on the one test case it gets wrong.
When $n$ is small enough for brute force: randomization adds code complexity. If $n \leq 20$ , just enumerate.
When the random approach has high constant factors: mt19937 is not free. In ultra-tight time limit problems ( $n = 10^{7}$ , TL = 1s), the overhead of random number generation in every operation can tip the balance.
When you cannot verify the answer: Las Vegas algorithms need a cheap verification step. If verification is as expensive as the computation itself, random restarts buy nothing.

Debug This

Exercise 1: this shuffle is biased. Why?

cpp

// BUGGY: biased shuffle
void bad_shuffle(vector<int>& a) {
    int n = a.size();
    for (int i = 0; i < n; i++) {
        int j = uniform_int_distribution<int>(0, n - 1)(rng);
        swap(a[i], a[j]);
    }
}

Answer

The range of j should be [i, n-1] (or [0, i]), not [0, n-1]. With [0, n-1], there are $n^{n}$ equally likely outcomes but only $n!$ permutations. Since $n^{n}$ is not divisible by $n!$ for $n > 2$ , some permutations are more likely than others. The correct Fisher-Yates shuffle picks j from [0, i] (backward) or [i, n-1] (forward).

Exercise 2: this hash produces collisions even with random base. Why?

cpp

// BUGGY: broken hash
long long hash_string(const string& s) {
    long long h = 0;
    long long base = rand_ll(2, 1000000006);
    long long mod = 1000000007;
    for (char c : s) {
        h = h * base + c;
        h %= mod;
    }
    return h;
}

Answer

The base is re-randomized on every call! Two calls to hash_string on the same string will likely produce different hashes because each call picks a new random base. The base must be generated ONCE (globally or in a struct constructor) and reused for all hash computations in the same run.

Exercise 3: this reservoir sampling has off-by-one. Where?

cpp

// BUGGY: off-by-one in reservoir sampling
vector<int> sample(vector<int>& stream, int k) {
    vector<int> res(stream.begin(), stream.begin() + k);
    for (int i = k; i < (int)stream.size(); i++) {
        int j = uniform_int_distribution<int>(0, i - 1)(rng);
        if (j < k) res[j] = stream[i];
    }
    return res;
}

Answer

The distribution should be uniform_int_distribution<int>(0, i), not (0, i - 1). With (0, i-1), the $i$ -th element (0-indexed) is included with probability $k / i$ instead of the correct $k / (i + 1)$ . This makes earlier elements underrepresented and later elements overrepresented.

Exercise 4: this treap merge has a silent bug that corrupts the tree. What is it?

cpp

// BUGGY: treap merge
Node* merge(Node* L, Node* R) {
    if (!L) return R;
    if (!R) return L;
    if (L->priority > R->priority) {
        L->right = merge(L->right, R);
        return L;
    } else {
        R->left = merge(L, R->left);
        return R;
    }
}

Answer

The update() call is missing! After modifying L->right or R->left, the size field is stale. Any subsequent call to sz() or kth() will return wrong results. The fix: add update(L); return L; and update(R); return R; respectively.

Cross-References

String Hashing: full treatment of polynomial hashing, rolling hash, and substring comparison. The random-base technique from the Randomized Hashing section is applied there for anti-hack safety.
Hash Maps and Unordered Maps: explains why unordered_map is vulnerable and how SafeHash (Anti-Hack Strategies) prevents collision attacks. Essential reading for Codeforces contestants.
Sorting and Searching: covers deterministic sorting algorithms. Randomized quicksort and quickselect above are randomized extensions of the partition-based techniques discussed there.

Extended Practice Problems

#	Problem	Key Technique	Difficulty
13	CF 1523D - Love-Hate	Random sampling + SOS DP	CF 2000
14	CF 896C - Willem, Chtholly and Seniorious	Chtholly tree (randomized interval merging)	CF 2100
15	CF 1763C - Another Array Problem	Random observation for small cases	CF 1400
16	CF 1154G - Minimum Possible LCM	Random sampling + GCD tricks	CF 1900
17	CF 1081G - Mergesort Strikes Back	Expected value of random permutations	CF 2400
18	CF 1705E - Mark and Professor	Randomized hashing on bitsets	CF 2100
19	CF 1554D - Diane	Hash-based string construction	CF 1800
20	CF 1142B - Lynyrd Skynyrd	Random sampling for cycle detection	CF 1800

Extended Progression

Warm-up (CF 1200–1400): #7, #9, #15—shuffle, basic random selection
Foundation (CF 1400–1600): #8, #10, #19—sampling, hashing patterns
Intermediate (CF 1600–1900): #1, #5, #16, #20—sampling + complex logic
Advanced (CF 1900–2100): #2, #3, #6, #14, #18—advanced randomized DS
Expert (CF 2100+): #4, #13, #17—deep probability + randomized analysis

Next Up

Continue to: Binary Lifting and LCA for techniques where preprocessing meets tree structure, or return to Sorting and Searching to solidify the deterministic foundations that randomization builds upon.

Appendix: Quick Reference

cpp

// --- Paste this block at the top of your solution ---
#include <bits/stdc++.h>
using namespace std;

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());

int rand_int(int a, int b) {
    return uniform_int_distribution<int>(a, b)(rng);
}

long long rand_ll(long long a, long long b) {
    return uniform_int_distribution<long long>(a, b)(rng64);
}

struct SafeHash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};
// --- End of randomization toolkit ---

Randomization Techniques ​

Contents ​

Intuition ​

Visual Walkthrough—Random Pivot Selection ​

The Invariant ​

What the Code Won't Teach You ​

When to Reach for This ​

Core Mechanics ​

Random Number Generation in C++ ​

Random Shuffling ​

Random Pivot Selection (Quicksort / Quickselect) ​

C++ STL Reference -- The <random> Toolkit ​

Randomized Hashing ​

Polynomial Hashing with Random Base ​

Double Hashing (Belt and Suspenders) ​

Randomized Algorithms ​

Miller-Rabin Primality Test ​

Freivalds' Algorithm (Matrix Multiplication Verification) ​

Random Sampling / Witness Search ​

Las Vegas vs Monte Carlo — A Deeper Look ​

Randomized Quickselect — Expected O(n) Median Finding ​

Treap — Randomized Binary Search Tree ​

Reservoir Sampling & Fisher-Yates Shuffle ​

Reservoir Sampling (Algorithm R) ​

Fisher-Yates Shuffle (Knuth Shuffle) ​

Anti-Hash Attack — Why Random Bases Are Essential ​

How Fixed Bases Get Hacked ​

Birthday Paradox Collision Probability Table ​

The Birthday Paradox and Collision Analysis ​

Anti-Hack Strategies for Codeforces ​

Strategy 1: Shuffle the Input ​

Strategy 2: Randomize Hash Parameters ​

Strategy 3: Randomize Map/Set Hash ​

Strategy 4: Random Perturbation for Geometry ​

Strategy 5: Multiple Random Runs ​

Common Patterns ​

Random Hashing for String Comparison ​

Random Sampling for Witness Finding ​

Birthday Paradox for Collision Detection ​

Randomized Reduction to Simpler Problem ​

Complexity Analysis ​

Edge Cases and Pitfalls ​

Pitfall 1: Using rand() Instead of mt19937 ​

Pitfall 2: Fixed Seeds ​

Pitfall 3: Insufficient Hash Space ​

Pitfall 4: Not Enough Iterations ​

Pitfall 5: Modular Arithmetic Overflow ​

Pitfall 6: Biased Random Ranges ​

Contest Cheat Sheet ​

Self-Test ​

Practice Problems ​

Suggested Progression ​

Probabilistic Method in Competitive Programming ​

Random Coloring Example ​

If I Had 5 Minutes ​

Pattern Fingerprint ​

Rating Progression ​

CF 1200 (Beginner) ​

CF 1500 (Intermediate) ​

CF 1800 (Advanced) ​

CF 2100 (Expert) ​

Before You Code Checklist ​

What Would You Do If...? ​

Historical Note ​

The Mistake That Teaches ​

When NOT to Use Randomization ​

Debug This ​

Cross-References ​

Extended Practice Problems ​

Extended Progression ​

Further Reading ​

Next Up ​

Appendix: Quick Reference ​

Randomization Techniques

Contents

Intuition

Visual Walkthrough—Random Pivot Selection

The Invariant

What the Code Won't Teach You

When to Reach for This

Core Mechanics

Random Number Generation in C++

Random Shuffling

Random Pivot Selection (Quicksort / Quickselect)

C++ STL Reference -- The `<random>` Toolkit

Randomized Hashing

Polynomial Hashing with Random Base

Double Hashing (Belt and Suspenders)

Randomized Algorithms

Miller-Rabin Primality Test

Freivalds' Algorithm (Matrix Multiplication Verification)

Random Sampling / Witness Search

Las Vegas vs Monte Carlo — A Deeper Look

Randomized Quickselect — Expected O(n) Median Finding

Treap — Randomized Binary Search Tree

Reservoir Sampling & Fisher-Yates Shuffle

Reservoir Sampling (Algorithm R)

Fisher-Yates Shuffle (Knuth Shuffle)

Anti-Hash Attack — Why Random Bases Are Essential

How Fixed Bases Get Hacked

Birthday Paradox Collision Probability Table

The Birthday Paradox and Collision Analysis

Anti-Hack Strategies for Codeforces

Strategy 1: Shuffle the Input

Strategy 2: Randomize Hash Parameters

Strategy 3: Randomize Map/Set Hash

Strategy 4: Random Perturbation for Geometry

Strategy 5: Multiple Random Runs

Common Patterns

Random Hashing for String Comparison

Random Sampling for Witness Finding

Birthday Paradox for Collision Detection

Randomized Reduction to Simpler Problem

Complexity Analysis

Edge Cases and Pitfalls

Pitfall 1: Using `rand()` Instead of `mt19937`

Pitfall 2: Fixed Seeds

Pitfall 3: Insufficient Hash Space

Pitfall 4: Not Enough Iterations

Pitfall 5: Modular Arithmetic Overflow

Pitfall 6: Biased Random Ranges

Contest Cheat Sheet

Self-Test

Practice Problems

Suggested Progression

Probabilistic Method in Competitive Programming

Random Coloring Example

If I Had 5 Minutes

Pattern Fingerprint

Rating Progression

CF 1200 (Beginner)

CF 1500 (Intermediate)

CF 1800 (Advanced)

CF 2100 (Expert)

Before You Code Checklist

What Would You Do If...?

Historical Note

The Mistake That Teaches

When NOT to Use Randomization

Debug This

Cross-References

Extended Practice Problems

Extended Progression

Further Reading

Next Up

Appendix: Quick Reference