Skip to content

Randomization Techniques

Quick Revisit

  • USE WHEN: Deterministic solution can be hacked (adversarial input), or you need random pivots/hashes/bases
  • INVARIANT: Randomize the part the adversary controls (pivot choice, hash base/mod) so worst-case inputs don't exist
  • TIME: Expected O(n log n) for randomized quicksort/select; O(n) for hashing | SPACE: O(n)
  • CLASSIC BUG: Using a fixed seed or fixed hash constants—adversary reverse-engineers them and forces collisions/worst-case
  • PRACTICE: 01-ladder-1100-to-1400

Difficulty: Intermediate | CF Rating Range: 1400–1800 | Prerequisites: Math Fundamentals, String Hashing | Applies to: CF 1100–2100


Contents


Intuition

You submit a clean deterministic solution on Codeforces. It passes pretests. Then during the hacking phase, someone drops a crafted input and your solution gets TLE or WA.

Consider quicksort with the classic "always pick the first element as pivot" strategy. Against a sorted array of n=105 elements, the O(n2) worst case triggers instantly:

text
Input:  [1, 2, 3, 4, 5, ..., 100000]
Pivot:   1
           \--- partition splits into (0 elements) | (99999 elements)
Pivot:   2
           \--- partition splits into (0 elements) | (99998 elements)
...
Result: n + (n-1) + (n-2) + ... + 1 = n*(n+1)/2 comparisons --> TLE

Or consider polynomial string hashing with a fixed base and modulus. An adversary reverse-engineers your constants and constructs colliding strings, turning your O(n) hash comparison into a guaranteed WA.

Deterministic algorithms have knowable worst cases—and on competitive programming platforms with open hacking, the adversary will find them.

"Adding randomness makes adversarial attacks impossible—the enemy can't predict your random choices."

When you pick a random pivot, no input arrangement is consistently bad. When you pick a random hash base at runtime, no pre-built collision set works. The adversary would need to predict your random seed—which they cannot.

The shift in thinking: instead of guaranteeing O(f(n)) for all inputs, we guarantee expected O(f(n)) for any input. The randomness lives in the algorithm, not in the input distribution.

Visual Walkthrough—Random Pivot Selection

text
Array: [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]

--- Deterministic (first element) on sorted input ---
Sorted: [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
Pivot=1 -> split: [] | [1,2,3,3,4,5,5,6,9]      depth grows linearly
Pivot=1 -> split: [] | [2,3,3,4,5,5,6,9]
...  O(n) depth --> O(n^2) total

--- Randomized pivot on the SAME sorted input ---
Sorted: [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
Random pivot=5 -> split: [1,1,2,3,3,4] | [5,6,9]   ~balanced!
  Left:  random pivot=2 -> [1,1] | [3,3,4]           ~balanced!
  Right: random pivot=6 -> [5] | [9]                  ~balanced!
...  O(log n) expected depth --> O(n log n) expected total

Each random pivot has at least a 1/2 probability of landing in the "middle half" of the current subarray, producing a split no worse than 1/4 to 3/4. After O(logn) such good splits we reach base cases.

The Invariant

"Expected running time is O(nlogn) regardless of input distribution."

For randomized quicksort, the expected number of comparisons is 2nlnn1.39nlog2n for any permutation. No adversarial input can break this expectation—only luck (which averages out over many runs).

For randomized hashing with base chosen uniformly from [2,p1], the collision probability for two distinct strings of length L is at most L/p. With p109+7 and L106, that is <103 per comparison.

What the Code Won't Teach You

The templates show how to randomize, but not why each choice matters. Here are insights that only come from debugging wrong submissions:

Hashing safety is about the modulus, not the base. Polynomial rolling hash treats a string as a number in base b mod M. Choosing a random base prevents adversaries from constructing collisions—but the size of M determines how safe you are against accidental collisions. With n hash comparisons, the birthday paradox gives collision probability n2/(2M). For n=105:

text
Modulus M          Collision prob (n=10^5 comparisons)
+--------------------------------------------------+
| 10^9 + 7         ~10^10 / 2*10^9  ~ 5.0          |  <-- UNSAFE (expect collisions!)
| 2^61 - 1         ~10^10 / 2^62    ~ 2*10^{-9}    |  <-- Safe
| Double hash       ~10^10 / 2^122  ~ negligible    |  <-- Very safe
+--------------------------------------------------+

unordered_map is a landmine. The default hash function on Codeforces is known. An adversary constructs n keys that all land in the same bucket, turning every lookup into O(n). The fix is the SafeHash struct from the Anti-Hack Strategies section, which XORs each key with a runtime random constant—shifting the collision distribution from adversary-controlled to uniformly random.

text
Default unordered_map with adversarial keys:
  Bucket 0: key1 -> key2 -> key3 -> ... -> keyN    O(n) lookup!
  Bucket 1: (empty)
  Bucket 2: (empty)
  ...

SafeHash unordered_map (same keys, random salt):
  Bucket 0: key7 -> key42                           O(1) expected
  Bucket 1: key3 -> key99 -> key15                  O(1) expected
  Bucket 2: key1                                    O(1) expected
  ...      (keys distributed ~uniformly)

When randomization is the intended solution, look for signals: n is large, the answer is hard to compute exactly, or the problem asks for "any valid answer." Randomized selection (quickselect) is O(n) expected—for safety in contests, shuffle the input first or use introselect (which falls back to median-of-medians).

When to Reach for This

Trigger phrases and situations:

  • "anti-hack"—you need a solution that cannot be broken by adversarial input
  • "probabilistic"—the problem explicitly allows small error probability
  • "expected value solution"—you need expected O(f(n)), not worst-case
  • Hash collisions—your deterministic hash keeps getting hacked
  • Randomized algorithms—Miller-Rabin, Pollard rho, Freivalds' algorithm
  • Large search spaces—random sampling finds a witness faster than enumeration

Core Mechanics

Random Number Generation in C++

Never use rand()—it is slow, low quality, and only 15-bit on some platforms. Use the Mersenne Twister mt19937 or mt19937_64:

cpp
#include <bits/stdc++.h>
using namespace std;

// Seed with chrono to prevent predictable output
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());

// Generate a random integer in [a, b] inclusive
int rand_int(int a, int b) {
    return uniform_int_distribution<int>(a, b)(rng);
}

// Generate a random long long in [a, b] inclusive
long long rand_ll(long long a, long long b) {
    return uniform_int_distribution<long long>(a, b)(rng64);
}

Why chrono seed? A fixed seed lets a hacker replicate your random sequence. A chrono-based seed changes with every submission.

Random Shuffling

The simplest and most powerful anti-hack technique—shuffling the input before processing destroys any adversarial ordering:

cpp
vector<int> a(n);
for (int i = 0; i < n; i++) cin >> a[i];
shuffle(a.begin(), a.end(), rng);
// Now process a[] -- no adversarial ordering possible

Use it for quicksort, treap insertion, or any algorithm whose worst case depends on input order.

Random Pivot Selection (Quicksort / Quickselect)

cpp
int partition(vector<int>& a, int lo, int hi) {
    int pivot_idx = rand_int(lo, hi);
    swap(a[pivot_idx], a[hi]);
    int pivot = a[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {
        if (a[j] <= pivot) {
            swap(a[i], a[j]);
            i++;
        }
    }
    swap(a[i], a[hi]);
    return i;
}

void quicksort(vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int p = partition(a, lo, hi);
    quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}

Expected comparisons: 1.39nlog2n. Worst case is still O(n2) but occurs with probability 1/n!—astronomically unlikely.


C++ STL Reference -- The <random> Toolkit

Everything you need lives in <random> (plus shuffle from <algorithm>). Forget rand() and srand()—15-bit range on MSVC, global state, non-uniform distribution. The modern C++ facilities are fast, high-quality, and safe when seeded properly.

Class / FunctionHeaderWhat It DoesComplexity
mt19937<random>32-bit Mersenne Twister engine; period 2199371O(1) per call
mt19937_64<random>64-bit variant for long long rangesO(1) per call
uniform_int_distribution<T>(a, b)<random>Uniform integer in [a,b] inclusiveO(1)
uniform_real_distribution<T>(a, b)<random>Uniform real in [a,b)O(1)
discrete_distribution<T>({w₀, w₁, ...})<random>Weighted random index; returns i with prob wiO(k) setup, O(logk) per call
bernoulli_distribution(p)<random>Returns true with probability pO(1)
shuffle(first, last, gen)<algorithm>Uniform random permutation (Fisher-Yates internally)O(n)
sample(first, last, out, n, gen)<algorithm>Choose n elements without replacement (C++17)O(n)
nth_element(first, nth, last)<algorithm>Rearranges so element at nth is in sorted position; O(n) averageO(n) avg

Usage patterns you'll actually reach for in contest:

cpp
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

// Random int in [lo, hi]
int x = uniform_int_distribution<int>(lo, hi)(rng);

// Random long long -- use the 64-bit engine
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());
long long y = uniform_int_distribution<long long>(lo, hi)(rng64);

// Shuffle a vector -- kills adversarial ordering
shuffle(v.begin(), v.end(), rng);

// Pick k random elements without replacement (C++17)
vector<int> chosen;
sample(v.begin(), v.end(), back_inserter(chosen), k, rng);

// k-th smallest element without full sort
nth_element(v.begin(), v.begin() + k, v.end());
// v[k] is now the (k+1)-th smallest; elements before it are <= v[k]
text
  Distribution overview:

  uniform_int_distribution(1, 6)     ->  1  2  3  4  5  6   (each 1/6)
  uniform_real_distribution(0, 1)    ->  [0.0 ................. 1.0)
  bernoulli_distribution(0.3)        ->  false (70%)  |  true (30%)
  discrete_distribution({3, 1, 1})   ->  0 (60%)  1 (20%)  2 (20%)

Two things that bite people:

  1. random_shuffle was removed in C++17. Always use shuffle with an explicit engine as the third argument. If you see random_shuffle in old code or editorials, replace it immediately.

  2. rng() % n is biased when rng.max() + 1 is not divisible by n. Use uniform_int_distribution instead—it handles rejection sampling internally so every value in the range is equally likely.


Randomized Hashing

Polynomial Hashing with Random Base

The standard polynomial hash of string s with base b and modulus m:

h(s)=i=0|s|1s[i]b|s|1i(modm)

Anti-hack version: choose b randomly at runtime.

cpp
const long long MOD = (1LL << 61) - 1; // Mersenne prime, very fast modular arithmetic

long long mod_mul(long long a, long long b) {
    // Compute (a * b) % MOD without overflow using __int128
    return (__int128)a * b % MOD;
}

long long mod_add(long long a, long long b) {
    a += b;
    if (a >= MOD) a -= MOD;
    return a;
}

struct RollingHash {
    int n;
    long long base;
    vector<long long> h, pw;

    RollingHash(const string& s) : n(s.size()), h(n + 1), pw(n + 1) {
        base = rand_ll(256, MOD - 1);
        pw[0] = 1;
        for (int i = 0; i < n; i++) {
            pw[i + 1] = mod_mul(pw[i], base);
            h[i + 1] = mod_add(mod_mul(h[i], base), s[i]);
        }
    }

    // Hash of s[l..r] (0-indexed, inclusive)
    long long query(int l, int r) {
        return mod_add(h[r + 1], MOD - mod_mul(h[l], pw[r - l + 1]));
    }
};

Why 2611? It is a Mersenne prime, so modular reduction is fast via bitwise operations. The collision probability for two distinct strings of length L is at most L/(2611)<1012 per query—far safer than 109+7.

Double Hashing (Belt and Suspenders)

For problems where even a single collision is fatal, use two independent random bases:

cpp
struct DoubleHash {
    RollingHash h1, h2;
    DoubleHash(const string& s) : h1(s), h2(s) {}

    pair<long long, long long> query(int l, int r) {
        return {h1.query(l, r), h2.query(l, r)};
    }
};

Collision probability drops to L2/(2611)2—negligible for any practical input size.


Randomized Algorithms

Miller-Rabin Primality Test

Deterministic primality testing typically uses trial division up to nO(n), fine up to about 1012 but far too slow for 64-bit integers. Miller-Rabin runs in O(klog2n) for k rounds, making it practical for n1018.

cpp
long long power(long long a, long long d, long long mod) {
    long long result = 1;
    a %= mod;
    while (d > 0) {
        if (d & 1) result = (__int128)result * a % mod;
        a = (__int128)a * a % mod;
        d >>= 1;
    }
    return result;
}

bool miller_rabin(long long n, long long a) {
    if (n % a == 0) return n == a;
    long long d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; r++; }
    long long x = power(a, d, n);
    if (x == 1 || x == n - 1) return true;
    for (int i = 0; i < r - 1; i++) {
        x = (__int128)x * x % n;
        if (x == n - 1) return true;
    }
    return false;
}

bool is_prime(long long n) {
    if (n < 2) return false;
    // Deterministic for n < 3.317e23 with these bases
    for (long long a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}) {
        if (!miller_rabin(n, a)) return false;
    }
    return true;
}

The fixed witness set {2,3,5,7,11,13,17,19,23,29,31,37} makes this deterministic for all n<3.317×1023—every 64-bit integer.

Freivalds' Algorithm (Matrix Multiplication Verification)

Given three n×n matrices A, B, C, verify whether A×B=C in O(n2) instead of O(n3).

Pick a random binary vector r{0,1}n. Compute A(Br) and Cr. If AB=C, they are always equal. If ABC, they differ with probability 1/2.

cpp
bool freivalds(vector<vector<long long>>& A,
               vector<vector<long long>>& B,
               vector<vector<long long>>& C, int n, int iterations = 20) {
    for (int iter = 0; iter < iterations; iter++) {
        vector<long long> r(n);
        for (int i = 0; i < n; i++) r[i] = rng() & 1;

        // Compute Br
        vector<long long> Br(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                Br[i] += B[i][j] * r[j];

        // Compute A(Br)
        vector<long long> ABr(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ABr[i] += A[i][j] * Br[j];

        // Compute Cr
        vector<long long> Cr(n, 0);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                Cr[i] += C[i][j] * r[j];

        if (ABr != Cr) return false;
    }
    return true; // Correct with probability >= 1 - 2^{-iterations}
}

Error probability after k iterations: 2k. At k=20, the chance of a false positive is <106.

Many problems reduce to: find an element with property P, where at least a constant fraction of elements satisfy P. Random sampling finds such an element in O(1) expected attempts.

Example: given a multiset where the majority element appears >n/2 times, a random sample hits it with probability >1/2. After k samples, the probability of missing it entirely is <2k.

cpp
// Find an element appearing > n/2 times (guaranteed to exist)
int find_majority(vector<int>& a) {
    int n = a.size();
    while (true) {
        int candidate = a[rand_int(0, n - 1)];
        int count = 0;
        for (int x : a) count += (x == candidate);
        if (count > n / 2) return candidate;
    }
    // Expected iterations: < 2
}

Las Vegas vs Monte Carlo — A Deeper Look

Las Vegas algorithms always produce the correct answer; only the running time is random. Think of them as "keep trying until you succeed."

Monte Carlo algorithms always terminate within a fixed time bound, but the answer may be wrong with some bounded probability ε. Reduce ε by repeating.

text
+------------------------------------------------------------------+
| Property        | Las Vegas              | Monte Carlo            |
+------------------------------------------------------------------+
| Correctness     | ALWAYS correct         | Correct w.p. >= 1-eps  |
| Runtime         | Random (expected bound) | Deterministic bound   |
| Error control   | Run longer / retry     | Repeat k times:       |
|                 |                        |   error <= eps^k       |
| CP examples     | Randomized Quicksort   | Miller-Rabin (random   |
|                 | Randomized Quickselect |   witnesses)           |
|                 | Treap insert/delete    | Freivalds' algorithm   |
|                 | Random pivot partition | Randomized hashing     |
| When to use     | Need exact answer,     | Approximate answer OK, |
|                 | can tolerate variable  | need guaranteed speed  |
|                 | runtime                |                        |
+------------------------------------------------------------------+

Converting between the two:

  • Monte Carlo → Las Vegas: if you can verify the answer cheaply, run the MC algorithm, verify, and retry on failure. For example, Miller-Rabin identifies a candidate; trial division up to small bounds confirms it.
  • Las Vegas → Monte Carlo: run the LV algorithm for a fixed time budget. If it hasn't finished, output a default answer. You sacrifice correctness for a timing guarantee.
text
  Las Vegas runtime distribution         Monte Carlo error reduction
                                          by repetition
  Probability                            Error prob
  ^                                      ^
  |  *                                   |  *
  |  * *                                 |    *
  |  *   *                               |      *
  |  *     * *                           |        *  *
  |  *         * * * *                   |              *  *  *  *
  +-------------------> Time             +--------------------> k (rounds)
  Expected = O(n log n)                  eps^k shrinks fast
  (but could be longer)                  (20 rounds -> ~10^{-6})

Contest rule of thumb: if the problem says "print any valid answer" or has n105 with a 2–3 second time limit, a Las Vegas approach with expected O(nlogn) is safe. For property-checking problems (primality, matrix equality), Monte Carlo with 20–30 rounds is standard.

Cross-reference: Miller-Rabin (above) is the classic Monte Carlo example; randomized quicksort (Random Pivot Selection) is the classic Las Vegas example.

Randomized Quickselect — Expected O(n) Median Finding

Quickselect finds the k-th smallest element in expected O(n) time using a random pivot, compared to O(nlogn) for sorting.

Partition around a random pivot—exactly like quicksort—but recurse into only one side: the side that contains index k.

text
Find the 4th smallest element (k=3, 0-indexed) in [7, 2, 9, 1, 5, 8, 3]:

Step 1: Random pivot = 5
  Partition: [2, 1, 3] | 5 | [7, 9, 8]
  pivot is at index 3 == k  -->  FOUND! Answer = 5

  (Lucky case. Usually takes a few rounds.)

Step 2 (if pivot had landed at index 5):
  Random pivot = 8
  Partition: [7, 2, 1, 5, 3] | 8 | [9]
  pivot at index 5, k=3 < 5  -->  recurse LEFT on [7, 2, 1, 5, 3]

Step 3: Random pivot = 2
  Partition: [1] | 2 | [7, 5, 3]
  pivot at index 1, k=3 > 1  -->  recurse RIGHT on [7, 5, 3], k'=3-2=1

Step 4: Random pivot = 5
  Partition: [3] | 5 | [7]
  pivot at index 1 == k'  -->  FOUND! Answer = 5

Full C++ implementation:

cpp
#include <bits/stdc++.h>
using namespace std;

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

// Returns the k-th smallest element (0-indexed) in a[lo..hi]
int quickselect(vector<int>& a, int lo, int hi, int k) {
    while (lo < hi) {
        // Random pivot
        int pivot_idx = uniform_int_distribution<int>(lo, hi)(rng);
        swap(a[pivot_idx], a[hi]);
        int pivot = a[hi];

        // Three-way partition (handles duplicates efficiently)
        int lt = lo, gt = hi;  // a[lo..lt-1] < pivot, a[gt+1..hi] > pivot
        int i = lo;
        while (i <= gt) {
            if (a[i] < pivot)       swap(a[i++], a[lt++]);
            else if (a[i] > pivot)  swap(a[i], a[gt--]);
            else                    i++;
        }
        // Now a[lt..gt] == pivot

        if (k < lt)      hi = lt - 1;
        else if (k > gt) lo = gt + 1;
        else              return a[k];  // pivot is the answer
    }
    return a[lo];
}

// Convenience: find the k-th smallest in the entire array
int kth_smallest(vector<int> a, int k) {
    // Note: pass by value to avoid modifying the original
    return quickselect(a, 0, (int)a.size() - 1, k);
}

Complexity analysis:

  • Expected comparisons: 4n (proof via linearity of expectation).
  • Each pivot lands in the middle half with probability 1/2—a "good" pivot that reduces the subarray by at least 1/4.
  • After a good pivot, remaining size 3n/4. Expected good pivots before the next recursive call: 2.
  • Total work: n+3n/4+(3/4)2n+=4n.
  • Worst case is O(n2), but the probability of exceeding cn is exponentially small.

Contest tip: for safety, shuffle the array before calling quickselect, or use nth_element from the STL, which uses introselect (quickselect with a fallback to median-of-medians for guaranteed O(n) worst case).

Treap — Randomized Binary Search Tree

A treap (tree + heap) is a BST where each node also carries a random priority. It maintains BST order on keys and max-heap order on priorities simultaneously.

Why it works: random priorities force the tree into the shape of a random BST, which has expected depth O(logn). That gives O(logn) expected time for insert, delete, split, and merge—without the complex rotations of AVL or red-black trees.

text
Treap with keys {1, 3, 5, 7, 9} and random priorities:

BST property: left < root < right (on keys)
Heap property: parent priority > child priority

              (5, pri=97)
              /           \
        (3, pri=84)     (9, pri=91)
        /               /
  (1, pri=32)     (7, pri=45)

Verify BST on keys:  1 < 3 < 5 < 7 < 9     <-- correct
Verify heap on pri:  97 > 84, 97 > 91       <-- root highest
                     84 > 32                 <-- left subtree
                     91 > 45                 <-- right subtree

Core Operations—Split and Merge:

All other treap operations reduce to two primitives:

text
SPLIT(t, key) --> (L, R)
  L contains all nodes with key <= key
  R contains all nodes with key > key

        (5,97)                   (3,84)          (5,97)
        /    \      split(4)     /               /    \
   (3,84)  (9,91)  -------->  (1,32)        (empty) (9,91)
   /        /                                        /
 (1,32)  (7,45)                L              (7,45)
                                                 R

MERGE(L, R) --> T   (all keys in L < all keys in R)
  Pick root by higher priority, recursively merge the rest.

  (3,84)       (9,91)             (9,91)
  /            /       merge      /    \
(1,32)      (7,45)  -------->  (3,84) (empty)
                               /    \
                            (1,32) (7,45)

  91 > 84, so 9 is root.
  Left of 9 = merge(L, left_child_of_9) = merge({1,3}, {7})
  84 > 45, so 3 is root of that subtree.

  Final:       (9, pri=91)
               /
          (3, pri=84)
          /         \
    (1, pri=32)  (7, pri=45)

Full C++ implementation (implicit treap with split/merge):

cpp
#include <bits/stdc++.h>
using namespace std;

mt19937 rng_treap(chrono::steady_clock::now().time_since_epoch().count());

struct Node {
    int key, priority, size;
    Node *left, *right;
    Node(int k) : key(k), priority(rng_treap()), size(1),
                  left(nullptr), right(nullptr) {}
};

int sz(Node* t) { return t ? t->size : 0; }

void update(Node* t) {
    if (t) t->size = 1 + sz(t->left) + sz(t->right);
}

// Split treap t into L (keys <= key) and R (keys > key)
void split(Node* t, int key, Node*& L, Node*& R) {
    if (!t) { L = R = nullptr; return; }
    if (t->key <= key) {
        split(t->right, key, t->right, R);
        L = t;
    } else {
        split(t->left, key, L, t->left);
        R = t;
    }
    update(t);
}

// Merge two treaps L and R (all keys in L < all keys in R)
Node* merge(Node* L, Node* R) {
    if (!L || !R) return L ? L : R;
    if (L->priority > R->priority) {
        L->right = merge(L->right, R);
        update(L);
        return L;
    } else {
        R->left = merge(L, R->left);
        update(R);
        return R;
    }
}

// Insert key into treap rooted at t
void insert(Node*& t, int key) {
    Node *L, *R;
    split(t, key, L, R);
    t = merge(merge(L, new Node(key)), R);
}

// Erase one occurrence of key from treap rooted at t
void erase(Node*& t, int key) {
    if (!t) return;
    if (t->key == key) {
        Node* tmp = merge(t->left, t->right);
        delete t;
        t = tmp;
    } else if (key < t->key) {
        erase(t->left, key);
    } else {
        erase(t->right, key);
    }
    update(t);
}

// Find k-th smallest (0-indexed)
int kth(Node* t, int k) {
    int left_size = sz(t->left);
    if (k < left_size) return kth(t->left, k);
    if (k == left_size) return t->key;
    return kth(t->right, k - left_size - 1);
}

// Count elements strictly less than key
int count_less(Node* t, int key) {
    if (!t) return 0;
    if (t->key < key) return 1 + sz(t->left) + count_less(t->right, key);
    return count_less(t->left, key);
}

Why Treap over std::set?

std::set is fine for insert/find/erase, but it cannot:

  • Find the k-th smallest element in O(logn)
  • Count elements in a range in O(logn)
  • Split and merge in O(logn)—essential for interval operations
  • Serve as a sequence with split/merge at arbitrary positions (implicit treap)
text
Operation comparison:

+----------------------------------------------------------+
| Operation         | std::set   | Treap      | Notes      |
+----------------------------------------------------------+
| Insert/Delete     | O(log n)   | O(log n)*  | *expected  |
| Find k-th element | O(n)       | O(log n)*  | treap wins |
| Split at key      | O(n)       | O(log n)*  | treap wins |
| Merge two sets    | O(n log n) | O(log n)*  | treap wins |
| Count in range    | O(n)       | O(log n)*  | treap wins |
+----------------------------------------------------------+
*expected time with random priorities

Cross-reference: Treaps combine ideas from Sorting and Searching (BST property) and randomization (heap with random priorities).

Reservoir Sampling & Fisher-Yates Shuffle

Reservoir Sampling (Algorithm R)

Problem: given a stream of n elements (where n may be unknown or too large to store), select k elements uniformly at random.

Maintain a "reservoir" of k items. For the i-th element (i>k), include it with probability k/i, replacing a randomly chosen existing element.

text
Stream: a, b, c, d, e, f, g    (k=3)

Step 1: reservoir = [a, b, c]              (first k items)
Step 2: i=4, include d with prob 3/4=0.75
        Say YES -> replace random slot -> [a, d, c]
Step 3: i=5, include e with prob 3/5=0.60
        Say NO  -> reservoir unchanged  -> [a, d, c]
Step 4: i=6, include f with prob 3/6=0.50
        Say YES -> replace random slot -> [f, d, c]
Step 5: i=7, include g with prob 3/7=0.43
        Say NO  -> reservoir unchanged  -> [f, d, c]

Each of the 7 elements had exactly 3/7 probability of being in the final set.

Full C++ implementation:

cpp
#include <bits/stdc++.h>
using namespace std;

mt19937 rng_reservoir(chrono::steady_clock::now().time_since_epoch().count());

// Reservoir sampling: select k elements uniformly from stream
vector<int> reservoir_sample(vector<int>& stream, int k) {
    int n = stream.size();
    vector<int> reservoir(stream.begin(), stream.begin() + min(k, n));

    for (int i = k; i < n; i++) {
        int j = uniform_int_distribution<int>(0, i)(rng_reservoir);
        if (j < k) {
            reservoir[j] = stream[i];
        }
    }
    return reservoir;
}

Correctness proof sketch: by induction. After processing i elements, each of the first i is in the reservoir with probability k/i.

  • Base case (i=k): all k elements are in the reservoir, P=k/k=1.
  • Inductive step: element i+1 enters with probability k/(i+1). An existing element survives if either: (a) the new element is not chosen: probability (i+1k)/(i+1), or (b) the new element is chosen but replaces a different slot: probability k/(i+1)(k1)/k=(k1)/(i+1). Total survival probability: (i+1k)/(i+1)+(k1)/(i+1)=i/(i+1). By the inductive hypothesis, the element was present with probability k/i. New probability: (k/i)(i/(i+1))=k/(i+1). ∎

Fisher-Yates Shuffle (Knuth Shuffle)

Problem: generate a uniformly random permutation of an array in O(n) time.

cpp
void fisher_yates(vector<int>& a) {
    int n = a.size();
    for (int i = n - 1; i > 0; i--) {
        int j = uniform_int_distribution<int>(0, i)(rng);
        swap(a[i], a[j]);
    }
}

Correctness: at step i, the element placed at position i is chosen uniformly from i+1 candidates. Total distinct outcomes: n(n1)1=n!, and each permutation corresponds to exactly one sequence of choices—so all n! are equally likely.

text
Fisher-Yates on [A, B, C, D]:

Step i=3: swap a[3] with a[rand(0..3)]
  4 choices: [A,B,C,D] [A,B,D,C] [A,D,C,B] [D,B,C,A]
             (swap w/0) (swap w/1) (swap w/2) (swap w/3=noop)

Step i=2: swap a[2] with a[rand(0..2)]
  3 choices for each of the 4 above = 12 states

Step i=1: swap a[1] with a[rand(0..1)]
  2 choices for each = 24 states = 4! permutations  <-- all equally likely

Warning: random_shuffle was deprecated in C++14 and removed in C++17. Always use shuffle(a.begin(), a.end(), rng) instead.

Anti-Hash Attack — Why Random Bases Are Essential

This section digs into the mechanics of hash attacks, complementing the overview in the Anti-Hack Strategies section.

How Fixed Bases Get Hacked

With a fixed hash base (e.g., base = 31, mod = 1e9+7), an attacker can reverse-engineer collisions two ways:

  1. Polynomial root attack: a collision between strings s1s2 means h(s1)h(s2)0(modM)—a polynomial of degree L in b, with at most L roots mod M. Since b is known, the attacker constructs strings that are roots of that polynomial.

  2. Thue-Morse attack: strings built from the Thue-Morse sequence produce collisions for any base against a given modulus. This is the method behind the well-known unordered_map hack on Codeforces.

text
Attack on fixed base=31, mod=10^9+7:

Attacker knows: hash("ab") = 31*97 + 98 = 3105 (mod 10^9+7)
Attacker constructs s2 with hash("s2") = 3105 (mod 10^9+7)
  --> guaranteed collision!

With random base chosen at RUNTIME:
Attacker prepares collision for base=31... but you picked base=7392841.
  --> collision probability = L / (2^61 - 1) ~ 10^{-13}

  +-------------------------------------------------------+
  |  "A fixed base is a lock whose key is publicly known.  |
  |   A random base is a lock whose key is chosen after    |
  |   the attacker has already left."                      |
  +-------------------------------------------------------+

Birthday Paradox Collision Probability Table

How many hash comparisons Q until you expect a collision?

text
+----------------------------------------------------------------+
| Hash space       | Size           | Q for 50%    | Q for safe  |
|                  |                | collision    | (< 10^{-6}) |
+----------------------------------------------------------------+
| mod 10^9+7       | ~10^9          | ~31,623      | ~44         |
| mod 10^9+9       | ~10^9          | ~31,623      | ~44         |
| mod 2^32         | ~4*10^9        | ~65,536      | ~92         |
| mod 2^61-1       | ~2.3*10^18     | ~1.5*10^9    | ~2,147      |
| Double hash      | ~5.3*10^36     | ~2.3*10^18   | ~3*10^15    |
| (2^61-1)^2       |                |              |             |
+----------------------------------------------------------------+
  "Q for safe" = max Q such that collision prob < 10^{-6}
  Formula: Q < sqrt(2 * N * 10^{-6})

Practical rules for Codeforces:

  1. Never use mod = 10^9 + 7 alone with >100 hash comparisons.
  2. mod = 2^{61} - 1 is the safe default—good up to ~109 comparisons.
  3. For interactive or hacking-enabled problems, use double hash.
  4. Always choose the base randomly at runtime.

Cross-reference: see String Hashing for the full polynomial hashing tutorial, and Hash Maps for unordered_map anti-hack patterns.


The Birthday Paradox and Collision Analysis

Among k items drawn uniformly from a set of size N, the collision probability is approximately:

P(collision)1ek2/(2N)

A collision becomes likely (>50%) when kN.

Implications for competitive programming:

Hash space size NCollision likely after k values
109+731,623 hashes
23265,536 hashes
26111.52×109
2644.29×109

Rule of thumb: if you are making Q hash comparisons, you need NQ2. For Q=106 comparisons that means N1012—so 109+7 is not safe, but 2611 is.


Anti-Hack Strategies for Codeforces

Strategy 1: Shuffle the Input

If your algorithm depends on element order—BST insertion, quicksort, convex hull trick ordering—shuffle first:

cpp
shuffle(a.begin(), a.end(), rng);

Strategy 2: Randomize Hash Parameters

Never hardcode hash bases or moduli—choose them at runtime:

cpp
const long long MOD = (1LL << 61) - 1;
const long long BASE = rand_ll(256, MOD - 1);

Strategy 3: Randomize Map/Set Hash

unordered_map on Codeforces is vulnerable to collision attacks via the known default hash. Use a custom hash with a random component:

cpp
struct SafeHash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

unordered_map<long long, int, SafeHash> safe_map;

Strategy 4: Random Perturbation for Geometry

Floating-point geometry problems can fail on degenerate cases—collinear points, overlapping coordinates. A tiny random perturbation resolves them:

cpp
// Perturb points slightly to avoid degeneracies
for (auto& [x, y] : points) {
    x += uniform_real_distribution<double>(-1e-9, 1e-9)(rng);
    y += uniform_real_distribution<double>(-1e-9, 1e-9)(rng);
}

Strategy 5: Multiple Random Runs

When a single run might fail, repeat and take the best result:

cpp
int best = INT_MAX;
for (int attempt = 0; attempt < 30; attempt++) {
    shuffle(a.begin(), a.end(), rng);
    int result = solve(a);
    best = min(best, result);
}
cout << best << "\n";

Common Patterns

Random Hashing for String Comparison

Problem shape: compare Q pairs of substrings for equality.

Build a rolling hash with a random base. Each comparison is O(1) after O(n) preprocessing. Use the RollingHash struct from the Randomized Hashing section.

Random Sampling for Witness Finding

Problem shape: find an element satisfying property P, where n/k elements satisfy P.

Sample uniformly at random and verify. Expected attempts: k. Useful for:

  • Finding a vertex in the minimum cut (sample O(n) vertices)
  • Finding a color that appears frequently
  • Finding a random edge in a dense subgraph

Birthday Paradox for Collision Detection

Problem shape: find two elements that share a property (e.g., same hash, same remainder).

Generate O(N) random elements and check all pairs. The birthday paradox guarantees a collision is likely. Applications:

  • Pollard's rho factorization
  • Finding cycles in iterated functions
  • Collision search in meet-in-the-middle

Randomized Reduction to Simpler Problem

Problem shape: a complex problem that becomes tractable for random inputs.

Randomly transform the input to destroy adversarial structure, then solve the expected easy case. Examples:

  • Random coordinate rotation to avoid axis-aligned edge cases in geometry
  • Random edge contraction for minimum cut (Karger's algorithm)
  • Random XOR basis for linear algebra over GF(2)

Complexity Analysis

Each randomized technique trades a deterministic guarantee for a probabilistic one. The table below shows both the time cost and the error bound you accept.

TechniqueTime ComplexityError Probability
Random quicksortO(nlogn) expectedN/A (always correct)
Random quickselectO(n) expectedN/A (always correct)
Polynomial hash checkO(n) per queryL/p per comparison
Miller-Rabin (12 bases)O(log2n)0 for n<3.3×1023
Freivalds' (k rounds)O(kn2)2k
Random sampling (1/c fraction)O(cTverify) expectedTunable

Las Vegas vs. Monte Carlo:

  • Las Vegas: always correct, expected runtime bound—randomized quicksort, quickselect, random sampling with verification.
  • Monte Carlo: always fast, bounded error probability—Miller-Rabin (with random witnesses), Freivalds' algorithm, randomized hashing.

Monte Carlo solutions are accepted in competitive programming when the error probability is below 106 per test case.


Edge Cases and Pitfalls

Pitfall 1: Using rand() Instead of mt19937

rand() has too many problems to count: 15-bit range on many platforms, poor distribution, global state, and slow speed. Always use mt19937.

Pitfall 2: Fixed Seeds

cpp
// BAD: Hacker can replicate your random sequence
mt19937 rng(42);

// GOOD: Unpredictable seed
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());

Pitfall 3: Insufficient Hash Space

Using mod = 10^9 + 7 with 105 hash comparisons gives collision probability 105/109=104 per comparison. Over 105 comparisons, expected collisions 10. Switch to 2611 or double hashing.

Pitfall 4: Not Enough Iterations

For Monte Carlo algorithms, always compute the required repetitions. If each attempt succeeds with probability p, you need k=ln(106)/ln(1p) attempts for 106 failure probability.

Success prob pAttempts for <106 failure
1/220
1/330
1/10132
1/1001,382

Pitfall 5: Modular Arithmetic Overflow

When MOD is near 261, the intermediate product of a * b overflows 64-bit. Use __int128:

cpp
long long safe_mul(long long a, long long b, long long mod) {
    return (__int128)a * b % mod;
}

Pitfall 6: Biased Random Ranges

cpp
// BAD: biased if RAND_MAX+1 is not divisible by n
int x = rand() % n;

// GOOD: uniform distribution
int x = uniform_int_distribution<int>(0, n - 1)(rng);

Contest Cheat Sheet

text
+----------------------------------------------------------------------+
|                    RANDOMIZATION CHEAT SHEET                         |
+----------------------------------------------------------------------+
| SETUP (paste at top of every solution):                              |
|                                                                      |
|   mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
|                                                                      |
|   int rand_int(int a, int b) {                                      |
|       return uniform_int_distribution<int>(a, b)(rng);              |
|   }                                                                  |
+----------------------------------------------------------------------+
| ANTI-HACK PATTERNS:                                                  |
|                                                                      |
|   Adversarial order:   shuffle(a.begin(), a.end(), rng);            |
|   Hash collision:      base = rand_int(256, MOD - 1);               |
|   Map blowup:          unordered_map<K, V, SafeHash> mp;            |
+----------------------------------------------------------------------+
| COMMON TECHNIQUES:                                                   |
|                                                                      |
|   Random sampling: pick random element, check property               |
|     -> if >= 1/k of elements satisfy, expected k attempts            |
|     -> 30 attempts gives < 10^{-6} failure for k = 2                |
|                                                                      |
|   Birthday paradox: O(sqrt(N)) samples -> likely collision in N      |
|     -> mod 10^9+7: collisions after ~sqrt(10^9) ~= 31623            |
|     -> mod 2^61-1: safe up to ~sqrt(2^61) ~= 1.5x10^9              |
|                                                                      |
|   Monte Carlo: repeat k times for error <= 2^{-k}                   |
|     -> k = 20 gives error < 10^{-6} (when each round p >= 1/2)     |
+----------------------------------------------------------------------+
| KEY COMPLEXITIES:                                                    |
|                                                                      |
|   Randomized quicksort:   O(n log n) expected, always correct       |
|   Randomized quickselect: O(n) expected, always correct             |
|   nth_element (STL):      O(n) expected                             |
|   Miller-Rabin (12 bases): O(log^2 n), exact for n < 3.3x10^23     |
|   Treap (split/merge):    O(log n) expected per operation           |
|   Poly hash check:        O(n), error <= L/p per comparison         |
|   Freivalds' (k rounds):  O(k*n^2), error <= 2^{-k}               |
+----------------------------------------------------------------------+
| SAFE HASH FOR unordered_map (splitmix64):                            |
|                                                                      |
|   struct SafeHash {                                                  |
|     static uint64_t splitmix64(uint64_t x) {                        |
|       x += 0x9e3779b97f4a7c15;                                      |
|       x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;                   |
|       x = (x ^ (x >> 27)) * 0x94d049bb133111eb;                   |
|       return x ^ (x >> 31);                                         |
|     }                                                                |
|     size_t operator()(uint64_t x) const {                            |
|       static const uint64_t FIXED_RANDOM =                           |
|         chrono::steady_clock::now().time_since_epoch().count();      |
|       return splitmix64(x + FIXED_RANDOM);                           |
|     }                                                                |
|   };                                                                 |
|   // Usage: unordered_map<long long, int, SafeHash> mp;              |
+----------------------------------------------------------------------+
| LAS VEGAS vs MONTE CARLO:                                            |
|                                                                      |
|   Las Vegas:  always correct, random runtime                        |
|               (quicksort, quickselect, treap)                        |
|   Monte Carlo: always fast, bounded error probability               |
|               (Miller-Rabin, Freivalds', hash comparison)            |
+----------------------------------------------------------------------+

Self-Test

Verify you can do all of these without looking at the notes:

  • [ ] Write the three-line mt19937 seeding pattern plus a rand_int(a, b) helper from memory. Explain why chrono seeding matters on Codeforces.
  • [ ] Given 105 hash comparisons using mod 109+7, estimate the collision probability via the birthday paradox. State why switching to mod 2611 or double hashing fixes it.
  • [ ] Distinguish Las Vegas from Monte Carlo: give one example of each from this chapter and state precisely what guarantee each provides (correctness vs runtime).
  • [ ] State the birthday paradox threshold: how many random samples from a space of size N before a collision is expected? Write the formula.
  • [ ] Write SafeHash (the splitmix64 variant) for unordered_map from memory. Explain what attack it defends against and why the default std::hash<int> is vulnerable.
  • [ ] A problem says "n/3 of elements satisfy property P." How many random samples are needed to find one with probability >1106? Show the calculation.
  • [ ] Explain why rng() % n is biased and what to use instead.
  • [ ] State the expected number of comparisons in randomized quicksort and why the worst case O(n2) is astronomically unlikely.

Practice Problems

#ProblemKey TechniqueDifficulty
1CF 364D - GhdRandom sampling + GCDCF 2000
2CF 869E - The Untended Antiquity2D hashing with random valuesCF 2100
3CF 1746F - KazaeeRandomized parity checkCF 2100
4CF 585F - Digits of Number PiAho-Corasick + random hashingCF 2100
5CF 1178F2 - Long Colorful StripRandom shuffle + brute forceCF 1900
6CF 1523D - Love-HateRandom sampling + SOS DPCF 2000
7CF 1477C - Nezzar and Nice BeatmapRandom permutation constructionCF 1400
8CF 727F - Polycarp's ProblemsRandomized binary searchCF 1600
9CF 1354D - MultisetBinary search + anti-hack shuffleCF 1400
10CF 1514D - Cut and StickRandom sampling for majority elementCF 1600
11PlaylistSliding window with hash set for distinct elementsCSES 1400
12String MatchingCan solve with randomized polynomial hashingCSES 1500

Suggested Progression

  1. Start: #7, #9 (direct application of shuffle / random selection)
  2. Build: #8, #10 (random sampling with verification)
  3. Strengthen: #1, #6 (random sampling + complex verification)
  4. Master: #2, #3 (randomized hashing in advanced settings)

Probabilistic Method in Competitive Programming

The probabilistic method proves that an object with a desired property exists by showing that a random construction achieves it with positive probability. In CP this translates directly: randomly generate a solution and it is likely to be valid.

Random Coloring Example

Problem: given a graph with m edges, find a 2-coloring that cuts at least m/2 edges (an edge is "cut" if its endpoints get different colors).

Color each vertex independently, each color with probability 1/2. Each edge is cut with probability 1/2, so by linearity of expectation the expected number of cut edges is m/2. Since the expectation equals m/2, some coloring must achieve at least m/2 cuts. A random coloring lands there in expectation, and repeating 30 times makes it overwhelmingly likely.

cpp
// Random 2-coloring that cuts >= m/2 edges in expectation
// Repeat a few times, take the best
int best_cut = 0;
vector<int> best_color;

for (int attempt = 0; attempt < 30; attempt++) {
    vector<int> color(n);
    for (int i = 0; i < n; i++) color[i] = rng() & 1;

    int cut = 0;
    for (auto& [u, v] : edges)
        cut += (color[u] != color[v]);

    if (cut > best_cut) {
        best_cut = cut;
        best_color = color;
    }
}
// best_cut is very likely >= m/2
text
Random 2-coloring on a 6-vertex graph:

  (W)----(B)          W = white, B = black
  / \    / \          Edges between different colors = CUT
(B) (W)(B) (W)       Edges between same colors = UNCUT
  \  |  |  /
   (B)--(W)           Cut edges: 7 out of 10  (>= m/2 = 5)  OK

If I Had 5 Minutes

Randomization doesn't make your algorithm lucky—it makes the adversary blind. By injecting randomness into your decisions, you transform worst-case inputs into average-case inputs: no fixed input can be consistently bad against unpredictable choices. Think of it as shuffling a deck before dealing—once the cards are randomized, any stacked arrangement is neutralized.

The mnemonic to burn in: SHUFFLE—SEED—SPACE—the three S's of competitive-programming randomization:

  • Shuffle the input to destroy adversarial ordering
  • Seed with chrono to prevent replication across submissions
  • Space must be large (mod2611) to survive the birthday paradox

Last-minute review before a contest:

  1. Seed: mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
  2. Shuffle: shuffle(a.begin(), a.end(), rng);—kills adversarial order
  3. Safe hash: random base + mod 2611—kills collision attacks
  4. Safe map: SafeHash with splitmix64 + FIXED_RANDOM—kills unordered_map hacks
  5. Repeat: for Monte Carlo, 20–30 iterations gives error <106

Pattern Fingerprint

Constraint PatternTechniqueWhy
"Find any valid answer" + large nRandom sampling + verifyExpected O(1) attempts if fraction is constant
String comparison, Q>104 queriesRandom-base polynomial hashPrevents adversarial collision construction
unordered_map with n>104SafeHash (splitmix64 + random salt)Default hash is publicly known and hackable
Sorted/structured input + TLE riskshuffle() before processingDestroys adversarial ordering
k-th element, n>106Randomized quickselect / nth_elementExpected O(n) vs O(nlogn) sort
Dynamic ordered set with k-th queryTreap with split/mergeO(logn) expected for all operations
Primality test, n1018Miller-Rabin with 12 fixed witnessesO(log2n), deterministic for 64-bit
Matrix product verification AB=CFreivalds' algorithmO(kn2) vs O(n3) multiplication
"Majority element in subarray"Random sampling + count>1/2 chance per sample, repeat 30x
Interactive / hacking-enabled problemDouble hash + chrono seedMaximum protection against adversary

Rating Progression

CF 1200 (Beginner)

text
+----------------------------------------------------------+
| What you'll see:                                         |
|   - Problems solvable with sort + greedy                 |
|   - Randomization not required, but shuffle() can save   |
|     you from adversarial TLE on quicksort-based ideas    |
| Key skill: know how to seed mt19937 and call shuffle()   |
+----------------------------------------------------------+

CF 1500 (Intermediate)

text
+----------------------------------------------------------+
| What you'll see:                                         |
|   - Hashing problems where random base prevents hacks    |
|   - "Find any valid element" via random sampling         |
|   - unordered_map TLE from collision attacks              |
| Key skill: SafeHash, random-base polynomial hashing,     |
|            random sampling for majority/witness finding   |
+----------------------------------------------------------+

CF 1800 (Advanced)

text
+----------------------------------------------------------+
| What you'll see:                                         |
|   - Treap as ordered set with k-th element queries       |
|   - Miller-Rabin for large primality checks              |
|   - Randomized algorithms as the intended solution       |
| Key skill: treap split/merge, quickselect, Freivalds',   |
|            probability analysis for error bounds          |
+----------------------------------------------------------+

CF 2100 (Expert)

text
+----------------------------------------------------------+
| What you'll see:                                         |
|   - Probabilistic method arguments (random coloring)     |
|   - Pollard rho factorization                            |
|   - Randomized complexity reductions                     |
|   - Problems where randomization IS the editorial        |
| Key skill: designing custom randomized algorithms,       |
|            analyzing expected value and error probability |
+----------------------------------------------------------+

Before You Code Checklist

Before writing a randomized solution, verify:

  • [ ] Seed is unpredictable: using chrono::steady_clock, not a fixed constant
  • [ ] Distribution is uniform: using uniform_int_distribution, not rng() % n
  • [ ] Enough iterations: for Monte Carlo, calculate k needed for <106 error
  • [ ] Hash space is large enough: NQ2 where Q = number of comparisons
  • [ ] Solution is testable: you can stress-test against a brute-force solution

What Would You Do If...?

Scenario 1: your unordered_map solution passes on your machine but gets TLE on Codeforces with n=2×105.

The default hash is being attacked. The attacker constructed keys that all land in the same bucket, turning O(1) lookups into O(n). Replace with SafeHash (Strategy 3 in the Anti-Hack Strategies section).

Scenario 2: your rolling hash solution gets WA on test 42, but passes all other tests including stress tests with n=106.

Likely a hash collision. The birthday paradox says 105 comparisons against a space of 109+7 is already in dangerous territory. Switch to double hash with 2611.

Scenario 3: you need the median of 107 elements in under 1 second, but sorting takes 1.5 seconds.

Use randomized quickselect (O(n) expected) or nth_element from the STL. Don't sort the full array when you only need one order statistic.


Historical Note

The systematic study of randomized algorithms began with Michael Rabin's 1976 paper introducing the Miller-Rabin primality test and randomized closest-pair algorithms. László Babai coined the term "Las Vegas algorithm" in 1979. Cecilia Aragon and Raimund Seidel invented treaps in 1989, elegantly fusing two classical data structures with a dash of randomness.


The Mistake That Teaches

The Debugging Story: "Why Does My Hash Keep Colliding?"

Alice writes a string matching solution for a Codeforces round. She uses polynomial hashing with base = 31 and mod = 1e9 + 7. Her solution passes pretests. During hacking, Bob submits a test with 200,000 short strings.

Alice's solution gets Wrong Answer.

She's confused—her hash logic is correct. She tests locally with random strings: no issues. She tests with Bob's input: collisions everywhere.

What happened? Bob ran a script that generates strings with identical hashes for base 31 mod 109+7. Because Alice's base was hardcoded, Bob precomputed the collision strings offline.

The fix (three characters changed):

cpp
// BEFORE (hackable):
const long long BASE = 31;

// AFTER (safe):
const long long BASE = rand_ll(256, MOD - 1);

The lesson: a hardcoded base is a publicly visible lock combination. A runtime-random base is a lock that changes with every submission. The adversary arrives with the old key and finds it doesn't fit.

text
Alice's timeline:
  t=0    Submit with BASE=31         --> AC on pretests
  t=30   Bob downloads her code, sees BASE=31
  t=31   Bob generates collision strings for BASE=31
  t=32   Bob hacks Alice              --> WA

Alice v2:
  t=0    Submit with BASE=rand_ll()  --> AC on pretests
  t=30   Bob downloads her code, sees BASE=rand_ll()
  t=31   Bob: "I can't predict the base..."  --> hack FAILS

When NOT to Use Randomization

Randomization is powerful, but not always the right call:

  1. When deterministic O(nlogn) exists and suffices: don't add complexity. std::sort is already introsort—quicksort with a heapsort fallback—giving guaranteed O(nlogn).

  2. When the problem requires exact answers with a proof: some problems, especially interactive ones, need deterministic guarantees. A Monte Carlo approach with 106 error probability may fail on the one test case it gets wrong.

  3. When n is small enough for brute force: randomization adds code complexity. If n20, just enumerate.

  4. When the random approach has high constant factors: mt19937 is not free. In ultra-tight time limit problems (n=107, TL = 1s), the overhead of random number generation in every operation can tip the balance.

  5. When you cannot verify the answer: Las Vegas algorithms need a cheap verification step. If verification is as expensive as the computation itself, random restarts buy nothing.


Debug This

Exercise 1: this shuffle is biased. Why?

cpp
// BUGGY: biased shuffle
void bad_shuffle(vector<int>& a) {
    int n = a.size();
    for (int i = 0; i < n; i++) {
        int j = uniform_int_distribution<int>(0, n - 1)(rng);
        swap(a[i], a[j]);
    }
}
Answer

The range of j should be [i, n-1] (or [0, i]), not [0, n-1]. With [0, n-1], there are nn equally likely outcomes but only n! permutations. Since nn is not divisible by n! for n>2, some permutations are more likely than others. The correct Fisher-Yates shuffle picks j from [0, i] (backward) or [i, n-1] (forward).

Exercise 2: this hash produces collisions even with random base. Why?

cpp
// BUGGY: broken hash
long long hash_string(const string& s) {
    long long h = 0;
    long long base = rand_ll(2, 1000000006);
    long long mod = 1000000007;
    for (char c : s) {
        h = h * base + c;
        h %= mod;
    }
    return h;
}
Answer

The base is re-randomized on every call! Two calls to hash_string on the same string will likely produce different hashes because each call picks a new random base. The base must be generated ONCE (globally or in a struct constructor) and reused for all hash computations in the same run.

Exercise 3: this reservoir sampling has off-by-one. Where?

cpp
// BUGGY: off-by-one in reservoir sampling
vector<int> sample(vector<int>& stream, int k) {
    vector<int> res(stream.begin(), stream.begin() + k);
    for (int i = k; i < (int)stream.size(); i++) {
        int j = uniform_int_distribution<int>(0, i - 1)(rng);
        if (j < k) res[j] = stream[i];
    }
    return res;
}
Answer

The distribution should be uniform_int_distribution<int>(0, i), not (0, i - 1). With (0, i-1), the i-th element (0-indexed) is included with probability k/i instead of the correct k/(i+1). This makes earlier elements underrepresented and later elements overrepresented.

Exercise 4: this treap merge has a silent bug that corrupts the tree. What is it?

cpp
// BUGGY: treap merge
Node* merge(Node* L, Node* R) {
    if (!L) return R;
    if (!R) return L;
    if (L->priority > R->priority) {
        L->right = merge(L->right, R);
        return L;
    } else {
        R->left = merge(L, R->left);
        return R;
    }
}
Answer

The update() call is missing! After modifying L->right or R->left, the size field is stale. Any subsequent call to sz() or kth() will return wrong results. The fix: add update(L); return L; and update(R); return R; respectively.


Cross-References

  • String Hashing: full treatment of polynomial hashing, rolling hash, and substring comparison. The random-base technique from the Randomized Hashing section is applied there for anti-hack safety.

  • Hash Maps and Unordered Maps: explains why unordered_map is vulnerable and how SafeHash (Anti-Hack Strategies) prevents collision attacks. Essential reading for Codeforces contestants.

  • Sorting and Searching: covers deterministic sorting algorithms. Randomized quicksort and quickselect above are randomized extensions of the partition-based techniques discussed there.


Extended Practice Problems

#ProblemKey TechniqueDifficulty
13CF 1523D - Love-HateRandom sampling + SOS DPCF 2000
14CF 896C - Willem, Chtholly and SenioriousChtholly tree (randomized interval merging)CF 2100
15CF 1763C - Another Array ProblemRandom observation for small casesCF 1400
16CF 1154G - Minimum Possible LCMRandom sampling + GCD tricksCF 1900
17CF 1081G - Mergesort Strikes BackExpected value of random permutationsCF 2400
18CF 1705E - Mark and ProfessorRandomized hashing on bitsetsCF 2100
19CF 1554D - DianeHash-based string constructionCF 1800
20CF 1142B - Lynyrd SkynyrdRandom sampling for cycle detectionCF 1800

Extended Progression

  1. Warm-up (CF 1200–1400): #7, #9, #15—shuffle, basic random selection
  2. Foundation (CF 1400–1600): #8, #10, #19—sampling, hashing patterns
  3. Intermediate (CF 1600–1900): #1, #5, #16, #20—sampling + complex logic
  4. Advanced (CF 1900–2100): #2, #3, #6, #14, #18—advanced randomized DS
  5. Expert (CF 2100+): #4, #13, #17—deep probability + randomized analysis

Further Reading


Next Up

Continue to: Binary Lifting and LCA for techniques where preprocessing meets tree structure, or return to Sorting and Searching to solidify the deterministic foundations that randomization builds upon.


Appendix: Quick Reference

cpp
// --- Paste this block at the top of your solution ---
#include <bits/stdc++.h>
using namespace std;

mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
mt19937_64 rng64(chrono::steady_clock::now().time_since_epoch().count());

int rand_int(int a, int b) {
    return uniform_int_distribution<int>(a, b)(rng);
}

long long rand_ll(long long a, long long b) {
    return uniform_int_distribution<long long>(a, b)(rng64);
}

struct SafeHash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};
// --- End of randomization toolkit ---

Built for the climb from Codeforces 1100 to 2100.