String Hashing

AL-34 | Map strings to integers via polynomial hashing -- compare substrings in $O (1)$ after $O (n)$ preprocessing. A pragmatic comparator that wins most contest string problems, as long as you remember it is a fingerprint, not a proof.

Difficulty: [Intermediate]Prerequisites: Arrays and Strings, Math FundamentalsCF Rating Range: 1400 -- 1800

Contest Frequency: ** Regular -- common in string comparison problems

Quick Revisit
USE WHEN: fast substring comparison, counting distinct substrings, or string matching
INVARIANT: hash(s[l..r]) computed in O(1) from prefix hashes and precomputed powers
TIME: O(n) preprocess, O(1) per query | SPACE: O(n)
CLASSIC BUG: single hash causes collisions -- always use double hashing (two mod values)
PRACTICE: Strings Practice Ladder

Intuition
C++ STL Reference
Implementation (Contest-Ready)
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging
Self-Test
Practice Problems
Further Reading

Intuition

Suppose you want to check whether a pattern $p$ of length $m = 5$ appears inside a text $s$ of length $n = 20$ .

The naive approach: slide $p$ over every position $i$ in $s$ and compare character by character.

  s: [a b c a b c d a b c a b d a b c a b c a]    n = 20
  p: [a b c a b]                                    m = 5

  Position 0:  compare s[0..4] vs p[0..4]  ->  5 comparisons
  Position 1:  compare s[1..5] vs p[0..4]  ->  up to 5 comparisons
  ...
  Position 15: compare s[15..19] vs p[0..4] ->  up to 5 comparisons
  ---------------------------------------------------------------
  Total positions: n - m + 1 = 16
  Worst-case comparisons: 16 * 5 = 80 = O(n * m)

For $n = 10^{6}$ and $m = 10^{3}$ , that is $10^{9}$ character comparisons -- too slow. We need a way to compare entire substrings in $O (1)$ .

Convert each substring to a single number (its hash) so that comparing two substrings takes $O (1)$ instead of $O (m)$ -- and use a polynomial rolling hash to compute all $n - m + 1$ substring hashes in $O (n)$ total.

Analogy: Imagine you have a phone book with millions of names. Comparing two names letter-by-letter is slow. But if every name had a unique phone number, you could compare two numbers in a single step. A polynomial hash is that phone number -- it compresses an arbitrary-length string into a fixed-size integer. Two different strings might share a number (collision), but with a good hash this is astronomically unlikely.

Visual walkthrough

We hash the string $s$ = "abcab" using base $b = 31$ and character values $a = 1, b = 2, c = 3$ . Think of each string as a number in base $b$ .

Step 1 -- Build prefix hashes left to right.

Each prefix hash absorbs one more character: $h [i + 1] = h [i] \cdot b + s [i]$ .

  i:       0       1       2       3       4
  s[i]:    a(1)    b(2)    c(3)    a(1)    b(2)

  h[0] = 0
  h[1] = h[0]*31 + 1  =  0 + 1       =  1
  h[2] = h[1]*31 + 2  =  31 + 2      =  33
  h[3] = h[2]*31 + 3  =  1023 + 3    =  1026
  h[4] = h[3]*31 + 1  =  31806 + 1   =  31807
  h[5] = h[4]*31 + 2  =  986017 + 2  =  986019

Step 2 -- Extract hash of a substring in $O (1)$ via the prefix trick.

Want hash of $s [2. .4]$ = "cab". The prefix hash $h [5]$ encodes "abcab" and $h [2]$ encodes "ab". Subtract the contribution of "ab" shifted into the high digits:

  hash(s[2..4]) = h[5] - h[2] * b^3
                = 986019 - 33 * 29791
                = 986019 - 983103
                = 2916       (all mod some prime p)

Verify directly: $3 \cdot 31^{2} + 1 \cdot 31 + 2 = 2883 + 31 + 2 = 2916$ . Correct.

Step 3 -- Slide the window: rolling from $s [0. .4]$ to $s [1. .5]$ .

  hash(s[0..4]) = h[5] - h[0] * b^5 = 986019 - 0 = 986019
  hash(s[1..5]) = h[6] - h[1] * b^5

Each new window hash is one subtraction and one lookup -- $O (1)$ per position.

Step 4 -- Compare pattern hash to each window hash.

  pattern "abcab" hash = 986019
  Window i=0: hash(s[0..4]) = 986019  -> MATCH (verify chars to confirm)
  Window i=1: hash(s[1..5]) = ...     -> compare in O(1)
  Window i=2: hash(s[2..6]) = ...     -> compare in O(1)
  ...
  Total: O(n) hash comparisons instead of O(n * m) char comparisons

The invariant

hash( $s [l . . r]$ ) = ( $h [r + 1] - h [l] \cdot b^{r - l + 1}$ ) mod $p$
Every substring of $s$ maps to a number in $[0, p)$ . Two equal substrings always produce the same hash. Two different substrings may collide (same hash), but with a random base and large prime $p$ , the collision probability per pair is $\sim 1 / p$ .

Use double hashing (two independent $(b, p)$ pairs) to push collision probability to $\sim 1 / (p_{1} \cdot p_{2}) \approx 10^{- 18}$ -- effectively zero for contest constraints ( $n \leq 10^{6}$ ).

  Birthday paradox: collision probability vs. number of hashes

  Single hash (mod ~= 10⁹):
  
  Hashes:    100    1,000   10,000   30,000   100,000
  P(col):   ~0%    ~0.05%   ~5%     ~50%     ~99.9%
             ─       ─       ░░      ████     ████████
  
  Double hash (mod ~= 10¹⁸):
  
  Hashes:    100    1,000   10⁶      10⁹
  P(col):   ~0%    ~0%     ~0%      ~0.0000005%
             ─       ─       ─        ─
  
  With n = 10⁵ substrings and single hash:
    Pairs ~= 5 x 10⁹  ->  EXPECT ~5 collisions  ->  WA
  
  With double hash:
    Pairs ~= 5 x 10⁹  ->  EXPECT ~5 x 10⁻⁹ collisions  ->  SAFE

Fingerprints, not proofs

It is worth being precise about what a hash equality means.

Mathematical equality of two substrings is a property of the strings themselves: every character at every offset matches. KMP, Z, suffix arrays, and direct character comparison all establish this.
Hash equality says only that two strings produce the same fingerprint under your chosen (base, mod). Equal strings always have equal hashes; the reverse is "almost always" true under double hashing with random parameters.

Two practical consequences:

For most contest problems, double hashing is reliable enough that treating "hash equal" as "string equal" is fine. The expected number of collisions over a contest's worth of comparisons is effectively zero.
When the cost of one wrong answer is total failure -- e.g., you build a graph of "equal" substrings and run DFS, or you commit to an output based on a single equality check -- consider verifying surviving candidates by direct character comparison after the hash filter:
cpp
```
if (hs.query(l1, r1) == hs.query(l2, r2)
    && equal(s.begin() + l1, s.begin() + r1 + 1, s.begin() + l2)) {
    // confirmed equal
}
```
The verification is $O (len)$ per surviving pair, but you only pay it on hash matches, so total cost stays close to the hash-only version.

This is the core editorial takeaway: hashing is randomized contest reliability, not mathematical equality. Use it freely, but design your algorithm so a single false positive cannot silently corrupt the rest of the work.

What the Code Won't Teach You

When exact algorithms are strictly better. Hashing is probabilistic -- a collision gives a wrong answer. KMP and suffix arrays are deterministic. For problems where you string together multiple equality checks and act on them irreversibly (e.g., building a graph of equal substrings, then running DFS), a single hash collision cascades into completely wrong output. Exact algorithms don't have this failure mode. Know when the risk is acceptable.

The birthday paradox is your enemy. With a single mod $\sim 10^{9}$ , you need only $\sim \sqrt{10^{9}} \approx 30,000$ distinct hash values before a 50% chance of collision. If you're hashing all $O (n^{2})$ substrings for $n = 3000$ , that's $\sim 4.5 \times 10^{6}$ hashes -- collisions are virtually guaranteed. Double hashing ( $10^{18}$ collision resistance) is not overkill; it's the minimum for safety.

Hashing on unordered_map is double jeopardy. If you store hash values in an unordered_map, the map itself uses another hash function on your keys. A collision in the map's internal hash slows lookups to $O (n)$ ; a collision in your string hash gives wrong answers. Both can be exploited by adversarial inputs on Codeforces. Use map (slower but safe) or a custom splitmix64 hash for the container.

Defending against anti-hash tests. Codeforces allows "hacking" with adversarial inputs, and there are well-known techniques for constructing strings that collide under common hash parameters. Your defenses: (1) randomize the base and mod at runtime so attackers can't precompute collisions, (2) use double hashing with two independent (mod, base) pairs, (3) avoid the popular mods like $10^{9} + 7$ or $10^{9} + 9$ -- use large random primes instead. A hashing solution that works on pretests but dies to a hack is worse than a slower deterministic one that passes everything.

The mental model: hashing is fingerprinting, not identification. Think of a hash as a fingerprint -- if two fingerprints differ, the strings are definitely different; if they match, the strings are probably the same. This one-sided guarantee means hashing excels at disproving equality (filter out non-matches fast) but can't prove it. Design your algorithms around this asymmetry. Use hashing to narrow candidates, then verify the survivors with exact comparisons if correctness demands it.

Where hashing truly shines over heavier machinery. Suffix arrays and automata are powerful but rigid -- they process one fixed string. Hashing is the most flexible string tool you have: compare substrings of different strings, hash 2D grids or trees, combine with binary search to find longest common prefixes, or roll hashes across a sliding window. Anytime a problem involves comparing pieces from multiple sources or the structure doesn't fit neatly into suffix-array territory, hashing is often the simplest correct approach.

When to reach for this

Trigger phrases in problem statements:

"compare substrings for equality"
"pattern matching" / "find all occurrences"
"count distinct substrings"
"check if a substring is a palindrome"
"longest common substring" (combine with binary search)
"string period" / "string has a repeating pattern"

Counter-examples -- when hashing is NOT the best tool:

You need a guaranteed exact match with no false positives -> use KMP or Z-function (deterministic).
Single-pattern search where simplicity matters -> KMP may be simpler and equally fast.
You need the full suffix array or LCP array -> build a suffix array directly.
The problem involves edits/updates to the string -> hashing requires $O (n)$ rebuild (consider segment tree + hashing for point updates).

C++ STL Reference

String hashing is not provided by the STL -- you implement it yourself. These STL tools help:

Function / Class	Header	What it does	Time Complexity
`string::operator[]`	`<string>`	Access character at index (use for hash value)	$O (1)$
`string::substr(pos, len)`	`<string>`	Extract substring (avoid in hot loops -- $O (n)$ )	$O (n)$
`string::size()`	`<string>`	Length of string	$O (1)$
`mt19937 rng(chrono::steady_clock::now().time_since_epoch().count())`	`<random>`, `<chrono>`	Randomized base to defeat anti-hash tests	$O (1)$
`uniform_int_distribution<int>`	`<random>`	Pick random base in range $[l o, h i]$	$O (1)$
`__int128`	built-in (GCC)	Avoid overflow in modular multiplication	$O (1)$

Key detail: Don't use std::hash<string> for CP -- it's not portable, not invertible over substrings, and trivially hackable on Codeforces.

Implementation (Contest-Ready)

Version 1: Single Hash -- Minimal Template

cpp

#include <bits/stdc++.h>
using namespace std;

struct SingleHash {
    static const long long MOD = 1e9 + 7;
    static const long long BASE = 131;  // pick randomly in contest
    int n;
    vector<long long> h, pw;

    SingleHash(const string& s) : n(s.size()), h(n + 1), pw(n + 1) {
        pw[0] = 1;
        for (int i = 0; i < n; i++) {
            h[i + 1] = (h[i] * BASE + s[i]) % MOD;
            pw[i + 1] = pw[i] * BASE % MOD;
        }
    }

    // hash of s[l..r] (0-indexed, inclusive)
    long long query(int l, int r) {
        return (h[r + 1] - h[l] * pw[r - l + 1] % MOD + MOD) % MOD;
    }
};

int main() {
    string s;
    cin >> s;
    SingleHash sh(s);

    int q;
    cin >> q;
    while (q--) {
        int l1, r1, l2, r2;
        cin >> l1 >> r1 >> l2 >> r2;
        cout << (sh.query(l1, r1) == sh.query(l2, r2) ? "Yes" : "No") << "\n";
    }
}

Version 2: Double Hash with Substring Query -- Explained

cpp

#include <bits/stdc++.h>
using namespace std;

struct DoubleHash {
    // Two large primes far from powers of 2.
    // Randomize BASE1, BASE2 in contest to resist hacking.
    static constexpr long long MOD1 = 1e9 + 7, MOD2 = 1e9 + 9;
    long long BASE1, BASE2;
    int n;
    vector<long long> h1, h2, pw1, pw2;

    DoubleHash(const string& s, long long b1 = 131, long long b2 = 137)
        : BASE1(b1), BASE2(b2), n(s.size()),
          h1(n + 1), h2(n + 1), pw1(n + 1), pw2(n + 1) {
        pw1[0] = pw2[0] = 1;
        for (int i = 0; i < n; i++) {
            // Build prefix hashes: h[i+1] = h[i] * base + s[i]
            h1[i + 1] = (h1[i] * BASE1 + s[i]) % MOD1;
            h2[i + 1] = (h2[i] * BASE2 + s[i]) % MOD2;
            // Precompute powers of base
            pw1[i + 1] = pw1[i] * BASE1 % MOD1;
            pw2[i + 1] = pw2[i] * BASE2 % MOD2;
        }
    }

    // Returns (hash1, hash2) for substring s[l..r] (0-indexed, inclusive).
    // Formula: h[r+1] - h[l] * base^(r-l+1), taken mod.
    // The +MOD before final %MOD handles C++ negative remainder.
    pair<long long, long long> query(int l, int r) {
        int len = r - l + 1;
        long long v1 = (h1[r + 1] - h1[l] * pw1[len] % MOD1 + MOD1) % MOD1;
        long long v2 = (h2[r + 1] - h2[l] * pw2[len] % MOD2 + MOD2) % MOD2;
        return {v1, v2};
    }

    // Compare two substrings for equality in O(1).
    bool equal(int l1, int r1, int l2, int r2) {
        return query(l1, r1) == query(l2, r2);
    }
};

// Randomize bases at startup to defeat anti-hash tests.
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
long long rand_base(long long lo = 256, long long hi = 1e9) {
    return uniform_int_distribution<long long>(lo, hi)(rng);
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string s, t;
    cin >> s >> t;

    long long b1 = rand_base(), b2 = rand_base();
    DoubleHash hs(s, b1, b2), ht(t, b1, b2);

    // Example: find all occurrences of t in s (Rabin-Karp)
    int n = s.size(), m = t.size();
    auto target = ht.query(0, m - 1);
    vector<int> matches;
    for (int i = 0; i + m - 1 < n; i++) {
        if (hs.query(i, i + m - 1) == target)
            matches.push_back(i);
    }

    cout << matches.size() << "\n";
    for (int pos : matches) cout << pos + 1 << " ";
    cout << "\n";
}

Operations Reference

All operations assume the prefix hash array is already built. The $O (1)$ substring query is what makes hashing competitive with dedicated string algorithms.

Operation	Time	Space
Build prefix hashes (length $n$ )	$O (n)$	$O (n)$
Query hash of substring $s [l . . r]$	$O (1)$	--
Compare two substrings	$O (1)$	--
Rabin-Karp: find pattern $m$ in text $n$	$O (n + m)$	$O (n + m)$
Hash + binary search (e.g., longest common prefix)	$O (n \log n)$	$O (n)$
Hash all substrings of length $k$	$O (n)$	$O (n)$
Build hash for two strings	$O (n + m)$	$O (n + m)$
Rebuild hash after single-char update	$O (n)$	--

Problem Patterns & Tricks

Pattern 1: Rabin-Karp String Matching

Find all occurrences of pattern $t$ in text $s$ . Hash $t$ , then slide a window of length $| t |$ over $s$ comparing hashes.

cpp

auto target = ht.query(0, m - 1);
for (int i = 0; i + m <= n; i++)
    if (hs.query(i, i + m - 1) == target)
        // match at position i

Problems: CF 271D, CF 126B

Pattern 2: Longest Common Substring via Binary Search + Hashing

Binary search on length $k$ . For each $k$ , hash all substrings of length $k$ from both strings, check for common hash in a set.

cpp

auto check = [&](int k) -> bool {
    unordered_set<long long> seen;
    for (int i = 0; i + k <= n; i++)
        seen.insert(combine(hs.query(i, i + k - 1)));
    for (int i = 0; i + k <= m; i++)
        if (seen.count(combine(ht.query(i, i + k - 1))))
            return true;
    return false;
};
// binary search on k in [0, min(n, m)]

Total: $O ((n + m) \log min (n, m))$ .

Problems: CF 580E

Pattern 3: Palindrome Check

A substring $s [l . . r]$ is a palindrome iff its forward hash equals its reverse hash. Build hashes for both $s$ and $reverse (s)$ .

cpp

// forward hash of s[l..r]
auto fwd = hs.query(l, r);
// reverse hash: position in reversed string is (n-1-r)..(n-1-l)
auto rev = hr.query(n - 1 - r, n - 1 - l);
bool is_palindrome = (fwd == rev);

Problems: CF 7D, CF 1326D2

Pattern 4: Count Distinct Substrings

Hash every substring of every length, insert into a set. The set size is the answer.

cpp

set<pair<long long,long long>> distinct;
for (int len = 1; len <= n; len++)
    for (int i = 0; i + len <= n; i++)
        distinct.insert(hs.query(i, i + len - 1));
cout << distinct.size();

$O (n^{2} \log n)$ -- good enough for $n \leq 3000$ . For larger $n$ , use suffix array.

Problems: CF 271D, SPOJ DISUBSTR

Pattern 5: Longest Common Prefix (LCP) with Binary Search

Binary search on the length of the common prefix of $s [i . .]$ and $s [j . .]$ . Check equality via hash.

cpp

int lcp(int i, int j) {
    int lo = 0, hi = min(n - i, n - j);
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if (hs.query(i, i + mid - 1) == hs.query(j, j + mid - 1))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

$O (\log n)$ per query. Useful as a building block for string comparison, suffix array construction, etc.

Problems: CF 126B

  Binary search on LCP length using hash comparison:

  s = "a b c a b d a b c a b e"
       0 1 2 3 4 5 6 7 8 9 ...

  LCP of suffix at i=0 vs suffix at i=6:

  lo=0                          hi=6
  ├─────────────────────────────┤
          mid=3
          hash(s[0..2]) vs hash(s[6..8])
          "abc" == "abc"  ->  match! lo=3
                    mid=4
                    hash(s[0..3]) vs hash(s[6..9])
                    "abca" == "abca"  ->  match! lo=4
                              mid=5
                              hash(s[0..4]) vs hash(s[6..10])
                              "abcab" == "abcab"  ->  match! lo=5
                                        mid=5 (converged)
                                        -> LCP = 5

  Each step: O(1) hash comparison.  Total: O(log n).

Pattern 6: Hashing + DP (String Periods, Repetitions)

Use hashing to check if a string has period $p$ : $s [i] = s [i + p]$ for all valid $i$ , equivalent to checking $h (s [0. . n - p - 1]) = h (s [p . . n - 1])$ .

cpp

for (int p = 1; p <= n; p++)
    if (n % p == 0 && hs.query(0, n - p - 1) == hs.query(p, n - 1))
        // s has period p

Problems: CF 182D, CF 25E

Contest Cheat Sheet

+-------------------------------------------------------+
|              STRING HASHING CHEAT SHEET                |
+-------------------------------------------------------+
| WHEN TO USE                                           |
|   - Substring equality queries                        |
|   - Pattern matching (Rabin-Karp)                     |
|   - "Count distinct substrings" up to n ~ 3000        |
|   - Palindrome checking with reverse hash             |
|   - Binary search on string properties (LCP, LCS)    |
+-------------------------------------------------------+
| BUILD                                                 |
|   pw[0]=1; for i in 0..n-1:                           |
|     h[i+1] = (h[i]*B + s[i]) % M                     |
|     pw[i+1] = pw[i]*B % M                             |
+-------------------------------------------------------+
| QUERY s[l..r] (0-indexed, inclusive)                  |
|   (h[r+1] - h[l]*pw[r-l+1] % M + M) % M             |
+-------------------------------------------------------+
| SAFE PARAMS                                           |
|   Double hash: MOD = 1e9+7, 1e9+9                    |
|   BASE: random in [256, 1e9]                          |
|   Randomize at runtime to beat hacks                  |
+-------------------------------------------------------+
| COMPLEXITY   Build: O(n)   Query: O(1)               |
+-------------------------------------------------------+

Gotchas & Debugging

⚠️ Single hashing is NOT safe. With one hash and mod ~10^9, birthday paradox gives ~50% collision probability at ~30,000 strings. On Codeforces, adversarial tests WILL break single hashing.
Always use double hashing (two independent bases and mods):
cpp
const long long MOD1 = 1e9 + 7, MOD2 = 1e9 + 9;
const long long BASE1 = 131, BASE2 = 137;
// Store pair {hash1, hash2} -- compare both
Collision probability with double hashing: ~1/(MOD1 * MOD2) ~= 10^{-18} per pair.
Also: When computing hash differences for substrings, add MOD before taking %: (h[r] - h[l-1] * pw[r-l+1] % MOD + MOD) % MOD -- the subtraction can go negative.

1. Negative modular arithmetic. C++ % can return negative values when the dividend is negative. The expression h[r+1] - h[l] * pw[len] % MOD can go negative. Always add +MOD before the final %MOD:

cpp

(h[r+1] - h[l] * pw[len] % MOD + MOD) % MOD
//                                ^^^^

2. Operator precedence trap.h[l] * pw[len] % MOD is parsed as (h[l] * pw[len]) % MOD because * and % have the same precedence and associate left-to-right. This is correct here. But if you write h[r+1] - h[l] * pw[len] + MOD) % MOD without the inner % MOD, the multiplication overflows long long. Always reduce after each multiply.

3. Collision probability -- don't use single hash in contests with hacking. A single mod $\sim 10^{9}$ gives $\sim 10^{- 9}$ collision probability per pair. With $n \sim 10^{5}$ substrings, you make $\sim 10^{10}$ pair comparisons -- expect collisions. Use double hashing or pick $m > 10^{18}$ (via __int128).

4. Choosing bases.

Don't use BASE = 26 or BASE = 31 -- they're the first thing anti-hash tests target.
Randomize at runtime: BASE = uniform_int_distribution<long long>(256, 1e9)(rng).
Make sure BASE > 256 so all ASCII values are distinct modulo the base.

5. Off-by-one in prefix hash indexing. The prefix hash array is 1-indexed: h[0] = 0, h[i] = hash of s[0..i-1]. The substring s[l..r] uses h[r+1] and h[l]. Mixing 0-indexed and 1-indexed is the #1 source of bugs. Write a query(l, r) function and always go through it.

6. Comparing substrings from different strings. Both strings must use the same base and mod. Build both hash objects with identical parameters. If you randomize, generate the base once and pass it to both constructors.

7. Hash of empty string. By convention h[0] = 0. Querying query(l, l-1) (empty range) should return 0. Make sure your query function handles this or you never call it with invalid ranges.

8. TLE from unordered_set with bad hash.unordered_set<pair<long long,long long>> doesn't compile by default (no hash for pair). Either combine the two hashes into one long long via v1 * MOD2 + v2, or use a custom hash, or use map/set (with $\log$ factor).

Mental Traps

Trap 1: "If my hash passes the samples, it won't collide on the judge."

This is exactly backwards. Problem setters on Codeforces actively add anti-hash tests after seeing common base/mod choices in previous rounds. If you use base=31, mod=1e9+7 -- appearing in thousands of submissions -- your hash is already known. Adversarial inputs can be constructed to force collisions with known parameters. Always randomize the base at runtime and always use double hashing.

  The adversary's workflow on Codeforces:

  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
  │ Read your    │    │ Find two     │    │ Submit hack  │
  │ submission:  │───>│ strings with │───>│ with those   │
  │ base=31      │    │ same hash    │    │ strings as   │
  │ mod=1e9+7    │    │ under (31,   │    │ test input   │
  │              │    │  1e9+7)      │    │              │
  └──────────────┘    └──────────────┘    └──────────────┘
  
  Defense: randomize base at runtime
    mt19937 rng(chrono::steady_clock::now()...);
    BASE = uniform_int_distribution<ll>(256, 1e9)(rng);
  
  Adversary can't read your runtime-chosen base -> can't hack.

Trap 2: "Double hashing is overkill for this problem."

It adds 5 lines of code and doubles a tiny constant factor. The alternative -- getting hacked or failing a hidden test -- wastes an entire submission and potentially your contest ranking. Single hashing is a coin flip on Codeforces; double hashing is insurance that costs nothing.

🐛 When Your Solution is Wrong

WA (Wrong Answer):
- Hash collision! Single hash has ~ $1 / MOD$ collision probability. Use double hashing (two different bases/mods) for $\leq 10^{- 12}$ collision chance.
- Wrong modular arithmetic (negative remainders in some languages)
- Off-by-one in substring hash extraction
TLE (Time Limit Exceeded):
- Recomputing hash from scratch for each substring instead of using prefix hash array
- Using string::substr() which copies -- $O (n)$ per call
Anti-hack tip: Randomize the base at runtime: base = rng() % (MOD - 2) + 2.

Common Mistakes

Single hash collisions. Birthday paradox makes collisions likely above ~10^5 hashes. Always use double hashing (two independent base/mod pairs).
Negative values from subtraction. (a - b) % p can be negative in C++. Fix: ((a - b) % p + p) % p.
Wrong power precomputation. pw[0] must be 1 (not base), and powers must be precomputed up to the full string length.
Hash comparison without collision awareness. Hash equality means "probably equal." Always verify with direct comparison in critical applications, or use double hashing.
Anti-hash tests. Some competitive programming judges use anti-hash tests. Double hashing with large primes (e.g., 10^9 + 7 and 10^9 + 9) is the standard defense.
Hardcoded base on adversarial judges. "I hashed every substring correctly but got Wrong Answer on test 37..." You used base = 31 and mod = 1e9+7 -- the exact constants from the cp-algorithms template. The problem setter generated an anti-hash test that forces collisions for these specific values. Switching to a random base chosen at runtime (mt19937) and a second modulus made the solution robust. Never hardcode the base in a problem where adversarial tests are possible.

Debug Drills

Drill 1 -- Off-by-one in substring hash extraction

cpp

// Hash of s[l..r] (0-indexed, inclusive)
long long get_hash(int l, int r) {
    return (h[r] - h[l] * pw[r - l + 1]) % MOD;
}

What's wrong?

It should be h[r+1] - h[l] * pw[r - l + 1] (if h is 1-indexed prefix hash where h[i] = hash of s[0..i-1]). Or if h[r] means hash of first r+1 chars, then the subtraction index is wrong. The classic fix:

cpp

long long get_hash(int l, int r) {
    return ((h[r + 1] - h[l] * pw[r - l + 1]) % MOD + MOD) % MOD;
}

The + MOD) % MOD also prevents negative remainders.

Drill 2 -- Power array overflow

cpp

pw[0] = 1;
for (int i = 1; i <= n; i++)
    pw[i] = pw[i - 1] * BASE;  // no modular reduction!

What's wrong?

Without % MOD, pw[i] overflows even long long for i > 40 or so. Every multiplication in the power array must be taken modulo MOD:

cpp

pw[i] = pw[i - 1] * BASE % MOD;

Drill 3 -- Comparing hashes from different mod systems

cpp

if (get_hash1(l1, r1) == get_hash2(l2, r2))  // hash1 uses MOD1, hash2 uses MOD2
    cout << "match";

What's wrong?

You're comparing a hash computed modulo MOD1 with one computed modulo MOD2. They can be equal by coincidence even for completely different strings. Both hashes must use the same (base, mod) pair, or you must compare (hash1_mod1, hash1_mod2) == (hash2_mod1, hash2_mod2) for double hashing.

Self-Test

Before moving to problems, verify you can do these without looking at the notes:

[ ] Implement double polynomial hashing from memory -- two hash arrays with two (base, mod) pairs, including + MOD in subtraction and power precomputation
[ ] State the collision probability for single vs. double hash with $10^{6}$ comparisons and $p \approx 10^{9}$ , and explain why double hashing is adversary-proof
[ ] Write the formula to extract hash of $s [l . . r]$ : $(h [r + 1] - h [l] \cdot b^{r - l + 1}) mod p$ , and explain why + MOD is needed
[ ] Name two situations where hashing is the right tool (binary search on string properties, multiple substring equality checks) and two where exact algorithms are better (guaranteed correctness, single pattern search)
[ ] State the minimum safe base for lowercase-only strings ( $> 26$ ) and explain why base = 26 causes non-collision false matches

Practice Problems

CF Rating	What You Should Be Able to Do
1200	Compute a polynomial hash; check if a pattern occurs in a text (single hash).
1500	Use prefix hashes for $O (1)$ substring comparison; find all occurrences of a pattern.
1800	Double hashing to resist anti-hash tests; rolling hash for sliding-window substring problems.
2100	Combine hashing with binary search (e.g., longest common substring); hash on trees or 2-D grids.

#	Problem	Source	Difficulty	Key Idea
1	Substring Search (CSES)	CSES	1400	Rabin-Karp pattern matching
2	Password (CF 126B)	CF	1700	Prefix = suffix = interior substring via hashing + Z/KMP
3	Good Substrings (CF 271D)	CF	1500	Count distinct substrings with character constraint
4	Palindrome Degree (CF 7D)	CF	1700	Recursive palindrome check with hashing
5	Yet Another Palindrome Problem (CF 1326D2)	CF	1800	Greedy + palindrome hashing
6	Kuro and GCD and XOR and SUM (CF 182D)	CF	1500	String period detection
7	Shortest Superstring (CF 25E)	CF	2200	Overlap via hashing + greedy
8	String Hashing (CSES)	CSES	1400	Direct application of polynomial hashing
9	Palindrome Pairs (CF 17E)	CF	2400	Hashing + combinatorics for palindrome counting
10	MUH and Cube Walls (CF 471D)	CF	1800	Difference array + Rabin-Karp matching

Recap

Polynomial hash: h(s) = sum(s[i] * base^i) mod p. Precompute prefix hashes for O(1) substring queries.
Always double-hash with two independent (base, mod) pairs to avoid collisions.
Use cases: substring equality, distinct substring counting, string matching, longest common substring via binary search + hashing.
Limitation: hashing is probabilistic. For hack-proof solutions, consider suffix arrays or KMP.

Cross-Links

KMP and Z-Function -- deterministic O(n) pattern matching alternative
Math Fundamentals -- modular arithmetic underlying hash computation
Strings Practice Ladder -- graded problem set

String Hashing

Contents

Intuition

Visual walkthrough

The invariant

Fingerprints, not proofs

What the Code Won't Teach You

When to reach for this

C++ STL Reference

Implementation (Contest-Ready)

Version 1: Single Hash -- Minimal Template

Version 2: Double Hash with Substring Query -- Explained

Operations Reference

Problem Patterns & Tricks

Pattern 1: Rabin-Karp String Matching

Pattern 2: Longest Common Substring via Binary Search + Hashing

Pattern 3: Palindrome Check

Pattern 4: Count Distinct Substrings

Pattern 5: Longest Common Prefix (LCP) with Binary Search

Pattern 6: Hashing + DP (String Periods, Repetitions)

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

🐛 When Your Solution is Wrong

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Solve With Me -- CF 271D "Good Substrings"

Recap

Cross-Links

String Hashing ​

Contents ​

Intuition ​

Visual walkthrough ​

The invariant ​

Fingerprints, not proofs ​

What the Code Won't Teach You ​

When to reach for this ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1: Single Hash -- Minimal Template ​

Version 2: Double Hash with Substring Query -- Explained ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Rabin-Karp String Matching ​

Pattern 2: Longest Common Substring via Binary Search + Hashing ​

Pattern 3: Palindrome Check ​

Pattern 4: Count Distinct Substrings ​

Pattern 5: Longest Common Prefix (LCP) with Binary Search ​

Pattern 6: Hashing + DP (String Periods, Repetitions) ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mental Traps ​

🐛 When Your Solution is Wrong ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Solve With Me -- CF 271D "Good Substrings" ​

Recap ​

Cross-Links ​

String Hashing

Contents

Intuition

Visual walkthrough

The invariant

Fingerprints, not proofs

What the Code Won't Teach You

When to reach for this

C++ STL Reference

Implementation (Contest-Ready)

Version 1: Single Hash -- Minimal Template

Version 2: Double Hash with Substring Query -- Explained

Operations Reference

Problem Patterns & Tricks

Pattern 1: Rabin-Karp String Matching

Pattern 2: Longest Common Substring via Binary Search + Hashing

Pattern 3: Palindrome Check

Pattern 4: Count Distinct Substrings

Pattern 5: Longest Common Prefix (LCP) with Binary Search

Pattern 6: Hashing + DP (String Periods, Repetitions)

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

🐛 When Your Solution is Wrong

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Solve With Me -- CF 271D "Good Substrings"

Recap

Cross-Links