KMP and Z-Function

Linear-time string matching and prefix analysis -- the two workhorses that replace brute-force $O (n m)$ pattern search on Codeforces.

Difficulty: [Intermediate]Prerequisites: Arrays and StringsCF Rating Range: 1500-1900

Quick Revisit
USE WHEN: exact pattern matching, period/border finding, or prefix-suffix analysis in O(n)
INVARIANT: pi[i] = longest proper prefix of s[0..i] that is also a suffix
TIME: O(n + m) | SPACE: O(n + m)
CLASSIC BUG: wrong separator -- use a character not in the alphabet (e.g., '#') between pattern and text
PRACTICE: Strings Practice Ladder

Intuition
C++ STL Reference
Implementation
Operations Reference
Problem Patterns & Tricks
Contest Cheat Sheet
Gotchas & Debugging
Self-Test
Practice Problems
Further Reading

Contest Frequency: ** Regular -- appears in pattern matching problems

Intuition

KMP and Z are two views of the same idea: prefix reuse. Both arrays answer "how much of the pattern do we already know matches?" but from different vantage points -- KMP looks backward (longest prefix that is also a suffix of s[0..i]), Z looks forward (longest substring at position i that is a prefix of s). Either array solves pattern matching via the p + '#' + t concatenation trick. Pick by taste: Z is usually the easier first implementation; KMP's failure function generalizes to automaton DPs.

With that promise in hand:

Search for pattern $p = $ "abab" in text $t = $ "ababcabab".

The naive approach slides $p$ one position at a time across $t$ , restarting the comparison from the first character of $p$ after every mismatch:

  t: a b a b c a b a b
  p: a b a b              i=0: match 4 chars, then mismatch at t[4]='c'
     . a b a b            i=1: mismatch at t[1]='b' vs p[0]='a'  (wasted!)
       . a b a b          i=2: match 4 chars, mismatch at t[6]='a' vs p[4]=?
                                ... wait, p only has length 4, so full match!
         . a b a b        i=3: mismatch immediately                (wasted!)
           . a b a b      i=4: mismatch immediately                (wasted!)
             . a b a b    i=5: match 4 chars -> full match!

Many of those restarts are redundant. At $i = 0$ we already matched "abab" and saw that $t[4] = $ 'c'. The suffix "ab" of what we matched is also a prefix of $p$ -- so we know a potential match starting at $i = 2$ already has its first two characters confirmed. Yet the naive method re-reads them.

Worst case: $t = $ "aaa...a" ( $n$ characters), $p = $ "aaa...ab" ( $m$ characters). Every alignment matches $m - 1$ characters before failing, giving $O (n \cdot m)$ total comparisons.

KMP: When a mismatch occurs, the failure function tells you exactly how far to shift -- reusing characters you have already matched -- so you never backtrack in the text.

The failure function (prefix function) $π [i]$ for a string $s$ answers: "What is the length of the longest proper prefix of $s [0. . i]$ that is also a suffix?" With $π$ in hand, a mismatch at pattern position $j$ means the pattern's first $π [j - 1]$ characters still align with the text -- jump $j$ to $π [j - 1]$ instead of rewinding to 0. The text pointer never moves backward.

Z-function: $Z [i]$ equals the length of the longest substring starting at position $i$ that matches a prefix of the string -- computed in $O (n)$ by maintaining a window of the rightmost match.

For pattern matching, concatenate $p + # + t$ and compute $Z$ . Any position $i$ with $Z [i] \geq | p |$ marks a match. The window $[l, r)$ (called the Z-box) lets each new $Z [i]$ start from previously known information instead of from scratch.

Why Z runs in linear time, said carefully:

The Z-box $[l, r)$ is "the rightmost segment of $s$ that we have already verified to be a prefix of $s$ ." If $i$ lies inside the box, then $s [i . . r - 1]$ is by construction equal to $s [i - l . . r - l - 1]$ -- a prefix slice we already analysed. So $Z [i]$ starts at $min (Z [i - l], r - i)$ for free.
The clip to $r - i$ is essential: copying $Z [i - l]$ blindly may claim a match that extends past $r$ , and the characters past $r$ have not been verified yet. Anything past the right edge has to be checked by hand.
The naive extension loop only runs when it pushes $r$ further right. Since $r$ moves only forward and $r \leq n$ , the total extension work across all $i$ is $O (n)$ -- exactly the same amortization as the text pointer in KMP.

Build the ideas first, name them last: failure function, Z-box, prefix function -- these are just names for the reuse mechanisms above.

Visual Walkthrough

KMP: Failure-function construction for $p = $ `"abab"`

  p:     a   b   a   b
  index: 0   1   2   3

  Step 1  i=0: pi[0] = 0                       (by definition)

  Step 2  i=1: compare p[1]='b' vs p[0]='a'
                mismatch, j falls to 0         -> pi[1] = 0

  Step 3  i=2: compare p[2]='a' vs p[0]='a'
                match, j becomes 1             -> pi[2] = 1
                (prefix "a" == suffix "a")

  Step 4  i=3: compare p[3]='b' vs p[1]='b'
                match, j becomes 2             -> pi[3] = 2
                (prefix "ab" == suffix "ab")

  Result:
  +-------+-------+-------+-------+
  | p[i]  |   a   |   b   |   a   |   b   |
  +-------+-------+-------+-------+-------+
  | pi[i] |   0   |   0   |   1   |   2   |
  +-------+-------+-------+-------+-------+

KMP: Matching $p = $ `"abab"` in $t = $ `"ababcabab"`

  j = position in pattern, i = position in text.

  Step 1  i=0 j=0: t[0]='a'==p[0]  -> j=1
  Step 2  i=1 j=1: t[1]='b'==p[1]  -> j=2
  Step 3  i=2 j=2: t[2]='a'==p[2]  -> j=3
  Step 4  i=3 j=3: t[3]='b'==p[3]  -> j=4 == |p|  => MATCH at i-|p|+1 = 0
          After match: j = pi[3] = 2   (reuse "ab" overlap)

  Step 5  i=4 j=2: t[4]='c'!=p[2]='a'
          Fallback: j = pi[1] = 0
          t[4]='c'!=p[0]='a'         -> j stays 0, advance i

  Step 6  i=5 j=0: t[5]='a'==p[0]  -> j=1
  Step 7  i=6 j=1: t[6]='b'==p[1]  -> j=2
  Step 8  i=7 j=2: t[7]='a'==p[2]  -> j=3
  Step 9  i=8 j=3: t[8]='b'==p[3]  -> j=4 == |p|  => MATCH at i-|p|+1 = 5

  Text pointer i moved forward every step -- never backward.
  Two matches found: positions 0 and 5.

Z-function: brief walkthrough on $s = $ `"abab#ababcabab"`

  (This is the concatenation p + '#' + t with p = "abab", t = "ababcabab".)

  s:  a  b  a  b  #  a  b  a  b  c  a  b  a  b
  i:  0  1  2  3  4  5  6  7  8  9  10 11 12 13

  Key Z-values (Z[0] undefined):
    Z[5]  = 4   s[5..8]  = "abab" matches prefix of length 4  => match at t[0]
    Z[10] = 4   s[10..13]= "abab" matches prefix of length 4  => match at t[5]

  Any Z[i] >= |p| = 4 signals a match.  Position in t = i - |p| - 1.

  The Z-box [l, r) jumps forward as we compute:
    After Z[5]=4:  l=5,  r=9
    i=6 is inside [5,9), so Z[6] >= min(Z[1], 9-6) = min(0, 3) = 0
    Extend naively: s[0]='a' vs s[6]='b' -> stop. Z[6]=0.
    ... and so on.

Worked Example: KMP Failure Array for `"ABABAC"`

  Pattern: A  B  A  B  A  C
  Index:   0  1  2  3  4  5

  | i | p[i] | j before | Compare              | Action              | fail[i] |
  |---|------|----------|----------------------|---------------------|---------|
  | 0 |  A   |    --     | --                    | base case           |    0    |
  | 1 |  B   |    0     | p[1]='B' vs p[0]='A' | mismatch, j stays 0 |    0    |
  | 2 |  A   |    0     | p[2]='A' vs p[0]='A' | match, j->1          |    1    |
  | 3 |  B   |    1     | p[3]='B' vs p[1]='B' | match, j->2          |    2    |
  | 4 |  A   |    2     | p[4]='A' vs p[2]='A' | match, j->3          |    3    |
  | 5 |  C   |    3     | p[5]='C' vs p[3]='B' | mismatch -> j=fail[2]=1 |      |
  |   |      |    1     | p[5]='C' vs p[1]='B' | mismatch -> j=fail[0]=0 |      |
  |   |      |    0     | p[5]='C' vs p[0]='A' | mismatch, j stays 0 |    0    |

  fail[] = [ 0, 0, 1, 2, 3, 0 ]

  At i=5 the chain 3->1->0 shows the failure function "cascading"
  through all shorter borders before giving up.

Worked Example: KMP Matching `"ABABAC"` in `"ABABABABAC"`

  Text:    A B A B A B A B A C
  Index:   0 1 2 3 4 5 6 7 8 9
  Pattern: A B A B A C              fail[] = [0, 0, 1, 2, 3, 0]

  Step | i | j | Compare            | Action
  -----|---|---|--------------------|-----------------------------------------
    1  | 0 | 0 | t[0]='A' = p[0]   | match, j->1
    2  | 1 | 1 | t[1]='B' = p[1]   | match, j->2
    3  | 2 | 2 | t[2]='A' = p[2]   | match, j->3
    4  | 3 | 3 | t[3]='B' = p[3]   | match, j->4
    5  | 4 | 4 | t[4]='A' = p[4]   | match, j->5
    6  | 5 | 5 | t[5]='B' != p[5]='C' | MISMATCH -> j=fail[4]=3
    7  | 5 | 3 | t[5]='B' = p[3]   | match, j->4   (reused "ABAB" overlap)
    8  | 6 | 4 | t[6]='A' = p[4]   | match, j->5
    9  | 7 | 5 | t[7]='B' != p[5]='C' | MISMATCH -> j=fail[4]=3
   10  | 7 | 3 | t[7]='B' = p[3]   | match, j->4   (reused again)
   11  | 8 | 4 | t[8]='A' = p[4]   | match, j->5
   12  | 9 | 5 | t[9]='C' = p[5]   | match, j->6 = |p| -> MATCH at pos 4

  Alignment at match:
    Text:  A B A B [A B A B A C]
    Index: 0 1 2 3  4 5 6 7 8 9
                    ^-----------^  pattern found at position 4

  Key insight: at steps 6 and 9, the failure function let us skip
  back to j=3 instead of restarting from j=0 -- saving 6 comparisons.

Worked Example: Z-Function Trace for `"aabxaab"`

  String: a  a  b  x  a  a  b
  Index:  0  1  2  3  4  5  6

  | i | s[i] | [L,R) before | Inside window? | Action                    | Z[i] | [L,R) after |
  |---|------|-------------|----------------|---------------------------|------|-------------|
  | 1 |  a   | [0, 0)      | No -> naive     | s[1]='a'=s[0] OK s[2]='b'!=s[1] X | 1 | [1, 2)    |
  | 2 |  b   | [1, 2)      | No (i>=R)->naive | s[2]='b'!=s[0]='a' X      |  0   | [1, 2)      |
  | 3 |  x   | [1, 2)      | No (i>=R)->naive | s[3]='x'!=s[0]='a' X      |  0   | [1, 2)      |
  | 4 |  a   | [1, 2)      | No (i>=R)->naive | s[4..6]="aab"=s[0..2] OK  |  3   | [4, 7)      |
  | 5 |  a   | [4, 7)      | Yes, k=1       | Z[1]=1 < R-i=2 -> copy    |  1   | [4, 7)      |
  | 6 |  b   | [4, 7)      | Yes, k=2       | Z[2]=0 < R-i=1 -> copy    |  0   | [4, 7)      |

  Z[] = [ -, 1, 0, 0, 3, 1, 0 ]

  At i=5: we're inside the Z-box [4,7), so s[4..6] = s[0..2].
  Position 5 within the box corresponds to position 1 in the prefix,
  so Z[5] = Z[1] = 1 -- computed for free, no character comparisons.

The Invariant

KMP invariant. $π [i]$ = length of the longest proper prefix of $s [0. . i]$ that is also a suffix of $s [0. . i]$ . During matching, the text pointer $i$ never decreases -- each character of the text is examined at most once. Total work: $O (n + m)$ .

Z-function invariant. $Z [i]$ is computed using the $[l, r)$ window (the rightmost segment known to match a prefix). If $i < r$ , we can copy from $Z [i - l]$ up to $r - i$ characters for free; naive extension only happens when we push $r$ further right. Since $r$ only increases, total naive extension steps across all $i$ are $O (n)$ .

What the Code Won't Teach You

The prefix function encodes ALL borders, not just the longest. The chain π[n-1] -> π[π[n-1]-1] -> ... -> 0 gives every length where a prefix equals a suffix. This chain is what makes problems like "find a border that also appears in the interior" (CF 126B) solvable in O(n).
KMP's automaton interpretation opens up DP. The transition table δ[state][char] turns KMP into a DFA. This DFA becomes the state space for DP problems like "count strings of length n containing pattern p." The prefix function alone can't do this -- you need the full automaton.
Z-function's window [l, r) is analogous to Manacher's [l, r]. Both algorithms maintain a "rightmost known good region" and reuse previously computed information for positions inside it. Understanding one makes the other trivial. The key: r only moves right, giving O(n) total.

  THE FAILURE LINK CHAIN -- all borders of "abacabab":

  s:   a  b  a  c  a  b  a  b
  π:   0  0  1  0  1  2  3  2

  Border chain from π[7]=2:
  ┌─────────────────────────────────────┐
  │ π[7] = 2  ->  border "ab"           │
  │ π[1] = 0  ->  no more borders       │
  └─────────────────────────────────────┘

  ALL borders of "abacabab": just length 2 ("ab").
  
  Border chain from π[6]=3 (string "abacaba"):
  ┌─────────────────────────────────────┐
  │ π[6] = 3  ->  border "aba"          │
  │ π[2] = 1  ->  border "a"            │
  │ π[0] = 0  ->  stop                  │
  └─────────────────────────────────────┘

  ALL borders of "abacaba": lengths 3 ("aba") and 1 ("a").

When to Reach for This

Trigger phrases in problem statements:

"find all occurrences of $P$ in $T$ " -- exact pattern matching
"count occurrences of $P$ in $T$ "
"prefix function", "failure function" -- direct KMP
"period of a string", "shortest repeating unit"
"longest prefix which is also a suffix" -- string borders via $π$

Counter-examples (reach for something else):

Multiple patterns simultaneously in one text $\to$ Aho-Corasick
Approximate / fuzzy matching (edit distance $\leq k$ ) $\to$ DP-based techniques
Longest common substring of two strings $\to$ suffix array or suffix automaton
Lexicographic comparisons of many substrings $\to$ suffix array

C++ STL Reference

Function / Class	Header	What it does	Time Complexity
`s.find(pattern)`	`<string>`	Find first occurrence of `pattern` in `s`	$O (n \cdot m)$ worst case
`s.find(pattern, pos)`	`<string>`	Find first occurrence starting at `pos`	$O (n \cdot m)$ worst case
`s.rfind(pattern)`	`<string>`	Find last occurrence	$O (n \cdot m)$ worst case
`search(b1, e1, b2, e2)`	`<algorithm>`	Find subsequence in range	$O (n \cdot m)$ worst case
`mismatch(b1, e1, b2)`	`<algorithm>`	Find first position where ranges differ	$O (n)$

Why KMP/Z beats string::find: The STL find uses a naive or slightly optimized algorithm -- no guaranteed $O (n + m)$ . For $n, m \leq 10^{6}$ , naive search can TLE on adversarial inputs (e.g., s = "aaa...a", p = "aaa...ab"). KMP and Z-function guarantee $O (n + m)$ regardless of input.

Implementation (Contest-Ready)

4a. KMP -- Prefix Function (Minimal)

cpp

#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string &s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

// Find all occurrences of p in t. Returns 0-indexed positions in t.
vector<int> kmp_search(const string &p, const string &t) {
    string s = p + '#' + t;
    auto pi = prefix_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (pi[i] == m)
            res.push_back(i - 2 * m);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = kmp_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4b. KMP -- Prefix Function (Explained)

cpp

#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string &s) {
    int n = s.size();
    vector<int> pi(n, 0);
    // pi[0] = 0 by definition (no proper prefix/suffix for single char)
    for (int i = 1; i < n; i++) {
        // j = length of the candidate prefix-suffix we're trying to extend.
        // Start from the best we found at the previous position.
        int j = pi[i - 1];

        // If s[i] doesn't extend the current prefix-suffix of length j,
        // fall back to the next shorter one: pi[j-1].
        // This chain of fallbacks is the key insight of KMP.
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];

        // If we can extend, increment length by 1.
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

vector<int> kmp_search(const string &p, const string &t) {
    // Concatenation trick: build "pattern#text" where '#' is a character
    // not in the alphabet. Then pi[i] can never exceed |p|.
    string s = p + '#' + t;
    auto pi = prefix_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++) {
        // pi[i] == m means the last m characters of s[0..i] match p entirely.
        // The match starts at position (i - 2*m) in the original text t.
        // Derivation: position in s is i, prefix "p#" has length m+1,
        //   so position in t is (i - (m+1)), and the match started m-1
        //   characters earlier: (i - (m+1)) - (m-1) = i - 2m.
        if (pi[i] == m)
            res.push_back(i - 2 * m);
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = kmp_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4c. Z-Function (Minimal)

cpp

#include <bits/stdc++.h>
using namespace std;

vector<int> z_function(const string &s) {
    int n = s.size();
    vector<int> z(n, 0);
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) z[i] = min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Find all occurrences of p in t. Returns 0-indexed positions in t.
vector<int> z_search(const string &p, const string &t) {
    string s = p + '#' + t;
    auto z = z_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (z[i] >= m)
            res.push_back(i - m - 1);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = z_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4d. Z-Function (Explained)

cpp

#include <bits/stdc++.h>
using namespace std;

vector<int> z_function(const string &s) {
    int n = s.size();
    vector<int> z(n, 0);
    // [l, r) is the "Z-box": the rightmost segment that matches a prefix of s.
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        // If i is inside the current Z-box, we can reuse a previously computed
        // Z-value. z[i - l] tells us how far the match extends from the
        // corresponding position in the prefix copy, but we can't go past r.
        if (i < r) z[i] = min(r - i, z[i - l]);

        // Extend naively from where we left off.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;

        // Update the Z-box if this extends further right.
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

vector<int> z_search(const string &p, const string &t) {
    // Concatenation trick: "pattern#text".
    // '#' must not appear in p or t.
    // Any position i in the concatenated string with z[i] >= |p|
    // corresponds to a match in t at position i - |p| - 1.
    string s = p + '#' + t;
    auto z = z_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (z[i] >= m)
            res.push_back(i - m - 1);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = z_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

ASCII Diagram: KMP Automaton for Pattern "ABABAC"

Pattern:  A  B  A  B  A  C
State:    0  1  2  3  4  5  6(accept)
π[]:      0  0  1  2  3  0

State machine (match transitions -> solid, failure transitions ⇢ dashed):

     A       B       A       B       A       C
 (0)───▷(1)───▷(2)───▷(3)───▷(4)───▷(5)───▷(6) ACCEPT
  │      │      │      │      │      │
  │fail  │fail  │fail  │fail  │fail  │fail
  ▼      ▼      ▼      ▼      ▼      ▼
 (0)    (0)    (1)    (2)    (3)    (0)
  ↑      ↑             ↑             ↑
  └──────┘             └─────────────┘
   on mismatch          on mismatch

Transition table (state x char -> next state):
  State   'A'   'B'   'C'   other
  ─────────────────────────────────
    0       1     0     0     0
    1       1     2     0     0
    2       3     0     0     0
    3       3     4     0     0
    4       5     0     0     0
    5       1     2     6     0

Reading text "ABABABABAC":
  Pos:   0  1  2  3  4  5  6  7  8  9
  Char:  A  B  A  B  A  B  A  B  A  C
  State: 1  2  3  4  5  2  3  4  5  6 <- match at pos 4..9!

Key: on mismatch at state s, jump to π[s] and retry (no input consumed).
     This is equivalent to the longest proper prefix that is also a suffix.

Operations Reference

Both algorithms run in linear time, but they solve slightly different sets of problems. Use this table to verify which operations each one supports and at what cost.

Prefix Function (KMP)

Operation	Time	Space
Build prefix function $π$ for string of length $n$	$O (n)$	$O (n)$
Pattern matching ( $p$ in $t$ ) via concatenation	$O (n + m)$	$O (n + m)$
Count all occurrences	$O (n + m)$	$O (n + m)$
Find shortest period of string	$O (n)$	$O (n)$
Find all borders of string	$O (n)$ total	$O (n)$

Z-Function

Operation	Time	Space
Build Z-array for string of length $n$	$O (n)$	$O (n)$
Pattern matching ( $p$ in $t$ ) via concatenation	$O (n + m)$	$O (n + m)$
Count all occurrences	$O (n + m)$	$O (n + m)$
Find shortest period of string	$O (n)$	$O (n)$

Problem Patterns & Tricks

Pattern 1: Exact Pattern Matching

Find all occurrences of pattern $p$ in text $t$ . Build $s = p + # + t$ , compute prefix function or Z-array, and check for values equal to $| p |$ .

cpp

// Using Z-function
string s = p + '#' + t;
auto z = z_function(s);
for (int i = (int)p.size() + 1; i < (int)s.size(); i++)
    if (z[i] >= (int)p.size())
        cout << i - (int)p.size() - 1 << "\n";

Problems: CF 625B, CF 126B

Extracting Match Positions

cpp

// KMP search -- matches at positions where pattern ends
for (int i = 0, j = 0; i < n; i++) {
    while (j > 0 && text[i] != pattern[j]) j = fail[j - 1];
    if (text[i] == pattern[j]) j++;
    if (j == m) {
        // Match found starting at i - m + 1
        matches.push_back(i - m + 1);
        j = fail[j - 1];  // Continue searching (overlapping matches)
    }
}

Z-function: Concatenate pattern + "$" + text (where $ doesn't appear in either). Then z[i] == |pattern| means a match starts at position i - |pattern| - 1 in the text.

The $ separator prevents the Z-values from "leaking" across the boundary.

Pattern 2: Shortest Period of a String

The shortest period of $s$ has length $n - π [n - 1]$ , where $π$ is the prefix function. If $n$ is divisible by this length, the string is a full repetition; otherwise the last copy is partial.

cpp

auto pi = prefix_function(s);
int n = s.size();
int period = n - pi[n - 1];
// If n % period == 0, s = (s[0..period-1]) repeated n/period times.

  PERIOD DETECTION via prefix function:

  s = "abcabcab"  (n=8)
  π:  [0, 0, 0, 1, 2, 3, 4, 5]

  period = n - π[n-1] = 8 - 5 = 3
  
  ┌───┬───┬───┬───┬───┬───┬───┬───┐
  │ a │ b │ c │ a │ b │ c │ a │ b │
  └───┴───┴───┴───┴───┴───┴───┴───┘
  ├─ period ─┤─ period ─┤─ partial─┤
  │  "abc"   │  "abc"   │  "ab"    │

  8 % 3 = 2 != 0 -> NOT a perfect repetition.
  The string HAS period 3, but the last copy is truncated.

  Compare: s = "abcabc" (n=6), π = [0,0,0,1,2,3]
  period = 6 - 3 = 3.  6 % 3 = 0 -> perfect repetition: "abc"x2.

Problems: CF 182D, CF 1138B

Pattern 3: Counting Substring Occurrences

Count how many times each prefix of $s$ appears as a substring. Process $π$ in reverse: each $π [i] > 0$ means the prefix of length $π [i]$ appears once more. Propagate counts up the failure-link chain.

cpp

auto pi = prefix_function(s);
int n = s.size();
vector<int> cnt(n + 1, 1); // each prefix appears at least once (itself)
for (int i = n - 1; i >= 1; i--)
    cnt[pi[i]] += cnt[i];
// cnt[k] = number of times s[0..k-1] appears in s (for k >= 1)
// cnt[0] accumulates leftover and is usually not meaningful.

Problems: CF 432D, CF 1163D

Pattern 4: String Borders (Prefix = Suffix)

All borders of a string $s$ have lengths $π [n - 1], π [π [n - 1] - 1], π [π [π [n - 1] - 1] - 1], \dots$ -- just follow the failure-link chain until you reach 0.

cpp

auto pi = prefix_function(s);
int n = s.size();
vector<int> borders;
int k = pi[n - 1];
while (k > 0) {
    borders.push_back(k);
    k = pi[k - 1];
}

  VISUAL: ALL BORDERS OF "abacaba" via failure chain

  a b a c a b a
  ├─────────────┤  full string (length 7)
  ├─────┤         π[6]=3: prefix "aba" = suffix "aba"  OK
  │a b a│c a│b a│
  ├─┤               π[2]=1: prefix "a" = suffix "a"    OK
  │a│b a c a b│a│
  
  Chain: 3 -> 1 -> 0 (stop). Borders: {3, 1}.

Problems: CF 126B, CF 1092F

Pattern 5: Number of Distinct Substrings (Incremental)

Add characters one by one and use Z-function to count new distinct substrings. When appending character $c$ to current string $s$ , compute Z-function of $reverse (s + c)$ . The number of new substrings is $| s | + 1 - max (Z [i])$ .

Problems: CF 271D

Pattern 6: Automaton Construction (KMP Automaton)

Precompute the transition function $δ [j] [c]$ for all states $j$ and characters $c$ . This gives $O (1)$ per character matching and is useful when matching against multiple texts or combining with DP.

  KMP AUTOMATON for pattern "aba" (states 0,1,2,3):

  State 0 ──'a'──-> State 1 ──'b'──-> State 2 ──'a'──-> State 3 (MATCH)
    ↑   │            ↑   │            ↑               │
    │  'b'           │  'a'          │              'b'
    │   │            │   │           │                │
    └───┘            └───┘           └────────────────┘
  
  Full transition table:
  ┌───────┬─────┬─────┐
  │ State │  a  │  b  │
  ├───────┼─────┼─────┤
  │   0   │  1  │  0  │
  │   1   │  1  │  2  │
  │   2   │  3  │  0  │
  │   3   │  1  │  2  │  <- after match, fall to π[2]=1 ("a"), then δ[1][b]=2
  └───────┴─────┴─────┘

cpp

auto pi = prefix_function(p);
int m = p.size();
// delta[j][c] = next state when in state j and reading char c
vector<vector<int>> delta(m + 1, vector<int>(26));
for (int j = 0; j <= m; j++) {
    for (int c = 0; c < 26; c++) {
        if (j < m && c == p[j] - 'a')
            delta[j][c] = j + 1;
        else
            delta[j][c] = j == 0 ? 0 : delta[pi[j - 1]][c];
    }
}

Problems: CF 1163D, CF 808G

Contest Cheat Sheet

+---------------------------------------------------------------+
|              KMP & Z-FUNCTION CHEAT SHEET                      |
+---------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   o Pattern matching in O(n+m) instead of O(n*m)               |
|   o String period / shortest repeating unit                    |
|   o All borders of a string (prefix = suffix)                  |
|   o Counting prefix occurrences                                |
+---------------------------------------------------------------+
| CONCATENATION TRICK:                                           |
|   s = pattern + '#' + text                                     |
|   '#' must not appear in either string!                        |
+---------------------------------------------------------------+
| KMP PREFIX FUNCTION:                                           |
|   pi[0]=0; for(i=1..n-1) {                                    |
|     j=pi[i-1]; while(j>0 && s[i]!=s[j]) j=pi[j-1];           |
|     if(s[i]==s[j]) j++; pi[i]=j; }                            |
|   Match: pi[i]==|p| => match at i-2*|p| in text               |
+---------------------------------------------------------------+
| Z-FUNCTION:                                                    |
|   l=r=0; for(i=1..n-1) {                                      |
|     if(i<r) z[i]=min(r-i,z[i-l]);                             |
|     while(i+z[i]<n && s[z[i]]==s[i+z[i]]) z[i]++;            |
|     if(i+z[i]>r) l=i, r=i+z[i]; }                            |
|   Match: z[i]>=|p| => match at i-|p|-1 in text                |
+---------------------------------------------------------------+
| PERIOD: period = n - pi[n-1]                                   |
| COMPLEXITY: O(n) build, O(n+m) match | SPACE: O(n+m)          |
+---------------------------------------------------------------+

Gotchas & Debugging

Off-by-one in prefix function

$π [0]$ is always 0 -- the loop starts at $i = 1$ . Starting at $i = 0$ won't crash but wastes a cycle and can mask bugs.
When extracting match positions from the concatenated string $p + # + t$ , the offset is $i - 2 | p |$ for KMP (prefix function) and $i - | p | - 1$ for Z-function. Mixing these up shifts all your answers.

Separator character in concatenation trick

The separator # must not appear in the input alphabet. If the input can contain any ASCII character, use a character outside the range (e.g., char(0) or char(1)) or just set $Z / π$ values to be capped at $| p |$ .
Forgetting the separator entirely makes $π$ or $Z$ values "leak" across the pattern-text boundary, producing phantom matches.

Using `int` for string sizes

string::size() returns size_t. Comparisons like i < s.size() - 1 when s is empty underflow to a huge number. Always cast: i < (int)s.size() - 1.

Empty pattern or text

If $| p | = 0$ , the concatenated string starts with #, and every position has $Z [i] \geq 0 = | p |$ -- you'll report matches everywhere. Guard against empty input.

Period formula edge case

$period = n - π [n - 1]$ . If $n \mod period \neq 0$ , the string is not a perfect repetition -- the last copy is truncated. Don't assume full periodicity without checking the divisibility condition.

KMP automaton memory

Building the full automaton $δ [m + 1] [| Σ |]$ uses $O (m \cdot | Σ |)$ memory. For $| Σ | = 26$ and $m \leq 10^{6}$ , that's ~100 MB. In that case, stick with the on-the-fly prefix function instead of precomputing the automaton.

Debugging tip

Print the $π$ or $Z$ array for small inputs and verify manually. Off-by-one bugs in the inner while loop are the #1 source of WA.

Mental Traps

Trap 1: "The failure link chain is O(n) per call -- the algorithm must be O(n²)."

The inner while (j > 0 && s[i] != s[j]) j = pi[j-1] looks expensive. But j can only decrease as many times as it has increased. Since j increases by at most 1 per outer iteration, total decreases across ALL iterations <= n.

  AMORTIZED COST OF FAILURE CHAIN:

  j value over time for s = "aabaabaab":
  
  i:  0  1  2  3  4  5  6  7  8
  j:  0  1  0  1  2  3  4  5  2
      ↑  ↑  ↓  ↑  ↑  ↑  ↑  ↑  ↓↓↓
     +1 +1 -1 +1 +1 +1 +1 +1 -3

  Total increases: 7   (one per match)
  Total decreases: 4   (spread across fail chain walks)
  Total <= 2n ⟹ O(n)

Trap 2: "Z-function and KMP prefix function are redundant -- just learn one."

They compute different things. KMP gives π[i] = longest border of s[0..i]. Z gives z[i] = longest prefix of s matching at position i. For "find all borders of a string," KMP is natural (follow the π chain). For "does s[i..i+k-1] equal s[0..k-1]?" Z answers in O(1). Knowing both saves conversion headaches.

  KMP π-array for "aabaaab":
  i:   0  1  2  3  4  5  6
  π:   0  1  0  1  2  2  0     <- borders of each prefix

  Z-array for "aabaaab":
  i:   0  1  2  3  4  5  6
  Z:   -  1  0  2  1  1  0     <- prefix match lengths at each position

  DIFFERENT INFORMATION. Not interchangeable without O(n) conversion.

Trap 3: "I don't need the separator in p#t -- just cap π or Z values at |p|."

Capping works for Z-function (just check z[i] >= |p|). But for KMP's prefix function, without the separator, π[i] can exceed |p|, and capping it retroactively doesn't fix the damage -- the failure links were computed using the uncapped values, so subsequent π[i] values are wrong. Always use the separator with KMP.

Common Mistakes

Missing or wrong separator. Concatenating pattern + text without a separator character (e.g., '#') causes false matches that span the boundary.
Off-by-one in match position. When pi[i] == m in the concatenated string pattern + '#' + text, the match ends at position i - m - 1 in the text. Getting this index wrong misreports match locations.
Not resetting after a match. To find all occurrences with KMP, set j = pi[j-1] after a full match, not j = 0. Resetting to 0 skips overlapping matches.
Z-function off-by-one. z[0] is undefined (or set to 0/n by convention). Starting the scan at i = 1 is correct; starting at i = 0 causes issues.
Confusing KMP and Z-function outputs. pi[i] is the length of the longest border ending at i. z[i] is the length of the longest prefix match starting at i. They answer different questions.
Wrong fallback index in prefix function. "My prefix function seemed correct but produced wrong periods for strings like aabaab..." In the inner while-loop, writing len = pi[len] instead of len = pi[len - 1]. When s[i] != s[len], you want to fall back to the longest border of s[0..len-1], which is pi[len - 1]. Writing pi[len] reads one position too far and gives the border of s[0..len], breaking the invariant. This single off-by-one is the #1 KMP implementation bug.

Debug Drills

Drill 1 -- Prefix function: wrong fallback

cpp

vector<int> prefix_function(string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int len = pi[i - 1];
        while (len > 0 && s[i] != s[len])
            len = pi[len];  // BUG
        if (s[i] == s[len]) len++;
        pi[i] = len;
    }
    return pi;
}

What's wrong?

pi[len] should be pi[len - 1]. When falling back, you want the longest border of the prefix s[0..len-1], not s[0..len]:

cpp

len = pi[len - 1];

Drill 2 -- Z-function: off-by-one in window update

cpp

vector<int> z_function(string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r)
            z[i] = min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;
        if (i + z[i] > r)
            l = i, r = i + z[i] - 1;  // BUG
    }
    return z;
}

What's wrong?

r should be i + z[i], not i + z[i] - 1. The convention is that r is the exclusive right endpoint of the Z-box [l, r). With r = i + z[i] - 1, the window shrinks by 1 and the condition i < r misses the last character:

cpp

if (i + z[i] > r)
    l = i, r = i + z[i];

Drill 3 -- KMP matching: wrong occurrence index

cpp

string combined = pattern + "#" + text;
auto pi = prefix_function(combined);
for (int i = 0; i < combined.size(); i++) {
    if (pi[i] == pattern.size())
        cout << i << "\n";  // BUG: prints index in combined, not in text
}

What's wrong?

Position i is in the combined string pattern + '#' + text. The occurrence in text starts at i - 2 * pattern.size(). Correct:

cpp

if (pi[i] == (int)pattern.size())
    cout << (i - 2 * (int)pattern.size()) << "\n";

More precisely, the match ends at position i - pattern.size() - 1 in text, so it starts at i - 2 * pattern.size().

Self-Test

Before moving to the practice problems, you should be able to answer these without looking anything up:

[ ] Compute the prefix function for "aabaaab" by hand and verify each value.
[ ] Compute the Z-array for "aabaaab" by hand, tracking the [l, r) window at each step.
[ ] Given pattern "aba" and text "abababa", use the concatenation trick with BOTH KMP and Z-function to find all match positions. Verify the position formulas differ.
[ ] Find all borders of "abacabac" using the failure link chain. State every border length.
[ ] Determine the shortest period of "abcabcab" using the period formula. Is it a perfect repetition? Why or why not?
[ ] Build the KMP automaton δ[j][c] for pattern "ab" over alphabet {a, b}. Write out the full 3x2 transition table.
[ ] Explain in one sentence why KMP and Z-function are both O(n), referencing what quantity "only moves forward."

Practice Problems

CF Rating	What You Should Be Able to Do
1200	Use `strstr` or brute-force matching; understand what a prefix function value means.
1500	Implement KMP and Z-function from scratch; solve single-pattern matching and period-finding problems.
1800	Use the KMP automaton for DP on strings; exploit Z-function for multi-pattern tricks via concatenation.
2100	Combine KMP/Z with other structures (segment tree, suffix array) for advanced string DP or online matching.

#	Problem	Source	Difficulty	Key Idea
1	Pattern Find	CSES	Easy	Direct KMP or Z-function pattern matching
2	Password	CF 126B	1700	Find longest border that also occurs in the middle
3	MUH and Cube Walls	CF 471D	1800	KMP on difference array
4	Prefixes and Suffixes	CF 432D	2000	All borders + counting prefix occurrences
5	Anthem of Berland	CF 808G	2300	KMP automaton + DP
6	Periodic Strings	CF 182D	1500	Shortest period via prefix function
7	String Matching	CSES	Easy	Z-function concatenation trick
8	Shift and Reverse	CF 1138B	1600	Period analysis with prefix function
9	Vasya and Big Integers	CF 1051E	2300	Z-function + DP for string comparison
10	Sasha and a Very Easy Test	CF 1163D	2200	KMP automaton transitions + DP on strings

Recap

KMP prefix function: pi[i] = longest proper prefix of s[0..i] that equals a suffix. Built in O(n). Pattern matching via pattern + '#' + text.
Z-function: z[i] = length of the longest substring starting at i that matches a prefix of s. Also O(n).
KMP vs Z-function: largely interchangeable for pattern matching. Z-function is often simpler to code; KMP integrates naturally into automaton-based solutions.
Key applications: pattern search, period/border detection, string compression, counting occurrences.

Cross-Links

String Hashing -- probabilistic O(1) substring comparison alternative
DP -- 1D Introduction -- many string DP problems build on prefix functions
Strings Practice Ladder -- graded problem set

KMP and Z-Function ​

Contents ​

Intuition ​

Visual Walkthrough ​

KMP: Failure-function construction for $p = $ "abab" ​

KMP: Matching $p = $ "abab" in $t = $ "ababcabab" ​

Z-function: brief walkthrough on $s = $ "abab#ababcabab" ​

Worked Example: KMP Failure Array for "ABABAC" ​

Worked Example: KMP Matching "ABABAC" in "ABABABABAC" ​

Worked Example: Z-Function Trace for "aabxaab" ​

The Invariant ​

What the Code Won't Teach You ​

When to Reach for This ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

4a. KMP -- Prefix Function (Minimal) ​

4b. KMP -- Prefix Function (Explained) ​

4c. Z-Function (Minimal) ​

4d. Z-Function (Explained) ​

ASCII Diagram: KMP Automaton for Pattern "ABABAC" ​

Operations Reference ​

Prefix Function (KMP) ​

Z-Function ​

Problem Patterns & Tricks ​

Pattern 1: Exact Pattern Matching ​

Extracting Match Positions ​

Pattern 2: Shortest Period of a String ​

Pattern 3: Counting Substring Occurrences ​

Pattern 4: String Borders (Prefix = Suffix) ​

Pattern 5: Number of Distinct Substrings (Incremental) ​

Pattern 6: Automaton Construction (KMP Automaton) ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Off-by-one in prefix function ​

Separator character in concatenation trick ​

Using int for string sizes ​

Empty pattern or text ​

Period formula edge case ​

KMP automaton memory ​

Debugging tip ​

Mental Traps ​

Common Mistakes ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Recap ​

Cross-Links ​

KMP and Z-Function

Contents

Intuition

Visual Walkthrough

KMP: Failure-function construction for $p = $ `"abab"`

KMP: Matching $p = $ `"abab"` in $t = $ `"ababcabab"`

Z-function: brief walkthrough on $s = $ `"abab#ababcabab"`

Worked Example: KMP Failure Array for `"ABABAC"`

Worked Example: KMP Matching `"ABABAC"` in `"ABABABABAC"`

Worked Example: Z-Function Trace for `"aabxaab"`

The Invariant

What the Code Won't Teach You

When to Reach for This

C++ STL Reference

Implementation (Contest-Ready)

4a. KMP -- Prefix Function (Minimal)

4b. KMP -- Prefix Function (Explained)

4c. Z-Function (Minimal)

4d. Z-Function (Explained)

ASCII Diagram: KMP Automaton for Pattern "ABABAC"

Operations Reference

Prefix Function (KMP)

Z-Function

Problem Patterns & Tricks

Pattern 1: Exact Pattern Matching

Extracting Match Positions

Pattern 2: Shortest Period of a String

Pattern 3: Counting Substring Occurrences

Pattern 4: String Borders (Prefix = Suffix)

Pattern 5: Number of Distinct Substrings (Incremental)

Pattern 6: Automaton Construction (KMP Automaton)

Contest Cheat Sheet

Gotchas & Debugging

Off-by-one in prefix function

Separator character in concatenation trick

Using `int` for string sizes

Empty pattern or text

Period formula edge case

KMP automaton memory

Debugging tip

Mental Traps

Common Mistakes

Debug Drills

Self-Test

Practice Problems

Further Reading

Recap

Cross-Links