Skip to content

KMP and Z-Function

Linear-time string matching and prefix analysis -- the two workhorses that replace brute-force O(nm) pattern search on Codeforces.

Difficulty: [Intermediate]Prerequisites: Arrays and StringsCF Rating Range: 1500-1900

Quick Revisit

  • USE WHEN: exact pattern matching, period/border finding, or prefix-suffix analysis in O(n)
  • INVARIANT: pi[i] = longest proper prefix of s[0..i] that is also a suffix
  • TIME: O(n + m) | SPACE: O(n + m)
  • CLASSIC BUG: wrong separator -- use a character not in the alphabet (e.g., '#') between pattern and text
  • PRACTICE: Strings Practice Ladder

Contents

Contest Frequency: ** Regular -- appears in pattern matching problems


Intuition

KMP and Z are two views of the same idea: prefix reuse. Both arrays answer "how much of the pattern do we already know matches?" but from different vantage points -- KMP looks backward (longest prefix that is also a suffix of s[0..i]), Z looks forward (longest substring at position i that is a prefix of s). Either array solves pattern matching via the p + '#' + t concatenation trick. Pick by taste: Z is usually the easier first implementation; KMP's failure function generalizes to automaton DPs.

With that promise in hand:

Search for pattern $p = $ "abab" in text $t = $ "ababcabab".

The naive approach slides p one position at a time across t, restarting the comparison from the first character of p after every mismatch:

  t: a b a b c a b a b
  p: a b a b              i=0: match 4 chars, then mismatch at t[4]='c'
     . a b a b            i=1: mismatch at t[1]='b' vs p[0]='a'  (wasted!)
       . a b a b          i=2: match 4 chars, mismatch at t[6]='a' vs p[4]=?
                                ... wait, p only has length 4, so full match!
         . a b a b        i=3: mismatch immediately                (wasted!)
           . a b a b      i=4: mismatch immediately                (wasted!)
             . a b a b    i=5: match 4 chars -> full match!

Many of those restarts are redundant. At i=0 we already matched "abab" and saw that $t[4] = $ 'c'. The suffix "ab" of what we matched is also a prefix of p -- so we know a potential match starting at i=2 already has its first two characters confirmed. Yet the naive method re-reads them.

Worst case: $t = $ "aaa...a" (n characters), $p = $ "aaa...ab" (m characters). Every alignment matches m1 characters before failing, giving O(nm) total comparisons.

KMP: When a mismatch occurs, the failure function tells you exactly how far to shift -- reusing characters you have already matched -- so you never backtrack in the text.

The failure function (prefix function) π[i] for a string s answers: "What is the length of the longest proper prefix of s[0..i] that is also a suffix?" With π in hand, a mismatch at pattern position j means the pattern's first π[j1] characters still align with the text -- jump j to π[j1] instead of rewinding to 0. The text pointer never moves backward.

Z-function: Z[i] equals the length of the longest substring starting at position i that matches a prefix of the string -- computed in O(n) by maintaining a window of the rightmost match.

For pattern matching, concatenate p+#+t and compute Z. Any position i with Z[i]|p| marks a match. The window [l,r) (called the Z-box) lets each new Z[i] start from previously known information instead of from scratch.

Why Z runs in linear time, said carefully:

  • The Z-box [l,r) is "the rightmost segment of s that we have already verified to be a prefix of s." If i lies inside the box, then s[i..r1] is by construction equal to s[il..rl1] -- a prefix slice we already analysed. So Z[i] starts at min(Z[il], ri) for free.
  • The clip to ri is essential: copying Z[il] blindly may claim a match that extends past r, and the characters past r have not been verified yet. Anything past the right edge has to be checked by hand.
  • The naive extension loop only runs when it pushes r further right. Since r moves only forward and rn, the total extension work across all i is O(n) -- exactly the same amortization as the text pointer in KMP.

Build the ideas first, name them last: failure function, Z-box, prefix function -- these are just names for the reuse mechanisms above.

Visual Walkthrough

KMP: Failure-function construction for $p = $ "abab"

  p:     a   b   a   b
  index: 0   1   2   3

  Step 1  i=0: pi[0] = 0                       (by definition)

  Step 2  i=1: compare p[1]='b' vs p[0]='a'
                mismatch, j falls to 0         -> pi[1] = 0

  Step 3  i=2: compare p[2]='a' vs p[0]='a'
                match, j becomes 1             -> pi[2] = 1
                (prefix "a" == suffix "a")

  Step 4  i=3: compare p[3]='b' vs p[1]='b'
                match, j becomes 2             -> pi[3] = 2
                (prefix "ab" == suffix "ab")

  Result:
  +-------+-------+-------+-------+
  | p[i]  |   a   |   b   |   a   |   b   |
  +-------+-------+-------+-------+-------+
  | pi[i] |   0   |   0   |   1   |   2   |
  +-------+-------+-------+-------+-------+

KMP: Matching $p = $ "abab" in $t = $ "ababcabab"

  j = position in pattern, i = position in text.

  Step 1  i=0 j=0: t[0]='a'==p[0]  -> j=1
  Step 2  i=1 j=1: t[1]='b'==p[1]  -> j=2
  Step 3  i=2 j=2: t[2]='a'==p[2]  -> j=3
  Step 4  i=3 j=3: t[3]='b'==p[3]  -> j=4 == |p|  => MATCH at i-|p|+1 = 0
          After match: j = pi[3] = 2   (reuse "ab" overlap)

  Step 5  i=4 j=2: t[4]='c'!=p[2]='a'
          Fallback: j = pi[1] = 0
          t[4]='c'!=p[0]='a'         -> j stays 0, advance i

  Step 6  i=5 j=0: t[5]='a'==p[0]  -> j=1
  Step 7  i=6 j=1: t[6]='b'==p[1]  -> j=2
  Step 8  i=7 j=2: t[7]='a'==p[2]  -> j=3
  Step 9  i=8 j=3: t[8]='b'==p[3]  -> j=4 == |p|  => MATCH at i-|p|+1 = 5

  Text pointer i moved forward every step -- never backward.
  Two matches found: positions 0 and 5.

Z-function: brief walkthrough on $s = $ "abab#ababcabab"

  (This is the concatenation p + '#' + t with p = "abab", t = "ababcabab".)

  s:  a  b  a  b  #  a  b  a  b  c  a  b  a  b
  i:  0  1  2  3  4  5  6  7  8  9  10 11 12 13

  Key Z-values (Z[0] undefined):
    Z[5]  = 4   s[5..8]  = "abab" matches prefix of length 4  => match at t[0]
    Z[10] = 4   s[10..13]= "abab" matches prefix of length 4  => match at t[5]

  Any Z[i] >= |p| = 4 signals a match.  Position in t = i - |p| - 1.

  The Z-box [l, r) jumps forward as we compute:
    After Z[5]=4:  l=5,  r=9
    i=6 is inside [5,9), so Z[6] >= min(Z[1], 9-6) = min(0, 3) = 0
    Extend naively: s[0]='a' vs s[6]='b' -> stop. Z[6]=0.
    ... and so on.

Worked Example: KMP Failure Array for "ABABAC"

  Pattern: A  B  A  B  A  C
  Index:   0  1  2  3  4  5

  | i | p[i] | j before | Compare              | Action              | fail[i] |
  |---|------|----------|----------------------|---------------------|---------|
  | 0 |  A   |    --     | --                    | base case           |    0    |
  | 1 |  B   |    0     | p[1]='B' vs p[0]='A' | mismatch, j stays 0 |    0    |
  | 2 |  A   |    0     | p[2]='A' vs p[0]='A' | match, j->1          |    1    |
  | 3 |  B   |    1     | p[3]='B' vs p[1]='B' | match, j->2          |    2    |
  | 4 |  A   |    2     | p[4]='A' vs p[2]='A' | match, j->3          |    3    |
  | 5 |  C   |    3     | p[5]='C' vs p[3]='B' | mismatch -> j=fail[2]=1 |      |
  |   |      |    1     | p[5]='C' vs p[1]='B' | mismatch -> j=fail[0]=0 |      |
  |   |      |    0     | p[5]='C' vs p[0]='A' | mismatch, j stays 0 |    0    |

  fail[] = [ 0, 0, 1, 2, 3, 0 ]

  At i=5 the chain 3->1->0 shows the failure function "cascading"
  through all shorter borders before giving up.

Worked Example: KMP Matching "ABABAC" in "ABABABABAC"

  Text:    A B A B A B A B A C
  Index:   0 1 2 3 4 5 6 7 8 9
  Pattern: A B A B A C              fail[] = [0, 0, 1, 2, 3, 0]

  Step | i | j | Compare            | Action
  -----|---|---|--------------------|-----------------------------------------
    1  | 0 | 0 | t[0]='A' = p[0]   | match, j->1
    2  | 1 | 1 | t[1]='B' = p[1]   | match, j->2
    3  | 2 | 2 | t[2]='A' = p[2]   | match, j->3
    4  | 3 | 3 | t[3]='B' = p[3]   | match, j->4
    5  | 4 | 4 | t[4]='A' = p[4]   | match, j->5
    6  | 5 | 5 | t[5]='B' != p[5]='C' | MISMATCH -> j=fail[4]=3
    7  | 5 | 3 | t[5]='B' = p[3]   | match, j->4   (reused "ABAB" overlap)
    8  | 6 | 4 | t[6]='A' = p[4]   | match, j->5
    9  | 7 | 5 | t[7]='B' != p[5]='C' | MISMATCH -> j=fail[4]=3
   10  | 7 | 3 | t[7]='B' = p[3]   | match, j->4   (reused again)
   11  | 8 | 4 | t[8]='A' = p[4]   | match, j->5
   12  | 9 | 5 | t[9]='C' = p[5]   | match, j->6 = |p| -> MATCH at pos 4

  Alignment at match:
    Text:  A B A B [A B A B A C]
    Index: 0 1 2 3  4 5 6 7 8 9
                    ^-----------^  pattern found at position 4

  Key insight: at steps 6 and 9, the failure function let us skip
  back to j=3 instead of restarting from j=0 -- saving 6 comparisons.

Worked Example: Z-Function Trace for "aabxaab"

  String: a  a  b  x  a  a  b
  Index:  0  1  2  3  4  5  6

  | i | s[i] | [L,R) before | Inside window? | Action                    | Z[i] | [L,R) after |
  |---|------|-------------|----------------|---------------------------|------|-------------|
  | 1 |  a   | [0, 0)      | No -> naive     | s[1]='a'=s[0] OK s[2]='b'!=s[1] X | 1 | [1, 2)    |
  | 2 |  b   | [1, 2)      | No (i>=R)->naive | s[2]='b'!=s[0]='a' X      |  0   | [1, 2)      |
  | 3 |  x   | [1, 2)      | No (i>=R)->naive | s[3]='x'!=s[0]='a' X      |  0   | [1, 2)      |
  | 4 |  a   | [1, 2)      | No (i>=R)->naive | s[4..6]="aab"=s[0..2] OK  |  3   | [4, 7)      |
  | 5 |  a   | [4, 7)      | Yes, k=1       | Z[1]=1 < R-i=2 -> copy    |  1   | [4, 7)      |
  | 6 |  b   | [4, 7)      | Yes, k=2       | Z[2]=0 < R-i=1 -> copy    |  0   | [4, 7)      |

  Z[] = [ -, 1, 0, 0, 3, 1, 0 ]

  At i=5: we're inside the Z-box [4,7), so s[4..6] = s[0..2].
  Position 5 within the box corresponds to position 1 in the prefix,
  so Z[5] = Z[1] = 1 -- computed for free, no character comparisons.

The Invariant

KMP invariant. π[i] = length of the longest proper prefix of s[0..i] that is also a suffix of s[0..i]. During matching, the text pointer i never decreases -- each character of the text is examined at most once. Total work: O(n+m).

Z-function invariant. Z[i] is computed using the [l,r) window (the rightmost segment known to match a prefix). If i<r, we can copy from Z[il] up to ri characters for free; naive extension only happens when we push r further right. Since r only increases, total naive extension steps across all i are O(n).

What the Code Won't Teach You

  1. The prefix function encodes ALL borders, not just the longest. The chain π[n-1] -> π[π[n-1]-1] -> ... -> 0 gives every length where a prefix equals a suffix. This chain is what makes problems like "find a border that also appears in the interior" (CF 126B) solvable in O(n).

  2. KMP's automaton interpretation opens up DP. The transition table δ[state][char] turns KMP into a DFA. This DFA becomes the state space for DP problems like "count strings of length n containing pattern p." The prefix function alone can't do this -- you need the full automaton.

  3. Z-function's window [l, r) is analogous to Manacher's [l, r]. Both algorithms maintain a "rightmost known good region" and reuse previously computed information for positions inside it. Understanding one makes the other trivial. The key: r only moves right, giving O(n) total.

  THE FAILURE LINK CHAIN -- all borders of "abacabab":

  s:   a  b  a  c  a  b  a  b
  π:   0  0  1  0  1  2  3  2

  Border chain from π[7]=2:
  ┌─────────────────────────────────────┐
  │ π[7] = 2  ->  border "ab"           │
  │ π[1] = 0  ->  no more borders       │
  └─────────────────────────────────────┘

  ALL borders of "abacabab": just length 2 ("ab").
  
  Border chain from π[6]=3 (string "abacaba"):
  ┌─────────────────────────────────────┐
  │ π[6] = 3  ->  border "aba"          │
  │ π[2] = 1  ->  border "a"            │
  │ π[0] = 0  ->  stop                  │
  └─────────────────────────────────────┘

  ALL borders of "abacaba": lengths 3 ("aba") and 1 ("a").

When to Reach for This

Trigger phrases in problem statements:

  • "find all occurrences of P in T" -- exact pattern matching
  • "count occurrences of P in T"
  • "prefix function", "failure function" -- direct KMP
  • "period of a string", "shortest repeating unit"
  • "longest prefix which is also a suffix" -- string borders via π

Counter-examples (reach for something else):

  • Multiple patterns simultaneously in one text Aho-Corasick
  • Approximate / fuzzy matching (edit distance k) DP-based techniques
  • Longest common substring of two strings suffix array or suffix automaton
  • Lexicographic comparisons of many substrings suffix array

C++ STL Reference

Function / ClassHeaderWhat it doesTime Complexity
s.find(pattern)<string>Find first occurrence of pattern in sO(nm) worst case
s.find(pattern, pos)<string>Find first occurrence starting at posO(nm) worst case
s.rfind(pattern)<string>Find last occurrenceO(nm) worst case
search(b1, e1, b2, e2)<algorithm>Find subsequence in rangeO(nm) worst case
mismatch(b1, e1, b2)<algorithm>Find first position where ranges differO(n)

Why KMP/Z beats string::find: The STL find uses a naive or slightly optimized algorithm -- no guaranteed O(n+m). For n,m106, naive search can TLE on adversarial inputs (e.g., s = "aaa...a", p = "aaa...ab"). KMP and Z-function guarantee O(n+m) regardless of input.


Implementation (Contest-Ready)

4a. KMP -- Prefix Function (Minimal)

cpp
#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string &s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

// Find all occurrences of p in t. Returns 0-indexed positions in t.
vector<int> kmp_search(const string &p, const string &t) {
    string s = p + '#' + t;
    auto pi = prefix_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (pi[i] == m)
            res.push_back(i - 2 * m);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = kmp_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4b. KMP -- Prefix Function (Explained)

cpp
#include <bits/stdc++.h>
using namespace std;

vector<int> prefix_function(const string &s) {
    int n = s.size();
    vector<int> pi(n, 0);
    // pi[0] = 0 by definition (no proper prefix/suffix for single char)
    for (int i = 1; i < n; i++) {
        // j = length of the candidate prefix-suffix we're trying to extend.
        // Start from the best we found at the previous position.
        int j = pi[i - 1];

        // If s[i] doesn't extend the current prefix-suffix of length j,
        // fall back to the next shorter one: pi[j-1].
        // This chain of fallbacks is the key insight of KMP.
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];

        // If we can extend, increment length by 1.
        if (s[i] == s[j]) j++;
        pi[i] = j;
    }
    return pi;
}

vector<int> kmp_search(const string &p, const string &t) {
    // Concatenation trick: build "pattern#text" where '#' is a character
    // not in the alphabet. Then pi[i] can never exceed |p|.
    string s = p + '#' + t;
    auto pi = prefix_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++) {
        // pi[i] == m means the last m characters of s[0..i] match p entirely.
        // The match starts at position (i - 2*m) in the original text t.
        // Derivation: position in s is i, prefix "p#" has length m+1,
        //   so position in t is (i - (m+1)), and the match started m-1
        //   characters earlier: (i - (m+1)) - (m-1) = i - 2m.
        if (pi[i] == m)
            res.push_back(i - 2 * m);
    }
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = kmp_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4c. Z-Function (Minimal)

cpp
#include <bits/stdc++.h>
using namespace std;

vector<int> z_function(const string &s) {
    int n = s.size();
    vector<int> z(n, 0);
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r) z[i] = min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

// Find all occurrences of p in t. Returns 0-indexed positions in t.
vector<int> z_search(const string &p, const string &t) {
    string s = p + '#' + t;
    auto z = z_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (z[i] >= m)
            res.push_back(i - m - 1);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = z_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

4d. Z-Function (Explained)

cpp
#include <bits/stdc++.h>
using namespace std;

vector<int> z_function(const string &s) {
    int n = s.size();
    vector<int> z(n, 0);
    // [l, r) is the "Z-box": the rightmost segment that matches a prefix of s.
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        // If i is inside the current Z-box, we can reuse a previously computed
        // Z-value. z[i - l] tells us how far the match extends from the
        // corresponding position in the prefix copy, but we can't go past r.
        if (i < r) z[i] = min(r - i, z[i - l]);

        // Extend naively from where we left off.
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;

        // Update the Z-box if this extends further right.
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
    return z;
}

vector<int> z_search(const string &p, const string &t) {
    // Concatenation trick: "pattern#text".
    // '#' must not appear in p or t.
    // Any position i in the concatenated string with z[i] >= |p|
    // corresponds to a match in t at position i - |p| - 1.
    string s = p + '#' + t;
    auto z = z_function(s);
    int m = p.size();
    vector<int> res;
    for (int i = m + 1; i < (int)s.size(); i++)
        if (z[i] >= m)
            res.push_back(i - m - 1);
    return res;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string t, p;
    cin >> t >> p;
    auto occ = z_search(p, t);
    cout << occ.size() << "\n";
    for (int x : occ) cout << x << " ";
    cout << "\n";
}

ASCII Diagram: KMP Automaton for Pattern "ABABAC"

Pattern:  A  B  A  B  A  C
State:    0  1  2  3  4  5  6(accept)
π[]:      0  0  1  2  3  0

State machine (match transitions -> solid, failure transitions ⇢ dashed):

     A       B       A       B       A       C
 (0)───▷(1)───▷(2)───▷(3)───▷(4)───▷(5)───▷(6) ACCEPT
  │      │      │      │      │      │
  │fail  │fail  │fail  │fail  │fail  │fail
  ▼      ▼      ▼      ▼      ▼      ▼
 (0)    (0)    (1)    (2)    (3)    (0)
  ↑      ↑             ↑             ↑
  └──────┘             └─────────────┘
   on mismatch          on mismatch

Transition table (state x char -> next state):
  State   'A'   'B'   'C'   other
  ─────────────────────────────────
    0       1     0     0     0
    1       1     2     0     0
    2       3     0     0     0
    3       3     4     0     0
    4       5     0     0     0
    5       1     2     6     0

Reading text "ABABABABAC":
  Pos:   0  1  2  3  4  5  6  7  8  9
  Char:  A  B  A  B  A  B  A  B  A  C
  State: 1  2  3  4  5  2  3  4  5  6 <- match at pos 4..9!

Key: on mismatch at state s, jump to π[s] and retry (no input consumed).
     This is equivalent to the longest proper prefix that is also a suffix.

Operations Reference

Both algorithms run in linear time, but they solve slightly different sets of problems. Use this table to verify which operations each one supports and at what cost.

Prefix Function (KMP)

OperationTimeSpace
Build prefix function π for string of length nO(n)O(n)
Pattern matching (p in t) via concatenationO(n+m)O(n+m)
Count all occurrencesO(n+m)O(n+m)
Find shortest period of stringO(n)O(n)
Find all borders of stringO(n) totalO(n)

Z-Function

OperationTimeSpace
Build Z-array for string of length nO(n)O(n)
Pattern matching (p in t) via concatenationO(n+m)O(n+m)
Count all occurrencesO(n+m)O(n+m)
Find shortest period of stringO(n)O(n)

Problem Patterns & Tricks

Pattern 1: Exact Pattern Matching

Find all occurrences of pattern p in text t. Build s=p+#+t, compute prefix function or Z-array, and check for values equal to |p|.

cpp
// Using Z-function
string s = p + '#' + t;
auto z = z_function(s);
for (int i = (int)p.size() + 1; i < (int)s.size(); i++)
    if (z[i] >= (int)p.size())
        cout << i - (int)p.size() - 1 << "\n";

Problems: CF 625B, CF 126B

Extracting Match Positions

KMP: When j == |pattern| during the search phase, a match ends at position i in the text. The match STARTS at i - |pattern| + 1 (0-indexed) or i - |pattern| depending on your loop convention. Always verify with a small example.

cpp
// KMP search -- matches at positions where pattern ends
for (int i = 0, j = 0; i < n; i++) {
    while (j > 0 && text[i] != pattern[j]) j = fail[j - 1];
    if (text[i] == pattern[j]) j++;
    if (j == m) {
        // Match found starting at i - m + 1
        matches.push_back(i - m + 1);
        j = fail[j - 1];  // Continue searching (overlapping matches)
    }
}

Z-function: Concatenate pattern + "$" + text (where $ doesn't appear in either). Then z[i] == |pattern| means a match starts at position i - |pattern| - 1 in the text.

The $ separator prevents the Z-values from "leaking" across the boundary.

Pattern 2: Shortest Period of a String

The shortest period of s has length nπ[n1], where π is the prefix function. If n is divisible by this length, the string is a full repetition; otherwise the last copy is partial.

cpp
auto pi = prefix_function(s);
int n = s.size();
int period = n - pi[n - 1];
// If n % period == 0, s = (s[0..period-1]) repeated n/period times.
  PERIOD DETECTION via prefix function:

  s = "abcabcab"  (n=8)
  π:  [0, 0, 0, 1, 2, 3, 4, 5]

  period = n - π[n-1] = 8 - 5 = 3
  
  ┌───┬───┬───┬───┬───┬───┬───┬───┐
  │ a │ b │ c │ a │ b │ c │ a │ b │
  └───┴───┴───┴───┴───┴───┴───┴───┘
  ├─ period ─┤─ period ─┤─ partial─┤
  │  "abc"   │  "abc"   │  "ab"    │

  8 % 3 = 2 != 0 -> NOT a perfect repetition.
  The string HAS period 3, but the last copy is truncated.

  Compare: s = "abcabc" (n=6), π = [0,0,0,1,2,3]
  period = 6 - 3 = 3.  6 % 3 = 0 -> perfect repetition: "abc"x2.

Problems: CF 182D, CF 1138B

Pattern 3: Counting Substring Occurrences

Count how many times each prefix of s appears as a substring. Process π in reverse: each π[i]>0 means the prefix of length π[i] appears once more. Propagate counts up the failure-link chain.

cpp
auto pi = prefix_function(s);
int n = s.size();
vector<int> cnt(n + 1, 1); // each prefix appears at least once (itself)
for (int i = n - 1; i >= 1; i--)
    cnt[pi[i]] += cnt[i];
// cnt[k] = number of times s[0..k-1] appears in s (for k >= 1)
// cnt[0] accumulates leftover and is usually not meaningful.

Problems: CF 432D, CF 1163D

Pattern 4: String Borders (Prefix = Suffix)

All borders of a string s have lengths π[n1],π[π[n1]1],π[π[π[n1]1]1], -- just follow the failure-link chain until you reach 0.

cpp
auto pi = prefix_function(s);
int n = s.size();
vector<int> borders;
int k = pi[n - 1];
while (k > 0) {
    borders.push_back(k);
    k = pi[k - 1];
}
  VISUAL: ALL BORDERS OF "abacaba" via failure chain

  a b a c a b a
  ├─────────────┤  full string (length 7)
  ├─────┤         π[6]=3: prefix "aba" = suffix "aba"  OK
  │a b a│c a│b a│
  ├─┤               π[2]=1: prefix "a" = suffix "a"    OK
  │a│b a c a b│a│
  
  Chain: 3 -> 1 -> 0 (stop). Borders: {3, 1}.

Problems: CF 126B, CF 1092F

Pattern 5: Number of Distinct Substrings (Incremental)

Add characters one by one and use Z-function to count new distinct substrings. When appending character c to current string s, compute Z-function of reverse(s+c). The number of new substrings is |s|+1max(Z[i]).

Problems: CF 271D

Pattern 6: Automaton Construction (KMP Automaton)

Precompute the transition function δ[j][c] for all states j and characters c. This gives O(1) per character matching and is useful when matching against multiple texts or combining with DP.

  KMP AUTOMATON for pattern "aba" (states 0,1,2,3):

  State 0 ──'a'──-> State 1 ──'b'──-> State 2 ──'a'──-> State 3 (MATCH)
    ↑   │            ↑   │            ↑               │
    │  'b'           │  'a'          │              'b'
    │   │            │   │           │                │
    └───┘            └───┘           └────────────────┘
  
  Full transition table:
  ┌───────┬─────┬─────┐
  │ State │  a  │  b  │
  ├───────┼─────┼─────┤
  │   0   │  1  │  0  │
  │   1   │  1  │  2  │
  │   2   │  3  │  0  │
  │   3   │  1  │  2  │  <- after match, fall to π[2]=1 ("a"), then δ[1][b]=2
  └───────┴─────┴─────┘
cpp
auto pi = prefix_function(p);
int m = p.size();
// delta[j][c] = next state when in state j and reading char c
vector<vector<int>> delta(m + 1, vector<int>(26));
for (int j = 0; j <= m; j++) {
    for (int c = 0; c < 26; c++) {
        if (j < m && c == p[j] - 'a')
            delta[j][c] = j + 1;
        else
            delta[j][c] = j == 0 ? 0 : delta[pi[j - 1]][c];
    }
}

Problems: CF 1163D, CF 808G


Contest Cheat Sheet

+---------------------------------------------------------------+
|              KMP & Z-FUNCTION CHEAT SHEET                      |
+---------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   o Pattern matching in O(n+m) instead of O(n*m)               |
|   o String period / shortest repeating unit                    |
|   o All borders of a string (prefix = suffix)                  |
|   o Counting prefix occurrences                                |
+---------------------------------------------------------------+
| CONCATENATION TRICK:                                           |
|   s = pattern + '#' + text                                     |
|   '#' must not appear in either string!                        |
+---------------------------------------------------------------+
| KMP PREFIX FUNCTION:                                           |
|   pi[0]=0; for(i=1..n-1) {                                    |
|     j=pi[i-1]; while(j>0 && s[i]!=s[j]) j=pi[j-1];           |
|     if(s[i]==s[j]) j++; pi[i]=j; }                            |
|   Match: pi[i]==|p| => match at i-2*|p| in text               |
+---------------------------------------------------------------+
| Z-FUNCTION:                                                    |
|   l=r=0; for(i=1..n-1) {                                      |
|     if(i<r) z[i]=min(r-i,z[i-l]);                             |
|     while(i+z[i]<n && s[z[i]]==s[i+z[i]]) z[i]++;            |
|     if(i+z[i]>r) l=i, r=i+z[i]; }                            |
|   Match: z[i]>=|p| => match at i-|p|-1 in text                |
+---------------------------------------------------------------+
| PERIOD: period = n - pi[n-1]                                   |
| COMPLEXITY: O(n) build, O(n+m) match | SPACE: O(n+m)          |
+---------------------------------------------------------------+

Gotchas & Debugging

Off-by-one in prefix function

  • π[0] is always 0 -- the loop starts at i=1. Starting at i=0 won't crash but wastes a cycle and can mask bugs.
  • When extracting match positions from the concatenated string p+#+t, the offset is i2|p| for KMP (prefix function) and i|p|1 for Z-function. Mixing these up shifts all your answers.

Separator character in concatenation trick

  • The separator # must not appear in the input alphabet. If the input can contain any ASCII character, use a character outside the range (e.g., char(0) or char(1)) or just set Z/π values to be capped at |p|.
  • Forgetting the separator entirely makes π or Z values "leak" across the pattern-text boundary, producing phantom matches.

Using int for string sizes

  • string::size() returns size_t. Comparisons like i < s.size() - 1 when s is empty underflow to a huge number. Always cast: i < (int)s.size() - 1.

Empty pattern or text

  • If |p|=0, the concatenated string starts with #, and every position has Z[i]0=|p| -- you'll report matches everywhere. Guard against empty input.

Period formula edge case

  • period=nπ[n1]. If nmodperiod0, the string is not a perfect repetition -- the last copy is truncated. Don't assume full periodicity without checking the divisibility condition.

KMP automaton memory

  • Building the full automaton δ[m+1][|Σ|] uses O(m|Σ|) memory. For |Σ|=26 and m106, that's ~100 MB. In that case, stick with the on-the-fly prefix function instead of precomputing the automaton.

Debugging tip

  • Print the π or Z array for small inputs and verify manually. Off-by-one bugs in the inner while loop are the #1 source of WA.

Mental Traps

Trap 1: "The failure link chain is O(n) per call -- the algorithm must be O(n²)."

The inner while (j > 0 && s[i] != s[j]) j = pi[j-1] looks expensive. But j can only decrease as many times as it has increased. Since j increases by at most 1 per outer iteration, total decreases across ALL iterations <= n.

  AMORTIZED COST OF FAILURE CHAIN:

  j value over time for s = "aabaabaab":
  
  i:  0  1  2  3  4  5  6  7  8
  j:  0  1  0  1  2  3  4  5  2
      ↑  ↑  ↓  ↑  ↑  ↑  ↑  ↑  ↓↓↓
     +1 +1 -1 +1 +1 +1 +1 +1 -3

  Total increases: 7   (one per match)
  Total decreases: 4   (spread across fail chain walks)
  Total <= 2n ⟹ O(n)

Trap 2: "Z-function and KMP prefix function are redundant -- just learn one."

They compute different things. KMP gives π[i] = longest border of s[0..i]. Z gives z[i] = longest prefix of s matching at position i. For "find all borders of a string," KMP is natural (follow the π chain). For "does s[i..i+k-1] equal s[0..k-1]?" Z answers in O(1). Knowing both saves conversion headaches.

  KMP π-array for "aabaaab":
  i:   0  1  2  3  4  5  6
  π:   0  1  0  1  2  2  0     <- borders of each prefix

  Z-array for "aabaaab":
  i:   0  1  2  3  4  5  6
  Z:   -  1  0  2  1  1  0     <- prefix match lengths at each position

  DIFFERENT INFORMATION. Not interchangeable without O(n) conversion.

Trap 3: "I don't need the separator in p#t -- just cap π or Z values at |p|."

Capping works for Z-function (just check z[i] >= |p|). But for KMP's prefix function, without the separator, π[i] can exceed |p|, and capping it retroactively doesn't fix the damage -- the failure links were computed using the uncapped values, so subsequent π[i] values are wrong. Always use the separator with KMP.


Common Mistakes

  1. Missing or wrong separator. Concatenating pattern + text without a separator character (e.g., '#') causes false matches that span the boundary.
  2. Off-by-one in match position. When pi[i] == m in the concatenated string pattern + '#' + text, the match ends at position i - m - 1 in the text. Getting this index wrong misreports match locations.
  3. Not resetting after a match. To find all occurrences with KMP, set j = pi[j-1] after a full match, not j = 0. Resetting to 0 skips overlapping matches.
  4. Z-function off-by-one. z[0] is undefined (or set to 0/n by convention). Starting the scan at i = 1 is correct; starting at i = 0 causes issues.
  5. Confusing KMP and Z-function outputs. pi[i] is the length of the longest border ending at i. z[i] is the length of the longest prefix match starting at i. They answer different questions.
  6. Wrong fallback index in prefix function. "My prefix function seemed correct but produced wrong periods for strings like aabaab..." In the inner while-loop, writing len = pi[len] instead of len = pi[len - 1]. When s[i] != s[len], you want to fall back to the longest border of s[0..len-1], which is pi[len - 1]. Writing pi[len] reads one position too far and gives the border of s[0..len], breaking the invariant. This single off-by-one is the #1 KMP implementation bug.

Debug Drills

Drill 1 -- Prefix function: wrong fallback

cpp
vector<int> prefix_function(string& s) {
    int n = s.size();
    vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int len = pi[i - 1];
        while (len > 0 && s[i] != s[len])
            len = pi[len];  // BUG
        if (s[i] == s[len]) len++;
        pi[i] = len;
    }
    return pi;
}
What's wrong?

pi[len] should be pi[len - 1]. When falling back, you want the longest border of the prefix s[0..len-1], not s[0..len]:

cpp
len = pi[len - 1];

Drill 2 -- Z-function: off-by-one in window update

cpp
vector<int> z_function(string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    int l = 0, r = 0;
    for (int i = 1; i < n; i++) {
        if (i < r)
            z[i] = min(r - i, z[i - l]);
        while (i + z[i] < n && s[z[i]] == s[i + z[i]])
            z[i]++;
        if (i + z[i] > r)
            l = i, r = i + z[i] - 1;  // BUG
    }
    return z;
}
What's wrong?

r should be i + z[i], not i + z[i] - 1. The convention is that r is the exclusive right endpoint of the Z-box [l, r). With r = i + z[i] - 1, the window shrinks by 1 and the condition i < r misses the last character:

cpp
if (i + z[i] > r)
    l = i, r = i + z[i];

Drill 3 -- KMP matching: wrong occurrence index

cpp
string combined = pattern + "#" + text;
auto pi = prefix_function(combined);
for (int i = 0; i < combined.size(); i++) {
    if (pi[i] == pattern.size())
        cout << i << "\n";  // BUG: prints index in combined, not in text
}
What's wrong?

Position i is in the combined string pattern + '#' + text. The occurrence in text starts at i - 2 * pattern.size(). Correct:

cpp
if (pi[i] == (int)pattern.size())
    cout << (i - 2 * (int)pattern.size()) << "\n";

More precisely, the match ends at position i - pattern.size() - 1 in text, so it starts at i - 2 * pattern.size().


Self-Test

Before moving to the practice problems, you should be able to answer these without looking anything up:

  • [ ] Compute the prefix function for "aabaaab" by hand and verify each value.
  • [ ] Compute the Z-array for "aabaaab" by hand, tracking the [l, r) window at each step.
  • [ ] Given pattern "aba" and text "abababa", use the concatenation trick with BOTH KMP and Z-function to find all match positions. Verify the position formulas differ.
  • [ ] Find all borders of "abacabac" using the failure link chain. State every border length.
  • [ ] Determine the shortest period of "abcabcab" using the period formula. Is it a perfect repetition? Why or why not?
  • [ ] Build the KMP automaton δ[j][c] for pattern "ab" over alphabet {a, b}. Write out the full 3x2 transition table.
  • [ ] Explain in one sentence why KMP and Z-function are both O(n), referencing what quantity "only moves forward."

Practice Problems

CF RatingWhat You Should Be Able to Do
1200Use strstr or brute-force matching; understand what a prefix function value means.
1500Implement KMP and Z-function from scratch; solve single-pattern matching and period-finding problems.
1800Use the KMP automaton for DP on strings; exploit Z-function for multi-pattern tricks via concatenation.
2100Combine KMP/Z with other structures (segment tree, suffix array) for advanced string DP or online matching.
#ProblemSourceDifficultyKey Idea
1Pattern FindCSESEasyDirect KMP or Z-function pattern matching
2PasswordCF 126B1700Find longest border that also occurs in the middle
3MUH and Cube WallsCF 471D1800KMP on difference array
4Prefixes and SuffixesCF 432D2000All borders + counting prefix occurrences
5Anthem of BerlandCF 808G2300KMP automaton + DP
6Periodic StringsCF 182D1500Shortest period via prefix function
7String MatchingCSESEasyZ-function concatenation trick
8Shift and ReverseCF 1138B1600Period analysis with prefix function
9Vasya and Big IntegersCF 1051E2300Z-function + DP for string comparison
10Sasha and a Very Easy TestCF 1163D2200KMP automaton transitions + DP on strings

Further Reading

Related guidebook entries:


Recap

  • KMP prefix function: pi[i] = longest proper prefix of s[0..i] that equals a suffix. Built in O(n). Pattern matching via pattern + '#' + text.
  • Z-function: z[i] = length of the longest substring starting at i that matches a prefix of s. Also O(n).
  • KMP vs Z-function: largely interchangeable for pattern matching. Z-function is often simpler to code; KMP integrates naturally into automaton-based solutions.
  • Key applications: pattern search, period/border detection, string compression, counting occurrences.

Built for the climb from Codeforces 1100 to 2100.