Appearance
Aho-Corasick
Quick Revisit
- USE WHEN: Searching for multiple patterns simultaneously in a single text scan
- INVARIANT: Failure link of node v points to the longest proper suffix of v's string that is a prefix of some pattern in the trie
- TIME: O(n + m + z) where n = text length, m = total pattern length, z = matches | SPACE: O(m × σ)
- CLASSIC BUG: Not following the dictionary suffix link chain to report all matching patterns at each position (misses overlapping shorter patterns)
- PRACTICE: 07-ladder-strings
AL-37 | Multi-pattern string matching in linear time -- build a trie, thread failure links through it, and scan the text once to find all occurrences of all patterns in
Difficulty: [Advanced]Prerequisites: KMP and Z-Function, TrieCF Rating Range: 1900 -- 2200+
Contents
- Intuition
- C++ STL Reference
- Implementation
- Operations Reference
- Problem Patterns & Tricks
- Contest Cheat Sheet
- Gotchas & Debugging
- Self-Test
- Practice Problems
- Further Reading
Intuition
You have three patterns -- "he", "she", "his" -- and a text "ahishers". The brute-force plan: run KMP (or any single-pattern matcher) once per pattern. Each run costs
We need a way to search for all patterns in a single pass through the text.
Build a trie of all patterns, then add failure links (like KMP's failure function but across patterns) -- now a single pass through the text finds all pattern occurrences simultaneously.
KMP is a single highway: one pattern, one failure function, one automaton. Aho-Corasick is an interstate system with on-ramps between patterns. Each failure link is an on-ramp that says "the suffix you just read also happens to be a prefix of this other pattern -- jump there and keep driving." By threading all these on-ramps through the shared trie, one scan of the text rides every highway at once.
Visual Walkthrough
Patterns: "he", "she", "his". Text: "ahishers".
Step 1 -- Build the trie.
(root)
/ \
h s
| |
[1] [4]
/ \ |
e i h
| | |
[2] [3] [5]
{he} | |
s e
| |
[*] [6]
{his} {she}Nodes marked {...} are pattern-end nodes.
TRIE STRUCTURE after inserting "he", "she", "his":
root
├──'h'──-> [h]
│ ├──'e'──-> [he]OK
│ └──'i'──-> [hi]
│ └──'s'──-> [his]OK
└──'s'──-> [s]
└──'h'──-> [sh]
└──'e'──-> [she]OK
Nodes marked OK = end of a pattern.
Total trie nodes: 7 (excluding root).Step 2 -- Compute failure links (BFS from root).
Failure links point to the longest proper suffix of the node's string that exists as a prefix of some node in the trie.
Node String Fail Notes
---- ------ ---- -----
root "" root
[1] "h" root no proper suffix "h" elsewhere
[4] "s" root
[2] "he" root "e" is not in the trie as a prefix
[3] "hi" root
[5] "sh" [1] suffix "h" = node [1]
[*] "his" root "is","s" not in trie as prefixes
[6] "she" [2] suffix "he" = node [2] {he} <-- dictionary suffix link!The critical link: [6] "she" --fail--> [2] "he". Whenever we match "she", the failure chain also reaches the node for "he", so both matches are reported.
Step 3 -- Scan text "ahishers" through the automaton.
Pos Char State Action
--- ---- ----- ------
0 a root no edge 'a' from root -> stay at root
1 h [1] edge 'h' from root
2 i [3] edge 'i' from [1]="h"
3 s [*] edge 's' from [3]="hi"
** MATCH "his" ending at pos 3 **
4 h [1] fail from [*] has no 'h'; root has 'h' -> [1]
5 e [2] edge 'e' from [1]="h"
** MATCH "he" ending at pos 5 **
6 r root no edge 'r' from [2]; fail to root; no 'r' -> root
7 s [4] edge 's' from rootStep 4 -- Collect results.
Wait -- did we miss "she"? Let us re-examine. In "ahishers", the substring "she" starts at position 4: a h i s h e r s. Positions 4-5-6 are h e r, not s h e. In fact "she" does not appear in "ahishers". The automaton is correct: it found "his" (positions 1..3) and "he" (positions 4..5). Two matches, one pass,
Step 5 -- Why failure links matter.
After matching "his" at node [*], the next character is 'h'. Node [*] has no child 'h', so the automaton follows the failure link to root, finds edge 'h', and lands at [1]. Without failure links we would have to restart from the root and rescan -- losing the overlap. The links make the automaton never backtrack in the text.
Overlapping patterns: {he, she, hers} over text "ushers"
The walkthrough above did not actually exercise the dictionary suffix link, because "ahishers" happens not to contain "she". Here is the canonical small example that does. Patterns: he, she, hers. Text: "ushers".
Trie: Failure links (only the interesting ones):
root
/ \ sh --fail-> h (suffix "h" of "sh")
h s she* --fail-> he* (suffix "he" of "she")
/ \ \ her --fail-> root
e r h hers*--fail-> s (suffix "s" of "hers")
* | \
s e Dictionary suffix links:
* * she* --dict-> he* (because he* IS a pattern,
reachable via the fail chain)
others --dict-> nothing useful
Stars (*) mark pattern-end nodes.
Text scan ("ushers"):
Pos Char State Outputs at this position
--- ---- ------------- ---------------------------------------
0 u root --
1 s [s] --
2 h [sh] --
3 e [she]* "she" (state itself is terminal)
-> follow dict link to [he]*: also report "he"
4 r [her] --
5 s [hers]* "hers" (terminal)
-> follow dict link: nothing further
Reported matches: she @ pos 1..3
he @ pos 2..3 <-- found ONLY via the dict suffix link
hers@ pos 2..5The line that matters is at position 3. The automaton state [she] is itself a pattern, so we report "she". But the dictionary suffix link from [she] jumps to [he], which is also a pattern -- so without traversing dict links we would silently miss "he". This is the entire reason dictionary suffix links exist as a separate notion from failure links: failure links find the right state to be in, dictionary suffix links find every pattern that ends at the current text position.
The Invariant
Invariant. At position
in the text, the automaton state represents the longest suffix of that is a prefix of some pattern in the trie. Failure links point from each node to the longest proper suffix of that node's string that is also a node in the trie. Dictionary suffix links further point to the nearest ancestor (via the failure chain) that marks a pattern end, ensuring no match is missed.
What the Code Won't Teach You
The 40-line AC template hides three conceptual layers that you must understand separately:
Why BFS order matters for construction. Failure links are computed in BFS order because
fail[v]depends onfail[parent[v]], which must already be computed (it's shallower). If you compute in DFS order,fail[parent[v]]might not exist yet.The automaton is a DFA -- every state has a transition for every character. The
go[v][c]array has no holes. Missing trie edges are filled by following fail links during BFS. This is what makes text scanning O(|text|) -- no fail-chain chasing at query time.Dictionary suffix links are a separate optimization. They shortcut the fail chain to the nearest complete pattern. Without them, collecting all matches at a position requires walking the entire fail chain (potentially O(depth) per character). With them, you jump directly to relevant matches.
Layer 1: TRIE Layer 2: + FAIL LINKS Layer 3: + OUTPUT LINKS
┌─────────────┐ ┌─────────────────┐ ┌──────────────────────┐
│ Insert all │ │ BFS to compute │ │ For each node, cache │
│ patterns │ -> │ fail[v] for all │ -> │ nearest pattern via │
│ into trie │ │ nodes in BFS │ │ fail chain │
│ │ │ order; fill all │ │ │
│ O(M) time │ │ go[v][c] edges │ │ O(M) time │
│ │ │ O(M*Σ) time │ │ │
└─────────────┘ └─────────────────┘ └──────────────────────┘When to Reach for This
Trigger phrases in problem statements:
- "multiple pattern matching" / "dictionary of patterns in a text"
- "banned substrings" / "forbidden words"
- "how many of the given strings appear in
?" - "count occurrences of each pattern"
- "DP avoiding a set of forbidden patterns" (AC automaton + DP)
Counter-examples -- when not to use AC:
- Single pattern -- use KMP or Z-function. AC adds unnecessary trie overhead.
- Patterns are all substrings of each other (e.g.
"a","aa","aaa") -- AC still works but the dictionary-link walk becomesper character; use the push-down technique (see Section 6, Pattern 4) or consider suffix structures. - You need suffix/substring structure of the text itself -- suffix array or suffix automaton is the right tool.
C++ STL Reference
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
queue<int> | <queue> | BFS queue for computing failure links | |
array<int, 26> | <array> | Fixed-size child pointer array per trie node | |
vector<int> | <vector> | Storing node data, pattern indices, output lists | |
string::operator[] | <string> | Character access during text scan | |
fill(first, last, val) | <algorithm> | Initialize child arrays to | |
memset | <cstring> | Fast bulk initialization of node arrays |
Note: AC automaton is a hand-built data structure. The STL provides only basic containers -- there is no std::aho_corasick. The table above lists the STL pieces you'll compose into the implementation.
Implementation (Contest-Ready)
Failure Links != Output Links
Failure link (suffix link): Points to the longest proper suffix of the current string that is also a prefix of some pattern. Used to navigate the automaton on mismatch.
Output link (dictionary suffix link): Points to the longest proper suffix of the current string that is itself a COMPLETE pattern. Used to find ALL patterns that end here.
Why you need BOTH:
- Failure links let you traverse efficiently (like KMP's failure function).
- Output links let you enumerate all matches. Without output links, you only find the longest matching pattern at each position, missing shorter ones.
Precompute all transitions: Instead of following failure links at runtime, precompute go[v][c] for every node and character:
cpp
// In BFS order:
if (ch is a child of v) {
go[v][c] = child;
} else {
go[v][c] = go[fail[v]][c]; // Inherit from failure link
}This makes each step O(1) instead of following a chain of failure links.
Version 1: Minimal -- Array-Based AC Automaton
cpp
#include <bits/stdc++.h>
using namespace std;
struct AhoCorasick {
static constexpr int ALPHA = 26;
struct Node {
array<int, ALPHA> ch;
int fail = 0, end_cnt = 0, dict = -1;
Node() { ch.fill(-1); }
};
vector<Node> t;
AhoCorasick() : t(1) {}
void insert(const string& s, int id) {
int v = 0;
for (char c : s) {
int ci = c - 'a';
if (t[v].ch[ci] == -1) {
t[v].ch[ci] = (int)t.size();
t.emplace_back();
}
v = t[v].ch[ci];
}
t[v].end_cnt++;
t[v].dict = id;
}
void build() {
queue<int> q;
for (int c = 0; c < ALPHA; c++) {
if (t[0].ch[c] == -1)
t[0].ch[c] = 0;
else {
t[t[0].ch[c]].fail = 0;
q.push(t[0].ch[c]);
}
}
while (!q.empty()) {
int v = q.front(); q.pop();
for (int c = 0; c < ALPHA; c++) {
int u = t[v].ch[c];
int f = t[t[v].fail].ch[c];
if (u == -1) {
t[v].ch[c] = f;
} else {
t[u].fail = f;
// dictionary suffix link: nearest ancestor via fail with end_cnt > 0
t[u].dict = (t[f].end_cnt > 0) ? f : t[f].dict;
q.push(u);
}
}
}
}
// Count total matches in text. Modify for specific needs.
long long search(const string& text) {
long long cnt = 0;
int v = 0;
for (char c : text) {
v = t[v].ch[c - 'a'];
for (int u = v; u > 0; u = t[u].dict) {
if (t[u].end_cnt > 0)
cnt += t[u].end_cnt;
}
}
return cnt;
}
};
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n;
cin >> n;
AhoCorasick ac;
for (int i = 0; i < n; i++) {
string s;
cin >> s;
ac.insert(s, i);
}
ac.build();
string text;
cin >> text;
cout << ac.search(text) << "\n";
}Version 2: Explained -- With Per-Pattern Counting
cpp
#include <bits/stdc++.h>
using namespace std;
struct AhoCorasick {
static constexpr int ALPHA = 26;
struct Node {
array<int, ALPHA> ch; // ch[c] = index of child node for character c
int fail = 0; // failure (suffix) link
int dict = -1; // dictionary suffix link: nearest pattern-end ancestor via fail chain
int pat_id = -1; // which pattern ends here (-1 if none)
Node() { ch.fill(-1); }
};
vector<Node> t;
int num_patterns = 0;
AhoCorasick() : t(1) {}
// Insert pattern s with index id into the trie.
// Time: O(|s|).
void insert(const string& s, int id) {
int v = 0;
for (char c : s) {
int ci = c - 'a';
if (t[v].ch[ci] == -1) {
t[v].ch[ci] = (int)t.size();
t.emplace_back();
}
v = t[v].ch[ci];
}
// If multiple patterns are identical, only the last id is stored.
// For multi-id, use a vector<int> or linked list.
t[v].pat_id = id;
num_patterns = max(num_patterns, id + 1);
}
// Build failure links and dictionary suffix links via BFS.
// After build(), t[v].ch[c] is defined for ALL (v, c) pairs --
// missing edges are filled in to point to the correct fail-chain node,
// turning the trie into a full automaton (no explicit fail-following needed during search).
// Time: O(m * ALPHA) where m = total pattern length = number of trie nodes.
void build() {
queue<int> q;
// Initialize root's children: missing edges loop back to root.
for (int c = 0; c < ALPHA; c++) {
if (t[0].ch[c] == -1) {
t[0].ch[c] = 0; // missing edge -> back to root
} else {
t[t[0].ch[c]].fail = 0;
q.push(t[0].ch[c]);
}
}
while (!q.empty()) {
int v = q.front(); q.pop();
for (int c = 0; c < ALPHA; c++) {
int u = t[v].ch[c]; // actual child
int f = t[t[v].fail].ch[c]; // where fail link's child goes
if (u == -1) {
// No real child for c -- automaton edge goes to f.
t[v].ch[c] = f;
} else {
// Real child exists. Its failure link = f.
t[u].fail = f;
// Dictionary suffix link: nearest node in the fail chain that ends a pattern.
t[u].dict = (t[f].pat_id >= 0) ? f : t[f].dict;
q.push(u);
}
}
}
}
// Search text and count occurrences of each pattern.
// Returns a vector cnt where cnt[id] = number of occurrences of pattern id.
// Time: O(|text| + z), where z = total number of matches.
vector<int> search(const string& text) {
vector<int> cnt(num_patterns, 0);
int v = 0;
for (char c : text) {
v = t[v].ch[c - 'a']; // automaton transition -- always valid after build()
// Walk dictionary suffix links to collect all patterns ending here.
for (int u = v; u > 0; u = t[u].dict) {
if (t[u].pat_id >= 0)
cnt[t[u].pat_id]++;
}
}
return cnt;
}
};
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n;
cin >> n;
AhoCorasick ac;
vector<string> patterns(n);
for (int i = 0; i < n; i++) {
cin >> patterns[i];
ac.insert(patterns[i], i);
}
ac.build();
string text;
cin >> text;
auto cnt = ac.search(text);
for (int i = 0; i < n; i++)
cout << patterns[i] << ": " << cnt[i] << "\n";
}ASCII Trace: Aho-Corasick Automaton for
STEP 1 -- Build Trie (insert patterns):
(root)
/ \
h s
/ \ \
e* i h
| \ \
r s* e*
|
s*
Nodes: 0=root, 1=h, 2=he*, 3=hi, 4=his*, 5=her, 6=hers*, 7=s, 8=sh, 9=she*
(* = output node -- marks end of a pattern)
STEP 2 -- Build Failure Links (BFS from root):
Node Fail Reason
──────────────────────────────────────────────
h -> root no proper suffix of "h" in trie
s -> root no proper suffix of "s" in trie
he -> root suffix "e" not in trie
hi -> root suffix "i" not in trie
sh -> h suffix "h" is node 1
he*r -> root suffix "r" not in trie
hi*s -> s suffix "s" is node 7
sh*e -> he suffix "he" is node 2 -> dict link to "he"*
her*s -> s suffix "s" is node 7
Failure links (── ▷) and dictionary links (══▷):
root ◁── h ◁── he* ◁── her ◁── hers*
▲ │
│ ▼ (fail)
root ◁── s ◁── sh ──▷ (fail->h)
│
she* ══▷ he* (dict link: also outputs "he")
STEP 3 -- Search text "ahishers":
Pos Char State Output
────────────────────────────────────────
0 a root -
1 h h -
2 i hi -
3 s his* match "his" at pos 1
4 h h - (fail from his->s->root, then 'h'->h)
5 e he* match "he" at pos 4
6 r her -
7 s hers* match "hers" at pos 4
+ dict follow -> match "she" at pos 3? No.
(via fail: hers->s, no further output)
Matches found: "his"@1, "he"@4, "hers"@4
(Note: "she"@1 also matched via she*->he* dict link when processing pos 3
in a full implementation that checks all suffix outputs.)ASCII Diagram: Trie with Failure Links (3 Patterns: {he, she, his})
Solid arrows (->) = trie edges Dashed arrows (⇢) = failure links
┌─────── root ───────┐
│ (fail -> self) │
▼ ▼
┌─── h ──┐ s
│ │ │
▼ ▼ ▼
he* hi sh *****⇢ h (fail)
* │ │
* ▼ ▼
* his* she* ****⇢ he* (fail + dict link)
*
(he fails -> root)
On mismatch at any node, follow its failure link and retry.
Dictionary links chain output-producing failure ancestors.Operations Reference
Each phase of the AC pipeline has a distinct cost profile -- consult this table to budget time and memory before coding.
| Operation | Time | Space |
|---|---|---|
| Insert one pattern of length | ||
| Insert all | ||
| Build failure links (BFS) | ||
| Search text of length | ||
| Search text (per-pattern count) | ||
| Full pipeline: insert + build + search |
Memory note: With int children, each node uses int ch[MAXNODES][26] to avoid vector overhead, or reduce to per-problem alphabet size.
Problem Patterns & Tricks
Pattern 1: Multi-Pattern Search (Direct)
Given a text and
Approach: Insert all patterns, build, scan text, collect matches via dictionary suffix links.
Examples: CF 1400F -- x-Prime Substrings, CF 710F -- String Set Queries
Pattern 2: DP on AC Automaton
Build the AC automaton as states. Run DP where dp[i][v] = "best value after processing
Typical problem: "Count strings of length
AC AUTOMATON AS DP STATE SPACE:
Problem: "Count strings of length N over {a,b} avoiding patterns {ab, ba}"
AC automaton states: {root, a, b, [ab]X, [ba]X}
DP[i][state] = # strings of length i ending in 'state'
Transitions (only non-forbidden states):
┌───────┐ 'a' ┌───┐ 'b' ┌────┐
│ root │──────-> │ a │──────-> │ ab │X (forbidden)
│ │ 'b' ├───┤ 'a' ├────┤
│ │──────-> │ b │──────-> │ ba │X (forbidden)
└───────┘ └───┘ └────┘
Only walk transitions where target is NOT a forbidden state.
Answer = Σ DP[N][non-forbidden states]cpp
// dp[i][v] = number of strings of length i ending at automaton state v
// Transition: dp[i+1][t[v].ch[c]] += dp[i][v] for each c in [0, ALPHA)
// Skip states that lie on a pattern-end path (if counting avoidance).Examples: CF 1207G -- Indie Album, CF 86C -- Genetic Engineering
Pattern 3: AC Automaton + Bitmask DP
When dp[v][mask] = "at automaton state mask."
Goal: Find the shortest string that contains all
Examples: CF 533F -- Encoding
Pattern 4: Counting Matches Efficiently (Push-Down Technique)
When
cpp
// During search: only do cnt[v]++ (don't walk dict links)
// After search: build BFS order, then push counts in reverse
vector<int> order;
order.reserve(t.size());
queue<int> bfs;
for (int c = 0; c < 26; c++)
if (t[0].ch[c] != 0) bfs.push(t[0].ch[c]);
while (!bfs.empty()) {
int v = bfs.front(); bfs.pop();
order.push_back(v);
for (int c = 0; c < 26; c++)
if (t[v].ch[c] != t[t[v].fail].ch[c])
bfs.push(t[v].ch[c]);
}
// Process nodes in reverse BFS order (deepest first)
for (int i = (int)order.size() - 1; i >= 0; i--)
cnt[t[order[i]].fail] += cnt[order[i]];
// Now cnt[v] = total times the automaton visited v or any descendant in fail treeThis reduces worst-case search from
Examples: CF 710F -- String Set Queries, CF 963D -- Frequency of String
Pattern 5: Online / Dynamic Pattern Set
Patterns are added or removed between queries. Rebuild the AC automaton after each batch of updates. For
Examples: CF 710F -- String Set Queries
Pattern 6: AC on Tree (Euler Tour + Offline)
Given strings on tree edges/nodes, answer "how many times does pattern
Examples: CF 1207G -- Indie Album, CF 547E -- Mike and Friends
Contest Cheat Sheet
+------------------------------------------------------------------+
| AHO-CORASICK CHEAT SHEET |
+------------------------------------------------------------------+
| WHEN TO USE: |
| - Multiple pattern search in a single text |
| - "Which of these k patterns appear?" / count occurrences |
| - DP over string states avoiding/requiring patterns |
| - k * |text| is too slow (need linear scan) |
+------------------------------------------------------------------+
| BUILD: |
| 1. Insert all patterns into trie O(m) |
| 2. BFS to compute fail + dict links O(m * ALPHA) |
| 3. Scan text, follow automaton edges O(n + z) |
+------------------------------------------------------------------+
| KEY FORMULAS: |
| Nodes <= m + 1 (m = sum of pattern lengths) |
| Memory ~ m * ALPHA * 4 bytes |
| Time: O(m * ALPHA + n + z) |
+------------------------------------------------------------------+
| FAIL LINK: longest proper suffix of current path in trie |
| DICT LINK: nearest pattern-end reachable via fail chain |
| AUTOMATON: after build, ch[v][c] always valid (no -1) |
+------------------------------------------------------------------+
| OPTIMIZE: push-down on fail tree to avoid O(n*k) dict walks |
+------------------------------------------------------------------+Gotchas & Debugging
Forgetting dictionary suffix links
Failure links alone find only the longest pattern ending at each position. If "she" and "he" are both patterns and you match "she", the failure link from the "she" node goes to the "he" node. Without dictionary suffix links (or walking the full fail chain), you miss "he". Always walk dict links or use the push-down technique.
Overlapping identical patterns
If pattern "abc" is inserted twice, you need end_cnt (a counter) at the node, not just a boolean flag. Otherwise you undercount.
Missing automaton completion
After build(), every t[v].ch[c] must be defined (no
Alphabet mismatch
If patterns use characters outside [a-z], the c - 'a' mapping produces negative or out-of-range indices. Either remap characters or use unordered_map<char,int> per node (much slower).
Memory limit exceeded
With vector<Node> with array<int,26> per node uses
cpp
int ch[MAXNODES][26];
int fail[MAXNODES], dict[MAXNODES], pat_id[MAXNODES];This eliminates per-vector-element overhead and is cache-friendlier.
MEMORY LAYOUT COMPARISON:
Vector of structs (high overhead): Flat arrays (cache-friendly):
┌────────────────────────┐ ┌────────────────────────────┐
│ struct Node { │ │ int ch[MAXN][26]; // 26 │
│ array<int,26> ch; │ xMAXN │ int fail[MAXN]; │
│ int fail, dict; │ nodes │ int dict[MAXN]; │
│ int pat_id; │ │ int pat_id[MAXN]; │
│ }; │ │ │
│ sizeof ~= 120 bytes │ │ sizeof per node ~= 116 B │
│ + vector overhead │ │ but contiguous -> fast │
└────────────────────────┘ └────────────────────────────┘
M=5x10⁵ -> ~72 MB + overhead M=5x10⁵ -> ~58 MB, no overheadWorst-case dictionary suffix link walks
Text "aaa...a" with patterns "a", "aa", "aaa", ..., "a^k" causes
Debug checklist
- Print trie: for each node, print its string, fail target, dict target,
pat_id. - Verify
build()leaves noch[v][c] == -1. - Test with a single pattern (should degrade to KMP behavior).
- Test with identical patterns.
- Test with empty text and empty pattern set.
Mental Traps
Trap 1: "I built the trie, so I'm done." Building the trie is step 1 of 3. Without failure links (step 2) and automaton completion (step 3), you have a structure that can only match patterns starting at position 0.
INCOMPLETE SETUP -- what many beginners build:
Trie only (no fail links): With fail + automaton:
root──h──e(OK) root──h──e(OK)──r──s(OK)
│ │ ↑fail ↑fail
└──s──h──e(OK) └──s──h──e(OK)
↑fail│ ↑fail=h->e
root │
↓
go[s-node]['h'] -> h-node
(automaton fills ALL edges)Trap 2: "Output links and failure links are the same thing." Failure link = where to go on mismatch (longest suffix that's a trie prefix). Output link = nearest ancestor via fail chain that is a complete pattern. Confusing them means you find mismatches but miss overlapping matches.
Patterns: {"he", "she", "hers"}
At node for "she":
fail["she"] -> "e" node (longest suffix in trie)
output["she"] -> "he" node (nearest COMPLETE pattern via fail chain)
If you only follow fail links for matches:
X You land on "e" -- NOT a complete pattern -- and stop.
X You miss that "he" also matched.
If you follow output links:
OK output["she"] = "he" node -> report "he" match
OK output["he"] = null -> doneTrap 3: "AC is overkill -- I'll just run K separate KMP searches." For K=3, sure. For K=10⁵ patterns with total length M=10⁶ against text of length N=10⁶, KxKMP = O(KN) = 10¹¹. AC = O(N+M) = 210⁶. Know the crossover point.
Common Mistakes
- Missing dictionary suffix links during output. "My AC automaton found some patterns but missed others that definitely appear in the text..." You computed failure links correctly but forgot to follow dictionary suffix links during the output phase. At a node
v, checking ifvitself is terminal is not enough -- you must also walk the fail chain to find shorter patterns that also end at this position. Fix: maintain adict_suffix_link[v](the nearest terminal ancestor in the fail chain) or, after scanning, propagate counts from children to parents in the fail tree.
Debug Drills
Drill 1 -- BFS for failure links skips the root's children
cpp
queue<int> q;
q.push(0); // root
while (!q.empty()) {
int v = q.front(); q.pop();
for (auto& [c, u] : trie[v].children) {
trie[u].fail = (v == 0) ? 0 : trie[trie[v].fail].children[c];
q.push(u);
}
}What's wrong?
When computing trie[u].fail for a non-root parent, trie[trie[v].fail].children[c] might not exist. You need to follow the fail chain until you find a node that has child c, or reach root. The correct approach: precompute the go[v][c] function (automaton transition) that handles fail-link following, then fail[u] = go[fail[v]][c].
Drill 2 -- Not marking output via fail chain
cpp
// Scanning text
int v = 0;
for (char c : text) {
v = go[v][c];
if (trie[v].is_terminal)
results.push_back(trie[v].pattern_id);
// Missing: check fail chain for shorter pattern matches
}What's wrong?
At node v, there might be shorter patterns ending at the same position -- reachable via the fail chain. Add:
cpp
for (int u = v; u != 0; u = trie[u].dict_suffix_link) {
if (trie[u].is_terminal)
results.push_back(trie[u].pattern_id);
}Or use the dict_suffix_link to jump directly to the nearest terminal ancestor.
Drill 3 -- Automaton transition function missing fail fallback
cpp
int go(int v, int c) {
if (trie[v].children.count(c))
return trie[v].children[c];
return 0; // BUG: should follow fail link, not jump to root
}What's wrong?
When node v has no child for character c, you should follow fail[v] and try again (recursively), not jump directly to root. The standard approach precomputes go[v][c] during BFS:
cpp
if (trie[v].children.count(c))
go[v][c] = trie[v].children[c];
else
go[v][c] = (v == 0) ? 0 : go[trie[v].fail][c];Self-Test
Before moving to the practice problems, you should be able to answer these without looking anything up:
- [ ] Given patterns {"ab", "b", "bc"}, draw the trie, compute all failure links via BFS, and fill in automaton transitions for every (state, character) pair.
- [ ] Trace the AC automaton on text "xabcy" with the patterns above -- list every state visited and every pattern match reported (including via output links).
- [ ] Explain why the root's failure link must point to itself, and what breaks if it doesn't.
- [ ] State the time complexity of AC construction and text scanning separately, in terms of total pattern length M, text length N, alphabet size Σ, and number of matches Z.
- [ ] Describe the push-down technique for counting all pattern occurrences in O(N + M) without walking output links during scanning.
- [ ] Give an example where running K separate KMP instances is faster than AC (small K), and one where AC dominates (large K).
Practice Problems
| CF Rating | What You Should Be Able to Do |
|---|---|
| 1200 | Search for a single pattern using KMP or find(); understand what a trie is. |
| 1500 | Build a trie of patterns and add failure (suffix) links via BFS; scan a text to find all pattern occurrences. |
| 1800 | Use dictionary suffix links for efficient output; combine AC with DP on the automaton states. |
| 2100 | DP on the AC automaton with bitmask or counting states; combine with fail-tree properties (subtree queries, Euler tour). |
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Aho-Corasick (EDU) | CF EDU | 1600 | Step-by-step introduction, build and query |
| 2 | String Set Queries | CF 710F | 1900 | Dynamic pattern set, rebuild AC per batch |
| 3 | Frequency of String | CF 963D | 2000 | Per-pattern occurrence positions, push-down |
| 4 | Genetic Engineering | CF 86C | 2100 | DP on AC automaton, coverage constraints |
| 5 | Indie Album | CF 1207G | 2200 | AC on tree + Euler tour + BIT |
| 6 | Mike and Friends | CF 547E | 2200 | Suffix automaton / AC + fail tree queries |
| 7 | x-Prime Substrings | CF 1400F | 2300 | AC + DP, avoid forbidden substrings |
| 8 | Sleeping | CF 519E | 2100 | AC automaton state DP with modular counting |
| 9 | Finding Patterns | CSES | 1800 | Multi-pattern existence check |
| 10 | Counting Patterns | CSES | 1900 | Count occurrences of each pattern |
| 11 | Pattern Positions | CSES | 1900 | Find first occurrence position of each pattern |
Further Reading
- cp-algorithms: Aho-Corasick -- clean reference with detailed proofs and code.
- CF Blog: Aho-Corasick tutorial -- community writeup with worked examples.
- CF EDU: String Algorithms -- structured AC problem set.
- Original paper: Aho & Corasick, "Efficient String Matching: An Aid to Bibliographic Search" (1975).
- Prerequisite: KMP and Z-Function -- single-pattern failure function.
- Prerequisite: Trie -- underlying data structure.
- Related: Suffix Array, Suffix Automaton (tracked in GAPS.md for a future advanced file)