Appearance
Trie
A trie stores shared prefixes once. Each edge consumes one character, and each root-to-node path spells a prefix, so lookup, insert, and delete stay at
Difficulty: [Intermediate]Prerequisites: Arrays and Strings
Contest Frequency: Regular—appears in string and XOR problems
Contents
- Intuition
- The Invariant That Makes Counting Work
- Mental Model Traps
- When to Reach for This
- Memory Layout
- Implementation
- Binary Trie for XOR Maximization
- Operations Reference
- Problem Patterns
- Contest Cheat Sheet
- Common Mistakes & Debugging
- Self-Test
- Practice Problems
- Further Reading
Intuition
You have five strings: "cat", "car", "card", "do", "dog". Someone asks: "How many of these start with "ca"?"
Your first instinct: loop through all five strings, compare character by character.
text
Query: "ca"
"cat" -- c=c, a=a --> match. Count = 1
"car" -- c=c, a=a --> match. Count = 2
"card" -- c=c, a=a --> match. Count = 3
"do" -- c!=d --> skip
"dog" -- c!=d --> skip
Result: 3. Character comparisons: 2+2+2+1+1 = 8Now a second query: "How many start with "d"?"
text
Query: "d"
"cat" -- c!=d --> skip
"car" -- c!=d --> skip
"card" -- c!=d --> skip
"do" -- d=d --> match. Count = 1
"dog" -- d=d --> match. Count = 2
Result: 2. Character comparisons: 1+1+1+1+1 = 5Two queries cost
The waste is obvious: we compared "ca" against "do" and "dog" even though they share no characters with the query. Every query re-scans the entire set, repeating the same prefix comparisons over and over.
The fix is to store each shared prefix once—as a path in a tree of characters, where each edge is one letter and each root-to-node path spells out a unique prefix. That is exactly a trie. Autocomplete on a phone keyboard works this way: type 'c' and the candidate list narrows; type 'a' and it narrows further. You never re-scan the entire dictionary on each keystroke—you follow one branch deeper into the tree. But a trie is exact: one wrong character and you hit a dead end. No approximate matching comes for free.
Visual Walkthrough
Same five strings: "cat", "car", "card", "do", "dog". Build the trie one insertion at a time. Each node tracks cnt = how many inserted strings pass through it.
Step 1: Insert "cat"
Create nodes along the path root → c → a → t. Mark t as end-of-word.
text
(root)
|
c cnt=1
|
a cnt=1
|
t* cnt=1
* = end-of-word marker
Nodes created: 3Step 2: Insert "car"
Walk existing path root → c → a (shared with "cat"). Branch at a: create child r. Increment counts along the shared path.
text
(root)
|
c cnt=2
|
a cnt=2
/ \
t* r*
1 1
Nodes created: 1 (only r is new)
The prefix "ca" is stored ONCE, shared by both words.Step 3: Insert "card"
Walk root → c → a → r (all exist). Create child d under r.
text
(root)
|
c cnt=3
|
a cnt=3
/ \
t* r* cnt=2
1 |
d* cnt=1
Nodes created: 1
"car" is a prefix of "card" -- both share the path to r.Step 4: Insert "do"
No d child at root. Create a new branch: root → d → o.
text
(root)
/ \
c cnt=3 d cnt=1
| |
a cnt=3 o* cnt=1
/ \
t* r* cnt=2
1 |
d* cnt=1
Nodes created: 2Step 5: Insert "dog"
Walk root → d → o (exists). Create child g under o.
text
(root)
/ \
c cnt=3 d cnt=2
| |
a cnt=3 o* cnt=2
/ \ / \
t* r* cnt=2 g* cnt=1
1 |
d* cnt=1
Total nodes: 8
Total characters across all strings: 15
Shared prefixes saved us 7 nodes.Now query: "How many strings start with "ca"?"
Walk root → c → a. Read cnt = 3. Done.
text
Char comparisons: 2 (one per query character).
Brute force needed 8 comparisons for this same query.Query: "How many start with "d"?"
Walk root → d. Read cnt = 2. Done in 1 comparison.
text
Brute force: 13 char comparisons for 2 queries.
Trie: 3 char comparisons for 2 queries.
And the gap widens as n grows: brute force is O(n * |query|)
per query, trie is O(|query|) -- completely independent of n.The Invariant That Makes Counting Work
text
+----------------------------------------------------------------+
| INVARIANT: Each path from root to a node represents a unique |
| prefix of at least one inserted string. Children of every node |
| partition the continuations by the next character. The count at |
| each node equals the number of inserted strings whose prefix |
| matches that root-to-node path. |
+----------------------------------------------------------------+Every insert walks the existing shared path and only creates a new node when the next character has no matching child. No node exists without at least one inserted string passing through it. The cnt counter is incremented at every node along the insertion path, so it always reflects the exact number of strings sharing that prefix.
Connect back to the walkthrough: in Step 3, inserting "card" walked the existing path c → a → r (shared with "car") and created a new node only for d. The counts at c, a, and r all increased by 1, correctly reflecting that one more string passes through each prefix. Skip the count update at r and a query for prefix "car" returns 1 instead of 2—the invariant breaks.
Mental Model Traps
What the Code Won't Teach You
Characters live on edges, not in nodes.
This is the single most common source of confusion. A trie node does not "contain" a character—it represents the state reached after consuming that character. The root node contains no character at all (it's the empty prefix). When you see children[c], the character c labels the edge from the current node to the child, not the child itself.
text
Edge-labeled (correct mental model):
(root) Node = "state after reading prefix"
/ \ Edge = "consume this character"
'c' 'd'
/ \
(c) (d) (c) means "we've read 'c' so far"
/ \ | NOT "this node IS 'c'"
'a' ... 'o'
/ |
(ca) (do)If you put the character in the node, you will off-by-one on root handling, miscalculate depths, and confuse yourself when implementing delete or compressed tries (Patricia tries).
Shared prefix structure is the real power.
A trie doesn't just store individual strings—it compresses them by factoring out common prefixes. If you insert 10,000 URLs starting with "https://www.", that 12-character prefix becomes one chain of 12 nodes, not 10,000 copies. This is why a trie is a prefix index, not merely a set. The structure itself encodes which strings share which prefixes, and you can answer prefix queries by reading a single counter at the right node.
If you only need "is this string in the set?" and never care about prefixes, you are paying for structure you don't use—a hash set is simpler and often faster.
An XOR trie is the same concept with a binary alphabet.
A binary trie for integers is not a separate data structure—it's a trie where the alphabet is children[2] instead of children[26]. The greedy max-XOR query is just "at each level, prefer the bit that differs from the query"—which is exactly a prefix search in a two-letter alphabet.
Node count: why
Each inserted string of length children[] requires knowing
text
Worst case (no sharing): Best case (all same):
n=3, strings: "ab", "cd", "ef" n=3, strings: "ab", "ab", "ab"
(root) (root)
/ | \ |
a c e 6 nodes + root a cnt=3
| | | = n*L + 1 = 7 |
b* d* f* b* cnt=3
2 nodes + root = 3Before you code, decide whether you need exact membership or prefix counts, choose the alphabet size up front, and keep cnt_prefix separate from cnt_end. Plain prefix tries tend to show up in mid-tier string tasks; binary and persistent tries sit in harder XOR-heavy problems.
When to Reach for This
Trigger phrases—if you see these in a problem, think trie:
- "common prefix" or "longest common prefix" among a set of strings
- "dictionary of strings" with prefix queries or membership tests
- "count how many strings start with…"—prefix counting
- "maximize
" or "XOR maximum"—binary trie (each bit is a character in a two-letter alphabet) - "autocomplete" or "lexicographic order over a string set"
Counter-examples—looks like trie, but isn't:
- Exact match only, no prefix structure needed. If you just need "is this string in the set?" with no prefix queries,
std::unordered_set<string>is simpler and uses less memory. See: Hash Map. - Substring queries ("does pattern P appear anywhere inside text T?"). A trie stores prefixes from the root, not arbitrary substrings. You need a suffix array, suffix automaton, or Aho-Corasick. See:
suffix-structures.md,06-aho-corasick.md. - Fuzzy / approximate matching ("find all strings within edit distance
"). A standard trie has zero tolerance for character mismatches—one wrong letter and the search fails immediately.
Where it shows up on Codeforces:
- "Maximum XOR of two elements"—binary trie, greedy bit-by-bit (CF 706D, CF 1895D).
- "Count strings with given prefix"—string trie with subtree counters.
- "Persistent trie"—offline XOR queries on prefix XORs (CF 1055F).
- "Autocomplete / dictionary"—trie with end-of-word markers.
Why not just use std::set or hashing? A sorted std::set gives
Memory Layout
The standard contest approach uses a flat array of nodes (not pointers). Each node stores an array of children indices. This is cache-friendly and avoids new/delete overhead, which matters when you have
text
Trie storing "cat", "car" with children[node][26]:
Logical tree: Flat array in memory:
(0: root) children[][]:
| node a b c d ... r ... t ...
c | 0 -1 -1 1 -1 ... -1 ... -1 ...
(1) 1 2 -1 -1 -1 ... -1 ... -1 ...
| 2 -1 -1 -1 -1 ... 4 ... 3 ...
a | 3 -1 -1 -1 -1 ... -1 ... -1 ... <- "cat"
(2) 4 -1 -1 -1 -1 ... -1 ... -1 ... <- "car"
/ \
t r cnt_prefix[]: [0, 2, 2, 1, 1]
(3) (4) cnt_end[]: [0, 0, 0, 1, 1]
Key insight: children[2]['t'-'a'] = 3, children[2]['r'-'a'] = 4
All nodes packed contiguously -> cache-friendly sequential access.
No pointers, no heap allocations per node.
Memory per node: 26 ints (children) + 2 ints (counters) = 112 bytes
For 10^6 nodes: ~112 MB. Plan your MAXN carefully!Implementation
Version 1: String Trie (Contest Template)
cpp
#include <bits/stdc++.h>
using namespace std;
struct Trie {
// Each node: children[0..25], cnt_prefix, cnt_end
static constexpr int ALPHA = 26;
vector<array<int, 26>> ch;
vector<int> cnt_prefix; // strings passing through this node
vector<int> cnt_end; // strings ending at this node
Trie() { new_node(); }
int new_node() {
ch.push_back({});
ch.back().fill(-1);
cnt_prefix.push_back(0);
cnt_end.push_back(0);
return (int)ch.size() - 1;
}
void insert(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1) ch[cur][x] = new_node();
cur = ch[cur][x];
cnt_prefix[cur]++;
}
cnt_end[cur]++;
}
bool search(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1) return false;
cur = ch[cur][x];
}
return cnt_end[cur] > 0;
}
int count_prefix(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1) return 0;
cur = ch[cur][x];
}
return cnt_prefix[cur];
}
void erase(const string& s) {
// Only call if s was previously inserted
int cur = 0;
for (char c : s) {
int x = c - 'a';
cur = ch[cur][x];
cnt_prefix[cur]--;
}
cnt_end[cur]--;
}
};
int main() {
Trie tr;
tr.insert("cat");
tr.insert("car");
tr.insert("car");
tr.insert("card");
cout << tr.search("car") << "\n"; // 1
cout << tr.search("cab") << "\n"; // 0
cout << tr.count_prefix("ca") << "\n"; // 4
cout << tr.count_prefix("car") << "\n"; // 3
tr.erase("car");
cout << tr.count_prefix("car") << "\n"; // 2
}The annotated version:
cpp
#include <bits/stdc++.h>
using namespace std;
struct Trie {
// Flat array-based trie. Each node is an index into these vectors.
// Using vectors instead of raw arrays so the trie grows as needed.
// This avoids guessing MAXN upfront and wasting memory.
static constexpr int ALPHA = 26;
vector<array<int, 26>> ch; // ch[node][c] = child index, or -1
vector<int> cnt_prefix; // how many inserted strings pass through this node
vector<int> cnt_end; // how many inserted strings end exactly here
Trie() { new_node(); } // node 0 is root
int new_node() {
ch.push_back({});
ch.back().fill(-1); // -1 means "no child"
cnt_prefix.push_back(0);
cnt_end.push_back(0);
return (int)ch.size() - 1;
}
// Walk down the trie, creating nodes as needed.
// Increment cnt_prefix at every node we pass through,
// and cnt_end at the terminal node.
void insert(const string& s) {
int cur = 0; // start at root
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1)
ch[cur][x] = new_node(); // extend trie
cur = ch[cur][x];
cnt_prefix[cur]++; // one more string passes here
}
cnt_end[cur]++;
}
// Walk down. If any edge is missing, the word is not in the trie.
bool search(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1) return false;
cur = ch[cur][x];
}
return cnt_end[cur] > 0; // must be a complete word, not just a prefix
}
// Returns how many inserted strings have s as a prefix.
int count_prefix(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
if (ch[cur][x] == -1) return 0;
cur = ch[cur][x];
}
return cnt_prefix[cur];
}
// Logical delete: decrement counters along the path.
// Does NOT reclaim nodes. Safe if you only erase strings that exist.
void erase(const string& s) {
int cur = 0;
for (char c : s) {
int x = c - 'a';
cur = ch[cur][x];
cnt_prefix[cur]--;
}
cnt_end[cur]--;
}
};
int main() {
Trie tr;
tr.insert("cat");
tr.insert("car");
tr.insert("car");
tr.insert("card");
cout << tr.search("car") << "\n"; // 1
cout << tr.search("cab") << "\n"; // 0
cout << tr.count_prefix("ca") << "\n"; // 4
cout << tr.count_prefix("car") << "\n"; // 3
tr.erase("car");
cout << tr.count_prefix("car") << "\n"; // 2
}Alternative Implementations — Array-Based vs Map-Based
Array-based (faster, fixed alphabet—e.g., lowercase letters):
cpp
int trie[MAXNODES][26], cnt = 0;
bool isEnd[MAXNODES];
void insert(const string& s) {
int cur = 0;
for (char c : s) {
int ch = c - 'a';
if (!trie[cur][ch]) trie[cur][ch] = ++cnt;
cur = trie[cur][ch];
}
isEnd[cur] = true;
}Map-based (flexible alphabet, slower due to hashing):
cpp
struct TrieNode {
unordered_map<char, int> children;
bool isEnd = false;
};
vector<TrieNode> trie(1);
void insert(const string& s) {
int cur = 0;
for (char c : s) {
if (!trie[cur].children.count(c)) {
trie[cur].children[c] = trie.size();
trie.emplace_back();
}
cur = trie[cur].children[c];
}
trie[cur].isEnd = true;
}Tradeoffs: Array-based is roughly 3–5× faster and cache-friendlier, but uses
Binary Trie for XOR Maximization
cpp
#include <bits/stdc++.h>
using namespace std;
struct BinaryTrie {
// Each node has two children: 0 and 1.
// We process bits from the highest (MAXBIT-1) down to 0.
static constexpr int MAXBIT = 30; // enough for values < 2^30
vector<array<int, 2>> ch;
vector<int> cnt; // how many values pass through this node
BinaryTrie() { new_node(); }
int new_node() {
ch.push_back({-1, -1});
cnt.push_back(0);
return (int)ch.size() - 1;
}
void insert(int x) {
int cur = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
if (ch[cur][bit] == -1) ch[cur][bit] = new_node();
cur = ch[cur][bit];
cnt[cur]++;
}
}
void erase(int x) {
int cur = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
cur = ch[cur][bit];
cnt[cur]--;
}
}
// Returns max(x ^ y) for all y currently in the trie.
// Greedy: at each bit, try to go opposite to x's bit.
int max_xor(int x) {
int cur = 0, res = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
int want = 1 - bit; // opposite bit maximizes XOR
if (ch[cur][want] != -1 && cnt[ch[cur][want]] > 0) {
res |= (1 << b);
cur = ch[cur][want];
} else {
cur = ch[cur][bit];
}
}
return res;
}
};
int main() {
BinaryTrie bt;
bt.insert(5); // 101
bt.insert(2); // 010
bt.insert(7); // 111
cout << bt.max_xor(3) << "\n"; // 3^5=6 -> prints 6
cout << bt.max_xor(0) << "\n"; // 0^7=7 -> prints 7
bt.erase(7);
cout << bt.max_xor(0) << "\n"; // 0^5=5 -> prints 5
}The binary trie stores numbers as bit strings from MSB to LSB. To find max_xor(3) with values {5, 2, 7} in the trie:
text
Binary trie with values 5 (101), 2 (010), 7 (111):
Bit 2 (MSB) (root)
/ \
0 1
/ / \
Bit 1 (0) (10) (11)
| | |
1 0 1
| | |
Bit 0 (01) (100) (111)
| | |
0 1 1
| | |
(010) (101) (111)
=2 =5 =7
Query: max_xor(3) 3 = 011 in binary
Bit 2: query bit=0 -> want 1 (opposite) -> go RIGHT -> res |= (1<<2) = 4
Bit 1: query bit=1 -> want 0 (opposite) -> go LEFT to (10) -> res |= (1<<1) = 6
Bit 0: query bit=1 -> want 0 (opposite) -> go LEFT to (100)-> no change, res = 6
but 100->101 path
gives bit=1 -> res |= 1
Wait -- let's trace precisely:
Bit 2: 3 has bit 0 -> want 1 -> node (1) exists, go right. res = 100 = 4
Bit 1: 3 has bit 1 -> want 0 -> node (10) exists, go left. res = 110 = 6
Bit 0: 3 has bit 1 -> want 0 -> node (100) has child 1 only
-> forced to take bit 1 -> go to (101). res stays 110 = 6.
Result: 6 (which is 3 XOR 5 = 6) [ok]The annotated version:
cpp
#include <bits/stdc++.h>
using namespace std;
struct BinaryTrie {
// Binary trie over integers. Each level represents one bit.
// We go from the most significant bit (MSB) down to bit 0.
// This ordering is critical: the greedy XOR-max algorithm needs
// to fix the highest bits first to maximize the result.
static constexpr int MAXBIT = 30; // handles values up to ~10^9
vector<array<int, 2>> ch; // ch[node][0] and ch[node][1]
vector<int> cnt; // number of live values in this subtree
BinaryTrie() { new_node(); } // node 0 is root
int new_node() {
ch.push_back({-1, -1});
cnt.push_back(0);
return (int)ch.size() - 1;
}
// Insert integer x, walking bit by bit from MSB to LSB.
void insert(int x) {
int cur = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
if (ch[cur][bit] == -1)
ch[cur][bit] = new_node();
cur = ch[cur][bit];
cnt[cur]++;
}
}
// Logical erase: decrement counts along x's path.
// Only call if x is currently in the trie.
void erase(int x) {
int cur = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
cur = ch[cur][bit];
cnt[cur]--;
}
}
// Greedy XOR maximization.
// At each bit position, we want the opposite of x's bit:
// if x's bit is 0, going to child 1 sets this bit in the XOR result;
// if x's bit is 1, going to child 0 sets this bit in the XOR result.
// If the preferred child doesn't exist (or is empty), take the other.
int max_xor(int x) {
int cur = 0, res = 0;
for (int b = MAXBIT - 1; b >= 0; b--) {
int bit = (x >> b) & 1;
int want = 1 - bit;
if (ch[cur][want] != -1 && cnt[ch[cur][want]] > 0) {
res |= (1 << b); // this bit contributes to XOR
cur = ch[cur][want];
} else {
// forced to go same direction, this bit is 0 in result
cur = ch[cur][bit];
}
}
return res;
}
};
int main() {
BinaryTrie bt;
bt.insert(5); // 101
bt.insert(2); // 010
bt.insert(7); // 111
// Query: what value in the trie maximizes XOR with 3 (011)?
// Try opposite bits: bit2=0->want 1 (go 1), bit1=1->want 0 (go 0),
// bit0=1->want 0 (go 1, forced). Reached 101=5. 3^5=6.
cout << bt.max_xor(3) << "\n"; // 6
cout << bt.max_xor(0) << "\n"; // 7
bt.erase(7);
cout << bt.max_xor(0) << "\n"; // 5
}Operations Reference
String Trie (alphabet size )
| Operation | Time | Space |
|---|---|---|
| Insert | $O( | s |
| Search | $O( | s |
| Count prefix | $O( | s |
| Erase | $O( | s |
| Build from |
Binary trie (over integers):
| Operation | Time | Space |
|---|---|---|
| Insert | ||
| Erase | ||
| Max XOR query | ||
| Build from |
long long).
Problem Patterns
Pattern 1: Maximum XOR Pair
Problem: Given an array
Insert all values into a binary trie, then for each max_xor(a[i]).
cpp
BinaryTrie bt;
for (int x : a) bt.insert(x);
int ans = 0;
for (int x : a) ans = max(ans, bt.max_xor(x));Problems: CF 706D, CSES - Finding Patterns (variant).
Maximum XOR Subarray. Find a contiguous subarray with maximum XOR sum. Compute prefix XOR array
cpp
vector<int> p(n + 1, 0);
for (int i = 0; i < n; i++) p[i + 1] = p[i] ^ a[i];
BinaryTrie bt;
int ans = 0;
for (int i = 0; i <= n; i++) {
if (i > 0) ans = max(ans, bt.max_xor(p[i]));
bt.insert(p[i]);
}Problems: CF 282E (subarray XOR, simplified variant).
XOR with Range Constraint. Count pairs where
Problems: CF 817E (counting pairs with XOR < K concept), CF 1895D.
Prefix Counting / Dictionary Lookup. Given a dictionary and queries, count how many dictionary words have a given prefix, or check membership. Straight string trie with cnt_prefix counters.
cpp
Trie tr;
for (auto& w : dict) tr.insert(w);
for (auto& q : queries) cout << tr.count_prefix(q) << "\n";Problems: CF 965E (trie on strings).
Persistent Binary Trie. Answer "max XOR with any element in insert creates
Problems: CF 1055F (persistent trie on tree paths).
Lexicographic Minimum / Autocomplete. Find the lexicographically smallest string with a given prefix, or implement autocomplete. DFS from the prefix node, always taking the smallest child first.
cpp
string autocomplete(Trie& tr, const string& prefix) {
int cur = 0;
string res = prefix;
for (char c : prefix) {
int x = c - 'a';
if (tr.ch[cur][x] == -1) return ""; // prefix not found
cur = tr.ch[cur][x];
}
while (tr.cnt_end[cur] == 0) {
for (int c = 0; c < 26; c++) {
if (tr.ch[cur][c] != -1 && tr.cnt_prefix[tr.ch[cur][c]] > 0) {
res += (char)('a' + c);
cur = tr.ch[cur][c];
break;
}
}
}
return res;
}Problems: CF 455B (game on trie, involves lex properties).
Contest Cheat Sheet
text
+----------------------------------------------------------+
| TRIE CHEAT SHEET |
+----------------------------------------------------------+
| STRING TRIE |
| Use: prefix count, dictionary, autocomplete |
| Node: array<int,26> children + cnt_prefix + cnt_end |
| Insert/Search/Count: O(|s|) |
| Space: O(total_len * 26) -- often much less |
| |
| BINARY TRIE |
| Use: max XOR pair, max XOR subarray, XOR with limit |
| Node: array<int,2> children + cnt |
| MAXBIT: 30 (int) or 60 (long long) |
| Insert bits MSB -> LSB |
| max_xor: greedily pick opposite bit at each level |
| |
| MAX XOR PAIR: |
| insert all, query each -> O(n * B) |
| |
| MAX XOR SUBARRAY: |
| prefix XOR array, then max XOR pair on prefixes |
| |
| REMEMBER: |
| - Array-based, NOT pointer-based (speed + memory) |
| - cnt tracks live values for erase support |
| - Reserve nodes if n is known: ch.reserve(n * B) |
+----------------------------------------------------------+Common Mistakes & Debugging
Pointer-based trie = MLE/TLE. Each
newcall has overhead (roughly 16–40 bytes per allocation on most systems). Withstrings of length 10, that is allocations. Use the flat-array approach shown above. Alphabet size blowup. Each node costs
space. For lowercase-only ( ), each node is 26 ints = 104 bytes. At nodes that is ~100 MB. If memory is tight, consider using unordered_map<int,int>per node or compressing the trie (but this is slower).Forgetting the root is not a character. The root represents the empty prefix. Off-by-one errors happen when you increment counters at the root—typically you should only increment at children.
Erase without checking existence. If you call
erase("abc")but"abc"was never inserted, you get negative counts. Always track what you have inserted or guard with asearchcall first.Wrong bit order in binary trie. Bits must go MSB to LSB. If you iterate LSB to MSB, the greedy XOR-max algorithm fails—higher bits matter more.
MAXBIT too small. If values can reach
, you need . If values can reach (or you use long long), you need. Off-by-one: so covers . Not reserving memory. If you know the max number of nodes upfront, call
ch.reserve(MAX_NODES)to avoid repeated reallocations. This can be the difference between AC and TLE.Negative numbers in binary trie. XOR with negative numbers in C++ uses two's complement. Either offset all values by adding
to make them non-negative, or set and handle the sign bit carefully. Forgetting the end-of-word marker. Reaching the last character of a query only means the query is a prefix of some inserted word. Without a boolean
is_endflag (orcnt_end > 0check) at the terminal node of each insertion,search("app")returnstrueeven if only"applesauce"was inserted. A trie node existing doesn't mean a string ends there—always mark terminals."Trie is just a tree." A trie is a rooted tree, but calling it "just a tree" misses what makes it useful. A trie is a prefix index: every root-to-node path corresponds to exactly one prefix, children partition by the next character, and subtree counts answer prefix queries in
. A generic tree has none of these structural guarantees. When someone says "just use a tree," ask: does your tree let you answer "how many strings start with X?" by reading one integer? If not, it's not "just" a tree problem—it's a trie problem.
Trap: "I can use a trie for any string matching."
A trie matches prefixes from the start of each string. It does not handle:
- Substring search ("does
"cat"appear anywhere inside"concatenate"?")—use a suffix array, suffix automaton, or Aho-Corasick. - Approximate/fuzzy matching ("find strings within edit distance 2 of
"cat"")—requires a BK-tree, edit-distance automata, or specialized trie extensions. - Regex matching—requires automaton construction, not a trie.
A trie answers: "Which strings in my set start with this prefix?"—nothing more, nothing less.
text
What trie CAN vs CAN'T do:
[Y] "How many strings start with 'ca'?" -> walk root->c->a, read cnt
[Y] "Is 'card' in the set?" -> walk root->c->a->r->d, check is_end
[Y] "Max XOR of x with any value in set?" -> binary trie, greedy bit walk
[N] "Does 'ar' appear inside any string?" -> substring, need suffix structure
[N] "Find strings similar to 'cat'" -> fuzzy match, need edit-distance algo
[N] "Match regex 'c.t' in string set" -> need automaton, not trieTrap: Confusing trie lookup with hash set lookup.
Both give
text
Trie lookup for "car": Hash set lookup for "car":
(root) hash("car") = 7
| walk edge 'c' v
(c) bucket[7] -> "car" -> match!
| walk edge 'a'
(ca) One hash computation + one
| walk edge 'r' string comparison. No prefix
(car) <- found! info preserved.
3 edge traversals. Can answer: "is 'car' in set?"
Also knows: 3 strings CANNOT answer: "how many start
share prefix "ca", 2 share with 'ca'?" (would need to
prefix "car", etc. scan all entries)The trie is slower for pure membership checks (more pointer chasing, worse cache behavior) but carries structural information the hash set discards.
Self-Test
Before moving on, make sure you can answer these without looking back:
- [ ] Edge vs node: If someone asks "what character does this trie node store?", what is the correct answer? (Hint: the character lives on the ___.)
- [ ] Prefix count: You insert
"app","apple","ape","bat". What doescntequal at the node reached by walking root→a→p? Why? - [ ] Node sizing: You have
strings each of length at most over a 26-letter alphabet. What is the maximum number of trie nodes? How many ints does your children[]array need? - [ ] XOR trie bit order: Why must you insert bits MSB→LSB, not LSB→MSB? What breaks if you reverse the order?
- [ ] Prefix vs exact: You insert
"car"and"card". Doessearch("car")return true? Doesstarts_with("car")return true? What is the difference in the code? - [ ] When NOT to use trie: Give two problem types that look like they might need a trie but actually don't.
- [ ] Memory trap: Your trie uses
children[MAXN][26]withMAXN = 10^7. How much memory does this consume? Will it fit in a 256 MB limit?
Bug Gallery
Bug 1: String trie—off-by-one in character mapping for uppercase letters
cpp
void insert(const string& s) {
int cur = 0;
for (char c : s) {
int ch = c - 'A'; // [BUG] Bug when input has lowercase
if (!trie[cur][ch])
trie[cur][ch] = ++node_cnt;
cur = trie[cur][ch];
}
is_end[cur] = true;
}Why it's wrong: If the input contains lowercase letters, c - 'A' produces values 32–57, which index out of the trie[cur][26] array. This causes silent memory corruption.
cpp
int ch = c - 'a'; // [FIX] Match the actual input character set
// Or normalize: int ch = tolower(c) - 'a';Bug 2: Binary trie—using wrong bit width for the value range
cpp
// Values up to 10^9
void insert(int x) {
int cur = 0;
for (int i = 29; i >= 0; i--) { // [BUG] Bug: 2^30 = ~10^9, need bit 30
int bit = (x >> i) & 1;
if (!trie[cur][bit])
trie[cur][bit] = ++node_cnt;
cur = trie[cur][bit];
}
}Why it's wrong:
cpp
for (int i = 30; i >= 0; i--) { // [FIX] Cover full range up to 2^31 - 1Bug 3: Counting prefixes—incrementing count only at end-of-word nodes
cpp
int count_prefix(const string& s) {
int cur = 0, count = 0;
for (char c : s) {
int ch = c - 'a';
if (!trie[cur][ch]) return count;
cur = trie[cur][ch];
if (is_end[cur]) count++; // [BUG] Bug: only counts complete words
}
return count;
}If the goal is to count how many inserted strings have s as a prefix (or how many strings are prefixes of s), relying on is_end alone misses the pass-through count. You need a separate pass_count that increments for every insertion passing through that node.
cpp
cur = trie[cur][ch];
count += pass_cnt[cur]; // [FIX] Use a pass-through counter set during insertBug 4: Null/uninitialized children—accessing child without checking for null
cpp
bool search(const string& s) {
int cur = 0;
for (char c : s) {
int ch = c - 'a';
cur = trie[cur][ch]; // [BUG] Bug: no null check
}
return is_end[cur];
}If a character's child doesn't exist, trie[cur][ch] is 0 (the root), and the search continues from the root instead of returning false. This gives false positives for strings that aren't in the trie.
cpp
int ch = c - 'a';
cur = trie[cur][ch];
if (!cur) return false; // [FIX] No such path existsBug 5: Alphabet offset—forgetting to subtract base character entirely
cpp
void insert(const string& s) {
int cur = 0;
for (char c : s) {
if (!trie[cur][c]) // [BUG] Bug: using raw ASCII value (97--122) as index
trie[cur][c] = ++node_cnt;
cur = trie[cur][c];
}
is_end[cur] = true;
}Using the raw ASCII value of c (e.g., 'a' = 97) as an array index into trie[cur][26] writes far out of bounds, causing memory corruption or a crash.
cpp
int ch = c - 'a'; // [FIX] Map 'a'..'z' to 0..25
if (!trie[cur][ch])
trie[cur][ch] = ++node_cnt;
cur = trie[cur][ch];Bug 6: Memory limit exceeded—oversized trie array allocation
cpp
// Each string up to length 10^5, up to 10^5 strings
int trie[10000001][26]; // [BUG] Bug: ~1 GB memory
bool is_end[10000001];Why it's wrong: MAXN × max_length.
cpp
// Use total sum of all string lengths as MAXN
int trie[1000001][26]; // [FIX] Fits within memory limit
bool is_end[1000001];
// Or use dynamic allocation with pointers/maps for sparse alphabetsBug 7: Node array not reset between test cases
When solving multi-test-case problems, failing to reset the trie between cases leaves stale nodes and counts from previous inputs. Either clear the entire array with memset (slow for large MAXN) or track the node count and reset only the used portion: set node_cnt = 0 and memset(trie, 0, sizeof(int) * 26 * (node_cnt + 1)).
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Vasiliy's Multiset | CF 706D | 1800 | Binary trie, XOR max query with insert/erase |
| 2 | Xor-MST | CF 888G | 2400 | Binary trie + divide and conquer (Boruvka's) |
| 3 | XOR on Segment | CF 282E | 2100 | Prefix XOR + binary trie for max subarray XOR (variant) |
| 4 | Vitya and Strange Lesson | CF 842D | 2000 | Binary trie, global XOR updates, MEX queries |
| 5 | Frequency of String | CF 963D | 2300 | Aho-Corasick (trie-based automaton) |
| 6 | A and B and Lecture Rooms | CF 519E | 2100 | LCA (but trie-based string matching variants exist) |
| 7 | String Set Queries | CF 710F | 2400 | Aho-Corasick (trie + failure links, dynamic rebuild) |
| 8 | Maximum XOR Secondary | CF 281D | 1800 | Stack + binary trie for XOR of adjacent elements |
| 9 | Garden of the Sun | CF 1495C | 2300 | Constructive (trie-like greedy) |
| 10 | Nauuo and Binary Operations | CF 1172C2 | 2100 | Bit-by-bit trie reasoning |
Start with CF 706D—it is the canonical binary trie XOR problem. Then move to CF 842D for global XOR tricks.
Further Reading
- cp-algorithms: Trie—Aho-Corasick builds on the trie; understanding the base trie is a prerequisite.
- CF Blog: Binary Trie Tutorial—covers XOR queries in detail.
- CF Blog: Persistent Trie—extension for versioned queries.
- Related topics in this notebook:
- Hash Map and Unordered Map—when exact membership suffices
- Segment Tree—another workhorse data structure for range queries
- Data Structure Selection Guide
- Practice Ladder: Data Structures