Appearance
Lyndon Factorization
Quick Revisit
- USE WHEN: Finding lexicographically smallest rotation or decomposing a string into Lyndon words
- INVARIANT: s = w₁w₂…wₖ where each wᵢ is a Lyndon word and w₁ ≥ w₂ ≥ … ≥ wₖ lexicographically
- TIME: O(n) | SPACE: O(1)
- CLASSIC BUG: Not handling the case when s[j] < s[k] (new Lyndon word starts) vs s[j] > s[k] (emit factors of length period) — swapping the two produces wrong factorization
- PRACTICE: 07-ladder-strings
Duval's algorithm decomposes any string into Lyndon words in
time and space, enabling minimum rotation and cyclic string comparisons.
| Difficulty | Prerequisites |
|---|---|
| Advanced (CF 1800-2100) | KMP and Z-Function |
Contents
- From Rotations to Lyndon Words
- Watching the Algorithm Work
- More Worked Examples
- Relevant String Operations
- Implementation
- Complexity at a Glance
- Problem Patterns and Tricks
- Contest Cheat Sheet
- Gotchas and Debugging
- Self-Test
- Practice Problems
- Further Reading
From Rotations to Lyndon Words
Suppose you need the lexicographically smallest rotation of "cabba". The five rotations are:
| Start | Rotation |
|---|---|
| 0 | cabba |
| 1 | abbac |
| 2 | bbaca |
| 3 | bacab |
| 4 | acabb |
The answer is "abbac" (start index 1). The brute-force way generates all
Lyndon words
A string
| String | Lyndon? | Why |
|---|---|---|
"a" | Yes | No proper suffixes to violate the rule |
"ab" | Yes | "ab" "b" |
"aab" | Yes | "aab" "ab" "b" |
"abcd" | Yes | Strictly increasing -- always smaller than any suffix |
"ba" | No | "ba" "a" |
"aa" | No | "aa" "a" (prefix comparison: "a" is shorter) |
"abab" | No | Not primitive -- repeats "ab" |
Checking if "aab" is a Lyndon word:
"aab" vs all proper suffixes:
"aab" < "ab" ? a < a? equal -> a < b? YES OK
"aab" < "b" ? a < b? YES OK
All suffixes are strictly larger -> "aab" IS Lyndon
Checking if "abab" is a Lyndon word:
"abab" < "bab" ? a < b? YES OK
"abab" < "ab" ? a=a, b=b, a=end -> "ab" shorter
"abab" > "ab" X FAIL
Not all suffixes are strictly larger -> "abab" is NOT LyndonTwo handy equivalences for a Lyndon word
is the unique smallest rotation of itself. Every other rotation is strictly larger. is primitive (cannot be written as a shorter string repeated) and is the smallest rotation of itself.
If either condition fails,
The factorization theorem
Every string
where each
(lexicographically non-increasing). This is the Lyndon factorization of
Let's work through "abbaabba" by hand. We need to split it into non-increasing Lyndon words:
- Start scanning from the left. Is
"a"Lyndon? Yes (single char). Can we extend?"ab"is Lyndon ("ab""b"). Keep going:"abb"-- check suffixes:"abb""bb""b". Still Lyndon. - Try extending to
"abba". Check"abba"against suffix"a":"abba""a". That breaks it --"abba"is not Lyndon. - First Lyndon word:
"abb". - Remaining string:
"aabba". Try"a","aa"(not Lyndon --"aa""a"). Try"aab":"aab""ab""b". Lyndon. Try"aabb":"aabb""abb""bb""b". Still Lyndon. - Try
"aabba"-- suffix"a":"aabba""a". Not Lyndon. - Second Lyndon word:
"aabb". - Remaining:
"a". Single character -- always Lyndon. Third word:"a".
Result: "abb" | "aabb" | "a".
Check non-increasing: "abb" "aabb" (position 1: b a) and "aabb" "a". Correct.
Duval's algorithm
Chen, Fox, and Lyndon proved the factorization theorem exists; Duval gave an
: start of the current group being factorized -- this never moves backward. : scanning position, moving right through the string. : comparison position within the candidate Lyndon word pattern. The current candidate Lyndon word is , and is the position in that candidate that the new character is being compared against.
Starting with
The asymmetry between the two unequal cases is what makes the algorithm correct:
The whole thing terminates when
How i, j, k evolve on a tiny string s = "ab a ab" (= "abaab"):
+---+---+---+--------+--------+---------+----------------------------+
| i | j | k | s[j] | s[k] | compare | action |
+---+---+---+--------+--------+---------+----------------------------+
| 0 | 1 | 0 | b | a | j > k | extend: k -> 0, j -> 2 |
| 0 | 2 | 0 | a | a | j = k | repeat: k -> 1, j -> 3 |
| 0 | 3 | 1 | a | b | j < k | BREAK; period = j - k = 2 |
+---+---+---+--------+--------+---------+----------------------------+
Emit: copies of length 2 starting at i=0 while i <= k=1.
First emit "ab" (positions 0..1), i -> 2. Now i = 2 > k = 1, stop.
+---+---+---+--------+--------+---------+----------------------------+
| 2 | 3 | 2 | a | a | j = k | repeat: k -> 3, j -> 4 |
| 2 | 4 | 3 | b | a | j > k | extend: k -> 2, j -> 5=n |
+---+---+---+--------+--------+---------+----------------------------+
j reached end. period = j - k = 5 - 2 = 3.
Emit one copy of length 3 starting at i=2: "aab" (positions 2..4), i -> 5.
Result for "abaab": "ab" | "aab"
Check non-increasing: "ab" >= "aab" (b > a at pos 1) OKMinimum rotation via Lyndon factorization
Here's the key trick: to find the smallest rotation of
Why? Later words in the factorization are lexicographically
This gives
What the Code Won't Teach You
Why Duval's algorithm is truly
The Burrows-Wheeler connection. Sorting all rotations of a string is the core of the Burrows-Wheeler Transform (used in bzip2 compression). The SA-IS suffix array algorithm classifies suffixes into S-type and L-type, mirroring Lyndon-word structure. Understanding Lyndon factorization gives you a second lens on why suffix array construction works.
Counting Lyndon words is a combinatorics problem. The number of Lyndon words of length
The "aha moment": uniqueness of the factorization. The Chen-Fox-Lyndon theorem says every string has a unique factorization into Lyndon words
Contest frequency is low, but when it appears, nothing else works. Lyndon factorization shows up in maybe 1 in 200 contest problems -- but those problems are essentially unsolvable without it. The classic signal is "find the lexicographically smallest rotation" (direct application), but subtler appearances include string comparison problems where you need a canonical decomposition, or problems involving primitive roots of strings. The technique is worth knowing at the 1900+ level precisely because it's rare enough that most competitors don't know it, giving you a decisive edge when it appears.
Minimum rotation is the gateway application. The most common contest use of Lyndon factorization: given a string
Booth's algorithm
Booth's algorithm is an independent
- A failure function array
(like KMP's), indexed relative to the current best candidate. - A candidate starting position
for the best rotation seen so far.
At each step, when a character comparison fails:
- If the new character is smaller than expected, we've found a potentially better starting position -- update
. - Otherwise, fall back using the failure function, just like KMP.
Booth's uses
Watching the Algorithm Work
Duval's algorithm on "abbaabba"
String: a b b a a b b a
Index: 0 1 2 3 4 5 6 7
Round 1 (i = 0):
+----+----+------+------+---------+-----------+
| j | k | s[j] | s[k] | Compare | Action |
+----+----+------+------+---------+-----------+
| 1 | 0 | b | a | j > k | k=0, ext |
| 2 | 0 | b | a | j > k | k=0, ext |
| 3 | 0 | a | a | j = k | k=1, rep |
| 4 | 1 | a | b | j < k | BREAK |
+----+----+------+------+---------+-----------+
len = j - k = 4 - 1 = 3
Output: "abb" (positions 0..2)
i advances to 3
Round 2 (i = 3):
+----+----+------+------+---------+-----------+
| j | k | s[j] | s[k] | Compare | Action |
+----+----+------+------+---------+-----------+
| 4 | 3 | a | a | j = k | k=4, rep |
| 5 | 4 | b | a | j > k | k=3, ext |
| 6 | 3 | b | a | j > k | k=3, ext |
| 7 | 3 | a | a | j = k | k=4, rep |
| 8 | | | | j >= n | END |
+----+----+------+------+---------+-----------+
len = j - k = 8 - 4 = 4
Output: "aabb" (positions 3..6)
i advances to 7
Round 3 (i = 7):
j = 8 >= n immediately.
Output: "a" (position 7)
i advances to 8. Done.Factorization result
a b b | a a b b | a
+---------+---------------+---+
| "abb" | "aabb" |"a"|
+---------+---------------+---+
Lyndon >= Lyndon >= LAll three are Lyndon words, and they appear in non-increasing order.
Minimum rotation of "cabba"
s = "cabba" (n = 5)
Doubled (conceptual): c a b b a c a b b a
0 1 2 3 4 5 6 7 8 9
Duval on doubled string (modular indexing):
Round 1: i=0
s[0]='c' vs s[1]='a' -> 'c' > 'a' -> BREAK immediately
Output: "c" (len 1), i -> 1
Round 2: i=1 (i < n=5, so ans = 1)
Scans j = 2..9, building the Lyndon word "abbac"
s[1]='a' < s[2]='b' -> extend
s[1]='a' < s[3]='b' -> extend
s[1]='a' = s[4]='a' -> repetition track, k -> 2
s[2]='b' < s[5]='c' -> extend (k resets to 1)
...continues matching through j=10 -> END
Output: "abbac" (len 5), i -> 6
i = 6 >= n = 5, outer loop exits.
Last boundary before n: position 1.
Answer: rotation starting at index 1 = "abbac"
+-------+---------+
| Start | Rotation|
+-------+---------+
| 0 | cabba |
| 1 | abbac | <-- minimum
| 2 | bbaca |
| 3 | bacab |
| 4 | acabb |
+-------+---------+More Worked Examples
These additional examples hit the edge cases that trip people up in contests.
All identical characters: "aaaa"
Every single character is its own Lyndon word because "aa" is NOT Lyndon ("aa" "a" by prefix comparison). The algorithm sees nothing but equality, and when
String: a a a a
Index: 0 1 2 3
Round 1 (i = 0):
+----+----+------+------+---------+-----------+
| j | k | s[j] | s[k] | Compare | Action |
+----+----+------+------+---------+-----------+
| 1 | 0 | a | a | j = k | k=1, rep |
| 2 | 1 | a | a | j = k | k=2, rep |
| 3 | 2 | a | a | j = k | k=3, rep |
| 4 | | | | j >= n | END |
+----+----+------+------+---------+-----------+
len = j - k = 4 - 3 = 1
Output: "a" (pos 0), "a" (pos 1), "a" (pos 2), "a" (pos 3)
i advances to 4. Done.
Result: "a" | "a" | "a" | "a"
Each is trivially Lyndon.
Non-increasing: "a" >= "a" >= "a" >= "a" OKThis is the most common edge case to test -- if your implementation gets stuck in an infinite loop on all-identical input, the inner while condition is wrong.
Repeated Lyndon word: "ababab"
When a string is a perfect repetition of a Lyndon word, Duval's algorithm detects the pattern through its equality-tracking mechanism. Variable
String: a b a b a b
Index: 0 1 2 3 4 5
Round 1 (i = 0):
+----+----+------+------+---------+-----------+
| j | k | s[j] | s[k] | Compare | Action |
+----+----+------+------+---------+-----------+
| 1 | 0 | b | a | j > k | k=0, ext |
| 2 | 0 | a | a | j = k | k=1, rep |
| 3 | 1 | b | b | j = k | k=2, rep |
| 4 | 2 | a | a | j = k | k=3, rep |
| 5 | 3 | b | b | j = k | k=4, rep |
| 6 | | | | j >= n | END |
+----+----+------+------+---------+-----------+
len = j - k = 6 - 4 = 2
Output: "ab" (pos 0), "ab" (pos 2), "ab" (pos 4)
i advances to 6. Done.
Result: "ab" | "ab" | "ab"
Each factor is the Lyndon word "ab".
Non-increasing: "ab" >= "ab" >= "ab" OKNotice how the algorithm never backtracks. Variable
Decreasing string: "dcba"
When every character is smaller than the one before it, nothing can extend -- each character breaks immediately.
String: d c b a
Index: 0 1 2 3
Round 1: i=0, j=1, k=0
s[1]='c' < s[0]='d' -> BREAK
Output: "d" (pos 0), i -> 1
Round 2: i=1, j=2, k=1
s[2]='b' < s[1]='c' -> BREAK
Output: "c" (pos 1), i -> 2
Round 3: i=2, j=3, k=2
s[3]='a' < s[2]='b' -> BREAK
Output: "b" (pos 2), i -> 3
Round 4: i=3, j=4 >= n -> END
Output: "a" (pos 3), i -> 4
Result: "d" | "c" | "b" | "a"
Non-increasing: d >= c >= b >= a OKA fully decreasing string produces the maximum number of Lyndon factors -- one per character. Conversely, a fully increasing string like "abcd" is itself a single Lyndon word (since "abcd"
A real-world string: "banana"
This hits an interesting case where the algorithm finds multiple copies of the same Lyndon word embedded in a natural word.
String: b a n a n a
Index: 0 1 2 3 4 5
Round 1 (i = 0):
j=1, k=0: s[1]='a' < s[0]='b' -> BREAK
len = 1, Output: "b" (pos 0), i -> 1
Round 2 (i = 1):
+----+----+------+------+---------+-----------+
| j | k | s[j] | s[k] | Compare | Action |
+----+----+------+------+---------+-----------+
| 2 | 1 | n | a | j > k | k=1, ext |
| 3 | 1 | a | a | j = k | k=2, rep |
| 4 | 2 | n | n | j = k | k=3, rep |
| 5 | 3 | a | a | j = k | k=4, rep |
| 6 | | | | j >= n | END |
+----+----+------+------+---------+-----------+
len = j - k = 6 - 4 = 2
Output: "an" (pos 1), "an" (pos 3)
i advances to 5. But wait -- i=5 > k=4, so the while(i<=k) loop
only outputs two copies and leaves i=5.
Round 3 (i = 5):
j=6 >= n immediately.
Output: "a" (pos 5), i -> 6. Done.
Result: "b" | "an" | "an" | "a"
Check: "b" >= "an"? (b > a at position 0) OK
"an" >= "an"? (equal) OK
"an" >= "a"? (equal prefix, "an" longer) OK
All Lyndon: "b" OK, "an" (an < n) OK, "a" OKMinimum rotation of "dbacd"
A slightly trickier minimum-rotation example showing how Duval on the doubled string tracks the answer.
s = "dbacd" (n = 5)
Doubled (conceptual): d b a c d d b a c d
0 1 2 3 4 5 6 7 8 9
Round 1: i=0
s[0]='d' vs s[1]='b' -> 'd' > 'b' -> BREAK
Output: "d" (len 1), i -> 1
Round 2: i=1 (i < n=5, so ans = 1)
s[1]='b' vs s[2]='a' -> 'b' > 'a' -> BREAK
Output: "b" (len 1), i -> 2
Round 3: i=2 (i < n=5, so ans = 2)
s[2]='a' vs s[3]='c' -> extend (c > a), k=2
s[2]='a' vs s[4]='d' -> extend (d > a), k=2
s[2]='a' vs s[5]='d' -> extend (d > a), k=2
s[2]='a' vs s[6]='b' -> extend (b > a), k=2
s[2]='a' vs s[7]='a' -> equal, k=3
s[3]='c' vs s[8]='c' -> equal, k=4
s[4]='d' vs s[9]='d' -> equal, k=5
j=10 >= 2n -> END
len = 10 - 5 = 5
Output: "acddb" (positions 2..6), i -> 7
i = 7 >= n = 5, outer loop exits.
Last boundary before n: position 2.
Answer: rotation starting at index 2 = "acddba"
Wait -- let's verify:
+-------+-----------+
| Start | Rotation |
+-------+-----------+
| 0 | dbacd |
| 1 | bacdd |
| 2 | acddb | <-- minimum
| 3 | cddba |
| 4 | ddbac |
+-------+-----------+
"acddb" is the lexicographically smallest. OKRelevant String Operations
These standard library features are useful alongside Lyndon factorization:
| Operation | Syntax | Notes |
|---|---|---|
| Lexicographic compare | s.compare(pos, len, t, pos2, len2) | Returns |
| Substring | s.substr(pos, len) | |
| Rotate in-place | rotate(s.begin(), s.begin()+k, s.end()) | Left-rotate by |
| Concatenation | s + s | For doubled-string tricks |
Operator < | s < t | Lexicographic; |
In contest code you rarely call these -- the algorithms below work character by character. But rotate is handy for verification and testing.
Implementation
Contest template
Short, copy-paste ready. Three functions covering all common needs.
cpp
#include <bits/stdc++.h>
using namespace std;
vector<string> duval(const string &s) {
int n = s.size();
vector<string> res;
int i = 0;
while (i < n) {
int j = i + 1, k = i;
while (j < n && s[k] <= s[j]) {
if (s[k] < s[j]) k = i;
else k++;
j++;
}
while (i <= k) {
res.push_back(s.substr(i, j - k));
i += j - k;
}
}
return res;
}
int min_rotation(const string &s) {
int n = s.size();
int i = 0, ans = 0;
while (i < n) {
ans = i;
int j = i + 1, k = i;
while (j < 2 * n && s[k % n] <= s[j % n]) {
if (s[k % n] < s[j % n]) k = i;
else k++;
j++;
}
while (i <= k) i += j - k;
}
return ans;
}
int booth(const string &s) {
int n = s.size();
string t = s + s;
vector<int> f(2 * n, -1);
int k = 0;
for (int j = 1; j < 2 * n; j++) {
int i = f[j - 1 - k];
while (i != -1 && t[j] != t[k + i + 1]) {
if (t[j] < t[k + i + 1]) k = j - i - 1;
i = f[i];
}
if (t[j] != t[k + i + 1]) {
if (t[j] < t[k + i + 1]) k = j;
f[j - k] = -1;
} else {
f[j - k] = i + 1;
}
}
return k;
}
int main() {
string s;
cin >> s;
auto fact = duval(s);
for (auto &w : fact) cout << "[" << w << "]";
cout << "\n";
int idx1 = min_rotation(s);
int idx2 = booth(s);
cout << "Min rotation at " << idx1 << ": "
<< s.substr(idx1) + s.substr(0, idx1) << "\n";
cout << "Booth confirms: " << idx2 << "\n";
}Explained version
Same algorithms with detailed commentary on every step.
cpp
#include <bits/stdc++.h>
using namespace std;
/*
* Duval's algorithm for Lyndon factorization.
*
* Returns the unique decomposition s = w1 w2 ... wk where each wi
* is a Lyndon word and w1 >= w2 >= ... >= wk lexicographically.
*
* Time: O(n) Space: O(1) beyond the output vector
*/
vector<string> duval(const string &s) {
int n = s.size();
vector<string> res;
int i = 0;
while (i < n) {
// j scans forward; k tracks our position inside the candidate
// Lyndon word for comparison (starts at the word's beginning).
int j = i + 1;
int k = i;
while (j < n && s[k] <= s[j]) {
if (s[k] < s[j]) {
// s[j] is strictly larger than the corresponding position
// in the candidate word. The entire segment i..j becomes
// one bigger Lyndon word. Reset k to start comparing
// future characters against the beginning of this word.
k = i;
} else {
// s[j] matches exactly. We might be seeing another copy
// of the same Lyndon word repeating. Advance k to keep
// tracking which position in the pattern we're matching.
k++;
}
j++;
}
// Now s[j] < s[k] or j reached end-of-string.
// The Lyndon word has length (j - k). Positions i..j-1 contain
// one or more complete copies of it (plus maybe a prefix that
// belongs to the next group -- but i won't reach that far).
int len = j - k;
while (i <= k) {
res.push_back(s.substr(i, len));
i += len;
}
// i has moved past all complete copies. Any leftover partial
// match is already at the correct position for the next round.
}
return res;
}
/*
* Minimum rotation via Duval on the doubled string.
*
* Conceptually runs Lyndon factorization on s+s using modular indexing
* (no actual concatenation). The last Lyndon word boundary that starts
* before position n is the answer.
*
* Time: O(n) Space: O(1)
*/
int min_rotation(const string &s) {
int n = s.size();
if (n == 0) return 0;
int i = 0, ans = 0;
while (i < n) {
// Every time we enter this loop with i < n, we have a valid
// rotation candidate. Record it -- the last one wins because
// the factorization is non-increasing.
ans = i;
int j = i + 1, k = i;
// Same Duval logic, but on a string of length 2n accessed
// via modular indexing. This lets us "see" rotations that
// wrap around the original string boundary.
while (j < 2 * n && s[k % n] <= s[j % n]) {
if (s[k % n] < s[j % n]) k = i;
else k++;
j++;
}
// Advance i past all complete copies of this Lyndon word.
while (i <= k) i += j - k;
}
return ans;
}
/*
* Booth's algorithm -- minimum rotation via modified failure function.
*
* Processes the doubled string s+s, maintaining a candidate start k.
* The failure function f[] is built relative to the current candidate,
* similar to KMP's prefix function but updated whenever a smaller
* rotation is discovered.
*
* Time: O(n) Space: O(n)
*/
int booth(const string &s) {
int n = s.size();
if (n == 0) return 0;
string t = s + s;
vector<int> f(2 * n, -1);
int k = 0; // best starting position so far
for (int j = 1; j < 2 * n; j++) {
// i is the fallback position relative to candidate k.
int i = f[j - 1 - k];
// Walk back through the failure chain on mismatch.
while (i != -1 && t[j] != t[k + i + 1]) {
// If t[j] is smaller, we found a better candidate.
if (t[j] < t[k + i + 1]) k = j - i - 1;
i = f[i];
}
if (t[j] != t[k + i + 1]) {
// Still mismatched after exhausting the failure chain.
if (t[j] < t[k + i + 1]) k = j;
f[j - k] = -1;
} else {
// Characters match -- extend the failure function.
f[j - k] = i + 1;
}
}
return k;
}
int main() {
string s;
cin >> s;
cout << "Lyndon factorization:\n";
auto fact = duval(s);
for (auto &w : fact) cout << " [" << w << "]";
cout << "\n";
int idx = min_rotation(s);
string rotated = s.substr(idx) + s.substr(0, idx);
cout << "Smallest rotation: \"" << rotated
<< "\" (starting at index " << idx << ")\n";
int idx2 = booth(s);
cout << "Booth confirms: index " << idx2 << "\n";
}Complexity at a Glance
| Operation | Time | Space | Notes |
|---|---|---|---|
| Lyndon factorization (Duval) | *Excluding the output vector | ||
| Min rotation (Duval, mod-index) | Preferred in contests | ||
| Min rotation (Booth) | Failure array of size | ||
Min rotation (Duval on s+s) | Explicit string doubling | ||
| Naive all-rotations sort | For comparison only |
All three
Problem Patterns and Tricks
1. Minimum cyclic rotation
The bread and butter. Given a string, find the lexicographically smallest rotation. Call min_rotation() and you're done.
cpp
string smallest_rotation(const string &s) {
int k = min_rotation(s);
return s.substr(k) + s.substr(0, k);
}Appears whenever a problem says "cyclic" and "lexicographic" in the same sentence -- necklaces, circular DNA sequences, rotating lock combinations.
Why it works -- the deeper explanation. The Lyndon factorization of
Why "last boundary before n" works:
s = "cabba" -> s+s = "cabbacabba"
Factorization of s+s:
"c" | "abbac" | "abba"
^ ^
i=0 i=1 (< n=5, record ans=1)
"c" >= "abbac" >= "abba" (non-increasing)
The boundary at i=1 gives rotation "abbac" -- which IS the smallest.
We don't need to compare all rotations; the non-increasing property
guarantees the last valid boundary is the best.Problems: CSES 1110 "Minimal Rotation", SPOJ MINMOVE
2. Necklace equivalence
Two strings represent the same "necklace" if one is a rotation of the other. To test equivalence, compute the canonical (smallest) rotation of each and compare.
cpp
bool same_necklace(const string &a, const string &b) {
if (a.size() != b.size()) return false;
int n = a.size();
int ka = min_rotation(a), kb = min_rotation(b);
for (int i = 0; i < n; i++)
if (a[(ka + i) % n] != b[(kb + i) % n])
return false;
return true;
}To group
Worked example -- counting distinct necklaces.
Given {"abc", "bca", "cab", "xyz", "zxy"}, how many distinct necklaces?
Canonical rotations:
"abc" -> min_rotation = 0 -> "abc"
"bca" -> min_rotation = 1 -> "abc" (same necklace!)
"cab" -> min_rotation = 2 -> "abc" (same necklace!)
"xyz" -> min_rotation = 0 -> "xyz"
"zxy" -> min_rotation = 1 -> "xyz" (same necklace!)
Distinct canonical forms: {"abc", "xyz"}
Answer: 2 distinct necklaces.This shows up in chemistry (molecular ring isomers), music theory (pitch class sets), and combinatorics. The min_rotation makes it practical even for large datasets.
Problems: Kattis "minrotation", classical necklace counting
Necklace equivalence via canonical rotation:
String A: "c a b" String B: "b c a"
╭───╮ ╭───╮
│ c │ │ b │
╱ ╲ ╱ ╲
╭───╮ ╭───╮ ╭───╮ ╭───╮
│ b │ │ a │ │ a │ │ c │
╰───╯ ╰───╯ ╰───╯ ╰───╯
min_rotation("cab") = 1 -> "abc"
min_rotation("bca") = 2 -> "abc"
Same canonical form -> same necklace OK3. Cyclic substring matching
Given a pattern
Strategy: compute the canonical rotation of
For the classic variant -- counting distinct rotations of
cpp
// Count how many distinct rotations of p appear as substrings of t
int count_cyclic_occurrences(const string &t, const string &p) {
int n = p.size();
// Build suffix automaton on t
// Feed each rotation of p (= substrings of p+p of length n)
// into the automaton, collecting distinct matches.
// Lyndon factorization helps identify the canonical
// starting point and avoid redundant checks.
// See CF 235C for the full approach.
}The key insight: rather than checking all
Problems: CF 235C "Cyclical Quest"
4. Lexicographic comparison of cyclic strings
Sometimes the problem isn't to find the minimum rotation but to compare two cyclic strings. Are all rotations of
The approach: compute min_rotation for both strings. If you need to compare arbitrary rotations (not just the minimum), you can combine Lyndon factorization with suffix arrays -- but for most contest problems, canonicalization via min_rotation + direct comparison is sufficient.
cpp
// Compare cyclic strings: is every rotation of a < every rotation of b?
// Equivalent to: canonical(a) < canonical(b)
bool cyclic_less(const string &a, const string &b) {
int ka = min_rotation(a), kb = min_rotation(b);
int n = a.size(), m = b.size();
for (int i = 0; i < min(n, m); i++) {
if (a[(ka + i) % n] != b[(kb + i) % m])
return a[(ka + i) % n] < b[(kb + i) % m];
}
return n < m;
}5. Suffix array connection
The Lyndon factorization is deeply connected to suffix arrays. In the SA-IS algorithm (one of the fastest linear-time suffix array constructions), the classification of suffixes into S-type and L-type mirrors the Lyndon word structure.
More directly: all proper suffixes of a Lyndon word are lexicographically larger than the word itself. This means suffixes within a Lyndon factor appear in a predictable order in the suffix array, which some problems exploit for faster construction or querying.
Connection between Lyndon factors and suffix array ordering:
s = "abb|aabb|a" (Lyndon factorization from earlier)
Within "abb" (a Lyndon word), suffixes are:
"abb" < "bb" < "b"
These appear in this relative order in the suffix array.
Within "aabb" (a Lyndon word), suffixes are:
"aabb" < "abb" < "bb" < "b"
Again, the word itself comes first among its suffixes.
This ordering property is what SA-IS exploits for linear-time
suffix array construction.6. String periodicity analysis
The Lyndon factorization reveals periodic structure. If the factorization of
cpp
bool has_period(const string &s, int p) {
auto fact = duval(s);
if (fact.empty()) return false;
const string &w = fact[0];
if ((int)w.size() != p) return false;
for (auto &f : fact)
if (f != w && w.substr(0, f.size()) != f)
return false;
return true;
}This connects to the KMP failure function: the smallest period of
Worked example -- detecting periods via factorization.
s = "abcabcab"
Lyndon factorization: "abcabcab"
Wait -- is "abcabcab" Lyndon? Check: "abcabcab" < "bcabcab"? Yes (a<b).
"abcabcab" < "cabcab"? Yes (a<c). "abcabcab" < "abcab"? Same prefix...
"abcabcab" vs "abcab": a=a, b=b, c=c, a=a, b=b, then "abcab" ends.
"abcabcab" > "abcab" (longer, same prefix). So NOT Lyndon!
Actual factorization: run Duval...
i=0: "abc" extends (a<b<c all increasing). Then s[3]='a'=s[0]='a'
-> repeat tracking. s[4]='b'=s[1]='b', s[5]='c'=s[2]='c',
s[6]='a'=s[3]... wait, k would be at 3, s[k]=s[0]='a'=s[6]='a',
s[7]='b'=s[1]='b'. j=8>=n. len=8-4=...
Actually: the factorization is "abc" | "abc" | "ab"
All factors equal to "abc" or a prefix of it -> period = 3.
Matches KMP: π[7] = 5, so period = 8 - 5 = 3. OKProblems: CF 825F "String Compression"
7. Runs and Lyndon roots
A run (maximal repetition) in a string is a maximal interval
Runs and their Lyndon roots:
s = "aabaabaab"
Run [0, 5]: "aabaab" has period 3 ("aab")
Lyndon root of "aab": "aab" itself (already Lyndon: aab < ab < b)
Run [3, 8]: "abaab" ... actually this isn't periodic enough.
Let's check: "aabaabaab" -- the run [0, 8] has period 3.
"aab" repeated 3 times = "aabaabaab"
Lyndon root: "aab" OK
The Lyndon root is unique per run and provides a canonical
representative for the repetition pattern.Understanding this won't appear directly in a contest, but it deepens your intuition for why Lyndon words keep showing up in string algorithms -- they are the natural "atoms" that repetitive structure is built from.
8. Splitting a cyclic string into primitive components
Given a circular string, sometimes you need to decompose it into its primitive root (the shortest string whose repetitions form the full string). Lyndon factorization handles this: if the factorization consists of
cpp
// Returns the primitive root of a string (shortest w such that
// s = w repeated k times, possibly with a prefix remainder)
string primitive_root(const string &s) {
auto fact = duval(s);
return fact[0]; // First Lyndon factor is always the root
// (only if all factors are identical)
}
// Check if s is a perfect power (s = w^k for some k > 1)
bool is_power(const string &s) {
auto fact = duval(s);
if (fact.size() <= 1) return false;
for (auto &f : fact)
if (f != fact[0]) return false;
return true;
}Contest Cheat Sheet
+-----------------------------------------------------------+
| LYNDON FACTORIZATION - QUICK REFERENCE |
+-----------------------------------------------------------+
| |
| Lyndon word: s < all its proper suffixes (strictly) |
| Factorization: s = w1 w2 ... wk, each wi Lyndon, |
| w1 >= w2 >= ... >= wk |
| |
| Duval (factorization): O(n) time, O(1) space |
| Min rotation (Duval): O(n) time, O(1) space |
| Min rotation (Booth): O(n) time, O(n) space |
| |
| When to use: |
| - "smallest/lexicographic rotation" |
| - "cyclic equivalence" or "necklace" |
| - "canonical form of cyclic string" |
| - comparing strings up to rotation |
| |
| Min rotation (Duval, O(1) space): |
| int i=0, ans=0; |
| while (i < n) { |
| ans = i; |
| int j = i+1, k = i; |
| while (j < 2*n && s[k%n] <= s[j%n]) |
| if (s[k%n] < s[j%n]) k = i; else k++; |
| j++; |
| while (i <= k) i += j-k; |
| } // ans = start of min rotation |
| |
+-----------------------------------------------------------+Gotchas and Debugging
1. Strict inequality in the Lyndon definition. A Lyndon word must be strictly less than all proper suffixes. The string "aa" is not Lyndon because "aa" "a" (a prefix is always less than the full string in standard lexicographic order). The factorization of "aa" is "a" | "a" -- two separate words. If you mistakenly treat "aa" as Lyndon, your factorization will be wrong.
2. Off-by-one in Booth's algorithm. Booth's failure function is indexed relative to the current best candidatef[j - k] access.
3. Empty string. Both duval() and min_rotation() should handle the empty string gracefully. The factorization of "" is empty, and min_rotation("") returns 0. The implementations above handle this correctly (the while loops simply don't execute).
4. All identical characters."aaaa" factorizes as "a" | "a" | "a" | "a" -- four single-character Lyndon words. The minimum rotation is 0. Make sure your inner while loop exits promptly when s[k] \leq s[j] with
5. The <= in Duval's inner loop. The condition is s[k] <= s[j], not s[k] < s[j]. The equality branch handles the repetition-tracking case. Changing <= to < produces incorrect factorizations -- the algorithm won't detect repeated copies of a Lyndon word.
6. Forgetting the doubled string for min rotation. If you run Duval on "cabba" has factorization "c" | "abb" | "a" -- the last word starts at position 4, giving rotation "acabb". But the actual minimum is "abbac" (position 1), which only appears when looking at s + s.
7. Modular indexing bounds. In the Duval-based min_rotation, the inner loop goes up to i < n is correct -- we only need Lyndon word boundaries in the first half of the doubled string.
8. Returning boundaries instead of substrings. In contest code you often want start positions, not substrings (to avoid
cpp
vector<int> duval_starts(const string &s) {
int n = s.size();
vector<int> starts;
int i = 0;
while (i < n) {
int j = i + 1, k = i;
while (j < n && s[k] <= s[j]) {
if (s[k] < s[j]) k = i;
else k++;
j++;
}
while (i <= k) {
starts.push_back(i);
i += j - k;
}
}
return starts;
}Mental Traps
Trap 1: "Lyndon factorization is only for finding the smallest rotation."
That's the most common application, but it's not the only one. The factorization reveals periodic structure, connects to suffix array construction (SA-IS), and provides canonical representatives for necklace equivalence classes. If you treat it as a single-purpose tool, you'll miss problems where the factorization's structure -- the sequence of Lyndon words and their lengths -- is itself the answer.
What Lyndon factorization actually unlocks:
┌─────────────────────────────────────────────┐
│ Lyndon Factorization │
│ s = w₁ w₂ ... wₖ (non-increasing) │
└──────┬──────────┬──────────┬────────────────┘
│ │ │
▼ ▼ ▼
┌──────────┐ ┌────────┐ ┌──────────────────┐
│ Min │ │Necklace│ │ Period detection │
│ rotation │ │ canon │ │ (all wᵢ equal?) │
└──────────┘ └────────┘ └──────────────────┘
│ │
▼ ▼
┌──────────┐ ┌──────────────────┐
│ SA-IS │ │ String │
│ suffix │ │ compression │
│ classify │ │ (CF 825F) │
└──────────┘ └──────────────────┘Trap 2: "The uniqueness of the factorization is obvious."
It's not. Many different ways to split a string into Lyndon words exist. The Chen-Fox-Lyndon theorem says only one such split is non-increasing -- and that's the hard part to prove. If you don't appreciate uniqueness, you can't explain why Duval's algorithm always produces the correct answer regardless of input. The comparison logic (
Common Mistakes
- Booth's failure-function confusion with KMP. "My Booth's algorithm returned index 0 for the string
abab, but the minimum rotation isababstarting at index 0 -- wait, shouldn't it beabab?" Actuallyababhas rotations:abab,baba,abab,baba. The minimum isababat index 0, which is correct. The real mistake was on a string likebaa-- rotations arebaa,aab,aba. The minimum isaab(index 1). The bug was in the failure-function update: whens[i] < s[j], the pointerjshould reset to0(not stay put). In Booth's variant, you must handle the<case by resetting the candidate start position. Confusing Booth's failure function with KMP's is the classic trap.
Debug Drills
Drill 1 -- Duval's algorithm: wrong reset after mismatch
cpp
int i = 0;
while (i < n) {
int j = i, k = i + 1;
while (k < n && s[j] <= s[k]) {
j = (s[j] == s[k]) ? j + 1 : i; // BUG
k++;
}
while (i <= j) {
i += k - j;
}
}What's wrong?
When s[j] < s[k] (strict), j should reset to i (the start of the current tentative Lyndon word). But the condition s[j] <= s[k] groups == and < together incorrectly. When s[j] == s[k], we should advance j to j+1. When s[j] < s[k], we should reset j to i. The fix:
cpp
if (s[j] < s[k]) {
j = i; // extend current Lyndon word
} else {
j++; // s[j] == s[k], continue comparing
}Wait -- actually in Duval's algorithm: if s[k] > s[j], we reset j = i (current block can be extended). If s[k] == s[j], we increment j. If s[k] < s[j], we break and output. The original code has the condition inverted.
Drill 2 -- Booth's algorithm: forgetting to double the string
cpp
vector<int> f(n, -1);
int k = 0;
for (int j = 1; j < n; j++) { // should be 2*n
// ... Booth's logic ...
}What's wrong?
Booth's algorithm works on the doubled string s + s. The loop must iterate up to 2*n, and the failure array must have size 2*n. With only n iterations, you miss rotations that start in the second half:
cpp
vector<int> f(2 * n, -1);
int k = 0;
for (int j = 1; j < 2 * n; j++) {
char sj = s[j % n];
// ... rest of Booth's logic using s[... % n] ...
}Drill 3 -- Outputting Lyndon factors in wrong order
cpp
// Inside Duval's algorithm, outputting factors:
while (i <= j) {
factors.push_back(s.substr(i, k - j));
i += k - j;
j += k - j; // BUG: j should not change
}What's wrong?
After the inner while-loop breaks, the period length is k - j. We output complete factors of length k - j starting from i. The variable j should not be modified in this output loop. Fix:
cpp
int len = k - j;
while (i <= j) {
factors.push_back(s.substr(i, len));
i += len;
}Self-Test
Before moving to problems, verify you can do these without looking at the notes:
- [ ] State the definition of a Lyndon word precisely -- including why
"abab"is NOT Lyndon (it equals a rotation of itself) and why"ab"IS Lyndon - [ ] State the Chen-Fox-Lyndon theorem: every string has a unique factorization into Lyndon words in non-increasing lexicographic order
- [ ] Explain why Duval's algorithm is
: variable never decreases, and the total advancement of across all rounds is bounded by - [ ] Hand-trace Duval on
"aab"and verify the result is non-increasing Lyndon words - [ ] Describe how the Lyndon factorization of
yields the smallest rotation of , and identify which boundary gives the starting index
Practice Problems
| CF Rating | What You Should Be Able to Do |
|---|---|
| 1200 | Know what a rotation is and find the minimum rotation by brute force. |
| 1500 | Implement Booth's algorithm for minimum rotation; understand what a Lyndon word is. |
| 1800 | Implement Duval's algorithm; use Lyndon factorization to solve cyclic-comparison problems. |
| 2100 | Combine Lyndon factorization with suffix arrays or SAM for advanced cyclic substring problems. |
| Problem | Source | Difficulty | Key Technique |
|---|---|---|---|
| Minimal Rotation | CSES 1110 | CF ~1400 | Direct min_rotation() application |
| MINMOVE | SPOJ | CF ~1400 | Direct minimum rotation |
| Glass Beads | POJ 1509 | CF ~1500 | Minimum rotation, classic problem |
| Cyclical Quest | CF 235C | CF 2400 | Cyclic substrings via suffix automaton |
| String Compression | CF 825F | CF 2200 | Period analysis with Z-function + DP |
| Password | CF 126B | CF 1700 | Prefix-suffix matching (KMP/Z, related) |
| Minimum Cyclic Shift | Timus 1423 | CF ~1500 | Direct Duval or Booth application |
| Isomorphic Strings | CF 985F | CF 2300 | Hashing + cyclic comparison |
| Necklace | UVa 11475 | CF ~1600 | Palindromic necklace via min rotation + KMP |
| Periodic Strings | CF 1295C | CF 1400 | String period analysis (Lyndon structure insight) |
| Anthem of Berland | CF 808G | CF 2200 | KMP with cyclic pattern variants |
Suggested Progression
Start with CSES 1110, SPOJ MINMOVE, Timus 1423 -- direct application of
min_rotation. Write the code from scratch, don't copy-paste. These build muscle memory for Duval's three-case logic.Build with POJ 1509, UVa 11475, CF 1295C -- slightly more involved problems where you need to combine min rotation or periodicity analysis with another technique.
Strengthen with CF 126B, CF 985F -- problems where Lyndon-adjacent thinking (prefix functions, cyclic hashing) is the core technique. These test your understanding of the ideas behind Lyndon factorization, not just the implementation.
Master with CF 235C, CF 825F, CF 808G -- hard problems requiring suffix automata, DP, or multi-technique combinations where cyclic string structure is one ingredient among several.
The first three are direct applications. CF 235C requires combining cyclic rotation ideas with a suffix automaton. CF 825F uses period detection (related to Lyndon structure). CF 126B exercises the KMP-family thinking that Booth's algorithm also builds on.
Further Reading
- cp-algorithms: Lyndon Factorization -- Thorough treatment with proofs, Duval's algorithm, and the minimum rotation connection.
- Booth's Algorithm (Wikipedia) -- Formal description and correctness argument.
- CF Blog: Lyndon Factorization and Applications -- Community discussion with contest examples.
- Original paper: J.P. Duval, "Factorizing Words over an Ordered Alphabet", Journal of Algorithms 4(4), 1983 -- the O(n) algorithm.
Related Techniques
- Suffix Array -- suffix arrays also solve lexicographic problems on strings, including minimum rotation
- KMP and Z-Function -- Booth's algorithm for minimum rotation uses a KMP-like failure function
- String Hashing -- hashing provides an alternative O(n log n) approach to cyclic comparison