Appearance
Arrays and Strings
Quick Revisit
- USE WHEN: You need O(1) lookup/update by index or need to track seen elements
- INVARIANT: a[i] is accessed in O(1) via base + i*sizeof(element)
- TIME: O(1) access/update | SPACE: O(n)
- CLASSIC BUG: Off-by-one on 0-indexed vs 1-indexed arrays
- PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md
The most fundamental data structures in competitive programming—contiguous memory blocks that underpin nearly every problem you'll solve on Codeforces.
Difficulty: Beginner | Prerequisites: None
Arrays date to the earliest stored-program computers of the late 1940s, where contiguous memory words were the only data structure available. The C language (1972) made pointer-arithmetic access a first-class citizen, and C++ std::vector (1994, HP STL) added dynamic resizing while preserving the O(1) access guarantee. Strings as first-class objects—rather than null-terminated char*—arrived with std::string in C++98.
Think of an array as a row of numbered lockers: knowing the number gets you to the locker in one step, no matter how long the hallway. The real aha moment comes when you realize that an array's index is a hash function for small integers—and that almost every "have I seen X before?" question collapses to a single array lookup. Once that clicks, competitive programming starts to click too. A useful mnemonic: "Value IS address—skip the search."
Contents
- Intuition
- C++ STL Reference
- Implementation (Contest-Ready)
- Operations Reference
- Problem Patterns and Tricks
- Contest Cheat Sheet
- If I Had 5 Minutes
- Before You Code Checklist
- Gotchas and Debugging
- When NOT to Use This
- What Would You Do If...?
- The Mistake That Teaches
- Self-Test
- Debug-This Exercises
- Practice Problems
- Further Reading
Intuition
You have an array of six integers and need to find the first value that appears twice:
text
a = [3, 1, 4, 1, 5, 3]Your first instinct: for each element a[i], scan everything before it looking for a match.
text
i=0 value=3 check against: (nothing) --> 0 comparisons
i=1 value=1 check against: 3 --> 1 comparison, no match
i=2 value=4 check against: 3, 1 --> 2 comparisons, no match
i=3 value=1 check against: 3, 1 --> 2 comparisons, MATCH!Found it—1 is the first duplicate—after 5 comparisons. Not terrible for 6 elements. But you got lucky: the duplicate was near the middle. Push it to the end:
text
a = [1, 2, 3, 4, 5, 1] (duplicate at position 5)
i=0 (0 comparisons)
i=1 (1 comparison)
i=2 (2 comparisons)
i=3 (3 comparisons)
i=4 (4 comparisons)
i=5 (5 comparisons) --> finally, MATCH
Total: 0 + 1 + 2 + 3 + 4 + 5 = 15 comparisons = n*(n-1)/2For
We're re-scanning previous elements for every single position. There has to be a better way.
An array's
Think of numbered mailboxes at a post office. To check if mail arrived for box #4, you don't search through every box starting from #1. You walk directly to box 4 and open it. The box number is the location—no searching required.
An array works the same way: seen[4] compiles to a single address calculation—base + 4 * sizeof(bool)—regardless of how many elements exist. No comparisons, no traversal.
Where the analogy breaks: a post office has a fixed number of boxes. If someone asks for box #1,000,000,000, you can't build a billion mailboxes. Similarly, if your values range up to seen array. You'd need hashing (unordered_map) or coordinate compression to shrink the value space first.
Why contiguous memory matters beyond indexing: modern CPUs load memory in cache lines (~64 bytes). When you iterate a vector, adjacent elements are already in L1 cache—the hardware predicts your sequential access pattern. A list or map scatters nodes across the heap, turning each access into a potential cache miss.
For vector over list/deque unless you specifically need
Visual Walkthrough
Same array: a = [3, 1, 4, 1, 5, 3]. Allocate seen[6], all initialized to false (shown as .; * means true).
text
Step 1: Process a[0] = 3
a: | 3 | 1 | 4 | 1 | 5 | 3 |
^
i=0
seen: | . | . | . | . | . | . |
index: 0 1 2 3 4 5
seen[3] is false --> not a duplicate
Set seen[3] = true
seen: | . | . | . | * | . | . |
index: 0 1 2 3 4 5
ops so far: 1 lookup + 1 write = 2text
Step 2: Process a[1] = 1
a: | 3 | 1 | 4 | 1 | 5 | 3 |
^
i=1
seen: | . | . | . | * | . | . |
index: 0 1 2 3 4 5
seen[1] is false --> not a duplicate
Set seen[1] = true
seen: | . | * | . | * | . | . |
index: 0 1 2 3 4 5
ops so far: 4text
Step 3: Process a[2] = 4
a: | 3 | 1 | 4 | 1 | 5 | 3 |
^
i=2
seen: | . | * | . | * | . | . |
index: 0 1 2 3 4 5
seen[4] is false --> not a duplicate
Set seen[4] = true
seen: | . | * | . | * | * | . |
index: 0 1 2 3 4 5
ops so far: 6text
Step 4: Process a[3] = 1
a: | 3 | 1 | 4 | 1 | 5 | 3 |
^
i=3
seen: | . | * | . | * | * | . |
index: 0 1 2 3 4 5
seen[1] is TRUE --> DUPLICATE FOUND! Return value 1.
seen: | . | * | . | * | * | . |
index: 0 1 2 3 4 5
ops so far: 7 (just one more lookup)Brute force: 5 comparisons in the lucky case, up to 15 in the worst case—
Array lookup: 7 constant-time operations, always—
The gap widens fast. At
The Invariant
text
+----------------------------------------------------------+
| INVARIANT: After processing a[0..i-1], seen[v] == true |
| if and only if value v appeared in a[0..i-1]. Each |
| lookup and update costs O(1) because array indexing is |
| direct address arithmetic: base + v * sizeof(bool). |
+----------------------------------------------------------+The invariant holds because every time we process a[i]:
- We read
seen[a[i]]—by the invariant, this istrueexactly whena[i]already appeared ina[0..i-1]. - If
false, we setseen[a[i]] = true, extending the invariant to covera[0..i]. - If
true, we have found a duplicate and stop.
Look at Step 4 above—seen[1] is true because Step 2 set it when we first encountered value 1 at index 1. The invariant guarantees that a true entry means "this value appeared earlier," not "some garbage was in memory." Without
This same principle—use the value as the index—is the foundation of frequency arrays (freq[c - 'a']++), counting sort, bucket sort, and direct-address tables. The contiguous memory layout of arrays is what makes the address calculation constant-time and the whole approach viable.
What the Code Won't Teach You
The real skill is boundary discipline, not memorization. When you write a loop or a slice, explicitly state your invariant: "i is the index of the current element" or "everything in 0..i-1 has been processed." The moment that invariant gets fuzzy, you are one off-by-one away from a bug. This is not about memorizing that arrays are zero-indexed—it is about maintaining a precise mental model at every step.
text
Boundary invariant at each loop step
+---+---+---+---+---+---+
| # | # | # | | | |
+---+---+---+---+---+---+
0 1 2 3 4 5
<-processed->|<-pending->
^
i
At step i: a[0..i-1] done, a[i..n-1] remainingStrings in competitive programming are almost always about patterns, not raw characters. The interesting questions are: does string A appear in string B? What is the longest prefix shared by two strings? How many distinct substrings exist? The mechanics—indexing, slicing, concatenation—are just the substrate. The real skill is recognizing the structural question hiding underneath the string manipulation.
text
Surface vs structure
s = a b c a b c d
+--+--+--+--+--+--+--+
| | | | | | | |
+--+--+--+--+--+--+--+
Pattern view
|<--abc-->|<--abc-->| d
+--+--+--+--+--+--+--+
| #| #| #| #| #| #| |
+--+--+--+--+--+--+--+
period 3Character arithmetic is a basic building block, not a trick. c - 'a' maps lowercase letters to indices 0 through 25, letting you use characters directly as array positions. This is the foundation of every frequency-counting solution and appears in nearly every string problem at every difficulty level.
text
Character to index mapping
char a b c d ... z
| | | | |
v v v v v
idx 0 1 2 3 25
+---+---+---+---+-----+---+
freq | | | | | ... | |
+---+---+---+---+-----+---+
0 1 2 3 25With those deeper skills in mind, here's how to recognize when an array-based approach is the right tool.
When to Reach for This
Trigger phrases in problem statements:
- "count the frequency of each element" → frequency array indexed by value
- "find duplicates" or "any value that repeats" → boolean
seenarray - "values bounded by a small constant (
)" → direct indexing feasible - "rearrange / sort when values are in a small range" → counting sort via array buckets
- "character frequency in a string" →
freq[c - 'a']with a size-26 array
Counter-examples (looks like direct-indexing, but isn't):
- Values up to
or negative values—can't allocate an array that big. Use unordered_mapor coordinate-compress into a dense range first, then index. See Sort + Unique (Coordinate Compression) below. - "Find the
-th smallest element"—direct indexing doesn't help with rank queries. You need sorting, a selection algorithm, or an order-statistics tree. - "Insert/delete elements frequently in the middle"—arrays pay
per mid-insertion due to shifting. If this dominates your runtime, consider a balanced BST or a policy-based tree.
Where arrays and strings appear on Codeforces:
- Tags:
implementation,strings,sortings,greedy,brute force - Difficulty: 800–1600 (arrays as the core structure), but array manipulation tricks appear up to 2400+
- Nearly every problem uses arrays—the question is how
Pattern Fingerprint:
values bounded by 10^6→ direct-index frequency array |find duplicates / first repeat→ boolean seen array |range sum queries→ prefix sums |longest subarray with property→ sliding window / two pointers |string matching→ hashing or KMP (nots.find())
Rating Progression—what array/string skills matter at each level:
| CF Rating | What You Need |
|---|---|
| ~1200 | Frequency arrays, basic string manipulation, sort + iteration, simple simulation |
| ~1500 | Prefix sums, two pointers, sliding window, coordinate compression, s += c vs s = s + c awareness |
| ~1800 | Kadane's variations, sweep-line on arrays, string hashing, careful complexity analysis of nested loops |
| ~2100+ | Suffix arrays, Z/KMP for non-trivial matching, segment trees over arrays, bitmask + array DP |
See: Complexity Analysis for how to evaluate whether your array approach fits in time
With the core idea clear, here is the C++ toolkit that implements it.
C++ STL Reference
Containers
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
vector<T> | <vector> | Dynamic array, contiguous memory | Access |
array<T,N> | <array> | Fixed-size array on stack, bounds-checked with .at() | Access |
string | <string> | Dynamic char array with string operations | Access substr |
string_view | <string_view> | Non-owning view into string, no copy | Construction |
Key vector operations
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
v.push_back(x) | <vector> | Append element (copy) | Amortized |
v.emplace_back(args...) | <vector> | Construct element in-place at end | Amortized |
v.pop_back() | <vector> | Remove last element | |
v.resize(n) | <vector> | Change size; new elements value-initialized | |
v.resize(n, val) | <vector> | Change size; new elements set to val | |
v.reserve(n) | <vector> | Pre-allocate capacity (no size change) | |
v.shrink_to_fit() | <vector> | Request to reduce capacity to size | |
v.insert(it, x) | <vector> | Insert before iterator | |
v.erase(it) | <vector> | Remove at iterator, shift left | |
v.erase(first, last) | <vector> | Remove range | |
v.clear() | <vector> | Remove all elements (size=0, capacity unchanged) | |
v.assign(n, val) | <vector> | Replace contents with n copies of val | |
v.front() / v.back() | <vector> | First / last element reference | |
v.data() | <vector> | Pointer to underlying array |
Key string operations
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
s.substr(pos, len) | <string> | Extract substring (returns new string) | |
s.find(t) | <string> | Find first occurrence of t | |
s.rfind(t) | <string> | Find last occurrence of t | |
s.find_first_of(chars) | <string> | Find first char in set | |
s += t | <string> | Append string | Amortized |
s.append(t) | <string> | Same as += | Amortized |
s.insert(pos, t) | <string> | Insert at position | |
s.erase(pos, len) | <string> | Erase substring | |
s.replace(pos, len, t) | <string> | Replace substring | |
s.c_str() | <string> | Null-terminated C-string pointer | |
stoi(s) / stoll(s) | <string> | String to int / long long | |
to_string(x) | <string> | Number to string |
Algorithms on Ranges
| Function / Class | Header | What it does | Time Complexity |
|---|---|---|---|
sort(b, e) | <algorithm> | Sort range [b, e) | |
reverse(b, e) | <algorithm> | Reverse range in-place | |
unique(b, e) | <algorithm> | Remove consecutive duplicates (returns new end) | |
lower_bound(b, e, x) | <algorithm> | Iterator to first element | |
upper_bound(b, e, x) | <algorithm> | Iterator to first element | |
accumulate(b, e, init) | <numeric> | Sum of range | |
max_element(b, e) | <algorithm> | Iterator to max | |
min_element(b, e) | <algorithm> | Iterator to min | |
fill(b, e, val) | <algorithm> | Fill range with value | |
iota(b, e, start) | <numeric> | Fill with start, start+1, ... | |
next_permutation(b, e) | <algorithm> | Mutate to next lexicographic permutation |
Hidden Complexity Table
Many operations look cheap but hide linear or worse costs. This table collects the surprises:
| Operation | vector | string | deque | Notes |
|---|---|---|---|---|
push_back | * = amortized; single call may trigger | |||
insert(mid) | Must shift elements; not "just one insert" | |||
substr | N/A | N/A | Allocates a new string of length | |
find | N/A | string::find is NOT guaranteed to use KMP/BM—GCC uses naive search | ||
+ (concat) | N/A | N/A | Creates a new string; prefer += to avoid temporaries | |
== (compare) | Must check every element; no short hash | |||
erase(begin) | Deque wins for front removal | |||
operator[] | All three—but deque has higher constant factor |
Warning:
string::findon GCC (libstdc++) uses a naivesearch, not Knuth-Morris-Pratt. If you need fast substring search in a contest, use String Hashing or KMP/Z-Function instead.
Implementation (Contest-Ready)
Minimal Contest Template
cpp
#include <bits/stdc++.h>
using namespace std;
int main(){
ios::sync_with_stdio(false);
cin.tie(nullptr);
int t;
cin >> t;
while(t--){
int n;
cin >> n;
vector<int> a(n);
for(auto &x : a) cin >> x;
// --- solve ---
sort(a.begin(), a.end());
// Invariant: after unique(), elements in [begin, new_end) are distinct and sorted
a.erase(unique(a.begin(), a.end()), a.end());
cout << a.size() << "\n";
// Invariant: at iteration i, elements a[0..i-1] have been printed
for(int i = 0; i < (int)a.size(); i++){
cout << a[i] << " \n"[i + 1 == (int)a.size()];
}
}
}Annotated Version
cpp
#include <bits/stdc++.h>
using namespace std;
int main(){
// Fast I/O: decouples C and C++ streams, disables sync with scanf/printf.
// Never mix cin/cout with scanf/printf after this.
ios::sync_with_stdio(false);
cin.tie(nullptr); // don't flush cout before each cin
int t;
cin >> t;
while(t--){
int n;
cin >> n;
// Read array. vector<int> a(n) creates n elements initialized to 0.
// Prefer this over `int a[n]` (VLA is non-standard in C++).
vector<int> a(n);
for(auto &x : a) cin >> x; // range-for with reference to read in-place
// Sort + unique: classic idiom to remove duplicates.
// unique() moves consecutive dupes to the end, returns iterator to new logical end.
// erase() actually shrinks the vector.
// Invariant: after this block, a[] contains only distinct values in sorted order
sort(a.begin(), a.end());
a.erase(unique(a.begin(), a.end()), a.end());
cout << a.size() << "\n";
// Output with spaces between elements, newline at end.
// " \n"[condition] is a trick: if condition is true (last element),
// prints '\n'; otherwise prints ' '.
// Invariant: at step i, a[0..i-1] have been printed with correct delimiters
for(int i = 0; i < (int)a.size(); i++){
cout << a[i] << " \n"[i + 1 == (int)a.size()];
}
}
}String I/O Patterns
cpp
#include <bits/stdc++.h>
using namespace std;
int main(){
ios::sync_with_stdio(false);
cin.tie(nullptr);
// Read a single string (no spaces)
string s;
cin >> s;
// Read entire line (including spaces)
string line;
getline(cin, line); // WARNING: if cin >> was used before, consume leftover '\n' first
// Fix: cin.ignore() or cin >> ws before getline
// Read n strings
int n;
cin >> n;
vector<string> words(n);
for(auto &w : words) cin >> w;
// Build string efficiently
string result;
result.reserve(n * 10); // pre-allocate if you know approximate size
for(auto &w : words){
result += w;
result += ' ';
}
// Character frequency (ASCII trick)
int freq[128] = {};
for(char c : s) freq[(int)c]++;
// Convert case
for(char &c : s) c = tolower(c);
cout << s << "\n";
}Operations Reference
Use this section as a quick lookup during contests. Each table shows the time and space cost of every common operation.
vector<T>
| Operation | Time | Space |
|---|---|---|
Access v[i] | ||
push_back / emplace_back | Amortized | |
pop_back | ||
insert at position i | ||
erase at position i | ||
resize(n) | ||
reserve(n) | ||
clear | ||
sort | ||
| Iterate all | ||
| Copy |
string
| Operation | Time | Space |
|---|---|---|
Access s[i] | ||
s += t (append) | Amortized | |
substr(pos, len) | ||
find(t) | ||
compare / == | ||
insert(pos, t) | ||
erase(pos, len) | ||
Lexicographic < |
C-Array T a[N]
| Operation | Time | Space |
|---|---|---|
Access a[i] | ||
Fill with memset/fill | ||
Copy with memcpy/copy |
String Iteration Efficiency
Small choices in string loops have large performance consequences:
1. s.size() returns size_t (unsigned)—match your loop variable or cast.
cpp
// BAD: comparison between signed and unsigned
for(int i = 0; i < s.size(); i++) // compiler warning; works, but sloppy
// GOOD: cast to int (safe for n <= 2*10^9)
for(int i = 0; i < (int)s.size(); i++)
// ALSO GOOD: range-for when you don't need the index
for(char c : s)2. s[i] is s.substr(i) is
Every substr allocates a new string and copies characters. In a loop, this turns
cpp
// BAD: O(n^2) -- substr copies (n-i) chars each iteration
for(int i = 0; i < (int)s.size(); i++){
string tail = s.substr(i); // O(n-i) allocation + copy
if(tail < best) best = tail;
}
// GOOD: compare in-place using iterators or string_view
for(int i = 0; i < (int)s.size(); i++){
// Invariant: best holds the lex-smallest suffix seen so far among s[0..i-1]
if(s.compare(i, string::npos, best) < 0)
best = s.substr(i); // only copy when we find a new best
}3. s += c vs s = s + c—amortized
cpp
string s;
// BAD: s = s + c creates a temporary, copies all of s, then appends c --> O(n) per call
// Invariant: after i iterations, s contains i characters (but O(i^2) total work)
for(int i = 0; i < n; i++)
s = s + 'a'; // O(n^2) total!
// GOOD: s += c appends in-place with amortized O(1) reallocation
// Invariant: after i iterations, s contains i characters (O(i) total work)
for(int i = 0; i < n; i++)
s += 'a'; // O(n) totalThis is the single most common performance bug in string-heavy contest problems. If your string solution TLEs, check for
s = s + ...first.
vector<bool> Specialization—A Dangerous Trap
vector<bool> is not a real vector. The C++ standard requires it to be a packed bitfield (1 bit per element), which breaks several assumptions:
cpp
// This works for vector<int>, but FAILS for vector<bool>:
vector<bool> v(8, true);
// FAILS: cannot take address of a bit -- v[0] returns a proxy object, not a bool&
bool* p = &v[0]; // compile error!
// FAILS: range-for with reference doesn't bind to the proxy
for(bool& b : v) b = false; // compile error in some compilers!
// FAILS: data() is not available
bool* raw = v.data(); // compile error!Why this matters in CP: You might write a generic function taking vector<T>& and it silently breaks when T = bool. Or you pass &v[0] to a C-style API and get a compile error mid-contest.
The fix—use vector<char> or bitset:
cpp
// Use vector<char> when you need a resizable boolean array
vector<char> seen(n, 0); // each element is 0 or 1, 1 byte each
seen[i] = 1; // works exactly like you'd expect
char* p = &seen[0]; // fine -- real references, real pointers
// Use bitset<N> when size is fixed at compile time and you want bit-packing
bitset<100001> seen; // 1 bit per element, but with proper interface
seen[i] = 1;
seen.count(); // popcount in O(N/64)See: Bitset Optimization for advanced bitset techniques in CP
The primitives above are building blocks. These are the contest patterns they assemble into.
Problem Patterns and Tricks
Frequency Counting
Count occurrences of elements. Use a plain array for small alphabets, unordered_map for large ranges.
cpp
// Character frequency in O(n)
// Invariant: after processing s[0..i-1], freq[c-'a'] == count of c in s[0..i-1]
int freq[26] = {};
for(char c : s) freq[c - 'a']++;Sort + Unique (Coordinate Compression)
Remove duplicates or compress values to indices.
cpp
vector<int> vals = a; // copy
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
// Now vals[i] is the i-th smallest unique value.
// To compress: index of x = lower_bound(vals.begin(), vals.end(), x) - vals.begin()Problems: CF 1005C - Summarize to the Power of Two, CF 1560C
Two Pointers on Sorted Array
Maintain two indices moving inward (or same direction) to find pairs or subarrays.
cpp
// Find pair with sum = target in sorted array
// Invariant: no valid pair exists entirely within a[0..lo-1] or a[hi+1..n-1]
int lo = 0, hi = (int)a.size() - 1;
while(lo < hi){
int s = a[lo] + a[hi];
if(s == target){ /* found */ break; }
else if(s < target) lo++;
else hi--;
}text
Two pointer movement on sorted array -- find pair summing to 9
+---+---+---+---+---+----+
| 1 | 2 | 4 | 5 | 7 | 10 | target = 9
+---+---+---+---+---+----+
^ ^
lo hi 1+10 = 11 > 9 --> hi moves left
+---+---+---+---+---+----+
| 1 | 2 | 4 | 5 | 7 | 10 |
+---+---+---+---+---+----+
^ ^
lo hi 1+7 = 8 < 9 --> lo moves right
+---+---+---+---+---+----+
| 1 | 2 | 4 | 5 | 7 | 10 |
+---+---+---+---+---+----+
^ ^
lo hi 2+7 = 9 FOUNDSee: Two Pointers
Prefix Sums for Range Queries
Pre-compute prefix sums to answer range-sum queries in
cpp
// Invariant: pre[i] == sum of a[0..i-1]; pre[0] == 0
vector<long long> pre(n + 1, 0);
for(int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
// sum of a[l..r] = pre[r+1] - pre[l]text
Building prefix sums from array a
index 0 1 2 3 4
+---+---+---+---+---+
a | 3 | 1 | 4 | 1 | 5 |
+---+---+---+---+---+
| | | | |
v v v v v accumulate left to right
index 0 1 2 3 4 5
+---+---+---+---+---+----+
pre | 0 | 3 | 4 | 8 | 9 | 14 |
+---+---+---+---+---+----+
pre[i+1] = pre[i] + a[i]
Range query: sum of a[1..3]
+---+---+---+---+---+----+
pre | 0 | 3 | 4 | 8 | 9 | 14 |
+---+---+---+---+---+----+
^ ^
pre[1]=3 pre[4]=9
+---subtract----+ 9 - 3 = 6See: Prefix Sums
String Palindrome Check
Use two pointers from both ends or reverse-compare.
cpp
bool isPalin(const string &s){
int n = s.size();
for(int i = 0; i < n / 2; i++)
if(s[i] != s[n - 1 - i]) return false;
return true;
}
// Or: string t = s; reverse(t.begin(), t.end()); return s == t;Sliding Window on String
Fixed or variable-width window scanning for substrings matching a property.
cpp
// Count substrings of length k with all distinct chars
// Invariant: freq[] tracks char counts in window s[max(0,i-k+1)..i], distinct == number of nonzero entries
int cnt = 0;
int freq[26] = {};
int distinct = 0;
for(int i = 0; i < (int)s.size(); i++){
if(freq[s[i] - 'a']++ == 0) distinct++;
if(i >= k){
if(--freq[s[i - k] - 'a'] == 0) distinct--;
}
if(i >= k - 1 && distinct == k) cnt++;
}See: Sliding Window
Erase-Remove Idiom
Efficiently remove elements matching a predicate without manual index juggling.
cpp
// Remove all zeros from vector
v.erase(remove(v.begin(), v.end(), 0), v.end());
// Remove all even numbers
v.erase(remove_if(v.begin(), v.end(), [](int x){ return x % 2 == 0; }), v.end());Max Subarray (Kadane's Algorithm)
Find contiguous subarray with maximum sum in
cpp
// Invariant: cur == max subarray sum ending at current element; best == global max so far
long long best = a[0], cur = 0;
for(int x : a){
cur = max((long long)x, cur + x);
best = max(best, cur);
}text
Tracking cur and best through the array
a | -2 | 3 | -1 | 5 | -4 | 2 |
+----+----+----+----+----+----+
cur -2 3 2 7 3 5
best -2 3 3 7 7 7
^
best subarray is a[1..3]
3 + (-1) + 5 = 7
cur = max(x alone, cur + x) -- extend or restart
best = max(best, cur) -- global maximum so farContest Cheat Sheet
text
+---------------------------------------------------------------+
| ARRAYS & STRINGS CHEAT SHEET |
+---------------------------------------------------------------+
| WHEN TO USE: |
| o Random access needed o Contiguous data |
| o Sorting / binary search o Frequency counting |
| o String matching / manipulation |
+---------------------------------------------------------------+
| KEY STL: |
| vector<int> a(n); // dynamic array |
| sort(a.begin(), a.end()); // O(n log n) |
| a.erase(unique(a.begin(), a.end()), a.end()); // dedup |
| lower_bound(a.begin(), a.end(), x); // binary search |
| string s; cin >> s; // read word |
| s.substr(pos, len); // O(len) -- returns new string |
| s.find("pat"); // O(n*m) -- use hashing if TLE |
+---------------------------------------------------------------+
| COMPLEXITY: |
| Access: O(1) push_back: amortized O(1) |
| Insert/mid: O(n) Sort: O(n log n) |
| find (str): O(n*m) substr: O(k) |
+---------------------------------------------------------------+
| SPACE: vector O(n), string O(n), C-array O(N) stack |
+---------------------------------------------------------------+
| FAST I/O: |
| ios::sync_with_stdio(false); cin.tie(nullptr); |
+---------------------------------------------------------------+If I Had 5 Minutes
vector<int> a(n); for(auto &x : a) cin >> x;—read an arraysort(a.begin(), a.end()); a.erase(unique(a.begin(), a.end()), a.end());—deduplicateint freq[26]={}; for(char c:s) freq[c-'a']++;—character frequencypre[i+1] = pre[i] + a[i];withsum(l,r) = pre[r+1] - pre[l]—prefix sums- Use
s += cnots = s + c; use(int)v.size()not barev.size()—avoid the two most common bugs
Before You Code Checklist
Before writing your solution to an array/string problem, verify:
- [ ] Bounds: What is the maximum
? Does fit, or do you need or ? - [ ] Type: Will values exceed
int? Uselong longfor sums and products. Check intermediate expressions too. - [ ] Edge cases: Empty array? Single element? All elements identical? String of length 1?
- [ ] Indexing convention: 0-indexed or 1-indexed? Stick with one. If the problem uses 1-indexed, decide whether to convert on input.
- [ ] Output format: Spaces between elements? Trailing space OK? Newline at end? Read the output specification twice.
Gotchas and Debugging
These are the bugs that cost rounds. Most array and string errors fall into a handful of recurring patterns—learn to recognize them before they bite.
Off-by-One Errors
v.size()returnssize_t(unsigned).v.size() - 1whenvis empty gives a huge number, not. Cast to intor checkv.empty()first.- 0-indexed vs 1-indexed: pick one convention per problem and stick with it. Most CF editorial code uses 0-indexed arrays but 1-indexed in the problem statement.
String Concatenation in a Loop
cpp
// BAD: O(n^2) total -- each += copies the entire string
string result;
for(int i = 0; i < n; i++) result += to_string(a[i]) + " ";
// BETTER: pre-reserve or use ostringstream
string result;
result.reserve(n * 12);
for(int i = 0; i < n; i++){
result += to_string(a[i]);
result += ' ';
}In practice, for competitive programming, just printing in a loop is fine. Only worry about this if you're building a large string to use later.
Iterator Invalidation
push_backmay reallocate, invalidating all iterators to the vector.eraseinvalidates iterators at and after the erased position.- Never do
for(auto it = v.begin(); it != v.end(); ++it) if(*it == x) v.erase(it);—use the erase-remove idiom instead.
The crash scenario—push_back during iteration:
cpp
// BUG: Undefined Behavior! push_back may reallocate, invalidating `it`
vector<int> v = {1, 2, 3, 4, 5};
for(auto it = v.begin(); it != v.end(); ++it){
if(*it % 2 == 0)
v.push_back(*it * 10); // UB: reallocation invalidates `it`
}
// May crash, may silently corrupt data, may appear to "work" -- all are UBtext
What happens when push_back triggers reallocation:
Before push_back:
v: +---+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | capacity=5
+---+---+---+---+---+
^
it (valid pointer into this block)
After push_back(20) triggers realloc:
OLD block (freed!):
+---+---+---+---+---+
| ? | ? | ? | ? | ? | <-- deallocated memory!
+---+---+---+---+---+
^
it still points here --> DANGLING POINTER
NEW block:
+---+---+---+---+---+----+---+---+---+---+
| 1 | 2 | 3 | 4 | 5 | 20 | | | | | capacity=10
+---+---+---+---+---+----+---+---+---+---+Fix 1—use index-based loop (preferred in CP):
cpp
vector<int> v = {1, 2, 3, 4, 5};
int orig_size = (int)v.size();
// Invariant: i scans only the original elements; new elements appended beyond orig_size
for(int i = 0; i < orig_size; i++){
if(v[i] % 2 == 0)
v.push_back(v[i] * 10); // safe: index is not invalidated by realloc
}
// v = {1, 2, 3, 4, 5, 20, 40}Fix 2—reserve enough capacity upfront:
cpp
vector<int> v = {1, 2, 3, 4, 5};
v.reserve(v.size() * 2); // guarantee no reallocation
// Now iterators remain valid as long as we don't exceed reserved capacity
for(auto it = v.begin(); it != v.end(); ++it){
// still risky if end() changes -- prefer index-based loops
}Rule of thumb for CP: Avoid iterator-based loops when modifying the container. Use
for(int i = 0; i < ...)—indices survive reallocation, iterators don't. See STL Containers for more on iterator validity rules across container types.
char Signedness
charis signed on most x86 compilers.s[i] - 'a'can be negative ifs[i]is not a lowercase letter. Always validate input range or cast tounsigned char.
Multi-Test Case Pitfalls
- Global arrays / vectors must be cleared or resized each test case. Prefer declaring inside the
while(t--)loop. - If using global
int a[MAXN], usefill(a, a+n, 0)—don'tmemsetto arbitrary values (only works for 0 and -1).
getline after cin >>
cpp
int n;
cin >> n;
// cin >> leaves '\n' in the buffer
string s;
getline(cin, s); // reads empty string!
// Fix:
cin.ignore(); // or: cin >> ws;
getline(cin, s); // now reads the actual lineInteger Overflow
v.size()issize_t(usuallyunsigned long long). Arithmetic likev.size() - 2can underflow.- Array indices multiplied together:
a[i * n + j]—ifiandnareint, the product can overflow for.
Comparing Strings
==on strings is, not . In tight loops, consider hashing. - Lexicographic comparison (
<) short-circuits on first difference—usually fast in practice.
Mental Traps
Confusing index with value. When finding the maximum element, it is easy to store the result ambiguously. "The index of the max element" and "the max element" are two different things. Use distinct variable names (max_idx vs max_val) to prevent subtle bugs where you return a position when you meant a value, or vice versa.
text
WRONG -- ambiguous variable
+---+---+---+---+---+
| 8 | 3 | 9 | 1 | 5 |
+---+---+---+---+---+
0 1 2 3 4
max = 2 index or value?
RIGHT -- distinct names
+---+---+---+---+---+
| 8 | 3 | 9 | 1 | 5 |
+---+---+---+---+---+
0 1 2 3 4
^
max_idx = 2 <-- position
max_val = 9 <-- value at position"I already know arrays." This overconfidence is exactly what produces a string of off-by-one bugs in the first 10 minutes of a contest. The fact that you know arrays does not mean you reason about them carefully under pressure. Slow down at every boundary condition—explicitly verify your loop bounds, your slice endpoints, and your termination conditions before submitting.
text
WRONG -- casual loop bound
+---+---+---+---+---+---+
| 3 | 1 | 4 | 1 | 5 | 9 | n = 6
+---+---+---+---+---+---+--
0 1 2 3 4 5 6
^
i reaches here -- out of bounds
RIGHT -- half-open range [0, n)
+---+---+---+---+---+---+
| 3 | 1 | 4 | 1 | 5 | 9 |
+---+---+---+---+---+---+
0 1 2 3 4 5
^ ^
|<--- valid range ----->|
i from 0 stops before nMutating a string through a reference when you wanted a copy. In C++, strings are mutable, so passing by reference and modifying inside a function silently changes the caller's data. If you intended to work on a copy, pass by value. This mistake is invisible during testing when you only call the function once, and only surfaces with multiple calls or when the caller reuses the string.
text
WRONG -- pass by ref mutates caller
caller +---+---+---+---+
| a | b | c | d |
+---+---+---+---+
|
v (ref -- same object)
func +---+---+---+---+
| a | # | c | d | caller sees # too!
+---+---+---+---+
RIGHT -- pass by value keeps original
caller +---+---+---+---+
| a | b | c | d | <-- unchanged
+---+---+---+---+
|
v (copy -- new object)
func +---+---+---+---+
| a | # | c | d | only the copy changed
+---+---+---+---+The Mistake That Teaches
It was a Div. 2 round, problem B, rated 1200. The task: given a string, build the lexicographically smallest string by appending characters from either end. My solution: loop, compare s.front() vs s.back(), append the smaller one, pop it.
It passed pretests. I moved on to C. Thirty minutes later: "Wrong Answer on test 47." I stared at the code. The logic seemed airtight. Then I spotted it: when front and back were equal, I always took the front—but the correct choice depends on which interior characters come next. I needed to compare the remaining string with its reverse, not just the endpoints.
But the real lesson wasn't the logic bug. It was that I'd written s = s.substr(1) to "pop the front." On a string of length 200,000, that's lo and hi into the original string, never copying.
Two bugs in one solution. The logic bug was the one I noticed; the performance bug was the one that cost me the round. Now I always ask: "Am I copying a string inside a loop?"
See: Debugging and Stress Testing for systematic approaches to finding these bugs
When NOT to Use This
Arrays and strings are the default, but sometimes they're the wrong tool:
| Situation | Why Arrays Fail | Use Instead |
|---|---|---|
| Frequent insertions/deletions in the middle | set, multiset, or a balanced BST | |
| Need | Arrays only support | Policy-based tree (order_of_key) or segment tree |
| FIFO queue behavior (push back, pop front) | vector pop_front is | deque or queue |
| Key-value lookups by non-integer keys | Can't index by string or large integer | unordered_map or map |
| Persistent/versioned data | Modifying an array destroys previous state | Persistent segment tree or copy-on-write |
| Need set intersection/union | Manual merge is error-prone | set_intersection on sorted vectors, or use bitset |
In CP, use
vectorby default. Only switch when the problem demands an operation that arrays can't do efficiently. See STL Containers for the full container reference, or the Data Structure Selection Guide for a quick decision flowchart.
What Would You Do If...?
Scenario 1: You're solving a problem that says "find all pairs (i,j) with a[i]+a[j]=K." The naive O(n^2) approach TLEs at n=200,000.
Sort the array, then use two pointers from both ends. Each pointer moves at most n times total, so it's O(n log n) for the sort + O(n) for the scan. Alternatively, use a hash set: for each element x, check if K-x has been seen.
Scenario 2: Your string solution passes on small tests but TLEs on n=500,000. You're building a result string character by character.
Check: Are you using s = s + c anywhere? Switch to s += c. Are you calling s.substr() inside a loop? Replace with index tracking. Are you calling s.find() in a loop? That's O(n*m) per call—switch to string hashing or KMP.
Scenario 3: You need to process queries: "how many elements in the array fall in range [L, R]?" with 200,000 queries on an array of 200,000 elements.
A raw array can't answer range queries faster than O(n) per query. Sort the array and use lower_bound/upper_bound for O(log n) per query. If the array is modified between queries, you need a segment tree or BIT.
Self-Test
Before moving on, verify you can do each of these without looking back at the text:
- [ ] Write a loop that iterates over every adjacent pair
(a[i], a[i+1])in an array ofnelements, without accessing out-of-bounds memory. - [ ] Extract the substring
s[l..r](inclusive) usingstd::string::substr—get the arguments right on the first try. - [ ] Count the frequency of each lowercase letter in a string in O(n) time using an array, not a map.
- [ ] Explain why
s.size() - 1is dangerous whensis empty and how to guard against it. - [ ] Given a 2D array
int a[N][M], state which loop order is cache-friendly and which is not. - [ ] Given
int max_idxandint max_val, explain which stores a position and which stores a value—then write correct code to find both for an array. - [ ] Describe what happens when you call
v.push_back(x)while iterating overvwith an iterator—and name the safe alternative.
Debug-This Exercises
Find the bug in each snippet. Answers are below the fold.
Bug 1: The disappearing elements
cpp
vector<int> v = {1, 2, 3, 4, 5, 6};
for(int i = 0; i < (int)v.size(); i++){
if(v[i] % 2 == 0)
v.erase(v.begin() + i);
}
// Expected: {1, 3, 5} -- what do you actually get?Bug 2: The infinite builder
cpp
string build(int n){
string s;
for(int i = 0; i < n; i++)
s = s + (char)('a' + i % 26);
return s;
}
// Compiles and gives correct output. What's the bug?Bug 3: The subtle overflow
cpp
vector<int> v = {10, 20, 30};
for(int i = 0; i < v.size() - 1; i++){
cout << v[i] << " ";
}
// Works fine. Now what happens if v is empty?Bug 4: The wrong reverse
cpp
string s = "abcdef";
for(int i = 0; i <= (int)s.size(); i++){
swap(s[i], s[s.size() - i]);
}
// What's wrong? (Two bugs!)Answers (click to reveal)
Bug 1: After erasing element at index i, the next element shifts into position i, but i++ skips it. Result: {1, 3, 5, 6}—the 6 survives because 4 was erased at index 3, then i advances to 4 which is now past the end. Fix: don't increment i after an erase, or use the erase-remove idiom.
Bug 2: s = s + (char)(...) creates a temporary string each iteration, copying all of s. Total work: s += (char)('a' + i % 26) for amortized
Bug 3: v.size() is size_t (unsigned). When v is empty, v.size() - 1 wraps to ~0ULL (about for(int i = 0; i < (int)v.size() - 1; i++) or check v.empty() first.
Bug 4: Two bugs: (a) s.size() - 0 = 6 which is out of bounds (valid indices are 0–5), should be s.size() - 1 - i; (b) the loop condition i <= s.size() should be i < s.size() / 2—otherwise you swap elements back to their original positions in the second half. Fix: for(int i = 0; i < (int)s.size()/2; i++) swap(s[i], s[s.size()-1-i]);
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Watermelon | CF 4A | 800 | Basic if-statement, array boundary |
| 2 | Word | CF 59A | 800 | Character frequency, case conversion |
| 3 | Distinct Numbers | CSES | ~900 | Sort + unique |
| 4 | Palindrome Reorder | CSES | ~1000 | Frequency counting, string building |
| 5 | Yet Another Palindrome Problem | CF 1324C | 1100 | Frequency + position tracking |
| 6 | Make It Good | CF 1385C | 1200 | Suffix analysis, array structure |
| 7 | Good Subarrays | CF 1398C | 1600 | Prefix sums on transformed array |
| 8 | Maximum Subarray Sum | CSES | ~1600 | Kadane's algorithm |
| 9 | Xenia and Colorful Gems | CF 1336C | 1700 | Sort + binary search on arrays |
| 10 | Longest Palindromic Prefix | CF 1326D2 | 2200 | String hashing + palindrome |
For a structured progression through array and string problems, see the Practice Ladder (1100–1400).
Further Reading
- cp-algorithms: String Processing—hashing, KMP, Z-function
- CF Blog: C++ Tricks for Competitive Programming—I/O speedups, useful STL
- CF Blog: A comprehensive guide on vector—deep dive into vector internals
- USACO Guide: Introduction to Data Structures—beginner-friendly walkthrough
Related guidebook entries:
- Two Pointers
- Sliding Window
- Prefix Sums
- Sorting and Searching
- String Hashing
- KMP and Z-Function
- STL Containers—full container reference including
deque,set,map - Bitset Optimization—when
vector<bool>isn't enough - Bit Manipulation—bitwise tricks on array elements
- Complexity Analysis—evaluating whether your approach fits
- Debugging and Stress Testing—systematic bug hunting
Next Up: STL Containers—
set,map,deque,priority_queue, and when each one beats a plainvector.