Arrays and Strings

Quick Revisit
USE WHEN: You need O(1) lookup/update by index or need to track seen elements
INVARIANT: a[i] is accessed in O(1) via base + i*sizeof(element)
TIME: O(1) access/update | SPACE: O(n)
CLASSIC BUG: Off-by-one on 0-indexed vs 1-indexed arrays
PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md

The most fundamental data structures in competitive programming—contiguous memory blocks that underpin nearly every problem you'll solve on Codeforces.

Difficulty: Beginner | Prerequisites: None

Arrays date to the earliest stored-program computers of the late 1940s, where contiguous memory words were the only data structure available. The C language (1972) made pointer-arithmetic access a first-class citizen, and C++ std::vector (1994, HP STL) added dynamic resizing while preserving the O(1) access guarantee. Strings as first-class objects—rather than null-terminated char*—arrived with std::string in C++98.

Think of an array as a row of numbered lockers: knowing the number gets you to the locker in one step, no matter how long the hallway. The real aha moment comes when you realize that an array's index is a hash function for small integers—and that almost every "have I seen X before?" question collapses to a single array lookup. Once that clicks, competitive programming starts to click too. A useful mnemonic: "Value IS address—skip the search."

Intuition
C++ STL Reference
Implementation (Contest-Ready)
Operations Reference
Problem Patterns and Tricks
Contest Cheat Sheet
If I Had 5 Minutes
Before You Code Checklist
Gotchas and Debugging
When NOT to Use This
What Would You Do If...?
The Mistake That Teaches
Self-Test
Debug-This Exercises
Practice Problems
Further Reading

Intuition

You have an array of six integers and need to find the first value that appears twice:

text

a = [3, 1, 4, 1, 5, 3]

Your first instinct: for each element a[i], scan everything before it looking for a match.

text

i=0  value=3   check against: (nothing)        --> 0 comparisons
i=1  value=1   check against: 3                --> 1 comparison, no match
i=2  value=4   check against: 3, 1             --> 2 comparisons, no match
i=3  value=1   check against: 3, 1             --> 2 comparisons, MATCH!

Found it—1 is the first duplicate—after 5 comparisons. Not terrible for 6 elements. But you got lucky: the duplicate was near the middle. Push it to the end:

text

a = [1, 2, 3, 4, 5, 1]          (duplicate at position 5)

i=0  (0 comparisons)
i=1  (1 comparison)
i=2  (2 comparisons)
i=3  (3 comparisons)
i=4  (4 comparisons)
i=5  (5 comparisons)  --> finally, MATCH

Total: 0 + 1 + 2 + 3 + 4 + 5 = 15 comparisons = n*(n-1)/2

For $n = 10^{5}$ , that is roughly $5 \times 10^{9}$ comparisons—about 5 seconds on a machine doing $10^{9}$ operations per second. For $n = 10^{6}$ , the count reaches $5 \times 10^{11}$ . Contest time limits are 1–2 seconds.

We're re-scanning previous elements for every single position. There has to be a better way.

An array's $O (1)$ indexed access lets you use a value directly as an address, turning "have I seen this before?" from a linear scan into a single lookup.

Think of numbered mailboxes at a post office. To check if mail arrived for box #4, you don't search through every box starting from #1. You walk directly to box 4 and open it. The box number is the location—no searching required.

An array works the same way: seen[4] compiles to a single address calculation—base + 4 * sizeof(bool)—regardless of how many elements exist. No comparisons, no traversal.

Where the analogy breaks: a post office has a fixed number of boxes. If someone asks for box #1,000,000,000, you can't build a billion mailboxes. Similarly, if your values range up to $10^{9}$ , you can't allocate a billion-entry seen array. You'd need hashing (unordered_map) or coordinate compression to shrink the value space first.

Why contiguous memory matters beyond indexing: modern CPUs load memory in cache lines (~64 bytes). When you iterate a vector, adjacent elements are already in L1 cache—the hardware predicts your sequential access pattern. A list or map scatters nodes across the heap, turning each access into a potential cache miss.

For $n \leq 10^{6}$ , this can mean a 5–10× wall-clock difference. Prefer vector over list/deque unless you specifically need $O (1)$ front insertion.

Visual Walkthrough

Same array: a = [3, 1, 4, 1, 5, 3]. Allocate seen[6], all initialized to false (shown as .; * means true).

text

Step 1: Process a[0] = 3

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
            ^
            i=0

  seen:   | . | . | . | . | . | . |
   index:   0   1   2   3   4   5

  seen[3] is false  -->  not a duplicate
  Set seen[3] = true

  seen:   | . | . | . | * | . | . |
   index:   0   1   2   3   4   5

  ops so far: 1 lookup + 1 write = 2

text

Step 2: Process a[1] = 1

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                ^
                i=1

  seen:   | . | . | . | * | . | . |
   index:   0   1   2   3   4   5

  seen[1] is false  -->  not a duplicate
  Set seen[1] = true

  seen:   | . | * | . | * | . | . |
   index:   0   1   2   3   4   5

  ops so far: 4

text

Step 3: Process a[2] = 4

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                    ^
                    i=2

  seen:   | . | * | . | * | . | . |
   index:   0   1   2   3   4   5

  seen[4] is false  -->  not a duplicate
  Set seen[4] = true

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  ops so far: 6

text

Step 4: Process a[3] = 1

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                        ^
                        i=3

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  seen[1] is TRUE  -->  DUPLICATE FOUND!  Return value 1.

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  ops so far: 7  (just one more lookup)

Brute force: 5 comparisons in the lucky case, up to 15 in the worst case— $O (n^{2})$ .

Array lookup: 7 constant-time operations, always— $O (n)$ .

The gap widens fast. At $n = 10^{5}$ : brute force does up to $5 \times 10^{9}$ comparisons; the array approach does at most $2 n = 2 \times 10^{5}$ operations. That is a 25,000× difference.

The Invariant

text

+----------------------------------------------------------+
| INVARIANT: After processing a[0..i-1], seen[v] == true   |
| if and only if value v appeared in a[0..i-1]. Each       |
| lookup and update costs O(1) because array indexing is    |
| direct address arithmetic: base + v * sizeof(bool).      |
+----------------------------------------------------------+

The invariant holds because every time we process a[i]:

We read seen[a[i]]—by the invariant, this is true exactly when a[i] already appeared in a[0..i-1].
If false, we set seen[a[i]] = true, extending the invariant to cover a[0..i].
If true, we have found a duplicate and stop.

Look at Step 4 above—seen[1] is true because Step 2 set it when we first encountered value 1 at index 1. The invariant guarantees that a true entry means "this value appeared earlier," not "some garbage was in memory." Without $O (1)$ indexed access, maintaining this property would require scanning a list each time, destroying the $O (n)$ guarantee.

This same principle—use the value as the index—is the foundation of frequency arrays (freq[c - 'a']++), counting sort, bucket sort, and direct-address tables. The contiguous memory layout of arrays is what makes the address calculation constant-time and the whole approach viable.

What the Code Won't Teach You

The real skill is boundary discipline, not memorization. When you write a loop or a slice, explicitly state your invariant: "i is the index of the current element" or "everything in 0..i-1 has been processed." The moment that invariant gets fuzzy, you are one off-by-one away from a bug. This is not about memorizing that arrays are zero-indexed—it is about maintaining a precise mental model at every step.

text

  Boundary invariant at each loop step

  +---+---+---+---+---+---+
  | # | # | # |   |   |   |
  +---+---+---+---+---+---+
    0   1   2   3   4   5
    <-processed->|<-pending->
                 ^
                 i

  At step i: a[0..i-1] done, a[i..n-1] remaining

Strings in competitive programming are almost always about patterns, not raw characters. The interesting questions are: does string A appear in string B? What is the longest prefix shared by two strings? How many distinct substrings exist? The mechanics—indexing, slicing, concatenation—are just the substrate. The real skill is recognizing the structural question hiding underneath the string manipulation.

text

  Surface vs structure

  s = a b c a b c d
      +--+--+--+--+--+--+--+
      |  |  |  |  |  |  |  |
      +--+--+--+--+--+--+--+

  Pattern view
      |<--abc-->|<--abc-->| d
      +--+--+--+--+--+--+--+
      | #| #| #| #| #| #|  |
      +--+--+--+--+--+--+--+
      period 3

Character arithmetic is a basic building block, not a trick. c - 'a' maps lowercase letters to indices 0 through 25, letting you use characters directly as array positions. This is the foundation of every frequency-counting solution and appears in nearly every string problem at every difficulty level.

text

  Character to index mapping

  char  a   b   c   d   ...   z
        |   |   |   |         |
        v   v   v   v         v
  idx   0   1   2   3        25

        +---+---+---+---+-----+---+
  freq  |   |   |   |   | ... |   |
        +---+---+---+---+-----+---+
          0   1   2   3        25

With those deeper skills in mind, here's how to recognize when an array-based approach is the right tool.

When to Reach for This

Trigger phrases in problem statements:

"count the frequency of each element" → frequency array indexed by value
"find duplicates" or "any value that repeats" → boolean seen array
"values bounded by a small constant ( $\leq 10^{6}$ )" → direct indexing feasible
"rearrange / sort when values are in a small range" → counting sort via array buckets
"character frequency in a string" → freq[c - 'a'] with a size-26 array

Counter-examples (looks like direct-indexing, but isn't):

Values up to $10^{9}$ or negative values—can't allocate an array that big. Use unordered_map or coordinate-compress into a dense range first, then index. See Sort + Unique (Coordinate Compression) below.
"Find the $k$ -th smallest element"—direct indexing doesn't help with rank queries. You need sorting, a selection algorithm, or an order-statistics tree.
"Insert/delete elements frequently in the middle"—arrays pay $O (n)$ per mid-insertion due to shifting. If this dominates your runtime, consider a balanced BST or a policy-based tree.

Where arrays and strings appear on Codeforces:

Tags: implementation, strings, sortings, greedy, brute force
Difficulty: 800–1600 (arrays as the core structure), but array manipulation tricks appear up to 2400+
Nearly every problem uses arrays—the question is how

Pattern Fingerprint: values bounded by 10^6 → direct-index frequency array | find duplicates / first repeat → boolean seen array | range sum queries → prefix sums | longest subarray with property → sliding window / two pointers | string matching → hashing or KMP (not s.find())

Rating Progression—what array/string skills matter at each level:

CF Rating	What You Need
~1200	Frequency arrays, basic string manipulation, `sort` + iteration, simple simulation
~1500	Prefix sums, two pointers, sliding window, coordinate compression, `s += c` vs `s = s + c` awareness
~1800	Kadane's variations, sweep-line on arrays, string hashing, careful complexity analysis of nested loops
~2100+	Suffix arrays, Z/KMP for non-trivial matching, segment trees over arrays, bitmask + array DP

See: Complexity Analysis for how to evaluate whether your array approach fits in time

With the core idea clear, here is the C++ toolkit that implements it.

C++ STL Reference

Containers

Function / Class	Header	What it does	Time Complexity
`vector<T>`	`<vector>`	Dynamic array, contiguous memory	Access $O (1)$ , push_back amortized $O (1)$
`array<T,N>`	`<array>`	Fixed-size array on stack, bounds-checked with `.at()`	Access $O (1)$
`string`	`<string>`	Dynamic char array with string operations	Access $O (1)$ , `substr` $O (k)$
`string_view`	`<string_view>`	Non-owning view into string, no copy	Construction $O (1)$ , no allocation

Key vector operations

Function / Class	Header	What it does	Time Complexity
`v.push_back(x)`	`<vector>`	Append element (copy)	Amortized $O (1)$
`v.emplace_back(args...)`	`<vector>`	Construct element in-place at end	Amortized $O (1)$
`v.pop_back()`	`<vector>`	Remove last element	$O (1)$
`v.resize(n)`	`<vector>`	Change size; new elements value-initialized	$O (n)$
`v.resize(n, val)`	`<vector>`	Change size; new elements set to `val`	$O (n)$
`v.reserve(n)`	`<vector>`	Pre-allocate capacity (no size change)	$O (n)$
`v.shrink_to_fit()`	`<vector>`	Request to reduce capacity to size	$O (n)$
`v.insert(it, x)`	`<vector>`	Insert before iterator	$O (n)$
`v.erase(it)`	`<vector>`	Remove at iterator, shift left	$O (n)$
`v.erase(first, last)`	`<vector>`	Remove range	$O (n)$
`v.clear()`	`<vector>`	Remove all elements (size=0, capacity unchanged)	$O (n)$
`v.assign(n, val)`	`<vector>`	Replace contents with `n` copies of `val`	$O (n)$
`v.front()` / `v.back()`	`<vector>`	First / last element reference	$O (1)$
`v.data()`	`<vector>`	Pointer to underlying array	$O (1)$

Key string operations

Function / Class	Header	What it does	Time Complexity
`s.substr(pos, len)`	`<string>`	Extract substring (returns new string)	$O (l e n)$
`s.find(t)`	`<string>`	Find first occurrence of `t`	$O (n \cdot m)$ worst case
`s.rfind(t)`	`<string>`	Find last occurrence of `t`	$O (n \cdot m)$ worst case
`s.find_first_of(chars)`	`<string>`	Find first char in set	$O (n \cdot k)$
`s += t`	`<string>`	Append string	Amortized $O (l e n (t))$
`s.append(t)`	`<string>`	Same as `+=`	Amortized $O (l e n (t))$
`s.insert(pos, t)`	`<string>`	Insert at position	$O (n + l e n (t))$
`s.erase(pos, len)`	`<string>`	Erase substring	$O (n)$
`s.replace(pos, len, t)`	`<string>`	Replace substring	$O (n + l e n (t))$
`s.c_str()`	`<string>`	Null-terminated C-string pointer	$O (1)$
`stoi(s)` / `stoll(s)`	`<string>`	String to int / long long	$O (n)$
`to_string(x)`	`<string>`	Number to string	$O (\log x)$

Algorithms on Ranges

Function / Class	Header	What it does	Time Complexity
`sort(b, e)`	`<algorithm>`	Sort range `[b, e)`	$O (n \log n)$
`reverse(b, e)`	`<algorithm>`	Reverse range in-place	$O (n)$
`unique(b, e)`	`<algorithm>`	Remove consecutive duplicates (returns new end)	$O (n)$
`lower_bound(b, e, x)`	`<algorithm>`	Iterator to first element $\geq x$ (sorted range)	$O (\log n)$
`upper_bound(b, e, x)`	`<algorithm>`	Iterator to first element $> x$ (sorted range)	$O (\log n)$
`accumulate(b, e, init)`	`<numeric>`	Sum of range	$O (n)$
`max_element(b, e)`	`<algorithm>`	Iterator to max	$O (n)$
`min_element(b, e)`	`<algorithm>`	Iterator to min	$O (n)$
`fill(b, e, val)`	`<algorithm>`	Fill range with value	$O (n)$
`iota(b, e, start)`	`<numeric>`	Fill with `start, start+1, ...`	$O (n)$
`next_permutation(b, e)`	`<algorithm>`	Mutate to next lexicographic permutation	$O (n)$

Hidden Complexity Table

Many operations look cheap but hide linear or worse costs. This table collects the surprises:

Operation	`vector`	`string`	`deque`	Notes
`push_back`	$O (1)$ *	$O (1)$ *	$O (1)$ *	* = amortized; single call may trigger $O (n)$ reallocation
`insert(mid)`	$O (n)$	$O (n)$	$O (n)$	Must shift elements; not "just one insert"
`substr`	N/A	$O (k)$	N/A	Allocates a new string of length $k$
`find`	$O (n)$	$O (n \cdot m)$	N/A	`string::find` is NOT guaranteed to use KMP/BM—GCC uses naive search
`+` (concat)	N/A	$O (n + m)$	N/A	Creates a new string; prefer `+=` to avoid temporaries
`==` (compare)	$O (n)$	$O (n)$	$O (n)$	Must check every element; no short hash
`erase(begin)`	$O (n)$	$O (n)$	$O (1)$	Deque wins for front removal
`operator[]`	$O (1)$	$O (1)$	$O (1)$	All three—but deque has higher constant factor

Warning: string::find on GCC (libstdc++) uses a naive $O (n \cdot m)$ search, not Knuth-Morris-Pratt. If you need fast substring search in a contest, use String Hashing or KMP/Z-Function instead.

Implementation (Contest-Ready)

Minimal Contest Template

cpp

#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int t;
    cin >> t;
    while(t--){
        int n;
        cin >> n;
        vector<int> a(n);
        for(auto &x : a) cin >> x;

        // --- solve ---
        sort(a.begin(), a.end());
        // Invariant: after unique(), elements in [begin, new_end) are distinct and sorted
        a.erase(unique(a.begin(), a.end()), a.end());

        cout << a.size() << "\n";
        // Invariant: at iteration i, elements a[0..i-1] have been printed
        for(int i = 0; i < (int)a.size(); i++){
            cout << a[i] << " \n"[i + 1 == (int)a.size()];
        }
    }
}

Annotated Version

cpp

#include <bits/stdc++.h>
using namespace std;

int main(){
    // Fast I/O: decouples C and C++ streams, disables sync with scanf/printf.
    // Never mix cin/cout with scanf/printf after this.
    ios::sync_with_stdio(false);
    cin.tie(nullptr);  // don't flush cout before each cin

    int t;
    cin >> t;
    while(t--){
        int n;
        cin >> n;

        // Read array. vector<int> a(n) creates n elements initialized to 0.
        // Prefer this over `int a[n]` (VLA is non-standard in C++).
        vector<int> a(n);
        for(auto &x : a) cin >> x;  // range-for with reference to read in-place

        // Sort + unique: classic idiom to remove duplicates.
        // unique() moves consecutive dupes to the end, returns iterator to new logical end.
        // erase() actually shrinks the vector.
        // Invariant: after this block, a[] contains only distinct values in sorted order
        sort(a.begin(), a.end());
        a.erase(unique(a.begin(), a.end()), a.end());

        cout << a.size() << "\n";

        // Output with spaces between elements, newline at end.
        // " \n"[condition] is a trick: if condition is true (last element),
        // prints '\n'; otherwise prints ' '.
        // Invariant: at step i, a[0..i-1] have been printed with correct delimiters
        for(int i = 0; i < (int)a.size(); i++){
            cout << a[i] << " \n"[i + 1 == (int)a.size()];
        }
    }
}

String I/O Patterns

cpp

#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Read a single string (no spaces)
    string s;
    cin >> s;

    // Read entire line (including spaces)
    string line;
    getline(cin, line);  // WARNING: if cin >> was used before, consume leftover '\n' first
    // Fix: cin.ignore() or cin >> ws before getline

    // Read n strings
    int n;
    cin >> n;
    vector<string> words(n);
    for(auto &w : words) cin >> w;

    // Build string efficiently
    string result;
    result.reserve(n * 10);  // pre-allocate if you know approximate size
    for(auto &w : words){
        result += w;
        result += ' ';
    }

    // Character frequency (ASCII trick)
    int freq[128] = {};
    for(char c : s) freq[(int)c]++;

    // Convert case
    for(char &c : s) c = tolower(c);

    cout << s << "\n";
}

Operations Reference

Use this section as a quick lookup during contests. Each table shows the time and space cost of every common operation.

vector<T>

Operation	Time	Space
Access `v[i]`	$O (1)$	$O (1)$
`push_back` / `emplace_back`	Amortized $O (1)$	$O (1)$ amortized
`pop_back`	$O (1)$	$O (1)$
`insert` at position `i`	$O (n - i)$	$O (1)$
`erase` at position `i`	$O (n - i)$	$O (1)$
`resize(n)`	$O (n)$	$O (n)$
`reserve(n)`	$O (n)$	$O (n)$
`clear`	$O (n)$	$O (1)$ (capacity kept)
`sort`	$O (n \log n)$	$O (\log n)$ stack
Iterate all	$O (n)$	$O (1)$
Copy	$O (n)$	$O (n)$

string

Operation	Time	Space
Access `s[i]`	$O (1)$	$O (1)$
`s += t` (append)	Amortized $O (\| t \|)$	$O (1)$ amortized
`substr(pos, len)`	$O (l e n)$	$O (l e n)$ (new string)
`find(t)`	$O (n \cdot m)$ worst	$O (1)$
`compare` / `==`	$O (min (n, m))$	$O (1)$
`insert(pos, t)`	$O (n + \| t \|)$	$O (1)$
`erase(pos, len)`	$O (n)$	$O (1)$
Lexicographic `<`	$O (min (n, m))$	$O (1)$

C-Array `T a[N]`

Operation	Time	Space
Access `a[i]`	$O (1)$	$O (1)$
Fill with `memset`/`fill`	$O (N)$	$O (1)$
Copy with `memcpy`/`copy`	$O (N)$	$O (N)$

String Iteration Efficiency

Small choices in string loops have large performance consequences:

1. s.size() returns size_t (unsigned)—match your loop variable or cast.

cpp

// BAD: comparison between signed and unsigned
for(int i = 0; i < s.size(); i++)  // compiler warning; works, but sloppy

// GOOD: cast to int (safe for n <= 2*10^9)
for(int i = 0; i < (int)s.size(); i++)

// ALSO GOOD: range-for when you don't need the index
for(char c : s)

2. s[i] is $O (1)$ but s.substr(i) is $O (n - i)$ .

Every substr allocates a new string and copies characters. In a loop, this turns $O (n)$ logic into $O (n^{2})$ :

cpp

// BAD: O(n^2) -- substr copies (n-i) chars each iteration
for(int i = 0; i < (int)s.size(); i++){
    string tail = s.substr(i);  // O(n-i) allocation + copy
    if(tail < best) best = tail;
}

// GOOD: compare in-place using iterators or string_view
for(int i = 0; i < (int)s.size(); i++){
    // Invariant: best holds the lex-smallest suffix seen so far among s[0..i-1]
    if(s.compare(i, string::npos, best) < 0)
        best = s.substr(i);  // only copy when we find a new best
}

3. s += c vs s = s + c—amortized $O (1)$ vs $O (n)$ .

cpp

string s;
// BAD: s = s + c creates a temporary, copies all of s, then appends c --> O(n) per call
// Invariant: after i iterations, s contains i characters (but O(i^2) total work)
for(int i = 0; i < n; i++)
    s = s + 'a';           // O(n^2) total!

// GOOD: s += c appends in-place with amortized O(1) reallocation
// Invariant: after i iterations, s contains i characters (O(i) total work)
for(int i = 0; i < n; i++)
    s += 'a';              // O(n) total

This is the single most common performance bug in string-heavy contest problems. If your string solution TLEs, check for s = s + ... first.

vector<bool> Specialization—A Dangerous Trap

vector<bool> is not a real vector. The C++ standard requires it to be a packed bitfield (1 bit per element), which breaks several assumptions:

cpp

// This works for vector<int>, but FAILS for vector<bool>:
vector<bool> v(8, true);

// FAILS: cannot take address of a bit -- v[0] returns a proxy object, not a bool&
bool* p = &v[0];           // compile error!

// FAILS: range-for with reference doesn't bind to the proxy
for(bool& b : v) b = false; // compile error in some compilers!

// FAILS: data() is not available
bool* raw = v.data();       // compile error!

Why this matters in CP: You might write a generic function taking vector<T>& and it silently breaks when T = bool. Or you pass &v[0] to a C-style API and get a compile error mid-contest.

The fix—use vector<char> or bitset:

cpp

// Use vector<char> when you need a resizable boolean array
vector<char> seen(n, 0);  // each element is 0 or 1, 1 byte each
seen[i] = 1;              // works exactly like you'd expect
char* p = &seen[0];       // fine -- real references, real pointers

// Use bitset<N> when size is fixed at compile time and you want bit-packing
bitset<100001> seen;       // 1 bit per element, but with proper interface
seen[i] = 1;
seen.count();              // popcount in O(N/64)

See: Bitset Optimization for advanced bitset techniques in CP

The primitives above are building blocks. These are the contest patterns they assemble into.

Problem Patterns and Tricks

Frequency Counting

Count occurrences of elements. Use a plain array for small alphabets, unordered_map for large ranges.

cpp

// Character frequency in O(n)
// Invariant: after processing s[0..i-1], freq[c-'a'] == count of c in s[0..i-1]
int freq[26] = {};
for(char c : s) freq[c - 'a']++;

Problems: CF 1850C, CF 520A

Sort + Unique (Coordinate Compression)

Remove duplicates or compress values to indices.

cpp

vector<int> vals = a;  // copy
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
// Now vals[i] is the i-th smallest unique value.
// To compress: index of x = lower_bound(vals.begin(), vals.end(), x) - vals.begin()

Problems: CF 1005C - Summarize to the Power of Two, CF 1560C

Two Pointers on Sorted Array

Maintain two indices moving inward (or same direction) to find pairs or subarrays.

cpp

// Find pair with sum = target in sorted array
// Invariant: no valid pair exists entirely within a[0..lo-1] or a[hi+1..n-1]
int lo = 0, hi = (int)a.size() - 1;
while(lo < hi){
    int s = a[lo] + a[hi];
    if(s == target){ /* found */ break; }
    else if(s < target) lo++;
    else hi--;
}

text

  Two pointer movement on sorted array -- find pair summing to 9

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |   target = 9
  +---+---+---+---+---+----+
    ^                    ^
    lo                   hi      1+10 = 11 > 9  --> hi moves left

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |
  +---+---+---+---+---+----+
    ^                ^
    lo               hi          1+7 = 8 < 9   --> lo moves right

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |
  +---+---+---+---+---+----+
        ^            ^
        lo           hi          2+7 = 9        FOUND

Problems: CF 1472D, CF 1851B

See: Two Pointers

Prefix Sums for Range Queries

Pre-compute prefix sums to answer range-sum queries in $O (1)$ .

cpp

// Invariant: pre[i] == sum of a[0..i-1]; pre[0] == 0
vector<long long> pre(n + 1, 0);
for(int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
// sum of a[l..r] = pre[r+1] - pre[l]

text

  Building prefix sums from array a

  index     0   1   2   3   4
          +---+---+---+---+---+
  a       | 3 | 1 | 4 | 1 | 5 |
          +---+---+---+---+---+
            |   |   |   |   |
            v   v   v   v   v   accumulate left to right

  index   0   1   2   3   4   5
        +---+---+---+---+---+----+
  pre   | 0 | 3 | 4 | 8 | 9 | 14 |
        +---+---+---+---+---+----+
                                      pre[i+1] = pre[i] + a[i]

  Range query: sum of a[1..3]
        +---+---+---+---+---+----+
  pre   | 0 | 3 | 4 | 8 | 9 | 14 |
        +---+---+---+---+---+----+
              ^               ^
              pre[1]=3        pre[4]=9
              +---subtract----+  9 - 3 = 6

Problems: CF 1398C, CF 1343C

See: Prefix Sums

String Palindrome Check

Use two pointers from both ends or reverse-compare.

cpp

bool isPalin(const string &s){
    int n = s.size();
    for(int i = 0; i < n / 2; i++)
        if(s[i] != s[n - 1 - i]) return false;
    return true;
}
// Or: string t = s; reverse(t.begin(), t.end()); return s == t;

Problems: CF 1326C, CF 798A

Sliding Window on String

Fixed or variable-width window scanning for substrings matching a property.

cpp

// Count substrings of length k with all distinct chars
// Invariant: freq[] tracks char counts in window s[max(0,i-k+1)..i], distinct == number of nonzero entries
int cnt = 0;
int freq[26] = {};
int distinct = 0;
for(int i = 0; i < (int)s.size(); i++){
    if(freq[s[i] - 'a']++ == 0) distinct++;
    if(i >= k){
        if(--freq[s[i - k] - 'a'] == 0) distinct--;
    }
    if(i >= k - 1 && distinct == k) cnt++;
}

Problems: CF 1722G, CF 1196D2

See: Sliding Window

Erase-Remove Idiom

Efficiently remove elements matching a predicate without manual index juggling.

cpp

// Remove all zeros from vector
v.erase(remove(v.begin(), v.end(), 0), v.end());

// Remove all even numbers
v.erase(remove_if(v.begin(), v.end(), [](int x){ return x % 2 == 0; }), v.end());

Problems: CF 1703C, CF 1352C

Max Subarray (Kadane's Algorithm)

Find contiguous subarray with maximum sum in $O (n)$ .

cpp

// Invariant: cur == max subarray sum ending at current element; best == global max so far
long long best = a[0], cur = 0;
for(int x : a){
    cur = max((long long)x, cur + x);
    best = max(best, cur);
}

text

  Tracking cur and best through the array

  a     | -2 |  3 | -1 |  5 | -4 |  2 |
        +----+----+----+----+----+----+
  cur     -2    3    2    7    3    5
  best    -2    3    3    7    7    7
                          ^
                          best subarray is a[1..3]
                          3 + (-1) + 5 = 7

  cur  = max(x alone, cur + x)   -- extend or restart
  best = max(best, cur)          -- global maximum so far

Problems: CF 1796B, CSES 1643

Contest Cheat Sheet

text

+---------------------------------------------------------------+
|                  ARRAYS & STRINGS CHEAT SHEET                  |
+---------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   o Random access needed           o Contiguous data           |
|   o Sorting / binary search        o Frequency counting        |
|   o String matching / manipulation                             |
+---------------------------------------------------------------+
| KEY STL:                                                       |
|   vector<int> a(n);          // dynamic array                  |
|   sort(a.begin(), a.end());  // O(n log n)                     |
|   a.erase(unique(a.begin(), a.end()), a.end()); // dedup       |
|   lower_bound(a.begin(), a.end(), x);  // binary search        |
|   string s; cin >> s;        // read word                      |
|   s.substr(pos, len);        // O(len) -- returns new string   |
|   s.find("pat");             // O(n*m) -- use hashing if TLE   |
+---------------------------------------------------------------+
| COMPLEXITY:                                                    |
|   Access:      O(1)          push_back: amortized O(1)         |
|   Insert/mid:  O(n)          Sort:      O(n log n)             |
|   find (str):  O(n*m)        substr:    O(k)                   |
+---------------------------------------------------------------+
| SPACE: vector O(n), string O(n), C-array O(N) stack            |
+---------------------------------------------------------------+
| FAST I/O:                                                      |
|   ios::sync_with_stdio(false); cin.tie(nullptr);               |
+---------------------------------------------------------------+

If I Had 5 Minutes

vector<int> a(n); for(auto &x : a) cin >> x;—read an array
sort(a.begin(), a.end()); a.erase(unique(a.begin(), a.end()), a.end());—deduplicate
int freq[26]={}; for(char c:s) freq[c-'a']++;—character frequency
pre[i+1] = pre[i] + a[i]; with sum(l,r) = pre[r+1] - pre[l]—prefix sums
Use s += c not s = s + c; use (int)v.size() not bare v.size()—avoid the two most common bugs

Before You Code Checklist

Before writing your solution to an array/string problem, verify:

[ ] Bounds: What is the maximum $n$ ? Does $O (n^{2})$ fit, or do you need $O (n \log n)$ or $O (n)$ ?
[ ] Type: Will values exceed int? Use long long for sums and products. Check intermediate expressions too.
[ ] Edge cases: Empty array? Single element? All elements identical? String of length 1?
[ ] Indexing convention: 0-indexed or 1-indexed? Stick with one. If the problem uses 1-indexed, decide whether to convert on input.
[ ] Output format: Spaces between elements? Trailing space OK? Newline at end? Read the output specification twice.

Gotchas and Debugging

These are the bugs that cost rounds. Most array and string errors fall into a handful of recurring patterns—learn to recognize them before they bite.

Off-by-One Errors

v.size() returns size_t (unsigned). v.size() - 1 when v is empty gives a huge number, not $- 1$ . Cast to int or check v.empty() first.
0-indexed vs 1-indexed: pick one convention per problem and stick with it. Most CF editorial code uses 0-indexed arrays but 1-indexed in the problem statement.

String Concatenation in a Loop

cpp

// BAD: O(n^2) total -- each += copies the entire string
string result;
for(int i = 0; i < n; i++) result += to_string(a[i]) + " ";

// BETTER: pre-reserve or use ostringstream
string result;
result.reserve(n * 12);
for(int i = 0; i < n; i++){
    result += to_string(a[i]);
    result += ' ';
}

In practice, for competitive programming, just printing in a loop is fine. Only worry about this if you're building a large string to use later.

Iterator Invalidation

push_back may reallocate, invalidating all iterators to the vector.
erase invalidates iterators at and after the erased position.
Never do for(auto it = v.begin(); it != v.end(); ++it) if(*it == x) v.erase(it);—use the erase-remove idiom instead.

The crash scenario—push_back during iteration:

cpp

// BUG: Undefined Behavior! push_back may reallocate, invalidating `it`
vector<int> v = {1, 2, 3, 4, 5};
for(auto it = v.begin(); it != v.end(); ++it){
    if(*it % 2 == 0)
        v.push_back(*it * 10);  // UB: reallocation invalidates `it`
}
// May crash, may silently corrupt data, may appear to "work" -- all are UB

text

  What happens when push_back triggers reallocation:

  Before push_back:
  v: +---+---+---+---+---+
     | 1 | 2 | 3 | 4 | 5 |  capacity=5
     +---+---+---+---+---+
             ^
             it (valid pointer into this block)

  After push_back(20) triggers realloc:
  OLD block (freed!):
     +---+---+---+---+---+
     | ? | ? | ? | ? | ? |  <-- deallocated memory!
     +---+---+---+---+---+
             ^
             it still points here --> DANGLING POINTER

  NEW block:
     +---+---+---+---+---+----+---+---+---+---+
     | 1 | 2 | 3 | 4 | 5 | 20 |   |   |   |   |  capacity=10
     +---+---+---+---+---+----+---+---+---+---+

Fix 1—use index-based loop (preferred in CP):

cpp

vector<int> v = {1, 2, 3, 4, 5};
int orig_size = (int)v.size();
// Invariant: i scans only the original elements; new elements appended beyond orig_size
for(int i = 0; i < orig_size; i++){
    if(v[i] % 2 == 0)
        v.push_back(v[i] * 10);  // safe: index is not invalidated by realloc
}
// v = {1, 2, 3, 4, 5, 20, 40}

Fix 2—reserve enough capacity upfront:

cpp

vector<int> v = {1, 2, 3, 4, 5};
v.reserve(v.size() * 2);  // guarantee no reallocation
// Now iterators remain valid as long as we don't exceed reserved capacity
for(auto it = v.begin(); it != v.end(); ++it){
    // still risky if end() changes -- prefer index-based loops
}

Rule of thumb for CP: Avoid iterator-based loops when modifying the container. Use for(int i = 0; i < ...)—indices survive reallocation, iterators don't. See STL Containers for more on iterator validity rules across container types.

`char` Signedness

char is signed on most x86 compilers. s[i] - 'a' can be negative if s[i] is not a lowercase letter. Always validate input range or cast to unsigned char.

Multi-Test Case Pitfalls

Global arrays / vectors must be cleared or resized each test case. Prefer declaring inside the while(t--) loop.
If using global int a[MAXN], use fill(a, a+n, 0)—don't memset to arbitrary values (only works for 0 and -1).

`getline` after `cin >>`

cpp

int n;
cin >> n;
// cin >> leaves '\n' in the buffer
string s;
getline(cin, s);  // reads empty string!
// Fix:
cin.ignore();     // or: cin >> ws;
getline(cin, s);  // now reads the actual line

Integer Overflow

v.size() is size_t (usually unsigned long long). Arithmetic like v.size() - 2 can underflow.
Array indices multiplied together: a[i * n + j]—if i and n are int, the product can overflow for $n > 46340$ .

Comparing Strings

== on strings is $O (n)$ , not $O (1)$ . In tight loops, consider hashing.
Lexicographic comparison (<) short-circuits on first difference—usually fast in practice.

Mental Traps

Confusing index with value. When finding the maximum element, it is easy to store the result ambiguously. "The index of the max element" and "the max element" are two different things. Use distinct variable names (max_idx vs max_val) to prevent subtle bugs where you return a position when you meant a value, or vice versa.

text

  WRONG -- ambiguous variable
  +---+---+---+---+---+
  | 8 | 3 | 9 | 1 | 5 |
  +---+---+---+---+---+
    0   1   2   3   4

  max = 2     index or value?

  RIGHT -- distinct names
  +---+---+---+---+---+
  | 8 | 3 | 9 | 1 | 5 |
  +---+---+---+---+---+
    0   1   2   3   4
            ^
  max_idx = 2    <-- position
  max_val = 9    <-- value at position

"I already know arrays." This overconfidence is exactly what produces a string of off-by-one bugs in the first 10 minutes of a contest. The fact that you know arrays does not mean you reason about them carefully under pressure. Slow down at every boundary condition—explicitly verify your loop bounds, your slice endpoints, and your termination conditions before submitting.

text

  WRONG -- casual loop bound
  +---+---+---+---+---+---+
  | 3 | 1 | 4 | 1 | 5 | 9 |  n = 6
  +---+---+---+---+---+---+--
    0   1   2   3   4   5   6
                            ^
                    i reaches here -- out of bounds

  RIGHT -- half-open range [0, n)
  +---+---+---+---+---+---+
  | 3 | 1 | 4 | 1 | 5 | 9 |
  +---+---+---+---+---+---+
    0   1   2   3   4   5
    ^                       ^
    |<--- valid range ----->|
    i from 0       stops before n

Mutating a string through a reference when you wanted a copy. In C++, strings are mutable, so passing by reference and modifying inside a function silently changes the caller's data. If you intended to work on a copy, pass by value. This mistake is invisible during testing when you only call the function once, and only surfaces with multiple calls or when the caller reuses the string.

text

  WRONG -- pass by ref mutates caller
  caller  +---+---+---+---+
          | a | b | c | d |
          +---+---+---+---+
               |
               v  (ref -- same object)
  func    +---+---+---+---+
          | a | # | c | d |  caller sees # too!
          +---+---+---+---+

  RIGHT -- pass by value keeps original
  caller  +---+---+---+---+
          | a | b | c | d |  <-- unchanged
          +---+---+---+---+
               |
               v  (copy -- new object)
  func    +---+---+---+---+
          | a | # | c | d |  only the copy changed
          +---+---+---+---+

The Mistake That Teaches

It was a Div. 2 round, problem B, rated 1200. The task: given a string, build the lexicographically smallest string by appending characters from either end. My solution: loop, compare s.front() vs s.back(), append the smaller one, pop it.

It passed pretests. I moved on to C. Thirty minutes later: "Wrong Answer on test 47." I stared at the code. The logic seemed airtight. Then I spotted it: when front and back were equal, I always took the front—but the correct choice depends on which interior characters come next. I needed to compare the remaining string with its reverse, not just the endpoints.

But the real lesson wasn't the logic bug. It was that I'd written s = s.substr(1) to "pop the front." On a string of length 200,000, that's $O (n)$ per step—making my solution $O (n^{2})$ . Even after fixing the logic, it TLEd. The fix: use two pointers lo and hi into the original string, never copying.

Two bugs in one solution. The logic bug was the one I noticed; the performance bug was the one that cost me the round. Now I always ask: "Am I copying a string inside a loop?"

See: Debugging and Stress Testing for systematic approaches to finding these bugs

When NOT to Use This

Arrays and strings are the default, but sometimes they're the wrong tool:

Situation	Why Arrays Fail	Use Instead
Frequent insertions/deletions in the middle	$O (n)$ shifting per operation	`set`, `multiset`, or a balanced BST
Need $O (\log n)$ rank/select queries	Arrays only support $O (n)$ search (unsorted) or $O (\log n)$ search (sorted, but insert breaks it)	Policy-based tree (`order_of_key`) or segment tree
FIFO queue behavior (push back, pop front)	`vector` pop_front is $O (n)$	`deque` or `queue`
Key-value lookups by non-integer keys	Can't index by string or large integer	`unordered_map` or `map`
Persistent/versioned data	Modifying an array destroys previous state	Persistent segment tree or copy-on-write
Need set intersection/union	Manual merge is error-prone	`set_intersection` on sorted vectors, or use `bitset`

In CP, use vector by default. Only switch when the problem demands an operation that arrays can't do efficiently. See STL Containers for the full container reference, or the Data Structure Selection Guide for a quick decision flowchart.

What Would You Do If...?

Scenario 1: You're solving a problem that says "find all pairs (i,j) with a[i]+a[j]=K." The naive O(n^2) approach TLEs at n=200,000.

Sort the array, then use two pointers from both ends. Each pointer moves at most n times total, so it's O(n log n) for the sort + O(n) for the scan. Alternatively, use a hash set: for each element x, check if K-x has been seen.

Scenario 2: Your string solution passes on small tests but TLEs on n=500,000. You're building a result string character by character.

Check: Are you using s = s + c anywhere? Switch to s += c. Are you calling s.substr() inside a loop? Replace with index tracking. Are you calling s.find() in a loop? That's O(n*m) per call—switch to string hashing or KMP.

Scenario 3: You need to process queries: "how many elements in the array fall in range [L, R]?" with 200,000 queries on an array of 200,000 elements.

A raw array can't answer range queries faster than O(n) per query. Sort the array and use lower_bound/upper_bound for O(log n) per query. If the array is modified between queries, you need a segment tree or BIT.

Self-Test

Before moving on, verify you can do each of these without looking back at the text:

[ ] Write a loop that iterates over every adjacent pair (a[i], a[i+1]) in an array of n elements, without accessing out-of-bounds memory.
[ ] Extract the substring s[l..r] (inclusive) using std::string::substr—get the arguments right on the first try.
[ ] Count the frequency of each lowercase letter in a string in O(n) time using an array, not a map.
[ ] Explain why s.size() - 1 is dangerous when s is empty and how to guard against it.
[ ] Given a 2D array int a[N][M], state which loop order is cache-friendly and which is not.
[ ] Given int max_idx and int max_val, explain which stores a position and which stores a value—then write correct code to find both for an array.
[ ] Describe what happens when you call v.push_back(x) while iterating over v with an iterator—and name the safe alternative.

Debug-This Exercises

Find the bug in each snippet. Answers are below the fold.

Bug 1: The disappearing elements

cpp

vector<int> v = {1, 2, 3, 4, 5, 6};
for(int i = 0; i < (int)v.size(); i++){
    if(v[i] % 2 == 0)
        v.erase(v.begin() + i);
}
// Expected: {1, 3, 5}  -- what do you actually get?

Bug 2: The infinite builder

cpp

string build(int n){
    string s;
    for(int i = 0; i < n; i++)
        s = s + (char)('a' + i % 26);
    return s;
}
// Compiles and gives correct output. What's the bug?

Bug 3: The subtle overflow

cpp

vector<int> v = {10, 20, 30};
for(int i = 0; i < v.size() - 1; i++){
    cout << v[i] << " ";
}
// Works fine. Now what happens if v is empty?

Bug 4: The wrong reverse

cpp

string s = "abcdef";
for(int i = 0; i <= (int)s.size(); i++){
    swap(s[i], s[s.size() - i]);
}
// What's wrong? (Two bugs!)

Answers (click to reveal)

Bug 1: After erasing element at index i, the next element shifts into position i, but i++ skips it. Result: {1, 3, 5, 6}—the 6 survives because 4 was erased at index 3, then i advances to 4 which is now past the end. Fix: don't increment i after an erase, or use the erase-remove idiom.

Bug 2: s = s + (char)(...) creates a temporary string each iteration, copying all of s. Total work: $O (n^{2})$ . For $n = 10^{6}$ , this TLEs. Fix: use s += (char)('a' + i % 26) for amortized $O (1)$ per append.

Bug 3: v.size() is size_t (unsigned). When v is empty, v.size() - 1 wraps to ~0ULL (about $1.8 \times 10^{19}$ ), and the loop runs until segfault. Fix: for(int i = 0; i < (int)v.size() - 1; i++) or check v.empty() first.

Bug 4: Two bugs: (a) s.size() - 0 = 6 which is out of bounds (valid indices are 0–5), should be s.size() - 1 - i; (b) the loop condition i <= s.size() should be i < s.size() / 2—otherwise you swap elements back to their original positions in the second half. Fix: for(int i = 0; i < (int)s.size()/2; i++) swap(s[i], s[s.size()-1-i]);

Practice Problems

#	Problem	Source	Difficulty	Key Idea
1	Watermelon	CF 4A	800	Basic if-statement, array boundary
2	Word	CF 59A	800	Character frequency, case conversion
3	Distinct Numbers	CSES	~900	Sort + unique
4	Palindrome Reorder	CSES	~1000	Frequency counting, string building
5	Yet Another Palindrome Problem	CF 1324C	1100	Frequency + position tracking
6	Make It Good	CF 1385C	1200	Suffix analysis, array structure
7	Good Subarrays	CF 1398C	1600	Prefix sums on transformed array
8	Maximum Subarray Sum	CSES	~1600	Kadane's algorithm
9	Xenia and Colorful Gems	CF 1336C	1700	Sort + binary search on arrays
10	Longest Palindromic Prefix	CF 1326D2	2200	String hashing + palindrome

For a structured progression through array and string problems, see the Practice Ladder (1100–1400).

Arrays and Strings ​

Contents ​

Intuition ​

Visual Walkthrough ​

The Invariant ​

What the Code Won't Teach You ​

When to Reach for This ​

C++ STL Reference ​

Containers ​

Key vector operations ​

Key string operations ​

Algorithms on Ranges ​

Hidden Complexity Table ​

Implementation (Contest-Ready) ​

Minimal Contest Template ​

Annotated Version ​

String I/O Patterns ​

Operations Reference ​

vector<T> ​

string ​

C-Array T a[N] ​

String Iteration Efficiency ​

vector<bool> Specialization—A Dangerous Trap ​

Problem Patterns and Tricks ​

Frequency Counting ​

Sort + Unique (Coordinate Compression) ​

Two Pointers on Sorted Array ​

Prefix Sums for Range Queries ​

String Palindrome Check ​

Sliding Window on String ​

Erase-Remove Idiom ​

Max Subarray (Kadane's Algorithm) ​

Contest Cheat Sheet ​

If I Had 5 Minutes ​

Before You Code Checklist ​

Gotchas and Debugging ​

Off-by-One Errors ​

String Concatenation in a Loop ​

Iterator Invalidation ​

char Signedness ​

Multi-Test Case Pitfalls ​

getline after cin >> ​

Integer Overflow ​

Comparing Strings ​

Mental Traps ​

The Mistake That Teaches ​

When NOT to Use This ​

What Would You Do If...? ​

Self-Test ​

Debug-This Exercises ​

Practice Problems ​

Further Reading ​

Arrays and Strings

Contents

Intuition

Visual Walkthrough

The Invariant

What the Code Won't Teach You

When to Reach for This

C++ STL Reference

Containers

Key vector operations

Key string operations

Algorithms on Ranges

Hidden Complexity Table

Implementation (Contest-Ready)

Minimal Contest Template

Annotated Version

String I/O Patterns

Operations Reference

vector<T>

string

C-Array `T a[N]`

String Iteration Efficiency

vector<bool> Specialization—A Dangerous Trap

Problem Patterns and Tricks

Frequency Counting

Sort + Unique (Coordinate Compression)

Two Pointers on Sorted Array

Prefix Sums for Range Queries

String Palindrome Check

Sliding Window on String

Erase-Remove Idiom

Max Subarray (Kadane's Algorithm)

Contest Cheat Sheet

If I Had 5 Minutes

Before You Code Checklist

Gotchas and Debugging

Off-by-One Errors

String Concatenation in a Loop

Iterator Invalidation

`char` Signedness

Multi-Test Case Pitfalls

`getline` after `cin >>`

Integer Overflow

Comparing Strings

Mental Traps

The Mistake That Teaches

When NOT to Use This

What Would You Do If...?

Self-Test

Debug-This Exercises

Practice Problems

Further Reading