Skip to content

Arrays and Strings

Quick Revisit

  • USE WHEN: You need O(1) lookup/update by index or need to track seen elements
  • INVARIANT: a[i] is accessed in O(1) via base + i*sizeof(element)
  • TIME: O(1) access/update | SPACE: O(n)
  • CLASSIC BUG: Off-by-one on 0-indexed vs 1-indexed arrays
  • PRACTICE: ../12-practice-ladders/01-ladder-1100-to-1400.md

The most fundamental data structures in competitive programming—contiguous memory blocks that underpin nearly every problem you'll solve on Codeforces.

Difficulty: Beginner | Prerequisites: None

Arrays date to the earliest stored-program computers of the late 1940s, where contiguous memory words were the only data structure available. The C language (1972) made pointer-arithmetic access a first-class citizen, and C++ std::vector (1994, HP STL) added dynamic resizing while preserving the O(1) access guarantee. Strings as first-class objects—rather than null-terminated char*—arrived with std::string in C++98.

Think of an array as a row of numbered lockers: knowing the number gets you to the locker in one step, no matter how long the hallway. The real aha moment comes when you realize that an array's index is a hash function for small integers—and that almost every "have I seen X before?" question collapses to a single array lookup. Once that clicks, competitive programming starts to click too. A useful mnemonic: "Value IS address—skip the search."


Contents


Intuition

You have an array of six integers and need to find the first value that appears twice:

text
a = [3, 1, 4, 1, 5, 3]

Your first instinct: for each element a[i], scan everything before it looking for a match.

text
i=0  value=3   check against: (nothing)        --> 0 comparisons
i=1  value=1   check against: 3                --> 1 comparison, no match
i=2  value=4   check against: 3, 1             --> 2 comparisons, no match
i=3  value=1   check against: 3, 1             --> 2 comparisons, MATCH!

Found it—1 is the first duplicate—after 5 comparisons. Not terrible for 6 elements. But you got lucky: the duplicate was near the middle. Push it to the end:

text
a = [1, 2, 3, 4, 5, 1]          (duplicate at position 5)

i=0  (0 comparisons)
i=1  (1 comparison)
i=2  (2 comparisons)
i=3  (3 comparisons)
i=4  (4 comparisons)
i=5  (5 comparisons)  --> finally, MATCH

Total: 0 + 1 + 2 + 3 + 4 + 5 = 15 comparisons = n*(n-1)/2

For n=105, that is roughly 5×109 comparisons—about 5 seconds on a machine doing 109 operations per second. For n=106, the count reaches 5×1011. Contest time limits are 1–2 seconds.

We're re-scanning previous elements for every single position. There has to be a better way.

An array's O(1) indexed access lets you use a value directly as an address, turning "have I seen this before?" from a linear scan into a single lookup.

Think of numbered mailboxes at a post office. To check if mail arrived for box #4, you don't search through every box starting from #1. You walk directly to box 4 and open it. The box number is the location—no searching required.

An array works the same way: seen[4] compiles to a single address calculation—base + 4 * sizeof(bool)—regardless of how many elements exist. No comparisons, no traversal.

Where the analogy breaks: a post office has a fixed number of boxes. If someone asks for box #1,000,000,000, you can't build a billion mailboxes. Similarly, if your values range up to 109, you can't allocate a billion-entry seen array. You'd need hashing (unordered_map) or coordinate compression to shrink the value space first.

Why contiguous memory matters beyond indexing: modern CPUs load memory in cache lines (~64 bytes). When you iterate a vector, adjacent elements are already in L1 cache—the hardware predicts your sequential access pattern. A list or map scatters nodes across the heap, turning each access into a potential cache miss.

For n106, this can mean a 5–10× wall-clock difference. Prefer vector over list/deque unless you specifically need O(1) front insertion.

Visual Walkthrough

Same array: a = [3, 1, 4, 1, 5, 3]. Allocate seen[6], all initialized to false (shown as .; * means true).

text
Step 1: Process a[0] = 3

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
            ^
            i=0

  seen:   | . | . | . | . | . | . |
   index:   0   1   2   3   4   5

  seen[3] is false  -->  not a duplicate
  Set seen[3] = true

  seen:   | . | . | . | * | . | . |
   index:   0   1   2   3   4   5

  ops so far: 1 lookup + 1 write = 2
text
Step 2: Process a[1] = 1

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                ^
                i=1

  seen:   | . | . | . | * | . | . |
   index:   0   1   2   3   4   5

  seen[1] is false  -->  not a duplicate
  Set seen[1] = true

  seen:   | . | * | . | * | . | . |
   index:   0   1   2   3   4   5

  ops so far: 4
text
Step 3: Process a[2] = 4

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                    ^
                    i=2

  seen:   | . | * | . | * | . | . |
   index:   0   1   2   3   4   5

  seen[4] is false  -->  not a duplicate
  Set seen[4] = true

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  ops so far: 6
text
Step 4: Process a[3] = 1

  a:      | 3 | 1 | 4 | 1 | 5 | 3 |
                        ^
                        i=3

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  seen[1] is TRUE  -->  DUPLICATE FOUND!  Return value 1.

  seen:   | . | * | . | * | * | . |
   index:   0   1   2   3   4   5

  ops so far: 7  (just one more lookup)

Brute force: 5 comparisons in the lucky case, up to 15 in the worst case—O(n2).

Array lookup: 7 constant-time operations, always—O(n).

The gap widens fast. At n=105: brute force does up to 5×109 comparisons; the array approach does at most 2n=2×105 operations. That is a 25,000× difference.

The Invariant

text
+----------------------------------------------------------+
| INVARIANT: After processing a[0..i-1], seen[v] == true   |
| if and only if value v appeared in a[0..i-1]. Each       |
| lookup and update costs O(1) because array indexing is    |
| direct address arithmetic: base + v * sizeof(bool).      |
+----------------------------------------------------------+

The invariant holds because every time we process a[i]:

  1. We read seen[a[i]]—by the invariant, this is true exactly when a[i] already appeared in a[0..i-1].
  2. If false, we set seen[a[i]] = true, extending the invariant to cover a[0..i].
  3. If true, we have found a duplicate and stop.

Look at Step 4 above—seen[1] is true because Step 2 set it when we first encountered value 1 at index 1. The invariant guarantees that a true entry means "this value appeared earlier," not "some garbage was in memory." Without O(1) indexed access, maintaining this property would require scanning a list each time, destroying the O(n) guarantee.

This same principle—use the value as the index—is the foundation of frequency arrays (freq[c - 'a']++), counting sort, bucket sort, and direct-address tables. The contiguous memory layout of arrays is what makes the address calculation constant-time and the whole approach viable.

What the Code Won't Teach You

The real skill is boundary discipline, not memorization. When you write a loop or a slice, explicitly state your invariant: "i is the index of the current element" or "everything in 0..i-1 has been processed." The moment that invariant gets fuzzy, you are one off-by-one away from a bug. This is not about memorizing that arrays are zero-indexed—it is about maintaining a precise mental model at every step.

text
  Boundary invariant at each loop step

  +---+---+---+---+---+---+
  | # | # | # |   |   |   |
  +---+---+---+---+---+---+
    0   1   2   3   4   5
    <-processed->|<-pending->
                 ^
                 i

  At step i: a[0..i-1] done, a[i..n-1] remaining

Strings in competitive programming are almost always about patterns, not raw characters. The interesting questions are: does string A appear in string B? What is the longest prefix shared by two strings? How many distinct substrings exist? The mechanics—indexing, slicing, concatenation—are just the substrate. The real skill is recognizing the structural question hiding underneath the string manipulation.

text
  Surface vs structure

  s = a b c a b c d
      +--+--+--+--+--+--+--+
      |  |  |  |  |  |  |  |
      +--+--+--+--+--+--+--+

  Pattern view
      |<--abc-->|<--abc-->| d
      +--+--+--+--+--+--+--+
      | #| #| #| #| #| #|  |
      +--+--+--+--+--+--+--+
      period 3

Character arithmetic is a basic building block, not a trick. c - 'a' maps lowercase letters to indices 0 through 25, letting you use characters directly as array positions. This is the foundation of every frequency-counting solution and appears in nearly every string problem at every difficulty level.

text
  Character to index mapping

  char  a   b   c   d   ...   z
        |   |   |   |         |
        v   v   v   v         v
  idx   0   1   2   3        25

        +---+---+---+---+-----+---+
  freq  |   |   |   |   | ... |   |
        +---+---+---+---+-----+---+
          0   1   2   3        25

With those deeper skills in mind, here's how to recognize when an array-based approach is the right tool.

When to Reach for This

Trigger phrases in problem statements:

  • "count the frequency of each element" → frequency array indexed by value
  • "find duplicates" or "any value that repeats" → boolean seen array
  • "values bounded by a small constant (106)" → direct indexing feasible
  • "rearrange / sort when values are in a small range" → counting sort via array buckets
  • "character frequency in a string" → freq[c - 'a'] with a size-26 array

Counter-examples (looks like direct-indexing, but isn't):

  • Values up to 109 or negative values—can't allocate an array that big. Use unordered_map or coordinate-compress into a dense range first, then index. See Sort + Unique (Coordinate Compression) below.
  • "Find the k-th smallest element"—direct indexing doesn't help with rank queries. You need sorting, a selection algorithm, or an order-statistics tree.
  • "Insert/delete elements frequently in the middle"—arrays pay O(n) per mid-insertion due to shifting. If this dominates your runtime, consider a balanced BST or a policy-based tree.

Where arrays and strings appear on Codeforces:

  • Tags: implementation, strings, sortings, greedy, brute force
  • Difficulty: 800–1600 (arrays as the core structure), but array manipulation tricks appear up to 2400+
  • Nearly every problem uses arrays—the question is how

Pattern Fingerprint: values bounded by 10^6 → direct-index frequency array | find duplicates / first repeat → boolean seen array | range sum queries → prefix sums | longest subarray with property → sliding window / two pointers | string matching → hashing or KMP (not s.find())

Rating Progression—what array/string skills matter at each level:

CF RatingWhat You Need
~1200Frequency arrays, basic string manipulation, sort + iteration, simple simulation
~1500Prefix sums, two pointers, sliding window, coordinate compression, s += c vs s = s + c awareness
~1800Kadane's variations, sweep-line on arrays, string hashing, careful complexity analysis of nested loops
~2100+Suffix arrays, Z/KMP for non-trivial matching, segment trees over arrays, bitmask + array DP

See: Complexity Analysis for how to evaluate whether your array approach fits in time


With the core idea clear, here is the C++ toolkit that implements it.

C++ STL Reference

Containers

Function / ClassHeaderWhat it doesTime Complexity
vector<T><vector>Dynamic array, contiguous memoryAccess O(1), push_back amortized O(1)
array<T,N><array>Fixed-size array on stack, bounds-checked with .at()Access O(1)
string<string>Dynamic char array with string operationsAccess O(1), substr O(k)
string_view<string_view>Non-owning view into string, no copyConstruction O(1), no allocation

Key vector operations

Function / ClassHeaderWhat it doesTime Complexity
v.push_back(x)<vector>Append element (copy)Amortized O(1)
v.emplace_back(args...)<vector>Construct element in-place at endAmortized O(1)
v.pop_back()<vector>Remove last elementO(1)
v.resize(n)<vector>Change size; new elements value-initializedO(n)
v.resize(n, val)<vector>Change size; new elements set to valO(n)
v.reserve(n)<vector>Pre-allocate capacity (no size change)O(n)
v.shrink_to_fit()<vector>Request to reduce capacity to sizeO(n)
v.insert(it, x)<vector>Insert before iteratorO(n)
v.erase(it)<vector>Remove at iterator, shift leftO(n)
v.erase(first, last)<vector>Remove rangeO(n)
v.clear()<vector>Remove all elements (size=0, capacity unchanged)O(n)
v.assign(n, val)<vector>Replace contents with n copies of valO(n)
v.front() / v.back()<vector>First / last element referenceO(1)
v.data()<vector>Pointer to underlying arrayO(1)

Key string operations

Function / ClassHeaderWhat it doesTime Complexity
s.substr(pos, len)<string>Extract substring (returns new string)O(len)
s.find(t)<string>Find first occurrence of tO(nm) worst case
s.rfind(t)<string>Find last occurrence of tO(nm) worst case
s.find_first_of(chars)<string>Find first char in setO(nk)
s += t<string>Append stringAmortized O(len(t))
s.append(t)<string>Same as +=Amortized O(len(t))
s.insert(pos, t)<string>Insert at positionO(n+len(t))
s.erase(pos, len)<string>Erase substringO(n)
s.replace(pos, len, t)<string>Replace substringO(n+len(t))
s.c_str()<string>Null-terminated C-string pointerO(1)
stoi(s) / stoll(s)<string>String to int / long longO(n)
to_string(x)<string>Number to stringO(logx)

Algorithms on Ranges

Function / ClassHeaderWhat it doesTime Complexity
sort(b, e)<algorithm>Sort range [b, e)O(nlogn)
reverse(b, e)<algorithm>Reverse range in-placeO(n)
unique(b, e)<algorithm>Remove consecutive duplicates (returns new end)O(n)
lower_bound(b, e, x)<algorithm>Iterator to first element x (sorted range)O(logn)
upper_bound(b, e, x)<algorithm>Iterator to first element >x (sorted range)O(logn)
accumulate(b, e, init)<numeric>Sum of rangeO(n)
max_element(b, e)<algorithm>Iterator to maxO(n)
min_element(b, e)<algorithm>Iterator to minO(n)
fill(b, e, val)<algorithm>Fill range with valueO(n)
iota(b, e, start)<numeric>Fill with start, start+1, ...O(n)
next_permutation(b, e)<algorithm>Mutate to next lexicographic permutationO(n)

Hidden Complexity Table

Many operations look cheap but hide linear or worse costs. This table collects the surprises:

OperationvectorstringdequeNotes
push_backO(1)*O(1)*O(1)** = amortized; single call may trigger O(n) reallocation
insert(mid)O(n)O(n)O(n)Must shift elements; not "just one insert"
substrN/AO(k)N/AAllocates a new string of length k
findO(n)O(nm)N/Astring::find is NOT guaranteed to use KMP/BM—GCC uses naive search
+ (concat)N/AO(n+m)N/ACreates a new string; prefer += to avoid temporaries
== (compare)O(n)O(n)O(n)Must check every element; no short hash
erase(begin)O(n)O(n)O(1)Deque wins for front removal
operator[]O(1)O(1)O(1)All three—but deque has higher constant factor

Warning: string::find on GCC (libstdc++) uses a naive O(nm) search, not Knuth-Morris-Pratt. If you need fast substring search in a contest, use String Hashing or KMP/Z-Function instead.


Implementation (Contest-Ready)

Minimal Contest Template

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int t;
    cin >> t;
    while(t--){
        int n;
        cin >> n;
        vector<int> a(n);
        for(auto &x : a) cin >> x;

        // --- solve ---
        sort(a.begin(), a.end());
        // Invariant: after unique(), elements in [begin, new_end) are distinct and sorted
        a.erase(unique(a.begin(), a.end()), a.end());

        cout << a.size() << "\n";
        // Invariant: at iteration i, elements a[0..i-1] have been printed
        for(int i = 0; i < (int)a.size(); i++){
            cout << a[i] << " \n"[i + 1 == (int)a.size()];
        }
    }
}

Annotated Version

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    // Fast I/O: decouples C and C++ streams, disables sync with scanf/printf.
    // Never mix cin/cout with scanf/printf after this.
    ios::sync_with_stdio(false);
    cin.tie(nullptr);  // don't flush cout before each cin

    int t;
    cin >> t;
    while(t--){
        int n;
        cin >> n;

        // Read array. vector<int> a(n) creates n elements initialized to 0.
        // Prefer this over `int a[n]` (VLA is non-standard in C++).
        vector<int> a(n);
        for(auto &x : a) cin >> x;  // range-for with reference to read in-place

        // Sort + unique: classic idiom to remove duplicates.
        // unique() moves consecutive dupes to the end, returns iterator to new logical end.
        // erase() actually shrinks the vector.
        // Invariant: after this block, a[] contains only distinct values in sorted order
        sort(a.begin(), a.end());
        a.erase(unique(a.begin(), a.end()), a.end());

        cout << a.size() << "\n";

        // Output with spaces between elements, newline at end.
        // " \n"[condition] is a trick: if condition is true (last element),
        // prints '\n'; otherwise prints ' '.
        // Invariant: at step i, a[0..i-1] have been printed with correct delimiters
        for(int i = 0; i < (int)a.size(); i++){
            cout << a[i] << " \n"[i + 1 == (int)a.size()];
        }
    }
}

String I/O Patterns

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    // Read a single string (no spaces)
    string s;
    cin >> s;

    // Read entire line (including spaces)
    string line;
    getline(cin, line);  // WARNING: if cin >> was used before, consume leftover '\n' first
    // Fix: cin.ignore() or cin >> ws before getline

    // Read n strings
    int n;
    cin >> n;
    vector<string> words(n);
    for(auto &w : words) cin >> w;

    // Build string efficiently
    string result;
    result.reserve(n * 10);  // pre-allocate if you know approximate size
    for(auto &w : words){
        result += w;
        result += ' ';
    }

    // Character frequency (ASCII trick)
    int freq[128] = {};
    for(char c : s) freq[(int)c]++;

    // Convert case
    for(char &c : s) c = tolower(c);

    cout << s << "\n";
}

Operations Reference

Use this section as a quick lookup during contests. Each table shows the time and space cost of every common operation.

vector<T>

OperationTimeSpace
Access v[i]O(1)O(1)
push_back / emplace_backAmortized O(1)O(1) amortized
pop_backO(1)O(1)
insert at position iO(ni)O(1)
erase at position iO(ni)O(1)
resize(n)O(n)O(n)
reserve(n)O(n)O(n)
clearO(n)O(1) (capacity kept)
sortO(nlogn)O(logn) stack
Iterate allO(n)O(1)
CopyO(n)O(n)

string

OperationTimeSpace
Access s[i]O(1)O(1)
s += t (append)Amortized O(|t|)O(1) amortized
substr(pos, len)O(len)O(len) (new string)
find(t)O(nm) worstO(1)
compare / ==O(min(n,m))O(1)
insert(pos, t)O(n+|t|)O(1)
erase(pos, len)O(n)O(1)
Lexicographic <O(min(n,m))O(1)

C-Array T a[N]

OperationTimeSpace
Access a[i]O(1)O(1)
Fill with memset/fillO(N)O(1)
Copy with memcpy/copyO(N)O(N)

String Iteration Efficiency

Small choices in string loops have large performance consequences:

1. s.size() returns size_t (unsigned)—match your loop variable or cast.

cpp
// BAD: comparison between signed and unsigned
for(int i = 0; i < s.size(); i++)  // compiler warning; works, but sloppy

// GOOD: cast to int (safe for n <= 2*10^9)
for(int i = 0; i < (int)s.size(); i++)

// ALSO GOOD: range-for when you don't need the index
for(char c : s)

2. s[i] is O(1) but s.substr(i) is O(ni).

Every substr allocates a new string and copies characters. In a loop, this turns O(n) logic into O(n2):

cpp
// BAD: O(n^2) -- substr copies (n-i) chars each iteration
for(int i = 0; i < (int)s.size(); i++){
    string tail = s.substr(i);  // O(n-i) allocation + copy
    if(tail < best) best = tail;
}

// GOOD: compare in-place using iterators or string_view
for(int i = 0; i < (int)s.size(); i++){
    // Invariant: best holds the lex-smallest suffix seen so far among s[0..i-1]
    if(s.compare(i, string::npos, best) < 0)
        best = s.substr(i);  // only copy when we find a new best
}

3. s += c vs s = s + c—amortized O(1) vs O(n).

cpp
string s;
// BAD: s = s + c creates a temporary, copies all of s, then appends c --> O(n) per call
// Invariant: after i iterations, s contains i characters (but O(i^2) total work)
for(int i = 0; i < n; i++)
    s = s + 'a';           // O(n^2) total!

// GOOD: s += c appends in-place with amortized O(1) reallocation
// Invariant: after i iterations, s contains i characters (O(i) total work)
for(int i = 0; i < n; i++)
    s += 'a';              // O(n) total

This is the single most common performance bug in string-heavy contest problems. If your string solution TLEs, check for s = s + ... first.

vector<bool> Specialization—A Dangerous Trap

vector<bool> is not a real vector. The C++ standard requires it to be a packed bitfield (1 bit per element), which breaks several assumptions:

cpp
// This works for vector<int>, but FAILS for vector<bool>:
vector<bool> v(8, true);

// FAILS: cannot take address of a bit -- v[0] returns a proxy object, not a bool&
bool* p = &v[0];           // compile error!

// FAILS: range-for with reference doesn't bind to the proxy
for(bool& b : v) b = false; // compile error in some compilers!

// FAILS: data() is not available
bool* raw = v.data();       // compile error!

Why this matters in CP: You might write a generic function taking vector<T>& and it silently breaks when T = bool. Or you pass &v[0] to a C-style API and get a compile error mid-contest.

The fix—use vector<char> or bitset:

cpp
// Use vector<char> when you need a resizable boolean array
vector<char> seen(n, 0);  // each element is 0 or 1, 1 byte each
seen[i] = 1;              // works exactly like you'd expect
char* p = &seen[0];       // fine -- real references, real pointers

// Use bitset<N> when size is fixed at compile time and you want bit-packing
bitset<100001> seen;       // 1 bit per element, but with proper interface
seen[i] = 1;
seen.count();              // popcount in O(N/64)

See: Bitset Optimization for advanced bitset techniques in CP


The primitives above are building blocks. These are the contest patterns they assemble into.

Problem Patterns and Tricks

Frequency Counting

Count occurrences of elements. Use a plain array for small alphabets, unordered_map for large ranges.

cpp
// Character frequency in O(n)
// Invariant: after processing s[0..i-1], freq[c-'a'] == count of c in s[0..i-1]
int freq[26] = {};
for(char c : s) freq[c - 'a']++;

Problems: CF 1850C, CF 520A

Sort + Unique (Coordinate Compression)

Remove duplicates or compress values to indices.

cpp
vector<int> vals = a;  // copy
sort(vals.begin(), vals.end());
vals.erase(unique(vals.begin(), vals.end()), vals.end());
// Now vals[i] is the i-th smallest unique value.
// To compress: index of x = lower_bound(vals.begin(), vals.end(), x) - vals.begin()

Problems: CF 1005C - Summarize to the Power of Two, CF 1560C

Two Pointers on Sorted Array

Maintain two indices moving inward (or same direction) to find pairs or subarrays.

cpp
// Find pair with sum = target in sorted array
// Invariant: no valid pair exists entirely within a[0..lo-1] or a[hi+1..n-1]
int lo = 0, hi = (int)a.size() - 1;
while(lo < hi){
    int s = a[lo] + a[hi];
    if(s == target){ /* found */ break; }
    else if(s < target) lo++;
    else hi--;
}
text
  Two pointer movement on sorted array -- find pair summing to 9

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |   target = 9
  +---+---+---+---+---+----+
    ^                    ^
    lo                   hi      1+10 = 11 > 9  --> hi moves left

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |
  +---+---+---+---+---+----+
    ^                ^
    lo               hi          1+7 = 8 < 9   --> lo moves right

  +---+---+---+---+---+----+
  | 1 | 2 | 4 | 5 | 7 | 10 |
  +---+---+---+---+---+----+
        ^            ^
        lo           hi          2+7 = 9        FOUND

Problems: CF 1472D, CF 1851B

See: Two Pointers

Prefix Sums for Range Queries

Pre-compute prefix sums to answer range-sum queries in O(1).

cpp
// Invariant: pre[i] == sum of a[0..i-1]; pre[0] == 0
vector<long long> pre(n + 1, 0);
for(int i = 0; i < n; i++) pre[i + 1] = pre[i] + a[i];
// sum of a[l..r] = pre[r+1] - pre[l]
text
  Building prefix sums from array a

  index     0   1   2   3   4
          +---+---+---+---+---+
  a       | 3 | 1 | 4 | 1 | 5 |
          +---+---+---+---+---+
            |   |   |   |   |
            v   v   v   v   v   accumulate left to right

  index   0   1   2   3   4   5
        +---+---+---+---+---+----+
  pre   | 0 | 3 | 4 | 8 | 9 | 14 |
        +---+---+---+---+---+----+
                                      pre[i+1] = pre[i] + a[i]

  Range query: sum of a[1..3]
        +---+---+---+---+---+----+
  pre   | 0 | 3 | 4 | 8 | 9 | 14 |
        +---+---+---+---+---+----+
              ^               ^
              pre[1]=3        pre[4]=9
              +---subtract----+  9 - 3 = 6

Problems: CF 1398C, CF 1343C

See: Prefix Sums

String Palindrome Check

Use two pointers from both ends or reverse-compare.

cpp
bool isPalin(const string &s){
    int n = s.size();
    for(int i = 0; i < n / 2; i++)
        if(s[i] != s[n - 1 - i]) return false;
    return true;
}
// Or: string t = s; reverse(t.begin(), t.end()); return s == t;

Problems: CF 1326C, CF 798A

Sliding Window on String

Fixed or variable-width window scanning for substrings matching a property.

cpp
// Count substrings of length k with all distinct chars
// Invariant: freq[] tracks char counts in window s[max(0,i-k+1)..i], distinct == number of nonzero entries
int cnt = 0;
int freq[26] = {};
int distinct = 0;
for(int i = 0; i < (int)s.size(); i++){
    if(freq[s[i] - 'a']++ == 0) distinct++;
    if(i >= k){
        if(--freq[s[i - k] - 'a'] == 0) distinct--;
    }
    if(i >= k - 1 && distinct == k) cnt++;
}

Problems: CF 1722G, CF 1196D2

See: Sliding Window

Erase-Remove Idiom

Efficiently remove elements matching a predicate without manual index juggling.

cpp
// Remove all zeros from vector
v.erase(remove(v.begin(), v.end(), 0), v.end());

// Remove all even numbers
v.erase(remove_if(v.begin(), v.end(), [](int x){ return x % 2 == 0; }), v.end());

Problems: CF 1703C, CF 1352C

Max Subarray (Kadane's Algorithm)

Find contiguous subarray with maximum sum in O(n).

cpp
// Invariant: cur == max subarray sum ending at current element; best == global max so far
long long best = a[0], cur = 0;
for(int x : a){
    cur = max((long long)x, cur + x);
    best = max(best, cur);
}
text
  Tracking cur and best through the array

  a     | -2 |  3 | -1 |  5 | -4 |  2 |
        +----+----+----+----+----+----+
  cur     -2    3    2    7    3    5
  best    -2    3    3    7    7    7
                          ^
                          best subarray is a[1..3]
                          3 + (-1) + 5 = 7

  cur  = max(x alone, cur + x)   -- extend or restart
  best = max(best, cur)          -- global maximum so far

Problems: CF 1796B, CSES 1643


Contest Cheat Sheet

text
+---------------------------------------------------------------+
|                  ARRAYS & STRINGS CHEAT SHEET                  |
+---------------------------------------------------------------+
| WHEN TO USE:                                                   |
|   o Random access needed           o Contiguous data           |
|   o Sorting / binary search        o Frequency counting        |
|   o String matching / manipulation                             |
+---------------------------------------------------------------+
| KEY STL:                                                       |
|   vector<int> a(n);          // dynamic array                  |
|   sort(a.begin(), a.end());  // O(n log n)                     |
|   a.erase(unique(a.begin(), a.end()), a.end()); // dedup       |
|   lower_bound(a.begin(), a.end(), x);  // binary search        |
|   string s; cin >> s;        // read word                      |
|   s.substr(pos, len);        // O(len) -- returns new string   |
|   s.find("pat");             // O(n*m) -- use hashing if TLE   |
+---------------------------------------------------------------+
| COMPLEXITY:                                                    |
|   Access:      O(1)          push_back: amortized O(1)         |
|   Insert/mid:  O(n)          Sort:      O(n log n)             |
|   find (str):  O(n*m)        substr:    O(k)                   |
+---------------------------------------------------------------+
| SPACE: vector O(n), string O(n), C-array O(N) stack            |
+---------------------------------------------------------------+
| FAST I/O:                                                      |
|   ios::sync_with_stdio(false); cin.tie(nullptr);               |
+---------------------------------------------------------------+

If I Had 5 Minutes

  1. vector<int> a(n); for(auto &x : a) cin >> x;—read an array
  2. sort(a.begin(), a.end()); a.erase(unique(a.begin(), a.end()), a.end());—deduplicate
  3. int freq[26]={}; for(char c:s) freq[c-'a']++;—character frequency
  4. pre[i+1] = pre[i] + a[i]; with sum(l,r) = pre[r+1] - pre[l]—prefix sums
  5. Use s += c not s = s + c; use (int)v.size() not bare v.size()—avoid the two most common bugs

Before You Code Checklist

Before writing your solution to an array/string problem, verify:

  • [ ] Bounds: What is the maximum n? Does O(n2) fit, or do you need O(nlogn) or O(n)?
  • [ ] Type: Will values exceed int? Use long long for sums and products. Check intermediate expressions too.
  • [ ] Edge cases: Empty array? Single element? All elements identical? String of length 1?
  • [ ] Indexing convention: 0-indexed or 1-indexed? Stick with one. If the problem uses 1-indexed, decide whether to convert on input.
  • [ ] Output format: Spaces between elements? Trailing space OK? Newline at end? Read the output specification twice.

Gotchas and Debugging

These are the bugs that cost rounds. Most array and string errors fall into a handful of recurring patterns—learn to recognize them before they bite.

Off-by-One Errors

  • v.size() returns size_t (unsigned). v.size() - 1 when v is empty gives a huge number, not 1. Cast to int or check v.empty() first.
  • 0-indexed vs 1-indexed: pick one convention per problem and stick with it. Most CF editorial code uses 0-indexed arrays but 1-indexed in the problem statement.

String Concatenation in a Loop

cpp
// BAD: O(n^2) total -- each += copies the entire string
string result;
for(int i = 0; i < n; i++) result += to_string(a[i]) + " ";

// BETTER: pre-reserve or use ostringstream
string result;
result.reserve(n * 12);
for(int i = 0; i < n; i++){
    result += to_string(a[i]);
    result += ' ';
}

In practice, for competitive programming, just printing in a loop is fine. Only worry about this if you're building a large string to use later.

Iterator Invalidation

  • push_back may reallocate, invalidating all iterators to the vector.
  • erase invalidates iterators at and after the erased position.
  • Never do for(auto it = v.begin(); it != v.end(); ++it) if(*it == x) v.erase(it);—use the erase-remove idiom instead.

The crash scenario—push_back during iteration:

cpp
// BUG: Undefined Behavior! push_back may reallocate, invalidating `it`
vector<int> v = {1, 2, 3, 4, 5};
for(auto it = v.begin(); it != v.end(); ++it){
    if(*it % 2 == 0)
        v.push_back(*it * 10);  // UB: reallocation invalidates `it`
}
// May crash, may silently corrupt data, may appear to "work" -- all are UB
text
  What happens when push_back triggers reallocation:

  Before push_back:
  v: +---+---+---+---+---+
     | 1 | 2 | 3 | 4 | 5 |  capacity=5
     +---+---+---+---+---+
             ^
             it (valid pointer into this block)

  After push_back(20) triggers realloc:
  OLD block (freed!):
     +---+---+---+---+---+
     | ? | ? | ? | ? | ? |  <-- deallocated memory!
     +---+---+---+---+---+
             ^
             it still points here --> DANGLING POINTER

  NEW block:
     +---+---+---+---+---+----+---+---+---+---+
     | 1 | 2 | 3 | 4 | 5 | 20 |   |   |   |   |  capacity=10
     +---+---+---+---+---+----+---+---+---+---+

Fix 1—use index-based loop (preferred in CP):

cpp
vector<int> v = {1, 2, 3, 4, 5};
int orig_size = (int)v.size();
// Invariant: i scans only the original elements; new elements appended beyond orig_size
for(int i = 0; i < orig_size; i++){
    if(v[i] % 2 == 0)
        v.push_back(v[i] * 10);  // safe: index is not invalidated by realloc
}
// v = {1, 2, 3, 4, 5, 20, 40}

Fix 2—reserve enough capacity upfront:

cpp
vector<int> v = {1, 2, 3, 4, 5};
v.reserve(v.size() * 2);  // guarantee no reallocation
// Now iterators remain valid as long as we don't exceed reserved capacity
for(auto it = v.begin(); it != v.end(); ++it){
    // still risky if end() changes -- prefer index-based loops
}

Rule of thumb for CP: Avoid iterator-based loops when modifying the container. Use for(int i = 0; i < ...)—indices survive reallocation, iterators don't. See STL Containers for more on iterator validity rules across container types.

char Signedness

  • char is signed on most x86 compilers. s[i] - 'a' can be negative if s[i] is not a lowercase letter. Always validate input range or cast to unsigned char.

Multi-Test Case Pitfalls

  • Global arrays / vectors must be cleared or resized each test case. Prefer declaring inside the while(t--) loop.
  • If using global int a[MAXN], use fill(a, a+n, 0)—don't memset to arbitrary values (only works for 0 and -1).

getline after cin >>

cpp
int n;
cin >> n;
// cin >> leaves '\n' in the buffer
string s;
getline(cin, s);  // reads empty string!
// Fix:
cin.ignore();     // or: cin >> ws;
getline(cin, s);  // now reads the actual line

Integer Overflow

  • v.size() is size_t (usually unsigned long long). Arithmetic like v.size() - 2 can underflow.
  • Array indices multiplied together: a[i * n + j]—if i and n are int, the product can overflow for n>46340.

Comparing Strings

  • == on strings is O(n), not O(1). In tight loops, consider hashing.
  • Lexicographic comparison (<) short-circuits on first difference—usually fast in practice.

Mental Traps

Confusing index with value. When finding the maximum element, it is easy to store the result ambiguously. "The index of the max element" and "the max element" are two different things. Use distinct variable names (max_idx vs max_val) to prevent subtle bugs where you return a position when you meant a value, or vice versa.

text
  WRONG -- ambiguous variable
  +---+---+---+---+---+
  | 8 | 3 | 9 | 1 | 5 |
  +---+---+---+---+---+
    0   1   2   3   4

  max = 2     index or value?

  RIGHT -- distinct names
  +---+---+---+---+---+
  | 8 | 3 | 9 | 1 | 5 |
  +---+---+---+---+---+
    0   1   2   3   4
            ^
  max_idx = 2    <-- position
  max_val = 9    <-- value at position

"I already know arrays." This overconfidence is exactly what produces a string of off-by-one bugs in the first 10 minutes of a contest. The fact that you know arrays does not mean you reason about them carefully under pressure. Slow down at every boundary condition—explicitly verify your loop bounds, your slice endpoints, and your termination conditions before submitting.

text
  WRONG -- casual loop bound
  +---+---+---+---+---+---+
  | 3 | 1 | 4 | 1 | 5 | 9 |  n = 6
  +---+---+---+---+---+---+--
    0   1   2   3   4   5   6
                            ^
                    i reaches here -- out of bounds

  RIGHT -- half-open range [0, n)
  +---+---+---+---+---+---+
  | 3 | 1 | 4 | 1 | 5 | 9 |
  +---+---+---+---+---+---+
    0   1   2   3   4   5
    ^                       ^
    |<--- valid range ----->|
    i from 0       stops before n

Mutating a string through a reference when you wanted a copy. In C++, strings are mutable, so passing by reference and modifying inside a function silently changes the caller's data. If you intended to work on a copy, pass by value. This mistake is invisible during testing when you only call the function once, and only surfaces with multiple calls or when the caller reuses the string.

text
  WRONG -- pass by ref mutates caller
  caller  +---+---+---+---+
          | a | b | c | d |
          +---+---+---+---+
               |
               v  (ref -- same object)
  func    +---+---+---+---+
          | a | # | c | d |  caller sees # too!
          +---+---+---+---+

  RIGHT -- pass by value keeps original
  caller  +---+---+---+---+
          | a | b | c | d |  <-- unchanged
          +---+---+---+---+
               |
               v  (copy -- new object)
  func    +---+---+---+---+
          | a | # | c | d |  only the copy changed
          +---+---+---+---+

The Mistake That Teaches

It was a Div. 2 round, problem B, rated 1200. The task: given a string, build the lexicographically smallest string by appending characters from either end. My solution: loop, compare s.front() vs s.back(), append the smaller one, pop it.

It passed pretests. I moved on to C. Thirty minutes later: "Wrong Answer on test 47." I stared at the code. The logic seemed airtight. Then I spotted it: when front and back were equal, I always took the front—but the correct choice depends on which interior characters come next. I needed to compare the remaining string with its reverse, not just the endpoints.

But the real lesson wasn't the logic bug. It was that I'd written s = s.substr(1) to "pop the front." On a string of length 200,000, that's O(n) per step—making my solution O(n2). Even after fixing the logic, it TLEd. The fix: use two pointers lo and hi into the original string, never copying.

Two bugs in one solution. The logic bug was the one I noticed; the performance bug was the one that cost me the round. Now I always ask: "Am I copying a string inside a loop?"

See: Debugging and Stress Testing for systematic approaches to finding these bugs


When NOT to Use This

Arrays and strings are the default, but sometimes they're the wrong tool:

SituationWhy Arrays FailUse Instead
Frequent insertions/deletions in the middleO(n) shifting per operationset, multiset, or a balanced BST
Need O(logn) rank/select queriesArrays only support O(n) search (unsorted) or O(logn) search (sorted, but insert breaks it)Policy-based tree (order_of_key) or segment tree
FIFO queue behavior (push back, pop front)vector pop_front is O(n)deque or queue
Key-value lookups by non-integer keysCan't index by string or large integerunordered_map or map
Persistent/versioned dataModifying an array destroys previous statePersistent segment tree or copy-on-write
Need set intersection/unionManual merge is error-proneset_intersection on sorted vectors, or use bitset

In CP, use vector by default. Only switch when the problem demands an operation that arrays can't do efficiently. See STL Containers for the full container reference, or the Data Structure Selection Guide for a quick decision flowchart.


What Would You Do If...?

Scenario 1: You're solving a problem that says "find all pairs (i,j) with a[i]+a[j]=K." The naive O(n^2) approach TLEs at n=200,000.

Sort the array, then use two pointers from both ends. Each pointer moves at most n times total, so it's O(n log n) for the sort + O(n) for the scan. Alternatively, use a hash set: for each element x, check if K-x has been seen.

Scenario 2: Your string solution passes on small tests but TLEs on n=500,000. You're building a result string character by character.

Check: Are you using s = s + c anywhere? Switch to s += c. Are you calling s.substr() inside a loop? Replace with index tracking. Are you calling s.find() in a loop? That's O(n*m) per call—switch to string hashing or KMP.

Scenario 3: You need to process queries: "how many elements in the array fall in range [L, R]?" with 200,000 queries on an array of 200,000 elements.

A raw array can't answer range queries faster than O(n) per query. Sort the array and use lower_bound/upper_bound for O(log n) per query. If the array is modified between queries, you need a segment tree or BIT.


Self-Test

Before moving on, verify you can do each of these without looking back at the text:

  • [ ] Write a loop that iterates over every adjacent pair (a[i], a[i+1]) in an array of n elements, without accessing out-of-bounds memory.
  • [ ] Extract the substring s[l..r] (inclusive) using std::string::substr—get the arguments right on the first try.
  • [ ] Count the frequency of each lowercase letter in a string in O(n) time using an array, not a map.
  • [ ] Explain why s.size() - 1 is dangerous when s is empty and how to guard against it.
  • [ ] Given a 2D array int a[N][M], state which loop order is cache-friendly and which is not.
  • [ ] Given int max_idx and int max_val, explain which stores a position and which stores a value—then write correct code to find both for an array.
  • [ ] Describe what happens when you call v.push_back(x) while iterating over v with an iterator—and name the safe alternative.

Debug-This Exercises

Find the bug in each snippet. Answers are below the fold.

Bug 1: The disappearing elements

cpp
vector<int> v = {1, 2, 3, 4, 5, 6};
for(int i = 0; i < (int)v.size(); i++){
    if(v[i] % 2 == 0)
        v.erase(v.begin() + i);
}
// Expected: {1, 3, 5}  -- what do you actually get?

Bug 2: The infinite builder

cpp
string build(int n){
    string s;
    for(int i = 0; i < n; i++)
        s = s + (char)('a' + i % 26);
    return s;
}
// Compiles and gives correct output. What's the bug?

Bug 3: The subtle overflow

cpp
vector<int> v = {10, 20, 30};
for(int i = 0; i < v.size() - 1; i++){
    cout << v[i] << " ";
}
// Works fine. Now what happens if v is empty?

Bug 4: The wrong reverse

cpp
string s = "abcdef";
for(int i = 0; i <= (int)s.size(); i++){
    swap(s[i], s[s.size() - i]);
}
// What's wrong? (Two bugs!)
Answers (click to reveal)

Bug 1: After erasing element at index i, the next element shifts into position i, but i++ skips it. Result: {1, 3, 5, 6}—the 6 survives because 4 was erased at index 3, then i advances to 4 which is now past the end. Fix: don't increment i after an erase, or use the erase-remove idiom.

Bug 2: s = s + (char)(...) creates a temporary string each iteration, copying all of s. Total work: O(n2). For n=106, this TLEs. Fix: use s += (char)('a' + i % 26) for amortized O(1) per append.

Bug 3: v.size() is size_t (unsigned). When v is empty, v.size() - 1 wraps to ~0ULL (about 1.8×1019), and the loop runs until segfault. Fix: for(int i = 0; i < (int)v.size() - 1; i++) or check v.empty() first.

Bug 4: Two bugs: (a) s.size() - 0 = 6 which is out of bounds (valid indices are 0–5), should be s.size() - 1 - i; (b) the loop condition i <= s.size() should be i < s.size() / 2—otherwise you swap elements back to their original positions in the second half. Fix: for(int i = 0; i < (int)s.size()/2; i++) swap(s[i], s[s.size()-1-i]);


Practice Problems

#ProblemSourceDifficultyKey Idea
1WatermelonCF 4A800Basic if-statement, array boundary
2WordCF 59A800Character frequency, case conversion
3Distinct NumbersCSES~900Sort + unique
4Palindrome ReorderCSES~1000Frequency counting, string building
5Yet Another Palindrome ProblemCF 1324C1100Frequency + position tracking
6Make It GoodCF 1385C1200Suffix analysis, array structure
7Good SubarraysCF 1398C1600Prefix sums on transformed array
8Maximum Subarray SumCSES~1600Kadane's algorithm
9Xenia and Colorful GemsCF 1336C1700Sort + binary search on arrays
10Longest Palindromic PrefixCF 1326D22200String hashing + palindrome

For a structured progression through array and string problems, see the Practice Ladder (1100–1400).


Further Reading

Related guidebook entries:


Next Up: STL Containersset, map, deque, priority_queue, and when each one beats a plain vector.

Built for the climb from Codeforces 1100 to 2100.