Skip to content

Hash Map and Unordered Map

Fast key-value lookups in O(1) average. When the problem is really "remember what I've seen" or "jump straight to the bucket that owns this key," this is the default tool for contests.

Difficulty: Beginner | Prerequisites: Arrays and Strings | CF Rating Range: 800–2100


Intuition

You have six user IDs and need to find which ones appear more than once:

text
a = [5, 3, 8, 3, 12, 5]

The naive approach: compare each element with everything that comes after it.

text
i=0 (5):  compare with 3, 8, 3, 12, 5   -->  5 comparisons  (match at j=5)
i=1 (3):  compare with 8, 3, 12, 5       -->  4 comparisons  (match at j=3)
i=2 (8):  compare with 3, 12, 5          -->  3 comparisons  (no match)
i=3 (3):  compare with 12, 5             -->  2 comparisons  (already counted)
i=4 (12): compare with 5                 -->  1 comparison   (no match)
                                              ---------------
                                 Total:       15 comparisons

For n=6, that is n(n1)2=15 comparisons. For n=105, that balloons to roughly 5×109—far too slow for a 2-second time limit. And this is just duplicate detection; frequency counting is equally painful with the same approach.

The PO Box Idea

If you can compute where a key lives in constant time, you never need to search for it.

Think of a post office with numbered PO boxes. When a letter arrives for box 137, the clerk doesn't inspect every box on the wall—she goes straight to 137. A hash map works the same way: a hash function h(k) turns a key k into a bucket index, and lookup becomes a direct jump instead of a scan.

The analogy has one crack: PO boxes have one owner each, but hash buckets can hold multiple keys. That is a collision, and it means the average O(1) lookup can degrade to O(chain length). In the worst case, when everything lands in one bucket, every operation becomes O(n).

Trace: [5, 3, 8, 3, 12, 5] into 7 Buckets

Count the frequency of each element in a table with m=7 buckets and hash h(k)=kmod7.

Step 1: Insert a[0] = 5. h(5)=5. Bucket 5 empty. Store 5 : 1.

text
  Bucket   Contents
  +-----+-----------+
  | [5] | 5 : 1     |  <-- inserted
  +-----+-----------+

Step 2: Insert a[1] = 3. h(3)=3. Bucket 3 empty. Store 3 : 1.

Step 3: Insert a[2] = 8. h(8)=1. Bucket 1 empty. Store 8 : 1.

text
  Bucket   Contents
  +-----+-----------+
  | [1] | 8 : 1     |  <-- inserted
  | [3] | 3 : 1     |
  | [5] | 5 : 1     |
  +-----+-----------+

Step 4: Insert a[3] = 3. h(3)=3mod7=3. Bucket 3 already has key 3 with count 1. Key matches—increment count to 2. Duplicate found!

text
  Bucket   Contents
  +-----+-----------+
  | [0] | (empty)   |
  | [1] | 8 : 1     |
  | [2] | (empty)   |
  | [3] | 3 : 2     |  <-- count updated (duplicate!)
  | [4] | (empty)   |
  | [5] | 5 : 1     |
  | [6] | (empty)   |
  +-----+-----------+
  ops so far: 4

Step 5: Insert a[4] = 12. h(12)=12mod7=5. Bucket 5 has key 5—but 125, so this is a collision. Chain key 12 after key 5 in the same bucket. Store 12 -> 1.

text
  Bucket   Contents
  +-----+---------------------+
  | [1] | 8 : 1               |
  | [3] | 3 : 2               |
  | [5] | 5 : 1  -->  12 : 1  |  <-- collision chain
  +-----+---------------------+

Step 6: Insert a[5] = 5. h(5)=5mod7=5. Bucket 5 has chain 5:1 -> 12:1. First entry matches key 5—increment count to 2. Duplicate found!

text
  Bucket   Contents
  +-----+---------------------+
  | [1] | 8 : 1               |
  | [3] | 3 : 2               |
  | [5] | 5 : 2  -->  12 : 1  |
  +-----+---------------------+

Final tally: 7 hash-table operations for 6 elements, versus 15 brute-force comparisons. With a good hash function, each insertion and lookup averages O(1), giving O(n) total work. The gap only widens: brute force is O(n2); the hash table stays O(n).

The Invariant

text
+-------------------------------------------------------------------+
| Every key k in the table lives in bucket h(k) % m.                |
| To find, insert, or update k, we only examine bucket h(k) % m     |
| and its collision chain — never the entire table.                 |
+-------------------------------------------------------------------+

This invariant holds because every insertion computes the hash before deciding where to store the key. No key is ever placed in the wrong bucket.

Look at Step 4: when we insert the second 3, we compute h(3)=3 and go straight to bucket 3. The invariant guarantees the first 3 is already there, so we find it in one probe and increment its count. Without this invariant—if keys could drift to arbitrary buckets—we'd be back to scanning everything.

The invariant degrades when all keys hash to the same bucket. That single bucket's chain grows to length n, and every operation becomes O(n). This is exactly what anti-hash attacks exploit: adversaries craft inputs so h(k)modm is identical for every key. The defense is a custom hash with random salt so the attacker cannot predict bucket assignments.

operator[] Creates Entries Silently

operator[] is a trap: it doesn't just read, it writes.

When you write if (m[key] == 0), you are not just checking—you are inserting key with value 0 if it did not exist. After that line, m.size() has grown by one. In a loop that tests millions of nonexistent keys, this silently fills the map with garbage entries and causes MLE or wrong answers.

Rule: use m.count(key), m.find(key) != m.end(), or C++20's m.contains(key) for existence checks. Never use m[key] unless you intend to create the entry.

Anti-Hash Attacks

The default std::hash<int> on many compilers is close to the identity function (it returns the key itself or a trivial transform). The STL typically uses a power-of-2 bucket count m=2k, so the bucket index is hash(key) & (m - 1)—just the low-order bits. An attacker who knows m can craft n keys that all share the same low bits, forcing every key into one bucket:

text
  Normal distribution (good hash):       Anti-hash attack (identity hash):

  Bucket  Chain                           Bucket  Chain
  [0]  -> A                               [0]  -> K1 -> K2 -> ... -> Kn
  [1]  -> B -> C                          [1]  -> (empty)
  [2]  -> (empty)                         [2]  -> (empty)
  [3]  -> D                               [3]  -> (empty)
  [4]  -> E -> F                          ...
  [5]  -> (empty)                         [m-1]-> (empty)
  ...
  Avg lookup: O(1)                        Every lookup: O(n)  --> TLE!

  How it happens:
    default hash(k) ≈ k                (identity-like)
    bucket_count m  = 2^k              (power of 2)
    bucket index    = hash(k) & (m-1)  (low bits)
    Attacker picks 0, m, 2m, 3m, ...   (all share low bits)
    n = 10^5 keys × 10^5 chain = 10^10 ops --> TLE

This is not theoretical—it happens on Codeforces regularly. The defense is always the same: use a custom hash with a random salt so bucket assignments are unpredictable (see the custom hash implementation below).

Chaining vs. Open Addressing

The two standard collision strategies, side by side:

text
  CHAINING                                OPEN ADDRESSING
  Buckets     Linked lists                Buckets (inline storage)
  +-----+     +---+   +---+              +-----+-----+-----+-----+-----+
  | [0] | --> | A | ->| D |              |  A  |  B  | gap |  D  |  C  |
  +-----+     +---+   +---+              +-----+-----+-----+-----+-----+
  | [1] | --> | B |                       Probe sequence: h(k), h(k)+1, ...
  +-----+     +---+
  | [2] |     (empty)                     + Better cache locality
  +-----+                                 + No pointer chasing
  | [3] | --> | C |                       - Harder to delete
  +-----+     +---+                       - Degrades at high load factor

  + Simple, handles high load             See: gp_hash_table from <ext/pb_ds>
  - Pointer chasing = cache misses
  - High memory overhead per node

"unordered_map is always faster" is a common trap. Prefer map—a balanced BST with guaranteed O(logn)—when:

  • You need sorted iteration or lower_bound / upper_bound queries.
  • You need range queries ("how many keys in [L,R]?").
  • n is small (105)—map can have better cache locality for small node counts and no hash overhead, so it can beat unordered_map in practice.
  • Contest safetymap has O(logn) worst case. No anti-hash attack can degrade it. When in doubt during a contest, map is the safer default.
text
  +---------------------------------------------------+
  |  DECISION: map vs unordered_map                   |
  +---------------------------------------------------+
  |  Need sorted order / lower_bound?                 |
  |    YES --> use map                                |
  |    NO  |                                          |
  |        v                                          |
  |  Keys are ints in [0, C), C <= 10^7?              |
  |    YES --> use vector<int> freq(C, 0)             |
  |    NO  |                                          |
  |        v                                          |
  |  n > 10^5 and need max speed?                     |
  |    YES --> unordered_map + custom_hash + reserve  |
  |    NO  |                                          |
  |        v                                          |
  |  Default: map (safe, predictable O(log n))        |
  +---------------------------------------------------+

STL Reference

ContainerHeaderDescriptionAvg LookupWorst Lookup
unordered_map<K,V><unordered_map>Hash table, key-value pairsO(1)O(n)
unordered_set<K><unordered_set>Hash table, keys only (no values)O(1)O(n)
map<K,V><map>Balanced BST, key-value pairsO(logn)O(logn)
set<K><set>Balanced BST, keys onlyO(logn)O(logn)

One pitfall to watch: after m.reserve(1 << 20) you have about 1M buckets. Calling m.clear() visits every bucket even if the map holds only 10 entries. In a tight loop that repeatedly fills and clears a map, this hidden cost can cause TLE. Alternatives: (a) swap with a fresh map (m = {};), or (b) track inserted keys in a side vector and erase only those.

OperationSyntaxNotes
Insert / updatem[key] = val;Creates entry if key absent (default-constructs value)
Insert without overwritem.insert({key, val});No-op if key exists
Lookup (unsafe)m[key]Inserts default if missing! Don't use for existence checks
Lookup (safe)m.count(key) or m.find(key) != m.end()Does not insert
Lookup (C++20)m.contains(key)Cleanest existence check
Deletem.erase(key);No-op if key absent
Sizem.size()
Iteratefor (auto& [k,v] : m)Structured bindings (C++17)
Reserve bucketsm.reserve(n);Pre-allocate to avoid rehashing
Clearm.clear();O(bucket_count), not O(size)—see Gotchas

Implementation

Frequency Counting

cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    unordered_map<int, int> freq;
    freq.reserve(n);

    for (int i = 0; i < n; i++) {
        int x;
        cin >> x;
        freq[x]++;
    }

    // Find the most frequent element
    int best_key = 0, best_cnt = 0;
    for (auto& [key, cnt] : freq) {
        if (cnt > best_cnt) {
            best_cnt = cnt;
            best_key = key;
        }
    }

    cout << best_key << " appears " << best_cnt << " times\n";
}

Custom Hash — Anti-Hack Safe

Use this in every CF contest where you reach for unordered_map or unordered_set. FIXED_RANDOM is computed once per run from the system clock, so the attacker cannot predict your bucket assignments.

cpp
#include <bits/stdc++.h>
using namespace std;

struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

// For pair<int,int> keys (e.g., coordinate mapping)
struct pair_hash {
    size_t operator()(pair<int,int> const& p) const {
        custom_hash h;
        auto a = h(p.first);
        auto b = h(p.second);
        return a ^ (b << 32 | b >> 32);
    }
};

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;

    unordered_map<long long, int, custom_hash> freq;
    freq.reserve(1 << 20);  // ~1M buckets, avoids rehashing on most contest sizes

    for (int i = 0; i < n; i++) {
        long long x;
        cin >> x;
        freq[x]++;
    }

    // Coordinate map — keys are pair<int,int>
    unordered_map<pair<int,int>, int, pair_hash> coord;
    coord.reserve(1 << 20);
    coord[{3, 5}] = 42;

    for (auto& [key, val] : freq)
        cout << key << " : " << val << "\n";
}

Operations Reference

Operationunordered_map (avg)unordered_map (worst)map
InsertO(1)O(n)O(logn)
LookupO(1)O(n)O(logn)
DeleteO(1)O(n)O(logn)
lower_boundN/AN/AO(logn)
Iterate allO(n)O(n)O(n) sorted

Space Complexity

ContainerSpaceNotes
unordered_mapO(n)~50-80 bytes per entry (high constant)
mapO(n)~40-50 bytes per entry
unordered_setO(n)Less than map variant (no value stored)

Practical note: unordered_map uses 3–5× more memory than a plain array. If keys are small integers in [0,C), prefer a plain vector<int> freq(C, 0)—it is faster and uses less memory.


Problem Patterns

Frequency counting. "How many times does each element appear?" / "Most frequent." The base pattern.

cpp
unordered_map<int, int, custom_hash> freq;
for (int x : a) freq[x]++;

CF 4C—Registration System (1300). Frequency counting with string keys. CF 1676F—Longest Strike (1300). Frequency map + sweep.

Two-sum / complement lookup. For each element, check whether its complement already exists in the map. Check seen[target - x] before inserting x to avoid pairing an element with itself (unless the problem allows i=j).

cpp
unordered_map<int, int, custom_hash> seen;
long long ans = 0;
for (int x : a) {
    ans += seen[target - x];
    seen[x]++;
}

CF 1520D—Same Differences (1200). Transform to aii, count pairs. CSES 1640—Sum of Two Values.

Always clarify whether the problem wants ordered or unordered pairs, and whether i=j is allowed. The hash map doesn't care—you have to.

Coordinate / state mapping. Map composite keys—pairs, tuples, strings—to values. Common in grid problems and state-space search.

cpp
unordered_map<pair<int,int>, int, pair_hash> last_seen;
// or use map<pair<int,int>, int> for simplicity

CF 1296C—Yet Another Walking Robot (1500). Coordinate pair as key, cycle detection.

First / last index tracking. Store the first (or most recent) occurrence index of each value.

cpp
unordered_map<int, int, custom_hash> first_idx;
for (int i = 0; i < n; i++) {
    if (!first_idx.contains(a[i]))
        first_idx[a[i]] = i;
}

CF 782A—Andryusha and Socks (800).

Canonical signature as key. Normalize each element to a canonical form, then group by that form.

cpp
// Group numbers by their "complement" prime factorization
map<vector<int>, int> sig_count;

CF 1225D—Power Products (1700). Prime signature as map key, complement lookup.

String set deduplication. Store substrings or hashes in a set to count distinct matches.

cpp
unordered_set<string> seen;
for (int i = 0; i < n; i++) {
    string sub = s.substr(i, len);
    if (is_good(sub)) seen.insert(sub);
}

CF 271D—Good Substrings (1500).

When Not to Use

  • Need sorted iteration or lower_bound / upper_bound. Hash maps have no ordering. Use map or set.
  • Keys are small integers in [0,C), C107. A plain vector<int> freq(C, 0) is faster and uses far less memory.
  • Range queries ("how many keys fall in [L,R]?"). Hash tables cannot answer range queries. Use a balanced BST (set / map), Fenwick tree, or segment tree.
  • Contest with aggressive hacking and you are using default hash. You will get hacked. Switch to map or add custom_hash.

Gotchas

m[key] silently inserts. The #1 beginner bug—covered in detail above. Use count, find, or contains for existence checks.

Common Mistakes & Debugging

Forgetting .reserve() causes repeated rehashing. Without reserve, the container rebuilds from scratch each time it crosses the load-factor threshold—each rehash is O(n) and can happen many times during a single insertion loop. m.reserve(1 << 20) avoids this for most contest sizes.

m.clear() is O(bucket_count), not O(size). After m.reserve(1 << 20), clearing visits $\sim$1M buckets even if only 5 keys were inserted. In a T-test-case loop that clears a pre-reserved map each round, total cost is O(Tbucket_count)—easily TLE. Either skip reserve when clearing per test, declare a fresh unordered_map per case, or track inserted keys in a side vector and erase only those.

Iteration order is non-deterministic. unordered_map order depends on hash values and bucket count—do not rely on insertion order. If you need ordered output, collect into a vector and sort.

erase(key) removes the entire key, not one occurrence. This trips up anyone who treats a frequency map like a multiset:

cpp
unordered_map<int,int> freq;
for (int x : a) freq[x]++;
freq.erase(5);                        // wipes key 5 entirely
if (--freq[5] == 0) freq.erase(5);    // decrement then cleanup — correct

Erasing during iteration invalidates the iterator. Use the erase-return-iterator idiom:

cpp
for (auto it = m.begin(); it != m.end(); ) {
    if (should_remove(it->second))
        it = m.erase(it);
    else
        ++it;
}

No default hash for pair, tuple, or custom structs. You must provide a hash functor (see pair_hash in Implementation) or use map instead. A quick alternative for pair<int,int>: encode as a single long long with (uint64_t)a << 32 | (uint32_t)b.

Mental Traps

"unordered_map is always faster than map." For small n or pointer-heavy workloads, map can win. A red-black tree node sits in one allocation; a hash table chains through bucket pointers, linked-list nodes, and potential rehashes. On n=104 integer insertions you may see map win by 10–30% due to fewer cache misses and zero hashing overhead. The crossover where unordered_map reliably pulls ahead is typically around n=105106. Default to map in contests unless you've profiled a bottleneck and added custom_hash + reserve.

"O(1) average means safe." O(1) average hides the O(n) worst case. On Codeforces with hacking enabled, adversaries craft inputs that force every key into one bucket—turning your hash map into a linked list. The mechanism is detailed in Anti-Hash Attacks above. Always use custom_hash with a random salt on CF, or just use map for guaranteed O(logn).

"clear() is cheap." m.clear() is O(bucket_count), not O(size). After m.reserve(1 << 20), clearing visits ~1M buckets even if only 5 keys were inserted. In a loop over T test cases each calling clear() on a pre-reserved map, total cost is O(Tbucket_count)—easily TLE. Either skip reserve if you clear per test case, or erase keys manually from a tracked list.


Self-Test

Before moving on, verify you can answer these without looking back:

  • [ ] What happens when you write m[key] and key is not in the map? Why is this dangerous for existence checks?
  • [ ] Why does std::unordered_map with default hash get TLE on Codeforces? Describe the attack mechanism in one sentence.
  • [ ] Write the custom_hash struct from memory (or at least describe what splitmix64 + random salt achieves).
  • [ ] When should you prefer map over unordered_map? Name at least 3 situations.
  • [ ] What is the time complexity of m.clear() after m.reserve(1 << 20) with only 10 elements inserted? Why?
  • [ ] unordered_map has no default hash for pair<int,int>. What are your two options?
  • [ ] In the two-sum pattern, why do we check seen[target - x] before inserting x?

Practice Problems

#ProblemRatingKey Technique
1782A—Andryusha and Socks800Set membership tracking
21520D—Same Differences1200Transform aii, count pairs via map
34C—Registration System1300Frequency counting with string keys
41676F—Longest Strike1300Frequency map + sweep
51296C—Yet Another Walking Robot1500Coordinate pair as key, cycle detection
6271D—Good Substrings1500String set deduplication
71225D—Power Products1700Prime signature as map key
8CSES 1640—Sum of Two ValuesComplement lookup
9CSES 1641—Sum of Three ValuesFix one, hash the other two
10CSES 1642—Sum of Four ValuesPairwise sum hash map
11CSES 1660—Subarray Sums IPrefix sum + hash map

Further Reading

Related topics:

Built for the climb from Codeforces 1100 to 2100.