Appearance
Hash Map and Unordered Map
Fast key-value lookups in
Difficulty: Beginner | Prerequisites: Arrays and Strings | CF Rating Range: 800–2100
Intuition
You have six user IDs and need to find which ones appear more than once:
text
a = [5, 3, 8, 3, 12, 5]The naive approach: compare each element with everything that comes after it.
text
i=0 (5): compare with 3, 8, 3, 12, 5 --> 5 comparisons (match at j=5)
i=1 (3): compare with 8, 3, 12, 5 --> 4 comparisons (match at j=3)
i=2 (8): compare with 3, 12, 5 --> 3 comparisons (no match)
i=3 (3): compare with 12, 5 --> 2 comparisons (already counted)
i=4 (12): compare with 5 --> 1 comparison (no match)
---------------
Total: 15 comparisonsFor
The PO Box Idea
If you can compute where a key lives in constant time, you never need to search for it.
Think of a post office with numbered PO boxes. When a letter arrives for box 137, the clerk doesn't inspect every box on the wall—she goes straight to 137. A hash map works the same way: a hash function
The analogy has one crack: PO boxes have one owner each, but hash buckets can hold multiple keys. That is a collision, and it means the average
Trace: [5, 3, 8, 3, 12, 5] into 7 Buckets
Count the frequency of each element in a table with
Step 1: Insert a[0] = 5. 5 : 1.
text
Bucket Contents
+-----+-----------+
| [5] | 5 : 1 | <-- inserted
+-----+-----------+Step 2: Insert a[1] = 3. 3 : 1.
Step 3: Insert a[2] = 8. 8 : 1.
text
Bucket Contents
+-----+-----------+
| [1] | 8 : 1 | <-- inserted
| [3] | 3 : 1 |
| [5] | 5 : 1 |
+-----+-----------+Step 4: Insert a[3] = 3.
text
Bucket Contents
+-----+-----------+
| [0] | (empty) |
| [1] | 8 : 1 |
| [2] | (empty) |
| [3] | 3 : 2 | <-- count updated (duplicate!)
| [4] | (empty) |
| [5] | 5 : 1 |
| [6] | (empty) |
+-----+-----------+
ops so far: 4Step 5: Insert a[4] = 12. 12 -> 1.
text
Bucket Contents
+-----+---------------------+
| [1] | 8 : 1 |
| [3] | 3 : 2 |
| [5] | 5 : 1 --> 12 : 1 | <-- collision chain
+-----+---------------------+Step 6: Insert a[5] = 5. 5:1 -> 12:1. First entry matches key 5—increment count to 2. Duplicate found!
text
Bucket Contents
+-----+---------------------+
| [1] | 8 : 1 |
| [3] | 3 : 2 |
| [5] | 5 : 2 --> 12 : 1 |
+-----+---------------------+Final tally: 7 hash-table operations for 6 elements, versus 15 brute-force comparisons. With a good hash function, each insertion and lookup averages
The Invariant
text
+-------------------------------------------------------------------+
| Every key k in the table lives in bucket h(k) % m. |
| To find, insert, or update k, we only examine bucket h(k) % m |
| and its collision chain — never the entire table. |
+-------------------------------------------------------------------+This invariant holds because every insertion computes the hash before deciding where to store the key. No key is ever placed in the wrong bucket.
Look at Step 4: when we insert the second 3, we compute 3 is already there, so we find it in one probe and increment its count. Without this invariant—if keys could drift to arbitrary buckets—we'd be back to scanning everything.
The invariant degrades when all keys hash to the same bucket. That single bucket's chain grows to length
operator[] Creates Entries Silently
operator[] is a trap: it doesn't just read, it writes.
When you write if (m[key] == 0), you are not just checking—you are inserting key with value 0 if it did not exist. After that line, m.size() has grown by one. In a loop that tests millions of nonexistent keys, this silently fills the map with garbage entries and causes MLE or wrong answers.
Rule: use m.count(key), m.find(key) != m.end(), or C++20's m.contains(key) for existence checks. Never use m[key] unless you intend to create the entry.
Anti-Hash Attacks
The default std::hash<int> on many compilers is close to the identity function (it returns the key itself or a trivial transform). The STL typically uses a power-of-2 bucket count hash(key) & (m - 1)—just the low-order bits. An attacker who knows
text
Normal distribution (good hash): Anti-hash attack (identity hash):
Bucket Chain Bucket Chain
[0] -> A [0] -> K1 -> K2 -> ... -> Kn
[1] -> B -> C [1] -> (empty)
[2] -> (empty) [2] -> (empty)
[3] -> D [3] -> (empty)
[4] -> E -> F ...
[5] -> (empty) [m-1]-> (empty)
...
Avg lookup: O(1) Every lookup: O(n) --> TLE!
How it happens:
default hash(k) ≈ k (identity-like)
bucket_count m = 2^k (power of 2)
bucket index = hash(k) & (m-1) (low bits)
Attacker picks 0, m, 2m, 3m, ... (all share low bits)
n = 10^5 keys × 10^5 chain = 10^10 ops --> TLEThis is not theoretical—it happens on Codeforces regularly. The defense is always the same: use a custom hash with a random salt so bucket assignments are unpredictable (see the custom hash implementation below).
Chaining vs. Open Addressing
The two standard collision strategies, side by side:
text
CHAINING OPEN ADDRESSING
Buckets Linked lists Buckets (inline storage)
+-----+ +---+ +---+ +-----+-----+-----+-----+-----+
| [0] | --> | A | ->| D | | A | B | gap | D | C |
+-----+ +---+ +---+ +-----+-----+-----+-----+-----+
| [1] | --> | B | Probe sequence: h(k), h(k)+1, ...
+-----+ +---+
| [2] | (empty) + Better cache locality
+-----+ + No pointer chasing
| [3] | --> | C | - Harder to delete
+-----+ +---+ - Degrades at high load factor
+ Simple, handles high load See: gp_hash_table from <ext/pb_ds>
- Pointer chasing = cache misses
- High memory overhead per node"unordered_map is always faster" is a common trap. Prefer map—a balanced BST with guaranteed
- You need sorted iteration or
lower_bound/upper_boundqueries. - You need range queries ("how many keys in
?"). is small ( )— mapcan have better cache locality for small node counts and no hash overhead, so it can beatunordered_mapin practice.- Contest safety—
maphasworst case. No anti-hash attack can degrade it. When in doubt during a contest, mapis the safer default.
text
+---------------------------------------------------+
| DECISION: map vs unordered_map |
+---------------------------------------------------+
| Need sorted order / lower_bound? |
| YES --> use map |
| NO | |
| v |
| Keys are ints in [0, C), C <= 10^7? |
| YES --> use vector<int> freq(C, 0) |
| NO | |
| v |
| n > 10^5 and need max speed? |
| YES --> unordered_map + custom_hash + reserve |
| NO | |
| v |
| Default: map (safe, predictable O(log n)) |
+---------------------------------------------------+STL Reference
| Container | Header | Description | Avg Lookup | Worst Lookup |
|---|---|---|---|---|
unordered_map<K,V> | <unordered_map> | Hash table, key-value pairs | ||
unordered_set<K> | <unordered_set> | Hash table, keys only (no values) | ||
map<K,V> | <map> | Balanced BST, key-value pairs | ||
set<K> | <set> | Balanced BST, keys only |
One pitfall to watch: after m.reserve(1 << 20) you have about 1M buckets. Calling m.clear() visits every bucket even if the map holds only 10 entries. In a tight loop that repeatedly fills and clears a map, this hidden cost can cause TLE. Alternatives: (a) swap with a fresh map (m = {};), or (b) track inserted keys in a side vector and erase only those.
| Operation | Syntax | Notes |
|---|---|---|
| Insert / update | m[key] = val; | Creates entry if key absent (default-constructs value) |
| Insert without overwrite | m.insert({key, val}); | No-op if key exists |
| Lookup (unsafe) | m[key] | Inserts default if missing! Don't use for existence checks |
| Lookup (safe) | m.count(key) or m.find(key) != m.end() | Does not insert |
| Lookup (C++20) | m.contains(key) | Cleanest existence check |
| Delete | m.erase(key); | No-op if key absent |
| Size | m.size() | |
| Iterate | for (auto& [k,v] : m) | Structured bindings (C++17) |
| Reserve buckets | m.reserve(n); | Pre-allocate to avoid rehashing |
| Clear | m.clear(); |
Implementation
Frequency Counting
cpp
#include <bits/stdc++.h>
using namespace std;
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n;
cin >> n;
unordered_map<int, int> freq;
freq.reserve(n);
for (int i = 0; i < n; i++) {
int x;
cin >> x;
freq[x]++;
}
// Find the most frequent element
int best_key = 0, best_cnt = 0;
for (auto& [key, cnt] : freq) {
if (cnt > best_cnt) {
best_cnt = cnt;
best_key = key;
}
}
cout << best_key << " appears " << best_cnt << " times\n";
}Custom Hash — Anti-Hack Safe
Use this in every CF contest where you reach for unordered_map or unordered_set. FIXED_RANDOM is computed once per run from the system clock, so the attacker cannot predict your bucket assignments.
cpp
#include <bits/stdc++.h>
using namespace std;
struct custom_hash {
static uint64_t splitmix64(uint64_t x) {
x += 0x9e3779b97f4a7c15;
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
return x ^ (x >> 31);
}
size_t operator()(uint64_t x) const {
static const uint64_t FIXED_RANDOM =
chrono::steady_clock::now().time_since_epoch().count();
return splitmix64(x + FIXED_RANDOM);
}
};
// For pair<int,int> keys (e.g., coordinate mapping)
struct pair_hash {
size_t operator()(pair<int,int> const& p) const {
custom_hash h;
auto a = h(p.first);
auto b = h(p.second);
return a ^ (b << 32 | b >> 32);
}
};
int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n;
cin >> n;
unordered_map<long long, int, custom_hash> freq;
freq.reserve(1 << 20); // ~1M buckets, avoids rehashing on most contest sizes
for (int i = 0; i < n; i++) {
long long x;
cin >> x;
freq[x]++;
}
// Coordinate map — keys are pair<int,int>
unordered_map<pair<int,int>, int, pair_hash> coord;
coord.reserve(1 << 20);
coord[{3, 5}] = 42;
for (auto& [key, val] : freq)
cout << key << " : " << val << "\n";
}Operations Reference
| Operation | unordered_map (avg) | unordered_map (worst) | map |
|---|---|---|---|
| Insert | |||
| Lookup | |||
| Delete | |||
lower_bound | N/A | N/A | |
| Iterate all |
Space Complexity
| Container | Space | Notes |
|---|---|---|
unordered_map | ~50-80 bytes per entry (high constant) | |
map | ~40-50 bytes per entry | |
unordered_set | Less than map variant (no value stored) |
Practical note: unordered_map uses 3–5× more memory than a plain array. If keys are small integers in vector<int> freq(C, 0)—it is faster and uses less memory.
Problem Patterns
Frequency counting. "How many times does each element appear?" / "Most frequent." The base pattern.
cpp
unordered_map<int, int, custom_hash> freq;
for (int x : a) freq[x]++;CF 4C—Registration System (1300). Frequency counting with string keys. CF 1676F—Longest Strike (1300). Frequency map + sweep.
Two-sum / complement lookup. For each element, check whether its complement already exists in the map. Check seen[target - x] before inserting x to avoid pairing an element with itself (unless the problem allows
cpp
unordered_map<int, int, custom_hash> seen;
long long ans = 0;
for (int x : a) {
ans += seen[target - x];
seen[x]++;
}CF 1520D—Same Differences (1200). Transform to
, count pairs. CSES 1640—Sum of Two Values.
Always clarify whether the problem wants ordered or unordered pairs, and whether
Coordinate / state mapping. Map composite keys—pairs, tuples, strings—to values. Common in grid problems and state-space search.
cpp
unordered_map<pair<int,int>, int, pair_hash> last_seen;
// or use map<pair<int,int>, int> for simplicityCF 1296C—Yet Another Walking Robot (1500). Coordinate pair as key, cycle detection.
First / last index tracking. Store the first (or most recent) occurrence index of each value.
cpp
unordered_map<int, int, custom_hash> first_idx;
for (int i = 0; i < n; i++) {
if (!first_idx.contains(a[i]))
first_idx[a[i]] = i;
}CF 782A—Andryusha and Socks (800).
Canonical signature as key. Normalize each element to a canonical form, then group by that form.
cpp
// Group numbers by their "complement" prime factorization
map<vector<int>, int> sig_count;CF 1225D—Power Products (1700). Prime signature as map key, complement lookup.
String set deduplication. Store substrings or hashes in a set to count distinct matches.
cpp
unordered_set<string> seen;
for (int i = 0; i < n; i++) {
string sub = s.substr(i, len);
if (is_good(sub)) seen.insert(sub);
}CF 271D—Good Substrings (1500).
When Not to Use
- Need sorted iteration or
lower_bound/upper_bound. Hash maps have no ordering. Usemaporset. - Keys are small integers in
, . A plain vector<int> freq(C, 0)is faster and uses far less memory. - Range queries ("how many keys fall in
?"). Hash tables cannot answer range queries. Use a balanced BST ( set/map), Fenwick tree, or segment tree. - Contest with aggressive hacking and you are using default hash. You will get hacked. Switch to
mapor addcustom_hash.
Gotchas
m[key] silently inserts. The #1 beginner bug—covered in detail above. Use count, find, or contains for existence checks.
Common Mistakes & Debugging
Forgetting .reserve() causes repeated rehashing. Without reserve, the container rebuilds from scratch each time it crosses the load-factor threshold—each rehash is m.reserve(1 << 20) avoids this for most contest sizes.
m.clear() is m.reserve(1 << 20), clearing visits $\sim$1M buckets even if only 5 keys were inserted. In a reserve when clearing per test, declare a fresh unordered_map per case, or track inserted keys in a side vector and erase only those.
Iteration order is non-deterministic. unordered_map order depends on hash values and bucket count—do not rely on insertion order. If you need ordered output, collect into a vector and sort.
erase(key) removes the entire key, not one occurrence. This trips up anyone who treats a frequency map like a multiset:
cpp
unordered_map<int,int> freq;
for (int x : a) freq[x]++;
freq.erase(5); // wipes key 5 entirely
if (--freq[5] == 0) freq.erase(5); // decrement then cleanup — correctErasing during iteration invalidates the iterator. Use the erase-return-iterator idiom:
cpp
for (auto it = m.begin(); it != m.end(); ) {
if (should_remove(it->second))
it = m.erase(it);
else
++it;
}No default hash for pair, tuple, or custom structs. You must provide a hash functor (see pair_hash in Implementation) or use map instead. A quick alternative for pair<int,int>: encode as a single long long with (uint64_t)a << 32 | (uint32_t)b.
Mental Traps
"unordered_map is always faster than map." For small map can win. A red-black tree node sits in one allocation; a hash table chains through bucket pointers, linked-list nodes, and potential rehashes. On map win by 10–30% due to fewer cache misses and zero hashing overhead. The crossover where unordered_map reliably pulls ahead is typically around map in contests unless you've profiled a bottleneck and added custom_hash + reserve.
"O(1) average means safe." custom_hash with a random salt on CF, or just use map for guaranteed
"clear() is cheap." m.clear() is m.reserve(1 << 20), clearing visits ~1M buckets even if only 5 keys were inserted. In a loop over clear() on a pre-reserved map, total cost is reserve if you clear per test case, or erase keys manually from a tracked list.
Self-Test
Before moving on, verify you can answer these without looking back:
- [ ] What happens when you write
m[key]andkeyis not in the map? Why is this dangerous for existence checks? - [ ] Why does
std::unordered_mapwith default hash get TLE on Codeforces? Describe the attack mechanism in one sentence. - [ ] Write the
custom_hashstruct from memory (or at least describe whatsplitmix64+ random salt achieves). - [ ] When should you prefer
mapoverunordered_map? Name at least 3 situations. - [ ] What is the time complexity of
m.clear()afterm.reserve(1 << 20)with only 10 elements inserted? Why? - [ ]
unordered_maphas no default hash forpair<int,int>. What are your two options? - [ ] In the two-sum pattern, why do we check
seen[target - x]before insertingx?
Practice Problems
| # | Problem | Rating | Key Technique |
|---|---|---|---|
| 1 | 782A—Andryusha and Socks | 800 | Set membership tracking |
| 2 | 1520D—Same Differences | 1200 | Transform |
| 3 | 4C—Registration System | 1300 | Frequency counting with string keys |
| 4 | 1676F—Longest Strike | 1300 | Frequency map + sweep |
| 5 | 1296C—Yet Another Walking Robot | 1500 | Coordinate pair as key, cycle detection |
| 6 | 271D—Good Substrings | 1500 | String set deduplication |
| 7 | 1225D—Power Products | 1700 | Prime signature as map key |
| 8 | CSES 1640—Sum of Two Values | — | Complement lookup |
| 9 | CSES 1641—Sum of Three Values | — | Fix one, hash the other two |
| 10 | CSES 1642—Sum of Four Values | — | Pairwise sum hash map |
| 11 | CSES 1660—Subarray Sums I | — | Prefix sum + hash map |
Further Reading
- Blowing up
unordered_map(CF blog by neal)—the definitive post on anti-hash attacks and custom hash functions - C++ STL
unordered_mapreference - Policy-based hash table (
gp_hash_table)—faster alternative from<ext/pb_ds> - Arrays and Strings—prerequisite: basic STL containers
Related topics:
- Set and Multiset—ordered counterpart with
guarantees; prefer when you need lower_boundor sorted iteration - Stack, Queue, Deque—other fundamental containers often combined with hash maps in BFS/sliding-window problems
- Data Structure Selection Guide—decision framework for choosing the right container
- Practice Ladder: Data Structures—graded problem set covering all core structures