Appearance
Set and Multiset
Sorted associative containers backed by red-black trees—the right tool when you need ordered updates, predecessor/successor queries, or a dynamic sorted collection.
Difficulty: Beginner | Prerequisites: Arrays and Strings
Intuition
You're solving a contest problem with a live collection of integers: insert a number, delete a number, and query "what is the smallest number
Operations: insert(8), insert(3), insert(11), insert(5), query(>=6), insert(14), delete(11), insert(2), query(>=10).
text
insert(8): [8] 0 elements shifted
insert(3): [3, 8] 1 element shifted right
insert(11): [3, 8, 11] 0 elements shifted
insert(5): [3, 5, 8, 11] 2 elements shifted right
query(>=6): binary search -> 8 OK, O(log n)
insert(14): [3, 5, 8, 11, 14] 0 elements shifted
delete(11): [3, 5, 8, 14] 1 element shifted left
insert(2): [2, 3, 5, 8, 14] 4 elements shifted right
query(>=10): binary search -> 14 OK, O(log n)That's 8 element shifts across just 7 insert/delete operations on a tiny collection. Each insert or delete costs
Scale to
The queries are fine (
A balanced BST keeps elements in sorted order while guaranteeing
Instead of storing elements side-by-side in a contiguous array, a BST stores them in a tree: each node holds one value, its left subtree holds all smaller values, its right subtree holds all larger values. To insert, walk down from the root making
Analogy: Imagine a librarian maintaining a sorted shelf. The sorted-array approach is like one tight row of books—inserting in the middle means sliding everything right. The balanced-BST approach is like organizing by a tree of decisions: fiction or non-fiction, then A–M or N–Z, then A–F or G–M. Finding where a book belongs takes
Where the analogy breaks: A real library's decision tree is static. A BST can become lopsided—imagine inserting 1, 2, 3, 4, 5 in order and getting a chain instead of a tree. The C++ set uses a red-black tree that automatically rebalances via rotations after each insert or delete, keeping the height at
Visual Walkthrough
Same operations, but now using a BST. The exact shape doesn't matter—every operation follows the same pattern: walk down, make a local change, stop.
text
Step 1: insert(8)
8
comparisons: 0 (root is empty)
Step 2: insert(3)
8
/
3
comparisons: 1 (3 < 8, go left)
Step 3: insert(11)
8
/ \
3 11
comparisons: 1 (11 > 8, go right)
Step 4: insert(5)
8
/ \
3 11
\
5
comparisons: 2 (5 < 8, 5 > 3)
Step 5: query "smallest >= 6"
8 <-- 6 < 8, so 8 is a candidate. Go left to look for smaller.
/ \
3 11
\
5 <-- 6 > 3, go right.
<-- 6 > 5, go right. Null -> stop.
Answer: 8 comparisons: 3
Step 6: insert(14)
8
/ \
3 11
\ \
5 14
comparisons: 2 (14 > 8, 14 > 11)
Step 7: delete(11)
Find 11: 11 > 8, go right -> found.
Node 11 has one child (14). Replace 11 with 14.
8
/ \
3 14
\
5
comparisons: 1 to find, O(1) pointer fixup
Step 8: insert(2)
8
/ \
3 14
/ \
2 5
comparisons: 2 (2 < 8, 2 < 3)
Step 9: query "smallest >= 10"
8 <-- 10 > 8, go right.
/ \
3 14 <-- 10 < 14, candidate = 14. Go left. Null -> stop.
/ \
2 5
Answer: 14 comparisons: 2text
Sorted Array BST
------------ ---
7 inserts/deletes: 8 shifts 11 comparisons (no shifts!)
2 queries: ~4 comparisons 5 comparisons
------------ ---
Worst single op: O(n) shifts O(log n) comparisons
Total for n ops: O(n^2) O(n log n)With
The BST Invariant and Why It Survives Every Operation
text
+-------------------------------------------------------------------+
| INVARIANT: For every node X in the tree: |
| (1) All keys in X's left subtree are < X's key. |
| (2) All keys in X's right subtree are > X's key (set) or |
| >= X's key (multiset). |
| (3) The tree height is always O(log n) due to red-black |
| balancing. |
| |
| Maintained because: |
| - Insert walks down using (1)+(2) to find the correct leaf |
| position, preserving order. |
| - Delete removes/replaces with in-order successor, preserving |
| order. |
| - Red-black rotations after insert/delete restore property (3) |
| without breaking (1) or (2). |
+-------------------------------------------------------------------+Step 5 shows why lower_bound(6) works: invariants (1) and (2) let us conclude at each node that 8 is a candidate (since
Step 7 shows deletion: we replace 11 with 14, its only child. That's safe because 14 was already in 11's right subtree—so
Property (3) is what separates a balanced BST from a naive one. Without it, inserting sorted data (1, 2, 3, ...) degrades the tree into a linked list with
Why lower_bound and upper_bound Are the Power Tools
lower_bound and upper_bound are the power tools—not find.
text
Set s = {1, 3, 5, 7, 9}
s.find(4) --> end() (not found, useless)
s.lower_bound(4) --> ^5 (first >= 4, THIS is what you want)
s.upper_bound(5) --> ^7 (first > 5)
"Nearest neighbor" pattern:
+------------------------------------------+
| auto it = s.lower_bound(x); |
| // it = nearest >= x |
| if (it != s.begin()) { |
| auto prev_it = prev(it); |
| // prev_it = nearest < x |
| } |
+------------------------------------------+The set as a dynamic sorted array is underappreciated. Problems that seem to need offline sorting can sometimes be solved online: insert an element and immediately query its rank or nearest neighbor.
std::set vs unordered_set is not just "slower vs faster." They have fundamentally different capabilities: set supports ordered queries (lower_bound, iteration in sorted order, predecessor/successor); unordered_set supports only membership testing. If you need "nearest element to x," unordered_set cannot help.
text
Choose your container:
Need ordering/nearest? --YES--> set / multiset
|
NO
|
Need O(1) lookup? --YES--> unordered_set / unordered_map
|
NO
|
Static + sorted queries? --YES--> sorted vector + binary searchWhen to Reach for This
If a problem asks you to maintain order while elements come and go, think set / multiset:
- "find the smallest element
"—that is literally lower_bound. - "maintain a dynamic sorted collection with insert and delete"—the core use case.
- "sliding window median" or "k-th smallest"—multiset with a tracked iterator.
- "predecessor / successor queries"—
prev(s.lower_bound(x))ands.upper_bound(x). - "greedy: pick the best available item"—set of available items +
lower_bound.
Counter-examples—problems that look like they need set but don't:
- Only need membership testing, no ordering: Use
unordered_setinstead (average vs ). If the problem never asks "find the nearest value" or "iterate in order," you're paying for structure you don't use. - Only need the minimum or maximum: Use
priority_queue. Its constant factor is far smaller (~8 bytes/element vs ~48 bytes for a tree node). If you never calllower_boundand never delete arbitrary elements, a heap is faster and lighter. - Collection is static after construction: Sort a plain
vectoronce and usestd::lower_bound. Arrays are cache-friendly and have ~5–10× better constant factors than tree traversal. Only reach forsetwhen the collection changes between queries.
C++ STL Reference
| Function / Class | Header | What It Does | Time Complexity |
|---|---|---|---|
set<T> | <set> | Sorted unique container | -- |
multiset<T> | <set> | Sorted container allowing duplicates | -- |
map<K,V> | <map> | Sorted key-value container (unique keys) | -- |
s.insert(x) | -- | Insert element x | |
s.emplace(args...) | -- | Construct and insert in-place | |
s.erase(x) | -- | set: remove element x; multiset: remove all copies of x | |
s.erase(it) | -- | Remove element at iterator it | |
s.erase(first, last) | -- | Remove range [first, last) | |
s.find(x) | -- | Iterator to x, or end() if absent | |
s.count(x) | -- | Number of elements equal to x | |
s.contains(x) | -- | true if x present (C++20) | |
s.lower_bound(x) | -- | Iterator to first element | |
s.upper_bound(x) | -- | Iterator to first element | |
s.begin() / s.end() | -- | Iterators to first / past-last | |
s.rbegin() / s.rend() | -- | Reverse iterators | |
s.size() | -- | Number of elements | |
s.empty() | -- | Check if empty | |
s.clear() | -- | Remove all elements | |
prev(it) / next(it) | <iterator> | Move iterator backward / forward | |
s.extract(x) | -- | Remove and return node (C++17, avoids realloc) | |
s.merge(other) | -- | Splice nodes from other into s (C++17) |
Always use s.lower_bound(x) (member function), not std::lower_bound(s.begin(), s.end(), x). The free function is
Implementation (Contest-Ready)
Ordered Statistics—Minimal Contest Template
A common contest pattern: maintain a sorted dynamic collection, answer queries about k-th smallest, count of elements less than x, predecessor/successor.
cpp
#include <bits/stdc++.h>
using namespace std;
int main(){
ios::sync_with_stdio(false);
cin.tie(nullptr);
int q; cin >> q;
set<int> s;
multiset<int> ms;
while(q--){
int type, x; cin >> type >> x;
if(type == 1){
// Insert into both
s.insert(x);
ms.insert(x);
} else if(type == 2){
// Erase one copy from multiset, erase from set
s.erase(x);
auto it = ms.find(x);
if(it != ms.end()) ms.erase(it); // ONE copy only!
} else if(type == 3){
// Predecessor: largest element < x
auto it = s.lower_bound(x);
if(it == s.begin()) cout << "NONE\n";
else cout << *prev(it) << "\n";
} else if(type == 4){
// Successor: smallest element > x
auto it = s.upper_bound(x);
if(it == s.end()) cout << "NONE\n";
else cout << *it << "\n";
} else if(type == 5){
// Min and Max of multiset
if(ms.empty()) cout << "EMPTY\n";
else cout << *ms.begin() << " " << *ms.rbegin() << "\n";
}
}
}Sliding Window with Multiset—Median Maintenance
Classic problem: given an array and window size
cpp
#include <bits/stdc++.h>
using namespace std;
int main(){
ios::sync_with_stdio(false);
cin.tie(nullptr);
int n, k;
cin >> n >> k;
vector<int> a(n);
for(auto& x : a) cin >> x;
// We maintain a multiset of the current window.
// The median is at position k/2 in 0-indexed sorted order.
// We keep an iterator 'mid' pointing to the median element
// and update it as elements enter/leave the window.
multiset<int> window;
multiset<int>::iterator mid;
for(int i = 0; i < n; i++){
// --- INSERT a[i] into window ---
window.insert(a[i]);
if(window.size() == 1){
// First element: median is trivially this element
mid = window.begin();
} else {
// If the new element is less than current median,
// the median index shifts left (so move iterator back).
// If >= median, the median stays or shifts right depending
// on whether window size is now even/odd.
if(a[i] < *mid){
// New element went to the left half.
// If window size is now even, median shifts left.
if(window.size() % 2 == 0)
--mid;
// If odd, median stays (new element balanced by being left).
} else {
// New element went to right half (or equal).
// If window size is now odd, median shifts right.
if(window.size() % 2 == 1)
++mid;
}
}
// --- REMOVE a[i-k] once the window is full ---
if(i >= k){
int old_val = a[i - k];
// Decide how removing old_val affects the median iterator.
if(old_val < *mid){
// Removing from left half: median shifts right
// (but only if size will become even)
if(window.size() % 2 == 0)
++mid;
} else if(old_val > *mid){
// Removing from right half: median shifts left
// (but only if size will become odd, i.e., currently even)
if(window.size() % 2 == 1)
--mid;
} else {
// Removing the median itself!
// We need to move mid before erasing.
// Prefer moving right; if window becomes even-sized,
// we'll need the right-of-center element anyway.
auto to_erase = mid;
if(window.size() % 2 == 1)
--mid;
else
++mid;
window.erase(to_erase);
goto skip_erase; // already erased
}
// Normal erase (not the median itself)
window.erase(window.find(old_val));
skip_erase:;
}
// --- OUTPUT median once we have a full window ---
if(i >= k - 1){
cout << *mid << " \n"[i == n - 1];
}
}
}Operations Reference
The costs for each set/multiset operation—an
| Operation | Time | Space |
|---|---|---|
insert(x) | ||
erase(iterator) | -- | |
erase(value) | -- | |
find(x) | -- | |
count(x) | -- | |
contains(x) (C++20) | -- | |
lower_bound(x) | -- | |
upper_bound(x) | -- | |
begin() / end() | -- | |
prev(it) / next(it) | -- | |
| Iterate all elements | -- | |
clear() | -- | |
| Total space for | -- |
Problem Patterns & Tricks
Predecessor / Successor Queries
Use lower_bound and upper_bound to find the nearest element above or below a target in
cpp
// Largest element <= x
auto it = s.upper_bound(x);
if(it != s.begin()) { --it; /* *it is the answer */ }
// Smallest element >= x
auto it = s.lower_bound(x);
if(it != s.end()) { /* *it is the answer */ }CF examples: CF 1234D, CF 1462E.
Multiset as a lazy-delete priority queue. When you need a priority queue that supports arbitrary deletion (not just the top), use a multiset. The min is *ms.begin(), the max is *ms.rbegin(), and you can erase any element by iterator.
cpp
multiset<int> pq;
pq.insert(10); pq.insert(3); pq.insert(7);
int mn = *pq.begin(); // 3
int mx = *pq.rbegin(); // 10
pq.erase(pq.find(7)); // remove 7 specificallyCF examples: CF 1354D, CF 1157E.
Sliding window with an ordered structure. Maintain a multiset of the current window; it supports finding the median, min, max, or k-th element in
Coordinate compression via set. Insert all relevant values into a set, then assign compressed indices. Useful before applying a BIT or segment tree.
cpp
set<int> vals(a.begin(), a.end());
map<int,int> comp;
int idx = 0;
for(int v : vals) comp[v] = idx++;
// Now comp[a[i]] gives the compressed indexCF examples: CF 1311F, CF 1560F2.
Policy-based order-statistics tree. GNU __gnu_pbds::tree gives you a set with find_by_order(k) (k-th element) and order_of_key(x) (rank of x)—critical when you need rank queries that a plain set can't provide.
cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
typedef tree<int, null_type, less<int>,
rb_tree_tag, tree_order_statistics_node_update> ordered_set;
ordered_set os;
os.insert(1); os.insert(3); os.insert(5);
cout << *os.find_by_order(1); // 3 (0-indexed)
cout << os.order_of_key(4); // 2 (# elements < 4)For duplicate support, use tree<pair<int,int>, ...> with {value, unique_id}. CF examples: CF 1042D, CF 1311F.
Two Sets for Median Maintenance
Maintain two multisets—one for the lower half, one for the upper half. Rebalance after each insert/delete to keep sizes within 1. Gives
cpp
multiset<int> lo, hi;
// Invariant: lo.size() == hi.size() or lo.size() == hi.size() + 1
// Median = *lo.rbegin()
auto balance = [&](){
while(lo.size() > hi.size() + 1){
hi.insert(*lo.rbegin());
lo.erase(prev(lo.end()));
}
while(hi.size() > lo.size()){
lo.insert(*hi.begin());
hi.erase(hi.begin());
}
};CF examples: CF 1234D, CF 1702F.
"Next available" with a set of free slots. Keep a set<int> of available positions. When a position is requested, use lower_bound to find the nearest available one and erase it.
cpp
set<int> free_slots;
for(int i = 1; i <= n; i++) free_slots.insert(i);
int allocate(int preferred){
auto it = free_slots.lower_bound(preferred);
if(it == free_slots.end()) it = free_slots.begin(); // wrap around
int slot = *it;
free_slots.erase(it);
return slot;
}CF examples: CF 1469D, CF 1700D.
Contest Cheat Sheet
text
+--------------------------------------------------------------+
| SET / MULTISET CHEAT SHEET |
+--------------------------------------------------------------+
| WHEN TO USE: |
| - Need sorted order + O(log n) insert/delete/lookup |
| - Need lower_bound / upper_bound (predecessor/successor) |
| - Need min/max of dynamic collection |
| - multiset when duplicates matter |
+--------------------------------------------------------------+
| KEY CALLS: |
| s.insert(x); s.erase(s.find(x)); // ONE copy |
| s.lower_bound(x); s.upper_bound(x); |
| *s.begin(); // min *s.rbegin(); // max |
| prev(it); next(it); s.contains(x); // C++20 |
+--------------------------------------------------------------+
| DANGER: |
| ms.erase(x) removes ALL copies! Use ms.erase(ms.find(x)) |
| Use MEMBER lower_bound, not std::lower_bound (O(n)!) |
| Check it != end() before dereferencing! |
+--------------------------------------------------------------+
| COMPLEXITY: |
| insert/erase/find/lower_bound: O(log n) |
| Space: ~48 bytes per node (high constant) |
+--------------------------------------------------------------+
| POLICY-BASED (order statistics): |
| #include <ext/pb_ds/assoc_container.hpp> |
| find_by_order(k) -> k-th element (0-indexed) |
| order_of_key(x) -> # elements < x |
+--------------------------------------------------------------+Common Mistakes & Debugging
multiset::erase(value) Removes ALL Copies
This is the #1 multiset bug. ms.erase(x) removes every element equal to x, not just one. In miniature:
cpp
multiset<int> ms = {3, 5, 5, 7};
ms.erase(5); // intended: remove one 5
cout << ms.size(); // prints 2, not 3The failure mode surfaces in real problems too. Imagine maintaining a multiset of segment lengths for a "Traffic Lights" style problem. When a new point splits segment seg.erase(L) silently deletes all copies of that length—breaking any test case where two segments share a length. The fix is always seg.erase(seg.find(L)).
multiset::erase(value) erases all occurrences; multiset::erase(iterator) erases exactly one. To remove one copy safely:
cpp
auto it = ms.find(x);
if(it != ms.end()) ms.erase(it); // erases exactly onestd::lower_bound vs member lower_bound. The free function std::lower_bound uses std::advance on non-random-access iterators, making it
cpp
// WRONG -- O(n) because set iterators are bidirectional, not random-access
auto it = lower_bound(s.begin(), s.end(), x);
// RIGHT -- O(log n), uses the tree structure
auto it = s.lower_bound(x);Iterator invalidation. insert() does not invalidate any existing iterators. erase(it) invalidates only it; all others remain valid. When erasing during iteration, use the return value:
cpp
for(auto it = s.begin(); it != s.end(); ){
if(should_remove(*it))
it = s.erase(it); // returns next valid iterator
else
++it;
}Dereferencing end(). Always bounds-check before dereferencing the result of find, lower_bound, or upper_bound:
cpp
auto it = s.lower_bound(x);
if(it != s.end()){
// safe to use *it
}Empty set pitfalls. *s.begin() and *s.rbegin() are undefined behavior on an empty set. Guard it:
cpp
if(!s.empty()) cout << *s.begin();Custom comparators. A custom comparator must define a strict weak ordering (<, not <=). Using <= causes undefined behavior and silent corruption.
cpp
// WRONG
set<int, less_equal<int>> s; // DO NOT DO THIS
// RIGHT
set<int, less<int>> s; // default, fine
set<int, greater<int>> s; // max at begin()High memory constant. Each tree node allocates ~40–64 bytes (pointers + color + alignment). For
Elements are immutable. Set elements are const—modifying them corrupts the internal BST ordering. In C++17 it won't compile; casting away const is UB. For example:
cpp
set<pair<int,int>> s = {{1,10},{2,20},{3,30}};
auto it = s.find({2, 20});
it->second = 25; // update the valueTo change a value, erase and re-insert:
cpp
s.erase(it);
s.insert({2, 25});Self-Test
Verify you can answer each of these without looking at the notes:
- [ ] Write a "find nearest element" query: given a set and value x, find the closest element in the set
- [ ] Erase exactly one copy of a value from a
multisetwithout erasing all duplicates - [ ] Explain the difference between
lower_bound(x)andupper_bound(x)with a concrete example - [ ] State when
unordered_setis better thansetand whensetis the only correct choice - [ ] Access the last element of a set using both
rbegin()andprev(end()) - [ ] Explain why
std::lower_bound(s.begin(), s.end(), x)isbut s.lower_bound(x)is
Practice Problems
| # | Problem | Source | Difficulty | Key Idea |
|---|---|---|---|---|
| 1 | Towers | CF 37A | Easy (800) | Use set to count distinct values |
| 2 | Registration System | CF 4C | Easy (1300) | map to track name counts |
| 3 | Closest to the Right | CF 1166A | Easy (1300) | lower_bound on a set |
| 4 | Multiset | CF 1354D | Medium (1900) | Multiset operations + careful erase |
| 5 | Sliding Median | CSES | Medium (1700) | Sliding window + multiset median tracking |
| 6 | Number of Smaller | CF 1042D | Medium (1800) | Policy-based tree, order_of_key |
| 7 | Close Segments | CF 1462E2 | Medium (1900) | Sorting + multiset for counting |
| 8 | Multiples of Length | CF 1157E | Medium-Hard (2000) | Greedy + multiset erase/insert |
| 9 | Josephus Problem II | CSES | Hard (2100) | Order-statistics tree simulation |
| 10 | Array Partition | CF 1700D | Hard (2100) | Set of available positions + greedy |
Further Reading
- cp-algorithms: Policy-Based Data Structures—order-statistics trees in detail.
- CF Blog: C++ STL set/multiset tricks—practical contest techniques.
- CF Blog: Blowing up
unordered_map—why ordered containers are sometimes safer. - cppreference:
std::set—authoritative API reference. - cppreference:
std::multiset - Priority Queue and Heaps—when you only need the min/max, a heap is faster than a balanced BST.
- Policy-Based DS (PBDS)—extends set with order-statistics (kth element, rank queries).
- Binary Search—
lower_bound/upper_boundon sets mirrors binary search on sorted arrays. - Binary Indexed Tree (Fenwick)—a lighter-weight alternative for rank queries.
Related Techniques
- Hash Map and Unordered Map—
lookup when you don't need ordering - Priority Queue and Heaps—when you only need min/max, a heap is faster and lighter
- Policy-Based DS (PBDS)—extends set with order-statistics (kth element, rank queries)
- Segment Tree—for range queries beyond what a set provides
- Binary Search—
lower_bound/upper_boundon sets mirrors binary search on sorted arrays - Data Structure Selection Guide
- Practice Ladder: Data Structures