Skip to content

Set and Multiset

Sorted associative containers backed by red-black trees—the right tool when you need ordered updates, predecessor/successor queries, or a dynamic sorted collection.

Difficulty: Beginner | Prerequisites: Arrays and Strings


Intuition

You're solving a contest problem with a live collection of integers: insert a number, delete a number, and query "what is the smallest number x?" The obvious approach is a sorted array, but arrays punish every insertion and deletion that lands in the middle.

Operations: insert(8), insert(3), insert(11), insert(5), query(>=6), insert(14), delete(11), insert(2), query(>=10).

text
  insert(8):   [8]                              0 elements shifted
  insert(3):   [3, 8]                           1 element shifted right
  insert(11):  [3, 8, 11]                       0 elements shifted
  insert(5):   [3, 5, 8, 11]                    2 elements shifted right
  query(>=6):  binary search -> 8               OK, O(log n)
  insert(14):  [3, 5, 8, 11, 14]                0 elements shifted
  delete(11):  [3, 5, 8, 14]                    1 element shifted left
  insert(2):   [2, 3, 5, 8, 14]                 4 elements shifted right
  query(>=10): binary search -> 14              OK, O(log n)

That's 8 element shifts across just 7 insert/delete operations on a tiny collection. Each insert or delete costs O(n) to maintain sorted order because every element after the insertion point must slide over.

Scale to n=105 operations: each one shifts up to 105 elements. That's up to 1010 total work—far too slow for a 2-second time limit.

The queries are fine (O(logn) via binary search), but inserts and deletes break the approach.

A balanced BST keeps elements in sorted order while guaranteeing O(logn) insert, delete, and search—no shifting required.

Instead of storing elements side-by-side in a contiguous array, a BST stores them in a tree: each node holds one value, its left subtree holds all smaller values, its right subtree holds all larger values. To insert, walk down from the root making O(logn) comparisons, then attach a new leaf—nothing else moves. To delete, remove one node and patch up at most O(logn) pointers.

Analogy: Imagine a librarian maintaining a sorted shelf. The sorted-array approach is like one tight row of books—inserting in the middle means sliding everything right. The balanced-BST approach is like organizing by a tree of decisions: fiction or non-fiction, then A–M or N–Z, then A–F or G–M. Finding where a book belongs takes O(logn) decisions, and slotting it in just means adding it at the end of a branch—no other books move.

Where the analogy breaks: A real library's decision tree is static. A BST can become lopsided—imagine inserting 1, 2, 3, 4, 5 in order and getting a chain instead of a tree. The C++ set uses a red-black tree that automatically rebalances via rotations after each insert or delete, keeping the height at O(logn). You never need to think about the balancing details—the STL handles them.

Visual Walkthrough

Same operations, but now using a BST. The exact shape doesn't matter—every operation follows the same pattern: walk down, make a local change, stop.

text
Step 1: insert(8)

         8
                               comparisons: 0 (root is empty)

Step 2: insert(3)

         8
        /
       3
                               comparisons: 1 (3 < 8, go left)

Step 3: insert(11)

         8
        / \
       3   11
                               comparisons: 1 (11 > 8, go right)

Step 4: insert(5)

         8
        / \
       3   11
        \
         5
                               comparisons: 2 (5 < 8, 5 > 3)

Step 5: query "smallest >= 6"

         8    <-- 6 < 8, so 8 is a candidate. Go left to look for smaller.
        / \
       3   11
        \
         5   <-- 6 > 3, go right.
              <-- 6 > 5, go right. Null -> stop.

   Answer: 8                   comparisons: 3

Step 6: insert(14)

         8
        / \
       3   11
        \    \
         5   14
                               comparisons: 2 (14 > 8, 14 > 11)

Step 7: delete(11)

   Find 11: 11 > 8, go right -> found.
   Node 11 has one child (14). Replace 11 with 14.

         8
        / \
       3   14
        \
         5
                               comparisons: 1 to find, O(1) pointer fixup

Step 8: insert(2)

         8
        / \
       3   14
      / \
     2   5
                               comparisons: 2 (2 < 8, 2 < 3)

Step 9: query "smallest >= 10"

         8    <-- 10 > 8, go right.
        / \
       3   14  <-- 10 < 14, candidate = 14. Go left. Null -> stop.
      / \
     2   5

   Answer: 14                  comparisons: 2
text
                    Sorted Array       BST
                    ------------       ---
  7 inserts/deletes:  8 shifts         11 comparisons (no shifts!)
  2 queries:          ~4 comparisons   5 comparisons
                    ------------       ---
  Worst single op:    O(n) shifts      O(log n) comparisons
  Total for n ops:    O(n^2)           O(n log n)

With n=105: the sorted array does up to 1010 work. The BST does about 105×171.7×106over 5000× faster.

The BST Invariant and Why It Survives Every Operation

text
+-------------------------------------------------------------------+
| INVARIANT: For every node X in the tree:                          |
|   (1) All keys in X's left subtree are < X's key.                |
|   (2) All keys in X's right subtree are > X's key (set) or       |
|       >= X's key (multiset).                                      |
|   (3) The tree height is always O(log n) due to red-black         |
|       balancing.                                                  |
|                                                                   |
| Maintained because:                                               |
|   - Insert walks down using (1)+(2) to find the correct leaf      |
|     position, preserving order.                                   |
|   - Delete removes/replaces with in-order successor, preserving   |
|     order.                                                        |
|   - Red-black rotations after insert/delete restore property (3)  |
|     without breaking (1) or (2).                                  |
+-------------------------------------------------------------------+

Step 5 shows why lower_bound(6) works: invariants (1) and (2) let us conclude at each node that 8 is a candidate (since 6<8) and that everything to 8's right is larger still, so we go left hunting for something smaller-but-still-6. When that subtree yields nothing, 8 is the answer. The BST property turns an ordered-set query into binary search on a tree.

Step 7 shows deletion: we replace 11 with 14, its only child. That's safe because 14 was already in 11's right subtree—so 14>8 is guaranteed by invariant (2). No other nodes need scanning.

Property (3) is what separates a balanced BST from a naive one. Without it, inserting sorted data (1, 2, 3, ...) degrades the tree into a linked list with O(n) height. The red-black tree's coloring rules and rotations ensure this never happens.

Why lower_bound and upper_bound Are the Power Tools

lower_bound and upper_bound are the power tools—not find.

text
    Set s = {1, 3, 5, 7, 9}

    s.find(4)         -->  end()  (not found, useless)
    s.lower_bound(4)  -->  ^5     (first >= 4, THIS is what you want)
    s.upper_bound(5)  -->  ^7     (first > 5)

    "Nearest neighbor" pattern:
    +------------------------------------------+
    |  auto it = s.lower_bound(x);             |
    |  // it = nearest >= x                    |
    |  if (it != s.begin()) {                  |
    |      auto prev_it = prev(it);            |
    |      // prev_it = nearest < x            |
    |  }                                       |
    +------------------------------------------+

The set as a dynamic sorted array is underappreciated. Problems that seem to need offline sorting can sometimes be solved online: insert an element and immediately query its rank or nearest neighbor.

std::set vs unordered_set is not just "slower vs faster." They have fundamentally different capabilities: set supports ordered queries (lower_bound, iteration in sorted order, predecessor/successor); unordered_set supports only membership testing. If you need "nearest element to x," unordered_set cannot help.

text
    Choose your container:

    Need ordering/nearest?   --YES--> set / multiset
         |
         NO
         |
    Need O(1) lookup?        --YES--> unordered_set / unordered_map
         |
         NO
         |
    Static + sorted queries? --YES--> sorted vector + binary search

When to Reach for This

If a problem asks you to maintain order while elements come and go, think set / multiset:

  • "find the smallest element X"—that is literally lower_bound.
  • "maintain a dynamic sorted collection with insert and delete"—the core use case.
  • "sliding window median" or "k-th smallest"—multiset with a tracked iterator.
  • "predecessor / successor queries"prev(s.lower_bound(x)) and s.upper_bound(x).
  • "greedy: pick the best available item"—set of available items + lower_bound.

Counter-examples—problems that look like they need set but don't:

  • Only need membership testing, no ordering: Use unordered_set instead (O(1) average vs O(logn)). If the problem never asks "find the nearest value" or "iterate in order," you're paying for structure you don't use.
  • Only need the minimum or maximum: Use priority_queue. Its constant factor is far smaller (~8 bytes/element vs ~48 bytes for a tree node). If you never call lower_bound and never delete arbitrary elements, a heap is faster and lighter.
  • Collection is static after construction: Sort a plain vector once and use std::lower_bound. Arrays are cache-friendly and have ~5–10× better constant factors than tree traversal. Only reach for set when the collection changes between queries.

C++ STL Reference

Function / ClassHeaderWhat It DoesTime Complexity
set<T><set>Sorted unique container--
multiset<T><set>Sorted container allowing duplicates--
map<K,V><map>Sorted key-value container (unique keys)--
s.insert(x)--Insert element xO(logn)
s.emplace(args...)--Construct and insert in-placeO(logn)
s.erase(x)--set: remove element x; multiset: remove all copies of xO(logn+k) where k = count
s.erase(it)--Remove element at iterator itO(1) amortized
s.erase(first, last)--Remove range [first, last)O(logn+k)
s.find(x)--Iterator to x, or end() if absentO(logn)
s.count(x)--Number of elements equal to xO(logn+k)
s.contains(x)--true if x present (C++20)O(logn)
s.lower_bound(x)--Iterator to first element xO(logn)
s.upper_bound(x)--Iterator to first element >xO(logn)
s.begin() / s.end()--Iterators to first / past-lastO(1)
s.rbegin() / s.rend()--Reverse iteratorsO(1)
s.size()--Number of elementsO(1)
s.empty()--Check if emptyO(1)
s.clear()--Remove all elementsO(n)
prev(it) / next(it)<iterator>Move iterator backward / forwardO(1) amortized
s.extract(x)--Remove and return node (C++17, avoids realloc)O(logn)
s.merge(other)--Splice nodes from other into s (C++17)O(nlogn)

Always use s.lower_bound(x) (member function), not std::lower_bound(s.begin(), s.end(), x). The free function is O(n) on sets because set iterators are not random-access.


Implementation (Contest-Ready)

Ordered Statistics—Minimal Contest Template

A common contest pattern: maintain a sorted dynamic collection, answer queries about k-th smallest, count of elements less than x, predecessor/successor.

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int q; cin >> q;
    set<int> s;
    multiset<int> ms;

    while(q--){
        int type, x; cin >> type >> x;
        if(type == 1){
            // Insert into both
            s.insert(x);
            ms.insert(x);
        } else if(type == 2){
            // Erase one copy from multiset, erase from set
            s.erase(x);
            auto it = ms.find(x);
            if(it != ms.end()) ms.erase(it); // ONE copy only!
        } else if(type == 3){
            // Predecessor: largest element < x
            auto it = s.lower_bound(x);
            if(it == s.begin()) cout << "NONE\n";
            else cout << *prev(it) << "\n";
        } else if(type == 4){
            // Successor: smallest element > x
            auto it = s.upper_bound(x);
            if(it == s.end()) cout << "NONE\n";
            else cout << *it << "\n";
        } else if(type == 5){
            // Min and Max of multiset
            if(ms.empty()) cout << "EMPTY\n";
            else cout << *ms.begin() << " " << *ms.rbegin() << "\n";
        }
    }
}

Sliding Window with Multiset—Median Maintenance

Classic problem: given an array and window size k, find the median of each sliding window.

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, k;
    cin >> n >> k;
    vector<int> a(n);
    for(auto& x : a) cin >> x;

    // We maintain a multiset of the current window.
    // The median is at position k/2 in 0-indexed sorted order.
    // We keep an iterator 'mid' pointing to the median element
    // and update it as elements enter/leave the window.
    multiset<int> window;
    multiset<int>::iterator mid;

    for(int i = 0; i < n; i++){
        // --- INSERT a[i] into window ---
        window.insert(a[i]);

        if(window.size() == 1){
            // First element: median is trivially this element
            mid = window.begin();
        } else {
            // If the new element is less than current median,
            // the median index shifts left (so move iterator back).
            // If >= median, the median stays or shifts right depending
            // on whether window size is now even/odd.
            if(a[i] < *mid){
                // New element went to the left half.
                // If window size is now even, median shifts left.
                if(window.size() % 2 == 0)
                    --mid;
                // If odd, median stays (new element balanced by being left).
            } else {
                // New element went to right half (or equal).
                // If window size is now odd, median shifts right.
                if(window.size() % 2 == 1)
                    ++mid;
            }
        }

        // --- REMOVE a[i-k] once the window is full ---
        if(i >= k){
            int old_val = a[i - k];
            // Decide how removing old_val affects the median iterator.
            if(old_val < *mid){
                // Removing from left half: median shifts right
                // (but only if size will become even)
                if(window.size() % 2 == 0)
                    ++mid;
            } else if(old_val > *mid){
                // Removing from right half: median shifts left
                // (but only if size will become odd, i.e., currently even)
                if(window.size() % 2 == 1)
                    --mid;
            } else {
                // Removing the median itself!
                // We need to move mid before erasing.
                // Prefer moving right; if window becomes even-sized,
                // we'll need the right-of-center element anyway.
                auto to_erase = mid;
                if(window.size() % 2 == 1)
                    --mid;
                else
                    ++mid;
                window.erase(to_erase);
                goto skip_erase; // already erased
            }
            // Normal erase (not the median itself)
            window.erase(window.find(old_val));
            skip_erase:;
        }

        // --- OUTPUT median once we have a full window ---
        if(i >= k - 1){
            cout << *mid << " \n"[i == n - 1];
        }
    }
}

Operations Reference

The costs for each set/multiset operation—an O(logn) core with high per-node memory overhead.

OperationTimeSpace
insert(x)O(logn)O(1) per element (tree node overhead ~40-64 bytes)
erase(iterator)O(1) amortized--
erase(value)O(logn+k), k = count of value--
find(x)O(logn)--
count(x)O(logn+k)--
contains(x) (C++20)O(logn)--
lower_bound(x)O(logn)--
upper_bound(x)O(logn)--
begin() / end()O(1)--
prev(it) / next(it)O(1) amortized--
Iterate all elementsO(n)--
clear()O(n)--
Total space for n elements--O(n), ~40-64 bytes per node

Problem Patterns & Tricks

Predecessor / Successor Queries

Use lower_bound and upper_bound to find the nearest element above or below a target in O(logn). This pattern appears in greedy problems where you pick the "best available" item.

cpp
// Largest element <= x
auto it = s.upper_bound(x);
if(it != s.begin()) { --it; /* *it is the answer */ }

// Smallest element >= x
auto it = s.lower_bound(x);
if(it != s.end()) { /* *it is the answer */ }

CF examples: CF 1234D, CF 1462E.

Multiset as a lazy-delete priority queue. When you need a priority queue that supports arbitrary deletion (not just the top), use a multiset. The min is *ms.begin(), the max is *ms.rbegin(), and you can erase any element by iterator.

cpp
multiset<int> pq;
pq.insert(10); pq.insert(3); pq.insert(7);
int mn = *pq.begin();      // 3
int mx = *pq.rbegin();     // 10
pq.erase(pq.find(7));      // remove 7 specifically

CF examples: CF 1354D, CF 1157E.

Sliding window with an ordered structure. Maintain a multiset of the current window; it supports finding the median, min, max, or k-th element in O(logn) per slide. CF examples: CF 1196D2, CSES Sliding Median.

Coordinate compression via set. Insert all relevant values into a set, then assign compressed indices. Useful before applying a BIT or segment tree.

cpp
set<int> vals(a.begin(), a.end());
map<int,int> comp;
int idx = 0;
for(int v : vals) comp[v] = idx++;
// Now comp[a[i]] gives the compressed index

CF examples: CF 1311F, CF 1560F2.

Policy-based order-statistics tree. GNU __gnu_pbds::tree gives you a set with O(logn) find_by_order(k) (k-th element) and order_of_key(x) (rank of x)—critical when you need rank queries that a plain set can't provide.

cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;

typedef tree<int, null_type, less<int>,
    rb_tree_tag, tree_order_statistics_node_update> ordered_set;

ordered_set os;
os.insert(1); os.insert(3); os.insert(5);
cout << *os.find_by_order(1);  // 3  (0-indexed)
cout << os.order_of_key(4);   // 2  (# elements < 4)

For duplicate support, use tree<pair<int,int>, ...> with {value, unique_id}. CF examples: CF 1042D, CF 1311F.


Two Sets for Median Maintenance

Maintain two multisets—one for the lower half, one for the upper half. Rebalance after each insert/delete to keep sizes within 1. Gives O(logn) median.

cpp
multiset<int> lo, hi;
// Invariant: lo.size() == hi.size() or lo.size() == hi.size() + 1
// Median = *lo.rbegin()
auto balance = [&](){
    while(lo.size() > hi.size() + 1){
        hi.insert(*lo.rbegin());
        lo.erase(prev(lo.end()));
    }
    while(hi.size() > lo.size()){
        lo.insert(*hi.begin());
        hi.erase(hi.begin());
    }
};

CF examples: CF 1234D, CF 1702F.

"Next available" with a set of free slots. Keep a set<int> of available positions. When a position is requested, use lower_bound to find the nearest available one and erase it.

cpp
set<int> free_slots;
for(int i = 1; i <= n; i++) free_slots.insert(i);

int allocate(int preferred){
    auto it = free_slots.lower_bound(preferred);
    if(it == free_slots.end()) it = free_slots.begin(); // wrap around
    int slot = *it;
    free_slots.erase(it);
    return slot;
}

CF examples: CF 1469D, CF 1700D.

Contest Cheat Sheet

text
+--------------------------------------------------------------+
|                   SET / MULTISET CHEAT SHEET                 |
+--------------------------------------------------------------+
| WHEN TO USE:                                                 |
|   - Need sorted order + O(log n) insert/delete/lookup        |
|   - Need lower_bound / upper_bound (predecessor/successor)   |
|   - Need min/max of dynamic collection                       |
|   - multiset when duplicates matter                          |
+--------------------------------------------------------------+
| KEY CALLS:                                                   |
|   s.insert(x);          s.erase(s.find(x)); // ONE copy     |
|   s.lower_bound(x);     s.upper_bound(x);                   |
|   *s.begin();  // min   *s.rbegin();  // max                 |
|   prev(it); next(it);   s.contains(x); // C++20             |
+--------------------------------------------------------------+
| DANGER:                                                      |
|   ms.erase(x) removes ALL copies! Use ms.erase(ms.find(x)) |
|   Use MEMBER lower_bound, not std::lower_bound (O(n)!)      |
|   Check it != end() before dereferencing!                    |
+--------------------------------------------------------------+
| COMPLEXITY:                                                  |
|   insert/erase/find/lower_bound: O(log n)                   |
|   Space: ~48 bytes per node (high constant)                  |
+--------------------------------------------------------------+
| POLICY-BASED (order statistics):                             |
|   #include <ext/pb_ds/assoc_container.hpp>                   |
|   find_by_order(k) -> k-th element (0-indexed)              |
|   order_of_key(x)  -> # elements < x                        |
+--------------------------------------------------------------+

Common Mistakes & Debugging

multiset::erase(value) Removes ALL Copies

This is the #1 multiset bug. ms.erase(x) removes every element equal to x, not just one. In miniature:

cpp
multiset<int> ms = {3, 5, 5, 7};
ms.erase(5);           // intended: remove one 5
cout << ms.size();     // prints 2, not 3

The failure mode surfaces in real problems too. Imagine maintaining a multiset of segment lengths for a "Traffic Lights" style problem. When a new point splits segment L into L1 and L2, a naive line seg.erase(L) silently deletes all copies of that length—breaking any test case where two segments share a length. The fix is always seg.erase(seg.find(L)).

multiset::erase(value) erases all occurrences; multiset::erase(iterator) erases exactly one. To remove one copy safely:

cpp
auto it = ms.find(x);
if(it != ms.end()) ms.erase(it);  // erases exactly one

std::lower_bound vs member lower_bound. The free function std::lower_bound uses std::advance on non-random-access iterators, making it O(n) on a set. Always use the member function.

cpp
// WRONG -- O(n) because set iterators are bidirectional, not random-access
auto it = lower_bound(s.begin(), s.end(), x);

// RIGHT -- O(log n), uses the tree structure
auto it = s.lower_bound(x);

Iterator invalidation. insert() does not invalidate any existing iterators. erase(it) invalidates only it; all others remain valid. When erasing during iteration, use the return value:

cpp
for(auto it = s.begin(); it != s.end(); ){
    if(should_remove(*it))
        it = s.erase(it);   // returns next valid iterator
    else
        ++it;
}

Dereferencing end(). Always bounds-check before dereferencing the result of find, lower_bound, or upper_bound:

cpp
auto it = s.lower_bound(x);
if(it != s.end()){
    // safe to use *it
}

Empty set pitfalls. *s.begin() and *s.rbegin() are undefined behavior on an empty set. Guard it:

cpp
if(!s.empty()) cout << *s.begin();

Custom comparators. A custom comparator must define a strict weak ordering (<, not <=). Using <= causes undefined behavior and silent corruption.

cpp
// WRONG
set<int, less_equal<int>> s;  // DO NOT DO THIS

// RIGHT
set<int, less<int>> s;        // default, fine
set<int, greater<int>> s;     // max at begin()

High memory constant. Each tree node allocates ~40–64 bytes (pointers + color + alignment). For n=106, that's ~50–60 MB. If memory is tight (64 MB), consider sorted arrays or Fenwick trees.

Elements are immutable. Set elements are const—modifying them corrupts the internal BST ordering. In C++17 it won't compile; casting away const is UB. For example:

cpp
set<pair<int,int>> s = {{1,10},{2,20},{3,30}};
auto it = s.find({2, 20});
it->second = 25; // update the value

To change a value, erase and re-insert:

cpp
s.erase(it);
s.insert({2, 25});

Self-Test

Verify you can answer each of these without looking at the notes:

  • [ ] Write a "find nearest element" query: given a set and value x, find the closest element in the set
  • [ ] Erase exactly one copy of a value from a multiset without erasing all duplicates
  • [ ] Explain the difference between lower_bound(x) and upper_bound(x) with a concrete example
  • [ ] State when unordered_set is better than set and when set is the only correct choice
  • [ ] Access the last element of a set using both rbegin() and prev(end())
  • [ ] Explain why std::lower_bound(s.begin(), s.end(), x) is O(n) but s.lower_bound(x) is O(logn)

Practice Problems

#ProblemSourceDifficultyKey Idea
1TowersCF 37AEasy (800)Use set to count distinct values
2Registration SystemCF 4CEasy (1300)map to track name counts
3Closest to the RightCF 1166AEasy (1300)lower_bound on a set
4MultisetCF 1354DMedium (1900)Multiset operations + careful erase
5Sliding MedianCSESMedium (1700)Sliding window + multiset median tracking
6Number of SmallerCF 1042DMedium (1800)Policy-based tree, order_of_key
7Close SegmentsCF 1462E2Medium (1900)Sorting + multiset for counting
8Multiples of LengthCF 1157EMedium-Hard (2000)Greedy + multiset erase/insert
9Josephus Problem IICSESHard (2100)Order-statistics tree simulation
10Array PartitionCF 1700DHard (2100)Set of available positions + greedy

Further Reading


Built for the climb from Codeforces 1100 to 2100.