Parallel Binary Search

Quick Revisit
USE WHEN: Q queries each binary-searching over a shared timeline of events with expensive per-query replay
REQUIRES: the predicate is monotonic per query — once true at time t, it stays true for all t' > t. Without this, the binary-search decision is invalid and PBS silently returns wrong answers.
INVARIANT: all queries sharing the same midpoint are checked together in one event-replay pass
TIME: O(log T · (T · u + Q · q)) where T = number of events, u = per-event update cost (e.g. log N for BIT, α(N) for DSU), q = per-query check cost
SPACE: O(N + Q)
CLASSIC BUG: not resetting the data structure between binary search rounds — each round replays from scratch
PRACTICE: 08-ladder-mixed

AL-41 | Run $Q$ independent binary searches simultaneously: group queries by their current midpoints, process all events up to each midpoint once, then decide left/right for every query in one pass. Reduces $O (Q \log T \cdot f)$ to $O (\log T \cdot (N + Q) \cdot f)$ .

Difficulty: [Advanced]Prerequisites: Binary Search, Offline Queries, Fenwick TreeCF Rating Range: 1900--2100

2. Intuition
3. C++ STL Reference
4. Implementation (Contest-Ready)
5. Operations Reference
6. Problem Patterns & Tricks
7. Contest Cheat Sheet
8. Gotchas & Debugging
9. Self-Test
10. Practice Problems
11. Further Reading

Intuition

You have $N$ events that happen in order (time $1, 2, \dots, T$ ) and $Q$ queries, each asking: "What is the minimum time $t$ such that some condition holds for query $i$ ?"

Hard precondition: the per-query predicate must be monotonic. Once it becomes true at time $t$ , it must stay true for every $t^{'} > t$ . This is what makes the answer binary-searchable. If the condition can toggle on and off (e.g., "is component size exactly 3?"), PBS will produce wrong answers without any obvious symptom -- the binary search decisions are invalid even though the framework "runs." Verify monotonicity from the problem semantics before reaching for PBS.

Given monotonicity, each query can be solved independently with binary search over $[1, T]$ . But the check function is expensive: it requires replaying events from scratch.

  Events:  e1  e2  e3  e4  e5  e6  e7  e8      (T = 8)

  Q0: binary search [1..8] --> mid=4, replay e1..e4, check --> go right
      then [5..8]          --> mid=6, replay e1..e6, check --> go left
      then [5..5]          --> answer = 5

  Q1: binary search [1..8] --> mid=4, replay e1..e4, check --> go left
      then [1..3]          --> mid=2, replay e1..e2, check --> go right
      then [3..3]          --> answer = 3

Each query does $O (\log T)$ iterations. Each iteration replays up to $O (T)$ events. With $Q$ queries, total cost is $O (Q \cdot \log T \cdot T)$ . For $Q = T = 10^{5}$ , that is $10^{5} \cdot 17 \cdot 10^{5} \approx 10^{11.2}$ -- hopelessly slow.

The waste is obvious: Q0 replays $e_{1} \dots e_{4}$ at midpoint 4, and Q1 also replays $e_{1} \dots e_{4}$ at midpoint 4. They share the same midpoint but do the work independently.

Run all $Q$ binary searches simultaneously: group queries by their current midpoint, process events up to that midpoint once, then decide left/right for all queries -- total $O (\log T)$ passes over all events.

Instead of each query owning a private binary search, we orchestrate all binary searches in lockstep. In each pass:

Every query has a current search interval $[l o_{i}, h i_{i}]$ . Compute $m i d_{i} = (l o_{i} + h i_{i}) / 2$ .
Group queries by $m i d_{i}$ . Sort or bucket them.
Sweep events $e_{1}, e_{2}, \dots, e_{T}$ in order. When the sweep reaches time $m i d_{i}$ , evaluate the condition for every query whose midpoint is $m i d_{i}$ .
Based on the check result, set $l o_{i} = m i d_{i} + 1$ (go right) or $h i_{i} = m i d_{i}$ (go left).
Reset the data structure and repeat.

Each pass processes all $T$ events exactly once and evaluates all $Q$ queries. After $O (\log T)$ passes, every query has converged.

Total: $O (\log T \cdot (T \cdot u + Q \cdot q))$ , where $u$ is the per-event update cost (e.g., $O (\log N)$ for a BIT update, $O (α (N))$ for a DSU union) and $q$ is the per-query check cost (typically $O (\log N)$ for a BIT range query, $O (α (N))$ for a DSU connectivity check). Naming both factors matters: editorials often write a single " $f$ ", but for tight constraints you need to know which one dominates -- the BIT updates ( $T \log N$ per pass) usually dwarf the query checks unless $Q ≫ T$ .

  Parallel Binary Search -- multiple queries, shared timeline:

  Timeline:  [  1  2  3  4  5  6  7  8  9  10  ]

  Query 1:   [---lo---------mid---------hi---]   check mid=5
  Query 2:   [---lo----mid----------hi-------]   check mid=4
  Query 3:   [------lo------mid------hi------]   check mid=6

  All queries processed in ONE pass through events 1..10
  Then each query updates its lo/hi based on check result

  One PBS iteration:

  +------------------+     +-------------------+     +------------------+
  | For each query   |     | Process events    |     | For each query   |
  | compute mid =    |---->| 1..max_mid in     |---->| if check(mid)    |
  | (lo+hi)/2        |     | order, applying   |     |   hi = mid       |
  | bucket by mid    |     | to data structure |     | else lo = mid+1  |
  +------------------+     +-------------------+     +------------------+
         ^                                                    |
         |                                                    |
         +---------- repeat O(log T) iterations <-------------+

Visual Walkthrough

Four queries binary searching over time $[1 \dots 8]$ . Each query $i$ asks: "minimum time $t$ such that node $i$ is connected to node 0?"

Suppose the true answers are: Q0 = 3, Q1 = 5, Q2 = 2, Q3 = 7.

  PASS 1  (all queries start with [1, 8], mid = 4)
  --------------------------------------------------------
  Sweep events e1..e8. At mid = 4, check all queries:

      Q0: connected after e1..e4?  YES  --> hi = 4     [1, 4]
      Q1: connected after e1..e4?  NO   --> lo = 5     [5, 8]
      Q2: connected after e1..e4?  YES  --> hi = 4     [1, 4]
      Q3: connected after e1..e4?  NO   --> lo = 5     [5, 8]

  Reset data structure.

  PASS 2  (midpoints: Q0=2, Q1=6, Q2=2, Q3=6)
  --------------------------------------------------------
  Sweep events e1..e8:
    At time 2 (mid for Q0, Q2):
      Q0: connected after e1..e2?  NO   --> lo = 3     [3, 4]
      Q2: connected after e1..e2?  YES  --> hi = 2     [1, 2]

    At time 6 (mid for Q1, Q3):
      Q1: connected after e1..e6?  YES  --> hi = 6     [5, 6]
      Q3: connected after e1..e6?  NO   --> lo = 7     [7, 8]

  Reset data structure.

  PASS 3  (midpoints: Q0=3, Q1=5, Q2=1, Q3=7)
  --------------------------------------------------------
  Sweep events e1..e8:
    At time 1: Q2: connected? NO   --> lo = 2          [2, 2]  DONE
    At time 3: Q0: connected? YES  --> hi = 3          [3, 3]  DONE
    At time 5: Q1: connected? YES  --> hi = 5          [5, 5]  DONE
    At time 7: Q3: connected? YES  --> hi = 7          [7, 7]  DONE

  All intervals have lo = hi.  Answers: Q0=3, Q1=5, Q2=2, Q3=7.

  Bucketing queries by mid:

  Event timeline: 1   2   3   4   5   6   7   8
                  |   |   |   |   |   |   |   |
  Bucket at 2:   Q3  |   |   |   |   |   |   |
  Bucket at 4:       Q1,Q5   |   |   |   |   |
  Bucket at 6:               Q2  |   |   |   |
  Bucket at 7:                       Q4  |   |

  Process events 1->8, check queries when reaching their mid

Three passes instead of twelve individual binary search runs. Each pass swept all eight events exactly once.

The Invariant

+-------------------------------------------------------------------+
| INVARIANT: After pass k, every query's search interval [lo, hi]   |
| has width <= T / 2^k. The true answer always lies within [lo, hi].|
| After ceil(log2(T)) passes, lo = hi for every query.              |
+-------------------------------------------------------------------+

Correctness relies on the monotonicity of the predicate. If "condition holds at time $t$ " implies "condition holds at time $t^{'} > t$ ", then the binary search decision (go left / go right) is always correct, and the interval always contains the true answer.

When to Reach for This

Trigger phrases in problem statements:

"For each of $Q$ queries, find the minimum time / threshold / cost such that ..." -- multiple binary searches over the same event axis.
"Events happen in order; each query asks when something first becomes true" -- monotonic predicate over a shared timeline.
"Offline queries with monotonic answer" -- the answer for each query can be binary searched, and events can be replayed.

Counter-examples (this technique does NOT apply):

Non-monotonic predicate. If the condition can toggle on and off, binary search is invalid. Use persistent data structures instead.
Online / forced-online queries. Queries depend on previous answers (e.g., XOR decryption). You cannot batch midpoints.
Single query. Just do a normal binary search. Parallel binary search is only beneficial for $Q ≫ 1$ .

What the Code Won't Teach You

Reading the implementation alone won't reveal why PBS works. The critical structural insight: all $Q$ queries binary search over the same ordered domain (time, weight, index), and checking feasibility for all queries at a given midpoint can be done in a single sweep through the events. Without the "single sweep" property, PBS offers no advantage over independent binary searches.

The log factor savings come from amortization. Naive: $Q$ binary searches x $O (\log T)$ iterations x $O (T)$ per check = $O (Q T \log T)$ . PBS: $O (\log T)$ rounds x one sweep of $T$ events handling all $Q$ midpoints = $O ((T + Q) \log T \cdot f)$ . The key: processing events is the expensive part, and PBS ensures each event is processed at most once per round regardless of how many queries share that midpoint.

Also non-obvious: the data structure reset between passes is not a performance hack -- it is a correctness requirement. Each round must evaluate midpoints against a clean state built from scratch, because queries at different midpoints need the data structure in different states (events $[1 \dots m i d_{1}]$ vs. $[1 \dots m i d_{2}]$ ). The sweep handles this by processing events in order and checking queries when the sweep reaches their midpoint.

C++ STL Reference

Function / Class	Header	What it does	Time
`vector<vector<int>>`	`<vector>`	Group (bucket) query indices by midpoint	$O (Q)$
`fill(first, last, val)`	`<algorithm>`	Reset BIT / data structure between passes	$O (N)$
`iota(first, last, val)`	`<numeric>`	Initialize index arrays for indirect sorting	$O (N)$
`sort(first, last, cmp)`	`<algorithm>`	Sort events by time (once, before all passes)	$O (T \log T)$
`__builtin_clz(x)`	built-in	Compute $⌈ \log_{2} T ⌉$ for pass count estimate	$O (1)$

Implementation (Contest-Ready)

Version 1 -- Minimal: Parallel Binary Search Framework

cpp

#include <bits/stdc++.h>
using namespace std;

struct BIT {
    int n;
    vector<long long> t;
    BIT() : n(0) {}
    BIT(int n) : n(n), t(n + 1) {}
    void update(int i, long long v) {
        for (; i <= n; i += i & -i) t[i] += v;
    }
    long long query(int i) {
        long long s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    long long query(int l, int r) { return query(r) - query(l - 1); }
    void reset() { fill(t.begin(), t.end(), 0LL); }
};

int main() {
    int n, m, q;
    scanf("%d%d%d", &n, &m, &q);

    // Events: at time j, add value w to position p.
    struct Event { int p; long long w; };
    vector<Event> events(m);
    for (int j = 0; j < m; j++)
        scanf("%d%lld", &events[j].p, &events[j].w);

    // Queries: minimum time t such that sum over [l..r] >= k.
    struct Query { int l, r; long long k; };
    vector<Query> queries(q);
    for (int i = 0; i < q; i++)
        scanf("%d%d%lld", &queries[i].l, &queries[i].r, &queries[i].k);

    vector<int> lo(q, 0), hi(q, m - 1), ans(q, -1);
    BIT bit(n);

    for (int pass = 0; pass < 20; pass++) { // ceil(log2(m)) <= 20
        // Bucket queries by midpoint.
        vector<vector<int>> bucket(m);
        bool any_active = false;
        for (int i = 0; i < q; i++) {
            if (lo[i] <= hi[i]) {
                int mid = (lo[i] + hi[i]) / 2;
                bucket[mid].push_back(i);
                any_active = true;
            }
        }
        if (!any_active) break;

        bit.reset();
        for (int j = 0; j < m; j++) {
            // Apply event j.
            bit.update(events[j].p, events[j].w);
            // Check all queries whose midpoint is j.
            for (int i : bucket[j]) {
                long long val = bit.query(queries[i].l, queries[i].r);
                if (val >= queries[i].k) {
                    ans[i] = j;      // feasible: record and go left
                    hi[i] = (lo[i] + hi[i]) / 2 - 1;
                } else {
                    lo[i] = (lo[i] + hi[i]) / 2 + 1; // infeasible: go right
                }
            }
        }
    }

    for (int i = 0; i < q; i++)
        printf("%d\n", ans[i] + 1); // 1-indexed time
}

Complexity: $O (\log M \cdot (M + Q) \cdot \log N)$ time, $O (N + M + Q)$ space.

$\log M$ passes over all events.
Each pass: $M$ BIT updates ( $O (\log N)$ each) + $Q$ BIT queries ( $O (\log N)$ each).

Version 2 -- Explained: Minimum Time for Connectivity (DSU)

Classic problem: $N$ nodes, $M$ edges added one by one. $Q$ queries ask "what is the earliest time nodes $u_{i}$ and $v_{i}$ become connected?"

cpp

#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> par, rnk;
    // Track modified nodes for efficient reset.
    vector<int> modified;

    DSU(int n) : par(n), rnk(n, 0) {
        iota(par.begin(), par.end(), 0);
    }

    int find(int x) {
        // Path splitting -- does not prevent reset.
        while (par[x] != x) {
            int next = par[par[x]];
            if (par[x] != next) {
                modified.push_back(x);
                par[x] = next;
            }
            x = next;
        }
        return x;
    }

    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rnk[a] < rnk[b]) swap(a, b);
        modified.push_back(b);
        par[b] = a;
        if (rnk[a] == rnk[b]) {
            modified.push_back(~a); // sentinel: rank change
            rnk[a]++;
        }
        return true;
    }

    bool connected(int a, int b) { return find(a) == find(b); }

    void reset() {
        // Undo all modifications since last reset.
        while (!modified.empty()) {
            int node = modified.back();
            modified.pop_back();
            if (node < 0) {
                rnk[~node]--;
            } else {
                par[node] = node;
            }
        }
    }
};

int main() {
    int n, m, q;
    scanf("%d%d%d", &n, &m, &q);

    struct Edge { int u, v; };
    vector<Edge> edges(m);
    for (auto& [u, v] : edges) {
        scanf("%d%d", &u, &v);
        u--; v--; // 0-indexed
    }

    struct Query { int u, v; };
    vector<Query> queries(q);
    for (auto& [u, v] : queries) {
        scanf("%d%d", &u, &v);
        u--; v--;
    }

    // lo[i] = earliest feasible time, hi[i] = latest time to check.
    // ans[i] = best known feasible time (or -1 if never connected).
    vector<int> lo(q, 0), hi(q, m - 1), ans(q, -1);
    DSU dsu(n);

    int passes = 0;
    while (passes < 20) {
        // Bucket queries by midpoint.
        vector<vector<int>> bucket(m);
        bool any_active = false;
        for (int i = 0; i < q; i++) {
            if (lo[i] <= hi[i]) {
                bucket[(lo[i] + hi[i]) / 2].push_back(i);
                any_active = true;
            }
        }
        if (!any_active) break;

        dsu.reset();
        for (int j = 0; j < m; j++) {
            dsu.unite(edges[j].u, edges[j].v);

            for (int i : bucket[j]) {
                int mid = (lo[i] + hi[i]) / 2;
                if (dsu.connected(queries[i].u, queries[i].v)) {
                    ans[i] = mid + 1; // 1-indexed answer
                    hi[i] = mid - 1;
                } else {
                    lo[i] = mid + 1;
                }
            }
        }
        // Reset DSU for next pass.
        dsu.reset();
        passes++;
    }

    for (int i = 0; i < q; i++)
        printf("%d\n", ans[i]);
}

Why a resettable DSU instead of rollback? Each pass rebuilds from scratch. We only need to undo everything at the end of a pass, not fine-grained rollback to arbitrary checkpoints. Tracking modified nodes and restoring them is simpler and faster than a full rollback stack.

Complexity: $O (\log M \cdot (M \cdot α (N) + Q \cdot α (N)))$ $= O ((M + Q) \cdot α (N) \cdot \log M)$ .

Operations Reference

Operation	Time per pass	Passes	Total
Bucket queries by midpoint	$O (Q)$	$O (\log T)$	$O (Q \log T)$
Sweep events (BIT updates)	$O (T \log N)$	$O (\log T)$	$O (T \log N \log T)$
Evaluate queries (BIT queries)	$O (Q \log N)$	$O (\log T)$	$O (Q \log N \log T)$
Reset BIT	$O (N)$	$O (\log T)$	$O (N \log T)$
BIT variant total	--	--	$O ((T + Q) \cdot \log N \cdot \log T)$
Sweep events (DSU unions)	$O (T \cdot α (N))$	$O (\log T)$	$O (T \cdot α (N) \cdot \log T)$
Evaluate queries (DSU finds)	$O (Q \cdot α (N))$	$O (\log T)$	$O (Q \cdot α (N) \cdot \log T)$
Reset DSU	$O (T)$	$O (\log T)$	$O (T \log T)$
DSU variant total	--	--	$O ((T + Q) \cdot α (N) \cdot \log T)$

Problem Patterns & Tricks

Pattern 1: Minimum Time for Threshold (BIT)

$M$ events add values to positions. Each query asks: "minimum time such that the sum over $[l, r]$ reaches threshold $k$ ."

Binary search over time. Each pass sweeps all events, updating a BIT. At each midpoint, query the BIT for the range sum and compare to $k$ .

Template: Version 1 above.

Examples: Meteors (POI 2011), CF 617E.

Pattern 2: Minimum Time for Connectivity (DSU)

$M$ edges added over time. Each query asks: "when do nodes $u$ and $v$ first become connected?"

Binary search over the edge insertion timeline. Each pass replays all edges into a DSU. At each midpoint, check connectivity.

Template: Version 2 above.

Examples: CF 1923F, USACO Fencing the Cows variants.

Pattern 3: Minimum Cost / Weight Threshold

Edges have weights. Queries ask: "what is the minimum weight $w$ such that, using only edges with weight $\leq w$ , nodes $u$ and $v$ are connected?"

Sort edges by weight (not insertion time). The "time axis" becomes the sorted edge index. Same parallel binary search framework.

cpp

sort(edges.begin(), edges.end(), [](auto& a, auto& b) {
    return a.w < b.w;
});
// Now "event j" = "add the j-th lightest edge".
// Each query binary searches over [0, m-1] for minimum j.

Examples: CF 1851G (can also be solved with offline DSU, but PBS generalizes to harder variants).

Pattern 4: Assign Values via PBS (Meteors-Style)

Each of $N$ nations owns a set of stations on a circular array. $M$ meteor showers each add values to a range $[l, r]$ . Each nation has a target sum. Find the earliest shower after which each nation's total reaches its target.

The check function queries total value across all stations of a nation. Use a BIT for range-add, point-query (or range-add, range-query with a difference BIT).

cpp

// Range update: add v to [l, r]
void range_add(BIT& bit, int l, int r, long long v) {
    bit.update(l, v);
    bit.update(r + 1, -v);
}
// Point query: prefix sum gives value at position i
long long point_query(BIT& bit, int i) {
    return bit.query(i);
}

This is the canonical PBS problem. Each pass: sweep showers in order, range-add to BIT, check nations at their midpoints.

Example: Meteors (POI 2011).

Pattern 5: PBS + Merge Sort Tree / Wavelet Tree

Sometimes the check function needs "count of elements $\leq x$ in range $[l, r]$ ." A merge sort tree (or wavelet tree) answers this in $O (\log^{2} N)$ or $O (\log N)$ . Pair it with PBS when you have $Q$ queries each needing a different threshold $x$ .

  Binary search over x for each query.
  Each pass: for all queries at midpoint x, query the merge sort
  tree for count of elements <= x in [l, r].

Example: CF 1093G -- more naturally solved with persistent segment trees, but PBS is an alternative.

Pattern 6: Optimized Reset via Undo Log

Instead of clearing the entire data structure each pass, keep an undo log. After each pass, undo all modifications. This avoids the $O (N)$ reset cost and may be faster in practice when $T ≪ N$ .

The DSU Version 2 above uses this approach with the modified vector. For BIT, track which indices were updated:

cpp

vector<int> touched;
void tracked_update(BIT& bit, int i, long long v) {
    touched.push_back(i);
    bit.update(i, v);
}
void undo_all(BIT& bit) {
    // Zero out only touched cells -- faster than full reset.
    // Rebuild would be needed; simpler to just re-zero the BIT.
    bit.reset(); // For BIT, full reset is usually fine.
    touched.clear();
}

Contest Cheat Sheet

+-------------------------------------------------------------------+
|               PARALLEL BINARY SEARCH CHEAT SHEET                  |
+-------------------------------------------------------------------+
| WHEN TO USE:                                                      |
|   - Q queries, each binary searching over the same event axis     |
|   - Predicate is monotonic (once true, stays true)                |
|   - All queries and events are known offline                      |
|   - Independent binary searches would repeat event processing     |
+-------------------------------------------------------------------+
| FRAMEWORK:                                                        |
|   lo[i] = 0, hi[i] = M-1 for all queries i                      |
|   repeat ceil(log2(M)) times:                                     |
|     bucket queries by mid = (lo+hi)/2                             |
|     sweep events 0..M-1:                                          |
|       apply event j to data structure                             |
|       for each query i with mid_i == j:                           |
|         if check(i): ans[i]=j, hi[i]=mid-1                       |
|         else:        lo[i]=mid+1                                  |
|     reset data structure                                          |
+-------------------------------------------------------------------+
| COMPLEXITY:                                                       |
|   BIT:  O((M + Q) * log(N) * log(M))  |  O(N + M + Q) space     |
|   DSU:  O((M + Q) * alpha(N) * log(M)) | O(N + M + Q) space     |
+-------------------------------------------------------------------+
| REMEMBER:                                                         |
|   - Midpoint must be computed BEFORE the pass (snapshot lo/hi)    |
|   - Reset data structure between passes (or use undo log)         |
|   - ans[i] stores last known feasible value; final answer = ans   |
|   - Handle "no answer" case: if lo > hi and ans == -1, print -1  |
|   - 1-index conversion: if events are 0-indexed, ans[i]+1        |
+-------------------------------------------------------------------+

Gotchas & Debugging

Midpoint computed during the pass instead of before. Once you update lo[i] or hi[i] mid-pass, the midpoint changes. Always compute all midpoints before the sweep starts (or bucket queries at the start of each pass). The bucketing approach in Version 1 handles this correctly.
Off-by-one in binary search. The standard pitfall. If the predicate is "sum >= k", use hi = mid when true and lo = mid + 1 when false (finding leftmost true). The implementations above use hi = mid - 1 with a separate ans variable to record the last feasible value -- both conventions work, but mixing them is a bug.
Forgetting to reset the data structure. Each pass must start from a clean state. If you forget to reset the BIT or DSU, events from the previous pass contaminate the current one, producing wrong midpoint evaluations.
Not handling "never feasible" queries. If a query's condition is never satisfied (e.g., nodes that are never connected), ans stays at $- 1$ . Make sure your output handles this case.
BIT overflow. If events add large values and you sum over many events, the BIT needs long long. With $M = 10^{5}$ events each adding up to $10^{9}$ , the sum can reach $10^{14}$ .
DSU reset correctness. When resetting a DSU by undoing modifications, process the undo log in reverse order. The modified vector is a stack -- pop from the back.
Circular ranges in Meteors. If events are range-adds on a circular array ( $l > r$ wraps around), split into two BIT updates: [l, N] and [1, r].
Debugging tip. Print each query's $[l o, h i]$ after every pass. Verify that intervals shrink monotonically. If any interval grows or oscillates, the check function is not monotonic (or there is a reset bug).

Mental Traps

"PBS is just binary search on each query independently." This is exactly what PBS is NOT. Independent binary searches cost $O (Q \log T \cdot check)$ . PBS runs all $Q$ searches together -- each round processes all events once and advances every query's interval simultaneously. The savings come from batching the event sweep across all queries.
"Parallel means multithreaded." The word "parallel" refers to the $Q$ binary searches running in lockstep (simultaneously), not to CPU parallelism or concurrency. Don't let the name mislead you.
"If I can binary search the answer, I should use PBS." PBS helps only when (1) you have multiple queries each needing binary search, and (2) the check for all queries at a given midpoint can be done efficiently in one sweep. If each query's individual check is already fast ( $O (\log N)$ ), running $Q$ separate binary searches may be fine.
"PBS and CDQ divide-and-conquer are interchangeable." They solve overlapping but distinct problem classes. PBS requires a monotonic predicate per query; CDQ D&C partitions the problem recursively and handles contribution counting. Reaching for the wrong one wastes contest time.

The mistake that teaches

CF Meteors: replaying from scratch per midpoint. My PBS had the right asymptotic complexity on paper but kept TLEing. The bug: within each iteration, I was replaying operations from scratch for every distinct midpoint. If 1000 queries all had mid = 500, I applied operations 1..500 once. But for another 1000 with mid = 501, I cleared the BIT and replayed operations 1..501 from scratch. Fix: within each PBS iteration, sort the midpoints and process them in increasing order, applying operations incrementally (add operation mid to the BIT, don't restart from 1). Reset once per round, not per midpoint. This cut my constant factor ~50x. PBS's power comes from processing events once per round, not once per query.

Debug Drills

Drill 1: Off-by-one in the binary search convergence.

cpp

// Initial setup
vector<int> lo(Q, 0), hi(Q, T); // T operations numbered 0..T-1

for (int iter = 0; iter < 25; iter++) {
    // Group queries by midpoint
    vector<vector<int>> at(T + 1);
    for (int i = 0; i < Q; i++) {
        if (lo[i] <= hi[i]) {
            int mid = (lo[i] + hi[i]) / 2;
            at[mid].push_back(i);
        }
    }
    // Apply operations and check predicates
    reset_ds();
    for (int t = 0; t <= T; t++) {
        if (t > 0) apply_operation(t); // apply operation t
        for (int q : at[t]) {
            if (check(q)) hi[q] = t - 1;
            else lo[q] = t + 1;
        }
    }
}
// Answer for query i is lo[i]

What's wrong?

Operations are numbered 0..T-1, but hi is initialized to T (which is out of range). When mid = T, the code tries apply_operation(T), which doesn't exist. If the predicate is never satisfied, lo[i] ends up as T+1, which is an invalid answer. Fix: initialize hi[i] = T - 1 (last valid operation), and handle the "never satisfied" case separately (if lo[i] > T - 1 after convergence, the answer is "impossible").

Drill 2: Forgetting to reset the data structure between iterations.

cpp

for (int iter = 0; iter < LOG; iter++) {
    vector<vector<int>> buckets(T + 1);
    for (int i = 0; i < Q; i++) {
        if (lo[i] <= hi[i])
            buckets[(lo[i] + hi[i]) / 2].push_back(i);
    }
    // BUG: no reset here
    for (int t = 1; t <= T; t++) {
        apply(t);
        for (int q : buckets[t])
            if (check(q)) hi[q] = t - 1;
            else lo[q] = t + 1;
    }
}

What's wrong?

The data structure (BIT, DSU, etc.) is not reset between PBS iterations. After iteration 1, the DS contains all T operations. Iteration 2 starts applying operations again on top of the existing state, effectively doubling the values. Fix: call reset_ds() at the beginning of each iteration, before the inner loop.

Drill 3: Predicate check uses the wrong accumulated state.

cpp

for (int t = 1; t <= T; t++) {
    for (int q : buckets[t]) {
        if (check(q)) hi[q] = t - 1;
        else lo[q] = t + 1;
    }
    apply(t); // BUG: apply AFTER checking
}

What's wrong?

The predicate for queries with midpoint t should be checked after applying operations 1..t. But here, apply(t) happens after checking queries at midpoint t, so those queries only see operations 1..t-1. This shifts every answer by 1. Fix: move apply(t) before the query-checking loop, so queries at midpoint t see the effect of operation t.

Drill 4: Midpoint grouping with lo = t on failure.

cpp

for (int round = 0; round < 20; round++) {
    vector<vector<int>> at_mid(T + 1);
    for (int i = 0; i < Q; i++) {
        if (lo[i] < hi[i]) {
            int mid = (lo[i] + hi[i]) / 2;
            at_mid[mid].push_back(i);
        }
    }
    // Process events 1..T, at each time t check queries in at_mid[t]
    for (int t = 1; t <= T; t++) {
        applyEvent(t);
        for (int qi : at_mid[t]) {
            if (check(qi))
                hi[qi] = t;     // answer might be earlier
            else
                lo[qi] = t;     // answer is later -- BUG
        }
    }
    resetDataStructure();
}

What's wrong?

When the check fails (answer is later than mid), lo[qi] should be set to mid + 1, i.e., t + 1. Setting lo[qi] = t means the next round will include midpoint t again in the search range, potentially causing an infinite loop where lo never advances past a failing midpoint. Fix: lo[qi] = t + 1; (or equivalently, lo[qi] = mid + 1 where mid = t).

Drill 5: Not resetting BIT between rounds.

cpp

for (int round = 0; round < 20; round++) {
    // group queries by midpoint...
    int eventPtr = 0;
    for (int t = 1; t <= T; t++) {
        while (eventPtr < M && events[eventPtr].time <= t)
            bit.add(events[eventPtr].pos, events[eventPtr].val), eventPtr++;
        // check queries at midpoint t...
    }
    // forgot to reset BIT here!
}

What's wrong?

The BIT is never reset between rounds. In round 2, the BIT still contains all events from round 1, so prefix sums are doubled. By round k, values are k times too large, making every query's predicate trivially true. Fix: add bit.reset(); after each round's loop, or undo all insertions at the end of each round. Also, eventPtr should be reset to 0 at the start of each round.

Drill 6: Circular BIT range-add wrap-around.

cpp

// Meteor adds val to circular range [l, r] on array of size n
void addMeteor(int l, int r, long long val) {
    bit.add(l, val);
    bit.add(r + 1, -val);
    // Forgot to handle wrap-around when l > r (circular)
}

What's wrong?

When the range wraps around (l > r in a circular array), the range is [l, n] ∪ [1, r]. The code should split it:

cpp

if (l <= r) {
    bit.add(l, val);
    bit.add(r + 1, -val);
} else {
    bit.add(l, val);
    bit.add(n + 1, -val);
    bit.add(1, val);
    bit.add(r + 1, -val);
}

Without this, a wrapped range like [8, 3] on an array of size 10 incorrectly adds to positions 8..3 as if it were [3, 8], covering the wrong elements entirely.

Self-Test

Without referring back to the text -- check every box you can confidently do from memory.

[ ] Explain why running $Q$ binary searches "in parallel" saves a log factor compared to running them independently. What is shared across queries in each round?
[ ] State the two preconditions for PBS to apply: what properties must the queries and the predicate have? (Monotonic predicate + independent queries over the same ordered domain.)
[ ] Write the outer-loop skeleton of PBS from memory -- initialization of lo/hi, per-round bucketing by midpoint, event sweep, and binary search update -- without filling in problem-specific details.
[ ] Explain why PBS requires offline processing and cannot work when queries arrive online.
[ ] Derive the PBS time complexity for both the BIT variant ( $O ((T + Q) \log N \log T)$ ) and the DSU variant ( $O ((T + Q) \cdot α (N) \cdot \log T)$ ) from the loop structure.
[ ] Identify what goes wrong if you forget to reset the data structure between passes, and describe the visible symptom (intervals not shrinking, wrong answers, non-monotonic convergence).
[ ] Name a concrete problem (e.g., Meteors POI 2011) where PBS is needed and explain why naive independent binary searches would be too slow.

Practice Problems

Rating Progression

CF Rating	What to Master
1200	Understand basic binary search on the answer; recognize monotonic predicates (if condition holds at time t, it holds for all t' > t).
1500	Implement binary search on time for a single query by replaying operations; understand why this is O(Q * T log T) and too slow.
1800	Full parallel binary search: maintain `[lo, hi]` per query, group queries by midpoint, simulate operations once per iteration, check predicates in batch. O((T + Q) log T * f(n)) total. Parallelize: maintain lo[i]/hi[i] per query, group by midpoint, process events in one sweep, update bounds.
2100	Combine PBS with BIT/segment tree for efficient operation simulation; handle tricky predicates (e.g., "earliest time when connected component has sum >= K"); use PBS as a building block inside CDQ or other offline frameworks. PBS + BIT range-add for weighted events, PBS + DSU with rollback for connectivity, nested PBS.

#	Problem	Source	Difficulty	Key Idea
1	Meteors	POI 2011	Hard	PBS + BIT range-add; canonical PBS problem
2	CF 1923F -- Shrink-Ray	CF	Medium-Hard	PBS + prefix sums; shrinking intervals
3	CF 852D -- Exploration Plan	CF	Hard	PBS + DSU + bipartite matching check
4	CF 1100E -- Andrew and Taxi	CF	Medium	Binary search on edge weight + topological sort; PBS extends to multi-query
5	CF 670E -- Correct Bracket Sequence Editor	CF	Medium	PBS on deletion timeline
6	USACO Plat -- Robotic Cow Herd	USACO	Hard	PBS + priority queue / sorted merge
7	CF 1039D -- You Are Given a Tree	CF	Very Hard	PBS over block sizes; heavy-light decomposition
8	Aizu 2645 -- Community	Aizu	Hard	PBS + weighted DSU connectivity

Parallel Binary Search ​

Contents ​

Intuition ​

Visual Walkthrough ​

The Invariant ​

When to Reach for This ​

What the Code Won't Teach You ​

C++ STL Reference ​

Implementation (Contest-Ready) ​

Version 1 -- Minimal: Parallel Binary Search Framework ​

Version 2 -- Explained: Minimum Time for Connectivity (DSU) ​

Operations Reference ​

Problem Patterns & Tricks ​

Pattern 1: Minimum Time for Threshold (BIT) ​

Pattern 2: Minimum Time for Connectivity (DSU) ​

Pattern 3: Minimum Cost / Weight Threshold ​

Pattern 4: Assign Values via PBS (Meteors-Style) ​

Pattern 5: PBS + Merge Sort Tree / Wavelet Tree ​

Pattern 6: Optimized Reset via Undo Log ​

Contest Cheat Sheet ​

Gotchas & Debugging ​

Mental Traps ​

The mistake that teaches ​

Debug Drills ​

Self-Test ​

Practice Problems ​

Further Reading ​

Parallel Binary Search

Contents

Intuition

Visual Walkthrough

The Invariant

When to Reach for This

What the Code Won't Teach You

C++ STL Reference

Implementation (Contest-Ready)

Version 1 -- Minimal: Parallel Binary Search Framework

Version 2 -- Explained: Minimum Time for Connectivity (DSU)

Operations Reference

Problem Patterns & Tricks

Pattern 1: Minimum Time for Threshold (BIT)

Pattern 2: Minimum Time for Connectivity (DSU)

Pattern 3: Minimum Cost / Weight Threshold

Pattern 4: Assign Values via PBS (Meteors-Style)

Pattern 5: PBS + Merge Sort Tree / Wavelet Tree

Pattern 6: Optimized Reset via Undo Log

Contest Cheat Sheet

Gotchas & Debugging

Mental Traps

The mistake that teaches

Debug Drills

Self-Test

Practice Problems

Further Reading