Appearance
Closest Pair of Points
Quick Revisit
- USE WHEN: finding minimum distance between any two points in a 2D point set
- INVARIANT: strip of width 2d around dividing line has ≤ 6 candidate neighbors per point in y-order
- TIME: O(n log n) | SPACE: O(n)
- CLASSIC BUG: checking too few points in the strip — must compare against next 6–7 by y, not just adjacent
- PRACTICE: 08-ladder-mixed
Finding the two nearest points in a set of
Difficulty: [Intermediate]Prerequisites: Geometry Basics, SortingCF Rating Range: 1500 -- 2000
The Brute Force Wall
Check all
The structure we exploit: points that are far apart in sorted order can be pruned without checking. But the trick is doing this in 2D, where "sorted order" isn't enough -- you need the strip argument.
Divide and Conquer -- The Algorithm
- Sort points by x-coordinate.
- Divide into left and right halves by median x.
- Conquer recursively: find closest pair in each half. Let
. - Combine: the hard part. The closest pair might straddle the dividing line.
The Strip Trick
Only points within distance
Left | Right
|
. | .
. |
. . | . Strip width = 2d
. | . . (d on each side of the dividing line)
. |
|
|<-d-|--d->|
Within a d x 2d box, at most 8 points can fit
(pigeonhole on d/2 x d/2 grid cells).
Each point checks <= 6 others -> O(n) merge step.Why 6 (or 7)? Pack the
Total:
Why this file shows the sweep, not the recursion. The textbook divide-and-conquer version is the cleanest place to prove the
bound, and the strip-window argument is the heart of that proof. But the recursive version is fiddly to code (merging y-sorted halves in linear time, base cases, allocation churn). The sweep-line variant below has the same complexity and is easier to get right under contest pressure, so most contestants prefer it. Read the recursion as the why, write the sweep as the what.
Implementation
cpp
#include <bits/stdc++.h>
using namespace std;
using ll = long long;
struct P {
ll x, y;
int id;
};
ll dist2(P& a, P& b) {
return (a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y);
}
ll closest_pair(vector<P>& pts) {
int n = pts.size();
sort(pts.begin(), pts.end(), [](P& a, P& b){
return a.x < b.x || (a.x == b.x && a.y < b.y);
});
ll best = LLONG_MAX;
// maintain a set ordered by y for the active window
auto cmp = [](const P& a, const P& b){
return a.y < b.y || (a.y == b.y && a.x < b.x);
};
set<P, decltype(cmp)> active(cmp);
int left = 0;
for (int i = 0; i < n; i++) {
ll d = (ll)ceil(sqrt((double)best));
// remove points too far left
while (left < i && pts[i].x - pts[left].x > d) {
active.erase(active.find(pts[left]));
left++;
}
// query points in [y-d, y+d]
P lo = {LLONG_MIN, pts[i].y - d, -1};
for (auto it = active.lower_bound(lo); it != active.end(); ++it) {
if (it->y - pts[i].y > d) break;
ll dd = dist2(pts[i], const_cast<P&>(*it));
best = min(best, dd);
}
active.insert(pts[i]);
}
return best; // squared distance
}Note: This is the sweep-line variant (simpler to code than recursive D&C). It processes points left-to-right, maintaining a balanced BST of "active" points within the current best distance window. Same
A note on the
sqrtcall inside the loop. The Floating-Point Gotchas chapter (11-floating-point-gotchas.md) warns againstsqrtin inner loops because of precision drift and cost. Here,ceil(sqrt(best))runs once per outer-loop iteration (total calls, not ), and ceilplus along longcast guarantees that any rounding error makes the window wider rather than narrower — so we never miss a candidate pair. The cost is onesqrtper processed point. If you want to be paranoid, an integer alternative is to maintainbestandbest_root_ceiltogether, recomputingbest_root_ceilonly whenbestdecreases. For contest use, the version below is fast enough.
The Randomized Alternative
Hash points into a grid of cell size
When This Shows Up
- "Minimum distance between any two points" -- directly this problem.
- Clustering: closest pair is a building block for hierarchical clustering.
- Collision detection: "do any two objects overlap?" reduces to closest pair.
- As a subroutine: some problems compute closest pair after transforming coordinates.
Gotchas
| Trap | What happens | Fix |
|---|---|---|
Using double distances | Precision loss for large coordinates | Compare squared distances (stay in long long) |
| Forgetting to handle | Edge case in recursive version | Handle base case explicitly |
sqrt in the innermost loop | Slow and imprecise | Compare squared distances; only call sqrt once per outer iteration to size the sliding window (as the implementation does) |
| Duplicate points | Distance 0, division by zero in grid hash | Handle best = 0 early exit |
Self-Test
Q1. In the strip merge step, why is it sufficient to check only the next 6-7 points in y-sorted order rather than all strip points?
Answer
Within the $d \times 2d$ rectangle centered on the dividing line, you can fit at most 8 points (by a pigeonhole argument on a grid of $d/2 \times d/2$ cells). So each point has at most 7 neighbors in the strip, and checking 6-7 subsequent points in y-order covers all of them.
Q2. The sweep-line implementation uses
ceil(sqrt(best)). Why not justsqrt(best)?Answer
`best` stores the squared distance as a `long long`. We need the actual distance $d$ to define the sliding window width. Using `ceil` ensures we don't accidentally truncate and make the window too narrow, which could miss the true closest pair. The conservative rounding guarantees correctness.
Q3. Why does the implementation compare squared distances instead of actual distances?
Answer
Squared distances avoid `sqrt`, which is both slow and introduces floating-point error. Since $\text{dist}(a,b) < \text{dist}(c,d) \iff \text{dist}^2(a,b) < \text{dist}^2(c,d)$, comparisons on squared distances are equivalent and stay in exact integer arithmetic.
Q4. What happens if all
points are identical? Does the algorithm handle it correctly? Answer
All distances are 0, so `best = 0` immediately. In the sweep-line version, `d = 0`, so the window collapses -- no points are ever removed from the active set, but no new pairs improve the answer either. The algorithm returns 0 correctly. However, if using a grid-based randomized approach, cell size $d = 0$ causes division by zero -- special-case it.
Practice Problems
| Problem | What it tests |
|---|---|
| CSES -- Minimum Euclidean Distance | Textbook closest pair |
| SPOJ -- CLOPPAIR | Closest pair with point IDs |
| CF 429D -- Tricky Function | Reduce to closest pair via transformation |
| UVa 10245 -- The Closest Pair Problem | Classic implementation |
Further Reading
- Geometry Basics -- distance formulas, cross product
- Sweep Line Geometry -- the sweep paradigm generalized