Closest Pair of Points

Quick Revisit
USE WHEN: finding minimum distance between any two points in a 2D point set
INVARIANT: strip of width 2d around dividing line has ≤ 6 candidate neighbors per point in y-order
TIME: O(n log n) | SPACE: O(n)
CLASSIC BUG: checking too few points in the strip — must compare against next 6–7 by y, not just adjacent
PRACTICE: 08-ladder-mixed

Finding the two nearest points in a set of $n$ 2D points, in $O (n \log n)$ -- a classic divide-and-conquer gem that shows up in clustering, collision detection, and as a subroutine in harder geometry tasks.

Difficulty: [Intermediate]Prerequisites: Geometry Basics, SortingCF Rating Range: 1500 -- 2000

The Brute Force Wall

Check all $(\binom{n}{2})$ pairs. That's $O (n^{2})$ , which dies at $n = 10^{5}$ .

The structure we exploit: points that are far apart in sorted order can be pruned without checking. But the trick is doing this in 2D, where "sorted order" isn't enough -- you need the strip argument.

Divide and Conquer -- The Algorithm

Sort points by x-coordinate.
Divide into left and right halves by median x.
Conquer recursively: find closest pair in each half. Let $d = min (d_{L}, d_{R})$ .
Combine: the hard part. The closest pair might straddle the dividing line.

The Strip Trick

Only points within distance $d$ of the dividing line can improve the answer. Collect these into a strip, sort them by y-coordinate, and for each point, compare it against subsequent points in the strip -- but stop after at most 6 comparisons.

      Left  |  Right
            |
    .       |   .
        .   |
    .     . | .          Strip width = 2d
        .   | .   .      (d on each side of the dividing line)
    .       |
            |
       |<-d-|--d->|

  Within a d x 2d box, at most 8 points can fit
  (pigeonhole on d/2 x d/2 grid cells).
  Each point checks <= 6 others -> O(n) merge step.

Why 6 (or 7)? Pack the $d \times 2 d$ strip with non-overlapping $d / 2 \times d / 2$ cells: 2 columns × 4 rows = 8 cells, each can hold at most one point (otherwise two points would be within distance $d$ of each other on the same side of the divider, contradicting the recursive guarantee). So one point sees at most 7 others in the strip. Some textbooks tighten the bound to 6 by a slightly more careful packing argument; the difference does not matter as long as your loop uses a distance window rather than a fixed count. The implementation below takes that approach: keep advancing while $| y_{j} - y_{i} | < d$ , regardless of count. That sidesteps the 6-versus-7 debate entirely.

Total: $T (n) = 2 T (n / 2) + O (n \log n)$ with naive strip sort, or $O (n \log n)$ total if you merge-sort by y alongside the recursion.

Why this file shows the sweep, not the recursion. The textbook divide-and-conquer version is the cleanest place to prove the $O (n \log n)$ bound, and the strip-window argument is the heart of that proof. But the recursive version is fiddly to code (merging y-sorted halves in linear time, base cases, allocation churn). The sweep-line variant below has the same complexity and is easier to get right under contest pressure, so most contestants prefer it. Read the recursion as the why, write the sweep as the what.

Implementation

cpp

#include <bits/stdc++.h>
using namespace std;

using ll = long long;
struct P {
    ll x, y;
    int id;
};

ll dist2(P& a, P& b) {
    return (a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y);
}

ll closest_pair(vector<P>& pts) {
    int n = pts.size();
    sort(pts.begin(), pts.end(), [](P& a, P& b){
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });

    ll best = LLONG_MAX;
    // maintain a set ordered by y for the active window
    auto cmp = [](const P& a, const P& b){
        return a.y < b.y || (a.y == b.y && a.x < b.x);
    };
    set<P, decltype(cmp)> active(cmp);

    int left = 0;
    for (int i = 0; i < n; i++) {
        ll d = (ll)ceil(sqrt((double)best));
        // remove points too far left
        while (left < i && pts[i].x - pts[left].x > d) {
            active.erase(active.find(pts[left]));
            left++;
        }
        // query points in [y-d, y+d]
        P lo = {LLONG_MIN, pts[i].y - d, -1};
        for (auto it = active.lower_bound(lo); it != active.end(); ++it) {
            if (it->y - pts[i].y > d) break;
            ll dd = dist2(pts[i], const_cast<P&>(*it));
            best = min(best, dd);
        }
        active.insert(pts[i]);
    }
    return best; // squared distance
}

Note: This is the sweep-line variant (simpler to code than recursive D&C). It processes points left-to-right, maintaining a balanced BST of "active" points within the current best distance window. Same $O (n \log n)$ complexity.

A note on the sqrt call inside the loop. The Floating-Point Gotchas chapter (11-floating-point-gotchas.md) warns against sqrt in inner loops because of precision drift and cost. Here, ceil(sqrt(best)) runs once per outer-loop iteration ( $O (n)$ total calls, not $O (n^{2})$ ), and ceil plus a long long cast guarantees that any rounding error makes the window wider rather than narrower — so we never miss a candidate pair. The cost is one sqrt per processed point. If you want to be paranoid, an integer alternative is to maintain best and best_root_ceil together, recomputing best_root_ceil only when best decreases. For contest use, the version below is fast enough.

The Randomized Alternative

Hash points into a grid of cell size $d$ (where $d$ is a rough estimate of the answer). For each point, check only the $O (1)$ neighboring cells. Expected $O (n)$ time after an initial $O (n)$ pass to get a starting $d$ from a random sample. Rarely needed in contests but good to know exists.

When This Shows Up

"Minimum distance between any two points" -- directly this problem.
Clustering: closest pair is a building block for hierarchical clustering.
Collision detection: "do any two objects overlap?" reduces to closest pair.
As a subroutine: some problems compute closest pair after transforming coordinates.

Gotchas

Trap	What happens	Fix
Using `double` distances	Precision loss for large coordinates	Compare squared distances (stay in `long long`)
Forgetting to handle $n \leq 3$	Edge case in recursive version	Handle base case explicitly
`sqrt` in the innermost loop	Slow and imprecise	Compare squared distances; only call `sqrt` once per outer iteration to size the sliding window (as the implementation does)
Duplicate points	Distance 0, division by zero in grid hash	Handle `best = 0` early exit

Self-Test

Q1. In the strip merge step, why is it sufficient to check only the next 6-7 points in y-sorted order rather than all strip points?
Answer
Within the $d \times 2d$ rectangle centered on the dividing line, you can fit at most 8 points (by a pigeonhole argument on a grid of $d/2 \times d/2$ cells). So each point has at most 7 neighbors in the strip, and checking 6-7 subsequent points in y-order covers all of them.

Q2. The sweep-line implementation uses ceil(sqrt(best)). Why not just sqrt(best)?
Answer
`best` stores the squared distance as a `long long`. We need the actual distance $d$ to define the sliding window width. Using `ceil` ensures we don't accidentally truncate and make the window too narrow, which could miss the true closest pair. The conservative rounding guarantees correctness.

Q3. Why does the implementation compare squared distances instead of actual distances?
Answer
Squared distances avoid `sqrt`, which is both slow and introduces floating-point error. Since $\text{dist}(a,b) < \text{dist}(c,d) \iff \text{dist}^2(a,b) < \text{dist}^2(c,d)$, comparisons on squared distances are equivalent and stay in exact integer arithmetic.

Q4. What happens if all $n$ points are identical? Does the algorithm handle it correctly?
Answer
All distances are 0, so `best = 0` immediately. In the sweep-line version, `d = 0`, so the window collapses -- no points are ever removed from the active set, but no new pairs improve the answer either. The algorithm returns 0 correctly. However, if using a grid-based randomized approach, cell size $d = 0$ causes division by zero -- special-case it.

Practice Problems

Problem	What it tests
CSES -- Minimum Euclidean Distance	Textbook closest pair
SPOJ -- CLOPPAIR	Closest pair with point IDs
CF 429D -- Tricky Function	Reduce to closest pair via transformation
UVa 10245 -- The Closest Pair Problem	Classic implementation

Closest Pair of Points ​

The Brute Force Wall ​

Divide and Conquer -- The Algorithm ​

The Strip Trick ​

Implementation ​

The Randomized Alternative ​

When This Shows Up ​

Gotchas ​

Self-Test ​

Practice Problems ​

Further Reading ​