Skip to content

Linear Basis (XOR Basis)

Quick Revisit

  • USE WHEN: XOR subset queries -- max XOR, min XOR, k-th XOR, count distinct XOR values
  • INVARIANT: basis[k] has bit k as its highest set bit; at most 60 basis vectors for 64-bit values
  • TIME: O(n log V) build, O(log V) per query | SPACE: O(log V)
  • CLASSIC BUG: forgetting to reduce basis to RREF before k-th XOR query
  • PRACTICE: ../12-practice-ladders/08-ladder-mixed.md

A Gaussian elimination structure over GF(2) that lets you answer XOR subset queries in O(logV) time.

DifficultyAdvanced
CF Rating1800 - 2100
PrerequisitesBit Manipulation

Contents


Why XOR subsets are tricky

Here is a deceptively simple question: given n integers, pick any subset and XOR them together. What is the maximum value you can produce?

Brute force checks all 2n subsets -- hopeless when n can reach 105. The trick is to realize that most of those 2n combinations are redundant. Think of each number as a vector of bits. XOR is addition in a vector space over the field with two elements (GF(2), where 11=0). A subset-XOR is really a linear combination with coefficients 0 or 1.

Just like in ordinary linear algebra, every vector space has a basis -- a minimal set of vectors that spans the whole space. For 64-bit integers the dimension is bounded by the number of bits actually used, so we never need more basis vectors than there are bits in the input. Building that basis is Gaussian elimination, adapted for XOR.

Sizing the basis array — match the input bound exactly. The right MAXLOG depends on the input, not the integer type:

Input boundBits neededbasis array size
0ai109 (fits int)30basis[30]
0ai1018 (fits long long)60basis[60]
0ai<263 (signed long long, no negatives)63basis[63]
unsigned 64-bit values64basis[64] (use uint64_t)

The conventional basis[60] choice is tuned to 1018 contest bounds. If the problem hands you values up to 262, basis[60] silently misses the top two bits. Always read the constraint and set the bound to match.

Building the basis: Gaussian elimination over GF(2)

We maintain an array basis[0..59]. Slot k either holds a number whose highest set bit is bit k, or is zero (empty). To insert a value x:

  1. Scan from the highest bit of x downward.
  2. Let k be the current highest bit of x.
    • If basis[k] == 0 -- no existing element has this leading bit. Store x there and stop. We just increased the rank by one.
    • If basis[k] != 0 -- XOR x with basis[k]. This clears bit k in x and we repeat with the new, smaller x.
  3. If x becomes 0, it was linearly dependent -- discard it.

This is exactly row reduction. Each successful insertion adds one independent element. The invariant is: every stored value has a unique leading bit that no other stored value shares.

Worked example: inserting 7, 5, 3, 6

Let us trace through with four numbers. We show binary representations and which basis slots fill up.

Insert 7 (binary 111). Highest bit is bit 2. basis[2] is empty, so store 7 there.

basis[2] = 111  (7)
basis[1] = 000  (empty)
basis[0] = 000  (empty)

Insert 5 (binary 101). Highest bit is bit 2. basis[2] = 7, so XOR: 57=2 (binary 010). Now highest bit is bit 1. basis[1] is empty, so store 2 there.

basis[2] = 111  (7)
basis[1] = 010  (2)
basis[0] = 000  (empty)

Note: we stored 2, not the original 5. That is fine -- the span is the same.

Insert 3 (binary 011). Highest bit is bit 1. basis[1] = 2, so XOR: 32=1 (binary 001). Highest bit is bit 0. basis[0] is empty, store 1.

basis[2] = 111  (7)
basis[1] = 010  (2)
basis[0] = 001  (1)

All three slots are full -- rank 3, meaning 23=8 distinct XOR values reachable.

Insert 6 (binary 110). Highest bit is bit 2. XOR with basis[2]: 67=1 (001). Highest bit is bit 0. XOR with basis[0]: 11=0. Result is 0 -- linearly dependent. Discarded.

basis[2] = 111  (7)    (unchanged)
basis[1] = 010  (2)
basis[0] = 001  (1)

What you can do with a basis

Once built, the basis supports several powerful queries.

Can a value be represented? Run the same insertion procedure. If it reduces to 0, yes. If it would be inserted (non-zero remainder), no.

Maximum XOR. Greedily accumulate from the highest basis slot downward: if XOR-ing basis[k] with the running answer increases it, take it. Since basis[k] has bit k as its leading bit, this always works -- higher bits dominate.

Minimum non-zero XOR. The lowest-index non-zero slot in the basis (after reducing to row echelon form) gives the minimum.

k-th smallest XOR. Reduce the basis to reduced row echelon form (each basis element has its leading bit set and no other basis element has that bit). Then the 2|basis| achievable values correspond to binary numbers: the i-th bit of k tells you whether to include the i-th basis vector (sorted by bit position).

Merge two bases. Insert every element of one basis into the other -- at most 60 insertions.

Count of distinct XOR values. Exactly 2|basis| (including 0).

Row-echelon vs reduced row-echelon — a subtle distinction

The standard insertion procedure described above produces a basis in row-echelon form (REF): each basis[k] has bit k as its highest set bit, but lower bits may be set, and may also be set in other basis vectors.

QueryForm needed
Insert / can-be-represented / maximum XORRow echelon (REF) — what insert builds
Minimum non-zero XORRow echelon
k-th smallest XOR / ordered enumerationReduced row echelon (RREF)

RREF additionally forces: for each basis[k], no other basis vector has bit k set. That eliminates the cross-talk between basis vectors so that XOR-ing in any subset of basis vectors yields a value whose bit pattern at the pivot positions exactly matches the inclusion pattern. That is what makes the bijection "binary representation of k ↔ subset of basis vectors ↔ k-th XOR value" work.

Skipping the reduce step before a k-th-smallest query is the silent bug the quick revisit warns about — max XOR will keep working, so it is easy to overlook.

SPAN OF THE BASIS (rank 3, values 0-7):

  basis[2]=111  basis[1]=010  basis[0]=001

  XOR combination          Value
  ─────────────────────    ─────
  (none)                     000 = 0
  basis[0]                   001 = 1
  basis[1]                   010 = 2
  basis[1] ^ basis[0]        011 = 3
  basis[2]                   111 = 7
  basis[2] ^ basis[0]        110 = 6
  basis[2] ^ basis[1]        101 = 5
  basis[2]^basis[1]^basis[0] 100 = 4

  All 2^3 = 8 values reachable.  This is a complete 3-bit space.

Applications in competitive programming

  • Maximum XOR subarray: Compute prefix XOR pi. Any subarray XOR is prpl1. Fix r, find maximum over all l -- insert all previous prefix XORs into a basis and query max XOR with current pr.
  • XOR spanning tree: All spanning trees of a graph have XOR values differing by cycle XORs. Build a basis of cycle XORs, then find the spanning tree XOR that maximizes (or minimizes) the target.
  • Path XOR in trees: XOR along a path = XOR of root-to-u and root-to-v. Combine with basis for subtree queries.
  • Linearly independent subsets: The basis size is the maximum number of elements you can choose so that no subset XORs to 0.

Visual intuition

Basis slots during insertion

Inserting: 7(111)  5(101)  3(011)  6(110)

Step 1: x = 7 = 111
  bit 2 slot empty --> store

  [bit 2] 111       << 7 goes here
  [bit 1] ---
  [bit 0] ---

Step 2: x = 5 = 101
  bit 2 slot has 111 --> x ^= 111 --> x = 010
  bit 1 slot empty --> store

  [bit 2] 111
  [bit 1] 010       << 5 ^ 7 = 2 goes here
  [bit 0] ---

Step 3: x = 3 = 011
  bit 1 slot has 010 --> x ^= 010 --> x = 001
  bit 0 slot empty --> store

  [bit 2] 111
  [bit 1] 010
  [bit 0] 001       << 3 ^ 2 = 1 goes here

Step 4: x = 6 = 110
  bit 2 slot has 111 --> x ^= 111 --> x = 001
  bit 0 slot has 001 --> x ^= 001 --> x = 000
  x == 0 --> DISCARD (dependent)

  [bit 2] 111       (no change)
  [bit 1] 010
  [bit 0] 001

Greedy maximum XOR walkthrough

Goal: maximize ans, starting from ans = 000

  Check basis[2] = 111:
    ans ^ 111 = 111 > 000?  YES --> ans = 111

  Check basis[1] = 010:
    ans ^ 010 = 101 < 111?  NO (bit 2 would turn off)
    Wait -- 111 ^ 010 = 101.  Bit 2 still set? Yes (101 has bit 2).
    But 101 < 111, so skip? Actually the greedy rule is:
    "if bit k of ans is 0, XOR in basis[k]"
    ans = 111: bit 1 is already 1 --> skip basis[1].

  Check basis[0] = 001:
    bit 0 of ans is already 1 --> skip.

  Result: ans = 111 = 7.  Maximum achievable XOR.

The greedy rule is simple: for each basis slot k from high to low, if bit k of the running answer is not set, XOR it in. Since each basis element has a unique leading bit, this never "damages" higher bits.

What the Code Won't Teach You

The code compiles, runs, and gives correct answers. What it hides:

Why the greedy for max-XOR works. Each basis slot k "owns" bit k -- no other basis vector has that bit set as its leader. XOR-ing basis[k] into the running answer toggles bit k without disturbing any higher bit already accumulated. This is a greedy choice with no regret: it works because the basis is in row echelon form. If you skip this understanding, variants (max XOR with a fixed starting value, max XOR over a prefix range) will feel like entirely new problems instead of the same greedy on a modified input.

The persistent/index-aware basis for range queries. To answer "maximum XOR from elements in [l,r]" offline, sort queries by r and maintain a basis where each slot also tracks the latest index of the element stored there. During insertion, if the new element's index is later than the current occupant's, swap them and keep reducing the older one. When answering query [l,r], only use entries whose stored index l.

STANDARD vs INDEX-AWARE INSERTION:

  Standard insert(x):                Index-aware insert(x, idx):
  +---------------------------+      +-----------------------------------+
  | for bit k high to low:    |      | for bit k high to low:            |
  |   if basis[k]==0:         |      |   if basis[k]==0:                 |
  |     basis[k]=x; return    |      |     basis[k]=x; index[k]=idx     |
  |   x ^= basis[k]           |      |   else if idx > index[k]:         |
  |                           |      |     swap(x, basis[k])             |
  |                           |      |     swap(idx, index[k])           |
  |                           |      |   x ^= basis[k]                   |
  +---------------------------+      +-----------------------------------+
  Same span, but index-aware lets you answer range queries.

Reduced row echelon form (for k-th smallest)

Before reduction:       After reduction:
  [bit 2] 111             [bit 2] 100
  [bit 1] 010             [bit 1] 010
  [bit 0] 001             [bit 0] 001

Reduction: for each basis[k], clear bit k in all other basis elements.
  basis[2] ^= basis[1] --> 111 ^ 010 = 101
  basis[2] ^= basis[0] --> 101 ^ 001 = 100

Now k-th smallest (0-indexed, excluding 0):
  k=1 -> take basis[0]           = 001 = 1
  k=2 -> take basis[1]           = 010 = 2
  k=3 -> take basis[0]^basis[1]  = 011 = 3
  k=4 -> take basis[2]           = 100 = 4
  ...and so on, reading k in binary.

C++ STL reference

Function / ExpressionWhat it doesNotes
x ^ yBitwise XORAssociative, commutative
x & yBitwise AND
x | yBitwise OR
x >> kRight shift by kDivides by 2k
1LL << kBit k setUse LL for k31
__builtin_clzll(x)Count leading zeros (64-bit)Undefined for x=0
63 - __builtin_clzll(x)Highest set bit indexCommon idiom
__builtin_popcountll(x)Number of set bits
bit_width(x)C++20: log2x+1<bit> header

Implementation (contest-ready)

Minimal contest template

cpp
#include <bits/stdc++.h>
using namespace std;

struct LinearBasis {
    static const int MAXLOG = 60;
    long long basis[MAXLOG];
    int sz;

    LinearBasis() : sz(0) { memset(basis, 0, sizeof basis); }

    bool insert(long long x) {
        for (int i = MAXLOG - 1; i >= 0; i--) {
            if (!((x >> i) & 1)) continue;
            if (!basis[i]) { basis[i] = x; sz++; return true; }
            x ^= basis[i];
        }
        return false; // dependent
    }

    bool can_represent(long long x) {
        for (int i = MAXLOG - 1; i >= 0; i--) {
            if ((x >> i) & 1) x ^= basis[i];
        }
        return x == 0;
    }

    long long max_xor() {
        long long ans = 0;
        for (int i = MAXLOG - 1; i >= 0; i--)
            ans = max(ans, ans ^ basis[i]);
        return ans;
    }

    long long min_xor() {
        for (int i = 0; i < MAXLOG; i++)
            if (basis[i]) return basis[i];
        return 0; // empty basis
    }

    void reduce() {
        for (int i = MAXLOG - 1; i >= 0; i--)
            if (basis[i])
                for (int j = i + 1; j < MAXLOG; j++)
                    if ((basis[j] >> i) & 1)
                        basis[j] ^= basis[i];
    }

    long long kth_smallest(long long k) {
        // Call reduce() first. Returns k-th smallest (1-indexed).
        // If 0 is reachable (sz < total elements), adjust k.
        vector<long long> compact;
        for (int i = 0; i < MAXLOG; i++)
            if (basis[i]) compact.push_back(basis[i]);
        if (k >= (1LL << (int)compact.size())) return -1;
        long long ans = 0;
        for (int i = (int)compact.size() - 1; i >= 0; i--)
            if ((k >> i) & 1)
                ans ^= compact[i];
        return ans;
    }

    void merge(const LinearBasis &other) {
        for (int i = MAXLOG - 1; i >= 0; i--)
            if (other.basis[i]) insert(other.basis[i]);
    }

    int size() const { return sz; }
};

int main() {
    int n;
    scanf("%d", &n);
    LinearBasis lb;
    for (int i = 0; i < n; i++) {
        long long x;
        scanf("%lld", &x);
        lb.insert(x);
    }
    printf("%lld\n", lb.max_xor());
    return 0;
}

Explained version

cpp
#include <bits/stdc++.h>
using namespace std;

struct LinearBasis {
    static const int MAXLOG = 60; // enough for values up to ~10^18
    long long basis[MAXLOG];
    int sz; // number of independent vectors in the basis

    LinearBasis() : sz(0) {
        memset(basis, 0, sizeof basis);
    }

    // Try to insert x into the basis.
    // Returns true if x was independent (basis grew by 1).
    // Returns false if x was already representable (dependent).
    bool insert(long long x) {
        for (int i = MAXLOG - 1; i >= 0; i--) {
            if (!((x >> i) & 1)) continue; // bit i not set, skip

            if (!basis[i]) {
                // No basis element has bit i as leading bit.
                // x becomes the representative for this bit position.
                basis[i] = x;
                sz++;
                return true;
            }
            // basis[i] exists with leading bit i.
            // XOR x with it to clear bit i in x, then keep reducing.
            x ^= basis[i];
        }
        // x reduced to 0 -- it was a linear combination of existing basis.
        return false;
    }

    // Check whether x can be formed by XOR of some subset of the basis.
    // Same logic as insert, but we don't modify the basis.
    bool can_represent(long long x) {
        for (int i = MAXLOG - 1; i >= 0; i--) {
            if ((x >> i) & 1) x ^= basis[i];
        }
        return x == 0;
    }

    // Maximum XOR achievable from any subset of the basis.
    // Greedy: from highest bit down, XOR in basis[i] if it increases ans.
    // Since basis[i] has bit i as leading bit, XOR-ing it in sets bit i
    // without disturbing any higher bits already accumulated.
    long long max_xor() {
        long long ans = 0;
        for (int i = MAXLOG - 1; i >= 0; i--)
            ans = max(ans, ans ^ basis[i]);
        return ans;
    }

    // Minimum non-zero XOR achievable.
    // In row echelon form (our invariant), the smallest basis element
    // is the answer because it cannot be made smaller by XOR with others.
    long long min_xor() {
        for (int i = 0; i < MAXLOG; i++)
            if (basis[i]) return basis[i];
        return 0; // empty basis -- only 0 is representable
    }

    // Reduce to reduced row echelon form:
    // for each basis[i], clear bit i in every other basis element.
    // After this, each basis element has EXACTLY one unique bit set
    // that no other element shares -- and may have lower bits set too,
    // but those lower bits do NOT appear as leading bits of other elements.
    //
    // Actually, reduced row echelon means: for each basis[j] with j > i,
    // if bit i is set in basis[j], XOR it out using basis[i].
    void reduce() {
        for (int i = MAXLOG - 1; i >= 0; i--) {
            if (!basis[i]) continue;
            for (int j = i + 1; j < MAXLOG; j++) {
                if ((basis[j] >> i) & 1) {
                    basis[j] ^= basis[i];
                }
            }
        }
    }

    // k-th smallest XOR value (1-indexed, among non-zero values).
    // MUST call reduce() before using this.
    //
    // After reduction, collect non-zero basis elements into a compact list.
    // The achievable values biject with binary numbers 0..2^sz - 1:
    // bit j of k tells whether to include compact[j].
    long long kth_smallest(long long k) {
        vector<long long> compact;
        for (int i = 0; i < MAXLOG; i++)
            if (basis[i]) compact.push_back(basis[i]);

        // k must be in [1, 2^sz - 1] for non-zero values.
        // If the original set can produce 0 (i.e., sz < n), then
        // k=1 means 0, and you should subtract 1 from k before calling.
        if (k >= (1LL << (int)compact.size())) return -1;

        long long ans = 0;
        for (int i = (int)compact.size() - 1; i >= 0; i--)
            if ((k >> i) & 1)
                ans ^= compact[i];
        return ans;
    }

    // Merge another basis into this one.
    // At most MAXLOG insertions, so O(MAXLOG^2).
    void merge(const LinearBasis &other) {
        for (int i = MAXLOG - 1; i >= 0; i--)
            if (other.basis[i]) insert(other.basis[i]);
    }

    int size() const { return sz; }
};

int main() {
    int n;
    scanf("%d", &n);

    LinearBasis lb;
    for (int i = 0; i < n; i++) {
        long long x;
        scanf("%lld", &x);
        lb.insert(x);
    }

    // Example: print max XOR, min non-zero XOR, count of distinct values
    printf("Max XOR:      %lld\n", lb.max_xor());
    printf("Min XOR:      %lld\n", lb.min_xor());
    printf("Distinct XOR: %lld\n", 1LL << lb.size());

    // k-th smallest query
    lb.reduce();
    for (long long k = 1; k < (1LL << lb.size()); k++) {
        printf("k=%lld -> %lld\n", k, lb.kth_smallest(k));
    }

    return 0;
}

Operations reference

OperationTime ComplexitySpaceNotes
insert(x)O(logV)O(1)V = max value, so logV60
can_represent(x)O(logV)O(1)Same loop as insert
max_xor()O(logV)O(1)Single greedy pass
min_xor()O(logV)O(1)Scan for first non-zero slot
reduce()O(log2V)O(1)One-time preprocessing
kth_smallest(k)O(logV)O(logV)Requires reduce() first
merge(other)O(log2V)O(1)Insert each of other's elements
size()O(1)O(1)Returns basis rank
Build from n elementsO(nlogV)O(logV)n insertions

Problem patterns and tricks

1. Maximum XOR subset

The most direct application. Insert all values, call max_xor().

cpp
LinearBasis lb;
for (auto x : a) lb.insert(x);
cout << lb.max_xor() << "\n";

Problems: CF 959F -- offline queries sorted by index, maintain basis as you add elements.

2. Maximum XOR subarray (prefix XOR trick)

Any subarray XOR alar=prpl1 where p is the prefix XOR array. To maximize over all lr:

PREFIX XOR TRICK FOR MAXIMUM XOR SUBARRAY:

  Array a:       [3,  5,  2,  6]
  Prefix XOR p:  [0,  3,  6,  4,  2]
                  ^              ^
                  p[0]           p[4]

  Subarray XOR(a[1..3]) = p[3] ^ p[0] = 4 ^ 0 = 4
  Subarray XOR(a[2..4]) = p[4] ^ p[1] = 2 ^ 3 = 1

  For each r, maximize p[r] ^ p[l-1] over all l <= r
  --> insert p[0..r-1] into basis, query max XOR with p[r]
cpp
vector<long long> p(n + 1);
for (int i = 0; i < n; i++) p[i + 1] = p[i] ^ a[i];

LinearBasis lb;
long long ans = 0;
for (int r = 1; r <= n; r++) {
    lb.insert(p[r - 1]);
    // max XOR of p[r] with any element in the basis
    long long cur = p[r];
    for (int i = 59; i >= 0; i--)
        cur = max(cur, cur ^ lb.basis[i]);
    ans = max(ans, cur);
}

Problems: CF 845G -- Shortest Path Problem, which is really about XOR paths in a graph.

3. XOR spanning tree / cycle space

In an undirected weighted graph, fix any spanning tree. Every non-tree edge creates a cycle whose XOR is computable. Build a basis of these cycle XORs. The set of achievable path XORs between two nodes = (tree path XOR) (any subset of cycle basis). Maximize or minimize with max_xor.

XOR CYCLE SPACE IN A GRAPH:

      1 ---5--- 2          Spanning tree edges: 1-2(5), 2-3(3)
      |        / |          Non-tree edge: 1-3(7)  -->  cycle XOR = 5^3^7 = 1
      7      3   |
      |    /     |          Path 1->3 via tree:      5 ^ 3 = 6
      3 -------- 3          Path 1->3 via non-tree:  7
                            Difference:  6 ^ 7 = 1  (= cycle XOR)

  To minimize path XOR: tree_path_xor ^ (optimal subset of cycle basis)

Problems: CF 845G

4. Online insertion with offline queries

Sort queries by time or index. Maintain a linear basis, inserting elements as you go. Each query only sees elements inserted so far.

Problems: CF 959F -- sort queries by right endpoint, insert array elements left to right.

5. Path XOR in trees

Root the tree. For each node u, store du = XOR of edge weights from root to u. Path XOR from u to v = dudv. Combine with segment tree of bases for subtree/path queries.

Problems: Tree XOR queries (various gym problems).

6. k-th smallest XOR value

Build basis, reduce, then use kth_smallest. Remember to handle whether 0 is achievable: if the number of inserted elements exceeds the basis size, then some element was dependent, meaning 0 is in the span. In that case, k=1 maps to 0, so subtract 1 from k before calling kth_smallest.

cpp
LinearBasis lb;
int n = a.size();
for (auto x : a) lb.insert(x);
lb.reduce();
bool zero_reachable = (lb.size() < n);
// if zero_reachable, k=1 is 0, k=2 is kth_smallest(1), etc.

Problems: CF 1101G -- Zero XOR Subset (check if 0 is achievable, which means basis size < n).

7. Counting subsets with a given XOR

The number of subsets of n elements that XOR to a target t is either 0 (if t is not representable) or 2n|basis| (each dependent element can be freely toggled).

Problems: Various counting problems on CF.


Contest cheat sheet

+------------------------------------------------------------------+
|               LINEAR BASIS (XOR BASIS) CHEAT SHEET               |
+------------------------------------------------------------------+
| WHEN TO USE:                                                     |
|   - Maximum/minimum XOR of a subset                              |
|   - k-th smallest XOR value                                      |
|   - Count distinct achievable XOR values (2^rank)                |
|   - Check if value is XOR-representable                          |
|   - XOR cycle space in graphs                                    |
+------------------------------------------------------------------+
| KEY CODE:                                                        |
|   bool insert(ll x) {                                            |
|     for (int i=59;i>=0;i--) {                                    |
|       if (!((x>>i)&1)) continue;                                 |
|       if (!basis[i]) { basis[i]=x; return true; }                |
|       x ^= basis[i];                                             |
|     }                                                            |
|     return false;                                                 |
|   }                                                              |
+------------------------------------------------------------------+
| COMPLEXITY:                                                      |
|   insert      O(log V)     |  max_xor     O(log V)              |
|   represent?  O(log V)     |  kth_small   O(log V) + reduce     |
|   merge       O(log^2 V)   |  reduce      O(log^2 V)            |
|   build       O(n log V)   |  space       O(log V)              |
+------------------------------------------------------------------+
| REMEMBER:                                                        |
|   - Call reduce() BEFORE kth_smallest                            |
|   - Use long long (not int) for 60-bit values                   |
|   - Distinct XOR count = 2^(basis size)                          |
|   - Zero reachable iff sz < n (some element was dependent)       |
+------------------------------------------------------------------+

Gotchas and debugging

Using int instead of long long. Values can be up to 1018. The basis array, all intermediate variables, and the shift 1LL << k must use long long. A bare 1 << 40 is undefined behavior.

Forgetting to reduce() before kth_smallest. Without reduction, basis elements share bits across slots. The bijection between k and XOR values only works in reduced row echelon form.

Off-by-one in basis size vs. element count. The basis size is the number of non-zero slots, not n. size() should count only successful insertions. Getting this wrong breaks the zero-reachability check and the k-th smallest offset.

Empty basis edge case. If no elements are inserted (or all are 0), the basis is empty. max_xor() and min_xor() should return 0. kth_smallest with k1 should return 1.

Handling 0 in k-th smallest. Zero is always reachable when the basis rank is less than the number of inserted elements (some element was dependent). If 0 should be counted as the "first" value, subtract 1 from k before the binary decomposition query:

cpp
if (zero_reachable) {
    if (k == 1) { cout << 0; return; }
    k--; // shift because 0 is the smallest
}
cout << lb.kth_smallest(k);

Merging bases is O(log2V), not O(logV). Each of the 60 elements requires an O(60) insertion. If you need to merge many bases (e.g., segment tree of bases), keep this in mind for complexity analysis.

Re-inserting a value. Inserting the same value twice is harmless -- it reduces to 0 the second time. But if you track counts or indices, be careful.

Mental Traps

"The basis is unique -- insertion order doesn't matter." The span of the basis is unique (same vector space regardless of order), but the actual vectors stored in basis[] depend on insertion order. Insert {5, 7} vs {7, 5} and you get different entries. This never affects query results (max, min, representability), but it can confuse debugging when you compare basis arrays across two implementations.

INSERTION ORDER AFFECTS VECTORS, NOT SPAN:

  Order A: insert 7(111), then 5(101)     Order B: insert 5(101), then 7(111)
  +----------------------------+           +----------------------------+
  | basis[2] = 111  (from 7)   |           | basis[2] = 101  (from 5)   |
  | basis[1] = 010  (5^7=2)    |           | basis[1] = 010  (7^5=2)    |
  +----------------------------+           +----------------------------+
  Different vectors ---- Same span = {0, 2, 5, 7}

"Failed insertion means the element is useless." A failed insertion (x reduces to 0) means x is already in the span -- it is representable, not useless. If you are counting subsets that XOR to a target, every dependent element doubles the count: 2nrank subsets per achievable value. Treating dependent elements as garbage misses the counting insight entirely.

DEPENDENT != USELESS:

  Elements: {7, 5, 3, 6}     Basis rank = 3     Dependent: {6}

  # subsets XOR-ing to any reachable target: 2^(4-3) = 2
    XOR to 0:  {}              and  {7,5,3,6}
    XOR to 7:  {7}             and  {5,3,6}
    XOR to 2:  {5,7}           and  {3,6}

  Each dependent element DOUBLES the solution count.

Self-Test

  • [ ] Write XOR basis insertion from memory -- the loop from high bit to low, placement-vs-XOR branch
  • [ ] Write the max-XOR greedy query and explain why higher bits dominate lower ones
  • [ ] Given a basis of rank 3 built from 5 elements, state how many distinct XOR values are achievable (including 0) and how many subsets produce each value
  • [ ] Explain why kth_smallest requires reduce() first -- what breaks without reduced row echelon form?
  • [ ] Describe how to check if a value x is representable using the basis, without modifying it

Practice problems

#ProblemSourceDifficultyKey Idea
1Xor-MSTCF 888GHard (2300)Divide-and-conquer on bits + basis
2Shortest Path Problem?CF 845GMedium (2100)Cycle XOR basis in graph
3Zero XOR SubsetCF 1101GMedium (1900)Check if basis rank < n
4Mahmoud and Ehab and yet another xor taskCF 959FMedium (2100)Offline queries + basis
5Linear Algebra TestCF 678FMedium (2100)Basis over GF(2)
6Two Teams ComposingCF 1335E2Easy (1800)Basis counting
7XOR on SegmentCF 242EMedium (2000)Segment tree + bitwise operations
8Compatible NumbersCF 165EMedium (2200)Bitmask DP / XOR structure

Further reading


The Aha Moment

A set of n numbers can produce up to 2n different XOR subsets -- but the number of distinct XOR values is always a power of 2, specifically 2r where r is the rank of the XOR basis (at most log2V60). No matter how many numbers you have, the "XOR universe" they span is determined by at most 60 basis vectors.

The real "aha": building the XOR basis is just Gaussian elimination on binary numbers. Insert each number top-bit-first: if the highest set bit isn't claimed by an existing basis vector, this number becomes a new basis vector. If it is, XOR with that basis vector and continue. After processing all numbers, the basis has at most 60 elements, and you can answer "maximum XOR", "minimum XOR", "k-th smallest XOR", and "can we make value X?" -- all in O(logV).


Pattern Fingerprint

Constraint / SignalTechnique
"Maximum XOR of any subset"XOR basis -- greedily XOR basis vectors from high to low
"Can we obtain XOR value X from a subset?"Insert all values into basis; try to reduce X to 0
"Count distinct XOR values"2rank
"k-th smallest XOR of any subset"Reduce basis to RREF, treat k as a bitmask over basis vectors
"Maximum XOR path from u to v in a graph"Any path XOR cycle XORs; insert all cycle XORs into basis
"Online insertion + max XOR query"XOR basis supports O(logV) insert without rebuild
"XOR of subarray" with prefix XOR trickCombine prefix XOR with XOR basis on a sliding window or offline

Rating Progression

CF RatingWhat You Should Be Able To Do
1200Know XOR properties: aa=0, a0=a; solve basic XOR puzzles
1500Understand that XOR basis exists and reduces n numbers to 60 basis vectors; find max XOR subset
1800Implement XOR basis for max/min/count queries; handle "XOR path in graph" via cycle space
2100Implement k-th smallest XOR; combine XOR basis with data structures (segment tree, divide and conquer); handle online queries with incremental insertion

Before You Code Checklist

  • [ ] Initialize basis array to all zeros -- long long basis[60] = {}
  • [ ] Insert from the highest bit downward -- for each number, find its highest set bit, check if that bit's slot is occupied, XOR if so, and continue
  • [ ] For "maximum XOR": greedily XOR each basis vector -- start with ans=0, for each basis vector b from high to low, set ans=max(ans,ansb)
  • [ ] For "k-th smallest": convert to reduced row echelon form -- each basis vector should have a unique leading bit with all other basis vectors having 0 in that position
  • [ ] Don't forget the empty subset -- XOR of the empty subset is 0; whether this counts depends on the problem

What Would You Do If...?

Scenario 1: "Find the maximum XOR of a path between any two nodes in a graph." Any two paths between the same endpoints differ by a set of cycle XORs. So: find any spanning tree, compute path XOR from root to each node, then insert all "back edge" cycle XORs into an XOR basis. For a query (u,v): start with pathXOR(u)pathXOR(v) and greedily maximize by XORing in basis vectors.

Scenario 2: "Find the k-th smallest XOR of any subset, k1018." Reduce the basis to RREF (reduced row echelon form). Now the r basis vectors form a "binary coordinate system": the k-th value (0-indexed) is obtained by treating k's binary representation as a selector -- bit i of k selects whether to include the i-th basis vector. Works because RREF makes the basis vectors independent in a nice positional sense.

Scenario 3: "You need XOR basis queries on a subarray [l,r]." Offline: use divide-and-conquer on array ranges, maintaining an XOR basis for each recursive half. Online: use a persistent or "prefix" XOR basis where basis[r] represents elements a1ar, annotated with the latest index each basis vector was derived from; to answer [l,r], only use basis vectors with index l.


The Mistake That Teaches

Story: You're solving "maximum XOR of any subset." You build the basis, then to find the maximum, you XOR ALL basis vectors together. Test case: basis = {4, 2, 1} (binary: 100, 010, 001). XOR all: 421=7. Correct!

Next test: basis = {7, 3} (binary: 111, 011). XOR all: 73=4. But the actual maximum is 7 (just take the first element alone). Your approach failed because XORing two basis vectors can decrease the result.

The fix: Use the greedy algorithm:

cpp
long long ans = 0;
for (int i = 59; i >= 0; i--)
    if (basis[i]) ans = max(ans, ans ^ basis[i]);

This works because ans ^ basis[i] > ans precisely when basis[i] has a set bit that ans doesn't -- and processing from high bits to low ensures we never undo a good choice.

Lesson: "XOR everything together" is NOT the maximum XOR. Use the greedy bit-by-bit approach.


Debug This

Bug 1:

Find the bug in this XOR basis insertion
cpp
long long basis[60];
void insert(long long val) {
    for (int i = 59; i >= 0; i--) {
        if (!(val >> i & 1)) continue;
        if (basis[i]) {
            val ^= basis[i];
        } else {
            basis[i] = val;
            return;
        }
    }
    // val became 0 -- linearly dependent, discard
}

// Maximum XOR query:
long long max_xor() {
    long long ans = 0;
    for (int i = 0; i < 60; i++)  // low to high
        ans = max(ans, ans ^ basis[i]);
    return ans;
}

Bug: The max_xor function iterates from low bits to high bits. This can produce suboptimal results because a low-bit basis vector might set a bit that a later high-bit vector would have set anyway (alongside other bits). The greedy algorithm must go from high to low to ensure the highest bits are set first.

Fix: Change to for (int i = 59; i >= 0; i--).

Bug 2:

Find the bug in this "can we make value X" check
cpp
bool can_make(long long x) {
    for (int i = 59; i >= 0; i--) {
        if (!(x >> i & 1)) continue;
        if (basis[i]) x ^= basis[i];
        else return true;  // this bit can be set freely?
    }
    return x == 0;
}

Bug: When basis[i] is 0 (no basis vector for bit i), the function returns true -- but this is backwards! If there's no basis vector for bit i and x has bit i set, that means we cannot produce that bit, so we should return false.

Fix: Swap the return values:

cpp
else return false;  // can't produce this bit

And the final return x == 0; becomes return true; (if x was fully reduced to 0, we can produce it).

Wait, actually the original final return x == 0 is correct for success. Just fix the else:

cpp
else return false;

Bug 3:

Find the bug in this k-th smallest XOR
cpp
// After building basis, compute k-th smallest distinct XOR (1-indexed)
long long kth_xor(long long k) {
    // Compact basis: collect non-zero basis vectors
    vector<long long> b;
    for (int i = 0; i < 60; i++)
        if (basis[i]) b.push_back(basis[i]);

    int r = b.size();
    if (k > (1LL << r)) return -1;  // only 2^r distinct values

    long long ans = 0;
    for (int i = 0; i < r; i++) {
        if (k >> i & 1) ans ^= b[i];
    }
    return ans;
}

Bug: This requires the basis to be in reduced row echelon form (RREF) for the bit-selector interpretation to work correctly. If the basis is in standard (non-reduced) form, basis vectors may share set bits, and the mapping from k to XOR value won't produce sorted order.

Fix: Before calling kth_xor, reduce the basis to RREF:

cpp
for (int i = 59; i >= 0; i--)
    if (basis[i])
        for (int j = i + 1; j < 60; j++)
            if (basis[j] >> i & 1) basis[j] ^= basis[i];

Also: if the empty subset (XOR = 0) is a valid answer, adjust k accordingly -- if 0 is included, query with k1 instead.


Common Mistakes

  • Querying k-th XOR without RREF. The standard basis form has shared lower bits between vectors. Reduce to RREF first or the bit-selector mapping is wrong.
  • Forgetting the empty subset. XOR of zero elements is 0. If 0 is a valid answer, adjust k accordingly in k-th queries.
  • Wrong bit width. Using 30 bits when values can reach 10^18 (need 60 bits).
  • Inserting duplicates and expecting rank to increase. Inserting the same value twice reduces to 0 -- harmless but rank stays the same.

Historical Origin

The XOR basis is a specialization of Gaussian elimination over finite fields, a technique with roots in Évariste Galois's 19th-century work on field theory. The specific application to binary (GF(2)) problems in competitive programming became popular in the 2010s, driven by Codeforces problems requiring efficient XOR subset queries -- transforming what seems like an exponential search into a O(nlogV) construction.


Mnemonic Anchor

"60 basis vectors span every XOR -- insert high-bit-first, query greedily."


ConceptLink
Bit Manipulation07-bit-manipulation
Gaussian Elimination10-gaussian-elimination
Matrix Exponentiation09-matrix-exponentiation
PracticeLadder - Mixed

Built for the climb from Codeforces 1100 to 2100.