Skip to content

Virtual Tree (Auxiliary Tree)

Quick Revisit

  • USE WHEN: Multiple queries each involving a small subset of k important nodes on a large tree (Steiner tree, subtree DP on marked nodes)
  • INVARIANT: Virtual tree has ≤ 2k−1 nodes (important nodes + pairwise LCAs) and preserves ancestor-descendant structure of the original tree
  • TIME: O(k log k) construction per query (sorting + LCA) | SPACE: O(k) per query
  • CLASSIC BUG: Not adding LCAs of consecutive nodes in the sorted-by-tin order — the virtual tree loses structural edges and disconnects
  • PRACTICE: 05-ladder-graphs

Given a tree on n nodes, you need to answer queries that each involve a small subset of k "important" nodes. A virtual tree compresses the original tree to at most 2k1 nodes -- just the important ones and their pairwise LCAs -- so you can run expensive algorithms on a tiny graph instead of the full tree.

Difficulty: [Advanced]Prerequisites: Euler Tour & LCA, DFS & Tree DFSCF Rating Range: 2100-2500+


The Problem It Solves

You have a tree with n=105 nodes. A query gives you k=5 marked nodes and asks: "What is the minimum Steiner tree connecting these 5 nodes?" Or: "Run a DP on the tree but only these k nodes matter."

Solving on the full tree costs O(n) per query. With q queries, that's O(nq) -- too slow when both are large but k is small.

The virtual tree lets you solve each query in O(klogk) construction + O(k) for the DP/algorithm. Total: O(kilogn).

Construction Algorithm

Input: The original tree (with LCA preprocessing) and a list of k important nodes.

Output: A small tree of at most 2k1 nodes preserving the ancestor-descendant structure of the original.

Steps:

  1. Sort important nodes by Euler tour entry time (tin).
  2. Add pairwise LCAs: for each consecutive pair (vi,vi+1) in sorted order, compute LCA(vi,vi+1) and add it to the set.
  3. Deduplicate and re-sort by tin.
  4. Build the virtual tree using a stack: process nodes in tin order, maintaining a stack representing the current root-to-leaf path. For each new node, pop stack entries that are not ancestors, adding edges as you go.
  Original tree:            Important nodes: {3, 7, 11}
        1
       / \
      2   3                 LCA(3,7) = 1,  LCA(7,11) = 5
     / \   \
    4   5   6               Virtual tree nodes: {1, 3, 5, 7, 11}
       / \
      7   8                 Virtual tree:
     /                           1
    9                           / \
   / \                         3   5
  10  11                          / \
                                 7  11
                            (edges carry original distances)

The virtual tree has 5 nodes instead of 11. Edge weights equal the path length in the original tree.

Why consecutive-tin LCAs are enough

The most surprising step of the algorithm is step 2: we only add the LCAs of consecutive pairs in the tin-sorted list, not the LCAs of all (k2) pairs. The pairwise LCAs are still implicitly captured. Why?

Claim. If you sort the marked nodes by Euler-tour entry time tin, the set of LCAs of all pairs equals the set of LCAs of consecutive pairs.

The intuition: in tin order, the marked nodes appear in the order DFS first visits them. Two marked nodes u,w that are not consecutive in the sorted list have some marked node v between them with tin[u]<tin[v]<tin[w]. Then LCA(u,w) either equals LCA(u,v) or LCA(v,w) -- whichever is shallower -- because DFS visited u, descended into the subtree where v lives, then ascended and continued to w. So any "branching point" between u and w shows up as a consecutive-pair LCA somewhere in the chain u,v,w. Adding only consecutive LCAs already collects every branching point in the original tree's structure on the marked set.

This is why the construction is O(klogk) instead of O(k2logn): we only need k1 LCA queries, plus one sort.

The tin order is critical. Sort by any other order (DFS post-order, label, depth) and the consecutive-pair LCA argument breaks. The DFS-order adjacency of marked nodes is what exposes all the branch LCAs.

Stack-Based Construction

The stack maintains the path from the virtual root to the current working node:

  Processing nodes in tin order: [1, 3, 5, 7, 11]

  Process 1:  stack = [1]
  Process 3:  3 is child of 1 -> add edge (1,3), stack = [1, 3]
  Process 5:  LCA(3,5)=1. Pop 3 (edge 1-3 done). 5 is child of 1.
              stack = [1, 5]
  Process 7:  7 is child of 5 -> add edge (5,7), stack = [1, 5, 7]
  Process 11: LCA(7,11)=5. Pop 7 (edge 5-7 done). 11 is child of 5.
              stack = [1, 5, 11]
  Done: pop remaining -> edges (5,11), (1,5)

Implementation

cpp
#include <bits/stdc++.h>
using namespace std;

// Assume LCA is precomputed: lca(u,v), tin[u] (Euler tour entry time)
// adj_vt: adjacency list of the virtual tree (cleared each query)

vector<int> adj_vt[MAXN];
int stk[MAXN], top_stk;

bool isAnc(int u, int v) {  // is u an ancestor of v?
    return tin[u] <= tin[v] && tout[v] <= tout[u];
}

void addEdge(int u, int v) {
    adj_vt[u].push_back(v);
    adj_vt[v].push_back(u);
}

void buildVirtualTree(vector<int>& nodes) {
    sort(nodes.begin(), nodes.end(), [](int a, int b) {
        return tin[a] < tin[b];
    });
    // add LCAs of consecutive pairs
    int sz = nodes.size();
    for (int i = 0; i + 1 < sz; i++) {
        nodes.push_back(lca(nodes[i], nodes[i + 1]));
    }
    sort(nodes.begin(), nodes.end(), [](int a, int b) {
        return tin[a] < tin[b];
    });
    nodes.erase(unique(nodes.begin(), nodes.end()), nodes.end());

    // Stack-based construction. The stack holds the current
    // root-to-leaf path of the virtual tree under construction.
    top_stk = 0;
    for (int v : nodes) {
        if (top_stk == 0) { stk[top_stk++] = v; continue; }
        // Pop stack entries that are NOT ancestors of v -- those
        // subtrees are finalised; record their parent edge.
        while (top_stk > 1 && !isAnc(stk[top_stk - 1], v)) {
            addEdge(stk[top_stk - 2], stk[top_stk - 1]);
            top_stk--;
        }
        int l = lca(v, stk[top_stk - 1]);
        if (stk[top_stk - 1] != l) {
            // The LCA of v and the current stack top sits strictly
            // between them. It must already be in `nodes` (we added
            // it during the LCA-of-consecutive-pairs step), so we
            // attach the unfinished top to l and replace the top.
            addEdge(l, stk[top_stk - 1]);
            stk[top_stk - 1] = l;
        }
        stk[top_stk++] = v;
    }
    while (top_stk > 1) {
        addEdge(stk[top_stk - 2], stk[top_stk - 1]);
        top_stk--;
    }
}

Clean up after each query: clear adj_vt[v] for every node in the virtual tree (don't memset the whole array).

Complexity

OperationTime
Build virtual treeO(klogk) (sorting + LCA queries)
Solve on virtual treedepends on algorithm, but tree has O(k) nodes
LCA query (binary lifting)O(logn) per query
Total per queryO(klogn)

When to Reach for Virtual Tree

  • "k special nodes" in a tree query where knq.
  • Steiner tree on a tree (minimum subtree connecting marked nodes).
  • DP on tree with only a few relevant leaves/nodes per query.
  • Any problem where you'd run full tree DP but most nodes are irrelevant.

Pitfalls

  • Forgetting to add the root. If the tree is rooted and your algorithm needs the root, include it in the important set.
  • Not cleaning up adjacency lists. Only clear nodes that appeared in the virtual tree -- clearing the full array is O(n) per query.
  • Off-by-one in LCA pairs. You need LCAs of consecutive pairs in sorted tin order, not all pairs.
  • Edge weights. If edges carry weights, the virtual tree edge weight between two nodes equals the distance in the original tree (precompute with depth arrays).

Practice Problems

#ProblemDifficultyNotes
1CF 613D -- Kingdom and its Cities2500Classic virtual tree + greedy
2CF 1111E -- Tree2500Virtual tree + DP
3CF 587F -- Duff is Mad2800Advanced virtual tree application
4Luogu P2495 -- 消耗战--Steiner tree on virtual tree

Further Reading


Built for the climb from Codeforces 1100 to 2100.