Skip to content

DFS and Tree DFS

AL-13 | Depth-first search on general graphs and rooted trees -- the backbone of nearly every graph problem on Codeforces.

Difficulty: [Beginner]Prerequisites: Graph Representation, Stack, Queue, and DequeCF Rating: 1100-1500

Contest Frequency: *** Common -- appears in most Div 2 contests

Quick Revisit

  • USE WHEN: reachability, cycle detection, connected components, tree traversal, back-edge classification
  • INVARIANT: each node visited exactly once; recursion stack reveals ancestry
  • TIME: O(V + E) | SPACE: O(V) stack depth
  • CLASSIC BUG: stack overflow on large graphs (use iterative DFS or increase stack size)
  • PRACTICE: ../12-practice-ladders/05-ladder-graphs.md

Contents


Intuition

This file teaches DFS in two passes. The first treats DFS as a way to visit every reachable node — the bare traversal. The second specialises to DFS on a rooted tree, where the recursion stack becomes a path from the root and timestamps become the subtree structure. Edge classification (tree/back/forward/cross) is a consequence of the second view; it appears later, after you have a use for it.

Part 1 — Plain DFS: visit every reachable node

Suppose you have 6 nodes and some edges, and you need to visit every node reachable from node 1.

  Graph:
  1 -- 2    2 -- 4
  1 -- 3    2 -- 5
  3 -- 6

        1
       / \
      2   3
     / \   \
    4   5   6

Naive attempt: pick a random neighbor, go there, pick another random neighbor... You might visit node 2, then jump back to 1, visit 2 again, wander to 3, revisit 1 a third time. With n=6 and m=5 edges you could easily perform 15+ node visits before covering all 6 nodes -- triple the necessary work. Every revisit is a wasted operation. The core problem: without a strategy, you revisit nodes and miss others.

The key insight:

Go as deep as possible before backtracking -- this naturally explores every reachable node exactly once using a stack (or recursion).

Analogy: you are in a maze. At every fork, always take the first unexplored corridor and walk until you hit a dead end. Then retrace your steps to the last fork that still has an unexplored corridor and repeat. You never re-enter a corridor you have already explored.

The "break" in this strategy is when you reach a dead end (a node all of whose neighbors are visited). At that moment the stack pops and you backtrack to the most recent node that still has unexplored neighbors. The stack guarantees you return to exactly the right place.

This first view of DFS — visit every reachable node, exactly once, backed by a stack — is sufficient for connected components, flood fill, reachability tests, and undirected cycle detection. Everything in Part 1 below works at this level.

Visual Walkthrough

Same 6-node graph, DFS from node 1, visiting smaller-numbered neighbors first.

  adj[1] = {2, 3}   adj[2] = {1, 4, 5}   adj[3] = {1, 6}
  adj[4] = {2}       adj[5] = {2}          adj[6] = {3}

  Step  Action             Stack (top->right)  Visited          tin  tout
  ----  -----------------  ------------------  ---------------  ---  ----
   1    visit 1            [1]                 {1}               0    -
   2    visit 2 (from 1)   [1, 2]              {1,2}             1    -
   3    visit 4 (from 2)   [1, 2, 4]           {1,2,4}           2    -
   4    backtrack from 4   [1, 2]              {1,2,4}           -    3
   5    visit 5 (from 2)   [1, 2, 5]           {1,2,4,5}         4    -
   6    backtrack from 5   [1, 2]              {1,2,4,5}         -    5
   7    backtrack from 2   [1]                 {1,2,4,5}         -    6
   8    visit 3 (from 1)   [1, 3]              {1,2,3,4,5}       7    -
   9    visit 6 (from 3)   [1, 3, 6]           {1,2,3,4,5,6}     8    -
  10    backtrack from 6   [1, 3]              {1,2,3,4,5,6}     -    9
  11    backtrack from 3   [1]                 {1,2,3,4,5,6}     -   10
  12    backtrack from 1   []                  {1,2,3,4,5,6}     -   11

  DFS order:  1 -> 2 -> 4 -> 5 -> 3 -> 6
  Total node visits: 6  (each node entered exactly once)
  Total edge inspections: 2 * 5 = 10  (each undirected edge checked from both ends)
  Overall work: O(n + m) = O(6 + 5) = O(11)

Discovery times (tin) and finish times (tout) are shown above. They define nested intervals: node u is an ancestor of node v in the DFS tree iff tin[u]tin[v] and tout[v]tout[u]. We come back to how these timestamps power subtree queries and cycle detection in Part 2 below.

The Invariant

+------------------------------------------------------------------+
| INVARIANT: Every node is visited exactly once. The recursion      |
| stack contains the current path from the source to the node       |
| being processed.                                                  |
+------------------------------------------------------------------+

Informal induction proof that DFS visits every reachable node exactly once:

Base case (k=0). The source node s is marked visited and placed on the stack. It is visited exactly once.

Inductive step. Assume that after k iterations every node that has been marked visited was visited exactly once, and every node on the stack is on the current path from s. Consider the next iteration: we pop (or are at) node u. For each neighbor v of u:

  • If v is already visited, we skip it -- no double visit.
  • If v is unvisited, we mark it visited before recursing (or pushing). So v is entered exactly once.

Because the graph is finite, the process terminates. Any reachable node w must lie on some path s=v0,v1,,vk=w. By induction, v0 is visited; when vi is visited, vi+1 is either already visited or will be visited when we inspect the edge (vi,vi+1). Hence w is eventually visited.

BFS vs DFS -- when to pick which:

CriterionDFSBFS
MemoryO(depth) stack framesO(width) queue entries
Shortest path (unweighted)NoYes -- BFS gives shortest paths
Connectivity / reachabilityYesYes
Cycle detectionYes -- back edge to gray nodePossible but less natural
Topological sortYes -- reverse of finish orderYes -- Kahn's algorithm
Tree properties (subtree size, diameter, tin/tout)Natural fitAwkward
Subtree queries / Euler tourDFS onlyNot applicable

Rule of thumb. If you need shortest distance in an unweighted graph or level-order information, use BFS. For almost everything else -- connectivity, cycle detection, topological sort, tree structure -- DFS is simpler and often more powerful. DFS uses O(depth) memory which can be O(n) worst case (chain graph); BFS uses O(width) which can also be O(n) worst case (star graph).

Part 2 — DFS on rooted trees: ancestry, time, and structure

The plain-DFS view above is enough for traversal. But on a rooted tree (or a DFS tree of any graph), DFS does more than visit nodes — the order in which it enters and exits them is itself the data structure. Three insights separate "I can code DFS" from "I understand DFS":

1. The recursion stack IS the current source-to-node path.

When DFS is processing node v, every node on the call stack forms the unique tree path from the root to v. This is why a back edge (to a GRAY node on the stack) immediately gives you a cycle -- the stack path from that ancestor down to v, plus the back edge, IS the cycle.

  DFS tree (--- = tree edge,  ... = back edge):

      1 ---> 2 ---> 3          Stack at node 3: [1, 2, 3]
                     :
                     :..> 1    back edge: 3 sees GRAY node 1
                               cycle = 1 -> 2 -> 3 -> 1
                               (stack path 1->2->3 + back edge 3->1)

2. tin/tout encode the entire subtree structure in O(1) queries.

The code increments a timer on entry and exit. The insight the code won't teach you: these times form nested intervals. [tin[u], tout[u]] fully contains [tin[v], tout[v]] if and only if v is in the subtree of u. This turns "is v a descendant of u?" into a simple range check -- no tree traversal needed.

  DFS tree with tin/tout (same 6-node graph):

        1 [0,11]                [1 [2 [4]4 [5]5 ]2 [3 [6]6 ]3 ]1
       / \                       0  1  2  3  4  5  6  7  8  9 10 11
      2   3
     / \    \                   Is 6 in subtree(1)?
    4   5    6                    tin[1]=0 <= tin[6]=8  AND
  [2,3] [4,5] [8,9]              tout[6]=9 <= tout[1]=11  => YES
       [1,6]  [7,10]
                                Is 5 in subtree(3)?
                                  tin[3]=7 <= tin[5]=4?  NO => NOT in subtree

3. Once you have a DFS tree, every edge has a name.

Now that ancestry is well-defined, the edges DFS sees on a directed graph fall into four categories:

  • Tree edge: to a WHITE (undiscovered) node — explored next.
  • Back edge: to a GRAY (on-stack) ancestor — closes a cycle.
  • Forward edge: to a BLACK descendant already in the subtree.
  • Cross edge: to a BLACK node in a different subtree.

In undirected graphs only tree and back edges exist. This classification is what turns DFS into directed-cycle detection, topological sort, SCC decomposition, and bridge finding — each of those algorithms is just "DFS plus interpret the edges."

If this is your first read, you do not need to memorise the four categories. The one to keep is back edge ⇒ cycle. The others appear in advanced graph algorithms once they have a job to do.

When to Reach for This

DFS is the go-to for: cycle detection, topological ordering (via post-order), connected components, articulation points/bridges, tree diameter, and subtree queries.

Trigger phrases in the problem statement:

  • "connected components" -> DFS (or BFS)
  • "cycle detection" -> DFS with colors (WHITE/GRAY/BLACK)
  • "tree traversal", "subtree queries" -> tree DFS with tin/tout
  • "topological order" -> DFS finish-time reverse
  • "bipartite check" -> DFS 2-coloring
  • "reachability" -> DFS or BFS
  • "bridges", "articulation points" -> DFS with low[] array (Tarjan)
  • "ancestor check", "back edge" -> DFS timestamps
  • "tree diameter" -> two-pass DFS (farthest node twice)
  • "subtree sum/update" on a tree -> DFS Euler tour + range query

Counter-examples (do NOT use DFS here):

  • "shortest path in an unweighted graph" -> use BFS
  • "shortest path with weights" -> use Dijkstra / Bellman-Ford
  • "minimum spanning tree" -> use Kruskal / Prim

DFS State Snapshots

Using the same 6-node graph from the walkthrough, here are snapshots of DFS state at key moments. Nodes are colored WHITE (undiscovered), GRAY (on the recursion stack), or BLACK (finished -- all descendants processed).

Snapshot 1: Mid-traversal (step 5 -- visiting node 5 from node 2)

        1[G]                     Recursion stack
       / \                       (= current path from root):
     2[G]  3[W]                  +---+
    / \      \                   | 5 | <-- top (currently processing)
  4[B] 5[G]  6[W]               | 2 |
                                 | 1 |
  W = WHITE (undiscovered)       +---+
  G = GRAY  (on stack)
  B = BLACK (finished)

  Nodes 1, 2, 5 are GRAY -- they form the path 1 -> 2 -> 5.
  Node 4 is BLACK -- fully processed, already backtracked.
  Nodes 3, 6 are WHITE -- not yet reached.

Snapshot 2: Back edge discovered -- edge to a GRAY node means cycle

  Suppose an extra directed edge (5 -> 1) exists.
  When DFS at node 5 examines neighbor 1:
    node 1 is GRAY (on stack) => back edge => CYCLE detected!

        1[G] <.............
       / \                 .  back edge (5 -> 1)
     2[G]  3[W]            .  cycle: 1 -> 2 -> 5 -> 1
    / \      \             .
  4[B] 5[G] ..'   6[W]

  Rule: edge to GRAY node  = back edge  = cycle exists
        edge to BLACK node = forward/cross edge = no cycle
        edge to WHITE node = tree edge (explore it)

Snapshot 3: Nested tin/tout intervals as brackets

  tin/tout values from the walkthrough:

    Node:  1      2     4     5      3     6
    tin:   0      1     2     4      7     8
    tout: 11      6     3     5     10     9

  Visualized as nested brackets on a number line:

  0   1   2   3   4   5   6   7   8   9  10  11
  |   |   |   |   |   |   |   |   |   |   |   |
  [1  [2  [4  ]4  [5  ]5  ]2  [3  [6  ]6  ]3  ]1

  Reading the nesting:
  - [4]=[2,3]  inside [2]=[1,6]  inside [1]=[0,11]
       => 4 in subtree(2), 2 in subtree(1)            YES
  - [6]=[8,9]  inside [3]=[7,10]
       => 6 in subtree(3)                              YES
  - [3]=[7,10] inside [2]=[1,6]?  tin[3]=7 > tout[2]=6
       => 3 NOT in subtree(2)                          NO

  O(1) subtree test:
    v in subtree(u)  <=>  tin[u] <= tin[v] AND tout[v] <= tout[u]

C++ STL Reference

Function / ClassHeaderWhat it doesTime Complexity
vector<vector<int>> adj(n)<vector>Adjacency list representationO(n+m) space
vector<bool> vis(n, false)<vector>Visited arrayO(n) space
stack<int><stack>Explicit stack for iterative DFSO(1) push/pop
function<void(int,int)><functional>Wrapper for recursive lambdaO(1) call overhead
ranges::fill(vis, false)<algorithm>Reset visited array between test casesO(n)

Implementation (Contest-Ready)

Version 1: Graph DFS -- Recursive + Iterative

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n, m;
    cin >> n >> m;
    vector<vector<int>> adj(n + 1);
    for(int i = 0; i < m; i++){
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // --- Recursive DFS ---
    vector<bool> vis(n + 1, false);
    int comp = 0;
    auto dfs = [&](auto& self, int u) -> void {
        vis[u] = true;
        for(int v : adj[u])
            if(!vis[v]) self(self, v);
    };
    for(int i = 1; i <= n; i++){
        if(!vis[i]){
            dfs(dfs, i);
            comp++;
        }
    }
    cout << "Connected components (recursive): " << comp << "\n";

    // --- Iterative DFS (safe for large n) ---
    fill(vis.begin(), vis.end(), false);
    comp = 0;
    for(int i = 1; i <= n; i++){
        if(vis[i]) continue;
        comp++;
        stack<int> st;
        st.push(i);
        vis[i] = true;
        while(!st.empty()){
            int u = st.top(); st.pop();
            for(int v : adj[u]){
                if(!vis[v]){
                    vis[v] = true;
                    st.push(v);
                }
            }
        }
    }
    cout << "Connected components (iterative): " << comp << "\n";
}

Version 2: Tree DFS with tin/tout, Depth, Subtree Size, Parent

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    vector<vector<int>> adj(n + 1);
    for(int i = 0; i < n - 1; i++){
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    vector<int> tin(n + 1), tout(n + 1), depth(n + 1, 0);
    vector<int> par(n + 1, 0), sz(n + 1, 1);
    int timer = 0;

    // Recursive tree DFS from root = 1
    auto dfs = [&](auto& self, int u, int p) -> void {
        par[u] = p;
        tin[u] = timer++;
        for(int v : adj[u]){
            if(v == p) continue;
            depth[v] = depth[u] + 1;
            self(self, v, u);
            sz[u] += sz[v];
        }
        tout[u] = timer++;
    };
    dfs(dfs, 1, 0);

    // Check if u is ancestor of v in O(1)
    auto is_ancestor = [&](int u, int v) -> bool {
        return tin[u] <= tin[v] && tout[v] <= tout[u];
    };

    // Answer queries
    int q;
    cin >> q;
    while(q--){
        int u, v; cin >> u >> v;
        if(is_ancestor(u, v))
            cout << u << " is ancestor of " << v << "\n";
        else
            cout << u << " is NOT ancestor of " << v << "\n";
    }
}

Explained Version: Tree DFS with Inline Comments

cpp
#include <bits/stdc++.h>
using namespace std;

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int n;
    cin >> n;
    vector<vector<int>> adj(n + 1);
    for(int i = 0; i < n - 1; i++){
        int u, v; cin >> u >> v;
        adj[u].push_back(v);
        adj[v].push_back(u);
    }

    // tin[u]: time when we first enter node u
    // tout[u]: time when we finish processing u and all its subtree
    // Together, [tin[u], tout[u]] defines the "interval" of u's subtree.
    vector<int> tin(n + 1), tout(n + 1), depth(n + 1, 0);
    vector<int> par(n + 1, 0), sz(n + 1, 1);
    int timer = 0;

    // We use auto& self pattern for recursive lambdas.
    // p = parent, prevents going back up the edge we came from.
    auto dfs = [&](auto& self, int u, int p) -> void {
        par[u] = p;
        tin[u] = timer++;   // record entry time

        for(int v : adj[u]){
            if(v == p) continue;   // skip parent edge (tree has no other cycles)
            depth[v] = depth[u] + 1;
            self(self, v, u);      // recurse into child
            sz[u] += sz[v];        // after child returns, accumulate subtree size
        }

        tout[u] = timer++;  // record exit time
    };
    dfs(dfs, 1, 0);  // root at 1, parent of root is 0 (sentinel)

    // Ancestor check: u is ancestor of v iff v's interval is inside u's interval.
    // This works because DFS timestamps form nested intervals on a tree.
    auto is_ancestor = [&](int u, int v) -> bool {
        return tin[u] <= tin[v] && tout[v] <= tout[u];
    };

    int q;
    cin >> q;
    while(q--){
        int u, v; cin >> u >> v;
        cout << (is_ancestor(u, v) ? "YES" : "NO") << "\n";
    }
}

ASCII Trace: DFS on a 7-Node Graph

Adjacency list:
  0: [1, 2]
  1: [0, 3, 4]
  2: [0, 5]
  3: [1]
  4: [1, 6]
  5: [2]
  6: [4]

Step  Action              Stack (top->)    Visited           disc/fin
---------------------------------------------------------------------
 1    push 0              [0]             {}                 -
 2    pop 0, visit        []              {0}                0: 1/-
 3    push 2,1            [2,1]           {0}                -
 4    pop 1, visit        [2]             {0,1}              1: 2/-
 5    push 4,3            [2,4,3]         {0,1}              -
 6    pop 3, visit        [2,4]           {0,1,3}            3: 3/4
 7    pop 4, visit        [2]             {0,1,3,4}          4: 5/-
 8    push 6              [2,6]           {0,1,3,4}          -
 9    pop 6, visit        [2]             {0,1,3,4,6}        6: 6/7
10    (4 finished)        [2]             {0,1,3,4,6}        4: 5/8
11    (1 finished)        [2]             {0,1,3,4,6}        1: 2/9
12    pop 2, visit        []              {0,1,2,3,4,6}      2: 10/-
13    push 5              [5]             {0,1,2,3,4,6}      -
14    pop 5, visit        []              {0,1,2,3,4,5,6}    5: 11/12
15    (2 finished)        []              {0,1,2,3,4,5,6}    2: 10/13
16    (0 finished)        []              {0,1,2,3,4,5,6}    0: 1/14

DFS tree edges: 0->1, 1->3, 1->4, 4->6, 0->2, 2->5

Operations Reference

DFS itself is always O(n+m), but the useful quantities you compute during the traversal -- timestamps, subtree sizes, depths -- each come at their own cost.

OperationTimeSpace
DFS traversal (graph)O(n+m)O(n+m) adj list + O(n) visited
DFS traversal (tree)O(n)O(n) adj list + arrays
Compute tin/toutO(n)O(n) for arrays
Compute subtree sizesO(n)O(n)
Compute depthsO(n)O(n)
Ancestor check (with tin/tout)O(1) per queryO(n) precomputation
Cycle detection (directed)O(n+m)O(n) color array
Cycle detection (undirected)O(n+m)O(n) visited + parent
Connected componentsO(n+m)O(n) visited

Recursive vs Iterative -- When to Pick Which

Both versions appear in Version 1 above. Key points:

  • Iterative DFS processes neighbors in reverse order compared to recursive (the stack reverses adjacency-list iteration).
  • For tree DFS with entry/exit times, iterative requires tracking whether a node is being entered or exited -- more bookkeeping than recursive.
  • For most contest problems with n2×105, recursive DFS is fine. Increase stack size with ulimit -s unlimited or a thread with larger stack.
  • For n>105 on strict stack limits, use iterative DFS (see Gotcha 1).

Problem Patterns & Tricks

Pattern 1: Connected Components

Count or label connected components in an undirected graph. Run DFS from every unvisited node -- each new DFS call is a new component.

cpp
int comp = 0;
for(int i = 1; i <= n; i++){
    if(!vis[i]){
        dfs(dfs, i);
        comp++;
    }
}

CF examples: CF 977E, CF 1176E


Pattern 2: Cycle Detection (Directed Graph)

Color-based DFS: white/gray/black. An edge to a gray node is a back edge (cycle). To reconstruct the cycle, record parents and trace back from the gray node.

cpp
vector<int> color(n + 1, 0); // 0=white, 1=gray, 2=black
bool has_cycle = false;
auto dfs = [&](auto& self, int u) -> void {
    color[u] = 1;
    for(int v : adj[u]){
        if(color[v] == 1) has_cycle = true; // back edge
        if(color[v] == 0) self(self, v);
    }
    color[u] = 2;
};

CF examples: CF 977E, CF 1385E


Pattern 3: Bipartite Check (2-Coloring)

DFS while coloring nodes with alternating colors. If an edge connects two same-colored nodes, the graph is not bipartite.

cpp
vector<int> col(n + 1, -1);
bool bipartite = true;
auto dfs = [&](auto& self, int u, int c) -> void {
    col[u] = c;
    for(int v : adj[u]){
        if(col[v] == -1) self(self, v, c ^ 1);
        else if(col[v] == c) bipartite = false;
    }
};

CF examples: CF 687A, CF 1144F


Pattern 4: Subtree Queries via tin/tout (Euler Tour)

Flatten the tree into an array using tin. Node u's subtree occupies indices [tin[u],tin[u]+sz[u]1] in the flattened array. Now subtree sum/min/max becomes a range query -- apply BIT or segment tree.

cpp
// After tree DFS with tin and sz computed:
// Subtree of u maps to range [tin[u], tin[u] + sz[u] - 1]
// To add val to node u: update position tin[u]
// To query subtree sum of u: range query [tin[u], tin[u]+sz[u]-1]

CF examples: CF 620E, CF 343D


Pattern 5: Tree Diameter

The diameter (longest path) of a tree can be found with two DFS calls: (1) DFS from any node, find the farthest node u. (2) DFS from u, find the farthest node v. Distance uv is the diameter.

cpp
auto farthest = [&](int start) -> pair<int,int> {
    vector<int> dist(n + 1, -1);
    dist[start] = 0;
    auto go = [&](auto& self, int u, int p) -> void {
        for(int v : adj[u]) if(v != p){
            dist[v] = dist[u] + 1;
            self(self, v, u);
        }
    };
    go(go, start, 0);
    int far = max_element(dist.begin() + 1, dist.begin() + n + 1) - dist.begin();
    return {far, dist[far]};
};
auto [u, d1] = farthest(1);
auto [v, diameter] = farthest(u);

CF examples: CF 1324F, CSES Tree Diameter


Pattern 6: Finding Bridges and Articulation Points

Extend DFS with tin and low arrays. low[u] = minimum tin reachable from the subtree of u via one back edge. Edge (u,v) is a bridge if low[v]>tin[u]. Node u is an articulation point if low[v]tin[u] for some child v (with special handling for root).

CF examples: CF 1000E, CF 118E

See: 02-bridges-and-articulation-points.md for full implementation.


Pattern 7: DFS on Implicit Graphs (Grid DFS)

Treat a 2D grid as a graph. Each cell has up to 4 neighbors. DFS to flood-fill components, count islands, etc.

cpp
int dx[] = {0,0,1,-1}, dy[] = {1,-1,0,0};
auto dfs = [&](auto& self, int x, int y) -> void {
    vis[x][y] = true;
    for(int d = 0; d < 4; d++){
        int nx = x + dx[d], ny = y + dy[d];
        if(nx >= 0 && nx < n && ny >= 0 && ny < m && !vis[nx][ny] && grid[nx][ny] == '.')
            self(self, nx, ny);
    }
};

CF examples: CF 1365D, CF 1594D


Contest Cheat Sheet

+--------------------------------------------------------------+
|                 DFS / TREE DFS CHEAT SHEET                   |
+--------------------------------------------------------------+
| WHEN TO USE:                                                 |
|   - Reachability, connected components, flood fill           |
|   - Cycle detection (directed: colors, undirected: parent)   |
|   - Bipartite check (2-color DFS)                            |
|   - Tree: subtree sizes, depths, tin/tout, diameter          |
+--------------------------------------------------------------+
| GRAPH DFS:                                                   |
|   vis[u] = true;                                             |
|   for(int v : adj[u]) if(!vis[v]) dfs(v);                   |
+--------------------------------------------------------------+
| TREE DFS (root r, parent p):                                 |
|   tin[u] = timer++; sz[u] = 1;                               |
|   for(int v : adj[u]) if(v != p)                             |
|       depth[v] = depth[u]+1, dfs(v,u), sz[u] += sz[v];      |
|   tout[u] = timer++;                                         |
+--------------------------------------------------------------+
| ANCESTOR CHECK:  tin[u] <= tin[v] && tout[v] <= tout[u]      |
+--------------------------------------------------------------+
| DIAMETER: dfs from any node -> farthest u; dfs from u -> v   |
+--------------------------------------------------------------+
| CYCLE (directed): edge to GRAY node = back edge = cycle      |
+--------------------------------------------------------------+
| COMPLEXITY:  O(n + m) time,  O(n + m) space                  |
+--------------------------------------------------------------+
| DANGER:                                                      |
|   - Stack overflow for n >= 1e5: use iterative or ulimit -s  |
|   - Mark visited BEFORE pushing (iterative), not after pop   |
|   - Reset vis[] between test cases                           |
+--------------------------------------------------------------+

Gotchas & Debugging

Gotcha 1: Stack Overflow on Deep Recursion

Default stack size is typically 1-8 MB. A recursive DFS on a chain of n=106 nodes overflows. Solutions:

cpp
// Option A: Increase stack size (Linux, compile-time)
// g++ -O2 -Wl,-z,stacksize=67108864 sol.cpp  (64 MB stack)

// Option B: Set stack via ulimit before running
// ulimit -s unlimited && ./sol

// Option C: Use iterative DFS (always safe)
stack<int> st;
st.push(root);
vis[root] = true;
while(!st.empty()){
    int u = st.top(); st.pop();
    for(int v : adj[u]) if(!vis[v]){
        vis[v] = true;
        st.push(v);
    }
}

On Codeforces, the stack limit is usually 256 MB, but not always. Check the problem constraints. If n2×105, recursive DFS is usually fine.

Gotcha 2: Iterative DFS -- Mark Visited Before Push, Not After Pop

cpp
// WRONG -- pushes same node multiple times
while(!st.empty()){
    int u = st.top(); st.pop();
    if(vis[u]) continue;  // wasteful, can TLE
    vis[u] = true;
    for(int v : adj[u]) if(!vis[v]) st.push(v);
}

// RIGHT -- mark before push
st.push(start);
vis[start] = true;
while(!st.empty()){
    int u = st.top(); st.pop();
    for(int v : adj[u]) if(!vis[v]){
        vis[v] = true;
        st.push(v);
    }
}

Gotcha 3: Forgetting to Reset Between Test Cases

With multiple test cases, forgetting to clear vis, tin, tout, etc. gives wrong answers. Use fill or reconstruct:

cpp
int t; cin >> t;
while(t--){
    int n; cin >> n;
    vector<vector<int>> adj(n + 1);
    // ... read input ...
    vector<bool> vis(n + 1, false);  // fresh each test case
    // ... dfs ...
}

Gotcha 4: 1-indexed vs 0-indexed

Adjacency list of size n with nodes 1..n needs adj(n+1), not adj(n). Off-by-one here causes out-of-bounds or missing nodes.

Gotcha 5: Undirected Cycle Detection -- Parent Check

In undirected graphs, every edge appears twice. You must skip the edge back to the parent, not just check visited:

cpp
// WRONG for multigraphs -- two edges between u and v look like a cycle
auto dfs = [&](auto& self, int u, int p) -> void {
    vis[u] = true;
    for(int v : adj[u]){
        if(v == p) continue;  // skip parent
        if(vis[v]){ /* cycle found */ }
        else self(self, v, u);
    }
};

For multigraphs (multiple edges between same pair), track the edge index instead of the parent node.

Gotcha 6: Tree DFS -- Root Choice

For unrooted trees, the root choice doesn't affect subtree sizes or depths relative to the root, but it changes which nodes are ancestors. Some problems require rerooting -- see 08-dp-on-trees.md.

Mental Traps

Trap 1: "DFS finds the shortest path"

DFS finds a path, not the shortest. The path it finds depends entirely on adjacency list order. Use BFS for shortest paths in unweighted graphs.

  Graph:  1 ------- 4          adj[1] = {2, 4}
          |         |
          2 --- 3 --+          DFS from 1 (visits 2 first):
                                 1 -> 2 -> 3 -> 4   (length 3)
  DFS path to 4:  1 -> 2 -> 3 -> 4  (length 3)
  Shortest path:  1 -> 4             (length 1)
                  ^^^^^^^^^^^^
                  BFS would find this!

Trap 2: "The visited array and the recursion stack are the same thing"

They are NOT. visited[v] = true means v has been discovered (WHITE -> GRAY or BLACK). The recursion stack holds only the currently active nodes (GRAY). This distinction is critical for cycle detection in directed graphs.

  After backtracking from node 4 (step 4 in walkthrough):

        1[G]            visited[] = {1, 2, 4}     (3 nodes)
       / \              stack     = [1, 2]         (2 nodes)
     2[G]  3[W]
    / \                 Node 4: visited = YES, on stack = NO  (BLACK)
  4[B] 5[W]            Node 2: visited = YES, on stack = YES (GRAY)

  An edge to node 4 (BLACK) is NOT a cycle -- it's a forward/cross edge.
  An edge to node 2 (GRAY)  IS a cycle -- it's a back edge.

  Using only visited[] for directed cycle detection is WRONG.
  You need the color/on-stack distinction (GRAY vs BLACK).

Common Mistakes

MistakeConsequenceFix
Recursive DFS on chain of 105+ nodesStack overflowUse iterative DFS or ulimit -s unlimited
Not marking visited before recursing/pushingInfinite loop or TLESet vis[v] = true before the recursive call or push
Using visited[] instead of colors for directed cycle detectionFalse positives -- cross/forward edges look like cyclesUse WHITE/GRAY/BLACK; only edge to GRAY = back edge = cycle
Not handling disconnected componentsMissing nodes entirelyWrap DFS in for(i=1..n) if(!vis[i]) dfs(i) outer loop

See Gotchas & Debugging for detailed examples of each.


Self-Test

  • [ ] I can write tree DFS with parent tracking -- dfs(u, par), if (v == par) continue, post-order subtree size
  • [ ] I can compute in-time and out-time correctly with a single timer incrementing on both entry and exit
  • [ ] I can state the subtree query: "is v in subtree of u?" using tin[u] <= tin[v] && tout[v] <= tout[u]
  • [ ] I can explain the stack overflow risk for large n and name one mitigation (iterative DFS or ulimit)
  • [ ] I can classify edges in DFS: back edge to gray node indicates a cycle

Practice Problems

#ProblemSourceDifficultyKey Idea
1Building RoadsCSESEasy (1200)Count and connect components via DFS
2Round TripCSESEasy (1400)Cycle detection in undirected graph, reconstruct cycle
3Graph ColoringCSESEasy (1400)Bipartite check via 2-color DFS
4LabyrinthCSESEasy (1400)Grid DFS/BFS, path reconstruction
5Tree DiameterCSESMedium (1500)Two-pass DFS for diameter
6SubordinatesCSESMedium (1400)Subtree sizes via tree DFS
7Tree Distances ICSESMedium (1700)Farthest node for each vertex (two DFS from diameter endpoints)
8CF 1324FCFMedium (1700)Maximum white-black balance on tree path
9CF 1144FCFMedium (1800)Bipartite check + greedy coloring
10CF 343DCFHard (2000)Subtree updates via Euler tour + BIT

10. Iterative DFS and BFS Using Explicit Data Structures

Why Iterative?

Recursive DFS uses the call stack, which is typically limited to ~10^4-10^5 frames (1-8 MB default stack on most systems). A chain graph with n=106 nodes causes a stack overflow. Iterative DFS with an explicit stack<int> uses heap memory instead, avoiding this limit entirely. For large graphs (n105), always prefer iterative DFS unless you've verified the stack size.

BFS is naturally iterative (queue-based) and doesn't have this problem.

Side-by-Side Comparison

DFS with Stack:                BFS with Queue:
+---------------------------+  +---------------------------+
| push(start)               |  | enqueue(start)            |
| mark start visited        |  | mark start visited        |
| while stack not empty:    |  | while queue not empty:    |
|   u = stack.top(); pop()  |  |   u = queue.front(); pop()|
|   process u               |  |   process u               |
|   for each neighbor v:    |  |   for each neighbor v:    |
|     if not visited[v]:    |  |     if not visited[v]:    |
|       mark visited[v]     |  |       mark visited[v]     |
|       push(v)             |  |       enqueue(v)          |
+---------------------------+  +---------------------------+

Key difference: stack (LIFO) vs queue (FIFO).
DFS dives deep first; BFS explores level by level.
Both visit every node exactly once: O(n + m).

Trace on Example Graph

    1 --- 2
    |     |
    3 --- 4 --- 5

adj[1] = {2, 3}    adj[2] = {1, 4}    adj[3] = {1, 4}
adj[4] = {2, 3, 5} adj[5] = {4}

DFS Trace (starting from node 1, stack = LIFO):

Step  Action              Stack (top->right)  Visited
----  ------------------  -----------------  ---------
 1    push(1), mark 1     [1]                {1}
 2    pop 1, push 2,3     [3, 2]             {1,2,3}
      (neighbors of 1: 2->mark+push, 3->mark+push)
 3    pop 2, push 4       [3, 4]             {1,2,3,4}
      (neighbors of 2: 1->visited, 4->mark+push)
 4    pop 4, push 5       [3, 5]             {1,2,3,4,5}
      (neighbors of 4: 2->visited, 3->visited, 5->mark+push)
 5    pop 5               [3]                {1,2,3,4,5}
      (neighbors of 5: 4->visited)
 6    pop 3               []                 {1,2,3,4,5}
      (neighbors of 3: 1->visited, 4->visited)

DFS visit order: 1 -> 2 -> 4 -> 5 -> 3

BFS Trace (starting from node 1, queue = FIFO):

Step  Action              Queue (front->right) Visited
----  ------------------  ------------------  ---------
 1    enqueue(1), mark 1  [1]                 {1}
 2    dequeue 1, enq 2,3  [2, 3]              {1,2,3}
      (neighbors of 1: 2->mark+enqueue, 3->mark+enqueue)
 3    dequeue 2, enq 4    [3, 4]              {1,2,3,4}
      (neighbors of 2: 1->visited, 4->mark+enqueue)
 4    dequeue 3           [4]                 {1,2,3,4}
      (neighbors of 3: 1->visited, 4->visited)
 5    dequeue 4, enq 5    [5]                 {1,2,3,4,5}
      (neighbors of 4: 2->visited, 3->visited, 5->mark+enqueue)
 6    dequeue 5           []                  {1,2,3,4,5}
      (neighbors of 5: 4->visited)

BFS visit order: 1 -> 2 -> 3 -> 4 -> 5
BFS levels:      L0:{1}  L1:{2,3}  L2:{4}  L3:{5}

Key observations from the trace:

  • DFS dives deep: 1->2->4->5 before backtracking to 3.
  • BFS explores level by level: all distance-1 nodes (2,3) before distance-2 nodes (4).
  • Both visit every node exactly once and inspect each edge twice (undirected).
  • BFS naturally gives shortest-path distances in unweighted graphs; DFS does not.

The Aha Moment

DFS is really about time. Every node gets a discovery time (when you first visit it) and a finish time (when you backtrack from it). These timestamps encode the entire structure of the search: if [inu,outu][inv,outv], then u is a descendant of v in the DFS tree. Once you see DFS as a time-stamping machine, everything -- back edges, subtree queries, Euler tours -- clicks into place.

Rating Progression

CF RatingWhat You Should Be Comfortable With
1200Basic DFS, connected components, tree traversal, cycle detection
1500DFS timestamps, subtree sizes, parent tracking, back/forward/cross edge classification
1800Euler tour flattening for range queries, bridges/articulation points, SCC (Tarjan/Kosaraju)
2100DFS tree properties for hard graph problems, centroid decomposition (DFS-based), virtual tree construction

Before You Code Checklist

  • [ ] Visited/color array initialized? Unvisited nodes must be clearly distinguishable.
  • [ ] Recursion depth: For n=106 nodes, recursive DFS will stack overflow. Use iterative DFS or increase stack size.
  • [ ] Parent tracking: In tree DFS, are you passing parent to avoid revisiting the parent as a "back edge"?
  • [ ] Timestamps consistent: Are tin and tout being incremented with the same global timer?
  • [ ] Edge classification: For directed graphs, did you distinguish tree/back/forward/cross edges using entry and exit times?

What Would You Do If...?

  1. ...the tree has 106 nodes and recursive DFS stack overflows? Use iterative DFS with an explicit stack, or set ulimit -s unlimited (Linux) / increase stack size in your compiler settings. In contests, iterative DFS is safer.

  2. ...you need to answer "is u an ancestor of v?" in O(1)? Use DFS timestamps: u is an ancestor of v if and only if tin[u]tin[v] and tout[u]tout[v].

  3. ...you need to detect ALL cycles in a directed graph, not just whether one exists? DFS with back-edge detection tells you a cycle exists, but extracting all cycles is #P-hard in general. Instead, find SCCs (Tarjan) -- every SCC with more than one node contains cycles.

The Mistake That Teaches

The bug: You implement cycle detection in a directed graph using DFS. But your code reports false cycles:

cpp
vector<bool> visited(n + 1, false);
bool has_cycle = false;

void dfs(int u) {
    visited[u] = true;
    for (int v : adj[u]) {
        if (visited[v]) { has_cycle = true; return; }
        dfs(v);
    }
}

The lesson: In a directed graph, reaching a visited node doesn't mean there's a cycle -- it might be a cross edge or forward edge. You need three colors: white (unvisited), gray (in current DFS path), black (fully processed). A cycle exists only when you reach a gray node (back edge). The code above works for undirected graphs (with parent check) but is wrong for directed ones.

Debug This

Bug 1: Tree DFS computes wrong subtree sizes:

cpp
int sz[MAXN];
void dfs(int u, int parent) {
    sz[u] = 1;
    for (int v : adj[u]) {
        if (v == parent) continue;
        dfs(v, u);
    }
    // where's the subtree sum?
}
Answer

After the recursive call dfs(v, u), you never add the child's subtree size to the parent. Fix: add sz[u] += sz[v]; after the recursive call.

Bug 2: DFS timestamps are wrong -- many nodes have the same tin:

cpp
int timer = 0;
void dfs(int u, int p) {
    tin[u] = timer;  // BUG: not incrementing
    for (int v : adj[u]) {
        if (v != p) dfs(v, u);
    }
    tout[u] = timer;  // BUG: not incrementing
}
Answer

The timer is never incremented. Fix: use tin[u] = timer++; and tout[u] = timer++;. Without incrementing, all nodes get the same timestamp and ancestor queries become meaningless.

Bug 3: Iterative DFS visits nodes in wrong order compared to recursive:

cpp
stack<int> st;
st.push(root);
while (!st.empty()) {
    int u = st.top(); st.pop();
    if (vis[u]) continue;
    vis[u] = true;
    for (int v : adj[u]) {
        if (!vis[v]) st.push(v);
    }
}
Answer

This visits neighbors in reverse order compared to recursive DFS because the stack reverses the adjacency list iteration. The last neighbor pushed is the first popped. If order matters (e.g., for consistent timestamps), iterate adj[u] in reverse: for (int i = adj[u].size()-1; i >= 0; i--). For most problems, the order doesn't affect correctness, but it can affect output matching.

Historical Origin

Depth-first search was first formally analyzed by Robert Tarjan in 1972, who showed how DFS timestamps and back edges could solve problems like finding strongly connected components, biconnected components, and bridges -- all in linear time. His work earned him the Turing Award in 1986.

Mnemonic Anchor

"Go deep, stamp time, backtrack with knowledge." DFS is a time machine: entry time, exit time, and everything in between is your subtree.


Further Reading


Code Reading Exercise

What does this code compute? Read it carefully before checking the answer.

cpp
int ans;

int dfs(int u, int p, vector<vector<int>>& adj) {
    int mx1 = 0, mx2 = 0;
    for (int v : adj[u]) {
        if (v == p) continue;
        int d = dfs(v, u, adj) + 1;
        if (d >= mx1) { mx2 = mx1; mx1 = d; }
        else if (d > mx2) { mx2 = d; }
    }
    ans = max(ans, mx1 + mx2);
    return mx1;
}
Answer This computes the diameter of a tree -- the length of the longest path between any two nodes. At each node u, it tracks the two longest downward paths (mx1 and mx2). The longest path passing through u has length mx1 + mx2, which updates the global answer. The function returns the single longest downward path for the parent's computation. Called once from the root with ans = 0.

What does this code compute? Assume adj represents a tree with n nodes.

cpp
int result;

void dfs(int u, int p, int n, vector<vector<int>>& adj, vector<int>& sz) {
    sz[u] = 1;
    int heaviest = 0;
    for (int v : adj[u]) {
        if (v == p) continue;
        dfs(v, u, n, adj, sz);
        sz[u] += sz[v];
        heaviest = max(heaviest, sz[v]);
    }
    heaviest = max(heaviest, n - sz[u]);
    if (heaviest <= n / 2) result = u;
}
Answer This finds the centroid of a tree -- the node whose removal minimizes the size of the largest remaining connected component. For each node u, it computes the heaviest component: the largest child subtree or the "upward" part (n - sz[u]). If the heaviest part is <= n/2, that node is a centroid. Every tree has at least one centroid, and this single DFS finds it in O(n).

Recap

  • DFS explores as deep as possible before backtracking. Time: O(V+E).
  • The recursion stack IS the current root-to-node path -- back edges to nodes on this path reveal cycles.
  • tin/tout timestamps encode subtree structure: v in subtree(u) iff tin[u]tin[v] and tout[v]tout[u] -- O(1) per query.
  • For trees: DFS naturally computes depth, subtree size, parent, and Euler tour (flattening for range queries).
  • Edge classification (tree/back/forward/cross) drives cycle detection, topological sort, SCC, and bridge-finding.
  • Watch for stack overflow on large n (use iterative DFS), and always handle disconnected components with an outer loop.

Built for the climb from Codeforces 1100 to 2100.