Skip to content

Topological Sort

Introduction

Topological Sort is a way of arranging items in a specific order where some items must come before others — like figuring out the right sequence to get dressed (underwear before pants, socks before shoes) or the right order to take university courses (Calculus I before Calculus II). It's one of the most practical graph algorithms you'll encounter, powering everything from build systems and task schedulers to spreadsheet formula evaluation.

The Problem: Ordering Things with Dependencies

Imagine you're planning a construction project. You can't put up the roof before the walls, and you can't build the walls before the foundation. Each task depends on others being completed first. Now imagine you have hundreds of tasks with complex dependencies — how do you figure out a valid order to do everything?

This is exactly what Topological Sort solves. Given a set of items and rules about which items must come before others, it produces a linear ordering that respects all the rules.

Real-World Examples

  • Course prerequisites: You need Algebra before Calculus, and Calculus before Differential Equations.
  • Build systems: File A.js imports from B.js, which imports from C.js — the compiler must process C first, then B, then A.
  • Package managers: npm, pip, and apt all use topological sort to determine installation order.
  • Spreadsheet formulas: Cell A1 references B2, which references C3. The spreadsheet must recalculate C3 → B2 → A1.
  • Task scheduling: In CI/CD pipelines, tests must pass before deployment, and builds must succeed before tests.

Core Concepts

Directed Acyclic Graphs (DAGs)

Topological Sort only works on Directed Acyclic Graphs (DAGs). Let's break that down:

  • Directed: Each edge has a direction — "A must come before B" is different from "B must come before A."
  • Acyclic: No cycles allowed. If A depends on B, B depends on C, and C depends on A, there's no valid order — that's a deadlock.

If someone gives you a graph with a cycle, topological sort will detect it and tell you that no valid ordering exists.

In-degree

The in-degree of a node is the number of edges pointing into it — in other words, how many prerequisites it has. A node with in-degree 0 has no prerequisites and can be processed immediately.

Multiple Valid Orderings

Unlike sorting numbers where there's one correct answer, topological sort can produce multiple valid orderings. For example, if A and B have no dependencies, either [A, B, C] or [B, A, C] is correct — as long as both come before anything that depends on them.

Two Main Algorithms

There are two classic approaches to topological sort. Both produce valid orderings, but they work in fundamentally different ways.

Approach 1: Kahn's Algorithm (BFS-based)

Kahn's Algorithm uses a simple, intuitive strategy:

  1. Find all nodes with no prerequisites (in-degree = 0).
  2. Add them to a queue.
  3. Process nodes from the queue one at a time:
    • Add the node to the result.
    • Remove all its outgoing edges ("mark dependencies as satisfied").
    • If any neighbor now has in-degree 0, add it to the queue.
  4. If the result contains all nodes, you have a valid ordering. Otherwise, there's a cycle.

Think of it like a buffet line: people with no one in front of them go first, and as they leave, new people become "first in line."

Implementation: Kahn's Algorithm

javascript
function topologicalSortKahn(numNodes, edges) {
  // Build adjacency list and in-degree array
  const adjacency = Array.from({ length: numNodes }, () => []);
  const inDegree = new Array(numNodes).fill(0);

  for (const [from, to] of edges) {
    adjacency[from].push(to);
    inDegree[to]++;
  }

  // Start with all nodes that have no prerequisites
  const queue = [];
  for (let i = 0; i < numNodes; i++) {
    if (inDegree[i] === 0) {
      queue.push(i);
    }
  }

  const result = [];

  while (queue.length > 0) {
    const node = queue.shift();
    result.push(node);

    // "Satisfy" this dependency for all neighbors
    for (const neighbor of adjacency[node]) {
      inDegree[neighbor]--;
      if (inDegree[neighbor] === 0) {
        queue.push(neighbor);
      }
    }
  }

  // If we didn't process all nodes, there's a cycle
  if (result.length !== numNodes) {
    return null; // Cycle detected!
  }

  return result;
}

// Example: Course schedule
// 0: Algebra, 1: Calculus, 2: Linear Algebra,
// 3: Diff Equations, 4: Statistics
const edges = [
  [0, 1], // Algebra before Calculus
  [0, 2], // Algebra before Linear Algebra
  [1, 3], // Calculus before Diff Equations
  [2, 3], // Linear Algebra before Diff Equations
  [0, 4], // Algebra before Statistics
];

const order = topologicalSortKahn(5, edges);
console.log("Course order:", order);
// Possible output: [0, 1, 2, 4, 3]
// Algebra → Calculus → Linear Algebra → Statistics → Diff Equations

Approach 2: DFS-based (Post-order)

The DFS approach explores the graph depth-first and adds nodes to the result after all their dependents have been visited. The result is then reversed to get the topological order.

Think of it like exploring a maze: you go as deep as possible, and when you hit a dead end, you record that node. The deepest nodes (those with no further dependencies) get recorded first.

Implementation: DFS-based Topological Sort

javascript
function topologicalSortDFS(numNodes, edges) {
  const adjacency = Array.from({ length: numNodes }, () => []);

  for (const [from, to] of edges) {
    adjacency[from].push(to);
  }

  const WHITE = 0; // Unvisited
  const GRAY = 1;  // In progress (currently in call stack)
  const BLACK = 2; // Fully processed

  const color = new Array(numNodes).fill(WHITE);
  const stack = [];
  let hasCycle = false;

  function dfs(node) {
    if (hasCycle) return;

    color[node] = GRAY; // Mark as "in progress"

    for (const neighbor of adjacency[node]) {
      if (color[neighbor] === GRAY) {
        // We've found a back edge — cycle!
        hasCycle = true;
        return;
      }
      if (color[neighbor] === WHITE) {
        dfs(neighbor);
      }
    }

    color[node] = BLACK; // Mark as "done"
    stack.push(node);    // Post-order: add after all descendants
  }

  for (let i = 0; i < numNodes; i++) {
    if (color[i] === WHITE) {
      dfs(i);
    }
  }

  if (hasCycle) return null;

  return stack.reverse(); // Reverse post-order = topological order
}

// Same course schedule example
const edges = [
  [0, 1], [0, 2], [1, 3], [2, 3], [0, 4],
];

const order = topologicalSortDFS(5, edges);
console.log("Course order (DFS):", order);

Step-by-Step Walkthrough

Let's trace Kahn's algorithm through a build system example:

Modules: A, B, C, D, E
Dependencies: A→C, B→C, B→D, C→E, D→E

Comparing the Two Approaches

FeatureKahn's (BFS)DFS-based
StrategyRemove nodes with no prerequisitesExplore deepest paths first
Cycle detectionCount processed nodes ≠ totalBack edge detection (gray → gray)
Data structureQueueCall stack (recursion)
Time complexityO(V + E)O(V + E)
Space complexityO(V + E)O(V + E)
Better when...You need level-by-level processingYou're already doing DFS
Parallelism hintNodes dequeued together can run in parallelNot as natural

Practical Application: Build System

Let's build a more realistic example — a module build system that resolves dependencies:

javascript
class BuildSystem {
  constructor() {
    this.modules = new Map(); // name → Set of dependencies
  }

  addModule(name, dependencies = []) {
    if (!this.modules.has(name)) {
      this.modules.set(name, new Set());
    }
    for (const dep of dependencies) {
      this.modules.get(name).add(dep);
      if (!this.modules.has(dep)) {
        this.modules.set(dep, new Set());
      }
    }
  }

  getBuildOrder() {
    const names = [...this.modules.keys()];
    const nameToIndex = new Map();
    names.forEach((name, i) => nameToIndex.set(name, i));

    const n = names.length;
    const inDegree = new Array(n).fill(0);
    const adjacency = Array.from({ length: n }, () => []);

    for (const [module, deps] of this.modules) {
      const moduleIdx = nameToIndex.get(module);
      for (const dep of deps) {
        const depIdx = nameToIndex.get(dep);
        adjacency[depIdx].push(moduleIdx); // dep → module
        inDegree[moduleIdx]++;
      }
    }

    // Kahn's algorithm
    const queue = [];
    for (let i = 0; i < n; i++) {
      if (inDegree[i] === 0) queue.push(i);
    }

    const order = [];
    const parallelGroups = []; // Bonus: group tasks that can run in parallel

    while (queue.length > 0) {
      // All items currently in queue can be built simultaneously
      const currentBatch = [...queue];
      parallelGroups.push(currentBatch.map(i => names[i]));
      queue.length = 0;

      for (const node of currentBatch) {
        order.push(names[node]);
        for (const neighbor of adjacency[node]) {
          inDegree[neighbor]--;
          if (inDegree[neighbor] === 0) {
            queue.push(neighbor);
          }
        }
      }
    }

    if (order.length !== n) {
      throw new Error(
        "Circular dependency detected! Cannot determine build order."
      );
    }

    return { order, parallelGroups };
  }
}

// Usage
const build = new BuildSystem();
build.addModule("app", ["ui", "api"]);
build.addModule("ui", ["components", "styles"]);
build.addModule("api", ["utils", "auth"]);
build.addModule("components", ["utils"]);
build.addModule("styles", []);
build.addModule("utils", []);
build.addModule("auth", ["utils"]);

const { order, parallelGroups } = build.getBuildOrder();
console.log("Build order:", order);
console.log("Parallel groups:");
parallelGroups.forEach((group, i) =>
  console.log(`  Step ${i + 1}: [${group.join(", ")}]`)
);
// Step 1: [styles, utils]          ← can build simultaneously
// Step 2: [components, auth]       ← can build simultaneously
// Step 3: [ui, api]                ← can build simultaneously
// Step 4: [app]

Cycle Detection in Detail

One of the most valuable side effects of topological sort is detecting cycles. Here's a focused cycle detection example:

javascript
function detectCycles(numNodes, edges) {
  const adjacency = Array.from({ length: numNodes }, () => []);
  const inDegree = new Array(numNodes).fill(0);

  for (const [from, to] of edges) {
    adjacency[from].push(to);
    inDegree[to]++;
  }

  const queue = [];
  for (let i = 0; i < numNodes; i++) {
    if (inDegree[i] === 0) queue.push(i);
  }

  let processedCount = 0;

  while (queue.length > 0) {
    const node = queue.shift();
    processedCount++;

    for (const neighbor of adjacency[node]) {
      inDegree[neighbor]--;
      if (inDegree[neighbor] === 0) queue.push(neighbor);
    }
  }

  if (processedCount === numNodes) {
    console.log("No cycle detected. Graph is a valid DAG.");
    return false;
  }

  // Nodes still with inDegree > 0 are part of cycles
  const cycleNodes = [];
  for (let i = 0; i < numNodes; i++) {
    if (inDegree[i] > 0) cycleNodes.push(i);
  }
  console.log("Cycle detected! Nodes involved:", cycleNodes);
  return true;
}

// No cycle
detectCycles(4, [[0,1],[1,2],[0,3]]); // No cycle detected

// Has cycle: 1 → 2 → 3 → 1
detectCycles(4, [[0,1],[1,2],[2,3],[3,1]]); // Cycle detected! Nodes: [1,2,3]

Classic Interview Problem: Course Schedule

This is one of the most common interview problems involving topological sort. Given numCourses and a list of prerequisite pairs, determine if it's possible to finish all courses.

javascript
/**
 * LeetCode 207: Course Schedule
 * Given numCourses and prerequisites array,
 * return true if you can finish all courses.
 */
function canFinish(numCourses, prerequisites) {
  const adjacency = Array.from({ length: numCourses }, () => []);
  const inDegree = new Array(numCourses).fill(0);

  for (const [course, prereq] of prerequisites) {
    adjacency[prereq].push(course);
    inDegree[course]++;
  }

  const queue = [];
  for (let i = 0; i < numCourses; i++) {
    if (inDegree[i] === 0) queue.push(i);
  }

  let count = 0;
  while (queue.length > 0) {
    const course = queue.shift();
    count++;
    for (const next of adjacency[course]) {
      inDegree[next]--;
      if (inDegree[next] === 0) queue.push(next);
    }
  }

  return count === numCourses;
}

/**
 * LeetCode 210: Course Schedule II
 * Return the ordering of courses you should take.
 * Return empty array if impossible.
 */
function findOrder(numCourses, prerequisites) {
  const adjacency = Array.from({ length: numCourses }, () => []);
  const inDegree = new Array(numCourses).fill(0);

  for (const [course, prereq] of prerequisites) {
    adjacency[prereq].push(course);
    inDegree[course]++;
  }

  const queue = [];
  for (let i = 0; i < numCourses; i++) {
    if (inDegree[i] === 0) queue.push(i);
  }

  const order = [];
  while (queue.length > 0) {
    const course = queue.shift();
    order.push(course);
    for (const next of adjacency[course]) {
      inDegree[next]--;
      if (inDegree[next] === 0) queue.push(next);
    }
  }

  return order.length === numCourses ? order : [];
}

// Test
console.log(canFinish(4, [[1,0],[2,1],[3,2]])); // true
console.log(canFinish(2, [[0,1],[1,0]]));       // false (cycle)
console.log(findOrder(4, [[1,0],[2,0],[3,1],[3,2]]));
// [0, 1, 2, 3] or [0, 2, 1, 3]

Time and Space Complexity

Where V is the number of vertices (nodes) and E is the number of edges. Both Kahn's and DFS-based approaches share the same asymptotic complexity.

When to Use Which Approach

Best Practices

  1. Choose Kahn's for most interview problems: It's more intuitive, easier to debug, and naturally detects cycles by counting processed nodes.
  2. Use DFS-based when integrating with existing DFS traversals: If you're already doing a depth-first search (e.g., finding strongly connected components), tacking on topological sort is trivial.
  3. Always validate for cycles: In production code, never assume the input is a DAG. Always check and handle the cyclic case gracefully.
  4. Leverage parallel groups from Kahn's: The nodes available at each step of Kahn's algorithm can be processed in parallel — useful for task schedulers and build systems.
  5. Represent the graph with adjacency lists: Don't use adjacency matrices for topological sort — they waste space and make the algorithm O(V²) instead of O(V + E).
  6. Consider using string-keyed maps for real-world problems: The index-based examples above are great for interview problems, but production code often benefits from using module names or task IDs directly as keys.
  7. Watch for disconnected components: Both algorithms handle disconnected graphs correctly, but make sure you initialize the starting set by scanning all nodes, not just node 0.
  • Graphs — Topological sort operates on directed graphs; understanding graph representations (adjacency list vs. matrix) is essential.
  • Breadth-First Search (BFS) — Kahn's algorithm is essentially BFS with in-degree tracking.
  • Depth-First Search (DFS) — The DFS-based topological sort is a direct application of DFS with post-order recording.
  • Shortest Path algorithms — On DAGs, the shortest path can be found in O(V + E) by processing nodes in topological order.
  • Dynamic Programming on DAGs — Many DP problems on graphs require topological ordering to ensure subproblems are solved before they're needed.

Tests and Challenges

Practice these problems to solidify your understanding of topological sort:

LeetCode

HackerRank

Start with Course Schedule (LeetCode 207) — it's the purest topological sort problem and appears frequently in interviews. Once comfortable, try Alien Dictionary for a more creative application.