Monte Carlo Tree Search

INFO

A heuristic search algorithm for decision-making in large, complex state spaces.

Builds the search tree incrementally, using random simulations (rollouts) to evaluate actions and per-node visit statistics to balance exploration against exploitation.

How it Works

Selection: Starting at the root, repeatedly follow a tree policy (e.g. UCB1) to the most promising child until reaching a node that is terminal or not fully expanded
Expansion: Add one or more child nodes to the tree from the selected node
Simulation: Run a random or heuristic-based playout from the new node to a terminal state
Backpropagation: Propagate the simulation result back up the path to the root, updating the visit count and accumulated reward of every node on it
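
The four phases above can be sketched on a toy game. This is a minimal illustration, not a reference implementation: the game (a single pile, each player removes 1 to 3 stones, whoever takes the last stone wins), the `Node` class, and the helper names `ucb1`, `rollout`, and `mcts` are all assumptions made for the example.

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def ucb1(child, parent_visits, c=math.sqrt(2)):
    mean = child.wins / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(stones, player, rng):
    # Uniformly random playout; returns the player who takes the last stone.
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player
        player = 1 - player

def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: play out to a terminal state.
        if node.stones == 0:
            winner = 1 - node.player  # the previous mover took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Recommend the most-visited root move.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones, the move that leaves the opponent a multiple of 4 is taking 1 stone, so `mcts(5)` should settle on `1` once the small tree is explored. Returning the most-visited child rather than the one with the highest mean is a common choice because visit counts are less noisy.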

Exploration vs Exploitation via UCB1
- Uses Upper Confidence Bound (UCB1) to select nodes:
$UCB1(i) = \bar{X}_{i} + c \sqrt{\frac{\ln N}{n_{i}}}$
- $\bar{X}_{i}$ : average reward of node $i$
- $N$ : total visits to parent
- $n_{i}$ : visits to node $i$
- $c$ : exploration constant (commonly $\sqrt{2}$, often tuned per domain)
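
A small numeric sketch may make the trade-off concrete. It uses the standard UCB1 form, with a square root over the exploration term; the function name `ucb1`, the convention of scoring unvisited children as infinity, and the sample statistics are assumptions for illustration.

```python
import math

def ucb1(mean_reward, parent_visits, child_visits, c=math.sqrt(2)):
    # Unvisited children get an infinite score so each is tried at least once.
    if child_visits == 0:
        return math.inf
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_reward + exploration

# Hypothetical statistics for two children of a parent visited 60 times.
well_explored = ucb1(mean_reward=0.60, parent_visits=60, child_visits=50)
rarely_tried = ucb1(mean_reward=0.50, parent_visits=60, child_visits=10)
```

Here `rarely_tried` scores higher than `well_explored` despite its lower average reward, because its small visit count inflates the exploration term. As a child accumulates visits, that bonus shrinks and the average reward dominates.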
Anytime Algorithm
- Can be stopped at any time and still return the best-known action
- Ideal for real-time decision-making under computational constraints
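
One way to picture the anytime property is a loop that runs search iterations until a wall-clock budget expires and then reports whatever is currently best. The sketch below assumes two hypothetical callbacks, `step` (one search iteration) and `best_action` (current recommendation), and substitutes a toy two-action environment for a real tree.

```python
import random
import time

def run_anytime(step, best_action, budget_seconds):
    """Call step() until the budget is spent, then return the current best action."""
    deadline = time.monotonic() + budget_seconds
    iterations = 0
    # Always run at least one iteration so there is something to recommend.
    while iterations == 0 or time.monotonic() < deadline:
        step()
        iterations += 1
    return best_action(), iterations

# Toy stand-in for root statistics (hypothetical environment).
rng = random.Random(0)
true_win_rate = {"left": 0.3, "right": 0.7}
visits = {a: 0 for a in true_win_rate}
reward = {a: 0.0 for a in true_win_rate}

def step():
    a = min(visits, key=visits.get)  # sample the least-visited action
    visits[a] += 1
    reward[a] += 1.0 if rng.random() < true_win_rate[a] else 0.0

def best_action():
    tried = [a for a in visits if visits[a] > 0]
    return max(tried, key=lambda a: reward[a] / visits[a])

action, n = run_anytime(step, best_action, budget_seconds=0.05)
```

A larger budget only changes how many iterations feed the estimate; the caller gets a usable answer either way, which is the point of the anytime property.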
Domain-Agnostic
- Requires no domain-specific evaluation function; it only needs the legal moves and the outcome at terminal states
- Works well in environments where simulation is cheap but evaluation is hard
Scalable and Parallelizable
- Can be distributed across cores or machines for faster search
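
One simple scheme, often called root parallelization, runs several independent searches with different seeds and sums their root visit counts. The sketch below is an assumption-laden illustration: `worker_search` is a stand-in that runs UCB1 over three hypothetical root actions instead of a full tree, and threads are used only to show the structure.

```python
import math
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

ACTIONS = ("a", "b", "c")
WIN_RATE = {"a": 0.2, "b": 0.5, "c": 0.8}  # hypothetical environment

def worker_search(seed, iterations=300):
    """Stand-in for one independent search; returns root visit counts."""
    rng = random.Random(seed)
    visits = Counter({a: 0 for a in ACTIONS})
    wins = {a: 0.0 for a in ACTIONS}
    for _ in range(iterations):
        total = sum(visits.values())

        def score(a):
            if visits[a] == 0:
                return math.inf
            return wins[a] / visits[a] + math.sqrt(2 * math.log(total) / visits[a])

        a = max(ACTIONS, key=score)
        visits[a] += 1
        wins[a] += 1.0 if rng.random() < WIN_RATE[a] else 0.0
    return visits

def parallel_root_search(num_workers=4):
    # Each worker gets its own seed, so the searches explore independently.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(worker_search, range(num_workers)))
    merged = sum(results, Counter())  # add visit counts across workers
    return merged.most_common(1)[0][0], merged

best, merged = parallel_root_search()
```

In CPython, threads will not speed up CPU-bound search because of the global interpreter lock; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is the likelier choice for real speedups.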