INFO
a heuristic search algorithm used for decision-making in large, complex environments.
- builds the search tree incrementally using random simulations (rollouts) to evaluate actions, balancing exploration and exploitation statistically
How it Works
- Selection: Traverse the tree from
rootusing a policy to select promisingnodes - Expansion: Add one or more
child nodesto the tree from the selected node - Simulation: Run a random or heuristic-based playout from the new node to a terminal state
- Backpropagation: Propagate the simulation result back up the tree, updating visit counts and win rates
Key Features
- Exploration vs Exploitation via UCB1
- Uses Upper Confidence Bound (UCB1) to select nodes:
- : average reward of node
- : total visits to parent
- : visits to node
- : exploration constant (typically )
- Anytime Algorithm
- Can be stopped at any time and still return the best-known action
- Ideal for real-time decision-making under computational constraints
- Domain-Agnostic
- Requires no domain-specific evaluation function
- Works well in environments where simulation is cheap but evaluation is hard
- Scalable and Parallelizable
- Can be distributed across cores or machines for faster search