INFO

Combines Q-learning with Deep Neural Networks to handle high-dimensional state spaces

  • Uses experience replay and fixed Q-targets to stabilize training
    • Allows the model to learn efficiently from high-dimensional data

Training Workflow

  1. Initialize replay buffer and networks (main and target)
  2. For each step:
    • Select an action using an ε-greedy policy
    • Store transition in replay buffer
    • Sample mini-batch and compute loss
    • Update main network via gradient descent
    • Periodically sync target network
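The workflow above can be sketched as a minimal training loop. This is a hedged illustration, not the full DQN algorithm: it uses a hypothetical toy environment (4-dim states, 2 actions, reward favoring action 0) and a linear Q-function in place of a deep network, so the replay buffer, fixed Q-targets, ε-greedy selection, and periodic target sync stay visible.

```python
# Minimal sketch of the DQN training loop (toy environment, linear Q-function
# standing in for a deep network; all names here are illustrative).
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Linear stand-in for the Q-network: Q(s, .) = W @ s
W = rng.normal(scale=0.1, size=(n_actions, n_states))   # main network
W_target = W.copy()                                     # target network

buffer = deque(maxlen=10_000)                           # replay buffer
gamma, lr, eps, sync_every = 0.99, 0.01, 0.1, 50

def step_env(state, action):
    """Toy dynamics: random next state; reward 1 only for action 0."""
    return rng.normal(size=n_states), (1.0 if action == 0 else 0.0)

s = rng.normal(size=n_states)
for t in range(500):
    # ε-greedy action selection
    if rng.random() < eps:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(W @ s))

    s_next, r = step_env(s, a)
    buffer.append((s, a, r, s_next))    # store transition
    s = s_next

    if len(buffer) >= 32:
        # sample a mini-batch to break temporal correlations
        for bs, ba, br, bs_next in random.sample(buffer, 32):
            # TD target comes from the *target* network (fixed Q-targets)
            target = br + gamma * np.max(W_target @ bs_next)
            td_error = target - (W @ bs)[ba]
            # gradient descent on the squared TD error for the taken action
            W[ba] += lr * td_error * bs

    if t % sync_every == 0:
        W_target = W.copy()             # periodically sync target network
```

In a real implementation `W` would be a deep network updated by an optimizer such as Adam, but the data flow (act, store, sample, compute TD loss, update, sync) is the same.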

Key Features

  • Function Approximation with Deep Neural Networks
    • Replaces traditional Q-tables with a neural network that estimates Q(s, a)
    • Enables learning in environments with large or continuous state spaces
  • Experience Replay
    • Stores transitions in a buffer
    • Randomly samples mini-batches to break temporal correlations
    • Improves data efficiency and stabilizes training
  • Target Network
    • Maintains a separate, slowly-updated copy of the Q-network
    • Reduces oscillations and divergence during training
    • Target network parameters are synced every fixed number of steps
  • ε-Greedy Exploration
    • Balances exploration and exploitation
    • Starts with a high ε (more exploration) that decays over time to favor the learned policy
  • Bellman Loss Optimization
    • Uses the Bellman equation to compute the temporal difference (TD) error, then minimizes its squared value
  • Scalability to Visual Input
    • Can process raw image frames using convolutional layers
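The ε-greedy decay mentioned above can be sketched as a simple schedule. The exact form and constants here (exponential decay from 1.0 to a 0.05 floor) are illustrative assumptions; linear decay is equally common.

```python
# Hypothetical exponential decay schedule for ε-greedy exploration.
eps_start, eps_end, decay = 1.0, 0.05, 0.995

def epsilon(step):
    """High ε early (explore), decaying toward a floor (mostly exploit)."""
    return max(eps_end, eps_start * decay ** step)

print(round(epsilon(0), 3))     # 1.0  -> pure exploration at the start
print(round(epsilon(1000), 3))  # 0.05 -> floor: mostly the learned policy
```

At each environment step the agent takes a random action with probability `epsilon(step)` and the greedy action `argmax_a Q(s, a)` otherwise.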