INFO
Combines Q-learning with Deep Neural Networks to handle high-dimensional state spaces
- Uses experience replay and fixed Q-targets to stabilize training
- Allows model to efficiently learn from high-dimensional data
Training Workflow
- Initialize replay buffer and networks (main and target)
- For each step:
- Select action using ε-greedy policy
- Store transition in replay buffer
- Sample mini-batch and compute loss
- Update main network via gradient descent
- Periodically sync target network
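The workflow above can be sketched end to end. This is a minimal illustration, not the paper's setup: it assumes a toy environment where action 1 always pays reward 1, and a linear map `state @ W` stands in for the deep Q-network so the loop fits in a few lines.

```python
import random
from collections import deque

import numpy as np

random.seed(0)
rng = np.random.default_rng(0)

# Toy problem (assumption for illustration): 4-dim random states, 2 actions,
# action 1 always yields reward 1 — just enough to exercise the update loop.
STATE_DIM, N_ACTIONS = 4, 2

def env_step(state, action):
    reward = 1.0 if action == 1 else 0.0
    next_state = rng.standard_normal(STATE_DIM)
    return next_state, reward

# A linear Q-function stands in for the deep network: Q(s) = s @ W
W_main = np.zeros((STATE_DIM, N_ACTIONS))
W_target = W_main.copy()

buffer = deque(maxlen=1000)          # replay buffer
epsilon, gamma, lr, sync_every = 0.1, 0.99, 0.01, 50

state = rng.standard_normal(STATE_DIM)
for step in range(500):
    # 1. select action with an ε-greedy policy
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = int(np.argmax(state @ W_main))

    # 2. store the transition
    next_state, reward = env_step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state

    # 3-4. sample a mini-batch and take a gradient step on the main network
    if len(buffer) >= 32:
        for s, a, r, s2 in random.sample(buffer, 32):
            target = r + gamma * np.max(s2 @ W_target)   # fixed Q-target
            td_error = target - (s @ W_main)[a]
            W_main[:, a] += lr * td_error * s            # SGD on squared TD error

    # 5. periodically sync the target network
    if step % sync_every == 0:
        W_target = W_main.copy()
```

Swapping the linear map for a real network (and the toy `env_step` for a real environment) gives the standard DQN loop.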
Key Features
- Function Approximation with Deep Neural Networks
- Replaces traditional Q-tables with a neural network that estimates Q(s, a)
- Enables learning in environments with large or continuous state spaces
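A minimal sketch of the idea, assuming a small 2-layer NumPy MLP (the sizes and initialization here are illustrative, not the paper's architecture): instead of indexing a table by discrete state, the network maps any continuous state vector to one Q-value per action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4-dim continuous state, 16 hidden units, 2 actions
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 2
W1 = rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, N_ACTIONS)) * 0.1
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    h = np.maximum(state @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                    # one Q-value per action
```

Because the input is a real-valued vector, the same function generalizes across states a lookup table could never enumerate.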
- Experience Replay
- Stores transitions in a buffer
- Randomly samples mini-batches to break temporal correlations
- Improves data efficiency and stabilizes training
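A replay buffer of this kind can be sketched with a fixed-capacity deque (the class and method names here are illustrative, not from a specific library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s') transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)  # oldest entry is evicted when full

    def sample(self, batch_size):
        # uniform random sampling breaks temporal correlations
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly over the whole buffer means consecutive training batches mix old and new experience, which is where the decorrelation and data reuse come from.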
- Target Network
- Maintains a separate, slowly-updated copy of the Q-network
- Reduces oscillations and divergence during training
- Target network parameters are synced to the main network every fixed number of steps
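With network parameters held as NumPy arrays, the sync can be sketched two ways: the hard copy used in the original DQN, and a Polyak-averaged soft update used in some later variants (function names here are illustrative).

```python
import numpy as np

def hard_update(target_params, main_params):
    # full copy, performed every fixed number of steps (original DQN style)
    for t, m in zip(target_params, main_params):
        t[...] = m

def soft_update(target_params, main_params, tau=0.005):
    # Polyak averaging variant: target <- tau*main + (1 - tau)*target
    for t, m in zip(target_params, main_params):
        t[...] = tau * m + (1.0 - tau) * t
```

Either way, the target network lags the main network, so the regression targets change slowly and training oscillates less.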
- ε-Greedy Exploration
- Balances exploration and exploitation
- Starts with a high ε (more exploration), which decays over time to favor the learned policy
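A common way to implement this is a schedule that anneals ε and an action selector that consults it; the linear schedule and the start/end/decay values below are illustrative choices, not fixed by the algorithm.

```python
import random

def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    # linear decay from eps_start to eps_end, then held at eps_end
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_values, step):
    if random.random() < epsilon_by_step(step):
        return random.randrange(len(q_values))              # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Early on, nearly every action is random; late in training the agent almost always takes the argmax of its learned Q-values.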
- Bellman Loss Optimization
- Uses the Bellman equation to compute the temporal difference (TD) error: δ = r + γ·max_a′ Q_target(s′, a′) − Q(s, a)
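In batched form this can be sketched as below, assuming NumPy arrays and illustrative function names; terminal transitions get no bootstrap term, and the targets are treated as constants (no gradient flows through the target network).

```python
import numpy as np

def td_targets(rewards, next_q_target, dones, gamma=0.99):
    # y = r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)

def bellman_loss(q_main_taken, targets):
    # mean squared TD error over the mini-batch
    td_error = targets - q_main_taken
    return np.mean(td_error ** 2)
```

Minimizing this loss pulls Q(s, a) for the taken actions toward the bootstrapped one-step returns.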
- Scalability to Visual Input
- Can process raw image frames using convolutional layers