INFO

Combines Q-learning with Deep Neural Networks to handle high-dimensional state spaces

  • Uses experience replay and fixed Q-targets to stabilize training
    • Allows the model to learn efficiently from high-dimensional data

Training Workflow

  1. Initialize replay buffer and networks (main and target)
  2. For each step:
    • Select an action using an ε-greedy policy
    • Store transition in replay buffer
    • Sample mini-batch and compute loss
    • Update main network via gradient descent
    • Periodically sync target network
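The workflow above can be sketched as a minimal training loop. This is a hedged illustration, not the full DQN algorithm: it uses a hypothetical toy environment (4-dim states, 2 actions, reward favoring action 0) and a linear Q-function in place of a deep network, so the replay buffer, fixed Q-targets, ε-greedy selection, and periodic target sync stay visible.

```python
# Minimal sketch of the DQN training loop (toy environment, linear Q-function
# standing in for a deep network; all names here are illustrative).
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Linear stand-in for the Q-network: Q(s, .) = W @ s
W = rng.normal(scale=0.1, size=(n_actions, n_states))   # main network
W_target = W.copy()                                     # target network

buffer = deque(maxlen=10_000)                           # replay buffer
gamma, lr, eps, sync_every = 0.99, 0.01, 0.1, 50

def step_env(state, action):
    """Toy dynamics: random next state; reward 1 only for action 0."""
    return rng.normal(size=n_states), (1.0 if action == 0 else 0.0)

s = rng.normal(size=n_states)
for t in range(500):
    # ε-greedy action selection
    if rng.random() < eps:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax(W @ s))

    s_next, r = step_env(s, a)
    buffer.append((s, a, r, s_next))    # store transition
    s = s_next

    if len(buffer) >= 32:
        # sample a mini-batch to break temporal correlations
        for bs, ba, br, bs_next in random.sample(buffer, 32):
            # TD target comes from the *target* network (fixed Q-targets)
            target = br + gamma * np.max(W_target @ bs_next)
            td_error = target - (W @ bs)[ba]
            # gradient descent on the squared TD error for the taken action
            W[ba] += lr * td_error * bs

    if t % sync_every == 0:
        W_target = W.copy()             # periodically sync target network
```

In a real implementation `W` would be a deep network updated by an optimizer such as Adam, but the data flow (act, store, sample, compute TD loss, update, sync) is the same.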

Key Features

  • Function Approximation with Deep Neural Networks
    • Replaces traditional Q-tables with a neural network that estimates Q(s, a)
    • Enables learning in environments with large or continuous state spaces
  • Experience Replay
    • Stores transitions in a buffer
    • Randomly samples mini-batches to break temporal correlations
    • Improves data efficiency and stabilizes training
  • Target Network
    • Maintains a separate, slowly-updated copy of the Q-network
    • Reduces oscillations and divergence during training
    • Target network parameters are synced every fixed number of steps
  • ε-Greedy Exploration
    • Balances exploration and exploitation
    • Starts with a high ε (more exploration) that decays over time to favor the learned policy
  • Bellman Loss Optimization
    • Uses the Bellman equation to compute the temporal difference (TD) error, then minimizes its squared value
  • Scalability to Visual Input
    • Can process raw image frames using convolutional layers
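The ε-greedy decay mentioned above can be sketched as a simple schedule. The exact form and constants here (exponential decay from 1.0 to a 0.05 floor) are illustrative assumptions; linear decay is equally common.

```python
# Hypothetical exponential decay schedule for ε-greedy exploration.
eps_start, eps_end, decay = 1.0, 0.05, 0.995

def epsilon(step):
    """High ε early (explore), decaying toward a floor (mostly exploit)."""
    return max(eps_end, eps_start * decay ** step)

print(round(epsilon(0), 3))     # 1.0  -> pure exploration at the start
print(round(epsilon(1000), 3))  # 0.05 -> floor: mostly the learned policy
```

At each environment step the agent takes a random action with probability `epsilon(step)` and the greedy action `argmax_a Q(s, a)` otherwise.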