INFO

Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic algorithm designed for environments with continuous action spaces.

Components

  • Actor Network: Learns a deterministic policy that maps states to actions
  • Critic Network: Estimates the Q-value for given state-action pairs
  • Target Networks: Stabilize training by slowly updating target versions of actor and critic
  • Replay Buffer: Stores transitions for off-policy learning
  • Exploration Noise: Adds noise (e.g. Ornstein-Uhlenbeck) to actions for exploration in continuous action spaces
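
A minimal PyTorch sketch of these pieces (the framework choice, the two-layer MLP sizes, and the buffer capacity are illustrative assumptions, not part of the algorithm itself):

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to exactly one action."""
    def __init__(self, state_dim, action_dim, max_action, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # squash to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-network: scores a (state, action) pair."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class ReplayBuffer:
    """Fixed-size FIFO store of transitions for off-policy learning."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, float(done)))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                          for x in zip(*batch))
        return s, a, r.unsqueeze(-1), s2, d.unsqueeze(-1)
```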

Key Features

  1. Deterministic Policy
    • Unlike stochastic policy-gradient methods, DDPG uses a deterministic actor that outputs a single action per state
    • Efficient in high-dimensional action spaces, where sampling from a stochastic policy is costly
  2. Off-Policy Learning
    • Learns from stored experiences, improving sample efficiency
    • Enables reuse of past trajectories for training
  3. Actor-Critic Architecture
    • Actor proposes actions
    • Critic evaluates them
    • Critic guides actor updates via the gradient of its Q-value estimate with respect to the action (see the update sketch after this list)
  4. Target Networks
    • Reduce training instability by slowly updating the target parameters toward the online networks
  5. Exploration via Noise
    • Adds temporally correlated noise (e.g. Ornstein-Uhlenbeck) to actions to encourage exploration
    • Especially useful in physical control tasks (see the noise sketch below)
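
Putting features 2–4 together, a hedged sketch of one DDPG update step using the classes above; `gamma`, `tau`, the learning rates, and the Pendulum-style dimensions are assumed example values, not prescriptions:

```python
import copy

import torch
import torch.nn.functional as F

state_dim, action_dim, max_action = 3, 1, 2.0  # e.g. Pendulum-v1 (assumed)
gamma, tau = 0.99, 0.005                       # discount and soft-update rate

actor = Actor(state_dim, action_dim, max_action)
critic = Critic(state_dim, action_dim)
# Target networks start as exact copies, then trail the online networks.
actor_target = copy.deepcopy(actor)
critic_target = copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(buffer, batch_size=128):
    s, a, r, s2, d = buffer.sample(batch_size)

    # Critic: regress Q(s, a) toward the bootstrapped TD target,
    # computed entirely with the slow-moving target networks.
    with torch.no_grad():
        target_q = r + gamma * (1 - d) * critic_target(s2, actor_target(s2))
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's Q-value, i.e. minimize -Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft (Polyak) update: targets move a small step tau toward the online nets.
    with torch.no_grad():
        for net, target in ((actor, actor_target), (critic, critic_target)):
            for p, p_t in zip(net.parameters(), target.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)
```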
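
And a sketch of feature 5, Ornstein-Uhlenbeck exploration noise added at action-selection time, reusing `actor` and `max_action` from above; `theta` and `sigma` are commonly used example values with the time-step factor folded in:

```python
import numpy as np
import torch

class OUNoise:
    """Temporally correlated noise: each step drifts back toward the mean,
    so consecutive perturbations are smooth rather than independent."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def reset(self):
        # Call at the start of each episode.
        self.state = np.full_like(self.state, self.mu)

    def sample(self):
        dx = (self.theta * (self.mu - self.state)
              + self.sigma * np.random.randn(*self.state.shape))
        self.state = self.state + dx
        return self.state

noise = OUNoise(action_dim)

def act(state):
    """Deterministic action plus correlated noise, clipped to the valid range."""
    with torch.no_grad():
        a = actor(torch.as_tensor(state, dtype=torch.float32)).numpy()
    return np.clip(a + noise.sample(), -max_action, max_action)
```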