INFO
Deep Deterministic Policy Gradient (DDPG) is an off-policy, actor-critic algorithm designed for environments with continuous action spaces.
- Blends the strengths of DQN and policy gradient methods to learn both a Q-function and a deterministic policy simultaneously
Components
- Actor Network: Learns a deterministic policy that maps states to actions
- Critic Network: Estimates the Q-value for given state-action pairs
- Target Networks: Stabilize training by slowly updating target versions of actor and critic
- Replay Buffer: Stores transitions for off-policy learning
- Exploration Noise: Adds noise (e.g. Ornstein-Uhlenbeck) to actions for exploration in continuous space
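Of these components, the replay buffer is the simplest to make concrete. A minimal sketch (class and method names are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        # deque with maxlen evicts the oldest transition automatically
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions in a trajectory
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Because sampling is uniform over stored transitions rather than restricted to the current trajectory, the same experience can be reused across many gradient updates, which is what makes the learning off-policy and sample-efficient.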
Key Features
- Deterministic Policy
  - Unlike stochastic policy gradient methods, Deep Deterministic Policy Gradient uses a deterministic actor
  - Efficient in high-dimensional action spaces where sampling is costly
- Off-Policy Learning
  - Learns from stored experiences, improving sample efficiency
  - Enables reuse of past trajectories for training
- Actor-Critic Architecture
  - Actor proposes actions
  - Critic evaluates them
  - Critic guides actor updates via the gradient of Q-values with respect to actions
- Target Networks
  - Reduce training instability by slowly updating target parameters
- Exploration via Noise
  - Adds temporally correlated noise (e.g. Ornstein-Uhlenbeck) to encourage exploration
  - Especially useful in physical control tasks
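The last two features can be sketched concretely. Below is a minimal Ornstein-Uhlenbeck noise process and a soft ("Polyak") target update; theta, sigma, and tau are conventional hyperparameter names, but the values here are illustrative assumptions, not prescribed ones:

```python
import numpy as np

class OUNoise:
    """Temporally correlated noise: dx = theta * (mu - x) + sigma * N(0, 1)."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu = mu * np.ones(action_dim)
        self.theta = theta
        self.sigma = sigma
        self.state = self.mu.copy()

    def reset(self):
        # Call at the start of each episode
        self.state = self.mu.copy()

    def sample(self):
        # Mean-reverting drift toward mu, plus a Gaussian perturbation;
        # successive samples are correlated, unlike i.i.d. Gaussian noise
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

def soft_update(target_params, source_params, tau=0.005):
    """Slowly track the learned network: target <- tau * source + (1 - tau) * target."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target_params, source_params)]
```

At action-selection time the noise is simply added to the actor's output (and typically clipped to the valid action range), while `soft_update` is applied to both the target actor and target critic after each gradient step, so the bootstrapping targets change slowly and training stays stable.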