Stochastic Gradient Descent (SGD) Classifier

INFO

Optimization algorithm used in machine learning to train models efficiently, particularly on large-scale datasets. An SGD classifier is a linear model whose weights are fitted with this algorithm.

Developed by: Herbert Robbins and Sutton Monro (1951, foundational SGD theory)
Core Principle: Updates model parameters using one sample at a time, making it computationally efficient for large datasets
Search Strategy:
- Applies stochastic gradient descent optimization
- Supports multiple loss functions:
  - Hinge loss (yields a linear Support Vector Machine)
  - Log loss (yields Logistic Regression)
- Often used with linear models for classification tasks
- Enables online learning by updating weights incrementally
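
To make the per-sample update concrete, here is a minimal NumPy sketch of SGD with log loss. The function name sgd_log_loss_step, the learning rate of 0.1, and the toy data are assumptions for illustration. It leaves out the regularization term and the learning-rate schedule that a library implementation such as SGDClassifier applies, so treat it as a picture of the idea, not a replica of the library.

```python
import numpy as np

def sgd_log_loss_step(w, b, x, y, lr=0.1):
    """One SGD update for logistic regression on a single sample (y is 0 or 1)."""
    z = np.dot(w, x) + b
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of class 1
    grad = p - y                  # derivative of the log loss with respect to z
    w = w - lr * grad * x         # step against the gradient for the weights
    b = b - lr * grad             # and for the bias
    return w, b

# Hypothetical toy data: two features, linearly separable labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w, b = np.zeros(2), 0.0
for epoch in range(5):
    for i in rng.permutation(len(X)):  # visit samples in a fresh random order
        w, b = sgd_log_loss_step(w, b, X[i], y[i])

train_acc = np.mean(((X @ w + b) > 0).astype(int) == y)
print(f"Training accuracy: {train_acc:.2f}")
```

Swapping the gradient expression for the hinge-loss subgradient would turn the same loop into a linear SVM trainer.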

Workflow

Data Preparation
- Standardize features (zero mean, unit variance) so that no single feature dominates the gradient updates
Model Training
- Fit SGD classifier using chosen loss function
- Update weights per sample using gradient of loss
Prediction & Evaluation
- Predict class labels on test data
- Evaluate using metrics:
  - Accuracy
  - Classification Report (includes precision, recall, F1-score)
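
The three workflow stages can also be chained in a scikit-learn Pipeline, which fits the scaler on the training data only and reuses it at prediction time. The sketch below is one possible arrangement, not the only one. The make_classification dataset and its parameters are assumptions chosen so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset, generated only to keep the sketch self-contained
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Data preparation and model training in one estimator
pipe = make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", random_state=0))
pipe.fit(X_train, y_train)

# Prediction and evaluation
y_pred = pipe.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")
print(classification_report(y_test, y_pred))
```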

Code Example

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
 
# Generating a synthetic dataset (simulating e-commerce browsing behavior)
np.random.seed(42)
n_samples = 1000
 
# Features: products viewed, time spent (minutes), cart added (binary)
X = np.random.rand(n_samples, 3) * [10, 50, 1]  # Scaling different ranges
# Binary target: 1 (Purchase), 0 (No Purchase)
# Labels are drawn at random, independent of X, so expect accuracy near 0.5
y = np.random.choice([0, 1], size=n_samples)
 
# Splitting into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 
# Standardizing the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
 
# Training the SGD Classifier
# 'log_loss' replaces 'log', which was deprecated in scikit-learn 1.1 and removed in 1.3
sgd_clf = SGDClassifier(loss='log_loss', max_iter=1000, learning_rate='optimal', random_state=42)
sgd_clf.fit(X_train, y_train)
 
# Predictions
y_pred = sgd_clf.predict(X_test)
 
# Evaluating the Model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
 
# Display results
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)

Advantages

Scalable for large datasets
- Updates weights incrementally
- Suitable for real-time applications
Fast and memory-efficient
- Especially effective with high-dimensional data
Supports online learning
- Model can be updated continuously with new data
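
Online learning maps onto SGDClassifier's partial_fit method, which updates the existing weights with each new batch instead of refitting from scratch. The sketch below simulates a data stream. The batch size, the number of batches, and the labeling rule are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)  # default hinge loss
classes = np.array([0, 1])           # every label must be declared on the first call

def make_batch(n):
    """Hypothetical stream: two features, label is 1 when the first exceeds the second."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)
    return X, y

# Feed the model 20 small batches, one at a time
for _ in range(20):
    X_batch, y_batch = make_batch(50)
    clf.partial_fit(X_batch, y_batch, classes=classes)

# Score on data the model has not seen
X_new, y_new = make_batch(200)
stream_acc = clf.score(X_new, y_new)
print(f"Accuracy on new data: {stream_acc:.2f}")
```

In a real stream the features would still need scaling. StandardScaler also offers partial_fit, so its statistics can be updated batch by batch in the same way.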

Disadvantages

High variance and instability
- Single-sample updates introduce stochastic noise
- May stop at a suboptimal solution if the learning rate or iteration budget is poorly chosen
Sensitive to hyperparameters
- Learning rate and number of iterations must be tuned carefully
Not ideal for small datasets
- Random updates may lead to poor generalization
- May require batch or mini-batch gradient descent for stability
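
One common way to handle the hyperparameter sensitivity is a cross-validated grid search over the learning rate and iteration count. The sketch below is a starting point under stated assumptions: the grid values, the constant learning-rate schedule, and the synthetic dataset are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset, generated only to keep the sketch self-contained
X, y = make_classification(
    n_samples=400, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, random_state=0,
)

# A constant schedule makes eta0 the step size, so it can be tuned directly
pipe = make_pipeline(
    StandardScaler(),
    SGDClassifier(learning_rate="constant", eta0=0.01, random_state=0),
)
param_grid = {
    "sgdclassifier__eta0": [0.001, 0.01, 0.1],
    "sgdclassifier__max_iter": [200, 1000],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print("Best parameters:", best_params)
print(f"Best cross-validated accuracy: {best_score:.2f}")
```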
