• statistical technique used to analyze and interpret data points collected or recorded at successive time intervals
  • commonly employed to identify underlying patterns, trends, seasonality, and cyclic behavior within time-dependent datasets
  • accounts for the temporal structure inherent in the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
 
# Generate synthetic time series data (e.g., sales over 100 days)
np.random.seed(42)
time_index = pd.date_range(start='2023-01-01', periods=100, freq='D')
sales_data = np.cumsum(np.random.randn(100) * 5 + 50)  # Random walk with drift: daily increments ~ N(50, 5), so the level trends upward
df = pd.DataFrame({'Date': time_index, 'Sales': sales_data}).set_index('Date').asfreq('D')  # set the daily frequency explicitly so statsmodels does not have to infer it
 
# Perform stationarity test (Augmented Dickey-Fuller test)
adf_test = adfuller(df['Sales'])
print(f"ADF Statistic: {adf_test[0]:.4f}")
print(f"P-Value: {adf_test[1]:.4f}")
 
# Fit ARIMA Model
model = ARIMA(df['Sales'], order=(2,1,2))  # ARIMA(p,d,q)
model_fit = model.fit()
 
# Forecast the next 10 periods
forecast = model_fit.forecast(steps=10)
 
# Plot results
plt.figure(figsize=(10, 5))
plt.plot(df.index, df['Sales'], label='Observed Sales', color='blue')
plt.plot(pd.date_range(df.index[-1], periods=11, freq='D')[1:], forecast, label='Forecast', color='red', linestyle='dashed')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.title('Time Series Forecasting with ARIMA')
plt.show()

Analysis Process

  1. stationarity check using the Augmented Dickey-Fuller test (ADF)
    • determines whether the time series is stationary → its mean and variance stay constant over time instead of shifting with trends or seasonality
    • p-value less than 0.05 rejects the test's null hypothesis of a unit root, which indicates stationarity → no additional transformation required
    • if time series is non-stationary
      • stabilize the dataset by applying differencing (the integrated component of ARIMA)
  2. modeling the time series data
    • crucial for generating accurate forecasts
    • ARIMA model
      • commonly used for forecasting
      • three components:
        1. Auto Regressive (AR)
          • models the relationship between the current observation and past observations
        2. Integrated (I)
          • ensures stationarity through differencing
        3. Moving Average (MA)
          • accounts for past error terms

IMPORTANT

The code above calls ARIMA with order=(2, 1, 2), meaning it uses 2 lagged values (p), 1 differencing step (d), and 2 moving average terms (q).

Advantages

  • provides structured framework for analyzing historical data
  • techniques such as decomposition and ARIMA help capture trends and seasonality
    • improves prediction accuracy
  • supports data-driven decision-making
    • enables businesses to optimize the following based on anticipated outcomes
      • operations
      • resource allocation
      • financial planning
  • versatility extends across various industries

Disadvantages

  • ARIMA assumes stationarity
    • requires additional preprocessing if strong trends or seasonality exist
  • technique is highly sensitive to outliers
    • unexpected events can significantly impact forecast accuracy
  • generally performs better for short-term predictions
    • long-term forecasts become increasingly uncertain
  • model selection and parameter tuning require expertise
    • determining optimal values for ARIMA’s p, d, and q parameters often involves a trial-and-error approach
    • challenging for non-experts

What is Time Series Analysis?