- statistical technique used to analyze and interpret data points collected or recorded at successive time intervals
- commonly employed to identify underlying patterns, trends, seasonality, and cyclic behavior within time-dependent datasets
- accounts for the temporal structure inherent in the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
# Generate synthetic time series data (e.g., sales over 100 days)
np.random.seed(42)
time_index = pd.date_range(start='2023-01-01', periods=100, freq='D')
sales_data = np.cumsum(np.random.randn(100) * 5 + 50) # Trend + Random noise
df = pd.DataFrame({'Date': time_index, 'Sales': sales_data}).set_index('Date')
# Perform stationarity test (Augmented Dickey-Fuller test)
adf_test = adfuller(df['Sales'])
print(f"ADF Statistic: {adf_test[0]:.4f}")
print(f"P-Value: {adf_test[1]:.4f}")
# Fit ARIMA Model
model = ARIMA(df['Sales'], order=(2,1,2)) # ARIMA(p,d,q)
model_fit = model.fit()
# Forecast the next 10 periods
forecast = model_fit.forecast(steps=10)
# Plot results
plt.figure(figsize=(10, 5))
plt.plot(df.index, df['Sales'], label='Observed Sales', color='blue')
plt.plot(pd.date_range(df.index[-1], periods=11, freq='D')[1:], forecast, label='Forecast', color='red', linestyle='dashed')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.legend()
plt.title('Time Series Forecasting with ARIMA')
plt.show()

Analysis Process
- stationarity check using the Augmented Dickey-Fuller test (ADF)
- determines whether the time series data is stationary → mean and variance do not change over time due to trends or seasonality
- p-value less than 0.05 indicates stationarity → does not require additional transformation
- if time series is non-stationary
- stabilize the dataset by applying differencing (the Integrated component of ARIMA)
- modeling the time series data
- crucial for generating accurate forecasts
- ARIMA model
- commonly used for forecasting
- three components:
- Auto Regressive (AR)
- models the current value as a function of its past observations
- Integrated (I)
- ensures stationarity through differencing
- Moving Average (MA)
- accounts for past error terms (residuals)
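The AR and MA components can be illustrated by simulating a process with known coefficients; a sketch using statsmodels' ArmaProcess (the coefficients 0.5 and 0.3 are arbitrary illustration values, not taken from the example above):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ArmaProcess takes lag polynomials: index 0 is the zero-lag coefficient (1),
# and AR coefficients enter with flipped sign.
ar = np.array([1, -0.5])   # AR(1): y_t = 0.5 * y_{t-1} + ...
ma = np.array([1, 0.3])    # MA(1): ... + e_t + 0.3 * e_{t-1}
process = ArmaProcess(ar, ma)

print(process.isstationary)   # AR root outside the unit circle -> stationary
print(process.isinvertible)   # MA root outside the unit circle -> invertible

np.random.seed(0)
sample = process.generate_sample(nsample=200)
print(sample.shape)
```

Fitting an ARIMA(1, 0, 1) to such a simulated sample should recover coefficients close to 0.5 and 0.3, which is a useful sanity check when learning the components.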
IMPORTANT
The code above calls ARIMA with order (2, 1, 2), meaning it uses 2 lagged values (p), 1 differencing step (d), and 2 moving-average terms (q)
Advantages
- provides structured framework for analyzing historical data
- techniques such as decomposition and ARIMA help capture trends and seasonality
- improves prediction accuracy
- supports data-driven decision-making
- enabling businesses to optimize these parameters based on anticipated outcomes
- operations
- resource allocation
- financial planning
- versatility extends across various industries
Disadvantages
- ARIMA assumes stationarity
- requires additional preprocessing if strong trends or seasonality exist
- technique is highly sensitive to outliers
- unexpected events can significantly impact forecast accuracy
- ARIMA models generally perform better for short-term predictions
- long-term forecasts become increasingly uncertain
- model selection and parameter tuning require expertise
- determining optimal values for ARIMA’s p, d, and q parameters often involves a trial-and-error approach
- challenging for non-experts