Descriptive Statistics
- ways of summarizing large sets of quantitative (numerical) information
- provides insights into
  - central tendencies
  - variability
  - patterns within datasets
- helps analysts understand key characteristics of their data and identify anomalies, trends, and outliers using measures such as
  - mean
  - median
  - mode
  - range
  - standard deviation
  - skewness
  - kurtosis
- serves as a foundation for further data analysis
  - enables generation of actionable insights without relying on complex statistical modeling
Statistical Measurements
Mean
- the arithmetic average
- fundamental measure of central tendency
- provides insight into the overall trend of a dataset
- can be misleading when the data contains outliers or is highly skewed
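To make the outlier caveat concrete, here is a minimal sketch using only the standard library. The extreme value `10_000` is a hypothetical observation added for illustration; it is not part of the original sample.

```python
from statistics import fmean

# Original sample plus one hypothetical extreme observation.
data = [100, 200, 150, 400, 500]
data_with_outlier = data + [10_000]

# Arithmetic mean: sum of the values divided by how many there are.
mean_value = fmean(data)
mean_with_outlier = fmean(data_with_outlier)

print(mean_value)         # 270.0
print(mean_with_outlier)  # far above every non-outlier value
```

A single extreme value pulls the mean well past 500, the largest of the ordinary observations, so it no longer describes a typical value.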
import numpy as np
data = [100, 200, 150, 400, 500]
mean_value = np.mean(data)  # 270.0
Median
- the middle value in an ordered dataset
- more robust measure of central tendency when data is skewed
- less affected by extreme values
- useful in financial and income-related analysis
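The "middle value" definition can be sketched by hand, which also shows why an extreme value barely moves the median. The helper name `middle_value` and the outlier `10_000` are assumptions made for illustration only.

```python
from statistics import median

def middle_value(values):
    """Sort the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [100, 200, 150, 400, 500]
data_with_outlier = data + [10_000]  # hypothetical extreme value

median_value = middle_value(data)                      # sorted: 100, 150, 200, 400, 500
median_with_outlier = middle_value(data_with_outlier)  # middle pair is 200 and 400

print(median_value)         # 200
print(median_with_outlier)  # 300.0
```

The hand-written helper is only there to expose the definition; in practice `statistics.median` or `np.median` gives the same result.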
import numpy as np
data = [100, 200, 150, 400, 500]
median_value = np.median(data)  # 200.0
Mode
- the value that appears most frequently in a dataset
- useful for categorical data analysis
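Because the mode needs only counting, not arithmetic, it works on labels as well as numbers. The sketch below uses made-up category values; `multimode` (Python 3.8+) is shown as one way to handle ties.

```python
from statistics import mode, multimode

# Hypothetical categorical observations.
colors = ["red", "blue", "red", "green", "blue", "red"]
most_common = mode(colors)

# With a tie, multimode returns every most-frequent value in first-seen order.
tied = multimode(["a", "b", "a", "b", "c"])

print(most_common)  # red
print(tied)         # ['a', 'b']
```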
from statistics import mode
data = [42, 37, 42, 45, 42, 38, 37]
mode_value = mode(data)  # 42
Range
- measure of dispersion that represents the difference between the maximum and minimum values in a dataset
- provides simple measure of variability
- does not account for how data is distributed between these extremes
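The limitation in the last point can be illustrated with two made-up datasets that share the same extremes but are distributed very differently in between. The helper `value_range` and both datasets are illustrative assumptions.

```python
from statistics import pstdev

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

# Hypothetical datasets with identical minimum and maximum.
clustered = [10, 50, 50, 50, 50, 50, 90]  # most values sit at the center
spread = [10, 20, 35, 50, 65, 80, 90]     # values spaced across the interval

print(value_range(clustered), value_range(spread))  # 80 80
print(pstdev(clustered) < pstdev(spread))           # True
```

The range is identical for both, while a measure that uses every observation (here the population standard deviation) tells them apart.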
data = [42, 37, 42, 45, 42, 38, 37]
range_value = max(data) - min(data)  # 45 - 37 = 8
Standard Deviation
- quantifies the amount of variation or dispersion in a dataset
- lower standard deviation: values cluster close to the mean
- higher standard deviation: values are spread further from the mean (greater variability)
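As a rough sketch of what the calculation involves, the standard deviation can be computed by hand as the square root of the averaged squared deviations from the mean. Dividing by `n - 1` gives the sample version (what `ddof=1` selects in NumPy); dividing by `n` gives the population version. The variable names here are illustrative.

```python
import math
from statistics import pstdev, stdev

data = [42, 37, 42, 45, 42, 38, 37]
n = len(data)
mean = sum(data) / n

# Squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

sample_std = math.sqrt(sum(squared_deviations) / (n - 1))  # divisor n - 1
population_std = math.sqrt(sum(squared_deviations) / n)    # divisor n

print(math.isclose(sample_std, stdev(data)))       # True
print(math.isclose(population_std, pstdev(data)))  # True
```

The sample estimate is always slightly larger than the population one for the same data, and the gap shrinks as `n` grows.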
import numpy as np
data = [42, 37, 42, 45, 42, 38, 37]
std_dev = np.std(data, ddof=1)  # ddof=1 for sample standard deviation
Skewness
- measures the asymmetry of a distribution relative to its mean
- positive skew: long right tail
- negative skew: long left tail
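The sign convention can be checked with a small moment-based sketch: skewness taken as the third central moment divided by the second central moment raised to the power 1.5. This is the biased estimator, which should correspond to the default of `scipy.stats.skew`; the helper names and datasets below are made up for illustration.

```python
def central_moment(values, k):
    """Average of (x - mean) ** k over the data."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Biased moment-based skewness: m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

# Hypothetical datasets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]    # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]    # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # True
print(skewness(left_tailed) < 0)   # True
print(skewness(symmetric))         # 0.0
```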
from scipy.stats import skew
data = [42, 37, 42, 45, 42, 38, 37]
skewness_value = skew(data)
Kurtosis
- measures the “tailedness” of a distribution, indicating whether data points are concentrated around the mean or dispersed across the tails
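from scipy.stats import kurtosis
data = [42, 37, 42, 45, 42, 38, 37]
kurtosis_value = kurtosis(data)  # excess (Fisher) kurtosis by default: a normal distribution scores 0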
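To see what "tailedness" means numerically, the sketch below computes excess kurtosis from central moments (fourth moment divided by the squared second moment, minus 3). This is the biased estimator, which should correspond to the default of `scipy.stats.kurtosis`; the function name and both datasets are illustrative assumptions.

```python
def excess_kurtosis(values):
    """Biased moment-based excess kurtosis: m4 / m2 ** 2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical datasets.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # evenly spread, no extreme values
heavy_tailed = [5, 5, 5, 5, 5, 5, 5, -20, 30]  # mostly central, two extreme values

print(excess_kurtosis(flat) < 0)          # True: lighter tails than a normal distribution
print(excess_kurtosis(heavy_tailed) > 0)  # True: heavier tails than a normal distribution
```

Subtracting 3 centers the scale on the normal distribution, so negative values indicate lighter tails and positive values indicate heavier tails.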