Descriptive Statistics

ways of summarizing large sets of quantitative (numerical) information
provides insights into
- central tendencies
- variability
- patterns within datasets
helps analysts understand key characteristics of their data and identify anomalies, trends, and outliers using measures such as
- mean
- median
- mode
- range
- standard deviation
- skewness
- kurtosis
serves as a foundation for further data analysis
- enables generation of actionable insights without relying on complex statistical modeling

Statistical Measurements

mean: the arithmetic average
fundamental measure of central tendency $\text{mean} = \frac{\sum_{i=1}^{N} x_i}{N}$
provides insight into the overall trend of a dataset, but can be misleading when the data contains outliers or is highly skewed

import numpy as np
 
data = [100, 200, 150, 400, 500]
mean_value = np.mean(data)

median: the middle value in an ordered dataset
more robust measure of central tendency when data is skewed
less affected by extreme values
- useful in financial and income-related analysis

import numpy as np
 
data = [100, 200, 150, 400, 500]
median_value = np.median(data)
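
The robustness claim above can be checked with a small sketch. The extra value 10_000 is a hypothetical outlier added for illustration only; it is not part of the original data.

```python
import numpy as np

# Original values plus one hypothetical, artificially large outlier.
typical = [100, 200, 150, 400, 500]
with_outlier = typical + [10_000]

mean_typical = np.mean(typical)
median_typical = np.median(typical)
mean_outlier = np.mean(with_outlier)
median_outlier = np.median(with_outlier)

# How far each measure moves when the outlier is added.
mean_shift = mean_outlier - mean_typical
median_shift = median_outlier - median_typical

print(f"mean:   {mean_typical:.1f} -> {mean_outlier:.1f} (shift {mean_shift:.1f})")
print(f"median: {median_typical:.1f} -> {median_outlier:.1f} (shift {median_shift:.1f})")
```

In this sketch a single extreme value moves the mean far more than the median, which is why the median is often preferred for skewed quantities such as income.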

mode: the most frequently occurring value in a dataset

from statistics import mode
 
data = [42, 37, 42, 45, 42, 38, 37]
mode_value = mode(data)

range: measure of dispersion that represents the difference between the maximum and minimum values in a dataset
provides a simple measure of variability
does not account for how data is distributed between these extremes

data = [42, 37, 42, 45, 42, 38, 37]
range_value = max(data) - min(data)
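
To illustrate why the range alone says little about the distribution between the extremes, here is a minimal sketch with two hypothetical datasets (invented for this example) that share the same minimum and maximum but differ in spread.

```python
import numpy as np

# Hypothetical datasets with identical minimum (10) and maximum (90).
clustered = [10, 49, 50, 50, 51, 90]   # most values sit near the center
spread_out = [10, 10, 30, 70, 90, 90]  # values pile up near the extremes

range_clustered = max(clustered) - min(clustered)
range_spread = max(spread_out) - min(spread_out)

# Sample standard deviation captures the difference the range misses.
std_clustered = np.std(clustered, ddof=1)
std_spread = np.std(spread_out, ddof=1)

print(f"ranges: {range_clustered} vs {range_spread}")
print(f"sample std devs: {std_clustered:.2f} vs {std_spread:.2f}")
```

Both ranges are identical, yet the standard deviations differ, so the range is best read alongside another measure of dispersion.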

standard deviation: quantifies the amount of variation or dispersion in a dataset
- Lower standard deviation: data points lie close to the mean
- Higher standard deviation: data points are spread over a wider range (greater variability)

import numpy as np

data = [42, 37, 42, 45, 42, 38, 37]
std_dev = np.std(data, ddof=1)  # ddof=1 for sample standard deviation
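
As a sketch of what the `ddof` argument changes, the standard deviation can be computed by hand and compared with NumPy. The variable names here are illustrative; the population version divides by N, the sample version by N - 1.

```python
import math
import numpy as np

data = [42, 37, 42, 45, 42, 38, 37]
n = len(data)
mean_value = sum(data) / n

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean_value) ** 2 for x in data)

population_std = math.sqrt(squared_deviations / n)    # matches np.std(data, ddof=0)
sample_std = math.sqrt(squared_deviations / (n - 1))  # matches np.std(data, ddof=1)

print(f"population: {population_std:.4f}, numpy ddof=0: {np.std(data, ddof=0):.4f}")
print(f"sample:     {sample_std:.4f}, numpy ddof=1: {np.std(data, ddof=1):.4f}")
```

Note that `np.std` defaults to `ddof=0`, so the sample standard deviation has to be requested explicitly.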

skewness: measures the asymmetry of a distribution relative to its mean
- positive skew: long right tail
- negative skew: long left tail

from scipy.stats import skew

data = [42, 37, 42, 45, 42, 38, 37]
skewness_value = skew(data)  # > 0: long right tail, < 0: long left tail
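
The sign convention can be seen in a small sketch. The three datasets below are hypothetical and were chosen only to produce a right tail, a left tail, and a symmetric shape.

```python
from scipy.stats import skew

# Hypothetical data: one large value creates a long right tail.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirroring the values flips the tail to the left.
left_tailed = [-x for x in right_tailed]
# Evenly spaced values are symmetric about their mean.
symmetric = [1, 2, 3, 4, 5]

skew_right = skew(right_tailed)
skew_left = skew(left_tailed)
skew_symmetric = skew(symmetric)

print(f"right-tailed: {skew_right:.3f}")
print(f"left-tailed:  {skew_left:.3f}")
print(f"symmetric:    {skew_symmetric:.3f}")
```

The right-tailed data gives a positive value, its mirror image gives the same magnitude with a negative sign, and the symmetric data gives a value of zero.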

kurtosis: measures the “tailedness” of a distribution, indicating whether data points are concentrated around the mean or dispersed across the tails

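from scipy.stats import kurtosis

data = [42, 37, 42, 45, 42, 38, 37]
# default is excess (Fisher) kurtosis: a normal distribution gives 0
kurtosis_value = kurtosis(data)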
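
One detail worth knowing when reading the result: `scipy.stats.kurtosis` reports excess kurtosis by default (`fisher=True`), which is the Pearson value minus 3. A minimal sketch follows, where the two extreme values appended to `heavy_tailed` are hypothetical and added only to show the effect of heavier tails.

```python
from scipy.stats import kurtosis

data = [42, 37, 42, 45, 42, 38, 37]

excess_kurtosis = kurtosis(data)                 # Fisher definition, normal = 0
pearson_kurtosis = kurtosis(data, fisher=False)  # Pearson definition, normal = 3

# Hypothetical extreme values appended to make the tails heavier.
heavy_tailed = data + [5, 80]
excess_heavy = kurtosis(heavy_tailed)

print(f"excess (Fisher): {excess_kurtosis:.3f}")
print(f"Pearson:         {pearson_kurtosis:.3f}")
print(f"with heavier tails: {excess_heavy:.3f}")
```

When comparing kurtosis values across tools, it helps to confirm which of the two definitions each one reports.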