INFO
A foundational technique in association rule learning used to identify frequent itemsets and derive rules from large transactional datasets.
- Developed by: Agrawal and Srikant (1994)
- Core Principle: Every subset of a frequent itemset must also be frequent (the downward-closure, or Apriori, property), so any candidate with an infrequent subset can be pruned without counting it
- Search Strategy: Level-wise, breadth-first expansion
Workflow
- Frequent Itemset Generation
- Expand candidate itemsets one item at a time (size-k candidates are built from frequent size-(k-1) itemsets)
- Prune candidates that do not meet the minimum support threshold
- Rule Derivation
- Generate rules from the frequent itemsets
- Evaluate using metrics:
- Support: Frequency of itemset in transactions
- Confidence: Likelihood that consequent appears when antecedent does
- Lift: Ratio of observed confidence to the consequent's baseline frequency; values above 1 indicate an association stronger than random chance
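The level-wise search and the three metrics above can be sketched in plain Python. This is a minimal illustration on a hypothetical five-transaction basket, not the mlxtend implementation used below:

```python
from itertools import combinations

# Hypothetical basket data: five transactions over four items
transactions = [
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Jam'},
    {'Bread', 'Butter'},
    {'Milk', 'Bread', 'Jam'},
    {'Milk', 'Bread', 'Butter', 'Jam'},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori_frequent(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, pruning
    candidates that fall below the minimum support threshold."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    # Level 1: frequent single items
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    k = 1
    while level:
        frequent.update({s: support(s, transactions) for s in level})
        k += 1
        # Join step: size-k candidates from frequent (k-1)-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        # (downward closure), then check support against the threshold
        level = [c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return frequent

freq = apriori_frequent(transactions, min_support=0.4)

# Rule {Milk} -> {Bread}: confidence = support(both) / support(antecedent)
conf = freq[frozenset({'Milk', 'Bread'})] / freq[frozenset({'Milk'})]  # 0.75
# Lift = confidence / support(consequent); below 1 here, a mild negative link
lift = conf / freq[frozenset({'Bread'})]  # 0.9375
```

Note how {Butter, Jam} falls below the 0.4 threshold, so no superset containing both items is ever counted.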
Code Example
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Sample transaction dataset
data = {
'Transaction': [1, 2, 3, 4, 5],
'Milk': [1, 1, 0, 1, 1],
'Bread': [1, 0, 1, 1, 1],
'Butter': [1, 0, 1, 0, 1],
'Jam': [0, 1, 0, 1, 1]
}
# Convert to a one-hot DataFrame; mlxtend's apriori expects boolean columns
df = pd.DataFrame(data).set_index('Transaction').astype(bool)
# Apply Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)
# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
# Display results
print(frequent_itemsets)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
Advantages
- Produces interpretable rules for actionable insights
- Applicable across domains (retail, healthcare, web usage)
- Metrics (support, confidence, lift) help prioritize relationships
Disadvantages
- High computational cost for large datasets
- Poor scalability in high-dimensional spaces
- Requires manual tuning of support/confidence thresholds