INFO
Rule-based classification: a technique in machine learning and data analysis in which class labels are assigned according to predefined rules
Rules typically follow an "if-then" structure and are derived from training data, expert knowledge, or rule-mining algorithms
- Developed by: Rooted in expert systems and symbolic AI from the 1970s–1980s
- Core Principle: Uses explicit logical conditions to categorize data points
- Search Strategy:
- Rules can be extracted from:
- Decision trees
- Association rule mining (e.g., Apriori algorithm)
- Domain expert knowledge
- Enables interpretable decision-making with transparent logic
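As a rough illustration of deriving a rule from training data, the sketch below searches for a single threshold rule of the form "if value > t then positive". This is a one-level decision tree (a decision stump), used here as a simplified stand-in; it is not Apriori or a full tree learner. The function name `learn_threshold_rule` and the toy data are assumptions made up for the example.

```python
from typing import List, Tuple


def learn_threshold_rule(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Find the threshold t that maximizes training accuracy of the rule
    'if value > t then 1 else 0'. Returns (threshold, accuracy)."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_t, best_acc = candidates[0], 0.0
    for t in candidates:
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy training data (hypothetical): total spend and a 0/1 "loyal" label.
spend = [50, 150, 800, 950, 1200, 1500]
loyal = [0, 0, 0, 1, 1, 1]

threshold, accuracy = learn_threshold_rule(spend, loyal)
print(f"if spend > {threshold} then loyal (training accuracy {accuracy:.2f})")
```

The learned rule stays human-readable, which is the point of extracting rules rather than keeping an opaque model. Real rule learners search over many features and combine conditions.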
Workflow
- Rule Definition
- Manually or algorithmically define classification rules
- Use logical conditions based on feature thresholds
- Rule Application
- Apply rules to each data point
- Assign class labels based on matched conditions
Code Example
import pandas as pd

# Sample dataset with customer transactions
data = {
    "Customer_ID": [101, 102, 103, 104, 105],
    "Total_Purchases": [15, 2, 8, 12, 1],
    "Total_Spend": [1200, 150, 800, 950, 50],
    "Last_Purchase_Days_Ago": [30, 210, 90, 45, 365],
}
df = pd.DataFrame(data)

# Define classification function.
# Rules are checked in order, so the first matching rule decides the label.
def classify_customer(row):
    if row["Total_Purchases"] > 10 and row["Total_Spend"] > 1000:
        return "Loyal Customer"
    elif row["Total_Purchases"] < 3 and row["Last_Purchase_Days_Ago"] > 180:
        return "At-Risk Customer"
    elif row["Total_Purchases"] >= 5 and row["Total_Spend"] > 500:
        return "Regular Customer"
    else:
        # Fallback when no rule matches
        return "Occasional Customer"

# Apply classification row by row
df["Customer_Category"] = df.apply(classify_customer, axis=1)

# Display the classified results
print(df)
Advantages
- Interpretability
- Decisions are based on explicitly defined rules
- Easy to explain and audit
- Valuable in regulatory environments
- Computationally efficient
- No iterative training required
- Fast execution on structured datasets
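The auditability point can be made concrete by having the classifier report which rule fired. The sketch below reuses the thresholds from the code example. The `Rule` structure, the rule IDs (`R1` to `R3`), and the function `classify_with_reason` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Record = Dict[str, float]


class Rule(NamedTuple):
    rule_id: str                          # hypothetical identifier for audit logs
    description: str                      # human-readable form of the condition
    condition: Callable[[Record], bool]   # the executable condition
    label: str                            # class assigned when the rule fires


# Ordered rule list: the first matching rule wins.
RULES: List[Rule] = [
    Rule("R1", "Total_Purchases > 10 and Total_Spend > 1000",
         lambda r: r["Total_Purchases"] > 10 and r["Total_Spend"] > 1000,
         "Loyal Customer"),
    Rule("R2", "Total_Purchases < 3 and Last_Purchase_Days_Ago > 180",
         lambda r: r["Total_Purchases"] < 3 and r["Last_Purchase_Days_Ago"] > 180,
         "At-Risk Customer"),
    Rule("R3", "Total_Purchases >= 5 and Total_Spend > 500",
         lambda r: r["Total_Purchases"] >= 5 and r["Total_Spend"] > 500,
         "Regular Customer"),
]


def classify_with_reason(record: Record) -> Tuple[str, str]:
    """Return (label, reason), where reason names the rule that fired."""
    for rule in RULES:
        if rule.condition(record):
            return rule.label, f"{rule.rule_id}: {rule.description}"
    return "Occasional Customer", "default: no rule matched"


customer = {"Total_Purchases": 12, "Total_Spend": 950, "Last_Purchase_Days_Ago": 45}
label, reason = classify_with_reason(customer)
print(label, "|", reason)
```

Keeping the description next to the condition means every decision can be logged with the exact rule behind it, which is what makes this style of classifier easy to review.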
Disadvantages
- Rigid and inflexible
- Hand-written rules must be defined and maintained manually
- Requires frequent updates for dynamic data
- Limited in handling complex relationships
- Struggles with non-linear interactions or high-dimensional data