Churn modeling case study
SignalForge: Statistically Honest Churn Modeling
Optuna-tuned models on IBM Telco with leakage-free cross-validation, bootstrap confidence intervals, paired t-tests, and calibration. The interesting result is how little separates the models — so the writeup treats selection as a calibration and interpretability decision, not an accuracy contest.
7,043 customers
IBM Telco — real data via Kaggle, with a synthetic sample for credential-free runs and CI
LR · RF · GB
Logistic regression, random forest, and gradient boosting — within ~0.003 AUC, overlapping CIs
Not a production system
A methodology study — and the repo says so explicitly
Statistical Methods
Leak-free cross-validation
Folds are split so that no feature is built from the held-out outcomes it is later used to predict.
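One way to picture the idea is a target-derived feature that is learned inside each training fold and only applied to the held-out fold. The sketch below is an illustration under stated assumptions, not the project's code: the column names (`contract`, `churned`), the toy data, and the helpers `kfold_indices`, `fit_target_encoding`, and `apply_encoding` are all hypothetical.

```python
import random
from collections import defaultdict

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

def fit_target_encoding(categories, labels):
    """Learn the per-category churn rate from training rows only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    global_rate = sum(labels) / len(labels)  # fallback for unseen categories
    return {c: sums[c] / counts[c] for c in counts}, global_rate

def apply_encoding(categories, encoding):
    """Map categories to the rates learned on the training fold."""
    table, fallback = encoding
    return [table.get(c, fallback) for c in categories]

# Toy data (made up for illustration; not the Telco dataset).
rng = random.Random(42)
contract = [rng.choice(["monthly", "one_year", "two_year"]) for _ in range(300)]
churned = [int(rng.random() < (0.4 if c == "monthly" else 0.1)) for c in contract]

fold_checks = []
for train_idx, val_idx in kfold_indices(len(contract), k=5):
    # The encoding sees training outcomes only; held-out labels are never read here.
    enc = fit_target_encoding([contract[i] for i in train_idx],
                              [churned[i] for i in train_idx])
    val_feature = apply_encoding([contract[i] for i in val_idx], enc)
    fold_checks.append((set(train_idx), set(val_idx), len(val_feature)))
```

The same discipline applies to scaling, imputation, and hyperparameter search: anything that is fit should be fit on the training portion of the fold. With scikit-learn, wrapping the preprocessing and the model in a `Pipeline` and passing it to `cross_val_score` is a common way to get this behavior.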
Bootstrap 95% CIs
Confidence intervals on every headline metric, not bare point estimates.
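A percentile bootstrap is one common way to attach an interval to a metric such as AUC. The sketch below assumes that variant; the project may use a different one. The helper names `auc` and `bootstrap_ci` and the toy labels and scores are hypothetical.

```python
import random

def auc(labels, scores):
    """ROC AUC via the rank-sum formulation; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC, resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Toy predictions (made up for illustration; not results from the project).
rng = random.Random(7)
labels = [int(rng.random() < 0.27) for _ in range(400)]
scores = [min(1.0, max(0.0, 0.25 * y + rng.gauss(0.3, 0.15))) for y in labels]

point = auc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"AUC {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate is what makes a gap of a few thousandths of AUC readable as noise rather than as a ranking.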
Paired t-tests
Model-vs-model comparisons tested for significance, not eyeballed.
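A paired t-test compares two models on the same folds by testing whether the mean per-fold difference is distinguishable from zero. The sketch below uses only the standard library and integrates the t density numerically to get a two-sided p-value; with SciPy available, `scipy.stats.ttest_rel` performs the same test. The per-fold AUC values are invented for illustration and are not results from the project, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)

def paired_t_test(a, b):
    """t statistic and two-sided p-value for paired samples a and b.

    Assumes the per-pair differences are not all identical (nonzero std dev).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, two_sided_p(t_stat, n - 1)

# Hypothetical per-fold AUCs for two models scored on the same five folds.
auc_lr = [0.842, 0.851, 0.838, 0.847, 0.845]
auc_gb = [0.845, 0.849, 0.841, 0.846, 0.848]

t_stat, p_value = paired_t_test(auc_lr, auc_gb)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Pairing matters because both models see identical folds: fold-to-fold variation in difficulty cancels in the differences, so the test looks only at the model-to-model gap.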
Probability calibration
Predicted probabilities aligned with observed churn frequencies.
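Calibration can be checked by binning predictions and comparing the mean predicted probability in each bin with the observed churn rate. The sketch below computes a reliability table, an expected calibration error, and a Brier score; these are common choices and an assumption here, not necessarily the metrics the project reports. The function names and toy data are hypothetical.

```python
import random

def reliability_table(labels, probs, n_bins=10):
    """Rows of (mean predicted prob, observed rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_pred, obs_rate, len(members)))
    return rows

def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean gap between predicted and observed rates."""
    n = len(labels)
    return sum(count / n * abs(mean_pred - obs_rate)
               for mean_pred, obs_rate, count in reliability_table(labels, probs, n_bins))

def brier_score(labels, probs):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

# Toy data: outcomes drawn from the true probabilities, so `true_probs` is
# calibrated by construction; `distorted` systematically under-predicts.
rng = random.Random(3)
true_probs = [rng.random() for _ in range(5000)]
outcomes = [int(rng.random() < p) for p in true_probs]
distorted = [p ** 3 for p in true_probs]

ece_calibrated = expected_calibration_error(outcomes, true_probs)
ece_distorted = expected_calibration_error(outcomes, distorted)
print(f"ECE calibrated {ece_calibrated:.3f}, distorted {ece_distorted:.3f}")
```

Note that `distorted` ranks customers exactly as `true_probs` does, so the two have identical AUC. That is why calibration can separate models that an accuracy comparison cannot.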
Key Findings
Three model types (logistic regression, random forest, and gradient boosting) benchmarked on IBM Telco with leakage-free cross-validation, bootstrap 95% confidence intervals, paired t-tests, and probability calibration.
The models land within ~0.003 AUC of each other with overlapping confidence intervals, so selection is a calibration and interpretability call, not an accuracy race. Methodology is the deliverable.
Real data via Kaggle; a committed synthetic sample lets the project and its CI run with no credentials, keeping the line between synthetic and real data explicit.