Churn modeling case study

SignalForge: Statistically Honest Churn Modeling

Optuna-tuned models on IBM Telco with leakage-free cross-validation, bootstrap confidence intervals, paired t-tests, and calibration. The interesting result is how little separates the models — so the writeup treats selection as a calibration and interpretability decision, not an accuracy contest.

Dataset

7,043 customers

IBM Telco — real data via Kaggle, with a synthetic sample for credential-free runs and CI

Models compared

LR · RF · GB

Logistic regression, random forest, and gradient boosting — within ~0.003 AUC, overlapping CIs

Scope

Not a production system

A methodology study — and the repo says so explicitly


Statistical Methods

Leak-free cross-validation

Folds are split so that features never encode the outcome they are used to predict.
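A minimal sketch of one common way to keep cross-validation leak-free, assuming the leakage risk is preprocessing fit on held-out rows. The data is a synthetic stand-in from `make_classification`, not IBM Telco, and the scaler-plus-logistic-regression pipeline is an illustrative assumption rather than the repo's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn table (assumption: NOT the IBM Telco data).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.73, 0.27], random_state=0
)

# The scaler sits inside the pipeline, so cross_val_score refits it on each
# training fold only; held-out rows never influence the scaling statistics.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("per-fold AUC:", fold_auc.round(3))
print("mean AUC:", fold_auc.mean().round(3))
```

Fitting the scaler on the full table before splitting would let held-out rows shape the training features, which is the kind of leak this arrangement avoids.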

Bootstrap 95% CIs

Confidence intervals on every headline metric, not bare point estimates.
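As a rough illustration, a percentile bootstrap for AUC could look like the sketch below. The helper name `bootstrap_auc_ci`, the resample count, and the simulated scores are all assumptions for the example; the source does not say which bootstrap variant the project uses.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (hypothetical helper, not from the repo)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample has only one class
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lo, hi


# Simulated labels and scores, only to exercise the helper.
rng = np.random.default_rng(42)
y_demo = rng.integers(0, 2, size=500)
score_demo = 0.8 * y_demo + rng.normal(size=500)

auc, lo, hi = bootstrap_auc_ci(y_demo, score_demo)
print(f"AUC = {auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```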

Paired t-tests

Model-vs-model comparisons tested for significance, not eyeballed.
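One way such a comparison might be set up: score two models on the same folds, then run a paired t-test on the per-fold AUCs. The synthetic data, the choice of logistic regression versus random forest, and the 10-fold split are assumptions for illustration.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption: not the IBM Telco table).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# A fixed random_state yields identical folds for both models, which is
# what makes the per-fold scores paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

auc_lr = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)
auc_rf = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=cv, scoring="roc_auc",
)

# Paired t-test on the fold-by-fold AUC differences.
t_stat, p_value = ttest_rel(auc_lr, auc_rf)
print(f"mean diff = {(auc_lr - auc_rf).mean():.4f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

Fold scores share training rows and so are not fully independent; the p-value is best read as approximate.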

Probability calibration

Predicted probabilities aligned with observed churn frequencies.
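A hedged sketch of what calibrating and checking a model can look like in scikit-learn. Isotonic calibration on a random forest is an assumption chosen for the example; the source does not name the calibration method, and the data here is synthetic.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the IBM Telco table).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Calibrate with internal 5-fold CV so the calibrator is never fit on the
# same rows the underlying forest was trained on.
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic",
    cv=5,
)
calibrated.fit(X_tr, y_tr)
proba = calibrated.predict_proba(X_te)[:, 1]

# Reliability data: observed positive rate vs. mean predicted probability per bin.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
brier = brier_score_loss(y_te, proba)

for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")
print(f"Brier score: {brier:.4f}")
```

A well-calibrated model has the predicted and observed columns tracking each other closely, and a lower Brier score indicates better probability estimates.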

Key Findings

Three model types (logistic regression, random forest, and gradient boosting) benchmarked on IBM Telco with leakage-free cross-validation, bootstrap 95% CIs, paired t-tests, and calibration.

The models land within ~0.003 AUC of each other with overlapping confidence intervals, so selection is a calibration and interpretability call, not an accuracy race. Methodology is the deliverable.

Real data via Kaggle; a committed synthetic sample lets the project and its CI run with no credentials, keeping the line between synthetic and real data explicit.
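A small sketch of how a loader might fall back to the synthetic sample while labeling the source explicitly. The function name `resolve_dataset` and the file paths are hypothetical; the repo's actual layout is not described here.

```python
from pathlib import Path


def resolve_dataset(real_path, synthetic_path):
    """Return (path, source_label), preferring real data when it is present.

    Hypothetical helper: names and paths are illustrative, not from the repo.
    """
    real_path, synthetic_path = Path(real_path), Path(synthetic_path)
    if real_path.is_file():
        return real_path, "real"
    if synthetic_path.is_file():
        return synthetic_path, "synthetic"
    raise FileNotFoundError(
        f"neither {real_path} nor {synthetic_path} exists"
    )


if __name__ == "__main__":
    # Example paths are placeholders for illustration only.
    try:
        path, source = resolve_dataset("data/real.csv", "data/synthetic_sample.csv")
        print(f"using {source} data from {path}")
    except FileNotFoundError as exc:
        print(f"no dataset found: {exc}")
```

Returning the label alongside the path lets every downstream report state which data produced it.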

Python · scikit-learn · Optuna · Streamlit