CHRISTIAN
CALLAHAN
Data / ML Engineer
I build models that hold up to scrutiny. Across my work you'll find leakage-safe labeling, baselines before boosting, cross-validation, probability calibration, and an explicit line between synthetic and real data — every metric is committed and reproducible from a clean clone. I work end-to-end: raw data and SQL/warehouse modeling, through tuned and calibrated models, to Streamlit, FastAPI, and Next.js interfaces.
What I Build
Models that hold up to scrutiny, shipped end-to-end.
Statistically Honest Modeling
Leakage-safe labeling, baselines before boosting, cross-validation, bootstrap confidence intervals, and probability calibration — so the numbers mean what they say.
End-to-End ML Engineering
From raw data and SQL/warehouse modeling, through tuned and calibrated models, to Streamlit, FastAPI, and Next.js interfaces.
Reproducible by Default
An explicit line between synthetic and real data, with every metric committed and reproducible from a clean clone.
Stack
Portfolio
A glimpse of the projects I've been working on
Pit Wall Intelligence
Ingested 4 seasons of lap-level F1 data (85 races, 90k laps, 33 circuits) through a DuckDB + dbt warehouse. Trained an isotonic tyre-degradation model (1.38s within-circuit MAE; 9.4s leave-one-circuit-out median) and a calibrated LightGBM undercut classifier (AUC 0.66 ± 0.05 on 5-fold GroupKFold, Brier 0.084). Validated a Monte Carlo race simulator against 3 famous 2024 strategy calls (Monaco / Hungary / Italy), with MAE averaging 1.65 finishing positions across the three. Shipped a 6-page Streamlit dashboard, a containerized FastAPI inference service (17ms median latency), and a weekly automated retraining workflow with MLflow tracking.
SignalForge
Logistic regression vs. random forest vs. gradient boosting on IBM Telco (7,043 customers), with leak-free cross-validation, bootstrap 95% CIs, paired t-tests, and calibration. The models land within ~0.003 AUC with overlapping CIs, so the writeup treats selection as a calibration/interpretability decision rather than an accuracy contest. Real data comes via Kaggle; a synthetic sample lets the pipeline and CI run without credentials. Not a production system, and the repo says so.
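For readers curious what a bootstrap interval on a model comparison can look like, here is a minimal, dependency-free sketch. It is an illustration under assumptions, not the code in the SignalForge repo: the function names (`auc`, `paired_bootstrap_auc_diff`), the percentile method, and the choice to bootstrap the AUC difference on shared resamples are all picked here for clarity.

```python
import random


def auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(y_true, scores_a, scores_b,
                              n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC(a) - AUC(b).

    Each resample draws row indices once and applies them to both models,
    so the comparison stays paired. All defaults are illustrative.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:  # skip resamples that contain a single class
            continue
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

An interval on the difference that straddles zero is one way to express that two models are not separable on accuracy alone, which is when calibration and interpretability become the deciding factors.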
SaaS Churn Simulator
Time-windowed (observation / gap / check) labeling on RetailRocket (2.76M events, 1.41M visitors); LightGBM (Optuna-tuned, isotonic-calibrated) vs. a logistic baseline, plus a budget-targeting ROI simulator. 5-fold CV ROC-AUC 0.88 ± 0.06; calibration cut the test Brier score from 0.065 to 0.009. The value is the methodology and the honest read of a hard dataset.
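The observation / gap / check idea can be sketched in a few lines of standard-library Python. Everything specific below is an assumption made for illustration, not taken from the project: the window lengths, the names `make_windows` and `label_visitors`, and the rule that a visitor counts as churned when they have no events in the check window.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Windows(NamedTuple):
    obs_start: datetime
    obs_end: datetime      # observation window is [obs_start, obs_end)
    check_start: datetime  # gap window is [obs_end, check_start)
    check_end: datetime    # check window is [check_start, check_end)


def make_windows(cutoff, obs_days=28, gap_days=7, check_days=28):
    """Build the three windows around a cutoff date (lengths are illustrative)."""
    check_start = cutoff + timedelta(days=gap_days)
    return Windows(
        obs_start=cutoff - timedelta(days=obs_days),
        obs_end=cutoff,
        check_start=check_start,
        check_end=check_start + timedelta(days=check_days),
    )


def label_visitors(events, w):
    """events: iterable of (visitor_id, timestamp) pairs.

    Returns {visitor_id: (n_obs_events, churned)} for visitors active in the
    observation window. Features see only observation-window events, the label
    sees only check-window events, and gap events feed neither.
    """
    obs_counts = {}
    seen_in_check = set()
    for visitor_id, ts in events:
        if w.obs_start <= ts < w.obs_end:
            obs_counts[visitor_id] = obs_counts.get(visitor_id, 0) + 1
        elif w.check_start <= ts < w.check_end:
            seen_in_check.add(visitor_id)
    return {
        v: (n, 0 if v in seen_in_check else 1)
        for v, n in obs_counts.items()
    }
```

Keeping the gap between the two windows means activity right after the cutoff cannot leak into either the features or the label, which is the point of windowed labeling.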
Ecommerce Retention & Growth
30-day churn prediction on the WSDM KKBox dataset: calibrated XGBoost (PR-AUC and calibration emphasized under ~9% churn), K-Means LTV segmentation, and a retention-ROI simulator. Ships a synthetic generator so the pipeline runs without the large download.
Ticket Intel
Routing and extractive summarization on Banking77 (77 intents) using TF-IDF + Naive Bayes by design: fast, cheap, interpretable, with the router abstracted so a transformer can drop in later. Benchmarked with reproducible, honest evaluation.
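A rough sketch of what "the router abstracted so a transformer can drop in later" might look like. This is not the Ticket Intel code: `IntentRouter`, `NaiveBayesRouter`, and `route_ticket` are hypothetical names, the intent labels in the usage example are made up, and the classifier uses raw word counts rather than TF-IDF to stay dependency-free.

```python
import math
from collections import Counter, defaultdict
from typing import Protocol


class IntentRouter(Protocol):
    """Anything with predict(text) -> intent label can serve as the router."""

    def predict(self, text: str) -> str: ...


class NaiveBayesRouter:
    """Multinomial Naive Bayes over raw word counts with add-one smoothing."""

    def __init__(self, examples):
        # examples: list of (text, intent) pairs
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        for text, intent in examples:
            self.doc_counts[intent] += 1
            self.word_counts[intent].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}

    def predict(self, text: str) -> str:
        # Words never seen in training carry no evidence, so drop them.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.doc_counts.values())

        def log_posterior(intent):
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[intent] / total_docs)
            return score + sum(math.log((counts[w] + 1) / denom) for w in words)

        return max(self.doc_counts, key=log_posterior)


def route_ticket(router: IntentRouter, text: str) -> str:
    """Calling code depends only on the IntentRouter interface."""
    return router.predict(text)
```

Usage, with invented training examples:

```python
router = NaiveBayesRouter([
    ("my card was lost", "lost_card"),
    ("where is my transfer", "transfer_delay"),
])
route_ticket(router, "lost card")
```

Because `route_ticket` only needs a `predict` method, a transformer-backed class with the same method could replace `NaiveBayesRouter` without changes to the caller.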
Automodeler
Type a ticker, get a fully-linked 3-statement Excel model with native formulas.
Experience
My professional and educational journey
Founder & Principal
CGC Labs
2026 - Present
Healthcare BI for community and rural hospitals
Developed and battle-tested a complete healthcare BI platform at critical access hospitals: executive dashboards built in whatever the hospital already has (Tableau, Python/Streamlit, or Excel), plus SQL data operations on top of existing EMR sources. Now available to community and rural systems as a managed service at $7,500/mo plus a one-time setup fee.
Key Impact
monthly engagement model plus a one-time setup fee for community and rural hospital systems
executive dashboards in Tableau, Python/Streamlit, or advanced Excel — built on top of existing EMR sources, no new licenses required
a proven healthcare BI system, developed at critical access hospitals, now offered as a turnkey managed service
healthcare BI for community and rural hospitals — the segment large firms ignore
Business Intelligence Analyst
Community Hospital (Critical Access)
2022 - 2026
McCook, NE
Owned the BI function at a critical access hospital. Replaced a failed $150K vendor solution with custom Tableau and SQL infrastructure. Led a full EMR migration to Paragon. Built the reporting that survived a CMS audit.
Key Impact
first-ever 75th percentile HCAHPS ranking for the facility
custom Tableau/SQL system delivering $10K/yr in ongoing savings
multi-system EMR migration to Paragon with zero record loss
quality, risk, and operational analytics embedded in clinical workflows
Education
Dual MBA & M.S. Data Science
Eastern University
Expected 2027
Bachelor of Applied Science
Peru State College
2022
Recent Activity
Continuous learning and shipping.
Current Focus
F1 Strategy Analytics
Pit Wall Intelligence — tyre degradation, undercut probability, and Monte Carlo race simulation over a DuckDB + dbt warehouse and calibrated ML. My most advanced project, end-to-end: warehouse, models, a 6-page dashboard, and a Dockerized FastAPI service.
Churn & Retention Modeling
Churn and retention modeling with statistical rigor — leakage-safe labeling, calibration, and bootstrap confidence intervals across SignalForge and the SaaS Churn Simulator.