CHRISTIAN CALLAHAN

Data / ML Engineer

I build models that hold up to scrutiny: leakage-safe labeling, real baselines, calibration, and a clear line between synthetic and real data. Every metric is committed and reproducible from a clean clone. I work end to end, from SQL and warehouse modeling to the Streamlit, FastAPI, or Next.js interface on top.


What I Build

Models that hold up to scrutiny, shipped end-to-end.

Statistically Honest

Leakage-safe labeling, baselines before boosting, cross-validation, bootstrap CIs, and calibration. The numbers mean what they say.

End to End

Raw data and SQL, through tuned and calibrated models, to a Streamlit, FastAPI, or Next.js interface.

Reproducible by Default

Synthetic and real data kept separate; every metric committed and reproducible from a clean clone.

Stack

Python, SQL, TypeScript, scikit-learn, XGBoost, LightGBM, Optuna, pandas, DuckDB, dbt, FastAPI, Streamlit, Next.js, PostgreSQL, Prisma, pytest, GitHub Actions, Docker

Portfolio

A few things I've shipped

Featured Project

Pit Wall Intelligence

Ingested 4 seasons of lap-level F1 data (85 races, 90k laps, 33 circuits) into a DuckDB + dbt warehouse. Trained an isotonic tyre-degradation model (1.38s within-circuit MAE; 9.4s leave-one-circuit-out median) and a calibrated LightGBM undercut classifier (AUC 0.66 ± 0.05 on 5-fold GroupKFold, Brier 0.084). Validated a Monte Carlo race simulator against three well-known 2024 strategy calls (Monaco, Hungary, Italy), with a mean absolute error of 1.65 finishing positions. Shipped a 6-page Streamlit dashboard, a containerized FastAPI inference service (17ms median latency), and a weekly automated retraining workflow with MLflow tracking.

DuckDB, dbt, LightGBM, FastAPI, MLflow

SignalForge

Logistic regression vs. random forest vs. gradient boosting on IBM Telco (7,043 customers), with leak-free cross-validation, bootstrap 95% CIs, paired t-tests, and calibration. The models land within ~0.003 AUC with overlapping CIs, so the writeup treats selection as a calibration/interpretability decision rather than an accuracy contest.

Python, scikit-learn, Optuna, Streamlit

SaaS Churn Simulator

Time-windowed (observation / gap / check) labeling on RetailRocket (2.76M events, 1.41M visitors); LightGBM (Optuna-tuned, isotonic-calibrated) vs. a logistic baseline, plus a budget-targeting ROI simulator. 5-fold CV ROC-AUC 0.88 ± 0.06; calibration cut the test Brier score from 0.065 to 0.009.

Python, LightGBM, Optuna, MLflow, Docker

Ecommerce Retention & Growth

30-day churn prediction on the WSDM KKBox dataset: calibrated XGBoost evaluated primarily on PR-AUC and calibration, given the ~9% churn rate; K-Means LTV segmentation; and a retention-ROI simulator. Ships a synthetic generator so the pipeline runs without the large download.

Python, XGBoost, scikit-learn, pandas

Ticket Intel

Routing and extractive summarization on Banking77 (77 intents) using TF-IDF + Naive Bayes by design: fast, cheap, interpretable, with the router abstracted so a transformer can drop in later.

Python, scikit-learn, FastAPI, Streamlit

Automodeler

Type a ticker, get a fully linked three-statement Excel model with native formulas.

FastAPI, Python, FMP API

Experience

My professional and educational journey

Current Role

Founder & Principal

CGC Labs

2026 - Present

Healthcare BI for community and rural hospitals

Outsourced BI for rural and critical access hospitals. Executive dashboards on whatever the hospital already runs (Tableau, Python/Streamlit, Excel), plus SQL operations on the existing EMR. The system I built in-house at a critical access hospital, now offered as a managed service.

Tableau, SQL, Python, Streamlit, Excel, EMR, Healthcare BI

Key Impact

Tool-agnostic delivery

executive dashboards in Tableau, Python/Streamlit, or advanced Excel, built on existing EMR sources with no new licenses

Productized BI platform

a proven healthcare BI system, developed at critical access hospitals, now offered as a turnkey managed service

Niche specialization

healthcare BI for community and rural hospitals, the segment large firms ignore

Previous Role

Business Intelligence Analyst

Community Hospital (Critical Access)

2022 - 2026

McCook, NE

Owned the BI function at a critical access hospital. Replaced a failed $150K vendor solution with custom Tableau and SQL infrastructure. Did the BI-side data transformation on the Veradigm-to-Paragon EMR migration, alongside Altera. Built the reporting that survived a CMS audit.

Python, SQL, Tableau, Paragon EMR, ETL

Key Impact

22% NPS lift

first-ever 75th-percentile HCAHPS ranking for the facility

$150K vendor replaced

custom Tableau/SQL system delivering $10K/yr in ongoing savings

Paragon EMR migration

BI-side data transformation on the Veradigm-to-Paragon cutover, alongside Altera; reporting kept intact

5+ production models

quality, risk, and operational analytics embedded in clinical workflows

Previous: Manufacturing & Operations (2018 - 2022)

Parker Hannifin, Red Willow Co. Sheriff Dept.

Education

Dual MBA & M.S. Data Science

Eastern University

Expected 2027

Bachelor of Applied Science

Peru State College

2022

Recent Activity

Continuous learning and shipping.


Current Focus

F1 Strategy Analytics

Pit Wall Intelligence: tyre degradation, undercut probability, and Monte Carlo race simulation on a DuckDB + dbt warehouse with calibrated ML models. My most advanced project, end to end: warehouse, models, a 6-page dashboard, and a Dockerized FastAPI service.

Churn & Retention Modeling

Leakage-safe labeling, calibration, and bootstrap confidence intervals across SignalForge and the SaaS Churn Simulator.

Connect

Open to data and ML engineering roles. Email is fastest; the code is on GitHub and the history is on LinkedIn.

christian.g.callahan@gmail.com
