AABB2025: Machine Learning May Help Spot Contaminated Blood Counts

October 28, 2025

Sometimes a patient’s lab results just don’t add up: a hemoglobin value drops unexpectedly, then rebounds without intervention. These anomalies can happen when intravenous (IV) fluids dilute blood samples collected directly from IV catheters, skewing complete blood counts (CBCs) and potentially leading to unnecessary transfusions.

During Monday afternoon’s hematology and coagulation oral abstract session, Carly Maucione, MD, from Washington University in St. Louis, described how researchers from Washington University and the University of Utah developed and evaluated machine learning models to retrospectively detect CBC contamination. For each site, 1,000 CBC “trios” were randomly selected for expert review; each trio paired a current test with prior and subsequent (post) samples collected within 48 hours. Reviewers judged whether the current CBC appeared contaminated, based on a transient drop in analyte values followed by recovery.
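To make the trio idea concrete, here is a minimal sketch of how consecutive CBC draws for one patient might be grouped into prior/current/post trios. It assumes that "within 48 hours" means each neighboring draw falls within 48 hours of the current draw; the study's exact windowing rule was not described, and `build_trios` is a hypothetical helper, not the researchers' code.

```python
from datetime import datetime, timedelta

# Assumption: prior and post draws must each lie within 48 h of the current draw.
WINDOW = timedelta(hours=48)


def build_trios(draws):
    """Group one patient's CBC draws into (prior, current, post) trios.

    `draws` is a list of (timestamp, values) tuples. Consecutive draws form a
    trio only when the prior draw is at most 48 h before the current draw and
    the post draw is at most 48 h after it. Hypothetical helper for illustration.
    """
    # Sort by timestamp only, so the value dicts are never compared.
    draws = sorted(draws, key=lambda d: d[0])
    trios = []
    for i in range(1, len(draws) - 1):
        prior, current, post = draws[i - 1], draws[i], draws[i + 1]
        if current[0] - prior[0] <= WINDOW and post[0] - current[0] <= WINDOW:
            trios.append((prior, current, post))
    return trios


# Made-up example: the last draw is too far from its predecessor to close a trio.
t0 = datetime(2025, 1, 1, 8, 0)
draws = [
    (t0, {"hgb": 10.2}),
    (t0 + timedelta(hours=12), {"hgb": 7.9}),
    (t0 + timedelta(hours=24), {"hgb": 10.0}),
    (t0 + timedelta(hours=100), {"hgb": 9.8}),
]
trios = build_trios(draws)
```

With these illustrative values only one trio qualifies, centered on the 7.9 g/dL draw; the 76-hour gap before the final draw excludes the second candidate.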

A subset of the cases from the expert-reviewed datasets, along with simulated contamination cases, was used to train supervised LightGBM (gradient-boosted decision tree) models. These models learned to recognize characteristic “anomaly resolution patterns,” the drop-and-rebound behavior of hemoglobin, platelets and white blood cells that signals contamination. Reviewers identified contamination in 5.3% of Utah cases and 3.7% of Washington University cases, while the model predicted contamination in about 2% of cases.
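The drop-and-rebound idea can be illustrated with a short sketch. It assumes that simulated contamination means scaling every analyte by the fraction of sample that is not IV fluid, and it computes per-analyte "drop" and "rebound" features for a trio. The final flag is a hand-written threshold rule standing in for the trained LightGBM model, which this sketch does not reproduce; all function names and thresholds are hypothetical.

```python
# Analytes named in the study; keys and units here are illustrative assumptions.
ANALYTES = ("hgb", "plt", "wbc")


def simulate_dilution(values, fraction):
    """Mimic IV-fluid contamination by scaling every analyte by (1 - fraction).

    Assumes the infused fluid contains none of the measured analytes.
    """
    return {k: v * (1.0 - fraction) for k, v in values.items()}


def resolution_features(prior, current, post):
    """Per-analyte fractional drop (prior -> current) and rebound (current -> post).

    Both are scaled by the prior value so analytes with different units are comparable.
    """
    feats = {}
    for a in ANALYTES:
        feats[f"{a}_drop"] = (prior[a] - current[a]) / prior[a]
        feats[f"{a}_rebound"] = (post[a] - current[a]) / prior[a]
    return feats


def looks_contaminated(feats, min_drop=0.10, min_recovery=0.75):
    """Toy stand-in for a trained classifier (NOT the study's LightGBM model).

    Flags a trio only if every analyte drops by at least `min_drop` and then
    recovers at least `min_recovery` of that drop. Thresholds are arbitrary.
    """
    for a in ANALYTES:
        drop = feats[f"{a}_drop"]
        if drop < min_drop or feats[f"{a}_rebound"] < min_recovery * drop:
            return False
    return True


# Made-up example: a 25% dilution that resolves, versus a drop that persists.
prior = {"hgb": 10.0, "plt": 200.0, "wbc": 8.0}
current = simulate_dilution(prior, 0.25)
rebound_post = {"hgb": 9.8, "plt": 195.0, "wbc": 7.9}
persistent_post = {"hgb": 7.6, "plt": 152.0, "wbc": 6.1}

contaminated = looks_contaminated(resolution_features(prior, current, rebound_post))
persistent = looks_contaminated(resolution_features(prior, current, persistent_post))
```

Requiring all three analytes to fall together reflects the intuition that dilution lowers every cell line at once, whereas a persistent drop (as in `persistent_post`) looks more like bleeding than contamination. A learned model can weigh these features more flexibly than fixed thresholds.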

To assess the impact on transfusions, researchers examined a subgroup of 1,000 cases in which a red blood cell transfusion occurred between the current CBC and the post CBC. Contamination rates in this subgroup were higher (11.8% at Utah and 12.5% at Washington University), and reviewers judged 8.1% and 7.6% of transfusions, respectively, as potentially unnecessary.

Maucione also acknowledged limitations, including the challenge of distinguishing contamination from other clinical events such as bleeding or massive transfusion protocols, and the fact that the model is not currently sensitive enough to detect all contamination events. While the goal is to develop real-time models for contamination detection, Maucione said that the current model can be used for quality assurance, targeted collector education and post-implementation monitoring.