Practical Machine Learning in the Clinical Laboratory
Welcome to Practical Machine Learning in the Clinical Laboratory. This site is a supplement to the article “Validating, Implementing, and Monitoring Machine Learning Solutions in the Clinical Laboratory Safely and Effectively”. We hope to provide a more detailed, technical companion to the concepts and principles discussed in the main article.
The site will guide us through some of the practical components of applying machine learning to clinical laboratory tasks using a real-world example, the detection of basic metabolic panel (BMP) results that have been contaminated by 0.9% normal saline (NS). The data and models used in this example are publicly available on FigShare (see Getting Started), and the code will be written in R.
The Motivating Example
Erroneous laboratory results contribute to a cascade of downstream consequences that negatively impact patient care1, including delays in diagnosis, incorrect treatments, and increased healthcare costs2–4. The majority of these errors stem from improper collection or transport and occur before a specimen reaches the laboratory5,6. While substantial progress has been made in reducing the burden of mislabeled specimens, improperly ordered tests, and other preanalytical errors7–11, contamination by IV fluids remains an unsolved problem12,13. Recognizing this unmet need, the IFCC Working Group on Laboratory Error and Patient Safety added a new quality indicator, “Contamination by a non-microbiological source (Pre-Cont)”, to its 2019 report14.
IV Fluid Contamination
IV fluid contamination occurs when a sample is collected from a catheter through which a solution is being infused, or is drawn proximal to the catheter’s insertion site. This causes the measured concentrations of all tested analytes to diverge from their true values, in a manner that depends on the composition of the contaminating fluid (Figure 2). Current protocols for detecting contaminated specimens vary across institutions and may rely on delta checks, feasibility flags, or manual technologist review. These methods are often time-consuming and may be prone to error15. The multivariate nature of this problem lends itself well to a machine learning solution.
The Machine Learning Solution
Approximately 2,500,000 BMP results collected from inpatients at a single institution were extracted from the laboratory information system. Contamination by 0.9% normal saline was simulated16 at varying mixture ratios in a randomly selected subset of results (Figure 3). An XGBoost17 model was tuned using cross-validation, then trained to predict the binary class label of simulated contamination vs. physiologic result.
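To make the simulation step concrete, the sketch below mixes a physiologic BMP result with 0.9% normal saline at a chosen mixture ratio. It assumes simple linear mixing by volume, which may differ from the simulation method actually used for this site's data. The `simulate_contamination()` helper, the column names, and the example values are hypothetical and chosen for illustration only. The fluid composition reflects the 154 mmol/L of sodium and chloride in 0.9% normal saline, with every other BMP analyte absent from the fluid.

```r
# Composition of 0.9% normal saline for each BMP analyte.
# Column names are hypothetical; adapt them to your own data.
ns_composition <- c(
  sodium = 154, chloride = 154, potassium = 0, co2 = 0,
  bun = 0, creatinine = 0, calcium = 0, glucose = 0
)

# Simulate contamination under an ASSUMED linear mixing model:
#   contaminated = (1 - ratio) * original + ratio * fluid
# `mixture_ratio` is the fraction of the final specimen that is fluid and
# may be a single value or one value per row of `results`.
simulate_contamination <- function(results, fluid, mixture_ratio) {
  stopifnot(
    all(names(fluid) %in% names(results)),
    all(mixture_ratio >= 0 & mixture_ratio <= 1)
  )
  contaminated <- results
  for (analyte in names(fluid)) {
    contaminated[[analyte]] <-
      (1 - mixture_ratio) * results[[analyte]] + mixture_ratio * fluid[[analyte]]
  }
  contaminated$mixture_ratio <- mixture_ratio
  contaminated$contaminated <- mixture_ratio > 0  # binary class label
  contaminated
}

# One made-up physiologic result, contaminated with 25% normal saline.
bmp <- data.frame(
  sodium = 140, chloride = 104, potassium = 4.0, co2 = 24,
  bun = 14, creatinine = 1.0, calcium = 9.4, glucose = 100
)
sim <- simulate_contamination(bmp, ns_composition, mixture_ratio = 0.25)
print(sim)
```

Under this assumption, sodium and chloride are pulled toward 154 mmol/L while every other analyte is diluted toward zero. That pattern spans several analytes at once, which is why a multivariate model is a natural fit.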
Two models will be described in this example:
A real-time model that uses the patients’ current and most recent prior results to predict contamination at the time the specimen is drawn.
A retrospective model which also incorporates patients’ subsequent results to assess for the anomaly-with-resolution pattern.
The real-time model would be intended for live clinical use, while the retrospective model would be intended as a quality assurance tool and mechanism by which ground truth labels could be applied in an automated, scalable fashion.
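The difference between the two models' inputs can be sketched as a feature-engineering step. The example below is a minimal base R illustration, not the feature pipeline used for the published models. The `add_prior_and_post()` helper, the `patient_id` and `drawn_at` columns, and the example values are all hypothetical. For each result it computes the change from the patient's most recent prior result (available to both models) and the change to the patient's next result (available only retrospectively).

```r
# Add per-patient deltas to the prior and subsequent result for each analyte.
# Assumes one row per BMP, with hypothetical `patient_id` and `drawn_at` columns.
add_prior_and_post <- function(results, analytes) {
  # Sort so that "prior" and "post" follow collection time within each patient.
  results <- results[order(results$patient_id, results$drawn_at), ]
  for (a in analytes) {
    prior <- ave(results[[a]], results$patient_id,
                 FUN = function(x) c(NA, head(x, -1)))
    post <- ave(results[[a]], results$patient_id,
                FUN = function(x) c(tail(x, -1), NA))
    # Real-time feature: change from the most recent prior result.
    results[[paste0(a, "_delta_prior")]] <- results[[a]] - prior
    # Retrospective-only feature: change to the next result.
    results[[paste0(a, "_delta_post")]] <- post - results[[a]]
  }
  results
}

# Made-up example: patient A's chloride jumps, then returns toward baseline.
example_results <- data.frame(
  patient_id = c("A", "A", "A", "B"),
  drawn_at = as.POSIXct(
    c("2024-01-01 06:00", "2024-01-01 12:00",
      "2024-01-01 18:00", "2024-01-01 07:00"),
    tz = "UTC"
  ),
  chloride = c(104, 118, 105, 101)
)
features <- add_prior_and_post(example_results, analytes = "chloride")
print(features)
```

In this made-up series, patient A's second result shows a large positive delta from the prior result followed by a large negative delta to the next one, which is the anomaly-with-resolution pattern the retrospective model is meant to capture. The first and last results for each patient have missing deltas, so any real pipeline would need a strategy for those rows.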