My current interest leans toward AI research applied to the biomedical and healthcare domains. While scanning through papers, I spotted this one and decided to take a look. It turned out to be intriguing and well worth the time to read.

It briefly covers ML models trained to predict whether parturients will require a blood transfusion while undergoing cesarean section. It was published in 2024 in Scientific Reports, a Nature Portfolio journal.

Introduction

Blood loss during cesarean section can be large, and in those cases transfusion is not optional but mandatory.

Preparing an adequate amount of blood in advance saves time intraoperatively. Blood is also a scarce medical resource, so the right amount has to be prepared before surgery to avoid waste. Accurately predicting the need for blood is therefore always in demand.

The paper lays out its data preparation and model comparison metrics, and gives insight into which data to leverage when training ML models.

The paper focuses on red blood cells (RBC) and excludes other blood products. Its primary aim is to select the ML model with the best performance in predicting the need for an intraoperative RBC transfusion during a cesarean section (CS).

Methods

The data can be divided into two parts: patient demographic data and perioperative data.

Data
- Data size
  - total: 16,137
  - used: 14,254 after excluding incomplete records
  - RBC transfusion during surgery: 1,020 patients (7.16% of the 14,254 records used)
  - data split: 6:2:2 for training, validation and test
- values used: the most recent ones recorded within two days prior to surgery
  - demographic data
    - age, weight, height etc.
    - placenta previa totalis/partialis/marginalis
  - perioperative data
    - anesthesia, midazolam use, RBC transfusion
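
To make the 6:2:2 split concrete, here is a minimal sketch in plain Python. It assumes a stratified split (each class is split separately so the 7.16% positive rate is preserved in every part), which is my assumption and not something I can confirm from the paper. The function name `stratified_split` and the synthetic label list are hypothetical; only the counts 1,020 and 14,254 come from the paper.

```python
import random

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve the class ratio.

    Assumption: stratification by label; the paper's exact procedure may differ.
    """
    rng = random.Random(seed)
    splits = ([], [], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * fractions[0])
        n_val = round(len(idx) * fractions[1])
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])  # remainder goes to test
    return splits

# Synthetic labels with the cohort's class counts: 1,020 positives out of 14,254.
labels = [1] * 1020 + [0] * (14254 - 1020)
train, val, test = stratified_split(labels)

for name, part in zip(("train", "val", "test"), (train, val, test)):
    positives = sum(labels[i] for i in part)
    print(name, len(part), positives, round(positives / len(part), 4))
```

With these counts, the training part holds 612 transfusion cases, and the validation and test parts hold 204 each.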

The class ratio used for classification was particularly interesting to see. It is uncommon to deliberately train on an imbalanced ratio of positive to negative labels, yet this paper compared ratios of 1:1, 1:2, ..., up to 1:4.
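
One way such ratios could be built is by randomly undersampling the negative class. The sketch below assumes exactly that; I am not certain this is how the authors constructed their datasets. The function `undersample` and the toy counts (100 positives, 900 negatives) are hypothetical and chosen only for illustration.

```python
import random

def undersample(labels, neg_per_pos, seed=0):
    """Keep every positive and a random subset of negatives at a 1:neg_per_pos ratio.

    Assumption: random undersampling of negatives; other balancing methods exist.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), neg_per_pos * len(pos))
    kept = pos + rng.sample(neg, n_neg)
    rng.shuffle(kept)  # avoid a block of positives followed by negatives
    return kept

# Toy training labels: 100 positives and 900 negatives (illustrative only).
train_labels = [1] * 100 + [0] * 900

for k in (1, 2, 4):
    kept = undersample(train_labels, k)
    n_pos = sum(train_labels[i] for i in kept)
    print(f"1:{k} -> rows={len(kept)}, positives={n_pos}, negatives={len(kept) - n_pos}")
```

Note that only the training set would be resampled this way. The validation and test sets should keep the original class ratio so the reported metrics reflect the real prevalence.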

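ML models
- XGBoost, k-nearest neighbors (KNN), decision tree (DT), support vector machine (SVM), multilayer perceptron (MLP), logistic regression (LR), random forest (RF), deep neural network (DNN)

Model assessment
- AUROC
- AUPRC
- metrics
  - accuracy, recall, precision, F1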
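
For readers less familiar with these metrics, here is a small self-contained sketch of how each one is computed. The labels and scores are made up, the function names are hypothetical, and the AUPRC is estimated as average precision, which is one common choice and may not match the paper's exact calculation.

```python
def threshold_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1 from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "recall": recall,
            "precision": precision, "f1": f1}

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of the precision values at each positive, ranked by descending score.

    A step-wise AUPRC estimate; tied scores are not treated specially here.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            tp += 1
            total += tp / rank
    return total / sum(y_true)

# Made-up example: 3 positives among 10 cases, classified at a 0.5 threshold.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = threshold_metrics(y_true, y_pred)
print(metrics)
print("AUROC:", round(auroc(y_true, scores), 3))
print("AUPRC (average precision):", round(average_precision(y_true, scores), 3))
```

AUROC and AUPRC are computed from the raw scores, while accuracy, recall, precision and F1 depend on the chosen threshold. In this toy case two of the three positives fall below 0.5, so recall is low even though the ranking is fairly good.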

Results

According to the paper, XGBoost performed best at predicting blood transfusion, with an AUROC of 0.82 and an accuracy of 0.94.

This figure compares the ROC and PRC curves of each model.

However, when I look at the PRC curves, the recall values are so low that the curves take odd shapes. The results table also indicates that no F1 score is greater than 0.5.
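
A quick piece of arithmetic shows why high accuracy says little here. The sketch below uses only the cohort counts from the Methods section (1,020 transfusions among 14,254 patients) and assumes the test set has roughly the same prevalence, which is my assumption. The variable names are hypothetical.

```python
# Cohort counts reported in the paper.
n_total = 14254
n_pos = 1020
n_neg = n_total - n_pos

# A trivial "model" that always predicts "no transfusion" is right on every negative.
baseline_accuracy = n_neg / n_total

# Prevalence is also the expected precision of random guessing,
# which makes it the natural reference level for a PRC curve.
prevalence = n_pos / n_total

print(f"always-negative accuracy: {baseline_accuracy:.4f}")
print(f"prevalence:               {prevalence:.4f}")
```

An always-negative predictor already reaches about 0.93 accuracy with zero recall, so an accuracy of 0.94 is only a small gain over doing nothing. This is why recall, F1 and the PRC curve are the more informative views for this dataset.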

The figure below shows the ROC and PRC curves for every model at each dataset ratio.

Discussion

The 1:1 ratio dataset did not improve the performance of the model.
- models trained on the imbalanced dataset performed better

Traditional modeling can lead to degraded performance, whereas ML aims for broader generalization and is adequate in this case.

Limits
- single-center study
  - a more heterogeneous dataset could help generalize the blood transfusion metrics, which could enhance model performance in general
- needs more data balancing techniques for model training
  - selecting only among the 1:1, 1:2, ..., 1:4 dataset ratios has its limits. Fixed ratios like these could confuse the model even when the dataset is well refined and well polished.

Reflection

The research gave me a wide range of insights into understanding medical data and applying models to it. However, there were several moments when I felt the paper could have been better in quality and could have produced models usable in actual clinical practice.

I would like to list a few things that I think would improve the performance of the models and address some of the limitations the paper mentioned.

The recall score is extremely low and only precision is high
- I think this comes from the data balance. The balance could be adjusted, or the data itself could be normalized or preprocessed in advance. Raw data is risky because some features can carry disproportionate weight in the model, and this affects performance. Normalizing value ranges or trimming unnecessary data could help too.
- The results indicate that the XGBoost model performs best, but the recall score suggests the model is of little practical use, even though its precision and accuracy are high. The model is heavily biased toward predicting the negative class, since the best model was trained with a positive-to-negative ratio other than 1:1. A low recall means the true positive rate is extremely low: when parturients actually need a blood transfusion, the model may say they will not, which increases the clinical risk.

The data split could be 8:1:1 to devote more data to training
- The dataset was small. With a 50:50 ratio, only one to two thousand rows can be used for model training. Trying to generalize prediction performance from such scarce data could produce a highly biased result. When the dataset is small, more than 60% of the rows should go to training.
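
The "one to two thousand rows" figure can be checked with simple arithmetic. This sketch assumes the 1,020 transfusion cases are split in the same proportion as the whole dataset and that a 1:1 dataset keeps one negative per positive. The helper `balanced_training_rows` is hypothetical.

```python
def balanced_training_rows(n_pos, train_frac):
    """Positives in the training split, and total rows once balanced 1:1."""
    pos_train = round(n_pos * train_frac)
    return pos_train, 2 * pos_train  # one negative kept per positive

n_pos = 1020  # transfusion cases reported in the paper

for name, train_frac in (("6:2:2", 0.6), ("8:1:1", 0.8)):
    pos_train, rows = balanced_training_rows(n_pos, train_frac)
    print(f"{name}: {pos_train} positives in training, {rows} rows at 1:1")
```

Under these assumptions a 6:2:2 split leaves 612 positives (1,224 balanced rows) for training, while 8:1:1 leaves 816 (1,632 balanced rows). The trade-off is that the validation and test sets would each shrink to about 102 positives, making their metrics noisier.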

Reference

Lee, S. W., Park, B., Seo, J., Lee, S., & Sim, J. H. (2024). Development of a machine learning approach for prediction of red blood cell transfusion in patients undergoing Cesarean section at a single institution. Scientific Reports, 14, 16628. https://doi.org/10.1038/s41598-024-67784-2
