SURC 2025 Student Presentations
SUNY Undergraduate Research Conference Student Presentations

From Data to Delivery: Predicting Pregnancy Risk with a Boosted Decision Tree

Authors: McKenzie Skrastins, Leemor Waldman, Vladislav Kargin

SUNY Campus: Binghamton University

Presentation Type: Poster

Location: Old Union Hall

Presentation #: 41

Timeslot: Session A 9:00-10:00 AM

Abstract: Maternal health is a critical global priority, with an estimated 213 million pregnancies annually. High-risk pregnancies, defined as those with increased odds of complications, contribute significantly to maternal and neonatal mortality, particularly in low-income regions. In the United States alone, 65,000 high-risk pregnancies occur annually, with 80% of related deaths deemed preventable. This study aimed to create a statistical learning model that could classify the risk level of pregnancies occurring in Bangladesh. The data used contained 1,014 observations collected from maternity clinics in Bangladesh, encompassing key physiological features such as age, blood pressure, blood sugar, body temperature, and heart rate, along with a recorded risk level for each observation. After pre-processing the dataset, a benchmark logistic regression model was created. The logistic regression achieved 65% accuracy, indicating relationships between features and risk levels. Due to the high-bias, low-variance nature of the dataset, a decision tree, specifically a boosted decision tree, was used to make the final predictions. A baseline decision tree model with optimized hyperparameters reached an accuracy of 82.3%, and a boosted decision tree with tuned hyperparameters yielded a 10-fold cross-validated accuracy of 85.81%. Furthermore, blood sugar level was found to be the most important predictor in classifying pregnancies. These results highlight the usefulness of statistical learning in identifying high-risk pregnancies and emphasize the need for enhanced prenatal care, particularly in socioeconomically disadvantaged regions.
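
The abstract does not say which boosting algorithm or software was used, so the following is only an illustrative sketch of the general idea: boosting shallow decision trees and scoring them with k-fold cross-validation. It assumes AdaBoost over one-split trees (decision stumps), binary labels encoded as +1 (high risk) and -1 (not high risk), and a small synthetic dataset with two made-up columns standing in for blood sugar and blood pressure. The function names, the thresholds in the synthetic labels, and the data are all hypothetical and are not taken from the study.

```python
import math
import random


def stump_predict(stump, row):
    """Predict +1/-1 from a one-split tree: (feature index, threshold, polarity)."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, weights):
    """Return (weighted_error, stump) for the best single-feature split."""
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if best_err is None or err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def fit_adaboost(X, y, rounds=5):
    """Fit an AdaBoost ensemble of stumps; returns a list of (alpha, stump)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, stump = fit_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight the rows this stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def cross_val_accuracy(X, y, k=10, rounds=5, seed=0):
    """Mean held-out accuracy over k shuffled folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_adaboost([X[i] for i in train], [y[i] for i in train], rounds)
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Synthetic stand-in data (NOT the Bangladesh dataset): column 0 plays the role
# of blood sugar, column 1 of blood pressure; the cutoffs below are arbitrary.
X = [[6.0 + i / 20, 100 + (37 * i) % 60] for i in range(120)]
y = [1 if sugar > 9.0 or bp > 150 else -1 for sugar, bp in X]

acc = cross_val_accuracy(X, y, k=10, rounds=5)
print(f"10-fold cross-validated accuracy on synthetic data: {acc:.3f}")
```

The printed figure describes only the synthetic data and is not comparable to the 85.81% reported in the abstract. A study with more than two risk levels would need a multiclass variant, and in practice an established library implementation would be preferable to this hand-rolled version.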