
Explainable and sparse predictive models with applications in reproductive health and oncology

This dissertation develops explainable and sparse predictive models for two main healthcare domains: reproductive health and oncology. Using advanced machine learning techniques and survival analysis, we aim to enhance predictive accuracy and provide actionable insights in these critical areas. The thesis is structured into four distinct problems, each focusing on a particular research question.

The first problem concerns the prediction of the probability of conception among couples actively trying to conceive. Using self-reported health data from a North American preconception cohort study, we analyzed factors such as sociodemographics, lifestyle, medical history, diet quality, and specific male partner characteristics. Machine learning algorithms were employed to predict the probability of conception, demonstrating improved discrimination and potential clinical utility.

The second problem explores the application of machine learning algorithms to electronic health record (EHR) data for identifying predictor variables associated with polycystic ovarian syndrome (PCOS) diagnosis. Employing gradient boosted trees and feed-forward multilayer perceptron classifiers, we developed a scoring system that improved the model's performance, providing a valuable tool for early detection and intervention.

The third problem focuses on predicting the risk of miscarriage among female participants who conceived during the study period. Utilizing both static and survival analysis, including Cox proportional hazard models, we developed predictive models to assess miscarriage risk. The study revealed that most miscarriages were due to random genetic errors during early pregnancy, indicating that miscarriage is not easily predicted based on preconception sociodemographic and lifestyle characteristics.

Finally, the fourth problem focuses on the development of predictive models for managing Chronic Myeloid Leukemia (CML) patients. We developed models to predict whether patients will achieve deep molecular response (DMR) at later treatment stages and maintain this status up to 60 months after treatment initiation. These models offer insights into treatment effectiveness and patient management, aiming to support clinical decision-making and improve long-term patient outcomes.


By emphasizing the explainability of these models, this dissertation not only aims to provide accurate predictions but also to ensure that the results are interpretable and actionable for healthcare professionals. Overall, this thesis showcases the potential of predictive modeling to improve reproductive health and oncology-related outcomes. The development and validation of various models in these contexts underscore the value of machine learning algorithms in healthcare research, analysis of epidemiologic data, and prediction of critical health events. The findings have significant implications for enhancing patient care, informing clinical practices, and guiding healthcare policy decisions.

Identifier: oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/49309
Date: 20 September 2024
Creators: Zad, Zahra
Contributors: Paschalidis, Ioannis Ch.
Source Sets: Boston University
Language: en_US
Detected Language: English
Type: Thesis/Dissertation
