Global ETD Search

1	Deep Learning Methods Cannot Outperform Other Machine Learning Methods on Analyzing Genome-wide Association Studies Zhou, Shaoze 31 August 2022 (has links) Deep Learning (DL) has been broadly applied to big data problems in biomedical fields, most successfully in image processing. Recently, many DL methods have been applied to genomic studies. However, genomic data usually has too small a sample size to fit a complex network. It also lacks the common structural patterns that allow image data to use pre-trained networks or take advantage of convolution layers. Concern about the overuse of DL methods motivates us to evaluate their performance against popular non-deep Machine Learning (ML) methods for analyzing genomic data across a wide range of sample sizes. In this paper, we conduct a benchmark study using the UK Biobank data and many random subsets of it with different sample sizes. The original UK Biobank data has about 500k participants. Each participant has comprehensive patient characteristics, disease histories, and genomic information, i.e., the genotypes of millions of Single-Nucleotide Polymorphisms (SNPs). We are interested in predicting the risk of three lung diseases: asthma, COPD, and lung cancer. A total of 205,238 participants have recorded disease outcomes for these three diseases. Five prediction models are investigated in this benchmark study: three non-deep machine learning methods (Elastic Net, XGBoost, and SVM) and two deep learning methods (DNN and LSTM). Besides the most popular performance metrics, such as the F1-score, we promote the hit curve, a visual tool to describe the performance of predicting rare events. We discovered that DL methods frequently fail to outperform non-deep ML in analyzing genomic data, even in large datasets with over 200k samples. The experimental results suggest that DL methods should not be overused in genomic studies, even with biobank-level sample sizes. 
The performance differences between DL and non-deep ML decrease as the sample size increases. This suggests that when the sample size is already large, further increases bring larger performance gains for DL methods. Hence, DL methods could perform better on genomic data larger than that of this study. / Graduate deep learning machine learning genomic analysis disease prediction imbalance data hit curve
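The abstract above names the hit curve without defining it. The following is a minimal sketch, assuming the usual reading: rank cases by predicted risk and count how many true events fall among the top k cases. The function name `hit_curve` and the toy data are hypothetical and not taken from the thesis.

```python
def hit_curve(y_true, scores):
    """Cumulative number of true events among the top-k scored cases, k = 1..n.

    Assumption: y_true holds 0/1 labels and a higher score means higher risk.
    """
    # Rank cases from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, curve = 0, []
    for i in order:
        hits += y_true[i]      # count a hit when the ranked case is a true event
        curve.append(hits)
    return curve

# Toy example: 2 rare events among 8 cases (illustrative values only).
y_true = [0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.10, 0.90, 0.30, 0.20, 0.35, 0.05, 0.15, 0.25]
curve = hit_curve(y_true, scores)
print(curve)
```

Plotting `curve` against k shows how quickly a model captures rare events. A steep early rise means that most true cases sit near the top of the ranking.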
2	Evaluating Factors Contributing to Crash Severity Among Older Drivers: Statistical Modeling and Machine Learning Approaches Alrumaidhi, Mubarak S. M. S. 23 February 2024 (has links) Road crashes pose a significant public health issue worldwide, often leading to severe injuries and fatalities. This dissertation embarks on a comprehensive examination of the factors affecting road crash severity, with a special focus on older drivers and the unique challenges introduced by the COVID-19 pandemic. Utilizing a dataset from Virginia, USA, the research integrates advanced statistical methods and machine learning techniques to dissect this critical issue from multiple angles. The initial study within the dissertation employs multilevel ordinal logistic regression to assess crash severity among older drivers, revealing the complex interplay of various factors such as crash type, road attributes, and driver behavior. It highlights the increased risk of severe crashes associated with head-on collisions, driver distraction or impairment, and the non-use of seat belts, specifically affecting older drivers. These findings are pivotal in understanding the unique vulnerabilities of this demographic on the road. Furthermore, the dissertation explores the efficacy of both parametric and non-parametric machine learning models in predicting crash severity. It emphasizes the innovative use of synthetic resampling techniques, particularly random over-sampling examples (ROSE) and synthetic minority over-sampling technique (SMOTE), to address class imbalances. This methodological advancement not only improves the accuracy of crash severity predictions for severe crashes but also offers a comprehensive understanding of diverse factors, including environmental and roadway characteristics. 
Additionally, the dissertation examines the influence of the COVID-19 pandemic on road safety, revealing a paradoxical decrease in overall traffic crashes accompanied by an increase in the rate of severe injuries. This finding underscores the pandemic's transformative effect on driving behaviors and patterns, heightening risks for vulnerable road users like pedestrians and cyclists. The study calls for adaptable road safety strategies responsive to global challenges and societal shifts. Collectively, the studies within this dissertation contribute substantially to transportation safety research. They demonstrate the complex nature of factors influencing crash severity and the efficacy of tailored approaches in addressing these challenges. The integration of advanced statistical methods with machine learning techniques offers a profound understanding of crash dynamics and sets a new benchmark for future research in transportation safety. This dissertation underscores the evolving challenges in road safety, especially amidst demographic shifts and global crises, and advocates for adaptive, evidence-based strategies to enhance road safety for all, particularly vulnerable groups like older drivers. / Doctor of Philosophy / Road crashes are a major concern worldwide, often leading to serious injuries and loss of life. This dissertation delves into the critical issue of road crash severity, with a special focus on older drivers and the challenges brought about by the COVID-19 pandemic. Drawing on data from Virginia, USA, the research combines cutting-edge statistical methods and machine learning to shed light on this pressing matter. One important part of the research focuses on older drivers. It uses advanced analysis to find out why crashes involving this group might be more serious. The study discovered that situations like head-on collisions, driver distraction or impairment, and not wearing seat belts greatly increase the risk for older drivers. 
Understanding these risks is crucial in identifying the special needs of older drivers on the road. Then, the study explores the power of machine learning in predicting crash severity. Here, the research stands out by using innovative techniques to balance out the data, leading to more accurate predictions. This part of the study not only improves our understanding of what leads to severe crashes but also highlights how different environmental and road factors play a role. Following this, the research looks at how the COVID-19 pandemic has impacted road safety. Interestingly, while the overall number of crashes went down during the pandemic, the rate of severe injuries in the crashes that occurred increased. This suggests that the pandemic changed driving behaviors, posing increased risks especially to pedestrians and cyclists. In summary, this dissertation offers valuable insights into the complex factors affecting road crash severity. It underscores the importance of using advanced analysis techniques to understand these dynamics better, especially in the face of demographic changes and global challenges like the pandemic. The findings are not just academically significant; they provide practical guidance for policymakers and road safety experts to develop strategies that make roads safer for everyone, particularly older drivers. Crash Severity Machine Learning Statistical Modeling Multilevel Modeling Resampling Techniques Imbalance Data Road Safety Older drivers Temporal Instability COVID-19 Transportation safety
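The synthetic resampling mentioned in this entry can be illustrated with a SMOTE-style interpolation. This is a minimal sketch of the general idea only, not the dissertation's implementation: each synthetic minority point is placed on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_like`, the parameter `k`, and the toy points are assumptions made for illustration.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Assumption: `minority` is a list of at least two equal-length numeric tuples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        b = rng.randrange(len(minority))
        base = minority[b]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != b]
        others.sort(key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)))
        mate = rng.choice(others[:k])          # one of the k nearest neighbours
        t = rng.random()                       # interpolation weight in [0, 1)
        synthetic.append(tuple(u + t * (v - u) for u, v in zip(base, mate)))
    return synthetic

# Toy minority class: four corner points of the unit square (illustrative only).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies.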
3	Design and assessment of a computer-assisted artificial intelligence system for predicting preterm labor in women attending regular check-ups. Emphasis in imbalance data learning technique Nieto del Amor, Félix 18 December 2023 (has links) Thesis by compendium / [EN] Preterm delivery, defined as birth before 37 weeks of gestation, is a significant global concern with implications for the health of newborns and economic costs. It affects approximately 11% of all births, amounting to more than 15 million individuals worldwide. 
Current methods for predicting preterm labor lack precision, leading to overdiagnosis and limited practicality in clinical settings. Electrohysterography (EHG) has emerged as a promising alternative by providing relevant information about uterine electrophysiology. However, previous prediction systems based on EHG have not effectively translated into clinical practice, primarily due to biases in handling imbalanced data and the need for robust and generalizable prediction models. This doctoral thesis aims to develop an artificial-intelligence-based preterm labor prediction system using EHG and obstetric data from women undergoing regular prenatal check-ups. This system entails extracting relevant features, optimizing the feature subspace, and evaluating strategies to address the imbalanced data challenge for robust prediction. The study validates the effectiveness of temporal, spectral, and non-linear features in distinguishing between preterm and term labor cases. Novel entropy measures, namely dispersion and bubble entropy, outperform traditional entropy metrics in identifying preterm labor. Additionally, the study uses a genetic algorithm to optimize the feature subspace for accurate preterm delivery prediction, maximizing complementary information while minimizing redundant and noisy features. Furthermore, we have confirmed information leakage between the training and test sets when synthetic samples are generated before data partitioning, which gives rise to an overestimated generalization capability of the predictor system. These results emphasize the importance of partitioning first and resampling afterwards to ensure data independence between training and test samples. We propose combining the genetic algorithm and the resampling method in the same iteration to handle imbalanced data learning with a partition-resampling pipeline, achieving an Area Under the ROC Curve of 94% and an Average Precision of 84%. 
Moreover, the model demonstrates an F1-score and recall of approximately 80%, outperforming existing studies that use a partition-resampling pipeline. This finding reveals the potential of an EHG-based preterm birth prediction system, enabling patient-oriented strategies for enhanced preterm labor prevention, maternal-fetal well-being, and optimal hospital resource management. Overall, this doctoral thesis provides clinicians with valuable tools for decision-making in preterm labor maternal-fetal risk scenarios. It enables clinicians to design patient-oriented strategies for enhanced preterm birth prevention and management. The proposed methodology holds promise for the development of an integrated preterm birth prediction system that can enhance pregnancy planning, optimize resource allocation, and ultimately improve the outcomes for both mother and baby. / Nieto Del Amor, F. (2023). Design and assessment of a computer-assisted artificial intelligence system for predicting preterm labor in women attending regular check-ups. Emphasis in imbalance data learning technique [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/200900 / Compendium Preterm labor prediction Electrohysterography Imbalance data learning Genetic algorithm Resampling methods Uterine electromyography Machine learning TECNOLOGIA ELECTRONICA
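The partition-then-resample order described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis pipeline: it swaps in plain random over-sampling by duplication for the thesis's synthetic sample generation, and it omits the genetic algorithm entirely. The function name `split_then_oversample` and the toy labels are hypothetical.

```python
import random

def split_then_oversample(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices); only the training side is resampled."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Step 1: partition first, so the test set is fixed before any resampling.
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Step 2: balance classes inside the training partition only.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Duplicate randomly chosen training samples until the class reaches target.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced, test_idx

# Toy labels: 4 preterm (1) versus 16 term (0) recordings, illustrative only.
labels = [1] * 4 + [0] * 16
train_idx, test_idx = split_then_oversample(labels)
train_counts = {c: sum(labels[i] == c for i in train_idx)
                for c in set(labels[i] for i in train_idx)}
print(train_counts, len(test_idx))
```

Resampling before the split would let copies or interpolations of test samples reach the training set, which is the leakage the abstract reports. Resampling after the split keeps every test index out of the training data.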
