Global ETD Search

1	Harnessing Transfer Learning and Image Analysis Techniques for Enhanced Biological Insights: Multifaceted Approaches to Diagnosis and Prognosis of Diseases Ziyu Liu (18410397) 22 April 2024
Despite the remarkable advances of machine learning (ML) in biomedical research, especially in tackling complex human diseases such as cancer and Alzheimer's disease, a considerable gap persists between promising theoretical results and dependable clinical applications in diagnosis, prognosis, and therapeutic decision-making. One of the primary challenges is the absence of large, high-quality patient datasets, which stems from the cost and human labor required to collect such datasets and from the scarcity of patient samples. Moreover, the inherent complexity of the data often leads to a feature space dimension that is large compared with the sample size, potentially causing instability during training and unreliability in inference. To address these challenges, the transfer learning (TL) approach has been embraced in biomedical ML applications to facilitate knowledge transfer across diverse and related biological contexts. Leveraging this principle, we introduce an unsupervised multi-view TL algorithm, named MVTOT [1], which enables the analysis of various biomarkers across different cancer types. Specifically, we compress high-dimensional biomarkers from different cancer types into a low-dimensional feature space via nonnegative matrix factorization and distill the common information shared by various cancer types using the Wasserstein distance defined by Optimal Transport theory. We evaluate the stratification performance on three early-stage cancers from the Cancer Genome Atlas (TCGA) project. Compared with other benchmark methods, our framework stratifies patient survival outcomes more accurately.
Additionally, while patient-level stratification has enhanced clinical decision-making, our understanding of diseases at the single-cell (SC) level, which is crucial for deciphering disease progression mechanisms, monitoring drug responses, and prioritizing drug targets, remains limited. It is essential to associate each SC with patient-level clinical traits such as survival hazard, drug response, and disease subtypes. However, SC samples often lack direct labeling with these traits, and the significant statistical gap between patient-level and SC-level gene expressions impedes the transfer of well-annotated patient-level disease attributes to SCs. Domain adaptation (DA), a TL subfield, addresses this challenge by training a domain-invariant feature extractor for both patient and SC gene expression matrices, so that ML models trained on patient-level data can be applied to SC samples. Expanding upon an established deep-learning-based DA model, DEGAS [2], we replace its computationally inefficient maximum mean discrepancy loss with the Wasserstein distance as the metric for domain discrepancy. This substitution facilitates the embedding of both SC and patient inputs into a common latent feature space. Subsequently, employing the model trained on patient-level disease attributes, we predict SC-level survival hazard, disease status, and drug response for prostate cancer, Alzheimer's SC data, and multiple myeloma data, respectively. Our approach outperforms benchmark studies, uncovering clinically significant cell subgroups and revealing the correlation between survival hazard and drug response at the SC level.
In addition to these approaches, we acknowledge the effectiveness of TL and image analysis in stratifying patients with early and late-stage Mild Cognitive Impairment based on neuroimaging, as well as in predicting survival and metastasis in melanoma based on histological images. These applications underscore the potential of ML methods, especially TL algorithms, to address biomedical issues from various angles, thereby enhancing our understanding of disease mechanisms and developing new biomarkers that predict patient outcomes.
Biostatistics Transfer Learning Study Multiview learning Single cell technologies
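Both parts of the abstract above rely on the Wasserstein distance to measure how far apart two distributions are (across cancer types, or between patient and SC features). As a rough illustration only, and not the MVTOT or DEGAS-based implementation, the sketch below computes the 1-Wasserstein distance between two one-dimensional samples of equal size, where it reduces to the mean absolute difference between sorted values. The function name `wasserstein_1d` and the toy inputs are assumptions made for this example; the theses work with multi-dimensional latent features, which need a general optimal transport solver.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples.

    Assumption for this sketch: both samples have the same number of points
    and uniform weights, so the optimal transport plan simply matches the
    i-th smallest value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    total = sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted))
    return total / len(xs)


# Hypothetical 1-D latent feature values for a patient cohort and for cells.
patient_feature = [0.0, 1.0, 2.0]
cell_feature = [1.0, 2.0, 3.0]
shift = wasserstein_1d(patient_feature, cell_feature)
```

In a domain adaptation setting, a quantity of this kind would be minimized during training so that the two domains become hard to tell apart in the shared feature space.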
2	Learning a Multiview Weighted Majority Vote Classifier : Using PAC-Bayesian Theory and Boosting / Apprentissage de vote de majorité pour la classification multivue : Utilisation de la théorie PAC-Bayésienne et du boosting Goyal, Anil 23 October 2018
With the massive generation of data, we increasingly have data collected from different information sources with heterogeneous properties, so it is important to take these representations, or views, of the data into account. This machine learning problem is referred to as multiview learning. It has many applications; for example, in medical imaging the human brain can be represented by different sets of features such as MRI, t-fMRI, and EEG. In this thesis, we focus on supervised multiview learning, where we see multiview learning as a combination of different view-specific classifiers or views. From this point of view, it is natural to tackle multiview learning through the PAC-Bayesian framework, a tool derived from statistical learning theory that studies models expressed as majority votes. One advantage of PAC-Bayesian theory is that it directly captures the trade-off between the accuracy and the diversity of the voters, which is central to multiview learning. The first contribution of this thesis extends classical PAC-Bayesian theory (with a single view) to multiview learning (with at least two views). To do this, we consider a two-level hierarchy of distributions over the view-specific voters and the views. Based on this strategy, we derive PAC-Bayesian generalization bounds (both probabilistic and expected risk bounds) for multiview learning.
From a practical point of view, we design two multiview learning algorithms based on our two-level PAC-Bayesian strategy. The first, PB-MVBoost, is a one-step boosting-based multiview learning algorithm. It iteratively learns the weights over the views by optimizing the multiview C-Bound, which controls the trade-off between the accuracy and the diversity of the views. The second is a late-fusion approach in which we combine the predictions of view-specific classifiers using the PAC-Bayesian algorithm CqBoost proposed by Roy et al. Finally, we show that minimizing the classification error of the multiview weighted majority vote is equivalent to minimizing Bregman divergences. This allows us to derive a parallel-update optimization algorithm (referred to as MωMvC2) to learn our multiview weighted majority vote.
Multiview Learning PAC-Bayesian Theory Boosting Majority Vote
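The two-level hierarchy described in this abstract (weights over the voters within each view, then weights over the views) can be pictured with a small sketch. The code below is an illustrative assumption, not PB-MVBoost or MωMvC2: the function names, the binary labels in {-1, +1}, and the toy weights are all made up for the example. It also computes the empirical form of the C-bound, 1 - (mean margin)^2 / (mean squared margin), which is only meaningful when the mean margin is positive.

```python
def multiview_score(view_outputs, voter_weights, view_weights):
    """Two-level weighted vote for one example.

    view_outputs[v][j]  : prediction in {-1, +1} of voter j on view v
    voter_weights[v][j] : weight of voter j within view v (sums to 1 per view)
    view_weights[v]     : weight of view v (sums to 1 over views)
    """
    score = 0.0
    for outputs, q, rho in zip(view_outputs, voter_weights, view_weights):
        # First level: weighted vote of the voters inside one view.
        view_score = sum(w * h for w, h in zip(q, outputs))
        # Second level: weight that view's vote.
        score += rho * view_score
    return score


def predict(score):
    """Majority-vote label from the aggregated score."""
    return 1 if score >= 0 else -1


def empirical_c_bound(scores, labels):
    """Empirical C-bound from the first two moments of the margin y * score."""
    margins = [y * s for s, y in zip(scores, labels)]
    first_moment = sum(margins) / len(margins)
    second_moment = sum(m * m for m in margins) / len(margins)
    if first_moment <= 0:
        raise ValueError("the C-bound requires a positive mean margin")
    return 1.0 - first_moment ** 2 / second_moment


# Toy example: two views, two voters per view, uniform weights.
example_score = multiview_score(
    view_outputs=[[1, -1], [1, 1]],
    voter_weights=[[0.5, 0.5], [0.5, 0.5]],
    view_weights=[0.5, 0.5],
)
example_label = predict(example_score)
```

A low value of the empirical C-bound corresponds to voters that are accurate on average yet disagree with each other, which is the accuracy/diversity trade-off the abstract refers to.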
