  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
711

Gene fusions in cancer: Classification of fusion events and regulation patterns of fusion pathway neighbors

Hughes, Katelyn 05 May 2016
Cancer is a leading cause of death worldwide, with an estimated 1.6 million new cases and 600,000 deaths in the US alone in 2015. Gene fusions, hybrid genes formed from two originally separate genes, are known drivers of cancer. However, gene fusions have also been found in healthy cells due to routine errors in replication. This project aims to understand the role of gene fusions in cancer. Specifically, we seek to achieve two goals. First, we would like to develop a computational method that predicts whether a gene fusion event is associated with a cancer or a healthy sample. Second, we would like to use this information to determine and characterize the molecular mechanisms behind gene fusion events. Recent studies have attempted to address these problems, but without explicit consideration of the fact that some fusion events occur in both cancer and healthy cells. Here, we address this problem using FUsion Enriched Learning of CANcer Mutations (FUELCAN), a semi-supervised model that initially treats all overlapping fusion events as unlabeled. The model is trained on the known cancer and healthy samples and tested on the unlabeled dataset. Unlabeled data points are classified as associated with healthy or cancer samples, and the top 20 data points are put back into the training set. The process continues until all data points have been classified. Three datasets were analyzed, from acute lymphoblastic leukemia (ALL), breast cancer and colorectal cancer. We obtained similar results for both supervised and semi-supervised classification. To improve our model, we assessed the functional landscape of gene fusion events and observed that the pathway neighbors of both gene fusion partners are differentially expressed in each cancer dataset. The significant neighbors are also shown to have direct connections to cancer pathways and functions, indicating that these gene fusions are important for cancer development.
Future directions include applying the acquired transcriptomic knowledge to our machine learning algorithm, counting transcription factors and kinases within the gene fusion events and their neighbors, and assessing the differences between upstream and downstream effects within the pathway neighbors.
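The self-training loop described in this abstract can be sketched as follows. This is an illustration only, not the FUELCAN implementation: the abstract does not name the base classifier or say how the "top 20" points are ranked, so the nearest-centroid classifier, the distance-margin confidence score, and the names `self_train`, `centroid` and `sq_dist` are all assumptions.

```python
def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(labeled, unlabeled, batch_size=20):
    """Iteratively label `unlabeled` points (assumed stand-in for FUELCAN's loop).

    labeled   -- list of (features, label), label in {"cancer", "healthy"};
                 both labels must be present.
    unlabeled -- list of feature vectors (the overlapping fusion events).
    Returns a list of (features, predicted_label) for every unlabeled point.
    """
    train = list(labeled)
    pool = list(unlabeled)
    assigned = []
    while pool:
        # Refit the (hypothetical) nearest-centroid classifier on the current training set.
        cents = {}
        for lab in ("cancer", "healthy"):
            members = [x for x, y in train if y == lab]
            if not members:
                raise ValueError(f"no training examples for class {lab!r}")
            cents[lab] = centroid(members)
        # Score every pooled point; the distance margin acts as the confidence.
        scored = []
        for x in pool:
            d_cancer = sq_dist(x, cents["cancer"])
            d_healthy = sq_dist(x, cents["healthy"])
            label = "cancer" if d_cancer < d_healthy else "healthy"
            scored.append((abs(d_cancer - d_healthy), x, label))
        scored.sort(key=lambda item: item[0], reverse=True)
        # Move the most confident predictions into the training set.
        for _, x, label in scored[:batch_size]:
            train.append((x, label))
            assigned.append((x, label))
        pool = [x for _, x, _ in scored[batch_size:]]
    return assigned
```

Calling `self_train(labeled, unlabeled, batch_size=20)` mirrors the batch size quoted in the abstract; any real use would swap in the classifier and confidence measure of the actual study.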
712

Statistical Learning Methods for Personalized Medical Decision Making

Liu, Ying January 2016
The theme of my dissertation is merging statistical modeling with medical domain knowledge and machine learning algorithms to assist in making personalized medical decisions. In its simplest form, making personalized medical decisions for treatment choices and disease diagnosis modality choices can be transformed into classification or prediction problems in machine learning, where the optimal decision for an individual is a decision rule that yields the best future clinical outcome or maximizes diagnosis accuracy. However, challenges emerge when analyzing complex medical data. On one hand, statistical modeling is needed to deal with inherent practical complications such as missing data, patients' loss to follow-up, and ethical and resource constraints in randomized controlled clinical trials. On the other hand, new data types and larger scales of data call for innovations combining statistical modeling, domain knowledge and information technologies. This dissertation contains three parts addressing the estimation of optimal personalized rules for choosing treatment, the estimation of optimal individualized rules for choosing disease diagnosis modality, and methods for variable selection when there are missing data. In the first part of this dissertation, we propose a method to find optimal dynamic treatment regimens (DTRs) from Sequential Multiple Assignment Randomized Trial (SMART) data. DTRs are sequential decision rules tailored at each stage of treatment by potentially time-varying patient features and intermediate outcomes observed in previous stages. The complexity, patient heterogeneity, and chronicity of many diseases and disorders call for learning optimal DTRs that best dynamically tailor treatment to each individual's response over time. We propose a robust and efficient approach referred to as Augmented Multistage Outcome-Weighted Learning (AMOL) to identify optimal DTRs from sequential multiple assignment randomized trials.
We improve outcome-weighted learning (Zhao et al., 2012) to allow for negative outcomes; we propose methods to reduce the variability of weights to achieve numeric stability and higher efficiency; and finally, for multiple-stage trials, we introduce robust augmentation to improve efficiency by drawing information from Q-function regression models at each stage. The proposed AMOL remains valid even if the regression model is misspecified. We formally justify that a proper choice of augmentation guarantees smaller stochastic errors in value function estimation for AMOL; we then establish the convergence rates for AMOL. The comparative advantage of AMOL over existing methods is demonstrated in extensive simulation studies and applications to two SMART data sets: a two-stage trial for attention deficit hyperactivity disorder and the STAR*D trial for major depressive disorder. The second part of the dissertation introduces a machine learning algorithm to estimate personalized decision rules for medical diagnosis/screening that maximize a weighted combination of sensitivity and specificity. Using subject-specific risk factors and feature variables, such rules administer screening tests with balanced sensitivity and specificity, and thus protect low-risk subjects from unnecessary pain and stress caused by false positive tests, while achieving high sensitivity for subjects at high risk. We conducted a simulation study mimicking a real breast cancer study, and we found significant improvements in sensitivity and specificity when comparing our personalized screening strategy (assigning mammography+MRI to high-risk patients and mammography alone to low-risk subjects based on a composite score of their risk factors) to a one-size-fits-all strategy (assigning mammography+MRI or mammography alone to all subjects).
When applied to Parkinson's disease (PD) FDG-PET and fMRI data, the method provided individualized modality selection that can improve AUC, and it can provide interpretable decision rules for choosing the brain imaging modality for early detection of PD. To the best of our knowledge, this is the first proposal in the literature of automatic data-driven methods and learning algorithms for personalized diagnosis/screening strategies. In the last part of the dissertation, we propose a method, Multiple Imputation Random Lasso (MIRL), to select important variables and to predict the outcome for an epidemiological study of Eating and Activity in Teens in the presence of missing data. In this study, 80% of individuals have at least one variable missing. Therefore, using variable selection methods developed for complete data after list-wise deletion substantially reduces prediction power. Recent work on prediction models in the presence of incomplete data cannot adequately account for large numbers of variables with arbitrary missing patterns. We propose MIRL to combine penalized regression techniques with multiple imputation and stability selection. Extensive simulation studies are conducted to compare MIRL with several alternatives. MIRL outperforms other methods in high-dimensional scenarios in terms of both reduced prediction error and improved variable selection performance, and its advantage is greater when the correlation among variables is high and the missing proportion is high. MIRL also shows improved performance compared with other applicable methods when applied to the study of Eating and Activity in Teens for boys and girls separately, and to a subgroup of low socioeconomic status (SES) Asian boys who are at high risk of developing obesity.
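The stability-selection step that MIRL combines with multiple imputation and penalized regression can be illustrated with a small aggregation sketch. It assumes the lasso fits on each imputed and resampled dataset have already been run elsewhere and covers only the final counting step; the function names, the 0.5 threshold and the variable names in the usage note are hypothetical, not taken from the dissertation.

```python
from collections import Counter


def selection_frequencies(selected_sets):
    """Fraction of fits in which each variable was selected.

    selected_sets -- one collection of selected variable names per fitted model
                     (for example, one per imputed-and-bootstrapped dataset).
    """
    if not selected_sets:
        raise ValueError("need at least one fitted model")
    counts = Counter(var for chosen in selected_sets for var in set(chosen))
    return {var: n / len(selected_sets) for var, n in counts.items()}


def stable_variables(selected_sets, threshold=0.5):
    """Variables whose selection frequency reaches `threshold` (an assumed cutoff)."""
    freqs = selection_frequencies(selected_sets)
    return sorted(var for var, freq in freqs.items() if freq >= threshold)
```

For instance, with four fits that selected `{"bmi", "age"}`, `{"bmi"}`, `{"bmi", "sleep"}` and `{"age", "bmi"}`, only the variables chosen in at least half of the fits are retained.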
713

M3S – Développement de la spectroscopie Raman en cytopathologie : Application au diagnostic de la leucémie lymphoïde chronique / M3S - Development of Raman spectroscopy in cytopathology : Application to the diagnosis of chronic lymphocytic leukaemia

Féré, Michael 18 December 2018
Currently, there are few new label-free technologies to facilitate and improve early diagnosis. These technologies could be powerful tools to better diagnose patients. Many studies have shown the potential of Raman spectroscopy to help clinicians. The work carried out during this thesis aimed to develop an autonomous tool for the diagnosis of chronic lymphocytic leukaemia (CLL), using Raman data acquired under different experimental and instrumental conditions during multicentre measurement campaigns. However, these changes in conditions have a significant impact on Raman data, which poses transferability issues. The arrival of this technology at the bedside is therefore hindered, and this lack of transferability must be corrected. In this thesis, several lines of research were pursued. As a first step, we evaluated a solution consisting of a specifically developed preprocessing step that eliminates the spectral variability induced by the different changes in conditions. Preprocessing based on EMSC showed strong performance in homogenizing these multicentre data. The second line of research evaluated different strategies for creating and optimizing models for the diagnosis of CLL. 100 classification models were created through repeated double cross-validation. Combining the predictions of these models through a majority vote made it possible to predict with great accuracy whether a patient was healthy or had CLL.
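The final step of this abstract, combining the predictions of 100 classification models by majority vote, can be sketched in a few lines. The function name and the label strings are assumptions made for illustration; the abstract does not say how ties are broken, so here a tie goes to the label that was seen first.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models.

    predictions -- one predicted label per model for a single patient.
    Ties are resolved in favour of the label encountered first (an assumption).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, _count = Counter(predictions).most_common(1)[0]
    return label
```

With an even number of models and two classes a 50/50 split is possible, so a real pipeline would need an explicit tie rule.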
714

Design of a Test Framework for the Evaluation of Transfer Learning Algorithms

Unknown Date
A traditional machine learning environment is characterized by the training and testing data being drawn from the same domain and therefore having similar distribution characteristics. In contrast, a transfer learning environment is characterized by the training data having different distribution characteristics from the testing data. Previous research on transfer learning has focused on the development and evaluation of transfer learning algorithms using real-world datasets. Testing with real-world datasets exposes an algorithm to a limited number of data distribution differences and does not exercise an algorithm's full capability and boundary limitations. In this research, we define, implement, and deploy a transfer learning test framework to test machine learning algorithms. The transfer learning test framework is designed to create a wide range of distribution differences that are typically encountered in a transfer learning environment. By testing with many different distribution differences, an algorithm's strong and weak points can be discovered and evaluated against other algorithms. This research additionally performs case studies that use the transfer learning test framework. The first case study focuses on measuring the impact of exposing algorithms to the Domain Class Imbalance distortion profile. The next case study uses the entire transfer learning test framework to evaluate both transfer learning and traditional machine learning algorithms. The final case study uses the transfer learning test framework in conjunction with real-world datasets to measure the impact of the base traditional learner on the performance of transfer learning algorithms. Two additional experiments are performed that focus on using unique real-world datasets. The first experiment uses transfer learning techniques to predict fraudulent Medicare claims. The second experiment uses a heterogeneous transfer learning method to predict phishing webpages.
These case studies will be of interest to researchers who develop and improve transfer learning algorithms. This research will also be of benefit to machine learning practitioners in the selection of high-performing transfer learning algorithms. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2017. / FAU Electronic Theses and Dissertations Collection
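One way to picture a controlled distribution difference of the kind this abstract describes is to resample a dataset so that the training and testing domains have different class ratios. The sketch below is an assumption about what a class-imbalance distortion could look like, not the framework's actual Domain Class Imbalance profile; the function name and parameters are hypothetical.

```python
import random


def resample_to_class_ratio(rows, positive_fraction, size, seed=0):
    """Draw `size` rows with replacement so a given fraction is positive.

    rows              -- list of (features, label) pairs with label 0 or 1.
    positive_fraction -- desired share of label-1 rows in the output.
    """
    rng = random.Random(seed)
    positives = [row for row in rows if row[1] == 1]
    negatives = [row for row in rows if row[1] == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in rows")
    n_pos = round(size * positive_fraction)
    sample = [rng.choice(positives) for _ in range(n_pos)]
    sample += [rng.choice(negatives) for _ in range(size - n_pos)]
    rng.shuffle(sample)
    return sample
```

A balanced source domain and an imbalanced target domain could then be built from the same pool, for example with `positive_fraction=0.5` for training and `positive_fraction=0.1` for testing.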
715

Unsupervised Activity Discovery and Characterization for Sensor-Rich Environments

Hamid, Muhammad Raffay 28 November 2005
This thesis presents an unsupervised method for discovering and analyzing the different kinds of activities in an active environment. Drawing on natural language processing, a novel representation of activities as bags of event n-grams is introduced, in which the global structural information of activities is analyzed using their local event statistics. It is demonstrated how maximal cliques in an undirected edge-weighted graph of activities can be used, in an unsupervised manner, to discover the different activity classes. Building on work done in computer networks and bioinformatics, it is shown how to characterize these discovered activity classes from a holistic as well as a by-parts viewpoint. A definition of anomalous activities is formulated, along with a way to detect them based on the difference of an activity instance from each of the discovered activity classes. Finally, an information-theoretic method to explain the detected anomalies in a human-interpretable form is presented. Results over extensive datasets, collected from multiple active environments, are presented to show the competence and generalizability of the proposed framework.
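The bag-of-event-n-grams representation mentioned in this abstract can be sketched as follows. The similarity measure is an assumption chosen for illustration (shared n-gram counts divided by the union of counts); the thesis may define the edge weights of its activity graph differently, and the event names in the usage note are made up.

```python
from collections import Counter


def event_ngrams(events, n):
    """Bag (multiset) of contiguous event n-grams in one activity sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def bag_similarity(bag_a, bag_b):
    """Overlap of two n-gram bags in [0, 1]; an assumed, illustrative measure."""
    union_total = sum((bag_a | bag_b).values())   # per-key maximum counts
    shared_total = sum((bag_a & bag_b).values())  # per-key minimum counts
    return shared_total / union_total if union_total else 0.0
```

For the sequence `["enter", "pick", "scan", "pick", "scan", "exit"]` with `n=2`, the bigram `("pick", "scan")` appears twice, which is the kind of local event statistic the representation is built from.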
716

Near-Optimality of Distributed Network Management with a Machine Learning Approach

Jeon, Sung-eok 09 July 2007
An analytical framework is developed for distributed management of large networks in which each node makes its decisions locally. Two issues remain open. One is whether a distributed algorithm would result in near-optimal management. The other is complexity, i.e., whether a distributed algorithm would scale gracefully with network size. We study these issues through modeling, approximation, and randomized distributed algorithms. For the near-optimality issue, we first derive a global probabilistic model of network management variables that characterizes the complex spatial dependence of the variables. The spatial dependence results from externally imposed management constraints and internal properties of communication environments. We then apply probabilistic graphical models from machine learning to show when and whether the global model can be approximated by a local model. This study results in a sufficient condition for distributed management to be nearly optimal. We then show how to obtain a near-optimal configuration through decentralized adaptation of local configurations. We next derive a near-optimal distributed inference algorithm based on the derived local model. We characterize the trade-off between near-optimality and complexity of distributed and statistical management. We validate our formulation and theory through simulations.
717

Domain knowledge, uncertainty, and parameter constraints

Mao, Yi 24 August 2010
No description available.
718

AIRS: a resource limited artificial immune classifier

Watkins, Andrew B. January 2001
Thesis (M.S.)--Mississippi State University. Department of Computer Science. / Title from title screen. Includes bibliographical references.
719

Gene finding in eukaryotic genomes using external information and machine learning techniques

Burns, Paul D. 20 September 2013
Gene finding in eukaryotic genomes is an essential part of a comprehensive approach to modern systems biology. Most methods developed in the past rely on a combination of computational prediction and external information about gene structures from transcript sequences and comparative genomics. In the past, external sequence information consisted of a combination of full-length cDNA and expressed sequence tag (EST) sequences. Much improvement in the prediction of genes and gene isoforms is promised by the availability of RNA-seq data. However, productive use of RNA-seq for gene prediction has been difficult due to challenges associated with mapping RNA-seq reads which span splice junctions to prevalent splicing noise in the cell. This work addresses this difficulty with the development of methods and the implementation of two new pipelines: (1) a novel pipeline for accurate mapping of RNA-seq reads to compact genomes and (2) a pipeline for prediction of genes using the RNA-seq spliced alignments in eukaryotic genomes. Machine learning methods are employed to overcome errors associated with the process of mapping short RNA-seq reads across introns and using them to determine sequence model parameters for gene prediction. In addition to the development of these new methods, genome annotation work was performed on several plant genome projects.
720

Factorisation Matricielle, Application à la Recommandation Personnalisée de Préférences / Matrix Factorization, Application to Personalized Preference Recommendation

Delporte, Julien 03 February 2014
This thesis focuses on large-scale optimization problems, and more specifically on matrix factorization methods for large-scale problems. The goal of factorization methods for large matrices is to extract latent variables that explain the data in a lower-dimensional space. We focused on recommendation as the application domain, and more specifically on the problem of predicting user preferences. In a first contribution, we studied the application of factorization methods in a contextual recommendation setting, in particular in a social context. In a second contribution, we studied the model selection problem for factorization, in which the rank of the factorization is determined automatically by risk estimation.
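The idea of extracting latent variables by matrix factorization to predict user preferences can be illustrated with a minimal sketch. It uses plain stochastic gradient descent with L2 regularization on observed ratings, which is a common textbook formulation and not necessarily the method of the thesis; the rank is fixed by hand here rather than chosen by risk estimation, and all names and hyperparameter values are assumptions.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01,
              epochs=500, seed=0):
    """Fit user factors U and item factors V so that U[u] . V[i] ~ rating.

    ratings -- list of (user_index, item_index, rating) for observed entries.
    Returns (U, V) as lists of `rank`-dimensional latent vectors.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(rank):
                u_k, v_k = U[u][k], V[i][k]
                # Gradient step on squared error plus L2 penalty.
                U[u][k] += lr * (err * v_k - reg * u_k)
                V[i][k] += lr * (err * u_k - reg * v_k)
    return U, V


def predict(U, V, user, item):
    """Predicted preference of `user` for `item` from the latent factors."""
    return sum(a * b for a, b in zip(U[user], V[item]))
```

After fitting on a handful of observed ratings, `predict` can rank unseen items for a user by their estimated preference.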
