Global ETD Search

1	Selection component analysis of the PGI polymorphism in Sphaeroma rugicauda Riddoch, B. January 1987 (has links) No description available. 572.8 Isopod gene selection
2	Gene selection based on consistency modelling, algorithms and applications Hu, Yingjie (Raphael) Unknown Date (has links) Consistency modelling for gene selection is a new topic emerging from recent cancer bioinformatics research. The result of classification or clustering on a training set is often found to be very different from the result of the same operations on a testing set. Here, the issue is addressed as a consistency problem. In practice, the inconsistency of microarray datasets prevents many typical gene selection methods from working properly for cancer diagnosis and prognosis. To deal with this problem, a new concept of performance-based consistency is proposed in this thesis. An interesting finding in our previous experiments is that using a proper set of informative genes significantly improved the consistency characteristic of microarray data. How to select genes in terms of consistency modelling therefore becomes an interesting topic. Many previously published gene selection methods perform well in the cancer diagnosis domain, but questions are raised because of the irreproducibility of experimental results. Motivated by this, two new gene selection methods based on the proposed performance-based consistency concept, GAGSc (Genetic Algorithm Gene Selection method in terms of consistency) and LOOLSc (Leave-one-out Least-Square bound method with consistency measurement), were developed in this study to identify a set of informative genes that yields replicable results in microarray data analysis. The proposed consistency concept was investigated on eight benchmark microarray and proteomic datasets. 
The experimental results show that different microarray datasets have different consistency characteristics, and that better consistency can lead to an unbiased and reproducible outcome with good disease prediction accuracy. As implementations of the proposed performance-based consistency, GAGSc and LOOLSc are capable of providing a small set of informative genes. Compared with traditional gene selection methods that do not use a consistency measurement, GAGSc and LOOLSc can provide more accurate classification results. More importantly, GAGSc and LOOLSc have demonstrated that gene selection with the proposed consistency measurement is able to enhance reproducibility in microarray diagnosis experiments. Gene selection Classification Bias Reproducibility Cancer bioinformatics DNA microarrays
4	Improving Feature Selection Techniques for Machine Learning Tan, Feng 27 November 2007 (has links) As a commonly used technique in data preprocessing for machine learning, feature selection identifies important features and removes irrelevant, redundant or noisy features to reduce the dimensionality of the feature space. It improves the efficiency, accuracy and comprehensibility of the models built by learning algorithms. Feature selection techniques have been widely employed in a variety of applications, such as genomic analysis, information retrieval, and text categorization. Researchers have introduced many feature selection algorithms with different selection criteria. However, it has been discovered that no single criterion is best for all applications. We proposed a hybrid feature selection framework based on genetic algorithms (GAs) that employs a target learning algorithm to evaluate features, i.e., a wrapper method. We call it the hybrid genetic feature selection (HGFS) framework. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the target algorithm. The experiments on genomic data demonstrate that ours is a robust and effective approach that can find subsets of features with higher classification accuracy and/or smaller size compared to each individual feature selection algorithm. A common characteristic of text categorization tasks is multi-label classification with a great number of features, which makes wrapper methods time-consuming and impractical. We proposed a simple filter (non-wrapper) approach called the Relation Strength and Frequency Variance (RSFV) measure. The basic idea is that informative features are those that are highly correlated with the class and distributed most differently among all classes. The approach is compared with two well-known feature selection methods in experiments on two standard text corpora. 
The experiments show that RSFV generates equal or better performance than the others in many cases. Feature selection Gene selection Text categorization Text classification Genetic algorithm Dimension Reduction Term selection Computer Sciences
5	COMPUTATIONAL IDENTIFICATION AND MOLECULAR VERIFICATION OF MIRNA IN EASTERN SUBTERRANEAN TERMITES (RETICULITERMES FLAVIPES) Yu, Tian 01 January 2014 (has links) Reticulitermes flavipes is one of the most common termite species in the world, and has been an intriguing research model due to its ecological, biological, and economic significance. This study aims to elucidate the role of miRNAs in termite development and how miRNAs can influence the division of labor. miRNAs are short non-coding RNAs that have an important role in gene regulation at the post-transcriptional level, and can potentially be involved in the regulation of caste polyphenism. Using a computational approach, I identified 167 conserved and 33 novel miRNAs in the dataset. miR-iab-4 and 19 other miRNAs showed highly differential expression between worker and soldier, and their possible roles in termite biology are discussed. To reliably quantify miRNA expression in experiments, I tested the stability of 10 miRNAs as reference genes using quantitative real-time PCR. miR-8_3, bantam and miR-276a-3p are the most stable miRNAs in different castes, pre-soldier formation, and different tissues, respectively. Lastly, the predicted miRNA expression is verified by qRT-PCR for 8 miRNAs. Overall, this study shows that miRNA plays a role in mediating the worker-soldier transition in R. flavipes. miRNA identification bioinformatics termite miRNA verification miRNA reference gene selection Agriculture Entomology
6	IMPROVED GENE PAIR BIOMARKERS FOR MICROARRAY DATA CLASSIFICATION Khamesipour, Alireza 01 August 2018 (has links) The Top Scoring Pair (TSP) classifier, based on the notion of relative ranking reversals in the expressions of two marker genes, has been proposed as a simple, accurate, and easily interpretable decision rule for classification and class prediction of gene expression profiles. We introduce the AUC-based TSP (AUCTSP) classifier, which is based on the Area Under the ROC (Receiver Operating Characteristic) Curve. The AUCTSP classifier works according to the same principle as TSP but differs from the latter in that the probabilities that determine the top scoring pair are computed based on the relative rankings of the two marker genes across all subjects as opposed to for each individual subject. Although the classification is still done on an individual subject basis, the generalization that the AUC-based probabilities provide during training yields an overall better and more stable classifier. Through extensive simulation results and case studies involving classification in ovarian, leukemia, colon, breast, and prostate cancers and diffuse large B-cell lymphoma, we show the superiority of the proposed approach in terms of improving classification accuracy, avoiding overfitting and being less prone to selecting non-informative pivot genes. The proposed AUCTSP is a simple yet reliable and robust rank-based classifier for gene expression classification. While the AUCTSP works by the same principle as TSP, its ability to determine the top scoring gene pair based on the relative rankings of two marker genes across all subjects, as opposed to each individual subject, results in significant performance gains in classification accuracy. 
In addition, the proposed method tends to avoid selection of non-informative (pivot) genes as members of the top-scoring pair. We have also proposed the use of the AUC test statistic to reduce the computational cost of the TSP in selecting the most informative pair of genes for diagnosing a specific disease. We have shown the efficacy of our proposed method in selecting informative genes through case studies in ovarian, colon, leukemia, breast and prostate cancers and diffuse large B-cell lymphoma. On these datasets, we have compared the original TSP with a reduced-size TSP, which is applied to a subset of differentially expressed genes selected based on the AUC probability, in terms of the selected pairs, computational cost, running time, and classification performance. In all of the case studies, the reduced-size TSP dramatically reduced the computational cost and time complexity of selecting the top scoring pair of genes in comparison to the original TSP, without degrading the performance of the classifier. Using the AUC probability, we were able to reduce the computational cost and CPU running time of the TSP by 79% and 84% respectively, on average, in the tested case studies. In addition, the use of the AUC probability prior to applying the TSP tends to avoid the selection of genes that are not expressed ("pivot" genes) due to the imposed condition. We have demonstrated through LOOCV and 5-fold cross validation that the reduced-size TSP and the TSP perform approximately the same in terms of classification accuracy for smaller threshold values. In conclusion, we suggest the use of the AUC test statistic to reduce the size of the dataset for the extensions of the TSP method, e.g. the k-TSP and TST, in order to make these methods feasible and cost effective. AUC Cancer diagnosis Gene expression Gene selection Microarray data analysis
7	SqueezeFit Linear Program: Fast and Robust Label-aware Dimensionality Reduction Lu, Tien-hsin 01 October 2020 (has links) No description available. Mathematics Dimensionality reduction linear program marker gene selection machine learning mathematical data science optimization
8	Gene Selection by 1-D Discrete Wavelet Transform for Classifying Cancer Samples Using DNA Microarray Data Jose, Adarsh 09 June 2009 (has links) No description available. Biomedical Research discrete wavelet transform microarray data cancer gene selection classification
9	Machine Learning to Interrogate High-throughput Genomic Data: Theory and Applications Yu, Guoqiang 19 September 2011 (has links) The missing heritability in genome-wide association studies (GWAS) is an intriguing open scientific problem which has attracted great recent interest. The interaction effects among risk factors, both genetic and environmental, are hypothesized to be one of the main sources of missing heritability. Moreover, detection of multilocus interaction effects may also have great implications for revealing disease/biological mechanisms, for accurate risk prediction, personalized clinical management, and targeted drug design. However, current analysis of GWAS largely ignores interaction effects, partly due to the lack of tools that meet the statistical and computational challenges posed by taking interaction effects into account. Here, we propose a novel statistically-based framework (Significant Conditional Association, SCA) for systematically exploring, assessing the significance of, and detecting interaction effects. Further, our SCA work has also revealed new theoretical results and insights on interaction detection, as well as theoretical performance bounds. Using in silico data, we show that the new approach has detection power significantly better than that of peer methods, while keeping the running time within a permissible range. More importantly, we applied our methods to several real data sets, confirming well-validated interactions with more convincing evidence (generating smaller p-values and requiring fewer samples) than that obtained through conventional methods, eliminating inconsistent results in the original reports, and making novel discoveries that are otherwise undetectable. The proposed methods provide a useful tool to mine new knowledge from existing GWAS and generate new hypotheses for further research. Microarray gene expression studies provide new opportunities for the molecular characterization of heterogeneous diseases. 
Multiclass gene selection is an imperative task for identifying phenotype-associated mechanistic genes and achieving accurate diagnostic classification. Most existing multiclass gene selection methods heavily rely on the direct extension of two-class gene selection methods. However, simple extensions of binary discriminant analysis to multiclass gene selection are suboptimal and not well-matched to the unique characteristics of the multi-category classification problem. We report a simpler and yet more accurate strategy than previous works for multicategory classification of heterogeneous diseases. Our method selects the union of one-versus-everyone phenotypic up-regulated genes (OVEPUGs) and matches this gene selection with a one-versus-rest support vector machine. Our approach provides even-handed gene resources for discriminating both neighboring and well-separated classes, and intends to assure the statistical reproducibility and biological plausibility of the selected genes. We evaluated the fold changes of OVEPUGs and found that only a small number of high-ranked genes were required to achieve superior accuracy for multicategory classification. We tested the proposed OVEPUG method on six real microarray gene expression data sets (five public benchmarks and one in-house data set) and two simulation data sets, observing significantly improved performance with lower error rates, fewer marker genes, and higher performance sustainability, as compared to several widely-adopted gene selection and classification methods. / Ph. D. Gene-Environment Interaction Gene-Gene Interaction Multi-category gene selection Genome-wide Association Study
10	Les réseaux bayésiens : classification et recherche de réseaux locaux en cancérologie / Classification and capture of regulation networks with bayesian networks in oncology Prestat, Emmanuel 25 May 2010 (has links) In oncology, microarrays measuring the transcriptome have become a common tool for characterizing pathologies more finely, in the hope of finding, through gene expression, the mechanisms, classes, molecular associations, and cellular interaction networks of different cancers. From a biological point of view, these interaction networks are interesting because they concentrate a large amount of knowledge about cellular processes. The goal of this PhD thesis is to extract, from the same expression data, structures that could correspond to genetic interaction networks. Bayesian Networks, a method both graphical and probabilistic that models even static systems (here, the gene expression network) as a network of conditional independences, are used as the framework to investigate this problem. Adapting this method to data in which the dimension of the variables (gene expression, on the order of 10^5) is much greater than the dimension of the samples (on the order of 10^2 in oncology) raises statistical problems (false positives and negatives) and combinatorial ones (with only 10 genes there are 4×10^18 possible directed acyclic graphs). Working from several cancer problems (leukemias and breast cancers), this project proposes a strategy for accelerating the search for expression networks with Bayesian Networks, together with applications of the method to classify tumors, to select a set of genes of interest related to a particular biological condition, and to search for local networks around a gene of interest. In parallel, we propose to model a Bayesian Network from a known biological network, which is useful for simulating samples and for testing graph reconstruction methods on controlled data. Cellular networks Transcriptome Bayesian Networks Classification Gene selection Cancer
