Global ETD Search

1	Algorithms for the analysis of gene expression data Venet, David 07 December 2004 High-throughput gene expression data have been generated on a large scale by biologists. The thesis describes a set of tools for the analysis of such data. It is geared especially towards microarray data. gene expression data bioinformatics
2	Pattern analysis of microarray data. / 基因芯片數據中的模式分析 / CUHK electronic theses & dissertations collection / Ji yin xin pian shu ju zhong de mo shi fen xi January 2009 DNA microarray technology is the most notable high-throughput technology to have emerged for functional genomics in recent years. Patterns in microarray data provide clues to gene functions, cell types, and interactions among genes or gene products. Since the scale of microarray data keeps growing, there is an urgent need for methods and tools to analyze these huge amounts of complex data. / Interesting patterns in microarray data can be patterns that appear with significant frequency or patterns that exhibit special trends. First, an algorithm to find biclusters with coherent values is proposed. In these biclusters, the subset of genes (or samples) shows some similarity, such as low Euclidean distance or high Pearson correlation coefficient. We propose the Average Correlation Value (ACV) to measure the homogeneity of a bicluster. ACV outperforms the alternatives because it is applicable to more types of biclusters. Our algorithm applies the dominant set approach to create sets of sorting vectors for the rows of the data matrix. In this way, the co-expressed rows of the data matrix can be gathered. By alternately sorting and transposing the data matrix, the blocks of co-expressed subsets are gathered. A weighted correlation coefficient is used to measure similarity at the gene level and the sample level. The weights are updated each time using the sorting vector of the previous iteration. Genes/samples that are assigned higher weights contribute more to the similarity measure when they are used as features for the other dimension. Unlike two-way clustering or divide-and-conquer algorithms, our approach does not break the structure of the whole data and can find multiple overlapping biclusters. 
The method also has a low computational cost compared to exhaustive enumeration and distribution parameter identification methods. / Next, algorithms to find biclusters with coherent evolutions, more specifically order-preserving patterns, are proposed. In an Order Preserving Cluster (OP-Cluster), genes induce the same relative order on samples, while the exact magnitudes of the data are disregarded. By converting each gene expression vector into an ordered label sequence, we transform the problem into finding frequent orders appearing in the sequence set. Two heuristic algorithms, Growing Prefix and Suffix (GPS) and Growing Frequent Position (GFP), are presented. The results show that both methods have good scale-up properties. They output larger OP-Clusters more efficiently and have lower space and computation costs compared to existing methods. / We propose the idea of Discovering Distinct Patterns (DDP) in gene expression data. The distinct patterns correspond to genes with significantly different patterns. DDP is useful for scaling down the analysis when there is little prior knowledge. A DDP algorithm is proposed that iteratively picks out pairs of genes with the largest dissimilarities. Experiments are conducted on both synthetic data sets and real microarray data. The results show its effectiveness and efficiency in finding functionally significant genes. The usefulness of genes with distinct patterns for constructing simplified gene regulatory networks is further discussed. / Teng, Li. / Adviser: Laiwan Chan. / Source: Dissertation Abstracts International, Volume: 71-01, Section: B, page: 0446. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 118-128). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. 
Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese. DNA microarrays Gene expression--Data processing
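The conversion step mentioned in the abstract of entry 2 (turning each gene expression vector into an ordered label sequence) can be illustrated with a minimal sketch. This is not the GPS or GFP algorithm from the thesis; the function names, the toy matrix, and the tie-breaking rule (stable sort) are assumptions made only for illustration.

```python
import numpy as np

def to_order_sequence(expr_row):
    """Return sample indices sorted by ascending expression value.

    Ties keep their original sample order (stable sort); this
    tie-breaking rule is an assumption, not taken from the thesis.
    """
    return tuple(np.argsort(expr_row, kind="stable").tolist())

def supports_order(expr_row, samples):
    """True if the row's values strictly increase along `samples`."""
    vals = [expr_row[s] for s in samples]
    return all(a < b for a, b in zip(vals, vals[1:]))

# Hypothetical toy matrix: rows are genes, columns are samples.
data = np.array([
    [1.0, 5.0, 3.0, 9.0],
    [10.0, 40.0, 20.0, 80.0],
    [7.0, 2.0, 5.0, 1.0],
])

# One label sequence per gene; magnitudes are discarded, only order is kept.
sequences = [to_order_sequence(row) for row in data]

# Genes 0 and 1 differ in scale but induce the same order on the samples,
# so they would support the same order-preserving pattern.
same_order = sequences[0] == sequences[1]
```

Under this representation, finding an order-preserving cluster amounts to finding sample orders shared by many gene sequences, which is the frequent-order problem the abstract refers to.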
3	Surrogate variable analysis / Leek, Jeffrey Tullis. January 2007 Thesis (Ph. D.)--University of Washington, 2007. / Vita. Includes bibliographical references (p. 113-121).
4	Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure 2015 September 1900 Feature selection is demanded in many modern scientific research problems that use high-dimensional data. A typical example is finding the genes most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. There are tremendous difficulties in eliminating a large number of useless or redundant features. The expression levels of genes have structure; for example, a group of co-regulated genes that have similar biological functions tends to have similar mRNA expression levels. Many statistical methods have been proposed to take the grouping structure into consideration in feature selection and regression, including Group LASSO, Supervised Group LASSO, and regression on group representatives. In this thesis, we propose to use a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors in order to select features with grouping structure. We will refer to this method as fully Bayesian T-probit. The main feature of fully Bayesian T-probit is that it can select features within groups automatically, without pre-specification of the grouping structure, and can discard noise features more efficiently than LASSO (Least Absolute Shrinkage and Selection Operator). Therefore, the feature subsets selected by fully Bayesian T-probit are significantly sparser than subsets selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret or understand based on existing biological knowledge and further experimental investigations. 
In this thesis, we use simulated and real datasets to demonstrate that the predictive performance of the sparser feature subsets selected by fully Bayesian T-probit is comparable with that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression and t-test. In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than feature subsets of the same size taken from the top features selected by the aforementioned methods. Bayesian methods probit MCMC gene expression data grouping structure
5	The development and application of informatics-based systems for the analysis of the human transcriptome. Kelso, Janet January 2003 Despite the fact that the sequence of the human genome is now complete, it has become clear that the elucidation of the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena such as alternative splicing in the transcriptome. As a result, the identification of novel transcripts arising from the genome continues. Furthermore, as the volume of transcript data grows, it is becoming increasingly difficult to integrate expression information that comes from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile (the location and timing of transcript expression) provides evidence that can be used in understanding the role of the expressed transcript in the organ or tissue under study, or in the developmental pathways or disease phenotype observed. In this dissertation I present novel computational approaches with direct biological applications to two distinct but increasingly important areas of gene expression research. The first addresses detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. 
In the final chapter, the biological questions that can be approached and the discoveries that can be made using these systems are illustrated, with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome. Gene expression, Data processing Genetics, Data processing Genomes, Data processing.
6	Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering Xiong, Xuejian, Tan, Kian Lee 01 1900 In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged based on the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated by a fuzzy cluster similarity matrix, while an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as principal components of the clusters. The number of principal components is determined automatically from the data being clustered. The performance of this unsupervised fuzzy clustering algorithm is evaluated in several experiments on an artificial data set and a gene expression data set. / Singapore-MIT Alliance (SMA) cluster merging unsupervised fuzzy clustering cluster validity gene expression data
7	Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion Zarkoob, Hadi 21 May 2010 DNA microarrays are capable of measuring expression levels of thousands of genes, even the whole genome, in a single experiment. Based on this, they have been widely used to extend the studies of cancerous tissues to a genomic level. One of the main goals in DNA microarray experiments is to identify a set of relevant genes such that the desired outputs of the experiment mostly depend on this set, to the exclusion of the rest of the genes. This is motivated by the fact that a biological process in a cell typically involves only a subset of genes, and not the whole genome. The task of selecting a subset of relevant genes is called feature (gene) selection. Herein, we propose a feature selection algorithm for gene expression data. It is based on the Hilbert-Schmidt independence criterion (HSIC), and partly motivated by Rank-One Downdate (R1D) and the Singular Value Decomposition (SVD). The algorithm is computationally very fast and scalable to large data sets, and can be applied to response variables of arbitrary type (categorical and continuous). Experimental results of the proposed technique are presented on some synthetic and well-known microarray data sets. Later, we discuss the capability of HSIC in providing a general framework which encapsulates many widely used techniques for dimensionality reduction, clustering and metric learning. We will use this framework to explain two metric learning algorithms, namely the Fisher discriminant analysis (FDA) and closed form metric learning (CFML). As a result of this framework, we are able to propose a new metric learning method. The proposed technique uses the concepts from normalized cut spectral clustering and is associated with an underlying convex optimization problem. Feature selection Hilbert-Schmidt Independence Criterion Gene expression data Statistics
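As a rough illustration of the criterion named in entry 7, the sketch below computes the standard biased empirical HSIC estimate, tr(KHLH) / (n - 1)^2, and uses it to rank genes one at a time against a label vector. This is not the algorithm proposed in the thesis (which the abstract says is also motivated by R1D and the SVD); the linear kernels, the per-gene ranking scheme, the function name, and the toy data are all assumptions chosen for brevity.

```python
import numpy as np

def hsic_linear(x, y):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, with linear kernels.

    K and L are Gram matrices of x and y; H is the centering matrix.
    The linear kernel is an illustrative assumption; other kernels
    (e.g. Gaussian) can be substituted for K and L.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = x.shape[0]
    K = x @ x.T
    L = y @ y.T
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Hypothetical toy data: 6 samples (rows) by 3 genes (columns).
expression = np.array([
    [0.1,  1.0, 0.0],
    [0.2, -1.0, 1.0],
    [0.0,  0.0, 0.0],
    [1.1,  1.0, 1.0],
    [0.9, -1.0, 0.0],
    [1.0,  0.0, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Score each gene by its dependence on the labels, then rank descending.
scores = np.array([
    hsic_linear(expression[:, j], labels)
    for j in range(expression.shape[1])
])
ranking = np.argsort(-scores)
```

With linear kernels the estimate reduces to the squared sample covariance between a gene and the labels (up to the normalization constant), so this toy ranking only captures linear dependence; a nonlinear kernel would be needed to detect more general relationships.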
8	GAGS : A Novel Microarray Gene Selection Algorithm for Gene Expression Classification Wu, Kuo-yi 30 July 2010 In this thesis, we propose a novel microarray gene selection algorithm consisting of five processes for solving the gene expression classification problem. A normalization process is first used to remove the differences among different scales of genes. Second, an efficient gene ranking process is proposed to filter out the unrelated genes. Then, a genetic algorithm is adopted to find the informative gene subsets for each class. For each class, these informative gene subsets are used to classify the testing dataset separately. Finally, the separate classification results are fused into one final classification result. In the first experiment, 4 microarray datasets are used to verify the performance of the proposed algorithm. The experiment is conducted using the leave-one-out cross-validation (LOOCV) resampling method. We compared the proposed algorithm with twenty-one existing methods. The proposed algorithm obtains three wins in four datasets, and the accuracies on three datasets all reach 100%. In the second experiment, 9 microarray datasets are used to verify the proposed algorithm. The experiment is conducted using the 50% vs. 50% resampling method. Our proposed algorithm obtains eight wins among nine datasets against all competing methods. Feature selection Gene expression data analysis Genetic algorithm
9	HARP: a practical projected clustering algorithm for mining gene expression data Yip, Yuk-Lap, Kevin., 葉旭立. January 2003 Computer Science and Information Systems / Master / Master of Philosophy Gene expression - Data processing. Computer algorithms. Cluster analysis. Data mining.
