About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Algorithms for the analysis of gene expression data

Venet, David 07 December 2004
High-throughput gene expression data have been generated on a large scale by biologists. The thesis describes a set of tools for the analysis of such data. It is especially geared towards microarray data.
2

Pattern analysis of microarray data. / 基因芯片數據中的模式分析 (Pattern analysis in gene chip data) / CUHK electronic theses & dissertations collection / Ji yin xin pian shu ju zhong de mo shi fen xi

January 2009
DNA microarray technology is the most notable high-throughput technology to have emerged for functional genomics in recent years. Patterns in microarray data provide clues to gene functions, cell types, and interactions among genes or gene products. Since the scale of microarray data keeps growing, there is an urgent need for methods and tools to analyze these huge amounts of complex data. / Interesting patterns in microarray data can be patterns that appear with significant frequency or patterns that exhibit special trends. First, an algorithm to find biclusters with coherent values is proposed. In such biclusters, the subset of genes (or samples) shows some similarity, such as a low Euclidean distance or a high Pearson correlation coefficient. We propose the Average Correlation Value (ACV) to measure the homogeneity of a bicluster; ACV outperforms the alternatives in that it applies to more types of biclusters. Our algorithm applies the dominant set approach to create sets of sorting vectors for the rows of the data matrix, so that co-expressed rows are gathered together. By alternately sorting and transposing the data matrix, the blocks of co-expressed subsets are gathered. A weighted correlation coefficient is used to measure similarity at both the gene level and the sample level, and the weights are updated in each iteration using the sorting vector from the previous iteration. Genes (or samples) assigned higher weights contribute more to the similarity measure when they serve as features for the other dimension. Unlike two-way clustering or divide-and-conquer algorithms, our approach does not break up the structure of the whole data set and can find multiple overlapping biclusters. The method also has a lower computational cost than exhaustive enumeration and distribution-parameter identification methods.
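The abstract names the Average Correlation Value but does not define it. The sketch below assumes ACV is the larger of two averages: the mean absolute pairwise Pearson correlation among the bicluster's rows, and the same quantity among its columns. It is a plain-Python illustration under that assumption, with hypothetical function names, not the thesis's code.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def mean_abs_pairwise_corr(vectors):
    """Mean |Pearson r| over all distinct pairs of vectors (needs at least two vectors)."""
    pairs = list(combinations(vectors, 2))
    return sum(abs(pearson(u, v)) for u, v in pairs) / len(pairs)


def acv(bicluster):
    """Average Correlation Value of a bicluster (rows = genes, columns = samples).

    Assumed definition: max(mean |r| among rows, mean |r| among columns).
    Requires at least two rows and two columns.
    """
    rows = [list(r) for r in bicluster]
    cols = [list(c) for c in zip(*rows)]
    return max(mean_abs_pairwise_corr(rows), mean_abs_pairwise_corr(cols))
```

A value near 1 means the rows (or the columns) move together up to shifts and scalings; an additive bicluster such as `[[1, 2, 3], [2, 3, 4], [5, 6, 7]]` scores 1.0 up to floating-point rounding.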
Next, algorithms to find biclusters with coherent evolutions, more specifically order-preserving patterns, are proposed. In an Order Preserving Cluster (OP-Cluster), the genes induce the same relative order on the samples, while the exact magnitudes of the data are disregarded. By converting each gene expression vector into an ordered label sequence, we transform the problem into finding frequent orders in the sequence set. Two heuristic algorithms, Growing Prefix and Suffix (GPS) and Growing Frequent Position (GFP), are presented. The results show that both methods scale up well: they output larger OP-Clusters more efficiently and have lower space and computation costs than existing methods. / We propose the idea of Discovering Distinct Patterns (DDP) in gene expression data. The distinct patterns correspond to genes with significantly different patterns. DDP is useful for scaling down the analysis when there is little prior knowledge. A DDP algorithm is proposed that iteratively picks out the pairs of genes with the largest dissimilarities. Experiments are conducted on both synthetic data sets and real microarray data, and the results show the method's effectiveness and efficiency in finding functionally significant genes. The usefulness of genes with distinct patterns for constructing simplified gene regulatory networks is also discussed. / Teng, Li. / Adviser: Laiwan Chan. / Source: Dissertation Abstracts International, Volume: 71-01, Section: B, page: 0446. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 118-128). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts in English and Chinese.
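The OP-Cluster reformulation in the abstract above can be illustrated in a few lines: each gene's expression vector becomes the sequence of sample labels sorted by expression value, and a candidate order of samples is supported by every gene whose label sequence contains that order as a subsequence. This is a minimal sketch of the reformulation only; the GPS and GFP search heuristics are not reproduced, and the function names are hypothetical.

```python
def to_label_sequence(values, labels):
    """Sample labels ordered by ascending expression value (ties keep input order)."""
    return [label for _, label in sorted(zip(values, labels), key=lambda pair: pair[0])]


def supports(order, sequence):
    """True if `order` appears in `sequence` as a (not necessarily contiguous) subsequence."""
    remaining = iter(sequence)
    # `label in remaining` advances the iterator past the match, so order is enforced.
    return all(label in remaining for label in order)


def supporting_genes(expression, labels, order):
    """Genes (dict keys) whose label sequence induces the given relative order of samples."""
    return [
        gene
        for gene, values in expression.items()
        if supports(order, to_label_sequence(values, labels))
    ]
```

For example, `to_label_sequence([1, 5, 3], ["s1", "s2", "s3"])` gives `["s1", "s3", "s2"]`. An OP-Cluster then corresponds to a large set of genes that all support the same long order, which is why the search can be framed as frequent-order mining over these sequences.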
3

Surrogate variable analysis

Leek, Jeffrey Tullis. January 2007
Thesis (Ph. D.)--University of Washington, 2007. / Vita. Includes bibliographical references (p. 113-121).
4

Fully Bayesian T-probit Regression with Heavy-tailed Priors for Selection in High-Dimensional Features with Grouping Structure

September 2015
Feature selection is needed in many modern scientific research problems that use high-dimensional data. A typical example is finding the genes most related to a certain disease (e.g., cancer) from high-dimensional gene expression profiles. Eliminating a large number of useless or redundant features is very difficult. The expression levels of genes have structure; for example, a group of co-regulated genes with similar biological functions tends to have similar mRNA expression levels. Many statistical methods have been proposed that take the grouping structure into account in feature selection and regression, including Group LASSO, Supervised Group LASSO, and regression on group representatives. In this thesis, we propose using a sophisticated Markov chain Monte Carlo method (Hamiltonian Monte Carlo with restricted Gibbs sampling) to fit T-probit regression with heavy-tailed priors in order to select features with grouping structure. We refer to this method as fully Bayesian T-probit. Its main feature is that it selects features within groups automatically, without pre-specifying the grouping structure, and discards noise features more efficiently than LASSO (Least Absolute Shrinkage and Selection Operator). As a result, the feature subsets selected by fully Bayesian T-probit are significantly sparser than those selected by many other methods in the literature. Such succinct feature subsets are much easier to interpret in light of existing biological knowledge and further experimental investigation. Using simulated and real data sets, we demonstrate that the predictive performance of the sparser feature subsets selected by fully Bayesian T-probit is comparable to that of the much larger feature subsets selected by plain LASSO, Group LASSO, Supervised Group LASSO, random forest, penalized logistic regression, and the t-test.
In addition, we demonstrate that the succinct feature subsets selected by fully Bayesian T-probit have significantly better predictive power than the feature subsets of the same size taken from the top features selected by the aforementioned methods.
5

The development and application of informatics-based systems for the analysis of the human transcriptome.

Kelso, Janet January 2003
Despite the fact that the sequence of the human genome is now complete, it has become clear that elucidating the transcriptome is more complicated than previously expected. There is mounting evidence for unexpected and previously underestimated phenomena, such as alternative splicing, in the transcriptome. As a result, novel transcripts arising from the genome continue to be identified. Furthermore, as the volume of transcript data grows, it is becoming increasingly difficult to integrate expression information that comes from different sources, is stored in disparate locations, and is described using differing terminologies. Determining the function of translated transcripts also remains a complex task. Information about the expression profile (the location and timing of transcript expression) provides evidence that can be used to understand the role of the expressed transcript in the organ or tissue under study, or in the developmental pathways or disease phenotypes observed.

In this dissertation I present novel computational approaches, with direct biological applications, to two distinct but increasingly important areas of gene expression research. The first is the detection and characterisation of alternatively spliced transcripts. The second is the construction of a hierarchical controlled vocabulary for gene expression data and the annotation of expression libraries with controlled terms from the hierarchies. In the final chapter, the biological questions that can be approached and the discoveries that can be made using these systems are illustrated, with a view to demonstrating how the application of informatics can both enable and accelerate biological insight into the human transcriptome.
6

Intensity based methodologies for facial expression recognition.

January 2001
by Hok Chun Lo. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. / Includes bibliographical references (leaves 136-143). / Abstracts in English and Chinese.
Contents:
  LIST OF FIGURES --- p.viii
  LIST OF TABLES --- p.x
  Chapter 1. INTRODUCTION --- p.1
  Chapter 2. PREVIOUS WORK ON FACIAL EXPRESSION RECOGNITION --- p.9
    2.1. Active Deformable Contour --- p.9
    2.2. Facial Feature Points and B-spline Curve --- p.10
    2.3. Optical Flow Approach --- p.11
    2.4. Facial Action Coding System --- p.12
    2.5. Neural Network --- p.13
  Chapter 3. EIGEN-ANALYSIS BASED METHOD FOR FACIAL EXPRESSION RECOGNITION --- p.15
    3.1. Related Topics on Eigen-Analysis Based Method --- p.15
      3.1.1. Terminologies --- p.15
      3.1.2. Principal Component Analysis --- p.17
      3.1.3. Significance of Principal Component Analysis --- p.18
      3.1.4. Graphical Presentation of the Idea of Principal Component Analysis --- p.20
    3.2. EigenFace Method for Face Recognition --- p.21
    3.3. Eigen-Analysis Based Method for Facial Expression Recognition --- p.23
      3.3.1. Person-Dependent Database --- p.23
      3.3.2. Direct Adoption of EigenFace Method --- p.24
      3.3.3. Multiple Subspaces Method --- p.27
    3.4. Detail Description on Our Approaches --- p.29
      3.4.1. Database Formation --- p.29
        a. Conversion of Image to Column Vector --- p.29
        b. Preprocess: Scale Regulation, Orientation Regulation and Cropping --- p.30
        c. Scale Regulation --- p.31
        d. Orientation Regulation --- p.32
        e. Cropping of images --- p.33
        f. Calculation of Expression Subspace for Direct Adoption Method --- p.35
        g. Calculation of Expression Subspace for Multiple Subspaces Method --- p.38
      3.4.2. Recognition Process for Direct Adoption Method --- p.38
      3.4.3. Recognition Process for Multiple Subspaces Method --- p.39
        a. Intensity Normalization Algorithm --- p.39
        b. Matching --- p.44
    3.5. Experimental Result and Analysis --- p.45
  Chapter 4. DEFORMABLE TEMPLATE MATCHING SCHEME FOR FACIAL EXPRESSION RECOGNITION --- p.53
    4.1. Background Knowledge --- p.53
      4.1.1. Camera Model --- p.53
        a. Pinhole Camera Model and Perspective Projection --- p.54
        b. Orthographic Camera Model --- p.56
        c. Affine Camera Model --- p.57
      4.1.2. View Synthesis --- p.58
        a. Technique Issue of View Synthesis --- p.59
    4.2. View Synthesis Technique for Facial Expression Recognition --- p.68
      4.2.1. From View Synthesis Technique to Template Deformation --- p.69
    4.3. Database Formation --- p.71
      4.3.1. Person-Dependent Database --- p.72
      4.3.2. Model Images Acquisition --- p.72
      4.3.3. Templates' Structure and Formation Process --- p.73
      4.3.4. Selection of Warping Points and Template Anchor Points --- p.77
        a. Selection of Warping Points --- p.78
        b. Selection of Template Anchor Points --- p.80
    4.4. Recognition Process --- p.81
      4.4.1. Solving Warping Equation --- p.83
      4.4.2. Template Deformation --- p.83
      4.4.3. Template from Input Images --- p.86
      4.4.4. Matching --- p.87
    4.5. Implementation of Automation System --- p.88
      4.5.1. Kalman Filter --- p.89
      4.5.2. Using Kalman Filter for Tracking in Our System --- p.89
      4.5.3. Limitation --- p.92
    4.6. Experimental Result and Analysis --- p.93
  Chapter 5. CONCLUSION AND FUTURE WORK --- p.97
  APPENDIX --- p.100
    I. Image Sample 1 --- p.100
    II. Image Sample 2 --- p.109
    III. Image Sample 3 --- p.119
    IV. Image Sample 4 --- p.135
  BIBLIOGRAPHY --- p.136
7

Similarity-Driven Cluster Merging Method for Unsupervised Fuzzy Clustering

Xiong, Xuejian, Tan, Kian Lee 01 1900
In this paper, a similarity-driven cluster merging method is proposed for unsupervised fuzzy clustering. The cluster merging method is used to resolve the problem of cluster validation. Starting with an overspecified number of clusters in the data, pairs of similar clusters are merged according to the proposed similarity-driven cluster merging criterion. The similarity between clusters is calculated from a fuzzy cluster similarity matrix, and an adaptive threshold is used for merging. In addition, a modified generalized objective function is used for prototype-based fuzzy clustering. The function includes the p-norm distance measure as well as the principal components of the clusters. The number of principal components is determined automatically from the data being clustered. The performance of this unsupervised fuzzy clustering algorithm is evaluated in several experiments on an artificial data set and a gene expression data set. / Singapore-MIT Alliance (SMA)
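The abstract does not give the similarity or threshold formulas, so the sketch below substitutes simple stand-ins to show the shape of a merge loop over fuzzy clusters: the similarity of two clusters is taken as the fuzzy Jaccard overlap of their membership vectors, and a fixed `threshold` parameter replaces the paper's adaptive threshold. Both choices, and all function names, are assumptions for illustration only.

```python
def fuzzy_similarity(u, v):
    """Fuzzy Jaccard overlap of two membership vectors (stand-in similarity, in [0, 1])."""
    overlap = sum(min(a, b) for a, b in zip(u, v))
    union = sum(max(a, b) for a, b in zip(u, v))
    return overlap / union if union else 0.0


def merge_similar_clusters(memberships, threshold):
    """Greedily merge the most similar pair of fuzzy clusters until no pair exceeds `threshold`.

    `memberships` holds one membership vector per cluster over the same data points.
    Merged vectors are summed, so each point's total membership is unchanged.
    `threshold` is a fixed stand-in for the paper's adaptive threshold.
    """
    clusters = [list(u) for u in memberships]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = fuzzy_similarity(clusters[i], clusters[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score <= threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        merged = [a + b for a, b in zip(clusters[i], clusters[j])]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Starting from an overspecified set, two near-duplicate clusters (identical membership vectors) score 1.0 and are merged first, while a well-separated cluster survives as long as its overlap stays below the threshold.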
8

Feature Selection for Gene Expression Data Based on Hilbert-Schmidt Independence Criterion

Zarkoob, Hadi 21 May 2010
DNA microarrays are capable of measuring the expression levels of thousands of genes, even the whole genome, in a single experiment. As a result, they have been widely used to extend the study of cancerous tissues to the genomic level. One of the main goals in DNA microarray experiments is to identify a set of relevant genes on which the desired outputs of the experiment mostly depend, to the exclusion of the rest of the genes. This is motivated by the fact that a biological process in a cell typically involves only a subset of genes, not the whole genome. The task of selecting a subset of relevant genes is called feature (gene) selection. Herein, we propose a feature selection algorithm for gene expression data. It is based on the Hilbert-Schmidt independence criterion (HSIC) and is partly motivated by Rank-One Downdate (R1D) and the Singular Value Decomposition (SVD). The algorithm is computationally very fast, scales to large data sets, and can be applied to response variables of arbitrary type (categorical or continuous). Experimental results of the proposed technique are presented on several synthetic and well-known microarray data sets. Later, we discuss how HSIC provides a general framework that encapsulates many widely used techniques for dimensionality reduction, clustering and metric learning. We use this framework to explain two metric learning algorithms, namely Fisher discriminant analysis (FDA) and closed-form metric learning (CFML). Building on this framework, we propose a new metric learning method. The proposed technique uses concepts from normalized-cut spectral clustering and is associated with an underlying convex optimization problem.
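For reference, the biased empirical HSIC estimator on n samples is trace(KHLH)/(n-1)^2, where K and L are kernel (Gram) matrices on the two variables and H = I - (1/n)11^T centers them. The sketch below uses that estimator as a simple per-gene filter: a Gaussian kernel on each gene's expression values, a delta kernel on class labels, and genes ranked by their HSIC score. It is a baseline under those kernel assumptions (with a hypothetical bandwidth `sigma`), not the thesis's R1D/SVD-motivated algorithm.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel on scalar values."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values] for a in values]


def delta_gram(labels):
    """Gram matrix of the delta kernel: 1 if two samples share a label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def hsic(K, L):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2 with H = I - 11^T / n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            centered = K[i][j] - row_means[i] - col_means[j] + grand_mean  # (HKH)_ij
            total += centered * L[j][i]  # trace(HKH L) = sum_ij (HKH)_ij L_ji
    return total / (n - 1) ** 2


def rank_genes_by_hsic(expression, labels, sigma=1.0):
    """Gene names sorted by decreasing HSIC between expression values and class labels."""
    L = delta_gram(labels)
    scores = {gene: hsic(gaussian_gram(v, sigma), L) for gene, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the delta kernel handles categorical labels and a Gaussian kernel on the response would handle continuous ones, the same score covers both response types mentioned in the abstract.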
9

GAGS: A Novel Microarray Gene Selection Algorithm for Gene Expression Classification

Wu, Kuo-yi 30 July 2010
In this thesis, we propose a novel microarray gene selection algorithm consisting of five processes for solving the gene expression classification problem. First, a normalization process removes the differences in scale among genes. Second, an efficient gene ranking process filters out unrelated genes. Third, a genetic algorithm is used to find informative gene subsets for each class. Fourth, the informative gene subsets of each class are used to classify the testing dataset separately. Finally, the separate classification results are fused into one final classification result. In the first experiment, 4 microarray datasets are used to evaluate the performance of the proposed algorithm, using the leave-one-out cross-validation (LOOCV) resampling method. We compared the proposed algorithm with twenty-one existing methods. The proposed algorithm obtains three wins in four datasets and reaches 100% accuracy on three of them. In the second experiment, 9 microarray datasets are used to evaluate the proposed algorithm, using a 50%/50% split resampling method. Our proposed algorithm obtains eight wins in nine datasets against all competing methods.
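The first two GAGS processes and the LOOCV protocol can be sketched with stand-ins, since the abstract does not specify the normalization, the ranking score, or the classifier: here each gene is z-scored, genes are ranked by a Golub-style signal-to-noise ratio, and a nearest-centroid classifier on the top `k` genes is evaluated by leave-one-out cross-validation. The genetic-algorithm and fusion steps are omitted, labels are assumed to be 0/1, and all names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_genes(expression):
    """Per-gene z-score so that genes measured on different scales become comparable."""
    normalized = {}
    for gene, values in expression.items():
        m, s = mean(values), pstdev(values)
        normalized[gene] = [(x - m) / s if s else 0.0 for x in values]
    return normalized


def signal_to_noise(values, labels):
    """Golub-style score |mean0 - mean1| / (sd0 + sd1) for labels 0 and 1."""
    a = [x for x, y in zip(values, labels) if y == 0]
    b = [x for x, y in zip(values, labels) if y == 1]
    diff, spread = abs(mean(a) - mean(b)), pstdev(a) + pstdev(b)
    if spread == 0:
        return float("inf") if diff else 0.0
    return diff / spread


def top_genes(expression, labels, k):
    """The k genes with the highest signal-to-noise score."""
    return sorted(expression, key=lambda g: signal_to_noise(expression[g], labels), reverse=True)[:k]


def nearest_centroid(train, train_labels, sample, genes):
    """Predict the class whose centroid (over `genes`) is closest in squared Euclidean distance."""
    best = None
    for cls in sorted(set(train_labels)):
        idx = [i for i, y in enumerate(train_labels) if y == cls]
        dist = sum((sample[g] - mean(train[g][i] for i in idx)) ** 2 for g in genes)
        if best is None or dist < best[0]:
            best = (dist, cls)
    return best[1]


def loocv_accuracy(expression, labels, k):
    """Leave-one-out accuracy; gene ranking is redone inside each fold."""
    n = len(labels)
    correct = 0
    for held in range(n):
        keep = [i for i in range(n) if i != held]
        train = {g: [v[i] for i in keep] for g, v in expression.items()}
        train_labels = [labels[i] for i in keep]
        genes = top_genes(train, train_labels, k)  # held-out sample never influences selection
        sample = {g: v[held] for g, v in expression.items()}
        correct += nearest_centroid(train, train_labels, sample, genes) == labels[held]
    return correct / n
```

Ranking is redone inside each fold so the held-out sample never influences which genes are chosen; selecting genes on the full data set first would give an optimistically biased LOOCV accuracy. The z-score step uses no labels, so applying it once to the whole matrix, as the first GAGS process suggests, does not leak class information.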
10

Graph-based Support Vector Machines for Patient Response Prediction Using Pathway and Gene Expression Data

Huang, Norman Jason 14 October 2013
Over the past decade, multiple functional genomic datasets studying chromosomal aberrations and their downstream implications for gene expression have accumulated across a variety of cancer types. Because most of these are paired copy number/gene expression profiles from the same patient groups, this period has also seen a wealth of integrative attempts, in the hope that analyzing both genomic data types together will improve downstream results. Building on this concept, this dissertation contributes a novel statistical methodology for integrating copy number and gene expression data to predict treatment response in multiple myeloma patients.
