Global ETD Search

1	Paving the Randomized Gauss-Seidel Wu, Wei 01 January 2017 The Randomized Gauss-Seidel Method (RGS) is an iterative algorithm that solves overdetermined systems of linear equations Ax = b. This paper studies an extension of the RGS method, the Randomized Block Gauss-Seidel Method. At each step, the algorithm greedily minimizes the objective function L(x) = ‖Ax − b‖² with respect to a subset of coordinates. This paper describes a Randomized Block Gauss-Seidel Method (RBGS) that uses a randomized control method to choose a subset at each step. This algorithm is the first block RGS method with an expected linear convergence rate that can be described by the properties of the matrix A and its column submatrices. The analysis demonstrates that RBGS improves on RGS more when given an appropriate column paving of the matrix, that is, a partition of the columns into well-conditioned blocks. The main result yields an RBGS method that is more efficient than the simple RGS method. Numerical Analysis and Computation
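The block update described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name rbgs, the uniform sampling of blocks, and the hand-picked partition `blocks` (standing in for a column paving) are all assumptions made for illustration, and the abstract's "randomized control method" may differ. Each step minimizes the least-squares objective exactly over one randomly chosen block of coordinates.

```python
import numpy as np


def rbgs(A, b, blocks, n_iters=200, seed=0):
    """Sketch of a randomized block Gauss-Seidel iteration for min_x ||Ax - b||.

    `blocks` is a list of column-index lists partitioning the columns of A.
    Blocks are sampled uniformly at random (an assumption, not the thesis's rule).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(A.shape[1])
    r = b - A @ x  # current residual b - Ax
    for _ in range(n_iters):
        tau = blocks[rng.integers(len(blocks))]  # pick one block of columns
        A_tau = A[:, tau]
        # Exact minimization over the coordinates in tau:
        # delta = argmin_z ||A_tau z - r||
        delta, *_ = np.linalg.lstsq(A_tau, r, rcond=None)
        x[tau] += delta
        r -= A_tau @ delta  # keep the residual consistent with the new x
    return x
```

With blocks of size one, this reduces to a coordinate-wise Gauss-Seidel step, which is why the conditioning of each column block A_tau is the natural quantity to control.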
2	Analysis and Reconstruction of the Hematopoietic Stem Cell Differentiation Tree: A Linear Programming Approach for Gene Selection Ghadie, Mohamed A. January 2015 Stem cells differentiate through an organized hierarchy of intermediate cell types to terminally differentiated cell types. This process is largely guided by master transcriptional regulators, but it also depends on the expression of many other types of genes. The discrete cell types in the differentiation hierarchy are often identified based on the expression or non-expression of certain marker genes. Historically, these have often been various cell-surface proteins, which are fairly easy to assay biochemically but are not necessarily causative of the cell type, in the sense of being master transcriptional regulators. This raises important questions about how gene expression across the whole genome controls or reflects cell state, and in particular, differentiation hierarchies. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering and maximization of parsimony can organize the cell types into a tree, but in general this tree is different from the differentiation hierarchy. Using hematopoietic differentiation as an example, we demonstrate how many genes other than marker genes are able to discriminate between different branches of the differentiation tree by proposing two models for detecting genes that are up-regulated or down-regulated in distinct lineages. We then propose a novel approach to solving the following problem: Given the differentiation hierarchy and gene expression data at each node, construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy.
We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming framework to identify sparse sets of weights, effectively identifying the genes that are most relevant for discriminating different parts of the tree. We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a sparse weighted Euclidean metric that uses just 175 genes. These 175 genes are different from the marker genes that were used to identify the 38 cell types, hence offering a novel alternative way of discriminating different branches of the tree. A DAVID functional annotation analysis shows that the 175 genes reflect major processes and pathways active in different parts of the tree. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. Our results show that the 175 genes frequently appear in the random metrics, supporting their significance from an empirical point of view as well. Finally, we show how our linear programming method is able to identify columns that were selected to build minimum spanning trees on the nodes of random variable-size matrices. Linear Programming; Distance Metric Learning; Machine Learning; Feature Selection; Tree Reconstruction; Hierarchical Clustering; Minimum Spanning Tree; Clustering; Optimization; Maximum Parsimony; Euclidean Distance; Weighted Euclidean; Stem Cell Differentiation; Hematopoiesis; Transcriptional Regulation; Transcription Factor; Gene Selection; Gene Expression; Microarray; Cell Type; Marker Gene; Functional Annotation; Random Forest; Biological Function; Regulation; Statistical Significance; Erythropoiesis; Natural Killer Cell; T Cell; B Cell; Granulocyte; Monocyte; Megakaryocyte; Minimization; Linear Constraint; Cell Lineage
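The problem posed in this abstract, choosing gene weights so that the minimum spanning tree under a weighted Euclidean metric equals a given tree, can be made concrete with a small sketch. The constraints below come from the standard cycle property of minimum spanning trees (every non-tree pair must be farther apart than each tree edge on the path joining it); they are one natural sufficient set and are not claimed to be the constraints used in the thesis. All function names and the toy data are hypothetical. Because the squared weighted distance is linear in the weights, each constraint is a linear inequality that a linear programming solver could take as input.

```python
from itertools import combinations


def sq_dist(w, xu, xv):
    """Squared weighted Euclidean distance; linear in the weights w."""
    return sum(wg * (a - b) ** 2 for wg, a, b in zip(w, xu, xv))


def mst_edges(X, w):
    """Prim's algorithm on the complete graph over the rows of X."""
    n = len(X)
    in_tree, edges = {0}, set()
    while len(in_tree) < n:
        u, v = min(
            ((u, v) for u in in_tree for v in range(n) if v not in in_tree),
            key=lambda e: sq_dist(w, X[e[0]], X[e[1]]),
        )
        in_tree.add(v)
        edges.add(frozenset((u, v)))
    return edges


def tree_path(tree, a, b):
    """Edges on the unique a-b path in `tree` (a set of frozenset edges)."""
    adj = {}
    for e in tree:
        u, v = tuple(e)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent, stack = {a: None}, [a]
    while stack:  # depth-first search from a, recording parents
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                stack.append(v)
    path = []
    while b != a:  # walk back from b to a
        path.append(frozenset((b, parent[b])))
        b = parent[b]
    return path


def cycle_constraints(X, tree):
    """Rows c meaning c . w > 0, one per (non-tree pair, tree edge on its path)."""
    rows = []
    for a, b in combinations(range(len(X)), 2):
        if frozenset((a, b)) in tree:
            continue
        for e in tree_path(tree, a, b):
            u, v = tuple(e)
            rows.append([(xa - xb) ** 2 - (xu - xv) ** 2
                         for xa, xb, xu, xv in zip(X[a], X[b], X[u], X[v])])
    return rows
```

For example, with three nodes X = [[0.0, 0.0], [1.0, 3.0], [2.0, 0.5]] and the path tree 0-1-2, weighting only the first gene satisfies every constraint and reproduces the tree, while weighting only the second gene does not.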
