Global ETD Search

11	Identification of gene expression changes in human cancer using bioinformatic approaches Griffith, Obi Lee 05 1900 (has links) The human genome contains tens of thousands of gene loci which code for an even greater number of protein and RNA products. The highly complex temporal and spatial expression of these genes makes possible all the biological processes of life. Altered gene expression by mutation or deregulation is fundamental for the development of many human diseases. The ultimate aim of this thesis was to identify gene expression changes relevant to cancer. The advent of genome-wide expression profiling techniques, such as microarrays, has provided powerful new tools to identify such changes and researchers are now faced with an explosion of gene expression data. Processing, comparing and integrating these data present major challenges. I approached these challenges by developing and assessing novel methods for cross-platform analysis of expression data, scalable subspace clustering, and curation of experimental gene regulation data from the published literature. I found that combining results from different expression platforms increases reliability of coexpression predictions. However, I also observed that global correlation between platforms was generally low, and few gene pairs reached reasonable thresholds for high-confidence coexpression. Therefore, I developed a novel subspace clustering algorithm, able to identify coexpressed genes in experimental subsets of very large gene expression datasets. Biological assessment against several metrics indicates that this algorithm performs well. I also developed a novel meta-analysis method to identify consistently reported genes from differential expression studies when raw data are unavailable. This method was applied to thyroid cancer, producing a ranked list of significantly over-represented genes. 
Tissue microarray analysis of some of these candidates and others identified a number of promising biomarkers for diagnostic and prognostic classification of thyroid cancer. Finally, I present ORegAnno (www.oreganno.org), a resource for the community-driven curation of experimentally verified regulatory sequences. This resource has proven highly successful, with ~30,000 sequences entered from over 900 publications by ~50 contributing users. These data, methods and resources contribute to our overall understanding of gene regulation, gene expression, and the changes that occur in cancer. Such an understanding should help identify new cancer mechanisms and potential treatment targets, and should have significant diagnostic and prognostic implications. / Medicine, Faculty of / Medical Genetics, Department of / Graduate Bioinformatics Gene expression Gene regulation SAGE Tissue microarray Thyroid cancer Subspace clustering Biclustering Ontology Biomarker
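The abstract does not state how the meta-analysis ranks genes when raw data are unavailable. As a minimal sketch, assuming a simple vote-counting scheme (rank each gene and direction of change by the number of independent studies reporting it, breaking ties by total sample size), the idea could look like this. The function name `vote_count`, the tie-breaking rule, and the GENE_A/GENE_B/GENE_C inputs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each study: (sample size, {gene: "up" or "down"}). All values here are hypothetical.
Study = Tuple[int, Dict[str, str]]


def vote_count(studies: List[Study]) -> List[Tuple[str, str, int, int]]:
    """Rank (gene, direction) pairs by the number of studies reporting them.

    Ties are broken by the total sample size of the reporting studies, then
    alphabetically, so the ranking is deterministic.
    """
    votes: Dict[Tuple[str, str], int] = defaultdict(int)
    samples: Dict[Tuple[str, str], int] = defaultdict(int)
    for sample_size, reported in studies:
        for gene, direction in reported.items():
            votes[(gene, direction)] += 1
            samples[(gene, direction)] += sample_size
    ranked = sorted(votes, key=lambda key: (-votes[key], -samples[key], key))
    return [(gene, direction, votes[(gene, direction)], samples[(gene, direction)])
            for gene, direction in ranked]


if __name__ == "__main__":
    studies = [
        (40, {"GENE_A": "up", "GENE_B": "down", "GENE_C": "up"}),
        (25, {"GENE_A": "up", "GENE_B": "down"}),
        (60, {"GENE_C": "up", "GENE_B": "up"}),
    ]
    for row in vote_count(studies):
        print(row)
```

Genes reported in the same direction by more studies rise to the top. A real analysis would also need a significance test for over-representation, which this sketch omits.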
12	Sparse subspace clustering-based motion segmentation with complete occlusion handling Mattheus, Jana January 2021 (has links) Motion segmentation is part of the computer vision field and aims to find the moving parts in a video sequence. It is used in applications such as autonomous driving, surveillance, robotics, human motion analysis, and video indexing. Because it serves so many applications, motion segmentation is ill-defined and the research field is vast. Despite advances in the research over the years, existing methods are still far behind human capabilities. Problems such as changes in illumination, camera motion, noise, mixtures of motion, missing data, and occlusion remain challenges. Feature-based approaches have grown in popularity over the years, especially manifold clustering methods due to their strong mathematical foundation. Methods exploiting sparse and low-rank representations are often used since they reduce the dimensionality of the data while extracting useful information about the motion segments. However, these methods cannot effectively handle large and complete occlusions or missing data, since they tend to fail when the amount of missing data becomes too large. An algorithm based on Sparse Subspace Clustering (SSC) has been proposed to address occlusions and missing data so that SSC can handle these cases with high accuracy. A frame-to-frame analysis was adopted as a pre-processing step to identify motion segments between consecutive frames, called inter-frame motion segments. The pre-processing step is called Multiple Split-And-Merge (MSAM), which is based on the classic top-down split-and-merge algorithm. Only points present in both frames of a pair are segmented. This means that a point undergoing an occlusion is only assigned to a motion class once it has been visible for two consecutive frames after re-entering the camera view.
Once all the inter-frame segments have been extracted, the results are combined into a single matrix and used as the input to the classic SSC algorithm. Therefore, SSC segments inter-frame motion segments rather than point trajectories. The resulting algorithm is referred to as MSAM-SSC. MSAM-SSC outperformed some of the most popular manifold clustering methods on the Hopkins155 and KT3DMoSeg datasets. It was also able to handle complete occlusions, sequences with 50% missing data, and outliers. The algorithm can handle mixtures of motions and different numbers of motions. However, MSAM-SSC was found to be better suited to traffic and articulated motion scenes, which are common in applications such as robotics, surveillance, and autonomous driving. In future work, the algorithm could be optimised to reduce its execution time for real-time applications. Additionally, the number of moving objects in the scene could be estimated to obtain a method that does not rely on prior knowledge. / Dissertation (MEng (Computer Engineering))--University of Pretoria, 2021. / CSIR / Electrical, Electronic and Computer Engineering / MEng (Computer Engineering) / Unrestricted UCTD Motion segmentation Motion analysis Sparse subspace clustering Manifold clustering Computer vision
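MSAM-SSC feeds its inter-frame segments into classic SSC, which writes each point as a sparse combination of the other points and clusters the resulting affinity graph. The pure-Python sketch below shows only that self-expressive step, under stated assumptions: the Lasso relaxation with a hypothetical penalty `lam`, solved by coordinate descent; connected components of the affinity graph stand in for the spectral clustering that SSC actually uses; and the MSAM pre-processing is not modeled. The function names are illustrative.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t (the proximal operator of t * |c|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def self_expression(points: List[Vector], i: int, lam: float, sweeps: int = 100) -> Vector:
    """Coordinate descent on 0.5 * ||x_i - sum_j c_j x_j||^2 + lam * ||c||_1,
    with c_i fixed at 0 so that a point cannot represent itself."""
    x = points[i]
    c = [0.0] * len(points)
    residual = list(x)  # always equals x_i - sum_j c_j x_j
    for _ in range(sweeps):
        for j, xj in enumerate(points):
            norm_sq = dot(xj, xj)
            if j == i or norm_sq == 0.0:
                continue
            rho = dot(xj, residual) + c[j] * norm_sq  # correlation with j's term added back
            new = soft_threshold(rho, lam) / norm_sq
            delta = new - c[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, xj)]
                c[j] = new
    return c


def ssc_labels(points: List[Vector], lam: float = 0.01, tol: float = 1e-9) -> List[int]:
    n = len(points)
    C = [self_expression(points, i, lam) for i in range(n)]
    # Symmetric affinity W = |C| + |C|^T, as in SSC.
    W = [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]
    # Stand-in for spectral clustering: connected components of the graph W > tol.
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and W[u][v] > tol:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


if __name__ == "__main__":
    # Two 1-D subspaces of R^2: the x-axis and the y-axis.
    pts = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    print(ssc_labels(pts))
```

On this toy input the two axes are orthogonal, so every cross-subspace coefficient stays exactly zero and the components recover the two subspaces; noisy or nearly aligned subspaces are where the spectral step of real SSC matters.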
13	Graph Coloring and Clustering Algorithms for Science and Engineering Applications Bozdag, Doruk January 2008 (has links) No description available. Bioinformatics Computer Science Electrical Engineering parallel graph coloring distributed memory biclustering co-clustering subspace clustering
14	High-dimensional Data Clustering and Statistical Analysis of Clustering-based Data Summarization Products Zhou, Dunke 27 June 2012 (has links) No description available. Statistics k-means sparsity stability analysis subspace clustering Mallows distance climate data data reduction
15	Singular Value Computation and Subspace Clustering Liang, Qiao 01 January 2015 (has links) In this dissertation we discuss two problems. In the first part, we consider the problem of computing a few extreme eigenvalues of a symmetric definite generalized eigenvalue problem or a few extreme singular values of a large and sparse matrix. The standard method of choice for computing a few extreme eigenvalues of a large symmetric matrix is the Lanczos or the implicitly restarted Lanczos method. These methods usually employ a shift-and-invert transformation to accelerate convergence, which is not practical for truly large problems. With this in mind, Golub and Ye proposed an inverse-free preconditioned Krylov subspace method, which uses preconditioning instead of shift-and-invert to accelerate convergence. To compute several eigenvalues, Wielandt deflation is used in a straightforward manner. However, Wielandt deflation alters the structure of the problem and may cause difficulties in certain applications, such as singular value computations. We therefore first propose a deflation by restriction method for the inverse-free Krylov subspace method. We generalize the original convergence theory for the inverse-free preconditioned Krylov subspace method to justify this deflation scheme. We next extend the inverse-free Krylov subspace method with deflation by restriction to the singular value problem. We consider preconditioning based on robust incomplete factorization to accelerate convergence. Numerical examples are provided to demonstrate the efficiency and robustness of the new algorithm. In the second part of this thesis, we consider the so-called subspace clustering problem, which aims to extract a multi-subspace structure from a collection of points lying in a high-dimensional space.
Recently, methods based on the self-expressiveness property (SEP), such as Sparse Subspace Clustering and Low-Rank Representation, have been shown to outperform other methods. However, SEP-based methods may produce representations that are not amenable to clustering through graph partitioning. We propose a method in which the points are expressed in terms of an orthonormal basis. The orthonormal basis is chosen optimally, in the sense that it gives the sparsest representation of all points. Numerical results are given to illustrate the effectiveness and efficiency of this method. singular value decomposition machine learning subspace clustering Numerical Analysis and Computation Theory and Algorithms
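The deflation issue can be illustrated with something much simpler than the Lanczos or inverse-free Krylov methods named above. The sketch below uses plain power iteration with Hotelling deflation, A <- A - lam * v v^T (the symmetric special case of Wielandt deflation), assuming a symmetric positive semidefinite input; it is not the thesis's deflation-by-restriction scheme, and the function names are illustrative. Its point is structural: deflating a sparse tridiagonal matrix fills in entries that were zero.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(A: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def power_iteration(A: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Dominant eigenpair of a symmetric matrix; returns (Rayleigh quotient, unit vector)."""
    v = [1.0 / (k + 1) for k in range(len(A))]  # fixed start vector
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(A, v))), v


def deflate(A: Matrix, lam: float, v: List[float]) -> Matrix:
    """Hotelling deflation A - lam * v v^T: removes (lam, v) via a dense rank-one update."""
    n = len(A)
    return [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]


def top_eigenvalues(A: Matrix, k: int) -> List[float]:
    """k largest eigenvalues of a symmetric positive semidefinite matrix."""
    values = []
    for _ in range(k):
        lam, v = power_iteration(A)
        values.append(lam)
        A = deflate(A, lam, v)
    return values


if __name__ == "__main__":
    T = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]  # sparse tridiagonal
    print(top_eigenvalues(T, 3))  # approximately 2 + sqrt(2), 2, 2 - sqrt(2)
    lam, v = power_iteration(T)
    print(deflate(T, lam, v)[0][2])  # the former zero entry (0, 2) is now nonzero
```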
16	Subspace clustering on static datasets and dynamic data streams using bio-inspired algorithms Peignier, Sergio 27 July 2017 (has links) An important task that has been investigated in the context of high-dimensional data is subspace clustering. This data mining task is recognized as more general and more complicated than standard clustering, since it aims to detect groups of similar objects, called clusters, and at the same time to find the subspaces where these similarities appear. Furthermore, subspace clustering approaches, like traditional clustering approaches, have recently been extended to data streams by updating clustering models incrementally. The algorithms proposed in the literature rely on very different algorithmic foundations. Among these approaches, evolutionary algorithms have been under-explored, even though these techniques have proven valuable for addressing other NP-hard problems.
The aim of this thesis was to take advantage of new knowledge from evolutionary biology to design evolutionary subspace clustering algorithms for static datasets and dynamic data streams. Chameleoclust, the first algorithm developed in this work, takes advantage of the large degree of freedom provided by bio-like features such as a variable genome length, the existence of functional and non-functional elements, and mutation operators including chromosomal rearrangements. KymeroClust, our second algorithm, is a k-medians-based approach that relies on the duplication and divergence of genes, a cornerstone evolutionary mechanism. SubMorphoStream, the last one, tackles the subspace clustering task over dynamic data streams. It relies on two important mechanisms that favor the fast adaptation of bacteria to changing environments, namely gene amplification and the uptake of foreign genetic material. All these algorithms were compared with the main state-of-the-art techniques and obtained competitive results. The results suggest that these algorithms are useful complementary tools in the analyst's toolbox. In addition, two applications, EvoWave and EvoMove, have been developed to assess the capacity of these algorithms to address real-world problems. EvoWave is an application that analyzes Wi-Fi signals to detect different contexts. EvoMove is a musical companion that produces sounds based on the clustering of a dancer's moves captured using motion sensors. Computer Science Information Technology Data streams Evolutionary algorithms Bio-Inspired Algorithms Subspace clustering 005.101 072
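KymeroClust is described as k-medians based. For reference, a plain Lloyd-style k-medians loop (L1-distance assignment, coordinate-wise median update) is sketched below; the gene duplication and divergence mechanism and the subspace search are not modeled, and the names are illustrative.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = List[float]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points: List[Point], centers: List[Point],
              iters: int = 20) -> Tuple[List[int], List[Point]]:
    """Assign each point to its nearest center under L1, then move each center
    to the coordinate-wise median of its members; stop when centers settle."""
    centers = [c[:] for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
                  for p in points]
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centers.append(c)  # keep an empty cluster's center unchanged
                continue
            new_centers.append([median(p[d] for p in members) for d in range(len(c))])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    pts = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [-40, 0]]
    print(k_medians(pts, [[0, 0], [10, 10]]))
```

In the example data, the outlier at (-40, 0) joins the first cluster without moving its center, whereas a mean-based (k-means) update would pull that center to x = -9.75.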
17	A GPU Accelerated Tensor Spectral Method for Subspace Clustering Pai, Nithish January 2016 (has links) In this thesis we consider the problem of clustering data lying in a union of subspaces using spectral methods. Although the data may be high-dimensional, in many applications, such as motion segmentation and illumination-invariant face clustering, the data reside in a union of low-dimensional subspaces. Furthermore, for a number of classification and inference problems, it is often useful to identify these subspaces and work with the data in this lower-dimensional manifold. If the observations in each cluster were distributed around a centroid, spectral clustering on an affinity matrix built from distance-based similarity measures between the data points has been used successfully to solve the problem. However, it has been observed that such pairwise distance-based measures are not sufficient to construct a similarity matrix for the subspace clustering problem. Hence, a major challenge is to find a similarity measure that captures information about the subspace the data lie in. This motivates methods that build an affinity tensor by calculating the similarity among multiple data points. Spectral methods can then be applied to these tensors to solve the subspace clustering problem. To keep the algorithm computationally feasible, column sampling strategies can be employed. However, the computational cost of the tensor factorization increases very quickly with the sampling rate. Fortunately, advances in GPU computing have made it possible to perform many linear algebra operations several orders of magnitude faster than traditional CPU and multicore computing. In this work, we develop parallel algorithms for subspace clustering in a GPU computing environment.
We show that this gives a significant speedup over CPU implementations, which allows us to sample a larger fraction of the tensor and thereby achieve better accuracy. We empirically analyze the performance of these algorithms on a number of synthetically generated subspace configurations. We finally demonstrate the effectiveness of these algorithms on motion segmentation, handwritten digit clustering and illumination-invariant face clustering, and show that their performance is comparable with state-of-the-art approaches. Subspace Clustering Tensors Spectral Method Hypergraphs and Tensors Tensor Factorization Spectral Clustering based Algorithms GPU Accelerated Algorithm GPU Computing Computer Science
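The abstract argues that pairwise distances cannot capture subspace membership, which motivates an affinity tensor over several points. A minimal sketch, assuming the simplest case of 1-D affine subspaces (lines) in the plane: a triple's affinity is exp(-(area / sigma)^2), which equals 1 exactly when the three points are collinear, and the tensor is summed over its third index into a pairwise matrix that ordinary spectral clustering could consume. The bandwidth `sigma`, the area-based measure, and the flattening step are illustrative assumptions; column sampling and GPU execution are not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Tensor = Dict[Tuple[int, int, int], float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle abc; zero exactly when the three points are collinear."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2.0


def triple_affinity(points: List[Point], sigma: float = 1.0) -> Tensor:
    """Third-order affinity tensor, stored sparsely over sorted index triples."""
    return {
        (i, j, k): math.exp(-(triangle_area(points[i], points[j], points[k]) / sigma) ** 2)
        for i, j, k in combinations(range(len(points)), 3)
    }


def flatten_to_pairwise(tensor: Tensor, n: int) -> List[List[float]]:
    """Sum the tensor over its third index to obtain a symmetric pairwise matrix."""
    W = [[0.0] * n for _ in range(n)]
    for (i, j, k), value in tensor.items():
        for a, b in ((i, j), (i, k), (j, k)):
            W[a][b] += value
            W[b][a] += value
    return W


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (0.0, 3.0)]
    T = triple_affinity(pts)
    W = flatten_to_pairwise(T, len(pts))
    print(T[(0, 1, 2)], W[0][1], W[0][3])
```

Points 0, 1 and 2 share a line, so their pairwise entries accumulate full-strength votes, while point 3 only contributes weak ones.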
18	Low-Rank and Sparse Decomposition for Hyperspectral Image Enhancement and Clustering Tian, Long 03 May 2019 (has links) In this dissertation, new algorithms are developed to enhance hyperspectral image (HSI) analysis. A tensor data format is used for the sparse and low-rank decomposition of hyperspectral datasets, which can enhance classification and detection performance. A multi-view learning technique is applied to HSI clustering, and a kernel version of this technique is proposed to further improve clustering performance. Most low-rank and sparse decomposition algorithms for HSI analysis are based on a matrix data format. Because HSI contains many spectral dimensions, a tensor-based extended low-rank and sparse decomposition (TELRSD) is proposed in this dissertation, using the low-rank tensor part for HSI classification and the sparse tensor part for HSI detection. With this tensor-based method, HSI is processed in a 3D data format, and the information linking spectral bands and pixels remains integrated during the decomposition. The proposed algorithm is compared with other state-of-the-art methods, and the experimental results show that TELRSD performs best among all the compared algorithms. HSI clustering is an unsupervised task that aims to group pixels without labeled information. Low-rank sparse subspace clustering (LRSSC) is the most popular algorithm for this task. This dissertation proposes the spatial-spectral multi-view low-rank sparse subspace clustering (SSMLC) algorithm, which extends LRSSC with multi-view learning. In this algorithm, spectral and spatial views are created to generate a multi-view HSI dataset, where spectral partitioning, morphological component analysis (MCA) and principal component analysis (PCA) are applied to create the other views.
Furthermore, a kernel version of SSMLC (k-SSMLC) is investigated. The performance of SSMLC and k-SSMLC is compared with sparse subspace clustering (SSC), low-rank sparse subspace clustering (LRSSC), and spectral-spatial sparse subspace clustering (S4C). The results show that SSMLC improves on LRSSC, and that k-SSMLC performs best. Spectral clustering has been shown to be equivalent to a non-negative matrix factorization (NMF) problem, so NMF can be applied to clustering. To capture local and nonlinear features in the data, orthogonal NMF (ONMF), graph-regularized NMF (GNMF) and kernel NMF (k-NMF) have been proposed for better clustering performance. The nonlinear orthogonal graph NMF (k-OGNMF) combines kernel, orthogonality and graph constraints in NMF, which further improves clustering performance. In the HSI domain, kernel multi-view orthogonal graph NMF (k-MOGNMF), which extends k-OGNMF with a multi-view algorithm, is applied to subspace clustering and achieves better performance and computational efficiency. Multi-view algorithm Low-rank sparse subspace clustering Non-negative Matrix Factorization Clustering Anomaly Detection Classification Low-rank and sparse decomposition Hyperspectral Image
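As background for the NMF-based clustering described above, here is a minimal sketch of plain NMF with the Lee and Seung multiplicative updates for the Frobenius loss, with cluster labels read off as the largest entry in each column of H (a common heuristic). This is not ONMF, GNMF, k-NMF, or k-(M)OGNMF: the orthogonality, graph, kernel, and multi-view terms are omitted, and the function names and parameters are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frob_error(V: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Frobenius norm ||V - WH||_F^2."""
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2 for i in range(len(V)) for j in range(len(V[0])))


def nmf(V: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Multiplicative updates for min ||V - WH||_F^2 subject to W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W^T V) / (W^T W H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (V H^T) / (W H H^T)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(m)]
    return W, H


def cluster_labels(H: Matrix) -> List[int]:
    """Assign each column of V (one sample) to the factor with the largest weight in H."""
    return [max(range(len(H)), key=lambda a: H[a][j]) for j in range(len(H[0]))]


if __name__ == "__main__":
    V = [[2.0, 2.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 3.0, 3.0],
         [0.0, 0.0, 1.0, 1.0]]
    W0, H0 = nmf(V, rank=2, iters=0)
    W, H = nmf(V, rank=2)
    print(frob_error(V, W0, H0), frob_error(V, W, H), cluster_labels(H))
```

Because each multiplicative update keeps the objective from increasing, the reconstruction error after training should not exceed the error at initialization, which makes a convenient sanity check.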
19	Self-organizing maps with time-varying topology for subspace clustering of high-dimensional and multi-view data ANTONINO, Victor Oliveira 16 August 2016 (has links) Unsupervised learning methods have been employed on many significant problems. An explosion in the availability of data from multiple sources and modalities is correlated with advances in how to obtain, compress, store, transfer, and process large amounts of complex high-dimensional data, such as digital images, surveillance videos, and DNA microarrays. Clustering becomes challenging due to the increasing sparsity of such data, as well as the increasing difficulty of discriminating distances between data points. This work presents a soft subspace clustering algorithm based on a self-organizing map (SOM) with a time-variant structure, meaning that data can be clustered without any prior knowledge, such as the number of categories or the topology of the input data; both are determined during the training process. The model also assigns different weights to different dimensions, so that each dimension contributes to uncovering the clusters. To validate the model, we used a number of real-world datasets covering a diverse range of contexts, such as data mining, gene expression, multi-view clustering and computer vision problems. The results are promising, and the model can handle real-world data characterized by high dimensionality. High-Dimensional Data Local Receptive Field Relevance Learning Self-Organizing Maps (SOMs) Subspace Clustering
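The relevance-weighting idea can be sketched as a single online SOM step: each node keeps a prototype and, as an assumption made here, its own vector of per-dimension relevances, and the winner is chosen under that node's weighted distance. The relevance update rule and the insertion and removal of nodes that give the map its time-varying topology are not shown; all names are hypothetical.

```python
from typing import List

Vector = List[float]


def weighted_sq_distance(x: Vector, w: Vector, relevance: Vector) -> float:
    """Squared Euclidean distance with one relevance weight per dimension."""
    return sum(r * (a - b) ** 2 for a, b, r in zip(x, w, relevance))


def best_matching_unit(x: Vector, prototypes: List[Vector], relevances: List[Vector]) -> int:
    """Index of the node closest to x under that node's own relevance weights."""
    return min(range(len(prototypes)),
               key=lambda k: weighted_sq_distance(x, prototypes[k], relevances[k]))


def update_winner(x: Vector, prototypes: List[Vector], relevances: List[Vector],
                  lr: float = 0.1) -> int:
    """One online step: move the winning prototype toward x; return the winner's index."""
    k = best_matching_unit(x, prototypes, relevances)
    prototypes[k] = [w + lr * (a - w) for a, w in zip(x, prototypes[k])]
    return k


if __name__ == "__main__":
    protos = [[0.0, 0.0], [10.0, 1.0]]
    # Ignoring dimension 0 makes node 0 the winner; weighting both dimensions picks node 1.
    print(best_matching_unit([9.0, 0.0], protos, [[0.0, 1.0], [0.0, 1.0]]))
    print(best_matching_unit([9.0, 0.0], protos, [[1.0, 1.0], [1.0, 1.0]]))
```

A relevance of zero removes a dimension from a node's distance entirely, which is how a soft subspace clustering model can ignore dimensions that are irrelevant to a cluster.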
