Global ETD Search

21	Design of Experiments for Large Scale Catalytic Systems Kumar, Siddhartha Unknown Date No description available.
22	Nonnegative matrix factorization for clustering Kuang, Da 27 August 2014 This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower-rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, the challenges to the widespread use of NMF as a clustering method lie in its correctness and efficiency: first, we need to know why and when NMF can detect the true clusters and guarantee good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer than other clustering methods. We show that the original NMF can be improved in both respects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy.
In addition, for the original NMF, it is difficult but important to choose the right number of clusters. We show that the widely used consensus NMF, applied in genomic analysis to choose the number of clusters, has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling, where users want to uncover the major themes in a large text collection. Our strategy for accelerating NMF-based clustering is to design algorithms that better suit the computer architecture and exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that applying rank-2 NMF, which partitions a data set into two clusters, in a recursive manner is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is multiplying a large sparse data matrix by a tall-and-skinny dense matrix. We use GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents.
Nonnegative matrix factorization Cluster analysis Hierarchical clustering Cancer subtype discovery GPU computing Sparse matrix multiplication
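To make the factorization in entry 22 concrete, the following is a minimal sketch of the original NMF used as a clustering method. It assumes the classic multiplicative update rules for the Frobenius objective, which the abstract does not name; it is not the dissertation's Symmetric NMF or rank-2 algorithm. The function names (`nmf`, `cluster_labels`), the random initialization, and the convention that columns of the data matrix are data points are all illustrative assumptions.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_error(A, W, H):
    """Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5


def nmf(A, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate nonnegative A (m x n) by W (m x rank) times H (rank x n).

    Assumption: multiplicative updates, chosen here only for brevity.
    Returns W, H, and the error recorded before and after each iteration.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    errors = [frobenius_error(A, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, pr, qr)]
             for hr, pr, qr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, pr, qr)]
             for wr, pr, qr in zip(W, num, den)]
        errors.append(frobenius_error(A, W, H))
    return W, H, errors


def cluster_labels(H):
    """Assign each column (data point) to the row of H with the largest weight."""
    return [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(H[0]))]
```

Calling `nmf(A, rank=2)` on a small nonnegative matrix and passing the resulting `H` to `cluster_labels` yields one cluster index per column. The pure-Python matrix products keep the sketch dependency-free but are far too slow for the large-scale settings the abstract targets.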
23	Automatic Clustering of 3D Objects for Hierarchical Level-of-Detail Wiberg, Benjamin January 2018 This report describes an algorithm for computing 3D object hierarchies fit for hierarchical level-of-detail (HLOD) optimization. The algorithm is used as a pre-processing stage in an HLOD pipeline that automatically optimizes 3D models containing multiple meshes. The algorithm for generating hierarchies groups meshes together in a hierarchical tree using operations on the bounding spheres of the meshes. The algorithm prioritizes grouping close objects together in the early stages, and relaxes its constraints toward the end, resulting in a tree structure with a single root node. The hierarchical tree is then used to compute proxy meshes, i.e. simplified stand-in meshes, for the inner nodes of the hierarchy. Finally, the resulting proxy meshes, together with the generated hierarchy and the original meshes, are used to render the model using a tree-traversing HLOD switching algorithm that renders deeper parts of the tree containing more detailed meshes when more detail is needed. In addition, a minor change to the clustering algorithm is proposed. By replacing the bounding spheres with AABBs (Axis-Aligned Bounding Boxes) in the clustering stage, hierarchies with different properties are generated. This change is shown to generate hierarchies with rendering performance similar to that of the hierarchies made with bounding spheres, while at the same time resulting in lower space requirements for all proxy meshes. Overall, the proposed automatic HLOD pipeline is shown to increase rendering performance for all evaluated scenes in most frames, while never yielding noticeably worse performance than the original model. hierarchical level-of-detail mesh simplification hierarchical clustering 3D optimization Media and Communication Technology Medieteknik
24	Development of a hierarchical k-selecting clustering algorithm – application to allergy. Malm, Patrik January 2007 The objective of this Master’s thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance which also merges features of certain already described algorithms into a single integrated package. A tool built accordingly was then applied to an allergen IgE-reactivity data set. The final implemented algorithm uses a hierarchical approach which illustrates the emergence of patterns in the data. At each level of the hierarchical tree a partitional clustering method is used to divide data into k groups, where the number k is decided through application of cluster validation techniques. The cross-reactivity analysis, by means of the new algorithm, largely arrives at anticipated cluster formations in the allergen data, which strengthens results obtained through previous studies on the subject. Notably, though, certain unexpected findings presented in the former analysis were aggregated differently, and more in line with phylogenetic and protein family relationships, by the novel clustering package. bioinformatics partitional clustering hierarchical clustering allergy crossreactivity Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
25	Identifying Nodes of Transmission in Disease Diffusion Through Social Media Lamb, David Sebastian 03 July 2017 The spread of infectious diseases can be described in terms of three interrelated components: interaction, movement, and scale. Transmission between individuals requires some form of interaction, which is dependent on the pathogen, to occur. Diseases spread through the movement of their hosts; they spread across many spatial scales from local neighborhoods to countries, or temporal scales from days to years, or periodic intervals. Prior research into the spread of disease has examined diffusion processes retrospectively at regional or country levels, or developed differential equation or simulation models of the dynamics of disease transmission. While some of the more recent models incorporate all three components, they are limited in the way they understand where interactions occur. The focus has been on home or work, including contact with family or coworkers. The models reflect a lack of knowledge about how transmissions are made at specific locations in time, so-called nodes of transmission, that is, how individuals’ intersections in time and space function in disease transmission. This project sought to use the three factors of interaction, movement, and scale to better understand the spread of disease in terms of the place of interaction, called the node of transmission. The overarching objective of this research was: how can nodes of transmission be identified through individual activity spaces incorporating the three factors of infectious disease spread: interaction, movement, and scale? This objective fed into three main sub-objectives: defining nodes of transmission, developing an appropriate methodology for identifying nodes of transmission, and applying it using geotagged social media data from Twitter. To develop an appropriate framework, this research relied on time geography, and traditional disease.
This particularly relied on the idea of bundling to create the nodes, and a nesting effect that integrated scale. The data used to identify nodes of transmission were collected from Twitter for the Los Angeles County, USA, area from October 2015 to February 2016. Automated text classification was used to identify messages where users self-reported an influenza-like illness. Different groupings were created that combined both the syndrome and the symptoms of influenza, and applied to the automated classification. The use of Twitter for small-area health analysis was evaluated along with different text classification methodologies. A space-time hierarchical clustering technique was adapted for application to the Twitter data, both to identify nodes of transmission and to identify spatiotemporal contact networks. This clustering was applied to the classified Twitter data to look at where interactions between the classified users were occurring. This pointed to six nodes that were typically densely populated areas that saw the merging of large groups of people in Los Angeles (e.g. Disneyland and Hollywood Boulevard). The movement of these individuals was also examined by using an edit distance to compare their visits to different clusters and nodes. twitter health influenza hierarchical clustering time geography Geographic Information Sciences Geography Public Health
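The abstract of entry 25 mentions comparing individuals' visits with an edit distance but does not say which variant was used. As a hedged illustration only, the sketch below assumes the standard Levenshtein distance (unit cost for insertion, deletion, and substitution) applied to sequences of visited cluster labels; the function name `edit_distance` and the example labels are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of hashable items.

    Assumption: unit costs for insert, delete, and substitute; the
    thesis may have used a different edit distance.
    """
    # prev[j] holds the distance between the processed prefix of seq_a
    # and the first j items of seq_b.
    prev = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        curr = [i]
        for j, b in enumerate(seq_b, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

For two hypothetical users whose visit sequences are `["home", "node_3", "node_1"]` and `["home", "node_1"]`, the function counts the single deletion needed to turn one sequence into the other.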
26	Towards Next Generation Vertical Search Engines Zheng, Li 25 March 2014 As the Web evolves unexpectedly fast, information grows explosively. Useful resources become more and more difficult to find because of their dynamic and unstructured characteristics. A vertical search engine is designed and implemented for a specific domain. Instead of processing the giant volume of miscellaneous information distributed across the Web, a vertical search engine aims to identify relevant information in specific domains or topics and eventually to provide users with up-to-date information, highly focused insights and actionable knowledge representation. As mobile devices become more popular, the nature of search is changing. Acquiring information on a mobile device therefore poses unique requirements for traditional search engines, which will potentially change every feature they used to have. To summarize, users strongly expect search engines that can satisfy their individual information needs, adapt to their current situation, and present highly personalized search results. In my research, the next generation vertical search engine is meant to utilize and enrich existing domain information to close the loop of a vertical search engine system that mutually facilitates knowledge discovery, actionable information extraction, and user interest modeling and recommendation. I investigate three problems in which domain taxonomy plays an important role: taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to capture users' interests. As the fundamental theory, ultra-metrics, dendrograms, and hierarchical clustering are discussed in depth. Methods for taxonomy generation using my research on hierarchical clustering are developed. The related vertical search engine techniques are applied in practice in the disaster management domain.
In particular, three disaster information management systems are developed and presented as real use cases of my research work. Data Mining Vertical Search Engine Hierarchical Clustering Taxonomy Recommendation Disaster Management
27	PSORIATIC FUNGAL AND BACTERIAL MICROBIOMES IDENTIFY PATIENT ENDOTYPES Salem, Iman 01 September 2021 No description available. Medicine Molecular Biology
28	Topological Hierarchies and Decomposition: From Clustering to Persistence Brown, Kyle A. 27 May 2022 No description available. Computer Science topological data analysis hierarchical clustering exploratory data analysis topology clustering data science
29	Understanding Noise and Structure behind Metric Spaces Wang, Dingkang 20 October 2021 No description available. Computer Science Metric spaces Metric Embedding Hierarchical clustering Metric Consensus Ordinal Consensus Embedding with outliers
30	Performance Assessment of The Extended Gower Coefficient on Mixed Data with Varying Types of Functional Data. Koomson, Obed 01 December 2018 Clustering is a widely used technique in data mining applications to source, manage, analyze and extract vital information from large amounts of data. Most clustering procedures are limited in their performance when it comes to data with mixed attributes. In recent times, mixed data have evolved to include directional and functional data. In this study, we will give an introduction to clustering with an eye towards the application of the extended Gower coefficient by Hendrickson (2014). We will conduct a simulation study to assess the performance of this coefficient on mixed data whose functional component has strictly-decreasing signal curves and also those whose functional component has a mixture of strictly-decreasing signal curves and periodic tendencies. We will assess how four different hierarchical clustering algorithms perform on mixed data simulated under varying conditions with and without weights. The comparison of the various clustering solutions will be done using the Rand Index. Hierarchical Clustering Mixed data Strictly-decreasing signal Periodic signal Extended Gower coefficient. Applied Mathematics
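Entry 30 compares clustering solutions with the Rand Index. The sketch below shows the standard unadjusted definition, the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_index` is illustrative, and whether the study used this exact form or an adjusted variant is not stated in the abstract.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand Index between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = 0
    for i, j in pairs:
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        # A pair agrees when both labelings group it together or both split it.
        if same_in_a == same_in_b:
            agreements += 1
    return agreements / len(pairs)
```

Because only co-membership of pairs is compared, the index is unaffected by how cluster labels are named: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0.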

Search results