Global ETD Search

11	Scalable model-based clustering algorithms for large databases and their applications. / CUHK electronic theses & dissertations collection / Digital dissertation consortium January 2002 (has links) by Huidong Jin. / "August 2002." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (p. 193-204). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese. Cluster analysis--Data processing Data mining Database management
12	Learnable similarity functions and their application to record linkage and clustering Bilenko, Mikhail Yuryevich 28 August 2008 (has links) Not available / text Cluster analysis--Data processing Pattern recognition systems Machine learning
13	Mining complex databases using the EM algorithm Ordóñez, Carlos January 2000 (has links) No description available. Expectation-maximization algorithms Cluster analysis Data processing Data mining
14	Aggregation in large scale quadratic programming Foster, David Martin 08 1900 (has links) No description available. Cluster analysis Data processing Linear programming Quadratic programming
15	Extending low-rank matrix factorizations for emerging applications Zhou, Ke 13 January 2014 (has links) Low-rank matrix factorizations have become increasingly popular to project high-dimensional data into latent spaces with small dimensions in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate the applications and extensions of the ideas of the low-rank matrix factorization to solve several practically important problems that arise from collaborative filtering and social network analysis. A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend the low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendations. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations. The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items. In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. 
Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutual-excitation nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and the low-rank matrix factorization in a principled manner. In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space. Matrix factorization Collaborative filtering Social network Dimensional analysis Computer programs Cluster analysis Data processing Social networks
16	Cluster dynamics in the Basque region of Spain Luque, N. E. January 2011 (has links) Developing and retaining competitive advantage was a major concern for all companies; it fundamentally relied on being aware of the external environment and customer satisfaction. Changes in environmental conditions and unexpected economic events could cause a loss of organisational adjustment and a subsequent loss in competitiveness; only those organisations able to adjust rapidly to these dynamics would be able to remain. In some instances, companies decided to geographically co-locate, seeking economies of scale and benefiting from complementarities. The literature review revealed the strong support that clusters had from Government and Local Authorities, but it also highlighted the limited practical research in the field. The aim of this research was to measure the dynamism of the cluster formed by the geographical concentration of diverse manufacturers within the Mondragon Cooperativa Group in the Basque region of Spain, and to compare it to the individual dynamism of these organisations in order to better understand the actual complementarities and synergies of this industrial colocation. The literature review identified dynamic capabilities as the core enablers of organisations competing in dynamic environments; based on these capabilities, a model was formulated. This model, combined with the primary data collected via questionnaire and interviews, helped measure the dynamism of the individual cluster members and the cluster as a whole, and provided insight into the complementarities and synergies of this type of alliance. The research concluded that the cluster as a whole was more dynamic than the individual members; nevertheless, the model suggested that there were considerable differences in speed among the cluster members. 
These differences in speed were determined by the size of the company and its performance in dimensions such as marketing, culture and management. The research also suggested that despite the clear differences in the level of dynamism among cluster members, all companies benefited in some way from being part of the cluster; these benefits differed in nature depending on each specific member. 658
17	Digital photo album management techniques: from one dimension to multi-dimension. January 2005 (has links) Lu Yang. / Thesis submitted in: November 2004. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 96-103). / Abstracts in English and Chinese. / Abstract --- p.i / Acknowledgement --- p.iv / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Motivation --- p.1 / Chapter 1.2 --- Our Contributions --- p.3 / Chapter 1.3 --- Thesis Outline --- p.5 / Chapter 2 --- Background Study --- p.7 / Chapter 2.1 --- MPEG-7 Introduction --- p.8 / Chapter 2.2 --- Image Analysis in CBIR Systems --- p.11 / Chapter 2.2.1 --- Color Information --- p.13 / Chapter 2.2.2 --- Color Layout --- p.19 / Chapter 2.2.3 --- Texture Information --- p.20 / Chapter 2.2.4 --- Shape Information --- p.24 / Chapter 2.2.5 --- CBIR Systems --- p.26 / Chapter 2.3 --- Image Processing in JPEG Frequency Domain --- p.30 / Chapter 2.4 --- Photo Album Clustering --- p.33 / Chapter 3 --- Feature Extraction and Similarity Analysis --- p.38 / Chapter 3.1 --- Feature Set in Frequency Domain --- p.38 / Chapter 3.1.1 --- JPEG Frequency Data --- p.39 / Chapter 3.1.2 --- Our Feature Set --- p.42 / Chapter 3.2 --- Digital Photo Similarity Analysis --- p.43 / Chapter 3.2.1 --- Energy Histogram --- p.43 / Chapter 3.2.2 --- Photo Distance --- p.45 / Chapter 4 --- 1-Dimensional Photo Album Management Techniques --- p.49 / Chapter 4.1 --- Photo Album Sorting --- p.50 / Chapter 4.2 --- Photo Album Clustering --- p.52 / Chapter 4.3 --- Photo Album Compression --- p.56 / Chapter 4.3.1 --- Variable IBP frames --- p.56 / Chapter 4.3.2 --- Adaptive Search Window --- p.57 / Chapter 4.3.3 --- Compression Flow --- p.59 / Chapter 4.4 --- Experiments and Performance Evaluations --- p.60 / Chapter 5 --- High Dimensional Photo Clustering --- p.67 / Chapter 5.1 --- Traditional Clustering Techniques --- p.67 / Chapter 5.1.1 --- Hierarchical Clustering --- p.68 / Chapter 5.1.2 
--- Traditional K-means --- p.71 / Chapter 5.2 --- Multidimensional Scaling --- p.74 / Chapter 5.2.1 --- Introduction --- p.75 / Chapter 5.2.2 --- Classical Scaling --- p.77 / Chapter 5.3 --- Our Interactive MDS-based Clustering --- p.80 / Chapter 5.3.1 --- Principal Coordinates from MDS --- p.81 / Chapter 5.3.2 --- Clustering Scheme --- p.82 / Chapter 5.3.3 --- Layout Scheme --- p.84 / Chapter 5.4 --- Experiments and Results --- p.87 / Chapter 6 --- Conclusions --- p.94 / Bibliography --- p.96 Photograph albums--Data processing Image processing--Digital techniques Cluster analysis--Data processing Computer algorithms
18	A novel framework for binning environmental genomic fragments Yang, Bin, 杨彬 January 2010 (has links) published_or_final_version / Computer Science / Master / Master of Philosophy Genomics - Data processing. Genomes - Data processing. Microbial ecology - Data processing. Cluster analysis - Data processing. Cluster analysis - Computer programs.
19	Sharing the love : a generic socket API for Hadoop Mapreduce Yee, Adam J. 01 January 2011 (has links) Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can store only a limited number of file names, depending on the RAM capacity. The solution to furthering scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but has also been extended for integrating file systems S3, CloudStore, Ceph and PVFS. File systems Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google announced in 2009 that it had been implementing a Google File System distributed namespace to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file system agnostic API, future work with other file systems might provide ways for surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need for customizing Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs compared to HDFS are 67% of HDFS' write throughput and 74% of HDFS' read throughput. 
But, compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput is increased by 23% and 7%, respectively. Apache Hadoop (Computer file) MapReduce (Computer program) Cluster analysis Data processing Computer algorithms Computer Sciences
20	Text mining of online book reviews for non-trivial clustering of books and users Lin, Eric 14 August 2013 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / The classification of consumable media by mining relevant text for their identifying features is a subjective process. Previous attempts to perform this type of feature mining have generally been limited in scope due to limited access to user data. Many of these studies used human domain knowledge to evaluate the accuracy of features extracted using these methods. In this thesis, we mine book review text to identify nontrivial features of a set of similar books. We make comparisons between books by looking for books that share characteristics, ultimately performing clustering on the books in our data set. We use the same mining process to identify a corresponding set of characteristics in users. Finally, we evaluate the quality of our methods by examining the correlation between our similarity metric and user ratings. mining data analysis recommendation sentiment End-user computing Web usage mining Knowledge management Information behavior -- Research Cluster analysis -- Data processing System analysis -- Data processing Information retrieval -- Book reviews
