  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Scalable model-based clustering algorithms for large databases and their applications. / CUHK electronic theses & dissertations collection / Digital dissertation consortium

January 2002 (has links)
by Huidong Jin. / "August 2002." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (p. 193-204). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. Ann Arbor, MI : ProQuest Information and Learning Company, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Mode of access: World Wide Web. / Abstracts in English and Chinese.
12

Learnable similarity functions and their application to record linkage and clustering

Bilenko, Mikhail Yuryevich 28 August 2008 (has links)
Not available / text
13

Mining complex databases using the EM algorithm

Ordóñez, Carlos January 2000 (has links)
No description available.
14

Aggregation in large scale quadratic programming

Foster, David Martin 08 1900 (has links)
No description available.
15

Extending low-rank matrix factorizations for emerging applications

Zhou, Ke 13 January 2014 (has links)
Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into latent spaces of small dimension in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of low-rank matrix factorization to solve several practically important problems arising from collaborative filtering and social network analysis. A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendation. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations. The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem by learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items. In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. 
Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutual-excitation nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and low-rank matrix factorization in a principled manner. In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
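The mutual-excitation idea in this abstract can be illustrated with a minimal sketch of a multi-dimensional Hawkes intensity. This is not the thesis's method: it assumes an exponential triggering kernel with a hypothetical decay rate `beta`, whereas the abstract describes estimating the kernels from a functional space. The names `hawkes_intensity`, `mu`, `A`, and `events` are illustrative only.

```python
import math


def hawkes_intensity(i, t, events, mu, A, beta=1.0):
    """Conditional intensity of dimension i at time t.

    events[j] is the list of past event times of individual j,
    mu[i] is the base rate of individual i, and A[i][j] is the
    influence of individual j on individual i (the matrix that the
    abstract assumes to be low-rank and sparse).
    The exponential kernel beta * exp(-beta * dt) is an assumption.
    """
    rate = mu[i]
    for j, times in enumerate(events):
        for s in times:
            if s < t:  # only events strictly before t excite the process
                rate += A[i][j] * beta * math.exp(-beta * (t - s))
    return rate


# Two individuals; individual 1 had one event at time 0.0.
mu = [0.2, 0.3]
A = [[0.0, 0.5],
     [0.5, 0.0]]
events = [[], [0.0]]
print(hawkes_intensity(0, 1.0, events, mu, A))
```

Each past event of individual j raises the intensity of individual i by an amount scaled by `A[i][j]`, and the effect decays over time; recovering `A` from observed event times is the network-discovery problem the abstract refers to.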
16

Cluster dynamics in the Basque region of Spain

Luque, N. E. January 2011 (has links)
Developing and retaining competitive advantage was a major concern for all companies; it fundamentally relied on being aware of the external environment and customer satisfaction. Modifications of the environment conditions and unexpected economic events could cause a loss in the level of organisational adjustment and a subsequent loss in competitiveness; only those organisations able to rapidly adjust to these dynamics would be able to remain. In some instances, companies decided to geographically co-locate, seeking economies of scale and benefiting from complementarities. The literature review revealed the strong support that clusters had from Government and Local Authorities, but it also highlighted the limited practical research in the field. The aim of this research was to measure the dynamism of the cluster formed by the geographical concentration of diverse manufacturers within the Mondragon Cooperativa Group in the Basque region of Spain, and to compare it to the individual dynamism of these organisations in order to have a better understanding of the actual complementarities and synergies of this industrial co-location. The literature review identified dynamic capabilities as the core enablers of organisations competing in dynamic environments; based on these capabilities, a model was formulated. This model, combined with the primary data collected via questionnaire and interviews, helped measure the dynamism of the individual cluster members and of the cluster as a whole, and provided an insight into the complementarities and synergies of this type of alliance. The research concluded that the cluster as a whole was more dynamic than the individual members; nevertheless, the model suggested that there were considerable differences in speed among the cluster members. These differences in speed were determined by the size of the company and its performance in dimensions such as marketing, culture and management. 
The research also suggested that, despite the clear differences in the level of dynamism among cluster members, all companies benefited in some way from being part of the cluster; these benefits differed in nature depending on the specific member.
17

Digital photo album management techniques: from one dimension to multi-dimension.

January 2005 (has links)
Lu Yang. / Thesis submitted in: November 2004. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 96-103). / Abstracts in English and Chinese. / Abstract / Acknowledgement / Chapter 1 --- Introduction / Chapter 1.1 --- Motivation / Chapter 1.2 --- Our Contributions / Chapter 1.3 --- Thesis Outline / Chapter 2 --- Background Study / Chapter 2.1 --- MPEG-7 Introduction / Chapter 2.2 --- Image Analysis in CBIR Systems / Chapter 2.2.1 --- Color Information / Chapter 2.2.2 --- Color Layout / Chapter 2.2.3 --- Texture Information / Chapter 2.2.4 --- Shape Information / Chapter 2.2.5 --- CBIR Systems / Chapter 2.3 --- Image Processing in JPEG Frequency Domain / Chapter 2.4 --- Photo Album Clustering / Chapter 3 --- Feature Extraction and Similarity Analysis / Chapter 3.1 --- Feature Set in Frequency Domain / Chapter 3.1.1 --- JPEG Frequency Data / Chapter 3.1.2 --- Our Feature Set / Chapter 3.2 --- Digital Photo Similarity Analysis / Chapter 3.2.1 --- Energy Histogram / Chapter 3.2.2 --- Photo Distance / Chapter 4 --- 1-Dimensional Photo Album Management Techniques / Chapter 4.1 --- Photo Album Sorting / Chapter 4.2 --- Photo Album Clustering / Chapter 4.3 --- Photo Album Compression / Chapter 4.3.1 --- Variable IBP frames / Chapter 4.3.2 --- Adaptive Search Window / Chapter 4.3.3 --- Compression Flow / Chapter 4.4 --- Experiments and Performance Evaluations / Chapter 5 --- High Dimensional Photo Clustering / Chapter 5.1 --- Traditional Clustering Techniques / Chapter 5.1.1 --- Hierarchical Clustering / Chapter 5.1.2 --- Traditional K-means / Chapter 5.2 --- Multidimensional Scaling / Chapter 5.2.1 --- Introduction / Chapter 5.2.2 --- Classical Scaling / Chapter 5.3 --- Our Interactive MDS-based Clustering / Chapter 5.3.1 --- Principal Coordinates from MDS / Chapter 5.3.2 --- Clustering Scheme / Chapter 5.3.3 --- Layout Scheme / Chapter 5.4 --- Experiments and Results / Chapter 6 --- Conclusions / Bibliography
18

A novel framework for binning environmental genomic fragments

Yang, Bin, 杨彬 January 2010 (has links)
published_or_final_version / Computer Science / Master / Master of Philosophy
19

Sharing the love : a generic socket API for Hadoop Mapreduce

Yee, Adam J. 01 January 2011 (has links)
Hadoop is a popular software framework written in Java that performs data-intensive distributed computations on a cluster. It includes Hadoop MapReduce and the Hadoop Distributed File System (HDFS). HDFS has known scalability limitations due to its single NameNode, which holds the entire file system namespace in RAM on one computer. Therefore, the NameNode can store only a limited number of file names, depending on its RAM capacity. The solution to furthering scalability is distributing the namespace, similar to how file data is divided into chunks and stored across cluster nodes. Hadoop has an abstract file system API which is extended to integrate HDFS, but has also been extended for integrating the file systems S3, CloudStore, Ceph and PVFS. Ceph and PVFS already distribute the namespace, while others such as Lustre are making the conversion. Google announced in 2009 that it had been implementing a Google File System distributed namespace to achieve greater scalability. The Generic Hadoop API is created from Hadoop's abstract file system API. It speaks a simple communication protocol that can integrate any file system which supports TCP sockets. By providing a file system agnostic API, future work with other file systems might provide ways of surpassing Hadoop's current scalability limitations. Furthermore, the new API eliminates the need for customizing Hadoop's Java implementation, and instead moves the implementation to the file system itself. Thus, developers wishing to integrate their new file system with Hadoop are not responsible for understanding details of Hadoop's internal operation. The API is tested on a homogeneous, four-node cluster with OrangeFS. Initial OrangeFS I/O throughputs compared to HDFS are 67% of HDFS' write throughput and 74% of HDFS' read throughput. However, compared with an alternate method of integrating with OrangeFS (a POSIX kernel interface), write and read throughput is increased by 23% and 7%, respectively.
20

Text mining of online book reviews for non-trivial clustering of books and users

Lin, Eric 14 August 2013 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The classification of consumable media by mining relevant text for their identifying features is a subjective process. Previous attempts to perform this type of feature mining have generally been limited in scope due to limited access to user data. Many of these studies used human domain knowledge to evaluate the accuracy of features extracted using these methods. In this thesis, we mine book review text to identify nontrivial features of a set of similar books. We make comparisons between books by looking for books that share characteristics, ultimately performing clustering on the books in our data set. We use the same mining process to identify a corresponding set of characteristics in users. Finally, we evaluate the quality of our methods by examining the correlation between our similarity metric and user ratings.
