Global ETD Search

441	Entropy based techniques with applications in data mining / Okafor, Anthony. January 2005. Thesis (Ph. D.)--University of Florida, 2005. / Title from title page of source document. Document formatted into pages; contains 97 pages. Includes vita. Includes bibliographical references.
442	Association rule based classification / Palanisamy, Senthil Kumar. January 2006. Thesis (M.S.)--Worcester Polytechnic Institute. / Keywords: Itemset Pruning, Association Rules, Adaptive Minimal Support, Associative Classification, Classification. Includes bibliographical references (p. 70-74).
443	Pattern discovery in spatial, image, and biological data / Qian, Yu. January 2006. Thesis (Ph. D.)--University of Texas at Dallas, 2006. / Includes vita. Includes bibliographical references (leaves 195-205).
444	Applying data mining to job-shop scheduling using regression analysis / Innani, Alok D. January 2004. Thesis (M.S.)--Ohio University, August, 2004. / Title from PDF t.p. Includes bibliographical references (p. 84-87).
445	Use of data mining for investigation of crime patterns / Padhye, Manoday D. January 2006. Thesis (M.S.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains viii, 108 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 80-81).
446	Online Aggregation über Datenströmen mit Verfahren der mathematischen Statistik in grossen Datenbanksystemen [Online aggregation over data streams using methods of mathematical statistics in large database systems] / Blohsfeld, Björn. Unknown Date. Universität, Diss., 2002--Marburg.
447	On building predictive models with company annual reports / Qiu, Xin Ying. January 2007. Thesis (Ph. D.)--University of Iowa, 2007. / Supervisor: Padmini Srinivasan. Includes bibliographical references (leaves 93-100).
448	Beschleunigte Entwicklung von Katalysatorsystemen und Polymeren durch Automatisierung, kombinatorische Methoden, schnelle Analytik und Datenanalyse [Accelerated development of catalyst systems and polymers through automation, combinatorial methods, rapid analytics, and data analysis] / Tuchbreiter, Arno. Unknown Date. Universität, Diss., 2003--Freiburg (Breisgau).
449	Συγκριτική μελέτη κατανεμημένων και παράλληλων αλγόριθμων παραγωγής κανόνων συσχέτισης [Comparative study of distributed and parallel algorithms for association rule generation] / Γερολυμάτος, Αντώνιος. 23 August 2010. / Keywords: Algorithms, Data mining, 511.8.
450	Pivot-based Data Partitioning for Distributed k Nearest Neighbor Mining / Kuhlman, Caitlin Anne. 20 January 2017. This thesis addresses the need for a scalable distributed solution for k-nearest-neighbor (kNN) search, a fundamental data mining task. This unsupervised method poses particular challenges on shared-nothing distributed architectures, where global information about the dataset is not available to individual machines. The distance to search for neighbors is not known a priori, and therefore a dynamic data partitioning strategy is required to guarantee that exact kNN can be found autonomously on each machine. Pivot-based partitioning has been shown to facilitate bounding of partitions; however, state-of-the-art methods suffer from prohibitive data duplication (upwards of 20x the size of the dataset). In this work, an innovative method for solving exact distributed kNN search, called PkNN, is presented. The key idea is to perform computation over several rounds, leveraging pivot-based data partitioning at each stage. Aggressive data-driven bounds limit communication costs, and a number of optimizations are designed for efficient computation. An experimental study on large real-world data (over 1 billion points) compares PkNN to the state-of-the-art distributed solution, demonstrating that the benefits of additional stages of computation in the PkNN method heavily outweigh the added I/O overhead. PkNN achieves a data duplication rate close to 1 and a significant speedup over previous solutions, and it scales effectively in data cardinality and dimension. PkNN can facilitate distributed solutions to other unsupervised learning methods that rely on kNN search as a critical building block. As one example, a distributed framework for the Local Outlier Factor (LOF) algorithm is given. Testing on large real-world and synthetic data with varying characteristics measures the scalability of PkNN and the distributed LOF framework in data size and dimensionality.
Keywords: distributed computing, kNN search, data mining.
