About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1. New data mining models based on formal concept analysis and probability logic

Jiang, Liying. 2006 (has links)
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2006. / Title from title screen (site viewed on Jan 23, 2007). PDF text: 127 p. : ill. (some col.) ; 1.29Mb. UMI publication number: AAT 3216105. Includes bibliographical references. Also available in microfilm and microfiche format.
2. Apriori sets and sequences: mining association rules from time sequence attributes

Pray, Keith A. January 2004 (has links)
Thesis (M.S.) -- Worcester Polytechnic Institute. / Keywords: mining complex data; temporal association rules; computer system performance; stock market analysis; sleep disorder data. Includes bibliographical references (p. 79-85).
3. Association rule mining in cooperative research

Zhang, Ya. Klein, Cerry M. January 2009 (has links)
The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Title from PDF of title page (University of Missouri--Columbia, viewed January 26, 2010). Thesis advisor: Dr. Cerry M. Klein. Includes bibliographical references.
4. Use of data mining for investigation of crime patterns

Padhye, Manoday D. January 2006 (has links)
Thesis (M.S.)--West Virginia University, 2006. / Title from document title page. Document formatted into pages; contains viii, 108 p. : ill. (some col.). Includes abstract. Includes bibliographical references (p. 80-81).
5. Domain-concept mining: an efficient on-demand data mining approach

Mahamaneerat, Wannapa Kay, Shyu, Chi-Ren. January 2008 (has links)
Title from PDF of title page (University of Missouri--Columbia, viewed on February 24, 2010). The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Dissertation advisor: Dr. Chi-Ren Shyu. Vita. Includes bibliographical references.
6. pcApriori: Scalable apriori for multiprocessor systems

Schlegel, Benjamin, Kiefer, Tim, Kissinger, Thomas, Lehner, Wolfgang 16 September 2022 (has links)
Frequent-itemset mining is an important part of data mining. It is a computationally and memory-intensive task with a large number of scientific and statistical application areas, and in many of them the datasets can easily grow to tens or even several hundred gigabytes of data. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed, but they cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads is increased. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets; pcApriori scales almost linearly on our test system comprising 32 cores.
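The producer-consumer idea described in this abstract can be illustrated with a short sketch: a producer partitions the transaction database into chunks, workers count the candidate itemsets of one Apriori iteration over their chunk, and the partial counts are merged. This is only an illustration of the general scheme, not the authors' pcApriori implementation; the function names, the chunking strategy, and the use of Python threads are assumptions (a real multiprocessor implementation would rely on native threads or processes for true parallelism).

```python
# Minimal sketch of producer-consumer candidate counting for one Apriori
# iteration. Illustrative only; not the pcApriori implementation.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def count_chunk(chunk, candidates, k):
    """Consumer: count how often each candidate k-itemset occurs in a chunk."""
    counts = Counter()
    for transaction in chunk:
        for cand in combinations(sorted(set(transaction)), k):
            if cand in candidates:
                counts[cand] += 1
    return counts

def parallel_count(transactions, candidates, k, n_workers=4, chunk_size=1000):
    """Producer: partition the transactions, hand the chunks to workers,
    and merge the partial counts."""
    chunks = [transactions[i:i + chunk_size]
              for i in range(0, len(transactions), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda c: count_chunk(c, candidates, k), chunks):
            total.update(partial)  # Counter.update adds the partial counts
    return total

# Toy example: count the 2-itemset candidates {a,b} and {b,c}.
db = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "c"]]
cands = {("a", "b"), ("b", "c")}
print(parallel_count(db, cands, k=2, n_workers=2, chunk_size=2))
# -> Counter({('a', 'b'): 2, ('b', 'c'): 2})
```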
7. Frequent itemset mining on multiprocessor systems

Schlegel, Benjamin 30 May 2013 (has links)
Frequent itemset mining is an important building block in many data mining applications such as market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed, which however (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out of core, i.e., to access secondary memory, which leads to serious performance degradation. Exploiting the available parallelism is further required to mine large datasets, because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.

In this work, we tackle the high memory requirements of frequent itemset mining in two ways: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, reducing the size of the datasets by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughput for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and also employ explicit compression. Both methods together reduce the size of the intermediate data by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.

To cope with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit the available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system.

Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms used for mining other types of itemsets.
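One of the building blocks named in this abstract, intersecting sorted lists of transaction identifiers (tid-lists), can be made concrete with a small sketch. This is a generic two-pointer intersection shown for illustration only; the dissertation's vectorized and multi-threaded variants are not reproduced here, and the function and variable names are assumptions.

```python
# Generic two-pointer intersection of two sorted tid-lists, one of the basic
# building blocks mentioned above. Illustrative only; the optimized (vectorized,
# parallel) variants from the dissertation are not reproduced here.
def intersect_sorted(a, b):
    """Return the sorted intersection of two sorted integer lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# In vertical (Eclat-style) mining, the support of an itemset {x, y} is the
# length of the intersection of the tid-lists of x and y.
tids_x = [1, 3, 4, 7, 9]
tids_y = [2, 3, 7, 8, 9]
print(intersect_sorted(tids_x, tids_y))  # [3, 7, 9] -> support 3
```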
