pcApriori: Scalable apriori for multiprocessor systems

Frequent-itemset mining is an important part of data mining. It is a computationally and memory-intensive task with a large number of scientific and statistical application areas. In many of them, datasets can easily grow to tens or even several hundred gigabytes. Hence, efficient algorithms are required to process such amounts of data. In recent years, many efficient sequential mining algorithms have been proposed; however, they cannot exploit current and future systems that provide large degrees of parallelism. In contrast, the number of parallel frequent-itemset mining algorithms is rather small, and most of them do not scale well as the number of threads increases substantially. In this paper, we present a highly scalable mining algorithm that is based on the well-known Apriori algorithm; it is optimized for processing very large datasets on multiprocessor systems. The key idea of pcApriori is to employ a modified producer-consumer processing scheme, which partitions the data during processing and distributes it to the available threads. We conduct many experiments on large datasets. pcApriori scales almost linearly on our test system comprising 32 cores.
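
The general idea of combining Apriori with a producer-consumer scheme can be illustrated with a minimal sketch. This is not the pcApriori implementation from the paper: the function names (`count_support`, `apriori`), the parameters (`num_workers`, `chunk_size`), and the queue-based partitioning are assumptions chosen for illustration only. Python threads are also limited by the interpreter lock, so the sketch shows the structure of the scheme rather than its performance.

```python
import queue
import threading
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates, num_workers=4, chunk_size=2):
    """Count how many transactions contain each candidate itemset.

    A single producer splits the data into chunks; consumer threads
    count supports per chunk. All names here are illustrative.
    """
    work = queue.Queue(maxsize=2 * num_workers)
    partials = []
    lock = threading.Lock()

    def consumer():
        local = Counter()
        while True:
            chunk = work.get()
            if chunk is None:  # sentinel: no more work
                break
            for transaction in chunk:
                for candidate in candidates:
                    if candidate <= transaction:  # subset test
                        local[candidate] += 1
        with lock:
            partials.append(local)

    workers = [threading.Thread(target=consumer) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Producer: partition the transactions and hand chunks to the consumers.
    for i in range(0, len(transactions), chunk_size):
        work.put([frozenset(t) for t in transactions[i:i + chunk_size]])
    for _ in workers:
        work.put(None)
    for w in workers:
        w.join()

    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total


def apriori(transactions, min_support, num_workers=4):
    """Level-wise Apriori using the threaded support counting above."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        counts = count_support(transactions, candidates, num_workers)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent
```

Calling `apriori(transactions, min_support)` with a list of item lists returns a dictionary that maps each frequent itemset (as a `frozenset`) to its support count.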

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:80641
Date: 16 September 2022
Creators: Schlegel, Benjamin; Kiefer, Tim; Kissinger, Thomas; Lehner, Wolfgang
Publisher: ACM
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/acceptedVersion, doc-type:conferenceObject, info:eu-repo/semantics/conferenceObject, doc-type:Text
Rights: info:eu-repo/semantics/openAccess
Relation: 978-1-4503-1921-8, 20, 10.1145/2484838.2484879, info:eu-repo/grantAgreement/Deutsche Forschungsgemeinschaft/Sonderforschungsbereiche/164481002//HAEC - Highly Adaptive Energy-Efficient Computing/SFB 912