• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Parallel Closet+ Algorithm For Finding Frequent Closed Itemsets

Sen, Tayfun 01 July 2009 (has links) (PDF)
Data mining is proving itself to be a very important field as the data available is increasing exponentially, thanks to first computerization and now internetization. On the other hand, cluster computing systems made up of commodity hardware are becoming widespread, along with the multicore processor architectures. This high computing power is synthesized with data mining to process huge amounts of data and to reach information and knowledge. Frequent itemset mining is a special subtopic of data mining because it is an integral part of many types of data mining tasks. Often this task is a prerequisite for many other data mining algorithms, most notably algorithms in the association rule mining area. For this reason, it is studied heavily in the literature. In this thesis, a parallel implementation of CLOSET+, a frequent closed itemset mining algorithm, is presented. The CLOSET+ algorithm has been modified to run on multiple processors simultaneously, in order to obtain results faster. Open MPI and Boost libraries have been used for the communication between different processes and the program has been tested on different inputs and parameters. Experimental results show that the algorithm exhibits high speedup and eficiency for dense data when the support value is higher than a determined value. Proposed parallel algorithm could prove to be useful for application areas where fast response is needed for low to medium number of frequent closed itemsets. A particular application area is the Web where online applications have similar requirements.
2

FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

Oukid, Ismail, Lasperas, Johan, Nica, Anisoara, Willhalm, Thomas, Lehner, Wolfgang 17 August 2022 (has links)
The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than DRAM-based counterparts since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree) that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.

Page generated in 0.0242 seconds