Global ETD Search

Didelių duomenų sekų analizės problemos / Data mining problems
Ambraziūnas, Valdas
11 June 2004

The main goal of this thesis is to compare algorithms for finding association rules and to assess how useful such rules are in business. To this end, three algorithms are analysed theoretically:

1. The Apriori algorithm, the best-known association rule algorithm, is based on the property “Any subset of a large itemset must be large”. It assumes that the database is memory-resident. The maximum number of database scans is one more than the cardinality of the largest large itemset.
2. The Sampling algorithm works on a sample of the database before the full database scan. The sample is drawn so that it can be memory-resident. The algorithm reduces the number of database scans to one in the best case and two in the worst case.
3. The Partitioning algorithm divides the database into partitions, each small enough to fit into main memory, and relies on the property “A large itemset must be large in at least one of the partitions”. It reduces the number of database scans to two.

Programs were written in C++ for all three algorithms, plus a program for the full set of itemsets algorithm. To achieve the best possible performance, the programs have no GUI. Nine test data sets were created to compare the algorithms. Six of them contain real-life data from the telecommunications business. The datasets vary from the...

Informatics, Apriori, Duomenų analizė (data analysis), Partitioning, Sampling, Association rules, Data mining, Asociacinės taisyklės (association rules)
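
The two properties quoted in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the thesis's C++ code. The function names (`count_support`, `apriori`, `partition_mine`), the use of absolute support counts, and the integer percentage threshold `min_pct` are all assumptions made for this illustration. `apriori` also returns the number of passes it made over the data, so the scan bound stated above can be checked on small inputs.

```python
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates):
    """One pass over the data: count how many transactions contain each candidate."""
    counts = Counter()
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def apriori(transactions, min_support):
    """Return (large itemsets mapped to support counts, number of data scans).

    Assumes the data fits in memory and min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    # Scan 1: count every single item.
    scans = 1
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    level = {c: n for c, n in counts.items() if n >= min_support}
    large = dict(level)
    k = 2
    while level:
        # Join large (k-1)-itemsets into k-candidates, then prune every
        # candidate that has a (k-1)-subset which is not large.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        if not candidates:
            break
        scans += 1
        counts = count_support(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        large.update(level)
        k += 1
    return large, scans


def partition_mine(transactions, n_parts, min_pct):
    """Two-pass partition scheme; min_pct is an integer percentage (hypothetical parameter)."""
    transactions = [frozenset(t) for t in transactions]
    if not transactions:
        return {}
    size = -(-len(transactions) // n_parts)  # ceiling division
    candidates = set()
    # Pass 1: mine each partition separately with a proportional local threshold.
    for start in range(0, len(transactions), size):
        part = transactions[start:start + size]
        local_min = max(1, -(-min_pct * len(part) // 100))
        local_large, _ = apriori(part, local_min)
        candidates |= set(local_large)
    # Pass 2: count the union of locally large itemsets against the full data.
    global_min = max(1, -(-min_pct * len(transactions) // 100))
    counts = count_support(transactions, candidates)
    return {c: n for c, n in counts.items() if n >= global_min}
```

Calling `apriori(data, 2)` on a list of item sets returns the large itemsets together with the scan count, and `partition_mine(data, 2, 40)` should return the same itemsets when 40% of the transactions corresponds to the same absolute threshold. The Sampling algorithm is left out because the abstract does not describe how it handles itemsets missed by the sample.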
