Global ETD Search

1	A Large Itemset-Based Approach to Mining Subspace Clusters from DNA Microarray Data. Tsai, Yueh-Chi. 20 June 2008. DNA microarrays are one of the latest breakthroughs in experimental molecular biology and have made it possible to create datasets of molecular information representing many systems of biological or clinical interest. Clustering techniques have proven helpful for understanding gene function, gene regulation, cellular processes, and cell subtypes. Investigations show that, more often than not, several genes contribute to a disease, which motivates researchers to identify a subset of genes whose expression levels are similar under a subset of conditions. Most subspace clustering models define similarity among objects by distances over either all dimensions or only a subset of them. However, strong correlations may still exist among a set of objects even if they are far apart as measured by these distance functions. Several techniques, such as pCluster and zCluster, have been proposed to find subspace clusters with coherent expression of a subset of genes on a subset of conditions. However, both include time-consuming steps: constructing gene-pair MDSs and distributing the gene information to each node of a prefix tree. Therefore, in this thesis, we propose a Large Itemset-Based Clustering (LISC) algorithm that overcomes these disadvantages of the pCluster and zCluster algorithms. First, we avoid constructing gene-pair MDSs and construct only condition-pair MDSs, which reduces processing time. Second, we transform the task of mining possible maximal gene sets into the problem of mining large itemsets from the condition-pair MDSs. We borrow the concept of the large itemset from association rule mining, where a large itemset is a set of items that appears in a sufficient number of transactions.
Since we are interested only in subspace clusters whose gene sets are as large as possible, we focus on gene sets that have reasonably large support among the condition-pair MDSs. In other words, we search for large itemsets in the condition-pair MDSs, which yields gene sets shared by enough condition pairs. In this step, we use a revised version of the FP-tree structure, which has been shown to be one of the most efficient data structures for mining large itemsets, to find large itemsets of genes from the condition-pair MDSs. Using the FP-tree, we avoid the complex distribution operation and reduce the search space dramatically. Finally, we develop an algorithm that constructs the final clusters from the gene sets and condition pairs obtained by searching the FP-tree. Since we are interested in clusters that are large enough and do not belong to any other cluster, we alternately combine or extend the gene sets and the condition sets to construct subspace clusters that are as large as possible. Our simulation results show that the proposed algorithm needs less processing time than the previously proposed algorithms, because those algorithms must construct gene-pair MDSs. Keywords: Large Itemset, Microarray, Subspace Clustering, pCluster, FP-tree
2	Daugiamačių sekų šablonų analizė / Multidimensional sequential pattern mining. Ivaškevičius, Klaidas. 30 June 2014. The main goal of this master's thesis was to review how some algorithms and their combinations can be applied to multidimensional sequential pattern mining, and to implement an algorithm capable of doing so. The thesis describes the FP-Tree, a structure for compactly storing critical (for example, frequently repeated) data, and presents the FP-Growth algorithm, which analyzes an FP-Tree and produces the set of all frequent item patterns. It then introduces MD-PS-FPG, a combination of modified FP-Growth and PrefixSpan algorithms, and reports the results of some tests together with the main objectives of further work. Keywords: Data mining, Multidimensional, Sequence, Pattern, Algorithm, PrefixSpan, FP-Tree, FP-Growth, MD-PS-FPG
3	Získávání frekventovaných vzorů z proudu dat / Frequent Pattern Discovery in a Data Stream. Dvořák, Michal. January 2012. Frequent-pattern mining from databases has been widely studied. Unfortunately, the algorithms developed for it are not suitable for processing data streams. When mining frequent patterns from a data stream, it is important to maintain not only the sets of items but also their history: not just the history of frequent items, but also that of potentially frequent sets that may become frequent later. This requires more memory and computational power. This thesis describes two algorithms, Lossy Counting and FP-stream. An efficient implementation of both algorithms in C# is an integral part of the thesis, and the two algorithms are compared.
4	Získávání znalostí z datových skladů / Knowledge Discovery over Data Warehouses. Pumprla, Ondřej. January 2009. This master's thesis deals with the principles of the data mining process, especially the mining of association rules. It sets out the theoretical background: a general description of data warehouses and the principles of building them. Based on this theory, an application for mining association rules is implemented. The application accepts data either in transactional form or as multidimensional data organized in a star schema. The implemented frequent-pattern algorithms are Apriori and FP-tree. The system lets the user vary the parameters of the mining process. Validation tests and efficiency measurements were also carried out. In terms of support for association rule mining, the resulting application is more applicable and robust than the existing systems it was compared with, SAS Miner and Oracle Data Miner.
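The first abstract turns subspace clustering into large-itemset mining over condition-pair MDSs. The sketch below illustrates that transformation under stated assumptions and is not the LISC algorithm itself: it assumes a pCluster-style definition of a condition-pair MDS as a maximal set of genes whose expression differences between two conditions fall within a window of width `delta`, and it counts a gene set's support as the number of condition pairs with an MDS containing it. The names `condition_pair_mds` and `pair_support`, the parameters `delta` and `min_genes`, and the toy matrix are all hypothetical.

```python
from itertools import combinations

def condition_pair_mds(matrix, delta, min_genes):
    """Hypothetical sketch: for each condition pair (a, b), return the maximal gene
    sets whose differences matrix[g][a] - matrix[g][b] span at most delta."""
    n_conditions = len(next(iter(matrix.values())))
    result = {}
    for a, b in combinations(range(n_conditions), 2):
        # Sort genes by their expression difference between conditions a and b.
        diffs = sorted((row[a] - row[b], gene) for gene, row in matrix.items())
        maximal_sets = []
        start = 0
        for end in range(len(diffs)):
            # Shrink the window from the left until it spans at most delta,
            # so it cannot be extended further to the left.
            while diffs[end][0] - diffs[start][0] > delta:
                start += 1
            # The window is maximal if it cannot be extended to the right either.
            at_right_edge = end == len(diffs) - 1
            if at_right_edge or diffs[end + 1][0] - diffs[start][0] > delta:
                if end - start + 1 >= min_genes:
                    maximal_sets.append(frozenset(g for _, g in diffs[start:end + 1]))
        result[(a, b)] = maximal_sets
    return result

def pair_support(gene_set, mds_by_pair):
    """Number of condition pairs having at least one MDS that contains gene_set."""
    return sum(1 for sets in mds_by_pair.values() if any(gene_set <= s for s in sets))

# Toy expression matrix (hypothetical): gene -> levels under conditions 0, 1, 2.
matrix = {
    "g1": [1, 4, 2],
    "g2": [2, 5, 3],
    "g3": [0, 3, 9],
    "g4": [5, 1, 7],
}
mds = condition_pair_mds(matrix, delta=0.5, min_genes=2)
print(sorted(mds[(0, 1)][0]))                      # ['g1', 'g2', 'g3']
print(pair_support(frozenset({"g1", "g2"}), mds))  # 3
```

Here {g1, g2} is shifted consistently across every condition pair, so it would be a natural large itemset for the later mining step, while {g3, g4} is supported by only one pair.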
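Three of the abstracts above rely on the FP-tree, and the second also on FP-Growth, to find large (frequent) itemsets. The following minimal sketch shows the textbook FP-tree construction and FP-growth recursion; it is not the revised FP-tree of the first thesis or the MD-PS-FPG combination of the second. It assumes transactions are given as `(items, count)` pairs of hashable, mutually comparable items (for example strings) and that `min_support` is an absolute count; `FPNode`, `build_fp_tree`, and `fp_growth` are illustrative names.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item, its count, and links to its parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return the header table and supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties broken by item).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count  # shared prefixes accumulate counts
            node = child
    return header, frequent

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    header, frequent = build_fp_tree(transactions, min_support)
    patterns = {}
    for item, item_support in frequent.items():
        pattern = suffix | {item}
        patterns[pattern] = item_support
        # Conditional pattern base: the prefix path above each node holding `item`.
        conditional_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional_base.append((path, node.count))
        # Recurse on the conditional base to extend the current pattern.
        patterns.update(fp_growth(conditional_base, min_support, pattern))
    return patterns

tx = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"], ["a", "d", "e"], ["a", "b", "c"]]
patterns = fp_growth([(t, 1) for t in tx], min_support=2)
print(patterns[frozenset({"a", "d", "e"})])  # 2
```

Because each prefix path only contains items ranked before the node's item, every itemset is generated exactly once, which is what lets FP-growth avoid Apriori-style candidate generation.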
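The second abstract combines FP-Growth with PrefixSpan. As a rough illustration of the PrefixSpan half only, here is a minimal sketch of prefix-projected sequential pattern mining. It assumes, for simplicity, that each sequence element is a single item rather than an itemset and that the data has a single dimension, so it is not the MD-PS-FPG algorithm; `prefixspan` and its parameters are hypothetical names.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for every sequential pattern of single items
    that occurs (as a subsequence) in at least min_support sequences."""
    patterns = {}

    def mine(prefix, projected_db):
        # Count each item at most once per projected sequence.
        counts = defaultdict(int)
        for seq in projected_db:
            for item in set(seq):
                counts[item] += 1
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = support
            # Project: keep the suffix after the first occurrence of `item`.
            next_db = [seq[seq.index(item) + 1:] for seq in projected_db if item in seq]
            mine(new_prefix, next_db)

    mine((), [list(s) for s in sequences])
    return patterns

seqs = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
patterns = prefixspan(seqs, min_support=2)
print(patterns[("a", "c")])  # 2
```

Projecting on the first occurrence is enough here: any later occurrence yields a suffix that is a subsequence of the first one, so it cannot contain patterns the first suffix misses.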
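The third abstract names Lossy Counting as one of the two stream algorithms. Below is a minimal sketch of Lossy Counting for single items only; the itemset version and FP-stream are not shown, and the thesis's C# implementation may differ. It follows the usual formulation: with error bound `epsilon`, the stream is split into buckets of width `ceil(1 / epsilon)`, each entry stores a count and a `delta` bounding how many occurrences may have been missed before it was inserted, and at every bucket boundary entries with count + delta at most the current bucket id are pruned. `LossyCounter`, `add`, and `frequent` are hypothetical names.

```python
import math

class LossyCounter:
    """Approximate frequency counts over a stream; stored counts undercount by at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # number of items seen so far
        self.entries = {}                    # item -> [count, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # delta bounds how many occurrences may have been missed before insertion.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: prune entries that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items() if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}

counter = LossyCounter(epsilon=0.1)
stream = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
for item in stream:
    counter.add(item)
print(counter.frequent(support=0.25))  # {'a': 50, 'b': 30}
```

The twenty one-off items are pruned at the bucket boundaries, so memory stays small while every item whose true frequency reaches 25% of the stream is still reported.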
