1

Daugiamačių sekų šablonų analizė / Multidimensional sequential pattern mining

Ivaškevičius, Klaidas 30 June 2014 (has links)
The main goal of this master's thesis was to survey several algorithms and their combinations for multidimensional sequential pattern mining, and to implement an algorithm capable of performing such mining. The FP-Tree structure, which compactly stores critical (e.g., frequently recurring) data, was described, along with the FP-Growth algorithm, which mines this structure and returns the complete set of frequent patterns. The MD-PS-FPG algorithm, a combination of modified FP-Growth and PrefixSpan algorithms, was then introduced, and results of selected tests as well as the main objectives of future work were presented.
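
The FP-Tree mentioned above compresses a transaction database by storing transactions in a fixed item order, so that transactions sharing a prefix share a single path in the tree. A minimal illustrative sketch in Python (the class and method names are hypothetical, not taken from the thesis):

    class FPNode:
        """One FP-Tree node: an item, its support count, and child links."""
        def __init__(self, item=None, parent=None):
            self.item = item
            self.count = 0
            self.parent = parent
            self.children = {}  # item -> FPNode

    class FPTree:
        def __init__(self):
            self.root = FPNode()

        def insert(self, transaction):
            """Insert one transaction; items must already be sorted in a
            fixed global frequency order so shared prefixes share nodes."""
            node = self.root
            for item in transaction:
                child = node.children.get(item)
                if child is None:
                    child = FPNode(item, parent=node)
                    node.children[item] = child
                child.count += 1
                node = child

    # Three transactions; the shared prefix (a, b) occupies a single path.
    tree = FPTree()
    for t in [["a", "b", "c"], ["a", "b"], ["a", "d"]]:
        tree.insert(t)
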
2

Applying the Apriori and FP-Growth Association Algorithms to Liver Cancer Data

Pinheiro, Fabiola M. R. 27 August 2013 (has links)
Cancer is the leading cause of death globally. Although liver cancer ranks only fourth in incidence worldwide among all types of cancer, its survival rate is the lowest. Liver cancer is often diagnosed at an advanced stage because patients usually show no signs or symptoms in the early stages of the disease. After initial diagnosis, therapeutic options are limited and tend to be effective only for small tumors with limited spread and minimal vascular invasion. As a result, long-term patient survival remains minimal and has not improved in the past three decades. To reduce morbidity and mortality from liver cancer, improved early diagnosis and evaluation of current treatments are essential. This study tested the applicability of the Apriori and FP-Growth association data mining algorithms to liver cancer patient data obtained from the British Columbia Cancer Agency. The data was used to develop association rules that indicate which combinations of factors are most commonly observed with liver cancer incidence, as well as with increased or decreased rates of mortality. Ideally, these association rules will be applied in future studies using liver cancer data extracted from other Electronic Health Record (EHR) systems. The main objective of making these rules available is to facilitate early detection guidelines for liver cancer and to evaluate current treatment options.
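
Association rules of the kind developed in this study are commonly mined from one-hot encoded records. A minimal sketch using the mlxtend library, where the patient factors, values, and thresholds are illustrative assumptions rather than data from the study:

    import pandas as pd
    from mlxtend.frequent_patterns import fpgrowth, association_rules

    # Hypothetical one-hot encoded patient records; the real study used
    # British Columbia Cancer Agency data with many more factors.
    records = pd.DataFrame(
        [
            {"hepatitis_b": True,  "cirrhosis": True,  "advanced_stage": True},
            {"hepatitis_b": True,  "cirrhosis": False, "advanced_stage": True},
            {"hepatitis_b": False, "cirrhosis": True,  "advanced_stage": False},
            {"hepatitis_b": True,  "cirrhosis": True,  "advanced_stage": True},
        ]
    )

    # Frequent itemsets via FP-Growth (mlxtend's apriori() is a drop-in alternative).
    itemsets = fpgrowth(records, min_support=0.5, use_colnames=True)

    # Rules such as {hepatitis_b, cirrhosis} -> {advanced_stage}.
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
    print(rules[["antecedents", "consequents", "support", "confidence"]])
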
3

Uma arquitetura de software para descoberta de regras de associação multidimensional, multinível e de outliers em cubos OLAP: um estudo de caso com os algoritmos APriori e FPGrowth / A software architecture for discovering multidimensional, multilevel, and outlier association rules in OLAP cubes: a case study with the APriori and FPGrowth algorithms

Moreira Tanuro, Carla 31 January 2010 (has links)
The traditional process of knowledge discovery in databases (KDD) does not include multidimensional and multilevel processing steps (i.e., OLAP, OnLine Analytical Processing) for mining data cubes. As a consequence, most OLAM (OLAP Mining) approaches propose adaptations inside the mining algorithm itself. Because such a solution is tightly coupled to the mining algorithm, the adaptations for multidimensional and multilevel mining cannot be reused with other algorithms. In addition, most OLAM proposals for association rules do not consider the use of an OLAP server and do not exploit the full multidimensional and multilevel potential of OLAP cubes. For these reasons, some rework is performed (e.g., re-implementing OLAP operations), and potentially strong patterns arising from generalizations are not identified. Against this background, this work proposes the DOLAM (Decoupled OLAM) architecture for decoupled mining of multidimensional, multilevel, and outlier association rules in OLAP cubes. DOLAM is inserted into the KDD process as a processing step between the Preprocessing and Data Transformation stages. The architecture defines and implements three components: 1) Outlier Detector, 2) Subcube Explorer, and 3) Ancestor Expander. Starting from a user query, these components can, respectively: 1) identify significant noise in the result cells; 2) recursively explore all result cells so as to cover every possible multidimensional and multilevel combination; and 3) retrieve all ancestors (generalizations) of the result cells. The central component of the architecture, and the only mandatory one, is the Ancestor Expander. With these components, OLAM processing is decoupled from the mining algorithm and supports broader discoveries, which in turn can return potentially stronger patterns. As a proof of concept, a case study was carried out with real data from a microcredit company. The case study was implemented in Java, used the Mondrian OLAP server, and employed the APriori and FP-Growth association rule mining implementations from the Weka software package.
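
The Ancestor Expander, the architecture's one mandatory component, can be illustrated as follows: given a cube cell addressed by a path down each dimension's hierarchy, enumerate every generalization of that cell. A minimal Python sketch under assumed data structures (the actual case study was implemented in Java on the Mondrian OLAP server; all names here are hypothetical):

    from itertools import product

    # A cell is a dict: dimension -> path from the top of the hierarchy
    # down to the cell's member, e.g. ("2010", "Q1") on a Time dimension.
    cell = {
        "Time":     ("2010", "Q1"),
        "Location": ("Brazil", "Pernambuco", "Recife"),
    }

    def ancestors(cell):
        """Yield every generalization of a cell by truncating each
        dimension's path to a shallower level; the empty path () stands
        for the dimension's 'all' member."""
        dims = list(cell)
        # For each dimension, all prefixes of its path, shortest first.
        prefix_sets = [
            [cell[d][:i] for i in range(len(cell[d]) + 1)] for d in dims
        ]
        for combo in product(*prefix_sets):
            if combo != tuple(cell[d] for d in dims):  # skip the cell itself
                yield dict(zip(dims, combo))

    for a in ancestors(cell):
        print(a)  # 11 ancestors, from fully generalized down to near-specific
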
4

學術研究論文推薦系統之研究 / Development of a Recommendation System for Academic Research Papers

葉博凱 Unknown Date (has links)
Recommendation systems raise user satisfaction, save users' time, and increase sales for site providers; they have become an essential element of modern websites. Research on recommendation systems, however, has concentrated on entertainment items such as movies, music, and e-commerce, and work on recommending academic research papers is limited, even though pointing researchers to valuable related literature clearly accelerates progress. Previous research indicates that the methods used to achieve personalization have unavoidable or unsolved shortcomings, such as cold starts. In 2002 the Association of Research Libraries put forward the Budapest Open Access Initiative, which advocates open access: users can obtain the full text of research papers without registering an account or paying a fee. This practice has spread among journals, and open access has since had a major impact on academic journal websites. Against this background, this study proposes a recommendation mechanism suited to academic papers that uses the FP-Growth algorithm and collaborative filtering as the basis of its recommendation method, eliminating the shortcomings of earlier work while retaining the advantages of personalized recommendation. Experimental validation confirms that the proposed recommendation architecture performs well.
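
The mechanism pairs FP-Growth-style co-access patterns with collaborative filtering. The toy sketch below shows only the pattern half, using simple pair counting as a stand-in for FP-Growth's frequent itemsets; the session data, names, and thresholds are illustrative assumptions, not from the thesis:

    from collections import Counter
    from itertools import combinations

    # Hypothetical download sessions: the papers each user accessed together.
    sessions = [
        {"paperA", "paperB", "paperC"},
        {"paperA", "paperB"},
        {"paperB", "paperC"},
        {"paperA", "paperC", "paperD"},
    ]

    # Count frequent paper pairs (a toy stand-in for FP-Growth's output).
    pair_counts = Counter()
    for s in sessions:
        for pair in combinations(sorted(s), 2):
            pair_counts[pair] += 1

    def recommend(read_paper, min_support=2):
        """Papers frequently co-accessed with the one just read."""
        hits = []
        for (a, b), n in pair_counts.items():
            if n >= min_support and read_paper in (a, b):
                hits.append((b if a == read_paper else a, n))
        return sorted(hits, key=lambda x: -x[1])

    print(recommend("paperA"))  # e.g. [('paperB', 2), ('paperC', 2)]
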
5

Frequent itemset mining on multiprocessor systems

Schlegel, Benjamin 08 May 2014 (has links) (PDF)
Frequent itemset mining is an important building block in many data mining applications like market basket analysis, recommendation, web mining, fraud detection, and gene expression analysis. In many of them, the datasets being mined can easily grow up to hundreds of gigabytes or even terabytes of data. Hence, efficient algorithms are required to process such large amounts of data. In recent years, many frequent-itemset mining algorithms have been proposed which, however, (1) often have high memory requirements and (2) do not exploit the large degrees of parallelism provided by modern multiprocessor systems. The high memory requirements arise mainly from inefficient data structures that have only been shown to be sufficient for small datasets. For large datasets, however, the use of these data structures forces the algorithms to go out-of-core, i.e., they have to access secondary memory, which leads to serious performance degradation. Exploiting available parallelism is further required to mine large datasets because the serial performance of processors has almost stopped increasing. Algorithms should therefore exploit the large number of available threads as well as other kinds of parallelism (e.g., vector instruction sets) besides thread-level parallelism.

In this work, we tackle the high memory requirements of frequent itemset mining twofold: we (1) compress the datasets being mined, because they must be kept in main memory during several mining invocations, and (2) improve existing mining algorithms with memory-efficient data structures. For compressing the datasets, we employ efficient encodings that show good compression performance on a wide variety of realistic datasets, reducing the size of the datasets by up to 6.4x. The encodings can further be applied directly while loading the dataset from disk or network. Since encoding and decoding are repeatedly required for loading and mining the datasets, we reduce their costs by providing parallel encodings that achieve high throughputs for both tasks. For a memory-efficient representation of the mining algorithms' intermediate data, we propose compact data structures and even employ explicit compression. Both methods together reduce the intermediate data's size by up to 25x. The smaller memory requirements avoid or delay expensive out-of-core computation when large datasets are mined.

For coping with the high parallelism provided by current multiprocessor systems, we identify the performance hot spots and scalability issues of existing frequent-itemset mining algorithms. The hot spots, which form basic building blocks of these algorithms, cover (1) counting the frequency of fixed-length strings, (2) building prefix trees, (3) compressing integer values, and (4) intersecting lists of sorted integer values or bitmaps. For all of them, we discuss how to exploit available parallelism and provide scalable solutions. Furthermore, almost all components of the mining algorithms must be parallelized to keep the sequential fraction of the algorithms as small as possible. We integrate the parallelized building blocks and components into three well-known mining algorithms and further analyze the impact of certain existing optimizations. Even single-threaded, our algorithms are often up to an order of magnitude faster than existing highly optimized algorithms, and they scale almost linearly on a large 32-core multiprocessor system.

Although our optimizations are intended for frequent-itemset mining algorithms, they can be applied with only minor changes to algorithms that are used for mining other types of itemsets.
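
One of the hot spots listed above, intersecting sorted integer lists, is the building block that joins the tid-lists of two items. The merge-style loop below is the scalar baseline that vectorized and multithreaded variants would accelerate; it is an illustrative sketch, not the thesis' actual implementation:

    def intersect_sorted(a, b):
        """Merge-style intersection of two sorted integer lists, O(len(a)+len(b))."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                i += 1
            elif a[i] > b[j]:
                j += 1
            else:
                out.append(a[i])
                i += 1
                j += 1
        return out

    # Tid-lists of two items; their intersection supports the 2-itemset.
    print(intersect_sorted([1, 3, 4, 7, 9], [2, 3, 7, 8, 9]))  # [3, 7, 9]
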
6

Získávání frekventovaných vzorů z proudu dat / Frequent Pattern Discovery in a Data Stream

Dvořák, Michal January 2012 (has links)
Frequent-pattern mining from databases has been widely studied and applied. Unfortunately, the classic algorithms are not suitable for data stream processing. When mining frequent patterns from a data stream, it is important to manage not only the sets of items but also their history. There are several reasons for this: what must be kept is not just the history of frequent itemsets, but also the history of potentially frequent sets that can become frequent later, which requires more memory and computational power. This thesis describes two such algorithms, Lossy Counting and FP-stream. An effective implementation of both algorithms in C# is an integral part of this thesis, and the two algorithms are compared.
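
Lossy Counting, the first of the two algorithms, keeps approximate counts over a stream with a user-chosen error bound eps: each tracked entry carries a maximum-error term, and entries are pruned at bucket boundaries. A minimal single-item sketch following the standard formulation of the algorithm (the thesis' own implementation is in C# and, via FP-stream, handles itemsets rather than single items):

    import math

    class LossyCounting:
        """Approximate stream frequency counting (Manku & Motwani).
        Guarantee: estimated count >= true count - eps * n."""
        def __init__(self, eps):
            self.eps = eps
            self.width = math.ceil(1 / eps)  # bucket width
            self.n = 0                       # items seen so far
            self.entries = {}                # item -> [count, max_error]

        def add(self, item):
            self.n += 1
            bucket = math.ceil(self.n / self.width)
            if item in self.entries:
                self.entries[item][0] += 1
            else:
                self.entries[item] = [1, bucket - 1]
            if self.n % self.width == 0:     # bucket boundary: prune
                self.entries = {k: v for k, v in self.entries.items()
                                if v[0] + v[1] > bucket}

        def frequent(self, support):
            """Items whose true frequency may exceed support * n."""
            thresh = (support - self.eps) * self.n
            return [k for k, v in self.entries.items() if v[0] >= thresh]

    lc = LossyCounting(eps=0.1)
    for x in "aababcabad" * 10:
        lc.add(x)
    print(lc.frequent(support=0.3))  # ['a', 'b']: these dominate the stream
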
