About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Novel applications of Association Rule Mining - Data Stream Mining

Vithal Kadam, Omkar January 2009 (has links)
Since the advent of association rule mining, it has become one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous, voluminous flow of data, known as a data stream, has generated immense interest due to emerging applications such as network-traffic analysis and sensor-network data analysis. For such application domains, the ability to process this enormous amount of stream data in a single pass is critical.
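To make the single-pass requirement concrete, here is a minimal sketch of one classic one-pass, bounded-memory frequency algorithm for streams, Lossy Counting (Manku and Motwani), simplified to single items rather than itemsets. It is an illustrative baseline under those assumptions, not the method developed in this thesis.

```python
# Lossy Counting: approximate frequency counts over a stream in a single
# pass with bounded memory. Simplified to single items for illustration.
from math import ceil

class LossyCounter:
    def __init__(self, epsilon=0.001):
        self.epsilon = epsilon          # maximum permitted frequency error
        self.width = ceil(1 / epsilon)  # bucket width
        self.n = 0                      # items seen so far
        self.counts = {}                # item -> (count, max undercount)

    def add(self, item):
        self.n += 1
        bucket = ceil(self.n / self.width)
        count, delta = self.counts.get(item, (0, bucket - 1))
        self.counts[item] = (count + 1, delta)
        # At each bucket boundary, prune entries that cannot be frequent.
        if self.n % self.width == 0:
            self.counts = {k: (c, d) for k, (c, d) in self.counts.items()
                           if c + d > bucket}

    def frequent(self, support):
        """Items whose true frequency may exceed `support` (a fraction)."""
        return [k for k, (c, _) in self.counts.items()
                if c >= (support - self.epsilon) * self.n]
```

Calling `add` once per arriving item keeps memory bounded through periodic pruning, and `frequent` never misses an item whose true frequency exceeds the requested support.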
2

Um estudo investigativo de algoritmos de regressão para data streams / An investigative study of regression algorithms for data streams

Nunes, André Luís 28 March 2017 (has links)
The explosion in data volume and its rate of growth make knowledge discovery and data analysis challenging, even more so when non-stationary datasets are considered. Although the prediction of future values plays a fundamental role in areas such as climate, routing problems, and economics, classification still appears to be the most explored task. Recently, several regression algorithms for streams have been introduced, for example FIMT-DD, AMRules, IBLStreams, and SFNRegressor; however, the studies investigating them have focused more on innovation and on analysis of prediction error than on the criteria regarded as fundamental for data streams, such as execution time and memory. The objective of this work is therefore to present an investigative study of these regression algorithms in dynamic environments, using massive databases, and to explore their capacity to adapt in the presence of concept drift. To this end, three databases were analysed and extended to cover the main evaluation criteria adopted, and a broad set of experiments was carried out comparing the results obtained by the chosen algorithms, indicating how each behaves under the different scenarios to which they were exposed.
The main contributions of this work are: the evaluation of criteria fundamental to data stream regression (memory, execution time, and generalization power); a critical analysis of the algorithms investigated; and the possibility of reproducing and extending the studies by making the parameterizations employed available.
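The criteria highlighted above (prediction error, execution time, memory) are commonly measured with a prequential, test-then-train loop. The sketch below illustrates such a loop with a toy online SGD regressor; the model, the stream format, and the crude memory proxy are assumptions for illustration, not the setup of the thesis or of FIMT-DD, AMRules, IBLStreams or SFNRegressor.

```python
# Prequential (test-then-train) evaluation of an online regressor,
# reporting error, elapsed time, and a rough memory proxy.
import sys, time

class OnlineSGDRegressor:
    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def learn(self, x, y):
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

def prequential(stream, model):
    abs_err, n, t0 = 0.0, 0, time.perf_counter()
    for x, y in stream:                        # stream yields (features, target)
        abs_err += abs(model.predict(x) - y)   # test first ...
        model.learn(x, y)                      # ... then train
        n += 1
    return {"MAE": abs_err / n,
            "seconds": time.perf_counter() - t0,
            # container size of the weight vector only: a crude proxy
            "model_bytes": sys.getsizeof(model.w)}

# Toy usage: learn y = 2x + 1 from a synthetic stream.
stream = [([x / 100], 2 * (x / 100) + 1) for x in range(1000)]
print(prequential(stream, OnlineSGDRegressor(n_features=1)))
```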
3

Real-time Distributed Computation of Formal Concepts and Analytics

De Alburquerque Melo, Cassio 19 July 2013 (has links) (PDF)
The advances in technology for the creation, storage and dissemination of data have dramatically increased the need for tools that effectively provide users with means of identifying and understanding relevant information. Despite the great computing opportunities that distributed frameworks such as Hadoop provide, they have only increased the need for such means. Formal Concept Analysis (FCA) may play an important role in this context, by employing more intelligent means in the analysis process. FCA provides an intuitive understanding of the generalization and specialization relationships among objects and their attributes in a structure known as a concept lattice. The present thesis addresses the problem of mining and visualising concepts over a data stream. The proposed approach comprises several distributed components that compute concepts from basic transactions, filter and transform data, store it, and provide analytic features to explore the data visually. The novelty of our work consists of: (i) a distributed processing and analysis architecture for mining concepts in real time; (ii) the combination of FCA with visual analytics visualisation and exploration techniques, including association rule analytics; (iii) new algorithms for condensing and filtering conceptual data; and (iv) a system that implements all the proposed techniques, called Cubix, and its use cases in biology, complex system design and space applications.
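For intuition about the concept lattice mentioned above, the brute-force sketch below enumerates the formal concepts (closed extent/intent pairs) of a tiny hypothetical object-attribute context. Real FCA systems, including the distributed real-time approach of this thesis, use far more efficient algorithms.

```python
# Naive enumeration of formal concepts from a toy binary context.
from itertools import combinations

context = {            # hypothetical context: object -> set of attributes
    "frog": {"aquatic", "terrestrial"},
    "fish": {"aquatic"},
    "dog":  {"terrestrial"},
}
objects = list(context)

def intent(extent):    # attributes shared by every object in the extent
    sets = [context[o] for o in extent]
    return set.intersection(*sets) if sets else set.union(*context.values())

def extent(intent_):   # objects possessing every attribute in the intent
    return {o for o in objects if intent_ <= context[o]}

concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        i = intent(objs)                 # derive, then close the pair
        concepts.add((frozenset(extent(i)), frozenset(i)))

for ext, inte in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```

Each printed pair is a formal concept: the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes common to the extent.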
4

Extraction and Energy Efficient Processing of Streaming Data

García-Martín, Eva January 2017 (has links)
The interest in machine learning algorithms is increasing, in parallel with the advancements in the hardware and software required to mine large-scale datasets. Machine learning algorithms account for a significant amount of the energy consumed in data centers, which impacts global energy consumption; however, these algorithms are optimized for predictive performance and scalability rather than energy. Algorithms with low energy consumption are necessary for embedded systems and other resource-constrained devices, and desirable for platforms that require many computations, such as data centers. Data stream mining investigates how to process potentially infinite streams of data without the need to store all the data. This ability is particularly useful for companies that generate data at a high rate, such as social networks. This thesis investigates algorithms in the data stream mining domain from an energy-efficiency perspective. The thesis comprises two parts. The first part explores how to extract and analyze data from Twitter, with a pilot study that investigates a correlation between hashtags and followers. The second and main part investigates how energy is consumed and optimized in an online learning algorithm suitable for data stream mining tasks. It focuses on analyzing, understanding, and reformulating the Very Fast Decision Tree (VFDT) algorithm, the original Hoeffding tree algorithm, into an energy-efficient version, and presents three key contributions. First, it shows how energy varies in the VFDT from a high-level view by tuning different parameters. Second, it presents a methodology to identify energy bottlenecks in machine learning algorithms, by portraying the functions of the VFDT that consume the largest amount of energy. Third, it introduces dynamic parameter adaptation for Hoeffding trees, a method to dynamically adapt the parameters of Hoeffding trees to reduce their energy consumption. The results show an average energy reduction of 23% on the VFDT algorithm.
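The decision at the heart of the VFDT, and one driver of its energy profile, is whether a leaf has collected enough examples to split. A minimal sketch of that test follows, using the standard Hoeffding bound with the conventional tie-breaking threshold; the parameter names are generic, not taken from the thesis code.

```python
# The VFDT split test: split only when the Hoeffding bound separates the
# two best-scoring attributes, or when their difference no longer matters.
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean lies within this
    bound of the true mean after n observations in [0, value_range]."""
    return sqrt(value_range ** 2 * log(1 / delta) / (2 * n))

def should_split(g_best, g_second, value_range, delta, n, tie_threshold=0.05):
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the best attribute's gain beats the runner-up by more than
    # eps, or if eps has shrunk below the tie threshold.
    return (g_best - g_second) > eps or eps < tie_threshold

# Example: info gains 0.25 vs 0.18 after 2000 samples; for a binary
# class the gain range is log2(2) = 1.
print(should_split(0.25, 0.18, 1.0, 1e-7, 2000))  # True
```

Parameters such as delta, the tie threshold, and how often this test is run are exactly the kind of knobs whose tuning the thesis relates to energy consumption.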
5

Graph-based Multi-view Clustering for Continuous Pattern Mining

Åleskog, Christoffer January 2021 (has links)
Background. In many smart monitoring applications, such as smart healthcare, smart buildings, and autonomous cars, data are collected from multiple sources and contain information about different perspectives/views of the monitored phenomenon, physical object, or system. In addition, in many of those applications the availability of relevant labelled data is often low or even non-existent. Inspired by this, in this thesis we propose a novel algorithm for multi-view stream clustering. The algorithm can be applied for continuous pattern mining and labeling of streaming data. Objectives. The main objective of this thesis is to develop and implement a novel multi-view stream clustering algorithm. In addition, the potential of the proposed algorithm is studied and evaluated on two datasets: synthetic and real-world. The conducted experiments compare the new algorithm's performance with a single-view clustering algorithm and with a variant that does not transfer knowledge between chunks. Finally, the obtained results are analyzed, discussed and interpreted. Methods. Initially, we study the state-of-the-art multi-view (stream) clustering algorithms. We then develop our multi-view clustering algorithm for streaming data by implementing a transfer-of-knowledge feature, and we present and explain the developed algorithm in detail, motivating each choice made during the algorithm design phase. Finally, the algorithm configuration, experimental setup and the datasets chosen for the experiments are presented and motivated. Results. Different configurations of the proposed algorithm have been studied and evaluated under different experimental scenarios on the two datasets. The proposed multi-view clustering algorithm demonstrated higher performance on the synthetic data than on the real-world dataset, mainly due to the lower quality of the real-world data used. Conclusions. The proposed algorithm can generate high-quality clustering solutions with respect to the evaluation metrics used, and the transfer-of-knowledge feature has been shown to have a positive effect on the algorithm's performance. A further study of the proposed algorithm on richer and more suitable datasets, e.g., data collected from numerous sensors monitoring some phenomenon, is planned as future work.
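A simplified, single-view illustration of the transfer-of-knowledge idea is sketched below: the clustering of each chunk is warm-started from the centroids learned on the previous chunk. The use of k-means, the synthetic blobs, and all parameter values are assumptions for illustration; the algorithm proposed in the thesis is multi-view and graph-based.

```python
# Chunked stream clustering with knowledge transfer between chunks:
# each chunk's k-means is seeded with the previous chunk's centroids.
import numpy as np
from sklearn.cluster import KMeans

def cluster_stream(chunks, k):
    centers = None
    for chunk in chunks:                  # chunk: (n_samples, n_features)
        if centers is None:
            km = KMeans(n_clusters=k, n_init=10)
        else:                             # warm start from previous chunk
            km = KMeans(n_clusters=k, init=centers, n_init=1)
        km.fit(chunk)
        centers = km.cluster_centers_
        yield km.labels_

# Synthetic stream: two blobs per chunk, drifting slowly to the right.
rng = np.random.default_rng(0)
def make_chunk(shift):
    a = rng.normal(loc=(-2 + shift, 0), size=(100, 2))
    b = rng.normal(loc=(+2 + shift, 0), size=(100, 2))
    return np.vstack([a, b])

chunks = [make_chunk(s) for s in (0.0, 0.3, 0.6)]
for i, labels in enumerate(cluster_stream(chunks, k=2)):
    print(f"chunk {i}: cluster sizes {np.bincount(labels)}")
```

Seeding from the previous chunk keeps cluster identities stable across chunks when the underlying patterns drift gradually, which is the benefit the experiments above attribute to the transfer-of-knowledge feature.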
6

Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification

Ranganath, B N 09 1900 (has links)
Data mining aims to find valid, novel, potentially useful, and ultimately understandable abstractions in data. Frequent itemset mining is one of the important data mining approaches for finding those abstractions in the form of patterns. Frequent closed itemsets provide complete and condensed information for generating non-redundant association rules. For many applications, mining all the frequent itemsets is not necessary, and mining frequent closed itemsets is adequate. Compared to frequent itemset mining, frequent closed itemset mining generates fewer itemsets and therefore improves the efficiency and effectiveness of these tasks. Much research has recently been done on closed itemset mining, but mainly for traditional databases, where multiple scans are needed and, whenever new transactions arrive, additional scans must be performed on the updated transaction database; such methods are therefore not suitable for data stream mining. Mining frequent itemsets from data streams has many potential and broad applications; some of the emerging stream applications that require association rule mining are network traffic monitoring and web click-stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volume and with changing data distribution. This raises new issues that need to be considered when developing association rule mining techniques for stream data. Recent work on data stream mining based on the sliding-window method slides the window by one transaction at a time; when the window size is large and the support threshold is low, the existing methods consume significant time and lead to a large increase in user response time. In our first work, we propose a novel algorithm, Stream-Close, based on the sliding-window model to mine frequent closed itemsets from data streams within the current sliding window. We enhance the scalability of the algorithm by introducing several optimization techniques, such as sliding the window by multiple transactions at a time and novel pruning techniques that lead to a considerable reduction in the number of candidate itemsets to be examined for closure checking. Our experimental studies show that the proposed algorithm scales well with large data sets. Still, the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemset mining infeasible in many applications, since users cannot interpret the large volume of output (which can sometimes be larger than the data itself when the support threshold is low), and it may lead to the overhead of developing extra applications that post-process the output of the original algorithm to reduce its size. Recent work on clustering of itemsets considers strictly either the expression (the items present in the itemset) or the support of the itemsets, or partially both, to reduce the number of itemsets. The drawback of these approaches is that in some situations the number of itemsets does not decrease, due to their restricted view of considering either expressions or support. We therefore propose a new notion of frequent itemsets, called clustered itemsets, which considers both the expressions and the support of the itemsets in summarizing the output. We introduce a new distance measure with respect to expressions and also prove the problem of mining clustered itemsets to be NP-hard.
In our second work, we propose a deterministic locality-sensitive-hashing-based classifier using clustered itemsets. Locality sensitive hashing (LSH) is a technique for efficiently finding a nearest neighbour in high-dimensional data sets. The idea is to hash the points using several hash functions so that, for each function, the probability of collision is much higher for objects that are close to each other than for those that are far apart. We propose an LSH-based approximate nearest-neighbour classification strategy. The problem with LSH is that it chooses hash functions randomly, and evaluating a large number of hash functions can increase query time. From a classification point of view, since LSH chooses randomly from a family of hash functions, the buckets may contain points belonging to other classes, which may affect classification accuracy. To overcome these problems, we propose to use hash functions based on class association rules, which ensure that the buckets corresponding to the class association rules contain points from the same class. Associative classification, however, involves the generation and examination of a large number of candidate class association rules, so we use the clustered itemsets, which reduce the number of class association rules to be examined. We also establish a formal connection between the clustering parameter (delta, used in the generation of clustered frequent itemsets) and a discriminative measure such as information gain. Our experimental studies show that the proposed method achieves an increase in accuracy over the LSH-based near-neighbour classification strategy.
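As a point of reference for what Stream-Close computes incrementally, the brute-force sketch below finds the frequent closed itemsets inside the current sliding window: an itemset is closed when no proper superset has the same support. The window contents and threshold are illustrative, and the exponential candidate enumeration is precisely what the pruning techniques described above avoid.

```python
# Brute-force frequent closed itemsets over the current sliding window.
from itertools import combinations

def frequent_closed(window, min_support):
    items = sorted({i for t in window for i in t})
    support = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            s = sum(1 for t in window if set(cand) <= t)
            if s >= min_support:
                support[frozenset(cand)] = s
    # Keep only itemsets with no equally-supported proper superset.
    return {iset: s for iset, s in support.items()
            if not any(iset < other and s == support[other]
                       for other in support)}

window = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a", "b"}]
for iset, s in frequent_closed(window, min_support=2).items():
    print(sorted(iset), s)
```

On this toy window the closed itemsets {a}, {a, b} and {a, c} summarize the frequent itemsets {a}, {b}, {c}, {a, b} and {a, c} without losing any support information.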
7

Incision fluviale et transition d'une rivière alluviale vers une rivière à fond rocheux : formation et évolution des seuils molassiques de la moyenne Garonne toulousaine au cours du 20e siècle / Fluvial incision and transition from alluvial channel to bedrock channel : formation and evolution of the molassic knickpoints of the middle Garonne River during the 20th century

Jantzi, Hugo 21 September 2018 (has links)
The morphology and hydro-sedimentary dynamics of the middle Garonne River changed significantly during the 20th century, resulting in channel narrowing and incision. This thesis analyses the vertical adjustments of the river since 1830 and the evolutionary dynamics of the bedrock knickpoints exposed as the incision stripped away their alluvial cover. The study is based on a geohistorical approach and a field approach, at three spatial scales: (1) the full length of the middle Garonne, (2) three reaches, and (3) the knickpoints within those reaches. At the scale of the full length, three phases of adjustment are identified: a first phase (1830-1860s) of aggradation (1.9 cm/yr) and widening; a second phase (1860s-1920) of aggradation (3.2 cm/yr) and narrowing; and a third phase (1920-2000s) of incision (2.6 cm/yr) and narrowing, with an acceleration of the incision (3.6 cm/yr) and narrowing from the 1960s. At the reach scale, the major role of in-stream gravel mining in exposing the bedrock knickpoints through regressive and progressive erosion is highlighted, operating rapidly during the 1970s. At the knickpoint scale, the degradation of the molasse shows that once freed of its alluvial cover the incision remains active, and the river continues to incise with no major break in rate between the two processes.
Furthermore, the study of erosion forms in the molasse highlights the importance of the coalescence of these forms in the erosion and evolution of the knickpoints, which tends to show that vertical erosion may not be the main factor in knickpoint development. The study of coarse-load mobility shows that the knickpoints are not an obstacle to the mobility of pebbles and are transparent to upstream-downstream transfer.
8

串流資料分析在台灣股市指數期貨之應用 / An Application of Streaming Data Analysis on TAIEX Futures

林宏哲, Lin, Hong Che Unknown Date (has links)
Data stream mining is an important research field, because in the real world much data is generated and collected in the form of a stream. Financial market data is one such example: it is intrinsically dynamic and usually generated in a sequential manner. In this thesis, we apply data stream mining techniques to the prediction of Taiwan Stock Exchange Capitalization Weighted Stock Index Futures (TAIEX Futures). Our goal is to predict the rising or falling of the futures. The prediction is difficult, and the difficulty is associated with concept drift, which denotes changes in the underlying data distribution that can cause prediction accuracy to drop sharply; we therefore focus on concept drift handling. We first show that concept drift occurs frequently in the TAIEX Futures data, based on the results of an empirical study. In addition, the results indicate that a concept drift detection method can substantially improve the accuracy of the prediction, even when it is used with a data stream mining algorithm that does not otherwise perform well. Next, we explore methods that can help identify the types of concept drift; the experimental results indicate that sudden and reoccurring concept drift exist in the TAIEX Futures data. Moreover, we propose an ensemble-based algorithm for reoccurring concept drift. The most characteristic feature of the proposed algorithm is that it can adaptively determine the chunk size, an important parameter that other concept drift handling algorithms require the user to set.
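The kind of drift detection method the abstract credits with improving accuracy can be illustrated with a DDM-style monitor (in the spirit of Gama et al.): track the model's online error rate and raise a warning or drift signal when it climbs significantly above its historical minimum. This sketch is a generic illustration with the conventional 2-sigma/3-sigma thresholds, not the detector evaluated in the thesis.

```python
# DDM-style concept drift detector over a stream of 0/1 prediction errors.
from math import sqrt

class DDM:
    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0                          # running error rate
        self.p_min = self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the last prediction was wrong, else 0.
        Returns 'drift', 'warning' or None."""
        self.n += 1
        self.p += (error - self.p) / self.n   # incremental mean
        s = sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return None
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3 * self.s_min:
            self.reset()                      # start monitoring afresh
            return "drift"
        if self.p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return None
```

Feeding `update` a 1 for each misclassified example and a 0 otherwise, the caller would typically retrain or replace the underlying model whenever 'drift' is returned.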