Global ETD Search

1	Novel applications of Association Rule Mining- Data Stream Mining Vithal Kadam, Omkar January 2009 (has links) Since its advent, association rule mining has become one of the most researched areas of data exploration. In recent years, applying association rule mining methods to extract rules from a continuous flow of voluminous data, known as a data stream, has generated immense interest because of emerging applications such as network-traffic analysis and sensor-network data analysis. For such application domains, the ability to process an enormous amount of stream data in a single pass is critical. Data stream mining Association rule mining
3	Um estudo investigativo de algoritmos de regressão para data streams / An investigative study of regression algorithms for data streams Nunes, André Luís 28 March 2017 (has links) The explosion in data volume and the speed of its growth make knowledge discovery and data analysis challenging, even more so for non-stationary data sets. Although predicting future values plays a fundamental role in areas such as climate, routing problems, and economics, classification still appears to be the most explored task. Recently, several regression algorithms have been released, for example FIMT-DD, AMRules, IBLStreams, and SFNRegressor; however, the studies introducing them focused more on novelty and prediction-error analysis than on criteria considered fundamental for data streams, such as execution time and memory. The objective of this work is therefore to present an investigative study of these regression algorithms in dynamic environments, using massive data sets, and to explore how well the algorithms adapt in the presence of concept drift. To this end, three data sets were analyzed and extended to explore the main evaluation criteria adopted. Extensive experiments produced a comparison of the results obtained by the chosen algorithms, giving indications of how each behaves in the different scenarios to which it was exposed. The main contributions of this work are: an evaluation of the fundamental criteria (memory, execution time, and generalization power) for regression on data streams; a critical analysis of the algorithms investigated; and the possibility of reproducing and extending the studies, since the parameter settings used are made available. Data stream mining Regression Concept drift
4	Real-time Distributed Computation of Formal Concepts and Analytics De Alburquerque Melo, Cassio 19 July 2013 (has links) (PDF) Advances in technology for the creation, storage and dissemination of data have dramatically increased the need for tools that effectively provide users with means of identifying and understanding relevant information. Despite the great computing opportunities that distributed frameworks such as Hadoop provide, they have only increased the need for means of identifying and understanding relevant information. Formal Concept Analysis (FCA) may play an important role in this context, by employing more intelligent means in the analysis process. FCA provides an intuitive understanding of generalization and specialization relationships among objects and their attributes in a structure known as a concept lattice. The present thesis addresses the problem of mining and visualising concepts over a data stream. The proposed approach comprises several distributed components that carry out the computation of concepts from a basic transaction, filter and transform data, store it, and provide analytic features for visually exploring the data. The novelty of our work consists of: (i) a distributed processing and analysis architecture for mining concepts in real time; (ii) the combination of FCA with visual-analytics visualisation and exploration techniques, including association rule analytics; (iii) new algorithms for condensing and filtering conceptual data; and (iv) a system that implements all proposed techniques, called Cubix, and its use cases in Biology, Complex System Design and Space Applications. [SPI:OTHER] Engineering Sciences/Other Formal Concept Analysis Visual Analytics Data Stream Mining
5	Extraction and Energy Efficient Processing of Streaming Data García-Martín, Eva January 2017 (has links) The interest in machine learning algorithms is increasing, in parallel with the advancements in hardware and software required to mine large-scale datasets. Machine learning algorithms account for a significant amount of energy consumed in data centers, which impacts the global energy consumption. However, machine learning algorithms are optimized towards predictive performance and scalability. Algorithms with low energy consumption are necessary for embedded systems and other resource-constrained devices, and desirable for platforms that require many computations, such as data centers. Data stream mining investigates how to process potentially infinite streams of data without the need to store all the data. This ability is particularly useful for companies that are generating data at a high rate, such as social networks. This thesis investigates algorithms in the data stream mining domain from an energy efficiency perspective. The thesis comprises two parts. The first part explores how to extract and analyze data from Twitter, with a pilot study that investigates a correlation between hashtags and followers. The second and main part investigates how energy is consumed and optimized in an online learning algorithm suitable for data stream mining tasks. The second part of the thesis focuses on analyzing, understanding, and reformulating the Very Fast Decision Tree (VFDT) algorithm, the original Hoeffding tree algorithm, into an energy-efficient version. It presents three key contributions. First, it shows how energy varies in the VFDT from a high-level view by tuning different parameters. Second, it presents a methodology to identify energy bottlenecks in machine learning algorithms, by portraying the functions of the VFDT that consume the largest amount of energy.
Third, it introduces dynamic parameter adaptation for Hoeffding trees, a method to dynamically adapt the parameters of Hoeffding trees to reduce their energy consumption. The results show an average energy reduction of 23% on the VFDT algorithm. / Scalable resource-efficient systems for big data analytics machine learning green computing data mining data stream mining green machine learning Computer Sciences Datavetenskap (datalogi)
6	Graph-based Multi-view Clustering for Continuous Pattern Mining Åleskog, Christoffer January 2021 (has links) Background. In many smart monitoring applications, such as smart healthcare, smart buildings, and autonomous cars, data are collected from multiple sources and contain information about different perspectives (views) of the monitored phenomenon, physical object, or system. In addition, in many of those applications relevant labelled data are scarce or even non-existent. Inspired by this, in this thesis we propose a novel algorithm for multi-view stream clustering. The algorithm can be applied to continuous pattern mining and labeling of streaming data. Objectives. The main objective of this thesis is to develop and implement a novel multi-view stream clustering algorithm. In addition, the potential of the proposed algorithm is studied and evaluated on two datasets, one synthetic and one real-world. The experiments compare the new algorithm's performance with that of a single-view clustering algorithm and of an algorithm that does not transfer knowledge between chunks. Finally, the obtained results are analyzed, discussed and interpreted. Methods. Initially, we study the state-of-the-art multi-view (stream) clustering algorithms. Then we develop our multi-view clustering algorithm for streaming data by implementing a transfer-of-knowledge feature. We present and explain the developed algorithm in detail, motivating each choice made during the algorithm design phase. Finally, the algorithm configuration, the experimental setup, and the datasets chosen for the experiments are presented and motivated. Results. Different configurations of the proposed algorithm have been studied and evaluated under different experimental scenarios on two different datasets: synthetic and real-world. The proposed multi-view clustering algorithm has demonstrated higher performance on the synthetic data than on the real-world dataset.
This is mainly due to the low quality of the real-world data used. Conclusions. The proposed algorithm has demonstrated higher performance on the synthetic dataset than on the real-world dataset. It can generate high-quality clustering solutions with respect to the evaluation metrics used. In addition, the transfer-of-knowledge feature has been shown to have a positive effect on the algorithm's performance. A further study of the proposed algorithm on other, richer and more suitable datasets, e.g., data collected from numerous sensors monitoring some phenomenon, is planned as future work. Machine Learning Unsupervised Learning Multi-view Clustering Data Stream Mining Pattern Mining Computer Sciences Datavetenskap (datalogi)
7	Real-time Distributed Computation of Formal Concepts and Analytics / Calcul distribué des concepts formels en temps réel et analyse visuelle De Alburquerque Melo, Cassio 19 July 2013 (has links) Advances in technology for the creation, storage and dissemination of data have dramatically increased the need for tools that effectively provide users with means of identifying and understanding relevant information. Despite the great computing opportunities that distributed frameworks such as Hadoop provide, they have only increased the need for means of identifying and understanding relevant information. Formal Concept Analysis (FCA) may play an important role in this context, by employing more intelligent means in the analysis process. FCA provides an intuitive understanding of generalization and specialization relationships among objects and their attributes in a structure known as a concept lattice. The present thesis addresses the problem of mining and visualising concepts over a data stream. The proposed approach comprises several distributed components that carry out the computation of concepts from a basic transaction, filter and transform data, store it, and provide analytic features for visually exploring the data. The novelty of our work consists of: (i) a distributed processing and analysis architecture for mining concepts in real time; (ii) the combination of FCA with visual-analytics visualisation and exploration techniques, including association rule analytics; (iii) new algorithms for condensing and filtering conceptual data; and (iv) a system that implements all proposed techniques, called Cubix, and its use cases in Biology, Complex System Design and Space Applications. Formal Concept Analysis Visual Analytics Data Stream Mining
8	串流資料分析在台灣股市指數期貨之應用 / An Application of Streaming Data Analysis on TAIEX Futures 林宏哲, Lin, Hong Che Unknown Date (has links) Data stream mining is an important research field, because in many real-world cases data is generated and collected in the form of a stream. Financial market data is one such example: it is intrinsically dynamic and usually generated sequentially. In this thesis, we apply data stream mining techniques to the prediction of Taiwan Stock Exchange Capitalization Weighted Stock Index Futures, or TAIEX Futures. Our goal is to predict whether the futures will rise or fall. The prediction is difficult, and the difficulty is associated with concept drift, which indicates changes in the underlying data distribution. Therefore, we focus on concept drift handling. We first show, based on the results of an empirical study, that concept drift occurs frequently in the TAIEX Futures data. In addition, the results indicate that a concept drift detection method can improve prediction accuracy even when it is used with a data stream mining algorithm that does not perform well. Next, we explore methods that can help us identify the types of concept drift. The experimental results indicate that both sudden and reoccurring concept drift exist in the TAIEX Futures data. Moreover, we propose an ensemble-based algorithm for reoccurring concept drift. The most characteristic feature of the proposed algorithm is that it can adaptively determine the chunk size, which is an important parameter for other concept drift handling algorithms. data stream mining concept drift TAIEX Futures
9	A Reservoir of Adaptive Algorithms for Online Learning from Evolving Data Streams Pesaranghader, Ali 26 September 2018 (has links) Continuous change and development are essential aspects of evolving environments and applications, including, but not limited to, smart cities, military, medicine, nuclear reactors, self-driving cars, aviation, and aerospace. That is, the fundamental characteristics of such environments may evolve and, if no action is taken, cause dangerous consequences, e.g., putting people's lives at stake. Therefore, learning systems need to apply intelligent algorithms to monitor changes in their environments and update themselves effectively. Further, the performance of learning algorithms may fluctuate because the incoming data continuously evolves. That is, a learning approach that is efficient now may become obsolete after a change in the data or environment. Hence, the question 'how to have an efficient learning algorithm over time against evolving data?' has to be addressed. In this thesis, we make two contributions to address the challenges described above. In the machine learning literature, the phenomenon of (distributional) change in data is known as concept drift. Concept drift may shift decision boundaries and cause a decline in accuracy. Learning algorithms therefore have to detect concept drift in evolving data streams and replace their predictive models accordingly. To address this challenge, adaptive learners have been devised which may utilize drift detection methods to locate the drift points in dynamic and changing data streams. A drift detection method able to discover the drift points quickly, with the lowest false positive and false negative rates, is preferred. A false positive refers to incorrectly raising an alarm for concept drift, and a false negative refers to failing to raise an alarm when concept drift has occurred.
In this thesis, we introduce three algorithms, called the Fast Hoeffding Drift Detection Method (FHDDM), the Stacking Fast Hoeffding Drift Detection Method (FHDDMS), and the McDiarmid Drift Detection Methods (MDDMs), for detecting drift points with the minimum delay, false positive, and false negative rates. FHDDM is a sliding window-based algorithm and applies Hoeffding’s inequality (Hoeffding, 1963) to detect concept drift. FHDDM slides its window over the prediction results, which are either 1 (for a correct prediction) or 0 (for a wrong prediction). Meanwhile, it compares the mean of the elements inside the window with the maximum mean observed so far; a significant difference between the two means, upper-bounded by the Hoeffding inequality, indicates the occurrence of concept drift. FHDDMS extends the FHDDM algorithm by sliding multiple windows over its entries to improve the detection delay and false negative rate. In contrast to FHDDM/S, the MDDM variants assign weights to their entries, i.e., higher weights are associated with the most recent entries in the sliding window, for faster detection of concept drift. The rationale is that recent examples best reflect the current situation, so putting higher weights on the latest entries may allow concept drift to be detected more quickly. An MDDM algorithm bounds the difference between the weighted mean of the elements in the sliding window and the maximum weighted mean seen so far, using McDiarmid’s inequality (McDiarmid, 1989). It raises an alarm for concept drift once a significant difference is observed. We experimentally show that FHDDM/S and MDDMs outperform the state of the art, showing promising results in terms of the adaptation and classification measures. Due to the evolving nature of data streams, the performance of an adaptive learner, which is defined by the classification, adaptation, and resource consumption measures, may fluctuate over time.
In fact, a learning algorithm, in the form of a (classifier, detector) pair, may present a significant performance before a concept drift point, but not after. We define this problem by the question 'how can we ensure that an efficient classifier-detector pair is present at any time in an evolving environment?' To answer this, we have developed the Tornado framework which runs various kinds of learning algorithms simultaneously against evolving data streams. Each algorithm incrementally and independently trains a predictive model and updates the statistics of its drift detector. Meanwhile, our framework monitors the (classifier, detector) pairs, and recommends the efficient one, concerning the classification, adaptation, and resource consumption performance, to the user. We further define the holistic CAR measure that integrates the classification, adaptation, and resource consumption measures for evaluating the performance of adaptive learning algorithms. Our experiments confirm that the most efficient algorithm may differ over time because of the developing and evolving nature of data streams. Machine Learning Adaptive Learning Multi-Strategy Learning Data Stream Mining Evolving Data Streams Concept Drift Drift Detection Drift Detection Methods Window-based Drift Detection Hoeffding's inequality McDiarmid's inequality
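The FHDDM scheme described in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name, the default window size and delta, and the reset-on-drift behaviour are assumptions made for illustration. Only the overall scheme comes from the abstract (a sliding window over 0/1 prediction results, a comparison of the window mean with the maximum mean seen so far, and a threshold derived from Hoeffding's inequality). The threshold used here is the standard Hoeffding bound for the mean of n values in [0, 1], epsilon = sqrt(ln(1/delta) / (2n)).

```python
import math
from collections import deque


class SlidingWindowHoeffdingDetector:
    """Illustrative FHDDM-style drift detector (hypothetical class name)."""

    def __init__(self, window_size=25, delta=1e-6):
        # window_size and delta defaults are assumptions, not values from the abstract.
        self.window_size = window_size
        # Hoeffding threshold for the mean of `window_size` values in [0, 1].
        self.epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * window_size))
        self.reset()

    def reset(self):
        """Forget all history, e.g. after a drift has been signalled."""
        self.window = deque(maxlen=self.window_size)
        self.correct_in_window = 0
        self.max_mean = 0.0

    def update(self, correct):
        """Add one prediction result (truthy = correct); return True on drift."""
        bit = 1 if correct else 0
        if len(self.window) == self.window_size:
            # The oldest entry is about to be evicted by append().
            self.correct_in_window -= self.window[0]
        self.window.append(bit)
        self.correct_in_window += bit

        if len(self.window) < self.window_size:
            return False  # Wait until the window is full.

        mean = self.correct_in_window / self.window_size
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean >= self.epsilon:
            self.reset()
            return True
        return False


if __name__ == "__main__":
    detector = SlidingWindowHoeffdingDetector(window_size=10, delta=0.01)
    stream = [True] * 10 + [False] * 10  # accuracy collapses halfway through
    for i, outcome in enumerate(stream):
        if detector.update(outcome):
            print(f"drift signalled at position {i}")
```

In use, a classifier's per-example correctness would be fed to `update`, and a returned `True` would trigger replacing or retraining the model. Resetting the detector after an alarm is one possible design choice; the abstract does not specify what happens after detection.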
10	An efficient entropy estimation approach Paavola, M. (Marko) 01 November 2011 (has links) Advances in miniaturisation have led to the development of new wireless measurement technologies such as wireless sensor networks (WSNs). A WSN consists of low-cost, battery-operated nodes capable of sensing the environment, transmitting and receiving, and computing. While a WSN has several advantages, including cost-effectiveness and easy installation, the nodes suffer from small memory, low computing power, small bandwidth and limited energy supply. To cope with these resource restrictions, data processing methods should be as efficient as possible. As a result, high-quality approximations are preferred over exact answers. The aim of this thesis was to propose an efficient entropy approximation method for resource-constrained environments. Specifically, the algorithm should use a small, constant amount of memory and offer sufficient accuracy with low computational demand. The performance of the proposed algorithm was evaluated experimentally with three case studies. The first study focused on the online monitoring of WSN communications performance in an industrial environment. The monitoring approach was based on the observation that entropy could be applied to assess the impact of interference on the time-delay variation of periodic tasks. The main purpose of the two additional cases, depth of anaesthesia (DOA) monitoring and benchmarking with simulated data sets, was to provide additional evidence of the general applicability of the proposed method. Moreover, in the case of DOA monitoring, an efficient entropy approximation could assist in the development of handheld devices or in processing large amounts of online data from different channels simultaneously. The initial results from the communication and DOA monitoring applications, as well as from the simulations, were encouraging. Based on the case studies, the proposed method was able to meet the stated requirements. Since entropy is a widely used quantity, the method is also expected to have a variety of applications in measurement systems with similar requirements. data stream mining depth of anaesthesia differential evolution entropy estimation wireless sensor networks
