  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging

Olorunnimbe, Muhammed January 2015 (has links)
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection and financial forecasting, among others. In this setting, it is crucial to create data mining algorithms that can seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The resultant models should not only be highly accurate and able to swiftly adapt to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization, especially when we aim to build personalized, near-instant models in a Big Data setting. This research focuses on mining a data stream with concept drift, using an online bagging method, with consideration for memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, where the models of multiple classifiers are combined into an ensemble; this approach has been very successful when building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility, and this research focuses on that issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state-of-the-art. Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, in contrast to the time and memory invested. We aimed to achieve a high ROI without compromising the accuracy of the result, and our experimental results indicate that we achieved this goal.
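The online-bagging update this abstract builds on (Oza and Russell) can be sketched in a few lines. This is an illustrative toy, not the thesis's algorithm: the trivial `MajorityLearner` stands in for a real stream classifier, and the adaptive ensemble-size policy that is the thesis's contribution is not reproduced here.

```python
import math
import random
from collections import Counter

def poisson1(rng):
    """Draw k ~ Poisson(lambda=1) via Knuth's algorithm."""
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

class MajorityLearner:
    """Toy base learner: predicts the label with the largest
    bag-weighted count seen so far."""
    def __init__(self):
        self.counts = Counter()
    def learn(self, x, y, weight):
        self.counts[y] += weight
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

def online_bagging_step(ensemble, x, y, rng):
    """Oza-Russell online bagging: each member trains on the example
    k ~ Poisson(1) times, simulating bootstrap resampling online."""
    for member in ensemble:
        k = poisson1(rng)
        if k > 0:
            member.learn(x, y, k)

rng = random.Random(42)
ensemble = [MajorityLearner() for _ in range(5)]
for _ in range(200):                      # stream dominated by label 1
    online_bagging_step(ensemble, None, 1, rng)
votes = Counter(m.predict(None) for m in ensemble)
prediction = votes.most_common(1)[0][0]   # majority vote of the ensemble
```

The Poisson(1) draw is the key trick: it approximates, per example, how many times that example would appear in a bootstrap sample, which is what makes bagging feasible in a single pass over a stream.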
22

Higher Order Neural Networks and Neural Networks for Stream Learning

Dong, Yue January 2017 (has links)
The goal of this thesis is to explore some variations of neural networks. The thesis is split into two main parts: a variation of the shaping functions in neural networks, and a variation of the learning rules in neural networks. In the first part, we mainly investigate polynomial perceptrons, i.e. perceptrons with a polynomial shaping function instead of a linear one. We prove the polynomial perceptron convergence theorem and illustrate the notion by showing empirically that a higher-order perceptron can learn the XOR function. In the second part, we propose three models (SMLP, SA, SA2) for stream learning and anomaly detection in streams. The main technique allowing these models to perform at a level comparable to state-of-the-art algorithms in stream learning is the learning rule used. We employ the mini-batch gradient descent and stochastic gradient descent algorithms to speed up the models. In addition, the use of parallel processing with multiple threads makes the proposed methods highly efficient in dealing with streaming data. Our analysis shows that all models have linear runtime and a constant memory requirement. We also demonstrate empirically that the proposed methods feature a high detection rate, a low false alarm rate, and fast response. The paper on the first two models (SMLP, SA) was published at the 29th Canadian AI Conference and won the best paper award. The invited journal paper on the third model (SA2), for Computational Intelligence, is under peer review.
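The XOR result mentioned above can be reproduced with a minimal sketch: a classic perceptron trained on a second-order feature expansion. XOR is not linearly separable in the raw inputs, but it is in the expanded space that includes the cross term, so the (polynomial) perceptron convergence theorem guarantees a finite number of mistakes. This is a generic illustration of the idea, not the thesis's implementation.

```python
def poly_features(x1, x2):
    """Second-order feature map: bias, linear terms, and the cross term."""
    return [1.0, x1, x2, x1 * x2]

def train_perceptron(data, epochs=60, lr=1.0):
    """Standard perceptron updates on polynomially expanded inputs."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x1, x2, y in data:                      # y in {-1, +1}
            phi = poly_features(x1, x2)
            s = sum(wi * pi for wi, pi in zip(w, phi))
            if y * s <= 0:                          # misclassified: update
                w = [wi + lr * y * pi for wi, pi in zip(w, phi)]
    return w

XOR = [(0, 0, -1), (0, 1, 1), (1, 0, 1), (1, 1, -1)]
w = train_perceptron(XOR)
preds = [1 if sum(wi * pi for wi, pi in zip(w, poly_features(x1, x2))) > 0 else -1
         for x1, x2, _ in XOR]   # perfectly classifies all four XOR cases
```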
23

Clustering Techniques for Mining and Analysis of Evolving Data

Devagiri, Vishnu Manasa January 2021 (has links)
The amount of data generated is on the rise due to the increased demand in fields such as IoT and smart monitoring applications. Data generated through such systems has many distinct characteristics, such as continuous generation, evolving and multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields is largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for dealing with data having the previously mentioned characteristics, due to memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Therefore, novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are needed.  The thesis is focused on building evolutionary clustering algorithms for data that evolves over time. We initially proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. Through the progression of the work, new challenges have been studied and addressed. Namely, the Split-Merge Clustering algorithm was enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. Paper IV introduces a minimum spanning tree-based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks, enriched with a post-clustering pattern-labeling procedure.  The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results demonstrate the robustness of the algorithms for modeling, analyzing, and mining evolving data streams: they are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.
24

Smart Cube Predictions for Online Analytic Query Processing in Data Warehouses

Belcin, Andrei 01 April 2021 (has links)
A data warehouse (DW) is a transformation of many sources of transactional data, integrated into a single collection that is non-volatile and time-variant, and that can provide decision support to managerial roles within an organization. For this application, the database server needs to process multiple users' queries by joining various datasets and loading the result in main memory to begin calculations. In current systems, this process is reactive to users' input and can be undesirably slow. Previous studies showed that personalizing to a single user's query patterns and loading this smaller subset into main memory significantly shortened the query response time. The LPCDA framework developed in this research handles multiple users' query demands, where the query patterns are subject to change (so-called concept drift) and noise. To this end, the LPCDA framework detects changes in user behaviour and dynamically adapts the personalized smart cube definition for the group of users. Numerous data marts (DMs), as components of the DW, are subject to intense aggregations to assist analytics at the request of automated systems and human users' queries. Consequently, there is a growing need to properly manage the supply of data into the main memory closest to the CPU that computes the query, in order to reduce the response time from the moment a query arrives at the DW server. This thesis therefore proposes an end-to-end adaptive learning ensemble for resource allocation of cuboids within a DM, constructing a relevant and timely smart cube before it is needed, in the spirit of the just-in-time inventory management strategy applied in other real-world scenarios. The algorithms comprising the ensemble involve predictive methodologies from Bayesian statistics, data mining, and machine learning, and reflect changes in the data-generating process using a number of change detection algorithms. Therefore, given different operational constraints and data-specific considerations, the ensemble can, to an effective degree, determine which cuboids in the lattice of a DM to pre-construct into a smart cube ahead of users submitting their queries, thereby providing a quicker response than static schema views or no action at all.
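As a rough illustration of the general idea (pre-materializing cuboids under a memory budget based on observed query patterns), here is a hypothetical scoring heuristic. The cuboid names, sizes, and the exponential-decay scoring are invented for the example; the actual LPCDA framework uses Bayesian and machine-learning predictors, not this simple rule.

```python
def update_scores(scores, queried_cuboid, decay=0.9):
    """Exponentially decay all scores, then reward the queried cuboid.
    The decay acts as a simple forgetting mechanism, so the selection
    can follow drifting query patterns."""
    for c in scores:
        scores[c] *= decay
    scores[queried_cuboid] = scores.get(queried_cuboid, 0.0) + 1.0
    return scores

def select_cuboids(scores, sizes, budget):
    """Greedily pre-materialize the highest-scoring cuboids that fit
    within the memory budget (the 'smart cube')."""
    chosen, used = [], 0
    for c in sorted(scores, key=scores.get, reverse=True):
        if used + sizes[c] <= budget:
            chosen.append(c)
            used += sizes[c]
    return chosen

scores = {}
sizes = {"region": 10, "region_product": 40, "product_month": 25}  # made-up sizes
for q in ["region", "region", "product_month", "region"]:           # query stream
    update_scores(scores, q)
smart_cube = select_cuboids(scores, sizes, budget=40)
```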
25

Using particle swarm optimisation to train feedforward neural networks in dynamic environments

Rakitianskaia, A.S. (Anastassia Sergeevna) 13 February 2012 (has links)
The feedforward neural network (NN) is a mathematical model capable of representing any non-linear relationship between input and output data. It has been successfully applied to a wide variety of classification and function approximation problems. Various neural network training algorithms were developed, including the particle swarm optimiser (PSO), which was shown to outperform the standard back propagation training algorithm on a selection of problems. However, it was usually assumed that the environment in which a NN operates is static. Such an assumption is often not valid for real life problems, and the training algorithms have to be adapted accordingly. Various dynamic versions of the PSO have already been developed. This work investigates the applicability of dynamic PSO algorithms to NN training in dynamic environments, and compares the performance of dynamic PSO algorithms to the performance of back propagation. Three popular dynamic PSO variants are considered. The extent of adaptive properties of back propagation and dynamic PSO under different kinds of dynamic environments is determined. Dynamic PSO is shown to be a viable alternative to back propagation, especially in environments exhibiting infrequent gradual changes. Copyright 2011, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. Please cite as follows: Rakitianskaia, A 2011, Using particle swarm optimisation to train feedforward neural networks in dynamic environments, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-02132012-233212 / > C12/4/406/gm / Dissertation (MSc)--University of Pretoria, 2011. / Computer Science / Unrestricted
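Training a feedforward NN with PSO means treating the full weight vector as a particle position and the training loss as the fitness function. The following is a minimal static (global-best) PSO sketch on a tiny 1-2-1 network fitting y = x², not one of the dynamic PSO variants the dissertation studies; the network size, data, and coefficients are illustrative choices.

```python
import math
import random

def nn_forward(w, x):
    """Tiny 1-2-1 feedforward net; w = [w1, b1, w2, b2, v1, v2, c]."""
    h1 = math.tanh(w[0] * x + w[1])
    h2 = math.tanh(w[2] * x + w[3])
    return w[4] * h1 + w[5] * h2 + w[6]

def loss(w, data):
    """Mean squared error of the network over the data set (the fitness)."""
    return sum((nn_forward(w, x) - y) ** 2 for x, y in data) / len(data)

def pso_train(data, n_particles=20, dim=7, iters=100, seed=0):
    """Global-best PSO: velocities are pulled toward each particle's
    personal best and the swarm's global best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [loss(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    inertia, c1, c2 = 0.72, 1.49, 1.49          # commonly used coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = loss(pos[i], data)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

data = [(x / 5.0, (x / 5.0) ** 2) for x in range(-5, 6)]  # y = x^2 on [-1, 1]
baseline = loss([0.0] * 7, data)       # all-zero network predicts 0 everywhere
w_best, f_best = pso_train(data)
```

A dynamic PSO variant would additionally re-diversify the swarm (e.g. re-randomize some particles or re-evaluate bests) when the environment, and hence the loss landscape, changes.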
26

Towards Robust and Adaptive Machine Learning : A Fresh Perspective on Evaluation and Adaptation Methodologies in Non-Stationary Environments

Bayram, Firas January 2023 (has links)
Machine learning (ML) has become ubiquitous in various disciplines and applications, serving as a powerful tool for developing predictive models to analyze diverse variables of interest. With the advent of the digital era, the proliferation of data has presented numerous opportunities for growth and expansion across various domains. However, along with these opportunities, there is a unique set of challenges that arises due to the dynamic and ever-changing nature of data. These challenges include concept drift, which refers to shifting data distributions over time, and other data-related issues that can be framed as learning problems. Traditional static models are inadequate in handling these issues, underscoring the need for novel approaches to enhance the performance robustness and reliability of ML models to effectively navigate the inherent non-stationarity in the online world. The field of concept drift is characterized by several intricate aspects that challenge learning algorithms, including the analysis of model performance, which requires evaluating and understanding how the ML model's predictive capability is affected by different problem settings. Additionally, determining the magnitude of drift necessary for change detection is an indispensable task, as it involves identifying substantial shifts in data distributions. Moreover, the integration of adaptive methodologies is essential for updating ML models in response to data dynamics, enabling them to maintain their effectiveness and reliability in evolving environments. In light of the significance and complexity of the topic, this dissertation offers a fresh perspective on the performance robustness and adaptivity of ML models in non-stationary environments. 
The main contributions of this research include exploring and organizing the literature, analyzing the performance of ML models in the presence of different types of drift, and proposing innovative methodologies for drift detection and adaptation that solve real-world problems. By addressing these challenges, this research paves the way for the development of more robust and adaptive ML solutions capable of thriving in dynamic and evolving data landscapes. / Machine learning (ML) is widely used in various disciplines as a powerful tool for developing predictive models to analyze diverse variables. In the digital era, the abundance of data has created growth opportunities, but it also brings challenges due to the dynamic nature of data. One of these challenges is concept drift, the shifting data distributions over time. Consequently, traditional static models are inadequate for handling these challenges in the online world. Concept drift, with its intricate aspects, presents a challenge for learning algorithms. Analyzing model performance and detecting substantial shifts in data distributions are crucial for integrating adaptive methodologies to update ML models in response to data dynamics, maintaining effectiveness and reliability in evolving environments. In this dissertation, a fresh perspective is offered on the robustness and adaptivity of ML models in non-stationary environments. This research explores and organizes existing literature, analyzes ML model performance in the presence of drift, and proposes innovative methodologies for detecting and adapting to drift in real-world problems. The aim is to develop more robust and adaptive ML solutions capable of thriving in dynamic and evolving data landscapes.
27

Graded possibilistic clustering of non-stationary data streams

Abdullatif, Amr R.A., Masulli, F., Rovetta, S., Cabri, A. 27 January 2020 (has links)
Multidimensional data streams are a major paradigm in data science. This work focuses on possibilistic clustering algorithms as a means to perform clustering of multidimensional streaming data. The proposed approach exploits fuzzy outlier analysis to provide good learning and tracking abilities under both concept shift and concept drift.
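The outlier-handling property of possibilistic memberships can be illustrated in a few lines. This is a generic sketch of possibilistic (typicality-based) memberships with an exponential kernel, not the graded possibilistic algorithm of the paper; the centroids and the scale parameter `eta` are made up for the example.

```python
import math

def possibilistic_memberships(point, centroids, eta=1.0):
    """Typicality of a point to each cluster: exp(-d^2 / eta).
    Unlike probabilistic memberships, these need not sum to 1, so a
    point far from every centroid gets uniformly low typicality --
    a natural outlier signal for streaming data."""
    return [math.exp(-sum((p - c) ** 2 for p, c in zip(point, centroid)) / eta)
            for centroid in centroids]

centroids = [(0.0, 0.0), (5.0, 5.0)]
near = possibilistic_memberships((0.1, 0.0), centroids)      # close to cluster 0
outlier = possibilistic_memberships((20.0, 20.0), centroids) # far from both
is_outlier = max(outlier) < 0.01   # low typicality everywhere => outlier
```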
28

Popularity prediction of social multimedia based on concept drift mining

鄭世宏, Jheng, Shih Hong Unknown Date (has links)
In recent years, the rise of social media has offered an easy and fast way for people to exchange all kinds of content. Social multimedia refers to the multimedia content that users share on social media; compared with plain multimedia content, it additionally carries a wealth of records of sharing interactions between users, as well as information about users within the social network, which opens up many more application possibilities. A microblog is a platform on which users freely share short text messages in real time, capturing their current moods, what they see and hear, and conversations with friends. Compared with social media dedicated to sharing multimedia content (such as YouTube or Flickr), multimedia content on microblogs exhibits a pronounced sharing and propagation behavior. The goal of this thesis is to exploit the characteristics and data of this propagation on microblogs for popularity prediction of social multimedia. As time progresses, using a single fixed rule for popularity prediction may cause prediction accuracy to degrade; moreover, even at the same point in time, different multimedia items follow their own popularity trends over time and may require different prediction rules, a phenomenon known as local concept drift. We therefore cast popularity prediction as a classification problem in data mining, take local concept drift into consideration, and propose a popularity prediction method for multimedia content on microblog platforms. Experiments on social multimedia collected from Plurk show that the method that considers local concept drift clearly outperforms the GCD method (by 4% on average) and the baseline method (by 10% on average) in accuracy, indicating that the proposed method is better suited to multimedia content on microblogs and confirming that concept drift and local concept drift do occur.
29

A thorough evaluation of concept drift detection algorithms

SANTOS, Silas Garrido Teixeira de Carvalho 27 February 2015 (has links)
Knowledge extraction from data streams is an activity that has been receiving a progressively increasing demand. Examples of such applications include monitoring the purchase history of customers, detecting presence through sensors, or monitoring water temperatures. Algorithms used for this purpose must therefore be constantly updated, adapting to new instances while taking computational constraints into account. When working in environments with a continuous flow of data, there is no guarantee that the distribution of the data will remain stationary; on the contrary, several changes may occur over time, triggering situations commonly known as concept drift. In this work we present a comparative study of some of the main drift detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. The experiments used artificial datasets, simulating abrupt, fast gradual, and slow gradual changes, as well as datasets with real problems. The results were analyzed in terms of accuracy, runtime, memory usage, average time to change detection, and the number of false positives and negatives; the parameters of the methods were set using an adapted version of a genetic algorithm. According to the Friedman test with the Nemenyi post-hoc analysis, DDM was the most accurate method on the datasets used, being statistically superior to DOF and ECDD. EDDM was the fastest method and also the most economical in memory usage, being statistically superior to DOF, ECDD, PL and STEPD in both respects. We conclude that change detection methods that are more sensitive, and therefore more prone to false alarms, achieve better results than less sensitive methods that are less susceptible to false alarms. (Work supported by FACEPE.)
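For reference, the DDM detector compared in this study can be sketched compactly: it tracks the streaming error rate p and its standard deviation s, and signals a warning (drift) when p + s exceeds the best level seen so far by two (three) standard deviations. The sketch below is a simplified version (no reset after drift, fixed 30-sample warm-up) with a synthetic error stream invented for illustration.

```python
import math
import random

class DDM:
    """Simplified Drift Detection Method (Gama et al. 2004)."""
    def __init__(self, min_samples=30):
        self.i = 0
        self.p = 0.0
        self.p_min = float("inf")
        self.s_min = float("inf")
        self.min_samples = min_samples

    def update(self, error):          # error: 1 if the model erred, else 0
        self.i += 1
        self.p += (error - self.p) / self.i        # incremental error rate
        s = math.sqrt(self.p * (1.0 - self.p) / self.i)
        if self.i < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:   # record the best level
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

rng = random.Random(7)
ddm = DDM()
states = []
for t in range(1000):
    err_rate = 0.2 if t < 500 else 0.8   # abrupt concept drift at t = 500
    states.append(ddm.update(1 if rng.random() < err_rate else 0))
drift_detected = "drift" in states[500:]
```

The warning level is typically used to start buffering recent examples, so a new model can be trained from them once the drift level is reached.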
30

An approach for online learning in the presence of concept changes

Jaber, Ghazal 18 October 2013 (has links)
Learning from data streams is emerging as an important application area. When the environment changes, it is necessary to rely on online learning with the capability to adapt to changing conditions, a.k.a. concept drifts. Adapting to concept drifts entails forgetting some or all of the previously acquired knowledge when the concept changes, while accumulating knowledge about the supposedly stationary underlying concept. This tradeoff is called the stability-plasticity dilemma. Ensemble methods have been among the most successful approaches; however, the management of the ensemble, which ultimately controls how past data is forgotten, has not been thoroughly investigated so far. Our work shows the importance of the forgetting strategy by comparing several approaches. The results thus obtained lead us to propose a new ensemble method with an enhanced forgetting strategy to adapt to concept drifts. Experimental comparisons show that our method compares favorably with well-known state-of-the-art systems. The majority of previous works focused only on means to detect changes and to adapt to them. In our work, we go one step further by introducing a meta-learning mechanism that is able to detect relevant states of the environment, to recognize recurring contexts and to anticipate likely concept changes. Hence, the method we suggest deals both with the challenge of optimizing the stability-plasticity dilemma and with the anticipation and recognition of incoming concepts. This is accomplished through an ensemble method that controls an ensemble of incremental learners. The management of the ensemble of learners enables one to adapt naturally to the dynamics of the concept changes with very few parameters to set, while a learning mechanism monitoring the changes in the ensemble provides means for anticipating, and quickly adapting to, the underlying modification of the context.
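One simple instance of a forgetting strategy in ensemble management is Weighted Majority style multiplicative down-weighting: members that err under the current concept fade out, and members whose weight falls below a floor become candidates for replacement. This is a generic sketch of that idea, not the thesis's proposed method; the constant `always_0`/`always_1` experts, the decay factor, and the floor are illustrative choices.

```python
def vote(ensemble, x):
    """Weighted-majority vote over the ensemble."""
    tally = {}
    for predict, w in ensemble:
        label = predict(x)
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

def update_weights(ensemble, x, y_true, beta=0.8, floor=0.05):
    """Forgetting strategy: multiplicatively down-weight members that err;
    members whose weight falls below `floor` are flagged as stale, i.e.
    candidates for replacement after a concept change."""
    updated = [(p, w * beta if p(x) != y_true else w) for p, w in ensemble]
    stale = [i for i, (_, w) in enumerate(updated) if w < floor]
    return updated, stale

always_0 = lambda x: 0
always_1 = lambda x: 1
ensemble = [(always_0, 1.0), (always_1, 1.0)]
stale = []
for _ in range(20):          # stream where the current concept is y = 1
    ensemble, stale = update_weights(ensemble, None, 1)
prediction = vote(ensemble, None)   # the y=0 expert has faded out
```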
