  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging

Olorunnimbe, Muhammed January 2015 (has links)
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection and financial forecasting, among others. In this setting, it is crucial to create data mining algorithms that can seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The resultant models should not only be highly accurate and able to swiftly adapt to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization, especially when we aim to build personalized, near-instant models in a Big Data setting. This research focuses on mining a data stream with concept drift, using an online bagging method, with consideration for memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, where the models of multiple classifiers are combined into an ensemble; this approach has been very successful when building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility, and this research focuses on that issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state-of-the-art. Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, in contrast to the time and memory invested. We aimed to achieve a high ROI without compromising the accuracy of the result, and our experimental results indicate that we achieved this goal.
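The online-bagging update this abstract builds on (Oza and Russell) can be sketched in a few lines. This is an illustrative toy, not the thesis's algorithm: the trivial `MajorityLearner` stands in for a real stream classifier, and the adaptive ensemble-size policy that is the thesis's contribution is not reproduced here.

```python
import math
import random
from collections import Counter

def poisson1(rng):
    """Draw k ~ Poisson(lambda=1) via Knuth's algorithm."""
    L = math.exp(-1.0)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

class MajorityLearner:
    """Toy base learner: predicts the label with the largest
    bag-weighted count seen so far."""
    def __init__(self):
        self.counts = Counter()
    def learn(self, x, y, weight):
        self.counts[y] += weight
    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

def online_bagging_step(ensemble, x, y, rng):
    """Oza-Russell online bagging: each member trains on the example
    k ~ Poisson(1) times, simulating bootstrap resampling online."""
    for member in ensemble:
        k = poisson1(rng)
        if k > 0:
            member.learn(x, y, k)

rng = random.Random(42)
ensemble = [MajorityLearner() for _ in range(5)]
for _ in range(200):                      # stream dominated by label 1
    online_bagging_step(ensemble, None, 1, rng)
votes = Counter(m.predict(None) for m in ensemble)
prediction = votes.most_common(1)[0][0]   # majority vote of the ensemble
```

The Poisson(1) draw is the key trick: it approximates, per example, how many times that example would appear in a bootstrap sample, which is what makes bagging feasible in a single pass over a stream.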
22

Higher Order Neural Networks and Neural Networks for Stream Learning

Dong, Yue January 2017 (has links)
The goal of this thesis is to explore some variations of neural networks. The thesis is split into two main parts: a variation of the shaping functions in neural networks, and a variation of the learning rules in neural networks. In the first part, we mainly investigate polynomial perceptrons, i.e. perceptrons with a polynomial shaping function instead of a linear one. We prove the polynomial perceptron convergence theorem and illustrate the notion by showing empirically that a higher-order perceptron can learn the XOR function. In the second part, we propose three models (SMLP, SA, SA2) for stream learning and anomaly detection in streams. The main technique allowing these models to perform at a level comparable to state-of-the-art algorithms in stream learning is the learning rule used. We employ the mini-batch gradient descent and stochastic gradient descent algorithms to speed up the models. In addition, the use of parallel processing with multiple threads makes the proposed methods highly efficient in dealing with streaming data. Our analysis shows that all models have linear runtime and a constant memory requirement. We also demonstrate empirically that the proposed methods feature a high detection rate, a low false alarm rate, and fast response. The paper on the first two models (SMLP, SA) was published at the 29th Canadian AI Conference and won the best paper award. The invited journal paper on the third model (SA2), for Computational Intelligence, is under peer review.
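The XOR result mentioned above can be reproduced with a minimal sketch: a classic perceptron trained on a second-order feature expansion. XOR is not linearly separable in the raw inputs, but it is in the expanded space that includes the cross term, so the (polynomial) perceptron convergence theorem guarantees a finite number of mistakes. This is a generic illustration of the idea, not the thesis's implementation.

```python
def poly_features(x1, x2):
    """Second-order feature map: bias, linear terms, and the cross term."""
    return [1.0, x1, x2, x1 * x2]

def train_perceptron(data, epochs=60, lr=1.0):
    """Standard perceptron updates on polynomially expanded inputs."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x1, x2, y in data:                      # y in {-1, +1}
            phi = poly_features(x1, x2)
            s = sum(wi * pi for wi, pi in zip(w, phi))
            if y * s <= 0:                          # misclassified: update
                w = [wi + lr * y * pi for wi, pi in zip(w, phi)]
    return w

XOR = [(0, 0, -1), (0, 1, 1), (1, 0, 1), (1, 1, -1)]
w = train_perceptron(XOR)
preds = [1 if sum(wi * pi for wi, pi in zip(w, poly_features(x1, x2))) > 0 else -1
         for x1, x2, _ in XOR]   # perfectly classifies all four XOR cases
```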
23

Clustering Techniques for Mining and Analysis of Evolving Data

Devagiri, Vishnu Manasa January 2021 (has links)
The amount of data generated is on the rise due to the increased demand in fields such as IoT and smart monitoring applications. Data generated through such systems has many distinct characteristics, such as continuous generation, evolving and multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields is largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for dealing with data having the previously mentioned characteristics, due to memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Therefore, novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are needed.  The thesis is focused on building evolutionary clustering algorithms for data that evolves over time. We initially proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. Through the progression of the work, new challenges have been studied and addressed. Namely, the Split-Merge Clustering algorithm was enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. Paper IV introduces a minimum spanning tree-based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks, enriched with a post-clustering pattern-labeling procedure.  The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results demonstrate the robustness of the algorithms for modeling, analyzing, and mining evolving data streams: they are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.
24

Smart Cube Predictions for Online Analytic Query Processing in Data Warehouses

Belcin, Andrei 01 April 2021 (has links)
A data warehouse (DW) is a transformation of many sources of transactional data, integrated into a single collection that is non-volatile and time-variant, and that can provide decision support to managerial roles within an organization. For this application, the database server needs to process multiple users' queries by joining various datasets and loading the result in main memory to begin calculations. In current systems, this process is reactive to users' input and can be undesirably slow. Previous studies showed that personalizing to a single user's query patterns and loading this smaller subset into main memory significantly shortened the query response time. The LPCDA framework developed in this research handles multiple users' query demands, where the query patterns are subject to change (so-called concept drift) and noise. To this end, the LPCDA framework detects changes in user behaviour and dynamically adapts the personalized smart cube definition for the group of users. Numerous data marts (DMs), as components of the DW, are subject to intense aggregations to assist analytics at the request of automated systems and human users' queries. Consequently, there is a growing need to properly manage the supply of data into the main memory closest to the CPU that computes the query, in order to reduce the response time from the moment a query arrives at the DW server. This thesis therefore proposes an end-to-end adaptive learning ensemble for resource allocation of cuboids within a DM, constructing a relevant and timely smart cube before it is needed, in the spirit of the just-in-time inventory management strategy applied in other real-world scenarios. The algorithms comprising the ensemble involve predictive methodologies from Bayesian statistics, data mining, and machine learning, and reflect changes in the data-generating process using a number of change detection algorithms. Therefore, given different operational constraints and data-specific considerations, the ensemble can, to an effective degree, determine which cuboids in the lattice of a DM to pre-construct into a smart cube ahead of users submitting their queries, thereby providing a quicker response than static schema views or no action at all.
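As a rough illustration of the general idea (pre-materializing cuboids under a memory budget based on observed query patterns), here is a hypothetical scoring heuristic. The cuboid names, sizes, and the exponential-decay scoring are invented for the example; the actual LPCDA framework uses Bayesian and machine-learning predictors, not this simple rule.

```python
def update_scores(scores, queried_cuboid, decay=0.9):
    """Exponentially decay all scores, then reward the queried cuboid.
    The decay acts as a simple forgetting mechanism, so the selection
    can follow drifting query patterns."""
    for c in scores:
        scores[c] *= decay
    scores[queried_cuboid] = scores.get(queried_cuboid, 0.0) + 1.0
    return scores

def select_cuboids(scores, sizes, budget):
    """Greedily pre-materialize the highest-scoring cuboids that fit
    within the memory budget (the 'smart cube')."""
    chosen, used = [], 0
    for c in sorted(scores, key=scores.get, reverse=True):
        if used + sizes[c] <= budget:
            chosen.append(c)
            used += sizes[c]
    return chosen

scores = {}
sizes = {"region": 10, "region_product": 40, "product_month": 25}  # made-up sizes
for q in ["region", "region", "product_month", "region"]:           # query stream
    update_scores(scores, q)
smart_cube = select_cuboids(scores, sizes, budget=40)
```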
25

Using particle swarm optimisation to train feedforward neural networks in dynamic environments

Rakitianskaia, A.S. (Anastassia Sergeevna) 13 February 2012 (has links)
The feedforward neural network (NN) is a mathematical model capable of representing any non-linear relationship between input and output data. It has been successfully applied to a wide variety of classification and function approximation problems. Various neural network training algorithms were developed, including the particle swarm optimiser (PSO), which was shown to outperform the standard back propagation training algorithm on a selection of problems. However, it was usually assumed that the environment in which a NN operates is static. Such an assumption is often not valid for real life problems, and the training algorithms have to be adapted accordingly. Various dynamic versions of the PSO have already been developed. This work investigates the applicability of dynamic PSO algorithms to NN training in dynamic environments, and compares the performance of dynamic PSO algorithms to the performance of back propagation. Three popular dynamic PSO variants are considered. The extent of adaptive properties of back propagation and dynamic PSO under different kinds of dynamic environments is determined. Dynamic PSO is shown to be a viable alternative to back propagation, especially in environments exhibiting infrequent gradual changes. Copyright 2011, University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. Please cite as follows: Rakitianskaia, A 2011, Using particle swarm optimisation to train feedforward neural networks in dynamic environments, MSc dissertation, University of Pretoria, Pretoria, viewed yymmdd < http://upetd.up.ac.za/thesis/available/etd-02132012-233212 / > C12/4/406/gm / Dissertation (MSc)--University of Pretoria, 2011. / Computer Science / Unrestricted
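Training a feedforward NN with PSO means treating the full weight vector as a particle position and the training loss as the fitness function. The following is a minimal static (global-best) PSO sketch on a tiny 1-2-1 network fitting y = x², not one of the dynamic PSO variants the dissertation studies; the network size, data, and coefficients are illustrative choices.

```python
import math
import random

def nn_forward(w, x):
    """Tiny 1-2-1 feedforward net; w = [w1, b1, w2, b2, v1, v2, c]."""
    h1 = math.tanh(w[0] * x + w[1])
    h2 = math.tanh(w[2] * x + w[3])
    return w[4] * h1 + w[5] * h2 + w[6]

def loss(w, data):
    """Mean squared error of the network over the data set (the fitness)."""
    return sum((nn_forward(w, x) - y) ** 2 for x, y in data) / len(data)

def pso_train(data, n_particles=20, dim=7, iters=100, seed=0):
    """Global-best PSO: velocities are pulled toward each particle's
    personal best and the swarm's global best."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [loss(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    inertia, c1, c2 = 0.72, 1.49, 1.49          # commonly used coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = loss(pos[i], data)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

data = [(x / 5.0, (x / 5.0) ** 2) for x in range(-5, 6)]  # y = x^2 on [-1, 1]
baseline = loss([0.0] * 7, data)       # all-zero network predicts 0 everywhere
w_best, f_best = pso_train(data)
```

A dynamic PSO variant would additionally re-diversify the swarm (e.g. re-randomize some particles or re-evaluate bests) when the environment, and hence the loss landscape, changes.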
26

Towards Robust and Adaptive Machine Learning : A Fresh Perspective on Evaluation and Adaptation Methodologies in Non-Stationary Environments

Bayram, Firas January 2023 (has links)
Machine learning (ML) has become ubiquitous in various disciplines and applications, serving as a powerful tool for developing predictive models to analyze diverse variables of interest. With the advent of the digital era, the proliferation of data has presented numerous opportunities for growth and expansion across various domains. However, along with these opportunities, there is a unique set of challenges that arises due to the dynamic and ever-changing nature of data. These challenges include concept drift, which refers to shifting data distributions over time, and other data-related issues that can be framed as learning problems. Traditional static models are inadequate in handling these issues, underscoring the need for novel approaches to enhance the performance robustness and reliability of ML models to effectively navigate the inherent non-stationarity in the online world. The field of concept drift is characterized by several intricate aspects that challenge learning algorithms, including the analysis of model performance, which requires evaluating and understanding how the ML model's predictive capability is affected by different problem settings. Additionally, determining the magnitude of drift necessary for change detection is an indispensable task, as it involves identifying substantial shifts in data distributions. Moreover, the integration of adaptive methodologies is essential for updating ML models in response to data dynamics, enabling them to maintain their effectiveness and reliability in evolving environments. In light of the significance and complexity of the topic, this dissertation offers a fresh perspective on the performance robustness and adaptivity of ML models in non-stationary environments. 
The main contributions of this research include exploring and organizing the literature, analyzing the performance of ML models in the presence of different types of drift, and proposing innovative methodologies for drift detection and adaptation that solve real-world problems. By addressing these challenges, this research paves the way for the development of more robust and adaptive ML solutions capable of thriving in dynamic and evolving data landscapes. / Machine learning (ML) is widely used in various disciplines as a powerful tool for developing predictive models to analyze diverse variables. In the digital era, the abundance of data has created growth opportunities, but it also brings challenges due to the dynamic nature of data. One of these challenges is concept drift, the shifting data distributions over time. Consequently, traditional static models are inadequate for handling these challenges in the online world. Concept drift, with its intricate aspects, presents a challenge for learning algorithms. Analyzing model performance and detecting substantial shifts in data distributions are crucial for integrating adaptive methodologies to update ML models in response to data dynamics, maintaining effectiveness and reliability in evolving environments. In this dissertation, a fresh perspective is offered on the robustness and adaptivity of ML models in non-stationary environments. This research explores and organizes existing literature, analyzes ML model performance in the presence of drift, and proposes innovative methodologies for detecting and adapting to drift in real-world problems. The aim is to develop more robust and adaptive ML solutions capable of thriving in dynamic and evolving data landscapes.
27

Graded possibilistic clustering of non-stationary data streams

Abdullatif, Amr R.A., Masulli, F., Rovetta, S., Cabri, A. 27 January 2020 (has links)
Multidimensional data streams are a major paradigm in data science. This work focuses on possibilistic clustering algorithms as a means to perform clustering of multidimensional streaming data. The proposed approach exploits fuzzy outlier analysis to provide good learning and tracking abilities under both concept shift and concept drift.
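The outlier-handling property of possibilistic memberships can be illustrated in a few lines. This is a generic sketch of possibilistic (typicality-based) memberships with an exponential kernel, not the graded possibilistic algorithm of the paper; the centroids and the scale parameter `eta` are made up for the example.

```python
import math

def possibilistic_memberships(point, centroids, eta=1.0):
    """Typicality of a point to each cluster: exp(-d^2 / eta).
    Unlike probabilistic memberships, these need not sum to 1, so a
    point far from every centroid gets uniformly low typicality --
    a natural outlier signal for streaming data."""
    return [math.exp(-sum((p - c) ** 2 for p, c in zip(point, centroid)) / eta)
            for centroid in centroids]

centroids = [(0.0, 0.0), (5.0, 5.0)]
near = possibilistic_memberships((0.1, 0.0), centroids)      # close to cluster 0
outlier = possibilistic_memberships((20.0, 20.0), centroids) # far from both
is_outlier = max(outlier) < 0.01   # low typicality everywhere => outlier
```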
28

Popularity prediction of social multimedia based on concept drift mining

鄭世宏, Jheng, Shih Hong Unknown Date (has links)
In recent years, the rise of social media has offered an easy and fast way for people to exchange all kinds of content. Social multimedia refers to the multimedia content that users share on social media; compared with plain multimedia content, it additionally carries a wealth of records of sharing interactions between users, as well as information about users within the social network, which opens up many more application possibilities. A microblog is a platform on which users freely share short text messages in real time, capturing their current moods, what they see and hear, and conversations with friends. Compared with social media dedicated to sharing multimedia content (such as YouTube or Flickr), multimedia content on microblogs exhibits a pronounced sharing and propagation behavior. The goal of this thesis is to exploit the characteristics and data of this propagation on microblogs for popularity prediction of social multimedia. As time progresses, using a single fixed rule for popularity prediction may cause prediction accuracy to degrade; moreover, even at the same point in time, different multimedia items follow their own popularity trends over time and may require different prediction rules, a phenomenon known as local concept drift. We therefore cast popularity prediction as a classification problem in data mining, take local concept drift into consideration, and propose a popularity prediction method for multimedia content on microblog platforms. Experiments on social multimedia collected from Plurk show that the method that considers local concept drift clearly outperforms the GCD method (by 4% on average) and the baseline method (by 10% on average) in accuracy, indicating that the proposed method is better suited to multimedia content on microblogs and confirming that concept drift and local concept drift do occur.
29

A thorough evaluation of concept drift detection algorithms

SANTOS, Silas Garrido Teixeira de Carvalho 27 February 2015 (has links)
Knowledge extraction from data streams is an activity that has been receiving a progressively increasing demand. Examples of such applications include monitoring the purchase history of customers, detecting presence through sensors, or monitoring water temperatures. Algorithms used for this purpose must therefore be constantly updated, adapting to new instances while taking computational constraints into account. When working in environments with a continuous flow of data, there is no guarantee that the distribution of the data will remain stationary; on the contrary, several changes may occur over time, triggering situations commonly known as concept drift. In this work we present a comparative study of some of the main drift detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. The experiments used artificial datasets, simulating abrupt, fast gradual, and slow gradual changes, as well as datasets with real problems. The results were analyzed in terms of accuracy, runtime, memory usage, average time to change detection, and the number of false positives and negatives; the parameters of the methods were set using an adapted version of a genetic algorithm. According to the Friedman test with the Nemenyi post-hoc analysis, DDM was the most accurate method on the datasets used, being statistically superior to DOF and ECDD. EDDM was the fastest method and also the most economical in memory usage, being statistically superior to DOF, ECDD, PL and STEPD in both respects. We conclude that change detection methods that are more sensitive, and therefore more prone to false alarms, achieve better results than less sensitive methods that are less susceptible to false alarms. (Work supported by FACEPE.)
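For reference, the DDM detector compared in this study can be sketched compactly: it tracks the streaming error rate p and its standard deviation s, and signals a warning (drift) when p + s exceeds the best level seen so far by two (three) standard deviations. The sketch below is a simplified version (no reset after drift, fixed 30-sample warm-up) with a synthetic error stream invented for illustration.

```python
import math
import random

class DDM:
    """Simplified Drift Detection Method (Gama et al. 2004)."""
    def __init__(self, min_samples=30):
        self.i = 0
        self.p = 0.0
        self.p_min = float("inf")
        self.s_min = float("inf")
        self.min_samples = min_samples

    def update(self, error):          # error: 1 if the model erred, else 0
        self.i += 1
        self.p += (error - self.p) / self.i        # incremental error rate
        s = math.sqrt(self.p * (1.0 - self.p) / self.i)
        if self.i < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:   # record the best level
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if self.p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"

rng = random.Random(7)
ddm = DDM()
states = []
for t in range(1000):
    err_rate = 0.2 if t < 500 else 0.8   # abrupt concept drift at t = 500
    states.append(ddm.update(1 if rng.random() < err_rate else 0))
drift_detected = "drift" in states[500:]
```

The warning level is typically used to start buffering recent examples, so a new model can be trained from them once the drift level is reached.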
30

An approach for online learning in the presence of concept changes

Jaber, Ghazal 18 October 2013 (has links)
Learning from data streams is emerging as an important application area. When the environment changes, it is necessary to rely on online learning with the capability to adapt to changing conditions, a.k.a. concept drifts. Adapting to concept drifts entails forgetting some or all of the previously acquired knowledge when the concept changes, while accumulating knowledge about the supposedly stationary underlying concept. This tradeoff is called the stability-plasticity dilemma. Ensemble methods have been among the most successful approaches; however, the management of the ensemble, which ultimately controls how past data is forgotten, has not been thoroughly investigated so far. Our work shows the importance of the forgetting strategy by comparing several approaches. The results thus obtained lead us to propose a new ensemble method with an enhanced forgetting strategy to adapt to concept drifts. Experimental comparisons show that our method compares favorably with well-known state-of-the-art systems. The majority of previous works focused only on means to detect changes and to adapt to them. In our work, we go one step further by introducing a meta-learning mechanism that is able to detect relevant states of the environment, to recognize recurring contexts and to anticipate likely concept changes. Hence, the method we suggest deals both with the challenge of optimizing the stability-plasticity dilemma and with the anticipation and recognition of incoming concepts. This is accomplished through an ensemble method that controls an ensemble of incremental learners. The management of the ensemble of learners enables one to adapt naturally to the dynamics of the concept changes with very few parameters to set, while a learning mechanism monitoring the changes in the ensemble provides means for anticipating, and quickly adapting to, the underlying modification of the context.
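One simple instance of a forgetting strategy in ensemble management is Weighted Majority style multiplicative down-weighting: members that err under the current concept fade out, and members whose weight falls below a floor become candidates for replacement. This is a generic sketch of that idea, not the thesis's proposed method; the constant `always_0`/`always_1` experts, the decay factor, and the floor are illustrative choices.

```python
def vote(ensemble, x):
    """Weighted-majority vote over the ensemble."""
    tally = {}
    for predict, w in ensemble:
        label = predict(x)
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

def update_weights(ensemble, x, y_true, beta=0.8, floor=0.05):
    """Forgetting strategy: multiplicatively down-weight members that err;
    members whose weight falls below `floor` are flagged as stale, i.e.
    candidates for replacement after a concept change."""
    updated = [(p, w * beta if p(x) != y_true else w) for p, w in ensemble]
    stale = [i for i, (_, w) in enumerate(updated) if w < floor]
    return updated, stale

always_0 = lambda x: 0
always_1 = lambda x: 1
ensemble = [(always_0, 1.0), (always_1, 1.0)]
stale = []
for _ in range(20):          # stream where the current concept is y = 1
    ensemble, stale = update_weights(ensemble, None, 1)
prediction = vote(ensemble, None)   # the y=0 expert has faded out
```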
