Global ETD Search

1	Anomaly Detection in Univariate Time Series Data in the Presence of Concept Drift Zamani Alavijeh, Soroush January 2021 Digital applications and devices record data over time to enable users and managers to monitor their activity. Errors occur in data, including time series data, for various reasons, including software system failures and human errors. The problem of identifying such errors in time series data, also referred to as anomaly detection, is well studied by data management and systems researchers. However, such data are often recorded in dynamic environments where a change in the standard or in the recording hardware can give rise to different and novel patterns in the data. These novel patterns are caused by what is referred to as concept drift. Concept drift occurs when the statistical properties of the data, e.g. the data distribution, change over time. The problem of identifying anomalies in time series data recorded and stored in dynamic environments has not been extensively studied. In this study, we focus on this problem. We propose and implement a unified framework that identifies drifts in univariate time series data and incorporates the information gained from the data to train a learning model that detects anomalies in unseen univariate time series data. / Thesis / Master of Science (MSc) Anomaly Detection Concept Drift
2	Tuning and Optimising Concept Drift Detection Do, Ethan Quoc-Nam January 2021 Data drifts naturally occur in data streams due to seasonality, changes in data usage, and the data generation process. Concepts modelled via the data streams also experience such drift. Differentiating concept drift from anomalies is important for distinguishing normal from abnormal behaviour. Existing techniques achieve poor responsiveness and accuracy on this differentiation task. We take two approaches to address this problem. First, we extend an existing sliding window algorithm to include multiple windows that model recently seen data stream patterns, and we define new parameters for comparing the data streams. Second, we study a set of optimisers and tune the parameters of a Bi-LSTM model to maximise accuracy. / Thesis / Master of Applied Science (MASc) concept drift anomaly detection concept drift detection
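To illustrate the multiple-window idea, here is a minimal sketch assuming that a reference window and a few consecutive recent windows are compared by their means; the thesis's actual comparison parameters are not given in the abstract, and `classify_change`, the window sizes and the threshold are hypothetical. A deviation confined to some windows suggests an anomaly, whereas one that persists across all recent windows suggests drift:

```python
from statistics import mean, pstdev


def classify_change(stream, ref_len=30, win_len=5, n_windows=3, k=3.0):
    """Label the tail of a stream as 'stable', 'anomaly' or 'drift' (sketch)."""
    ref = stream[:ref_len]
    mu, sd = mean(ref), pstdev(ref) or 1e-9
    tail = stream[ref_len:]
    assert len(tail) >= n_windows * win_len, "not enough recent data"
    # Split the most recent n_windows * win_len points into consecutive windows.
    recent = tail[-n_windows * win_len:]
    windows = [recent[i:i + win_len] for i in range(0, len(recent), win_len)]
    # A window deviates if its mean is more than k standard errors from the reference.
    deviating = [abs(mean(w) - mu) > k * sd / len(w) ** 0.5 for w in windows]
    if all(deviating):
        return "drift"    # the shift persists across every recent window
    if any(deviating):
        return "anomaly"  # a transient deviation confined to some windows
    return "stable"
```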
3	Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams Floyd, Sean Louis Alan 03 June 2019 In this thesis, learning refers to the intelligent computational extraction of knowledge from data. Supervised learning tasks require data to be annotated with labels, whereas for unsupervised learning, data are not labelled. Semi-supervised learning deals with data sets that are partially labelled. A major issue with supervised and semi-supervised learning of data streams is late-arriving or missing class labels. Assuming that correctly labelled data will always be available and timely is often infeasible, so supervised methods are not directly applicable in the real world. Therefore, real-world problems usually require semi-supervised or unsupervised learning techniques. For instance, in a spam detection task, it is not reasonable to assume that all spam will be identified (correctly labelled) prior to learning. Additionally, in semi-supervised learning, "the instances having the highest [predictive] confidence are not necessarily the most useful ones" [41]. We investigate how self-training performs without its selective heuristic in a streaming setting. This leads to our contributions. We extend an existing concept drift detector to operate without any labelled data by using a sliding window of our ensemble's prediction confidence instead of a boolean indicating whether the ensemble's predictions are correct. We also extend selective self-training, a semi-supervised learning method, by using all predictions, not only those with high predictive confidence. Finally, we introduce a novel windowing type for ensembles, as sliding windows are very time-consuming and regular tumbling windows are not a suitable replacement.
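The label-free detector described above feeds prediction confidence, rather than a correct/incorrect flag, into a sliding window. The base detector is not named in the abstract, so the sketch below assumes a simple rule of its own: signal drift when the mean confidence in the window drops more than `drop` below the best window mean seen since the last reset. `ConfidenceDriftDetector` and its parameters are hypothetical.

```python
from collections import deque


class ConfidenceDriftDetector:
    """Label-free drift signal from a sliding window of prediction confidence (sketch)."""

    def __init__(self, window=50, drop=0.15):
        self.window = deque(maxlen=window)
        self.drop = drop      # tolerated fall in mean confidence before signalling drift
        self.best = None      # best full-window mean since the last reset

    def update(self, confidence):
        """Add one confidence value in [0, 1]; return True if drift is signalled."""
        self.window.append(confidence)
        if len(self.window) < self.window.maxlen:
            return False      # wait until the window is full
        m = sum(self.window) / len(self.window)
        if self.best is None or m > self.best:
            self.best = m
        if self.best - m > self.drop:
            self.window.clear()   # start afresh on the new concept
            self.best = None
            return True
        return False
```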
Our windowing technique can be considered a hybrid of the two: we train each sub-classifier in the ensemble with tumbling windows, but delay training so that only one sub-classifier can update its model per iteration. Statistical significance tests show that our framework is faster (roughly 160 times) than current state-of-the-art techniques while achieving comparable predictive accuracy. That said, more research is needed to further reduce the quantity of labelled data used for training while also increasing predictive accuracy. Ensemble learning Data streams Concept drift Online learning Non-stationary environments
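The hybrid windowing schedule can be sketched as follows, assuming `n_models` sub-classifiers that share a window length and whose start offsets are staggered by `window // n_models` instances. The function name and this exact offset rule are assumptions; the abstract states only that training is delayed so that one sub-classifier updates per iteration. Each model still trains on non-overlapping tumbling windows, but the ensemble as a whole is refreshed every `window // n_models` instances.

```python
def hybrid_windows(n_instances, n_models=4, window=100):
    """Return (t, model, start, end) for each retraining event (sketch).

    Every model trains on tumbling windows of `window` instances, but the
    models are offset by `window // n_models`, so exactly one sub-classifier
    retrains at each step of that size."""
    assert window % n_models == 0, "window must be divisible by n_models"
    step = window // n_models
    events = []
    for t in range(window, n_instances + 1, step):
        model = (t // step) % n_models        # models take turns, one per step
        events.append((t, model, t - window, t))  # train on instances [t - window, t)
    return events
```

Because each model retrains only once per `window` instances, the training cost matches plain tumbling windows, while the most recently updated member is never more than `window // n_models` instances out of date.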
4	Adaptive User Interfaces for Mobile Computing Devices Bridle, Robert Angus, robert.bridle@gmail.com January 2008 This thesis examines the use of adaptive user interface elements on a mobile phone and presents two adaptive user interface approaches. The approaches attempt to increase the efficiency with which a user interacts with a mobile phone while ensuring that the interface remains predictable to the user.
An adaptive user interface approach is presented that predicts the menu item a user will select. When a menu is opened, the predicted menu item is highlighted instead of the top-most menu item. The aim is to maintain the layout of the menu while saving the user from performing scrolling key presses. A machine learning approach is used to accomplish the prediction task. However, learning in the mobile phone environment poses several difficulties: limited availability of training examples, concept drift, and limited computational resources. A novel learning approach is presented that addresses these difficulties. It addresses limited training examples and limited computational resources by employing a highly restricted hypothesis space, and it addresses concept drift by determining the hypothesis that has been consistent for the longest run of training examples into the past. Under certain concept drift restrictions, an analysis shows this approach to be superior to approaches that use a fixed window of training examples. An experimental evaluation on data collected from several users interacting with a mobile phone was used to assess this learning approach in practice. The results are reported in terms of the average number of key presses saved. The benefit of menu-item prediction is clear, with savings of up to three key presses on every menu interaction.
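The drift-handling rule above, choosing the hypothesis that has been consistent for the longest run of training examples into the past, can be sketched as below. The hypothesis space is a hypothetical toy (three fixed rules mapping the hour of day to a menu item), standing in for the thesis's highly restricted hypothesis space.

```python
def longest_consistent(hypotheses, examples):
    """Return the name of the hypothesis consistent with the longest run of
    most recent examples.

    hypotheses: dict name -> callable(context) -> predicted label
    examples:   list of (context, label) pairs, oldest first
    """
    def run_length(h):
        n = 0
        for x, y in reversed(examples):   # walk backwards from the newest example
            if h(x) != y:
                break
            n += 1
        return n
    return max(hypotheses, key=lambda name: run_length(hypotheses[name]))


# Hypothetical toy hypothesis space: the context is the hour of day.
hyps = {
    "always_calls": lambda hour: "Calls",
    "always_messages": lambda hour: "Messages",
    "by_time": lambda hour: "Messages" if hour >= 18 else "Calls",
}
history = [(9, "Calls"), (20, "Calls"), (10, "Calls"),
           (19, "Messages"), (21, "Messages"), (8, "Calls")]
best = longest_consistent(hyps, history)  # "by_time" fits the last four examples
```

Unlike a fixed window, the effective window here is whatever run the winning hypothesis has survived, so it shrinks automatically after a concept change.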
An extension of the menu-item prediction approach is presented that removes the need to manually specify a restricted hypothesis space. The approach uses a decision-tree learner to generate hypotheses online and uses the minimum description length principle to identify the occurrence of concept shifts, which in turn guides the hypothesis generation process. The approach is compared with the original menu-item prediction approach, in which hypotheses are manually specified. Experimental results using the same datasets are reported.
Another adaptive user interface approach is presented that induces shortcuts on a mobile phone interface. The approach is based on identifying shortcuts in the form of macros, which can automate a sequence of actions. A means of specifying relevant action sequences is presented, together with several learning approaches for predicting which shortcut to present to a user. A small subset of the possible shortcuts on a mobile phone was considered, consisting of shortcuts that automate making a phone call or sending a text message. The results of an experimental evaluation of the shortcut prediction approaches are presented. The shortcut prediction process was evaluated in terms of predictive accuracy and stability, where stability was defined as the rate at which predicted shortcuts changed over time. The importance of stability is discussed and is used to question the advantages of using sophisticated learning approaches for achieving adaptive user interfaces on mobile phones. Finally, several methods for combining accuracy and stability measures are presented, and the learning approaches are compared using these methods. mobile computing adaptive machine learning user interface concept drift
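Stability, defined as the rate at which predicted shortcuts change, can be measured directly from the sequence of predictions. A minimal sketch, assuming one prediction is recorded per interaction (`change_rate` is a hypothetical helper; lower values mean a more stable interface):

```python
def change_rate(predictions):
    """Fraction of consecutive interactions at which the predicted shortcut changed."""
    if len(predictions) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(predictions, predictions[1:]))
    return changes / (len(predictions) - 1)
```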
5	Neuroevolutive Learning and Concept Drift Detection in Non-Stationary Environments Escovedo, Tatiana 04 July 2016 Real-world concepts are often not stable: they change over time. Like the concepts, the data distribution may also change. This problem of change in concepts or in the data distribution is known as concept drift, and it is a challenge for a model in the task of learning from data. This work presents a new quantum-inspired neuroevolutive model called NEVE (Neuro-EVolutionary Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks, for learning in non-stationary environments. It also presents a new concept drift detection mechanism, called DetectA (Detect Abrupt), with the ability to detect changes both proactively and reactively. The binary-real quantum-inspired evolutionary algorithm AEIQ-BR is used in NEVE to automatically generate new classifiers for the ensemble, determining the most appropriate topology for each new network, selecting the most appropriate input variables, and determining all the weights of the MLP neural network. The AEIQ-R algorithm determines the voting weight of each neural network in the ensemble, and voting can be performed by linear combination, weighted majority, or simple majority. Four different approaches of NEVE are implemented; they differ from one another in how they detect and handle the drifts that occur.
The work also presents results of experiments conducted with the DetectA method and the NEVE model on real and artificial datasets. The results show that the detector is robust and efficient for high-dimensional datasets, intermediate-sized blocks, datasets with any proportion of drift, and any class balance, and that, in general, the best results were obtained using some form of detection. Comparing the accuracy of NEVE with other established models in the literature shows that NEVE achieved higher accuracy in most cases. This reinforces that the neuroevolutionary ensemble approach is a robust choice for situations in which the data are subject to sudden changes in behavior. Classification Neuroevolution Concept drift Concept drift detection Neural network ensembles Non-stationary environments
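The three voting options mentioned for NEVE's ensemble can be sketched as follows. In NEVE the weights are determined by AEIQ-R; here they are simply passed in, and the function names and example values are hypothetical.

```python
from collections import Counter


def simple_majority(labels):
    """Every member gets one vote; the most common label wins."""
    return Counter(labels).most_common(1)[0][0]


def weighted_majority(labels, weights):
    """Each member's vote counts with its weight."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


def linear_combination(outputs, weights):
    """outputs: one class-score vector per member; returns the index of the
    class with the largest weighted sum of scores."""
    n_classes = len(outputs[0])
    combined = [sum(w * out[c] for out, w in zip(outputs, weights))
                for c in range(n_classes)]
    return max(range(n_classes), key=combined.__getitem__)
```

With member labels `["A", "B", "B"]` and weights `[0.7, 0.2, 0.2]`, simple majority picks `"B"` while weighted majority picks `"A"`, which shows how much the learned weights can matter.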
6	Concept Drift in Surgery Prediction Beyene, Ayne, Welemariam, Tewelle January 2012 Context: In healthcare, patient referral decisions evolve over time because of scientific developments and changes in clinical practice. Existing decision support systems for patient referral are based on the expert systems approach, which usually requires manual updates when clinical practices change. Automatically updating the decision support system by identifying and handling so-called concept drift improves the efficiency of healthcare systems. In the state of the art, there are only specific ways of handling concept drift; developing a more generic technique that works regardless of whether changes in the underlying distribution are slow, fast, sudden, gradual, local, global, cyclical, noisy or otherwise is still a challenge. Objectives: An algorithm that handles concept drift in surgery prediction is investigated. Concept drift detection techniques are evaluated to find a suitable detection technique in the context of surgery prediction. Moreover, plausible combinations of detection and handling algorithms, including the proposed algorithm, Trigger Based Ensemble (TBE)+, are evaluated on hospital data. Method: Experiments are conducted to investigate the impact of concept drift on prediction performance and to reduce that impact. The experiments compare three existing methods (AWE, Active Classifier, Learn++) and the proposed algorithm, Trigger Based Ensemble (TBE). A real-world dataset from the orthopedics department of Blekinge hospital and datasets from other domains are used in the experiments. Results: The negative impact of concept drift on surgery prediction is investigated. The relationship between temporal changes in the data distribution and concept drift in surgery prediction is identified. Furthermore, the proposed algorithm is evaluated and compared with existing handling approaches.
Conclusion: The proposed algorithm, Trigger Based Ensemble (TBE), is capable of detecting the occurrence of concept drift and adapting quickly to various changes. In the absence of noise, TBE performed better than, or sometimes similarly to, the existing concept drift handling algorithms. Moreover, the performance of TBE is consistent for small and large datasets. The research makes a twofold contribution: it improves surgery prediction performance and contributes a competitive concept drift handling algorithm to the field of computer science. Concept drift Concept Drift in Surgery Prediction Concept Drift Handling Algorithm Trigger Based Ensemble Computer Sciences
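The abstract does not describe how TBE detects drift. For readers unfamiliar with the detectors such studies evaluate, here is a sketch of one widely used baseline, the Drift Detection Method (DDM) of Gama et al. (2004), which monitors a classifier's online error rate p and its standard deviation s = sqrt(p(1 - p)/n). This is a substituted, well-known technique, not TBE; the warning and drift levels (2 and 3 standard deviations above the recorded minimum) follow the usual DDM formulation, and `min_samples=30` is a conventional default.

```python
import math


class DDM:
    """Drift Detection Method: flags drift when the online error rate rises
    significantly above its recorded minimum (sketch)."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error: 1 if the classifier misclassified this example, else 0.
        Returns 'drift', 'warning' or 'stable'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"               # too few samples for a reliable estimate
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s  # record the best state seen so far
        if p + s >= self.p_min + 3 * self.s_min:
            self.reset()                  # caller should rebuild its model
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```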
7	Dynamic Committees for Handling Concept Drift in Databases (DCCD) AlShammeri, Mohammed 07 November 2012 Concept drift refers to a problem in data mining that is caused by a change in the data distribution. It reduces the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue in a supervised learning (or classification) setting, where the target concept (or class) to be learned is known. One of these techniques is called “ensemble learning”, which refers to using multiple trained classifiers to obtain better predictions through some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to committees, where a committee consists of diverse classifiers; this is the main difference between regular ensemble classifiers and committee learning algorithms. Committees can use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace members that perform poorly. This thesis presents two new algorithms that address concept drift. The first algorithm systematically introduces gradual and sudden concept drift scenarios into datasets. To save time and reduce memory consumption, this Concept Drift Introducer (CDI) algorithm divides the drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase instead of each individual drift scenario. We further designed a novel algorithm to handle concept drift.
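Introducing sudden and gradual drift into a dataset, as the CDI algorithm does, can be sketched with a common construction. This is an assumption: the abstract does not describe CDI's mechanics or its phases. The stream switches from one concept to another at a chosen index, either at once or with a probability of sampling the new concept that ramps linearly over a transition width. `inject_drift` and the toy concepts are hypothetical.

```python
import random


def inject_drift(concept_a, concept_b, n, start, width=0, seed=0):
    """Build a stream of n examples that switches from concept_a to concept_b.

    width == 0 gives a sudden drift at index `start`; width > 0 gives a gradual
    drift in which the probability of sampling concept_b rises linearly from
    0 at `start` to 1 at `start + width`. Each concept is a callable that takes
    a random.Random instance and returns one example."""
    rng = random.Random(seed)
    stream = []
    for i in range(n):
        if width == 0:
            p_b = 1.0 if i >= start else 0.0
        else:
            p_b = min(1.0, max(0.0, (i - start) / width))
        source = concept_b if rng.random() < p_b else concept_a
        stream.append(source(rng))
    return stream


# Hypothetical toy concepts: each one just tags its examples.
concept_a = lambda rng: "A"
concept_b = lambda rng: "B"
sudden = inject_drift(concept_a, concept_b, 100, start=40)
gradual = inject_drift(concept_a, concept_b, 200, start=50, width=100)
```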
Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier based on its predictive accuracy. The novelty of DCCD lies in employing diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drift by using accuracy and by weighting the committee members, adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members is evaluated to decide whether a member needs to be replaced. Moreover, DCCD identifies the worst member in the committee and then eliminates it by using a weighting mechanism. Our experimental evaluation centers on the performance of DCCD on various datasets of different sizes with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of DCCD under a number of diverse circumstances. DCCD generally achieves high performance, especially when a dataset contains a large number of concept drifts. With respect to dataset size, our results show that DCCD produced a steady improvement in performance when applied to small datasets, while on large and medium datasets it achieves comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that DCCD limits the loss in accuracy over time, regardless of the size of the dataset. Data Mining Machine Learning Concept Drift Concept Shift Non-Stationary Environments Ensemble Learning Learning Committees Dynamic Committees
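One measurement phase of the point-based weighting described for DCCD can be sketched as follows, assuming per-member accuracies on the phase have already been computed. The tie-breaking rule (fewest points, then largest accumulated loss) and the `min_acc` replacement threshold are assumptions, and `committee_phase` is a hypothetical name.

```python
def committee_phase(points, losses, phase_acc, min_acc=0.6):
    """Run one measurement phase of a DCCD-style committee (sketch).

    points, losses: dicts member -> running totals, updated in place
    phase_acc:      dict member -> accuracy on this phase's examples
    Returns the member that should be replaced, or None."""
    best = max(phase_acc, key=phase_acc.get)
    points[best] += 1                      # one point to the most accurate member
    for member, acc in phase_acc.items():
        losses[member] += 1.0 - acc        # accumulate each member's loss in accuracy
    # Worst member: fewest points; ties broken by the largest accumulated loss.
    worst = min(points, key=lambda m: (points[m], -losses[m]))
    return worst if phase_acc[worst] < min_acc else None
```

A caller would swap the returned member for a freshly trained classifier, typically of a different type to keep the committee heterogeneous, and reset its `points` and `losses` entries to zero.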
8	An Investigative Study of Regression Algorithms for Data Streams Nunes, André Luís 28 March 2017 The explosion in data volume and the speed at which it grows make knowledge discovery and data analysis challenging, even more so when non-stationary data are considered. Although predicting future values plays a fundamental role in areas such as climate, routing problems and economics, classification still seems to be the most explored task. Recently, several value-regression algorithms have been released, for example FIMT-DD, AMRules, IBLStreams and SFNRegressor; however, the studies investigating them have focused more on innovation and on analysis of prediction error than on their capabilities with respect to criteria considered fundamental for data streams, such as execution time and memory. Accordingly, the objective of this work is to present an investigative study of these regression algorithms in dynamic environments, using massive datasets, and also to explore the algorithms' ability to adapt in the presence of concept drift. To this end, three datasets were analyzed and extended to explore the main evaluation criteria adopted. An extensive set of experiments was carried out, producing a comparison of the results obtained with the chosen algorithms and providing indications of how each one behaves in the different scenarios to which it was exposed.
Thus, the main contributions of this work are: an evaluation of fundamental criteria for regression on data streams (memory, execution time and generalization power); a critical analysis of the investigated algorithms; and the possibility of reproducing and extending the studies, since the parameter settings used are made available. Data stream mining Regression Concept drift
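The criteria emphasized here (prediction error, execution time and memory) are usually measured prequentially, i.e. each example is first used for testing and then for training. The sketch below assumes this protocol and uses a tiny SGD linear regressor as a hypothetical stand-in for the algorithms studied (FIMT-DD, AMRules, IBLStreams and SFNRegressor are not reimplemented); memory is approximated by Python's `tracemalloc` peak, and `drifting_stream` is a made-up synthetic stream with one sudden drift.

```python
import random
import time
import tracemalloc


class OnlineLinearRegressor:
    """Tiny SGD linear model used only as a stand-in stream regressor."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def learn(self, x, y):
        err = self.predict(x) - y          # gradient of the squared error / 2
        self.b -= self.lr * err
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


def prequential(model, stream):
    """Test-then-train evaluation. Returns (MAE, elapsed seconds, peak bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    abs_err, n = 0.0, 0
    for x, y in stream:
        abs_err += abs(model.predict(x) - y)   # test first ...
        model.learn(x, y)                      # ... then train
        n += 1
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return abs_err / n, elapsed, peak


def drifting_stream(n, seed=1):
    """Synthetic stream: y = 2x + 1, switching suddenly to y = -2x + 3 halfway."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        yield [x], (2 * x + 1 if i < n // 2 else -2 * x + 3)
```

Since every prediction is made before the model sees its label, the MAE includes the error spike right after the drift, which is exactly the adaptation behavior such comparisons aim to expose.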
9	An Incremental Gaussian Mixture Network for Data Stream Classification in Non-Stationary Environments Diaz, Jorge Cristhian Chamby January 2018 Data stream classification poses many challenges for the data mining community when the environment is non-stationary. The greatest challenge in learning classifiers from data streams relates to adaptation to concept drift, which occurs as a result of changes in the underlying concepts. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble methods play an important role due to their modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and more robust to noise than ensemble algorithms, but impose more restrictions on concept-drifting data streams. Thus, it is a challenge to combine the flexibility and adaptability of an ensemble classifier in the presence of concept drift with the simplicity of use of a single classifier with incremental learning.
With this motivation, in this dissertation we propose an incremental, online and probabilistic algorithm for classification in problems involving concept drift. The algorithm, called IGMN-NSE, is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE relative to IGMN are improved predictive power for classification tasks and adaptation to achieve good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track changing environments very closely, regardless of the type of concept drift. Databases Algorithms Incremental learning Gaussian mixture models Concept drift Data streams classification
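IGMN-NSE itself is not detailed in the abstract. As a much simpler stand-in that shows the same ingredients (incremental, online, probabilistic, and adaptive to non-stationarity), the sketch below keeps one diagonal Gaussian per class and updates its mean and variance with an exponential forgetting factor, so old data fade away after a drift. It is not IGMN or IGMN-NSE; the class name and the `forget` parameter are hypothetical.

```python
import math


class ForgettingGaussianClassifier:
    """One diagonal Gaussian per class, updated online with exponential
    forgetting (illustrative stand-in, not IGMN-NSE)."""

    def __init__(self, forget=0.05, var0=1.0, min_var=1e-6):
        self.forget = forget      # weight given to each new example
        self.var0 = var0          # initial variance for a newly seen class
        self.min_var = min_var    # floor that keeps variances positive
        self.priors, self.means, self.vars = {}, {}, {}

    def learn(self, x, label):
        a = self.forget
        for c in self.priors:                                  # fade every class prior ...
            self.priors[c] *= 1 - a
        self.priors[label] = self.priors.get(label, 0.0) + a   # ... then boost this one
        if label not in self.means:
            self.means[label] = list(x)
            self.vars[label] = [self.var0] * len(x)
            return
        mean, var = self.means[label], self.vars[label]
        for j, xj in enumerate(x):
            diff = xj - mean[j]
            incr = a * diff
            mean[j] += incr                                    # exponentially weighted mean
            var[j] = max((1 - a) * (var[j] + diff * incr), self.min_var)

    def predict(self, x):
        def log_score(c):                                      # unnormalized log posterior
            score = math.log(self.priors[c])
            for xj, mj, vj in zip(x, self.means[c], self.vars[c]):
                score -= 0.5 * (math.log(2 * math.pi * vj) + (xj - mj) ** 2 / vj)
            return score
        return max(self.means, key=log_score)
```

Because each update moves the statistics a fixed fraction `forget` toward the newest example, the model's memory is roughly `1 / forget` examples per class; a larger value tracks drift faster at the cost of noisier estimates.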
