1 |
Tuning and Optimising Concept Drift Detection
Do, Ethan Quoc-Nam January 2021
Data drifts naturally occur in data streams due to seasonality, change in data usage,
and the data generation process. Concepts modelled via the data streams will also
experience such drift. The problem of differentiating concept drift from anomalies
is important for distinguishing normal from abnormal behaviour. Existing techniques achieve
poor responsiveness and accuracy on this differentiation task.
We take two approaches to address this problem. First, we extend an existing
sliding window algorithm to include multiple windows to model recently seen data
stream patterns, and define new parameters to compare the data streams. Second,
we study a set of optimisers and tune the parameters of a Bi-LSTM model to maximize
accuracy. / Thesis / Master of Applied Science (MASc)
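The idea of using several windows to separate a lasting drift from a short-lived anomaly can be illustrated with a small sketch. This is not the algorithm from the thesis: the function name `classify_change`, the window sizes, and the threshold are all assumptions chosen for illustration. A deviation seen only in the shortest window is treated as an anomaly, while a deviation seen in every window is treated as drift.

```python
from statistics import mean, pstdev


def classify_change(reference, recent, window_sizes=(5, 20), z_threshold=3.0):
    """Label the tail of `recent` as 'normal', 'anomaly' or 'drift'.

    Hypothetical heuristic: each window mean is compared with the reference
    mean, measured in units of the reference standard deviation.
    `window_sizes` must be ordered from shortest to longest.
    """
    ref_mean = mean(reference)
    ref_std = pstdev(reference) or 1e-9  # guard against a constant reference
    shifted = []
    for size in window_sizes:
        window = recent[-size:]
        deviation = abs(mean(window) - ref_mean) / ref_std
        shifted.append(deviation > z_threshold)
    if all(shifted):
        return "drift"    # the change persists across short and long windows
    if shifted[0]:
        return "anomaly"  # only the shortest window is affected
    return "normal"
```

With a reference stream alternating between 0 and 1, a single spike at the end of the recent data would be labelled an anomaly, whereas twenty consecutive shifted values would be labelled drift.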
|
2 |
Anomaly Detection in Univariate Time Series Data in the Presence of Concept Drift
Zamani Alavijeh, Soroush January 2021
Digital applications and devices record data over time to enable users and managers to monitor their activity. Errors occur in data, including time series data, for various reasons, including software system failures and human error. The problem of identifying errors in time series data, also referred to as anomaly detection, is a well-studied topic among data management and systems researchers. Such data are often recorded in dynamic environments where a change in the standard or the recording hardware can result in different and novel patterns arising in the data. Such novel patterns are caused by what is referred to as concept drift. Concept drift occurs when the statistical properties of the data, e.g. the distribution of the data, change over time. The problem of identifying anomalies in time series data recorded and stored in dynamic environments has not been extensively studied. In this study, we focus on this problem. We propose and implement a unified framework that is able to identify drifts in univariate time series data and incorporate information gained from the data to train a learning model that is able to detect anomalies in unseen univariate time series data. / Thesis / Master of Science (MSc)
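One common way to check whether the distribution of a univariate series has changed is to compare the empirical distributions of an older and a newer segment. The sketch below computes the two-sample Kolmogorov-Smirnov statistic by hand. The abstract does not say which test the framework uses, so the choice of statistic, the helper names, and the threshold of 0.5 are assumptions made only for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two non-empty samples."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of the sample that is less than or equal to x.
        return bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(
        abs(ecdf(a_sorted, x) - ecdf(b_sorted, x))
        for x in a_sorted + b_sorted
    )


def drifted(reference, recent, threshold=0.5):
    """Flag drift when the distributions differ by more than `threshold`.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return ks_statistic(reference, recent) > threshold
```

A statistic near 0 means the two segments look alike; a value near 1 means they barely overlap.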
|
3 |
Concept Drift in Surgery Prediction
Beyene, Ayne, Welemariam, Tewelle January 2012
Context: In healthcare, the decision of patient referral evolves through time because of changes in scientific developments and clinical practices. Existing decision support systems for patient referral are based on the expert systems approach. This usually requires manual updates when changes in clinical practices occur. Automatically updating the decision support system by identifying and handling so-called concept drift improves the efficiency of healthcare systems. In the state of the art, there are only specific ways of handling concept drift; developing a more generic technique that works regardless of whether changes in the internal distribution are slow, fast, sudden, gradual, local, global, cyclical, noisy or otherwise is still a challenge. Objectives: An algorithm that handles concept drift in surgery prediction is investigated. Concept drift detection techniques are evaluated to find a suitable detection technique in the context of surgery prediction. Moreover, plausible combinations of detection and handling algorithms, including the proposed algorithm, Trigger Based Ensemble (TBE)+, are evaluated on hospital data. Method: Experiments are conducted to investigate the impact of concept drift on prediction performance and to reduce that impact. The experiments compare three existing methods (AWE, Active Classifier, Learn++) and the proposed algorithm, Trigger Based Ensemble (TBE). A real-world dataset from the orthopedics department of Blekinge hospital and data from other domains are used in the experiments. Results: The negative impact of concept drift in surgery prediction is investigated. The relationship between temporal changes in data distribution and surgery prediction concept drift is identified. Furthermore, the proposed algorithm is evaluated and compared with existing handling approaches.
Conclusion: The proposed algorithm, Trigger Based Ensemble (TBE), is capable of detecting occurrences of concept drift and of adapting quickly to various changes. The Trigger Based Ensemble algorithm performed better than, or sometimes similarly to, the existing concept drift handling algorithms in the absence of noise. Moreover, the performance of Trigger Based Ensemble is consistent for small and large datasets. The contribution of the research is twofold: it improves surgery prediction performance and contributes a competitive concept drift handling algorithm to the area of computer science.
|
4 |
Concept drift learning and its application to adaptive information filtering
Widyantoro, Dwi Hendratmo 30 September 2004
Tracking the evolution of user interests is a problem instance of concept drift learning. Keeping track of multiple interest categories is a natural phenomenon as well as an interesting tracking problem because interests can emerge and diminish at different time frames. The first part of this dissertation presents a Multiple Three-Descriptor Representation (MTDR) algorithm, a novel algorithm for learning concept drift especially built for tracking the dynamics of multiple target concepts in the information filtering domain. The learning process of the algorithm combines the long-term and short-term interest (concept) models in an attempt to benefit from the strength of both models. The MTDR algorithm improves over existing concept drift learning algorithms in the domain.
Being able to track multiple target concepts with a few examples poses an even more important and challenging problem because casual users tend to be reluctant to provide the examples needed, and learning from only a few labeled examples is generally difficult. The second part presents a computational Framework for Extending Incomplete Labeled Data Stream (FEILDS). The system modularly extends the capability of an existing concept drift learner in dealing with an incompletely labeled data stream. It expands the learner's original input stream with relevant unlabeled data; the process generates a new stream with improved learnability. FEILDS employs a concept formation system for organizing its input stream into a concept (cluster) hierarchy. The system uses the concept and cluster hierarchy to identify the instance's concept and unlabeled data relevant to a concept. It also adopts the persistence assumption in temporal reasoning for inferring the relevance of concepts. Empirical evaluation indicates that FEILDS is able to improve the performance of existing learners, particularly when learning from a stream with only a few labeled examples.
Lastly, a new concept formation algorithm, one of the key components in the FEILDS architecture, is presented. The main idea is to discover intrinsic hierarchical structures regardless of the class distribution and the shape of the input stream. Experimental evaluation shows that the algorithm is relatively robust to input ordering, consistently producing a hierarchy structure of high quality.
|
5 |
Fine-Grained, Unsupervised, Context-based Change Detection and Adaptation for Evolving Categorical Data
D'Ettorre, Sarah January 2016
Concept drift detection, the identification of changes in data distributions in streams, is critical to understanding the mechanics of data generating processes and ensuring that data models remain representative through time [2]. Many change detection methods utilize statistical techniques that take numerical data as input. However, many applications produce data streams containing categorical attributes. In this context, numerical statistical methods are unavailable, and different approaches are required. Common solutions use error monitoring, assuming that fluctuations in the error measures of a learning system correspond to concept drift [4]. There has been very little research, though, on context-based concept drift detection in categorical streams. This approach observes changes in the actual data distribution and is less popular due to the challenges associated with categorical data analysis. However, context-based change detection is arguably more informative as it is data-driven, and more widely applicable in that it can function in an unsupervised setting [4]. This study addresses this gap in the research by proposing a novel context-based change detection and adaptation algorithm for categorical data, namely Fine-Grained Change Detection in Categorical Data Streams (FG-CDCStream). This unsupervised method exploits elements of ensemble learning, a technique whereby decisions are made according to the majority vote of a set of models representing different random subspaces of the data [5]. These ideas are applied to a set of concept drift detector objects and merged with concepts from a recent, state-of-the-art, context-based change detection algorithm, the so-called Change Detection in Categorical Data Streams (CDCStream) [4]. FG-CDCStream is proposed as an extension of the batch-based CDCStream, providing instance-by-instance analysis and improving its change detection capabilities, especially in data streams containing abrupt changes or a combination of abrupt and gradual changes. FG-CDCStream also enhances the adaptation strategy of CDCStream, producing more representative post-change models.
|
6 |
Semi-Supervised Hybrid Windowing Ensembles for Learning from Evolving Streams
Floyd, Sean Louis Alan 03 June 2019
In this thesis, learning refers to the intelligent computational extraction of knowledge from data. Supervised learning tasks require data to be annotated with labels, whereas for unsupervised learning, data is not labelled. Semi-supervised learning deals with data sets that are partially labelled. A major issue with supervised and semi-supervised learning of data streams is late-arriving or missing class labels. Assuming that correctly labelled data will always be available and timely is often unfeasible, and, as such, supervised methods are not directly applicable in the real world. Therefore, real-world problems usually require the use of semi-supervised or unsupervised learning techniques. For instance, when considering a spam detection task, it is not reasonable to assume that all spam will be identified (correctly labelled) prior to learning. Additionally, in semi-supervised learning, "the instances having the highest [predictive] confidence are not necessarily the most useful ones" [41]. We investigate how self-training performs without its selective heuristic in a streaming setting.
This leads us to our contributions. We extend an existing concept drift detector to operate without any labelled data, by using a sliding window of our ensemble's prediction confidence, instead of a boolean indicating whether the ensemble's predictions are correct. We also extend selective self-training, a semi-supervised learning method, by using all predictions, and not only those with high predictive confidence. Finally, we introduce a novel windowing type for ensembles, as sliding windows are very time consuming and regular tumbling windows are not a suitable replacement. Our windowing technique can be considered a hybrid of the two: we train each sub-classifier in the ensemble with tumbling windows, but delay training in such a way that only one sub-classifier can update its model per iteration.
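The hybrid windowing idea, tumbling windows per sub-classifier with staggered training so that only one member updates per iteration, can be pictured as a schedule. The sketch below is an assumption-laden illustration rather than the thesis's implementation: the function name `hybrid_schedule` and the choice of a window length equal to the number of ensemble members are hypothetical.

```python
def hybrid_schedule(num_members, num_iterations):
    """Return one (member, batch_indices) pair per iteration.

    Assumption for illustration: each member's tumbling window spans
    `num_members` batches, and the members are offset by one iteration,
    so exactly one member retrains at every step.
    """
    schedule = []
    for t in range(num_iterations):
        member = t % num_members              # the only member updated now
        start = max(0, t - num_members + 1)   # batches since its last update
        schedule.append((member, list(range(start, t + 1))))
    return schedule
```

For three members, member 0 trains on batch 0 and later on batches 1 to 3: its own windows do not overlap (tumbling), yet the ensemble as a whole receives an update at every iteration, as a sliding window would provide.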
We found, through statistical significance tests, that our framework is roughly 160 times faster than current state-of-the-art techniques and achieves comparable predictive accuracy. That being said, more research is needed to further reduce the quantity of labelled data used for training, while also increasing the framework's predictive accuracy.
|
7 |
Adaptive User Interfaces for Mobile Computing Devices
Bridle, Robert Angus, robert.bridle@gmail.com January 2008
This thesis examines the use of adaptive user interface elements on a mobile phone and presents two adaptive user interface approaches. The approaches attempt to increase the efficiency with which a user interacts with a mobile phone, while ensuring the interface remains predictable to a user.
An adaptive user interface approach is presented that predicts the menu item a user will select. When a menu is opened, the predicted menu item is highlighted instead of the top-most menu item. The aim is to maintain the layout of the menu and to save the user from performing scrolling key presses. A machine learning approach is used to accomplish the prediction task. However, learning in the mobile phone environment produces several difficulties. These are limited availability of training examples, concept drift and limited computational resources. A novel learning approach is presented that addresses these difficulties. This learning approach addresses limited training examples and limited computational resources by employing a highly restricted hypothesis space. Furthermore, the approach addresses concept drift by determining the hypothesis that has been consistent for the longest run of training examples into the past. Under certain concept drift restrictions, an analysis of this approach shows it to be superior to approaches that use a fixed window of training examples. An experimental evaluation on data collected from several users interacting with a mobile phone was used to assess this learning approach in practice. The results of this evaluation are reported in terms of the average number of key presses saved. The benefit of menu-item prediction can clearly be seen, with savings of up to three key presses on every menu interaction.
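Choosing the hypothesis that has been consistent for the longest run of training examples into the past can be sketched as follows. The representation is an assumption made for illustration: hypotheses are modelled as plain functions from a context to a predicted menu item, and the names `longest_consistent_hypothesis`, `always_inbox` and `by_time` are hypothetical, not taken from the thesis.

```python
def longest_consistent_hypothesis(hypotheses, history):
    """Pick the hypothesis with the longest unbroken run of correct
    predictions, counted backwards from the most recent example.

    hypotheses: dict mapping a name to a function context -> prediction
    history:    list of (context, selected_item) pairs, oldest first
    """
    best_name, best_run = None, -1
    for name, predict in hypotheses.items():
        run = 0
        for context, selected in reversed(history):
            if predict(context) != selected:
                break  # the run ends at the first inconsistency
            run += 1
        if run > best_run:
            best_name, best_run = name, run
    return best_name, best_run


# Usage with two hypothetical hypotheses over a tiny interaction history.
hypotheses = {
    "always_inbox": lambda ctx: "inbox",
    "by_time": lambda ctx: "alarm" if ctx["hour"] < 9 else "inbox",
}
history = [
    ({"hour": 8}, "inbox"),
    ({"hour": 7}, "alarm"),
    ({"hour": 10}, "inbox"),
    ({"hour": 8}, "alarm"),
]
result = longest_consistent_hypothesis(hypotheses, history)
```

Because only the most recent unbroken run counts, an older inconsistency does not penalise a hypothesis, which is how a rule of this kind can follow a drifting concept without a fixed window size.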
An extension of the menu-item prediction approach is presented that removes the need to manually specify a restricted hypothesis space. The approach uses a decision-tree learner to generate hypotheses online and uses the minimum description length principle to identify the occurrence of concept shifts. The identification of concept shifts is used to guide the hypothesis generation process. The approach is compared with the original menu-item prediction approach in which hypotheses are manually specified. Experimental results using the same datasets are reported.
Another adaptive user interface approach is presented that induces shortcuts on a mobile phone interface. The approach is based on identifying shortcuts in the form of macros, which can automate a sequence of actions. A means of specifying relevant action sequences is presented, together with several learning approaches for predicting which shortcut to present to a user. A small subset of the possible shortcuts on a mobile phone was considered. This subset consisted of shortcuts that automated the actions of making a phone call or sending a text message. The results of an experimental evaluation of the shortcut prediction approaches are presented. The shortcut prediction process was evaluated in terms of predictive accuracy and stability, where stability was defined as the rate at which predicted shortcuts changed over time. The importance of stability is discussed, and is used to question the advantages of using sophisticated learning approaches for achieving adaptive user interfaces on mobile phones. Finally, several methods for combining accuracy and stability measures are presented, and the learning approaches are compared with these methods.
|
8 |
[en] NEUROEVOLUTIVE LEARNING AND CONCEPT DRIFT DETECTION IN NON-STATIONARY ENVIRONMENTS / [pt] APRENDIZAGEM NEUROEVOLUTIVA E DETECÇÃO DE CONCEPT DRIFT EM AMBIENTES NÃO ESTACIONÁRIOS
TATIANA ESCOVEDO 04 July 2016
Real-world concepts are often not stable: they change with time. Just as the
concepts change, the data distribution may change as well. This problem of change
in concepts or in the distribution of data is known as concept drift and is a
challenge for a model in the task of learning from data. This work presents a new
neuroevolutive model with quantum inspiration called NEVE (Neuro-EVolutionary
Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks,
for learning in non-stationary environments. It also presents a new concept drift
detection mechanism, called DetectA (DETECT Abrupt), with the ability to detect
changes both proactively and reactively. The binary-real quantum-inspired
evolutionary algorithm AEIQ-BR is used in NEVE to automatically generate new
classifiers for the ensemble, determining the most appropriate topology for the
new network, selecting the most appropriate input variables and determining all
the weights of the MLP neural network. The AEIQ-R algorithm determines the voting
weight of each neural network in the ensemble; voting may be by linear
combination, by weighted majority or by simple majority. Four different
approaches of NEVE are implemented, which differ from one another in the way they
detect and treat the drifts that occur. The work also presents results of
experiments conducted with the DetectA method and with the NEVE model on real and
artificial datasets. The results show that the detector proved robust and
efficient for datasets with high dimensionality, intermediate-sized blocks, any
proportion of drift and any class balance, and that, in general, the best results
were obtained using some type of detection. Comparing the accuracy of NEVE with
that of other established models in the literature, NEVE had higher accuracy in
most cases. This reinforces that the neuroevolutionary ensemble approach is a
robust choice for situations in which the datasets are subject to sudden
changes in behavior.
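The three voting schemes named in the abstract (linear combination, weighted majority and simple majority) can be written out generically. This is a minimal sketch of the standard definitions, not code from NEVE: the function names are hypothetical, and the weights are passed in directly rather than being evolved by AEIQ-R.

```python
from collections import Counter


def linear_combination(outputs, weights):
    """Weighted sum of the members' numeric outputs."""
    return sum(w * o for w, o in zip(weights, outputs))


def weighted_majority(labels, weights):
    """Class label with the largest total voting weight."""
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def simple_majority(labels):
    """Class label predicted by the largest number of members."""
    return Counter(labels).most_common(1)[0][0]
```

With member labels `[1, 0, 0]` and weights `[0.6, 0.3, 0.1]`, simple majority picks class 0 while weighted majority picks class 1, which shows how the voting weights can overturn a head count.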
|
9 |
Performance Envelopes of Adaptive Ensemble Data Stream Classifiers
Joe-Yen, Stefan 01 January 2017
This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as “concept drift,” often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities of each in terms of accuracy and throughput. For each run of the experiment, the percentage of correct classifications was measured using prequential analysis, a well-established methodology in the evaluation of streaming classifiers. Throughput was measured in classifications performed per second as timed by the CPU clock. Two main experimental variables were manipulated to investigate and compare the range of accuracy and throughput exhibited by each algorithm under various conditions. The number of attributes in the instances to be classified and the speed at which the definitions of labeled data drifted were varied across six total combinations of drift speed and dimensionality. The implications of the results are used to recommend improved methods for working with stream-based data sources. The typical approach to counteract concept drift is to update the classification models with new data. In the stream paradigm, classifiers are continuously exposed to new data that may serve as representative examples of the current situation. However, updating the ensemble classifier in order to maintain or improve accuracy can be computationally costly and will negatively impact throughput. In a real-time system, this could lead to an unacceptable slow-down. The results of this research showed that, among several algorithms for reducing the effect of concept drift, adaptive decision trees maintained the highest accuracy without slowing down with respect to the no-drift condition. Adaptive ensemble techniques were also able to maintain reasonable accuracy in the presence of drift without much change in the throughput. However, the overall throughput of the adaptive methods is low and may be unacceptable for extremely time-sensitive applications. The performance visualization methodology utilized in this study gives a clear and intuitive visual summary that allows system designers to evaluate candidate algorithms with respect to their performance needs.
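Prequential analysis is usually described as "test, then train": each arriving instance is first used to test the current model and only afterwards to update it. The sketch below shows that loop with a deliberately trivial learner. `MajorityClassLearner` and `prequential_accuracy` are hypothetical names introduced here, not the classifiers or tooling used in the dissertation.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Fraction of instances predicted correctly before being learned."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test first ...
            correct += 1
        model.learn(x, y)          # ... then train on the same instance
        total += 1
    return correct / total if total else 0.0
```

Because every instance is scored before the model has seen its label, the measure needs no separate hold-out set, which suits an unbounded stream.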
|
10 |
Dynamic Committees for Handling Concept Drift in Databases (DCCD)
AlShammeri, Mohammed 07 November 2012
Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to a reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue, in a supervised learning (or classification) setting. In a classification setting, the target concept (or class) to be learned is known. One of these techniques is called “Ensemble learning”, which refers to using multiple trained classifiers in order to get better predictions by using some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between the regular ensemble classifiers and the committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly.
This thesis presents two new algorithms that address concept drifts. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario.
We further designed a novel algorithm to handle concept drift. Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drifts by using the accuracy and by weighing the committee members by adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members is evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism.
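The point-based weighting described above, where the most accurate member gains one point per phase and the weakest member becomes the candidate for replacement, can be sketched in a few lines. This is only an illustration of that bookkeeping under stated assumptions: the function name `score_committee`, the member names, and the rule of choosing the member with the fewest points as the worst are hypothetical and are not taken from the thesis.

```python
def score_committee(members, phases):
    """Award one point per phase to the most accurate member.

    members: list of member names
    phases:  list of dicts mapping member name -> accuracy in that phase
    Returns the points table and the member with the fewest points, which
    this sketch treats as the candidate for replacement.
    """
    points = {member: 0 for member in members}
    for accuracy in phases:
        best = max(members, key=lambda m: accuracy[m])
        points[best] += 1
    worst = min(members, key=lambda m: points[m])
    return points, worst


# Usage with three hypothetical heterogeneous members over two phases.
members = ["naive_bayes", "tree", "knn"]
phases = [
    {"naive_bayes": 0.70, "tree": 0.80, "knn": 0.60},
    {"naive_bayes": 0.90, "tree": 0.85, "knn": 0.50},
]
points, worst = score_committee(members, phases)
```

Ties are resolved by list order in this sketch; a real committee would need an explicit tie-breaking rule.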
Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the size of the datasets used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset.
|