1 |
A Reservoir of Adaptive Algorithms for Online Learning from Evolving Data Streams
Pesaranghader, Ali 26 September 2018
Continuous change and development are essential aspects of evolving environments and applications, including, but not limited to, smart cities, military, medicine, nuclear reactors, self-driving cars, aviation, and aerospace. The fundamental characteristics of such environments may evolve and, if no action is taken, lead to dangerous consequences, e.g., putting people's lives at stake. Learning systems therefore need intelligent algorithms that monitor changes in their environments and update themselves effectively. Further, the performance of learning algorithms may fluctuate as incoming data continuously evolves; that is, a learning approach that is efficient now may become obsolete after a change in the data or environment. Hence, the question 'how to have an efficient learning algorithm over time against evolving data?' has to be addressed. In this thesis, we make two contributions to address the challenges described above.
In the machine learning literature, the phenomenon of (distributional) change in data is known as concept drift. Concept drift may shift decision boundaries and cause a decline in accuracy. Learning algorithms therefore have to detect concept drift in evolving data streams and replace their predictive models accordingly. To address this challenge, adaptive learners have been devised that may use drift detection methods to locate the drift points in dynamic, changing data streams. A drift detection method that discovers drift points quickly, with the lowest false positive and false negative rates, is preferred. A false positive is an alarm raised when no concept drift has occurred, and a false negative is a concept drift for which no alarm is raised. In this thesis, we introduce three algorithms, called the Fast Hoeffding Drift Detection Method (FHDDM), the Stacking Fast Hoeffding Drift Detection Method (FHDDMS), and the McDiarmid Drift Detection Methods (MDDMs), for detecting drift points with minimum delay and minimum false positive and false negative rates. FHDDM is a sliding window-based algorithm that applies Hoeffding's inequality (Hoeffding, 1963) to detect concept drift. FHDDM slides its window over the prediction results, which are either 1 (for a correct prediction) or 0 (for a wrong prediction). As it does so, it compares the mean of the elements inside the window with the maximum mean observed so far; a significant difference between the two means, as judged against a bound derived from Hoeffding's inequality, indicates the occurrence of concept drift. FHDDMS extends FHDDM by sliding multiple windows over its entries to improve the detection delay and false negative rate. In contrast to FHDDM/S, the MDDM variants assign weights to their entries, with higher weights given to the most recent entries in the sliding window, for faster detection of concept drift.
The rationale is that recent examples best reflect the current situation. Thus, by placing higher weights on the latest entries, concept drift may be detected more quickly. An MDDM algorithm bounds the difference between the weighted mean of the elements in the sliding window and the maximum weighted mean seen so far, using McDiarmid's inequality (McDiarmid, 1989). It raises a drift alarm once this difference becomes significant. We show experimentally that FHDDM/S and MDDMs outperform state-of-the-art methods in terms of the adaptation and classification measures.
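The window-based test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the class name `WindowDriftDetector`, the helper `delay_after_change`, the parameters `window_size`, `delta`, and `weight_step`, their default values, and the linear weighting scheme are all hypothetical choices made for the example. The threshold is the standard form obtained from McDiarmid's inequality for a weighted mean of values in [0, 1]; with equal weights it reduces to the Hoeffding bound sqrt(ln(1/delta) / (2n)).

```python
import math
from collections import deque


class WindowDriftDetector:
    """Sliding-window drift detector over 0/1 prediction results.

    Uniform weights give a Hoeffding-style test (in the spirit of FHDDM);
    increasing weights give a McDiarmid-style test (in the spirit of MDDM).
    All defaults are illustrative assumptions.
    """

    def __init__(self, window_size=25, delta=1e-6, weight_step=0.0):
        # weight_step = 0 -> uniform weights; > 0 -> linearly increasing
        # weights, so the most recent entries count more (assumed scheme).
        raw = [1.0 + i * weight_step for i in range(window_size)]
        total = sum(raw)
        self.weights = [w / total for w in raw]  # normalised, oldest first
        # Threshold: eps = sqrt(sum(v_i^2) / 2 * ln(1 / delta)).
        self.epsilon = math.sqrt(
            sum(v * v for v in self.weights) / 2.0 * math.log(1.0 / delta)
        )
        self.window = deque(maxlen=window_size)
        self.max_mean = 0.0

    def add(self, correct):
        """Record one prediction result; return True if drift is signalled."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        mean = sum(v * x for v, x in zip(self.weights, self.window))
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean >= self.epsilon:
            self.reset()
            return True
        return False

    def reset(self):
        """Forget all statistics, e.g. after the learner is replaced."""
        self.window.clear()
        self.max_mean = 0.0


def delay_after_change(detector, n_correct=100):
    """Feed correct predictions, then errors; count errors until the alarm."""
    for _ in range(n_correct):
        detector.add(True)
    delay = 1
    while not detector.add(False):
        delay += 1
    return delay
```

Calling `delay_after_change(WindowDriftDetector())` and `delay_after_change(WindowDriftDetector(weight_step=0.01))` on this idealised stream gives a rough feel for how weighting recent entries can shorten the detection delay; real streams are noisier, and the parameters would need tuning.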
Due to the evolving nature of data streams, the performance of an adaptive learner, which is defined by the classification, adaptation, and resource consumption measures, may fluctuate over time. In fact, a learning algorithm, in the form of a (classifier, detector) pair, may perform well before a concept drift point, but not after. We frame this problem with the question 'how can we ensure that an efficient classifier-detector pair is present at any time in an evolving environment?' To answer it, we have developed the Tornado framework, which runs various kinds of learning algorithms simultaneously against evolving data streams. Each algorithm incrementally and independently trains a predictive model and updates the statistics of its drift detector. Meanwhile, the framework monitors the (classifier, detector) pairs and recommends to the user the most efficient one with respect to classification, adaptation, and resource consumption performance. We further define the holistic CAR measure, which integrates the classification, adaptation, and resource consumption measures for evaluating the performance of adaptive learning algorithms. Our experiments confirm that the most efficient algorithm may differ over time because of the evolving nature of data streams.
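The idea of recommending one (classifier, detector) pair from several candidates can be illustrated with a simple scoring sketch. The abstract does not give the CAR formula, so the code below is explicitly not it: it assumes a plain weighted average of min-max normalised costs, and the function name `recommend`, the measure names, the pair labels, and all numbers are hypothetical.

```python
def recommend(measurements, weights=None):
    """Rank (classifier, detector) pairs from one snapshot of measurements.

    measurements maps a pair to {measure_name: value}, where lower values
    are better for every measure. Returns (best_pair, scores), with scores
    in [0, 1] and higher meaning better. This is an assumed scoring rule,
    not the CAR measure defined in the thesis.
    """
    measures = sorted({m for vals in measurements.values() for m in vals})
    weights = weights or {m: 1.0 for m in measures}
    total_weight = sum(weights[m] for m in measures)
    scores = {pair: 0.0 for pair in measurements}
    for m in measures:
        column = [vals[m] for vals in measurements.values()]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # avoid division by zero when all are equal
        for pair, vals in measurements.items():
            cost = (vals[m] - lo) / span  # 0 = best observed, 1 = worst
            scores[pair] += weights[m] / total_weight * (1.0 - cost)
    best = max(scores, key=scores.get)
    return best, scores


# Made-up snapshot for three hypothetical pairs at one point in the stream.
snapshot = {
    ("clf_a", "det_x"): {"error_rate": 0.10, "detection_delay": 40,
                         "memory_kb": 120, "runtime_ms": 15},
    ("clf_a", "det_y"): {"error_rate": 0.12, "detection_delay": 90,
                         "memory_kb": 150, "runtime_ms": 20},
    ("clf_b", "det_x"): {"error_rate": 0.20, "detection_delay": 60,
                         "memory_kb": 300, "runtime_ms": 50},
}
best_pair, pair_scores = recommend(snapshot)
```

In a streaming setting, such a function would be re-evaluated as the measurements are updated, so the recommended pair may change after a drift point.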
|
2 |
Avaliação criteriosa dos algoritmos de detecção de concept drifts [A careful evaluation of concept drift detection algorithms]
SANTOS, Silas Garrido Teixeira de Carvalho 27 February 2015
Submitted by Fabio Sobreira Campos da Costa (fabio.sobreira@ufpe.br) on 2016-07-11T12:33:28Z
Made available in DSpace on 2016-07-11T12:33:28Z (GMT).
Previous issue date: 2015-02-27 / FACEPE
Knowledge extraction from data streams is an activity in steadily growing demand.
Examples of such applications include monitoring the purchase history
of customers, movement data from sensors, or water temperatures. Thus, algorithms used
for this purpose must be constantly updated, trying to adapt to new instances and taking
into account computational constraints. When working in environments with a continuous
flow of data, there is no guarantee that the distribution of the data will remain stationary.
On the contrary, several changes may occur over time, triggering situations commonly
known as concept drift. In this work we present a comparative study of some of the main
drift detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. For
the experiments, we used artificial datasets (simulating abrupt, fast
gradual, and slow gradual changes) as well as real-world datasets. The results
were analyzed based on accuracy, runtime, memory usage, average time to change
detection, and number of false positives and negatives. The parameters of the methods were
set using an adapted version of a genetic algorithm. According to the Friedman test
followed by the Nemenyi post-hoc test, in terms of accuracy, DDM was the best-performing
method on the datasets used, and statistically superior to DOF and ECDD. EDDM was the fastest
method and also the most economical in memory usage, being statistically superior to
DOF, ECDD, PL and STEPD in both cases. We conclude that more sensitive change
detection methods, which are therefore more prone to false alarms, achieve better results
than less sensitive methods that are less susceptible to false alarms.
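The statistical comparison mentioned above (Friedman test followed by the Nemenyi post-hoc test) can be sketched as follows. This is a simplified illustration, not the study's code: the function names are hypothetical, the Friedman statistic is computed without tie correction, and the critical value `q_alpha` must be supplied by the caller from a table of the Studentized range statistic (2.343 is the commonly tabulated value for three methods at a significance level of 0.05).

```python
import math


def average_ranks(scores):
    """scores[i][j] is the accuracy of method j on dataset i (higher = better).

    Returns the average rank of each method, with rank 1 being the best.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            # Tied values share the mean of the ranks they span.
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2.0 + 1.0
            for t in range(i, j + 1):
                totals[order[t]] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(ranks, n_datasets):
    """Friedman chi-square statistic from average ranks (no tie correction)."""
    k = len(ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4.0
    )


def nemenyi_critical_difference(k, n_datasets, q_alpha):
    """Two methods differ significantly if their average ranks differ by
    at least this amount."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


# Made-up accuracies for three unnamed methods on four datasets.
toy_scores = [
    [0.90, 0.85, 0.80],
    [0.88, 0.84, 0.79],
    [0.92, 0.90, 0.70],
    [0.75, 0.74, 0.73],
]
ranks = average_ranks(toy_scores)
statistic = friedman_statistic(ranks, len(toy_scores))
cd = nemenyi_critical_difference(3, len(toy_scores), q_alpha=2.343)
```

The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom; only if that test rejects the hypothesis of equal performance are pairwise rank differences compared against the critical difference.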
|