11 |
An investigative study of regression algorithms for data streams / Um estudo investigativo de algoritmos de regressão para data streams
Nunes, André Luís, 28 March 2017
The explosion in data volume and the speed at which it grows make knowledge discovery and data analysis challenging, even more so for non-stationary data. Although predicting future values plays a fundamental role in areas such as climate, routing problems, and economics, classification still appears to be the most explored task. Several regression algorithms for data streams have been proposed recently, for example FIMT-DD, AMRules, IBLStreams, and SFNRegressor. However, the studies that introduced them focused more on novelty and on prediction error than on criteria considered fundamental for data streams, such as execution time and memory. The objective of this work is therefore to present an investigative study of these regression algorithms in dynamic environments, using massive datasets, and to explore how well the algorithms adapt in the presence of concept drift. To this end, three datasets were analyzed and extended to exercise the main evaluation criteria adopted. Extensive experiments produced a comparison of the chosen algorithms and gave indications of how each one behaves in the different scenarios to which it was exposed.
Thus, the main contributions of this work are: an evaluation of fundamental criteria (memory, execution time, and generalization power) for regression on data streams; a critical analysis of the algorithms investigated; and the possibility of reproducing and extending the studies, since the parameter settings used are made available.
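The three criteria named in this abstract (memory, execution time, and predictive error) can be measured together in a single pass over a stream. The sketch below uses prequential (test-then-train) evaluation, a common protocol for data streams; the abstract does not say which protocol the thesis used, so this is an assumption. `RunningMeanRegressor` is a hypothetical stand-in learner, not FIMT-DD, AMRules, IBLStreams, or SFNRegressor.

```python
import time
import tracemalloc


class RunningMeanRegressor:
    """Hypothetical stand-in learner: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean

    def learn(self, x, y):
        # Incremental mean update: constant memory, one instance at a time.
        self.n += 1
        self.mean += (y - self.mean) / self.n


def prequential_evaluate(model, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    tracemalloc.start()
    start = time.perf_counter()
    abs_error = 0.0
    count = 0
    for x, y in stream:
        abs_error += abs(y - model.predict(x))
        model.learn(x, y)
        count += 1
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "mae": abs_error / count if count else 0.0,  # predictive error
        "seconds": elapsed,                          # execution time
        "peak_bytes": peak,                          # peak traced memory
    }


# Toy stream with a constant target; only the first prediction is wrong.
stream = [((i,), 2.0) for i in range(100)]
report = prequential_evaluate(RunningMeanRegressor(), stream)
```

Because every instance is used for testing before training, no separate hold-out set is needed, which suits unbounded streams.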
|
12 |
An incremental Gaussian mixture network for data stream classification in non-stationary environments / Uma rede de mistura de gaussianas incrementais para classificação de fluxos contínuos de dados em cenários não estacionários
Diaz, Jorge Cristhian Chamby, January 2018
Data stream classification poses many challenges for the data mining community when the environment is non-stationary. One of the greatest challenges in learning classifiers from data streams is adapting to concept drift, which occurs as the underlying concepts change over time. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble methods play an important role because of their modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and more robust to noise than ensemble algorithms, but place more restrictions on concept-drifting data streams. It is therefore a challenge to combine the flexibility and adaptability of an ensemble classifier in the presence of concept drift with the simplicity of use of a single classifier with incremental learning. With this motivation, in this dissertation we propose an incremental, online, and probabilistic algorithm for classification under concept drift. The algorithm is called IGMN-NSE and is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE relative to IGMN are improved predictive power for classification tasks and adaptation to achieve good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track changing environments very closely, regardless of the type of concept drift.
|
13 |
Dynamic Committees for Handling Concept Drift in Databases (DCCD)
AlShammeri, Mohammed, 07 November 2012
Concept drift refers to a problem that is caused by a change in the data distribution in data mining. This leads to reduction in the accuracy of the current model that is used to examine the underlying data distribution of the concept to be discovered. A number of techniques have been introduced to address this issue, in a supervised learning (or classification) setting. In a classification setting, the target concept (or class) to be learned is known. One of these techniques is called “Ensemble learning”, which refers to using multiple trained classifiers in order to get better predictions by using some voting scheme. In a traditional ensemble, the underlying base classifiers are all of the same type. Recent research extends the idea of ensemble learning to the idea of using committees, where a committee consists of diverse classifiers. This is the main difference between the regular ensemble classifiers and the committee learning algorithms. Committees are able to use diverse learning methods simultaneously and dynamically take advantage of the most accurate classifiers as the data change. In addition, some committees are able to replace their members when they perform poorly.
This thesis presents two new algorithms that address concept drifts. The first algorithm has been designed to systematically introduce gradual and sudden concept drift scenarios into datasets. In order to save time and avoid memory consumption, the Concept Drift Introducer (CDI) algorithm divides the number of drift scenarios into phases. The main advantage of using phases is that it allows us to produce a highly scalable concept drift detector that evaluates each phase, instead of evaluating each individual drift scenario.
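To make the difference between sudden and gradual drift concrete, the following is a minimal sketch of injecting a concept change into a binary-labeled dataset. It is not the CDI algorithm from the thesis, whose details the abstract does not give; the function name `inject_drift` and its parameters are hypothetical.

```python
import random


def inject_drift(labels, start, width=0, seed=0):
    """Return a copy of binary labels with a concept change from `start` on.

    width == 0 gives a sudden drift: every label from `start` is flipped.
    width > 0 gives a gradual drift: the chance that an item follows the
    new (flipped) concept grows linearly over the next `width` items.
    """
    rng = random.Random(seed)
    drifted = []
    for i, label in enumerate(labels):
        if i < start:
            p_new = 0.0                      # old concept only
        elif width == 0 or i >= start + width:
            p_new = 1.0                      # new concept only
        else:
            p_new = (i - start) / width      # transition window
        drifted.append(1 - label if rng.random() < p_new else label)
    return drifted


sudden = inject_drift([0] * 10, start=5)
gradual = inject_drift([0] * 100, start=20, width=50)
```

In the gradual case, items 20 to 69 mix old and new labels, which is what makes this kind of drift harder to detect than an abrupt switch.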
We further designed a novel algorithm to handle concept drift. Our Dynamic Committee for Concept Drift (DCCD) algorithm uses a voted committee of hypotheses that vote on the best base classifier, based on its predictive accuracy. The novelty of DCCD lies in the fact that we employ diverse heterogeneous classifiers in one committee in an attempt to maximize diversity. DCCD detects concept drifts by using the accuracy and by weighting the committee members, adding one point to the most accurate member. The total loss in accuracy for each member is calculated at the end of each point of measurement, or phase. The performance of the committee members is evaluated to decide whether a member needs to be replaced or not. Moreover, DCCD detects the worst member in the committee and then eliminates this member by using a weighting mechanism.
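The point-based weighting described above can be sketched as follows. This is only an illustration of the idea of awarding one point per phase to the most accurate member and flagging the lowest-scoring member; it is not the DCCD implementation. The names `score_phase` and `worst_member` and the three toy classifiers are hypothetical, and each phase is assumed to be non-empty.

```python
def score_phase(points, members, phase):
    """Add one point to the member with the highest accuracy on this phase."""
    accuracy = {}
    for name, predict in members.items():
        correct = sum(1 for x, y in phase if predict(x) == y)
        accuracy[name] = correct / len(phase)
    best = max(accuracy, key=accuracy.get)
    points[best] = points.get(best, 0) + 1
    return accuracy


def worst_member(points, members):
    """Name of the member with the fewest points (a candidate for replacement)."""
    return min(members, key=lambda name: points.get(name, 0))


# A deliberately heterogeneous toy committee.
members = {
    "always_zero": lambda x: 0,
    "threshold": lambda x: int(x > 0.5),
    "always_one": lambda x: 1,
}
phases = [
    [(0.1, 0), (0.2, 0), (0.9, 1)],
    [(0.1, 0), (0.7, 1), (0.8, 1)],
]
points = {}
for phase in phases:
    score_phase(points, members, phase)

candidate = worst_member(points, members)
```

Here `threshold` wins both phases, so either constant classifier would be the replacement candidate; a fuller version would also track per-member loss in accuracy across phases, as the abstract describes.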
Our experimental evaluation centers on evaluating the performance of DCCD on various datasets of different sizes, with different levels of gradual and sudden concept drift. We further compare our algorithm to another state-of-the-art algorithm, namely the MultiScheme approach. The experiments indicate the effectiveness of our DCCD method under a number of diverse circumstances. The DCCD algorithm generally generates high performance results, especially when the number of concept drifts is large in a dataset. For the size of the datasets used, our results showed that DCCD produced a steady improvement in performance when applied to small datasets. Further, in large and medium datasets, our DCCD method has a comparable, and often slightly higher, performance than the MultiScheme technique. The experimental results also show that the DCCD algorithm limits the loss in accuracy over time, regardless of the size of the dataset.
|
14 |
An approach for online learning in the presence of concept changes
Jaber, Ghazal, 18 October 2013
Learning from data streams is emerging as an important application area. When the environment changes, it is necessary to rely on on-line learning with the capability to adapt to changing conditions, also known as concept drifts. Adapting to concept drifts entails forgetting some or all of the previously acquired knowledge when the concept changes, while accumulating knowledge about the supposedly stationary underlying concept. This tradeoff is called the stability-plasticity dilemma. Ensemble methods have been among the most successful approaches. However, the management of the ensemble, which ultimately controls how past data is forgotten, has not been thoroughly investigated so far. Our work shows the importance of the forgetting strategy by comparing several approaches. The results lead us to propose a new ensemble method with an enhanced forgetting strategy to adapt to concept drifts. Experimental comparisons show that our method compares favorably with well-known state-of-the-art systems. The majority of previous works focused only on means to detect changes and to adapt to them. In our work, we go one step further by introducing a meta-learning mechanism that is able to detect relevant states of the environment, to recognize recurring contexts, and to anticipate likely concept changes. Hence, the method we suggest deals both with the stability-plasticity dilemma and with the anticipation and recognition of incoming concepts. This is accomplished through an ensemble method that controls an ensemble of incremental learners. The management of the ensemble of learners enables one to adapt naturally to the dynamics of the concept changes with very few parameters to set, while a learning mechanism managing the changes in the ensemble provides means for the anticipation of, and the quick adaptation to, the underlying modification of the context.
|
15 |
On the Feasibility of Integrating Data Mining Algorithms into Self Adaptive Systems for Context Awareness and Requirements Evolution
Rook, Angela, 20 August 2014
Context is important to today's mobile and ubiquitous systems, as operational requirements are only valid under certain context conditions. Detecting context and adapting automatically to that context is a key feature of many of these systems. However, when the operational context associated with a particular requirement changes drastically in a way that designers could not have anticipated, many systems are unable to effectively adapt their operating parameters to continue meeting user needs. Automatically detecting and implementing this system context evolution is highly desirable because it allows for increased uncertainty to be built into the system at design time in order to cope efficiently and effectively with these kinds of drastic changes. This thesis is an empirical investigation and discussion towards integrating data mining algorithms into self-adaptive systems to analyze and define new context relevant to specific system requirements when current system context parameters are no longer sufficient. / Graduate
|
16 |
Approaching Concept Drift by Context Feature Partitioning
Hoffmann, Nico, Kirmse, Matthias, Petersohn, Uwe, 20 February 2012
In this paper we present a new approach to handling concept drift using domain-specific knowledge. More precisely, we capitalize on known context features to partition a domain into subdomains featuring static class distributions. Subsequently, we learn a separate classifier for each subdomain and classify new instances accordingly. To determine the optimal partitioning for a domain, we apply a search algorithm that aims to maximize the resulting accuracy. In practical domains such as fault detection, concept drift often occurs in combination with imbalanced data. As this issue becomes more important when learning models on smaller subdomains, we additionally use sampling methods to handle it. Comparative experiments with artificial data sets showed that our approach outperforms a plain SVM on several performance measures. In summary, the partitioning concept drift approach (PCD) is a possible way to handle concept drift in domains where the causing context features are at least partly known.
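The core idea, one model per context-defined subdomain, can be sketched as below. This is an illustration under stated assumptions rather than the PCD method: the partitioning is fixed by hand instead of being found by search, and a majority-class model stands in for the SVM the paper compares against. The names `fit_partitioned`, `predict_partitioned`, and the `machine` context feature are hypothetical.

```python
from collections import Counter, defaultdict


def fit_partitioned(instances, context_of):
    """Fit one majority-class model per subdomain given by `context_of(x)`."""
    counts = defaultdict(Counter)
    for x, y in instances:
        counts[context_of(x)][y] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def predict_partitioned(models, context_of, x, default=None):
    """Route the instance to the model of its subdomain."""
    return models.get(context_of(x), default)


data = [
    ({"machine": "A", "temp": 70}, "ok"),
    ({"machine": "A", "temp": 75}, "ok"),
    ({"machine": "B", "temp": 70}, "fault"),
    ({"machine": "B", "temp": 72}, "fault"),
    ({"machine": "B", "temp": 71}, "ok"),
]


def by_machine(x):
    return x["machine"]


models = fit_partitioned(data, by_machine)
prediction = predict_partitioned(models, by_machine, {"machine": "B", "temp": 69})
```

A single global model would have to average over both machines, whereas each subdomain model sees a class distribution that stays fixed within its context.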
|
19 |
Adaptive Machine Learning for Credit Card Fraud Detection
Dal Pozzolo, Andrea, 04 December 2015
Billions of dollars are lost every year to fraudulent credit card transactions. The design of efficient fraud detection algorithms is key to reducing these losses, and more and more algorithms rely on advanced machine learning techniques to assist fraud investigators. The design of fraud detection algorithms is, however, particularly challenging because of the non-stationary distribution of the data, the highly unbalanced class distributions, and the small number of transactions labeled by fraud investigators. At the same time, public data are scarce for confidentiality reasons, leaving many questions about the best strategy unanswered. In this thesis we aim to provide some answers by focusing on crucial issues such as: i) why and how undersampling is useful in the presence of class imbalance (i.e. frauds are a small percentage of the transactions), ii) how to deal with unbalanced and evolving data streams (non-stationarity due to fraud evolution and change of spending behavior), iii) how to assess performance in a way that is relevant for detection, and iv) how to use the feedback provided by investigators on the fraud alerts generated. Finally, we design and assess a prototype of a Fraud Detection System that meets real-world working conditions and integrates investigators' feedback to generate accurate alerts. / Doctorate in Sciences
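For readers unfamiliar with the undersampling mentioned in point i), the following is a minimal sketch of random undersampling of the majority class. It is a generic illustration, not the procedure studied in the thesis; the function name `undersample`, its `ratio` parameter, and the toy transaction data are hypothetical.

```python
import random


def undersample(instances, majority_label, ratio=1.0, seed=0):
    """Keep all minority instances and a random subset of the majority class.

    `ratio` is the desired majority/minority count after sampling
    (1.0 gives a balanced set).
    """
    rng = random.Random(seed)
    majority = [inst for inst in instances if inst[1] == majority_label]
    minority = [inst for inst in instances if inst[1] != majority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)
    return balanced


# 98 genuine transactions and 2 frauds, identified here only by an index.
transactions = [(i, "genuine") for i in range(98)] + [(98, "fraud"), (99, "fraud")]
balanced = undersample(transactions, majority_label="genuine")
fraud_count = sum(1 for _, label in balanced if label == "fraud")
```

A classifier trained on the balanced sample sees a different class prior than the original stream, so its predicted probabilities should not be read as fraud rates on the unsampled data without further adjustment.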
|