Meta-aprendizado aplicado a fluxos contínuos de dados / Metalearning for algorithm selection in data streams

Rossi, Andre Luís Debiaso 19 December 2013
Algoritmos de aprendizado de máquina são amplamente empregados na indução de modelos para descoberta de conhecimento em conjuntos de dados. Como grande parte desses algoritmos assume que os dados são gerados por uma função de distribuição estacionária, um modelo é induzido uma única vez e usado indefinidamente para a predição do rótulo de novos dados. Entretanto, atualmente, diversas aplicações, como gerenciamento de transportes e monitoramento por redes de sensores, geram fluxos contínuos de dados que podem mudar ao longo do tempo. Consequentemente, a eficácia do algoritmo escolhido para esses problemas pode se deteriorar ou outros algoritmos podem se tornar mais apropriados para as características dos novos dados. Nesta tese é proposto um método baseado em meta-aprendizado para gerenciar o processo de aprendizado em ambientes dinâmicos de fluxos contínuos de dados com o objetivo de melhorar o desempenho preditivo do sistema de aprendizado. Esse método, denominado MetaStream, seleciona regularmente o algoritmo mais promissor para os dados que estão chegando, de acordo com as características desses dados e de experiências passadas. O método proposto emprega técnicas de aprendizado de máquina para gerar o meta-conhecimento, que relaciona as características extraídas dos dados em diferentes instantes do tempo ao desempenho preditivo dos algoritmos. Entre as medidas usadas para extrair informação relevante dos dados, estão aquelas comumente empregadas em meta-aprendizado convencional com diferentes conjuntos de dados, que são adaptadas para as especificidades do cenário de fluxos, e de áreas correlatas, que consideram, por exemplo, a ordem de chegada dos dados. O MetaStream é avaliado para três conjuntos de dados reais e seis algoritmos de aprendizado diferentes. Os resultados mostram a aplicabilidade do MetaStream e sua capacidade de melhorar o desempenho preditivo geral do sistema de aprendizado em relação a um método de referência para a maioria dos problemas investigados. Deve ser observado que uma combinação de modelos mostrou-se superior ao MetaStream para dois conjuntos de dados. Assim, foram analisados os principais fatores que podem ter influenciado nos resultados observados e são indicadas possíveis melhorias do método proposto. / Machine learning algorithms are widely employed to induce models for knowledge discovery in databases. Since most of these algorithms assume that the underlying distribution of the data is stationary, a model is induced only once and is applied indefinitely to predict the label of new data. However, many current applications, such as transportation management systems and sensor network monitoring, generate data streams that can change over time. Consequently, the effectiveness of the algorithm chosen for these problems may deteriorate, or other algorithms may become more suitable for the new data characteristics. This thesis proposes a metalearning-based method for managing the learning process in dynamic data stream environments, aiming to improve the overall predictive performance of the learning system. This method, named MetaStream, regularly selects the most promising algorithm for the arriving data, according to its characteristics and past experience. The proposed method employs machine learning techniques to generate metaknowledge, which relates the characteristics extracted from the data at different points in time to the predictive performance of the algorithms. Among the measures used to extract relevant information are those commonly employed in conventional metalearning across different data sets, adapted to the particularities of the stream setting, and measures from related areas that consider, for example, the arrival order of the data. MetaStream is evaluated on three real data stream problems and six different learning algorithms. The results show the applicability of MetaStream and its ability to improve the overall predictive performance of the learning system compared to a baseline method for the majority of the problems investigated. It must be noted that an ensemble of models proved superior to MetaStream for two of the data sets. We therefore analyze the main factors that may have influenced the observed results and indicate possible improvements to the proposed method.
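
The selection loop described above lends itself to a compact illustration. Below is a minimal sketch, assuming a windowed stream and scikit-learn-style base learners; the meta-feature set, the window handling and the choice of a decision tree as meta-learner are illustrative assumptions, not the configuration used in the thesis.

    # Illustrative MetaStream-style selection loop (a sketch, not the thesis implementation).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier   # meta-learner: an illustrative choice

    def meta_features(X, y):
        # Cheap descriptive statistics of the current training window (illustrative set).
        return [X.mean(), X.std(), y.mean(), y.std()]

    def metastream_like(windows, base_learners, warmup=10):
        """windows: iterable of (X_train, y_train, X_test, y_test) arrays, one per period."""
        meta_learner = DecisionTreeClassifier()
        meta_X, meta_y, picks = [], [], []
        for t, (Xtr, ytr, Xte, yte) in enumerate(windows):
            mf = meta_features(Xtr, ytr)
            if t >= warmup:
                meta_learner.fit(meta_X, meta_y)                   # learn "window -> best algorithm"
                picks.append(int(meta_learner.predict([mf])[0]))   # algorithm chosen for this window
            else:
                picks.append(0)                                    # default learner during warm-up
            # Once this window's labels are known, record which algorithm actually did best,
            # producing one new meta-example for future selections.
            scores = [(l.fit(Xtr, ytr).predict(Xte) == yte).mean() for l in base_learners]
            meta_X.append(mf)
            meta_y.append(int(np.argmax(scores)))
        return picks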

Improving Hoeffding Trees

Kirkby, Richard Brendon January 2008
Modern information technology allows information to be collected at a far greater rate than ever before. So fast, in fact, that the main problem is making sense of it all. Machine learning offers promise of a solution, but the field mainly focusses on achieving high accuracy when data supply is limited. While this has created sophisticated classification algorithms, many do not cope with increasing data set sizes. When the data set sizes get to a point where they could be considered to represent a continuous supply, or data stream, then incremental classification algorithms are required. In this setting, the effectiveness of an algorithm cannot simply be assessed by accuracy alone. Consideration needs to be given to the memory available to the algorithm and the speed at which data is processed in terms of both the time taken to predict the class of a new data sample and the time taken to include this sample in an incrementally updated classification model. The Hoeffding tree algorithm is a state-of-the-art method for inducing decision trees from data streams. The aim of this thesis is to improve this algorithm. To measure improvement, a comprehensive framework for evaluating the performance of data stream algorithms is developed. Within the framework memory size is fixed in order to simulate realistic application scenarios. In order to simulate continuous operation, classes of synthetic data are generated providing an evaluation on a large scale. Improvements to many aspects of the Hoeffding tree algorithm are demonstrated. First, a number of methods for handling continuous numeric features are compared. Second, tree prediction strategy is investigated to evaluate the utility of various methods. Finally, the possibility of improving accuracy using ensemble methods is explored. The experimental results provide meaningful comparisons of accuracy and processing speeds between different modifications of the Hoeffding tree algorithm under various memory limits. The study on numeric attributes demonstrates that sacrificing accuracy for space at the local level often results in improved global accuracy. The prediction strategy shown to perform best adaptively chooses between standard majority class and Naive Bayes prediction in the leaves. The ensemble method investigation shows that combining trees can be worthwhile, but only when sufficient memory is available, and improvement is less likely than in traditional machine learning. In particular, issues are encountered when applying the popular boosting method to streams.
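
The heart of the Hoeffding tree algorithm is the statistical bound used to decide, from the examples seen so far at a leaf, when the apparently best split attribute can safely be chosen. A minimal sketch of that decision follows; the information-gain computation is assumed to be supplied elsewhere, and the parameter values are illustrative defaults rather than those used in the thesis experiments.

    import math

    def hoeffding_bound(value_range, delta, n):
        # With probability 1 - delta, the true mean of a variable with the given range
        # lies within epsilon of the mean observed over n independent samples.
        return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

    def should_split(gains, n, num_classes=2, delta=1e-7, tie_threshold=0.05):
        """gains: observed information gain of each candidate attribute at a leaf (>= 2 entries).
        n: number of examples seen at the leaf. Returns an attribute index, or None to wait."""
        ranked = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
        best, second = ranked[0], ranked[1]
        eps = hoeffding_bound(math.log2(num_classes), delta, n)    # gain ranges over [0, log2(c)]
        if gains[best] - gains[second] > eps or eps < tie_threshold:
            return best   # the leading attribute is probably truly best, or the race is a near-tie
        return None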

CBPsp: complex business processes for stream processing

Kamaleswaran, Rishikesan 01 April 2011
This thesis presents the framework of a complex-business-process-driven event stream processing system that produces meaningful output with direct implications for the business objectives of an organization. The framework is demonstrated using a case study instantiating the management of a newborn infant with hypoglycaemia. Business processes defined within guidelines are specified at build time, while critical knowledge found in their definitions is used to support their enactment for stream analysis. Four major research contributions are delivered. The first contribution enables the definition and enactment of complex business processes in real time. The second supports the extraction of business processes using knowledge found within their initial expression. The third allows for the explicit use of temporal abstraction and stream analysis knowledge to support enactment in real time. Finally, the last contribution is the real-time integration of heterogeneous streams based on Service-Oriented Architecture principles. / UOIT
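
The case study hinges on temporal abstraction: turning raw readings into symbolic states that an enacted process rule can react to. A minimal sketch of that idea is given below, assuming glucose readings as the input stream; the threshold and window length are illustrative placeholders, not values taken from the guideline or the thesis.

    from collections import deque

    LOW_GLUCOSE = 2.6   # mmol/L -- illustrative placeholder, not a clinical recommendation

    def abstract_glucose(readings, window=3, threshold=LOW_GLUCOSE):
        """Map a stream of (timestamp, glucose) readings to symbolic states.
        Emits (timestamp, 'LOW') once the last `window` readings all fall below the threshold,
        otherwise (timestamp, 'NORMAL'); a process rule can subscribe to the 'LOW' events."""
        recent = deque(maxlen=window)
        for ts, value in readings:
            recent.append(value)
            if len(recent) == window and all(v < threshold for v in recent):
                yield ts, "LOW"
            else:
                yield ts, "NORMAL"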

Geometric Approximation Algorithms in the Online and Data Stream Models

Zarrabi-Zadeh, Hamid January 2008
The online and data stream models of computation have recently attracted considerable research attention due to many real-world applications in various areas such as data mining, machine learning, distributed computing, and robotics. In both these models, input items arrive one at a time, and the algorithms must decide based on the partial data received so far, without any secure information about the data that will arrive in the future. In this thesis, we investigate efficient algorithms for a number of fundamental geometric optimization problems in the online and data stream models. The problems studied in this thesis can be divided into two major categories: geometric clustering and computing various extent measures of a set of points. In the online setting, we show that the basic unit clustering problem admits non-trivial algorithms even in the simplest one-dimensional case: we show that the naive upper bounds on the competitive ratio of algorithms for this problem can be beaten using randomization. In the data stream model, we propose a new streaming algorithm for maintaining "core-sets" of a set of points in fixed dimensions, and also, introduce a new simple framework for transforming a class of offline algorithms to their equivalents in the data stream model. These results together lead to improved streaming approximation algorithms for a wide variety of geometric optimization problems in fixed dimensions, including diameter, width, k-center, smallest enclosing ball, minimum-volume bounding box, minimum enclosing cylinder, minimum-width enclosing spherical shell/annulus, etc. In high-dimensional data streams, where the dimension is not a constant, we propose a simple streaming algorithm for the minimum enclosing ball (the 1-center) problem with an improved approximation factor.
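
For the high-dimensional minimum enclosing ball mentioned last, the streaming model can be illustrated with a single-pass update that keeps only a center and a radius and grows the ball minimally whenever a point falls outside. The sketch below shows one simple rule of that kind; it illustrates the setting, not necessarily the exact algorithm or approximation guarantee analysed in the thesis.

    import numpy as np

    def streaming_enclosing_ball(points):
        """Single-pass approximate minimum enclosing ball (O(d) memory)."""
        it = iter(points)
        center = np.asarray(next(it), dtype=float)   # first point seeds a zero-radius ball
        radius = 0.0
        for p in it:
            p = np.asarray(p, dtype=float)
            dist = np.linalg.norm(p - center)
            if dist > radius:
                # Grow the ball just enough to cover p, shifting the center toward p.
                radius = (radius + dist) / 2.0
                center += (dist - radius) / dist * (p - center)
        return center, radius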

Τεχνικές εξόρυξης χώρο-χρονικών δεδομένων και εφαρμογές τους στην ανάλυση ηλεκτροεγκεφαλογραφήματος

Κορβέσης, Παναγιώτης 16 May 2014
Η εξόρυξη χώρο-χρονικών δεδομένων αποτελεί πλέον μία από τις σημαντικότερες κατευθύνσεις του κλάδου της εξόρυξης γνώσης. Κάποια από τα βασικά προβλήματα που καλείται να αντιμετωπίσει είναι η ανακάλυψη περιοχών που εμφανίζουν ομοιότητες στην χρονική τους εξέλιξη, η αναγνώριση προτύπων που εμφανίζονται τόσο στην χωρική όσο και στη χρονική πληροφορία, η πρόβλεψη μελλοντικών τιμών και η αποθήκευση σε εξειδικευμένες βάσεις δεδομένων με σκοπό την αποδοτική απάντηση χωροχρονικών ερωτημάτων. Οι μέθοδοι που προσεγγίζουν τα παραπάνω προβλήματα καθώς και οι βασικές εργασίες της εξόρυξης γνώσης, όπως η κατηγοριοποίηση και η ομαδοποίηση, εμφανίζονται στον πυρήνα της πλειονότητας των εργαλείων ανάλυσης και επεξεργασίας χώρο-χρονικών δεδομένων. Βασικός στόχος της παρούσας εργασίας είναι η εφαρμογή μεθόδων εξόρυξης χώρο-χρονικών δεδομένων στο Ηλεκτροεγκεφαλογράφημα (ΗΕΓ), το οποίο αποτελεί μία από τις πιο διαδεδομένες τεχνικές ανάλυσης της εγκεφαλικής λειτουργίας. Τα δεδομένα που προκύπτουν από το ΗΕΓ περιέχουν τόσο χωρική όσο και χρονική πληροφορία καθώς αποτελούνται από ηλεκτρικά σήματα που προέρχονται από ηλεκτρόδια τοποθετημένα σε συγκεκριμένες θέσεις στο κρανίο. Τα βασικά προβλήματα που μελετήθηκαν στην επεξεργασία του ΗΕΓ είναι η μοντελοποίηση και η συσταδοποίηση χώρο-χρονικών δεδομένων, τα οποία οδήγησαν στην ανάπτυξη των αντίστοιχων μεθόδων. Στα πλαίσια της παρούσας εργασίας μελετήθηκε επίσης το πρόβλημα της διαχείρισης των δεδομένων ΗΕΓ και της ανάλυσης ροών δεδομένων σε πραγματικό χρόνο. Η ενασχόληση με τα συγκεκριμένα προβλήματα οδήγησε α) στη δημιουργία καινοτόμων μεθόδων μοντελοποίησης και συσταδοποίησης χωρο-χρονικών δεδομένων, β) στον σχεδιασμό μιας βάσης δεδομένων, γ) στην μελέτη της βιβλιογραφίας στο θέμα της εξόρυξης και της διαχείρισης ροών δεδομένων και δ) στην δημιουργία μιας εφαρμογής για την ανάλυση δεδομένων σε πραγματικό χρόνο πάνω σε ένα σύστημα διαχείρισης ροών δεδομένων. Η παρούσα εργασία περιλαμβάνει ένα σύνολο μεθόδων και εργαλείων ανάλυσης και διαχείρισης δεδομένων που εξετάστηκαν και χρησιμοποιήθηκαν προκειμένου να μελετηθεί η καταλληλότητα της εφαρμογής τους στις καταγραφές ΗΕΓ. Με τον τρόπο αυτό επιτυγχάνεται ο πρωταρχικός στόχος της εργασίας: η προώθηση υπαρχόντων και η δημιουργία καινοτόμων μεθόδων ανάλυσης από τον κλάδο της εξόρυξης γνώσης στα δεδομένα του ηλεκτροεγκεφαλογραφήματος. / Mining spatiotemporal data is one of the most significant topics in the field of data mining and knowledge discovery. Detecting locations that exhibit similarities in their temporal evolution, recognizing patterns that appear in both the spatial and the temporal information, and storing spatiotemporal data in specialized databases are some of the fundamental problems tackled by researchers in this area. Methods and algorithms that address such problems, along with the common data mining tasks (e.g. classification and clustering), are critical in the development of applications for analyzing spatiotemporal data, a fact that highlights the need for continuous advancement of these algorithms in terms of usability, accuracy and performance. The main objective of the work performed in this thesis is the application of spatiotemporal data mining methods to the analysis of EEG, in order to exploit both the spatial and the temporal nature of these data (i.e. electrodes placed at specific locations on the scalp that continuously record the electrical activity of the brain). In this direction, the problems of modeling and clustering spatiotemporal data were studied extensively, and the major outcome was the development of two corresponding methods. Furthermore, the problem of managing EEG data was investigated in both the offline and the online scenario, and within the latter the state of the art in mining data streams was studied. The outcomes of this thesis related to the aforementioned problems include a) the development of a graph-based method for modeling spatiotemporal data, b) a method for clustering spatiotemporal data based on this model, c) the design of a database schema for storing EEG recording data and metadata, and d) the development of an application for online spindle detection over a data stream management system. Finally, this work aims towards the development of new, and the adaptation of existing, data mining methods in the context of spatiotemporal EEG analysis.
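
One way to picture the graph-based spatiotemporal modelling mentioned in outcome (a) is to treat electrodes as vertices and to weight edges by how similarly two channels behave within a time window. The sketch below uses Pearson correlation and a fixed threshold as illustrative assumptions; the thesis's actual model may be defined differently.

    import numpy as np

    def channel_graph(window, threshold=0.7):
        """window: array of shape (n_channels, n_samples) holding one EEG time window.
        Returns an adjacency dict linking channels whose signals are strongly correlated."""
        corr = np.corrcoef(window)                    # pairwise correlation between channels
        n = corr.shape[0]
        graph = {i: {} for i in range(n)}
        for i in range(n):
            for j in range(i + 1, n):
                w = abs(corr[i, j])
                if w >= threshold:                    # keep only strong spatial relations
                    graph[i][j] = w
                    graph[j][i] = w
        return graph

A stream of such graphs, one per window, gives a spatiotemporal representation that a clustering method can then operate on.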

An approach for online learning in the presence of concept changes

Jaber, Ghazal 18 October 2013
Learning from data streams is emerging as an important application area. When the environment changes, it is necessary to rely on online learning with the capability to adapt to changing conditions, a.k.a. concept drift. Adapting to concept drift entails forgetting some or all of the previously acquired knowledge when the concept changes, while accumulating knowledge about the supposedly stationary underlying concept. This tradeoff is called the stability-plasticity dilemma. Ensemble methods have been among the most successful approaches, but the management of the ensemble, which ultimately controls how past data is forgotten, has not been thoroughly investigated so far. Our work shows the importance of the forgetting strategy by comparing several approaches. The results obtained lead us to propose a new ensemble method with an enhanced forgetting strategy to adapt to concept drift. Experimental comparisons show that our method compares favorably with well-known state-of-the-art systems. Most previous works focused only on means to detect changes and to adapt to them. In our work, we go one step further by introducing a meta-learning mechanism that is able to detect relevant states of the environment, to recognize recurring contexts and to anticipate likely concept changes. Hence, the proposed method deals both with the challenge of managing the stability-plasticity dilemma and with the anticipation and recognition of incoming concepts. This is accomplished through a method that controls an ensemble of incremental learners. The management of the ensemble of learners makes it possible to adapt naturally to the dynamics of the concept changes with very few parameters to set, while a learning mechanism managing the changes in the ensemble provides means for the anticipation of, and the quick adaptation to, the underlying modification of the context.
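
A minimal sketch of the kind of weighted ensemble with an explicit forgetting strategy discussed above: member weights decay whenever a member errs on a new example, and the least-trusted member is periodically replaced by a learner trained on the most recent data. The base learner, decay factor and replacement period are illustrative assumptions, not the thesis's configuration.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier   # stand-in base learner for the sketch

    class ForgettingEnsemble:
        def __init__(self, size=5, decay=0.9, replace_every=50):
            self.members = [DecisionTreeClassifier(max_depth=3) for _ in range(size)]
            self.fitted = [False] * size
            self.weights = np.ones(size)
            self.decay, self.replace_every = decay, replace_every
            self.buf_X, self.buf_y, self.seen = [], [], 0

        def predict(self, x):
            votes = {}
            for m, w, ok in zip(self.members, self.weights, self.fitted):
                if ok:
                    label = m.predict([x])[0]
                    votes[label] = votes.get(label, 0.0) + w
            return max(votes, key=votes.get) if votes else None

        def update(self, x, y):
            # Forgetting strategy: shrink the weight of every member that was wrong on (x, y).
            for i, (m, ok) in enumerate(zip(self.members, self.fitted)):
                if ok and m.predict([x])[0] != y:
                    self.weights[i] *= self.decay
            self.buf_X.append(x); self.buf_y.append(y); self.seen += 1
            if self.seen % self.replace_every == 0:
                worst = int(np.argmin(self.weights))   # drop the least trusted member
                self.members[worst] = DecisionTreeClassifier(max_depth=3).fit(self.buf_X, self.buf_y)
                self.fitted[worst] = True
                self.weights[worst] = 1.0
                self.buf_X, self.buf_y = [], []        # start accumulating the next chunk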

An incremental gaussian mixture network for data stream classification in non-stationary environments / Uma rede de mistura de gaussianas incrementais para classificação de fluxos contínuos de dados em cenários não estacionários

Diaz, Jorge Cristhian Chamby January 2018
Classificação de fluxos contínuos de dados possui muitos desafios para a comunidade de mineração de dados quando o ambiente não é estacionário. Um dos maiores desafios para a aprendizagem em fluxos contínuos de dados está relacionado com a adaptação às mudanças de conceito, as quais ocorrem como resultado da evolução dos dados ao longo do tempo. Duas formas principais de desenvolver abordagens adaptativas são os métodos baseados em conjunto de classificadores e os algoritmos incrementais. Métodos baseados em conjunto de classificadores desempenham um papel importante devido à sua modularidade, o que proporciona uma maneira natural de se adaptar a mudanças de conceito. Os algoritmos incrementais são mais rápidos e possuem uma melhor capacidade anti-ruído do que os conjuntos de classificadores, mas têm mais restrições sobre os fluxos de dados. Assim, é um desafio combinar a flexibilidade e a adaptação de um conjunto de classificadores na presença de mudança de conceito, com a simplicidade de uso encontrada em um único classificador com aprendizado incremental. Com essa motivação, nesta dissertação, propomos um algoritmo incremental, online e probabilístico para a classificação em problemas que envolvem mudança de conceito. O algoritmo é chamado IGMN-NSE e é uma adaptação do algoritmo IGMN. As duas principais contribuições da IGMN-NSE em relação à IGMN são: melhoria de poder preditivo para tarefas de classificação e a adaptação para alcançar um bom desempenho em cenários não estacionários. Estudos extensivos em bases de dados sintéticas e do mundo real demonstram que o algoritmo proposto pode rastrear os ambientes em mudança de forma muito próxima, independentemente do tipo de mudança de conceito. / Data stream classification poses many challenges for the data mining community when the environment is non-stationary. The greatest challenge in learning classifiers from data streams is adapting to concept drifts, which occur as a result of changes in the underlying concepts. Two main ways to develop adaptive approaches are ensemble methods and incremental algorithms. Ensemble methods play an important role due to their modularity, which provides a natural way of adapting to change. Incremental algorithms are faster and have better anti-noise capacity than ensemble algorithms, but impose more restrictions on the data streams. Thus, it is a challenge to combine the flexibility and adaptation of an ensemble classifier in the presence of concept drift with the simplicity of use found in a single classifier with incremental learning. With this motivation, in this dissertation we propose an incremental, online and probabilistic classification algorithm as an effort to tackle concept drift. The algorithm is called IGMN-NSE and is an adaptation of the IGMN algorithm. The two main contributions of IGMN-NSE in relation to IGMN are: improved predictive power for classification tasks and adaptation to achieve good performance in non-stationary environments. Extensive studies on both synthetic and real-world data demonstrate that the proposed algorithm can track changing environments very closely, regardless of the type of concept drift.
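
To make the incremental flavour of such a model concrete, here is a minimal sketch of a one-pass Gaussian mixture in the spirit of IGMN: each component keeps running sufficient statistics updated from a single example, and a new component is created when no existing one explains the example well. The diagonal covariance, novelty threshold and update equations are simplifying assumptions, not the IGMN-NSE formulation (which additionally maintains per-component class information for classification).

    import numpy as np

    class OnlineGaussianMixture:
        """One-pass, diagonal-covariance mixture updated one example at a time."""
        def __init__(self, dim, novelty=3.0, var0=1.0):
            self.dim, self.novelty, self.var0 = dim, novelty, var0
            self.means, self.vars, self.counts = [], [], []

        def _distance(self, k, x):
            # Mahalanobis-like distance under the component's diagonal covariance.
            return np.sqrt(np.sum((x - self.means[k]) ** 2 / self.vars[k]))

        def learn(self, x):
            x = np.asarray(x, dtype=float)
            if self.means:
                dists = [self._distance(k, x) for k in range(len(self.means))]
                k = int(np.argmin(dists))
                if dists[k] < self.novelty:
                    # Incremental (Welford-style) update of the winning component.
                    self.counts[k] += 1.0
                    lr = 1.0 / self.counts[k]
                    delta = x - self.means[k]
                    self.means[k] = self.means[k] + lr * delta
                    self.vars[k] = self.vars[k] + lr * (delta * (x - self.means[k]) - self.vars[k])
                    return k
            # No component explains x well enough: create a new one centred on it.
            self.means.append(x.copy())
            self.vars.append(np.full(self.dim, self.var0))
            self.counts.append(1.0)
            return len(self.means) - 1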

Efficient Processing of Skyline Queries on Static Data Sources, Data Streams and Incomplete Datasets

January 2014
abstract: Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems. An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries. Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing. Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014
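
As background for the skyline operators discussed above, here is a minimal sketch of the dominance test and a naive nested-loops skyline over a single, complete dataset (assuming larger values are preferred in every dimension); the thesis's contributions address the harder skyline-join, streaming and incomplete-data variants that this naive version cannot handle.

    def dominates(a, b):
        """a dominates b if a is at least as good in every dimension and strictly better in one."""
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    def skyline(points):
        """Naive skyline: keep every point not dominated by any other (O(n^2) comparisons)."""
        result = []
        for i, p in enumerate(points):
            if any(dominates(q, p) for j, q in enumerate(points) if j != i):
                continue
            result.append(p)
        return result

    # Example: hotels as (negated price, rating); the skyline keeps the trade-off frontier.
    hotels = [(-120, 4.1), (-95, 3.8), (-95, 4.5), (-200, 4.9)]
    print(skyline(hotels))   # -> [(-95, 4.5), (-200, 4.9)]; the other two are dominated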
