11

Ανίχνευση παρασίτων σε ροές δεδομένων και αποκατάσταση σήματος με χρήση πλειογραμμικής άλγεβρας / Artifact detection in data streams and signal restoration using multilinear algebra

Τριανταφυλλόπουλος, Δημήτριος 07 May 2015 (has links)
The goal of this diploma thesis is to present a system for detecting and managing artifacts in electroencephalography (EEG) data. The system detects the presence of artifacts in real time during the recording, using a pre-trained model. Detected artifacts can be handled with several approaches, depending on the needs of each application. This thesis presents one such approach, which removes an ocular artifact using tensors. Specifically, the thesis describes the requirements of data stream management and how they are addressed in the case of EEG data. The volume of the data and its transmission rate are decisive for the management and analysis of the incoming stream. The general strategies designed for handling large-volume time series are presented, together with their application to EEG data. The proposed system can therefore manage EEG data streams in real time and separate, as the data arrive, the periods in which an artifact is present in the received signal. A method is also proposed that, in offline analysis, can remove one type of artifact, namely the ocular artifact. / This diploma thesis presents a system able to detect and manage artifacts in EEG data streams.
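As a rough illustration of the streaming detection step described above, the following Python sketch scores fixed-length EEG windows with a pre-trained model as samples arrive. The window length, the simple variance/peak-to-peak features, and the threshold-based stand-in "model" are assumptions for illustration only, not the thesis' actual pipeline or its tensor-based removal step.

```python
# Minimal sketch of a streaming EEG artifact flagger: a pre-trained model scores
# fixed-length windows as they arrive. Features, window length, and the
# threshold "model" are illustrative assumptions.
import numpy as np
from collections import deque

WINDOW = 256          # samples per decision window (assumed 1 s at 256 Hz)
N_CHANNELS = 8

def window_features(x):
    """x: (channels, samples) -> simple per-window features."""
    return np.concatenate([x.var(axis=1), np.ptp(x, axis=1)])

class ThresholdModel:
    """Stand-in for a pre-trained classifier: flags windows whose
    peak-to-peak amplitude exceeds a per-channel threshold."""
    def __init__(self, ptp_threshold=100e-6):    # 100 uV, an assumed value
        self.ptp_threshold = ptp_threshold
    def predict(self, feats):
        ptp = feats[N_CHANNELS:]                 # second half of the feature vector
        return bool((ptp > self.ptp_threshold).any())

def stream_detect(sample_iter, model):
    """Consume (channels,) samples one by one, yield (window_index, is_artifact)."""
    buf = deque(maxlen=WINDOW)
    for i, sample in enumerate(sample_iter):
        buf.append(sample)
        if len(buf) == WINDOW and (i + 1) % WINDOW == 0:
            x = np.asarray(buf).T                # (channels, samples)
            yield (i + 1) // WINDOW - 1, model.predict(window_features(x))
```

In a real deployment the threshold model would be replaced by the pre-trained classifier and the flagged windows handed to whichever artifact-handling strategy the application requires.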
12

Clustering to Improve One-Class Classifier Performance in Data Streams

Moulton, Richard Hugh 27 August 2018 (has links)
The classification task requires learning a decision boundary between classes by making use of training examples from each. A potential challenge for this task is the class imbalance problem, which occurs when there are many training instances available for a single class, the majority class, and few training instances for the other, the minority class [58]. In this case, it is no longer clear how to separate the majority class from something for which we have little to no knowledge. More worryingly, the minority class is often the class of interest, e.g. for detecting abnormal conditions from streaming sensor data. The one-class classification (OCC) paradigm addresses this scenario by casting the task as learning a decision boundary around the majority class with no need for minority class instances [110]. OCC has been thoroughly investigated, e.g. [20, 60, 90, 110], and many one-class classifiers have been proposed. One approach for improving one-class classifier performance on static data sets is learning in the context of concepts: the majority class is broken down into its constituent sub-concepts and a classifier is induced over each [100]. Modern machine learning research, however, is concerned with data streams, where potentially infinite amounts of data arrive quickly and need to be processed as they arrive. In these cases, it is not possible to store all of the instances in memory, nor is it practical to wait until "the end of the data stream" before learning. An example is network intrusion detection: detecting an attack on the computer network should occur as soon as practicable. Many one-class classifiers for data streams have been described in the literature, e.g. [33, 108], and it is worth investigating whether the approach of learning in the context of concepts can be successfully applied to the OCC task for data streams as well. This thesis identifies that the idea of breaking the majority class into sub-concepts to simplify the OCC problem has been demonstrated for static data sets [100], but has not been applied in data streams. The primary contribution to the literature made by this thesis is the identification of how the majority class's sub-concept structure can be used to improve the classification performance of streaming one-class classifiers while mitigating the challenges posed by the data stream environment. Three frameworks are developed, each using this knowledge to a different degree. These are applied with a selection of streaming one-class classifiers to both synthetic and benchmark data streams, and their performance is compared to that of the one-class classifier learning independently. These results are analyzed and it is shown that scenarios exist where knowledge of sub-concepts can be used to improve one-class classifier performance.
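To make the "learning in the context of concepts" idea concrete, here is a minimal static-data sketch: the majority class is split into sub-concepts with k-means and a one-class model is trained per sub-concept, accepting a point if any sub-model does. The choice of k, OneClassSVM, and its hyperparameters are illustrative assumptions; the thesis studies the streaming counterpart of this idea.

```python
# Sub-concept one-class classification on a static data set:
# cluster the majority class, fit one one-class model per cluster,
# accept a point if at least one sub-model accepts it.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

def fit_subconcept_occ(X_majority, k=3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_majority)
    return [OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_majority[labels == c])
            for c in range(k)]

def predict_occ(models, X):
    votes = np.stack([m.predict(X) for m in models])      # (k, n) of +1/-1
    return np.where((votes == 1).any(axis=0), 1, -1)       # +1 if any sub-model accepts

# Example: the majority class is made of two Gaussian sub-concepts
rng = np.random.default_rng(0)
X_maj = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
models = fit_subconcept_occ(X_maj, k=2)
print(predict_occ(models, np.array([[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]])))
```

A single one-class model over both blobs would have to enclose the empty region between them; the per-sub-concept models can reject points that fall there.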
13

Compositional Kalman Filters for Navigational Data Streams In IoT Systems

Boiko, Yuri 24 September 2018 (has links)
The Internet of Things (IoT) technology is expanding into different aspects of our lives, changing the way businesses operate and bringing efficiency and reliability to digital controls on various levels. Processing large amounts of data from connected sensor networks becomes a challenging task. A specific part of it, related to fleet management, requires processing data on board vehicles equipped with multiple electronic devices and sensors for the maintenance and operation of such vehicles. Here, the efficiency of various configurations employing the Kalman filter algorithm for on-the-fly pre-processing of sensor-network-originated data streams in IoT systems is investigated. Contextual grouping of the data streams for pre-processing by specialized Kalman filter units is found to satisfy the logistics of IoT system operations. It is demonstrated that interconnecting the elementary Kalman filters into an organized network, the compositional Kalman filter, makes it possible to take advantage of the redundancy of data streams to accomplish IoT pre-processing of the raw data. This includes intermittent data imputation, missing data replacement, lost data recovery, as well as error event detection and correction. Architectures are proposed and tested for the interaction of elementary Kalman filters in detecting GPS outage events and compensating for them via a data replacement procedure, as well as detecting GPS offset occurrences and compensating for them via a data correction routine. The efficiency of the suggested compositional designs of elementary Kalman filter networks for data pre-processing in IoT systems is demonstrated.
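The sketch below shows what one "elementary" Kalman filter riding through a GPS outage can look like: a constant-velocity model predicts on every tick and only updates when a fix is available, so missing samples are imputed by the prediction. The 1-D state and the noise levels are illustrative assumptions, not the thesis' compositional architecture.

```python
# 1-D constant-velocity Kalman filter used to impute GPS positions
# during an outage (z=None means no fix arrived on that tick).
import numpy as np

class ConstVelKalman:
    """State = [position, velocity]."""
    def __init__(self, dt=1.0):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition
        self.H = np.array([[1.0, 0.0]])              # we only observe position
        self.Q = 0.01 * np.eye(2)                    # assumed process noise
        self.R = np.array([[4.0]])                   # assumed GPS noise (m^2)
        self.x = np.zeros((2, 1))
        self.P = np.eye(2)

    def step(self, z=None):
        """One tick; z is a GPS position fix, or None during an outage."""
        self.x = self.F @ self.x                     # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:                            # update only when a fix arrives
            y = np.array([[z]]) - self.H @ self.x
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ y
            self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0, 0])                   # filtered (or imputed) position

kf = ConstVelKalman()
track = [10.0, 11.1, 12.0, None, None, None, 16.2, 17.0]   # None = GPS outage
print([round(kf.step(z), 2) for z in track])
```

In a compositional setting, several such filters would exchange their predictions so that a redundant stream (e.g. odometry) can fill in for the one that dropped out.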
14

Detecção de novidade em fluxos contínuos de dados multiclasse / Novelty detection in multiclass data streams

Elaine Ribeiro de Faria Paiva 08 May 2014 (has links)
Mineração de fluxos contínuos de dados é uma área de pesquisa emergente que visa extrair conhecimento a partir de grandes quantidades de dados, gerados continuamente. Detecção de novidade é uma tarefa de classificação que consiste em reconhecer que um exemplo ou conjunto de exemplos em um fluxo de dados diferem significativamente dos exemplos vistos anteriormente. Essa é uma importante tarefa para fluxos contínuos de dados, principalmente porque novos conceitos podem aparecer, desaparecer ou evoluir ao longo do tempo. A maioria dos trabalhos da literatura apresentam a detecção de novidade como uma tarefa de classificação binária. Poucos trabalhos tratam essa tarefa como multiclasse, mas usam medidas de avaliação binária. Em vários problemas, o correto seria tratar a detecção de novidade em fluxos contínuos de dados como uma tarefa multiclasse, na qual o conceito conhecido do problema é formado por uma ou mais classes, e diferentes novas classes podem aparecer ao longo do tempo. Esta tese propõe um novo algoritmo MINAS para detecção de novidade em fluxos contínuos de dados. MINAS considera que a detecção de novidade é uma tarefa multiclasse. Na fase de treinamento, MINAS constrói um modelo de decisão com base em um conjunto de exemplos rotulados. Na fase de aplicação, novos exemplos são classificados usando o modelo de decisão atual, ou marcados como desconhecidos. Grupos de exemplos desconhecidos podem formar padrões-novidade válidos, que são então adicionados ao modelo de decisão. O modelo de decisão é atualizado ao longo do fluxo a fim de refletir mudanças nas classes conhecidas e permitir inserção de padrões-novidade. Esta tese também propõe uma nova metodologia para avaliação de algoritmos para detecção de novidade em fluxos contínuos de dados. Essa metodologia associa os padrões-novidade não rotulados às classes reais do problema, permitindo assim avaliar a matriz de confusão que é incremental e retangular. Além disso, a metodologia de avaliação propõe avaliar os exemplos desconhecidos separadamente e utilizar medidas de avaliação multiclasse. Por último, esta tese apresenta uma série de experimentos executados usando o MINAS e os principais algoritmos da literatura em bases de dados artificiais e reais. Além disso, o MINAS foi aplicado a um problema real, que consiste no reconhecimento de atividades humanas usando dados de acelerômetro. Os resultados experimentais mostram o potencial do algoritmo e da metodologia propostos / Data stream mining is an emergent research area that aims to extract knowledge from large amounts of continuously generated data. Novelty detection is a classification task that assesses whether an example or a set of examples differs significantly from the previously seen examples. This is an important task for data streams, mainly because new concepts may appear, disappear or evolve over time. Most of the work found in the novelty detection literature presents novelty detection as a binary classification task. A few authors treat this task as multiclass, but even they use binary evaluation measures. In several real problems, novelty detection in data streams must be treated as a multiclass task, in which the known concept of the problem is composed of one or more classes and different new classes may appear over time. This thesis proposes a new algorithm, MINAS, for novelty detection in data streams. MINAS deals with novelty detection as a multiclass task. In the training phase, MINAS builds a decision model based on a labeled data set. 
In the application phase, new examples are classified using the decision model, or marked with an unknown profile. Groups of unknown examples can later be used to create valid novelty patterns, which are added to the current decision model. The decision model is updated as new data arrives in the stream in order to reflect changes in the known classes and to allow the addition of novelty patterns. This thesis also proposes a new methodology to evaluate classifiers for novelty detection in data streams. This methodology associates the unlabeled novelty patterns with the true problem classes, allowing the evaluation of a confusion matrix that is incremental and rectangular. In addition, the proposed methodology allows the evaluation of unknown examples separately and the use of multiclass evaluation measures. Additionally, this thesis presents a set of experiments carried out comparing the MINAS algorithm and the main novelty detection algorithms found in the literature, using artificial and real data sets. Finally, MINAS was applied to a human activity recognition problem using accelerometer data. The experimental results show the potential of the proposed algorithm and methodology.
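The following is a loosely MINAS-inspired sketch (not the published algorithm): known classes are summarized by k-means centroids with distance thresholds, stream examples far from every centroid are buffered as "unknown", and once enough accumulate they are clustered into candidate novelty patterns. Cluster counts, thresholds and the buffer size are illustrative assumptions.

```python
# Multiclass novelty detection sketch: offline model = per-class centroids
# with radii; online, examples outside every radius are buffered and later
# clustered into new "novelty" patterns added to the model.
import numpy as np
from sklearn.cluster import KMeans

class SimpleNoveltyDetector:
    def __init__(self, k_per_class=3, radius_factor=2.0, buffer_size=50):
        self.k, self.rf, self.buffer_size = k_per_class, radius_factor, buffer_size
        self.centroids, self.radii, self.labels = [], [], []
        self.unknown, self.next_novelty_id = [], 0

    def fit(self, X, y):
        for c in np.unique(y):
            Xc = X[y == c]
            km = KMeans(n_clusters=self.k, n_init=10, random_state=0).fit(Xc)
            d = np.linalg.norm(Xc - km.cluster_centers_[km.labels_], axis=1)
            for j, center in enumerate(km.cluster_centers_):
                self.centroids.append(center)
                self.radii.append(self.rf * max(d[km.labels_ == j].mean(), 1e-9))
                self.labels.append(c)

    def process(self, x):
        d = np.linalg.norm(np.array(self.centroids) - x, axis=1)
        i = int(d.argmin())
        if d[i] <= self.radii[i]:
            return self.labels[i]                      # explained by the current model
        self.unknown.append(x)
        if len(self.unknown) >= self.buffer_size:      # try to form novelty patterns
            U = np.array(self.unknown)
            km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(U)
            for j, center in enumerate(km.cluster_centers_):
                dj = np.linalg.norm(U[km.labels_ == j] - center, axis=1)
                self.centroids.append(center)
                self.radii.append(self.rf * max(dj.mean(), 1e-9))
                self.labels.append(f"novelty-{self.next_novelty_id}")
                self.next_novelty_id += 1
            self.unknown = []
        return "unknown"
```

The key point mirrored from the abstract is that the model is multiclass and grows over time: new novelty patterns become first-class labels rather than a single "abnormal" bucket.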
15

Agrupamento de fluxos de dados utilizando dimensão fractal / Clustering data streams using fractal dimension

Christian Cesar Bones 15 March 2018 (has links)
Realizar o agrupamento de fluxos de dados contínuos e multidimensionais (multidimensional data streams) é uma tarefa dispendiosa, visto que esses tipos de dados podem possuir características peculiares e que precisam ser consideradas, dentre as quais destacam-se: podem ser infinitos, tornando inviável, em muitas aplicações realizar mais de uma leitura dos dados; ponto de dados podem possuir diversas dimensões e a correlação entre as dimensões pode impactar no resultado final da análise e; são capazes de evoluir com o passar do tempo. Portanto, faz-se necessário o desenvolvimento de métodos computacionais adequados a essas características, principalmente nas aplicações em que realizar manualmente tal tarefa seja algo impraticável em razão do volume de dados, por exemplo, na análise e predição do comportamento climático. Nesse contexto, o objetivo desse trabalho de pesquisa foi propor técnicas computacionais, eficientes e eficazes, que contribuíssem para a extração de conhecimento de fluxos de dados com foco na tarefa de agrupamento de fluxos de dados similares. Assim, no escopo deste trabalho, foram desenvolvidos dois métodos para agrupamento de fluxos de dados evolutivos, multidimensionais e potencialmente infinitos, ambos baseados no conceito de dimensão fractal, até então não utilizada nesse contexto na literatura: o eFCDS, acrônimo para evolving Fractal Clustering of Data Streams, e o eFCC, acrônimo para evolving Fractal Clusters Construction. O eFCDS utiliza a dimensão fractal para mensurar a correlação, linear ou não, existente entre as dimensões dos dados de um fluxo de dados multidimensional num período de tempo. Esta medida, calculada para cada fluxo de dados, é utilizada como critério de agrupamento de fluxos de dados com comportamentos similares ao longo do tempo. O eFCC, por outro lado, realiza o agrupamento de fluxos de dados multidimensionais de acordo com dois critérios principais: comportamento ao longo do tempo, considerando a medida de correlação entre as dimensões dos dados de cada fluxo de dados, e a distribuição de dados em cada grupo criado, analisada por meio da dimensão fractal do mesmo. Ambos os métodos possibilitam ainda a identificação de outliers e constroem incrementalmente os grupos ao longo do tempo. Além disso, as soluções propostas para tratamento de correlações em fluxos de dados multidimensionais diferem dos métodos apresentados na literatura da área, que em geral utilizam técnicas de sumarização e identificação de correlações lineares aplicadas apenas à fluxos de dados unidimensionais. O eFCDS e o eFCC foram testados e confrontados com métodos da literatura que também se propõem a agrupar fluxos de dados. Nos experimentos realizados com dados sintéticos e reais, tanto o eFCDS quanto o eFCC obtiveram maior eficiência na construção dos agrupamentos, identificando os fluxos de dados com comportamento semelhante e cujas dimensões se correlacionam de maneira similar. Além disso, o eFCC conseguiu agrupar os fluxos de dados que mantiveram distribuição dos dados semelhante em um período de tempo. Os métodos possuem como uma das aplicações imediatas a extração de padrões de interesse de fluxos de dados proveniente de sensores climáticos, com o objetivo de apoiar pesquisas em Agrometeorologia. 
/ Clustering multidimensional data streams is an expensive task, since this kind of data can have peculiar characteristics that must be considered, among which: they are potentially infinite, making it infeasible in many applications to read the data more than once; data points can have many dimensions, and the correlation among the dimensions can affect the analysis; and they are capable of evolving over time. Therefore, it is necessary to develop computational methods appropriate to these characteristics, especially in areas where performing such a task manually is impractical due to the volume of data, for example, in the analysis and prediction of climate behavior. In that context, the research goal was to propose efficient and effective techniques for clustering multidimensional evolving data streams. Two methods were developed for this task: eFCDS, an acronym for evolving Fractal Clustering of Data Streams, and eFCC, an acronym for evolving Fractal Clusters Construction. The eFCDS calculates the fractal dimension of the data streams to measure the (possibly non-linear) correlation among their dimensions and to cluster those with the greatest similarity over a period of time, evolving the clusters as new data are read. By calculating the fractal dimension and then clustering the data streams, eFCDS applies an innovative strategy, distinguishing itself from state-of-the-art methods that perform clustering using summarization techniques and linear correlation to build their clusters over unidimensional data streams. The eFCDS also identifies data streams that showed anomalous behavior in the analyzed time period, treating them as outliers. The other method developed is called eFCC. It also builds clusters of data streams, but they are built on two premises: the data distributions should be similar, and the behavior should be similar in the same time period. To perform that kind of clustering, eFCC calculates the fractal dimension of the clusters themselves and of the data streams, following the evolution in the data, relocating data streams from one group to another when necessary and identifying those that become outliers. Both eFCDS and eFCC were evaluated and compared with competing methods that also cluster data streams rather than only data points. In a detailed experimental evaluation using synthetic and real data, both methods achieved better efficiency in building the groups, better identifying data streams with similar behavior during a period of time and whose dimensions correlate in a similar way, as reported in the results chapter. Besides that, eFCC also clusters the data streams that maintained a similar data distribution over a period of time. As an immediate application, the methods developed in this thesis can be used to extract patterns of interest from climate sensor data streams, aiming to support research in Agrometeorology.
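A minimal sketch of the quantity both methods rely on, the fractal dimension of a window of a multidimensional stream, is shown below using box counting: count occupied grid cells at several scales and fit the slope of log(count) against log(1/cell size). The scales and window handling here are illustrative assumptions, not the eFCDS/eFCC procedure.

```python
# Box-counting estimate of the fractal dimension of a window of points.
import numpy as np

def box_counting_dimension(window, n_scales=5):
    X = np.asarray(window, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    Xn = (X - lo) / span                               # normalize to the unit cube
    sizes = 1.0 / 2 ** np.arange(1, n_scales + 1)      # cell sizes 1/2, 1/4, ...
    counts = []
    for s in sizes:
        cells = np.floor(Xn / s).astype(int)
        counts.append(len(np.unique(cells, axis=0)))   # number of occupied cells
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope                                       # estimated fractal dimension

rng = np.random.default_rng(1)
plane = np.c_[rng.random(4000), rng.random(4000)]              # fills the plane, dim ~ 2
diag = np.c_[np.linspace(0, 1, 4000), np.linspace(0, 1, 4000)]  # a line, dim ~ 1
print(round(box_counting_dimension(plane), 2), round(box_counting_dimension(diag), 2))
```

Streams whose dimensions are strongly (even non-linearly) correlated occupy a lower-dimensional subset of the space, so their estimated dimension is lower; comparing these estimates over time is what allows streams with similar behavior to be grouped.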
16

A Comparative Study of Ensemble Active Learning

Alabdulrahman, Rabaa January 2014 (has links)
Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such types of data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques in order to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined in order to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would improve performance, in terms of accuracy, ensemble size, and the time it takes to build the model. This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than when using normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually results in an increase in the time taken to build the model. However, our results indicate that Active Ensemble Learning builds accurate models with much smaller ensemble sizes, when compared to the traditional Ensemble Learning algorithms. Further, the models we build are constructed from small and incrementally growing training sets, which may be very beneficial in a real-time Data Stream setting.
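An illustrative sketch of the combination being studied, an online ensemble with an active-learning gate, follows: each base learner is updated with a Poisson(1) weight (online bagging) and the true label is requested only when the ensemble vote is uncertain. The base learner, uncertainty rule and margin are assumptions, not the thesis' experimental setup.

```python
# Online bagging ensemble with uncertainty-based active labeling.
import numpy as np
from sklearn.linear_model import SGDClassifier

class ActiveOnlineBagging:
    def __init__(self, classes, n_models=10, margin=0.2, seed=0):
        self.classes = np.asarray(classes)
        self.models = [SGDClassifier(random_state=i) for i in range(n_models)]
        self.margin = margin
        self.rng = np.random.default_rng(seed)
        self.started = False
        self.queries = 0

    def _train(self, x, y, bootstrap=False):
        for m in self.models:
            k = self.rng.poisson(1.0)                 # online-bagging weight
            if bootstrap:
                k = max(k, 1)                         # every model sees the first example
            if k > 0:
                m.partial_fit(x.reshape(1, -1), [y], classes=self.classes,
                              sample_weight=[float(k)])

    def process(self, x, oracle):
        """oracle() returns the true label; it is only called when we choose to query."""
        if not self.started:
            self._train(x, oracle(), bootstrap=True)  # always label the first example
            self.queries += 1
            self.started = True
            return None
        preds = [m.predict(x.reshape(1, -1))[0] for m in self.models]
        votes = np.array([preds.count(c) for c in self.classes], dtype=float)
        votes /= votes.sum()
        top2 = np.sort(votes)[-2:]
        if top2[1] - top2[0] < self.margin:           # uncertain vote -> spend label budget
            self._train(x, oracle())
            self.queries += 1
        return self.classes[int(votes.argmax())]
```

The `queries` counter is what an evaluation like the one above would track alongside accuracy, ensemble size, and model-building time.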
17

Query Processing Over Incomplete Data Streams

Ren, Weilong 19 November 2021 (has links)
No description available.
18

Efficient Estimation of Dynamic Density Functions with Applications in Streaming Data

Qahtan, Abdulhakim Ali Ali 11 May 2016 (has links)
Recent advances in computing technology allow for collecting vast amounts of data that arrive continuously in the form of streams. Mining data streams is challenged by the speed and volume of the arriving data. Furthermore, the underlying distribution of the data changes over time in unpredictable ways. To reduce the computational cost, data streams are often studied in the form of condensed representations, e.g., the Probability Density Function (PDF). This thesis aims at developing an online density estimator that builds a model called KDE-Track for characterizing the dynamic density of data streams. KDE-Track estimates the PDF of the stream at a set of resampling points and uses interpolation to estimate the density at any given point. To reduce the interpolation error and computational complexity, we introduce adaptive resampling, where more/fewer resampling points are used in regions of high/low curvature of the PDF. The PDF values at the resampling points are updated online to provide an up-to-date model of the data stream. Compared with other existing online density estimators, KDE-Track is often more accurate (as reflected by smaller error values) and more computationally efficient (as reflected by shorter running time). The anytime-available PDF estimated by KDE-Track can be applied to visualizing the dynamic density of data streams, outlier detection and change detection in data streams. In this thesis work, the first application is to visualize the taxi traffic volume in New York City. Utilizing KDE-Track allows for visualizing and monitoring the traffic flow in real time without extra overhead and provides insightful analysis of the pick-up demand that can be utilized by service providers to improve service availability. The second application is to detect outliers in data streams from sensor networks based on the estimated PDF. The method detects outliers accurately and outperforms baseline methods designed for detecting and cleaning outliers in sensor data. The third application is to detect changes in data streams. We propose a framework based on Principal Component Analysis (PCA) that reduces the problem of detecting changes in multidimensional data to the problem of detecting changes in the data projected onto the principal components. We provide a theoretical analysis, supported by experimental results, to show that utilizing PCA reflects different types of changes in data streams on the projected data over one or more principal components. Our framework is accurate in detecting changes with low computational cost and scales well to high-dimensional data.
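A simplified sketch of the core idea, an online density estimate maintained at fixed resampling points with interpolation in between, is given below. The fixed grid, forgetting factor and bandwidth are assumptions; KDE-Track itself adapts its resampling points and bandwidth.

```python
# Online 1-D kernel density estimate stored at resampling points: each sample
# adds a Gaussian kernel, old contributions are exponentially forgotten, and
# queries between grid points use linear interpolation.
import numpy as np

class OnlineKDE1D:
    def __init__(self, lo, hi, n_points=101, bandwidth=0.3, forgetting=0.995):
        self.grid = np.linspace(lo, hi, n_points)
        self.pdf = np.zeros(n_points)
        self.h, self.lam, self.weight = bandwidth, forgetting, 0.0

    def update(self, x):
        kernel = np.exp(-0.5 * ((self.grid - x) / self.h) ** 2) / (self.h * np.sqrt(2 * np.pi))
        self.pdf = self.lam * self.pdf + kernel        # decay old mass, add new kernel
        self.weight = self.lam * self.weight + 1.0     # effective number of samples

    def density(self, x):
        return np.interp(x, self.grid, self.pdf / max(self.weight, 1e-12))

rng = np.random.default_rng(2)
est = OnlineKDE1D(-5, 5)
for value in rng.normal(0.0, 1.0, 5000):               # stream of N(0,1) samples
    est.update(value)
print(round(est.density(0.0), 3))                       # roughly 0.4 for a standard normal
```

The exponential forgetting is what keeps the estimate "anytime available" and responsive to distribution changes; change detection then amounts to comparing the current estimate with an earlier one.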
19

A grid-based middleware for processing distributed data streams

Chen, Liang 22 September 2006 (has links)
No description available.
20

Análise espaço-temporal de data streams multidimensionais / Spatio-temporal analysis in multidimensional data streams

Nunes, Santiago Augusto 06 April 2015 (has links)
Fluxos de dados são usualmente caracterizados por grandes quantidades de dados gerados continuamente em processos síncronos ou assíncronos potencialmente infinitos, em aplicações como: sistemas meteorológicos, processos industriais, tráfego de veículos, transações financeiras, redes de sensores, entre outras. Além disso, o comportamento dos dados tende a sofrer alterações significativas ao longo do tempo, definindo data streams evolutivos. Estas alterações podem significar eventos temporários (como anomalias ou eventos extremos) ou mudanças relevantes no processo de geração da stream (que resultam em alterações na distribuição dos dados). Além disso, esses conjuntos de dados podem possuir características espaciais, como a localização geográfica de sensores, que podem ser úteis no processo de análise. A detecção dessas variações de comportamento que considere os aspectos da evolução temporal, assim como as características espaciais dos dados, é relevante em alguns tipos de aplicação, como o monitoramento de eventos climáticos extremos em pesquisas na área de Agrometeorologia. Nesse contexto, esse projeto de mestrado propõe uma técnica para auxiliar a análise espaço-temporal em data streams multidimensionais que contenham informações espaciais e não espaciais. A abordagem adotada é baseada em conceitos da Teoria de Fractais, utilizados para análise de comportamento temporal, assim como técnicas para manipulação de data streams e estruturas de dados hierárquicas, visando permitir uma análise que leve em consideração os aspectos espaciais e não espaciais simultaneamente. A técnica desenvolvida foi aplicada a dados agrometeorológicos, visando identificar comportamentos distintos considerando diferentes sub-regiões definidas pelas características espaciais dos dados. Portanto, os resultados deste trabalho incluem contribuições para a área de mineração de dados e de apoio a pesquisas em Agrometeorologia. / Data streams are usually characterized by large amounts of data generated continuously in synchronous or asynchronous, potentially infinite processes, in applications such as meteorological systems, industrial processes, vehicle traffic, financial transactions, sensor networks, among others. In addition, the behavior of the data tends to change significantly over time, defining evolving data streams. These changes may represent temporary events (such as anomalies or extreme events) or relevant changes in the process that generates the stream (which result in changes in the distribution of the data). Furthermore, these data sets can have spatial characteristics, such as the geographic location of sensors, which can be useful in the analysis process. The detection of these behavioral changes, considering aspects of temporal evolution as well as the spatial characteristics of the data, is relevant for some types of applications, such as the monitoring of extreme weather events in Agrometeorology research. In this context, this project proposes a technique to support spatio-temporal analysis in multidimensional data streams containing spatial and non-spatial information. The adopted approach is based on concepts of Fractal Theory, used for temporal behavior analysis, as well as techniques for data stream handling and hierarchical data structures, allowing analysis tasks that take the spatial and non-spatial aspects into account simultaneously. 
The developed technique has been applied to agro-meteorological data to identify different behaviors considering different sub-regions defined by the spatial characteristics of the data. Therefore, results from this work include contributions to the data mining area and support for research in Agrometeorology.
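As a rough illustration of the spatial side of such an analysis, the sketch below routes streaming sensor readings to sub-regions of a regular grid and keeps a small sliding window per cell. The grid resolution, window length and per-cell statistics are assumptions; the thesis combines the spatial partition with fractal-based temporal analysis rather than the simple summaries shown here.

```python
# Route streaming sensor readings to spatial sub-regions (grid cells) and keep
# a sliding window of recent values per cell for later temporal analysis.
from collections import defaultdict, deque
import numpy as np

CELL_DEG = 0.5          # assumed grid resolution in degrees
WINDOW = 100            # recent readings kept per sub-region

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def cell_of(lat, lon):
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def ingest(lat, lon, value):
    """Route one sensor reading to its spatial sub-region."""
    windows[cell_of(lat, lon)].append(value)

def region_summary():
    """Per-cell mean and spread of the recent window (stand-in for the fractal analysis)."""
    return {cell: (float(np.mean(w)), float(np.std(w))) for cell, w in windows.items() if w}

# Example: two nearby sensors fall in the same cell, a distant one in another
ingest(-22.01, -47.89, 25.3)
ingest(-22.10, -47.95, 26.1)
ingest(-21.20, -45.00, 31.0)
print(region_summary())
```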
