91 |
State Management for Efficient Event Pattern Detection (Zhao, Bo, 20 May 2022)
Event stream processing (ESP) systems monitor continuous data streams in order to evaluate user-defined queries. The challenge is that query processing is stateful and the number of partial matches grows exponentially with the number of processed events.
State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield streams with unpredictable input rates and query selectivities. During peak times, exhaustive processing is impossible and systems must resort to best-effort processing. Second, queries may require remote data in order to select a specific event for a query. Such dependencies are problematic: fetching the data interrupts the stream processing, yet without event selection based on remote data, the growth of partial matches is amplified.
In this dissertation, I present strategies for optimised state management in ESP systems. First, I enable best-effort processing by means of load shedding, in which both input events and partial matches are systematically discarded in order to guarantee a latency bound with minimal loss in quality. Second, I integrate remote data by decoupling its retrieval from its use in query processing. An efficient caching mechanism avoids interruptions caused by transmission latencies: remote data is prefetched based on its anticipated use and is taken into account during event selection by means of lazy evaluation. A cost model determines when which remote data should be fetched and how long it should be kept in the cache. I have evaluated and demonstrated the effectiveness and efficiency of the proposed strategies on synthetic and real-world data. / Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events.
State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified.
In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache.
I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.
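The dissertation's code and cost model are not reproduced here; the following is a minimal Python sketch of the load-shedding idea summarised above, assuming a toy sequence pattern (A, B, C), a fixed bound on the amount of kept state, and a simple utility score for partial matches. The names, the pattern, and the scoring heuristic are illustrative assumptions, not the author's method.

```python
import random

LATENCY_BOUND = 200      # max partial matches we are willing to keep (proxy for a latency bound)
INPUT_SHED_PROB = 0.3    # probability of dropping an input event while overloaded

def utility(partial, now):
    """Heuristic: longer and more recent partial matches are more likely to complete."""
    return len(partial["events"]) / (1 + now - partial["start"])

def detect(stream, pattern=("A", "B", "C"), rng=random.Random(0)):
    partials, matches = [], []
    for now, etype in enumerate(stream):
        overloaded = len(partials) > LATENCY_BOUND
        # Input shedding: under overload, drop some input events outright.
        if overloaded and rng.random() < INPUT_SHED_PROB:
            continue
        if etype == pattern[0]:
            partials.append({"start": now, "events": [etype]})
        for p in partials[:]:
            k = len(p["events"])
            if k < len(pattern) and etype == pattern[k]:
                p["events"].append(etype)
                if len(p["events"]) == len(pattern):
                    matches.append(p)
                    partials.remove(p)
        # State shedding: keep only the highest-utility partial matches.
        if len(partials) > LATENCY_BOUND:
            partials.sort(key=lambda p: utility(p, now), reverse=True)
            del partials[LATENCY_BOUND:]
    return matches

gen = random.Random(1)
events = [gen.choice("ABCX") for _ in range(5000)]
print(len(detect(events)), "complete matches")
```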
92 |
Deferred Maintenance of Disk-Based Random Samples (Gemulla, Rainer; Lehner, Wolfgang, 12 January 2023)
Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for the incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file that follows the intuition that only a "sample" of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm that updates the sample using fast sequential disk access only and does not require any main memory. We conducted an extensive set of experiments and found that our algorithms reduce the maintenance cost by several orders of magnitude.
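As a rough illustration of the deferred-maintenance intuition (only operations that can affect the sample are logged, and the log is applied later in a single sequential pass), here is a small Python sketch based on classic reservoir sampling. It is an assumption-laden toy, not the paper's algorithm, and it ignores the disk-access aspects that motivate the work.

```python
import random

SAMPLE_SIZE = 1000

class DeferredSample:
    """Toy deferred-refresh reservoir sample (illustration only, not the paper's algorithm)."""

    def __init__(self, size=SAMPLE_SIZE, seed=0):
        self.size = size
        self.sample = []
        self.seen = 0           # number of base-data insertions observed so far
        self.log = []           # deferred log: only operations that can affect the sample
        self.pending_adds = 0
        self.rng = random.Random(seed)

    def insert(self, item):
        """Called on every base-data insertion; logs the item only if it would enter the sample."""
        self.seen += 1
        if len(self.sample) + self.pending_adds < self.size:
            self.log.append(("add", item, None))
            self.pending_adds += 1
        else:
            pos = self.rng.randrange(self.seen)
            if pos < self.size:                     # standard reservoir acceptance test
                self.log.append(("replace", item, pos))

    def refresh(self):
        """Deferred refresh: apply the logged operations in one sequential pass."""
        for op, item, pos in self.log:
            if op == "add":
                self.sample.append(item)
            else:
                self.sample[pos] = item
        self.log.clear()
        self.pending_adds = 0

sampler = DeferredSample()
for i in range(100_000):
    sampler.insert(i)
sampler.refresh()
print(len(sampler.sample), "items in sample")
```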
93 |
Constructive Visualization : A token-based paradigm allowing to assemble dynamic visual representation for non-experts / La visualisation constructive : un paradigme de design de visualisation qui permet d'assembler des représentations visuel dynamique pour des personnes non expertes (Huron, Samuel, 29 September 2014)
Over the past 20 years, information visualisation (InfoVis) research has given rise to new techniques and methods that support data-intensive analysis for science, industry and government. However, most of this research has focused on static data and expert users. More recently, technological and societal changes have made data increasingly dynamic and accessible to a more diverse population: data streams such as e-mails, status updates on social networks, RSS feeds, version control systems and many others. These new types of data are used by people who are not necessarily trained or educated in the use of data visualisations. Many of them are casual users, while others work with such data very frequently; in both cases, it is likely that they have received no formal training in data visualisation. These technological and societal changes raise a multitude of new challenges, because most visualisation techniques are designed for experts and static datasets, and few studies have been conducted to explore these challenges. In this thesis, I address the following question: can non-expert users be enabled to create their own visualisations and to contribute to the analysis of data streams?
The first step in answering this question is to assess whether people who are not trained in information visualisation or data science can carry out useful dynamic data analysis tasks using a visualisation system adapted to support those tasks. In the first part of this dissertation, I present several scenarios and systems that allow non-expert users (from 20 to 300 and from 2 000 to 700 000 people) to use information visualisation to analyse dynamic data. Another important problem is the lack of generic design principles for the visual encoding of dynamic information visualisations. In this dissertation, I design, define and explore a design space for representing dynamic data for non-expert users. This design space is structured around graphical tokens representing data items, which provide the material for building different visualisations over time, both classic and new. In this thesis, I propose a new design paradigm to make it easier for non-expert users to author information visualisations. This paradigm is inspired by well-established theories in developmental psychology, as well as by past and present practices of building visualisations from tangible objects. I first describe the basic components and processes that structure this paradigm, and I then use this description to study whether and how non-expert users are able to create, discuss and update their own visualisations. This study allows us to revise our previous model and to provide a first exploration of the phenomena involved when non-expert users create visual encodings without software. In summary, this thesis contributes to the understanding of dynamic visualisations for non-expert users. /
During the past two decades, information visualisation (InfoVis) research has created new techniques and methods to support data-intensive analyses in science, industry and government. These have enabled a wide range of analysis tasks to be executed, with tasks varying in terms of the type and volume of data involved. However, the majority of this research has focused on static datasets, and the analysis and visualisation tasks tend to be carried out by trained expert users. In more recent years, social changes and technological advances have meant that data have become more and more dynamic and are consumed by a wider audience.
Examples of such dynamic data streams include e-mails, status updates, RSS feeds, versioning systems, social networks and others. These new types of data are used by populations that are not specifically trained in information visualization. Some of these people might be casual users, while others might be people deeply involved with the data, but in both cases they would not have received formal training in information visualization. For simplicity, throughout this dissertation I refer to the people (casual users, novices, data experts) who have not been trained in information visualisation as non-experts. These social and technological changes have given rise to multiple challenges, because most existing visualisation models and techniques are intended for experts and assume static datasets. Few studies have been conducted that explore these challenges. In this dissertation, with my collaborators, I address the question: can we empower non-experts in their use of visualisation by enabling them to contribute to data stream analysis as well as to create their own visualizations? The first step to answering this question is to determine whether people who are not trained in information visualisation and the data sciences can conduct useful dynamic analysis tasks using a visualisation system that is adapted to support their tasks. In the first part of this dissertation I focus on several scenarios and systems where different-sized crowds of InfoVis non-expert users (20 to 300 and 2 000 to 700 000 people) use dynamic information visualisation to analyse dynamic data. Another important issue is the lack of generic design principles for the visual encoding of dynamic visualization. In this dissertation I design, define and explore a design space to represent dynamic data for non-experts. This design space is structured by visual tokens representing data items that provide the constructive material for the assembly over time of different visualizations, from classic representations to new ones. To date, research on visual encoding has focused on static datasets for specific tasks, leaving generic dynamic approaches unexplored and unexploited. In this thesis, I propose construction as a design paradigm for non-experts to author simple and dynamic visualizations. This paradigm is inspired by well-established developmental psychological theory as well as past and existing practices of visualisation authoring with tangible elements. I describe the simple conceptual components and processes underlying this paradigm, making it easier for the human-computer interaction community to study and support this process for a wide range of visualizations. Finally, I use this paradigm and tangible tokens to study if and how non-experts are able to create, discuss and update their own visualizations. This study allows us to refine our previous model and provide a first exploration into how non-experts perform a visual mapping without software. In summary, this thesis contributes to the understanding of dynamic visualisation for non-expert users.
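To make the token idea concrete, here is a tiny Python sketch, under my own assumptions, of mapping a stream of categorical data items onto stacks of unit tokens (one token per item), which is the kind of construction the design space describes; it is not code from the thesis.

```python
from collections import defaultdict

def token_chart(items, token="#"):
    """Render a stream of categorical data items as stacks of unit tokens (one token per item)."""
    stacks = defaultdict(int)
    for item in items:
        stacks[item] += 1          # each incoming item adds one token to its category's stack
    width = max(len(str(k)) for k in stacks)
    return "\n".join(f"{str(k).ljust(width)} {token * n} ({n})" for k, n in sorted(stacks.items()))

# Example: e-mails arriving over time, bucketed by sender.
stream = ["alice", "bob", "alice", "carol", "alice", "bob"]
print(token_chart(stream))
```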
94 |
Sustainable Declarative Monitoring Architecture : Energy optimization of interactions between application service oriented queries and wireless sensor devices : Application to Smart Buildings / Architecture de monitoring déclaratif durable : Optimisation énergétique des interactions entre requêtes applicatives orientées service et réseau de capteurs sans fil : Application aux bâtiments intelligents (Pinarer, Ozgun, 15 December 2017)
The last decade has shown a growing interest in smart buildings. Traditional buildings consume a significant share of energy resources, which is where the need for smart buildings emerged; these new buildings must be designed according to sustainable construction standards so that they consume less. Smart buildings have become one of the main application domains of pervasive environments: a basic smart building infrastructure notably consists of a set of wireless sensors that handle data acquisition, transmission and reception. The high energy consumption of these devices is one of the most challenging problems and is therefore an active subject of study in this research area. The sensors are energy-autonomous, and since energy consumption strongly affects service lifetime, several approaches exist in the literature. However, existing approaches are often tailored to a single monitoring application and rely on static sensor configurations. In this thesis, we contribute to the definition of a sustainable declarative monitoring architecture through the energy optimisation of the interactions between service-oriented application queries and the wireless sensor network. We chose the smart building as the application case and therefore study a smart building monitoring system. From a software point of view, a monitoring system can be defined as a set of applications that exploit sensor measurements in real time. These applications are expressed in a declarative language as continuous queries over the sensor data streams. Consequently, a multi-application system has to manage several data stream requests with different data acquisition/transmission rates for the same wireless sensor, under dynamic application requirements. Since a static configuration cannot optimise the energy consumption of the system, we propose an approach called Smart-Service Stream-oriented Sensor Management (3SoSM) to optimise the interactions between application requirements and the wireless sensor environment in real time. 3SoSM provides dynamic sensor configuration to reduce energy consumption while satisfying real-time application requirements. We conducted a set of experiments with a wireless sensor network simulator that validated our approach with respect to optimising sensor energy consumption, and thus extending sensor lifetime, in particular by reducing unnecessary communications. / Recent research and analysis reports indicate that the high energy consumption of buildings is a major problem in developed countries. As a result, they show concretely that building energy management systems (BEMS) and deployed wireless sensor network environments are important for the energy efficiency of building operations. In the literature, existing smart building management systems focus on the energy consumption of the building, the hardware deployed inside/outside the building, and network communication issues.
They adopt static configurations for wireless sensor devices, and the proposed models are fitted to a single application. In this study, we propose a sustainable declarative monitoring architecture that focuses on the energy optimisation of the interactions between service-oriented application queries and wireless sensor devices. We consider the monitoring system as a set of applications that exploit sensor measurements in real time, such as HVAC automation and control systems, real-time supervision, and security. These applications can be configured dynamically by the users or by the supervisor. In our approach, we take a data point of view: applications are declaratively expressed as a set of continuous queries over the sensor data stream. To achieve our objective of energy-aware optimisation of the monitoring architecture, we formalise the sensor device configuration and fit data acquisition and data transmission to the actual application requirements. We present a complete monitoring architecture and an algorithm that handles dynamic sensor configuration, and we introduce a platform that covers both physical and simulated wireless sensor devices.
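The 3SoSM algorithm itself is not reproduced here; the sketch below only illustrates, under assumed requirement and policy choices (GCD of requested periods for acquisition, tightest delay bound for transmission), how per-sensor acquisition/transmission settings could be derived from the continuous queries currently subscribed to a sensor.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd

@dataclass
class QueryRequirement:
    """Requirements that one continuous query places on a sensor (assumed model)."""
    sensor_id: str
    acquisition_period_s: int       # how often the query needs a fresh measurement
    max_notification_delay_s: int   # how stale a delivered measurement may be

def sensor_configuration(requirements):
    """Derive one acquisition/transmission configuration per sensor from all active queries.

    Acquisition: greatest common divisor of the requested periods, so every query's
    sampling instants are hit.  Transmission: the tightest delay bound, so readings
    can be batched up to that bound instead of being sent one by one.
    """
    by_sensor = {}
    for r in requirements:
        by_sensor.setdefault(r.sensor_id, []).append(r)
    config = {}
    for sensor, reqs in by_sensor.items():
        acq = reduce(gcd, (r.acquisition_period_s for r in reqs))
        tx = min(r.max_notification_delay_s for r in reqs)
        config[sensor] = {"acquire_every_s": acq, "transmit_every_s": max(acq, tx)}
    return config

reqs = [
    QueryRequirement("temp-3F", acquisition_period_s=60, max_notification_delay_s=300),
    QueryRequirement("temp-3F", acquisition_period_s=90, max_notification_delay_s=600),
    QueryRequirement("co2-3F", acquisition_period_s=120, max_notification_delay_s=120),
]
print(sensor_configuration(reqs))
```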
95 |
MPEG Z/Alpha and high-resolution MPEG / MPEG Z/Alpha och högupplösande MPEG-video (Ziegler, Gernot, January 2003)
The progression of technical development has yielded practicable camera systems for the acquisition of so-called depth maps, images with depth information. Images and movies with depth information open the door for new types of applications in the area of computer graphics and vision, which implies that they will need to be processed in ever-increasing volumes. Increased depth image processing puts forth the demand for a standardized data format for the exchange of image data with depth information, both still and animated, and software to convert acquired depth data to such video formats is highly necessary. This diploma thesis sheds light on many of the issues that come with this new group of tasks. It ranges from data acquisition through readily available software for the data encoding to possible future applications. Further, a software architecture fulfilling all of the mentioned demands is presented. The encoder consists of a collection of UNIX programs that generate MPEG Z/Alpha, an MPEG2-based video format. MPEG Z/Alpha contains, besides MPEG2's standard data streams, one extra data stream to store image depth information (and transparency). The decoder suite, called TexMPEG, is a C library for the in-memory decompression of MPEG Z/Alpha. Much effort has been put into video decoder parallelization, and TexMPEG is now capable of decoding multiple video streams, not only in parallel internally, but also with inherent frame synchronization between MPEG videos decoded in parallel.
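TexMPEG is a C library and its API is not shown here; the following Python sketch only illustrates the synchronisation idea of decoding several streams in parallel while keeping their frames aligned, using a barrier so that frame i of every stream is finished before any stream starts frame i+1. The stream names and the decode stub are placeholders.

```python
import threading

NUM_FRAMES = 5
streams = ["color.mpg", "depth.mpg", "alpha.mpg"]   # hypothetical input streams
frames = {name: [] for name in streams}
barrier = threading.Barrier(len(streams))

def decode_frame(name, index):
    """Stand-in for the per-frame decoding work done by the real C decoder."""
    return f"{name}:frame{index}"

def decode_stream(name):
    for i in range(NUM_FRAMES):
        frames[name].append(decode_frame(name, i))
        barrier.wait()   # frame i of every stream is ready before any stream starts frame i+1

threads = [threading.Thread(target=decode_stream, args=(name,)) for name in streams]
for t in threads:
    t.start()
for t in threads:
    t.join()
for i in range(NUM_FRAMES):
    print("synchronised frame set", i, [frames[name][i] for name in streams])
```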
96 |
Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification (Ranganath, B N, 09 1900)
Data mining aims to find valid, novel, potentially useful, and ultimately understandable abstractions in data. Frequent itemset mining is one of the important data mining approaches for finding those abstractions in the form of patterns. Frequent closed itemsets provide complete and condensed information for generating non-redundant association rules. For many applications, mining all frequent itemsets is not necessary, and mining frequent closed itemsets is adequate. Compared to frequent itemset mining, frequent closed itemset mining generates fewer itemsets and therefore improves the efficiency and effectiveness of these tasks.
Recently, much research has been done on closed itemset mining, but mainly for traditional databases, where multiple scans are needed and, whenever new transactions arrive, additional scans must be performed on the updated transaction database; such methods are therefore not suitable for data stream mining.
Mining frequent itemsets from data streams has many potential and broad applications. Emerging applications of data streams that require association rule mining include network traffic monitoring and web click stream analysis. Unlike data in traditional static databases, data streams typically arrive continuously, at high speed, in huge volumes, and with changing data distributions. This raises new issues that need to be considered when developing association rule mining techniques for stream data.
Recent work on data stream mining based on the sliding window method slides the window by one transaction at a time. When the window size is large and the support threshold is low, however, the existing methods consume significant time and lead to a large increase in user response time.
In our first work, we propose a novel algorithm, Stream-Close, based on the sliding window model to mine frequent closed itemsets from data streams within the current sliding window. We enhance the scalability of the algorithm by introducing several optimization techniques, such as sliding the window by multiple transactions at a time and novel pruning techniques that considerably reduce the number of candidate itemsets to be examined for closure checking. Our experimental studies show that the proposed algorithm scales well with large datasets.
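Stream-Close and its pruning techniques are not reproduced here; the following brute-force Python sketch only illustrates the setting: a window that slides by several transactions at a time, with frequent closed itemsets recomputed over the current window contents. Window size, slide, and support values are arbitrary assumptions.

```python
from collections import Counter, deque
from itertools import combinations

def closed_itemsets(window, min_support):
    """Brute-force frequent closed itemsets over the transactions currently in the window."""
    counts = Counter()
    for txn in window:
        items = sorted(set(txn))
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                counts[subset] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    closed = {}
    for s, c in frequent.items():
        # s is closed iff no frequent proper superset has the same support.
        if not any(set(s) < set(t) and frequent[t] == c for t in frequent):
            closed[s] = c
    return closed

def mine_stream(transactions, window_size=6, slide=3, min_support=2):
    """Slide the window by `slide` transactions at a time and re-mine the window."""
    window = deque(maxlen=window_size)
    for start in range(0, len(transactions), slide):
        window.extend(transactions[start:start + slide])
        yield start, closed_itemsets(window, min_support)

stream = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"], ["a", "b", "c"],
          ["c", "d"], ["a", "d"], ["b", "d"]]
for position, result in mine_stream(stream):
    print(position, result)
```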
Still, the notion of frequent closed itemsets generates a huge number of closed itemsets in some applications. This drawback makes frequent closed itemset mining infeasible in many applications, since users cannot interpret the large volume of output (which, when the support threshold is low, can be larger than the data itself), and it may require developing extra applications that post-process the output of the original algorithm to reduce its size.
Recent work on clustering of itemsets considers strictly either the expression (the set of items present in an itemset) or the support of the itemsets, or partially both, to reduce the number of itemsets. The drawback of these approaches is that in some situations the number of itemsets does not decrease, due to their restricted view of considering either expressions or support.
We therefore propose a new notion of frequent itemsets, called clustered itemsets, which considers both the expressions and the support of the itemsets in summarizing the output. We introduce a new distance measure with respect to expressions and also prove that the problem of mining clustered itemsets is NP-hard.
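The thesis's distance measure is not given in this abstract; as a purely illustrative stand-in, the sketch below clusters itemsets greedily using a Jaccard distance over their expressions and a threshold delta, which conveys the idea of summarising itemsets by both expression and support without claiming to match the proposed measure.

```python
def expression_distance(a, b):
    """Jaccard distance between two itemsets' expressions (assumed measure, for illustration)."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def cluster_itemsets(itemsets, delta=0.5):
    """Greedy clustering: an itemset joins the first cluster whose representative is within delta."""
    clusters = []   # each cluster: (representative_itemset, members)
    for itemset, support in sorted(itemsets.items(), key=lambda kv: -kv[1]):
        for rep, members in clusters:
            if expression_distance(itemset, rep) <= delta:
                members.append((itemset, support))
                break
        else:
            clusters.append((itemset, [(itemset, support)]))
    return clusters

freq = {("a", "b"): 5, ("a", "b", "c"): 4, ("a",): 6, ("c", "d"): 3}
for rep, members in cluster_itemsets(freq):
    print(rep, members)
```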
In our second work, we propose a deterministic locality-sensitive-hashing-based classifier using clustered itemsets. Locality sensitive hashing (LSH) is a technique for efficiently finding a nearest neighbour in high-dimensional datasets. The idea is to hash the points using several hash functions such that, for each function, the probability of collision is much higher for objects that are close to each other than for those that are far apart. We propose an LSH-based approximate nearest neighbour classification strategy. The problem with LSH, however, is that it chooses hash functions at random, and evaluating a large number of hash functions can increase query time. From a classification point of view, since LSH chooses randomly from a family of hash functions, the buckets may contain points belonging to other classes, which may affect classification accuracy. To overcome these problems, we propose to use hash functions based on class association rules, which ensure that the buckets corresponding to the class association rules contain points from the same class. Associative classification, however, involves generating and examining a large number of candidate class association rules, so we use the clustered itemsets, which reduce the number of class association rules to be examined. We also establish a formal connection between the clustering parameter (delta, used in the generation of clustered frequent itemsets) and discriminative measures such as information gain. Our experimental studies show that the proposed method achieves an increase in accuracy over the LSH-based nearest neighbour classification strategy.
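The deterministic, class-association-rule-based hash functions of the thesis are not reproduced here; the sketch below implements only the generic LSH baseline it improves upon: bit-sampling hash functions over binary feature vectors, with classification by majority vote over colliding training labels. All parameters and the toy labelling rule are assumptions.

```python
import random
from collections import Counter

def make_hash(num_features, bits, rng):
    """One bit-sampling LSH function for Hamming distance: project onto `bits` random positions."""
    positions = rng.sample(range(num_features), bits)
    return lambda x: tuple(x[p] for p in positions)

class LSHClassifier:
    """Approximate nearest-neighbour classifier over binary feature vectors."""

    def __init__(self, num_features, num_tables=10, bits=4, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(num_features, bits, rng) for _ in range(num_tables)]
        self.tables = [dict() for _ in self.hashes]

    def fit(self, X, y):
        for x, label in zip(X, y):
            for h, table in zip(self.hashes, self.tables):
                table.setdefault(h(x), []).append(label)
        return self

    def predict(self, x):
        votes = Counter()
        for h, table in zip(self.hashes, self.tables):
            votes.update(table.get(h(x), []))   # labels colliding with x in this table
        return votes.most_common(1)[0][0] if votes else None

rng = random.Random(1)
X = [[rng.randint(0, 1) for _ in range(16)] for _ in range(200)]
y = [1 if sum(x[:8]) > sum(x[8:]) else 0 for x in X]   # toy labelling rule
clf = LSHClassifier(num_features=16).fit(X, y)
print(clf.predict(X[0]), y[0])
```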
97 |
Sampling Algorithms for Evolving Datasets (Gemulla, Rainer, 24 October 2008)
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample. In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset. In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing.
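By way of illustration only, here is a small Python sketch in the spirit of deletion-compensating sample maintenance: insertions "pair up" with earlier uncompensated deletions so that the base data never has to be accessed. It is a simplified toy under my own assumptions, not the thesis's algorithms, and it makes no claim about their statistical guarantees.

```python
import random

class PairedSample:
    """Bounded uniform sample under inserts and deletes; simplified sketch, not the thesis's algorithm."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.sample = set()
        self.dataset_size = 0
        self.d_in = 0    # uncompensated deletions that had been in the sample
        self.d_out = 0   # uncompensated deletions that had not been in the sample
        self.rng = random.Random(seed)

    def insert(self, item):
        self.dataset_size += 1
        if self.d_in + self.d_out > 0:
            # Pair this insertion with an earlier deletion instead of touching the base data.
            if self.rng.random() < self.d_in / (self.d_in + self.d_out):
                self.sample.add(item)
                self.d_in -= 1
            else:
                self.d_out -= 1
        elif len(self.sample) < self.capacity:
            self.sample.add(item)
        elif self.rng.random() < self.capacity / self.dataset_size:
            self.sample.remove(self.rng.choice(list(self.sample)))
            self.sample.add(item)

    def delete(self, item):
        self.dataset_size -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.d_in += 1
        else:
            self.d_out += 1

s = PairedSample(capacity=100)
for i in range(10_000):
    s.insert(i)
for i in range(0, 10_000, 4):
    s.delete(i)
for i in range(10_000, 12_000):
    s.insert(i)
print(len(s.sample), "sampled items of", s.dataset_size)
```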
98 |
Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data [Obtaining sequential patterns in data streams while meeting Big Data requirements] (Carvalho, Danilo Codeco, 06 June 2016)
Funding: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). / The growing amount of data produced daily, by both businesses and individuals on the web, has increased the demand for analysing and extracting knowledge from this data. While for the last two decades the solution was to store the data and run data mining algorithms over it, this has now become unviable even for supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyse: response time requirements and the complexity of the data carry more weight in many real-world domains. New models have been researched and developed, often proposing distributed computing or different ways to handle data stream mining. Current research shows that an alternative in data stream mining is to join a real-time event handling mechanism with classic association rule or sequential pattern mining algorithms. In this work, a data stream mining approach is presented to meet the Big Data response time requirement, linking the real-time event handling engine Esper with the Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that it is possible to take a static data mining algorithm to a data stream environment and keep the tendency in the patterns, although it is not possible to continuously read all data coming into the data stream. /
The growth in the amount of data produced daily, both by companies and by individuals on the web, has increased the demand for analysing and extracting knowledge from this data. While over the last two decades the solution was to store the data and run data mining algorithms over it, this has now become unfeasible even on supercomputers. Moreover, the requirements of the so-called Big Data era go far beyond the large amount of data to analyse: response time requirements and data complexity carry greater weight in many real-world domains. New models have been researched and developed, often proposing distributed computing or different ways of handling data stream mining. Current research shows that one alternative in data stream mining is to combine a real-time event handling mechanism with classic algorithms for mining association rules or sequential patterns. This work presents a data stream mining approach that meets the Big Data response time requirement, combining the real-time event handling engine Esper with the Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that it is possible to bring a static data mining algorithm to the data stream environment and preserve the trends in the patterns found, even though it is not possible to continuously read all of the data arriving in the stream.
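Esper's EPL and the IncMSTS algorithm are not reproduced here; the following Python sketch only illustrates the overall shape of the approach, under assumed event types and thresholds: a sliding time window standing in for the event-handling engine, feeding a naive sequential-pattern counter over the window contents.

```python
from collections import deque, Counter
from itertools import combinations

class TimeWindow:
    """Stand-in for a CEP engine's sliding time window (Esper itself is not used here)."""

    def __init__(self, length):
        self.length = length
        self.events = deque()   # (timestamp, event) pairs

    def push(self, timestamp, event):
        self.events.append((timestamp, event))
        while self.events and self.events[0][0] <= timestamp - self.length:
            self.events.popleft()
        return [e for _, e in self.events]

def frequent_sequences(events, max_len=3, min_support=2):
    """Naive sequential pattern counting over the window contents (not IncMSTS)."""
    counts = Counter()
    for r in range(2, max_len + 1):
        for combo in combinations(range(len(events)), r):   # index combinations keep the order
            counts[tuple(events[i] for i in combo)] += 1
    return {seq: c for seq, c in counts.items() if c >= min_support}

window = TimeWindow(length=10)
stream = [(1, "login"), (2, "search"), (4, "buy"), (6, "login"), (7, "search"),
          (9, "buy"), (12, "search"), (13, "buy")]
for ts, ev in stream:
    current = window.push(ts, ev)
    patterns = frequent_sequences(current)
    if patterns:
        print(f"t={ts}: {patterns}")
```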
99 |
Avaliação criteriosa dos algoritmos de detecção de concept drifts [A careful evaluation of concept drift detection algorithms] (SANTOS, Silas Garrido Teixeira de Carvalho, 27 February 2015)
Funding: FACEPE. / Knowledge extraction in environments with continuous data flows is an activity that has been growing steadily. Many situations call for this mechanism, such as monitoring customers' purchase histories, detecting presence through sensors, or monitoring water temperature. The algorithms used for this purpose must therefore be updated constantly, seeking to adapt to new instances while taking computational constraints into account. When working in environments with continuous data flows, it is generally not advisable to assume that the data distribution will remain stationary. Several changes may occur over time, triggering a situation commonly known as concept drift. In this work, a comparative study was carried out between some of the main change detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. For the experiments, artificial datasets were used (simulating abrupt, fast gradual, and slow gradual changes), as well as datasets with real problems. The results were analysed on the basis of accuracy, runtime, memory usage, mean time to detect a change, and the number of false positives and negatives. The parameters of the methods were set using an adapted version of a genetic algorithm. According to the results of the Friedman test together with the Nemenyi test, in terms of accuracy DDM proved to be the most efficient method on the datasets used, being statistically superior to DOF and ECDD. EDDM was the fastest method and also the most economical in memory usage, being superior to DOF, ECDD, PL and STEPD in both cases. It is concluded that methods that are more sensitive to change detection, and consequently more prone to false alarms, obtain better results than methods that are less sensitive and less susceptible to false alarms. /
Knowledge extraction from data streams is an activity that has seen progressively increasing demand. Examples of such applications include monitoring the purchase history of customers, movement data from sensors, or water temperatures. Thus, algorithms used for this purpose must be constantly updated, trying to adapt to new instances and taking computational constraints into account. When working in environments with a continuous flow of data, there is no guarantee that the distribution of the data will remain stationary. On the contrary, several changes may occur over time, triggering situations commonly known as concept drift. In this work we present a comparative study of some of the main drift detection methods: ADWIN, DDM, DOF, ECDD, EDDM, PL and STEPD. For the execution of the experiments, artificial datasets were used (simulating abrupt, fast gradual, and slow gradual changes) as well as datasets with real problems. The results were analyzed based on accuracy, runtime, memory usage, average time to change detection, and the number of false positives and negatives. The parameters of the methods were defined using an adapted version of a genetic algorithm. According to the results of the Friedman test with the Nemenyi post-hoc test, in terms of accuracy DDM was the most efficient method on the datasets used, being statistically superior to DOF and ECDD. EDDM was the fastest method and also the most economical in memory usage, being statistically superior to DOF, ECDD, PL and STEPD in both cases. It was concluded that change detection methods that are more sensitive, and therefore more prone to false alarms, achieve better results than methods that are less sensitive and less susceptible to false alarms.
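Of the detectors compared above, DDM has a particularly compact statement; the sketch below is a minimal Python rendering of the usual DDM rule (warning and drift thresholds at two and three standard deviations above the observed minimum error rate), applied to a simulated error stream with one abrupt change. Parameter values are assumptions.

```python
import math
import random

class DDM:
    """Minimal sketch of the DDM drift detector: track a classifier's error rate and flag
    warning/drift when it rises significantly above its observed minimum."""

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, is_error):
        """Feed one prediction outcome (True = misclassified); returns 'drift', 'warning' or None."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return None
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s >= self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return None

# Simulated error stream: 10% error rate, jumping to 45% after an abrupt concept change.
rng = random.Random(42)
detector = DDM()
for t in range(2000):
    error_rate = 0.10 if t < 1000 else 0.45
    signal = detector.update(rng.random() < error_rate)
    if signal == "drift":
        print(f"drift signalled at example {t}")
```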
100 |
MPEG Z/Alpha and high-resolution MPEG / MPEG Z/Alpha och högupplösande MPEG-video (Ziegler, Gernot, January 2003)
The progression of technical development has yielded practicable camera systems for the acquisition of so-called depth maps, images with depth information. Images and movies with depth information open the door for new types of applications in the area of computer graphics and vision, which implies that they will need to be processed in ever-increasing volumes. Increased depth image processing puts forth the demand for a standardized data format for the exchange of image data with depth information, both still and animated, and software to convert acquired depth data to such video formats is highly necessary. This diploma thesis sheds light on many of the issues that come with this new group of tasks. It ranges from data acquisition through readily available software for the data encoding to possible future applications. Further, a software architecture fulfilling all of the mentioned demands is presented. The encoder consists of a collection of UNIX programs that generate MPEG Z/Alpha, an MPEG2-based video format. MPEG Z/Alpha contains, besides MPEG2's standard data streams, one extra data stream to store image depth information (and transparency). The decoder suite, called TexMPEG, is a C library for the in-memory decompression of MPEG Z/Alpha. Much effort has been put into video decoder parallelization, and TexMPEG is now capable of decoding multiple video streams, not only in parallel internally, but also with inherent frame synchronization between MPEG videos decoded in parallel.