  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

NETAH, un framework pour la composition distribuée de flux d'événements / NETAH, A Framework for Composing Distributed Event Streams

Epal Njamen, Orleant 11 October 2016
The shrinking size of devices and the advent of wireless communication have greatly contributed to the rise of sustainable IT. Most computer applications today are built with this dynamic ambient environment in mind. Developing and running them requires software infrastructures that let entities execute, interact through various modes (synchronous and asynchronous), and adapt to their environment(s), particularly in terms of: resource consumption (computation, memory, storage media, databases, network connections, ...); the multiplicity of data sources (illustrated by the Web, sensors, smart meters, satellites, existing databases, ...); and the multiple formats of static or streamed objects (images, sound, video). In many cases, the objects in these streams must be homogenized, enriched, cross-referenced, filtered, and aggregated to ultimately form information products that are semantically rich and strategic for applications or end users. Event-based systems are particularly well suited to programming this type of application. They can offer anonymous and asynchronous communication (senders/servers and receivers/clients do not know each other), which facilitates interoperation and collaboration between autonomous and heterogeneous services. Event systems must be able to observe, transport, filter, aggregate, correlate, and analyze many event streams produced in a distributed way. These observation services must be deployable on distributed architectures such as sensor networks, smart grids, and the cloud, to contribute to the observation of complex systems and to their autonomous control through reactive decision-making processes.
The aim of the thesis is to propose a model for the distributed composition of event streams and to specify an event service that can efficiently perform aggregation, temporal and causal correlation, and analysis of event streams on distributed service-based platforms. WORK TO BE PERFORMED: (i) State of the art: event stream management systems; distributed event services and infrastructures; event models. (ii) Definition of a scenario for experimenting with and comparing existing approaches. (iii) Definition of a subscription-based model for the distributed composition of event streams. (iv) Specification and implementation of a distributed event stream composition service.
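The operations this abstract names (filtering, aggregation, temporal correlation) can be illustrated with a small sketch. This is not NETAH's actual model or API: the `Event` type, the function names, and the tumbling-window choice are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical producer id, e.g. "meter-1"
    timestamp: float  # seconds
    value: float


def filter_events(events: Iterable[Event],
                  predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Pass through only the events that satisfy the predicate."""
    return (e for e in events if predicate(e))


def tumbling_average(events: Iterable[Event], width: float) -> Iterator[Event]:
    """Aggregate a time-ordered stream into one average per window of `width` seconds."""
    window_start = None
    values = []
    for e in events:
        start = (e.timestamp // width) * width
        if window_start is not None and start != window_start:
            # The window is closed: emit its average and start a new one.
            yield Event("avg", window_start, sum(values) / len(values))
            values = []
        window_start = start
        values.append(e.value)
    if values:
        yield Event("avg", window_start, sum(values) / len(values))


def correlate(left: Iterable[Event], right: Iterable[Event],
              max_gap: float) -> Iterator[Tuple[Event, Event]]:
    """Pair each left event with the right events occurring at most max_gap seconds after it."""
    right = list(right)
    for l in left:
        for r in right:
            if 0 <= r.timestamp - l.timestamp <= max_gap:
                yield (l, r)
```

Because each operator consumes and produces a stream, they compose by nesting, for example `tumbling_average(filter_events(readings, pred), 10.0)`.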
22

Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

January 2018
Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades, and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems for purposes of orchestrating the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been considered an “implementation concern” requiring unscalable manual human efforts and governance coordination. Drafting a questionnaire engineered to meet the specifications of the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it may be imported and deployed into a live system without significant burdens. Dramatic reduction of the time and effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle and formalization of the human processes by which this occurs. This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, as a concrete floor-to-ceiling technological blueprint for both provider health IT (HIT) and vendor organizations to incrementally introduce value into existing systems dynamically. This is made possible by service-ization of curated knowledge artifacts, which are then injected into a highly scalable backend infrastructure by automated orchestration through public marketplaces. Supplementary examples of client app integration are also provided.
Compilation of knowledge into platform-specific form has been left flexible, insofar as implementations comply with ARTAKA’s Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service packaging standards. Towards the goal of interoperable human processes, ARTAKA’s treatment of knowledge artifacts as a specialized form of software allows knowledge engineers to operate within a type of software engineering practice. Thus, nearly a century of software development processes, tools, policies, and lessons offer immediate benefit: in some cases, with remarkable parity. Analyses of experimentation are provided, with guidelines on how selected aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment. Portions of this culminating document have been further initiated with Standards Developing Organizations (SDOs) intended to ultimately produce normative standards, as have active relationships with other bodies. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018
23

[en] A REAL-TIME REASONING SERVICE FOR THE INTERNET OF THINGS / [pt] UM SERVIÇO DE RACIOCÍNIO COMPUTACIONAL EM TEMPO REAL PARA A INTERNET DAS COISAS

RUHAN DOS REIS MONTEIRO 17 January 2019
The growth of the Internet of Things (IoT) has brought the opportunity to create applications in several areas using sensors and actuators. One of the problems encountered in IoT systems is the difficulty of adding semantic relations to the raw data produced by the sensors and of inferring new facts from these relations. Moreover, because many IoT applications are online and must react instantly to the sensor data they collect, these data, known as streams, need to be analyzed in real time. A stream is a sequence of time-varying data elements that should not be treated as data to be stored forever and queried on demand. Streaming data needs to be consumed quickly through continuous queries that keep analyzing it and producing new relevant data, i.e., a stream of output/result events. The ability to infer new semantic relationships over streaming data is called Stream Reasoning. We propose a semantic model and a mechanism for real-time data stream processing and reasoning based on Complex Event Processing (CEP), RDF (Resource Description Framework), and OWL (Web Ontology Language). This work presents a middleware service that supports continuous reasoning over data produced by sensors. The main advantages of our approach are: (a) it considers time as a key relationship between pieces of information; (b) stream processing can be implemented using CEP; (c) it is general enough to be applied to any data stream management system (DSMS). It was developed in the Advanced Collaboration Laboratory (LAC), and a case study in the field of fire detection is conducted and implemented to illustrate the use of real-time inference on streams.
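The idea of continuous inference over a time window can be sketched as follows. This is a toy illustration, not the thesis's middleware: plain tuples stand in for RDF triples, a single hard-coded rule stands in for OWL reasoning, and the predicate names (`hasTemperature`, `hasSmoke`, `hasStatus`) and the 60-degree threshold are invented for the example.

```python
from collections import deque
from typing import NamedTuple


class TimedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: object
    timestamp: float


class WindowedReasoner:
    """Keeps the triples of the last `window` seconds and applies one rule on each arrival."""

    def __init__(self, window: float, temp_threshold: float = 60.0):
        self.window = window
        self.temp_threshold = temp_threshold
        self.triples = deque()

    def push(self, triple: TimedTriple):
        """Add a triple, evict expired ones, and return the facts inferred right now."""
        self.triples.append(triple)
        while self.triples and triple.timestamp - self.triples[0].timestamp > self.window:
            self.triples.popleft()
        return self._infer(triple.timestamp)

    def _infer(self, now: float):
        # Rule (assumed): a room that is both hot and smoky within the window is suspect.
        hot = {t.subject for t in self.triples
               if t.predicate == "hasTemperature" and t.obj > self.temp_threshold}
        smoky = {t.subject for t in self.triples
                 if t.predicate == "hasSmoke" and t.obj is True}
        return [TimedTriple(room, "hasStatus", "FireSuspected", now)
                for room in sorted(hot & smoky)]
```

Time acts as the key relationship here: the same two readings yield a new fact only when they fall inside the same window.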
24

A distributed service delivery platform for automotive environments : enhancing communication capabilities of an M2M service platform for automotive application

Glaab, Markus January 2018
The automotive domain is changing. On the way to more convenient, safe, and efficient vehicles, the role of electronic controllers and particularly software has increased significantly for many years, and vehicles have become software-intensive systems. Furthermore, vehicles are connected to the Internet to enable Advanced Driver Assistance Systems and enhanced In-Vehicle Infotainment functionalities. This widens the automotive software and system landscape beyond the physical vehicle boundaries to now also include external backend servers in the cloud. Moreover, the connectivity facilitates new kinds of distributed functionalities, making the vehicle a part of an Intelligent Transportation System (ITS) and thus an important example of a future Internet of Things (IoT). Manufacturers, however, are confronted with the challenging task of integrating this ever-increasing range of functionalities, with heterogeneous or even contradictory requirements, into a homogeneous overall system. This requires new software platforms and architectural approaches. In this regard, the connectivity to fixed-side backend systems not only introduces additional challenges, but also enables new approaches for addressing them. The vehicle-to-backend approaches currently emerging are dominated by proprietary solutions, which is in clear contradiction to the requirements of ITS scenarios, which call for interoperability across the broad scope of vehicles and manufacturers. Therefore, this research aims at the development and propagation of a new concept of a universal distributed Automotive Service Delivery Platform (ASDP) as an enabler for future automotive functionalities, not limited to ITS applications. Since Machine-to-Machine communication (M2M) is considered a primary building block for the IoT, emergent standards such as the oneM2M service platform are selected as the initial architectural hypothesis for the realisation of an ASDP.
Accordingly, this project describes a oneM2M-based ASDP as a reference configuration of the oneM2M service platform for automotive environments. In the research, the general applicability of the oneM2M service platform for the proposed ASDP is shown. However, the research also identifies shortcomings of the current oneM2M platform with respect to the capabilities needed for efficient communication and data exchange policies. It is pointed out that, for example, distributed traffic efficiency or vehicle maintenance functionalities are not efficiently treated by the standard. This may also have negative privacy impacts. Following this analysis, this research proposes novel enhancements to the oneM2M service platform, such as application-data-dependent criteria for data exchange and policy aggregation. The feasibility and advancements of the newly proposed approach are evaluated by means of proof-of-concept implementation and experiments with selected automotive scenarios. The results show the benefits of the proposed enhancements for a oneM2M-based ASDP, without neglecting to indicate their advantages for other domains of the oneM2M landscape where they could be applied as well.
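One way to picture an "application-data-dependent criterion for data exchange" is a filter that forwards a reading only when it differs enough from the last one sent. The sketch below is a guess at the general idea, not the enhancement proposed in the thesis and not part of any oneM2M API; the class name and `min_delta` parameter are hypothetical.

```python
class DeltaCriterion:
    """Forward a reading only if it differs from the last forwarded one by at least min_delta."""

    def __init__(self, min_delta: float):
        self.min_delta = min_delta
        self.last_sent = None  # value of the most recently forwarded reading

    def should_send(self, value: float) -> bool:
        if self.last_sent is None or abs(value - self.last_sent) >= self.min_delta:
            self.last_sent = value
            return True
        return False
```

A vehicle reporting speed with `DeltaCriterion(5.0)` would stay silent while cruising and transmit only on meaningful changes, which reduces both traffic and the amount of data leaving the vehicle.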
25

[en] CONTINUOUS SERVICE DISCOVERY IN IOT / [pt] DESCOBERTA CONTÍNUA DE SERVIÇOS EM IOT

FELIPE OLIVEIRA CARVALHO 28 July 2017
The popularization of the Internet of Things (IoT) has sparked a growing opportunity to create applications in various areas by combining sensors and/or actuators. In IoT environments, elements called gateways provide an intermediate communication layer between IoT devices and cloud services. A crucial factor in building large-scale applications is that IoT devices can be used transparently, in a service-oriented paradigm, where the details of communicating with and configuring these objects are handled by the gateways. In the service model, applications must discover the high-level interfaces of the devices and do not have to deal with the underlying details, which are handled by the gateways. In scenarios of high dynamism and mobility (with devices connecting and disconnecting all the time), this discovery and configuration must occur continuously. Traditional service discovery protocols, such as Universal Plug and Play (UPnP) or the Service Location Protocol (SLP), were not designed with the high dynamism of IoT environments in mind. We therefore introduce complex event processing (CEP), a technology for real-time processing of heterogeneous event streams that allows queries written in CQL (Continuous Query Language) to search for events of interest. In a model where events related to sensor discovery are sent to a CEP stream, expressive queries are written so that an application continuously discovers services of interest.
This work presents an extension of MHub/CDDL that supports continuous service discovery in IoT using CEP. MHub/CDDL (Mobile Hub / Context Data Distribution Layer) is a middleware for service discovery and quality-of-context management in IoT, developed in a partnership between the Laboratory for Advanced Collaboration (LAC) at PUC-Rio and the Laboratório de Sistemas Distribuídos Inteligentes (LSDi) at the Universidade Federal do Maranhão (UFMA). The work is implemented for the Android (Java) platform, and a case study in the domain of smart parking is conducted and implemented to illustrate the use of the continuous discovery mechanism.
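The continuous-discovery model described here, where discovery events flow into a stream and standing queries pick out services of interest, might be sketched as below. This is not the MHub/CDDL API and does not use CQL; Python predicates stand in for continuous queries, and all names (`DiscoveryEvent`, `ContinuousDiscovery`, the service strings) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DiscoveryEvent:
    kind: str       # "found" or "lost"
    device_id: str
    service: str    # e.g. "parking-occupancy"


class ContinuousDiscovery:
    """Routes discovery events to standing queries registered by applications."""

    def __init__(self):
        self.queries = []     # list of (predicate, callback) pairs
        self.available = {}   # (device_id, service) -> last "found" event

    def subscribe(self, predicate: Callable[[DiscoveryEvent], bool],
                  callback: Callable[[DiscoveryEvent], None]) -> None:
        self.queries.append((predicate, callback))

    def publish(self, event: DiscoveryEvent) -> None:
        key = (event.device_id, event.service)
        if event.kind == "found":
            self.available[key] = event
        else:
            self.available.pop(key, None)
        # Each standing query is evaluated on every event, so the application
        # learns about connections and disconnections as they happen.
        for predicate, callback in self.queries:
            if predicate(event):
                callback(event)
```

Unlike a one-shot lookup, the subscription stays active, so a parking application keeps receiving updates as sensors come and go.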
26

[pt] CEP DISTRIBUÍDO PARA AQUISIÇÃO E PROCESSAMENTO DE INFORMAÇÃO ADAPTATIVOS CIENTES DE CONTEXTO / [en] DISTRIBUTED CEP FOR CONTEXT-AWARE ADAPTIVE ACQUIREMENT AND PROCESSING OF INFORMATION

FERNANDO BENEDITO VERAS MAGALHAES 07 June 2021
The current spread of IoT is increasing the deployment of stream processing solutions for monitoring and controlling elements of the real world. One of those solutions is Complex Event Processing (CEP). Initially, a single computer or cluster would concentrate all the CEP execution. However, centralized execution of CEP is not suitable for coping with the high volume, velocity, and volatility of IoT sensors' data streams. Instead, applications using CEP should deploy a distributed CEP Event Processing Network, preferably with CEP agents both in the cloud and on edge devices. Deciding how to split the processing among these tiers and their devices is just as important as the decentralization itself. Being aware of each device's current context, for instance its location and available sensors, can help to collect and (partially) process the data on devices close to where the data were produced.
This work presents a context-aware distributed CEP platform called Global CEP Manager (GCM). GCM is a service of the ContextNet middleware that supports the context-based deployment and dynamic rearrangement of CEP queries on CEP engines running in the cloud, on stationary edge devices, and on M-Hubs, which are ContextNet's mobile edge devices. GCM uses the ContextMatcher, which is also part of this work. ContextMatcher is a module for ContextNet applications that enables the delivery of messages to nodes whose context matches a specified set of contextual requirements.
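Context-based delivery of the kind attributed to ContextMatcher can be illustrated with a small matching function. The sketch assumes contexts are flat dictionaries and that a requirement is either an exact value, a set of items the node must have, or a predicate; none of this reflects the real ContextNet interfaces, and the node names are invented.

```python
def matches(context: dict, requirements: dict) -> bool:
    """Return True if the node context satisfies every requirement."""
    for key, required in requirements.items():
        if key not in context:
            return False
        actual = context[key]
        if callable(required):
            # Predicate requirement, e.g. lambda battery: battery > 20
            if not required(actual):
                return False
        elif isinstance(required, (set, frozenset)):
            # Set requirement: the node must offer at least these items.
            if not required <= set(actual):
                return False
        elif actual != required:
            return False
    return True


def select_nodes(nodes: dict, requirements: dict) -> list:
    """Return the ids of the nodes a message (or CEP query) should be delivered to."""
    return sorted(node_id for node_id, ctx in nodes.items()
                  if matches(ctx, requirements))
```

A manager could call `select_nodes` again whenever a node's context changes and redeploy queries to the new result, which is one plausible reading of "dynamic rearrangement".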
27

Semantically-enabled stream processing and complex event processing over RDF graph streams / Traitement de flux sémantiquement activé et traitement d'évènements complexes sur des flux de graphe RDF

Gillani, Syed 04 November 2016
French abstract not provided by the author. / There is a paradigm shift in the nature of today’s data and the means of processing it: data used to be mostly static, stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data continuously in the form of streams. The users of these applications therefore expect to process large volumes of data and obtain fresh, low-latency results. This has led to the introduction of Data Stream Management Systems (DSMSs) and the Complex Event Processing (CEP) paradigm, each with distinct aims: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which raises the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges such as volume and velocity are overlooked by existing approaches. These challenges require customised optimisations that treat RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insight into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithms and data structures, we also contribute to the design of a new query language for SCEP. Our contributions in these two fields are as follows:
• RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within a stream consists of an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner.
• Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, Kleene-+, negation, optional, conjunction, disjunction, and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automata model to evaluate these operators in an optimised manner.
We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied our proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of our proposed methods.
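To make the automaton-based evaluation of sequencing and Kleene-+ concrete, here is a minimal sketch of a non-deterministic matcher for patterns such as SEQ(A, B+, C). It is not the thesis's query language or engine: events are reduced to type labels rather than RDF graphs, and the matcher assumes strict contiguity (any non-matching event discards a partial match), which is only one of the possible event selection strategies.

```python
def make_matcher(steps):
    """Build a matcher for a sequence pattern.

    steps: list of (event_type, is_kleene_plus) pairs, e.g.
           [("A", False), ("B", True), ("C", False)] for SEQ(A, B+, C).
    Returns a function that is fed one event type at a time and returns
    True whenever a full match completes on that event.
    """
    n = len(steps)
    active = {0}  # NFA states; state i means the first i steps are matched

    def feed(event_type):
        nonlocal active
        nxt = {0}  # a new match may always start at the next event
        for state in active:
            # Advance: the event matches the next expected step.
            if state < n and steps[state][0] == event_type:
                nxt.add(state + 1)
            # Loop: the event repeats a Kleene-+ step that was just matched.
            if state > 0 and steps[state - 1][1] and steps[state - 1][0] == event_type:
                nxt.add(state)
        matched = n in nxt
        nxt.discard(n)  # the accepting state is reported, not kept
        active = nxt
        return matched

    return feed
```

Keeping a set of active states is what makes the automaton non-deterministic: after an A, the matcher simultaneously tracks the partial match and the possibility that a fresh match begins later.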
28

An Efficient, Extensible, Hardware-aware Indexing Kernel

Sadoghi Hamedani, Mohammad 20 June 2014
Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially abundance of main memory, multi-core architectures, hardware accelerators, and solid state drives. This thesis focuses on presenting our query indexing techniques to cope with processing queries in data-intensive applications that are susceptible to ever increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, an effective Top-k processing, and adaptive parallel algorithms to operate directly on compressed data (that exploits the multi-core architecture). Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms. In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. 
To this end, we redesign the data structures in the storage hierarchy to employ an extra level of indirection over solid state drives. This indirection layer dramatically reduces the number of magnetic disk I/Os needed for updating indexes and localizes index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update and query performance and substantially reduce the cost of index maintenance and query processing.
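The effect of an indirection layer on index maintenance can be shown with an in-memory toy. This is a sketch of the general idea only, not the thesis's storage design: a Python list stands in for append-only versioned storage, a dictionary stands in for the indirection table (which the abstract places on solid state drives), and the class and field names are hypothetical.

```python
class VersionedTable:
    """Indexes store logical ids; an indirection map resolves them to the newest version."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        self.versions = []       # append-only storage; list position acts as a physical address
        self.indirection = {}    # logical id -> position of the latest version
        self.index = {}          # indexed value -> set of logical ids
        self.index_updates = 0   # how often the index itself had to be modified

    def insert(self, lid, record: dict) -> None:
        self.versions.append(dict(record))
        self.indirection[lid] = len(self.versions) - 1
        self.index.setdefault(record[self.indexed_field], set()).add(lid)
        self.index_updates += 1

    def update(self, lid, **changes) -> None:
        old = self.versions[self.indirection[lid]]
        new = {**old, **changes}
        self.versions.append(new)
        # Only the indirection entry moves to the new version.
        self.indirection[lid] = len(self.versions) - 1
        # The index is touched only if the indexed value itself changed.
        if new[self.indexed_field] != old[self.indexed_field]:
            self.index[old[self.indexed_field]].discard(lid)
            self.index.setdefault(new[self.indexed_field], set()).add(lid)
            self.index_updates += 1

    def lookup(self, value) -> list:
        return [self.versions[self.indirection[lid]]
                for lid in sorted(self.index.get(value, ()))]
```

Because the index holds logical ids rather than physical addresses, creating a new version of a record does not invalidate any index entry unless the indexed attribute changed.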
30

Obtenção de padrões sequenciais em data streams atendendo requisitos do Big Data / Mining Sequential Patterns in Data Streams to Meet Big Data Requirements

Carvalho, Danilo Codeco 06 June 2016
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / The growing amount of data produced daily on the web, by both businesses and individuals, has increased the demand for analyzing this data and extracting knowledge from it. Over the last two decades the solution was to store the data and run data mining algorithms on it, but this has now become unviable even for supercomputers. In addition, the requirements of the Big Data era go far beyond the large amount of data to analyze: response time requirements and the complexity of the data carry more weight in many real-world domains. New models have been researched and developed, often proposing distributed computing or different ways of handling data stream mining. Current research shows that one alternative in data stream mining is to combine a real-time event handling mechanism with classic algorithms for mining association rules or sequential patterns. This work presents a data stream mining approach that meets the Big Data response time requirement by linking the real-time event handling mechanism Esper with the Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. 
The results show that it is possible to bring a static data mining algorithm to a data stream environment and preserve the trends in the patterns found, even though it is not possible to read all the data arriving continuously in the stream.
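The general pattern this abstract describes, a windowing mechanism that hands bounded batches of a stream to a batch-style pattern miner, can be sketched in a few lines of Python. This is an illustration only and an assumption throughout: the sliding window is a plain `deque`, not Esper, and the miner counts ordered item pairs, a much simpler stand-in for IncMSTS. The function names `frequent_pairs` and `mine_stream` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def frequent_pairs(window, min_support):
    """Count ordered pairs (a occurs before b) across the sequences in the window."""
    counts = Counter()
    for seq in window:
        seen = set()
        for i, j in combinations(range(len(seq)), 2):   # i < j, so order is preserved
            seen.add((seq[i], seq[j]))
        counts.update(seen)                              # each sequence supports a pair at most once
    return {pair: c for pair, c in counts.items() if c >= min_support}


def mine_stream(events, window_size, min_support):
    """Re-run the static miner each time the sliding window is full."""
    window = deque(maxlen=window_size)                   # oldest sequence is evicted automatically
    for seq in events:
        window.append(seq)
        if len(window) == window_size:
            yield frequent_pairs(window, min_support)


stream = [("a", "b"), ("a", "c", "b"), ("b", "a")]
results = list(mine_stream(stream, window_size=2, min_support=2))
```

Because the miner only ever sees the current window, patterns reflect recent trends rather than the full history, which matches the abstract's observation that not all incoming data can be read.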
