Spelling suggestions: "subject:"complex every""
41 |
[en] A MOBILE AND ONLINE OUTLIER DETECTION OVER MULTIPLE DATA STREAMS: A COMPLEX EVENT PROCESSING APPROACH FOR DRIVING BEHAVIOR DETECTION / [pt] DETECÇÃO MÓVEL E ONLINE DE ANOMALIA EM MÚLTIPLOS FLUXOS DE DADOS: UMA ABORDAGEM BASEADA EM PROCESSAMENTO DE EVENTOS COMPLEXOS PARA DETECÇÃO DE COMPORTAMENTO DE CONDUÇÃOIGOR OLIVEIRA VASCONCELOS 24 July 2017 (has links)
[pt] Dirigir é uma tarefa diária que permite uma locomoção mais rápida e mais confortável, no entanto, mais da metade dos acidentes fatais estão relacionados à imprudência. Manobras imprudentes podem ser detectadas com boa precisão, analisando dados relativos à interação motorista-veículo, por exemplo, curvas, aceleração e desaceleração abruptas. Embora existam algoritmos para detecção online de anomalias, estes normalmente são projetados para serem executados em computadores com grande poder computacional. Além disso, geralmente visam escala através da computação paralela, computação em grid ou computação em nuvem. Esta tese apresenta uma abordagem baseada em complex event processing para a detecção online de anomalias e classificação do comportamento de condução. Além disso, objetivamos identificar se dispositivos móveis com poder computacional limitado, como os smartphones, podem ser usados para uma detecção online do comportamento de condução. Para isso, modelamos e avaliamos três algoritmos de detecção online de anomalia no paradigma de processamento de fluxos de dados, que recebem os dados dos sensores do smartphone e dos sensores à bordo do veículo como entrada. As vantagens que o processamento de fluxos de dados proporciona reside no fato de que este reduz a quantidade de dados transmitidos do dispositivo móvel para servidores/nuvem, bem como se reduz o consumo de energia/bateria devido à transmissão de dados dos sensores e possibilidade de operação mesmo se o dispositivo móvel estiver desconectado. Para classificar os motoristas, um mecanismo estatístico utilizado na mineração de documentos que avalia a importância de uma palavra em uma coleção de documentos, denominada frequência de documento inversa, foi adaptado para identificar a importância de uma anomalia em um fluxo de dados, e avaliar quantitativamente o grau de prudência ou imprudência das manobras dos motoristas. Finalmente, uma avaliação da abordagem (usando o algoritmo que obteve melhor resultado na primeira etapa) foi realizada através de um estudo de caso do comportamento de condução de 25 motoristas em cenário real. Os resultados mostram uma acurácia de classificação de 84 por cento e um tempo médio de processamento de 100 milissegundos. / [en] Driving is a daily task that allows individuals to travel faster and more comfortably, however, more than half of fatal crashes are related to recklessness driving behaviors. Reckless maneuvers can be detected with accuracy by analyzing data related to driver-vehicle interactions, abrupt turns, acceleration, and deceleration, for instance. Although there are algorithms for online anomaly detection, they are usually designed to run on computers with high computational power. In addition, they typically target scale through parallel computing, grid computing, or cloud computing. This thesis presents an online anomaly detection approach based on complex event processing to enable driving behavior classification. In addition, we investigate if mobile devices with limited computational power, such as smartphones, can be used for online detection of driving behavior. To do so, we first model and evaluate three online anomaly detection algorithms in the data stream processing paradigm, which receive data from the smartphone and the in-vehicle embedded sensors as input. The advantages that stream processing provides lies in the fact that reduce the amount of data transmitted from the mobile device to servers/the cloud, as well as reduce the energy/battery usage due to transmission of sensor data and possibility to operate even if the mobile device is disconnected. To classify the drivers, a statistical mechanism used in document mining that evaluates the importance of a word in a collection of documents, called inverse document frequency, has been adapted to identify the importance of an anomaly in a data stream, and then quantitatively evaluate how cautious or reckless drivers maneuvers are. Finally, an evaluation of the approach (using the algorithm that achieved better result in the first step) was carried out through a case study of the 25 drivers driving
behavior. The results show an accuracy of 84 percent and an average processing time of 100 milliseconds.
|
42 |
Combinaison de l’Internet des objets, du traitement d’évènements complexes et de la classification de séries temporelles pour une gestion proactive de processus métier / Combining the Internet of things, complex event processing, and time series classification for a proactive business process management.Mousheimish, Raef 27 October 2017 (has links)
L’internet des objets est au coeur desprocessus industriels intelligents grâce à lacapacité de détection d’évènements à partir dedonnées de capteurs. Cependant, beaucoup resteà faire pour tirer le meilleur parti de cettetechnologie récente et la faire passer à l’échelle.Cette thèse vise à combler le gap entre les fluxmassifs de données collectées par les capteurs etleur exploitation effective dans la gestion desprocessus métier. Elle propose une approcheglobale qui combine le traitement de flux dedonnées, l’apprentissage supervisé et/oul’utilisation de règles sur des évènementscomplexes permettant de prédire (et doncéviter) des évènements indésirables, et enfin lagestion des processus métier étendue par cesrègles complexes.Les contributions scientifiques de cette thèse sesituent dans différents domaines : les processusmétiers plus intelligents et dynamiques; letraitement d’évènements complexes automatisépar l’apprentissage de règles; et enfin et surtout,dans le domaine de la fouille de données deséries temporelles multivariéespar la prédiction précoce de risques.L’application cible de cette thèse est le transportinstrumenté d’oeuvres d’art / Internet of things is at the core ofsmart industrial processes thanks to its capacityof event detection from data conveyed bysensors. However, much remains to be done tomake the most out of this recent technologyand make it scale. This thesis aims at filling thegap between the massive data flow collected bysensors and their effective exploitation inbusiness process management. It proposes aglobal approach, which combines stream dataprocessing, supervised learning and/or use ofcomplex event processing rules allowing topredict (and thereby avoid) undesirable events,and finally business process managementextended to these complex rules. The scientificcontributions of this thesis lie in several topics:making the business process more intelligentand more dynamic; automation of complexevent processing by learning the rules; and lastand not least, in datamining for multivariatetime series by early prediction of risks. Thetarget application of this thesis is theinstrumented transportation of artworks.
|
43 |
[en] DSCEP: AN INFRASTRUCTURE FOR DECENTRALIZED SEMANTIC COMPLEX EVENT PROCESSING / [pt] DSCEP: UMA INFRESTRUTURA DISTRIBUÍDA PARA PROCESSAMENTO DE EVENTOS COMPLEXOS SEMÂNTICOSVITOR PINHEIRO DE ALMEIDA 28 October 2021 (has links)
[pt] Muitas aplicações necessitam do processamento de eventos de streeams de
fontes diferentes em combinação com grandes quantidades de dados de bases de
conhecimento. CEP Semântico é um paradigma especificamente designado
para isso, ele extende o processamento complexo de eventos (CEP) para
adicionar o suporte para a linguagem RDF e utiliza uma rede de operadores
para processar streams RDF em combinação com bases de conhecimento em
RDF. Outra classe popular de sistemas projetados para um proposito similar
são os processadores de stream RDF (RSPs). Estes são sistemas que extendem a
linguagem SPARQL (a linguaguem de query padrão para RDF) para adicionar
a capacidade de fazer queries em stream. CEP Semântico e RSPs possuem
propositos similares porém focam em objetivos diferentes. O CEP Semântico,
foca na scalabilidade e processamento distribuido enquanto os RSPs focam nos
desafios do processamento de streams RDF. Nesta tese, propomos o uso de
RSPs como unidades para processamento de streams RDF dentro do contexto
de CEP Semântico. Apresentamos uma infraestrutura, chamada DSCEP, que
permite o encapsulamento de RSPs existentes em operadores do estilo CEP,
de maneira que estes RSPs possam ser interconectados formando uma rede
de operadores distribuída e descentralizada. DSCEP lida com os desafios e
obstáculos desta interconexão, como comunicação confiável, divisão e agregação
de streams, identificação de eventos e time-stamping, etc., permitindo que os
usuários se concentrem nas consultas. Também discutimos nesta tese como o
DSCEP pode ser usado para diminuir o tempo de processamento de consultas
SPARQL monolíticas, seja dividindo-as em subconsultas e operando-as em
paralelo através do uso de operadores ou seja dividingo a stream de entrada
em multiplos operadores que possuem a mesma query e são executados em
paralelo. Além disso também é avaliado o impacto que a base de conhecimento
possui no tempo de processamento de queires contínuas. / [en] Many applications require the processing of event streams from different
sources in combination with large amounts of background knowledge. Semantic
CEP is a paradigm explicitly designed for that. It extends complex event
processing (CEP) with RDF support and uses a network of operators to process
RDF streams combined with RDF knowledge bases. Another popular class of
systems designed for a similar purpose is the RDF stream processors (RSPs).
These are systems that extend SPARQL (the RDF query language) with stream
processing capabilities. Semantic CEP and RSPs have similar purposes but
focus on different things. The former focuses on scalability and distributed
processing, while the latter tends to focus on the intricacies of RDF stream
processing per se. In this thesis, we propose the use of RSP engines as building
blocks for Semantic CEP. We present an infrastructure, called DSCEP, that
allows the encapsulation of existing RSP engines into CEP-like operators so
that these can be seamlessly interconnected in a distributed, decentralized
operator network. DSCEP handles the hurdles of such interconnection, such
as reliable communication, stream aggregation and slicing, event identification
and time-stamping, etc., allowing users to concentrate on the queries. We also
discuss how DSCEP can be used to speed up monolithic SPARQL queries; by
splitting them into parallel subqueries that can be executed by the operator
network or even by splitting the input stream into multiple operators with the
same query running in parallel. Additionally, we evaluate the impact of the
knowledge base on the processing time of SPARQL continuous queries.
|
44 |
Recognition Of Complex Events In Open-source Web-scale Videos: Features, Intermediate Representations And Their Temporal InteractionsBhattacharya, Subhabrata 01 January 2013 (has links)
Recognition of complex events in consumer uploaded Internet videos, captured under realworld settings, has emerged as a challenging area of research across both computer vision and multimedia community. In this dissertation, we present a systematic decomposition of complex events into hierarchical components and make an in-depth analysis of how existing research are being used to cater to various levels of this hierarchy and identify three key stages where we make novel contributions, keeping complex events in focus. These are listed as follows: (a) Extraction of novel semi-global features – firstly, we introduce a Lie-algebra based representation of dominant camera motion present while capturing videos and show how this can be used as a complementary feature for video analysis. Secondly, we propose compact clip level descriptors of a video based on covariance of appearance and motion features which we further use in a sparse coding framework to recognize realistic actions and gestures. (b) Construction of intermediate representations – We propose an efficient probabilistic representation from low-level features computed from videos, based on Maximum Likelihood Estimates which demonstrates state of the art performance in large scale visual concept detection, and finally, (c) Modeling temporal interactions between intermediate concepts – Using block Hankel matrices and harmonic analysis of slowly evolving Linear Dynamical Systems, we propose two new discriminative feature spaces for complex event recognition and demonstrate significantly improved recognition rates over previously proposed approaches.
|
45 |
General dynamic Yannakakis: Conjunctive queries with theta joins under updatesIdris, Muhammad, Ugarte, Martín, Vansummeren, Stijn, Voigt, Hannes, Lehner, Wolfgang 17 July 2023 (has links)
The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we have proposed general dynamic Yannakakis (GDYN), a general framework for dynamically processing acyclic conjunctive queries with θ-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults (to avoid inefficient recomputation) and recomputation of subresults (to avoid the potentially large space overhead of materialization), GDYN is able to avoid this trade-off. It intelligently maintains a succinct data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In this paper, we consolidate and extend the development of GDYN. First, we give full formal proof of GDYN ’s correctness and complexity. Second, we present a novel algorithm for computing GDYN query plans. Finally, we instantiate GDYN to the case where all θ-joins are inequalities and present extended experimental comparison against state-of-the-art engines. Our approach performs consistently better than the competitor systems with multiple orders of magnitude improvements in both time and memory consumption.
|
46 |
[en] AN ENERGY-AWARE IOT GATEWAY, WITH CONTINUOUS PROCESSING OF SENSOR DATA / [pt] UM ENERGY-AWARE IOT GATEWAY, COM PROCESSAMENTO CONTÍNUO DE DADOS DE SENSORLUIS EDUARDO TALAVERA RIOS 30 August 2016 (has links)
[pt] Poucos estudos têm investigado e propôs uma solução de middleware
para a Internet das Coisas Móveis (IoMT), onde as coisas inteligentes
(Objetos Inteligente) podem ser movidos, ou podem mover-se de forma
autônoma, mas permanecem acessíveis a partir de qualquer outro computador
através da Internet. Neste contexto, existe uma necessidade de gateways
com eficiência energética para fornecer conectividade para uma grande variedade
de objetos inteligentes. As soluções propostas têm mostrado que
os dispositivos móveis (smartphones e tablets) são uma boa opção para se
tornar os intermediários universais, proporcionando um ponto de conexão
para os objetos inteligentes vizinhos com tecnologias de comunicação de
curto alcance. No entanto, eles só se preocupam apenas sobre a transmissão
de dados de sensores-primas (obtido a partir de objetos inteligentes conectados)
para a nuvem onde o processamento (e.g. agregação) é executada.
Comunicação via Internet é uma atividade de forte drenagem da bateria em
dispositivos móveis; Além disso, a largura de banda pode não ser suficiente
quando grandes quantidades de informação estão sendo recebidas dos objetos
inteligentes. Por isso, consideramos que uma parte do processamento
deve ser empurrada tão perto quanto possível das fontes. A respeito disso,
processamento de eventos complexos (CEP) é muitas vezes usado para o
processamento em tempo real de dados heterogêneos e pode ser uma tecnologia
chave para ser incluído nas Gateways. Ele permite uma maneira
de descrever o processamento como consultas expressivas que podem ser
implantados ou removidos dinamicamente no vôo. Assim, sendo adequado
para aplicações que têm de lidar com adaptação dinâmica de processamento
local. Esta dissertação descreve uma extensão de um middleware móvel com
a inclusão de processamento contínuo dos dados do sensor, a sua concepção
e implementação de um protótipo para Android. Experimentos têm mostrado
que a nossa implementação proporciona uma boa redução no consumo
de energia e largura de banda. / [en] Few studies have investigated and proposed a middleware solution for
the Internet of Mobile Things (IoMT), where the smart things (Smart Objects)
can be moved, or else can move autonomously, but remain accessible
from any other computer over the Internet. In this context, there is a need
for energy-efficient gateways to provide connectivity to a great variety of
Smart Objects. Proposed solutions have shown that mobile devices (smartphones
and tablets) are a good option to become the universal intermediates
by providing a connection point to nearby Smart Objects with short-range
communication technologies. However, they only focus on the transmission
of raw sensor data (obtained from connected Smart Objects) to the cloud
where processing (e.g. aggregation) is performed. Internet Communication
is a strong battery-draining activity for mobile devices; moreover, bandwidth
may not be sufficient when large amounts of information is being
received from the Smart Objects. Hence, we argue that some of the processing
should be pushed as close as possible to the sources. In this regard,
Complex Event Processing (CEP) is often used for real-time processing of
heterogeneous data and could be a key technology to be included in the
gateways. It allows a way to describe the processing as expressive queries
that can be dynamically deployed or removed on-the-
fly. Thus, being suitable
for applications that have to deal with dynamic adaptation of local
processing. This dissertation describes an extension of a mobile middleware
with the inclusion of continuous processing of sensor data, its design and
prototype implementation for Android. Experiments have shown that our
implementation delivers good reduction in energy and bandwidth consumption.
|
47 |
Parallel Execution of Order Dependent Grouping FunctionsPeters, Mathias 29 October 2024 (has links)
Der exponentielle Anstieg elektronisch gespeicherter Daten erfordert leistungsfähige Systeme zur Verarbeitung und Analyse großer Datenmengen. Parallel relationale Datenbanksysteme (PRDBMS) waren lange Zeit der Standard für analytische Abfragen. Neuere Systeme, wie Apache Flink, Tez und Spark, nutzen erweiterte Ansätze zur Analyse und trennen logische Spezifikationen von physischen Ausführungen. Ein weit verbreitetes Optimierungsverfahren in der analytischen Verarbeitung ist die partielle Aggregation, bei der Aggregation in zwei Stufen erfolgt: Zunächst werden partielle Aggregatgruppen erstellt, die dann zusammengeführt werden, um das Endergebnis zu berechnen. Dieses Verfahren ermöglicht eine parallele Verarbeitung und reduziert die Größe der Zwischenergebnisse.
Bisherige Ansätze konzentrieren sich auf ordnungsunabhängige Gruppierungsfunktionen, bei denen Elemente ohne Berücksichtigung der Reihenfolge gruppiert werden können. In der Praxis gibt es jedoch auch ordnungsabhängige Gruppierungsfunktionen, die von der Reihenfolge der Eingaben abhängen und komplexer in der parallelen Ausführung sind. Derzeit existieren nur begrenzte Ansätze für eine effiziente Parallelisierung solcher Funktionen.
Diese Dissertation präsentiert einen neuen Ansatz zur Parallelisierung von Aggregationsanfragen für drei ordnungsabhängige Gruppierungsfunktionen: Sessionization, Regular Expression Matching (REM) und Complex Event Recognition (CER). Unsere Methode nutzt zerlegbare Aggregationsfunktionen, um eine effiziente parallele Ausführung in modernen Shared-Nothing-Compute-Umgebungen zu ermöglichen. Die stufenweise Ausführung dieser Funktionen eröffnet neue Optimierungsmöglichkeiten. Unser Ansatz erlaubt es Optimierungsalgorithmen, zwischen sequentiellen und stufenweisen Verfahren zu wählen. Zusätzlich schlägt die Arbeit ein Schema vor, wie weitere Gruppierungsfunktionen zerlegt und in die partielle Aggregation integriert werden können. / Advances in information technologies and decreasing cost for storage and compute capacities lead to exponential growth of data being available electronically worldwide. Systems capable of processing these large amounts of data with the goal of analyzing and extracting information are essential for both: research and businesses. Analytical data processing systems employ various optimizations to execute queries efficiently.
Partial Aggregation (PA) using GroupBy and decomposable aggregation functions is a common optimization approach in analytical query processing. Analytical systems execute PA in two stages: During the first stage, they create partial groups to compute partial aggregates. During the second stage, the partial aggregates are grouped and aggregated again to produce the final result. The main benefits of PA are an increased potential of parallel execution during the first stage and a reduction of intermediate result sizes by aggregating over the partial groups. So far, existing approaches to PA only use an order-agnostic grouping function on sets to create groups.
There are grouping functions that depend on ordered input and information on previously processed input items to associate a given input item to its group. Staged execution of order-dependent grouping functions is more difficult than for order-agnostic grouping functions. Systems must compute correct partial states during the first stage and combine them during the final stage. Approaches for efficient parallel execution only exist in a limited way despite the high practical relevance.
In this thesis, we present a novel approach for parallelizing aggregation for three order-dependent grouping functions: Sessionization, Regular Expression Matching (REM), and Complex Event Recognition (CER). Our approach of computing the three grouping functions in stages combined with decomposable aggregation functions allows for efficient parallel execution in state-of-the-art shared-nothing compute environments.
|
48 |
Real-time Business Intelligence through Compact and Efficient Query Processing Under UpdatesIdris, Muhammad 05 March 2019 (has links) (PDF)
Responsive analytics are rapidly taking over the traditional data analytics dominated by the post-fact approaches in traditional data warehousing. Recent advancements in analytics demand placing analytical engines at the forefront of the system to react to updates occurring at high speed and detect patterns, trends, and anomalies. These kinds of solutions find applications in Financial Systems, Industrial Control Systems, Business Intelligence and on-line Machine Learning among others. These applications are usually associated with Big Data and require the ability to react to constantly changing data in order to obtain timely insights and take proactive measures. Generally, these systems specify the analytical results or their basic elements in a query language, where the main task then is to maintain query results under frequent updates efficiently. The task of reacting to updates and analyzing changing data has been addressed in two ways in the literature: traditional business intelligence (BI) solutions focus on historical data analysis where the data is refreshed periodically and in batches, and stream processing solutions process streams of data from transient sources as flows of data items. Both kinds of systems share the niche of reacting to updates (known as dynamic evaluation), however, they differ in architecture, query languages, and processing mechanisms. In this thesis, we investigate the possibility of a reactive and unified framework to model queries that appear in both kinds of systems.In traditional BI solutions, evaluating queries under updates has been studied under the umbrella of incremental evaluation of queries that are based on the relational incremental view maintenance model and mostly focus on queries that feature equi-joins. Streaming systems, in contrast, generally follow automaton based models to evaluate queries under updates, and they generally process queries that mostly feature comparisons of temporal attributes (e.g. timestamp attributes) along with comparisons of non-temporal attributes over streams of bounded sizes. Temporal comparisons constitute inequality constraints while non-temporal comparisons can either be equality or inequality constraints. Hence these systems mostly process inequality joins. As a starting point for our research, we postulate the thesis that queries in streaming systems can also be evaluated efficiently based on the paradigm of incremental evaluation just like in BI systems in a main-memory model. The efficiency of such a model is measured in terms of runtime memory footprint and the update processing cost. To this end, the existing approaches of dynamic evaluation in both kinds of systems present a trade-off between memory footprint and the update processing cost. More specifically, systems that avoid materialization of query (sub)results incur high update latency and systems that materialize (sub)results incur high memory footprint. We are interested in investigating the possibility to build a model that can address this trade-off. In particular, we overcome this trade-off by investigating the possibility of practical dynamic evaluation algorithm for queries that appear in both kinds of systems and present a main-memory data representation that allows to enumerate query (sub)results without materialization and can be maintained efficiently under updates. We call this representation the Dynamic Constant Delay Linear Representation (DCLRs).We devise DCLRs with the following properties: 1) they allow, without materialization, enumeration of query results with bounded-delay (and with constant delay for a sub-class of queries), 2) they allow tuple lookup in query results with logarithmic delay (and with constant delay for conjunctive queries with equi-joins only), 3) they take space linear in the size of the database, 4) they can be maintained efficiently under updates. We first study the DCLRs with the above-described properties for the class of acyclic conjunctive queries featuring equi-joins with projections and present the dynamic evaluation algorithm called the Dynamic Yannakakis (DYN) algorithm. Then, we present the generalization of the DYN algorithm to the class of acyclic queries featuring multi-way Theta-joins with projections and call it Generalized DYN (GDYN). We devise DCLRs with the above properties for acyclic conjunctive queries, and the working of DYN and GDYN over DCLRs are based on a particular variant of join trees, called the Generalized Join Trees (GJTs) that guarantee the above-described properties of DCLRs. We define GJTs and present algorithms to test a conjunctive query featuring Theta-joins for acyclicity and to generate GJTs for such queries. We extend the classical GYO algorithm from testing a conjunctive query with equalities for acyclicity to testing a conjunctive query featuring multi-way Theta-joins with projections for acyclicity. We further extend the GYO algorithm to generate GJTs for queries that are acyclic.GDYN is hence a unified framework based on DCLRs that enables processing of queries that appear in streaming systems as well as in BI systems in a unified main-memory model and addresses the space-time trade-off. We instantiate GDYN to the particular case where all Theta-joins involve only equalities and inequalities and call this instantiation IEDYN. We implement DYN and IEDYN as query compilers that generate executable programs in the Scala programming language and provide all the necessary data structures and their maintenance and enumeration methods in a continuous stream processing model. We evaluate DYN and IEDYN against state-of-the-art BI and streaming systems on both industrial and synthetically generated benchmarks. We show that DYN and IEDYN outperform the existing systems by over an order of magnitude efficiency in both memory footprint and update processing time. / Doctorat en Sciences de l'ingénieur et technologie / info:eu-repo/semantics/nonPublished
|
49 |
Real-time Business Intelligence through Compact and Efficient Query Processing Under UpdatesIdris, Muhammad 10 April 2019 (has links)
Responsive analytics are rapidly taking over the traditional data analytics dominated by the post-fact approaches in traditional data warehousing. Recent advancements in analytics demand placing analytical engines at the forefront of the system to react to updates occurring at high speed and detect patterns, trends and anomalies. These kinds of solutions find applications in Financial Systems, Industrial Control Systems, Business Intelligence and on-line Machine Learning among others. These applications are usually associated with Big Data and require the ability to react to constantly changing data in order to obtain timely insights and take proactive measures. Generally, these systems specify the analytical results or their basic elements in a query language, where the main task then is to maintain these results under frequent updates efficiently. The task of reacting to updates and analyzing changing data has been addressed in two ways in the literature: traditional business intelligence (BI) solutions focus on historical data analysis where the data is refreshed periodically and in batches, and stream processing solutions process streams of data from transient sources as flow (or set of flows) of data items. Both kinds of systems share the niche of reacting to updates (known as dynamic evaluation); however, they differ in architecture, query languages, and processing mechanisms. In this thesis, we investigate the possibility of a reactive and unified framework to model queries that appear in both kinds of systems.
In traditional BI solutions, evaluating queries under updates has been studied under the umbrella of incremental evaluation of updates that is based on relational incremental view maintenance model and mostly focus on queries that feature equi-joins. Streaming systems, in contrast, generally follow the automaton based models to evaluate queries under updates, and they generally process queries that mostly feature comparisons of temporal attributes (e.g., timestamp attributes) along-with comparisons of non-temporal attributes over streams of bounded sizes. Temporal comparisons constitute inequality constraints, while non-temporal comparisons can either be equality or inequality constraints, hence these systems mostly process inequality joins. As starting point, we postulate the thesis that queries in streaming systems can also be evaluated efficiently based on the paradigm of incremental evaluation just like in BI systems in a main-memory model. The efficiency of such a model is measured in terms of runtime memory footprint and the update processing cost. To this end, the existing approaches of dynamic evaluation in both kind of systems present a trade-off between memory footprint and the update processing cost. More specifically, systems that avoid materialization of query (sub) results incur high update latency and systems that materialize (sub) results incur high memory footprint. We are interested in investigating the possibility to build a model that can address this trade-off. In particular, we overcome this trade-off by investigating the possibility of practical dynamic evaluation algorithm for queries that appear in both kinds of systems, and present a main-memory data representation that allows to enumerate query (sub) results without materialization and can be maintained efficiently under updates. We call this representation the Dynamic Constant Delay Linear Representation (DCLR).
We devise DCLRs with the following properties: 1) they allow, without materialization, enumeration of query results with bounded-delay (and with constant delay for a sub-class of queries); 2) they allow tuple lookup in query results with logarithmic delay (and with constant delay for conjunctive queries with equi-joins only); 3) they take space linear in the size of the database; 4) they can be maintained efficiently under updates. We first study the DCLRs with the above-described properties for the class of acyclic conjunctive queries featuring equi-joins with projections and present the dynamic evaluation algorithm. Then, we present the generalization of thiw algorithm to the class of acyclic queries featuring multi-way theta-joins with projections. We devise DCLRs with the above properties for acyclic conjunctive queries, and the working of dynamic algorithms over DCLRs is based on a particular variant of join trees, called the Generalized Join Trees (GJTs) that guarantee the above-described properties of DCLRs. We define GJTs and present the algorithms to test a conjunctive query featuring theta-joins for acyclicity and to generate GJTs for such queries. To do this, we extend the classical GYO algorithm from testing a conjunctive query with equalities for acyclicity to test a conjunctive query featuring multi-way theta-joins with projections for acyclicity. We further extend the GYO algorithm to generate GJTs for queries that are acyclic. We implemented our algorithms in a query compiler that takes as input the SQL queries and generates Scala executable code – a trigger program to process queries and maintain under updates. We tested our approach against state of the art main-memory BI and CEP systems. Our evaluation results have shown that our DCLRs based approach is over an order of magnitude efficient than existing systems for both memory footprint and update processing cost. We have also shown that the enumeration of query results without materialization in DCLRs is comparable (and in some cases efficient) as compared to enumerating from materialized query results.
|
50 |
Robust Subspace Estimation Using Low-rank Optimization. Theory And Applications In Scene Reconstruction, Video Denoising, And Activity Recognition.Oreifej, Omar 01 January 2013 (has links)
In this dissertation, we discuss the problem of robust linear subspace estimation using low-rank optimization and propose three formulations of it. We demonstrate how these formulations can be used to solve fundamental computer vision problems, and provide superior performance in terms of accuracy and running time. Consider a set of observations extracted from images (such as pixel gray values, local features, trajectories . . . etc). If the assumption that these observations are drawn from a liner subspace (or can be linearly approximated) is valid, then the goal is to represent each observation as a linear combination of a compact basis, while maintaining a minimal reconstruction error. One of the earliest, yet most popular, approaches to achieve that is Principal Component Analysis (PCA). However, PCA can only handle Gaussian noise, and thus suffers when the observations are contaminated with gross and sparse outliers. To this end, in this dissertation, we focus on estimating the subspace robustly using low-rank optimization, where the sparse outliers are detected and separated through the `1 norm. The robust estimation has a two-fold advantage: First, the obtained basis better represents the actual subspace because it does not include contributions from the outliers. Second, the detected outliers are often of a specific interest in many applications, as we will show throughout this thesis. We demonstrate four different formulations and applications for low-rank optimization. First, we consider the problem of reconstructing an underwater sequence by removing the iii turbulence caused by the water waves. The main drawback of most previous attempts to tackle this problem is that they heavily depend on modelling the waves, which in fact is ill-posed since the actual behavior of the waves along with the imaging process are complicated and include several noise components; therefore, their results are not satisfactory. In contrast, we propose a novel approach which outperforms the state-of-the-art. The intuition behind our method is that in a sequence where the water is static, the frames would be linearly correlated. Therefore, in the presence of water waves, we may consider the frames as noisy observations drawn from a the subspace of linearly correlated frames. However, the noise introduced by the water waves is not sparse, and thus cannot directly be detected using low-rank optimization. Therefore, we propose a data-driven two-stage approach, where the first stage “sparsifies” the noise, and the second stage detects it. The first stage leverages the temporal mean of the sequence to overcome the structured turbulence of the waves through an iterative registration algorithm. The result of the first stage is a high quality mean and a better structured sequence; however, the sequence still contains unstructured sparse noise. Thus, we employ a second stage at which we extract the sparse errors from the sequence through rank minimization. Our method converges faster, and drastically outperforms state of the art on all testing sequences. Secondly, we consider a closely related situation where an independently moving object is also present in the turbulent video. More precisely, we consider video sequences acquired in a desert battlefields, where atmospheric turbulence is typically present, in addition to independently moving targets. Typical approaches for turbulence mitigation follow averaging or de-warping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects which can often be of great interest. Therefore, we address the iv problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object. We simplify this extremely difficult problem into a minimization of nuclear norm, Frobenius norm, and `1 norm. Our method is based on two observations: First, the turbulence causes dense and Gaussian noise, and therefore can be captured by Frobenius norm, while the moving objects are sparse and thus can be captured by `1 norm. Second, since the object’s motion is linear and intrinsically different than the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects. In addition to robustly detecting the subspace of the frames of a sequence, we consider using trajectories as observations in the low-rank optimization framework. In particular, in videos acquired by moving cameras, we track all the pixels in the video and use that to estimate the camera motion subspace. This is particularly useful in activity recognition, which typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in miss-detections, which further complicates the tracking stage, resulting in cluttered and incorrect tracks. In contrast, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned diffi- culties. Our approach is based on Lagrangian particle trajectories which are a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions v of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features, which captures the characteristics of the trajectories. Consequently, a SVM is employed to learn and recognize the human actions using the computed motion features. We performed intensive experiments on multiple benchmark datasets, and obtained promising results. Finally, we consider a more challenging problem referred to as complex event recognition, where the activities of interest are complex and unconstrained. This problem typically pose significant challenges because it involves videos of highly variable content, noise, length, frame size . . . etc. In this extremely challenging task, high-level features have recently shown a promising direction as in [53, 129], where core low-level events referred to as concepts are annotated and modelled using a portion of the training data, then each event is described using its content of these concepts. However, because of the complex nature of the videos, both the concept models and the corresponding high-level features are significantly noisy. In order to address this problem, we propose a novel low-rank formulation, which combines the precisely annotated videos used to train the concepts, with the rich high-level features. Our approach finds a new representation for each event, which is not only low-rank, but also constrained to adhere to the concept annotation, thus suppressing the noise, and maintaining a consistent occurrence of the concepts in each event. Extensive experiments on large scale real world dataset TRECVID Multimedia Event Detection 2011 and 2012 demonstrate that our approach consistently improves the discriminativity of the high-level features by a significant margin.
|
Page generated in 0.0538 seconds