1. Correlated Sample Synopsis on Big Data
Wilson, David S. 12 December 2018 (has links)
No description available.
2. Geo-distributed multi-layer stream aggregation
Cannalire, Pietro January 2018 (has links)
Standard processing architectures satisfy many applications by employing existing stream processing frameworks that manage distributed data processing. In some cases, however, geographically distributed data sources require the processing itself to be spread over a large area by employing a geographically distributed architecture. The issue addressed in this work is the reduction of data movement across the network, which in a geo-distributed architecture flows continuously from streaming sources to the processing location and among processing entities within the same distributed cluster. Reducing data movement can be critical for lowering bandwidth costs, since links in the middle of the network can be costly to use and that cost grows as the amount of data exchanged increases. In this work we propose a different way of deploying geographically distributed architectures, relying on Apache Spark Structured Streaming and Apache Kafka, and we identify the features an algorithm needs in order to run on such an architecture. The algorithms executed on this architecture apply windowing and data synopsis techniques to produce summaries of the input data and to address the issues the geographically distributed setting raises. The computation of the average and the Misra-Gries algorithm are implemented to test the designed architecture. This thesis contributes a new model for building geographically distributed architectures. The experimental results show that, for the algorithms running on top of the geo-distributed architecture, computation time is reduced on average by 70% compared to the distributed setup, and the amount of data exchanged across the network is reduced on average by 99%.
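The abstract names Misra-Gries as one of the two algorithms used to exercise the architecture. As a rough illustration, here is a minimal Python sketch of the textbook Misra-Gries heavy-hitters summary; it shows only the synopsis itself, not the Apache Spark Structured Streaming and Apache Kafka integration the thesis builds, and the example stream is invented.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary: tracks at most k - 1 candidate
    heavy hitters; any item occurring more than n/k times in a stream
    of length n is guaranteed to survive in the counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Item not tracked and no free counter: decrement all
            # counters and drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Invented example stream: "a" occurs 4 of 7 times (> 7/3), so it must survive.
print(misra_gries(["a", "b", "a", "c", "a", "b", "a"], k=3))
```

A property that plausibly makes this summary attractive in the geo-distributed setting is that Misra-Gries summaries computed at different sites can be merged into a valid summary of the combined stream, so only small synopses rather than raw events need to cross the wide-area links.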
3. Sample Footprints für Data-Warehouse-Datenbanken
Rösch, Philipp; Lehner, Wolfgang 20 January 2023 (has links)
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. In this scenario, Linked Bernoulli Synopses provide memory-efficient schema-level synopses, i.e., synopses that consist of random samples of each table in the schema with minimal overhead for retaining foreign-key integrity within the synopsis. This provides efficient support for the approximate answering of queries with arbitrary foreign-key joins. In this article, we focus on the application of Linked Bernoulli Synopses in data warehouse environments. On the one hand, we analyze the instantiation of memory-bounded synopses and address, among others, the following questions: How can the given space be partitioned among the individual samples? What is the impact on the overhead? On the other hand, we consider further adaptations of Linked Bernoulli Synopses for use in data warehouse databases. We show how synopses can be kept up to date incrementally when the underlying data changes, and we suggest additional outlier-handling methods to reduce the estimation error of approximate answers to aggregation queries with foreign-key joins. A variety of experiments shows that Linked Bernoulli Synopses and the proposed techniques have great potential in the context of data warehouse databases.
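To make the foreign-key problem concrete, here is a naive Python sketch of a schema-level synopsis over one fact table and one dimension table. It is not the Linked Bernoulli Synopses algorithm, which correlates the per-table sampling decisions precisely to keep this repair overhead minimal; the table layout and sampling rates are invented for illustration.

```python
import random

def naive_schema_synopsis(fact_rows, dim_rows, p_fact, p_dim, seed=42):
    """Bernoulli-sample both tables independently, then repair referential
    integrity by pulling in dimension rows that sampled fact rows reference.
    The repaired rows are the storage overhead Linked Bernoulli Synopses
    are designed to minimize."""
    rng = random.Random(seed)
    fact_sample = [r for r in fact_rows if rng.random() < p_fact]
    dim_sample = {r["id"]: r for r in dim_rows if rng.random() < p_dim}
    dim_by_id = {r["id"]: r for r in dim_rows}
    overhead = 0
    for r in fact_sample:
        if r["dim_id"] not in dim_sample:
            # Referenced dimension row missing from the sample: add it so
            # foreign-key joins on the synopsis still work.
            dim_sample[r["dim_id"]] = dim_by_id[r["dim_id"]]
            overhead += 1
    return fact_sample, list(dim_sample.values()), overhead

# Invented toy schema: 1000 fact rows referencing 10 dimension rows.
facts = [{"fact": i, "dim_id": i % 10} for i in range(1000)]
dims = [{"id": j} for j in range(10)]
print(naive_schema_synopsis(facts, dims, p_fact=0.01, p_dim=0.2)[2])
```

With independent coin flips, almost every sampled fact row forces a repair; correlating the flips, as Linked Bernoulli Synopses do, is what keeps the overhead minimal.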
4. Contributions to the Problem of Keyword Search over Datasets and Semantic Trajectories Based on the Resource Description Framework
Yenier Torres Izquierdo 18 May 2021 (has links)
Keyword search provides an easy-to-use interface for retrieving information.
This thesis contributes to the problems of keyword search over schema-less
datasets and semantic trajectories based on RDF.
To address the keyword search over schema-less RDF datasets problem,
this thesis introduces an algorithm to automatically translate a user-specified
keyword-based query K into a SPARQL query Q so that the answers Q returns
are also answers for K. The algorithm does not rely on an RDF schema, but it
synthesizes SPARQL queries by exploring the similarity between the property
domains and ranges, and the class instance sets observed in the RDF dataset.
It estimates set similarity based on set synopses, which can be efficiently precomputed
in a single pass over the RDF dataset. The thesis includes two
sets of experiments with an implementation of the algorithm. The first set
of experiments shows that the implementation outperforms a baseline RDF
keyword search tool that explores the RDF schema, while the second set of
experiments indicates that the implementation performs better than the
state-of-the-art TSA+BM25 and TSA+VDP keyword search systems over RDF
datasets based on the virtual documents approach. Finally, the thesis also
computes the effectiveness of the proposed algorithm using a metric based on
the concept of graph relevance.
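The abstract does not say which set synopsis the algorithm uses, only that it supports similarity estimation and one-pass precomputation. A MinHash signature is one standard technique matching that description; the sketch below is therefore an assumption, not the thesis's actual method, and the example sets stand in for a property domain and a class instance set.

```python
import hashlib

def minhash_signature(items, num_hashes=64):
    """One-pass MinHash signature: for each of num_hashes hash functions,
    keep the minimum hash value seen over the set."""
    sig = [float("inf")] * num_hashes
    for item in items:
        for i in range(num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            sig[i] = min(sig[i], int.from_bytes(digest[:8], "big"))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature slots estimates the Jaccard
    similarity of the underlying sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Invented example: exact Jaccard similarity is 2/4 = 0.5.
prop_domain = minhash_signature({"alice", "bob", "carol"})
class_instances = minhash_signature({"alice", "bob", "dave"})
print(estimated_jaccard(prop_domain, class_instances))
```

Signatures like these can be built for every property domain, property range, and class instance set in a single scan of the triples, which is consistent with the one-pass precomputation the abstract claims.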
The second problem addressed in this thesis is the keyword search over
RDF semantic trajectories problem. Stop-and-move semantic trajectories are
segmented trajectories where the stops and moves are semantically enriched
with additional data. A query language for semantic trajectory datasets has
to include selectors for stops or moves based on their enrichments, and
sequence expressions that define how to match the results of selectors with
the sequence the semantic trajectory defines. The thesis first proposes a
formal framework to define semantic trajectories and introduces stop-and-move
sequence expressions, with well-defined syntax and semantics, which act as
an expressive query language for semantic trajectories. Then, it describes a
concrete semantic trajectory model in RDF, defines SPARQL stop-and-move
sequence expressions, and discusses strategies to compile such expressions
into SPARQL queries. Next, the thesis specifies user-friendly keyword search
expressions over semantic trajectories based on the use of keywords to specify
stop and move queries, and the adoption of terms with predefined semantics
to compose sequence expressions. It then shows how to compile such keyword
search expressions into SPARQL queries. Finally, it provides a proof-of-concept
experiment over a semantic trajectory dataset constructed with user-generated
content from Flickr, combined with Wikipedia data.
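The abstract states that keyword-based stop-and-move sequence expressions are compiled into SPARQL through predefined patterns, but it does not expose the RDF vocabulary. The sketch below therefore invents a hypothetical schema (st:hasStop, st:order, st:enrichedWith) purely to show the shape such a compilation could take; none of these terms should be read as the thesis's actual model.

```python
def compile_stop_sequence(keywords):
    """Compile a list of stop keywords into a SPARQL query matching
    trajectories that contain such stops in the given order
    (hypothetical st: vocabulary)."""
    lines = ["SELECT DISTINCT ?traj WHERE {"]
    for i, kw in enumerate(keywords):
        lines += [
            f"  ?traj st:hasStop ?s{i} .",
            f"  ?s{i} st:order ?o{i} ;",
            f"        st:enrichedWith ?e{i} .",
            f'  FILTER(CONTAINS(LCASE(STR(?e{i})), "{kw.lower()}"))',
        ]
    # Sequence semantics: each matched stop must precede the next one.
    for i in range(len(keywords) - 1):
        lines.append(f"  FILTER(?o{i} < ?o{i + 1})")
    lines.append("}")
    return "\n".join(lines)

print(compile_stop_sequence(["museum", "restaurant"]))
```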