11 |
Junções por similaridade com expressões complexas em ambientes distribuídos / Set similarity joins with complex expressions on distributed platforms
Oliveira, Diego Junior do Carmo 31 August 2018
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
A recurrent problem that degrades the quality of information in databases is the presence
of duplicates, i.e., multiple representations of the same real-world entity. Although
computationally expensive, similarity operations are fundamental to identifying
duplicates. Furthermore, real-world data is typically composed of several attributes, each
representing a distinct type of information. Complex similarity expressions are important
in this context because they allow the importance of each attribute to be taken into
account in the similarity evaluation. However, given the large amount of data in
Big Data applications, it has become crucial to perform these operations in parallel and
distributed processing environments. To address these problems, which are highly relevant to
organizations, this work proposes a novel strategy for identifying duplicates in textual data
using similarity joins with complex expressions in a distributed environment.
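The abstract does not spell out its similarity expressions, so the following is only a minimal single-machine sketch of the general idea, assuming a weighted sum of per-attribute Jaccard similarities over whitespace tokens. The function names, the weights, and the threshold are hypothetical, and the quadratic pair enumeration stands in for whatever distributed strategy the thesis actually uses.

```python
from itertools import combinations

def tokens(text):
    """Lowercased whitespace tokens of an attribute value, as a set."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def weighted_similarity(r, s, weights):
    """Assumed complex expression: weighted sum of per-attribute Jaccard scores."""
    return sum(w * jaccard(tokens(r[attr]), tokens(s[attr]))
               for attr, w in weights.items())

def similarity_self_join(records, weights, threshold):
    """Naive self-join: index pairs whose weighted similarity reaches the threshold."""
    return [(i, j)
            for (i, r), (j, s) in combinations(enumerate(records), 2)
            if weighted_similarity(r, s, weights) >= threshold]

# Hypothetical records: the title is weighted more heavily than the author.
records = [
    {"title": "set similarity joins", "author": "diego oliveira"},
    {"title": "set similarity join", "author": "d oliveira"},
    {"title": "electron beam welding", "author": "michael kral"},
]
weights = {"title": 0.7, "author": 0.3}
duplicates = similarity_self_join(records, weights, threshold=0.4)
```

Raising the weight of an attribute makes disagreement on that attribute more costly, which is the sense in which such an expression encodes attribute importance.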
|
12 |
Recollements de morceaux de cyclides de Dupin pour la modélisation et la reconstruction 3D : étude dans l'espace des sphères / Blending pieces of Dupin cyclides for 3D modeling and reconstruction : study in the space of spheres
Druoton, Lucie 04 April 2013
The thesis deals with the blending of canal surfaces in geometric modeling using pieces of Dupin cyclides. It addresses a problem of reconstructing real parts manufactured and inspected by the CEA of Valduc. Working in a suitable space, the space of spheres, in which points, spheres and canal surfaces can all be manipulated, considerably simplifies some problems. This space is represented by a 4-dimensional quadric in a 5-dimensional space equipped with the Lorentz form, the Lorentz space. In the space of spheres, problems of blending canal surfaces with pieces of Dupin cyclides reduce to linear problems. We give algorithms that construct such blends in the space of spheres and then return to the usual 3-dimensional space to draw the result.
These blends are always made along characteristic circles of the surfaces considered. By solving the so-called problem of three contact conditions, we highlight another particular curve on a one-parameter family of cyclides, which we call the contact curve, along which such blends could also be made.
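To make the "4-dimensional quadric in a 5-dimensional Lorentz space" concrete, here is a small sketch under one common convention, which is an assumption and not necessarily the one used in the thesis: the Lorentz form has signature (-, +, +, +, +), and a sphere of centre c and radius r is sent to a vector s with L(s, s) = 1. With this convention the form L(s1, s2) equals (r1² + r2² - d²) / (2 r1 r2), where d is the distance between the centres, so tangency and orthogonality of spheres become linear conditions on the form. The function names are illustrative.

```python
import math

def lorentz(x, y):
    """Lorentz form of signature (-, +, +, +, +) on R^5 (assumed convention)."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def sphere_point(center, radius):
    """Map a sphere of R^3 to a point s of the quadric L(s, s) = 1."""
    a = sum(c * c for c in center) - radius * radius
    return tuple(v / radius for v in ((a + 1) / 2, (a - 1) / 2, *center))

unit = sphere_point((0.0, 0.0, 0.0), 1.0)
tangent = sphere_point((3.0, 0.0, 0.0), 2.0)     # externally tangent to `unit`
orthogonal = sphere_point((1.0, 1.0, 0.0), 1.0)  # meets `unit` at a right angle

on_quadric = [lorentz(s, s) for s in (unit, tangent, orthogonal)]
tangency_value = lorentz(unit, tangent)          # -1 for external tangency
orthogonality_value = lorentz(unit, orthogonal)  # 0 for orthogonal spheres
```

Because the contact conditions are values of a bilinear form, fixing one sphere turns each condition on the other into a linear equation, which is consistent with the abstract's remark that blending problems become linear in this space.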
|
13 |
Struktura a vlastnosti svarového spoje TiAl6V4/6061 zhotoveného technologií elektronového paprsku / Structure and properties of welded joint TiAl6V4 / 6061 made by electron beam technology
Král, Michael January 2017
Apart from steels, titanium and aluminium alloys are among the most widely used structural materials owing to their physical and mechanical properties. Joining these alloys can improve the properties of the whole structure, but it remains a difficult task. Welding titanium to aluminium alloys is particularly difficult because undesirable intermetallic phases form in the weld. This thesis focuses on the influence of electron beam welding parameters, especially beam focusing and deflection and preheating of the base material, on the quality of a heterogeneous joint between titanium alloy Ti6Al4V and aluminium alloy EN AW-6061 – T651. The thesis describes the preparation of welded and brazed joints, which are evaluated by light microscopy, scanning electron microscopy and EDS analysis of chemical composition. The presence and chemical composition of the intermetallic phases formed in the welded joints, and the quality and defects of the brazed joints, were evaluated.
|
14 |
Improving the performance of GPU-accelerated spatial joins
Hrstic, Dusan Viktor January 2017
Data collisions have been widely studied in various fields of science and industry. Combining CPU and GPU for processing spatial joins has been broadly accepted because of the increased computation speed. This should redirect efforts in GPGPU research from straightforward porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. Because threads execute instructions using the hardware resources that are available, this report analyzes how different thread organizations affect spatial join performance. New perspectives on and solutions to the problem of thread organization and warp scheduling may encourage others to program on the GPU.
The aim of this project is to examine the impact of different thread organizations in spatial join processes. The relationship between the items in the datasets is examined by counting the number of collisions their join produces, in order to understand how different approaches influence performance. Different approaches to thread organization are benchmarked and analyzed in order to find the most time-efficient solution.
The report presents the results obtained when different thread techniques are used to optimize the computational speed of spatial join algorithms. It examines one CPU algorithm and two GPU algorithms, one implementing thread techniques and one non-optimized. The GPU times are compared with the execution times on the CPU, and the GPU implementations are verified by checking that their collision counters match those of the CPU counterpart. In the analysis part of the report, the implementations are discussed and compared with each other.
The difference between the algorithm implementing thread techniques and the non-optimized one is around 80% in favour of the former, which is also around 56 times faster than the spatial joins on the CPU.
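The abstract verifies the GPU implementations by comparing collision counters against a CPU run. A minimal sketch of such a CPU reference counter is given below; it assumes that items are axis-aligned rectangles stored as (xmin, ymin, xmax, ymax) and that touching edges count as a collision, neither of which is stated in the abstract. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def count_collisions(left, right):
    """Nested-loop spatial join that only counts the colliding pairs."""
    return sum(overlaps(a, b) for a in left for b in right)

# Hypothetical datasets: two pairs overlap, the remaining four do not.
left = [(0, 0, 2, 2), (5, 5, 6, 6)]
right = [(1, 1, 3, 3), (10, 10, 11, 11), (5.5, 5.5, 7, 7)]
reference_count = count_collisions(left, right)
```

A GPU variant would typically assign each thread one item or one pair and accumulate into a shared counter; equality of that counter with `reference_count` is the kind of check the abstract describes.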
|
15 |
Real-time Business Intelligence through Compact and Efficient Query Processing Under Updates
Idris, Muhammad 10 April 2019
Responsive analytics are rapidly taking over the traditional data analytics dominated by the post-fact approaches in traditional data warehousing. Recent advancements in analytics demand placing analytical engines at the forefront of the system to react to updates occurring at high speed and detect patterns, trends and anomalies. These kinds of solutions find applications in Financial Systems, Industrial Control Systems, Business Intelligence and on-line Machine Learning among others. These applications are usually associated with Big Data and require the ability to react to constantly changing data in order to obtain timely insights and take proactive measures. Generally, these systems specify the analytical results or their basic elements in a query language, where the main task then is to maintain these results under frequent updates efficiently. The task of reacting to updates and analyzing changing data has been addressed in two ways in the literature: traditional business intelligence (BI) solutions focus on historical data analysis where the data is refreshed periodically and in batches, and stream processing solutions process streams of data from transient sources as flow (or set of flows) of data items. Both kinds of systems share the niche of reacting to updates (known as dynamic evaluation); however, they differ in architecture, query languages, and processing mechanisms. In this thesis, we investigate the possibility of a reactive and unified framework to model queries that appear in both kinds of systems.
In traditional BI solutions, evaluating queries under updates has been studied under the umbrella of incremental evaluation, which is based on the relational incremental view maintenance model and mostly focuses on queries that feature equi-joins. Streaming systems, in contrast, generally follow automaton-based models to evaluate queries under updates, and they generally process queries that mostly feature comparisons of temporal attributes (e.g., timestamp attributes) along with comparisons of non-temporal attributes over streams of bounded sizes. Temporal comparisons constitute inequality constraints, while non-temporal comparisons can be either equality or inequality constraints; hence these systems mostly process inequality joins. As a starting point, we postulate the thesis that queries in streaming systems can also be evaluated efficiently based on the paradigm of incremental evaluation, just like in BI systems, in a main-memory model. The efficiency of such a model is measured in terms of runtime memory footprint and update processing cost. The existing approaches to dynamic evaluation in both kinds of systems present a trade-off between memory footprint and update processing cost. More specifically, systems that avoid materialization of query (sub)results incur high update latency, and systems that materialize (sub)results incur a high memory footprint. We are interested in investigating the possibility of building a model that can address this trade-off. In particular, we overcome this trade-off by investigating a practical dynamic evaluation algorithm for queries that appear in both kinds of systems, and we present a main-memory data representation that allows query (sub)results to be enumerated without materialization and that can be maintained efficiently under updates. We call this representation the Dynamic Constant Delay Linear Representation (DCLR).
We devise DCLRs with the following properties: 1) they allow, without materialization, enumeration of query results with bounded delay (and with constant delay for a sub-class of queries); 2) they allow tuple lookup in query results with logarithmic delay (and with constant delay for conjunctive queries with equi-joins only); 3) they take space linear in the size of the database; 4) they can be maintained efficiently under updates. We first study DCLRs with the above-described properties for the class of acyclic conjunctive queries featuring equi-joins with projections and present the dynamic evaluation algorithm. Then, we present the generalization of this algorithm to the class of acyclic queries featuring multi-way theta-joins with projections. The dynamic algorithms over DCLRs are based on a particular variant of join trees, called Generalized Join Trees (GJTs), that guarantees the above-described properties of DCLRs. We define GJTs and present algorithms to test a conjunctive query featuring theta-joins for acyclicity and to generate GJTs for such queries. To do this, we extend the classical GYO algorithm from testing a conjunctive query with equalities for acyclicity to testing a conjunctive query featuring multi-way theta-joins with projections for acyclicity. We further extend the GYO algorithm to generate GJTs for queries that are acyclic. We implemented our algorithms in a query compiler that takes SQL queries as input and generates executable Scala code: a trigger program that processes queries and maintains their results under updates. We tested our approach against state-of-the-art main-memory BI and CEP systems. Our evaluation results show that our DCLR-based approach is over an order of magnitude more efficient than existing systems in both memory footprint and update processing cost.
We have also shown that enumerating query results from DCLRs without materialization is comparable to (and in some cases more efficient than) enumerating from materialized query results.
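For readers unfamiliar with the classical GYO algorithm that the thesis extends, the following is a minimal sketch of the textbook reduction for equi-join conjunctive queries only. It does not cover the theta-join extension or the construction of GJTs described above. A query is modelled as a list of atoms, each given as its set of variables, and the function name is illustrative.

```python
def is_acyclic(atoms):
    """Classical GYO reduction: True if the query hypergraph is acyclic."""
    edges = [set(a) for a in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one atom.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # Rule 2: drop one atom whose variables are contained in another atom.
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # Acyclic exactly when everything reduces away (at most one empty atom left).
    return len(edges) <= 1 and all(not e for e in edges)

path_query = [{"a", "b"}, {"b", "c"}, {"c", "d"}]      # R(a,b), S(b,c), T(c,d)
triangle_query = [{"a", "b"}, {"b", "c"}, {"a", "c"}]  # R(a,b), S(b,c), T(a,c)
```

The path query reduces to nothing and is acyclic, while the triangle query gets stuck immediately because every variable occurs in two atoms and no atom is contained in another.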
|
16 |
Versões do teorema de Tverberg e aplicações / Versions of Tverberg's theorem and applications
Poncio, Carlos Henrique Felicio 25 February 2016
Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), FAPESP: 2015/01264-7
The main goal of this dissertation is a detailed study of topological methods in
combinatorics and geometry, with the aim of presenting a proof of the topological
Tverberg theorem and of a theorem on the number of Tverberg partitions.
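For reference, the two statements involved can be written as follows. These are the standard formulations from the literature; the abstract does not give the dissertation's exact wording or the hypotheses under which it proves the topological version, so the prime-power condition below is the commonly cited one and is included here as an assumption about scope.

```latex
\textbf{Tverberg's theorem.} Let $d \ge 1$, $r \ge 2$ and $N = (d+1)(r-1)$.
Every set $X \subset \mathbb{R}^d$ of $N+1$ points admits a partition
$X = X_1 \cup \dots \cup X_r$ into pairwise disjoint parts such that
\[
  \operatorname{conv}(X_1) \cap \dots \cap \operatorname{conv}(X_r) \neq \emptyset .
\]

\textbf{Topological Tverberg theorem.} If $r$ is a prime power, then for every
continuous map $f \colon \Delta^{N} \to \mathbb{R}^d$ there exist pairwise
disjoint faces $\sigma_1, \dots, \sigma_r$ of the simplex $\Delta^{N}$ with
\[
  f(\sigma_1) \cap \dots \cap f(\sigma_r) \neq \emptyset .
\]
```

Taking $f$ to be an affine map recovers the first statement from the second, and the case $r = 2$ of the first statement is Radon's theorem.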
|
17 |
Étude fondamentale des interactions plasma-graphène dans les plasmas Argon/B2H6 / Fundamental study of plasma-graphene interactions in argon/B2H6 plasmas
Vinchon, Pierre 12 1900
The research carried out in this PhD thesis focuses on understanding plasma-graphene interactions during exposure of polycrystalline graphene films to a low-pressure argon RF plasma that may contain diborane (B2H6). Particular attention is devoted to the kinetics of damage formation in a pure argon plasma. In the case of a continuous argon plasma, the absence of an energy threshold for the production of ion-induced damage is demonstrated. This can only be explained by two-step etching, facilitated by the high ion density characteristic of inductive plasmas operated in H-mode. Raman characterization of plasma-treated graphene films shows a wide distribution over the small area surveyed. In order to link these fluctuations to the initial state of the graphene, Raman imaging (RIMA) is adapted to extract quantitative data on the state of the graphene before and after plasma treatment, and it is used for the rest of the work. Subsequently, the temporal study of argon RF plasmas in the pulsed regime makes it possible to find operating conditions with a drastically reduced ion fluence compared to the continuous regime; in combination with RIMA studies, this allows temporally and spatially resolved investigations of plasma-graphene interactions in which the state of grain boundaries can be distinguished from that of growth domains. For the first time, preferential self-healing of ion-irradiation damage at grain boundaries in a 2D material is experimentally demonstrated. This effect, theorized for 3D materials but difficult to observe experimentally, had indeed been predicted for graphene.
Moreover, by using several electrical and optical diagnostics of the argon plasma in the pulsed regime, it is possible to determine operating conditions in which either the ions, the argon metastables or the VUV photons emitted by the resonant states of argon become the main energy vectors. From these experiments, the respective roles of each of these species in the transfer of energy to graphene could be highlighted. Finally, the introduction of 5% of diborane into the argon plasma induces a radical modification of the physicochemical properties of the plasma. Exposure of graphene films to this highly reactive plasma reveals high boron incorporation with minimal ion and hydrogen damage.
|
18 |
Database System Acceleration on FPGAs
Moghaddamfar, Mehdi 30 May 2023
Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems.
The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases.
Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm, which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.
I Database Systems & FPGAs
1 INTRODUCTION
1.1 Databases & the Importance of Performance
1.2 Accelerators & FPGAs
1.3 Requirements
1.4 Outline & Summary of Contributions
2 BACKGROUND ON DATABASE SYSTEMS
2.1 Databases
2.1.1 Storage Model
2.1.2 Storage Medium
2.2 Database Operators
2.2.1 Projection
2.2.2 Filter
2.2.3 Sort
2.2.4 Aggregation
2.2.5 Join
2.2.6 Operator Classification
2.3 Database Queries
2.4 Impact of Acceleration
3 BACKGROUND ON FPGAS
3.1 FPGA
3.1.1 Logic Element
3.1.2 Block RAM (BRAM)
3.1.3 Digital Signal Processor (DSP)
3.1.4 IO Element
3.1.5 Programmable Interconnect
3.2 FPGA Design Flow
3.2.1 Specifications
3.2.2 RTL Description
3.2.3 Verification
3.2.4 Synthesis, Mapping, Placement, and Routing
3.2.5 Timing Analysis
3.2.6 Bitstream Generation and FPGA Programming
3.3 Implementation Quality Metrics
3.4 FPGA Cards
3.5 Benefits of Using FPGAs
3.6 Challenges of Using FPGAs
4 RELATED WORK
4.1 Summary of Related Work
4.2 Platform Type
4.2.1 Accelerator Card
4.2.2 Coprocessor
4.2.3 Smart Storage
4.2.4 Network Processor
4.3 Implementation
4.3.1 Loop-based Implementation
4.3.2 Sort-based Implementation
4.3.3 Hash-based Implementation
4.3.4 Mixed Implementation
4.4 A Note on Quantitative Performance Comparisons
II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK)
5 OBJECTIVES AND ARCHITECTURE OVERVIEW
5.1 From Requirements to Objectives
5.2 Architecture Overview
5.3 Outline of Part II
6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS
6.1 Programming FPGAs
6.2 Related Work
6.3 Architecture
6.3.1 Global Architecture
6.3.2 Sorter Architecture
6.3.3 Merger Architecture
6.3.4 Scalability and Resource Adaptability
6.4 Experiments
6.4.1 OpenCL Sort-Merge Implementation
6.4.2 RTL Sorters
6.4.3 RTL Mergers
6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation
6.5 Summary & Discussion
7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS
7.1 The Case for Resource Efficiency
7.2 Related Work
7.3 Architecture
7.3.1 Sorters
7.3.2 Sort-Network
7.3.3 X:Y Mergers
7.3.4 Merge-Network
7.3.5 Join Materialiser (JoinMat)
7.4 Experiments
7.4.1 Experimental Setup
7.4.2 Implementation Description & Tuning
7.4.3 Sort Benchmarks
7.4.4 Aggregation Benchmarks
7.4.5 Join Benchmarks
7.5 Summary
8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS
8.1 The Scope of Database System Accelerators
8.2 Related Work
8.3 Key-Reduce Radix Sort (KeRRaS)
8.3.1 Time Complexity
8.3.2 Space Complexity (Memory Utilization)
8.3.3 Discussion and Optimizations
8.4 Architecture
8.4.1 MSM
8.4.2 MSMK: Extending MSM with KeRRaS
8.4.3 Payload, Aggregation and Join Processing
8.4.4 Limitations
8.5 Experiments
8.5.1 Experimental Setup
8.5.2 Datasets
8.5.3 MSMK vs. MSM
8.5.4 Payload-Less Benchmarks
8.5.5 Payload-Based Benchmarks
8.5.6 Flexibility
8.6 Summary
9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS
9.1 Early Aggregation
9.2 Background & Related Work
9.2.1 Sort-Based Early Aggregation
9.2.2 Cache-Based Early Aggregation
9.3 Simulations
9.3.1 Datasets
9.3.2 Metrics
9.3.3 Sort-Based Versus Cache-Based Early Aggregation
9.3.4 Comparison of Set-Associative Caches
9.3.5 Comparison of Cache Structures
9.3.6 Comparison of Replacement Policies
9.3.7 Cache Selection Methodology
9.4 Cache System Architecture
9.4.1 Window Aggregator
9.4.2 Compressor & Hasher
9.4.3 Collision Detector
9.4.4 Collision Resolver
9.4.5 Cache
9.5 Experiments
9.5.1 Experimental Setup
9.5.2 Resource Utilization and Parameter Tuning
9.5.3 Datasets
9.5.4 Benchmarks on Synthetic Data
9.5.5 Benchmarks on Real Data
9.6 Summary
10 THE FULL PICTURE
10.1 System Architecture
10.2 Benchmarks
10.3 Meeting the Objectives
III Conclusion
11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH
11.1 Summary
11.2 Future Work
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
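The abstract of this entry states that multi-way merging reduces the number of merge passes in the sort-merge algorithm. The sketch below illustrates that relationship in software only; it is not the CbMSMK hardware design, and the function names and the fan-in values are hypothetical. With a fan-in of k, each pass divides the number of sorted runs by k, so the pass count grows with the base-k logarithm of the initial run count.

```python
import heapq

def merge_passes(num_runs, fan_in):
    """Passes needed to reduce num_runs sorted runs to one with a fan_in-way merger."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)  # ceiling division
        passes += 1
    return passes

def multiway_merge_sort(runs, fan_in):
    """Merge groups of fan_in sorted runs, pass after pass, until one run remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    runs = [list(r) for r in runs]
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0] if runs else []

two_way = merge_passes(64, 2)    # 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
eight_way = merge_passes(64, 8)  # 64 -> 8 -> 1
merged = multiway_merge_sort([[1, 4], [2, 3], [0, 5]], fan_in=2)
```

Each pass reads and writes the whole dataset once, so fewer passes means fewer round trips through the buffer memory, which is the plausible source of the speed-up the abstract attributes to multi-way merging.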
|
19 |
Supporting Advanced Queries on Scientific Array Data
Ebenstein, Roee A. 18 December 2018
No description available.
|
20 |
Scalable algorithms for cloud-based Semantic Web data management / Algorithmes passant à l’échelle pour la gestion de données du Web sémantique sur les platformes cloud
Zampetakis, Stamatis 21 September 2015
Afin de construire des systèmes intelligents, où les machines sont capables de raisonner exactement comme les humains, les données avec sémantique sont une exigence majeure. Ce besoin a conduit à l’apparition du Web sémantique, qui propose des technologies standards pour représenter et interroger les données avec sémantique. RDF est le modèle répandu destiné à décrire de façon formelle les ressources Web, et SPARQL est le langage de requête qui permet de rechercher, d’ajouter, de modifier ou de supprimer des données RDF. Être capable de stocker et de rechercher des données avec sémantique a engendré le développement des nombreux systèmes de gestion des données RDF.L’évolution rapide du Web sémantique a provoqué le passage de systèmes de gestion des données centralisées à ceux distribués. Les premiers systèmes étaient fondés sur les architectures pair-à-pair et client-serveur, alors que récemment l’attention se porte sur le cloud computing.Les environnements de cloud computing ont fortement impacté la recherche et développement dans les systèmes distribués. Les fournisseurs de cloud offrent des infrastructures distribuées autonomes pouvant être utilisées pour le stockage et le traitement des données. Les principales caractéristiques du cloud computing impliquent l’évolutivité́, la tolérance aux pannes et l’allocation élastique des ressources informatiques et de stockage en fonction des besoins des utilisateurs.Cette thèse étudie la conception et la mise en œuvre d’algorithmes et de systèmes passant à l’échelle pour la gestion des données du Web sémantique sur des platformes cloud. 
Plus particulièrement, nous étudions la performance et le coût d’exploitation des services de cloud computing pour construire des entrepôts de données du Web sémantique, ainsi que l’optimisation de requêtes SPARQL pour les cadres massivement parallèles.Tout d’abord, nous introduisons les concepts de base concernant le Web sémantique et les principaux composants des systèmes fondés sur le cloud. En outre, nous présentons un aperçu des systèmes de gestion des données RDF (centralisés et distribués), en mettant l’accent sur les concepts critiques de stockage, d’indexation, d’optimisation des requêtes et d’infrastructure.Ensuite, nous présentons AMADA, une architecture de gestion de données RDF utilisant les infrastructures de cloud public. Nous adoptons le modèle de logiciel en tant que service (software as a service - SaaS), où la plateforme réside dans le cloud et des APIs appropriées sont mises à disposition des utilisateurs, afin qu’ils soient capables de stocker et de récupérer des données RDF. Nous explorons diverses stratégies de stockage et d’interrogation, et nous étudions leurs avantages et inconvénients au regard de la performance et du coût monétaire, qui est une nouvelle dimension importante à considérer dans les services de cloud public.Enfin, nous présentons CliqueSquare, un système distribué de gestion des données RDF basé sur Hadoop. CliqueSquare intègre un nouvel algorithme d’optimisation qui est capable de produire des plans massivement parallèles pour des requêtes SPARQL. Nous présentons une famille d’algorithmes d’optimisation, s’appuyant sur les équijointures n- aires pour générer des plans plats, et nous comparons leur capacité à trouver les plans les plus plats possibles. Inspirés par des techniques de partitionnement et d’indexation existantes, nous présentons une stratégie de stockage générique appropriée au stockage de données RDF dans HDFS (Hadoop Distributed File System). 
Nos résultats expérimentaux valident l’effectivité et l’efficacité de l’algorithme d’optimisation, démontrant également la performance globale du système. / In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while more recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing include scalability, fault tolerance, and elastic allocation of computing and storage resources according to the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts around the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems.
In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform runs in the cloud and appropriate APIs are provided to end users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing their pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possible plans. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop’s Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm and also demonstrate the overall performance of the system.