Global ETD Search

101	Partitionnement réparti basé sur les sommets / Distributed edge partitioning Mykhailenko, Hlib 14 June 2017 To process a graph in a distributed manner, partitioning is an important preliminary step because it significantly influences the final execution time. In this thesis we study the problem of distributed graph partitioning. Recent work has shown that an approach based on cutting vertices rather than edges offers better performance for power-law graphs, which are common in real-world data. First, we studied the different metrics used to evaluate the quality of a partitioning. We then analyzed and compared several systems for analyzing large graphs (Hadoop, Giraph, Giraph++, Distributed GraphLab, and PowerGraph), comparing them with a currently very popular solution, Spark and its graph processing API, GraphX. We present the most recent partitioning algorithms and introduce a classification. From studying the various publications, we conclude that it is not possible to compare the relative performance of all these algorithms. We therefore decided to implement them in order to compare them experimentally. The results obtained show that a Hybrid-Cut partitioner offers the best performance. Second, we study how the quality of a partitioning can be predicted before the graph is actually processed. To this end, we carried out numerous experiments with GraphX and performed a careful statistical analysis of the results using a linear regression model. Our experiments show that communication metrics are good indicators of performance. Finally, we propose a distributed partitioning framework based on simulated annealing that can be used to optimize a large share of partitioning metrics. We provide sufficient conditions to ensure convergence to the optimum and discuss which metrics can actually be optimized in a distributed way. We implemented this algorithm in GraphX and compared its performance with JA-BE-JA-VC. We show that our strategy leads to significant improvements. /
In distributed graph computation, graph partitioning is an important preliminary step because the computation time can depend significantly on how the graph has been split among the different executors. In this thesis we explore the graph partitioning problem. Recently, edge partitioning has been advocated as a better approach for processing graphs with a power-law degree distribution, which are very common in real-world datasets. That is why we focus on the edge partitioning approach. We start with an overview of existing metrics for evaluating the quality of a graph partitioning. We briefly study existing graph processing systems and their key features: Hadoop, Giraph, Giraph++, Distributed GraphLab, and PowerGraph. Next, we compare them to Spark, a popular big-data processing framework, and its graph processing API, GraphX. We provide an overview of existing edge partitioning algorithms and introduce a classification of partitioners. We conclude that, based only on published work, it is not possible to draw a clear conclusion about the relative performance of these partitioners. For this reason, we have experimentally compared all the edge partitioners currently available for GraphX. Results suggest that the Hybrid-Cut partitioner provides the best performance. We then study how the quality of a partition can be evaluated before running a computation. To this end, we carry out experiments with GraphX and perform an accurate statistical analysis using a linear regression model. Our experimental results show that communication metrics such as vertex-cut and communication cost are effective predictors in most cases. Finally, we propose a framework for distributed edge partitioning based on distributed simulated annealing, which can be used to optimize a large family of partitioning metrics. We provide sufficient conditions for convergence to the optimum and discuss which metrics can be efficiently optimized in a distributed way. We implemented our framework with GraphX and compared it with JA-BE-JA-VC, a state-of-the-art partitioner that inspired our approach. We show that our approach can provide significant improvements. Partitionnement basé sur les sommets Spark GraphX Edge partitioning Spark GraphX
102	Hive, Spark, Presto for Interactive Queries on Big Data Gureev, Nikita January 2018 Traditional relational database systems cannot be used efficiently to analyze data of large volume and varied formats, i.e. big data. Apache Hadoop is one of the first open-source tools to provide a distributed data storage system and resource manager. The space of big data processing has grown fast over the past years, and many technologies have been introduced into the big data ecosystem to address the problem of processing large volumes of data. Some of the early tools have become widely adopted, Apache Hive being one of them. However, with recent advances in technology, other tools are better suited for interactive analytics of big data, such as Apache Spark and Presto. In this thesis these technologies are examined and benchmarked to determine their performance on interactive business intelligence queries. The benchmark is representative of interactive business intelligence queries and uses a star-shaped schema. The performance of Hive Tez, Hive LLAP, Spark SQL, and Presto is examined with text, ORC, and Parquet data at different volumes and levels of concurrency. A short analysis and conclusions are presented, with reasoning about the choice of framework and data format for a system that would run interactive queries on big data. /
Traditional relational database systems cannot be used efficiently to analyze large data volumes and file formats, such as big data. Apache Hadoop is one of the first open-source tools to provide distributed data storage and a resource management system. The field of big data processing has grown quickly in recent years, and many technologies have been introduced into the big data ecosystem to handle the problem of processing large data volumes; some early tools have become commonplace, Apache Hive being one of them. With new advances in the field, there are now tools better suited to interactive analysis of big data, such as Apache Spark and Presto. In this thesis these technologies are analyzed with benchmarks to determine their performance on interactive business intelligence queries. These benchmarks are representative of interactive business intelligence queries and use star-shaped schemas. Performance is examined for Hive Tez, Hive LLAP, Spark SQL, and Presto with text, ORC, and Parquet data at different volumes and levels of parallelism. A short analysis and summary are presented, with reasoning about the choice of framework and data format for a system that executes interactive queries on big data. Hadoop SQL interactive analysis Hive Spark Spark SQL Presto Big Data Computer and Information Sciences Data- och informationsvetenskap
103	Simulation studies of the effects of lean operation, turbocharging and heat transfer on spark ignition engines Watts, Paula A January 1979 Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 1979. / MICROFICHE COPY AVAILABLE IN ARCHIVES AND ENGINEERING. / Includes bibliographical references. / by Paula A. Watts. / M.S. Mechanical Engineering. Superchargers
104	Webová aplikace pro grafické zadávání a spouštění Spark úloh / Web Application for Graphical Description and Execution of Spark Tasks Hmeľár, Jozef January 2018 This master's thesis deals with big data processing in the distributed system Apache Spark, using tools that allow remote entry and execution of Spark tasks through a web interface. The first part describes the Spark environment; the next focuses on the Apache Livy project, which offers a REST API for running Spark tasks. Contemporary solutions that allow interactive data analysis are presented. The author then describes his own application design for interactive entry and launch of Spark tasks using a graph representation of them, covering both the web part and the server part of the application. The next section presents the implementation of both parts and, finally, a demonstration of the result on a typical task. The resulting application provides an intuitive interface for working comfortably with the Apache Spark environment, creating custom components, and a number of other options that are standard in today's web applications.
105	Modelem řízený vývoj Spark úloh / Model Driven Development of Spark Tasks Bútora, Matúš January 2019 The aim of this master's thesis is to describe the Apache Spark framework, its structure, and how Spark works. The next goal is to present Model-Driven Development and Model-Driven Architecture and to define their advantages, disadvantages, and usage. The main part of the text, however, is devoted to designing a model for creating tasks in the Apache Spark framework. The text describes an application that allows the user to create a graph based on the proposed modeling language. The final application allows the user to generate source code from the created model.
106	Distributed Graph Storage And Querying System Balaji, Janani 12 August 2016 Graph databases offer an efficient way to store and access inter-connected data. However, to query large graphs that no longer fit in memory, it becomes necessary to make multiple trips to the storage device to filter and gather data based on the query. I/O accesses are expensive operations that greatly slow down query response time and prevent us from fully exploiting the graph-specific benefits that graph databases offer. The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This adversely affects query performance for large graphs, as there is no way to filter the graph at a higher level without accessing the entire information from the disk. Distributing the storage and processing is one way to extract better performance. But current distributed solutions to this problem are not entirely effective, again because of the indivisible representation of graphs adopted in the storage format, which causes unnecessary latency through increased inter-processor communication. In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three different levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict I/O trips to when they are absolutely necessary. This lets us actively make use of multi-level filters while querying, without the need for comprehensive indexes. Our results show that our system outperforms established graph databases for several classes of queries.
We show that this separation also eases the difficulties in distributing graph data, and we go on to propose a more efficient distributed model for querying general-purpose graph data using the Spark framework. Graph Databases Distributed Graph Databases Distributed Graph Query Processing Spark
107	Computational simulations of fuel/air mixture flow in the intake port of a SI engine Lim, Bryan Neo Beng January 1999 No description available. 621.43
108	Droplet atomisation of Newtonian and non-Newtonian fluids including automotive fuels Whitelaw, David Stuart January 1997 No description available. 662.6
109	Investigation of transient plasma ignition for a Pulse Detonation Engine Rodriguez, Joel. 03 1900 Elimination or reduction of auxiliary oxygen use in Pulse Detonation Engines (PDEs) is necessary if the technology is to compete with existing ramjet systems. This thesis investigated a Transient Plasma Ignition (TPI) system and found that the technology can at least reduce, and may be able to completely remove, the auxiliary oxygen requirement of current PDE systems. TPI was tested and compared with a traditional capacitive discharge spark plug system in a dynamic-flow, ethylene/air mixture combustor. Ignition delay time, Deflagration-to-Detonation Transition (DDT) distance and time, detonation wave speed, and fire success rate were analyzed for various mass flow rates and stoichiometric ratios. A transient plasma dual-electrode concept was also employed and analyzed. Results show that TPI is more effective and reliable than spark plug ignition, with considerable improvements in DDT performance. The TPI dual-electrode concept proved to be the most effective configuration, with average reductions in DDT distance and time of 17% and 41%, respectively, compared with the capacitive discharge spark plug configuration. Propulsion systems Detonation waves Spark ignition engines Ethylene
110	Fundamentální analýza investiční přiležitosti v oblasti energetiky / Fundamental Analysis of an Investment Opportunity on the Power Market Kĺučár, Michal January 2009 This thesis seeks to understand and describe the fundamental principles of the power market, which every investor active in this market needs to understand. These principles are then tested against observations and everyday power market experience. Special focus is placed on understanding the wholesale market price-setting mechanism and its implications for the investor's choice of power production technology.
