331 |
Distributed real-time processing using GNU/Linux/libré software and COTS hardware / Van Schalkwyk, Dirko, 03 1900 (has links)
Thesis (MScIng)--Stellenbosch University, 2004. / ENGLISH ABSTRACT: This dissertation's research studies the viability of using both low-cost consumer Commodity Off The Shelf (COTS) PCs and libré software to implement a distributed real-time system modelled on a real-world engineering problem.
Debugging and developing a modular satellite system is both time consuming and complex; to this end the SUNSAT team envisioned the Interactive Test System, a dual-mode simulator/monitoring system. It is this system that requires a real-time back-end, and it serves as the real-world problem model for the implementation.
The implementation was accomplished by researching the available parallel processing
software and real-time extensions to GNU/Linux and choosing the appropriate
solutions based on the needs of the model. A monitoring system was also implemented,
for system verification, using freely available system monitoring utilities.
The model was successfully implemented and verified with a global synchronization of < 10 ms. It was shown that GNU/Linux and libré software are both mature enough and appropriate for solving a real-world distributed real-time problem. / AFRIKAANSE OPSOMMING (English translation): This thesis investigates the suitability of both low-cost consumer Commodity Off The Shelf (COTS) personal computers and libré software for implementing distributed real-time systems, by solving a modelled engineering problem.
Debugging and developing a modular satellite is both time consuming and complex; to ease this, the SUNSAT team conceptualised the Interactive Test System, in essence a dual-mode simulator/monitoring system. It is this system that requires a distributed real-time back-end, and it serves as the real-world problem model to solve.
The implementation was achieved by investigating the available distributed processing software and real-time extensions for GNU/Linux and choosing the appropriate options based on the needs of the model. A monitoring system, built with libré software, was also implemented for system verification.
The model was successfully implemented and verified with a global synchronization of < 10 ms. It was shown that GNU/Linux and libré software are both mature and suitable for solving real-world distributed real-time problems.
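The < 10 ms figure refers to how closely the clocks of the cluster nodes agree. As a hedged illustration only, not the verification code used in the thesis, the following Java sketch estimates the clock offset between a node and a reference host from a request/response round trip (the basic idea behind NTP-style probes); the port number is an arbitrary placeholder and both endpoints run on the loopback interface so the example is self-contained.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;

// Cristian-style clock-offset probe: a reference "time server" thread answers
// with its local clock, and the client estimates its own offset from it.
public class ClockOffsetProbe {
    static final int PORT = 9900; // hypothetical port, not from the thesis

    public static void main(String[] args) throws Exception {
        Thread server = new Thread(ClockOffsetProbe::serveOnce);
        server.setDaemon(true);
        server.start();
        Thread.sleep(200); // give the reference server time to bind its socket

        try (DatagramSocket socket = new DatagramSocket()) {
            byte[] req = new byte[8];
            long t0 = System.nanoTime();
            socket.send(new DatagramPacket(req, req.length, InetAddress.getLoopbackAddress(), PORT));
            DatagramPacket resp = new DatagramPacket(new byte[8], 8);
            socket.receive(resp);
            long t1 = System.nanoTime();
            long serverMillis = ByteBuffer.wrap(resp.getData()).getLong();
            // Assume the reply was produced halfway through the round trip.
            long midpointMillis = System.currentTimeMillis() - (t1 - t0) / 2_000_000;
            System.out.printf("estimated offset: %d ms (round trip %.2f ms)%n",
                    serverMillis - midpointMillis, (t1 - t0) / 1e6);
        }
    }

    // The reference node: replies to one probe with its current wall-clock time.
    static void serveOnce() {
        try (DatagramSocket socket = new DatagramSocket(PORT)) {
            DatagramPacket req = new DatagramPacket(new byte[8], 8);
            socket.receive(req);
            byte[] now = ByteBuffer.allocate(8).putLong(System.currentTimeMillis()).array();
            socket.send(new DatagramPacket(now, now.length, req.getAddress(), req.getPort()));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```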
|
332 |
Processamento eficiente de junção espacial em ambiente paralelo e distribuído baseado em SpatialHadoop / Efficient spatial join processing in a parallel and distributed environment based on SpatialHadoop / Mendes, Eduardo Fernando, 17 February 2017
No funding received. / The huge volume of spatial data generated and made available in recent years from
different sources, such as remote sensing, smart phones, space telescopes, and
satellites, has motivated researchers and practitioners around the world to find a way to process this huge volume of spatial data efficiently. Systems based on the MapReduce programming paradigm, such as Hadoop, have proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has proven inadequate as native support for spatial data, because its core structure is not aware of the spatial characteristics of such data. The solution to this problem gave rise to
SpatialHadoop, which is a Hadoop extension with native support for spatial data.
However, SpatialHadoop is not able to co-locate related spatial data, and its task scheduler does not take any characteristics of the data into account when assigning work to the nodes of a cluster of computers. Given this scenario, this PhD dissertation proposes new strategies to improve the performance of spatial join processing over huge volumes of data using SpatialHadoop. For this purpose, the proposed solutions explore the joint allocation of related spatial data and a MapReduce scheduling strategy that is aware of this co-location.
Efficient data access is an essential step in achieving better performance during query processing. Therefore, the proposed solutions reduce network traffic and disk I/O operations and consequently improve the performance of spatial join processing in SpatialHadoop. Experimental evaluations showed that the novel data allocation policies and task scheduling actually improve the total processing time of the spatial join operations. The performance gain ranged from 14.7% to 23.6% compared to the baseline proposed by CoS-HDFS and from 8.3% to 65% compared to the native support of SpatialHadoop. / Resumo (English translation): The explosion in the volume of spatial data generated and made available in recent years, coming from different sources such as remote sensing, smart phones, space telescopes and satellites, has motivated researchers and practitioners around the world to find a way to process this large volume of spatial data efficiently. Systems based on the MapReduce programming paradigm, such as Hadoop, have for years proven to be an efficient framework for processing huge volumes of data in many applications. However, Hadoop has shown itself inadequate as native support for spatial data, because its core structure is unaware of the spatial characteristics of those data. The solution to this problem gave rise to SpatialHadoop, a Hadoop extension with native support for spatial data. However, SpatialHadoop is not able to co-locate related spatial data and does not take any characteristic of the data into account when scheduling tasks for processing on the nodes of a computer cluster. Given this scenario, this thesis proposes new strategies to improve the performance of spatial join processing over large volumes of data using SpatialHadoop. To this end, the proposed solutions exploit the joint allocation of related spatial data and a MapReduce task scheduling strategy for related spatial data that has also been allocated jointly. Efficient data access is believed to be an essential step towards better performance during query processing. In this way, the proposed solutions reduce network traffic and disk input/output operations and consequently improve the performance of spatial join processing using SpatialHadoop. Experimental performance tests confirmed that the new data allocation and task scheduling policies do improve the total processing time of spatial join operations. The performance gain ranged from 14.7% to 23.6% relative to the baseline proposed by CoS-HDFS and from 8.3% to 65% relative to the native support of SpatialHadoop.
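To make the benefit of co-locating related partitions concrete, the sketch below shows the filter step of a partition-based spatial join in plain Java; it is an illustration, not SpatialHadoop code, and the fixed uniform grid, cell size and record names are invented for the example. Only records whose cells coincide are compared, and when the matching partitions of both datasets live on the same node that comparison involves no network transfer, which is exactly the traffic the proposed allocation policy avoids. SpatialHadoop would build such partitions with its own spatial indexes rather than a fixed grid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Filter step of a grid-partitioned spatial join: records are bucketed by the
// grid cells their bounding boxes overlap, and only same-cell pairs are tested.
public class GridSpatialJoin {
    record Rect(String id, double x1, double y1, double x2, double y2) {
        boolean intersects(Rect o) {
            return x1 <= o.x2 && o.x1 <= x2 && y1 <= o.y2 && o.y1 <= y2;
        }
    }

    static Map<Long, List<Rect>> partition(List<Rect> data, double cellSize) {
        Map<Long, List<Rect>> cells = new HashMap<>();
        for (Rect r : data) {
            // A rectangle is replicated into every cell it touches.
            for (long cx = (long) (r.x1 / cellSize); cx <= (long) (r.x2 / cellSize); cx++)
                for (long cy = (long) (r.y1 / cellSize); cy <= (long) (r.y2 / cellSize); cy++)
                    cells.computeIfAbsent(cx * 1_000_003 + cy, k -> new ArrayList<>()).add(r);
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Rect> left = List.of(new Rect("a1", 0, 0, 2, 2), new Rect("a2", 8, 8, 9, 9));
        List<Rect> right = List.of(new Rect("b1", 1, 1, 3, 3), new Rect("b2", 5, 5, 6, 6));
        Map<Long, List<Rect>> leftCells = partition(left, 4.0);
        Map<Long, List<Rect>> rightCells = partition(right, 4.0);
        // Refinement only runs where both inputs share a cell; with co-located
        // partitions this comparison happens without shipping data across nodes.
        leftCells.forEach((cell, ls) -> {
            for (Rect l : ls)
                for (Rect r : rightCells.getOrDefault(cell, List.of()))
                    if (l.intersects(r)) System.out.println(l.id() + " overlaps " + r.id());
        });
    }
}
```

Pairs that share more than one cell would be reported more than once; a real implementation deduplicates them, for example with a reference-point test.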
|
333 |
Smart distributed processing technologies for hedge fund management / Thayalakumar, Sinnathurai, January 2017
Distributed processing cluster design using commodity hardware and software has proven to be a technological breakthrough in the field of parallel and distributed computing. The research presented herein is an original investigation of distributed processing using hybrid processing clusters to improve the calculation efficiency of compute-intensive applications. This has opened a new frontier in affordable supercomputing that can be utilised by businesses and industries at various levels. Distributed processing that uses commodity computer clusters has become extremely popular over recent years, particularly among university research groups and research organisations. The research work discussed herein addresses a bespoke-oriented design and implementation of highly specific and different types of distributed processing clusters, with applied load balancing techniques, that are well suited to particular business requirements.
The research was performed in four cohesively interconnected phases to find a suitable solution using a new type of distributed processing approach. The first phase is the implementation of a bespoke-type distributed processing cluster that uses an existing network of workstations as a calculation cluster, based on a loosely coupled distributed process system design, which improved the calculation efficiency of certain legacy applications. This approach demonstrated how to design an innovative, cost-effective, and efficient way to utilise a workstation cluster for distributed processing.
The second phase improves the calculation efficiency of the distributed processing system: a new type of load balancing system is designed to incorporate multiple processing devices. The load balancing system uses hardware, software and application related parameters to assign calculation tasks to each processing device accordingly. Three types of load balancing methods were tested: static, dynamic and hybrid. Each has its own advantages, and all three further improved the calculation efficiency of the distributed processing system.
The third phase helps the company improve batch-processing application calculation times: two separate dedicated calculation clusters were built using small form factor (SFF) computers and PCs as separate peer-to-peer (P2P) network based calculation clusters. Multiple batch processing applications were tested on these clusters, and the results showed consistent calculation time improvement across all the applications tested. In addition, dedicated clusters built with SFF computers offer reduced power consumption, small cluster size, and comparatively low cost to suit particular business needs.
The fourth phase incorporates all the processing devices available in the company into a hybrid calculation cluster, utilising various types of servers, workstations, and SFF computers to form a high-throughput distributed processing system that consolidates multiple calculation clusters. These clusters can be used as multiple mutually exclusive clusters or combined into a single cluster depending on the applications. The test results show considerable calculation time improvements when using the consolidated calculation cluster in conjunction with rule-based load balancing techniques.
The main design concept of the system is based on the original design that uses first-principle methods and utilises existing LAN and separate P2P network infrastructures, hardware, and software. The tests and investigations conducted show promising results: the company's legacy applications can be modified and implemented on different types of distributed processing clusters to achieve calculation and processing efficiency for various applications within the company. The test results confirmed the expected calculation time improvements in controlled environments and show that it is feasible to design and develop a bespoke-type dedicated distributed processing cluster using existing hardware, software, and low-cost SFF computers. Furthermore, a combination of a bespoke distributed processing system with appropriate load balancing algorithms has shown considerable calculation time improvements for various legacy and bespoke applications. Hence, the bespoke design is well suited to delivering the calculation time improvements needed for critical problems currently faced by the sponsoring company.
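The thesis characterises its load balancer by the parameters it weighs (hardware, software and application related), not by a specific formula, so the sketch below is a hedged stand-in rather than the author's algorithm: a rule-based assigner that combines static device capacity with the work already queued on each device and gives every incoming task to the device expected to finish it earliest. Device names, core counts and task sizes are invented.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical rule-based load balancer: each device gets a capacity score and
// tasks are assigned greedily to the device expected to finish them earliest.
public class RuleBasedBalancer {
    static final class Device {
        final String name; final int cores; final double ghz;
        double queuedWork; // work units already assigned (the "dynamic" part)
        Device(String name, int cores, double ghz) { this.name = name; this.cores = cores; this.ghz = ghz; }
        double capacity() { return cores * ghz; }                 // static rule
        double finishTime() { return queuedWork / capacity(); }   // dynamic rule
    }

    public static void main(String[] args) {
        List<Device> devices = List.of(
                new Device("server-1", 16, 2.4), new Device("workstation-7", 4, 3.2),
                new Device("sff-node-3", 2, 1.8));
        double[] tasks = {40, 25, 60, 10, 35, 50}; // work units per batch task

        for (double task : tasks) {
            // Hybrid rule: pick the device that would finish this task earliest,
            // combining static capacity with the work already queued on it.
            Device best = devices.stream()
                    .min(Comparator.comparingDouble(d -> (d.queuedWork + task) / d.capacity()))
                    .orElseThrow();
            best.queuedWork += task;
            System.out.printf("task(%.0f) -> %s%n", task, best.name);
        }
        devices.forEach(d -> System.out.printf("%s finishes at t=%.1f%n", d.name, d.finishTime()));
    }
}
```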
|
334 |
A semi-formal comparison between the Common Object Request Broker Architecture (CORBA) and the Distributed Component Object Model (DCOM) / Conradie, Pieter Wynand, 06 1900
The way in which application systems and software are built has changed dramatically over the past few
years. This is mainly due to advances in hardware technology, programming languages, as well as the
requirement to build better software application systems in less time. The importance of mondial (worldwide)
communication between systems is also growing exponentially. People are using network-based
applications daily, communicating not only locally, but also globally. The Internet, the global network,
therefore plays a significant role in the development of new software. Distributed object computing is one
of the computing paradigms that promise to address the need to develop client/server application systems that communicate over heterogeneous environments.
This study, of limited scope, concentrates on one crucial element without which distributed object computing
cannot be implemented. This element is the communication software, also called middleware, which allows
objects situated on different hardware platforms to communicate over a network. Two of the most important
middleware standards for distributed object computing today are the Common Object Request Broker
Architecture (CORBA) from the Object Management Group, and the Distributed Component Object
Model (DCOM) from Microsoft Corporation. Each of these standards is implemented in commercially
available products, allowing distributed objects to communicate over heterogeneous networks.
In studying each of the middleware standards, a formal way of comparing CORBA and DCOM is presented,
namely meta-modelling. For each of these two distributed object infrastructures (middleware), meta-models
are constructed. Based on this uniform and unbiased approach, a comparison of the two distributed object
infrastructures is then performed. The results are given as a set of tables in which the differences and
similarities of each distributed object infrastructure are exhibited. By adopting this approach, errors caused
by misunderstanding or misinterpretation are minimised. Consequently, an accurate and unbiased
comparison between CORBA and DCOM is made possible, which constitutes the main aim of this
dissertation. / Computing / M. Sc. (Computer Science)
|
335 |
Impact de la coopération dans les nouvelles plates-formes de calcul à hautes performances / Impact of cooperation in the new high-performance computing platforms / Angelis Cordeiro, Daniel de, 09 February 2012
Résumé (English translation): Computer science has profoundly changed the methodological aspects of the discovery process in the different domains of knowledge. Researchers today have at their disposal new capabilities that make it possible to envisage solving new problems. Parallel and distributed platforms composed of resources shared among different participants can make these new capabilities accessible to any researcher, offering computing power that until now has been limited to the largest (and richest) scientific projects. In this document, which gathers the results obtained during my doctorate, we explore four different facets of how organizations engage in collaboration on parallel and distributed platforms. Using classical tools from combinatorics, multi-objective scheduling and game theory, we showed how to compute schedules with a good compromise between the results obtained by the participants and the overall performance of the platform. By ensuring fair results and guaranteeing performance improvements for the different participants, we can create an efficient platform where everyone always feels encouraged to collaborate and to share their resources. First, we study collaboration between selfish organizations. We show that selfish behaviour among the participants imposes a lower bound on the global makespan. We present algorithms that cope with the selfishness of the organizations and deliver fair results. The second study concerns collaboration between organizations that can tolerate a limited degradation of their performance if this helps improve the global makespan. We improve the known inapproximability bounds for this problem and present new algorithms whose guarantees are close to the Pareto set (the set of best possible solutions). The third form of collaboration studied is between rational participants who can choose the best strategy for their jobs. We present a non-cooperative game model for the problem and show how the use of coordination mechanisms allows the creation of approximate equilibria with a bounded price of anarchy. Finally, we study collaboration between users sharing a set of common resources. We present a method that enumerates the frontier of best-compromise solutions for the users and selects the solution that brings the best overall performance. / Computer science is deeply changing methodological aspects of the discovery process in different areas of knowledge. Researchers have at their disposal new capabilities that can create novel research opportunities. Parallel and distributed platforms composed of resources shared between different participants can make these new capabilities accessible to every researcher at every level, delivering computational power that was previously restricted to bigger (and wealthier) scientific projects. This work explores four different facets of the rules that govern how organizations engage in collaboration on modern parallel and distributed platforms.
Using classical combinatorial tools, multi-objective scheduling and game theory, we showed how to compute schedules with good trade-offs between the results obtained by the participants and the global performance of the platform. By ensuring fair results and guaranteeing performance improvements for the participants, we can create an efficient platform where everyone always feels encouraged to collaborate and to share its resources. First, we study the collaboration between selfish organizations. We show how the selfish behavior of the participants imposes a lower bound on the global makespan. We present algorithms that cope with the selfishness of the organizations and that achieve good fairness in practice. The second study is about collaboration between organizations that can tolerate a limited degradation of their performance if this can help ameliorate the global makespan. We improve the existing inapproximability bounds for this problem and present new algorithms whose guarantees are close to the Pareto set. The third form of collaboration studied is between rational participants that can independently choose the best strategy for their jobs. We present a non-cooperative game-theoretic model for the problem and show how coordination mechanisms allow the creation of approximate pure equilibria with a bounded price of anarchy. Finally, we study collaboration between users sharing a set of common resources. We present a method that enumerates the frontier of best-compromise solutions for the users and selects the solution that brings the best value for the global performance function.
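The "frontier of best compromise solutions" above is a Pareto front over the per-organization objectives. As a toy, hedged illustration (not the dissertation's algorithms; the candidate schedules, their names and makespans are all made up), the Java sketch below filters a set of candidate allocations down to the non-dominated ones and then picks the survivor with the best global makespan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy Pareto-front filter over candidate schedules for two organizations.
// Each candidate is described by the makespan each organization experiences.
public class ParetoScheduleChooser {
    record Candidate(String name, double makespanOrgA, double makespanOrgB) {
        double global() { return Math.max(makespanOrgA, makespanOrgB); }
        boolean dominates(Candidate o) { // no worse on both, strictly better on one
            return makespanOrgA <= o.makespanOrgA && makespanOrgB <= o.makespanOrgB
                    && (makespanOrgA < o.makespanOrgA || makespanOrgB < o.makespanOrgB);
        }
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("all-on-A", 14, 6), new Candidate("all-on-B", 5, 15),
                new Candidate("balanced", 9, 9), new Candidate("wasteful", 12, 12));

        List<Candidate> front = new ArrayList<>();
        for (Candidate c : candidates)
            if (candidates.stream().noneMatch(o -> o.dominates(c))) front.add(c);

        front.forEach(c -> System.out.println("Pareto-optimal: " + c));
        // Among the fair trade-offs, take the one with the best global makespan.
        Candidate chosen = front.stream().min(Comparator.comparingDouble(Candidate::global)).orElseThrow();
        System.out.println("chosen: " + chosen.name());
    }
}
```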
|
336 |
Times assíncronos inicializadores para o planejamento da expansão da transmissão de energia elétrica baseados no modelo híbrido linear / Initializing asynchronous teams for electric power transmission expansion planning based on the hybrid linear model / Sanchez, Fernando Rodrigo Lopes, January 2008
Advisor: Sérgio Azevedo de Oliveira / Committee member: Rubén Augusto Romero Lazaro / Committee member: Eduardo Nobuhiro Asada / Resumo (English translation): In this work, several constructive heuristic agents based on the hybrid linear model were implemented; they form part of an asynchronous team whose objective is to generate good-quality configurations to initialize the metaheuristics that solve the transmission expansion planning problem of electric power systems. The theory of asynchronous teams was applied to combine the individual qualities of the heuristic methods in such a way that, starting from a base configuration (with no additions) and using a cyclic data flow, the constructive agents add circuits to this configuration in a systematic and random manner until it meets the load demands required by the electric system over a future horizon. These configurations were then used by a genetic algorithm in order to validate their quality. The algorithms were implemented in Fortran, using the LAM-MPI message-passing routines, and simulated for small, medium and large test systems in a distributed processing environment. The results confirm that asynchronous teams of several heuristic methods are more effective than a single heuristic. / Abstract: In this study, several constructive heuristic algorithms based on the hybrid linear model were implemented; they are part of an asynchronous team that aims to generate good-quality initial solutions for the metaheuristics that solve the transmission expansion planning problem of electric power systems. The theory of asynchronous teams was applied to combine the individual qualities of each heuristic method in such a way that, starting from a base network configuration and using a cyclic flow of data, heuristic agents add circuits to this configuration in a systematic and random way until it meets the load demands requested by the electrical system on a future horizon. These configurations are then used by a genetic algorithm in order to validate their quality. The algorithms were implemented in Fortran, using the message-exchange routines of LAM-MPI, and simulated for small, medium and large test systems in a distributed processing environment. The results show that the solutions obtained with asynchronous teams of several heuristic methods are more effective than the solutions obtained with a single heuristic algorithm. / Mestre (Master's degree)
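As a rough, hedged illustration of what one constructive agent in such a team does, the sketch below greedily adds the cheapest circuits to overloaded corridors until every corridor can carry its required flow, with randomised tie-breaking so that different agents produce different candidate configurations. It is a deliberate simplification: the corridor data are invented and the feasibility test ignores the power-flow redistribution that the hybrid linear model actually captures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Naive constructive agent for transmission expansion: keep adding circuits on
// overloaded corridors until demand is met. Real agents use the hybrid linear
// model to recompute power flows after each addition; this sketch does not.
public class GreedyExpansionAgent {
    static final class Corridor {
        final String name; final double requiredMw, circuitCapacityMw, circuitCost;
        int circuits;
        Corridor(String name, double requiredMw, double circuitCapacityMw, double circuitCost) {
            this.name = name; this.requiredMw = requiredMw;
            this.circuitCapacityMw = circuitCapacityMw; this.circuitCost = circuitCost;
        }
        double overload() { return requiredMw - circuits * circuitCapacityMw; }
    }

    public static void main(String[] args) {
        List<Corridor> plan = List.of(
                new Corridor("bus1-bus2", 270, 100, 40), new Corridor("bus2-bus4", 150, 80, 30),
                new Corridor("bus3-bus5", 90, 100, 55));
        Random rng = new Random(); // randomness makes each agent propose a different plan
        double totalCost = 0;

        while (plan.stream().anyMatch(c -> c.overload() > 0)) {
            List<Corridor> overloaded = new ArrayList<>(plan.stream().filter(c -> c.overload() > 0).toList());
            // Prefer cheap capacity, but break near-ties randomly.
            overloaded.sort(Comparator.comparingDouble(c -> c.circuitCost / c.circuitCapacityMw));
            Corridor pick = overloaded.get(rng.nextInt(Math.min(2, overloaded.size())));
            pick.circuits++;
            totalCost += pick.circuitCost;
            System.out.printf("add circuit on %s (now %d)%n", pick.name, pick.circuits);
        }
        System.out.printf("candidate configuration cost: %.0f%n", totalCost);
    }
}
```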
|
337 |
Uma heurística de agrupamento de caminhos para escalonamento de tarefas em grades computacionais / A path clustering heuristic for task scheduling in computational grids / Bittencourt, Luiz Fernando, 1981-, 15 March 2006
Advisor: Edmundo Roberto Mauro Madeira / Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Computação
Resumo (English translation): A computational grid is a collaborative, heterogeneous, geographically distributed, multi-institutional and dynamic system in which any computational resource connected to a network, local or not, is a potential collaborator. Computational grids are currently a major focus of studies related to the execution of parallel applications, both those that demand great computational power and those that adapt well to distributed environments. Since the resources of a grid belong to several different administrative domains with different policies, each resource has the autonomy to join or leave the grid at any moment. This dynamic characteristic and the heterogeneity make application scheduling, resource management and fault tolerance major challenges in these systems. Scheduling, in particular, plays an extremely important role, since it determines the execution time of the applications. Task scheduling is an NP-Complete problem [6], which led to the development of a heuristic for the associated optimization problem. In this work we present a task scheduler for computational grids based on Xavantes [3], a middleware that supports the execution of dependent tasks through hierarchical control structures called controllers. The developed algorithm, called the Path Clustering Heuristic (PCH), groups tasks with the objective of minimizing the communication between the controllers and the tasks, reducing the total execution time of the process. / Abstract: A computational grid is a collaborative, heterogeneous, geographically distributed, multi-institutional and dynamic system, where any computational resource with a network connection, local or remote, is a potential collaborator. In computational grids, problems related to the execution of parallel applications, both those which need a lot of computational power and those which fit well in distributed environments, are widely studied nowadays. As the grid resources belong to various administrative domains with different policies, each resource has the autonomy to participate in or leave the grid at any time. These dynamic and heterogeneous characteristics make application scheduling, resource management and fault tolerance relevant issues in these systems. Particularly, the scheduler plays an important role, since it largely determines the execution time of an application. The task scheduling problem is NP-Complete [6], which led to the development of a heuristic for the associated optimization problem. In this work we present a task scheduler for a computational grid based on Xavantes [3], a middleware that supports dependent task execution through control structures called controllers. The developed algorithm, called the Path Clustering Heuristic (PCH), clusters tasks aiming to minimize the communication between controllers and tasks, reducing the process execution time. / Master's / Computing Systems / Master in Computer Science
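The intuition behind PCH, keeping tasks that exchange a lot of data on the same resource so their communication cost disappears, can be shown with a much-simplified sketch. It is not the published PCH algorithm: the task chain, the costs, the merge threshold and the two-resource model are all invented for the example, and a real DAG would also have to weigh the parallelism lost by merging.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified path-clustering idea: tasks on a dependency chain are grouped so
// that heavy inter-task communication happens inside one resource instead of
// over the network. Costs and the two-resource model are invented for the demo.
public class PathClusteringSketch {
    record Task(String name, double compCost) {}

    public static void main(String[] args) {
        List<Task> chain = List.of(new Task("t1", 4), new Task("t2", 6), new Task("t3", 3), new Task("t4", 5));
        double[] commAfter = {8, 1, 7}; // transfer cost between t_i and t_{i+1} if on different resources

        // Naive placement: split the chain across resources, paying every edge.
        double naive = 0;
        for (Task t : chain) naive += t.compCost();
        for (double c : commAfter) naive += c;

        // Path clustering: merge across an edge whenever the communication saved
        // outweighs the (zero, in a serial chain) parallelism it gives up.
        List<List<Task>> clusters = new ArrayList<>();
        List<Task> current = new ArrayList<>(List.of(chain.get(0)));
        double clustered = chain.get(0).compCost();
        for (int i = 1; i < chain.size(); i++) {
            clustered += chain.get(i).compCost();
            if (commAfter[i - 1] > 2.0) {           // heavy edge: keep tasks together
                current.add(chain.get(i));
            } else {                                 // light edge: allow a new cluster
                clusters.add(current);
                current = new ArrayList<>(List.of(chain.get(i)));
                clustered += commAfter[i - 1];       // this edge is still paid remotely
            }
        }
        clusters.add(current);

        System.out.println("clusters: " + clusters);
        System.out.printf("chain split across resources, schedule length: %.1f%n", naive);
        System.out.printf("path-clustered schedule length:                %.1f%n", clustered);
    }
}
```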
|
338 |
Estrategias para comercialização de recursos computacionais em desktop grids / Strategies for computational resources trading in desktop grids / Gois, Lourival Aparecido de, 14 August 2018
Advisor: Walter da Cunha Borelli / Thesis (Doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Previous issue date: 2009 / Resumo (English translation): The association of idle machines in architectures called desktop grids represents a significant advance in the solution of complex problems in areas such as science, engineering and commerce, among others. The great difficulty in implementing and sustaining these architectures lies in maintaining the availability levels required by their users, since they are mostly made up of volunteers who share their resources without any formal commitment. This thesis proposes strategies for modelling a resource management system called DGridE - Desktop Grid Economy, founded on the microeconomic principles that guide traditional markets for goods and services. Mechanisms are presented that allow providers to identify their availability and report it to the manager of the cooperative to which they belong. Strategies are also proposed that allow DGridE to structure a market for computational resources, integrating its components through price formation mechanisms, intra- and inter-cooperative negotiation, and control of the commercial transactions arising from the sharing processes. The individual contributions to the cooperative's internal market and the expectations of consumers and providers were incorporated into the mechanism that forms purchase and sale prices, thereby seeking to increase the satisfaction of the market participants. / Abstract: The association of idle machines in desktop grid architectures represents significant progress in the solution of complex problems in areas such as science, engineering and commerce, among others. The difficulty in implementing and sustaining these architectures is maintaining the levels of availability demanded by their users, since the users are mostly volunteers who share their resources without any formalized commitment. This thesis proposes strategies for the design of a resource administration system called DGridE - Desktop Grid Economy, based on the microeconomic principles of traditional markets for goods and services. This approach allowed the structuring of a computational resource market through price formation mechanisms, negotiation inside and outside the administrative domain, and control of the commercial transactions of the sharing processes. The individual satisfaction achieved through successful sharing was appraised in order to determine its influence on the continuity of the market. / Doctorate / Telecommunications and Telematics / Doctor in Electrical Engineering
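The abstract names the inputs to the price mechanism (availability, past contribution, and the expectations of consumers and providers) but not its formula, so the snippet below is only a guessed-at illustration of how such inputs could be combined into ask and bid prices; every coefficient, function and field name is invented and is not the DGridE mechanism.

```java
// Hypothetical price formation for a desktop-grid resource market: a provider's
// asking price falls as its availability and past contribution rise (it wants to
// stay attractive), while a consumer's bid rises with the urgency of its job.
public class GridPriceFormation {
    static double askPrice(double basePrice, double availability, double pastContribution) {
        // availability and pastContribution are normalised to [0, 1]
        double reputationDiscount = 0.3 * pastContribution;   // invented coefficient
        double scarcityPremium = 0.5 * (1.0 - availability);   // invented coefficient
        return basePrice * (1.0 + scarcityPremium - reputationDiscount);
    }

    static double bidPrice(double basePrice, double urgency, double expectation) {
        // urgency in [0, 1]; expectation is the consumer's view of a fair price
        return Math.min(expectation, basePrice * (1.0 + 0.4 * urgency));
    }

    public static void main(String[] args) {
        double ask = askPrice(10.0, 0.8, 0.6);
        double bid = bidPrice(10.0, 0.9, 12.0);
        System.out.printf("ask=%.2f bid=%.2f -> %s%n", ask, bid,
                bid >= ask ? "trade at " + String.format("%.2f", (ask + bid) / 2) : "no trade");
    }
}
```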
|
339 |
DSI-RTree - Um Índice R-Tree Distribuído Escalável / DSI-RTree - A Distributed Scalable R-Tree Index / OLIVEIRA, Thiago Borges de, 15 December 2010
The demand for spatial data processing systems that support the creation of massive applications has steadily grown in an increasingly ubiquitous computing world. These demands aim to exploit the large amount of existing data to assist people's daily lives and provide new tools for business and government. Most of the current solutions for processing spatial data do not meet the scalability needed, and thus new solutions that efficiently use distributed computing resources are needed. This work presents a distributed and scalable system called DSI-RTree, which implements a distributed index to process spatial data in a cluster of computers. We also review details related to the construction of the distributed spatial index, addressing issues such as the size of the data partitions, how those partitions are distributed, and the impact of these choices on the message flow within the cluster. An equation for calculating the size of the partitions based on the size of the data sets is proposed, to ensure efficient query processing on the proposed architecture. We ran window-query experiments on spatial data sets of 33,000 and 158,000 polygons, and the results showed better-than-linear scalability. / Resumo (English translation): In the face of an increasingly attainable ubiquitous computing world, the need for spatial data processing systems that support the creation of massive applications has grown constantly, in order to exploit the large amount of existing data, assist people's daily lives and provide new tools for companies and government. Most current processing solutions do not have the scalability needed to meet this demand, and new distributed solutions that use computational resources efficiently are needed. This work presents DSI-RTree, a distributed and scalable system that implements distributed indexing and processing of spatial data on a cluster of computers. An evaluation of the parameters of the distributed spatial index construction is carried out, addressing aspects such as the size of the partitions created, the way these partitions are distributed, and the impact of these choices on the message exchange between the machines of the cluster. A formula for calculating the partition size as a function of the dataset size is proposed, in order to guarantee efficient query processing on the designed architecture. Practical tests of the system showed better-than-linear scalability when processing window queries on spatial datasets of 32 and 158 thousand polygons.
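A window query against such a distributed index only has to visit the machines whose partition rectangles intersect the query window. The sketch below is an illustrative stand-in for that routing step, not DSI-RTree code: the partition extents, server names and query window are invented, and the test used is the same bounding-rectangle intersection an R-tree node would apply.

```java
import java.util.List;

// Routing step of a distributed window query: only servers whose partition
// MBR intersects the query window are contacted. Data below is illustrative.
public class WindowQueryRouter {
    record MBR(double x1, double y1, double x2, double y2) {
        boolean intersects(MBR o) { return x1 <= o.x2 && o.x1 <= x2 && y1 <= o.y2 && o.y1 <= y2; }
    }
    record Partition(String server, MBR extent) {}

    public static void main(String[] args) {
        List<Partition> index = List.of(
                new Partition("node-01", new MBR(0, 0, 50, 50)),
                new Partition("node-02", new MBR(50, 0, 100, 50)),
                new Partition("node-03", new MBR(0, 50, 100, 100)));
        MBR window = new MBR(40, 10, 60, 30); // query window straddling two partitions

        index.stream()
             .filter(p -> p.extent().intersects(window))
             .forEach(p -> System.out.println("send window query to " + p.server()));
        // Each contacted node then runs the query against its local R-tree and
        // returns only the polygons that actually fall inside the window.
    }
}
```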
|
340 |
Uma solução de alta disponibilidade para o sistema de arquivos distribuidos do Hadoop / A high availability solution for the Hadoop distributed file system / Oriani, André, 1984-, 22 August 2018
Advisor: Islene Calciolari Garcia / Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Computação
Previous issue date: 2013 / Resumo (English translation): System designers generally choose cluster-based file systems as the storage solution for high-performance computing environments, because they provide data with reliability, consistency and high throughput. However, most of these file systems employ a centralized architecture, which compromises their availability. This work focuses specifically on one such system, the Hadoop Distributed File System (HDFS). It proposes a hot standby for the HDFS master node in order to give the system high availability. The hot standby is implemented by (i) extending the replication of the master's state performed by its checkpoint helper, the Backup Node; and by (ii) introducing an automatic failover mechanism. Step (i) took advantage of the message duplication technique developed by another high availability technique for HDFS called Avatar Nodes. Step (ii) employed ZooKeeper, a distributed coordination service. This strategy resulted in small code changes, about 0.18% of the original code, which makes the solution easy to study and maintain. Experiments showed that the overhead imposed by replication neither increased the average resource consumption of the system nodes by more than 11% nor reduced the data throughput compared with the original version of HDFS. The complete transition to the hot standby can take up to 60 seconds under workloads dominated by I/O operations, but less than 0.4 seconds in scenarios where metadata requests predominate. These results show that the solution developed in this work achieved its goals of producing a high availability solution for HDFS with low overhead and the ability to react to failures within a short time. / Abstract: System designers generally adopt cluster-based file systems as the storage solution for high-performance computing environments because they provide data with reliability, consistency and high throughput. But most of those file systems employ a centralized architecture, which compromises their availability. This work focuses on a specimen of such systems, the Hadoop Distributed File System (HDFS). A hot standby for the master node of HDFS is proposed in order to bring high availability to the system. The hot standby was achieved by (i) extending the master's state replication performed by its checkpoint helper, the Backup Node; and by (ii) introducing an automatic failover mechanism. Step (i) took advantage of the message duplication technique developed by another high availability solution for HDFS named AvatarNodes. Step (ii) employed ZooKeeper, a distributed coordination service. That approach resulted in small code changes, around 0.18% of the original code, which makes the solution easy to understand and to maintain. Experiments showed that the overhead implied by replication did not increase the average resource consumption of system nodes by more than 11%, nor did it diminish the data throughput compared to the original version of HDFS. The complete transition to the hot standby can take up to 60 seconds on workloads dominated by I/O operations, but less than 0.4 seconds when metadata requests predominate.
Those results show that the solution developed in this work achieved its goals of producing a high availability solution for HDFS with low overhead and a short reaction time to failures. / Master's / Computer Science / Master in Computer Science
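Automatic failover of the kind described above is commonly built on an ephemeral znode: whichever node manages to create it acts as the active master, and the standby watches it so that it can take over the moment the active's ZooKeeper session dies. The sketch below shows that generic pattern with the stock ZooKeeper client API; it is an illustration of the mechanism, not code from this dissertation, and the znode path, connection string and node name are placeholders.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Ephemeral-znode leader election: the active master holds /hdfs/active-master;
// if its ZooKeeper session expires (crash, network partition), the znode is
// deleted and the hot standby's watch fires, triggering the failover.
public class MasterFailover implements Watcher {
    private static final String LOCK_PATH = "/hdfs/active-master"; // placeholder; parent /hdfs assumed to exist
    private final ZooKeeper zk;
    private final String myId;

    MasterFailover(String connectString, String myId) throws Exception {
        this.zk = new ZooKeeper(connectString, 15000, this);
        this.myId = myId;
    }

    void tryToBecomeActive() throws KeeperException, InterruptedException {
        try {
            zk.create(LOCK_PATH, myId.getBytes(StandardCharsets.UTF_8),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println(myId + ": now ACTIVE, serving namespace requests");
        } catch (KeeperException.NodeExistsException e) {
            System.out.println(myId + ": standby, watching " + LOCK_PATH);
            // Re-register the watch; if the active vanished in the meantime, retry.
            if (zk.exists(LOCK_PATH, this) == null) tryToBecomeActive();
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted && LOCK_PATH.equals(event.getPath())) {
            try {
                tryToBecomeActive(); // the active died: race to take over
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        new MasterFailover("zk1:2181,zk2:2181", "backup-node").tryToBecomeActive();
        Thread.sleep(Long.MAX_VALUE); // keep the session alive for the demo
    }
}
```

In a real deployment the node that wins the race must also fence the old active (or verify it has really stopped serving) before answering client requests, otherwise a network partition could briefly leave two masters.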
|