Global ETD Search

191	Yali : uma extensão do modelo linda para programação paralela em redes heterogêneas / Yali, an extension to the linda model intended for parallel programming in heterogeneous computer networks Charao, Andrea Schwertner January 1996 (has links) Com a disponibilidade de redes que ligam estações cada vez mais poderosas a baixos custos, o interesse em torno de ferramentas que suportam a programação paralela em arquiteturas deste tipo tem aumentado significativamente. Esta dissertação trata do projeto e implementação de YALI (Yet Another Linda Implementation), uma ferramenta destinada ao desenvolvimento e execução de programas paralelos em redes heterogêneas de computadores. Com o objetivo de oferecer uma interface simples e flexível para os usuários programadores, YALI baseia-se no modelo Linda[GEL85], que destaca-se por utilizar uma abstração de alto nível para a cooperação entre processos. Em Linda, processos interagem por intermédio de uma memória associativa logicamente compartilhada, denominada Espaço de Tuplas. Entre outras vantagens deste modelo pode-se citar a simplicidade de suas primitivas e a possibilidade de incorporá-las a uma linguagem seqüencial conhecida, o que contribui fortemente para sua fácil assimilação, mesmo por usuários com pouca experiência em programação paralela. Após uma descrição detalhada do modelo Linda, este trabalho discute varias questões envolvidas no projeto e implementação de sistemas nele baseados. Para oferecer uma visão pratica das soluções mais freqüentemente adotadas para estas questões, quatro sistemas que implementam o modelo para programação paralela em redes são apresentados e avaliados. São eles: Glenda, uma implementacao do modelo baseada na ferramenta PVM (Parallel Virtual Machine); POSYBL (PrOgramming SYstem for distriButed appLications), um sistema construído através de recursos de sistemas operacionais compatíveis com Unix; p4-Linda, construído a partir da ferramenta de programação paralela p4 e, por fim, Network-Linda, uma implementação comercial do modelo. Depois do estudo dos quatro sistemas acima, o projeto de YALI e discutido detalhadamente. Decidiu-se, inicialmente, que YALI deveria incorporar o modelo Linda a linguagem C, que é largamente utilizada no desenvolvimento de programas de propósito geral. Além disso, optou-se por estender o modelo com algumas novas primitivas, de modo a oferecer maior poder de expressão ao usuário. Basicamente, as primitivas que YALI acrescenta ao modelo servem para dar suporte a operações globais e a criação dinâmica de threads. Operações globais servem para expressar a comunicação e a sincronização entre múltiplos processos, sendo utilizadas com bastante freqüência em vários tipos de programas paralelos. YALI suporta operações globais de maneira totalmente ortogonal ao modelo Linda, garantindo melhor desempenho sem afetar o nível de abstração oferecido. o suporte a criação dinâmica de threads, por outro lado, tem o objetivo de permitir a exploração de um paralelismo de granularidade fina, adequado ate mesmo a execução de rotinas simples em paralelo. Para suportar o desenvolvimento e execução de aplicações paralelas, YALI e implementado através de três componentes distintos. O primeiro e um pré-processador, que garante uma interface simplificada com o usuário. 0 segundo e uma biblioteca, que contem as rotinas de suporte as primitivas YALI e deve ser ligada aos programas de usuários. O terceiro componente, por fim, e um utilitário destinado a controlar a inicialização e o termino de aplicações paralelas, que baseia-se em uma configuração estabelecida pelo usuário para distribuir processos sobre uma rede de computadores. Ao contrário da maioria dos sistemas baseados em Linda, YALI implementa um espaço de tuplas distribuído entre os processos que compõem uma aplicação paralela, dispensando o use de processos especializados no gerenciamento de tuplas. Para isso, YALI utiliza múltiplas threads em cada processo definido pelo usuário, e distribui tuplas sobre estes processos através de um mecanismo baseado em hashing. A implementação de YALI leva em conta a heterogeneidade inerente a ambientes de rede, permitindo que maquinas com diferentes arquiteturas e sistemas operacionais sejam utilizadas na execução de programas paralelos. Por fim, YALI é totalmente implementado a partir de recursos presentes em sistemas compatíveis com Unix, de modo a aumentar sua portabilidade e garantir sua eficiência. / With the availability of networks connecting powerful workstations at a low cost, increasing interest has been devoted to systems that support parallel programming in such architectures. This document describes the design and implementation of YALI (Yet Another Linda Implementation), a tool that allows the development and execution of parallel programs in heterogeneous computer networks. Aiming to provide a simple and flexible interface for its users, YALI is based on the Linda parallel programming model[GEL85], that outstands in providing a high level abstraction for cooperation between processes. In Linda, communication and synchronization take place through an associative, logically shared memory called Tuple Space. Among the advantages of this model, one can mention the simplicity of its primitives, and the possibility of incorporate them in a well-known sequential language. These characteristics make Linda easy to learn, even to users with little experience in parallel programming. After a detailed description of the Linda model, this document discusses some design and implementation issues related to Linda-based systems. In order to provide a practical view of some usual solutions to address these issues, four Linda-based systems are presented and evaluated. These systems are: Glenda, an implementation of Linda built on top of PVM (Parallel Virtual Machine); POSYBL (PrOgramming SYstem for distriButed appLications), that relies on features provided by Unix-like operating systems to implement the model; p4-Linda, built on top of p4 parallel programming tool and, at last, Network-Linda, a comercial product based on Linda. All these systems, as YALI, are specially tailored to parallel programming in computer networks. Following the study of the four systems, this documents presents the design of the YALI system. One of the first design decisions was to incorporate the Linda primitives to the C language, that is broadly used as a general purpose programming language. In addition, a set of new primitives was designed as an extension to the original model, in order to increase YALI's expressivenes. Basically, the new primitives support global operations and dynamic thread creation. Global operations are useful to express communication and synchronization among multiple processes, and are frequently used many classes of parallel programs. YALI gives support to global operations in a way that is totally ortoghonal to the Linda model, ensuring better performance without affecting the abstraction level inherent to Linda-based systems. The support to dynamic thread creation, on the other hand, is helpful to explore lightweight parallelism, which allows the execution of simple routines in parallel. To support the development and execution of parallel applications, YALI is made up of three distinct components. The first is a pre-processor, that provides a simple user interface. The second is a library, that must be linked to the user programs since it's where YALI primitives are actuall y implemented. Finally, the third component is an utility that controls initialization and termination of parallel applications, which takes configuration parameters from the user to distribute processes over a newtork. In contrast with most Linda-based systems, YALI relies on a tuple space that is distributed among the processes in the same parallel application, so that intermediate tuple managers are not necessary To implement that, multiple threads are embedded in each user process, and tuples are spread over the processes in the basis of a hashing mechanism. YALI's implementation takes in account the inherent heterogeneity of network environments, allowing machines with different architectures and operating systems to be used in the execution of parallel programs. Finally, YALI is build on top of common features of Unix-like operating systems, in order to increase its efficiency and portability. Sistemas operacionais Programação paralela Sistemas distribuidos Parallel programming Linda Heterogeneity Distributed systems
192	Proposta de uma infraestrutura de baixo custo com multiprocessamento e utilizando software aberto Silva, Everaldo Lopes 16 May 2012 (has links) Made available in DSpace on 2016-04-29T14:23:06Z (GMT). No. of bitstreams: 1 Everaldo Lopes Silva.pdf: 2560111 bytes, checksum: 80d8866035ff50f6bddd07b19b709209 (MD5) Previous issue date: 2012-05-16 / This dissertation has the objective of identifying the technical aspects that deal with the utilization of computer cluster, specially the platforms with Linux operational system. It will be presented some cluster models in Linux, recognizing its advantages and its disadvantages and finally indicating the chosen model with the due justification. As part of this work, we will propose a laboratory with a cluster of two equipments connected with two gigabit interfaces each one and one computer working stand-alone. It will run Artificial Intelligence and Digital Design programs in this cluster, comparing its performance with only one computer running the same programs. The measuring and analysis will indicate if the Linux cluster would be a feasible infrastructure in technical and financial terms for AI and Digital Design application. The research method will be naturally the experimental and the approach method will be inductive, for through the results of the experimentation and technical analysis, it will be able to apply the knowledge achieved in others similar environments. For putting the experimental activity in the correct context, it will be used the more significant and contemporary research theories to establish in a clear way the scientific approach that it will lead the whole work / Esta dissertação visa identificar os aspectos técnicos e teóricos que envolvem a utilização de cluster de computadores, tratando especialmente de plataformas com o sistema operacional Linux. Serão apresentados alguns modelos de cluster em Linux, reconhecendo suas vantagens e desvantagens e por fim indicando o modelo escolhido com a devida justificativa. Como parte do trabalho, proporemos um laboratório com um agrupamento de dois equipamentos conectados com duas interfaces de rede gigabit ethernet em cada um e um computador trabalhando isoladamente. Executaremos programas de Inteligência Artificial e Design Digital nesse cluster e compararemos o seu desempenho com apenas um computador executando esses mesmos programas. As medições e análise servirão como base para análise para a verificação se um cluster de Linux seria uma infraestrutura viável em termos técnicos e financeiros para aplicações de Inteligência Artificial e Design Digital. O método de pesquisa será naturalmente a pesquisa experimental e o método de abordagem será indutivo, pois através dos resultados da experimentação e da análise técnica se poderá aplicar o conhecimento obtido em situações semelhantes. Para contextualizar a atividade experimental abordaremos as teorias de pesquisa mais significativas e contemporâneas para que se estabeleça de maneira clara a abordagem científica que norteará o trabalho como um todo Computação Redes Sistemas operacionais Processamento distribuído Computation Networking Computation Distributed systems CNPQ::OUTROS
193	Monitoramento on-line em sistemas distribuídos : mecanismo hierárquico para coleta de dados / On-line monitoring of distributed systems: a hierarchical mechanism for data collection Tesser, Rafael Keller January 2011 (has links) Este trabalho propõe um modelo hierárquico para coleta de dados de monitoramento em sistemas distribuídos. Seu objetivo é proporcionar a análise on-line do comportamento de sistemas e programas distribuídos. O meio escolhido para realizar essa análise foi a visualização. Inicialmente é apresentada uma contextualização sobre monitoramento de sistemas distribuídos. Também são abordados aspectos específicos ao monitoramento de Grid. Após, é analisado um conjunto de ferramentas de monitoramento. Então tem-se a apresentação do modelo proposto. Esse é composto por coletores locais, por uma hierarquia de agregadores e por clientes. É utilizado o modelo push de transmissão de dados e há um mecanismo de subscrição aos coletores. Foi implementado um protótipo do modelo de coleta proposto, que foi utilizado na implementação de um protótipo de ferramenta de monitoramento on-line. Nessa, os dados coletados são fornecidos ao DIMVisual, que é um modelo de integração de dados para visualização. Para visualização, o protótipo utiliza a ferramenta TRIVA, que recebe os dados integrados como entrada. Essa ferramenta foi modificada para gerar uma visualização que é atualizada de maneira on-line. Também foram realizados experimentos para avaliar o tempo necessário para enviar mensagens com diferentes hierarquias e configurações dos coletores. Além disso, foi avaliada a capacidade de o cliente implementado processar os dados recebidos, gerando sua visualização. / This work proposes a hierarchical model for collecting monitoring data from distributed systems. Its goal is to allow the on-line analysis of the behavior of distributed systems and applications. The means we chose to perform this analysis is to generate a visualization of the collected information. In the beginning of this dissertation we present an overview of the monitoring of distributed systems. Aspects that are specific to the monitoring of Grid systems are also reviewed. Next, we have an analysis of a set of monitoring tools. Then we present the proposed model, which is composed by local collectors, an hierarchical structure of aggregators and clients. A push data transmission model is used in the model and it also has a subscription mechanism. A prototype monitoring tool was implemented, integrating the data collection model with DIMVisual and TRIVA. The former is a data integration model whose output is formatted to be used as input for a visualization tool. The later is a visualization tool which, in the prototype, receives the integrated data from DIMVisual. TRIVA generates a visualization of the received information, which is updated in an on-line fashion. In order to evaluate the model, we performed a set of experiments using the prototype. One of the experiments measured the time spent to send data though different hierarchies. In these tests we have also varied the quantity and the configuration of the collectors. In another experiment we evaluated the capacity of the client to process the received data. Processamento paralelo Sistemas distribuidos Visualização Monitoring Distributed systems Distributed applications Data collection Behavoral analysis Information visualization
194	Oncogrid: uma grade computacional para a integração e compartilhamento de dados médicos em oncologia. / Oncogrid: a grid computing to the integration and sharing medical data in oncology. Alves, Higor Aparecido Vieira 28 August 2008 (has links) No Brasil as informações sobre o câncer estão distribuídas entre diferentes instituições que realizam o seu tratamento, nesse contexto são necessárias ferramentas para o levantamento do cenário nacional que possa auxiliar na atenção a doença. Este contexto motivou a criação do Oncogrid, que é uma grade computacional para integração e compartilhamento de dados médicos em oncologia e permitirá à comunidade médica a análise dos tratamentos aplicados com reflexos na gestão do câncer. Foi realizada uma pesquisa analizando as diferentes arquiteturas e componentes utilizados em projetos de grade voltados à saúde, a fim de propor uma arquitetura flexível, modular e escalável para o Oncogrid, em conformidade com as necessidades brasileiras. Realizou-se um projeto piloto entre o LSI/EPUSP e o NUTES/UFPE o qual implementou uma aplicação para geração de curvas de sobrevida utilizando o método Kaplan-Meier e serviu para avaliar a arquitetura do Oncogrid. Os resultados obtidos comprovaram a viabilidade da arquitetura utilizada e o potencial da proposta de uma grade computacional como um novo paradigma para a integração e compartilhamento de informações. O Oncogrid mostrou-se uma arquitetura computacional interessante para a realidade brasileira, especialmente no acesso as informações distribuídas, o que pode fornecer maiores subsídios para a evolução dos tratamentos e desenvolvimento de novas frentes de pesquisas. / In Brazil the cancer information is distributed among several institutions that accomplish your treatment, in this context we are need tools to build a national scenery that can be aid the cancer care. This context motivated the Oncogrid creation that is a grid computing for integration and sharing medical data in oncology and will allow the medical community to analise the applied treatments with reflection in cancer management.A study was done to analise the several architectures and components used in grid projects to health care, making possible to propose a flexible, modular and scalable architecture to the Oncogrid accordingly with the brazilian reality. An initial project between LSI/EPUSP and NUTES/UFPE that was developed an application to plot the survival curve using the Kaplan-Meier method and allow the evaluation of the Oncogrid architecture. The results achieved confirm the architecture viability used and the proposal potentiality of a grid computing with a new paradigm to the integration and sharing informations. The Oncogrid shows a viable computing architecture to Brazil, especially to access distributed information that can be prove great contributions to treatment evolution and to develop new research areas. Distributed systems Grid computing Health (computing systems) Oncologia Oncology Saúde (sistemas computacionais) Sistemas distribuídos
195	Uma ferramenta orientada ao objeto para monitoramento de cargas em sistemas paralelos. / An object oriented tool for load monitoring in parallel systems. Boas, Paulino Ribeiro Villas 27 April 2004 (has links) Este trabalho apresenta uma ferramenta orientada ao objeto para o monitoramento de cargas em sistemas paralelos. O desenvolvimento desta ferramenta surgiu com o intuito de facilitar a programação paralela em sistemas distribuídos como NOWs, Networks of Workstations , e Grids computacionais, pois este tipo de programação é bem mais difícil do que a seqüencial e, por isso, desestimula novos programadores a desenvolver aplicações paralelas. Dentre as razões que tornam a programação paralela difícil destaca-se o balanceamento de cargas em que se quer maximizar a utilização dos recursos computacionais do sistema distribuído. Outro motivo para o programador de aplicações paralelas se preocupar com balanceamento de cargas é o desempenho, que é drasticamente afetado com o desequilíbrio de cargas do sistema. Com relação ao tempo em que as decisões de rebalanceamento de cargas são tomadas, os algoritmos de distribuição de cargas podem ser estáticos, realizados em tempo de compilação, ou dinâmicos, efetuados em tempo de execução. Embora o algoritmo estático não gere sobrecarga em tempo de execução na distribuição de carga, o dinâmico é a melhor escolha, pois se adapta bem em qualquer situação. Assim, o sistema de monitoramento de cargas surge como uma ferramenta de auxílio ao programador que deseje implementar algoritmos de balanceamento dinâmico de cargas nas suas aplicações paralelas, provendo informações de como os recursos computacionais do sistema distribuído estão sendo utilizados. / This work presents an object oriented tool for load monitoring in parallel systems. This tool was developed with intention to easy the parallel programming in distributed systems like NOWs (Networks of Workstations) and Computational Grids, because this type of programming is more difficult than the sequential and, therefore, it does not stimulate new programmers to develop parallel softwares. One of the most important reasons why parallel programming is difficult is the worry about load balancing where the purpose is to maximize the use of the computational resources of the distributed system. Another reason for the programmer of parallel softwares to worry about load balancing is the performance, which is drastically affected with the load imbalance of the system. With respect to the time where the decisions of load balancing are made, the load distribution algorithms can be static, done at compilation time, or dynamic, done at execution time. Although the static algorithm does not generate overhead at execution time, the dynamic one is a better choice, because it adapts well to any situation. Thus, the monitoring system appears as a tool to aid the programmer who desires to implement dynamic load balancing algorithms in his or her parallel softwares, providing information on how the computational resources of the distributed system are being used. Computação paralela Distributed systems Load monitoring Monitoramento de carga Parallel computing Sistemas distribuidos
196	Design of a Network Independent Emergency Service Khayltash, Golara 28 February 2007 (has links) Student Number : 9301997W - MSc thesis - School of Electrical and Information Engineering - Faculty of Engineering and the Built Environment / Emergency services are vital for the minimization of damage, injury and loss of life. These services are, by definition, a combination of telecommunications and information services, and are by nature, distributed. However, most current emergency services do not take advantage of emerging technology, and hence, are restricted in the functionality they offer. This project proposes the design a full information structure for an emergency call centre service, which can be offered as a service or application on any core network. As emergency services are distributed, and combine both telecommunications and information services, an appropriate design tool which caters for these issues, is the Reference Model for Open Distributed Processing (RM-ODP), which will be used in the design of the emergency service. In addition, OSA/Parlay Application Programming Interfaces (APIs) will be used for the application to access telecommunication network functionality. The enterprise viewpoint examines the design requirements and considerations for an emergency system, which is the first step in designing a service based on the RMODP guidelines. Secondly, the information viewpoint is defined, which identifies the information flows between the objects and classes defined in the enterprise viewpoint with the aid of robustness diagrams and high level message sequence charts. Next, the computational viewpoint of the emergency service describes the components that the service consists of and the interfaces through which they communicate, enabling distribution of the system to be visualized. In addition, the engineering and technology viewpoints are briefly touched upon. The RM-ODP proves to be a useful tool the design of this application. In addition, the use of OSA/Parlay APIs have also proved beneficial, enabling the application to run on any platform, irrespective of the level of functionality it already provides. The benefits that this design offers over conventional emergency services are allowing callers and emergency response personnel full access to the functionality of the service, despite any limitations on their telecommunications network, finding the location of a caller from a fixed or mobile phone, ease and speed of obtaining relevant emergency information, and the ease and speed of sending relevant information to emergency response personnel. Finally we recommend improvements in the reliability and accuracy of finding the location of mobile phones, as well as creating ways of identifying the location of VoIP users. emergency services OSA/Parlay distributed systems RM-ODP location-based services
197	Controle de acesso para sistemas distribuídos. / Access control for distributed systems. Souza, Marcos Tork 22 November 2010 (has links) A implementação de arcabouços de controle de acesso para sistemas distribuídos é dificultada pelas características próprias desta classe de ambientes, demandando modificações tanto na arquitetura do arcabouço quanto no modelo de política de controle de acesso usualmente empregados em arquiteturas não distribuídas. Este trabalho tenciona sanar ou mitigar estas dificuldades formalizando os requisitos desta classe de ambientes em duas frentes distintas (arquitetura e modelo de política de acesso) e analisando o impacto que uma exerce sobre a outra. Duas conclusões fundamentais são suportadas por esta análise: a necessidade do arcabouço ser construído na forma de um sistema distribuído, e que embora um modelo de política de fato possa ser escolhido, a especificação deste precisará ser modificada de forma a se adaptar às características específicas do ambiente. O arcabouço DRBAC (Distributed Role Based Access Control) foi desenvolvido sobre uma arquitetura distribuída e aplica o modelo de política de controle de acesso baseado em papéis. A arquitetura foi obtida a partir da expansão da arquitetura de referência de ferramentas de controle de acesso e a especificação do modelo foi desenvolvida a partir da especificação padronizada pelo NIST (National Institute Of Standards and Technology). A validação do trabalho é levada a termo por meio de uma série de experimentos realizados sobre a implementação de uma prova de conceito deste arcabouço. / The creation of frameworks for access control in distributed systems is made difficult by this class of systems own characteristics, demanding changes in both the architecture of the framework and in the model of access control policy usually employed on non distributed systems. This works aims to solve or at least mitigate these problems by formalizing these requirements in two different fronts (architecture and model of access control policy) and analyzing its mutual impacts. Two fundamental conclusions are supported by this analysis: the need for the framework to be built in the form of a distributed system, and that although a policy model can indeed be chosen, the specification of this should to be modified to adapt the specific features of the environment. The DRBAC (Distributed Role Based Access Control) framework is built following a distributed architecture model that applies the Role Based Access Control policy. The DRBAC architecture was obtained from the expansion of the reference architecture for an access control tool for a generic access control system and the DRBAC access policy model was adapted from the one standardized by NIST (National Institute of Standards and Technology). The validation of this work is carried out through a series of experiments conducted on a proof of concept implementation of this framework. Computer network Computer security Distributed systems Redes de computadores Segurança de computadores Sistemas distribuídos
198	Symmetry breaking in congested models: lower and upper bounds Riaz, Talal 01 August 2019 (has links) A fundamental issue in many distributed computing problems is the need for nodes to distinguish themselves from their neighbors in a process referred to as symmetry breaking. Many well-known problems such as Maximal Independent Set (MIS), t-Ruling Set, Maximal Matching, and (\Delta+1)-Coloring, belong to the class of problems that require symmetry breaking. These problems have been studied extensively in the LOCAL model, which assumes arbitrarily large message sizes, but not as much in the CONGEST and k-machine models, which assume messages of size O(log n) bits. This dissertation focuses on finding upper and lower bounds for symmetry breaking problems, such as MIS and t-Ruling Set, in these congested models. Chapter 2 shows that an MIS can be computed in O(sqrt{log n loglog n}) rounds for graphs with constant arboricity in the CONGEST model. Chapter 3 shows that the t-ruling set problem, for t \geq 3, can be computed in o(log n) rounds in the CONGEST model. Moreover, it is shown that a 2-ruling set can be computed in o(log n) rounds for a large range of values of the maximum degree in the graph. In the k-machine model, k machines must work together to solve a problem on an arbitrary n-node graph, where n is typically much larger than k. Chapter 4 shows that any algorithm in the BEEP model (which assumes 'primitive' single bit messages) with message complexity M and round complexity T can be simulated in O(t(M/k^2 + T) poly(log n)) rounds in the k-machine model. Using this result, it is shown that MIS, Minimum Dominating Set (MDS), and Minimum Connected Dominating Set (MCDS) can all be solved in O(poly(log n) m/k^2) rounds in the k-machine model, where 'm' is the number of edges in the input graph. It is shown that a 2-ruling set can be computed even faster, in O((n/k^2+ k) poly(log n)) rounds, in the k-machine model. On the other hand, using information theoretic techniques and a reduction to a communication complexity problem, an \Omega(n/(k^2 poly(log n))) rounds lower bound for MIS in the k-machine model is also shown. As far as we know, this is the first example of a lower bound in the k-machine model for a symmetry breaking problem. Chapter 5 focuses on the Max Clique problem in the CONGEST model. Max Clique is trivially solvable in one round in the LOCAL model since each node can share its entire neighborhood with all neighbors in a single round. However, in the CONGEST model, nodes have to choose what to communicate and along what communication links. Thus, in a sense, they have to break symmetry and this is forced upon them by the bandwidth constraints. Chapter 5 shows that an O(n^{3/5})-approximation to Max Clique in the CONGEST model can be computed in O(1) rounds. This dissertation ends with open questions in Chapter 6. Distributed Algorithms Distributed Systems Graph Theory Information Theory Maximal Independent Set Modes of Computation Computer Sciences
199	Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications Sharma, Rahil 01 December 2016 (has links) Big data refers to information which cannot be processed and analyzed using traditional approaches and tools, due to 4 V's - sheer Volume, Velocity at which data is received and processed, and data Variety and Veracity. Today massive volumes of data originate in domains such as geospatial analysis, biological and social networks, etc. Hence, scalable algorithms for effcient processing of this massive data is a signicant challenge in the field of computer science. One way to achieve such effcient and scalable algorithms is by using shared & distributed memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in various above mentioned domains. We solve five problems that fall into two categories. The first group of problems deals with the issue of community detection. Detecting communities in real world networks is of great importance because they consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds with a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical to use for large real world networks. To address this limitation, we develop a new highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the size of the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach up to hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high quality communities in massive Twitter and .uk domain networks. The second group of problems deals with the issue of effciently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in building LiDAR sensors have enabled highly accurate and dense LiDAR point clouds resulting in massive data volumes, which pose computing issues with processing and storage. We develop the first published landscape driven data reduction algorithm, which uses the slope-map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts shared memory based parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e. Digital Elevation Models (DEMs), from discrete LiDAR point clouds. Algorithms Big Data Analytics Distributed Systems High Performance Parallel Computing Computer Sciences
200	Effective task assignment strategies for distributed systems under highly variable workloads Broberg, James Andrew, james@broberg.com.au January 2007 (has links) Heavy-tailed workload distributions are commonly experienced in many areas of distributed computing. Such workloads are highly variable, where a small number of very large tasks make up a large proportion of the workload, making the load very hard to distribute effectively. Traditional task assignment policies are ineffective under these conditions as they were formulated based on the assumption of an exponentially distributed workload. Size-based task assignment policies have been proposed to handle heavy-tailed workloads, but their applications are limited by their static nature and assumption of prior knowledge of a task's service requirement. This thesis analyses existing approaches to load distribution under heavy-tailed workloads, and presents a new generalised task assignment policy that significantly improves performance for many distributed applications, by intelligently addressing the negative effects on performance that highly variable workloads cause. Many problems associated with the modelling and optimisations of systems under highly variable workloads were then addressed by a novel technique that approximated these workloads with simpler mathematical representations, without losing any of their pertinent original properties. Finally, we obtain advance queuing metrics (such as the variance of key measurements like waiting time and slowdown that are difficult to obtain analytically) through rigorous simulation. Scheduling policies distributed systems task assignment heavy-tailed workloads load balancing load sharing queuing theory

Search results