About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Lygiagretaus skaičiavimo technologijų naudojimas Kuršių marių ekologiniame modelyje / Parallel computing technology application to Curonian Lagoon ecological model

Bliūdžiutė, Lina 14 June 2005 (has links)
Modern computers can complete most tasks in fairly short time; however, there are areas in which calculations can last for months or even years. Parallel algorithms are one way to accelerate such long-running calculations. In this thesis we analyze the parallel computing technologies OpenMP (shared memory) and MPI (distributed memory) and the pros and cons of their architectures. We identify the parts of the Curonian Lagoon model's program code that are candidates for parallelization and apply the chosen technologies, OpenMP and MPI, to them. We also analyze the runtime speedup.
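As a rough illustration of the distributed-memory approach the abstract describes, the sketch below splits a model grid across MPI ranks with mpi4py and exchanges ghost rows each time step. The relaxation kernel, grid dimensions and variable names are illustrative stand-ins, not the thesis's actual lagoon model code, and the grid height is assumed divisible by the number of ranks.

```python
# Minimal domain-decomposition sketch (assumes NROWS % size == 0).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

NROWS, NCOLS, STEPS = 400, 400, 100           # illustrative dimensions
rows = NROWS // size
local = np.zeros((rows + 2, NCOLS))           # +2 ghost rows for neighbours
local[1:-1, :] = np.random.rand(rows, NCOLS)

up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(STEPS):
    # Exchange halo rows with the neighbouring ranks above and below.
    comm.Sendrecv(local[1, :], dest=up, recvbuf=local[-1, :], source=down)
    comm.Sendrecv(local[-2, :], dest=down, recvbuf=local[0, :], source=up)
    # Stand-in compute kernel: simple relaxation over the local block.
    local[1:-1, 1:-1] = 0.25 * (local[:-2, 1:-1] + local[2:, 1:-1]
                                + local[1:-1, :-2] + local[1:-1, 2:])
```

Run with, e.g., `mpiexec -n 4 python model.py`; the shared-memory (OpenMP) counterpart would instead parallelize the update loop across threads within a single process.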
12

Transforming Medical Imaging Applications into Collaborative PACS-based Telemedical Systems

Maani, Rouzbeh 13 October 2010 (has links)
Many medical imaging applications have been developed, but many of them do not support collaboration and are not remotely accessible (i.e., telemedicine). Medical imaging applications are not practical for use in clinical workflows unless they can communicate with the Picture Archiving and Communication System (PACS). This thesis presents an approach based on a three-tier architecture and provides several components to transform medical imaging applications into collaborative, PACS-based, telemedical systems. A novel method is presented to support PACS connectivity: it uses the Digital Imaging and COmmunications in Medicine (DICOM) protocol and reduces transmission time by combining parallelism and compression. Experimental results show up to 1.63x speedup over Local Area Networks (LANs) and up to 16.34x speedup over Wide Area Networks (WANs) compared to the current method of medical data transmission.
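The core idea of splitting the payload into chunks and compressing them concurrently can be sketched as follows. This is a generic illustration, not the thesis's DICOM pipeline: the chunk size, worker count and placeholder payload are assumptions, and the actual network transfer is omitted. Since zlib releases the GIL during compression, threads genuinely process chunks in parallel.

```python
# Chunked, concurrent compression of a byte payload.
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20                      # 1 MiB chunks (illustrative)

def compress_chunk(chunk: bytes) -> bytes:
    return zlib.compress(chunk, level=6)

def parallel_compress(payload: bytes, workers: int = 4) -> list[bytes]:
    chunks = [payload[i:i + CHUNK] for i in range(0, len(payload), CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))

if __name__ == "__main__":
    data = bytes(8 << 20)            # stand-in for pixel data from a DICOM file
    compressed = parallel_compress(data)
    print(sum(len(c) for c in compressed), "bytes after compression")
```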
14

Análise e estudo de desempenho e consumo de energia de memórias transacionais em software / Performance and energy consumption analysis and study on software transactional memories

Garcia, Leonardo Augusto Guimarães, 1981- 23 August 2018 (has links)
Advisor: Rodolfo Jardim de Azevedo / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The evolution of computer architectures in recent years, with the massive introduction of multi-core processors and multi-processor computers, even in machines considered low-powered, has made it essential to develop new parallel programming paradigms and models that are easy to use and debug for the great majority of systems programmers. The parallel programming models currently available are based on primitives whose use is complex, tedious and highly error prone, such as locks, semaphores, signals, mutexes, monitors and threads. In this scenario, Transactional Memories (TM) appear as a promising alternative that aims to be efficient and, at the same time, easy to program.
Much research has been done in recent years on Software Transactional Memories (STM), most of it concerned with their performance, with little attention given to other important metrics such as energy consumption and how it relates to execution time, the energy-delay product (EDP). This dissertation evaluates these metrics in an STM configured with several TM management and energy-usage policies, some of these combinations new. It is shown that the results for performance and EDP do not always follow the same trend, so it may be appropriate to choose different management policies depending on the optimization target; sometimes sequential execution can be better than any parallel execution. Overall, execution with TM was faster than sequential execution in two thirds of the cases analysed and had a better EDP in one third. From this analysis it was possible to derive a minimal set of TM management policies that delivered the best results for the benchmark suite studied, and to identify trends in the behaviour of TM systems for groups of benchmarks as the number of cores running in parallel and the workload size vary. / Master's in Computer Science
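The EDP metric at the heart of the comparison is simply energy multiplied by execution time, so lower is better. The sketch below shows the kind of policy comparison the abstract describes; the policy names and measurements are made-up placeholders, chosen only to show that the fastest policy need not have the best EDP.

```python
def edp(energy_joules: float, runtime_seconds: float) -> float:
    # Energy-delay product: penalizes both high energy use and long runtime.
    return energy_joules * runtime_seconds

# (energy in joules, runtime in seconds) per policy - hypothetical values
runs = {
    "sequential":       (120.0, 40.0),
    "stm_policy_eager": (200.0, 18.0),   # fastest, but energy-hungry
    "stm_policy_lazy":  (140.0, 22.0),   # slower, yet the best EDP
}

for name, (e, t) in sorted(runs.items(), key=lambda kv: edp(*kv[1])):
    print(f"{name:17s} time={t:5.1f}s energy={e:6.1f}J EDP={edp(e, t):7.1f}")
```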
15

Supercomputing over Cloud using the Quicksort algorithm

Mattamadugu, Lakshmi Narashima Seshendra, Pathan, Ashfaq Abdullah Khan January 2012 (has links)
Context: Cloud computing has advanced in recent years and is attracting attention as a convenient source of computational power. The Cloud is gradually opening new possibilities for the scientific community to build High Performance Computing (HPC) platforms. Despite the wide benefits the Cloud offers, the question on everyone's mind is whether the Cloud is a feasible platform for HPC applications. This thesis evaluates the performance of the Amazon Cloud using a sorting benchmark. Objectives: 1. To survey previous work on HPC applications ported to Cloud environments in various fields, and to assess the problems and challenges of HPC in the Cloud. 2. To study how to implement parallel Quicksort efficiently to obtain good speedup. 3. To develop a parallel Quicksort, deploy it in the Cloud, and measure its performance in terms of speedup. Methods: Two research methods were used: a Systematic Literature Review (SLR) and a quantitative methodology. Papers for the SLR were drawn from the academic databases IEEE Xplore, Inspec, ACM Digital Library and SpringerLink. Results: The systematic review identified 12 HPC applications, 9 problems and 5 challenges in the Cloud, as well as an efficient way to implement parallel Quicksort on the Cloud. The experiments showed low speedup in a Cloud environment. Conclusions: The HPC applications deployed in the Cloud so far were identified, along with their problems and challenges. The Message Passing Interface (MPI) was chosen as the method to develop and implement the parallel Quicksort in the Cloud. Based on the experimental results, we believe the Cloud is not a suitable platform for HPC applications.
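One common way to parallelize quicksort, and a plausible shape for the implementation the abstract describes (the thesis's exact code is not shown here), is a single quicksort-style splitting pass around sampled pivots followed by sorting the buckets in parallel worker processes; speedup is then sequential time divided by parallel time. All sizes and data below are illustrative.

```python
# Partition-then-sort-in-parallel sketch of parallel quicksort.
import random
from multiprocessing import Pool

def partition(data: list[int], nbuckets: int) -> list[list[int]]:
    # Choose pivots from a sample, quicksort-style, to split into buckets.
    pivots = sorted(random.sample(data, nbuckets - 1))
    buckets = [[] for _ in range(nbuckets)]
    for x in data:
        i = 0
        while i < len(pivots) and x > pivots[i]:
            i += 1
        buckets[i].append(x)
    return buckets

def parallel_quicksort(data: list[int], nbuckets: int = 4) -> list[int]:
    with Pool(nbuckets) as pool:
        sorted_buckets = pool.map(sorted, partition(data, nbuckets))
    return [x for bucket in sorted_buckets for x in bucket]

if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(100_000)]
    assert parallel_quicksort(data) == sorted(data)
```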
16

A Comparison of Parallel Design Patterns for Game Development

Andblom, Robin, Sjöberg, Carl January 2018 (has links)
As processor performance capabilities can only be increased through the use of a multicore architecture, software needs to be developed to utilize the parallelism offered by the additional cores. Game developers especially need to seize this opportunity to save cycles and decrease the general rendering time. One of the existing advances towards this potential has been the creation of multithreaded game engines that take advantage of the additional processing units. In such engines, different branches of the game loop are parallelized. However, the specifics of the parallel design patterns used are not outlined, and no ideas for how to combine these patterns are proposed. These missing factors are addressed in this article, to provide a guideline for when to use which of two parallel design patterns: fork-join and pipeline parallelism. Through a collection of data and a comparison using the metrics speedup and efficiency, conclusions were derived that shed light on the ways in which a typical part of a game loop can most efficiently be organized for parallel execution through the use of different parallel design patterns. The pipeline and fork-join patterns were applied respectively in a variety of test cases for two branches of a game loop: a boids system and an animation system.
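The two patterns under comparison can be sketched compactly. Below, a toy per-frame boids update is run fork-join style (fan the work out over a pool, join before the next stage), and a two-stage pipeline passes frames between stage threads through queues. All entity and stage functions are made-up stand-ins, and Python threads illustrate the structure rather than real core-level speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

def update_boid(boid):            # hypothetical per-entity work
    return boid + 1

# Fork-join: fan the per-frame work out over workers, join before rendering.
def fork_join_frame(boids):
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(update_boid, boids))

# Pipeline: each stage runs in its own thread, frames flow through queues.
def pipeline(frames):
    q1, q2, out = Queue(), Queue(), Queue()
    def stage(fn, src, dst):
        while (item := src.get()) is not None:
            dst.put(fn(item))
        dst.put(None)                       # propagate shutdown
    Thread(target=stage, args=(fork_join_frame, q1, q2)).start()
    Thread(target=stage, args=(lambda f: sum(f), q2, out)).start()  # "render"
    for frame in frames:
        q1.put(frame)
    q1.put(None)
    while (result := out.get()) is not None:
        yield result

if __name__ == "__main__":
    for r in pipeline([[0, 1, 2], [3, 4, 5]]):
        print(r)
```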
17

Load Balancing Parallel Explicit State Model Checking

Kumar, Rahul 28 June 2004 (has links) (PDF)
This research first identifies some of the key concerns about the techniques and algorithms developed for distributed and parallel model checking; specifically, the load imbalance and large queue sizes inherent in a static partition algorithm. It then presents a load balancing algorithm that improves run time performance in distributed model checking, reduces the maximum queue size, and reduces the number of states expanded before error discovery. The load balancing algorithm is based on Generalized Dimension Exchange (GDE). This research presents an empirical analysis of the GDE-based load balancing algorithm on three different supercomputing architectures: distributed memory clusters, Networks of Workstations (NOW) and shared memory machines. The analysis shows increased speedup, lower maximum queue sizes and fewer total states explored before error discovery on each of the architectures. Finally, this research presents a study of the communication overhead incurred by the load balancing algorithm, which, although significant, does not offset the performance gains.
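Dimension exchange itself is simple to simulate: in each round every node pairs with its neighbour along one hypercube dimension and shifts work to narrow the difference in their loads. The sketch below uses the plain averaging exchange fraction (lambda = 0.5); GDE generalizes this fraction to suit other topologies. The initial loads are arbitrary illustrative values, and a real distributed model checker would exchange unexplored states rather than numbers.

```python
def dimension_exchange(loads: list[float], lam: float = 0.5) -> list[float]:
    n = len(loads)                      # must be a power of two (hypercube)
    dims = n.bit_length() - 1
    for d in range(dims):
        for node in range(n):
            partner = node ^ (1 << d)
            if node < partner:          # handle each pair once
                diff = loads[node] - loads[partner]
                loads[node] -= lam * diff
                loads[partner] += lam * diff
    return loads

print(dimension_exchange([80.0, 0.0, 30.0, 10.0, 0.0, 60.0, 20.0, 40.0]))
# With lam = 0.5, one sweep over all dimensions balances every node at 30.0.
```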
18

Acceleration of Machine-Learning Pipeline Using Parallel Computing

Erickson, Xavante January 2021 (has links)
Researchers from Lund have conducted research on classifying images in three different categories, faces, landmarks and objects, from EEG data [1]. The researchers used SVMs (Support Vector Machines) to classify between the three categories [2, 3]. The scripts written to compute this had the potential to be heavily parallelized and could be optimized to complete the computations much faster. The scripts were originally written in MATLAB, which is proprietary software and not the most popular language for machine learning. The aim of this project is to translate the MATLAB code of the aforementioned Lund project to Python and to optimize and parallelize it in order to reduce the execution time. With much of data science transitioning to Python as well, a key part of this project was understanding the differences between MATLAB and Python and how to translate MATLAB code to Python. With the exception of the preprocessing scripts, all the original MATLAB scripts were translated to Python. The translated Python scripts were optimized for speed and parallelized to decrease the execution time even further. Two major parallel implementations of the Python scripts were made: one using the Ray framework to compute in the cloud [4], the other using the Accelerator, a framework that computes using local threads [5]. After translation, the code was tested against the original results and profiled for any key mistakes, for example functions which took unnecessarily long to execute. After optimization, the single-threaded script was twelve times faster than the original MATLAB script. The final execution times were around 12-15 minutes; compared to the benchmark of 48 hours this is about 200 times faster. The benchmark of the original code used fewer iterations than the researchers had used, decreasing the computation time from a week to 48 hours. The results of the project highlight the importance of learning and teaching basic profiling of slow code. While not fully considered in this project, complexity analysis of code is important as well. Future work includes a deeper complexity analysis on both a high and a low level, since a high-level language such as Python relies heavily on modules with low-level code. Future work also includes an in-depth analysis of the NumPy source code, as the current code relies heavily on NumPy, which proved to be a bottleneck in this project.
Computers are a central and unavoidable part of many people's everyday lives today, and the advances made in machine learning have made it nearly as important. With the incredible progress in machine learning, it has begun to be used to interpret brain signals, in the hope of creating a BCI (Brain-Computer Interface). Researchers at Lund University conducted an experiment in which they tried to categorize brain signals using machine learning, classifying between three kinds of images: objects, faces and landmarks. One of the major challenges of the project was that the computations took a very long time on an ordinary computer, around a week. The task of this project was to improve and speed up the computation time of the code. The project translated the code from MATLAB to Python and made use of profiling, clusters and an acceleration tool. With profiling one can locate the parts of the code that run slowly and improve them; in short, an optimization tool. A cluster is a collection of computers that can be used to compute larger problems collectively, to increase computation speed. This project used a framework called Ray, which made it possible to run the code on a cluster owned by Ericsson. An acceleration tool called the Accelerator was also implemented, separately from the Ray implementation. The Accelerator uses only local processors to parallelize a problem, rather than several computers. Its biggest advantage is that it keeps track of what has and has not been computed and saves all results automatically, so old results can be reused in new computations when old code is run again, avoiding the time it would take to recompute them. This project improved the computation speed to over two hundred times faster than before. Both Ray and the Accelerator gave improvements of over two hundred times, with the best results from the Accelerator at around two hundred and fifty times. It should be noted, however, that the best Accelerator results were obtained on a good server processor. A good server processor is a large investment, while a cluster service only charges for the time used, which can be cheaper in the short term; if the computing power is needed often, a server processor can be more economical in the long run. A two-hundredfold improvement can have major consequences if such speedups can be seen for BCI in general: brain signals could potentially be interpreted closer to real time and used to control devices or electronics. The results of this project also showed that NumPy, a common numerical library in Python, slowed the code down with the default settings it ships with. NumPy made the code slower by using several processor threads, even in a multithreaded environment where manual parallelization had already been done; this held for both the multi-threaded and the single-threaded implementations, which suggests that NumPy can slow code down in general, something many are unaware of. After manually setting the environment variables that NumPy responds to, the code was more than three times as fast as before. / Xavante Erickson, ORCID: 0009-0000-6316-879X
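The NumPy pitfall described above is worth a concrete sketch: BLAS backends spawn their own threads by default, which can oversubscribe cores when the program is already parallelized manually. The environment variables below are real knobs respected by common backends, but which ones take effect depends on how NumPy was built, and they must be set before NumPy is first imported; the matrix product is just a placeholder workload.

```python
import os
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ[var] = "1"              # one BLAS thread per process

import numpy as np                     # imported only after the env is set

# Manual parallelism is now free to use the cores itself, e.g. one process
# per core, each running single-threaded NumPy.
a = np.random.rand(1000, 1000)
print((a @ a).trace())
```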
19

Distributed parallel processing in networks of workstations

Wang, Yang January 1994 (has links)
No description available.
20

Resource Efficient Parallel VLDB with Customizable Degree of Redundancy

Xiong, Fanfan January 2009 (has links)
This thesis focuses on the practical use of very large scale relational databases. It leverages two recent breakthroughs in parallel and distributed computing: a) synchronous transaction replication technologies by Justin Y. Shi and Suntain Song; and b) the Stateless Parallel Processing principle pioneered by Justin Y. Shi. These breakthroughs enable scalable performance and reliability of database service using multiple redundant shared-nothing database servers. This thesis presents a Functional Horizontal Partitioning method with a customizable degree of redundancy to address practical very large scale database application problems. The prototype VLDB implementation is designed for transparent, non-intrusive deployment and supports Microsoft SQL Server databases. Computational experiments are conducted using an industry-standard benchmark (TPC-E). / Computer and Information Science
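Horizontal partitioning with a customizable degree of redundancy can be illustrated with a simple placement function: each row key maps to `redundancy` consecutive shards out of `nshards`. The hashing scheme below is a generic illustration, not the thesis's actual Functional Horizontal Partitioning method, and all names are hypothetical.

```python
import hashlib

def shards_for_key(key: str, nshards: int, redundancy: int) -> list[int]:
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    home = h % nshards
    return [(home + i) % nshards for i in range(redundancy)]

# Each row is written to every shard in its list; reads succeed as long as
# one of the `redundancy` replicas is reachable.
print(shards_for_key("customer:42", nshards=6, redundancy=2))
```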
