91

A Unified Infrastructure for Monitoring and Tuning the Energy Efficiency of HPC Applications

Schöne, Robert 19 September 2017 (has links)
High Performance Computing (HPC) has become an indispensable tool for the scientific community, enabling simulations of models whose complexity would exceed the limits of a standard computer. An unfortunate trend is that the power consumption of HPC systems under demanding workloads keeps increasing. To counter this trend, hardware vendors have implemented power-saving mechanisms in recent years, which has increased the variability in the power demands of single nodes. These capabilities provide an opportunity to increase the energy efficiency of HPC applications. To use these hardware power-saving mechanisms effectively, their overhead must be analyzed. Furthermore, applications have to be examined for performance and energy-efficiency issues, which can hint at optimizations. This requires an infrastructure that can capture performance and power-consumption information concurrently. The mechanisms that such an infrastructure inherently supports could further be used to implement a tool that both measures and tunes energy efficiency. This thesis targets all steps in this process by making the following contributions: First, I provide a broad overview of related fields, listing common performance measurement tools, power measurement infrastructures, hardware power-saving capabilities, and tuning tools. Second, I lay out a model that can be used to define and describe energy-efficiency tuning at the scale of program regions. This model includes hardware- and software-dependent parameters. Hardware parameters include the runtime overhead and delay for switching power-saving mechanisms, as well as a consideration of their scopes and their possible influence on application performance. Thus, in a third step, I present methods to evaluate common power-saving mechanisms and list findings for different x86 processors. Software parameters include the performance and power-consumption characteristics of program regions, as well as the influence of power-saving mechanisms on both. Capturing these software parameters requires an infrastructure for measuring performance and power consumption. With minor additions, the same infrastructure can later be used to tune software and hardware parameters. Thus, I lay out the structure of such an infrastructure and describe the common components required for measuring and tuning. Based on that, I implement interfaces that extend the functionality of contemporary performance measurement tools, use them to conflate performance and power measurements, and further process the gathered information for tuning. I conclude this work by demonstrating that the infrastructure can be used to manipulate the power-saving mechanisms of contemporary x86 processors and to increase the energy efficiency of HPC applications.
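As a flavour of the kind of raw data such a measurement infrastructure must combine, the following minimal sketch reads the Linux RAPL energy counter and a core's current frequency from sysfs. The sysfs paths shown are the standard powercap and cpufreq interfaces on recent Linux/x86 systems, but package numbering and available knobs vary by machine; this is an illustration, not the thesis's actual tool.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Read a single integer value from a sysfs file; returns -1 on failure.
long long read_sysfs_value(const std::string& path) {
    std::ifstream f(path);
    long long v = -1;
    f >> v;
    return v;
}

int main() {
    // Cumulative package energy in microjoules (Intel RAPL powercap interface).
    long long e0 = read_sysfs_value("/sys/class/powercap/intel-rapl:0/energy_uj");
    // Current frequency of CPU 0 in kHz (cpufreq interface).
    long long f0 = read_sysfs_value("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");

    // ... run the instrumented program region here ...

    long long e1 = read_sysfs_value("/sys/class/powercap/intel-rapl:0/energy_uj");
    if (e0 >= 0 && e1 >= 0)
        std::cout << "Region consumed ~" << (e1 - e0) / 1e6 << " J at "
                  << f0 / 1000.0 << " MHz\n";  // counter wrap-around ignored in this sketch
}
```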
92

Integrating SkePU's algorithmic skeletons with GPI on a cluster

Almqvist, Joel January 2022 (has links)
As processor clock speeds flattened out in the early 2000s, multi-core processors became more prevalent, and so did parallel programming. However, this programming paradigm introduces additional complexities, and the SkePU framework was created to combat them. SkePU offers a single-threaded interface that executes the user's code in parallel according to a chosen computational pattern. Furthermore, it lets the user decide which parallel backend should perform the execution, be it OpenMP, CUDA, or OpenCL. This modular approach allows different hardware to be used without changing the code; SkePU currently supports CPUs, GPUs, and clusters. This thesis presents a new SkePU backend for clusters, built on the communication library GPI. It demonstrates that the new backend scales better and handles workload imbalance better than the existing SkePU cluster backend, despite performing worse at low node counts, indicating that its overhead grows more slowly as nodes are added. Its weaknesses are also analyzed, partly from a design point of view, and solutions are presented, together with a discussion of why the weaknesses arose in the first place.
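To give a flavour of the programming model, here is a minimal sketch modeled on SkePU's documented Map skeleton. The umbrella header, the explicit arity, and the backend-selection calls are assumptions drawn from SkePU's published interface and may differ between SkePU versions; SkePU programs are also normally run through SkePU's source-to-source precompiler.

```cpp
#include <skepu>  // assumed umbrella header of the SkePU framework

// A Map skeleton: SkePU applies the lambda element-wise, in parallel,
// using whichever backend is selected (OpenMP, CUDA, OpenCL, or cluster).
auto axpy = skepu::Map<2>([](float x, float y) { return 2.0f * x + y; });

int main() {
    skepu::Vector<float> x(1000, 1.0f), y(1000, 3.0f), res(1000);

    // Backend selection (names per SkePU's documented API, assumed here):
    // the single-threaded-looking code above stays unchanged.
    axpy.setBackend(skepu::BackendSpec{skepu::Backend::Type::OpenMP});

    axpy(res, x, y);  // res[i] = 2*x[i] + y[i], computed in parallel
}
```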
93

Optimizing Checkpoint/Restart and Input/Output for Large Scale Applications

Jami, Masoud 15 November 2024 (has links)
In the context of exascale computing and HPC, failures are not occasional but inherent, occurring during the runtime of applications. Addressing these challenges is essential to enhance the resilience and reliability of supercomputing operations. Checkpoint/Restart (C/R) is a technique used in HPC to improve job resilience in the case of failures: the state of an application is periodically saved to disk so that, if the application fails, it can be restarted from the last checkpoint. However, checkpointing can be time-consuming and significantly impact application performance, particularly regarding its I/O operations. Therefore, optimizing C/R is crucial for reducing its impact on application performance and improving job resilience. The first part of this work develops novel techniques for C/R management in the context of HPC. This includes developing a novel C/R approach that combines XOR and partner C/R mechanisms, developing a model for multilevel C/R on large computational resources, and optimizing the shared usage of burst buffers for C/R in supercomputers. C/R procedures generate substantial I/O operations, which emerge as a bottleneck for HPC applications, so optimizing the I/O processes becomes imperative. To optimize the C/R process, it is also important to understand the I/O behavior of an application, including how much data needs to be written, how frequently checkpoints should be taken, and where to store the checkpoints to minimize I/O bottlenecks. Hence, in the second part, we investigate and introduce innovative techniques and approaches for I/O modeling and management. This includes developing a plugin for the GNU C Compiler (GCC) that selects the optimal storage device for an application's I/O based on behavior declared through pragma annotations, and developing a model to estimate the I/O cost of applications under Linux, taking page management and process throttling into account.
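The classic first-order answer to "how frequently should checkpoints be taken" is the Young/Daly interval. The thesis develops richer multilevel models; this sketch only illustrates the textbook trade-off between checkpoint cost and failure rate, with made-up example numbers.

```cpp
#include <cmath>
#include <iostream>

// Young's first-order optimal checkpoint interval: tau = sqrt(2 * C * M),
// where C is the time to write one checkpoint and M is the system MTBF.
// (Daly's refinement adds higher-order terms; both assume C << M.)
double young_interval(double checkpoint_cost_s, double mtbf_s) {
    return std::sqrt(2.0 * checkpoint_cost_s * mtbf_s);
}

int main() {
    double C = 60.0;           // example: 60 s to flush a checkpoint to a burst buffer
    double M = 24.0 * 3600.0;  // example: one failure per day across the job's nodes
    std::cout << "Checkpoint roughly every "
              << young_interval(C, M) / 60.0 << " minutes\n";  // ~53.7 min
}
```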
94

Potential pathogenicity of heterotrophic plate count bacteria isolated from untreated drinking water / Rachel Magrietha Petronella Prinsloo

Prinsloo, Rachel Magrietha Petronella January 2014 (has links)
Water is considered the most vital resource on earth, and its quality is deteriorating. Not all residents of South Africa's rural areas have access to treated drinking water; many use water from rivers, dams, and wells. The quality of these resources is unknown, as are the effects of the bacteria in the water on human health. The heterotrophic plate count (HPC) method is a globally used test to evaluate microbial water quality. According to South African water quality guidelines, water of good quality may not contain more than 1 000 colony-forming units (CFU)/mℓ. There is mounting evidence that HPC bacteria may be hazardous to humans with compromised, underdeveloped, or weakened immune systems. In this study the pathogenic potential of HPC bacteria was investigated. Samples were collected from boreholes in the North West Province and HPCs were enumerated with a culture-based method. Standard physico-chemical parameters of the water were measured. Different HPC bacteria were isolated, purified, and tested for α- or β-haemolysis, as well as for the production of extracellular enzymes such as DNase, proteinase, lecithinase, chondroitinase, hyaluronidase, and lipase, as these are pathogenic characteristics. The isolates were identified by 16S rRNA gene sequencing. HuTu-80 cells, a model for the human intestine, were exposed to the potentially pathogenic HPC isolates to determine their effects on the viability of human cells. The isolates were also exposed to different dilutions of simulated gastric fluid (SGF) to evaluate its effect on bacterial viability. The antibiotic resistance potential of each isolate was determined by the Kirby-Bauer disk diffusion method. Three borehole samples did not comply with the physico-chemical guidelines. Half of the samples exceeded the microbial water quality guideline, with the highest count at 292 350 CFU/mℓ. Of the isolated HPC bacteria, 27% were α- or β-haemolytic. Subsequent analysis revealed production of DNase in 72%, proteinase in 40%, lipase and lecithinase in 29%, hyaluronidase in 25%, and chondroitinase, the least produced, in 25%. The HPC isolates identified included Alcaligenes faecalis, Aeromonas hydrophila and A. taiwanensis, Bacillus sp., Bacillus thuringiensis, Bacillus subtilis, Bacillus pumilus, Brevibacillus sp., Bacillus cereus, and Pseudomonas sp. All the isolates except Alcaligenes faecalis were toxic to the human intestinal cells to varying degrees. Seven isolates survived exposure to the most diluted SGF; of these, four also survived the intermediate dilution, but only one survived the highest SGF concentration. Some isolates were resistant to selected antibiotics, but none to neomycin or vancomycin. Amoxicillin and oxytetracycline were the least effective of the antibiotics tested. A pathogen score was calculated for each isolate based on the results of this study. Bacillus cereus had the highest pathogen index, followed in declining order of pathogenicity by: Alcaligenes faecalis > B. thuringiensis > Bacillus pumilus > Pseudomonas sp. > Brevibacillus > Aeromonas taiwanensis > Aeromonas hydrophila > Bacillus subtilis > Bacillus sp. The results of this study show that standard water quality tests such as the physico-chemical and HPC methods are insufficient to protect against the effects of certain pathogenic HPC bacteria. / MSc (Environmental Sciences), North-West University, Potchefstroom Campus, 2014
96

High performance bioinformatics and computational biology on general-purpose graphics processing units

Ling, Cheng January 2012 (has links)
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes it more intuitive and understandable by applying principles of information science, while computational biology harnesses computational approaches and technologies to answer biological questions. Recent years have seen an explosion in the size of biological data at a rate which outpaces the growth in computational power of mainstream computer technologies, namely general-purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology for high-performance, efficient implementation of BCB applications, in order to meet the demands of growing biological data at affordable cost. The thesis presents detailed designs and implementations of GPU solutions for a number of algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to infer information about a newly discovered biological sequence from well-known sequences through similarity comparison. Phylogenetic analysis, on the other hand, is concerned with the investigation of the evolution of and relationships among organisms, and has many uses in the fields of systems biology and comparative genomics. In molecular phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes, and phylogenetic trees are then constructed to illustrate evolutionary relationships among genes and organisms. However, both biological sequence alignment and phylogenetic analysis are computationally expensive, as their computing and memory requirements grow polynomially or worse with the size of sequence databases. The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to remove the restriction on the length of the query sequence found in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between the two main task parallelisation approaches (inter-task and intra-task parallelisation) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible sequence lengths in real-world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. The thesis then presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically; this achieved up to 3x speed-up compared to the most optimised GPP implementations. Next, the thesis presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA), achieving 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method, however, only gives one possible tree, which strongly depends on the evolutionary model used.
A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. The latter is the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed-up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology for BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Array (FPGA) technology.
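For context, the Smith-Waterman recurrence that such GPU designs parallelise is short, and cells on the same anti-diagonal are mutually independent, which is the property both inter-task and intra-task schemes exploit. A minimal serial C++ reference follows, with a linear gap penalty and illustrative scoring parameters (not the thesis's implementation or parameters).

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Serial Smith-Waterman reference with a linear gap penalty.
// Illustrative scoring: match +2, mismatch -1, gap -2.
int smith_waterman(const std::string& a, const std::string& b) {
    const int match = 2, mismatch = -1, gap = -2;
    std::vector<std::vector<int>> H(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    int best = 0;
    for (size_t i = 1; i <= a.size(); ++i)
        for (size_t j = 1; j <= b.size(); ++j) {
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            // H(i,j) depends only on its upper, left, and upper-left neighbours,
            // so all cells on one anti-diagonal (constant i+j) are independent
            // and can be computed in parallel.
            H[i][j] = std::max({0, H[i - 1][j - 1] + s,
                                H[i - 1][j] + gap, H[i][j - 1] + gap});
            best = std::max(best, H[i][j]);
        }
    return best;  // best local alignment score
}

int main() { std::cout << smith_waterman("GGTTGACTA", "TGTTACGG") << "\n"; }
```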
97

HPC scheduling in a brave new world

Gonzalo P., Rodrigo January 2017 (has links)
Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high-bandwidth concurrent file systems interconnected by low-latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems' memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems' processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies the unaddressed scheduling challenges they will introduce, including application diversity and issues with workflow scheduling and the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open-source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm that minimizes workflows' turnaround time without over-allocating resources. / Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR); we used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.
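The fairshare analysis concerns priority mechanisms of this general shape. Below is a sketch of a decay-based fairshare factor similar in spirit to the 2^(-usage/shares) formula documented for Slurm; the thesis's actual mechanism and its improvements differ in the details, and the numbers are illustrative.

```cpp
#include <cmath>
#include <iostream>

// Decay-based fairshare factor: accounts that have consumed more than their
// allocated share get exponentially lower priority. At exactly its share an
// account scores 0.5; an idle account approaches 1.0, a heavy over-user 0.
double fairshare_factor(double effective_usage, double allocated_shares) {
    return std::pow(2.0, -effective_usage / allocated_shares);
}

int main() {
    std::cout << fairshare_factor(0.10, 0.25) << "\n";  // under-served: ~0.76
    std::cout << fairshare_factor(0.25, 0.25) << "\n";  // at its share: 0.5
    std::cout << fairshare_factor(0.75, 0.25) << "\n";  // over-served:  0.125
}
```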
98

Hydrophobically-modified hydroxypropyl celluloses: synthesis and self-assembly in water

Piredda, Mariella January 2004 (has links)
Master's thesis digitized by the Direction des bibliothèques de l'Université de Montréal.
99

High performance hydraulic tiles and coverings

Catoia, Thiago 08 May 2007 (has links)
Hydraulic tiles are coverings produced with a hydraulic binder, whose production technology has not followed the great technological evolution of concretes, the availability of new materials, or the techniques that have come into use in recent years; as a result, these coverings have lost ground and competitiveness in the market due to the artisanal character of their production. This work aimed to develop a mortar for the production of hydraulic tiles using the production technology of high-performance concrete. The aggregates were selected and proportioned using different particle-packing techniques; these techniques were implemented experimentally and their results were analyzed and compared through measurement of the compacted dry unit weight.
The binders were selected to suit the light and dark pigments needed for decorative hydraulic tiles, so two different binder compositions were prepared: the first with structural white Portland cement and white metakaolin, and the second with sulphate-resistant high-early-strength Portland cement and ferrosilicon silica fume. Different superplasticizer admixtures were tested; their compatibility with the binders and the ideal dosage for each binder composition were determined with the flow-table consistency test. The mortars developed for hydraulic tile production were evaluated through axial compression tests, cylinder splitting (tensile) tests, and determination of the elastic modulus. The hydraulic tiles were cast in purpose-made moulds and compacted by vibration; after demoulding and curing, they were evaluated through tests of flexural modulus, abrasion wear, water absorption, drying shrinkage, and chemical attack, and by measuring dimensional variation at different curing times. After the tiles were designed, produced, and evaluated, practical trials were also carried out and material costs were assessed, as part of a study on producing these elements at industrial scale. The developed hydraulic tiles showed high performance in the evaluated characteristics, with axial compressive strength of up to 143 MPa and water absorption of about 1%, and proved viable to produce.
100

Source code optimizations to reduce multi-core and many-core performance bottlenecks

Serpa, Matheus da Silva January 2018 (has links)
Nowadays, several different architectures are available not only to industry but also to final consumers. Traditional multi-core processors, GPUs, accelerators such as the Xeon Phi, and even energy-efficiency-driven processors such as the ARM family present very different architectural characteristics. This wide range of characteristics is a challenge for application developers, who must deal with different instruction sets, memory hierarchies, and even different programming paradigms when programming for these architectures. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. Related work offers a wide variety of solutions: most of it focuses on improving memory performance only, while other work addresses load balancing, vectorization, and thread and data mapping, but performs them separately, losing optimization opportunities. In this master thesis, we propose several optimization techniques to improve the performance of a real-world seismic exploration application provided by Petrobras, a multinational corporation in the petroleum industry. Our experiments show that loop interchange is a useful technique for improving the performance of the different cache levels, raising performance by up to 5.3x and 3.9x on the Intel Broadwell and Intel Knights Landing architectures, respectively. By changing the code to enable vectorization, performance was increased by up to 1.4x and 6.5x. Load balancing improved performance by up to 1.1x on Knights Landing. Thread and data mapping techniques were also evaluated, with performance improvements of up to 1.6x and 4.4x. Comparing the best version on each architecture, we improved the performance of Broadwell by 22.7x and of Knights Landing by 56.7x relative to an unoptimized version; in the end, however, Broadwell was 1.2x faster than Knights Landing.
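As a concrete illustration of the loop-interchange technique the thesis applies (a generic sketch, not the Petrobras code): in C and C++, two-dimensional arrays are laid out row by row, so making the innermost loop walk the contiguous dimension turns strided cache misses into sequential accesses and lets the compiler vectorize.

```cpp
#include <vector>

constexpr int N = 2048;
using Matrix = std::vector<std::vector<float>>;  // each row is contiguous

// Before: the inner loop strides down a column; consecutive iterations touch
// different rows (different heap allocations), defeating the cache.
void scale_cols_first(Matrix& a, float k) {
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            a[i][j] *= k;
}

// After loop interchange: the inner loop walks one contiguous row, which is
// cache-friendly and allows the compiler to vectorize the inner loop.
void scale_rows_first(Matrix& a, float k) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[i][j] *= k;
}

int main() {
    Matrix a(N, std::vector<float>(N, 1.0f));
    scale_cols_first(a, 2.0f);  // slow traversal order
    scale_rows_first(a, 0.5f);  // same result, cache-friendly order
}
```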
