11

Radar Signal Processing with Graphics Processors (GPUs)

Pettersson, Jimmy, Wainwright, Ian January 2010 (has links)
No description available.
12

Colonization of Granular Activated Carbon Media Filters By Legionella and Heterotrophic Bacterial Cells

January 2014 (has links)
abstract: Granular activated carbon (GAC) filters are a final polishing step in drinking water treatment systems for the removal of dissolved organic carbon fractions. Generally, filters are colonized by bacterial communities whose activity reduces biodegradable solutes, allowing partial regeneration of the GAC's adsorptive capacity. When bacteria pass into the filtrate due to increased growth, the microbiological quality of the drinking water is compromised and regrowth occurs in the distribution system. Bacteria attached to carbon particles as biofilms or in conjugation with other bacteria were observed to be highly resistant to post-filtration microbial mitigation techniques. Some of these bacteria were identified as pathogenic. This study focuses on one such pathogen, Legionella pneumophila, which is resistant to environmental stressors and treatment conditions. It is also responsible for outbreaks of Legionnaires' disease through drinking water, thus attracting the attention of regulatory agencies. The work assessed the attachment and colonization of Legionella and heterotrophic bacteria in lab-scale GAC media column filters. Quantification of Legionella and heterotrophic plate count (HPC) bacteria in the influent, effluent, column biofilms and on the GAC particles was performed over time using fluorescence microscopy and culture-based techniques. The results indicated a gradual increase in the colonization of the GAC particles by HPC bacteria. Initially, a high number of Legionella cells was detected in the column effluent and none were detected on the GAC, suggesting low attachment of the cells to the particles, potentially due to the lack of any pre-existing biofilm. With the initial colonization of the filter media by other bacteria, the number of Legionella cells on the GAC particles and in the biofilms also increased. The presence of Legionella was confirmed in all samples collected from the columns spiked with Legionella. A significant increase in Legionella was observed in the columns' inner-surface biofilm (0.25 up to 0.52 logs) and on the GAC particles (0.42 up to 0.63 logs) after 2 months. Legionella and HPC counts attached to the column biofilm were higher than those on the GAC particles, indicating a strong association with biofilms. The bacterial concentration in the effluent slowly increased. This may be due to the column wall effect decreasing filter efficiency, possible exhaustion of the GAC's adsorptive capacity over time, and bacterial growth. / Dissertation/Thesis / Masters Thesis Civil and Environmental Engineering 2014
13

Um cluster híbrido com módulos de co-processamento em hardware (FPGAs) para processamento de alto desempenho

BARROS JÚNIOR, Severino José de 10 September 2014 (has links)
FINEP/Petrobrás (CENPES) / Organizations that deal with computational systems increasingly seek to improve the performance of their applications. The main characteristic of these applications is massive data processing. The solutions used to run such problems are generally based on general-purpose processor architectures, whose main characteristic is a hardware structure built on the Von Neumann paradigm. This paradigm has a deficiency known as the "Von Neumann bottleneck": instructions that could be executed simultaneously, because they have no data dependencies, end up being processed sequentially, hurting the potential performance of this class of applications. To increase parallel processing, organizations usually adopt a structure based on the association of several PCs, connected by a high-speed network, that work together to solve one large problem. This association is called a cluster, in which each member PC, called a node, performs part of the computation of a large problem simultaneously, providing explicit parallelism for the application as a whole. Even with a significant increase in the number of independent processing elements, this growth is insufficient to meet the enormous demand for data computation in complex applications, which requires dividing groups of independent instructions and distributing them among the nodes. This strategy provides parallelism and thus better performance; however, performance within each node remains degraded because of the sequential bottleneck present in its processors. In order to increase the parallelism of operations within each node, hybrid solutions composed of conventional CPUs and coprocessors have been adopted. One such coprocessor is the FPGA (Field Programmable Gate Array), which is usually connected to the PC through the PCIe bus. The project described in this dissertation proposes a development methodology for such a hybrid cluster, aimed at increasing the performance of scientific applications that require a large amount of data processing. The methodology is presented and two examples are discussed in detail.
14

fastRTM: um ambiente integrado para desenvolvimento rápido da migração reversa no tempo (RTM) em plataformas FPGA de alto desempenho

Medeiros, Victor Wanderley Costa de 08 March 2013 (has links)
CENPES/Petrobrás; CAPES; CNPq; FINEP and DAAD / The constant increase in the demand for performance and efficiency, and the barrier that current chip-fabrication technology imposes on raising processor operating frequencies, have shifted the industry's focus to the development of multi-core architectures. This parallelism-oriented approach has been employed not only in architectures based on general-purpose processors, but also in new platforms such as graphics processors (GPUs), Cell processors and reconfigurable devices (FPGAs). This paradigm shift has required large investments in research and development. Besides the hardware itself, it is necessary to develop languages, compilers, tools and operating systems able to handle the parallel and heterogeneous nature of these new technologies. Another important point is today's scenario, in which the efficient and sustainable use of natural resources is essential. In this context, reconfigurable devices, more specifically FPGAs (Field Programmable Gate Arrays), are an excellent alternative because of their intrinsically parallel nature and the low frequencies at which they operate, providing large computational capacity at a low energy cost in many applications. However, developing applications on FPGAs is still a rather arduous task, often requiring development times incompatible with industry needs. This work presents the fastRTM environment, whose main objective is to support the development of RTM seismic modeling and migration on high-performance FPGA platforms. Seismic modeling and migration are computationally intensive algorithms used in industry for oil and gas prospecting. The environment provides mechanisms for describing FPGA architectures, reusing components, simulation and design-space exploration, aiming to reduce development time while exploiting the performance potential of the FPGA platform. The work also presents a study that confirms the viability of FPGAs for this type of application, comparing their performance with that of other architectures.
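The abstract does not spell out the RTM kernel itself; its computational core is finite-difference time stepping of the acoustic wave equation. Below is a minimal NumPy sketch of that stencil, the kind of regular, memory-intensive loop that fastRTM maps onto FPGA pipelines; grid size, velocity model, time step and source are illustrative assumptions, not parameters from the thesis.

```python
# Minimal sketch of the 2nd-order acoustic wave-equation stencil at the core of
# RTM modeling/migration. Grid size, velocity, dt/dx and the source are
# illustrative assumptions, not values from the fastRTM environment.
import numpy as np

nx, nz, nt = 200, 200, 500          # grid points and time steps (assumed)
dx, dt = 10.0, 1e-3                 # spatial step [m], time step [s] (assumed)
vel = np.full((nz, nx), 2000.0)     # constant velocity model [m/s] (assumed)

p_prev = np.zeros((nz, nx))         # pressure field at t - dt
p_curr = np.zeros((nz, nx))         # pressure field at t
coeff = (vel * dt / dx) ** 2

for it in range(nt):
    # 5-point Laplacian on the interior of the grid
    lap = (p_curr[1:-1, 2:] + p_curr[1:-1, :-2] +
           p_curr[2:, 1:-1] + p_curr[:-2, 1:-1] -
           4.0 * p_curr[1:-1, 1:-1])
    p_next = p_curr.copy()
    p_next[1:-1, 1:-1] = (2.0 * p_curr[1:-1, 1:-1] - p_prev[1:-1, 1:-1]
                          + coeff[1:-1, 1:-1] * lap)
    # inject a Gaussian source pulse at the centre of the grid (assumed position)
    p_next[nz // 2, nx // 2] += np.exp(-((it * dt - 0.1) ** 2) / 1e-4)
    p_prev, p_curr = p_curr, p_next
```

Every time step touches every grid point, which is why the dissertation emphasizes deep pipelining, component reuse and design-space exploration on the FPGA.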
15

Scheduling Tasks over Multicore machines enhanced with Accelerators : a Runtime System’s Perspective / Vers des supports exécutifs capables d'exploiter des machines multicors hétérogènes

Augonnet, Cédric 09 December 2011 (has links)
Although accelerators are now an integral part of high-performance computing, the complexity they introduce has a direct impact on programmability, so that a runtime system offering portable abstractions is indispensable to exploit all the available computing power in a portable way, despite the complexity of the underlying machine. In this thesis we propose a runtime-system model offering an expressive interface that addresses, in particular, the challenges raised in terms of scheduling and data management. We demonstrate the relevance of our approach with the StarPU platform, designed during this thesis. / Multicore machines equipped with accelerators are becoming increasingly popular in the High-Performance Computing ecosystem. Hybrid architectures provide significantly improved energy efficiency, so that they are likely to generalize in the Manycore era. However, the complexity introduced by these architectures has a direct impact on programmability, so that it is crucial to provide portable abstractions in order to fully tap into the potential of these machines. Pure offloading approaches, which consist in running an application on regular processors while offloading predetermined parts of the code on accelerators, are not sufficient. The real challenge is to build systems where the application would be spread across the entire machine, that is, where computation would be dynamically scheduled over the full set of available processing units. In this thesis, we thus propose a new task-based model of runtime system specifically designed to address the numerous challenges introduced by hybrid architectures, especially in terms of task scheduling and of data management. In order to demonstrate the relevance of this model, we designed the StarPU platform. It provides an expressive interface along with flexible task-scheduling capabilities tightly coupled to an efficient data management. Using these facilities, together with a database of auto-tuned per-task performance models, it for instance becomes straightforward to develop efficient scheduling policies that take into account both computation and communication costs. We show that our task-based model is not only powerful enough to provide support for clusters, but also to scale on hybrid manycore architectures. We analyze the performance of our approach on both synthetic and real-life workloads, and show that we obtain significant speedups and a very high efficiency on various types of multicore platforms enhanced with accelerators.
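StarPU's actual C interface is not reproduced in the abstract; the snippet below is only a conceptual Python sketch of the scheduling idea described above: map each task to the processing unit with the earliest predicted finish time, combining a per-task performance model with a data-transfer cost. Device names, costs and tasks are invented for illustration.

```python
# Conceptual sketch of a performance-model-driven scheduler in the spirit of the
# task-based model described above. This is not the StarPU API; devices, costs
# and tasks are made up for illustration.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    ready_at: float = 0.0            # time (ms) at which the device becomes free

# Per-(task kind, device) performance model: predicted compute time in ms.
compute_model = {("gemm", "cpu"): 40.0, ("gemm", "gpu"): 5.0,
                 ("scal", "cpu"): 1.0,  ("scal", "gpu"): 4.0}
# Predicted transfer time in ms when the task's data is not already on the device.
transfer_model = {"cpu": 0.0, "gpu": 3.0}

def schedule(tasks, devices):
    """Greedily map each task to the device with the earliest predicted finish time."""
    plan = []
    for kind, data_on in tasks:
        def finish(dev):
            xfer = 0.0 if data_on == dev.name else transfer_model[dev.name]
            return dev.ready_at + xfer + compute_model[(kind, dev.name)]
        best = min(devices, key=finish)
        end = finish(best)
        plan.append((kind, best.name, end))
        best.ready_at = end          # the chosen device is busy until then
    return plan

devices = [Device("cpu"), Device("gpu")]
tasks = [("gemm", "cpu"), ("scal", "cpu"), ("gemm", "gpu")]  # (kind, current data location)
print(schedule(tasks, devices))
```

The real runtime does this dynamically, with auto-tuned performance models and asynchronous data transfers, but the cost-minimizing decision per task is the same basic idea.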
16

Benchmark-driven approaches to performance modeling of multi-core architectures / Modélisation des architecture multi-cœur par des mesures de performance

Putigny, Bertrand 27 March 2014 (has links)
This manuscript belongs to the field of high-performance computing (HPC), where the growing need for performance pushes processor manufacturers to integrate ever more sophisticated mechanisms. This growing complexity makes the architectures hard to use. Performance modeling of multi-core architectures makes it possible to feed information back to the users, that is, the programmers, so that they can better exploit the hardware. However, because of the lack of documentation and the complexity of modern processors, such modeling is often difficult. The goal of this manuscript is to use performance measurements of small code fragments to compensate for the lack of information about the hardware. These experiments, called micro-benchmarks, make it possible to understand the performance of modern architectures without depending on the availability of technical documentation. The first chapter presents the hardware architecture of modern processors and, in particular, the features that make performance modeling complex. The second chapter presents an automatic methodology for measuring the performance of arithmetic instructions. The information obtained by this method is the basis for computational models that predict the execution time of arithmetic code fragments. This chapter also presents how such models can be used to optimize energy efficiency, taking the SCC processor as an example. The last part of this chapter motivates building a memory model that takes cache coherence into account to predict data access times. The third chapter presents the micro-benchmark development environment used to characterize cache-coherent memory hierarchies. This chapter also provides a comparative study of the memory performance of different architectures and of the impact of the coherence-protocol choice on performance. Finally, the fourth chapter presents a memory model that predicts data access times for regular OpenMP-style applications. The model relies on the state of the data in the coherence protocol; this state evolves during program execution according to the memory accesses. A cost function is associated with each transition. This function is derived directly from the results of the experiments in the third chapter and makes it possible to predict memory access times. A proof of concept of the reliability of this model is given, on the one hand for linear algebra and numerical analysis applications, and on the other hand by using the model to model the performance of shared-memory MPI communications. / In the race for better performance, computer architectures are becoming more and more complex. Therefore, hardware models are crucial to i) tune software to the underlying architecture, ii) build tools to better exploit the hardware, or iii) choose an architecture according to the needs of a given application. In this dissertation, we aim at describing how to build a hardware model that targets all critical parts of modern computer architecture, that is, the processing unit itself, memory, and even power consumption. We believe that a large part of hardware modeling can be done automatically. This would relieve people from the tiresome task of doing it by hand.
Our first contribution is a set of performance models for the on-core part of several different CPUs. This part of an architecture model is called the computational model. The computational model targeting the Intel SCC chip also includes a power model allowing for power-aware performance optimization. Our other main contribution is an auto-tuned memory-hierarchy model for general-purpose CPUs able to i) predict the performance of memory-bound computations and ii) provide programmers with guidelines to improve software memory behavior.
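As a rough illustration of the micro-benchmarking idea, the sketch below times the traversal of arrays of growing size: effective bandwidth drops as the working set exceeds each cache level, which is the kind of signal such benchmarks exploit. The sizes and repetition counts are arbitrary assumptions, and the thesis' framework works with native instruction sequences and cache-coherence states rather than Python.

```python
# Toy memory micro-benchmark: measure effective bandwidth while the working set
# grows past the cache levels. Sizes and repeat counts are arbitrary assumptions;
# a real micro-benchmark would use native code to avoid interpreter overhead.
import time
import numpy as np

def bandwidth_gb_s(n_bytes, repeats=20):
    a = np.ones(n_bytes // 8, dtype=np.float64)   # working set of n_bytes
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        _ = a.sum()                               # streams the whole array once
        best = min(best, time.perf_counter() - t0)
    return n_bytes / best / 1e9

for kib in [32, 256, 2048, 16384, 131072]:        # roughly L1-, L2-, L3- and DRAM-sized sets
    print(f"{kib:>7} KiB: {bandwidth_gb_s(kib * 1024):6.1f} GB/s")
```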
17

The Application of High-Performance Computing to Create and Analyze Simulations of Human Injury

Kevin G McIver (6577457) 11 August 2022 (has links)
Research in the field of human injury biomechanics with respect to athletes has indicated that head acceleration events (HAEs) suffered during participation in a contact sport can cause long-term neurological changes that present asymptomatically. This concept has been referred to as “mild” traumatic brain injury (mTBI). This mirrors results found in soldiers, where it is now thought that traumatic brain injury, coupled with psychological trauma, can lead to post-traumatic stress disorder (PTSD). The current consensus among the neurotrauma research community is that all HAEs matter, whether caused by blast, blunt force, or directed energy weapons.

Previous research has focused on the long-term changes that have been demonstrated and quantified; however, very little research has been done to quantify the effects of a single insult to the brain. Several studies have had participants perform head motions while in a magnetic resonance imaging (MRI) scanner. Digital twins may be used to simulate the effects of an insult, be it blast, blunt force, or directed energy, on an object. Finite element models of the human head and brain have a long history of development, from the earliest models in the 1970s to today. Currently, numerous software packages allow for the regularization and comparison of MRI datasets. Some software packages additionally offer the ability to create subject-specific finite element meshes interactively from a single MRI image. Previous research in the HIRRT Lab reduced the time needed to generate a patient-specific finite element mesh to approximately 48 hours. This represented a substantial reduction in the processing time for a single scan, which, to the knowledge of the authors, previously required on the order of weeks to robustly process a single geometry including the skull, or required costly software licenses, and still involved user-interactive steps. This work describes the architecture and deployment of the HIRRT Lab Cluster, a cost-optimized high-performance computing system that enables rapid processing of scans into simulation geometry using batch processes on a Slurm cluster. Software optimizations, operating-system optimizations, and Linux kernel-level optimizations (and selections) are utilized to enable the selected hardware to perform optimally.

To the knowledge of the author, no single pipeline enables the automated generation of robust, patient-specific finite element meshes from raw datasets fresh from an MRI. This package addresses those limitations with a design heavily tilted towards Linux cluster implementations. The author has created a pipeline of code designed to run on a Linux-based compute cluster that is capable of processing 1700 scans, from raw T1-weighted MRI scans to finite element meshes with regions of interest (ROIs) identified as element sets and white matter fiber orientation determined from diffusion tensor imaging (DTI) scans, in under 7 days using the current hardware available in the HIRRT Lab Cluster with appropriate software licensing. This represents a speed-up of over 1200x compared to the original program for mesh processing overall, and a speed-up of 22x for a single scan, with additional features and detail not captured by the original code.

Accurate representative models for subpopulations via their immutable traits (e.g. size, biological sex, ethnicity/ancestry, or age) can further reduce the number of simulations required to assist in the improvement of finite element models that may be used to improve the design of personal protective equipment, create new techniques, or aid in the design of new vehicles capable of reducing individuals' exposure to potentially traumatic damage. The use of subpopulation groupings rather than the simulation of each unique individual, even models consisting of bounding cases such as the largest or smallest representative members of a subpopulation, can reduce the amount of data that needs to be processed to generate useful design feedback for engineers.

Subject-specific models allow for greater variation in strain due to geometric differences between individuals' brains and should be used where possible to describe a given individual's strain history more accurately, which can be used to assess the formation of damage as indicated by biomarkers. To understand the long-term effects of blast overpressure on brain structure, function, and chemistry, and subsequently develop appropriate mitigation strategies, computational models of individual soldiers must be developed. These models must integrate blast physics and neuroimaging of actual tissue damage to the brain. There is a need to develop constitutive equations capable of being used in multi-scale models to relate various insults directly to damage in the brain. These equations should be linked to damage as indicated by various MRI scan types and used to robustly assess individuals over the course of their unique impact histories. Through the development of a digital twin in this manner, predictive medicine may be used to proactively identify those athletes and warfighters who may be at higher risk of long-term detrimental effects from further exposure to HAEs.
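The pipeline code itself is not part of the abstract; as a hedged sketch of how such a scan-to-mesh pipeline is typically driven on a Slurm cluster, the snippet below submits a single array job with one task per raw scan. The script name, directories and throttle value are hypothetical, not taken from the HIRRT Lab implementation.

```python
# Hypothetical driver for a scan-to-mesh pipeline on a Slurm cluster: one array
# job, one task per raw MRI scan. Script name, paths and limits are illustrative
# assumptions, not the HIRRT Lab code.
import subprocess
from pathlib import Path

scan_dir = Path("/data/raw_mri")            # assumed location of raw T1/DTI scans
scans = sorted(scan_dir.glob("*.nii.gz"))   # one array task per scan
n = len(scans)

cmd = [
    "sbatch",
    "--job-name=scan2mesh",
    f"--array=0-{n - 1}%64",                # at most 64 scans processed concurrently
    "--cpus-per-task=8",
    "--time=02:00:00",
    "process_scan.sh",                      # hypothetical per-scan worker script
]
print("submitting:", " ".join(cmd))
subprocess.run(cmd, check=True)
```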
18

An Evaluation of TensorFlow as a Programming Framework for HPC Applications / En undersökning av TensorFlow som ett utvecklingsramverk för högpresterade datorsystem

Chien, Wei Der January 2018 (has links)
In recent years, deep learning, a branch of machine learning, has gained increasing popularity due to its extensive applications and performance. At the core of these applications is dense matrix-matrix multiplication. Graphics Processing Units (GPUs) are commonly used in the training process due to their massively parallel computation capabilities. In addition, specialized low-precision accelerators have emerged to specifically address tensor operations. Software frameworks, such as TensorFlow, have also emerged to increase the expressiveness of neural network model development. In TensorFlow, computation problems are expressed as computation graphs, where the nodes of a graph denote operations and the edges denote data movement between operations. With an increasing number of heterogeneous accelerators that might co-exist on the same cluster system, it has become increasingly difficult for users to program efficient and scalable applications. TensorFlow provides a high level of abstraction, and it is possible to place the operations of a computation graph on a device easily through a high-level API. In this work, the usability of TensorFlow as a programming framework for HPC applications is reviewed. We give an introduction to TensorFlow as a programming framework and paradigm for distributed computation. Two sample applications are implemented in TensorFlow: tiled matrix multiplication and a conjugate gradient solver for solving large linear systems. We try to illustrate how such problems can be expressed as computation graphs for distributed computation. We perform scalability tests, comment on the performance scaling results, and quantify how TensorFlow can take advantage of HPC systems by micro-benchmarking communication performance. Through this work, we show that TensorFlow is an emerging and promising platform which is well suited for a particular class of problems that requires very little synchronization. / In recent years, deep learning, a type of machine learning, has become popular because of its applications and performance. The most important component of these techniques is matrix multiplication. Graphics processing units (GPUs) are commonly used in the training of artificial neural networks because of their massively parallel computation capacity. In addition, specialized low-precision accelerators that specifically compute matrix multiplication have been developed. Many development frameworks have emerged to help programmers work with artificial neural networks. In TensorFlow, computation problems are expressed as a computation graph: a node represents a computation operation and an edge represents the data flow between computation operations in the graph. Because different accelerators with different system architectures must be programmed, programming high-performance systems has become increasingly difficult. TensorFlow offers a high level of abstraction and simplifies the programming of high-performance computations; accelerators are programmed by placing the operations of the graph on different accelerators through an API. In this work, the usability of TensorFlow as a programming framework for applications with high-performance computations is examined. We present TensorFlow as a programming framework for distributed computation. We implement two common applications in TensorFlow: a solver for linear equation systems using the conjugate gradient method, and block matrix multiplication, and illustrate how these problems can be expressed as computation graphs for distributed computation. We experiment with and comment on methods to demonstrate how TensorFlow can exploit HPC hardware, testing both scalability and efficiency and micro-benchmarking communication performance. Through this work, we show that TensorFlow is an emerging and promising platform well suited for a particular class of problems that requires minimal synchronization.
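The thesis implementations are not reproduced in the listing; as a minimal sketch of how one of them, the conjugate gradient solver, can be expressed with TensorFlow operations pinned to a device, consider the following. The problem size, device string and tolerance are illustrative assumptions, and the thesis versions are tiled and distributed across nodes.

```python
# Minimal sketch of a conjugate-gradient solve written with TensorFlow ops and
# explicit device placement. Problem size, device string and tolerance are
# illustrative; the thesis versions are distributed and tiled.
import tensorflow as tf

def conjugate_gradient(A, b, iters=100, tol=1e-6):
    x = tf.zeros_like(b)
    r = b - tf.linalg.matvec(A, x)
    p = r
    rs_old = tf.tensordot(r, r, axes=1)
    for _ in range(iters):
        Ap = tf.linalg.matvec(A, p)
        alpha = rs_old / tf.tensordot(p, Ap, axes=1)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = tf.tensordot(r, r, axes=1)
        if tf.sqrt(rs_new) < tol:             # converged
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

with tf.device("/CPU:0"):                     # swap for "/GPU:0" when an accelerator is present
    n = 512
    M = tf.random.normal((n, n))
    A = tf.linalg.matmul(M, M, transpose_b=True) + n * tf.eye(n)  # symmetric positive definite
    b = tf.random.normal((n,))
    x = conjugate_gradient(A, b)
    print("residual norm:", float(tf.norm(tf.linalg.matvec(A, x) - b)))
```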
19

Designing High-Performance Remote Memory Access for MPI and PGAS Models with Modern Networking Technologies on Heterogeneous Clusters

Li, Mingzhe January 2017 (has links)
No description available.
20

RhoA GTPase Controls Cytokinesis and Programmed Necrosis of Hematopoietic Progenitors

Zhou, Xuan 28 October 2013 (has links)
No description available.
