31

Uma plataforma híbrida baseada em FPGA para a aceleração de um algoritmo de alinhamento de sequências biológicas / A hybrid FPGA-based platform for accelerating a biological sequence alignment algorithm

FIGUEIRÔA, Luiz Henrique Alves 17 August 2015 (has links)
The discovery of the double-helix structure of DNA by James D. Watson and Francis H. C. Crick in 1953 opened the way to understanding the mechanisms that encode the instructions for building and developing the cells of living beings; DNA sequencing is one of the first steps in this process. The new generation of sequencers (NGS) has been producing enormous volumes of data in biological databases, and compiling this information can demand intense computational effort. However, the performance of the tools employed in Computational Biology has not grown at the same rate as these databases, which may impose restrictions on advances in this research field. One of the main techniques used is sequence alignment, which, by identifying similarities, enables the analysis of conserved regions in homologous sequences and serves as a starting point for studying protein secondary structures and building phylogenetic trees, among other applications. Since exact alignment algorithms have quadratic complexity in time and space, the computational cost can be high, demanding acceleration strategies. In this context, High Performance Computing (HPC), structured around supercomputers and clusters, has been employed; however, the initial investment and the requirements for maintenance, floor space and cooling, in addition to energy consumption, can represent significant costs. Hybrid parallel architectures based on the joint action of PCs and accelerator devices such as VLSI chips, GPGPUs and FPGAs have emerged as more affordable alternatives, with promising results. The project described in this dissertation aims to accelerate the global optimal-alignment algorithm known as Needleman-Wunsch on a hybrid platform composed of a PC, acting as host, and an FPGA. Acceleration comes from exploiting the parallelism opportunities offered by the algorithm and implementing them in hardware. The resulting architecture, based on a Linear Systolic Array, delivers high performance and good scalability.
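To make the quadratic recurrence and its parallelism concrete, the sketch below is a minimal, unoptimized Needleman-Wunsch scorer in Python rather than the dissertation's FPGA design; the scoring values for match, mismatch and gap are illustrative assumptions. All cells on the same anti-diagonal of the matrix are mutually independent, which is precisely the wavefront parallelism a linear systolic array exploits, computing one anti-diagonal per step.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score via the classic O(len(a)*len(b)) DP recurrence."""
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,     # gap in b
                           dp[i][j - 1] + gap)     # gap in a
    return dp[n][m]

print(needleman_wunsch("GATTACA", "GCATGCU"))
```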
32

Métodos bacteriológicos aplicados à tuberculose bovina: comparação de três métodos de descontaminação e de três protocolos para criopreservação de isolados / Bacteriologic methods applied to bovine tuberculosis: comparison of three decontamination methods and three protocols for cryopreservation of isolates

Simone Rodrigues Ambrosio 09 December 2005 (has links)
Given the importance of the National Program for the Control and Eradication of Brucellosis and Tuberculosis (PNCEBT), the need for efficient bacteriological characterization of infected herds as a cornerstone of the surveillance system, and the difficulties faced by laboratories with methods for isolating Mycobacterium bovis, scientific interest has grown in studies, especially molecular ones, of M. bovis isolates. These molecular techniques require an abundant bacillary mass, obtained by maintaining the isolates in the laboratory and subculturing them in culture media; however, the fastidious growth of M. bovis in culture media makes these operations difficult. The present study therefore had two objectives. First, to compare three decontamination methods for organ homogenates, the step that precedes seeding onto culture media: 60 tissue samples with granulomatous lesions, obtained from bovine slaughterhouses in the State of São Paulo, were collected, immersed in saturated sodium borate solution and transported to the Laboratório de Zoonoses Bacterianas of the VPS-FMVZ-USP, where they were processed up to 60 days after collection. The samples were submitted to three decontamination methods: basic (NaOH 4%), acid (H2SO4 12%) and 1.5% 1-hexadecylpyridinium chloride (HPC); simple dilution in saline solution served as a fourth, control method. The results were compared as proportions using the χ² test, which showed that the HPC method had the lowest proportion of contamination (3%) and the highest proportion of successful isolation of acid-fast bacilli (40%). Second, to compare three cryopreservation media for M. bovis, 16 isolates identified by spoligotyping were used. Each isolate was suspended in three media (saline solution, original 7H9 and modified 7H9), stored at three temperatures (-20°C, -80°C and -196°C) and thawed at three time points (45, 90 and 120 days of freezing). Quantitative cultures on Stonebrink-Leslie media were performed before freezing and after thawing. The percentage reductions in Colony-Forming Units (CFU) under the different conditions were calculated and compared using parametric and non-parametric methods. For the time variable, a larger proportion of CFU loss was observed at 90 days of freezing than at 120 days (p=0.0002); for the temperature variable, a statistically significant difference was observed between the mean proportions of loss at -20°C and -80°C (p<0.05); for the medium variable, a significant difference (p=0.044) was observed between media A and C at 45 days of freezing and a cryopreservation temperature of -20°C. Although the median percentages of CFU loss were always below 4.2%, the results suggest that the best protocol for cryopreserving M. bovis isolates is to suspend them in modified 7H9 and keep them at -20°C.
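As a hedged illustration of the proportion comparison described above, the snippet below runs a χ² test on a contamination-by-method contingency table; the counts are invented for the example and are not the study's data.

```python
# Illustrative only: the counts below are made up to show the test,
# they are not the counts reported in the study.
from scipy.stats import chi2_contingency

# rows: decontamination methods; columns: [contaminated, not contaminated]
observed = [
    [2, 58],   # HPC 1.5% (hypothetical ~3% contamination)
    [12, 48],  # NaOH 4%
    [15, 45],  # H2SO4 12%
    [20, 40],  # saline control
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```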
33

Profiling and debugging by efficient tracing of hybrid multi-threaded HPC applications / Profilage et débogage par prise de traces efficaces d'applications hybrides multi-threadées HPC

Besnard, Jean-Baptiste 16 July 2014 (has links)
The evolution of supercomputers is a source of both hardware and software challenges. In the quest for ever higher computing power, the interdependence between the components of the simulation process is becoming more and more significant and requires new approaches. This thesis focuses on the software development aspect, and in particular on the observation of parallel programs running on several thousand cores; such observation aims to give developers the feedback they need when running a program on an execution substrate that has not yet been modeled because of its complexity. We first introduce the development process from a global point of view before describing existing developer tools and related work. We then present our contribution, which consists, on the one hand, of trace-based profiling and debugging tools and, on the other, of their evolution towards an on-line coupling method that overcomes I/O limitations and is therefore more scalable. Our contribution also covers clock synchronization for trace collection, with a probabilistic time-stamp synchronization algorithm whose error we have quantified. In addition, we describe a machine characterization tool covering the MPI aspect; it demonstrates the presence of noise in both point-to-point and collective communications, justifying the use of an empirical approach. Finally, we propose and motivate an alternative to trace-based event collection that preserves event granularity while keeping the overhead low, both in CPU usage and in I/O.
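The sketch below illustrates one simple way to estimate the clock offset between two nodes from repeated ping-pong exchanges, keeping the exchange with the smallest round-trip time (an NTP-style estimate that assumes symmetric network delay); it is a generic stand-in, not the probabilistic algorithm with quantified error developed in the thesis.

```python
import time

def estimate_offset(remote_clock, trials=100):
    """Estimate the offset of remote_clock relative to the local clock.

    Assumes the request and reply take roughly the same time on the wire,
    and keeps the exchange with the smallest round-trip time, which is the
    least likely to have been disturbed by noise.
    """
    best_rtt, best_offset = float("inf"), 0.0
    for _ in range(trials):
        t0 = time.perf_counter()          # local send time
        t_remote = remote_clock()         # remote timestamp (one round trip)
        t1 = time.perf_counter()          # local receive time
        rtt = t1 - t0
        offset = t_remote - (t0 + rtt / 2.0)
        if rtt < best_rtt:
            best_rtt, best_offset = rtt, offset
    return best_offset, best_rtt

# Toy stand-in for a remote clock that is 5 ms ahead of the local one.
fake_remote = lambda: time.perf_counter() + 0.005
offset, rtt = estimate_offset(fake_remote)
print(f"estimated offset: {offset*1e3:.3f} ms (min RTT {rtt*1e6:.1f} us)")
```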
34

Performance Improvement of Hypervisors for HPC Workload

Zhang, Yu 11 February 2019 (has links)
Virtualization technology has many excellent features that benefit today's high-performance computing (HPC): it enables more flexible and effective utilization of computing resources. A major barrier to its wide acceptance in the HPC domain, however, is the relatively large performance loss it imposes on workloads. Among the main performance-influencing factors, the memory management subsystem for virtual machines is a potential source of this loss, and much effort has been invested in reducing the overhead of guest memory address translation. This work contributes two novel solutions, DPMS and STDP. Both are presented conceptually and partially implemented for the KVM hypervisor. The benchmark results for DPMS show that the performance of a number of workloads that are sensitive to the paging method can be improved to varying degrees by adopting this solution. STDP shows that it is feasible to reduce the overhead of second-dimension paging for workloads that cannot make good use of the TLB.
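As a back-of-the-envelope illustration of why second-dimension (nested) paging is expensive on a TLB miss, the snippet below applies the standard radix page-walk arithmetic; it is generic reasoning, not a measurement from this work. With g guest levels and h host levels, a miss can cost up to (g + 1)(h + 1) - 1 memory references, since every guest page-table pointer is a guest-physical address that must itself be walked through the host tables.

```python
def nested_walk_references(guest_levels=4, host_levels=4):
    """Worst-case memory references for one TLB miss under nested paging.

    Each of the guest_levels guest page-table pointers, plus the final guest
    physical address, must itself be translated through host_levels host
    page-table levels, and each host walk ends with one real memory access.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1

print(nested_walk_references())        # 24 references for 4-level x 4-level
print(nested_walk_references(5, 5))    # 35 with 5-level paging on both sides
```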
35

Étude de transformations et d’optimisations de code parallèle statique ou dynamique pour architecture "many-core" / Study of transformations and static or dynamic parallel code optimization for manycore architecture

Gallet, Camille 13 October 2016 (has links)
From their origins in the 1960s to the present day, supercomputers have gone through three revolutions: (i) the arrival of transistors to replace triodes, (ii) the appearance of vector processing, and (iii) cluster organization. Today's clusters are built from standard processors whose computing power has grown through higher frequencies, the multiplication of cores on a chip, and wider computing units (SIMD instruction sets). A recent example combining a large number of cores with wide (512-bit) vector units is the Intel Xeon Phi co-processor. To maximize computing performance on these chips by better exploiting SIMD instructions, the bodies of loop nests must be reorganized while taking irregular aspects (control flow and data flow) into account. To this end, this thesis proposes to extend the transformation named Deep Jam to extract regularity from irregular code and thereby facilitate vectorization. The thesis presents our extension and its application to HydroMM, a multi-material hydrodynamics mini-application. These studies show that a significant performance gain can be obtained on irregular codes.
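The snippet below gives a hedged flavour of the kind of restructuring that makes irregular control flow amenable to SIMD execution: a generic if-conversion with NumPy masks, not the Deep Jam transformation itself. The data-dependent branch is replaced by computing both sides and selecting per element, so the loop body becomes uniform across SIMD lanes.

```python
import numpy as np

x = np.random.rand(1_000_000)

# Irregular form: a data-dependent branch inside the loop body.
def scalar_version(x):
    out = np.empty_like(x)
    for i in range(x.size):
        if x[i] > 0.5:          # branch outcome differs from element to element
            out[i] = np.sqrt(x[i])
        else:
            out[i] = x[i] * x[i]
    return out

# Regularized form: evaluate both branches and blend with a mask,
# so every "lane" executes the same instructions.
def vector_version(x):
    return np.where(x > 0.5, np.sqrt(x), x * x)

assert np.allclose(scalar_version(x[:1000]), vector_version(x[:1000]))
```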
36

Modeling and simulation of the diffusion MRI signal from human brain white matter to decode its microstructure and produce an anatomic atlas at high fields (3T) / Modélisation et simulation du signal IRM pondéré en diffusion de la substance blanche cérébrale en vue du décodage de sa microstructure et de l'établissement d'un atlas anatomique à hauts champs (3T)

Ginsburger, Kévin 30 August 2019 (has links)
Diffusion Magnetic Resonance Imaging (dMRI) of water in the brain has proven very successful over the past decade for mapping brain connections, and it remains the only in vivo modality for studying the anatomical connectivity of the human brain. More recently, it has been shown that diffusion MRI is also a unique tool for performing virtual biopsy of cerebral tissues, probing the composition of the brain parenchyma in vivo. However, most current analytical models used to estimate white matter microstructure (AxCaliber, ActiveAx, CHARMED) rely on a basic modeling of white matter, with axons represented by simple cylinders and extra-axonal diffusion assumed to be Gaussian, and remain too simplistic to account accurately for the white matter ultrastructure. First, a more physically plausible analytical model of human brain white matter, accounting for the time dependence of the diffusion process in the extra-axonal space, was developed for Oscillating Gradient Spin Echo (OGSE) sequence signals. A decoding tool that solves the inverse problem of estimating the white matter microstructure parameters from the OGSE-weighted diffusion MRI signal was designed, using a robust optimization scheme for parameter estimation. Second, a Big Data approach was designed to further improve brain microstructure decoding, and all the simulation tools needed to build computational models of brain tissue were developed in the frame of this thesis. An algorithm creating realistic numerical phantoms of white matter tissue, based on a spherical meshing of cell shapes, generates a massive number of virtual voxels in a computationally efficient way thanks to a GPU-based implementation. An ultra-fast simulator of the diffusion of water molecules within these virtual voxels then produces a synthetic diffusion MRI signal for each voxel. A dictionary of virtual voxels containing a very large set of geometrical configurations found in white matter was built; it includes voxels with varying degrees of axonal beading, a swelling of the axonal membrane that occurs after strokes and other pathologies. The synthetic signals, together with the geometrical configurations of the corresponding voxels, were used as a training set for a machine learning algorithm designed to decode white matter microstructure from the diffusion MRI signal and to estimate the degree of axonal beading. This decoder showed encouraging regression results on unseen simulated data, demonstrating the potential of the approach for characterizing the microstructure of healthy and injured brain tissue in vivo. The microstructure decoding tools developed during this thesis will in particular be used to characterize white matter microstructural parameters (axonal density, mean axonal diameter, glial density, mean glial cell diameter, microvascular density) in short and long bundles, and the simulation tools will enable the construction of a probabilistic atlas of the microstructural parameters of white matter bundles, using a mean-propagator-based diffeomorphic registration tool, also designed in the frame of this thesis, to register each individual.
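A minimal sketch of the Monte Carlo machinery such simulators build on is given below, for free diffusion only and under the narrow-pulse approximation, with made-up parameter values; the thesis's GPU simulator handles restricted geometries and realistic sequences. Water molecules are random-walked and the phase factor exp(i q·displacement) is averaged to obtain a synthetic signal attenuation.

```python
import numpy as np

rng = np.random.default_rng(0)

def free_diffusion_signal(q, D=2.0e-9, diffusion_time=20e-3,
                          n_molecules=20_000, n_steps=100):
    """Synthetic dMRI attenuation for free (unrestricted) diffusion.

    Narrow-pulse approximation: E(q) = < exp(i q . displacement) >.
    For free diffusion this should approach exp(-q^2 * D * t).
    """
    dt = diffusion_time / n_steps
    step_std = np.sqrt(2.0 * D * dt)                   # 1-D step size per axis
    steps = rng.normal(0.0, step_std, size=(n_molecules, n_steps, 3))
    displacement = steps.sum(axis=1)                   # net 3-D displacement
    phase = displacement @ q                           # q in rad/m
    return np.abs(np.mean(np.exp(1j * phase)))

q = np.array([0.0, 0.0, 2.0e4])                        # gradient along z
simulated = free_diffusion_signal(q)
analytic = np.exp(-np.dot(q, q) * 2.0e-9 * 20e-3)
print(f"simulated {simulated:.4f} vs analytic {analytic:.4f}")
```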
37

Scalable and distributed constrained low rank approximations

Kannan, Ramakrishnan 27 May 2016 (has links)
Low rank approximation is the problem of finding two low rank factors W and H such that rank(WH) << rank(A) and A ≈ WH. These low rank factors W and H can be constrained for meaningful physical interpretation, in which case the problem is referred to as Constrained Low Rank Approximation (CLRA). Like most constrained optimization problems, CLRA can be more computationally expensive than its unconstrained counterpart. A widely used CLRA is Non-negative Matrix Factorization (NMF), which enforces non-negativity constraints on each of the low rank factors W and H. In this thesis, I focus on scalable and distributed CLRA algorithms with constraints such as boundedness and non-negativity, for large real-world matrices arising from text, High Definition (HD) video, social networks and recommender systems. First, I consider the Bounded Matrix Low Rank Approximation (BMA), which imposes a lower and an upper bound on every element of the lower rank matrix. BMA is more challenging than NMF because it imposes bounds on the product WH rather than on each of the low rank factors W and H individually. For very large input matrices, we extend our BMA algorithm to Block BMA, which can scale to a large number of processors. In applications such as HD video, where the input matrix to be factored is extremely large, distributed computation is inevitable and network communication becomes the major performance bottleneck. To this end, we propose a novel distributed Communication-Avoiding NMF (CANMF) algorithm that communicates only the right low rank factor to its neighboring machine. Finally, we present a general distributed HPC-NMF framework that applies HPC techniques to the communication-intensive NMF operations and is suitable for a broader class of NMF algorithms.
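For reference, here is a minimal NumPy sketch of the classical multiplicative-update rule for NMF, the non-negative baseline that constrained variants build on; this is the textbook Lee-Seung update, not the BMA, CANMF or HPC-NMF algorithms of the thesis, and the rank and iteration count are arbitrary.

```python
import numpy as np

def nmf_multiplicative(A, rank=10, iterations=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix A into W @ H with W, H >= 0.

    Classic multiplicative updates minimizing ||A - WH||_F^2.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(iterations):
        # Update H, then W; eps avoids division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.abs(np.random.default_rng(1).random((100, 80)))
W, H = nmf_multiplicative(A, rank=5)
print("relative error:", np.linalg.norm(A - W @ H) / np.linalg.norm(A))
```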
38

Application of multi-core and cluster computing to the Transmission Line Matrix method

Browne, Daniel R. January 2014 (has links)
The Transmission Line Matrix (TLM) method is an established mathematical method for conducting computational electromagnetic (CEM) simulations. TLM models Maxwell's equations by discretising the contiguous nature of an environment and its contents into individual small-scale elements, and it is a computationally intensive process. This thesis focusses on parallel processing optimisations to the TLM method at the opposing ends of the contemporary computing hardware spectrum, namely large-scale computing systems versus small-scale mobile computing devices. The theoretical aspects covered in this thesis are: the historical development and derivation of the TLM method; a discrete random variable (DRV) for rain-drop diameter, allowing generation of a rain-field whose raindrops adhere to a Gaussian size distribution, as a case study for a 3-D TLM implementation; and investigations into parallel computing strategies for accelerating TLM on large and small-scale computing platforms. The implementation aspects covered are: a script for modelling rain-fields using free-to-use modelling software; the first known implementation of 2-D TLM on mobile computing devices; and a 3-D TLM implementation designed for simulating the effects of rain-fields on extremely high frequency (EHF) band signals. By optimising both TLM solver implementations for their respective platforms, new opportunities present themselves. Rain-field simulations containing individual rain-drop geometry can now be run, which was previously impractical due to the lengthy computation times required. Computationally time-intensive methods such as TLM were also previously impractical on mobile computing devices; contemporary hardware features on these devices now make CEM simulations possible at speeds acceptable to end users, as well as providing a new avenue for educating relevant user cohorts via dynamic presentations of EM phenomena.
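To make the per-node computation concrete, here is a hedged, minimal 2-D shunt-node TLM iteration in Python with an impulse excitation and short-circuit outer boundaries; the mesh size and step count are arbitrary illustrative values, and this is not the thesis's parallel implementation. Each time step scatters the four incident pulses at every node and then connects the reflected pulses to the facing ports of the neighbouring nodes.

```python
import numpy as np

# Minimal 2-D shunt-node TLM mesh (lossless, free space, hard boundaries).
NX, NY, STEPS = 101, 101, 150

# Incident pulses on the 4 ports of every node: 0=west, 1=east, 2=south, 3=north.
V = np.zeros((4, NX, NY))
V[:, NX // 2, NY // 2] = 1.0     # impulse excitation at the mesh centre

for _ in range(STEPS):
    # Scatter: reflected pulse on port k is (sum of incident)/2 - incident_k.
    total = V.sum(axis=0)
    R = 0.5 * total - V

    # Connect: pulses leaving one node arrive at the facing port of the
    # neighbouring node on the next time step.
    V_new = np.zeros_like(V)
    V_new[0, 1:, :] = R[1, :-1, :]    # east-going pulse -> west port of x+1
    V_new[1, :-1, :] = R[0, 1:, :]    # west-going pulse -> east port of x-1
    V_new[2, :, 1:] = R[3, :, :-1]    # north-going pulse -> south port of y+1
    V_new[3, :, :-1] = R[2, :, 1:]    # south-going pulse -> north port of y-1

    # Short-circuit outer boundary: pulses reflect back with inverted sign.
    V_new[0, 0, :] = -R[0, 0, :]
    V_new[1, -1, :] = -R[1, -1, :]
    V_new[2, :, 0] = -R[2, :, 0]
    V_new[3, :, -1] = -R[3, :, -1]
    V = V_new

node_voltage = 0.5 * V.sum(axis=0)
print("peak node voltage after propagation:", np.abs(node_voltage).max())
```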
39

Využití GPU pro náročné výpočty / Using GPU for HPC

Máček, Branislav Unknown Date (has links)
There has recently been significant growth in building HPC systems, and nowadays they are built from mainstream computer components. One such component is the graphics accelerator with its GPU. This thesis describes graphics accelerators and examines their possible uses: a GPU chip contains hundreds of simple processors, and the thesis examines how to benefit from these parallel processors. It describes several test applications, discusses the results of the experiments, and compares them with other components used for HPC.
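A generic illustration of handing data-parallel work to those processors is sketched below, assuming the Numba library and a CUDA-capable GPU; it is not one of the test applications evaluated in the thesis.

```python
# A generic illustration of offloading data-parallel work to a GPU,
# assuming a CUDA-capable device and the Numba library; this is not
# one of the test applications evaluated in the thesis.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # global thread index
    if i < out.size:              # guard against the last partial block
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a, d_b = cuda.to_device(a), cuda.to_device(b)
d_out = cuda.device_array_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)

assert np.allclose(d_out.copy_to_host(), a + b)
```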
40

Integrating Performance Analysis in Parallel Software Engineering

Poliakoff, David 18 August 2015 (has links)
Modern computational software is increasingly large in terms of lines of code, number of developers, intended longevity, and complexity of target architectures. While tools exist to mitigate the problems this kind of software causes for developing functionally correct code, no comparable solutions exist for the problems it causes for performance. This thesis introduces a design called the Software Development Performance Analysis System, or SDPAS. SDPAS observes the performance of software tests as the software is developed, tracking builds, tests, and developers in order to provide data with which to analyze a software development process. SDPAS integrates with the CMake build and test suite to obtain data about builds and provide consistent tests, and with git to obtain data about how the software is changing. SDPAS also integrates with TAU to obtain performance data and store it along with the data obtained from the other tools. The utility of SDPAS is demonstrated on two pieces of production software.
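The idea of tying performance data to builds and commits can be sketched with a few lines of generic tooling; the test command, output file name and script below are illustrative assumptions, not SDPAS internals.

```python
# Hypothetical sketch of recording a test's runtime against the current
# git commit, in the spirit of tracking performance as software evolves.
# The test command and output file are illustrative, not SDPAS internals.
import csv, subprocess, time
from datetime import datetime, timezone

def current_commit():
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True).strip()

def time_test(command):
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = time_test(["ctest", "-R", "perf_smoke"])   # any repeatable test
    with open("perf_history.csv", "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            current_commit(),
            f"{elapsed:.3f}",
        ])
```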
