51 |
Uma metodologia de avaliação de desempenho para identificar as melhores regiões paralelas para reduzir o consumo de energia / A performance evaluation methodology to find the best parallel regions to reduce energy consumption. Millani, Luís Felipe Garlet, January 2015 (has links)
Devido às limitações de consumo energético impostas a supercomputadores, métricas de eficiência energética estão sendo usadas para analisar aplicações paralelas desenvolvidas para computadores de alto desempenho. O objetivo é a redução do custo energético dessas aplicações. Algumas estratégias de redução de consumo energético consideram a aplicação como um todo, outras ajustam a frequência dos núcleos apenas em certas regiões do código paralelo. Fases de balanceamento de carga ou de comunicação bloqueante podem ser oportunas para redução do consumo energético. A análise de eficiência dessas estratégias é geralmente realizada com metodologias tradicionais derivadas do domínio de análise de desempenho. Uma metodologia de grão mais fino, onde a redução de energia é avaliada para cada região de código e frequência, pode levar a um melhor entendimento de como o consumo energético pode ser minimizado para uma determinada implementação. Para tal, os principais desafios são: (a) a detecção de um número possivelmente grande de regiões paralelas; (b) qual frequência deve ser adotada para cada região de forma a limitar o impacto no tempo de execução; e (c) o custo do ajuste dinâmico da frequência dos núcleos. O trabalho descrito nesta dissertação apresenta uma metodologia de análise de desempenho para encontrar, dentre as regiões paralelas, os melhores candidatos à redução do consumo energético. Esta proposta consiste de: (a) um design inteligente de experimentos baseado em Plackett-Burman, especialmente importante quando um grande número de regiões paralelas é detectado na aplicação; (b) análise tradicional de energia e desempenho sobre as regiões consideradas candidatas à redução do consumo energético; e (c) análise baseada em eficiência de Pareto mostrando a dificuldade em otimizar o consumo energético. Em (c) também são mostrados os diferentes pontos de equilíbrio entre desempenho e eficiência energética que podem ser interessantes ao desenvolvedor. Nossa abordagem é validada por três aplicações: Graph500, busca em largura, e refinamento de Delaunay. / Due to energy limitations imposed on supercomputers, parallel applications developed for High Performance Computing (HPC) systems are currently being investigated with energy efficiency metrics. The idea is to reduce the energy footprint of these applications. While some energy reduction strategies consider the application as a whole, others adjust the core frequency only for specific regions of the parallel code. Load balancing or blocking communication phases could be used as opportunities for energy reduction, for instance. The efficiency analysis of such strategies is usually carried out with traditional methodologies derived from the performance analysis domain. It is clear that a finer-grain methodology, where the energy reduction is evaluated for each code region and frequency configuration, could potentially lead to a better understanding of how energy consumption can be reduced for a particular algorithm implementation. To achieve this, the main challenges are: (a) the detection of a possibly large number of parallel code regions; (b) which frequency should be adopted for each region to reduce energy consumption without excessive runtime penalty; and (c) the cost of dynamically adjusting the core frequency. The work described in this dissertation presents a performance analysis methodology to find the best parallel region candidates to reduce energy consumption.
The proposal is threefold: (a) a clever design of experiments based on screening, especially important when a large number of parallel regions is detected in the application; (b) a traditional energy and performance evaluation on the regions considered good candidates for energy reduction; and (c) a Pareto-based analysis showing how hard it is to obtain energy gains in optimized codes. In (c), we also show other trade-offs between performance loss and energy gains that might be of interest to the application developer. Our approach is validated against three HPC application codes: Graph500, Breadth-First Search, and Delaunay Refinement.
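As an illustration of the screening step described above, the sketch below estimates per-region main effects from a two-level design built from a Hadamard matrix (equivalent to a Plackett-Burman design when the run count is a power of two). The region names and the measure_energy() function are hypothetical placeholders, not part of the dissertation.

```python
# A minimal sketch of Plackett-Burman-style screening of parallel regions,
# assuming a hypothetical measure_energy() that runs the application under a
# per-region frequency assignment and returns energy-to-solution in joules.
import numpy as np
from scipy.linalg import hadamard

regions = [f"region_{i}" for i in range(7)]          # hypothetical region names

# Dropping the all-ones column of a Hadamard matrix gives a two-level screening
# design equivalent to Plackett-Burman when the run count is a power of two:
# 8 runs screen 7 regions instead of 2**7 exhaustive runs.
design = hadamard(8)[:, 1:]                          # 8 runs x 7 factors, entries +/-1

def measure_energy(assignment):
    # Placeholder: the real methodology would run the instrumented application
    # under this assignment and read hardware energy counters.
    fake_savings = {"region_2": 4.0, "region_5": 2.5}    # made-up ground truth
    return 100.0 - sum(fake_savings.get(r, 0.0)
                       for r, lvl in assignment.items() if lvl == 1)

energies = []
for run in design:
    # +1 -> reduced frequency for that region, -1 -> nominal frequency
    assignment = {r: int(level) for r, level in zip(regions, run)}
    energies.append(measure_energy(assignment))

# Main effect of each region on energy-to-solution: the most negative values
# flag the best candidates for frequency reduction.
effects = design.T @ np.array(energies) / len(energies)
for region, effect in sorted(zip(regions, effects), key=lambda t: t[1]):
    print(f"{region}: effect on energy = {effect:+.2f} J")
```

The regions with the largest negative effects would then be passed on to the detailed energy/performance evaluation of step (b).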
|
52 |
Eletrólitos sólidos poliméricos à base de polissacarídeos: síntese e caracterização. / Solid polymer electrolytes based on polysaccharides: synthesis and characterization. Anelise Maria Regiani, 10 November 2000 (has links)
A síntese e a caracterização de um novo tipo de eletrólito sólido polimérico são descritas neste trabalho. Os materiais preparados consistiram de filmes de hidroxietil celulose ou hidroxipropil celulose entrecruzadas com diisocianatos de poli(óxido de etileno) e poli(óxido de propileno) ou enxertadas com monoisocianato de poli(óxido de propileno). Todos estes isocianatos foram sintetizados a partir das respectivas aminas comerciais. Filmes de hidroxietil celulose entrecruzada com hexametileno diisocianato ou enxertados com fenil isocianato também foram estudados. Como técnicas de caracterização foram utilizadas espectroscopia no infravermelho, no ultravioleta e de ressonância magnética nuclear, análises térmicas e difração de raios-X. Os filmes dopados com LiClO4 foram caracterizados utilizando-se as mesmas técnicas e a condutividade foi determinada através do método de impedância complexa. Os resultados foram da ordem de 10⁻⁵ S cm⁻¹ a 60 °C. Este valor permitiu concluir que as cadeias de derivado de celulose parecem não influenciar no fenômeno de condução; aparentemente este encontra-se mais relacionado ao tipo de isocianato utilizado na formação do filme. Os resultados de condutividade e de mobilidade de cadeia polimérica indicam que os sistemas aqui estudados podem ser aplicados como eletrólitos sólidos poliméricos. Os filmes com isocianatos comerciais, no entanto, não apresentaram resultados de condução interessantes. / The synthesis and characterization of new types of solid polymer electrolytes based on hydroxyethyl and hydroxypropyl cellulose grafted with different polyethers were investigated. The synthesis is based on the reaction between the cellulose derivative and mono- and difunctional isocyanates prepared from amines of polyethylene oxide and polypropylene oxide. Films of hydroxyethyl cellulose crosslinked with hexamethylene diisocyanate or grafted with phenyl isocyanate were also synthesized. These materials were characterized by infrared, ultraviolet and nuclear magnetic resonance spectroscopies, thermal analysis and X-ray diffraction. The films of polysaccharide and polyether that contained LiClO4 showed conductivity values of the order of 10⁻⁵ S cm⁻¹ at 60 °C. The conductivity appears to be independent of the cellulose derivative and more closely related to the type of isocyanate grafted onto the polysaccharide chain. The conductivity and chain mobility results show that the systems studied here can be applied as solid polymer electrolytes. The materials synthesized using commercial isocyanates as grafting reactants, however, did not show an interesting conductivity response.
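For context, the conductivity values quoted above follow from the bulk resistance read off the complex-impedance (Nyquist) spectrum via σ = t / (R_b·A); the sketch below shows this arithmetic with made-up cell dimensions, not the measured ones.

```python
# Minimal sketch: ionic conductivity from complex-impedance data,
# sigma = t / (R_b * A). Thickness, electrode area and bulk resistance are
# made-up illustration values, not the ones measured in the thesis.
thickness_cm = 0.010          # film thickness t (cm)
area_cm2 = 1.0                # electrode contact area A (cm^2)
bulk_resistance_ohm = 1.0e3   # R_b, taken from the intercept of the Nyquist
                              # semicircle with the real axis

sigma = thickness_cm / (bulk_resistance_ohm * area_cm2)
print(f"conductivity = {sigma:.1e} S/cm")   # -> 1.0e-05 S/cm, same order as reported
```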
|
53 |
Geocomputational Approaches to Improve Problem Solution in Spatial Optimization: A Case Study of the p-Median Problem. Mu, Wangshu, January 2018 (has links)
The p-Median problem (PMP) is one of the most widely applied location problems in urban and regional planning to support spatial decision-making. As an NP-hard problem, the PMP remains challenging to solve optimally, especially for large-sized problems. This research focuses on developing geocomputational approaches to improve the effectiveness and efficiency of solving the PMP. This research also examines existing PMP methods applied to choropleth mapping and proposes a new approach to address issues associated with uncertainty.
Chapter 2 introduces a new algorithm that solves the PMP more effectively. In this chapter, a method called the spatial-knowledge enhanced Teitz and Bart (STB) heuristic is proposed to improve the classic Teitz and Bart (TB) heuristic. The STB heuristic prioritizes the candidate facility sites to be examined in the solution set based on the spatial distribution of demand and candidate facility sites. Tests based on a range of PMPs demonstrate the effectiveness of the STB heuristic.
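To make the baseline concrete, the following sketch implements the classic Teitz-Bart vertex-substitution step that the STB heuristic builds on; the candidate scan order here is arbitrary, whereas STB would prioritize it using spatial knowledge, and the random planar instance is purely illustrative.

```python
# A minimal sketch of the classic Teitz-Bart (vertex substitution) heuristic
# for the p-median problem; STB's spatial prioritization is not reproduced.
import numpy as np

def pmedian_cost(dist, facilities):
    # Total distance from every demand point to its nearest open facility.
    return dist[:, list(facilities)].min(axis=1).sum()

def teitz_bart(dist, p, seed=0):
    """dist: (n_demand x n_candidates) distance matrix."""
    rng = np.random.default_rng(seed)
    n_cand = dist.shape[1]
    current = set(rng.choice(n_cand, size=p, replace=False).tolist())
    best = pmedian_cost(dist, current)
    improved = True
    while improved:
        improved = False
        # STB would order these candidates using spatial knowledge;
        # here the scan order is arbitrary.
        for cand in range(n_cand):
            if cand in current:
                continue
            for fac in list(current):
                trial = (current - {fac}) | {cand}
                cost = pmedian_cost(dist, trial)
                if cost < best:
                    current, best, improved = trial, cost, True
                    break
    return current, best

# Toy instance with random planar sites (illustration only).
rng = np.random.default_rng(42)
pts = rng.random((60, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
sol, cost = teitz_bart(dist, p=5)
print("selected facilities:", sorted(sol), "cost:", round(cost, 3))
```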
Chapter 3 provides a high performance computing (HPC) based heuristic, Random Sampling and Spatial Voting (RSSV), to solve large PMPs. Instead of solving a large-sized PMP directly, RSSV solves multiple sub-PMPs with each sub-PMP containing a subset of facility and demand sites. Combining all the sub-PMP solutions, a spatial voting strategy is introduced to select candidate facility sites to construct a PMP for obtaining the final problem solution. The RSSV algorithm is well-suited to the parallel structure of the HPC platform. Tests with the BIRCH dataset show that RSSV provides high-quality solutions and reduces computing time significantly. Tests also demonstrate the dynamic scalability of the algorithm; it can start with a small amount of computing resources and scale up or down when the availability of computing resources changes.
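A minimal sketch of the sampling-and-voting idea is shown below, assuming a simple greedy heuristic for each sub-PMP (the dissertation would run a proper PMP solver on each HPC worker); the instance and parameters are illustrative only.

```python
# A minimal sketch of random sampling plus spatial voting: solve many small
# sub-PMPs, vote for the sites they select, then solve one small final PMP
# restricted to the most-voted candidates.
import numpy as np
from collections import Counter

def greedy_pmedian(dist, p):
    # Greedily add the facility that most reduces total assignment distance.
    chosen, assign = [], np.full(dist.shape[0], np.inf)
    for _ in range(p):
        costs = [(np.minimum(assign, dist[:, j]).sum(), j)
                 for j in range(dist.shape[1]) if j not in chosen]
        _, best = min(costs)
        chosen.append(best)
        assign = np.minimum(assign, dist[:, best])
    return chosen

rng = np.random.default_rng(1)
pts = rng.random((500, 2))                  # demand sites double as candidate sites
p, n_samples, sample_size = 5, 20, 100

votes = Counter()
for _ in range(n_samples):                  # the sub-solves are independent, hence
    idx = rng.choice(len(pts), sample_size, replace=False)   # well suited to HPC
    sub_pts = pts[idx]
    sub = np.linalg.norm(sub_pts[:, None, :] - sub_pts[None, :, :], axis=2)
    for local in greedy_pmedian(sub, p):
        votes[int(idx[local])] += 1         # spatial vote for the global site id

# Keep the most-voted candidate sites and solve one final, much smaller PMP.
shortlist = [site for site, _ in votes.most_common(5 * p)]
final = np.linalg.norm(pts[:, None, :] - pts[shortlist][None, :, :], axis=2)
print("final facilities:", sorted(shortlist[j] for j in greedy_pmedian(final, p)))
```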
Chapter 4 provides a new classification scheme to draw choropleth maps when data contain uncertainty. Considering that units in the same class on a choropleth map are assigned the same color or pattern, the new approach assumes the existence of a representative value for each class. A maximum likelihood estimation (MLE) based approach is developed to determine class breaks so that the overall within-class deviation is minimized while considering uncertainty. Different methods, including mixed integer programming, dynamic programming, and an interchange heuristic, are developed to solve the new classification problem. The proposed mapping approach is then applied to map two American Community Survey datasets. The effectiveness of the new approach is demonstrated, and the linkage of the approach with the PMP method and the Jenks Natural Breaks is discussed.
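The dynamic-programming variant can be sketched as below for the uncertainty-free case: an optimal one-dimensional classification (Fisher-Jenks style) that minimizes within-class squared deviation from each class representative. The MLE formulation in the chapter additionally accounts for the uncertainty of each estimate, which this sketch omits.

```python
# A minimal dynamic-programming sketch of optimal 1-D classification: choose
# class breaks that minimize total within-class squared deviation from the
# class mean (the class "representative"). Uncertainty weighting is omitted.
import numpy as np

def optimal_breaks(values, k):
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    csum, csum2 = np.cumsum(x), np.cumsum(x * x)

    def sse(i, j):                       # within-class SSE of x[i..j], inclusive
        s = csum[j] - (csum[i - 1] if i else 0.0)
        s2 = csum2[j] - (csum2[i - 1] if i else 0.0)
        return s2 - s * s / (j - i + 1)

    cost = np.full((k + 1, n), np.inf)   # cost[c][j]: best cost of x[0..j] with c classes
    back = np.zeros((k + 1, n), dtype=int)
    for j in range(n):
        cost[1][j] = sse(0, j)
    for c in range(2, k + 1):
        for j in range(c - 1, n):
            for i in range(c - 1, j + 1):          # last class covers x[i..j]
                cand = cost[c - 1][i - 1] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    breaks, j = [], n - 1                # recover the upper bound of each class
    for c in range(k, 1, -1):
        i = back[c][j]
        breaks.append(x[i - 1])
        j = i - 1
    return sorted(breaks)

data = np.random.default_rng(0).gamma(2.0, 10.0, size=200)   # skewed toy data
print("class breaks:", [round(b, 2) for b in optimal_breaks(data, k=5)])
```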
|
54 |
Defining the functional role of laminin isoforms in the regulation of the adult hepatic progenitor cell. Williams, Michael John, January 2015 (has links)
During chronic and severe acute liver injury, regeneration is thought to occur through hepatic progenitor cells (HPCs). Understanding the regulation of HPCs may offer therapeutic opportunities to enhance liver regeneration. HPCs are associated with an increase in laminins in the extracellular matrix. Laminins are heterotrimeric proteins, composed of an alpha, beta and gamma chain. There are 5 alpha chains with different distributions and functions, but the relative contributions of these in HPC-mediated liver regeneration are not known. My aims were to describe the laminin alpha chains associated with the HPC response and to define the functional effects of specific laminin chains on HPCs. I examined the laminin alpha chains in two mouse models of HPC activation: a transgenic model using conditional deletion of Mdm2 in hepatocytes, and a dietary model using 3,5-diethoxycarbonyl-1,4-dihydrocollidine (DDC). The laminin alpha 5 (Lama5) chain is significantly upregulated in both models and forms a basement membrane which surrounds the progenitor cells. I have also demonstrated Lama5 expression in the ductular reaction seen in human liver disease. Using primary mouse cell cultures, I have shown that Lama5 is produced predominantly by the HPCs themselves, rather than by stellate cells. The HPCs express the cell surface receptor alpha-6 beta-1 integrin, a binding partner of Lama5. I then studied the functional effects of matrix on cell behaviour in vitro using recombinant laminins and a line of spontaneously immortalised mouse HPCs. Compared to other laminin chains, Lama5 selectively promotes HPC adhesion and spreading. These effects are partially blocked by antibodies against beta-1 integrin. Lama5 also significantly enhances HPC migration. Furthermore, only Lama5 enhances HPC survival in serum-free medium, increasing cell viability. HPCs maintained in culture on plastic synthesise the Lama5 chain themselves. Knock-down of endogenous Lama5 production using siRNA results in reduced proliferation and increased hepatocytic differentiation, with increased albumin production. I then studied the effects in vivo using transgenic Cre-lox mouse strains that allow conditional knock-out of either laminin alpha 5 or beta-1 integrin in HPCs. The effects of gene deletion were examined in healthy mice and two dietary models of HPC activation: the DDC diet and a choline-deficient, ethionine-supplemented (CDE) diet. Although these experiments were limited by a low number of experimental animals and low recombination rates, there was a suggestion of impaired HPC expansion associated with loss of laminin alpha 5. There was also a significant increase in hepatocellular injury and fibrosis in response to the DDC diet with loss of laminin alpha 5 expression. Laminin alpha 5-containing matrix is deposited around HPCs during liver regeneration and supports progenitor cell attachment, migration and maintenance of an undifferentiated phenotype. This work identifies a novel target for enhancing liver regeneration.
|
55 |
Un modèle de transport et de chimie atmosphérique à grande échelle adapté aux calculateurs massivement parallèles / A large scale atmospheric chemistry transport model for massively parallel architectures. Praga, Alexis, 30 January 2015 (has links)
Cette thèse présente un modèle bi-dimensionnel pour le transport atmosphérique à grande échelle, nommé Pangolin, conçu pour passer à l'échelle sur les architectures parallèles. La version actuelle comporte une advection 2D ainsi qu'un schéma linéaire de chimie et servira de base pour un modèle de chimie-transport (MCT). Pour obtenir la conservation de la masse, un schéma en volumes finis de type van Leer a été retenu pour l'advection et étendu au cas 2D en utilisant des opérateurs alternés. La conservation de la masse est assurée en corrigeant les vents en amont. Nous proposons une solution au problème "des pôles" de la grille régulière latitude-longitude grâce à une nouvelle grille préservant approximativement les aires des cellules et couvrant la sphère uniformément. La parallélisation du modèle se base sur l'advection et utilise un algorithme de décomposition de domaines spécialement adapté à la grille. Cela permet d'obtenir l'équilibrage de la charge de calcul avec MPI, une librairie d'échanges de messages. Pour que les performances soient à la hauteur sur les architectures parallèles actuelles et futures, les propriétés analytiques de la grille sont exploitées pour le schéma d'advection et la parallélisation en privilégiant le moindre coût des flops par rapport aux mouvements de données. Le modèle est validé sur des cas tests analytiques et comparé à des schémas de transport à l'aide d'un comparatif récemment publié. Pangolin est aussi comparé au MCT de Météo-France via un schéma linéaire d'ozone et l'utilisation de coordonnées isentropes. / We present in this thesis the development of a large-scale bi-dimensional atmospheric transport scheme designed for parallel architectures with scalability in mind. The current version, named Pangolin, contains a bi-dimensional advection and a simple linear chemistry scheme for stratospheric ozone and will serve as a basis for a future Chemistry Transport Model (CTM). For mass preservation, a van Leer finite volume scheme was chosen for advection and extended to 2D with operator splitting. To ensure mass preservation, winds are corrected in a preprocessing step. We aim at addressing the "pole issue" of the traditional regular latitude-longitude grid by presenting a new quasi area-preserving grid mapping the sphere uniformly. The parallelization of the model is based on the advection operator, and a custom domain-decomposition algorithm is presented here to attain load balancing in a message-passing context. To run efficiently on current and future parallel architectures, analytical features of the grid are exploited in the advection scheme and the parallelization algorithm to favor the cheaper cost of flops versus data movement. The model is validated on analytical test cases and compared to other state-of-the-art schemes using a recent benchmark. Pangolin is also compared to the CTM of Météo-France, MOCAGE, using a linear ozone scheme and isentropic coordinates.
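As a toy illustration of the transport scheme's building blocks, the sketch below applies a van Leer-type (slope-limited MUSCL) finite-volume update in 1D and extends it to 2D by operator splitting on a regular periodic grid. Pangolin's actual scheme works on its quasi area-preserving spherical grid and corrects the winds for mass conservation, neither of which is reproduced here.

```python
# A minimal van Leer-type (MUSCL, slope-limited) finite-volume advection step
# in 1D, applied with operator splitting to a 2D field on a periodic grid.
import numpy as np

def vanleer_1d(q, u, dx, dt):
    """One conservative 1D update with van Leer slope limiting, periodic BCs,
    assuming a constant positive velocity u and CFL number u*dt/dx <= 1."""
    c = u * dt / dx
    dq_l = q - np.roll(q, 1)
    dq_r = np.roll(q, -1) - q
    with np.errstate(divide="ignore", invalid="ignore"):
        # van Leer (harmonic) limiter: zero slope at extrema, smooth elsewhere.
        slope = np.where(dq_l * dq_r > 0.0, 2.0 * dq_l * dq_r / (dq_l + dq_r), 0.0)
    q_face = q + 0.5 * (1.0 - c) * slope      # value reconstructed at each right face
    flux = u * q_face
    return q - dt / dx * (flux - np.roll(flux, 1))

# 2D advection by alternating (split) 1D sweeps in x and y.
nx = ny = 64
dx = dy = 1.0 / nx
u, v, dt = 1.0, 0.5, 0.5 * dx                 # CFL-safe time step
q = np.zeros((ny, nx)); q[24:40, 24:40] = 1.0 # square tracer blob

for _ in range(100):
    q = np.apply_along_axis(vanleer_1d, 1, q, u, dx, dt)   # x sweep
    q = np.apply_along_axis(vanleer_1d, 0, q, v, dy, dt)   # y sweep

print("mass conserved:", np.isclose(q.sum(), 16 * 16))     # finite-volume property
```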
|
56 |
Performance Models For Distributed Memory HPC Systems And Deep Neural Networks. David William Cardwell (8037125), 26 November 2019 (links)
Performance models are useful as mathematical models to reason about the behavior of different computer systems while running various applications. In this thesis, we aim to provide two distinct performance models: one for distributed-memory high performance computing systems with network communication, and one for deep neural networks. Our main goal for the first model is insight and simplicity, while for the second we aim for accuracy in prediction. The first model is generalized for networked multi-core computer systems, while the second is specific to deep neural networks on a shared-memory system.
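The abstract does not give the models themselves; as a point of reference, the sketch below shows the simple latency-bandwidth ("alpha-beta", Hockney-style) point-to-point model from which distributed-memory communication models are commonly built, with made-up machine parameters.

```python
# A minimal latency-bandwidth ("alpha-beta", Hockney-style) communication
# model: T(n) = alpha + n * beta. The alpha/beta values are made up for
# illustration and are not parameters from the thesis.
import math

alpha = 1.5e-6          # per-message latency (s)
beta = 1.0 / 12.5e9     # per-byte transfer time (s/byte), ~12.5 GB/s link

def ptp_time(n_bytes):
    return alpha + n_bytes * beta

def allreduce_time(n_bytes, p):
    # Simple recursive-doubling estimate: log2(p) rounds, full payload per round.
    return math.ceil(math.log2(p)) * ptp_time(n_bytes)

for n in (8, 64 * 1024, 16 * 1024 * 1024):
    print(f"{n:>10d} B  ptp {ptp_time(n) * 1e6:9.1f} us   "
          f"allreduce(p=256) {allreduce_time(n, 256) * 1e6:9.1f} us")
```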
|
57 |
Profile, Monitor, and Introspect Spark Jobs Using OSU INAM. Kedia, Mansa, January 2020 (has links)
No description available.
|
58 |
FPGA acceleration of high performance computing communication middleware. Xiong, Qingqing, 29 September 2019 (has links)
High-Performance Computing (HPC) necessarily requires computing with a large number of nodes. As computing technology progresses, internode communication becomes an ever more critical performance blocker. The execution time of software communication support is generally critical, often accounting for hundreds of times the actual time-of-flight latency. This software support comes in two types. The first is support for core functions as defined in middleware such as the ubiquitous Message Passing Interface (MPI). Over the last decades this software overhead has been addressed through a number of advances such as eliminating data copies, improving drivers, and bypassing the operating system. However, an essential core still remains, including message matching, data marshaling, and handling collective operations. The second type of communication support is for new services not inherently part of the middleware. The most prominent of these is compression; it brings huge savings in transmission time, but much of this benefit is offset by a new level of software overhead. In this dissertation, we address the software overhead in internode communication with elements of the emerging node architectures, which include FPGAs in multiple configurations: closely coupled hardware support, programmable Network Interface Cards (NICs), and routers with programmable accelerators.
While there has been substantial work in offloading communication software into hardware, we advance the state-of-the-art in three ways. The first is to use an emerging hardware model that is, for the first time, both realistic and supportive of very high performance gains. Previous studies (and some products) have relied on hardware models that are either of limited benefit (a NIC processor) or not sustainable (NIC augmented with ASICs). Our hardware model is based on the various emerging CPU-FPGA computing architectures. The second is to improve on previous work. We have found this to be possible through a number of means: taking advantage of configurable hardware, taking advantage of close coupling, and coming up with novel improvements. The third is looking at problems that have been, so far, nearly completely unexplored. One of these is hardware acceleration of application-aware, in-line, lossy compression.
In this dissertation, we propose offload approaches and hardware designs for integrated FPGAs to bring down communication latency to ultra-low levels unachievable by today's software/hardware. We focus on improving performance from three aspects: 1) Accelerating middleware semantics within communication routines such as message matching and derived datatypes; 2) Optimizing complex communication routines, namely, collective operations; 3) Accelerating operations vital in new communication services independent of the middleware, such as data compression. The last aspect is somewhat broader than the others: it applies not only to HPC communication but is also vital to broader system functions such as I/O.
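To make the first offload target concrete, the sketch below reproduces, in software, the MPI-style message-matching semantics (posted-receive and unexpected-message queues with source/tag wildcards) that such FPGA logic must implement; the class and field names are illustrative and not taken from the dissertation.

```python
# A minimal software sketch of MPI-style message matching: the posted-receive
# queue and unexpected-message queue that matching hardware must implement.
# Wildcards follow MPI semantics (MPI_ANY_SOURCE / MPI_ANY_TAG); names are
# illustrative, not the dissertation's design.
from collections import deque

ANY = -1   # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG

def matches(recv, msg):
    return (recv["src"] in (ANY, msg["src"]) and
            recv["tag"] in (ANY, msg["tag"]) and
            recv["comm"] == msg["comm"])

class MatchEngine:
    def __init__(self):
        self.posted = deque()       # receives posted before a message arrived
        self.unexpected = deque()   # messages that arrived before a matching receive

    def post_recv(self, src, tag, comm):
        recv = {"src": src, "tag": tag, "comm": comm}
        for msg in list(self.unexpected):   # in-order scan, per MPI ordering rules
            if matches(recv, msg):
                self.unexpected.remove(msg)
                return msg                  # matched immediately
        self.posted.append(recv)
        return None

    def arrive(self, src, tag, comm, payload):
        msg = {"src": src, "tag": tag, "comm": comm, "payload": payload}
        for recv in list(self.posted):
            if matches(recv, msg):
                self.posted.remove(recv)
                return recv                 # delivered to a waiting receive
        self.unexpected.append(msg)
        return None

eng = MatchEngine()
eng.post_recv(src=ANY, tag=7, comm=0)                                  # receive waits
print(eng.arrive(src=3, tag=7, comm=0, payload=b"data") is not None)   # True: matched
```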
|
59 |
Evaluation of an automated method for measuring hematopoietic progenitor cells to determine the start of stem cell apheresis. Bergman, Märta, January 2020 (has links)
Stem cell transplantation is a known treatment for various cancers. Currently, most transplanted cells are collected via apheresis. An injection of growth factor is given to the patient to start the proliferation and mobilization of stem cells. Apheresis can be initiated when the patient has a stem cell count of 15 to 20 stem cells/µL of peripheral blood. The standard method for analysing stem cells is flow cytometry, where CD34+ and CD45+ cells are identified with targeted fluorescent antibodies. This analysis takes more than 45 minutes to perform. The Sysmex XN-9000 analyses samples with flow cytometry by lysing erythrocytes and platelets and staining the leukocytes with fluorescent dye. Analysis of the hematopoietic progenitor cells (HPC) takes less than 4 minutes. The purpose of this study was to investigate if it is possible to predict the start of apheresis using the XN-9000. For this study, 43 samples were analysed using both methods. Using the sign test, a p-value < 0.05 was obtained, which indicates a significant difference between the results of the two methods. Spearman's rank correlation gave an observed ρ-value greater than the critical ρ-value, which revealed a correlation between the methods, although not a linear one according to Pearson's correlation coefficient. PPV and NPV were calculated with cut-offs at 20, 30 and 40 HPC/µL of blood, where 20 HPC/µL gave an NPV of 100%. According to the tests performed, there is a correlation between the two methods, but further samples must be analysed to determine how the results should be compared.
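For illustration, the sketch below computes the statistics reported above (Spearman rank correlation and PPV/NPV at a 20 HPC/µL cut-off) on made-up paired measurements rather than the study's 43 samples.

```python
# A minimal sketch of the reported statistics on made-up paired counts:
# Spearman rank correlation between the two methods, and PPV/NPV at a cut-off.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
cd34 = rng.gamma(2.0, 15.0, size=43)              # reference: CD34+ by flow cytometry
hpc = 0.8 * cd34 + rng.normal(0, 5, size=43)      # index test: XN-9000 HPC channel

rho, p = spearmanr(cd34, hpc)
print(f"Spearman rho = {rho:.2f} (p = {p:.3g})")

cutoff = 20.0                                     # HPC/uL decision threshold
ref_pos = cd34 >= cutoff                          # positives by the reference method
test_pos = hpc >= cutoff
tp = np.sum(test_pos & ref_pos); fp = np.sum(test_pos & ~ref_pos)
tn = np.sum(~test_pos & ~ref_pos); fn = np.sum(~test_pos & ref_pos)
ppv = tp / (tp + fp) if tp + fp else float("nan")
npv = tn / (tn + fn) if tn + fn else float("nan")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f} at {cutoff:.0f} HPC/uL")
```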
|
60 |
Maximizing I/O Bandwidth for Out-of-Core HPC Applications on Homogeneous and Heterogeneous Large-Scale Systems. Alturkestani, Tariq, 30 September 2020 (has links)
Out-of-Core simulation systems often produce a massive amount of data that cannot fit in the aggregate fast memory of the compute nodes, and they also require these data to be read back for computation. As a result, I/O data movement can be a bottleneck in large-scale simulations. Advances in memory architecture have made it feasible and affordable to integrate hierarchical storage media on large-scale systems, starting from the traditional Parallel File Systems (PFSs) to intermediate fast disk technologies (e.g., node-local and remote-shared NVMe and SSD-based Burst Buffers) and up to CPU main memory and GPU High Bandwidth Memory (HBM). However, while adding additional and faster storage media increases I/O bandwidth, it pressures the CPU, as it becomes responsible for managing and moving data between these layers of storage. Simulation systems are thus vulnerable to being blocked by I/O operations. The Multilayer Buffer System (MLBS) proposed in this research demonstrates a general and versatile method for overlapping I/O with computation that helps to ameliorate the strain on the processors through asynchronous access. The main idea consists of decoupling I/O operations from computational phases using dedicated hardware resources to perform expensive context switches. MLBS monitors I/O traffic in each storage layer, allowing fair utilization of shared resources. By continually prefetching up and down across all hardware layers of the memory and storage subsystems, MLBS transforms the original I/O-bound behavior of evaluated applications and shifts it closer to a memory-bound or compute-bound regime. The evaluation on the Cray XC40 Shaheen-2 supercomputer for a representative I/O-bound application, seismic inversion, shows that MLBS outperforms state-of-the-art PFSs, i.e., Lustre, Data Elevator and DataWarp, by 6.06X, 2.23X, and 1.90X, respectively. On the IBM-built Summit supercomputer, using 2048 compute nodes equipped with a total of 12288 GPUs, MLBS achieves up to 1.4X performance speedup compared to the reference PFS-based implementation. MLBS is also demonstrated on applications from cosmology and combustion, and on classic out-of-core computational physics and linear algebra routines.
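The core decoupling idea can be sketched as a dedicated prefetch thread filling a bounded in-memory buffer while the compute loop drains it, as below; real MLBS coordinates several storage layers and GPUs, and the block loader here is a placeholder.

```python
# A minimal sketch of decoupling I/O from computation: a dedicated thread
# prefetches the next data blocks into a bounded buffer while the compute
# loop drains it. load_block() and the block names are placeholders.
import threading, queue, time

NUM_BLOCKS, BUFFER_DEPTH = 16, 4
buffer = queue.Queue(maxsize=BUFFER_DEPTH)     # bounded: limits memory pressure

def load_block(i):
    time.sleep(0.05)                           # stand-in for a slow read from the PFS
    return f"block-{i}"

def prefetcher():
    # Runs on its own (mostly idle) thread so the compute loop never blocks on
    # the file system; put() blocks only when the buffer is full.
    for i in range(NUM_BLOCKS):
        buffer.put(load_block(i))
    buffer.put(None)                           # sentinel: no more data

threading.Thread(target=prefetcher, daemon=True).start()

start = time.time()
while (block := buffer.get()) is not None:
    time.sleep(0.05)                           # stand-in for computing on the block
print(f"compute loop finished in {time.time() - start:.2f}s "
      f"(overlapped, vs ~{NUM_BLOCKS * 0.10:.2f}s if I/O were serialized)")
```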
|