51

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms / Um sistema de escalonamento dinâmico e tuning em tempo de execução para plataformas desktop heterogêneas de múltiplos núcleos

Binotto, Alécio Pedro Delazari January 2011 (has links)
A modern personal computer can now be considered a one-node heterogeneous cluster that simultaneously processes several applications' tasks. It can be composed of asymmetric Processing Units (PUs), like the multi-core Central Processing Unit (CPU), the many-core Graphics Processing Units (GPUs), which have become one of the main co-processors contributing to high-performance computing on PCs, and other PUs. In this way, a powerful heterogeneous execution platform is built on a desktop for data-intensive calculations. In the perspective of this thesis, to improve the performance of applications and exploit such heterogeneity, the distribution of the workload over the PUs plays a key role in such systems.
This issue presents challenges, since the execution cost of a task on a PU is non-deterministic and can be affected by a number of parameters not known a priori, like the size of the problem domain and the precision of the solution, among others. Within this scope, this doctoral research introduces a context-aware runtime and performance-tuning system based on a compromise between reducing the execution time of the applications, due to appropriate dynamic scheduling of high-level tasks, and the cost of computing such a schedule on a platform composed of a CPU and GPUs. This approach combines a model for a first scheduling, based on an offline task performance profile acquired in pre-processing, with an online model that keeps track of the tasks' real execution times and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. To this end, a set of heuristics for scheduling tasks over one CPU and one GPU is proposed, together with a generic and efficient scheduling strategy that considers several processing units. The proposed approach is applied in a case study using a CPU-GPU execution platform to compute iterative solvers for Systems of Linear Equations, using a stencil code specially designed to exploit the characteristics of modern GPUs. The solution uses the number of unknowns as the main parameter for the assignment decision. By scheduling tasks to both the CPU and the GPU, a performance gain of 21.77% is achieved in comparison to the static assignment of all tasks to the GPU (as done by current programming models such as OpenCL and CUDA for Nvidia), with a scheduling error of only 0.25% relative to an exhaustive search.
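To make the two-phase scheduling idea concrete, here is a minimal Python sketch, assuming a hypothetical task interface: a first assignment driven by an offline performance profile keyed on the number of unknowns, refined online by measured execution times. All names and profile numbers are illustrative assumptions, not the author's implementation.

```python
import time

# Offline profile: estimated runtime in seconds per processing unit, keyed by
# the main scheduling parameter (here, the number of unknowns).
profile = {
    ("cpu", 1_000): 0.8,   ("gpu", 1_000): 1.2,     # small systems favor the CPU
    ("cpu", 100_000): 9.0, ("gpu", 100_000): 1.5,   # large systems favor the GPU
}
measured = {}  # online model: smoothed observed runtimes

def estimate(pu, unknowns):
    """Prefer online measurements; fall back to the offline profile."""
    return measured.get((pu, unknowns), profile.get((pu, unknowns), float("inf")))

def schedule(unknowns):
    """Pick the processing unit with the lowest estimated cost."""
    return min(("cpu", "gpu"), key=lambda pu: estimate(pu, unknowns))

def run_and_record(task, unknowns):
    """Run one task instance on the chosen PU and update the online model."""
    pu = schedule(unknowns)
    start = time.perf_counter()
    task(pu)  # the task executes on the chosen processing unit
    elapsed = time.perf_counter() - start
    previous = measured.get((pu, unknowns), elapsed)
    measured[(pu, unknowns)] = 0.5 * (previous + elapsed)  # exponential smoothing
    return pu, elapsed

# Example (hypothetical kernel): run_and_record(lambda pu: solve_stencil(pu), 100_000)
```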
52

Análise de prova de carga dinâmica em estacas metálicas do tipo trilho / Analysis of dynamic load tests on steel crane rail piles

Floriano Medeiros de Andrade Lima 21 December 1999 (has links)
This work presents an analysis of dynamic load tests performed on steel crane rail piles (TR-68) driven at the Experimental Foundation Field of USP/São Carlos. The piles consist of segments of 12 m and 3 m, welded according to NBR 8800/86, giving a maximum driven length of 27 m. The objectives of this analysis are: 1) to carry out a comprehensive study of the bearing capacity of crane rail piles; 2) to verify the use of used rails as structural foundation elements; 3) to validate the use of rebound measurements as a method for pile-driving control; 4) to demonstrate the importance of increasing energy levels in dynamic load tests. The rebound measurements were taken with paper and pencil on twenty piles, for increasing energy levels. The energy was applied by a gravity hammer falling from drop heights of 0.20, 0.40, 0.60, 0.80, 1.00, 1.20, and 1.50 m. The pile bearing capacity, determined by extrapolation of the mobilized resistance versus displacement curves, was compared with values obtained by prediction methods from Brazilian practice, as well as with the results of two tests using the Pile Driving Analyzer (PDA) and one static load test, carried out on representative piles.
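As a small worked example of the increasing energy levels mentioned above: the potential energy delivered per blow is E = m g h. The sketch below evaluates it for the listed drop heights; the hammer mass is a purely hypothetical placeholder, as the abstract does not state it.

```python
g = 9.81              # gravitational acceleration, m/s^2
hammer_mass = 1300.0  # kg; hypothetical value for illustration only
for h in (0.20, 0.40, 0.60, 0.80, 1.00, 1.20, 1.50):  # drop heights, m
    energy_kj = hammer_mass * g * h / 1000.0          # E = m*g*h, in kJ
    print(f"h = {h:4.2f} m -> E = {energy_kj:5.1f} kJ")
```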
53

Novo procedimento para a realização de análise capwap no ensaio de carregamento dinâmico em estacas pré-moldadas. / New procedure to perform CAPWAP analysis on dynamic load tests on precast concrete piles.

Daniel Kina Murakami 01 October 2015 (has links)
Since the 1980s, several authors have presented correlations between static load tests and dynamic load tests on piles. A good correlation requires that the tests be well executed and that a failure criterion, such as Davisson's offset limit load, be reached; the time interval between the static and the dynamic test must also be taken into account because of the set-up effect. After the dynamic test, a CAPWAP analysis is performed to evaluate the distribution of shaft friction with depth, the toe resistance, and other soil parameters such as quakes and damping. CAPWAP is a signal matching method: its results are based on the best possible match between a computed pile-top variable, such as the pile-top force, and its measured equivalent. It is relatively easy to show that almost the same pile capacity can be obtained from different input data. This means that, even when the mobilized loads are close, the shape of the simulated static load-displacement curve obtained by CAPWAP, as well as the shaft friction distribution, can differ, even though all analyses present a satisfactory match quality (MQWU). One way to correct the shape of the simulated CAPWAP curve, as well as the shaft friction distribution, is by comparison with a static load test.
Overlaying the two curves, the simulated and the measured one, allows the shaft quake to be determined from the initial stretch of the load-settlement curve of the static load test, which in turn allows a better definition of the shaft friction distribution and the toe resistance. In this context arises the concept of "match quality of settlements" (MQR). When a static load test is not available, this thesis proposes performing a static loading using the self-weight of the pile-driving hammer (CEPM). Two case studies, in which both dynamic and static load tests were available, show that this procedure yields a solution that is better from a physical point of view, that is, consistent with the subsoil conditions and with the load-settlement curve of the static load test, and not merely mathematical, based on the evaluation of the match quality (MQWU).
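The signal-matching idea can be illustrated with a least-squares fit between a measured and a computed pile-top signal. The sketch below uses a toy forward model in place of the CAPWAP wave-equation model, so all parameters and signals here are assumptions for illustration; it also shows how a crude match-quality figure could be computed. Note that, as the abstract points out, different parameter splits can fit nearly as well, which is exactly the non-uniqueness problem.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 0.02, 200)                  # time axis, s
measured = np.exp(-((t - 0.005) / 0.002) ** 2)   # pretend measured force pulse

def computed_force(params):
    """Toy forward model: a shaft pulse plus a delayed, damped toe pulse."""
    shaft, toe, damping = params
    toe_pulse = np.exp(-((t - 0.009) / 0.002) ** 2) * np.exp(-damping * t / t[-1])
    return shaft * np.exp(-((t - 0.005) / 0.002) ** 2) + toe * toe_pulse

def residual(params):
    return computed_force(params) - measured

fit = least_squares(residual, x0=[0.5, 0.5, 1.0], bounds=(0.0, np.inf))
match_quality = np.abs(fit.fun).sum() / np.abs(measured).sum()
print("fitted parameters:", fit.x, "match quality:", match_quality)
```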
54

Exploitation d'infrastructures hétérogènes de calcul distribué pour la simulation Monte-Carlo dans le domaine médical / Exploiting Heterogeneous Distributed Systems for Monte-Carlo Simulations in the Medical Field

Pop, Sorina 21 October 2013 (has links)
Particle-tracking Monte-Carlo applications are easily parallelizable, but efficient parallelization on computing grids is difficult to achieve. Advanced scheduling strategies and parallelization methods are required to cope with failures and resource heterogeneity on distributed architectures. Moreover, the merging of partial simulation results is also a critical step. In this context, the main goal of our work is to propose new strategies for a faster and more reliable execution of Monte-Carlo applications on computing grids. These strategies concern both the computing and merging phases of Monte-Carlo applications and aim at being used in production.
In this thesis, we introduce a parallelization approach based on pilot jobs and on a new dynamic partitioning algorithm. Results obtained on the production European Grid Infrastructure (EGI) using the GATE application show that pilot jobs bring a strong improvement over regular metascheduling and that the proposed dynamic partitioning algorithm solves the load-balancing problem of particle-tracking Monte-Carlo applications executed in parallel on distributed heterogeneous systems. Since all tasks complete almost simultaneously, our method can be considered optimal both in terms of resource usage and makespan. We also propose advanced merging strategies with multiple parallel mergers. Checkpointing is used to enable incremental result merging from partial results and to improve reliability. A model is proposed to analyze the behavior of the complete framework and to help tune its parameters. Experimental results show that the model fits the real makespan with a relative error of at most 10%, that using multiple parallel mergers reduces the makespan by 40% on average, and that checkpointing enables the completion of very long simulations without penalizing the makespan. To evaluate our load-balancing and merging strategies, we implement an end-to-end SimGrid-based simulation of the previously described framework for Monte-Carlo computations on EGI. Simulated and real makespans are consistent, and conclusions drawn in production about the influence of application parameters, such as the checkpointing frequency and the number of mergers, also hold in simulation. These results open the door to better and faster experimentation. To illustrate the outcome of the proposed framework, we present some usage statistics and a few examples of results obtained in production. These results show that our experience in production is significant in terms of users and executions, that the dynamic load balancing can be used extensively in production, and that it significantly improves performance regardless of the variable grid conditions.
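The dynamic partitioning idea can be sketched as a pull model: rather than splitting the N events statically among jobs, each pilot claims small chunks until the collective target is reached, so fast and slow resources finish almost together. The toy version below uses threads in place of grid pilots; all names and numbers are illustrative assumptions, not the GATE/EGI implementation.

```python
import threading
import time

TARGET_EVENTS = 1_000_000   # total events the whole simulation must produce
CHUNK = 50_000              # events claimed per request
done = 0
lock = threading.Lock()

def pilot(speed):
    """A pilot job: keep claiming chunks of events until the target is met."""
    global done
    while True:
        with lock:
            if done >= TARGET_EVENTS:
                return
            done += CHUNK                # claim the next chunk of events
        time.sleep(0.001 / speed)        # stand-in for real Monte-Carlo work

pilots = [threading.Thread(target=pilot, args=(s,)) for s in (0.5, 1.0, 2.0)]
for p in pilots:
    p.start()
for p in pilots:
    p.join()
print("events simulated:", done)         # all pilots stop near-simultaneously
```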
55

Modèles de distribution pour la simulation de trafic multi-agent / Distributed models for multi-agent traffic simulation

Mastio, Matthieu 12 July 2017 (has links)
Nowadays, analysis and prediction of transport network behavior are crucial elements for the implementation of territorial management policies. Computer simulation of road traffic is a powerful tool for testing management strategies before deploying them in an operational context. Simulating city-wide traffic, however, requires significant computing power, exceeding the capacity of a single computer. This thesis studies methods to perform large-scale multi-agent traffic simulations. We propose solutions allowing the distribution of such simulations over a large number of computing cores. One of them distributes the agents directly over the available cores, while the second partitions the environment in which the agents evolve. Graph partitioning methods are studied for this purpose, and we propose a partitioning procedure specially adapted to multi-agent traffic simulation. A dynamic load-balancing algorithm is also developed to optimize the performance of the distributed microscopic simulation. The proposed solutions have been tested on a real network representing the Paris-Saclay area. They are generic and can be applied to most existing simulators. The results show that distributing the agents greatly improves the performance of the macroscopic simulation, whereas partitioning the environment is better suited to the microscopic simulation. Our load-balancing algorithm also significantly improves the efficiency of the environment-based distribution.
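A minimal sketch of environment-based distribution with dynamic load balancing, under the assumption of a toy road set with per-road agent counts: roads migrate one at a time from the heaviest to the lightest partition until no move reduces the imbalance. For simplicity the sketch ignores graph adjacency (the thesis migrates border regions); it illustrates the idea, not the thesis code.

```python
roads = {f"r{i}": (i * 7) % 20 for i in range(40)}   # road -> current agent count
partitions = {0: set(), 1: set(), 2: set()}
for i, r in enumerate(roads):                         # naive initial split
    partitions[i % 3].add(r)

def load(p):
    """Total number of agents handled by partition p."""
    return sum(roads[r] for r in partitions[p])

def rebalance():
    """Move one road at a time from the heaviest to the lightest partition."""
    while True:
        heavy = max(partitions, key=load)
        light = min(partitions, key=load)
        mover = max(partitions[heavy], key=lambda r: roads[r])
        if load(heavy) - load(light) <= 2 * roads[mover]:
            return  # moving would not shrink the imbalance
        partitions[heavy].remove(mover)
        partitions[light].add(mover)

rebalance()
print({p: load(p) for p in partitions})               # near-equal loads
```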
56

Optimalizace uložení ložisek převodovky domíchávače / Optimization of mixer truck gearbox bearing arrangements

Górnisiewicz, Tomáš January 2015 (has links)
This diploma thesis deals with the design and optimization of the bearing arrangements of a mixer truck gearbox. The gearbox is an important part of the mixer truck. Major emphasis is put on the design of the main bearing, which is a key component of the gearbox because of the high load applied to it. Two basic criteria were considered in the main bearing design: dynamic radial load capacity and contact stress at the contact between the rollers and the raceways. The developed two-stage optimization algorithm starts from a standard bearing and allows the design of a new bearing specialized for carrying the output shaft of the mixer truck gearbox.
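The contact-stress criterion can be made concrete with the classical Hertzian line-contact formula, p_max = sqrt(F E* / (pi L R_eff)) with 1/E* = (1 - nu1^2)/E1 + (1 - nu2^2)/E2. The sketch below evaluates it; the roller load and geometry are placeholders, not the actual bearing's data.

```python
import math

def hertz_line_contact_pmax(F, L, R_eff, E1, nu1, E2, nu2):
    """Peak Hertzian pressure for a roller (line contact) on a raceway."""
    E_star = 1.0 / ((1 - nu1**2) / E1 + (1 - nu2**2) / E2)  # effective modulus
    return math.sqrt(F * E_star / (math.pi * L * R_eff))

# Steel on steel, with hypothetical roller load and geometry:
p = hertz_line_contact_pmax(F=5_000, L=0.012, R_eff=0.005,
                            E1=210e9, nu1=0.3, E2=210e9, nu2=0.3)
print(f"p_max ≈ {p / 1e9:.2f} GPa")   # roughly 1.8 GPa for these inputs
```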
57

A simulation workflow to evaluate the performance of dynamic load balancing with over decomposition for iterative parallel applications

Tesser, Rafael Keller January 2018 (has links)
In this thesis we present a novel simulation workflow to evaluate, at low cost, the performance of dynamic load balancing with over-decomposition applied to iterative parallel applications.
Its goals are to perform such an evaluation with minimal application modification and at a low cost in terms of time and resource requirements. Many parallel applications suffer from dynamic (temporal) load imbalance that cannot be treated at the application level. It may be caused by intrinsic characteristics of the application or by external software and hardware factors. As demonstrated in this thesis, such dynamic imbalance can be found even in applications whose codes do not hint at any dynamism. Therefore, we need to rely on runtime dynamic load-balancing mechanisms, such as dynamic load balancing based on over-decomposition. The problem is that evaluating and tuning the performance of such a technique can be costly. This usually entails modifications to the application and a large number of executions to get statistically sound performance measurements with different load-balancing parameter combinations. Moreover, useful and accurate measurements often require big resource allocations on a production cluster. Our simulation workflow, dubbed Simulated Adaptive MPI (SAMPI), employs a combined sequential emulation and trace-replay simulation approach to reduce the cost of such an evaluation. Both sequential emulation and trace replay require a single computer node. Additionally, the trace-replay simulation lasts a small fraction of the real-life parallel execution time of the application. Besides the basic SAMPI simulation, we developed spatial aggregation and application-level rescaling techniques to speed up the emulation process. To demonstrate the real-life performance benefits of dynamic load balancing with over-decomposition, we evaluated the performance gains obtained by employing this technique on an iterative parallel geophysics application called Ondes3D. Dynamic load-balancing support was provided by Adaptive MPI (AMPI). This resulted in up to 36.58% performance improvement on 288 cores of a cluster. This real-life evaluation also illustrates the difficulties found in this process, thus justifying the use of simulation. To implement the SAMPI workflow, we relied on SimGrid's Simulated MPI (SMPI) interface in both emulation and trace-replay modes. To validate our simulator, we compared simulated (SAMPI) and real-life (AMPI) executions of Ondes3D. The simulations presented a load-balance evolution very similar to the real-life one and were also successful in choosing the best load-balancing heuristic for each scenario. Besides the validation, we demonstrate the use of SAMPI for load-balancing parameter exploration and for computational capacity planning. As for the performance of the simulation itself, we roughly estimate that our full workflow can simulate the execution of Ondes3D with 24 different load-balancing parameter combinations in 5 hours for our heavier earthquake scenario and in 3 hours for the lighter one.
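The trace-replay idea can be sketched as re-timing recorded compute bursts for a target platform while modeling communication with a simple latency/bandwidth model. The toy replayer below is an illustration under those assumptions, not SimGrid/SMPI; the trace contents and network figures are invented for the example.

```python
trace = [  # (event, payload): compute seconds on the reference host, or bytes sent
    ("compute", 0.40), ("send", 8_000_000),
    ("compute", 0.35), ("send", 8_000_000),
]

def replay(trace, speedup=1.0, bandwidth=1e9, latency=1e-4):
    """Estimate a makespan by re-timing the trace for a target machine."""
    clock = 0.0
    for event, payload in trace:
        if event == "compute":
            clock += payload / speedup            # rescale compute bursts
        else:
            clock += latency + payload / bandwidth  # simple network model
    return clock

print(f"estimated makespan: {replay(trace, speedup=1.5):.3f} s")
```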
58

Hybridation directe d’une pile à combustible PEM et d’un organe supercapacitif de stockage : étude comparative du vieillissement en cyclage urbain, et gestion optimale de la consommation d’hydrogène / Direct hybridization of a PEM fuel cell and a supercapacitor storage device : Comparative study of aging in urban cycling, and optimal management of hydrogen consumption

Arora, Divyesh 17 September 2019 (has links)
The fuel cell (FC) is poorly adapted to the sudden power variations encountered in transport applications. Hybridizing the FC with a supercapacitor (SC) was therefore studied, since this capacitive storage device can handle power transients. The hybridization is direct/passive, which reduces the volume, mass, and cost of the system. Initially, the feasibility and the impact of the SC size on FC performance were examined numerically. Theoretical investigations show that increasing the size of the SC enhances the smoothing effect introduced by the supercapacitor on the FC current. This results in slow variations and a reduction of both current and voltage amplitudes, a decrease in the fuel cell's RMS current, and therefore in the FC's electrical losses. Hybridization, compared with FC operation alone, also reduces hydrogen overconsumption by nearly 50% under the same operating conditions. These results were validated by experimental tests carried out on a 100 cm2 single FC and on a 3-cell stack.
Subsequently, the durability of the FC system was investigated through long-term tests. These durability tests were conducted on the 100 cm2 single-FC test bench using the urban cycling protocol (FC-DLC), for both the hybridized and the unhybridized FC system, with continuous evaluation of the extent and causes of degradation. They suggest no detrimental impact of hybridization on the durability of the FC: hybridization neither improves nor harms the FC lifetime. For both operating modes, a progressive aging of the gas diffusion layer seems to appear. Then, in order to further reduce hydrogen overconsumption during long-term FC-DLC cycling, different strategies were studied: reducing the minimum gas flow rate imposed by the cycling from 0.2 to 0.05 A cm-2, and reducing the hydrogen overstoichiometry coefficient from 1.2 to 1.1. These changes had no influence on the durability of the hybridized cell and reduced hydrogen overconsumption to 10%. In contrast, the durability of the unhybridized FC was halved when the minimum flows were reduced, and it did not operate at all with the overstoichiometry coefficient lowered to 1.1. The work was then extended to a high-power FC stack (a 1.2 kW Ballard system hybridized with two 165 F SC modules from Maxwell Technologies). Finally, downsizing from a 34 kW FC system to a hybrid source combining a 10 kW FC with a 566.64 F SC was demonstrated, showing sufficient performance for urban and peri-urban transport applications, with 21% hydrogen savings and nearly 50% net equipment cost savings.
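A minimal first-order model illustrates the smoothing effect described above, assuming the FC behaves as a voltage source behind a resistance with the SC directly across the bus; all parameter values are placeholders, not the thesis test bench. Larger capacitances should yield a lower RMS FC current and hence lower ohmic losses (P = R I_rms^2).

```python
import math

def simulate(C, E=0.9, R=0.05, dt=1e-3, T=20.0):
    """Return the RMS FC current for a square-wave load, given SC size C (F)."""
    v, i_fc_sq, n = E, 0.0, 0
    for k in range(int(T / dt)):
        t = k * dt
        i_load = 10.0 if (t % 4.0) < 2.0 else 2.0   # square-wave load, A
        i_fc = (E - v) / R                           # FC current through R
        v += dt * (i_fc - i_load) / C                # SC absorbs the difference
        i_fc_sq += i_fc ** 2
        n += 1
    return math.sqrt(i_fc_sq / n)

for C in (1.0, 10.0, 100.0):                         # supercapacitor sizes, F
    print(f"C = {C:6.1f} F -> I_fc,rms = {simulate(C):.2f} A")
```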
59

Dynamic amplification for moving vehicle loads on buried pipes : Evaluation of field-tests

Smagina, Zana January 2001 (has links)
No description available.
60

A Graphics Processing Unit Based Discontinuous Galerkin Wave Equation Solver with hp-Adaptivity and Load Balancing

Tousignant, Guillaume 13 January 2023 (has links)
In computational fluid dynamics, we often need to solve complex problems with high precision and efficiency. We propose a three-pronged approach to attain this goal. First, we use the discontinuous Galerkin spectral element method (DG-SEM) for its high accuracy. Second, we use graphics processing units (GPUs) to perform our computations to exploit available parallel computing power. Third, we implement a parallel adaptive mesh refinement (AMR) algorithm to efficiently use our computing power where it is most needed. We present a GPU DG-SEM solver with AMR and dynamic load balancing for the 2D wave equation. The DG-SEM is a higher-order method that splits a domain into elements and represents the solution within these elements as a truncated series of orthogonal polynomials. This approach combines the geometric flexibility of finite-element methods with the exponential convergence of spectral methods. GPUs provide a massively parallel architecture, achieving a higher throughput than traditional CPUs. They are relatively new as a platform in the scientific community; therefore, most algorithms need to be adapted to that new architecture. We perform most of our computations in parallel on multiple GPUs. AMR selectively refines elements in the domain where the error is estimated to be higher than a prescribed tolerance, via two mechanisms: p-refinement increases the polynomial order within elements, and h-refinement splits elements into several smaller ones. This provides higher accuracy in important flow regions and increases the capability to model complex flows, while saving computing power in other parts of the domain. We use the mortar element method to retain the exponential convergence of high-order methods at the non-conforming interfaces created by AMR. We implement a parallel dynamic load balancing algorithm to even out the load imbalance caused by solving problems in parallel over multiple GPUs with AMR. We implement a space-filling curve-based repartitioning algorithm which ensures good locality and small interfaces. While the intense calculations of the high-order approach suit the GPU architecture, programming the highly dynamic adaptive algorithm on GPUs is the most challenging aspect of this work. The resulting solver is tested on up to 64 GPUs on HPC platforms, where it shows good strong and weak scaling characteristics. Several example problems of increasing complexity are solved, showing a reduction in computation time of up to 3× on GPUs vs CPUs, depending on the loading of the GPUs and other user-defined parameter choices. AMR is shown to improve computation times by an order of magnitude or more.
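The space-filling-curve repartitioning can be sketched as follows: order the elements along a Morton (Z-order) curve, then cut the ordered list into contiguous chunks of roughly equal weight; contiguity along the curve keeps partitions local and interfaces small. A plain-Python sketch under these assumptions (the thesis implementation targets GPUs; element coordinates and weights here are invented):

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of integer coordinates x and y (Z-order index)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def repartition(elements, weights, nparts):
    """elements: list of (x, y) integer coords; weights: per-element cost."""
    order = sorted(range(len(elements)),
                   key=lambda e: morton_index(*elements[e]))
    target = sum(weights) / nparts              # ideal weight per partition
    parts, current, acc = [[] for _ in range(nparts)], 0, 0.0
    for e in order:                             # greedy cut along the curve
        if acc >= target and current < nparts - 1:
            current, acc = current + 1, 0.0
        parts[current].append(e)
        acc += weights[e]
    return parts

elems = [(x, y) for x in range(8) for y in range(8)]
w = [1.0 + (x + y) % 3 for x, y in elems]       # uneven work per element
print([sum(w[e] for e in p) for p in repartition(elems, w, 4)])
```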
