61
Acceleration of Compressible Flow Simulations with Edge using Implicit Time Stepping. Otero, Evelyn, January 2012 (has links)
Computational fluid dynamics (CFD) has become a significant tool routinely used in design and optimization in the aerospace industry. Typical flows may be characterized by high-speed and compressible flow features and, in many cases, by massive flow separation and unsteadiness. Accurate and efficient numerical solution of time-dependent problems is hence required, and the efficiency of the standard dual-time stepping methods used for unsteady flows in many CFD codes has been found inadequate for large-scale industrial problems. This has motivated the present work, in which a major effort is made to replace the explicit relaxation methods with implicit time integration schemes. The CFD flow solver considered in this work is Edge, a node-based solver for unstructured grids based on a dual, edge-based formulation. Edge is the Swedish national CFD tool for computing compressible flow, used at the Swedish aircraft manufacturer SAAB and developed at FOI, lately in collaboration with external national and international partners. The work is initially devoted to the implementation of an implicit Lower-Upper Symmetric Gauss-Seidel (LU-SGS) type of relaxation in Edge with the purpose of speeding up the convergence to steady state. The convergence of LU-SGS was first accelerated by basing the implicit operator on a flux splitting method of matrix dissipation type, the principal motivation being an increase of the diagonal dominance of the system matrix. The code was then optimized by means of the performance tools Intel VTune and CrayPAT, improving the run time. It was found that the ordering of the unknowns significantly influences the convergence, so different ordering techniques have been investigated. Finding the optimal ordering method is a very hard problem and the results obtained are mostly illustrative. Finally, to improve convergence speed on the stretched computational grids used for boundary layers, LU-SGS has been combined with the line-implicit method. / QC 20120626
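As a concrete illustration of the relaxation at the heart of LU-SGS, the sketch below performs symmetric (forward plus backward) Gauss-Seidel sweeps on a small dense system. It is a generic textbook version written for this summary, not Edge's implementation, which works matrix-free on the sparse implicit operator of the dual grid; the loop order over the unknowns corresponds to the ordering whose influence on convergence is discussed above.

```python
import numpy as np

def symmetric_gauss_seidel(A, b, x, sweeps=1):
    """Symmetric (forward + backward) Gauss-Seidel sweeps on A x = b.

    Dense toy version for illustration only; an LU-SGS flow solver applies the
    same idea matrix-free to the sparse implicit operator, and the traversal
    order over the unknowns is exactly the 'ordering' studied in the thesis.
    """
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):                       # forward (lower-triangular) sweep
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        for i in reversed(range(n)):             # backward (upper-triangular) sweep
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

# tiny usage example on a diagonally dominant system
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = symmetric_gauss_seidel(A, b, np.zeros(3), sweeps=20)
```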
62
Efficient Compiler and Runtime Support for Serializability and Strong Semantics on Commodity Hardware. Sengupta, Aritra, 07 September 2017 (has links)
No description available.
63
Evaluation of natural materials in Sustainable Buildings : A potential solution to the European 2050 long-term strategy. de las Heras Reverte, Víctor, January 2021 (has links)
Today, buildings account for 40% of the total energy demand in the EU and are responsible for 36% of its GHG emissions. For this reason, and due to the delicate climate situation that planet Earth is experiencing, solutions are being sought to make the building sector more sustainable. In the current project, the use of natural materials has been chosen as a solution in line with the EU 2050 long-term strategy. This research broadens the knowledge on sustainable building with natural materials as an alternative to conventional construction. To this end, an extensive state-of-the-art review has first been carried out to gather information and identify research gaps on natural building materials and energy efficiency, establishing the suitability of natural construction materials. Special emphasis has been put on straw bale construction and rammed earth construction, which have been studied individually. In addition, geometrically identical building models of both building techniques have been developed and simulated in Stockholm and Valencia in order to see how they would perform in different climates. A total energy demand for the straw-bale building of 140.22 kWh/(m2·year) has been obtained in Stockholm and 37.05 kWh/(m2·year) in Valencia. For the rammed earth building, a total demand of 301.82 kWh/(m2·year) has been obtained in Stockholm and 78.66 kWh/(m2·year) in Valencia. Once passive measures are applied in the different models, a reduction in demand for the straw bale building of 77.8% and 36.3% has been achieved for Stockholm and Valencia, respectively. For the rammed earth building, in contrast, the demand has been reduced by 86.3% in Stockholm and 73.9% in Valencia. Heat recovery ventilation and a high insulation level have been identified as imperative needs in Stockholm, in contrast to Valencia. Other improvement strategies such as window substitution, air permeability improvement, or natural ventilation for cooling have also been implemented. Apart from that, better performance of the straw-bale building has been identified for both climates. Additionally, the influence of thermal inertia has been found to be of limited significance in terms of annual demand in the simulated climates.
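To make the quoted percentages concrete, the post-retrofit demand implied by a relative reduction r follows from the simple relation below; the worked value is derived here from the figures above and is not itself reported in the abstract.

```latex
D_{\mathrm{after}} = D_{\mathrm{before}}\,(1-r)
  = 140.22\ \tfrac{\mathrm{kWh}}{\mathrm{m}^2\cdot\mathrm{yr}} \times (1-0.778)
  \approx 31.1\ \tfrac{\mathrm{kWh}}{\mathrm{m}^2\cdot\mathrm{yr}}
  \quad \text{(straw-bale model, Stockholm)}
```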
64
Effiziente parallele Sortier- und Datenumverteilungsverfahren für Partikelsimulationen auf Parallelrechnern mit verteiltem Speicher / Efficient Parallel Sorting and Data Redistribution Methods for Particle Codes on Distributed Memory Systems. Hofmann, Michael, 16 April 2012 (has links) (PDF)
Particle simulations represent a class of data- and compute-intensive simulation applications that are used in various areas of science and industrial research. The high computational cost of the solution methods employed and the large amounts of data required to model realistic problems make the use of parallel computing indispensable for them. Distributed-memory parallel computers represent a widespread architecture in which a large number of compute nodes working in parallel exchange data with one another over an interconnection network. Computing the interactions between particles often constitutes the main cost of a particle simulation and is carried out with fast solution methods such as the Barnes-Hut algorithm or the fast multipole method. Efficient parallel implementations of these algorithms require the particles to be sorted according to their spatial positions. The sorting is necessary both to obtain efficient access to the particle data and as part of optimizations that increase the locality of memory accesses, minimize communication, and improve the load balancing of parallel computations.
This dissertation is concerned with the development of an efficient parallel sorting method and of the communication operations for data redistribution that it requires in particle simulations. To this end, a large number of existing parallel sorting methods for distributed memory are analyzed and compared against the requirements arising from particle simulation applications. Particular challenges concern the partitioning of the particle data across distributed memory, the weighting of the data to be sorted for improved load balancing, the handling of duplicate key values, and the availability and use of memory-efficient communication operations. To meet these requirements, a new parallel sorting method is developed and integrated into the application codes under consideration. In addition, a new in-place algorithm for the MPI_Alltoallv communication operation is presented, with which the memory consumption of the data redistribution required within the parallel sort can be reduced significantly. The behavior of all developed methods is examined both in isolation and in practice-relevant use within several application programs, using a variety of parallel computers, including highly scalable ones.
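To make the combination of sorting and MPI_Alltoallv-based redistribution concrete, here is a minimal out-of-place sketch using mpi4py and NumPy. It is an illustration written for this summary, not code from the dissertation, whose contribution is precisely an in-place variant that avoids the separate receive buffer allocated in step 4.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# local particle keys (e.g., values along a space-filling curve), one per particle
keys = np.random.rand(1000)

# 1. pick a destination rank per particle (here: uniform buckets as stand-in splitters)
dest = np.minimum((keys * size).astype(np.intc), size - 1)

# 2. sort locally by destination so each rank's outgoing data is contiguous
order = np.argsort(dest, kind='stable')
keys_sorted = keys[order]

# 3. exchange send counts and build displacements
sendcounts = np.bincount(dest[order], minlength=size).astype(np.intc)
recvcounts = np.empty(size, dtype=np.intc)
comm.Alltoall(sendcounts, recvcounts)
sdispls = np.insert(np.cumsum(sendcounts)[:-1], 0, 0).astype(np.intc)
rdispls = np.insert(np.cumsum(recvcounts)[:-1], 0, 0).astype(np.intc)

# 4. redistribute the keys with MPI_Alltoallv (out-of-place: extra receive buffer)
recvbuf = np.empty(recvcounts.sum(), dtype=np.float64)
comm.Alltoallv([keys_sorted, sendcounts, sdispls, MPI.DOUBLE],
               [recvbuf, recvcounts, rdispls, MPI.DOUBLE])

# 5. a final local sort completes the global sort by key
recvbuf.sort()
```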
65
Analyse numérique et expérimentale d’un doublet de rotors contrarotatifs caréné au point fixe / Experimental and numerical analysis of a shrouded contrarotating coaxial rotor in hover. Huo, Chao, 26 March 2012 (has links)
Cette étude se propose d’analyser le comportement du double rotor contra-rotatif caréné dans le cadre des échelles réduites des microdrones, pour exploiter le potentiel d’amélioration des performances stationnaires des rotors libres. La demande d’une performance propulsive de haut niveau, alors que les échelles sont très réduites, constitue un véritable défi scientifique. De façon générale, par rapport au rotor libre, l’ajout de la carène permet de piloter la contraction de l’écoulement et offre un potentiel de poussée de carène. La tuyère, par sa condition d’adaptation, pilote le débit entrant à puissance donnée. L’augmentation du débit massique, par comparaison au système de rotor libre, amplifie la poussée à travers la dépression distribuée sur toute la surface de captation. Pour comprendre les lois de fonctionnement d’un système propulsif caréné, il a d’abord été proposé un modèle théorique simplifié basé sur une extension de la théorie de Froude pour les rotors libres: le système rotor est assimilé à un disque actuateur, générateur de débit dans une conduite à section variable. Une simulation Navier Stokes 2D axisymétrique a permis d’optimiser les paramètres de forme du carénage. Les simulations ont confirmé l’influence déterminante des sections d’entrée et de sortie, et relativisé l’impact des formes possibles, pourvu que les variations de sections limitent le décollement de la couche limite. Après conception d’un banc d’essai utilisant un doublet de rotor coaxial placé dans cette carène optimisée, l’étude expérimentale complète et confirme les performances globales du système et qualifie l’écoulement méridien. Enfin, une simulation 3D instationnaire a été entreprise pour compléter l’analyse de l’écoulement autour des rotors. / This study aims to analyze the behavior of a shrouded, contrarotating coaxial rotor at the reduced scale of MAVs, in order to exploit its potential to improve the steady performance of free rotors. Achieving high hover performance at the low operating Reynolds numbers involved is therefore a scientific challenge. Generally, compared with a free rotor, the addition of the shroud decreases the flow contraction and gives the potential to generate extra thrust. A suitable nozzle can control the mass flow for a given power. The increased mass flow, compared with a free rotor, amplifies the thrust through the low pressure formed over the air intake. To understand the principles of a shrouded propulsion system, a simplified theoretical model was first proposed through an extension of Froude theory for free rotors: the rotor pair is initially treated as an actuator disk generating flow through the varying cross-sections of the shroud passage. A 2D axisymmetric simulation, which accounts for viscous effects of the axial flow within the actual shroud profile, confirmed the effects of all defined geometrical parameters. It further demonstrated that, as long as the cross-section variations remain in the non-stalling region, the shroud shape and inlet shape do not have a significant impact on performance. The experimental study, carried out with the coaxial rotor, confirmed the overall performance and characterized the flow field through the shroud. Meanwhile, a 3D unsteady simulation, developed to better model the actual coaxial rotor in counter rotation, was validated as correctly capturing the steady performance and was applied to complement the analysis of the flow around the coaxial rotor.
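The extension of Froude (actuator-disk) theory mentioned above can be summarized, for the simplest hover case, by the standard momentum-theory relations below for a shrouded rotor with exit area A_e. These are textbook relations given here for orientation only, not the variable-section model actually developed in the thesis.

```latex
% Shrouded rotor in hover: the wake leaves the duct at exit area A_e with
% velocity w and does not contract further.
\dot{m} = \rho A_e w, \qquad
T = \dot{m}\, w = \rho A_e w^{2}, \qquad
P_{\mathrm{ideal}} = \tfrac{1}{2}\dot{m}\, w^{2}
                   = \frac{T^{3/2}}{2\sqrt{\rho A_e}} .
% For an open rotor of disk area A, Froude theory gives
% P_ideal = T^{3/2} / \sqrt{2 \rho A}; hence, for A_e = A, the shrouded rotor
% ideally needs only 1/\sqrt{2} of the power at equal thrust, or produces
% about 2^{1/3} (roughly 26%) more thrust at equal power.
```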
66
Estratégia Ativa no Mercado Acionário Brasileiro: otimização ou aposta na winners? / Strategy Active in the Brazilian stock market: investment in optimization or winners? Cauã Márcio dos Reis, 17 September 2010 (has links)
Este artigo analisa gráfica e quantitativamente a performance, mensurada sob várias métricas de risco-retorno, de estratégias ativas disponíveis para um investidor brasileiro que opte por compor carteiras dinâmicas de ações transacionadas na Bolsa de Valores de São Paulo. As estratégias adotadas se baseiam: (i) em "apostar" em ações que se mostraram vencedoras em Sharpe e Treynor no ano anterior, compondo carteiras equal-weighted, ou (ii) em definir os pesos a partir da otimização destas duas métricas de performance, as mais usuais no mercado financeiro. Em suma, em períodos de boom econômico-financeiro, ou seja, até 2007 e durante 2009, ao lidar com o trade-off entre o uso de técnicas mais sofisticadas de composição de carteira, o investidor brasileiro teria obtido um retorno nominal acumulado bastante superior quando do uso da otimização do índice de Sharpe (acima de 4000% entre julho de 1995 e dezembro de 2007, por exemplo), vis-à-vis as demais estratégias e mesmo quando comparado aos maiores fundos de investimento em ações ou ainda aos benchmarks de mercado e setoriais, os quais não ultrapassaram 2500%. Em termos de performance risco-retorno, as estratégias de aposta nas vencedoras em Sharpe ou Treynor se mostram as mais adequadas. Em períodos de crise financeira, analisando sob todas as métricas de ganho ou performance, o investidor deveria ter optado por uma postura passiva. / This paper analyzes, graphically and quantitatively, the risk-return performance, measured under various metrics, of active strategies available to a Brazilian investor who chooses to compose dynamic portfolios with stocks traded on BOVESPA, the Bolsa de Valores de São Paulo. The strategies used here are based on: (i) "betting" on securities that were Sharpe and Treynor winners in the previous year, composing an equal-weighted portfolio, or (ii) optimizing these two widely used performance metrics and defining the weights accordingly. In summary, in periods of economic boom (until 2007 and during 2009), when dealing with the trade-off involved in using more sophisticated portfolio composition techniques, the Brazilian investor would have obtained a much higher accumulated nominal return when using Sharpe index optimization (over 4000% between July 1995 and December 2007, for example) vis-à-vis the other strategies, and even when compared to the largest stock mutual funds or to the market and industry benchmarks, which did not exceed 2500%. In terms of risk-return performance, the betting strategies based on Sharpe and Treynor winners prove the most appropriate. In periods of financial crisis, under any performance measure, an investor should have opted for a passive strategy.
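As an illustration of weight-defining strategy (ii), the sketch below computes maximum-Sharpe weights for a long-only portfolio with SciPy. The function names, the long-only constraint, and the risk-free-rate handling are assumptions made for this example and are not taken from the paper, which does not disclose its exact optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def sharpe(weights, mean_returns, cov, rf=0.0):
    """Sharpe ratio of a portfolio; Treynor would divide excess return by beta instead."""
    excess = weights @ mean_returns - rf
    vol = np.sqrt(weights @ cov @ weights)
    return excess / vol

def max_sharpe_weights(mean_returns, cov, rf=0.0):
    """Weights maximizing the in-sample Sharpe ratio, long-only and fully invested."""
    n = len(mean_returns)
    x0 = np.full(n, 1.0 / n)                      # start from the equal-weighted portfolio
    bounds = [(0.0, 1.0)] * n                     # no short selling
    cons = [{'type': 'eq', 'fun': lambda w: w.sum() - 1.0}]
    res = minimize(lambda w: -sharpe(w, mean_returns, cov, rf), x0,
                   bounds=bounds, constraints=cons)
    return res.x

# toy example with three assets
mu = np.array([0.15, 0.10, 0.08])
cov = np.diag([0.05, 0.03, 0.02])
print(max_sharpe_weights(mu, cov, rf=0.06))
```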
67
High Performance by Exploiting Information Locality through Reverse Computing / Hautes Performances en Exploitant la Localité de l'Information via le Calcul Réversible. Bahi, Mouad, 21 December 2011 (has links)
Les trois principales ressources du calcul sont le temps, l'espace et l'énergie, les minimiser constitue un des défis les plus importants de la recherche de la performance des processeurs. Dans cette thèse, nous nous intéressons à un quatrième facteur qui est l'information. L'information a un impact direct sur ces trois facteurs, et nous montrons comment elle contribue ainsi à l'optimisation des performances. Landauer a montré que c’est la destruction - logique - d’information qui coûte de l’énergie, ceci est un résultat fondamental de la thermodynamique en physique. Sous cette hypothèse, un calcul ne consommant pas d’énergie est donc un calcul qui ne détruit pas d’information. On peut toujours retrouver les valeurs d’origine et intermédiaires à tout moment du calcul, le calcul est réversible. L'information peut être portée non seulement par une donnée mais aussi par le processus et les données d’entrée qui la génèrent. Quand un calcul est réversible, on peut aussi retrouver une information au moyen de données déjà calculées et du calcul inverse. Donc, le calcul réversible améliore la localité de l'information. La thèse développe ces idées dans deux directions. Dans la première partie, partant d'un calcul, donné sous forme de DAG (graphe dirigé acyclique), nous définissons la notion de « garbage » comme étant la taille mémoire – le nombre de registres – supplémentaire nécessaire pour rendre ce calcul réversible. Nous proposons un allocateur réversible de registres, et nous montrons empiriquement que le garbage est au maximum la moitié du nombre de nœuds du graphe. La deuxième partie consiste à appliquer cette approche au compromis entre le recalcul (direct ou inverse) et le stockage dans le contexte des supercalculateurs que sont les récents coprocesseurs vectoriels et parallèles, cartes graphiques (GPU, Graphics Processing Unit), processeur Cell d’IBM, etc., où le fossé entre temps d’accès à la mémoire et temps de calcul ne fait que s'aggraver. Nous montrons comment le recalcul en général, et le recalcul inverse en particulier, permettent de minimiser la demande en registres et par suite la pression sur la mémoire. Cette démarche conduit également à augmenter significativement le parallélisme d’instructions (Cell BE), et le parallélisme de threads sur un multicore avec mémoire et/ou banc de registres partagés (GPU), dans lequel le nombre de threads dépend de manière importante du nombre de registres utilisés par un thread. Ainsi, l’ajout d’instructions du fait du calcul inverse pour la rematérialisation de certaines variables est largement compensé par le gain en parallélisme. Nos expérimentations sur le code de Lattice QCD porté sur un GPU Nvidia montrent un gain de performances atteignant 11%. / The main resources for computation are time, space and energy. Reducing them is the main challenge in the field of processor performance. In this thesis, we are interested in a fourth factor, which is information. Information has an important and direct impact on these three resources. We show how it contributes to performance optimization. Landauer has suggested that, independently of the hardware on which a computation runs, it is the erasure of information that dissipates energy. This is a fundamental result of thermodynamics in physics. Therefore, under this hypothesis, only reversible computations, where no information is ever lost, are likely to be thermodynamically adiabatic and do not dissipate power. Reversibility means that data can always be retrieved from any point of the program.
Information may be carried not only by the data but also by the process and input data that generate it. When a computation is reversible, information can also be retrieved from other already computed data and reverse computation. Hence reversible computing improves information locality. This thesis develops these ideas in two directions. In the first part, we address the issue of making a computation DAG (directed acyclic graph) reversible in terms of spatial complexity. We define energetic garbage as the additional number of registers needed for the reversible computation with respect to the original computation. We propose a reversible register allocator and we show empirically that the garbage size is never more than 50% of the DAG size. In the second part, we apply this approach to the trade-off between recomputing (direct or reverse) and storage in the context of supercomputers such as the recent vector and parallel coprocessors, graphics processing units (GPUs), the IBM Cell processor, etc., where the gap between processor cycle time and memory access time is increasing. We show that recomputing in general and reverse computing in particular help reduce register requirements and memory pressure. This approach of reverse rematerialization also contributes to increasing instruction-level parallelism (Cell) and thread-level parallelism in multicore processors with a shared register file or shared memory (GPU). On the latter architecture, the number of registers required by the kernel limits the number of running threads and affects performance. Reverse rematerialization generates additional instructions, but their cost can be hidden by the parallelism gain. Experiments on the highly memory-demanding Lattice QCD simulation code on an Nvidia GPU show a performance gain of up to 11%.
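A toy sketch of the reverse-rematerialization idea described above (an illustration written for this summary, not code from the thesis): rather than keeping a value live in a register across a long region, it is recomputed later from an already available value by the inverse operation.

```python
def forward(x, c):
    # b overwrites the register that held x; logically, x is "destroyed"
    return x + c

def inverse(b, c):
    # x is rematerialized by reverse computation instead of having been stored
    return b - c

x, c = 7.0, 3.0
b = forward(x, c)
# ... long region of code that only needs b; x's register can be reused here ...
x_again = inverse(b, c)   # no extra ("garbage") register was held for x meanwhile
assert x_again == x
```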
68
Effiziente parallele Sortier- und Datenumverteilungsverfahren für Partikelsimulationen auf Parallelrechnern mit verteiltem Speicher. Hofmann, Michael, 09 March 2012 (has links)
Particle simulations represent a class of data- and compute-intensive simulation applications that are used in various areas of science and industrial research. The high computational cost of the solution methods employed and the large amounts of data required to model realistic problems make the use of parallel computing indispensable for them. Distributed-memory parallel computers represent a widespread architecture in which a large number of compute nodes working in parallel exchange data with one another over an interconnection network. Computing the interactions between particles often constitutes the main cost of a particle simulation and is carried out with fast solution methods such as the Barnes-Hut algorithm or the fast multipole method. Efficient parallel implementations of these algorithms require the particles to be sorted according to their spatial positions. The sorting is necessary both to obtain efficient access to the particle data and as part of optimizations that increase the locality of memory accesses, minimize communication, and improve the load balancing of parallel computations.
This dissertation is concerned with the development of an efficient parallel sorting method and of the communication operations for data redistribution that it requires in particle simulations. To this end, a large number of existing parallel sorting methods for distributed memory are analyzed and compared against the requirements arising from particle simulation applications. Particular challenges concern the partitioning of the particle data across distributed memory, the weighting of the data to be sorted for improved load balancing, the handling of duplicate key values, and the availability and use of memory-efficient communication operations. To meet these requirements, a new parallel sorting method is developed and integrated into the application codes under consideration. In addition, a new in-place algorithm for the MPI_Alltoallv communication operation is presented, with which the memory consumption of the data redistribution required within the parallel sort can be reduced significantly. The behavior of all developed methods is examined both in isolation and in practice-relevant use within several application programs, using a variety of parallel computers, including highly scalable ones.
69
Energy retrofit of an office building in Stockholm: feasibility analysis of an EWIS / Energieffektivisering av en kontorsbyggnad i Stockholm genom tilläggsisolering – en fallstudie. Lapioli, Simone, January 2016 (has links)
The energy retrofit of existing buildings has always been a challenging task to accomplish. The example of the Swecohuset building proves how an integrated design approach combining architectural and energy aspects, together with the use of well-known and efficient technologies, is key to achieving the energy-saving goal. The first part of this work describes the Swecohuset retrofit process, along with the reasons behind the choices that have led to the current result: a reduction by two thirds of the energy need for space conditioning. In the second part, after a brief focus on the passive aspects that characterize the current energy performance of the building, a feasibility analysis of an EWIS (external wall insulation system) is carried out by treating its interaction with a complex system as an optimization problem, with the main purpose of understanding the basis of the BPO approach and exploring further building potential. / SIRen
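For orientation, the elementary relation behind an external wall insulation retrofit is the series-resistance formula below, showing how an added layer of thickness d and conductivity λ lowers the wall U-value. This is a standard textbook expression, and the numerical example is illustrative rather than a result from the thesis.

```latex
U_{\mathrm{new}} = \left(\frac{1}{U_{\mathrm{old}}} + \frac{d}{\lambda}\right)^{-1},
\qquad \text{e.g. } U_{\mathrm{old}} = 1.0\ \tfrac{\mathrm{W}}{\mathrm{m}^2\mathrm{K}},\;
d = 0.10\ \mathrm{m},\; \lambda = 0.04\ \tfrac{\mathrm{W}}{\mathrm{m\,K}}
\;\Rightarrow\; U_{\mathrm{new}} \approx 0.29\ \tfrac{\mathrm{W}}{\mathrm{m}^2\mathrm{K}} .
```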
70
Scalable critical-path analysis and optimization guidance for hybrid MPI-CUDA applications. Schmitt, Felix; Dietrich, Robert; Juckeland, Guido, 29 October 2019
The use of accelerators in heterogeneous systems is an established approach in designing petascale applications. Today, Compute Unified Device Architecture (CUDA) offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both the CPU and the GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing hybrid MPI-CUDA programs for properties based on wait states, such as the critical path, a metric proven to identify application bottlenecks effectively. We developed a tool to construct a dependency graph based on an execution trace and the inherent dependencies of the programming models CUDA and Message Passing Interface (MPI). Thereafter, it detects wait states and attributes blame to the responsible activities. Combined with the property of being on the critical path, this allows us to identify the activities that are most viable for optimization. To evaluate the global impact of optimizations of critical activities, we predict the program execution using a graph-based performance projection. The developed approach has been demonstrated with suitable examples to be both scalable and correct. Furthermore, we establish a new categorization of CUDA inefficiency patterns ensuing from the dependencies between CUDA activities.
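The core graph computation can be illustrated by the following simplified sketch (an illustration written for this summary, not the tool's code): it finds the time-weighted longest path through an activity DAG, whereas the actual tool additionally builds the graph from MPI/CUDA trace events and attributes blame for wait states.

```python
from typing import Dict, List, Optional, Tuple

def critical_path(durations: Dict[str, float],
                  deps: Dict[str, List[str]]) -> Tuple[List[str], float]:
    """Longest (time-weighted) path through a DAG of activities.

    durations maps each activity to its run time; deps maps an activity to the
    activities it must wait for. Illustrative only.
    """
    order: List[str] = []
    seen = set()

    def visit(a: str) -> None:               # depth-first topological ordering
        if a in seen:
            return
        seen.add(a)
        for p in deps.get(a, []):
            visit(p)
        order.append(a)

    for a in durations:
        visit(a)

    finish: Dict[str, float] = {}
    pred: Dict[str, Optional[str]] = {}
    for a in order:
        start = max((finish[p] for p in deps.get(a, [])), default=0.0)
        finish[a] = start + durations[a]
        pred[a] = max(deps.get(a, []), key=lambda p: finish[p], default=None)

    end = max(finish, key=finish.get)         # activity that finishes last
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return list(reversed(path)), finish[end]
```

For example, critical_path({'a': 2, 'b': 3, 'c': 1}, {'c': ['a', 'b']}) returns (['b', 'c'], 4.0): activity b dominates the path, so an analysis of this kind would attribute the wait before c to it.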