361 |
Finite Element Analysis and Genetic Algorithm Optimization Design for the Actuator Placement on a Large Adaptive Structure / Sheng, Lizeng, 29 December 2004
The dissertation focuses on one of the major research needs in the area of adaptive/intelligent/smart structures: the development and application of finite element analysis and genetic algorithms for the optimal design of large-scale adaptive structures. We first review some basic concepts in the finite element method and genetic algorithms, along with the research on smart structures. We then propose a solution methodology for a critical problem in the design of the next generation of large-scale adaptive structures: optimal placement of a large number of actuators to control thermal deformations. After briefly reviewing the three most frequently used general approaches to derive a finite element formulation, the dissertation presents techniques associated with general shell finite element analysis using flat triangular laminated composite elements. The element used here has three nodes and eighteen degrees of freedom and is obtained by combining a triangular membrane element and a triangular plate bending element. The element includes the coupling effect between membrane deformation and bending deformation. The membrane element is derived from the linear strain triangular element using Cook's transformation. The discrete Kirchhoff triangular (DKT) element is used as the plate bending element; for completeness, a full derivation of the DKT element is presented. A geometrically nonlinear finite element formulation is derived for the analysis of adaptive structures under combined thermal and electrical loads. Next, we solve the optimization problem of placing a large number of piezoelectric actuators to control thermal distortions in a large mirror in the presence of four different thermal loads. We then extend this to a multi-objective optimization problem of determining a single set of piezoelectric actuator locations that can be used to control the deformation in the same mirror under the action of any one of the four thermal loads. 
A series of genetic algorithms, GA Versions 1, 2 and 3, were developed to find the optimal locations of piezoelectric actuators from on the order of 10^21 to 10^56 candidate placements. Introducing a variable-population approach, we improve the flexibility of the selection operation in genetic algorithms. Incorporating mutation and hill climbing into micro-genetic algorithms, we develop a more efficient genetic algorithm. Through extensive numerical experiments, we find that the design search space for the optimal placement of a large number of actuators is highly multi-modal, and that the most distinctive property of genetic algorithms is their robustness: they give results that are random but show only slight variability. Genetic algorithms can produce an adequate solution using a limited number of evaluations; to obtain the highest-quality solution, multiple runs with different random seeds are necessary. The investigation time can be significantly reduced using very coarse-grained parallel computing. Overall, the methodology of finite element analysis combined with genetic algorithm optimization provides a robust solution approach for the challenging problem of optimally placing a large number of actuators in the design of the next generation of adaptive structures. / Ph. D.
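The combination of elitism, mutation and hill climbing described above can be illustrated with a minimal micro-genetic-algorithm sketch. The fitness function, site count and GA parameters below are invented stand-ins for the finite element thermal-distortion analysis, not the dissertation's models:

```python
import random

random.seed(0)
N_SITES = 20                       # candidate actuator locations (toy scale)
TARGET = frozenset({0, 3, 5, 9})   # pretend-optimal sites for the toy fitness

def fitness(placement):
    """Toy stand-in for the FE thermal-distortion analysis: score a
    candidate set of actuator sites by overlap with a known-good set."""
    return len(placement & TARGET)

def mutate(placement, rate=0.2):
    """Randomly relocate some actuators to other candidate sites."""
    out = set(placement)
    for site in list(out):
        if random.random() < rate:
            out.discard(site)
            out.add(random.randrange(N_SITES))
    return frozenset(out)

def hill_climb(placement, tries=10):
    """Keep small random changes that improve fitness."""
    best = placement
    for _ in range(tries):
        cand = mutate(best, rate=0.1)
        if fitness(cand) > fitness(best):
            best = cand
    return best

def micro_ga(pop_size=5, generations=40):
    pop = [frozenset(random.sample(range(N_SITES), 4)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        # micro-GA restart around the elite, plus mutation and hill
        # climbing (the kind of additions that made GA Version 3 efficient)
        pop = [elite] + [hill_climb(mutate(elite)) for _ in range(pop_size - 1)]
    return max(pop, key=fitness)

best = micro_ga()
print(fitness(best))
```

The tiny population with elitist restarts mirrors the micro-GA idea: the search budget goes into intensifying around the current best rather than maintaining a large population.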
|
362 |
Development of Approximations for HSCT Wing Bending Material Weight using Response Surface Methodology / Balabanov, Vladimir Olegovich, 01 October 1997
A procedure for generating a customized weight function for wing bending material weight of a High Speed Civil Transport (HSCT) is described. The weight function is based on HSCT configuration parameters. A response surface methodology is used to fit a quadratic polynomial to data gathered from a large number of structural optimizations. To reduce the time of performing a large number of structural optimizations, coarse-grained parallelization with a master-slave processor assignment on an Intel Paragon computer is used. The results of the structural optimization are noisy. Noise reduction in the structural optimization results is discussed. It is shown that the response surface filters out this noise. A statistical design of experiments technique is used to minimize the number of required structural optimizations and to maintain accuracy. Simple analysis techniques are used to find regions of the design space where reasonable HSCT designs could occur, thus customizing the weight function to the design requirements of the HSCT, while the response surface itself is created employing detailed analysis methods. Analysis of variance is used to reduce the number of polynomial terms in the response surface model function. Linear and constant corrections based on a small number of high fidelity results are employed to improve the accuracy of the response surface model. Configuration optimization of the HSCT employing a customized weight function is compared to the configuration optimization of the HSCT with a general weight function. / Ph. D.
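The core of the response surface step, fitting a quadratic polynomial to noisy optimization results by least squares, can be sketched as follows. The two-variable weight function, noise level and full-factorial design are hypothetical placeholders for the HSCT structural optimizations (where a design-of-experiments subset would be used):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_weight(x1, x2):
    """Toy stand-in for one structural optimization: a smooth quadratic
    response in two configuration variables plus simulated optimization noise."""
    true = 4.0 + 2.0 * x1 - 1.5 * x2 + 0.8 * x1**2 + 0.5 * x1 * x2 + 0.3 * x2**2
    return true + rng.normal(0.0, 0.05)

# Design points (full factorial here; a statistically chosen subset would
# minimize the number of structural optimizations).
pts = [(a, b) for a in np.linspace(-1, 1, 5) for b in np.linspace(-1, 1, 5)]
y = np.array([noisy_weight(a, b) for a, b in pts])

# Quadratic model terms: 1, x1, x2, x1^2, x1*x2, x2^2
X = np.array([[1.0, a, b, a * a, a * b, b * b] for a, b in pts])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted surface filters the noise: coefficients land close to the
# true underlying values despite the perturbed observations.
print(np.round(coef, 2))
```

Terms whose fitted coefficients are statistically indistinguishable from zero are the ones an analysis of variance would drop from the model.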
|
363 |
Malleable Contextual Partitioning and Computational Dreaming / Brar, Gurkanwal Singh, 20 January 2015
Computer architecture is entering an era where hundreds of Processing Elements (PEs) can be integrated onto a single chip, even as decades of steady advances in instruction- and thread-level parallelism come to an end. And yet, conventional methods of parallelism fail to scale beyond 4-5 PEs, well short of the levels of parallelism found in the human brain. The human brain maintains constant real-time performance as cognitive complexity grows virtually unbounded through our lifetime. Our underlying thesis is that contextual categorization, leading to simplified algorithmic processing, is crucial to the brain's performance efficiency. But since the overheads of such reorganization are unaffordable in real time, we also observe the critical role of sleep and dreaming in the lives of all intelligent beings. Based on the importance of dream sleep in memory consolidation, we propose that it is also responsible for contextual reorganization. We target mobile device applications that can be personalized to the user, including speech, image and gesture recognition, as well as other kinds of personalized classification, which are arguably the foundation of intelligence. These algorithms rely on a knowledge database of symbols, where the database size determines the level of intelligence. Essential to achieving intelligence and a seamless user interface, however, is that real-time performance be maintained. Observing this, we define our chief performance goal as maintaining constant real-time performance against ever-increasing algorithmic and architectural complexities. Our solution is a method for Malleable Contextual Partitioning (MCP) that enables closer personalization to user behavior. We conceptualize a novel architectural framework, the Dream Architecture for Lateral Intelligence (DALI), that demonstrates the MCP approach. 
The DALI implements a dream phase to execute MCP in ideal MISD parallelism and reorganize its architecture to enable contextually simplified real time operation. With speech recognition as an example application, we show that the DALI is successful in achieving the performance goal, as it maintains constant real time recognition, scaling almost ideally, with PE numbers up to 16 and vocabulary size up to 220 words. / Master of Science
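The contextual partitioning idea can be sketched in a few lines: an offline "dream" phase groups the symbol database by context, so that real-time recognition scans only the active partition instead of the whole database. The symbols and context tags below are invented for illustration and are not the DALI implementation:

```python
def dream_reorganize(symbols):
    """Offline ('dream') phase: group symbols by a context tag
    attached during waking use."""
    contexts = {}
    for word, context in symbols:
        contexts.setdefault(context, []).append(word)
    return contexts

def recognize(query, contexts, active_context):
    """Real-time phase: a scan limited to one small partition, so lookup
    cost tracks the partition size rather than the full database size."""
    candidates = contexts.get(active_context, [])
    return query if query in candidates else None

symbols = [("open", "commands"), ("close", "commands"),
           ("monday", "calendar"), ("tuesday", "calendar")]
parts = dream_reorganize(symbols)
print(recognize("monday", parts, "calendar"))
```

As the database grows, new symbols enlarge some partitions, but the active-context scan stays small, which is the mechanism behind the constant real-time goal.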
|
364 |
Parallel paradigms in optimal structural design / Van Huyssteen, Salomon Stephanus, 12 1900
Thesis (MScEng)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: Modern-day processors are not getting any faster. Due to the power consumption limit of frequency
scaling, parallel processing is increasingly being used to decrease computation time. In
this thesis, several parallel paradigms are used to improve the performance of commonly serial
SAO programs. Four novelties are discussed:
First, replacing double precision solvers with single precision solvers. This is attempted in order
to take advantage of the anticipated factor-2 speed increase that single precision computations
have over double precision computations. However, single precision routines exhibit
unpredictable performance characteristics and struggle to converge to the required accuracies,
which is unfavourable for optimization solvers.
Second, QP and dual statements are pitted against one another in a parallel environment. This
is done because it is not always easy to see a priori which is best; therefore both are started in
parallel and the competing threads are cancelled as soon as one returns a valid point. Parallel QP
vs. dual statements prove to be very attractive, converging within the minimum number of outer
iterations, with the most appropriate solver selected as the problem properties change during the
iteration steps. Thread cancellation poses problems, however: threads have to wait to arrive at
appropriate checkpoints, and thus suffer unnecessarily long wait times because of struggling
competing routines.
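The solver race described above can be sketched with a thread pool: both solvers start, the first valid result wins, and the loser is cancelled at its next checkpoint. The solver stubs and delays below are hypothetical stand-ins for real QP and dual subproblem solves:

```python
import concurrent.futures as cf
import threading
import time

stop = threading.Event()

def solver(name, delay):
    """Stand-in for a QP or dual subproblem solve; it polls for
    cancellation at checkpoints, as the competing threads must."""
    deadline = time.monotonic() + delay
    while time.monotonic() < deadline:
        if stop.is_set():
            return None            # cancelled at a checkpoint
        time.sleep(0.005)
    return name                    # pretend this is a valid point

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(solver, "QP", 0.05),
               pool.submit(solver, "dual", 1.0)]
    winner = None
    for fut in cf.as_completed(futures):
        if fut.result() is not None and winner is None:
            winner = fut.result()
            stop.set()             # cancel the slower competitor

print(winner)
```

The polling checkpoint is exactly where the wait-time problem noted above arises: a struggling routine only notices the cancellation request when it next reaches a checkpoint.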
Third, multiple global searches are started in parallel on a shared-memory system. All problems
see a speed increase of nearly 4x. Dynamically scheduled threads remove the need for fixed
thread counts, as required in message passing implementations.
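The multi-start strategy with dynamic scheduling can be sketched as follows; the multi-modal objective and the simple stochastic local search are toy stand-ins, and a thread pool stands in for the shared-memory implementation:

```python
import concurrent.futures as cf
import random

def local_search(seed, steps=400):
    """One local minimization of a multi-modal function from a random start."""
    rng = random.Random(seed)
    f = lambda x: (x * x - 4.0) ** 2       # minima at x = -2 and x = +2
    x = rng.uniform(-10.0, 10.0)
    for _ in range(steps):
        cand = x + rng.gauss(0.0, 0.1)
        if f(cand) < f(x):                 # keep only improving moves
            x = cand
    return f(x), x

# Dynamic scheduling: the pool hands out the 16 starts to 4 workers as
# they free up, so no fixed thread-per-search assignment is needed.
with cf.ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(local_search, range(16)))

best_f, best_x = min(results)
print(round(best_f, 4))
```

Each start is independent, so load balancing is the only coordination required, which is why the shared-memory version scales close to the worker count.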
Lastly, the replacement of existing matrix-vector multiplication routines with optimized BLAS
routines, especially BLAS routines targeted at GPGPU (general-purpose graphics processing unit)
technologies, proves superior when solving large matrix-vector products in an iterative
environment. These problems scale well within the hardware capabilities, and speedups of up to 36x are
recorded. / AFRIKAANSE OPSOMMING: Modern processors are not getting faster because of the power consumption limit as processor frequencies are scaled up. Parallel processing is therefore used more and more often to reduce computation time. Several parallel paradigms are used to improve the performance of commonly sequential optimization programs. Four developments are discussed: First is the replacement of double precision routines with single precision routines. This attempts to exploit the factor-2 speed improvement that single precision computations have over double precision computations. Single precision routines are unpredictable and in most cases struggle to reach the required accuracy. Secondly, QP and dual algorithms are used against one another in a parallel environment. Because it is not always easy to see beforehand which one will perform best, all are started in parallel and the competitors are then cancelled as soon as one returns with a valid KKT point. Parallel QP versus dual algorithms appear to be very attractive; convergence occurs in all cases within the minimum number of iterations. The most suitable algorithm is used at each iteration as the problem properties change during the iteration steps. Thread cancellation poses problems, caused by threads that have to wait to reach the checkpoints; the best routines thus suffer unnecessarily because of struggling competitor routines. Thirdly, several global optimizations are started in parallel on a shared memory system. Problems obtain a speed increase of almost four times for all problems. Dynamically scheduled threads relieve the need for predetermined thread counts as used in message passing implementations. Lastly, the existing matrix-vector multiplication routines are replaced with optimized BLAS routines, especially BLAS routines aimed at GPGPU technologies. The GPU routines prove to be superior when solving large matrix-vector products in an iterative environment. These problems also scale well within the hardware's capabilities; for the largest problems tested, a speedup of 36 times is achieved.
|
365 |
Modeling Multi-factor Financial Derivatives by a Partial Differential Equation Approach with Efficient Implementation on Graphics Processing Units / Dang, Duy Minh, 15 November 2013
This thesis develops efficient modeling frameworks via a Partial Differential Equation (PDE) approach for multi-factor financial derivatives, with emphasis on three-factor models, and studies highly efficient implementations of the numerical methods on novel high-performance computer architectures, with particular focus on Graphics Processing Units (GPUs) and multi-GPU platforms/clusters of GPUs. Two important classes of multi-factor financial instruments are considered: cross-currency/foreign exchange (FX) interest rate derivatives and multi-asset options. For cross-currency interest rate derivatives, the focus of the thesis is on Power Reverse Dual Currency (PRDC) swaps with three of the most popular exotic features, namely Bermudan cancelability, knockout, and FX Target Redemption. The modeling of PRDC swaps using one-factor Gaussian models for the domestic and foreign interest short rates, and a one-factor skew model for the spot FX rate results in a time-dependent parabolic PDE in three space dimensions. Our proposed PDE pricing framework is based on partitioning the pricing problem into several independent pricing subproblems over each time period of the swap's tenor structure, with possible communication at the end of the time period. Each of these subproblems requires a solution of the model PDE. We then develop a highly efficient GPU-based parallelization of the Alternating Direction Implicit (ADI) timestepping methods for solving the model PDE. To further handle the substantially increased computational requirements due to the exotic features, we extend the pricing procedures to multi-GPU platforms/clusters of GPUs to solve each of these independent subproblems on a separate GPU. Numerical results indicate that the proposed GPU-based parallel numerical methods are highly efficient and provide significant increase in performance over CPU-based methods when pricing PRDC swaps. 
An analysis of the impact of the FX volatility skew on the price of PRDC swaps is provided.
In the second part of the thesis, we develop efficient pricing algorithms for multi-asset options under the Black-Scholes-Merton framework, with strong emphasis on multi-asset American options. Our proposed pricing approach is built upon a combination of (i) a discrete penalty approach for the linear complementarity problem arising due to the free boundary and (ii) a GPU-based parallel ADI Approximate Factorization technique for the solution of the linear algebraic system arising from each penalty iteration. A timestep size selector implemented efficiently on GPUs is used to further increase the efficiency of the methods. We demonstrate the efficiency and accuracy of the proposed GPU-based parallel numerical methods by pricing American options written on three assets.
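A heavily reduced sketch of the penalty approach: at each implicit timestep, penalty iterations enforce the American early-exercise constraint by adding a large penalty on nodes where the value falls below the payoff. The sketch assumes one asset and a dense direct solve in place of the GPU-based parallel ADI approximate factorization, and the parameter values are illustrative only:

```python
import numpy as np

K, r, sigma, T = 100.0, 0.05, 0.2, 1.0     # American put, illustrative data
N, M, big = 200, 100, 1e6                  # space nodes, time steps, penalty
S = np.linspace(0.0, 300.0, N)
dS, dt = S[1] - S[0], T / M
payoff = np.maximum(K - S, 0.0)

# Interior Black-Scholes operator coefficients (central differences).
i = np.arange(1, N - 1)
a = 0.5 * sigma**2 * S[i]**2 / dS**2 - 0.5 * r * S[i] / dS
c = 0.5 * sigma**2 * S[i]**2 / dS**2 + 0.5 * r * S[i] / dS
b = -(a + c) - r

v = payoff.copy()                          # terminal condition
for _ in range(M):                         # march backward in time
    v_old = v.copy()
    for _ in range(10):                    # discrete penalty iterations
        pen = big * (v < payoff)           # active where constraint binds
        A = np.zeros((N, N))
        A[0, 0] = A[-1, -1] = 1.0          # Dirichlet boundary rows
        A[i, i - 1] = -dt * a
        A[i, i] = 1.0 - dt * b + dt * pen[i]
        A[i, i + 1] = -dt * c
        rhs = v_old.copy()
        rhs[0], rhs[-1] = K, 0.0           # put boundary values
        rhs[i] += dt * pen[i] * payoff[i]
        v_new = np.linalg.solve(A, rhs)
        change = np.max(np.abs(v_new - v))
        v = v_new
        if change < 1e-6:                  # active set has settled
            break

price = np.interp(K, S, v)
print(round(price, 2))
```

The at-the-money value should land near the well-known American put price of about 6.1 for these parameters; the penalty keeps the solution within O(1/big) of the payoff wherever early exercise is optimal.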
|
366 |
Upravljanje tokovima aktivnosti u distributivnom menadžment sistemu / Workflow management system for DMS / Nedić, Nemanja, 24 February 2016
<p>The paper presents research on improving the performance of large supervisory control and management systems such as the DMS. This goal is achieved by coordinating the execution of workflows, which entails an efficient distribution of tasks onto computing resources. Various algorithms were developed and tested for this purpose. This approach provided a higher degree of utilization of computing resources, which resulted in better performance.</p> / <p>The paper presents an approach to improving the performance of a large-scale distributed utility management system such as a DMS. This goal is accomplished through intelligent workflow management: workflows are divided into atomic tasks which are scheduled onto computing resources for execution. Various scheduling algorithms are developed and thoroughly tested for this purpose. This approach provides greater utilization of computing resources, which in turn results in better performance.</p>
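The scheduling of atomic tasks onto computing resources can be illustrated with a minimal greedy list scheduler; the task names and durations below are invented, and the thesis's algorithms are more elaborate:

```python
import heapq

def schedule(tasks, n_resources):
    """Greedy list scheduling: each atomic task goes to the resource that
    becomes free earliest; returns the makespan and the assignment."""
    heap = [(0.0, r) for r in range(n_resources)]   # (busy-until, resource)
    heapq.heapify(heap)
    assignment = []
    for name, duration in tasks:
        free_at, res = heapq.heappop(heap)
        assignment.append((name, res, free_at))     # task starts when res frees
        heapq.heappush(heap, (free_at + duration, res))
    makespan = max(t for t, _ in heap)
    return makespan, assignment

tasks = [("switchgear-check", 3.0), ("topology-update", 2.0),
         ("load-flow", 4.0), ("report", 1.0)]
makespan, plan = schedule(tasks, n_resources=2)
print(makespan)  # → 6.0
```

Keeping all resources busy in this way is what raises utilization; more refined algorithms additionally account for task dependencies within a workflow.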
|
367 |
Un environnement pour le calcul intensif pair à pair / An environment for peer-to-peer high performance computing / Nguyen, The Tung, 16 November 2011
The peer-to-peer (P2P) concept has recently seen great developments in the fields of file sharing, video streaming and distributed databases. The development of parallelism in microprocessor architectures and advances in high-speed networks make it possible to consider new applications such as distributed high performance computing. However, implementing this new type of application on P2P networks raises many challenges, such as machine heterogeneity, scalability and robustness. Moreover, existing transport protocols such as TCP and UDP are not well suited to this new type of application. This thesis presents a decentralized environment for the implementation of high performance computing on peer-to-peer networks. We are interested in applications in the fields of numerical simulation and optimization that rely on task-parallel models and that are solved by means of distributed or parallel iterative algorithms. Unlike existing solutions, our environment allows direct and frequent communications between peers. The environment is designed around a self-adaptive communication protocol that can reconfigure itself by adopting the most appropriate communication mode between peers, according to algorithmic choices made at the application layer or to elements of context, such as topology, at the network layer. We present and analyze experimental results obtained on various platforms, such as GRID'5000 and PlanetLab, for the obstacle problem and for nonlinear network flow problems. / The concept of peer-to-peer (P2P) has seen great developments in recent years in the domains of file sharing, video streaming and distributed databases. 
Recent advances in microprocessor architecture and networks permit one to consider new applications such as distributed high performance computing. However, the implementation of this new type of application on P2P networks gives rise to numerous challenges such as heterogeneity, scalability and robustness. In addition, existing transport protocols like TCP and UDP are not well suited to this new type of application. This thesis aims at designing a decentralized and robust environment for the implementation of high performance computing applications on peer-to-peer networks. We are interested in applications in the domains of numerical simulation and optimization that rely on task-parallel models and that are solved via parallel or distributed iterative algorithms. Unlike existing solutions, our environment allows frequent direct communications between peers. The environment is based on a self-adaptive communication protocol that can reconfigure itself dynamically by choosing the most appropriate communication mode between any peers, according to decisions concerning algorithmic choices made at the application level or to elements of context, such as topology, at the transport level. We present and analyze computational results obtained on several testbeds, such as GRID'5000 and PlanetLab, for the obstacle problem and nonlinear network flow problems.
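One of the target applications, the obstacle problem, admits a compact iterative solver that illustrates the kind of algorithm the environment runs (here a sequential projected Gauss-Seidel sweep; the thesis distributes such iterations over peers). The 1D obstacle shape is illustrative:

```python
import numpy as np

# 1D obstacle problem: a membrane with zero boundary values stretched
# over an obstacle psi, solved by projected Gauss-Seidel: each update is
# the unconstrained relaxation, clipped to stay above the obstacle.
n = 101
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
psi = 0.8 - 8.0 * (x - 0.5) ** 2          # parabolic obstacle
u = np.zeros(n)                            # u = 0 on the boundary
f = 0.0                                    # no external load

for _ in range(3000):
    for i in range(1, n - 1):
        u[i] = max(psi[i], 0.5 * (u[i - 1] + u[i + 1] + h * h * f))

# The membrane touches the obstacle in a contact region around x = 0.5
# and stays harmonic (linear in 1D) outside it.
print(round(u[n // 2], 3))
```

Each update touches only neighboring unknowns, which is why such iterations tolerate the asynchronous, frequent peer-to-peer exchanges the environment is built for.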
|
368 |
Exploitation efficace des architectures parallèles de type grappes de NUMA à l’aide de modèles hybrides de programmation / Efficient exploitation of NUMA-cluster parallel architectures using hybrid programming models / Clet-Ortega, Jérôme, 18 April 2012
Current computing systems are generally clusters of machines composed of numerous processors with a strongly hierarchical architecture. Exploiting them is the major challenge for implementations of programming models such as MPI or OpenMP. A common practice is to mix these two models to benefit from the advantages of each. However, these models were not designed to work together, which raises performance problems. The work in this thesis aims to assist the developer in programming hybrid applications. It relies on an analysis of the architectural hierarchy of the computing system to dimension the execution resources (processes and threads). Rather than a classical hybrid approach, which creates one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, better suited to modern computing machines. / Modern computing servers usually consist of clusters of computers with several multi-core CPUs featuring a highly hierarchical hardware design. The major challenge for programming model implementations is to take advantage of these servers efficiently. Combining two types of models, such as MPI and OpenMP, is a current trend to reach this goal. However, these programming models were not designed to work together, and that leads to performance issues. In this thesis, we propose to assist the programmer who develops hybrid applications. We rely on an analysis of the computing system architecture in order to set the number of processes and threads. Rather than a classical hybrid approach, that is to say creating one multithreaded MPI process per node, we automatically evaluate alternative solutions, with several multithreaded processes per node, better fitted to modern computing systems.
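The resource-dimensioning idea can be sketched as enumerating candidate hybrid layouts from the node topology; the helper and the example node below are hypothetical illustrations, not the thesis's tool:

```python
def candidate_layouts(cores_per_node, numa_domains):
    """Enumerate (processes-per-node, threads-per-process) layouts derived
    from the hardware hierarchy: one process per node, per NUMA domain,
    or per core."""
    layouts = []
    for n_procs in {1, numa_domains, cores_per_node}:
        if cores_per_node % n_procs == 0:
            layouts.append((n_procs, cores_per_node // n_procs))
    return sorted(layouts)

# Example node: 16 cores split across 2 NUMA domains. The (2, 8) layout
# keeps each process's threads inside one NUMA domain, which is often the
# better fit than the classical one-process-per-node (1, 16) layout.
print(candidate_layouts(16, 2))  # → [(1, 16), (2, 8), (16, 1)]
```

The thesis's contribution is evaluating such alternatives automatically instead of defaulting to one multithreaded MPI process per node.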
|
369 |
Various extensions in the theory of dynamic materials with a specific focus on the checkerboard geometry / Sanguinet, William Charles, 01 May 2017
This work is a numerical and analytical study of wave motion through dynamic materials (DM). This work focuses on showing several results that greatly extend the applicability of the checkerboard focusing effect. First, it is shown that it is possible to simultaneously focus dilatation and shear waves propagating through a linear elastic checkerboard structure. Next, it is shown that the focusing effect found for the original "perfect" checkerboard extends to the case of the checkerboard with smooth transitions between materials; this is termed a functionally graded (FG) checkerboard. With the additional assumption of a linear transition region, it is shown that there is a region of existence for limit cycles that takes the shape of a parallelogram in (m,n)-space. Similar to the perfect case, this is termed a "plateau" region. This shows that the robustness of the characteristic focusing effect is preserved even when the interfaces between materials are relaxed. Lastly, by using finite volume methods with limiting and adaptive mesh refinement, it is shown that energy accumulation is present for the functionally graded checkerboard as well as for the checkerboard with non-matching wave impedances. The main contribution of this work was to show that the characteristic focusing effect is highly robust and exists even under much more general assumptions than originally made. Furthermore, it provides a tool to assist future material engineers in constructing such structures. To this effect, exact bounds are given regarding how much the original perfect checkerboard structure can be spoiled before losing the expected characteristic focusing behavior.
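The focusing mechanism can be illustrated by tracing right-going characteristics dx/dt = a(x,t) through a space-time checkerboard of two wave speeds; the unit cell size and the speeds below are illustrative choices, not the dissertation's parameters:

```python
import math

def speed(x, t, a1=0.6, a2=1.1):
    """Two-material checkerboard in space-time with unit square cells."""
    return a1 if (math.floor(x) + math.floor(t)) % 2 == 0 else a2

def trace(x0, t_end=8.0, dt=1e-3):
    """Follow one right-going characteristic dx/dt = a(x, t) by forward Euler."""
    x, t = x0, 0.0
    while t < t_end - 0.5 * dt:
        x += speed(x, t) * dt
        t += dt
    return x

starts = [0.1, 0.3, 0.5, 0.7]
ends = [trace(x0) for x0 in starts]
# Characteristics of one family cannot cross; in a focusing regime they
# bunch toward limit cycles, the route to energy accumulation.
print([round(e, 3) for e in ends])
```

Sweeping the material volume fractions in such a tracer is one way to map out numerically where limit cycles exist, i.e. the "plateau" region described above.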
|
370 |
Méthode multigrilles parallèle pour les simulations 3D de mise en forme de matériaux / Parallel multigrid method for 3D simulations of material forming processes / Vi, Frédéric, 16 June 2017
This thesis concerns the development of a parallel multigrid method aimed at reducing the computation times of finite element simulations of 3D forging processes. These applications use an implicit method, characterized by a mixed velocity/pressure formulation and contact handled by penalty. They involve large deformations that make frequent remeshing of the unstructured tetrahedral meshes necessary. The multigrid method developed follows a hybrid approach, based on a geometric construction of the coarse levels by coarsening of non-nested meshes and on an algebraic construction of the intermediate and coarse linear systems. A quasi-linear asymptotic behavior and good parallel efficiency are expected, in order to allow simulations with large numbers of degrees of freedom in more reasonable times than today. To this end, the mesh coarsening algorithm is compatible with parallel computing, as are the operators transferring fields between the different levels of partitioned meshes. The specific features of the problems to be treated led to the selection of a smoother more complex than those most frequently used in the literature. On the coarsest grid, a direct resolution method is used, both sequentially and in parallel. The multigrid method is used as a preconditioner for a conjugate residual method, has been integrated into the FORGE NxT software, and shows an asymptotic behavior and a parallel efficiency close to optimal. Automatic mesh coarsening provides compatibility with frequent remeshing and allows the multigrid method to simulate a process from beginning to end. Computation times are significantly reduced, even in simulations with particular material flows for which the multigrid method cannot be used optimally. This robustness makes it possible, for example, to reduce the simulation time of a process from 4.5 to 2.5 days. / A parallel multigrid method is developed to reduce the large computational costs involved in the finite element simulation of 3D metal forming applications. These applications are characterized by a mixed velocity/pressure implicit formulation, with a penalty formulation to enforce contact, and lead to large deformations, handled by frequent remeshings of unstructured tetrahedral meshes. The developed multigrid method follows a hybrid approach in which the different levels of non-nested meshes are geometrically constructed by mesh coarsening, while the linear systems of the intermediate and coarse levels result from an algebraic approach. A close-to-linear asymptotic behavior is expected, along with parallel efficiency, in order to allow simulations with large numbers of degrees of freedom under reasonable computation times. These objectives lead to a parallel mesh coarsening algorithm and parallel transfer operators allowing field transfer between the different levels of partitioned meshes. Physical specificities of metal forming applications lead to selecting a more complex multigrid smoother than those classically used in the literature. A direct resolution method is used on the coarsest mesh, in sequential and in parallel computing. The developed multigrid method is used as a preconditioner for a Conjugate Residual algorithm within the FORGE NxT software and shows an asymptotic behavior and a parallel efficiency close to optimal. The automatic mesh coarsening algorithm is compatible with frequent remeshings and allows the simulation of a forging process from beginning to end with the multigrid method. Computation times are significantly reduced, even in simulations with particular material flows for which the multigrid method is not optimal. This robustness allows, for instance, reducing the computation time of a forging process from 4.5 to 2.5 days.
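The multigrid ingredients named above (smoothing, coarse-level systems, a direct solve on the coarsest grid, use as a preconditioner's core cycle) can be sketched with a Galerkin two-grid cycle on a 1D Poisson model problem. This structured, sequential sketch stands in for the thesis's 3D unstructured, parallel method:

```python
import numpy as np

def tridiag(n):
    """1D Poisson stiffness matrix (Dirichlet boundaries)."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(n, nc):
    """Linear interpolation from nc coarse points to n = 2*nc + 1 fine points."""
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j + 1, j] = 1.0       # coarse point coincides with fine point
        P[2 * j, j] += 0.5          # in-between points: average of
        P[2 * j + 2, j] += 0.5      # the two neighboring coarse values
    return P

def two_grid_solve(b, n_cycles=20):
    n = b.size
    nc = (n - 1) // 2
    A = tridiag(n)
    P = prolongation(n, nc)
    R = 0.5 * P.T                   # full-weighting restriction
    Ac = R @ A @ P                  # Galerkin coarse-grid operator
    u = np.zeros(n)
    for _ in range(n_cycles):
        for _ in range(3):          # pre-smoothing: damped Jacobi
            u += (2.0 / 3.0) * (b - A @ u) / 2.0
        ec = np.linalg.solve(Ac, R @ (b - A @ u))   # direct coarse solve
        u += P @ ec                 # prolongate the coarse correction
        for _ in range(3):          # post-smoothing
            u += (2.0 / 3.0) * (b - A @ u) / 2.0
    return u, np.linalg.norm(b - A @ u)

b = np.ones(63)
u, res = two_grid_solve(b)
print(res < 1e-8)  # → True
```

The thesis replaces each ingredient with its heavy-duty counterpart: non-nested coarsened tetrahedral meshes instead of nested grids, algebraic intermediate systems, a parallel smoother, and the whole cycle wrapped as a conjugate residual preconditioner.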
|