221 |
High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems. Alomairy, Rabab M., 20 July 2022.
To leverage the extreme parallelism of emerging architectures, so that scientific applications can fulfill their high-fidelity and multi-physics potential while sustaining high efficiency relative to the limiting resource, numerical algorithms must be redesigned. Algorithmic redesign can shift the limiting resource, for example from memory or communication to arithmetic capacity, and its benefit grows greatly when it introduces a tunable tradeoff between accuracy and resources. Scientific applications from diverse sources rely on dense matrix operations, which arise in Schur complements, integral equations, covariances in spatial statistics, ridge regression, radial basis functions from unstructured meshes, and kernel matrices from machine learning, among others. This thesis demonstrates how to extend the problem sizes that may be treated and how to reduce their execution time. Two "universes" of algorithmic innovation have emerged to improve computations by orders of magnitude in capacity and runtime; each introduces a hierarchy, of rank or of precision. Tile Low-Rank (TLR) approximation replaces blocks of a dense operator with blocks of low rank. Mixed-precision approximation, increasingly well supported by contemporary hardware, replaces high-precision blocks with low-precision ones. Herein, we design new high-performance direct solvers based on the synergy of TLR and mixed precision. Since adapting to data sparsity leads to heterogeneous workloads, we rely on task-based runtime systems to orchestrate the scheduling of fine-grained kernels onto computational resources. We first demonstrate how TLR accelerates acoustic scattering and mesh deformation simulations: our solvers outperform state-of-the-art libraries by up to an order of magnitude. We then demonstrate the impact of enabling mixed precision in a bioinformatics context, where it yields up to a three-fold speedup.
To facilitate the adoption of task-based runtime systems, we introduce the AL4SAN library, which provides a common API for expressing and queueing tasks across multiple dynamic runtime systems. The library handles a variety of workloads at low overhead while increasing user productivity. AL4SAN enables interoperability by switching runtimes at runtime, achieving a two-fold speedup on a task-based generalized symmetric eigenvalue solver.
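The tile low-rank idea above can be sketched by compressing a single off-diagonal tile with a truncated SVD. This is an illustrative NumPy sketch under assumed inputs (the 1/r kernel, tolerance, and tile size are invented for the example), not the thesis's implementation:

```python
import numpy as np

def compress_tile(tile, tol):
    """Compress one off-diagonal tile to low rank via truncated SVD.

    Returns factors (U, V) with tile ~= U @ V, keeping only the
    singular values above tol times the largest singular value.
    """
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# A smooth kernel evaluated on two well-separated point clusters
# yields a numerically low-rank tile, as in the dense operators above.
x = np.linspace(0.0, 1.0, 64)
y = np.linspace(10.0, 11.0, 64)
tile = 1.0 / np.abs(x[:, None] - y[None, :])   # 1/r kernel block

U, V = compress_tile(tile, tol=1e-8)
print(U.shape[1])                 # achieved rank, much smaller than 64
print(np.allclose(U @ V, tile, atol=1e-6))
```

Storing the factors (U, V) instead of the full tile is what converts memory and flops savings into the capacity gains the abstract describes.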
|
222 |
A Real-Time Capable Adaptive Optimal Controller for a Commuter Train. Yazhemsky, Dennis Ion, January 2017.
This research formulates and implements a novel closed-loop optimal control system that drives a train between two stations in a time-optimal, energy-efficient, or mixed-objective manner. The optimal controller uses sensor feedback from the train and, in real time, computes the most efficient control decision given knowledge of the track profile ahead, speed restrictions, and required arrival-time windows. The control problem is solved both on an open track and while safely driving no closer than a fixed distance behind another locomotive. In contrast to other research in the field, this thesis achieves a real-time capable, embeddable closed-loop optimization by applying advanced modeling and numerical solution techniques to a non-linear optimal control problem.
This controller is first formulated as a non-convex control problem and then converted to a convex second-order cone problem, with the intent of using a simple numerical solver, ensuring global optimality, and improving control robustness. Convex and non-convex numerical methods of solving the control problem are investigated, and closed-loop performance results with a simulated vehicle are presented under realistic modeling conditions on advanced tracks, on both desktop and embedded computer architectures. The controller is observed to be capable of robust vehicle driving both with and without modeling uncertainty. The benefits of pairing the optimal controller with a parameter estimator are demonstrated for cases where very large mismatches exist between the controller model and the simulated vehicle. Stopping performance is consistently within 25 cm of target stations, and the worst-case closed-loop optimization time was within 100 ms for the computation of a 1000-point control horizon on an i7-6700 machine. / Thesis / Master of Applied Science (MASc) / This research formulates and implements a novel closed-loop optimal control system that drives a train between two stations in a time-optimal, energy-efficient, or mixed-objective manner. It is deployed on a commuter vehicle and directly manages the motoring and braking systems. The optimal controller uses sensor feedback from the train and, in real time, computes the most efficient control decision given knowledge of the track profile ahead, speed restrictions, and required arrival-time windows. The final control implementation is capable of safe, high-accuracy, optimal driving, all while computing fast enough to reliably deploy on a rail vehicle.
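Conversions of this kind typically hinge on rewriting hyperbolic relations (such as segment travel time versus speed, t = ds/v) as second-order cone constraints. The snippet below numerically checks the standard rotated-cone identity t*v >= ds^2; it is only a sketch of that reformulation step, not the thesis's controller, and the sampled values are arbitrary:

```python
import numpy as np

def soc_holds(t, v, ds):
    """Second-order-cone form of the hyperbolic constraint t * v >= ds**2,
    valid for t, v >= 0: ||(2*ds, t - v)|| <= t + v."""
    return np.hypot(2.0 * ds, t - v) <= t + v

# Squaring both sides shows the cone constraint is exactly t*v >= ds**2,
# which is how a conic solver can receive an otherwise nonconvex relation.
rng = np.random.default_rng(0)
t = rng.uniform(0.1, 5.0, 1000)
v = rng.uniform(0.1, 5.0, 1000)
ds = 1.0
print(np.array_equal(soc_holds(t, v, ds), t * v >= ds**2))
```

Expressing such constraints conically is what lets a simple interior-point solver certify global optimality, as the abstract notes.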
|
223 |
Modal satisfiability in a constraint logic environment. Stevenson, Lynette, 30 November 2007.
The modal satisfiability problem has to date been solved using either a specifically designed algorithm, or by translating the modal logic formula into a different class of problem, such as first-order logic, a propositional satisfiability problem, or a constraint satisfaction problem. These approaches and the solvers developed to support them are surveyed, and a synthesis thereof is presented.
The translation of a modal K formula into a constraint satisfaction problem, as developed by Brand et al. [18], is further enhanced. The modal formula, which must be in conjunctive normal form, is translated into layered propositional formulae. Each of these layers is translated into a constraint satisfaction problem and solved using the constraint solver ECLiPSe. I extend this translation to deal with reflexive and transitive accessibility relations, thereby providing for the modal logics KT and S4. Two of the difficulties that arise when these accessibility relations are added are that the resultant formula increases considerably in complexity, and that it is no longer in conjunctive normal form (CNF). I eliminate the need for the conversion of the formula to CNF and deal instead with formulae in negation normal form (NNF). I apply a number of enhancements to the formula at each modal layer before it is translated into a constraint satisfaction problem. These include extensive simplification, the assignment of a single value to propositional variables that occur only positively or only negatively, and caching the status of the formula at each node of the search tree. All of these significantly prune the search space. The final results I achieve compare favorably with those obtained by other solvers. / Computing / M.Sc. (Computer Science)
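The NNF step described above can be sketched as a small rewriter that pushes negations inward through connectives and modalities. The tuple encoding is my own illustration, not the thesis's data structure:

```python
# Formula encoding (illustrative): ('var', p) | ('not', f) |
# ('and', f, g) | ('or', f, g) | ('box', f) | ('dia', f)

def nnf(f):
    """Push negations inward so they rest only on propositional variables,
    using De Morgan's laws and the modal dualities not-box = dia-not,
    not-dia = box-not."""
    op = f[0]
    if op == 'not':
        g = f[1]
        if g[0] == 'var':
            return f                                   # literal: done
        if g[0] == 'not':
            return nnf(g[1])                           # double negation
        if g[0] == 'and':
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'box':
            return ('dia', nnf(('not', g[1])))
        if g[0] == 'dia':
            return ('box', nnf(('not', g[1])))
    if op in ('and', 'or'):
        return (op, nnf(f[1]), nnf(f[2]))
    if op in ('box', 'dia'):
        return (op, nnf(f[1]))
    return f

# not box(p and not q)  rewrites to  dia(not p or q)
f = ('not', ('box', ('and', ('var', 'p'), ('not', ('var', 'q')))))
print(nnf(f))  # → ('dia', ('or', ('not', ('var', 'p')), ('var', 'q')))
```

Working on NNF directly, as the abstract explains, avoids the blow-up that converting the enhanced KT/S4 formulae back to CNF would cause.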
|
224 |
Design of a hybrid direct-iterative parallel sparse linear solver. Gaidamour, Jérémie, 08 December 2009.
This thesis presents a parallel method for solving sparse linear systems that effectively combines direct and iterative techniques using a Schur complement approach. A domain decomposition is built; the interiors of the subdomains are eliminated by a direct method so that an iterative method is needed only on the interface unknowns. The system on the interface (the Schur complement) is solved by an iterative method preconditioned with a global incomplete factorization. A special ordering of the Schur complement makes it possible to build a scalable preconditioner. Algorithms minimizing the memory peak that appears during the construction of the preconditioner are presented. The memory load is balanced through a parallelization scheme that maps multiple subdomains to each processor. The methods are implemented in the HIPS solver, and parallel experimental results are presented on large industrial test cases.
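The interior-elimination and interface-solve structure described above can be sketched densely with NumPy. In the actual hybrid solver the Schur system is solved iteratively with an incomplete-factorization preconditioner; here a direct solve stands in, and the example matrix is invented:

```python
import numpy as np

def schur_solve(A, b, interior, interface):
    """Solve A x = b by eliminating interior unknowns first.

    Partitioning A into [[Aii, Aig], [Agi, Agg]] (i = interior,
    g = interface), the interface unknowns satisfy S xg = bg - Agi Aii^{-1} bi
    with the Schur complement S = Agg - Agi Aii^{-1} Aig.
    """
    Aii = A[np.ix_(interior, interior)]
    Aig = A[np.ix_(interior, interface)]
    Agi = A[np.ix_(interface, interior)]
    Agg = A[np.ix_(interface, interface)]
    bi, bg = b[interior], b[interface]

    S = Agg - Agi @ np.linalg.solve(Aii, Aig)           # Schur complement
    xg = np.linalg.solve(S, bg - Agi @ np.linalg.solve(Aii, bi))
    xi = np.linalg.solve(Aii, bi - Aig @ xg)            # back-substitute

    x = np.empty_like(b)
    x[interior], x[interface] = xi, xg
    return x

# 1D Laplacian: unknown 2 is the "interface" separating domains {0,1} and {3,4}.
n = 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = schur_solve(A, b, interior=[0, 1, 3, 4], interface=[2])
print(np.allclose(A @ x, b))  # True
```

In the parallel setting each subdomain eliminates its own interior block independently, which is why only the (much smaller) interface system requires global iteration.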
|
225 |
Discontinuous Galerkin code optimization on a hybrid computer: application to numerical simulation in electromagnetism. Weber, Bruno, 26 November 2018.
In this thesis, we present the evolutions made to the Discontinuous Galerkin solver Teta-CLAC, resulting from the IRMA-AxesSim collaboration, during the HOROCH project (2015-2018). This solver solves the 3D Maxwell equations in parallel on a large number of OpenCL accelerators. The goal of the HOROCH project was to perform large-scale simulations on a complete digital human body model. This model is composed of 24 million hexahedral cells, for calculations in the frequency band of connected objects, from 1 to 3 GHz (Bluetooth). The applications are numerous: telephony and accessories, sport (connected shirts), medicine (probes: capsules, patches), etc.
The changes made include, among others: optimization of OpenCL kernels for CPUs in order to make the best use of hybrid architectures; experimentation with the StarPU runtime; the design of an integration scheme using local time steps; and many optimizations allowing the solver to process simulations of several million cells.
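A local time-stepping scheme of the kind mentioned gives each cell a stable step and groups cells into nested power-of-two classes, so small cells near fine features advance several substeps while large cells advance once. The sketch below is an illustration under an assumed CFL rule and invented cell sizes, not the thesis's scheme:

```python
import math

def lts_levels(cell_sizes, cfl, max_speed):
    """Group cells into power-of-two local time-step classes.

    Each cell's stable step dt_c = cfl * h_c / max_speed is rounded down
    to dt_min * 2**k, so the classes nest: a level-k cell advances once
    while level-0 cells advance 2**k times.
    """
    dts = [cfl * h / max_speed for h in cell_sizes]
    dt_min = min(dts)
    return [int(math.floor(math.log2(dt / dt_min))) for dt in dts]

# A mesh mixing fine cells (e.g. near the body surface) and coarse cells;
# sizes are nondimensional and purely illustrative.
levels = lts_levels([1.0, 1.1, 4.0, 8.5, 16.0], cfl=0.5, max_speed=300.0)
print(levels)  # → [0, 0, 2, 3, 4]
```

Without local time steps, the single global step would be dictated by the smallest cell, wasting work on the 24-million-cell body model's coarse regions.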
|
226 |
Optimization of production allocation under price uncertainty: relating price model assumptions to decisions. Bukhari, Abdulwahab Abdullatif, 05 October 2011.
Allocating production volumes across a portfolio of producing assets is a complex optimization problem. Each producing asset possesses different technical attributes (e.g. crude type), facility constraints, and costs. In addition, there are corporate objectives and constraints (e.g. contract delivery requirements). While complex, such a problem can be specified and solved using conventional deterministic optimization methods. However, there is often uncertainty in many of the inputs, and in these cases the appropriate approach is neither obvious nor straightforward. One of the major uncertainties in the oil and gas industry is the commodity price assumption(s). This paper investigates this problem in three major sections: (1) We specify an integrated stochastic optimization model that solves for the optimal production allocation for a portfolio of producing assets when there is uncertainty in commodity prices, (2) We then compare the solutions that result when different price models are used, and (3) We perform a value of information analysis to estimate the value of more accurate price models. The results show that the optimum production allocation is a function of the price model assumptions. However, the differences between models are minor, and thus the value of choosing the “correct” price model, or similarly of estimating a more accurate model, is small. This work falls in the emerging research area of decision-oriented assessments of information value. / text
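As a deterministic stand-in for the stochastic program above, one can collapse the price scenarios to their expectation and fill the cheapest assets first. All numbers are invented for illustration, and this greedy baseline is not the thesis's integrated stochastic optimization model:

```python
import numpy as np

# Toy portfolio: per-asset capacity (kbbl/d) and lifting cost ($/bbl),
# plus flat price scenarios with probabilities. All values illustrative.
capacity = np.array([80.0, 120.0, 60.0])
cost = np.array([30.0, 42.0, 55.0])
scenarios = np.array([55.0, 70.0, 95.0])
probs = np.array([0.25, 0.5, 0.25])

def allocate(demand, price_expectation):
    """Greedy allocation under a single expected price: fill the cheapest
    profitable assets first until the contracted demand is met."""
    alloc = np.zeros_like(capacity)
    for i in np.argsort(cost):
        if cost[i] >= price_expectation:
            break                       # unprofitable to produce
        alloc[i] = min(capacity[i], demand - alloc.sum())
        if alloc.sum() >= demand:
            break
    return alloc

expected_price = float(probs @ scenarios)   # 72.5 under this toy model
print(allocate(200.0, expected_price))      # costliest asset stays shut in
```

The thesis's point is precisely that replacing the full price model with such a collapsed expectation changes the optimal allocation only modestly, which bounds the value of a more accurate price model.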
|
228 |
Experimental and numerical simulations of the atomisation and surface run-off phenomena during a water washing process. Pushparajalingam, Jegan Sutharsan, 16 February 2012.
This work was carried out under a CIFRE contract with TOTAL. Its aim was to validate the physical models used in a numerical simulation code for the annular dispersed pipe flow encountered during the water washing procedures used in refineries. To this end, an experimental database was assembled on configurations representative of industrial conditions. The chosen geometry comprises a horizontal straight injection section with a centred injector, followed by a 90° elbow in a vertical plane. Several experimental conditions enable the study of the influence of the gas velocity, the spray injection condition, and the pressure on the various physical processes.
These results, including spray and liquid-film visualisations, droplet size and distribution measurements, and liquid-film thickness and mass-flow measurements, were analysed to extract the main interaction mechanisms between the gas and the dispersed phase, between the gas and the wall liquid film, and between the dispersed phase and the annular liquid film. In parallel, first simulations using a RANS approach were performed with the ONERA code CEDRE, and the results were compared with the measurements.
|
229 |
Experimental Analysis of Shock Stand-off Distance over Spherical Bodies in Hypersonic Flows. Thakur, Ruchi, January 2015.
One of the characteristics of high-speed flows over blunt bodies is the detached shock formed in front of the body. The distance of the shock from the stagnation point, measured along the stagnation streamline, is termed the shock stand-off distance or the shock detachment distance. It is one of the most basic parameters in such flows. The need to know the shock stand-off distance arises from the high temperatures encountered in these cases: the biggest challenge in high-enthalpy flows is the large heat transfer to the body. The position of the shock is relevant to knowing the temperatures that a body subjected to such flows will have to withstand, and thus to building an efficient system to reduce the heat transfer. Despite being a basic parameter, there is no universally accepted theoretical means of determining the shock stand-off distance. Its deduction depends on experimental or computational means until a successful theoretical model for its prediction is developed.
The experimental data available in the open literature for spherical bodies in high-speed flows mostly lie beyond the 2 km/s regime. Experiments were conducted to determine the shock stand-off distance in the velocity range of 1-2 km/s. Three hemispherical bodies of radii 25, 40 and 50 mm were taken as test models. Since the shock stand-off distance is known to depend on the density ratio across the shock, and hence on gamma (the ratio of specific heats), two different test gases, air and carbon dioxide, were used. Five test cases were studied with air as the test gas: Mach 5.56 with a Reynolds number of 5.71 million/m and enthalpy of 1.08 MJ/kg; Mach 5.39 with a Reynolds number of 3.04 million/m and enthalpy of 1.42 MJ/kg; Mach 8.42 with a Reynolds number of 1.72 million/m and enthalpy of 1.21 MJ/kg; Mach 11.8 with a Reynolds number of 1.09 million/m and enthalpy of 2.03 MJ/kg; and Mach 11.25 with a Reynolds number of 0.90 million/m and enthalpy of 2.88 MJ/kg. For the experiments with carbon dioxide, typical freestream conditions were Mach 6.66 with a Reynolds number of 1.46 million/m and enthalpy of 1.23 MJ/kg. The shock stand-off distance was determined from images obtained through schlieren photography, the flow visualization technique employed here. The results were found to follow the same trend as the existing experimental data in the higher velocity range. The experimental data were compared with two theoretical models, by Lobb and by Olivier, and were found to match. Simulations were carried out in HiFUN, an in-house CFD package, under Euler and laminar flow conditions for Mach 8 flow over the 50 mm body with air as the test gas. The computational data were found to match well with the experimental and theoretical data.
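The dependence on the density ratio mentioned above follows from the Rankine-Hugoniot normal-shock relation. The sketch below pairs it with a generic inverse-density-ratio scaling for the stand-off distance; the constant k = 0.78 is a placeholder assumption for illustration, not a value taken from the thesis or from the Lobb or Olivier models:

```python
def density_ratio(M, gamma):
    """Rankine-Hugoniot density ratio rho2/rho1 across a normal shock."""
    return ((gamma + 1.0) * M**2) / ((gamma - 1.0) * M**2 + 2.0)

def standoff(radius, M, gamma, k=0.78):
    """Stand-off distance from the generic scaling delta/R = k * rho1/rho2;
    k is an assumed placeholder constant, not a fitted value."""
    return k * radius / density_ratio(M, gamma)

# Air (gamma = 1.4) vs CO2 (gamma ~ 1.29) at the Mach 8.42 condition:
# the lower gamma compresses the gas more, pulling the shock closer
# to the body, consistent with the role of gamma noted above.
print(density_ratio(8.42, 1.4))                                # about 5.6
print(standoff(0.025, 8.42, 1.29) < standoff(0.025, 8.42, 1.4))  # True
```

Note that as M grows, the density ratio saturates at (gamma + 1)/(gamma - 1), which is why the stand-off distance levels off in the high-velocity regime.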
|
230 |
Optimizations of hybrid sparse linear solvers relying on Schur complement and domain decomposition approaches. Casadei, Astrid, 19 October 2015.
In this thesis, we focus on the parallel solution of large sparse linear systems. Our main interest is in hybrid direct-iterative solvers such as HIPS, MaPHyS, PDSLIN or ShyLU, which rely on domain decomposition and Schur complement approaches. Although these solvers are not as time- and space-consuming as direct methods, they still suffer from serious overheads. In a first part, we present the existing techniques for reducing memory consumption, and we propose a new method which does not impact the numerical robustness of the preconditioner. This technique reduces the memory peak through a special scheduling of the computation, allocation, and freeing tasks, in particular in the Schur coupling blocks of the matrix. In a second part, we focus on the load balancing of the domain decomposition in a parallel context. This problem consists in partitioning the adjacency graph of the matrix into as many domains as desired. We point out that good load balancing for the most expensive steps of a hybrid solver such as MaPHyS relies on balancing both the interior nodes and the interface nodes of the domains. Until now, however, graph partitioners such as MeTiS and Scotch optimized only the first criterion (the balance of interior nodes) in the context of sparse matrix ordering. We propose several variations of the existing algorithms to balance interface nodes and interior nodes simultaneously. All our changes are implemented in the Scotch partitioner, and we present results on a large collection of matrices coming from real industrial cases.
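The two balancing criteria discussed above can be made concrete by counting, per domain, the nodes whose neighbours all stay inside the domain (interior) versus those with at least one neighbour in another domain (interface). A minimal sketch on an invented path graph, not the Scotch implementation:

```python
def partition_balance(adjacency, part):
    """Count interior and interface nodes per domain.

    adjacency[u] lists the neighbours of node u; part[u] is u's domain.
    A node is "interface" if any neighbour lies in another domain. A good
    decomposition for a hybrid solver balances both per-domain counts.
    """
    ndom = max(part) + 1
    interior = [0] * ndom
    interface = [0] * ndom
    for u, neigh in enumerate(adjacency):
        if any(part[v] != part[u] for v in neigh):
            interface[part[u]] += 1
        else:
            interior[part[u]] += 1
    return interior, interface

# A 6-node path graph 0-1-2-3-4-5 split into two domains of 3 nodes each:
# the cut edge 2-3 makes nodes 2 and 3 the interface of their domains.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
part = [0, 0, 0, 1, 1, 1]
print(partition_balance(adj, part))  # → ([2, 2], [1, 1])
```

A partitioner tuned only to the interior counts could return domains of equal size but wildly unequal interfaces, skewing exactly the Schur-complement steps the abstract identifies as the most expensive.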
|