Global ETD Search

61	Fast Numerical Techniques for Electromagnetic Problems in Frequency Domain Nilsson, Martin January 2003 The Method of Moments is a numerical technique for solving electromagnetic problems formulated as integral equations. The method discretizes a surface in three dimensions, which reduces the dimension of the problem by one. A drawback of the method is that it yields a dense system of linear equations, which effectively prohibits the solution of large-scale problems. Papers I-III describe the Fast Multipole Method, which reduces the cost of computing a dense matrix-vector multiplication. This implies that large-scale problems can be solved on personal computers. In Paper I the error introduced by the Fast Multipole Method is analyzed. Paper II and Paper III describe the implementation of the Fast Multipole Method. The problem of computing monostatic Radar Cross Section involves many right-hand sides. Since the Fast Multipole Method computes only the product of a matrix and a vector, iterative techniques are used to solve the linear systems. The solution time for each system must be as low as possible; otherwise the total solution time becomes too large. Different techniques for reducing the work in the iterative solver are described in Papers IV-VI. Paper IV describes a block Quasi Minimal Residual method for several right-hand sides and a Sparse Approximate Inverse preconditioner, which together reduce the number of iterations significantly. Paper V and Paper VI describe a method based on linear algebra called the Minimal Residual Interpolation method. It reduces the work in an iterative solver by accurately computing an initial guess for the iterative method. Paper VII describes a hybrid of Physical Optics and the Fast Multipole Method. It can handle large problems that are out of reach for the Fast Multipole Method alone.
Fast Multipole Method Minimal Residual Interpolation Method of Moments fast solvers iterative methods multiple right-hand sides error analysis
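To illustrate the idea of seeding an iterative solver from earlier solves when there are many right-hand sides, the sketch below picks the combination of previously computed solutions whose residual is smallest for a new right-hand side. It is a generic least-squares construction for real-valued systems, not necessarily the Minimal Residual Interpolation method of Papers V and VI; the names `matvec`, `dot`, and `min_residual_guess` are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product (a fast method would replace this)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_residual_guess(A, solutions, b):
    """Return x0 in span(solutions) that minimises ||b - A x0||_2."""
    basis = []  # pairs (q, y) with q = A y and the q vectors orthonormal
    for x in solutions:
        y = list(x)
        q = matvec(A, y)
        # Gram-Schmidt on the images q; the same updates are applied to y,
        # so the relation q = A y is preserved throughout.
        for qj, yj in basis:
            c = dot(q, qj)
            q = [qi - c * v for qi, v in zip(q, qj)]
            y = [yi - c * v for yi, v in zip(y, yj)]
        norm = dot(q, q) ** 0.5
        if norm > 1e-12:  # skip directions that add nothing new
            basis.append(([qi / norm for qi in q], [yi / norm for yi in y]))
    # Project b onto span{q}; the matching combination of y is the guess.
    x0 = [0.0] * len(b)
    for q, y in basis:
        c = dot(b, q)
        x0 = [x0i + c * yi for x0i, yi in zip(x0, y)]
    return x0
```

The guess would then be passed to the iterative solver as its starting vector; the closer the new right-hand side lies to the span of the earlier ones, the fewer iterations remain.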
62	A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms / Um sistema de escalonamento dinâmico e tuning em tempo de execução para plataformas desktop heterogêneas de múltiplos núcleos Binotto, Alécio Pedro Delazari January 2011 A modern personal computer can now be considered a one-node heterogeneous cluster that simultaneously processes tasks from several applications. It can be composed of asymmetric Processing Units (PUs), such as the multi-core Central Processing Unit (CPU) and the many-core Graphics Processing Unit (GPU), which has become one of the main co-processors contributing to high-performance computing, among other PUs. In this way, a powerful heterogeneous execution platform is built on a desktop for data-intensive calculations. From the perspective of this thesis, distributing an application's workload over the PUs plays a key role in improving application performance and exploiting such heterogeneity. This issue presents challenges, since the execution cost of a high-level task on a PU is non-deterministic and can be affected by a number of parameters not known a priori, such as the size of the problem domain and the precision of the solution. Within this scope, this doctoral research introduces a context-aware runtime and performance tuning system based on a compromise between reducing the execution time of the applications, through appropriate dynamic scheduling of high-level tasks, and the cost of computing the scheduling itself, applied on a platform composed of a CPU and GPUs. The approach combines a model for a first scheduling, based on task performance profiles acquired off-line, with a runtime model that keeps track of the tasks' real execution times and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. To that end, a set of heuristics is proposed to schedule tasks over one CPU and one GPU, together with a generic and efficient scheduling strategy that considers several processing units. The proposed approach is applied in a case study using a CPU-GPU execution platform to compute iterative solvers for systems of linear equations, using a stencil code specially designed to exploit the characteristics of modern GPUs. The solution uses the number of unknowns as the main parameter for the assignment decision. By scheduling tasks to both the CPU and the GPU, a performance gain of 21.77% is achieved in comparison with the static assignment of all tasks to the GPU (which is what current programming models, such as OpenCL and CUDA for Nvidia, do), with a scheduling error of only 0.25% compared to exhaustive search.
Processamento paralelo Microeletrônica Processamento : Imagem Processamento : Alto desempenho High-performance computing Scheduling Dynamic load-balancing Heterogenous systems Graphics processors Solvers for systems of linear equations
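The combination of an off-line performance profile with runtime measurements can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's heuristics: the class `ProfileScheduler`, the per-unknown cost model, the earliest-finish-time rule, and the exponential smoothing are all hypothetical choices made for the example.

```python
class ProfileScheduler:
    """Assign tasks to processing units using profiled and measured costs."""

    def __init__(self, profile, smoothing=0.5):
        # profile: {pu_name: estimated seconds per unknown}, assumed to come
        # from an off-line benchmarking run.
        self.cost = dict(profile)
        self.busy_until = {pu: 0.0 for pu in profile}
        self.smoothing = smoothing

    def assign(self, n_unknowns):
        """Pick the PU with the earliest estimated finish time for this task."""
        def finish(pu):
            return self.busy_until[pu] + self.cost[pu] * n_unknowns

        pu = min(self.cost, key=finish)
        self.busy_until[pu] = finish(pu)
        return pu

    def record(self, pu, n_unknowns, measured_seconds):
        """Blend a measured run time into the per-unknown cost estimate."""
        observed = measured_seconds / n_unknowns
        s = self.smoothing
        self.cost[pu] = (1 - s) * self.cost[pu] + s * observed


scheduler = ProfileScheduler({"cpu": 2.5e-6, "gpu": 1.0e-6})
placements = [scheduler.assign(1000) for _ in range(3)]
```

In this toy run the GPU receives the first tasks because its profiled cost is lower, and the CPU takes over once the GPU queue makes its estimated finish time later than the CPU's. Sending every task to the GPU would correspond to the static assignment used as the baseline in the abstract.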
63	Finding inductive invariants using satisfiability modulo theories and convex optimization / Recherche d'invariants inductifs par satisfiabilité modulo théorie et optimisation convexe Karpenkov, George Egor 29 March 2017 Static analysis concerns itself with deriving program properties which hold universally for all program executions. Such properties are used to prove desirable characteristics of software (e.g. that no overflow or other runtime error ever occurs, regardless of the particular execution) and are almost invariably established using inductive invariants: properties which hold for the initial state and imply themselves under the program transition, and thus hold universally by induction. A traditional approach for finding numerical invariants is abstract interpretation, which can be seen as interpreting the program in the abstract domain of choice, tracking only properties of interest. Yet even in the intervals abstract domain (upper and lower bounds for each variable) such a computation does not necessarily converge, and the analysis has to resort to widenings to enforce convergence at the cost of precision. An alternative game-theoretic approach called policy iteration guarantees to find the least inductive invariant in the chosen abstract domain within a finite number of iterations. Yet the original description of the algorithm has a number of drawbacks: it requires converting the entire program to an equation system, does not integrate with other approaches, and is unable to benefit from other analyses. Our new algorithm for running local policy iteration (LPI) instead formulates policy iteration as traditional Kleene iteration, with a widening operator that guarantees to return the least inductive invariant in the domain after finitely many applications. Local policy iteration runs in template linear constraint domains, which require setting in advance the "shape" of the derived invariant (e.g. $x + 2y$ for deriving $x + 2y \leq 10$). Our second theoretical contribution involves the development and comparison of a number of different template synthesis strategies, used in conjunction with LPI. Additionally, we present an approach for generating abstract reachability trees using abstract interpretation, enabling the construction of counterexample traces, which in turn lets us generate new templates using Craig interpolants. In our third contribution we turn our attention to interprocedural and potentially recursive programs. We develop an algorithm for summary generation that is parameterizable with any abstract interpretation, and we study its parameterization with LPI. The resulting approach is able to generate least inductive invariants in the domain for a fixed number of summaries for recursive programs. Our final theoretical contribution is a novel "formula slicing" method for finding potentially disjunctive inductive invariants from program fragments obtained by symbolic execution. We implement all of these techniques in the open-source, state-of-the-art CPAchecker program analysis framework, enabling communication and collaboration between different analyses. The techniques mentioned above rely on satisfiability modulo theories solvers, which are capable of giving solutions to first-order formulas over certain theories or showing that none exists. In order to simplify communication with such tools we present the JavaSMT library, which provides a generic interface for such communication. The library has shown itself to be a valuable tool, and is already used by many researchers.
Analyse statique de programmes Satisfiabilité modulo théorie Itération de politiques Analyse interprocédurale Procédures récursives SMT Static analysis Convex optimization SMT solvers Intraprocedural Recursive Smt 004
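The trade-off between Kleene iteration and widening in the intervals domain can be seen on a toy loop. The sketch below analyses the hypothetical program `x = 0; while x < 100: x = x + 1` at the loop head; it is ordinary interval analysis with the standard interval widening, not the local policy iteration algorithm of the thesis, and all function names are assumptions made for the example.

```python
import math

INIT = (0, 0)     # x = 0 before the loop
GUARD_MAX = 99    # over the integers, the body runs while x <= 99


def join(a, b):
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def step(iv):
    """Abstract effect of one loop iteration: assume the guard, then x = x + 1."""
    lo, hi = iv[0], min(iv[1], GUARD_MAX)
    if lo > hi:
        return None  # guard unsatisfiable: the body is not entered
    return (lo + 1, hi + 1)


def transfer(iv):
    """Loop-head states: the initial state joined with states after one iteration."""
    nxt = step(iv)
    return INIT if nxt is None else join(INIT, nxt)


def widen(old, new):
    """Standard interval widening: a bound that grows jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def analyse(use_widening):
    """Iterate to a fixed point; return the invariant and the number of rounds."""
    iv, rounds = INIT, 0
    while True:
        new = transfer(iv)
        nxt = widen(iv, new) if use_widening else join(iv, new)
        rounds += 1
        if nxt == iv:
            return iv, rounds
        iv = nxt


precise, slow_rounds = analyse(use_widening=False)
coarse, fast_rounds = analyse(use_widening=True)
```

Both results are inductive, since applying `transfer` to them yields an interval they already contain. Plain iteration reaches the least interval invariant but needs a number of rounds proportional to the loop bound, while widening stops after a couple of rounds and loses the upper bound. That loss of precision is what approaches such as policy iteration aim to avoid.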
64	Methods for solving discontinuous-Galerkin finite element equations with application to neutron transport / Méthodes de résolution d'équations aux éléments finis Galerkin discontinus et application à la neutronique Murphy, Steven 26 August 2015 We consider high-order discontinuous-Galerkin finite element methods for partial differential equations, with a focus on the neutron transport equation. We begin by examining a method for preprocessing block-sparse matrices, of the type that arise from discontinuous-Galerkin methods, prior to factorisation by a multifrontal solver. Numerical experiments on large two- and three-dimensional matrices show that this preprocessing method achieves a significant reduction in fill-in when compared to methods that fail to exploit block structure. A discontinuous-Galerkin finite element method for the neutron transport equation is derived that employs high-order finite elements in both space and angle. Parallel Krylov subspace based solvers are considered for both source problems and $k_{eff}$-eigenvalue problems. An a-posteriori error estimator is derived and implemented as part of an h-adaptive mesh refinement algorithm for neutron transport $k_{eff}$-eigenvalue problems. This algorithm employs a projection-based error splitting in order to balance the computational requirements between the spatial and angular parts of the computational domain. An hp-adaptive algorithm is presented, and results are collected that demonstrate greatly improved efficiency compared to the h-adaptive algorithm, both in terms of reduced computational expense and enhanced accuracy. Computed eigenvalues and effectivities are presented for a variety of challenging industrial benchmarks. Accurate error estimation (with effectivities of 1) is demonstrated for a collection of problems with inhomogeneous, irregularly shaped spatial domains as well as multiple energy groups. Numerical results are presented showing that the hp-refinement algorithm can achieve exponential convergence with respect to the number of degrees of freedom in the finite element space.
Méthodes a posteriori Algorithmes HP-adaptatif Méthodes Galerkin discontinus Neutronique Matrices creuses Solveurs linéaires A-posteriori methods Hp-refinement Discontinuous-Galerkin methods Neutron Transport Sparse matrices Linear Solvers
67	Three-and four-derivative Hermite-Birkhoff-Obrechkoff solvers for stiff ODE Albishi, Njwd January 2016 Three- and four-derivative k-step Hermite-Birkhoff-Obrechkoff (HBO) methods are constructed for solving stiff systems of first-order differential equations of the form y' = f(t, y), y(t_0) = y_0. These methods use higher derivatives of the solution y, as in Obrechkoff methods. We compute their regions of absolute stability and show that the three- and four-derivative HBO methods are A(α)-stable with α > 71° and α > 78°, respectively. We conduct numerical tests and show that our new methods are more efficient than several existing well-known methods.
general linear method for stiff ODE's Hermite-Birkhoff-Obrechkoff method maximum end error number of function evaluations CPU time comparing stiff ODE solvers.
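To show what using higher derivatives of the solution buys on a stiff problem, the sketch below evaluates the growth factor of a classic one-step, two-derivative Obrechkoff formula, y_{n+1} = y_n + (h/2)(y'_n + y'_{n+1}) + (h^2/12)(y''_n - y''_{n+1}), on the test equation y' = λy. This formula is chosen only for illustration and is not one of the k-step HBO methods constructed in the thesis; the function names are hypothetical.

```python
import math


def obrechkoff_factor(z):
    """Growth factor y_{n+1} / y_n on y' = lam * y, where z = h * lam.

    Substituting y' = lam * y and y'' = lam**2 * y into the formula gives
    (1 - z/2 + z^2/12) * y_{n+1} = (1 + z/2 + z^2/12) * y_n.
    """
    return (1 + z / 2 + z * z / 12) / (1 - z / 2 + z * z / 12)


def euler_factor(z):
    """Growth factor of explicit Euler, shown for comparison."""
    return 1 + z


stiff_z = -100.0  # e.g. lam = -1000 with step size h = 0.1
damped = abs(obrechkoff_factor(stiff_z)) < 1   # stays bounded
unstable = abs(euler_factor(stiff_z)) > 1      # blows up
small_step_error = abs(obrechkoff_factor(0.1) - math.exp(0.1))
```

A method is absolutely stable at z when the growth factor has modulus at most 1. Evaluating `obrechkoff_factor` over a grid of complex z is one simple way to plot a stability region like those computed in the thesis.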
68	Unconditionally stable finite difference time domain methods for frequency dependent media Rouf, Hasan January 2010 (has links) The efficiency of the conventional, explicit finite difference time domain (FDTD)method is constrained by the upper limit on the temporal discretization, imposed by the Courant–Friedrich–Lewy (CFL) stability condition. Therefore, there is a growing interest in overcoming this limitation by employing unconditionally stable FDTD methods for which time-step and space-step can be independently chosen. Unconditionally stable Crank Nicolson method has not been widely used in time domain electromagnetics despite its high accuracy and low anisotropy. There has been no work on the Crank Nicolson FDTD (CN–FDTD) method for frequency dependent medium. In this thesis a new three-dimensional frequency dependent CN–FDTD (FD–CN–FDTD) method is proposed. Frequency dependency of single–pole Debye materials is incorporated into the CN–FDTD method by means of an auxiliary differential formulation. In order to provide a convenient and straightforward algorithm, Mur’s first-order absorbing boundary conditions are used in the FD–CN–FDTD method. Numerical tests validate and confirm that the FD–CN–FDTD method is unconditionally stable beyond the CFL limit. The proposed method yields a sparse system of linear equations which can be solved by direct or iterative methods, but numerical experiments demonstrate that for large problems of practical importance iterative solvers are to be used. The FD–CN–FDTD sparse matrix is diagonally dominant when the time-stepis near the CFL limit but the diagonal dominance of the matrix deteriorates with the increase of the time-step, making the solution time longer. Selection of the matrix solver to handle the FD–CN–FDTD sparse system is crucial to fully harness the advantages of using larger time-step, because the computational costs associated with the solver must be kept as low as possible. 
Two of the best-known iterative solvers, Bi-Conjugate Gradient Stabilised (BiCGStab) and Generalised Minimal Residual (GMRES), are extensively studied in terms of the number of iterations required for convergence, CPU time and memory requirements. BiCGStab outperforms GMRES in every aspect. Many of these findings do not match the existing literature on the frequency–independent CN–FDTD method, and the possible reasons for this are pointed out. The proposed method is coded in Fortran, and major implementation techniques of the serial code as well as its parallel implementation in Open Multi-Processing (OpenMP) are presented. As an application, a simulation model of the human body is developed in the FD–CN–FDTD method and a numerical simulation of electromagnetic wave propagation inside the human head is shown. Finally, this thesis presents a new method modifying the frequency dependent alternating direction implicit FDTD (FD–ADI–FDTD) method. Although the ADI–FDTD method provides a computationally affordable approximation of the CN–FDTD method, it exhibits a loss of accuracy with respect to the CN–FDTD method which may become severe for some practical applications. The modified FD–ADI–FDTD method can improve the accuracy of the normal FD–ADI–FDTD method without significantly increasing the computational costs. 621.381
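The CFL limit referred to in the abstract above can be made concrete for the standard explicit three-dimensional FDTD (Yee) scheme, where the time-step must satisfy Δt ≤ 1 / (c · sqrt(1/Δx² + 1/Δy² + 1/Δz²)). The sketch below evaluates that bound; it assumes a vacuum-filled grid, and the function name and the 1 mm cell size are hypothetical choices for illustration, not values taken from the thesis.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, in m/s


def cfl_time_step(dx, dy, dz, c=C0):
    """Largest stable time-step (seconds) for explicit 3D FDTD on a Yee grid.

    dx, dy, dz are the cell sizes in metres and c is the wave speed in m/s.
    """
    return 1.0 / (c * math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2))


# Hypothetical uniform grid with 1 mm cells.
dt_max = cfl_time_step(1e-3, 1e-3, 1e-3)
print("CFL limit on the time-step:", dt_max, "s")
```

For a uniform grid the bound reduces to Δx / (c·√3). An unconditionally stable scheme such as CN–FDTD may use a time-step larger than this value, at the cost of solving a sparse linear system at every step.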
69	Test generation and animation based on object-oriented specifications / Génération de tests et animation à partir de spécifications orientées objet Krieger, Matthias 09 December 2011 (has links) L'objectif de cette thèse est l'assistance à la génération de tests et à l'animation de spécifications orientées objet. Nous cherchons en particulier à profiter de l'état de l'art des techniques de résolution de satisfaisabilité en utilisant une représentation appropriée des données orientées objet. Alors que la génération automatique de cas de tests recherche un large ensemble de valeurs à fournir en entrée d'une application, l’animation de spécifications effectue les calculs qui sont conformes à une spécification à partir de valeurs fournies par l'utilisateur. L'animation est une technique importante pour la validation des spécifications. Comme fondement de ce travail, nous présentons des clarifications et une formalisation partielle du langage de spécification OCL (Object Constraint Language) ainsi que quelques extensions, afin de permettre la génération de tests et l'animation à partir de spécifications OCL. Pour la génération de tests, nous avons implémenté plusieurs améliorations à HOL-TestGen, outil basé sur le démonstrateur de théorème Isabelle, qui engendre des tests à partir de spécifications en Logique d’Ordre Supérieure (Higher-Order Logic ou HOL). Nous montrons comment des solveurs SMT peuvent être utilisés pour résoudre différents types de contraintes en HOL et nous présentons une approche modulaire de raisonnement par cas pour dériver des cas de tests. Cette dernière approche facilite l'introduction de règles de décomposition par cas qui sont adaptées aux spécifications orientées objet. Pour l'animation de spécifications, nous avons développé OCLexec, outil d'animation de spécifications en OCL. A partir de contrats de fonctions OCLexec produit les implémentations Java correspondantes qui appellent un solveur de contraintes SMT lors de leur exécution. 
/ The goal of this thesis is the development of support for test generation and animation based on object-oriented specifications. We aim particularly to take advantage of state-of-the-art satisfiability solving techniques by using an appropriate representation of object-oriented data. While automated test generation seeks a large set of data to execute an implementation on, animation performs computations that comply with a specification based on user-provided input data. Animation is a valuable technique for validating specifications. As a foundation of this work, we present clarifications and a partial formalization of the Object Constraint Language (OCL) as well as some extensions in order to allow for test generation and animation based on OCL specifications. For test generation, we have implemented several enhancements to HOL-TestGen, a tool built on top of the Isabelle theorem proving system that generates tests from specifications in Higher-Order Logic (HOL). We show how SMT solvers can be used to solve various types of constraints in HOL and present a modular approach to case splitting for deriving test cases. The latter facilitates the introduction of splitting rules that are tailored to object-oriented specifications. For animation, we implemented the tool OCLexec for animating OCL specifications. From operation contracts, OCLexec generates corresponding Java implementations that call an SMT-based constraint solver at runtime. Génération de tests Animation Execution de modèle UML OCL Solveurs SAT Isabelle/HOL Test generation Animation Model Execution UML OCL SAT solvers Isabelle/HOL
70	Modèle géométrique déformable pour la simulation et l’optimisation automatique de forme / Geometric modelling and deformation for automatic shape optimisation Berrini, Elisa 07 June 2017 (has links) Le contrôle précis des modèles géométriques joue un rôle important dans de nombreux domaines. Pour l’optimisation de forme en CFD, le choix des paramètres de contrôle et la technique de déformation de forme est critique. Nous proposons un modeleur paramétrique avec une nouvelle méthode de déformation d’objets, ayant pour objectif d’être intégré dans une boucle d’optimisation automatique de forme avec un solveur CFD. Notre méthodologie est basée sur une double paramétrisation des objets : géométrique et architecturale. L’approche géométrique consiste à décrire les formes par un squelette, composé d’une famille de courbes B-Splines, appelées courbes génératrice et courbes de section. Le squelette est paramétré avec une approche architecturale. Au lieu d’utiliser les points de contrôle de la représentation classique par courbes B-Splines, la géométrie est contrôlée par ces paramètres architecturaux. Cela permet de réduire considérablement le nombre de degrés de liberté utilisés dans le problème d’optimisation de forme, et permet de maintenir une description haut niveau des objets. Notre technique intègre un contrôle de forme et un contrôle de régularité, permettant d’assurer la génération de nouvelles formes valides et réalistes. Les déformations de la géométrie sont réalisées en posant un problème inverse : déterminer une géométrie correspondant à un jeu de paramètres cibles. Enfin, une technique de reconstruction de surface est proposée. Nous illustrons le modeleur paramétrique développé et intégré dans une boucle d’optimisation automatique de forme sur trois cas : un profil d’aile d’avion, un foil AC45 d’un voilier de course et un bulbe de chalutier de pêche. / The precise control of geometric models plays an important role in many domains. 
For shape optimisation in CFD, the choice of control parameters and the way to deform a shape are critical. In this thesis, we propose a new approach to shape deformation for parametric modellers with the purpose of being integrated into an automatic shape optimisation loop with a CFD solver. Our methodology is based on a twofold parameterisation: geometrical and architectural. The geometrical approach consists of a skeleton-based representation of objects. The skeleton is made of a family of B-Spline curves, called the generating curve and section curves. The skeleton is parametrised with an architectural approach: meaningful design parameters are chosen on the studied object. Thus, instead of using the control points of a classical B-Spline representation, we control the geometry in terms of architectural parameters. This reduces the number of degrees of freedom and maintains a high-level description of shapes. We ensure that valid shapes are generated by means of a strong shape consistency control based on architectural considerations. Deformations of the geometry are performed by solving optimisation problems on the skeleton. Finally, a surface reconstruction method is proposed to evaluate the shape’s performance with CFD solvers. We illustrate the parametric modeller's capabilities on three problems, performed with an automatic shape optimisation loop: the wing section of a plane (airfoil), the foil of an AC45 racing sail boat and the bulbous bow of a fishing trawler. Modeleur paramétrique Conception assistée par ordinateur Solveurs (logiciels) Modélisation CFD Architecture navale Parametric modeller Geometrical model Automatic shape optimization CAD Numerical solvers CFD Naval architecture
