Global ETD Search

11	Implementation methodology for using concurrent and collaborative approaches for theorem provers, with case studies of SAT and LCF style provers G, Sriipriya January 2013 (has links) Theorem provers are faced with the challenges of size and complexity, fueled by the increasing range of applications. The use of concurrent/ distributed programming paradigms to engineer better theorem provers merits serious investigation, as it provides: more processing power and opportunities for implementing novel approaches to address theorem proving tasks hitherto infeasible in a sequential setting. Investigation of these opportunities for two diverse theorem prover settings with an emphasis on desirable implementation criteria is the core focus of this thesis. Concurrent programming is notoriously error prone, hard to debug and evaluate. Thus, implementation approaches which promote easy prototyping, portability, incremental development and effective isolation of design and implementation can greatly aid the enterprise of experimentation with the application of concurrent techniques to address specific theorem proving tasks. In this thesis, we have explored one such approach by using Alice ML, a functional programming language with support for concurrency and distribution, to implement the prototypes and have used programming abstractions to encapsulate the implementations of the concurrent techniques used. The utility of this approach is illustrated via proof-of-concept prototypes of concurrent systems for two diverse case studies of theorem proving: the propositional satisfiability problem (SAT) and LCF style (first-order) theorem proving, addressing some previously unexplored parallelisation opportunities for each, as follows:. SAT: We have developed a novel hybrid approach for SAT and implemented a prototype for the same: DPLL-Stalmarck. It uses two complementary algorithms for SAT, DPLL and Stalmarck’s. The two solvers run asynchronously and dynamic information exchange is used for co-operative solving. Interaction of the solvers has been encapsulated as a programming abstraction. Compared to the standalone DPLL solver, DPLL-Stalmarck shows significant performance gains for two of the three problem classes considered and comparable behaviour otherwise. As an exploratory research effort, we have developed a novel algorithm, Concurrent Stalmarck, by applying concurrent techniques to the Stalmarck algorithm. A proof-of-concept prototype for the same has been implemented. Implementation of the saturation technique of the Stalmarck algorithm in a parallel setting, as implemented in Concurrent Stalmarck, has been encapsulated as a programming abstraction. LCF: Provision of programmable concurrent primitives enables customisation of concurrent techniques to specific theorem proving scenarios. In this case study, we have developed a multilayered approach to support programmable, sound extensions for an LCF prover: use programming abstractions to implement the concurrent techniques; use these to develop novel tacticals (control structures to apply tactics), incorporating concurrent techniques; and use these to develop novel proof search procedures. This approach has been implemented in a prototypical LCF style first-order prover, using Alice ML. New tacticals developed are: fastest-first; distributed composition; crossTalk: a novel tactic which uses dynamic, collaborative information exchange to handle unification across multiple sub-goals, with shared meta-variables; a new tactic, performing simultaneous proof-refutation attempts on propositional (sub- )goals, by invoking an external SAT solver (SAT case study), as a counter-example finder. Examples of concrete theorem proving scenarios are provided, demonstrating the utility of these extensions. Synthesis of a variety of automatic proof search procedures has been demonstrated, illustrating the scope of programmability and customisation, enabled by our multilayered approach. 006.3
12	Numerical modelling and visualization of the evolution of extensional fault systems Longshaw, Stephen Michael January 2011 (has links) The purpose of this work is split into two categories, the first was to analyse the application of real-time Physics Engine software libraries for use in calculating a geological numerical model. Second was the analysis of the applicability of glyph and implicit surface based visualization techniques to explore fault systems produced by the model. The current state of the art in Physics Engines was explored by redeveloping a Discrete Element Model to be calculated using NVIDIA's PhysX engine. Analyses regarding the suitability of the engine in terms of numerical accuracy and developmental capabilities is given, as well as the definition of a specialised and bespoke parallelisation technique. The use of various glyph based visualizations is explored to define a new standardised taxonomy for geological data and the MetaBall visualization technique was applied to reveal three dimensional fault structures as an implicit surface. Qualitative analysis was undertaken in the form of a user study, comprising of interviews with expert geologists. The processing pipeline used by many Physics Engines was found to be comparable to the design of Discrete Element Model software, however, aspects of their design, such as integration accuracy, limitation to single precision floating point and imposed limits on the scale of n-body problem means their suitability is restricted to specific modelling cases. Glyph and implicit surface based visualization have been shown to be an effective way to present a geological Discrete Element Model, with the majority of experts interviewed able to perceive the fault structures that it contained. Development of a new engine, or modification of one that exists in accordance with the findings of this thesis would result in a library extremely well suited to the problem of rigid-body simulation for the sciences. 502.85
13	Contribution à la parallélisation automatique : un modèle de processeur à beaucoup de coeurs parallélisant. / Contribution to the automatic parallelization : the model of the manycore parallelizing processor Porada, Katarzyna 14 November 2017 (has links) Depuis les premiers ordinateurs on est en quête de machines plus rapides, plus puissantes, plus performantes. Après avoir épuisé le filon de l’augmentation de la fréquence, les constructeurs se sont tournés vers les multi-cœurs. Le modèle de calcul actuel repose sur les threads de l'OS qu’on exploite à travers différents langages à constructions parallèles. Cependant, la programmation multithread reste un art délicat car le calcul parallèle découpé en threads souffre d’un grand défaut : il est non déterministe.Pourtant, on peut faire du calcul parallèle déterministe, à condition de remplacer le modèle des threads par un modèle s’appuyant sur l’ordre partiel des dépendances. Dans cette thèse, nous proposons un modèle alternatif d’architecture qui exploite le parallélisme d’instructions (ILP) présent dans les programmes. Nous proposons de nombreuses techniques pour s’affranchir de la plupart des dépendances architecturales et obtenir ainsi un ILP qui croît avec la taille de l’exécution. L’ILP qu’on atteint de cette façon est suffisant pour permettre d’alimenter plusieurs milliers de cœurs. Les dépendances architecturales sérialisantes ayant été supprimées, l’ILP peut être bien mieux exploité que dans les architectures actuelles. Un code VHDL au niveau RTL de l’architecture a été développé pour en mesurer les avantages. Les résultats de synthèse d’un processeur allant de 2 à 64 cœurs montrent que la vitesse du matériel que nous proposons reste constante et que sa surface varie linéairement avec le nombre de cœurs. Cela prouve que le modèle d’interconnexion proposé est extensible. / The pursuit for faster and more powerful machines started from the first computers. After exhausting the increase of the frequency, the manufacturers have turned to another solution and started to introduce multiples cores on a chip. The computational model is today based on the OS threads exploited through different languages offering parallel constructions. However, parallel programming remains an art because the thread management by the operating system is not deterministic.Nonetheless, it is possible to compute in a parallel deterministic way if we replace the thread model by a model built on the partial order of dependencies. In this thesis, we present an alternative architectural model exploiting the Instruction Level Parallelism (ILP) naturally present in applications. We propose many techniques to remove most of the architectural dependencies which leads to an ILP increasing with the execution length. The ILP which is reached this way is enough to allow feeding thousands of cores. Eliminating the architecutral dependencies serializing the run allows to exploit the ILP better than in actual microarchitectures. A VHDL code at the RTL level has been implemented to mesure the benefits of our design. The results of the synthesis of a processeur ranging from 2 to 64 cores are reported. They show that the speed of the proposed material keeps constant and the surface grows linearly with the number of cores : our interconnect solution is scalable. Processeur à beaucoup de cœurs Déterminisme Parallélisation automatique Description VHDL RTL FPGA Many-core processor Determinism Automatic parallelisation VHDL RTL FPGA 004
14	Contribution à l'évaluation d'attributs et l'optimisation mémoire sur machines multiprocesseurs Marmol, Bruno 01 December 1995 (has links) (PDF) Les grammaires attribuées offrent un formalisme très adapté à la détection du parallélisme et à la parallélisation. Les graphes de dépendances associés à chaque production correspondent en effet à des graphes de flot de contrôle. Grâce aux grammaires attribuées (\it l)-ordonnées, il est possible de calculer statiquement un ordre total sur les attributs des non-terminaux qui soit compatible avec l'ordre partiel induit par les graphes de dépendances, ce qui évite grand nombre de synchronisations dynamiques. Toutefois, il apparaît que le parallélisme inhérent à ces graphes est beaucoup trop important en pratique pour supporter une parallélisation complète. Notre but a été de montrer qu'il est possible de sélectionner le parallélisme pour obtenir une parallélisation efficace en pratique. Pour cela, l'évaluateur parallèle a été implanté dans un système réel de traitement des grammaires attribuées qu'est le système (\sc FNC-2) et porté sur plusieurs plateformes (KSR1, Multimax et Sequent). Plusieurs types d'implantations ont été effectués afin d'étudier l'influence de la méthode d'évaluation sur la parallélisation. Les méthodes que nous avons utilisées s'appliquent à des architectures à mémoire partagée. Sur les machines testées, les résultats obtenus sont très encourageants malgré l'absence d'utilisation de caractéristiques propres à chaque machine. Un deuxième problème soulevé par le parallélisme est l'explosion mémoire qui a lieu pendant l'évaluation. En séquentiel, cette consommation a été largement limitée par l'utilisation d'un optimiseur mémoire qui permet le partage des instances d'attributs en dehors de l'arbre. Deux structures sont utilisées\,: la variable globale et la pile. Nous avons proposé une méthode pour étendre cette optimisation mémoire au cas parallèle ce qui permet d'une part de sortir les attributs de l'arbre même en parallèle et d'autre part d'éliminer de nombreuses règles de copie. SCIENCES APPLIQUEES : INFORMATIQUE AUTOMATIQUE THEORIQUE SYSTEMES GRAMMAIRE PARALLELISME EVALUATION PARALLELISATION GESTION MEMOIRE GRAPHE PROGRAMME DEPENDANCE GRAMMAIRE ATTRIBUEE EVALUATEUR I-ORDONNEE
15	Parallel algorithms for target tracking on multi-coreplatform with mobile LEGO robots Wahlberg, Fredrik January 2011 (has links) The aim of this master thesis was to develop a versatile and reliable experimentalplatform of mobile robots, solving tracking problems, for education and research.Evaluation of parallel bearings-only tracking and control algorithms on a multi-corearchitecture has been performed. The platform was implemented as a mobile wirelesssensor network using multiple mobile robots, each using a mounted camera for dataacquisition. Data processing was performed on the mobile robots and on a server,which also played the role of network communication hub. A major focus was toimplement this platform in a flexible manner to allow for education and futureresearch in the fields of signal processing, wireless sensor networks and automaticcontrol. The implemented platform was intended to act as a bridge between the idealworld of simulation and the non-ideal real world of full scale prototypes.The implemented algorithms did estimation of the positions of the robots, estimationof a non-cooperating target's position and regulating the positions of the robots. Thetracking algorithms implemented were the Gaussian particle filter, the globallydistributed particle filter and the locally distributed particle filter. The regulator triedto move the robots to give the highest possible sensor information under givenconstraints. The regulators implemented used model predictive control algorithms.Code for communicating with filters in external processes were implementedtogether with tools for data extraction and statistical analysis.Both implementation details and evaluation of different tracking algorithms arepresented. Some algorithms have been tested as examples of the platformscapabilities, among them scalability and accuracy of some particle filtering techniques.The filters performed with sufficient accuracy and showed a close to linear speedupusing up to 12 processor cores. Performance of parallel particle filtering withconstraints on network bandwidth was also studied, measuring breakpoints on filtercommunication to avoid weight starvation. Quality of the sensor readings, networklatency and hardware performance are discussed. Experiments showed that theplatform was a viable alternative for data acquisition in algorithm development and forbenchmarking to multi-core architecture. The platform was shown to be flexibleenough to be used a framework for future algorithm development and education inautomatic control.
16	Étude théorique de la transition de phase α<->γ du cérium : prise en compte des fortes corrélations en DFT+DMFT Bieder, Jordan 17 October 2013 (has links) (PDF) La transition de phase isostructurale du cérium a été et reste l'objet de nombreuses études pour tester les méthodes permettant de décrire les matériaux fortement corrélés.La Théorie du Champ Moyen Dynamique (DMFT) jointe à la Théorie de la fonctionnelle de la densité à permis de décrire de tels systèmes.Pourtant, le calcul des propriétés de l'état fondamental nécessite une très bonne précision de calcul à la fois de la part de la DFT et de la DMFT.Nous utilisons un résolveur Monte Carlo Quantique en Temps Continu (CT-QMC), rapide et capable de simuler les basses températures, combiné à une implantation ondes planes augmentées par projection de la DMFT pour calculer les énergies internes et libres -- et par conséquent l'entropie -- au cours de la transition de phase du cérium.D'importants calculs, utilisant cette implantation, nous ont permis de reconsidérer les propriétés de l'état fondamental et une grande partie de la thermodynamique de la transition de phase α<->γ du cérium à basses températures.En particulier, le bruit stochastique est suffisamment faible pour interpréter, sans ambiguïté, les courbes énergie en fonction du volume.Sur ces dernières, un double point d'inflexion est clairement visible pour l'énergie interne jusqu'à une température relativement basse.Les courbes d'énergie libre mettent, de plus, en évidence l'importance de l'entropie pour ce système.D'autre part, les spectres de photoemission tout au long de la transition de phase sont analysés.Le schéma DMFT est comparé avec des calculs DFT récents et des données expérimentales récentes.Enfin, nous mettons en avant les approximations utilisées et nous nous interrogeons sur leurs validité. Ab-initio Corrélations Premiers principes Localisation Kondo DFT Monte Carlo Cérium Transition de phase DMFT Thermodynamique Phonons Parallelisation
17	GPU-akcelerovná syntéza pravděpodobnostních programů / GPU-Accelerated Synthesis of Probabilistic Programs Marcin, Vladimír January 2021 (has links) V tejto práci sa zoberáme problémom automatizovanej syntézy pravdepodobnostných programov: majme konečnú rodinu kandidátnych programov, v ktorej chceme efektívne identifikovať program spĺňajúci danú špecifikáciu. Aj riešenie tých najjednoduchších syntéznych problémov v praxi predstavuje NP-ťažký problém. Pokrok v tejto oblasti prináša nástroj Paynt, ktorý na riešenie tohto problému používa novú integrovanú metódu syntézy pravdepodobnostných programov. Aj keď sa tento prístup dokáže efektívne vysporiadať s exponenciálnym rastom rodín kandidátnych riešení, stále tu existuje problém spôsobený exponenciálnym rastom jednotlivých členov týchto rodín. S cieľom vysporiadať sa aj s týmto problémom, sme implementovali GPU orientované algoritmy slúžiace na overovanie kandidátnych programov (modelov), ktoré danú úlohu paralelizujú na stavovej úrovni pravdepodobnostých modelov. Celkové zrýchlenie doshiahnuté týmto prístupom za určitých podmienok potom prinieslo takmer teoretický limit možného zrýchlenia syntézneho procesu.
18	Traitement de données multi-spectrales par calcul intensif et applications chez l'homme en imagerie par résonnance magnétique nucléaire / Processing of multi-spectral data by high performance computing and its applications on human nuclear magnetic resonance imaging Angeletti, Mélodie 21 February 2019 (has links) L'imagerie par résonance magnétique fonctionnelle (IRMf) étant une technique non invasive pour l'étude de cerveau, elle a été employée pour comprendre les mécanismes cérébraux sous-jacents à la prise alimentaire. Cependant, l'utilisation de stimuli liquides pour simuler la prise alimentaire engendre des difficultés supplémentaires par rapport aux stimulations visuellement habituellement mises en œuvre en IRMf. L'objectif de cette thèse a donc été de proposer une méthode robuste d'analyse des données tenant compte de la spécificité d'une stimulation alimentaire. Pour prendre en compte le mouvement dû à la déglutition, nous proposons une méthode de censure fondée uniquement sur le signal mesuré. Nous avons de plus perfectionné l'étape de normalisation des données afin de réduire la perte de signal. La principale contribution de cette thèse est d'implémenter l'algorithme de Ward de sorte que parcelliser l'ensemble du cerveau soit réalisable en quelques heures et sans avoir à réduire les données au préalable. Comme le calcul de la distance euclidienne entre toutes les paires de signaux des voxels représente une part importante de l'algorithme de Ward, nous proposons un algorithme cache-aware du calcul de la distance ainsi que trois parallélisations sur les architectures suivantes : architecture à mémoire partagée, architecture à mémoire distribuée et GPU NVIDIA. Une fois l'algorithme de Ward exécuté, il est possible d'explorer toutes les échelles de parcellisation. Nous considérons plusieurs critères pour évaluer la qualité de la parcellisation à une échelle donnée. À une échelle donnée, nous proposons soit de calculer des cartes de connectivités entre les parcelles, soit d'identifier les parcelles répondant à la stimulation à l'aide du coefficient de corrélation de Pearson. / As a non-invasive technology for studying brain imaging, functional magnetic resonance imaging (fMRI) has been employed to understand the brain underlying mechanisms of food intake. Using liquid stimuli to fake food intake adds difficulties which are not present in fMRI studies with visual stimuli. This PhD thesis aims to propose a robust method to analyse food stimulated fMRI data. To correct the data from swallowing movements, we have proposed to censure the data uniquely from the measured signal. We have also improved the normalization step of data between subjects to reduce signal loss.The main contribution of this thesis is the implementation of Ward's algorithm without data reduction. Thus, clustering the whole brain in several hours is now feasible. Because Euclidean distance computation is the main part of Ward algorithm, we have developed a cache-aware algorithm to compute the distance between each pair of voxels. Then, we have parallelized this algorithm for three architectures: shared-memory architecture, distributed memory architecture and NVIDIA GPGPU. Once Ward's algorithm has been applied, it is possible to explore multi-scale clustering of data. Several criteria are considered in order to evaluate the quality of clusters. For a given number of clusters, we have proposed to compute connectivity maps between clusters or to compute Pearson correlation coefficient to identify brain regions activated by the stimulation. IRMf alimentaire Parcellisation multi-échelle Algorithme de Ward Distance euclidienne Parallélisation OpenMP MPI CUDA Food fMRI Multi-scale clustering Ward's algorithm Euclidean distance Parallelisation OpenMP MPI CUDA
19	Analyse de méthodes de résolution parallèles d’EDO/EDA raides / Analysis of parallel methods for solving stiff ODE and DAE Guibert, David 10 September 2009 (has links) La simulation numérique de systèmes d’équations différentielles raides ordinaires ou algébriques est devenue partie intégrante dans le processus de conception des systèmes mécaniques à dynamiques complexes. L’objet de ce travail est de développer des méthodes numériques pour réduire les temps de calcul par le parallélisme en suivant deux axes : interne à l’intégrateur numérique, et au niveau de la décomposition de l’intervalle de temps. Nous montrons l’efficacité limitée au nombre d’étapes de la parallélisation à travers les méthodes de Runge-Kutta et DIMSIM. Nous développons alors une méthodologie pour appliquer le complément de Schur sur le système linéarisé intervenant dans les intégrateurs par l’introduction d’un masque de dépendance construit automatiquement lors de la mise en équations du modèle. Finalement, nous étendons le complément de Schur aux méthodes de type "Krylov Matrix Free". La décomposition en temps est d’abord vue par la résolution globale des pas de temps dont nous traitons la parallélisation du solveur non-linéaire (point fixe, Newton-Krylov et accélération de Steffensen). Nous introduisons les méthodes de tirs à deux niveaux, comme Parareal et Pita dont nous redéfinissons les finesses de grilles pour résoudre les problèmes raides pour lesquels leur efficacité parallèle est limitée. Les estimateurs de l’erreur globale, nous permettent de construire une extension parallèle de l’extrapolation de Richardson pour remplacer le premier niveau de calcul. Et nous proposons une parallélisation de la méthode de correction du résidu. / This PhD Thesis deals with the development of parallel numerical methods for solving Ordinary and Algebraic Differential Equations. ODE and DAE are commonly arising when modeling complex dynamical phenomena. We first show that the parallelization across the method is limited by the number of stages of the RK method or DIMSIM. We introduce the Schur complement into the linearised linear system of time integrators. An automatic framework is given to build a mask defining the relationships between the variables. Then the Schur complement is coupled with Jacobian Free Newton-Krylov methods. As time decomposition, global time steps resolutions can be solved by parallel nonlinear solvers (such as fixed point, Newton and Steffensen acceleration). Two steps time decomposition (Parareal, Pita,...) are developed with a new definition of their grids to solved stiff problems. Global error estimates, especially the Richardson extrapolation, are used to compute a good approximation for the second grid. Finally we propose a parallel deferred correction Complément de Schur Correction du résidu Décomposition de domaine en temps Jacobian Free Newton-Krylov Paralléisatin Parareal/Pita Deferred correction method Time decomposition method Jacobian Free Newton-Krylov Parallelisation Runge-Kutta
20	Étude théorique de la transition de phase α<->γ du cérium : prise en compte des fortes corrélations en DFT+DMFT / Theoretical study of the α<->γ cerium phase transition : including strong correlations in DFT+DMFT Bieder, Jordan 17 October 2013 (has links) La transition de phase isostructurale du cérium a été et reste l'objet de nombreuses études pour tester les méthodes permettant de décrire les matériaux fortement corrélés.La Théorie du Champ Moyen Dynamique (DMFT) jointe à la Théorie de la fonctionnelle de la densité à permis de décrire de tels systèmes.Pourtant, le calcul des propriétés de l'état fondamental nécessite une très bonne précision de calcul à la fois de la part de la DFT et de la DMFT.Nous utilisons un résolveur Monte Carlo Quantique en Temps Continu (CT-QMC), rapide et capable de simuler les basses températures, combiné à une implantation ondes planes augmentées par projection de la DMFT pour calculer les énergies internes et libres -- et par conséquent l'entropie -- au cours de la transition de phase du cérium.D'importants calculs, utilisant cette implantation, nous ont permis de reconsidérer les propriétés de l'état fondamental et une grande partie de la thermodynamique de la transition de phase α<->γ du cérium à basses températures.En particulier, le bruit stochastique est suffisamment faible pour interpréter, sans ambiguïté, les courbes énergie en fonction du volume.Sur ces dernières, un double point d'inflexion est clairement visible pour l'énergie interne jusqu'à une température relativement basse.Les courbes d'énergie libre mettent, de plus, en évidence l'importance de l'entropie pour ce système.D'autre part, les spectres de photoemission tout au long de la transition de phase sont analysés.Le schéma DMFT est comparé avec des calculs DFT récents et des données expérimentales récentes.Enfin, nous mettons en avant les approximations utilisées et nous nous interrogeons sur leurs validité. / The isostructural phase transition of cerium has been and remains the aim of many studies in order to test methods developed to describe strongly correlated materials.The Dynamical Mean Field Theory (DMFT) combined with density functional theory (DFT) has been successful to describe such systems.However, the computation of the ground state properties requires a very good accuracy from both DFT and DMFT sides.We use thus a strong coupling Continuous Time Quantum Monte Carlo (CT-QMC) solver, which is fast and able to reach low temperatures, in combination with a projector augmented wave (PAW) DMFT implementation to calculate internls and free energies -- and thus the entropy -- during the phase transition of cerium.Extensive calculations using this implementation allows us to carefully reassess the ground state properties and almost all thermodynamics of the α<->γ phase transition in cerium at low temperatures.In particular, stochastic noise is small enough to avoid any ambiguity on the interpretation of energy versus volume curves.On those curves, a double inflexion point is clearly observable ont the internal energy curves untill a relatively low temperature.Moreover, free energy curves highlight the importance of including the entropy contribution.The DMFT picture is put in perspective with recent DFT calculations and recent experimental investigations.Furthermore, photoemission spectra are analysed while the phase transition.Finaly, we discuss the approximations used and raise curiosity about their consideration. Ab-initio Corrélations Premiers principes Localisation Kondo DFT Monte Carlo Cérium Transition de phase DMFT Thermodynamique Phonons Parallelisation Ab-initio Correlations First principles Localization Localization DFT Monte Carlo Cerium Phase transtion DMFT Thermodynamics Phonons Parallelism

Search results