1

Computer vision applications on graphics processing units

Ohmer, Julius Fabian January 2007 (has links)
Over the last few years, commodity Graphics Processing Units (GPUs) have evolved from fixed graphics pipeline processors into more flexible and powerful data-parallel processors. These stream processors are capable of sustaining computation rates greater than ten times that of a single-core CPU. GPUs are inexpensive and are becoming ubiquitous in a wide variety of computer architectures, including desktop and laptop computers, PDAs and cell phones. This research investigates possible ways to use modern GPUs for real-time computer vision and pattern classification tasks. Special attention is paid to algorithms where the power of the CPU is a limiting factor. This is in particular the case for real-time tracking algorithms on video streams, where many candidate regions must be evaluated at once to allow stable tracking of features; these impose a high computational burden on sequential processing units such as the CPU. The implementation presented in this thesis targets standard PC platforms rather than expensive dedicated hardware, to allow a broad variety of users to benefit from powerful computer vision applications. In particular, this thesis covers the following topics: 1. First, we present a framework for computer vision on the GPU, which is used as a foundation for the implementation of computer vision methods. 2. We continue with a discussion of GPU-based implementations of kernel methods, including Support Vector Machines and Kernel PCA. 3. Finally, we propose GPU-accelerated implementations of two tracking algorithms. The first algorithm uses geometric templates in a gradient vector field. The second algorithm is a color-based approach in a particle filter framework. Both are able to track objects in a video stream. The thesis concludes with a discussion of the presented methods and proposes directions for further research. It also briefly presents the features of the next generation of GPUs.
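As a rough illustration of the workload behind the second tracking algorithm — many candidate regions evaluated per frame — here is a toy sketch of one step of a colour-based particle filter (this is not the thesis implementation; the window size, motion noise and weighting constants are illustrative assumptions):

```python
import numpy as np

def colour_histogram(patch, bins=8):
    """Normalised joint RGB histogram of an (h, w, 3) uint8 patch."""
    h, _ = np.histogramdd(patch.reshape(-1, 3).astype(float),
                          bins=(bins,) * 3, range=[(0, 256)] * 3)
    return h.ravel() / max(h.sum(), 1.0)

def particle_filter_step(frame, particles, target_hist,
                         half=16, noise=5.0, sigma=0.1,
                         rng=np.random.default_rng(0)):
    """One predict/weight/resample step; particles holds (N, 2) centres."""
    particles = particles + rng.normal(0.0, noise, particles.shape)  # predict
    weights = np.full(len(particles), 1e-12)
    for i, (x, y) in enumerate(particles.astype(int)):
        patch = frame[max(y - half, 0):y + half, max(x - half, 0):x + half]
        if patch.size == 0:
            continue  # particle drifted off-frame
        # Bhattacharyya similarity between candidate and target histograms
        bc = np.sum(np.sqrt(colour_histogram(patch) * target_hist))
        weights[i] += np.exp(-(1.0 - bc) / sigma**2)
    weights /= weights.sum()
    idx = rng.choice(len(particles), len(particles), p=weights)  # resample
    return particles[idx], weights
```

Every particle's weight is computed independently of all the others, which is precisely the data-parallel structure that maps well onto a GPU.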
2

Hardware and Software Accelerators for Sparse Linear Algebra over Finite Fields

Jeljeli, Hamza 16 July 2015 (has links)
The security of public-key cryptographic primitives relies on the computational difficulty of solving certain mathematical problems. In this work, we are interested in the cryptanalysis of the discrete logarithm problem over the multiplicative subgroups of finite fields. The index calculus algorithms used in this context require solving large sparse systems of linear equations over finite fields. This linear algebra represents a serious limiting factor when targeting larger fields. The object of this thesis is to explore the elements that accelerate this linear algebra on architectures designed for parallel computing. We need to exploit the different levels of parallelism provided by these computations and to adapt the state-of-the-art algorithms to the characteristics of the considered architectures and to the specificities of the problem. In the first part of the manuscript, we present an overview of the discrete logarithm context and of the considered software and hardware architectures. The second part deals with accelerating the linear algebra. We developed two implementations of linear system solvers based on the block Wiedemann algorithm: an NVIDIA-GPU-based implementation and an implementation adapted to a cluster of multi-core CPUs. These implementations contributed to solving the discrete logarithm problem in the binary fields GF(2^{619}) and GF(2^{809}) and in the prime field GF(p_{180}).
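The computational core here is the repeated sparse matrix-vector product over a finite field inside the block Wiedemann algorithm. Below is a minimal, single-threaded sketch of that kernel (the CSR layout and the small illustrative prime are assumptions; the record computations use far larger fields and GPU/multi-core clusters):

```python
p = 2**31 - 1  # illustrative prime; actual DLP records use much larger moduli

def spmv_mod_p(indptr, indices, data, v, p):
    """y = A*v (mod p) for a sparse matrix A stored in CSR form."""
    y = [0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        s = 0
        for k in range(indptr[row], indptr[row + 1]):
            s += data[k] * v[indices[k]]   # Python ints: no overflow
        y[row] = s % p                     # one reduction per row
    return y

def krylov_sequence(indptr, indices, data, v, p, n_terms):
    """[v, A v, A^2 v, ...] mod p -- the sequence Wiedemann-type solvers
    feed into Berlekamp-Massey to recover a minimal polynomial."""
    seq = [list(v)]
    for _ in range(n_terms - 1):
        v = spmv_mod_p(indptr, indices, data, v, p)
        seq.append(v)
    return seq

# 3x3 toy matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form:
indptr, indices, data = [0, 2, 3, 5], [0, 2, 1, 0, 2], [1, 2, 3, 4, 5]
print(krylov_sequence(indptr, indices, data, [1, 1, 1], p, 3))
```

Each row's dot product is independent of the others, which is what makes this kernel a good fit for GPUs; on real hardware the expensive part is the modular multi-word arithmetic rather than the loop structure.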
3

A Multidimensional Filtering Framework with Applications to Local Structure Analysis and Image Enhancement

Svensson, Björn January 2008 (has links)
Filtering is a fundamental operation in image science in general and in medical image science in particular. The most central applications are image enhancement, registration, segmentation and feature extraction. Even though these applications involve non-linear processing, a majority of the methodologies available rely on initial estimates using linear filters. Linear filtering is a well-established cornerstone of signal processing, which is reflected by the overwhelming amount of literature on finite impulse response filters and their design. Standard techniques for multidimensional filtering are computationally intense. This leads to either a long computation time or a performance loss caused by approximations made in order to increase the computational efficiency. This dissertation presents a framework for realization of efficient multidimensional filters. A weighted least squares design criterion ensures preservation of the performance, and the two techniques called filter networks and sub-filter sequences significantly reduce the computational demand. A filter network is a realization of a set of filters, which are decomposed into a structure of sparse sub-filters, each with a low number of coefficients. Sparsity is here a key property for reducing the number of floating point operations required for filtering. The network structure is also important for efficiency, since it determines how the sub-filters contribute to several output nodes, allowing reduction or elimination of redundant computations. Filter networks, the main contribution of this dissertation, have many potential applications. The primary target of the research presented here has been local structure analysis and image enhancement. A filter network realization for local structure analysis in 3D shows a computational gain, in terms of multiplications required, which can exceed a factor of 70 compared to standard convolution. For comparison, this filter network requires approximately the same number of multiplications per signal sample as a single 2D filter. These results are purely algorithmic and are not in conflict with the use of hardware acceleration techniques such as parallel processing or graphics processing units (GPUs). To give a flavor of the computation time required, a prototype implementation which makes use of filter networks carries out image enhancement in 3D, involving the computation of 16 filter responses, at an approximate speed of 1 MVoxel/s on a standard PC.
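As a toy illustration of the arithmetic behind sub-filter sequences (a plain separable Gaussian, not the dissertation's filter-network design): a dense 11x11x11 filter costs 1331 multiplications per voxel, whereas an equivalent sequence of three 11-tap 1D sub-filters costs 33 — a reduction of the same flavour as the factor-of-70 gain reported above.

```python
import numpy as np
from scipy.ndimage import convolve1d

k = 11
g = np.exp(-0.5 * ((np.arange(k) - k // 2) / 2.0) ** 2)
g /= g.sum()  # 11-tap 1D Gaussian sub-filter

volume = np.random.rand(64, 64, 64).astype(np.float32)

# Sub-filter sequence: one sparse (1D) pass per axis instead of one dense 3D pass.
out = volume
for axis in range(3):
    out = convolve1d(out, g, axis=axis, mode="reflect")

print("dense 3D filter :", k**3, "mults/voxel")    # 1331
print("sub-filter seq. :", 3 * k, "mults/voxel")   # 33
print("reduction       :", round(k**3 / (3 * k)))  # ~40x
```

Filter networks generalise this idea: the sub-filters need not be separable 1D kernels, and intermediate results are shared between several output filters.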
4

Enhancing GPGPU Performance through Warp Scheduling, Divergence Taming and Runtime Parallelizing Transformations

Anantpur, Jayvant P January 2017 (has links) (PDF)
There has been a tremendous growth in the use of Graphics Processing Units (GPUs) for the acceleration of general purpose applications. The growth is primarily due to the huge computing power offered by GPUs and the emergence of programming languages such as CUDA and OpenCL. A typical GPU consists of several hundreds to a few thousands of Single Instruction Multiple Data (SIMD) cores, organized as tens of Streaming Multiprocessors (SMs), each having several SIMD cores which operate in a lock-step manner, offering a few TeraFLOPS of performance in a single socket. SMs execute instructions from a group of consecutive threads, called warps. At each cycle, an SM schedules a warp from a group of active warps and can context switch among the active warps to hide various stalls. However, various factors, such as global memory latency, divergence among warps of a thread block (TB), branch divergence among threads of a warp (control divergence), and the number of active warps, can significantly impact the ability of a warp scheduler to hide stalls. This reduces the speedup of applications running on the GPU. Further, applications containing loops with potential cross-iteration dependences do not utilize the available resources (SIMD cores) effectively and hence suffer in terms of performance. In this thesis, we propose several mechanisms which address the above issues and enhance the performance of GPU applications through efficient warp scheduling, taming branch and warp divergence, and runtime parallelization. First, we propose RLWS, a Reinforcement Learning (RL) based warp scheduler which uses unsupervised learning to schedule warps based on the current state of the core and the long-term benefits of scheduling actions. As the design space involving the state variables used by the RL and the RL parameters (such as learning and exploration rates, reward and penalty values, etc.) is large, we use a Genetic Algorithm to identify the useful subset of state variables and RL parameter values. We evaluated the proposed RL-based scheduler using the GPGPU-SIM simulator on a large number of applications from the Rodinia, Parboil, CUDA-SDK and GPGPU-SIM benchmark suites. Our RL-based implementation achieved an average speedup of 1.06x over the Loose Round Robin (LRR) strategy and 1.07x over the Two-Level (TL) strategy. A salient feature of RLWS is that it is robust, i.e., it performs nearly as well as the best performing warp scheduler, consistently across a wide range of applications. Using the insights obtained from RLWS, we designed PRO, a heuristic warp scheduler which, in addition to hiding the long latencies of certain operations, reduces the waiting time of warps at synchronization points. Evaluation of the proposed algorithm using the GPGPU-SIM simulator on a diverse set of applications showed an average speedup of 1.07x over the LRR warp scheduler and 1.08x over the TL warp scheduler. In the second part of the thesis, we address problems due to warp and branch divergences. First, many GPU kernels exhibit warp divergence due to various reasons such as different amounts of work, cache misses, and thread divergence. Also, we observed that some kernels contain code which is redundant across TBs, i.e., all TBs will execute the code identically and hence compute the same results. To improve the performance of such kernels, we propose a solution based on the concept of virtual TBs and loop-independent code motion.
We propose the necessary code transformations which enable one virtual TB to execute the kernel code for multiple real TBs. We evaluated this technique using the GPGPU-SIM simulator on a diverse set of applications and observed an average improvement of 1.08x over the LRR and 1.04x over the Greedy-Then-Oldest (GTO) warp scheduling algorithms. Second, branch divergence causes execution of diverging branches to be serialized, executing only one control flow path at a time. The existing stack-based hardware mechanism for reconverging threads causes duplicate execution of code for unstructured control flow graphs (CFGs). We propose a simple and elegant transformation to convert an unstructured CFG to a structured CFG. The transformation eliminates duplicate execution of user code while incurring only a linear increase in the number of basic blocks and in the number of instructions. We implemented the proposed transformation at the PTX level using the Ocelot compiler infrastructure and demonstrate that the proposed technique is effective in handling the performance problem due to divergence in unstructured CFGs. Our third proposal is to enable efficient execution of loops with indirect memory accesses that can potentially cause cross-iteration dependences. Such dependences are hard to detect using existing compilation techniques. We present an algorithm to compute, at run-time, the cross-iteration dependences in such loops, using both the CPU and the GPU. It effectively uses the compute capabilities of the GPU to collect the memory accesses performed by the iterations. Using the dependence information, the loop iterations are levelized such that each level contains independent iterations which can be executed in parallel (see the sketch below). Experimental evaluation on real hardware (NVIDIA GPUs) reveals that the proposed technique can achieve an average speedup of 6.4x on loops with a reasonable number of cross-iteration dependences.
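A minimal sketch of the run-time levelization idea in the third proposal (the data layout and conflict bookkeeping here are assumptions for illustration; the thesis collects the accesses on the GPU itself):

```python
# Group loop iterations into levels of mutually independent iterations.
# reads[i] / writes[i]: sets of memory locations touched by iteration i.
def levelize(reads, writes):
    n = len(writes)
    level = [0] * n
    last_write = {}   # location -> deepest level that wrote it
    last_access = {}  # location -> deepest level that read or wrote it
    for i in range(n):
        dep = -1
        for a in reads[i]:             # RAW: run after the last writer
            dep = max(dep, last_write.get(a, -1))
        for a in writes[i]:            # WAR/WAW: run after last reader/writer
            dep = max(dep, last_access.get(a, -1))
        level[i] = dep + 1
        for a in writes[i]:
            last_write[a] = max(last_write.get(a, -1), level[i])
        for a in reads[i] | writes[i]:
            last_access[a] = max(last_access.get(a, -1), level[i])
    return level

# Example: a[idx[i]] += b[i] with idx = [3, 1, 3, 0]; iterations 0 and 2
# conflict on a[3], so iteration 2 is pushed to the next level.
reads  = [{3}, {1}, {3}, {0}]
writes = [{3}, {1}, {3}, {0}]
print(levelize(reads, writes))  # -> [0, 0, 1, 0]
```

All iterations within a level can then be launched together, for example as one GPU kernel per level.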
5

Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA

Prades Gasulla, Javier 14 June 2021 (has links)
In the last decade the use of GPGPU (General Purpose computing in Graphics Processing Units) has become extremely popular in data centers around the world. GPUs (Graphics Processing Units) have been established as computational accelerators that are used alongside CPUs to form heterogeneous systems. The massively parallel nature of GPUs, traditionally intended for graphics computing, makes it possible to perform numerical operations on data arrays at high speed, thanks to the large number of cores GPUs integrate and their large memory bandwidth. Consequently, applications from fields of all kinds, such as chemistry, physics, engineering, artificial intelligence and materials science, that present this type of computational pattern benefit by drastically reducing their execution time. In general, the computing acceleration provided by GPUs has meant a step forward and a revolution, but it is not without problems, such as poor energy efficiency, low utilization of GPUs, and high acquisition and maintenance costs. In this PhD thesis we aim to analyze the main shortcomings of these heterogeneous systems and propose solutions based on the use of remote GPU virtualization. To that end, we have used the rCUDA middleware, developed at Universitat Politècnica de València; many publications support rCUDA as the most advanced remote GPU virtualization framework available today. The results obtained in this PhD thesis show that the use of rCUDA in Cloud Computing environments increases the degree of freedom of the system, as it allows virtual instances of the physical GPUs to be created fully tailored to the needs of each virtual machine. In HPC (High Performance Computing) environments, rCUDA also provides a greater degree of flexibility in the use of GPUs throughout the computing cluster, as it allows the CPU part of an application to be completely decoupled from its GPU part. In addition, GPUs can be on any node in the cluster, regardless of the node on which the CPU part of the application is running. In general, both for Cloud Computing and for HPC, this greater degree of flexibility translates into an up to 2x increase in system-wide throughput while reducing energy consumption by approximately 15%. Finally, we have also developed a job migration mechanism for the GPU part of applications, integrated within the rCUDA middleware. This migration mechanism has been evaluated, and the results clearly show that, in exchange for a small overhead of about 400 milliseconds in application execution time, it is a powerful tool with which, again, we can increase productivity and reduce the energy footprint of the computing system. In summary, this PhD thesis analyzes the main problems arising from the use of GPUs as computing accelerators, both in HPC and Cloud Computing environments, and demonstrates how these problems can be addressed thanks to the rCUDA middleware. In addition, a powerful GPU job migration mechanism has been developed which, integrated within the rCUDA framework, becomes a key tool for future job schedulers in heterogeneous clusters.
This work was jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants (20524/PDC/18, 20813/PI/18 and 20988/PI/18) and by the Spanish MEC and European Commission FEDER under grants TIN2015-66972-C5-3-R, TIN2016-78799-P and CTQ2017-87974-R (AEI/FEDER, UE). We also thank NVIDIA for hardware donation under GPU Educational Center 2014-2016 and Research Center 2015-2016. The authors thankfully acknowledge the computer resources at CTE-POWER and the technical support provided by Barcelona Supercomputing Center - Centro Nacional de Supercomputación (RES-BCV-2018-3-0008). Furthermore, researchers from Universitat Politècnica de València are supported by the Generalitat Valenciana under Grant PROMETEO/2017/077. Authors are also grateful for the generous support provided by Mellanox Technologies Inc. Prof. Pradipta Purkayastha, from Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, is acknowledged for kindly providing the initial ligand and DNA structures. / Prades Gasulla, J. (2021). Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/168081
6

Towards Manifesting Reliability Issues In Modern Computer Systems

Zheng, Mai 02 September 2015 (has links)
No description available.
7

Accelerated sampling of energy landscapes

Mantell, Rosemary Genevieve January 2017 (has links)
In this project, various computational energy landscape methods were accelerated using graphics processing units (GPUs). Basin-hopping global optimisation was treated using a version of the limited-memory BFGS algorithm adapted for CUDA, in combination with GPU-acceleration of the potential calculation. The Lennard-Jones potential was implemented using CUDA, and an interface to the GPU-accelerated AMBER potential was constructed. These results were then extended to form the basis of a GPU-accelerated version of hybrid eigenvector-following. The doubly-nudged elastic band method was also accelerated using an interface to the potential calculation on GPU. Additionally, a local rigid body framework was adapted for GPU hardware. Tests were performed for eight biomolecules represented using the AMBER potential, ranging in size from 81 to 22,811 atoms, and the effects of minimiser history size and local rigidification on the overall efficiency were analysed. Improvements relative to CPU performance of up to two orders of magnitude were obtained for the largest systems. These methods have been successfully applied to both biological systems and atomic clusters. An existing interface between a code for free energy basin-hopping and the SuiteSparse package for sparse Cholesky factorisation was refined, validated and tested. Tests were performed for both Lennard-Jones clusters and selected biomolecules represented using the AMBER potential. Significant acceleration of the vibrational frequency calculations was achieved, with negligible loss of accuracy, relative to the standard diagonalisation procedure. For the larger systems, exploiting sparsity reduces the computational cost by factors of 10 to 30. The acceleration of these computational energy landscape methods opens up the possibility of investigating much larger and more complex systems than previously accessible. A wide array of new applications are now computationally feasible.
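For a flavour of the baseline algorithm, here is a minimal CPU-only sketch of basin-hopping with L-BFGS minimisation on a small Lennard-Jones cluster (the step size, temperature and step count are illustrative assumptions; the project replaces the minimiser and the potential with CUDA implementations):

```python
import numpy as np
from scipy.optimize import minimize

def lj_energy(x):
    """Total Lennard-Jones energy of coordinates x with shape (3N,)."""
    pts = x.reshape(-1, 3)
    iu = np.triu_indices(len(pts), k=1)
    r = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)[iu]
    r6 = r**-6
    return np.sum(4.0 * (r6**2 - r6))

def basin_hop(n_atoms=13, n_steps=100, step=0.4, temperature=1.0, seed=0):
    rng = np.random.default_rng(seed)
    res = minimize(lj_energy, rng.uniform(-1.0, 1.0, 3 * n_atoms),
                   method="L-BFGS-B")
    x, e = res.x, res.fun
    best_e = e
    for _ in range(n_steps):
        res = minimize(lj_energy, x + rng.uniform(-step, step, x.shape),
                       method="L-BFGS-B")       # perturb, then local minimise
        if res.fun < e or rng.random() < np.exp((e - res.fun) / temperature):
            x, e = res.x, res.fun               # Metropolis acceptance
            best_e = min(best_e, e)
    return best_e

print(basin_hop())  # LJ13 global minimum is about -44.33; short runs may stop higher
```

The expensive inner pieces, the potential (with its gradient) and the L-BFGS updates, are exactly what the thesis moves onto the GPU.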
8

Fast and Reliable Solutions for Numerical Linear Algebra Solvers in High-Performance Computing

Baboulin, Marc 05 December 2012 (has links) (PDF)
In this Habilitation à Diriger des Recherches (HDR), we present the research we have carried out over the past few years in the field of high-performance computing. Our work has focused mainly on parallel algorithms for numerical linear algebra solvers and their parallel implementation in public-domain software libraries. In this manuscript we illustrate how these computations can be accelerated using innovative algorithms and made reliable using specific quantities from error analysis. We first explain how numerical linear algebra solvers can be designed to exploit the capabilities of current heterogeneous computers comprising multicore processors and GPUs. We consider dense factorization algorithms, for which we describe the distribution of tasks among the different computing units and its influence on communication cost. These computations can also be made more efficient by mixed-precision algorithms, which use lower precision for the most expensive tasks while computing the solution in higher precision. We then describe our research on the development of fast linear algebra solvers that use randomized algorithms. Randomization represents an innovative approach to accelerating linear algebra computations, and the class of algorithms we propose has the advantage of reducing the volume of communication in factorizations by completely removing the pivoting phase when solving linear systems. The corresponding software has been developed for multicore architectures, possibly accelerated by GPUs. Finally, we propose tools that allow us to guarantee the quality of the computed solution for overdetermined least-squares problems, including total least squares. Our method relies on deriving exact formulas or estimators for the condition number of these problems. We describe the algorithms and software that compute these quantities using standard parallel software libraries. Research directions for the coming years are given in a concluding chapter.
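A minimal sketch of the mixed-precision idea described above (not the manuscript's exact scheme): perform the O(n^3) factorization in cheap float32, then recover double-precision accuracy with a few O(n^2) iterative-refinement steps.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, max_iters=5, tol=1e-12):
    """Solve Ax = b: LU in float32, iterative refinement in float64."""
    lu, piv = lu_factor(A.astype(np.float32))        # costly step, low precision
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iters):
        r = b - A @ x                                # residual in high precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x += d                                       # cheap correction step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))  # near double-precision accuracy
```

Single-precision arithmetic typically runs at a much higher rate than double precision on multicore CPUs and GPUs, which is what makes this approach attractive for the heterogeneous machines discussed in the manuscript.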
