Global ETD Search

141	Akcelerace algoritmů na architektuře Larrabee / Algorithm Acceleration on Larrabee Platform Veselý, Ivo January 2010 Intel Larrabee is one of the first fully programmable graphics architectures. The thesis describes this many-core architecture from the point of view of both its hardware implementation and its programming model. Larrabee bets on many complete in-order cores built on the x86 instruction set. Each core contains four hardware threads, each with its own register file, and a new vector processing unit. The vector processing unit, together with the instruction set extension, greatly increases system performance. New cache modes help to increase throughput even with irregular data structures. The architecture is aimed not only at computer graphics and image processing but at all parallel tasks. The second part of the text deals with hologram synthesis. Specifically, it presents two new methods for generating a patch of point light sources with concrete radiation.
142	Code Optimization on GPUs Hong, Changwan 30 October 2019 No description available. Computer Science GPU performance modeling optimization SpMV SpMM SDDMM sparse matrix graph processing tiling multicore manycore matrix multiplication tensor stencil SIMD data locality CSR parallel load balance shared memory graph analytics
143	A SIMD Approach To Large-scale Real-time System Air Traffic Control Using Associative Processor and Consequences For Parallel Computing Yuan, Man 01 October 2012 No description available. Computer Science Air Traffic Control (ATC) SIMD MIMD Real-Time Systems Associative Processor (AP) Conflict Detection and Resolution (CDR) ClearSpeed CSX600 Multicore Processor OpenMP Federal Aviation Administration (FAA) Multiprocessor NP-complete Predictable
144	Simulator for optimizing performance and power of embedded multicore processors Goska, Benjamin J. 26 April 2012 This work presents improvements to a multi-core performance/power simulator. The improvements, which include updated power models, voltage-scaling-aware models, and an application-specific benchmark, are made to increase the accuracy of the power models under voltage and frequency scaling. They enable more accurate design space exploration for a biomedical application. The workflow used to modify the simulator is also presented so that similar modifications can be applied to future simulators. / Graduation date: 2012 Low power digital SIMD Multicore Digital simulation Power modeling
145	SIMD-aware word length optimization for floating-point to fixed-point conversion targeting embedded processors / Optimisation SIMD de la largeur des mots pour la conversion de virgule flottante en virgule fixe pour des processeurs embarqués El Moussawi, Ali Hassan 16 December 2016 In order to cut down their cost and/or power consumption, many embedded processors do not provide hardware support for floating-point arithmetic. However, applications in many domains, such as signal processing, are generally specified using floating-point arithmetic for the sake of simplicity. Porting these applications to such embedded processors requires software emulation of floating-point arithmetic, which can severely degrade performance. To avoid this, the application is converted to use fixed-point arithmetic, which is more efficient to implement on integer computation units. Floating-point to fixed-point conversion is a delicate procedure that involves a subtle tradeoff between performance and precision; it enables the use of narrower data word lengths at the cost of degrading computation accuracy.
In addition, most embedded processors provide support for SIMD (Single Instruction Multiple Data) as a means to improve performance. SIMD allows one operation to be executed on multiple data in parallel, thus reducing execution time. However, the application usually has to be transformed in order to take advantage of the SIMD instruction set. This transformation, known as Simdization, is affected by the data word lengths; narrower word lengths enable a higher SIMD parallelism rate. Hence there is a tradeoff between precision and Simdization.
Many existing works have aimed at providing or improving methodologies for automatic floating-point to fixed-point conversion on the one hand, and Simdization on the other. In the state of the art, the two transformations are considered separately even though they are strongly related. In this context, we study the interactions between these transformations in order to better exploit the performance/accuracy tradeoff. First, we propose an improved algorithm for SLP (Superword Level Parallelism) extraction, a Simdization technique. Then, we propose a new methodology to jointly perform floating-point to fixed-point conversion and SLP extraction. Finally, we implement this work as a fully automated source-to-source compiler flow. Experimental results, targeting four different embedded processors, show the validity of our approach in efficiently exploiting the performance/accuracy tradeoff compared to a typical approach, which considers the two transformations independently. Word length optimization Vectorization Embedded processors Source-To-Source compilation Floating-Point to fixed-Point conversion Single Instruction Multiple Data (SIMD) Superword Level Parallelism Word length conversion C code generation
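The tradeoff described in the abstract above, where narrower word lengths give more SIMD lanes but a larger quantization error, can be illustrated with a small sketch. This is not the thesis's method or tooling: the function names, the 128-bit register width, and the choice of a signed fixed-point format with one sign bit and all remaining bits as fraction are assumptions made purely for illustration.

```python
# Illustrative sketch only (hypothetical helpers, not from the thesis):
# compare quantization error and SIMD lane count for several word lengths.

def to_fixed(x: float, frac_bits: int, word_bits: int) -> int:
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def simd_lanes(register_bits: int, word_bits: int) -> int:
    """Number of elements of the given width that fit in one SIMD register."""
    return register_bits // word_bits


def max_abs_error(samples, frac_bits: int, word_bits: int) -> float:
    """Largest round-trip error over the samples for the given format."""
    return max(
        abs(s - to_float(to_fixed(s, frac_bits, word_bits), frac_bits))
        for s in samples
    )


if __name__ == "__main__":
    REGISTER_BITS = 128  # assumed register width, for illustration only
    samples = [i / 1000.0 - 0.5 for i in range(1000)]  # values in [-0.5, 0.5)
    for word_bits in (8, 16, 32):
        frac_bits = word_bits - 1  # one sign bit, the rest fractional
        err = max_abs_error(samples, frac_bits, word_bits)
        lanes = simd_lanes(REGISTER_BITS, word_bits)
        print(f"{word_bits:2d}-bit words: {lanes:2d} lanes, max error {err:.3e}")
```

Under these assumptions, halving the word length doubles the number of lanes per register while increasing the worst-case rounding error, which is the kind of coupling between the two transformations that the abstract argues should be optimized jointly.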
146	Ray-tracing s knihovnou IPP / Ray-tracing Using IPP Library Kukla, Michal January 2010 This master's thesis deals with the design and implementation of ray tracing and path tracing using the IPP library. The theoretical part discusses current trends in accelerating selected algorithms and the possibilities for parallelization. The proposal describes the design of the ray-tracing and path-tracing algorithms and the form of parallelization. It also discusses the implementation of adaptive sampling and importance sampling with the Monte Carlo method to accelerate the path-tracing algorithm. The next part deals with the individual steps in implementing the selected rendering methods with respect to the IPP library. The implementation of a network interface using the Boost library is also discussed. Finally, the implemented methods are subjected to performance and quality tests. The final product of this thesis is a server application capable of handling multiple connections, which provides visualisation, and a client application, which implements ray tracing and path tracing.
