  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world.
1

Investigating opportunities for instruction-level parallelism for stack machine code

Shi, Huibin January 2006
No description available.
2

High-performance parallel computing for real-time signal processing and control

Villaseñor, José Miguel Martínez January 2006
No description available.
3

Concurrent program development

McEwan, Alistair A. January 2006
No description available.
4

Exploiting object structure in hardware transactional memory

Khan, Behram January 2009
Fundamental limits in integrated circuit technology are bringing about the acceptance that multi-core and, in the future, many-core processors will be commonplace. If general-purpose applications are required to exhibit high performance on such processors, it will be necessary to develop new, easy-to-use parallel programming techniques. Traditionally, concurrent programming has employed locks to safeguard concurrent access to shared data, but these are known to be challenging to use, and only a minority of developers have the expertise to write robust, let alone highly scalable, lock-based code. Transactional Memory (TM) is a new concurrent programming model that is receiving attention as a way of expressing parallelism for programming multi-core systems. As a parallel programming model, it avoids the complexity of conventional locking while attempting to deliver similar or better performance. The ACI (atomicity, consistency, isolation) properties of transactions provide a foundation to ensure that concurrent reads and writes of shared data do not produce inconsistent or incorrect results.
5

Efficient execution of concurrent applications using transactional memory

Ansari, Mohammad M. January 2009
Challenges in the research and development of uniprocessors have led to the rise of multi-cores. However, multi-cores introduce new challenges to software development: applications will need to be developed using concurrent programming techniques and be highly scalable to improve performance on successive generations of multi-cores, i.e. as the number of cores increases. Traditionally, concurrent programming has employed locks to safeguard concurrent access to shared data, but these are known to be challenging to use, and only a minority of developers have the expertise to write correct, let alone highly scalable, lock-based code.
6

Nested parallelism for multi-core systems using Java

Shafi, Aamir January 2006
No description available.
7

High-level neighbourhood-based programming abstractions for parallel processing

Turkington, G. A. January 2005
No description available.
8

The automatic implementation of a dynamic load balancing strategy within structured mesh codes generated using a parallelisation tool

Rodrigues, Jacqueline Nadine January 2003
This research demonstrates that the automatic implementation of a dynamic load balancing (DLB) strategy within a parallel SPMD (single program multiple data) structured mesh application code is possible. It details how DLB can be effectively employed to reduce the level of load imbalance in a parallel system without expert knowledge of the application. Once CAPTools (the Computer Aided Parallelisation Tools) is furnished with the additional functionality of DLB, a DLB parallel version of a serial Fortran 77 application code can be generated quickly and easily with the press of a few buttons, allowing the user to obtain results on various platforms rather than concentrating on implementing a DLB strategy within their code. Results show that the devised DLB strategy has successfully decreased idle time by locally increasing or decreasing processor workloads as and when required to suit the parallel application, utilising the available resources efficiently. Several possible DLB strategies are examined with the understanding that the strategy needs to be generic if it is to be automatically implemented within CAPTools and applied to a wide range of application codes. This research investigates the issues surrounding load imbalance, distinguishing between processor and physical imbalance in terms of the load redistribution of a parallel application executed on a homogeneous or heterogeneous system. Issues such as where to redistribute the workload, how often to redistribute, and how to calculate and implement the new distribution (deciding which data arrays to redistribute in the latter case) are all covered in detail, with many of these issues common to the automatic implementation of DLB for unstructured mesh application codes. The devised DLB Staggered Limit Strategy discussed in this thesis offers flexibility as well as ease of implementation whilst minimising changes to the user's code.
The generic utilities developed for this research are discussed along with their manual implementation, upon which the automation algorithms are based; these utilities are interchangeable with alternative methods if desired. This thesis aims to encourage the use of the DLB Staggered Limit Strategy since its benefits are evidently significant and are now easily achievable with its automatic implementation using CAPTools.
9

A unified model for inter- and intra-processor concurrency

Schweigler, Mario January 2006
Although concurrency is generally perceived to be a 'hard' subject, it can in fact be very simple, provided that the underlying model is simple. The occam-pi parallel processing language provides such a simple yet powerful concurrency model that is based on CSP and the pi-calculus. This thesis presents pony, the occam-pi Network Environment. occam-pi and pony provide a new, unified concurrency model that bridges inter- and intra-processor concurrency. This enables the development of distributed applications in a transparent, dynamic and highly scalable way. The author specified the layout of the pony system as presented in this thesis and carried out about 90% of the implementation. This thesis is structured into three main parts, as well as an introduction and an appendix. The introduction examines in detail the need for a unified concurrency model, and then presents the pony environment as a solution that provides such a model. The first part of this thesis is concerned with the usage of the pony environment for the development of distributed applications. It presents the interface between pony and the user-level code, as well as pony's configuration and a sample application. The second part presents the design and implementation of the pony environment. It explains the internal structure of pony, the implementation of pony's components and public processes, and the integration of pony in the KRoC compiler. The third part evaluates pony's performance and contains the final conclusions. It presents a number of performance tests and concludes with a discussion of the work presented in this thesis, along with an outline of possible future research.
10

High-level structured programming models for explicit and automatic parallelization on multicore architectures

Khammassi, Nader 05 December 2014
The continuous proliferation of multicore architectures has placed developers under great pressure to parallelize their applications in line with what such platforms can offer. Unfortunately, traditional low-level programming models exacerbate the difficulties of building large and complex parallel applications. High-level parallel programming models are in high demand as they significantly reduce the burden on programmers and provide enough abstraction to accommodate hardware heterogeneity.
In this thesis, we propose a flexible parallelization methodology and introduce a new task-based parallel programming model designed to provide high productivity and expressiveness without sacrificing performance. Our programming model aims to ease the expression of both sequential execution and several types of parallelism, including task, data and pipeline parallelism at different granularity levels, to form a structured, homogeneous programming model. Contrary to many parallel programming models which introduce new languages or compiler annotations, or extend existing languages and thus require specialized compilers, extra hardware or virtual machines, we exploit the potential of the traditional standard C++ language, and particularly its meta-programming capabilities, to provide a lightweight and smart parallel programming interface. This interface enables the programmer to express parallelism at the cost of a small amount of extra code, while reusing legacy sequential code almost without alteration. An intelligent run-time system transparently extracts information on task-data dependencies and ordering. We show how the run-time system can exploit this information to detect and protect shared data automatically and to perform cache-aware scheduling.
The initial implementation of our programming model is a pure C++ library named "XPU", designed for explicit parallelism specification. A second implementation, named "FATMA", extends XPU and exploits the transparent task-dependency extraction feature to provide automatic parallelization of a given sequence of tasks without the need for any specific tool apart from a standard C++ compiler. To demonstrate the potential of our approach, we use both tools, XPU and FATMA, to parallelize popular problems as well as real industrial applications. We show that despite their high abstraction, our programming models deliver performance comparable to lower-level programming models and offer a better productivity-performance tradeoff.
