Return to search

Self-tuned parallel runtimes: a case of study for OpenMP

In recent years parallel computing has become ubiquitous. Lead by the spread of commodity multicore processors, parallel programming is not anymore an obscure discipline only mastered by a few.Unfortunately, the amount of able parallel programmers has not increased at the same speed because is not easy to write parallel codes.Parallel programming is inherently different from sequential programming. Programmers must deal with a whole new set of problems: identification of parallelism, work and data distribution, load balancing, synchronization and communication.Parallel programmers have embraced several languages designed to allow the creation of parallel applications. In these languages, the programmer is not only responsible of identifying the parallelism but also of specifying low-level details of how the parallelism needs to exploited (e.g. scheduling, thread distribution ...). This is a burden than hampers the productivity of the programmers.We demonstrate that is possible for the runtime component of a parallel environment to adapt itself to the application and the execution environment and thus reducing the burden put into the programmer. For this purpose we study three different parameters that are involved in the parallel exploitation of the OpenMP parallel language: parallel loop scheduling, thread allocation in multiple levels of parallelism and task granularity control.In all the cases, we propose a self-tuned algorithm that will first perform an on-line profiling of the application and based on the information gathered it will adapt the value of the parameter to the one that maximizes the performance of the application.Our goal is not to develop methods that outperform a hand-tuned application for a specific scenario, as this is probably just as difficult as compiler code outperforming hand-tuned assembly code, but methods that get close to that performance with a minimum effort from the programmer. In other words, what we want to achieve with our self-tuned algorithms is to maximize the ratio performance over effort so the entry level to the parallelism is lower. The evaluation of our algorithms with different applications shows that we achieve that goal.

Identiferoai:union.ndltd.org:TDX_UPC/oai:www.tdx.cat:10803/6026
Date22 October 2008
CreatorsDurán González, Alejandro
ContributorsCorbalán González, Julita, Ayguadé i Parra, Eduard, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
PublisherUniversitat Politècnica de Catalunya
Source SetsUniversitat Politècnica de Catalunya
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/doctoralThesis, info:eu-repo/semantics/publishedVersion
Formatapplication/pdf
SourceTDX (Tesis Doctorals en Xarxa)
Rightsinfo:eu-repo/semantics/openAccess, ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs.

Page generated in 0.0031 seconds