  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Application-driven Memory System Design on FPGAs

Dai, Zefu 08 January 2014 (has links)
Moore's Law has helped Field Programmable Gate Arrays (FPGAs) scale continuously in speed, capacity and energy efficiency, allowing the integration of ever-larger systems into a single FPGA chip. This brings challenges to the productivity of developers in leveraging the sea of FPGA resources. Higher levels of design abstraction and programming models are needed to improve design productivity, which in turn require architectural support for memory on FPGAs. While previous efforts focus on computation-centric applications, we take a bandwidth-centric approach in designing memory systems. In particular, we investigate the scheduling, buffered switching and searching problems, which are common to a wide range of FPGA applications. Although the bandwidth problem has been extensively studied for general-purpose computing and application-specific integrated circuit (ASIC) designs, the proposed techniques are often not applicable to FPGAs. In order to achieve optimized design implementations, designers need to consider both the underlying FPGA physical characteristics and the requirements of the applications. We therefore extract design requirements from four driving applications for the selected problems, and address them by exploiting the physical architectures and available resources of FPGAs. Towards solving the selected problems, we advance the state of the art with a scheduling algorithm, a switch organization and a cache analytical model. These lead to performance improvements, resource savings and feasible new approaches to well-known problems.
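The abstract does not detail the cache analytical model it refers to. As a rough illustration of the kind of analytical modeling involved, the sketch below estimates the hit rate of a fully associative LRU cache from a reuse-distance (stack-distance) histogram; this is a standard textbook technique, not necessarily the model developed in the thesis, and the toy trace is invented.

```python
# Minimal sketch: estimate LRU hit rate from stack (reuse) distances.
# A generic analytical-model illustration, not the thesis's exact model.
from collections import Counter

def stack_distances(trace):
    """Yield the LRU stack distance of each access (inf on first touch)."""
    stack = []  # most recently used address at the end
    for addr in trace:
        if addr in stack:
            d = len(stack) - 1 - stack.index(addr)  # distinct lines touched since last use
            stack.remove(addr)
        else:
            d = float("inf")
        stack.append(addr)
        yield d

def lru_hit_rate(trace, cache_lines):
    """A fully associative LRU cache of C lines hits whenever distance < C."""
    hist = Counter(stack_distances(trace))
    hits = sum(n for d, n in hist.items() if d < cache_lines)
    return hits / sum(hist.values())

if __name__ == "__main__":
    trace = [0, 1, 2, 0, 1, 3, 0, 4, 1, 2] * 100   # toy address trace
    for c in (2, 4, 8):
        print(f"{c} lines -> hit rate {lru_hit_rate(trace, c):.2f}")
```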
22

Contribution à l’évaluation de la qualité de la collaboration en conception de produits. / Contribution to the assessment of the quality of collaboration in product design.

Kobenan, Kouamé Jean-Moïse 09 December 2016 (has links)
The functional organization of today's enterprises requires working in project mode. These projects are carried out by experts from different specialties and backgrounds, and the teams need supporting tools to perform well and to propose solutions adapted to the many needs of increasingly demanding markets. In the course of their meetings and activities, a group awareness is created that is enriched through interactions and relies on various external representations. This thesis studies the mechanisms underlying the performance of collaborative design teams in synchronous meetings. It seeks to demonstrate the links between their Transactive Memory System (TMS), their Collaborative Design Activities (CDA) and the objects designated as Resources (RSC), and to identify the elements that favor these links. We first carried out a survey of collaborative design teams in an academic environment. We then ran an experiment with two teams in a synchronous collaborative design situation built around the serious game Delta Design deployed on an interactive table. The results show that performance is favored by TMS and by CDA, that the statistical link between the two is established, and that the link between design activities and the resources mobilized is established as well. However, the lack of a strong statistical link between TMS and RSC suggests that teams are not aware of using objects in the construction of their TMS, even though the artifacts of the interactive table are the most solicited and decision-making activities appear to be the most important in these sessions. This thesis contributes to the body of knowledge on collaborative design teams and provides tools for evaluating the performance of synchronous collaborative design teams.
23

Caracterização de memorias analogicas implementadas com transistores MOS floating gate / Characterization of analog memories implemented with floating-gate MOS transistors

Couto, Andre Luis do 28 November 2005 (has links)
Advisor: Carlos Alberto dos Reis Filho / Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: The monolithic integration of memories and analog circuits in the same die offers several advantages: smaller application boards, higher reliability and, above all, lower cost. Such integration requires dispensing with memory-specific fabrication technology and using only conventional CMOS technology, which efficiently supports very high levels of integration. The integration becomes more attractive as the data-storage capacity, i.e. the information density, increases, and analog memories are particularly well suited to this, since a single cell (usually one or two transistors) can store data that would require several digital memory cells and therefore more area. In this work, different memory cells using floating-gate MOS transistors were implemented in a conventional CMOS technology and shown to be feasible, and characterization results covering programming modes, data retention and endurance were obtained, along with the main characteristics of the FGMOS (Floating Gate MOS) devices. The results show that it is possible to fabricate and program floating-gate MOSFET analog memories, and this work serves as a starting point and reference for future studies in the area. / Master's degree in Electrical Engineering (Electronics, Microelectronics and Optoelectronics)
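For readers unfamiliar with floating-gate storage, such cells work by trapping charge on an electrically isolated gate, which shifts the transistor's threshold voltage as seen from the control gate. The sketch below applies the textbook relation delta_Vt = -Q_fg / C_cg for illustration only; the coupling-capacitance value is an assumption, not a figure from the dissertation.

```python
# Minimal sketch (illustrative only): threshold-voltage shift of a floating-gate
# MOS cell as a function of the charge stored on the floating gate.
# Assumes the textbook relation dVt = -Q_fg / C_cg, with C_cg the
# control-gate-to-floating-gate coupling capacitance (value below is made up).

C_CG = 1.0e-15          # assumed coupling capacitance, 1 fF
ELECTRON = 1.602e-19    # elementary charge, C

def delta_vt(n_electrons, c_cg=C_CG):
    """Threshold shift caused by n_electrons injected onto the floating gate."""
    q_fg = -n_electrons * ELECTRON      # injected electrons -> negative stored charge
    return -q_fg / c_cg                 # negative charge raises the threshold voltage

if __name__ == "__main__":
    for n in (1_000, 5_000, 10_000):
        print(f"{n} electrons -> dVt = {delta_vt(n):.3f} V")
```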
24

Benefits of transactive memory systems in large-scale development

Aivars, Sablis January 2016 (has links)
Context. Large-scale software development projects are those consisting of a large number of teams, possibly spread across multiple locations, working on large and complex software tasks. This means that neither an individual team member nor an entire team holds all the knowledge about the software being developed, and teams have to communicate and coordinate their knowledge. Therefore, teams and team members in large-scale software development projects must acquire and manage expertise as one of the critical resources for high-quality work. Objectives. We aim at understanding whether software teams in different contexts develop transactive memory systems (TMS) and whether a well-developed TMS leads to performance benefits, as suggested by research conducted in other knowledge-intensive disciplines. Because multiple factors may influence the development of a TMS, based on the related TMS literature we also focus on task allocation strategies, task characteristics and management decisions regarding project structure, team structure and team composition. Methods. We use data from two large-scale distributed development companies and 9 teams, including quantitative data collected through a survey and qualitative data from interviews, to measure transactive memory systems and their role in determining team performance. We measure teams' TMS with a latent variable model. Finally, we use focus group interviews to analyze different organizational practices with respect to team management, as a set of decisions based on two aspects: team structure and composition, and task allocation. Results. Data from the two companies and 9 teams are analyzed, and a positive influence of a well-developed TMS on team performance is found. We found that in large-scale software development, teams need not only a well-developed internal TMS but also a well-developed and effective external TMS. Furthermore, we identified practices that help or hinder the development of TMS in large-scale projects. Conclusions. Our findings suggest that teams working in large-scale software development can achieve performance benefits if transactive memory practices within the team are supported with networking practices in the organization.
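The thesis measures TMS from survey data with a latent variable model. The sketch below shows only a naive alternative for orientation: averaging hypothetical Likert items into the three TMS dimensions commonly used in this literature (specialization, credibility, coordination) and then into a team-level score. The item structure and numbers are assumptions for illustration, not the instrument or model used in the thesis.

```python
# Minimal sketch (assumed scale structure): aggregate survey items into a
# team-level TMS composite. The thesis uses a latent-variable model; this is
# only a naive mean-based composite for illustration.
from statistics import mean

DIMENSIONS = ("specialization", "credibility", "coordination")

def member_score(responses):
    """Average a member's items within each dimension, then across dimensions."""
    return mean(mean(responses[d]) for d in DIMENSIONS)

def team_tms(team_responses):
    """Team-level TMS = mean of member-level composites."""
    return mean(member_score(r) for r in team_responses)

if __name__ == "__main__":
    # Hypothetical 1-5 Likert responses for a two-member team.
    team = [
        {"specialization": [4, 5, 4], "credibility": [4, 4, 5], "coordination": [3, 4, 4]},
        {"specialization": [5, 5, 4], "credibility": [3, 4, 4], "coordination": [4, 4, 5]},
    ]
    print(f"team TMS composite: {team_tms(team):.2f}")
```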
25

Adaptive and intelligent memory systems / Système mémoire adaptatif intelligent

Sridharan, Aswinkumar 15 December 2016 (has links)
In this thesis, we focus on addressing interference at the shared memory-hierarchy resources, namely the last-level cache and off-chip memory access, in the context of large-scale multicore systems. The first contribution targets shared last-level caches in situations where the number of applications sharing the cache may exceed its associativity. To manage caches in such situations, our solution estimates the cache footprint of applications to approximate how well they could utilize the cache. A quantitative estimate of cache utility explicitly allows different priorities to be enforced across applications. The second part brings prefetch awareness into cache management; in particular, we observe that prefetched cache blocks exhibit good reuse behavior in the context of larger caches. Our third contribution addresses interference between on-demand and prefetch requests at the shared off-chip memory access. This work is based on two fundamental observations concerning the fraction of prefetch requests generated and its correlation with prefetch usefulness and prefetcher-caused interference. Together, these observations lead us to control the flow of prefetch requests between the LLC and off-chip memory.
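The third contribution controls the flow of prefetch requests based on how useful the prefetcher is observed to be. As a generic illustration of that idea, and not the exact mechanism proposed in the thesis, the sketch below adapts a prefetch degree from a running accuracy estimate; the thresholds and epoch handling are invented parameters.

```python
# Minimal sketch: accuracy-driven prefetch throttling, a generic illustration
# of regulating prefetch flow between the LLC and off-chip memory.
# Thresholds and degrees are made-up parameters, not values from the thesis.

class PrefetchThrottle:
    def __init__(self, max_degree=8, hi=0.75, lo=0.40):
        self.issued = 0          # prefetches sent to off-chip memory this epoch
        self.useful = 0          # prefetched blocks later hit by demand requests
        self.degree = max_degree // 2
        self.max_degree = max_degree
        self.hi, self.lo = hi, lo

    def record_issue(self):
        self.issued += 1

    def record_useful(self):
        self.useful += 1

    def update_degree(self):
        """Called periodically (e.g. every epoch) to adapt aggressiveness."""
        if self.issued == 0:
            return self.degree
        accuracy = self.useful / self.issued
        if accuracy > self.hi:
            self.degree = min(self.degree + 1, self.max_degree)
        elif accuracy < self.lo:
            self.degree = max(self.degree - 1, 1)
        self.issued = self.useful = 0   # start a new measurement epoch
        return self.degree

if __name__ == "__main__":
    t = PrefetchThrottle()
    for hit in [True, True, False, True] * 10:   # toy epoch: 75% accurate
        t.record_issue()
        if hit:
            t.record_useful()
    print("new prefetch degree:", t.update_degree())
```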
26

Hierarchical Matrix Techniques on Massively Parallel Computers

Izadi, Mohammad 12 April 2012 (has links)
Hierarchical matrix (H-matrix) techniques can be used to treat dense matrices efficiently. With an H-matrix, both storage and all fundamental operations, namely matrix-vector multiplication, matrix-matrix multiplication and matrix inversion, can be performed in almost linear complexity. In this work, we seek further speedup of the H-matrix arithmetic by utilizing multiple processors. Our approach to distributing an H-matrix relies on splitting the index set. The main results achieved in this work, based on the index-wise H-distribution, are: a highly scalable algorithm for H-matrix truncation and matrix-vector multiplication, a scalable algorithm for H-matrix-matrix multiplication, and an algorithm with limited scalability for H-matrix inversion on a large number of processors.
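As background on why H-matrix arithmetic can reach almost linear complexity, the sketch below shows the basic building block: an admissible off-diagonal block stored as a rank-k factorization U V^T, so that a matrix-vector product with the block costs O(k(m+n)) instead of O(mn). It illustrates the standard data structure only, not the index-wise distribution scheme developed in the thesis.

```python
# Minimal sketch: a low-rank (Rk) block, the building block of hierarchical
# (H-) matrices. Storing an m x n block as U @ V.T with rank k reduces both
# storage and matvec cost from O(m*n) to O(k*(m+n)).
import numpy as np

class LowRankBlock:
    def __init__(self, U, V):
        self.U = U              # shape (m, k)
        self.V = V              # shape (n, k), block ~= U @ V.T

    @classmethod
    def from_dense(cls, A, k):
        """Truncated-SVD compression of a dense block to rank k."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return cls(U[:, :k] * s[:k], Vt[:k, :].T)

    def matvec(self, x):
        return self.U @ (self.V.T @ x)   # two thin products, O(k*(m+n))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A smooth kernel evaluated on two well-separated point clusters is numerically low rank.
    xs, ys = rng.uniform(0, 1, 200), rng.uniform(10, 11, 300)
    A = 1.0 / np.abs(xs[:, None] - ys[None, :])
    blk = LowRankBlock.from_dense(A, k=6)
    v = rng.standard_normal(300)
    err = np.linalg.norm(A @ v - blk.matvec(v)) / np.linalg.norm(A @ v)
    print(f"relative matvec error at rank 6: {err:.2e}")
```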
27

Memory Turbo Boost: Architectural Support for Using Unused Memory for Memory Replication to Boost Server Memory Performance

Zhang, Da 28 June 2023 (has links)
A significant portion of the memory in servers today is often unused. Our large-scale study of HPC systems finds that more than half of the total memory in active nodes running user jobs is unused 88% of the time. Google and Azure Cloud studies also report that unused memory accounts for 40% of the total memory in their servers, on average. Leaving so much memory unused is wasteful. To address this problem, we note that in the context of CPUs, Turbo Boost can turn off unused cores to boost the performance of in-use cores. However, there is no equivalent technology in the context of memory: no matter how much memory is unused, the performance of in-use memory remains the same. This dissertation explores architectural techniques that utilize the unused memory to boost the performance of in-use memory, referred to collectively as Memory Turbo Boost. It explores how to turbo boost memory performance through memory replication; specifically, it explores how to efficiently store the replicas in the unused memory and investigates multiple architectural techniques that use the replicas to enhance memory system performance. Performance simulations show that Memory Turbo Boost can improve node-level performance by 18% on average across a wide spectrum of workloads. Our system-wide simulations show that applying Memory Turbo Boost to an HPC system provides a 1.4x average speedup in job turnaround time. / Doctor of Philosophy / Today's servers often have a significant portion of their memory unused. Our large-scale study of HPC systems finds that more than half of the total memory of an HPC server is unused most of the time; Google and Azure Cloud studies find that 40% of the total memory in their servers is often unused. Today's servers usually have hundreds of gigabytes to terabytes of memory, so 40% unused memory means tens to hundreds of gigabytes of unused memory per server. Leaving so much memory unused is wasteful. To address this problem, I note that other types of hardware already leverage unused resources to improve the performance of in-use resources. For example, CPU Turbo Boost can turn off unused cores to boost the performance of in-use cores, and modern SSDs can use unused space to switch Multi-Level Cell blocks to Single-Level Cell blocks to boost performance. However, there is no equivalent technology in the context of memory: no matter how much memory is unused, the performance of in-use memory remains the same. This dissertation explores techniques that utilize the unused memory to boost the performance of in-use memory, referred to collectively as Memory Turbo Boost. Performance evaluations show that Memory Turbo Boost can provide up to an 18% average performance improvement.
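The dissertation stores replicas of in-use memory in otherwise unused memory and uses them to speed up accesses. The sketch below is a toy functional model of that general idea, deciding which hot pages to replicate and serving them at a lower assumed latency; all latencies, thresholds and policies are invented for illustration and are not the architectural techniques evaluated in the dissertation.

```python
# Toy functional sketch of memory replication: once a page looks hot, place a
# replica in unused memory and serve reads from the cheaper copy.
# Latencies and the "hotness" threshold are invented for illustration.

class ReplicatedMemory:
    def __init__(self, primary_latency=100, replica_latency=60, free_replica_slots=2):
        self.primary_latency = primary_latency
        self.replica_latency = replica_latency
        self.free_slots = free_replica_slots    # capacity borrowed from unused memory
        self.replicated = set()
        self.access_count = {}

    def maybe_replicate(self, page):
        """Replicate a page into unused memory once it has been accessed often enough."""
        self.access_count[page] = self.access_count.get(page, 0) + 1
        if (page not in self.replicated and self.free_slots > 0
                and self.access_count[page] >= 3):
            self.replicated.add(page)
            self.free_slots -= 1

    def read_latency(self, page):
        self.maybe_replicate(page)
        return self.replica_latency if page in self.replicated else self.primary_latency

if __name__ == "__main__":
    mem = ReplicatedMemory()
    trace = [1, 1, 1, 1, 2, 1, 3, 1, 2, 2, 2]
    total = sum(mem.read_latency(p) for p in trace)
    print(f"total latency with replication: {total}")
```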
28

Optimization of memory management on distributed machine

Ha, Viet Hai 05 October 2012 (has links) (PDF)
In order to explore further the capabilities of parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores, an easy-to-use parallel language remains an important and challenging issue. From the programmer's point of view, OpenMP is very easy to use, with its ability to support incremental parallelization and its features for dynamically setting the number of threads and the scheduling strategy. However, as it was initially designed for shared memory systems, OpenMP is usually limited on distributed memory systems to intra-node computations. Many attempts have been made to port OpenMP onto distributed systems. The most prominent approaches mainly focus on exploiting the capabilities of a special network architecture and therefore cannot provide an open solution. Others are based on an already available software solution such as DMS, MPI or Global Array and, as a consequence, have difficulty becoming a fully compliant and high-performance implementation of OpenMP. As yet another attempt to build an OpenMP-compliant implementation for distributed memory systems, CAPE (Checkpointing Aided Parallel Execution) has been developed with the following idea: when reaching a parallel section, the master thread is dumped and its image is sent to the slaves; then, each slave executes a different thread; at the end of the parallel section, the slave threads extract and return to the master thread the list of all modifications that have been performed locally; the master incorporates these modifications and resumes its execution. In order to prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analysis showed that the large amount of data transferred between threads and the extraction of the list of modifications from complete checkpoints led to weak performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not handle shared data. This thesis presents the approaches we propose to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that supports saving incremental checkpoints discontinuously during the execution of a process. Based on DICKPT, the execution speed of the new version of CAPE was significantly increased; for example, the time to compute a large matrix-matrix product on a desktop cluster became very similar to the execution time of the same optimized MPI program. Moreover, the speedup associated with this new version for various numbers of threads is quite linear for different problem sizes. On the side of shared data, we propose UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it more appropriate to the characteristics of CAPE. Prototypes and algorithms implementing the synchronization and the OpenMP data-sharing clauses and directives are also specified. These two contributions ensure that CAPE can respect OpenMP's shared-data behavior.
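The abstract spells out CAPE's flow for a parallel section: snapshot the master, ship the image to the slaves, run a different part of the work on each slave, send back only the locally performed modifications, and merge them into the master. The sketch below is a plain functional mock-up of that flow, using dictionaries in place of process checkpoints; it is not CAPE's actual checkpoint-based implementation.

```python
# Functional mock-up of the CAPE parallel-section flow described above:
# 1) the master's image is captured and shipped to slaves, 2) each slave runs a
# different chunk of the parallel work, 3) slaves return only their local
# modifications, 4) the master merges the modifications and resumes.
# Real CAPE works on process checkpoints; dicts stand in for the memory image.
import copy

def snapshot(image):
    return copy.deepcopy(image)

def diff(before, after):
    """Modifications a slave performed locally (the list CAPE extracts from checkpoints)."""
    return {k: v for k, v in after.items() if before.get(k) != v}

def slave_run(image, chunk):
    before = snapshot(image)
    work = snapshot(image)
    for i in chunk:                      # toy loop body: a[i] = i * i
        work.setdefault("a", dict(before.get("a", {})))[i] = i * i
    return diff(before, work)

def parallel_section(master_image, iterations, num_slaves):
    chunks = [iterations[i::num_slaves] for i in range(num_slaves)]
    diffs = [slave_run(snapshot(master_image), c) for c in chunks]  # would be remote calls
    for d in diffs:                      # master merges all local modifications
        for k, v in d.items():
            if isinstance(v, dict):
                master_image.setdefault(k, {}).update(v)
            else:
                master_image[k] = v
    return master_image

if __name__ == "__main__":
    image = {"a": {}}
    parallel_section(image, list(range(8)), num_slaves=2)
    print(image["a"])   # squares computed by the two slaves, merged by the master
```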
29

兩人團隊創意交融系絡之比較個案研究 / Creative collaboration within two-person teams: A comparative case study

樊學良, Fan, Hsueh Liang Unknown Date (has links)
A two-person team is the smallest unit of team composition. Looking back at the history of human social and economic activity, many important innovations began with two-person teams. Creative tasks in particular lend themselves to a two-person basis, with each individual seeking a partner whose values, expertise and personality fit their own. Although two-person teams are a common form of social and economic activity and raise important theoretical and practical questions, the literature on them remains sparse. This dissertation responds to that gap by exploring the context of creative collaboration within two-person teams. The study adopts a comparative case-study method, conducting semi-structured in-depth interviews with eight two-person design teams, preceded by two pilot cases. Analysis of the interview transcripts yields three major findings. First, the creative collaboration process within a two-person design team begins with the members' exploration of the product's core concept and then moves through jointly constructing, refining and implementing the creative ideas. The insight into the product's core concept usually emerges when the two members, holding a similar awareness of the problem, jointly participate in a shared context; the insight belongs exclusively to neither party, yet it can only be obtained through their joint engagement in that context. Second, friendship is an important factor in developing the team's transactive memory system (TMS): it not only moves the two members from casual project work to formally forming a team, but also strengthens the TMS as the team operates. Third, members use different forms of communicative interaction to exchange information and refine each other's creative ideas, including competitive, collaborative and playful interactions. Overall, this dissertation extends our understanding of two-person teams and their creative collaboration, and contributes to the practice of team management and the management of team creativity. In practice, teams can cultivate the TMS as a core capability and, by having members jointly participate in shared contexts, create a setting conducive to creative collaboration between the two members.
30

Optimization of memory management on distributed machine / Optimisation de la gestion mémoire sur machines distribuées

Ha, Viet Hai 05 October 2012 (has links)
In order to explore further the capabilities of parallel computing architectures such as grids, clusters, multi-processors and, more recently, clouds and multi-cores, an easy-to-use parallel language remains an important and challenging issue. From the programmer's point of view, OpenMP is very easy to use, with its ability to support incremental parallelization and its features for dynamically setting the number of threads and the scheduling strategy. However, as it was initially designed for shared memory systems, OpenMP is usually limited on distributed memory systems to intra-node computations. Many attempts have been made to port OpenMP onto distributed systems. The most prominent approaches mainly focus on exploiting the capabilities of a special network architecture and therefore cannot provide an open solution. Others are based on an already available software solution such as DMS, MPI or Global Array and, as a consequence, have difficulty becoming a fully compliant and high-performance implementation of OpenMP. As yet another attempt to build an OpenMP-compliant implementation for distributed memory systems, CAPE (Checkpointing Aided Parallel Execution) has been developed with the following idea: when reaching a parallel section, the master thread is dumped and its image is sent to the slaves; then, each slave executes a different thread; at the end of the parallel section, the slave threads extract and return to the master thread the list of all modifications that have been performed locally; the master incorporates these modifications and resumes its execution. In order to prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analysis showed that the large amount of data transferred between threads and the extraction of the list of modifications from complete checkpoints led to weak performance. Furthermore, this version was restricted to parallel problems satisfying Bernstein's conditions, i.e. it did not handle shared data. This thesis presents the approaches we propose to improve CAPE's performance and to overcome the restrictions on shared data. First, we developed DICKPT (Discontinuous Incremental Checkpointing), an incremental checkpointing technique that supports saving incremental checkpoints discontinuously during the execution of a process. Based on DICKPT, the execution speed of the new version of CAPE was significantly increased; for example, the time to compute a large matrix-matrix product on a desktop cluster became very similar to the execution time of the same optimized MPI program. Moreover, the speedup associated with this new version for various numbers of threads is quite linear for different problem sizes. On the side of shared data, we propose UHLRC (Updated Home-based Lazy Release Consistency), a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it more appropriate to the characteristics of CAPE. Prototypes and algorithms implementing the synchronization and the OpenMP data-sharing clauses and directives are also specified. These two contributions ensure that CAPE can respect OpenMP's shared-data behavior.
