Spelling suggestions: "subject:"caches"" "subject:"coaches""
31 |
Caches collaboratifs noyau adaptés aux environnements virtualisés / A kernel cooperative cache for virtualized environmentsLorrillere, Maxime 04 February 2016 (has links)
Avec l'avènement du cloud computing, la virtualisation est devenue aujourd'hui incontournable. Elle offre isolation et flexibilité, en revanche elle implique une fragmentation des ressources, et notamment de la mémoire. Les performances des applications qui effectuent beaucoup d'entrées/sorties (E/S) en sont particulièrement impactées. En effet, celles-ci reposent en grande partie sur la présence de mémoire libre, utilisée par le système pour faire du cache et ainsi accélérer les E/S. Ajuster dynamiquement les ressources d'une machine virtuelle devient donc un enjeu majeur. Dans cette thèse nous nous intéressons à ce problème, et nous proposons Puma, un cache réparti permettant de mutualiser la mémoire inutilisée des machines virtuelles pour améliorer les performances des applications qui effectuent beaucoup d'E/S. Contrairement aux solutions existantes, notre approche noyau permet à Puma de fonctionner avec les applications sans adaptation ni système de fichiers spécifique. Nous proposons plusieurs métriques, reposant sur des mécanismes existants du noyau Linux, qui permettent de définir le niveau d'activité « cache » du système. Ces métriques sont utilisées par Puma pour automatiser le niveau de contribution d'un noeud au cache réparti. Nos évaluations de Puma montrent qu'il est capable d'améliorer significativement les performances d'applications qui effectuent beaucoup d'E/S et de s'adapter dynamiquement afin de ne pas dégrader leurs performances. / With the advent of cloud architectures, virtualization has become a key mechanism for ensuring isolation and flexibility. However, a drawback of using virtual machines (VMs) is the fragmentation of physical resources. As operating systems leverage free memory for I/O caching, memory fragmentation is particularly problematic for I/O-intensive applications, which suffer a significant performance drop. In this context, providing the ability to dynamically adjust the resources allocated among the VMs is a primary concern.To address this issue, this thesis proposes a distributed cache mechanism called Puma. Puma pools together the free memory left unused by VMs: it enables a VM to entrust clean page-cache pages to other VMs. Puma extends the Linux kernel page cache, and thus remains transparent, to both applications and the rest of the operating system. Puma adjusts itself dynamically to the caching activity of a VM, which Puma evaluates by means of metrics derived from existing Linux kernel memory management mechanisms. Our experiments show that Puma significantly improves the performance of I/O-intensive applications and that it adapts well to dynamically changing conditions.
|
32 |
Protocoles scalables de cohérence des caches pour processeurs manycore à espace d'adressage partagé visant la basse consommation. / Scalable cache coherence protocols for energy-efficient shared memory manycore processorsLiu, Hao 27 January 2016 (has links)
L'architecture TSAR (Tera-Scale ARchitecture) développée conjointement par BULL, le Lip6 et le CEA-LETI est une architecture manycore CC-NUMA extensible jusqu'à 1024 cœurs. Le protocole de cohérence de cache DHCCP dans l'architecture TSAR repose sur le principe du répertoire global distribué en utilisant la stratégie d'écriture simultanée afin de passer à l'échelle, mais cette scalabilité a un coût énergétique important que nous cherchons à réduire. Actuellement, les plus grosses entreprises dans le domaine des semi-conducteurs, comme Intel ou AMD, utilisent les protocoles MESI ou MOESI dans leurs processeurs multicoeurs. Ces types de protocoles utilisent la stratégie d'écriture différée pour réduire la forte consommation énergétique due aux écritures. Mais la complexité d'implémentation et la forte augmentation de ce trafic de cohérence quand le nombre de processeurs augmente limite le passage à l'échelle de ces protocoles au-delà de quelques dizaines de coeurs. Dans cette thèse, nous proposons un nouveau protocole de cohérence de cache utilisant une méthode hybride pour traiter les écritures dans le cache L1 privé : pour les lignes non partagées, le contrôleur de cache L1 utilise la stratégie d'écriture différée, de façon à modifier les lignes localement. Pour les lignes partagées, le contrôleur de cache L1 utilise la stratégie d'écriture immédiate pour éviter l'état de propriété exclusive sur ces lignes partagées. Cette méthode, appelée RWT pour Released Write Through, passe non seulement à l'échelle, mais réduit aussi significativement la consommation énergétique liée aux écritures. Nous avons aussi optimisé la solution actuelle pour gérer la cohérence des TLBs dans l'architecture TSAR, en termes de performance et de consommation énergétique. Enfin, nous introduisons dans cette thèse un nouveau petit cache, appelé micro-cache, entre le coeur et le cache L1, afin de réduire le nombre d'accès au cache d'instructions. / The TSAR architecture (Tera-Scale ARchitecture) developed jointly by Lip6 Bull and CEA-LETI is a CC-NUMA manycore architecture which is scalable up to 1024 cores. The DHCCP cache coherence protocol in the TSAR architecture is a global directory protocol using the write-through policy in the L1 cache for scalability purpose, but this write policy causes a high power consumption which we want to reduce. Currently the biggest semiconductors companies, such as Intel or AMD, use the MESI MOESI protocols in their multi-core processors. These protocols use the write-back policy to reduce the high power consumption due to writes. However, the complexity of implementation and the sharp increase in the coherence traffic when the number of processors increases limits the scalability of these protocols beyond a few dozen cores. In this thesis, we propose a new cache coherence protocol using a hybrid method to process write requests in the L1 private cache : for exclusive lines, the L1 cache controller chooses the write-back policy in order to modify locally the lines as well as eliminate the write traffic for exclusive lines. For shared lines, the L1 cache controller uses the write-through policy to simplify the protocol and in order to guarantee the scalability. We also optimized the current solution for the TLB coherence problem in the TSAR architecture. The new method which is called CC-TLB not only improves the performance, but also reduces the energy consumption. Finally, this thesis introduces a new micro cache between the core and the L1 cache, which allows to reduce the number of accesses to the instruction cache, in order to save energy.
|
33 |
Chiapa de Corzo Mound 3 Revisited: Burials, Caches, and ArchitectureOstler, Michaela Ann 01 August 2017 (has links)
Chiapa de Corzo Mound 3 was excavated by Tim Tucker under the direction of the New World Archaeological Foundation in July 1965. Mound 3 is located in the ritual center of Chiapa de Corzo, the southwest quadrant. Significant Preclassic and Protoclassic architecture, burials, and caches were discovered there but were never fully analyzed or published. A complete analysis of this mound is necessary to better understand the role of Chiapa de Corzo as a whole and as a regional power. This thesis completes the analysis and accomplishes the following goals: (1) completes the ceramic analysis and classification started by Tucker, (2) produces a catalog of all the burials and caches and their furniture found in Mound 3, and (3) describes changes in the architecture of this mound for each construction phase to determine the general function of Mound 3 throughout its occupation.
|
34 |
Cache Miss Reduction Techniques for Embedded CPU Instruction CachesBatcher, Kenneth William 23 April 2008 (has links)
No description available.
|
35 |
Efficient and Scalable Cache Coherence for Many-Core Chip MultiprocessorsRos Bardisa, Alberto 24 September 2009 (has links)
La nueva tendencia para aumentar el rendimiento de los futuroscomputadores son los multiprocesadores en un solo chip (CMPs). Seespera que en un futuro cercano salgan al mercado CMPs con decenas deprocesadores. Hoy en d�a, la mejor manera de mantener la coherencia decache en estos sistemas es mediante los protocolos basados endirectorio. Sin embargo, estos protocolos tienen dos grandesproblemas: una gran sobrecarga de memoria y una alta latencia de losfallos de cache.Esta tesis se ha centrado en estos problemas claves para la eficienciay escalabilidad del CMP. En primer lugar, se ha presentado unaorganizaci�n de directorios escalable. En segundo lugar, se hanpropuesto los protocolos de coherencia directa, que evitan laindirecci�n al nodo home y, por tanto, reducen el tiempo de ejecuci�nde las aplicaciones. Por �ltimo, se ha desarrollado una pol�tica demapeo para caches compartidas pero f�sicamente distribuidas, quereduce la latencia de acceso y garantiza una distribuci�n uniforme delos datos con el fin de reducir su tasa de fallos. Esto se traducefinalmente en un menor tiempo de ejecuci�n para las aplicaciones. / Chip multiprocessors (CMPs) constitute the new trend for increasingthe performance of future computers. In the near future, chips withtens of cores will become more popular. Nowadays, directory-basedprotocols constitute the best alternative to keep cache coherence inlarge-scale systems. Nevertheless, directory-based protocols have twoimportant issues that prevent them from achieving better scalability:the directory memory overhead and the long cache miss latencies.This thesis focuses on these key issues. The first proposal is ascalable distributed directory organization that copes with the memoryoverhead of directory-based protocols. The second proposal presentsthe direct coherence protocols, which are aimed at avoiding theindirection problem of traditional directory-based protocols and,therefore, they improve applications' performance. Finally, a novelmapping policy for distributed caches is presented. This policyreduces the long access latency while lessening the number of off-chipaccesses, leading to improvements in applications' execution time.
|
36 |
Designing Energy-Aware Optimization Techniques through Program Behaviour AnalysisKommaraju, Ananda Varadhan January 2014 (has links) (PDF)
Green computing techniques aim to reduce the power foot print of modern embedded devices with particular emphasis on processors, the power hot-spots of these devices. In this thesis we propose compiler-driven and profile-driven optimizations that reduce power consumption in a modern embedded processor. We show that these optimizations reduce power consumption in functional units and memory subsystems with very low performance loss. We present three new techniques to reduce power consumption in processors, namely, transition aware scheduling, leakage reduction in data caches using criticality analysis, and dynamic power reduction in data caches using locality analysis of data regions.
A novel instruction scheduling technique to address leakage power consumption in functional units is proposed. This scheduling technique, transition aware scheduling, is motivated by idle periods that arise in the utilization of functional units during program execution. A continuously large idle period in a functional unit can be exploited to place the unit in low power state. This novel scheduling algorithm increases the duration of idle periods without hampering performance and drives power gating in these periods. A power model defined with idle cycles as a parameter shows that this technique saves up to 25% of leakage power with very low performance impact.
In modern embedded programs, data regions can be classified as critical and non-critical. Critical data regions significantly impact the performance. A new technique to identify such data regions through profiling is proposed. This technique along with a new criticality based cache policy is used to control the power state of the data cache. This scheme allocates non-critical data regions to low-power cache regions, thereby reducing leakage power consumption by up to 40% without compromising on the performance.
This profiling technique is extended to identify data regions that have low locality. Some data regions have high data reuse. A locality based cache policy based on cache parameters like size and associativity is proposed. This scheme reduces dynamic as well as static power consumption in the cache subsystem. This optimization reduces 25% of the total power consumption in the data caches without hampering the execution time.
In this thesis, the problem of power consumption of a program is decoupled from the number of processor cores. The underlying architecture model is simplified to abstract away a variety of processor scenarios. This simplified model can be scaled up to be implemented in various multi-core architecture models like Chip Multi-Processors, Simultaneous Multi-Threaded Processors, Chip Multi-Threaded Processors, to name a few.
The three techniques proposed in this thesis leverage underlying hardware features like low power functional units, drowsy caches and split data caches. These techniques reduce power consumption of a wide range of benchmarks with low performance loss.
|
37 |
Etude de méthodes et mécanismes pour un accès transparent et efficace aux données dans un système multiprocesseur sur puceGuironnet De Massas, P. 12 November 2009 (has links) (PDF)
Afin de fournir toujours plus de puissance de calcul les architectes intègrent plusieurs dizaines de processeurs dans une même puce. Le but de nos travaux est d'améliorer l'efficacité des accès aux données à l'aide de solutions entièrement transparentes au logiciel. Notre contexte vise les machines multiprocesseurs à base de NoC qui possèdent des caches L1 et de la mémoire partagée et distribuée. Dans une première partie nous montrons que la redéfinition des contraintes dans les systèmes embarqués rend l'utilisation du protocole de cohérence write-through invalidate envisageable. Nous présentons également une solution innovante pour évaluer et comparer les protocoles de cohérence mémoire. Dans une deuxième partie nous présentons une solution innovante à la migration des données dans la puce. Celle-ci, gérée par le matériel, vise à placer dynamiquement et intelligemment les données afin de diminuer le coût d'accès moyen à la mémoire.
|
38 |
Reconnaissance de l'écriture manuscrite en-ligne par approche combinant systèmes à vastes marges et modèles de Markov cachésAhmad, Abdul Rahim 29 December 2008 (has links) (PDF)
Nos travaux concernent la reconnaissance de l'écriture manuscrite qui est l'un des domaines de prédilection pour la reconnaissance des formes et les algorithmes d'apprentissage. Dans le domaine de l'écriture en-ligne, les applications concernent tous les dispositifs de saisie permettant à un usager de communiquer de façon transparente avec les systèmes d'information. Dans ce cadre, nos travaux apportent une contribution pour proposer une nouvelle architecture de reconnaissance de mots manuscrits sans contrainte de style. Celle-ci se situe dans la famille des approches hybrides locale/globale où le paradigme de la segmentation/reconnaissance va se trouver résolu par la complémentarité d'un système de reconnaissance de type discriminant agissant au niveau caractère et d'un système par approche modèle pour superviser le niveau global. Nos choix se sont portés sur des Séparateurs à Vastes Marges (SVM) pour le classifieur de caractères et sur des algorithmes de programmation dynamique, issus d'une modélisation par Modèles de Markov Cachés (HMM). Cette combinaison SVM/HMM est unique dans le domaine de la reconnaissance de l'écriture manuscrite. Des expérimentations ont été menées, d'abord dans un cadre de reconnaissance de caractères isolés puis sur la base IRONOFF de mots cursifs. Elles ont montré la supériorité des approches SVM par rapport aux solutions à bases de réseaux de neurones à convolutions (Time Delay Neural Network) que nous avions développées précédemment, et leur bon comportement en situation de reconnaissance de mots.
|
39 |
Operating System Management of Shared Caches on Multicore ProcessorsTam, David Kar Fai 01 September 2010 (has links)
Our thesis is that operating systems should manage the on-chip shared caches of multicore processors for the purposes of achieving performance gains. Consequently, this dissertation demonstrates how the operating system can profitably manage these shared caches. Two shared-cache management principles are investigated: (1) promoting shared use of the shared cache, demonstrated by an automated online thread clustering technique, and (2) providing cache space isolation, demonstrated by a software-based cache partitioning technique. In support of providing isolation, cache provisioning is also investigated, demonstrated by an automated online technique called RapidMRC. We show how these software-based techniques are feasible on existing multicore systems with the help of their hardware performance monitoring units and their associated hardware performance counters. On a 2-chip IBM POWER5 multicore system, promoting sharing reduced processor pipeline stalls caused by cross-chip cache accesses by up to 70%, resulting in performance improvements of up to 7%. On a larger 8-chip IBM POWER5+ multicore system, the potential for up to 14% performance improvement was measured. Providing isolation improved performance by up to 50%, using an exhaustive offline search method to determine optimal partition size. On the other hand, up to 27% performance improvement was extracted from the corresponding workload using an automated online approximation technique, made possible by RapidMRC.
|
40 |
Operating System Management of Shared Caches on Multicore ProcessorsTam, David Kar Fai 01 September 2010 (has links)
Our thesis is that operating systems should manage the on-chip shared caches of multicore processors for the purposes of achieving performance gains. Consequently, this dissertation demonstrates how the operating system can profitably manage these shared caches. Two shared-cache management principles are investigated: (1) promoting shared use of the shared cache, demonstrated by an automated online thread clustering technique, and (2) providing cache space isolation, demonstrated by a software-based cache partitioning technique. In support of providing isolation, cache provisioning is also investigated, demonstrated by an automated online technique called RapidMRC. We show how these software-based techniques are feasible on existing multicore systems with the help of their hardware performance monitoring units and their associated hardware performance counters. On a 2-chip IBM POWER5 multicore system, promoting sharing reduced processor pipeline stalls caused by cross-chip cache accesses by up to 70%, resulting in performance improvements of up to 7%. On a larger 8-chip IBM POWER5+ multicore system, the potential for up to 14% performance improvement was measured. Providing isolation improved performance by up to 50%, using an exhaustive offline search method to determine optimal partition size. On the other hand, up to 27% performance improvement was extracted from the corresponding workload using an automated online approximation technique, made possible by RapidMRC.
|
Page generated in 0.0491 seconds