171 |
Avaliação do compartilhamento das memórias cache no desempenho de arquiteturas multi-core / Performance evaluation of shared cache memory for multi-core architectures. Alves, Marco Antonio Zanata. January 2009.
In the current context of multi-core innovation, in which new integration technologies provide an increasing number of transistors per chip, the study of techniques for increasing data throughput is of great importance for current and future multi-core and many-core processors. With the continuous demand for computational performance, cache memories have been widely adopted in many kinds of computer architecture designs. The processors currently on the market point toward the use of shared L2 caches, yet the gains and costs inherent to these cache-sharing models are not yet clear, which makes studies addressing the many aspects of cache sharing in multi-core processors important.
This dissertation therefore evaluates different cache-sharing organizations, modeling them and applying workloads in order to obtain significant results on the performance and influence of cache sharing in multi-core processors. Several cache-sharing configurations were evaluated using traditional performance techniques, such as higher associativity, larger line sizes, larger total cache sizes, and additional cache levels, investigating the correlation between these cache architectures and the different kinds of workload applications. The results show the importance of integrating the cache architecture design with the physical memory design in order to obtain the best trade-off between cache access time and miss reduction. Within the evaluated design space, and given the physical and performance constraints, the 1Core/L2 and 2Cores/L2 organizations with a total size of 32 MB (shared 2 MB banks) and a 128-byte line size represent a good choice for physical implementation in general-purpose systems, achieving good performance in all evaluated applications without major area and energy overheads. Furthermore, the dissertation concludes that, for current and future integration technologies, the traditional cache-side performance techniques, such as larger capacities, higher associativity, and larger line sizes, should not yield real performance gains unless the additional latency they introduce is reduced, so as to balance the reduction in miss rate against the data access time.
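The trade-off highlighted in this abstract, between miss-rate reduction and the extra hit latency of bigger or more associative caches, can be illustrated with the classic average memory access time (AMAT) formula. The sketch below is an editorial illustration with invented latency and miss-rate figures, not data from the dissertation:

```c
#include <stdio.h>

/* AMAT = hit_time + miss_rate * miss_penalty, applied per cache level.
 * The cycle counts and miss rates below are hypothetical, chosen only
 * to illustrate the trade-off. */
typedef struct {
    double hit_time;   /* cycles to hit in this level         */
    double miss_rate;  /* fraction of accesses that miss here */
} cache_level;

static double amat(const cache_level *lvl, int nlevels, double mem_latency)
{
    /* Work backwards: the miss penalty of level i is the AMAT of level i+1. */
    double penalty = mem_latency;
    for (int i = nlevels - 1; i >= 0; i--)
        penalty = lvl[i].hit_time + lvl[i].miss_rate * penalty;
    return penalty;
}

int main(void)
{
    /* A modest L2 with a fast hit time ...                            */
    cache_level fast[] = { {2, 0.05}, {12, 0.20} };
    /* ... versus a larger L2: fewer misses, but a slower hit.         */
    cache_level big[]  = { {2, 0.05}, {24, 0.15} };

    printf("AMAT with fast L2: %.2f cycles\n", amat(fast, 2, 200.0)); /* 4.60 */
    printf("AMAT with big  L2: %.2f cycles\n", amat(big,  2, 200.0)); /* 4.70 */
    /* If the extra hit latency is not contained, the lower miss rate of
     * the bigger cache fails to pay off -- the dissertation's central
     * observation. */
    return 0;
}
```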
|
172 |
Algoritmo de prefetching de dados temporizado para sistemas multiprocessadores baseados em NoC / Timed data prefetching algorithm for NoC-based multiprocessor systems. Silveira, Maria Cireno Ribeiro. 09 March 2015.
Prefetching is a technique considered efficient for mitigating a well-known problem in computer systems: the gap between processor performance and memory access time. The goal of prefetching is to bring data closer to the processor by fetching it from memory and loading it into the local cache. Once the data is requested by the processor, it is already available in the cache, reducing the miss rate and the system penalty. For NoC-based multiprocessor systems, prefetching efficiency is even more critical to performance, since the data access time varies with the distance between processor and memory and with the network traffic.
This work proposes a timed data prefetching algorithm that aims to minimize the penalty of the cores through a prefetching solution based on time prediction for NoC-based multiprocessor systems. The algorithm uses a proactive, server-initiated process to issue prefetching requests based on the cache miss history and on NoC information. In the experiments with 16 cores, the proposed algorithm reduced the processors' penalty in 53.6% of the cases when compared with event-based (cache-miss-triggered) prefetching, the largest penalty reduction being 29%.
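The core idea of a timed (time-predicting) prefetcher, as described above, can be sketched as follows. This is an editorial illustration, not the thesis's actual algorithm: all names and constants are hypothetical, and the memory server is assumed to track per-block miss history and the hop distance to each core:

```c
#include <stdint.h>

/* Minimal sketch of a time-based prefetch decision. The idea: predict
 * WHEN the next request for a block will arrive, and push the block
 * early enough to hide the NoC transfer latency. */

#define CYCLES_PER_HOP 4   /* assumed per-hop NoC link+router latency */

typedef struct {
    uint64_t addr;        /* block address                           */
    uint64_t last_miss;   /* cycle of the most recent miss           */
    uint64_t interval;    /* running estimate of inter-miss interval */
} miss_entry;

/* Exponentially weighted estimate of the time between misses. */
static void record_miss(miss_entry *e, uint64_t now)
{
    uint64_t delta = now - e->last_miss;
    e->interval = (3 * e->interval + delta) / 4;
    e->last_miss = now;
}

/* Called by the memory server: should block e be pushed to the
 * requesting core now, given its distance in hops? */
static int should_prefetch(const miss_entry *e, uint64_t now, int hops)
{
    uint64_t predicted_next = e->last_miss + e->interval;
    uint64_t transfer_time  = (uint64_t)hops * CYCLES_PER_HOP;
    /* Issue the prefetch so the block arrives just before it is needed. */
    return now + transfer_time >= predicted_next;
}
```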
|
173 |
Otimização de memória cache em tempo de execução para o processador embarcado LEON3 / Optimization of cache memory at runtime for embedded processor LEON3. Lucas Albers Cuminato. 28 April 2014.
Energy consumption is one of the most important issues in embedded systems. Studies show that in this kind of system the cache is responsible for consuming most of the energy supplied to the processor. In most embedded processors the cache configuration parameters are fixed and cannot be changed after manufacture/synthesis. This is not the ideal scenario, however, since the cache configuration may not suit a particular application, resulting in lower execution performance and excessive energy consumption. In this context, this work presents a hardware implementation, based on reconfigurable computing, capable of automatically, dynamically, and transparently reconfiguring the number of ways, and consequently the size, of the data cache of the embedded LEON3 processor, so that the cache adapts to the application at run time. With this technique, the goal is to improve application performance and reduce the system's energy consumption. The experimental results show that it is possible to reduce the applications' energy consumption by up to 5% with a performance degradation of only 0.1%.
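A minimal sketch of the kind of run-time way reconfiguration described above follows. It is illustrative only: the thresholds, the epoch length, and the set_active_ways() hook are assumptions, and the thesis implements the equivalent logic in reconfigurable hardware around the LEON3 data cache rather than in software:

```c
#include <stdint.h>

#define MAX_WAYS 4

extern void set_active_ways(int ways);   /* assumed hook into the cache */

typedef struct {
    uint64_t accesses, misses;
    int ways;
} tuner;

/* Called at the end of each observation epoch: grow the cache when the
 * miss rate is high, shrink it (to save energy) when it is very low. */
static void end_of_epoch(tuner *t)
{
    double miss_rate = t->accesses ? (double)t->misses / t->accesses : 0.0;

    if (miss_rate > 0.10 && t->ways < MAX_WAYS)
        set_active_ways(++t->ways);      /* too many misses: enable a way */
    else if (miss_rate < 0.01 && t->ways > 1)
        set_active_ways(--t->ways);      /* cache underused: disable one  */

    t->accesses = t->misses = 0;         /* start a new epoch */
}
```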
|
174 |
Implementação de cache no projeto ArchC / Cache implementation in the ArchC project. Almeida, Henrique Dante de. 2012.
Advisors: Paulo Cesar Centoducatte, Rodolfo Jardim de Azevedo. Master's dissertation (Mestrado em Ciência da Computação), Universidade Estadual de Campinas, Instituto de Computação, 2012.
The ArchC project aims to create an architecture description language for building simulators and toolchains for complete computer architectures. The goal of this work is to give ArchC the ability to generate cache simulators. To that end, a detailed study was carried out of caches (types, organizations, configurations, etc.) and of ArchC's operation and code base. The result is a collection of parameterizable caches that can be attached to the architectures described in ArchC. The cache implementation is modular, with isolated code for the cache storage and for the operation policies. Correctness was verified through a series of simulations of several cache configurations and through comparisons with the dinero simulator. The resulting cache showed a simulation-time overhead between 10% and 60% compared with a simulator without caches.
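The modular split between storage and operation policies described above can be sketched as a set-associative lookup with a pluggable victim-selection hook. This is an editorial illustration of the design idea, not ArchC's actual interface:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t tag;
    bool     valid;
    uint64_t stamp;          /* policy scratch: e.g. last-use time for LRU */
} line;

typedef struct cache {
    int sets, ways, line_bits;
    line *lines;                              /* sets * ways entries      */
    int (*victim)(struct cache *, int set);   /* replacement policy hook  */
} cache;

/* One interchangeable policy module: least recently used. */
static int lru_victim(cache *c, int set)
{
    line *s = &c->lines[set * c->ways];
    int v = 0;
    for (int w = 1; w < c->ways; w++)
        if (!s[w].valid || s[w].stamp < s[v].stamp) v = w;
    return v;
}

/* Storage logic, independent of the policy: returns true on a hit. */
static bool access_cache(cache *c, uint64_t addr, uint64_t now)
{
    uint64_t blk = addr >> c->line_bits;
    int set = (int)(blk % (uint64_t)c->sets);
    line *s = &c->lines[set * c->ways];

    for (int w = 0; w < c->ways; w++)
        if (s[w].valid && s[w].tag == blk) { s[w].stamp = now; return true; }

    int v = c->victim(c, set);               /* miss: policy picks victim */
    s[v] = (line){ .tag = blk, .valid = true, .stamp = now };
    return false;
}
```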
|
175 |
Gestion hétérogène des données dans les hiérarchies mémoires pour l'optimisation énergétique des architectures multi-coeurs / Read Only Data Specific Management for an Energy Efficient Memory System. Vaumourin, Gregory. 04 October 2016.
Energy consumption in the memory hierarchy is a major issue in current architectures, whether for embedded systems limited by their batteries or for supercomputers limited by their thermal envelopes. Introducing classification information into the memory system enables heterogeneous management adapted to each particular kind of data. This thesis focuses specifically on read-only data and studies the possibilities of managing it separately in the memory hierarchy through a compilation/architecture codesign. This opens new potential in terms of data locality, architectural scalability, and memory design. Evaluated by simulation on a multi-core architecture, the proposed solution achieves significant reductions in energy consumption at constant performance.
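As a rough illustration of how a toolchain can expose such a data classification, read-only objects can be tagged with a dedicated linker section so that a simulator or the hardware can treat that address range differently. The section name and the steering idea below are assumptions made for the example, not the thesis's actual codesign mechanism:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical section used to mark data that the compiler can prove
 * is never written; a custom linker script could map it to a separate,
 * energy-optimized read-only region. */
#define RO_MANAGED __attribute__((section(".rodata.readonly_managed")))

RO_MANAGED static const uint16_t crc_table[4] = { 0x0000, 0x1021, 0x2042, 0x3063 };

int main(void)
{
    /* Accesses to the tagged range need no coherence or write-back
     * machinery, which is where the energy savings would come from. */
    printf("%04x\n", crc_table[2]);
    return 0;
}
```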
|
176 |
Optimisation et visualisation de cache de luminance en éclairage global / Optimization and visualization of a radiance cache in global illumination. Omidvar, Mahmoud. 20 May 2015.
Lighting simulation is a complex process (computation time, memory cost, implementation complexity), even more so for glossy materials than for Lambertian or specular ones. To avoid the costly evaluation of certain terms of the radiance equation (the convolution between the material's reflection function and the radiance distribution of the environment), a new data structure called Equivalent Area light Sources (EAS) is proposed, capable of handling fuzzy mirror surfaces. Using this structure requires precomputing and then modeling the behavior of materials under various types of light sources (positions, extents); genetic algorithms are used to determine the parameters of the BRDF models, which introduces a first source of approximation. The lighting simulation approach is based on a radiance cache, which stores the incident illumination in EAS form at points called records. During the simulation, the luminous environment around each record must also be approximated dynamically as a set of equivalent area sources, which constitutes a second source of error. The overall uncertainty, however, does not reduce to the accumulation of the approximations made at each step: the comparisons carried out show, on the contrary, that the EAS approach is particularly interesting for rough materials, or for very glossy materials placed in relatively uniform environments, and that it considerably reduces both the memory cost and the computation time. The method also simplifies the interpolation process, since it avoids expressing and evaluating complex gradients. Once the EAS have been computed at each record, and for a number of viewpoints, a new interactive visualization method exploiting GPU performance is proposed, which proves faster than existing methods. Finally, the case of spectral photometric quantities, essential for accurate lighting simulations, is addressed by showing how the influence zones of the records can be adapted according to the radiance gradients and the geometry around the records.
|
177 |
Desempenho em ambiente Web considerando diferenciação de serviços (QoS) em cache, rede e servidor: modelagem e simulação / Performance in Web environments with differentiation of service (QoS) in caches, network and server: modeling and simulation. Iran Calixto Abrão. 18 December 2008.
This PhD thesis investigates alternatives for improving the performance of Web environments by evaluating the impact of using service-differentiation mechanisms at every point of the system. Scenarios with different configurations, covering both service differentiation and network congestion, were created and modeled in the OPNET Modeler. A cache server supporting service differentiation (the CDF cache) was implemented; it constitutes a contribution of this work, complementing the service-differentiation scenario positively by ensuring that the gains obtained in other stages of the system are not lost when the cache is used. The main results show that service differentiation introduced in isolated parts of the system may not deliver the desired performance gains. All the equipment considered in the proposed scenarios has real-world characteristics, and the models used in OPNET were evaluated and validated by their manufacturers. The models that implement these scenarios are therefore also an important contribution of this work: the study is not restricted to theoretical modeling but addresses aspects very close to reality, providing possible support for the management of Web systems.
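One simple way to picture service differentiation inside a cache server is an eviction policy that sacrifices low-priority service classes first. The sketch below is a generic illustration of that idea, not the CDF cache policy implemented in the thesis; the class count and data structures are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES 3            /* e.g. 0 = best effort ... 2 = premium */

typedef struct obj {
    struct obj *next;
    uint64_t last_use;
    size_t size;
} obj;

static obj *cached[NUM_CLASSES]; /* one list of cached objects per class */

/* Pick a victim: scan classes from lowest priority upwards and evict
 * the least recently used object of the lowest non-empty class, so
 * premium content is displaced only as a last resort. */
static obj *pick_victim(void)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        obj *victim = NULL;
        for (obj *o = cached[c]; o; o = o->next)
            if (!victim || o->last_use < victim->last_use)
                victim = o;
        if (victim)
            return victim;
    }
    return NULL;                 /* cache empty */
}
```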
|
178 |
Minimizing the unpredictability that real-time tasks suffer due to inter-core cache interference. Satka, Zenepe; Hodžić, Hena. January 2020.
As companies introduce new capabilities and features in their products, the demand for computing power and performance in real-time systems keeps increasing. To achieve higher performance, processor manufacturers have introduced multi-core platforms, which make it possible to execute different tasks in parallel on multiple cores. Because tasks share the same cache level, they suffer interference that affects their timing predictability. This thesis is divided into two parts. The first part presents a survey of the existing solutions to the problem of cache interference that tasks face on multi-core platforms. The second part focuses on a hardware-based technique introduced by Intel for achieving timing predictability of real-time tasks, called Cache Allocation Technology (CAT). The main idea of CAT is to divide the last-level cache into partitions, called classes of service, that are reserved for specific cores. Since the tasks of a core can only access its assigned partition, cache interference is reduced and better real-time task performance is achieved. Finally, to evaluate the efficiency of CAT, an experiment is conducted with different test cases; the results show a significant difference in real-time task performance when CAT is applied.
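On Linux, CAT can be driven through the resctrl filesystem. The sketch below shows the general shape of such a configuration; the bit masks, group name, and way counts are illustrative, since the number of capacity-mask bits and classes of service depends on the specific CPU, and it assumes resctrl is mounted at /sys/fs/resctrl (e.g. via "mount -t resctrl resctrl /sys/fs/resctrl") with root privileges:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int write_str(const char *path, const char *s)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    fprintf(f, "%s\n", s);
    return fclose(f);
}

int main(void)
{
    /* Create a resource group for the real-time tasks. */
    mkdir("/sys/fs/resctrl/rt_group", 0755);

    /* Give the group four dedicated L3 ways on cache domain 0
     * (mask 0xf0), leaving mask 0x0f to the default group, so the
     * two partitions do not overlap. */
    write_str("/sys/fs/resctrl/rt_group/schemata", "L3:0=f0");
    write_str("/sys/fs/resctrl/schemata",          "L3:0=0f");

    /* Pin the current process into the isolated partition. */
    char pid[16];
    snprintf(pid, sizeof pid, "%d", getpid());
    write_str("/sys/fs/resctrl/rt_group/tasks", pid);
    return 0;
}
```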
|
179 |
Dynamic Eviction Set Algorithms and Their Applicability to Cache Characterisation. Lindqvist, Maria. January 2020.
Eviction sets are groups of memory addresses that map to the same cache set. They can be used to perform efficient information-leaking attacks against the cache memory, so-called cache side-channel attacks. In this project, two different algorithms that find such sets are implemented and compared; the second improves on the first by using a concept called group testing. The project also evaluates whether these algorithms can be used to analyse or reverse-engineer cache characteristics, a new area of application for this type of algorithm. The results show that the optimised algorithm performs significantly better than the previous state-of-the-art algorithm, which means that countermeasures against this type of attack need to be designed with the possibility of faster attacks in mind. The results also show, as a proof of concept, that these algorithms can be used to build a cache-analysis tool.
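The group-testing reduction mentioned above, in the style of Vila et al. ("Theory and Practice of Finding Eviction Sets"), can be sketched as follows. It assumes a timing probe evicts() that reports whether accessing a candidate set flushes the victim from the cache, which in a real attack is decided by timing a reload of the victim; the associativity value is likewise an assumption:

```c
#include <stddef.h>
#include <string.h>

#define ASSOC 16   /* assumed cache associativity (target set size) */

extern int evicts(char **cand, size_t n, char *victim); /* timing probe */

/* Shrink cand[0..n-1] to a minimal eviction set of about ASSOC addresses.
 * Each round splits the candidates into ASSOC+1 groups; at least one
 * group can be dropped while the remainder still evicts the victim, so
 * the set shrinks geometrically instead of one address at a time. */
static size_t reduce(char **cand, size_t n, char *victim, char **out)
{
    while (n > ASSOC) {
        size_t g = (n + ASSOC) / (ASSOC + 1);   /* group size (ceiling) */
        int removed = 0;
        for (size_t i = 0; i < n && !removed; i += g) {
            size_t end = i + g < n ? i + g : n;
            char *rest[n];                       /* VLA for brevity */
            size_t m = 0;
            for (size_t j = 0; j < n; j++)
                if (j < i || j >= end) rest[m++] = cand[j];
            if (evicts(rest, m, victim)) {       /* group was redundant */
                memcpy(cand, rest, m * sizeof *rest);
                n = m;
                removed = 1;
            }
        }
        if (!removed) break;   /* no droppable group; cannot shrink further */
    }
    memcpy(out, cand, n * sizeof *out);
    return n;
}
```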
|
180 |
Profilem řízené optimalizace pro instrukční vyrovnávací paměti / Profile-Guided Optimizations for Instruction Caches. Bobek, Jiří. January 2015.
Instruction cache performance is very important to the overall performance of a computer. The placement of code blocks in memory can significantly affect the cache miss rate, which means that a compiler can improve a program's performance by placing parts of the code at the right addresses in memory. This work discusses several methods for collecting profile information and describes an algorithm that uses profile information to guide code block placement. The algorithm is implemented in the optimizer of the LLVM compiler, and the resulting improvements in cache performance are evaluated.
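A common family of profile-guided placement algorithms, in the spirit of Pettis and Hansen, greedily chains together functions joined by hot call edges so they share instruction-cache lines and conflict less. The sketch below illustrates that idea with an invented weight matrix; it is not necessarily the exact algorithm implemented in the thesis:

```c
#include <stdio.h>

#define NFUNCS 4

/* weight[a][b]: profiled call/transition count between functions a and b
 * (hypothetical numbers for the example). */
static long weight[NFUNCS][NFUNCS] = {
    { 0, 90, 5,  0},
    {90,  0, 0, 40},
    { 5,  0, 0,  3},
    { 0, 40, 3,  0},
};

int main(void)
{
    int placed[NFUNCS] = {0}, order[NFUNCS], n = 0;

    order[n] = 0; placed[0] = 1; n++;        /* seed with function 0 */

    /* Repeatedly append the unplaced function with the heaviest edge
     * to the function placed last (a simple chain-growing heuristic). */
    while (n < NFUNCS) {
        int last = order[n - 1], best = -1;
        long best_w = -1;
        for (int f = 0; f < NFUNCS; f++)
            if (!placed[f] && weight[last][f] > best_w) {
                best_w = weight[last][f];
                best = f;
            }
        order[n++] = best; placed[best] = 1;
    }

    printf("layout:");
    for (int i = 0; i < NFUNCS; i++) printf(" f%d", order[i]);
    printf("\n");  /* with the weights above: f0 f1 f3 f2 */
    return 0;
}
```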
|