Global ETD Search

1	Measuring the effect of memory bandwidth contention in applications on multi-core processors
Lindberg, Emil. January 2015.
In this thesis we design and implement a tool for benchmarking how sensitive applications are to main memory bandwidth contention in a multi-core environment, on an ARM Cortex-A15 CPU. The tool is designed to minimize its use of every shared resource except main memory bandwidth, so that the effects of bandwidth contention can be isolated. The difficulty lies in choosing a suitable memory access pattern: which addresses to access, in which order and at what rate, so that cache usage stays low while main memory bandwidth usage is high and controllable. We implement a tool with low cache usage that can still saturate the main memory bandwidth. The tool uses a proportional-integral controller to regulate the amount of bandwidth it consumes. We then use the tool to investigate the memory behaviour of the platform and of some applications while the tool consumes a variable amount of bandwidth. Analyzing the results is difficult, however, because the operating system we use lacks support for hardware performance counters, which forces us to rely on hardware timers for data gathering. Another difficulty is the platform's limited L2 cache bandwidth, which means the tool has a heavy impact on L2 cache read latency. Despite this, the tool lets us draw some conclusions about the bandwidth usage of other applications in optimal cases.
Keywords: Memory bandwidth; shared resources; contention; Computer Engineering (Datorteknik)
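As a rough illustration of the proportional-integral control mentioned in the abstract above, the following sketch drives a toy load generator toward a target bandwidth. It is not the thesis's implementation: the gains, the `peak_mb_s` saturation value, and the assumption that generated bandwidth scales linearly with a duty cycle are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Textbook discrete PI controller; the gains are hypothetical."""
    kp: float
    ki: float
    integral: float = 0.0

    def update(self, target: float, measured: float, dt: float) -> float:
        error = target - measured
        self.integral += error * dt  # accumulate error over time
        return self.kp * error + self.ki * self.integral


def simulate(target_mb_s: float, steps: int = 200) -> float:
    """Drive a toy load generator toward target_mb_s; return final bandwidth."""
    peak_mb_s = 1000.0  # hypothetical saturation bandwidth of the toy platform
    controller = PIController(kp=0.0002, ki=0.0003)
    measured = 0.0
    for _ in range(steps):
        duty = controller.update(target_mb_s, measured, dt=1.0)
        duty = min(1.0, max(0.0, duty))  # a duty cycle cannot leave [0, 1]
        measured = peak_mb_s * duty      # assumed linear plant model
    return measured


if __name__ == "__main__":
    print(simulate(400.0))   # settles close to the 400 MB/s target
    print(simulate(1500.0))  # target above peak: output saturates at peak
```

The integral term is what removes the steady-state error that a purely proportional controller would leave; a real tool would also need to measure the achieved bandwidth, which the abstract says had to be done with hardware timers.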
2	SoMMA : a software managed memory architecture for multi-issue processors
Jost, Tiago Trevisan. January 2017.
Embedded processors rely on the efficient use of instruction-level parallelism to meet the performance and energy needs of modern applications. Although improving performance is a primary goal for processors in general, it can have a negative impact on energy consumption, a particularly critical constraint for current systems. In this dissertation, we present SoMMA, a software-managed memory architecture for embedded multi-issue processors that can reduce energy consumption and energy-delay product (EDP) while still increasing memory bandwidth. We combine software-managed memories (SMMs) with the data cache, and leverage the lower energy access cost of SMMs to provide a processor with reduced energy consumption and EDP. SoMMA also improves overall performance, because memory accesses can be performed in parallel at no cost in extra memory ports on the data cache. Compiler-automated code transformations minimize the programmer's effort to benefit from the proposed architecture. Our experimental results show that SoMMA is more energy- and performance-efficient not only for the processing cores but also at full-system level. Comparisons were made using the VEX processor, a reconfigurable VLIW processor. The approach shows average speedups of 1.118x and 1.121x, while consuming up to 11% and 12.8% less energy, when comparing two modified processors with their baselines. SoMMA also reduces full-system EDP by up to 41.5% while keeping the same processor area as the baseline processors. Lastly, even though SoMMA halves the data cache size, it still reduces the number of data cache misses in comparison to the baselines.
Keywords: Memory: Computers; Embedded systems; Code generation process; Software-managed memory; Multi-issue processors; Memory bandwidth limitation; Instruction-level parallelism
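For readers unfamiliar with the metric, energy-delay product is conventionally the product of energy consumed and execution time. The relation below is a generic sketch of how a speedup and an energy saving combine. The symbols $r$ (fractional energy reduction) and $s$ (speedup) and the numbers plugged in are illustrative assumptions, not figures reproduced from the dissertation.

```latex
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{base}}}
  = \frac{E_{\text{new}}}{E_{\text{base}}} \cdot \frac{T_{\text{new}}}{T_{\text{base}}}
  = \frac{1 - r}{s}

% Hypothetical example: r = 0.10 (10% less energy), s = 1.10 (1.10x speedup)
\frac{1 - 0.10}{1.10} \approx 0.818
\quad\Rightarrow\quad \text{EDP reduced by about } 18.2\%
```

Because the two factors multiply, a design that saves energy and also runs faster improves EDP by more than either figure alone suggests.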
5	Hierarquia de memória configurável para redução energética no codificador de vídeo HEVC / Configurable memory hierarchy for energy reduction in HEVC video encoder
Martins, Anderson da Silva. 29 September 2017. Funding: no scholarship (Sem bolsa).
Recent data show a growing demand for video applications on mobile devices, which is a major challenge for research into high-performance video encoder architectures such as those for the HEVC standard. In an embedded system, energy consumption and performance are directly tied to the memory system. The video encoder is no different, and in HEVC the motion estimation (ME) step is known to account for most of the processing time and memory accesses. This work therefore presents a design space exploration to define energy-efficient cache memory configurations for the ME process, and proposes a configurable cache memory hierarchy that considers different video sequences and HEVC encoder configurations. The evaluation considered the widely used TZ Search algorithm, 23 video sequences with distinct resolutions, and four Quantization Parameters (QPs) under 32 different cache configurations. A cache simulator was developed, and the CACTI tool was used to obtain timing and energy parameters. This made it possible to identify the optimal cache configuration for each scenario, since no single cache configuration satisfies all scenarios at once when the goal is to reduce energy. Considering the optimal cache configuration for each scenario, using a cache can save up to 97.37% of external memory bandwidth, which corresponds to a reduction from 25.48 GB/s to 548.53 MB/s in one case. The energy reduction reaches 93.95%, which corresponds to a reduction from 5.02 mJ to 0.30 mJ, when comparing different cache configurations. These results made it possible to propose a configurable cache memory hierarchy for the motion estimation process that efficiently serves all tested scenarios. For the proposed configurable architecture, energy savings of up to 78.09% were found when the optimal configurations were compared with the worst case within the configurable cache (16KB-8). Compared with Level-C, energy savings of up to 86.91% were achieved. In addition, the external memory bandwidth savings ranged from 90.21% to 96.84%, with an average of 94.97%.
Keywords: Cache memory; Energy saving; HEVC; Motion estimation; Memory bandwidth reduction
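The abstract above mentions that a cache simulator was developed but gives no details of it. As a minimal sketch of what such a simulator can look like, the following models a set-associative cache and counts hits and misses for a stream of byte addresses. The LRU replacement policy, the class name `SetAssociativeCache`, and all sizes used are assumptions for illustration, not the configuration evaluated in the dissertation.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Record one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used line
        cache_set[tag] = None
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = SetAssociativeCache(size_bytes=1024, line_bytes=64, ways=2)
    for _ in range(2):                 # scan the same 1 KB region twice
        for addr in range(0, 1024, 4):
            cache.access(addr)
    print(cache.hits, cache.misses, cache.hit_rate())
```

Only misses generate external memory traffic in this model, so the miss count is a simple proxy for the bandwidth that a given configuration would still demand from external memory. Energy and timing per access would have to come from a separate model, which the abstract says was obtained with CACTI.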
