  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Online Management of Resilient and Power Efficient Multicore Processors

Rodrigues, Rance 01 September 2013
The semiconductor industry has been driven by Moore's law for almost half a century. Miniaturization of devices has allowed more transistors to be packed into a smaller area, while improved transistor performance has resulted in a significant increase in frequency. Increased device density and rising frequency led, unfortunately, to a power density problem that became an obstacle to further integration. The processor industry responded by lowering processor frequency and integrating multiple processor cores on a die, choosing to focus on Thread Level Parallelism (TLP) for performance instead of traditional Instruction Level Parallelism (ILP). While continued scaling of devices has provided unprecedented integration, it has also led to a few serious problems. The first problem is the increasing rate of system failures due to soft errors and aging defects. Soft errors are caused by ionizing radiation that originates from radioactive contaminants or from the secondary release of charged particles by cosmic neutrons. Ionizing radiation may charge or discharge a storage node, causing bit flips that may result in a system failure. In this dissertation, we propose solutions for online detection of such errors in microprocessors. A small and functionally limited core called the Sentry Core (SC) is added to the multicore. It monitors the operation of the functional cores and, whenever deemed necessary, opportunistically initiates Dual Modular Redundancy (DMR) to test their operation. This scheme thus allows detection of potential core failures at a small hardware overhead. In addition to detecting soft errors, this solution is also capable of detecting errors introduced by device aging that result in operational failures. The solution is further extended to verify cache coherence transactions. The second problem we address in this dissertation relates to power. 
While the multicore solution addresses the power density problem, overall power dissipation is still limited by packaging and cooling technologies. This limits the number of cores that can be integrated for a given package specification. One way to improve performance within this constraint is to reduce the power dissipation of individual cores without sacrificing system performance. Prior solutions with this objective involve Dynamic Voltage and Frequency Scaling (DVFS) and the use of sleep states, both of which take advantage of coarse-grained variation in the demand for computation. In this dissertation, we propose techniques to maximize the performance-per-power of multicores at a fine-grained time scale. We propose multiple alternative architectures to attain this goal. One such architecture is the Asymmetric Multicore Processor (AMP). AMPs have been shown to outperform symmetric multicores in terms of performance and performance-per-Watt for a fixed resource and power budget. However, the effectiveness of these architectures depends on accurate thread-to-core scheduling. To address this problem, we propose online thread scheduling solutions that respond to the changing computational requirements of the threads. Another solution we consider is for Symmetric Multicore Processors (SMPs). Here we target sharing of large and underutilized resources between pairs of cores. While such architectures have been explored in the past, the evaluations were incomplete. Because of sharing, the shared resource sometimes becomes a bottleneck, resulting in significant performance loss. To mitigate such loss, we propose Dynamic Voltage and Frequency Boosting (DVFB) of the shared resources. This solution is found to significantly mitigate performance loss in times of contention. We also explore performance-per-Watt improvement of individual cores in a multicore. 
This is based on dynamic reconfiguration of individual cores to run them alternately in out-of-order (OOO) and in-order (InO) modes adapting dynamically to workload characteristics. This solution is found to significantly improve power efficiency without compromising overall performance. Thus, in this dissertation we propose solutions for several important problems to facilitate continued scaling of processors. Specifically, we address challenges in the area of reliability of computation and propose low power design solutions to address power constraints.
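The Dual Modular Redundancy check mentioned in this abstract can be illustrated in software. The sketch below is not the dissertation's Sentry Core design, which is a hardware mechanism; it only shows the underlying idea of running the same work twice and comparing the results. The names `dmr_check`, `flip_bit`, `checksum`, and the `faulty_core` fault-injection knob are all assumptions introduced for this example.

```python
def flip_bit(value: int, bit: int) -> int:
    """Simulate a soft error by flipping one bit of an integer result."""
    return value ^ (1 << bit)


def dmr_check(task, arg, faulty_core=None):
    """Run `task(arg)` twice and compare the results (Dual Modular Redundancy).

    `faulty_core` (None, 0, or 1) is a hypothetical knob that injects a
    single bit flip into one copy so the mismatch path can be exercised.
    Returns (results_agree, result_from_core_0).
    """
    results = [task(arg), task(arg)]
    if faulty_core is not None:
        results[faulty_core] = flip_bit(results[faulty_core], 3)
    return results[0] == results[1], results[0]


def checksum(n: int) -> int:
    """Toy workload: sum of the squares of the integers below n."""
    return sum(i * i for i in range(n))


ok, value = dmr_check(checksum, 10)               # fault-free run
bad, _ = dmr_check(checksum, 10, faulty_core=1)   # run with an injected bit flip
```

A mismatch only signals that one of the two copies is wrong. Two copies alone cannot tell which one, so DMR is a detection mechanism rather than a correction mechanism.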
72

Proposta para aceleração de desempenho de algoritmos de visão computacional em sistemas embarcados / Proposal for accelerating the performance of computer vision algorithms in embedded systems

Curvello, André Márcio de Lima 10 June 2016
This work presents a benchmark for evaluating the image-processing performance of the WandBoard Quad embedded platform, considering the use of its Vivante GC2000 GPU to execute routines written with OpenGL ES 2.0. The benchmark is based on running image filters on both the CPU and the GPU. Filters are the most commonly used operations in image processing, and they operate by means of convolutions, a technique that relies on successive matrix multiplications, which explains the high computational cost of image-filter algorithms. Using the GPU is therefore an attractive alternative that makes image processing feasible on embedded systems: it takes advantage of a resource already present in a wide range of devices on the market, and it can accelerate image-processing algorithms, which in turn are the basis for computer vision applications such as facial recognition and gesture recognition. Such applications are increasingly in demand in modern uses of embedded systems. To support this goal, comparative performance studies were carried out between systems and between libraries that help exploit the resources of multicore processors. To demonstrate the potential of the approach and to substantiate the proposal, a benchmark was run as a sequence of tests on a model application that applies the Sobel filter to a stream of images captured from a webcam. The application was run directly on the CPU and also on the embedded GPU. As a result, execution on the GPU through OpenGL ES 2.0 achieved nearly 10 times the performance of execution on the CPU and, when readback times are taken into account, a total performance gain of up to 4 times.
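To make the cost of the Sobel filter concrete, the following is a minimal CPU-only sketch in plain Python. It is not the benchmark application from the thesis, which uses a webcam stream and OpenGL ES 2.0 shaders. The function name `sobel`, the 4x4 test image, and the use of |gx| + |gy| as an approximation of the gradient magnitude are assumptions made for illustration.

```python
# Standard 3x3 Sobel kernels.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to horizontal intensity changes
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to vertical intensity changes


def sobel(image):
    """Return the approximate gradient magnitude |gx| + |gy| of a grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels, where
    the 3x3 window does not fit, are left at 0. The kernels are applied
    without flipping (cross-correlation); for Sobel this only changes the
    sign of gx and gy, which the absolute values discard.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # 9 multiply-accumulate steps per kernel for every output pixel.
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += GX[ky][kx] * pixel
                    gy += GY[ky][kx] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out


# A 4x4 test image with a vertical edge between columns 1 and 2.
edge = [[0, 0, 10, 10] for _ in range(4)]
result = sobel(edge)
```

Every output pixel is computed independently of the others, which is why this kind of filter maps naturally onto a GPU fragment shader.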
73

Selective Core Boosting: The Return of the Turbo Button

Wamhoff, Jons-Tobias, Diestelhorst, Stephan, Fetzer, Christof, Marlier, Patrick, Felber, Pascal, Dice, Dave 26 November 2013
Several modern multi-core architectures support dynamic control of the CPU's clock rate, allowing processor cores to temporarily operate at speeds exceeding the base operating frequency. Conversely, cores can operate at a lower speed or be disabled altogether to save power. Such facilities are notably provided by Intel's Turbo Boost and AMD's Turbo CORE technologies. Frequency control is typically driven by the operating system, which requests changes to the performance state of the processor based on the current system load. In this paper, we investigate the use of dynamic frequency scaling from user space to speed up multi-threaded applications that must occasionally execute time-critical tasks, or to solve problems that have heterogeneous computing requirements. We propose a general-purpose library that allows selective control of the frequency of the cores, subject to the limitations of the target architecture. We analyze the performance trade-offs and illustrate the benefits of temporarily boosting selected cores that execute time-critical operations, using several benchmarks and real-world workloads. While our study primarily focuses on AMD's architecture, we also provide a comparative evaluation of the features, limitations, and runtime overheads of both Turbo Boost and Turbo CORE technologies. Our results show that we can successfully exploit these new hardware facilities to accelerate the execution of key sections of code (critical paths), improving the overall performance of some multi-threaded applications. Unlike prior research, we focus on performance instead of power conservation. Our results can also provide guidelines for the design of hardware power management facilities and of the operating system interfaces to those facilities.
74

HPI Future SOC Lab : proceedings 2011

January 2013
Together with industrial partners, the Hasso-Plattner-Institut (HPI) is currently establishing the “HPI Future SOC Lab,” which will provide a complete infrastructure for research on on-demand systems. The lab offers highly complex on-demand systems on the latest massively parallel (multi-/many-core) hardware, some of it not yet available on the market, with very large main-memory capacities, together with software designed for it. The necessary components for such a highly ambitious project are provided by renowned companies: Fujitsu and Hewlett-Packard provide their latest prototype 4- and 8-way Intel 64-bit servers with 32 and 64 cores, respectively, and 1-2 TB of RAM; SAP makes available its latest Business ByDesign (ByD) system in its most complete version; EMC² provides high-performance storage systems; and VMware offers virtualization solutions. The lab will operate on the basis of real data from large enterprises. The HPI Future SOC Lab, which will also be open to interested researchers from other universities and from non-university research institutions, will provide an opportunity to study real-life complex systems, to develop new ideas, data structures, and algorithms, and to follow them all the way to practical implementation and testing. This technical report presents the results of research projects carried out in 2011. Selected projects presented their results on June 15th and October 26th, 2011, at the Future SOC Lab Day events.
75

Comparación del uso de GPGPU y cluster de multicore en problemas con alta demanda computacional / Comparison of the use of GPGPU and multicore clusters in problems with high computational demand

Montes de Oca, Erica January 2012
This undergraduate thesis investigates and studies the GPU and multicore-cluster shared-memory platforms for solving problems with high computational demand. Solutions to the stated problem are presented in order to compare the performance of its sequential, shared-memory parallel, message-passing parallel, hybrid parallel, and GPU parallel versions. The quality of the solutions is analyzed in terms of execution time and speedup, and an analysis of energy consumption is introduced.
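The comparison by execution time and speedup can be sketched with the usual definitions: speedup is the sequential time divided by the parallel time, and efficiency (added here for illustration, it is not named in the abstract) is the speedup divided by the number of workers. The timings below are hypothetical placeholders, not measurements from the thesis.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup = sequential time / parallel time (same problem, same units)."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, workers: int) -> float:
    """Efficiency = speedup / number of workers (1.0 means ideal scaling)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_sequential, t_parallel) / workers


# Hypothetical timings in seconds, chosen only to exercise the formulas.
t_seq = 120.0
timings = {"shared memory": 30.0, "message passing": 40.0, "GPU": 12.0}
speedups = {name: speedup(t_seq, t) for name, t in timings.items()}
```

Efficiency is mainly meaningful when the workers are comparable cores; for a GPU, speedup against the sequential version is usually the quantity reported.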
76

HPI future SOC lab : proceedings 2013

January 2014
The “HPI Future SOC Lab” is a cooperation between the Hasso-Plattner-Institut (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners. The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components, some of them not yet available on the market, that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB of main memory. The offerings are aimed particularly, but not exclusively, at researchers in computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies. This technical report presents the results of research projects carried out in 2013. Selected projects presented their results on April 10th and September 24th, 2013, at the Future SOC Lab Day events.
77

HPI future SOC lab : proceedings 2012

January 2013
The “HPI Future SOC Lab” is a cooperation between the Hasso-Plattner-Institut (HPI) and industrial partners. Its mission is to enable and promote exchange and interaction between the research community and the industrial partners. The HPI Future SOC Lab provides researchers with free-of-charge access to a complete infrastructure of state-of-the-art hardware and software. This infrastructure includes components, some of them not yet available on the market, that might be too expensive for an ordinary research environment, such as servers with up to 64 cores and 2 TB of main memory. The offerings are aimed particularly, but not exclusively, at researchers in computer science and business information systems. Main areas of research include cloud computing, parallelization, and in-memory technologies. This technical report presents the results of research projects carried out in 2012. Selected projects presented their results at the Future SOC Lab Day events in 2012 (on June 18th and November 26th according to the English abstract of the record; on April 18th and November 14th according to its German abstract).
78

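Estudo da influência dos parâmetros de algoritmos paralelos da computação evolutiva no seu desempenho em plataformas multicore / Study of the influence of the parameters of parallel evolutionary computation algorithms on their performance on multicore platforms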

Pais, Mônica Sakuray 14 March 2014
Parallel computing is a powerful way to reduce the computation time and to improve the quality of solutions of evolutionary algorithms (EAs). At first, parallel evolutionary algorithms (PEAs) ran on very expensive and not easily available parallel machines. As multicore processors become ubiquitous, the improved performance available to parallel programs is a strong motivation for computationally demanding EAs to be turned into parallel programs that exploit the power of multicores. A parallel implementation brings more factors that can influence performance and consequently adds complexity to the evaluation of PEAs. Statistics can help in this task and guarantee significant and correct conclusions with a minimum of tests, provided that the correct design of experiments is applied. This work presents a methodology for experimentation with PEAs that guarantees the correct estimation of speedups and applies a factorial design to the analysis of the factors that influence performance. As a case study, a genetic algorithm named AGP-I was parallelized according to the island model and executed on platforms with different multicore processors to solve two benchmark functions. The methodology was applied to determine the influence of migration-related parameters on the performance of AGP-I. / Doutor em Ciências (Doctor of Sciences)
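The migration step of an island-model evolutionary algorithm can be sketched as follows. The abstract does not describe how AGP-I migrates individuals, so the ring topology, the best-replaces-worst policy, and the names `migrate` and `n_migrants` are assumptions chosen for illustration. They stand in for the kind of migration-related parameters whose influence the thesis studies.

```python
def migrate(islands, fitness, n_migrants=1):
    """One synchronous ring migration between islands (higher fitness is better).

    Island i sends copies of its `n_migrants` best individuals to island
    i + 1 (the last island sends to the first); on arrival they replace the
    receiver's worst individuals. Returns new populations; the input lists
    are not modified.
    """
    # Choose all emigrants before changing any island, so that every island
    # sends individuals from its pre-migration population.
    emigrants = [sorted(pop, key=fitness, reverse=True)[:n_migrants]
                 for pop in islands]
    new_islands = []
    for i, pop in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]
        survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - n_migrants]
        new_islands.append(survivors + incoming)
    return new_islands


# Toy populations: individuals are integers and fitness is the value itself.
islands = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
after = migrate(islands, fitness=lambda x: x, n_migrants=1)
```

In a full algorithm each island would evolve independently, for example in its own thread or process, and call a step like this every few generations. The migration interval, the number of migrants, and the topology are then natural factors for a factorial experiment.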
79

Um modelo de memória transacional para arquiteturas heterogêneas baseado em software Cache / A transactional memory model for heterogeneous architectures based on Software Cache

Goldstein, Felipe Portavales 17 August 2018
Advisor: Rodolfo Jardim de Azevedo / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Made available in DSpace on 2018-08-17T02:02:14Z (GMT). No. of bitstreams: 1 Goldstein_FelipePortavales_M.pdf: 2303926 bytes, checksum: c44512059a990654552904a0f94d74f2 (MD5) Previous issue date: 2010 / Abstract: The adoption of multi-core processors by the industry has driven the development of new techniques to simplify the programming of parallel software. Transactional memory is one of the most promising of these techniques. It executes multiple tasks concurrently in an optimistic way, which allows good performance, and it is much simpler to use than classic mutual exclusion. This work proposes the first transactional memory model for hybrid architectures; the target architecture here is the Cell BE processor. The Cell BE is especially complex because of the difficulties its architecture imposes on the programmer when accessing the shared main memory from one of the SPEs. The proposed model acts as a layer between the running program and main memory, allowing transparent access to data, guaranteeing coherence, and performing concurrency control automatically. It uses a Software Cache combined with transactional memory to facilitate access to main memory from the SPEs. The model was implemented and tested using 8 different benchmark applications, showing its feasibility for real use cases. A detailed analysis of each part of the proposed architecture was carried out to show its impact on overall system performance. The model achieved performance up to two times better than an implementation using a global mutex. Its advantages lie mainly in ease of use, coherence guarantees, and the avoidance of some kinds of bugs that are common in mutex-based implementations, such as deadlocks. This work won the best paper award at SBAC-PAD 2008 / Master's / Computer Architecture / Master of Computer Science
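The optimistic execution that distinguishes transactional memory from a global mutex can be sketched on ordinary threads. This is not the Cell BE model or its Software Cache; it is a minimal illustration of the read, compute, validate, and commit cycle for a single variable. The names `TVar`, `atomically`, and `worker` are hypothetical.

```python
import threading


class TVar:
    """A single transactional variable guarded by a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.commit_lock = threading.Lock()


def atomically(tvar, update):
    """Apply `update` optimistically and return the number of retries."""
    retries = 0
    while True:
        seen_version = tvar.version      # read the version first ...
        new_value = update(tvar.value)   # ... then compute without holding a lock
        with tvar.commit_lock:           # short critical section: validate and commit
            if tvar.version == seen_version:
                tvar.value = new_value
                tvar.version += 1
                return retries
        retries += 1                     # another commit won the race: try again


def worker(tvar, n):
    """Increment the shared counter n times, one transaction per increment."""
    for _ in range(n):
        atomically(tvar, lambda v: v + 1)


counter = TVar(0)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock is held only while validating and committing, not while `update` runs, so long computations on unrelated data do not serialize. A transaction that loses the race simply retries instead of blocking.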
80

Um modelo de execução para Java no processador Cell BE / An execution model for Java on the Cell BE processor

Hoyos, Francisco Rafael Lorenzo 15 August 2018
Advisor: Rodolfo Jardim de Azevedo / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação / Made available in DSpace on 2018-08-15T06:45:02Z (GMT). No. of bitstreams: 1 Hoyos_FranciscoRafaelLorenzo_M.pdf: 663609 bytes, checksum: 9bf12382c86fbf499da0f33713f074a4 (MD5) Previous issue date: 2009 / Abstract: The Cell Broadband Engine (Cell BE) is a processor with a heterogeneous multicore architecture, targeted at high-performance applications. Perhaps best known as the processor of Sony's PlayStation 3, it is also used by the thousands in the IBM Roadrunner supercomputer. However, the Cell BE SDK does not support Java application development. Java is currently one of the most widely used languages, present on many different hardware platforms and in almost all types of applications. This work introduces a new model for the execution of Java programs on the Cell BE. The model allows the Java programmer to execute tasks (pieces of the main program's Java code) on the Synergistic Processing Elements (SPEs), which are highly specialized cores in the Cell BE and the main source of the chip's processing power. While other solutions try to completely hide the Cell BE's heterogeneous multicore architecture, this proposal exposes an explicitly distributed memory model, enabling the Java programmer to define exactly which code runs on the SPEs. The feasibility of the model is demonstrated by consistent performance improvements achieved with several programs executed on a Java virtual machine that has been modified to support the Cell BE platform. With six SPEs, those programs run, on average, around twice as fast as the same programs on the original Java virtual machine / Master's / Programming Languages / Master of Computer Science
