  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip

Höppner, Sebastian 14 March 2016 (has links) (PDF)
In this work, concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, for example fine-grained, ultra-fast dynamic voltage and frequency scaling (DVFS). Enabling this functionality requires compact clock generators with low chip area, low power consumption, a wide output frequency range, and the capability for ultra-fast frequency changes. They are to be instantiated individually per core. For this purpose, compact all-digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65 nm CMOS ADPLL is implemented, featuring a novel active current bias circuit that compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO), which reduces the digital tuning effort. Additionally, a 28 nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed. The core clock is generated by an open-loop phase-multiplexing and frequency-divider method that switches between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed-loop ADPLL. The sensitivity of the open-loop clock generator to phase mismatch is analyzed analytically, and a compensation technique using cross-coupled inverter buffers is proposed. The clock generators show small area (0.0097 mm² (65 nm), 0.00234 mm² (28 nm)) and low power consumption (2.7 mW (65 nm), 0.64 mW (28 nm)), and they provide core clock frequencies from 83 MHz to 666 MHz which can be changed instantaneously. The jitter performance complies with DDR2/DDR3 memory interface specifications. Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully in 3 testchip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.
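To make the idea of a numerical system model for a bang-bang ADPLL more concrete, here is a minimal time-domain sketch in Python. It is not the model used in the thesis: the normalized time base, the gains `kp` and `ki`, the DCO step size, and the free-running period are all illustrative assumptions, and the feedback divider is folded into the DCO for brevity.

```python
def simulate_bang_bang_adpll(n_cycles=5000, t_ref=1.0, free_period=1.005,
                             step=1e-4, kp=8, ki=1):
    """Return the DCO period produced in each reference cycle.

    All times are normalized to the reference period (assumed values).
    Lock means that the average DCO period equals t_ref.
    """
    ref_edge = 0.0
    dco_edge = 0.0
    integrator = 0
    periods = []
    for _ in range(n_cycles):
        # Bang-bang phase detector: only early/late information is available.
        late = 1 if dco_edge > ref_edge else -1
        # Proportional-integral loop filter acting on the 1-bit decision.
        integrator += ki * late
        tuning_word = integrator + kp * late
        # A larger tuning word shortens the DCO period (raises the frequency).
        period = free_period - step * tuning_word
        ref_edge += t_ref
        dco_edge += period
        periods.append(period)
    return periods


def mean_and_peak_to_peak(periods, tail=1000):
    """Average period and period spread over the last `tail` cycles."""
    window = periods[-tail:]
    return sum(window) / len(window), max(window) - min(window)


periods = simulate_bang_bang_adpll()
mean_period, spread = mean_and_peak_to_peak(periods)
print(f"mean period = {mean_period:.6f}, peak-to-peak = {spread:.6f}")
```

Sweeping `kp` and `ki` in a model of this kind is one way to explore the trade-off between lock time and the period spread in lock, which is the sort of question a jitter-accumulation optimization has to answer.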
22

Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip

Höppner, Sebastian 25 July 2013 (has links)
23

Multi-objective resource management for many-core systems

Martins, André Luís Del Mestre 19 March 2018 (has links)
Many-core systems integrate several cores in a single die to provide high-performance computing in multiple market segments. The newest technology nodes introduce restrictive power caps, which result in the utilization wall (also known as dark silicon): on-chip power dissipation prevents the use of all resources at full performance simultaneously. The workload of many-core systems includes real-time (RT) applications, which add throughput and timing constraints. Also, dynamic workloads generate valleys and peaks of resource utilization over time. This scenario, complex high-performance systems subject to power and performance constraints, creates the need for multi-objective resource management (RM) able to dynamically adapt the system goals while respecting the constraints. For RT applications, related works apply a design-time analysis of the expected workload to ensure that throughput and timing constraints are met. To address this limitation (design-time decisions), this Thesis proposes a hierarchical Runtime Energy Management (REM) for RT applications, the first work to link the execution of RT applications and RM under a power cap without design-time analysis of the application set. REM employs different mapping and DVFS (Dynamic Voltage and Frequency Scaling) heuristics for RT and non-RT tasks to save energy. Besides not considering RT applications, related works do not consider workload variation and propose single-objective RMs. To tackle this second limitation, this Thesis presents a hierarchical adaptive multi-objective resource management (MORM) for many-core systems under a power cap. MORM addresses dynamic workloads with peaks and valleys of resource utilization and can dynamically shift the goals to prioritize energy or performance according to the workload behavior. Both RMs (REM and MORM) are multi-objective approaches. This Thesis employs the Observe-Decide-Act (ODA) paradigm as the design methodology to implement REM and MORM. Observation consists of characterizing the cores and integrating hardware monitors to provide accurate and fast power-related information for efficient RM. Actuation configures the system actuators at runtime to enable the RMs to follow the multi-objective decisions. Decision corresponds to REM and MORM, which share the Observation and Actuation infrastructure. REM and MORM stand out from related works in scalability, comprehensiveness, and accurate power and energy estimation. For REM, evaluations on many-core systems with up to 144 cores show energy savings from 15% to 28% while keeping timing violations below 2.5%. For MORM, results show that it can drive applications to dynamically follow distinct objectives. Compared to a state-of-the-art RM targeting performance, MORM speeds up the workload valley by 11.56% and the workload peak by up to 49%.
24

Resilient and energy-efficient scheduling algorithms at scale / Algorithmes d'ordonnancement fiables et efficaces énergétiquement à l'échelle

Aupy, Guillaume 16 September 2014 (has links)
This thesis deals with two issues for future Exascale platforms, namely resilience and energy. In the first part of this thesis, we focus on the optimal placement of periodic coordinated checkpoints to minimize execution time. We consider fault predictors, software used by system administrators that tries to predict, imperfectly and through the study of past events, where and when faults will strike. In this context, we propose efficient algorithms and give a first-order optimal formula for the amount of work that should be done between two checkpoints. We then focus on silent data corruption errors. Contrary to fail-stop failures, such latent errors cannot be detected immediately, and a mechanism to detect them must be provided; when one is detected, execution must roll back to the most recent checkpoint not affected by the error, if such a checkpoint exists. We compute the optimal period that minimizes the waste, combining checkpoints and verifications. In the second part of the thesis, we address the energy consumption challenge. The speed scaling technique consists in lowering the voltage of the processor, and hence its execution speed. Unfortunately, it has been pointed out that DVFS increases the probability of failures. In this context, we consider the speed scaling technique coupled with reliability-increasing techniques such as re-execution, replication, or checkpointing. For these different problems, we propose various algorithms whose efficiency is shown either through thorough simulations or through approximation results relative to the optimal solution. Finally, we consider the different energy costs involved in periodic coordinated checkpointing and compute the optimal period to minimize energy consumption, as we did for execution time.
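As a point of reference for the "first-order optimal" checkpointing period mentioned above, the following Python sketch uses the classical Young approximation, in which the period is the square root of twice the checkpoint cost times the platform MTBF. This is the textbook baseline without fault prediction, not the predictor-aware formula derived in the thesis; the function names and the numbers in the usage lines are illustrative assumptions.

```python
import math


def young_period(checkpoint_cost_s, mtbf_s):
    """Classical first-order checkpointing period: sqrt(2 * C * MTBF)."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def first_order_waste(period_s, checkpoint_cost_s, mtbf_s):
    """Fraction of time lost: checkpoint overhead plus expected re-execution.

    C / T accounts for the time spent writing checkpoints, and T / (2 * MTBF)
    for the work lost, on average half a period, at each failure.
    """
    return checkpoint_cost_s / period_s + period_s / (2.0 * mtbf_s)


# Hypothetical platform: 50 s checkpoints, one failure every 10,000 s.
best = young_period(50.0, 10_000.0)
for period in (best / 2, best, best * 2):
    print(period, first_order_waste(period, 50.0, 10_000.0))
```

Checkpointing either twice as often or half as often as the computed period increases the first-order waste, which is what makes the square-root period optimal under this model.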
25

Integração de características preemptivas à técnica de escalonamento dinâmico de tensões e frequências intra-tarefa

Gonçalves, Rawlinson da Silva 10 July 2015 (has links)
FAPEAM - Fundação de Amparo à Pesquisa do Estado do Amazonas / Embedded systems have evolved significantly in recent years, mainly due to advances in technology, the falling cost of electronic equipment, and the popularization of mobile devices. Many of these systems rely on battery energy to keep their various components operating. To give these devices good autonomy, several techniques and methodologies have been proposed to better manage the energy consumption of the system as a whole. This need has contributed to the rise of various lines of research, mainly in the area of real-time systems, where the complicating factor is not only reducing energy consumption but also respecting the time constraints of all tasks running on the system. Thus, this work aims to reduce processor energy consumption using the intra-task dynamic voltage and frequency scaling technique, also known as intra-task DVFS. The proposed online methodology manages the processor's voltage and frequency switches through a collaborative approach between real-time applications and the operating system. Both work together, within the kernel of the system, to reduce the response time of the processor's voltage and frequency switches, mainly under successive preemptions among the real-time applications running on the system. The experimental results, using the C-Benchmarck, showed that it is possible to reduce processor energy consumption by about 6%, even when all tasks execute in the worst case.
26

Inserção de Código DVFS-Aware em Sistemas de tempo real críticos

Pinheiro, Diego Quintana 25 September 2015 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Performance and energy consumption are directly related. To increase performance, the number of instructions executed per second must increase; in other words, the processor frequency must be raised. The higher this value, the higher the energy consumption. Likewise, decreasing the number of instructions executed per second reduces both energy consumption and performance. Exploiting this relation between performance and energy is the key idea behind the Dynamic Voltage and Frequency Scaling (DVFS) technique. Applying DVFS in hard real-time systems is not a trivial task. The tasks of these systems are bound to timing constraints, so that if reduced performance causes a constraint to be missed, the system may fail completely. Thus, this work aims to combine two DVFS approaches in hard real-time systems: intra-task and inter-task. The intra-task approach analyzes the execution flow of a task and identifies where new instructions can be inserted to change supply voltage and frequency when the worst-case path is not followed. The inter-task approach analyzes how long a task will wait due to interference (e.g., preemption, shared resources), verifies system schedulability, and defines a set of optimal initial frequencies in a multi-task environment. The result is new code with the same functionality as the original, but with instructions to change voltage and frequency that take into account the interference a task may suffer. Moreover, the experimental results show that energy consumption was reduced while timing constraints were still satisfied.
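The intra-task idea described above can be illustrated with a small Python sketch: at a point where execution has left the worst-case path, the remaining worst-case cycle count and the time left to the deadline give the lowest frequency that is still safe. This is only a sketch under stated assumptions, not the dissertation's code-insertion tool; the operating points, the effective capacitance `c_eff`, and both function names are hypothetical, and the energy estimate uses the first-order CMOS switching model (energy proportional to the square of the supply voltage per cycle).

```python
def dynamic_energy(cycles, voltage, c_eff=1e-9):
    """First-order CMOS switching energy: E = C_eff * V^2 * cycles."""
    return c_eff * voltage ** 2 * cycles


def lowest_safe_level(remaining_wc_cycles, time_to_deadline_s, levels):
    """Pick the slowest (frequency_hz, voltage) pair that meets the deadline.

    `levels` is a list of available operating points (assumed values).
    """
    if time_to_deadline_s <= 0:
        raise ValueError("deadline already passed")
    required_hz = remaining_wc_cycles / time_to_deadline_s
    feasible = [lv for lv in sorted(levels) if lv[0] >= required_hz]
    if not feasible:
        raise ValueError("deadline cannot be met at any available level")
    return feasible[0]


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
levels = [(200e6, 0.8), (400e6, 1.0), (800e6, 1.2)]

# After a branch, 3 million worst-case cycles remain and 10 ms are left.
freq, volt = lowest_safe_level(3_000_000, 0.010, levels)
saved = dynamic_energy(3_000_000, 1.2) - dynamic_energy(3_000_000, volt)
print(freq, volt, saved)
```

An inter-task analysis would additionally shrink `time_to_deadline_s` by the worst-case interference from preemptions and shared resources before this selection is made.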
27

Contrôle distribué pour les systèmes multi-cœurs auto-adaptatifs / Distributed Control for Self-adaptatif Multi-Core Architectures

Mansouri, Imen 30 November 2011 (has links)
Regular architectures embedding several processing elements are increasingly used in embedded systems. They require careful design to avoid high power consumption and to improve their flexibility. This thesis deals with energy optimization mechanisms for large-scale architectures; to cope with technology variability and changes in application context, optimization is performed at run-time. The target design implements in-situ sensors that collect physical information on circuit degradation, and activity monitors that track the application workload and the resulting power consumption. For workload monitoring, we use activity counters connected at the architecture level to a set of critical signals. We developed an automated method to place these features optimally with minimal area overhead. The collected information is then used jointly with a power model to estimate the dissipated power and to drive the appropriate optimization process. The optimal frequency for each core is set by a distributed controller based on consensus theory and replicated in each core. The resulting settings aim to reduce the power of the whole system while fulfilling the real-time constraints of the running application. The scheme is fully distributed to guarantee the scalability of the control, and hence its feasibility, as the number of cores grows.
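The consensus-based control mentioned above builds on the observation that cores exchanging values only with their neighbours can still agree on a global quantity. The Python sketch below shows plain average consensus on a four-core ring; it is a generic illustration, not the controller from the thesis, and the topology, the step size `eps`, and the initial values are assumptions.

```python
def consensus_step(values, neighbors, eps):
    """One synchronous update: each node moves toward its neighbours."""
    return [
        x + eps * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def run_consensus(values, neighbors, eps=0.25, iterations=100):
    """Iterate the update; no node ever sees more than its neighbours."""
    for _ in range(iterations):
        values = consensus_step(values, neighbors, eps)
    return values


# Hypothetical ring of four cores; each core talks to two neighbours.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Each core starts from its own local measurement (arbitrary units).
print(run_consensus([1.0, 2.0, 3.0, 6.0], ring))
```

Because every exchange is symmetric, the sum of the values is preserved and all cores converge to the global average, here 3.0, without any central coordinator; this locality is the property that lets such a scheme scale with the number of cores.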
28

Dynamic Power Management in a Heterogeneous Processor Architecture

Arega, Frehiwot Melak, Hähnel, Markus, Dargie, Waltenegus 15 May 2023 (has links)
Emerging mobile platforms integrate heterogeneous, multicore processors to deal efficiently with the heterogeneity of data (in magnitude, type, and quality). The main goal is to achieve a high degree of energy proportionality that matches the nature and fluctuation of mobile workloads. Most existing power and energy consumption analyses of these architectures rely on simulation or static benchmarks, neither of which truly reflects the type of workload the processors handle in reality. By contrast, we generate two types of stochastic workloads and employ four types of dynamic voltage and frequency scaling (DVFS) policies to investigate the energy proportionality and the dynamic power consumption characteristics of a heterogeneous processor architecture when operating in different configurations. The analysis illustrates, both qualitatively and quantitatively, that knowledge of the statistics of the incoming workload is critical to determining the appropriate processor configuration.
29

Performance prediction for dynamic voltage and frequency scaling

Miftakhutdinov, Rustam Raisovich 28 October 2014
This dissertation proves the feasibility of accurate runtime prediction of processor performance under frequency scaling. The performance predictors developed in this dissertation allow processors capable of dynamic voltage and frequency scaling (DVFS) to improve their performance or energy efficiency by dynamically adapting chip or core voltages and frequencies to workload characteristics. The dissertation considers three processor configurations: the uniprocessor capable of chip-level DVFS, the private cache chip multiprocessor capable of per-core DVFS, and the shared cache chip multiprocessor capable of per-core DVFS. Depending on processor configuration, the presented performance predictors help the processor realize 72–85% of average oracle performance or energy efficiency gains.
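The abstract does not describe the predictors themselves. A common first-order way to reason about performance under frequency scaling, assumed here purely for illustration, splits execution time into a part that scales with core frequency and a part (such as memory stalls) that does not. The function and parameter names below are hypothetical, and the numbers are made up.

```python
def predict_time(core_time_s, memory_time_s, base_freq_hz, new_freq_hz):
    """Predict execution time at new_freq_hz from a measurement at base_freq_hz.

    Assumption: core_time_s shrinks in proportion to the frequency increase,
    while memory_time_s is unaffected by the core clock.
    """
    return core_time_s * base_freq_hz / new_freq_hz + memory_time_s


# Made-up workload: 0.6 s of core-bound time and 0.4 s of memory-bound time
# measured at 1 GHz, then predicted at 2 GHz.
predicted = predict_time(0.6, 0.4, 1e9, 2e9)
speedup = (0.6 + 0.4) / predicted
```

Under this model, doubling the clock speeds up only the core-bound part, so the more memory-bound a workload is, the less it gains from a higher frequency. A DVFS policy can therefore lower the frequency for such workloads at little performance cost.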
30

Energy efficient design of an adaptive switching algorithm for the iterative-MIMO receiver

Mohd Tadza, Noor Zahrinah Binti January 2015
An efficient design for iterative multiple-input multiple-output (MIMO) receiver systems is now imperative, since data demands in wireless networks are increasing tremendously. This puts a massive burden on signal processing power, especially in small receiver systems where power sources are often shared or limited. This thesis proposes a solution that addresses both the wireless signal processing and the architectural implementation sides of the problem. A novel algorithm, dubbed the Adaptive Switching Algorithm, is proven not only to save more than a third of the energy consumption in the algorithmic design, but also to achieve an energy reduction of more than 50% in processing power when the design is mapped onto state-of-the-art programmable hardware. Simulations are run in MATLAB using the Monte Carlo approach, investigating multiple additive white Gaussian noise (AWGN) and Rayleigh fading channels in both fast and slow fading environments. The software selects the appropriate detection algorithm depending on the current channel conditions. The hardware design is based on the latest field-programmable gate array (FPGA) hardware from Xilinx, specifically the Virtex-5 and Virtex-7 chipsets. They were chosen during the experimental phase to verify the results and to examine trends in the energy consumption of the proposed algorithm design. Savings come from dynamic allocation of hardware resources, applying power minimization techniques according to the processing requirements of the system. Having demonstrated the feasibility of the algorithm in controlled environments, realistic channel conditions were simulated using spatially correlated MIMO channels to test the algorithm’s readiness for real-world deployment. The proposed algorithm is placed in both the MIMO detector and the iterative-decoder blocks of the receiver. 
When the final full receiver design is implemented, it shows that the key to energy saving lies in the fact that both the software and hardware components of the Adaptive Switching Algorithm are adaptive. The detector saves energy by selecting suitable detection schemes, while the decoder provides adaptivity by limiting the number of decoding iterations; both are updated in real time. The overall receiver can achieve more than 70% energy savings in comparison to state-of-the-art iterative-MIMO receivers, and thus it can be concluded that this level of ‘intelligence’ is an important direction towards more efficient iterative-MIMO receiver designs in the future.
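The two kinds of adaptivity described above can be sketched roughly as follows. The abstract names neither the detection schemes nor the switching criteria, so the SNR threshold, the detector labels, the stopping test, and every function name here are hypothetical placeholders rather than the thesis's design.

```python
def choose_detector(snr_db, threshold_db=15.0):
    """Pick a detection scheme from the current channel quality.

    Assumption: a good channel can use a cheaper detector, while a poor
    channel needs a costlier one. The labels and threshold are placeholders.
    """
    return "low_complexity" if snr_db >= threshold_db else "high_complexity"


def decode_with_iteration_cap(run_iteration, converged, max_iterations):
    """Run decoder iterations until convergence or until the cap is reached.

    Returns the final decoder state and the number of iterations spent.
    """
    state = None
    for used in range(1, max_iterations + 1):
        state = run_iteration(state)
        if converged(state):
            return state, used
    return state, max_iterations


def toy_iteration(state):
    """Stand-in for one decoding pass: it only counts the passes so far."""
    return 1 if state is None else state + 1


detector = choose_detector(snr_db=20.0)
state, used = decode_with_iteration_cap(toy_iteration, lambda s: s >= 3, 8)
```

In this sketch the energy saving comes from the work that is skipped: the cheaper detector on good channels, and the decoder passes not run once the stopping test is met or the cap is hit.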
