• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 61
  • Tagged with
  • 68
  • 68
  • 68
  • 56
  • 55
  • 54
  • 54
  • 54
  • 54
  • 54
  • 54
  • 54
  • 54
  • 44
  • 30
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

ADACORE: Achieving Energy Efficiency via Adaptive Core Morphing at Runtime

Kurella, Nithesh 23 November 2015 (has links)
Heterogeneous multicore processors offer an energy-efficient alternative to homogeneous multicores. Typically, heterogeneous multi-core refers to a system with more than one core where all the cores use a single ISA but differ in one or more micro-architectural configurations. A carefully designed multicore system consists of cores of diverse power and performance profiles. During execution, an application is run on a core that offers the best trade-off between performance and energy-efficiency. Since the resource needs of an application may vary with time, so does the optimal core choice. Moving a thread from one core to another involves transferring the entire processor state and cache warm-up. Frequent migration leads to large performance overhead, negating any benefits of migration. Infrequent migration on the other hand leads to missed opportunities. Thus, reducing overhead of migration is integral to harnessing benefits of heterogeneous multicores. \par This work proposes \textit{AdaCore}, a novel core architecture which pushes the heterogeneity exploited in the heterogeneous multicore into a single core. \textit{AdaCore} primarily addresses the resource bottlenecks in workloads. The design attempts to adaptively match the resource demands by reconfiguring on-chip resources at a fine-grain granularity. The adaptive core morphing allows core configurations with diverse power and performance profiles within a single core by adaptive voltage, frequency and resource reconfiguration. Towards this end, the proposed novel architecture while providing energy savings, improves performance with a low overhead in-core reconfiguration. This thesis further compares \textit{AdaCore} with a standard Out-of-Order core with capability to perform Dynamic Voltage and Frequency Scaling (DVFS) designed to achieve energy efficiency. The results presented in this thesis indicate that the proposed scheme can improve the performance/Watt of application, on average, by 32\% over a static out-of-order core and by 14\% over DVFS. The proposed scheme improves $IPS^{2}/Watt$ by 38\% over static out-of-order core.
2

Energy Efficient Computing in FPGA Through Embedded RAM Blocks

Ghosh, Anandaroop 16 August 2013 (has links)
No description available.
3

HIGH-PERFORMANCE AND RELIABLE INTERMITTENT COMPUTATION

Jongouk Choi (8536866) 26 July 2022 (has links)
<p>    </p> <p>An energy harvesting system (EHS) provides the intriguing possibility of battery-less computing and enables various applications such as wearable, industrial or environmental sensors, and im- plantable medical devices. The biggest challenge of EHS is the instability of energy sources (e.g., Wi-Fi, solar, thermal energy, etc.) which causes unpredictable and frequent power outages. To address the challenge, existing works introduce software-based and hardware-based power failure recovery solutions that ensure program correctness across a power outage. However, they cause a significant performance overhead without providing the high quality of service in reality, and suffer from a reliability issue. In this dissertation, we address the limitations of recovery solutions across the system stack, from the compiler-directed approach and run-time systems to hardware mechanisms, and demonstrate the effectiveness of the approaches using real EHS platforms and simulators. We first present software-based recovery solutions by leveraging compiler support. We develop a compiler-directed solution built upon commodity EHS platform that can achieve 3X speedup compared to the software-based state-of-the-art solution. We also introduce a compiler optimization technique that can cooperate with run-time systems and hardware support, achieving 8X speedup compared to the software-based solution. We then present hardware-based recov- ery solutions by leveraging compiler and hardware support. We develop an architecture/compiler co-design solution that re-purposes existing hardware components in a core for power failure spec- ulative execution, a new speculation paradigm, and leverages a novel compiler analysis for cor- rect power failure recovery. Our result highlights 2 ∼ 3x performance improvement compared to the hardware-based state-of-the-art solution without requiring hardware modification. Next, we present a new cache design for EHS that can achieve cost-effective, high-performance intermit- tent computing. According to experimental results, the new cache design outperforms the state- of-the-art cache scheme by 4X and reduces the hardware cost by 90%. Finally, we present an operating system (OS)-driven solution to address a reliability problem on EHS devices while all existing works are vulnerable, causing the wrong recovery across power failure. Our experiments demonstrate that the solution causes less than 1% run-time overhead and successfully addresses the reliability problem without compromising correct power failure recovery. </p>
4

Energy Efficient Computing Using Scalable General Purpose Analog Processors

De Guzman, Ethan Paul Palisoc 01 June 2021 (has links) (PDF)
Due to fundamental physical limitations, conventional digital circuits have not been able to scale at the pace expected from Moore’s law. In addition, computationally intensive applications such as neural networks and computer vision demand large amounts of energy from digital circuits. As a result, energy efficient alternatives are needed in order to provide continued performance scaling. Analog circuits have many well known benefits: the ability to store more information onto a single wire and efficiently perform mathematical operations such as addition, subtraction, and differential equation solving. However, analog computing also comes with drawbacks such as its sensitivity to process variation and noise, limited scalability, programming difficulty, and poor compatibility with digital circuits and design tools. We propose to leverage the strengths of analog circuits and avoid its weaknesses by using digital circuits and time-encoded computation. Time-encoded circuits also operate on continuous data but are implemented using digital circuits. We propose a novel scalable general purpose analog processor using time-encoded circuits that is well suited for emerging applications that require high numeric precision. The processor’s datapath, including time-domain register file and function units are described. We evaluate our proposed approach using an implementation that is simulated with a 0.18µm TSMC process and demonstrate that this approach improves the performance of a scientific benchmark by 4x compared against conventional analog implementations and improves energy consumption by 146x compared against digital implementations.
5

Power Optimization of Data Center Network with Scalability and Performance Control

Zheng, Kuangyu 03 December 2018 (has links)
No description available.
6

Towards Energy Efficient Data Mining & Graph Processing

Faisal, S M January 2015 (has links)
No description available.
7

Capacity of Communications Channels with 1-Bit Quantization and Oversampling at the Receiver

Krone, Stefan, Fettweis, Gerhard 25 January 2013 (has links) (PDF)
Communications receivers that rely on 1-bit analogto-digital conversion are advantageous in terms of hardware complexity and power dissipation. Performance limitations due to the 1-bit quantization can be tackled with oversampling. This paper considers the oversampling gain from an information-theoretic perspective by analyzing the channel capacity with 1-bit quantization and oversampling at the receiver for the particular case of AWGN channels. This includes a numerical computation of the capacity and optimal transmit symbol constellations, as well as the derivation of closed-form expressions for large oversampling ratios and for high signal-to-noise ratios of the channel.
8

From dataflow models to energy efficient application specific processors

Hautala, I. (Ilkka) 11 October 2019 (has links)
Abstract The development of wireless networks has provided the necessary conditions for several new applications. The emergence of the virtual and augmented reality and the Internet of things and during the era of social media and streaming services, various demands related to functionality and performance have been set for mobile and wearable devices. Meeting these demands is complicated due to minimal energy budgets, which are characteristic of embedded devices. Lately, the energy efficiency of devices has been addressed by increasing parallelism and the use of application-specific hardware resources. This has been hindered by hardware development as well as software development because the conventional development methods are based on the use of low-level abstractions and sequential programming paradigms. On the other hand, deployment of high-level design methods is slowed down because of final solutions that are too much compromised when energy efficiency and performance are considered. This doctoral thesis introduces a model-driven framework for the development of signal processing systems that facilitates hardware and software co-design. The design flow exploits an easily customizable, re-programmable and energy-efficient processor template. The proposed design flow enables tailoring of multiple heterogeneous processing elements and the connections between them to the demands of an application. Application software is described by using high-level dataflow models, which enable the automatic synthesis of parallel applications for different multicore hardware platforms and speed up design space exploration. Suitability of the proposed design flow is demonstrated by using three different applications from different signal processing domains. The experiments showed that raising the level of abstraction has only a minor impact on performance. Video processing algorithms are selected to be the main application area in this thesis. The thesis proposes tailored and reprogrammable energy-efficient processing elements for video coding algorithms. The solutions are based on the use of multiple processing elements by exploiting the pipeline parallelism of the application, which is characteristic of many signal processing algorithms. Performance, power and area metrics for the designed solutions have been obtained using post-layout simulation models. In terms of energy efficiency, the proposed programmable processors form a new compromise solution between fixed hardware accelerators and conventional embedded processors for video coding. / Tiivistelmä Langattomien verkkojen kehittyminen on luonut edellytykset useille uusille sovelluksille. Muiden muassa sosiaalisen media, suoratoistopalvelut, virtuaalitodellisuus ja esineiden internet asettavat kannettaville ja puettaville laitteille moninaisia toimintoihin, suorituskykyyn, energiankulutukseen ja fyysiseen muotoon liittyviä vaatimuksia. Yksi isoimmista haasteista on sulautettujen laitteiden energiankulutus. Laitteiden energiatehokkuutta on pyritty parantamaan rinnakkaislaskentaa ja räätälöityjä laskentaresursseja hyödyntämällä. Tämä puolestaan on vaikeuttanut niin laite- kuin sovelluskehitystä, koska laajassa käytössä olevat kehitystyökalut perustuvat matalan tason abstraktioihin ja hyödyntävät alun perin yksi ydinprosessoreille suunniteltuja ohjelmointikieliä. Korkean tason ja automatisoitujen kehitysmenetelmien käyttöönottoa on hidastanut aikaansaatujen järjestelmien puutteellinen suorituskyky ja laiteresurssien tehoton hyödyntäminen. Väitöskirja esittelee datavuopohjaiseen suunnitteluun perustuvan työkaluketjun, joka on tarkoitettu energiatehokkaiden signaalikäsittelyjärjestelmien toteuttamiseen. Työssä esiteltävä suunnitteluvuo pohjautuu laitteistoratkaisuissa räätälöitävään ja ohjelmoitavaan siirtoliipaistavaan prosessoritemplaattiin. Ehdotettu suunnitteluvuo mahdollistaa useiden heterogeenisten prosessoriytimien ja niiden välisten kytkentöjen räätälöimisen sovelluksien tarpeiden vaatimalla tavalla. Suunnitteluvuossa ohjelmistot kuvataan korkean tason datavuomallien avulla. Tämä mahdollistaa erityisesti rinnakkaista laskentaa sisältävän ohjelmiston automaattisen sovittamisen erilaisiin moniprosessorijärjestelmiin ja nopeuttaa erilaisten järjestelmätason ratkaisujen kartoittamista. Suunnitteluvuon käyttökelpoisuus osoitetaan käyttäen esimerkkinä kolmea eri signaalinkäsittelysovellusta. Tulokset osoittavat, että suunnittelumenetelmien abstraktiotasoa on mahdollista nostaa ilman merkittävää suorituskyvyn heikkenemistä. Väitöskirjan keskeinen sovellusalue on videonkoodaus. Työ esittelee videonkoodaukseen suunniteltuja energiatehokkaita ja uudelleenohjelmoitavia prosessoriytimiä. Ratkaisut perustuvat usean prosessoriytimen käyttämiseen hyödyntäen erityisesti videonkäsittelyalgoritmeille ominaista liukuhihnarinnakkaisuutta. Prosessorien virrankulutus, suorituskyky ja pinta-ala on analysoitu käyttämällä simulointimalleja, jotka huomioivat logiikkasolujen sijoittelun ja johdotuksen. Ehdotetut sovelluskohtaiset prosessoriratkaisut tarjoavat uuden energiatehokkaan kompromissiratkaisun tavanomaisten ohjelmoitavien prosessoreiden ja kiinteästi johdotettujen video-kiihdyttimien välille.
9

Measuring energy consumption for short code paths using RAPL

Hähnel, Marcus, Döbel, Björn, Völp, Marcus, Härtig, Hermann 28 May 2013 (has links) (PDF)
Measuring the energy consumption of software components is a major building block for generating models that allow for energy-aware scheduling, accounting and budgeting. Current measurement techniques focus on coarse-grained measurements of application or system events. However, fine grain adjustments in particular in the operating-system kernel and in application-level servers require power profiles at the level of a single software function. Until recently, this appeared to be impossible due to the lacking fine grain resolution and high costs of measurement equipment. In this paper we report on our experience in using the Running Average Power Limit (RAPL) energy sensors available in recent Intel CPUs for measuring energy consumption of short code paths. We investigate the granularity at which RAPL measurements can be performed and discuss practical obstacles that occur when performing these measurements on complex modern CPUs. Furthermore, we demonstrate how to use the RAPL infrastructure to characterize the energy costs for decoding video slices.
10

QPPT: Query Processing on Prefix Trees

Kissinger, Thomas, Schlegel, Benjamin, Habich, Dirk, Lehner, Wolfgang 28 May 2013 (has links) (PDF)
Modern database systems have to process huge amounts of data and should provide results with low latency at the same time. To achieve this, data is nowadays typically hold completely in main memory, to benefit of its high bandwidth and low access latency that could never be reached with disks. Current in-memory databases are usually columnstores that exchange columns or vectors between operators and suffer from a high tuple reconstruction overhead. In this paper, we present the indexed table-at-a-time processing model that makes indexes the first-class citizen of the database system. The processing model comprises the concepts of intermediate indexed tables and cooperative operators, which make indexes the common data exchange format between plan operators. To keep the intermediate index materialization costs low, we employ optimized prefix trees that offer a balanced read/write performance. The indexed tableat-a-time processing model allows the efficient construction of composed operators like the multi-way-select-join-group. Such operators speed up the processing of complex OLAP queries so that our approach outperforms state-of-the-art in-memory databases.

Page generated in 0.0783 seconds