1 |
ADACORE: Achieving Energy Efficiency via Adaptive Core Morphing at Runtime
Kurella, Nithesh, 23 November 2015
Heterogeneous multicore processors offer an energy-efficient alternative to homogeneous multicores. Typically, a heterogeneous multicore is a system with more than one core in which all cores use a single ISA but differ in one or more micro-architectural configurations. A carefully designed multicore system consists of cores with diverse power and performance profiles. During execution, an application runs on the core that offers the best trade-off between performance and energy efficiency. Since the resource needs of an application may vary with time, so does the optimal core choice. Moving a thread from one core to another involves transferring the entire processor state and warming up the cache. Frequent migration leads to a large performance overhead that negates any benefit of migration. Infrequent migration, on the other hand, leads to missed opportunities. Reducing the overhead of migration is therefore integral to harnessing the benefits of heterogeneous multicores.
This work proposes AdaCore, a novel core architecture that pushes the heterogeneity exploited in heterogeneous multicores into a single core. AdaCore primarily addresses the resource bottlenecks in workloads. The design attempts to match resource demands adaptively by reconfiguring on-chip resources at a fine granularity. Adaptive core morphing allows core configurations with diverse power and performance profiles within a single core through adaptive voltage, frequency and resource reconfiguration. To this end, the proposed architecture provides energy savings while improving performance through low-overhead in-core reconfiguration. This thesis further compares AdaCore with a standard out-of-order core capable of Dynamic Voltage and Frequency Scaling (DVFS) and designed for energy efficiency.
The results presented in this thesis indicate that the proposed scheme can improve the performance/Watt of applications, on average, by 32% over a static out-of-order core and by 14% over DVFS. The proposed scheme improves IPS²/Watt by 38% over a static out-of-order core.
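The two metrics quoted above can be made concrete with a small sketch. It assumes that performance/Watt means instructions per second (IPS) divided by average power, which the abstract does not spell out; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
def perf_per_watt(instructions: float, seconds: float, avg_power_w: float) -> float:
    """IPS / Watt: instructions per second divided by average power.

    Equivalent to instructions per joule, so it rewards low energy only.
    """
    ips = instructions / seconds
    return ips / avg_power_w


def ips2_per_watt(instructions: float, seconds: float, avg_power_w: float) -> float:
    """IPS^2 / Watt: squares the throughput term, so speed counts twice.

    For a fixed instruction count this is proportional to the inverse of
    the energy-delay product (energy * time).
    """
    ips = instructions / seconds
    return ips ** 2 / avg_power_w


def relative_gain_percent(new: float, base: float) -> float:
    """Improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # Hypothetical measurements for illustration only (not thesis data).
    base = perf_per_watt(instructions=2e9, seconds=1.0, avg_power_w=10.0)
    tuned = perf_per_watt(instructions=2e9, seconds=0.9, avg_power_w=9.0)
    print(f"IPS/Watt gain: {relative_gain_percent(tuned, base):.1f}%")
```

Squaring IPS makes the second metric penalize configurations that save energy only by running slowly, which is why the two reported gains need not match.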
|
2 |
Energy Efficient Computing in FPGA Through Embedded RAM Blocks
Ghosh, Anandaroop, 16 August 2013
No description available.
|
3 |
HIGH-PERFORMANCE AND RELIABLE INTERMITTENT COMPUTATION
Jongouk Choi (8536866), 26 July 2022
An energy harvesting system (EHS) provides the intriguing possibility of battery-less computing and enables various applications such as wearable, industrial or environmental sensors and implantable medical devices. The biggest challenge of EHS is the instability of energy sources (e.g., Wi-Fi, solar, thermal energy), which causes unpredictable and frequent power outages. To address this challenge, existing works introduce software-based and hardware-based power failure recovery solutions that ensure program correctness across a power outage. However, they incur significant performance overhead without delivering high quality of service in practice, and they suffer from a reliability issue. In this dissertation, we address the limitations of recovery solutions across the system stack, from compiler-directed approaches and run-time systems to hardware mechanisms, and demonstrate the effectiveness of the approaches using real EHS platforms and simulators. We first present software-based recovery solutions that leverage compiler support. We develop a compiler-directed solution built upon a commodity EHS platform that achieves a 3X speedup over the software-based state-of-the-art solution. We also introduce a compiler optimization technique that cooperates with run-time systems and hardware support, achieving an 8X speedup over the software-based solution. We then present hardware-based recovery solutions that leverage compiler and hardware support. We develop an architecture/compiler co-design solution that re-purposes existing hardware components in a core for power failure speculative execution, a new speculation paradigm, and leverages a novel compiler analysis for correct power failure recovery. Our results show a 2X to 3X performance improvement over the hardware-based state-of-the-art solution without requiring hardware modification. Next, we present a new cache design for EHS that achieves cost-effective, high-performance intermittent computing. According to experimental results, the new cache design outperforms the state-of-the-art cache scheme by 4X and reduces the hardware cost by 90%. Finally, we present an operating system (OS)-driven solution to a reliability problem on EHS devices, to which all existing works are vulnerable and which causes incorrect recovery across power failures. Our experiments demonstrate that the solution incurs less than 1% run-time overhead and successfully addresses the reliability problem without compromising correct power failure recovery.
|
4 |
Energy Efficient Computing Using Scalable General Purpose Analog Processors
De Guzman, Ethan Paul Palisoc, 01 June 2021
Due to fundamental physical limitations, conventional digital circuits have not been able to scale at the pace expected from Moore's law. In addition, computationally intensive applications such as neural networks and computer vision demand large amounts of energy from digital circuits. As a result, energy-efficient alternatives are needed to provide continued performance scaling. Analog circuits have many well-known benefits: the ability to store more information on a single wire and to efficiently perform mathematical operations such as addition, subtraction, and differential equation solving. However, analog computing also comes with drawbacks such as sensitivity to process variation and noise, limited scalability, programming difficulty, and poor compatibility with digital circuits and design tools. We propose to leverage the strengths of analog circuits and avoid their weaknesses by using digital circuits and time-encoded computation. Time-encoded circuits also operate on continuous data but are implemented using digital circuits. We propose a novel scalable general purpose analog processor using time-encoded circuits that is well suited for emerging applications that require high numeric precision. The processor's datapath, including the time-domain register file and function units, is described. We evaluate the proposed approach using an implementation simulated in a 0.18 µm TSMC process and demonstrate that it improves the performance of a scientific benchmark by 4x compared with conventional analog implementations and improves energy consumption by 146x compared with digital implementations.
|
5 |
Power Optimization of Data Center Network with Scalability and Performance Control
Zheng, Kuangyu, 03 December 2018
No description available.
|
6 |
Towards Energy Efficient Data Mining & Graph Processing
Faisal, S M, January 2015
No description available.
|
7 |
Capacity of Communications Channels with 1-Bit Quantization and Oversampling at the Receiver
Krone, Stefan; Fettweis, Gerhard, 25 January 2013
Communications receivers that rely on 1-bit analog-to-digital conversion are advantageous in terms of hardware complexity and power dissipation. Performance limitations due to the 1-bit quantization can be tackled with oversampling. This paper considers the oversampling gain from an information-theoretic perspective by analyzing the channel capacity with 1-bit quantization and oversampling at the receiver for the particular case of AWGN channels. This includes a numerical computation of the capacity and optimal transmit symbol constellations, as well as the derivation of closed-form expressions for large oversampling ratios and for high signal-to-noise ratios of the channel.
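As a point of reference for the oversampling gain discussed above, the following sketch computes the rate of the simplest case only: a real AWGN channel with BPSK input and a 1-bit (sign) quantizer sampled once per symbol, which behaves as a binary symmetric channel. This is a textbook baseline, not the paper's oversampling analysis; the SNR definition (squared amplitude over noise variance) and all function names are assumptions made for illustration.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def one_bit_bpsk_rate(snr_db: float) -> float:
    """Bits per channel use for BPSK with a sign quantizer, no oversampling.

    The quantizer flips a symbol with probability Q(sqrt(SNR)), so the
    channel is a BSC whose capacity is 1 - H_b(crossover probability).
    """
    snr = 10.0 ** (snr_db / 10.0)
    crossover = q_function(math.sqrt(snr))
    return 1.0 - binary_entropy(crossover)


if __name__ == "__main__":
    for snr_db in (-10, 0, 10, 20):
        print(f"{snr_db:>4} dB: {one_bit_bpsk_rate(snr_db):.4f} bit/use")
```

Without oversampling the rate saturates at 1 bit per channel use no matter how high the SNR, which is the limitation that oversampling is meant to relax.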
|
8 |
Efficient FPGA SoC Processing Design for a Small UAV Radar
Newmeyer, Luke Oliver, 01 April 2018
Modern radar technology relies heavily on digital signal processing. As radar technology pushes the boundaries of miniaturization, computational systems must be developed to support the processing demand. One particular application for small radar technology is in modern drone systems. Many drone applications are currently inhibited by safety concerns about autonomous vehicles navigating shared airspace. Research in radar-based Detect and Avoid (DAA) attempts to address these concerns by using radar to detect nearby aircraft and choose an alternative flight path. Implementing radar on small Unmanned Air Vehicles (UAVs), however, requires a lightweight and power-efficient design. Likewise, the radar processing system must be small and efficient.
This thesis presents the design of the processing system for a small Frequency Modulated Continuous Wave (FMCW) phased array radar. The radar and its processing system are designed to be lightweight and low-power in order to fly onboard a UAV weighing less than 25 kg. The radar algorithms for this design include a parallelized Fast Fourier Transform (FFT), cross correlation, and beamforming. Target detection algorithms are also implemented. All of the computation is performed in real time on a Xilinx Zynq 7010 System on Chip (SoC) processor using both FPGA and CPU resources.
The radar system (excluding antennas) has dimensions of 2.25 x 4 x 1.5 in³, weighs 120 g, and consumes 8 W of power, of which the processing system accounts for 2.6 W. The processing system performs over 652 million arithmetic operations per second and is capable of performing the full processing in real time. The radar has also been tested in several scenarios, both airborne on small UAVs and on the ground. Small UAVs have been detected at ranges of up to 350 m and larger aircraft at up to 800 m.
This thesis will describe the radar design architecture, the custom designed radar hardware, the FPGA based processing implementations, and conclude with an evaluation of the system's effectiveness and performance.
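The power and throughput figures quoted in this abstract can be combined into two derived numbers. The sketch below is simple arithmetic on the stated values; it treats "over 652 million" operations per second as a lower bound, and the helper names are hypothetical.

```python
def power_share(part_w: float, total_w: float) -> float:
    """Fraction of the total power budget used by one subsystem."""
    return part_w / total_w


def ops_per_joule(ops_per_second: float, power_w: float) -> float:
    """Arithmetic operations completed per joule of energy."""
    return ops_per_second / power_w


if __name__ == "__main__":
    # Values as stated in the abstract above.
    total_w, processing_w, ops = 8.0, 2.6, 652e6
    print(f"processing share of power: {power_share(processing_w, total_w):.1%}")
    print(f"efficiency (lower bound): {ops_per_joule(ops, processing_w) / 1e6:.0f} Mop/J")
```

By this calculation the processing system draws roughly a third of the radar's power budget and delivers on the order of 250 million operations per joule.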
|
9 |
From dataflow models to energy efficient application specific processors
Hautala, I. (Ilkka), 11 October 2019
Abstract
The development of wireless networks has provided the necessary conditions for several new applications. With the emergence of virtual and augmented reality and the Internet of Things, and in the era of social media and streaming services, various demands related to functionality and performance have been set for mobile and wearable devices. Meeting these demands is complicated by the minimal energy budgets that are characteristic of embedded devices. Lately, the energy efficiency of devices has been addressed by increasing parallelism and using application-specific hardware resources. This has complicated hardware as well as software development, because conventional development methods are based on low-level abstractions and sequential programming paradigms. On the other hand, deployment of high-level design methods has been slowed because the resulting solutions compromise too much on energy efficiency and performance.
This doctoral thesis introduces a model-driven framework for the development of signal processing systems that facilitates hardware and software co-design. The design flow exploits an easily customizable, re-programmable and energy-efficient processor template. The proposed design flow enables tailoring of multiple heterogeneous processing elements and the connections between them to the demands of an application. Application software is described by using high-level dataflow models, which enable the automatic synthesis of parallel applications for different multicore hardware platforms and speed up design space exploration. Suitability of the proposed design flow is demonstrated by using three different applications from different signal processing domains. The experiments showed that raising the level of abstraction has only a minor impact on performance.
Video processing algorithms are the main application area of this thesis. The thesis proposes tailored and reprogrammable energy-efficient processing elements for video coding algorithms. The solutions are based on the use of multiple processing elements that exploit the pipeline parallelism of the application, which is characteristic of many signal processing algorithms. Performance, power and area metrics for the designed solutions have been obtained using post-layout simulation models. In terms of energy efficiency, the proposed programmable processors form a new compromise solution between fixed hardware accelerators and conventional embedded processors for video coding.
Tiivistelmä (Finnish abstract, in English translation)
The development of wireless networks has created the conditions for several new applications. Social media, streaming services, virtual reality and the Internet of Things, among others, place diverse requirements on portable and wearable devices with respect to functionality, performance, energy consumption and physical form. One of the biggest challenges is the energy consumption of embedded devices. Efforts to improve the energy efficiency of devices have relied on parallel computing and customized computing resources. This in turn has complicated both hardware and software development, because widely used development tools are based on low-level abstractions and on programming languages originally designed for single-core processors. The adoption of high-level and automated development methods has been slowed by the inadequate performance of the resulting systems and their inefficient use of hardware resources.
The dissertation presents a toolchain based on dataflow design that is intended for implementing energy-efficient signal processing systems. On the hardware side, the design flow presented in the work builds on a customizable and programmable transport-triggered processor template. The proposed design flow makes it possible to tailor multiple heterogeneous processor cores and the connections between them to the needs of applications. In the design flow, software is described using high-level dataflow models. This enables, in particular, the automatic mapping of software containing parallel computation onto different multiprocessor systems and speeds up the exploration of different system-level solutions. The usability of the design flow is demonstrated using three different signal processing applications as examples. The results show that the abstraction level of design methods can be raised without a significant loss of performance.
The central application area of the dissertation is video coding. The work presents energy-efficient and reprogrammable processor cores designed for video coding. The solutions are based on the use of multiple processor cores, exploiting in particular the pipeline parallelism characteristic of video processing algorithms. The power consumption, performance and area of the processors have been analyzed using simulation models that account for the placement and routing of logic cells. The proposed application-specific processor solutions offer a new energy-efficient compromise between conventional programmable processors and hard-wired video accelerators.
|
10 |
Measuring energy consumption for short code paths using RAPL
Hähnel, Marcus; Döbel, Björn; Völp, Marcus; Härtig, Hermann, 28 May 2013
Measuring the energy consumption of software components is a major building block for generating models that allow for energy-aware scheduling, accounting and budgeting. Current measurement techniques focus on coarse-grained measurements of application or system events. However, fine-grained adjustments, in particular in the operating-system kernel and in application-level servers, require power profiles at the level of a single software function. Until recently, this appeared to be impossible because of the lack of fine-grained resolution and the high cost of measurement equipment. In this paper we report on our experience in using the Running Average Power Limit (RAPL) energy sensors available in recent Intel CPUs for measuring the energy consumption of short code paths. We investigate the granularity at which RAPL measurements can be performed and discuss practical obstacles that occur when performing these measurements on complex modern CPUs. Furthermore, we demonstrate how to use the RAPL infrastructure to characterize the energy costs of decoding video slices.
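A minimal sketch of reading a RAPL energy counter around a piece of code is shown below. It is not the authors' measurement setup: it assumes a Linux system that exposes the counter through the powercap sysfs interface at `/sys/class/powercap/intel-rapl:0` (the files `energy_uj` and `max_energy_range_uj`), which may require elevated privileges to read. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Callable

# Assumed location of the package-0 RAPL domain in the Linux powercap tree.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(name: str = "energy_uj") -> int:
    """Read one integer value (microjoules) from the RAPL sysfs directory."""
    return int((RAPL_DIR / name).read_text().strip())


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy consumed between two readings, tolerating one counter wrap."""
    if after >= before:
        return after - before
    # The counter wrapped around once between the two readings.
    return after + max_range - before


def measure_uj(work: Callable[[], None], repeats: int = 1000) -> float:
    """Average energy per call of `work`, in microjoules.

    The counter is only refreshed periodically, so a short code path is
    repeated many times and the total is divided by the repeat count.
    """
    max_range = read_counter_uj("max_energy_range_uj")
    before = read_counter_uj()
    for _ in range(repeats):
        work()
    after = read_counter_uj()
    return energy_delta_uj(before, after, max_range) / repeats


if __name__ == "__main__":
    if RAPL_DIR.exists():
        print(measure_uj(lambda: sum(range(10_000))), "uJ per call")
    else:
        print("RAPL powercap interface not available on this machine")
```

Repeating the code path is a crude way around the counter's limited update rate; the reading also covers the whole package domain, so activity from other cores and processes is included in the result.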
|