  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
171

Approximate Computing: From Circuits to Software

Younghoon Kim (10184063) 01 March 2021 (has links)
Many modern workloads such as multimedia, recognition, mining, search, and vision possess intrinsic application resilience: the ability to produce acceptable-quality outputs despite their underlying computations being performed in an approximate manner. Approximate computing has emerged as a paradigm that exploits this resilience to design systems that produce outputs of acceptable quality with significant performance/energy improvements. The research community has proposed a range of approximate computing techniques spanning circuits, architecture, and software over the last decade. Nevertheless, approximate computing has yet to be incorporated into mainstream HW/SW design processes, largely due to its deviation from the conventional design flow and the lack of runtime approximation controllability by the user.

The primary objective of this thesis is to provide approximate computing techniques across different layers of abstraction that possess two characteristics: (i) they can be applied with minimal change to the conventional design flow, and (ii) the approximation is controllable at runtime by the user with minimal overhead. To this end, this thesis proposes three novel approximate computing techniques: clock overgating, which targets HW design at the Register Transfer Level (RTL); value similarity extensions, which enhance general-purpose processors with a set of microarchitectural and ISA extensions; and data subsetting, which targets SW executing on commodity platforms.

The thesis first explores clock overgating, which extends the concept of clock gating, a conventional low-power technique that turns off the clock to a flip-flop (FF) when its value remains unchanged. In contrast to traditional clock gating, clock overgating gates the clock signals to selected FFs even when the circuit functionality is sensitive to their state. This saves additional power in the clock tree, the gated FFs, and their downstream logic, while a quality loss occurs if the erroneous FF states propagate to the circuit outputs. The thesis develops a systematic methodology to identify an energy-efficient clock overgating configuration for any given circuit and quality constraint. Towards this end, three key strategies for efficiently pruning the large space of possible overgating configurations are proposed: significance-based overgating, grouping FFs into overgating islands, and utilizing internal signals of the circuit as triggers for overgating. Across a suite of 6 machine learning accelerators, energy benefits of 1.36X on average are achieved at the cost of a very small (<0.5%) loss in classification accuracy.

The thesis also explores value similarity extensions, a set of lightweight microarchitectural and ISA extensions for general-purpose processors that improve performance for computations on data structures with value similarity. The key idea is that programs often contain repeated instructions that operate on very similar inputs (e.g., neighboring pixels within a homogeneous region of an image). In such cases, it may be possible to skip an instruction that operates on data similar to that of a previously executed instruction, and approximate the skipped instruction's result with the saved result of the previous one. The thesis provides three key strategies for realizing this approach: identifying potentially skippable instructions from user annotations in SW, obtaining similarity information for future load values from the data cache line currently being accessed, and a mechanism for saving and reusing the results of potentially skippable instructions. As a further optimization, the thesis proposes replacing multiple loop iterations that produce similar results with a specialized instruction sequence. The proposed extensions are modeled on the gem5 architectural simulator, achieving a speedup of 1.81X on average across 6 machine-learning benchmarks running on a microcontroller-class in-order processor.

Finally, the thesis explores a data-centric approach to approximate computing called data subsetting, which shifts the focus of approximation from computations to data. The key idea is to restrict the application's data accesses to a subset of its elements so that the overall memory footprint becomes smaller. Constraining the accesses to lie within a smaller memory footprint renders them more cache-friendly, thereby improving performance. The thesis presents a C++ data structure template called SubsettableTensor, which embodies mechanisms to define an accessible subset of data and redirect accesses away from non-subset elements, for realizing data subsetting in SW. The proposed concept is evaluated on parallel SW implementations of 7 machine learning applications on a 48-core AMD Opteron server. Experimental results indicate that a 1.33X-4.44X performance improvement can be achieved with <0.5% loss in classification accuracy.

In summary, the proposed approximation techniques achieve significant efficiency improvements for various machine learning applications across circuits, architecture, and SW, underscoring their promise as designer-friendly approaches to approximate computing.
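The redirection idea behind data subsetting can be illustrated with a minimal Python sketch (hypothetical; the thesis's actual mechanism is the C++ SubsettableTensor template, and real subset-selection policies are more sophisticated than the prefix scheme used here): every access is redirected into a fixed subset of the elements, shrinking the effective memory footprint.

```python
class SubsetArray:
    """Illustrative sketch of data subsetting (a hypothetical stand-in
    for the thesis's C++ SubsettableTensor): reads of any index are
    redirected into a fixed prefix subset of the data, shrinking the
    working set, and hence the cache footprint, by the subset ratio."""

    def __init__(self, data, subset_fraction=0.5):
        self.data = list(data)
        # Keep only a prefix of the data accessible; real schemes may
        # choose strided or importance-based subsets instead.
        self.subset_size = max(1, int(len(self.data) * subset_fraction))

    def __getitem__(self, i):
        # Redirect out-of-subset accesses back into the subset.
        return self.data[i % self.subset_size]

full = [float(i) for i in range(8)]   # 0.0 .. 7.0
approx = SubsetArray(full, 0.5)       # only elements 0..3 are accessible
print(approx[1], approx[6])           # 1.0 2.0  (6 % 4 = 2, redirected)
```

Because every index maps into the same small region, repeated traversals touch far fewer distinct cache lines, which is the source of the speedup the abstract describes.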
172

Practical Web-scale Recommender Systems

Tagami, Yukihiro 25 September 2018 (has links)
Kyoto University / 0048 / New-system doctoral course / Doctor of Informatics / Degree No. Kō 21390 / Informatics Doctorate No. 676 / 新制||情||117 (University Library) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Chief examiner) Prof. Hisashi Kashima, Prof. Akihiro Yamamoto, Prof. Hidetoshi Shimodaira / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
173

Modeling the Interaction of Numerosity and Perceptual Variables with the Diffusion Model

Kang, Inhan 26 August 2019 (has links)
No description available.
174

Solving Linear and Bilinear Inverse Problems using Approximate Message Passing Methods

Sarkar, Subrata January 2020 (has links)
No description available.
175

Approximate representations of groups

De Chiffre, Marcus 31 August 2018 (has links)
In this thesis, we consider various notions of approximate representations of groups. Loosely speaking, an approximate representation is a map from a group into the unitary operators on a Hilbert space that satisfies the homomorphism equation up to a small error. Maps that are close to actual representations are trivial examples of approximate representations, and a natural question to ask is whether all approximate representations of a given group arise in this way. A group with this property is called stable. In joint work with Lev Glebsky, Alexander Lubotzky and Andreas Thom, we approach the stability question in the setting of local asymptotic representations. We provide a sufficient condition, in terms of cohomology vanishing, for a finitely presented group to be stable. We use this result to provide new examples of groups that are stable with respect to the Frobenius norm, including the first examples of groups that are not Frobenius-approximable. In joint work with Narutaka Ozawa and Andreas Thom, we generalize a theorem of Gowers and Hatami about maps with non-vanishing uniformity norm. We use this to prove a very general stability result for uniform epsilon-representations of amenable groups, which subsumes results of both Gowers-Hatami and Kazhdan.
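In symbols, one common way to make "up to a small error" precise runs as follows (the choice of norm and quantifiers varies with the setting, e.g. uniform versus local asymptotic representations, so this is only one representative formalization):

```latex
% A map \pi : G \to U(H) is an \varepsilon-approximate representation if
\|\pi(gh) - \pi(g)\pi(h)\| \le \varepsilon \quad \text{for all } g, h \in G.
% The group G is stable if every such \pi is close to a genuine
% representation \rho, i.e.
\sup_{g \in G} \|\pi(g) - \rho(g)\| \le \delta(\varepsilon),
\qquad \delta(\varepsilon) \to 0 \ \text{as}\ \varepsilon \to 0.
```

With the Frobenius (Hilbert-Schmidt) norm in place of the operator norm, the same definitions yield the notions of Frobenius approximability and Frobenius stability discussed in the abstract.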
176

Energy-Efficient Devices and Circuits for Ultra-Low Power VLSI Applications

Li, Ren 04 1900 (has links)
Nowadays, integrated circuits (ICs) are mostly implemented using Complementary Metal Oxide Semiconductor (CMOS) transistor technology. This technology has allowed the chip industry to shrink transistors and thus increase the device density, circuit complexity, operation speed, and computational power of ICs. In recent years, however, transistor scaling has faced multiple roadblocks that will eventually bring it to an end as it approaches physical and economic limits. The dominance of sub-threshold leakage, which slows the scaling of the threshold voltage VTH and the supply voltage VDD, has resulted in high power density on chips, and even widely popular solutions such as parallel and multi-core computing have not fully addressed this problem. These drawbacks have overshadowed the benefits of transistor scaling. With the dawn of the Internet of Things (IoT) era, the chip industry needs to adjust towards ultra-low-power circuits and systems. In this thesis, energy-efficient Micro-/Nano-electromechanical (M/NEM) relays are introduced, their non-leaking property and abrupt ON/OFF switching characteristics are studied, and their designs and applications in the implementation of ultra-low-power integrated circuits and systems are explored. The proposed designs comprise core building blocks for any functional microprocessor: fundamental logic gates; arithmetic adder circuits; sequential latch and flip-flop circuits; input/output (I/O) interface data converters, including an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC); system-level power-management DC-DC converters; and an energy-management power-gating scheme. Another contribution of this thesis is the study of device non-idealities and variations in terms of their effect on circuit functionality. We thoroughly investigate energy-efficient approximate computing with non-ideal transistors and relays for the next generation of ultra-low-power VLSI systems.
177

Point Based Approximate Color Bleeding with Cuda

Feeney, Nicholas D 01 June 2013 (has links) (PDF)
Simulating light is a very computationally expensive proposition. A wide variety of global illumination algorithms are implemented and used by major motion picture companies to render interesting and believable scenes, and every algorithm strives to find a balance between speed and accuracy. The Point Based Approximate Color Bleeding (PBACB) algorithm is one of the most widely used in the field today. PBACB is a global illumination algorithm based on the central idea that the geometry and direct illumination of a scene can be approximated by a point cloud representation, which can then be used to generate the indirect illumination. The most basic unit of the point cloud is a surfel: a two-dimensional circle in space that contains the direct illumination for that section of space. The surfels are gathered in a tree structure, and approximations are generated for the different levels of the tree. This tree is then used to calculate the appropriate color bleeding effect to apply to the surfaces in a rendered image. The main goal of this project was to explore the possibility of applying CUDA to the PBACB algorithm. CUDA is an extension of the C/C++ programming languages that allows for GPU parallel programming. In this paper, we present our GPU-based implementation of the PBACB algorithm. The PBACB algorithm involves three central steps: creation of a surfel point cloud, generation of the spherical harmonics approximations for the point cloud, and use of the surfel point cloud to generate an approximation of global illumination. For this project, CUDA was applied to two of these steps: the generation of the spherical harmonic representations and the application of the surfel point cloud to generate indirect illumination. Our final GPU algorithm obtained a 4.0X speedup over our CPU version. We also discuss future work, which could include the use of CUDA's Dynamic Parallelism and a stack-free implementation that could increase the speedups seen by our algorithm.
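The surfel-gathering step can be sketched in a few lines of Python (a hypothetical, drastically simplified illustration: no tree traversal, no spherical harmonics, and a crude solid-angle weight): each surfel contributes its direct illumination to a shading point with a weight that falls off with squared distance.

```python
import math

def gather_indirect(point, surfels):
    """Sum weighted surfel colors at `point`. Each surfel is
    (position, area, rgb). In a full PBACB implementation, distant
    surfels would be replaced by a coarser tree-node aggregate."""
    r = g = b = 0.0
    for pos, area, color in surfels:
        d2 = sum((p - q) ** 2 for p, q in zip(point, pos))
        w = area / (math.pi * d2)      # rough solid-angle style weight
        r += w * color[0]
        g += w * color[1]
        b += w * color[2]
    return (r, g, b)

surfels = [((0.0, 0.0, 1.0), 0.1, (1.0, 0.0, 0.0)),   # red surfel above
           ((0.0, 0.0, 2.0), 0.1, (0.0, 1.0, 0.0))]   # green, farther away
print(gather_indirect((0.0, 0.0, 0.0), surfels))
```

The inverse-square weighting is why the nearer red surfel bleeds four times more color onto the point than the green one at twice the distance.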
178

Post-Training Optimization of Cross-layer Approximate Computing for Edge Inference of Deep Learning Applications

De la Parra Aparicio, Cecilia Eugenia 07 February 2024 (has links)
Over the past decade, the rapid development of deep learning (DL) algorithms has enabled extraordinary advances in perception tasks across different fields, from computer vision to audio signal processing. Additionally, the increasing computational resources available in supercomputers and graphics-processor clusters have provided a suitable environment for training larger and deeper deep neural network (DNN) models with improved performance. However, the resulting memory bandwidth and computational requirements of such DNN models restrict their deployment in embedded systems with constrained hardware resources. To overcome this challenge, it is important to establish new paradigms that reduce the computational workload of DL algorithms while maintaining their original accuracy. A key observation of previous research is that DL models are resilient to input noise and computational errors; therefore, a reasonable approach to decreasing hardware requirements is to embrace DNN resiliency and utilize approximate computing techniques at different system design layers. This approach requires, however, constant monitoring as well as a careful combination of approximation techniques to avoid performance degradation while maximizing computational savings. Within this context, the focus of this thesis is the simulation of cross-layer approximate computing (AC) methods for DNN computation and the development of optimization methods to compensate for AC errors in approximated DNNs.

The first part of this thesis proposes the simulation framework ProxSim, which enables accelerated approximate computational unit (ACU) simulation for the evaluation and training of approximated DNNs. ProxSim supports quantization and approximation of common neural layers such as fully connected (FC), convolutional, and recurrent layers. A performance evaluation using a variety of DNN architectures, as well as a comparison with the state of the art, is also presented. The author used ProxSim to implement and evaluate the methods presented in this work.

The second part of this thesis introduces an approach to modeling the approximation error in DNN computation. First, the author thoroughly analyzes the error caused by approximate multipliers used to compute the multiply-accumulate (MAC) operations in DNN models. From this analysis, a statistical model of the approximation error is obtained. Through various experiments with DNNs for image classification, the proposed model is verified and compared with other methods from the literature. The results demonstrate the validity of the approximation-error model and reinforce a general understanding of approximate computing in DNNs.

In the third part of this thesis, the author presents a methodology for uniform systematic approximation of DNNs. This methodology focuses on optimizing full DNN approximation with a single type of ACU to minimize power consumption without accuracy loss. Its backbone is a set of custom fine-tuning methods the author proposes to compensate for the approximation error. These methods enable the use of ACUs with large approximation errors, which results in significant power savings and negligible accuracy losses. This process is corroborated by extensive experiments, in which the estimated savings and the accuracy achieved after approximation are thoroughly examined using ProxSim.

In the last part of this thesis, the author proposes two methodologies to further boost energy savings after applying uniform approximation. The additional savings are achieved by computing more resilient DNN elements (neurons or layers) at increased approximation levels. The first methodology focuses on iterative kernel-wise approximation and quantization enabled by a custom approximate MAC unit. The second is based on flexible layer-wise approximation and is applied to bit-decomposed in-memory computing (IMC) architectures as a case study to demonstrate the effectiveness of the proposed approach.
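The role of an approximate multiplier inside a MAC operation can be illustrated with a short Python sketch (hypothetical; this is not ProxSim's API or any specific ACU from the thesis): a truncation multiplier zeroes the low-order bits of each operand, a common way such hardware trades exactness for reduced circuit cost.

```python
# Hypothetical truncation multiplier: clearing the k least-significant
# bits of each operand models an approximate hardware multiplier that
# omits low-order partial products.

def approx_mul(a, b, k=2):
    mask = ~((1 << k) - 1)          # clear the k low-order bits
    return (a & mask) * (b & mask)

def mac(xs, ws, mul):
    """Multiply-accumulate over paired inputs using the given multiplier."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += mul(x, w)
    return acc

xs, ws = [7, 12, 5], [3, 9, 11]
exact = mac(xs, ws, lambda a, b: a * b)
approx = mac(xs, ws, approx_mul)
print(exact, approx, exact - approx)   # accumulated truncation error
```

Frameworks in the spirit of ProxSim run exactly this kind of substitution at scale, over every MAC in a network, so that the accuracy impact of a candidate ACU can be measured and then compensated by fine-tuning.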
179

VARIABILITY ANALYSIS & ITS APPLICATIONS TO PHYSIOLOGICAL TIME SERIES DATA

Kaffashi, Farhad 06 June 2007 (has links)
No description available.
180

Complexity of the Electroencephalogram of the Sprague-Dawley Rat

Smith, Phillip James 27 July 2010 (has links)
No description available.
