1 |
Mixed Precision Quantization for Computer Vision Tasks in Autonomous Driving / Blandad Precisionskvantisering för Datorvisionsuppgifter vid Autonom Körning
Rengarajan, Sri Janani, January 2022
Quantization of neural networks is a popular technique for adapting computation-intensive deep learning applications to edge devices. In this work, low-bit mixed precision quantization of an FPN-ResNet18 model trained for semantic segmentation is explored using the Cityscapes and Arriver datasets. The Hessian information of each layer in the model is used to determine that layer's bit precision; in some experiments the bit precision of the layers is instead chosen randomly. The networks are quantization-aware trained with combinations of 2, 4 and 8 bits. The results for both the Cityscapes and Arriver datasets show that networks quantization-aware trained with the low-bit mixed precision technique perform on par with 8-bit quantization-aware trained networks. Segmentation performance degrades when the network activations are quantized below 8 bits. It was also found that using the Hessian information had little effect on the network's performance.
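To make the idea of per-layer bit widths concrete, here is a minimal Python sketch. It is not the thesis's implementation: `fake_quantize` is a generic symmetric uniform quantizer, and `assign_bits` is a hypothetical rule that ranks layers by a given sensitivity score (for example, one derived from Hessian information) and gives more bits to the more sensitive layers. The function names, the equal-sized grouping, and the example scores are all assumptions made for illustration.

```python
def fake_quantize(values, bits):
    """Quantize values to a symmetric uniform grid, then map back to floats."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1      # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = max_abs / levels          # width of one quantization step
    quantized = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        q = max(-levels, min(levels, q))      # clip to the representable range
        quantized.append(q * scale)
    return quantized


def assign_bits(sensitivities, choices=(2, 4, 8)):
    """Hypothetical rule: split layers, ranked by sensitivity, into equal
    groups and give each group the next larger bit width."""
    order = sorted(range(len(sensitivities)), key=lambda i: sensitivities[i])
    bits = [0] * len(sensitivities)
    for rank, layer in enumerate(order):
        group = rank * len(choices) // len(order)
        bits[layer] = choices[group]
    return bits


# Illustrative values only; these are not results from the thesis.
layer_sensitivity = [0.9, 0.1, 0.5]
layer_bits = assign_bits(layer_sensitivity)
weights = [0.8, -0.31, 0.05, -1.0]
for b in (2, 4, 8):
    print(b, fake_quantize(weights, b))
```

In quantization-aware training, a quantizer of this kind would typically be applied in the forward pass while the full-precision weights continue to receive gradient updates; that training loop is outside the scope of this sketch.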
|
2 |
High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems
Alomairy, Rabab M., 20 July 2022
Numerical algorithms must be redesigned to leverage the extreme parallelism of emerging architectures, so that scientific applications can fulfill their high-fidelity and multi-physics potential while sustaining high efficiency relative to the limiting resource. Algorithmic redesign can shift the limiting resource, for example from memory or communication to arithmetic capacity. Its benefit expands greatly when it introduces a tunable tradeoff between accuracy and resources. Scientific applications from diverse sources rely on dense matrix operations. These operations arise in Schur complements, integral equations, covariances in spatial statistics, ridge regression, radial basis functions from unstructured meshes, and kernel matrices from machine learning, among others. This thesis demonstrates how to extend the problem sizes that can be treated and how to reduce their execution time. Two "universes" of algorithmic innovations have emerged to improve computations by orders of magnitude in capacity and runtime. Each introduces a hierarchy, of rank or of precision. Tile Low-Rank (TLR) approximation replaces blocks of a dense operator with low-rank blocks. Mixed precision approximation, increasingly well supported by contemporary hardware, replaces high-precision blocks with low-precision ones. Herein, we design new high-performance direct solvers based on the synergy of TLR and mixed precision. Since adapting to data sparsity leads to heterogeneous workloads, we rely on task-based runtime systems to orchestrate the scheduling of fine-grained kernels onto computational resources. We first demonstrate how TLR makes it possible to accelerate acoustic scattering and mesh deformation simulations. Our solvers outperform state-of-the-art libraries by up to an order of magnitude. Then, we demonstrate the impact of enabling mixed precision in a bioinformatics context, where it yields up to a threefold speedup.
To facilitate the adoption of task-based runtime systems, we introduce the AL4SAN library, which provides a common API for expressing and queueing tasks across multiple dynamic runtime systems. The library handles a variety of workloads at low overhead while increasing user productivity. AL4SAN enables interoperability by switching runtime systems during execution, which makes it possible to achieve a twofold speedup on a task-based generalized symmetric eigenvalue solver.
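The notion of replacing high-precision blocks with low-precision ones can be illustrated with a small Python sketch that uses only the standard library. It is an assumption-laden illustration, not the solvers described in the abstract: the function `store_mixed`, the norm-based selection rule, and the threshold `rel_threshold` are hypothetical. Tiles whose Frobenius norm is small relative to the largest tile are stored in single precision (`array` typecode `"f"`), and the rest stay in double precision (typecode `"d"`).

```python
import math
from array import array


def tile_norm(tile):
    """Frobenius norm of a tile given as a flat list of floats."""
    return math.sqrt(sum(x * x for x in tile))


def store_mixed(tiles, rel_threshold=1e-2):
    """Hypothetical rule: keep tiles with a large norm in double precision
    and store tiles with a relatively small norm in single precision."""
    largest = max(tile_norm(t) for t in tiles)
    stored = []
    for t in tiles:
        low_precision = tile_norm(t) < rel_threshold * largest
        stored.append(array("f" if low_precision else "d", t))
    return stored


# Illustrative 2x2 tiles stored as flat lists; values are made up.
tiles = [
    [1.0, 2.0, 3.0, 4.0],            # dominant tile, kept in double precision
    [1e-4, 2e-4, 3e-4, 1e-3 / 3.0],  # small tile, stored in single precision
]
stored = store_mixed(tiles)
for original, packed in zip(tiles, stored):
    worst = max(abs(a - b) for a, b in zip(original, packed))
    print(packed.typecode, packed.itemsize, worst)
```

The design intuition is that rounding a small-norm tile to single precision perturbs the whole matrix only slightly, while that tile takes less memory and can use faster arithmetic on hardware that supports it.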
|