21

Σχεδίαση παράλληλης διάταξης επεξεργαστών σε ένα chip : δημιουργία και μελέτη high radix RNS αθροιστή / Design of a parallel arrangement of processors on a chip : creation and study of a high-radix RNS adder

Γιαννοπούλου, Λεμονιά 09 July 2013 (has links)
The addition of wide (many-bit) numbers is a time- and power-consuming task. Many methods have been developed to reduce the delay of computing a sum caused by carry propagation, such as Carry Look-Ahead (CLA) and Carry Select (CS). These architectures do not scale well to operands with many bits, or to sums of many operands, because the resulting circuits are large and power-hungry. This thesis investigates the Residue Number System (RNS) technique, which uses number systems with radices larger than two. A base of three moduli is defined, and each number participating in the sum is represented uniquely by its residues with respect to the three moduli. The addition is performed in parallel in each residue channel, and the result is finally converted back to the binary number system. The advantages of this technique are the parallelization of the computation and the absence of long carry-propagation circuits; the disadvantage is that conversion circuits to and from the binary system are required. The RNS adders are compared against the well-known CLA and CS adders in terms of power consumption, and the RNS adders are found to consume less energy.
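To make the residue arithmetic concrete, here is a minimal Python sketch of an RNS addition. The moduli (7, 15, 16) and the operand values are illustrative assumptions, not the base studied in the thesis; the conversion back to binary uses the Chinese Remainder Theorem.

```python
from math import prod

# Illustrative pairwise-coprime moduli (an assumption, not the thesis's base).
MODULI = (7, 15, 16)          # dynamic range M = 7 * 15 * 16 = 1680
M = prod(MODULI)

def to_rns(x):
    """Represent x by its residue with respect to each modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(a_res, b_res):
    """Add channel-wise; no carry propagates between channels."""
    return tuple((a + b) % m for a, b, m in zip(a_res, b_res, MODULI))

def from_rns(residues):
    """Convert back to binary using the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): modular inverse
    return total % M

if __name__ == "__main__":
    a, b = 723, 615
    s = rns_add(to_rns(a), to_rns(b))
    assert from_rns(s) == (a + b) % M
    print(f"{a} + {b} = {from_rns(s)}, computed as residues {s}")
```

Each residue channel is only a few bits wide, so the three additions can be carried out by small independent adders with no carry crossing between them; the conversion from and to binary is where the extra circuitry mentioned above appears.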
22

Simulated molecular adder circuits on a surface of DNA : Studying the scalability of surface chemical reaction network digital logic circuits / Simulerade additionskretsar på en yta av DNA : En studie av skalbarheten hos kretsar för digital logik på ytbundna kemiska reaktionsnätverk

Arvidsson, Jakob January 2023 (has links)
The behavior of the Deoxyribonucleic Acid (DNA) molecule can be exploited to perform useful computation, and it can be "programmed" using the language of Chemical Reaction Networks (CRNs). One specialized CRN construct is the Surface Chemical Reaction Network (SCRN). SCRNs can implement asynchronous cellular automata, which can in turn be used to implement digital logic circuits. SCRN-based digital logic circuits are thought to have several advantages over regular CRN circuits, one of which is their scalability. This thesis investigates the scalability of SCRN-based adder circuits: how does an increase in the number of bits affect the time required for the circuit to produce a correct result, and how is the throughput of the circuit affected when multiple additions are performed in a pipelined fashion? These questions are studied through experiments in which the execution of optimized SCRN adder circuits is simulated. Because of the stochastic nature of SCRNs, each execution is all but guaranteed to be unique, so the simulation of each circuit must be repeated until a sufficiently large statistical sample has been collected. The results show that these samples follow a Gaussian distribution, regardless of the number of bits or the number of pipelined operations. The experiments show that the simulated latency of the studied SCRN adder circuits scales linearly with the number of input bits, and that throughput can be greatly improved by pipelining multiple operations. However, the results are inconclusive as to the maximum possible throughput of SCRN adder circuits. A conclusion of the project is that SCRN digital logic circuit design could conceivably benefit from the implementation of specialized components beyond the standard logic gates.
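The repeated-sampling methodology described above can be sketched as follows. The simulate_adder_latency function is a hypothetical stand-in for the SCRN simulator used in the thesis; its linear-plus-Gaussian-noise behaviour is an assumption used only to show how stochastic runs are repeated and summarized, not a model of actual SCRN kinetics.

```python
import random
import statistics

def simulate_adder_latency(num_bits: int, rng: random.Random) -> float:
    """Placeholder for one stochastic simulation run of an SCRN adder.

    Assumption for illustration only: latency grows roughly linearly with the
    bit width and varies from run to run, which is the situation the
    repeat-until-enough-samples loop has to handle.
    """
    return 10.0 * num_bits + rng.gauss(0.0, 3.0)

def collect_samples(num_bits: int, runs: int = 200, seed: int = 0):
    """Repeat the stochastic simulation until a statistical sample is built up."""
    rng = random.Random(seed)
    return [simulate_adder_latency(num_bits, rng) for _ in range(runs)]

if __name__ == "__main__":
    for bits in (4, 8, 16, 32):
        samples = collect_samples(bits)
        mean = statistics.fmean(samples)
        stdev = statistics.stdev(samples)
        # Rough normality check: about 95% of samples should lie within 2 sigma.
        within = sum(abs(s - mean) <= 2 * stdev for s in samples) / len(samples)
        print(f"{bits:2d} bits: mean latency {mean:7.2f}, stdev {stdev:5.2f}, "
              f"{within:.0%} of runs within 2 sigma")
```

Under these assumptions, the per-width sample means can then be regressed against the bit width to check the linear-scaling claim, exactly as the thesis does with its simulated SCRN latencies.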
23

Design, Analysis, and Applications of Approximate Arithmetic Modules

Ullah, Salim 06 April 2022 (has links)
From the initial computing machines, Colossus of 1943 and ENIAC of 1945, to modern high-performance data centers and the Internet of Things (IoT), four design goals, i.e., high performance, energy efficiency, efficient resource utilization, and ease of programmability, have remained a beacon of development for the computing industry. During this period, the computing industry has exploited the advantages of technology scaling and microarchitectural enhancements to achieve these goals. However, with the end of Dennard scaling, these techniques offer diminishing energy and performance advantages. Therefore, it is necessary to explore alternative techniques for satisfying the computational and energy requirements of modern applications. Towards this end, one promising technique is analyzing and relaxing the strict notion of correctness in various layers of the computation stack. Most modern applications across the computing spectrum, from data centers to IoT devices, interact with and analyze real-world data and make decisions accordingly. These applications are broadly classified as Recognition, Mining, and Synthesis (RMS). Instead of producing a single golden answer, they produce several feasible answers and possess an inherent error resilience to the inexactness of the processed data and the corresponding operations. Exploiting this inherent error resilience, the paradigm of Approximate Computing relaxes the strict notion of computational correctness to realize high-performance and energy-efficient systems with outputs of acceptable quality.
Prior work on circuit-level approximation has mainly focused on Application-Specific Integrated Circuits (ASICs). However, ASIC-based solutions suffer from long time-to-market and high-cost development cycles. These limitations can be overcome by exploiting the reconfigurable nature of Field Programmable Gate Arrays (FPGAs). However, due to the architectural differences between ASICs and FPGAs, applying ASIC-based approximation techniques to FPGA-based systems does not yield proportional performance and energy gains. Therefore, FPGA-optimized approximation techniques are required to exploit the principles of approximate computing in FPGA-based hardware accelerators for error-resilient applications. Further, most state-of-the-art approximate arithmetic operators lack a generic approximation methodology for implementing new approximate designs as an application's accuracy and performance requirements change. These works also lack a methodology in which a machine learning model can be used to correlate an approximate operator with its impact on the output quality of an application.
This thesis addresses these research challenges by designing and exploring FPGA-optimized, logic-based approximate arithmetic operators. As multiplication is one of the most computationally complex and most frequently used arithmetic operations in modern applications, such as Artificial Neural Networks (ANNs), it is the target of most of the approximation techniques proposed in this thesis. The primary focus of the work is to provide a framework for generating FPGA-optimized approximate arithmetic operators, together with efficient techniques for exploring such operators when implementing hardware accelerators for error-resilient applications. Towards this end, we first present various designs of resource-optimized, high-performance, and energy-efficient accurate multipliers.
Although modern FPGAs host high-performance DSP blocks to perform multiplication and other arithmetic operations, our analysis and results show that the orthogonal approach of having resource-efficient and high-performance multipliers is necessary for implementing high-performance accelerators. Due to the differences in the type of data processed by various applications, the thesis presents individual designs for unsigned, signed, and constant multipliers. Compared to the multiplier IPs provided by the FPGA synthesis tool, our proposed designs provide significant performance gains. Building on the designed accurate multipliers, we then provide a library of approximate unsigned and signed multipliers. The proposed approximations target reductions in the total utilized resources, the critical path delay, and the energy consumption of the multipliers. We have explored various statistical error metrics to characterize the approximation-induced accuracy degradation of the approximate multipliers, and we have utilized the designed multipliers in various error-resilient applications to evaluate their impact on the applications' output quality and performance.
Based on our analysis of the designed approximate multipliers, we identify the need for a framework to design application-specific approximate arithmetic operators. An application-specific approximate arithmetic operator is intended to implement only the logic needed to satisfy the application's overall output accuracy and performance constraints. Towards this end, we present a generic design methodology for implementing FPGA-based application-specific approximate arithmetic operators from their accurate implementations according to the applications' accuracy and performance requirements. In this regard, we utilize various machine learning models to identify feasible approximate arithmetic configurations for various applications. We also utilize different machine learning models and optimization techniques to efficiently explore the large design space of individual operators and their utilization in various applications. In this thesis, we have used the proposed methodology to design approximate adders and multipliers.
This thesis also explores other layers of the computation stack (cross-layer) for possible approximations that satisfy an application's accuracy and performance requirements. Towards this end, we first present a low bit-width and highly accurate quantization scheme for pre-trained Deep Neural Networks (DNNs). The proposed quantization scheme does not require re-training (fine-tuning of the parameters) after quantization. We also present a resource-efficient FPGA-based multiplier that utilizes the proposed quantization scheme. Finally, we present a framework that allows the intelligent exploration and highly accurate identification of feasible design points in the large design space enabled by cross-layer approximations. The proposed framework utilizes a novel Polynomial Regression (PR)-based method to model approximate arithmetic operators. The PR-based representation enables machine learning models to better correlate an approximate operator's coefficients with their impact on an application's output quality.

Table of contents:
1. Introduction: 1.1 Inherent Error Resilience of Applications; 1.2 Approximate Computing Paradigm; 1.2.1 Software Layer Approximation; 1.2.2 Architecture Layer Approximation; 1.2.3 Circuit Layer Approximation; 1.3 Problem Statement; 1.4 Focus of the Thesis; 1.5 Key Contributions and Thesis Overview
2. Preliminaries: 2.1 Xilinx FPGA Slice Structure; 2.2 Multiplication Algorithms; 2.2.1 Baugh-Wooley's Multiplication Algorithm; 2.2.2 Booth's Multiplication Algorithm; 2.2.3 Sign Extension for Booth's Multiplier; 2.3 Statistical Error Metrics; 2.4 Design Space Exploration and Optimization Techniques; 2.4.1 Genetic Algorithm; 2.4.2 Bayesian Optimization; 2.5 Artificial Neural Networks
3. Accurate Multipliers: 3.1 Introduction; 3.2 Related Work; 3.3 Unsigned Multiplier Architecture; 3.4 Motivation for Signed Multipliers; 3.5 Baugh-Wooley's Multiplier; 3.6 Booth's Algorithm-based Signed Multipliers; 3.6.1 Booth-Mult Design; 3.6.2 Booth-Opt Design; 3.6.3 Booth-Par Design; 3.7 Constant Multipliers; 3.8 Results and Discussion; 3.8.1 Experimental Setup and Tool Flow; 3.8.2 Performance comparison of the proposed accurate unsigned multiplier; 3.8.3 Performance comparison of the proposed accurate signed multiplier with the state-of-the-art accurate multipliers; 3.8.4 Performance comparison of the proposed constant multiplier with the state-of-the-art accurate multipliers; 3.9 Conclusion
4. Approximate Multipliers: 4.1 Introduction; 4.2 Related Work; 4.3 Unsigned Approximate Multipliers; 4.3.1 Approximate 4 × 4 Multiplier (Approx-1); 4.3.2 Approximate 4 × 4 Multiplier (Approx-2); 4.3.3 Approximate 4 × 4 Multiplier (Approx-3); 4.4 Designing Higher Order Approximate Unsigned Multipliers; 4.4.1 Accurate Adders for Implementing 8 × 8 Approximate Multipliers from 4 × 4 Approximate Multipliers; 4.4.2 Approximate Adders for Implementing Higher-order Approximate Multipliers; 4.5 Approximate Signed Multipliers (Booth-Approx); 4.6 Results and Discussion; 4.6.1 Experimental Setup and Tool Flow; 4.6.2 Evaluation of the Proposed Approximate Unsigned Multipliers; 4.6.3 Evaluation of the Proposed Approximate Signed Multiplier; 4.7 Conclusion
5. Designing Application-specific Approximate Operators: 5.1 Introduction; 5.2 Related Work; 5.3 Modeling Approximate Arithmetic Operators; 5.3.1 Accurate Multiplier Design; 5.3.2 Approximation Methodology; 5.3.3 Approximate Adders; 5.4 DSE for FPGA-based Approximate Operators Synthesis; 5.4.1 DSE using Bayesian Optimization; 5.4.2 MOEA-based Optimization; 5.4.3 Machine Learning Models for DSE; 5.5 Results and Discussion; 5.5.1 Experimental Setup and Tool Flow; 5.5.2 Accuracy-Performance Analysis of Approximate Adders; 5.5.3 Accuracy-Performance Analysis of Approximate Multipliers; 5.5.4 AppAxO MBO; 5.5.5 ML Modeling; 5.5.6 DSE using ML Models; 5.5.7 Proposed Approximate Operators; 5.6 Conclusion
6. Quantization of Pre-trained Deep Neural Networks: 6.1 Introduction; 6.2 Related Work; 6.2.1 Commonly Used Quantization Techniques; 6.3 Proposed Quantization Techniques; 6.3.1 L2L: Log_2_Lead Quantization; 6.3.2 ALigN: Adaptive Log_2_Lead Quantization; 6.3.3 Quantitative Analysis of the Proposed Quantization Schemes; 6.3.4 Proposed Quantization Technique-based Multiplier; 6.4 Results and Discussion; 6.4.1 Experimental Setup and Tool Flow; 6.4.2 Image Classification; 6.4.3 Semantic Segmentation; 6.4.4 Hardware Implementation Results; 6.5 Conclusion
7. A Framework for Cross-layer Approximations: 7.1 Introduction; 7.2 Related Work; 7.3 Error-analysis of approximate arithmetic units; 7.3.1 Application Independent Error-analysis of Approximate Multipliers; 7.3.2 Application Specific Error Analysis; 7.4 Accelerator Performance Estimation; 7.5 DSE Methodology; 7.6 Results and Discussion; 7.6.1 Experimental Setup and Tool Flow; 7.6.2 Behavioral Analysis; 7.6.3 Accelerator Performance Estimation; 7.6.4 DSE Performance; 7.7 Conclusion
8. Conclusions and Future Work
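As a concrete illustration of the circuit-level approximation and the statistical error metrics discussed in the abstract above, the following Python sketch truncates the low-order partial products of an 8 x 8 unsigned multiplication and exhaustively measures its mean error distance (MED) and mean relative error distance (MRED). The truncation scheme and the 8 x 8 width are generic assumptions for illustration, not the operators proposed in the thesis.

```python
def approx_mul8(a: int, b: int, cut: int = 4) -> int:
    """Generic truncation-based approximate 8x8 unsigned multiplier.

    Partial products whose weight is below 2**cut are discarded, trading
    accuracy for a smaller and faster circuit. This is an illustrative
    scheme, not one of the thesis's proposed designs.
    """
    result = 0
    for i in range(8):                 # bits of b
        if (b >> i) & 1:
            for j in range(8):         # bits of a
                if (a >> j) & 1 and i + j >= cut:
                    result += 1 << (i + j)
    return result

def error_metrics(cut: int = 4):
    """Exhaustively compute MED and MRED over all 8-bit operand pairs."""
    total_ed, total_red, cases = 0, 0.0, 0
    for a in range(256):
        for b in range(256):
            exact = a * b
            approx = approx_mul8(a, b, cut)
            ed = abs(exact - approx)
            total_ed += ed
            if exact:
                total_red += ed / exact
            cases += 1
    return total_ed / cases, total_red / cases

if __name__ == "__main__":
    med, mred = error_metrics()
    print(f"MED = {med:.2f}, MRED = {mred:.4f}")
```

Sweeping the cut parameter gives a family of operators with different accuracy/cost trade-offs, which is the kind of design space that the design-space-exploration and machine-learning techniques described in the abstract are meant to navigate.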
