561 |
Μοντελοποίηση και εξομοίωση των χαρακτηριστικών γήρανσης NV μνημών
Προδρομάκης, Αντώνιος 12 June 2015
Τις τελευταίες δεκαετίες, η ανάπτυξη των non-volatile μνημών (NVMs) κατέστησε ικανή την αντικατάσταση volatile μνημών, όπως των DRAMs και των μαγνητικών σκληρών δίσκων (HDDs), σε caching και storage εφαρμογές, αντίστοιχα. Οι δίσκοι στερεάς κατάστασης (SSDs) που βασίζονται σε NAND Flash μνήμες έχουν ήδη αναδειχθεί ως ένα χαμηλού κόστους, υψηλής απόδοσης και αξιόπιστο μέσο στα σύγχρονα συστήματα αποθήκευσης. Επιπλέον, οι ιδιότητες των υλικών αλλαγής φάσης και η πρόσφατη κλιμάκωση της Phase-Change μνήμης (PCM), την καθιστά ένα τέλειο υποψήφιο για την ανάπτυξη μνημών τυχαίας προσπέλασης αλλαγής φάσης (PCRAMs).
Η ραγδαία κλιμάκωση των NVMs, με διαδικασίες ολοκλήρωσης κάτω από 19nm, και η χρήση της multi-level cell (MLC) τεχνολογίας συνέβαλλαν στην αύξηση της πυκνότητας αποθήκευσης πληροφορίας και συνεπώς μείωσαν το κόστος αποθήκευσης δραματικά. Ωστόσο, η διάρκεια ζωής των NV μνημών δεν παρέμεινε ανεπηρέαστη. Διαφορετικές παρεμβολές και πηγές θορύβου σε συνδυασμό με την επίδραση της γήρανσης έχουν ένα μεγάλο αντίκτυπο στην αξιοπιστία και την αντοχή αυτών των τεχνολογιών μνήμης, και ως εκ τούτου, των συστημάτων αποθήκευσης στα οποία χρησιμοποιούνται (SSDs, PCRAMs). Πολλές μέθοδοι και τεχνικές, όπως η μέθοδος wear-leveling, εξειδικευμένοι κώδικες ανίχνευσης και διόρθωσης λαθών (ECC) και τεχνικές pre-coding έχουν χρησιμοποιηθεί για να αντισταθμίσουν αυτές τις επιπτώσεις, ενώ άλλες, πιο περίπλοκες μεν, αλλά και πιο αποτελεσματικές, όπως η δυναμική προσαρμογή των κατωφλίων ανάγνωσης, βρίσκονται σε πειραματικό στάδιο.
Η ανάπτυξη αυτών των τεχνικών βασίζεται στον πειραματικό χαρακτηρισμό των NV μνημών, τόσο σε επίπεδο κελιού όσο και σε επίπεδο ολοκληρωμένου κυκλώματος. Ο χαρακτηρισμός αυτός σχετίζεται με την μέτρηση του λόγου του αριθμού των bit σφαλμάτων προς τον αριθμό των συνολικών bits (BER) και το χρόνο απόκρισης (ανάγνωσης και εγγραφής) καθ' όλη τη διάρκεια ζωής της μνήμης, για διάφορες μορφές δεδομένων και σενάρια χρονισμών. Η διαδικασία αυτή, μέχρι τώρα, γίνεται με τη χρήση της πραγματικής NV μνήμης, συνήθως με ολοκληρωμένα κυκλώματα που βρίσκονται στο στάδιο της προ-παραγωγής, ενώ πιο ενδελεχής έλεγχος γίνεται στο τελικό στάδιο της παραγωγής. Αυτή η προσέγγιση έχει δύο σημαντικά μειονεκτήματα. Από τη μία πλευρά, είναι μια πολύ χρονοβόρα διαδικασία, δεδομένου ότι η γήρανση μίας NVM μπορεί να απαιτεί ένα μεγάλο αριθμό από program / erase (P/E) κύκλους που πρέπει να εκτελεστούν για κάθε πείραμα. Ο αριθμός αυτός κυμαίνεται από κάποιες δεκάδες χιλιάδες (NAND Flash) έως και κάποια εκατομμύρια κύκλους (PCM). Από την άλλη πλευρά, τα χαρακτηριστικά γήρανσης μίας NVM είναι αναλόγως εξαρτώμενα από τον αριθμό των Ρ/Ε κύκλων που εκτελούνται, καθιστώντας έτσι αδύνατη την διεξαγωγή διαφορετικών ή διαδοχικών πειραμάτων στην ίδια κατάσταση γήρανσης της μνήμης.
Σε αυτή την εργασία παρουσιάζουμε ένα μοντέλο που αντιπροσωπεύει με ακρίβεια τη διαδικασία γήρανσης NV μνημών, αντιμετωπίζοντας τες ως ένα χρονικά μεταβαλλόμενο κανάλι επικοινωνίας βασισμένο σε ένα μη συμμετρικό n-PAM μοντέλο. Με βάση τη μοντελοποίηση των χαρακτηριστικών γήρανσης, υλοποιούμε ένα σύστημα εξομοίωσης σε πραγματικό χρόνο και με μεγάλη ακρίβεια της συμπεριφοράς NV-μνημών, κάτω από ορισμένες από το χρήστη συνθήκες γήρανσης, σε τεχνολογία FPGA. Η πλατφόρμα που παρουσιάζεται στην παρούσα εργασία βασίζεται σε μια αναπροσαρμόσιμη αρχιτεκτονική υλικού και λογισμικού που επιτρέπει την ακριβή εξομοίωση των νέων και αναδυόμενων τεχνολογιών και μοντέλων των NVMs. Η πλατφόρμα που αναπτύχθηκε μπορεί να αποτελέσει ένα πολύτιμο εργαλείο για την ανάπτυξη και αξιολόγηση αλγορίθμων και τεχνικών κωδικοποίησης. / Over the last few years, non-volatile memory (NVM) has shown great potential for replacing volatile memory, such as DRAM, in caching applications, and magnetic HDDs in storage applications. NAND Flash-based solid state drives (SSDs) have already emerged as a low-cost, high-performance and reliable storage medium for both commercial and enterprise storage systems. Additionally, the properties of phase-change materials and the recent scaling of Phase-Change Memory (PCM) have made it a perfect candidate for developing phase-change random access memories (PCRAMs).
The rapid scaling of NVMs, with process nodes below 19nm, and the use of multi-level cell (MLC) technologies have increased their storage density and reduced the storage cost per bit. However, their lifetime has not remained unaffected. Various interference and noise sources, together with aging effects, now have a great impact on the reliability and endurance of these memory technologies and hence on the storage systems in which they are used (SSDs, PCRAMs). Numerous techniques, such as wear-leveling, specialized error-correcting codes (ECC) and precoding, have been employed to compensate for these effects, while others that are more complex but also more effective, such as dynamic adaptation of read reference thresholds, are still at an experimental stage.
The development of these techniques is based on experimental characterization of NVM cells and chips. Characterization involves measuring the bit error ratio (BER) and the response time (read and write time) over the whole lifetime of a device, for various data patterns and timing scenarios. This process is performed on real NVM chips, usually engineering or pre-production parts, while more thorough testing at the system level is performed when production parts are available. This approach has two major drawbacks. On the one hand, it is very time-consuming, since aging an NVM may require a large number of program/erase (P/E) cycles for each experiment, ranging from tens of thousands (NAND Flash) to millions (PCM). On the other hand, the aging characteristics of an NVM depend directly on the number of P/E cycles performed, which makes it impossible to conduct different or successive experiments at the same aging state of a memory chip.
In this work, we present a model that accurately represents the aging process of an NVM cell by treating it as a time-variant communications channel, based on an asymmetric n-PAM model. We present the architecture of a flexible FPGA-based platform designed for accurate emulation of NVM technologies, focusing mainly on MLC NAND Flash. Accuracy is measured against experimentally determined bit error probabilities for various aging conditions (i.e., the number of P/E cycles applied to a NAND Flash chip), usually for random data patterns.
The hardware platform presented in this work is based on a reconfigurable hardware-software architecture, which enables the accurate emulation of new and emerging models and technologies of NVMs. The developed platform can be a valuable tool for the evaluation of memory-related algorithms, signal processing and coding techniques.
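The channel view of an aging memory cell described in this abstract can be illustrated with a small software sketch. It is not the thesis's model: the four level positions, the per-level noise widths, the linear widening with P/E cycles and the fixed read thresholds are all hypothetical values chosen only to show how an asymmetric 4-PAM channel whose noise depends on wear yields a rising symbol error rate.

```python
import random

def simulate_cell_errors(pe_cycles, n_cells=20000, seed=0):
    """Estimate the symbol error rate of a toy 4-level (MLC-like) cell.

    All numeric parameters below are illustrative assumptions,
    not measured NAND Flash or PCM characteristics.
    """
    rng = random.Random(seed)
    levels = [0.0, 1.0, 2.0, 3.0]           # nominal stored levels (hypothetical)
    base_sigma = [0.20, 0.10, 0.10, 0.12]   # asymmetric: each level has its own spread
    growth = 1.0 + pe_cycles / 10000.0      # assumed linear widening with wear
    thresholds = [0.5, 1.5, 2.5]            # fixed read reference thresholds

    errors = 0
    for _ in range(n_cells):
        symbol = rng.randrange(4)
        # "Transmit" the symbol through the wear-dependent noisy channel.
        voltage = rng.gauss(levels[symbol], base_sigma[symbol] * growth)
        # Read back by counting how many thresholds the voltage exceeds.
        read = sum(voltage > t for t in thresholds)
        errors += (read != symbol)
    return errors / n_cells

if __name__ == "__main__":
    for cycles in (0, 10000, 30000):
        print(cycles, simulate_cell_errors(cycles))
```

Because the wear state is just an input parameter, the same aging condition can be re-run as often as needed, which is the property the abstract notes is impossible to obtain with a physical chip.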
|
562 |
Σχεδίαση αποκωδικοποιητή VLSI για κώδικες LDPC
Τσατσαράγκος, Ιωάννης 12 April 2010
Η διόρθωση λαθών με κώδικες LDPC είναι μεγάλου ενδιαφέροντος σε σημαντικές νέες τηλεπικοινωνιακές εφαρμογές, όπως δορυφορικό Digital Video Broadcast (DVB) DVB-S2, IEEE 802.3an (10GBASE-T) και IEEE 802.16 (WiMAX).
Οι κώδικες LDPC ανήκουν στην κατηγορία των γραμμικών μπλοκ κωδικών. Πρόκειται για κώδικες ελέγχου και διόρθωσης σφαλμάτων μετάδοσης, με κυριότερο χαρακτηριστικό τους τον χαμηλής πυκνότητας πίνακα ελέγχου ισοτιμίας (Low Density Parity Check), από τον οποίο και πήραν το όνομά τους. Η αποκωδικοποίηση γίνεται μέσω μιας επαναληπτικής διαδικασίας ανταλλαγής πληροφορίας μεταξύ δύο τύπων επεξεργαστικών μονάδων.
Η υλοποίηση σε υλικό των LDPC αποκωδικοποιητών αποτελεί ένα ραγδαία εξελισσόμενο πεδίο για τη σύγχρονη επιστημονική έρευνα. Σκοπός της παρούσας διπλωματικής εργασίας υπήρξε ο σχεδιασμός, η υλοποίηση και η βελτιστοποίηση αρχιτεκτονικών αποκωδικοποιητών VLSI για κώδικες LDPC.
Έχουν αναπτυχθεί διάφοροι αλγόριθμοι αποκωδικοποίησης, οι οποίοι είναι επαναληπτικοί. Μελετήθηκαν αρχιτεκτονικές βασισμένες σε δύο αλγόριθμους, τον log Sum-Product και τον Min-Sum. Ο πρώτος είναι θεωρητικά βέλτιστος, αλλά ο Min-Sum είναι αρκετά απλούστερος και έχει μεγαλύτερο πρακτικό ενδιαφέρον στα πλαίσια μιας ρεαλιστικής εφαρμογής. Συγκεκριμένα, αναπτύχθηκαν δύο αλγόριθμοι αποκωδικοποίησης, οι οποίοι χρησιμοποιούν ως δομικά στοιχεία, τους δύο προαναφερθέντες αλγορίθμους και τη φιλοσοφία του layered decoding.
Η μελέτη μας επικεντρώθηκε σε κώδικες, η δομή των πινάκων ελέγχου ισοτιμίας των οποίων, προσφέρεται για υλοποίηση. Για αυτό το λόγο, χρησιμοποιήσαμε κώδικες του προτύπου WiMax 802.16e.
Η συνεισφορά της παρούσας εργασίας έγκειται στο σχεδιασμό και την υλοποίηση αποδοτικών αρχιτεκτονικών σε επίπεδο επιφάνειας και ταχύτητας αποκωδικοποίησης (Mbps), καθώς και η διερεύνηση του σχετικού σχεδιαστικού χώρου, χρησιμοποιώντας ως σχεδιαστικές παραμέτρους, τον αλγόριθμο αποκωδικοποίησης, τη χρονοδρομολόγηση των πράξεων, το βαθμό παραλληλίας της αρχιτεκτονικής, το βάθος του pipelining και την αριθμητική αναπαράσταση των δεδομένων.
Επιπλέον, είναι σημαντικό να αναφέρουμε πως, στα πλαίσια της σχεδίασης του LDPC αποκωδικοποιητή και με τη βοήθεια του εργαλείου Matlab, αναπτύχθηκαν παραμετρικά scripts για την παραγωγή του VHDL κώδικα. Οι δύο βασικές παράμετροι που χρησιμοποιήθηκαν ήταν το πλήθος των επεξεργαστικών μονάδων και το μήκος λέξης των δεδομένων. Τα scripts αυτά αποτέλεσαν ένα πολύ χρήσιμο εργαλείο κατά τη διαδικασία ανάπτυξης και βελτιστοποίησης της αρχιτεκτονικής, δίνοντας μας τη δυνατότητα να παράγουμε με αυτοματοποιημένο και γρήγορο τρόπο τον VHDL κώδικα, για τις επιμέρους μονάδες του αποκωδικοποιητή.
Η υλοποίηση ενός μοντέλου αποκωδικοποιητή σε υλικό, μας δίνει τη δυνατότητα να διεξάγουμε ταχύτατες εξομοιώσεις, σε σχέση με αντίστοιχες υλοποιήσεις σε λογισμικό (π.χ. σε Matlab περιβάλλον). Διαθέτουμε, έτσι, ένα ισχυρό εργαλείο για τη μελέτη της επίδοσης διαφόρων ρεαλιστικών υλοποιήσεων αποκωδικοποιητών.
Κατά τη διάρκεια της υλοποίησης, αξιοποιήθηκε αναπτυξιακό σύστημα βασισμένο σε virtex-4 fpga. / LDPC (low-density parity-check) codes are widely applied for error correction in highly efficient modern digital communication systems, such as satellite Digital Video Broadcast (DVB-S2), IEEE 802.3an (10GBASE-T) and IEEE 802.16 (WiMAX).
LDPC codes are linear block codes, characterized by a sparse parity-check matrix. They are error detection and correction codes. The most typical decoding procedure is the message passing algorithm that implements the iterative exchange of node-generated messages between two types of processing units, called check and variable nodes.
Hardware implementation of an LDPC decoder is a fast growing field for contemporary scientific research. This work presents the results of the design, implementation and optimization of a VLSI decoder for LDPC codes.
Several iterative decoding algorithms have been developed. In this work we present architectures based on the log Sum-Product (Log-SP) and Min-Sum algorithms. Log-SP is theoretically optimal; however, Min-Sum is substantially simpler and reduces the hardware complexity. Two alternative decoding algorithms have been developed that use these two algorithms for the check-node LLR update and the philosophy of layered decoding for the exchange of messages.
Our study focused on WiMax 802.16e LDPC codes, whose form, based on permuted identity matrices, is suitable for hardware realization.
The contribution of this work lies in the design and implementation of architectures that are efficient in area and decoding throughput, as well as a detailed investigation of the design space, using the decoding algorithm, message exchange scheduling, pipelining and quantization schemes as design parameters.
Furthermore, parametric Matlab scripts were developed for easy, automated production of structural VHDL code. The two key parameters are the number of processing units and the data word length.
A hardware realization of an LDPC decoder gives us a simulation tool that is much faster than corresponding software implementations (for example, a Matlab implementation).
During implementation, a development board based on a Virtex-4 FPGA was used.
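The difference between the two check-node update rules named in this abstract can be sketched in software as follows. This is a generic textbook formulation in floating point, not the thesis's VLSI architecture; the function names and the clipping constant are assumptions made for the illustration.

```python
import math

def check_node_min_sum(llrs):
    """Min-Sum update: sign product and minimum magnitude of the other inputs."""
    out = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        negative = sum(x < 0 for x in others) % 2
        sign = -1.0 if negative else 1.0
        out.append(sign * min(abs(x) for x in others))
    return out

def check_node_sum_product(llrs):
    """Log-domain Sum-Product update using the tanh rule."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, x in enumerate(llrs):
            if j != i:
                prod *= math.tanh(x / 2.0)
        # Clip to keep atanh finite for very reliable inputs (assumed constant).
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
        out.append(2.0 * math.atanh(prod))
    return out

if __name__ == "__main__":
    incoming = [2.0, -1.0, 3.0]
    print(check_node_min_sum(incoming))
    print(check_node_sum_product(incoming))
```

Min-Sum needs only comparisons and sign logic, which is why it maps to less hardware. Its outputs have the same signs as the Sum-Product outputs but magnitudes that are never smaller, so it tends to overestimate reliability.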
|
563 |
Post-Routing Analytical Models for Homogeneous FPGA Architectures
Leow, Yoon Kah January 2013
The rapid growth in the Field Programmable Gate Array (FPGA) architecture design space has led to an explosion in architectural choices, which now number well over 1,000,000 configurations. This makes searching for Pareto-optimal solutions using a CAD-based incremental design process nearly impossible for hardware architects and application engineers. Designers need fast and accurate analytical models in order to evaluate the impact of their design choices on performance. Despite the proliferation of FPGA models, today's state-of-the-art modeling tools suffer from two drawbacks. First, they rely on circuit characteristics extracted from various stages of the FPGA CAD flow, making them CAD-dependent. Second, they lack the ability to take routing architecture parameters into account. These two factors are a barrier to converging rapidly on the desired implementation. In this research, we address these two challenges and propose the first static power and post-routing wirelength models in academia. Our models are unique in that they are CAD-independent and take both logic and routing architecture parameters into account. Using the static power model, we are able to estimate the active and idle leakage power dissipation in homogeneous FPGAs with an average correlation factor of 95% and a mean percentage error of 17% over experimental results based on MCNC benchmarks. Using our wirelength model, we obtain a low mean percentage error of 4.2% and an average correlation factor of 84% using MCNC and VTR benchmarks. We also show that using the wirelength model in the architecture optimization process reduces the design space exploration time by 53% compared to the CAD-based process. Finally, we propose an algorithmic approach to estimate the logic density (i.e., number of LUTs) of multiplexer-based circuits, and address the problem of discrete effects in FPGA analytical models. We show that a model that generates the logic density of a fundamental circuit element, such as a multiplexer, can be used to estimate performance metrics, such as critical path delay and power.
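The two accuracy figures quoted in this abstract can be computed along the following lines. The abstract does not give the exact definitions, so this sketch assumes the mean absolute percentage error and the Pearson correlation coefficient between model predictions and CAD-measured values; the sample numbers are invented for illustration.

```python
import math

def mean_percentage_error(predicted, measured):
    """Mean absolute percentage error of predictions, in percent (assumed definition)."""
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

if __name__ == "__main__":
    model = [110.0, 180.0, 400.0]     # hypothetical model estimates
    measured = [100.0, 200.0, 400.0]  # hypothetical post-routing measurements
    print(mean_percentage_error(model, measured))
    print(pearson_correlation(model, measured))
```

A model can track the trend across benchmarks well (high correlation) while still being offset in absolute terms (higher percentage error), which is why both figures are reported.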
|
565 |
Design of Low-Floor Quasi-Cyclic IRA Codes and Their FPGA Decoders
Zhang, Yifei January 2007
Low-density parity-check (LDPC) codes have been intensively studied in the past decade for their capacity-approaching performance. LDPC code implementation complexity and the error-rate floor are still two significant unsolved issues which prevent their application in some important communication systems. In this dissertation, we make efforts toward solving these two problems by introducing the design of a class of LDPC codes called structured irregular repeat-accumulate (S-IRA) codes. These S-IRA codes combine several advantages of other types of LDPC codes, including low encoder and decoder complexities, flexibility in design, and good performance on different channels. It is also demonstrated in this dissertation that the S-IRA codes are suitable for rate-compatible code family design, and a multi-rate code family has been designed which may be implemented with a single encoder/decoder. The study of the error floor problem of LDPC codes is very difficult because simulating LDPC codes on a computer at very low error rates takes an unacceptably long time. To circumvent this difficulty, we implemented a universal quasi-cyclic LDPC decoder on a field programmable gate array (FPGA) platform. This hardware platform accelerates the simulations by more than 100 times compared to software simulations. We implemented two types of decoders with partially parallel architectures on the FPGA: a circulant-based decoder and a protograph-based decoder. Focusing on the protograph-based decoder, we implemented different soft iterative decoding algorithms. This provides us with a platform for quickly evaluating and analyzing different quasi-cyclic LDPC codes, including the S-IRA codes. A universal decoder architecture is also proposed which is capable of decoding an arbitrary LDPC code, quasi-cyclic or not. Finally, we studied the low-floor problem by focusing on one example S-IRA code. We identified the weaknesses of the code and proposed several techniques to lower the error floor. We successfully demonstrated in hardware that it is possible to lower the floor substantially by encoder and decoder modifications, but the best solution appeared to be an outer BCH code.
|
566 |
Debug Interface for 56000 DSP
Nilsson, Andreas January 2007
The scope of this thesis was to design a debug interface for a DSP (digital signal processor). The DSP is a research version of a Motorola 56000 that is designed for a project on asynchronous processors and for use in education. The DSP and debug interface are controlled from a standard PC with an RS232 interface running the Linux operating system. Four blocks have been designed in the project. The first block can set the DSP core in debug mode or run mode. The second block sends a debug instruction to the DSP core; these debug instructions were a prerequisite for the project. The third block enables read and write connections to the memory buses between the DSP core and the three memory blocks. The fourth block can override the control signals from the DSP core to the memories. The project also uses a UART for interpreting and sending control signals and data between the different blocks and the computer. A text terminal program for Linux has also been written to handle the PC side of the communication. The hardware has been constructed and tested together with a dummy DSP core and dummy memories, but it has not been tested together with the live DSP core. The Linux program has been tested the same way and appears to work as intended, though considerable work remains to make it easy to use.
|
567 |
Decoding Ogg Vorbis Audio with The C6416 DSP, using a custom made MDCT core on FPGA
Kärnhall, Henric January 2007
Ogg Vorbis is a fairly new and growing audio format, often used for online distribution of music and by internet radio stations for streaming audio. It is considered to be better than MP3 in both quality and compression and in the same league as, for example, AAC. In contrast to many other formats, such as MP3 and AAC, Ogg Vorbis is patent- and royalty-free. The purpose of this thesis project was to investigate how the C6416 DSP processor and a Stratix II FPGA could be connected to each other and work together as co-processors, using an Ogg Vorbis decoder as an implementation example. A fixed-point decoder called Tremor (developed by Xiph.Org, the creator of the Vorbis I specification) has been ported to the DSP processor and an Ogg Vorbis player has been developed. Tremor was profiled before the software/hardware partitioning to decide which parts of its source code should be implemented in the FPGA to off-load and accelerate the DSP.
|
568 |
Sammanvägning av diversitetssignaler med FPGA
Martinsson, Mike January 2007
Conversations with radio amateurs revealed an interest in using space diversity on their receivers, since they experienced fading (when driving a car) as a problem for audibility. In a system where the receiver is stationary and the transmitter is mobile, the received signal will sometimes fade, because the radio waves take different paths to the receiver and sometimes reinforce and sometimes interfere with each other. The idea behind this thesis was to receive two band-limited audio signals from two receivers, each with its own antenna, that pick up the same signal (space diversity), and to weight them together using a suitable method to obtain a better signal. Implementing a diversity system with VHDL in an FPGA would yield a system that is both cheap and flexible. In this thesis I have tried to construct such a system.
|
569 |
Pedestrian Detection on FPGA
Qureshi, Kamran January 2014
Image processing emerges from curiosity about human vision. Translating what we see in everyday life, and how we differentiate between objects, into robotic vision is a challenging and modern research topic. This thesis focuses on detecting a pedestrian within an image of standard format. The efficiency of the algorithm is observed after its implementation in an FPGA. The algorithm for pedestrian detection was developed using MATLAB as a base. To detect a pedestrian, a histogram of oriented gradients (HOG) of an image was computed. The study indicates that the HOG is distinctive for different objects within an image. The HOG of a series of images was computed to train a binary classifier. A new image was then fed to the classifier in order to test its efficiency. Within the time frame of the thesis, the algorithm was partially translated to a hardware description using VHDL. The performance of the hardware implementation was noted and the result exported to MATLAB for further processing. A hybrid model was created, in which the pre-processing steps were computed in the FPGA and the classification was performed in MATLAB. The outcome of the thesis shows that HOG is a very efficient and effective way to classify and differentiate objects within an image. Given its efficiency, this algorithm may even be extended to video.
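The HOG feature mentioned in this abstract is built from per-cell orientation histograms. The sketch below shows that core step only, under common textbook assumptions (central-difference gradients, nine unsigned orientation bins over 0 to 180 degrees, magnitude-weighted votes). Block normalization and the classifier are omitted, and none of these choices is confirmed as the thesis's exact configuration.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram of one image cell given as a list of pixel rows.

    Border pixels are skipped so that central differences stay in bounds.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

if __name__ == "__main__":
    # A 4x4 cell containing a vertical edge: dark left half, bright right half.
    edge = [[0, 0, 1, 1] for _ in range(4)]
    print(hog_cell_histogram(edge))
```

For the vertical edge in the usage example, all gradient energy falls into the first bin (horizontal gradient direction), which illustrates how the histogram encodes local edge orientation.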
|
570 |
An Energy Efficient FPGA Hardware Architecture for the Acceleration of OpenCV Object Detection
Brousseau, Braiden 21 November 2012
The use of Computer Vision in programmable mobile devices could lead to novel and creative applications. However, the computational demands of Computer Vision are ill-suited to low performance mobile processors. Also the evolving algorithms, due to active research in this field, are ill-suited to dedicated digital circuits. This thesis proposes the inclusion of an FPGA co-processor in smartphones as a means of efficiently computing tasks such as Computer Vision. An open source object detection algorithm is run on a mobile device and implemented on an FPGA to motivate this proposal. Our hardware implementation presents a novel memory architecture and a SIMD processing style that achieves both high performance and energy efficiency. The FPGA implementation outperforms a mobile device by 59 times while being 13.5 times more energy efficient.
|