501 |
Algorithms and architectures for decimal transcendental function computation
Chen, Dongdong 27 January 2011
Decimal floating-point (DFP) arithmetic is in growing commercial demand for applications such as financial analysis, tax calculation, currency conversion, Internet-based applications, and e-commerce. This trend drives further development of DFP arithmetic units that can perform accurate computations on exact decimal operands. Because of the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic units (decimal adder, subtracter, multiplier, divider, and square-root unit), which form a main part of a decimal microprocessor, are attracting increasing attention from researchers. Recently, decimal-encoded formats and DFP arithmetic units have been implemented in IBM's System z900, POWER6, and z10 microprocessors.
Increasing chip densities and transistor counts give designers room to add more application-specific functions to upcoming microprocessors. Decimal transcendental functions, such as the DFP logarithm, antilogarithm, exponential, reciprocal, and trigonometric functions, are useful arithmetic operations in many areas of science and engineering and have been specified as recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all computing systems that comply with IEEE 754-2008 could include a DFP mathematical library providing transcendental function computation. Building on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building block in microprocessors.
In this dissertation, we research and develop several new decimal algorithms and architectures for DFP transcendental function computation. The designs use several different methods: 1) decimal transcendental function computation based on table-based first-order polynomial approximation; 2) DFP logarithmic and antilogarithmic converters based on a decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using efficient table look-up followed by Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling. Most of the algorithms and architectures developed in this dissertation are first attempts to analyze and implement DFP transcendental arithmetic that achieves faithful results for DFP operands, as specified in IEEE 754-2008.
To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures are modeled, verified, and synthesized using FPGAs or CMOS standard-cell libraries for ASICs. Some of the implementation results are compared with those of binary radix-16 logarithmic and exponential converters, a recently developed high-performance decimal CORDIC-based architecture, and Intel's DFP transcendental function software library. The comparisons show that the proposed architectures achieve a significant speed-up in latency over these designs. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented research on DFP transcendental function computation.
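The reciprocal method listed as item 3 can be illustrated in software. The following is a minimal sketch, assuming a seed table indexed by the leading decimal digit and the standard Newton-Raphson recurrence x ← x(2 − d·x); the table size, iteration count, working precision, and the names `SEED_TABLE` and `reciprocal` are illustrative assumptions, not the dissertation's hardware design.

```python
from decimal import Decimal, localcontext

# Hypothetical seed table: one entry per leading digit k of the
# normalized operand m in [1, 10), approximating 1 / (k + 0.5).
SEED_TABLE = {k: Decimal(1) / (Decimal(k) + Decimal("0.5")) for k in range(1, 10)}


def reciprocal(d: Decimal, iterations: int = 6, precision: int = 34) -> Decimal:
    """Approximate 1/d for d > 0 by table look-up plus Newton-Raphson steps."""
    if d <= 0:
        raise ValueError("this sketch handles positive operands only")
    with localcontext() as ctx:
        ctx.prec = precision
        shift = d.adjusted()          # exponent of the most significant digit
        m = d.scaleb(-shift)          # normalized operand, 1 <= m < 10
        x = SEED_TABLE[int(m)]        # table look-up on the leading digit
        for _ in range(iterations):
            x = x * (2 - m * x)       # each step roughly doubles the correct digits
        return x.scaleb(-shift)       # undo the normalization
```

With a seed whose relative error is at most about one third, the error squares on every step, so six iterations are more than enough for a 34-digit working precision in this sketch.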
|
502 |
Coherent Shared Memories for FPGAs
Woods, David 17 February 2010
To build a shared-memory programming model for FPGAs, a fast and highly parallel method of accessing the shared memory is required. This thesis presents a first look at how to implement a coherent caching system in an FPGA. The coherent caching system consists of multiple distributed caches that implement the write-once coherence protocol, allowing efficient access to system memory while simplifying the user programming model. Several test applications are used to verify functionality and assess the performance of the current system. Results show that, with a processor-based system, some applications could benefit from improvements to the coherence system, but for many applications the current system is sufficient. However, the current coherent caching system is not sufficient for most systems based on hardware cores, because their faster memory accesses quickly saturate shared system resources. In addition, the performance of distributed-memory systems currently surpasses that of the coherent caching system. Performance results are promising, and given the potential for improvements, future work on this system is warranted.
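For readers unfamiliar with the write-once protocol named above, its per-line state machine can be sketched as follows. This follows the usual textbook description (states Invalid, Valid, Reserved, Dirty); the bus action names and the function names are assumptions made for illustration, and the thesis's FPGA implementation may differ in detail.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    VALID = "V"      # clean, possibly shared
    RESERVED = "R"   # written exactly once, memory is up to date
    DIRTY = "D"      # written more than once, memory is stale


def on_processor_access(state, is_write):
    """Return (next_state, bus_action) for a local read or write."""
    if not is_write:
        if state is State.INVALID:
            return State.VALID, "bus_read"
        return state, None                      # read hit: no bus traffic
    if state is State.INVALID:
        return State.DIRTY, "bus_read_invalidate"
    if state is State.VALID:
        return State.RESERVED, "bus_write_through"  # the "write once"
    return State.DIRTY, None                    # later writes stay local


def on_snoop(state, bus_action):
    """Return (next_state, must_supply_data) for a bus action seen from another cache."""
    if state is State.INVALID:
        return State.INVALID, False
    supplies = state is State.DIRTY             # only a dirty line holds newer data
    if bus_action == "bus_read":
        return State.VALID, supplies
    return State.INVALID, supplies and bus_action == "bus_read_invalidate"
```

Only the first write to a clean line goes to the bus, which is what keeps repeated writes from a single cache cheap.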
|
503 |
Analysis and Design of Clock-glitch Fault Injection within an FPGA
Dadjou, Masoumeh January 2013
In modern cryptanalysis, an active attacker may induce errors during the computation of a cryptographic algorithm and exploit the faulty results to extract information about the secret key in embedded systems. This kind of attack is called a fault attack. Various attack mechanisms with different fault models have been proposed in the literature. Among them, clock glitch faults enable practically dangerous fault attacks on cryptosystems. This thesis presents a practical FPGA-based testbed for characterizing exploitable clock glitch faults and uniformly evaluating cryptographic systems against them. Concentrating on the Advanced Encryption Standard (AES), simulation and experimental results illustrate the suitable properties of the clock glitches generated by the implemented on-chip glitch generator. These glitches can be injected reliably with acceptably accurate timing. The produced faults are random, but their effect domain is finely controllable by the attacker. These features make clock glitch faults practically suitable for possible future complete fault attacks on AES. This research is important for investigating the viability of fault injection on various cryptographic functions in future embedded systems and for analyzing it.
|
504 |
Handheld Navigation System Implementation on FPGA Board
Salman Ali, Thamer January 2011
Navigation devices are rapidly becoming more widespread, mainly owing to advances in hardware (for instance more capable microcontrollers, GPS receivers, and compasses) and in software. Handheld navigation is one such navigation technique, used in many fields. Like any other means of navigation, a Handheld Navigation System (HNS) is used to determine the user's position and direction accurately and to find the shortest track with precision. The Global Positioning System (GPS) is a technology that can be used to determine position coordinates, time, speed, and course over ground. The electronic compass is a traditional device used to determine the user's current directional angle. The goal of the thesis is to compare the direction angle and distance obtained from two designs: one that calculates them from GPS receiver information alone, and one that calculates them from both the GPS receiver and a compass. The first design uses the GPS receiver coordinates to calculate the direction angle and distance; the second integrates GPS positioning with a digital compass to calculate the direction and distance of the handheld navigation user. Each device communicates with the microcontroller through its interface. Directional results are obtained from each design and compared with each other, and the more accurate result is chosen for the user. A PCB for the handheld navigation system has been designed; an SD card and an LCD display are also used. Both designs have been implemented on Altera Cyclone II FPGAs.
The prototyping results show that the best design for the Handheld Navigation System is the one combining GPS and compass, because the compass reading is stable, being referenced to magnetic north, whereas the GPS-only design derives direction from movement and therefore also depends on the speed of movement. / Handheld navigation systems for satellite navigation (GPS) have become increasingly common. When navigating, one must know the direction to the destination but also the direction in which the navigation device is pointing, since this is the reference for computing corrections. If the device moves at a certain speed, the direction of movement can be calculated from a number of successive position coordinates. This method works well in, for example, a vehicle moving at a reasonable speed. If the system is used by a person who is walking, problems arise. The person may stop and turn in different directions. There are then no good earlier coordinates from which to calculate the direction of movement, that is, how the navigation system is pointing. When the person then moves in a certain direction, the system must travel a certain distance before the direction can be calculated. The length of the distance required also depends on the accuracy of the coordinate determination. GPS has a non-negligible uncertainty of a number of meters. If an electronic compass is used to determine how the navigation system is pointing, the requirement that the system must move in order to determine its direction disappears. In this thesis project, a GPS-based navigation system has been developed in order to compare systems based on GPS alone with those that also have an electronic compass. A development board for programmable logic has been used as the platform. The board's FPGA contains both a processor, a Nios II soft core, and interfaces to sensors and memories. The test results show, not entirely unexpectedly, that a system with a compass gives more reliable navigation and a shorter path between start and destination. This applies mainly when there are obstacles in the way.
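The direction angle and distance that both designs compute from GPS coordinates can be sketched with the standard great-circle formulas. This is only an illustration: the abstract does not say which formulas the thesis uses, and the haversine distance, the mean Earth radius constant, and the function names below are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Return (distance in meters, initial bearing in degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula for the great-circle distance.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    # Initial bearing toward the destination, normalized to [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def heading_difference(heading_a, heading_b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (heading_a - heading_b + 180.0) % 360.0 - 180.0
```

A helper such as `heading_difference` is one way to compare a GPS-derived course with a compass heading without being misled by the wrap-around at 0/360 degrees.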
|
505 |
Cache Coherency for Symmetric Multiprocessor Systems on Programmable Chips
Hung, Austin January 2004
Rapid progress in the area of Field-Programmable Gate Arrays (FPGAs) has led to the availability of softcore processors that are simple to use and can enable the development of a fully working system in minutes. This has led to the enormous popularity of System-On-Programmable-Chip (SOPC) computing platforms. These softcore processors, while relatively simple compared to their leading-edge hardcore counterparts, are often designed with a number of advanced performance-enhancing features, such as instruction and data caches. Moreover, they are designed to be used in a uniprocessor or uncoupled multiprocessor architecture, and not in a tightly-coupled multiprocessing architecture. As a result, traditional cache-coherency protocols are not suitable for use with such systems. This thesis describes a system for enforcing cache coherency on symmetric multiprocessing (SMP) systems using softcore processors. A hybrid protocol that incorporates hardware and software to enforce cache coherency is presented.
|
506 |
Implementing Real-Time Video Deblocking in FPGA Hardware
Hansen, Martin January 2007
Video compression techniques are commonly used to meet the increasing demands for the storage and transmission of digital video content. Popular video compression techniques such as MPEG video encoding make use of block-transform coding algorithms which are susceptible to blocking artifacts. These artifacts can be reduced using a deblocking process, of which there are many. However, those deblocking algorithms which provide noticeable improvements in visual quality also tend to be computationally expensive and unsuitable for real-time video use.
This dissertation selects and examines an appropriate algorithm for real-time video deblocking applications, and describes its hardware implementation on an Altera Cyclone II FPGA. The chosen algorithm is based on the concept of shifted thresholding; it reduces computational complexity by several means, such as using only integer arithmetic and replacing division operations with bit shifts. The implementation leverages the reduced hardware complexity of the chosen algorithm to implement real-time video deblocking cost-effectively.
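The idea of replacing division with bit shifting can be shown with a small integer-only example. This is a generic sketch, not the shifted-thresholding algorithm itself: the three-tap smoothing weights and the function names are assumptions chosen so that the divisor is a power of two.

```python
def div_pow2_round(x: int, k: int) -> int:
    """Divide a non-negative integer by 2**k with round-half-up, using only add and shift."""
    return (x + (1 << (k - 1))) >> k


def smooth_pixel(left: int, center: int, right: int) -> int:
    """Hypothetical 1-2-1 smoothing of a pixel: (left + 2*center + right) / 4 without a divider."""
    return div_pow2_round(left + (center << 1) + right, 2)
```

In hardware, a right shift by a constant is just a rewiring of bits, so a filter whose weights sum to a power of two needs adders but no divider.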
|
507 |
An Architecture for the AES-GCM Security Standard
Wang, Sheng January 2006
The fourth recommendation on symmetric block cipher modes of operation, SP800-38D, specifies the Galois/Counter Mode of Operation (GCM), which was developed by David A. McGrew and John Viega. GCM uses an approved symmetric-key block cipher with a block size of 128 bits and universal hashing over a binary Galois field to provide confidentiality and authentication. It is built specifically to support very high data rates, as it can take advantage of pipelining and parallel processing techniques.
Before GCM, SP800-38A provided only confidentiality and SP800-38B provided authentication. SP800-38C provided confidentiality using the counter mode together with authentication. However, the authentication technique in SP800-38C was not parallelizable and slowed down the throughput of the cipher. Hence, none of these three recommendations was suitable for high-speed network and computer system applications.
With the advent of GCM, authenticated encryption at data rates of several Gbps is now practical, permitting high-grade encryption and authentication on systems that previously could not be fully protected. However, no results have yet been published on actual FPGA-based architectures for this standard.
This thesis presents a fully pipelined and parallelized hardware architecture for AES-GCM, that is, GCM running with the symmetric block cipher AES, on an FPGA multi-core platform corresponding to the IPsec ESP data flow.
The results from this thesis show that the round transformations for confidentiality and the hash operations for authentication in AES-GCM can cooperate very efficiently within this pipelined architecture. Furthermore, this AES-GCM hardware architecture never unnecessarily stalls data pipelines. This thesis provides, for the first time, a complete FPGA-based high-speed architecture for the AES-GCM standard, suitable for high-speed embedded applications.
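The universal hashing over a binary Galois field mentioned above rests on multiplication in GF(2^128). The following sketch follows the bit-serial multiplication algorithm given in SP 800-38D, with 128-bit blocks held as Python integers whose most significant bit is the leftmost block bit; the function name is an assumption, and this software loop says nothing about how the thesis's pipelined hardware multiplier is organized.

```python
# Reduction constant R = 11100001 followed by 120 zero bits (SP 800-38D).
R = 0xE1 << 120


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in GF(2^128) using GCM's bit ordering."""
    z = 0
    v = y
    for i in range(128):
        if (x >> (127 - i)) & 1:   # i-th bit of x, counted from the left
            z ^= v
        if v & 1:                  # rightmost bit set: shift and reduce
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z
```

Because each GHASH step is one such multiplication and an XOR, it maps naturally onto the pipelined and parallel hardware the abstract describes.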
|
508 |
On the Use of Directed Moves for Placement in VLSI CAD
Vorwerk, Kristofer January 2009
Search-based placement methods have long been used for placing integrated circuits targeting the field programmable gate array (FPGA) and standard cell design styles. Such methods offer the potential for high-quality solutions but often come at the cost of long run-times compared to alternative methods.
This dissertation examines strategies for enhancing local search heuristics, and in particular simulated annealing, through the application of directed moves. These moves help to guide a search-based optimizer by focusing efforts on states which are most likely to yield productive improvement, effectively pruning the size of the search space.
The engineering theory and implementation details of directed moves are discussed in the context of both field programmable gate array and standard cell designs. This work explores the ways in which such moves can be used to improve the quality of FPGA placements, improve the robustness of floorplan repair and legalization methods for mixed-size standard cell designs, and enhance the quality of detailed placement for standard cell circuits. The analysis presented herein confirms the validity and efficacy of directed moves, and supports the use of such heuristics within various optimization frameworks.
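A toy example may help make the notion of a directed move concrete. The sketch below anneals a one-dimensional placement and, with some probability, proposes moving a cell toward the average position of the cells it shares nets with instead of to a random slot. The cost model, move definition, cooling schedule, and all names are illustrative assumptions and not the dissertation's actual moves.

```python
import math
import random


def wirelength(pos, nets):
    """Sum over nets of the span between the leftmost and rightmost cell."""
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)


def anneal(num_cells, nets, p_directed=0.5, steps=2000, t0=5.0, alpha=0.995, seed=0):
    """Return (initial_cost, best_cost) for a 1-D placement with swap moves."""
    rng = random.Random(seed)
    pos = list(range(num_cells))               # pos[cell] = slot
    rng.shuffle(pos)
    slot_to_cell = [0] * num_cells
    for cell, slot in enumerate(pos):
        slot_to_cell[slot] = cell
    neighbors = {c: set() for c in range(num_cells)}
    for net in nets:
        for c in net:
            neighbors[c].update(n for n in net if n != c)
    cost = initial = best = wirelength(pos, nets)
    t = t0
    for _ in range(steps):
        a = rng.randrange(num_cells)
        if neighbors[a] and rng.random() < p_directed:
            # Directed move: aim at the mean slot of the connected cells.
            target = round(sum(pos[n] for n in neighbors[a]) / len(neighbors[a]))
        else:
            target = rng.randrange(num_cells)  # undirected random move
        b = slot_to_cell[target]
        t *= alpha
        if a == b:
            continue
        pos[a], pos[b] = pos[b], pos[a]
        delta = wirelength(pos, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost += delta                      # accept (Metropolis criterion)
            slot_to_cell[pos[a]], slot_to_cell[pos[b]] = a, b
            best = min(best, cost)
        else:
            pos[a], pos[b] = pos[b], pos[a]    # reject: undo the swap
    return initial, best
```

Setting `p_directed` to zero gives plain simulated annealing, which makes it easy to compare the two on the same netlist and seed.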
|
509 |
Post-mapping Topology Rewriting for FPGA Area Minimization
Chen, Lei January 2009
Circuit designers require Computer-Aided Design (CAD) tools when compiling designs into Field Programmable Gate Arrays (FPGAs) in order to achieve high quality results due to the complexity of the compilation tasks involved. Technology mapping is one critical step in the FPGA CAD flow. The final mapping result has a significant impact on the subsequent steps of clustering, placement and routing, for the objectives of delay, area and power dissipation. While depth-optimal FPGA technology mapping can be solved in polynomial time, area minimization has proven to be NP-hard.
Most modern state-of-the-art FPGA technology mappers are structural in nature; they are based on cut enumeration and use various heuristics to yield depth and area minimized solutions. However, the results produced by structural technology mappers rely strongly on the structure of the input netlists.
Hence, it is common to apply additional heuristics after technology mapping to further optimize area and reduce the amount of structural bias while not harming depth.
Recently, SAT-based Boolean matching has been used for post-mapping area minimization. However, SAT-based matching is computationally complex and too time consuming in practice.
This thesis proposes an alternative Boolean matching approach based on NPN equivalence. Using a library of pre-computed topologies, the matching problem reduces to NPN encoding followed by a hash lookup, which is very efficient. In conjunction with Ashenhurst decomposition, the NPN-based Boolean matching can handle Boolean functions of up to 10 inputs.
When applied to a large set of designs, the proposed algorithm yields, on average, more than 3% reduction in circuit area without harming circuit depth. One limitation of the proposed algorithm is that generating the library of topologies a priori can be difficult.
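The "NPN encoding followed by a hash lookup" step can be sketched for very small functions. The code below computes a canonical representative of a truth table by brute force over all input permutations, input negations, and output negation, then uses it as a dictionary key. Exhaustive enumeration is only practical for a handful of inputs, so this is not how a 10-input matcher would work; the library contents and all names are hypothetical.

```python
from itertools import permutations


def npn_canonical(tt: int, n: int) -> int:
    """Smallest truth table reachable from tt (n inputs) under NPN transformations."""
    size = 1 << n
    mask = (1 << size) - 1
    best = None
    for perm in permutations(range(n)):
        for neg in range(size):                 # one negation bit per input
            t = 0
            for m in range(size):
                src = 0
                for i in range(n):
                    bit = ((m >> i) & 1) ^ ((neg >> i) & 1)
                    src |= bit << perm[i]       # negate, then permute inputs
                if (tt >> src) & 1:
                    t |= 1 << m
            for cand in (t, t ^ mask):          # with and without output negation
                if best is None or cand < best:
                    best = cand
    return best


# Hypothetical library: canonical truth table -> name of a pre-computed topology.
LIBRARY = {
    npn_canonical(0b1000, 2): "and2_class_topology",
    npn_canonical(0b0110, 2): "xor2_class_topology",
}


def match(tt: int, n: int):
    """Return the library topology for tt, or None if its NPN class is not stored."""
    return LIBRARY.get(npn_canonical(tt, n))
```

For example, AND, OR, NAND, and NOR all fall into one NPN class by De Morgan's laws, so a single library entry serves all four.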
|
510 |
CMOS bildsensor och Cyclone II Kameramodul till DE2 / Interface for TRDB_DC2 CMOS camera module
Bok, Daniel January 2007
This document describes how the TRDB DC2 camera module from Terasic can be used together with a DE2 development board for Altera FPGAs. Camera images are transferred from the camera module to a VGA display. The VGA image has a resolution of 640 x 480 pixels and a 10-bit color resolution. The system delivers at most 15 frames per second, a limit set by the image sensor itself; among other things, the exposure time can be changed and the image can be frozen if desired. The entire project is written in VHDL, and the work was done in Quartus 6.0 from Altera. The VHDL code is written primarily to be easy to understand and simple to modify; no major effort has been made to minimize hardware or otherwise make the design more efficient.
|