11 |
Enabling Gigabit IP for Embedded Systems / Tsakiris, Nicholas, n.tsakiris@internode.on.net, January 2009
For any practical implementation of chip design, a hardware platform is needed for prototyping and implementing FPGA-based designs, whether they are written in VHDL or Verilog. Communication between the platform and a computer is a useful feature of many hardware solutions, as it allows regular data transmission between the two devices. Furthermore, communicating between the platform and a computer at high speed requires a specially constructed interface, one that the designer can modify as they see fit.
A number of commercial packages provide a hardware platform for this task; however, many of the available options have drawbacks. Some require special hardware to connect to a computer using proprietary connectors or boards, which increases the cost and reduces the flexibility of any solution. Others offer only limited access to the internal structure of the interface, limiting the developer's ability to modify it to suit their needs. There may also be an extra cost for the interface code, separate from the board, which can further tax design budgets.
This dissertation provides a solution in the form of a Gigabit Ethernet connection, facilitated by a custom IP/network layer written in VHDL. With an increasing number of IP-enabled devices available, such as IPTV devices and set-top boxes, the ability to link hardware using Ethernet is very useful, and so the development of a lean and capable network layer was considered a suitable focus for the project. The overall goal has been to provide an interface that is cheap, open, robust and efficient, retaining the flexibility a developer might require to modify the code to their needs.
After covering some basic background information about the project, the dissertation looks at the requirements of the board and interface, as well as the alternative interface solutions that were considered before deciding on Gigabit Ethernet. The protocols used in Ethernet are then covered, with an explanation of the structure of each and its relevance to the implementation. The finite state machines that control the operation of the interface are covered in depth, with an explanation of how they interconnect and how they fit into the data flow between the computer and the board. Error correction and reliability are discussed, as well as the remaining components critical to the operation of the interface.
Pipelining, the design method that provides the speed required for Gigabit Ethernet, is covered along with the additional speed-optimisation techniques used in the design, such as RAM swinging buffers. Testing and synthesis, which ensure the design is as robust as possible both in simulation and in real-world applications, are also covered. The final design was implemented on a Xilinx Spartan 3 FPGA (XC3S5000-5FG900C) and is capable of a maximum clock speed of 128.287 MHz, which is more than enough to satisfy the requirements of Gigabit Ethernet under a variety of network conditions. The interface code occupies 1,166 slices of logic on the FPGA (3% of the total logic available), making it sufficiently compact to run large projects on the same chip. The core was tested on physical hardware and performed correctly at real Gigabit line speeds. Configuration of the computer, along with the method of connecting to the board and transferring data, is described, with an explanation of the code run on the computer to make this possible. Finally, the dissertation provides an example application through the use of JPEG2000 image compression/decompression.
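As a rough illustration of the swinging-buffer idea mentioned above, the sketch below is a software analogue in C (not the thesis's VHDL; the names and sizes are hypothetical): two RAM banks alternate roles, so one can be filled with an incoming frame while the other is drained by the consumer.

```c
#include <stdint.h>
#include <string.h>

/* Minimal sketch of a "swinging" (ping-pong) buffer pair: while one bank
 * is being filled with an incoming Ethernet frame, the other bank is read
 * out by the consumer, and then the roles are swapped. */
#define BANK_SIZE 2048              /* enough for one maximum-size frame (assumed) */

typedef struct {
    uint8_t bank[2][BANK_SIZE];     /* two RAM banks */
    size_t  len[2];                 /* bytes currently valid in each bank */
    int     fill;                   /* index of the bank being written */
} swing_buf;

static void swing_init(swing_buf *b) { memset(b, 0, sizeof *b); }

/* Producer side: store one received frame into the current fill bank. */
static void swing_write(swing_buf *b, const uint8_t *frame, size_t n) {
    if (n > BANK_SIZE) n = BANK_SIZE;
    memcpy(b->bank[b->fill], frame, n);
    b->len[b->fill] = n;
}

/* Swap roles: the freshly filled bank becomes readable, the other is reused. */
static int swing_swap(swing_buf *b) {
    int readable = b->fill;
    b->fill ^= 1;
    return readable;                /* consumer now drains bank[readable] */
}
```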
|
12 |
Aplicação de Loop Pipelining e Loop Unrolling à síntese de alto nível [Application of Loop Pipelining and Loop Unrolling to high-level synthesis] / Ferrari, Dione Jonathan, January 2002
Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação. Made available in DSpace on 2012-10-20T09:40:08Z (GMT). / This work aims to solve a classical problem of high-level synthesis through an approach oriented towards the exploration of alternative solutions. The problem consists of scheduling the operations of a given algorithm under physical resource constraints, so that each operation is executed while respecting the precedence order imposed by the algorithm. To address this problem, the Loop Pipelining and Loop Unrolling techniques were used, in which operations from different iterations can be executed in the same state. Because they expose more parallelism, these techniques allow better utilization of the available resources. This work describes the proposed approach, the model that underpins it, and the implementation of the tools that support it (a scheduler and a parallelizer). Experimental results obtained from classical examples in the literature are presented.
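As a hand-written illustration of the unrolling transformation discussed above (not the dissertation's tool output; the loop and names are invented for this listing), unrolling by a factor of two places operations from two consecutive iterations into the same loop body, which a resource-constrained scheduler can then pack into the same states when enough functional units are available:

```c
/* Original loop: one iteration's operations per trip through the body. */
void acc(const int *a, int *s, int n) {
    for (int i = 0; i < n; i++)
        s[i] = a[i] + a[i] * 2;
}

/* Unrolled by 2: operations from two consecutive iterations now sit in the
 * same body, exposing parallelism for the scheduler. Assumes n is even. */
void acc_unrolled(const int *a, int *s, int n) {
    for (int i = 0; i < n; i += 2) {
        s[i]     = a[i]     + a[i]     * 2;
        s[i + 1] = a[i + 1] + a[i + 1] * 2;
    }
}
```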
|
13 |
Explicitly Staged Software Pipelining / Thaller, Wolfgang, 08 1900
Software pipelining is a method of instruction scheduling in which loops are scheduled more efficiently by executing operations from more than one iteration of the loop in parallel. Finding an optimal software-pipelined schedule is NP-complete, but many heuristic algorithms exist. In iteration i, a software-pipelined loop will execute, in parallel, "stage" 1 of iteration i, stage 2 of iteration i-1, and so on until stage k of iteration i-k+1. We present a new approach to software pipelining based on using a heuristic algorithm to explicitly assign each operation to its stage before the actual scheduling. This explicit assignment allows us to implement control-flow mechanisms that are hard to implement with traditional methods of software pipelining, which do not give us direct control over which stages instructions are assigned to. / Thesis / Master of Science (MSc)
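A minimal C rendering of the staging described in this abstract (illustrative only, not the thesis's scheduler output) splits a loop body into two stages and overlaps stage 1 of iteration i with stage 2 of iteration i-1 via a prologue, kernel and epilogue:

```c
/* Two-stage software-pipelined loop: stage 1 computes a*x[i], stage 2 adds
 * it into y[i]. In the kernel, stage 1 of iteration i runs alongside
 * stage 2 of iteration i-1. */
void saxpy_staged(const float *x, float *y, float a, int n) {
    if (n <= 0) return;
    float t = a * x[0];                 /* prologue: stage 1 of iteration 0 */
    for (int i = 1; i < n; i++) {
        float t_next = a * x[i];        /* stage 1 of iteration i     */
        y[i - 1] += t;                  /* stage 2 of iteration i - 1 */
        t = t_next;
    }
    y[n - 1] += t;                      /* epilogue: stage 2 of the last iteration */
}
```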
|
14 |
Dynamic execution prediction and pipeline balancing of streaming applications / Aleen, Farhana Afroz, 30 August 2010
The number and scope of data-driven streaming applications is growing. Such streaming applications are promising targets for effectively utilizing multi-cores because of their inherent amenability to pipelined parallelism. While existing methods of orchestrating streaming programs on multi-cores have mostly been static, real-world applications show ample variation in execution time that may cause the achieved speedup and throughput to be sub-optimal. One of the principal challenges for moving towards dynamic pipeline balancing has been the lack of approaches that can efficiently predict upcoming dynamic variations in execution well before they occur. In this thesis, we propose an automated, compiler-analysis-based approach to predicting dynamic execution behavior that can be used to efficiently estimate the time to be spent in different pipeline stages for upcoming inputs. Our approach first uses dynamic taint analysis to automatically generate an input-based execution characterization of the streaming program, which identifies the key control points where variation in execution might occur with respect to the associated input elements. We then automatically generate a light-weight emulator from the program using this characterization; the emulator can predict the execution paths taken for new streaming inputs and provide execution time estimates and possible dynamic variations. The main challenge in devising such an approach is the essential trade-off between the accuracy and the overhead of dynamic analysis. We present experimental evidence that our technique can accurately and efficiently estimate dynamic execution behaviors for several benchmarks with a small error rate. We also show that the error rate can be lowered, at the cost of extra execution overhead, by implementing selective symbolic expression generation for each of the complex conditions of control-flow operations. Our experiments show that dynamic pipeline balancing using our predicted execution behavior can achieve considerably higher speedup and throughput, along with more effective utilization of multi-cores, than static balancing approaches.
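A highly simplified sketch of the kind of light-weight emulator described above might look as follows in C; the input fields, control point and cycle costs are all hypothetical and stand in for what the taint-analysis-driven characterization would actually produce:

```c
#include <stddef.h>

/* Toy work item: only the fields that the characterization identified as
 * affecting control flow or work volume are retained. */
typedef struct { size_t payload_len; int needs_decode; } item_t;

/* Pre-profiled average cycle costs for each path through one stage (assumed). */
enum { COST_FAST_PATH = 120, COST_DECODE = 900, COST_PER_BYTE = 3 };

/* Evaluate only the input-dependent conditions and accumulate path costs. */
static long predict_stage_cycles(const item_t *it) {
    long cycles = COST_FAST_PATH;
    if (it->needs_decode)            /* key control point found by taint analysis */
        cycles += COST_DECODE;
    cycles += COST_PER_BYTE * (long)it->payload_len;
    return cycles;
}

/* Estimate of the time the stage will spend on an upcoming batch of inputs. */
long predict_batch_cycles(const item_t *items, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += predict_stage_cycles(&items[i]);
    return total;
}
```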
|
15 |
Optimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform / Wu, Xiaolong, 16 December 2015
Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operation over irregular data, which is widely used in graph algorithms such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU- and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we utilize the ordered property of row indices to perform a partial sort instead of a full sort of all triples according to row and column indices. Finally, through a hybrid CPU-GPU approach using a two-level pipelining technique, our algorithm is able to better exploit a heterogeneous system. Compared with the state-of-the-art SpMM methods in the cuSPARSE and CUSP libraries, our approach achieves average speedups of 1.6x and 2.9x, respectively, on the nine representative matrices from the University of Florida sparse matrix collection.
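The partial-sort idea can be sketched on the CPU side roughly as follows (a simplified illustration with hypothetical types, not the thesis's hybrid CPU-GPU code): because the intermediate triples are produced one output row at a time, they are already ordered by row, so only each row's bucket needs to be sorted by column and have duplicates merged.

```c
#include <stdlib.h>

typedef struct { int col; double val; } entry;   /* one intermediate (col, val) pair */

static int by_col(const void *a, const void *b) {
    return ((const entry *)a)->col - ((const entry *)b)->col;
}

/* Sort one output row's triples by column and accumulate duplicates in place.
 * Returns the number of distinct columns kept. A full sort over all triples
 * (by row and column) is avoided because rows arrive already in order. */
static int compact_row(entry *row, int n) {
    if (n == 0) return 0;
    qsort(row, (size_t)n, sizeof *row, by_col);  /* partial sort: this row only */
    int out = 0;
    for (int i = 1; i < n; i++) {
        if (row[i].col == row[out].col)
            row[out].val += row[i].val;          /* merge duplicate column */
        else
            row[++out] = row[i];
    }
    return out + 1;
}
```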
|
16 |
An ALU design using a novel asynchronous pipeline architecture / Tang, Tin-Yau, January 2000
Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. / Includes bibliographical references (leaves 122-123). / Abstracts in English and Chinese.
Table of Content p.2
List of Figures p.4
List of Tables p.6
Acknowledgements p.7
Abstract p.8
Chapter I. Introduction p.11
  1.1 Asynchronous Design p.12
    1.1.1 What is asynchronous design? p.12
    1.1.2 Potential advantages of asynchronous design p.12
    1.1.3 Design methodology for asynchronous circuit p.15
    1.1.4 Difficulty and limitation of asynchronous design p.19
  1.2 Pipeline and Asynchronous Pipeline p.21
    1.2.1 What is pipeline? p.21
    1.2.2 Property of pipeline system p.21
    1.2.3 Asynchronous pipeline p.23
  1.3 Design Motivation p.26
Chapter II. Design Theory p.27
  2.1 A Novel Asynchronous Pipeline Architecture p.28
    2.1.1 The problem of classical asynchronous pipeline p.28
    2.1.2 The new handshake cell p.28
    2.1.3 The modified asynchronous pipeline architecture p.29
  2.2 Design of the ALU p.36
    2.2.1 The functionality of ALU p.36
    2.2.2 The choice of the adder and the BLC adder p.37
Chapter III. Implementation p.41
  3.1 ALU Detail p.42
    3.1.1 Global arrangement p.42
    3.1.2 Shift and Rotate p.46
    3.1.3 Flags generation p.49
  3.2 Application of the Pipeline Architecture p.53
    3.2.1 The reset network for the pipeline architecture p.53
    3.2.2 Handshake simplification for splitting and joining of datapath p.55
Chapter IV. Result p.59
  4.1 Measurement and Simulation Result p.60
  4.2 Global Routing Parasites p.63
  4.3 Low Power Application p.65
Chapter V. Conclusion p.67
Chapter VI. Appendixes p.69
  6.1 The Small Micro-coded Processor p.69
  6.2 The Instruction Table of the ALU p.70
  6.3 Measurement and Simulation Result p.71
  6.4 VHDLs, Schematics and Layout p.87
  6.5 Pinout of the Test Chip p.120
  6.6 The Chip Photo p.121
Chapter VII. Reference p.122
|
17 |
Wave-Pipelined Multiplexed (WPM) Routing for Gigascale Integration (GSI) / Joshi, Ajay Jayant, 12 April 2006
The main objective of this research is to develop a pervasive wire-sharing technique that can be easily applied across the entire range of on-chip interconnects in a very large scale integration (VLSI) system. A wave-pipelined multiplexed (WPM) routing technique that can be applied to both intra-macrocell and inter-macrocell interconnects is proposed in this thesis. It is shown that an extensive application of the WPM routing technique can provide significant advantages in terms of area, power and performance. In order to study the WPM routing technique, a hierarchical approach is adopted: circuit-level, system-level and physical-level analyses are completed to explore the limits of, and opportunities for, applying WPM routing to current VLSI and future gigascale integration (GSI) systems. Design, verification and optimization of the WPM circuit and measurement of its tolerance to external noise constitute the circuit-level analysis. The physical-level study involves designing wire-sharing-aware placement algorithms to maximize the advantages of WPM routing. A system-level simulator that designs the entire multilevel interconnect network is developed to perform the system-level analysis. The effect of WPM routing on both a full-custom and a semi-custom interconnect network is studied.
|
18 |
The Study of Double Level Branch Buffer / Chen, Yi-Chang, 12 October 2001
Pipelining is the major organizational technique by which computers can execute several instructions simultaneously to reach higher single-processor performance. Branches are recognized as a major impediment to achieving the maximum performance of pipelined and superscalar processors because of the stalls caused by unresolved branches. Branch prediction is an effective strategy for reducing the branch penalty by predicting, prefetching and executing speculative instructions before the branch is resolved. A branch target buffer (BTB) [13] can reduce the performance loss caused by branches by predicting the direction of the branch and caching information about it. When a prediction is incorrect, the processor must flush the speculative instructions, undo the effects of the improperly initiated speculative execution and resume on the correct path. This flushing and refilling significantly degrades processor performance.
In this thesis we propose a mechanism, the Double Level Branch Buffer, which can reduce the branch penalty and the performance loss caused by incorrect predictions. We cache branch information for both the taken and not-taken directions. With this mechanism, the pipeline becomes less dependent on branch prediction accuracy.
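As background for the mechanism above, a conventional direct-mapped branch target buffer can be sketched in C roughly as follows (illustrative only; the sizes and field names are hypothetical, and the proposed Double Level Branch Buffer would additionally cache information for the not-taken direction):

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy direct-mapped BTB: each entry remembers a branch's address, its cached
 * taken-path target, and a 2-bit saturating direction counter. */
#define BTB_ENTRIES 512

typedef struct {
    uint32_t tag;        /* branch instruction address */
    uint32_t target;     /* cached taken-path target   */
    uint8_t  counter;    /* 0..3; >= 2 means predict taken */
    bool     valid;
} btb_entry;

static btb_entry btb[BTB_ENTRIES];

/* Lookup at fetch time: on a hit that predicts taken, return the cached
 * target as the next PC; otherwise fall through to pc + 4. */
bool btb_predict(uint32_t pc, uint32_t *next_pc) {
    btb_entry *e = &btb[(pc >> 2) % BTB_ENTRIES];
    if (e->valid && e->tag == pc && e->counter >= 2) {
        *next_pc = e->target;
        return true;
    }
    *next_pc = pc + 4;
    return false;
}

/* Update at resolve time: allocate or refresh the entry and train the counter. */
void btb_update(uint32_t pc, uint32_t target, bool taken) {
    btb_entry *e = &btb[(pc >> 2) % BTB_ENTRIES];
    if (!e->valid || e->tag != pc) {
        e->valid = true; e->tag = pc; e->target = target; e->counter = 2;
    }
    if (taken) { if (e->counter < 3) e->counter++; e->target = target; }
    else       { if (e->counter > 0) e->counter--; }
}
```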
|
19 |
A MOSCAP pipeline pseudo passive DAC / Behera, Prachee Shree, January 1900
Thesis (M.S.)--Oregon State University, 2006. / Printout. Includes bibliographical references (leaves 104-107). Also available on the World Wide Web.
|
20 |
A compiler framework for loop nest software-pipelining / Douillet, Alban, January 2006
Thesis (Ph.D.)--University of Delaware, 2006. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
|