Global ETD Search

1	Vector processing as a soft-core processor accelerator Yu, Jason Kwok Kwun 11 1900 (has links) Soft processors simplify hardware design by being able to implement complex control strategies using software. However, they are not fast enough for many intensive data-processing tasks, such as highly data-parallel embedded applications. This thesis suggests adding a vector processing core to the soft processor as a general-purpose accelerator for these types of applications. The approach has the benefits of a purely software-oriented development model, a fixed ISA allowing parallel software and hardware development, a single accelerator that can accelerate multiple functions in an application, and scalable performance with a single source code. With no hardware design experience needed, a software programmer can make area-versus-performance tradeoffs by scaling the number of functional units and register file bandwidth with a single parameter. The soft vector processor can be further customized by a number of secondary parameters to add and remove features for the specific application to optimize resource utilization. This thesis shows that a vector processing architecture maps efficiently into an FPGA and provides a scalable amount of performance for a reasonable amount of area. Configurations of the soft vector processor with different performance levels are estimated to achieve speedups of 2-24x for 5-26x the area of a Nios II/s processor on three benchmark kernels. Soft processors Data processing Software
2	Vector processing as a soft-core processor accelerator Yu, Jason Kwok Kwun 11 1900 (has links) Soft processors simplify hardware design by being able to implement complex control strategies using software. However, they are not fast enough for many intensive data-processing tasks, such as highly data-parallel embedded applications. This thesis suggests adding a vector processing core to the soft processor as a general-purpose accelerator for these types of applications. The approach has the benefits of a purely software-oriented development model, a fixed ISA allowing parallel software and hardware development, a single accelerator that can accelerate multiple functions in an application, and scalable performance with a single source code. With no hardware design experience needed, a software programmer can make area-versus-performance tradeoffs by scaling the number of functional units and register file bandwidth with a single parameter. The soft vector processor can be further customized by a number of secondary parameters to add and remove features for the specific application to optimize resource utilization. This thesis shows that a vector processing architecture maps efficiently into an FPGA and provides a scalable amount of performance for a reasonable amount of area. Configurations of the soft vector processor with different performance levels are estimated to achieve speedups of 2-24x for 5-26x the area of a Nios II/s processor on three benchmark kernels. Soft processors Data processing Software
3	Vector processing as a soft-core processor accelerator Yu, Jason Kwok Kwun 11 1900 (has links) Soft processors simplify hardware design by being able to implement complex control strategies using software. However, they are not fast enough for many intensive data-processing tasks, such as highly data-parallel embedded applications. This thesis suggests adding a vector processing core to the soft processor as a general-purpose accelerator for these types of applications. The approach has the benefits of a purely software-oriented development model, a fixed ISA allowing parallel software and hardware development, a single accelerator that can accelerate multiple functions in an application, and scalable performance with a single source code. With no hardware design experience needed, a software programmer can make area-versus-performance tradeoffs by scaling the number of functional units and register file bandwidth with a single parameter. The soft vector processor can be further customized by a number of secondary parameters to add and remove features for the specific application to optimize resource utilization. This thesis shows that a vector processing architecture maps efficiently into an FPGA and provides a scalable amount of performance for a reasonable amount of area. Configurations of the soft vector processor with different performance levels are estimated to achieve speedups of 2-24x for 5-26x the area of a Nios II/s processor on three benchmark kernels. / Applied Science, Faculty of / Electrical and Computer Engineering, Department of / Graduate Soft processors Data processing Software
4	Low-cost Hardware Profiling of Run-time and Energy in FPGA Soft Processors Aldham, Mark 11 August 2011 (has links) Field Programmable Gate Arrays (FPGAs) are a reconfigurable hardware platform which enable the acceleration of software code through the use of custom-hardware circuits. Complex systems combining processors with programmable logic require partitioning to decide which code segments to accelerate. This thesis provides tools to help determine which software code sections would most benefit from hardware acceleration. A low-overhead profiling architecture, called LEAP, is proposed to attain real-time profiles of an FPGA-based processor. LEAP is designed to be extensible for a variety of profiling tasks, three of which are investigated and implemented to identify candidate software for acceleration. 1) Cycle profiling determines the most time-consuming functions to maximize speedup. 2) Cache stall profiling detects memory-intensive code; large memory overheads reduce the benefits of acceleration. 3) Energy consumption profiling detects energy-inefficient code through the use of an instruction-level power database to minimize the system's energy consumption. Profiling FPGAs hardware/software co-design hardware/software partitioning soft processors 0544
5	Low-cost Hardware Profiling of Run-time and Energy in FPGA Soft Processors Aldham, Mark 11 August 2011 (has links) Field Programmable Gate Arrays (FPGAs) are a reconfigurable hardware platform which enable the acceleration of software code through the use of custom-hardware circuits. Complex systems combining processors with programmable logic require partitioning to decide which code segments to accelerate. This thesis provides tools to help determine which software code sections would most benefit from hardware acceleration. A low-overhead profiling architecture, called LEAP, is proposed to attain real-time profiles of an FPGA-based processor. LEAP is designed to be extensible for a variety of profiling tasks, three of which are investigated and implemented to identify candidate software for acceleration. 1) Cycle profiling determines the most time-consuming functions to maximize speedup. 2) Cache stall profiling detects memory-intensive code; large memory overheads reduce the benefits of acceleration. 3) Energy consumption profiling detects energy-inefficient code through the use of an instruction-level power database to minimize the system's energy consumption. Profiling FPGAs hardware/software co-design hardware/software partitioning soft processors 0544
6	Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque 16 June 2011 (has links) Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks and, in contrast, network processing is most often inherently parallel between packet flows, if not between each individual packet. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and develop processor architectures allowing threads to execute with as little architectural stalls as possible: in particular with instruction replay and static hazard detection mechanisms. To further reduce the wait times, we allow threads to speculatively execute by leveraging transactional memory. Our multithreaded multiprocessor along with our compilation and simulation framework makes the FPGA easy to use for an average programmer who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Comparing with multithreaded processors using lock-based synchronization, we measure up to 57\% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544
7	Overlay Architectures for FPGA-Based Software Packet Processing Martin, Labrecque 16 June 2011 (has links) Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly, there is increasing interest in adapting features of the processing over time and, since software is the preferred way of expressing complex computation, we are interested in finding a platform to execute packet processing software with the best possible throughput. Because FPGAs are widely used in network equipment and they can implement processors, we are motivated to investigate executing software directly on the FPGAs. Off-the-shelf soft processors on FPGA fabric are currently geared towards performing embedded sequential tasks and, in contrast, network processing is most often inherently parallel between packet flows, if not between each individual packet. Our goal is to allow multiple threads of execution in an FPGA to reach a higher aggregate throughput than commercially available shared-memory soft multi-processors via improvements to the underlying soft processor architecture. We study a number of processor pipeline organizations to identify which ones can scale to a larger number of execution threads and find that tuning multithreaded pipelines can provide compact cores with high throughput. We then perform a design space exploration of multicore soft systems, compare single-threaded and multithreaded designs to identify scalability limits and develop processor architectures allowing threads to execute with as little architectural stalls as possible: in particular with instruction replay and static hazard detection mechanisms. To further reduce the wait times, we allow threads to speculatively execute by leveraging transactional memory. Our multithreaded multiprocessor along with our compilation and simulation framework makes the FPGA easy to use for an average programmer who can write an application as a single thread of computation with coarse-grained synchronization around shared data structures. Comparing with multithreaded processors using lock-based synchronization, we measure up to 57\% additional throughput with the use of transactional-memory-based synchronization. Given our applications, gigabit interfaces and 125 MHz system clock rate, our results suggest that soft processors can process packets in software at high throughput and low latency, while capitalizing on the FPGAs already available in network equipment. computer architecture soft processors FPGA packet processing network processor multithreaded transactional memory 0544
8	Ugurel, Gokhan 01 June 2012 (has links) (PDF) In real time embedded systems, more and more developers are choosing the soft processor option to save money, power and area on their boards. Reconfigurability concept of the soft processor gives more options to the designer, also solving the problem of processor obsolescence. Another increasing trend is using real time operating systems (RTOSs) for microprocessors or microcontrollers. RTOSs help software developers to meet the critical deadlines of the real time environment with their deterministic and predictable behaviour. Providing service APIs and fast response times for task management, memory and interrupts / RTOSs decrease the development time of on going, and also future, projects of software developers. Comparing RTOSs on RTOS-specific benchmark criteria, called RTOS benchmarking in the literature, helps software developers to choose the appropriate RTOS for their requirements and provokes RTOS companies to strengthen their products on areas where they are weak. This study will compare three popular RTOSs on Xilinx&rsquo / s soft processor platform MicroBlaze. Xilkernel, &micro / C/OS-II and FreeRTOS are selected among nine available RTOSs for MicroBlaze and are compared against critical RTOS benchmarking criteria, which are task preemption time, task preemption time under load, get/release semaphore time, pass/receive message time, get/release fixed sized dynamic memory time, UART RS-422 message interrupt serving time, RTOS initialization time and memory footprint data. Results are interpreted using architectural concepts of the RTOSs considered. QA Computer Software 76.75-76.765
9	Compiling for a Multithreaded Horizontally-microcoded Soft Processor Family Tili, Ilian 28 November 2013 (has links) Soft processing engines make FPGA programming simpler for software programmers. TILT is a multithreaded soft processing engine that contains multiple deeply pipelined and varying latency functional units. In this thesis, we present a compiler framework for compiling and scheduling for TILT. By using the compiler to generate schedules and manage hardware we create computationally dense designs (high throughput per hardware area) which make compelling processing engines. High schedule density is achieved by mixing instructions from different threads and by prioritizing the longest path of data flow graphs. Averaged across benchmark kernels we can achieve 90% of the theoretical throughput, and can reduce the performance gap relative to custom hardware from 543x for a scalar processor to only 4.41x by replicating TILT cores up to a comparable area cost. We also present methods of quickly navigating the design space and predicting the area of hardware configurations. compiling soft processors fpga scheduling throughput computational density design space computer architecture 0544 0984
10	Compiling for a Multithreaded Horizontally-microcoded Soft Processor Family Tili, Ilian 28 November 2013 (has links) Soft processing engines make FPGA programming simpler for software programmers. TILT is a multithreaded soft processing engine that contains multiple deeply pipelined and varying latency functional units. In this thesis, we present a compiler framework for compiling and scheduling for TILT. By using the compiler to generate schedules and manage hardware we create computationally dense designs (high throughput per hardware area) which make compelling processing engines. High schedule density is achieved by mixing instructions from different threads and by prioritizing the longest path of data flow graphs. Averaged across benchmark kernels we can achieve 90% of the theoretical throughput, and can reduce the performance gap relative to custom hardware from 543x for a scalar processor to only 4.41x by replicating TILT cores up to a comparable area cost. We also present methods of quickly navigating the design space and predicting the area of hardware configurations. compiling soft processors fpga scheduling throughput computational density design space computer architecture 0544 0984

Search results