71

FPGA-based Soft Vector Processors

Yiannacouras, Peter 23 February 2010 (has links)
FPGAs are increasingly used to implement embedded digital systems because of their low time-to-market and low costs compared to integrated circuit design, as well as their superior performance and area relative to a general-purpose microprocessor. However, the hardware design necessary to achieve this superior performance and area is very difficult, causing long design times and preventing widespread adoption of FPGA technology. The amount of hardware design can be reduced by employing a microprocessor for the less-critical computation in the system. Often this microprocessor is implemented in the FPGA's reprogrammable fabric as a soft processor, which preserves the benefits of a single-chip FPGA solution without specializing the device with dedicated hard processors. Current soft processors have simple architectures that provide performance adequate for only the least-critical computations. Our goal is to improve soft processors by scaling their performance and expanding their suitability to more critical computation. To this end we focus on the data parallelism found in many embedded applications and propose that soft processors be augmented with vector extensions to exploit this parallelism. We support this proposal through experimentation with a parameterized soft vector processor called VESPA (Vector Extended Soft Processor Architecture), which is designed, implemented, and evaluated on real FPGA hardware. The scalability of VESPA, combined with several other architectural parameters, can be used to finely span a large design space and derive a custom architecture that exactly matches the needs of an application. Such customization is a key advantage for soft processors since their architectures can be easily reconfigured by the end-user. Specifically, customizations can be made to the pipeline, functional units, and memory system within VESPA. In addition, general-purpose overheads can be automatically eliminated from VESPA. Comparing VESPA to manual hardware design, we observe a 13x speed advantage for hardware over our fastest VESPA, though this is significantly less than the 500x speed advantage hardware holds over scalar soft processors. The performance-per-area of VESPA is also observed to be significantly higher than that of a scalar soft processor, suggesting that the addition of vector extensions makes more efficient use of silicon area for data-parallel workloads.
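As a rough illustration of the design-space customization described above, the sketch below picks the configuration with the best performance-per-area that fits a given area budget. The parameter names (vector lanes, data-cache size), the cost model, and all numbers are hypothetical placeholders rather than VESPA's actual parameters or results; only the selection logic is the point.

```python
from itertools import product

# Hypothetical design-space sweep: pick the configuration with the best
# performance-per-area that still fits an FPGA area budget.
LANES = [1, 2, 4, 8, 16]      # number of vector lanes (assumed parameter)
DCACHE_KB = [4, 8, 16]        # data-cache size in KB (assumed parameter)

def estimate(lanes, dcache_kb):
    """Placeholder cost model returning (cycles, area_in_slices)."""
    cycles = 1_000_000 / (1 + 0.8 * (lanes - 1))   # diminishing returns
    area = 1200 + 450 * lanes + 30 * dcache_kb     # rough linear area model
    return cycles, area

def best_config(area_budget):
    best, best_ppa = None, 0.0
    for lanes, dcache_kb in product(LANES, DCACHE_KB):
        cycles, area = estimate(lanes, dcache_kb)
        if area > area_budget:
            continue   # configuration does not fit the device
        perf_per_area = (1.0 / cycles) / area
        if perf_per_area > best_ppa:
            best, best_ppa = (lanes, dcache_kb), perf_per_area
    return best, best_ppa

print(best_config(area_budget=6000))
```

A real flow would replace the placeholder cost model with per-application profiling or synthesis results, as the abstract describes.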
72

Design automation methodologies for extensible processor platform

Cheung, Newton, Computer Science & Engineering, Faculty of Engineering, UNSW January 2005 (has links)
This thesis addresses two ubiquitous trends in the embedded system world - the increasing importance of design turnaround time as a design metric, and the move towards closing the design productivity gap. Adopting the right design approach has been recognised as an integral part of the design flow in order to meet desired characteristics such as increasing software content, satisfying the growing complexity of an application, reusing off-the-shelf components, and exploring design metric tradeoffs, thereby closing the design productivity gap. The importance of design turnaround time is motivated by the intensive competition between manufacturers, especially makers of mainstream electronic consumer products, who shrink the product life cycle and require faster time-to-market to maximise economic benefits. This thesis presents a suite of design automation methodologies to automatically design embedded systems for an application in the state-of-the-art design approach - the extensible processor platform. These design automation methodologies systematise the extensible processor platform's design flow, with particular emphasis on solving four challenging design problems: i) code segment identification; ii) instruction generation; iii) architectural customisation selection; and iv) processor evaluation. Our suite of design automation methodologies includes: i) a semi-automatic design system - to design an extensible processor that maximises the application performance while satisfying the area constraint. By specifying a fitting function to identify suitable code segments within an application, a two-level hierarchical selection algorithm is used to first select a predefined processor and then select the right instruction, and a performance estimator is used to estimate an application's performance; ii) a tool to match instructions - to automatically match the pre-designed instructions with computationally intensive code segments, reducing verification time and effort; iii) an instruction estimation model - to estimate the area overhead, latency, and power consumption of extensible instructions, enabling exploration of a larger design space; and iv) an instruction generation tool - to generate new extensible instructions that maximise speedup while minimising power dissipation. A number of techniques, such as system decomposition, combinational equivalence checking, and regression analysis, have been heavily relied upon in the creation of the final design system. This thesis shows results at every stage to demonstrate the efficacy of our design methodologies in the creation of extensible processors. The methodologies and results presented in this thesis demonstrate that automating the design process for an extensible processor platform results in a significant performance increase - on average 4.74x (up to 15.71x) compared to the original base processor. Our system achieves significant design turnaround time savings (2.5% of the full simulation time for the entire design space) with the majority of Pareto points obtained (91% on average), and can lead to fewer and faster design iterations. Our instruction matching tool is 7.3x faster on average compared to the best known approaches to the problem (partial simulations). Our estimation model has a mean absolute error as small as 3.4% (6.7% max.) for area overhead, 5.9% (9.4% max.) for latency, and 4.2% (7.2% max.) for power consumption, compared to estimation through the time-consuming synthesis and simulation steps using commercial tools.
Finally, the instruction generation tool reduces energy consumption by a further 5.8% on average (up to 17.7%) compared to extensible instructions generated by previous approaches.
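The abstract notes that regression analysis underpins the estimation model. A minimal sketch of that idea appears below, assuming simple structural features (operator counts, operand width) and made-up training data; the thesis's actual features, training set, and model form are not reproduced here.

```python
import numpy as np

# Sketch of a regression-based estimator in the spirit of the instruction
# estimation model above. Features and measurements are hypothetical.
def fit_linear_model(features, measured):
    """Least-squares fit: measured ~ intercept + features @ coeffs."""
    X = np.column_stack([np.ones(len(features)), features])  # add intercept
    coeffs, *_ = np.linalg.lstsq(X, measured, rcond=None)
    return coeffs

def predict(coeffs, feature_row):
    return float(np.dot(np.concatenate(([1.0], feature_row)), coeffs))

# Each row: [#adders, #multipliers, max operand width] for a candidate
# extensible instruction; targets are made-up synthesised area figures.
train_X = np.array([[2, 0, 16], [1, 1, 32], [4, 2, 32], [0, 1, 16]])
train_area = np.array([950.0, 2900.0, 6800.0, 2400.0])

area_model = fit_linear_model(train_X, train_area)
print(predict(area_model, np.array([3, 1, 32])))  # estimate a new candidate
```

Analogous models could be fitted for latency and power, which is how a single regression framework can cover the three quantities the abstract reports errors for.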
73

Improving processor power demand comprehension in data-driven power and software phase classification and prediction

Khoshbakht, Saman 14 August 2018 (has links)
The single-core performance trend predicted by Moore's law has been impeded in recent years partly due to the limitations imposed by increasing processor power demands. One way to mitigate this limitation in performance improvement is the introduction of multi-core and multi-processor computation. Another approach to increasing the performance-per-Watt metric is to utilize the processor's power more efficiently. In a single-core system, the processor cannot sustainably dissipate more than the nominal Thermal Design Power (TDP) limit determined for the processor at design time. Therefore it is important to understand and manage the power demands of the processes being executed. This principle also applies to multi-core and multi-processor environments. In a multi-processor environment, knowing the power demands of the workload, the power management unit can schedule the workload to a processor based on the state of each processor and process in the most efficient way. This is an example of the knapsack problem. Another approach, also applicable to multi-cores, could be to reduce a core's power by reducing its working voltage and frequency, mitigating power bursts, lending more headroom to other cores, and keeping the total power under the TDP limit. The information collected from the execution of the software running on the processor (i.e. the workload) is the key to determining the actions needed with regard to power management at any given time. This work comprises two different approaches to improving the comprehension of software power demands as the software executes on the processor. In the first part of this work, the effects of software data on power are analysed. It is important to be able to model a program's power based on the instructions it comprises; however, to the best of our knowledge, no prior work has investigated the effects of the values being processed on processor power. Creating a power model capable of accurately reflecting the power demands of the software at any given time is a problem addressed by previous research. The software power model can be used in processor simulation environments as well as in the processor itself to create an estimated power dissipation without the need to physically measure the power. In order to collect the data required for this research, a profiler tool was developed by the author and used in both parts of the work. The second part of this work focuses on the evolution of processor power over time during the execution of the software. Understanding the power demands of the processor at any given time is important to maintain and manage processor power. Additionally, acquiring an insight into the future power demands of the software can help the system plan scheduling ahead of time, in order to prepare for any high-power section of the code as well as to plan to use the available power headroom resulting from an upcoming low-power section. In this part of our work, a new hierarchical approach to software phase classification is developed. The software phase classification problem focuses on determining the behaviour of the software at any given time slice by assigning the time slice to one of a set of pre-determined software phases.
Each phase is assumed to have known behaviour, previously measured from observed instances of the phase or estimated by a model capable of estimating the behaviour of each phase. Using a two-tiered hierarchical clustering approach, our proposed phase classification methodology incorporates the recent performance behaviour of the software in order to determine the power phase. We focused on determining the power phase from performance information because real processor power is not usually available without added hardware, while a large number of different performance counters are available on most modern processors. Additionally, based on our observations, the relation between performance phases and power behaviour is highly predictable. This method is shown to provide robust results with a low amount of noise compared to other methods, while providing high enough timing accuracy for the processor to act on. To the best of our knowledge, no other existing work is able to provide both the timing accuracy and the reduced noise of our approach. Software phase classification can be used to control the processor power based on the software's phase at any given time, but it does not provide insight into the future progression of the workload. Finally, we developed and compared several phase prediction methodologies based on phase precursor and phase locality concepts. Phase precursor-based methods rely on detecting the precursors observed before the software enters a certain phase, while phase locality methods rely on the locality principle, which postulates a high probability for the current software behaviour to be observed in the near future. The phase classification and phase prediction methodologies were shown to reduce the power bursts within a workload and provide a smoother power trace. As the bursts are removed from one workload's power trace, the multi-core processor power headroom can be confidently utilized for another process. / Graduate
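A short sketch of the clustering idea behind the phase classification follows. The performance counters, number of time slices, number of phases, and the use of SciPy's Ward linkage are illustrative assumptions; a single clustering pass plus nearest-centroid assignment stands in for the full two-tiered scheme described in the abstract.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Each row is one time slice of normalized performance counters,
# e.g. [IPC, cache-miss rate, branch-miss rate] (assumed counters).
rng = np.random.default_rng(0)
slices = rng.random((200, 3))

# Cluster the observed time slices into a small set of coarse phases.
Z = linkage(slices, method="ward")
phase_ids = fcluster(Z, t=4, criterion="maxclust")   # at most 4 phases

# Per-phase centroids stand in for each phase's "known behaviour".
labels = np.unique(phase_ids)
centroids = np.array([slices[phase_ids == p].mean(axis=0) for p in labels])

def classify(new_slice):
    """Assign an incoming time slice to the phase with the nearest centroid."""
    return int(labels[np.argmin(np.linalg.norm(centroids - new_slice, axis=1))])

print(classify(np.array([0.7, 0.2, 0.1])))
```

In a real system the per-phase behaviour would come from measured or modeled power, so that classifying a new slice immediately yields an expected power level to act on.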
74

An Evaluation of Soft Processors as a Reliable Computing Platform

Gardiner, Michael Robert 01 July 2015 (has links) (PDF)
This study evaluates the benefits and limitations of soft processors operating in a radiation-hardened FPGA, focusing primarily on the performance and reliability of these systems. FPGA designs for four popular soft processors (the MicroBlaze, LEON3, Cortex-M0 DesignStart, and OpenRISC 1200) are developed for a Virtex-5 FPGA. The performance of these soft processor designs is then compared on ten widely-used benchmark programs. Benchmarking results indicate that the MicroBlaze has the best integer performance of the soft processors, with at least 2.23X better performance on average than the other three processors. However, the LEON3 has the best floating-point performance, with benchmark scores 8.9X higher on average than its competitors. The soft processors' performance is also compared against estimated benchmark scores for a radiation-hardened processor, the RAD750. We find the average performance of the RAD750 to be 2.58X better than the best soft processor scores on each benchmark, although the best soft processor scores were higher on two benchmarks. The soft processors' inability to compete with the performance of the decade-old RAD750 illustrates the substantial performance gap between hard and soft processor architectures. Although soft processors are not capable of competing with rad-hard processors in performance, the flexibility they provide nevertheless makes them a desirable option for space systems where speed is not the key issue. Fault injection experiments are also completed on three of the soft processors to evaluate their configuration memory sensitivity. Our results demonstrate that the MicroBlaze is less sensitive than the LEON3 and the Cortex-M0 DesignStart, but that the LEON3 has lower sensitivity per FPGA slice than the other processors. A combined metric for soft processor performance and configuration sensitivity is then developed to aid future researchers in evaluating the trade-offs between these two distinct processor attributes.
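The combined performance/sensitivity metric is not spelled out in the abstract, so the sketch below uses an assumed stand-in (benchmark throughput divided by the sensitive fraction of configuration memory) purely to illustrate how such a figure of merit could rank processors; it should not be read as the thesis's actual metric, and all numbers are made up.

```python
# Illustrative stand-in for a combined performance/sensitivity score:
# reward higher benchmark throughput, penalize a larger sensitive
# configuration-memory cross-section.
def combined_score(benchmark_runs_per_sec, sensitive_bits, total_bits):
    sensitivity = sensitive_bits / total_bits     # fraction of critical bits
    return benchmark_runs_per_sec / sensitivity   # "work per unit of risk"

processors = {
    "proc_A": dict(benchmark_runs_per_sec=120.0, sensitive_bits=150_000,
                   total_bits=20_000_000),
    "proc_B": dict(benchmark_runs_per_sec=300.0, sensitive_bits=600_000,
                   total_bits=20_000_000),
}

for name, p in processors.items():
    print(name, round(combined_score(**p), 1))
```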
75

FPGA Design of a Multicore Neuromorphic Processing System

Zhang, Bin 18 May 2016 (has links)
No description available.
76

Testing and evaluation of the integratability of the Senior processor / Testning och evaluering av Senior processorns integrerbarhet

Hedin, Alexander January 2011 (has links)
The first version of the Senior processor was created as part of a thesis project in 2007. This processor was completed and used for educational purposes at Linköping University. In 2008 several parts of the processor were optimized and the processor was expanded with additional functionality as part of another thesis project. In 2009 an EU-funded project called MULTI-BASE started, in which the Computer Division at the Department of Electrical Engineering participated. For their part of the MULTI-BASE project, the Senior processor was selected to be used. After continuous revision and development, this processor was sent for manufacturing. The assignment of this thesis project was to test and verify the different functions implemented in the Senior processor. To do this, a PCB was developed for testing the Senior processor together with a Virtex-4 FPGA. Extensive testing was done on the most important functions of the Senior processor. These tests showed that the manufactured Senior processor works as designed and that it alone can perform larger calculations and use external hardware accelerators with the help of its various interfaces. / The first version of the Senior processor was created as part of a thesis project in 2007; this processor was completed and used for educational purposes at Linköping University. In 2008 several parts of the processor were optimized and extended with additional functionality as part of a further thesis project. In 2009 an EU-funded project named MULTI-BASE started, in which ISY's Computer Engineering division participates. The Senior processor was chosen for their part of the MULTI-BASE project, and after further development it was sent for manufacturing. The task of this thesis project was to test and verify the various functions with which the Senior processor has been implemented. To do this, a circuit board was produced for testing the Senior processor together with a Virtex-4 FPGA. Thorough tests were performed on the most important functions of the Senior processor; these tests showed that the manufactured Senior processor works as intended. On its own it can perform larger calculations and make use of external hardware accelerators via its various interfaces.
77

SYNCHRONOUS COMMAND GENERATOR IN A SINGLE STANDALONE CHASSIS FOR SPINNING SATELLITES

Boulinguez, Marc, Carlier, Pierre-Marie 10 1900 (has links)
International Telemetering Conference Proceedings / October 26-29, 1998 / Town & Country Resort Hotel and Convention Center, San Diego, California / Designed for unattended 24-hour-a-day operation in automatic system environments, the 3801 TT&C Digital Processor Unit is the key communication unit for ground stations operating spacecraft, from integration to the positioning phase and in-orbit operation. Its architecture and technology concept combine high performance, compactness and modularity. The 3801 TT&C Digital Processor Unit supports multiple formats in a single stand-alone chassis, and incorporates extensive interfacing and functional provisions to maximize effectiveness, reliability and dependability. It supports a number of configurations for satellite control applications and performs:
• Telemetry IF demodulation and transmission of data to a high-level communication interface, with time tagging and display of decommutated parameters,
• Command generation, with FSK or PSK and FM or PM modulation at 70 MHz,
• Ranging measurements and calibration using ESA, INTELSAT and other major standards (tones and codes).
In addition, the 3801 TT&C Digital Processor supports a Synchronous Command Generator for spinning satellites in a single stand-alone chassis and includes:
• FM signal discrimination, for satellite spin reference information coming from the Telemetry Reception channel,
• a Synchronization Controller for providing the reference "top" for the transmission of the synchronous tones,
• generation of frequency tones towards the PM/FM Modulator.
78

COMPACT AIRBORNE REAL TIME DATA MONITOR SYSTEM - PRODUCTION MONITOR

Tolleth, Grant H. 10 1900 (has links)
International Telemetering Conference Proceedings / October 28-31, 1996 / Town and Country Hotel and Convention Center, San Diego, California / This paper describes the Production Monitor (PM), the result of integrating very diverse hardware architectures into a compact, portable, real-time airborne data monitor and data analysis station. Flight testing of aircraft is typically conducted with personnel aboard during flight. These personnel monitor real-time data, play back recorded data, and adjust test suites to certify or analyze systems as quickly as possible. In the past, Boeing has used a variety of dissimilar equipment and software to meet our testing needs. During the process of standardizing and streamlining testing processes, the PM was developed. PM combines Data Flow, VME, Ethernet, and PC architectures into a single integrated system. This approach allows PM to run applications, provide indistinguishable operator interfaces, and use databases and peripherals common to our other systems.
79

LOW EARTH ORBITER TERMINAL (LEO-T)

Harrison, Keith, Blevins, William 10 1900 (has links)
International Telemetering Conference Proceedings / October 27-30, 1997 / Riviera Hotel and Convention Center, Las Vegas, Nevada / The Low Earth Orbit Terminal (LEO-T) developed by AlliedSignal for NASA Wallops is a fully autonomous satellite tracking system which provides a reliable, high-quality satellite data collection and dissemination service. The procurement was initiated by NASA in an effort to provide more tracking capacity with a decreasing budget. A large mission set of NASA satellites in the next decade will not require the performance of existing large-aperture systems. NASA is planning to use the larger aperture antennas to support only those missions needing the higher performance. The remainder of the missions will be supported with the LEO-Ts, which are smaller, significantly less expensive, and fully automated. The procurement is also a first step towards fostering commercialization and privatization of small station acquisition and services. The system design features a modular architecture to simplify integration and to support affordable future expansion. This paper begins with a brief summary of the LEO-T program, then provides the design details and capabilities of the LEO-T system.
80

Managing lifetime reliability, performance, and power tradeoffs in multicore microarchitectures

Song, William J. 07 January 2016 (has links)
The objective of this research is to characterize and manage lifetime reliability, microarchitectural performance, and power tradeoffs in multicore processors. This dissertation comprises three research themes: 1) a modeling and simulation method for interacting multicore processor physics, 2) characterization and management of the performance and lifetime reliability tradeoff, and 3) extending Amdahl's Law to understand the lifetime reliability, performance, and energy efficiency of heterogeneous processors. With continued technology scaling, processor operation is increasingly dominated by multiple distinct physical phenomena and their coupled interactions. Understanding these behaviors requires the modeling of complex physical interactions. This dissertation first presents a novel simulation framework that orchestrates interactions between multiple physical models and microarchitecture simulators to enable research explorations at the intersection of application, microarchitecture, energy, power, thermal, and reliability. Using this framework, workload-induced variation of device degradation is characterized, and its impacts on processor lifetime and performance are analyzed. This research introduces a new metric to quantify the performance-reliability tradeoff. Lastly, theoretical models of heterogeneous multicore processors are proposed for understanding their performance, energy efficiency, and lifetime reliability consequences. It is shown that these system metrics are governed by Amdahl's Law and correlated as a function of processor composition, scheduling method, and Amdahl's scaling factor. This dissertation highlights the importance of multidimensional analysis and extends the scope of microarchitectural studies by incorporating the physical aspects of processor operation and design.
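To picture how Amdahl's Law ties speedup to processor composition, the sketch below uses the well-known Hill-and-Marty-style formulation for a heterogeneous chip (one large core plus many small cores, with single-core performance assumed to scale as the square root of the resources devoted to it). This is a generic illustration, not the dissertation's extended model, which also folds in energy efficiency and lifetime reliability.

```python
import math

# Heterogeneous Amdahl's Law sketch: one big core built from
# `big_core_units` base-core resources plus (n_units - big_core_units)
# small cores; big-core performance is assumed to scale as sqrt(resources).
def hetero_speedup(f_parallel, n_units, big_core_units):
    perf_big = math.sqrt(big_core_units)
    serial_time = (1.0 - f_parallel) / perf_big
    parallel_time = f_parallel / (perf_big + (n_units - big_core_units))
    return 1.0 / (serial_time + parallel_time)

# Sweep the composition for a 16-unit resource budget and a 90%-parallel
# workload; the best split balances serial and parallel execution time.
for r in (1, 2, 4, 8, 16):
    print(r, round(hetero_speedup(0.9, 16, r), 2))
```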
