1 |
Embedded In-Circuit Emulation and Tracing for Bus-based System-on-Chip Integration
Kao, Chung-fu, 10 September 2007
In the System-on-Chip (SoC) era, common industry estimates are that functional verification takes approximately 70% of the total effort on a project. Given time-to-market constraints, reducing SoC verification and debugging time efficiently is a challenge. Because a microprocessor is an essential part of an SoC, we first focus on the microprocessor debugging problem and present an in-circuit emulation (ICE) module that can be embedded with a microprocessor core. The ICE module, based on the IEEE 1149.1 JTAG architecture, supports typical debugging and testing mechanisms, including boundary scan paths, partial scan paths, single stepping, internal resource monitoring and modification, breakpoint detection, and switching between debugging and normal modes. The architecture of the ICE module is parameterized and retargetable to different microprocessors. It has been successfully integrated with two microprocessors with significantly different architectures: an 8-bit industrial embedded microcontroller, the HT48x00, and a 32-bit ARM7-like embedded microprocessor. FPGA prototypes and a chip implementation have been completed. Experiments show that real-time (on-line) debugging at full speed is possible with the embedded ICE at a minor gate count overhead.
Collecting program execution traces at full speed is essential to analyzing and debugging the real-time software behavior of a complex system. However, real-time program traces are generated at such a high rate and in such volume that real-time program tracing is often infeasible without proper hardware support. This paper presents a hardware approach that compresses program execution traces in real time to reduce the trace size. The approach consists of three modular phases: (1) branch/target filtering, (2) branch/target address encoding, and (3) Lempel-Ziv-based data compression. Synthesizable RTL code for the proposed hardware is written to analyze hardware cost and speed, and typical multimedia benchmarks are used to measure the compression results. The results show that our hardware is capable of real-time compression and achieves a compression ratio of 454:1, far better than the 5:1 achieved by typical existing hardware approaches. Furthermore, our modular approach makes it possible to trade off hardware cost (typically from 1K to 50K gates) against the achievable compression ratio (typically from 5:1 to 454:1).
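The three phases named above can be illustrated with a small software sketch. This is not the hardware design from the thesis: the function names, the fixed 4-byte instruction step, the delta encoding used for phase 2, and the use of zlib's DEFLATE (an LZ77-based codec) as a stand-in for the Lempel-Ziv stage are all assumptions made for illustration.

```python
import struct
import zlib


def filter_branches(pcs, step=4):
    """Phase 1 (assumed form): keep only (branch, target) pairs, i.e. the
    places where the program counter does not advance by one instruction."""
    return [(a, b) for a, b in zip(pcs, pcs[1:]) if b != a + step]


def encode_addresses(pairs):
    """Phase 2 (assumed form): replace absolute addresses with signed
    differences, which are small and repetitive for loops."""
    deltas = []
    prev = 0
    for branch, target in pairs:
        deltas.append(branch - prev)    # distance from the previous target
        deltas.append(target - branch)  # jump distance
        prev = target
    return deltas


def compress(deltas):
    """Phase 3 (stand-in): pack the deltas as 64-bit signed integers and
    apply DEFLATE, which is built on LZ77."""
    raw = struct.pack(f"<{len(deltas)}q", *deltas)
    return zlib.compress(raw, 9)


# Hypothetical trace: a four-instruction loop executed 1000 times.
trace = [0x100, 0x104, 0x108, 0x10C] * 1000
blob = compress(encode_addresses(filter_branches(trace)))
print(len(trace) * 4, "raw bytes ->", len(blob), "compressed bytes")
```

Because sequential instructions can be reconstructed from the branch/target pairs alone, most of the reduction in this toy example comes from the first phase. The later phases mainly exploit the repetition that loops introduce.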
For SoC debugging, bus signal tracing means collecting the information generated by the system for later observation, debugging and analysis. However, real-time system traces are generated at such a high rate and in such volume that a tracing mechanism that reduces trace size efficiently is needed. In this paper, we propose a multi-resolution bus trace approach. The hardware bus tracer consists of two major stages: (1) a signal monitoring and tracing stage, and (2) a trace compression stage. In the first stage, the designer can trace the signals in detail or coarsely, depending on the debugging purpose. In other words, the multi-resolution trace approach provides a trade-off between trace accuracy and trace depth. In the second stage, the bus tracer compresses the trace efficiently, thereby increasing the effective capacity of the on-chip storage. On the host, the analyzer tool decompresses the trace data for later observation and debugging.
|
2 |
Variable length pattern coding for power reduction in off-chip data buses
Venkitasubramanian Iyer, Jayakrishnan, 15 May 2009
Off-chip buses consume a large fraction (20%-40%) of system power. Hence, techniques such as increasing bus widths and transition encoding have been used for power reduction on off-chip data buses. Since capacitances at the I/O pads and interwire capacitances contribute significantly to the increase in power, encoding/decoding schemes have been developed to reduce the switching activity of the off-chip bus lines, thus reducing power. Frequent Value Encoding (FVE) [1], Frequent Value Encoding with XOR (FVExor) [1] and VALVE [2] are some of the better-known encoding schemes, but they still leave scope for improvement.
This thesis addresses the problem of power reduction in off-chip data buses by encoding a variable number (1 to 4) of fixed-size (32-bit) data values (variable length patterns) that exhibit temporal locality. This characteristic enables us to cache these patterns using a 64-entry CAM at the encoder and a 64-entry SRAM at the decoder. Whenever a pattern match occurs, a 2-bit code indicating the index of the match is sent. If a variable length pattern match occurs, the code and the unmatched portion of the data are sent.
We implemented our scheme, Variable Length Pattern Coding (VLPC), for various integer and floating-point benchmarks and observed 6% to 49% encodable patterns in these benchmarks. Based on experiments on SimpleScalar and our analysis in MATLAB, we obtained a 4.88% to 40.11% reduction in transition activity over unencoded data for SPEC2000 benchmarks such as crafty, swim, mcf, applu and ammp. This is 0.3% to 38.9% higher than that obtained using the FVE, FVExor [1] and VALVE [2] encoding schemes. Finally, we designed a low-power custom CAM and SRAM using 45nm BSIM4 technology models, which were used to verify the lower latency of data matching and storing.
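Two quantities in this abstract, transition activity and the share of encodable values, can be made concrete with a short sketch. It does not reproduce VLPC itself: it only counts bit flips on an unencoded bus and measures how often a value is already present in a small cache of recent values. The LRU replacement policy, the all-zero initial bus state, and the function names are assumptions for illustration.

```python
from collections import OrderedDict


def count_transitions(words, width=32):
    """Total bit flips between consecutive words driven on a bus.
    Assumes the bus lines start at all zeros."""
    mask = (1 << width) - 1
    total = 0
    prev = 0
    for w in words:
        w &= mask
        total += bin(prev ^ w).count("1")  # Hamming distance to previous word
        prev = w
    return total


def encodable_fraction(words, entries=64):
    """Fraction of words that hit in a cache of recently seen values.
    LRU replacement is an assumption; the abstract does not name a policy."""
    cache = OrderedDict()
    hits = 0
    for w in words:
        if w in cache:
            hits += 1
            cache.move_to_end(w)           # mark as most recently used
        else:
            cache[w] = True
            if len(cache) > entries:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(words) if words else 0.0


# Hypothetical data stream with strong temporal locality.
stream = [0x0000FFFF, 0xFFFF0000, 0x0000FFFF, 0xFFFF0000, 0x12345678]
print(count_transitions(stream), encodable_fraction(stream))
```

A scheme in the spirit of the abstract would send a short code instead of the full value on each hit, so fewer bus lines toggle; the hit fraction gives a rough sense of how much room such a scheme has.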
|
3 |
Implementación de Interfaz PCI Sobre Plataforma Industrial Basada en Dispositivo FPGA / Implementation of a PCI Interface on an Industrial Platform Based on an FPGA Device
Román Asenjo, Enrique Efraín, January 2009
ISIS is an industrial motherboard developed in Chile by Continental Lensa S.A., designed to support SoPCs (Systems on a Programmable Chip) on an FPGA (Field Programmable Gate Array) device integrated with a set of on-board peripherals. The ability to support SoPCs based on the Nios II processor and the uClinux operating system, together with various intellectual property hardware cores (IP cores), opens up a universe of applications ranging from system control and digital signal processing to digital radio and television systems.
ISIS incorporates a PMC (PCI Mezzanine Card) connector, a mechanical specification for small, parallel-mounted PCI systems, in contrast to the conventional PCI standard, in which cards are mounted perpendicularly. However, PCI devices cannot be controlled from the ISIS platform without adequate hardware and software support that provides a bus interface meeting the requirements of the PCI standard.
This work gives the ISIS platform support for connectivity with 3.3 V, 32-bit, 33 MHz PCI devices. It contributes the implementation of a PCI chipset embedded in the FPGA device, software support for operation with the uClinux operating system, and an application for hardware control and diagnostics. It also contributes new hardware that resolves the incompatibility between the complex mechanical standards PCI Mezzanine Card and conventional PC PCI.
One of the contributions is the implementation of the freely distributed PCI Bridge IP core from Opencores, which has a Wishbone bus interface, in an SoPC whose native communication architecture is the Avalon System Interconnect Fabric; this requires implementing adaptation logic between two incompatible SoC interconnect standards. In addition, the system requirements demand that the PCI Bridge IP core be implemented in Host mode, whereas it is available only with operational tests in Guest mode, which poses the challenge of implementing functionality that has not been through a validation process. A software layer that connects the PCI hardware to the Linux kernel is also developed, along with a program for controlling and diagnosing the devices present on the bus.
This work forms a fundamental part of the third-generation digital broadcasting equipment GSD-21 Exgine. The hardware core of the equipment is the ISIS platform integrated with the PCI device DUC-II (Next Generation Digital Up Converter) by means of the hardware and software systems developed. An average transfer rate of 14.5 MByte/s is obtained for PCI transfers using DMA, along with a bus error rate of zero over 24 hours of uninterrupted operation of the GSD-21 equipment.
|
4 |
"Implementação do barramento on-chip AMBA baseada em computação reconfigurável" / Implementation of on-chip AMBA bus based on Reconfigurable Computing
Queiroz, Daniel Cruz de, 04 February 2005
Reconfigurable computing is steadily gaining strength thanks to major advances in reprogrammable devices and in the hardware design tools in use today. These advances make hardware development much less laborious and complicated, easing the designer's work. The technology currently used in reconfigurable computing projects is the FPGA (Field Programmable Gate Array), which combines characteristics of both software (flexibility) and hardware (performance). This provides a favorable environment for developing applications that need good performance without requiring a fixed configuration. The purpose of this work was to implement an efficient bus enabling communication among the different cores of a reconfigurable robot, which may be spread across different FPGA devices. The bus follows the AMBA (Advanced Microcontroller Bus Architecture) standard, which belongs to ARM. The complete AMBA core was developed using VHDL (Very High Speed Integrated Circuit Hardware Description Language) and appropriate EDA (Electronic Design Automation) tools. It is important to note that, although the bus was designed for use in a robot, it can be used in any on-chip system.
|
5 |
Automatic Generation of On-Chip Bus Infrastructure for System-on-Chip
Chen, Chun-Chang, 15 December 2004
For the on-chip bus, flexibility is the key to reuse: it lets developers select the optimal architecture to efficiently meet the performance requirements of a wide variety of systems. AMBA is an open-standard on-chip bus specification that details a strategy for the interconnection and management of the functional blocks that make up a System-on-Chip (SoC). AMBA lets designers multiply the total bandwidth available in a system without changing the bus interface on existing intellectual property (IP) cores. The AMBA Multi-layer architecture also lets the SoC designer select the optimal combination of bus frequency (to match the peripherals) and number of channels (to achieve the bandwidth). The AHB of the AMBA system bus connects embedded processors such as an ARM core to high-performance peripherals, DMA controllers, on-chip memory and interfaces. It is a high-speed, high-bandwidth bus that supports multi-master bus management to maximize system performance. In this thesis, we implement a software tool, Automatic Generation of On-Chip Bus Infrastructure for SoC, which supports AMBA AHB, the Multi-layer AHB architecture to optimize system bandwidth, and AHB-Lite to streamline single-master layers. Based on the user's settings, it generates the corresponding on-chip bus infrastructure. We use the AHB monitors of SDV and Synopsys, respectively, to validate the protocol of the infrastructure. For test patterns, we use a Bus Functional Model to verify all transfer types of the bus. For the hardware implementation, we use SYS32TM, SYS32TME, SYS16TM, and MEMCU to integrate the three types of AHB. For every example, we also build an FPGA prototype and a chip layout to validate our on-chip bus infrastructure.
|
7 |
Modelling and Analysis of Interconnects for Deep Submicron Systems-on-Chip
Pamunuwa, Dinesh, January 2003
The last few decades have been a very exciting period in the development of micro-electronics and brought us to the brink of implementing entire systems on a single chip, on a hitherto unimagined scale. However an unforeseen challenge has cropped up in the form of managing wires, which have become the main bottleneck in performance, masking the blinding speed of active devices. A major problem is that increasingly complicated effects need to be modelled, but the computational complexity of any proposed model needs to be low enough to allow many iterations in a design cycle.
This thesis addresses the issue of closed form modelling of the response of coupled interconnect systems. Following a strict mathematical approach, second order models for the transfer functions of coupled RC trees based on the first and second moments of the impulse response are developed. The 2-pole-1-zero transfer function that is the best possible from the available information is obtained for the signal path from each driver to the output in multiple aggressor systems. This allows the complete response to be estimated accurately by summing up the individual waveforms. The model represents the minimum complexity for a 2-pole-1-zero estimate, for this class of circuits.
Also proposed are new techniques for the optimisation of wires in on-chip buses. Rather than minimising the delay over each individual wire, the configuration that maximises the total bandwidth over a number of parallel wires is investigated. It is shown from simulations that there is a unique optimal solution which does not necessarily translate to the maximum possible number of wires, and in fact deviates considerably from it when the resources available for repeaters are limited. Analytic guidelines dependent only on process parameters are derived for optimal sizing of wires and repeaters.
Finally regular tiled architectures with a common communication backplane are being proposed as being the most efficient way to implement systems-on-chip in the deep submicron regime. This thesis also considers the feasibility of implementing a regular packet-switched network-on-chip in a typical future deep submicron technology. All major physical issues and challenges are discussed for two different architectures and important limitations are identified.
|
8 |
IEEE 802.15.4 Protocol Stack Library Implementation, Hardware Design, and Applications in Medical Monitoring
Yang, Cheng-Yen, 12 July 2010
Owing to the rapid development of semiconductor technology, the number of transistors per unit area of an integrated circuit doubles roughly every two years. More circuits and functionality can therefore be integrated into a single chip, and the size of electronic products shrinks accordingly. In addition, with the growing popularity of wireless network standards in recent years, sensors have been connected wirelessly to provide more functionality and intelligence, forming what is known as a wireless sensor network (WSN). Before long, integrated circuit design will emphasize not only front-end circuits and hardware design but also integration and functionality, which is known as system-on-chip (SOC) design.
The first topic of this thesis is the implementation of an IEEE 802.15.4 network prototype and its hardware design. The main purpose of prototyping is to realize a highly portable IEEE 802.15.4 protocol stack library that can be quickly ported to different hardware platforms, thus shortening the time to market. In the ASIC hardware design, we use the WISHBONE bus as the interconnection architecture, which can be easily integrated into current SOC designs for embedded systems.
The second topic is an application of IEEE 802.15.4 in medical monitoring, including system prototyping and ASIC hardware design, which collects bladder pressure readings over a wireless link and ECG signals from our ASIC sensors. Finally, we realize medical monitoring in a prototype system.
|
10 |
AXI-PACK: Near-memory Bus Packing for Bandwidth-Efficient Irregular Workloads / AXI-PACK: Busspackning med nära minne för bandbreddseffektiv oregelbunden arbetsbelastning
Zhang, Chi, January 2022
General-purpose processors (GPPs) are expected to deliver high performance in data-intensive applications such as deep learning and high-performance computing (HPC), where algorithm kernels like GEMM (general matrix-matrix multiply) and SpMV (sparse matrix-vector multiply) are used intensively. The performance of these data-intensive applications is bounded by memory bandwidth, which is limited by the coupling of computation and memory access and by the memory wall effect. Recent works proposed streaming ISA extensions to maximize memory bandwidth; these decouple computation from memory access, prefetch data according to the memory access pattern, and hide architectural latency. However, the performance of irregular memory access still suffers from low bus utilization when narrow stream elements are transferred on wide memory buses. To solve this problem, the project proposes a new on-chip bus protocol, AXI-PACK, extended from the Advanced eXtensible Interface 4 (AXI4) on-chip protocol, which enables high-bandwidth end-to-end irregular memory streaming. Next, an on-chip multi-banked SRAM memory system is designed to support AXI-PACK, and AXI-PACK is evaluated in an open-source RISC-V vector processor system. AXI-PACK demonstrates high bus utilization and bandwidth in irregular access, which speeds up the GEMM kernel (element size = 32 bits) by 6.1 times and the SpMV kernel (element size = 32 bits) by 3.0 times with a bus data width of 256 bits, compared to the standard AXI4 bus.
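The bus utilization problem described in this abstract can be illustrated with simple arithmetic for the stated 32-bit elements and 256-bit bus. The sketch below is not the AXI-PACK protocol: it assumes that an unpacked irregular access spends one full bus beat per element and that ideal packing fills every beat completely. The function names are hypothetical.

```python
def beats_unpacked(n_elems):
    """Assumed baseline: each narrow element occupies one full bus beat."""
    return n_elems


def beats_packed(n_elems, elem_bits=32, bus_bits=256):
    """Assumed ideal packing: as many elements per beat as the bus can hold."""
    per_beat = bus_bits // elem_bits
    return -(-n_elems // per_beat)  # ceiling division


def utilization(n_elems, beats, elem_bits=32, bus_bits=256):
    """Useful payload bits divided by the bits the bus could have carried."""
    return n_elems * elem_bits / (beats * bus_bits)


n = 1024
print(utilization(n, beats_unpacked(n)))  # 32/256 of each beat is payload
print(utilization(n, beats_packed(n)))    # every beat is full
```

Under these assumptions, packing reduces the beat count by a factor of 256 / 32 = 8. This bounds only the bus transfer side; the kernel speedups reported in the abstract (6.1 and 3.0 times) are lower, as expected when other parts of the system also contribute to run time.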
|