Global ETD Search

211	Conception d’une mémoire SRAM en tension sous le seuil pour des applications biomédicales et les nœuds de capteurs sans fils en technologies CMOS avancées / Solutions of subthreshold SRAM in ultra-wide-voltage range in advanced CMOS technologies for biomedical and wireless sensor applications Feki, Anis 29 May 2015 (has links) L’émergence des circuits complexes numériques, ou System-On-Chip (SOC), pose notamment la problématique de la consommation énergétique. Parmi les blocs fonctionnels significatifs à ce titre, apparaissent les mémoires et en particulier les mémoires statiques (SRAM). La maîtrise de la consommation énergétique d’une mémoire SRAM inclue la capacité à rendre la mémoire fonctionnelle sous très faible tension d’alimentation, avec un objectif agressif de 300 mV (inférieur à la tension de seuil des transistors standard CMOS). Dans ce contexte, les travaux de thèse ont concerné la proposition d’un point mémoire SRAM suffisamment performant sous très faible tension d’alimentation et pour les nœuds technologiques avancés (CMOS bulk 28nm et FDSOI 28nm). Une analyse comparative des architectures proposées dans l’état de l’art a permis d’élaborer deux points mémoire à 10 transistors avec de très faibles impacts de courant de fuite. Outre une segmentation des ports de lecture, les propositions reposent sur l’utilisation de périphéries adaptées synchrones avec notamment une solution nouvelle de réplication, un amplificateur de lecture de données en mode tension et l’utilisation d’une polarisation dynamique arrière du caisson SOI (Body Bias). Des validations expérimentales s’appuient sur des circuits en technologies avancées. Enfin, une mémoire complète de 32kb (1024x32) a été soumise à fabrication en 28 FDSOI. Ce circuit embarque une solution de test (BIST) capable de fonctionner sous 300mV d’alimentation. Après une introduction générale, le 2ème chapitre du manuscrit décrit l’état de l’art. Le chapitre 3 présente les nouveaux points mémoire. Le 4ème chapitre décrit l’amplificateur de lecture avec la solution de réplication. Le chapitre 5 présente l’architecture d’une mémoire ultra basse tension ainsi que le circuit de test embarqué. Les travaux ont donné lieu au dépôt de 4 propositions de brevet, deux conférences internationales, un article de journal international est accepté et un autre vient d’être soumis. / Emergence of large Systems-On-Chip introduces the challenge of power management. Of the various embedded blocks, static random access memories (SRAM) constitute the angrier contributors to power consumption. Scaling down the power supply is one way to act positively on power consumption. One aggressive target is to enable the operation of SRAMs at Ultra-Low-Voltage, i.e. as low as 300 mV (lower than the threshold voltage of standard CMOS transistors). The present work concerned the proposal of SRAM bitcells able to operate at ULV and for advanced technology nodes (either CMOS bulk 28 nm or FDSOI 28 nm). The benchmarking of published architectures as state-of-the-art has led to propose two flavors of 10-transitor bitcells, solving the limitations due to leakage current and parasitic power consumption. Segmented read-ports have been used along with the required synchronous peripheral circuitry including original replica assistance, a dedicated unbalanced sense amplifier for ULV operation and dynamic forward back-biasing of SOI boxes. Experimental test chips are provided in previously mentioned technologies. A complete memory cut of 32 kbits (1024x32) has been designed with an embedded BIST block, able to operate at ULV. After a general introduction, the manuscript proposes the state-of-the-art in chapter two. The new 10T bitcells are presented in chapter 3. The sense amplifier along with the replica assistance is the core of chapter 4. The memory cut in FDSOI 28 nm is detailed in chapter 5. Results of the PhD have been disseminated with 4 patent proposals, 2 papers in international conferences, a first paper accepted in an international journal and a second but only submitted paper in an international journal. Electronique Cicrcuit numérique CMOS System on Chip - SOC Mémoire statique - SRAM Alimentation à très faible tension Circuit de test embarqué Electronics Digital CMOS circuit System on Chip - SOC Static random access memory - SRAM Ultra-Low-Voltage power supply Embedded test chips 621.397 072
212	A 16-Channel Fully Configurable Neural SoC With 1.52 μW/Ch Signal Acquisition, 2.79 μW/Ch Real-Time Spike Classifier, and 1.79 TOPS/W Deep Neural Network Accelerator in 22 nm FDSOI Zeinolabedin, Seyed Mohammad Ali, Schüffny, Franz Marcus, George, Richard, Kelber, Florian, Bauer, Heiner, Scholze, Stefan, Hänzsche, Stefan, Stolba, Marco, Dixius, Andreas, Ellguth, Georg, Walter, Dennis, Höppner, Sebastian, Mayr, Christian 20 January 2023 (has links) With the advent of high-density micro-electrodes arrays, developing neural probes satisfying the real-time and stringent power-efficiency requirements becomes more challenging. A smart neural probe is an essential device in future neuroscientific research and medical applications. To realize such devices, we present a 22 nm FDSOI SoC with complex on-chip real-time data processing and training for neural signal analysis. It consists of a digitally-assisted 16-channel analog front-end with 1.52 μ W/Ch, dedicated bio-processing accelerators for spike detection and classification with 2.79 μ W/Ch, and a 125 MHz RISC-V CPU, utilizing adaptive body biasing at 0.5 V with a supporting 1.79 TOPS/W MAC array. The proposed SoC shows a proof-of-concept of how to realize a high-level integration of various on-chip accelerators to satisfy the neural probe requirements for modern applications. info:eu-repo/classification/ddc/570 ddc:570 info:eu-repo/classification/ddc/610 ddc:610 info:eu-repo/classification/ddc/620 ddc:620
213	Deep Learning Model Deployment for Spaceborne Reconfigurable Hardware : A flexible acceleration approach Ferre Martin, Javier January 2023 (has links) Space debris and space situational awareness (SSA) have become growing concerns for national security and the sustainability of space operations, where timely detection and tracking of space objects is critical in preventing collision events. Traditional computer-vision algorithms have been used extensively to solve detection and tracking problems in flight, but recently deep learning approaches have seen widespread adoption in non-space related applications for their high accuracy. The performanceper-watt and flexibility of reconfigurable Field-Programmable Gate Arrays (FPGAs) make them a good candidate for deep learning model deployment in space, supporting in-flight updates and maintenance. However, the FPGA design costs of custom accelerators for complex algorithms remains high. The research focus of the thesis relies on novel high-level synthesis (HLS) workflows that allow the developer to raise the level of abstraction and lower design costs for deep learning accelerators, particularly for space-representative applications. To this end, four different hardware accelerators of convolutional neural network models for spacebased debris detection are implemented (ResNet, SqueezeNet, DenseNet, TinyCNN), using the open-source HLS tool NNgen. The obtained hardware accelerators are deployed to a reconfigurable module of the Zynq Ultrascale+ MPSoC programmable logic, and compared in terms of inference performance, resource utilization and latency. The tests on the target hardware show a detection accuracy over 95% for ResNet, DenseNet and SqueezeNet, and a localization intersection-over-union over 0.5 for the deep models, and over 0.7 for TinyCNN, for space debris objects at a range between 1km and 100km for a diameter of 1cm, or between 100km and 1000km for a diameter of 10cm. The obtained speed-ups with respect to software-only implementations lay between 3x and 32x for the different hardware accelerators. / Rymdskrot och rymdsituationstänksamhet (SSA) har blivit växande oro för nationell säkerhet och hållbarheten för rymdoperationer, där snabb upptäckt och spårning av rymdobjekt är avgörande för att förhindra kollisioner. Traditionella datorseendealgoritmer har använts omfattande för att lösa problem med upptäckt och spårning i flygning, men på senare tid har djupinlärningsmetoder fått stor användning inom icke rymdrelaterade applikationer på grund av sin höga noggrannhet. Prestandaper-watt och flexibiliteten hos omkonfigurerbara Field-Programmable Gate Arrays (FPGAs) gör dem till en bra kandidat för distribution av djupinlärningsmodeller i rymden, med stöd för uppdateringar och underhåll under flygning. Men FPGAdesignkostnaderna för anpassade acceleratorer för komplexa algoritmer är fortfarande höga. Forskningsfokus för avhandlingen ligger på nya högnivåsyntes (HLS) arbetsflöden som gör det möjligt för utvecklaren att höja abstraktionsnivån och sänka designkostnaderna för acceleratorer för djupinlärning, särskilt för tillämpningar i rymden. För detta har fyra olika hårdvaruacceleratorer för modeller av konvolutionsnätverk för upptäckt av rymdbaserat skrot implementerats (ResNet, SqueezeNet, DenseNet, TinyCNN), med hjälp av öppen källkod HLS-verktyget NNgen. De erhållna hårdvaruacceleratorerna distribueras till en omkonfigurerbar modul av Zynq Ultrascale+ MPSoC-programmerbar logik och jämförs med avseende på inferensprestanda, resursutnyttjande och latens. Testerna på målhardwaren visar en upptäktnoggrannhet på över 95% för ResNet, DenseNet och SqueezeNet, och en lokaliserings-intersektion-över-union på över 0,5 för de djupa modellerna och över 0,7 för TinyCNN för rymdskrotobjekt på en avstånd mellan 1 km och 100 km för en diameter på 1 cm eller mellan 100 km och 1000 km för en diameter på 10 cm. De erhållna hastighetsökningarna i förhållande till endast programvara ligger mellan 3x och 32x för de olika hårdvaruacceleratorerna. Space Situational Awareness Deep Learning Convolutional Neural Networks FieldProgrammable Gate Arrays System-On-Chip Computer Vision Dynamic Partial Reconfiguration High-Level Synthesis Rymdsituationstänksamhet Djupinlärning Konvolutionsnätverk System-On-Chip (SoC) Datorseende Dynamisk partiell omkonfigurering Högnivåsyntes. Elektroteknik och elektronik
214	EVALUATION OF SOURCE ROUTING FOR MESH TOPOLOGY NETWORK ON CHIP PLATFORMS MUBEEN, SAAD January 2009 (has links) <p>Network on Chip is a scalable and flexible communication infrastructure for the design of core based System on Chip. Communication performance of a NoC depends heavily on the routing algorithm. Deterministic and adaptive distributed routing algorithms have been advocated in all the current NoC architectural proposals. In this thesis we make a case for the use of source routing for NoCs, especially for regular topologies like mesh. The advantages of source routing include in-order packet delivery; faster and simpler router design; and possibility of mixing non-minimal paths in a mainly minimal routing. We propose a method to compute paths for various communications in such a way that traffic congestion is avoided while ensuring deadlock free routing. We also propose an efficient scheme to encode the paths.</p><p>We developed a tool in Matlab that computes paths for source routing for both general and application specific communications. Depending upon the type of traffic, this tool computes paths for source routing by selecting best routing algorithm out of many routing algorithms. The tool uses a constructive path improvement algorithm to compute paths that give more uniform link load distribution. It also generates different types of traffics. We also developed a simulator capable of simulating source routing for mesh topology NoC. The experiments and simulations which we performed were successful and the results show that the advantages of source routing especially lower packet latency more than compensate its disadvantages. The results also demonstrate that source routing can be a good routing candidate for practical core based SoCs design using network on chip communication infrastructure.</p> Network on Chip (NoC) System on Chip (SoC) Core Based Design On Chip Communication Distributed Routing Source Routing Routing Algorithms Performance Analysis Packet Switched Network Electronics Elektronik Electrical engineering Elektroteknik Computer engineering Datorteknik
215	Integrated silicon technology and hardware design techniques for ultra-wideband and next generation wireless systems Huo, Yiming 18 May 2017 (has links) The last two decades have witnessed the CMOS processes and design techniques develop and prosper with unprecedented speed. They have been widely employed in contemporary integrated circuit (IC) commercial products resulting in highly added value. Tremendous e orts have been devoted to extend and optimize the CMOS process and its application for future wireless communication systems. Meanwhile, the last twenty years have also seen the fast booming of the wireless communication technology typically characterized by the mobile communication technology, WLAN technology, WPAN technology, etc. Nowadays, the spectral resource is getting increasingly scarce, particularly over the frequency from 0.7 to 6 GHz, whether the employed frequency band is licensed or not. To combat this dilemma, the ultra wideband (UWB) technology emerges to provide a promising solution for short-range wireless communication while using an unlicensed wide band in an overlay manner. Another trend of obtaining more spectrum is moving upwards to higher frequency bands. The WiFi-Alliance has already developed a certi cation program of the 60-GHz band. On the other side, millimeterwave (mmWave) frequency bands such as 28-GHz, 38-GHz, and 71-GHz are likely to be licensed for next generation wireless communication networks. This new trend poses both a challenge and opportunity for the mmWave integrated circuits design. This thesis combines the state-of-the-art IC and hardware technologies and design techniques to implement and propose UWB and 5G prototyping systems. First of all, by giving a thorough analysis of a transmitted reference pulse cluster (TRPC) scheme and mathematical modeling, a TRPC-UWB transceiver structure is proposed and its features and speci cations are derived. Following that, the detailed design, fabrication and veri cation of the TRPC-UWB transmitter front end and wideband voltage-controlled oscillators (VCOs) in CMOS process is presented. The TRPCUWB transmitter demonstrates a state-of-the-art energy e ciency of 38.4 pJ/pulse. Secondly, a novel system architecture named distributed phased array based MIMO (DPA-MIMO) is proposed as a solution to overcome design challenges for the future 5G cellular user equipment (UE) design. In addition, a prototyping design of on-chip mmWave antenna with radiation e ciency enhancement is presented for the IEEE 802.11ad application. Furthermore, two wideband K-band VCO prototypes based on two di erent topologies are designed and fabricated in a standard CMOS process. They both show good performance at center frequencies of 22.3 and 26.1 GHz. Finally, two CMOS mmWave VCO prototypes working at the potential future 5G frequency bands are presented with measurement results. / Graduate / 2018-04-30 / amenghym@gmail.com UWB Next Generation Wireless Communication CMOS Silicon 5G Voltage-controlled Oscillator (VCO) Apple Inc. Low Noise Amplifier User Equipment Base Station Transmitter Receiver Antenna System-on-Chip (SoC) PCB Millimeter Wave
216	Etude de la synchronisation et de la stabilité d’un réseau d’oscillateurs non linéaires. Application à la conception d’un système d’horlogerie distribuée pour un System-on-Chip (projet HODISS). / Study of the synchronization and the stability of a network of non-linear oscillators. Application to the design of a clock distribution system for a System-on-Chip (HODISS Project). Akre, Niamba Jean-Michel 11 January 2013 (has links) Le projet HODISS dans le cadre duquel s'effectue nos travaux adresse la problématique de la synchronisation globale des systèmes complexes sur puce (System-on-Chip ou SOCs, par exemple un multiprocesseur monolithique). Les approches classiques de distribution d'horloges étant devenues de plus en plus obsolètes à cause de l'augmentation de la fréquence d'horloge, l'accroissement des temps de propagation, l'accroissement de la complexité des circuits et les incertitudes de fabrication, les concepteurs s’intéressent (pour contourner ces difficultés) à d'autres techniques basées entre autres sur les oscillateurs distribués. La difficulté majeure de cette dernière approche réside dans la capacité d’assurer le synchronisme global du système. Nous proposons un système d'horlogerie distribuée basé sur un réseau d’oscillateurs couplés en phase. Pour synchroniser ces oscillateurs, chacun d'eux est en fait une boucle à verrouillage de phase qui permet ainsi d'assurer un couplage en phase avec les oscillateurs des zones voisines. Nous analysons la stabilité de l'état synchrone dans des réseaux cartésiens identiques de boucles à verrouillage de phase entièrement numériques (ADPLLs). Sous certaines conditions, on montre que l'ensemble du réseau peut synchroniser à la fois en phase et en fréquence. Un aspect majeur de cette étude réside dans le fait que, en l'absence d'une horloge de référence absolue, le filtre de boucle dans chaque ADPLL est piloté par les fronts montants irréguliers de l'oscillateur local et, par conséquent, n'est pas régi par les mêmes équations d'état selon que l'horloge locale est avancée ou retardée par rapport au signal considéré comme référence. Sous des hypothèses simples, ces réseaux d'ADPLLs dits "auto-échantillonnés" peuvent être décrits comme des systèmes linéaires par morceaux dont la stabilité est notoirement difficile à établir. L'une des principales contributions que nous présentons est la définition de règles de conception simples qui doivent être satisfaites sur les coefficients de chaque filtre de boucle afin d'obtenir une synchronisation dans un réseau cartésien de taille quelconque. Les simulations transitoires indiquent que cette condition nécessaire de synchronisation peut également être suffisante pour une classe particulière d'ADPLLs "auto-échantillonnés". / The HODISS project, context in which this work is achieved, addresses the problem of global synchronization of complex systems-on-chip (SOCs, such as a monolithic multiprocessor). Since the traditional approaches of clock distribution are less used due to the increase of the clock frequency, increased delay, increased circuit complexity and uncertainties of manufacture, designers are interested (to circumvent these difficulties) to other techniques based among others on distributed synchronous clocks. The main difficulty of this latter approach is the ability to ensure the overall system synchronization. We propose a clock distribution system based on a network of phase-coupled oscillators. To synchronize these oscillators, each is in fact a phase-locked loop which allows to ensure a phase coupling with the nearest neighboring oscillators. We analyze the stability of the synchronized state in Cartesian networks of identical all-digital phase-locked loops (ADPLLs). Under certain conditions, we show that the entire network may synchronize both in phase and frequency. A key aspect of this study lies in the fact that, in the absence of an absolute reference clock, the loop-filter in each ADPLL is operated on the irregular rising edges of the local oscillator and consequently, does not use the same operands depending on whether the local clock is leading or lagging with respect to the signal considered as reference. Under simple assumptions, these networks of so-called “self-sampled” all-digital phase-locked-loops (SS-ADPLLs) can be described as piecewise-linear systems, the stability of which is notoriously difficult to establish. One of the main contributions presented here is the definition of simple design rules that must be satisfied by the coefficients of each loop-filter in order to achieve synchronization in a Cartesian network of arbitrary size. Transient simulations indicate that this necessary synchronization condition may also be sufficient for a specific class of SS-ADPLLs. Boucles à verrouillage de phase Systèmes sur puce Fonctions de lyapunov quadratiques Distribution d'horloges synchronisés Phase-locked loops Nonlinear oscillators synchronization System-on-chip Quadratic lyapunov functions Distributed synchronous clocking Digitally controlled oscillators 378.242
217	Stratégie de réduction des cycles thermiques pour systèmes temps-réel multiprocesseurs sur puce / Strategy to reduce thermal cycles for real-time multiprocessor systems-on-chip Baati, Khaled 19 December 2013 (has links) L'augmentation de la densité des transistors dans les circuits électroniques conduit à une augmentation de la consommation d'énergie induisant des phénomènes thermiques plus complexes à maitriser. Dans le cas de systèmes embarqués en environnement où la température ambiante varie dans des proportions importantes (automobile par exemple), ces phénomènes peuvent conduire à des problèmes de fiabilité. Parmi les mécanismes de défaillance observés, on peut citer les cycles thermiques (CT) qui induisent des déformations dans les couches métalliques de la puce pouvant conduire à des fissurations. L’objectif de la thèse est de proposer pour des architectures de type multiprocesseur sur puce une technique de réduction des CT subis par les processeurs, et ce en respectant les contraintes temps réel des applications. L’exemple du circuit MPC5517 de Freescale a été considéré. Dans un premier temps un modèle thermique de ce circuit a été élaboré à partir de mesures par une caméra thermique sur ce circuit décapsulé. Un environnement de simulation a été mis en oeuvre pour permettre d’effectuer simultanément des analyses thermiques et d’ordonnancement de tâches et mettre en évidence l’influence de la température sur la puissance dissipée. Une heuristique globale pour réduire à la fois les CT et la température maximale des processeurs a été étudiée. Elle tient compte des variations de la température ambiante et se base sur les techniques DVFS et DPM. Les résultats de simulation avec les algorithmes d’ordonnancement globaux RM, EDF et EDZL et avec différentes charges processeur (sur un circuit type MPC5517 et un UltraSparc T1) illustrent l’efficacité de la technique proposée. / Increasing the density of transistors in electronic circuits leads to an increase in energy consumption resulting in more complex thermal phenomena to master. For systems embedded in environments where the ambient temperature can vary in large range (e.g. automotive), these thermal effects can induce reliability problems. Among classical failure mechanisms thermal cycles (CTs) produce deformations in materials and play a major role in the cracking of the metal layers in the chip. The aim of the thesis is to propose a reduction technique of CTs suffered by the processor cores in a multiprocessor on chip architecture such that real-time application constraints are met. The example of the Freescale MPC5517 circuit has been considered. In a first step a thermal model of this circuit was developed. This was achieved from measurements taken by a thermal camera on a decapsulated circuit. Next, a simulation environment has been implemented allowing both the analysis of thermal behavior and the scheduling of tasks so as to highlight the influence of temperature on the dissipated power. A global heuristic to reduce both the CTs and the maximum temperature of processors has been studied. It takes into account variations in the ambient temperature and is based on DVFS and DPM techniques. Simulation results with global scheduling algorithms RM, EDF and EDZL and different processor loads (for a MPC5517 type circuit and a T1 UltraSparc from Sun Microsystems) illustrate the effectiveness of the proposed technique. Modélisation thermique Conception système Système sur puce Ordonnancement temps réel Basse consommation Fiabilité Cycles thermiques Co-simulation Temperature aware system-level design Multiprocessor-system on chip Real time scheduling Low-power Reliability Themal cycles Co-simulation
218	System Interconnection Design Trade-offs in Three-Dimensional (3-D) Integrated Circuits Weerasekera, Roshan January 2008 (has links) Continued technology scaling together with the integration of disparate technologies in a single chip means that device performance continues to outstrip interconnect and packaging capabilities, and hence there exist many difficult engineering challenges, most notably in power management, noise isolation, and intra and inter-chip communication. Significant research effort spanning many decades has been expended on traditional VLSI integration technologies, encompassing process, circuit and architectural issues to tackle these problems. Recently however, three- dimensional (3-D) integration has emerged as a leading contender in the challenge to meet performance, heterogeneous integration, cost, and size demands through this decade and beyond. Through silicon via (TSV) based 3-D wafer-level integration is an emerging vertical interconnect methodology that is used to route the signal and power supply links through all chips in the stack vertically. Delay and signal integrity (SI) calculation for signal propagation through TSVs is a critical analysis step in the physical design of such systems. In order to reduce design time and mirror well established practices, it is desirable to carry this out in two stages, with the physical structures being modelled by parasitic parameters in equivalent circuits, and subsequent analysis of the equivalent circuits for the desired metric. This thesis addresses both these issues. Parasitic parameter extraction is carried out using a field solver to explore trends in typical technologies to gain an insight into the variation of resistive, capacitive and inductive parasitics including coupling effects. A set of novel closed-form equations are proposed for TSV parasitics in terms of physical dimensions and material properties, allowing the electrical modelling of TSV bundles without the need for computationally expensive field-solvers. Suitable equivalent circuits including capacitive and inductive coupling are derived, and comparisons with field solver provided values are used to show the accuracy of the proposed parasitic parameter models for the purpose of performance and SI analysis. The deep submicron era saw the interconnection delay rather than the gate delay become the major bottleneck in modern digital design. The nature of this problem in 3-D circuits is studied in detail in this thesis. The ubiquitous technique of repeater insertion for reducing propagation delay and signal degradation is examined for TSVs, and suitable strategies and analysis techniques are proposed. Further, a minimal power smart repeater suitable for global on-chip interconnects, which has the potential to reduce power consumption by as much as 20% with respect to a traditional inverter is proposed. A modeling and analysis methodology is also proposed, that makes the smart repeater easier to amalgamate in CAD flows at different levels of hierarchy from initial signal planning to detailed place and route when compared to alternatives proposed in the literature. Finally, the topic of system-level performance estimation for massively integrated systems is discussed. As designers are presented with an extra spatial dimension in 3-D integration, the complexity of the layout and the architectural trade-offs also increase. Therefore, to obtain a true improvement in performance, a very careful analysis using detailed models at different hierarchical levels is crucial. This thesis presents a cohesive analysis of the technological, cost, and performance trade-offs for digital and mixed-mode systems, outlining the choices available at different points in the design and their ramifications / QC 20100916 Interconnects Parasitic Extraction Repeaters Signal Integrity System-on-Chip (SoP) System-in-Package (SiP) System-on-Package (SoP) Three-dimensional (3-D) Integration Through-Silicon-Via (TSV) Vertical Integration Information technology Informationsteknik
219	Design, Implementation and Evaluation of a Configurable NoC for AcENoCs FPGA Accelerated Emulation Platform Lotlikar, Swapnil Subhash 2010 August 1900 (has links) The heterogenous nature and the demand for extensive parallel processing in modern applications have resulted in widespread use of Multicore System-on-Chip (SoC) architectures. The emerging Network-on-Chip (NoC) architecture provides an energy-efficient and scalable communication solution for Multicore SoCs, serving as a powerful replacement for traditional bus-based solutions. The key to successful realization of such architectures is a flexible, fast and robust emulation platform for fast design space exploration. In this research, we present the design and evaluation of a highly configurable NoC used in AcENoCs (Accelerated Emulation platform for NoCs), a flexible and cycle accurate field programmable gate array (FPGA) emulation platform for validating NoC architectures. Along with the implementation details, we also discuss the various design optimizations and tradeoffs, and assess the performance improvements of AcENoCs over existing simulators and emulators. We design a hardware library consisting of routers and links using verilog hardware description language (HDL). The router is parameterized and has a configurable number of physical ports, virtual channels (VCs) and pipeline depth. A packet switched NoC is constructed by connecting the routers in either 2D-Mesh or 2D-Torus topology. The NoC is integrated in the AcENoCs platform and prototyped on Xilinx Virtex-5 FPGA. The NoC was evaluated under various synthetic and realistic workloads generated by AcENoCs' traffic generators implemented on the Xilinx MicroBlaze embedded processor. In order to validate the NoC design, performance metrics like average latency and throughput were measured and compared against the results obtained using standard network simulators. FPGA implementation of the NoC using Xilinx tools indicated a 76% LUT utilization for a 5x5 2D-Mesh network. A VC allocator was found to be the single largest consumer of hardware resources within a router. The router design synthesized at a frequency of 135MHz, 124MHz and 109MHz for 3-port, 4-port and 5-port configurations, respectively. The operational frequency of the router in the AcENoCs environment was limited only by the software execution latency even though the hardware itself could be clocked at a much higher rate. An AcENoCs emulator showed speedup improvements of 10000-12000X over HDL simulators and 5-15X over software simulators, without sacrificing cycle accuracy. FPGA NoC Network-on-Chip On-Chip Communication SoC System-on-Chip RTL HDL Hardware Description Language Hardware-Software Emulation Framework Network-on-Chip Emulation Network-on-Chip Validation Network-on-Chip Design Space Exploration
220	Microprocessor power management and a stand-alone benchmarking application for Android based platforms Yeager, Hans L. 19 January 2012 (has links) Components used in mobile hand-held devices (smart phones and tablets) vary greatly in performance and power consumption. The microprocessors used in these devices also have vastly different capabilities and manufacturing limitations leading to significant variation effects. Battery life is a significant concern to the end users of these products. A stand-alone Android application capable of benchmarking a device's performance and power consumption is introduced. The application does not require the end user to have any analytic equipment or to have a technical background. This enables individual end users to better understand their particular device's performance and battery life interaction. They may also use the application to determine if their device's performance or battery life has degraded over time. Data is also uploaded to a central location so that devices can be compared against each other. The benchmarking application is capable of resolving variation effects caused by device, environmental changes and power management actions. This application demonstrates the feasibility of creating a low cost ecosystem where thousands of devices can be quantitatively compared. / text Android System on chip SOC Benchmark Performance Battery life MobilePowerBench Processor Microprocessor Power Power management Power delivery Power estimation Power virus TDP Average power Idle power DVFS Clock gating ACPI C-state P-state S-state OSPM Power gate DRM DTM

Search results