Global ETD Search

241	Designing Low Power and High Performance Network-on-Chip Communication Architectures for Nanometer SoCs Reehal, Gursharan Kaur 19 July 2012 (has links) No description available. Electrical Engineering Network-on-Chip (NoC) System-on-Chip (SoC) NoC Interconnects NoC Power Modeling Low Power NoC design Cliche Network BFT Network SPIN Network Octagon Network Asynchronous NoC design Power Models
242	A Trusted Autonomic Architecture to Safeguard Cyber-Physical Control Leaf Nodes and Protect Process Integrity Chiluvuri, Nayana Teja 16 September 2015 (has links) Cyber-physical systems are networked through IT infrastructure and susceptible to malware. Threats targeting process control are much more safety-critical than traditional computing systems since they jeopardize the integrity of physical infrastructure. Existing defence mechanisms address security at the network nodes but do not protect the physical infrastructure if network integrity is compromised. An interface guardian architecture is implemented on cyber-physical control leaf nodes to maintain process integrity by enforcing high-level safety and stability policies. Preemptive detection schemes are implemented to monitor process behavior and anticipate malicious activity before process safety and stability are compromised. Autonomic properties are employed to automatically protect process integrity by initiating switch-over to a verified backup controller. Subsystems adhere to strict trust requirements safeguarding them from adversarial intrusion. The preemptive detection schemes, switch-over logic, backup controller, and process communication are all trusted components that are separated from the untrusted production controller. The proposed architecture is applied to a rotary inverted pendulum experiment and implemented on a Xilinx Zynq-7000 configurable SoC. The leaf node implementation is integrated into a cyber-physical control topology. Simulated attack scenarios show strengthened resilience to both network integrity and reconfiguration attacks. Threats attempting to disrupt process behavior are successfully thwarted by having a backup controller maintain process stability. The system ensures both safety and liveness properties even under adversarial conditions. / Master of Science Process control systems cyber-physical systems autonomic systems programmable logic controller remote terminal unit human-machine interface Field programmable gate arrays trust configurable system-on-chip heterogeneous computing high-level synthesis
243	Implementation of Bolt Detection and Visual-Inertial Localization Algorithm for Tightening Tool on SoC FPGA / Implementering av bultdetektering och visuell tröghetslokaliseringsalgoritm för åtdragningsverktyg på SoC FPGA Al Hafiz, Muhammad Ihsan January 2023 (has links) With the emergence of Industry 4.0, there is a pronounced emphasis on the necessity for enhanced flexibility in assembly processes. In the domain of bolt-tightening, this transition is evident. Tools are now required to navigate a variety of bolts and unpredictable tightening methodologies. Each bolt, possessing distinct tightening parameters, necessitates a specific sequence to prevent issues like bolt cross-talk or unbalanced force. This thesis introduces an approach that integrates advanced computing techniques with machine learning to address these challenges in the tightening areas. The primary objective is to offer edge computation for bolt detection and tightening tools' precise localization. It is realized by leveraging visual-inertial data, all encapsulated within a System-on-Chip (SoC) Field Programmable Gate Array (FPGA). The chosen approach combines visual information and motion detection, enabling tools to quickly and precisely do the localization of the tool. All the computing is done inside the SoC FPGA. The key element for identifying different bolts is the YOLOv3-Tiny-3L model, run using the Deep-learning Processor Unit (DPU) that is implemented in the FPGA. In parallel, the thesis employs the Error-State Extended Kalman Filter (ESEKF) algorithm to fuse the visual and motion data effectively. The ESEKF is accelerated via a full implementation in Register Transfer Level (RTL) in the FPGA fabric. We examined the empirical outcomes and found that the visual-inertial localization exhibited a Root Mean Square Error (RMSE) position of 39.69 mm and a standard deviation of 9.9 mm. The precision in orientation determination yields a mean error of 4.8 degrees, offset by a standard deviation of 5.39 degrees. Notably, the entire computational process, from the initial bolt detection to its final localization, is executed in 113.1 milliseconds. This thesis articulates the feasibility of executing bolt detection and visual-inertial localization using edge computing within the SoC FPGA framework. The computation trajectory is significantly streamlined by harnessing the adaptability of programmable logic within the FPGA. This evolution signifies a step towards realizing a more adaptable and error-resistant bolt-tightening procedure in industrial areas. / Med framväxten av Industry 4.0, finns det en uttalad betoning på nödvändigheten av ökad flexibilitet i monteringsprocesser. Inom området bultåtdragning är denna övergång tydlig. Verktyg krävs nu för att navigera i en mängd olika bultar och oförutsägbara åtdragningsmetoder. Varje bult, som har distinkta åtdragningsparametrar, kräver en specifik sekvens för att förhindra problem som bultöverhörning eller obalanserad kraft. Detta examensarbete introducerar ett tillvägagångssätt som integrerar avancerade datortekniker med maskininlärning för att hantera dessa utmaningar i skärpningsområdena. Det primära målet är att erbjuda kantberäkning för bultdetektering och åtdragningsverktygs exakta lokalisering. Det realiseras genom att utnyttja visuella tröghetsdata, allt inkapslat i en System-on-Chip (SoC) Field Programmable Gate Array (FPGA). Det valda tillvägagångssättet kombinerar visuell information och rörelsedetektering, vilket gör det möjligt för verktyg att snabbt och exakt lokalisera verktyget. All beräkning sker inuti SoC FPGA. Nyckelelementet för att identifiera olika bultar är YOLOv3-Tiny-3L-modellen, som körs med hjälp av Deep-learning Processor Unit (DPU) som är implementerad i FPGA. Parallellt använder avhandlingen algoritmen Error-State Extended Kalman Filter (ESEKF) för att effektivt sammansmälta visuella data och rörelsedata. ESEKF accelereras via en fullständig implementering i Register Transfer Level (RTL) i FPGA-strukturen. Vi undersökte de empiriska resultaten och fann att den visuella tröghetslokaliseringen uppvisade en Root Mean Square Error (RMSE) position på 39,69 mm och en standardavvikelse på 9,9 mm. Precisionen i orienteringsbestämningen ger ett medelfel på 4,8 grader, kompenserat av en standardavvikelse på 5,39 grader. Noterbart är att hela beräkningsprocessen, från den första bultdetekteringen till dess slutliga lokalisering, exekveras på 113,1 millisekunder. Denna avhandling artikulerar möjligheten att utföra bultdetektering och visuell tröghetslokalisering med hjälp av kantberäkning inom SoC FPGA-ramverket. Beräkningsbanan är avsevärt effektiviserad genom att utnyttja anpassningsförmågan hos programmerbar logik inom FPGA. Denna utveckling innebär ett steg mot att förverkliga en mer anpassningsbar och felbeständig skruvdragningsprocedur i industriområden. Bolt detection Visual-Inertial localization System-on-Chip (SoC) Field-Programmable Gate Array (FPGA) Machine learning Perspective-n-Points High-Level Synthesis (HLS) YOLO Tightening tool Bultdetektering visuell-tröghetslokalisering System-on-Chip (SoC) Field-Programmable Gate Array (FPGA) Machine Learning Perspective-n-Points High-Level Synthesis (HLS) YOLO åtdragningsverktyg Computer Sciences Datavetenskap (datalogi) Computer Engineering Datorteknik
244	Réalisation d'un réseau de neurones "SOM" sur une architecture matérielle adaptable et extensible à base de réseaux sur puce "NoC" / Neural Network Implementation on an Adaptable and Scalable Hardware Architecture based-on Network-on-Chip Abadi, Mehdi 07 July 2018 (has links) Depuis son introduction en 1982, la carte auto-organisatrice de Kohonen (Self-Organizing Map : SOM) a prouvé ses capacités de classification et visualisation des données multidimensionnelles dans différents domaines d’application. Les implémentations matérielles de la carte SOM, en exploitant le taux de parallélisme élevé de l’algorithme de Kohonen, permettent d’augmenter les performances de ce modèle neuronal souvent au détriment de la flexibilité. D’autre part, la flexibilité est offerte par les implémentations logicielles qui quant à elles ne sont pas adaptées pour les applications temps réel à cause de leurs performances temporelles limitées. Dans cette thèse nous avons proposé une architecture matérielle distribuée, adaptable, flexible et extensible de la carte SOM à base de NoC dédiée pour une implantation matérielle sur FPGA. A base de cette approche, nous avons également proposé une architecture matérielle innovante d’une carte SOM à structure croissante au cours de la phase d’apprentissage / Since its introduction in 1982, Kohonen’s Self-Organizing Map (SOM) showed its ability to classify and visualize multidimensional data in various application fields. Hardware implementations of SOM, by exploiting the inherent parallelism of the Kohonen algorithm, allow to increase the overall performances of this neuronal network, often at the expense of the flexibility. On the other hand, the flexibility is offered by software implementations which on their side are not suited for real-time applications due to the limited time performances. In this thesis we proposed a distributed, adaptable, flexible and scalable hardware architecture of SOM based on Network-on-Chip (NoC) designed for FPGA implementation. Moreover, based on this approach we also proposed a novel hardware architecture of a growing SOM able to evolve its own structure during the learning phase Réseau de neurones artificiels (RNA) Carte auto-organisatrice (SOM) Réseau sur puce (NoC) Architecture auto-adaptable Réseau de neurones évolutif Artificial neural network (ANN) Self-Organizing Maps (SOM) Network-on-Chip (NoC) Multiprocessor system-on-chip (MPSoC) Self-adaptive architecture Evolving Artificial Neural Networks 621.381 5 006.32
245	Models, Design Methods and Tools for Improved Partial Dynamic Reconﬁguration / Modelle, Entwurfsmethoden und -Werkzeuge für die partielle dynamische Rekonfiguration Rullmann, Markus 14 October 2010 (has links) (PDF) Partial dynamic reconﬁguration of FPGAs has attracted high attention from both academia and industry in recent years. With this technique, the functionality of the programmable devices can be adapted at runtime to changing requirements. The approach allows designers to use FPGAs more efﬁciently: E. g. FPGA resources can be time-shared between different functions and the functions itself can be adapted to changing workloads at runtime. Thus partial dynamic reconﬁguration enables a unique combination of software-like ﬂexibility and hardware-like performance. Still there exists no common understanding on how to assess the overhead introduced by partial dynamic reconﬁguration. This dissertation presents a new cost model for both the runtime and the memory overhead that results from partial dynamic reconﬁguration. It is shown how the model can be incorporated into all stages of the design optimization for reconﬁgurable hardware. In particular digital circuits can be mapped onto FPGAs such that only small fractions of the hardware must be reconﬁgured at runtime, which saves time, memory, and energy. The design optimization is most efﬁcient if it is applied during high level synthesis. This book describes how the cost model has been integrated into a new high level synthesis tool. The tool allows the designer to trade-off FPGA resource use versus reconﬁguration overhead. It is shown that partial reconﬁguration causes only small overhead if the design is optimized with regard to reconﬁguration cost. A wide range of experimental results is provided that demonstrates the beneﬁts of the applied method. / Partielle dynamische Rekonfiguration von FPGAs hat in den letzten Jahren große Aufmerksamkeit von Wissenschaft und Industrie auf sich gezogen. Die Technik erlaubt es, die Funktionalität von progammierbaren Bausteinen zur Laufzeit an veränderte Anforderungen anzupassen. Dynamische Rekonfiguration erlaubt es Entwicklern, FPGAs effizienter einzusetzen: z.B. können Ressourcen für verschiedene Funktionen wiederverwendet werden und die Funktionen selbst können zur Laufzeit an veränderte Verarbeitungsschritte angepasst werden. Insgesamt erlaubt partielle dynamische Rekonfiguration eine einzigartige Kombination von software-artiger Flexibilität und hardware-artiger Leistungsfähigkeit. Bis heute gibt es keine Übereinkunft darüber, wie der zusätzliche Aufwand, der durch partielle dynamische Rekonfiguration verursacht wird, zu bewerten ist. Diese Dissertation führt ein neues Kostenmodell für Laufzeit und Speicherbedarf ein, welche durch partielle dynamische Rekonfiguration verursacht wird. Es wird aufgezeigt, wie das Modell in alle Ebenen der Entwurfsoptimierung für rekonfigurierbare Hardware einbezogen werden kann. Insbesondere wird gezeigt, wie digitale Schaltungen derart auf FPGAs abgebildet werden können, sodass nur wenig Ressourcen der Hardware zur Laufzeit rekonfiguriert werden müssen. Dadurch kann Zeit, Speicher und Energie eingespart werden. Die Entwurfsoptimierung ist am effektivsten, wenn sie auf der Ebene der High-Level-Synthese angewendet wird. Diese Arbeit beschreibt, wie das Kostenmodell in ein neuartiges Werkzeug für die High-Level-Synthese integriert wurde. Das Werkzeug erlaubt es, beim Entwurf die Nutzung von FPGA-Ressourcen gegen den Rekonfigurationsaufwand abzuwägen. Es wird gezeigt, dass partielle Rekonfiguration nur wenig Kosten verursacht, wenn der Entwurf bezüglich Rekonfigurationskosten optimiert wird. Eine Anzahl von Beispielen und experimentellen Ergebnissen belegt die Vorteile der angewendeten Methodik. FPGA High-Level-Synthese Rekonfigurationskosten Dynamische Rekonfiguration Partielle Rekonfiguration Kostenoptimierung FPGA High-Level-Synthesis Reconfiguration Cost Dynamic Reconfiguration Partial Reconfiguration Cost Optimization ddc:620 rvk:ZN 4940 Field programmable gate array Technische Informatik Rekonfiguration Netzwerksynthese EDA <Optimierung> Entwurfsautomation Schaltungsentwurf System-on-Chip
246	Innovative transceiver approaches for low-power near-field and far-field applications Inanlou, Farzad Michael-David 27 August 2014 (has links) Wireless operation, near-field or far-field, is a core functionality of any mobile or autonomous system. These systems are battery operated or most often utilize energy scavenging as a means of power generation. Limited access to power, expected long and uninterrupted operation, and constrained physical parameters (e.g. weight and size), which limit overall power harvesting capabilities, are factors that outline the importance for innovative low-power approaches and designs in advanced low-power wireless applications. Low-power approaches become especially important for the wireless transceiver, the block in charge of wireless/remote functionality of the system, as this block is usually the most power hungry component in an integrated system-on-chip (SoC). Three such advanced applications with stringent power requirements are examined including space-based exploratory remote sensing probes and their associated radiation effects, millimeter-wave phased-array radar for high-altitude tactical and geological imaging, and implantable biomedical devices (IMDs), leading to the proposal and implementation of low-power wireless solutions for these applications in SiGe BiCMOS and CMOS and platforms. Wireless Transceiver Radar Low-power Wide-band Millimeter-wave System on chip Remote sensing Phased-array Implantable Medical Devices Near-field Communications Space-based transceivers SiGe BiCMOS CMOS Integrated circuits RF Radiation effects Total ionizing dose High-altitude Transcutaneous telemetry Data telemetry low-voltage Low noise Amplifier X-band K-band Ku-band W-band Impulse radio Ultra wideband Inductive link
247	Stress and Microstructural Evolution During the Growth of Transition Metal Oxide Thin Films by PVD Narayanachari, K V L V January 2015 (has links) (PDF) System on Chip (SoC) and System in Package (SiP) are two electronic technologies that involve integrating multiple functionalities onto a single platform. When the platform is a single wafer, as in SOC, it requires the ability to deposit various materials that enable the different functions on to an underlying substrate that can host the electronic circuitry. Transition metal oxides which have a wide range of properties are ideal candidates for the functional material. Si wafer on which micro-electronics technology is widely commercialized is the ideal host platform. Integrating oxides with Si, generally in the form of thin films as required by microelectronics technology, is however a challenge. It starts with the fact that the properties of crystalline oxides to be exploited in performing various functions are direction dependent. Thus, thin films of these oxides need to be deposited on Si in certain crystallographic orientations. Even if a suitably oriented Si wafer surface were available, it does not always provide for epitaxial growth a critical requirement for controlling the crystalline orientation of thin films. This is because Si surface is covered by an amorphous oxide of Si (SiOx). Thus, during growth of the functional oxide, an ambience in which the Si itself will not oxidize needs to be provided. In addition, during thin film growth on either Si or SiOx surface stresses are generated from various sources. Stress and its relaxation are also associated with the formation and evolution of defects. Both, stress and defects need to be managed in order to harness their beneficial effects and prevent detrimental ones. Given the requirement of SoC technology and the problem associated, the research work reported in this thesis was hence concerned with the precise controlling the stress and microstructure in oxide thin films deposited on Si substrates. In order to do so a versatile, ultra high vacuum (UHV) thin film with a base pressure of 10-9 Torr was designed and built as part of this study. The chamber is capable of depositing films by both sputtering (RF & DC) and pulsed laser ablation (PLD). The system has been designed to include an optical curvature measurement tool that enabled real-time stress measurement during growth. Doped zirconia, ZrO2, was chosen as the first oxide to be deposited, as it is among the few oxides that is more stable than SiOx. It is hence used as a buffer layer. It is shown in this thesis that a change in the growth rate at nucleation can lead to (100) or (111) textured films. These two are among the most commonly preferred orientation. Following nucleation a change in growth rate does not affect orientation but affects stress. Thus, independent selection of texture and stress is demonstrated in YSZ thin films on Si. A quantitative model based on the adatom motion on the growth surface and the anisotropic growth rates of the two orientations is used to explain these observations. This study was then subsequent extended to the growth on platinized Si another commonly used Si platform.. A knowledge of the stress and microstructure tailoring in cubic zirconia on Si was then extended to look at the effect of stress on electrical properties of zirconia on germanium for high-k dielectric applications. Ge channels are expected to play a key role in next generation n-MOS technology. Development of high-k dielectrics for channel control is hence essential. Interesting stress and property relations were analyzed in ZrO2/Ge. Stress and texture in pulsed laser deposited (PLD) oxides on silicon and SrTiO3 were studied. It is shown in this thesis that stress tuning is critical to achieve the highest possible dielectric constant. The effect of stress on dielectric constant is due to two reasons. The first one is an indirect effect involving the effect of stress on phase stability. The second one is the direct effect involving interatomic distance. By stress control an equivalent oxide thickness (EOT) of 0.8 nm was achieved in sputter deposited ZrO2/Ge films at 5 nm thickness. This is among the best reported till date. Finally, the effect of growth parameters and deposition geometry on the microstructural and stress evolution during deposition of SrTiO3 on Si and BaTiO3 on SrTiO3 by pulsed laser deposition is the same chamber is described. Transition Metal Oxide Thin Films System On Chip (SoC) System on Package (SiP) Thin Film Deposition Ultra High Vacuum Thin Film Deposition Pulsed Laser Ablation Sputter Deposition Oxide Thin Film Growth Oxides On Silicon Physical Vapor Deposition ZrO2 Thin Films ZrO2/Ge Thin Films BaTiO3 Thin Films SrTiO3 Thin Films Materials Science
248	NoC Design & Optimization of Multicore Media Processors Basavaraj, T January 2013 (has links) (PDF) Network on Chips[1][2][3][4] are critical elements of modern System on Chip(SoC) as well as Chip Multiprocessor(CMP)designs. Network on Chips (NoCs) help manage high complexity of designing large chips by decoupling computation from communication. SoCs and CMPs have a multiplicity of communicating entities like programmable processing elements, hardware acceleration engines, memory blocks as well as off-chip interfaces. With power having become a serious design constraint[5], there is a great need for designing NoC which meets the target communication requirements, while minimizing power using all the tricks available at the architecture, microarchitecture and circuit levels of the de-sign. This thesis presents a holistic, QoS based, power optimal design solution of a NoC inside a CMP taking into account link microarchitecture and processor tile configurations. Guaranteeing QoS by NoCs involves guaranteeing bandwidth and throughput for connections and deterministic latencies in communication paths. Label Switching based Network-on-Chip(LS-NoC) uses a centralized LS-NoC Management framework that engineers traffic into QoS guaranteed routes. LS-NoC uses label switching, enables band-width reservation, allows physical link sharing and leverages advantages of both packet and circuit switching techniques. A flow identification algorithm takes into account band-width available in individual links to establish QoS guaranteed routes. LS-NoC caters to the requirements of streaming applications where communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad-hoc SoC designs. A multicast, broadcast capable label switched router for the LS-NoC has been de-signed, verified, synthesized, placed and routed and timing analyzed. A 5 port, 256 bit data bus, 4 bit label router occupies 0.431 mm2 in 130nm and delivers peak band-width of80Gbits/s per link at312.5MHz. LS Router is estimated to consume 43.08 mW. Bandwidth and latency guarantees of LS-NoC have been demonstrated on streaming applications like Hiper LAN/2 and Object Recognition Processor, Constant Bit Rate traffic patterns and video decoder traﬃc representing Variable Bit Rate traffic. LS-NoC was found to have a competitive figure of merit with state-of-the-art NoCs providing QoS. We envision the use of LS-NoC in general purpose CMPs where applications demand deterministic latencies and hard bandwidth requirements. Design variables for interconnect exploration include wire width, wire spacing, repeater size and spacing, degree of pipelining, supply, threshold voltage, activity and coupling factors. An optimal link configuration in terms of number of pipeline stages for a given length of link and desired operating frequency is arrived at. Optimal configurations of all links in the NoC are identified and a power-performance optimal NoC is presented. We presents a latency, power and performance trade-off study of NoCs using link microarchitecture exploration. The design and implementation of a framework for such a design space exploration study is also presented. We present the trade-oﬀ study on NoCs by varying microarchitectural(e.g. pipelining) and circuit level(e.g. frequency and voltage) parameters. A System-C based NoC exploration framework is used to explore impacts of various architectural and microarchitectural level parameters of NoC elements on power and performance of the NoC. The framework enables the designer to choose from a variety of architectural options like topology, routing policy, etc., as well as allows experimentation with various microarchitectural options for the individual links like length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports a flexible traffic generation and communication model. Latency, power and throughput results using this framework to study a 4x4 CMP are presented. The framework is used to study NoC designs of a CMP using different classes of parallel computing benchmarks[6]. One of the key findings is that the average latency of a link can be reduced by increasing pipeline depth to a certain extent, as it enables link operation at higher link frequencies. Abstract There exists an optimum degree of pipelining which minimizes the energy-delay product of the link. In a 2D Torus when the longest link is pipelined by 4 stages at which point least latency(1.56 times minimum) is achieved and power(40% of max) and throughput (64%of max) are nominal. Using frequency scaling experiments, power variations of up to40%,26.6% and24% can be seen in 2D Torus, Reduced 2D Torus and Tree based NoC between various pipeline configurations to achieve same frequency at constant voltages. Also in some cases, we find that switching to a higher pipelining configuration can actually help reduce power as the links can be designed with smaller repeaters. We also find that the overall performance of the ICNs is determined by the lengths of the links needed to support the communication patterns. Thus the mesh seems to perform the best amongst the three topologies(Mesh, Torus and Folded Torus) considered in case studies. The effects of communication overheads on performance, power and energy of a multiprocessor chip using L1,L2 cache sizes as primary exploration parameters using accurate interconnect, processor, on-chip and off-chip memory modelling are presented. On-chip and off-chip communication times have significant impact on execution time and the energy efficiency of CMPs. Large cache simply larger tile area that result in longer inter-tile communication link lengths and latencies, thus adversely impacting communication time. Smaller caches potentially have higher number of misses and frequent of off-tile communication. Energy efficient tile design is a configuration exploration and trade-off study using different cache sizes and tile areas to identify a power-performance optimal configuration for the CMP. Trade-offs are explored using a detailed, cycle accurate, multicore simulation frame-work which includes superscalar processor cores, cache coherent memory hierarchies, on-chip point-to-point communication networks and detailed interconnect model including pipelining and latency. Sapphire, a detailed multiprocessor execution environment integrating SESC, Ruby and DRAM Sim was used to run applications from the Splash2 benchmark(64KpointFFT).Link latencies are estimated for a16 core CMP simulation on Sapphire. Each tile has a single processor, L1 and L2 caches and a router. Different sizesofL1 andL2lead to different tile clock speeds, tile miss rates and tile area and hence interconnect latency. Simulations across various L1, L2 sizes indicate that the tile configuration that maximizes energy efficiency is related to minimizing communication time. Experiments also indicate different optimal tile configurations for performance, energy and energy efficiency. Clustered interconnection network, communication aware cache bank mapping and thread mapping to physical cores are also explored as potential energy saving solutions. Results indicate that ignoring link latencies can lead to large errors in estimates of program completion times, of up to 17%. Performance optimal configurations are achieved at lower L1 caches and at moderateL2 cache sizes due to higher operating frequencies and smaller link lengths and comparatively lesser communication. Using minimal L1 cache size to operate at the highest frequency may not always be the performance-power optimal choice. Larger L1 sizes, despite a drop in frequency, offer a energy advantage due to lesser communication due to misses. Clustered tile placement experiments for FFT show considerable performance per watt improvement (1.2%). Remapping most accessed L2 banks by a process in the same core or neighbouring cores after communication traffic analysis offers power and performance advantages. Remapped processes and banks in clustered tile placement show a performance per watt improvement of5.25% and energy reductionof2.53%. This suggests that processors could execute a program in multiple modes, for example, minimum energy, maximum performance. Network-On-Chip (NOC) Quality of Service (QOS) Label Switched Network On Chip (LS-NoC) Chip Multiprocessor Designs Multicore Media Processors System on Chip (SoC) Ford-Fulkerson’s MaxFlow Algorithm Flow Algorithm Network-on-Chips On-Chip Interconnection Networks Microarchitecture Exploration Computer Science
249	EVALUATION OF SOURCE ROUTING FOR MESH TOPOLOGY NETWORK ON CHIP PLATFORMS MUBEEN, SAAD January 2009 (has links) Network on Chip is a scalable and flexible communication infrastructure for the design of core based System on Chip. Communication performance of a NoC depends heavily on the routing algorithm. Deterministic and adaptive distributed routing algorithms have been advocated in all the current NoC architectural proposals. In this thesis we make a case for the use of source routing for NoCs, especially for regular topologies like mesh. The advantages of source routing include in-order packet delivery; faster and simpler router design; and possibility of mixing non-minimal paths in a mainly minimal routing. We propose a method to compute paths for various communications in such a way that traffic congestion is avoided while ensuring deadlock free routing. We also propose an efficient scheme to encode the paths. We developed a tool in Matlab that computes paths for source routing for both general and application specific communications. Depending upon the type of traffic, this tool computes paths for source routing by selecting best routing algorithm out of many routing algorithms. The tool uses a constructive path improvement algorithm to compute paths that give more uniform link load distribution. It also generates different types of traffics. We also developed a simulator capable of simulating source routing for mesh topology NoC. The experiments and simulations which we performed were successful and the results show that the advantages of source routing especially lower packet latency more than compensate its disadvantages. The results also demonstrate that source routing can be a good routing candidate for practical core based SoCs design using network on chip communication infrastructure. Network on Chip (NoC) System on Chip (SoC) Core Based Design On Chip Communication Distributed Routing Source Routing Routing Algorithms Performance Analysis Packet Switched Network Annan elektroteknik och elektronik Annan elektroteknik och elektronik Computer Engineering Datorteknik
250	Power Issues in SoCs : Power Aware DFT Architecture and Power Estimation Tudu, Jaynarayan Thakurdas January 2016 (has links) (PDF) Test power, data volume, and test time have been long-standing problems for sequential scan based testing of system-on-chip (SoC) design. The modern SoCs fabricated at lower technology nodes are complex in nature, the transistor count is as large as billions of gate for some of the microprocessors. The design complexity is further projected to increase in the coming years in accordance with Moore's law. The larger gate count and integration of multiple functionalities are the causes for higher test power dissipation, test time and data volume. The dynamic power dissipation during scan testing, i.e. during scan shift, launch and response capture, are major concerns for reliable as well as cost effective testing. Excessive average power dissipation leads to a thermal problem which causes burn-out of the chip during testing. Peak power on other hand causes test failure due to power induced additional delay. The test failure has direct impact on yield. The test power problem in modern 3D stacked based IC is even a more serious issue. Estimating the worst case functional power dissipation is yet another great challenge. The worst case functional power estimation is necessary because it gives an upper bound on the functional power dissipation which can further be used to determine the safe power zone for the test. Several solutions in the past have been proposed to address these issues. In this thesis we have three major contributions: 1) Sequential scan chain reordering, and 2) JScan-an alternative Joint-scan DFT architecture to address primarily the test power issues along with test time and data volume, and 3) an integer linear programming methodology to address the power estimation problem. In order to reduce test power during shift, we have proposed a graph theoretic formulation for scan chain reordering and for optimum scan shift operation. For each formulation a set of algorithms is proposed. The experimental results on ISCAS-89 benchmark circuit show a reduction of around 25% and 15% in peak power and scan shift time respectively. In order to have a holistic DFT architecture which could solve test power, test time, and data volume problems, a new DFT architecture called Joint-scan (JScan) have been developed. In JScan we have integrated the serial and random access scan architectures in a systematic way by which the JScan could harness the respective advantages from each of the architectures. The serial scan architecture from test power, test time, and data volume problems. However, the serial scan is simple in terms of its functionality and is cost effective in terms of DFT circuitry. Whereas, the random ac-cess scan architecture is opposite to this; it is power efficient and it takes lesser time and data volume compared to serial scan. However, the random access scan occupies larger DFT area and introduces routing congestion. Therefore, we have proposed a methodology to realize the JScan architecture as an efficient alternative for standard serial and random access scan. Further, the JScan architecture is optimized and it resulted into a 2-Mode 2M-Jscan Joint-scan architecture. The proposed architectures are experimentally verified on larger benchmark circuits and compared with existing state of the art DFT architectures. The results show a reduction of 50% to 80% in test power and 30% to 50% in test time and data volume. The proposed architectures are also evaluated for routing area minimization and we obtained a saving of around 7% to 15% of chip area. Estimating the worst case functional power being a challenging problem, we have proposed a binary integer linear programming (BILP) based methodology. Two different formulations have been proposed considering the different delay models namely zero-delay and unit-delay. The proposed methodology generates a pair or input vectors which could toggle the circuit to dissipate worst power. The BILP problems are solved using CPLEX solver for ISCAS-85 combinational benchmark circuits. For some of the circuits, the proposed methodology provided the worst possible power dissipation i.e. 80 to 100% toggling in nets. Aware Scan Architecture Random Access Scan Power Aware Test BILP Programming VLSI Testing Power Estimation System-On-Chip (SOC) Sequential Scan Chain Ordering JScan Integer Linear Programming Design-for-Test (DFT) Architecture Congestion Aware Scan Architecture Computer Science

Search results