Global ETD Search

1	A verilog-hdl implementation of virtual channels in a network-on-chip router Park, Sungho 15 May 2009 (has links) As the feature size is continuously decreasing and integration density is increasing, interconnections have become a dominating factor in determining the overall quality of a chip. Due to the limited scalability of system bus, it cannot meet the requirement of current System-on-Chip (SoC) implementations where only a limited number of functional units can be supported. Long global wires also cause many design problems, such as routing congestion, noise coupling, and difficult timing closure. Network-on-Chip (NoC) architectures have been proposed to be an alternative to solve the above problems by using a packet-based communication network. The processing elements (PEs) communicate with each other by exchanging messages over the network and these messages go through buffers in each router. Buffers are one of the major resource used by the routers in virtual channel flow control. In this thesis, we analyze two kinds of buffer allocation approaches, static and dynamic buffer allocations. These approaches aim to increase throughput and minimize latency by means of virtual channel flow control. In statically allocated buffer architecture, size and organization are design time decisions and thus, do not perform optimally for all traffic conditions. In addition, statically allocated virtual channel consumes a waste of area and significant leakage power. However, dynamic buffer allocation scheme claims that buffer utilization can be increased using dynamic virtual channels. Dynamic virtual channel regulator (ViChaR), have been proposed to use centralized buffer architecture which dynamically allocates virtual channels and buffer slots in real-time depending on traffic conditions. This ViChaR’s dynamic buffer management scheme increases buffer utilization, but it also increases design complexity. In this research, we reexamine performance, power consumption, and area of ViChaR’s buffer architecture through implementation. We implement a generic router and a ViChaR architecture using Verilog-HDL. These RTL codes are verified by dynamic simulation, and synthesized by Design Compiler to get area and power consumption. In addition, we get latency through Static Timing Analysis. The results show that a ViChaR’s dynamic buffer management scheme increases the latency and power consumption significantly even though it could increase buffer utilization. Therefore, we need a novel design to achieve high buffer utilization without a loss. NoC Virtual Channels Network-on-Chip router
2	Performance modelling and evaluation of network on chip under bursty traffic : performance evaluation of communication networks using analytical and simulation models in NOCs with fat tree topology under bursty traffic with virtual channels Ibrahim, Hatem Musbah January 2014 (has links) Physical constrains of integrated circuits (commonly called chip) in regards to size and finite number of wires, has made the design of System-on-Chip (SoC) more interesting to study in terms of finding better solutions for the complexity of the chip-interconnections. The SoC has hundreds of Processing Elements (PEs), and a single shared bus can no longer be acceptable due to poor scalability with the system size. Networks on Chip (NoC) have been proposed as a solution to mitigate complex on-chip communication problems for complex SoCs. They consists of computational resources in the form of PE cores and switching nodes which allow PEs to communicate with each other. In the design and development of Networks on Chip, performance modelling and analysis has great theoretical and practical importance. This research is devoted to developing efficient and cost-effective analytical tools for the performance analysis and enhancement of NoCs with m-port n-tree topology under bursty traffic. Recent measurement studies have strongly verified that the traffic generated by many real-world applications in communication networks exhibits bursty and self-similar properties in nature and the message destinations are uniformly distributed. NoC's performance is generally affected by different traffic patterns generated by the processing elements. As the first step in the research, a new analytical model is developed to capture the burstiness and self-similarity characteristics of the traffic within NoCs through the use of Markov Modulated Poisson Process. The performance results of the developed model highlight the importance of accurate traffic modelling in the study and performance evaluation of NoCs. Having developed an efficient analytical tool to capture the traffic behaviour with a higher accuracy, in the next step, the research focuses on the effect of topology on the performance of NoCs. Many important challenges still remain as vulnerabilities within the design of NoCs with topology being the most important. Therefore a new analytical model is developed to investigate the performance of NoCs with the m-port n-tree topology under bursty traffic. Even though it is broadly proved in practice that fat-tree topology and its varieties result in lower latency, higher throughput and bandwidth, still most studies on NoCs adopt Mesh, Torus and Spidergon topologies. The results gained from the developed model and advanced simulation experiments significantly show the effect of fat-tree topology in reducing latency and increasing the throughput of NoCs. In order to obtain deeper understanding of NoCs performance attributes and for further improvement, in the final stage of the research, the developed analytical model was extended to consider the use of virtual channels within the architecture of NoCs. Extensive simulation experiments were carried out which show satisfactory improvements in the throughput of NoCs with fat-tree topology and VCs under bursty traffic. The analytical results and those obtained from extensive simulation experiments have shown a good degree of accuracy for predicting the network performance under different design alternatives and various traffic conditions. 621.3815
3	Performance Modelling and Evaluation of Network On Chip Under Bursty Traffic. Performance evaluation of communication networks using analytical and simulation models in NOCs with Fat tree topology under Bursty Traffic with virtual channels. Ibrahim, Hatem Musbah January 2014 (has links) Physical constrains of integrated circuits (commonly called chip) in regards to size and finite number of wires, has made the design of System-on-Chip (SoC) more interesting to study in terms of finding better solutions for the complexity of the chip-interconnections. The SoC has hundreds of Processing Elements (PEs), and a single shared bus can no longer be acceptable due to poor scalability with the system size. Networks on Chip (NoC) have been proposed as a solution to mitigate complex on-chip communication problems for complex SoCs. They consists of computational resources in the form of PE cores and switching nodes which allow PEs to communicate with each other. In the design and development of Networks on Chip, performance modelling and analysis has great theoretical and practical importance. This research is devoted to developing efficient and cost-effective analytical tools for the performance analysis and enhancement of NoCs with m-port n-tree topology under bursty traffic. Recent measurement studies have strongly verified that the traffic generated by many real-world applications in communication networks exhibits bursty and self-similar properties in nature and the message destinations are uniformly distributed. NoC's performance is generally affected by different traffic patterns generated by the processing elements. As the first step in the research, a new analytical model is developed to capture the burstiness and self-similarity characteristics of the traffic within NoCs through the use of Markov Modulated Poisson Process. The performance results of the developed model highlight the importance of accurate traffic modelling in the study and performance evaluation of NoCs. Having developed an efficient analytical tool to capture the traffic behaviour with a higher accuracy, in the next step, the research focuses on the effect of topology on the performance of NoCs. Many important challenges still remain as vulnerabilities within the design of NoCs with topology being the most important. Therefore a new analytical model is developed to investigate the performance of NoCs with the m-port n-tree topology under bursty traffic. Even though it is broadly proved in practice that fat-tree topology and its varieties result in lower latency, higher throughput and bandwidth, still most studies on NoCs adopt Mesh, Torus and Spidergon topologies. The results gained from the developed model and advanced simulation experiments significantly show the effect of fat-tree topology in reducing latency and increasing the throughput of NoCs. In order to obtain deeper understanding of NoCs performance attributes and for further improvement, in the final stage of the research, the developed analytical model was extended to consider the use of virtual channels within the architecture of NoCs. Extensive simulation experiments were carried out which show satisfactory improvements in the throughput of NoCs with fat-tree topology and VCs under bursty traffic. The analytical results and those obtained from extensive simulation experiments have shown a good degree of accuracy for predicting the network performance under different design alternatives and various traffic conditions. / Libyan Ministry of Higher Education
4	Off-chip Communications Architectures For High Throughput Network Processors Engel, Jacob 01 January 2005 (has links) In this work, we present off-chip communications architectures for line cards to increase the throughput of the currently used memory system. In recent years there is a significant increase in memory bandwidth demand on line cards as a result of higher line rates, an increase in deep packet inspection operations and an unstoppable expansion in lookup tables. As line-rate data and NPU processing power increase, memory access time becomes the main system bottleneck during data store/retrieve operations. The growing demand for memory bandwidth contrasts the notion of indirect interconnect methodologies. Moreover, solutions to the memory bandwidth bottleneck are limited by physical constraints such as area and NPU I/O pins. Therefore, indirect interconnects are replaced with direct, packet-based networks such as mesh, torus or k-ary n-cubes. We investigate multiple k-ary n-cube based interconnects and propose two variations of 2-ary 3-cube interconnect called the 3D-bus and 3D-mesh. All of the k-ary n-cube interconnects include multiple, highly efficient techniques to route, switch, and control packet flows in order to minimize congestion spots and packet loss. We explore the tradeoffs between implementation constraints and performance. We also developed an event-driven, interconnect simulation framework to evaluate the performance of packet-based off-chip k-ary n-cube interconnect architectures for line cards. The simulator uses the state-of-the-art software design techniques to provide the user with a flexible yet robust tool, that can emulate multiple interconnect architectures under non-uniform traffic patterns. Moreover, the simulator offers the user with full control over network parameters, performance enhancing features and simulation time frames that make the platform as identical as possible to the real line card physical and functional properties. By using our network simulator, we reveal the best processor-memory configuration, out of multiple configurations, that achieves optimal performance. Moreover, we explore how network enhancement techniques such as virtual channels and sub-channeling improve network latency and throughput. Our performance results show that k-ary n-cube topologies, and especially our modified version of 2-ary 3-cube interconnect - the 3D-mesh, significantly outperform existing line card interconnects and are able to sustain higher traffic loads. The flow control mechanism proved to extensively reduce hot-spots, load-balance areas of high traffic rate and achieve low transmission failure rate. Moreover, it can scale to adopt more memories and/or processors and as a result to increase the line card's processing power. Network-processors k-ary n-cubes processing elements linecards virtual channels Computer Engineering Engineering
5	Balancing Performance, Area, and Power in an On-Chip Network Gold, Brian 06 August 2003 (has links) Several trends can be observed in modern microprocessor design. Architectures have become increasingly complex while design time continues to dwindle. As feature sizes shrink, wire resistance and delay increase, limiting architects from scaling designs centered around a single thread of execution. Where previous decades have focused on exploiting instruction-level parallelism, emerging applications such as streaming media and on-line transaction processing have shown greater thread-level parallelism. Finally, the increasing gap between processor and off-chip memory speeds has constrained performance of memory-intensive applications. The Single-Chip Message Passing (SCMP) parallel computer sits at the confluence of these trends. SCMP is a tiled architecture consisting of numerous thread-parallel processor and memory nodes connected through a structured interconnection network. Using an interconnection network removes global, ad-hoc wiring that limits scalability and introduces design complexity. However, routing data through general purpose interconnection networks can come at the cost of dedicated bandwidth, longer latency, increased area, and higher power consumption. Understanding the impact architectural decisions have on cost and performance will aid in the eventual adoption of general purpose interconnects. This thesis covers the design and analysis of the on-chip network and its integration with the SCMP system. The result of these efforts is a framework for analyzing on-chip interconnection networks that considers network performance, circuit area, and power consumption. / Master of Science area virtual channels SCMP power network router crossbar switch single chip computer message passing system on chip
6	A performance model for wormhole-switched interconnection networks under self-similar traffic. Min, Geyong, Ould-Khaoua, M. January 2004 (has links) No / Many recent studies have convincingly demonstrated that network traffic exhibits a noticeable self-similar nature which has a considerable impact on queuing performance. However, the networks used in current multicomputers have been primarily designed and analyzed under the assumption of the traditional Poisson arrival process, which is inherently unable to capture traffic self-similarity. Consequently, it is crucial to reexamine the performance properties of multicomputer networks in the context of more realistic traffic models before practical implementations show their potential faults. In an effort toward this end, this paper proposes the first analytical model for wormhole-switched k-ary n-cubes in the presence of self-similar traffic. Simulation experiments demonstrate that the proposed model exhibits a good degree of accuracy for various system sizes and under different operating conditions. The analytical model is then used to investigate the implications of traffic self-similarity on network performance. This study reveals that the network suffers considerable performance degradation when subjected to self-similar traffic, stressing the great need for improving network performance to ensure efficient support for this type of traffic. Multicomputers Interconnection networks Traffic self-similarity Adaptive routing Virtual channels Performance modeling
7	Spatial Detection of Multiple Movement Intentions from SAM-Filtered Single-Trial MEG for a high performance BCI Battapady, Harsha 28 July 2009 (has links) The objective of this study is to test whether human intentions to sustain or cease movements in right and left hands can be decoded reliably from spatially filtered single trial magneto-encephalographic (MEG) signals. This study was performed using motor execution and motor imagery movements to achieve a potential high performance Brain-Computer interface (BCI). Seven healthy volunteers, naïve to BCI technology, participated in this study. Signals were recorded from 275-channel MEG and synthetic aperture magnetometry (SAM) was employed as the spatial filter. The four-class classification for natural movement intentions was performed offline; Genetic Algorithm based Mahalanobis Linear Distance (GA-MLD) and direct-decision tree classifier (DTC) techniques were adopted for the classification through 10-fold cross-validation. Through SAM imaging, strong and distinct event related desynchronisation (ERD) associated with sustaining, and event related synchronisation (ERS) patterns associated with ceasing of hand movements were observed in the beta band (15 - 30 Hz). The right and left hand ERD/ERS patterns were observed on the contralateral hemispheres for motor execution and motor imagery sessions. Virtual channels were selected from these cortical areas of high activity to correspond with the motor tasks as per the paradigm of the study. Through a statistical comparison between SAM-filtered virtual channels from single trial MEG signals and basic MEG sensors, it was found that SAM-filtered virtual channels significantly increased the classification accuracy for motor execution (GA-MLD: 96.51 ± 2.43 %) as well as motor imagery sessions (GA-MLD: 89.69 ± 3.34%). Thus, multiple movement intentions can be reliably detected from SAM-based spatially-filtered single trial MEG signals. MEG signals associated with natural motor behavior may be utilized for a reliable high-performance brain-computer interface (BCI) and may reduce long-term training compared with conventional BCI methods using rhythm control. This may prove tremendously helpful for patients suffering from various movement disorders to improve their quality of life. Magnetoencephalography (MEG) synthetic aperture magnetometry (SAM) Virtual channels Brain-computer interface (BCI) Movement intention Motor Control. Engineering
8	Evaluation of Finite Element simulation methods for High Cycle Fatigue on engine components / Utvärdering av simuleringsmetoder för analys av högcykelutmattning på motorkomponenter Pacheco Roman, Oscar January 2018 (has links) This document reflects the results of evaluating three computational methods to analyse the fatigue life of components mounted on the cylinder block; two currently in use at Scania and one that has been further developed from its previous state. Due to the cost of testing and the exponential increase in computational power throughout the years, the cheaper computational analyses have gained in popularity. When a component is mounted in a fairly complex assembly such as an engine, simplifications need to be made in order to make the analysis as less expensive as possible while keeping a high degree of accuracy. The methods of Virtual Vibrations, VROM and VFEM have been evaluated and compared in terms of accuracy, computational cost, user friendliness and general capacities. Additionally, the method VFEM has been further developed and improved from its previous state. A in-depth investigation regarding the differences of the methods has been conducted and improvements to make them more efficient are suggested herein. The reader can also find a decision matrix and recommendations regarding which method to use depending on the general characteristics of the component of interest and other factors. Two components, which differ in complexity and mounting nature, have been used to do the research. High Cycle Fatigue HCF Virtual Vibrations VROM VFEM Virtual Channels Random Vibrations Engine Vibrations FEM Abaqus Reliability and Maintenance Tillförlitlighets- och kvalitetsteknik Vehicle Engineering Farkostteknik
9	Routing on the Channel Dependency Graph: Domke, Jens 20 June 2017 (has links) (PDF) In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions. Netzwerkmanagement Routing Hochleistungsrechnen Blockierungsfrei high performance computing network management routing protocols deadlock-free virtual channels unicast network simulations fail-in-place fault tolerance ddc:004 rvk:ST 200
10	Routing on the Channel Dependency Graph:: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks Domke, Jens 16 June 2017 (has links) In the pursuit for ever-increasing compute power, and with Moore's law slowly coming to an end, high-performance computing started to scale-out to larger systems. Alongside the increasing system size, the interconnection network is growing to accommodate and connect tens of thousands of compute nodes. These networks have a large influence on total cost, application performance, energy consumption, and overall system efficiency of the supercomputer. Unfortunately, state-of-the-art routing algorithms, which define the packet paths through the network, do not utilize this important resource efficiently. Topology-aware routing algorithms become increasingly inapplicable, due to irregular topologies, which either are irregular by design, or most often a result of hardware failures. Exchanging faulty network components potentially requires whole system downtime further increasing the cost of the failure. This management approach becomes more and more impractical due to the scale of today's networks and the accompanying steady decrease of the mean time between failures. Alternative methods of operating and maintaining these high-performance interconnects, both in terms of hardware- and software-management, are necessary to mitigate negative effects experienced by scientific applications executed on the supercomputer. However, existing topology-agnostic routing algorithms either suffer from poor load balancing or are not bounded in the number of virtual channels needed to resolve deadlocks in the routing tables. Using the fail-in-place strategy, a well-established method for storage systems to repair only critical component failures, is a feasible solution for current and future HPC interconnects as well as other large-scale installations such as data center networks. Although, an appropriate combination of topology and routing algorithm is required to minimize the throughput degradation for the entire system. This thesis contributes a network simulation toolchain to facilitate the process of finding a suitable combination, either during system design or while it is in operation. On top of this foundation, a key contribution is a novel scheduling-aware routing, which reduces fault-induced throughput degradation while improving overall network utilization. The scheduling-aware routing performs frequent property preserving routing updates to optimize the path balancing for simultaneously running batch jobs. The increased deployment of lossless interconnection networks, in conjunction with fail-in-place modes of operation and topology-agnostic, scheduling-aware routing algorithms, necessitates new solutions to solve the routing-deadlock problem. Therefore, this thesis further advances the state-of-the-art by introducing a novel concept of routing on the channel dependency graph, which allows the design of an universally applicable destination-based routing capable of optimizing the path balancing without exceeding a given number of virtual channels, which are a common hardware limitation. This disruptive innovation enables implicit deadlock-avoidance during path calculation, instead of solving both problems separately as all previous solutions. info:eu-repo/classification/ddc/004 ddc:004

Search results