Global ETD Search

11	Multi-Functional Interfaces for Accelerators Piccolboni, Luca January 2022 Heterogeneous System-on-Chip (SoC) architectures combine general-purpose processors with many accelerators, which are application-specific computing engines. By having their hardware optimized to perform specific tasks, accelerators deliver massive speedups and energy savings compared to corresponding software executions on a processor. Heterogeneity and hardware specialization complicate accelerator design and integration, reducing regularity and reusability across platforms. The many system-level architectural aspects to consider make it hard to explore the design space and arrive at optimal solutions. Furthermore, integrating accelerators affects the programmability of the applications and the security of the entire SoC. In this dissertation, I present design methodologies and architectural contributions that use multi-functional interfaces to simplify many of the tasks that designers perform when designing and integrating accelerators in heterogeneous SoCs. The accelerator interfaces exploit latency-insensitive design to effectively explore the design space when multiple accelerators are integrated and to speed up the verification of accelerators. This improves their reusability across SoC platforms, while ensuring correctness when the accelerators are integrated with the various components of the SoC. In addition, the accelerator interfaces improve the integration with software by making it transparent and by establishing a strong layer of protection between accelerators and applications. The interfaces aim at securing the accelerators and the applications without requiring modifications to the accelerator implementations and without degrading their performance and energy efficiency. Computer science Computer engineering Computer security
12	Scheduling on-chip networks Wu, Xiang 23 October 2009 Networks-on-Chip (NoC) have been proposed to meet many challenges of modern Systems-on-Chip (SoC) design and manufacturing. At the architectural level, a clean separation of computation and communication helps integration and verification. Networking abstraction of the communication infrastructure also promotes reuse and fast development. But the benefit is most visible when it comes to circuit and physical design. Networks can be made sparse and regular and thus facilitate placement and routing. It is also much easier to reach timing and power closure, as NoCs shield communication details from complicating analysis. Last but not least, networks are flexible at the design stage and adaptable post-silicon. Many techniques for tackling process variation and interconnect failure can be built upon NoCs. However, when interconnects are time-multiplexed in a NoC, the network’s performance will deteriorate if it is not scheduled properly. For a wide range of applications, the traffic on the network can be determined before run-time, and offline scheduling offers guaranteed performance and enables simple design. We propose a synthesis flow that takes the data flow graph of the application and a network topology as inputs and outputs an offline schedule that can be deployed directly to the NoC. We analyze the complexity of combinatorial problems that arise from this context and provide efficient heuristics when polynomial-time algorithms are not available, assuming P ≠ NP. Results on LDPC decoding and FFT designs are compared with previous ones. We further apply our findings to parallel shared memories (PSM) and formalize the PSM architecture and its scheduling problem. An efficient heuristic is derived from our algorithm for unbuffered networks. 
Another application exemplifies how the NoC can be reprogrammed after silicon is back from the fab in order to avoid failed interconnects due to process variation. A simple statistical model is studied and the simulation result is rather interesting. We find that high performance and yield are not always in conflict if we are able to change the network schedule based on silicon diagnosis. Networks-on-chip design Systems-on-chip design Network scheduling Interconnects Parallel shared memories Network traffic matrix scheduling NoC design Scheduling networks-on-chip
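As an illustration of what an offline schedule for a time-multiplexed, unbuffered network might look like, the sketch below assigns each message a start slot so that no link carries two messages in the same slot. It is a simple greedy first-fit heuristic written for this listing, not the synthesis flow or the heuristics from the dissertation. The names `schedule_messages`, `paths`, and the example topology are hypothetical, and the model assumes a message crosses one link per slot without waiting.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # directed link (source node, destination node)


def schedule_messages(paths: Dict[str, List[Link]]) -> Dict[str, int]:
    """Greedy first-fit: give each message the earliest conflict-free start slot.

    Assumption: the network is unbuffered, so a message that starts at slot t
    occupies the k-th link of its path exactly at slot t + k.
    """
    busy = set()  # (link, slot) pairs that are already reserved
    start: Dict[str, int] = {}
    for msg, path in paths.items():
        t = 0
        # Advance the start slot until every hop lands on a free (link, slot).
        while any((link, t + hop) in busy for hop, link in enumerate(path)):
            t += 1
        for hop, link in enumerate(path):
            busy.add((link, t + hop))
        start[msg] = t
    return start


# Hypothetical traffic: three messages routed through a shared node B.
paths = {
    "m0": [("A", "B"), ("B", "C")],
    "m1": [("A", "B"), ("B", "D")],
    "m2": [("E", "B"), ("B", "C")],
}
schedule = schedule_messages(paths)
```

Because the traffic is known before run-time, a table like `schedule` can be computed once and loaded into the network; first-fit gives no optimality guarantee, which is where problem-specific heuristics would come in.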
13	Optimal Network Topologies and Resource Mappings for Heterogeneous Networks-on-Chip Chung, Haera 01 January 2013 Communication has become a bottleneck for modern microprocessors and multi-core chips because metal wires don't scale. The problem becomes worse as the number of components increases and chips become bigger. Traditional System-on-Chip (SoC) interconnect architectures are based on shared-bus communication, which can carry only one communication transaction at a time. This limits the communication bandwidth and scalability. Networks-on-Chip (NoC) were proposed as a promising solution for designing large and complex SoCs. The NoC paradigm provides better scalability and reusability for future SoCs; however, long-distance multi-hop communication through traditional metal wires suffers from both high latency and high power consumption. A radical solution to address this challenge is to add long-range, low-power, and high-bandwidth single-hop links between distant cores. The use of optical or on-chip RF wireless links has been explored in this context. However, all previous work has focused on regular mesh-based metal wire fabrics that were expanded with one or two additional link types only for long-distance communication. In this thesis we pose the following main research questions to address the above-mentioned challenges: (1) What library of different link types would represent an optimum in the design space? (2) How would these links be used to design an application-specific NoC architecture? (3) How would applications use the resulting NoC architecture efficiently? We hypothesize that networks with a higher degree of heterogeneity, i.e., three or more link types, will improve the network throughput and consume less energy compared to traditional NoC architectures. 
In order to verify our hypothesis and to address the research challenges, we design and analyze optimal heterogeneous networks under different realistic traffic models by considering different cost and performance trade-offs in a comprehensive technology-agnostic simulation framework that uses metaheuristic optimization techniques. As opposed to related work, our heterogeneous links can be placed anywhere in the network, which allows us to explore the entire search space. The resulting application-specific networks are then analyzed by using complex network techniques, such as community detection and small-worldness, to understand how heterogeneous link types are used to improve the NoC's performance and cost. Next, we use the application-specific networks as a target architecture for other applications. The goal is to evaluate the performance of our new NoCs for applications they have not been designed for by finding optimal resource allocations. Our results show that there is an optimal number of heterogeneous link types for each set of constraints and that networks with three or more heterogeneous link types provide significantly higher throughput along with lower energy consumption compared to both homogeneous link type networks and regular 2D mesh networks under three different traffic scenarios. Our evolved networks with three different technology-driven link types, namely metal wires, wireless links, and optical links, provide 15% more throughput and fourteen times lower energy consumption compared to a homogeneous link type network. When ten different abstract link types are used in the design, 12% more throughput and 52% less energy consumption are obtained compared to networks with three different technology-driven link types. This shows that heterogeneous NoC designs based on traditional metal wires, wireless links, and optical links occupy a non-optimal spot in the entire design space. 
Our results further show that heterogeneous NoCs scale up significantly better in terms of performance and cost compared to mesh networks. We uncovered that network communities evolve robustly and that heterogeneous link types are efficiently establishing inter- and intra-subnet connections depending on their link type properties. We also show that mapping an application on our application-specific NoC architecture provides on average 45% more throughput at 70% less energy consumption compared to regular 2D mesh networks. The NoCs are therefore not only good for the application they were designed for, but for a broad range of other applications as well. Networks on a chip Heterogeneous computing Computer and Systems Architecture Systems and Communications
14	Conception des systèmes logiciel/matériel : du partitionnement logiciel/matériel au prototypage sur plateformes reconfigurables / Hardware/software system design: from hardware/software partitioning to prototyping on reconfigurable platforms Rousseau, F. 08 July 2005 This document traces my research activities since my thesis, defended in July 1997. Some of the work presented is complete; other work is in progress or still at an exploratory stage. From 1993 to 1999, I worked on various aspects of hardware/software partitioning in the design of integrated digital telecommunication systems. Since 1999, my work has focused on the design of multiprocessor systems-on-chip, and more particularly on the relationships between software and hardware. These systems are generally dedicated to one application or one class of applications, which makes it possible to optimize the architecture and the programs. My research has therefore focused on memory architecture, communication interfaces between components, and prototyping. For these three research directions, design methods and design-aid tools have been defined and developed. Ongoing work concerns the generalization of a method for designing hardware interface components from a specification expressed as required and provided services. Such a specification is already used to represent protocols in communication networks and to develop software communication layers. Extending it to hardware interface design would unify the languages, methods, and tools of the design environment. My future work follows two directions: hardware/software integration and the match between architecture and operating system. 
In both cases, the close relationships between the physical resources of the architecture and the software layers that access them should allow performance to be improved significantly. hardware/software system design system-on-chip design and validation
15	Physical synthesis for nanometer VLSI and emerging technologies Cho, Minsik, 1976- 07 September 2012 The unabated silicon technology scaling makes design and manufacturing increasingly harder in nanometer VLSI. Emerging technologies on the horizon require strong design automation to handle the large complexity of future systems. This dissertation studies eight related research topics in design and manufacturing closure in nanometer VLSI as well as design optimization for emerging technologies from a physical synthesis perspective. In physical synthesis for design closure, we study three research topics, which are key challenges in nanometer VLSI designs: (a) We propose a highly efficient floorplanning algorithm to minimize substrate noise for mixed-signal system-on-a-chip designs. (b) We propose a clock tree synthesis algorithm to reduce clock skew under thermal variation. (c) We develop a global router, BoxRouter, to enhance routability, which is one of the classic but still critical challenges in modern VLSI. In physical synthesis for manufacturing closure, we propose the first systematic manufacturability-aware routing framework to address three key manufacturing challenges: (a) We develop a predictive chemical-mechanical polishing model to guide global routing in order to reduce surface topography variation. (b) We formulate a random defect minimization problem in track routing, and develop a highly efficient algorithm. (c) We propose a lithography enhancement technique during detailed routing based on statistical and macro-level post-OPC printability prediction. Regarding design optimization of emerging technologies, we focus on two topics, one in double patterning technology for future VLSI fabrication and the other in microfluidics for biochips: (a) We claim double patterning should be considered during physical synthesis, and propose an effective double-patterning-aware detailed routing algorithm. 
(b) We propose a droplet routing algorithm to improve routability in digital microfluidic biochip design. Algorithms Biochips--Design
16	Architecture and physical design for advanced networks-on-chip Jang, Woo Young 01 June 2011 The aggressive scaling of semiconductor technology following Moore’s Law has delivered true system-on-chip (SoC) integration. Network-on-chip (NoC) has recently been introduced as an effective solution for scalable on-chip communication, since dedicated point-to-point (P2P) interconnection and shared bus architectures have become performance and power bottlenecks in SoCs. This dissertation studies three critical NoC challenges, namely latency, power, and compatibility with emerging technologies, at the architecture and physical design levels. Latency is a key issue in NoC since the performance of applications depends considerably on the resource sharing policies employed in an on-chip network. NoCs have been mainly developed to improve network-level performance that captures the inherent performance characteristics of a network itself, but the network-level optimizations are not directly related to application- or system-level performance. In addition, memory latency on NoC critically affects the performance of applications or systems. We propose a synchronous dynamic random access memory (SDRAM) aware NoC design to optimize memory throughput, latency, and design complexity. Furthermore, it is extended to an application-aware NoC design to provide quality-of-service (QoS) for memory across various applications. NoC provides great on-chip communication. However, it brings no true relief to the power budget when the on-chip network scales in terms of complexity/size and signal bandwidth. The combination of NoC and other techniques has the potential to reduce power. We study two power saving research topics for NoC: (a) We propose a voltage-frequency island (VFI) aware NoC optimization framework with a better tradeoff between power efficiency and design complexity to minimize both computation and on-chip communication power. 
(b) We formulate an application mapping problem as a mixed integer quadratic programming (MIQP) problem with the purpose of reducing power consumption in various hard networks and develop highly efficient algorithms for the MIQP. Regarding NoC compatibility with new technologies, we focus on three-dimensional (3D) die integration based on through-silicon vias (TSVs). Since on-chip network design has been subject to not only application constraints but also design/manufacturing constraints, a 3D NoC design is required for innovation in interconnection networks. We propose a chemical-mechanical polishing (CMP) aware application-specific 3D NoC design that minimizes TSV height variation, thereby reducing bonding failure, while optimizing conventional NoC design objectives such as hop count, wirelength, power, and area. 3-D integration Networks-on-chip System-on-chip Latency Power 3D integration Chip design Chip architecture
17	A Logic Test Chip for Optimal Test and Diagnosis Niewenhuis, Benjamin T. 01 May 2018 The benefits of the continued progress in integrated circuit manufacturing have been numerous, most notably in the explosion of computing power in devices ranging from cell phones to cars. Key to this success has been strategies to identify, manage, and mitigate yield loss. One such strategy is the use of test structures to identify sources of yield loss early in the development of a new manufacturing process. However, the aggressive scaling of feature dimensions, the integration of new materials, and the increase in structural complexity in modern technologies have challenged the capabilities of conventional test structures. To help address these challenges, a new logic test chip, called the Carnegie Mellon Logic Characterization Vehicle (CM-LCV), has been developed. The CM-LCV utilizes a two-dimensional array of functional unit blocks (FUBs) that each implement an innovative functionality. Properties including fault coverage, logical and physical design features, and fault distinguishability are shown to be composable within the FUB array; that is, they exist regardless of the size and composition of the FUB array. A synthesis flow that leverages this composability to adapt the FUB array to a wide range of test chip design requirements is presented. The connection between the innovative FUB functionality and orthogonal Latin squares is identified and used to analyze the universe of possible FUB functions. Two additional variants to the FUB array are also developed: heterogeneous FUB arrays utilize multiple FUB functions to improve the synthesis flow performance, while pipelined FUB arrays incorporate sequential circuit elements (e.g., flip-flops and latches) that are absent from the original combinational FUB array. In addition to the design of the CM-LCV, methods for testing it are presented. 
Techniques to create minimal sets of test patterns that exhaustively exercise each FUB within the FUB array are developed. Additional constraints are described for the heterogeneous and pipelined FUB arrays that allow these techniques to be applied for both variant FUB arrays. Furthermore, a simple built-in self test (BIST) scheme is described and applied to a reference design, resulting in an 88.0% reduction in the number of test cycles required without loss in fault coverage. A hierarchical FUB array diagnosis methodology (HFAD) is also presented for the CM-LCV that leverages its unique properties to improve performance for multiple defects. Experiments demonstrate that this HFAD methodology is capable of perfect accuracy in 93.1% of simulations with two injected faults, an improvement on the state-of-the-art commercial diagnosis. Additionally, silicon fail data was collected from a CM-LCV manufactured using a 14nm process by an industry partner. A comparison of the diagnosis results for the 1,375 fail logs examined shows that the HFAD methodology discovers additional defects during multiple defect diagnosis that the commercial tool misses for 40 of the diagnosed fail logs. Examination of these cases shows that the additional defects found by the HFAD methodology can result in improved diagnosis confidence and more precise descriptions of the defect behavior(s). The contributions of this dissertation can thus be summarized as the description of the design, test, and diagnosis of a new logic test chip for use in yield learning during process development. This CM-LCV can be adapted to meet a wide range of test chip requirements, can be efficiently and rigorously tested, and exhibits properties that can be used to improve diagnosis outcomes. All of these claims are validated through both simulated experiments and silicon data. digital circuit diagnosis digital circuit test test chip design yield learning
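For readers unfamiliar with the orthogonal Latin squares mentioned in this abstract, the following sketch checks the standard definition: two n-by-n Latin squares are orthogonal when superimposing them yields every ordered pair of symbols exactly once. It does not reproduce the FUB function or its mapping to Latin squares, which the abstract does not specify. The helper names and the order-3 example squares are assumptions chosen for illustration.

```python
def is_latin_square(sq):
    """True if every row and every column of sq is a permutation of 0..n-1."""
    n = len(sq)
    symbols = set(range(n))
    if not all(len(row) == n and set(row) == symbols for row in sq):
        return False
    return all({sq[r][c] for r in range(n)} == symbols for c in range(n))


def are_orthogonal(a, b):
    """True if a and b are same-order Latin squares whose cell-wise pairs are all distinct."""
    n = len(a)
    if len(b) != n or not (is_latin_square(a) and is_latin_square(b)):
        return False
    pairs = {(a[r][c], b[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


# Illustrative order-3 squares built from modular arithmetic.
square_a = [[(r + c) % 3 for c in range(3)] for r in range(3)]
square_b = [[(r + 2 * c) % 3 for c in range(3)] for r in range(3)]
```

Here `are_orthogonal(square_a, square_b)` holds, while a square is never orthogonal to itself for n greater than 1, since superimposing it on itself produces only the n pairs of identical symbols.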
18	Método otimizado de arquitetura de coerência de cache baseado em sistemas embarcados multinúcleos. / Optimized method for cache coherence architecture based on multicore embedded systems. Kofuji, Jussara Marândola 01 December 2011 This thesis presents a cache coherence architecture method specialized for embedded systems. One of the main contributions of this method is a proposal for a shared-memory CMP architecture oriented to memory access patterns, together with a hybrid coherence protocol. The main contribution is the specification of a new hardware component, called the pattern table, which is validated by a formal representation and by an implementation of the pattern table structure. From this table, a message transaction model was developed for the hybrid protocol that classifies messages as classical or speculative. The final contribution is an analytical model of the effective performance cost of the hybrid protocol. Cache coherence protocol Chip design Processor design Hardware description Memory access patterns
20	Post-silicon Functional Validation with Virtual Prototypes Cong, Kai 03 June 2015 Post-silicon validation has become a critical stage in the system-on-chip (SoC) development cycle, driven by increasing design complexity, higher levels of integration, and decreasing time-to-market. According to recent reports, post-silicon validation effort comprises more than 50% of the overall development effort of a 65nm SoC. Though post-silicon validation covers many aspects ranging from electronic properties of hardware to performance and power consumption of whole systems, a central task remains validating functional correctness of both hardware and its integration with software. There are several key challenges to achieving accelerated and low-cost post-silicon functional validation. First, there is only limited silicon observability and controllability; second, there is no good test coverage estimation over a silicon device; third, it is difficult to generate good post-silicon tests before a silicon device is available; fourth, there are no effective software robustness testing approaches to ensure the quality of hardware/software integration. We propose a systematic approach to accelerating post-silicon functional validation with virtual prototypes. Post-silicon test coverage is estimated in the pre-silicon stage by evaluating the test cases on the virtual prototypes. Such analysis is first conducted on the initial test suite assembled by the user and subsequently on the expanded test suite, which includes test cases that are automatically generated. Based on the coverage statistics of the initial test suite on the virtual prototypes, test cases are automatically generated to improve the test coverage. In the post-silicon stage, our approach supports coverage evaluation of test cases on silicon devices to ensure fidelity of early coverage evaluation. 
The generated test cases are issued to silicon devices to detect inconsistencies between virtual prototypes and silicon devices using conformance checking. We further extend the test case generation framework to generate and inject fault scenarios with virtual prototypes for driver robustness testing. Besides virtual prototype-based fault injection, an automatic driver fault injection approach is developed to support runtime fault generation and injection for driver robustness testing. Since virtual prototypes enable early driver development, our automatic driver fault injection approach can be applied to driver testing in both pre-silicon and post-silicon stages. For preliminary evaluation, we have applied our coverage evaluation and test generation to several network adapters and their virtual prototypes. We have conducted coverage analysis for a suite of common tests on both the virtual prototypes and silicon devices. The results show that our approach can estimate the test coverage with high fidelity. Based on the coverage estimation, we have employed our automatic test generation approach to generate additional tests. When the generated test cases were issued to both virtual prototypes and silicon devices, we observed significant coverage improvement. We also detected 20 inconsistencies between virtual prototypes and silicon devices, each of which reveals a virtual prototype or silicon device defect. After we applied the virtual prototype-based fault injection approach to virtual prototypes for three widely used network adapters, we generated and injected thousands of fault scenarios and found 2 driver bugs. For automatic driver fault injection, we have applied our approach to 12 widely used drivers with either virtual prototypes or silicon devices. After testing all these drivers, we found 28 distinct bugs. Computer software -- Verification Prototypes Engineering -- Computer simulation Other Computer Sciences Software Engineering
