  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
411

Network-on-Chip Synchronization

Buckler, Mark 07 November 2014
Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs, and thereby of SoCs as a whole, by reducing this synchronization latency. First, a survey of NoC improvement techniques is presented. One such technique, a multi-layer NoC, has been successfully simulated. Because DVFS is one of the most commonly used techniques, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, multi-cycle latency is unavoidable with brute-force synchronizers, so predictive synchronizers, which require only a single cycle of latency, have been proposed. To demonstrate the high-level impact of these predictive synchronizer circuits, multi-core system simulations incorporating them have been completed. Multiple GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, the Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible with predictive synchronizers.
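The MTBF calculation and the multi-cycle cost of brute-force synchronizers mentioned above can be illustrated with the classic textbook model MTBF = exp(t_r/τ) / (T_w · f_clk · f_data), where t_r is the time allowed for metastability to resolve, τ the flip-flop resolution time constant, and T_w the metastability window. The sketch below is a generic illustration under that assumption, not the dissertation's simulation flow; the function names and all numeric parameters are hypothetical.

```python
import math


def synchronizer_mtbf(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """Classic synchronizer model: MTBF = exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_resolve_s / tau_s) / (t_window_s * f_clk_hz * f_data_hz)


def brute_force_mtbf(n_flops, f_clk_hz, f_data_hz, tau_s, t_window_s, t_overhead_s):
    """MTBF of a hypothetical n-flop brute-force synchronizer.

    Each flop after the first adds roughly one clock period of resolution time,
    minus a per-stage overhead (e.g. clock-to-Q plus setup time).
    """
    t_resolve = (n_flops - 1) * (1.0 / f_clk_hz - t_overhead_s)
    return synchronizer_mtbf(t_resolve, tau_s, t_window_s, f_clk_hz, f_data_hz)


# Illustrative (hypothetical) numbers: 1 GHz receiving clock, 100 MHz data rate,
# tau = 20 ps, T_w = 20 ps, 100 ps per-stage overhead.
for n in (2, 3):
    years = brute_force_mtbf(n, 1e9, 1e8, 20e-12, 20e-12, 100e-12) / 3.15e7
    print(f"{n} flops: MTBF ~ {years:.3g} years")
```

In this model each added flop multiplies MTBF by exp((T − t_overhead)/τ) but also adds a clock cycle of latency, which is the trade-off that makes brute-force synchronization inherently multi-cycle.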
412

Function Verification of Combinational Arithmetic Circuits

Liu, Duo 17 July 2015
Hardware design verification is the most challenging part of the overall hardware design process, because design size and complexity are growing very fast while performance requirements keep rising. Conventional simulation-based verification cannot keep up with the rapid increase in design size, since it is impossible to exhaustively test all input vectors of a complex design. An important part of hardware verification is the verification of combinational arithmetic circuits. It draws a lot of attention because flattening the design to the bit level, known as the bit-blasting problem, hinders the efficiency of many current formal techniques. The goal of this thesis is to introduce a robust and efficient formal verification method for combinational integer arithmetic circuits, based on an in-depth analysis of recent advances in computer algebra. The proposed method solves the verification problem at the bit level while avoiding the bit-blasting problem. It also avoids the expensive Groebner basis computation typically employed by symbolic computer algebra methods. The method verifies the gate-level implementation of the design by representing the design components (logic gates and arithmetic modules) as polynomials in Z_{2^n} (integers modulo 2^n). It then transforms the polynomial representing the output bits (called the “output signature”) into a unique polynomial in the input signals (called the “input signature”) using gate-level information of the design. The computed input signature is then compared with the reference input signature (golden model) to determine whether the circuit behaves as anticipated. If the reference input signature is not given, the method can be used to compute (or extract) the arithmetic function of the design by computing its input signature. Additional tools based on canonical word-level design representations (such as TED or BMD) can then be used to determine the function that the computed input signature represents.
We demonstrate the applicability of the proposed method to arithmetic circuit verification on a large number of designs.
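The transformation from output signature to input signature described above can be sketched as backward rewriting: each gate output variable is replaced by the gate's polynomial model (AND = ab, OR = a + b − ab, XOR = a + b − 2ab), from the outputs back to the primary inputs, with coefficients reduced modulo 2^n and x·x = x for Boolean variables. This is a minimal illustration of the general idea, not the thesis's implementation; the helper names and the 2-bit adder netlist are hypothetical.

```python
# Polynomials over Z_{2^n} in Boolean variables, stored as {monomial: coefficient},
# where a monomial is a frozenset of variable names (x*x == x for Boolean x).


def _clean(p):
    """Drop monomials whose coefficient has become 0 mod 2^n."""
    return {m: c for m, c in p.items() if c}


def add(p, q, mod):
    r = dict(p)
    for m, c in q.items():
        r[m] = (r.get(m, 0) + c) % mod
    return _clean(r)


def mul(p, q, mod):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = m1 | m2  # Boolean idempotence: x*x = x
            r[m] = (r.get(m, 0) + c1 * c2) % mod
    return _clean(r)


def gate_poly(kind, a, b, mod):
    """Standard algebraic models of 2-input Boolean gates."""
    A, B, AB = frozenset([a]), frozenset([b]), frozenset([a, b])
    if kind == "AND":
        return {AB: 1}                          # a*b
    if kind == "OR":
        return {A: 1, B: 1, AB: (-1) % mod}     # a + b - a*b
    if kind == "XOR":
        return {A: 1, B: 1, AB: (-2) % mod}     # a + b - 2*a*b
    raise ValueError(f"unsupported gate {kind}")


def substitute(p, var, repl, mod):
    """Replace every occurrence of `var` in p by the polynomial `repl`."""
    r = {}
    for m, c in p.items():
        term = mul({m - {var}: c}, repl, mod) if var in m else {m: c}
        r = add(r, term, mod)
    return r


def input_signature(netlist, output_sig, n_bits):
    """Rewrite the output signature gate by gate, from outputs back to inputs."""
    mod = 2 ** n_bits
    sig = _clean({m: c % mod for m, c in output_sig.items()})
    for out, kind, a, b in reversed(netlist):  # netlist is in topological order
        sig = substitute(sig, out, gate_poly(kind, a, b, mod), mod)
    return sig


def mono(*names):
    return frozenset(names)


# Hypothetical 2-bit ripple-carry adder (a1 a0) + (b1 b0) = (c1 s1 s0).
NETLIST = [
    ("s0", "XOR", "a0", "b0"),
    ("c0", "AND", "a0", "b0"),
    ("t", "XOR", "a1", "b1"),
    ("s1", "XOR", "t", "c0"),
    ("g", "AND", "a1", "b1"),
    ("p", "AND", "t", "c0"),
    ("c1", "OR", "g", "p"),
]
output_sig = {mono("s0"): 1, mono("s1"): 2, mono("c1"): 4}           # s0 + 2*s1 + 4*c1
golden = {mono("a0"): 1, mono("b0"): 1, mono("a1"): 2, mono("b1"): 2}  # a0 + 2*a1 + b0 + 2*b1

sig = input_signature(NETLIST, output_sig, n_bits=3)
print(sig == golden)  # True
```

Because a function on Boolean inputs has a unique multilinear polynomial representation, any mismatch with the golden input signature indicates a functional difference; replacing, for example, the XOR that produces `s1` with an AND makes the comparison fail.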
413

On-Chip Communication and Security in FPGAs

Patil, Shivukumar Basanagouda 25 October 2018
Innovations in Field Programmable Gate Array (FPGA) manufacturing processes and architectural design have led to the development of extremely large FPGAs. There has also been widespread adoption of these large FPGAs in cloud infrastructures and data centers to accelerate search and machine learning applications. This work addresses two important topics related to FPGAs: on-chip communication and security. On-chip communication is quickly becoming a bottleneck in today’s large multi-million-gate FPGAs. Hard Networks-on-Chip (NoCs), made of fixed silicon, have been shown to provide low-power, high-speed, flexible on-chip communication. This thesis demonstrates an iterative algorithm for routing pre-scheduled time-division-multiplexed paths in a hybrid NoC FPGA architecture. The routing algorithm is based on the well-known Pathfinder algorithm; it overcomes several limitations of a previous greedy implementation and successfully routes connections using a higher number of timeslots than greedy approaches. The new algorithm shows an average bandwidth improvement of 11% for both unicast and multicast traffic patterns. Regarding on-chip FPGA security, a recent study on covert-channel communication in Xilinx FPGA devices has shown information leaking from long interconnect wires into immediately neighboring wires. An attacker in a multi-tenant FPGA cloud infrastructure can use this information leakage to non-invasively steal secret information from an unsuspecting user's design. This work demonstrates that the same information leakage is also present in Intel SRAM FPGAs. Information leakage in Cyclone-IV E and Stratix-V FPGA devices is quantified and characterized with varying parameters and across different routing elements of the FPGAs.
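The Pathfinder idea the routing algorithm above builds on is negotiated congestion: nets are repeatedly ripped up and rerouted with a node cost that grows with present over-use and with accumulated congestion history, until no routing resource is shared beyond its capacity. The sketch below shows only that generic loop on a small node graph; it does not model the thesis's time-division-multiplexed timeslots, and the cost formula, parameter values, and example graph are assumptions.

```python
import heapq
from collections import defaultdict


def shortest_path(graph, src, dst, node_cost):
    """Dijkstra over node costs; assumes dst is reachable from src."""
    dist = {src: node_cost(src)}
    prev = {}
    heap = [(dist[src], src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + node_cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def pathfinder(graph, nets, capacity=1, max_iters=30,
               pres_fac=0.5, pres_growth=1.5, hist_fac=1.0):
    """Negotiated-congestion routing in the style of Pathfinder (simplified)."""
    occupancy = defaultdict(int)   # how many nets currently use each node
    history = defaultdict(float)   # accumulated congestion history per node
    routes = {}

    def cost(node):
        # Base cost 1, raised by history and by the over-use this net would add.
        overuse = max(0, occupancy[node] + 1 - capacity)
        return (1.0 + history[node]) * (1.0 + pres_fac * overuse)

    for _ in range(max_iters):
        for name, (src, dst) in nets.items():
            for node in routes.get(name, []):   # rip up this net's old route
                occupancy[node] -= 1
            routes[name] = shortest_path(graph, src, dst, cost)
            for node in routes[name]:
                occupancy[node] += 1
        overused = [n for n, k in occupancy.items() if k > capacity]
        if not overused:
            return routes, True
        for node in overused:
            history[node] += hist_fac * (occupancy[node] - capacity)
        pres_fac *= pres_growth                 # make sharing steadily more expensive
    return routes, False


# Hypothetical example: both nets prefer node M, but only net1 has a detour.
GRAPH = {
    "A": ["M", "X"], "B": ["M", "Y"], "C": ["M"], "D": ["M"],
    "M": ["A", "B", "C", "D"], "X": ["A", "Y"], "Y": ["X", "B"],
}
NETS = {"net1": ("A", "B"), "net2": ("C", "D")}
routes, legal = pathfinder(GRAPH, NETS)
print(legal, routes)  # True {'net1': ['A', 'X', 'Y', 'B'], 'net2': ['C', 'M', 'D']}
```

In the first iteration both nets take the short path through M; the history and present-congestion costs then push net1 onto its detour, which a single greedy pass would not revisit.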
414

Energy-Efficient On-Chip Cache Architectures and Deep Neural Network Accelerators Considering the Cost of Data Movement

Xu, Hongjie 23 March 2021
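Appended degree program: Kyoto University Program of Excellence for Graduate Schools, “Creation of Advanced Photonic and Electronic Devices” / Kyoto University / New system, doctorate by coursework / Doctor of Informatics / Kō No. 23325 / Jōhaku No. 761 / 新制||情||130 (University Library) / Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University / (Chief examiner) Professor Hidetoshi Onodera, Professor Eiji Oki, Professor Takashi Sato / Falls under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM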
415

Differential Power Analysis In-Practice for Hardware Implementations of the Keccak Sponge Function

Graff, Nathaniel 01 June 2018
The Keccak sponge function is the winner of the National Institute of Standards and Technology (NIST) competition to develop the Secure Hash Algorithm-3 (SHA-3) standard. Prior work has developed reference implementations of the algorithm and described the structures necessary to harden it against power analysis attacks, which can weaken the cryptographic properties of the hash algorithm. This work demonstrates the architectural changes to the reference implementation necessary to achieve the theoretical side-channel-resistant structures, compares their efficiency and performance characteristics after synthesis and place-and-route on Field Programmable Gate Arrays (FPGAs), publishes the resulting implementations under the Massachusetts Institute of Technology (MIT) open-source license, and shows that the resulting implementations demonstrably harden the sponge function against power analysis attacks.
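Power analysis hardening of Keccak centers on χ, its only nonlinear step (b[i] = a[i] ⊕ (¬a[i+1] ∧ a[i+2]) on each 5-bit row). The abstract does not say which side-channel-resistant structures were implemented, so the sketch below shows one published style of countermeasure as an assumption: a three-share masked χ in which each output share is computed from only two of the three input shares, so no single function ever handles the unmasked value. The function names are hypothetical and this is a software model, not the thesis's hardware.

```python
import itertools
import secrets


def chi_row(a):
    """Unmasked Keccak chi on one 5-bit row: b[i] = a[i] ^ (~a[i+1] & a[i+2])."""
    return [a[i] ^ ((a[(i + 1) % 5] ^ 1) & a[(i + 2) % 5]) for i in range(5)]


def _chi_share(x, y):
    """One output share computed from only two of the three input shares."""
    return [
        x[i]
        ^ ((x[(i + 1) % 5] ^ 1) & x[(i + 2) % 5])
        ^ (x[(i + 1) % 5] & y[(i + 2) % 5])
        ^ (y[(i + 1) % 5] & x[(i + 2) % 5])
        for i in range(5)
    ]


def chi_three_shares(a, b, c):
    """Three-share chi: no output share ever sees all three input shares."""
    return _chi_share(b, c), _chi_share(c, a), _chi_share(a, b)


def mask(row):
    """Split a row into three random shares whose XOR equals the row."""
    a = [secrets.randbits(1) for _ in range(5)]
    b = [secrets.randbits(1) for _ in range(5)]
    c = [r ^ x ^ y for r, x, y in zip(row, a, b)]
    return a, b, c


def unmask(a, b, c):
    return [x ^ y ^ z for x, y, z in zip(a, b, c)]


# Check correctness for every 5-bit row with fresh random masks.
for bits in itertools.product((0, 1), repeat=5):
    row = list(bits)
    assert unmask(*chi_three_shares(*mask(row))) == chi_row(row)
print("masked chi matches unmasked chi on all 32 rows")
```

Correctness follows because the six cross products across the three output shares sum to (A⊕B⊕C)[i+1]·(A⊕B⊕C)[i+2]; the hardware cost is roughly tripling the χ logic and the state registers, which is the kind of efficiency trade-off the thesis measures after place-and-route.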
416

Partitioning Algorithms for Model Data Structures for Parallel Compiler-Driven Logic Simulation (Project)

Hering, Klaus 08 July 2019
The enormous complexity of VLSI circuit developments foreseeable in the near future requires very time-consuming simulation processes at all design levels. One answer to this challenge is to parallelize these processes. A research project is presented that aims at an effective partitioning of model data structures prior to compiler-driven logic simulation on parallel computers with loosely coupled processors. Within this project, partitioning algorithms are to be developed starting from a graph model and investigated theoretically, and criteria for their use are to be derived depending on model properties typical of the application. To support this experimentally, the development of a parallel test environment for analyzing the relevant model data structures is planned. This environment is to be extended into a software component that, as the result of preprocessing the model data structures, selects and applies partitioning algorithms; this component is ultimately to be integrated into a logic simulation system based on parallel instances of one of today's leading commercially available functional logic simulators.
417

Cone-Based Hierarchical Model Partitioning for Parallel Compiler-Driven Logic Simulation in VLSI Design

Hering, Klaus, Haupt, Reiner, Villmann, Thomas 11 July 2019
An important form of verification for VLSI designs comprising complete processor structures is functional logic simulation at the gate and register level. In the context of developing a parallel logic simulation system based on the functional simulator TEXSIM (IBM), which operates according to the clock-cycle algorithm, this work examines the model partitioning that precedes parallel simulation. Starting from a structural hardware model, a two-level hierarchical partitioning approach based on the notion of a cone is presented within the framework of a k-level strategy. This approach leaves room for investigating combinations of algorithms. A superposition principle for partitions allows the results of the partitioning methods at one hierarchy level to be merged. Within our bottom-up partitioning approach, two new model partitioning methods are introduced: the Backward-Cone-Concentration algorithm (n-BCC) and the Minimum-Overlap-Cone-Cluster algorithm (MOCC).
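The cone notion underlying this partitioning can be made concrete: the backward cone of a register input or primary output is its transitive fan-in, and signals shared between cones either have to be replicated or communicated when the cones land in different partitions. The abstract does not describe how n-BCC or MOCC work, so the sketch below only computes cones and their pairwise overlap, the raw quantities a cone-based partitioner would reason about; the helper names and the netlist are hypothetical.

```python
from itertools import combinations


def backward_cone(fanin, root):
    """All signals in the transitive fan-in of `root`, including `root`.

    `fanin` maps each gate output to its input signals; primary inputs and
    register outputs have no entry, so the traversal stops there.
    """
    cone, stack = set(), [root]
    while stack:
        sig = stack.pop()
        if sig not in cone:
            cone.add(sig)
            stack.extend(fanin.get(sig, ()))
    return cone


def cone_overlaps(fanin, roots):
    """Backward cone per root and the number of shared signals per root pair."""
    cones = {r: backward_cone(fanin, r) for r in roots}
    overlap = {(r1, r2): len(cones[r1] & cones[r2]) for r1, r2 in combinations(roots, 2)}
    return cones, overlap


# Hypothetical netlist: o1 and o2 are register inputs / primary outputs.
FANIN = {
    "o1": ["g1", "g2"], "o2": ["g2", "g3"],
    "g1": ["a", "b"], "g2": ["b", "c"], "g3": ["c", "d"],
}
cones, overlap = cone_overlaps(FANIN, ["o1", "o2"])
print(sorted(cones["o1"]))   # ['a', 'b', 'c', 'g1', 'g2', 'o1']
print(overlap)               # {('o1', 'o2'): 3}
```

A partitioner that tries to keep strongly overlapping cones together (in the spirit of a "minimum overlap" criterion) would place `o1` and `o2` in the same partition here, since splitting them duplicates `g2`, `b`, and `c`.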
418

Experimental Study and Modeling of the gm-I Dependence of Long-Channel MOSFETs

Cheng, Michael Fong 01 March 2019
This thesis describes an experimental study and modeling of the current-transconductance (I-gm) dependence of the ALD1106, ALD1107, and CD4007 transistor arrays. The study tests the hypothesis that the I-gm dependence of these MOSFETs, with channel lengths of 7.8 µm to 10 µm, conforms to the Advanced Compact Model (ACM). The measurements, however, do not support this hypothesis. Despite their relatively long channels, both the ALD1106 and ALD1107 show sufficiently pronounced ‘short-channel’ effects to render the ACM inadequate. As a byproduct of this effort, we confirmed the modified ACM equation: with an m factor of approximately 0.6, it captures the I-gm dependence with a maximum error below 28% and an average error below 10%. The thesis also introduces several formulas and procedures for I-gm model extraction and tuning. These are not specific to the ALD transistor family and can be applied to MOSFETs with different physical sizes and electrical performance.
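For reference, the unmodified ACM relation in saturation ties transconductance to drain current through the forward inversion level i_f = I_D/I_S: gm/I_D = 2 / (n·φt·(1 + √(1 + i_f))), tending to 1/(n·φt) in weak inversion. The sketch below evaluates that standard form and the maximum/average relative error metrics quoted above; the thesis's modified equation with its m factor is not given in the abstract and is not reproduced, and the parameter defaults (n, I_S) are hypothetical placeholders that would have to be extracted from measurements.

```python
import math


def acm_gm(i_d, i_s, n=1.3, phi_t=0.0259):
    """gm from drain current with the standard ACM saturation relation.

    gm / I_D = 2 / (n * phi_t * (1 + sqrt(1 + i_f))), with i_f = I_D / I_S.
    i_s (specific current) and n (slope factor) are device parameters to be
    extracted; the defaults here are placeholders, phi_t ~ kT/q at 300 K.
    """
    i_f = i_d / i_s
    return 2.0 * i_d / (n * phi_t * (1.0 + math.sqrt(1.0 + i_f)))


def error_stats(measured, modeled):
    """Maximum and average relative error of modeled vs. measured gm."""
    errs = [abs(m - p) / abs(m) for m, p in zip(measured, modeled)]
    return max(errs), sum(errs) / len(errs)


# Hypothetical sweep from weak to strong inversion (I_S = 1 uA).
for i_d in (1e-9, 1e-7, 1e-6, 1e-5, 1e-4):
    gm = acm_gm(i_d, i_s=1e-6)
    print(f"I_D = {i_d:.0e} A  gm/I_D = {gm / i_d:.2f} 1/V")
```

Fitting n and I_S to measured (I_D, gm) pairs and then calling `error_stats` gives the kind of maximum/average error figures the abstract reports.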
419

Testing and Validation of a Prototype GPGPU Design for FPGAs

Merchant, Murtaza 01 January 2013
Due to their suitability for highly parallel and pipelined computation, field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPGPUs) have emerged as top contenders for hardware acceleration of high-performance computing applications. FPGAs are highly specialized devices that can be customized to a specific application, whereas GPGPUs are made of a fixed array of multiprocessors with a rigid architectural model. To alleviate this rigidity, and to combine other benefits of the two platforms, it is desirable to explore the implementation of a flexible GPGPU (soft GPGPU) using the reconfigurable fabric found in an FPGA. This thesis describes an aggressive effort to test and validate a prototype GPGPU design targeted to a Virtex-6 FPGA. Individual design stages are tested and integrated using manually generated RTL testbenches and logic simulation tools. The soft GPGPU design is validated by benchmarking the platform against five standard CUDA benchmarks. The platform is fully CUDA-compatible and supports direct execution of compiled CUDA binaries. Platform scalability is validated by varying the number of processing cores and multiprocessors and evaluating the effects on area and performance. Experimental results show an average speedup of 25x for a 32-core soft GPGPU configuration over a fully optimized MicroBlaze soft microprocessor, highlighting the benefits of the thread-based execution model of GPUs and their ability to perform complex control-flow operations in hardware. The testing and validation of the soft GPGPU system serves as a prerequisite for rapid design exploration of the platform in the future.
420

Testable Clock Distributions for 3D Integrated Circuits

Buttrick, Michael T 01 January 2011
3D integration of dies promises to address the problem of increasing die size caused by the slowing of scaling. By partitioning a design among two or more dies and stacking them vertically, the average interconnect length is greatly decreased, and power is reduced accordingly. Also, since smaller dies have a higher yield, 3D integration will reduce manufacturing costs. However, this yield increase can only be realized if manufactured dies can be tested before they are stacked; otherwise, the overall yield for the die stack will be worse than that of the single, larger die. One of the largest issues with prebond die testing is that, to save power, a single die may not have a complete clock distribution network until bonding. This thesis addresses prebond die testability by ensuring that the clock distribution network on a single die operates with low skew during testing, and at reduced power consumption during operation compared to a full clock network. The development of a delay-locked loop (DLL) is detailed; the DLL is used to synchronize disconnected clock networks on a prebond die. This succeeds in providing a test clock network whose skew is sufficiently close to the target postbond skew. Additionally, a scheme to increase interdie bandwidth by multiplexing Through-Silicon Vias (TSVs) by the system clock is presented. This technique greatly increases the number of effective signal TSVs while imposing negligible area overhead and causing no performance degradation.
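How a DLL drives the skew between a disconnected clock network and its reference toward zero can be sketched with a simple behavioral model: a bang-bang phase detector reports "early" or "late" each cycle, and a controller moves a digitally tapped delay line one step at a time until the residual skew is within half a tap. This is an illustrative assumption, not the thesis's circuit; it ignores phase wraparound, jitter, and analog delay control, and the function name and delay values are hypothetical.

```python
def dll_lock(ref_delay_ps, fixed_delay_ps, tap_ps, n_taps, max_cycles=1000):
    """Behavioral sketch of a digitally tuned DLL with a bang-bang phase detector.

    Each cycle the detector reports whether the local (delayed) clock arrives
    early or late relative to the reference, and the controller moves the
    delay-line tap by one step. Returns (tap, residual_skew_ps, cycles) once
    locked, or (tap, residual_skew_ps, None) if it cannot lock.
    """
    tap, skew = 0, None
    for cycle in range(max_cycles):
        skew = fixed_delay_ps + tap * tap_ps - ref_delay_ps   # > 0 means late
        if abs(skew) <= tap_ps / 2:
            return tap, skew, cycle          # locked within half a tap
        if skew < 0 and tap < n_taps - 1:
            tap += 1                         # early: add one tap of delay
        elif skew > 0 and tap > 0:
            tap -= 1                         # late: remove one tap of delay
        else:
            break                            # delay line out of range
    return tap, skew, None


# Hypothetical numbers: reference edge at 250 ps, local network 80 ps, 10 ps taps.
print(dll_lock(250, 80, 10, 32))    # (17, 0, 17)
print(dll_lock(1000, 80, 10, 32))   # (31, -610, None): needed delay exceeds the line
```

The residual skew after lock is bounded by the tap resolution, which is why the achievable test-mode skew depends on how finely the delay line is quantized.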
