  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Quantum Emulation with Probabilistic Computers

Shuvro Chowdhury (14030571) 31 October 2022
<p>The recent groundbreaking demonstrations of quantum supremacy in the noisy intermediate-scale quantum (NISQ) computing era have triggered intense activity in establishing finer boundaries between classical and quantum computing. In this dissertation, we use established techniques based on quantum Monte Carlo (QMC) to map quantum problems onto probabilistic networks in which the fundamental unit of computation, the p-bit, is inherently probabilistic and can be tuned to fluctuate between ‘0’ and ‘1’ with a desired probability. We can view this mapped network as a Boltzmann machine whose states each represent a Feynman path leading from an initial configuration of q-bits to a final configuration. Each such path has, in general, a complex amplitude ψ, which can be associated with a complex energy. The real part of this energy can be used to generate samples of Feynman paths in the usual way, while the imaginary part is accounted for by treating the samples as complex entities, unlike in ordinary Boltzmann machines, where samples are positive. This mapping of a quantum circuit onto a Boltzmann machine with complex energies should be particularly useful in view of the advent of special-purpose hardware accelerators known as Ising machines, which can obtain a very large number of samples per second through massively parallel operation. We also demonstrate this acceleration on a recently used quantum problem, speeding up its QMC simulation by a factor of ∼1000× compared to a highly optimized CPU program. Although this speed-up has been demonstrated using a graph-colored architecture on an FPGA, we project another ∼100× improvement with an architecture that uses clockless analog circuits. We believe that this will contribute significantly to the growing efforts to push the boundaries of the simulability of quantum circuits with classical/probabilistic resources and to compare them with NISQ-era quantum computers.</p>
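The sampling idea in the abstract above can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the 3-spin couplings `J`, biases `H`, the toy imaginary energy scaled by `THETA`, and all function names are assumptions made for illustration. Bipolar p-bits are updated with the standard Gibbs rule for the real part of the energy, and each sample is then weighted by the phase exp(-i Im E), so the running average is complex-valued.

```python
import cmath
import itertools
import math
import random

# Toy 3-spin problem; every number below is an illustrative assumption.
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]   # symmetric couplings
H = [0.1, -0.2, 0.0]      # biases
THETA = 0.4               # scale of the toy imaginary energy


def real_energy(s):
    """Real part of the energy: sets the sampling weight exp(-Re E)."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(H[i] * s[i] for i in range(n))


def imag_energy(s):
    """Toy imaginary part: contributes only a phase exp(-i Im E)."""
    return THETA * sum(s)


def pbit_sweep(s, rng):
    """One sequential sweep: each p-bit becomes +1 with probability (1 + tanh(I)) / 2."""
    n = len(s)
    for i in range(n):
        local = H[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
        s[i] = 1 if rng.random() < 0.5 * (1.0 + math.tanh(local)) else -1


def sampled_phase_average(sweeps, seed=0):
    """Average the complex phase over states sampled from exp(-Re E)."""
    rng = random.Random(seed)
    s = [1, 1, 1]
    total = 0j
    for _ in range(sweeps):
        pbit_sweep(s, rng)
        total += cmath.exp(-1j * imag_energy(s))
    return total / sweeps


def exact_phase_average():
    """Brute-force reference over all 2**3 states."""
    num, den = 0j, 0.0
    for s in itertools.product((-1, 1), repeat=3):
        w = math.exp(-real_energy(s))
        num += w * cmath.exp(-1j * imag_energy(s))
        den += w
    return num / den
```

Comparing `sampled_phase_average(20000)` with `exact_phase_average()` shows the sampled complex average approaching the brute-force value; the sequential sweep here stands in for the parallel hardware updates the abstract describes.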
22

Internet of Things Architecture Design and Implementation for Immersive Interfaces

Javier Belmonte (9193829) 09 September 2022
<div>The arrival of the Internet of Things (IoT) has enabled manufacturers, teachers, machine operators, makers, and researchers to design and use new workflows, fabricate parts efficiently and effectively, and interact with systems and devices in ways that were not possible before.</div><div>These networked systems have changed the way in which input is received from humans and data is output to them. Context-awareness and autonomy are characteristics of these devices that result in automated processes, faster production times, and more intuitive interfaces. Direct manipulation is an intuitive and natural form of human-computer interaction (HCI) that lets users learn quickly and easily.</div><div>In this thesis, an Internet of Things architecture is designed and implemented to enable control and data visualization in machines and devices through immersive interfaces using direct manipulation. The proposed architecture and interfaces are tested and validated on three categories of systems: systems that need to be modified to be IoT ready, systems that are already IoT ready, and systems that have not yet been constructed. For the last category, a custom system was built to evaluate and test the whole architecture and its implementation. The knowledge acquired in developing this architecture and the design rationale behind the immersive interfaces are summarized and presented as a series of guidelines and recommendations that IoT system manufacturers can follow to include immersive interfaces in their designs.</div>
23

Towards No-Penalty Control Hazard Handling in RISC architecture microcontrollers

LINKNATH SURYA BALASUBRAMANIAN (8781929) 03 September 2024
<p dir="ltr">Achieving higher throughput is one of the most important requirements of a modern microcontroller. It therefore cannot afford to waste a considerable number of clock cycles on branch mispredictions. This paper proposes a hardware mechanism that lets microcontrollers forgo branch predictors, thereby removing branch mispredictions. The scope of this work is limited to low-cost microcontroller cores used in embedded systems. The proposed technique is implemented as five modules that work together to forward required operands, resolve branches without prediction, and calculate the next instruction's address in the first stage of an in-order, five-stage pipelined micro-architecture. Since the address of the instruction that follows a control transfer instruction is calculated in the first pipeline stage, branch prediction is no longer necessary, which eliminates the clock cycle penalties incurred when using a branch predictor. The designed architecture successfully calculated the address of the correct next instruction and fetched it without wasting any clock cycles, except in cases where a control transfer instruction has a true dependence on the immediately preceding instruction. Further, we synthesized the proposed design in a 7 nm FinFET process and compared its latency with other designs to make sure that the microcontroller's operating frequency is not degraded by using this design. The critical path latency of the instruction fetch stage integrated with the proposed architecture is 307 ps, excluding the instruction cache access time.</p>
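The exception named in the abstract above, a control transfer instruction with a true dependence on the immediately preceding instruction, can be counted in an instruction trace with a short Python sketch. This is only an illustration under stated assumptions: the `Instr` record, the `BRANCH_OPS` set, and the register names are hypothetical, and the sketch says nothing about how many cycles the hardware loses in such a case.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical trace record: opcode, destination register, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


# Assumed set of control transfer opcodes that read registers.
BRANCH_OPS = {"beq", "bne", "jr"}


def dependent_branches(trace: List[Instr]) -> int:
    """Count branches that read a register written by the instruction just before them."""
    count = 0
    for prev, cur in zip(trace, trace[1:]):
        if cur.op in BRANCH_OPS and prev.dest is not None and prev.dest in cur.srcs:
            count += 1
    return count


trace = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("beq", None, ("r1", "r0")),   # reads r1 written one instruction earlier
    Instr("add", "r4", ("r2", "r3")),
    Instr("sub", "r5", ("r2", "r3")),
    Instr("bne", None, ("r4", "r0")),   # r4 was written two instructions earlier
]
hazards = dependent_branches(trace)
```

Here only the `beq` is flagged, because the `bne` reads a value produced two instructions earlier, which operand forwarding could plausibly supply in time.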
24

EFFICIENTNEXT: EFFICIENTNET FOR EMBEDDED SYSTEMS

Abhishek Rajendra Deokar (12456477) 12 July 2022
<p>Convolutional Neural Networks have come a long way since AlexNet. Each year the limits of the state of the art are pushed to new levels. EfficientNet pushed the performance metrics to a new high, and EfficientNetV2 even more so. Even so, architectures for mobile applications can benefit from improved accuracy and a reduced model footprint. The classic Inverted Residual block has been the foundation upon which most mobile networks seek to improve, and the EfficientNet architecture is built using the same block. In this paper we experiment with Harmonious Bottlenecks in place of the Inverted Residuals to observe a reduction in the number of parameters and an improvement in accuracy. The designed network is then deployed on the NXP i.MX 8M Mini board for image classification.</p>
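To make the "number of parameters" in the abstract above concrete, the following Python sketch counts the convolution weights of a conventional inverted residual block (1x1 expansion, k x k depthwise, 1x1 projection). It is an assumption-laden illustration: batch norm, biases, and squeeze-and-excitation are ignored, the function name and the example sizes are hypothetical, and the Harmonious Bottleneck that the thesis substitutes is not modeled here.

```python
def inverted_residual_params(c_in: int, c_out: int, expand: int, kernel: int) -> int:
    """Convolution weights in a 1x1 expand -> depthwise -> 1x1 project block."""
    hidden = c_in * expand
    expand_weights = c_in * hidden            # 1x1 pointwise expansion
    depthwise_weights = kernel * kernel * hidden  # one k x k filter per hidden channel
    project_weights = hidden * c_out          # 1x1 pointwise projection
    return expand_weights + depthwise_weights + project_weights


# Illustrative sizes only: 16 -> 24 channels, expansion ratio 6, 3x3 depthwise kernel.
example = inverted_residual_params(c_in=16, c_out=24, expand=6, kernel=3)
```

With these sizes the two pointwise convolutions account for 3,840 of the 4,704 weights, which is why changes to the channel expansion tend to dominate a block's footprint.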
25

AUTOMATING BIG VISUAL DATA COLLECTION AND ANALYTICS TOWARD LIFECYCLE MANAGEMENT OF ENGINEERING SYSTEMS

Jongseong Choi (9011111) 09 September 2022
Images have become a ubiquitous and efficient way to record information. Their use for data capture has increased greatly due to the widespread availability of image sensors and sensor platforms (e.g., smartphones and drones), the simplicity of this approach for broad groups of users, and pervasive access to the internet, itself a class of infrastructure. Such data contain abundant visual information that can be exploited to automate asset assessment and management tasks that are traditionally conducted manually for engineering systems. Automation of data collection, extraction, and analytics is, however, key to realizing the use of these data for decision-making. Despite recent advances in computer vision and machine learning techniques for extracting information from images, automation of these real-world tasks has been limited thus far. This is partly due to the variety of data and the fundamental challenges associated with each domain. Given the societal demand for access to, and steady operation of, our infrastructure systems, this class of systems is an ideal application where automation can have high impact. At present, extensive human involvement is required for everyday procedures such as organizing, filtering, and ranking the data before analysis techniques can be executed, which discourages engineers from even collecting large volumes of data. To break down these barriers, methods must be developed and validated to speed up the analysis and management of data over the lifecycle of infrastructure systems. In this dissertation, big visual data collection and analysis methods are developed with the goal of reducing the burden associated with manual procedures. The automated capabilities developed herein focus on applications in lifecycle visual assessment and are intended to exploit large volumes of data collected periodically over time.
To demonstrate the methods, various classes of infrastructure commonly located in our communities are chosen for validating this work because they (i) provide commodities and services essential to enable, sustain, or enhance our lives, and (ii) require lifecycle structural assessment as a high priority. To validate those capabilities, infrastructure assessment applications are developed that apply several big visual data techniques, such as region-of-interest extraction, orthophoto generation, image localization, object detection, and image organization using convolutional neural networks (CNNs), depending on the domain of lifecycle assessment needed for the target infrastructure. This research can, however, be adapted to many other applications where monitoring and maintenance are required over the lifecycle.
26

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024
<p>Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute intensive, and this cost increases profoundly with newer and better CNN models. However, convolutions come with characteristics such as sparsity which can be exploited. In this dissertation, we propose three different works to capture sparsity for faster performance and reduced energy.</p>
<p>The first work is an accelerator design called <em>SparTen</em> for accelerating convolutions with two-sided, fine-grained sparsity (i.e., sparsity in both filters and feature maps). <em>SparTen</em> identifies efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, <em>SparTen</em> proposes load balancing schemes for higher compute unit utilization. <em>SparTen</em> performs 4.7x, 1.8x and 3x better than a dense architecture, a one-sided architecture, and SCNN (the previous state-of-the-art accelerator), respectively. The second work, <em>BARISTA</em>, scales up SparTen (and SparTen-like proposals) to large-scale implementations with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined: optimizing for one factor strains another and may invalidate some optimizations proposed in small-scale implementations. <em>BARISTA</em> proposes novel techniques to balance the three factors in large-scale accelerators. <em>BARISTA</em> performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-sided, naively scaled two-sided, and iso-area two-sided architectures, respectively. The last work, <em>EUREKA</em>, builds an efficient tensor core to execute dense, structured, and unstructured sparsity without losing efficiency. <em>EUREKA</em> achieves this by proposing novel techniques to improve compute utilization by slightly tweaking operand stationarity. <em>EUREKA</em> achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and structured sparse execution, respectively. <em>EUREKA</em> incurs area and power overheads of only 6% and 11.5%, respectively, over Ampere.</p>
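The inner join that the abstract above names as the key primitive for two-sided sparsity can be sketched in Python as a dot product over only the positions where both operands are nonzero. This is a generic illustration, not the accelerator's design: the bitmask-plus-packed-values storage, the function names, and the example vectors are all assumptions.

```python
def compress(dense):
    """Store a vector as (bitmask of nonzero positions, packed nonzero values)."""
    mask = 0
    values = []
    for i, v in enumerate(dense):
        if v != 0:
            mask |= 1 << i
            values.append(v)
    return mask, values


def popcount_below(mask, pos):
    """Number of set bits below `pos`, i.e. the index into the packed values."""
    return bin(mask & ((1 << pos) - 1)).count("1")


def sparse_dot(a, b):
    """Inner join: multiply only where both operands are nonzero."""
    mask_a, vals_a = a
    mask_b, vals_b = b
    common = mask_a & mask_b           # positions nonzero in both vectors
    total = 0
    while common:
        low = common & -common         # isolate the lowest set bit
        pos = low.bit_length() - 1
        total += vals_a[popcount_below(mask_a, pos)] * vals_b[popcount_below(mask_b, pos)]
        common ^= low                  # clear that bit and continue
    return total


filt = compress([0, 2, 0, 3, 4])
fmap = compress([1, 5, 0, 0, 2])
result = sparse_dot(filt, fmap)
```

In this example only positions 1 and 4 are nonzero in both vectors, so two multiplications are performed instead of five.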
