Towards No-Penalty Control Hazard Handling in RISC architecture microcontrollers

LINKNATH SURYA BALASUBRAMANIAN (8781929) 03 September 2024 (has links)
Achieving high throughput is one of the most important requirements of a modern microcontroller, which therefore cannot afford to waste a considerable number of clock cycles on branch mispredictions. This paper proposes a hardware mechanism that lets microcontrollers forgo branch predictors entirely, thereby eliminating branch mispredictions. The scope of this work is limited to low-cost microcontroller cores used in embedded systems. The proposed technique is implemented as five modules that work together to forward the required operands, resolve branches without prediction, and calculate the next instruction's address in the first stage of an in-order, five-stage pipelined microarchitecture. Since the address of the instruction following a control transfer instruction is calculated in the first pipeline stage, branch prediction is no longer necessary, eliminating the clock-cycle penalties incurred when a branch predictor mispredicts. The designed architecture successfully calculates the address of the next correct instruction and fetches it without wasting clock cycles, except when a control transfer instruction has a true dependence on its immediately preceding instruction. Further, we synthesized the proposed design in a 7 nm FinFET process and compared its latency with other designs to ensure that the microcontroller's operating frequency is not degraded. The critical-path latency of the instruction fetch stage integrated with the proposed architecture is 307 ps, excluding the instruction cache access time.
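The core idea of the abstract — resolving a branch in the fetch stage with forwarded operands so the next PC never needs a prediction — can be sketched in software. This is a hypothetical, illustrative model only; the instruction encoding and register names are assumptions, not the thesis's actual design.

```python
# Illustrative sketch: compute the next PC in the fetch stage by comparing
# forwarded operand values, so no branch predictor (and no misprediction
# penalty) is needed. Opcodes and operand fields are hypothetical.

def next_pc(pc, instr, regs):
    """Return the address of the next instruction, resolved in fetch."""
    op = instr["op"]
    if op == "beq":  # conditional branch: compare forwarded operands
        taken = regs[instr["rs1"]] == regs[instr["rs2"]]
        return pc + instr["imm"] if taken else pc + 4
    if op == "jal":  # unconditional jump: target known from the immediate
        return pc + instr["imm"]
    return pc + 4    # ordinary instruction: sequential fetch

regs = {"x1": 7, "x2": 7}
branch = {"op": "beq", "rs1": "x1", "rs2": "x2", "imm": 16}
print(next_pc(0x100, branch, regs))  # taken branch: prints 272 (0x110)
```

The one case this cannot cover in a single cycle, as the abstract notes, is when the branch's operands are produced by the immediately preceding instruction and are not yet available for forwarding.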
EFFICIENTNEXT: EFFICIENTNET FOR EMBEDDED SYSTEMS

Abhishek Rajendra Deokar (12456477) 12 July 2022 (has links)
Convolutional neural networks have come a long way since AlexNet, and each year the limits of the state of the art are pushed to new levels. EfficientNet raised the performance bar, and EfficientNetV2 raised it further. Even so, architectures for mobile applications can benefit from improved accuracy and a reduced model footprint. The classic Inverted Residual block has been the foundation upon which most mobile networks seek to improve, and the EfficientNet architecture is built from the same Inverted Residual block. In this paper we experiment with Harmonious Bottlenecks in place of the Inverted Residuals and observe a reduction in the number of parameters and an improvement in accuracy. The designed network is then deployed on the NXP i.MX 8M Mini board for image classification.
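For intuition on why the choice of block matters for model footprint, here is a back-of-the-envelope parameter count for the classic Inverted Residual block (1x1 expansion, depthwise convolution, 1x1 projection). This is a simplified sketch that ignores batch-norm parameters and squeeze-excitation; the channel sizes are illustrative, not taken from the paper.

```python
# Rough weight count for an Inverted Residual block:
#   1x1 expand (c_in -> c_in*expand), kxk depthwise, 1x1 project (-> c_out).
# Batch norm, bias, and squeeze-excitation are omitted for simplicity.

def inverted_residual_params(c_in, c_out, expand=6, k=3):
    c_mid = c_in * expand
    expand_1x1 = c_in * c_mid    # pointwise expansion weights
    depthwise = k * k * c_mid    # one kxk filter per expanded channel
    project_1x1 = c_mid * c_out  # pointwise projection weights
    return expand_1x1 + depthwise + project_1x1

print(inverted_residual_params(24, 24))  # 24 -> 144 -> 24 channels: prints 8208
```

Most of the cost sits in the two pointwise convolutions, which is why alternative bottleneck designs that restructure the channel expansion can cut parameters noticeably.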
AUTOMATING BIG VISUAL DATA COLLECTION AND ANALYTICS TOWARD LIFECYCLE MANAGEMENT OF ENGINEERING SYSTEMS

Jongseong Choi (9011111) 09 September 2022 (has links)
Images have become a ubiquitous and efficient data form to record information. Use of this option for data capture has increased greatly due to the widespread availability of image sensors and sensor platforms (e.g., smartphones and drones), the simplicity of this approach for broad groups of users, and our pervasive access to the internet as one class of infrastructure in itself. Such data contains abundant visual information that can be exploited to automate asset assessment and management tasks that are traditionally conducted manually for engineering systems. Automation of the data collection, extraction, and analytics is, however, key to realizing the use of these data for decision-making. Despite recent advances in computer vision and machine learning techniques for extracting information from images, automation of these real-world tasks has been limited thus far. This is partly due to the variety of data and the fundamental challenges associated with each domain. Because of the societal demands for access to and steady operation of our infrastructure systems, this class of systems represents an ideal application where automation can have high impact. Extensive human involvement is required at this time to perform everyday procedures such as organizing, filtering, and ranking the data before executing analysis techniques, consequently discouraging engineers from even collecting large volumes of data. To break down these barriers, methods must be developed and validated to speed up the analysis and management of data over the lifecycle of infrastructure systems. In this dissertation, big visual data collection and analysis methods are developed with the goal of reducing the burden associated with manual procedures. The automated capabilities developed herein are focused on applications in lifecycle visual assessment and are intended to exploit large volumes of data collected periodically over time.
To demonstrate the methods, various classes of infrastructure commonly located in our communities are chosen for validating this work because they: (i) provide commodities and services essential to enable, sustain, or enhance our lives; and (ii) require lifecycle structural assessment as a high priority. To validate these capabilities, infrastructure assessment applications are developed that exercise multiple big-visual-data techniques, such as region-of-interest extraction, orthophoto generation, image localization, object detection, and image organization using convolutional neural networks (CNNs), depending on the domain of lifecycle assessment needed for the target infrastructure. This research can, however, be adapted to many other applications where monitoring and maintenance are required over the lifecycle.
ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024 (has links)
Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute intensive, and this cost increases profoundly with newer and better CNN models. However, convolutions come with characteristics such as sparsity which can be exploited. In this dissertation, we propose three different works to capture sparsity for faster performance and reduced energy.

The first work is an accelerator design called SparTen for improving two-sided sparsity (i.e., sparsity in both filters and feature maps) in convolutions with fine-grained sparsity. SparTen identifies the efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, SparTen proposes load balancing schemes for higher compute unit utilization. SparTen performs 4.7x, 1.8x, and 3x better than a dense architecture, a one-sided architecture, and SCNN, the previous state-of-the-art accelerator, respectively.

The second work, BARISTA, scales up SparTen (and SparTen-like proposals) to a large-scale implementation with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined: optimizing for one factor strains another and may invalidate some optimizations proposed in small-scale implementations. BARISTA proposes novel techniques to balance the three factors in large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x, and 2.5x better than dense, one-sided, naively scaled two-sided, and iso-area two-sided architectures, respectively.

The last work, EUREKA, builds an efficient tensor core to execute dense, structured, and unstructured sparsity without losing efficiency. EUREKA achieves this by proposing novel techniques to improve compute utilization by slightly tweaking operand stationarity. EUREKA achieves speedups of 5x and 2.5x, along with 3.2x and 1.7x energy reductions, over dense and structured-sparse execution, respectively. EUREKA incurs area and power overheads of only 6% and 11.5%, respectively, over Ampere.
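The "inner join" primitive that the abstract names as the key to two-sided sparsity can be illustrated in software: a dot product of two sparse vectors stored as (index, value) pairs, where only positions nonzero in both operands contribute work. This is a software analogue for intuition only; the actual accelerator realizes the join in hardware (e.g., via compressed-format intersection), not with pointer chasing.

```python
# Sketch of a sparse "inner join" dot product: a and b are sorted lists of
# (index, value) pairs for nonzero entries. Only indices present in BOTH
# operands produce a multiply-accumulate; all other positions are skipped,
# which is the source of two-sided sparsity savings.

def sparse_inner_join(a, b):
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia == ib:          # nonzero in both: do real work
            total += va * vb
            i += 1
            j += 1
        elif ia < ib:         # zero in b at this index: skip
            i += 1
        else:                 # zero in a at this index: skip
            j += 1
    return total

filt = [(0, 2.0), (3, -1.0), (7, 4.0)]   # nonzeros of a sparse filter
fmap = [(3, 5.0), (6, 1.0), (7, 0.5)]    # nonzeros of a sparse feature map
print(sparse_inner_join(filt, fmap))     # (-1*5) + (4*0.5) = -3.0
```

Note that when the two operands' nonzeros rarely overlap, most iterations are skips rather than multiplies, which is exactly the imbalance that motivates SparTen's load-balancing schemes.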