Global ETD Search

121	PROCESSOR TEMPERATURE AND RELIABILITY ESTIMATION USING ACTIVITY COUNTERS Chhablani, Mayank 23 March 2016 (has links) With the advent of technology scaling lifetime reliability is an emerging threat in high-performance and deadline-critical systems. High on-chip thermal gradients accelerates localised thermal elevations (hotspots) which increases the aging rate of the semiconductor devices. As a result, reliable operation of the processors has become a challenging task. Therefore, cost effective schemes for estimating temperature and reliability are crucial. In this work we present a reliability estimation scheme that is based on a light-weight temperature estimation technique that monitors hardware events. Unlike previously pro- posed hardware counter-based approaches, our approach involves a linear-temporal-feedback estimator, taking into account the effects of thermal inertia. The proposed approach shows an average absolute error of We then present a counter-based technique to estimate the thermal accelerated aging factor (TAAF), which is an indicator of lifetime reliability. Results demonstrate that the estimation error is within [−3, +5]. Reliability Estimation Temperature Monitoring Performance Counters Localized hotspots Temperature Estimation Computer and Systems Architecture
122	A Hardware Framework for Yield and Reliability Enhancement in Chip Multiprocessors Pan, Abhisek 01 January 2009 (has links) (PDF) Device reliability and manufacturability have emerged as dominant concerns in end-of-road CMOS devices. Today an increasing number of hardware failures are attributed to device reliability problems that cause partial system failure or shutdown. Also maintaining an acceptable manufacturing yield is seen as challenge because of smaller feature sizes, process variation, and reduced headroom for burn-in tests. In this project we investigate a hardware-based scheme for improving yield and reliability of a homogeneous chip multiprocessor (CMP). The proposed solution involves a hardware framework that enables us to utilize the redundancies inherent in a multi-core system to keep the system operational in face of partial failures due to hard faults (faults due to manufacturing defects or permanent faults developed during system lifetime). A micro-architectural modification allows a faulty core in a multiprocessor system to use another core as a coprocessor to service any instruction that the former cannot execute correctly by itself. This service is accessed to improve yield and reliability, but at the cost of some loss of performance. In order to quantify this loss we have used a cycle-accurate architectural simulator to simulate the performance of dual-core and quad-core systems with one or more cores sustaining partial failure. Simulation studies indicate that when a large and sparingly-used unit such as a floating point unit fails in a core, even for a floating point intensive benchmark, we can continue to run the faulty core with as little as 10% performance impact and minimal area overhead. Incorporating this recovery mechanism entails some modifications in the microprocessor micro-architecture. The modifications are also described here through a simplified model of a superscalar processor. Computer architecture Yield and reliability Chip multiprocessors Microarchitectural modification Simulation Redundance across cores Electrical engineering Computer and Systems Architecture Hardware Systems
123	Computational Delay in Vehicles and Its Effect on Real Time Scheduling Jain, Abhinna 01 January 2012 (has links) (PDF) Present research into critical embedded control systems tends to focus on the computational elements and largely ignore the link between the computational and physical elements. This link is very important since the computational capability of the computer can greatly affect the performance and dynamics of the system it controls. The control computer is in the feedback loop of control systems and contributes to feedback delay in addition to already existing mechanical delays. While mechanical delays are compensated in control design, variable computational delays cause system to underperform in its intended physical behavior and impose a cost in terms of fuel or time. For this reason, the scheduler in a real-time operating systems should not focus only on the task deadlines, but also on efficient scheduling which minimizes the effect of computational delay on the controlled plant. The proposed work provides a systematic framework to manage and evaluate the implications of computational delay in vehicles. The work also includes cost sensitive real-time control task scheduling heuristics and Dynamic Voltage Scaling (DVS) for better energy/thermal control. We show through simulations that our heuristic achieves a significant improvement in cost over the traditional real-time scheduling algorithm Earliest Deadline First (EDF) and show that it can adjust according to energy constraints imposed on the system. CPS Real time systems delay scheduling dvs cost function Computer and Systems Architecture
124	N3asics: Designing Nanofabrics with Fine-Grained Cmos Integration Panchapakeshan, Pavan 01 January 2012 (has links) (PDF) Nanoscale-computing fabrics based on novel materials such as semiconductor nanowires, carbon nanotubes, graphene, etc. have been proposed in recent years. These fabrics employ unconventional manufacturing techniques like Nano-imprint lithography or Super-lattice Nanowire Pattern Transfer to produce ultra-dense nano-structures. However, one key challenge that has received limited attention is the interfacing of unconventional/self-assembly based approaches with conventional CMOS manufacturing to build integrated systems. We propose a novel nanofabric approach that mixes unconventional nanomanufacturing with CMOS manufacturing flow and design rules to build a reliable nanowire-CMOS 3-D integrated fabric called N3ASICs with no new manufacturing constraints. In N3ASICs active devices are formed on a dense semiconductor nanowire array and standard area distributed pins/vias, metal interconnects route signals in 3D. The proposed N3ASICs fabric is fully described and thoroughly evaluated at all design levels. Novel nanowire based devices are envisioned and characterized based on 3D physics modeling. Overall N3ASICs fabric design, associated circuits, interconnection approach, and a layer-by-layer assembly sequence for the fabric are introduced. System level metrics such as power, performance, and density for a nanoprocessor design built using N3ASICs were evaluated and compared against a functionally equivalent CMOS design. We show that the N3ASICs version of the processor is 3X denser and 5X more power efficient for a comparable performance than the 16-nm scaled CMOS version without any new/unknown-manufacturing requirement. Systematic yield implications due to mask overlay misalignment have been evaluated. A partitioning approach to build complex circuits has been studied. NASIC N3ASIC nanowires nano-CMOS hybrid systems 3-D integration Computer and Systems Architecture Nanotechnology Fabrication
125	A Neural Network Approach to Border Gateway Protocol Peer Failure Detection and Prediction White, Cory B. 01 December 2009 (has links) (PDF) The size and speed of computer networks continue to expand at a rapid pace, as do the corresponding errors, failures, and faults inherent within such extensive networks. This thesis introduces a novel approach to interface Border Gateway Protocol (BGP) computer networks with neural networks to learn the precursor connectivity patterns that emerge prior to a node failure. Details of the design and construction of a framework that utilizes neural networks to learn and monitor BGP connection states as a means of detecting and predicting BGP peer node failure are presented. Moreover, this framework is used to monitor a BGP network and a suite of tests are conducted to establish that this neural network approach as a viable strategy for predicting BGP peer node failure. For all performed experiments both of the proposed neural network architectures succeed in memorizing and utilizing the network connectivity patterns. Lastly, a discussion of this framework's generic design is presented to acknowledge how other types of networks and alternate machine learning techniques can be accommodated with relative ease. neural network artificial neural network border gateway protocol BGP machine learning networks Artificial Intelligence and Robotics Computer and Systems Architecture OS and Networks
126	CUDA Web API Remote Execution of CUDA Kernels Using Web Services Becker, Massimo J 01 June 2012 (has links) (PDF) Massively parallel programming is an increasingly growing field with the recent introduction of general purpose GPU computing. Modern graphics processors from NVIDIA and AMD have massively parallel architectures that can be used for such applications as 3D rendering, financial analysis, physics simulations, and biomedical analysis. These massively parallel systems are exposed to programmers through in- terfaces such as NVIDIAs CUDA, OpenCL, and Microsofts C++ AMP. These frame- works expose functionality using primarily either C or C++. In order to use these massively parallel frameworks, programs being implemented must be run on machines equipped with massively parallel hardware. These requirements limit the flexibility of new massively parallel systems. This paper explores the possibility that massively parallel systems can be exposed through web services in order to facilitate using these architectures from remote systems written in other languages. To explore this possi- bility, an architecture is put forth with requirements and high level design for building a web service that can overcome limitations of existing tools and frameworks. The CUDA Web API is built using Python, PyCUDA, NumPy, JSON, and Django to meet the requirements set forth. Additionaly, a client application, CUDA Cloud, is built and serves as an example web service client. The CUDA Web API’s performance and its functionality is validated using a common matrix multiplication algorithm implemented using different languages and tools. Performance tests show runtime improvements for larger datasets using the CUDA Web API for remote CUDA kernel execution over serial implementations. This paper concludes that existing limitations associated with GPGPU usage can be overcome with the specified architecture. CUDA Web Services Parallelism Distributed Systems Computer and Systems Architecture OS and Networks Programming Languages and Compilers Software Engineering Systems Architecture
127	Relevance Analysis for Document Retrieval Labouve, Eric 01 March 2019 (has links) (PDF) Document retrieval systems recover documents from a dataset and order them according to their perceived relevance to a user’s search query. This is a diﬃcult task for machines to accomplish because there exists a semantic gap between the meaning of the terms in a user’s literal query and a user’s true intentions. Even with this ambiguity that arises with a lack of context, users still expect that the set of documents returned by a search engine is both highly relevant to their query and properly ordered. The focus of this thesis is on document retrieval systems that explore methods of ordering documents from unstructured, textual corpora using text queries. The main goal of this study is to enhance the Okapi BM25 document retrieval model. In doing so, this research hypothesizes that the structure of text inside documents and queries hold valuable semantic information that can be incorporated into the Okapi BM25 model to increase its performance. Modiﬁcations that account for a term’s part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion are used to augment Okapi BM25 to increase the model’s performance. The study resulted in 87 modiﬁcations which were all validated using open source corpora. The top scoring modiﬁcation from the validation phase was then tested under the Lisa corpus and the model performed 10.25% better than Okapi BM25 when evaluated under mean average precision. When compared against two industry standard search engines, Lucene and Solr, the top scoring modiﬁcation largely outperforms these systems by upwards to 21.78% and 23.01%, respectively. Semantic Analysis Document Retrieval Query Expansion Term Proximity Search Okapi BM25 Computer and Systems Architecture Data Storage Systems
128	Symbolic Generation of Parallel Solvers for Unconstrained Optimization Pavlin, Jessica L. 10 1900 (has links) <p>In this thesis we consider the need to generate efficient solvers for inverse imaging problems in a way that supports both quality and performance in software, as well as flexibility in the underlying mathematical models. Many problem domains involve large data sizes and rates, and changes in mathematical modelling are limited only by researcher ingenuity and driven by the value of the application. We use a problem in Magnetic Resonance Imaging to illustrate this situation, motivate the need for better software tools and test the tools we develop. The problem is the determination of velocity profiles, think blood-flow patterns, using Phase Contrast Angiography. Despite the name, this method is completely noninvasive, not requiring the injection of contrast agents, but it is too time-consuming with present imaging and computing technology.</p> <p>Our approach is to separate the specification, the mathematical model, from the implementation details required for performance, using a custom language. The Domain Specific Language (DSL) provided to scientists allows for a complete abstraction from the highly optimized generated code. The mathematical DSL is converted to an internal representation we refer to as the Coconut Expression Library. Our expression library uses the directed acyclic graphs as an underlying data structure, which lends itself nicely to our automatic simplifications, differentiation and subexpression elimination. We show how parallelization and other optimizations are encoded as rules which are applied automatically rather than schemes that need to be implemented by the programmer in the low-level implementation. Finally, we present results, both in terms of numerical results and computational performance.</p> / Master of Science (MSc) symbolic code generation parallelization MRI optimization Bioimaging and biomedical optics Computer and Systems Architecture Other Computer Engineering Bioimaging and biomedical optics
129	Adaptive Beyond Von-Neumann Computing Devices and Reconfigurable Architectures for Edge Computing Applications Hossain, Mousam 01 January 2024 (has links) (PDF) The Von-Neumann bottleneck, a major challenge in computer architecture, results from significant data transfer delays between the processor and main memory. Crossbar arrays utilizing spin-based devices like Magnetoresistive Random Access Memory (MRAM) aim to overcome this bottleneck by offering advantages in area and performance, particularly for tasks requiring linear transformations. These arrays enable single-cycle and in-memory vector-matrix multiplication, reducing overheads, which is crucial for energy and area-constrained Internet of Things (IoT) sensors and embedded devices. This dissertation focuses on designing, implementing, and evaluating reconfigurable computation platforms that leverage MRAM-based crossbar arrays and analog computation to support deep learning and error resilience implementations. One key contribution is the investigation of Spin Torque Transfer MRAM (STT-MRAM) technology scaling trends, considering power dissipation, area, and process variation (PV) across different technology nodes. A predictive model for power estimation in hybrid CMOS/MTJ technology has been developed and validated, along with new metrics considering the Internet of Things (IoT) energy profile of various applications. The dissertation introduces the Spintronically Configurable Analog Processing in-memory Environment (SCAPE), integrating analog arithmetic, runtime reconfigurability, and non-volatile devices within a selectable 2-D topology of hybrid spin/CMOS devices. Simulation results show improvements in error rates, power consumption, and power-error-product metric for real-world applications like machine learning and compressive sensing, while assessing process variation impact. Additionally, it explores transportable approaches to more robust SCAPE implementations, including applying redundancy techniques for artificial neural network (ANN)-based digit recognition applications. Generic redundancy techniques are developed and applied to hybrid spin/CMOS-based ANNs, showcasing improved/comparable accuracy with smaller-sized networks. Furthermore, the dissertation examines hardware security considerations for emerging memristive device-based applications, discussing mitigation approaches against malicious manufacturing interventions. It also discusses reconfigurable computing for AI/ML applications based on state-of-the-art FPGAs, along with future directions in adaptive computing architectures for AI/ML at the edge of the network. Computer and Systems Architecture Computer Engineering
130	Security Analysis of ECC Based Protocols Khatwani, Chanchal 01 January 2017 (has links) Elliptic curve cryptography (ECC) is extensively used in various multifactor authentication protocols. In this work, various recent ECC based authentication and key exchange protocols are subjected to threat modeling and static analysis to detect vulnerabilities, and to enhance them to be more secure against threats. This work demonstrates how currently used ECC based protocols are vulnerable to attacks. If protocols are vulnerable, damages could include critical data loss and elevated privacy concerns. The protocols considered in thiswork differ in their usage of security factors (e.g. passwords, pins, and biometrics), encryption and timestamps. The threatmodel considers various kinds of attacks including denial of service, man in the middle, weak authentication and SQL injection. Countermeasures to reduce or prevent such attacks are suggested. Beyond cryptanalysis of current schemes and proposal of new schemes, the proposed adversary model and criteria set forth provide a benchmark for the systematic evaluation of future two-factor authentication proposals. Thesis University of North Florida UNF Computer and Systems Architecture

Search results