181

Reactive Rejuvenation of CMOS Logic Paths using Self-activating Voltage Domains

Khoshavi Najafabadi, Navid 01 January 2016 (has links)
Aggressive CMOS technology scaling trends exacerbate the aging-related degradation of propagation delay and energy efficiency in nanoscale designs. Recently, power-gating has been utilized as an effective low-power design technique that has also been shown to alleviate some aging impacts. However, the use of MOSFETs to realize power-gated designs means the sleep transistors themselves also suffer aging-induced degradation, which necessitates the exploration of design strategies that utilize power-gating effectively to mitigate aging. In particular, Bias Temperature Instability (BTI), which occurs during activation of power-gated voltage islands, is investigated with respect to the placement of the sleep transistor in the header or footer as well as the impact of ungated input transitions on interfacial trapping. Results indicate the effectiveness of power-gating against NBTI/PBTI phenomena and suggest a preferred sleep transistor configuration for maximizing recovery. Furthermore, the aging effect can manifest itself as timing errors on critical speed-paths of the circuit if a large design guardband is not reserved. To protect the circuit from BTI-induced aging, the Reactive Rejuvenation (RR) architectural approach is proposed, which entails detection and recovery phases. The BTI impact on the performance of critical and near-critical paths is continuously examined through a lightweight logic circuit that asserts an error signal in the case of any timing violation on those paths. Upon observing a timing violation in the system, the timing-sensitive portion of the circuit is recovered from BTI by switching computations to a redundant voltage domain. The proposed technique achieves aging mitigation and reduced energy consumption compared to a baseline circuit. Thus, significant voltage guardbands to meet the desired timing specification are avoided, resulting in energy savings during circuit operation.
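The detect-and-recover loop described above can be captured in a short behavioral sketch. The following Python model is purely illustrative and is not taken from the dissertation: the stress accumulation rate, recovery rate, and error threshold are invented placeholders; it only shows how an asserted timing-violation signal triggers a switch of computation to the rested redundant domain.

```python
import random

class VoltageDomain:
    """Behavioral stand-in for one power-gated copy of the timing-critical logic."""
    def __init__(self, name):
        self.name = name
        self.stress = 0.0                      # abstract BTI stress level (placeholder units)

    def active_cycle(self):
        """Active domain accumulates BTI stress; return True if the lightweight
        checker would assert a timing-violation error this cycle."""
        self.stress += 0.001                   # placeholder degradation rate
        return random.random() < self.stress * 0.01

    def gated_cycle(self):
        """Power-gated domain partially recovers from interfacial trapping."""
        self.stress = max(0.0, self.stress - 0.002)

def reactive_rejuvenation(cycles=100_000):
    active, standby = VoltageDomain("A"), VoltageDomain("B")
    swaps = 0
    for _ in range(cycles):
        error = active.active_cycle()          # detection phase
        standby.gated_cycle()
        if error:                              # recovery phase: swap to the rested domain
            active, standby = standby, active
            swaps += 1
    return swaps

if __name__ == "__main__":
    random.seed(0)
    print("domain swaps over the run:", reactive_rejuvenation())
```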
182

Design Disjunction for Resilient Reconfigurable Hardware

Alzahrani, Ahmad 01 January 2015 (has links)
Contemporary reconfigurable hardware devices have the capability to achieve the high performance, power efficiency, and adaptability required to meet a wide range of design goals. With scaling challenges facing current complementary metal-oxide semiconductor (CMOS) technology, new concepts and methodologies supporting efficient adaptation to handle reliability issues are becoming increasingly prominent. Reconfigurable hardware devices and their ability to realize self-organization features are expected to play a key role in designing future dependable hardware architectures. However, the exponential increase in density and complexity of current commercial SRAM-based field-programmable gate arrays (FPGAs) has escalated the overhead associated with dynamic runtime design adaptation. Traditionally, static modular redundancy techniques are considered to surmount this limitation; however, they can incur substantial overheads in both area and power requirements. To achieve a better trade-off among performance, area, power, and reliability, this research proposes design-time approaches that enable fine selection of the redundancy level based on target reliability goals and autonomous adaptation to runtime demands. To achieve this goal, three studies were conducted. First, a graph- and set-theoretic approach named Hypergraph-Cover Diversity (HCD) is introduced as a preemptive design technique to shift the dominant costs of resiliency to design-time. In particular, union-free hypergraphs are exploited to partition the pool of reconfigurable resources into highly separable subsets, each of which can be utilized by the same synthesized application netlist. The diverse implementations provide reconfiguration-based resilience throughout the system lifetime while avoiding the significant overheads associated with runtime placement and routing phases. Evaluation on a Motion-JPEG image compression core using a Xilinx 7-series-based FPGA hardware platform has demonstrated the potential of the proposed fault-tolerance method to achieve 37.5% area saving and up to 66% reduction in power consumption compared to the frequently-used TMR scheme while providing superior fault tolerance. Second, Design Disjunction based on non-adaptive group testing is developed to realize a low-overhead fault-tolerant system capable of handling self-testing and self-recovery using runtime partial reconfiguration. Reconfiguration is guided by resource grouping procedures which employ non-linear measurements given by the constructive property of f-disjunctness to extend runtime resilience to a large fault space and realize a favorable range of tradeoffs. Disjunct designs are created using the developed mosaic convergence algorithm such that at least one configuration in the library evades any occurrence of up to d resource faults, where d is lower-bounded by f. Experimental results for a set of MCNC and ISCAS benchmarks have demonstrated f-diagnosability at the individual slice level with an average isolation resolution of 96.4% (94.4%) for f=1 (f=2) while incurring an average critical path delay impact of only 1.49% and an area cost roughly comparable to conventional 2-MR approaches. Finally, the proposed Design Disjunction method is evaluated as a design-time method to improve timing yield in the presence of large random within-die (WID) process variations for applications with a moderately high production capacity.
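A central object in the Design Disjunction work is a d-disjunct (here f-disjunct) test design from non-adaptive group testing: no column of the binary test matrix may be covered by the union of any d other columns, which is what guarantees that up to d faulty resources can be identified. The brute-force checker below is a minimal illustration of that defining property, not the dissertation's mosaic convergence algorithm.

```python
from itertools import combinations

def is_d_disjunct(matrix, d):
    """Return True if no column of the 0/1 matrix is contained in the union
    (bitwise OR) of any d other columns -- the property used in non-adaptive
    group testing to identify up to d defective items."""
    rows, cols = len(matrix), len(matrix[0])
    # represent each column as the set of row indices where it has a 1
    col_sets = [{r for r in range(rows) if matrix[r][c]} for c in range(cols)]
    for j, col in enumerate(col_sets):
        others = [s for i, s in enumerate(col_sets) if i != j]
        for group in combinations(others, d):
            union = set().union(*group)
            if col <= union:        # column j is hidden by the union -> not d-disjunct
                return False
    return True

# Small example: an identity-like test design is 1-disjunct.
tests = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(is_d_disjunct(tests, 1))   # True
```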
183

Towards Energy-Efficient and Reliable Computing: From Highly-Scaled CMOS Devices to Resistive Memories

Salehi Mobarakeh, Soheil 01 January 2016 (has links)
The continuous increase in transistor density based on Moore's Law has led us to highly scaled Complementary Metal-Oxide Semiconductor (CMOS) technologies. These transistor-based process technologies offer improved density as well as a reduction in nominal supply voltage. An analysis comparing different aspects of the 45nm and 15nm technologies, such as power consumption and cell area, is carried out on an IEEE 754 Single Precision Floating-Point Unit implementation. Based on the results, the 15nm technology offers 4-times lower energy and a 3-fold smaller footprint. New challenges also arise, such as the relative proportion of leakage power in standby mode, which can be addressed by post-CMOS technologies. Spin-Transfer Torque Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagnetic and CMOS components per bit, access time and complexity, Sense Margin (SM), and energy or power consumption costs versus resiliency benefits. In an attempt to further improve the Process Variation (PV) immunity of Sense Amplifiers (SAs), a new SA called the Adaptive Sense Amplifier (ASA) has been introduced. ASA achieves a low Bit Error Rate (BER) and low Energy Delay Product (EDP) by combining the properties of two commonly used SAs, the Pre-Charge Sense Amplifier (PCSA) and the Separated Pre-Charge Sense Amplifier (SPCSA). ASA can operate in either PCSA or SPCSA mode based on the requirements of the circuit, such as energy efficiency or reliability. Then, ASA is utilized in a novel approach to leverage PV in Non-Volatile Memory (NVM) arrays using a Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce latency and power consumption while maintaining acceptable access time.
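As a rough illustration of how an SOS-style design might bind each sub-bank to the sense-amplifier mode suited to its as-built behavior, the sketch below picks between a PCSA-like (energy-favorable) and SPCSA-like (reliability-favorable) mode from a measured sensing margin. The threshold value, margin units, and selection policy are assumptions for illustration only, not values or circuits from the dissertation.

```python
def choose_sa_mode(measured_margin_mv, margin_threshold_mv=30.0, prefer_low_energy=True):
    """Pick a sense-amplifier mode for one sub-bank.

    Illustrative policy only: sub-banks whose as-built resistive sensing margin
    is comfortable use the lower-energy pre-charge mode, while marginal
    sub-banks fall back to the separated pre-charge mode for a lower bit error
    rate. Threshold and margin values are invented placeholders.
    """
    if measured_margin_mv >= margin_threshold_mv and prefer_low_energy:
        return "PCSA"     # energy/EDP-favorable mode
    return "SPCSA"        # reliability-favorable mode

# Characterize sub-banks once at test time, then store the per-sub-bank choice.
sub_bank_margins = {0: 42.5, 1: 18.3, 2: 31.0}   # hypothetical measurements in mV
sos_table = {bank: choose_sa_mode(m) for bank, m in sub_bank_margins.items()}
print(sos_table)   # e.g. {0: 'PCSA', 1: 'SPCSA', 2: 'PCSA'}
```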
184

Developing New Power Management and High-Reliability Schemes in Data-Intensive Environment

Wang, Ruijun 01 January 2016 (has links)
With the increasing popularity of data-intensive applications as well as large-scale computing and storage systems, current data centers and supercomputers often deal with extremely large data sets. To store and process this huge amount of data reliably and energy-efficiently, system designers must take three major challenges into consideration. Firstly, power conservation: multicore processors, or CMPs, have become mainstream in the current processor market because of the tremendous improvement in transistor density and the advancement in semiconductor technology. However, the increasing number of transistors on a single die or chip reveals a super-linear growth in power consumption [4]. Thus, how to balance system performance and power saving is a critical issue which needs to be solved effectively. Secondly, system reliability: reliability is a critical metric in the design and development of replication-based big data storage systems such as the Hadoop File System (HDFS). In a system with thousands of machines and storage devices, even infrequent failures become likely. In the Google File System, the annual disk failure rate is 2.88%, which means 8,760 disk failures would be expected in a year. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when being scaled out is not well investigated. Thirdly, energy efficiency: the fast processing speeds of the current generation of supercomputers provide a great convenience to scientists dealing with extremely large data sets. The next generation of "exascale" supercomputers could provide accurate simulation results for the automobile industry, aerospace industry, and even nuclear fusion reactors for the very first time. However, the energy cost of supercomputing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers have become critical in recent years. This dissertation proposes new solutions to address the above three key challenges for current large-scale storage and computing systems. Firstly, we propose a novel power management scheme called MAR (model-free, adaptive, rule-based) for multiprocessor systems to minimize CPU power consumption subject to performance constraints. By introducing a new I/O wait status, MAR is able to accurately describe the relationship between core frequencies, performance, and power consumption. Moreover, we adopt a model-free control method to filter out the I/O wait status from the traditional CPU busy/idle model in order to achieve fast responsiveness to burst situations and take full advantage of power saving. Our extensive experiments on a physical testbed demonstrate that, for SPEC benchmarks and data-intensive (TPC-C) benchmarks, an MAR prototype system achieves 95.8-97.8% accuracy relative to the ideal power-saving strategy calculated offline. Compared with baseline solutions, MAR is able to save 12.3-16.1% more power while maintaining a comparable performance loss of about 0.78-1.08%. In addition, further simulation results indicate that our design achieves 3.35-14.2% more power-saving efficiency and 4.2-10.7% less performance loss under various CMP configurations compared with baseline approaches such as LAST, Relax, PID, and MPC.
Secondly, we create a new reliability model by incorporating the probability of replica loss to investigate the system reliability of multi-way declustering data layouts and analyze their potential parallel recovery possibilities. Our comprehensive simulation results on Matlab and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, in terms of data loss probability and system reliability, by up to 63% and 85% respectively. Our study on both 5-year and 10-year system reliability with various recovery bandwidth settings shows that the shifted declustering layout surpasses the two baseline approaches in both cases, consuming up to 79% and 87% less recovery bandwidth than the copyset layout, and 4.8% and 10.2% less recovery bandwidth than the random layout. Thirdly, we develop a power-aware job scheduler by applying a rule-based control method and taking into account real-world power and speedup profiles to improve power efficiency while adhering to predetermined power constraints. Intensive simulation results show that our proposed method achieves the maximum utilization of computing resources compared to baseline scheduling algorithms while keeping the energy cost under the threshold. Moreover, by introducing a Power Performance Factor (PPF) based on the real-world power and speedup profiles, we are able to increase the power efficiency by up to 75%.
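A minimal sketch of a MAR-style decision rule is shown below. It assumes per-interval busy and I/O-wait fractions reported by the OS and a small set of DVFS levels; the thresholds, step sizes, and frequency values are invented for illustration and do not come from the dissertation. The key idea it mirrors is filtering I/O-wait time out of the load signal so the core is not held at a high frequency just to wait on storage or the network.

```python
FREQ_STEPS_GHZ = [1.2, 1.6, 2.0, 2.4, 2.8]   # assumed DVFS levels, not from the dissertation

def mar_like_rule(busy, iowait, current_idx):
    """Model-free, rule-based frequency decision for one sampling interval.

    busy, iowait: fractions of the interval spent computing vs. waiting on I/O.
    I/O-wait time is excluded from the demand estimate before the rules fire.
    """
    effective_load = busy / max(1e-6, 1.0 - iowait)   # demand over the non-I/O portion
    if effective_load > 0.85:                         # burst: step up aggressively
        return min(current_idx + 2, len(FREQ_STEPS_GHZ) - 1)
    if effective_load > 0.60:
        return min(current_idx + 1, len(FREQ_STEPS_GHZ) - 1)
    if effective_load < 0.30:                         # light load: step down to save power
        return max(current_idx - 1, 0)
    return current_idx                                # hold

idx = 2
for busy, iowait in [(0.9, 0.05), (0.4, 0.5), (0.1, 0.1)]:
    idx = mar_like_rule(busy, iowait, idx)
    print(f"busy={busy:.2f} iowait={iowait:.2f} -> {FREQ_STEPS_GHZ[idx]} GHz")
```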
185

Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

Najafabadi, Navid Khoshavi 01 January 2017 (has links)
While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspire the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM), including STT-MRAM, PCM, and RRAM. The utilization of emerging technologies in Last Level Cache (LLC) designs, which occupy a significant fraction of total die area in Chip Multi-Processors (CMPs), introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on the eDRAM Bit Upset Vulnerability Factor (BUVF) to assess the vulnerable portion of the eDRAM refresh cycle, where the critical charge varies depending on the write voltage and the storage and bit-line capacitance. This dissertation broadens the study of LLC vulnerability assessment by investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifiers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using a Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize service to 1) Extensive Read Reused Accessed (ERRA) blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit a distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to efficiently identify ERRA blocks by aggregating the LLC block addresses tagged with identical Most Significant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and refined both theoretically and experimentally to realize a radically different approach to post-Moore's Law computing by leveraging low-rank matrix features that offer data reconstruction instead of fetching data from main memory, reducing the energy/latency cost per data movement. We propose an LI2 enhancement to attain high performance delivery in the post-Moore's Law era by equipping the contemporary micro-architecture design with a customized memory controller which orchestrates memory requests, routing requests for low-rank matrices to a customized Fine-Grain Reconfigurable Accelerator (FGRA) for reconstruction while other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of an LLC miss by constructing the requested data in-situ when it involves low-rank matrices. Thus, LI2 exchanges a high volume of data transfers for a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach.
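The data-movement argument behind LI2 can be sketched numerically: if an operand is (approximately) low rank, keeping only its rank-r factors and reconstructing requested tiles on demand moves far fewer values than fetching the full matrix. The numpy example below is a back-of-the-envelope illustration of that trade-off, not the proposed hardware, its memory controller, or the FGRA datapath.

```python
import numpy as np

rng = np.random.default_rng(0)

# A large operand that happens to be exactly rank r (illustrative sizes).
r, m, n = 8, 1024, 1024
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, n))
A = U @ V                              # full operand: m*n values

def fetch_tile_from_factors(U, V, rows, cols):
    """Rebuild only the requested tile A[rows, cols] from the rank-r factors.

    Illustrative of the LI2 idea: the factors (m*r + r*n values) stay resident
    or travel instead of the m*n operand, and the tile is reconstructed
    in-situ on a miss."""
    return U[rows, :] @ V[:, cols]

tile = fetch_tile_from_factors(U, V, slice(0, 64), slice(0, 64))
assert np.allclose(tile, A[:64, :64])  # reconstruction matches the stored operand

full_traffic   = m * n
factor_traffic = m * r + r * n
print(f"values moved: full={full_traffic}, factors={factor_traffic} "
      f"({factor_traffic / full_traffic:.1%} of the full fetch)")
```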
186

Load-Balancing in Local and Metro-Area networks with MPTCP and OpenFlow

Jerome, Austin 01 January 2017 (has links)
In this thesis, a novel load-balancing technique for local or metro-area traffic in mesh-style topologies is proposed. The technique uses a Software Defined Networking (SDN) architecture with virtual local area network (VLAN) setups typically seen in a campus or small-to-medium enterprise environment. This provides a possible solution, or at least a platform to expand on, for the load-balancing dilemma that network administrators face today. The transport-layer protocol Multi-Path TCP (MPTCP), coupled with IP aliasing, is also used. MPTCP's trait of forming multiple subflows from sender to receiver, depending on the availability of IP addresses at either end, helps divert traffic in the subflows across all available paths. The combination of MPTCP subflows with IP aliasing enables spreading the traffic load across a greater number of links in the network, thereby achieving load balancing and better network utilization. The traffic of each subflow is forwarded across the network along Hamiltonian 'paths' created in association with each switch in the topology that is directly connected to hosts. The number of 'paths' in the topology also depends on the number of VLANs set up for the hosts in the topology. This segregation allows network administrators to monitor network utilization across VLANs and gives them the ability to balance load across VLANs. We devised several experiments in Mininet, and the experimentation showed promising results, with significantly better throughput and network utilization compared to cases where normal TCP was used to send traffic from source to destination. Our study clearly shows the advantages of using MPTCP for load-balancing purposes in SDN-type architectures and provides a platform for future research on using VLANs, SDN, and MPTCP for network traffic management.
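The forwarding idea, subflows spread over per-VLAN Hamiltonian paths, can be illustrated with a small sketch: enumerate Hamiltonian paths in a toy mesh of switches and round-robin MPTCP subflows onto them. The topology, the exhaustive path search, and the assignment policy are illustrative stand-ins for what the SDN controller would install, not the thesis implementation.

```python
def hamiltonian_paths(adj, start):
    """Yield every Hamiltonian path in the undirected graph `adj` starting at `start`."""
    n = len(adj)
    def extend(path, visited):
        if len(path) == n:
            yield list(path)
            return
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)
    yield from extend([start], {start})

# Toy 2x2 mesh of switches: s0-s1, s0-s2, s1-s3, s2-s3 (hypothetical topology).
mesh = {"s0": ["s1", "s2"], "s1": ["s0", "s3"], "s2": ["s0", "s3"], "s3": ["s1", "s2"]}
paths = list(hamiltonian_paths(mesh, start="s0"))
print("candidate Hamiltonian paths:", paths)

# One MPTCP subflow per aliased IP pair, spread round-robin over the paths.
subflows = [f"subflow-{i}" for i in range(4)]
assignment = {sf: paths[i % len(paths)] for i, sf in enumerate(subflows)}
for sf, path in assignment.items():
    print(sf, "->", " -> ".join(path))
```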
187

Real-time SIL Emulation Architecture for Cooperative Automated Vehicles

Gupta, Nitish 01 January 2018 (has links)
This thesis presents a robust, flexible, and real-time architecture for Software-in-the-Loop (SIL) testing of connected vehicle safety applications. Emerging connected and automated vehicles (CAV) use sensing, communication, and computing technologies in the design of a host of new safety applications. Testing and verification of these applications is a major concern for the automotive industry. CAV safety applications work by sharing their state and movement information over wireless communication links. Vehicular communication has fueled the development of various Cooperative Vehicle Safety (CVS) applications. Development of safety applications for CAV requires testing in many different scenarios. However, the recreation of test scenarios for evaluating safety applications is a very challenging task, mainly due to the randomness in communication, the difficulty of recreating vehicle movements precisely, and safety concerns for certain scenarios. We propose to develop a standalone Remote Vehicle Emulator (RVE) that can reproduce V2V messages of remote vehicles from simulations or from previous tests, while also emulating the over-the-air behavior of multiple communicating nodes. This is expected to significantly accelerate the development cycle. RVE is a unique and easily configurable emulation-cum-simulation setup that allows Software-in-the-Loop (SIL) testing of connected vehicle applications in a realistic and safe manner. It will help in tailoring numerous test scenarios, expediting algorithm development and validation, and increasing the probability of finding failure modes. This, in turn, will help improve the quality of safety applications while saving testing time and reducing cost. The RVE architecture consists of two modules: the Mobility Generator and the Communication Emulator. Both of these modules consist of a sequence of events that are handled based on the type of testing to be carried out. The communication emulator simulates the behavior of the MAC layer while also considering the channel model to determine the probability of successful transmission. It then produces over-the-air messages that resemble the output of multiple transmitting nodes, including messages corrupted due to collisions. The algorithm inside the emulator has been optimized to minimize communication latency and make this a realistic and real-time safety testing tool. Finally, we provide a multi-metric experimental evaluation wherein we verified the simulation results against an identically configured ns-3 simulator. With the aim of improving the quality of testing of CVS applications, this unique architecture would serve as a fundamental design for the future of CVS application testing.
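The collision behavior the communication emulator reproduces can be approximated with a toy slotted broadcast model: if more than one node transmits in a slot, all of their messages are delivered corrupted; otherwise the lone message survives unless random channel loss drops it. The transmission and loss probabilities below are arbitrary placeholders, not the thesis's calibrated channel model or its MAC implementation.

```python
import random

def emulate_slot(transmitters, p_tx=0.3, p_loss=0.1):
    """One broadcast slot of a toy V2V channel.

    Each remote-vehicle node transmits with probability p_tx. More than one
    sender in the slot means a collision and corrupted deliveries; a single
    sender's message is dropped with probability p_loss, otherwise delivered.
    Parameters are illustrative placeholders.
    """
    senders = [node for node in transmitters if random.random() < p_tx]
    if len(senders) == 0:
        return []
    if len(senders) > 1:
        return [(node, "corrupted") for node in senders]      # collision
    node = senders[0]
    return [] if random.random() < p_loss else [(node, "ok")]

random.seed(1)
nodes = [f"RV{i}" for i in range(5)]
for slot in range(3):
    print(f"slot {slot}:", emulate_slot(nodes))
```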
188

Reducing the Overhead of Memory Space, Network Communication and Disk I/O for Analytic Frameworks in Big Data Ecosystem

Zhang, Xuhong 01 January 2017 (has links)
To facilitate big data processing, many distributed analytic frameworks and storage systems such as Apache Hadoop, Apache Hama, Apache Spark, and the Hadoop Distributed File System (HDFS) have been developed. Currently, many researchers are working either to make them more scalable or to enable them to support more analysis applications. In my PhD study, I conducted three main works on this topic: minimizing the communication delay in Apache Hama, minimizing the memory space and computational overhead in HDFS, and minimizing the disk I/O overhead for approximation applications in the Hadoop ecosystem. Specifically, in Apache Hama, communication delay makes up a large percentage of the overall graph processing time. While most recent research has focused on reducing the number of network messages, we add a runtime communication and computation scheduler to overlap them as much as possible; as a result, communication delay can be mitigated. In HDFS, the block location table and its corresponding maintenance can occupy more than half of the memory space and 30% of the processing capacity in the master node, which severely limits the scalability and performance of the master node. We propose Deister, which uses deterministic mathematical calculations to eliminate the huge table for storing block locations and its corresponding maintenance. My third work proposes to enable both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Existing offline sampling-based approximation systems are not adaptive to dynamic query workloads, and online sampling-based approximation systems suffer from low I/O efficiency and poor estimation accuracy. Therefore, we develop a distribution-aware method called Sapprox. Our idea is to collect the occurrences of a sub-dataset at each logical partition of a dataset (its storage distribution) in the distributed system at a very small cost, and to make good use of such information to facilitate online sampling.
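The Sapprox idea, collect per-partition occurrence counts of the sub-dataset offline and use them to drive online sampling, can be sketched with a simple probability-proportional-to-occurrence estimator. The Hansen-Hurwitz-style arithmetic below is an illustrative stand-in under assumed data structures, not the system's actual sampler or its HDFS integration.

```python
import random

def sapprox_style_sum(partitions, key, sample_fraction=0.3, seed=0):
    """Estimate the sum of `key`'s values over all partitions by sampling
    partitions with probability proportional to the key's occurrence count.

    `partitions`: dict partition_id -> list of (key, value) records. The
    occurrence counts stand in for the lightweight distribution metadata
    collected offline; the estimator is the textbook Hansen-Hurwitz form.
    """
    rng = random.Random(seed)
    occ = {p: sum(1 for k, _ in recs if k == key) for p, recs in partitions.items()}
    total = sum(occ.values())
    if total == 0:
        return 0.0
    probs = {p: occ[p] / total for p in partitions}
    k = max(1, int(sample_fraction * len(partitions)))
    chosen = rng.choices(list(partitions), weights=[probs[p] for p in partitions], k=k)
    return sum(
        sum(v for kk, v in partitions[p] if kk == key) / probs[p] for p in chosen
    ) / k

# Tiny example: estimate total bytes read by user "u1" without scanning every partition.
parts = {
    0: [("u1", 10), ("u2", 5)],
    1: [("u1", 7), ("u1", 3)],
    2: [("u2", 9)],
    3: [("u1", 20)],
}
print(sapprox_style_sum(parts, "u1"))   # unbiased estimate of the true total, 40
```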
189

End to End Brain Fiber Orientation Estimation Using Deep Learning

Puttashamachar, Nandakishore 01 January 2017 (has links)
In this work, we explore various brain neuron tracking techniques, one of the most significant applications of Diffusion Tensor Imaging. Tractography is a non-invasive method to analyze the underlying tissue micro-structure. Understanding the structure and organization of the tissues facilitates diagnosis methods that can identify aberrations occurring within tissues due to loss of cell functionality, and provides acute information on occurrences of brain ischemia or stroke and on certain neurological diseases such as Alzheimer's disease and multiple sclerosis. Under all these circumstances, accurate localization of the aberrations in an efficient manner can help save a life. Following up on the limitations of current Tractography techniques, such as computational complexity, reconstruction errors during tensor estimation, and standardization, we aim to address these limitations through our research findings. We introduce an end-to-end Deep Learning framework that can accurately estimate the most likely orientation at each voxel along a neuronal pathway. We use Probabilistic Tractography as our baseline model to obtain the training data, which also serves as a Tractography gold standard for our evaluations. Through experiments, we show that our deep network achieves a significant improvement over current Tractography implementations by reducing run-time complexity to a significantly lower level. Our architecture also allows for variable-sized input DWI signals, eliminating the memory issues seen with traditional techniques. The advantage of this architecture is that it is well suited to processing on a cloud setup, utilizing existing multi-GPU frameworks to perform whole-brain Tractography in minutes rather than hours. The proposed method is a good alternative to current state-of-the-art orientation estimation techniques, as we demonstrate across multiple benchmarks.
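The variable-length-input property called out above can be illustrated with a tiny regression network: 1D convolutions over the gradient-direction axis followed by global average pooling let the same weights handle any number of diffusion directions, and the output is normalized onto the unit sphere so it represents an orientation. This PyTorch sketch is an invented toy under those assumptions, not the dissertation's architecture or training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrientationNet(nn.Module):
    """Toy end-to-end regressor from a per-voxel DWI signal to a unit
    orientation vector. Global average pooling makes the network agnostic
    to the number of diffusion gradient directions."""
    def __init__(self, hidden=64):
        super().__init__()
        self.conv1 = nn.Conv1d(1, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(hidden, 3)     # (x, y, z) orientation components

    def forward(self, signal):               # signal: (batch, n_directions)
        x = signal.unsqueeze(1)              # (batch, 1, n_directions)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.mean(dim=-1)                   # pool over directions -> fixed-size feature
        v = self.head(x)
        return F.normalize(v, dim=-1)        # constrain output to the unit sphere

# Batches with different numbers of gradient directions both work unchanged.
net = OrientationNet()
print(net(torch.randn(4, 32)).shape)   # torch.Size([4, 3])
print(net(torch.randn(4, 90)).shape)   # torch.Size([4, 3])
```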
190

Deep Hashing for Image Similarity Search

Al Kobaisi, Ali 01 January 2020 (has links)
Hashing for similarity search is one of the most widely used methods to solve the approximate nearest neighbor search problem. In this method, one first maps data items from a real-valued high-dimensional space to a suitable low-dimensional binary code space and then performs the approximate nearest neighbor search in this code space instead. This is beneficial because the search in the code space can be solved more efficiently in terms of runtime complexity and storage consumption. Obviously, for this method to succeed, it is necessary that similar data items be mapped to binary code words that have a small Hamming distance. For real-world data such as images, one usually proceeds as follows. For each data item, a pre-processing algorithm removes noise and insignificant information and extracts important discriminating information to generate a feature vector that captures the important semantic content. Next, a vector hash function maps this real-valued feature vector to a binary code word. It is also possible to use the raw feature vectors afterwards to further process the search result candidates produced by the binary hash codes. In this dissertation we focus on the following: first, developing a learning-based counterpart of the MinHash hashing algorithm; second, presenting a new unsupervised hashing method, UmapHash, to map the neighborhood relations of data items from the feature vector space to the binary hash code space; and finally, an application of the aforementioned hashing methods to rapid face image recognition.
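For context on the first contribution, the classical MinHash algorithm that the learned counterpart targets can be summarized in a few lines: each of several hash functions contributes the minimum hash over the set's elements, and the fraction of matching signature positions estimates the Jaccard similarity. The sketch below is the textbook algorithm, not the proposed learning-based variant or UmapHash.

```python
import hashlib

def minhash_signature(items, num_hashes=64):
    """Classical MinHash: for each of `num_hashes` seeded hash functions, keep
    the minimum hash value over the set's elements. The probability that two
    signatures agree at a position equals the Jaccard similarity of the sets."""
    sig = []
    for seed in range(num_hashes):
        best = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(), "big"
            )
            for x in items
        )
        sig.append(best)
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = {"cat", "dog", "bird", "fish"}
b = {"cat", "dog", "bird", "lizard"}
sa, sb = minhash_signature(a), minhash_signature(b)
print("estimated Jaccard:", estimated_jaccard(sa, sb))   # true value is 3/5 = 0.6
```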
