141

Towards hardware as a reconfigurable, elastic, and specialized service

Sanaullah, Ahmed 29 September 2019
As modern Data Center workloads become increasingly complex, constrained, and critical, mainstream CPU-centric computing has had ever more difficulty in keeping pace. Future data centers are moving towards a more fluid and heterogeneous model, with computation and communication no longer localized to commodity CPUs and routers. Next generation data-centric Data Centers will compute everywhere, whether data is stationary (e.g. in memory) or on the move (e.g. in network). While deploying FPGAs in NICs, as co-processors, in the router, and in Bump-in-the-Wire configurations is a step towards implementing the data-centric model, it is only part of the overall solution. The other part is actually leveraging this reconfigurable hardware. For this to happen, two problems must be addressed: code generation and deployment generation. By code generation we mean transforming abstract representations of an algorithm into equivalent hardware. Deployment generation refers to the runtime support needed to facilitate the execution of this hardware on an FPGA. Efforts at creating supporting tools in these two areas have thus far provided limited benefits, because the efforts are limited in one or more of the following ways: they i) do not provide fundamental solutions to a number of challenges, which makes them useful only to a limited group of (mostly) hardware developers, ii) are constrained in their scope, or iii) are ad hoc, i.e., specific to a single usage context, FPGA vendor, or Data Center configuration. Moreover, efforts in these areas have largely been mutually exclusive, which results in incompatibility across development layers and requires wrappers to be designed to make interfaces compatible. As a result, significant complexity and effort are required to code and deploy efficient custom hardware for FPGAs; this effort may be orders of magnitude greater than for analogous software environments. The goal of this dissertation is to enable reconfigurable logic in Data Centers to be targeted with the same level of effort as a single CPU core. The underlying mechanism is a framework we refer to as Hardware as a Reconfigurable, Elastic and Specialized Service, or HaaRNESS. In this dissertation, we address two of the core challenges of HaaRNESS: reducing the complexity of code generation by constraining High Level Synthesis (HLS) toolflows, and replacing ad hoc models of deployment generation by generalizing and formalizing what is needed for a hardware Operating System. These parts are unified by the back-end of HLS toolflows, which links generated compute pipelines with the operating system and provides appropriate APIs, wrappers, and software runtimes. The contributions of this dissertation are the following: i) an empirically guided set of systematic transformations for generating high quality HLS code; ii) a framework for instrumenting HLS compilers to identify and remove optimization blockers; iii) a framework for RTL simulation and IP generation of HLS kernels for rapid turnaround; and iv) a framework for the generalization and formalization of hardware operating systems, to address the ad hoc nature of existing deployment generation and ensure uniform structure and APIs.
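To make the deployment-generation idea concrete, here is a minimal Python mock of what a uniform hardware-OS interface could look like; all names (HardwareOS, load, launch) are hypothetical illustrations, not HaaRNESS's actual API.

```python
# Illustrative sketch of a uniform hardware-OS deployment interface.
# All names are hypothetical; HaaRNESS's actual interface may differ.

class HardwareOS:
    """Mock hardware OS that standardizes FPGA kernel deployment."""

    def __init__(self):
        self.kernels = {}

    def load(self, name, bitstream_bytes):
        # In a real system this would program a reconfigurable region.
        self.kernels[name] = bitstream_bytes

    def launch(self, name, *args):
        # A uniform launch call replaces per-vendor, ad hoc wrappers.
        if name not in self.kernels:
            raise KeyError(f"kernel '{name}' not loaded")
        print(f"launching {name} with {args}")

hw_os = HardwareOS()
hw_os.load("vector_add", b"\x00")     # placeholder bitstream
hw_os.launch("vector_add", [1, 2], [3, 4])
```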
142

Real-Time Fetal ECG System Design Using Embedded Microprocessors

Unknown Date
Monitoring the fetal ECG (FECG) gives us important information about fetal well-being. The FECG is a complex waveform where each of the P through T complexes provides a wealth of information. The objective of this project is to develop a state-of-the-art real-time FECG monitoring system using embedded microprocessors. Many researchers from fields such as signal processing, artificial intelligence, and advanced statistics have applied different techniques to extract the FECG from the mixture of the maternal ECG (MECG) and other noise and to calculate the fetal heart rate (FHR), with accuracy as the objective. Most of these techniques are computation intensive and not real-time. The proposed approach focuses mainly on real-time processing, robustness, and portability of the system. The work discussed here provides a novel algorithm to extract the FECG from the abdominal ECG (AECG), which is a mixture of FECG, MECG, and noise, and to find the FHR with fewer dimensions (measurements) at the best signal-to-noise ratio. This approach is tested on different soft-core processors, and the results are compared with commercial off-the-shelf (COTS) hard-core solutions in terms of power, cost, size, and speed. In the end, the FECG was successfully extracted and identified on the basis of the BPM and SNR values calculated using this method. It was found that the hard-core processor (ARM Cortex-A9) achieved the best real-time performance of all. / A Thesis submitted to the Department of Electrical & Computer Engineering in partial fulfillment of the requirements for the degree of Master of Science. / Fall Semester 2016. / November 22, 2016. / Eigenvectors, Embedded Systems, FECG, FPGA, Principal Component Analysis, System on Chip / Includes bibliographical references. / Uwe H. Meyer-Baese, Professor Directing Thesis; Simon Y. Foo, Committee Member; Shonda Bernadin, Committee Member.
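As a rough illustration of the extraction technique named in the keywords, the following sketch applies principal component analysis to a multichannel abdominal recording and estimates a heart rate from one component; the array shapes, threshold, and refractory gap are assumptions, not the thesis's algorithm or parameters.

```python
# A minimal sketch of PCA-based fetal ECG separation, assuming a
# multichannel abdominal recording shaped (channels, samples).
import numpy as np

def pca_components(aecg):
    """Return principal components of a (channels, samples) AECG array."""
    centered = aecg - aecg.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # strongest components first
    return eigvecs[:, order].T @ centered    # projected components

def bpm_from_peaks(signal, fs, min_gap_s=0.25):
    """Crude heart-rate estimate: threshold crossings with a refractory gap."""
    threshold = 3 * np.std(signal)
    peaks, last = [], -min_gap_s * fs
    for i, v in enumerate(np.abs(signal)):
        if v > threshold and i - last > min_gap_s * fs:
            peaks.append(i)
            last = i
    if len(peaks) < 2:
        return 0.0
    return 60.0 * fs / np.mean(np.diff(peaks))
```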
143

A New Scheme for Emergency Message Dissemination in Vehicular Ad Hoc Network

Unknown Date
Emergency message dissemination (EMD) in vehicular ad hoc networks (VANETs) has become a hot topic due to ever-increasing concern about road safety. When encountering unusual situations, emergency messages should be disseminated quickly to as many vehicles as possible in order to avoid potential accidents. For this application, the two basic requirements are low latency and high reliability [23]. In urban areas, EMD can rely on roadside infrastructure (e.g. using a base station to broadcast), but in areas where infrastructure is difficult to deploy and maintain, multi-hop broadcast is the main technique used. However, multi-hop broadcast schemes lead to the broadcast storm problem in dense traffic. Several approaches have been proposed to solve this problem; they can be classified as distance-based, cluster-based, and probability-based. In distance-based approaches, vehicles farther from the source vehicle are selected as relay nodes in order to achieve better coverage. In cluster-based approaches, each cluster has a cluster head, which is used as the relay node; the cluster is self-organized, and the cluster head is selected according to information such as the vehicle's speed, direction, location, and antenna height. Both distance-based and cluster-based approaches require maintenance of the network topology. In probability-based approaches, vehicles do not track the network topology or their neighbors' information; a vehicle decides its probability of broadcasting based on the information contained in the received packet. This thesis analyzes the existing protocols and points out the issues with the existing approaches. To address these issues, we present a novel probability-based broadcast scheme that inhibits broadcast storms, decreases end-to-end delay, and guarantees that emergency messages are delivered to most of the vehicles. The rest of the thesis is structured as follows. In chapter 1 we introduce the concept of VANETs and the remaining issues of the existing methods for EMD; the research problem in this work is defined and several related works are reviewed. In chapter 2, we analyze the probability-based protocol and propose our protocol in detail. The typical probability-based protocol [13] uses a linear function to determine whether or not a vehicle should broadcast; however, it does not perform well in dense traffic in terms of end-to-end delay. Our protocol uses an exponential function instead of the linear function, and the simulation results show that it shortens the end-to-end delay in dense traffic without impacting reliability. In chapter 3, we present the network model and simulation setup, including the tools used for the simulation; in particular, key parameters of the proposed protocol are discussed. The simulation results as well as the network performance analysis are presented in chapter 4. In chapter 5, we conclude this work and summarize the remaining problems and future work for the proposed protocol. / A Thesis submitted to the Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Master of Science. / Summer Semester 2016. / May 18, 2016. / Emergency Message Dissemination, Probability, VANET / Includes bibliographical references. / Ming Yu, Professor Directing Thesis; Bruce A. Harvey, Committee Member; Bing W. Kwan, Committee Member.
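A toy comparison of the two rebroadcast-probability shapes discussed above, assuming probability grows with distance from the sender up to transmission range r; the exponent parameter k is an illustrative choice, not the value used in the thesis.

```python
# Linear vs. exponential rebroadcast probability as a function of
# distance d from the sender, with transmission range r.
import math

def p_linear(d, r):
    """Linear scheme: probability grows linearly with distance."""
    return min(d / r, 1.0)

def p_exponential(d, r, k=5.0):
    """Exponential scheme: suppresses nearby vehicles more aggressively,
    so mostly far vehicles relay, reducing redundant broadcasts."""
    return (math.exp(k * d / r) - 1.0) / (math.exp(k) - 1.0)

for d in (50, 150, 250):
    print(d, round(p_linear(d, 300), 2), round(p_exponential(d, 300), 2))
```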
144

System Design, Validation of I/O Layers for Vehicle to Everything (V2X) Applications and Data Filtering of SPaT for Optimizer Input

Sonandkar, Vinayak Anandrao January 2021
No description available.
145

Enabling software security mechanisms through architectural support

Delshadtehrani, Leila 15 May 2021
Over the past decades, there has been a growing number of attacks compromising the security of computing systems. In the first half of 2020, data breaches caused by security attacks led to the exposure of 36 billion records containing private information, and the average cost of a data breach was $3.86 million. Over the years, researchers have developed a variety of software solutions that can actively protect computing systems against different classes of security attacks. However, such software solutions are rarely deployed in practice, largely due to their significant performance overhead, ranging from ~15% to multiple orders of magnitude. A hardware-assisted security extension can reduce the performance overhead of software-level implementations and provide a practical security solution. Hence, in recent years, there has been a growing trend in the industry to enforce security policies in hardware. Unfortunately, the current trend only implements dedicated hardware extensions for enforcing fixed security policies in hardware. As these policies are built in silicon, they cannot be updated at the pace at which security threats evolve. In this thesis, we propose a hybrid approach by developing and deploying both dedicated and flexible hardware-assisted security extensions. We incorporate an array of hardware engines as a security layer on top of an existing processor design. These engines are in the form of Programmable Engines (PEs) and Specialized Engines (SEs). A PE is a minimally invasive and flexible design, capable of enforcing a variety of security policies as security threats evolve. In contrast, an SE, which requires targeted modifications to an existing processor design, is a dedicated hardware security extension. An SE is less flexible than a PE, but has lower overheads. We first propose a PE called PHMon, which can enforce a variety of security policies and can also assist with detecting software bugs and security vulnerabilities. We demonstrate the versatility of PHMon through five representative use cases: (1) a shadow stack, (2) a hardware-accelerated fuzzing engine, (3) information leak prevention, (4) hardware-accelerated debugging, and (5) a code coverage engine. We also propose two SEs as dedicated hardware extensions. Our first SE, called SealPK, provides an efficient and secure protection key-based intra-process memory isolation mechanism for the RISC-V ISA. SealPK provides higher security guarantees than the existing hardware extension in Intel processors, through three novel sealing features. These features prevent an attacker from modifying sealed domains, sealed pages, and sealed permissions. Our second SE, called FlexFilt, provides an efficient capability to guarantee the integrity of isolation-based mechanisms by preventing the execution of various instructions in untrusted parts of the code at runtime. We demonstrate the feasibility of our PE and SEs by providing a practical prototype of our hardware engines interfaced with a RISC-V processor on an FPGA, along with the full Linux software stack for our design. Our FPGA-based evaluation demonstrates that PHMon improves the performance of fuzzing by 16× over the state-of-the-art software-based implementation, while a PHMon-based shadow stack has less than 1% performance overhead. An isolated shadow stack implemented by leveraging SealPK is 80× faster than an isolated implementation using mprotect, and FlexFilt incurs negligible performance overhead for filtering instructions. / 2021-11-15T00:00:00Z
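As a behavioral illustration of the first use case, here is a minimal software model of the shadow-stack policy; it mirrors the call/return matching idea only and is not PHMon's hardware interface.

```python
# Software model of a shadow-stack policy: push the return address on
# every call event, and flag a mismatch on return. A hardware monitor
# would perform these checks transparently on commit events.

class ShadowStack:
    def __init__(self):
        self._stack = []

    def on_call(self, return_addr):
        self._stack.append(return_addr)

    def on_return(self, return_addr):
        if not self._stack:
            raise RuntimeError("return with empty shadow stack")
        expected = self._stack.pop()
        if expected != return_addr:
            raise RuntimeError(
                f"return to {return_addr:#x}, expected {expected:#x}")

ss = ShadowStack()
ss.on_call(0x400123)
ss.on_return(0x400123)       # matches: OK
ss.on_call(0x400456)
# ss.on_return(0x41414141)   # would raise, modeling a detected ROP-style attack
```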
146

Enabling secure multi-party computation with FPGAs in the datacenter

Wolfe, Pierre-Francois W. 15 May 2021
Big data processing utilizes large amounts of computing resources, requiring either greater efficiency or more selectivity. The collection and management of such large pools of data also introduce more opportunities for compromised security and privacy, necessitating more attentive planning and mitigations. Multi-Party Computation (MPC) is a technique enabling confidential data from multiple sources to be processed securely, revealing only agreed-upon results. Currently, adoption is limited by the challenge of basing a complete system on available software libraries: many libraries require expertise in cryptography, do not efficiently address the computation overhead of employing MPC, and leave deployment considerations to the user. In this work we consider the available MPC protocols, changes in computer hardware, and the growth of cloud computing. We propose a cloud-deployed MPC as a Service (MPCaaS) to help eliminate the barriers to adoption and enable more organizations and individuals to handle their shared data processing securely. The growing presence of Field Programmable Gate Array (FPGA) hardware in datacenters enables accelerated computing as well as the low-latency, high-bandwidth communication that bolsters the performance of MPC. Developing an abstract service that employs this hardware will democratize access to MPC, rather than restricting it to the small overlapping pool of users knowledgeable about both cryptography and hardware accelerators. A hardware proof of concept we have implemented at BU supports this idea: we deployed an efficient three-party Secret Sharing (SS) protocol supporting both Boolean and arithmetic shares on FPGA hardware. We compare our hardware design to the original authors' software implementations of Secret Sharing and to research results accelerating MPC protocols based on Garbled Circuits with FPGAs. Our conclusion is that Secret Sharing in the datacenter is competitive and, when implemented on FPGA hardware, uses at least 10× fewer compute resources than the original work using CPUs. Finally, we describe the ongoing work and envision the research stages that will help us to build a complete MPCaaS system.
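For intuition, a minimal sketch of additive three-party arithmetic secret sharing over a 32-bit ring follows; the thesis's FPGA design implements a replicated-share protocol with both Boolean and arithmetic shares and is considerably more involved.

```python
# Additive three-party secret sharing: no single party (or pair of shares
# short of all three) reveals the secret, yet addition can be computed
# locally on shares with no communication.
import secrets

MOD = 2**32

def share(x):
    """Split secret x into three additive shares that sum to x mod MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = secrets.randbelow(MOD)
    s2 = (x - s0 - s1) % MOD
    return s0, s1, s2

def reconstruct(shares):
    return sum(shares) % MOD

def add_shares(a, b):
    """Each party adds its own shares locally; no communication needed."""
    return tuple((ai + bi) % MOD for ai, bi in zip(a, b))

a, b = share(7), share(35)
assert reconstruct(add_shares(a, b)) == 42
```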
147

Cross-layer design of thermally-aware 2.5D systems

Ma, Yenai 29 September 2020
Over the past decade, CMOS technology scaling has slowed down. To sustain the historic performance improvement predicted by Moore's Law, in the mid-2000s the computing industry moved to using manycore systems and exploiting parallelism. The on-chip power densities of manycore systems, however, continued to increase after the breakdown of Dennard scaling. This leads to the 'dark silicon' problem, whereby not all cores can operate at the highest frequency or be turned on simultaneously due to thermal constraints. As a result, we have not been able to take full advantage of the parallelism in manycore systems. One of the 'More than Moore' approaches being explored to address this problem is the integration of diverse functional components onto a substrate using 2.5D integration technology. 2.5D integration provides opportunities to exploit chiplet placement flexibility to address the dark silicon problem and mitigate the thermal stress of today's high-performance systems. These opportunities can be leveraged to improve the overall performance of manycore heterogeneous computing systems. Broadly, this thesis aims at designing thermally-aware 2.5D systems. More specifically, to address the dark silicon problem of manycore systems, we first propose a single-layer thermally-aware chiplet organization methodology for homogeneous 2.5D systems. The key idea is to strategically insert spacing between the chiplets of a 2.5D manycore system to lower the operating temperature, and thus reclaim dark silicon by allowing more active cores and/or a higher operating frequency under a temperature threshold. We investigate the manufacturing cost and thermal behavior of 2.5D systems, then formulate and solve an optimization problem that jointly maximizes performance and minimizes manufacturing cost. We then enhance our methodology by incorporating a cross-layer co-optimization approach, jointly maximizing performance and minimizing manufacturing cost and operating temperature across the logical, physical, and circuit layers. We propose a novel gas-station link design that enables pipelining in passive interposers. We then extend our thermally-aware optimization methodology to network routing and chiplet placement for heterogeneous 2.5D systems, which consist of central processing unit (CPU) chiplets, graphics processing unit (GPU) chiplets, accelerator chiplets, and/or memory stacks. We jointly minimize the total wirelength and the system temperature. Our enhanced methodology increases the thermal design power budget and thereby improves the thermally-constrained performance of the system. / 2021-03-29T00:00:00Z
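A toy version of the joint spacing optimization may help: sweep a uniform chiplet spacing and trade a lower peak temperature against a larger, costlier interposer. The thermal and cost models below are illustrative placeholders, not the thesis's calibrated models.

```python
# Sweep uniform chiplet spacing; lower peak temperature competes with
# growing interposer area (and hence manufacturing cost).

def peak_temp(spacing_mm, base=95.0, relief=8.0):
    # Assumed: thermal relief from spacing saturates as spacing grows.
    return base - relief * (1 - 1 / (1 + spacing_mm))

def interposer_cost(spacing_mm, chiplets=4, chip_mm=10.0, cost_per_mm2=0.02):
    # Assumed: chiplets arranged in a square grid on the interposer.
    side = chiplets ** 0.5 * (chip_mm + spacing_mm)
    return cost_per_mm2 * side * side

# Joint objective: cost plus a penalty for exceeding an 85C threshold.
best = min((interposer_cost(s) + 2.0 * max(0, peak_temp(s) - 85.0), s)
           for s in [0.0, 0.5, 1.0, 2.0, 4.0])
print(f"best spacing {best[1]} mm, objective {best[0]:.2f}")
```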
148

Design of a 10GHz RF power amplifier in 130nm CMOS technology based on Wilkinson combiner methodology

Zhao, Shanshan 04 June 2019
There is a growing demand today to design and fabricate RF power amplifiers at high frequencies above 5GHz that can directly drive a 50Ω antenna with sufficiently high transmission power to meet the needs of various wireless communication applications. This has typically been done by using GaN or other III-V technologies to build the power amplifier transistor, in order to allow for the use of much higher power supply voltages than are used in today's silicon technologies. For example, a 5W GaN power amplifier at 5GHz would typically use a VDD of 5V to 10V, and would be implemented as a discrete device on a separate module from the RF analog circuitry built out of silicon. With the continuing evolution of Moore's Law, silicon technologies in use today for high frequency wireless communications typically use a VDD of 1.5V or less. There is a desire, however, in many wireless applications to place the RF power amplifier on the same silicon chip as all the other RF/analog IC circuitry, in order to save chip fabrication cost. Consequently, research into improved methods of RF power amplifier design in silicon technology is being done in many IC design laboratories in order to increase the RF power output of power amplifiers built in silicon. This MS Thesis proposes the complete design of a four-channel RF power amplifier using a Wilkinson combiner, with 27dBm output power. All the circuits are designed and implemented based on the GlobalFoundries 130nm SiGe BiCMOS technology and design kit at a frequency of 10GHz with a VDD of 1.5V, to provide 0.5W of RF output signal power into a 50Ω antenna.
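A quick power-budget check, assuming an ideal lossless Wilkinson combiner (a real combiner adds insertion loss), shows what each of the four channels must deliver to reach 27dBm at the output.

```python
# Ideal four-way power combining: total output power splits evenly,
# so each channel needs the target minus 10*log10(4) ~ 6 dB.
import math

def dbm_to_watts(dbm):
    return 10 ** (dbm / 10) / 1000

target_dbm = 27.0                                   # 27 dBm ~= 0.5 W into 50 ohms
per_channel_dbm = target_dbm - 10 * math.log10(4)   # ideal 4-way combining
print(f"{dbm_to_watts(target_dbm):.3f} W total; "
      f"each channel ~{per_channel_dbm:.0f} dBm "
      f"({dbm_to_watts(per_channel_dbm)*1000:.0f} mW)")
```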
149

Synergistic Timing Speculation for Multi-Threaded Programs

Yasin, Atif 01 May 2016
Timing speculation is a promising approach to increase processor performance and energy efficiency. Under timing speculation, an integrated circuit is allowed to operate at a speed faster than its slowest path, the critical path. It is based on the empirical observation, presented later in the thesis, that these critical path delays are rarely manifested during program execution. Consequently, as long as the processor is equipped with an error detection and recovery mechanism, its performance can be increased and/or its energy consumption reduced beyond that achievable by any conventional operation. While many past works have dealt with timing speculation within a single core, this work uncovers a new direction: timing speculation for a multi-core processor executing a parallel, multi-threaded application. Through a rigorous cross-layered circuit-architectural analysis, it is observed that during the execution of a multi-threaded program, there is significant variation in circuit delay characteristics across different threads. Synergistic Timing Speculation (SynTS) is proposed to exploit this variation (heterogeneity) in path sensitization delays to jointly optimize the energy and execution time of the many-core processor. In particular, SynTS uses a sampling-based online error probability estimation technique, coupled with a polynomial time algorithm, to optimally determine the voltage, frequency, and amount of timing speculation for each thread. The experimental analysis is presented for three pipe stages, namely Decode, SimpleALU, and ComplexALU, with reductions in Energy Delay Product of up to 26%, 25%, and 7.5%, respectively, compared to an existing per-core timing speculation scheme. The analysis also includes a case study for a General Purpose Graphics Processing Unit.
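The following sketch illustrates the sampling-based estimation idea: infer each thread's timing-error probability at a candidate frequency from sampled outcomes, then keep the fastest setting whose expected recovery cost still pays off. The numbers and the cost model are illustrative, not SynTS's calibrated models.

```python
# Estimate per-thread timing-error probability from sampled 0/1 outcomes
# and pick the frequency ratio with the best expected net speedup.

def estimate_error_prob(error_samples):
    """error_samples: list of 0/1 outcomes from sampled speculative cycles."""
    return sum(error_samples) / len(error_samples)

def effective_speedup(freq_ratio, p_err, recovery_penalty=10.0):
    # Assumed model: each timing error costs recovery_penalty cycles on average.
    return freq_ratio / (1.0 + p_err * recovery_penalty)

# Illustrative samples: errors become more frequent at higher frequencies.
samples_by_freq = {1.0: [0] * 100,
                   1.2: [0] * 99 + [1] * 1,
                   1.4: [0] * 85 + [1] * 15}
best = max(samples_by_freq,
           key=lambda f: effective_speedup(f, estimate_error_prob(samples_by_freq[f])))
print("best per-thread frequency ratio:", best)   # 1.2 under this model
```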
150

Real-Time Scheduling Algorithm Design on Stochastic Processors

Pakrashi, Anushka 01 May 2014
Recent studies have shown that significant power savings are possible with the use of inexact processors, which may introduce a small percentage of errors in computation. However, the use of such processors in time-sensitive systems is challenging, as these errors can significantly hamper system performance. In this thesis, a design framework is developed for real-time applications running on stochastic processors. To identify hardware error patterns, two methods are proposed to predict the occurrence of hardware errors. In addition, an algorithm is designed that uses knowledge of the hardware error patterns to judiciously schedule real-time jobs in order to maximize real-time performance. Both analytical and simulation results show that the proposed approach provides significant performance improvements when compared to an existing real-time scheduling algorithm and is efficient enough for online use.
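A minimal sketch of the scheduling idea under an assumed error prediction: given predicted error rates per time slot, greedily place the most error-sensitive jobs in the cleanest slots. The values and the greedy policy here are illustrative, not the thesis's algorithm.

```python
# Greedy error-aware scheduling: most error-sensitive job gets the time
# slot with the lowest predicted hardware-error rate.

def schedule(jobs, predicted_error_rate):
    """jobs: list of (name, error_sensitivity);
    predicted_error_rate: per-slot predicted error rates."""
    slots = sorted(range(len(predicted_error_rate)),
                   key=lambda s: predicted_error_rate[s])
    ordered = sorted(jobs, key=lambda j: -j[1])
    return {job[0]: slot for job, slot in zip(ordered, slots)}

plan = schedule([("control", 0.9), ("logging", 0.1), ("video", 0.4)],
                predicted_error_rate=[0.02, 0.20, 0.05])
print(plan)   # control -> slot 0, video -> slot 2, logging -> slot 1
```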
