  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
121

Queue length based pacing of internet traffic

Cai, Yan 01 January 2011 (has links)
As the Internet evolves, there is a continued demand for high Internet bandwidth. This demand is driven partly by rapidly spreading real-time video applications such as online gaming, teleconferencing, and high-definition video streaming. All-optical switches and routers have long been studied as a promising solution to this rapidly growing demand. Nevertheless, buffer sizes in all-optical switches and routers are very limited due to the challenges in manufacturing larger optical buffers. On the other hand, Internet traffic is bursty. Burstiness in network traffic has been shown to exist at all time scales, from tens of milliseconds to thousands of seconds. This pervasive burstiness has a very significant impact on the performance of small-buffer networks, resulting in high packet drop probabilities and low link utilization. Many solutions have been proposed in the literature to address the burstiness of network traffic. Traffic engineering techniques, such as traffic shaping and policing, have been available in commercial routers and switches since the era of Asynchronous Transfer Mode (ATM) networks. Moreover, TCP pacing, as a natural solution to TCP burstiness, has long been studied. Furthermore, several traffic conditioning and scheduling techniques have been proposed to smooth core network traffic in a coordinated manner. However, all the existing solutions are inadequate to efficiently solve the burstiness issue of high-speed traffic. In this dissertation we aim to tackle the burstiness issue in small-buffer networks, which refer to the future Internet core network consisting of all-optical routers and switches with small buffers. This dissertation is composed of two parts. In the first part, we analyze the impact of a general pacing scheme on the performance of a tandem queue network. This part serves as a theoretical foundation, based on which we demonstrate the benefits of pacing in a tandem queue model. Specifically, we use the Infinitesimal Perturbation Analysis (IPA) technique to study the impact of pacing on the instantaneous and average queue lengths of a series of nodes. Through theoretical analyses and extensive simulations, we show that under certain conditions there exists a linear relationship between system parameters and the instantaneous/average queue lengths of nodes, and that pacing improves the performance of the underlying tandem queue system by reducing the burstiness of the packet arrival process. In the second part, we propose a practical online packet pacing scheme named Queue Length Based Pacing (QLBP). We analyze the impact of QLBP on the underlying network traffic in both the time and frequency domains. We also present two implementation algorithms that allow us to evaluate the performance of QLBP in real experimental and virtual simulation environments. Through extensive simulations, we show that QLBP can effectively reduce the burstiness of network traffic and hence significantly improve the performance of a small-buffer network. More importantly, traffic paced with QLBP does not exhibit weakened competitiveness when competing with non-paced traffic, which makes the QLBP scheme more attractive for ISPs.
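The core mechanism named above, adapting a pacer's sending rate to its own queue length, can be illustrated with a short sketch. The linear rate rule and the parameter names (r_min, r_max, q_max) below are assumptions for illustration only; the dissertation defines the exact QLBP rate function and its two implementation algorithms.

```python
from collections import deque

class QueueLengthBasedPacer:
    """Illustrative pacer whose sending rate grows with its own backlog:
    an empty queue is drained at a low rate r_min, and the rate rises
    linearly toward the full link rate r_max as the backlog approaches
    q_max packets. A sketch of the idea, not the QLBP algorithm as
    specified in the dissertation."""

    def __init__(self, r_min_bps, r_max_bps, q_max_pkts):
        self.r_min = r_min_bps
        self.r_max = r_max_bps
        self.q_max = q_max_pkts
        self.queue = deque()          # queued packet sizes in bytes

    def current_rate_bps(self):
        # Linear interpolation between r_min and r_max based on queue fill.
        fill = min(len(self.queue), self.q_max) / self.q_max
        return self.r_min + fill * (self.r_max - self.r_min)

    def enqueue(self, pkt_bytes):
        self.queue.append(pkt_bytes)

    def next_gap_seconds(self):
        # Time to hold the head-of-line packet at the current pacing rate.
        if not self.queue:
            return None
        return 8 * self.queue[0] / self.current_rate_bps()

# Example: a pacer between a 1 Gb/s ingress and a small-buffer core link.
pacer = QueueLengthBasedPacer(r_min_bps=100e6, r_max_bps=1e9, q_max_pkts=64)
for size in (1500, 1500, 64):
    pacer.enqueue(size)
print(pacer.current_rate_bps(), pacer.next_gap_seconds())
```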
122

Online management of resilient and power efficient multicore processors

Rodrigues, Rance 01 January 2013 (has links)
The semiconductor industry has been driven by Moore's law for almost half a century. Miniaturization of device size has allowed more transistors to be packed into a smaller area, while improved transistor performance has resulted in a significant increase in frequency. Increased device density and rising frequency unfortunately led to a power density problem, which became an obstacle to further integration. The processor industry responded to this problem by lowering processor frequency and integrating multiple processor cores on a die, choosing to focus on Thread Level Parallelism (TLP) for performance instead of traditional Instruction Level Parallelism (ILP). While continued scaling of devices has provided unprecedented integration, it has unfortunately also led to a few serious problems. The first problem is the increasing rate of system failures due to soft errors and aging defects. Soft errors are caused by ionizing radiation that originates from radioactive contaminants or from the secondary release of charged particles by cosmic neutrons. Ionizing radiation may charge or discharge a storage node, causing bit flips that may result in a system failure. In this dissertation, we propose solutions for online detection of such errors in microprocessors. A small and functionally limited core called the Sentry Core (SC) is added to the multicore. It monitors the operation of the functional cores and, whenever deemed necessary, opportunistically initiates Dual Modular Redundancy (DMR) to test their operation. This scheme thus allows detection of potential core failures at a small hardware overhead. In addition to detecting soft errors, the solution can also detect errors introduced by device aging that result in operational failure. The solution is further extended to verify cache coherence transactions. A second problem we address in this dissertation relates to power. While the multicore solution addresses the power density problem, overall power dissipation is still limited by packaging and cooling technologies, which limits the number of cores that can be integrated for a given package specification. One way to improve performance within this constraint is to reduce the power dissipation of individual cores without sacrificing system performance. Prior solutions toward this objective involve Dynamic Voltage and Frequency Scaling (DVFS) and the use of sleep states, which take advantage of coarse-grained variation in the demand for computation. In this dissertation, we propose techniques to maximize the performance-per-power of multicores at a fine-grained time scale, and we propose multiple alternative architectures to attain this goal. One such architecture is Asymmetric Multicore Processors (AMPs), which have been shown to outperform symmetric ones in terms of performance and performance-per-Watt for a fixed resource and power budget. However, the effectiveness of these architectures depends on accurate thread-to-core scheduling. To address this problem, we propose online thread-scheduling solutions that respond to the changing computational requirements of the threads. Another solution we consider is for Symmetric Multicore Processors (SMPs), where we target sharing of large and underutilized resources between pairs of cores. While such architectures have been explored in the past, the evaluations were incomplete. Due to sharing, the shared resource sometimes becomes a bottleneck, resulting in significant performance loss. To mitigate such loss, we propose Dynamic Voltage and Frequency Boosting (DVFB) of the shared resources, which is found to significantly mitigate performance loss in times of contention. We also explore performance-per-Watt improvement of individual cores in a multicore, based on dynamic reconfiguration of individual cores to run alternately in out-of-order (OOO) and in-order (InO) modes, adapting dynamically to workload characteristics. This solution is found to significantly improve power efficiency without compromising overall performance. Thus, in this dissertation we propose solutions to several important problems that facilitate the continued scaling of processors. Specifically, we address challenges in the reliability of computation and propose low-power design solutions to address power constraints.
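As a concrete illustration of the online thread-to-core scheduling problem described above for AMPs, the toy policy below maps the threads with the highest sampled IPC to the big cores and the rest to the little cores. The metric, data layout, and policy are hypothetical; the dissertation's schedulers respond to changing computational requirements with their own heuristics.

```python
def schedule_amp_threads(threads, big_cores, little_cores):
    """Toy online scheduler for an asymmetric multicore: rank threads by a
    sampled per-epoch metric (here, committed IPC) and give the big cores
    to the highest-ranked threads. Re-run every scheduling epoch so the
    mapping follows phase changes in the workload."""
    ranked = sorted(threads, key=lambda t: t["ipc"], reverse=True)
    mapping = {}
    for i, t in enumerate(ranked):
        if i < len(big_cores):
            mapping[t["tid"]] = big_cores[i]
        else:
            mapping[t["tid"]] = little_cores[(i - len(big_cores)) % len(little_cores)]
    return mapping

# Hypothetical epoch sample: two compute-bound and two memory-bound threads.
threads = [{"tid": 0, "ipc": 2.1}, {"tid": 1, "ipc": 0.4},
           {"tid": 2, "ipc": 1.7}, {"tid": 3, "ipc": 0.6}]
print(schedule_amp_threads(threads, big_cores=["B0", "B1"], little_cores=["L0", "L1"]))
```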
123

BraiNet: A Framework for Designing Pervasive Brain-Machine Interface Applications

January 2020 (has links)
abstract: Due to the advent of easy-to-use, portable, and cost-effective brain signal sensing devices, pervasive Brain-Machine Interface (BMI) applications using Electroencephalogram (EEG) are growing rapidly. The main objectives of these applications are: 1) pervasive collection of brain data from multiple users, 2) processing the collected data to recognize the corresponding mental states, and 3) providing real-time feedback to end users, activating an actuator, or harvesting information for further enterprise services. Developing BMI applications faces several challenges, such as a cumbersome setup procedure, a low signal-to-noise ratio, insufficient signal samples for analysis, and long processing times. Internet-of-Things (IoT) technologies provide the opportunity to solve these challenges through large-scale data collection, fast data transmission, and computational offloading. This research proposes an IoT-based framework, called BraiNet, that provides a standard design methodology for fulfilling the requirements of pervasive BMI applications, including accuracy, timeliness, energy efficiency, security, and dependability. BraiNet applies Machine Learning (ML) based solutions (e.g., classifiers and predictive models) to: 1) improve the accuracy of mental state detection on the go, 2) provide real-time feedback to the users, and 3) save power on mobile platforms. However, BraiNet inherits the security weaknesses of IoT, due to its use of off-the-shelf software and hardware, high accessibility, and massive network size. ML algorithms, as the core technology for mental state recognition, are among the main targets for cyber attackers. Novel ML security solutions are proposed and added to BraiNet, which provide analytical methodologies for tuning the ML hyper-parameters to be secure against attacks. To implement these solutions, two main optimization problems are solved: 1) maximizing accuracy while minimizing delays and power consumption, and 2) maximizing ML security while keeping accuracy high. Deep learning algorithms and delay and power models are developed to solve the former problem, while gradient-free optimization techniques, such as Bayesian optimization, are applied to the latter. To test the framework, several BMI applications are implemented, such as an EEG-based driver fatigue detector (SafeDrive), an EEG-based identification and authentication system (E-BIAS), and interactive movies that adapt to viewers' mental states (nMovie). The results from experiments on the implemented applications show the successful design of pervasive BMI applications based on the BraiNet framework. / Dissertation/Thesis / Doctoral Dissertation Computer Engineering 2020
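The second optimization problem named above (maximizing ML security while keeping accuracy high) can be sketched as a gradient-free search over hyper-parameters. Random search stands in here for the Bayesian optimization the abstract mentions, and the knob names, accuracy floor, and train_eval interface are assumptions for illustration.

```python
import random

def tune_secure_hyperparams(train_eval, n_trials=50, acc_floor=0.90, seed=0):
    """Sketch: maximize robustness to adversarial inputs subject to an
    accuracy floor. `train_eval(params)` is assumed to train and evaluate a
    classifier and return (accuracy, robustness); a Bayesian optimizer
    would replace the random sampling used here."""
    rng = random.Random(seed)
    best_params, best_robustness = None, float("-inf")
    for _ in range(n_trials):
        params = {                                   # hypothetical knobs
            "l2_regularization": 10 ** rng.uniform(-4, 0),
            "hidden_units": rng.choice([32, 64, 128, 256]),
        }
        accuracy, robustness = train_eval(params)
        if accuracy >= acc_floor and robustness > best_robustness:
            best_params, best_robustness = params, robustness
    return best_params
```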
124

Dynamic resource management for high-performance many-core packet processing systems

Wu, Qiang 01 January 2011 (has links)
The complexity of operations performed in the data path of today's Internet has expanded significantly beyond the simple store-and-forward concept proposed in the original architecture. The trend towards more functionality and complexity in the data path is expected to continue in next-generation networks in order to accommodate innovative applications that may emerge in the future. Flexible deployment of network applications and sufficient processing capacity are therefore key to the success of next-generation packet processing systems. To develop such systems, three key questions need to be answered: (1) how to build them, (2) how to program them, and (3) how to manage them at runtime. In this dissertation, I discuss my work in the areas of network processor architecture design, programming abstraction, and runtime management. A task graph is proposed as a simple programming model that separates network processing functionality from resource management in order to exploit the parallelism inherent in network processing. The task abstraction is supported in the hardware design to simplify software development. A novel network processor architecture is introduced in this dissertation to address the key challenges of general programmability and high performance in the next-generation data path. Further design extensions provide fair multithreading on individual cores for fine-grained hardware resource management. Using the task graph, the traditionally monolithic processing workload of typical data-path network protocols is partitioned and analyzed. Based on this analysis, an efficient runtime management system is designed to solve the load-balancing problem on current many-core packet processing systems. As the number of integrated cores in many-core architectures keeps increasing at a steady pace, a distributed workload offloading mechanism is further designed for processing-task mapping on large-scale many-core systems. Evaluation results show solid improvements in programming-model simplicity, data-path hardware resource utilization, and many-core scalability. With the current industry shift towards many-core architectures and increasingly diversified network applications, this work represents an important step towards next-generation programmable packet processing systems.
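The task graph abstraction described above can be made concrete with a small sketch: a hypothetical IPv4 data path expressed as weighted tasks, plus a greedy mapper that assigns each task to the least-loaded core. The task names, costs, and greedy policy are illustrative; the dissertation's runtime manager also rebalances the mapping as traffic changes.

```python
# Hypothetical task graph for a simple IPv4 data path. Costs are
# illustrative per-packet processing weights; edges are packet hand-offs.
TASK_GRAPH = {
    "parse":           {"cost": 10, "next": ["verify_checksum"]},
    "verify_checksum": {"cost": 15, "next": ["route_lookup"]},
    "route_lookup":    {"cost": 30, "next": ["decrement_ttl"]},
    "decrement_ttl":   {"cost": 5,  "next": ["queue_out"]},
    "queue_out":       {"cost": 8,  "next": []},
}

def map_tasks_to_cores(task_graph, num_cores):
    """Greedy mapping: visit tasks from most to least expensive and place
    each on the currently least-loaded core. A stand-in for the runtime
    load balancing the dissertation performs on many-core packet processors."""
    loads = [0] * num_cores
    mapping = {}
    for name, node in sorted(task_graph.items(), key=lambda kv: -kv[1]["cost"]):
        core = loads.index(min(loads))
        mapping[name] = core
        loads[core] += node["cost"]
    return mapping, loads

print(map_tasks_to_cores(TASK_GRAPH, num_cores=2))
```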
125

Toward a secure and scalable internet and economic incentives for evolvable internet architecture

Song, Yang 01 January 2013 (has links)
The Internet consists of tens of thousands of interconnected, diverse, self-owned smaller networks. These networks engage in strategic decision making to maximize their own performance or benefits. In this thesis, I study the routing process and the economics of the decentralized system formed by these networks. Any decentralized system must find the right balance between centralized management and the freedom of individual parties. Internet protocol designers chose to give substantial trust and flexibility to the autonomous networks, and this decision has helped the Internet evolve into a system of tremendous size with a variety of novel applications and services. However, the Internet faces several problems that constrain its further development. In this thesis, I focus on three issues related to routing security, routing scalability, and economic incentives. First, Internet routing is vulnerable to malicious behavior because the protocol design is based on trust. Even though researchers have studied a variety of security solutions over the last decade, protocol attacks can still disrupt the connections of thousands of networks. Second, the rapidly growing Internet challenges the scalability of routing systems, with a fourfold increase in routing table size over 10 years. The Internet routing system is so fragile that route flaps caused by a software bug can disrupt connectivity for a portion of the Internet. Third, the Internet forms a unique supply chain for content delivery, and it is of great interest to understand competition in this unique market. This thesis aims to improve the sustainability of the Internet. For the routing security concern, we identify a protocol manipulation attack in the Internet routing system and propose a simple solution. To improve routing scalability, we design a resilient routing protocol to assist a new routing addressing scheme that significantly reduces the routing table size. In the last part of the thesis, we discuss Internet economics and propose incentive-compatible pricing and investment strategies for self-owned networks to obtain high social utility in the dynamic and unique Internet market.
126

Managing resources for high performance and low energy in general-purpose processors

Wang, Huaping 01 January 2010 (has links)
Microarchitectural techniques such as superscalar instruction issue, Out-of-Order instruction execution (OOO), Simultaneous Multi-Threading (SMT), and Chip Multi-Processing (CMP) improve processor performance dramatically. However, as processor designs become more and more complicated, managing the abundant processor resources to achieve optimal performance and power consumption becomes increasingly challenging. This dissertation investigates resource usage control techniques for general-purpose microprocessors (supporting both single and multiple hardware contexts), targeting both energy and performance. We address the power-inefficient resource usage issue in single-context processors and propose a Compiler-based Adaptive Fetch Throttling (CAFT) technique, which combines the benefits of hardware-based runtime throttling and software-based static throttling, providing good energy savings with low performance loss. Our simulation results show that the proposed technique doubles the energy-delay product (EDP) savings compared to fixed-threshold throttling. We then introduce the resource competition problem for SMT processors, which allow multiple threads to simultaneously share processor resources and thereby indirectly improve energy efficiency. We present a novel Adaptive Resource Partitioning Algorithm (ARPA) to control the usage and sharing of processor resources in SMT processors. ARPA analyzes the resource usage efficiency of each thread over a time period and assigns more resources to threads that can use them more efficiently. Simulation results on a large set of 42 multiprogrammed workloads show that ARPA outperforms the best current dynamic resource allocation technique, hill-climbing, by 5.7% in overall instruction throughput. Considering the fairness accorded to each thread, ARPA attains a 9.2% improvement over hill-climbing on a commonly used fairness metric. We also propose resource adaptation approaches that adaptively control the number of powered-on ROB entries and partition shared resources among threads, for both shared-ROB and divided-ROB structures, targeting both high performance and low energy. These approaches consider not only the relative resource usage efficiency of each thread, as ARPA does, but also the actual resource usage of the threads, to identify inefficient resource usage behavior and save energy. Our experimental results show that for an SMT processor with a shared-ROB structure, our resource adaptation approach achieves 16.7% energy savings over ARPA with negligible performance loss across the 42 sample workloads. For an SMT processor with a divided-ROB structure, our resource adaptation approach outperforms ARPA by 4.2% while also achieving 12.4% energy savings.
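The ARPA idea summarized above, giving more resources to the threads that use them most efficiently, can be sketched as a per-epoch repartitioning step. The epoch granularity, step size, and the efficiency metric (committed instructions per allocated ROB entry) are assumptions for illustration, not the dissertation's exact algorithm.

```python
def arpa_style_repartition(rob_shares, committed_insts, step=8):
    """Sketch of one adaptive-partitioning epoch: compute each thread's
    resource-usage efficiency (committed instructions per allocated ROB
    entry) and move `step` entries from the least efficient thread to the
    most efficient one."""
    eff = {t: committed_insts[t] / max(rob_shares[t], 1) for t in rob_shares}
    donor = min(eff, key=eff.get)
    receiver = max(eff, key=eff.get)
    if donor != receiver and rob_shares[donor] > step:
        rob_shares[donor] -= step
        rob_shares[receiver] += step
    return rob_shares

# Hypothetical two-thread epoch: thread 1 commits far more work per entry,
# so it receives entries from thread 0 in the next epoch.
print(arpa_style_repartition({0: 64, 1: 64}, {0: 2000, 1: 9000}))
```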
127

Protocol and system design for a service-centric network architecture

Huang, Xin 01 January 2010 (has links)
The next-generation Internet will be governed by the need for flexibility. Heterogeneous end-systems, novel applications, and security and manageability challenges require networks to provide a broad range of services that go beyond store-and-forward. Following this trend, a service-centric network architecture is proposed for the next-generation Internet. It uses router-based programmability to provide packet processing services inside the network and decomposes communications into these service blocks. By providing different compositions of services along the data path, such a network can customize its connections to satisfy various communication requirements. This design extends the flexibility of the Internet to meet its next-generation challenges. This work addresses three major challenges in implementing such service-centric networks. Finding the optimal path for a given composition of services is the first challenge. This is called "service routing," since both service availability and routing cost need to be considered. Novel algorithms and a matching protocol are designed to solve the service routing problem in large-scale networks, and a prototype based on Emulab is implemented to demonstrate and evaluate our design. Finding the optimal composition of services to satisfy the communication requirements of a given connection is the second challenge, called "service composition." A novel decision-making framework is proposed, which reduces the service composition problem to a planning problem and automates the composition of services according to specified communication requirements. A further investigation shows that extending this decision-making framework to combine the service routing and service composition problems yields a better solution than solving them separately. Runtime resource management on the data plane is the third challenge. Several runtime task-mapping approaches have been proposed for network processor systems; an evaluation methodology based on queuing networks is designed to systematically evaluate and compare these solutions under various network traffic scenarios. The results of this work give qualitative and quantitative insights into next-generation Internet design that combines issues from computer networking, architecture, and system design.
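One common way to formalize the service routing problem described above is a shortest-path search over a layered graph, where each layer records how many services of the requested composition have already been applied. The sketch below uses that formulation as an assumption for illustration; the dissertation's own algorithms and matching protocol target large-scale networks.

```python
import heapq

def service_route(graph, services_at, chain, src, dst):
    """Shortest path in a layered graph: a state is (node, number of
    services in `chain` already applied). Moving along a link costs its
    weight; applying the next required service at a node that hosts it is
    modeled as free. Returns the minimum path cost, or None if no path
    provides the full composition."""
    pq = [(0, src, 0)]
    visited = set()
    while pq:
        cost, node, done = heapq.heappop(pq)
        if (node, done) in visited:
            continue
        visited.add((node, done))
        if node == dst and done == len(chain):
            return cost
        if done < len(chain) and chain[done] in services_at.get(node, ()):
            heapq.heappush(pq, (cost, node, done + 1))  # apply next service here
        for nbr, w in graph.get(node, {}).items():
            heapq.heappush(pq, (cost + w, nbr, done))
    return None

# Toy topology: the firewall and transcode services must be applied in order.
graph = {"a": {"b": 1, "c": 4}, "b": {"d": 1}, "c": {"d": 1}, "d": {}}
services_at = {"b": {"firewall"}, "c": {"firewall", "transcode"}, "d": {"transcode"}}
print(service_route(graph, services_at, ["firewall", "transcode"], "a", "d"))
```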
128

Operational Safety Verification of AI-Enabled Cyber-Physical Systems

January 2020 (has links)
abstract: One of the main challenges in testing artificial intelligence (AI) enabled cyber-physical systems (CPS), such as autonomous driving systems and internet-of-things (IoT) medical devices, is the presence of machine learning components, for which formal properties are difficult to establish. In addition, interactions among operational components, the inclusion of a human-in-the-loop, and environmental changes result in a myriad of safety concerns, not all of which can be comprehensively tested before deployment or even detected during the design and testing phase. This dissertation identifies major challenges in the safety verification of AI-enabled safety-critical systems and addresses the safety problem by proposing an operational safety verification technique that relies on solving the following subproblems: 1. Given input/output operational traces collected from sensors and actuators, automatically learn a hybrid automaton (HA) representation of the AI-enabled CPS. 2. Given the learned HA, evaluate the operational safety of the AI-enabled CPS in the field. This dissertation presents novel approaches for learning hybrid automaton models from time-series traces collected from the operation of AI-enabled CPS in the real world, for both linear and nonlinear CPS. The learned model allows operational safety to be stringently evaluated by comparing it against a reference specification model of the system. The proposed techniques are evaluated on the artificial pancreas control system. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2020
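Subproblem 1 above, learning a hybrid automaton from input/output traces, can be illustrated in miniature: given candidate mode boundaries in a one-dimensional trace, fit affine dynamics to each segment by least squares. The fixed breakpoints, the affine model form, and the omission of guard and reset inference are simplifying assumptions; the dissertation's techniques handle linear and nonlinear CPS and infer the structure itself.

```python
import numpy as np

def fit_affine_modes(t, x, breakpoints):
    """Toy step of hybrid-automaton learning: for each trace segment between
    the given breakpoints, fit x' ~ a*x + b by least squares and report the
    per-mode dynamics. Real approaches also discover the mode boundaries,
    guards, and resets from the traces."""
    dx = np.gradient(x, t)                      # numerical derivative of the trace
    bounds = [0, *breakpoints, len(x)]
    modes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        A = np.column_stack([x[lo:hi], np.ones(hi - lo)])
        (a, b), *_ = np.linalg.lstsq(A, dx[lo:hi], rcond=None)
        modes.append({"a": float(a), "b": float(b), "samples": (lo, hi)})
    return modes

# Hypothetical trace: exponential decay followed by a switched set-point.
t = np.linspace(0.0, 10.0, 200)
x = np.where(t < 5.0, np.exp(-t), 1.0 - np.exp(-(t - 5.0)))
print(fit_affine_modes(t, x, breakpoints=[100]))
```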
129

Dash Database: Structured Kernel Data For The Machine Understanding of Computation

January 2020 (has links)
abstract: As device and voltage scaling cease, ever-increasing performance targets can only be achieved through the design of parallel, heterogeneous architectures. The workloads targeted by these domain-specific architectures must be designed to leverage the strengths of the platform: a task that has proven to be extremely difficult and expensive. Machine learning has the potential to automate this process by understanding the features of computation that optimize device utilization and throughput. Unfortunately, applications of this technique have utilized small data sets and narrowly specific feature extraction, limiting the impact of their contributions. To address this problem I present Dash-Database: a repository of C and C++ programs for software-defined radio applications and neighboring fields; a methodology for structuring the features of computation using kernels; and a set of evaluation metrics to standardize computation data sets. Dash-Database contributes a general data set that supports machine understanding of computation and standardizes the input corpus utilized for machine learning of computation, where currently only a small set of benchmarks and features is being used. I present an evaluation of Dash-Database using three novel metrics: breadth, depth, and richness, and compare it to a data set largely representative of those used in prior work, finding a 5x increase in breadth, a 40x increase in depth, and a rich set of sample features. Using Dash-Database, the broader community can work toward a general machine understanding of computation that can automate the design of workloads for domain-specific computation. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2020
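As an illustration of how corpus-level metrics like the breadth and depth named above might be computed, the sketch below counts kernel classes and samples per class. These definitions are hypothetical stand-ins; the thesis defines its three metrics (breadth, depth, and richness) precisely.

```python
from collections import Counter

def corpus_stats(samples):
    """Hypothetical corpus statistics: breadth as the number of distinct
    kernel classes represented, depth as the average number of samples per
    represented class. `samples` is assumed to be a list of records such as
    {"kernel_class": "fft", "features": {...}}."""
    per_class = Counter(s["kernel_class"] for s in samples)
    breadth = len(per_class)
    depth = sum(per_class.values()) / breadth if breadth else 0.0
    return {"breadth": breadth, "depth": depth, "per_class": dict(per_class)}

print(corpus_stats([{"kernel_class": "fft"}, {"kernel_class": "fft"},
                    {"kernel_class": "fir_filter"}]))
```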
130

FPGA-based high-performance neural network acceleration

Geng, Tong 19 January 2021 (has links)
In the last ten years, Artificial Intelligence through Deep Neural Networks (DNNs) has penetrated virtually every aspect of science, technology, and business. Advances are rapid, with thousands of papers published annually. Many types of DNNs have been and continue to be developed -- in this thesis, we address Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Graph Neural Networks (GNNs) -- each with a different set of target applications and implementation challenges. The overall problem for all of these Neural Networks (NNs) is that their target applications generally pose stringent constraints on latency and throughput but also have strict accuracy requirements. Much research has therefore gone into all aspects of improving NN quality and performance: algorithms, code optimization, acceleration with GPUs, and acceleration with hardware, both dedicated ASICs and off-the-shelf FPGAs. In this thesis, we concentrate on the last of these approaches. There have been many previous efforts to create hardware that accelerates NNs. The problem designers face is that optimal NN models typically have significant irregularities, making them hardware-unfriendly. One commonly used approach is to train NN models to follow regular computation and data patterns. This approach, however, can hurt the models' accuracy or lead to models with non-negligible redundancies. This dissertation takes a different approach: instead of regularizing the model, we create architectures friendly to irregular models. Our thesis is that high-accuracy and high-performance NN inference and training can be achieved by creating a series of novel irregularity-aware architectures for Field-Programmable Gate Arrays (FPGAs). In four different studies on four different NN types, we find that this approach results in speedups of 2.1x to 3255x compared with carefully selected prior art; for inference, there is no change in accuracy. The bulk of this dissertation revolves around these studies, the various workload-balancing techniques, and the resulting NN acceleration architectures. In particular, we propose four different architectures to handle, respectively, data-structure-level, operation-level, bit-level, and model-level irregularities. At the data structure level, we propose AWB-GCN, which uses runtime workload rebalancing to handle Sparse Matrix Multiplication (SpMM) on extremely sparse and unbalanced input. With GNN inference as a case study, AWB-GCN achieves over 90% system efficiency, guarantees efficient off-chip memory access, and provides considerable speedups over CPUs (3255x), GPUs (80x), and a prior ASIC accelerator (5.1x). At the operation level, we propose O3BNN-R, which can detect redundant operations and prune them at run time, even when they are highly data-dependent and unpredictable. With Binarized NNs (BNNs) as a case study, O3BNN-R can prune over 30% of the operations without any accuracy loss, yielding speedups over state-of-the-art implementations on CPUs (1122x), GPUs (2.3x), and FPGAs (2.1x). At the bit level, we propose CQNN, which embeds a Coarse-Grained Reconfigurable Architecture (CGRA) that can be programmed at runtime to support NN functions with various data-width requirements. Results show that CQNN can deliver microsecond-level Quantized NN (QNN) inference. At the model level, we propose FPDeep, targeted especially at training. To address model-level irregularity, FPDeep uses a novel model partitioning scheme to balance workload and storage among nodes. By using a hybrid of model and layer parallelism to train DNNs, FPDeep avoids the large gap that commonly occurs between training and testing accuracy due to improper convergence to sharp minimizers (caused by large training batches). Results show that FPDeep provides scalable, fast, and accurate training and delivers 6.6x higher energy efficiency than GPUs.
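The operation-level pruning described above for binarized networks can be illustrated with a scalar sketch: a binarized neuron's output is decided by comparing a +1/-1 dot product with a threshold, so accumulation can stop as soon as the remaining terms cannot change the outcome. The code shows only this pruning principle in software; it is not the O3BNN-R hardware design.

```python
def bnn_neuron_early_exit(w_bits, x_bits, threshold):
    """Evaluate whether sum_i (+1 if w_i == x_i else -1) exceeds a threshold
    for a binarized neuron, stopping early once the comparison is already
    decided. Such data-dependent operation pruning is the principle
    illustrated here."""
    acc, n = 0, len(w_bits)
    for i, (w, x) in enumerate(zip(w_bits, x_bits)):
        acc += 1 if w == x else -1           # XNOR contribution in {+1, -1}
        remaining = n - i - 1
        if acc - remaining > threshold:       # guaranteed to exceed threshold
            return True
        if acc + remaining <= threshold:      # guaranteed not to exceed it
            return False
    return acc > threshold

# Example: the outcome is decided after only six of the eight terms.
print(bnn_neuron_early_exit([1, 1, 0, 1, 0, 0, 1, 1],
                            [1, 1, 0, 1, 1, 0, 1, 1], threshold=0))
```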
