Global ETD Search

1	AN OPEN, SCALABLE APPROACH TO EFFICIENT DATA PROCESSING Kilpatrick, Stephen, Westhart, Philip M, Abbott, Ben A. 11 1900 (has links) The growth of network-based systems in flight test will present performance problems within the community. Legacy instrumentation systems are not capable of meeting the high-bandwidth, low latency data processing requirements of these next generation data acquisition systems. Ongoing research at Southwest Research Institute is exploring the use of a variety of commodity components, such as Graphics Processing Units (GPUs) and multicore Central Processing Units (CPUs), in ways that can be applied to both the small embedded components as well as the larger ground systems. This paper will explore an open, scalable Commercial-Off-The-Shelf (COTS) approach to bridge the gap and minimize changes to the legacy systems. Current results from this approach will be presented at the conference. IP networking iNET data processing commodity hardware COTS GPU
2	Designing Practical Software Bug Detectors Using Commodity Hardware and Common Programming Patterns Zhang, Tong 13 January 2020 (has links) Software bugs can cost millions and affect people's daily lives. However, many bug detection tools are not always practical in reality, which hinders their wide adoption. There are three main concerns regarding existing bug detectors: 1) run-time overhead in dynamic bug detectors, 2) space overhead in dynamic bug detectors, and 3) scalability and precision issues in static bug detectors. With those in mind, we propose to: 1) leverage commodity hardware to reduce run-time overhead, 2) reuse metadata maintained by one bug detector to detect other types of bugs, reducing space overhead, and 3) apply programming idioms to static analyses, improving scalability and precision. We demonstrate the effectiveness of three approaches using data race bugs, memory safety bugs, and permission check bugs, respectively. First, we leverage the commodity hardware transactional memory (HTM) selectively to use the dynamic data race detector only if necessary, thereby reducing the overhead from 11.68x to 4.65x. We then present a production-ready data race detector, which only incurs a 2.6% run-time overhead, by using performance monitoring units (PMUs) for online memory access sampling and offline unsampled memory access reconstruction. Second, for memory safety bugs, which are more common than data races, we provide practical temporal memory safety on top of the spatial memory safety of the Intel MPX in a memory-efficient manner without additional hardware support. We achieve this by reusing the existing metadata and checks already available in the Intel MPX-instrumented applications, thereby offering full memory safety at only 36% memory overhead. Finally, we design a scalable and precise function pointer analysis tool leveraging indirect call usage patterns in the Linux kernel. We applied the tool to the detection of permission check bugs; the detector found 14 previously unknown bugs within a limited time budget. / Doctor of Philosophy / Software bugs have caused many real-world problems, e.g., the 2003 Northeast blackout and the Facebook stock price mismatch. Finding bugs is critical to solving those problems. Unfortunately, many existing bug detectors suffer from high run-time and space overheads as well as scalability and precision issues. In this dissertation, we address the limitations of bug detectors by leveraging commodity hardware and common programming patterns. Particularly, we focus on improving the run-time overhead of dynamic data race detectors, the space overhead of a memory safety bug detector, and the scalability and precision of the Linux kernel permission check bug detector. We first present a data race detector built upon commodity hardware transactional memory that can achieve 7x overhead reduction compared to the state-of-the-art solution (Google's TSAN). We then present a very lightweight sampling-based data race detector which re-purposes performance monitoring hardware features for lightweight sampling and uses a novel offline analysis for better race detection capability. Our result highlights very low overhead (2.6%) with 27.5% detection probability with a sampling period of 10,000. Next, we present a space-efficient temporal memory safety bug detector for a hardware spatial memory safety bug detector, without additional hardware support. According to experimental results, our full memory safety solution incurs only a 36% memory overhead with a 60% run-time overhead. Finally, we present a permission check bug detector for the Linux kernel. This bug detector leverages indirect call usage patterns in the Linux kernel for scalable and precise analysis. As a result, within a limited time budget (scalable), the detector discovered 14 previously unknown bugs (precise). Software Bug Detection Compilers Commodity Hardware Data Race Detection Memory Safety Permission Check Placement Analysis
3	Toward Highly-efficient GPU-centric Networking / Mot Högeffektiva GPU-centrerade Nätverk Girondi, Massimo January 2024 (has links) Graphics Processing Units (GPUs) are emerging as the most popular accelerator for many applications, powering the core of Machine Learning applications and many computing-intensive workloads. GPUs have typically been consideredas accelerators, with Central Processing Units (CPUs) in charge of the mainapplication logic, data movement, and network connectivity. In these architectures,input and output data of network-based GPU-accelerated application typically traverse the CPU, and the Operating System network stack multiple times, getting copied across the system main memory. These increase application latency and require expensive CPU cycles, reducing the power efficiency of systems, and increasing the overall response times. These inefficiencies become of higher importance in latency-bounded deployments, or with high throughput, where copy times could easily inflate the response time of modern GPUs. The main contribution of this dissertation is towards a GPU-centric network architecture, allowing GPUs to initiate network transfers without the intervention of CPUs. We focus on commodity hardware, using NVIDIA GPUs and Remote Direct Memory Access over Converged Ethernet (RoCE) to realize this architecture, removing the need of highly homogeneous clusters and ad-hoc designed network architecture, as it is required by many other similar approaches. By porting some rdma-core posting routines to GPU runtime, we can saturate a 100-Gbps link without any CPU cycle, reducing the overall system response time, while increasing the power efficiency and improving the application throughput.The second contribution concerns the analysis of Clockwork, a State-of-The-Art inference serving system, showing the limitations imposed by controller-centric, CPU-mediated architectures. We then propose an alternative architecture to this system based on an RDMA transport, and we study some performance gains that such a system would introduce. An integral component of an inference system is to account and track user flows,and distribute them across multiple worker nodes. Our third contribution aims to understand the challenges of Connection Tracking applications running at 100Gbps, in the context of a Stateful Load Balancer running on commodity hardware. / <p>QC 20240315</p> Low-Latency Internet Services Packet Processing Network Functions Virtualization Middle Boxes Commodity Hardware Multi-Hundred-Gigabit-Per-Second Low-Level Optimization Graphics Processing Units Inference Serving Remote Direct Memory Access Internettjänster med Låg Fördröjning Paketbearbetning Virtualisering av Nätverksfunktioner Mellanutrustning Tillgänglig Datorhårdvara Flera-Hundra- Gigabit-Per-Sekund Lågnivå-Optimering Grafikprocessor Inferensserving Remote Direct Memory Access Communication Systems Kommunikationssystem Computer Systems Datorsystem

1

Page generated in 0.0706 seconds