Orchestrating thread scheduling and cache management to improve memory system throughput in throughput processors / Li, Dong, active 21st century. 10 July 2014
Throughput processors such as GPUs continue to provide higher peak arithmetic capability. Designing a high-throughput memory system to keep the computational units busy is very challenging. Future throughput processors must continue to exploit data locality and use the on-chip and off-chip resources of the memory system more effectively to further improve memory system throughput. This dissertation advocates orchestrating the thread scheduler with the cache management algorithms to alleviate GPU cache thrashing and pollution, avoid bandwidth saturation, and maximize GPU memory system throughput. Based on this principle, this thesis work proposes three mechanisms to improve cache efficiency and memory throughput. First, this thesis work enhances the thread throttling mechanism with the Priority-based Cache Allocation mechanism (PCAL). By estimating the cache miss ratio with a variable number of cache-feeding threads and monitoring the usage of key memory system resources, PCAL determines the number of threads that share the cache and the minimum number of threads bypassing the cache that saturate memory system resources. This approach reduces the cache thrashing problem and effectively employs chip resources that would otherwise go unused by a pure thread throttling approach. We observe a 67% improvement over the original as-is benchmarks and an 18% improvement over a better-tuned warp-throttling baseline. Second, this work proposes the AgeLRU and Dynamic-AgeLRU mechanisms to address the inter-thread cache thrashing problem. AgeLRU prioritizes cache blocks at replacement based on the scheduling priority of their fetching warp. Dynamic-AgeLRU selects between the AgeLRU algorithm and the LRU algorithm adaptively to avoid degrading the performance of non-thrashing applications. There are three variants of the AgeLRU algorithm: (1) replacement-only, (2) bypassing, and (3) bypassing with traffic optimization. Compared to the LRU algorithm, these three variants improve performance by 4%, 8%, and 28%, respectively, across a set of cache-sensitive benchmarks. Third, this thesis work develops the Reuse-Prediction-based cache Replacement scheme (RPR) for the GPU L1 data cache to address the intra-thread cache pollution problem. By combining the GPU thread scheduling priority with the fetching Program Counter (PC) to generate a signature that indexes the prediction table, RPR identifies and prioritizes near-reuse and high-reuse blocks to maximize cache efficiency. Compared to the AgeLRU algorithm, the experimental results show that the RPR algorithm yields a throughput improvement of 5% on average for regular applications and a speedup of 3.2% on average across a set of cache-sensitive benchmarks. The techniques proposed in this dissertation alleviate the cache thrashing, cache pollution, and resource saturation problems effectively. We believe that, when combined, these techniques will synergistically further improve GPU cache efficiency and overall memory system throughput.
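The replacement-only idea behind AgeLRU can be illustrated with a minimal sketch. The abstract states only that blocks are prioritized by the scheduling priority of their fetching warp; the data layout, the encoding of priority (higher value = scheduled first), the choice to evict blocks from lower-priority warps first, and the LRU tie-break are all assumptions made here for illustration, not the dissertation's implementation.

```python
from dataclasses import dataclass


@dataclass
class CacheBlock:
    tag: int
    # Scheduling priority of the warp that fetched this block.
    # Assumed encoding: a higher value means the warp is scheduled first.
    warp_priority: int
    # Logical timestamp of the most recent access to this block.
    last_use: int


def lru_victim(cache_set):
    """Plain LRU: evict the least recently used block in the set."""
    return min(range(len(cache_set)), key=lambda i: cache_set[i].last_use)


def age_lru_victim(cache_set):
    """Assumed AgeLRU-style policy: evict a block fetched by the
    lowest-priority warp; break ties by least recent use."""
    return min(
        range(len(cache_set)),
        key=lambda i: (cache_set[i].warp_priority, cache_set[i].last_use),
    )


# A hypothetical 4-way set used only to contrast the two policies.
example_set = [
    CacheBlock(tag=0xA, warp_priority=3, last_use=10),
    CacheBlock(tag=0xB, warp_priority=1, last_use=40),
    CacheBlock(tag=0xC, warp_priority=1, last_use=25),
    CacheBlock(tag=0xD, warp_priority=2, last_use=5),
]
```

On this hypothetical set, plain LRU picks the block with the oldest access regardless of which warp fetched it, while the priority-aware policy protects blocks belonging to higher-priority warps.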
2011 August 1900
Chip Multi-Processor (CMP) architectures have become mainstream in processor design. With a large number of cores, a Network-on-Chip (NoC) provides a scalable communication method for CMPs. The NoC must be carefully designed to provide low latency and high throughput in a resource-constrained environment. To improve network throughput, we propose the Very Long Packet Window (VLPW) architecture for NoC router design, which tries to close the throughput gap between state-of-the-art on-chip routers and the ideal interconnect fabric. VLPW does so by optimizing Switch Allocation (SA) efficiency. Existing SA normally applies Round-Robin scheduling to arbitrate among the packets targeting the same output port. However, this simple approach suffers from low arbitration efficiency and yields low network throughput. Instead of relying solely on simple switch scheduling, the VLPW router design globally schedules all the input packets, resolves the output conflicts, and achieves high throughput. With the VLPW architecture, we propose two scheduling schemes: Global Fairness and Global Diversity. Our simulation results show that the VLPW router achieves more than 20% throughput improvement without negative effects on zero-load latency.
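For reference, the baseline Round-Robin arbitration that the abstract contrasts with VLPW can be sketched as follows. This is a generic round-robin arbiter for a single output port, with a hypothetical class name and interface; it is not the VLPW scheduler, whose details the abstract does not give.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter for one router output port."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        # Input port that gets first consideration in the next round.
        self.next_start = 0

    def grant(self, requests):
        """Grant one requesting input port.

        requests: list of booleans, one per input port, True if that
        port holds a packet targeting this output port.
        Returns the granted port index, or None if nobody requests.
        """
        for offset in range(self.num_inputs):
            port = (self.next_start + offset) % self.num_inputs
            if requests[port]:
                # The port after the winner gets first pick next time.
                self.next_start = (port + 1) % self.num_inputs
                return port
        return None
```

Each output port arbitrates in isolation here, which is the local decision-making the abstract identifies as a source of low arbitration efficiency.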
15 February 2011
Cognitive radio networks aim to enhance frequency reuse by allowing unlicensed users to utilize the frequency bands of licensed users when these bands are not in use. The unlicensed users are called secondary users and the licensed users are called primary users. To enhance frequency reuse, the secondary users must monitor the spectrum continuously to avoid interfering with the primary users, and once the primary users are found to be active, the secondary users are required to vacate the frequency bands. Spectrum sensing therefore plays an important role in cognitive radio networks. Two probabilities are associated with spectrum sensing: the probability of detection and the probability of false alarm. The higher the probability of detection, the better the primary users are protected. From the secondary users' perspective, however, the lower the probability of false alarm, the more often the frequency bands can be reused when they are available, and thus the higher the achievable throughput of the secondary network. In this thesis, we study the problem of designing the sensing duration to maximize the achievable throughput of the secondary network under the constraint that the primary user is sufficiently protected. We formulate the sensing-throughput tradeoff problem mathematically and use the energy detection (ED) sensing scheme to prove that the formulated problem indeed has one optimal sensing time that yields the highest throughput for the secondary network. We also discuss the case of two secondary users using the concept of cooperative systems.
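The tradeoff described above can be explored numerically with a small sketch. It assumes a closed form for the energy detector's false-alarm probability at a fixed target detection probability that is commonly used in the sensing-throughput literature; whether the thesis uses exactly this expression is an assumption, and the function names and parameter values below are illustrative only.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x)."""
    return 1.0 - _STD_NORMAL.cdf(x)


def q_inv(p):
    """Inverse of Q: the x such that Q(x) = p."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def false_alarm_prob(tau, fs, snr, target_pd):
    """Assumed energy-detection false-alarm probability when the
    threshold is set to meet target_pd after sensing for tau seconds
    at sampling rate fs, with primary-signal SNR snr (linear)."""
    return q_func(
        math.sqrt(2.0 * snr + 1.0) * q_inv(target_pd)
        + math.sqrt(tau * fs) * snr
    )


def normalized_throughput(tau, frame, fs, snr, target_pd):
    """Fraction of the frame left for data, scaled by the chance
    that an idle band is not missed because of a false alarm."""
    pf = false_alarm_prob(tau, fs, snr, target_pd)
    return (frame - tau) / frame * (1.0 - pf)


def best_sensing_time(frame, fs, snr, target_pd, step=1e-4):
    """Grid search over sensing durations strictly inside the frame."""
    n_steps = int(round(frame / step))
    candidates = [k * step for k in range(1, n_steps)]
    return max(
        candidates,
        key=lambda tau: normalized_throughput(tau, frame, fs, snr, target_pd),
    )
```

With illustrative values such as a 100 ms frame, a 6 MHz sampling rate, an SNR of -15 dB, and a target detection probability of 0.9, the search returns a sensing time strictly inside the frame: sensing too briefly leaves the false-alarm probability high, while sensing too long leaves little time to transmit.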
Cell-based High Throughput Screening (HTS) has become a very important method in pharmaceutical drug discovery and is presently carried out using robots and well plates. A microfluidics-based device for cell-based HTS that uses a traditional cell culture protocol would be a significant addition to the field. This thesis reports novel microfluidic HTS devices for cell-based assays that use traditional non-compartmentalized agar gel as the cell culture medium and provide electric control over drug dose. The basic design of the device consists of a gel layer supported by a nanoporous membrane that is bonded to microchannels underneath it. The pores of the membrane are blocked everywhere except in selected regions that serve as fluidic interfaces between the microchannel below and the gel above. Upon application of an electric field, the nanopores start to act as electrokinetic pumps. By selectively switching an array of such micropumps, a number of spots containing drug molecules are created simultaneously in the gel layer. The drugs reach the top surface of the gel, where cells are to be grown, by diffusion. Based on this principle, a number of different devices were fabricated using microfabrication technology: a single drug spot forming device, a multiple drug spot forming device, and a device forming a microarray of drug spots. By controlling the pumping potential and duration, spots ranging from 200 μm to 6 mm in diameter with inter-spot distances of 0.4 mm to 10 mm have been created. The absence of diffusional transport through the nanoporous interfaces without an electric field is demonstrated. A number of representative molecules, including surrogate drug molecules (trypan blue and methylene blue) and biomolecules (DNA and protein), were selected for demonstration purposes. A dosing range of 50-3000 μg and a spot density of 156 spots/cm² were achieved. The drug spot density was found to be limited by molecular diffusion in the gel, and hence a numerical study was carried out to find ways to increase the density. Based on this simulation, a method for reducing diffusion, called a diffusion barrier, was proposed. The diffusion barrier uses a specially dimensioned gel sheet (having shallow grooves) to reduce diffusion. / Thesis / Master of Applied Science (MASc)
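Untersuchungen im Rahmen einer Konzeption und Entwicklung eines neuen biohybriden Mikrosystems für den Einsatz im pharmazeutischen "Screening" [Investigations toward the design and development of a new biohybrid microsystem for use in pharmaceutical "screening"] / Thielecke, Hagen. January 2003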
Univ., Diss.--Saarbrücken, 2002.
Düsseldorf, Universität, Diss., 2003.
Estimation of QoE aware sustainable throughput in relation to TCP throughput to evaluate the user experience / Routhu, Venkata Sai Kalyan. January 2018
Throughput Measurements and Empirical Prediction Models for IEEE 802.11b Wireless LAN (WLAN) Installations / Henty, Benjamin E. 19 August 2001
Typically, a wireless LAN infrastructure is designed and installed by networking professionals. These individuals are extremely familiar with wired networks but are often unfamiliar with wireless networks. Thus, wireless LAN installations are currently handicapped by the lack of an accurate performance prediction model that is intuitive for non-wireless professionals to use. To address this problem, this thesis presents a method of predicting the expected wireless LAN throughput using a site-specific model of an indoor environment. To develop this throughput prediction model, two wireless LAN throughput measurement products, LANFielder and SiteSpy, were created. These two products, which are patent pending, allow site-specific network performance measurements to be made. The two software packages were used to conduct an extensive measurement campaign to evaluate the performance of two IEEE 802.11b access points (APs) under ideal, multiuser, and interference scenarios. The data from this measurement campaign were then used to create empirically based throughput prediction models. The resulting models were first developed using RSSI measurements and then confirmed using predicted signal strength parameters. / Master of Science
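To make the idea of an empirically based throughput model concrete, the sketch below fits a straight line to (RSSI, throughput) pairs by ordinary least squares and caps the prediction at a maximum rate. The linear form, the cap, and the function names are assumptions for illustration; the abstract does not state the functional form of the thesis's models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_throughput(rssi_dbm, slope, intercept, max_mbps):
    """Predict throughput from RSSI, clipped to [0, max_mbps].

    The clipping reflects the assumption that throughput cannot be
    negative and saturates at some maximum rate for the link.
    """
    raw = slope * rssi_dbm + intercept
    return min(max(raw, 0.0), max_mbps)
```

In use, `fit_line` would be given RSSI readings in dBm and the throughput measured at the same locations, and `predict_throughput` would then be evaluated at RSSI values predicted for unmeasured locations.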
Britton, Jennifer Kathleen Susan
No description available.
The Secondary Users' Throughput Maximization in Cognitive Radio System Under Channel Capacity Constraint / Chang, Chih-Kai. 04 August 2010
In a cognitive radio (CR) network, maximizing the throughput of the secondary users (SUs) is generally desired. In this thesis, we investigate and formulate the problem of maximizing the secondary users' throughput in cognitive radio systems under a channel capacity constraint. Using the KKT theorem, an objective function is developed to obtain an optimal solution to the SU throughput maximization problem. A numerical example is also presented for illustration. The most important result of the example is that the maximum SU throughput is achieved when an optimal number of SU pairs cooperate, rather than all of the SU pairs.