Global ETD Search

81	Hybrid Caching for Chip Multiprocessors Using Compiler-Based Data Classification Li, Yong 26 January 2011 The high performance delivered by modern computer systems keeps scaling with an increasing number of processors connected by a distributed on-chip network. As a result, memory access latency, largely dominated by remote data cache access and inter-processor communication, is becoming a critical performance bottleneck. To alleviate this problem, it is necessary to localize data access as much as possible while keeping on-chip cache memory utilization efficient. Achieving this, however, is application dependent and requires keen insight into the memory access characteristics of the applications. This thesis demonstrates how fairly simple, and thus inexpensive, compiler analysis can classify memory accesses into private data accesses and shared data accesses. In addition, we introduce a third classification, named probably private access, and demonstrate the impact of this category compared to the traditional private and shared memory classification. The memory access classification information from the compiler analysis is then provided to the runtime system through a modified memory allocator and page table to facilitate a hybrid private-shared caching technique. The hybrid cache mechanism is aware of the different data access classifications and adopts appropriate placement and search policies accordingly to improve performance. Our analysis demonstrates that many applications have a significant amount of both private and shared data and that compiler analysis can identify the private data effectively for many applications. Experimental results show that the implemented hybrid caching scheme achieves a 4.03% performance improvement over state-of-the-art NUCA-based caching. Computer Engineering
82	Minimum Transmission Power Configuration in Real-Time Wireless Sensor Networks Wang, Xiaodong 01 August 2009 Multi-channel communication can effectively reduce channel competition and interference in a wireless sensor network, and thus achieve increased throughput and improved end-to-end delay guarantees with reduced power consumption. However, existing work relies on only a small number of orthogonal channels, resulting in degraded performance when a large number of data flows need to be transmitted on different channels. In this thesis, empirical studies are conducted to investigate the interference among overlapping channels. The results show that overlapping channels can also be utilized for improved real-time performance if the node transmission power is carefully configured. To minimize the overall power consumption of a network with multiple data flows under end-to-end delay constraints, a constrained optimization problem is formulated to configure the transmission power level of every node and assign overlapping channels to different data flows. Since the optimization problem has exponential computational complexity, a heuristic algorithm based on simulated annealing is then presented to find a suboptimal solution. Extensive empirical results on a 25-mote testbed demonstrate that the proposed algorithm achieves better real-time performance and lower power consumption than two baselines, including a scheme that uses only orthogonal channels. Computer Engineering
83	Optimization of Digital Filter Design Using Hardware Accelerated Simulation Liang, Getao 01 May 2007 The goal of this research was to develop a scheme to optimize a digital filter design using an optimization engine and hardware-accelerated simulation on a Field Programmable Gate Array (FPGA). A parameterizable generic digital filter, fully implemented on a prototyping board with a Xilinx Virtex-II Pro xc2vp30-7-ff896 FPGA, was developed using Xilinx System Generator for DSP. The optimization engine, currently a random candidate generator that will eventually be replaced by a differential evolution engine, was implemented in MATLAB along with a candidate evaluator and other supporting programs. Automatic hardware co-simulations of 100 candidate filters were performed successfully to demonstrate that this approach is feasible, reliable, and efficient for complex systems. Computer Engineering
84	Accelerating the Stochastic Simulation Algorithm Using Emerging Architectures Jenkins, David Dewayne 01 December 2009 In order for scientists to learn more about molecular biology, it is imperative that they have the ability to construct and evaluate models. Model statistics consistent with the chemical master equation can be obtained using Gillespie's stochastic simulation algorithm (SSA). Due to the stochastic nature of the Monte Carlo simulations, large numbers of simulations must be run in order to get accurate statistics for the species populations and reactions. However, the algorithm tends to be computationally heavy and leads to long simulation runtimes for large systems. In this research, the performance of Gillespie's stochastic simulation algorithm is analyzed and optimized using a number of techniques and architectures. These techniques include parallelizing simulations using streaming SIMD extensions (SSE), the Message Passing Interface (MPI) with multicore systems and computer clusters, and CUDA with NVIDIA graphics processing units. This research is an attempt to make the SSA a better option for modeling biological and chemical systems. This work shows that accelerating the algorithm proved beneficial in both the serial and SSE implementations, while the CUDA implementation produced lower-than-expected results. Computer Engineering
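For readers unfamiliar with the algorithm named in entry 84, the following is a minimal Python sketch of Gillespie's direct method. It is not the thesis's implementation: the function name `gillespie_direct`, its parameters, and the single-reaction decay example (A -> B) are assumptions chosen for illustration.

```python
import math
import random


def gillespie_direct(state, stoich, propensities, t_end, rng):
    """Simulate one SSA trajectory and return the final species counts.

    state        -- initial species counts, e.g. [100, 0]
    stoich       -- one state-change vector per reaction
    propensities -- function mapping a state to a list of reaction propensities
    t_end        -- simulated time at which to stop
    rng          -- a random.Random instance (passed in for reproducibility)
    """
    t = 0.0
    state = list(state)
    while True:
        a = propensities(state)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a0.
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_end:
            break
        t += tau
        # Pick reaction j with probability a[j] / a0.
        threshold = rng.random() * a0
        cumulative = 0.0
        chosen = len(a) - 1  # fallback guards against floating-point rounding
        for j, aj in enumerate(a):
            cumulative += aj
            if threshold < cumulative:
                chosen = j
                break
        state = [n + d for n, d in zip(state, stoich[chosen])]
    return state


# Hypothetical example: first-order decay A -> B with rate constant 0.5.
decay = lambda s: [0.5 * s[0]]
print(gillespie_direct([100, 0], [(-1, 1)], decay, 2.0, random.Random(0)))
```

Each call produces a single trajectory, so accurate statistics require many independent runs. Those runs do not depend on one another, which is what makes the algorithm a candidate for the parallel approaches the abstract lists.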
85	Vision-Based Reinforcement Learning Using A Consolidated Actor-Critic Model Niedzwiedz, Christopher Allen 01 December 2009 Vision-based machine learning agents are tasked with making decisions based on high-dimensional, noisy input, placing a heavy load on available resources. Moreover, observations typically provide only partial information with respect to the environment state, necessitating robust state inference by the agent. Reinforcement learning provides a framework for decision making with the goal of maximizing long-term reward. This thesis introduces a novel approach to vision-based reinforcement learning through the use of a consolidated actor-critic model (CACM). The approach takes advantage of artificial neural networks as non-linear function approximators and the reduced computational requirements of the CACM scheme to yield a scalable vision-based control system. In this thesis, a comparison between the actor-critic model and the CACM is made. Additionally, the effect that observation prediction and correlated exploration have on the agent's performance is investigated. Computer Engineering
86	Design of a Highly Portable Data Logging Embedded System for Naturalistic Motorcycle Study Elmehraz, Noureddine 01 January 2013 According to the Motorcycle Industry Council (MIC), the number of motorcycles owned in the USA has increased over the last few years and will most likely keep increasing. However, the number of fatal crashes involving motorcycles is also on the rise. Although the MIC does not explain why the accident rate has increased, inadequate protective gear for motorcyclists may be one of the reasons. The most recent National Highway Traffic Safety Administration (NHTSA) annual report stated that its data analyses are based on experience and best judgment rather than on solid scientific experiments [3]. Thus, building a framework for acquiring data about the motorcyclist's environment is a first step towards decreasing motorcycle crashes. There are a few naturalistic motorcycle studies reported in the literature. A naturalistic motorcycle study also identifies rider behaviors and environmental crash hazards. The primary objective of this thesis work is to design a highly portable data logging embedded system for naturalistic motorcycle study, capable of collecting many types of data such as images, speed, acceleration, time, location, and distance approximation. This thesis work is the first of three phases of a naturalistic motorcycle study project. The second phase is to optimize system area, form factor, and power consumption. The third phase will be concerned with aggressive low-power design and energy harvesting. The proposed embedded system design is based on an Arduino microcontroller. A whole suite of Arduino-based prototype boards, sensor boards, support software, and user forums is available. The system is highly portable, with the capability to store up to eight (8) hours of text/image data during a one-month study period. We have successfully designed and implemented the system and performed three trial runs. The data acquired has been validated and found to be accurate. Computer Engineering
87	Stable Queue Management in TCP/IP Networks Using Feedback Control Theory Pourmohammad, Sajjad 29 August 2015 Traffic management in data communication networks plays a significant role in the performance and reliability of the network. It has been shown that the dynamics of TCP-based communication networks can be highly complicated due to TCP's nonlinear Additive Increase Multiplicative Decrease (AIMD) congestion control mechanism and the stochastic behavior of internet traffic. Early work on flow control over TCP/IP networks suggested deploying end-to-end control mechanisms to avoid congestion. However, higher levels of performance and reliability were achievable only via effective cooperation of the intermediate routers in traffic control. Different control strategies have been discussed for homogeneous networks. However, less attention has been paid to the stability and optimality of the controller for heterogeneous network topologies that include multiple time-varying link delays. In this work, we propose an optimal controller design scheme for heterogeneous networks that preserves closed-loop system stability. Delay-dependent stability conditions of the closed-loop system are derived based on the Lyapunov-Krasovskii method. The proposed approach offers a flexible choice of control parameters, allowing the network administrator to control fairness and response time for each individual node in a network of multiple links with different delay properties. We have also proposed a cross-layer analytical model to estimate Quality of Service (QoS) metrics such as delay, throughput, and jitter in multi-hop wireless ad hoc networks operating on an IEEE 802.11-based MAC with CSMA/CA. The proposed model can be used both for evaluating quality of service and for designing more efficient model-based control and management schemes. The model is developed in a queuing theory paradigm that investigates the stochastic behavior of data transmission in wireless ad hoc networks. An extensive list of key factors has been considered, including network layer processing time, network/MAC layer queuing delay, traffic coming from the application layer, network layer queuing delay, retransmission delays, random back-off times due to the channel contention period, and the time spent on the RTS/CTS access method. The effectiveness of the proposed controller and the model are both analyzed using event-based computer simulations. Computer engineering
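As background for the AIMD mechanism mentioned in entry 87, the sketch below shows the textbook additive-increase, multiplicative-decrease window update in Python. It is illustrative only and is not the controller proposed in the dissertation. The names `aimd_step` and `aimd_trace`, the default parameters, and the loss model (a loss whenever the window exceeds a fixed capacity) are assumptions.

```python
def aimd_step(cwnd, loss, alpha=1.0, beta=0.5, min_cwnd=1.0):
    """Return the next congestion window (in segments) after one round trip.

    Without loss the window grows additively by alpha; on loss it shrinks
    multiplicatively by the factor beta, but never below min_cwnd.
    """
    if loss:
        return max(min_cwnd, cwnd * beta)
    return cwnd + alpha


def aimd_trace(capacity, rounds, cwnd=1.0):
    """Trace the window over several round trips.

    Assumed (hypothetical) loss model: a loss occurs in any round where the
    window is larger than the fixed link capacity.
    """
    trace = []
    for _ in range(rounds):
        loss = cwnd > capacity
        cwnd = aimd_step(cwnd, loss)
        trace.append(cwnd)
    return trace


print(aimd_trace(capacity=8.0, rounds=12))
```

The piecewise update is what makes the dynamics nonlinear: the window follows a sawtooth rather than settling at a fixed point.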
88	The performance of high-order quadrature amplitude modulation schemes for broadband wireless communication systems Riche, Larry 01 July 2012 The limited amount of frequency spectrum available to wireless communication systems makes it difficult to satisfy the rapidly growing demand for wireless service. Spectral efficiency can be increased by using higher-order modulation schemes. However, this comes at the cost of an increased probability of error. In this paper, we investigate, through MATLAB simulation, the implementation of orders of Quadrature Amplitude Modulation (QAM) more commonly used in wired networks. The BER performance of 64, 128, 256, 512, 1024, 2048, 4096, and 8192 QAM signals in the presence of Rayleigh and Rician multipath channels with additive white Gaussian noise is simulated. Computer Engineering
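To make the spectral-efficiency claim in entry 88 concrete: an M-ary QAM symbol carries log2(M) bits, so each doubling of the order adds one bit per symbol. The short Python sketch below tabulates this for the orders listed in the abstract. The function name `bits_per_symbol` is an assumption, and the sketch does not model the error-rate penalty or the multipath channels that the study simulates.

```python
def bits_per_symbol(order):
    """Return log2(order) for a QAM constellation with `order` points.

    The order must be a power of two, as is the case for every
    constellation size listed in the abstract.
    """
    if order < 2 or order & (order - 1):
        raise ValueError("QAM order must be a power of two >= 2")
    return order.bit_length() - 1


for m in (64, 128, 256, 512, 1024, 2048, 4096, 8192):
    print(f"{m}-QAM: {bits_per_symbol(m)} bits per symbol")
```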
89	Fast Time-of-Flight Phase Unwrapping and Scene Segmentation Using Data Driven Scene Priors Crabb, Ryan Eugene 16 January 2016 This thesis concerns full-field time-of-flight depth imaging by way of amplitude-modulated continuous-wave signals correlated with step-shifted reference waveforms using a specialized solid-state CMOS sensor, referred to as a photonic mixing device. The specific focus is the inherent issue of depth ambiguity due to a fundamental property of periodic signals: they repeat, or wrap, after each period, and any signal shifted by a whole number of wavelengths is indistinguishable from the original. Recovering the full extent of the signal’s path is known as phase unwrapping. The commonly accepted solution requires imaging a series of two or more signals with differing modulation frequencies to resolve the ambiguity, and the time delay this introduces results in erroneous or invalid measurements for non-static elements of the scene. This work details a physical model of the observable illumination of the scene, which provides priors for a novel probabilistic framework that recovers the scene geometry by imaging only a single modulated signal. It is demonstrated that this process provides more than adequate results in a majority of representative scenes, and that it can be accomplished on typical computer hardware at a speed that allows the range imaging to be used in real-time, interactive applications. One such real-time application is presented: alpha-matting, or foreground segmentation, for background substitution of live video. This is a generalized version of the common technique of green-screening that is used, for example, by every local weather reporter. The presented method, however, requires no special background and is able to operate on high-resolution video using a lower-resolution depth image. Computer engineering
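The depth ambiguity described in entry 89 can be illustrated numerically. For an amplitude-modulated continuous-wave sensor with modulation frequency f, the light travels to the scene and back, so the measured phase wraps every c / (2f) metres of depth. The Python sketch below lists the depths consistent with one measured phase. The function names and the 30 MHz example frequency are assumptions, and the sketch shows only the ambiguity itself, not the probabilistic unwrapping method of the thesis.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def unambiguous_range(mod_freq_hz):
    """Depth interval after which the measured phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * mod_freq_hz)


def candidate_depths(phase_rad, mod_freq_hz, max_wraps):
    """List every depth consistent with a measured phase.

    Each candidate corresponds to a different whole number of wraps k,
    from 0 up to max_wraps; phase unwrapping means choosing the right k.
    """
    d_max = unambiguous_range(mod_freq_hz)
    base = (phase_rad % (2.0 * math.pi)) / (2.0 * math.pi) * d_max
    return [base + k * d_max for k in range(max_wraps + 1)]


# Hypothetical example: a phase of pi radians measured at 30 MHz.
print(unambiguous_range(30e6))
print(candidate_depths(math.pi, 30e6, max_wraps=2))
```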
90	Hardware Support for Productive Partitioned Global Address Space (PGAS) Programming Serres, Olivier 16 January 2016 In order to exploit the increasing number of transistors, and due to the limitations of frequency scaling, the number of cores inside a chip keeps growing. As many-core chips become ubiquitous, there is a greater need for a more productive and efficient parallel programming model. The easy-to-use, but locality-agnostic, shared memory model (e.g. OpenMP) is unable to efficiently exploit memory locality in systems with Non-Uniform Memory Access (NUMA) and Non-Uniform Cache-Access (NUCA) effects. The locality-aware, but explicit, message-passing model (e.g. MPI1) does not provide a productive development environment due to its two-sided communication and a distributed (and isolated) memory model. The Partitioned Global Address Space (PGAS) programming model strikes a balance between those two extremes via a global address space that is provided for ease of use but is partitioned for locality awareness. The user-friendly PGAS memory model, however, comes at a performance cost due to the required address mapping, which can hinder its performance potential. To mitigate this overhead and achieve full performance, compiler optimizations may be applied, but they are often insufficient. Alternatively, manual optimizations can be applied, but they are quite cumbersome and, as such, unproductive. As a result, the overall benefit of PGAS has been severely limited. In this dissertation, we improved both the productivity and performance of PGAS by introducing novel hardware support. This PGAS hardware support efficiently handles the complex PGAS mapping and communication without the intervention of an application developer. By introducing the new hardware at the micro-architecture level, fine-grained, low-latency local shared memory accesses are supported. The hardware is also made available through an ISA extension, so that it can easily be exploited by PGAS compilers to efficiently access and traverse the PGAS memory space. The automatic code generation eliminates the need for hand-tuning, and thus simultaneously improves both the performance and productivity of PGAS languages. This research also introduces and evaluates the possibility for the hardware support to handle a variety of PGAS languages. Results are obtained on two different system implementations. The first is based on the well-adopted full system simulator Gem5, which allows precise evaluation of the performance gain. Two prototype compilers supporting the new hardware are created for experimentation by extending the Berkeley Unified Parallel C (UPC) compiler and the Cray Chapel compiler. This allows unmodified code to use the new instructions without any user intervention, thereby creating a productive programming environment. The second, proof-of-concept implementation is a hardware prototype based on the multi-core Leon3 softcore processor running on a Virtex-6 FPGA. This allowed us not only to verify the feasibility of the implementation but also to evaluate the cost of the new hardware and its instructions. This research has shown very promising results. With benchmarks in UPC and Chapel, including the NAS Parallel Benchmarks implemented in UPC, a speedup of up to 5.5x is demonstrated when using the hardware support with unmodified codes. Unmodified code performance using this hardware was also shown to surpass the performance of manually optimized UPC code by up to 10% in some cases. With Chapel, we obtained measurable speedups of up to 19x. Additionally, the hardware prototype demonstrated that only a very small area increase is needed. Computer engineering
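To give a sense of the address mapping that entry 90 refers to, the Python sketch below computes where an element of a UPC-style shared array lives under the block-cyclic layout that UPC uses for declarations such as `shared [B] int a[N]`. The abstract does not spell out the mapping, so this is background under stated assumptions rather than the dissertation's hardware design. The function name `upc_locate` and its parameters are hypothetical.

```python
def upc_locate(index, block_size, threads):
    """Map a global array index to (owning thread, index in that thread's local part).

    Assumes a block-cyclic distribution: consecutive blocks of `block_size`
    elements are dealt out to threads 0, 1, ..., threads - 1 in round-robin order.
    """
    block = index // block_size            # which global block holds the element
    thread = block % threads               # blocks are assigned round-robin
    phase = index % block_size             # offset inside the block
    local_block = block // threads         # how many earlier blocks this thread owns
    return thread, local_block * block_size + phase


# Hypothetical example: block size 4 distributed over 3 threads.
for i in (0, 5, 13):
    print(i, upc_locate(i, block_size=4, threads=3))
```

Every shared access needs divisions and modulo operations of this kind when performed in software, which is the per-access overhead that the abstract attributes to the PGAS memory model.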
