21

Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

Fang, Ye 26 January 2017 (has links)
Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising ways of extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle workloads that are not well suited for accelerators. The combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: the multi-core CPU, the massively parallel Xeon Phi, and the NVIDIA GPU; common tuning methods and platform-specific tuning techniques are presented. An analysis then demonstrates the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program. The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work extracts features from the input data set and the target systems, and then uses various regression models to predict the computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the information essential for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the strengths and weaknesses of the different processors complement each other, so that higher performance can be achieved on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption, suggesting that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
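The abstract does not include code; purely as a rough illustration of the regression-based prediction and scheduling idea it describes, the sketch below trains hypothetical per-device runtime models and assigns each task to the device with the lower predicted time. The feature set, model choice, and timing values are assumptions, not GeauxDock's actual implementation.

```python
# Illustrative sketch (not from the dissertation): predict per-device kernel
# runtime with a regression model, then assign each task to the device with
# the lowest predicted time. Features and timings are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: task features (e.g., input size, number of ligand atoms)
# and measured runtimes on each device.
features = np.array([[1e4, 2.0], [5e4, 1.5], [2e5, 3.0], [8e5, 2.5]])
cpu_times = np.array([0.8, 2.9, 13.0, 48.0])
gpu_times = np.array([1.5, 2.0, 4.5, 12.0])

cpu_model = LinearRegression().fit(features, cpu_times)
gpu_model = LinearRegression().fit(features, gpu_times)

def schedule(task_features):
    """Assign each task to the device with the smaller predicted runtime."""
    cpu_pred = cpu_model.predict(task_features)
    gpu_pred = gpu_model.predict(task_features)
    return np.where(gpu_pred < cpu_pred, "gpu", "cpu")

print(schedule(np.array([[3e4, 1.8], [6e5, 2.7]])))
```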
22

A Performance Model and Optimization Strategies for Automatic GPU Code Generation of PDE Systems Described by a Domain-Specific Language

Hu, Yue 23 August 2016 (has links)
Stencil computations are a class of algorithms operating on multi-dimensional arrays, also called grid functions (GFs), which update array elements using their nearest neighbors. This type of computation forms the basis for computer simulations across almost every field of science, such as computational fluid dynamics. Its mostly regular data access patterns make it well suited to the GPU's high computational throughput and memory bandwidth. However, manual GPU programming is time-consuming and error-prone, and it requires in-depth knowledge of GPU architecture and programming. To overcome the difficulties of manual programming, a number of stencil frameworks have been developed to automatically generate GPU code from user-written stencil code, usually expressed in a domain-specific language. Previous stencil frameworks demonstrate the feasibility of this approach, but real stencil applications pose a set of challenges they do not fully address. This dissertation builds on the Chemora stencil framework and aims to better handle real stencil applications, especially large stencil calculations. Such calculations usually consist of dozens of GFs with a variety of stencil patterns, resulting in an extremely large space of code-generation choices. First, we propose an algorithm that maps a calculation into one or more kernels by minimizing off-chip memory accesses while maintaining relatively high thread-level parallelism. Second, we propose an efficiency-based buffering algorithm, which scores a change in buffering strategy for a GF using a performance estimate and resource usage. Let b (here, b = 5) denote the number of buffering strategies the framework supports. With this algorithm, a near-optimal solution can be found in (b-1)N(N+1)/2 steps, instead of b^N steps, for a calculation with N GFs. Third, we wrote a set of microbenchmarks to explore and measure performance-critical GPU microarchitecture features and parameters for better performance modeling. Finally, we propose an analytic performance model to predict the execution time.
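As a concrete illustration of the nearest-neighbor update pattern described above (this is not code generated by Chemora), a minimal 5-point Jacobi stencil on a 2D grid function might look like the following sketch.

```python
# Illustrative example (not from Chemora): a 5-point Jacobi stencil on a
# 2D grid function, the kind of nearest-neighbor update described above.
import numpy as np

def jacobi_step(u):
    """Return one 5-point stencil update of the interior of grid function u."""
    out = u.copy()
    out[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                              u[1:-1, :-2] + u[1:-1, 2:])
    return out

u = np.zeros((64, 64))
u[0, :] = 1.0          # boundary condition on one edge
for _ in range(100):   # repeated application, as in a relaxation/time loop
    u = jacobi_step(u)
```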
23

Enhancing Program Soft Error Resilience through Algorithmic Approaches

Chen, Sui 03 November 2016 (has links)
The rising count and shrinking feature size of transistors within modern computers are making them increasingly vulnerable to various types of soft faults. This problem is especially acute in high-performance computing (HPC) systems used for scientific computing, because these systems include many thousands of compute cores and nodes, all of which may be utilized in a single large-scale run. The increasing vulnerability of HPC applications to errors induced by soft faults is motivating extensive work on techniques to make these applications more resilient to such faults, ranging from generic techniques such as replication or checkpoint/restart to algorithm-specific error detection and tolerance techniques. Effective use of such techniques requires a detailed understanding of how a given application is affected by soft faults, to ensure that (i) efforts to improve application resilience are spent in the code regions most vulnerable to faults, (ii) the appropriate resilience technique is applied to each code region, and (iii) this understanding is obtained in an efficient manner. This thesis presents two tools: FaultTelescope, which helps application developers view routine- and application-level vulnerability to soft errors, and ErrorSight, which helps perform modular fault-characteristic analysis for more complex applications. This thesis also illustrates how these tools can be used in the context of representative applications and kernels. In addition to providing actionable insights into application behavior, the tools automatically select the number of fault injection experiments required to efficiently generate error profiles of an application, ensuring that the information is statistically well-grounded without performing unnecessary experiments.
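The abstract does not say how the experiment count is chosen; as one hedged sketch of the general idea, a tool could keep injecting faults until a confidence interval on the observed failure probability becomes tight enough. The stopping rule and thresholds below are assumptions, not FaultTelescope's or ErrorSight's actual algorithm.

```python
# Illustrative sketch (not the tools' actual algorithm): keep injecting faults
# until the normal-approximation confidence interval on the observed failure
# probability is narrower than a target half-width.
import math
import random

def estimate_failure_rate(run_with_fault, target_half_width=0.02, z=1.96,
                          max_trials=100000):
    """run_with_fault() returns True if the injected fault caused a failure."""
    failures = 0
    n = 0
    while n < max_trials:
        failures += run_with_fault()
        n += 1
        p = failures / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        if n >= 30 and half_width <= target_half_width:
            break
    return p, n

# Hypothetical stand-in for an actual fault-injection run of a code region.
rate, trials = estimate_failure_rate(lambda: random.random() < 0.1)
print(f"estimated failure probability {rate:.3f} after {trials} runs")
```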
24

Automatic Detection, Segmentation and Tracking of Vehicles in Wide-Area Aerial Imagery

Gao, Xin January 2016 (has links)
Object detection is crucial for many research areas in computer vision, image analysis, and pattern recognition. Since vehicles in wide-area images appear with variable shape and size, illumination changes, partial occlusion, and background clutter, automatic detection has often been a challenging task. We present a brief study of various techniques for object detection and image segmentation, and contribute a variety of algorithms for detecting vehicles in traffic lanes from two low-resolution aerial video datasets. We present twelve detection algorithms adapted from previously published work, and we propose two post-processing schemes, in contrast to four existing schemes, to reduce false detections. We present the results of several experiments that quantitatively evaluate combinations of detection algorithms before and after applying a post-processing scheme. Manual segmentation of each vehicle in the cropped frames serves as the ground truth. We classify several types of detections by comparing the binary detection output to the ground truth in each frame, and use two sets of evaluation metrics to measure performance. A pixel classification scheme is also derived for spatial post-processing applied to seven detection algorithms, among which two algorithms are selected for sensitivity analysis with respect to a range of overlap ratios. Six tracking algorithms are selected for analysis of overall accuracy under four different scenarios, using sample frames from the Tucson dataset.
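The abstract does not spell out the two metric sets; purely as an illustration of comparing a binary detection output against manually segmented ground truth, a pixel-level evaluation could be computed as in the sketch below. The metric choice is an assumption, not the thesis's exact evaluation.

```python
# Illustrative example (not the thesis's exact metrics): compare a binary
# detection mask to a manually segmented ground-truth mask at pixel level.
import numpy as np

def pixel_metrics(detection, ground_truth):
    """Return precision, recall, and F1 for two boolean masks of equal shape."""
    tp = np.logical_and(detection, ground_truth).sum()
    fp = np.logical_and(detection, ~ground_truth).sum()
    fn = np.logical_and(~detection, ground_truth).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

det = np.zeros((100, 100), dtype=bool); det[20:40, 30:55] = True
gt = np.zeros((100, 100), dtype=bool);  gt[22:42, 32:57] = True
print(pixel_metrics(det, gt))
```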
25

Analytical Model for Relating FPGA Logic and Routing Architecture Parameters to Post-Routing Wirelength

Soni, Arpit January 2016 (has links)
Analytical models have been introduced for rapidly evaluating the impact of architectural design choices on FPGA performance through model-based trend analysis. Modeling wirelength is a critical problem, since channel width can be expressed as a function of the total net length in a design, which is an indicator of routability for an FPGA. Furthermore, performance indicators such as critical path delay and power consumption are functions of net capacitance, which in turn is a function of net length. The analytical models to date mainly originate from extracting circuit characteristics from the post-placement stage of the CAD flow, which instills a strong binding between the model and the optimization objective of the CAD flow. Furthermore, these models primarily take only logic architecture features into account. In this study, we present a post-routing wirelength model that takes into account both logic and routing architecture parameters, and that does not rely on circuit characteristics extracted from any stage of the FPGA CAD flow. We apply a methodological approach to model parameter tuning, as opposed to relying on a curve-fitting method, and show that our model accurately captures the experimental trends in wirelength with respect to changes in logic and routing architecture parameters individually. We demonstrate that model accuracy is not sacrificed even if the performance objective of the CAD flow changes or the algorithms used by individual stages of the CAD flow (technology mapping, clustering, and routing) change. We swap the training and validation benchmarks, and show that our model development approach is robust and that model accuracy is not sacrificed. We evaluate our model on a new set of benchmarks that are not part of the training and validation benchmarks, and demonstrate its superiority over the state of the art. Based on the swapping-based experiments, we show that the model parameters take values in a fixed range. We verify that this range holds even for benchmarks that are not part of the training and validation benchmarks. We finally show that our model maintains a good estimate of the empirical trends even when very large values are used for the logic block architecture parameter.
26

Knowledge Enhanced Compressive Measurement Design: Detection and Estimation Tasks

Huang, James January 2016 (has links)
Compressive imaging exploits the inherent sparsity/compressibility of natural scenes to reduce the number of measurements required for reliable reconstruction/recovery. In many applications, however, additional scene prior information beyond sparsity (such as natural scene statistics) and task prior information may also be available. While current efforts on compressive measurement design attempt to exploit such scene and task priors in a heuristic/ad hoc manner, in this dissertation we develop a principled information-theoretic approach to this design problem that is able to fully exploit a probabilistic description (i.e., scene prior) of relevant scenes for a given task, along with the appropriate physical design constraints (e.g., photon count/exposure time), towards maximizing the system performance. We apply this information-theoretic framework to optimize compressive measurement designs, in the EO/IR and X-ray spectral bands, for various detection/classification and estimation tasks. More specifically, we consider image reconstruction and target detection/classification tasks, and for each task we develop an information-optimal design framework for both static and adaptive measurements within parallel and sequential measurement architectures. For the image reconstruction task, we show that the information-optimal static compressive measurement design is able to achieve significantly better compression ratios (and also reduced detector count and readout power/bandwidth) relative to various state-of-the-art compressive designs in the literature. Moreover, within a sequential measurement architecture, our information-optimal adaptive design is able to successfully learn scene information online, i.e., from past measurements, and adapt the next measurement (in a greedy sense) towards improving the measurement information efficiency, thereby providing additional performance gains beyond the corresponding static measurement design. We also develop a non-greedy adaptive measurement design framework for a face recognition task that is able to surpass the greedy adaptive design performance by (strategically) maximizing the long-term cumulative system performance over all measurements. Such a non-greedy adaptive design is also able to predict the optimal number of measurements for a fixed system measurement resource (e.g., photon count). Finally, we develop a computationally scalable information-theoretic design framework for an X-ray threat detection task and demonstrate that information-optimized measurements can achieve a 99% threat detection threshold using 4x fewer exposures compared to a conventional system. Equivalently, the false alarm rate of the optimized measurements is reduced by nearly an order of magnitude relative to the conventional measurement design.
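The dissertation's design machinery is not reproduced in the abstract; as a hedged sketch of the greedy information-optimal idea described above, the code below greedily selects measurement vectors in a linear-Gaussian model, where the mutual information has the closed form 0.5 * logdet(I + A P A^T / sigma^2). The candidate pool, scene prior covariance, and noise level are all assumptions for illustration.

```python
# Illustrative sketch (not the dissertation's design code): greedy selection
# of compressive measurement vectors for a linear-Gaussian scene model, where
# each step picks the candidate that most increases the mutual information
# I(x; y) = 0.5 * logdet(I + A P A^T / sigma^2).
import numpy as np

def greedy_measurements(candidates, prior_cov, noise_var, k):
    """Pick k rows from `candidates` (m x n) that greedily maximize MI."""
    chosen = []
    for _ in range(k):
        best_gain, best_idx = -np.inf, None
        for i in range(candidates.shape[0]):
            if i in chosen:
                continue
            A = candidates[chosen + [i]]
            gram = A @ prior_cov @ A.T / noise_var
            gain = 0.5 * np.linalg.slogdet(np.eye(len(chosen) + 1) + gram)[1]
            if gain > best_gain:
                best_gain, best_idx = gain, i
        chosen.append(best_idx)
    return candidates[chosen]

rng = np.random.default_rng(0)
n = 32
prior = np.diag(1.0 / (1.0 + np.arange(n)))   # hypothetical scene prior
cands = rng.standard_normal((200, n))
A_opt = greedy_measurements(cands, prior, noise_var=0.01, k=8)
print(A_opt.shape)   # (8, 32)
```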
27

Improved Subset Generation For The MU-Decoder

Agarwal, Utsav 21 February 2017 (has links)
The MU-Decoder is a hardware subset generator that finds use in partial reconfiguration of FPGAs and in numerous other applications. It is capable of generating a set S of subsets of a large set Z_n with n elements. If the subsets in S satisfy the isomorphic totally-ordered property, then the MU-Decoder works very efficiently, producing a set of u subsets in O(log n) time with Θ(n√u log n) gate cost. In contrast, a naive approach requires Θ(un) gate cost. We show that this low cost for the MU-Decoder can be achieved without the isomorphism constraint, thereby allowing S to include a much wider range of subsets. We also show that if additional constraints can be placed on the relative sizes of the subsets in S, then u subsets can be generated with Θ(n√u) cost. This uses a new hardware enhancement proposed in this thesis. Finally, we show that by properly selecting S and by using some elements of traditional methods, a set of Θ(un^log(log(n/log n))) subsets can be produced with Θ(n√u) cost.
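To make the gate-cost comparison above concrete, the short sketch below plugs a few (n, u) pairs into the quoted Θ-expressions, treating the hidden constants as 1; the numbers only indicate the asymptotic trend and are not actual hardware costs.

```python
# Quick numeric illustration (assumed constant factors of 1) of the gate-cost
# expressions quoted above: Θ(n·sqrt(u)·log n) for the MU-Decoder versus
# Θ(u·n) for a naive subset generator. The advantage grows once u exceeds
# roughly (log n)^2.
import math

for n, u in [(1024, 256), (4096, 1024), (65536, 4096)]:
    mu_decoder = n * math.sqrt(u) * math.log2(n)
    naive = u * n
    print(f"n={n:6d} u={u:5d}  MU-Decoder~{mu_decoder:12.0f}  naive~{naive:12.0f}")
```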
28

Achievable Secrecy Enhancement Through Joint Encryption and Privacy Amplification

Sowti Khiabani, Yahya 18 June 2013 (has links)
In this dissertation we try to achieve secrecy enhancement in communications by resorting to both cryptographic and information-theoretic secrecy tools and metrics. Our objective is to unify tools and measures from the cryptography community with techniques and metrics from the information theory community that are utilized to provide privacy and confidentiality in communication systems. For this purpose we adopt encryption techniques accompanied by privacy amplification tools in order to achieve secrecy goals that are determined based on information-theoretic and cryptographic metrics. Every secrecy scheme relies on a certain advantage for legitimate users over adversaries, viewed as an asymmetry in the system, to deliver the required security for data transmission. In all of the proposed schemes in this dissertation, we resort to either an inherently existing asymmetry in the system or a proactively created advantage for legitimate users over a passive eavesdropper to further enhance the secrecy of the communications. This advantage is manipulated by means of privacy amplification and encryption tools to achieve secrecy goals for the system, evaluated based on information-theoretic and cryptographic metrics. In our first work, discussed in Chapter 2, and the third work, explained in Chapter 4, we rely on a proactively established advantage for legitimate users based on the eavesdropper's lack of knowledge about a shared source of data. Unlike these works, which assume an error-free physical channel, the second work, discussed in Chapter 3, considers a correlated erasure wiretap channel model. This work relies on a passive and internally existing advantage for legitimate users that is built upon the statistical and partial independence of the eavesdropper's channel errors from the errors in the main channel. We arrive at this secrecy advantage for legitimate users by exploiting an authenticated but insecure feedback channel. From the perspective of the utilized tools, the first work, discussed in Chapter 2, considers a specific scenario in which secrecy enhancement of a particular block cipher, the Data Encryption Standard (DES) operating in cipher feedback (CFB) mode, is studied. This secrecy enhancement is achieved by means of deliberate noise injection and wiretap channel encoding as a technique for privacy amplification against a resource-constrained eavesdropper. Compared to the first work, the third work considers a more general framework in terms of both metrics and secrecy tools. This work studies secrecy enhancement of a general cipher based on universal hashing as a privacy amplification technique against an unbounded adversary. In this work, we achieve the goal of exponential secrecy, where the information leakage to the adversary, assessed in terms of mutual information as an information-theoretic measure and Eve's distinguishability as a cryptographic metric, decays at an exponential rate. In the second work, encrypted data frames are transmitted through the Automatic Repeat reQuest (ARQ) protocol to generate a common random source between legitimate users, which is later transformed into information-theoretically secure keys for encryption by means of privacy amplification based on universal hashing. Towards the end, future work extending the research accomplished in this dissertation is outlined. Proofs of major theorems and lemmas are presented in the Appendix.
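Privacy amplification by universal hashing, mentioned throughout, can be illustrated with the standard Carter-Wegman family h(x) = ((a*x + b) mod p) mod 2^k; the sketch below is only a toy illustration of that idea and not the dissertation's construction (the input length, output length, and prime are assumptions).

```python
# Illustrative sketch (not the dissertation's construction): privacy
# amplification with a 2-universal hash h(x) = ((a*x + b) mod p) mod 2^k,
# compressing a shared n-bit string into a shorter key about which an
# eavesdropper with partial knowledge retains negligible information.
import secrets

P = (1 << 521) - 1          # a Mersenne prime larger than any 512-bit input

def privacy_amplify(shared_value: int, out_bits: int, a: int, b: int) -> int:
    """Hash the shared value down to out_bits using publicly chosen (a, b)."""
    return ((a * shared_value + b) % P) % (1 << out_bits)

# Both legitimate parties hold the same 512-bit string (e.g., from ARQ-based
# key generation); (a, b) can be announced over the public channel.
shared = secrets.randbits(512)
a, b = secrets.randbelow(P - 1) + 1, secrets.randbelow(P)
key = privacy_amplify(shared, 128, a, b)
print(f"extracted 128-bit key: {key:032x}")
```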
29

High Capacity Digital Beam Steering Technology

Hebert, Daniel James 19 June 2013 (has links)
A novel method is described in detail for steering light in many directions without moving mechanical parts. The method involves a combination of liquid crystal cells and polarizing beam splitters. The polarization at each beam splitter is controlled by applying a signal to its corresponding liquid crystal cell. A study of light-steering techniques for efficient beam placement, in a line and in a plane, is presented. These techniques permit accurate, non-mechanical beam steering limited by the response time of the liquid crystal cells. A theoretical limit on the number of discrete directions is described and closely approached for a one-dimensional system.
30

Study on Effects of Supply Voltage Asymmetry and Distortion on Induction Machine.

Bhattarai, Prashanna Dev 27 March 2013 (has links)
The performance of an induction motor supplied with asymmetrical and nonsinusoidal voltages is studied in this thesis. The theory of the induction motor is first presented for sinusoidal, symmetrical supply voltages, and equations for the torque, losses, currents, and efficiency are derived. Appropriate changes are then made to apply this theory to induction motors operating with nonsinusoidal and asymmetrical supply voltages. A single-phase equivalent circuit of the induction motor is presented for both asymmetrical and nonsinusoidal supply voltages, along with the equations governing the operating characteristics. Machine torque, losses, current, and efficiency for asymmetrical and nonsinusoidal supply voltages are compared with those for sinusoidal, symmetrical supply voltages. Computer simulation in MATLAB is used to study the impacts of asymmetrical and nonsinusoidal supply voltages on induction machines: machine torque, losses, current, and efficiency are calculated for various levels of voltage asymmetry and distortion, and the results of the simulation are presented.
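The thesis's MATLAB models are not reproduced in the abstract; as a small hedged illustration of the standard first step in analyzing supply-voltage asymmetry, the sketch below (in Python rather than MATLAB) decomposes an asymmetrical three-phase voltage set into symmetrical components. The phase values are hypothetical.

```python
# Illustrative sketch (not the thesis's MATLAB code): decompose an
# asymmetrical three-phase voltage set into symmetrical components, the
# standard first step when analyzing induction-machine behavior under
# voltage asymmetry.
import numpy as np

a = np.exp(2j * np.pi / 3)   # 120-degree rotation operator

def symmetrical_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors."""
    v0 = (va + vb + vc) / 3
    v1 = (va + a * vb + a**2 * vc) / 3
    v2 = (va + a**2 * vb + a * vc) / 3
    return v0, v1, v2

# Hypothetical asymmetrical supply: phase B sags to 90% of nominal.
va = 230 * np.exp(1j * 0)
vb = 0.9 * 230 * np.exp(-1j * 2 * np.pi / 3)
vc = 230 * np.exp(1j * 2 * np.pi / 3)
v0, v1, v2 = symmetrical_components(va, vb, vc)
print(f"voltage unbalance |V2|/|V1| = {abs(v2) / abs(v1):.3%}")
```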
