Global ETD Search

1	Extracting Data-Level Parallelism from Sequential Programs for SIMD Execution
Baumstark, Lewis Benton, Jr. (29 October 2004)
The goal of this research is to retarget multimedia programs written in sequential languages (e.g., C) to architectures with data-parallel execution capabilities. Image processing algorithms often have high potential for data-level parallelism. However, artifacts imposed by the sequential programming language (e.g., loops, pointer variables) can obscure that parallelism and prevent the generation of efficient parallel code. This research presents a program representation and recognition approach that generates a data-parallel program specification from sequential source code and retargets it to data-parallel execution mechanisms. The representation is based on an extension of the multi-dimensional synchronous dataflow model of computation. A partial recognition approach identifies and transforms only those program elements that hinder parallelization, leaving other computational elements intact. This allows flexibility in the types of programs that can be retargeted while avoiding the complexity of complete program recognition. The representation and recognition process is implemented in the PARRET system, which is used to extract the high-level specification of a set of image-processing programs. From this specification, code is generated for Intel's SSE2 instruction set and for the SIMPil processor. The results demonstrate that, given sufficient parallel resources, PARRET can exploit the maximum available parallelism in the retargeted applications. The results also show that PARRET can exploit parallelism on architectures with hardware-limited parallel resources.
Because reverse engineering and retargeting are expensive, it is desirable to estimate the potential parallelism before undertaking them. The goal is to narrow the search space to a small set of loops that are likely to be data-parallel. This work therefore also presents a hybrid static/dynamic approach, called DLPEST, for estimating the data-level parallelism in sequential program loops. We demonstrate the correctness of DLPEST's estimates and show that estimates for programs of 25 to 5,000 lines of code can be computed in under 10 minutes, with estimation time scaling sub-linearly with input program size.
Keywords: Program recognition; Data-level parallelization; SIMD processors; Reengineering
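The abstract does not spell out how DLPEST's dynamic component works. As a rough illustration only, the sketch below shows one common dynamic check a hybrid estimator could use: replay a loop's per-iteration memory reads and writes, then flag any cross-iteration dependence. The trace format and the names `has_cross_iteration_dependence`, `trace_elementwise` and `trace_recurrence` are hypothetical and are not taken from the dissertation.

```python
from typing import Iterable, Iterator, Set, Tuple

Addr = Tuple[str, int]                # (array name, index) stands in for an address
Access = Tuple[Set[Addr], Set[Addr]]  # (reads, writes) of one loop iteration


def has_cross_iteration_dependence(trace: Iterable[Access]) -> bool:
    """True if any iteration conflicts with an EARLIER iteration:
    flow (read-after-write), anti (write-after-read) or output (write-after-write)."""
    written: Set[Addr] = set()
    read: Set[Addr] = set()
    for reads, writes in trace:
        if reads & written or writes & written or writes & read:
            return True
        # record this iteration only after checking, so a[i] = a[i] + 1 is not flagged
        written |= writes
        read |= reads
    return False


def trace_elementwise(n: int) -> Iterator[Access]:
    # hypothetical loop: for i in 0..n-1: a[i] = b[i] + c[i]
    for i in range(n):
        yield {("b", i), ("c", i)}, {("a", i)}


def trace_recurrence(n: int) -> Iterator[Access]:
    # hypothetical loop: for i in 1..n-1: a[i] = a[i-1] + b[i]
    for i in range(1, n):
        yield {("a", i - 1), ("b", i)}, {("a", i)}


print(has_cross_iteration_dependence(trace_elementwise(8)))  # False
print(has_cross_iteration_dependence(trace_recurrence(8)))   # True
```

A dynamic trace covers only the inputs that were actually executed. A negative result therefore suggests, rather than proves, that a loop is data-parallel, which is presumably one motivation for combining such a check with static analysis.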
2	Unstructured Computations on Emerging Architectures
Al Farhan, Mohammed (05 May 2019)
This dissertation describes detailed performance engineering and optimization of an unstructured computational aerodynamics software system with irregular memory accesses. The work targets a range of emerging multi- and many-core high-performance computing architectures. These architectures are expected to be the building blocks of energy-austere exascale systems, and algorithm- and architecture-oriented optimizations are essential for achieving good performance on them. We investigate several state-of-the-practice shared-memory optimization techniques applied to key kernels for the important problem class of unstructured meshes. We illustrate these techniques on a broad spectrum of emerging microprocessor architectures that represent the compute units of today's leading supercomputers. In doing so, we identify and address performance challenges without compromising the floating-point numerics of the original code. The linear algebra kernels are bottlenecked by memory bandwidth even for modest numbers of hardware cores sharing a common address space. The edge-based loop kernels, by contrast, are compute-intensive and effectively exploit contemporary multi- and many-core processing hardware. These kernels arise in the control-volume discretization of the conservation-law residuals and in forming the preconditioner for the Jacobian by finite-differencing those residuals. We therefore apply low- and high-level algorithm- and architecture-specific code optimizations and tuning that target thread- and data-level parallelism, with a focus on strong thread scaling at the node level. Our approaches rely on novel multi-level hierarchical mechanisms for distributing data across compute units (from the address space down to the registers) within every hardware core.
We analyze the aerodynamics application on specific computing architectures to develop performance metrics and models that indicate upper and lower bounds on performance. We achieve significant full-application speedups over the baseline code on successive generations of many-core processors: 5.0x on Intel Xeon Phi Knights Corner and 2.9x on Knights Landing. In addition, Knights Landing outperforms Intel Xeon Skylake by nearly a factor of two while consuming significantly less power. These optimizations are expected to benefit many other unstructured-mesh scientific applications based on partial differential equations as multi- and many-core architectures evolve.
Keywords: Performance Optimizations; Thread-level parallelism; Data-level parallelism; Unstructured Grids; Computational Aerodynamics; Intel Xeon Phi
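The sketch below makes the abstract's "edge-based loop kernels" and their irregular memory accesses concrete. It is a minimal illustration, not the dissertation's code. A toy flux (`u[j] - u[i]`) is computed per mesh edge and scattered into both endpoint nodes. Two edges that share a node would write the same residual entry, so the sketch also groups edges with a greedy edge coloring, a widely used way to make each group safe to run in parallel. The dissertation describes its own hierarchical workload distribution instead; coloring is swapped in here only as a simple, well-known alternative. All names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def edge_residual(u: Sequence[float], edges: Sequence[Edge]) -> List[float]:
    """Sequential edge loop: scatter a toy flux into both endpoints of every edge."""
    res = [0.0] * len(u)
    for i, j in edges:
        flux = u[j] - u[i]  # toy flux; a real solver evaluates a numerical flux here
        res[i] += flux      # indirect writes through the edge list (irregular access)
        res[j] -= flux
    return res


def greedy_edge_coloring(edges: Sequence[Edge]) -> List[List[Edge]]:
    """Group edges so that no two edges in the same group share a node."""
    groups: List[List[Edge]] = []
    touched: List[Set[int]] = []  # nodes already used by each group
    for i, j in edges:
        for group, nodes in zip(groups, touched):
            if i not in nodes and j not in nodes:
                group.append((i, j))
                nodes.update((i, j))
                break
        else:  # no compatible group found: open a new one
            groups.append([(i, j)])
            touched.append({i, j})
    return groups


def edge_residual_colored(u: Sequence[float], groups: List[List[Edge]]) -> List[float]:
    """Same computation, group by group. Edges inside one group write disjoint
    entries of res, so that inner loop could be split across threads or SIMD lanes."""
    res = [0.0] * len(u)
    for group in groups:  # groups still run one after another
        for i, j in group:
            flux = u[j] - u[i]
            res[i] += flux
            res[j] -= flux
    return res


edges = [(0, 1), (1, 2), (2, 0), (1, 3), (2, 3)]
u = [0.0, 1.0, 3.0, 6.0]
groups = greedy_edge_coloring(edges)
print(len(groups))                                                  # 3
print(edge_residual(u, edges) == edge_residual_colored(u, groups))  # True
```

With general floating-point data, reordering the edges can change the last bits of each sum. The test values here are small integers, so both orders agree exactly.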
3	Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler
Sjöblom, William (January 2020)
This thesis presents a technique for automatically vectorizing innermost single-statement loops that carry a cross-iteration data dependence. The technique analyzes data flow to recognize frequently recurring program idioms. Recognition works by matching the circular SSA data flow around the loop body's φ-function against several primitive patterns. The matches form a tree representation of the relevant data flow, which is then pruned down to a single parameterized node. This node is a high-level specification of the data-flow idiom at hand, and it guides algorithmic replacement applied to the intermediate representation. The versatility of the technique is shown by an implementation that supports vectorization of both a limited class of linear recurrences and prefix sums. The prefix-sum case shows how the technique generalizes to intermediate representations with memory state in SSA form. Finally, a thorough performance evaluation shows the effectiveness of the vectorization technique.
Keywords: compiler; vectorization; SIMD; Java; HotSpot; code optimization; reductions; prefix sums; parallel programming; data-level parallelism; Computer Sciences
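The abstract names prefix sums as one of the idioms the technique vectorizes. The lane-simulation sketch below shows, under stated assumptions, why such a loop can still be vectorized despite its cross-iteration dependence. Each block of `width` elements is scanned with log2(width) shift-and-add steps, using a hypothetical vector width of 4, and a running carry links consecutive blocks. This is one common vectorization strategy for prefix sums, not necessarily the replacement the thesis generates inside C2. The function names are hypothetical.

```python
from typing import List


def prefix_sum_scalar(a: List[int]) -> List[int]:
    """Reference loop: out[i] depends on out[i-1] (cross-iteration dependence)."""
    out = list(a)
    for i in range(1, len(out)):
        out[i] += out[i - 1]
    return out


def prefix_sum_blocked(a: List[int], width: int = 4) -> List[int]:
    """Simulated SIMD scan: an in-register log-step scan per block plus a carry."""
    out: List[int] = []
    carry = 0
    for start in range(0, len(a), width):
        v = a[start:start + width]  # one simulated vector register (tail may be shorter)
        shift = 1
        while shift < len(v):
            # add a copy of v shifted right by `shift` lanes (zeros shifted in);
            # the comprehension reads the old v before it is rebound
            v = [v[k] + (v[k - shift] if k >= shift else 0) for k in range(len(v))]
            shift *= 2
        v = [x + carry for x in v]  # broadcast-add the total of all earlier blocks
        carry = v[-1]
        out.extend(v)
    return out


print(prefix_sum_blocked([1, 2, 3, 4, 5, 6]))  # [1, 3, 6, 10, 15, 21]
```

Only the carry crosses block boundaries. The serial dependence therefore shrinks from one per element to one per block, while the work inside each block maps onto whole-vector operations.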
4	Applying Aspect-Oriented Techniques to Develop Configurable Fine-Grained Access Control for Web Applications
林經緯 (Lin, Ching Wei) (Unknown Date)
Access control is a core issue in securing Web applications. Code that enforces access control usually has to be embedded in every module of an application system. Because it is cross-cutting in nature, similar code tends to recur throughout the system, and code serving different requirements becomes tangled together. Academia and industry have therefore proposed many configurable access control mechanisms to address this problem. However, these mechanisms focus on function-level access control. They offer no configurable approach for finer-grained, data-level access control, which must still be handled programmatically, so the cross-cutting problem remains. Aspect-Oriented Programming (AOP), a recently emerging paradigm, is based on the principle of Separation of Concerns. For cross-cutting requirements such as security, AOP proposes modularizing them as aspects, separate from the existing object or function modules. This lets such code be developed in one place and then integrated into each module of the system according to rules. This research therefore uses AOP techniques to design and implement a configurable, fine-grained access control service and toolset.
Security is attracting more and more concern in the development of Web applications. However, it is not easy to derive a robust security implementation for Web applications. The principal difficulty in designing security features such as access control into an application system is that security is a concern that permeates all the different modules of a system. As a result, security concerns in an application are often implemented with scattered and tangled code, which is not only error-prone but also difficult to verify and maintain. Aspect-Oriented Programming (AOP) is a relatively new design method that allows a programmer to isolate code that crosscuts program modules into a separate module, thus realizing the concept of Separation of Concerns. AOP offers significant advantages over traditional OO techniques in implementing crosscutting concerns such as access control. In this thesis, we define an XML schema for specifying fine-grained access control rules for Web applications in a configuration file and devise an aspect-oriented implementation scheme. Specifically, we develop an aspect synthesis tool that automatically generates concrete access control aspects from access control rules. Once woven into the base application, these aspects enforce proper access control in a highly modular manner. As a result, we obtain a configurable implementation of access control that is both adaptive and effective.
Keywords: Web applications; declarative access control mechanisms; role-based access control; data-level access control; aspect-oriented programming; MVC
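The abstract does not reproduce the thesis's XML schema or its generated aspects. The sketch below illustrates only the general idea under stated assumptions. A hypothetical XML rule file is parsed with Python's standard `xml.etree.ElementTree`. A decorator built from those rules then plays the role of around advice: it denies unauthorized roles (function-level control) and filters returned rows (data-level control) without touching the base function. Python decorators are not an AOP weaver and only stand in for one here; the rule format, attribute names and functions are all hypothetical.

```python
import functools
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical rule file: NOT the thesis's XML schema.
RULES_XML = """
<access-control>
  <rule operation="list_orders" role="customer" filter="owner"/>
  <rule operation="list_orders" role="clerk"/>
</access-control>
"""

Row = Dict[str, object]
Rules = Dict[str, Dict[str, str]]


def load_rules(xml_text: str) -> Rules:
    """Map operation -> {role: row-filter field}; an empty filter means all rows."""
    rules: Rules = {}
    for rule in ET.fromstring(xml_text.strip()).iter("rule"):
        rules.setdefault(rule.get("operation"), {})[rule.get("role")] = rule.get("filter", "")
    return rules


def access_controlled(rules: Rules) -> Callable:
    """Build a decorator that acts as 'around advice' for one operation."""
    def aspect(func: Callable[..., List[Row]]) -> Callable[..., List[Row]]:
        op_rules = rules.get(func.__name__, {})

        @functools.wraps(func)
        def wrapper(user: Dict[str, str], *args, **kwargs) -> List[Row]:
            role = user["role"]
            if role not in op_rules:  # function-level check
                raise PermissionError(f"role {role!r} may not call {func.__name__}")
            rows = func(user, *args, **kwargs)  # proceed to the base code
            field = op_rules[role]
            if not field:
                return rows
            # data-level check: keep only rows whose `field` names the caller
            return [r for r in rows if r.get(field) == user["name"]]
        return wrapper
    return aspect


ORDERS: List[Row] = [{"id": 1, "owner": "alice"}, {"id": 2, "owner": "bob"}]


@access_controlled(load_rules(RULES_XML))
def list_orders(user: Dict[str, str]) -> List[Row]:
    # base application logic: contains no access-control code
    return list(ORDERS)


print(list_orders({"role": "customer", "name": "alice"}))  # [{'id': 1, 'owner': 'alice'}]
print(len(list_orders({"role": "clerk", "name": "carol"})))  # 2
```

Changing who may see which rows then means editing only the rule file, which mirrors the configurability the abstract aims for. A production system would also need authenticated user identities and a real weaving mechanism.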
