451 |
ParModelica: Extending the Algorithmic Subset of Modelica with Explicit Parallel Language Constructs for Multi-core Simulation
Gebremedhin, Mahder, January 2011
In today's world of high-tech manufacturing and computer-aided design, simulation of models is at the heart of the whole manufacturing process. Trying to represent and study the variables of real-world models using simulation computer programs can turn out to be a very expensive and time-consuming task. On the other hand, advancements in modern multi-core CPUs and general-purpose GPUs promise remarkable computational power. Properly utilizing this computational power can reduce simulation time. To this end, modern modeling environments provide different optimization and parallelization options to take advantage of the available computational power. Some of these parallelization approaches are based on automatically extracting parallelism with the help of a compiler. Another approach is to provide model programmers with the language constructs necessary to express any potential parallelism in their models. This second approach is taken in this thesis work. The OpenModelica modeling and simulation environment for the Modelica language has been extended with new language constructs for explicitly stating parallelism in algorithms. This slightly extended algorithmic subset of Modelica is called ParModelica. The new extensions allow models written in ParModelica to be translated to optimized OpenCL code that can take advantage of the computational power of available multi-core CPUs and general-purpose GPUs.
|
452 |
Rendering for Microlithography on GPU Hardware
Iwaniec, Michel, January 2008
Over the last decades, integrated circuits have changed our everyday lives in a number of ways. Many common devices taken for granted today would not have been possible without this industrial revolution. Central to the manufacturing of integrated circuits is the photomask used to expose the wafers. Such photomasks are also used for the manufacture of flat-screen displays. Microlithography, the manufacturing technique for these photomasks, requires complex electronic equipment that excels in both speed and fidelity. Manufacturing such equipment requires competence in virtually all engineering disciplines, of which the conversion of geometry into pixels is but one. Nevertheless, this single step in the photomask drawing process has a major impact on the throughput and quality of a photomask writer. Current high-end semiconductor writers from Micronic use a cluster of Field-Programmable Gate Array (FPGA) circuits. FPGAs have for many years been able to replace Application-Specific Integrated Circuits due to their flexibility and low initial development cost. For parallel computation, an FPGA can achieve throughput not possible with microprocessors alone. Nevertheless, high-performance FPGAs are expensive devices, and upgrading from one generation to the next often requires a major redesign. During the last decade, the computer games industry has taken the lead in parallel computation with graphics cards for 3D gaming. While essentially designed to render 3D polygons and lacking the flexibility of an FPGA, graphics cards have nevertheless started to rival FPGAs as the main workhorse of many parallel computing applications. This thesis covers an investigation into utilizing graphics cards for the task of rendering geometry into photomask patterns. It describes the different strategies that were tried, the throughput and fidelity achieved with them, and the problems encountered.
It also describes the development of a suitable evaluation framework that was critical to the process.
|
453 |
An Embedded Shading Language
Qin, Zheng, January 2004
Modern graphics accelerators have embedded programmable components in the form of vertex and fragment shading units. Current APIs permit specification of the programs for these components using an assembly-language-level interface. Compilers for high-level shading languages are available, but these read in an external string specification, which can be inconvenient.
It is possible, using standard C++, to define an embedded high-level shading language. Such a language can be nearly indistinguishable from a special-purpose shading language, yet permits more direct interaction with the specification of textures and parameters, simplifies implementation, and enables on-the-fly generation, manipulation, and specification of shader programs. An embedded shading language also permits the lifting of C++ host language type, modularity, and scoping constructs into the shading language without any additional implementation effort.
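The embedding idea can be sketched in miniature. The following Python fragment is an illustrative analogue only (the thesis embeds its language in C++; all class and variable names here are invented): host-language operator overloading records an expression tree instead of computing values, so ordinary-looking arithmetic builds a shader program that can be compiled for the GPU later.

```python
# Minimal sketch of an embedded shading language: overloaded operators
# build an expression tree rather than evaluating immediately.

class Expr:
    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Expr("add", self, other)

    def __mul__(self, other):
        return Expr("mul", self, other)

    def emit(self):
        """Flatten the recorded tree into assembly-like text."""
        if self.op == "var":
            return self.args[0]
        parts = [a.emit() for a in self.args]
        return f"{self.op}({', '.join(parts)})"

def var(name):
    return Expr("var", name)

# Ordinary arithmetic on placeholder variables records a program.
light, normal, albedo = var("light"), var("normal"), var("albedo")
shaded = albedo * (light + normal)
print(shaded.emit())  # mul(albedo, add(light, normal))
```

Because `light`, `normal`, and `albedo` are ordinary host-language objects, host constructs such as functions, classes, and scoping lift into the embedded language with no extra implementation effort.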
|
454 |
Rendering Antialiased Shadows using Warped Variance Shadow Maps
Lauritzen, Andrew Timothy, January 2008
Shadows contribute significantly to the perceived realism of an image, and provide an important depth cue. Rendering high quality, antialiased shadows efficiently is a difficult problem. To antialias shadows, it is necessary to compute partial visibilities, but computing these visibilities using existing approaches is often too slow for interactive applications.
Shadow maps are a widely used technique for real-time shadow rendering. One major drawback of shadow maps is aliasing, because the shadow map data cannot be filtered in the same way as colour textures.
In this thesis, I present variance shadow maps (VSMs). Variance shadow maps use a linear representation of the depth distributions in the shadow map, which enables the use of standard linear texture filtering algorithms. Thus VSMs can address the problem of shadow aliasing using the same highly-tuned mechanisms that are available for colour images. Given the mean and variance of the depth distribution, Chebyshev's inequality provides an upper bound on the fraction of a shaded fragment that is occluded, and I show that this bound often provides a good approximation to the true partial occlusion.
For more difficult cases, I show that warping the depth distribution can produce multiple bounds, some tighter than others. Based on this insight, I present layered variance shadow maps, a scalable generalization of variance shadow maps that partitions the depth distribution into multiple segments. This reduces or eliminates an artifact known as "light bleeding" that can appear when using the simpler version of variance shadow maps. Additionally, I demonstrate exponential variance shadow maps, which combine moments computed from two exponentially warped depth distributions. Using this approach, high-quality results are produced at a fraction of the storage cost of layered variance shadow maps.
These algorithms are easy to implement on current graphics hardware and provide efficient, scalable solutions to the problem of shadow map aliasing.
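The Chebyshev step described above is compact enough to sketch. The following Python fragment is an illustrative sketch only (real implementations run this per fragment in a shader): from the two stored moments E[x] and E[x²] of the filtered depth distribution, the one-sided Chebyshev (Cantelli) inequality bounds the fraction of the distribution lying at depth ≥ t, i.e. the fraction of the filter region that does not occlude a receiver at depth t.

```python
# Variance shadow map visibility test: upper bound on P(x >= t)
# from the first two moments of the depth distribution.

def vsm_visibility(m1, m2, t, min_variance=1e-6):
    """m1 = E[x], m2 = E[x^2]; t = receiver depth. Returns visibility bound."""
    if t <= m1:
        return 1.0  # receiver is in front of the mean occluder depth
    variance = max(m2 - m1 * m1, min_variance)  # clamp for numeric stability
    d = t - m1
    return variance / (variance + d * d)        # Cantelli's inequality
```

For a texel whose occluders all sit at depth 0.3 (variance near zero), a receiver at depth 0.5 gets a visibility bound near zero (fully shadowed), while any receiver at depth ≤ 0.3 is fully lit; the bound degrades gracefully as the filtered variance grows.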
|
455 |
Automatic Parallelization for Graphics Processing Units in JikesRVM
Leung, Alan Chun Wai, January 2008
Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw performance. However, GPUs are currently used only for a narrow class of special-purpose applications; the raw processing power available in a typical desktop PC is unused most of the time. The goal of this work is to present an extension to JikesRVM that automatically executes suitable code on the GPU instead of the CPU. Both static and dynamic features are used to decide whether it is feasible and beneficial to offload a piece of code to the GPU. Feasible code is discovered by an implementation of data dependence analysis. A cost model that balances the speedup available from the GPU against the cost of transferring input and output data between main memory and GPU memory is used to determine whether a feasible parallelization is indeed beneficial. The cost model is parameterized so that it can be applied to different hardware combinations. We also present ways to overcome several obstacles to parallelization inherent in the design of the Java bytecode language: unstructured control flow, the lack of multi-dimensional arrays, the precise exception semantics, and the proliferation of indirect references.
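A cost model of the kind described might be sketched as follows. All parameter names and the specific functional form here are illustrative assumptions, not JikesRVM's actual model: offloading pays off only when the estimated GPU kernel time plus host-to-device and device-to-host transfer time beats the estimated CPU time.

```python
# Hedged sketch of a parameterized GPU offload cost model: the constants
# would be measured per hardware combination, not hard-coded.

def should_offload(n_elements, bytes_per_element,
                   cpu_time_per_element, gpu_time_per_element,
                   transfer_bandwidth, launch_overhead):
    """Return True when offloading to the GPU is estimated to be beneficial."""
    cpu_time = n_elements * cpu_time_per_element
    # Input must be copied to GPU memory and output copied back.
    transfer_time = 2 * n_elements * bytes_per_element / transfer_bandwidth
    gpu_time = launch_overhead + n_elements * gpu_time_per_element
    return gpu_time + transfer_time < cpu_time

# Large arrays amortize the transfer and launch costs; small ones do not.
big = should_offload(1_000_000, 8, 1e-7, 1e-9, 1e10, 1e-4)    # True
small = should_offload(100, 8, 1e-7, 1e-9, 1e10, 1e-4)        # False
```

The break-even element count shifts with each hardware combination, which is why such a model must stay parameterized rather than baked in.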
|
456 |
A Study of Efficiency, Accuracy, and Robustness in Intensity-Based Rigid Image Registration
Xu, Lin, January 2008
Image registration is widely used in many different areas today, and applications are typically concerned with the efficiency, accuracy, and robustness of the registration process. This thesis studies these issues by presenting an efficient intensity-based mono-modality rigid 2D-3D image registration method and constructing a novel mathematical model for intensity-based multi-modality rigid image registration.
For mono-modality image registration, an algorithm is developed using the RapidMind Multi-core Development Platform (RapidMind) to exploit the highly parallel multi-core architecture of graphics processing units (GPUs). A parallel ray casting algorithm is used to generate the digitally reconstructed radiographs (DRRs), efficiently reducing the complexity of DRR construction. The optimization problem in the registration process is solved by the Gauss-Newton method. To fully exploit the multi-core parallelism, almost the entire registration process is implemented in parallel by RapidMind on GPUs. The implementation of the major computation steps is discussed. Numerical results are presented to demonstrate the efficiency of the new method.
For multi-modality image registration, a new model for computing mutual information functions is devised in order to remove the artifacts in the functions and in turn smooth the functions so that optimization methods can converge to the optimal solutions accurately and efficiently. Motivated by the objective of harmonizing the discrepancy between the image representation and the mutual information definition in previous models, the new model computes the mutual information function using both the continuous image function representation and the mutual information definition for continuous random variables. Its implementation and complexity are discussed and compared with other models. The mutual information computed using the new model appears quite smooth compared with the functions computed by other models. Numerical experiments demonstrate the accuracy and efficiency of optimization methods when the new model is used. Furthermore, the robustness of the new model is also verified.
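For contrast with the continuous formulation, the baseline discrete computation of mutual information from a joint intensity histogram, whose binning is one source of the artifacts discussed above, can be sketched in Python. This is the textbook definition, not the thesis's new model.

```python
from math import log2

# Discrete mutual information from a 2D joint intensity histogram:
# MI = sum over bins of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).

def mutual_information(joint_counts):
    """MI in bits from a 2D list of joint intensity-pair counts."""
    total = sum(sum(row) for row in joint_counts)
    px = [sum(row) / total for row in joint_counts]          # marginal of X
    py = [sum(col) / total for col in zip(*joint_counts)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, c in enumerate(row):
            if c > 0:
                pxy = c / total
                mi += pxy * log2(pxy / (px[i] * py[j]))
    return mi

# Perfectly aligned identical images give a diagonal joint histogram,
# so MI equals the marginal entropy; independent intensities give 0.
print(mutual_information([[5, 0], [0, 5]]))   # 1.0
print(mutual_information([[25, 25], [25, 25]]))  # 0.0
```

Because this estimate changes in small jumps as samples move between bins under interpolation, the resulting MI function of the transformation parameters is not smooth, which is exactly the difficulty the continuous model targets.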
|
457 |
Towards High Speed Aerial Tracking of Agile Targets
Rizwan, Yassir, January 2011
In order to provide a novel perspective for videography of high-speed sporting events, a highly capable trajectory tracking control methodology is developed for a custom-designed Kadet Senior Unmanned Aerial Vehicle (UAV). The accompanying high-fidelity system identification ensures that accurate flight models are used to design the control laws. A parallel vision-based target tracking technique, implemented on a Graphics Processing Unit (GPU), is also demonstrated to assist in real-time tracking of the target.
Nonlinear control techniques like feedback linearization require a detailed and accurate system model. This thesis discusses techniques for estimating these models using data collected during planned test flights. A class of methods known as Output Error Methods is discussed, with extensions for dealing with wind turbulence. The implementation of these methods on the Kadet Senior, including data acquisition details, is also discussed, and results for this UAV are provided. For comparison, additional results using data from a BAC-221 simulation are provided, along with typical results from the work done at the Dryden Flight Research Center.
The proposed controller combines feedback linearization with linear tracking control using the internal model approach, and relies on a trajectory-generating exosystem. Three different aircraft models of increasing complexity are presented, in an effort to identify the simplest controller that yields acceptable performance. The dynamic inversion and linear tracking control laws are derived for each model, and simulation results are presented for tracking of elliptical and periodic trajectories on the Kadet Senior.
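The dynamic inversion idea behind such a controller can be illustrated on a scalar system, a one-dimensional analogue only (the thesis treats multivariable aircraft models, and every function and constant below is invented for illustration): for x' = f(x) + g(x)·u, choosing u = (v − f(x))/g(x) cancels the nonlinearity, leaving the linear system x' = v, to which a simple tracking law is then applied.

```python
import math

# Scalar feedback linearization sketch: invert the plant dynamics, then
# close the loop with a linear tracking law toward a reference r(t).

def simulate(f, g, x0, ref, k, dt=1e-3, steps=5000):
    x = x0
    for i in range(steps):
        t = i * dt
        v = k * (ref(t) - x)          # linear law (r' = 0: constant reference)
        u = (v - f(x)) / g(x)         # dynamic inversion cancels f and g
        x += dt * (f(x) + g(x) * u)   # Euler integration of the true plant
    return x

# Nonlinear plant x' = -x^3 + (2 + cos(x)) u, tracking the constant r = 1.
final = simulate(lambda x: -x**3, lambda x: 2 + math.cos(x),
                 x0=0.0, ref=lambda t: 1.0, k=5.0)
print(round(final, 3))  # 1.0
```

After inversion the closed loop is exactly x' = k(r − x), so the state converges exponentially to the reference regardless of the plant nonlinearity, provided the model used for inversion is accurate, which is why the system identification above matters.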
|
458 |
Design of a Multi-Core Multi-thread Floating-Point Processor and Its Application in Computer Graphics
Yeh, Chia-Yu, 6 September 2011
Graphics processing unit (GPU) designs usually adopt various computer architecture techniques to boost computation speed, including single-instruction multiple-data (SIMD), very-long-instruction-word (VLIW), multi-threading, and/or multi-core. In OpenGL ES 2.0, the user-programmable vertex shader (VS) hardware unit can be designed using a vectored SIMD computation unit so that it can efficiently compute matrix-vector multiplication, one of the key operations in vertex transformation. Recently, high-performance GPUs, such as the Tesla series from NVIDIA, have been designed with many-core architectures in which each core is responsible for scalar operations. The intention is to allow for efficient execution of general-purpose computations in addition to the specialized graphics computations. In this thesis, we present a scalar-based multi-threaded GPU design that is composed of four scalar processors and one special-function unit, and can execute multi-threaded instructions. We use the example of vertex transformation to demonstrate the execution efficiency of the scalar-based multi-threaded GPU, and we compare it with a vector-based SIMD GPU.
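The vertex-transformation workload referred to above is a 4x4 matrix times a homogeneous vertex. The plain Python sketch below shows the arithmetic itself (illustration only): on a vectored design each row dot product maps to one vec4 SIMD instruction, while a scalar many-core design decomposes the same work into independent multiply-add instructions scheduled across threads.

```python
# Vertex transformation: 4x4 model-view-projection matrix times a vec4.

def transform_vertex(m, v):
    """Return M * v for a row-major 4x4 matrix (nested lists) and a vec4."""
    return [sum(m[row][k] * v[k] for k in range(4)) for row in range(4)]

# Translation by (1, 2, 3) applied to the homogeneous point (1, 1, 1, 1).
M = [[1, 0, 0, 1],
     [0, 1, 0, 2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]
print(transform_vertex(M, [1, 1, 1, 1]))  # [2, 3, 4, 1]
```

Either hardware organization performs the same sixteen multiplies and twelve adds per vertex; the designs differ in how those operations are grouped into instructions and threads.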
|
459 |
GPU-Based Digital Coherent Receiver for Optical Transmission Systems
Hsiao, Hsiang-Hung, 18 July 2012
Coherent optical fiber communication technology is attracting significant attention worldwide because it can realize spectrally efficient transmission systems.
One major difference between the coherent technology of the 1980s and the latest coherent technology is the use of digital signal processing (DSP). In the 1980s, an optical phase-locked loop (OPLL) was required to realize homodyne detection, and it was significantly difficult to build. The latest coherent technology uses DSP in place of the OPLL to realize homodyne detection, which is much easier to implement.
The real-time realization of the DSP is still a problem. Because the DSP processes the signal in software, it needs extreme calculation power for a high-speed communication system. Field-programmable gate arrays (FPGAs) are commonly used to realize real-time DSP, but the cost of an FPGA is too high for commercial systems at this moment.
This master's thesis intends to use a commercially available personal computer (PC) containing a GPU board to replace the FPGA, which can reduce the cost of the coherent receiver. Moreover, this receiver is defined by software rather than hardware, meaning that a flexible, software-defined receiver can be realized.
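One representative DSP stage that such a software receiver performs is carrier phase estimation. The Python sketch below illustrates Viterbi-Viterbi phase estimation for QPSK as an assumption-laden example (the actual GPU receiver involves many more stages, such as equalization and frequency offset compensation): raising each sample to the fourth power strips the data modulation, leaving four times the carrier phase offset.

```python
import cmath

# Viterbi-Viterbi carrier phase estimation for QPSK: the estimate is
# only defined modulo pi/2 (the constellation's rotational symmetry).

def estimate_phase_offset(samples):
    acc = sum(s ** 4 for s in samples)   # 4th power removes QPSK modulation
    return cmath.phase(acc) / 4.0

# QPSK symbols at phases k*pi/2, rotated by a 0.2 rad carrier offset.
qpsk = [cmath.exp(1j * k * cmath.pi / 2) for k in range(4)]
received = [s * cmath.exp(1j * 0.2) for s in qpsk]
offset = estimate_phase_offset(received)
print(round(offset, 3))  # 0.2
```

Running stages like this as software on a GPU, instead of fixed FPGA logic, is what makes the receiver reconfigurable after deployment.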
|
460 |
Real-time Water Waves with Wave Particles
Yuksel, Cem, August 2010
This dissertation describes the wave particles technique for simulating water surface waves and two-way fluid-object interactions for real-time applications, such as video games. Water exists in various forms in our environment, and it is important to develop the technologies necessary to incorporate all these forms in real-time virtual environments. Handling the behavior of large bodies of water, such as an ocean, lake, or pool, has been computationally expensive with traditional techniques even for offline graphics applications, because of the high resolution requirements of these simulations. A significant portion of water behavior for large bodies of water is the surface wave phenomenon. This dissertation discusses how water surface waves can be simulated efficiently and effectively at real-time frame rates using a simple particle system that we call "wave particles." This approach offers a simple, fast, and unconditionally stable solution to wave simulation. Unlike traditional techniques that try to simulate the water body (or its surface) as a whole with numerical techniques, wave particles merely track the deviations of the surface due to waves, forming an analytical solution. This allows simulation of seemingly infinite water surfaces, like an open ocean. Both the theory and implementation of wave particles are discussed in great detail. Two-way interactions of floating objects with water are explained, including the generation of waves due to object interaction and proper simulation of the effect of water on object motion. Timing studies show that the method is scalable, allowing simulation of wave interaction with several hundred objects at real-time rates.
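The core evaluation step can be sketched as follows, in a one-dimensional, simplified form (the exact waveform, particle motion, and subdivision rules follow the dissertation; the bump shape and constants here are illustrative): the surface height at a point is not solved over a grid but evaluated analytically as the superposition of local deviations carried by moving particles.

```python
import math

# Wave-particle sketch: each particle carries a smooth, compactly
# supported bump that travels with it; the surface height anywhere is
# the sum of the bumps of nearby particles.

def surface_height(x, t, particles):
    """Sum each particle's local deviation at position x and time t."""
    h = 0.0
    for x0, speed, amplitude, radius in particles:
        center = x0 + speed * t          # particle moves at the wave speed
        d = abs(x - center)
        if d < radius:                   # compact support: purely local
            h += 0.5 * amplitude * (1.0 + math.cos(math.pi * d / radius))
    return h

# One particle launched at x = 0, moving right at 2 units/s.
particles = [(0.0, 2.0, 0.4, 1.0)]
print(surface_height(2.0, 1.0, particles))  # 0.4 (peak of the bump)
print(surface_height(3.5, 1.0, particles))  # 0.0 (outside its support)
```

Because each particle's contribution is closed-form and local, the cost depends on the number of active particles rather than on the extent of the surface, which is what makes seemingly infinite water surfaces affordable.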
|