About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Efficient image/video restyling and collage on GPU. / CUHK electronic theses & dissertations collection

January 2013
創意媒體研究中,圖像/視頻再藝術作為有表現力的用戶定制外觀的創作手段受到了很大關注。交互設計中,特別是在圖像空間只有單張圖像或視頻輸入的情況下,運用計算機輔助設計虛擬地再渲染關注物體的風格化外觀來實現紋理替換是很強大的。現行的紋理替換往往通過操作圖像空間中像素的間距來處理紋理扭曲,原始圖像中潛在的紋理扭曲總是被破壞,因為現行的方法要麼存在由於手動網格拉伸導致的不恰當扭曲,要麼就由於紋理合成而導致不可避免的紋理開裂。圖像/視頻拼貼畫是被發明用以支持在顯示畫布上並行展示多個物體和活動。隨著數字視頻俘獲裝置的快速發展,相關的議題就是快速檢閱和摘要大量的視覺媒體數據集來找出關注的資料。這會是一項繁瑣的任務來審查長且乏味的監控視頻並快速把握重要信息。以關鍵信息和縮短視頻形式為交流媒介,視頻摘要是增強視覺數據集瀏覽效率和簡易理解的手段。 / 本文首先將圖像/視頻再藝術聚焦在高效紋理替換和風格化上。我們展示了一種交互紋理替換方法,能夠在不知潛在幾何結構和光照環境的情況下保持相似的紋理扭曲。我們運用SIFT 棱角特徵來自然地發現潛在紋理扭曲,並應用梯度深度圖復原和皺褶重要性優化來完成扭曲過程。我們運用GPU-CUDA 的並行性,通過實時雙邊網格和特徵導向的扭曲優化來促成交互紋理替換。我們運用基於塊的實時高精度TV-L¹光流,通過基於關鍵幀的紋理傳遞來完成視頻紋理替換。我們進一步研究了基於GPU 的風格化方法,並運用梯度優化保持原始圖像的精細結構。我們提出了一種能夠自然建模原始圖像精細結構的圖像結構圖,並運用基於梯度的切線生成和切線導向的形態學來構建這個結構圖。我們在GPU-CUDA 上通過並行雙邊網格和結構保持促成最終風格化。實驗中,我們的方法實時連續地展現了高質量的圖像/視頻的抽象再藝術。 / 當前,視頻拼貼畫大多創作靜態的基於關鍵幀的拼貼圖片,該結果只包含動態視頻有限的信息,會很大程度影響視覺數據集的理解。爲了便於瀏覽,我們展示了一種在顯示畫布上有效並行摘要動態活動的動態視頻拼貼畫。我們提出應用活動長方體來重組織及提取事件,執行視頻防抖來生成穩定的活動長方體,實行時空域優化來優化活動長方體在三維拼貼空間的位置。我們通過在GPU 上的事件相似性和移動關係優化來完成高效的動態拼貼畫,允許多視頻輸入。擁有再序核函數CUDA 處理,我們的視頻拼貼畫爲便捷瀏覽長視頻激活了動態摘要,節省大量存儲傳輸空間。實驗和調查表明我們的動態拼貼畫快捷有效,能被廣泛應用于視頻摘要。將來,我們會擴展交互紋理替換來支持更複雜的具大運動和遮蔽場景的一般視頻,避免紋理跳動。我們會採用最新視頻技術靈感使視頻紋理替換更加穩定。我們未來關於視頻拼貼畫的工作包括審查監控業中動態拼貼畫應用,並研究含有大量相機運動和不同種視頻過渡的移動相機和一般視頻。 / Image/video restyling, as an expressive way of producing user-customized appearances, has received much attention in creative media research. In interactive design, it would be powerful to virtually re-render the stylized appearance of objects of interest using computer-aided design tools for retexturing, especially in image space with a single image or video as input. Current retexturing methods mostly handle texture distortion by manipulating inter-pixel distances in image space, so the underlying texture distortion is always destroyed, owing to limitations such as improper distortion caused by manual mesh stretching or unavoidable texture splitting caused by texture synthesis. Image/video collage techniques were invented to allow multiple objects and events to be presented in parallel on the display canvas. 
With the rapid development of digital video capture devices, a related issue is how to quickly review and summarize such large visual media datasets to find video material of interest. Examining long, tedious surveillance videos and grasping the essential information quickly is a laborious task. By using key information and shortened video forms as vehicles for communication, video abstraction and summarization enhance the browsing efficiency and ease of understanding of visual media datasets. / In this thesis, we first focus our image/video restyling work on efficient retexturing and stylization. We present an interactive retexturing method that preserves similar texture distortion without knowing the underlying geometry and lighting environment. We use SIFT corner features to naturally discover the underlying texture distortion, and apply gradient depth recovery and wrinkle stress optimization to accomplish the distortion process. We enable interactive retexturing via real-time bilateral grids and feature-guided distortion optimization using GPU-CUDA parallelism. Video retexturing is achieved in real time through a keyframe-based texture transfer strategy using accurate TV-L¹ optical flow with patch motion tracking. Further, we work on GPU-based abstract stylization that preserves the fine structure of the original images using gradient optimization. We propose an image structure map to naturally distill the fine structure of the original images; gradient-based tangent generation and tangent-guided morphology are applied to build it. We perform the final stylization in real time on GPU-CUDA via parallel bilateral grids and structure-aware stylizing. In the experiments, our methods consistently deliver high-quality image/video abstract restyling in real time. 
/ Currently, in video abstraction, video collages are mostly static keyframe-based collage pictures, which contain limited information about the dynamic videos and greatly affect the understanding of visual media datasets. We present a dynamic video collage that effectively summarizes condensed dynamic activities in parallel on the canvas for easy browsing. We propose using activity cuboids to reorganize and extract dynamic objects for collaging, and perform video stabilization to generate stabilized activity cuboids. Spatial-temporal optimization is then carried out to optimize the positions of the activity cuboids in the 3D collage space. We achieve efficient dynamic collage via event similarity and moving relationship optimization on GPU, allowing multiple video inputs. Our video collage approach, with kernel-reordering CUDA processing, provides dynamic summaries for easy browsing of long videos while greatly reducing the memory needed to store and transmit them. Experiments and a user study show the efficiency and usefulness of our dynamic video collage, which can be widely applied to video briefing and summarization. In the future, we will extend interactive retexturing to more complicated general videos with large motion and occluded scenes, avoiding texture flickering. We will also develop new approaches, inspired by the latest video processing techniques, to make video retexturing more stable. Our future work on video collage includes investigating applications of dynamic collage in the surveillance industry, and handling moving-camera and general videos, which may contain large amounts of camera motion and different types of shot transitions. / Detailed summary in vernacular field only. / Li, Ping. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2013. / Includes bibliographical references (leaves 109-121). 
/ Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstracts also in Chinese. / Abstract / Acknowledgements / Chapter 1: Introduction (1.1 Background; 1.2 Main Contributions; 1.3 Thesis Overview) / Chapter 2: Efficient Image/video Retexturing (2.1 Introduction; 2.2 Related Work; 2.3 Image/video Retexturing on GPU: 2.3.1 Wrinkle Stress Optimization, 2.3.2 Efficient Video Retexturing, 2.3.3 Interactive Parallel Retexturing; 2.4 Results and Discussion; 2.5 Chapter Summary) / Chapter 3: Structure-Aware Image Stylization (3.1 Introduction; 3.2 Related Work; 3.3 Structure-Aware Stylization: 3.3.1 Approach Overview, 3.3.2 Gradient-Based Tangent Generation, 3.3.3 Tangent-Guided Image Morphology, 3.3.4 Structure-Aware Optimization, 3.3.5 GPU-Accelerated Stylization; 3.4 Results and Discussion; 3.5 Chapter Summary) / Chapter 4: Dynamic Video Collage (4.1 Introduction; 4.2 Related Work; 4.3 Dynamic Video Collage on GPU: 4.3.1 Activity Cuboid Generation, 4.3.2 Spatial-Temporal Optimization, 4.3.3 GPU-Accelerated Parallel Collage; 4.4 Results and Discussion; 4.5 Chapter Summary) / Chapter 5: Conclusion (5.1 Research Summary; 5.2 Future Work) / Chapter A: Publication List / Bibliography
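The abstract above relies on real-time bilateral grids for edge-aware filtering during retexturing and stylization. As a rough illustration of that data structure (not the thesis's CUDA implementation), the sketch below builds a bilateral grid for a grayscale image: it splats pixels into a coarse (y, x, intensity) grid, blurs the grid, and slices it back. It uses nearest-cell splatting and slicing for brevity, whereas practical implementations typically use trilinear interpolation; the function names and the parameters `s_spatial` and `s_range` are illustrative assumptions.

```python
import numpy as np

def _blur_axis(a, axis):
    """Apply a [1, 2, 1] / 4 kernel along one axis, with zero padding."""
    n = a.shape[axis]
    p = np.pad(a, [(1, 1) if i == axis else (0, 0) for i in range(a.ndim)])

    def window(start):
        return tuple(slice(start, start + n) if i == axis else slice(None)
                     for i in range(a.ndim))

    return (p[window(0)] + 2.0 * p[window(1)] + p[window(2)]) / 4.0

def bilateral_grid_filter(img, s_spatial=8, s_range=0.1):
    """Edge-aware smoothing of a grayscale image in [0, 1] via a bilateral grid.

    Splat: accumulate (value, weight) into a coarse (y, x, intensity) grid.
    Blur:  smooth the grid along all three axes.
    Slice: read back each pixel's cell and divide value by weight.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy = np.rint(ys / s_spatial).astype(int)
    gx = np.rint(xs / s_spatial).astype(int)
    gz = np.rint(img / s_range).astype(int)
    shape = (gy.max() + 1, gx.max() + 1, gz.max() + 1)

    values = np.zeros(shape)
    weights = np.zeros(shape)
    np.add.at(values, (gy, gx, gz), img)    # splat into the nearest cell
    np.add.at(weights, (gy, gx, gz), 1.0)

    for axis in range(3):                   # blur in space and in intensity
        values = _blur_axis(values, axis)
        weights = _blur_axis(weights, axis)

    # slice: each pixel's own cell keeps a positive weight after the blur
    return values[gy, gx, gz] / weights[gy, gx, gz]
```

Because the blur also runs along the intensity axis, pixels whose intensities fall in distant range bins never mix, which is what keeps a sharp step edge intact while flat, noisy regions are smoothed.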
32

GPU-Acceleration of In-Memory Data Analytics

Sitaridi, Evangelia January 2016
Hardware advances strongly influence database system design. The stagnating speed of CPU cores makes many-core accelerators, such as GPUs, a vital alternative to explore for processing ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs, but their cores are simpler; as a result, they do not face the power constraints that limit CPU parallelism. The trade-off, however, is increased implementation complexity. This thesis adapts and redesigns data analytics operators to better exploit the GPU's special memory and threading model. Because of growing memory capacities and users' need for fast interaction with data, we focus on in-memory analytics. Our techniques span different steps of the data processing pipeline: (1) data preprocessing, (2) query compilation, and (3) algorithmic optimization of the operators. Our data preprocessing techniques adapt the data layout of numeric and string columns to maximize the achieved GPU memory bandwidth. Our query compilation techniques compute the optimal execution plan for conjunctive filters. We formulate memory divergence for string matching algorithms and suggest how to eliminate it. Finally, we parallelize the decompression algorithms in our compression framework, Gompresso, to fit more data into the limited GPU memory. Gompresso achieves high speed-ups on GPUs over state-of-the-art multi-core CPU libraries and is suitable for any massively parallel processor.
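To make "optimal execution plan for conjunctive filters" concrete, the sketch below shows the classic cost model for short-circuit evaluation of independent predicates. Each predicate has a per-tuple cost and a selectivity, and sorting by rank = cost / (1 − selectivity) minimizes expected cost. This is a textbook baseline offered as an assumption, not the thesis's GPU planner, which may also weigh effects such as branch divergence that this model ignores. The names `expected_cost` and `rank_order` are hypothetical.

```python
from itertools import permutations

def expected_cost(order):
    """Expected per-tuple cost of evaluating predicates in this order with short-circuiting.

    Each predicate is (cost, selectivity), where selectivity is the fraction of
    tuples that pass; predicates are assumed independent."""
    total, pass_prob = 0.0, 1.0
    for cost, sel in order:
        total += pass_prob * cost   # only tuples that survived so far pay this cost
        pass_prob *= sel
    return total

def rank_order(preds):
    """Sort by cost / (1 - selectivity): cheap, highly selective predicates first."""
    def rank(p):
        cost, sel = p
        return cost / (1.0 - sel) if sel < 1.0 else float("inf")
    return sorted(preds, key=rank)

# Usage: compare the rank ordering against exhaustive search on a small example.
preds = [(1.0, 0.9), (4.0, 0.1), (2.0, 0.5), (3.0, 0.2)]
best = min(permutations(preds), key=expected_cost)
```

Exhaustive search over orderings is exponential, so a closed-form ordering rule like this is what makes plan selection cheap enough to run at query-compilation time.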
33

Colorization in Gabor space and realistic surface rendering on GPUs. / 基於Gabor特徵空間的染色技術與真實感表面GPU繪製 / CUHK electronic theses & dissertations collection / Ji yu Gabor te zheng kong jian de ran se ji shu yu zhen shi gan biao mian GPU hui zhi

January 2011
Based on the construction of a Gabor feature space, which is important for pixel similarity computations, we formalize the space using rotation-invariant Gabor filter banks and apply optimizations in texture feature space. In image colorization, pixels with similar Gabor features receive similar colors, so our approach can colorize natural images globally, without being restricted to disjoint regions with similar texture-like appearances. Our approach supports a two-pass colorization process: coloring optimization in Gabor space and color detailing for progressive effects. We further address video colorization using optimized Gabor flow computation, including keyframe coloring, color propagation by Gabor filtering, and optimized parallel computation over the video. Our video colorization is designed in a spatiotemporal manner to keep temporal coherence, and provides simple closed-form solutions in energy optimization that yield fast colorizations. Moreover, we develop parallel surface texturing of geometric models on GPU, generating spatially varying visual appearances. We incorporate the Gabor feature space into the search of 2D exemplars to determine the k-coherence candidate pixels. Multi-pass correction in synthesis is applied to the local neighborhood for parallel processing. The iso/aniso-scale texture synthesis leverages the strengths of GPU computing to synthesize iso/aniso-scale texture appearances in parallel over arbitrary surfaces. Our experimental results show that our approach produces easily controllable texturing effects in surface synthesis, generating texture-similar and spatially varying visual appearances with GPU-accelerated performance. / Texture feature similarity has long been a crucial topic in VR/graphics applications, such as image and video colorization, surface texture synthesis, and geometry image applications. 
Generally, image features are highly subjective, depending not only on the image pixels but also on the interactive users. Existing colorization and surface texture synthesis methods pay little attention to generating conforming colors/textures that accurately reflect exemplar structures or the user's intention. Realistic surface synthesis remains a challenging task in VR/graphics research. In this dissertation, we focus on encoding Gabor filter banks into texture feature similarity computations and on faithful GPU-parallel surface rendering, including image/video colorization, parallel texturing of geometric surfaces, and multiresolution rendering on sole-cube maps (SCMs). / We further explore GPU-based multiresolution rendering on sole-cube maps (SCMs). Our SCMs on GPU generate adaptive mesh surfaces dynamically and are fully parallelized for large-scale, complex VR environments. We also encapsulate differential coordinates in SCMs, reflecting local geometric characteristics for geometric modeling and interactive animation applications. For future work, we will improve the image/video feature analysis framework in VR/graphics applications. Further work on surface texture synthesis includes interactive control of texture orientations by surface vector fields using sketch editing, so as to widen the gamut of interactive tools available to texturing artists and end users. / Sheng, Bin. / Adviser: Hanqin Sun. / Source: Dissertation Abstracts International, Volume: 73-04, Section: B, page: . / Thesis (Ph.D.)--Chinese University of Hong Kong, 2011. / Includes bibliographical references (leaves 128-142). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. 
Available via World Wide Web. / Abstract also in Chinese.
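The record above builds its feature space from rotation-invariant Gabor filter banks. A common way to obtain rotation invariance, assumed here rather than taken from the dissertation, is to filter at several orientations and keep, for each scale, the maximum response magnitude over orientations. The sketch uses real (cosine) Gabor kernels and FFT-based circular convolution. The kernel parameters (`sigma = wavelength / 2`, `radius = 6`) and all function names are illustrative assumptions.

```python
import numpy as np

def gabor_kernel(theta, sigma=2.0, wavelength=6.0, radius=6):
    """Real (cosine) Gabor kernel on a (2r+1) x (2r+1) grid, axis 0 = x, axis 1 = y."""
    d = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(d, d, indexing="ij")
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the wave direction
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))  # isotropic Gaussian envelope
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def circular_filter(img, kernel):
    """Circular 2-D convolution via FFT; the kernel is re-centred on index (0, 0)."""
    r = kernel.shape[0] // 2
    padded = np.zeros_like(img, dtype=float)
    padded[: kernel.shape[0], : kernel.shape[1]] = kernel
    padded = np.roll(padded, (-r, -r), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(padded)))

def rotation_invariant_gabor(img, wavelengths=(4.0, 8.0), n_orient=4):
    """Per-pixel features: for each scale, the max |response| over orientations."""
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    feats = []
    for lam in wavelengths:
        responses = [np.abs(circular_filter(img, gabor_kernel(t, sigma=lam / 2, wavelength=lam)))
                     for t in thetas]
        feats.append(np.max(responses, axis=0))   # pooling over orientation
    return np.stack(feats)                         # shape (n_scales, H, W)
```

With an even number of orientations, the orientation set is closed under 90° rotation. Rotating the input image by 90° therefore rotates the feature maps by 90° without changing their values, and pixel similarity computed on these features does not depend on texture orientation in that sense.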
34

Parallélisation de simulations physiques utilisant un modèle de Boltzmann multi-phases et multi-composants en vue d'un épandage de GNL sur sol / Parallelisation of physical simulations using a multiphase and multicomponent Boltzmann model with the aim of simulating an LNG spill on the ground

Duchateau, Julien 09 December 2015
Cette thèse a pour but de définir et de développer des solutions informatiques de manière à permettre la mise en place de simulations physiques sur des domaines de simulation très grands tels qu'un site industriel comme le terminal méthanier de Dunkerque. Le modèle d'écoulement mis en place est basé sur la méthode de Boltzmann sur réseau et permet de gérer de nombreux cas de simulation. Différentes architectures de calculs sont étudiées dans ce travail de thèse. L'utilisation du processeur central ainsi que de processeurs graphiques pour la parallélisation des calculs est abordée. Des solutions sont mises en place de manière à obtenir une parallélisation efficace du modèle de calcul sur plusieurs GPUS pouvant calculer en parallèle. Une approche de maillage progressif du maillage de simulation est également abordée pour gérer dynamiquement la quantité de mémoire nécessaire pour simuler en fonction des besoins de la simulation et de sa progression. Son intégration sur une architecture de calcul composée de plusieurs processeurs graphiques est également mise en avant. Finalement, une solution de type "Out-of-core" a été mise en place pour traiter des cas où la mémoire liée aux processeurs graphiques est insuffisante pour simuler. En effet, les processeurs graphiques disposent généralement d'une quantité de mémoire nettement inférieure à celle de la RAM du processeur central. La mise en place d'un système d'échange efficace entre les processeurs graphiques et la RAM est donc essentielle. / The goal of this thesis is to define and develop computing solutions that make physical simulations possible on very large simulation domains, such as industrial sites (e.g., the Dunkerque LNG terminal). The simulation model is based on the lattice Boltzmann method (LBM) and can handle many simulation cases. Several computing architectures are studied in this work: a multicore central processing unit (CPU) and several graphics processing units (GPUs). 
An efficient parallelization of the simulation model is obtained by using several GPUs computing in parallel. A progressive mesh algorithm is also defined to automatically mesh the simulation domain according to fluid propagation, and its integration on a multi-GPU architecture is studied. Finally, an "out-of-core" method is introduced to handle cases that require more memory than all the GPUs provide. Indeed, GPU memory is generally significantly smaller than CPU memory, so an efficient exchange system between the GPUs and the CPU is essential.
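For readers unfamiliar with the lattice Boltzmann method named above, the sketch shows one collide-and-stream step of a single-phase, single-component D2Q9 BGK model on a periodic domain. It is deliberately much simpler than the multiphase, multicomponent model of the thesis and is not its implementation. The NumPy array operations stand in for the per-site data parallelism that a GPU version would map to threads. The relaxation time `tau` and all names are illustrative assumptions.

```python
import numpy as np

# D2Q9 lattice: discrete velocities c_i and weights w_i
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium f_i^eq for every site (shape 9 x NX x NY)."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * usq)

def macroscopic(f):
    """Density and velocity recovered as moments of the distributions."""
    rho = f.sum(axis=0)
    ux = (f * C[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * C[:, 1, None, None]).sum(axis=0) / rho
    return rho, ux, uy

def lbm_step(f, tau=0.8):
    """One collide-and-stream step on a fully periodic domain."""
    rho, ux, uy = macroscopic(f)
    f = f - (f - equilibrium(rho, ux, uy)) / tau      # BGK collision (purely local)
    for i, (cx, cy) in enumerate(C):                   # streaming along each c_i
        f[i] = np.roll(f[i], shift=(cx, cy), axis=(0, 1))
    return f
```

Collision touches only one site at a time and streaming only exchanges values with nearest neighbours. This locality is why LBM partitions well across several GPUs: only the boundary layers of each sub-domain need to be exchanged between devices.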
35

Využití grafických procesorů v úlohách celočíselného programování (Use of graphics processors in integer programming problems) / Solving vehicle routing problems and algorithm implementation on GPU

Hájek, Jan January 2010
Vehicle routing problems, a very wide-ranging class of problems from graph theory, are handled daily by transport companies, airlines, high-tech companies planning the drilling of printed circuit boards, and firms in many other industries. Numerous previous studies have analyzed these problems and proposed many solutions, which this paper outlines; they differ in the quality of their results and in their computing time. Even though the performance of processors and new technologies keeps increasing, some algorithms still cannot compute a result in reasonable time. This paper therefore asks whether a suitable algorithm can be found that can be mapped onto different, faster processing-unit architectures so as to achieve a multiple-fold increase in computing speed. The analysis was carried out through computer experiments with a newly designed and implemented branch-and-bound algorithm with cost-matrix reduction.
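The abstract mentions a branch-and-bound algorithm with matrix reduction. One well-known reading of that phrase, assumed here, is the reduced-cost-matrix bound for the travelling salesman problem (the single-vehicle core of routing): subtracting row and column minima yields a lower bound on any tour, which is used to prune branches. The sketch is a sequential, best-first CPU version for illustration only; it does not reproduce the thesis's GPU implementation. All names are hypothetical.

```python
import heapq
import itertools
import math

INF = math.inf

def reduce_matrix(m):
    """Subtract row minima, then column minima, in place; return the total subtracted.

    Entries are non-negative costs or INF (forbidden). The total is a lower bound
    on the cost still needed to complete the tour."""
    n = len(m)
    total = 0.0
    for i in range(n):
        r = min(m[i])
        if 0 < r < INF:
            m[i] = [x - r for x in m[i]]
            total += r
    for j in range(n):
        c = min(m[i][j] for i in range(n))
        if 0 < c < INF:
            for i in range(n):
                m[i][j] -= c
            total += c
    return total

def tsp_branch_and_bound(cost):
    """Exact asymmetric TSP by best-first branch and bound; tours start and end at city 0."""
    n = len(cost)
    root = [[INF if i == j else float(cost[i][j]) for j in range(n)] for i in range(n)]
    root_bound = reduce_matrix(root)
    best_cost, best_tour = INF, None
    tie = itertools.count()                 # tie-breaker so heapq never compares lists
    heap = [(root_bound, next(tie), [0], root)]
    while heap:
        bound, _, path, m = heapq.heappop(heap)
        if bound >= best_cost:
            continue                        # prune: cannot beat the incumbent
        u = path[-1]
        if len(path) == n:                  # all cities visited: close the tour
            tour = path + [0]
            tour_cost = sum(cost[a][b] for a, b in zip(tour, tour[1:]))
            if tour_cost < best_cost:
                best_cost, best_tour = tour_cost, tour
            continue
        for v in range(n):
            if v in path or m[u][v] == INF:
                continue
            child = [row[:] for row in m]
            for k in range(n):
                child[u][k] = INF           # u has been left
                child[k][v] = INF           # v has been entered
            child[v][0] = INF               # no early return to the start
            child_bound = bound + m[u][v] + reduce_matrix(child)
            if child_bound < best_cost:
                heapq.heappush(heap, (child_bound, next(tie), path + [v], child))
    return best_cost, best_tour

# Usage on a small asymmetric instance (illustrative data).
COST = [[0, 12, 10, 19, 8, 30],
        [14, 0, 3, 7, 2, 11],
        [9, 6, 0, 20, 5, 4],
        [17, 4, 13, 0, 16, 9],
        [6, 18, 7, 11, 0, 15],
        [25, 10, 8, 3, 12, 0]]
best_cost, best_tour = tsp_branch_and_bound(COST)
```

Each child node reduces its own copy of the matrix independently, which is the kind of work that could, in principle, be spread over many parallel processing units.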
36

SSAGA: Streaming Multiprocessors (SMs) Sculpted for Asymmetric General Purpose Graphics Processing Unit (GPGPU) Applications

Saha, Shamik 01 May 2016
The evolution of Graphics Processing Units (GPUs) over the last decade has reinforced general-purpose computing while sustaining steady performance growth in graphics-intensive applications. However, this immense performance improvement is generally associated with a steep rise in GPU power consumption; consequently, GPUs are already close to the power wall. With the massive popularity of mobile devices running general-purpose GPU (GPGPU) applications, it is of utmost importance to ensure high energy efficiency while meeting strict performance requirements. In this work, we demonstrate that customizing a Streaming Multiprocessor (SM) of a GPU for a lower frequency is significantly more energy efficient than applying Dynamic Voltage and Frequency Scaling (DVFS) to an SM designed for high-frequency operation. Using a system-level Computer Aided Design (CAD) technique, we propose SSAGA (Streaming Multiprocessors Sculpted for Asymmetric GPGPU Applications), an energy-efficient GPU design paradigm. SSAGA creates architecturally identical SM cores, customized for different voltage-frequency domains.
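The comparison between DVFS and a low-frequency customized SM can be framed with the standard first-order CMOS energy model. This model is generic background, not the thesis's own formulation. Here α is the activity factor, C_eff the effective switched capacitance, V the supply voltage, f the clock frequency, and N_cyc the cycle count of a workload:

```latex
P_{\mathrm{dyn}} = \alpha\, C_{\mathrm{eff}}\, V^{2} f,
\qquad
E_{\mathrm{dyn}} = P_{\mathrm{dyn}} \cdot \frac{N_{\mathrm{cyc}}}{f} = \alpha\, C_{\mathrm{eff}}\, V^{2} N_{\mathrm{cyc}},
\qquad
E_{\mathrm{leak}} = P_{\mathrm{leak}}(V) \cdot \frac{N_{\mathrm{cyc}}}{f}.
```

For a fixed workload, lowering f alone does not reduce dynamic energy, and it lengthens the run time over which leakage accumulates. Savings come from the lower V that a lower f permits, and from C_eff and P_leak. An SM scaled down by DVFS keeps the circuit sized for high-frequency timing, whereas an SM synthesized for a lower target frequency can typically use smaller, less leaky cells. That is one plausible reading of why customization can beat DVFS.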
37

Cellular matrix for parallel k-means and local search to Euclidean grid matching / Matrice cellulaire pour des algorithmes parallèles de k-means et de recherche locale appliqués à des problèmes euclidiens d’appariement de graphes

Wang, Hongjian 03 December 2015
Dans cette thèse, nous proposons un modèle de calcul parallèle, appelé « matrice cellulaire », pour apporter des réponses aux problématiques de calcul parallèle appliqué à la résolution de problèmes d’appariement de graphes euclidiens. Ces problèmes d’optimisation NP-difficiles font intervenir des données réparties dans le plan et des structures élastiques représentées par des graphes qui doivent s’apparier aux données. Ils recouvrent des problèmes connus sous des appellations diverses telles que geometric k-means, elastic net, topographic mapping, elastic image matching. Ils permettent de modéliser par exemple le problème du voyageur de commerce euclidien, le problème du cycle médian, ainsi que des problèmes de mise en correspondance d’images. La contribution présentée est divisée en trois parties. Dans la première partie, nous présentons le modèle de matrice cellulaire qui partitionne les données et définit le niveau de granularité du calcul parallèle. Nous présentons une boucle générique de calcul parallèle qui modélise le principe des projections de graphes et de leur appariement. Dans la deuxième partie, nous appliquons le modèle de calcul parallèle aux algorithmes de k-means avec topologie dans le plan. Les algorithmes proposés sont appliqués au voyageur de commerce, à la génération de maillage structuré et à la segmentation d'image suivant le concept de superpixel. L’approche est nommée superpixel adaptive segmentation map (SPASM). Dans la troisième partie, nous proposons un algorithme de recherche locale parallèle, appelé distributed local search (DLS). La solution du problème résulte des opérations locales sur les structures et les données réparties dans le plan, incluant des évaluations, des recherches de voisinage, et des mouvements structurés. L’algorithme est appliqué à des problèmes d’appariement de graphe tels que le stéréo-matching et le problème de flot optique. 
/ In this thesis, we propose a parallel computing model, called cellular matrix, to address the difficulties of parallel computation when applied to Euclidean graph matching problems. These NP-hard optimization problems involve data distributed in the plane and elastic structures represented by graphs that must match the data. They include problems known under various names, such as geometric k-means, elastic net, topographic mapping, and elastic image matching. The Euclidean traveling salesman problem (TSP), the median cycle problem, and the image matching problem are also examples that can be modeled by graph matching. The contribution is divided into three parts. In the first part, we present the cellular matrix model, which partitions the data and defines the level of granularity of the parallel computation. We present a generic loop for parallel computations that models the projection between graphs and their matching. In the second part, we apply the parallel computing model to k-means algorithms with topology in the plane. The proposed algorithms are applied to the TSP, structured mesh generation, and image segmentation following the concept of superpixels. The approach is called superpixel adaptive segmentation map (SPASM). In the third part, we propose a parallel local search algorithm, called distributed local search (DLS). The solution results from many local operations, including local evaluation, neighborhood search, and structured moves, performed on the data distributed in the plane. The algorithm is applied to Euclidean graph matching problems including stereo matching and optical flow.
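A key idea of the cellular matrix model is that partitioning the plane into cells keeps each computation local. The sketch below illustrates that idea for the k-means assignment step, under the assumption that a uniform grid of cells of side `s` is used. Each point searches for its nearest center ring by ring outward from its own cell, and stops once no farther ring can hold a closer center, so the result is exact. This is a sequential illustration, not the thesis's parallel algorithm, and `build_cells`, `nearest_center`, and `kmeans_step` are hypothetical names.

```python
import math
import random
from collections import defaultdict

def build_cells(centers, s):
    """Bucket center indices by the s x s grid cell that contains them."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        cells[(int(x // s), int(y // s))].append(idx)
    return cells

def nearest_center(p, centers, cells, s, max_ring):
    """Exact nearest center, scanning square rings of cells outward from p's cell."""
    cx, cy = int(p[0] // s), int(p[1] // s)
    best_i, best_d = -1, math.inf
    for k in range(max_ring + 1):
        for i in range(cx - k, cx + k + 1):
            for j in range(cy - k, cy + k + 1):
                if max(abs(i - cx), abs(j - cy)) != k:
                    continue                   # only the ring at Chebyshev distance k
                for c in cells.get((i, j), ()):
                    d = math.dist(p, centers[c])
                    if d < best_d:
                        best_i, best_d = c, d
        if best_d <= k * s:                    # cells in ring k+1 are at least k*s away
            break
    return best_i

def kmeans_step(points, centers, s, max_ring):
    """One Lloyd iteration whose assignment step only searches nearby cells."""
    cells = build_cells(centers, s)
    sums = [[0.0, 0.0, 0] for _ in centers]
    labels = []
    for p in points:
        c = nearest_center(p, centers, cells, s, max_ring)
        labels.append(c)
        sums[c][0] += p[0]
        sums[c][1] += p[1]
        sums[c][2] += 1
    new_centers = [(sx / n, sy / n) if n else centers[i]
                   for i, (sx, sy, n) in enumerate(sums)]
    return labels, new_centers

# Usage: 500 random points in a 100 x 100 square, 20 centers, 10 x 10 cells.
random.seed(7)
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(500)]
centers = random.sample(points, 20)
labels, new_centers = kmeans_step(points, centers, s=10.0, max_ring=11)
```

Because each point's search touches only a few neighbouring cells, cells can be processed independently. That independence is the property a cell-based partition offers to parallel hardware.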
38

Acceleration of Transient Stability Simulation for Large-Scale Power Systems on Parallel and Distributed Hardware

Jalili-Marandi, Vahid 11 1900
Transient stability analysis is necessary for the planning, operation, and control of power systems. However, its mathematical modeling and time-domain solution are computationally onerous and have attracted the attention of power system experts and simulation specialists for decades. The ultimate goal has always been to perform this simulation as fast as real time for realistically sized systems. In this thesis, methods to speed up transient stability simulation of large-scale power systems are investigated. The research can be divided into two parts. First, real-time simulation on a general-purpose simulator composed of CPU-based computational nodes is considered. A novel approach called Instantaneous Relaxation (IR) is proposed for real-time transient stability simulation on such a simulator. It is motivated by the inherent parallelism of the transient stability problem, which allows a coarse-grained decomposition of the resulting system equations. Comparing the real-time results with off-line results shows both the accuracy and the efficiency of the proposed method. In the second part, Graphics Processing Units (GPUs) are used for the first time for transient stability simulation of power systems. Data-parallel programming techniques are used on the single-instruction multiple-data (SIMD) architecture of the GPU to implement the simulations. Several test cases of varying sizes are used to investigate the GPU-based simulation, and the results reveal the clear advantage of GPUs over CPUs for large-scale problems. Continuing the second part, the use of multiple GPUs running in parallel is investigated, and two parallel processing techniques are implemented: the IR method and an approach based on incomplete LU factorization. 
Practical information is provided on using multi-threaded programming to manage multiple GPUs running simultaneously for the transient stability simulation. Implementing the IR method on multiple GPUs combines data parallelism with program-level parallelism, making it possible to simulate very large-scale systems with 7020 buses and 1800 synchronous generators. / Energy Systems
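To illustrate what "data-parallel" means for transient stability, the sketch integrates the classical swing equation for many generators at once with Heun's (explicit trapezoidal) method. Every array operation applies the same instruction to all generators, which is the SIMD pattern a GPU exploits. This is a strong simplification made as an assumption: each generator is modeled against an infinite bus, the network equations that the thesis solves (and that the IR and incomplete-LU approaches address) are ignored, and a 60 Hz system is assumed. Names and parameter values are illustrative.

```python
import numpy as np

OMEGA_S = 2 * np.pi * 60.0   # synchronous speed in rad/s (60 Hz system assumed)

def swing_rhs(delta, omega, p_m, p_max, H, D):
    """Classical swing equation for many generators at once (omega in per unit)."""
    p_e = p_max * np.sin(delta)                          # electrical power vs. infinite bus
    d_delta = OMEGA_S * (omega - 1.0)
    d_omega = (p_m - p_e - D * (omega - 1.0)) / (2.0 * H)
    return d_delta, d_omega

def simulate(delta0, omega0, p_m, p_max, H, D, dt=1e-3, steps=1000):
    """Heun (explicit trapezoidal) integration; every array op is data-parallel over generators."""
    delta, omega = delta0.copy(), omega0.copy()
    for _ in range(steps):
        k1d, k1w = swing_rhs(delta, omega, p_m, p_max, H, D)
        k2d, k2w = swing_rhs(delta + dt * k1d, omega + dt * k1w, p_m, p_max, H, D)
        delta = delta + 0.5 * dt * (k1d + k2d)
        omega = omega + 0.5 * dt * (k1w + k2w)
    return delta, omega
```

In a full simulation, the generator equations are coupled through the network admittance matrix at every step. Solving those coupled network equations efficiently is where the decomposition and factorization techniques described in the abstract come in.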
39

Modeling performance and power for energy-efficient GPGPU computing

Hong, Sunpyo 12 November 2012
The objective of this research is to develop an analytical model that predicts performance and power for many-core architectures, and to propose a mechanism that leverages the model to enable energy-efficient execution of an application. The key insight of the model is to investigate and quantify the complex relationship between thread-level parallelism and memory-level parallelism for an application on a given many-core architecture. Two metrics are proposed: memory warp parallelism (MWP), which refers to the number of overlapping memory accesses per core, and computation warp parallelism (CWP), which characterizes the application type. Using these metrics together with architectural and application parameters, the model predicts overall application performance. The model uses statically available parameters such as instruction-mix information and input-data size, and its prediction error is 13.3% for the GPU-computing benchmarks. Another important aspect of using many-core architectures is reducing peak power and achieving energy savings. Using the proposed integrated power and performance (IPP) framework, the results show that different optimization points exist for GPU architectures depending on the application type. By activating fewer cores, 10.99% of run-time energy consumption can be saved for bandwidth-limited benchmarks, and energy savings of 25.8% are projected when core-level power gating is employed. Finally, the model is shifted toward throughput using OpenCL, targeting a wider variety of processors. First, multiple performance-related outputs are predicted, including upper-bound and lower-bound values. Second, using the model parameters, an application can be assigned to one of several categories, each with its own suggestions for improving performance and energy efficiency. 
Third, the accuracy of the predicted bandwidth saturation point is significantly improved by considering independent memory accesses and updating the performance model. Furthermore, a trade-off analysis using architectural and application parameters is straightforward and provides more insight into improving energy efficiency. Future computer systems will contain hundreds of heterogeneous cores, so it is essential that a workload be scheduled to an efficient core or distributed across both types of cores. Preliminary work using the analytical model to schedule between CPU and GPU is demonstrated in the appendix. Since no profiling phase is required, the kernel code can be transformed to run more efficiently on the specific architecture. As a further extension, the relationship between speed-up and energy efficiency is derived mathematically. Finally, future research ideas are presented regarding the use of the model by programmers, compilers, and runtimes for future heterogeneous systems.
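The interplay of MWP and CWP described above can be sketched as a three-case estimate. This is a deliberately simplified version in the spirit of the MWP/CWP analysis, not the dissertation's full model, which has further terms (for example for synchronization and uncoalesced accesses). CWP is estimated as how many warps' computation fits in one warp's memory-plus-compute period. MWP is capped by latency over departure delay, an optional bandwidth limit, and the warp count. The kernel is memory-bound when CWP ≥ MWP. All parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class KernelParams:
    mem_latency: float       # cycles for one memory request to complete
    departure_delay: float   # minimum cycles between memory requests of consecutive warps
    comp_cycles: float       # computation cycles per warp
    mem_cycles: float        # memory-waiting cycles per warp
    mem_insts: int           # memory instructions per warp
    n_warps: int             # active warps per core (N)
    reps: int = 1            # how many times each core runs a batch of N warps

def predict_cycles(k: KernelParams, mwp_bandwidth_limit: float = float("inf")):
    """Return (cycles, MWP, CWP) under a simplified MWP/CWP analysis."""
    cwp = min((k.mem_cycles + k.comp_cycles) / k.comp_cycles, k.n_warps)
    mwp = min(k.mem_latency / k.departure_delay, mwp_bandwidth_limit, k.n_warps)
    if mwp == k.n_warps and cwp == k.n_warps:
        # too few warps to hide latency: about one warp's time plus overlapped compute
        cycles = k.mem_cycles + k.comp_cycles + k.comp_cycles / k.mem_insts * (mwp - 1)
    elif cwp >= mwp:
        # memory-bound: memory periods are served MWP warps at a time
        cycles = k.mem_cycles * k.n_warps / mwp + k.comp_cycles / k.mem_insts * (mwp - 1)
    else:
        # compute-bound: memory latency is hidden behind other warps' computation
        cycles = k.mem_latency + k.comp_cycles * k.n_warps
    return cycles * k.reps, mwp, cwp
```

Lowering the bandwidth cap on MWP increases the predicted cycles of a memory-bound kernel but leaves a compute-bound one unchanged. This mirrors the abstract's observation that the best optimization point, such as how many cores to activate, depends on the application type.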
40

Computational Medical Image Analysis : With a Focus on Real-Time fMRI and Non-Parametric Statistics

Eklund, Anders January 2012
Functional magnetic resonance imaging (fMRI) is a prime example of multi-disciplinary research. Without the beautiful physics of MRI, there would not be any images to look at in the first place. To obtain images of good quality, it is necessary to fully understand the concepts of the frequency domain. The analysis of fMRI data requires understanding of signal processing, statistics and knowledge about the anatomy and function of the human brain. The resulting brain activity maps are used by physicians, neurologists, psychologists and behaviourists, in order to plan surgery and to increase their understanding of how the brain works. This thesis presents methods for real-time fMRI and non-parametric fMRI analysis. Real-time fMRI places high demands on the signal processing, as all the calculations have to be made in real-time in complex situations. Real-time fMRI can, for example, be used for interactive brain mapping. Another possibility is to change the stimulus that is given to the subject, in real-time, such that the brain and the computer can work together to solve a given task, yielding a brain computer interface (BCI). Non-parametric fMRI analysis, for example, concerns the problem of calculating significance thresholds and p-values for test statistics without a parametric null distribution. Two BCIs are presented in this thesis. In the first BCI, the subject was able to balance a virtual inverted pendulum by thinking of activating the left or right hand or resting. In the second BCI, the subject in the MR scanner was able to communicate with a person outside the MR scanner, through a virtual keyboard. A graphics processing unit (GPU) implementation of a random permutation test for single subject fMRI analysis is also presented. The random permutation test is used to calculate significance thresholds and p-values for fMRI analysis by canonical correlation analysis (CCA), and to investigate the correctness of standard parametric approaches. 
The random permutation test was verified by using 10 000 noise datasets and 1484 resting state fMRI datasets. The random permutation test is also used for a non-local CCA approach to fMRI analysis.
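The sketch below illustrates how a random permutation test yields significance thresholds and p-values without a parametric null distribution. It uses the maximum statistic over voxels to control the familywise error rate. As a stated simplification, it replaces the CCA statistic of the thesis with a plain correlation between a stimulus regressor and each voxel's time series. It also shuffles time points directly, which assumes exchangeability. Real fMRI noise is temporally autocorrelated, so an actual analysis must handle that first (for example by whitening), and this sketch does not. The names and the boxcar design are illustrative assumptions.

```python
import numpy as np

def corr_columns(x, Y):
    """Pearson correlation between vector x (T,) and each column of Y (T, V)."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    return (xc @ Yc) / (np.linalg.norm(xc) * np.linalg.norm(Yc, axis=0))

def permutation_test(regressor, data, n_perm=999, alpha=0.05, seed=0):
    """Max-statistic permutation test over voxels.

    Returns observed |r| per voxel, the familywise threshold, and corrected p-values."""
    rng = np.random.default_rng(seed)
    observed = np.abs(corr_columns(regressor, data))
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(regressor)            # shuffled copy of the design
        max_null[i] = np.abs(corr_columns(perm, data)).max()
    threshold = np.quantile(max_null, 1 - alpha)
    # fraction of permutation maxima >= observed, counting the unpermuted labelling once
    p = (1 + (max_null[None, :] >= observed[:, None]).sum(axis=1)) / (n_perm + 1)
    return observed, threshold, p

# Usage: 100 time points, 50 voxels, a boxcar design, and one truly active voxel.
rng = np.random.default_rng(42)
reg = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)
data = rng.standard_normal((100, 50))
data[:, 0] += 2.0 * reg
obs, thr, p = permutation_test(reg, data, n_perm=999)
```

Each permutation repeats the full voxel-wise analysis, and thousands of permutations are needed for small p-values. This cost is why the abstract's GPU implementation matters for making such tests practical.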
