631

Parallel Computation of the Meddis MATLAB Auditory Periphery Model

Sanghvi, Niraj D. 18 July 2012 (has links)
No description available.
632

Automatic Code Generation for Stencil Computations on GPU Architectures

Holewinski, Justin A. 19 December 2012 (has links)
No description available.
633

Raydio: Graphical radio wave simulation using ray tracing

Ivarsson, Mattias, Lindström, Alexander January 2022 (has links)
The 5th generation mobile network, which promises improved availability compared to earlier network generations, is currently in a global rollout phase. The launch of 5G is expected to open up a wide range of new use cases, including IoT and critical communication in areas such as medical procedures and self-driving cars, for which availability will be a key factor. A relatively common problem with wireless connectivity is coverage black spots: areas with very low or no signal level. Raydio (as the project is called) uses a relatively new technology, real-time ray tracing, to create a tool for evaluating the availability of radio waves for the 5th generation mobile network. Ray tracing is used to simulate radio waves in a graphical environment in order to find areas that lack signal coverage and then provide a graphical picture of how this could be remedied. The result of the project is a proof of concept showing how ray tracing can be used in the game engine Unity via the NVIDIA OptiX ray tracing API, where the beginnings of a plugin have been designed.
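The core operation in such a simulator is a line-of-sight test between the transmitter and each receiver point. The following is a minimal standalone CUDA sketch of that test against sphere obstacles; the kernel and all names are hypothetical illustrations, not part of the Raydio plugin, which builds on NVIDIA OptiX inside Unity.

```cpp
#include <cmath>

struct Sphere { float x, y, z, r; };

// Hypothetical kernel: marks a grid point as shadowed if the straight ray
// from the transmitter to the point hits any obstacle sphere. This only
// illustrates the occlusion test at the heart of such a simulation.
__global__ void coverage(const Sphere* obs, int nObs,
                         float3 tx, const float3* pts, int nPts,
                         unsigned char* shadowed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPts) return;

    float3 d = { pts[i].x - tx.x, pts[i].y - tx.y, pts[i].z - tx.z };
    float len = sqrtf(d.x*d.x + d.y*d.y + d.z*d.z);
    d.x /= len; d.y /= len; d.z /= len;               // unit ray direction

    unsigned char hit = 0;
    for (int k = 0; k < nObs; ++k) {
        // Ray-sphere intersection: solve |tx + t*d - c|^2 = r^2 for t.
        float3 oc = { tx.x - obs[k].x, tx.y - obs[k].y, tx.z - obs[k].z };
        float b = oc.x*d.x + oc.y*d.y + oc.z*d.z;
        float c = oc.x*oc.x + oc.y*oc.y + oc.z*oc.z - obs[k].r*obs[k].r;
        float disc = b*b - c;
        if (disc >= 0.f) {
            float t = -b - sqrtf(disc);               // nearest intersection
            if (t > 0.f && t < len) { hit = 1; break; }
        }
    }
    shadowed[i] = hit;
}
```

A production simulator would replace the brute-force obstacle loop with the hardware-accelerated BVH traversal that ray tracing APIs such as OptiX provide.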
634

MONTE CARLO MODELING OF DIFFUSE REFLECTANCE AND RAMAN SPECTROSCOPY IN BIOMEDICAL DIAGNOSTICS

Dumont, Alexander Pierre January 2020 (has links)
Computational modeling of light-matter interactions is a valuable approach for simulating photon paths in highly scattering media such as biological tissues. Monte Carlo (MC) models are considered the gold standard for such simulations and can offer insights into light flux, absorption, and emission through tissues. Monte Carlo modeling is computationally intensive, but this burden has been alleviated in recent years by the parallelizable nature of the algorithm and the advent of graphics processing unit (GPU) acceleration. Despite impressive translational applications, the relatively recent GPU-based acceleration of MC models can still be brought to bear on pressing challenges in biomedical optics beyond diffuse optical tomography (DOT) and photodynamic therapy (PDT). The overarching goal of the current dissertation is to advance the applications and abilities of GPU-accelerated MC models to include low-cost devices and to model Raman scattering phenomena as they relate to clinical diagnoses. The massive increase in computational capacity afforded by GPU acceleration dramatically reduces the time necessary to model and optimize optical detection systems over a wide range of real-world scenarios. Specifically, the development of simplified optical devices to meet diagnostic challenges in low-resource settings is an emerging area of interest in which the use of MC modeling to better inform device design has not yet been widely reported. In this dissertation, GPU-accelerated MC modeling is utilized to guide the development of a mobile phone-based approach for diagnosing neonatal jaundice. Increased computational capacity also makes the incorporation of less common optical phenomena such as Raman scattering feasible within realistic time frames. Previous Raman scattering MC models were simplistic by necessity; as a result, it was either challenging or impractical to adequately include model parameters relevant to guiding clinical translation. This dissertation develops a Raman scattering MC model and validates it in biological tissues. The high computational capacity of a GPU-accelerated model can be used to dramatically decrease the model's grid size and potentially provide an understanding of measured Raman spectroscopy signals that span multiple orders of magnitude in spatial scale. In this dissertation, a GPU-accelerated Raman scattering MC model is used to inform clinical measurements of millimeter-scale bulk tissue specimens based on Raman microscopy images. The current study further develops the MC model as a tool for designing diffuse detection systems and expands the applicability of MC modeling to Raman scattering in biological tissues. / Bioengineering
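To make the approach concrete, below is a minimal sketch of the kind of photon random walk such a GPU MC model parallelizes, with one thread per photon packet. It assumes a homogeneous slab and isotropic scattering, both simplifications relative to the layered, anisotropic (e.g., Henyey-Greenstein) models the dissertation concerns; all names are placeholders.

```cpp
#include <curand_kernel.h>

// Toy per-thread photon random walk in a homogeneous slab. mu_a and mu_s
// are absorption/scattering coefficients (1/mm); each thread traces one
// photon packet and deposits absorbed weight into a global accumulator.
__global__ void photon_walk(float mu_a, float mu_s, float slab_mm,
                            float* absorbed, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    curandState rng;
    curand_init(seed, id, 0, &rng);

    float mu_t = mu_a + mu_s;
    float x = 0.f, y = 0.f, z = 0.f;      // position (mm), launched at origin
    float ux = 0.f, uy = 0.f, uz = 1.f;   // direction (along +z)
    float w = 1.f;                        // packet weight

    while (w > 1e-4f && z >= 0.f && z <= slab_mm) {
        float s = -logf(curand_uniform(&rng)) / mu_t;  // sampled free path
        x += s * ux; y += s * uy; z += s * uz;

        float dw = w * mu_a / mu_t;        // fraction absorbed at this site
        atomicAdd(absorbed, dw);
        w -= dw;

        // Isotropic re-scatter (simplification of the anisotropic case).
        float cost = 2.f * curand_uniform(&rng) - 1.f;
        float sint = sqrtf(1.f - cost * cost);
        float phi  = 6.2831853f * curand_uniform(&rng);
        ux = sint * cosf(phi); uy = sint * sinf(phi); uz = cost;
    }
}
```

Because each packet is independent, millions of threads run this loop concurrently, which is precisely why MC photon transport maps so well onto GPUs.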
635

Accelerating Radiowave Propagation Simulations: A GPU-based Approach to Parabolic Equation Modeling / Accelererad simulering av utbredning av radiovågor: En GPU-baserad lösning av en parabolisk ekvation

Nilsson, Andreas January 2024 (has links)
This study explores the application of GPU-based algorithms in radiowave propagation modeling, specifically through the scope of solving parabolic wave equations. Radiowave propagation models are crucial in the field of wireless communications, where they help predict how radio waves travel through different environments, which is vital for planning and optimization. The research specifically examines the implementation of two numerical methods: the Split Step Method and the Finite Difference Method. Both methods are adapted to utilize the parallel processing capabilities of modern GPUs, harnessing a parallel computing framework known as CUDA to achieve considerable speed enhancements compared to traditional CPU-based methods. Our findings reveal that the Split Step Method generally achieves higher speedup factors, especially in scenarios involving large system sizes and high-frequency simulations, making it particularly effective for expansive and complex models. In contrast, the Finite Difference Method shows more consistent speedup across various domain sizes and frequencies, suggesting its robustness across a diverse range of simulation conditions. Both methods maintained high accuracy, with differences in computed norms remaining low when comparing GPU implementations against their CPU counterparts.
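For illustration, a single range step of a split-step parabolic-equation solver might look like the following sketch using cuFFT. The phase-screen arrays and function names are placeholders assumed for this example, not the thesis implementation; "phase_env" would be precomputed from refractivity, "phase_free" from the transform wavenumbers.

```cpp
#include <cufft.h>
#include <cuComplex.h>

// Multiply the field pointwise by exp(i * phase[i]).
__global__ void apply_phase(cufftComplex* u, const float* phase, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float c = cosf(phase[i]), s = sinf(phase[i]);
    cufftComplex v = u[i];
    u[i] = make_cuFloatComplex(v.x * c - v.y * s, v.x * s + v.y * c);
}

// One range step: half environment screen, spectral free-space
// propagator, second half screen (a symmetric split-step).
void split_step(cufftHandle plan, cufftComplex* u,
                const float* phase_env, const float* phase_free, int n)
{
    int threads = 256, blocks = (n + threads - 1) / threads;
    apply_phase<<<blocks, threads>>>(u, phase_env, n);   // half screen
    cufftExecC2C(plan, u, u, CUFFT_FORWARD);             // to spectrum
    apply_phase<<<blocks, threads>>>(u, phase_free, n);  // propagator
    cufftExecC2C(plan, u, u, CUFFT_INVERSE);             // back to space
    apply_phase<<<blocks, threads>>>(u, phase_env, n);   // half screen
    // Note: cuFFT's inverse transform is unnormalized; scale by 1/n after.
}
```

The entire step stays on the device, which is where the GPU speedup over a CPU loop of FFTs comes from.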
636

Modeling, Analysis, and Real-Time Design of Many-Antenna MIMO Networks

Chen, Yongce 14 September 2021 (has links)
Among the many advances and innovations in wireless technologies over the past two decades, MIMO is perhaps the most successful. Today, the number of antennas equipped at a base station (BS) or an access point (AP) keeps increasing, forming what we call "many-antenna" MIMO systems. Many-antenna MIMO will have significant impacts on modern wireless communications, as it allows numerous wireless applications to operate on the vastly underexplored mid-band and high-band spectrum and can deliver ultra-high throughput. Although there are considerable efforts on many-antenna MIMO systems, most of them come from physical (PHY) layer information-theoretic exploitation; there is a lack of investigation of many-antenna MIMO from a networking perspective. On the other hand, new knowledge and understanding are beginning to emerge at the PHY layer, such as the rank-deficient channel phenomenon. This calls for new theories and models for many-antenna MIMO in a networking environment. In addition, the problem space for many-antenna MIMO systems is much broader and more challenging than for conventional MIMO: reusing existing solutions designed for conventional MIMO systems may suffer from inferior performance or require excessive computation time. The goal of this dissertation is to advance many-antenna MIMO techniques for networking research. We focus on two critical areas in the context of many-antenna MIMO networks: (i) degree-of-freedom (DoF)-based modeling and (ii) real-time optimization. This dissertation consists of two parts that study these two areas. In the first part, we develop new DoF models and theories under general channel rank conditions for many-antenna MIMO networks, and we explore efficient DoF allocation based on our new DoF model. The main contributions of this part are summarized as follows. New DoF models and theories under general channel rank conditions: Existing DoF-based models in the networking community assume that the channel matrix is of full rank. However, this assumption no longer holds when the number of antennas becomes large and the propagation environment is not ideal. In this study, we develop a novel DoF model under general channel rank conditions. In particular, we find that for interference cancellation (IC), shared DoF consumption at both transmit and receive nodes is most efficient for DoF allocation, contrary to existing unilateral IC models based on the full-rank channel assumption. Further, we show that existing DoF models under the full-rank assumption are a special case of our generalized DoF model. These findings pave the way for future research on many-antenna networks under general channel rank conditions. Efficient DoF utilization for MIMO networks: We observe that, in addition to the channel not being full rank, the strength of signals in different directions in the eigenspace is extremely uneven. This offers new opportunities to utilize DoFs efficiently in a MIMO network. In this study, we introduce a novel concept called the "effective rank threshold". Based on this threshold, DoFs are consumed only to cancel strong interference in the eigenspace, while weak interference is treated as noise in the throughput calculation. To better understand the benefits of this approach, we study a fundamental trade-off between network throughput and the effective rank threshold for an MU-MIMO network.
Our simulation results show that network throughput under the optimal rank threshold is significantly higher than under existing DoF IC models. In the second part, we offer real-time designs and implementations to solve many-antenna MIMO problems for 5G cellular systems. In addition to maximizing a specific optimization objective, we aim to offer solutions that can be implemented in sub-millisecond time to meet the requirements of 5G standards. The main contributions of this part are summarized as follows. Turbo-HB, a novel design and implementation for ultra-fast hybrid beamforming: We investigate the beamforming problem under the hybrid beamforming (HB) architecture. A major practical challenge for HB is to obtain a solution within 500 µs, an extremely stringent but necessary time requirement for deployment in the field. To address this challenge, we present Turbo-HB, a novel beamforming design under the HB architecture that can obtain the beamforming matrices in about 500 µs. The key ideas of Turbo-HB are twofold. First, we develop a low-complexity SVD by exploiting randomized SVD techniques and leveraging channel sparsity at mmWave frequencies. Second, we accelerate the overall computation through large-scale parallel computation on a commercial off-the-shelf (COTS) GPU platform, with special engineering efforts for matrix operations and minimized memory access. Experimental results show that Turbo-HB obtains the beamforming matrices in 500 µs for an MU-MIMO cellular system while achieving throughput similar to or better than state-of-the-art algorithms. mCore+, a sub-millisecond scheduler for 5G MU-MIMO systems: We study a scheduling problem in a 5G NR environment. In 5G NR, an MU-MIMO scheduler needs to allocate resource blocks (RBs) and assign a modulation and coding scheme (MCS) for each user at each transmission time interval (TTI); in particular, multiple users may be co-scheduled on the same RB under MU-MIMO, and the real-time requirement for determining a scheduling solution is at most 1 ms. In this study, we present a novel scheduler, mCore+, that can meet the sub-millisecond real-time requirement. mCore+ is designed through multi-phase optimization, leveraging large-scale parallelism. In each phase, mCore+ either decomposes the optimization problem into a large number of independent sub-problems, or reduces the search space into a smaller but more promising subspace, or both. We implement mCore+ on a COTS GPU platform. Experimental results show that mCore+ can obtain a scheduling solution in about 500 µs, and it achieves better throughput performance than state-of-the-art algorithms. M3, a sub-millisecond scheduler for multi-cell MIMO networks under C-RAN architecture: We investigate a scheduling problem for a multi-cell environment. Under the Cloud Radio Access Network (C-RAN) architecture, signal processing can be performed cooperatively for multiple cells at a centralized baseband unit (BBU) pool. However, a new resource scheduler is needed to jointly determine RB allocation, MCS assignment, and beamforming matrices for all users across multiple cells, and the scheduling solution must be found within each TTI (i.e., at most 1 ms) to conform to the frame structure defined by 5G NR. To do this, we propose M3, a GPU-based real-time scheduler for multi-cell MIMO systems. M3 is developed through a novel multi-pipeline design that exploits large-scale parallelism.
Under this design, one pipeline performs a sequence of operations for cell-edge users to explore joint transmission while, in parallel, the other pipeline serves cell-center users to explore MU-MIMO transmission. For validation, we implement M3 on a COTS GPU and show that M3 can find a scheduling solution within 1 ms for all tested cases, while significantly increasing user throughput by leveraging joint transmission among neighboring cells. / Doctor of Philosophy / MIMO is widely considered a major breakthrough in modern wireless communications. MIMO comes in different forms. In conventional MIMO, the number of antennas at a base station (BS) or access point (AP) is typically small (< 8). Today, the number of antennas at a BS/AP typically ranges from 8 to 64 when the carrier frequency is below 24 GHz, and can be even larger (> 64) when the carrier frequency is above 24 GHz (e.g., mmWave). We refer to today's MIMO systems (typically with ≥ 8 antennas at some nodes) as "many-antenna" MIMO systems, and these are the focus of this dissertation. Although a considerable amount of work exists on many-antenna MIMO techniques, most efforts focus on physical (PHY) layer information-theoretic exploitation; there is a lack of investigation into how to utilize many-antenna MIMO efficiently and effectively from a networking perspective. The goal of this dissertation is to advance many-antenna MIMO techniques for networking research. We focus on two critical areas in the context of many-antenna MIMO networks: (i) degree-of-freedom (DoF)-based modeling and (ii) real-time optimization. In the first part, we investigate a novel DoF model under general channel rank conditions for many-antenna MIMO networks. The main contributions of this part are summarized as follows. New DoF models and theories under general channel rank conditions: We develop a novel DoF model under general channel rank conditions. We show that existing claims that unilateral DoF consumption is optimal no longer hold when the channel rank is deficient (not full rank), and we find that for IC, shared DoF consumption at both Tx and Rx nodes is the most efficient scheme for DoF allocation. Efficient DoF utilization for MIMO networks: We propose a new approach to utilizing DoFs efficiently in a MIMO network; the DoFs used to cancel interference are conserved by exploiting the interference signal strength in the eigenspace. Our simulation results show that network throughput under our approach is significantly higher than under existing DoF IC models. In the second part, we offer real-time designs and implementations to solve many-antenna MIMO problems for 5G cellular systems. The timing performance of these designs is measured in actual wall-clock time. A novel design and implementation for ultra-fast hybrid beamforming: We investigate a beamforming problem under the hybrid beamforming (HB) architecture and propose Turbo-HB, a novel beamforming design that can obtain the beamforming matrices in about 500 µs while achieving throughput similar to or better than state-of-the-art algorithms. A sub-millisecond scheduler for 5G multi-user (MU)-MIMO systems: We study a resource scheduling problem in 5G NR and present a novel scheduler, mCore+, that can schedule time-frequency resources for MU-MIMO users and meet the 500 µs real-time requirement in 5G NR.
A sub-millisecond scheduler for multi-cell MIMO networks under C-RAN architecture: We investigate the scheduling problem for a multi-cell environment under a centralized architecture and present M3, a GPU-based real-time scheduler that jointly determines a scheduling solution among multiple cells. M3 can find the scheduling solution within 1 ms.
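As an illustration of the effective-rank-threshold idea described above, the hypothetical helper below counts how many eigendirections of an interference channel are strong enough to be worth cancelling with DoFs. The power-fraction rule is an assumption made for this sketch, not the dissertation's exact criterion.

```cpp
#include <vector>
#include <algorithm>
#include <functional>

// Given the singular values of an interference channel, keep only the
// directions carrying the dominant share of interference power; weaker
// directions are treated as noise and consume no DoFs.
int effective_rank(std::vector<float> sv, float power_fraction)
{
    std::sort(sv.begin(), sv.end(), std::greater<float>());
    float total = 0.f;
    for (float s : sv) total += s * s;      // channel power = sum of sigma^2

    float kept = 0.f;
    int rank = 0;
    for (float s : sv) {
        if (kept >= power_fraction * total) break;
        kept += s * s;
        ++rank;                             // one DoF per strong direction
    }
    return rank;
}
```

For example, effective_rank({10, 9, 0.1, 0.05}, 0.99f) returns 2: only two directions carry meaningful interference power, so only two DoFs are spent on cancellation.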
637

CPU/GPU Code Acceleration on Heterogeneous Systems and Code Verification for CFD Applications

Xue, Weicheng 25 January 2021 (has links)
Computational Fluid Dynamics (CFD) applications usually involve intensive computations, which can be accelerated using hardware accelerators, especially GPUs, given their widespread use in the scientific computing community. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented numerically correctly, a process called code verification. This dissertation focuses on accelerating research CFD codes on multi-CPU/GPU systems using MPI and OpenACC, as well as code verification of turbulence model implementations using the method of manufactured solutions and code-to-code comparisons. First, a variety of performance optimizations, both agnostic and specific to applications and platforms, are developed in order to 1) improve heterogeneous CPU/GPU compute utilization; 2) improve memory bandwidth to main memory; 3) reduce communication overhead between the CPU host and the GPU accelerator; and 4) reduce tedious manual tuning of GPU scheduling. Both finite difference and finite volume CFD codes, and multiple platforms with different architectures, are used to evaluate these optimizations. A maximum speedup of over 70x is achieved on 16 V100 GPUs over 16 Xeon E5-2680v4 CPUs for multi-block test cases. In addition, systematic code verification studies are performed for a second-order accurate finite volume research CFD code. Cross-term sinusoidal manufactured solutions are applied to verify the Spalart-Allmaras and k-omega SST model implementations, both in 2D and 3D. This dissertation shows that the spatial and temporal schemes are implemented numerically correctly. / Doctor of Philosophy / Computational Fluid Dynamics (CFD) is a numerical method for solving fluid problems that usually requires a large amount of computation. A large CFD problem can be decomposed into smaller sub-problems which are stored in discrete memory locations and accelerated by a large number of compute units. In addition to code acceleration, it is important to ensure that the code and algorithm are implemented correctly, a process called code verification. This dissertation focuses on CFD code acceleration as well as code verification of turbulence model implementations. Multiple Graphics Processing Units (GPUs) are used to accelerate two CFD codes, given the GPU's high computational power and high memory bandwidth. A variety of optimizations are developed and applied to improve the performance of the CFD codes on different parallel computing systems; program execution time can be reduced significantly, especially when multiple GPUs are used. In addition, code-to-code comparisons with NASA CFD codes and the method of manufactured solutions are used to verify the correctness of a research CFD code.
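For a flavor of the directive-based approach, the sketch below offloads a structured-grid Jacobi sweep with OpenACC. It is a generic C++ illustration assumed for this listing, not code from the accelerated solvers themselves.

```cpp
// Minimal OpenACC sketch: a Jacobi sweep offloaded to the GPU with
// directives rather than hand-written CUDA. The present() clause assumes
// an enclosing "#pragma acc data" region has already created device
// copies, so the arrays stay resident between sweeps; avoiding repeated
// host/device transfers is one of the memory-traffic optimizations
// such work depends on.
void jacobi_sweep(const double* u, double* unew, int nx, int ny)
{
    #pragma acc parallel loop collapse(2) present(u, unew)
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            unew[j * nx + i] = 0.25 * (u[j * nx + i - 1] + u[j * nx + i + 1] +
                                       u[(j - 1) * nx + i] + u[(j + 1) * nx + i]);
        }
    }
}
```

Compiled without OpenACC support, the pragmas are ignored and the same code runs serially on the CPU, which is the portability argument made for directive-based models.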
638

Real-Time GPU Scheduling with Preemption Support for Autonomous Mobile Robots

Bharmal, Burhanuddin Asifhusain 18 January 2022 (has links)
The use of graphics processing units (GPUs) in autonomous robots has grown recently due to their efficiency and suitability for data-intensive computation. However, current embedded GPU platforms may lack sufficient real-time capabilities for safety-critical autonomous systems: the GPU driver provides little to no control over the execution of computational kernels and, on integrated GPUs, does not allow multiple kernels to execute concurrently. With the development of modern embedded platforms with integrated GPUs, many embedded applications are GPU-accelerated; these applications are computationally intensive and often have different criticality levels. In this thesis, we provide a software-based approach to scheduling real-world robotics applications under two scheduling policies: Fixed Priority FIFO scheduling and Earliest Deadline First scheduling. We implement several applications commonly used in autonomous mobile robots, such as path planning, object detection, and depth estimation, and improve their response times. We test our framework on the NVIDIA AGX Xavier, which provides high computing power and supports eight different power modes. We measure the response times of all three applications with and without the scheduler on the NVIDIA AGX Xavier platform under different power modes to evaluate the effectiveness of the scheduler. / Master of Science / The use of autonomous mobile robots for general human services has increased significantly with ever-growing technology. Common applications of these robots include delivery services, search and rescue, and hotel services. This thesis focuses on implementing the computational tasks performed by these robots as well as designing a task scheduler to improve the overall performance of these tasks. The embedded hardware is resource-constrained, with limited memory, power, and operating frequency. The use of a graphics processing unit (GPU) to speed up task execution has increased with the development of GPU programming frameworks. We propose a software-based GPU scheduler that executes functions on the GPU and gets the best possible performance from the embedded hardware.
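A minimal host-side sketch of the Earliest Deadline First policy appears below; the types and class names are hypothetical, and a real scheduler along these lines would add preemption points and GPU stream management as the thesis describes.

```cpp
#include <queue>
#include <vector>
#include <functional>
#include <utility>
#include <cstdint>

// Sketch of an EDF dispatcher: pending GPU jobs are queued with their
// absolute deadlines, and the dispatcher always releases the most
// urgent one. The launch closure would wrap the actual kernel launch.
struct GpuJob {
    uint64_t deadline_us;            // absolute deadline in microseconds
    std::function<void()> launch;    // work to run when dispatched
};

struct ByDeadline {
    bool operator()(const GpuJob& a, const GpuJob& b) const {
        return a.deadline_us > b.deadline_us;   // min-heap on deadline
    }
};

class EdfScheduler {
    std::priority_queue<GpuJob, std::vector<GpuJob>, ByDeadline> q_;
public:
    void submit(GpuJob job) { q_.push(std::move(job)); }

    // Dispatch the job with the earliest deadline, if any is pending.
    bool dispatch_next() {
        if (q_.empty()) return false;
        GpuJob job = q_.top();
        q_.pop();
        job.launch();
        return true;
    }
};
```

Swapping the comparator for a fixed-priority ordering yields the Fixed Priority FIFO policy, which is why the two policies can share one scheduling framework.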
639

Algorithms and Frameworks for Accelerating Security Applications on HPC Platforms

Yu, Xiaodong 09 September 2019 (has links)
Typical cybersecurity solutions emphasize achieving defense functionality. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices' capacities, while sophisticated implementations are quite difficult: they require an in-depth understanding of both cybersecurity domain-specific characteristics and the HPC architecture and system model. In our work, we investigate three sub-areas of cybersecurity: mobile software security, network security, and system security. They have the following performance issues, respectively: 1) Flow- and context-sensitive static analysis of large and complex Android APKs is extremely time-consuming; existing CPU-only frameworks/tools have to set a timeout threshold that ceases the analysis, trading precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as their search core and require line-speed processing, but achieving high-speed automata processing is exceptionally difficult in both algorithmic and implementation terms. 3) It is unclear how cache configurations impact the performance of time-driven cache side-channel attacks; this question remains open because it is difficult to conduct comparative measurements to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC hardware for faster and more scalable cybersecurity execution. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups over a plain GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces memory overhead by up to 400X. We also present tools for improving the usability of Micron's Automata Processor. To study the impact of cache configurations on the performance of time-driven cache side-channel attacks, we design an approach for conducting comparative measurements: we propose a quantifiable success-rate metric to measure the performance of time-driven cache attacks and use the GEM5 platform to emulate configurable caches. / Doctor of Philosophy / Typical cybersecurity solutions emphasize achieving defense functionality. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of applications onto High-Performance Computing (HPC) platforms may significantly underutilize the HPC devices' capacities. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize various types of HPC executions for cybersecurity. We investigate several sub-areas, including mobile software security, network security, and system security. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups over an unoptimized GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces memory overhead by up to 400X.
We also present tools for improving the usability of HPC programming. To study the impact of cache configurations on the performance of time-driven cache side-channel attacks, we design an approach for conducting comparative measurements: we propose a quantifiable success-rate metric to measure the performance of time-driven cache attacks and use the GEM5 platform to emulate configurable caches.
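As background for the automata-processing core mentioned above, the following sketch shows a baseline GPU DFA matcher with one thread per packet; it is a textbook illustration with hypothetical names, not the optimized design from the dissertation.

```cpp
#include <cstdint>

// Each thread walks one packet payload through a DFA given as a dense
// transition table (numStates x 256 entries). "accept" flags accepting
// states; offsets[p]..offsets[p+1] delimit packet p's bytes in
// "payloads". Real NIDS engines compress the table and handle packet
// reordering, which is where the state-explosion problem arises.
__global__ void dfa_match(const int* trans, const uint8_t* accept,
                          const uint8_t* payloads, const int* offsets,
                          int nPackets, uint8_t* matched)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPackets) return;

    int state = 0;                        // DFA start state
    uint8_t hit = 0;
    for (int i = offsets[p]; i < offsets[p + 1]; ++i) {
        state = trans[state * 256 + payloads[i]];
        hit |= accept[state];             // remember any accepting state
    }
    matched[p] = hit;
}
```

The inner loop is a chain of dependent memory loads, which is why memory layout of the transition table dominates performance in such matchers.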
640

Multi-level Parallelism with MPI and OpenACC for CFD Applications

McCall, Andrew James 14 June 2017 (has links)
High-level parallel programming approaches such as OpenACC have recently become popular in complex fluid dynamics research because they are cross-platform and easy to implement. OpenACC is a directive-based programming model that, unlike low-level programming models, abstracts away the details of GPU implementation. Although OpenACC generally limits the achievable GPU performance, it significantly reduces the work required to port an existing code to any accelerator platform, including GPUs. The purpose of this research is twofold: to investigate the effectiveness of OpenACC for developing portable and maintainable GPU-accelerated code, and to determine the capability of OpenACC to accelerate large, complex programs on the GPU. In both studies, the OpenACC implementation is optimized and extended to a multi-GPU implementation while maintaining a unified code base. OpenACC is shown to be a viable option for GPU computing with CFD problems. In the first study, a CFD code that solves incompressible cavity flows is accelerated using OpenACC. Overlapping communication with computation improves performance for the multi-GPU implementation by up to 21%, achieving up to 400 times faster performance than a single CPU and 99% weak scaling efficiency with 32 GPUs. The second study ports the execution of a more complex CFD research code to the GPU using OpenACC. Challenges in using OpenACC with modern Fortran are discussed. Three test cases are used to evaluate performance and scalability. The multi-GPU performance using 27 GPUs is up to 100 times faster than a single CPU while maintaining a weak scaling efficiency of 95%. / Master of Science
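The communication/computation overlap credited with the 21% gain can be sketched in MPI for a 1D halo exchange, as below. The stencil and names are placeholders assumed for illustration (the thesis codes are Fortran), but the pattern is the standard one: post non-blocking transfers, update interior cells that need no halo data while messages are in flight, then finish the boundary cells.

```cpp
#include <mpi.h>

// One overlapped timestep of a toy 1D averaging stencil. "left" and
// "right" are neighbor ranks (or MPI_PROC_NULL at domain ends).
void overlapped_step(double* u, double* unew, int n,
                     int left, int right, MPI_Comm comm)
{
    double halo_lo, halo_hi;
    MPI_Request reqs[4];
    MPI_Irecv(&halo_lo, 1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&halo_hi, 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(&u[0],     1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&u[n - 1], 1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    // Interior update needs no halo data, so it overlaps the exchange.
    for (int i = 1; i < n - 1; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    unew[0]     = 0.5 * (halo_lo + u[1]);        // boundary cells last
    unew[n - 1] = 0.5 * (u[n - 2] + halo_hi);
}
```

In the multi-GPU setting the same idea applies with device buffers and an asynchronous interior kernel launched before the MPI_Waitall.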
