About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
221

High-performance communication infrastructure design on FPGA-centric clusters

Yang, Chen 29 September 2019 (has links)
FPGA-Centric Clusters (FCCs), in which the FPGAs are directly linked through their Multi-Gigabit Transceivers (MGTs), have a proven advantage over other commodity architectures for communication-bound applications. To date, however, communication infrastructure for such clusters has generally taken only one of two simple approaches: nearest-neighbor-only, which is fast but of limited utility, and processor-based, which is general but slow. The overall problem addressed in this dissertation is the architecture, design, and implementation of communication networks for FCCs. These network designs should take advantage of the decades of design experience in networks for High-Performance Computing (HPC) clusters, but should also account for, and take advantage of, unique characteristics of FCCs, in particular the configurability of the FPGAs themselves. This dissertation has seven parts. We begin with in-depth implementations of two model applications, Directional Dark Matter (DM) Detection and Molecular Dynamics (MD). These implementations expose the necessary characteristics of FCC networks from the physical through the application layers. The second part is the systematic exploration of communication microarchitecture for FCCs, as has been done previously for HPC clusters and for Networks on Chips (NoCs) on both FPGAs and ASICs. One outcome of this part is to identify the properties of FCCs that substantially influence the router design space. Another outcome is to create a selection of candidate routers and generalize it so that it is parameterized by routing algorithm, arbitration policy, number of virtual channels (VCs), and other parameters. The third part uses the proposed application-aware framework to evaluate the resulting design space with respect to a number of common communication patterns and packet sizes. The results from this part enable two sets of designs. One is the selection of an optimal router for a given resource budget that accounts for all the workloads. The other takes advantage of FPGA reconfigurability to select the optimal router accounting for both the resource budget and a particular workload. The fourth part evaluates the advantages of this approach of adapting the router design to the application. We find that the optimality of the router design varies significantly with workloads. We observe that, compared with the router configuration with the best average performance, application-aware router selection can lead to substantial improvement in performance or reduction in resources required. The fifth part is application-specific optimization, in which we develop several modules and functional units that provide specific optimizations for certain types of communication workloads depending on the application the network is going to serve. The sixth part explores topology emulation, e.g., when a three-dimensional network is used in the computation of an application that is logically two-dimensional. We propose a generalized fold-and-cut mechanism that preserves locality in the logical mapping while also making use of the extra links provided by our 3D-torus fixture. The seventh part is a table-based, statically scheduled router for applications with a static or persistent communication pattern. The router supports various cases, including unicast, multicast, and reduction. By making routing decisions a priori, we can bring better load balance to network links and reduce congestion.
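As an illustration of the seventh part's table-based idea, the sketch below precomputes per-node forwarding tables for a known set of flows on a small 3D torus, so that each run-time routing decision reduces to a single lookup. The node naming, port encoding, and dimension-order policy are illustrative assumptions, not details taken from the dissertation.

```python
# Hypothetical sketch: static, table-based routing on a 3D torus.
# Routes for a known communication pattern are computed offline (here with
# dimension-order routing), so a router's per-hop decision is a table lookup.

def next_hop(node, dst, dims=(4, 4, 4)):
    """Dimension-order routing with wraparound: step along x, then y, then z."""
    for axis in range(3):
        if node[axis] != dst[axis]:
            size = dims[axis]
            fwd = (dst[axis] - node[axis]) % size      # hops going "up"
            step = 1 if fwd <= size - fwd else -1      # shorter torus direction
            nxt = list(node)
            nxt[axis] = (node[axis] + step) % size
            port = ("x+", "x-", "y+", "y-", "z+", "z-")[2 * axis + (step < 0)]
            return port, tuple(nxt)
    return None, node                                  # already at destination

def build_tables(flows, dims=(4, 4, 4)):
    """Precompute per-node forwarding tables for a static set of (src, dst) flows."""
    tables = {}
    for src, dst in flows:
        node = src
        while node != dst:
            port, nxt = next_hop(node, dst, dims)
            tables.setdefault(node, {})[dst] = port    # run-time lookup is O(1)
            node = nxt
    return tables

tables = build_tables([((0, 0, 0), (2, 3, 1))])
```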
222

Gene-Environment Interaction Analysis Using Graphic Cards / Analys av gen-miljöinteraktion med användning av grafikkort

Berglund, Daniel January 2015 (has links)
Genome-wide association studies (GWAS) are used to find associations between genetic markers and diseases. One part of GWAS is to study interactions between markers, which can play an important role in the risk for the disease. The search for interactions can be computationally intensive. The aim of this thesis was to improve the performance of software used for gene-environment interaction analysis by using parallel programming techniques on graphical processors. A study of the new program's performance, speedup, and efficiency was made using multiple simulated datasets. The program shows significantly better performance compared with the older program.
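A hedged sketch of the pairwise interaction scan that such software parallelizes; the thesis offloads this loop to the GPU, whereas this NumPy version only illustrates the shape of the computation, and the correlation-based statistic is a deliberately crude stand-in for a real interaction test.

```python
import numpy as np
from itertools import combinations

def interaction_scan(genotypes, phenotype):
    """genotypes: (n_samples, n_markers) array of 0/1/2 allele counts;
    phenotype: 0/1 case-control status. Scores every marker pair."""
    scores = {}
    for i, j in combinations(range(genotypes.shape[1]), 2):
        inter = genotypes[:, i] * genotypes[:, j]       # interaction term
        r = np.corrcoef(inter, phenotype)[0, 1]         # crude association score
        scores[(i, j)] = 0.0 if np.isnan(r) else abs(r)
    return scores

rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=(200, 20))                  # 20 markers -> 190 pairs
scores = interaction_scan(g, rng.integers(0, 2, size=200))
top_pair = max(scores, key=scores.get)                  # strongest candidate pair
```

The pair loop is embarrassingly parallel, which is what makes it a natural fit for a graphics processor.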
223

Automating telemetry- and trace-based analytics on large-scale distributed systems

Ateş, Emre 28 September 2020 (has links)
Large-scale distributed systems---such as supercomputers, cloud computing platforms, and distributed applications---routinely suffer from slowdowns and crashes due to software and hardware problems, resulting in reduced efficiency and wasted resources. These large-scale systems typically deploy monitoring or tracing systems that gather a variety of statistics about the state of the hardware and the software. State-of-the-art methods either analyze this data manually or design unique automated methods for each specific problem. This thesis builds on the vision that generalized automated analytics methods on the data sets collected from these complex computing systems provide critical information about the causes of the problems, and this analysis can then enable proactive management to improve performance, resilience, efficiency, or security significantly beyond current limits. This thesis seeks to design scalable, automated analytics methods and frameworks for large-scale distributed systems that minimize dependency on expert knowledge, automate parts of the solution process, and help make systems more resilient. In addition to analyzing data that is already collected from systems, our frameworks also identify what to collect from where in the system, such that the collected data would be concise and useful for manual analytics. We focus on two data sources for conducting analytics: numeric telemetry data, which is typically collected from operating system or hardware counters, and end-to-end traces collected from distributed applications. This thesis makes the following contributions in large-scale distributed systems: (1) Designing a framework for accurately diagnosing previously encountered performance variations, (2) designing a technique for detecting (unwanted) applications running on the systems, (3) developing a suite for reproducing performance variations that can be used to systematically develop analytics methods, (4) designing a method to explain predictions of black-box machine learning frameworks, and (5) constructing an end-to-end tracing framework that can dynamically adjust instrumentation for effective diagnosis of performance problems.
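A minimal sketch of the diagnosis idea for previously encountered performance variations, assuming labeled training runs are available: each run's telemetry is summarized into simple statistical features and fed to a supervised classifier. The feature set, the labels, and the choice of a random forest are illustrative assumptions, not the thesis's actual framework.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def featurize(run):
    """run: (n_metrics, n_timesteps) telemetry for one application run;
    summarize each metric's time series into a few statistics."""
    return np.concatenate([run.mean(axis=1), run.std(axis=1),
                           np.percentile(run, 95, axis=1)])

rng = np.random.default_rng(1)
# Synthetic stand-ins for runs with known behavior (labels assumed available).
runs = [rng.normal(loc, 1.0, size=(8, 300)) for loc in (0, 0, 3, 3)]
labels = ["healthy", "healthy", "contention", "contention"]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit([featurize(r) for r in runs], labels)
new_run = rng.normal(3, 1.0, size=(8, 300))
diagnosis = clf.predict([featurize(new_run)])[0]        # -> "contention"
```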
224

HPCC-Based Platform for COPD Readmission Risk Analysis with Implementation of Dimensionality Reduction and Balancing Techniques

Unknown Date (has links)
Hospital readmission rates are considered to be an important indicator of quality of care because they may be a consequence of actions of commission or omission made during the initial hospitalization of the patient, or a consequence of a poorly managed transition of the patient back into the community. The negative impact on patient quality of life and the huge burden on the healthcare system have made reducing hospital readmissions a central goal of healthcare delivery and payment reform efforts. In this study, we propose a framework for how readmission analysis and other healthcare models could be deployed in the real world, together with a machine-learning-based solution that uses patients' discharge summaries as the dataset for training and testing the model. Current systems neglect a very important aspect of the readmission problem: Big Data. This study therefore also addresses the Big Data aspect of solutions that can be deployed for real-world use. We used the HPCC compute platform, which provides a distributed parallel programming environment to create, run, and manage applications involving large amounts of data. We also propose feature engineering and data balancing techniques that have been shown to greatly enhance machine learning model performance; this was achieved by reducing the dimensionality of the data and fixing the imbalance in the dataset. The system presented in this study provides real-world machine-learning-based predictive modeling for reducing readmissions, which could be templatized for other diseases. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
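A hypothetical single-node sketch of the two supporting techniques, dimensionality reduction and class balancing; the actual system runs on the distributed HPCC platform, and the PCA, naive minority oversampling, and logistic regression shown here are stand-ins chosen only to illustrate the idea.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 200))              # stand-in for encoded summaries
y = (rng.random(1000) < 0.1).astype(int)      # ~10% readmissions: imbalanced

X_red = PCA(n_components=20).fit_transform(X)  # dimensionality reduction

minority = np.flatnonzero(y == 1)              # naive minority oversampling
extra = rng.choice(minority, size=(y == 0).sum() - minority.size)
X_bal = np.vstack([X_red, X_red[extra]])
y_bal = np.concatenate([y, y[extra]])

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```

Balancing after reduction keeps the oversampled copies cheap; on a real pipeline the same two steps would run as distributed HPCC jobs.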
225

Statistical Techniques to Model and Optimize Performance of Scientific, Numerically Intensive Workloads

Steven Monteiro, Steena Dominica 01 December 2016 (has links)
Projecting performance of applications and hardware is important to several market segments—hardware designers, software developers, supercomputing centers, and end users. Hardware designers estimate performance of current applications on future systems when designing new hardware. Software developers make performance estimates to evaluate performance of their code on different architectures and input datasets. Supercomputing centers try to optimize the process of matching computing resources to computing needs. End users requesting time on supercomputers must provide estimates of their application’s run time, and incorrect estimates can lead to wasted supercomputing resources and time. However, application performance is challenging to predict because it is affected by several factors in application code, specifications of system hardware, choice of compilers, compiler flags, and libraries. This dissertation uses statistical techniques to model and optimize performance of scientific applications across different computer processors. The first study in this research offers statistical models that predict performance of an application across different input datasets prior to application execution. These models guide end users to select parameters that produce optimal application performance during execution. The second study offers a suite of statistical models that predict performance of a new application on a new processor. Both studies present statistical techniques that can be generalized to analyze, optimize, and predict performance of diverse computation- and data-intensive applications on different hardware.
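A minimal sketch of the first study's premise, predicting run time from input parameters before execution; the polynomial ridge regression and the synthetic parameters are illustrative assumptions, not the dissertation's actual models.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
params = rng.uniform(1, 100, size=(300, 2))   # e.g. grid size, timestep count
# Synthetic ground truth: run time grows nonlinearly with the inputs.
runtime = 0.01 * params[:, 0] ** 2 * params[:, 1] + rng.normal(0, 5, 300)

model = make_pipeline(PolynomialFeatures(degree=3), Ridge())
model.fit(params, runtime)
predicted = model.predict([[50, 10]])          # estimate before ever running
```

A model like this lets an end user quote a defensible wall-time estimate when requesting supercomputer allocations.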
226

Turbulence Modeling of Strongly Heated Internal Pipe Flow Using Large Eddy Simulation

Hradisky, Michal 01 May 2011 (has links)
The main objective of this study was to evaluate the performance of three Large Eddy Simulation (LES) subgrid-scale (SGS) models on a strongly heated, low Mach number upward gas flow in a vertical pipe with forced convection. The models chosen for this study were the Smagorinsky-Lilly Dynamic model (SLD), the Kinetic Energy Transport model (KET), and the Wall-Adaptive Local-Eddy viscosity model (WALE). The heating rate used was sufficiently large to cause properties to vary significantly in both the radial and streamwise directions. All simulations were carried out using the commercial software FLUENT. The effect of inlet turbulence generation techniques was considered as well. Three inlet turbulence generation techniques were compared, namely the Spectral Synthesizer Method (SSM), the Vortex Method (VM), and the Generator (GEN) technique. A user-defined function (UDF) was written to implement the GEN technique in the solver; the SSM and VM techniques were already built in. All simulation and solver settings were validated by performing computational simulations of isothermal fully developed pipe flow, and results were compared to available experimental and Direct Numerical Simulation (DNS) data. For isothermal boundary conditions, among the three inlet turbulence generation techniques, the GEN technique produced results that best matched the experimental and DNS results. All three LES SGS models performed equally well when coupled with the GEN technique for the study of isothermal pipe flow. However, all models incorrectly predicted the behavior of radial and circumferential velocity fluctuations near the wall, and the GEN technique proved to be the most computationally expensive. For simulations with a longer computational domain, the effect of the inlet turbulence generation technique diminishes. However, results suggest that both the SLD and KET models need shorter computational domains to recover proper LES behavior when coupled with the VM technique than does the WALE SGS model with the same inlet turbulence generation technique. For the high-heat-flux simulations, all SGS models were coupled with the VM technique to decrease the computational effort needed to obtain a statistically steady-state solution. For comparative purposes, one simulation was carried out using the WALE model with the GEN technique. All simulations significantly, and to a similar degree, underpredicted the streamwise temperature distribution along the pipe wall as well as in the radial direction at various streamwise locations. These effects are attributed to the overpredicted streamwise velocity components and the incorrect behavior of both the radial and circumferential velocity components in the near-wall region for all subgrid-scale models.
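For reference, all three SGS models supply an eddy viscosity for the filtered Navier-Stokes equations; the classical Smagorinsky closure, which the dynamic SLD model calibrates on the fly, has the standard textbook form (not reproduced from the thesis):

```latex
\nu_t = (C_s \Delta)^2 \,\lvert \bar{S} \rvert, \qquad
\lvert \bar{S} \rvert = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}}, \qquad
\bar{S}_{ij} = \frac{1}{2}\left(
  \frac{\partial \bar{u}_i}{\partial x_j}
+ \frac{\partial \bar{u}_j}{\partial x_i}\right)
```

Here C_s is the Smagorinsky constant and Δ the filter width; the dynamic variant computes C_s from the resolved field instead of fixing it, and WALE and KET replace this closure with their own.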
227

Scalable event tracking on high-end parallel systems

Mohror, Kathryn Marie 01 January 2010 (has links)
Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of a number of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, the collection of event-based data presents scalability challenges itself: the large volume of collected data increases tool overhead and results in data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling that collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data measured and stored by capitalizing on the repeated behavior of programs, and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing, collecting summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that comprised most of the execution time.
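A hypothetical sketch of the core trace-profiling intuition: detect repeated event sequences and store each distinct pattern once, with a count and summary statistics, rather than logging every event. The fixed-length window is a deliberate simplification of the dissertation's pattern detection.

```python
from collections import defaultdict

def trace_profile(events, window=4):
    """events: list of (event_name, duration) tuples from one process.
    Collapse repeated event sequences into per-pattern summaries."""
    patterns = defaultdict(lambda: {"count": 0, "total_time": 0.0})
    for i in range(0, len(events) - window + 1, window):
        chunk = events[i:i + window]
        key = tuple(name for name, _ in chunk)          # the behavior category
        patterns[key]["count"] += 1
        patterns[key]["total_time"] += sum(d for _, d in chunk)
    return dict(patterns)

trace = [("mpi_send", 0.1), ("compute", 2.0),
         ("mpi_recv", 0.2), ("io", 0.4)] * 250
profile = trace_profile(trace)   # 1000 events collapse to one pattern entry
```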
228

Power, Performance and Energy Models and Systems for Emergent Architectures

Song, Shuaiwen 10 April 2013 (has links)
Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency. A key limitation of past approaches is a lack of power-performance policies allowing users to quantitatively bound the effects of power management on the performance of their applications and systems. Existing controllers and predictors use policies fixed by a knowledgeable user to opportunistically save energy and minimize performance impact. While the qualitative effects are often good and the aggressiveness of a controller can be tuned to try to save more or less energy, the quantitative effects of tuning and setting opportunistic policies on performance and power are unknown. In other words, the controller will save energy and minimize performance loss in many cases but we have little understanding of the quantitative effects of controller tuning. This makes setting power-performance policies a manual trial and error process for domain experts and a black art for practitioners. To improve upon past approaches to high-performance power management, we need to quantitatively understand the effects of power and performance at scale. In this work, I have developed theories and techniques to quantitatively understand the relationship between power and performance for high performance systems at scale. For instance, our system-level, iso-energy-efficiency model analyzes, evaluates and predicts the performance and energy use of data intensive parallel applications on multi-core systems. This model allows users to study the effects of machine and application dependent characteristics on system energy efficiency. Furthermore, this model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency.  I have also developed methodologies which can be extended and applied to model modern heterogeneous architectures such as GPU-based clusters to improve their efficiency at scale. / Ph. D.
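A schematic paraphrase (not the dissertation's actual formulation) of what an iso-energy-efficiency model relates:

```latex
EE(N) \;=\; \frac{W}{E(N)}
      \;=\; \frac{W}{\sum_{i=1}^{N} \int_{0}^{T_N} P_i(t)\,dt}
```

Here W is the useful work, N the core count, T_N the N-core run time, and P_i(t) the per-core power draw; the iso-energy-efficiency question is how W must scale with N so that EE(N) stays constant as the system grows.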
229

Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system

Shelton, Benjamin H. 31 May 2013 (has links)
As manufacturers introduce new machines with more cores, more NUMA-like architectures, and more tightly integrated heterogeneous processors, the traditional abstraction of a monolithic OS running on an SMP system is encountering new challenges.  One proposed path forward is the multikernel operating system.  Previous efforts have shown promising results both in scalability and in support for heterogeneity.  However, one effort's source code is not freely available (FOS), and the other effort is not self-hosting and does not support a majority of existing applications (Barrelfish). In this thesis, we present Popcorn, a Linux-based multikernel operating system.  While Popcorn was a group effort, the boot layer code and the memory partitioning code are the author's work, and we present them in detail here.  To our knowledge, we are the first to support multiple instances of the Linux kernel on a 64-bit x86 machine and to support more than 4 kernels running simultaneously. We demonstrate that existing subsystems within Linux can be leveraged to meet the design goals of a multikernel OS.  Taking this approach, we developed a fast inter-kernel network driver and messaging layer.  We demonstrate that the network driver can share a 1 Gbit/s link without degraded performance and that, in combination with guest kernels, it meets or exceeds the performance of SMP Linux with an event-based web server.  We evaluate the messaging layer with microbenchmarks and conclude that it performs well given the limitations of current x86-64 hardware.  Finally, we use the messaging layer to provide live process migration between cores. / Master of Science
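A user-space Python sketch of the kind of shared-memory ring buffer such a messaging layer builds on, with the head index published only after each write; Popcorn's real layer is kernel-space C over memory shared between kernel instances, so everything below, including the slot layout, is an illustrative analogy rather than its implementation.

```python
from multiprocessing import shared_memory

SLOTS, SLOT_SIZE = 8, 64            # illustrative sizes, not Popcorn's layout

class Ring:
    """Single-producer/single-consumer ring over a shared memory region.
    Byte 0 is the head (write) index, byte 1 the tail (read) index."""
    def __init__(self, name=None):
        size = 2 + SLOTS * SLOT_SIZE
        if name is None:                       # creator ("kernel" A)
            self.shm = shared_memory.SharedMemory(create=True, size=size)
            self.shm.buf[:2] = bytes(2)        # head = tail = 0
        else:                                  # attacher ("kernel" B)
            self.shm = shared_memory.SharedMemory(name=name)

    def send(self, payload: bytes):
        assert len(payload) <= SLOT_SIZE
        head, tail = self.shm.buf[0], self.shm.buf[1]
        assert (head + 1) % SLOTS != tail, "ring full"
        off = 2 + head * SLOT_SIZE
        self.shm.buf[off:off + SLOT_SIZE] = payload.ljust(SLOT_SIZE, b"\0")
        self.shm.buf[0] = (head + 1) % SLOTS   # publish only after the write

    def recv(self) -> bytes:
        head, tail = self.shm.buf[0], self.shm.buf[1]
        assert head != tail, "ring empty"
        off = 2 + tail * SLOT_SIZE
        msg = bytes(self.shm.buf[off:off + SLOT_SIZE])
        self.shm.buf[1] = (tail + 1) % SLOTS
        return msg

ring = Ring()                        # one side creates the region
peer = Ring(name=ring.shm.name)      # the other side attaches by name
ring.send(b"ping")
assert peer.recv().rstrip(b"\0") == b"ping"
```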
230

Study on Propulsive Characteristics of Magnetic Sail and Magneto Plasma Sail by Plasma Particle Simulations / 粒子シミュレーションによる磁気セイル・磁気プラズマセイルの推力特性に関する研究

Ashida, Yasumasa 23 January 2014 (has links)
Kyoto University / 0048 / New degree system, doctoral program / Doctor of Engineering / Kō No. 17984 / Engineering Doctorate No. 3813 / New system||Engineering||1584 (University Library) / 80828 / Department of Electrical Engineering, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor 山川 宏, Professor 松尾 哲司, Associate Professor 中村 武恒 / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
