Global ETD Search

271	Impact of data dependencies for real-time high performance computing. Hossain, M. Alamgir, Kabir, U., Tokhi, M.O. January 2002 (has links) No / This paper presents an investigation into the impact of data dependencies in real-time high performance sequential and parallel processing. An adaptive active vibration control algorithm is considered to demonstrate the impact of data dependencies in real-time computing. The algorithm is analysed in detail to explore the inherent data dependencies. To minimize the impact of data dependencies, an investigation into reducing memory access in sequential computing is provided. The impact of data dependencies with various interconnections is also explored and demonstrated in real-time parallel processing through a set of experiments. Active vibration control Data dependency High performance computing Memory management Finite difference method Real-time control
272	Toward full-stack in silico synthetic biology: integrating model specification, simulation, verification, and biological compilation Konur, Savas, Mierla, L.M., Fellermann, H., Ladroue, C., Brown, B., Wipat, A., Twycross, J., Dun, B.P., Kalvala, S., Gheorghe, Marian, Krasnogor, N. 02 August 2021 (has links) Yes / We present the Infobiotics Workbench (IBW), a user-friendly, scalable, and integrated computational environment for the computer-aided design of synthetic biological systems. It supports an iterative workflow that begins with specification of the desired synthetic system, followed by simulation and verification of the system in high- performance environments and ending with the eventual compilation of the system specification into suitable genetic constructs. IBW integrates modelling, simulation, verification and bicompilation features into a single software suite. This integration is achieved through a new domain-specific biological programming language, the Infobiotics Language (IBL), which tightly combines these different aspects of in silico synthetic biology into a full-stack integrated development environment. Unlike existing synthetic biology modelling or specification languages, IBL uniquely blends modelling, verification and biocompilation statements into a single file. This allows biologists to incorporate design constraints within the specification file rather than using decoupled and independent formalisms for different in silico analyses. This novel approach offers seamless interoperability across different tools as well as compatibility with SBOL and SBML frameworks and removes the burden of doing manual translations for standalone applications. We demonstrate the features, usability, and effectiveness of IBW and IBL using well-established synthetic biological circuits. / The work of S.K. is supported by EPSRC (EP/R043787/1). N.K., A.W., and B.B. acknowledge a Royal Academy of Engineering Chair in Emerging Technologies award and an EPSRC programme grant (EP/N031962/1). Synthetic biology Computational biology In silico Modelling Simulation Verification Biocompilation High performance computing SBOL SBML
273	A Probabilistic Classification Algorithm With Soft Classification Output Phillips, Rhonda D. 23 April 2009 (has links) This thesis presents a shared memory parallel version of the hybrid classification algorithm IGSCR (iterative guided spectral class rejection), a novel data reduction technique that can be used in conjunction with PIGSCR (parallel IGSCR), a noise removal method based on the maximum noise fraction (MNF), and a continuous version of IGSCR (CIGSCR) that outputs soft classifications. All of the above are either classification algorithms or preprocessing algorithms necessary prior to the classification of high dimensional, noisy images. PIGSCR was developed to produce fast and portable code using Fortran 95, OpenMP, and the Hierarchical Data Format version 5 (HDF5) and accompanying data access library. The feature reduction method introduced in this thesis is based on the singular value decomposition (SVD). This feature reduction technique demonstrated that SVD-based feature reduction can lead to more accurate IGSCR classifications than PCA-based feature reduction. This thesis describes a new algorithm used to adaptively filter a remote sensing dataset based on signal-to-noise ratios (SNRs) once the maximum noise fraction (MNF) has been applied. The adaptive filtering scheme improves image quality as shown by estimated SNRs and classification accuracy improvements greater than 10%. The continuous iterative guided spectral class rejection (CIGSCR) classification method is based on the iterative guided spectral class rejection (IGSCR) classification method for remotely sensed data. Both CIGSCR and IGSCR use semisupervised clustering to locate clusters that are associated with classes in a classification scheme. This type of semisupervised classification method is particularly useful in remote sensing where datasets are large, training data are difficult to acquire, and clustering makes the identification of subclasses adequate for training purposes less difficult. Experimental results indicate that the soft classification output by CIGSCR is reasonably accurate (when compared to IGSCR), and the fundamental algorithmic changes in CIGSCR (from IGSCR) result in CIGSCR being less sensitive to input parameters that influence iterations. / Ph. D. classification data reduction parallel processing remote sensing high performance computing cluster evaluation
274	Theories and Techniques for Efficient High-End Computing Ge, Rong 02 November 2007 (has links) Today, power consumption costs supercomputer centers millions of dollars annually and the heat produced can reduce system reliability and availability. Achieving high performance while reducing power consumption is challenging since power and performance are inextricably interwoven; reducing power often results in degradation in performance. This thesis aims to address these challenges by providing theories, techniques, and tools to 1) accurately predict performance and improve it in systems with advanced hierarchical memories, 2) understand and evaluate power and its impacts on performance, 3) control power and performance for maximum efficiency. Our theories, techniques, and tools have been applied to high-end computing systems. Our theroetical models can improve algorithm performance by up to 59% and accurately predict the impacts of power on performance. Our techniques can evaluate power consumption of high-end computing systems and their applications with fine granularity and save up to 36% energy with little performance degradation. / Ph. D. High-performance computing power and performance management power-performance efficiency performance modeling and analysis
275	Parallel Algorithms for Switching Edges and Generating Random Graphs from Given Degree Sequences using HPC Platforms Bhuiyan, Md Hasanuzzaman 09 November 2017 (has links) Networks (or graphs) are an effective abstraction for representing many real-world complex systems. Analyzing various structural properties of and dynamics on such networks reveal valuable insights about the behavior of such systems. In today's data-rich world, we are deluged by the massive amount of heterogeneous data from various sources, such as the web, infrastructure, and online social media. Analyzing this huge amount of data may take a prohibitively long time and even may not fit into the main memory of a single processing unit, thus motivating the necessity of efficient parallel algorithms in various high-performance computing (HPC) platforms. In this dissertation, we present distributed and shared memory parallel algorithms for some important network analytic problems. First, we present distributed memory parallel algorithms for switching edges in a network. Edge switch is an operation on a network, where two edges are selected randomly, and one of their end vertices are swapped with each other. This operation is repeated either a given number of times or until a specified criterion is satisfied. It has diverse real-world applications such as in generating simple random networks with a given degree sequence and in modeling and studying various dynamic networks. One of the steps in our edge switch algorithm requires generating multinomial random variables in parallel. We also present the first non-trivial parallel algorithm for generating multinomial random variables. Next, we present efficient algorithms for assortative edge switch in a labeled network. Assuming each vertex has a label, an assortative edge switch operation imposes an extra constraint, i.e., two edges are randomly selected and one of their end vertices are swapped with each other if the labels of the end vertices of the edges remain the same as before. It can be used to study the effect of the network structural properties on dynamics over a network. Although the problem of assortative edge switch seems to be similar to that of (regular) edge switch, the constraint on the vertex labels in assortative edge switch leads to a new difficulty, which needs to be addressed by an entirely new algorithmic approach. We first present a novel sequential algorithm for assortative edge switch; then we present an efficient distributed memory parallel algorithm based on our sequential algorithm. Finally, we present efficient shared memory parallel algorithms for generating random networks with exact given degree sequence using a direct graph construction method, which involves computing a candidate list for creating an edge incident on a vertex using the Erdos-Gallai characterization and then randomly creating the edges from the candidates. / Ph. D. Network Science Parallel Algorithms High Performance Computing Edge Switch Random Networks
276	Advanced Sampling Methods for Solving Large-Scale Inverse Problems Attia, Ahmed Mohamed Mohamed 19 September 2016 (has links) Ensemble and variational techniques have gained wide popularity as the two main approaches for solving data assimilation and inverse problems. The majority of the methods in these two approaches are derived (at least implicitly) under the assumption that the underlying probability distributions are Gaussian. It is well accepted, however, that the Gaussianity assumption is too restrictive when applied to large nonlinear models, nonlinear observation operators, and large levels of uncertainty. This work develops a family of fully non-Gaussian data assimilation algorithms that work by directly sampling the posterior distribution. The sampling strategy is based on a Hybrid/Hamiltonian Monte Carlo (HMC) approach that can handle non-normal probability distributions. The first algorithm proposed in this work is the "HMC sampling filter", an ensemble-based data assimilation algorithm for solving the sequential filtering problem. Unlike traditional ensemble-based filters, such as the ensemble Kalman filter and the maximum likelihood ensemble filter, the proposed sampling filter naturally accommodates non-Gaussian errors and nonlinear model dynamics, as well as nonlinear observations. To test the capabilities of the HMC sampling filter numerical experiments are carried out using the Lorenz-96 model and observation operators with different levels of nonlinearity and differentiability. The filter is also tested with shallow water model on the sphere with linear observation operator. Numerical results show that the sampling filter performs well even in highly nonlinear situations where the traditional filters diverge. Next, the HMC sampling approach is extended to the four-dimensional case, where several observations are assimilated simultaneously, resulting in the second member of the proposed family of algorithms. The new algorithm, named "HMC sampling smoother", is an ensemble-based smoother for four-dimensional data assimilation that works by sampling from the posterior probability density of the solution at the initial time. The sampling smoother naturally accommodates non-Gaussian errors and nonlinear model dynamics and observation operators, and provides a full description of the posterior distribution. Numerical experiments for this algorithm are carried out using a shallow water model on the sphere with observation operators of different levels of nonlinearity. The numerical results demonstrate the advantages of the proposed method compared to the traditional variational and ensemble-based smoothing methods. The HMC sampling smoother, in its original formulation, is computationally expensive due to the innate requirement of running the forward and adjoint models repeatedly. The proposed family of algorithms proceeds by developing computationally efficient versions of the HMC sampling smoother based on reduced-order approximations of the underlying model dynamics. The reduced-order HMC sampling smoothers, developed as extensions to the original HMC smoother, are tested numerically using the shallow-water equations model in Cartesian coordinates. The results reveal that the reduced-order versions of the smoother are capable of accurately capturing the posterior probability density, while being significantly faster than the original full order formulation. In the presence of nonlinear model dynamics, nonlinear observation operator, or non-Gaussian errors, the prior distribution in the sequential data assimilation framework is not analytically tractable. In the original formulation of the HMC sampling filter, the prior distribution is approximated by a Gaussian distribution whose parameters are inferred from the ensemble of forecasts. The Gaussian prior assumption in the original HMC filter is relaxed. Specifically, a clustering step is introduced after the forecast phase of the filter, and the prior density function is estimated by fitting a Gaussian Mixture Model (GMM) to the prior ensemble. The base filter developed following this strategy is named cluster HMC sampling filter (ClHMC ). A multi-chain version of the ClHMC filter, namely MC-ClHMC , is also proposed to guarantee that samples are taken from the vicinities of all probability modes of the formulated posterior. These methodologies are tested using a quasi-geostrophic (QG) model with double-gyre wind forcing and bi-harmonic friction. Numerical results demonstrate the usefulness of using GMMs to relax the Gaussian prior assumption in the HMC filtering paradigm. To provide a unified platform for data assimilation research, a flexible and a highly-extensible testing suite, named DATeS , is developed and described in this work. The core of DATeS is implemented in Python to enable for Object-Oriented capabilities. The main components, such as the models, the data assimilation algorithms, the linear algebra solvers, and the time discretization routines are independent of each other, such as to offer maximum flexibility to configure data assimilation studies. / Ph. D. Data Assimilation Inverse Problems Uncertainty Quantification Hamiltonian Monte-Carlo Cluster Sampling Filters High Performance Computing.
277	Towards a Polyalgorithm for Land Use and Land Cover Change Detection Saxena, Rishu 23 February 2018 (has links) Earth observation satellites (EOS) such as Landsat provide image datasets that can be immensely useful in numerous application domains. One way of analyzing satellite images for land use and land cover change (LULCC) is time series analysis (TSA). Several algorithms for time series analysis have been proposed by various groups in remote sensing; more algorithms (that can be adapted) are available in the general time series literature. However, in spite of an abundance of algorithms, the choice of algorithm to be used for analyzing an image stack is presently an open question. A concurrent issue is the prohibitive size of Landsat datasets, currently of the order of petabytes and growing. This makes them computationally unwieldy --- both in storage and processing. An EOS image stack typically consists of multiple images of a fixed area on the Earth's surface (same latitudes and longitudes) taken at different time points. Experiments on multicore servers indicate that carrying out meaningful time series analysis on one such interannual, multitemporal stack with existing state of the art codes can take several days. This work proposes using multiple algorithms to analyze a given image stack in a polyalgorithmic framework. A polyalgorithm combines several basic algorithms, each meant to solve the same problem, producing a strategy that unites the strengths and circumvents the weaknesses of constituent algorithms. The foundation of the proposed TSA based polyalgorithm is laid using three algorithms (LandTrendR, EWMACD, and BFAST). These algorithms are precisely described mathematically, and chosen to be fundamentally distinct from each other in design and in the phenomena they capture. Analysis of results representing success, failure, and parameter sensitivity for each algorithm is presented. Scalability issues, important for real simulations, are also discussed, along with scalable implementations, and speedup results. For a given pixel, Hausdorff distance is used to compare the distance between the change times (breakpoints) obtained from two different algorithms. Timesync validation data, a dataset that is based on human interpretation of Landsat time series in concert with historical aerial photography, is used for validation. The polyalgorithm yields more accurate results than EWMACD and LandTrendR alone, but counterintuitively not better than BFAST alone. This nascent work will be directly useful in land use and land cover change studies, of interest to terrestrial science research, especially regarding anthropogenic impacts on the environment, and in much broader applications such as health monitoring and urban transportation. / M. S. Remote sensing time series analysis event detection change detection big data scalability high performance computing.
278	Exploring the Landscape of Big Data Analytics Through Domain-Aware Algorithm Design Dash, Sajal 20 August 2020 (has links) Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. High volume and velocity of the data warrant a large amount of storage, memory, and compute power while a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. In this thesis, we present our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric and domain-specific properties of high dimensional data for succinct representation, which addresses the volume property, (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property, and (3) leverage incremental arrival of data through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We present Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool, to demonstrate the application of the first guideline. It combines algorithmic concepts extended from the stochastic force-based multi-dimensional scaling (SF-MDS) and Glimmer. Claret computes approximate weighted Euclidean distances by combining a novel data mapping called stretching and Johnson Lindestrauss' lemma to reduce the complexity of WMDS from O(f(n)d) to O(f(n) log d). In demonstrating the second guideline, we map the problem of identifying multi-hit combinations of genetic mutations responsible for cancers to weighted set cover (WSC) problem by leveraging the semantics of cancer genomic data obtained from cancer biology. Solving the mapped WSC with an approximate algorithm, we identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples. To identify three- and four-hits, which require orders of magnitude larger computational power, we have scaled out the WSC algorithm on a hundred nodes of Summit supercomputer. In demonstrating the third guideline, we developed a tool iBLAST to perform an incremental sequence similarity search. Developing new statistics to combine search results over time makes incremental analysis feasible. iBLAST performs (1+δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We also explored various approaches to mitigate catastrophic forgetting in incremental training of deep learning models. / Doctor of Philosophy / Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. Here volume represents the data's size, variety represents various sources and formats of the data, and velocity represents the data arrival rate. High volume and velocity of the data warrant a large amount of storage, memory, and computational power. In contrast, a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. This thesis presents our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric (pair-wise distance and distribution-related) and domain-specific properties of high dimensional data for succinct representation, which addresses the volume property, (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property, and (3) leverage incremental data arrival through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We demonstrate the application of the first guideline through the design and development of Claret. Claret is a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool that can reduce the dimension of high-dimensional data points. In demonstrating the second guideline, we identify combinations of cancer-causing gene mutations by mapping the problem to a well known computational problem known as the weighted set cover (WSC) problem. We have scaled out the WSC algorithm on a hundred nodes of Summit supercomputer to solve the problem in less than two hours instead of an estimated hundred years. In demonstrating the third guideline, we developed a tool iBLAST to perform an incremental sequence similarity search. This analysis was made possible by developing new statistics to combine search results over time. We also explored various approaches to mitigate the catastrophic forgetting of deep learning models, where a model forgets to perform machine learning tasks efficiently on older data in a streaming setting. Big data analytics High-performance computing Algorithmic machine learning Incremental algorithm Approximate algorithm
279	Directive-Based Data Partitioning and Pipelining and Auto-Tuning for High-Performance GPU Computing Cui, Xuewen 15 December 2020 (has links) The computer science community needs simpler mechanisms to achieve the performance potential of accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and co-processors (e.g., Intel Xeon Phi), due to their increasing use in state-of-the-art supercomputers. Over the past 10 years, we have seen a significant improvement in both computing power and memory connection bandwidth for accelerators. However, we also observe that the computation power has grown significantly faster than the interconnection bandwidth between the central processing unit (CPU) and the accelerator. Given that accelerators generally have their own discrete memory space, data needs to be copied from the CPU host memory to the accelerator (device) memory before computation starts on the accelerator. Moreover, programming models like CUDA, OpenMP, OpenACC, and OpenCL can efficiently offload compute-intensive workloads to these accelerators. However, achieving the overlapping of data transfers with computation in a kernel with these models is neither simple nor straightforward. Instead, codes copy data to or from the device without overlapping or requiring explicit user design and refactoring. Achieving performance can require extensive refactoring and hand-tuning to apply data transfer optimizations, and users must manually partition their dataset whenever its size is larger than device memory, which can be highly difficult when the device memory size is not exposed to the user. As the systems are becoming more and more complex in terms of heterogeneity, CPUs are responsible for handling many tasks related to other accelerators, computation and data movement tasks, task dependency checking, and task callbacks. Leaving all logic controls to the CPU not only costs extra communication delay over the PCI-e bus but also consumes the CPU resources, which may affect the performance of other CPU tasks. This thesis work aims to provide efficient directive-based data pipelining approaches for GPUs that tackle these issues and improve performance, programmability, and memory management. / Doctor of Philosophy / Over the past decade, parallel accelerators have become increasingly prominent in this emerging era of "big data, big compute, and artificial intelligence.'' In more recent supercomputers and datacenter clusters, we find multi-core central processing units (CPUs), many-core graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and co-processors (e.g., Intel Xeon Phi) being used to accelerate many kinds of computation tasks. While many new programming models have been proposed to support these accelerators, scientists or developers without domain knowledge usually find existing programming models not efficient enough to port their code to accelerators. Due to the limited accelerator on-chip memory size, the data array size is often too large to fit in the on-chip memory, especially while dealing with deep learning tasks. The data need to be partitioned and managed properly, which requires more hand-tuning effort. Moreover, performance tuning is difficult for developers to achieve high performance for specific applications due to a lack of domain knowledge. To handle these problems, this dissertation aims to propose a general approach to provide better programmability, performance, and data management for the accelerators. Accelerator users often prefer to keep their existing verified C, C++, or Fortran code rather than grapple with the unfamiliar code. Since 2013, OpenMP has provided a straightforward way to adapt existing programs to accelerated systems. We propose multiple associated clauses to help developers easily partition and pipeline the accelerated code. Specifically, the proposed extension can overlap kernel computation and data transfer between host and device efficiently. The extension supports memory over-subscription, meaning the memory size required by the tasks could be larger than the GPU size. The internal scheduler guarantees that the data is swapped out correctly and efficiently. Machine learning methods are also leveraged to help with auto-tuning accelerator performance. OpenMP GPU Pipeline Partitioning Unified Memory High-performance Computing Machine learning
280	Scalable and Energy Efficient Execution Methods for Multicore Systems Li, Dong 16 February 2011 (has links) Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine DCT and DVFS for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design polices and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to the completion with the memory registration limitation. / Ph. D. Performance Modeling and Analysis Multicore Processors Power-Aware Computing Concurrency Throttling High-Performance Computing

Search results