291 |
Performance and power modeling of GPU systems with dynamic voltage and frequency scaling. Wang, Qiang, 13 August 2020 (has links)
To address the ever-increasing demand for computing capacity, more and more heterogeneous systems have been designed to use both general-purpose and special-purpose processors. Their huge energy consumption raises new environmental concerns and challenges. Besides performance, energy efficiency is another key factor to be considered by system designers and consumers. In particular, contemporary graphics processing units (GPUs) support dynamic voltage and frequency scaling (DVFS) to balance computational performance and energy consumption. However, accurate and straightforward performance and power estimation of a given GPU kernel under different frequency settings is still lacking for real hardware, and such estimation is essential for determining the best frequency configuration for energy saving. In this thesis, we investigate how to improve the energy efficiency of GPU systems by accurately modeling the effects of GPU DVFS on the target GPU kernel. We also propose efficient algorithms to solve the communication contention problem in scheduling multiple distributed deep learning (DDL) jobs on GPU clusters. We introduce our studies as follows.
First, we present EPPMiner, a benchmark suite for evaluating the performance, power, and energy of different heterogeneous systems. EPPMiner consists of 16 benchmark programs that cover a broad range of application domains and show a great variety in how intensively they utilize the processors. We have implemented a prototype of EPPMiner that supports OpenMP, CUDA, and OpenCL, and demonstrated its usage through three showcases. The showcases confirm that GPUs provide much better energy efficiency than other types of computing systems, and in particular illustrate the effectiveness of GPU DVFS for the energy efficiency of GPU applications.
Second, we present a fine-grained analytical model to estimate the execution time of GPU kernels under both core and memory frequency scaling. Unlike cycle-level simulators, which are too slow to apply to real hardware, our model needs only one-off micro-benchmarks to extract a set of hardware parameters and kernel performance counters, without any source-code analysis. Our experimental results show that the proposed performance model captures the kernel performance scaling behavior under different frequency settings with decent accuracy.
Third, we design a cross-benchmarking suite that synthesizes kernels with a wide range of instruction distributions. The synthetic kernels generated by this suite can be used for model pre-training or as supplementary training samples. We then build machine learning models to predict the execution time and runtime power of a GPU kernel under different voltage and frequency settings. Validated on three modern GPUs with wide frequency scaling ranges, using a collection of 24 real application kernels, the model trained only with our cross-benchmarking suite achieves considerable accuracy.
Finally, we establish a new DDL job scheduling framework that organizes DDL jobs as directed acyclic graphs (DAGs) and considers communication contention between nodes. We propose an efficient job placement algorithm, Least-Workload-First- (LWF-), to balance GPU utilization and consolidate the GPUs allocated to each job, and, for scheduling the communication tasks, we propose Ada-SRSF to address the communication contention issue. Our simulation results show that LWF- achieves up to a 1.59x improvement over classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by up to 36.7% compared with solutions that either avoid all communication contention or accept all of it.
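To make the frequency-scaling idea concrete, below is a minimal, illustrative sketch of a DVFS-aware kernel time and energy model. It is not the thesis's actual analytical model: the parameters (compute_cycles, memory_cycles, overlap, and the cubic core-power approximation) are hypothetical stand-ins for quantities that would come from micro-benchmarks and performance counters.

```python
def kernel_time(compute_cycles, memory_cycles, f_core_ghz, f_mem_ghz, overlap=0.8):
    """Estimate kernel execution time (seconds) under given core/memory clocks.

    compute_cycles: core-side work, in cycles at the core clock
    memory_cycles:  memory-side work, in cycles at the memory clock
    overlap:        fraction of the shorter phase hidden behind the longer one
    """
    t_comp = compute_cycles / (f_core_ghz * 1e9)
    t_mem = memory_cycles / (f_mem_ghz * 1e9)
    # The longer phase dominates; part of the shorter phase is hidden by overlap.
    return max(t_comp, t_mem) + (1.0 - overlap) * min(t_comp, t_mem)

def best_config(compute_cycles, memory_cycles, core_freqs, mem_freqs, c=1.0):
    """Sweep frequency pairs and pick the lowest-energy one, assuming a
    first-order dynamic-power model P ~ c * f^3 for the core domain only."""
    def energy(fc, fm):
        t = kernel_time(compute_cycles, memory_cycles, fc, fm)
        return (c * fc**3) * t
    return min(((fc, fm) for fc in core_freqs for fm in mem_freqs),
               key=lambda p: energy(*p))
```

Even this toy model exhibits the behavior the thesis studies: a memory-bound kernel (memory_cycles much larger than compute_cycles) loses little time but saves substantial energy when the core clock is lowered.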
|
293 |
Visual Assessment of Rivers and Marshes: An Examination of the Relationship of Visual Units, Perceptual Variables and Preference. Ellsworth, John C., 01 May 1982 (has links)
The purpose of this research was to examine the relationship of two approaches to visual assessment of landscape: the qualitative descriptive inventory and the theoretically based empirical perceptual preference approach. Three levels of landscape visual units based on bio-physical similarities (landscape units, setting units, and waterscape units) were identified in a marsh (Cutler Reservoir, Cache County, Utah) and its tributary streams. Color slide photographs were taken from five of the visual units. These slides were rated on a 5-point scale by panels of judges for the expression of four perceptual variables: coherence, complexity, mystery, and legibility. The same slides were rated on a 5-point scale by 98 respondents according to their preference for each slide. The relationship of the visual units, perceptual variables, and preference was evaluated by analytical and statistical procedures.
Results showed significant differences in the expression of the four perceptual variables between rivers and marshes and between setting units. Both rivers and marshes were considered coherent when there were similarities in vegetation within the respective types; however, the strong horizontal organization of the marsh scenes necessary for coherence contrasted with the edge definition and orderliness considered necessary in rivers. Mystery was also related to similar factors in rivers and marshes (such as obscuring vegetation, particularly in the marsh), but the presence of riverbanks and bends in the river corridor had a distinct effect on mystery ratings in the river scenes. Complexity in both rivers and marshes was primarily dependent on diversity of vegetation and visual depth, but the number of different visual elements in river scenes also influenced complexity. Legibility was related to straight, enclosed, and simple corridors in river images and to simple spaces with regular vegetation in marsh images. Fine textures and clear spatial definition enhanced legibility.
Preference ratings were significantly different between rivers and marshes, but not between river setting units or waterscape units. River scenes received higher preference ratings than marsh scenes. Mystery, complexity, and visual depth were especially important to preference. Demographic variables of age, sex, academic major, and home state did not significantly affect preference. Statistical analysis indicated that each perceptual variable was an independent predictor and that, compared to visual units, perceptual variables were more strongly related to preference.
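As an illustration of the kind of analysis described above, the sketch below regresses mean preference ratings on the four perceptual variables by ordinary least squares. All numbers are invented placeholders; the thesis's actual data, sample sizes, and statistical tests may differ.

```python
import numpy as np

# rows = slides; columns = coherence, complexity, mystery, legibility (1-5 scale)
X = np.array([[3.2, 2.8, 3.9, 3.1],
              [4.1, 3.5, 2.7, 4.0],
              [2.5, 4.2, 4.4, 2.9],
              [3.8, 3.1, 3.6, 3.3],
              [2.9, 2.6, 2.8, 3.9],
              [4.3, 3.9, 4.1, 3.0]])
y = np.array([3.4, 3.8, 4.1, 3.6, 2.9, 4.3])    # mean preference per slide

X1 = np.column_stack([np.ones(len(X)), X])      # add an intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # ordinary least squares fit
for name, b in zip(["intercept", "coherence", "complexity",
                    "mystery", "legibility"], beta):
    print(f"{name:>10s}: {b:+.3f}")             # slope = predictive weight
```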
|
294 |
Multiple IMU Sensor Fusion for SUAS Navigation and Photogrammetry. Givens, Matthew, 01 August 2019 (has links)
Inertial measurement units (IMUs) are devices that sense accelerations and angular rates in 3D so that vehicles and other devices can estimate their orientations, positions, and velocities. Traditional IMUs, built around mechanical gyroscopes and stabilized platforms, were large, heavy, and costly, but the recent development of micro-electromechanical sensor (MEMS) IMUs that are small, light, and inexpensive has led to their adoption in many everyday systems such as cell phones, video game controllers, and commercial drones. MEMS IMUs, despite their advantages, have major drawbacks in accuracy and reliability. This thesis explores the idea of using an array of these sensors, instead of only one, and fusing their outputs to generate an improved solution.
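A minimal sketch of the array-fusion idea follows, using inverse-variance weighting of simultaneous readings. This illustrates only the simplest form of the approach; the thesis's actual method and sensor model may be more sophisticated (for example, a joint Kalman filter over all units), and the noise figures below are assumed, not measured.

```python
import numpy as np

def fuse_imu_array(readings, variances):
    """Fuse simultaneous measurements from N IMUs by inverse-variance weighting.

    readings:  (N, 6) array; each row = [ax, ay, az, gx, gy, gz]
    variances: (N, 6) array of per-axis noise variances for each unit
    Returns the weighted mean and its reduced variance per axis.
    """
    w = 1.0 / np.asarray(variances)                  # inverse-variance weights
    fused = (w * readings).sum(axis=0) / w.sum(axis=0)
    fused_var = 1.0 / w.sum(axis=0)                  # variance shrinks with N
    return fused, fused_var

# Example: four identical IMUs halve the effective standard deviation.
rng = np.random.default_rng(0)
truth = np.array([0.0, 0.0, 9.81, 0.0, 0.0, 0.0])    # stationary, level unit
readings = truth + rng.normal(0.0, 0.05, size=(4, 6))
fused, var = fuse_imu_array(readings, np.full((4, 6), 0.05**2))
```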
|
295 |
Sparse array representations and some selected array operations on GPUs. Wang, Hairong, 01 September 2014 (has links)
A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014.
A multi-dimensional data model provides a good conceptual view of the data in data warehousing and On-Line Analytical Processing (OLAP). A typical representation of such a data model is a multi-dimensional array, which is well suited when the array is dense. If the array is sparse, i.e., has a small number of non-zero elements relative to the product of the cardinalities of the dimensions, representing the data set as a multi-dimensional array requires extremely large memory space while the actual data elements occupy a relatively small fraction of it. Existing storage schemes for Multi-Dimensional Sparse Arrays (MDSAs) of higher dimensions k (k > 2) focus on optimizing storage utilization and offer little flexibility in data access efficiency. Most efficient storage schemes for sparse arrays are limited to matrices, i.e., arrays of 2 dimensions.
In this dissertation, we introduce four storage schemes for MDSAs that handle the sparsity of the array with two primary goals: reducing the storage overhead and maintaining efficient data element access. These schemes, together with a well-known method referred to as Bit Encoded Sparse Storage (BESS), were evaluated and compared on four basic array operations, namely construction of a scheme, large-scale random element access, sub-array retrieval, and multi-dimensional aggregation. The four proposed storage schemes, and the evaluation results, are as follows: (i) the extended compressed row storage (xCRS) scheme, which extends the CRS method for sparse matrix storage to sparse arrays of higher dimensions and achieves the best data element access efficiency among the methods compared; (ii) the bit-encoded xCRS (BxCRS) scheme, which optimizes the storage utilization of xCRS by applying data compression with run-length encoding while maintaining its data access efficiency; (iii) a hybrid scheme (Hybrid), which provides the best control of the balance between storage utilization and data manipulation efficiency by combining xCRS and BESS; and (iv) the PATRICIA trie compressed storage (PTCS) scheme, which uses a PATRICIA trie to store the valid non-zero array elements, supports efficient data access, and has the unique property of supporting update operations conveniently. For multi-dimensional aggregation, BESS performed best, closely followed by the other schemes.
We also addressed the problem of accelerating selected array operations using General Purpose Computing on Graphics Processing Units (GPGPU). The experimental results showed different levels of speedup, ranging from 2 to over 20 times, on large-scale random element access and sub-array retrieval. In particular, we utilized GPUs for the computation of the cube operator, a special case of multi-dimensional aggregation, using BESS; this resulted in a 5 to 8 times speedup compared with our CPU-only implementation. The main contributions of this dissertation include the development, implementation, and evaluation of four efficient schemes to store multi-dimensional sparse arrays, as well as utilizing the massive parallelism of GPUs for some data warehousing operations.
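To illustrate the xCRS idea, here is a minimal sketch that extends classic CRS (row pointers, column indices, values) to a k-dimensional sparse array by treating the first dimension as the "row" and linearizing the remaining dimensions into a single index. It shows the general approach only, not the dissertation's exact data layout.

```python
import numpy as np

class XCRS:
    """Toy extended-CRS store for a k-dimensional sparse array."""
    def __init__(self, dense):
        dense = np.asarray(dense)
        self.shape = dense.shape
        rows, tail = self.shape[0], int(np.prod(self.shape[1:]))
        flat = dense.reshape(rows, tail)        # linearize trailing dimensions
        self.ptr = [0]                          # ptr[i]..ptr[i+1] spans slice i
        self.idx, self.val = [], []             # linearized index and value
        for i in range(rows):
            nz = np.nonzero(flat[i])[0]
            self.idx.extend(nz.tolist())
            self.val.extend(flat[i, nz].tolist())
            self.ptr.append(len(self.idx))

    def get(self, index):
        """Random access to element `index` (a k-tuple); zero if absent."""
        i = index[0]
        tail = int(np.ravel_multi_index(index[1:], self.shape[1:]))
        # idx is sorted within a slice, so binary search would also work here.
        for p in range(self.ptr[i], self.ptr[i + 1]):
            if self.idx[p] == tail:
                return self.val[p]
        return 0

a = np.zeros((2, 3, 4)); a[0, 1, 2] = 5.0; a[1, 2, 3] = 7.0
s = XCRS(a)
assert s.get((0, 1, 2)) == 5.0 and s.get((1, 0, 0)) == 0
```

Storage drops from the full product of dimension cardinalities to O(rows + nnz), while element access stays local to one compressed slice, which is the trade-off the dissertation evaluates.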
|
296 |
Development of a New Method to Optimize Storage Units in Urban Drainage Systems. Liu, Jing, 18 July 2022 (has links)
Flood severity and frequency have grown over the years as a result of urban development and climate change. Floods in cities cause major problems such as property and infrastructure damage, transportation congestion, loss of life, environmental threats, and health concerns. To relieve the load on the urban drainage system and prevent flooding, effective measures to strengthen its resilience are required. Traditional design methods, which rely on past performance trends and long lifespans, usually result in infrastructure that is inflexible and unable to adapt to changing conditions; such studies have focused on drainage design, for example optimizing pipe slope and diameter under design-cost constraints. Furthermore, various terminologies for the overall concept of green/grey infrastructure have been proposed in the literature. Some studies have focused on optimizing suitable locations for storage tanks, one of the most efficient approaches. Building storage facilities such as retention or detention basins is a cost-effective and efficient structural option to improve the resilience of urban sewerage systems and reduce peak runoff in existing drainage systems in urban areas, especially compared to traditional measures such as increasing pipe diameter or slope to provide sufficient hydraulic capacity. The basic concept is to create an optimization framework using the Non-dominated Sorting Genetic Algorithm II (NSGA-II), coupled with the hydraulic model SWMM, and to use it to adjust a number of drainage system variables such as pipe diameter, slope, and storage unit size. The main idea of the optimization framework in this thesis is to combine different methods into one framework, which is a challenge in a complex system because of the trade-off between the resilience objective and the financial limitation. The literature review surveys recent research on sewerage system resilience optimization using different methodologies, and the application of the system shows that the optimization model can improve the resilience of urban sewerage systems.
The main objectives of the thesis are to (i) develop a new framework to optimize the volume and location of storage units in urban drainage systems; (ii) develop a two-stage multi-objective optimization framework; and (iii) develop a new index to make the optimization process feasible.
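As an illustration of the kind of framework described above, the sketch below couples NSGA-II (via the pymoo library) with a stub objective standing in for a SWMM simulation. The pymoo calls are standard API; everything else, including the flood_volume stub, the number of candidate sites, the bounds, and the cost coefficients, is invented for illustration and is not from the thesis. In practice the stub would be replaced by a call that runs the SWMM model of the study network (for example via pyswmm) and reads back the flood volume.

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

N_SITES = 5                                  # candidate storage-unit locations

def flood_volume(volumes):
    # Stub in place of a SWMM run: more storage reduces flooding,
    # with diminishing returns. Purely illustrative dynamics.
    return 1000.0 * np.exp(-volumes.sum() / 500.0)

class StoragePlacement(ElementwiseProblem):
    def __init__(self):
        super().__init__(n_var=N_SITES, n_obj=2,
                         xl=0.0, xu=200.0)    # storage volume per site, m^3

    def _evaluate(self, x, out, *args, **kwargs):
        cost = 150.0 * x.sum()                # construction cost (objective 1)
        out["F"] = [cost, flood_volume(x)]    # residual flooding (objective 2)

res = minimize(StoragePlacement(), NSGA2(pop_size=40),
               ("n_gen", 50), seed=1, verbose=False)
# res.X holds Pareto-optimal storage allocations; res.F the cost/flood trade-offs.
```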
|
297 |
The experience of nurses with boarder babies on an acute-care unit. Soparkar, Anjani A., 01 January 1992 (has links) (PDF)
No description available.
|
298 |
Robust Estimation and Prediction in the Presence of Influential Units in Surveys. Teng, Yizhen, 02 August 2023 (has links)
In surveys, one may face the problem of influential units at the estimation stage. A unit is said to be influential if its inclusion in or exclusion from the sample has a drastic impact on the estimates. This is a common situation in business surveys, as the distribution of economic variables tends to be highly skewed. We study and examine some commonly used estimators and predictors of a population total and propose a robust estimator and predictor based on an adaptive tuning constant. The proposed tuning constant is based on the concept of the conditional bias of a unit, which is a measure of influence. We present the results of a simulation study that compares the performance of several estimators and predictors in terms of bias and efficiency.
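A minimal sketch of the conditional-bias approach follows, in the general spirit of Beaumont, Haziza and Ruiz-Gazen (2013) rather than the thesis's exact estimator and predictor. It assumes Poisson sampling, under which the conditional bias of the Horvitz-Thompson estimator attributable to a sampled unit is B_i = (1/pi_i - 1) y_i, and uses the min-max rule for the tuning constant, which amounts to removing the midrange of the conditional biases.

```python
import numpy as np

def huber(b, c):
    """Huber psi-function: caps each conditional bias at +/- c."""
    return np.clip(b, -c, c)

def robust_total(y, pi, c):
    """Robust Horvitz-Thompson total under Poisson sampling with tuning c."""
    y, pi = np.asarray(y, float), np.asarray(pi, float)
    ht = np.sum(y / pi)
    B = (1.0 / pi - 1.0) * y          # conditional bias = influence of unit i
    return ht - np.sum(B) + np.sum(huber(B, c))

def minmax_robust_total(y, pi):
    """Adaptive (min-max) tuning: the robust estimator reduces to HT minus
    the midrange of the conditional biases, curbing the most influential unit."""
    y, pi = np.asarray(y, float), np.asarray(pi, float)
    B = (1.0 / pi - 1.0) * y
    return np.sum(y / pi) - (B.min() + B.max()) / 2.0

# Example: one influential unit (huge value, tiny inclusion probability).
y  = np.array([12.0, 15.0, 9.0, 11.0, 5000.0])
pi = np.array([0.5, 0.5, 0.5, 0.5, 0.01])
print(np.sum(y / pi), minmax_robust_total(y, pi))   # HT vs. robust total
```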
|
299 |
On the Galois module structure of the units and ray classes of a real abelian number field. All, Timothy James, 23 July 2013 (has links)
No description available.
|
300 |
Freight Truck Traffic Associated with the Port of Oakland: A Case Study of Roadway Impacts. Hinkamp, James, 01 December 2011 (links) (PDF)
The Port of Oakland ("Port") is the 5th largest container seaport by volume in the U.S. and the largest in Northern California. Maritime shipping activity at the Port exceeds 2 million import and export twenty-foot equivalent unit (TEU) containers annually. Containers may be full or empty, but they typically require hinterland shipment and intermodal transfer between maritime and land-based freight distribution systems. The freight trucking mode ("drayage") handles approximately 80% of all TEU throughput at the Port, thus constituting the majority of landside Port traffic. The Port is also situated adjacent to dense urban development and therefore imposes certain external impacts. This study explores drayage impacts on the regional roadway infrastructure proximate to the Port, to expand knowledge of freight network conditions and of the relevant policies addressing the topic in the San Francisco Bay Area.
Statistical regression analysis and elasticity estimates indicate a measurable level of impact on the nearby freight corridors I-80, I-680, and I-880. Drayage traffic has continued to increase since 2000 as a function of increasing TEU throughput at the Port. Policies to address stable freight flow and infrastructure maintenance are ongoing, although additional studies are recommended to ascertain comprehensive network impacts.
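To make the elasticity notion concrete, here is a minimal sketch of a log-log regression of corridor truck volume on Port TEU throughput, in which the slope coefficient is the elasticity. The figures are invented placeholders, not data from the study.

```python
import numpy as np

teu = np.array([1.45e6, 1.61e6, 1.78e6, 1.92e6, 2.05e6])   # annual TEUs (made up)
trucks = np.array([11200, 12050, 12900, 13600, 14350])     # daily trucks (made up)

X = np.column_stack([np.ones(len(teu)), np.log(teu)])      # intercept + log(TEU)
beta, *_ = np.linalg.lstsq(X, np.log(trucks), rcond=None)  # OLS in log-log form
# In a log-log model the slope is the elasticity: a 1% rise in TEU throughput
# is associated with roughly a beta[1] percent rise in corridor truck traffic.
print(f"estimated elasticity: {beta[1]:.2f}")
```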
|