1 |
Functional Analysis of Real World Truck Fuel Consumption Data. Vogetseder, Georg, January 2008.
This thesis covers the analysis of sparse and irregular fuel consumption data from long-distance haulage articulated trucks. It is shown that this kind of data is hard to analyse with multivariate as well as with functional methods. To make the data tractable, Principal Components Analysis through Conditional Expectation (PACE) is used, which pools observations from many trucks to compensate for the sparsity of observations per truck and yields continuous results. The principal component scores generated by PACE can then be used to obtain rough estimates of the trajectories of individual trucks as well as to detect outliers. The data-centric approach of PACE is very useful for enabling functional analysis of sparse and irregular data. Functional analysis is desirable for this data because it sidesteps feature extraction and gives a more natural view of the data.
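As a concrete illustration of the score step in PACE, the sketch below estimates one truck's principal component scores by conditional expectation, assuming the mean function, eigenfunctions, eigenvalues and measurement-noise variance have already been estimated from the pooled data (for example by smoothing); the function and variable names are illustrative, not taken from the thesis.

```python
import numpy as np

def pace_scores(t_obs, y_obs, mu, phi, lam, sigma2):
    """Estimate FPCA scores for one sparsely observed truck by
    conditional expectation (best linear unbiased prediction).

    t_obs  : (n_i,) observation times for this truck
    y_obs  : (n_i,) fuel-consumption measurements at those times
    mu     : callable, estimated mean function mu(t)
    phi    : list of callables, estimated eigenfunctions phi_k(t)
    lam    : (K,) estimated eigenvalues
    sigma2 : estimated measurement-noise variance
    """
    Phi = np.column_stack([p(t_obs) for p in phi])        # (n_i, K)
    Lam = np.diag(lam)
    # Covariance of the observed vector: Phi Lam Phi' + sigma2 I
    Sigma_y = Phi @ Lam @ Phi.T + sigma2 * np.eye(len(t_obs))
    resid = y_obs - mu(t_obs)
    # E[xi_k | Y_i] = lam_k phi_k(t_i)' Sigma_y^{-1} (Y_i - mu(t_i))
    return Lam @ Phi.T @ np.linalg.solve(Sigma_y, resid)  # (K,)

def reconstruct(t_grid, mu, phi, scores):
    """Rough trajectory estimate on a dense grid from the K scores."""
    Phi_grid = np.column_stack([p(t_grid) for p in phi])
    return mu(t_grid) + Phi_grid @ scores
```

The reconstructed curve mu(t) + sum_k xi_k phi_k(t) gives the rough per-truck estimate the abstract describes, and unusually large score vectors provide a simple outlier flag.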
|
2 |
Seismic Applications of Interactive Computational Methods. Li, Min, date unknown.
Effective interactive computing methods are needed in a number of specific areas of geophysical interpretation, even though the basic algorithms have been established. One approach to raising the quality of interpretation is to promote better interaction between the interpreter and the computer. This thesis is concerned with improving that dialogue in three areas: automatic event picking, data visualization and sparse-data imaging.
Fully automatic seismic event picking methods work well in relatively good conditions, but they break down when the signal-to-noise ratio is low and the structure of the subsurface is complex. The interactive seismic event picking system described here blends the interpreter's guidance and judgment into the computer program, bringing the user into the loop to make subjective decisions when the picking problem is complicated. Several interactive approaches for 2-D event picking and 3-D horizon tracking have been developed. Envelope (or amplitude) threshold detection for first-break picking is based on the assumption that the power of the signal is larger than that of the noise. Correlation and instantaneous-phase pickers are designed for, and better suited to, picking other arrivals: the former is based on the cross-correlation function and requires a model trace (or traces) selected by the interpreter, while the latter tracks spatial variations in the instantaneous phase of the analytic form of the arrival. The picking options implemented in the software package SeisWin were tested on real data drawn from many sources, such as full-waveform sonic borehole logs, seismic reflection surveys and borehole radar profiles, as well as seven of the most recent 3-D seismic surveys conducted over Australian coal mines. The results show that the interactive picking system in SeisWin is efficient and tolerant; the 3-D horizon tracking method in particular has attracted industrial users.
The visualization of data is also part of the study, as picking accuracy, and indeed the whole of seismic interpretation, depends largely on the quality of the final display, which is often the only window through which an interpreter can see the earth's substructures. Display is a non-linear operation. Adjustments made to remedy display deficiencies, such as automatic gain control (AGC), have an important yet ill-documented effect on the performance of pattern-recognition operators, both human and computational. AGC is usually implemented in one dimension. Tools in widespread use for two-dimensional image processing that are of great value for local gain control of conventional seismic sections, such as edge detectors, histogram equalisers, high-pass filters and shaded relief, are discussed, and examples are presented to show the relative effectiveness of various display options.
Conventional migration requires dense arrays with uniform coverage and uniform illumination of targets. There are, however, many instances in which these ideals cannot be approached. Event migration and common-tangent-plane stacking procedures were developed especially for sparse data sets as part of the research effort underlying this thesis. Picked-event migration migrates the line between any two points on different traces of the time section to the base map; the interplay between the space and time domains gives the interpreter an immediate view of the mapping. Tangent-plane migration maps the reflector by accumulating the energy from any two possible reflecting points along the common tangent lines on the space plane. These methods have been applied to both seismic and borehole-radar data, and satisfactory results have been achieved.
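As an illustration of the envelope-threshold idea described above, the sketch below picks a first break on a single trace by computing the envelope from the analytic signal (Hilbert transform) and returning the first sample whose envelope exceeds a multiple of the pre-arrival noise level; the threshold factor and noise-window length are illustrative choices, not SeisWin's.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_first_break(trace, dt, noise_window=0.1, k=4.0):
    """Pick a first-break time (s) on one trace by envelope thresholding.

    trace        : 1-D array of samples
    dt           : sample interval in seconds
    noise_window : length (s) of the leading noise segment used to set the threshold
    k            : threshold as a multiple of the RMS noise envelope
    """
    env = np.abs(hilbert(trace))                 # instantaneous amplitude (envelope)
    n_noise = max(1, int(noise_window / dt))
    threshold = k * np.sqrt(np.mean(env[:n_noise] ** 2))
    above = np.nonzero(env > threshold)[0]
    if above.size == 0:
        return None                              # no pick: envelope never exceeds threshold
    return above[0] * dt

# The instantaneous phase used by the phase picker comes from the same analytic signal:
# phase = np.unwrap(np.angle(hilbert(trace)))
```

A cross-correlation picker would instead take the lag of maximum correlation against an interpreter-chosen model trace.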
|
3 |
Data envelopment analysis with sparse data. Gullipalli, Deep Kumar.
Master of Science / Department of Industrial & Manufacturing Systems Engineering / David H. Ben-Arieh / The quest for continuous improvement among organizations and the issue of missing data in analysis are both never-ending. This thesis brings these two topics under one roof: evaluating the productivity of organizations with sparse data. The study focuses on Data Envelopment Analysis (DEA) to determine the efficiency of 41 member clinics of the Kansas Association of Medically Underserved (KAMU) in the presence of missing data. The primary focus of this thesis is to develop new, reliable methods for determining the missing values and executing DEA.
DEA is a linear-programming methodology for evaluating the relative technical efficiency of homogeneous decision-making units (DMUs) using multiple inputs and outputs. The effectiveness of DEA depends on the quality and quantity of the data being used. DEA outcomes are susceptible to missing data, creating a need to supplement sparse data in a reliable manner; determining missing values more precisely improves the robustness of the DEA methodology.
Three methods for determining the missing values are proposed in this thesis, each built on a different platform. The first, the Average Ratio Method (ARM), uses the average value of all the ratios between two variables. The second is based on a modified fuzzy c-means clustering algorithm that can handle missing data; the issues associated with this clustering algorithm are resolved to improve its effectiveness. The third is based on an interval approach, in which missing values are replaced by interval ranges estimated by experts and crisp efficiency scores are identified along the same lines by which DEA determines efficiency scores using the best set of weights.
There is no unique way to evaluate the effectiveness of these methods, so they are tested by choosing a complete dataset and treating varying proportions of it as missing. The best set of recovered missing values, based on the above methods, then serves as the input for DEA. Results show that the DEA efficiency scores generated with recovered values are in close proximity to the actual efficiency scores that would be generated with the complete data.
In summary, this thesis provides an effective and practical approach for replacing missing values needed for DEA.
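For context, the sketch below solves the standard input-oriented CCR multiplier model for one decision-making unit with scipy's linear-programming routine, once a complete input/output matrix is available (i.e., after missing values have been recovered); it is a generic DEA formulation with made-up numbers, not the specific model or data used in the thesis.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU `o`.

    X : (n, m) inputs for n DMUs, Y : (n, s) outputs.
    Maximise u'y_o  s.t.  v'x_o = 1,  u'y_j - v'x_j <= 0 for all j,  u, v >= 0.
    Decision vector is [u (s output weights), v (m input weights)].
    """
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])             # linprog minimises, so negate
    A_ub = np.hstack([Y, -X])                            # u'y_j - v'x_j <= 0 for every DMU j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]])[None, :]  # v'x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun                                      # efficiency score in (0, 1]

# Illustrative example: 4 clinics, 2 inputs, 1 output (numbers are hypothetical)
X = np.array([[20., 300.], [30., 200.], [40., 100.], [20., 200.]])
Y = np.array([[100.], [80.], [90.], [95.]])
scores = [round(ccr_efficiency(X, Y, o), 3) for o in range(len(X))]
```

Running the model once per clinic yields the relative efficiency scores that the recovered data would feed into.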
|
4 |
An investigation of fuzzy modeling for spatial prediction with sparsely distributed data. Thomas, Robert, 31 August 2018.
Dioxins are highly toxic, persistent environmental pollutants that occur in marine harbour sediments as a result of industrial practices around the world and pose a significant risk to human health. To adequately remediate contaminated sediments, the spatial extent of contamination must first be determined by spatial interpolation. The ability to lower sampling frequency and perform laboratory analysis on fewer samples, yet still produce an adequate pollutant distribution map, would reduce the initial cost of new remediation projects. Fuzzy set theory has been shown to reduce the uncertainty due to data sparsity and provides an advantageous way to quantify gradational changes, like those of pollutant concentrations, through fuzzy-clustering-based approaches; fuzzy modelling can exploit these advantages for making spatial predictions. To assess the ability of fuzzy modelling to make spatial predictions using fewer sample points, its predictive ability was compared to Ordinary Kriging (OK) and Inverse Distance Weighting (IDW) under increasingly sparse data conditions. This research used a Takagi-Sugeno (T-S) fuzzy modelling approach with fuzzy c-means (FCM) clustering to make spatial predictions of lead concentrations in soil, in order to determine the efficacy of the fuzzy model for modelling dioxins in marine sediment. The spatial density of the data used to make the predictions was incrementally reduced to simulate increasingly sparse spatial data conditions.
To determine model performance, the data at each increment not used for making the spatial predictions served as validation data, which each model attempted to predict. Initially, the parameters of the T-S fuzzy model were determined by the optimum observed performance: the combination of parameters that produced the most accurate prediction of the validation data was retained as optimal at each increment of the data reduction. Mean Absolute Error, the Coefficient of Determination and Root Mean Squared Error were selected as performance metrics. To give each metric equal weighting, a binned scoring system was developed in which each metric received a score from 1 to 10 and the average represented that method's score. The Akaike Information Criterion (AIC) was also employed to determine the effect of the varied validation-set lengths on performance.
For the T-S fuzzy model, as the amount of data used to predict the respective validation points was reduced, the number of clusters became smaller and the cluster centres more spread out, the fuzzy overlap between clusters became larger, and the membership functions became wider. Although it was possible to determine a number of clusters, fuzzy overlap and membership-function width that yielded an optimal prediction of the validation data, the gain in performance was minor compared with many other combinations of parameters; for the data used in this study, the T-S fuzzy model was therefore insensitive to parameter choice. For OK, as the data were reduced, the range of spatial dependence obtained from variography became shorter, and for IDW the optimal value of the power parameter became lower, giving greater weight to more widely spread points. For the T-S fuzzy model, OK and IDW alike, the increasingly sparse data conditions resulted in increasingly poor model performance on all metrics, which was supported by AIC values for each method at each increment of the data reduction that were within one point of each other. The ability of the methods to predict outlier points and to reproduce the variance in the validation sets was very similar and overall quite poor. Based on the scoring system, IDW slightly outperformed the T-S fuzzy model, which slightly outperformed OK; however, the scoring system employed in this research was overly sensitive and so was only useful for assessing relative performance. The performance of the T-S model depended strongly on the number of outliers in the respective validation set. For modelling under sparse data conditions, the T-S fuzzy modelling approach using FCM clustering and constant-width Gaussian membership functions did not show any advantage over IDW and OK for the type of data tested, so it was not possible to speculate on a possible reduction in sampling frequency for delineating the extent of contamination in new remediation projects. / Graduate
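The sketch below illustrates the general shape of such a model: fuzzy c-means clustering of the sample coordinates, followed by a zero-order Takagi-Sugeno prediction with constant-width Gaussian memberships centred on the cluster centres. It is a simplified stand-in with illustrative parameter values and synthetic data, not the exact model configuration used in the thesis.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means: returns cluster centres (c, d) and memberships (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2 / (m - 1)))                 # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centres, U

def ts_fit_predict(X_train, z_train, X_new, c=5, sigma=50.0):
    """Zero-order T-S model: rule consequents are membership-weighted means of z,
    and predictions are Gaussian-membership-weighted averages of the consequents."""
    centres, U = fuzzy_c_means(X_train, c)
    consequents = (U.T @ z_train) / U.sum(axis=0)          # one constant per rule
    d2 = ((X_new[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))                     # constant-width Gaussian memberships
    return (w @ consequents) / w.sum(axis=1)

# Illustrative use: coordinates in metres, lead concentration in mg/kg (made-up values)
rng = np.random.default_rng(1)
X_train = rng.uniform(0, 500, size=(60, 2))
z_train = 200 + 0.3 * X_train[:, 0] + rng.normal(0, 20, 60)
X_new = np.array([[100.0, 250.0], [400.0, 50.0]])
z_pred = ts_fit_predict(X_train, z_train, X_new, c=4, sigma=80.0)
```

Varying the number of clusters c and the width sigma reproduces, in miniature, the parameter search the thesis reports as having little effect on accuracy.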
|
5 |
Clustering High-dimensional Noisy Categorical and Mixed Data. Zhiyi Tian, 27 July 2021.
Clustering is an unsupervised learning technique widely used to group data into homogeneous clusters. For much real-world data containing categorical values, existing algorithms are often computationally costly in high dimensions, do not work well on noisy data with missing values, and rarely provide theoretical guarantees on clustering accuracy. In this thesis, we propose a general categorical-data encoding method and a computationally efficient spectral-based algorithm to cluster high-dimensional noisy categorical (nominal or ordinal) data. Under a statistical model for data on m attributes from n subjects in r clusters with missing probability epsilon, we show that our algorithm exactly recovers the true clusters with high probability when mn(1 - epsilon) >= C M r^2 log^3 M, with M = max(n, m) and a fixed constant C. Moreover, we show that mn(1 - epsilon)^2 >= r * delta / 2 with 0 < delta < 1 is necessary for any algorithm to succeed with probability at least (1 + delta) / 2. In the case where m = n and r is fixed, for example, the sufficient condition matches the necessary condition up to a polylog(n) factor, showing that our proposed algorithm is nearly optimal. We also show that our algorithm outperforms several existing algorithms in both clustering accuracy and computational efficiency, both theoretically and numerically. In addition, we propose a spectral algorithm with standardization to cluster mixed data; this algorithm is computationally efficient, and its clustering accuracy has been evaluated numerically on both real-world and synthetic data.
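As a rough illustration of the spectral approach (not the thesis's exact encoding or algorithm), the sketch below one-hot encodes categorical attributes, zero-fills missing entries, projects the subjects onto the top r left singular vectors of the encoded matrix, and runs k-means in that low-dimensional space; the encoding, the missing-value handling and the choice of r are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster_categorical(data, r, seed=0):
    """Cluster subjects described by categorical attributes with missing values (None).

    data : list of lists, data[i][j] is the category of subject i on attribute j, or None
    r    : number of clusters
    """
    n, m = len(data), len(data[0])
    # One-hot encode each attribute; a missing entry contributes an all-zero block.
    blocks = []
    for j in range(m):
        cats = sorted({row[j] for row in data if row[j] is not None})
        index = {c: k for k, c in enumerate(cats)}
        B = np.zeros((n, len(cats)))
        for i, row in enumerate(data):
            if row[j] is not None:
                B[i, index[row[j]]] = 1.0
        blocks.append(B)
    X = np.hstack(blocks)                       # (n, total number of categories)
    # Top-r left singular vectors capture the low-rank cluster structure.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    embedding = U[:, :r]
    return KMeans(n_clusters=r, n_init=10, random_state=seed).fit_predict(embedding)

# Tiny illustrative example with 3 attributes and a few missing entries
data = [["a", "x", None], ["a", "x", "p"], ["b", None, "q"],
        ["b", "y", "q"], ["a", "x", "p"], ["b", "y", None]]
labels = spectral_cluster_categorical(data, r=2)
```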
|
6 |
A Permutation-Based Confidence Distribution for Rare-Event Meta-Analysis. Andersen, Travis, 18 April 2022.
Confidence distributions (CDs), which provide evidence across all levels of significance, are receiving increasing attention, especially in meta-analysis. Meta-analyses allow independent study results to be combined to produce one overall conclusion and are particularly useful in public health and medicine. For studies with rare binary outcomes, many traditional meta-analysis methods often fail (Sutton et al. 2002; Efthimiou 2018; Liu et al. 2018; Liu 2019; Hunter and Schmidt 2000; Kontopantelis et al. 2013). Zabriskie et al. (2021b) develop a permutation-based method to analyze such data when study treatment effects vary beyond what is expected by chance. In this work, we prove that this method can be considered a CD. Additionally, we develop two new metrics to assess a CD's relative performance.
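To make the CD object concrete, the sketch below constructs the textbook confidence distribution for a normal mean, H(theta) = F_{t,n-1}((theta - xbar)/(s/sqrt(n))), and reads off a point estimate, a confidence interval and a one-sided p-value from it; this is a standard illustration of what a CD provides, not the permutation-based CD developed in the thesis, and the sample values are made up.

```python
import numpy as np
from scipy import stats

def mean_cd(sample):
    """Confidence distribution H(theta) for the mean of a normal sample."""
    x = np.asarray(sample, dtype=float)
    n, xbar = len(x), x.mean()
    se = x.std(ddof=1) / np.sqrt(n)
    def H(theta):
        # H is increasing in theta; its quantiles give confidence limits.
        return stats.t.cdf((theta - xbar) / se, df=n - 1)
    def quantile(p):
        return xbar + se * stats.t.ppf(p, df=n - 1)
    return H, quantile

sample = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5, 2.4, 0.8]
H, Q = mean_cd(sample)
point_estimate = Q(0.5)                 # CD median
ci_95 = (Q(0.025), Q(0.975))            # central 95% confidence interval
p_one_sided = H(0.0)                    # p-value for the null hypothesis mean <= 0
```

The CD median gives the point estimate, pairs of quantiles give intervals at any level, and H evaluated at a null value gives a one-sided p-value, which is the sense in which a CD carries evidence across all levels of significance.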
|
7 |
Methods for Characterizing Groundwater Resources with Sparse In-Situ Data. Nishimura, Ren, 14 June 2022.
Groundwater resources must be accurately characterized in order to be managed sustainably. Because of the cost of installing monitoring wells and the challenges of collecting and managing in-situ data, groundwater data are sparse in space and time, especially in developing countries. In this study we analyzed long-term groundwater storage changes with limited time-series data in which each well had only one groundwater measurement in time. We developed methods to synthetically create time series of water table elevation (WTE) by clustering wells, using both a uniform grid and constrained k-means clustering, to form pseudo wells. Pseudo wells carrying the WTE values of their member wells were temporally and spatially interpolated to analyze groundwater changes. We applied the methods to the Beryl-Enterprise aquifer in Utah, where other researchers have previously quantified the groundwater storage depletion rate, and the methods yielded a similar depletion rate. The methods were then applied to the southern region of Niger, and the results showed a groundwater storage change that partially matched the trend calculated from GRACE data. With a data set so limited that regression or machine-learning approaches did not work, our method captured the groundwater storage trend correctly, and it can be used in areas where in-situ data are highly limited in time and space.
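A minimal sketch of the pseudo-well idea follows: wells with single measurements are grouped by k-means on their coordinates (plain k-means here rather than the size-constrained variant mentioned above), each cluster's measurements are pooled into one pseudo-well time series, and a linear WTE trend is fitted per pseudo well. The column names, trend model and synthetic data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def pseudo_well_trends(wells, n_pseudo=10, seed=0):
    """Group single-measurement wells into pseudo wells and fit a WTE trend to each.

    wells : DataFrame with columns x, y (coordinates), date (datetime64), wte (metres)
    """
    km = KMeans(n_clusters=n_pseudo, n_init=10, random_state=seed)
    wells = wells.copy()
    wells["pseudo_id"] = km.fit_predict(wells[["x", "y"]])
    trends = {}
    for pid, grp in wells.groupby("pseudo_id"):
        if len(grp) < 2:
            continue                           # need at least two dates to fit a trend
        years = (grp["date"] - grp["date"].min()).dt.days / 365.25
        slope, intercept = np.polyfit(years, grp["wte"], 1)
        trends[pid] = slope                    # metres of WTE change per year
    return wells, trends

# Illustrative synthetic data: 200 wells, one measurement each, slow regional decline
rng = np.random.default_rng(2)
days = rng.integers(0, 7300, 200)              # measurement dates spread over ~20 years
wells = pd.DataFrame({
    "x": rng.uniform(0, 50_000, 200),
    "y": rng.uniform(0, 50_000, 200),
    "date": pd.to_datetime("2000-01-01") + pd.to_timedelta(days, unit="D"),
    "wte": 1500.0 - 0.0005 * days + rng.normal(0, 0.5, 200),
})
labelled, trends = pseudo_well_trends(wells, n_pseudo=8)   # slopes of roughly -0.18 m/yr
```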
|
8 |
The Impact of Environmental Change and Water Conservation on Dryland Groundwater Resources in Northern Egypt: Modeling Aquifer Response Using Sparse Data. Switzman, Harris R.
Please contact the author with any questions. A compressed tar.b2z file is attached with the groundwater model input files. / Wadi El Natrun, located in the Western Desert of northern Egypt, has been subject to significant groundwater degradation since the 1990s, attributed primarily to agricultural development. The information required to diagnose the drivers of groundwater degradation and assess management options in dryland environments like Wadi El Natrun is, however, frequently sparse. This research presents an approach for modeling the impacts of dryland environmental change on groundwater in the context of sparse data. A focus is placed on understanding the potential impacts of conservation strategies in the context of climate change. Water-use, hydrostratigraphic and groundwater-flow data were collected from the literature, monitoring records, satellite imagery and a survey of local landholders. MODFLOW-NWT was used to model the multi-layer aquifer system, and algorithms were developed in R to create realizations of groundwater recharge and well pumping at a monthly time step from 1957 to 2011. The model was deemed reasonably capable of capturing the cumulative impact of environmental change over this historical period. A risk-assessment approach was then used to assess the impact of climate change and conservation-focused management scenarios on groundwater locally over a 50-year future planning horizon. The optimization of irrigation systems and increased cultivation of drought- and salt-tolerant crops have the potential to significantly reduce the risk of groundwater depletion compared with an across-the-board 20% water-use reduction scenario. The influence of groundwater pumping also outweighed that of climate change, and the most vulnerable water users and ecosystems were found to be the most exposed to groundwater degradation. / Master of Applied Science (MASc)
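As a schematic of the realization-generation step described above (the thesis implements this in R; a Python sketch is shown here), the code draws stochastic monthly well-pumping realizations by combining a long-term growth trend with seasonal and random variability; all rates, growth factors and variability parameters are purely illustrative assumptions, not values from the study.

```python
import numpy as np

def pumping_realizations(n_wells=25, n_years=55, n_real=100,
                         base_rate=500.0, annual_growth=0.03,
                         seasonal_amp=0.2, noise_cv=0.15, seed=0):
    """Draw stochastic monthly well-pumping realizations (m^3/day per well).

    Each realization combines exponential growth in demand, a sinusoidal
    seasonal cycle, and lognormal month-to-month noise with mean 1.
    Returns an array of shape (n_real, n_wells, n_years * 12).
    """
    rng = np.random.default_rng(seed)
    months = np.arange(n_years * 12)
    growth = (1 + annual_growth) ** (months / 12.0)              # long-term demand trend
    season = 1 + seasonal_amp * np.sin(2 * np.pi * (months % 12) / 12.0)
    sigma = np.sqrt(np.log(1 + noise_cv ** 2))                   # lognormal sigma for given CV
    noise = rng.lognormal(mean=-0.5 * sigma ** 2, sigma=sigma,
                          size=(n_real, n_wells, months.size))
    return base_rate * growth * season * noise                   # broadcasts over wells

realizations = pumping_realizations()
mean_trajectory = realizations.mean(axis=(0, 1))                 # average monthly pumping rate
```

Each realization can then be fed to the groundwater flow model as a pumping stress series, so that the spread across realizations expresses uncertainty in the risk assessment.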
|