51 |
Groundwater resources in coastal hard rock terrains: Geostatistical and GIS approach
Dehkordi, S. Emad (January 2009)
The Stockholm archipelago combines coastal and recently glaciated conditions on hard rock geology with almost no primary porosity and very limited secondary porosity. The aquifer therefore has limited capacity and is exposed to salinity problems. In this context, the importance of fractures and soil cover is magnified. Lineaments represent fractures in remote sensing. Fracture mapping in the study area shows close correspondence between the orientation of fractures and nearby lineaments. Especially in this type of terrain, lineaments normally occur together with many other hydrogeological features of interest, such as topographic attributes, soil, and vegetation; however, each of these factors still has its own effect on the groundwater situation. Using geostatistical analysis and a modified variant of the RV (Risk Variable) method, called the PV (Probability Value) method, different attributes are rated by importance. The results show that soil cover is the most influential factor, followed by rock type and distance from lineaments; other factors rank below them. The center of a lineament may not be the most suitable site to extract water because it can be clogged by fill. This is particularly the case for shear fractures, in which clay can form internally due to friction. Based on the statistical results, a model is built in a GIS environment to create hydrogeological maps. Such maps, after validation, can be used for any other area with similar properties, even with missing or very limited borehole data. These maps are strictly probability maps indicating areas with higher and lower aquifer potential; they cannot guarantee high capacity in every borehole drilled in the designated areas because of the high heterogeneity of fractured rock systems. Analysis of chemical data from wells shows a correlation of fracture orientation and topography with salinization and groundwater flow.
Groundwater flow in the surroundings seems to be essential for feeding the aquifer, as most of the wells with increased salt content also have low capacities.
|
52 |
Spatio-Temporal Statistical Modeling with Application to Wind Energy Assessment in Saudi Arabia
Chen, Wanfang (08 November 2020)
Saudi Arabia has been trying to change its long tradition of relying on fossil fuels
and seek renewable energy sources such as wind power. In this thesis, I first provide
a comprehensive assessment of wind energy resources and associated spatio-temporal
patterns over Saudi Arabia in both current and future climate conditions, based on a
Regional Climate Model output. A high wind energy potential exists and is likely to
persist at least until 2050 over a vast area of Western Saudi Arabia, particularly in the
region between Medina and the Red Sea coast and during summer months. Since an
accurate assessment of wind extremes is crucial for risk management purposes, I then
present the first high-resolution risk assessment of wind extremes over Saudi Arabia.
Under the Bayesian framework, I measure the uncertainty of return levels and produce
risk maps of wind extremes, which show that locations in the South of Saudi
Arabia and near the Red Sea and the Persian Gulf are at very high risk of disruption
of wind turbine operations. In order to perform spatial predictions of the bivariate
wind random field for efficient turbine control, I propose parametric variogram matrix
(function) models for cokriging, which have the advantage of allowing for a smooth
transition between a joint second-order and intrinsically stationary vector random
field. Under Gaussianity, the covariance function is central to spatio-temporal modeling,
which is useful to understand the dynamics of winds in space and time. I review
the various space-time covariance structures and models, some of which are visualized
with animations, and associated tests. I also discuss inference issues and a case study based on a high-resolution wind-speed dataset. The Gaussian assumption commonly
made in statistics needs to be validated, and I show that tests for independently and
identically distributed data cannot be used directly for spatial data. I then propose a
new multivariate test for spatial data by accounting for the spatial dependence. The
new test is easy to compute, has a chi-square null distribution, and has good control
of the type I error and high empirical power.
|
53 |
Some recent advances in multivariate statistics: modality inference and statistical monitoring of clinical trials with multiple co-primary endpoints
Cheng, Yansong (22 January 2016)
This dissertation focuses on two topics in multivariate statistics. The first part develops an inference procedure and a fast computation tool for the modal clustering method proposed by Li et al. (2007). Modal clustering, based on the kernel density estimate, groups data points by their association with a single mode, so that the final number of clusters equals the number of modes, otherwise known as the modality of the distribution of the data. This method provides a flexible tool for clustering data of low to moderate dimensions with arbitrary distributional shapes. We extend the method of Li and colleagues by proposing a procedure that determines the number of clusters in the data. A test statistic and its asymptotic distribution are derived to assess the significance of each mode within the data. The inference procedure is tested on both simulated and real data sets. In addition, an R package (Modalclust) is developed that implements the modal clustering procedure using parallel processing, which dramatically increases computing speed over the previously available method. This package is available on the Comprehensive R Archive Network (CRAN).
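The idea that each cluster corresponds to one mode of a kernel density estimate can be illustrated with a minimal one-dimensional sketch. It is written in Python for illustration only and is not the Modalclust package or the dissertation's procedure: it assumes a Gaussian kernel with a fixed bandwidth, uses a mean-shift style fixed-point iteration to climb to a mode, and omits the significance test for modes entirely. The function names `kde_mode` and `modal_clusters`, the bandwidth, and the sample data are all hypothetical.

```python
import math


def kde_mode(x0, data, bandwidth, tol=1e-8, max_iter=500):
    """Climb from x0 to a local maximum of a Gaussian kernel density estimate."""
    x = x0
    for _ in range(max_iter):
        # Gaussian kernel weight of every data point as seen from x.
        weights = [math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data]
        # Fixed-point update: move to the weighted mean of the data.
        x_new = sum(w * d for w, d in zip(weights, data)) / sum(weights)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def modal_clusters(data, bandwidth, merge_tol=1e-3):
    """Label each point by the mode it climbs to; one cluster per distinct mode."""
    modes, labels = [], []
    for x0 in data:
        mode = kde_mode(x0, data, bandwidth)
        for k, existing in enumerate(modes):
            if abs(mode - existing) < merge_tol:
                labels.append(k)
                break
        else:
            # No previously found mode is close enough: start a new cluster.
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical sample with two well-separated groups.
sample = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
modes, labels = modal_clusters(sample, bandwidth=0.5)
print(modes, labels)
```

In this sketch the number of clusters is simply the number of distinct modes found, so it depends on the chosen bandwidth; deciding which modes are statistically significant is the part the dissertation addresses with its test statistic.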
The second part of this dissertation develops methods of statistical monitoring of clinical trials with multiple co-primary endpoints, where success is defined as meeting both endpoints simultaneously. In practice, a group sequential design method is used to stop trials early for promising efficacy, and conditional power is used for futility stopping rules. In this dissertation we show that stopping boundaries for the group sequential design with multiple co-primary endpoints should be the same as those for studies with single endpoints. Lan and Wittes (1988) proposed the B-value tool to calculate the conditional power of single endpoint trials and we extend this tool to calculate the conditional power for studies with multiple co-primary endpoints. We consider the cases of two-arm studies with co-primary normal and binary endpoints and provide several examples of implementation with simulated trials. A fixed-weight sample size re-estimation approach based on conditional power is introduced. Finally, we discuss the possibility of blinded interim analyses for multiple endpoints using the modality inference method introduced in the first part.
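For orientation, the single-endpoint B-value calculation that the dissertation builds on can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the dissertation's multi-endpoint extension: it assumes a one-sided test with a single fixed critical value at the final analysis (no group sequential boundary), an interim z-statistic observed at information fraction t, and B(t) = Z(t)·sqrt(t). The function names and the default `alpha` are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def b_value(z_interim, info_fraction):
    """B-value at information fraction t: B(t) = Z(t) * sqrt(t)."""
    return z_interim * sqrt(info_fraction)


def conditional_power(z_interim, info_fraction, drift=None, alpha=0.025):
    """Probability of crossing the final one-sided critical value given interim data.

    drift is the assumed mean of B(1); if None, the current trend
    B(t) / t is used (an assumption, one of several common choices).
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must lie strictly between 0 and 1")
    b = b_value(z_interim, info_fraction)
    if drift is None:
        drift = b / info_fraction
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    remaining = 1.0 - info_fraction
    # B(1) - B(t) is normal with mean drift * (1 - t) and variance (1 - t).
    return 1.0 - _STD_NORMAL.cdf((z_crit - b - drift * remaining) / sqrt(remaining))


# Hypothetical interim look halfway through the trial.
cp_trend = conditional_power(z_interim=2.0, info_fraction=0.5)
cp_null = conditional_power(z_interim=2.0, info_fraction=0.5, drift=0.0)
print(cp_trend, cp_null)
```

Passing `drift=0.0` gives the conditional power under the null hypothesis, which is the pessimistic end of the range typically inspected for futility decisions.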
|
54 |
Ecological Correlates of Community Structure in Seagrass-Associated Fishes in North Biscayne Bay and Port of Miami, Florida
Colhoun, Elizabeth F (04 May 2018)
Seagrass beds are critical habitat for many fish species and are currently threatened by anthropogenic and natural factors, such as coastal development, pollution, global climate change, and sea level rise. Few studies have tracked long-term changes in seagrass habitats and their associated fish communities. This project addressed this need using data collected by the United States Geological Survey (USGS) from two South Florida sites, North Biscayne Bay, FL (NBB) and Port of Miami, FL (POM). The USGS sampling was part of ongoing monitoring projects designed to assist future management decisions that would enhance the protection of these valuable habitats. Data were collected biannually at the conclusion of the dry (April) and wet (September) seasons from 30 cells at each site. In each cell, the data collected included six replicates for seagrass species and cover, five sweep net collections for fish species and abundance, and abiotic variables (water temperature, salinity, turbidity, water depth, and sediment depth). A distinct loss of fish and seagrass species was observed, particularly between 2011 and 2014. These years coincided with several events, including the Port Miami Deep Dredge (PMDD) project during 2013-2015, periods of drought, and major storm events. Changes in fish community structure over this time period were largely driven by loss of species and increased homogenization of fish communities at both locations. More specifically, the NBB community shifted to resemble that of POM by 2014. These changes mirrored the loss of seagrass cover at both locations. Further studies are required to assess the extent to which ongoing dredging activities and other factors might be affecting seagrass cover, which ultimately affects fish communities.
|
55 |
Analysis of Coastal Erosion on Martha's Vineyard, Massachusetts: a Paraglacial Island
Brouillette-jacobson, Denise M (01 January 2008)
As the sea rises in response to global climate change, small islands will lose a significant portion of their land through ensuing erosion processes. The particular vulnerability of small island systems led me to choose Martha’s Vineyard (MV), a 248 km² paraglacial island 8 km off the south shore of Cape Cod, Massachusetts, as a model system with which to analyze the interrelated problems of sea level rise (SLR) and coastal erosion. Historical data documented ongoing SLR (~3 mm/yr) in the vicinity of MV. Three study sites differing in geomorphological and climatological properties, on the island’s south (SS), northwest (NW), and northeast (NE) coasts, were selected for further study. Mathematical models and spatial data analysis, as well as data on shoreline erosion from almost 1500 transects, were employed to evaluate the roles of geology, surficial geology, wetlands, land use, soils, percent of sand, slope, erodible land, wind, waves, and compass direction in the erosion processes at each site. These analyses indicated that: 1) the three sites manifested different rates of erosion and accretion, from a loss of approximately 0.1 m/yr at the NE and NW sites to over 1.7 m/yr at the SS site; 2) the NE and NW sites fit the ratio predicted by Bruun for the rate of erosion vs. SLR, but the SS site exceeded that ratio more than fivefold; 3) the shoreline erosion patterns for all three sites are dominated by short-range effects, not long-range stable effects; 4) geological components play key roles in erosion on MV, a possibility consistent with the island’s paraglacial nature; and 5) the south side of MV is the segment of the coastline that is particularly vulnerable to significant erosion over the next 100 years. These conclusions were not evident from simple statistical analyses.
Rather, the recognition that multiple factors besides sea level positions contribute to the progressive change in coastal landscapes only emerged from more complex analyses, including fractal dimension analysis, multivariate statistics, and spatial data analysis. This suggests that analyses of coastal erosion that are limited to only one or two variables may not fully unravel the underlying processes.
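The erosion-to-SLR comparison above can be made concrete with a short arithmetic sketch in Python. It uses the Bruun rule in its common form, retreat = S · L / (B + h), where S is the sea level rise, L the cross-shore length of the active profile, B the berm height, and h the closure depth. The rates are the rounded figures quoted in the abstract; the profile dimensions are hypothetical placeholders, since the abstract does not give the values used in the thesis, so no conclusion about which sites "fit" should be read from them.

```python
def bruun_retreat(sea_level_rise, profile_length, berm_height, closure_depth):
    """Shoreline retreat predicted by the Bruun rule, in the units of sea_level_rise."""
    return sea_level_rise * profile_length / (berm_height + closure_depth)


def erosion_to_slr_ratio(erosion_rate, slr_rate):
    """Observed horizontal erosion per unit of vertical sea level rise."""
    return erosion_rate / slr_rate


SLR_RATE = 0.003  # m/yr, the ~3 mm/yr quoted in the abstract

# Observed ratios from the approximate rates quoted in the abstract.
ratios = {
    "NE/NW": erosion_to_slr_ratio(0.1, SLR_RATE),
    "SS": erosion_to_slr_ratio(1.7, SLR_RATE),
}

# Hypothetical profile (NOT from the thesis): 1000 m active profile,
# 2 m berm, 8 m closure depth.
predicted = bruun_retreat(SLR_RATE, profile_length=1000.0,
                          berm_height=2.0, closure_depth=8.0)

for site, ratio in ratios.items():
    print(f"{site}: about {ratio:.0f} m of retreat per m of sea level rise")
print(f"Bruun retreat for the hypothetical profile: {predicted:.2f} m/yr")
```

Because the quoted rates are approximate ("approximately 0.1" and "over 1.7" m/yr), the resulting ratios are order-of-magnitude figures only.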
|
56 |
Integrative and Multivariate Statistical Approaches to Assessing Phenotypic and Genotypic Determinants of Complex Disease
Karns, Rebekah A., B.S. (05 October 2012)
No description available.
|
57 |
Chemometrics Development using Multivariate Statistics and Vibrational Spectroscopy and its Application to Cancer Diagnosis
Li, Ran (January 2015)
No description available.
|
58 |
Laser Electrospray Mass Spectrometry: Mechanistic Insights and Classification of Inorganic-Based Explosives and Tissue Phenotypes Using Multivariate Statistics
Flanigan IV, Paul M. (January 2014)
This dissertation develops a deeper understanding of the vaporization and electrospray post-ionization mechanisms at work when femtosecond laser pulses are used for desorption of surface molecules and electrospray ionization is used for capture and mass analysis of the gas-phase ions, a technique known as laser electrospray mass spectrometry (LEMS). The internal energy deposition from nonresonant vaporization with femtosecond laser pulses was measured using dried and liquid samples of p-substituted benzylpyridinium ions and peptides. A comparison of experiments using 800 nm and 1042 nm laser pulses showed that dried and liquid samples have different vaporization mechanisms. LEMS was established as a "soft" mass analysis technique, as it resulted in internal energy distributions comparable to those of ESI-MS, with one caveat: multiphoton excitation of dried samples results in extensive fragmentation at higher pulse energies. The quantitative aspects of the LEMS technique were established using various multicomponent mixtures of small biomolecules. Experiments with LEMS resulted in quantitative characteristics similar to those of ESI-MS, except that ESI-MS demonstrated a greater degree of ion suppression at higher concentrations, particularly in the four-component mixture. The lack of ion suppression in the LEMS measurements was due to the ~1% neutral capture efficiency and most likely not a result of nonequilibrium partitioning. This was supported by the excess charge limit not being surpassed in the LEMS experiments and by the quantitative analysis requiring the use of response factors. This dissertation also expanded the use of multivariate analysis for the classification of samples that were directly mass analyzed with LEMS without any sample preparation.
A novel electrospray complexation mixture using cationic pairing agents, a lipid, and sodium acetate enabled the simultaneous detection of positively, neutrally, and negatively charged features of inorganic-based explosive residues in a single experiment. This complexation mixture also enabled the detection of new features from an RDX-based propellant mixture. Principal component analysis (PCA) proved reliable for accurate classification of the explosive mixtures. PCA was also used for accurate classification of eight phenotypes of Impatiens plant flower petals after mass analysis with LEMS. The PCA loading values were used to identify the key biomarkers in the classification. These important mass spectral features were identified as the biologically relevant anthocyanins, which are phytochemicals responsible for the color of the flower petals. / Chemistry
|
59 |
Machine Learning and Multivariate Statistics for Optimizing Bioprocessing and Polyolefin Manufacturing
Agarwal, Aman (07 January 2022)
Chemical engineers have routinely used computational tools for modeling, optimizing, and debottlenecking chemical processes. Because of the advances in computational science over the past decade, multivariate statistics and machine learning have become an integral part of the computerization of chemical processes. In this research, we look into using multivariate statistics, machine learning tools, and their combinations through a series of case studies, including a case with a successful industrial deployment of machine learning models for fermentation. We use both a commercially available software tool, Aspen ProMV, and Python to demonstrate the feasibility of the computational tools.
This work demonstrates a novel application of ensemble-based machine learning methods in bioprocessing, particularly for the prediction of different fermenter types in a fermentation process (to allow for successful data integration) and the prediction of the onset of foaming. We apply two ensemble frameworks, Extreme Gradient Boosting (XGBoost) and Random Forest (RF), to build classification and regression models. Excessive foaming can interfere with the mixing of reactants and lead to problems, such as decreasing effective reactor volume, microbial contamination, product loss, and increased reaction time. Physical modeling of foaming is an arduous process as it requires estimation of foam height, which is dynamic in nature and varies for different processes.
In addition to foaming prediction, we extend our work to control and prevent foaming by allowing data-driven ad hoc addition of antifoam using exhaust differential pressure as an indicator of foaming. We use large-scale real fermentation data for six different types of sporulating microorganisms to predict foaming over multiple strains of microorganisms and build exploratory time-series driven antifoam profiles for four different fermenter types. In order to successfully predict the antifoam addition from the large-scale multivariate dataset (about half a million instances for 163 batches), we use TPOT (Tree-based Pipeline Optimization Tool), an automated genetic programming algorithm, to find the best pipeline from 600 other pipelines. Our antifoam profiles are able to decrease hourly volume retention by over 53% for a specific fermenter. A decrease in hourly volume retention leads to an increase in fermentation product yield.
We also study two different cases associated with the manufacturing of polyolefins, particularly LDPE (low-density polyethylene) and HDPE (high-density polyethylene). Through these cases, we showcase the usage of machine learning and multivariate statistical tools to improve process understanding and enhance the predictive capability for process optimization.
By using indirect measurements such as temperature profiles, we demonstrate the viability of such measures in the prediction of polyolefin quality parameters, anomaly detection, and statistical monitoring and control of the chemical processes associated with an LDPE plant. We use dimensionality reduction, visualization tools, and regression analysis to achieve our goals. Using advanced analytical tools and a combination of algorithms such as PCA (Principal Component Analysis), PLS (Partial Least Squares), and Random Forest, we identify predictive models that can be used to create inferential schemes.
Soft sensors are widely used for on-line monitoring and real-time prediction of process variables. In one of our cases, we use advanced machine learning algorithms to predict the polymer melt index, which is crucial in determining the product quality of polymers. We use real industrial data from one of the leading chemical engineering companies in the Asia-Pacific region to build a predictive model for an HDPE plant. Lastly, we show an end-to-end workflow for deep learning on both industrial and simulated polyolefin datasets.
Thus, using these five cases, we explore the usage of advanced machine learning and multivariate statistical techniques in the optimization of chemical and biochemical processes. The recent advances in computational hardware allow engineers to design such data-driven models, which enhances their capacity to effectively and efficiently monitor and control a process. We showcase that even non-expert chemical engineers can implement such machine learning algorithms with ease using open-source or commercially available software tools. / Doctor of Philosophy / Most chemical and biochemical processes are equipped with advanced probes and connectivity sensors that collect large amounts of data on a daily basis. It is critical to manage and utilize the significant amount of data collected from the start and throughout the development and manufacturing cycle. Chemical engineers have routinely used computational tools for modeling, designing, optimizing, debottlenecking, and troubleshooting chemical processes. Herein, we present different applications of machine learning and multivariate statistics using industrial datasets.
This dissertation also includes a deployed industrial solution to mitigate foaming in commercial fermentation reactors as a proof-of-concept (PoC). Our antifoam profiles are able to decrease volume loss by over 53% for a specific fermenter. Throughout this dissertation, we demonstrate applications of several techniques like ensemble methods, automated machine learning, exploratory time series, and deep learning for solving industrial problems. Our aim is to bridge the gap from industrial data acquisition to finding meaningful insights for process optimization.
|
60 |
The influence of probability of detection when modeling species occurrence using GIS and survey data
Williams, Alison Kay (12 April 2004)
I compared the performance of habitat models created from data of differing reliability. Because the reliability is dependent on the probability of detecting the species, I experimented to estimate detectability for a salamander species. Based on these estimates, I investigated the sensitivity of habitat models to varying detectability.
Models were created using a database of amphibian and reptile observations at Fort A.P. Hill, Virginia, USA. Performance was compared among modeling methods, taxa, life histories, and sample sizes. Model performance was poor for all methods and species, except for the carpenter frog (Rana virgatipes). Discriminant function analysis and ecological niche factor analysis (ENFA) predicted presence better than logistic regression and Bayesian logistic regression models. Database collections of observations have limited value as input for modeling because of the lack of absence data. Without knowledge of detectability, it is unknown whether non-detection represents absence.
To estimate detectability, I experimented with red-backed salamanders (Plethodon cinereus) using daytime, cover-object searches and nighttime, visual surveys. Salamanders were maintained in enclosures (n = 124) assigned to four treatments: daytime/low density, daytime/high density, nighttime/low density, and nighttime/high density. Multiple observations of each enclosure were made. Detectability was higher using daytime, cover-object searches (64%) than nighttime, visual surveys (20%). Detection was also higher in high-density (49%) versus low-density enclosures (35%).
Because of variation in detectability, I tested model sensitivity to the probability of detection. A simulated distribution was created using functions relating habitat suitability to environmental variables from a landscape. Surveys were replicated by randomly selecting locations (n = 50, 100, 200, or 500) and determining whether the species was observed, based on the probability of detection (p = 40%, 60%, 80%, or 100%). Bayesian logistic regression and ENFA models were created for each sample. When detection was 80-100%, Bayesian predictions were more correlated with the known suitability and identified presence more accurately than ENFA.
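The effect of imperfect detection on survey data can be illustrated with a small simulation sketch in Python. It is a simplification under stated assumptions, not the thesis's simulation: true presence is drawn from a single hypothetical occupancy rate rather than from habitat-suitability functions on a landscape, each site is visited once, and there are no false positives. The names `simulate_survey` and `naive_occupancy`, the seed, and the occupancy rate are all hypothetical.

```python
import random


def simulate_survey(true_presence, detection_prob, rng):
    """Record a detection only where the species is present and is detected."""
    return [present and rng.random() < detection_prob for present in true_presence]


def naive_occupancy(observations):
    """Fraction of surveyed sites with a detection, treating non-detection as absence."""
    return sum(observations) / len(observations)


rng = random.Random(42)   # fixed seed so the sketch is repeatable
n_sites = 500             # largest sample size mentioned in the abstract
true_occupancy = 0.5      # hypothetical; not a value from the thesis

true_presence = [rng.random() < true_occupancy for _ in range(n_sites)]

results = {}
for p in (0.4, 0.6, 0.8, 1.0):
    observed = simulate_survey(true_presence, p, rng)
    results[p] = naive_occupancy(observed)

for p, estimate in results.items():
    print(f"detection = {p:.0%}: naive occupancy estimate = {estimate:.3f}")
```

With perfect detection the naive estimate equals the true fraction of occupied sites; as detection falls, more occupied sites are recorded as absences, which is the contamination of absence data that makes presence/absence models sensitive to detectability.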
Probability of detection was variable among sampling methods and effort. Models created from presence/absence data were sensitive to the probability of detection in the input data. This stresses the importance of quantifying detectability and using presence-only modeling methods when detectability is low. If planning for sampling as an input for suitability modeling, it is important to choose sampling methods to ensure that detection is 80% or higher. / Ph. D.
|