About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Multiple Testing Correction with Repeated Correlated Outcomes: Applications to Epigenetics

Leap, Katie 27 October 2017 (has links)
Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers that are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate. We considered several underlying associations between an exposure and methylation over time. We found that testing each site with a linear mixed effects model and then controlling the false discovery rate (FDR) had the highest positive predictive value (PPV), a low number of false positives, and was able to differentiate between differential methylation that was present at only one time point vs. a persistent relationship. In contrast, methods that controlled FDR at a single time point and ad hoc methods tended to have lower PPV, more false positives, and/or were unable to differentiate these conditions. Validation in data obtained from Project Viva found a difference between fitting longitudinal models only to sites significant at one time point and fitting all sites longitudinally.
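The FDR control step described above is typically the Benjamini-Hochberg procedure applied across the per-site p-values. A minimal pure-Python sketch of that procedure (illustrative only, not code from the thesis):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR control: return a boolean list, True where
    the null hypothesis is rejected at FDR level q."""
    m = len(pvals)
    # Indices of p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis up to and including rank k_max.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

In the longitudinal setting the abstract describes, each p-value would come from a per-site linear mixed effects model rather than a single-time-point test.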
32

Zhodnocení finanční situace podniku pomocí statistických metod / Assessment of the Financial Situation of a Company Using Statistical Methods

Lesonický, Lukáš January 2013 (has links)
This thesis analyzes selected indicators of Zea a.s. using statistical methods, evaluating the company's performance on the basis of its accounting output. The first part covers the theoretical background of the selected economic indicators and of time series analysis. The second part applies statistical methods to the company's individual financial indicators to obtain forecasts for the following years. The last part assesses the individual indicators and formulates proposals and recommendations to improve the financial health of the company.
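As a simple illustration of the kind of time-series forecast the second part describes, simple exponential smoothing produces a one-step-ahead forecast from an indicator's history. The method and smoothing constant here are assumptions for illustration, not necessarily the thesis's choice:

```python
def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the final smoothed level serves as
    the one-step-ahead forecast. alpha in (0, 1] weights recent values."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```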
33

Statistická analýza ekonomických rizikových faktorů organizace / Statistical Analysis of an Organization's Economic Risk Factors

Ambrožová, Andrea January 2013 (has links)
This thesis identifies the high-risk economic factors of one particular organization and evaluates them using statistical methods. The principal aim of the study was to determine the organization's dominant economic indicators and to assess their development over time using statistical tools. The software used in the thesis was Statgraphics Centurion XV and MS Excel.
34

A Comparison of Techniques for Handling Missing Data in Longitudinal Studies

Bogdan, Alexander R 07 November 2016 (has links)
Missing data are a common problem in virtually all epidemiological research, especially when conducting longitudinal studies. In these settings, clinicians may collect biological samples to analyze changes in biomarkers, which often do not conform to parametric distributions and may be censored due to limits of detection. Using complete data from the BioCycle Study (2005-2007), which followed 259 premenopausal women over two menstrual cycles, we compared four techniques for handling missing biomarker data with non-Normal distributions. We imposed increasing degrees of missing data on two non-Normally distributed biomarkers under conditions of missing completely at random, missing at random, and missing not at random. Generalized estimating equations were used to obtain estimates from complete case analysis, multiple imputation using joint modeling, multiple imputation using chained equations, and multiple imputation using chained equations and predictive mean matching on Day 2, Day 13 and Day 14 of a standardized 28-day menstrual cycle. Estimates were compared against those obtained from analysis of the completely observed biomarker data. All techniques performed comparably when applied to a Normally distributed biomarker. Multiple imputation using joint modeling and multiple imputation using chained equations produced similar estimates across all types and degrees of missingness for each biomarker. Multiple imputation using chained equations and predictive mean matching consistently deviated from both the complete data estimates and the other missing data techniques when applied to a biomarker with a bimodal distribution. When addressing missing biomarker data in longitudinal studies, special attention should be given to the underlying distribution of the missing variable. As biomarkers become increasingly Normal, the amount of missing data tolerable while still obtaining accurate estimates may also increase when data are missing at random. 
Future studies are necessary to assess these techniques under more elaborate missingness mechanisms and to explore interactions between biomarkers for improved imputation models.
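Predictive mean matching, one of the imputation variants compared above, replaces each missing value with an observed value donated by a case whose model-predicted value is close. A minimal sketch of the donor-matching step (illustrative only, not the BioCycle analysis code):

```python
import random

def pmm_impute(pred_missing, pred_observed, obs_values, k=5, rng=None):
    """Predictive mean matching: for each missing case, find the k observed
    cases with the closest predicted values and draw one of their observed
    values at random as the imputation."""
    rng = rng or random.Random(0)
    imputed = []
    for p in pred_missing:
        # Indices of the k observed cases closest in predicted value.
        donors = sorted(range(len(pred_observed)),
                        key=lambda j: abs(pred_observed[j] - p))[:k]
        imputed.append(obs_values[rng.choice(donors)])
    return imputed
```

Because the imputed values are drawn from the observed data, PMM preserves the empirical distribution, which is why it is often preferred for non-Normal biomarkers even though it deviated here under bimodality.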
35

Comparison of Time Series and Functional Data Analysis for the Study of Seasonality.

Allen, Jake 17 August 2011 (has links) (PDF)
Classical time series analysis has well-known methods for the study of seasonality. The more recent approach of functional data analysis has proposed phase-plane plots for representing each year of a time series; however, the study of seasonality within functional data analysis has not been explored extensively. Time series analysis is introduced first, followed by phase-plane plot analysis; the two are then compared by examining the insight each offers, particularly with respect to the seasonal behavior of a variable. The possible combination of both approaches is also explored, specifically through the analysis of the phase-plane plots. The methods are applied to monthly observations of water flow, in cubic feet per second, collected from the French Broad River at Newport, TN. Simulated data corresponding to typical time series cases are then used for comparison and further exploration.
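A phase-plane plot pairs a curve's velocity with its acceleration at each time point. A minimal finite-difference sketch of the coordinates (central differences are an assumption here; functional data analysis would differentiate a smoothed curve instead):

```python
def phase_plane_points(series):
    """Approximate (velocity, acceleration) pairs for a phase-plane plot,
    using central differences on an evenly spaced series."""
    pts = []
    for t in range(1, len(series) - 1):
        vel = (series[t + 1] - series[t - 1]) / 2.0   # first derivative
        acc = series[t + 1] - 2 * series[t] + series[t - 1]  # second derivative
        pts.append((vel, acc))
    return pts
```

Plotting acceleration against velocity over one year traces a loop whose shape summarizes the seasonal cycle.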
36

MULTI-STATE MODELS WITH MISSING COVARIATES

Lou, Wenjie 01 January 2016 (has links)
Multi-state models have been widely used to analyze longitudinal event history data obtained in medical studies. The tools and methods developed recently in this area require completely observed datasets. However, in many applications measurements on certain components of the covariate vector are missing for some study subjects. In this dissertation, several likelihood-based methodologies were proposed to deal efficiently with different types of missing covariates when applying multi-state models. First, a maximum observed data likelihood method was proposed for data with a univariate missing pattern where the missing covariate is categorical. The construction of the observed data likelihood function is based on a model of the joint distribution of the longitudinal event history response and the discrete covariate with missing values. Second, we proposed a maximum simulated likelihood method to deal with a missing continuous covariate, approximating the observed data likelihood function by Monte Carlo simulation. Finally, an EM algorithm was used to handle multiple missing covariates when estimating the parameters of a multi-state model; the EM algorithm can efficiently accommodate multiple missing discrete covariates under a general missing pattern. All the proposed methods are justified by simulation studies and applications to datasets from the SMART project, a consortium of 11 different high-quality longitudinal studies of aging and cognition.
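As background, a discrete-time multi-state model with complete data reduces to estimating a transition probability matrix from observed state paths. A sketch of that baseline estimate (the dissertation's missing-covariate machinery is not shown):

```python
from collections import Counter

def transition_matrix(paths, states):
    """Estimate P(next state = b | current state = a) from observed paths,
    each path a list of states in time order."""
    counts, totals = Counter(), Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    # Row-normalize the transition counts; rows never observed get zeros.
    return {a: {b: counts[(a, b)] / totals[a] if totals[a] else 0.0
                for b in states}
            for a in states}
```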
37

Takens Theorem with Singular Spectrum Analysis Applied to Noisy Time Series

Torku, Thomas K 01 May 2016 (has links)
The evolution of big data has led to financial time series becoming increasingly complex, noisy, non-stationary and nonlinear. Takens theorem can be used to analyze and forecast nonlinear time series, but even small amounts of noise can hopelessly corrupt a Takens approach. In contrast, Singular Spectrum Analysis is an excellent tool for both forecasting and noise reduction. Fortunately, it is possible to combine the Takens approach with Singular Spectrum Analysis (SSA); in fact, estimation of key parameters in Takens theorem can be performed with SSA. In this thesis, we combine the denoising abilities of SSA with the Takens theorem approach to make the manifold reconstruction outcomes of Takens theorem less sensitive to noise. In particular, in the course of performing SSA on a noisy time series, we branch off into a Takens theorem approach. We apply this approach to a variety of noisy time series.
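The reconstruction step in Takens theorem is the delay embedding, which maps a scalar series into vectors of lagged values. A minimal sketch (the embedding dimension and delay are exactly the parameters the thesis estimates via SSA):

```python
def delay_embed(series, dim, tau):
    """Takens delay embedding: map x_t to the vector
    (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[t + j * tau] for j in range(dim)) for t in range(n)]
```

Under suitable conditions, the embedded points trace out a manifold diffeomorphic to the underlying attractor, which is what noise so easily distorts.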
38

Spatio-Temporal Analysis of Point Patterns

Soale, Abdul-Nasah 01 August 2016 (has links)
In this thesis, the basic tools of spatial statistics and time series analysis are applied to a case study of earthquakes in a particular geographical region and time frame. Some of the existing methods for joint analysis of time and space are then described and applied. Finally, additional research questions about the spatio-temporal distribution of the earthquakes are posed and explored using statistical plots and models. The focus of the last section is on the relationship between the number of events per year and the maximum magnitude, and its effect on how clustered the spatial distribution is; it also examines the relationship between the temporal and spatial distances separating consecutive events, as well as the distribution of those distances.
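The last relationship mentioned, the distances in time and space between consecutive events, can be computed directly from an event catalog. A sketch assuming events are given as (time, x, y) triples in planar coordinates (an assumption for illustration; real catalogs use latitude/longitude):

```python
import math

def consecutive_distances(events):
    """events: list of (t, x, y) sorted by time.
    Returns (time gap, spatial distance) pairs for consecutive events."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(events, events[1:]):
        out.append((t1 - t0, math.hypot(x1 - x0, y1 - y0)))
    return out
```

Plotting the time gaps against the spatial distances is a simple first look at space-time interaction.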
39

Improved Methods and Selecting Classification Types for Time-Dependent Covariates in the Marginal Analysis of Longitudinal Data

Chen, I-Chen 01 January 2018 (has links)
Generalized estimating equations (GEE) are popularly utilized for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are present, these equations can be biased unless an independence working correlation structure is employed. Moreover, in this case regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches using the generalized method of moments or quadratic inference functions have been proposed for utilizing all valid moment conditions. However, we have found that such methods will not always provide valid inference and can also be improved upon in terms of finite-sample regression parameter estimation. Therefore, we propose a modified GEE approach and a selection method that will both ensure the validity of inference and improve regression parameter estimation. In addition, these modified approaches assume the data analyst knows the type of time-dependent covariate, although this likely is not the case in practice. Whereas hypothesis testing has been used to determine covariate type, we propose a novel strategy to select a working covariate type in order to avoid the potentially high type II error rates of these hypothesis testing procedures. Parameter estimates resulting from our proposed method are consistent and have overall improved mean squared error relative to hypothesis testing approaches. Finally, because mean regression models can be sensitive to skewness and outliers in real-world data, we extend our approaches to marginal quantile regression, modeling the conditional quantiles of the response variable. Existing and proposed methods are compared in simulation studies and application examples.
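For the linear (identity-link) case, GEE under an independence working correlation yields the same point estimates as pooled least squares over all subject-time observations, which is part of why that structure is the safe default with time-dependent covariates. A single-covariate sketch of that baseline (illustrative only; the modified GEE and selection methods above are not shown):

```python
def pooled_ols_slope(x, y):
    """Pooled least-squares slope over all (subject, time) observations.
    For an identity link, this equals the GEE estimate under an
    independence working correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx
```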
40

Bias Reduction in Machine Learning Classifiers for Spatiotemporal Analysis of Coral Reefs using Remote Sensing Images

Gapper, Justin J. 06 May 2019 (has links)
This dissertation is an evaluation of the generalization characteristics of machine learning classifiers as applied to the detection of coral reefs using remote sensing images. Three scientific studies have been conducted as part of this research: 1) Evaluation of Spatial Generalization Characteristics of a Robust Classifier as Applied to Coral Reef Habitats in Remote Islands of the Pacific Ocean; 2) Coral Reef Change Detection in Remote Pacific Islands using Support Vector Machine Classifiers; and 3) A Generalized Machine Learning Classifier for Spatiotemporal Analysis of Coral Reefs in the Red Sea. The aim of this dissertation is to propose and evaluate a methodology for developing a robust machine learning classifier that can effectively be deployed to accurately detect coral reefs at scale. The hypothesis is that Landsat data can be used to train a classifier to detect coral reefs in remote sensing imagery and that this classifier can be trained to generalize across multiple sites. Another objective is to identify how well different classifiers perform under the generalized conditions and how unique the spectral signature of coral is as environmental conditions vary across observation sites. A methodology for validating the generalization performance of a classifier to unseen locations (Controlled Parameter Cross-Validation) is proposed and implemented. Analysis is performed using satellite imagery from nine different locations with known coral reefs (six Pacific Ocean sites and three Red Sea sites). Ground truth observations for four of the Pacific Ocean sites and two of the Red Sea sites were used to validate the proposed methodology. Within the Pacific Ocean sites, the consolidated classifier (trained on data from all sites) yielded an accuracy of 75.5% (0.778 AUC). Within the Red Sea sites, the consolidated classifier yielded an accuracy of 71.0% (0.7754 AUC). Finally, long-term change detection analysis is conducted for each of the sites evaluated.
In total, over 16,700 km² was analyzed for benthic cover type and cover change detection analysis. Within the Pacific Ocean sites, decreases in coral cover ranged from a 25.3% reduction (Kingman Reef) to a 42.7% reduction (Kiritimati Island). Within the Red Sea sites, decreases in coral cover ranged from 3.4% (Umluj) to 13.6% (Al Wajh).
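The AUC figures quoted above can be computed from classifier scores via the rank-sum formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. An illustrative sketch (not the dissertation's evaluation code):

```python
def auc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```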
