51

A statistical framework to detect gene-environment interactions influencing complex traits

Deng, Wei Q. 27 August 2014 (has links)
Advancements in human genomic technology have helped to improve our understanding of how genetic variation plays a central role in the mechanism of disease susceptibility. However, the very high dimensional nature of the data generated from large-scale genetic association studies has limited our ability to thoroughly examine genetic interactions. A prioritization scheme – Variance Prioritization (VP) – has been developed to select genetic variants based on differences in the quantitative trait variance between the possible genotypes using Levene’s test (Pare et al., 2010). Genetic variants with Levene’s test p-values lower than a pre-determined level of significance are selected to test for interactions using linear regression models. Under a variety of scenarios, VP has increased power to detect interactions over an exhaustive search as a result of reduced search space. Nevertheless, the use of Levene’s test does not take into account that the variance will either monotonically increase or decrease with the number of minor alleles when interactions are present. To address this issue, I propose a maximum likelihood approach to test for trends in variance between the genotypes, and derive a closed-form representation of the likelihood ratio test (LRT) statistic. Using simulations, I examine the performance of LRT in assessing the inequality of quantitative trait variance stratified by genotypes, and subsequently in identifying potentially interacting genetic variants. LRT is also used in an empirical dataset of 2,161 individuals to prioritize genetic variants for gene-environment interactions. The interaction p-values of the prioritized genetic variants are consistently lower than expected by chance compared to the non-prioritized ones, suggesting improved statistical power to detect interactions in the set of prioritized genetic variants. This new statistical test is expected to complement the existing VP framework and accelerate the process of genetic interaction discovery in future genome-wide studies and meta-analyses. / Master of Health Sciences (MSc)
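As an illustration only (not code from the thesis), the following minimal Python sketch shows the two-stage variance-prioritization idea described above: screen a variant with Levene's test on trait variance across genotype groups, then fit the interaction model only if the variant is prioritized. The simulated data, variable names, and the 0.05 screening threshold are assumptions.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
genotype = rng.integers(0, 3, n)        # 0/1/2 copies of the minor allele
exposure = rng.normal(size=n)           # environmental exposure
# trait with a genuine G x E interaction, for illustration
trait = 0.2 * genotype + 0.3 * exposure + 0.4 * genotype * exposure + rng.normal(size=n)

# Stage 1: Levene's test of variance heterogeneity across genotype groups
groups = [trait[genotype == g] for g in (0, 1, 2)]
levene_p = stats.levene(*groups).pvalue

# Stage 2: test the interaction term only if the variant is prioritized
if levene_p < 0.05:                     # illustrative screening threshold
    X = sm.add_constant(np.column_stack([genotype, exposure, genotype * exposure]))
    fit = sm.OLS(trait, X).fit()
    print("interaction p-value:", fit.pvalues[3])
```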
52

In-Shoe Plantar Pressure System To Investigate Ground Reaction Force Using Android Platform

Mostfa, Ahmed A. 01 January 2016 (has links)
Human footwear is not yet designed to optimally relieve pressure on the heel of the foot, and proper foot pressure assessment requires personal training and measurements by specialized machinery. This research aims to investigate the Preferred Transition Speed (PTS) and to classify gait phases from the variances in walking patterns between different subjects. An in-shoe wearable pressure system using an Android application was developed to investigate walking patterns and collect data on Activities of Daily Living (ADL). The in-shoe circuitry used Flexi-Force A201 sensors placed at three major areas (heel contact, 1st metatarsal, and 5th metatarsal), together with a PIC16F688 microcontroller and a Bluetooth module. This provides a low-cost solution for wearing the system and recording plantar pressure simultaneously. Data acquisition used internal local memory to store pressure logs for offline analysis, and data processing used the perpendicular slope to determine peak pressure and its time index. Statistical analysis can then be applied to detect foot deformity. The empirical results in one subject showed weak linearity between normal and fast walking and a significant difference in body-weight acceptance between normal and slow walking. In addition, t-test hypothesis testing between two healthy subjects illustrated a significant difference in their initial-contact pressure and no difference in their peak-to-peak time interval. Preferred Transition Speed versus vertical ground reaction force (VGRF) was measured in 19 subjects. The experiments demonstrated that VGRF increased on average by 18.46% (SD 4.78) when the speed changed from 50% to 75% of PTS, and by 21.24% (SD 7.81) when the speed changed from 75% to 100% of PTS. Finally, logistic regression across 12 healthy subjects demonstrated good classification, with 82.6% accuracy, between partial foot bearing and normal walking.
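A minimal sketch (with hypothetical pressure readings, not the thesis data or code) of two of the analyses mentioned above: a two-sample t-test on initial-contact pressure between two subjects, and the percentage change in mean VGRF between walking speeds.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# hypothetical initial-contact heel pressures (kPa) for two subjects
subject_a = rng.normal(loc=220, scale=15, size=30)
subject_b = rng.normal(loc=205, scale=15, size=30)

# two-sample t-test on initial-contact pressure
t_stat, p_value = stats.ttest_ind(subject_a, subject_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# percentage change in mean vertical GRF between two walking speeds (illustrative values)
vgrf_50, vgrf_75 = 650.0, 770.0        # hypothetical mean VGRF in newtons
pct_increase = 100 * (vgrf_75 - vgrf_50) / vgrf_50
print(f"VGRF increase: {pct_increase:.1f}%")
```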
53

Dimension Reduction and Variable Selection

Moradi Rekabdarkolaee, Hossein 01 January 2016 (has links)
High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have taken place in high-dimensional data analysis, driven primarily by a wide range of applications in fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high-dimensional data analysis. Sufficient dimension reduction provides a way to find the reduced space of the original space without a parametric model, and in recent years it has been widely applied in scientific fields such as genetics, brain imaging analysis, econometrics, and environmental sciences. In this dissertation, we worked on three projects. The first combines local modal regression and Minimum Average Variance Estimation (MAVE) to introduce a robust dimension reduction approach. In addition to being robust to outliers and heavy-tailed distributions, our proposed method has the same convergence rate as the original MAVE. Furthermore, we combine local modal-based MAVE with an L1 penalty to select informative covariates in a regression setting. This new approach can exhaustively estimate directions in the regression mean function and select informative covariates simultaneously, while being robust to possible outliers in the dependent variable. The second project develops sparse adaptive MAVE (saMAVE). SaMAVE has advantages over adaptive LASSO in that it extends adaptive LASSO to multi-dimensional and nonlinear settings without any model assumption, and it has advantages over sparse inverse dimension reduction methods in that it does not require any particular probability distribution on X. In addition, saMAVE can exhaustively estimate the dimensions in the conditional mean function. The third project extends the envelope method to multivariate spatial data. The envelope technique is a recent refinement of the classical multivariate linear model, and the envelope estimator asymptotically has less variation compared to the maximum likelihood estimator (MLE). The current envelope methodology is for independent observations. While the assumption of independence is convenient, it does not address the additional complication associated with spatial correlation. This work extends the envelope method to cases where independence is an unreasonable assumption, specifically multivariate data from spatially correlated processes. This novel approach provides estimates for the parameters of interest with smaller variance than the MLE while still capturing the spatial structure in the data.
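The dissertation's methods build on MAVE; as a lighter, self-contained illustration of the penalized variable-selection idea they extend, here is a sketch of plain adaptive LASSO in a linear model (not saMAVE itself). The simulated data, the weight exponent gamma = 1, and the penalty alpha = 0.1 are illustrative; in practice the penalty would be tuned, for example by cross-validation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

# Stage 1: OLS coefficients give the adaptive weights (gamma = 1)
ols = LinearRegression().fit(X, y)
weights = 1.0 / np.abs(ols.coef_)

# Stage 2: an ordinary lasso on rescaled predictors is equivalent to a weighted L1 penalty
X_scaled = X / weights                      # divide column j by weight w_j
fit = Lasso(alpha=0.1).fit(X_scaled, y)
beta_hat = fit.coef_ / weights              # map coefficients back to the original scale
print(np.round(beta_hat, 2))
```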
54

THE USE OF 3-D HIGHWAY DIFFERENTIAL GEOMETRY IN CRASH PREDICTION MODELING

Amiridis, Kiriakos 01 January 2019 (has links)
The objective of this research is to evaluate and introduce a new methodology for rural highway safety. Current practices rely on crash prediction models that use specific explanatory variables, with the Highway Safety Manual (HSM) serving as the repository of knowledge from past research. Most of the prediction models in the HSM identify the effect of individual geometric elements on crash occurrence and consider their combination in a multiplicative manner, where each effect is multiplied by the others to determine their combined influence. Concepts of 3-dimensional (3-D) representation of the roadway surface have also been explored in the past, aiming to model the highway structure and optimize the roadway alignment. Differential geometry has recently been applied to the 3-D roadway surface to understand how new metrics can identify and express roadway geometric elements, and the results indicate that this may be a new approach to representing the combined effects of all geometric features in single variables. This research further explores that potential and examines the possibility of using 3-D differential geometry to represent the roadway surface and of using its associated metrics to capture the combined effect of roadway features on crashes. It is anticipated that a series of single metrics could combine horizontal and vertical alignment features and eventually predict roadway crashes in a more robust manner. It should also be noted that the main purpose of this research is not simply to suggest predictive crash models, but to prove in a statistically concrete manner that 3-D metrics of differential geometry, e.g., Gaussian curvature and mean curvature, can assist in analyzing highway design and safety. Therefore, the value of this research lies in the proof of concept of the link between 3-D geometry in highway design and safety. This thesis presents the steps and rationale of the procedure followed in order to complete the proposed research. Finally, the results of the suggested methodology are compared with those derived from the state-of-the-art Interactive Highway Safety Design Model (IHSDM), the software currently in use, which is based on the findings of the HSM.
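As a proof-of-concept sketch only (the roadway surface below is made up, not the thesis data), Gaussian and mean curvature for a surface given as z = f(x, y) on a grid can be computed from finite-difference derivatives using the standard Monge-patch formulas.

```python
import numpy as np

# hypothetical roadway surface z = f(x, y) on a regular grid
x = np.linspace(0, 200, 401)            # longitudinal distance (m)
y = np.linspace(-10, 10, 41)            # lateral offset (m)
X, Y = np.meshgrid(x, y, indexing="ij")
Z = 0.02 * np.sin(X / 30.0) * X + 0.01 * Y ** 2   # made-up grades and curves

dx, dy = x[1] - x[0], y[1] - y[0]
fx, fy = np.gradient(Z, dx, dy)          # first partial derivatives
fxx, fxy = np.gradient(fx, dx, dy)       # second partial derivatives
_, fyy = np.gradient(fy, dx, dy)

w = 1.0 + fx ** 2 + fy ** 2
gaussian_K = (fxx * fyy - fxy ** 2) / w ** 2
mean_H = ((1 + fy ** 2) * fxx - 2 * fx * fy * fxy + (1 + fx ** 2) * fyy) / (2 * w ** 1.5)

print("max |Gaussian curvature|:", np.abs(gaussian_K).max())
print("max |mean curvature|:", np.abs(mean_H).max())
```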
55

TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA

Weng, Jiaying 01 January 2019 (has links)
The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is an important tool in modern data analysis and has received extensive attention in both academia and industry. In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to high-dimensional data problems. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum discrepancy approaches to achieve optimal solutions, and we develop a novel and efficient optimization algorithm to obtain the sparse estimates. We derive asymptotic properties of the proposed estimators and demonstrate their efficiency gains compared to traditional estimators; the oracle properties of the sparse estimates are also derived. Simulation studies and real data examples are used to illustrate the effectiveness of the proposed methods. The wavelet transform is another tool that effectively detects information from the time-localization of high-frequency components. In parallel to the proposed Fourier transform methods, we also develop a wavelet-transform version of the approach and derive the asymptotic properties of the resulting estimators.
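A minimal sketch of one common slicing-free construction in the spirit of Fourier-transform inverse regression (not necessarily the dissertation's exact estimator): integrating the inverse-regression characteristic function against a Gaussian weight yields a candidate matrix built from pairwise Gaussian weights in the response, whose leading eigenvectors estimate the reduction directions. The toy data and the bandwidth sigma_w are assumptions.

```python
import numpy as np

def fourier_ire_directions(X, y, sigma_w=1.0, n_dirs=1):
    """Slicing-free inverse regression using a Gaussian-weighted Fourier kernel."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # cov^{-1/2}
    Z = (X - mu) @ inv_sqrt                               # standardized predictors
    # closed form for the Gaussian-weighted integral of psi(t) psi(t)*
    W = np.exp(-0.5 * sigma_w ** 2 * (y[:, None] - y[None, :]) ** 2)
    M = Z.T @ W @ Z / n ** 2
    _, v = np.linalg.eigh(M)
    beta = inv_sqrt @ v[:, ::-1][:, :n_dirs]              # leading eigenvectors
    return beta / np.linalg.norm(beta, axis=0)

# toy check: the response depends on X only through one direction
rng = np.random.default_rng(3)
X = rng.normal(size=(800, 6))
beta_true = np.zeros(6)
beta_true[0] = 1.0
y = (X @ beta_true) ** 3 + 0.2 * rng.normal(size=800)
print(np.round(fourier_ire_directions(X, y).ravel(), 2))
```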
56

Generalizing Multistage Partition Procedures for Two-parameter Exponential Populations

Wang, Rui 06 August 2018 (has links)
ANOVA is a classic tool for multiple comparisons and has been widely used in numerous disciplines due to its simplicity and convenience. The ANOVA procedure is designed to test whether a number of different populations differ, and this is followed by the usual multiple-comparison tests to rank the populations. However, the ANOVA procedure does not guarantee that the probability of selecting the best population exceeds some desired prespecified level. This shortcoming of the ANOVA procedure was overcome by researchers in the early 1950s by designing experiments with the explicit goal of selecting the best population. In this dissertation, a single-stage procedure is introduced to partition k treatments into "good" and "bad" groups with respect to a control population, assuming some key parameters are known. Next, the proposed partition procedure is generalized to the case where the parameters are unknown, and a purely sequential procedure and a two-stage procedure are derived. Theoretical asymptotic properties, such as first-order and second-order properties, of the proposed procedures are derived to document their efficiency. These theoretical properties are then studied via Monte Carlo simulations to document the performance of the procedures for small and moderate sample sizes.
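As background only (the partition procedures themselves involve design constants and stopping rules not reproduced here), the two-parameter, or shifted, exponential model underlying them has simple closed-form maximum likelihood estimates, sketched below on simulated data.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, sigma = 2.0, 1.5                        # location and scale parameters
x = theta + rng.exponential(scale=sigma, size=50)

theta_hat = x.min()                            # MLE of the location: the sample minimum
sigma_hat = x.mean() - x.min()                 # MLE of the scale: mean minus minimum
print(f"theta_hat = {theta_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```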
57

ESTIMATING THE RESPIRATORY LUNG MOTION MODEL USING TENSOR DECOMPOSITION ON DISPLACEMENT VECTOR FIELD

Kang, Kingston 01 January 2018 (has links)
Modern big data often emerge as tensors, and standard statistical methods are inadequate for datasets of large volume, high dimensionality, and complex structure. It is therefore important to develop algorithms such as low-rank tensor decomposition for data compression, dimensionality reduction, and approximation. With the advancement in technology, high-dimensional images are becoming ubiquitous in the medical field. In lung radiation therapy, the respiratory motion of the lung introduces variability during treatment as the tumor inside the lung moves, which challenges the precise delivery of radiation to the tumor. Several approaches to quantifying this uncertainty propose using a model that formulates the motion as a mathematical function over time. [Li et al., 2011] propose one such model using principal component analysis (PCA), treating each image as a long vector. However, the images are multidimensional arrays, and vectorization breaks their spatial structure. Driven by the need to develop low-rank tensor decompositions, and given 4DCT images and their Displacement Vector Fields (DVF), we introduce two tensor decompositions, Population Value Decomposition (PVD) and Population Tucker Decomposition (PTD), to estimate respiratory lung motion with high levels of accuracy and data compression. The first algorithm generalizes PVD [Crainiceanu et al., 2011] to higher-order tensors; the second generalizes the concept of PVD using the Tucker decomposition. Both algorithms are tested on clinical and phantom DVFs. New metrics for measuring model performance are developed in our research, and the results of the two new algorithms are compared to those of the PCA algorithm.
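A minimal numpy sketch of a truncated higher-order SVD, one simple non-iterative way to obtain a Tucker decomposition, applied to a made-up displacement-vector-field tensor. It is illustrative only and is not the PVD/PTD algorithms developed in the dissertation; the grid sizes and ranks are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Matricize a tensor along one mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, U, mode):
    """Mode-n product: multiply matrix U into the given mode of tensor T."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    """Truncated higher-order SVD: a simple, non-iterative Tucker decomposition."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_product(core, U.T, m)
    return core, factors

# made-up, mostly smooth displacement field on a (x, y, z, component) grid
rng = np.random.default_rng(5)
x, y, z = np.linspace(0, 1, 32), np.linspace(0, 1, 32), np.linspace(0, 1, 16)
Xg, Yg, Zg = np.meshgrid(x, y, z, indexing="ij")
base = np.sin(2 * np.pi * Xg) * np.cos(2 * np.pi * Yg) * Zg
dvf = np.stack([base, 0.5 * base, 0.1 * base], axis=-1)
dvf += 0.01 * rng.normal(size=dvf.shape)

core, factors = hosvd(dvf, ranks=(8, 8, 4, 3))
recon = core
for m, U in enumerate(factors):
    recon = mode_product(recon, U, m)
rel_err = np.linalg.norm(recon - dvf) / np.linalg.norm(dvf)
print(f"relative error: {rel_err:.3f}, core/original size ratio: {core.size / dvf.size:.3f}")
```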
58

Sabermetrics - Statistical Modeling of Run Creation and Prevention in Baseball

Chernoff, Parker 30 March 2018 (has links)
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and were also used for cross-validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning. Each model fit strongly, and collinearity among offensive predictors was assessed using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per nine innings were significant defensive predictors. Doubles, home runs, walks, batting average, and runners left on base were significant offensive regressors. Both models produced error rates below 3% for run prediction, and together they did an excellent job of estimating a team’s per-season win ratio.
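The Pythagorean Expectation step is simple enough to show directly. A small sketch with the classic exponent of 2 and made-up season totals follows; the thesis feeds model-predicted run totals into this formula rather than the illustrative numbers used here.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected win ratio from runs scored and runs allowed (classic exponent of 2)."""
    return runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)

# e.g. a team projected to score 800 runs and allow 700 over a season
win_pct = pythagorean_win_pct(800, 700)
print(f"expected win ratio: {win_pct:.3f}")   # about 0.566, roughly 92 wins in 162 games
```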
59

Mixture of Factor Analyzers with Information Criteria and the Genetic Algorithm

Turan, Esra 01 August 2010 (has links)
In this dissertation, we have developed and combined several statistical techniques in Bayesian factor analysis (BAYFA) and mixture of factor analyzers (MFA) to overcome the shortcomings of existing methods. Information criteria are brought into the context of the BAYFA model as a decision rule for choosing the number of factors m, alongside the Press and Shigemasu method, Gibbs sampling, and Iterated Conditional Modes deterministic optimization. Because of the sensitivity of BAYFA to prior information on the factor pattern structure, the prior factor pattern structure is learned adaptively and directly from the sample observations using the Sparse Root algorithm. Clustering and dimensionality reduction have long been considered two of the fundamental problems in unsupervised learning and statistical pattern recognition. In this dissertation, we introduce a novel statistical learning technique by viewing MFA as a method for model-based density estimation that clusters high-dimensional data and simultaneously carries out factor analysis to mitigate the curse of dimensionality within an expert data mining system. The typical EM algorithm can get trapped in one of many local maxima; it is slow to converge, is not guaranteed to reach the global optimum, and is highly dependent on initial values. We extend the EM algorithm for MFA proposed by [Ghahramani, 1997] using intelligent initialization techniques, namely K-means and a regularized Mahalanobis distance, and introduce a new Genetic EM (GEM) algorithm for MFA to overcome the shortcomings of the typical EM algorithm. Another shortcoming of the EM algorithm for MFA is the assumption that the variance of the error vector and the number of factors are the same for each mixture component. We propose a two-stage GEM algorithm for MFA to relax this constraint and obtain a different number of factors for each population. In this dissertation, our approach integrates statistical modeling procedures based on information criteria as a fitness function to determine the number of mixture clusters and, at the same time, to choose the number of factors that can be extracted from the data.
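As a lighter analogue only: the idea of using an information criterion as the decision rule for model order can be illustrated with a BIC sweep over the number of components of an ordinary Gaussian mixture (scikit-learn). The dissertation applies the same principle to mixtures of factor analyzers, with the criterion serving as the fitness function of a genetic algorithm rather than a grid search; the data and candidate range below are made up.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# two well-separated clusters in five dimensions
X = np.vstack([rng.normal(0, 1, size=(150, 5)), rng.normal(4, 1, size=(150, 5))])

# BIC as the decision rule for the number of mixture components
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X) for k in range(1, 6)}
best_k = min(bics, key=bics.get)
print(bics, "-> chosen k =", best_k)
```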
60

Musical Missteps: The Severity of the Sophomore Slump in the Music Industry

Zackery, Shane M. 17 May 2014 (has links)
This study looks at alternative models of follow-up album success in order to determine whether there is a relationship between the decrease in Metascore ratings (assigned by Metacritic.com) from the first to the second album for a musician or band and (1) the music genre or (2) the number of years between the first and second album releases. The results support the dominant view, which suggests that neither belonging to a certain genre of music nor waiting more or less time to release the second album makes an artist more susceptible to the Sophomore Slump. This finding is important because it forces us to identify other potential causes for the observed disappointing performance of a generally favorable musician’s second album.
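A minimal sketch (with made-up album data, not the study's dataset) of the kind of regression the question implies: does genre or the gap between releases predict the size of the score drop?

```python
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical album data: drop in Metascore from debut to sophomore album
albums = pd.DataFrame({
    "score_drop": [5, -2, 8, 0, 3, 6, -1, 4, 2, 7, 1, -3],
    "genre":      ["rock", "pop", "rap", "rock", "pop", "rap",
                   "rock", "pop", "rap", "rock", "pop", "rap"],
    "years_gap":  [2, 1, 3, 2, 4, 1, 2, 3, 2, 5, 1, 2],
})

# regress the score drop on genre (categorical) and the years between releases
model = smf.ols("score_drop ~ C(genre) + years_gap", data=albums).fit()
print(model.summary().tables[1])
```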
