91 |
Hierarchical Statistical Models for Large Spatial Data in Uncertainty Quantification and Data FusionShi, Hongxiang January 2017 (has links)
No description available.
|
92 |
STATISTICAL APPROACHES TO ANALYZE CENSORED DATA WITH MULTIPLE DETECTION LIMITSZHONG, WEI January 2005 (has links)
No description available.
|
93 |
A distributed cooperative algorithm for localization in wireless sensor networks using Gaussian mixture modelingChowdhury, Tashnim Jabir Shovon January 2016 (has links)
No description available.
|
94 |
Statistical Inferences under a semiparametric finite mixture modelZhang, Shiju January 2005 (has links)
No description available.
|
95 |
Extending Growth Mixture Models and Handling Missing Values via Mixtures of Non-Elliptical DistributionsWei, Yuhong January 2017 (has links)
Growth mixture models (GMMs) are used to model intra-individual change and inter-individual differences in change and to detect underlying group structure in longitudinal studies. Regularly, these models are fitted under the assumption of normality, an assumption that is frequently invalid. To this end, this thesis focuses on the development of novel non-elliptical growth mixture models to better fit real data. Two non-elliptical growth mixture models, via the multivariate skew-t distribution and the generalized hyperbolic distribution, are developed and applied to simulated and real data. Furthermore, these two non-elliptical growth mixture models are extended to accommodate missing values, which are near-ubiquitous in real data.
Recently, finite mixtures of non-elliptical distributions have flourished and facilitated the flexible clustering of the data featuring longer tails and asymmetry. However, in practice, real data often have missing values, and so work in this direction is also pursued. A novel approach, via mixtures of the generalized hyperbolic distribution and mixtures of the multivariate skew-t distributions, is presented to handle missing values in mixture model-based clustering context. To increase parsimony, families of mixture models have been developed by imposing constraints on the component scale matrices whenever missing data occur. Next, a mixture of generalized hyperbolic factor analyzers model is also proposed to cluster high-dimensional data with different patterns of missing values. Two missingness indicator matrices are also introduced to ease the computational burden. The algorithms used for parameter estimation are presented, and the performance of the methods is illustrated on simulated and real data. / Thesis / Doctor of Philosophy (PhD)
|
96 |
Inference for Generalized Multivariate Analysis of Variance (GMANOVA) Models and High-dimensional ExtensionsJana, Sayantee 11 1900 (has links)
A Growth Curve Model (GCM) is a multivariate linear model used for analyzing longitudinal data with short to moderate time series. It is a special case of Generalized Multivariate Analysis of Variance (GMANOVA) models. Analysis using the GCM involves comparison of mean growths among different groups. The classical GCM, however, possesses some limitations including distributional assumptions, assumption of identical degree of polynomials for all groups and it requires larger sample size than the number of time points. In this thesis, we relax some of the assumptions of the traditional GCM and develop appropriate inferential tools for its analysis, with the aim of reducing bias, improving precision and to gain increased power as well as overcome limitations of high-dimensionality.
Existing methods for estimating the parameters of the GCM assume that the underlying distribution for the error terms is multivariate normal. In practical problems, however, we often come across skewed data and hence estimation techniques developed under the normality assumption may not be optimal. Simulation studies conducted in this thesis, in fact, show that existing methods are sensitive to the presence of skewness in the data, where estimators are associated with increased bias and mean square error (MSE), when the normality assumption is violated. Methods appropriate for skewed distributions are, therefore, required. In this thesis, we relax the distributional assumption of the GCM and provide estimators for the mean and covariance matrices of the GCM under multivariate skew normal (MSN) distribution. An estimator for the additional skewness parameter of the MSN distribution is also provided. The estimators are derived using the expectation maximization (EM) algorithm and extensive simulations are performed to examine the performance of the estimators. Comparisons with existing estimators show that our estimators perform better than existing estimators, when the underlying distribution is multivariate skew normal. Illustration using real data set is also provided, wherein Triglyceride levels from the Framingham Heart Study is modelled over time.
The GCM assumes equal degree of polynomial for each group. Therefore, when groups means follow different shapes of polynomials, the GCM fails to accommodate this difference in one model. We consider an extension of the GCM, wherein mean responses from different groups can have different shapes, represented by polynomials of different degree. Such a model is referred to as Extended Growth Curve Model (EGCM). We extend our work on GCM to EGCM, and develop estimators for the mean and covariance matrices under MSN errors. We adopted the Restricted Expectation Maximization (REM) algorithm, which is based on the multivariate Newton-Raphson (NR) method and Lagrangian optimization. However, the multivariate NR method and hence, the existing REM algorithm are applicable to vector parameters and the parameters of interest in this study are matrices. We, therefore, extended the NR approach to matrix parameters, which consequently allowed us to extend the REM algorithm to matrix parameters. The performance of the proposed estimators were examined using extensive simulations and a motivating real data example was provided to illustrate the application of the proposed estimators.
Finally, this thesis deals with high-dimensional application of GCM. Existing methods for a GCM are developed under the assumption of ‘small p large n’ (n >> p) and are not appropriate for analyzing high-dimensional longitudinal data, due to singularity of the sample covariance matrix. In a previous work, we used Moore-Penrose generalized inverse to overcome this challenge. However, the method has some limitations around near singularity, when p~n. In this thesis, a Bayesian framework was used to derive a test for testing the linear hypothesis on the mean parameter of the GCM, which is applicable in high-dimensional situations. Extensive simulations are performed to investigate the performance of the test statistic and establish optimality characteristics. Results show that this test performs well, under different conditions, including the near singularity zone. Sensitivity of the test to mis-specification of the parameters of the prior distribution are also examined empirically. A numerical example is provided to illustrate the usefulness of the proposed method in practical situations. / Thesis / Doctor of Philosophy (PhD)
|
97 |
Normal Mixture Models for Gene Cluster Identification in Two Dimensional Microarray DataHarvey, Eric Scott 01 January 2003 (has links)
This dissertation focuses on methodology specific to microarray data analyses that organize the data in preliminary steps and proposes a cluster analysis method which improves the interpretability of the cluster results. Cluster analysis of microarray data allows samples with similar gene expression values to be discovered and may serve as a useful diagnostic tool. Since microarray data is inherently noisy, data preprocessing steps including smoothing and filtering are discussed. Comparing the results of different clustering methods is complicated by the arbitrariness of the cluster labels. Methods for re-labeling clusters to assess the agreement between the results of different clustering techniques are proposed. Microarray data involve large numbers of observations and generally present as arrays of light intensity values reflecting the degree of activity of the genes. These measurements are often two dimensional in nature since each is associated with an individual sample (cell line) and gene. The usual hierarchical clustering techniques do not easily adapt to this type of problem. These techniques allow only one dimension of the data to be clustered at a time and lose information due to the collapsing of the data in the opposite dimension. A novel clustering technique based on normal mixture distribution models is developed. This method clusters observations that arise from the same normal distribution and allows the data to be simultaneously clustered in two dimensions. The model is fitted using the Expectation/Maximization (EM) algorithm. For every cluster, the posterior probability that an observation belongs to that cluster is calculated. These probabilities allow the analyst to control the cluster assignments, including the use of overlapping clusters. A user friendly program, 2-DCluster, was written to support these methods. This program was written for Microsoft Windows 2000 and XP systems and supports one and two dimensional clustering. The program and sample applications are available at http://etd.vcu.edu. An electronic copy of this dissertation is available at the same address.
|
98 |
Statistical inference for rankings in the presence of panel segmentationXie, Lin January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Paul Nelson / Panels of judges are often used to estimate consumer preferences for m items such as food products. Judges can either evaluate each item on several ordinal scales and indirectly produce an overall ranking, or directly report a ranking of the items. A complete ranking orders all the items from best to worst. A partial ranking, as we use the term, only reports rankings of the best q out of m items. Direct ranking, the subject of this report, does not require the widespread but questionable practice of treating ordinal measurement as though they were on ratio or interval scales. Here, we develop and study segmentation models in which the panel may consist of relatively homogeneous subgroups, the segments. Judges within a subgroup will tend to agree among themselves and differ from judges in the other subgroups. We develop and study the statistical analysis of mixture models where it is not known to which segment a judge belongs or in some cases how many segments there are. Viewing segment membership indicator variables as latent data, an E-M algorithm was used to find the maximum likelihood estimators of the parameters specifying a mixture of Mallow’s (1957) distance models for complete and partial rankings. A simulation study was conducted to evaluate the behavior of the E-M algorithm in terms of such issues as the fraction of data sets for which the algorithm fails to converge and the sensitivity of initial values to the convergence rate and the performance of the maximum likelihood estimators in terms of bias and mean square error, where applicable.
A Bayesian approach was developed and credible set estimators was constructed. Simulation was used to evaluate the performance of these credible sets as
confidence sets.
A method for predicting segment membership from covariates measured on a judge was derived using a logistic model applied to a mixture of Mallows probability distance models. The effects of covariates on segment membership were assessed.
Likelihood sets for parameters specifying mixtures of Mallows distance models were constructed and explored.
|
99 |
Software for Estimation of Human Transcriptome Isoform Expression Using RNA-Seq DataJohnson, Kristen 18 May 2012 (has links)
The goal of this thesis research was to develop software to be used with RNA-Seq data for transcriptome quantification that was capable of handling multireads and quantifying isoforms on a more global level. Current software available for these purposes uses various forms of parameter alteration in order to work with multireads. Many still analyze isoforms per gene or per researcher determined clusters as well. By doing so, the effects of multireads are diminished or possibly wrongly represented. To address this issue, two programs, GWIE and ChromIE, were developed based on a simple iterative EM-like algorithm with no parameter manipulation. These programs are used to produce accurate isoform expression levels.
|
100 |
Estimation of Regression Coefficients under a Truncated Covariate with Missing ValuesReinhammar, Ragna January 2019 (has links)
By means of a Monte Carlo study, this paper investigates the relative performance of Listwise Deletion, the EM-algorithm and the default algorithm in the MICE-package for R (PMM) in estimating regression coefficients under a left truncated covariate with missing values. The intention is to investigate whether the three frequently used missing data techniques are robust against left truncation when missing values are MCAR or MAR. The results suggest that no technique is superior overall in all combinations of factors studied. The EM-algorithm is unaffected by left truncation under MCAR but negatively affected by strong left truncation under MAR. Compared to the default MICE-algorithm, the performance of EM is more stable across distributions and combinations of sample size and missing rate. The default MICE-algorithm is improved by left truncation but is sensitive to missingness pattern and missing rate. Compared to Listwise Deletion, the EM-algorithm is less robust against left truncation when missing values are MAR. However, the decline in performance of the EM-algorithm is not large enough for the algorithm to be completely outperformed by Listwise Deletion, especially not when the missing rate is moderate. Listwise Deletion might be robust against left truncation but is inefficient.
|
Page generated in 0.0343 seconds