
Inference for Generalized Multivariate Analysis of Variance (GMANOVA) Models and High-dimensional Extensions

A Growth Curve Model (GCM) is a multivariate linear model used for analyzing longitudinal data with short to moderate time series. It is a special case of Generalized Multivariate Analysis of Variance (GMANOVA) models. Analysis using the GCM typically involves comparing mean growth curves across groups. The classical GCM, however, has several limitations: it relies on distributional assumptions, it assumes polynomials of the same degree for all groups, and it requires the sample size to exceed the number of time points. In this thesis, we relax some of the assumptions of the traditional GCM and develop appropriate inferential tools for its analysis, with the aims of reducing bias, improving precision, increasing power, and overcoming the limitations imposed by high dimensionality.
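As a point of reference, the classical GCM is commonly written in the Potthoff-Roy form Y = ABC + E, where Y is the p × n response matrix (p time points, n subjects), A is a p × q within-subject design matrix (for example, polynomial terms in time), B is the q × k matrix of mean parameters, C is a k × n between-subject design matrix (for example, group indicators), and the columns of E are i.i.d. N_p(0, Σ). Under normality the maximum likelihood estimator of B has the closed form B̂ = (A'S⁻¹A)⁻¹A'S⁻¹YC'(CC')⁻¹ with S = Y(I − C'(CC')⁻¹C)Y'. The sketch below is a minimal NumPy version under these assumptions; the notation, the function name `gcm_mle` and the simulated design are illustrative and are not taken from the thesis. It also makes the sample-size limitation concrete: S has rank at most n − rank(C), so it can only be inverted when n − rank(C) ≥ p.

```python
import numpy as np

def gcm_mle(Y, A, C):
    """Normal-theory MLE of B in the growth curve model Y = A B C + E.

    Y: p x n responses (p time points, n subjects)
    A: p x q within-subject design (e.g. polynomial terms in time)
    C: k x n between-subject design (e.g. group indicators)
    (Notation assumed for illustration.)
    """
    p, n = Y.shape
    # S below has rank <= n - rank(C), so it is singular unless n - rank(C) >= p
    if n - np.linalg.matrix_rank(C) < p:
        raise ValueError("S is singular: need n - rank(C) >= p")
    # Projection onto the row space of C
    P_C = C.T @ np.linalg.solve(C @ C.T, C)
    # Residual sums of squares and products
    S = Y @ (np.eye(n) - P_C) @ Y.T
    S_inv = np.linalg.inv(S)
    # (A' S^-1 A)^-1 A' S^-1  and  C' (C C')^-1
    left = np.linalg.solve(A.T @ S_inv @ A, A.T @ S_inv)
    right = C.T @ np.linalg.inv(C @ C.T)
    return left @ Y @ right

# Illustrative simulation (all values assumed): two groups, linear growth
rng = np.random.default_rng(0)
t = np.array([0.0, 1.0, 2.0, 3.0])              # p = 4 time points
A = np.column_stack([np.ones_like(t), t])        # q = 2: intercept, slope
n1 = n2 = 500
C = np.zeros((2, n1 + n2))                       # k = 2 groups
C[0, :n1] = 1.0
C[1, n1:] = 1.0
B_true = np.array([[1.0, 2.0],                   # intercepts of groups 1, 2
                   [0.5, -0.3]])                 # slopes of groups 1, 2
Sigma = 0.5 * np.eye(4) + 0.5                    # compound symmetry (assumed)
E = rng.multivariate_normal(np.zeros(4), Sigma, size=n1 + n2).T
Y = A @ B_true @ C + E
B_hat = gcm_mle(Y, A, C)
print(np.round(B_hat, 2))
```

With 500 subjects per group the estimate should sit close to `B_true`; with only a handful of subjects (fewer than p + k here) the function refuses to proceed, because S cannot be inverted.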

Existing methods for estimating the parameters of the GCM assume that the error terms follow a multivariate normal distribution. In practice, however, data are often skewed, so estimation techniques developed under the normality assumption may not be optimal. Simulation studies conducted in this thesis show that existing methods are indeed sensitive to skewness: when the normality assumption is violated, the estimators exhibit increased bias and mean square error (MSE). Methods appropriate for skewed distributions are therefore required. In this thesis, we relax the distributional assumption of the GCM and provide estimators for the mean and covariance matrices of the GCM under a multivariate skew normal (MSN) distribution. An estimator for the additional skewness parameter of the MSN distribution is also provided. The estimators are derived using the expectation maximization (EM) algorithm, and extensive simulations are performed to examine their performance. Comparisons show that our estimators outperform existing estimators when the underlying distribution is multivariate skew normal. An illustration using a real data set is also provided, in which triglyceride levels from the Framingham Heart Study are modelled over time.
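One way to get a feel for MSN errors is to simulate them. The sketch below draws from a zero-location multivariate skew normal through the convolution representation Z = δ|U₀| + U, with U₀ ~ N(0, 1) independent of U ~ N_p(0, Ω̄ − δδ'), which is one common parameterisation; the thesis may parameterise the MSN differently, and `rmsn`, `Omega_bar` and `delta` are hypothetical names with assumed values. In this parameterisation E[Z] = √(2/π)·δ even though the location is zero, which is one mechanism by which a normal-theory fit that treats the errors as mean zero can absorb skewness into the estimated mean.

```python
import numpy as np

def rmsn(n, delta, Omega_bar, rng):
    """Draw n vectors from a zero-location multivariate skew normal.

    Assumed convolution representation:
    Z = delta * |U0| + U, with U0 ~ N(0, 1) independent of
    U ~ N_p(0, Omega_bar - delta delta'); Omega_bar is a correlation matrix.
    """
    delta = np.asarray(delta, dtype=float)
    cov_U = Omega_bar - np.outer(delta, delta)   # must be positive definite
    U0 = np.abs(rng.standard_normal(n))          # half-normal component
    U = rng.multivariate_normal(np.zeros(delta.size), cov_U, size=n)
    return U0[:, None] * delta + U               # shape (n, p)

rng = np.random.default_rng(1)
p = 4
Omega_bar = 0.5 * np.eye(p) + 0.5                # assumed correlation matrix
delta = np.full(p, 0.7)                          # assumed skewness vector
Z = rmsn(100_000, delta, Omega_bar, rng)

# The location is zero, yet the sample mean tracks sqrt(2/pi) * delta
print(Z.mean(axis=0))
print(np.sqrt(2 / np.pi) * delta)
```

The values of `delta` are limited by the requirement that Ω̄ − δδ' stay positive definite; with the compound-symmetric Ω̄ above, each entry must be below roughly 0.79.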

The GCM assumes the same polynomial degree for every group. Therefore, when group means follow polynomials of different shapes, the GCM cannot accommodate this difference within one model. We consider an extension of the GCM, referred to as the Extended Growth Curve Model (EGCM), in which mean responses from different groups can have different shapes, represented by polynomials of different degrees. We extend our work on the GCM to the EGCM and develop estimators for the mean and covariance matrices under MSN errors. We adopt the Restricted Expectation Maximization (REM) algorithm, which is based on the multivariate Newton-Raphson (NR) method and Lagrangian optimization. However, the multivariate NR method, and hence the existing REM algorithm, applies to vector parameters, whereas the parameters of interest in this study are matrices. We therefore extend the NR approach to matrix parameters, which in turn allows us to extend the REM algorithm to matrix parameters. The performance of the proposed estimators is examined using extensive simulations, and a motivating real data example illustrates their application.
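A standard way to see how a Newton-Raphson update can act on a matrix parameter is to apply the vector update to vec(B) and use the identity vec(PXQ) = (Q' ⊗ P)vec(X). The sketch below does this for a simple quadratic objective, the weighted least-squares criterion f(B) = tr(W(Y − ABC)(Y − ABC)') with a fixed symmetric positive definite W (with W = S⁻¹ its minimiser is the normal-theory GCM estimator). It is not the thesis's matrix NR method, the REM algorithm, or the MSN likelihood; `newton_step_matrix` is a hypothetical helper and the data are random placeholders.

```python
import numpy as np

def newton_step_matrix(B, Y, A, C, W):
    """One Newton-Raphson step for a q x k matrix parameter B, minimising
    f(B) = tr(W (Y - A B C)(Y - A B C)') with W symmetric positive definite.

    Works on vec(B) (column stacking) via vec(P X Q) = (Q' kron P) vec(X):
      gradient  G = -2 A' W (Y - A B C) C'
      Hessian   H = 2 (C C') kron (A' W A)
    """
    R = Y - A @ B @ C
    G = -2.0 * A.T @ W @ R @ C.T
    H = 2.0 * np.kron(C @ C.T, A.T @ W @ A)
    # order="F" makes reshape match column-stacking vec()
    step = np.linalg.solve(H, G.reshape(-1, order="F"))
    return B - step.reshape(B.shape, order="F")

# Illustrative data (all dimensions and values assumed)
rng = np.random.default_rng(2)
p, q, k, n = 5, 2, 3, 40
A = rng.standard_normal((p, q))
C = rng.standard_normal((k, n))
Y = rng.standard_normal((p, n))
M = rng.standard_normal((p, p))
W = M @ M.T + p * np.eye(p)                      # symmetric positive definite

B1 = newton_step_matrix(np.zeros((q, k)), Y, A, C, W)

# f is quadratic in B, so one step lands on the closed-form minimiser
B_star = np.linalg.solve(A.T @ W @ A, A.T @ W @ Y @ C.T) @ np.linalg.inv(C @ C.T)
print(np.allclose(B1, B_star))
```

Because this objective is quadratic, a single step is exact. For a non-quadratic objective such as a skew normal likelihood the step would be iterated to convergence, and the Lagrangian constraints used in the REM algorithm would need to be handled on top of it.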

Finally, this thesis deals with high-dimensional applications of the GCM. Existing methods for the GCM are developed under the ‘small p, large n’ (n ≫ p) assumption and are not appropriate for analyzing high-dimensional longitudinal data, because the sample covariance matrix is singular. In previous work, we used the Moore-Penrose generalized inverse to overcome this challenge. However, that method has limitations in the near-singularity zone, where p ≈ n. In this thesis, a Bayesian framework is used to derive a test of the linear hypothesis on the mean parameter of the GCM that is applicable in high-dimensional settings. Extensive simulations are performed to investigate the performance of the test statistic and to establish its optimality characteristics. The results show that the test performs well under different conditions, including the near-singularity zone. The sensitivity of the test to misspecification of the parameters of the prior distribution is also examined empirically. A numerical example illustrates the usefulness of the proposed method in practical situations.
Thesis / Doctor of Philosophy (PhD)
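To see why p ≈ n causes trouble even before p exceeds n, the sketch below tracks the rank and condition number of the sample covariance matrix of n = 50 standard normal vectors as the dimension p grows. It illustrates only the motivation; it does not implement the Bayesian test proposed in the thesis, and `cov_diagnostics` is a hypothetical helper with assumed settings.

```python
import numpy as np

def cov_diagnostics(p, n, rng):
    """Sample covariance of n draws from N_p(0, I), with its rank and condition number."""
    X = rng.standard_normal((n, p))
    S = np.cov(X, rowvar=False)                  # p x p, rank <= n - 1
    return S, np.linalg.matrix_rank(S), np.linalg.cond(S)

rng = np.random.default_rng(3)
n = 50
results = {}
for p in (10, 40, 48, 60):
    S, rank, cond = cov_diagnostics(p, n, rng)
    results[p] = (rank, cond)
    print(f"p={p:3d}  rank={rank:3d}  cond={cond:.3g}")

# For p >= n the ordinary inverse does not exist, but the Moore-Penrose inverse does;
# an explicit cutoff keeps round-off "zero" eigenvalues from being inverted
S60, _, _ = cov_diagnostics(60, n, rng)
S60_pinv = np.linalg.pinv(S60, rcond=1e-10, hermitian=True)
print(np.allclose(S60 @ S60_pinv @ S60, S60))
```

When p ≥ n, S has rank n − 1 and cannot be inverted. When p is just below n, S is invertible in exact arithmetic but badly conditioned, so quantities built from its inverse become numerically unstable; this is the near-singularity zone discussed above.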

Identifier: oai:union.ndltd.org:mcmaster.ca/oai:macsphere.mcmaster.ca:11375/22043
Date: 11 1900
Creators: Jana, Sayantee
Contributors: Hamid, Dr. Jemila; Balakrishnan, Prof. Narayanaswamy; Mathematics and Statistics
Source Sets: McMaster University
Language: English
Type: Thesis
