1 |
Addressing the Variable Selection Bias and Local Optimum Limitations of Longitudinal Recursive Partitioning with Time-Efficient Approximations
January 2019
Abstract: Longitudinal recursive partitioning (LRP) is a tree-based method for longitudinal data. It takes a sample of individuals who were each measured repeatedly across time and splits them on a set of covariates so that individuals with similar trajectories are grouped together into nodes. LRP does this by fitting a mixed-effects model to each node every time the node is partitioned and extracting the deviance, which serves as the measure of node purity. LRP is implemented using the classification and regression tree algorithm, which suffers from a variable selection bias and does not guarantee reaching a global optimum. Additionally, fitting a mixed-effects model to each potential split only to extract the deviance and discard the rest of the information is computationally intensive. Therefore, in this dissertation, I address the high computational demand, the variable selection bias, and the local optimum solution. I propose three approximation methods that reduce the computational demand of LRP and, at the same time, allow for a straightforward extension to recursive partitioning algorithms that do not have a variable selection bias and can reach the global optimum solution. In all three approximations, a mixed-effects model is fit to the full data, and the growth curve coefficients for each individual are extracted. Then, (1) a principal component analysis is fit to the set of coefficients and the principal component score is extracted for each individual, (2) a one-factor model is fit to the coefficients and the factor score is extracted, or (3) the coefficients are summed. Each method yields a single score per individual that represents the growth curve trajectory. Because the outcome is now a single score for each individual, any tree-based method may be used to partition the data and group the individuals together.
Once the individuals are assigned to their final nodes, a mixed-effects model is fit to each terminal node using the individuals belonging to it.
I conduct a simulation study in which I show that the approximation methods achieve the proposed goals while maintaining a level of out-of-sample prediction accuracy similar to that of LRP. I then illustrate and compare the methods using an applied data set. / Dissertation/Thesis / Doctoral Dissertation Psychology 2019
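The third approximation (summing the growth curve coefficients) followed by a single split can be sketched as below. This is only an illustration under stated assumptions: a per-individual ordinary least squares line stands in for the mixed-effects model described in the abstract, the split search is a plain CART-style exhaustive search rather than one of the unbiased or globally optimal algorithms the dissertation targets, and all function names and data are hypothetical.

```python
from statistics import mean

def ols_growth_coefficients(times, values):
    """Intercept and slope of a straight-line trajectory for one individual.
    Assumption: OLS per individual replaces the mixed-effects model."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def sum_score(times, values):
    """Approximation (3): collapse the coefficients into one score by summing."""
    intercept, slope = ols_growth_coefficients(times, values)
    return intercept + slope

def sse(scores):
    """Within-node sum of squared deviations from the node mean."""
    if not scores:
        return 0.0
    m = mean(scores)
    return sum((s - m) ** 2 for s in scores)

def best_split(covariate, scores):
    """Threshold on one covariate that minimizes the total within-node SSE."""
    best = None
    for threshold in sorted(set(covariate))[:-1]:
        left = [s for x, s in zip(covariate, scores) if x <= threshold]
        right = [s for x, s in zip(covariate, scores) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical toy data: four individuals measured at four occasions.
times = [0, 1, 2, 3]
trajectories = [
    [1.0, 1.5, 2.0, 2.5],
    [1.2, 1.7, 2.2, 2.7],
    [1.0, 3.0, 5.0, 7.0],
    [0.8, 2.8, 4.8, 6.8],
]
covariate = [10, 12, 30, 35]

scores = [sum_score(times, y) for y in trajectories]
threshold, total_sse = best_split(covariate, scores)
print(scores, threshold, total_sse)
```

Because each individual is reduced to one number before any splitting, the tree step no longer needs to refit a longitudinal model per candidate split, which is the source of the computational saving the abstract describes.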
|
2 |
Residuals in the growth curve model with applications to the analysis of longitudinal data
HUANG, WEILIANG January 2012
Statistical models often rely on several assumptions, including distributional assumptions on outcome variables and relational assumptions about how outcomes depend on independent variables. Further assumptions are made depending on the complexity of the data and the model being used. Model diagnostics are, therefore, a crucial component of any model-fitting problem. Residuals play an important role in model diagnostics. They are used not only to check the adequacy of model fit but also to validate model assumptions and to identify outliers and influential observations. Residuals in univariate models have been studied extensively and are routinely used for model diagnostics. In multivariate models, residuals are not commonly used to assess model fit, although a few approaches have been proposed to check multivariate normality. In the analysis of longitudinal data, however, the resulting residuals are correlated and are not normally distributed. It is, therefore, not clear how ordinary residuals can be used for model diagnostics. For sufficiently large sample sizes, a transformation of ordinary residuals has been proposed to check the normality assumption. The transformation is based solely on removing correlation among the residuals. However, we show that these transformed residuals fail in the presence of model mis-specification. In this thesis, we investigate residuals in the analysis of longitudinal data. We consider ordinary residuals, Fitzmaurice’s transformed (uncorrelated) residuals, and von Rosen’s decomposed residuals. Using simulation studies, we show how the residuals behave under multivariate normality and when this assumption is violated. We also investigate their properties under correctly fitted as well as wrongly fitted models. Finally, we propose new residuals by transforming von Rosen’s decomposed residuals. We show that these residuals perform better than Fitzmaurice’s transformed residuals in the presence of model mis-specification. We illustrate our approach using two real data sets. / Master of Science (MSc)
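A transformation that removes correlation among residuals can be sketched as follows. The abstract does not give the formula, so this assumes the usual construction in which residuals are premultiplied by the inverse Cholesky factor of their (estimated) covariance matrix; the function names and the numbers are hypothetical.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def decorrelate(residuals, sigma):
    """Solve L z = r by forward substitution.
    If Cov(r) = sigma, then Cov(z) is the identity matrix."""
    L = cholesky(sigma)
    z = []
    for i, r_i in enumerate(residuals):
        z.append((r_i - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

# Hypothetical covariance of one subject's two residuals, and the residuals.
sigma = [[4.0, 2.0], [2.0, 3.0]]
r = [2.0, 1.0 + math.sqrt(2.0)]
z = decorrelate(r, sigma)
print(z)
```

The transformation depends only on the covariance matrix, not on the mean structure, which is consistent with the abstract's remark that it is based solely on removing correlation.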
|
3 |
Inference for Generalized Multivariate Analysis of Variance (GMANOVA) Models and High-dimensional Extensions
Jana, Sayantee 11 1900
A Growth Curve Model (GCM) is a multivariate linear model used for analyzing longitudinal data with short to moderate time series. It is a special case of Generalized Multivariate Analysis of Variance (GMANOVA) models. Analysis using the GCM involves comparing mean growth among different groups. The classical GCM, however, has some limitations, including distributional assumptions, the assumption of an identical polynomial degree for all groups, and the requirement that the sample size exceed the number of time points. In this thesis, we relax some of the assumptions of the traditional GCM and develop appropriate inferential tools for its analysis, with the aim of reducing bias, improving precision, increasing power, and overcoming the limitations of high dimensionality.
Existing methods for estimating the parameters of the GCM assume that the underlying distribution of the error terms is multivariate normal. In practical problems, however, we often come across skewed data, and hence estimation techniques developed under the normality assumption may not be optimal. Simulation studies conducted in this thesis show that existing methods are sensitive to skewness in the data: the estimators have increased bias and mean square error (MSE) when the normality assumption is violated. Methods appropriate for skewed distributions are, therefore, required. In this thesis, we relax the distributional assumption of the GCM and provide estimators for the mean and covariance matrices of the GCM under the multivariate skew normal (MSN) distribution. An estimator for the additional skewness parameter of the MSN distribution is also provided. The estimators are derived using the expectation maximization (EM) algorithm, and extensive simulations are performed to examine their performance. Comparisons show that our estimators perform better than existing estimators when the underlying distribution is multivariate skew normal. An illustration using a real data set is also provided, in which triglyceride levels from the Framingham Heart Study are modelled over time.
The GCM assumes the same polynomial degree for each group. Therefore, when group means follow polynomials of different shapes, the GCM cannot accommodate this difference in one model. We consider an extension of the GCM in which mean responses from different groups can have different shapes, represented by polynomials of different degrees. Such a model is referred to as the Extended Growth Curve Model (EGCM). We extend our work on the GCM to the EGCM and develop estimators for the mean and covariance matrices under MSN errors. We adopted the Restricted Expectation Maximization (REM) algorithm, which is based on the multivariate Newton-Raphson (NR) method and Lagrangian optimization. However, the multivariate NR method, and hence the existing REM algorithm, is applicable to vector parameters, whereas the parameters of interest in this study are matrices. We therefore extended the NR approach to matrix parameters, which in turn allowed us to extend the REM algorithm to matrix parameters. The performance of the proposed estimators was examined using extensive simulations, and a motivating real data example was provided to illustrate their application.
Finally, this thesis deals with high-dimensional applications of the GCM. Existing methods for the GCM are developed under the assumption of ‘small p large n’ (n >> p) and are not appropriate for analyzing high-dimensional longitudinal data because the sample covariance matrix is singular. In previous work, we used the Moore-Penrose generalized inverse to overcome this challenge. However, that method has some limitations near singularity, when p ~ n. In this thesis, a Bayesian framework is used to derive a test of the linear hypothesis on the mean parameter of the GCM that is applicable in high-dimensional situations. Extensive simulations are performed to investigate the performance of the test statistic and establish its optimality characteristics. The results show that the test performs well under different conditions, including the near-singularity zone. The sensitivity of the test to mis-specification of the parameters of the prior distribution is also examined empirically. A numerical example is provided to illustrate the usefulness of the proposed method in practical situations. / Thesis / Doctor of Philosophy (PhD)
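For orientation, the GCM and the source of the singularity problem can be written out as follows. The notation is an assumption (the standard bilinear form of the model), not taken from the thesis: $X$ is the $p \times n$ data matrix with one column per individual, $A$ the $p \times q$ within-individual (time) design matrix, $B$ the $q \times k$ parameter matrix, $C$ the $k \times n$ between-individual (group) design matrix, and $\Sigma$ the $p \times p$ covariance matrix.

```latex
% Growth Curve Model (bilinear form); columns of E are i.i.d. N_p(0, \Sigma)
X = A B C + E

% Residual sums-of-squares matrix used by the classical estimators
S = X \left( I_n - C^{\top} (C C^{\top})^{-1} C \right) X^{\top}

% I_n - C^T (C C^T)^{-1} C is a projection of rank n - rank(C), hence
\operatorname{rank}(S) \le \min\{\, p,\; n - \operatorname{rank}(C) \,\}
```

Classical estimators and tests for $B$ involve $S^{-1}$, so they break down once $p > n - \operatorname{rank}(C)$, which is the high-dimensional situation the last paragraph of the abstract addresses.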
|
4 |
Linear Discriminant Analysis with Repeated Measurements
Skinner, Evelina January 2019
The classification of observations based on repeated measurements performed on the same subject over a given period of time or under different conditions is a common procedure in many disciplines such as medicine, psychology and environmental studies. In this thesis repeated measurements follow the Growth Curve model and are classified using linear discriminant analysis. The aim of this thesis is both to examine the effect of missing data on classification accuracy and to examine the effect of additional data on classification robustness. The results indicate that an increasing amount of missing data leads to a progressive decline in classification accuracy. With regard to the effect of additional data on classification robustness the results show a less predictable effect which can only be characterised as a general tendency towards improved robustness.
|
5 |
Bilinear Gaussian Radial Basis Function Networks for classification of repeated measurements
Sjödin Hällstrand, Andreas January 2020
The Growth Curve Model is a bilinear statistical model which can be used to analyse several groups of repeated measurements. Normally the Growth Curve Model is defined in such a way that the permitted sampling frequency of the repeated measurements is limited by the number of observed individuals in the data set. In this thesis, we examine the possibilities of utilizing highly frequently sampled measurements to increase classification accuracy for real-world data. That is, we look at the case where the regular Growth Curve Model is not defined because of the relationship between the sampling frequency and the number of observed individuals. When working with this high-frequency data, we develop a new method of basis selection for the regression analysis, which yields what we call a Bilinear Gaussian Radial Basis Function Network (BGRBFN), and we compare it to more conventional polynomial and trigonometric functional bases. Finally, we examine whether Tikhonov regularization can be used to further increase the classification accuracy in the high-frequency data case. Our findings suggest that the BGRBFN performs better than the conventional methods in both classification accuracy and functional approximability. The results also suggest that both high-frequency data and Tikhonov regularization can be used to increase classification accuracy.
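The two ingredients named above, a Gaussian radial basis and Tikhonov regularization, can be sketched for a single densely sampled curve. This is not the BGRBFN itself: the bilinear structure, the basis selection method, and the classification step are not described in the abstract and are omitted. The centers, width, regularization strengths, and data are hypothetical.

```python
import math

def gaussian_rbf_design(times, centers, width):
    """Design matrix with entries exp(-(t - c)^2 / (2 width^2))."""
    return [[math.exp(-((t - c) ** 2) / (2.0 * width ** 2)) for c in centers]
            for t in times]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(phi, y, lam):
    """Tikhonov-regularized least squares:
    solve (Phi^T Phi + lam I) beta = Phi^T y."""
    k = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y_t for row, y_t in zip(phi, y)) for i in range(k)]
    return solve(gram, rhs)

# Hypothetical curve sampled at 50 time points, represented with 5 basis functions.
times = [i / 49 for i in range(50)]
centers = [0.0, 0.25, 0.5, 0.75, 1.0]
phi = gaussian_rbf_design(times, centers, width=0.2)
y = [math.sin(2 * math.pi * t) for t in times]

beta_light = ridge_fit(phi, y, lam=1e-3)   # weak regularization
beta_heavy = ridge_fit(phi, y, lam=10.0)   # strong regularization shrinks beta
print(beta_light, beta_heavy)
```

A small number of basis functions reduces many time points to a few coefficients, and the penalty keeps the normal equations well conditioned; both are plausible reasons such a basis helps when the sampling frequency is high relative to the number of individuals.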
|
6 |
Depressive Symptoms Trajectories Following Child Death in Later Life: Variation by Race-Ethnicity
Mellencamp, Kagan Alexander 13 August 2019
No description available.
|
7 |
The unweighted mean estimator in a Growth Curve model
Karlsson, Emil January 2016
The field of statistics is becoming increasingly important as the amount of data in the world grows. This thesis studies the Growth Curve model in multivariate statistics, a model that is not widely used. One difference from the linear model is that the maximum likelihood estimators (MLEs) are more complicated, which makes the model more difficult to use and to interpret and may be one reason it is not more widespread. From this perspective, this thesis compares the traditional mean estimator for the Growth Curve model with the unweighted mean estimator. The unweighted mean estimator is simpler than the regular MLE. It is proven that the unweighted estimator is in fact the MLE under certain conditions, and examples of when this occurs are discussed. In a more general setting, this thesis presents conditions under which the unweighted estimator has a smaller covariance matrix than the MLE, and it also presents confidence intervals and hypothesis tests based on these inequalities.
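The two estimators being compared can be written side by side. The notation below is assumed rather than taken from the thesis: the model is $X = ABC + E$ with $X$ of size $p \times n$, within-individual design $A$, between-individual design $C$, and $S = X\,(I_n - C^{\top}(CC^{\top})^{-1}C)\,X^{\top}$. The conditions under which the two coincide are not stated in the abstract and are not reproduced here.

```latex
% Maximum likelihood estimator of B: weighted by the inverse of S
\hat{B}_{\mathrm{ML}} = (A^{\top} S^{-1} A)^{-1} A^{\top} S^{-1} X C^{\top} (C C^{\top})^{-1}

% Unweighted estimator: the weight matrix S^{-1} is replaced by the identity
\hat{B}_{\mathrm{U}} = (A^{\top} A)^{-1} A^{\top} X C^{\top} (C C^{\top})^{-1}
```

The unweighted form is linear in $X$, whereas the MLE depends on $X$ through $S^{-1}$ as well, which is what makes the MLE's distribution and interpretation more complicated.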
|
8 |
Decision Trees for Classification of Repeated Measurements
Holmberg, Julianna January 2024
Classification of data from repeated measurements is useful in various disciplines, for example medicine. This thesis explores how classification trees (CART) can be used for classifying repeated measures data. It introduces the reader to variations of the CART algorithm that can be used for classifying such data and tests the performance of these algorithms on a data set that can be modelled using bilinear regression. The performance is compared with that of a classification rule based on linear discriminant analysis. It is found that while the performance of the CART algorithm can be satisfactory, linear discriminant analysis is more reliable for achieving good results.
|
9 |
The Growth Curve Model for High Dimensional Data and its Application in Genomics
Jana, Sayantee 04 1900
Recent advances in technology have allowed researchers to collect high-dimensional biological data simultaneously. In genomic studies, for instance, measurements from tens of thousands of genes are taken from individuals across several experimental groups. In time course microarray experiments, gene expression is measured at several time points for each individual across the whole genome, resulting in a massive amount of data. In such experiments, researchers are faced with two types of high-dimensionality. The first is global high-dimensionality, which is common to all genomic experiments. It arises because inference is being done on tens of thousands of genes, resulting in multiplicity. This challenge is often dealt with using statistical methods for multiple comparison, such as the Bonferroni correction or the false discovery rate (FDR). We refer to the second type as gene-specific high-dimensionality, which arises in time course microarray experiments because, in such experiments, the sample size is often smaller than the number of time points ($n < p$).

In this thesis, we use the growth curve model (GCM), which is a generalized multivariate analysis of variance (GMANOVA) model, and propose a moderated test statistic for testing a special case of the general linear hypothesis, which is especially useful for identifying genes that are expressed. We use the trace test for the GCM and modify it so that it can be used in high-dimensional situations. We consider two types of moderation: the Moore-Penrose generalized inverse and Stein's shrinkage estimator of $S$. We performed extensive simulations to show the performance of the moderated test and compared the results with the original trace test. We calculated the empirical level and power of the test under many scenarios. Although the focus is on hypothesis testing, we also provide a moderated maximum likelihood estimator for the parameter matrix, assess its performance by investigating the bias and mean squared error of the estimator, and compare the results with those of the maximum likelihood estimators. Since the parameters are matrices, we consider distance measures in both power and level comparisons as well as when investigating bias and mean squared error. We also illustrate our approach using time course microarray data taken from a study on lung cancer. We identified 1053 of a pool of 22,277 genes as non-noise genes, which is approximately 5% of the total number of genes. This is consistent with results from most biological experiments, where around 5% of genes are found to be differentially expressed. / Master of Science (MSc)
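The two multiple-comparison approaches mentioned for global high-dimensionality can be sketched as follows. The abstract names only "false discovery rate"; the Benjamini-Hochberg step-up procedure is used here as one common FDR-controlling method, which is an assumption. The p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).
    Find the largest rank r with p_(r) <= alpha * r / m and reject the r smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            largest_rank = rank
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject

# Hypothetical per-gene p-values.
p_values = [0.001, 0.012, 0.025, 0.039, 0.20]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

On these values the Bonferroni rule rejects one hypothesis and the step-up rule rejects four, which illustrates why FDR control is often preferred when tens of thousands of genes are tested.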
|
10 |
復原力的力量: 個人與來自家庭、學校脈絡中的保護機制對青少年憂鬱症狀改變之影響 / Resilient Outcome: The Impacts of Self-Esteem and Protective Mechanisms in Family and School Contexts on Trajectories of Adolescent Depressive Symptoms
黃鈺婷, Huang, Yu Ting Unknown Date
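This study uses long-term longitudinal panel data (1996-1999) on adolescent growth, development, and adjustment in an attempt to overcome the limitations of the cross-sectional data used in earlier discussions of how adolescent depressive symptoms develop. It applies the latent growth curve model (LGC Model), adding the depth of historical time, to capture the "initial status" and the "individual growth trajectories" of adolescent depressive symptoms, so that the changes in internalizing symptoms of all participating adolescents over the three years are described faithfully and without distortion. It then brings "change" factors into the discussion in order to find the key mechanisms that can influence the developmental trajectories of adolescent depressive symptoms.
The main purpose of this study is to make the theoretical perspective of resilience concrete. By adding a dynamic time dimension, it seeks to confirm the causal link between negative life events and the developmental trajectories of adolescent depressive symptoms, and to examine how relationships operating within the individual and within environmental contexts affect, over time, the mean level, direction of change, and rate of change of adolescent depressive symptoms. The results give a clear answer to the question of why some adolescents, after being negatively affected by depressive symptoms, still have the chance to recover and "perform better than expected". For the small group of adolescents whose depressive symptoms improved or worsened, the study also describes, one by one, the differences in gender, self-esteem, negative life events, parent-child interaction in the family, and good-friend relationships at school.
At the theoretical level, this study combines appropriate research methods to observe adolescents' physical and psychological development from a "static" to a "dynamic" view, and discusses, from the "individual" to "inside and outside the family system", the short-term and long-term effects of internal and external resources on adolescent recovery. By taking general adolescents as its subjects, it broadens the inferential scope and explanatory depth of resilience theory. The study shows that adolescents' "improving" or "worsening" depressive symptom trajectories do exhibit individual differences under the moderation of environmental contexts. In addition, adolescents' initial depressive status does not affect the rate of change of the depressive symptom trajectory. Family economic disadvantage, as a negative life event, shows only a brief initial effect in predicting adolescent depressive symptoms. Self-esteem and good-friend relationships are both things that adolescents can actively build and act on, and they are the two most important factors influencing change in adolescent depressive symptoms. The school context can be seen as a new pathway, beyond the family context, for changing adolescent depressive symptoms, and as a structural factor that offers adolescents a dual opportunity to make a successful transition to adulthood. / Using data derived from a panel study (1996-1999) of long-term Taiwanese adolescent development and adaptation, this study set out to overcome the limitations of the cross-sectional designs that constrained past studies of the development of adolescent depressive symptoms. By employing the Latent Growth Curve Model (LGC Model), the study characterizes each adolescent's initial status and trajectory of depressive symptoms, taking into account the role of dynamic historical time in youth internalizing symptoms, in order to reflect the actual resilient outcome each adolescent displayed. In addition, to understand the key factors that act as positive and effective mechanisms influencing the initial status and rates of change of adolescents' depressive symptom trajectories, several latent constructs, such as self-esteem and protective factors arising from family and school contexts, were taken into account. Further, specific characteristics were noted to highlight the basic differences that gradually emerged between adolescents whose symptoms improved and those whose symptoms worsened.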
A positive-psychological stance was taken as the leading research perspective in this study. The results show that familial economic hardship affects only the initial status of adolescent depressive symptoms, implying that this negative event had just a short-term effect on adolescents' psychological well-being. Those who were initially vulnerable to the familial negative event had the opportunity to become resilient over time. As for the protective factors, self-esteem and cohesive good friendships were two crucial facets that adolescents could actively construct and work on in order to perform better than expected.
Interestingly, the analysis indicated that the parent-child relationship established early in the family context and the adolescent's satisfaction with parenting accounted only for the initial impact on adolescent trajectories of depressive symptoms. Concern and cohesive relationships acquired in school contexts, especially in classes, provided dual chances for adolescents to become resilient in the long run.
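The structure of a latent growth curve model of the kind referred to above can be written as follows. This is a generic linear specification with assumed notation, not the model fitted in the thesis: $y_{it}$ is the depressive symptom score of adolescent $i$ at wave $t$, $\lambda_t$ is a fixed time score, and $x_i$ collects predictors such as self-esteem or family and school factors.

```latex
% Measurement part: each adolescent has a latent initial status and a latent rate of change
y_{it} = \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{it}

% Structural part: predictors may affect the initial status, the rate of change, or both
\eta_{0i} = \alpha_0 + \gamma_0^{\top} x_i + \zeta_{0i}
\eta_{1i} = \alpha_1 + \gamma_1^{\top} x_i + \zeta_{1i}
```

In this notation, a predictor with a short-term effect only, as reported for family economic hardship, would enter through $\gamma_0$ but not $\gamma_1$, while a factor that changes the trajectory over time would enter through $\gamma_1$.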
|