
Parma: Applications of Vector-Autoregressive Models to Biological Inference with an Emphasis on Procrustes-Based Data

Many phenomena in ecology, evolution, and organismal biology relate to how a system changes through time. Unfortunately, most of the statistical methods that are common in these fields represent samples as static scalars or vectors. Since variables in temporally dynamic systems do not have stable values, this representation is not ideal. Differential equation and basis function representations provide alternative systems for description, but they have drawbacks of their own. Differential equations are typically outside the scope of statistical inference, and basis function representations rely on functions that relate to the original data only in qualitative appearance, not through any property of the original system. In this dissertation, I propose that vector autoregressive-moving average (VARMA) and vector autoregressive (VAR) processes can represent temporally dynamic systems. Under this strategy, each sample is a time series instead of a scalar or vector. Unlike differential equations, these representations facilitate statistical description and inference, and, unlike basis function representations, these processes directly relate to an emergent property of dynamic systems: their cross-covariance structure. In the first chapter, I describe how VAR representations for biological systems lead both to a metric for the difference between systems, the Euclidean process distance, and to a statistical test of whether two time series may have originated from a single VAR process, the likelihood ratio test for a common process. Using simulated time series, I demonstrate that the likelihood ratio test for a common process has a true Type I error rate that is close to the pre-specified nominal error rate, regardless of the number of subseries in the system or the order of the processes. Further, using the Euclidean process distance as a measure of difference, I establish power curves for the test using logistic regression.
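The idea behind a likelihood ratio test for a common process can be sketched in a few lines. The following is a generic Chow-style comparison, not the dissertation's or iPARMA's implementation: it assumes a zero-mean bivariate VAR(1) with Gaussian shocks, and all function names are hypothetical. Each series is fitted separately and then jointly by least squares, and the statistic is twice the gap between the separate and pooled maximized log-likelihoods.

```python
import math
import random

def simulate_var1(coef, n, rng):
    """Simulate a zero-mean bivariate VAR(1) with standard normal shocks."""
    x = [0.0, 0.0]
    out = []
    for _ in range(n):
        x = [coef[0][0] * x[0] + coef[0][1] * x[1] + rng.gauss(0.0, 1.0),
             coef[1][0] * x[0] + coef[1][1] * x[1] + rng.gauss(0.0, 1.0)]
        out.append(x)
    return out

def lagged_pairs(series):
    """Return (x_{t-1}, x_t) pairs taken within a single series."""
    return list(zip(series[:-1], series[1:]))

def log_det_residual_cov(pairs):
    """Fit a VAR(1) by least squares; return log|Sigma_hat| of the residuals."""
    n = len(pairs)
    sxx = [[sum(x[i] * x[j] for x, _ in pairs) for j in range(2)] for i in range(2)]
    syx = [[sum(y[i] * x[j] for x, y in pairs) for j in range(2)] for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    # Coefficient estimate: A_hat = (sum y x') (sum x x')^{-1}
    a = [[sum(syx[i][m] * inv[m][j] for m in range(2)) for j in range(2)]
         for i in range(2)]
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in pairs:
        r = [y[i] - a[i][0] * x[0] - a[i][1] * x[1] for i in range(2)]
        for i in range(2):
            for j in range(2):
                s[i][j] += r[i] * r[j] / n
    return math.log(s[0][0] * s[1][1] - s[0][1] * s[1][0])

def common_process_lr(series_1, series_2):
    """2 * (separate-fit log-likelihood - pooled-fit log-likelihood)."""
    p1, p2 = lagged_pairs(series_1), lagged_pairs(series_2)
    pooled = p1 + p2  # pooling pairs avoids a spurious lag across series
    return (len(pooled) * log_det_residual_cov(pooled)
            - len(p1) * log_det_residual_cov(p1)
            - len(p2) * log_det_residual_cov(p2))

rng = random.Random(1)
coef_1 = [[0.5, 0.0], [0.0, 0.5]]
coef_2 = [[-0.5, 0.0], [0.0, -0.5]]
series_a = simulate_var1(coef_1, 300, rng)
series_b = simulate_var1(coef_1, 300, rng)   # same process as series_a
series_c = simulate_var1(coef_2, 300, rng)   # different process

stat_same = common_process_lr(series_a, series_b)
stat_diff = common_process_lr(series_a, series_c)
print("same process:     ", round(stat_same, 2))
print("different process:", round(stat_diff, 2))
```

Under the usual asymptotic argument, this statistic would be compared with a chi-square distribution whose degrees of freedom equal the number of parameters constrained by the null hypothesis (here 4 coefficients plus 3 covariance terms, so 7). Whether that reference distribution matches the one used in the dissertation is an assumption.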
The test has a high probability of rejecting a false null hypothesis, even for modest differences between series. In addition, I illustrate that if two competitors follow the Lotka-Volterra equations for competition with some additional white noise, the system deviates from VAR assumptions. Yet, the test can still differentiate between a simulation based on these equations in which the constraints on the system change and a simulation in which the constraints do not change. Although the Type I error rate is inflated in this scenario, the degree of inflation does not appear to be larger when the system deviates more noticeably from model assumptions. In the second chapter, I investigate the performance of the likelihood ratio test for a common process with shape trajectory data. Shape trajectories are an extension of geometric morphometric data in which a sample is a set of temporally ordered shapes as opposed to a single static shape. Like all geometric morphometric data, each shape in a trajectory is inherently high-dimensional. Since the number of parameters in a VAR representation grows quadratically with the number of subseries, shape trajectory data will often require dimension reduction before a VAR representation can be estimated, but the effects that this reduction will have on subsequent inferences remain unclear. In this study, I simulated shape trajectories based on the movements of roundworms. I then reduced the number of variables that described each shape using principal component analysis. Based on these lower-dimensional representations, I estimated the likelihood ratio test's Type I error rate and power with the simulated trajectories. In addition, I applied the same workflow to an empirical dataset of women walking (originally from Morris13), with varying amounts of preprocessing before the workflow.
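To make the quadratic growth concrete, the short sketch below counts the free parameters of a Gaussian VAR(p) with k subseries. It assumes unrestricted coefficient matrices, an optional intercept vector, and a full symmetric shock covariance matrix; the function name is illustrative and does not come from iPARMA.

```python
def var_parameter_count(k: int, p: int, intercept: bool = True) -> int:
    """Number of free parameters in an unrestricted Gaussian VAR(p)."""
    coefficients = p * k * k            # p lag matrices, each k x k
    intercepts = k if intercept else 0  # one constant per subseries
    covariance = k * (k + 1) // 2       # symmetric shock covariance matrix
    return coefficients + intercepts + covariance

# Parameter counts for a VAR(2) as the number of subseries grows.
for k in (2, 10, 50):
    print(k, var_parameter_count(k, p=2))
```

Going from 10 to 50 subseries multiplies the count by roughly 24, which is why a trajectory of shapes with many landmark coordinates can quickly demand more parameters than the available time steps can support.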
The likelihood ratio test's Type I error rate was mildly inflated with the simulated shape trajectories, but the test had a high probability of rejecting false null hypotheses. Without preprocessing, the likelihood ratio test for a common process had a highly inflated Type I error rate with the empirical data, but when the sampling density is lowered and the number of cycles is standardized within a comparison, the degree of inflation becomes comparable to that of the simulated shape trajectories. Yet, these preprocessing steps do not appear to negatively impact the test's power. Visualization is a crucial step in geometric morphometric studies, but there are currently few, if any, methods to visualize differences in shape trajectories. To address this absence, I propose an extension to the classic vector-displacement diagram. In this new procedure, the VAR representations for two trajectories' processes generate two simulated trajectories that share the same shocks. Then, a vector-displacement diagram compares the simulated shapes at each time step. The set of all diagrams then illustrates the difference between the trajectories' processes. I assessed the validity of this procedure using two simulated shape trajectories, one based on the movements of roundworms and the other on the movements of earthworms. The assessment gave mixed results. Some diagrams show comparisons between shapes that are similar to those in the original trajectories, but others do not. Of particular note, the diagrams show a bias towards whichever trajectory's process was used to generate the pseudo-random shocks. This implies that the shocks to the system are just as crucial a component of a trajectory's behavior as the VAR model itself. Finally, in the third chapter, I discuss a new R library to study dynamic systems and represent them as VAR and VARMA processes, iPARMA.
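The shared-shock idea behind the proposed diagrams can be illustrated with a minimal sketch. It assumes two hypothetical VAR(1) coefficient matrices and standard normal shocks drawn independently of either process (the abstract instead draws shocks from one trajectory's process), and the variables stand in for shape coordinates or their reduced scores. The function names are illustrative only.

```python
import random

def simulate_var1(coef, shocks, start=None):
    """Simulate x_t = coef * x_{t-1} + e_t for a given list of shock vectors."""
    k = len(coef)
    state = list(start) if start is not None else [0.0] * k
    path = []
    for shock in shocks:
        state = [sum(coef[i][j] * state[j] for j in range(k)) + shock[i]
                 for i in range(k)]
        path.append(state)
    return path

def displacements(path_a, path_b):
    """Per-time-step displacement vectors from trajectory A to trajectory B."""
    return [[b - a for a, b in zip(xa, xb)] for xa, xb in zip(path_a, path_b)]

rng = random.Random(0)
shocks = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(5)]

coef_a = [[0.5, 0.1], [0.0, 0.3]]    # hypothetical process A
coef_b = [[0.2, -0.4], [0.3, 0.6]]   # hypothetical process B

traj_a = simulate_var1(coef_a, shocks)
traj_b = simulate_var1(coef_b, shocks)
diffs = displacements(traj_a, traj_b)
for t, d in enumerate(diffs, start=1):
    print(t, [round(v, 3) for v in d])
```

Because both simulations receive identical shocks, any displacement between them comes from the coefficient matrices alone. With a zero starting state the first time step shows no displacement at all, and differences accumulate afterwards.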
Since certain processes can have multiple VARMA representations, the routines in this library place an emphasis on the reverse echelon format. For every process, there is only one VARMA model in reverse echelon format. The routines in iPARMA cover a diverse set of topics, but they all generally fall into one of four categories: simulation and study, model estimation, hypothesis testing, and visualization methods for shape trajectories. Within the chapter, I discuss highlights and features of key routines' algorithms, as well as how they differ from analogous routines in the R package MTS. In many regards, this dissertation is foundational, so it opens a number of lines for future research. One major area for further work involves alternative ways to represent a system as a VAR or VARMA process. For example, the parameter estimates in a VAR or VARMA model could depict a process as a point in parameter space. Other potentially fruitful areas include the extension of representational applications to other families of time series models, such as co-integrated models, and altering the generalized Procrustes algorithm to better suit shape trajectories. Based on these extensions, it is my hope that statistical inference based on stochastic process representations will expand what systems biologists are able to study and the questions they are able to answer about those systems.

A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Summer Semester 2017. / May 3, 2017. / Function-valued Trait, Geometric morphometrics, Shape trajectory, Stochastic process, Time series analysis, Vector autoregressive-moving average (VARMA) model / Includes bibliographical references. / Dennis E. Slice, Professor Directing Dissertation; Paul M. Beaumont, University Representative; Peter Beerli, Committee Member; Anke Meyer-Baese, Committee Member; Sachin Shanbhag, Committee Member.

Identifier: oai:union.ndltd.org:fsu.edu/oai:fsu.digital.flvc.org:fsu_552377
Contributors: Soda, K. James (Kenneth James) (author), Slice, Dennis E. (professor directing dissertation), Beaumont, Paul M. (university representative), Beerli, Peter (committee member), Meyer-Baese, Anke (committee member), Shanbhag, Sachin (committee member), Florida State University (degree granting institution), College of Arts and Sciences (degree granting college), Department of Scientific Computing (degree granting department)
Publisher: Florida State University
Source Sets: Florida State University
Language: English
Detected Language: English
Type: Text, doctoral thesis
Format: 1 online resource (163 pages), computer, application/pdf
