This dissertation focuses on three research projects: neighborhood vector autoregression for multivariate time series, uncertainty quantification for agent-based modeling of networked anagram games, and a scalable algorithm for multi-class classification. The first project studies the modeling of multivariate time series, with applications in the environmental sciences and other areas. In this work, a neighborhood vector autoregression (NVAR) model is proposed to efficiently analyze large-dimensional multivariate time series. The time series are assumed to have underlying distances among them based on the inherent setting of the problem. When this distance matrix is available or can be obtained, the proposed NVAR method is shown to provide computationally efficient and theoretically sound estimation of the model parameters. The performance of the proposed method is compared with existing approaches in both simulation studies and a real application to a stream nitrogen study. The second project focuses on group anagram games. In a group anagram game, players are given letters and form as many words as possible. In this work, enhanced agent behavior models for networked group anagram games are built, exercised, and evaluated under an uncertainty quantification framework. Specifically, the game data for players are clustered based on their skill levels (forming words, requesting letters, and replying to requests), multinomial logistic regressions are fitted for the transition probabilities, and the uncertainty is quantified within each cluster. The result of this process is a model in which players are assigned different numbers of neighbors and different skill levels in the game. Simulations of ego agents with neighbors are conducted to demonstrate the efficacy of the proposed methods. The third project aims to develop efficient and scalable algorithms for multi-class classification that balance prediction accuracy and computing efficiency, especially in high-dimensional settings. Traditional multinomial logistic regression becomes slow in high-dimensional settings where the number of classes (M) and the number of features (p) are large. Our algorithms are computationally efficient and scalable to data with even higher dimensions. The simulation and case study results demonstrate that our algorithms have a large advantage in computing efficiency over traditional multinomial logistic regression while maintaining comparable prediction performance. / Doctor of Philosophy / In many data-centric applications, data often have complex structures involving temporal structures and high dimensionality. Modeling of complex data with temporal structures has attracted great attention in many applications such as environmental sciences, network sciences, data mining, neuroscience, and economics. However, modeling such data is challenging because of their large uncertainty and high dimensionality. This dissertation focuses on the modeling and prediction of complex data with temporal structures. Three different types of complex data are modeled: the nitrogen in multiple streams is modeled jointly, human actions in networked group anagram games are modeled and their uncertainty is quantified, and data with multiple labels are classified. Different models are proposed, and they are demonstrated to be efficient through simulations and case studies.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/109094 |
Date | 03 March 2022 |
Creators | Hu, Zhihao |
Contributors | Statistics, Deng, Xinwei, Kuhlman, Christopher James, Kim, Inyoung, Ranganathan, Shyam |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |