
Statistical Learning for Process Data

Computer-based tests facilitate the collection of problem-solving processes, also known as process data. Response processes recorded in computer log files provide a new avenue for investigating and understanding human behavior. This thesis focuses on developing statistical learning methods for process data and considers the following three problems.

The first problem is feature extraction. Response processes are noisy and come in non-standard formats. To exploit the information in process data, we propose two generic methods that summarize response processes as vectors, so that standard statistical tools such as regression models can be applied. In Chapter 2, features are extracted by applying multidimensional scaling to a pairwise dissimilarity measure between response processes. Chapter 3 uses an autoencoder together with recurrent neural networks to explore the latent structure of process data. For both methods, empirical studies show that the extracted features preserve a substantial amount of the information in the observed processes and have greater predictive power for many variables than traditional item responses do.
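As a rough illustration of the Chapter 2 pipeline (pairwise dissimilarities, then multidimensional scaling, then one feature vector per respondent), the Python sketch below makes two substitutions that are assumptions, not the thesis's choices. First, the dissimilarity between two action sequences is a length-normalized Levenshtein edit distance. Second, the embedding uses classical (Torgerson) MDS computed from an eigendecomposition. The action names and the helpers `edit_distance`, `dissimilarity_matrix`, and `classical_mds` are hypothetical.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two action sequences (lists of strings).
    Assumed stand-in for the thesis's dissimilarity measure."""
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return int(d[m, n])

def dissimilarity_matrix(seqs):
    """Symmetric matrix of edit distances, normalized by the longer sequence."""
    n = len(seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            scale = max(len(seqs[i]), len(seqs[j]), 1)
            D[i, j] = D[j, i] = edit_distance(seqs[i], seqs[j]) / scale
    return D

def classical_mds(D, k):
    """Classical MDS: k-dimensional coordinates whose Euclidean distances
    approximate the dissimilarities in D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)     # symmetric eigendecomposition
    idx = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)   # discard negative (non-Euclidean) parts
    return eigvecs[:, idx] * np.sqrt(lam)

# Hypothetical action sequences from four respondents.
seqs = [
    ["start", "click_A", "click_B", "submit"],
    ["start", "click_A", "click_B", "click_B", "submit"],
    ["start", "click_C", "submit"],
    ["start", "click_C", "click_C", "submit"],
]
features = classical_mds(dissimilarity_matrix(seqs), k=2)
print(features.shape)  # (4, 2)
```

Each row of `features` is a fixed-length vector that could be passed to a regression model in place of the raw sequence. The number of dimensions `k` is a tuning choice.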

The second problem is assessment based on process data. Chapter 4 presents a statistical procedure that incorporates process information to improve the latent trait estimation of item response theory (IRT) models. The procedure is data-driven and can be implemented easily with regression models. A theoretical guarantee of mean squared error reduction is established. Applied to a real dataset, the new process-data-based estimator achieves higher reliability than the traditional IRT-based estimator.
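The Python sketch below illustrates, on simulated data, why regressing on process features can reduce mean squared error. It is not the Chapter 4 estimator. It assumes a hypothetical setup in which a noisy trait estimate `theta_hat` equals the true trait plus measurement error that is independent of the process features `X`. Under that assumption, the least-squares fit of `theta_hat` on `X` estimates E[theta | X] whenever that conditional mean is linear in `X`, as it is in this Gaussian simulation. The function `regression_estimator` and all simulation settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_estimator(theta_hat, X):
    """Hypothetical sketch: regress a noisy trait estimate on process features
    (ordinary least squares with an intercept) and return the fitted values."""
    Z = np.column_stack([np.ones(len(X)), X])             # design matrix with intercept
    coef, *_ = np.linalg.lstsq(Z, theta_hat, rcond=None)  # least-squares coefficients
    return Z @ coef

# Simulated data under an assumed model (not the thesis's data).
n, p = 2000, 5
theta = rng.normal(size=n)                           # true latent trait
X = theta[:, None] + 0.3 * rng.normal(size=(n, p))   # process features related to theta
theta_hat = theta + rng.normal(size=n)               # noisy item-response-based estimate

theta_new = regression_estimator(theta_hat, X)
mse_old = np.mean((theta_hat - theta) ** 2)
mse_new = np.mean((theta_new - theta) ** 2)
print(f"MSE of item-response estimate: {mse_old:.3f}")
print(f"MSE of regression estimate:    {mse_new:.3f}")
```

Whether the fitted values beat `theta_hat` depends on how much information `X` carries relative to the measurement error. This simulation is deliberately set up with informative features and a large error variance.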

The third problem is the identification of problem-solving strategies for exploratory analysis. The approach presented in Chapter 5 uses action predictability to segment each individual process into a sequence of more homogeneous subprocesses. Each subprocess is associated with a subtask, so that a long and complex response process is transformed into a shorter, more interpretable sequence of subtasks. With this approach, problem-solving strategies can be visualized and compared across groups of respondents, and process information can be decomposed for further analysis.
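The following toy Python sketch illustrates segmentation by predictability. A bigram model estimates the distribution of the next action given the current one, and a boundary is placed after any action whose next-action distribution has entropy above a threshold. The bigram model, the entropy criterion, the `threshold` value, the helper names, and the action names are all assumptions chosen for illustration. The predictive model and segmentation rule in Chapter 5 may differ.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(seqs):
    """Estimate P(next action | current action) from transition counts."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    model = {}
    for action, cnt in counts.items():
        total = sum(cnt.values())
        model[action] = {nxt: c / total for nxt, c in cnt.items()}
    return model

def next_action_entropy(model, action):
    """Entropy (bits) of the predicted next action; higher means less predictable."""
    dist = model.get(action, {})
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def segment(seq, model, threshold):
    """Split seq after every action whose successor is hard to predict."""
    segments, current = [], []
    for i, action in enumerate(seq):
        current.append(action)
        if i < len(seq) - 1 and next_action_entropy(model, action) > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# Hypothetical processes: two subtasks performed in different orders.
s1 = ["start", "open_mail", "read", "close", "open_folder", "move", "save", "end"]
s2 = ["start", "open_folder", "move", "save", "open_mail", "read", "close", "end"]
model = fit_bigram([s1, s2])
print(segment(s1, model, threshold=0.5))
# [['start'], ['open_mail', 'read', 'close'], ['open_folder', 'move', 'save'], ['end']]
```

In the toy data, `start`, `close`, and `save` are each followed by two different actions, so the sequence is cut after them. Deterministic transitions such as `open_mail` followed by `read` stay inside one segment, and both respondents end up with the same two subtasks in different orders.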

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-k9kk-5e95
Date: January 2021
Creators: Wang, Zhi
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses
