Functional Data Analysis and Machine Learning for High-Dimensional Structured Data

This thesis applies Functional Data Analysis and Machine Learning to the analysis of high-dimensional structured datasets. The first two chapters develop dimension-reduction methods for functional data to advance the understanding of in vivo measurements of neural spike data. The last chapter analyzes survey data with machine learning techniques to identify novel risk factors for suicide in the general population.

The first chapter of this thesis, "Adaptive Functional Principal Component Analysis," provides a novel method for capturing modes of variation in data that exhibit sharp changes in smoothness. Our work integrates a novel scatterplot technique that adaptively smooths the latent functions estimated in a Functional Principal Component Analysis (FPCA) framework. We are motivated to identify coordinated patterns of brain activity across multiple simultaneously recorded neurons during motor behavior, in order to understand the dynamics between the brain and dexterous movement. Our proposed method captures the underlying biological mechanisms in our experiment and offers more interpretable activation patterns than standard approaches.
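For orientation, the standard (non-adaptive) FPCA baseline that this chapter builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the adaptive method proposed in the thesis: the curves are assumed to be observed without noise on a common, evenly spaced grid, and the function name `fpca` and the simulated data are hypothetical.

```python
import numpy as np

def fpca(curves, grid, n_components=2):
    """Standard FPCA for curves sampled on a common, evenly spaced grid.

    curves: array of shape (n_curves, n_points)
    grid:   array of shape (n_points,)
    """
    curves = np.asarray(curves, dtype=float)
    dt = grid[1] - grid[0]                      # grid spacing (assumed constant)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Sample covariance surface evaluated on the grid.
    cov = centered.T @ centered / (curves.shape[0] - 1)
    # Discretized covariance operator: the integral becomes a sum times dt.
    eigvals, eigvecs = np.linalg.eigh(cov * dt)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    # Rescale so each eigenfunction has unit L2 norm: sum(phi**2) * dt == 1.
    eigfuns = eigvecs[:, order].T / np.sqrt(dt)
    # Scores are the inner products of centered curves with eigenfunctions.
    scores = centered @ eigfuns.T * dt
    return mean_curve, eigvals, eigfuns, scores

# Simulated example (hypothetical data): two modes of variation.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)
a = rng.normal(scale=2.0, size=(50, 1))
b = rng.normal(scale=0.5, size=(50, 1))
curves = a * np.sin(2 * np.pi * grid) + b * np.cos(2 * np.pi * grid)
mean_curve, eigvals, eigfuns, scores = fpca(curves, grid)
```

A single global decomposition like this one applies the same amount of smoothing (here, none) everywhere, which is the limitation an adaptive approach targets when smoothness changes sharply along the domain.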

The second chapter of this dissertation develops statistical procedures to compare the eigendecompositions of two samples of functional data. We first introduce appropriate tests for both independent and paired functions. We are motivated to test whether activation patterns in the motor cortex remain constant when a mouse performs a reaching movement repeatedly. We test all pairwise comparisons across trials and compare the distribution of the resulting p-values against their distribution under the null hypothesis. Our results suggest trial-to-trial variation in the latent activation patterns that cannot be attributed to sampling noise. These results can inform future methodology for deriving activation patterns from noisy neural spikes.
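One generic way to compare a collection of p-values against a null distribution is to measure their distance from Uniform(0, 1), which is the distribution of a valid p-value from a continuous test statistic when the null holds. The sketch below computes a one-sample Kolmogorov-Smirnov distance for that purpose. It is an illustration only: the abstract does not state which comparison the thesis uses, the function name and example p-values are hypothetical, and the quoted 5% critical value assumes independent p-values, which pairwise comparisons that share trials generally are not.

```python
import math

def ks_statistic_uniform(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    p-values and the Uniform(0, 1) CDF."""
    p = sorted(p_values)
    n = len(p)
    # Largest gap just after each jump of the empirical CDF.
    d_plus = max((i + 1) / n - x for i, x in enumerate(p))
    # Largest gap just before each jump of the empirical CDF.
    d_minus = max(x - i / n for i, x in enumerate(p))
    return max(d_plus, d_minus)

n = 100
# Hypothetical p-values that are evenly spread, as expected under the null.
null_like = [(i + 0.5) / n for i in range(n)]
# Hypothetical p-values concentrated near zero, as expected under departures.
skewed = [x ** 3 for x in null_like]

d_null = ks_statistic_uniform(null_like)
d_skewed = ks_statistic_uniform(skewed)

# Large-sample 5% critical value for independent observations (rough guide).
critical_5pct = 1.36 / math.sqrt(n)
```

Here `d_null` falls well below `critical_5pct` while `d_skewed` exceeds it, mirroring the kind of excess of small p-values that would indicate real trial-to-trial differences.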

The last chapter of this dissertation applies machine learning techniques to the analysis of survey data. We use the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) to identify novel risk factors for suicide attempts in the general population, extending prior research that focused on high-risk clinical samples. Our analysis uses a Balanced Random Forest (BRF) approach that accounts for extreme class imbalance and the survey architecture within the algorithm. Our work identifies risk variables that can help guide clinical assessment and the development of suicide risk scales.
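The core idea of a Balanced Random Forest is that each tree is grown on a class-balanced bootstrap sample: a bootstrap draw from the minority class plus an equally sized draw, with replacement, from the majority class. The sketch below shows only that sampling step, using hypothetical labels. It does not show how the thesis incorporates the survey architecture, which the abstract does not specify, and the function name `balanced_bootstrap` is an assumption for illustration.

```python
import random

def balanced_bootstrap(labels, rng):
    """Return row indices for one tree: a bootstrap sample of the minority
    class plus an equally sized sample (with replacement) of the majority."""
    minority_label = min(set(labels), key=labels.count)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    k = len(minority)
    return rng.choices(minority, k=k) + rng.choices(majority, k=k)

# Hypothetical outcome vector with a 2% positive rate.
rng = random.Random(0)
labels = [1] * 20 + [0] * 980
sample = balanced_bootstrap(labels, rng)
n_pos = sum(labels[i] for i in sample)
```

Each tree therefore sees positives and negatives in equal numbers, so the rare outcome is not swamped by the majority class, while different trees see different majority-class rows.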

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/4dxt-h363
Date: January 2022
Creators: Garcia de la Garza, Angel
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses