Sparse modelling has attracted great attention as an efficient way of handling statistical problems in high dimensions. This thesis considers sparse modelling and estimation in a selection of problems: breakpoint detection in nonstationary time series, nonparametric regression using piecewise constant functions, and variable selection in high-dimensional linear regression.

We first propose a method for detecting breakpoints in the second-order structure of piecewise stationary time series, assuming that those structural breakpoints are sufficiently scattered over time. Our choice of time series model is the locally stationary wavelet process (Nason et al., 2000), under which the entire second-order structure of a time series is described by wavelet-based local periodogram sequences. As the initial stage of breakpoint detection, we apply a binary segmentation procedure to the wavelet periodogram sequences at each scale separately, followed by within-scale and across-scales post-processing steps. We show that the combined methodology achieves consistent estimation of the breakpoints in terms of both their total number and their locations, and investigate its practical performance using both simulated and real data.

Next, we study the problem of nonparametric regression by means of piecewise constant functions, which are known to be flexible in approximating a wide range of function spaces. Among the many approaches developed for this purpose, we focus on comparing two well-performing techniques, the taut string (Davies & Kovac, 2001) and the Unbalanced Haar (Fryzlewicz, 2007) methods. While the multiscale nature of the latter is easily observed, it is not so obvious that the former can also be interpreted as multiscale. We provide a unified multiscale representation for both methods, which offers an insight into the relationship between them as well as suggesting some lessons that each method can learn from the other.

Lastly, we consider one of the most widely studied applications of sparse modelling and estimation: variable selection in high-dimensional linear regression. High dimensionality of the data brings in many complications, including (possibly spurious) non-negligible correlations among the variables, which may render marginal correlation unreliable as a measure of association between the variables and the response. We propose a new way of measuring the contribution of each variable to the response, which adaptively takes into account high correlations among the variables. A key ingredient of the proposed tilting procedure is hard-thresholding the sample correlation of the design matrix, which enables a data-driven switch between the use of marginal correlation and tilted correlation for each variable. We study the conditions under which this measure can discriminate between relevant and irrelevant variables, and thus be used as a tool for variable selection. In order to exploit these theoretical properties of tilted correlation, we construct an iterative variable screening algorithm and examine its practical performance in a comparative simulation study.
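To give a flavour of the segmentation idea underlying the first contribution, the sketch below implements generic CUSUM-based binary segmentation on a univariate sequence with a piecewise constant mean. This is only an illustration of the recursive splitting step: the thesis applies binary segmentation to wavelet periodogram sequences at each scale and then combines the results through within-scale and across-scales post-processing, none of which is reproduced here. The function names, the threshold value, the minimum segment length and the toy data are illustrative assumptions, not the thesis procedure.

```python
import numpy as np

def cusum(x):
    """CUSUM statistics |C_b| for every candidate split point b of x."""
    n = len(x)
    b = np.arange(1, n)                      # candidate breakpoints 1..n-1
    left = np.cumsum(x)[:-1]                 # sum of x[0..b-1]
    total = x.sum()
    # sqrt(b*(n-b)/n) * |mean(left part) - mean(right part)|
    stat = np.sqrt(b * (n - b) / n) * np.abs(left / b - (total - left) / (n - b))
    return b, stat

def binary_segmentation(x, threshold, min_len=10):
    """Recursively split x at the CUSUM maximiser while it exceeds the threshold."""
    breaks = []

    def recurse(start, end):
        n_seg = end - start
        if n_seg < 2 * min_len:
            return
        b, stat = cusum(x[start:end])
        ok = (b >= min_len) & (b <= n_seg - min_len)   # keep both child segments >= min_len
        if stat[ok].max() > threshold:
            k = b[ok][np.argmax(stat[ok])]
            breaks.append(start + k)
            recurse(start, start + k)
            recurse(start + k, end)

    recurse(0, len(x))
    return sorted(breaks)

# toy example: piecewise constant mean with two breakpoints (at 100 and 200)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100), rng.normal(0, 1, 100)])
print(binary_segmentation(x, threshold=5.0))
```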
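As a rough illustration of the data-driven switch between marginal and tilted correlation described above, the sketch below computes, for each variable, either its marginal correlation with the response or a residualised ("tilted") correlation obtained after projecting out the other variables whose sample correlation with it survives hard-thresholding. This is a simplified reading of the idea rather than the thesis's tilting procedure: the threshold value, the particular rescaling of the residualised correlation and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def tilted_correlations(X, y, threshold):
    """Simplified tilted-correlation screening sketch (illustrative only).

    For each column X_j, hard-threshold its sample correlations with the other
    columns; if none survive, fall back to the marginal correlation with y,
    otherwise correlate X_j and y after projecting out the surviving columns.
    """
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)          # standardise columns
    ys = (y - y.mean()) / y.std()
    C = Xs.T @ Xs / n                        # sample correlation matrix
    scores = np.empty(p)
    for j in range(p):
        high_corr = [k for k in range(p) if k != j and abs(C[j, k]) > threshold]
        if not high_corr:
            scores[j] = abs(Xs[:, j] @ ys / n)          # marginal correlation
        else:
            Z = Xs[:, high_corr]
            P = Z @ np.linalg.pinv(Z)                   # projection onto span(Z)
            xj = Xs[:, j] - P @ Xs[:, j]                # residualise X_j
            yr = ys - P @ ys                            # residualise y
            denom = np.linalg.norm(xj) * np.linalg.norm(yr)
            scores[j] = abs(xj @ yr) / denom if denom > 0 else 0.0
    return scores

# toy example: y depends on the first two of ten predictors;
# the third predictor is a correlated proxy for the first
rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
X[:, 2] = 0.7 * X[:, 0] + 0.7 * rng.normal(size=n)
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
print(np.round(tilted_correlations(X, y, threshold=0.5), 2))
```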
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:550106
Date | January 2010 |
Creators | Cho, Haeran |
Publisher | London School of Economics and Political Science (University of London) |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://etheses.lse.ac.uk/257/ |