About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Approaches to modelling functional time series with an application to electricity generation data

Jin, Zehui January 2018
We study the half-hourly electricity generation by coal and by gas in the UK over the three years from 2012 to 2014. As a high-frequency time series, the data for each fuel show daily cycles as well as seasonality and trend across days. Taylor (2003), Taylor et al. (2006), and Taylor (2008) studied time series with similar features by introducing double seasonality into methods for a single univariate time series. As we are interested in the continuous variation in generation within a day, the half-hourly observations within a day are treated as a continuous function. In this way, a time series of half-hourly discrete observations is transformed into a time series of daily functions. The idea of a time series of functions can also be seen in Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007). We improve their methods in several ways. Firstly, we identify the systematic effect due to factors that act over the long term, such as weather and fuel prices, and to the intrinsic differences between the days of the week. The systematic effect is modeled and removed before we study the day-by-day random variation in the functions. Secondly, we extend functional principal component analysis (PCA), which was applied to one group of functions in Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007), to partial common PCA, in order to consider the covariance structures of two groups of functions and their similarities. A test of how well the common eigenfunctions approximate the functions is also proposed. The idea of bootstrapping residuals from the approximation, seen in Shang (2014), is employed but improved with non-overlapping blocks and moving blocks of residuals. Thirdly, we use a vector autoregressive (VAR) model, which is a multivariate approach, to model the scores on the common eigenfunctions of a group so that the cross-correlation between the scores can be taken into account.
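The basic data steps described above can be sketched in plain Python. These steps are cutting a half-hourly series into daily curves, removing a systematic component, and extracting the leading eigenfunction with per-day scores. This is a minimal illustration under stated assumptions, not the dissertation's procedure. The systematic effect is crudely approximated by day-of-week mean curves only, with no weather or fuel-price terms. Each day is kept as a vector of 48 values rather than a smoothed function. The leading component comes from ordinary single-group PCA via power iteration, not partial common PCA. All function names are hypothetical.

```python
import math
import random

PERIODS_PER_DAY = 48  # half-hourly readings per day


def to_daily_curves(series, periods=PERIODS_PER_DAY):
    """Cut a flat half-hourly series into one discretized daily curve per whole day."""
    n_days = len(series) // periods
    return [list(series[d * periods:(d + 1) * periods]) for d in range(n_days)]


def remove_weekday_means(curves, first_weekday=0):
    """Subtract each day-of-week's mean curve.

    Hypothetical stand-in for the systematic effect: only day-of-week
    differences are removed, with no weather or fuel-price terms.
    """
    groups = {}
    for d, curve in enumerate(curves):
        groups.setdefault((first_weekday + d) % 7, []).append(curve)
    means = {wd: [sum(col) / len(col) for col in zip(*cs)] for wd, cs in groups.items()}
    return [
        [x - m for x, m in zip(curve, means[(first_weekday + d) % 7])]
        for d, curve in enumerate(curves)
    ]


def leading_component(curves, iters=200, seed=0):
    """Leading eigenvector of the curves' sample covariance, found by power iteration.

    Returns the unit-length eigenvector (a discretized eigenfunction) and
    each day's score on it.
    """
    n, p = len(curves), len(curves[0])
    mean = [sum(col) / n for col in zip(*curves)]
    X = [[x - m for x, m in zip(c, mean)] for c in curves]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]  # random start, almost surely not orthogonal to the target
    for _ in range(iters):
        # C v = X^T (X v) / (n - 1), computed without forming the p x p covariance matrix
        s = [sum(a * b for a, b in zip(row, v)) for row in X]
        w = [sum(si * row[j] for si, row in zip(s, X)) / (n - 1) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in X]
    return v, scores
```

The per-day `scores` are the kind of quantities that the abstract goes on to model jointly with a VAR.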
We include Lasso penalties in the VAR model to select the significant covariates and refit the selected model with ordinary least squares to reduce the bias. Our method is compared with the stepwise procedure of Pfaff (2007) and is shown to be less variable and more accurate in estimation and prediction. Finally, we propose a method for producing point forecasts of the daily functions. It is more complicated than the methods of Shang (2013), Shang and Hyndman (2011) and Hyndman and Ullah (2007) because the systematic effect needs to be included. An adjustment interval, representing the range within which the true function might vary, is also given along with each point forecast. Our methods for the point forecast and the adjustment interval incorporate information updated after the training period, which is not considered in the classical prediction equations of VAR and GARCH models seen in Tsay (2013) and Engle and Bollerslev (1986).
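The Lasso-then-OLS idea can be illustrated on a VAR(1) fitted equation by equation. A Lasso fit picks which lagged variables enter each equation, and an ordinary least-squares refit on only those variables removes the Lasso's shrinkage bias. The sketch below simplifies heavily under explicit assumptions: no intercept, a single lag, and a fixed penalty `lam` rather than one chosen by cross-validation or an information criterion. It uses plain coordinate descent. The helper names are hypothetical, and this is not the dissertation's implementation.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) used by Lasso coordinate descent."""
    return z - t if z > t else (z + t if z < -t else 0.0)


def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # current residual y - X b
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(row[j] * (ri + row[j] * b[j]) for row, ri in zip(X, r)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != b[j]:
                delta = new - b[j]
                r = [ri - delta * row[j] for row, ri in zip(X, r)]
                b[j] = new
    return b


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small square A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def ols_refit(X, y, support):
    """OLS restricted to the selected columns; unselected coefficients stay exactly 0."""
    coef = [0.0] * len(X[0])
    if not support:
        return coef
    Xs = [[row[j] for j in support] for row in X]
    m = len(support)
    XtX = [[sum(r[a] * r[c] for r in Xs) for c in range(m)] for a in range(m)]
    Xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(m)]
    for j, bj in zip(support, solve(XtX, Xty)):
        coef[j] = bj
    return coef


def fit_var1_lasso_ols(series, lam):
    """Equation-by-equation VAR(1): Lasso selects lagged regressors, OLS refits them."""
    X = series[:-1]  # lagged values y_{t-1}
    A_hat = []
    for i in range(len(series[0])):
        y = [row[i] for row in series[1:]]
        b = lasso_cd(X, y, lam)
        support = [j for j, bj in enumerate(b) if bj != 0.0]
        A_hat.append(ols_refit(X, y, support))
    return A_hat
```

On data simulated from a bivariate VAR(1) whose coefficient matrix has a zero entry, a moderate `lam` should set that entry exactly to zero. The refit should leave the remaining coefficients close to their true values rather than shrunk toward zero.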
2

Dimension Reduction and Covariance Structure for Multivariate Data, Beyond Gaussian Assumption

Maadooliat, Mehdi August 2011
Storage and analysis of high-dimensional datasets are always challenging. Dimension reduction techniques are commonly used to reduce the complexity of the data and extract their informative aspects. Principal Component Analysis (PCA) is one of the most commonly used dimension reduction techniques. However, PCA does not work well when there are outliers or when the data distribution is skewed. Gene expression index estimation is an important problem in bioinformatics. Some of the popular methods in this area are based on PCA and thus may not work well when the data have non-Gaussian structure. To address this issue, a likelihood-based data transformation method with a computationally efficient algorithm is developed. A new multivariate expression index is also studied, and its performance is compared with that of the commonly used univariate expression index. As an extension of the gene expression index estimation problem, a general procedure that integrates data transformation with PCA is developed. In particular, this general method can handle missing data and data with functional structure. It is well known that PCA can be obtained from the eigendecomposition of the sample covariance matrix. Another focus of this dissertation is the covariance (or correlation) structure under non-Gaussian assumptions. An important issue in modeling the covariance matrix is the positive-definiteness constraint. The modified Cholesky decomposition of the inverse covariance matrix has been considered in the literature to address this issue. An alternative Cholesky decomposition of the covariance matrix is considered and used to construct an estimator of the covariance matrix under a multivariate-t assumption. The advantage of this alternative Cholesky decomposition is that it decouples the correlations from the variances.
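The decoupling claimed in the last sentence can be illustrated by assuming the alternative decomposition takes the form Σ = D L Lᵀ D. Here D is the diagonal matrix of standard deviations, and L is the lower-triangular Cholesky factor of the correlation matrix R = D⁻¹ Σ D⁻¹, so every row of L has unit length. The abstract does not state the exact form, so treat this as an assumption. The sketch also does not construct the multivariate-t estimator, and its function names are hypothetical.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sd_correlation_cholesky(Sigma):
    """Split Sigma = D L L^T D: d holds standard deviations, L L^T is the correlation matrix."""
    n = len(Sigma)
    d = [math.sqrt(Sigma[i][i]) for i in range(n)]
    R = [[Sigma[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]
    return d, cholesky(R)


def reassemble(d, L):
    """Rebuild Sigma from the standard deviations d and the correlation factor L."""
    n = len(d)
    return [
        [d[i] * d[j] * sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
```

Rescaling the variables changes only d and leaves L untouched. This is the sense in which variances and correlations can be modelled separately. Any positive d, combined with any lower-triangular L that has unit-length rows and a positive diagonal, reassembles to a positive-definite matrix.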
3

Functional Principal Component Analysis for Discretely Observed Functional Data and Sparse Fisher’s Discriminant Analysis with Thresholded Linear Constraints

Wang, Jing 01 December 2016
We propose a new method to perform functional principal component analysis (FPCA) for discretely observed functional data by solving successive optimization problems. The new framework can be applied to both regularly and irregularly observed data, and to both dense and sparse data. Our method does not require estimates of the individual sample functions or the covariance functions. Hence, it can be used to analyze functional data with multidimensional arguments (e.g. random surfaces). Furthermore, it can be applied to many processes and models with complicated or nonsmooth covariance functions. In our method, the smoothness of the eigenfunctions is controlled by imposing roughness penalties directly on the eigenfunctions, which makes tuning the smoothness more efficient and flexible. Efficient algorithms for solving the successive optimization problems are proposed. We establish the existence of solutions to the successive optimization problems and characterize them. The consistency of our method is also proved. Through simulations, we demonstrate that our method performs well in three cases: smooth sample curves; discontinuous sample curves with nonsmooth covariance; and sample functions with two-dimensional arguments (random surfaces). We apply our method to the classification of retinal pigment epithelial cells in the eyes of mice and to longitudinal CD4 count data. In the second part of this dissertation, we propose a sparse Fisher’s discriminant analysis method with thresholded linear constraints. Various regularized linear discriminant analysis (LDA) methods have been proposed to address the problems of LDA in high-dimensional settings. Asymptotic optimality has been established for some of these methods when there are only two classes.
One difficulty in the asymptotic study of multiclass classification concerns the classification boundary. With two classes, the boundary is a hyperplane and an explicit formula for the classification error exists. With more than two classes, the boundary is usually complicated and generally no explicit formula for the error exists. Another difficulty in proving asymptotic consistency and optimality for sparse Fisher’s discriminant analysis is that the covariance matrix is involved in the constraints of the optimization problems for higher-order components, and a general high-dimensional covariance matrix is not easy to estimate. We therefore propose a sparse Fisher’s discriminant analysis method that avoids estimating the covariance matrix, and we provide asymptotic consistency results and the corresponding convergence rates for all components. To prove asymptotic optimality, we provide an asymptotic upper bound for a general linear classification rule in the multiclass case. We then apply this bound to our method to obtain its asymptotic optimality and the corresponding convergence rate. In the special case of two classes, our method achieves the same or better convergence rates than the existing method. The proposed method is applied to multivariate functional data with wavelet transformations.
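The two-class fact referred to above can be made concrete under standard assumptions made here: equal priors and two Gaussian classes N(μ₁, Σ) and N(μ₂, Σ) sharing a covariance. Consider any rule that thresholds wᵀx at the midpoint wᵀ(μ₁ + μ₂)/2. Its error is Φ(-wᵀδ / (2√(wᵀΣw))), where δ = μ₁ - μ₂. Fisher's direction w = Σ⁻¹δ minimizes this error at Φ(-Δ/2), with Δ² = δᵀΣ⁻¹δ. The `hard_threshold` helper is a hypothetical illustration of a sparse direction, not the dissertation's thresholded-constraint method.

```python
import math


def normal_cdf(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small square A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def fisher_direction(mu1, mu2, Sigma):
    """Fisher's discriminant direction w = Sigma^{-1} (mu1 - mu2)."""
    return solve(Sigma, [a - c for a, c in zip(mu1, mu2)])


def linear_rule_error(w, mu1, mu2, Sigma):
    """Error of: assign x to class 1 iff w.x > w.(mu1 + mu2) / 2.

    Assumes classes N(mu1, Sigma) and N(mu2, Sigma) with equal priors and w
    oriented so that w.(mu1 - mu2) > 0; the error is then
    Phi(-w.delta / (2 * sqrt(w^T Sigma w))) with delta = mu1 - mu2.
    """
    p = len(w)
    proj = sum(wi * (a - c) for wi, a, c in zip(w, mu1, mu2))
    sd = math.sqrt(sum(w[i] * Sigma[i][j] * w[j] for i in range(p) for j in range(p)))
    return normal_cdf(-proj / (2.0 * sd))


def hard_threshold(w, tau):
    """Hypothetical sparsification: zero every coefficient with |w_j| < tau."""
    return [x if abs(x) >= tau else 0.0 for x in w]
```

No comparable closed form exists once there are more than two classes, which is the difficulty the abstract describes.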
