Spelling suggestions: "subject:"bioinformatics - amathematical models"" "subject:"bioinformatics - dmathematical models""
1 |
New algorithms in factor analysis : applications, model selection and findings in bioinformaticsWu, Ho-chun, 胡皓竣 January 2013 (has links)
Advancements in microelectronic devices and computational and storage technologies enable the collection of high volume, high speed and high dimension data in many applications. Due to the high dimensionality of these measurements, exact dependence of the observations on the various parameters or variables may not be exactly known. Factor analysis (FA) is a useful multivariate technique to exploit the redundancies among observations and reveal their dependence to some latent variables called factors. Some major issues of the conventional FA are high arithmetic complexity for real-time online implementation, assumption of static system parameters, the demand of interval forecasting, robustness against outlying observations and model selection in problems with high dimension but low number of samples (HDLS). This thesis addresses these issues and proposes new extensions to the existing FA algorithms.
First, in order to reduce the arithmetic complexity, we propose new recursive FA algorithms (RFA) that recursively compute only the dominant Principal Components (PCs) and eigenvalues in the major subspace tracked by efficient subspace tracking algorithms. Specifically, two new approaches are proposed for updating the PCs and eigenvalues in the classical fault detection problem with different tradeoff between accuracy and arithmetic complexity, namely rank-1 modification and deflation. They significantly reduce the online arithmetic complexity and allow the adaption to time-varying system parameters.
Second, we extend the RFA algorithm to forecasting of time series and propose a new recursive dynamic factor analysis (RDFA) algorithm for electricity price forecasting. While the PCs are recursively tracked by the subspace algorithm, a random walk or a state dynamical model can be incorporated to describe the latest state of the time-varying auto-regressive (AR) model built from the factors. This formulation can be solved by the celebrated Kalman filter (KF), which in turn allows future values to be forecasted with estimated confidence intervals.
Third, we propose new robust covariance and outlier detection criteria to improve the robustness of the proposed RFA and RDFA algorithms against outlying observations based on the concept of robust M-estimation. Experimental results show that the proposed methods can effectively suppress the adverse contributions of the outliers on the factors and PCs.
Finally, in order to improve the consistency of model selection and facilitate the estimation of p-values in HDLS problems, we propose a new automatic model selection method based on ridge partial least squares and recursive feature elimination. Furthermore, a novel performance criterion is proposed for ranking variables according to their consistency of being chosen in different perturbation of the samples. Using this criterion, the associated p-values can be estimated under the HDLS setting. Experimental results using real gene cancer microarray datasets show that improved prognosis can be obtained by the proposed approach as compared with conventional techniques. Furthermore, to quantify their statistical significance, the p-value of the identified genes are estimated and functional analysis of the significant genes found in the diffused large B-cell lymphoma (DLBCL) gene microarray data is performed to validate the findings. While we focus in a few engineering problems, these algorithms are also applicable to other related applications. / published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
|
Page generated in 0.1422 seconds