Stochastic approximation has been widely used since it was first proposed by Herbert Robbins and Sutton Monro in 1951. It is an iterative stochastic method for finding the zeros of functions that cannot be computed directly. In this thesis, we applied the technique in three settings: the analysis of large geostatistical data, the improvement of the simulated annealing algorithm, and NMR protein structure determination.
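As a point of reference, the basic Robbins-Monro recursion can be sketched in a few lines. This is a minimal illustration, not code from the thesis: the function names, the target `g(theta) = theta - 2`, the noise model, and the gain sequence `a / t` are all assumptions chosen for the example. It also assumes that `g` is increasing through its root, so that stepping against the noisy value moves toward the zero.

```python
import random


def robbins_monro(noisy_g, theta0, n_iter=20000, a=1.0, seed=0):
    """Approximate a root of g(theta) = E[noisy_g(theta, rng)].

    Only noisy evaluations of g are used. The gains a / t satisfy the
    classical conditions: their sum diverges, the sum of squares converges.
    """
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, n_iter + 1):
        gain = a / t
        # Step against the noisy function value (assumes g increases
        # through its root).
        theta -= gain * noisy_g(theta, rng)
    return theta


def noisy_g(theta, rng):
    # Hypothetical target for illustration: g(theta) = theta - 2,
    # observed with standard normal noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_g, theta0=0.0)
print(f"estimated root: {estimate:.3f}")
```

With this particular target and gain, the iterate reduces to a running average of the noisy observations, which is why it settles near the true root at 2 even though no single evaluation of `g` is exact.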
1. We proposed a resampling-based stochastic approximation method for the analysis of large geostatistical data. The main difficulty in analyzing geostatistical data is that the computation time becomes extremely long as the sample size grows. Our proposed method uses only a small portion of the data at each iteration: each time, we update our estimators based on a randomly selected subset of the data using stochastic approximation. In this way, we use the information from the whole data set while keeping the computation time almost independent of the sample size. We proved the consistency of our estimator and showed in a simulation study that the computation time is much reduced compared with existing methods.
2. The simulated annealing algorithm has been widely used for optimization problems. However, it cannot guarantee that the global optima will be located unless a logarithmic cooling schedule is used, and the logarithmic rate is so slow that the required CPU time is unaffordable in practice. We proposed a new stochastic optimization algorithm, the so-called simulated stochastic approximation annealing (SAA) algorithm, which combines simulated annealing with the stochastic approximation Monte Carlo (SAMC) algorithm. It is shown that the new algorithm can work with a cooling schedule that decreases much faster than the logarithmic cooling schedule while still guaranteeing that the global optima are reached as the temperature tends to zero.
3. Protein structure determination is a very important topic in computational biology. It aims to determine the different conformations of each protein, which helps in understanding biological functions such as protein-protein interactions and protein-DNA interactions. Protein structure determination consists of a series of steps, of which peak picking is a very important one: it is the prerequisite for all other steps. Picking the peaks manually is very time-consuming. To automate this process, several methods have been proposed. However, due to the complexity of NMR spectra, the existing methods have difficulty distinguishing false peaks from true peaks. The main difficulty lies in identifying true peaks with low intensity and overlapping peaks.
We proposed to model the spectrum as a mixture of bivariate Gaussian densities and used the stochastic approximation Monte Carlo (SAMC) method as the computational approach. Essentially, by putting the peak picking problem into a Bayesian framework, we turned it into a model selection problem. Because the Bayesian method automatically penalizes including too many components in the model, our model can distinguish true peaks from noise without pre-processing the data.
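To make the spectrum model concrete, the sketch below evaluates a mixture of bivariate Gaussian densities at a point of a two-dimensional spectrum. It illustrates only the model form named above and does not implement SAMC or the Bayesian model selection step. The `Peak` class, its field names, the per-peak amplitude weighting, and the example parameter values are all assumptions made for illustration; the thesis may parameterize the mixture differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Peak:
    """One hypothetical mixture component: a scaled bivariate Gaussian."""
    amplitude: float  # total volume under the peak (assumed weighting)
    mu_x: float       # peak centre along the first spectral axis
    mu_y: float       # peak centre along the second spectral axis
    sigma_x: float    # peak width along the first axis
    sigma_y: float    # peak width along the second axis
    rho: float = 0.0  # correlation between the two axes, in (-1, 1)


def bivariate_gaussian(x, y, p):
    """Bivariate normal density with the shape parameters of peak p."""
    zx = (x - p.mu_x) / p.sigma_x
    zy = (y - p.mu_y) / p.sigma_y
    one_minus_rho2 = 1.0 - p.rho ** 2
    quad = (zx * zx - 2.0 * p.rho * zx * zy + zy * zy) / one_minus_rho2
    norm = 2.0 * math.pi * p.sigma_x * p.sigma_y * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * quad) / norm


def spectrum(x, y, peaks):
    """Model intensity at (x, y): the amplitude-weighted sum over peaks."""
    return sum(p.amplitude * bivariate_gaussian(x, y, p) for p in peaks)


# Two made-up peaks: a strong one and a weaker, correlated neighbour.
peaks = [
    Peak(10.0, 0.0, 0.0, 1.0, 1.0),
    Peak(4.0, 3.0, 1.0, 0.5, 0.8, rho=0.3),
]
height_at_first_peak = spectrum(0.0, 0.0, peaks)
print(f"model intensity at the first peak centre: {height_at_first_peak:.4f}")
```

Under this representation, peak picking amounts to choosing how many `Peak` components to include and what their parameters are, which is the model selection problem described above.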
Identifier | oai:union.ndltd.org:tamu.edu/oai:repository.tamu.edu:1969.1/151016 |
Date | 16 December 2013 |
Creators | Cheng, Yichen |
Contributors | Liang, Faming, Sang, Huiyan, Sinha, Samiran, Zhou, Jianxin |
Source Sets | Texas A and M University |
Language | English |
Detected Language | English |
Type | Thesis, text |
Format | application/pdf |