461 |
Statistical analysis of high dimensional data / Ruan, Lingyan 05 November 2010 (has links)
This century is surely the century of data (Donoho, 2000). Data analysis has been an emerging activity over the last few decades. High dimensional data in particular is becoming more and more pervasive with the advance of massive data collection systems, such as microarrays, satellite imagery, and financial data. However, analysis of high dimensional data is challenging because of the so-called curse of dimensionality (Bellman, 1961). This dissertation presents several methodologies for the analysis of high dimensional data.
The first part discusses a joint analysis of multiple microarray gene expression datasets. Microarray analysis dates back to Golub et al. (1999) and has drawn much attention since. One common goal of microarray analysis is to determine which genes are differentially expressed, that is, which genes behave significantly differently between groups of individuals. However, microarray studies typically involve thousands of genes but only a few arrays (samples, individuals), so reproducibility remains relatively low. It is natural to consider joint analyses that combine microarrays from different experiments effectively in order to achieve improved accuracy. In particular, we present a model-based approach for better identification of differentially expressed genes by incorporating data from different studies. The model can seamlessly accommodate a wide range of studies, including those performed on different platforms and/or under different but overlapping biological conditions. Model-based inference can be carried out in an empirical Bayes fashion. Because of the information sharing among studies, the joint analysis dramatically improves upon inferences based on individual analyses. Simulation studies and real data examples are presented to demonstrate the effectiveness of the proposed approach under a variety of complications that often arise in practice.
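The dissertation's joint model is specified in the thesis itself; as a rough illustration of the empirical-Bayes idea of borrowing strength (here across genes, in the style of moderated test statistics, not the dissertation's cross-study model), a minimal sketch with hypothetical function and variable names might look like this:

```python
import numpy as np

def moderated_scores(x_group1, x_group2, prior_df=4.0):
    """Toy empirical-Bayes sketch: shrink gene-wise variances toward a
    pooled value before computing two-group test statistics.
    x_group1, x_group2: arrays of shape (n_genes, n_samples)."""
    n1, n2 = x_group1.shape[1], x_group2.shape[1]
    m1, m2 = x_group1.mean(axis=1), x_group2.mean(axis=1)
    # per-gene pooled sample variance
    s2 = (x_group1.var(axis=1, ddof=1) * (n1 - 1) +
          x_group2.var(axis=1, ddof=1) * (n2 - 1)) / (n1 + n2 - 2)
    s2_prior = s2.mean()                  # "prior" variance shared by all genes
    df = n1 + n2 - 2
    s2_shrunk = (prior_df * s2_prior + df * s2) / (prior_df + df)
    se = np.sqrt(s2_shrunk * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se                 # moderated t-like score per gene
```

In the dissertation the sharing happens across studies and platforms as well as across genes, so the above is only the simplest analogue of the shrinkage mechanism.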
The second part is about covariance matrix estimation for high dimensional data. First, we propose a penalized likelihood estimator for the high dimensional t-distribution. Student's t-distribution is of increasing interest in mathematical finance, education, and many other applications. However, its use is limited by the difficulty of estimating the covariance matrix for high dimensional data. We show that by imposing a LASSO penalty on the Cholesky factors of the covariance matrix, an EM algorithm can efficiently compute the estimator, which performs much better than other popular estimators.
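A minimal sketch of the EM iteration for a multivariate t covariance is below; the sparsity step here simply soft-thresholds the off-diagonal entries of the Cholesky factor as a crude stand-in for the dissertation's exact LASSO-penalized update, and the function name, penalty parameter, and fixed degrees of freedom are all illustrative assumptions:

```python
import numpy as np

def t_cov_em_sparse_chol(X, nu=5.0, lam=0.05, n_iter=50):
    """Hedged sketch: EM for a multivariate t covariance with a crude
    soft-thresholding step on the Cholesky factor (not the exact
    penalized M-step derived in the dissertation)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
    for _ in range(n_iter):
        # E-step: latent scale weights of the t-distribution
        diff = X - mu
        d = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (nu + p) / (nu + d)
        # M-step: weighted mean and scatter matrix
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
        # crude sparsity step: soft-threshold off-diagonals of the Cholesky factor
        L = np.linalg.cholesky(Sigma)
        off = np.tril(L, k=-1)
        off = np.sign(off) * np.maximum(np.abs(off) - lam, 0.0)
        L = off + np.diag(np.diag(L))
        Sigma = L @ L.T + 1e-8 * np.eye(p)
    return mu, Sigma
```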
Secondly, we propose an estimator for high dimensional Gaussian mixture models. Finite Gaussian mixture models are widely used in statistics thanks to their great flexibility. However, parameter estimation for Gaussian mixture models in high dimensions can be rather challenging because of the huge number of parameters that need to be estimated. For this purpose, we propose a penalized likelihood estimator designed specifically to address these difficulties. The LASSO penalty we impose on the inverse covariance matrices encourages sparsity in their entries and therefore helps reduce the dimensionality of the problem. We show that the proposed estimator can be efficiently computed via an Expectation-Maximization algorithm. To illustrate the practical merits of the proposed method, we consider its application in model-based clustering and mixture discriminant analysis. Numerical experiments with both simulated and real data show that the new method is a valuable tool for handling high dimensional data.
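One plausible reading of such an estimator is an EM loop whose M-step solves a graphical-lasso problem per component. The sketch below, with assumed function names and a scikit-learn graphical_lasso call standing in for the thesis's own penalized update, is illustrative only:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def sparse_gmm_em(X, K=2, alpha=0.1, n_iter=30, seed=0):
    """Hedged sketch of an EM algorithm for a Gaussian mixture whose M-step
    imposes an L1 (graphical-lasso) penalty on each inverse covariance."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(K), size=n)       # random initial responsibilities
    means = np.zeros((K, p))
    covs = np.array([np.eye(p) for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # M-step: weighted moments, then penalized covariance per component
        Nk = resp.sum(axis=0) + 1e-10
        weights = Nk / n
        for k in range(K):
            means[k] = resp[:, k] @ X / Nk[k]
            diff = X - means[k]
            emp_cov = (resp[:, k][:, None] * diff).T @ diff / Nk[k]
            covs[k], _ = graphical_lasso(emp_cov + 1e-6 * np.eye(p), alpha=alpha)
        # E-step: responsibilities from component log-densities
        log_r = np.zeros((n, K))
        for k in range(K):
            diff = X - means[k]
            prec = np.linalg.inv(covs[k])
            _, logdet = np.linalg.slogdet(covs[k])
            log_r[:, k] = (np.log(weights[k]) - 0.5 * logdet
                           - 0.5 * np.einsum('ij,jk,ik->i', diff, prec, diff))
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, means, covs
```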
Finally, we present structured estimators for high dimensional Gaussian mixture models. The graphical representations of the clusters in a Gaussian mixture model may have the same or similar structure, an important feature in many applications such as image processing, speech recognition, and gene network analysis. Failing to account for this shared structure deteriorates estimation accuracy. To address this issue, we propose two structured estimators, a hierarchical Lasso estimator and a group Lasso estimator. An EM algorithm can be applied to conveniently solve the estimation problem. We show that when clusters share similar structures, the proposed estimators perform much better than the separate Lasso estimator.
|
462 |
Statistical validation and calibration of computer models / Liu, Xuyuan 21 January 2011 (has links)
This thesis deals with modeling, validation, and calibration problems in experiments with computer models. Computer models are mathematical representations of real systems developed for understanding and investigating those systems. Before a computer model is used, it often needs to be validated by comparing the computer outputs with physical observations, and calibrated by adjusting internal model parameters in order to improve the agreement between the computer outputs and physical observations.
As computer models become more powerful and popular, the complexity of input and output data raises new computational challenges and stimulates the development of novel statistical modeling methods.
One challenge is dealing with computer models that have random inputs (random effects). This kind of computer model is very common in engineering applications. For example, in a thermal experiment at Sandia National Laboratories (Dowding et al. 2008), the volumetric heat capacity and thermal conductivity are random input variables. If input variables are randomly sampled from particular distributions with unknown parameters, the existing methods in the literature are not directly applicable, because the joint likelihood requires integration over the distribution of the random variables and this integration cannot always be expressed in closed form. In this research, we propose a new approach that combines the nonlinear mixed effects model and the Gaussian process model (Kriging model). Different model formulations are also studied, using the thermal problem, to gain a better understanding of validation and calibration activities.
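For orientation, a bare-bones Gaussian process (Kriging) surrogate is sketched below; it is only the emulator building block, not the combined nonlinear mixed-effects formulation of the thesis, and the kernel choice and parameter names are assumptions:

```python
import numpy as np

def kriging_predict(X_train, y_train, X_test, length_scale=1.0, sigma_f=1.0, noise=1e-4):
    """Minimal Gaussian-process (Kriging) surrogate with an RBF kernel."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sigma_f ** 2 * np.exp(-0.5 * d2 / length_scale ** 2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = rbf(X_test, X_test) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# Random inputs (e.g., heat capacity, conductivity) could then be handled by
# Monte Carlo: sample them from their assumed distributions, run the emulator
# at each draw, and average the predictions.
```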
Another challenge comes from computer models with functional outputs. While many methods have been developed for modeling computer experiments with a single response, the literature on modeling computer experiments with functional responses is sparse. Dimension reduction techniques can be used to overcome the complexity of functional responses; however, they generally involve two steps. Models are first fit at each individual setting of the input to reduce the dimensionality of the functional data, and the estimated parameters of these models are then treated as new responses, which are further modeled for prediction. Alternatively, pointwise models are first constructed at each time point, and functional curves are then fit to the parameter estimates obtained from the fitted models. In this research, we first propose a functional regression model that relates functional responses to both design and time variables in a single step. Secondly, we propose a functional kriging model that performs variable selection by imposing a penalty function. We show that the proposed model performs better than dimension reduction based approaches and than the kriging model without regularization. In addition, non-asymptotic theoretical bounds on the estimation error are presented.
|
463 |
A study of the relationship between investors' herding behavior and stock market crashes / 陳執中 Unknown Date (has links)
Stock market crashes in recent years have sharply reduced the wealth of many investors, in some cases wiping out their life savings. If a preliminary crash warning indicator could be established, the damage caused by investment mistakes could be minimized. This study uses the CARA-Gaussian model as its foundation; the original model gives a good account of how investors react after receiving information and of how information is transmitted in the market. We classify investors according to the information they receive into (1) investors with private information, (2) investors without private information, (3) momentum (trend-chasing) investors, and (4) noise investors, and derive the demand function of each type. After fitting actual data, we find that momentum investors are the main cause of stock market crashes; their presence affects the shape of the aggregate investor demand curve. When the market continuously receives strong negative information, stock prices may slide sharply.
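The type-specific demand functions are derived in the thesis itself; the common building block is the textbook CARA-Gaussian demand, sketched below with illustrative names (the momentum and noise-trader terms are not shown):

```python
def cara_demand(expected_payoff, price, risk_aversion, payoff_variance):
    """Textbook CARA-Gaussian demand: with exponential (CARA) utility and a
    normally distributed payoff, optimal holdings are proportional to the
    expected excess payoff and inversely proportional to risk aversion
    times variance. The thesis derives separate demands for informed,
    uninformed, momentum, and noise traders; this is only the shared core."""
    return (expected_payoff - price) / (risk_aversion * payoff_variance)

# Aggregate demand is then the weighted sum of the four investor types'
# demands; momentum traders add a term driven by past price changes, which
# is what bends the aggregate demand curve in the thesis's analysis.
```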
This study selects three periods for analysis: the 1990 Taiwan bubble economy, the 1997 Southeast Asian financial crisis, and the 2000 presidential election. In major financial crisis events, although we cannot predict the peak before the market reverses, we can, once the stock index begins to decline, examine whether a crash is likely to follow. The main limitations of this study are determining how strongly each piece of information affects the market and distinguishing public information from private information. If these limitations could be overcome, the model might go further and predict stock price movements after investors receive information.
|
464 |
Computational solutions of linear systems and models of the human tear film / Maki, Kara Lee. January 2009 (has links)
Thesis (Ph.D.)--University of Delaware, 2009. / Principal faculty advisor: Richard J. Braun, Dept. of Mathematical Sciences. Includes bibliographical references.
|
465 |
Temperature dependence of electrical parameters in Al/P2ClAn(C2H5COOH)/P-Si/Al structures / Kotan, Zeynep. Özdemir, Ahmet Faruk. January 2008 (has links) (PDF)
Thesis (M.Sc.) - Süleyman Demirel University, Institute of Natural and Applied Sciences, Department of Physics, 2008. / Includes bibliographical references.
|
466 |
Neutral zone classifiers within a decision-theoretic framework / Yu, Hua. January 2009 (has links)
Thesis (Ph. D.)--University of California, Riverside, 2009. / Includes abstract. Also issued in print. Includes bibliographical references (leaves 81-84). Available via ProQuest Digital Dissertations.
|
467 |
Stochastic tomography and Gaussian beam depth migration / Hu, Chaoshun, 1976- 25 September 2012 (has links)
Ocean-bottom seismometers (OBS) allow wider-angle recording and therefore have the potential to significantly enhance imaging of deep subsurface structures. Currently, conventional OBS data analysis still uses first-arrival traveltime tomography and prestack Kirchhoff depth migration. However, using first-arrival traveltimes to build a velocity model has its limitations. In the Taiwan region, subduction and collision cause very complex subsurface structures and generate extensive basalt-like anomalies. Since the velocity beneath these basalt-like anomalies is lower than that of the high-velocity anomalies themselves, no first-arrival refractions from the target areas occur. Thus, conventional traveltime tomography is not accurate, and amplitude-constrained traveltime tomography can be dangerous. Here, a new first-arrival stochastic tomography method for automatic background velocity estimation is proposed. Our method uses the local beam semblance of each common-shot or common-receiver gather instead of first-arrival picking. Both the ray parameter and traveltime information are utilized. The use of the Very Fast Simulated Annealing (VFSA) method also allows for easier implementation of uncertainty analysis. Synthetic and real data benchmark tests demonstrate that this new method is robust, efficient, and accurate. In addition, migrated images of low-fold data or data with limited observation geometry, such as OBS data, are often corrupted by migration aliasing. Incorporating prestack instantaneous-slowness information into the imaging condition can significantly reduce migration artifacts and noise and improve image quality in areas of poor illumination. Here I combine slowness information with Gaussian beam depth migration and implement a new slowness-driven Gaussian beam prestack depth migration. The prestack instantaneous slowness information, denoted by ray parameter gathers p(x,t), is extracted from the original OBS or shot gathers using local slant stacking and subsequent local semblance analysis. In migration, we propagate both the seismic energy and the principal instantaneous slowness information backward. At a specific image location, the beam summation is localized in the resolution-dependent Fresnel zone, where weights derived from the instantaneous slowness information are used to control the beams. The effectiveness of the new method is illustrated using two synthetic data examples: a simple model and a more realistic, complicated sub-basalt model. / text
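For readers unfamiliar with VFSA, a generic sketch of the Ingber-style schedule and sampling is given below; the objective here is an arbitrary misfit (e.g., negative beam semblance) and all names and defaults are assumptions, not the thesis's implementation:

```python
import numpy as np

def vfsa(objective, lower, upper, n_iter=2000, T0=1.0, c=1.0, seed=0):
    """Hedged sketch of Very Fast Simulated Annealing for bounded parameters."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = len(lower)
    x = rng.uniform(lower, upper)
    fx = objective(x)
    best_x, best_f = x.copy(), fx
    for k in range(1, n_iter + 1):
        T = T0 * np.exp(-c * k ** (1.0 / D))        # VFSA cooling schedule
        u = rng.uniform(size=D)
        # Ingber's generating function: heavy-tailed steps scaled by the bounds
        y = np.sign(u - 0.5) * T * ((1.0 + 1.0 / T) ** np.abs(2 * u - 1) - 1.0)
        x_new = np.clip(x + y * (upper - lower), lower, upper)
        f_new = objective(x_new)
        # Metropolis acceptance on the misfit
        if f_new < fx or rng.uniform() < np.exp(-(f_new - fx) / max(T, 1e-12)):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f
```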
|
468 |
Bayesian learning methods for neural coding / Park, Mi Jung 27 January 2014 (has links)
A primary goal in systems neuroscience is to understand how neural spike responses encode information about the external world. A popular approach to this problem is to build an explicit probabilistic model that characterizes the encoding relationship in terms of a cascade of stages: (1) linear dimensionality reduction of a high-dimensional stimulus space using a bank of filters or receptive fields (RFs); (2) a nonlinear function from filter outputs to spike rate; and (3) a stochastic spiking process with recurrent feedback. These models have described single- and multi-neuron spike responses in a wide variety of brain areas.
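A minimal simulation of this cascade (a linear-nonlinear-Poisson model, omitting the recurrent spike-history stage) is sketched below; the exponential nonlinearity, bin width, and array sizes are illustrative assumptions:

```python
import numpy as np

def simulate_lnp(stimulus, rf, dt=0.001, seed=0):
    """Sketch of the cascade encoding model: linear filtering by a receptive
    field, a pointwise nonlinearity, and conditionally Poisson spiking."""
    rng = np.random.default_rng(seed)
    drive = stimulus @ rf            # (1) linear projection onto the RF
    rate = np.exp(drive)             # (2) exponential nonlinearity -> spike rate
    spikes = rng.poisson(rate * dt)  # (3) Poisson spike counts per time bin
    return spikes, rate

# Example use (hypothetical sizes): 5000 bins of a 40-dimensional stimulus
# stim = np.random.randn(5000, 40); rf = 0.1 * np.random.randn(40)
# spikes, rate = simulate_lnp(stim, rf)
```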
This dissertation develops Bayesian methods to efficiently estimate the linear and nonlinear stages of the cascade encoding model. In the first part, the dissertation describes a novel Bayesian receptive field estimator based on a hierarchical prior that flexibly incorporates knowledge about the shapes of neural receptive fields. This estimator achieves error rates several times lower than existing methods and can be applied to a variety of other neural inference problems, such as extracting structure in fMRI data. The dissertation also presents active learning frameworks for receptive field estimation that incorporate the hierarchical prior in real-time neurophysiology experiments. In addition, the dissertation describes a novel low-rank model for high dimensional receptive fields, combined with a hierarchical prior, for more efficient receptive field estimation.
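As a point of reference, the simplest Bayesian RF estimate under a fixed Gaussian prior is a ridge-regularized regression; the dissertation's hierarchical-prior estimator learns the prior's structure from data, so the sketch below (with assumed names) is only the baseline it improves on:

```python
import numpy as np

def map_rf_estimate(stimulus, spikes, prior_precision=1.0):
    """Gaussian-prior (ridge) MAP receptive-field estimate under a
    linear-Gaussian approximation; a baseline, not the dissertation's
    hierarchical-prior estimator."""
    X, y = np.asarray(stimulus, float), np.asarray(spikes, float)
    d = X.shape[1]
    # posterior mean under a N(0, prior_precision^-1 I) prior
    return np.linalg.solve(X.T @ X + prior_precision * np.eye(d), X.T @ y)
```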
In the second part, the dissertation describes new models for neural nonlinearities using Gaussian processes (GPs), together with Bayesian active learning algorithms for rapidly estimating them in closed-loop neurophysiology experiments. The dissertation also presents several stimulus selection criteria and compares their performance in neural nonlinearity estimation. Furthermore, the dissertation presents a variation of the new models that includes an additional latent Gaussian noise source to infer the degree of over-dispersion in neural spike responses. The proposed model successfully captures various mean-variance relationships in neural spike responses and achieves higher prediction accuracy than previous models. / text
|
469 |
Multiple suppression in the t-x-p domain / Ghosh, Shaunak 18 February 2014 (has links)
Multiples in seismic data pose serious problems for seismic interpreters, both in AVO studies and in the interpretation of stacked sections. Several methods have been used with varying degrees of success to suppress multiples in seismic data. One family of velocity filters for demultiple operations, based on Radon transforms, traditionally faces challenges when the water column is shallow. Additionally, the hyperbolic Radon transform can be computationally expensive. In this thesis, I introduce a novel multiple suppression technique in the t-x-p domain, where p is the local slope of seismic events, that aims at tackling some of the aforementioned limitations, and I discuss the advantages and scope of this approach. The technique involves essentially two steps: decomposition and suppression. Common midpoint (CMP) gathers are transformed from the original t-x space to the extended t-x-p space and eventually to the t0-x-p space, where t0 is the zero-offset traveltime. The gather in the extended space is multiplied by Gaussian tapering filters, formed from the difference between powers of the velocities calculated intrinsically from t0, x, and p via analytical relations and the picked primary velocities; stacking along the p axis then produces gathers with the multiples suppressed. / text
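The core idea can be sketched as follows; the hyperbolic slope-to-velocity relation is the standard one and the filter form (squared-velocity difference, width parameter) is an assumption chosen for illustration, not the thesis's exact expressions:

```python
import numpy as np

def implied_velocity(t0, x, p):
    """Velocity implied by a hyperbolic event with zero-offset time t0,
    offset x, and local slope p = dt/dx (standard relation; assumed to
    match the thesis's analytical form)."""
    t = 0.5 * (x * p + np.sqrt((x * p) ** 2 + 4.0 * t0 ** 2))  # traveltime at offset x
    return np.sqrt(x / (p * t + 1e-12))

def gaussian_taper(v_local, v_primary, width):
    """Gaussian taper: events whose locally implied velocity is close to the
    picked primary velocity are passed; slower multiples are attenuated."""
    return np.exp(-((v_local ** 2 - v_primary ** 2) ** 2) / (2.0 * width ** 2))
```

In the suppression step, weights of this kind are applied throughout the t0-x-p volume before stacking along p, so primaries reinforce while down-weighted multiples do not.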
|
470 |
Nanostructure morphology variation modeling and estimation for nanomanufacturing process yield improvement / Liu, Gang 01 June 2009 (has links)
Nanomanufacturing is critical to the future growth of U.S. manufacturing, yet the process yield of current nanodevices is typically 10% or less. Particularly in nanomaterials growth, there may be large variability across the sites on a substrate, which can lead to variability in properties. Essential to reducing this variability is a mathematical description of the spatial variation of nanostructure morphology. This research therefore aims at a method of modeling and estimating nanostructure morphology variation for process yield improvement. The method consists of (1) morphology variation modeling based on Gaussian Markov random field (GMRF) theory, and (2) maximum likelihood estimation (MLE) of the morphology variation model based on measurement data. The research challenge lies in properly defining and estimating the interactions among neighboring nanostructures. To model morphology variation, the nanostructures on all sites are collectively described as a GMRF.
The morphology variation model serves as the basis for a space-time growth model of nanostructures. The probability structure of the GMRF is specified by a so-called simultaneous autoregressive scheme, which defines the neighborhood system for any site on a substrate. The neighborhood system characterizes the interactions among adjacent nanostructures by determining the neighbors of a given site and their influence on it in terms of conditional autoregression. The conditional autoregression representation uniquely determines the precision matrix of the GMRF. Simulation of nanostructure morphology variation is conducted for various neighborhood structures. Considering boundary effects, both finite-lattice and infinite-lattice models are discussed. The simultaneous autoregressive scheme of the GMRF is estimated via maximum likelihood. The MLE of the morphology variation model requires approximating the determinant of the precision matrix of the GMRF.
The absolute term in the double Fourier expansion of a determinant function is used to approximate the coefficients in the precision matrix. Since the conditional MLE estimates of the parameters are affected by how the data are coded, different coding schemes are considered in the estimation, based on numerical simulation and on data collected from SEM images. The results show that the nanostructure morphology variation modeling and estimation method can provide tools for yield improvement in nanomanufacturing.
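A minimal sketch of simulating a lattice GMRF is given below; it uses the simplest first-order conditional-autoregressive precision on an m-by-m grid rather than the thesis's simultaneous autoregressive scheme and neighborhood choices, and all parameter names are illustrative:

```python
import numpy as np

def sample_lattice_gmrf(m, phi=0.2, tau=1.0, seed=0):
    """Hedged sketch: simulate a first-order CAR-type GMRF on an m-by-m
    lattice of sites by sampling via the Cholesky factor of the precision."""
    n = m * m
    W = np.zeros((n, n))                      # 4-nearest-neighbor adjacency
    for i in range(m):
        for j in range(m):
            s = i * m + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:
                    W[s, ii * m + jj] = 1.0
    D = np.diag(W.sum(axis=1))
    Q = tau * (D - phi * W)                   # precision matrix (PD for |phi| < 1)
    L = np.linalg.cholesky(Q)
    z = np.random.default_rng(seed).standard_normal(n)
    x = np.linalg.solve(L.T, z)               # x ~ N(0, Q^{-1})
    return x.reshape(m, m), Q
```

The likelihood of such a model involves log det(Q), which motivates the determinant approximation described above.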
|