Global ETD Search

1	Bayesian Hierarchical Latent Model for Gene Set Analysis Chao, Yi 13 May 2009 (has links) Pathway is a set of genes which are predefined and serve a particular celluar or physiological function. Ranking pathways relevant to a particular phenotype can help researchers focus on a few sets of genes in pathways. In this thesis, a Bayesian hierarchical latent model was proposed using generalized linear random effects model. The advantage of the approach was that it can easily incorporate prior knowledges when the sample size was small and the number of genes was large. For the covariance matrix of a set of random variables, two Gaussian random processes were considered to construct the dependencies among genes in a pathway. One was based on the polynomial kernel and the other was based on the Gaussian kernel. Then these two kernels were compared with constant covariance matrix of the random effect by using the ratio, which was based on the joint posterior distribution with respect to each model. For mixture models, log-likelihood values were computed at different values of the mixture proportion, compared among mixtures of selected kernels and point-mass density (or constant covariance matrix). The approach was applied to a data set (Mootha et al., 2003) containing the expression profiles of type II diabetes where the motivation was to identify pathways that can discriminate between normal patients and patients with type II diabetes. / Master of Science Pathway based analysis Point-mass density Probit regression model Bayesian hierarchical model Latent variable Generalized linear mixed model
2	Bayesian models for DNA microarray data analysis Lee, Kyeong Eun 29 August 2005 (has links) Selection of signi?cant genes via expression patterns is important in a microarray problem. Owing to small sample size and large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the posterior distributions. The Bayesian model is ?exible enough to identify the signi?cant genes as well as to perform future predictions. The method is applied to cancer classi?cation via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of signi?cant genes to classify BRCA1 and others. Microarray data can also be applied to survival models. We address the issue of how to reduce the dimension in building model by selecting signi?cant genes as well as assessing the estimated survival curves. Additionally, we consider the wellknown Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Speci?cally, for a given vector of response values, which are times to event (death or censored times) and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the responsible genes, which are controlling the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than ?xing the number of selected genes, we will assign a prior distribution to this number. The approach creates additional ?exibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in e?ect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) di?use large B??cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) Breast Carcinoma data. Lastly, we propose a mixture of Dirichlet process models using discrete wavelet transform for a curve clustering. In order to characterize these time??course gene expresssions, we consider them as trajectory functions of time and gene??speci?c parameters and obtain their wavelet coe?cients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors. DNA microarray Bayesian Variable Selection Probit Regression Model Weibull Regression Model Cox's Proportional Hazard Model Survival Analysis Curve Clustering Mixture of Dirichlet Processes Wavelet
3	Bayesian models for DNA microarray data analysis Lee, Kyeong Eun 29 August 2005 (has links) Selection of signi?cant genes via expression patterns is important in a microarray problem. Owing to small sample size and large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Due to the binary nature of the data, the posterior distributions of the parameters are not in explicit form, and we need to use a combination of truncated sampling and Markov Chain Monte Carlo (MCMC) based computation techniques to simulate the posterior distributions. The Bayesian model is ?exible enough to identify the signi?cant genes as well as to perform future predictions. The method is applied to cancer classi?cation via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the method is used to identify the set of signi?cant genes to classify BRCA1 and others. Microarray data can also be applied to survival models. We address the issue of how to reduce the dimension in building model by selecting signi?cant genes as well as assessing the estimated survival curves. Additionally, we consider the wellknown Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we need to consider the case where the number of covariates p exceeds the number of samples n. Speci?cally, for a given vector of response values, which are times to event (death or censored times) and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the responsible genes, which are controlling the survival time. This approach enables us to estimate the survival curve when n << p. In our approach, rather than ?xing the number of selected genes, we will assign a prior distribution to this number. The approach creates additional ?exibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in e?ect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology with (a) di?use large B??cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) Breast Carcinoma data. Lastly, we propose a mixture of Dirichlet process models using discrete wavelet transform for a curve clustering. In order to characterize these time??course gene expresssions, we consider them as trajectory functions of time and gene??speci?c parameters and obtain their wavelet coe?cients by a discrete wavelet transform. We then build cluster curves using a mixture of Dirichlet process priors. DNA microarray Bayesian Variable Selection Probit Regression Model Weibull Regression Model Cox's Proportional Hazard Model Survival Analysis Curve Clustering Mixture of Dirichlet Processes Wavelet
4	Statistical Modeling for Credit Ratings Vana, Laura 01 August 2018 (has links) (PDF) This thesis deals with the development, implementation and application of statistical modeling techniques which can be employed in the analysis of credit ratings. Credit ratings are one of the most widely used measures of credit risk and are relevant for a wide array of financial market participants, from investors, as part of their investment decision process, to regulators and legislators as a means of measuring and limiting risk. The majority of credit ratings is produced by the "Big Three" credit rating agencies Standard & Poors', Moody's and Fitch. Especially in the light of the 2007-2009 financial crisis, these rating agencies have been strongly criticized for failing to assess risk accurately and for the lack of transparency in their rating methodology. However, they continue to maintain a powerful role as financial market participants and have a huge impact on the cost of funding. These points of criticism call for the development of modeling techniques that can 1) facilitate an understanding of the factors that drive the rating agencies' evaluations, 2) generate insights into the rating patterns that these agencies exhibit. This dissertation consists of three research articles. The first one focuses on variable selection and assessment of variable importance in accounting-based models of credit risk. The credit risk measure employed in the study is derived from credit ratings assigned by ratings agencies Standard & Poors' and Moody's. To deal with the lack of theoretical foundation specific to this type of models, state-of-the-art statistical methods are employed. Different models are compared based on a predictive criterion and model uncertainty is accounted for in a Bayesian setting. Parsimonious models are identified after applying the proposed techniques. The second paper proposes the class of multivariate ordinal regression models for the modeling of credit ratings. The model class is motivated by the fact that correlated ordinal data arises naturally in the context of credit ratings. From a methodological point of view, we extend existing model specifications in several directions by allowing, among others, for a flexible covariate dependent correlation structure between the continuous variables underlying the ordinal credit ratings. The estimation of the proposed models is performed using composite likelihood methods. Insights into the heterogeneity among the "Big Three" are gained when applying this model class to the multiple credit ratings dataset. A comprehensive simulation study on the performance of the estimators is provided. The third research paper deals with the implementation and application of the model class introduced in the second article. In order to make the class of multivariate ordinal regression models more accessible, the R package mvord and the complementary paper included in this dissertation have been developed. The mvord package is available on the "Comprehensive R Archive Network" (CRAN) for free download and enhances the available ready-to-use statistical software for the analysis of correlated ordinal data. In the creation of the package a strong emphasis has been put on developing a user-friendly and flexible design. The user-friendly design allows end users to estimate in an easy way sophisticated models from the implemented model class. The end users the package appeals to are practitioners and researchers who deal with correlated ordinal data in various areas of application, ranging from credit risk to medicine or psychology.

1

Page generated in 0.1063 seconds