1 |
Bayesian models for DNA microarray data analysis. Lee, Kyeong Eun, 29 August 2005
Selecting significant genes via expression patterns is important in microarray problems. Owing to the small sample size and the large number of variables (genes), the selection process can be unstable. This research proposes a hierarchical Bayesian model for gene (variable) selection. We employ latent variables in a regression setting and use a Bayesian mixture prior to perform the variable selection. Because the data are binary, the posterior distributions of the parameters are not available in explicit form. We therefore use a combination of truncated sampling and Markov chain Monte Carlo (MCMC) computation to simulate the posterior distributions. The Bayesian model is flexible enough both to identify the significant genes and to make future predictions. The method is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary predisposition to breast cancer, and the method is used to identify the set of significant genes that classify BRCA1 versus other samples. Microarray data can also be used in survival models. We address how to reduce the dimension when building the model by selecting significant genes, as well as how to assess the estimated survival curves. Additionally, we consider the well-known Weibull regression and semiparametric proportional hazards (PH) models for survival analysis. With microarray data, we must consider the case where the number of covariates p exceeds the number of samples n. Specifically, the responses are times to event (death or censoring times) and the covariates are p gene expressions. We address how to reduce the dimension by selecting the responsible genes, that is, those that control the survival time. This approach enables us to estimate the survival curve when n << p. Rather than fixing the number of selected genes, we assign a prior distribution to this number.
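The latent-variable step for binary responses can be sketched with the Albert-Chib data-augmentation sampler for probit regression. Each 0/1 outcome is tied to a latent normal variable, drawn from a truncated normal given the current coefficients. The coefficients then have a conjugate normal update. This is a minimal sketch that assumes a probit link and an independent N(0, tau2) prior on every coefficient. The names `tau2`, `draw_truncated_normal` and `probit_gibbs` are hypothetical. The mixture-prior (variable-selection) indicators from the abstract are deliberately left out.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and inv_cdf

def draw_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0, by inverse CDF."""
    p0 = _STD.cdf(-mean)  # P(z <= 0) when z ~ N(mean, 1)
    u = rng.uniform(p0, 1.0) if positive else rng.uniform(0.0, p0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
    return mean + _STD.inv_cdf(u)

def probit_gibbs(X, y, n_iter=2000, tau2=10.0, seed=0):
    """Albert-Chib Gibbs sampler for probit regression with a N(0, tau2 * I) prior."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + np.eye(p) / tau2)  # covariance of beta given z
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        # 1) latent utilities, truncated so their sign agrees with the observed label
        mu = X @ beta
        z = np.array([draw_truncated_normal(m, yi == 1, rng) for m, yi in zip(mu, y)])
        # 2) conjugate normal update of the coefficients given the latent utilities
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

On simulated data with known coefficients, the post-burn-in mean of `draws` should sit near the true values. Extreme linear predictors (beyond about 7 in absolute value) can hit the clamping guard, which is acceptable only for a sketch.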
The approach gains additional flexibility by allowing constraints to be imposed, such as bounding the dimension via a prior, which in effect acts as a penalty. To implement the methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the methodology with (a) diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and (b) breast carcinoma data. Lastly, we propose a mixture of Dirichlet process models that uses the discrete wavelet transform for curve clustering. To characterize time-course gene expressions, we treat them as trajectory functions of time and gene-specific parameters and obtain their wavelet coefficients by a discrete wavelet transform. We then cluster the curves using a mixture of Dirichlet process priors.
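The wavelet step can be illustrated with the simplest orthonormal discrete wavelet transform, the Haar transform. This sketch rests on assumptions the abstract does not state: the Haar filter, a series length that is a power of two, and a full decomposition. The concatenated coefficient vector is the kind of per-gene feature that a Dirichlet process mixture could then cluster. The clustering itself is not implemented here, and `haar_dwt` and `wavelet_features` are hypothetical names.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar DWT of a length-2**J series.

    Returns [approximation, coarsest details, ..., finest details]."""
    approx = np.asarray(x, dtype=float)
    n = approx.size
    if n == 0 or n & (n - 1):
        raise ValueError("series length must be a power of two")
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # scaled local differences
        approx = (even + odd) / np.sqrt(2.0)          # scaled local averages
    return [approx] + details[::-1]

def wavelet_features(curve):
    """Concatenate all Haar coefficients of one time-course curve into a feature vector."""
    return np.concatenate(haar_dwt(curve))
```

For example, `haar_dwt([1, 2, 3, 4])` gives the approximation `[5.]`, the coarse detail `[-2.]` and the fine details `[-0.7071..., -0.7071...]`. The transform is orthonormal, so the sum of squared coefficients equals the sum of squared observations.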
|
3 |
雙變量脆弱性韋伯迴歸模式之研究 (A study of bivariate frailty Weibull regression models). 余立德, Yu, Li-Ta, Unknown Date
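Abstract
This thesis considers survival analysis of clustered samples in which each cluster is further divided into two groups. Individuals in the same cluster and the same group are assumed to share an identical but unobservable random frailty, so the data are multivariate survival data with a bivariate frailty variable. First, we verify the effect of the bivariate frailty on the correlation coefficients of the bivariate log survival times and of the bivariate survival times. Next, we assume that the bivariate frailty follows a bivariate log-normal distribution and that the conditional survival times follow a Weibull regression model. Under these assumptions, we use the EM algorithm to derive an estimation method for the parameters of the multivariate survival model with bivariate frailty.
Keywords: bivariate frailty, Weibull regression model, log-normal distribution, EM algorithm / Abstract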
Consider survival analysis for clustered samples in which each cluster contains two groups. Individuals within the same cluster and the same group are assumed to share a common but unobservable random frailty. Hence, this work focuses on the bivariate frailty model for the analysis of multivariate survival data. First, we derive expressions for the correlation between the two survival times to show how the bivariate frailty affects these correlation coefficients. Then, the bivariate log-normal distribution is used to model the bivariate frailty, and we modify the EM algorithm to estimate the parameters of the Weibull regression model with bivariate log-normal frailty.
Key words: bivariate frailty, Weibull regression model, log-normal distribution, EM algorithm.
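How a shared bivariate frailty induces dependence can be checked numerically. The following assumptions are chosen for illustration and are not taken from the thesis. Conditional Weibull survival is S(t | w) = exp(-w λ t^ρ) with no covariates, and (log w1, log w2) is bivariate normal with mean zero, standard deviations σ1 and σ2, and correlation r. Then log T_j = (log E_j - log λ - log w_j)/ρ, where E_j ~ Exp(1) is independent of the frailty. Since Var(log E) = π²/6, the correlation of the log survival times is r σ1 σ2 / sqrt((π²/6 + σ1²)(π²/6 + σ2²)). This value does not depend on λ or ρ. The sketch compares the formula with a simulation; the function and parameter names are hypothetical.

```python
import math
import numpy as np

def simulate_frailty_pairs(n, rho, lam, sigma1, sigma2, r, seed=0):
    """Simulate n clusters, each with one Weibull survival time per group.

    Group j of a cluster has frailty w_j, and (log w1, log w2) is bivariate normal."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, r * sigma1 * sigma2],
                    [r * sigma1 * sigma2, sigma2**2]])
    w = np.exp(rng.multivariate_normal([0.0, 0.0], cov, size=n))  # shape (n, 2)
    e = rng.standard_exponential(size=(n, 2))  # E = -log U ~ Exp(1)
    # invert S(t | w) = exp(-w * lam * t**rho) at S = exp(-E)
    return (e / (w * lam)) ** (1.0 / rho)

def log_time_correlation(sigma1, sigma2, r):
    """Corr(log T1, log T2) implied by the bivariate log-normal frailty."""
    g = math.pi**2 / 6.0  # variance of log E for E ~ Exp(1)
    return r * sigma1 * sigma2 / math.sqrt((g + sigma1**2) * (g + sigma2**2))
```

With σ1 = 1, σ2 = 0.8 and r = 0.6, the formula gives a log-time correlation of about 0.195, so the frailty dependence is strongly diluted by the Weibull noise.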
|
4 |
Modelos não-lineares de regressão: alguns aspectos de teoria assintótica (Nonlinear regression models: some aspects of asymptotic theory). PRUDENTE, Andréa Andrade, 18 March 2009
Previous issue date: 2009-03-18 / Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / The main objective of this dissertation is to derive expressions for the second-order biases of the maximum likelihood estimators of the parameters of the Weibull generalized linear model (WGLM) and to use them to define bias-corrected estimators. To reduce the bias of these estimators in finite samples, the bias-correction method introduced by Cox and Snell (1968) was used. The model adopts a link function that relates the vector of scale parameters of the Weibull distribution to a linear predictor. As a secondary objective, a review of normal nonlinear models is presented. It covers the method of least squares for parameter estimation, some asymptotic results, measures of nonlinearity, and diagnostic techniques. Unlike linear models, the quality and especially the validity of nonlinear fits are assessed not only through regression diagnostics but also by the extent of their nonlinear behavior. Finally, a brief description of generalized linear models (GLM) and of the applicability of the gamma model is given. Real data sets were analyzed to demonstrate the applicability of the proposed models. These analyses were carried out in R, an environment for programming, data analysis, and graphics. / The main objective of this dissertation is to present expressions for the second-order biases of the maximum likelihood estimators of the parameters of the Weibull generalized linear model (WGLM) and to use them to obtain corrected estimators. To reduce the biases of these estimators in finite samples, the bias correction given by the equation of Cox and Snell (1968) was used. This model allows a link function to relate the vector of scale parameters of the Weibull distribution (the mean part) to the linear predictor.
A secondary objective was to review normal nonlinear models, covering the least-squares method for estimating their parameters, some asymptotic results, measures of nonlinearity, and diagnostic techniques. Unlike linear models, the quality and, above all, the validity of their fits are assessed not only by regression diagnostics but also by the extent of their nonlinear behavior. Finally, a brief description of generalized linear models (GLM) and of the applicability of the gamma model was also presented. Real data were analyzed to demonstrate the applicability of the proposed models. These analyses were performed in R, the environment for programming, data analysis, and graphics.
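The Cox and Snell idea is to subtract an estimate of the O(1/n) bias from the maximum likelihood estimator. It is easiest to see in the simplest special case of the Weibull model: shape fixed at 1 (the exponential distribution) and no covariates. This illustrative case is our choice and is not the dissertation's WGLM derivation. The per-observation log-likelihood is log λ - λx and the MLE is λ̂ = 1/x̄. The Cox and Snell formula gives the second-order bias b(λ) = λ/n, so the corrected estimator is λ̃ = λ̂ - b(λ̂) = λ̂(n - 1)/n. Here it happens to be exactly unbiased, because E(λ̂) = nλ/(n - 1). The sketch compares both estimators by simulation; the function names are hypothetical.

```python
import numpy as np

def exponential_rate_mle(x):
    """MLE of the exponential rate: 1 / sample mean."""
    return 1.0 / np.mean(x)

def cox_snell_corrected_rate(x):
    """Subtract the estimated second-order bias b(lambda) = lambda / n from the MLE."""
    n = len(x)
    lam_hat = exponential_rate_mle(x)
    return lam_hat - lam_hat / n

def compare_estimators(lam=2.0, n=10, reps=100_000, seed=0):
    """Monte Carlo means of the raw MLE and the bias-corrected estimator."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
    mle = 1.0 / samples.mean(axis=1)
    corrected = mle * (1.0 - 1.0 / n)  # vectorized form of cox_snell_corrected_rate
    return mle.mean(), corrected.mean()
```

With n = 10 and λ = 2, the raw MLE averages close to 2·10/9 ≈ 2.22, while the corrected estimator averages close to 2. In the WGLM the same subtraction is applied to a vector of biases derived from cumulants of the log-likelihood.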
|
5 |
事故傾向服從Inverse Gaussian分配時混合Weibull模式之研究 (A study of mixed Weibull models when accident proneness follows an Inverse Gaussian distribution). 黃(糸秀)琪, Huang, Hsiu-Chi, Unknown Date
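This thesis mainly considers survival analysis of clustered data, in which individuals within a cluster are correlated. Individuals within a cluster are assumed to share the same but unobservable accident proneness. First, we study the properties of the mixed Weibull regression model when accident proneness follows an arbitrary continuous distribution. Next, we derive the mixed Weibull model when accident proneness follows an Inverse Gaussian distribution and introduce the parameter estimation problem. Then, we derive a score test statistic for whether individuals within a cluster are independent. We use it to test whether the accident-proneness effect is present in the model, separately for the two most common types of survival data: complete data and right-censored data. Finally, the score test procedure is illustrated with real examples. / In this thesis, we study survival analysis for grouped data in which within-group correlations are taken into account. Individuals within the same group are assumed to share a common but unobservable random frailty. First, we discuss the properties of the Weibull regression model mixed over an arbitrary continuous frailty distribution. Next, we derive an Inverse Gaussian mixture of Weibull regression models and discuss the estimation problem. Then, we derive a score test for independence between components within the same group, treating the two most common cases: complete data and right-censored data. Finally, the testing procedures are illustrated with two examples.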
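Mixing the Weibull model over an Inverse Gaussian frailty has a closed form, because the marginal survival is the frailty's Laplace transform evaluated at the cumulative hazard. The illustration below uses our own parametrization, which is an assumption and not necessarily the thesis's. The frailty W is inverse Gaussian with mean 1 and variance θ, and the conditional cumulative hazard is W·H(t) with Weibull H(t) = λ t^ρ exp(x'β). Then S(t) = E[exp(-W H(t))] = exp{(1 - sqrt(1 + 2θ H(t)))/θ}, which tends to the plain Weibull exp(-H(t)) as θ → 0. The sketch checks the formula against Monte Carlo draws from NumPy's `Generator.wald`, which samples the inverse Gaussian given its mean and shape (`scale`). The function names are hypothetical.

```python
import numpy as np

def weibull_cum_hazard(t, lam, rho, lin_pred=0.0):
    """Conditional Weibull cumulative hazard H(t) = lam * t**rho * exp(x'beta)."""
    return lam * np.power(t, rho) * np.exp(lin_pred)

def ig_mixed_weibull_survival(t, lam, rho, theta, lin_pred=0.0):
    """Marginal survival when the frailty is inverse Gaussian with mean 1 and variance theta."""
    H = weibull_cum_hazard(t, lam, rho, lin_pred)
    return np.exp((1.0 - np.sqrt(1.0 + 2.0 * theta * H)) / theta)

def mc_mixed_survival(t, lam, rho, theta, lin_pred=0.0, draws=200_000, seed=0):
    """Monte Carlo check: average exp(-W * H(t)) over inverse Gaussian frailty draws."""
    rng = np.random.default_rng(seed)
    w = rng.wald(mean=1.0, scale=1.0 / theta, size=draws)  # variance = mean**3 / scale = theta
    return np.mean(np.exp(-w * weibull_cum_hazard(t, lam, rho, lin_pred)))
```

By Jensen's inequality, the mixed survival curve lies above the conditional curve exp(-H(t)) at the mean frailty whenever θ > 0. A score test for "no frailty effect" examines exactly this departure at θ = 0.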
|
6 |
含存活分率之貝氏迴歸模式 (Bayesian regression models with a survival fraction). 李涵君, Unknown Date
When some subjects in a population never fail because they are cured or immune, the proportion of such subjects, the survival fraction, must be taken into account. This thesis explores how to analyze cure rate models with a survival fraction using Bayesian methods. It focuses on two regression models with a survival fraction, the Weibull regression model and the log-logistic regression model. For each, we derive the likelihood function and the full conditional posterior distributions of the parameters, together with their properties. Because the joint posterior distribution is quite complex, closed-form expressions for the marginal posterior distributions of the parameters are hard to obtain. We therefore use the Gibbs sampler and the Metropolis algorithm, both Markov chain Monte Carlo (MCMC) methods, to simulate parameter values for Bayesian analysis. In the empirical part, we analyze melanoma data from a phase III clinical trial conducted by the Eastern Cooperative Oncology Group in the United States. For model selection, we first compute the conditional predictive ordinate (CPO) of each subject under each model. We then use these to calculate each model's log pseudo-marginal likelihood (LPML) and compare the fit of the models. / When some subjects are cured or immune and therefore never fail, we need to account for the proportion of this group in the whole population, the so-called survival fraction. This thesis discusses how to analyze cure rate models with a survival fraction using Bayesian methods. Two such cure rate models are considered: one based on the Weibull regression model and the other based on the log-logistic regression model. We derive the likelihood functions and the full conditional posterior distributions under these two models. Both joint posterior distributions are complicated and the marginal posterior distributions have no closed form. We therefore use Gibbs sampling and Metropolis sampling, both Markov chain Monte Carlo (MCMC) methods, to simulate parameter values. We illustrate the Bayesian analysis using data from a phase III melanoma clinical trial conducted by the Eastern Cooperative Oncology Group. For model selection, we compute the conditional predictive ordinate (CPO) for every subject under each model. We then compare the models by their log pseudo-marginal likelihood (LPML).
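Given M MCMC draws, the CPO of subject i is estimated as the harmonic mean of its likelihood over the posterior draws: CPO_i = [ (1/M) Σ_m 1/f(y_i | θ^(m)) ]^(-1). The model score is LPML = Σ_i log CPO_i, and larger values indicate a better-fitting model. This minimal sketch assumes the per-subject log-likelihood, including censoring and cure-fraction contributions, has already been evaluated at each draw and stored in an M × n array. It works on the log scale with a log-sum-exp so that very small likelihoods do not underflow. The function name is hypothetical.

```python
import numpy as np

def lpml_from_loglik(loglik):
    """Return (LPML, log CPO per subject) from an (M draws x n subjects) log-likelihood array."""
    loglik = np.asarray(loglik, dtype=float)
    M = loglik.shape[0]
    neg = -loglik                       # log of 1 / f(y_i | theta_m)
    peak = neg.max(axis=0)              # log-sum-exp shift, one per subject
    log_mean_inv = peak + np.log(np.exp(neg - peak).sum(axis=0)) - np.log(M)
    log_cpo = -log_mean_inv             # CPO_i = 1 / mean_m(1 / f)
    return log_cpo.sum(), log_cpo
```

Computing LPML this way for each candidate model, such as the Weibull and the log-logistic cure rate models, and preferring the larger value mirrors the comparison described in the abstract.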
|