  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Capturing patterns of spatial and temporal autocorrelation in ordered response data : a case study of land use and air quality changes in Austin, Texas

Wang, Xiaokun, 1979- 05 May 2015 (has links)
Many databases involve ordered discrete responses in a temporal and spatial context, including, for example, land development intensity levels, vehicle ownership, and pavement conditions. Understanding such behaviors requires rigorous statistical methods that recognize spatial effects and dynamic processes. This dissertation develops a dynamic spatial ordered probit (DSOP) model to capture patterns of spatial and temporal autocorrelation in ordered categorical response data. The model is estimated in a Bayesian framework using Gibbs sampling and data augmentation, in order to generate all autocorrelated latent variables. The specifications, methodologies, and applications undertaken here advance the field of spatial econometrics while enhancing our understanding of land use and air quality changes. The proposed DSOP model incorporates spatial effects in an ordered probit model by allowing for inter-regional spatial interactions and heteroskedasticity, along with random effects across regions (where "region" describes any cluster of observational units). The model assumes an autoregressive, AR(1), process across latent response values, thereby recognizing time-series dynamics in panel data sets. The model code and estimation approach are first tested on simulated data sets, in order to recover known parameter values and provide insights into estimation performance. Root mean squared error (RMSE) is used to evaluate the accuracy of estimates, and the deviance information criterion (DIC) is used for model comparison. The DSOP model is found to yield much more accurate estimates than standard, non-spatial techniques. As for model selection, even after the penalty for using more parameters, the DSOP model is clearly preferred to standard OP, dynamic OP, and spatial OP models. The model and methods are then used to analyze both land use and air quality (ozone) dynamics in Austin, Texas. In analyzing Austin's land use intensity patterns over a 4-point panel, the observational units are 300 m × 300 m grid cells derived from satellite images (at 30 m resolution). The sample contains 2,771 such grid cells, spread among 57 clusters (zip code regions), covering about 10% of the overall study area. In this analysis, temporal and spatial autocorrelation effects are found to be significantly positive; in addition, increases in travel times to the region's central business district (CBD) are estimated to substantially reduce land development intensity. The observational units for the ozone variation analysis are 4 km × 4 km grid cells, and all 132 observations falling in the study area are used. While variations in ozone concentration levels exhibit strong patterns of temporal autocorrelation, they appear strikingly random in a spatial context (after controlling for local land cover, transportation, and temperature conditions). While transportation and land cover conditions appear to influence ozone levels, their effects are neither as instantaneous nor as practically significant as the impact of temperature. The proposed and tested DSOP model is a significant contribution to the field of spatial econometrics, where binary applications (for discrete response data) have been seen as the cutting edge; the Bayesian framework and Gibbs sampling techniques used here permit such complexity in a world of two-dimensional autocorrelation.
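To make the estimation strategy concrete, here is a minimal sketch of Gibbs sampling with data augmentation for a plain ordered probit (in the Albert-Chib spirit), which is the core move inside a DSOP sampler; the spatial-lag, AR(1), and heteroskedasticity terms of the full model are omitted, and all names and the fixed cutpoints are illustrative assumptions, not the dissertation's code.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def truncnorm(lo, hi, mu, rng):
    # Inverse-CDF draw from N(mu, 1) truncated to (lo, hi).
    a, b = norm.cdf(lo - mu), norm.cdf(hi - mu)
    return mu + norm.ppf(a + (b - a) * rng.uniform(size=np.shape(mu)))

def gibbs_ordered_probit(y, X, cuts, n_iter=2000):
    """y: responses in 0..K-1; cuts: fixed cutpoints of length K+1 (with +/-inf ends)."""
    n, p = X.shape
    beta = np.zeros(p)
    cov = np.linalg.inv(X.T @ X + np.eye(p))  # posterior covariance under a N(0, I) prior
    draws = []
    for _ in range(n_iter):
        # Data augmentation: draw the latent z_i | beta from truncated normals.
        z = truncnorm(cuts[y], cuts[y + 1], X @ beta, rng)
        # Draw beta | z from its Gaussian full conditional.
        beta = rng.multivariate_normal(cov @ X.T @ z, cov)
        draws.append(beta)
    return np.array(draws)

# Toy check mirroring the dissertation's strategy of recovering known
# parameters from simulated data: true beta = (1.0, -0.5), 3 categories.
X = rng.normal(size=(500, 2))
z_true = X @ np.array([1.0, -0.5]) + rng.normal(size=500)
cuts = np.array([-np.inf, 0.0, 1.0, np.inf])
y = np.searchsorted(cuts, z_true, side="right") - 1
print(gibbs_ordered_probit(y, X, cuts, 500)[100:].mean(axis=0))  # approx (1.0, -0.5)
```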
2

Bayesian model estimation and comparison for longitudinal categorical data

Tran, Thu Trung January 2008 (has links)
In this thesis, we address issues of model estimation for longitudinal categorical data and of model selection for these data with missing covariates. Longitudinal survey data capture the responses of each subject repeatedly through time, allowing the variation in the measured variable of interest across time for one subject to be separated from the variation in that variable among all subjects. Questions concerning persistence, patterns of structure, interaction of events, and stability of multivariate relationships can be answered through longitudinal data analysis. Longitudinal data require special statistical methods because they must take into account the correlation between observations recorded on the same subject. A further complication in analysing longitudinal data is accounting for the non-response or drop-out process; potentially, the missing values are correlated with variables under study and hence cannot simply be excluded. Firstly, we investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from the Longitudinal Survey of Immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey; one feature of the data set is that observations for some variables are missing in at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response, with subsequent terms introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and the explanatory variables to be imputed at each iteration of the algorithm, given appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia. Secondly, we examine the Bayesian model selection techniques of the Bayes factor and the Deviance Information Criterion for our regression models with missing covariates. Computing Bayes factors involves computing the often complex marginal likelihood p(y|model), and various authors have presented methods to estimate this quantity. Here, we take the approach of path sampling via power posteriors (Friel and Pettitt, 2006). The appeal of this method is that, for hierarchical regression models with missing covariates — a common occurrence in longitudinal data analysis — it is straightforward to calculate and interpret, since integration over all parameters, including the imputed missing covariates and the random effects, is carried out automatically with minimal added complexity of modelling or computation. We apply this technique to compare models for the employment status of immigrants to Australia. Finally, we develop a model choice criterion based on the Deviance Information Criterion (DIC), similar to Celeux et al. (2006), but suitable for use with generalized linear models (GLMs) when covariates are missing at random. We define three different DICs: the marginal, where the missing data are averaged out of the likelihood; the complete, where the joint likelihood for response and covariates is considered; and the naive, where the likelihood is found by treating the missing values as parameters. These three versions have different computational complexities.
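The path-sampling identity behind the power-posterior approach referenced above can be stated compactly; this is the standard Friel and Pettitt (2006) formulation, given here for orientation rather than as a derivation from the thesis:

```latex
% Temper the likelihood with t in [0,1]: p_t(theta | y) \propto p(y | theta)^t p(theta).
% The log marginal likelihood is the integral of the expected log-likelihood along the
% temperature path; in practice it is approximated by running MCMC at a grid of t
% values and applying the trapezoidal rule.
\log p(y \mid \text{model}) \;=\; \int_0^1
  \mathbb{E}_{\theta \sim p_t(\cdot \mid y)}\!\left[\log p(y \mid \theta)\right] dt
```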
We investigate through simulation the performance of these three DICs for GLMs with normally, binomially, and multinomially distributed responses and missing covariates having a normal distribution. We find that the marginal DIC and the estimate of the effective number of parameters, pD, have desirable properties, appropriately indicating the true model for the response under differing amounts of missingness in the covariates. We find that the complete DIC is generally inappropriate in this context, as it is extremely sensitive to the degree of missingness of the covariate model. Our new methodology is illustrated by analysing the results of a community survey.
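For reference, a DIC of the kind compared above can be computed from MCMC output in a few lines; this is a generic, hedged sketch (function and variable names are illustrative), in which the choice of log-likelihood — missing covariates integrated out, joint with the covariates, or with imputed values treated as parameters — determines whether the marginal, complete, or naive version is obtained.

```python
import numpy as np

def dic(loglik, theta_draws):
    """DIC and pD from posterior draws; loglik(theta) returns log p(y | theta)."""
    deviances = np.array([-2.0 * loglik(th) for th in theta_draws])
    d_bar = deviances.mean()                          # posterior mean deviance
    d_hat = -2.0 * loglik(theta_draws.mean(axis=0))   # deviance at the posterior mean
    p_d = d_bar - d_hat                               # effective number of parameters
    return d_bar + p_d, p_d                           # DIC = Dbar + pD
```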
3

Bayesian Methods in Gaussian Graphical Models

Mitsakakis, Nikolaos 31 August 2010 (has links)
This thesis contributes to the field of Gaussian Graphical Models by exploring, numerically and theoretically, various topics in Bayesian methods for Gaussian Graphical Models, and by providing a number of results whose further exploration points to promising future research directions. Gaussian Graphical Models are statistical methods for the investigation and representation of interdependencies between components of continuous random vectors. This thesis investigates issues related to the application of Bayesian methods to Gaussian Graphical Models. We adopt the popular $G$-Wishart conjugate prior $W_G(\delta,D)$ for the precision matrix. We propose an efficient sampling method for the $G$-Wishart distribution based on the Metropolis-Hastings algorithm and show its validity through a number of numerical experiments. We show that this method can easily be used to estimate the Deviance Information Criterion, providing a computationally inexpensive approach to model selection. In addition, we examine the marginal likelihood of a graphical model given a set of data, which is proportional to the ratio of the posterior to the prior normalizing constant. We explore methods for estimating this ratio, focusing primarily on the Monte Carlo simulation method of path sampling, and we explore numerically the effect of the completion of the incomplete matrix $D^{\mathcal{V}}$, a hyperparameter of the $G$-Wishart distribution, on the estimation of the normalizing constant. We also derive a series of exact and approximate expressions for the Bayes factor between two graphs that differ by one edge. A new theoretical result regarding the limit of the normalizing constant multiplied by the hyperparameter $\delta$ is given, and its implications for the validity of an improper prior and of the resulting Bayes factor are discussed.
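The role of the ratio of normalizing constants mentioned above follows from the conjugacy of the $G$-Wishart prior. The following is the standard identity for the zero-mean setup ($y_1,\dots,y_n \sim N_p(0, K^{-1})$ with $S = \sum_i y_i y_i'$), stated here for orientation rather than taken from the thesis:

```latex
p(K \mid G) = \frac{1}{I_G(\delta, D)}\, |K|^{(\delta-2)/2}
              \exp\!\left(-\tfrac{1}{2}\,\operatorname{tr}(K D)\right),
\qquad
p(y \mid G) = (2\pi)^{-np/2}\, \frac{I_G(\delta + n,\; D + S)}{I_G(\delta, D)}
```

so estimating the marginal likelihood reduces, up to a known factor, to estimating the two normalizing constants $I_G$ — which is precisely where path sampling enters.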
4

Bayesian modelling of ultra high-frequency financial data

Shahtahmassebi, Golnaz January 2011 (has links)
The availability of ultra high-frequency (UHF) data on transactions has revolutionised data processing and statistical modelling techniques in finance. The unique characteristics of such data, e.g. the discrete structure of price changes, unequally spaced time intervals, and multiple transactions, have introduced new theoretical and computational challenges. In this study, we develop a Bayesian framework for modelling integer-valued variables to capture the fundamental properties of price changes. We propose the zero inflated Poisson difference (ZPD) distribution for modelling UHF data and assess the effect of covariates on the behaviour of price changes. For this purpose, we present two modelling schemes. The first is based on the analysis of the data after the market closes for the day and is referred to as off-line data processing; in this case, Bayesian interpretation and analysis are undertaken using Markov chain Monte Carlo methods. The second modelling scheme introduces the dynamic ZPD model, which is implemented through Sequential Monte Carlo methods (also known as particle filters). This procedure enables us to update our inference as new transactions take place and is known as online data processing. We apply our models to a set of FTSE100 index changes. Based on the probability integral transform, modified for the case of integer-valued random variables, we show that our models explain the observed distribution of price changes well. We then apply the deviance information criterion, and introduce its sequential version, for model comparison in the off-line and online settings, respectively. Moreover, in order to add more flexibility to the tails of the ZPD distribution, we introduce the zero inflated generalised Poisson difference distribution and outline its possible application to modelling UHF data.
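As a concrete illustration of the ZPD building block, the following hedged sketch evaluates a zero-inflated Poisson-difference (Skellam) pmf for integer price changes; the parameter names are illustrative, and the thesis's covariate links and priors are omitted.

```python
import numpy as np
from scipy.stats import skellam

def zpd_pmf(z, mu1, mu2, gamma):
    """P(Z = z) for a Skellam(mu1, mu2) price change with extra mass gamma at zero."""
    base = skellam.pmf(z, mu1, mu2)
    return np.where(z == 0, gamma + (1.0 - gamma) * base, (1.0 - gamma) * base)

# Example: probabilities of a zero tick and of one-tick moves either way.
print(zpd_pmf(np.array([0, 1, -1]), mu1=0.8, mu2=0.8, gamma=0.3))
```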
5

Modélisation des bi-grappes et sélection des variables pour des données de grande dimension : application aux données d’expression génétique / Biclustering modelling and variable selection for high-dimensional data: application to gene expression data

Chekouo Tekougang, Thierry 08 1900 (has links)
Clustering is a classical method for analysing gene expression matrices. When clustering is applied to the rows (genes), each column (experimental condition) belongs to all of the resulting clusters. However, it is often observed that subsets of genes are co-regulated and co-expressed only under a subset of conditions, behaving almost independently under the others. Biclustering techniques have therefore been proposed to reveal such sub-matrices of genes and conditions; a biclustering is a simultaneous clustering of the rows and columns of a data matrix. Most biclustering algorithms proposed in the literature have no statistical foundation, so it is worthwhile to examine the models underlying these algorithms and to develop statistical models that yield significant biclusters. In this thesis, we review the biclustering algorithms that appear to be the most popular, grouping them according to the type of homogeneity within the bicluster and the type of overlap that may be encountered, and we highlight the statistical models that can justify these algorithms. It turns out that some techniques can be justified in a Bayesian framework. We develop an extension of the plaid biclustering model in a Bayesian framework and propose a measure of biclustering complexity; the deviance information criterion (DIC) is used to select the number of biclusters. Studies on gene expression data and on simulated data give satisfactory results. To our knowledge, existing biclustering algorithms treat genes and experimental conditions as independent entities and do not incorporate prior biological information that may be available about them. We introduce a new Bayesian plaid model for gene expression data that integrates biological knowledge and accounts for pairwise interactions between genes and between conditions through a Gibbs field. Dependence between these entities is modelled via relational graphs, one for the genes and one for the conditions, each constructed by k-nearest neighbours, which allows the prior distribution of the labels to be defined as auto-logistic models. Gene similarities are computed using the Gene Ontology (GO). Estimation is carried out by a hybrid procedure that mixes MCMC with a variant of the Wang-Landau algorithm. Experiments on simulated and real data show the performance of our approach. It should be noted that microarray data may contain many noise variables, i.e. variables unable to discriminate between the groups, which can mask the true clustering structure. Inspired by the plaid model, we propose a model that simultaneously recovers the true clustering structure and identifies the discriminating variables; it assumes an additive superposition of the clusters, so that an observation may be explained by more than one cluster. This problem is handled using a binary latent vector, and estimation is obtained via the Monte Carlo EM algorithm, with importance sampling used to reduce the computational cost of the Monte Carlo sampling at each step of the EM algorithm. Numerical examples demonstrate the usefulness of these methods in terms of variable selection and clustering. / Simulations were implemented in Java.
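For orientation, the plaid model that the thesis extends can be written in its classical Lazzeroni-Owen form (not the thesis's Bayesian extension): each bicluster $k$ adds a layer on top of a background level, with binary labels indicating row and column membership.

```latex
Y_{ij} = \mu_0 + \sum_{k=1}^{K} \left(\mu_k + \alpha_{ik} + \beta_{jk}\right)
         \rho_{ik}\,\kappa_{jk} + \varepsilon_{ij},
\qquad \rho_{ik}, \kappa_{jk} \in \{0, 1\}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

The Bayesian version places priors on the labels $\rho_{ik}$ and $\kappa_{jk}$ — here, the auto-logistic priors built from the gene and condition graphs.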
6

空間相關存活資料之貝氏半參數比例勝算模式 / Bayesian semiparametric proportional odds models for spatially correlated survival data

張凱嵐, Chang, Kai lan Unknown Date (has links)
Geographic Information System (GIS) databases have attracted attention from statisticians in many fields seeking to develop and analyse models that account for spatial clustering and variation, and there is emerging interest in modelling spatially correlated survival data in public health and epidemiologic studies. In this thesis, we develop Bayesian multivariate semiparametric hierarchical models that incorporate both spatially correlated and uncorrelated frailties in order to address the question of spatial variation in survival patterns, and we use a multivariate conditionally autoregressive (MCAR) model to detect whether spatial clusters exist across different areas. The baseline hazard function is modelled semiparametrically using mixtures of finite Polya trees. The SEER (Surveillance, Epidemiology, and End Results) database from the National Cancer Institute (NCI) is currently the most comprehensive source of long-term follow-up data on cancer patients in the United States, covering survival status, multiple-cancer history, region of residence, and the other demographic information needed for the analysis. We implement our Bayesian hierarchical spatial models on Iowa cancer data extracted from the SEER database, and we illustrate how to compute the conditional predictive ordinate (CPO), the average log-marginal pseudo-likelihood (ALMPL), and the deviance information criterion (DIC), which are Bayesian criteria for model checking and comparison among competing models.
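The CPO and ALMPL criteria mentioned above can be estimated directly from MCMC output via the usual harmonic-mean identity $\mathrm{CPO}_i^{-1} = E_{\theta \mid y}\bigl[1/p(y_i \mid \theta)\bigr]$; the sketch below is generic, with illustrative names, and assumes a precomputed matrix of per-draw, per-observation likelihoods.

```python
import numpy as np

def cpo_almpl(lik):
    """lik: (S, n) array, lik[s, i] = p(y_i | theta_s) from S posterior draws."""
    cpo = 1.0 / np.mean(1.0 / lik, axis=0)   # harmonic mean across draws
    almpl = np.mean(np.log(cpo))             # average log-marginal pseudo-likelihood
    return cpo, almpl
```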
