1 |
The impact of choosing different meshes under INLA/SPDE framework for geostatistical modelling / O impacto na escolha de diferentes malhas em modelagem geoestatística sob a abordagem INLA/SPDE
Righetto, Ana Julia 02 October 2017
Spatial statistics methods are widely used because many fields, such as environmental sciences, geology, and agronomy, need to understand the spatial distribution of processes from spatially referenced data. Advances in Geographic Information Systems and Global Positioning Systems have extended this use. Many methods used in spatial statistics are computationally demanding, so the development of more computationally efficient methods has received considerable attention in recent years. One important development is the integrated nested Laplace approximation (INLA) method, which carries out Bayesian analysis more efficiently. For geostatistical data, this method is commonly used with the stochastic partial differential equation (SPDE) approach, which requires creating a mesh that covers the study area. This is the first step and an important one, since all results depend on the choice of mesh. As there is no formal, direct way to specify the mesh, we investigate possible guidelines for choosing a suitable mesh for a specific problem. Through simulation studies, we sought guidelines for constructing the mesh for random, regular, and clustered data sets, and we apply these guidelines to a real data set.
|
3 |
Diretrizes para aplicação de inferência Bayesiana aproximada para modelos lineares generalizados e dados georreferenciados / Approximate Bayesian inference guidelines for generalized linear models and georeferenced data
Frade, Djair Durand Ramalho 15 August 2018
In this work, we explore and propose guidelines for data analysis using the Integrated Nested Laplace Approximation (INLA) method for generalized linear models (GLMs) and models based on georeferenced data. For GLMs, we assessed the impact of the approximation method used for the joint posterior distribution. For georeferenced data, we evaluated and proposed guidelines for constructing the meshes, an essential step for obtaining more precise results. In both cases, simulation studies were performed. To select the best models, we calculated measures of agreement between the observations and the values fitted by the models, such as mean squared error and coverage rate.
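The two agreement measures named in this abstract can be sketched in a few lines. The following is a minimal illustration, not the thesis's own code: it assumes the coverage rate is the share of reference values that fall inside their interval estimates, and all numbers and function names are hypothetical.

```python
from statistics import mean


def mean_squared_error(observed, fitted):
    """Average squared difference between observations and fitted values."""
    return mean((o - f) ** 2 for o, f in zip(observed, fitted))


def coverage_rate(reference, lower, upper):
    """Fraction of reference values lying inside [lower, upper] intervals."""
    hits = [lo <= r <= hi for r, lo, hi in zip(reference, lower, upper)]
    return sum(hits) / len(hits)


# Hypothetical values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 2.0, 2.5, 5.0]
lower = [0.5, 1.0, 3.2, 3.0]   # lower interval limits
upper = [2.0, 3.0, 3.5, 6.0]   # upper interval limits

mse = mean_squared_error(observed, fitted)
cov = coverage_rate(observed, lower, upper)
print(mse, cov)
```

A lower mean squared error indicates fitted values closer to the observations, while a coverage rate close to the nominal interval level suggests well-calibrated uncertainty.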
|
4 |
Indicador de risco de óbito de pacientes com AIDS no município de Campinas sob uma abordagem de modelos espaço-temporais em análise de sobrevivência / Risk-of-death indicator for AIDS patients in the municipality of Campinas under a spatio-temporal modelling approach in survival analysis
Mota, Thiago Santos. January 2018
Advisor: Liciana Vaz de Arruda Silveira / Abstract: The main objective of this thesis was to evaluate the survival and the spatial risk of death of patients with HIV/AIDS in three time periods, using spatio-temporal survival models. The study is motivated by the lack of work in epidemiology that simultaneously applies statistical techniques of spatial analysis and survival analysis to AIDS data. These techniques are useful for better understanding the epidemic, assisting clinical monitoring, evaluating mortality, and identifying the risk factors that influence the survival of patients with HIV/AIDS. The data were obtained from the Information System for Notifiable Diseases (SINAN) and from the deaths of Campinas residents recorded in the database of the Mortality Information System (SIM). The data were divided into three retrospective cohorts: the first with 286 individuals reported in the first years of the epidemic, from 1980 to 1990 (cohort 1); a second with 1456 individuals reported from 1996 to 2000; and a third with 1342 individuals reported from 2001 to 2005 (cohort 3). In these cohorts the mean follow-up time was 10 years. Two modelling approaches were used. In the first, Bayesian, a Bayesian semi-parametric model was fitted using the Integrated Nested Laplace Approximation (INLA), aggregating the residence locations of individuals belonging to the same census tract. In the second approach, frequentist, we used a proportional extension of the ri... / Doctorate
|
5 |
Incorporating high-dimensional exposure modelling into studies of air pollution and health
Liu, Yi January 2015
Air pollution is an important determinant of health. There is convincing, and growing, evidence linking the risk of disease and premature death with exposure to various pollutants, including fine particulate matter and ozone. Knowledge about health and environmental risks and their trends is an important stimulus for developing environmental and public health policy. Studies of the risks that environmental hazards pose to human health require accurate estimates of the exposures that might be experienced by the populations at risk. In this thesis we develop spatio-temporal models within a Bayesian framework to obtain accurate estimates of such exposures. These models are set within a hierarchical framework, with different levels describing dependencies over space and time. The complexity of hierarchical models and the large amounts of data that can arise from environmental networks mean that inference using Markov chain Monte Carlo (MCMC) may be computationally challenging in this setting. We use both MCMC and Integrated Nested Laplace Approximations (INLA) to implement spatio-temporal exposure models when dealing with high-dimensional data. We also propose an approach for using the results from exposure models in health models, which allows them to enhance studies of the health effects of air pollution. Moreover, we investigate the possible effects of preferential sampling, where monitoring sites in environmental networks are preferentially located by the designers in order to assess whether guidelines and policies are being adhered to. This means the data arising from such networks may not accurately characterise the spatio-temporal field they are intended to monitor and, as such, will not provide accurate estimates of the exposures potentially experienced by populations.
This has the potential to introduce bias into estimates of the risk associated with exposure to air pollution and into subsequent health impact analyses. Throughout the thesis, the methods developed are assessed using simulation studies and applied to real-life case studies assessing the effects of particulate matter on health in Greater London and throughout the UK.
|
6 |
Joint Quantile Disease Mapping for Areal Data
Alahmadi, Hanan H. 16 November 2021
Statistical analysis based on quantile methods is more comprehensive and flexible, and less sensitive to outliers, than mean-based methods. Studies of joint disease mapping have usually focused on mean regression; that is, they study the correlation or dependence between the means of the diseases using standard regression. However, sometimes one disease limits the occurrence of another. In this case, the dependence between the two diseases lies not in the means but in different quantiles; the analysis therefore considers a joint disease mapping of a high quantile of one disease with a low quantile of the other. In the proposed joint quantile model, the key idea is to link the diseases through different quantiles and estimate their dependence instead of connecting their means. The components of this formulation are modeled as a latent Gaussian model, and the parameters are estimated via R-INLA. Finally, we illustrate the model by analyzing malaria and G6PD deficiency incidence in 21 African countries.
|
7 |
Joint Weibull Models for Survival and Longitudinal Data with Dynamic Predictions
Uvasheva, Dilyara 22 August 2022
Patients previously diagnosed with prostate cancer usually undergo routine clinical monitoring that involves measuring the prostate-specific antigen (PSA). The trajectory of this biomarker over time serves as an indication of cancer recurrence. If the PSA value begins to increase, the cancer is considered more likely to recur, and the patient is advised to start treatment. There are two reasons for stopping patient follow-up, which poses a challenge: one is the start of salvage hormone therapy and the other is actual recurrence of cancer. When analyzing such data, we need to account for informative dropout; neglecting it may increase the bias in the estimated PSA trajectory. Thus, hormone therapy serves as a censoring event, which is a defining feature of survival analysis.
Motivated by the PSA data, we need to describe the dropout mechanism efficiently using a joint model. The survival submodel is based on the Weibull distribution, and we fit the model with Bayesian inference, specifically the R-INLA package, which is a much faster alternative to MCMC-based inference. We can use this inferential tool because our joint model, with a linear bivariate Gaussian association structure, is a latent Gaussian model (LGM). Based on this work, we then develop dynamic predictions of prostate cancer recurrence. Accurate prognosis for cancer data is clinically impactful and could ultimately contribute to the development of precision medicine.
|
8 |
Criticism and robustification of latent Gaussian models
Cabral, Rafael 28 May 2023
Latent Gaussian models (LGMs) are perhaps the most commonly used class of statistical models with broad applications in various fields, including biostatistics, econometrics, and spatial modeling. LGMs assume that a set of unobserved or latent variables follow a Gaussian distribution, commonly used to model spatial and temporal dependence in the data. The availability of computational tools, such as R-INLA, that permit fast and accurate estimation of LGMs has made their use widespread. Nevertheless, it is easy to find datasets that contain inherently non-Gaussian features, such as sudden jumps or spikes, that adversely affect the inferences and predictions made from an LGM. These datasets require more general latent non-Gaussian models (LnGMs) that can automatically handle these non-Gaussian features by assuming more flexible and robust non-Gaussian distributions on the latent variables. However, fast implementation and easy-to-use software are lacking, which prevents LnGMs from becoming widely applicable.
This dissertation aims to tackle these challenges and provide ready-to-use implementations for the R-INLA package. We view scientific learning as an iterative process involving model criticism followed by model improvement and robustification. Thus, the first step is to provide a framework that allows researchers to criticize and check the adequacy of an LGM without fitting the more expensive LnGM. We employ concepts from Bayesian sensitivity analysis to check the influence of the latent Gaussian assumption on the statistical answers and Bayesian predictive checking to check if the fitted LGM can predict important features in the data. In many applications, this procedure will suffice to justify using an LGM. For cases where this check fails, we provide fast and scalable implementations of LnGMs based on variational Bayes and Laplace approximations. The approximation leads to an LGM that downweights extreme events in the latent variables, reducing their impact and leading to more robust inferences. Each step, the first of LGM criticism and the second of LGM robustification, can be executed in R-INLA, requiring only the addition of a few lines of code. This results in a robust workflow that applied researchers can readily use.
|
9 |
Análise espacial dos determinantes sociais e o risco de mortes por tuberculose: da aplicação da estatística de varredura à abordagem Bayesiana em uma metrópole do Centro Oeste brasileiro / Spatial analysis of social determinants and risk of death from tuberculosis: from the application of scanning statistics to the Bayesian approach in the Brazilian Midwest
Alves, Josilene Dália 20 December 2018
Tuberculosis is one of the top 10 causes of death among infectious diseases in the world, which marks the disease as a public health problem. The 95% reduction in tuberculosis mortality by 2035 proposed by the End TB Strategy has been challenging for Brazil because of its territorial extent, cultural variations, and inequalities in the distribution of social protection and health resources. We therefore sought to analyze the spatial and spatio-temporal relationship between social determinants and the risk of death from tuberculosis in Cuiabá. This is an ecological study conducted in the city of Cuiabá, capital of the state of Mato Grosso.
The units of analysis were the Human Development Units (UDHs), and the population consisted of deaths with TB as the underlying cause registered in the Mortality Information System (SIM) between 2006 and 2016, among residents of the urban area of the municipality. To identify the risk areas for tuberculosis deaths, the scan statistic was used. Next, Principal Component Analysis was used to construct the dimensions of the social determinants. The association between social determinants and risk areas for tuberculosis deaths was obtained through binary logistic regression. Bayesian models were fitted using the Integrated Nested Laplace Approximation (INLA) approach to estimate temporal and spatial relative risks and to evaluate their relationship with covariates representing the social determinants. During this period, 225 deaths from tuberculosis were recorded and a risk cluster for tuberculosis mortality was identified, with RR = 2.09 (95% CI = 1.48-2.94; p = 0.04). In the logistic model, the social determinants related to school deficit and poverty were associated with the risk cluster of tuberculosis deaths (OR = 2.92; 95% CI = 1.17-7.28), and income showed a negative association (OR = 0.05; 95% CI = 0.00-0.70). The ROC curve value of the model was 92.1%. In the Bayesian models, the risk of death from tuberculosis decreased between 2006 (RR = 1.03) and 2016 (RR = 0.98), and some risk areas persisted for more than a decade. Among the social determinants, income was an important factor associated with the risk of death from tuberculosis: an increase of one standard deviation in income corresponded to a 31% decrease in the risk of tuberculosis mortality. The results indicate an association between social determinants and the risk of tuberculosis mortality in the municipality investigated, a phenomenon that persists over time.
Investment in public policies to improve income distribution may help change this reality. It is hoped that the findings will guide managers and workers at the local and regional levels.
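As a reading aid for the income result, the reported 31% decrease per standard deviation can be converted to a relative risk and a regression coefficient. This is a minimal sketch that assumes a log-linear risk model, so that the relative risk per standard deviation equals exp(beta); the abstract does not state the link function, and the variable and function names are hypothetical.

```python
import math

percent_decrease = 31.0                          # figure reported in the abstract
relative_risk = 1.0 - percent_decrease / 100.0   # relative risk per 1 SD of income
beta = math.log(relative_risk)                   # implied coefficient on the log scale


def risk_multiplier(sd_change, coef=beta):
    """Multiplicative change in risk for a shift of `sd_change` standard deviations,
    assuming the log-linear form described above."""
    return math.exp(coef * sd_change)


print(round(relative_risk, 2))       # 0.69
print(round(risk_multiplier(2), 4))  # 0.4761, i.e. 0.69 squared
```

Under this assumption the effect compounds multiplicatively, so a two standard deviation increase in income would correspond to roughly a 52% lower risk rather than 62%.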
|
10 |
Modélisation spatiale multi-sources de la teneur en carbone organique du sol d'une petite région agricole francilienne / Multi-source spatial modelling of the soil organic carbon content in Western Paris croplands
Zaouche, Mounia 15 March 2019
In this thesis, we are interested in the spatial estimation of the topsoil organic carbon (SOC) content over a small agricultural area located west of Paris. The variability of the SOC content has been identified as one of the main sources of uncertainty in the prediction of SOC stocks, whose increase promotes soil fertility and mitigates greenhouse gas emissions.
We use data from heterogeneous sources defined at different spatial resolutions (soil samples, soil map, multispectral satellite images, etc.) with the aim of providing, on the one hand, exhaustive spatial information and, on the other, accurate estimates of the SOC content in the study region together with an assessment of the related uncertainties. Several original models, some of which account for the change of support, are built, and several prediction approaches and methods are considered. These include recent and powerful Bayesian methods that enable not only the inference of sophisticated models jointly integrating data of different spatial resolutions but also the handling of large data sets. To optimize the prediction quality of the multi-source models, we also propose an efficient and fast approach that increases the influence of an important but under-represented type of data within the full set of initially integrated data.
|