351 |
Statistical Analysis of Structured High-dimensional Data. Sun, Yizhi. 05 October 2018.
High-dimensional data such as multi-modal neuroimaging data and large-scale networks carry vast amounts of information and can be used to test various scientific hypotheses or discover important patterns in complicated systems. While considerable effort has been made to analyze high-dimensional data, existing approaches often rely on simple summaries that can miss important information, and many challenges in modeling complex structures in data remain unaddressed. In this dissertation, we focus on analyzing structured high-dimensional data, including functional data with important local regions and network data with community structures.
The first part of this dissertation concerns the detection of "important" regions in functional data. We propose a novel Bayesian approach that enables region selection in the functional data regression framework. Region selection is achieved by encouraging sparse estimation of the regression coefficient function, where nonzero regions correspond to the regions that are selected. To achieve sparse estimation, we adopt a compactly supported and potentially over-complete basis to capture local features of the regression coefficient function, and place a spike-and-slab prior on the coefficients of the basis functions. To encourage continuous shrinkage of nearby regions, we assume an Ising hyper-prior that takes into account the neighboring structure of the basis functions, represented by an undirected graph. We perform posterior sampling through Markov chain Monte Carlo algorithms. The practical performance of the proposed approach is demonstrated through simulations as well as near-infrared and sonar data.
The second part of this dissertation focuses on constructing diversified portfolios using stock return data from the Center for Research in Security Prices (CRSP) database maintained by the University of Chicago. Diversification is a risk management strategy that involves mixing a variety of financial assets in a portfolio. This strategy helps reduce the overall risk of the investment and improve the performance of the portfolio. To construct portfolios that effectively diversify risk, we first build a co-movement network using the correlations between stock returns over a training time period. Correlation characterizes the synchrony among stock returns and thus helps us understand whether two or more stocks share common risk attributes. Based on the co-movement network, we apply multiple network community detection algorithms to detect groups of stocks with common co-movement patterns. Stocks within the same community tend to be highly correlated, while stocks across different communities tend to be less correlated. A portfolio is then constructed by selecting stocks from different communities. The average return of the constructed portfolio over a testing time period is finally compared with the S&P 500 market index. Our constructed portfolios demonstrate outstanding performance during a non-crisis period (2004-2006) and good performance during a financial crisis period (2008-2010). / PhD / High-dimensional data, which consist of data points with a tremendous number of features (a.k.a. attributes, independent variables, explanatory variables), bring challenges to statistical analysis due to their "high-dimensionality" and complicated structure. In this dissertation, I consider two types of high-dimensional data. The first type is functional data, in which each observation is a function. The second type is network data, whose internal structure can be described as a network. I aim to detect "important" regions in functional data using a novel statistical model, and I treat stock market data as network data to construct quality portfolios efficiently.
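The portfolio-construction step described above lends itself to a compact illustration. The following Python sketch (not the dissertation's code) builds a correlation-thresholded co-movement network, detects communities with a greedy modularity algorithm from networkx, and selects one stock per community; the 0.5 threshold, the particular community detector, and the synthetic two-sector returns are assumptions made purely for illustration.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def diversified_portfolio(returns, threshold=0.5):
    """returns: {ticker: 1-D array of returns over the training window}."""
    tickers = list(returns)
    corr = np.corrcoef(np.array([returns[t] for t in tickers]))  # rows = stocks
    graph = nx.Graph()
    graph.add_nodes_from(tickers)
    for i in range(len(tickers)):
        for j in range(i + 1, len(tickers)):
            if corr[i, j] > threshold:                 # edge = strong co-movement
                graph.add_edge(tickers[i], tickers[j], weight=corr[i, j])
    communities = greedy_modularity_communities(graph)
    return [sorted(c)[0] for c in communities]         # one stock per community

# synthetic two-sector returns: stocks within a sector share a common factor
rng = np.random.default_rng(0)
factor_a, factor_b = rng.normal(0, 0.01, 250), rng.normal(0, 0.01, 250)
demo = {f"A{k}": factor_a + rng.normal(0, 0.005, 250) for k in range(5)}
demo.update({f"B{k}": factor_b + rng.normal(0, 0.005, 250) for k in range(5)})
print(diversified_portfolio(demo))   # expect one "A" ticker and one "B" ticker
```

With the synthetic two-sector returns, the sketch picks one ticker from each sector, which is the weakly correlated selection the abstract describes.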
|
352 |
A Markov chain Monte Carlo method for inverse stochastic modeling and uncertainty assessment. Fu, Jianlin. 07 May 2008.
Unlike traditional two-stage methods, a conditional and inverse-conditional simulation approach can directly generate independent, identically distributed realizations that honor both static data and state data in one step. The Markov chain Monte Carlo (McMC) method has proved to be a powerful tool for this type of stochastic simulation. One of the main advantages of McMC over traditional sensitivity-based optimization methods for inverse problems is its power, flexibility, and well-posedness in incorporating observation data from different sources. In this work, an improved version of the McMC method is presented to perform the stochastic simulation of reservoirs and aquifers in the framework of multi-Gaussian geostatistics.
First, a blocking scheme is proposed to overcome the limitations of the classic single-component Metropolis-Hastings-type McMC. One of the main characteristics of the blocking McMC (BMcMC) scheme is that, depending on the inconsistency between the prior model and reality, it can preserve the prior spatial structure and statistics as specified by the user. At the same time, it improves the mixing of the Markov chain and hence enhances the computational efficiency of McMC. Furthermore, the exploration ability and mixing speed of McMC are improved by coupling multiscale proposals, i.e., the coupled multiscale McMC method. To make the BMcMC method capable of dealing with high-dimensional cases, a multiscale scheme is introduced to accelerate the computation of the likelihood, which greatly improves the computational efficiency of McMC because most of the computational effort is spent on the forward simulations. To this end, a flexible-grid full-tensor finite-difference simulator, which is widely compatible with the outputs from various upscaling subroutines, is developed to solve the flow equations, and a constant-displacement random-walk particle-tracking method, which enhances the com / Fu, J. (2008). A Markov chain Monte Carlo method for inverse stochastic modeling and uncertainty assessment [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1969
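As a rough illustration of the blocking idea (not the author's BMcMC implementation), the sketch below runs a blocked Metropolis-Hastings sampler on a toy linear-Gaussian inverse problem: the field is written in whitened coordinates so that each component is a priori standard normal, a random block is redrawn from its prior at every step, and the acceptance ratio then reduces to a likelihood ratio. The forward operator, noise level, and block size are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_obs, block = 50, 10, 5                 # field size, observations, block size
H = rng.normal(size=(n_obs, n))             # toy linear forward operator
m_true = rng.normal(size=n)
y = H @ m_true + 0.1 * rng.normal(size=n_obs)

def loglike(m, sigma=0.1):
    r = y - H @ m
    return -0.5 * r @ r / sigma**2

m = np.zeros(n)                             # start from the prior mean
ll = loglike(m)
accepted = 0
for it in range(5000):
    idx = rng.choice(n, size=block, replace=False)  # pick a block of components
    prop = m.copy()
    prop[idx] = rng.normal(size=block)              # redraw the block from the prior
    ll_prop = loglike(prop)
    if np.log(rng.uniform()) < ll_prop - ll:        # prior proposal => likelihood ratio
        m, ll = prop, ll_prop
        accepted += 1
print("acceptance rate:", accepted / 5000)
```

Because blocks are proposed from the prior itself, proposals stay consistent with the prior model while the accept/reject step accounts for the data, which is the spirit of the blocking scheme described above.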
|
353 |
Reconstruction de profils protéiques pour la recherche de biomarqueurs / Reconstruction of proteomic profiles for biomarker discovery. Szacherski, Pascal. 21 December 2012.
Cette thèse préparée au CEA Leti, Minatec Campus, Grenoble, et à l’IMS, Bordeaux, s’inscrit dans le thème du traitement de l’information pour des données protéomiques. Nous cherchons à reconstruire des profils protéiques à partir des données issues de chaînes d’analyse complexes associant chromatographie liquide et spectrométrie de masse. Or, les signaux cibles sont des mesures de traces peptidiques qui sont de faible niveau dans un environnement très complexe et perturbé. Ceci nous a conduits à étudier des outils statistiques adaptés. Ces perturbations peuvent provenir des instruments de mesure (variabilité technique) ou des individus (variabilité biologique). Le modèle hiérarchique de l’acquisition des données permet d’inclure ces variabilités explicitement dans la modélisation probabiliste directe. La mise en place d’une méthodologie problèmes inverses permet ensuite d’estimer les grandeurs d’intérêt. Dans cette thèse, nous avons étudié trois types de problèmes inverses associés aux opérations suivantes : 1. la quantification de protéines cibles, vue comme l’estimation de la concentration protéique, 2. l’apprentissage supervisé à partir d’une cohorte multi-classe, vu comme l’estimation des paramètres des classes, et 3. la classification à partir des connaissances sur les classes, vue comme l’estimation de la classe à laquelle appartient un nouvel échantillon. La résolution des problèmes inverses se fait dans le cadre des méthodes statistiques bayésiennes, en ayant recours pour les calculs numériques aux méthodes d’échantillonnage stochastique (Monte Carlo Chaîne de Markov). / This thesis has been prepared at CEA Leti, Minatec Campus (Grenoble, France) and at the IMS (Bordeaux, France) in the context of information and signal processing for proteomic data. The aim is to reconstruct proteomic profiles from the data provided by complex analytical workflows combining liquid chromatography and mass spectrometry. The signals are measurements of peptide traces which have low amplitude within a complex and noisy background. Therefore, adapted statistical signal processing methods are required. The uncertainty can be of technical nature (instruments, measurements) or of biological nature (individuals, “patients”). A hierarchical model, describing the forward problem of data acquisition, allows for explicitly including those variability sources within the probabilistic model. The use of the inverse problem methodology, finally, leads us to the estimation of the parameters of interest. In this thesis, we have studied three types of inverse problems for the following applications: 1. quantification of targeted proteins, seen as estimation of the protein concentration; 2. supervised training from a labelled cohort, seen as estimation of distribution parameters for each class; 3. classification given the knowledge about the classes, seen as estimation of the class a biological sample belongs to. We solve these inverse problems within a Bayesian framework, resorting to stochastic sampling methods (Monte Carlo Markov Chain) for computation.
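Steps 2 and 3 of the list above (learning class parameters from a labelled cohort, then assigning a new sample to a class) can be sketched with a toy Gaussian classifier. The code below is only a schematic stand-in for the hierarchical model of the thesis: it assumes diagonal covariances, flat class priors, and synthetic profiles.

```python
import numpy as np

def fit_classes(X, labels):
    """Estimate per-class mean and (diagonal) variance of the profiles."""
    params = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)
    return params

def posterior(x, params):
    """Posterior probability of each class for a new profile x (flat prior)."""
    logp = {}
    for c, (mu, var) in params.items():
        logp[c] = -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
    m = max(logp.values())
    w = {c: np.exp(v - m) for c, v in logp.items()}    # stabilized exponentiation
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(1.5, 1, (30, 5))])
labels = np.array([0] * 30 + [1] * 30)
print(posterior(rng.normal(1.5, 1, 5), fit_classes(X, labels)))
```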
|
354 |
Simulation du canal optique sans fil. Application aux télécommunications optique sans fil / Optical wireless channel simulation. Applications to optical wireless communications. Behlouli, Abdeslam. 07 December 2016.
Le contexte de cette thèse est celui des communications optiques sans fil pour des applications en environnements indoor. Pour discuter des performances d'une liaison optique sans fil, il est nécessaire d'établir une étude caractéristique du comportement du canal de propagation. Cette étude passe par l'étape de la mesure ou de l'estimation par la simulation de la réponse impulsionnelle. Après avoir décrit la composition d'une liaison et passé en revue les méthodes de simulation existantes, nous présentons nos algorithmes de simulation dans des environnements réalistes, en nous intéressant à leurs performances en termes de précision et de temps de calcul. Ces méthodes sont basées sur la résolution des équations de transport de la lumière par du lancer de rayons associées aux méthodes d'intégration stochastique de Monte Carlo. La version classique de ces méthodes est à la base de trois algorithmes de simulations proposés. En utilisant une optimisation par des chaînes de Markov, nous présentons ensuite deux autres algorithmes. Un bilan des performances de ces algorithmes est établi dans des scénarios mono et multi-antennes. Finalement, nous appliquons nos algorithmes pour caractériser l'impact de l'environnement de simulation sur les performances d'une liaison de communication par lumière visible, à savoir les modèles d'émetteurs, les matériaux des surfaces, l'obstruction du corps de l'utilisateur et sa mobilité, et la géométrie de la scène de simulation. / The context of this PhD thesis falls within the scope of optical wireless communications for applications in indoor environments. To discuss the performance of an optical wireless link, it is necessary to establish a characteristic study of the behavior of the optical wave propagation channel. This study can be realized by measurement or by simulation of the channel impulse response. After describing the composition of an optical wireless link and reviewing existing simulation methods, we present our new channel simulation algorithms in realistic environments, focusing on their performance in terms of accuracy and their complexity in terms of computation time. These methods are based on solving the light transport equations by ray-tracing techniques associated with stochastic Monte Carlo integration methods. The classical version of these methods is the basis of three proposed simulation algorithms. By applying an optimization based on Markov chains, we then present two new algorithms. A performance assessment of our simulation algorithms is established in mono- and multi-antenna scenarios. Finally, we present the application of these algorithms for characterizing the impact of the simulation environment on the performance of a visible light communication link. We particularly focus on the transmitter models, surface coating materials, obstruction of the user's body and its mobility, and the geometry of the simulation scene.
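To make the ray-tracing/Monte Carlo idea concrete, the sketch below estimates the line-of-sight term and the single-bounce contribution to the impulse response of an empty rectangular room under ideal Lambertian assumptions (a Barry-style model). It is not the thesis simulator: the room geometry, reflectivity, photodiode area and time-bin width are illustrative values, and only first-order reflections off the (equal-area) walls are sampled.

```python
import numpy as np

c = 3e8                        # speed of light (m/s)
W, L, Hgt = 5.0, 5.0, 3.0      # room dimensions (m); W = L so all four walls have equal area
tx, n_tx = np.array([2.5, 2.5, 3.0]), np.array([0.0, 0.0, -1.0])  # LED on the ceiling
rx, n_rx = np.array([1.0, 2.0, 0.0]), np.array([0.0, 0.0, 1.0])   # photodiode on the floor
m, A_r, rho = 1, 1e-4, 0.7     # Lambertian order, PD area (m^2), wall reflectivity

def lambert_gain(p_s, n_s, p_r, n_r, order):
    """Irradiance at p_r per unit power emitted at p_s (generalized Lambertian source)."""
    v = p_r - p_s
    d = np.linalg.norm(v)
    cos_e = np.dot(n_s, v) / d           # emission angle cosine
    cos_i = np.dot(n_r, -v) / d          # incidence angle cosine
    if cos_e <= 0 or cos_i <= 0:
        return 0.0, d
    return (order + 1) / (2 * np.pi * d**2) * cos_e**order * cos_i, d

def sample_wall(rng):
    """Uniform random point and inward normal on one of the four walls."""
    u, v = rng.uniform(0, 1, 2)
    walls = [(np.array([0.0, u * L, v * Hgt]), np.array([1.0, 0.0, 0.0])),
             (np.array([W, u * L, v * Hgt]), np.array([-1.0, 0.0, 0.0])),
             (np.array([u * W, 0.0, v * Hgt]), np.array([0.0, 1.0, 0.0])),
             (np.array([u * W, L, v * Hgt]), np.array([0.0, -1.0, 0.0]))]
    return walls[rng.integers(4)]

rng = np.random.default_rng(3)
N, dt = 20000, 1e-9
h = np.zeros(100)                                      # impulse response, 1 ns bins
g, d = lambert_gain(tx, n_tx, rx, n_rx, m)             # line-of-sight term
h[int(d / c / dt)] += g * A_r
wall_area = 2 * (W + L) * Hgt
for _ in range(N):
    p, n = sample_wall(rng)
    g1, d1 = lambert_gain(tx, n_tx, p, n, m)           # source -> wall element
    g2, d2 = lambert_gain(p, n, rx, n_rx, 1)           # element re-emits as order-1 Lambertian
    power = g1 * rho * g2 * A_r * wall_area / N        # Monte Carlo estimate of the area integral
    h[min(int((d1 + d2) / c / dt), len(h) - 1)] += power
print("estimated DC gain H(0):", h.sum())
```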
|
355 |
Cosmologia usando aglomerados de galáxias no Dark Energy Survey / Cosmology with Galaxy Clusters in the Dark Energy Survey. Silva, Michel Aguena da. 03 August 2017.
Aglomerados de galáxias são as maiores estruturas no Universo. Sua distribuição mapeia os halos de matéria escura formados nos potenciais profundos do campo de matéria escura. Consequentemente, a abundância de aglomerados é altamente sensível à expansão do Universo, assim como ao crescimento das perturbações de matéria escura, constituindo uma poderosa ferramenta para fins cosmológicos. Na era atual de grandes levantamentos observacionais que produzem uma quantidade gigantesca de dados, as propriedades estatísticas dos objetos observados (galáxias, aglomerados, supernovas, quasares, etc) podem ser usadas para extrair informações cosmológicas. Para isso, é necessário o estudo da formação de halos de matéria escura, da detecção dos halos e aglomerados, das ferramentas estatísticas usadas para os vínculos de parâmetros, e, finalmente, dos efeitos das detecções ópticas. No contexto da formulação da predição teórica da contagem de halos, foi analisada a influência de cada parâmetro cosmológico na abundância dos halos, a importância do uso da covariância dos halos, e a eficácia da utilização dos halos para vincular cosmologia. Também foram analisados em detalhes os intervalos de redshift e o uso de conhecimento prévio dos parâmetros (priors). A predição teórica foi testada em uma simulação de matéria escura, onde a cosmologia era conhecida e os halos de matéria escura já haviam sido detectados. Nessa análise, foi atestado que é possível obter bons vínculos cosmológicos para alguns parâmetros (Ω_m, w, σ_8, n_s), enquanto outros parâmetros (h, Ω_b) necessitavam de conhecimento prévio de outros testes cosmológicos. Na seção dos métodos estatísticos, foram discutidos os conceitos de likelihood, priors e posterior distribution. O formalismo da Matriz de Fisher, bem como sua aplicação em aglomerados de galáxias, foi apresentado e usado para a realização de predições dos vínculos em levantamentos atuais e futuros. Para a análise de dados, foram apresentados métodos de Cadeias de Markov de Monte Carlo (MCMC), que diferentemente da Matriz de Fisher não assumem Gaussianidade entre os parâmetros vinculados, porém possuem um custo computacional muito mais alto. Os efeitos observacionais também foram estudados em detalhes. Usando uma abordagem com a Matriz de Fisher, os efeitos de completeza e pureza foram extensivamente explorados. Como resultado, foi determinado em quais casos é vantajoso incluir uma modelagem adicional para que o limite mínimo de massa possa ser diminuído. Um dos principais resultados foi o fato de que a inclusão dos efeitos de completeza e pureza na modelagem não degrada os vínculos de energia escura, se alguns outros efeitos já estão sendo incluídos. Também foi verificado que o uso de priors nos parâmetros não cosmológicos só afeta os vínculos de energia escura se forem melhores que 1%. O cluster finder (código para detecção de aglomerados) WaZp foi usado na simulação, produzindo um catálogo de aglomerados. Comparando-se esse catálogo com os halos de matéria escura da simulação, foi possível investigar e medir os efeitos observacionais. A partir dessas medidas, pôde-se incluir correções para a predição da abundância de aglomerados, que resultou em boa concordância com os aglomerados detectados. Os resultados e as ferramentas desenvolvidos ao longo desta tese podem fornecer uma estrutura para a análise de aglomerados com fins cosmológicos.
Durante esse trabalho, diversos códigos foram desenvolvidos; dentre eles estão um código eficiente para computar a predição teórica da abundância e covariância de halos de matéria escura, um código para estimar a abundância e covariância dos aglomerados de galáxias incluindo os efeitos observacionais, e um código para comparar diferentes catálogos de halos e aglomerados. Esse último foi integrado ao portal científico do Laboratório Interinstitucional de e-Astronomia (LIneA) e está sendo usado para avaliar a qualidade de catálogos de aglomerados produzidos pela colaboração do Dark Energy Survey (DES), assim como também será usado em levantamentos futuros. / Galaxy clusters are the largest bound structures of the Universe. Their distribution maps the dark matter halos formed in the deep potential wells of the dark matter field. As a result, the abundance of galaxy clusters is highly sensitive to the expansion of the universe as well as the growth of dark matter perturbations, representing a powerful tool for cosmological purposes. In the current era of large scale surveys with enormous volumes of data, the statistical quantities from the objects surveyed (galaxies, clusters, supernovae, quasars, etc) can be used to extract cosmological information. The main goal of this thesis is to explore the potential use of galaxy clusters for constraining cosmology. To that end, we study the halo formation theory, the detection of halos and clusters, the statistical tools required to extract cosmological information from detected clusters and finally the effects of optical detection. In the composition of the theoretical prediction for the halo number counts, we analyze how each cosmological parameter of interest affects the halo abundance, the importance of the use of the halo covariance, and the effectiveness of halos for cosmological constraints. The redshift range and the use of prior knowledge of parameters are also investigated in detail. The theoretical prediction is tested on a dark matter simulation, where the cosmology is known and a dark matter halo catalog is available. In the analysis of the simulation we find that it is possible to obtain good constraints for some parameters such as (Ω_m, w, σ_8, n_s), while other parameters (h, Ω_b) require external priors from different cosmological probes. In the statistical methods, we discuss the concepts of likelihood, priors and the posterior distribution. The Fisher Matrix formalism, as well as its application to galaxy clusters, is presented and used for making forecasts of ongoing and future surveys. For the real analysis of data we introduce Monte Carlo Markov Chain (MCMC) methods, which do not assume Gaussianity of the parameter distribution, but have a much higher computational cost relative to the Fisher Matrix. The observational effects are studied in detail. Using the Fisher Matrix approach, we carefully explore the effects of completeness and purity. We find in which cases it is worth including extra parameters in order to lower the mass threshold. An interesting finding is the fact that including completeness and purity parameters along with cosmological parameters does not degrade dark energy constraints if other observational effects are already being considered. The use of priors on nuisance parameters does not seem to affect the dark energy constraints, unless these priors are better than 1%. The WaZp cluster finder was run on a cosmological simulation, producing a cluster catalog.
By comparing the detected galaxy clusters to the dark matter halos, we investigated and measured the observational effects. Using these measurements, we were able to include corrections in the prediction of cluster counts, resulting in good agreement with the detected cluster abundance. The results and tools developed in this thesis can provide a framework for the analysis of galaxy clusters for cosmological purposes. Several codes were created and tested during this work, among them an efficient code to compute theoretical predictions of halo abundance and covariance, a code to estimate the abundance and covariance of galaxy clusters including multiple observational effects, and a pipeline to match and compare halo/cluster catalogs. This pipeline has been integrated into the Science Portal of the Laboratório Interinstitucional de e-Astronomia (LIneA), is being used to automatically assess the quality of cluster catalogs produced by the Dark Energy Survey (DES) collaboration, and will be used in other future surveys.
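The Fisher-matrix forecasting machinery mentioned above can be illustrated in a few lines of Python. The sketch below uses a deliberately toy expected-counts model in place of a real halo mass function, so only the mechanics are meaningful: a Poisson Fisher matrix built from numerical derivatives of binned counts, inverted to forecast uncertainties on (Ω_m, σ_8).

```python
import numpy as np

z_bins = np.linspace(0.1, 0.9, 9)          # redshift bin centres

def expected_counts(omega_m, sigma_8):
    """Toy stand-in for the expected cluster counts per redshift bin."""
    return 5000.0 * sigma_8**8 * (omega_m / 0.3)**2 * np.exp(-2.0 * z_bins)

def fisher(theta, eps=1e-4):
    """Poisson Fisher matrix F_ij = sum over bins of (dN/dtheta_i)(dN/dtheta_j)/N."""
    theta = np.asarray(theta, dtype=float)
    counts = expected_counts(*theta)
    grads = []
    for k in range(len(theta)):
        step = np.zeros_like(theta)
        step[k] = eps
        grads.append((expected_counts(*(theta + step))
                      - expected_counts(*(theta - step))) / (2 * eps))
    G = np.array(grads)                    # shape (n_params, n_bins)
    return G @ np.diag(1.0 / counts) @ G.T

cov = np.linalg.inv(fisher([0.3, 0.8]))    # forecast parameter covariance
print("forecast sigma(Omega_m), sigma(sigma_8):", np.sqrt(np.diag(cov)))
```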
|
356 |
Dados de sobrevivência multivariados na presença de covariáveis e observações censuradas: uma abordagem bayesiana. Santos, Carlos Aparecido dos. 04 March 2010.
In this work, we introduce a Bayesian analysis for multivariate survival data in the presence of a covariate vector and censored observations. Different frailties or latent variables are considered to capture the correlation among the survival times of the same individual. We also present a Bayesian analysis for some of the most popular bivariate exponential distributions introduced in the literature. A Bayesian analysis is also introduced for the Block & Basu bivariate exponential distribution using Markov Chain Monte Carlo (MCMC) methods, considering lifetimes in the presence of covariates and censored data. In another topic, we introduce a Bayesian analysis for bivariate lifetime data in the presence of covariates and censored observations, assuming different bivariate Weibull distributions derived from some existing copula functions. A great computational simplification for simulating samples from the joint posterior distribution is obtained using the WinBUGS software. Numerical illustrations are presented considering real data sets for each proposed methodology. / Nesta tese introduzimos uma análise Bayesiana para dados de sobrevivência multivariados, na presença de um vetor de covariáveis e observações censuradas. Diferentes fragilidades ou variáveis latentes são consideradas para capturar a correlação existente entre os tempos de sobrevivência, para o mesmo indivíduo. Também apresentamos uma análise Bayesiana para algumas das mais populares distribuições exponenciais bivariadas introduzidas na literatura. Uma análise Bayesiana também é introduzida para a distribuição exponencial bivariada de Block & Basu, usando métodos MCMC (Monte Carlo em Cadeias de Markov) e considerando os tempos de sobrevivência na presença de covariáveis e dados censurados. Em outro tópico, introduzimos uma análise Bayesiana para dados de sobrevivência bivariados na presença de covariáveis e observações censuradas, assumindo diferentes distribuições bivariadas Weibull derivadas de algumas funções cópulas existentes. Uma grande simplificação computacional para simular amostras da distribuição a posteriori conjunta de interesse é obtida usando o software WinBUGS. Ilustrações numéricas são introduzidas considerando conjuntos de dados reais, para cada uma das metodologias propostas.
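As a small illustration of the data structure analysed above (not the WinBUGS model itself), the following sketch simulates bivariate lifetimes that share a gamma frailty, with one binary covariate and independent right censoring; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta, base_rate = 200, 0.7, 0.1
x = rng.binomial(1, 0.5, n)                 # one binary covariate per individual
w = rng.gamma(2.0, 0.5, n)                  # gamma frailty (mean 1), shared within a pair
rate = base_rate * w * np.exp(beta * x)     # exponential hazard for both lifetimes
t1 = rng.exponential(1.0 / rate)            # the two lifetimes share the same frailty,
t2 = rng.exponential(1.0 / rate)            # which induces their correlation
cens = rng.exponential(20.0, n)             # independent right-censoring times
y1, d1 = np.minimum(t1, cens), (t1 <= cens).astype(int)
y2, d2 = np.minimum(t2, cens), (t2 <= cens).astype(int)
print("correlation of the latent lifetimes:", np.corrcoef(t1, t2)[0, 1])
print("censoring fractions:", 1 - d1.mean(), 1 - d2.mean())
```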
|
357 |
Família Weibull de razão de chances na presença de covariáveis. Gomes, André Yoshizumi. 18 March 2009.
Universidade Federal de Minas Gerais / The Weibull distribution is a common initial choice for modeling data with monotone hazard rates. However, such a distribution fails to provide a reasonable parametric fit when the hazard function is unimodal or bathtub-shaped. In this context, Cooray (2006) proposed a generalization of the Weibull family by considering the distribution of the odds of the Weibull and inverse Weibull families, referred to as the odd Weibull family, which is not only useful for modeling unimodal and bathtub-shaped hazards, but is also convenient for testing goodness-of-fit of the Weibull and inverse Weibull as submodels. In this project we have systematically studied the odd Weibull family along with its properties, showing motivations for its utilization, inserting covariates in the model, pointing out some troubles associated with maximum likelihood estimation, and proposing interval estimation and hypothesis test construction methodologies for the model parameters. We have also compared resampling results with asymptotic ones. Both the coverage probability of the proposed confidence intervals and the size and power of the considered hypothesis tests were analyzed via Monte Carlo simulation. Furthermore, we have proposed a Bayesian estimation methodology for the model parameters based on Markov chain Monte Carlo (MCMC) simulation techniques. / A distribuição Weibull é uma escolha inicial freqüente para modelagem de dados com taxas de risco monótonas. Entretanto, esta distribuição não fornece um ajuste paramétrico razoável quando as funções de risco assumem um formato unimodal ou em forma de banheira. Neste contexto, Cooray (2006) propôs uma generalização da família Weibull considerando a distribuição da razão de chances das famílias Weibull e Weibull inversa, referida como família Weibull de razão de chances. Esta família não é apenas conveniente para modelar taxas de risco unimodal e banheira, mas também é adequada para testar a adequabilidade do ajuste das famílias Weibull e Weibull inversa como submodelos. Neste trabalho, estudamos sistematicamente a família Weibull de razão de chances e suas propriedades, apontando as motivações para o seu uso, inserindo covariáveis no modelo, verificando as dificuldades referentes ao problema da estimação de máxima verossimilhança dos parâmetros do modelo e propondo metodologia de estimação intervalar e construção de testes de hipóteses para os parâmetros do modelo. Comparamos os resultados obtidos por meio dos métodos de reamostragem com os resultados obtidos via teoria assintótica. Tanto a probabilidade de cobertura dos intervalos de confiança propostos quanto o tamanho e poder dos testes de hipóteses considerados foram estudados via simulação de Monte Carlo. Além disso, propusemos uma metodologia Bayesiana de estimação para os parâmetros do modelo baseados em técnicas de simulação de Monte Carlo via Cadeias de Markov.
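The Monte Carlo study of coverage probabilities can be sketched generically. The code below uses the ordinary two-parameter Weibull instead of the odd Weibull family (whose likelihood would have to be coded by hand) and checks how often a bootstrap percentile interval for the shape parameter covers the true value; sample sizes and replication counts are kept deliberately small so the sketch runs quickly.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(5)
true_shape, true_scale = 1.5, 2.0
n, n_rep, n_boot = 40, 100, 99
covered = 0
for _ in range(n_rep):
    sample = weibull_min.rvs(true_shape, scale=true_scale, size=n, random_state=rng)
    boot_shapes = []
    for _ in range(n_boot):
        resample = rng.choice(sample, size=n, replace=True)
        c, _, _ = weibull_min.fit(resample, floc=0)      # MLE with location fixed at 0
        boot_shapes.append(c)
    lo, hi = np.percentile(boot_shapes, [2.5, 97.5])     # percentile bootstrap interval
    covered += (lo <= true_shape <= hi)
print("estimated coverage of the nominal 95% interval:", covered / n_rep)
```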
|
359 |
Optimization and Bayesian Modeling of Road Distance for Inventory of Potholes in Gävle Municipality / Optimering och bayesiansk modellering av bilvägsavstånd för inventering av potthål i Gävle kommun. Lindblom, Timothy Rafael; Tollin, Oskar. January 2022.
Time management and distance evaluation have long been difficult tasks for workers and companies. This thesis studies 6712 pothole coordinates in Gävle municipality and evaluates the minimal total road distance needed to visit each pothole once and return to the initial pothole. Road distance is approximated using the flight distance and a simple random sample of 113 road distances from Google Maps. Thereafter, the data from the sample, along with a Bayesian approach, are used to find a distribution of the ratio between road distance and flight distance. Lastly, a solution for the shortest route is devised using the Nearest Neighbor algorithm (NNA) and Simulated Annealing (SA). Computational work is performed with Markov Chain Monte Carlo (MCMC). The results provide a minimal road distance of 717 km. / Tidshantering och distansutvärdering är som regel en svår uppgift för arbetare och företag. Den här uppsatsen studerar 6712 potthål i Gävle kommun, och utvärderar den bilväg som på kortast sträcka besöker varje potthål och återgår till den ursprungliga startpunkten. Bilvägsavståndet mellan potthålen uppskattas med hjälp av flygavståndet, där ett obundet slumpmässigt urval av 113 bilvägsavstånd mellan potthålens koordinatpunkter dras. Bilvägsdistanser hittas med hjälp av Google Maps. Därefter används data från urvalet tillsammans med en bayesiansk modell för att hitta en fördelning för förhållandet mellan bilvägsavstånd och flygavstånd. Slutligen framförs en lösning på det kortaste bilvägsavståndet med hjälp av en Nearest Neighbour algoritm (NNA) samt Simulated Annealing (SA). Statistiskt beräkningsarbete utförs med Markov Chain Monte Carlo (MCMC). Resultaten ger en kortaste bilvägssträcka på 717 km.
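A minimal version of the route-optimization step reads as follows: a nearest-neighbour tour is built first and then improved by simulated annealing with 2-opt moves, with road distances approximated as haversine (flight) distances scaled by an assumed road-to-flight ratio. The coordinates, the ratio value of 1.3 and the annealing schedule are placeholders, not the thesis data or its posterior estimate.

```python
import math, random

ROAD_FLIGHT_RATIO = 1.3          # assumed mean of the road/flight-distance ratio

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def road_km(a, b):
    return ROAD_FLIGHT_RATIO * haversine_km(a, b)

def tour_length(tour, pts):
    return sum(road_km(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def nearest_neighbour(pts):
    unvisited, tour = set(range(1, len(pts))), [0]
    while unvisited:
        nxt = min(unvisited, key=lambda j: road_km(pts[tour[-1]], pts[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def simulated_annealing(tour, pts, t0=10.0, cooling=0.999, steps=20000):
    cur, cur_len = tour[:], tour_length(tour, pts)
    best, best_len = cur[:], cur_len
    t = t0
    for _ in range(steps):
        i, j = sorted(random.sample(range(1, len(tour)), 2))
        cand = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]        # 2-opt reversal
        cand_len = tour_length(cand, pts)
        if cand_len < cur_len or random.random() < math.exp((cur_len - cand_len) / t):
            cur, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = cur[:], cur_len
        t *= cooling
    return best, best_len

random.seed(0)
pts = [(60.67 + random.uniform(-0.05, 0.05), 17.14 + random.uniform(-0.1, 0.1))
       for _ in range(60)]                      # fake pothole coordinates near Gävle
nn_tour = nearest_neighbour(pts)
best, length = simulated_annealing(nn_tour, pts)
print(f"NN length: {tour_length(nn_tour, pts):.1f} km, after SA: {length:.1f} km")
```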
|
360 |
Caractérisation multibande de galaxies par hiérarchie de modèles et arbres de composantes connexes / Multiband characterization of galaxies using a hierarchy of models and connected component trees. Perret, Benjamin. 17 November 2010.
This thesis proposes a method for the multiband morphological characterization of galaxies. Galaxies began to evolve and interact very early in the history of the Universe: their shapes in the different parts of the electromagnetic spectrum are therefore an important tracer of this history. This work proposes a hierarchical organization of models, ranging from the description of the dominant structures (bulge and disk) to the finest components (spiral arms, rings, ...). It leads to a high-level semantic description of galaxies, with each model performing a multiband decomposition of the image into astrophysical components that astronomers can interpret. The proposed models are novel in that they integrate an adaptive filter applied to the observations, whose parameters are estimated jointly with those of the galaxy components. The estimation of the model parameters is carried out in a Bayesian framework and solved with stochastic optimization algorithms (Markov chain Monte Carlo algorithms). The speed of the algorithms is improved through adaptive scale and direction techniques, as well as a multi-temperature version of simulated annealing. In addition, developments in the theory of connected component trees enable the design of efficient and robust non-parametric multiband algorithms for the preprocessing required by the decomposition into structures. This notably led to advances in the theory of hyperconnections and of representations as hyperconnected component trees. The performance of the proposed methods was evaluated on a substantial set of about 1,500 galaxies and discussed with astronomers: the results clearly show the relevance and robustness of the method. They open the way to new classifications that take into account the multiband signature of spatially resolved galaxies.
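The multi-temperature idea mentioned above can be sketched in its replica-exchange (parallel-tempering) form, a close relative of multi-temperature simulated annealing: several Metropolis chains at different temperatures explore a deliberately bimodal toy target and occasionally swap states, which helps the cold chain move between modes. The target density, temperature ladder and proposal scale are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(6)

def log_target(x):
    # a 1-D target with two well-separated modes
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

temps = np.array([1.0, 2.0, 4.0, 8.0])       # chain at temperature 1.0 is the one we keep
x = np.zeros(len(temps))
samples = []
for it in range(20000):
    # within-temperature Metropolis updates
    for k, T in enumerate(temps):
        prop = x[k] + rng.normal(0, 1.0)
        if np.log(rng.uniform()) < (log_target(prop) - log_target(x[k])) / T:
            x[k] = prop
    # propose a state swap between a random pair of adjacent temperatures
    k = rng.integers(len(temps) - 1)
    delta = (1 / temps[k] - 1 / temps[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
    if np.log(rng.uniform()) < delta:
        x[k], x[k + 1] = x[k + 1], x[k]
    samples.append(x[0])
samples = np.array(samples[5000:])           # discard burn-in
print("fraction of cold-chain samples in each mode:",
      np.mean(samples > 0), np.mean(samples < 0))
```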
|