1

Statistical methods for the testing and estimation of linear dependence structures on paired high-dimensional data : application to genomic data

Mestres, Adrià Caballé January 2018
This thesis provides novel methodology for statistical analysis of paired high-dimensional genomic data, with the aim to identify gene interactions specific to each group of samples as well as the gene connections that change between the two classes of observations. An example of such groups can be patients under two medical conditions, in which the estimation of gene interaction networks is relevant to biologists as part of discerning gene regulatory mechanisms that control a disease process like, for instance, cancer. We construct these interaction networks from data by considering the non-zero structure of correlation matrices, which measure linear dependence between random variables, and their inverse matrices, which are commonly known as precision matrices and determine linear conditional dependence instead. In this regard, we study three statistical problems related to the testing, single estimation and joint estimation of (conditional) dependence structures. Firstly, we develop hypothesis testing methods to assess the equality of two correlation matrices, and also two correlation sub-matrices, corresponding to two classes of samples, and hence the equality of the underlying gene interaction networks. We consider statistics based on the average of squares, maximum and sum of exceedances of sample correlations, which are suitable for both independent and paired observations. We derive the limiting distributions for the test statistics where possible and, for practical needs, we present a permuted samples based approach to find their corresponding non-parametric distributions. Cases where such hypothesis testing presents enough evidence against the null hypothesis of equality of two correlation matrices give rise to the problem of estimating two correlation (or precision) matrices.
However, before that we address the statistical problem of estimating conditional dependence between random variables in a single class of samples when data are high-dimensional, which is the second topic of the thesis. We study the graphical lasso method which employs an L1 penalized likelihood expression to estimate the precision matrix and its underlying non-zero graph structure. The lasso penalization term is given by the L1 norm of the precision matrix elements scaled by a regularization parameter, which determines the trade-off between sparsity of the graph and fit to the data, and its selection is our main focus of investigation. We propose several procedures to select the regularization parameter in the graphical lasso optimization problem that rely on network characteristics such as clustering or connectivity of the graph. Thirdly, we address the more general problem of estimating two precision matrices that are expected to be similar, when datasets are dependent, focusing on the particular case of paired observations. We propose a new method to estimate these precision matrices simultaneously, a weighted fused graphical lasso estimator. The analogous joint estimation method concerning two regression coefficient matrices, which we call weighted fused regression lasso, is also developed in this thesis under the same paired and high-dimensional setting. The two joint estimators maximize penalized marginal log likelihood functions, which encourage both sparsity and similarity in the estimated matrices, and that are solved using an alternating direction method of multipliers (ADMM) algorithm. Sparsity and similarity of the matrices are determined by two tuning parameters and we propose to choose them by controlling the corresponding average error rates related to the expected number of false positive edges in the estimated conditional dependence networks.
These testing and estimation methods are implemented within the R package ldstatsHD, and are applied to a comprehensive range of simulated data sets as well as to high-dimensional real case studies of genomic data. We employ testing approaches with the purpose of discovering pathway lists of genes that present significantly different correlation matrices on healthy and unhealthy (e.g., tumor) samples. Besides, we use hypothesis testing problems on correlation sub-matrices to reduce the number of genes for estimation. The proposed joint estimation methods are then considered to find gene interactions that are common between medical conditions as well as interactions that vary in the presence of unhealthy tissues.
2

An Improved Classifier Chain Ensemble for Multi-Dimensional Classification with Conditional Dependence

Heydorn, Joseph Ethan 01 July 2015
We focus on multi-dimensional classification (MDC) problems with conditional dependence, which we call multiple output dependence (MOD) problems. MDC is the task of predicting a vector of categorical outputs for each input. Conditional dependence in MDC means that the choice for one output value affects the choice for others, so it is not desirable to predict outputs independently. We show that conditional dependence in MDC implies that a single input can map to multiple correct output vectors. This means it is desirable to find multiple correct output vectors per input. Current solutions for MOD problems are not sufficient because they predict only one of the correct output vectors per input, ignoring all others. We modify four existing MDC solutions, including chain classifiers, to predict multiple output vectors. We further create a novel ensemble technique named weighted output vector ensemble (WOVE) which combines these multiple predictions from multiple chain classifiers in a way that preserves the integrity of output vectors and thus preserves conditional dependence among outputs. We verify the effectiveness of WOVE by comparing it against 7 other solutions on a variety of data sets and find that it shows significant gains over existing methods.
3

Applications of modern regression techniques in empirical economics

März, Alexander 14 July 2016
No description available.
4

Computing strategies for complex Bayesian models / Stratégies computationnelles pour des modèles Bayésiens complexes

Banterle, Marco 21 July 2016
This thesis presents contributions to the Monte Carlo literature aimed at the analysis of complex models in Bayesian statistics; the focus is on both the complexity of the models and computational difficulties. We first expand Delayed Acceptance, a computationally efficient variant of Metropolis-Hastings, to a multi-step procedure and enlarge its theoretical background, providing proper justification for the method, asymptotic variance bounds relative to its parent Metropolis-Hastings kernel and optimal tuning for the scale of its proposal. We then develop a flexible Bayesian method to analyse non-stationary environmental processes, called Dimension Expansion, which essentially considers the observed process as a projection from a higher dimension, where the assumption of stationarity could hold. The last chapter is dedicated to the investigation of conditional (in)dependence structures via a fully Bayesian formulation of the Gaussian Copula graphical model.
5

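Avaliação de testes diagnósticos na ausência de padrão ouro considerando relaxamento da suposição de independência condicional, covariáveis e estratificação da população: uma abordagem Bayesiana / Evaluation of diagnostic tests in the absence of a gold standard, considering relaxation of the conditional independence assumption, covariates and population stratification: a Bayesian approach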

Pereira, Gilberto de Araujo 16 December 2011
Financiadora de Estudos e Projetos / Applying a gold standard reference test to all or part of the sample under investigation is often not feasible for the majority of diseases affecting humans, whether because there is no consensus on which test should be considered the gold standard, because the gold standard technique is highly invasive, because large-scale application is costly, or for ethical reasons. Knowing the performance of the existing tests is therefore essential to the diagnosis of these diseases. In statistical modeling aimed at obtaining robust estimates of the disease prevalence (x) and the performance parameters of diagnostic tests (sensitivity (Se) and specificity (Sp)), various strategies have been considered, such as stratification of the population, relaxation of the conditional independence assumption, inclusion of covariates, the type of verification (partial or total) and techniques to replace the gold standard. In this thesis we propose a new stratification structure for the population in which both the prevalence rates and the test performance parameters differ among the strata (EHW). A Bayesian latent class model to estimate these parameters was developed for the general case of K diagnostic tests under investigation, with relaxation of the conditional independence assumption according to fixed effect (FECD) and random effect (RECD) formulations with dependence of order (h _ k) and M covariates. The application of the models to two data sets on the performance of diagnostic tests used in screening blood donors for Chagas disease showed results consistent with the sensitivity studies.
Overall, the proposed stratification structure (EHW) showed superior performance and estimates closer to the nominal values than the stratification structure in which only the prevalence rates differ between the strata (HW), even for data sets in which the rates of Se, Sp and x are close among the strata. In general, under the latent class structure, when the disease prevalence is low or high, the estimates of sensitivity and specificity, respectively, have higher standard errors. However, in these cases, when there is high concordance of positive or negative test results, the standard errors of these estimates are reduced. Regardless of the stratification structure (EHW, HW), the sample size and the different scenarios used to model the prior information, the conditional dependence models FECD and RECD performed better, according to the information criteria (AIC, BIC and DIC), than the conditional independence (CI) structure, with FECD showing the best performance and estimates closest to the nominal values. In addition to the logit link, derived from the symmetric logistic distribution, we find the GEV link, derived from the generalized extreme value distribution, which accommodates symmetric and asymmetric shapes, to be an interesting alternative for constructing the conditional dependence structure under RECD. To address the identifiability problem present in this type of model, the criteria adopted to elicit informative priors, which combine descriptive analysis of the data with fits of models with simpler structures, produced estimates with low standard errors and very close to the nominal values.
