  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

The parametrisation of statistical models

Hills, Susan January 1989
No description available.
2

Regularized and robust regression methods for high dimensional data

Hashem, Hussein Abdulahman January 2014
Variable selection in high-dimensional data has recently attracted much research interest. Classical stepwise subset selection methods are widely used in practice, but they are difficult to implement when the number of predictors is large. In these cases, modern regularization methods have become a popular choice because they perform variable selection and parameter estimation simultaneously. However, estimation becomes more challenging when the data contain outliers or when the assumption of normality is violated, as in the case of heavy-tailed errors. In these cases, quantile regression is the most appropriate method to use. In this thesis we combine these two approaches to produce regularized quantile regression methods. Chapter 2 presents a comparative simulation study of regularized and robust regression methods when the response variable is continuous. In chapter 3, we develop a quantile regression model with a group lasso penalty for binary response data when the predictors have a grouped structure and the data contain outliers. In chapter 4, we extend this method to the case of censored response variables. Numerical examples on simulated and real data are used to evaluate the performance of the proposed methods in comparison with other existing methods.
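To make the idea of regularized quantile regression concrete, the objective can be sketched as a quantile (check) loss plus a penalty on the coefficients. The sketch below is an illustration only: it uses a plain lasso (L1) penalty rather than the group lasso penalty developed in the thesis, and the function names `check_loss` and `penalized_objective` are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile (check/pinball) loss for one residual at quantile level tau."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def penalized_objective(y, X, beta, tau, lam):
    """Mean check loss plus an L1 penalty (assumed here for illustration)."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loss = sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted)) / len(y)
    penalty = lam * sum(abs(b) for b in beta)
    return loss + penalty
```

At `tau = 0.5` the check loss is half the absolute error, which is why median regression is less sensitive to outliers than least squares.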
3

Structured Bayesian methods for splicing analysis in RNA-seq data

Huang, Yuanhua January 2018
In most eukaryotes, alternative splicing is an important regulatory mechanism of gene expression that results in a single gene coding for multiple protein isoforms, thus greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide splicing isoform quantification, and several effective and powerful methods have been developed for splicing analysis with RNA-seq data. However, quantification remains problematic for genes with low coverage or a large number of isoforms. These difficulties may in principle be ameliorated by exploiting correlations encoded in structured data sources. This thesis contributes to the development of Bayesian methods for splicing analysis by leveraging additional information in multiple datasets through structured prior distributions. First, we developed DICEseq, the first isoform quantification method tailored to time-series RNA-seq experiments. DICEseq explicitly models the correlations between experiments at different time points to aid the quantification of isoforms across experiments. Numerical experiments on both simulated and real datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian hierarchical model that resolves the difficulties of splicing analysis in single-cell RNA-seq (scRNA-seq) data by learning an informative prior distribution from sequence features. This method combines quantification and imputation for splicing analysis in a Bayesian framework, which is particularly useful for scRNA-seq data because of its extremely low coverage and high technical noise. We validated BRIE on several scRNA-seq datasets, showing that BRIE yields reproducible estimates of exon inclusion ratios in single cells. Third, we provided an effective tool that uses the Bayes factor to sensitively detect differential splicing between single cells. When applying BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing events across cell populations, for example alternative exons in DNMT3B. In summary, this thesis proposes structured Bayesian methods that integrate multiple datasets to improve splicing analysis and to study its biological functions.
4

Bayesian Regression Inference Using a Normal Mixture Model

Maldonado, Hernan 08 August 2012
In this thesis we develop a two-component mixture model to perform a Bayesian regression. We implement our model computationally using the Gibbs sampler algorithm and apply it to a dataset of differences in time measurement between two clocks. The dataset has "good" time measurements and "bad" time measurements, which are associated with the two components of our mixture model. Our theoretical work shows that latent variables are a useful tool for implementing our Bayesian normal mixture model with two components. After applying our model to the data, we found that the model reasonably assigned probabilities of occurrence to the two states of the phenomenon under study; it also identified two processes with the same slope, different intercepts and different variances. / McAnulty College and Graduate School of Liberal Arts; / Computational Mathematics / MS; / Thesis;
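The role of the latent variables can be illustrated with the step in which each observation is assigned a probability of belonging to the "good" component, given current parameter values. The sketch below is a minimal illustration, not the thesis's implementation: the function names and the parameterization (a shared slope, two intercepts, two standard deviations and a mixing weight) are assumptions chosen to match the abstract's description. In a Gibbs sampler, the latent indicator would be drawn as a Bernoulli variable with this probability.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_good(y, x, slope, intercepts, sds, weight_good):
    """Posterior probability that (x, y) came from the 'good' component.

    intercepts and sds are (good, bad) pairs; the slope is shared.
    Assumes at least one component density is nonzero at y.
    """
    good = weight_good * normal_pdf(y, intercepts[0] + slope * x, sds[0])
    bad = (1.0 - weight_good) * normal_pdf(y, intercepts[1] + slope * x, sds[1])
    return good / (good + bad)
```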
5

A Bayesian approach to predict the number of soccer goals : Modeling with Bayesian Negative Binomial regression

Bäcklund, Joakim, Nils, Johdet January 2018
This thesis focuses on a well-known topic in sports betting: predicting the number of goals in soccer games. The data set comes from the top English soccer league, the Premier League, and consists of games played in the seasons 2015/16 to 2017/18. The thesis approaches the prediction with the auxiliary support of the odds from the betting exchange Betfair. The purpose is to find a model that can create an accurate goal distribution. The methods used are Bayesian Negative Binomial regression and Bayesian Poisson regression. The results indicate that Poisson regression is the better model because of the presence of underdispersion. We argue that the methods can be used to compare the accuracy of different sportsbooks and may help create better models.
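The reason underdispersion works against the Negative Binomial model is that the Negative Binomial can only represent a variance greater than or equal to the mean, whereas the Poisson sets them equal. A quick diagnostic is the ratio of sample variance to sample mean. The sketch below uses a small made-up list of goal counts, not data from the thesis, and the function name `dispersion_index` is hypothetical.

```python
def dispersion_index(counts):
    """Sample variance divided by sample mean (equals 1 under a Poisson model)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical goal counts for eight games (illustration only).
goals = [0, 1, 1, 2, 1, 2, 1, 0]
ratio = dispersion_index(goals)
print("underdispersed" if ratio < 1 else "equi- or overdispersed")
```

A ratio well below 1 suggests underdispersion; in that case the extra dispersion parameter of the Negative Binomial offers no benefit over the Poisson.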
6

Bayesian Additive Regression Trees: Sensitivity Analysis and Multiobjective Optimization

Horiguchi, Akira January 2020
No description available.
7

On labour market discrimination against Roma in South East Europe

Milcher, Susanne, Fischer, Manfred M. 10 1900
This paper examines the country-specific labour market discrimination that Roma may suffer in South East Europe. The study lies in the tradition of statistical Blinder-Oaxaca decomposition analysis. We use microdata from UNDP's 2004 survey of Roma minorities and apply a Bayesian approach, proposed by Keith and LeSage (2004), to the decomposition analysis of wage differentials. This approach is based on a robust Bayesian heteroscedastic linear regression model in conjunction with Markov Chain Monte Carlo (MCMC) estimation. The results indicate the presence of labour market discrimination in Albania and Kosovo, but point to its absence in Bulgaria, Croatia, and Serbia. (authors' abstract)
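The Blinder-Oaxaca decomposition splits a mean wage gap between two groups into a part explained by differences in characteristics and an unexplained part attributed to differences in coefficients. The sketch below shows the classical two-fold identity with point estimates, taking group A's coefficients as the reference (an assumption made here); it does not implement the Bayesian heteroscedastic estimation used in the paper, and all names and numbers are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def oaxaca_twofold(mean_x_a, mean_x_b, beta_a, beta_b):
    """Two-fold decomposition of the mean outcome gap (group A minus group B).

    explained   = (mean_x_a - mean_x_b) . beta_a
    unexplained = mean_x_b . (beta_a - beta_b)
    """
    diff_x = [a - b for a, b in zip(mean_x_a, mean_x_b)]
    diff_beta = [a - b for a, b in zip(beta_a, beta_b)]
    return dot(diff_x, beta_a), dot(mean_x_b, diff_beta)


# Hypothetical inputs: an intercept column plus mean years of schooling.
explained, unexplained = oaxaca_twofold([1.0, 12.0], [1.0, 10.0],
                                        [1.0, 0.10], [0.8, 0.08])
```

The two parts always sum to the gap in predicted mean outcomes; the unexplained part is the one usually read as evidence of discrimination.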
8

Feed efficiency traits in Santa Inês sheep under genomic approaches / Eficiência alimentar em ovinos da raça Santa Inês sob abordagem genômica

Alvarenga, Amanda Botelho 28 September 2017
Selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals by increasing the accuracy of prediction and reducing the generation interval, especially for traits that are difficult to measure, such as feed efficiency. Feed efficiency is the most important trait in animal production because of its impact on production costs and on the environment. Several metrics measure feed efficiency, such as the ratio of gain to feed (FER), the ratio of feed to gain (FCR) and residual feed intake (RFI). Nevertheless, no study in sheep aiming to understand the genetic variants or the accuracy of genomic estimated breeding values (GEBV) for feed efficiency traits has yet been published. Moreover, before applying genomic information, it is necessary to understand and characterize the population structure, for instance through linkage disequilibrium (LD). Both genome-wide association studies (GWAS) and genomic selection (GS) leverage the LD between a marker and the causal mutation. Based on these considerations, the aim of this study was to map LD in sheep, represented by the Brazilian Santa Inês breed; to search for genetic variants for feed efficiency traits (FER, FCR and RFI) through GWAS; and to verify the accuracy of GEBV for RFI. In total, 396 samples (animals) of Longissimus dorsi muscle were collected. A high-density SNP panel (Illumina High-Density Ovine SNP BeadChip®) comprising 54,241 SNPs was used to obtain the genotype data. The phenotype data comprised 387 animals. The average LD between adjacent markers for two LD metrics, r² and |D'|, was 0.166 and 0.617, respectively. The estimated degree of LD was lower than that reported in other species and was characterized by short haplotype blocks. Consequently, high-density marker panels are recommended for genomic analyses. Many markers were associated with feed efficiency traits in the GWAS, mainly with RFI. A few candidate genes were reported in this study, notably NRF-1 (nuclear respiratory factor 1), which controls mitochondrial biosynthesis, the most important process, responsible for a large fraction of the energy produced. Finally, we verified the accuracy of GEBV for RFI using a few Bayesian regression models and found low accuracy, ranging from 0.033 (BayesB with π=0.9912) to 0.036 (BayesA), which might be explained by the low relatedness among animals and the small training population.
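For readers unfamiliar with the two LD metrics reported above, both are derived from the same disequilibrium coefficient D = p_AB - p_A * p_B, normalized in different ways. The sketch below computes r² and |D'| for a pair of biallelic loci from haplotype and allele frequencies. It is a minimal illustration with a hypothetical function name; it assumes both loci are polymorphic and that phased haplotype frequencies are available, which the thesis's actual pipeline may handle differently.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (r2, abs_d_prime) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly in (0, 1))
    """
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max
```

Because |D'| is scaled by the maximum attainable D, it is typically larger than r² for the same pair of markers, which is consistent with the averages of 0.617 and 0.166 reported in the abstract.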
10

Régression bayésienne sous contraintes de régularité et de forme. / Bayesian regression under shape and smoothness restriction.

Khadraoui, Khader 08 December 2011
We study Bayesian regression under smoothness and shape constraints. To this end, we use a B-spline basis to obtain a smooth curve, and we show that the shape of a spline generated by a B-spline basis is controlled by a set of control points that do not lie on the spline curve. We propose several types of shape constraint (monotonicity, unimodality, convexity, etc.). These constraints are incorporated through the prior distribution. Bayesian inference yields the posterior distribution in explicit form up to a constant. Using a hybrid Metropolis-Hastings algorithm with a Gibbs step, we propose simulations from the truncated posterior distribution. We estimate the regression function by the posterior mode, which is computed by a simulated annealing algorithm. The convergence of the simulation algorithms and of the computation of the estimator is proved. In particular, when the B-spline knots are variable, the Bayesian analysis of constrained regression becomes complex. We propose original simulation schemes for sampling from the posterior distribution when the truncated density of the regression coefficients has variable dimension. / We investigate Bayesian regression under shape and smoothness constraints. We first elicit a Bayesian method for regression under shape restrictions and smoothness conditions. The regression function is built from a B-spline basis that controls its regularity. We then show that its shape can be controlled simply through its coefficients in the B-spline basis. This is achieved through the control polygon, whose definition and some properties are given in this article. The regression function is estimated by the posterior mode. This mode is calculated by a simulated annealing algorithm that takes the shape constraints into account in the proposal distribution. A credible interval is obtained from simulations using a Metropolis-Hastings algorithm with the same proposal distribution as the simulated annealing algorithm. The convergence of the algorithms for simulation and for computing the estimator is proved. In particular, in the case of Bayesian regression under constraints and with free knots, the Bayesian analysis becomes complex. We propose original simulation schemes that allow simulation from the truncated posterior distribution with free dimension.
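The claim that shape can be controlled through the B-spline coefficients can be illustrated for monotonicity: if the coefficients (control points) are nondecreasing, the spline is nondecreasing. The sketch below checks this numerically with a plain Cox-de Boor recursion. The knot vector, degree, coefficients and function names are illustrative assumptions, and the condition shown is sufficient rather than necessary; it is not the thesis's sampling scheme.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] > knots[i]:
        left = ((t - knots[i]) / (knots[i + k] - knots[i])
                * bspline_basis(i, k - 1, t, knots))
    if knots[i + k + 1] > knots[i + 1]:
        right = ((knots[i + k + 1] - t) / (knots[i + k + 1] - knots[i + 1])
                 * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def spline(t, coeffs, knots, degree):
    """Evaluate the spline with the given coefficients at t (half-open domain)."""
    return sum(c * bspline_basis(i, degree, t, knots)
               for i, c in enumerate(coeffs))


# Clamped cubic knots on [0, 1) and nondecreasing coefficients (hypothetical).
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
coeffs = [0.0, 0.1, 0.5, 0.6, 1.0]
values = [spline(j / 20, coeffs, knots, 3) for j in range(20)]
is_monotone = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
```

In a constrained Bayesian fit, a restriction of this kind would be imposed on the coefficients through the prior, so that every posterior draw respects the shape constraint.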
