501

Contributions à la statistique bayésienne non-paramétrique / Contributions to Bayesian nonparametric statistics

Arbel, Julyan 24 September 2013 (has links)
This thesis is divided into two parts, covering two rather different aspects of Bayesian nonparametric statistics. In the first part, we deal with frequentist (asymptotic) properties of posterior distributions for parameters belonging to the space of square-summable real sequences. In the second part, we deal with nonparametric approaches for modelling species data and their diversity with respect to covariates, using models based on random probability measures.
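As a hedged illustration of the random probability measures underlying such species models, the sketch below draws one (truncated) Dirichlet process realization via stick-breaking; the truncation level, concentration parameter, and base measure are illustrative choices, not taken from the thesis.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, truncation=100, rng=None):
    """Draw one truncated Dirichlet process realization via stick-breaking.

    Returns atoms and weights of a discrete random probability measure
    G = sum_k w_k * delta_{theta_k}.
    """
    rng = np.random.default_rng(rng)
    betas = rng.beta(1.0, alpha, size=truncation)        # stick proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining                          # w_k = beta_k * prod_{j<k}(1 - beta_j)
    atoms = base_sampler(truncation, rng)                # iid draws from the base measure
    return atoms, weights

# Example: base measure N(0, 1), concentration alpha = 2 (illustrative values).
atoms, weights = stick_breaking_dp(2.0, lambda n, rng: rng.normal(size=n), rng=42)
print(atoms[:5], weights[:5], weights.sum())  # weights sum to ~1 under truncation
```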
502

[en] COMPARISON OF DIFFERENT APPROACHES FOR DETECTION AND TREATMENT OF OUTLIERS IN METER FACTORS DETERMINATION / [pt] COMPARAÇÃO DE DIFERENTES TÉCNICAS PARA DETECÇÃO E TRATAMENTO DE OUTLIERS NA DETERMINAÇÃO DE FATORES DE MEDIDORES

Anderson Luiz dos Santos Ferreira 20 February 2018 (has links)
The objective of this dissertation is to analyze the behavior of different methodologies used for the detection and treatment of outliers in the determination of meter proving factors for turbine-type meters. The motivation of this work is to avoid mistaken decision-making resulting from inadequate treatment of outliers, which compromises measurement reliability and, consequently, billing. A meter proving factor can be considered a calibration parameter, expressing the ratio between the reference volume and the gross volume of liquid passed through the meter. The international guideline recommends Dixon's test for outliers in a meter proving factor set. However, the literature is explicit that the behavior of the data must be evaluated a priori. The methodology first evaluates whether the behavior of the meter proving factor set is Gaussian; then different parametric and nonparametric approaches for detecting and treating outliers, applied to turbine meter proving factors for custody transfer of liquefied petroleum gas, are compared. Afterwards, this effect is evaluated in relation to the number of outliers and to how this handling affects the variable range criteria for the expanded uncertainty of the average meter proving factor. The results show that a different average meter factor can be reached for each parametric and nonparametric test; nevertheless, it is concluded that no statistically significant difference between them is observed.
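A minimal sketch of the two-sided Dixon Q screening step described above, assuming small samples (3 to 10 points) and using commonly tabulated 95% critical values; the critical values, helper name, and example data are illustrative, not taken from the dissertation or the guideline it cites.

```python
# Dixon's Q test (r10 statistic) for a single suspected outlier in a small sample.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568,
             8: 0.526, 9: 0.493, 10: 0.466}   # commonly tabulated 95% values

def dixon_q_test(values, crit_table=Q_CRIT_95):
    """Return (is_outlier, suspect_value, Q) testing the most extreme point."""
    x = sorted(values)
    n = len(x)
    if n not in crit_table:
        raise ValueError("table covers n = 3..10 only")
    spread = x[-1] - x[0]
    q_low = (x[1] - x[0]) / spread      # gap below the smallest value
    q_high = (x[-1] - x[-2]) / spread   # gap above the largest value
    q, suspect = max((q_low, x[0]), (q_high, x[-1]))
    return q > crit_table[n], suspect, q

# Example meter proving factor set (illustrative numbers):
flagged, suspect, q = dixon_q_test([1.0012, 1.0009, 1.0011, 1.0010, 1.0041])
print(flagged, suspect, round(q, 3))   # True 1.0041 0.906
```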
503

Time series forecasting with applications in macroeconomics and energy

Arora, Siddharth January 2013 (has links)
The aim of this study is to develop novel forecasting methodologies. The applications of our proposed models lie in two different areas: macroeconomics and energy. Though we consider two very different applications, the common underlying theme of this thesis is to develop novel methodologies that are not only accurate but also parsimonious. For macroeconomic time series, we focus on generating forecasts for the US Gross National Product (GNP). The contribution of our study on macroeconomic forecasting lies in proposing a novel nonlinear and nonparametric method, called the weighted random analogue prediction (WRAP) method. The out-of-sample forecasting ability of WRAP is evaluated using a range of performance scores, which measure its accuracy in generating both point and density forecasts. We show that WRAP outperforms some of the most commonly used models for forecasting the GNP time series.
For energy, we focus on two applications: (1) generating accurate short-term forecasts of the total electricity demand (load) for Great Britain, and (2) modelling Irish electricity smart meter data (consumption) for both residential consumers and small and medium-sized enterprises (SMEs), using methods based on kernel density (KD) and conditional kernel density (CKD) estimation. To model load, we propose methods based on a commonly used statistical dimension reduction technique, singular value decomposition (SVD). Specifically, we propose two novel methods: discount weighted (DW) intraday and DW intraweek SVD-based exponential smoothing. We show that the proposed methods are competitive with some of the most commonly used models for load forecasting and also lead to a substantial reduction in the dimension of the model. The load time series exhibits prominent intraday, intraweek and intrayear seasonality; however, most existing studies accommodate only a 'double seasonality' when modelling short-term load, focussing on the intraday and intraweek seasonal effects. The methods considered in this study accommodate the 'triple seasonality' in load, capturing not only intraday and intraweek seasonal cycles but also intrayear seasonality.
For modelling load, we also propose a novel rule-based approach, with emphasis on special days. The load observed on special days, e.g. public holidays, is substantially lower than the load observed on normal working days. Special day effects have often been ignored during the modelling process, which leads to large forecast errors on special days, and also on normal working days that lie in the vicinity of special days. The contribution of this study lies in adapting some of the most commonly used seasonal methods to model load for both normal and special days in a coherent and unified framework, using a rule-based approach. We show that the post-sample errors across special days for the rule-based methods are less than half those of their original counterparts that ignore special day effects.
For modelling electricity smart meter data, we investigate a range of methods based on KD and CKD estimation. Over the coming decade, electricity smart meters are scheduled to replace conventional electronic meters in both the US and Europe. Future estimates of consumption can help the consumer identify and reduce excess consumption, while such estimates can help the supplier devise innovative tariff strategies. To the best of our knowledge, there are no existing studies that focus on generating density forecasts of electricity consumption from smart meter data. In this study, we evaluate the density, quantile and point forecast accuracy of different methods across one thousand consumption time series, recorded from both residential consumers and SMEs. We show that the KD and CKD methods accommodate the seasonality in consumption and correctly distinguish weekdays from weekends. For each application, our comprehensive empirical comparison of the existing and proposed methods was undertaken using multiple performance scores. The results show strong potential for the models proposed in this thesis.
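As a rough, hedged sketch of the SVD-based idea described above (not the authors' exact DW method): arrange load as a days-by-periods matrix, keep a few leading singular vectors, and exponentially smooth the daily scores to forecast the next day's profile. The matrix shapes, rank, and smoothing constant are illustrative assumptions.

```python
import numpy as np

def svd_load_forecast(load_matrix, rank=3, smoothing=0.3):
    """Forecast the next day's intraday load profile.

    load_matrix: (n_days, n_periods) array of historical load,
    e.g. 48 half-hourly periods per day.
    """
    mean_profile = load_matrix.mean(axis=0)
    anomalies = load_matrix - mean_profile
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    scores = U[:, :rank] * s[:rank]          # daily scores on leading components
    # Simple exponential smoothing of each score series gives tomorrow's scores.
    forecast_scores = np.zeros(rank)
    for k in range(rank):
        level = scores[0, k]
        for obs in scores[1:, k]:
            level = smoothing * obs + (1.0 - smoothing) * level
        forecast_scores[k] = level
    return mean_profile + forecast_scores @ Vt[:rank]

# Example with synthetic data: 100 days x 48 half-hours.
rng = np.random.default_rng(0)
demo = 30 + 5 * np.sin(np.linspace(0, 2 * np.pi, 48)) + rng.normal(0, 1, (100, 48))
print(svd_load_forecast(demo).shape)  # (48,)
```

Keeping only a few score series in place of 48 period-by-period models is the dimension reduction the abstract refers to.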
504

Analyse de données fonctionnelles en télédétection hyperspectrale : application à l'étude des paysages agri-forestiers / Functional data analysis in hyperspectral remote sensing : application to the study of agri-forest landscape

Zullo, Anthony 19 September 2016 (has links)
In hyperspectral imaging, each pixel is associated with a spectrum derived from the reflectance observed at d measurement points (i.e., wavelengths). We often face a situation where the sample size n is relatively small compared to the number d of variables. This phenomenon, called the "curse of dimensionality", is well known in multivariate statistics: the more d increases relative to n, the more the performance of standard statistical methodologies degrades. Reflectance spectra incorporate in their spectral dimension a continuum that gives them a functional nature: a hyperspectrum can be modelled as a univariate function of wavelength, whose representation produces a curve. Using functional methods on such data makes it possible to take into account functional aspects such as continuity and the order of the spectral bands, and to overcome the strong correlations arising from the fineness of the discretization grid. The main aim of this thesis is to assess the relevance of the functional approach for statistical analysis in hyperspectral remote sensing. We focus on the nonparametric functional regression model, including supervised classification.
Firstly, the functional approach was compared with multivariate methods usually employed in remote sensing. The functional approach outperforms multivariate methods in critical situations where one has a small training sample combined with relatively homogeneous (that is, hard to discriminate) classes. Secondly, an alternative to the functional approach for overcoming the curse of dimensionality was developed using a parsimonious model, which, through the selection of a small number of measurement points, reduces the dimensionality of the problem while increasing the interpretability of the results. Finally, we considered the almost systematic practical situation where the functional data are contaminated. We proved that for a fixed sample size, the finer the discretization, the better the prediction; in other words, the larger d is compared to n, the better the performance of the functional statistical method developed.
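A minimal sketch of nonparametric functional classification in the spirit described above: kernel weights computed on an L2-type distance between discretized curves, with class scores aggregated from the weights. The kernel choice, bandwidth, and synthetic data are illustrative assumptions, not the thesis's exact estimator.

```python
import numpy as np

def functional_kernel_classify(train_curves, train_labels, new_curve, bandwidth):
    """Nadaraya-Watson-type classifier for discretized curves.

    train_curves: (n, d) array, each row a spectrum sampled at d wavelengths.
    Distance is an approximate L2 norm between curves; kernel is Gaussian.
    """
    diffs = train_curves - new_curve
    dists = np.sqrt(np.mean(diffs ** 2, axis=1))        # approximate L2 distance
    weights = np.exp(-0.5 * (dists / bandwidth) ** 2)   # Gaussian kernel weights
    classes = np.unique(train_labels)
    scores = [weights[train_labels == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]

# Example with two synthetic spectral classes (illustrative data):
rng = np.random.default_rng(1)
wl = np.linspace(0, 1, 200)
class0 = np.sin(2 * np.pi * wl) + rng.normal(0, 0.2, (30, 200))
class1 = np.sin(2 * np.pi * wl + 0.5) + rng.normal(0, 0.2, (30, 200))
X = np.vstack([class0, class1])
y = np.array([0] * 30 + [1] * 30)
print(functional_kernel_classify(X, y, class1[0], bandwidth=0.1))  # -> 1
```

Because the weights depend only on distances between whole curves, the estimator never inverts a d-by-d matrix, which is how the functional approach sidesteps the n << d problem.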
505

Srovnání znalostí složek IZS u laické a odborné veřejnosti / Comparison of knowledge of the general and professional public about the IRS

VINCÍK, Miroslav January 2012 (has links)
To meet the objectives set by the thesis, a structural analysis of the Integrated Rescue System was carried out. Based on this analysis, a statistical survey using descriptive methods and mathematical statistics was conducted. The survey is based on the results of a questionnaire administered to two groups of respondents, one from the general public and one from the professional public. The respondents were chosen as a representative sample of citizens of the Strakonice district, as specified in a separate chapter. The presence of a normal distribution of knowledge in the general public was then verified; in the professional public, by contrast, the presence of a Poisson distribution was examined. The difference in the level of knowledge between the two groups of respondents was determined. To achieve the set objectives, three hypotheses were established: H1. The empirical distribution of knowledge of the general public can be replaced by the normal distribution at the level of mathematical statistics. H2. The empirical distribution of knowledge of the professional public is further from the normal distribution due to a higher level of knowledge. H3. There is a statistically significant difference between the knowledge of the two groups of respondents. All three hypotheses were tested and accepted, the positive results confirming the established hypotheses. The "Discussion" presents an analysis of the obtained results, the confirmation of hypotheses H1, H2 and H3, and proposed measures that could increase knowledge of the IRS in both groups of respondents.
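A rough sketch of the kinds of tests such a survey design suggests (a normality check for one group, a Poisson goodness-of-fit check for the other, and a distribution-free two-sample comparison); the score scale, sample sizes, and specific test choices are assumptions, not the thesis's actual procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Illustrative knowledge scores (0-30 scale assumed, not the thesis's data).
general = np.clip(rng.normal(15, 4, 200).round(), 0, 30)
professional = rng.poisson(24, 120).clip(0, 30)

# H1: can the general-public scores be modelled as normal?
print("Shapiro-Wilk (general):", stats.shapiro(general).pvalue)

# H2: chi-square goodness of fit of a Poisson model for the professionals.
lam = professional.mean()
values, observed = np.unique(professional, return_counts=True)
expected = stats.poisson.pmf(values, lam) * len(professional)
expected *= observed.sum() / expected.sum()             # renormalize over observed support
chi2, p = stats.chisquare(observed, expected, ddof=1)   # one parameter estimated
print("Poisson GoF p-value:", p)

# H3: do the two groups differ? Mann-Whitney U avoids normality assumptions.
print("Mann-Whitney:", stats.mannwhitneyu(general, professional).pvalue)
```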
506

Estimação não-paramétrica e semi-paramétrica de fronteiras de produção / Nonparametric and semiparametric estimation of production frontiers

Torrent, Hudson da Silva January 2010 (has links)
There exists a large and growing literature on the specification and estimation of production frontiers and, therefore, of the efficiency of production units. In this thesis we focus on deterministic production frontier models, which are based on the assumption that all observed data lie in the technological set. Among the existing statistical models and estimators for deterministic frontiers, a promising approach is that of Martins-Filho and Yao (2007). They propose an estimation procedure that consists of three stages. Their estimator is fairly easy to implement, as it involves standard nonparametric procedures, and it has a number of desirable characteristics vis-à-vis traditional deterministic frontier estimators such as DEA and FDH.
In this thesis we propose three papers that improve the model proposed in Martins-Filho and Yao (2007). In the first paper we improve their estimation procedure by adopting a variant of the local exponential smoothing proposed in Ziegelmann (2002). Our estimator is shown to be consistent and asymptotically normal. In addition, due to local exponential smoothing, the potential negativity of conditional variance estimates, which may hinder the use of Martins-Filho and Yao's estimator, is avoided. In the second paper we propose a novel method for estimating production frontiers in only two stages: we show that we can eliminate the second stage of Martins-Filho and Yao, as well as that of our first paper, in both of which estimation of the same frontier model requires three stages under different versions of the second stage. We study asymptotic properties, showing consistency and asymptotic normality of our proposed estimator under standard assumptions. In the third paper we propose a semiparametric variation of the frontier model studied in the second paper. We rewrite that model to allow estimating the production frontier and the efficiency of production units in a multiple-input context without suffering the curse of dimensionality. Our approach places the model within the framework of additive models, based on assumptions about the way inputs combine in production; in particular, we consider the cases of additive and multiplicative inputs, which are widely considered in economic theory and applications. Monte Carlo studies are performed in all papers to shed light on the finite-sample properties of the proposed estimators. Furthermore, a real-data study is carried out in all papers, in which we rank efficiency within a sample of US law enforcement agencies using US crime data.
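As a hedged illustration of the positivity-preserving idea attributed to Ziegelmann (2002) above: fit exp(a + b(x - x0)) to squared residuals by kernel-weighted least squares, so the fitted conditional variance is positive by construction. The optimizer, kernel, and data below are illustrative choices, not the papers' exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def local_exponential_variance(x, squared_resid, x0, bandwidth):
    """Estimate conditional variance at x0 as exp(a) from a local fit of
    exp(a + b*(x - x0)) to squared residuals -- positive by construction."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)  # Gaussian kernel weights

    def loss(params):
        a, b = params
        fitted = np.exp(a + b * (x - x0))
        return np.sum(w * (squared_resid - fitted) ** 2)

    res = minimize(loss, x0=np.array([np.log(squared_resid.mean()), 0.0]),
                   method="Nelder-Mead")
    a_hat, _ = res.x
    return np.exp(a_hat)

# Example: heteroskedastic errors whose true variance is (0.5 + x)^2.
rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 400)
eps = rng.normal(0, 0.5 + x)
print(local_exponential_variance(x, eps ** 2, x0=0.8, bandwidth=0.1))
# True value at 0.8 is (0.5 + 0.8)^2 = 1.69 (roughly recovered).
```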
508

Aspects théoriques et pratiques dans l'estimation non paramétrique de la densité conditionnelle pour des données fonctionnelles / Theoretical and practical aspects in non parametric estimation of the conditional density with functional data

Madani, Fethi 11 May 2012 (has links)
In this thesis, we consider the problem of nonparametric estimation of the conditional density when the response variable is real and the regressor takes values in a functional space. In the first part, we use the double-kernel method as the estimation method, focusing on the choice of the smoothing parameters. We construct a data-driven method for selecting bandwidths optimally and automatically. As main results, we study the asymptotic optimality of this selection method in the case where the observations are independent and identically distributed (i.i.d.). Our selection rule is based on classical cross-validation ideas and deals with both global and local choices. The performance of our approach is also illustrated by simulation results on finite samples, in which we compare the two types of bandwidth choice (local and global). In the second part, we adopt a functional version of the local linear method, in the same topological context, to estimate some functional parameters. Under some general conditions, we establish the almost-complete convergence (with rates) of the proposed estimator in both the i.i.d. and the α-mixing cases. As an application, we use the conditional density estimator to estimate the conditional mode and to derive some asymptotic properties of the constructed estimator. We then establish the quadratic error of this estimator by giving its exact asymptotic expansion (the leading bias and variance terms). Finally, the applicability of our results is verified and validated on (1) simulated data and (2) some real data.
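A minimal sketch of a double-kernel conditional density estimator of the kind discussed above, with a naive global cross-validation grid search; the semi-metric (L2 distance between discretized curves), the Gaussian kernels, and the grid search are illustrative assumptions.

```python
import numpy as np

def cond_density(X, Y, x_new, y_grid, h, b):
    """Double-kernel estimate of f(y | x): functional kernel in x, scalar kernel in y.

    X: (n, d) discretized curves; Y: (n,) responses; h, b: bandwidths.
    """
    dists = np.sqrt(np.mean((X - x_new) ** 2, axis=1))      # L2 semi-metric on curves
    wx = np.exp(-0.5 * (dists / h) ** 2)                    # kernel in the x-direction
    ky = (np.exp(-0.5 * ((y_grid[:, None] - Y[None, :]) / b) ** 2)
          / (b * np.sqrt(2 * np.pi)))                       # kernel in the y-direction
    return (ky * wx).sum(axis=1) / wx.sum()                 # shape: (len(y_grid),)

def cv_bandwidths(X, Y, h_grid, b_grid):
    """Leave-one-out likelihood cross-validation over a global bandwidth grid."""
    best, best_score = None, -np.inf
    for h in h_grid:
        for b in b_grid:
            score = 0.0
            for i in range(len(Y)):
                keep = np.arange(len(Y)) != i
                f_i = cond_density(X[keep], Y[keep], X[i], np.array([Y[i]]), h, b)
                score += np.log(f_i[0] + 1e-300)            # guard against log(0)
            if score > best_score:
                best, best_score = (h, b), score
    return best
```

A local choice, by contrast, would repeat the selection separately for each evaluation point x_new instead of once for the whole sample.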
509

Efficient Bayesian methods for mixture models with genetic applications

Zuanetti, Daiane Aparecida 14 December 2016 (has links)
We propose Bayesian methods for selecting and estimating different types of mixture models, which are widely used in genetics and molecular biology. Specifically, we propose data-driven selection and estimation methods for a generalized mixture model, which accommodates the usual (independent) and first-order (dependent) models in one framework, and for QTL (quantitative trait locus) mapping models for independent and pedigree data. For clustering genes through a mixture model, we propose three nonparametric Bayesian methods: a marginal nested Dirichlet process (NDP), which is able to cluster distributions, and a predictive recursion clustering scheme (PRC) and a subset nonparametric Bayesian (SNOB) clustering algorithm for clustering big data. We analyze and compare the performance of the proposed methods and of traditional procedures of selection, estimation and clustering in simulated and real data sets. The proposed methods are more flexible, improve the convergence of the algorithms and provide more accurate estimates in many situations. In addition, we propose methods for predicting nonobservable QTL genotypes and missing parents, and we improve the Mendelian probability of inheritance of nonfounder genotypes using conditional independence structures. We also suggest applying diagnostic measures to check the goodness of fit of QTL mapping models.
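As a hedged sketch of the predictive recursion idea behind the PRC scheme named above (Newton-style recursive estimation of a mixing density on a grid); the normal kernel, weight sequence, and grid are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np
from scipy import stats

def predictive_recursion(y, theta_grid, prior=None, c=1.0):
    """Newton-style predictive recursion for a normal location mixture.

    Recursively updates a mixing density f on theta_grid:
    f_i = (1 - w_i) f_{i-1} + w_i * k(y_i | theta) f_{i-1} / m_{i-1}(y_i),
    with weights w_i = (i + 1)^(-0.67), a common illustrative choice.
    """
    f = np.ones_like(theta_grid) if prior is None else prior.copy()
    f /= np.trapz(f, theta_grid)                         # normalize on the grid
    for i, yi in enumerate(y, start=1):
        k = stats.norm.pdf(yi, loc=theta_grid, scale=c)  # kernel k(y_i | theta)
        m = np.trapz(k * f, theta_grid)                  # marginal m_{i-1}(y_i)
        w = (i + 1.0) ** -0.67
        f = (1.0 - w) * f + w * k * f / m
    return f

# Example: data from a two-component mixture (illustrative).
rng = np.random.default_rng(4)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 150)])
grid = np.linspace(-6, 6, 200)
f_hat = predictive_recursion(rng.permutation(y), grid)
print(grid[np.argmax(f_hat)])   # should sit near one of the mixing locations
```

A single pass over the data, with no MCMC, is what makes recursion-based schemes attractive for large gene-expression data sets.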
510

Obtenção dos níveis de significância para os testes de Kruskal-Wallis, Friedman e comparações múltiplas não-paramétricas. / Obtaining significance levels for Kruskal-Wallis, Friedman and nonparametric multiple comparisons tests.

Antonio Carlos Fonseca Pontes 29 June 2000 (has links)
One of the main difficulties faced by researchers using nonparametric methods is obtaining reliable results. The Kruskal-Wallis and Friedman tests are the most used for the completely randomized one-way layout and for randomized blocks, respectively. The tables available for these tests are not comprehensive, so researchers must resort to approximate values. These approximations differ depending on the author consulted and can lead to contradictory results. Furthermore, these tables do not take tied observations into account, even in the case of small samples. For multiple comparisons this is even more evident, especially when ties occur or, in completely randomized designs, when the number of replications differs between treatments. Moreover, the most widely used software packages, such as SAS, STATISTICA, S-Plus and MINITAB, generally resort to approximations to provide significance levels and do not present results for multiple comparisons. Thus, the aim of this work is to present a program, in the C language, that runs the Kruskal-Wallis and Friedman tests and multiple comparisons among all treatments (two-tailed) and between treatments and a control (one- and two-tailed), considering either all systematic rank configurations or 1,000,000 random configurations, depending on the total number of possible permutations. Two significance levels are presented: DW or MaxDif, based on comparison with the maximum difference within each configuration, and Geral, based on comparison with all differences in each configuration. The Geral significance levels are very similar to those provided by the normal approximation. The results obtained with the program also show that tests using random permutations can be good substitutes when the number of systematic permutations is very large, since the probability levels are very close.
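A hedged Python sketch of the random-configuration idea described above (the thesis's own program is in C): compute the Kruskal-Wallis H statistic, then estimate its significance level by shuffling group labels. The permutation count and example data are illustrative.

```python
import numpy as np
from scipy import stats

def kw_permutation_pvalue(groups, n_perm=100_000, rng=None):
    """Monte Carlo significance level for Kruskal-Wallis via random
    permutations of group labels (ties are handled by the H statistic itself)."""
    rng = np.random.default_rng(rng)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate(groups)
    h_obs = stats.kruskal(*groups).statistic
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        out, start = [], 0
        for s in sizes:                      # re-split the shuffled pool
            out.append(perm[start:start + s])
            start += s
        if stats.kruskal(*out).statistic >= h_obs:
            count += 1
    return (count + 1) / (n_perm + 1)        # add-one correction for Monte Carlo p

# Example with three small, unequally replicated treatment groups:
a = [6.1, 5.9, 6.3, 6.0]
b = [6.4, 6.8, 6.5]
c = [5.2, 5.4, 5.1, 5.3]
print(kw_permutation_pvalue([a, b, c], n_perm=20_000, rng=5))
```

When the number of systematic configurations is small, the loop over random permutations would simply be replaced by full enumeration, exactly the split the abstract describes.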
