321 |
Estimação parametrica e semi-parametrica em misturas uniforme-beta generalizada : uma aplicação em dados de microarranjos / Parametric and semi-parametric estimation in uniform-generalized beta mixtures : application in microarray data
Abreu, Gabriel Coelho Gonçalves de 17 January 2007
Advisor: Aluisio de Souza Pinheiro / Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Matematica, Estatistica e Computação Cientifica
Previous issue date: 2007 / Abstract: The analysis of gene expression data has been of great importance in many fields of human knowledge, such as agriculture, animal breeding, and medicine. Despite continuous progress in statistical genetics, the analysis of such data can be complex and difficult to carry out. The investments made in laboratory research in recent years can lead to concrete results (genetic improvement, genetic vaccines, patents) in a short time, provided the results are interpreted correctly. Because the analysis covers thousands of genes, multiple-comparison problems arise, and the overall error rate substantially exceeds the nominal level of each test. Nowadays, in biology and related fields, multiple testing has become the norm rather than the exception. Suggested solutions therefore include the control of error rates such as the false discovery rate (FDR). The empirical distribution of the p-values obtained from the statistical tests can be studied under a finite mixture model of beta distributions. The use of the three-parameter generalized beta distribution, which is more flexible than the standard beta, is suggested. Parametric and semi-parametric estimation are studied under the proposed model. Simulation studies and an application to real data are carried out. / Master's degree / Master in Statistics
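The FDR control mentioned in the abstract can be illustrated with the Benjamini-Hochberg step-up procedure. This is a minimal sketch of that standard procedure, not the mixture-model estimator studied in the dissertation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at most the cutoff rank.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, alpha=0.05))
```

With thousands of genes the same function applies unchanged; only the list of p-values grows.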
|
322 |
Méthodes statistiques pour la mise en correspondance de descripteurs / Statistical methods for descriptor matching
Collier, Olivier 02 October 2013
Many applications, notably in computer vision and medicine, aim at identifying similarities between several images or signals. It is then possible to detect objects, to track them, or to overlap different pictures. In every case, the algorithmic procedures that process the images use a selection of key points that they then try to match in pairs. For each point they compute a descriptor that characterizes it and discriminates it from the others. Among all possible procedures, the most widely used today is SIFT, which selects the key points, computes descriptors, and provides a criterion for global descriptor matching. In the first part, we try to improve this algorithm by changing the original descriptor, which requires finding the argument of the maximum of a histogram, a computation that is statistically unstable. We must then also change the criterion used to match two descriptors. This yields a nonparametric hypothesis testing problem in which both the null and the alternative hypotheses are composite, and even nonparametric. We use the generalized likelihood ratio test to obtain consistent testing procedures, and we carry out a minimax study of the problem. In the second part, we are interested in the optimality of a global matching procedure. We state a statistical model in which descriptors appear in a given order in a first image and in another order in a second image. Descriptor matching then amounts to estimating a permutation. We give a minimax optimality criterion for the estimators. In particular, we use the likelihood to find several consistent estimators, which are even optimal under some conditions. Finally, we address practical aspects and show that our estimators can be computed in reasonable time, which allows us to illustrate the hierarchy of our estimators through simulations.
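To make "matching as estimation of a permutation" concrete, the sketch below picks the permutation that minimizes the total squared distance between paired descriptors. Under an assumed Gaussian noise model with equal variances this coincides with maximizing the likelihood, but it is not claimed to be one of the thesis's estimators. The function names and toy descriptors are hypothetical, and the brute-force search is only workable for a handful of points (an assignment solver would replace it in practice).

```python
from itertools import permutations


def squared_distance(u, v):
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def least_squares_matching(first, second):
    """Return perm such that second[i] is matched with first[perm[i]],
    minimizing the total squared distance over all permutations."""
    n = len(first)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(squared_distance(first[perm[i]], second[i]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm


# Toy descriptors: the second image shows noisy copies in a different order.
first_image = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
second_image = [(0.1, 0.9), (0.05, -0.1), (0.9, 0.1)]
print(least_squares_matching(first_image, second_image))
```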
|
323 |
An investigation of bootstrap methods for estimating the standard error of equating under the common-item nonequivalent groups design
Wang, Chunxin 01 July 2011
The purpose of this study was to investigate the performance of the parametric bootstrap method and to compare the parametric and nonparametric bootstrap methods for estimating the standard error of equating (SEE) under the common-item nonequivalent groups (CINEG) design with the frequency estimation (FE) equipercentile method under a variety of simulated conditions.
When the performance of the parametric bootstrap method was investigated, bivariate polynomial log-linear models were employed to fit the data. With the consideration of the different polynomial degrees and two different numbers of cross-product moments, a total of eight parametric bootstrap models were examined. Two real datasets were used as the basis to define the population distributions and the "true" SEEs. A simulation study was conducted reflecting three levels for group proficiency differences, three levels of sample sizes, two test lengths and two ratios of the number of common items and the total number of items. Bias of the SEE, standard errors of the SEE, root mean square errors of the SEE, and their corresponding weighted indices were calculated and used to evaluate and compare the simulation results.
The main findings from this simulation study were as follows: (1) The parametric bootstrap models with larger polynomial degrees generally produced smaller bias but larger standard errors than those with lower polynomial degrees. (2) The parametric bootstrap models with a higher-order cross-product moment (CPM) of two generally yielded more accurate estimates of the SEE than the corresponding models with a CPM of one. (3) The nonparametric bootstrap method generally produced less accurate estimates of the SEE than the parametric bootstrap method. However, as the sample size increased, the differences between the two bootstrap methods became smaller. When the sample size was equal to or larger than 3,000, the differences between the nonparametric bootstrap method and the parametric bootstrap model that produced the smallest RMSE were very small. (4) Of all the models considered in this study, parametric bootstrap models with a polynomial degree of four performed better under most simulation conditions. (5) Aside from method effects, sample size and test length had the most impact on estimating the SEE. Group proficiency differences and the ratio of the number of common items to the total number of items had little effect on a short test, but had a slight effect on a long test.
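As an illustration of the nonparametric bootstrap compared in this study, the sketch below estimates the standard error of an arbitrary statistic by resampling observations with replacement. It is a generic sketch, not the study's equating code: the function name, the toy scores, and the use of the sample mean in place of an equated score are all assumptions.

```python
import random
import statistics


def bootstrap_standard_error(sample, statistic, n_replications=1000, seed=0):
    """Nonparametric bootstrap estimate of the standard error of `statistic`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_replications):
        # Draw n observations with replacement from the observed sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicated statistics estimates the standard error.
    return statistics.stdev(replicates)


# Hypothetical raw scores for ten examinees (illustration only).
toy_scores = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]
toy_se = bootstrap_standard_error(toy_scores, statistics.mean)
print(round(toy_se, 3))
```

A parametric bootstrap would differ only in the resampling step: observations would be drawn from a fitted model (in the study, a bivariate polynomial log-linear model) rather than from the observed data.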
|
324 |
Contributions à l'inférence statistique en présence de censure multivariée / Contributions to statistical inference in presence of multivariate censoring
Gribkova, Svetlana 29 September 2014
The main purpose of this thesis is to explore several approaches for studying multivariate censored data: nonparametric estimation of the joint distribution function, modeling dependence with copulas, and k-clustering for exploratory analysis. Chapter 1 presents the general framework and the contributions of this thesis. Chapter 2 deals with the estimation of the joint distribution function of two censored variables in a simplified survival model in which the difference between the two censoring variables is observed. We provide a new nonparametric estimator of the joint distribution function and establish the asymptotic normality of integrals with respect to its associated measure. Chapter 3 is devoted to nonparametric copula estimation under bivariate censoring. We provide one discrete and two smooth copula estimators, along with two estimators of the copula density. The discrete estimator can be seen as an extension of the empirical copula under censoring. The asymptotic properties of these estimators and some applications are considered. Chapter 4 provides a new exploratory approach for censored data analysis. We consider a multivariate configuration in which one variable, a duration, is subject to censoring and the others are completely observed. We extend the probabilistic k-quantization method to the case of a random vector with one censored component. The definitions of the empirical distortion and of the empirically optimal quantizer are generalized to the presence of one-dimensional censoring. We study the asymptotic properties of the distortion of the empirically optimal quantizer and provide a non-asymptotic exponential bound for the rate of convergence. Our results are then applied to construct a new two-step clustering algorithm for censored data.
|
325 |
Conditional quantile estimation through optimal quantization
Charlier, Isabelle 17 December 2015
One of the most common applications of nonparametric techniques is the estimation of a regression function (i.e. a conditional mean). However, it is often of interest to model conditional quantiles, particularly when the conditional mean does not adequately represent the impact of the covariates on the dependent variable. Moreover, the quantile regression function provides a much more comprehensive picture of the conditional distribution of a dependent variable than the conditional mean function. "Quantization" was originally used in signal and information theory, starting in the fifties, to discretize a continuous signal by a finite number of "quantizers". In mathematics, the problem of optimal quantization is to find the best approximation of the continuous distribution of a random variable by a discrete law with a fixed number of charged points. First used for one-dimensional signals, the method was then extended to the multidimensional case and extensively used as a tool to solve problems arising in numerical probability. The goal of this thesis is to study how to apply optimal quantization in Lp-norm to conditional quantile estimation. Various cases are studied: one-dimensional or multidimensional covariate, univariate or multivariate dependent variable. The convergence of the proposed estimators is studied from a theoretical point of view. The proposed estimators were implemented, and an R package called QuantifQuantile was developed. The numerical behavior of the estimators is evaluated through simulation studies and real data applications. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished
|
326 |
Single-Focus Confocal Data Analysis with Bayesian Nonparametrics
January 2020
abstract: The cell is a dense environment composed of proteins, nucleic acids, and other small molecules, which constantly bombard and interact with each other. These interactions and the diffusive motions are driven by internal thermal fluctuations. Upon collision, molecules can interact and form complexes. It is of interest to learn kinetic parameters, such as the rates at which one molecule converts to a different species or two molecules collide and form a new species, as well as diffusion coefficients.
Several experimental measurements can probe diffusion coefficients at the single-molecule and bulk level. This thesis focuses on single-molecule methods, which can assess diffusion coefficients at the level of individual molecules. For instance, super-resolution methods such as stochastic optical reconstruction microscopy (STORM) and photoactivated localization microscopy (PALM) achieve high spatial resolution at the cost of lower temporal resolution. A different group of methods, such as MINFLUX and multi-detector tracking, can track a single molecule with high spatio-temporal resolution. The problem with these methods is that they are only applicable to very dilute samples, since they need to ensure that a single molecule is present in the region of interest (ROI).
In this thesis, the goal is to have the best of both worlds by achieving high spatio-temporal resolution without being limited to a few molecules. To do so, one needs to refocus on fluorescence correlation spectroscopy (FCS), a method that applies to both in vivo and in vitro systems with high temporal resolution and relies on multiple molecules traversing a confocal volume for an extended period of time. The difficulty here is that the interpretation of the signal leads to different estimates of kinetic parameters such as diffusion coefficients, depending on the number of molecules assumed in the model. For this reason, this thesis uses Bayesian nonparametrics (BNPs) to solve this model selection problem and to extract kinetic parameters such as diffusion coefficients at the single-molecule level from a few photons, and thus with the highest possible temporal resolution. / Dissertation/Thesis / Source code related to chapter 3 / Source code related to chapter 4 / Doctoral Dissertation Physics 2020
|
327 |
Neparametrické testování nezávislosti trajektorií zvířat / Nonparametric tests of independence between animal movement trajectories
Veselý, Martin January 2021
In this thesis, we assume that we observe a pair of trajectories of two objects that could interact with one another, and we propose a way to test their independence. We formulate basic point process definitions and discuss ways to describe trajectory data. We formulate the theory behind Monte Carlo tests and global envelope testing. In Chapter 2, we propose a parametric model to represent trajectories and derive maximum likelihood estimates of its parameters. We conclude the chapter by exploring the performance of these estimates. In Chapter 3, we propose test statistics for testing independence with a nonparametric Monte Carlo test based on a random shift approach. We perform a simulation study to assess the performance of these statistics under various conditions and discuss the selection of fine-tuning parameters. Finally, in Chapter 4, we study real data provided by the Voyageurs Wolf Project and apply the proposed tests to real wolf trajectories.
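The random shift idea can be sketched on a one-dimensional toy problem: one series is kept fixed, the other is shifted circularly by random offsets, and the observed statistic is ranked among the shifted ones. Everything below is an assumption made for illustration (scalar series instead of trajectories, absolute correlation as the test statistic, circular wrap-around, all names); it is not the thesis's test.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def random_shift_test(x, y, statistic=abs_correlation, n_shifts=999,
                      min_shift=1, seed=0):
    """Monte Carlo p-value obtained by circularly shifting y against x."""
    rng = random.Random(seed)
    n = len(y)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_shifts):
        # Shift y by k positions, wrapping the tail around to the front.
        k = rng.randrange(min_shift, n - min_shift + 1)
        shifted = y[k:] + y[:k]
        if statistic(x, shifted) >= observed:
            exceed += 1
    # Standard Monte Carlo p-value: the observed value counts as one replicate.
    return (1 + exceed) / (n_shifts + 1)


# Toy data: y is a noisy copy of a random walk x, so the two are dependent.
toy_rng = random.Random(1)
walk_x, position = [], 0.0
for _ in range(200):
    position += toy_rng.gauss(0.0, 1.0)
    walk_x.append(position)
walk_y = [value + toy_rng.gauss(0.0, 0.1) for value in walk_x]
print(random_shift_test(walk_x, walk_y))
```

Shifting preserves the internal autocorrelation of each series while breaking the link between them, which is why it is preferred to a plain permutation of the observations. The `min_shift` argument is one of the tuning choices such a test requires.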
|
328 |
Spatial and Temporal Trends in Water Quality in the Alafia River Watershed
Aragon, Jennifer M 16 November 2009
Water quality data and land use information were analyzed within the Alafia River watershed in Florida to determine spatial and temporal trends in these variables over the 16-year period from 1991 to 2006. Monthly water quality data (for dissolved oxygen, turbidity, fecal coliform, total phosphorus, and total nitrogen) were statistically analyzed using the modified seasonal Kendall nonparametric test for trends, which accounts for serial correlation. The statistical trend analysis was conducted for the entire study period, but monthly, seasonal, and land use trends were also examined. Land use information was examined using Geographic Information Systems to determine the percent change in land use proportion from 1990 to 1999, 1999 to 2006, and 1990 to 2006. The proportions of each land use and their percent change were then related to the trends in water quality.
The results of this analysis showed that water quality is improving for turbidity and total phosphorus, with statistically significant decreasing trends in turbidity at stations 74, 111, 116, and 139 and in total phosphorus at stations 74, 114, and 115. A statistically significant decreasing trend in dissolved oxygen was found at station 116, and an increasing trend in total nitrogen at stations 114, 115, and 151, implying that water quality is degrading for these parameters. Other noted trends were high fecal coliform and total nitrogen at station 111, which has higher proportions of agricultural land use and an increasing proportion of urban and built-up land use. Low dissolved oxygen was also noted at station 74. Land use across the entire study area has shifted from predominantly wetlands toward urban and built-up land use.
While agricultural, rangeland, and wetlands land use have shown a reduction in the proportion of coverage in the contributing zone of almost every station, urban and built-up land use has increased in proportion at every station.
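For readers unfamiliar with the seasonal Kendall test, the sketch below computes its basic form: a Mann-Kendall S statistic per season (for example, per calendar month), summed across seasons and compared with a normal approximation. It deliberately omits the serial-correlation adjustment used in the thesis as well as tie and missing-value corrections, and the function names and toy data are hypothetical.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def kendall_s(values):
    """Mann-Kendall S: number of increasing pairs minus decreasing pairs."""
    n = len(values)
    return sum(sign(values[j] - values[i])
               for i in range(n - 1) for j in range(i + 1, n))


def seasonal_kendall(series_by_season):
    """Basic seasonal Kendall test.

    series_by_season holds one list per season, each ordered by year.
    Assumes no ties and independent seasons (no serial-correlation fix).
    Returns (S, Z, two-sided p-value).
    """
    s_total = 0
    var_total = 0.0
    for values in series_by_season:
        n = len(values)
        s_total += kendall_s(values)
        var_total += n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction of one unit toward zero.
    if s_total > 0:
        z = (s_total - 1) / math.sqrt(var_total)
    elif s_total < 0:
        z = (s_total + 1) / math.sqrt(var_total)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s_total, z, p_value


# Toy turbidity readings for two seasons over six years, both decreasing.
toy_seasons = [[9.1, 8.4, 7.9, 7.2, 6.8, 6.1],
               [5.5, 5.2, 4.9, 4.1, 3.8, 3.0]]
print(seasonal_kendall(toy_seasons))
```

A negative S with a small p-value indicates a decreasing trend, which for turbidity or total phosphorus corresponds to improving water quality.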
|
329 |
Výnosové křivky / Yield Curves
Korbel, Michal January 2019
This master's thesis examines the estimation of the yield curve using two approaches. The first seeks a parametric model that describes the behavior of the yield curve well and estimates its parameters. The parametric models used in the thesis are derived from the class of models introduced by Nelson and Siegel. The second approach is nonparametric estimation of yield curves using spline smoothing and kernel smoothing. All methods are then compared on real observed data, and their suitability for various tasks and for the specific observations available is considered.
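The basic Nelson-Siegel curve, from which the parametric models in the thesis are derived, can be written as a short function. This is a sketch of the standard three-factor form only; the function name and the parameter values in the example are hypothetical, and the thesis may use extended variants.

```python
import math


def nelson_siegel_yield(maturity, beta0, beta1, beta2, tau):
    """Nelson-Siegel yield for a maturity > 0 (same time unit as tau).

    beta0 is the long-run level, beta1 the short-end (slope) component,
    beta2 the medium-term (curvature) component, tau the decay scale.
    """
    if maturity <= 0:
        raise ValueError("maturity must be positive")
    x = maturity / tau
    # (1 - exp(-x)) / x, computed with expm1 for accuracy when x is small.
    loading = -math.expm1(-x) / x
    return beta0 + beta1 * loading + beta2 * (loading - math.exp(-x))


# Hypothetical parameters: 4% long-run level, upward-sloping curve.
for years in (0.25, 1, 2, 5, 10, 30):
    print(years, round(nelson_siegel_yield(years, 0.04, -0.02, 0.01, 2.0), 5))
```

The curve tends to beta0 + beta1 as maturity approaches zero and to beta0 for very long maturities, which gives the parameters their usual interpretation. Fitting the model means choosing the four parameters to minimize the distance to observed yields.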
|
330 |
Proposed Nonparametric Tests for Equality of Location and Scale Against Ordered Alternatives
Zhu, Tiwei January 2021
Ordered alternatives tests are sometimes used in life-testing experiments and drug-screening studies. Such a test can gain power when the researcher expects the parameters, if they differ, to be ordered in a certain way. This research focuses on developing new nonparametric tests for the nondecreasing ordered alternative problem for k (k ≥ 3) populations when testing for differences in both location and scale.
Six nonparametric tests are proposed for the nondecreasing ordered alternative when testing for a difference in either location or scale. The six tests are various combinations of a well-known ordered alternatives test for location and a test based on the Moses test technique for testing differences in scale. A simulation study is conducted to determine how well the proposed tests maintain their significance levels. Powers are estimated for the proposed tests under a variety of conditions for three, four, and five populations. Several parameter configurations are considered: the location parameters differ and the scale parameters are equal; the location parameters are equal and the scale parameters differ; and both the location and scale parameters differ. Equal and unequal sample sizes of 18 and 30 are considered. Subgroup sizes of 3 and 6 are both used when applying the Moses test technique. Recommendations are given for which test should be used in various situations.
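The abstract does not name the ordered alternatives test for location that the proposed tests build on. As an illustration of what such a test looks like, the sketch below implements the Jonckheere-Terpstra test, one well-known choice, using the normal approximation without a tie correction. The function name and toy samples are hypothetical, and this is not one of the six proposed tests.

```python
import math


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra test against a nondecreasing ordered alternative.

    groups lists the samples in the hypothesized order.
    Returns (J, Z, one-sided p-value) using the no-ties normal approximation.
    """
    # J counts, over every ordered pair of groups, the pairs of observations
    # in which the later group has the larger value (ties count one half).
    j_stat = 0.0
    for i in range(len(groups) - 1):
        for k in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[k]:
                    if a < b:
                        j_stat += 1.0
                    elif a == b:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    # Null mean and variance of J when there are no ties.
    mean = (n_total ** 2 - sum(n ** 2 for n in sizes)) / 4.0
    variance = (n_total ** 2 * (2 * n_total + 3)
                - sum(n ** 2 * (2 * n + 3) for n in sizes)) / 72.0
    z = (j_stat - mean) / math.sqrt(variance)
    # Large J supports the ordered alternative, so the test is upper-tailed.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return j_stat, z, p_value


# Toy samples from three populations with clearly increasing location.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(jonckheere_terpstra(toy_groups))
```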
|