  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Melhoramento do resíduo de Wald em modelos lineares generalizados / Improvement of Wald residual in generalized linear models

Urbano, Mariana Ragassi 18 December 2008
The theory of generalized linear models is widely used in statistics, not only for modeling normally distributed observations, but mainly for modeling observations whose distribution belongs to the exponential family, such as the binomial, gamma, and inverse Gaussian distributions. After fitting a model, diagnostic techniques and residual analysis are applied to check the adequacy of the fit. The properties of residuals in generalized linear models are not well known, and asymptotic results are the only recourse. This work studies the asymptotic properties of the Wald residual and derives corrections that bring the distribution of the modified residual closer to the standard normal. The corrections were applied to five datasets: in two, the response variable was a count, modeled with the Poisson distribution; two others came from completely randomized experimental designs with a continuous response, modeled with the normal distribution; and in the last, the interest was in modeling a proportion, using the binomial distribution. A Monte Carlo simulation study showed that the corrections significantly improve the distribution of the Wald residual, the improved version being distributed much closer to the standard normal.
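The abstract's central claim, that a Monte Carlo study can show whether a residual's distribution is close to standard normal, is easy to illustrate. The sketch below uses Pearson residuals for a Poisson response as a stand-in (the thesis's Wald residual and its corrections are not reproduced here); the empirical skewness measures exactly the kind of departure from normality that such corrections target, and it shrinks as the counts grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pearson_residuals(mu, n_reps=2000):
    """Monte Carlo draws of Pearson residuals (y - mu) / sqrt(mu)
    for Poisson counts with known mean vector mu."""
    y = rng.poisson(mu, size=(n_reps, mu.size))
    return (y - mu) / np.sqrt(mu)

# Small means: the residual distribution is visibly non-normal (skewed).
mu_small = np.full(5, 2.0)
r_small = simulate_pearson_residuals(mu_small).ravel()

# Larger means: much closer to standard normal, as the asymptotics suggest.
mu_large = np.full(5, 50.0)
r_large = simulate_pearson_residuals(mu_large).ravel()

for name, r in [("mu=2", r_small), ("mu=50", r_large)]:
    skew = np.mean(r**3)  # third moment; 0 for an exact standard normal
    print(f"{name}: mean={r.mean():+.3f} var={r.var():.3f} skew={skew:+.3f}")
```

The Monte Carlo design in the thesis follows the same pattern, with the corrected Wald residual in place of the Pearson residual.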
42

Non-asymptotic bounds for prediction problems and density estimation.

Minsker, Stanislav 05 July 2012
This dissertation investigates learning scenarios in which a high-dimensional parameter has to be estimated from a sample of fixed size, often smaller than the dimension of the problem. The first part answers some open questions for the binary classification problem in the framework of active learning. Given a random couple (X,Y) with unknown distribution P, the goal of binary classification is to predict a label Y based on the observation X. A prediction rule is constructed from a sequence of observations sampled from P. The concept of active learning can be informally characterized as follows: on every iteration, the algorithm is allowed to request a label Y for any instance X which it considers to be the most informative. The contribution of this work consists of two parts: first, we provide minimax lower bounds for the performance of active learning methods. Second, we propose an active learning algorithm which attains nearly optimal rates over a broad class of underlying distributions and is adaptive with respect to the unknown parameters of the problem. The second part of this thesis is related to sparse recovery in the framework of dictionary learning. Let (X,Y) be a random couple with unknown distribution P. Given a collection of functions H, the goal of dictionary learning is to construct a prediction rule for Y given by a linear combination of the elements of H. The problem is sparse if there exists a good prediction rule that depends on a small number of functions from H. We propose an estimator of the unknown optimal prediction rule based on a penalized empirical risk minimization algorithm. We show that the proposed estimator is able to take advantage of the possible sparse structure of the problem by providing probabilistic bounds for its performance.
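The active-learning loop described above, repeatedly requesting the label of the instance the algorithm considers most informative, can be made concrete with a toy one-dimensional classifier. Everything below (the threshold classifier, the noise level, and treating "closest to the current decision boundary" as "most informative") is an illustrative assumption, not the algorithm of the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pool of unlabeled 1-D instances; true labels follow a noisy threshold at 0.
X = rng.uniform(-1, 1, size=200)
labels = (X + 0.1 * rng.standard_normal(200) > 0).astype(int)

def fit_threshold(x, y):
    """Crude classifier: the cut point minimizing training error for
    the rule 'predict 1 when x > t'."""
    candidates = np.sort(x)
    errors = [np.mean((x > t) != y) for t in candidates]
    return candidates[int(np.argmin(errors))]

queried = list(rng.choice(200, size=10, replace=False))  # random seed set
for _ in range(15):  # label budget for the active queries
    t = fit_threshold(X[queried], labels[queried])
    # Request the label of the unlabeled instance closest to the current
    # decision boundary -- the "most informative" point in this toy setting.
    pool = [i for i in range(200) if i not in queried]
    queried.append(min(pool, key=lambda i: abs(X[i] - t)))

final_t = fit_threshold(X[queried], labels[queried])
print(f"estimated boundary: {final_t:+.3f} (true boundary at 0)")
```

The point of active querying is that labels get spent where the classifier is uncertain, which is what allows the improved rates the abstract refers to.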
43

Two statistical problems related to credit scoring / Tanja de la Rey.

De la Rey, Tanja January 2007
This thesis focuses on two statistical problems related to credit scoring. In credit scoring of individuals, two classes are distinguished, namely low and high risk individuals (the so-called "good" and "bad" risk classes). Firstly, we suggest a measure which may be used to study the nature of a classifier for distinguishing between the two risk classes. Secondly, we derive a new method, DOUW (detecting outliers using weights), which may be used to fit logistic regression models robustly and to detect outliers. In the first problem, the focus is on a measure which may be used to study the nature of a classifier. This measure transforms a random variable so that it has the same distribution as another random variable. Assuming a linear form of this measure, three methods for estimating the parameters (slope and intercept) and for constructing confidence bands are developed and compared by means of a Monte Carlo study. The application of these estimators is illustrated on a number of datasets. We also construct statistical hypothesis tests for this linearity assumption. In the second problem, the focus is on providing a robust logistic regression fit and the identification of outliers. It is well known that maximum likelihood estimators of logistic regression parameters are adversely affected by outliers. We propose a robust approach, called DOUW, that also serves as an outlier detection procedure. The approach is based on associating high and low weights with the observations as a result of the likelihood maximization; it turns out that the outliers are those observations to which low weights are assigned. The procedure depends on two tuning constants, and a simulation study is presented to show their effect on the performance of the proposed methodology. The results are presented in terms of four benchmark datasets as well as a large new dataset from the application area of retail marketing campaign analysis.
In the last chapter we apply the techniques developed in this thesis to a practical credit scoring dataset. We show that the DOUW method improves classifier performance and that the measure developed to study the nature of a classifier is useful in a credit scoring context: it may be used to assess whether the distributions of the good and the bad risk individuals come from the same translation-scale family. / Thesis (Ph.D. (Risk Analysis))--North-West University, Potchefstroom Campus, 2008.
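The idea of flagging outliers through the weights produced by the fit can be sketched in a few lines. The hard down-weighting rule below (weight near zero for observations with a large residual) is a stand-in invented for illustration; the actual DOUW weights arise from the likelihood maximization itself and depend on two tuning constants, as the abstract notes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic credit data: one score x, label 1 ("bad") more likely for high x.
x = rng.standard_normal(100)
p_true = 1.0 / (1.0 + np.exp(-3.0 * x))
y = (rng.uniform(size=100) < p_true).astype(float)
y[0], x[0] = 1.0, -4.0   # planted outlier: extreme low score labeled "bad"

def weighted_logistic(x, y, c=0.9, n_iter=200, lr=0.5):
    """Toy reweighted fit: observations with residual |y - p| above c get
    weight ~0 and are flagged as outliers (NOT the actual DOUW rule)."""
    a, b = 0.0, 0.0
    w = np.ones_like(y)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a + b * x)))
        w = np.where(np.abs(y - p) > c, 0.01, 1.0)  # down-weight gross misfits
        a += lr * np.sum(w * (y - p)) / len(y)       # weighted score for a
        b += lr * np.sum(w * (y - p) * x) / len(y)   # weighted score for b
    return a, b, w

a, b, w = weighted_logistic(x, y)
outliers = np.flatnonzero(w < 0.5)
print("slope:", round(b, 2), "flagged outliers:", outliers)
```

As in the thesis's approach, the outliers are simply the observations that end up with low weight, so the robust fit and the detection procedure come out of the same computation.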
45

Intervalos de confiança para dados com presença de eventos recorrentes e censuras / Confidence intervals for data with recurrent events and censoring.

Faria, Rodrigo 23 May 2003
In survival analysis and reliability, it is common for the population units under study to present recurrent events and censoring, and a cost may be associated with each event that occurs. This dissertation presents a methodology for directly obtaining confidence intervals, based on asymptotic theory, for nonparametric estimates of the mean cumulative number or cost of events per unit. Simulation studies are also carried out to check how sample size affects the precision of the asymptotic confidence intervals. One of the great advantages of the methodology is its applicability across many areas of knowledge, and two examples are considered here. The first uses engineering data on a fleet of machines, where the interest is in point estimates and confidence intervals for the mean cumulative number and cost of repairs per machine. The second comes from the medical area and concerns a study of two groups of bladder cancer patients, each submitted to a different type of treatment; the methodology yields confidence intervals for the mean cumulative number of tumors per patient, together with estimates that compare the two treatments and indicate, statistically, which gives better results.
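The quantity at the heart of this abstract, the mean cumulative number (or cost) of events per unit with a normal-approximation confidence interval, has a simple nonparametric estimator: at each event time, add one over the number of units still under observation. The sketch below uses a Poisson-type variance as a simplified stand-in for the variance estimators studied in the dissertation.

```python
import numpy as np

def mcf_with_ci(event_times, censor_times, t_grid, z=1.96):
    """Nonparametric mean cumulative function (MCF) with a normal-
    approximation CI. event_times: one array of event ages per unit;
    censor_times: end of observation per unit. The Poisson-type
    variance here is a simplification for illustration."""
    censor_times = np.asarray(censor_times, dtype=float)
    mcf = np.zeros(len(t_grid))
    var = np.zeros(len(t_grid))
    for j, t in enumerate(t_grid):
        total, v = 0.0, 0.0
        for unit_events in event_times:
            for s in np.asarray(unit_events, dtype=float):
                if s <= t:
                    n_at_risk = np.sum(censor_times >= s)  # units still observed
                    total += 1.0 / n_at_risk
                    v += 1.0 / n_at_risk**2
        mcf[j], var[j] = total, v
    half = z * np.sqrt(var)
    return mcf, mcf - half, mcf + half

# Three machines: repair ages and end-of-observation ages (made-up numbers).
events = [[5.0, 12.0], [8.0], [3.0, 9.0, 14.0]]
censor = [15.0, 10.0, 16.0]
m, lo, hi = mcf_with_ci(events, censor, t_grid=[6.0, 12.0, 16.0])
print(np.round(m, 3), np.round(lo, 3), np.round(hi, 3))
```

Replacing the "+1" for each event by its cost gives the mean cumulative cost version mentioned in the abstract.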
47

Tests d'hypothèses pour les processus de Poisson dans les cas non réguliers / Hypotheses testing problems for inhomogeneous Poisson processes

Yang, Lin 22 January 2014
This work is devoted to hypothesis testing problems for inhomogeneous Poisson processes. The main object of study is the behaviour of different tests in singular statistical models. The "evolution of singularity" of the intensity function is as follows: regular (finite Fisher information), continuous but not differentiable ("cusp"-type singularity), discontinuous (jump-type singularity), and discontinuous with a variable jump size. In all cases we describe the tests analytically; in the variable-jump-size case we also present the asymptotic properties of the estimators.
In particular, we describe the test statistics, the choice of thresholds, and the behaviour of the power functions under local alternatives. The initial problem is always the test of a simple hypothesis against a one-sided alternative. The main tool is the theory of weak convergence in the space of discontinuous functions, applied to the normalized likelihood ratio processes of the singular models considered. The weak convergence of the likelihood ratio, under the hypothesis and under the alternatives, to the corresponding limit processes allows us to solve the problems mentioned above. The asymptotic results are illustrated by numerical simulations covering the construction of the tests, the choice of thresholds, and the power functions under local alternatives.
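The simulation programme sketched in the last sentence, build the test statistic, calibrate the threshold under the hypothesis, then estimate power under an alternative, can be reproduced in miniature for a regular (smooth-intensity) case. The intensities lambda0(t) = 1 and lambda1(t) = 1 + t below are illustrative choices, not those of the thesis.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_nhpp(intensity, t_max, lam_max):
    """Inhomogeneous Poisson process on [0, t_max] by thinning a
    homogeneous process of rate lam_max >= max intensity."""
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > t_max:
            return np.array(events)
        if rng.uniform() < intensity(t) / lam_max:
            events.append(t)

def log_lr(events, t_max):
    """Log likelihood ratio of H1: lambda(t) = 1 + t against
    H0: lambda(t) = 1, using the closed-form compensator integral."""
    return np.sum(np.log(1.0 + events)) - t_max**2 / 2.0

t_max = 2.0
# Calibrate the level-0.05 threshold under H0 by Monte Carlo ...
stats_h0 = [log_lr(simulate_nhpp(lambda t: 1.0, t_max, 1.0), t_max)
            for _ in range(500)]
threshold = np.quantile(stats_h0, 0.95)
# ... then estimate the power under the alternative intensity.
stats_h1 = [log_lr(simulate_nhpp(lambda t: 1.0 + t, t_max, 3.0), t_max)
            for _ in range(500)]
power = float(np.mean(np.array(stats_h1) > threshold))
print(f"threshold = {threshold:.2f}, estimated power = {power:.2f}")
```

In the singular cases studied in the thesis the likelihood ratio no longer has this smooth form, which is precisely why the limit processes and thresholds must be derived separately for each type of singularity.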
48

Operação para continuação do afastamento : operador diferencial, comportamento dinâmico e empilhamento multi-paramétrico / Offset continuation operation : differential operator, dynamic behavior and multi-parameter stacking

Coimbra, Tiago Antonio Alves, 1981- 24 August 2018
Advisors: Maria Amélia Novais Schleicher, Joerg Dietrich Wilhelm Schleicher / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica / The Offset Continuation Operation (OCO) transforms a seismic record acquired with a certain source-receiver offset into the corresponding record as if acquired with another offset. The displacement of a seismic event under this operation can be modeled by a second-order partial differential equation. Based on the WKBJ approximation, we derive an OCO eikonal-type equation, which describes the kinematic aspects of this displacement in analogy to an acoustic wave, and a transport equation, which describes the change in amplitudes. We present a ray-theory solution to the proposed equation. Differential configuration-transform operators that correct the geometrical-spreading factor for any common-offset section, at least asymptotically, are new in the literature. Based on the kinematics of the operation, we propose a multi-parameter stacking operator in the unmigrated seismic data domain. This stacking uses an average velocity, called the OCO velocity, together with other important kinematic wavefield parameters. Because it is based on the OCO, the traveltimes used in this multi-parameter stacking follow the OCO trajectory, which approximates the true common-reflection-point trajectory. The extracted parameters can thus improve the conventional moveout correction or provide corresponding corrections for nonzero offsets, making it possible to increase the quality of conventional zero-offset stacked sections or even to generate stacked sections at other offsets. The kinematic parameters involved can also be used to build a better velocity model. Numerical examples show that stacking along OCO trajectories significantly increases data quality while using fewer parameters than classical methods. / Doutorado / Matemática Aplicada / Doutor em Matemática Aplicada
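The core idea of multi-parameter stacking, stack the data along candidate trajectories and keep the parameters that maximize coherence, can be illustrated with conventional hyperbolic moveout, the classical case that the OCO trajectories generalize. The synthetic gather, velocity scan, and semblance measure below are all illustrative assumptions, not the operator of the thesis.

```python
import numpy as np

def semblance_stack(data, offsets, dt, t0_idx, velocities):
    """Scan stacking velocities by semblance along hyperbolic moveout
    curves t(x) = sqrt(t0^2 + (x/v)^2). Conventional NMO-style stacking,
    shown only to illustrate parameter search by coherence; the OCO
    trajectories in the text replace the hyperbola."""
    t0 = t0_idx * dt
    best_v, best_s = None, -1.0
    for v in velocities:
        amps = []
        for i, x in enumerate(offsets):
            t = np.sqrt(t0**2 + (x / v) ** 2)
            k = int(round(t / dt))
            if k < data.shape[1]:
                amps.append(data[i, k])
        # Semblance: coherent energy over total energy along the curve.
        s = np.sum(amps) ** 2 / (len(amps) * np.sum(np.square(amps)) + 1e-12)
        if s > best_s:
            best_v, best_s = v, s
    return best_v, best_s

# Synthetic gather: one reflection with true velocity 2000 m/s, t0 = 0.4 s.
dt, nt = 0.004, 300
offsets = np.arange(100.0, 1100.0, 100.0)
data = np.zeros((len(offsets), nt))
for i, x in enumerate(offsets):
    t = np.sqrt(0.4**2 + (x / 2000.0) ** 2)
    data[i, int(round(t / dt))] = 1.0

vels = np.arange(1500.0, 2600.0, 100.0)
v_hat, s = semblance_stack(data, offsets, dt, t0_idx=100, velocities=vels)
print(f"picked velocity: {v_hat:.0f} m/s, semblance {s:.2f}")
```

The thesis's contribution is, in these terms, to replace the hyperbola with the OCO trajectory and the single velocity with a small set of kinematic parameters, so that the search follows the true common-reflection-point path more closely.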
49

Semiparametric Estimation of Drift, Rotation and Scaling in Sparse Sequential Dynamic Imaging: Asymptotic theory and an application in nanoscale fluorescence microscopy

Hobert, Anne 29 January 2019
No description available.
50

Inférence statistique dans le modèle de mélange à risques proportionnels / Statistical inference in mixture of proportional hazards models

Ben elouefi, Rim 05 September 2017
In this work, we are interested in statistical inference in two stratified models for censored lifetime data, one semi-parametric and one non-parametric. We first propose a goodness-of-fit test statistic for the stratified proportional hazards regression model and establish its asymptotic distribution under the null hypothesis of a correct fit of the model to the data. We investigate the numerical properties of this test (level, and power under different alternatives) by means of simulations.
We then propose a procedure for stratifying the proportional hazards model according to an unknown threshold of a stratification variable, based on the goodness-of-fit test proposed earlier. An exhaustive simulation study is conducted to evaluate the performance of this procedure. In the second part of our work, we consider the stratified logrank test in a context of missing data (the situation where the strata cannot be observed for all individuals in the sample). We construct a weighted version of the stratified logrank adapted to this problem and establish its asymptotic distribution under the null hypothesis of equality of the hazard functions in the different groups. The properties of this new test statistic are assessed using simulations, and the test is then applied to a medical dataset.
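For the complete-data case (strata observed on everyone), the stratified logrank statistic that the thesis's weighted test extends can be written down directly: sum the observed-minus-expected and variance contributions over strata, then studentize. The two-stratum simulation below is an illustrative setup, not the missing-data weighting scheme of the thesis.

```python
import numpy as np

def logrank_contrib(time, event, group):
    """(O - E, V) contributions of a two-group logrank within one stratum."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths
        if n > 1:                             # hypergeometric variance term
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e, var

def stratified_logrank(time, event, group, stratum):
    """Z statistic of the stratified logrank test (complete strata)."""
    num, den = 0.0, 0.0
    for s in np.unique(stratum):
        keep = stratum == s
        o, v = logrank_contrib(time[keep], event[keep], group[keep])
        num, den = num + o, den + v
    return num / np.sqrt(den)

rng = np.random.default_rng(4)
n = 400
group = rng.integers(0, 2, n)
stratum = rng.integers(0, 2, n)
# Group 1 has twice the hazard; stratum 1 shifts the baseline hazard.
t_event = rng.exponential(1.0 / (1.0 + group) / (1.0 + 0.5 * stratum))
t_cens = rng.exponential(2.0, n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)
z = stratified_logrank(time, event, group, stratum)
print(f"stratified logrank Z = {z:.2f}")
```

The missing-strata version studied in the thesis reweights these per-stratum contributions to account for individuals whose stratum is unobserved; under the null, the studentized statistic is again asymptotically standard normal.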
