  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

MMD and Ward criterion in a RKHS : application to Kernel based hierarchical agglomerative clustering / Maximum Mean Discrepancy et critère de Ward dans un RKHS : application à la classification hiérarchique à noyau

Li, Na 01 December 2015 (has links)
Clustering, a useful tool for unsupervised classification, is the task of grouping objects according to measured or perceived characteristics, and it has had great success in exploring the hidden structure of unlabeled data sets. Kernel-based clustering algorithms have become prominent: they perform competitively compared with conventional methods owing to their ability to transform a nonlinear problem into a linear one in a higher-dimensional feature space. In this work, we propose a Kernel-based Hierarchical Agglomerative Clustering (KHAC) algorithm using Ward's criterion. Our method is motivated by the recently introduced Maximum Mean Discrepancy (MMD) criterion, which was first proposed to measure the difference between distributions and can easily be embedded into an RKHS. Close relationships between MMD and Ward's criterion are established. In our KHAC method, the selection of the kernel parameter and the determination of the number of clusters are studied and provide satisfactory performance. Finally, an iterative KHAC algorithm is proposed that determines the optimal kernel parameter, gives a meaningful number of clusters and partitions the data set automatically.
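The MMD criterion at the heart of this work has a simple empirical form: the (biased) squared-MMD estimate between two samples is the mean within-sample kernel similarity minus twice the cross-sample similarity. A minimal sketch with a Gaussian kernel (the kernel choice and bandwidth are illustrative assumptions, not the thesis's settings):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(X, Y, sigma=1.0):
    # Biased estimate of the squared MMD between samples X and Y:
    # MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')].
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() - 2 * Kxy.mean() + Kyy.mean()

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2_biased(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
# Samples from the same distribution give a much smaller MMD^2 estimate.
```

Two samples from one distribution yield a near-zero estimate, while a shifted distribution yields a clearly positive one, which is what makes MMD usable as a between-cluster dissimilarity.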
112

Inferência estatística para regressão múltipla h-splines / Statistical inference for h-splines multiple regression

Morellato, Saulo Almeida, 1983- 25 August 2018 (has links)
Advisor: Ronaldo Dias / Doctoral thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: In this work we discuss two inference problems related to multiple nonparametric regression: estimation in additive models using a nonparametric method, and hypothesis testing for equality of curves, also in additive models. In the estimation step, we construct a generalization of the h-splines method, both in the sequential adaptive context proposed by Dias (1999) and in the Bayesian context proposed by Dias and Gamerman (2002). The h-splines methods provide an automatic choice of the number of bases used in estimating the model. Simulation studies show that the results obtained by the proposed estimation methods are superior to those achieved by the R packages gamlss, mgcv and DPpackage. Two hypothesis tests are created to test H0 : f = f0: one whose decision rule is based on the integrated squared distance between two curves, for the adaptive sequential approach, and another based on the Bayesian evidence measure proposed by Pereira and Stern (1999). In the Bayesian test, the performance of the evidence measure is examined in several simulation scenarios, and the proposed measure behaves as a measure of evidence favorable to H0 should. In the test based on the distance between curves, the power was estimated in various scenarios using simulations, with satisfactory results. Finally, the proposed estimation and testing procedures are applied to a data set from the work of Tanaka and Nishii (2009) on deforestation in East Asia, where the objective is to choose one among eight candidate models. The two tests agree, pointing to a pair of models as the most suitable. / Doctorate in Statistics
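The "automatic choice of the number of bases" can be illustrated with a toy stand-in (not the authors' h-splines procedure): fit with an increasing number of degree-1 B-spline ("hat") bases and stop once the relative drop in the residual sum of squares becomes small. The 5% tolerance and all names below are illustrative assumptions:

```python
import numpy as np

def hat_basis(x, knots):
    # Degree-1 B-spline ("hat") design matrix on equally spaced knots,
    # a simple stand-in for the spline bases of the h-splines method.
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

def sse_for(x, y, n_bases):
    # Least-squares fit with n_bases hat functions; returns residual SSE.
    B = hat_basis(x, np.linspace(0.0, 1.0, n_bases))
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return ((y - B @ coef) ** 2).sum()

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)

# Grow the basis until the relative improvement falls below a tolerance,
# mimicking an automatic, data-driven choice of the number of bases.
k, sse = 4, sse_for(x, y, 4)
while True:
    new = sse_for(x, y, k + 1)
    if (sse - new) / sse < 0.05:
        break
    k, sse = k + 1, new
```

The stopping rule here is a crude surrogate for the adaptive and Bayesian criteria of Dias (1999) and Dias and Gamerman (2002), but it conveys the idea that the basis dimension is chosen by the data rather than fixed in advance.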
113

Statistical detection for digital image forensics / Détection statistique pour la criminalistique des images numériques

Qiao, Tong 25 April 2016 (has links)
The remarkable evolution of information technologies and digital imaging over the past decades has made digital images ubiquitous. The tampering of these images has become an unavoidable reality, especially in the field of cybercrime, and the credibility and trustworthiness of digital images have been eroded, with important political, economic, and social consequences. The field of digital image forensics was born to restore trust in digital images. Three problems are addressed in this thesis: image origin identification, detection of information hidden in a digital image, and detection of one example of image tampering: resampling. The goal is to develop a statistical decision approach, as reliable as possible, that guarantees a prescribed false-alarm probability. To this end, a statistical test is designed within the framework of hypothesis testing theory, based on a parametric model that characterizes the physical and statistical properties of natural images; this model is developed by studying the image processing pipeline of a digital camera. The methodology throughout is to study the optimal detector given by the likelihood ratio test in the ideal context where all model parameters are known; when parameters are unknown, they are estimated in order to build the generalized likelihood ratio test, whose statistical performance is established analytically. Numerous experiments on simulated and real images highlight the relevance of the proposed approach.
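The notion of a detector with a guaranteed false-alarm probability can be sketched in a toy Gaussian mean-shift setting (an illustrative assumption, not the thesis's camera model): with all parameters known, the likelihood ratio test reduces to thresholding a statistic whose null distribution is known exactly, so the threshold can be set from the prescribed false-alarm rate.

```python
import numpy as np
from scipy.stats import norm

def detector(x, sigma=1.0, alpha=0.05):
    # Likelihood ratio test of H0: mu = 0 against a positive mean shift,
    # with sigma known. Under H0, sqrt(n) * mean(x) / sigma is standard
    # normal, so thresholding at the (1 - alpha) quantile guarantees a
    # false-alarm probability of exactly alpha.
    t = np.sqrt(len(x)) * np.mean(x) / sigma
    return t > norm.ppf(1 - alpha)

rng = np.random.default_rng(1)
false_alarms = np.mean([detector(rng.normal(0.0, 1.0, 50)) for _ in range(2000)])
detections = np.mean([detector(rng.normal(0.5, 1.0, 50)) for _ in range(2000)])
# The empirical false-alarm rate stays close to the prescribed 0.05
# while the detection rate under the alternative is high.
```

When nuisance parameters are unknown, the same construction with estimated parameters yields the generalized likelihood ratio test studied in the thesis.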
114

Bayesian multiple hypotheses testing with quadratic criterion / Test bayésien entre hypothèses multiples avec critère quadratique

Zhang, Jian 04 April 2014 (has links)
The anomaly detection and localization problem can be treated as a multiple hypotheses testing (MHT) problem in the Bayesian framework. The Bayesian test with the 0-1 loss function is a standard solution, but in practice the alternative hypotheses can have quite different importance. The 0-1 loss function does not reflect this fact, whereas the quadratic loss function is more appropriate. The objective of this thesis is the design of a Bayesian test with the quadratic loss function, together with its asymptotic study. The construction proceeds in two steps. In the first step, a Bayesian test with the quadratic loss function is designed for the MHT problem without the null hypothesis, and lower and upper bounds on the misclassification probabilities are calculated. The second step constructs a Bayesian test for the MHT problem with the null hypothesis; lower and upper bounds on the false-alarm, missed-detection and misclassification probabilities are calculated. From these bounds, the asymptotic equivalence between the proposed test and the standard test with the 0-1 loss function is studied. Numerous simulations and an acoustic experiment illustrate the effectiveness of the new statistical test.
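The difference between the 0-1 and quadratic-loss decisions can be shown on a toy posterior (the hypothesis means and posterior probabilities below are illustrative assumptions, not values from the thesis):

```python
import numpy as np

# Hypotheses H_i: theta = mu_i, with posterior probabilities `post`.
# The 0-1 loss picks the MAP hypothesis; a quadratic loss
# L(i, j) = (mu_i - mu_j)^2 penalizes distant misclassifications more,
# so the two decisions can differ when a far-away hypothesis has mass.
mus = np.array([0.0, 1.0, 5.0])
post = np.array([0.35, 0.33, 0.32])

map_choice = int(np.argmax(post))  # decision under the 0-1 loss

# Posterior expected quadratic loss of deciding H_i, minimized over i.
risk = [(post * (mus - mus[i]) ** 2).sum() for i in range(len(mus))]
quad_choice = int(np.argmin(risk))
# Here the MAP decision is H_0, while the quadratic-loss decision is H_1,
# the "central" hypothesis that hedges against the distant H_2.
```

This is exactly the sense in which the quadratic loss reflects hypotheses of unequal importance: being wrong by a little is cheaper than being wrong by a lot.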
115

Dôsledky porušenia predpokladov použitia vybraných štatistických metód / Consequences of assumption violations of selected statistical methods

Marcinko, Tomáš January 2010 (has links)
Classical parametric methods of statistical inference and hypothesis testing are derived under fundamental theoretical assumptions, which may or may not be met in real-world applications. These methods are nevertheless used despite the violation of their underlying assumptions, on the grounds that they are quite insensitive to such violations. Moreover, alternative nonparametric or rank tests are often overlooked, mostly because they are deemed less powerful than parametric methods. The aim of the dissertation is therefore a description of the consequences of assumption violations for classical one-sample and two-sample statistical methods, and a consistent and comprehensive comparison of parametric, nonparametric and robust statistical techniques, based on an extensive simulation study and focused mostly on violations of the normality and homoscedasticity assumptions. The results of the simulation study confirmed that the classical parametric methods are relatively robust, with some reservations in the case of outlying observations, where traditional methods may fail. On the other hand, the empirical study clearly showed that the classical parametric methods lose their optimal properties when the underlying assumptions are violated. In many cases of non-normality the appropriate nonparametric and rank-based methods are more powerful, so the claim that these methods are unproductive due to a lack of power must be considered a crucial mistake. However, the choice of the most appropriate distribution-free method generally depends on the particular form of the underlying distribution.
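The point that rank-based tests can outperform the t-test under non-normality is easy to reproduce in a small simulation (the lognormal shift scenario below is an illustrative assumption, not one of the dissertation's scenarios):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, alpha = 30, 1000, 0.05
t_rej = w_rej = 0
for _ in range(reps):
    # Skewed, heavy-tailed samples with a genuine location shift.
    x = rng.lognormal(0.0, 1.0, n)
    y = rng.lognormal(0.6, 1.0, n)
    t_rej += stats.ttest_ind(x, y).pvalue < alpha       # parametric t-test
    w_rej += stats.mannwhitneyu(x, y).pvalue < alpha    # rank-based test
# Under this non-normal alternative the rank test rejects more often,
# i.e. it has higher empirical power than the t-test.
```

Under normality the ranking would typically reverse, which is the dissertation's broader message: the best choice depends on the underlying distribution.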
116

Social welfare policy and the crisis of hunger

Bolesworth, Karen, Tufts, Susan 01 January 2001 (has links)
The Personal Responsibility and Work Opportunity Reconciliation Act of 1996 has led to reduced welfare assistance to the needy. This thesis analyzes how families have become increasingly homeless and hungry during the welfare reform years.
117

Inférence statistique dans le modèle de mélange à risques proportionnels / Statistical inference in mixture of proportional hazards models

Ben elouefi, Rim 05 September 2017 (has links)
In this work, we are interested in statistical inference in two stratified models, one semi-parametric and one non-parametric, for censored survival data. We first propose a goodness-of-fit test statistic for the stratified proportional hazards regression model and establish its asymptotic distribution under the null hypothesis of a correct fit of the model. We investigate the numerical properties of this test (level, power under different alternatives) by means of simulations. We then propose a procedure for stratifying the proportional hazards model according to an unknown threshold of a stratification variable; this procedure relies on the goodness-of-fit test proposed earlier, and an exhaustive simulation study is conducted to evaluate its performance. In the second part of our work, we consider the stratified logrank test in a context of missing data, namely the situation where the strata cannot be observed for all individuals in the sample. We construct a weighted version of the stratified logrank test adapted to this problem and establish its asymptotic distribution under the null hypothesis of equality of the hazard functions in the different groups. The properties of this new test statistic are assessed using simulations, and the test is then applied to a medical data set.
118

A comparative study of permutation procedures

Van Heerden, Liske 30 November 1994 (has links)
The unique problems encountered when analyzing weather data sets - that is, measurements taken while conducting a meteorological experiment - have forced statisticians to reconsider the conventional analysis methods and investigate permutation test procedures. The problems encountered when analyzing weather data sets are simulated for a Monte Carlo study, and the results of the parametric and permutation t-tests are compared with regard to significance level, power, and the average confidence interval length. Seven population distributions are considered - three are variations of the normal distribution, and the others the gamma, the lognormal, the rectangular and empirical distributions. The normal distribution contaminated with zero measurements is also simulated. In those simulated situations in which the variances are unequal, the permutation test procedure was performed using other test statistics, namely the Scheffe, Welch and Behrens-Fisher test statistics. / Mathematical Sciences / M. Sc. (Statistics)
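A minimal two-sample permutation test on the difference of means, in the spirit of the procedures compared here (sample sizes, shift, and the number of permutations are illustrative assumptions):

```python
import numpy as np

def perm_test(x, y, n_perm=2000, rng=None):
    # Two-sample permutation test on the difference of means: the group
    # labels are repeatedly shuffled to build the null distribution of
    # the statistic, and the p-value is the fraction of shuffles whose
    # statistic is at least as extreme as the observed one.
    rng = rng or np.random.default_rng(0)
    obs = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        d = perm[:len(x)].mean() - perm[len(x):].mean()
        count += abs(d) >= abs(obs)
    return (count + 1) / (n_perm + 1)  # add-one correction avoids p = 0

rng = np.random.default_rng(3)
p_null = perm_test(rng.normal(0, 1, 25), rng.normal(0, 1, 25), rng=rng)
p_shift = perm_test(rng.normal(0, 1, 25), rng.normal(1.5, 1, 25), rng=rng)
# A clear location shift yields a very small p-value.
```

Because the null distribution is built from the data itself, the test's level does not rest on a normality assumption, which is what makes it attractive for the irregular weather data described above.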
120

O uso de quase U-estatísticas para séries temporais uni e multivariadas / The use of quasi U-statistics for univariate and multivariate time series

Valk, Marcio 17 August 2018 (has links)
Advisor: Aluísio de Souza Pinheiro / Doctoral thesis - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: Classification and clustering of time series are problems widely explored in the current literature. Many techniques have been presented to solve them; however, the necessary restrictions generally make the procedures specific and applicable only to a certain class of time series, and many of these approaches are empirical. We present methods for classification and clustering of time series based on quasi U-statistics (Pinheiro et al. (2009) and Pinheiro et al. (2010)). Metrics based on tools well known in the time series literature, including the sample autocorrelation and the periodogram, are used as kernels of the U-statistics. Three main situations are considered: univariate time series, multivariate time series, and time series with outliers. The asymptotic normality of the proposed tests is demonstrated for a wide class of metrics and models. The methods are also studied by simulation and illustrated by an application to a real data set. / Doctorate in Statistics
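A periodogram-based distance of the kind used as a U-statistic kernel here can be sketched as follows (the normalization and the test series are illustrative assumptions, not the thesis's exact metric):

```python
import numpy as np

def periodogram(x):
    # Periodogram of a centered series: squared DFT magnitudes over n.
    x = x - x.mean()
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

def spectral_distance(x, y):
    # A simple kernel for comparing two series: Euclidean distance
    # between their normalized periodograms.
    px, py = periodogram(x), periodogram(y)
    px, py = px / px.sum(), py / py.sum()
    return np.sqrt(((px - py) ** 2).sum())

rng = np.random.default_rng(4)
t = np.arange(256)
a = np.sin(2 * np.pi * 0.05 * t) + 0.3 * rng.normal(size=256)
b = np.sin(2 * np.pi * 0.05 * t) + 0.3 * rng.normal(size=256)
c = np.sin(2 * np.pi * 0.20 * t) + 0.3 * rng.normal(size=256)
# Series sharing the dominant frequency are closer in spectral distance
# than series with different dominant frequencies.
```

A distance like this, plugged into a quasi U-statistic, lets groups of series be compared through their second-order spectral structure rather than through raw values.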
