Global ETD Search

41	Essays on Inference in Linear Mixed Models Kramlinger, Peter 28 April 2020 (has links) No description available. 510 Small Area Estimation Marginal Inference Conditional Inference Multiple testing simultaneous inference Lasso Sparsity REML Variance components Mathematics (PPN61756535X)
42	Modèles de mélange semi-paramétriques et applications aux tests multiples / Semi-parametric mixture models and applications to multiple testing Nguyen, Van Hanh 01 October 2013 (has links) Dans un contexte de test multiple, nous considérons un modèle de mélange semi-paramétrique avec deux composantes. Une composante est supposée connue et correspond à la distribution des p-valeurs sous hypothèse nulle avec probabilité a priori p. L'autre composante f est nonparamétrique et représente la distribution des p-valeurs sous l'hypothèse alternative. Le problème d'estimer les paramètres p et f du modèle apparaît dans les procédures de contrôle du taux de faux positifs (« false discovery rate » ou FDR). Dans la première partie de cette dissertation, nous étudions l'estimation de la proportion p. Nous discutons de résultats d'efficacité asymptotique et établissons que deux cas différents arrivent suivant que f s'annule ou non sur tout un intervalle non-vide. Dans le premier cas (annulation sur tout un intervalle), nous présentons des estimateurs qui convergent à la vitesse paramétrique, calculons la variance asymptotique optimale et conjecturons qu'aucun estimateur n'est asymptotiquement efficace (i.e. atteint la variance asymptotique optimale). Dans le deuxième cas, nous prouvons que le risque quadratique de n'importe quel estimateur ne converge pas à la vitesse paramétrique. Dans la deuxième partie de la dissertation, nous nous concentrons sur l'estimation de la composante inconnue nonparamétrique f dans le mélange, en comptant sur un estimateur préliminaire de p. Nous proposons et étudions les propriétés asymptotiques de deux estimateurs différents pour cette composante inconnue. Le premier estimateur est un estimateur à noyau avec poids aléatoires. Nous établissons une borne supérieure pour son risque quadratique ponctuel, en montrant une vitesse de convergence nonparamétrique classique sur une classe de Hölder. 
Le deuxième estimateur est un estimateur du maximum de vraisemblance régularisée. Il est calculé par un algorithme itératif, pour lequel nous établissons une propriété de décroissance d'un critère. De plus, ces estimateurs sont utilisés dans une procédure de test multiple pour estimer le taux local de faux positifs (« local false discovery rate » ou lfdr). / In a multiple testing context, we consider a semiparametric mixture model with two components. One component is assumed to be known and corresponds to the distribution of p-values under the null hypothesis, with prior probability p. The other component f is nonparametric and represents the distribution of p-values under the alternative hypothesis. The problem of estimating the parameters p and f of the model arises in false discovery rate (FDR) control procedures. In the first part of this dissertation, we study the estimation of the proportion p. We discuss asymptotic efficiency results and establish that two different cases occur, depending on whether or not f vanishes on a non-empty interval. In the first case, we exhibit estimators converging at the parametric rate, compute the optimal asymptotic variance and conjecture that no estimator is asymptotically efficient (i.e. attains the optimal asymptotic variance). In the second case, we prove that the quadratic risk of any estimator does not converge at the parametric rate. In the second part of the dissertation, we focus on the estimation of the unknown nonparametric component f in the mixture, relying on a preliminary estimator of p. We propose two different estimators for this unknown component and study their asymptotic properties. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. The second estimator is a maximum smoothed likelihood estimator. 
It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Modèles de mélange Semi-paramétrique Tests multiple Semi-paramétrique Estimateurs à noyau Estimateurs par histogramme Mixture models Semi-parametric Multiple testing False discovery rate Kernel estimators Histogram based estimators
43	Méthodes pour l'analyse des champs profonds extragalactiques MUSE : démélange et fusion de données hyperspectrales ;détection de sources étendues par inférence à grande échelle / Methods for the analysis of extragalactic MUSE deep fields : hyperspectral unmixing and data fusion;detection of extented sources with large-scale inference Bacher, Raphael 08 November 2017 (has links) Ces travaux se placent dans le contexte de l'étude des champs profonds hyperspectraux produits par l'instrument d'observation céleste MUSE. Ces données permettent de sonder l'Univers lointain et d'étudier les propriétés physiques et chimiques des premières structures galactiques et extra-galactiques. La première problématique abordée dans cette thèse est l'attribution d'une signature spectrale pour chaque source galactique. MUSE étant un instrument au sol, la turbulence atmosphérique dégrade fortement le pouvoir de résolution spatiale de l'instrument, ce qui génère des situations de mélange spectral pour un grand nombre de sources. Pour lever cette limitation, des approches de fusion de données, s'appuyant sur les données complémentaires du télescope spatial Hubble et d'un modèle de mélange linéaire, sont proposées, permettant la séparation spectrale des sources du champ. Le second objectif de cette thèse est la détection du Circum-Galactic Medium (CGM). Le CGM, milieu gazeux s'étendant autour de certaines galaxies, se caractérise par une signature spatialement diffuse et de faible intensité spectrale. Une méthode de détection de cette signature par test d'hypothèses est développée, basée sur une stratégie de max-test sur un dictionnaire et un apprentissage des statistiques de test sur les données. Cette méthode est ensuite étendue pour prendre en compte la structure spatiale des sources et ainsi améliorer la puissance de détection tout en conservant un contrôle global des erreurs. 
Les codes développés sont intégrés dans la bibliothèque logicielle du consortium MUSE afin d'être utilisables par l'ensemble de la communauté. De plus, si ces travaux sont particulièrement adaptés aux données MUSE, ils peuvent être étendus à d'autres applications dans les domaines de la séparation de sources et de la détection de sources faibles et étendues. / This work takes place in the context of the study of hyperspectral deep fields produced by the European 3D spectrograph MUSE. These fields make it possible to explore the young, remote Universe and to study the physical and chemical properties of the first galactic and extragalactic structures. The first part of the thesis deals with the estimation of a spectral signature for each galaxy. As MUSE is a ground-based instrument, atmospheric turbulence strongly degrades its spatial resolution, which causes spectral mixing for a large number of sources. To overcome this limitation, data fusion approaches based on a linear mixing model and on complementary data from the Hubble Space Telescope are proposed, allowing the spectral separation of the sources. The second goal of this thesis is to detect the Circum-Galactic Medium (CGM). The CGM, a gaseous medium surrounding some galaxies, is characterized by a spatially extended, faint spectral signature. To detect this kind of signal, a hypothesis testing approach is proposed, based on a max-test strategy over a dictionary, with the distribution of the test statistics learned from the data. This method is then extended to better take into account the spatial structure of the targets, thus improving the detection power while still ensuring global error control. All these developments are integrated into the software library of the MUSE consortium so that they can be used by the astrophysical community. Moreover, although this work is particularly suited to MUSE data, it can be extended to other applications that require source separation or the detection of faint extended sources. 
Démélange spectral Fusion de données Hyperspectral Inférence à grande échelle Tests multiples Contrôle global d'erreurs Spectral unmixing Data fusion Hyperspectral Large-Scale inference Multiple testing Global error control 620
44	Inférence de graphes par une procédure de test multiple avec application en Neuroimagerie / Graph inference by multiple testing with application to Neuroimaging Roux, Marine 24 September 2018 (has links) Cette thèse est motivée par l’analyse des données issues de l’imagerie par résonance magnétique fonctionnelle (IRMf). La nécessité de développer des méthodes capables d’extraire la structure sous-jacente des données d’IRMf constitue un challenge mathématique attractif. À cet égard, nous modélisons les réseaux de connectivité cérébrale par un graphe et nous étudions des procédures permettant d’inférer ce graphe. Plus précisément, nous nous intéressons à l’inférence de la structure d’un modèle graphique non orienté par une procédure de test multiple. Nous considérons deux types de structure, à savoir celle induite par la corrélation et celle induite par la corrélation partielle entre les variables aléatoires. Les statistiques de tests basées sur ces deux dernières mesures sont connues pour présenter une forte dépendance et nous les supposerons être asymptotiquement gaussiennes. Dans ce contexte, nous analysons plusieurs procédures de test multiple permettant un contrôle des arêtes incluses à tort dans le graphe inféré. Dans un premier temps, nous questionnons théoriquement le contrôle du False Discovery Rate (FDR) de la procédure de Benjamini et Hochberg dans un cadre gaussien pour des statistiques de test non nécessairement positivement dépendantes. Nous interrogeons par suite le contrôle du FDR et du Family Wise Error Rate (FWER) dans un cadre gaussien asymptotique. Nous présentons plusieurs procédures de test multiple, adaptées aux tests de corrélations (resp. corrélations partielles), qui contrôlent asymptotiquement le FWER. 
Nous proposons de plus quelques pistes théoriques relatives au contrôle asymptotique du FDR. Dans un second temps, nous illustrons les propriétés des procédures contrôlant asymptotiquement le FWER à travers une étude sur simulation pour des tests basés sur la corrélation. Nous concluons finalement par l’extraction de réseaux de connectivité cérébrale sur données réelles. / This thesis is motivated by the analysis of functional magnetic resonance imaging (fMRI) data. The need for methods that extract the underlying structure of fMRI data gives rise to exciting new challenges for mathematics. In this regard, brain connectivity networks are modelled by a graph, and we study procedures that allow us to infer this graph. More precisely, we investigate the inference of the structure of an undirected graphical model by a multiple testing procedure. Two types of structure are considered: the one induced by correlation and the one induced by partial correlation between the random variables. The test statistics based on these two measures are known to be highly dependent, and we assume that they are asymptotically Gaussian. Within this framework, we study several multiple testing procedures that allow control of the edges wrongly included in the inferred graph. First, we theoretically examine the False Discovery Rate (FDR) control of the Benjamini and Hochberg procedure in a Gaussian setting for test statistics that are not necessarily positively dependent. Then, we explore both FDR and Family Wise Error Rate (FWER) control in an asymptotic Gaussian setting. We present several multiple testing procedures, well suited to correlation (resp. partial correlation) tests, which provide asymptotic control of the FWER. Furthermore, some first theoretical results regarding asymptotic FDR control are established. Second, the properties of the multiple testing procedures that asymptotically control the FWER are illustrated in a simulation study, for tests based on correlation. 
We finally conclude with the extraction of cerebral connectivity networks from a real data set. Test multiple Contrôle du FDR Réseaux de connectivité cérébrale Contrôle du FWER IRMf Procédure de Benjamini et Hochberg Multiple testing FDR control Brain connectivity networks FWER control fMRI Procedure of Benjamini and Hochberg 510 620
45	Stochastic modelling using large data sets : applications in ecology and genetics / Modélisation stochastique de grands jeux de données : applications en écologie et en génétique Coudret, Raphaël 16 September 2013 (has links) Deux parties principales composent cette thèse. La première d'entre elles est consacrée à la valvométrie, c'est-à-dire ici l'étude de la distance entre les deux parties de la coquille d'une huître au cours du temps. La valvométrie est utilisée afin de déterminer si de tels animaux sont en bonne santé, pour éventuellement tirer des conclusions sur la qualité de leur environnement. Nous considérons qu'un processus de renouvellement à quatre états sous-tend le comportement des huîtres étudiées. Afin de retrouver ce processus caché dans le signal valvométrique, nous supposons qu'une densité de probabilité reliée à ce signal est bimodale. Nous comparons donc plusieurs estimateurs qui prennent en compte ce type d'hypothèse, dont des estimateurs à noyau. Dans un second temps, nous comparons plusieurs méthodes de régression, dans le but d'analyser des données transcriptomiques. Pour comprendre quelles variables explicatives influent sur l'expression de gènes, nous avons réalisé des tests multiples grâce au modèle linéaire FAMT. La méthode SIR peut être envisagée pour trouver des relations non-linéaires. Toutefois, elle est principalement employée lorsque la variable à expliquer est univariée. Une version multivariée de cette approche a donc été développée. Le coût d'acquisition des données transcriptomiques pouvant être élevé, la taille n des échantillons correspondants est souvent faible. C'est pourquoi, nous avons également étudié la méthode SIR lorsque n est inférieur au nombre de variables explicatives p. / There are two main parts in this thesis. The first one concerns valvometry, which is here the study of the distance between both parts of the shell of an oyster, over time. 
The health status of oysters can be characterized using valvometry in order to obtain insights about the quality of their environment. We consider that a renewal process with four states underlies the behaviour of the studied oysters. Such a hidden process can be retrieved from a valvometric signal by assuming that a probability density function linked with this signal is bimodal. We then compare several estimators which take this assumption into account, including kernel density estimators. In another chapter, we compare several regression approaches, with the aim of analysing transcriptomic data. To understand which explanatory variables have an effect on gene expression, we apply a multiple testing procedure to these data, through the linear model FAMT. The SIR method may find nonlinear relations in such a context. It is, however, more commonly used when the response variable is univariate. A multivariate version of SIR was therefore developed. Procedures to measure gene expression can be expensive, so the sample size n of the corresponding datasets is often small. That is why we also studied SIR when n is less than the number of explanatory variables p. Données transcriptomiques Estimateur à noyau Processus de renouvellement Régression inverse par tranches Tests multiples Valvométrie Kernel density estimator Multiple testing Renewal process Sliced inverse regression Transcriptomics Valvometry
46	Étude des déterminants de la puissance statistique en spectrométrie de masse / Statistical power determinants in mass-spectrometry Jouve, Thomas 03 December 2009 (has links) La spectrométrie de masse fait partie des technologies haut débit et offre à ce titre un regard inédit, à une échelle nouvelle, sur les protéines contenues dans divers échantillons biologiques. Les études biomédicales utilisant cette technologie sont de plus en plus nombreuses et visent à détecter de nouveaux biomarqueurs de différents processus biologiques, notamment de processus pathologiques à l'origine de cancers. Cette utilisation comme outil de criblage pose des questions quant à la capacité même des expériences de spectrométrie de masse dans cette détection. La puissance statistique traduit cette capacité et rappelle que les études doivent être calibrées pour offrir des garanties suffisantes de succès. Toutefois, cette exploration de la puissance statistique en spectrométrie de masse n'a pas encore été réalisée. L'objet de cette thèse est précisément l'étude des déterminants de la puissance pour la détection de biomarqueurs en spectrométrie de masse. Une revue de la littérature a été réalisée, reprenant l'ensemble des étapes nécessaires du traitement du signal, afin de bien comprendre les techniques utilisées. Les méthodes statistiques disponibles pour l'analyse du signal ainsi traité sont revues et mises en perspective. Les situations de tests multiples, qui émergent notamment de ces données de spectrométrie de masse, suggèrent une redéfinition de la puissance, détaillée par la suite. La puissance statistique dépend du plan d'expérience. La taille d'échantillon, la répartition entre groupes étudiés et l'effet différentiel ont été investigués, par l'intermédiaire de simulations d'expériences de spectrométrie de masse. 
On retrouve ainsi les résultats classiques de la puissance, faisant notamment ressortir le besoin crucial d'augmenter la taille des études pour détecter des biomarqueurs, particulièrement lorsque ceux-ci présentent un faible effet différentiel. Au-delà de ces déterminants classiques de la puissance, des déterminants propres à la spectrométrie de masse apparaissent. Une chute importante de puissance est mise en évidence, due à l'erreur de mesure des technologies de spectrométrie de masse. Une synergie péjorative existe de plus entre erreur de mesure et procédure de contrôle du risque de première espèce de type FDR. D'autre part, les méthodes de détection des pics, par leurs imperfections (faux pics et pics manqués), induisent un contrôle suboptimal de ce risque de première espèce, conduisant à une autre chute de puissance. Ce travail de thèse met ainsi en évidence trois niveaux d'intervention possibles pour améliorer la puissance des études : la meilleure calibration des plans d'expérience, la minimisation de l'erreur de mesure et l'amélioration des algorithmes de prétraitement. La technologie même de spectrométrie de masse ne pourra conduire de façon fiable à la détection de nouveaux biomarqueurs qu'au prix d'un travail à ces trois niveaux. / Mass spectrometry (MS) is a high-throughput technology and therefore offers an original perspective, at a new scale, on the proteins contained in various biological samples. Biomedical studies using this technology are increasingly frequent. They aim at detecting new biomarkers of different biological processes, especially pathological processes leading to cancer. This use as a screening tool raises questions regarding the very detection effectiveness of MS experiments. Statistical power is the direct translation of this effectiveness and reminds us that calibrated studies are required to offer sufficient guarantees of success. However, this exploration of statistical power in mass spectrometry has not yet been performed. 
The theme of this work is precisely the study of power determinants for the detection of biomarkers in MS studies. A literature review was performed, summarizing all necessary pretreatment steps of the signal analysis, in order to understand the techniques used. Available statistical methods for the analysis of this pretreated signal are also reviewed and put into perspective. Multiple testing settings arising from MS data suggest a redefinition of power, which is then detailed. Statistical power depends on the study design. Sample size, group allocation and the differential effect were investigated through MS experiment simulations. Classical results on statistical power are recovered, with an emphasis on the crucial need to increase sample sizes for biomarker detection, especially when these markers show low differential effects. Beyond these classical power determinants, determinants specific to mass spectrometry appear. An important power drop is observed when taking into account the high measurement variability encountered in mass spectrometry. A detrimental synergy exists between measurement variability and type 1 error control procedures (e.g. FDR). Furthermore, the imperfections of peak detection methods (false and missed peaks) induce a sub-optimal control of this type 1 error, leading to another power drop. This work shows three possible intervention levels for improving power in MS studies: a better study design, minimisation of measurement variability and improvement of pretreatment algorithms. Only work at these three levels can guarantee reliable biomarker detection in these studies. Puissance statistique Protéomique Spectrométrie de masse Haut-débit Calibration Tests multiples Perte séquentielle de puissance Statistical power Proteomics Mass-spectrometry High-throughput Calibration Multiple testing Sequential power loss
47	Multiplicité des tests, et calculs de taille d'échantillon en recherche clinique / Multiplicity of tests, and sample size determination of clinical trials Riou, Jérémie 11 December 2013 (has links) Ce travail a eu pour objectif de répondre aux problématiques inhérentes aux tests multiples dans le contexte des essais cliniques. A l’heure actuelle un nombre croissant d’essais cliniques ont pour objectif d’observer l’effet multifactoriel d’un produit, et nécessitent donc l’utilisation de co-critères de jugement principaux. La significativité de l’étude est alors conclue si et seulement si nous observons le rejet d’au moins r hypothèses nulles parmi les m hypothèses nulles testées. Dans ce contexte, les statisticiens doivent prendre en compte la multiplicité induite par cette pratique. Nous nous sommes consacrés dans un premier temps à la recherche d’une correction exacte pour l’analyse des données et le calcul de taille d’échantillon pour r = 1. Puis nous avons travaillé sur le calcul de taille d’échantillon pour toutes valeurs de r, quand les procédures en une étape, ou les procédures séquentielles sont utilisées. Finalement nous nous sommes intéressés à la correction du degré de signification engendré par la recherche d’un codage optimal d’une variable explicative continue dans un modèle linéaire généralisé. / This work aimed to address the problems inherent to multiple testing in the context of clinical trials. Nowadays, in clinical research, it is increasingly common to define multiple co-primary endpoints in order to capture the multifactorial effect of a product. The study is then declared significant if and only if at least r of the m null hypotheses tested are rejected. In this context, statisticians need to take the resulting multiplicity into account. We first focused on an exact correction for multiple testing in data analysis and sample size computation when r = 1. 
Then we worked on sample size computation for any value of r, when single-step or stepwise procedures are used. Finally, we addressed the correction of the significance level arising from the search for an optimal coding of a continuous explanatory variable in a generalized linear model. Calcul de taille d’échantillon Co-critères de jugement principaux Essais cliniques Tests multiples Clinical trials Co-primary endpoints Multiple Testing Sample-size Computation
48	False Discovery Rates, Higher Criticism and Related Methods in High-Dimensional Multiple Testing Klaus, Bernd 09 January 2013 (has links) The technical advancements in genomics, functional magnetic resonance and other areas of scientific research seen in the last two decades have led to a burst of interest in multiple testing procedures. A driving factor for innovations in the field of multiple testing has been the problem of large-scale simultaneous testing. There, the goal is to uncover lower-dimensional signals from high-dimensional data. Mathematically speaking, this means that the dimension d is usually in the thousands while the sample size n is relatively small (max. 100 in general, often due to cost constraints), a characteristic commonly abbreviated as d >> n. In my thesis I look at several multiple testing problems and corresponding procedures from a false discovery rate (FDR) perspective, a methodology originally introduced in a seminal paper by Benjamini and Hochberg (1995). FDR analysis starts by fitting a two-component mixture model to the observed test statistics. This mixture consists of a null model density and an alternative component density from which the interesting cases are assumed to be drawn. In the thesis I propose a new approach to the estimation of false discovery rates, called log-FDR. Specifically, my new approach to truncated maximum likelihood estimation yields accurate null model estimates. This is complemented by constrained maximum likelihood estimation for the alternative density using log-concave density estimation. A recent competitor to the FDR is the method of "Higher Criticism". It has been strongly advocated in the context of variable selection in classification, which is deeply linked to multiple comparisons. Hence, I also look at variable selection in class prediction, which can be viewed as a special signal identification problem. 
Both FDR methods and Higher Criticism can be highly useful for signal identification. This is discussed in the context of variable selection in linear discriminant analysis (LDA), a popular classification method. FDR methods are not only useful for multiple testing situations in the strict sense; they are also applicable to related problems. I look at several kinds of applications of FDR in linear classification. I present and extend statistical techniques related to effect size estimation using false discovery rates and show how to use these for variable selection. The resulting fdr-effect method proposed for effect size estimation is shown to work as well as competing approaches while being conceptually simple and computationally inexpensive. Additionally, I apply the fdr-effect method to variable selection by minimizing the misclassification rate and show that it works very well and leads to compact and interpretable feature sets. info:eu-repo/classification/ddc/500 ddc:500
49	Novel Step-Down Multiple Testing Procedures Under Dependence Lu, Shihai 01 December 2014 (has links) No description available. Mathematics Statistics Multiple comparisons Multiple testing procedure Familywise error rate FWER k-FWER Step-down procedure Holm procedure Correlation Bivariate joint distribution
50	Multiple Testing Procedures for One- and Two-Way Classified Hypotheses Nandi, Shinjini January 2019 (has links) Multiple testing literature contains ample research on controlling false discoveries for hypotheses classified according to one criterion, which we refer to as 'one-way classified hypotheses'. However, one often encounters the scenario of 'two-way classified hypotheses', where hypotheses can be partitioned into two sets of groups via two different criteria. Associated multiple testing procedures that incorporate such structural information are potentially more effective than their one-way classified or non-classified counterparts. To the best of our knowledge, very little research has been pursued in this direction. This dissertation proposes two types of multiple testing procedures for two-way classified hypotheses. In the first part, we propose a general methodology for controlling the false discovery rate (FDR) using the Benjamini-Hochberg (BH) procedure based on weighted p-values. The weights can be appropriately chosen to reflect the one- or two-way classified structure of the hypotheses, producing novel multiple testing procedures for two-way classified hypotheses. New results for one-way classified hypotheses have been obtained in this process. Our proposed procedures control the FDR non-asymptotically in their oracle forms under positive regression dependence on the subset of null p-values (PRDS) and in their data-adaptive forms for independent p-values. Simulation studies demonstrate that our proposed procedures can be considerably more powerful than some contemporary methods in many instances and that our data-adaptive procedures can non-asymptotically control the FDR under certain dependent scenarios. The proposed two-way adaptive procedure is applied to a data set from a microbial abundance study, for which it makes more discoveries than an existing method. 
In the second part, we propose a local false discovery rate (Lfdr) based multiple testing procedure for two-way classified hypotheses. The procedure has been developed in its oracle form under a model-based framework that isolates the effects due to two-way grouping from the significance of an individual hypothesis. Simulation studies show that our proposed procedure successfully controls the average proportion of false discoveries, and is more powerful than existing methods. / Statistics Statistics Data-adaptive One-way Grouped BH Data-adaptive Two-way Grouped BH Lfdr Based Two-way Gate Multiple Testing One-way Grouped BH Two-way Grouped BH
