1 |
Search for Cosmic Sources of High Energy Neutrinos with the AMANDA-II Detector / Recherche de sources cosmiques de neutrinos à haute énergie avec le détecteur AMANDA-II. Labare, Mathieu, 26 January 2010 (has links)
AMANDA-II est un télescope à neutrinos composé d'un réseau tri-dimensionnel de senseurs optiques déployé dans la glace du Pôle Sud.
Son principe de détection repose sur la mise en évidence de particules secondaires chargées émises lors de l'interaction d'un neutrino de haute énergie (> 100 GeV) avec la matière environnant le détecteur, sur base de la détection de rayonnement Cerenkov.
Ce travail est basé sur les données enregistrées par AMANDA-II entre 2000 et 2006, afin de rechercher des sources cosmiques de neutrinos.
Le signal recherché est affecté d'un bruit de fond important de muons et de neutrinos issus de l'interaction du rayonnement cosmique primaire dans l'atmosphère. En se limitant à l'observation de l'hémisphère nord, le bruit de fond des muons atmosphériques, absorbés par la Terre, est éliminé.
Par contre, les neutrinos atmosphériques forment un bruit de fond irréductible constituant la majorité des 6100 événements sélectionnés pour cette analyse.
Il est cependant possible d'identifier une source ponctuelle de neutrinos cosmiques en recherchant un excès local se détachant du bruit de fond isotrope de neutrinos atmosphériques, couplé à une sélection basée sur l'énergie, dont le spectre est différent pour les deux catégories de neutrinos.
Une approche statistique originale est développée dans le but d'optimiser le pouvoir de détection de sources ponctuelles, tout en contrôlant le taux de fausses découvertes, donc le niveau de confiance d'une observation.
Cette méthode repose uniquement sur la connaissance de l'hypothèse de bruit de fond, sans aucune hypothèse sur le modèle de production de neutrinos par les sources recherchées. De plus, elle intègre naturellement la notion de facteur d'essai rencontrée dans le cadre de test d'hypothèses multiples. La procédure a été appliquée sur l'échantillon final d'évènements récoltés par AMANDA-II.
---------
AMANDA-II is a neutrino telescope comprising a three-dimensional array of optical sensors deployed in the ice at the South Pole.
Its principle rests on the detection of the Cherenkov radiation emitted by charged secondary particles produced by the interaction of a high energy neutrino (> 100 GeV) with the matter surrounding the detector.
This work is based on data recorded by the AMANDA-II detector between 2000 and 2006 in order to search for cosmic sources of neutrinos. A potential signal must be extracted from the overwhelming background of muons and neutrinos originating from the interaction of primary cosmic rays within the atmosphere.
The observation is limited to the northern hemisphere in order to be free of the atmospheric muon background, which is stopped by the Earth. However, atmospheric neutrinos constitute an irreducible background composing the main part of the 6100 events selected for this analysis.
It is nevertheless possible to identify a point source of cosmic neutrinos by looking for a local excess standing out from the isotropic background of atmospheric neutrinos.
This search is coupled with an energy-based selection, since the energy spectrum of cosmic neutrinos differs from that of the atmospheric neutrino background.
An original statistical approach has been developed in order to optimize the detection of point sources whilst controlling the false discovery rate, and hence the confidence level, of an observation. This method is based solely on knowledge of the background hypothesis, without any assumption on the neutrino production model of the sought sources. Moreover, the method naturally accounts for the trial factor inherent in multiple testing. The procedure was applied to the final sample of events collected by AMANDA-II.
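To make the idea of controlling the false discovery rate over many tested sky positions concrete, the sketch below applies the standard Benjamini-Hochberg step-up procedure to a vector of per-bin p-values. This is a generic illustration under the background-only hypothesis, not the signal-model-independent procedure developed in the thesis; the bin p-values `p_bins` and the target rate `q` are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Plain Benjamini-Hochberg step-up procedure, shown only to illustrate
    how FDR control over many tested sky bins absorbs the trial factor."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # largest k with p_(k) <= k*q/m; reject hypotheses with the k smallest p-values
    below = ranked <= (np.arange(1, m + 1) * q / m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

# e.g. p-values from a background-only test in each sky bin (hypothetical):
# rejected = benjamini_hochberg(p_bins, q=0.1)
```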
|
2 |
Detecting differentially expressed genes while controlling the false discovery rate for microarray data. Jiao, Shuo. January 2009 (has links)
Thesis (Ph.D.)--University of Nebraska-Lincoln, 2009. / Title from title screen (site viewed March 2, 2010). PDF text: 100 p. : col. ill. ; 953 K. UMI publication number: AAT 3379821. Includes bibliographical references. Also available in microfilm and microfiche formats.
|
3 |
An adaptive single-step FDR controlling procedure. Iyer, Vishwanath, January 2010 (has links)
This research is focused on identifying a single-step procedure that, upon adapting to the data through estimating the unknown parameters, would asymptotically control the False Discovery Rate when testing a large number of hypotheses simultaneously, and exploring some of the characteristics of this procedure. / Statistics
|
4 |
Regaining control of false findings in feature selection, classification, and prediction on neuroimaging and genomics data. January 2018 (has links)
The technological advances of past decades have led to the accumulation of large amounts of genomic and neuroimaging data, enabling novel strategies in precision medicine. These largely rely on machine learning algorithms and modern statistical methods for big biological datasets, which are data-driven rather than hypothesis-driven. These methods often lack guarantees on the validity of the research findings. Because it can be a matter of life and death when computational methods are deployed in clinical practice, establishing guarantees on the validity of the results is essential for the advancement of precision medicine. This thesis proposes several novel sparse regression and sparse canonical correlation analysis techniques, which by design include guarantees on the false discovery rate in variable selection. Variable selection on biomedical data is essential for many areas of healthcare, including precision medicine, population stratification, drug development, and predictive modeling of disease phenotypes. Predictive machine learning models can directly affect the patient when used to aid diagnosis, and therefore they need to be thoroughly evaluated before deployment. We present a novel approach to validly reuse the test data for performance evaluation of predictive models. The proposed methods are validated in applications to large genomic and neuroimaging datasets, where they confirm results from previous studies and also lead to new biological insights. In addition, this work puts a focus on making the proposed methods widely available to the scientific community through the release of free and open-source scientific software. / Alexej Gossmann
|
5 |
Topics in multiple hypotheses testing. Qian, Yi, 25 April 2007 (has links)
It is common to test many hypotheses simultaneously in applications of statistics. The probability of making a false discovery grows with the number of statistical tests performed. When all the null hypotheses are true and the test statistics are independent and continuous, the error rates from the family-wise error rate (FWER)- and false discovery rate (FDR)-controlling procedures are equal to the nominal level. When some of the null hypotheses are not true, both procedures are conservative. In the first part of this study, we review the background of the problem and propose methods to estimate the number of true null hypotheses. The estimates can be used in FWER- and FDR-controlling procedures with a consequent increase in power. We conduct simulation studies and apply the estimation methods to data sets with biological or clinical significance.

In the second part of the study, we propose a mixture model approach for the analysis of ChIP-chip high-density oligonucleotide array data to study the interactions between proteins and DNA. If we could identify the specific locations where proteins interact with DNA, we could increase our understanding of many important cellular events. Most experiments to date are performed in culture on cell lines, bacteria, or yeast; future experiments will include those in developing tissues, organs, or cancer biopsies, which are critical to understanding the function of genes and proteins. Here we investigate the ChIP-chip data structure and use a beta-mixture model to help identify the binding sites. To determine the appropriate number of components in the mixture model, we suggest the Anderson-Darling test. Our study indicates that it is a reasonable means of choosing the number of components in a beta-mixture model. The mixture model procedure has broad applications in biology and is illustrated with several data sets from bioinformatics experiments.
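As a concrete, if simplified, illustration of how an estimate of the number of true null hypotheses can be folded into an FDR-controlling procedure to gain power, the sketch below uses a Storey-type estimator of the null proportion and plugs it into the Benjamini-Hochberg threshold. The estimator, the tuning constant `lam`, and the plug-in step are standard textbook choices assumed here for illustration; they are not the specific estimators proposed in the thesis.

```python
import numpy as np

def storey_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses:
    p-values above lam come mostly from true nulls, which are uniform."""
    p = np.asarray(p_values)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

def adaptive_bh(p_values, q=0.05, lam=0.5):
    """Plug the estimated number of true nulls into Benjamini-Hochberg:
    using m0_hat instead of m sharpens the threshold and increases power
    when many nulls are false."""
    p = np.asarray(p_values)
    m = p.size
    m0_hat = max(1.0, storey_pi0(p, lam) * m)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) * q / m0_hat)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject
```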
|
6 |
RECOVERING SPARSE DIFFERENCES BETWEEN TWO HIGH-DIMENSIONAL COVARIANCE MATRICES. ALHARBI, YOUSEF S., 19 July 2017 (has links)
No description available.
|
7 |
Multiple Testing in Grouped Dependent Data. Clements, Nicolle, January 2013 (has links)
This dissertation is focused on multiple testing procedures to be used in data that are naturally grouped or possess a spatial structure. We propose a 'Two-Stage' procedure to control the False Discovery Rate (FDR) in situations where one-sided hypothesis testing is appropriate, such as astronomical source detection. Similarly, we propose a 'Three-Stage' procedure to control the mixed directional False Discovery Rate (mdFDR) in situations where two-sided hypothesis testing is appropriate, such as vegetation monitoring in remote sensing NDVI data. The Two- and Three-Stage procedures have provable FDR/mdFDR control under certain dependence situations. We also present adaptive versions, which are examined in simulation studies. The 'Stages' refer to testing hypotheses both group-wise and individually, motivated by the belief that the dependencies among the p-values associated with the spatially oriented hypotheses occur more locally than globally. Thus, these 'Staged' procedures test hypotheses in groups that incorporate the local, unknown dependencies of neighboring p-values. If a group is found significant, the individual p-values within that group are investigated further. For the vegetation monitoring data, we extend the investigation by providing spatio-temporal models and forecasts for some regions where significant change was detected through the multiple testing procedure. / Statistics
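The group-then-individual idea can be sketched as follows: screen groups using a combined per-group p-value, then test the individual hypotheses only inside the groups that survive. The Simes combination, the BH steps, and the thresholds used below are illustrative assumptions; the thesis's Two-Stage and Three-Stage procedures use their own group-level statistics and carry the stated FDR/mdFDR guarantees under dependence.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

def two_stage_group_bh(p_values, groups, q=0.05):
    """Illustrative two-stage selection: Stage 1 screens groups with BH on a
    Simes-combined group p-value; Stage 2 applies BH to the individual
    p-values pooled over the surviving groups."""
    p_values = np.asarray(p_values)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    # Simes-type combination as the group-level p-value
    group_p = []
    for g in group_ids:
        pg = np.sort(p_values[groups == g])
        group_p.append(np.min(pg * len(pg) / np.arange(1, len(pg) + 1)))
    keep_group, _, _, _ = multipletests(group_p, alpha=q, method="fdr_bh")
    # Stage 2: BH within the pooled p-values of the selected groups
    selected_groups = set(group_ids[keep_group])
    mask = np.array([g in selected_groups for g in groups])
    discoveries = np.zeros(p_values.shape, dtype=bool)
    if mask.any():
        rej, _, _, _ = multipletests(p_values[mask], alpha=q, method="fdr_bh")
        discoveries[mask] = rej
    return discoveries
```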
|
8 |
Falso positivo na performance dos fundos de investimento com gestão ativa no Brasil: mensurando sorte dos gestores nos alfas estimados. Jesus, Marcelo de, 01 February 2011 (has links)
This study investigates, for the period between 2002 and 2009, the impact of luck on the performance of those managers of actively managed Brazilian equity mutual funds who beat their benchmark. To that end, a new method, the False Discovery Rate (FDR) approach, was used to test this impact empirically. To measure luck and bad luck precisely, that is, the frequency of false positives (Type I errors) in the tails of the cross-sectional t-distribution associated with the funds' alphas, this approach was applied to assess, in aggregate form, the skill of managers of actively managed stock funds in Brazil. The FDR approach offers a simple and objective method for estimating the proportions of skilled funds (with a positive alpha), zero-alpha funds, and unskilled funds (with a negative alpha) across the population. Applying the FDR technique, the study finds that the majority of funds were zero-alpha, followed by truly unskilled funds, with only a small proportion of truly skilled funds. / Esta pesquisa investiga, para o período entre 2002 e 2009, qual o impacto da sorte na performance dos gestores de fundos de investimentos em ações com gestão ativa no Brasil que superam o seu benchmark. Para tanto, foi usado um novo método, a abordagem False Discovery Rate - FDR para testar empiricamente esse impacto. Para mensurar precisamente sorte e azar, ou seja, a freqüência de falsos positivos (erros do tipo I) nas caudas do cross-section da distribuição t associadas aos alfas dos fundos da amostra, foi aplicada essa nova abordagem para mensurar de forma agrupada a habilidade dos gestores de fundos de ações com gestão ativa no Brasil. A abordagem FDR oferece um método simples e objetivo para estimar a proporção de fundos habilidosos (com um alfa positivo), fundos de alfa-zero, e fundos não habilidosos (com um alfa negativo) em toda a população. Aplicando-se a técnica FDR, encontrou-se como resultado da pesquisa que a maioria dos fundos foram alfa-zero, seguida pelos fundos verdadeiramente não habilidosos, e apenas uma pequena proporção de fundos verdadeiramente habilidosos.
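A minimal sketch of the kind of FDR-based decomposition described above, splitting a cross-section of fund alpha t-statistics into estimated proportions of zero-alpha, skilled, and unskilled funds. The Storey-type null-proportion estimator, the significance level `gamma`, and the tail correction are assumptions made for illustration; they follow the general logic of the approach rather than the exact implementation used in the dissertation.

```python
import numpy as np
from scipy import stats

def fdr_fund_proportions(t_stats, df, lam=0.5, gamma=0.05):
    """Decompose fund alphas into zero-alpha, skilled (alpha > 0), and
    unskilled (alpha < 0) proportions by correcting each tail for the
    false positives expected from zero-alpha funds."""
    t_stats = np.asarray(t_stats)
    p_vals = 2 * stats.t.sf(np.abs(t_stats), df)          # two-sided p-values
    # Storey-type estimate of the proportion of zero-alpha funds
    pi0 = min(1.0, np.mean(p_vals > lam) / (1 - lam))
    t_crit = stats.t.ppf(1 - gamma / 2, df)
    # observed fractions of funds significant in each tail
    frac_pos = np.mean(t_stats > t_crit)
    frac_neg = np.mean(t_stats < -t_crit)
    # subtract the expected false-positive fraction contributed by zero-alpha funds
    skilled = max(0.0, frac_pos - pi0 * gamma / 2)
    unskilled = max(0.0, frac_neg - pi0 * gamma / 2)
    return pi0, skilled, unskilled
```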
|
9 |
Regularized methods for high-dimensional and bi-level variable selection. Breheny, Patrick John, 01 July 2009 (has links)
Many traditional approaches cease to be useful when the number of variables is large in comparison with the sample size. Penalized regression methods have proved to be an attractive approach, both theoretically and empirically, for dealing with these problems. This thesis focuses on the development of penalized regression methods for high-dimensional variable selection. The first part of this thesis deals with problems in which the covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. I introduce a framework for grouped penalization that encompasses the previously proposed group lasso and group bridge methods, sheds light on the behavior of grouped penalties, and motivates the proposal of a new method, group MCP.
The second part of this thesis develops fast algorithms for fitting models with complicated penalty functions such as grouped penalization methods. These algorithms combine the idea of local approximation of penalty functions with recent research into coordinate descent algorithms to produce highly efficient numerical methods for fitting models with complicated penalties. Importantly, I show these algorithms to be both stable and linear in the dimension of the feature space, allowing them to be efficiently scaled up to very large problems.
In the third part of this thesis, I extend the idea of false discovery rates to penalized regression. The Karush-Kuhn-Tucker conditions describing penalized regression estimates provide testable hypotheses involving partial residuals. I use these hypotheses to connect the previously disparate fields of multiple comparisons and penalized regression, develop estimators for the false discovery rates of methods such as the lasso and elastic net, and establish theoretical results.
Finally, the methods from all three sections are studied in a number of simulations and applied to real data from gene expression and genetic association studies.
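As a concrete illustration of the coordinate descent idea from the second part, the sketch below fits a plain lasso by cycling soft-thresholding updates over the coordinates. The fixed number of sweeps, the absence of a convergence check, and the omitted intercept are simplifying assumptions; the thesis's algorithms combine such updates with local approximations of the more complicated grouped and nonconvex penalties.

```python
import numpy as np

def soft_threshold(z, gamma):
    # closed-form solution of a one-dimensional lasso problem
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Plain-lasso coordinate descent: update each coefficient by
    soft-thresholding its correlation with the partial residual."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta                      # current residual
    col_ss = (X ** 2).sum(axis=0) / n     # per-feature scale x_j'x_j / n
    for _ in range(n_sweeps):
        for j in range(p):
            r = r + X[:, j] * beta[j]     # remove feature j's contribution
            zj = X[:, j] @ r / n          # partial-residual correlation
            beta[j] = soft_threshold(zj, lam) / col_ss[j]
            r = r - X[:, j] * beta[j]     # add updated contribution back
    return beta
```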
|
10 |
Marginal false discovery rate approaches to inference on penalized regression models. Miller, Ryan, 01 August 2018 (has links)
Data containing a large number of variables are becoming increasingly common, and sparsity-inducing penalized regression methods, such as the lasso, have become a popular analysis tool for these datasets because of their ability to perform variable selection naturally. However, quantifying the importance of the variables selected by these models is a difficult task. These difficulties are compounded by the tendency of the most predictive models, for example those chosen using procedures like cross-validation, to include substantial numbers of noise variables with no real relationship to the outcome. To address the task of performing inference on penalized regression models, this thesis proposes false discovery rate approaches for a broad class of penalized regression models. This work includes the development of an upper bound on the number of noise variables in a model, as well as local false discovery rate approaches that quantify the likelihood of each individual selection being a false discovery. These methods are applicable to a wide range of penalties, such as the lasso, elastic net, SCAD, and MCP; to a wide range of models, including linear regression, generalized linear models, and Cox proportional hazards models; and are also extended to the group regression setting under the group lasso penalty. In addition to studying these methods in numerous simulation studies, their practical utility is demonstrated using real data from several high-dimensional genome-wide association studies.
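One crude way to see what a marginal false discovery rate estimate is getting at is to compare the number of features the lasso selects on the real response with the number it selects on permuted responses, where every selection is necessarily a false discovery. The permutation scheme, the fixed penalty `lam`, and the use of scikit-learn's `Lasso` are assumptions made for this sketch; the thesis develops analytic (local) mFDR estimators for a much wider class of penalties and models rather than relying on permutation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def permutation_mfdr(X, y, lam, n_perm=20, rng=None):
    """Rough permutation estimate of the expected number of noise features
    selected by the lasso at a fixed penalty, and hence of a marginal
    false-discovery proportion."""
    rng = np.random.default_rng(rng)
    fit = Lasso(alpha=lam).fit(X, y)
    n_selected = int(np.sum(fit.coef_ != 0))
    # refit on permuted responses, where every selection is a false one
    null_counts = []
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        null_fit = Lasso(alpha=lam).fit(X, y_perm)
        null_counts.append(np.sum(null_fit.coef_ != 0))
    expected_false = float(np.mean(null_counts))
    mfdr = expected_false / max(n_selected, 1)
    return n_selected, expected_false, mfdr
```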
|