31

A RESAMPLING BASED APPROACH IN EVALUATION OF DOSE-RESPONSE MODELS

Fu, Min January 2014 (has links)
In this dissertation, we propose a computational approach using a resampling-based permutation test as an alternative to MCP-Mod (a hybrid framework integrating the multiple comparison procedure and the modeling technique) and gMCP-Mod (generalized MCP-Mod) [11], [29] in the step of identifying significant dose-response signals via model selection. We name our proposed approach RMCP-Mod or gRMCP-Mod, correspondingly. RMCP-Mod/gRMCP-Mod recasts the drug dose comparisons as a dose-response model selection problem via multiple hypothesis testing, an area where little extended research has been done, and solves it using resampling-based multiple testing procedures [38]. The proposed approach avoids the inclusion of the prior dose-response knowledge known as "guesstimates" used in the model selection step of the MCP-Mod/gMCP-Mod framework, and therefore reduces the uncertainty in identifying significant models. When a new drug is being developed to treat patients with a specified disease, one of the key steps is to discover an optimal drug dose or doses that would produce the desired clinical effect with an acceptable level of toxicity. In order to find such a dose or doses (different doses may be able to produce the same or better clinical effect with similarly acceptable toxicity), the underlying dose-response signals need to be identified and thoroughly examined through statistical analyses. A dose-response signal refers to the fact that a drug has different clinical effects at different quantitative dose levels. Statistically speaking, the dose-response signal is a numeric relationship curve (shape) between drug doses and the clinical effects in quantitative measures. It has often been a challenge to find correct and accurate efficacy and/or safety dose-response signals that best describe the dose-effect relationship in the drug development process via conventional statistical methods, because the conventional methods tend either to focus on a fixed, small number of quantitative dosages or to evaluate multiple pre-defined dose-response models without Type I error control. In searching for more efficient methods, a framework combining both the multiple comparisons procedure (MCP) and model-based (Mod) techniques, known by the acronym MCP-Mod, was developed by F. Bretz, J. C. Pinheiro, and M. Branson [11] to handle normally distributed, homoscedastic dose-response observations. Subsequently, a generalized version of MCP-Mod named gMCP-Mod, which can additionally deal with binary, count, or time-to-event dose-response data as well as repeated measurements over time, was developed by J. C. Pinheiro, B. Bornkamp, E. Glimm and F. Bretz [29]. MCP-Mod/gMCP-Mod uses the "guesstimates" in the MCP step to pre-specify parameters of the candidate models; however, in situations where prior knowledge of the dose-response information is difficult to obtain, uncertainty can be introduced into the model selection process, affecting the correctness of the model identification. Based on its application to hypothetical and real study examples, as well as simulation comparisons with MCP-Mod/gMCP-Mod, the proposed RMCP-Mod/gRMCP-Mod approach appears to be a viable method for use in practice, although further improvements and research are still needed for applications to broader dose-response data types. / Statistics
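A minimal sketch of the resampling idea (not the author's RMCP-Mod implementation; the model names and contrast weights below are purely illustrative assumptions): each candidate dose-response shape is scored by a contrast statistic on the observed group means, and a multiplicity-adjusted reference distribution is obtained by permuting responses across dose groups and tracking the maximum statistic over all candidates, so no "guesstimate" parameters are needed for error control.

```python
import numpy as np

def max_t_permutation_test(doses, y, contrasts, n_perm=5000, seed=0):
    """Single-step max-statistic permutation test over candidate dose-response contrasts.

    doses     : 1-D array giving each subject's dose level
    y         : 1-D array of responses
    contrasts : dict mapping a candidate model name to per-dose contrast weights
                (one weight per unique dose, summing to zero)
    Returns, for each candidate, its observed statistic and an adjusted p-value.
    """
    rng = np.random.default_rng(seed)
    levels = np.unique(doses)

    def contrast_stats(resp):
        means = np.array([resp[doses == d].mean() for d in levels])
        return {name: float(np.dot(w, means)) for name, w in contrasts.items()}

    observed = contrast_stats(y)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # permute responses across dose groups, record the max statistic over candidates
        max_null[b] = max(contrast_stats(rng.permutation(y)).values())

    return {name: (t, float((1 + np.sum(max_null >= t)) / (1 + n_perm)))
            for name, t in observed.items()}

# Hypothetical example: 4 dose groups, linear and Emax-like candidate shapes
rng = np.random.default_rng(1)
doses = np.repeat([0.0, 0.5, 1.0, 2.0], 20)
y = 0.4 * doses + rng.normal(size=doses.size)
candidates = {
    "linear":    np.array([-0.61, -0.35, -0.09, 1.05]),
    "emax-like": np.array([-0.87,  0.14,  0.33, 0.40]),
}
print(max_t_permutation_test(doses, y, candidates))
```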
32

On Group-Sequential Multiple Testing Controlling Familywise Error Rate

Fu, Yiyong January 2015 (has links)
The importance of multiplicity adjustment has gained wide recognition in modern scientific research. Without it, there will be too many spurious results and reproducibility becomes an issue; with it, if overly conservative, discoveries will be made more difficult. In the current literature on repeated testing of multiple hypotheses, Bonferroni-based methods are still the main vehicle carrying the bulk of multiplicity adjustment. There is room for power improvement by suitably utilizing both hypothesis-wise and analysis-wise dependencies. This research contributes to the development of a natural group-sequential extension of the classical stepwise multiple testing procedures, such as Dunnett's step-down and Hochberg's step-up procedures. It is shown that the proposed group-sequential procedures strongly control the familywise error rate while being more powerful than the recently developed class of group-sequential Bonferroni-Holm procedures. In particular, a convexity property is discovered for the distribution of the maxima of pairwise null P-values when the underlying test statistics follow distributions such as the bivariate normal, t, Gamma, F, or Archimedean copulas. This property lends itself to immediate use in improving Holm's procedure by incorporating pairwise dependencies of P-values. The improved Holm procedure, like all step-down multiple testing procedures, can also be naturally extended to the group-sequential setting. / Statistics
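For context, the classical Holm step-down procedure that the thesis improves upon can be sketched as follows; the convexity-based, dependence-aware improvement and its group-sequential extension are not reproduced here.

```python
import numpy as np

def holm_stepdown(pvalues, alpha=0.05):
    """Classical Holm step-down procedure controlling the familywise error rate.

    Sorting the p-values ascending, H_(i) is rejected as long as
    p_(i) <= alpha / (m - i + 1); testing stops at the first failure.
    Returns a boolean rejection vector in the original order.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(order):
        if p[idx] <= alpha / (m - i):
            reject[idx] = True
        else:
            break  # step-down: once one hypothesis is retained, stop
    return reject

print(holm_stepdown([0.001, 0.015, 0.04, 0.30]))  # -> [ True  True False False]
```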
33

Examining the Profitability of Technical Analysis with White's Reality Check and Stepwise Multiple Testing in Different Stock Markets

俞海慶, Yu, Hai Cing Unknown Date (has links)
Across five indices (DJIA, NASDAQ, S&P 500, NIKKEI 225, TAIEX) from 1989 to 2008, some technical trading rules can indeed beat the broad market even after White's reality check and the stepwise multiple test are used to address the data-snooping problem. In less mature markets, however, and in earlier sample periods, I cannot find a strong relation between these markets and excess returns. Moreover, the learning strategies usually do not perform better than the simple ones, which suggests that applying the rule with the best past record to forecast the future may not be a good idea. I also find that it is more likely to beat the buy-and-hold strategy in a bear market than in a steady bull market.
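A heavily simplified reality-check-style sketch (assuming IID resampling rather than the stationary bootstrap used in White's actual test, and hypothetical input data): the maximum mean excess return over all candidate rules is compared with its recentred bootstrap distribution, which guards against data snooping across the rule universe.

```python
import numpy as np

def reality_check_iid(excess_returns, n_boot=2000, seed=0):
    """Simplified reality-check-style test (IID bootstrap, not White's stationary bootstrap).

    excess_returns : (T, K) array of rule-minus-benchmark returns for K trading rules
    Tests H0: no rule has positive expected excess return, via the max mean statistic.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(excess_returns, dtype=float)
    T, K = x.shape
    means = x.mean(axis=0)
    v_obs = np.sqrt(T) * means.max()

    v_boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, T, size=T)                               # resample days with replacement
        v_boot[b] = np.sqrt(T) * (x[idx].mean(axis=0) - means).max()   # recentred bootstrap statistic
    pvalue = float(np.mean(v_boot >= v_obs))
    return v_obs, pvalue

# Hypothetical usage: T = 1000 days, K = 5 candidate rules with no true edge
rng = np.random.default_rng(1)
print(reality_check_iid(rng.normal(0, 0.01, size=(1000, 5))))
```

Here `excess_returns` would hold, for example, the daily returns of moving-average or channel-breakout rules minus the buy-and-hold benchmark.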
34

False Discovery Rates, Higher Criticism and Related Methods in High-Dimensional Multiple Testing

Klaus, Bernd 16 January 2013 (has links) (PDF)
The technical advancements in genomics, functional magnetic resonance imaging and other areas of scientific research seen in the last two decades have led to a burst of interest in multiple testing procedures. A driving factor for innovations in the field of multiple testing has been the problem of large-scale simultaneous testing. There, the goal is to uncover lower-dimensional signals from high-dimensional data. Mathematically speaking, this means that the dimension d is usually in the thousands while the sample size n is relatively small (at most around 100 in general, often due to cost constraints), a characteristic commonly abbreviated as d >> n. In my thesis I look at several multiple testing problems and corresponding procedures from a false discovery rate (FDR) perspective, a methodology originally introduced in a seminal paper by Benjamini and Hochberg (1995). FDR analysis starts by fitting a two-component mixture model to the observed test statistics. This mixture consists of a null model density and an alternative component density from which the interesting cases are assumed to be drawn. In the thesis I propose a new approach, called log-FDR, to the estimation of false discovery rates. Specifically, my new approach to truncated maximum likelihood estimation yields accurate null model estimates. This is complemented by constrained maximum likelihood estimation for the alternative density using log-concave density estimation. A recent competitor to the FDR is the method of "Higher Criticism". It has been strongly advocated in the context of variable selection in classification, which is deeply linked to multiple comparisons. Hence, I also look at variable selection in class prediction, which can be viewed as a special signal identification problem. Both FDR methods and Higher Criticism can be highly useful for signal identification. This is discussed in the context of variable selection in linear discriminant analysis (LDA), a popular classification method. FDR methods are not only useful for multiple testing situations in the strict sense; they are also applicable to related problems. I look at several kinds of applications of FDR in linear classification. I present and extend statistical techniques related to effect size estimation using false discovery rates and show how to use these for variable selection. The resulting fdr-effect method proposed for effect size estimation is shown to work as well as competing approaches while being conceptually simple and computationally inexpensive. Additionally, I apply the fdr-effect method to variable selection by minimizing the misclassification rate and show that it works very well and leads to compact and interpretable feature sets.
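The two-component mixture idea behind FDR analysis can be illustrated with a crude local false discovery rate sketch (a theoretical N(0,1) null component plus a kernel estimate of the marginal density; this is only an illustration of the mixture idea, not the truncated-ML/log-concave estimation proposed in the thesis):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def local_fdr(z, pi0=1.0):
    """Crude local false discovery rate under a two-component mixture:
    fdr(z) = pi0 * f0(z) / f(z), capped at 1, with f0 the theoretical N(0,1)
    null density and f a kernel estimate of the marginal density of the z-scores.
    """
    z = np.asarray(z, dtype=float)
    f = gaussian_kde(z)(z)   # estimated marginal density of the observed scores
    f0 = norm.pdf(z)         # theoretical null component
    return np.minimum(pi0 * f0 / f, 1.0)

# Hypothetical z-scores: mostly null, a few shifted alternatives
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 950), rng.normal(3, 1, 50)])
fdr = local_fdr(z)
print((fdr < 0.2).sum(), "scores flagged at local fdr < 0.2")
```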
35

Statistical Multiscale Segmentation: Inference, Algorithms and Applications

Sieling, Hannes 22 January 2014 (has links)
No description available.
36

Stochastic modelling using large data sets: applications in ecology and genetics

Coudret, Raphaël 16 September 2013 (has links) (PDF)
There are two main parts in this thesis. The first one concerns valvometry, which is here the study, over time, of the distance between the two parts of an oyster's shell. The health status of oysters can be characterized using valvometry in order to obtain insights about the quality of their environment. We consider that a renewal process with four states underlies the behaviour of the studied oysters. Such a hidden process can be retrieved from a valvometric signal by assuming that a certain probability density function linked with this signal is bimodal. We then compare several estimators which take this assumption into account, including kernel density estimators. In another chapter, we compare several regression approaches aimed at analysing transcriptomic data. To understand which explanatory variables have an effect on gene expressions, we apply a multiple testing procedure to these data through the linear model FAMT. The SIR method may find nonlinear relations in such a context. It is, however, more commonly used when the response variable is univariate. A multivariate version of SIR was therefore developed. Procedures to measure gene expressions can be expensive, so the sample size n of the corresponding datasets is often small. That is why we also studied SIR when n is less than the number of explanatory variables p.
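The bimodality assumption on the valvometric signal can be illustrated with a small kernel density sketch (hypothetical valve-opening values in millimetres, not the actual oyster data):

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelmax

def density_modes(openings, grid_size=512):
    """Estimate the density of valve-opening distances with a Gaussian kernel
    and locate its local maxima (a bimodal estimate suggests open/closed states).
    """
    openings = np.asarray(openings, dtype=float)
    grid = np.linspace(openings.min(), openings.max(), grid_size)
    density = gaussian_kde(openings)(grid)
    modes = argrelmax(density)[0]
    return grid[modes], density[modes]

# Hypothetical signal: a 'closed' state near 0.5 mm and an 'open' state near 4 mm
rng = np.random.default_rng(0)
openings = np.concatenate([rng.normal(0.5, 0.1, 2000), rng.normal(4.0, 0.6, 3000)])
print(density_modes(openings)[0])
```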
37

Interfaces between Bayesian and Frequentist Multiple Testing

Chang, Shih-Han January 2015 (has links)
This thesis investigates frequentist properties of Bayesian multiple testing procedures in a variety of scenarios and depicts the asymptotic behavior of Bayesian methods. Both Bayesian and frequentist approaches to multiplicity control are studied and compared, with special focus on understanding the multiplicity control behavior in situations of dependence between test statistics. Chapter 2 examines a problem of testing mutually exclusive hypotheses with dependent data. The Bayesian approach is shown to have excellent frequentist properties and is argued to be the most effective way of obtaining frequentist multiplicity control without sacrificing power. Chapter 3 further generalizes the model so that multiple signals are acceptable, and depicts the asymptotic behavior of the false positive rate and the expected number of false positives. Chapter 4 considers the problem of dealing with a sequence of different trials concerning some medical or scientific issue, and discusses the possibilities for multiplicity control of the sequence. Chapter 5 addresses issues and efforts in reconciling frequentist and Bayesian approaches in sequential endpoint testing. We consider the conditional frequentist approach in sequential endpoint testing and show several examples in which Bayesian and frequentist methodologies cannot be made to match. / Dissertation
38

Detection of quasi-point sources in massive data fields

Meillier, Céline 15 October 2015 (has links)
Detecting the faintest galaxies in the hyperspectral MUSE data is particularly challenging: they have a small spatial extension and a very sparse spectrum containing only one narrow emission line, whose position in the spectral range is unknown and depends on the distance of the galaxy. Moreover, their signal-to-noise ratio is very low. These galaxies are modelled as quasi-point sources in the three dimensions of the data cube. We propose a method for the detection of a galaxy configuration based on a marked point process in a nonparametric Bayesian framework. A galaxy is modelled by a point (its position in the spatial domain), and marks (geometrical and spectral features) are added to transform a point into an object. This representation stays close to the physical phenomenon and avoids pixel-wise approaches, which are penalized by the massive dimensions of the data (300 x 300 x 3600 pixels). The fully Bayesian framework leads to a general and robust algorithm in which the parameters of the objects are estimated in a fully data-driven way and the detection of the objects of interest is carried out jointly.
Preprocessing strategies are designed to tackle the massive dimensions of the data and the complexity of the detection problem: they restrict the exploration of the data to areas that probably contain sources. Multiple testing approaches are used to build a proposition map, which also defines the intensity of the point process, i.e. it describes the probability density function of the point process, and provides a global error-control criterion for the detection. All of the processing developed in this thesis was validated on synthetic data and then applied to real hyperspectral data acquired by the MUSE instrument after its commissioning in 2014, for young galaxy detection.
39

Multiple testing and post hoc bounds for heterogeneous data

Durand, Guillermo 26 November 2018 (has links)
This manuscript presents my contributions in three areas of multiple testing where data heterogeneity can be exploited to better detect the signal while controlling false positives: p-value weighting, discrete tests, and post hoc inference. First, a new class of data-driven weighting procedures, incorporating group structure and estimators of the proportion of true nulls, is defined, and its asymptotic control of the False Discovery Rate (FDR) is proven. This procedure also achieves power optimality under some conditions on the proportion estimators. Secondly, new step-up and step-down procedures, tailored for discrete tests under independence, are designed to control the FDR for arbitrary null marginal distributions of the p-values. Finally, new reference families for post hoc inference, tailored for the case where the signal is localized, are studied, and the associated post hoc bounds are computed with a simple algorithm.
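The generic p-value weighting idea can be sketched as a weighted Benjamini-Hochberg procedure (the data-driven, group-structured weights and proportion estimators studied in the thesis are not reproduced; the weights here are user-supplied, assumed positive and averaging to one):

```python
import numpy as np

def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up procedure.

    Each p-value is divided by its weight (positive, averaging to one);
    standard BH is then applied to the weighted p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    assert np.isclose(w.mean(), 1.0), "weights should average to one"
    q = p / w                                     # weighted p-values
    m = q.size
    order = np.argsort(q)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = q[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest rank passing the step-up test
        reject[order[: k + 1]] = True
    return reject

# Hypothetical example: smaller weighted p-values in an up-weighted group
pvals   = np.array([0.001, 0.009, 0.04, 0.20, 0.65, 0.90])
weights = np.array([2.0,   2.0,   0.5,  0.5,  0.5,  0.5])
print(weighted_bh(pvals, weights))
```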
40

Multiple Testing Correction with Repeated Correlated Outcomes: Applications to Epigenetics

Leap, Katie 27 October 2017 (has links)
Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers that are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate. We considered several underlying associations between an exposure and methylation over time. We found that testing each site with a linear mixed effects model and then controlling the false discovery rate (FDR) had the highest positive predictive value (PPV), a low number of false positives, and was able to differentiate between differential methylation that was present at only one time point vs. a persistent relationship. In contrast, methods that controlled FDR at a single time point and ad hoc methods tended to have lower PPV, more false positives, and/or were unable to differentiate these conditions. Validation in data obtained from Project Viva found a difference between fitting longitudinal models only to sites significant at one time point and fitting all sites longitudinally.
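A minimal sketch of the workflow described above, assuming hypothetical column and site names rather than the Project Viva variables: fit one linear mixed-effects model per methylation site with a random intercept per subject, collect the exposure p-values, and control the FDR across sites with Benjamini-Hochberg.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

def methylation_screen(df, sites, alpha=0.05):
    """Fit one linear mixed model per methylation site (random intercept per
    subject) and control the FDR across sites on the exposure coefficient.

    df    : long-format DataFrame with columns 'subject', 'time', 'exposure',
            and one column per methylation site (column names are hypothetical)
    sites : list of methylation-site column names to test
    """
    pvals = []
    for site in sites:
        model = smf.mixedlm(f"{site} ~ exposure + time", df, groups=df["subject"])
        fit = model.fit()
        pvals.append(fit.pvalues["exposure"])     # p-value for the exposure effect
    reject, p_adj, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    return pd.DataFrame({"site": sites, "p": pvals,
                         "p_fdr": p_adj, "significant": reject})
```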
