1. Methods for Meta-Analyses of Rare Events, Sparse Data, and Heterogeneity. Zabriskie, Brinley, 01 May 2019.
The vast and complex wealth of information available to researchers often leads to a systematic review, which involves a detailed and comprehensive plan and search strategy with the goal of identifying, appraising, and synthesizing all relevant studies on a particular topic. A meta-analysis, conducted ideally as part of a comprehensive systematic review, statistically synthesizes evidence from multiple independent studies to produce one overall conclusion. The increasingly widespread use of meta-analysis has led to growing interest in meta-analytic methods for rare events and sparse data. Conventional approaches tend to perform very poorly in such settings. Recent work in this area has provided options for sparse data, but these are still often hampered when heterogeneity across the available studies differs by treatment group. Heterogeneity arises when participants within a study are more correlated than participants across studies, often stemming from differences in the administration of the treatment, the study design, or the measurement of the outcome. We propose several new exact methods that accommodate this common contingency, providing more reliable statistical tests when such patterns of heterogeneity are observed. First, we develop a permutation-based approach that can also be used as a basis for computing exact confidence intervals when estimating the effect size. Second, we extend the permutation-based approach to the network meta-analysis setting. Third, we develop a new exact confidence distribution approach for effect size estimation. We show that these new methods perform markedly better than traditional methods when events are rare and heterogeneity is present.
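As a rough illustration of the permutation idea, the minimal sketch below implements a generic sign-flip permutation test for a pooled meta-analytic effect (in the spirit of Follmann-Proschan style tests), not the specific exact methods developed in this work; the study data, weights, and continuity correction are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def study_log_or(events_t, n_t, events_c, n_c, cc=0.5):
    """Per-study log odds ratio with a 0.5 continuity correction for zero cells."""
    a, b = events_t + cc, n_t - events_t + cc
    c, d = events_c + cc, n_c - events_c + cc
    return np.log(a * d / (b * c))

def sign_flip_meta_test(effects, weights, n_perm=10_000):
    """Two-sided permutation p-value for H0: no overall treatment effect.
    Under H0 with exchangeable arms, each study's effect is equally likely
    to carry either sign, so signs are flipped at random and the pooled
    statistic is recomputed."""
    observed = np.average(effects, weights=weights)
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(effects)))
    perm = (flips * effects) @ weights / weights.sum()
    return np.mean(np.abs(perm) >= abs(observed))

# Hypothetical rare-event studies: (events_trt, n_trt, events_ctl, n_ctl)
studies = [(1, 120, 4, 118), (0, 250, 3, 245), (2, 90, 5, 95), (1, 300, 6, 310)]
effects = np.array([study_log_or(*s) for s in studies])
weights = np.array([s[1] + s[3] for s in studies], dtype=float)  # crude size weights

print("permutation p-value:", sign_flip_meta_test(effects, weights))
```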
2. Analysis and Optimization of Classifier Error Estimator Performance within a Bayesian Modeling Framework. Dalton, Lori Anne, May 2012.
With the advent of high-throughput genomic and proteomic technologies, in conjunction with the difficulty in obtaining even moderately sized samples, small-sample classifier design has become a major issue in the biological and medical communities. Training-data error estimation becomes mandatory, yet none of the popular error estimation techniques have been rigorously designed via statistical inference or optimization. In this investigation, we place classifier error estimation in a framework of minimum mean-square error (MMSE) signal estimation in the presence of uncertainty, where uncertainty is relative to a prior over a family of distributions. This results in a Bayesian approach to error estimation that is optimal and unbiased relative to the model. The prior addresses a trade-off between estimator robustness (modeling assumptions) and accuracy.
Closed-form representations for Bayesian error estimators are provided for two important models: discrete classification with Dirichlet priors (the discrete model) and linear classification of Gaussian distributions with fixed, scaled identity or arbitrary covariances and conjugate priors (the Gaussian model). We examine robustness to false modeling assumptions and demonstrate that Bayesian error estimators perform especially well for moderate true errors.
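For intuition, here is a minimal Monte Carlo sketch of a Bayesian error estimate for the discrete model; the dissertation derives closed-form expressions, whereas this approximation simply averages the true error of the designed classifier over posterior draws of the class-conditional probabilities. The bin counts, class prior, and Dirichlet hyperparameter are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def bayes_error_estimate(counts0, counts1, prior0=0.5, alpha=1.0, n_draws=20_000):
    """Monte Carlo approximation of a Bayesian MMSE error estimate for the
    discrete model: average the true error of the designed histogram rule
    over draws from the Dirichlet posteriors of the class-conditional pmfs."""
    c0, c1 = np.asarray(counts0, float), np.asarray(counts1, float)
    # Plug-in histogram rule: assign each bin to the class with the larger
    # prior-weighted empirical mass.
    decide_1 = prior0 * c0 / c0.sum() < (1 - prior0) * c1 / c1.sum()
    # Posterior of each unknown pmf is Dirichlet(alpha + counts).
    p0 = rng.dirichlet(alpha + c0, size=n_draws)
    p1 = rng.dirichlet(alpha + c1, size=n_draws)
    # True error of the fixed rule under each draw, then the posterior mean.
    err = prior0 * p0[:, decide_1].sum(axis=1) + (1 - prior0) * p1[:, ~decide_1].sum(axis=1)
    return err.mean()

# Small hypothetical training sample over 4 feature bins
print(bayes_error_estimate(counts0=[5, 3, 1, 0], counts1=[1, 2, 4, 2]))
```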
The Bayesian modeling framework facilitates both optimization and analysis. It naturally gives rise to a practical expected measure of performance for arbitrary error estimators: the sample-conditioned mean-square error (MSE). Closed-form expressions are provided for both Bayesian models. We examine the consistency of Bayesian error estimation and illustrate a salient application in censored sampling, where sample points are collected one at a time until the conditional MSE reaches a stopping criterion.
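A toy sketch of the censored-sampling idea for a one-feature discrete model follows. Because the Bayesian estimator is the posterior mean of the true error, its sample-conditioned MSE equals the posterior variance of the true error, approximated here by Monte Carlo; the generating distributions, tolerance, and stopping details are hypothetical rather than those used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)

def conditional_mse(counts0, counts1, prior0=0.5, alpha=1.0, n_draws=5_000):
    """Sample-conditioned MSE of the Bayesian error estimator: since that
    estimator is the posterior mean of the true error, its conditional MSE
    equals the posterior variance of the true error."""
    c0, c1 = np.asarray(counts0, float), np.asarray(counts1, float)
    decide_1 = prior0 * c0 / c0.sum() < (1 - prior0) * c1 / c1.sum()
    p0 = rng.dirichlet(alpha + c0, size=n_draws)
    p1 = rng.dirichlet(alpha + c1, size=n_draws)
    err = prior0 * p0[:, decide_1].sum(axis=1) + (1 - prior0) * p1[:, ~decide_1].sum(axis=1)
    return err.var()

# Censored sampling: collect one labeled point at a time and stop once the
# conditional root-MSE of the error estimate falls below a tolerance.
true_p0, true_p1 = [0.5, 0.3, 0.15, 0.05], [0.1, 0.2, 0.3, 0.4]
counts = {0: np.zeros(4), 1: np.zeros(4)}
for n in range(1, 301):
    y = int(rng.integers(0, 2))                            # class label
    x = rng.choice(4, p=true_p0 if y == 0 else true_p1)    # feature bin
    counts[y][x] += 1
    if min(counts[0].sum(), counts[1].sum()) >= 2:         # need a few points per class
        if np.sqrt(conditional_mse(counts[0], counts[1])) < 0.05:
            print("stopping after", n, "sample points")
            break
```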
We address practical considerations for gene-expression microarray data, including the suitability of the Gaussian model, a methodology for calibrating normal-inverse-Wishart priors from unused data, and an approximation method for non-linear classification. We observe superior performance on synthetic high-dimensional data and real data, especially for moderate to high expected true errors and small feature sizes.
Finally, arbitrary error estimators may be optimally calibrated assuming a fixed Bayesian model, sample size, classification rule, and error estimation rule. By constructing off-line a calibration function that maps error estimates to their optimally calibrated values, error estimates may be calibrated on the fly whenever the assumptions apply.
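As a crude illustration of the calibration idea under an assumed setup (equal class priors, uniform Dirichlet priors, a fixed small sample size, the histogram rule, and resubstitution as the error estimation rule, all hypothetical choices), the calibration function can be tabulated off-line by simulating (estimate, true error) pairs from the model and averaging the true error within each observed estimate value.

```python
import numpy as np

rng = np.random.default_rng(3)
bins, n_per_class = 4, 10

def estimate_and_true_error():
    """Draw class-conditional pmfs from the prior, draw a training sample,
    design the histogram rule, and return (resubstitution estimate, true error)."""
    p0, p1 = rng.dirichlet(np.ones(bins)), rng.dirichlet(np.ones(bins))
    c0 = rng.multinomial(n_per_class, p0).astype(float)
    c1 = rng.multinomial(n_per_class, p1).astype(float)
    decide_1 = c0 < c1                      # equal priors and sample sizes
    resub = (c0[decide_1].sum() + c1[~decide_1].sum()) / (2 * n_per_class)
    true = 0.5 * (p0[decide_1].sum() + p1[~decide_1].sum())
    return resub, true

pairs = np.array([estimate_and_true_error() for _ in range(20_000)])
# Off-line calibration table: expected true error given each observed value
# of the (discrete-valued) resubstitution estimate.
calibration = {e: pairs[pairs[:, 0] == e, 1].mean() for e in np.unique(pairs[:, 0])}
print({round(k, 2): round(v, 3) for k, v in calibration.items()})
```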
3. Spurious Heavy Tails / Falska tunga svansar. Segerfors, Ted, January 2015.
Since the financial crisis which started in 2007, risk awareness in the financial sector is greater than ever. Financial institutions such as banks and insurance companies are heavily regulated in order to create a harmonious and resilient global economic environment. Sufficiently large capital buffers may protect institutions from bankruptcy due to adverse financial events leading to an undesirable outcome for the company. In many regulatory frameworks, the institutions are obliged to estimate high quantiles of their loss distributions. This is relatively unproblematic when large samples of relevant historical data are available. Serious statistical problems appear when only small samples of relevant data are available. One possible solution would be to pool two or more samples that appear to have the same distribution, in order to create a larger sample. This thesis identifies the advantages and risks of pooling small samples. For some mixtures of normally distributed samples whose variances are judged to be equal, the pooled data may indicate heavy tails. Since a finite mixture of normally distributed samples has light tails, this is an example of spurious heavy tails. Even though two samples may appear to have the same distribution function, it is not necessarily better to pool the samples in order to obtain a larger sample size with the aim of more accurate quantile estimation. For two normally distributed samples of sizes m and n and standard deviations s and v, we find that when v/s is approximately 2, n+m is less than 100, and m/(m+n) is approximately 0.75, there is a considerable risk of believing that the two samples have equal variance and that the pooled sample has heavy tails. / After the financial crisis that began in 2007, risk awareness in the financial sector has increased. Financial institutions such as banks and insurance companies are closely regulated and supervised in order to create a strong and stable world economy. By requiring banks and insurance companies to hold capital buffers that protect against bankruptcy in the event of unexpected and undesirable events, the regulations create a more harmonious financial market. These frameworks often mean that those responsible must estimate high quantiles of the institution's expected loss distribution. Building a reliable model and then estimating high quantiles is easy when plenty of relevant data is available. When there is not enough historical data, statistical problems arise. One solution to the problem is to pool two or more groups of data that appear to come from the same distribution function, thereby creating a larger body of historical data. This thesis reviews the advantages and risks of pooling data when there is not enough relevant historical data to build a reliable model. A certain mixture of normally distributed data groups that appear to have the same variance can be perceived as coming from heavy-tailed distributions. Since the normal distribution is not heavy-tailed, this misperception can create problems; it is an example of spurious heavy tails. Even though two data groups appear to come from the same distribution function, it is not necessarily better to pool them to obtain a larger sample. For two normally distributed data groups of sizes m and n with standard deviations s and v, the most dangerous scenario occurs when v/s is approximately 2, n+m is less than 100, and m/(m+n) is approximately 0.75. When this occurs, there is a significant risk that the two data groups appear to come from the same distribution function and that the pooled data exhibits heavy-tailed properties.
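A small simulation along these lines can reproduce the phenomenon; this is a hedged sketch in which the seed, the particular equal-variance and normality tests, and the thresholds are illustrative choices, not those used in the thesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

m, n = 60, 20          # sample sizes, so m/(m+n) = 0.75
s, v = 1.0, 2.0        # standard deviations, so v/s = 2

x = rng.normal(0.0, s, size=m)
y = rng.normal(0.0, v, size=n)

# With samples this small, an equal-variance test can easily fail to reject.
print("Levene equal-variance p-value:", stats.levene(x, y).pvalue)

# Yet the pooled sample tends to look heavy-tailed: a normal sample has
# excess kurtosis near 0, while this two-component mixture often does not.
pooled = np.concatenate([x, y])
jb_stat, jb_p = stats.jarque_bera(pooled)
print("excess kurtosis of pooled sample:", stats.kurtosis(pooled))
print("Jarque-Bera normality p-value:   ", jb_p)
```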
4. Equating Accuracy Using Small Samples in the Random Groups Design. Heh, Victor K., 24 August 2007.
No description available.
5. Issues in the Distribution Dynamics Approach to the Analysis of Regional Economic Growth and Convergence: Spatial Effects and Small Samples. January 2018.
In the study of regional economic growth and convergence, the distribution dynamics approach, which interrogates the evolution of the cross-sectional distribution as a whole and is concerned with both the external and internal dynamics of the distribution, has received wide usage. However, many methodological issues remain to be resolved before valid inferences and conclusions can be drawn from empirical research. Among them, spatial effects, including spatial heterogeneity and spatial dependence, invalidate the assumption of independent and identically distributed observations underlying conventional maximum likelihood techniques, while the availability of only small samples in regional settings calls into question reliance on asymptotic properties. This dissertation is comprised of three papers targeted at addressing these two issues. The first paper investigates whether the conventional regional income mobility estimators are still suitable in the presence of spatial dependence and/or a small sample. It is approached through a series of Monte Carlo experiments, which require the proposal of a novel data generating process (DGP) capable of generating spatially dependent time series. The second paper moves to statistical tests for detecting specific forms of spatial (spatiotemporal) effects in the discrete Markov chain model, investigating their robustness to the alternative spatial effect, their sensitivity to discretization granularity, and their properties in small sample settings. The third paper proposes discrete kernel estimators with cross-validated bandwidths as an alternative to maximum likelihood estimators in small sample settings; it is demonstrated that discrete kernel estimators offer improved performance when the sample size is small. Taken together, the three papers constitute an endeavor to relax the restrictive assumptions of spatial independence and spatial homogeneity, as well as to demonstrate the difference between the small-sample and asymptotic properties of conventionally adopted maximum likelihood estimators, toward a more valid inferential framework for the distribution dynamics approach to the study of regional economic growth and convergence. (Doctoral Dissertation, Geography, 2018.)
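To make the contrast concrete, the sketch below compares the maximum likelihood estimate of a Markov transition matrix (row-normalized counts) with a simple discrete-kernel-style smoother whose bandwidth is chosen by leave-one-transition-out cross-validation. The shrinkage-toward-uniform form, the number of income classes, and the simulated series are illustrative assumptions, not the estimators studied in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 5  # number of (relative) income classes

def mle_transition(seq, k):
    """Maximum likelihood estimate: row-normalized transition counts."""
    counts = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.full((k, k), 1.0 / k), where=rows > 0)

def smoothed_transition(seq, k, lam):
    """Shrink each MLE row toward the uniform pmf by a bandwidth-like lam."""
    return (1 - lam) * mle_transition(seq, k) + lam / k

def cv_score(seq, k, lam):
    """Leave-one-transition-out log-likelihood used to select lam."""
    counts = np.zeros((k, k))
    pairs = list(zip(seq[:-1], seq[1:]))
    for a, b in pairs:
        counts[a, b] += 1
    score = 0.0
    for a, b in pairs:
        row = counts[a].copy()
        row[b] -= 1                           # hold out this transition
        n = row.sum()
        p_mle = row[b] / n if n > 0 else 1.0 / k
        score += np.log((1 - lam) * p_mle + lam / k)
    return score

# Hypothetical short series of income classes, as in small regional panels.
true_P = np.full((k, k), 0.05) + np.diag(np.full(k, 0.75))
seq = [2]
for _ in range(30):
    seq.append(rng.choice(k, p=true_P[seq[-1]]))

lams = np.linspace(0.01, 0.5, 25)
best = max(lams, key=lambda lam: cv_score(seq, k, lam))
print("cross-validated smoothing parameter:", round(best, 3))
print(np.round(smoothed_transition(seq, k, best), 2))
```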
6. Missing Data - A Gentle Introduction. Österlund, Vilgot, January 2020.
This thesis provides an introduction to methods for handling missing data. A thorough review of earlier methods and of the development of the field of missing data is provided. The thesis presents the methods suggested in today's literature: multiple imputation and maximum likelihood estimation. A simulation study is performed to see if there are circumstances in small samples under which either of the two methods is to be preferred. To show the importance of handling missing data, multiple imputation and maximum likelihood are compared to listwise deletion. The results from the simulation study do not show any crucial differences between multiple imputation and maximum likelihood when it comes to point estimates. Some differences are seen in the estimation of the confidence intervals, speaking in favour of multiple imputation. The difference decreases with an increasing sample size, and more studies are needed to draw definite conclusions. Further, the results show that listwise deletion leads to biased estimates under a missing-at-random mechanism. The methods are also applied to a real dataset, the Swedish enrollment registry, to show how they work in a practical application.
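To make the listwise-deletion bias concrete, here is a hedged simulation sketch; the data-generating model, the missingness mechanism, and the single regression imputation (standing in for one draw of multiple imputation) are illustrative assumptions, not the thesis's simulation design.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50   # a small sample

# y depends on x; missingness in y depends only on x (missing at random).
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
missing = rng.random(n) < 1 / (1 + np.exp(-2 * x))   # larger x -> more likely missing
complete = ~missing

print(f"full-sample mean of y:          {y.mean():.3f}")
print(f"listwise-deletion estimate:     {y[complete].mean():.3f}")  # typically biased low

# Single regression imputation (a stand-in for one draw of multiple
# imputation): fit y ~ x on complete cases, fill in predictions plus noise.
beta1, beta0 = np.polyfit(x[complete], y[complete], 1)
resid_sd = np.std(y[complete] - (beta0 + beta1 * x[complete]), ddof=2)
y_imp = y.copy()
y_imp[missing] = beta0 + beta1 * x[missing] + rng.normal(scale=resid_sd, size=missing.sum())
print(f"regression-imputation estimate: {y_imp.mean():.3f}")
```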
7. Statistická analýza souborů s malým rozsahem / Statistical Analysis of Sample with Small Size. Holčák, Lukáš, January 2008.
This diploma thesis is focused on the analysis of small samples in situations where it is not possible to obtain more data, typically because collecting additional data would be too capital- or time-intensive, or because the financial resources for it are lacking. Of course, the analysis of small samples is highly uncertain, because inferences are always burdened with a degree of uncertainty.