31. An explorative literature review on misinterpretations of confidence intervals
Håkansson, Åsa (January 2021)
Confidence intervals are reported in various scientific fields and are used to justify claims, although several studies have shown that various groups of people, for example researchers, teachers, and students, interpret them incorrectly. To get an overview of articles that study how confidence intervals are misinterpreted, we performed an explorative literature review. The search for articles to include was conducted in a semi-structured way, and the review comprises 36 articles. The results are presented in tables, with each article placed in the table of the relevant misinterpretation. The study uses five tables: probability fallacies, precision fallacies, likelihood fallacies, overlap fallacies, and miscellaneous fallacies. The paper shows that confidence intervals are commonly misinterpreted according to the first four categories; the last category collects the less common misinterpretations, at least judging by the articles included in this review.
32. Simultaneous Inference on Survival Data
Ma, Yehan (24 April 2019)
No description available.
33. Parameter Estimation In Linear Regression
Ollikainen, Kati (1 January 2006)
Today, increasing amounts of data are available for analysis and, often, for resource allocation. One method of analysis is linear regression, which uses least squares estimation to estimate a model's parameters. This research investigated, from a user's perspective, the ability of linear regression to estimate the parameters' confidence intervals at the usual 95% level for medium-sized data sets. A controlled simulation environment with known data characteristics (clean data, bias and/or multicollinearity present) was used to show that underlying problems exist: confidence intervals often fail to include the true parameter even when the variable is selected. The Elder/Pregibon rule was used for variable selection. The bootstrap percentile and BCa confidence intervals were compared, and adjustments to the usual 95% confidence intervals based on the Bonferroni and Scheffé multiple comparison principles were investigated. The results show that linear regression has problems capturing the true parameters in the confidence intervals for the sample sizes considered, that the bootstrap intervals perform no better than linear regression, and that the Scheffé intervals are too wide for any application considered. The Bonferroni adjustment is recommended for larger sample sizes and when the t-value for a selected variable is about 3.35 or higher. For smaller sample sizes, all methods show problems with type II errors resulting from confidence intervals being too wide.
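The abstract does not reproduce the simulation itself; the following minimal Python sketch (not the thesis's code — the simulation design, the Elder/Pregibon selection rule, and the BCa and Scheffé variants are omitted, and all names and toy data are illustrative) shows how a case-resampling bootstrap percentile interval and its Bonferroni-adjusted counterpart can be computed for regression coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """Least squares coefficient estimates (intercept column included in X)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def bootstrap_percentile_ci(X, y, n_boot=2000, level=0.95, bonferroni=False):
    """Case-resampling bootstrap percentile CIs for each coefficient.
    With bonferroni=True, alpha is split across the p coefficients."""
    n, p = X.shape
    alpha = 1 - level
    if bonferroni:
        alpha /= p
    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        boot[b] = ols(X[idx], y[idx])
    lo = np.percentile(boot, 100 * alpha / 2, axis=0)
    hi = np.percentile(boot, 100 * (1 - alpha / 2), axis=0)
    return np.column_stack([lo, hi])

# Toy medium-sized data set: intercept plus two predictors, one true signal
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

print(bootstrap_percentile_ci(X, y))                    # unadjusted 95% CIs
print(bootstrap_percentile_ci(X, y, bonferroni=True))   # Bonferroni-adjusted CIs
```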
34. Correction Methods, Approximate Biases, and Inference for Misclassified Data
Shieh, Meng-Shiou (1 May 2009)
When categorical data are placed into the wrong category, we say the data are affected by misclassification; this is common in data collection. It is well known that naive estimators of category probabilities and regression coefficients that ignore misclassification can be biased. In this dissertation, we develop methods that provide improved estimators and confidence intervals for a proportion when only a misclassified proxy is observed, and improved estimators and confidence intervals for regression coefficients when only misclassified covariates are observed. Following the introduction and literature review, we develop two estimators for a proportion: one that reduces the bias, and one with smaller mean square error. We then give two methods for constructing a confidence interval for a proportion, one using optimization techniques and the other using Fieller's method. After that, we focus on corrected estimators for regression coefficients with misclassified covariates, with or without perfectly measured covariates, and with a known or estimated misclassification/reclassification model. These correction methods use the score function approach, regression calibration, and a mixture model. We also use Fieller's method to find a confidence interval for the slope of simple regression with misclassified binary covariates. Finally, we use simulation to demonstrate the performance of our proposed methods.
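The dissertation's own estimators are not reproduced in the abstract. For orientation, here is a minimal sketch of the classical matrix (Rogan–Gladen-type) correction for a proportion observed through a misclassified proxy with known sensitivity and specificity — the kind of baseline that bias-reduced estimators and Fieller-type intervals build on; the function name and numbers are illustrative, not taken from the thesis:

```python
import numpy as np

def corrected_proportion(p_obs, sensitivity, specificity):
    """Matrix-corrected estimate of a true proportion from a misclassified proxy.
    p_obs is the observed (apparent) proportion; sensitivity and specificity of
    the classification process are assumed known."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("need sensitivity + specificity > 1 for identifiability")
    p_hat = (p_obs + specificity - 1.0) / denom
    return float(np.clip(p_hat, 0.0, 1.0))    # truncate to the parameter space

# Example: 30% of subjects are classified positive by an imperfect classifier
print(corrected_proportion(p_obs=0.30, sensitivity=0.90, specificity=0.85))  # ~0.20
```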
35. Is It More Advantageous to Administer LibQUAL+® Lite Over LibQUAL+®? An Analysis of Confidence Intervals, Root Mean Square Errors, and Bias
Ponce, Hector F. (08 1900)
The Association of Research Libraries (ARL) provides an option for librarians to administer a combination of LibQUAL+® and LibQUAL+® Lite to measure users' perceptions of library service quality. LibQUAL+® Lite is a shorter version of LibQUAL+® that uses planned missing data in its design. The present study investigates the loss of information in commonly administered proportions of LibQUAL+® and LibQUAL+® Lite when compared to administering LibQUAL+® alone. Data from previous administrations of the LibQUAL+® protocol (2005, N = 525; 2007, N = 3,261; and 2009, N = 2,103) were used to create simulated datasets representing various proportions of LibQUAL+® versus LibQUAL+® Lite administration (0.2:0.8, 0.4:0.6, 0.5:0.5, 0.6:0.4, and 0.8:0.2). Statistics (i.e., means, adequacy and superiority gaps, standard deviations, Pearson product-moment correlation coefficients, and polychoric correlation coefficients) from simulated and real data were compared. Confidence intervals captured the original values. Root mean square errors and absolute and relative biases of correlations showed that accuracy in the estimates decreased as the percentage of planned missing data increased. The recommendation is to avoid using combinations with more than 20% planned missing data.
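As a toy illustration of the kind of comparison reported here (not the LibQUAL+® items, gap scores, or polychoric correlations used in the study; all names and data below are made up), the sketch imposes increasing shares of planned missingness on a simulated two-item survey and tracks the RMSE and relative bias of a correlation:

```python
import numpy as np

rng = np.random.default_rng(2009)

def impose_planned_missing(full, lite_share):
    """Blank out half of the items for a randomly chosen 'Lite' share of respondents."""
    data = full.astype(float).copy()
    n, k = data.shape
    lite_rows = rng.choice(n, size=int(lite_share * n), replace=False)
    for r in lite_rows:
        drop = rng.choice(k, size=k // 2, replace=False)
        data[r, drop] = np.nan
    return data

def pairwise_corr(x, y):
    """Correlation using only respondents who answered both items."""
    ok = ~np.isnan(x) & ~np.isnan(y)
    return np.corrcoef(x[ok], y[ok])[0, 1]

# Simulated 'survey': 500 respondents, two correlated items
n = 500
common = rng.normal(size=n)
full = np.column_stack([common + rng.normal(size=n), common + rng.normal(size=n)])
true_r = np.corrcoef(full.T)[0, 1]

for share in (0.2, 0.4, 0.5, 0.6, 0.8):
    reps = np.array([pairwise_corr(*impose_planned_missing(full, share).T)
                     for _ in range(200)])
    rmse = np.sqrt(np.mean((reps - true_r) ** 2))
    rel_bias = (reps.mean() - true_r) / true_r
    print(f"Lite share {share:.1f}: RMSE={rmse:.3f}, relative bias={rel_bias:+.3f}")
```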
36. Estimation of the standard error and confidence interval of the indirect effect in multiple mediator models
Briggs, Nancy Elizabeth (22 September 2006)
No description available.
37. An Investigation into Classification of High Dimensional Frequency Data
McGraw, John M. (25 October 2001)
We desire an algorithm to classify a physical object in "real-time" using an easily portable probing device. The probe excites a given object at frequencies from 100 MHz up to 800 MHz at intervals of 0.5 MHz. Thus the data used for classification is the 1400-component vector of these frequency responses.
The Interdisciplinary Center for Applied Mathematics (ICAM) was asked to help develop an algorithm and executable computer code for the probing device to use in its classification analysis. Due to these and other requirements, all work had to be done in Matlab. Hence a significant portion of the effort was spent in writing and testing applicable Matlab code which incorporated the various statistical techniques implemented.
We offer three approaches to classification: maximum log-likelihood estimates, correlation coefficients, and confidence bands. Related work included considering ways to recover and exploit certain symmetry characteristics of the objects (using the response data). Present investigations are not entirely conclusive, but the correlation coefficient classifier seems to produce reasonable and consistent results. All three methods currently require the evaluation of the full 1400-component vector. It has been suggested that unknown portions of the vectors may include extraneous and misleading information, or information common to all classes. Identifying and removing the respective components may be beneficial to classification regardless of method. Another advantage of dimension reduction should be a strengthening of mean and covariance estimates. / Master of Science
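As an illustration of the correlation-coefficient approach described above (a minimal Python sketch, not the original Matlab implementation; the class templates, noise level, and data are synthetic), each unknown response vector is assigned to the class whose mean template it correlates with most strongly:

```python
import numpy as np

def correlation_classifier(x, class_templates):
    """Assign a frequency-response vector x to the class whose template
    has the highest Pearson correlation with x."""
    scores = {label: np.corrcoef(x, template)[0, 1]
              for label, template in class_templates.items()}
    return max(scores, key=scores.get), scores

# Synthetic 1400-point 'responses' at 100-799.5 MHz in 0.5 MHz steps
rng = np.random.default_rng(1)
freqs = np.arange(100.0, 800.0, 0.5)
template_a = np.exp(-((freqs - 300.0) / 50.0) ** 2)    # class A mean response
template_b = np.exp(-((freqs - 550.0) / 80.0) ** 2)    # class B mean response
unknown = template_a + 0.3 * rng.normal(size=freqs.size)

label, scores = correlation_classifier(unknown, {"A": template_a, "B": template_b})
print(label, {k: round(v, 3) for k, v in scores.items()})
```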
38. Estimating the mean and variance of population abundance in ecology from small-sized samples
Vaudor, Lise (25 January 2011)
In ecology, as in other scientific areas, count samples often comprise many zeros and a few high abundances. Their distribution is particularly overdispersed and skewed. The most classical methods of inference are often ill-adapted to these distributions, unless sample sizes are very large. It is thus necessary to question the validity of inference methods and to quantify estimation errors for such data. This work was motivated by a fish abundance dataset corresponding to point sampling by electrofishing. The dataset comprises more than 2000 samples; each sample corresponds to the point abundances (considered independent and identically distributed) of one species for one fishing campaign. These samples are small-sized (generally, 20 ≤ n ≤ 50) and comprise many zeros (overall, 80% of counts are zeros). The fits of several classical distribution models for count data were compared on these samples, and the negative binomial distribution was selected. We therefore focused on estimating the two parameters of this distribution: the mean parameter m and the dispersion parameter q. First, we studied estimation problems for the dispersion. The estimation error is larger when few individuals are observed, and one can quantify, for a given population, the gain in precision resulting from the exclusion of samples containing very few individuals. We then compared several methods for computing confidence intervals for the mean. Confidence intervals based on the negative binomial likelihood are, by far, preferable to more classical methods such as Student's method. Moreover, both studies showed that some estimation problems are predictable from simple sample statistics such as the total number of individuals or the number of non-zero counts. Accordingly, we compared fixed-sample-size sampling with a sequential method, in which sampling continues until a minimum number of individuals or of non-zero counts has been observed. We showed that sequential sampling improves the estimation of the dispersion parameter but biases the estimation of the mean; nevertheless, it improves the confidence intervals estimated for the mean. This work thus quantifies the errors in estimating the mean and dispersion of overdispersed count data, compares several estimation methods, and leads to practical recommendations for sampling and estimation.
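The thesis's code is not part of the abstract; the sketch below (Python, with illustrative data — the true parameters, sample size, and grid choices are assumptions) contrasts the two interval types compared above: a profile-likelihood confidence interval for the negative binomial mean m (dispersion q profiled out) versus the classical Student t interval, on a zero-heavy sample:

```python
import numpy as np
from scipy import stats, optimize

def nb_loglik(m, q, x):
    """Negative binomial log-likelihood with mean m and dispersion (size) q."""
    p = q / (q + m)
    return stats.nbinom.logpmf(x, q, p).sum()

def profile_ci_mean(x, level=0.95):
    """Profile-likelihood confidence interval for the negative binomial mean."""
    xbar = x.mean()
    # joint MLE, optimizing over log-parameters to keep them positive
    neg = lambda t: -nb_loglik(np.exp(t[0]), np.exp(t[1]), x)
    fit = optimize.minimize(neg, x0=[np.log(xbar + 0.1), 0.0], method="Nelder-Mead")
    ll_max = -fit.fun
    cutoff = stats.chi2.ppf(level, df=1) / 2
    m_grid = np.linspace(max(xbar / 10, 1e-3), xbar * 8 + 1, 500)  # coarse search grid
    keep = []
    for m in m_grid:
        prof = optimize.minimize_scalar(lambda lq: -nb_loglik(m, np.exp(lq), x),
                                        bounds=(-5, 5), method="bounded")
        if ll_max + prof.fun <= cutoff:      # i.e. ll_max - profile_ll(m) <= chi2/2
            keep.append(m)
    return min(keep), max(keep)

# Zero-heavy toy sample: true mean 2, strong overdispersion (q = 0.3), n = 30
rng = np.random.default_rng(42)
x = rng.negative_binomial(n=0.3, p=0.3 / (0.3 + 2.0), size=30)

lo, hi = profile_ci_mean(x)
half = stats.t.ppf(0.975, df=len(x) - 1) * x.std(ddof=1) / np.sqrt(len(x))
print(f"NB profile-likelihood CI: ({lo:.2f}, {hi:.2f})")
print(f"Student t CI:             ({x.mean() - half:.2f}, {x.mean() + half:.2f})")
```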
39. Alternative Methods of Estimating the Degree of Uncertainty in Student Ratings of Teaching
Alsarhan, Ala'a Mohammad (1 July 2017)
This study used simulated results to evaluate four alternative methods of computing confidence intervals for class means in the context of student evaluations of teaching in a university setting. Because of the skewed and bounded nature of the ratings, the goal was to identify a procedure for constructing confidence intervals that would be asymmetric and not dependent upon normal curve theory. The four methods included (a) a logit transformation, (b) a resampling procedure, (c) a nonparametric, bias-corrected and accelerated (BCa) bootstrap procedure, and (d) a Bayesian bootstrap procedure. The methods were compared against four criteria: (a) coverage probability, (b) coverage error, (c) average interval width, and (d) the lower and upper error probabilities. The results of each method were also compared with a classical procedure for computing the confidence interval based on normal curve theory. In addition, student evaluations of teaching effectiveness (SET) ratings from all courses taught during one semester at Brigham Young University were analyzed using multilevel generalizability theory to estimate variance components and the reliability of the class means as a function of the number of respondents in each class. The results showed that the logit transformation procedure outperformed the alternative methods. The results also showed that the reliability of the class means exceeded .80 for classes averaging 15 respondents or more. The study demonstrates the need to routinely report a margin of error associated with the mean SET rating for each class and recommends that a confidence interval based on the logit transformation procedure be used for this purpose.
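The dissertation does not spell out the transformation in the abstract; the sketch below shows one way a logit-scale interval for a bounded class mean can be constructed (a delta-method version — the rating bounds of 1-8, the function name, and the data are assumptions, and the dissertation's exact procedure may differ):

```python
import numpy as np
from scipy import stats

def logit_ci_class_mean(ratings, lower=1.0, upper=8.0, level=0.95):
    """Asymmetric CI for a class mean rating bounded on [lower, upper]:
    rescale the mean to (0, 1), build a t interval on the logit scale via the
    delta method, then back-transform so the interval respects the bounds."""
    x = np.asarray(ratings, dtype=float)
    n = len(x)
    eps = 1e-4
    p = np.clip((x.mean() - lower) / (upper - lower), eps, 1 - eps)
    se_p = x.std(ddof=1) / np.sqrt(n) / (upper - lower)
    logit = np.log(p / (1 - p))
    se_logit = se_p / (p * (1 - p))                  # delta method on the logit scale
    t = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    inv = lambda u: lower + (upper - lower) / (1 + np.exp(-u))   # back-transform
    return inv(logit - t * se_logit), inv(logit + t * se_logit)

# Skewed toy ratings near the top of a 1-8 scale (15 respondents)
ratings = [8, 8, 7, 8, 6, 8, 7, 8, 8, 5, 8, 7, 8, 8, 6]
print(logit_ci_class_mean(ratings))
```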
40. Statistical Inference for Costs and Incremental Cost-Effectiveness Ratios with Censored Data
Chen, Shuai (May 2012)
Cost-effectiveness analysis is widely conducted in the economic evaluation of new treatment options. In many clinical and observational studies of costs, the data are censored, which brings challenges to both medical cost estimation and cost-effectiveness analysis. Although methods have been proposed for estimating mean costs with censored data, they are often derived from theory and it is not always easy to understand how they work. We provide an alternative method for estimating the mean cost more efficiently, based on a replace-from-the-right algorithm, and show that this estimator is equivalent to an existing estimator based on the inverse probability weighting principle and semiparametric efficiency theory. We thereby provide an intuitive explanation of a theoretically derived mean cost estimator.
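The abstract describes the estimator only at a conceptual level; the following Python sketch shows a simple inverse-probability-weighted mean cost estimator of the kind referred to here (uncensored subjects' total costs weighted by the inverse of a Kaplan-Meier censoring survival estimate). It is an illustration under simplifying assumptions — continuous times, no ties, toy data — not the author's replace-from-the-right implementation:

```python
import numpy as np

def censoring_survival_at_own_time(time, delta):
    """Kaplan-Meier estimate of the censoring survival K(t) = P(C > t), with
    censoring treated as the event, evaluated just before each subject's own time."""
    order = np.argsort(time)
    d = delta[order]
    n = len(d)
    at_risk = n - np.arange(n)
    factors = np.where(d == 0, (at_risk - 1) / at_risk, 1.0)  # drop only at censorings
    K = np.cumprod(factors)
    K_before = np.ones(n)
    K_before[1:] = K[:-1]
    out = np.empty(n)
    out[order] = K_before
    return out

def ipw_mean_cost(cost, time, delta):
    """Simple weighted mean cost: average the complete (uncensored) total costs,
    each weighted by 1 / K(T_i)."""
    K = censoring_survival_at_own_time(time, delta)
    weights = delta / np.clip(K, 1e-10, None)
    return np.sum(weights * cost) / len(cost)

# Toy data: cost accrues at an individual rate until death or censoring
rng = np.random.default_rng(7)
n = 500
death = rng.exponential(2.0, n)
cens = rng.exponential(4.0, n)
time = np.minimum(death, cens)
delta = (death <= cens).astype(float)          # 1 = death observed
rate = rng.uniform(5_000, 15_000, n)
cost = rate * time                             # observed accumulated cost

print("IPW mean cost:             ", round(ipw_mean_cost(cost, time, delta)))
print("Naive mean over uncensored:", round(cost[delta == 1].mean()))
```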
In many applications, it is also important to estimate the survival function of costs. We propose a generalized redistribute-to-the-right algorithm for estimating the survival function of costs with censored data, and show that it is equivalent to a simple weighted survival estimator of costs based on inverse probability weighting techniques. Motivated by this redistribute-to-the-right principle, we also develop a survival estimator for costs that has the desirable properties of being monotone and more efficient, although it is not always consistent. We conduct simulations to compare our method with some existing survival estimators for costs and find that its bias appears quite small. It may therefore be considered a candidate survival estimator for costs in real settings where censoring is heavy and cost history information is available.
Finally, we consider a special situation in cost-effectiveness analysis in which the terminating events for survival time and costs differ. Traditional methods of statistical inference cannot handle such data. We propose a new method for deriving the confidence interval for the incremental cost-effectiveness ratio in this situation, based on counting processes and the general theory of missing data processes. Simulation studies show that our method performs very well in several practical settings, and it has great potential for application in real settings where different terminating events exist for survival time and costs.