491 |
A unified theory of hypothesis testing based on rankings. Pan, Jianhong. January 1994
A unified theory of hypothesis testing based on the ranks of the data is proposed. A hypothesis testing problem often gives rise to two separate permutation sets corresponding to the data and to the alternative, respectively. By defining the distance between permutation sets as the average of all distances between pairs of permutations, one from each set, it is possible to obtain various test statistics. The limiting distributions of test statistics derived by the unified approach herein are obtained under both the null hypothesis and contiguous alternatives. The unified approach produces not only some well-known test statistics but also some new yet plausible test statistics. The corresponding results are extensions of the simple linear rank statistics defined by Hajek and Sidak (1967) to the generalized linear rank statistics and of the two-sample case to the multi-sample case. Furthermore, a combined method was developed for the case of composite alternatives.
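The averaged pairwise distance between permutation sets described above can be illustrated with a minimal Python sketch. The choice of Spearman's squared rank distance, the function names, and the two small example sets are assumptions made for illustration; they are not taken from the thesis.

```python
from itertools import product


def spearman_distance(p, q):
    """Sum of squared rank differences between two rankings of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def set_distance(set_a, set_b, dist=spearman_distance):
    """Average of dist(p, q) over all pairs with p in set_a and q in set_b."""
    pairs = list(product(set_a, set_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical permutation sets: one compatible with the data,
# one compatible with the alternative.
data_set = [(1, 2, 3), (2, 1, 3)]
alternative_set = [(1, 2, 3), (1, 3, 2)]

statistic = set_distance(data_set, alternative_set)
print(statistic)
```

Any other distance on permutations (for example Kendall's or Hamming's) could be passed as `dist`; different choices would lead to different test statistics.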
|
492 |
Rank tests for interaction in two-way layouts with application in genetic analysis. Gao, Xin. January 2003
To fully dissect complex traits, it is desirable to have methods able to test gene-gene interaction for human genetic data. This thesis provides a framework to process human quantitative trait data such that the problem becomes a hypothesis test of interaction for a two-way layout design with unequal replicates. Three new nonparametric rank tests are proposed. Their limiting distributions under Pitman alternatives and asymptotic relative efficiencies are studied. The tests are extended to unbalanced designs. We also introduce the notion of composite linear rank statistics and prove asymptotic normality under mild conditions. Consistent estimators are provided for the limiting variance-covariance matrix of arbitrary linear rank statistics.
|
493 |
Multiple comparison methods and certain distributions arising in multivariate statistical analysis. Bagai, Om Parkash. January 1960
The problem of classifying multivariate normal populations into homogeneous clusters on the basis of random samples drawn from those populations is taken up. Three alternative methods have been suggested for this. One of them is explained fully with an illustrative example, and the tabular values for the corresponding statistic, used for the purpose, have been computed. In the case of the other two alternatives only the working procedure is discussed. Further, a new statistic R, 'the largest distance', is proposed in one of these two alternatives, and its distribution is determined for the bivariate case in the form of definite integrals.
Ignoring a priori probabilities, two alternative methods are suggested for assigning an arbitrary population to one or more clusters of populations, and are demonstrated by an illustrative example.
A method is discussed for finding confidence regions for the non-centrality parameters of the distributions of certain statistics used in multivariate analysis and this method is also illustrated by an example.
The exact distribution of the determinant of the sum of products (S.P.) matrix is found (in series), both in the central and the non-central linear cases for particular values of the rank of the matrix. Further, these results have been made use of in finding the limiting distribution of the Wilks-Lawley statistic proposed for testing the null hypothesis of the equality of the mean vectors of any number of populations.
Six different statistics based on the roots of certain determinantal equations have been proposed for various tests of hypotheses arising in the problems of multivariate analysis of variance (Anova). Their distributions in the limited cases of two and three eigenroots have been found in the form of definite integrals. Also, the limiting distributions of Roy's statistics for the largest, an intermediate, and the smallest eigenroots have been found by a simple method of integration that is quite different from that of Nanda (1948).
Lastly, the distributions of the mean square and the mean product (M.P.) matrix have been approximated respectively in the univariate and multivariate cases of unequal sub-class numbers in the analysis of variance (Anova) of Model II. / Science, Faculty of / Mathematics, Department of / Graduate
|
494 |
Small area variations in surgical rates: Simulation as an aid in interpretation of findings. Howard, Andrew William. January 1998
Purpose. To explore Monte Carlo simulation for interpretation of small area variations in surgical rates. Methods. Simulation was used to generate sets of surgical rates under the null hypothesis of equal rates. The distributions of the extremal quotient (EQ), coefficient of variation (CV), chi-square, systematic component of variation (SCV), and case count (CC) were described. The null hypothesis was modified to allow reasonable variability. Results. The chi-square, CV, and CC had interpretable values and adequate power over the range of parameters tested. The EQ and the SCV did not. Only two of the five operations studied had greater variability than expected under a modified null hypothesis, compared with five out of five under traditional testing. Conclusions. Simulation can estimate the distributions of nonstandard statistics. The newly defined case count adds valuable policy-relevant information. Modification of the null hypothesis improves decisions about which variations need further investigation.
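The simulation idea in this abstract can be sketched in Python as follows. This is only a rough illustration under stated assumptions: surgery counts are drawn as binomial counts with a common rate, the area populations and the rate are invented values, and only three of the five statistics (extremal quotient, coefficient of variation, chi-square) are computed, using their usual textbook definitions rather than the thesis's exact ones.

```python
import random
import statistics


def simulate_area_counts(populations, rate, rng):
    """Draw one surgery count per area under the null of a common rate."""
    return [sum(rng.random() < rate for _ in range(n)) for n in populations]


def extremal_quotient(rates):
    """Ratio of the highest to the lowest area rate."""
    lowest = min(rates)
    return max(rates) / lowest if lowest > 0 else float("inf")


def coefficient_of_variation(rates):
    """Population standard deviation of the rates divided by their mean."""
    mean = statistics.mean(rates)
    return statistics.pstdev(rates) / mean if mean > 0 else 0.0


def chi_square(counts, populations):
    """Pearson chi-square of observed counts against a common overall rate."""
    overall = sum(counts) / sum(populations)
    if overall == 0:
        return 0.0
    return sum((c - n * overall) ** 2 / (n * overall)
               for c, n in zip(counts, populations))


def null_distribution(populations, rate, n_sims, seed=0):
    """Simulate (EQ, CV, chi-square) triples under the equal-rates null."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        counts = simulate_area_counts(populations, rate, rng)
        rates = [c / n for c, n in zip(counts, populations)]
        results.append((extremal_quotient(rates),
                        coefficient_of_variation(rates),
                        chi_square(counts, populations)))
    return results


# Hypothetical area populations and common surgical rate.
sims = null_distribution([500, 1000, 2000], rate=0.02, n_sims=100)
```

An observed statistic could then be compared with the simulated values, for example by counting how many simulated values exceed it.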
|
495 |
Analysis of the compensability of patient injuries in dental care using binary and ordinal logistic regression. Karhunen, S. (Sini). 02 June 2014
This thesis examines the factors that affect whether patient injury claims are compensated, for treatments concerning dental diseases. In particular, it analyzes how the compensability of a patient injury is affected by the patient's age, sex, underlying disease, insurance sector, and the year in which the claim was decided. The data comprise the information collected by the Finnish Patient Insurance Centre (Potilasvakuutuskeskus) in 2000–2011 on compensation claims concerning dental diseases. In the thesis, practically all patient injuries were treatment injuries.
Both binary and ordinal logistic regression models were used as research methods. In the binary model, the response categories are "compensated treatment injury" and "treatment injury not compensated". The ordinal model is the proportional odds model, and its response variable has three ordered categories: "compensated treatment injury", "the injury was minor or tolerable", and "other grounds for rejection". The thesis also examines the suitability of the proportional odds model for the data.
The clearest differences were between the public and private insurance sectors. In the private sector 47% of patient injury claims were compensated, and in the public sector 31%. Among age groups, the shares of compensated treatment injuries were highest among 20–39-year-olds (43%) and 40–59-year-olds (42%), and lowest among 0–19-year-olds (31%) and 80–98-year-olds (28%). For underlying diseases, the largest share of compensated treatment injuries was in the category "other diseases of the teeth and supporting structures" (47%), whereas the share was 37% in the category "dental caries" and 39% in the category "diseases of the tissues surrounding the root apex".
The results show that fewer treatment injuries were compensated in 2005–2011 than in the preceding years. The relative share of compensated treatment injuries was highest in 2003. Treatment injuries of patients who used the private sector were compensated more often than those of patients who used the public sector. In the age groups 0–19 and 80–98, treatment injuries were compensated less often than in the other age groups. There was no clear difference between the sexes. Patients treated for dental caries as the underlying disease had fewer injuries compensated than patients in the other disease categories. The results of the binary and ordinal models point in the same direction.
For these data, the use of the proportional odds model may be somewhat problematic. It appears that the explanatory variables have a slightly different effect on the response depending on its category. This applies to all explanatory variables except sex.
|
496 |
Estimation of the stationary distribution of Markov chains. Garibotti, Gilda. 01 January 2004
In this dissertation we introduce a new estimator of the stationary probability measure of Markov processes, in the case where the transition structure depends on an unknown parameter. We prove that the proposed estimator is consistent and asymptotically normally distributed. Then we apply these ideas to Lindley processes and demonstrate via simulations the potential applicability of our estimator.
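For readers unfamiliar with Lindley processes, the standard recursion W(n+1) = max(0, W(n) + X(n)) can be simulated with the short Python sketch below. It shows only a naive empirical estimate of the stationary distribution from one simulated path; it is not the estimator proposed in the dissertation, and the exponential increments and their rates are assumptions chosen so that the increments have negative mean.

```python
import random


def lindley_path(increments, start=0.0):
    """Apply the Lindley recursion W <- max(0, W + X) along the increments."""
    path = []
    w = start
    for x in increments:
        w = max(0.0, w + x)
        path.append(w)
    return path


def empirical_cdf(path, t):
    """Fraction of path values not exceeding t."""
    return sum(w <= t for w in path) / len(path)


# Hypothetical increments: service time minus inter-arrival time,
# with mean 0.8 - 1.0 < 0 so that the process is stable.
rng = random.Random(0)
increments = [rng.expovariate(1.25) - rng.expovariate(1.0)
              for _ in range(10_000)]
path = lindley_path(increments)
estimate_at_one = empirical_cdf(path, 1.0)
```

A parametric estimator of the kind described in the abstract would instead exploit the dependence of the transition structure on the unknown parameter.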
|
497 |
Data combination from multiple sources under measurement error. Gasca-Aragon, Hugo. 01 January 2012
Regulatory agencies are responsible for monitoring the performance of particular measurement communities. In order to achieve their objectives, they sponsor intercomparison exercises between the members of these communities. The Intercomparison Exercise Program for Organic Contaminants in the Marine Environment is an ongoing NIST/NOAA program. It was started in 1986 and there have been 19 studies to date. Using these data as motivation, we review the theory and practices applied to their analysis. It is a common practice to apply some kind of filter to the comparison study data. These filters range from outlier detection and exclusion to exclusion of all data from a participant when its measurements are very "different". When the measurements are not so "different", the usual assumption is that the laboratories are unbiased, and the simple mean, the weighted mean, or the one-way random effects model is then applied to obtain estimates of the true value. Instead, we explore methods to analyze these data under weaker assumptions and apply them to some of the available data. More specifically, we explore estimation of models assessing the laboratories' performance and ways to use those fitted models in estimating a consensus value for new study material. This is done in various ways, starting with models that allow a separate bias for each lab with each compound at each point in time and then considering generalizations of that. This is done first by exploiting models where, for a particular compound, the bias may be shared over labs or over time and then by modeling systematic biases (which depend on the concentration) by combining data from different labs. As seen in the analyses, the latter models may be more realistic. Due to uncertainty in the certified reference material, analyzing systematic biases leads to a measurement-error-in-linear-regression problem. This work has two differences from the standard work in this area. 
First, it allows heterogeneity in the material being delivered to the lab, whether it be control or study material. Secondly, we make use of Fieller's method for estimation, which has not been used in this context before, although others have suggested it. One challenge in using Fieller's method is that explicit expressions for the variance and covariance of the sample variance and covariance of independent but non-identically distributed random variables are needed. These are developed. Simulations are used to compare the performance of moment/Wald, Fieller and bootstrap methods for getting confidence intervals for the slope in the measurement model. These suggest that Fieller's method performs better than the bootstrap technique. We also explore four estimators for the variance of the error in the equation in this context and determine that the estimator based on the modified squared residuals outperforms the others. Homogeneity is a desirable property in control and study samples. Special experiments with nested designs must be conducted for homogeneity analysis and assessment purposes. However, simulation shows that heterogeneity has low impact on the performance of the studied estimators. This work shows that a biased but consistent estimator for the heterogeneity variance can be obtained from the current experimental design.
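As background on Fieller's method, the following Python sketch computes the classical Fieller confidence interval for a ratio a/b of two estimates. It is a generic textbook form under stated assumptions, not the dissertation's procedure for the measurement error slope: the function name is hypothetical, the variances and covariance of the two estimates are taken as known inputs, and the critical value is supplied by the caller.

```python
import math


def fieller_interval(a, b, var_a, var_b, cov_ab, crit):
    """Fieller confidence interval for the ratio a / b.

    Solves (a - theta*b)^2 <= crit^2 * (var_a - 2*theta*cov_ab + theta^2*var_b)
    for theta. Returns (lower, upper), or None when the solution set is not a
    finite interval (denominator not clearly different from zero).
    """
    c2 = crit ** 2
    quad = b * b - c2 * var_b      # coefficient of theta^2
    lin = a * b - c2 * cov_ab      # half of minus the coefficient of theta
    const = a * a - c2 * var_a     # constant term
    disc = lin * lin - quad * const
    if quad <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return ((lin - root) / quad, (lin + root) / quad)


# Hypothetical estimates: numerator 2.0, denominator 1.0, both with
# variance 0.01 and zero covariance, using a normal critical value.
interval = fieller_interval(2.0, 1.0, 0.01, 0.01, 0.0, crit=1.96)
print(interval)
```

Unlike a Wald interval, the resulting interval is generally not symmetric about the point estimate a/b, and it can fail to be finite when the denominator is poorly determined.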
|
498 |
Classification by active testing with applications to imaging and change detection. Li, Chunming. 01 January 1999
In this dissertation, we investigate adaptive strategies for sequential testing, especially those driven by maximizing information gain when the conditional distribution of tests given hypotheses is Gaussian. We implement a classification algorithm in which tests are selected recursively and adaptively on-line. We show that such information-based strategies are statistically sensible and computationally efficient, and accommodate testing at multiple resolutions. Finally, applications are made to change point detection and medical image classification.
|
499 |
Prediction and Testing for Non-Parametric Random Function Signals in a Complex System. Unknown Date
Methods employed in the construction of prediction bands for continuous curves require a different approach to those used for a data point. In many cases, the underlying function is unknown and thus a distribution-free approach which preserves sufficient coverage for the entire signal is necessary in the signal analysis. This paper discusses three methods for the formation of (1-alpha)100% bootstrap prediction bands, and their performances are compared through the coverage probabilities obtained for each technique. Bootstrap samples are first obtained for the signal and then three different criteria are provided for the removal of (alpha)100% of the curves, resulting in the (1-alpha)100% prediction band. The first method uses the L1 distance between the upper and lower curves as a gauge to extract the widest bands in the dataset of signals. Also investigated are extractions using the Hausdorff distance between the bounds as well as an adaptation of the bootstrap intervals discussed in Lenhoff et al. (1999). The bootstrap prediction bands each have good coverage probabilities for the continuous signals in the dataset. For a 95% prediction band, the coverages obtained were 90.59%, 93.72% and 95% for the L1 distance, Hausdorff distance and the adjusted bootstrap methods, respectively. The methods discussed in this paper have been applied successfully to constructing prediction bands for spring discharge, giving good coverage in each case. Spring discharge measured over time can be considered as a continuous signal, and the ability to predict the future signals of spring discharge is useful for monitoring flow and other issues related to the spring. While in some cases rainfall has been fitted with the gamma distribution, the discharge of the spring, represented as continuous curves, is better approached without assuming any specific distribution. The bootstrap aspect occurs not in sampling the output discharge curves but rather in simulating the input recharge that enters the spring. 
Bootstrapping the rainfall as described in this paper allows for adequately creating new samples over different periods of time as well as specific rain events such as hurricanes or drought. The bootstrap prediction methods put forth in this paper provide an approach that supplies adequate coverage for prediction bands for signals represented as continuous curves. The pathway outlined by the flow of the discharge through the springshed is described as a tree. A non-parametric pairwise test, motivated by the idea of K-means clustering, is proposed to decipher whether there is equality between two trees in terms of their discharges. A large sample approximation is devised for this lower-tail significance test and test statistics for different numbers of input signals are compared to a generated table of critical values. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2012. / March 29, 2012. / Includes bibliographical references. / Eric Chicken, Professor Directing Dissertation; Eric Klassen, University Representative; Xufeng Niu, Committee Member; Adrian Barbu, Committee Member.
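One possible reading of the L1-based band construction in this abstract is sketched below in Python. It is an interpretation under stated assumptions, not the dissertation's algorithm: curves are assumed to be sampled on a common grid, the bootstrap curves are taken as given, and curves are removed greedily, each time dropping the one whose removal most reduces the L1 distance between the upper and lower envelopes. All names and the toy curves are hypothetical.

```python
def envelope(curves):
    """Pointwise lower and upper bounds of a collection of curves."""
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return lower, upper


def l1_width(curves):
    """L1 distance between the upper and lower envelopes (unit grid spacing)."""
    lower, upper = envelope(curves)
    return sum(u - l for l, u in zip(lower, upper))


def prediction_band(curves, alpha):
    """Greedily remove a fraction alpha of the curves, then return the envelope."""
    kept = list(curves)
    n_remove = int(alpha * len(curves))
    for _ in range(n_remove):
        # Drop the curve whose removal leaves the narrowest envelope.
        drop = min(range(len(kept)),
                   key=lambda i: l1_width(kept[:i] + kept[i + 1:]))
        kept.pop(drop)
    return envelope(kept)


# Toy bootstrap curves on a three-point grid; the last one is an outlier.
curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [10, 10, 10]]
lower, upper = prediction_band(curves, alpha=0.25)
print(lower, upper)
```

Replacing `l1_width` with a Hausdorff-type distance between the bounds would give an analogue of the second criterion mentioned in the abstract.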
|
500 |
Weighted Adaptive Methods for Multivariate Response Models with an HIV/Neurocognitive Application. Unknown Date
Multivariate response models are increasingly used in almost all fields, together with inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of uncorrelated canonical relationships between the two sets or, equivalently, determining the rank of the coefficient estimator in the multivariate response model. One way to do this is the Rank Selection Criterion (RSC) of Bunea et al., under the assumption that the error matrix has independent entries with constant variance. While this assumption is necessary to show their strong theoretical results, some flexibility is required in practical application; that is, such an assumption cannot always be safely made. What is developed here is the theory that parallels Bunea et al.'s work with the addition of a "decorrelator" weight matrix. One choice for the weight matrix is the residual covariance, but this introduces many issues in practice. A computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible by this weighted version of RSC, giving rise to an Adaptive CCA (ACCA) with principal proofs for the large sample setting. However, particular considerations are required for the high-dimensional setting, where similar theory does not hold. What is offered instead are extensive empirical simulations that reveal that using the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence also provides good estimation of the number of canonical relationships and variates. It is argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting. Another approach to avoid these issues is to employ some type of variable selection methodology first before applying ACCA. 
Indeed, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is the same as group selection in the univariate response model and thus completely eliminates these high-dimensional concerns. To offer a practical application of these ideas, ACCA is applied to a "large sample" neurocognitive dataset. Then, a high-dimensional dataset is generated to which Group LASSO is first applied before ACCA. This provides a unique perspective into the relationships between cognitive deficiencies in HIV-positive patients and the extensive, available neuroimaging measures. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2012. / February 10, 2012. / Includes bibliographical references. / Yiyuan She, Professor Directing Thesis; Anke Meyer-Baese, University Representative; Adrian Barbu, Committee Member; Florentina Bunea, Committee Member; Xufeng Niu, Committee Member.
|