341

Modern variable selection techniques in the generalised linear model with application in Biostatistics

Millard, Salomi 10 1900 (has links)
In a Biostatistics environment, the datasets to be analysed are frequently high-dimensional and multicollinearity is expected due to the nature of the features. However, many traditional approaches to statistical analysis and feature selection cease to be useful in the presence of high-dimensionality and multicollinearity. Penalised regression methods have proved to be practical and attractive for dealing with these problems. In this dissertation, we propose a new penalised approach, the modified elastic-net (MEnet), for statistical analysis and feature selection using a combination of the ridge and bridge penalties. This method is designed to deal with high-dimensional problems with highly correlated predictor variables. Furthermore, it has a closed-form solution, unlike the most frequently used penalised techniques, which makes it simple to implement on high-dimensional data. We show how this approach can be used to analyse high-dimensional data with binary responses, e.g., microarray data, and simultaneously select significant features. An extensive simulation study and analysis of a colon cancer dataset demonstrate the properties and practical aspects of the proposed method. / Mini Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2020. / DSI-CSIR Interbursary Support (IBS) Programme / Statistics Industry HUB, Department of Statistics, University of Pretoria / Statistics / MSc / Restricted
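As a rough point of reference for the kind of analysis described above, the sketch below fits a standard elastic-net-penalised logistic regression to simulated high-dimensional binary-response data and reads off the selected features. It illustrates penalised fitting and feature selection in general, not the dissertation's MEnet estimator; the simulation settings and sparsity pattern are hypothetical.

```python
# Illustrative sketch only: standard elastic-net-penalised logistic regression,
# not the MEnet estimator proposed in the dissertation. Settings are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 60, 500                                        # n << p, as in microarray data
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)     # induce multicollinearity
beta = np.zeros(p)
beta[:5] = 2.0                                        # only 5 truly active features
y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

model = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.5, max_iter=5000
).fit(X, y)

selected = np.flatnonzero(model.coef_.ravel() != 0)   # features kept by the penalty
print("selected features:", selected[:10], "... total:", selected.size)
```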
342

Modelling of highly skewed longitudinal count data based on the discrete Weibull distribution

Nel, Helene Mari January 2021 (has links)
Longitudinal data refer to multiple observations collected on the same subject (or unit) over time. Zero-inflated data (containing many zeros) frequently occur, resulting in overdispersion in count data. Regression models used to analyse count data are often based on the Poisson and negative binomial (NB) distributions. The Poisson distribution is restrictive when counts are not equidispersed: the regression model can give inappropriate fits when the variability in the data is larger or smaller than the theoretical variance, two cases referred to as overdispersion and underdispersion, respectively. The NB distribution handles overdispersed data better than the Poisson distribution, but not underdispersed data; it also does not accommodate heavy-tailed or highly skewed data well. In this study, the discrete Weibull (DW) and zero-inflated DW (ZIDW) distributions are explored in a mixed-model context that models the median using a Bayesian approach, whereas the conventional NB and zero-inflated NB (ZINB) mixed-effects regression models model the mean counts over time. Results from the four mixed-effects regression models are compared. The Bayesian DW and ZIDW mixed-effects regression models are found to be computationally competitive with the Bayesian NB and ZINB mixed-effects regression models in terms of flexibility, implementation, and convergence speed. The DW and ZIDW models are excellent choices for modelling highly skewed longitudinal count data. / Mini Dissertation (MSc (Advanced Data Analytics))--University of Pretoria, 2021. / NRF / Statistics / MSc (Advanced Data Analytics) / Unrestricted
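For readers unfamiliar with the distribution underlying the DW and ZIDW models, the sketch below assumes the common type I discrete Weibull parameterisation (q, β) and checks that the probabilities sum to one; the dissertation's Bayesian mixed-effects and median-regression machinery is not shown.

```python
# Minimal sketch of the type I discrete Weibull pmf and its zero-inflated form.
# The (q, beta) parameterisation is assumed; parameter values are illustrative.
import numpy as np

def dweibull_pmf(x, q, beta):
    """P(X = x) = q**(x**beta) - q**((x+1)**beta), for x = 0, 1, 2, ..."""
    x = np.asarray(x, dtype=float)
    return q ** (x ** beta) - q ** ((x + 1) ** beta)

def zidw_pmf(x, pi, q, beta):
    """Zero-inflated DW: extra point mass pi at zero."""
    x = np.asarray(x, dtype=float)
    base = (1 - pi) * dweibull_pmf(x, q, beta)
    return np.where(x == 0, pi + base, base)

support = np.arange(0, 200)
print(dweibull_pmf(support, q=0.8, beta=0.9).sum())        # ~1, sanity check
print(zidw_pmf(support, pi=0.3, q=0.8, beta=0.9).sum())    # ~1, sanity check
```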
343

An analysis of test reliability

Unknown Date (has links)
"The need for efficient means of testing has long been recognized. To obtain efficiency in testing requires the study of four attributes of the testing instrument--namely: reliability, validity, interpretability and administrability. It is the purpose of this paper to examine in some detail the first of these attributes, reliability. In particular, this is an attempt to analyse the reliability of Mathematics 101 Test D which was administered at Florida State University in the fall of 1948"--Introduction. / Typescript. / "July, 1949." / "Submitted to the Graduate Council of Florida State University in partial fulfillment of the requirements for the degree of Master of Science under Plan II." / Includes bibliographical references (leaf 28).
344

Some statistical properties of Laguerre coefficient estimates.

Kaufman, David. January 1970 (has links)
No description available.
345

Comparison of Proposed K Sample Tests with Dietz's Test for Nondecreasing Ordered Alternatives for Bivariate Normal Data

Zhao, Yanchun January 2011 (has links)
There are many situations in which researchers want to consider a set of response variables simultaneously rather than just one response variable. For instance, a researcher may wish to determine the effects of an exercise and diet program on both the cholesterol levels and the weights of obese subjects. Dietz (1989) proposed two multivariate generalizations of the Jonckheere test for ordered alternatives. In this study, we propose k-sample tests for nondecreasing ordered alternatives for bivariate normal data and compare their powers with Dietz's sum statistic. The proposed k-sample tests are based on transformations of bivariate data to univariate data. The transformations considered are the sum, maximum and minimum functions; the ideas for these transformations come from Leconte, Moreau, and Lellouch (1994). After the underlying bivariate normal data are reduced to univariate data, the Jonckheere-Terpstra (JT) test (Terpstra, 1952; Jonckheere, 1954) and the Modified Jonckheere-Terpstra (MJT) test (Tryon and Hettmansperger, 1973) are applied to the univariate data. A simulation study is conducted to compare the proposed tests with Dietz's test for k bivariate normal populations (k = 3, 4, 5). A variety of sample sizes and various location shifts are considered, and two different correlations are used for the bivariate normal distributions. The simulation results show that the Dietz test generally performs best for the situations considered with an underlying bivariate normal distribution. The estimated powers of the MJT-sum and JT-sum tests are often close, with the MJT-sum test generally having slightly higher power. The sum transformation was the best of the three transformations for bivariate normal data.
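The "transform, then test" idea can be sketched as follows: each bivariate observation is collapsed to a univariate value (here with the sum transformation) and the Jonckheere-Terpstra statistic is computed across the ordered groups. This is a generic JT implementation with a no-ties normal approximation, not the thesis's simulation code, and the location shifts and correlation are illustrative.

```python
# Rough sketch: reduce bivariate observations to univariate values (the sum) and
# apply the Jonckheere-Terpstra test for a nondecreasing ordered alternative.
import numpy as np
from scipy.stats import norm

def jonckheere_terpstra(groups):
    """groups: list of 1-D arrays in the hypothesised nondecreasing order."""
    jt = sum((y[:, None] > x[None, :]).sum() + 0.5 * (y[:, None] == x[None, :]).sum()
             for i, x in enumerate(groups) for y in groups[i + 1:])
    n = np.array([len(g) for g in groups])
    N = n.sum()
    mean = (N**2 - (n**2).sum()) / 4.0
    var = (N**2 * (2 * N + 3) - (n**2 * (2 * n + 3)).sum()) / 72.0
    z = (jt - mean) / np.sqrt(var)
    return jt, 1.0 - norm.cdf(z)          # one-sided p-value (normal approximation)

rng = np.random.default_rng(1)
# three bivariate normal samples with increasing location shift
samples = [rng.multivariate_normal([d, d], [[1, 0.5], [0.5, 1]], size=20)
           for d in (0.0, 0.4, 0.8)]
univariate = [s.sum(axis=1) for s in samples]     # sum transformation
print(jonckheere_terpstra(univariate))
```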
346

Groupings in item demand problems

Carter, Walter 08 June 2010 (has links)
In this dissertation an iterative procedure, due to Hartley [9], for obtaining the maximum likelihood estimators of the parameters of underlying discrete distributions is studied for the case of grouped random samples. It is shown that when the underlying distribution is Poisson the process always converges, regardless of the initial value taken for the unknown parameter. In showing this, a rather interesting property of the Poisson distribution was derived: if one defines a connected group of integers to be one that contains all the integers between and including its end points, then the variance of the sub-distribution defined on this connected set is strictly less than the variance of the complete Poisson distribution. A Monte Carlo study was performed to indicate how increasing group sizes affect the variances of the maximum likelihood estimators. As a result of a problem encountered by the Office of Naval Research, combinations of distributions were introduced. The difference between such combinations and the classical mixtures of distributions is that a new distribution must be considered whenever the random variable in question increases by an integral multiple of a known integer constant, b. When all the data are present, the estimation problem is no more complicated than estimating the individual parameters of the component distributions. However, it is pointed out that very frequently the observed samples are defective in that none of the component frequencies are observed; hence, horizontal grouping of the sample values occurs, as opposed to the vertical grouping encountered previously in the one-parameter Poisson case. An extension of the iterative procedure used to obtain the maximum likelihood estimator of the single-parameter grouped Poisson distribution is made to obtain the estimators of the parameters in a horizontally grouped sample. As a practical example, the component distributions were all taken to be from the Poisson family; the estimators were obtained and their properties were studied. The regularity conditions which are sufficient to show that a consistent and asymptotically normally distributed solution to the likelihood equations exists are seen to be satisfied for such combinations of Poisson distributions. Further, in the full-data case, a set of jointly sufficient statistics is exhibited and, since in the presence of sufficient statistics the solutions to the likelihood equations are unique, the estimators are consistent and asymptotically normal. It is seen that such combinations of distributions can be applied to problems in item demands. A justification of the Poisson distribution is given for such applications, but it is also pointed out that the Negative Binomial distribution might be applicable. It is also shown that such a probability model might have an application in testing the efficiency of an anti-ballistic missile system when under attack by missiles which carry multiple warheads. However, no data were available and hence the study of this application could be carried no further. / Ph. D.
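A simplified sketch of the kind of iteration Hartley's procedure performs in the grouped Poisson case is given below: each grouped observation is replaced by its conditional expectation under the current estimate of the mean, and the mean is re-estimated until convergence (an EM-type scheme). The groups and frequencies are illustrative, not data from the dissertation.

```python
# Simplified EM-type sketch, in the spirit of Hartley's iterative procedure, for
# the MLE of a Poisson mean from a grouped sample. Groups/frequencies are made up.
import numpy as np
from scipy.stats import poisson

groups = [(0, 1), (2, 4), (5, 9), (10, 25)]   # connected groups of integers [a, b]
freqs = np.array([30, 45, 20, 5])             # observed frequency in each group

lam = 3.0                                      # any positive starting value
for _ in range(200):
    cond_means = []
    for a, b in groups:
        k = np.arange(a, b + 1)
        p = poisson.pmf(k, lam)
        cond_means.append((k * p).sum() / p.sum())   # E[X | a <= X <= b, lam]
    new_lam = (freqs * np.array(cond_means)).sum() / freqs.sum()
    if abs(new_lam - lam) < 1e-10:
        break
    lam = new_lam
print("grouped-sample MLE of the Poisson mean:", round(lam, 4))
```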
347

A Subset Selection Rule for Three Normal Populations

Culpepper, Bert 01 July 1982 (has links) (PDF)
No description available.
348

Restrictive ranking

Norman, James Everett January 1965 (has links)
This dissertation is a study of certain aspects of restricted ranking, a method intended for use by a panel of m judges evaluating the relative merits of N subjects, candidates for scholarships, awards, etc. Each judge divides the N subjects into R classes, so that nᵢ individuals receive a grade i (i = 1, 2, ..., R; Σnᵢ = N), where the R numbers nᵢ are close to N/R (nᵢ = N/R when N is divisible by R), are preassigned, and are the same for all judges. When this method is used, all subjects are treated alike, the grading system is the same for all judges and the grades of each judge are given equal weight. Equally important, the meaning of a particular grade is clear to each judge and the same for each judge. Under the null hypothesis that all nR = N subjects are of equal merit, tests of significance are developed to determine whether (1) a particular individual is superior or inferior to the rest of the subjects; (2) two particular subjects are of equal merit; (3) the individuals with the highest and lowest scores are respectively superior and inferior to the rest of the subjects; and (4) the nR subjects form a homogeneous group. The critical values of the test statistics for (1), (2) and (3) are tabled for small to moderate values of m, an approximation based on the asymptotic normality of the appropriate test statistic proving suitable for large m. The test of homogeneity (4) employs a sum of squares of subjects' scores which is shown to be asymptotically distributed, for m→∞, as chi-square with nR-1 degrees of freedom. For the special case of complete ranking (R = N), this statistic is identical to one proposed by Friedman (1937) for m rankings. The behavior of two of these tests is theoretically investigated for the non-null case of nR-1 subjects having equal merit and one "outlying" subject whose merit exceeds the others'. The assumption is made that each judge j assigns a grade to every subject i on the basis of a "subjective random variable" xᵢⱼ with mean equal to the "true" merit of subject i, and that the distribution of xᵢⱼ is the same for all j. The probability, P(δ), that subject #1, with true mean differing from the others by an amount δ, would receive a significantly high score according to the test for outliers is obtained and presented graphically as a function of δ for xᵢⱼ distributed as (1/2) sech²(x-δ) and also as N(δ, 1). Using a result due to Hannan (1956), an expression for the asymptotic relative efficiency of the chi-squared homogeneity test for restricted vs. complete ranking for the aforementioned non-null case is obtained, and values of this A.R.E. for 2 ≤ n ≤ 10 and 2 ≤ R ≤ 8 are tabled. This A.R.E. is found to be at least 0.9 for all cases where n ≤ 10 and R ≥ 4. A further comparison of the performances of restricted (R) and complete (C) ranking is made by way of some simulation studies performed on a high-speed digital computer for the non-null case where xᵢⱼ is normally distributed with unit variance and a mean δᵢ having as many as three different possible values. The complete and restricted ranks assigned by the jth judge to the ith subject are based on the value of xᵢⱼ, obtained by experimental sampling using a random normal number generator in the computer program. A group of Nₛ subjects with the highest rank sums for (R) and for (C) is then selected in each study. The observed difference in true means between the selected and remaining groups is then used as a measure of goodness of the two selection procedures.
The results of these studies are presented graphically, displaying a very close agreement between (R) and (C) in all instances. / Ph. D.
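An illustrative re-creation of the simulation scheme described above (not the original program) is sketched below: each judge scores every subject through xᵢⱼ ~ N(δᵢ, 1), the scores are converted to complete ranks and to restricted grades with R equal-sized classes, and the subjects with the highest rank sums under each scheme are selected; the true merits δᵢ, and all sizes, are hypothetical.

```python
# Illustrative re-creation of the restricted-vs-complete ranking simulation;
# true merits, panel size, and class sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
m, R, n = 10, 4, 5                            # judges, grade classes, subjects per class
N = n * R
delta = np.repeat([0.0, 0.0, 0.5, 1.0], n)    # hypothetical true merits (3 distinct values)

x = delta + rng.standard_normal((m, N))               # subjective scores, one row per judge
complete = x.argsort(axis=1).argsort(axis=1) + 1       # complete ranks 1..N per judge
restricted = (complete - 1) // n + 1                   # grades 1..R, n subjects per grade

Ns = 5
top_restricted = np.argsort(restricted.sum(axis=0))[-Ns:]   # highest restricted rank sums
top_complete = np.argsort(complete.sum(axis=0))[-Ns:]       # highest complete rank sums

# "goodness": mean true merit of selected minus remaining subjects
for name, sel in [("restricted", top_restricted), ("complete", top_complete)]:
    rest = np.setdiff1d(np.arange(N), sel)
    print(name, round(delta[sel].mean() - delta[rest].mean(), 3))
```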
349

Two-way rank-sum tests for variances

Ansari, Abdur Rahman January 1959 (has links)
Ph. D.
350

Power characteristics of Kramer's method for analysis of variance of a two-way classification with disproportionate subclass numbers

Dunn, James Eldon January 1963 (has links)
A theorem by Shah and Khatri [A.M.S. (1961) 32: 883-887] is extended to give the distribution of Q/χ², where Q is a positive definite quadratic form involving non-central normal variates and χ² is an independently distributed chi-square variate. Conditions are given under which the distribution of this ratio reduces to that of a non-central F. Kramer's method for analysis of variance of a two-way classification with disproportionate subclass numbers is reviewed and shown to satisfy these conditions. Various functional forms of the non-centrality parameter for evaluating the power of his method are given. Additional algebraic and numerical results are obtained to compare the power of Kramer's method and the method of fitting constants (least squares) outlined by Yates [J.A.S.A. (1934) 29: 51-66]. A mnemonic rule, based on 310 randomly generated two-way classifications, is given for discriminating against use of Kramer's method in situations where it may be very deficient in power compared to the method of fitting constants. / Ph. D.
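Once the conditions above reduce the test statistic to a non-central F variate, power can be evaluated numerically from the non-central F distribution for a given non-centrality parameter. The sketch below shows such a generic calculation; the degrees of freedom and non-centrality values are illustrative, not taken from the dissertation.

```python
# Generic power calculation for an ANOVA F test whose statistic follows a
# non-central F distribution; the df and non-centrality values are illustrative.
from scipy.stats import f, ncf

def f_test_power(df1, df2, noncentrality, alpha=0.05):
    crit = f.ppf(1 - alpha, df1, df2)              # central-F critical value
    return ncf.sf(crit, df1, df2, noncentrality)   # P(reject | noncentrality)

for lam in (0.0, 2.0, 5.0, 10.0):
    print(f"noncentrality {lam:>4}: power = {f_test_power(3, 20, lam):.3f}")
```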
