91 |
Item and person parameter estimation using hierarchical generalized linear models and polytomous item response theory models / Williams, Natasha Jayne. 27 July 2011 (has links)
Not available / text
|
92 |
Structural inference of linear models for some families of error distributions / 欒世武, Luan, Shiwu. January 1998 (has links)
published_or_final_version / Statistics / Doctoral / Doctor of Philosophy
|
93 |
Analyzing the Behavior of Rats by Repeated Measurements / Hall, Kenita A. 03 May 2007 (has links)
Longitudinal data, also known as repeated measures, have grown increasingly popular in recent years because of their ability to monitor change both within and between subjects. Statisticians in many fields have chosen this way of collecting data because it is cost-effective and minimizes the number of subjects required to produce a meaningful outcome. This thesis explores longitudinal studies to gain a thorough understanding of why this form of data collection has grown so rapidly. It also describes several methods for analyzing repeated measures, using data collected on the behavior of both adolescent and adult rats. The questions of interest are whether there is a change in the mean response over time and whether the covariates (age, bodyweight, gender, and time) influence those changes. After extensive testing, our data set exhibits a positive nonlinear change in the mean response over time within the age and gender groups. A model that included random effects proved to be a better method than models without any random effects. Taking the log of the response variable and using day as the random effect gave the best overall fit for our dataset. The transformed model also showed all covariates except age to be significant.
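The gain from modelling within-subject correlation, as described above, can be sketched with a small simulation; the subject counts, effect sizes, and variable names below are hypothetical, and subject dummies in ordinary least squares stand in for a full random-effects fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical longitudinal layout: 20 subjects, 10 days each,
# response generated on the log scale with a subject-level random intercept.
n_sub, n_day = 20, 10
subject = np.repeat(np.arange(n_sub), n_day)
day = np.tile(np.arange(n_day), n_sub)
u = rng.normal(0.0, 0.8, size=n_sub)  # random intercepts
log_resp = 1.0 + 0.15 * day + u[subject] + rng.normal(0.0, 0.3, size=n_sub * n_day)

# Pooled OLS: ignores the within-subject correlation.
X_pool = np.column_stack([np.ones_like(day, dtype=float), day])
beta_pool, *_ = np.linalg.lstsq(X_pool, log_resp, rcond=None)
sse_pool = np.sum((log_resp - X_pool @ beta_pool) ** 2)

# Subject-dummy OLS: a fixed-effects stand-in for the random-intercept model.
dummies = (subject[:, None] == np.arange(n_sub)[None, :]).astype(float)
X_fe = np.column_stack([dummies, day])
beta_fe, *_ = np.linalg.lstsq(X_fe, log_resp, rcond=None)
sse_fe = np.sum((log_resp - X_fe @ beta_fe) ** 2)

print(sse_fe < sse_pool)  # accounting for subjects absorbs between-subject variance
```

Because the pooled design is nested in the subject-dummy design, the residual sum of squares can only decrease when subject effects are modelled, which mirrors the thesis's finding that random-effects models fit better.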
|
94 |
Optimal designs for linear mixed models / Debusho, Legesse Kassa. January 2004 (has links)
The research of this thesis deals with the derivation of optimal designs for linear mixed models. The problem of constructing optimal designs for linear mixed models is very broad, so the thesis focuses mainly on design theory for random coefficient regression models, which are a special case of the linear mixed model. Specifically, the major objective of the thesis is to construct, algebraically, optimal designs for the simple linear and the quadratic regression models with a random intercept. A second objective is to investigate numerically the nature of optimal designs for the simple linear random coefficient regression model. In all models time is considered as an explanatory variable and its values are assumed to belong to the set {0, 1, ..., k}. Two sets of individual designs are used in the thesis: designs with non-repeated time points, comprising up to k + 1 distinct time points, and designs with repeated time points, comprising up to k + 1 time points that are not necessarily distinct. In the first case there are 2^(k+1) - 1 individual designs, while in the second case there are C(2k+2, k+1) - 1 such designs. The problems of constructing population designs, which allocate weights to the individual designs in such a way that the information associated with the model parameters is in some sense maximized and the variances associated with the mean responses at a given vector of time points are in some sense minimized, are addressed. In particular, D- and V-optimal designs are discussed. A geometric approach is introduced to confirm the global optimality of D- and V-optimal designs for the simple linear regression model with a random intercept. It is shown that for the simple linear regression model with a random intercept these optimal designs are robust to the choice of the variance ratio. A comparison of these optimal designs over the sets of individual designs with repeated and non-repeated points for that model is also made, and indicates that the D- and V-optimal population designs based on the individual designs with repeated points are more efficient than the corresponding optimal population designs with non-repeated points. Except in the one-point case, D- and V-optimal population designs change with the value of the variance ratio for the quadratic regression model with a random intercept. Further numerical results show that the D-optimal designs for the random coefficient models depend on the choice of variance components. / Thesis (Ph.D.) - University of KwaZulu-Natal, Pietermaritzburg, 2004.
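The comparison of repeated versus non-repeated time points under a random intercept can be sketched numerically; the value of k, the candidate designs, and the variance ratios below are illustrative, with the random intercept entering the response covariance as V = I + (variance ratio) * J.

```python
import numpy as np

def info_det(times, var_ratio=1.0):
    """det of the information matrix X' V^{-1} X for a simple linear model
    with a random intercept; V = I + var_ratio * J (error variance set to 1)."""
    t = np.asarray(times, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    V = np.eye(len(t)) + var_ratio * np.ones((len(t), len(t)))
    M = X.T @ np.linalg.solve(V, X)
    return np.linalg.det(M)

k = 4  # time points chosen from {0, 1, ..., k}
extremes = [0, 0, k, k]  # individual design with repeated end points
spread = [0, 1, 3, 4]    # individual design with non-repeated points

# D-optimality compares determinants of the information matrices.
print(info_det(extremes), info_det(spread))
```

Evaluating `info_det` at several variance ratios shows the repeated-endpoint design retaining the larger determinant throughout, consistent with the thesis's conclusions on the efficiency of repeated points and robustness to the variance ratio.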
|
95 |
Use of statistical modelling and analyses of malaria rapid diagnostic test outcome in Ethiopia / Ayele, Dawit Getnet. 12 December 2013 (has links)
The transmission of malaria is among the leading public health problems in Ethiopia; more than 75% of the country's total area is malarious. Identifying the infectiousness of malaria by socio-economic, demographic and geographic risk factors based on malaria rapid diagnostic test (RDT) survey results has several advantages for planning, monitoring and controlling, and for the eventual malaria eradication effort. Such a study requires a thorough understanding of the disease process and associated factors; however, such studies are limited. Therefore, the aim of this study was to use different statistical tools suitable for identifying socio-economic, demographic and geographic risk factors of malaria based on malaria RDT survey results in Ethiopia. A total of 224 clusters of about 25 households each were selected from the Amhara, Oromiya and Southern Nation Nationalities and People (SNNP) regions of Ethiopia. Accordingly, a number of binary-response statistical models were used. Multiple correspondence analysis was carried out to identify the associations among socio-economic, demographic and geographic factors. Moreover, a number of binary-response models, such as survey logistic regression, the GLMM, the GLMM with spatial correlation, joint models and semi-parametric models, were applied. To test how well the observed malaria RDT result, use of mosquito nets and use of indoor residual spray fit the expectations of the model, the Rasch model was used. The fitted models have their own strengths and weaknesses. Application of these models was carried out by analysing data on malaria RDT results. The data used in this study, collected from December 2006 to January 2007 by The Carter Center, come from a baseline malaria indicator survey in the Amhara, Oromiya and SNNP regions of Ethiopia.
Correspondence analysis and the survey logistic regression model were used to identify predictors that affect malaria RDT results. The effects of the identified socio-economic, demographic and geographic factors were subsequently explored by fitting a generalized linear mixed model (GLMM), that is, by assessing the covariance structures of the random components (the association structure of the data). To examine whether the data displayed any spatial autocorrelation, i.e., whether surveys that are near in space have malaria prevalence or incidence more similar than surveys that are far apart, a spatial statistical analysis was performed by introducing a spatial autocorrelation structure into the GLMM. Moreover, the customary two-variable joint modelling approach was extended to a three-variable joint model by exploring the joint effect of malaria RDT result, use of mosquito nets and use of indoor residual spray in the last twelve months; assessing the association between these outcomes was also of interest. Furthermore, the relationships between the response and some confounding covariates may have unknown functional form, which led to proposing the use of semi-parametric additive models, which are less restrictive in their specification. Therefore, generalized additive mixed models were used to model the effects of age, family size, number of rooms per person, number of nets per person, altitude and number of months since the room was sprayed non-parametrically.
The results suggest that the correct use of mosquito nets, indoor residual spraying and other preventative measures, coupled with factors such as the number of rooms in a house, are associated with a decrease in the incidence of malaria as determined by the RDT. However, the study also suggests that the poor are less likely to use these preventative measures to effectively counteract the spread of malaria. In order to determine whether or not the limited number of respondents had undue influence on the malaria RDT result, a Rasch model was used; the results show that none of the responses had such influence. The application of the Rasch model therefore supported the viability of the sixteen (socio-economic, demographic and geographic) items for measuring malaria RDT result, use of indoor residual spray and use of mosquito nets, and the analysis shows that the scale has high reliability. Hence, the Rasch model results support the analyses carried out with the previous models. / Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2013.
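The core binary-response step behind the models above can be sketched, without survey weights or random effects, as a plain logistic regression fitted by Newton-Raphson; the covariates and coefficients below are synthetic stand-ins, not the survey's actual variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariates standing in for survey predictors (e.g. altitude, nets).
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.8, -0.5])
p = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = rng.binomial(1, p)  # binary outcome, analogous to an RDT result

# Newton-Raphson (IRLS) iterations for the logistic log-likelihood.
beta = np.zeros(3)
for _ in range(25):
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = mu * (1.0 - mu)                      # working weights
    grad = X.T @ (y - mu)                    # score vector
    hess = X.T @ (X * W[:, None])            # observed information
    beta = beta + np.linalg.solve(hess, grad)

print(np.round(beta, 2))  # recovers values near beta_true
```

Survey logistic regression and the GLMM extend this same likelihood with design weights and random cluster effects, respectively.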
|
96 |
Statistical modelling of availability of major food cereals in Lesotho : application of regression models and diagnostics / Khoeli, Makhala Bernice. January 2012 (has links)
Oftentimes, the application of regression models to analyse cereals data is limited to estimating and predicting crop production or yield. The general approach has been to fit the model without much consideration of the problems that accompany the application of regression models to real-life data, such as collinearity, models not fitting the data correctly, and violation of assumptions. These problems may interfere with the applicability and usefulness of the models, and compromise the validity of results if they are not corrected when fitting the model. We applied regression models and diagnostics to national and household data to model the availability of the main cereals in Lesotho, namely maize, sorghum and wheat. The application includes the linear regression model, regression and collinearity diagnostics, the Box-Cox transformation, ridge regression, quantile regression, and logistic regression with its extensions to multiple nominal and ordinal responses.
The linear model with a first-order autoregressive process, AR(1), was used to determine factors that affected the availability of cereals at the national level. Case-deletion diagnostics were used to identify extreme observations with influence on different quantities of the fitted regression model, such as the estimated parameters, predicted values, and covariance matrix of the estimates. Collinearity diagnostics detected the presence of more than one collinear relationship coexisting in the data set; they also determined the variables involved in each relationship and assessed the potential negative impact of collinearity on the estimated parameters. Ridge regression remedied the collinearity problems by controlling the inflation and instability of the estimates. The Box-Cox transformation corrected non-constant variance and the longer, heavier tails of the data distribution. These corrections increased the applicability and usefulness of the linear models in modelling the availability of cereals.
Quantile regression, as a robust regression method, was applied to the household data as an alternative to classical regression. Classical regression estimates from the ordinary least squares method are sensitive to distributions with longer and heavier tails than the normal distribution, as well as to outliers, whereas quantile regression estimates appear to be more efficient than least squares estimates for a wide range of error-term distributions. We studied the availability of cereals further by categorizing households according to the availability of different cereals, and applied the logistic regression model and its extensions. Logistic regression was applied to model availability and non-availability of cereals; multinomial logistic regression was applied to model availability with multiple nominal categories; and ordinal logistic regression was applied to model availability with ordinal categories, making full use of the available information. The three variants of the logistic regression model gave results that are in mutual agreement, and that also agree with the results from the linear regression model and the quantile regression model. / Thesis (Ph.D.)-University of KwaZulu-Natal, Durban, 2012.
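The contrast between least squares and quantile regression under heavy-tailed errors can be sketched as follows; the data are synthetic, and the median fit is obtained by subgradient descent on the pinball loss, a simple stand-in for the linear-programming solvers usually used for quantile regression.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data with gross upper-tail outliers.
n = 300
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)
y[:15] += 40.0                      # contaminate 5% of observations
X = np.column_stack([np.ones(n), x])

# OLS fit (sensitive to the outliers).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

def pinball(beta, tau=0.5):
    """Mean pinball (check) loss at quantile level tau."""
    r = y - X @ beta
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

# Median (tau = 0.5) regression by subgradient descent, starting from the OLS
# solution and keeping the best iterate seen.
beta = beta_ols.copy()
best, best_loss = beta.copy(), pinball(beta)
for _ in range(2000):
    r = y - X @ beta
    g = -X.T @ (0.5 - (r < 0)) / n   # subgradient of the pinball loss
    beta = beta - 0.1 * g
    loss = pinball(beta)
    if loss < best_loss:
        best, best_loss = beta.copy(), loss

print(best_loss <= pinball(beta_ols))  # the robust fit is no worse in pinball loss
```

Because the descent starts from the OLS coefficients and keeps the best iterate, the median-regression fit is guaranteed not to be worse than OLS under the pinball criterion, illustrating the robustness argument in the abstract.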
|
97 |
A comparison of Bayesian variable selection approaches for linear models / Rahman, Husneara. 03 May 2014 (has links)
Bayesian variable selection approaches are more powerful in discriminating among models, regardless of whether the models under investigation are hierarchical. Although Bayesian approaches require complex computation, the use of Markov chain Monte Carlo (MCMC) methods, such as the Gibbs sampler and the Metropolis-Hastings algorithm, makes computation easier. In this study we investigated the effectiveness of Bayesian variable selection approaches in comparison to non-Bayesian, or classical, approaches. For this purpose, we compared the performance of Bayesian versus non-Bayesian variable selection approaches for linear models. Among the Bayesian approaches, we studied the Conditional Predictive Ordinate (CPO) and the Bayes factor. Among the non-Bayesian or classical approaches, we implemented adjusted R-square, the Akaike Information Criterion (AIC) and the Bayes Information Criterion (BIC) for model selection. We performed a simulation study to examine how Bayesian and non-Bayesian approaches perform in selecting variables, and we also applied these methods to real data and compared their performances. We observed that for linear models, Bayesian variable selection approaches perform as consistently as non-Bayesian approaches. / Bayesian inference -- Bayesian inference for normally distributed likelihood -- Model adequacy -- Simulation approach -- Application to wage data. / Department of Mathematical Sciences
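The classical criteria named above can be sketched on synthetic data; the design and effect sizes are hypothetical, and the Bayes factor here is approximated from BIC via the standard Schwarz approximation rather than computed from a full posterior.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

# Synthetic linear data: only predictors 0 and 1 are truly active.
n = 200
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, n)

def ic(cols):
    """Gaussian-likelihood AIC and BIC for a subset of predictors (plus intercept)."""
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    k = Xd.shape[1] + 1                          # coefficients + error variance
    ll = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return (-2 * ll + 2 * k, -2 * ll + np.log(n) * k)

subsets = [c for r in range(1, 5) for c in combinations(range(4), r)]
best_aic = min(subsets, key=lambda c: ic(c)[0])
best_bic = min(subsets, key=lambda c: ic(c)[1])

# Schwarz approximation to the Bayes factor for {0,1} over {0} alone.
bf = np.exp((ic((0,))[1] - ic((0, 1))[1]) / 2)
print(best_aic, best_bic, bf > 1)
```

With a strong signal, both criteria retain the truly active predictors, and the BIC-based Bayes factor strongly favors the model that includes both of them.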
|
98 |
An investigation into the use of combined linear and neural network models for time series data / A.S. Kruger. Kruger, Albertus Stephanus. January 2009 (has links)
Time series forecasting is an important area of forecasting in which past observations of the same variable are collected and analyzed to develop a model describing the underlying relationship. The model is then used to extrapolate the time series into the future. This modeling approach is particularly useful when little knowledge is available on the underlying data generating process or when there is no satisfactory explanatory model that relates the prediction variable to other explanatory variables. Time series can be modeled in a variety of ways e.g. using exponential smoothing techniques, regression models, autoregressive (AR) techniques, moving averages (MA) etc. Recent research activities in forecasting also suggested that artificial neural networks can be used as an alternative to traditional linear forecasting models. This study will, along the lines of an existing study in the literature, investigate the use of a hybrid approach to time series forecasting using both linear and neural network models. The proposed methodology consists of two basic steps. In the first step, a linear model is used to analyze the linear part of the problem and in the second step a neural network model is developed to model the residuals from the linear model. The results from the neural network can then be used to predict the error terms for the linear model. This means that the combined forecast of the time series will depend on both models. Following an overview of the models, empirical tests on real world data will be performed to determine the forecasting performance of such a hybrid model. Results have indicated that depending on the forecasting period, it might be worthwhile to consider the use of a hybrid model. / Thesis (M.Sc. (Computer Science))--North-West University, Vaal Triangle Campus, 2010.
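The two-step hybrid described above can be sketched in a few lines; the series below is synthetic, and a fixed-random-feature network whose output layer is solved by least squares (an extreme-learning-machine-style shortcut) stands in for a fully trained neural network.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical series: linear AR(1) signal plus a nonlinear term in the lag.
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * np.sin(2.0 * y[t - 1]) + rng.normal(0, 0.1)

# Step 1: fit the linear AR(1) part by least squares.
X_lin = np.column_stack([np.ones(n - 1), y[:-1]])
phi, *_ = np.linalg.lstsq(X_lin, y[1:], rcond=None)
resid = y[1:] - X_lin @ phi

# Step 2: a one-hidden-layer network on the lagged value models the residuals.
# Hidden weights are random and fixed; only the output layer is solved.
x = y[:-1].reshape(-1, 1)
W1 = rng.normal(0, 2.0, (1, 16))
b1 = rng.normal(0, 1.0, 16)
H = np.column_stack([np.ones(n - 1), np.tanh(x @ W1 + b1)])
w_out, *_ = np.linalg.lstsq(H, resid, rcond=None)

# Combined forecast = linear part + neural correction on the residuals.
combined = X_lin @ phi + H @ w_out
mse_lin = np.mean((y[1:] - X_lin @ phi) ** 2)
mse_hyb = np.mean((y[1:] - combined) ** 2)
print(mse_hyb <= mse_lin)
```

Since the residual model is fitted by least squares over features that include a constant, the in-sample error of the combined forecast cannot exceed that of the linear model alone, which is the motivation for the hybrid approach.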
|
100 |
Applications of statistics in flood frequency analysis / Ahmad, Muhammad Idrees. January 1989 (has links)
Estimation of the probability of occurrence of future flood events at one or more locations across a river system is frequently required for the design of bridges, culverts, spillways, dams and other engineering works. This study investigates some of the statistical aspects of estimating the flood frequency distribution at a single site and on a regional basis. It is demonstrated that the generalized logistic (GL) distribution has many properties well suited to the modelling of flood frequency data. The GL distribution performs better than the other commonly recommended flood frequency distributions in terms of several key properties. Specifically, it is capable of reproducing almost the same degree of skewness typically present in observed flood data; it appears to be more robust to the presence of extreme outliers in the upper tail of the distribution; and it has a relatively simple mathematical form, so all the well-known methods of parameter estimation can be easily implemented. It is shown that the method of probability weighted moments (PWM) using the conventionally recommended plotting position substantially affects the estimation of the shape parameter of the generalized extreme value (GEV) distribution by relocating the annual maximum flood series. A location-invariant plotting position is introduced for use in estimating, by the method of PWM, the parameters of the GEV and GL distributions. Tests based on empirical distribution function (EDF) statistics are proposed to assess the goodness of fit of flood frequency distributions. A modified EDF test is derived that gives greater emphasis to the upper tail of a distribution, which is more important for flood frequency prediction. Significance points are derived for the GEV and GL distributions when the parameters are to be estimated from the sample data by the method of PWMs; the critical points are considerably smaller than in the case where the parameters of a distribution are assumed to be specified.
Approximate formulae over the whole range of the distribution are also developed for these tests, which can be used for regional assessment of GEV and GL models based on all the annual maximum series in a hydrological region simultaneously. In order to pool at-site flood data across a region into a single series for regional analysis, the effect of standardization by the at-site mean on the estimation of the regional shape parameter of the GEV distribution is examined. A simulation study based on various synthetic regions reveals that standardization by the at-site mean underestimates the shape parameter of the GEV by about 30% of its true value and also contributes to the separation of skewness of observed and simulated floods. A two-parameter standardization by the at-site estimates of the location and scale parameters is proposed. It does not distort the shape of the flood frequency data in the pooling process; it therefore offers a significantly improved estimate of the shape parameter, allows pooling of data with heterogeneous coefficients of variation, and helps to explain the separation-of-skewness effect. Regions based on the flood statistics L-CV and L-SKEW are derived for Scotland and North England. Only about 50% of the basins could be correctly identified as belonging to these regions by a set of seven catchment characteristics; the alternative approach of grouping basins solely on the basis of physical properties is preferable. Six physically homogeneous groups of basins are identified by Ward's multivariate clustering algorithm using the same seven characteristics, and these regions have hydrological homogeneity in addition to their physical homogeneity. Dimensionless regional flood frequency curves are produced by fitting the GEV and GL distributions for each region. The GEV regional growth curves imply a larger return period for a given flood magnitude; when floods are described by the GL model, the respective return periods are considerably smaller.
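The PWM estimation step for the GEV can be sketched with the standard Hosking-type estimators; note this sketch uses the unbiased sample PWMs rather than the plotting-position versions discussed in the thesis, and the simulated parameter values are arbitrary.

```python
import math
import numpy as np

def pwm_gev(sample):
    """GEV parameters (location xi, scale alpha, shape k) from unbiased
    sample probability-weighted moments (Hosking-style estimators)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    # Shape from the standard rational approximation.
    c = (2 * b1 - b0) / (3 * b2 - b0) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1 + k)
    alpha = (2 * b1 - b0) * k / (g * (1 - 2 ** (-k)))
    xi = b0 + alpha * (g - 1) / k
    return xi, alpha, k

# Check on a simulated GEV sample (true xi=10, alpha=2, k=0.1),
# drawn through the GEV quantile function x(F) = xi + alpha*(1-(-ln F)^k)/k.
rng = np.random.default_rng(5)
u = rng.uniform(size=5000)
x = 10 + 2 * (1 - (-np.log(u)) ** 0.1) / 0.1
print(np.round(pwm_gev(x), 2))
```

The same machinery applies to the GL distribution with its own PWM relations, and a regional analysis would apply it to standardized pooled series rather than a single sample.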
|