191

Estimating Veterans' Health Benefit Grants Using the Generalized Linear Mixed Cluster-Weighted Model with Incomplete Data

Deng, Xiaoying January 2018 (has links)
The poverty rate among veterans in the US has increased over the past decade, according to the U.S. Department of Veterans Affairs (2015). It is therefore crucial that veterans living below the poverty level receive sufficient benefit grants. A study on prudently managing health benefit grants for veterans may help government and policy-makers make appropriate decisions and investments. The purpose of this research is to find an underlying group structure for the veterans' benefit grants dataset and then to estimate the benefit grants sought using incomplete data. The generalized linear mixed cluster-weighted model, which is based on mixture models, is carried out by grouping similar observations into the same cluster. Finally, the estimates of the benefit grants sought will provide a reference for future public policies. / Thesis / Master of Science (MSc)
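The cluster-weighted idea above can be illustrated with a stripped-down sketch: a two-component Gaussian cluster-weighted model fit by EM, where each cluster has its own covariate distribution and its own regression line. This is a simplified Gaussian analogue of the generalized linear mixed version used in the thesis, run on synthetic data with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic data: two clusters with different covariate distributions
# and different regression lines (all parameters made up)
n = 400
g = rng.random(n) < 0.5
x = np.where(g, rng.normal(-2, 1, n), rng.normal(2, 1, n))
y = np.where(g, 1 + 0.5 * x, -1 + 2.0 * x) + 0.3 * rng.normal(size=n)

def cwm_em(x, y, G=2, iters=100):
    """EM for a G-component Gaussian cluster-weighted model:
    joint density sum_g pi_g N(x; mu_g, sx_g) N(y; a_g + b_g x, sy_g)."""
    pi = np.full(G, 1.0 / G)
    mu = np.quantile(x, np.linspace(0.25, 0.75, G))
    sx = np.ones(G); a = np.zeros(G); b = np.ones(G); sy = np.ones(G)
    for _ in range(iters):
        # E-step: responsibility of each cluster for each observation
        logr = (np.log(pi)
                - 0.5 * ((x[:, None] - mu) / sx) ** 2 - np.log(sx)
                - 0.5 * ((y[:, None] - a - b * x[:, None]) / sy) ** 2 - np.log(sy))
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted moments and weighted least squares per cluster
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sx = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        X = np.column_stack([np.ones_like(x), x])
        for k in range(G):
            w = r[:, k]
            beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
            a[k], b[k] = beta
            sy[k] = np.sqrt((w * (y - X @ beta) ** 2).sum() / nk[k])
    return pi, mu, a, b

pi, mu, a, b = cwm_em(x, y)
lo, hi = int(np.argmin(mu)), int(np.argmax(mu))
```

With the clusters well separated, the recovered cluster centres and cluster-specific slopes land close to the generating values.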
192

Understanding Scaled Prediction Variance Using Graphical Methods for Model Robustness, Measurement Error and Generalized Linear Models for Response Surface Designs

Ozol-Godfrey, Ayca 23 December 2004 (has links)
Graphical summaries are becoming important tools for evaluating designs; this development was driven by the need to compare designs in terms of their prediction variance properties. A recent graphical tool, the Fraction of Design Space (FDS) plot, displays the fraction of the design space where the scaled prediction variance (SPV) is less than or equal to a given value. In this dissertation we adapt FDS plots to study three specific design problems: robustness to model assumptions, robustness to measurement error, and design properties for generalized linear models (GLMs). This dissertation presents a graphical method for examining design robustness with respect to SPV values, using FDS plots to compare designs across a number of potential models in a pre-specified model space. Scaling the FDS curves by the G-optimal bound of each model allows designs to be compared on the same model scale. FDS plots are also adapted for comparing designs under the GLM framework. Since parameter estimates need to be specified, robustness to parameter misspecification is incorporated into the plots; binomial and Poisson examples are used to study several scenarios. The third section involves a special type of response surface design, mixture experiments, and adapts FDS plots for two types of measurement error that can arise from inaccurate measurement of the individual mixture component amounts. The last part of the dissertation covers mixture experiments for the GLM case and examines the prediction properties of mixture designs using the adapted FDS plots. / Ph. D.
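For a concrete sense of how an FDS curve is computed, here is a minimal sketch for a hypothetical two-factor design and a first-order-plus-interaction model, where SPV(x) = N f(x)'(X'X)^{-1} f(x) is evaluated at points sampled uniformly over the design space. The design and sample sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Model terms: intercept, two main effects, and their interaction
def model_terms(pts):
    x1, x2 = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2])

# A hypothetical 6-run design: 2^2 factorial plus two centre runs
design = np.array([[-1., -1.], [1., -1.], [-1., 1.], [1., 1.], [0., 0.], [0., 0.]])
X = model_terms(design)
N = len(design)
XtX_inv = np.linalg.inv(X.T @ X)

# SPV(x) = N * f(x)' (X'X)^{-1} f(x), evaluated over the design space
pts = rng.uniform(-1, 1, size=(5000, 2))
F = model_terms(pts)
spv = N * np.einsum('ij,jk,ik->i', F, XtX_inv, F)

def fds(v):
    """Fraction of the design space where SPV <= v."""
    return np.mean(spv <= v)
```

Plotting `fds(v)` against `v` gives the FDS curve; for this design SPV ranges from 1 at the centre to 5.5 at the corners.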
193

Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates

Chakraborty, Prithwish 05 July 2016 (has links)
Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with the target variable(s). However, diverse surrogate signals, such as news data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance data, can be noisy, so models built for such data sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem: (i) short-term forecasting, where surrogates can be employed in a now-casting framework; (ii) long-term forecasting, where surrogates act as forcing parameters to model system dynamics; and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employed matrix factorization and generalized linear models to detect short-term trends, and explored various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios particular surrogates can decrease overall forecasting accuracy, providing an argument for the use of 'good data' over 'big data'. / Ph. D.
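The idea of a drift-aware model exploiting a surrogate whose relationship with the target changes can be sketched with a toy example: a regression refit on a sliding window tracks a sign flip in the surrogate-target correlation that a static model averages away. All signals and parameters here are synthetic stand-ins, not the thesis's data or methods:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic target and surrogate: the true relationship flips sign
# halfway through (a 'changepoint' in the surrogate-target link)
T = 300
s = rng.normal(size=T)                               # surrogate signal
beta_true = np.where(np.arange(T) < 150, 2.0, -2.0)
y = beta_true * s + 0.5 * rng.normal(size=T)         # noisy target

def rolling_nowcast(y, s, window=40):
    """Refit a one-variable regression on a sliding window so the model
    adapts when the surrogate-target relationship drifts."""
    preds = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        sw, yw = s[t - window:t], y[t - window:t]
        b = (sw @ yw) / (sw @ sw)
        preds[t] = b * s[t]
    return preds

p_roll = rolling_nowcast(y, s)
b_static = (s @ y) / (s @ s)          # one static fit over all the data
idx = ~np.isnan(p_roll)
err_roll = np.mean((y[idx] - p_roll[idx]) ** 2)
err_static = np.mean((y[idx] - b_static * s[idx]) ** 2)
```

The static slope averages the two regimes to roughly zero, so its error is near the raw variance of the target, while the rolling fit recovers each regime.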
194

Semiparametric Regression Methods with Covariate Measurement Error

Johnson, Nels Gordon 06 December 2012 (has links)
In public health, biomedical, epidemiological, and other applications, the data collected are often measured with error. When mismeasured data are used in a regression analysis, failing to account for the measurement error can lead to incorrect inference about the relationships between the covariates and the response. We investigate measurement error in the covariates of two types of regression models. For each we propose a fully Bayesian approach that treats the variable measured with error as a latent variable to be integrated over, and a semi-Bayesian approach that uses a first-order Laplace approximation to marginalize the variable measured with error out of the likelihood. The first model is the matched case-control study for analyzing clustered binary outcomes. We develop low-rank thin plate splines for the case where a variable measured with error has an unknown, nonlinear relationship with the response. In addition to the semi- and fully Bayesian approaches, we propose a third approach using expectation-maximization to detect both parametric and nonparametric relationships between the covariates and the binary outcome. We assess the performance of each method via simulation in terms of mean squared error and mean bias, and illustrate each method on a perturbed example of a 1-4 matched case-control study. The second regression model is the generalized linear model (GLM) with unknown link function. Usually, the link function is chosen by the user based on the distribution of the response variable, often to be the canonical link. However, when covariates are measured with error, incorrect inference resulting from the error can be compounded by an incorrect choice of link function. We assess the performance of the semi- and fully Bayesian methods via simulation in terms of mean squared error, and illustrate each method on the Framingham Heart Study dataset.
The simulation results for both regression models support that the fully Bayesian approach is at least as good as the semi-Bayesian approach for adjusting for measurement error, particularly when the distribution of the variable measured with error and the distribution of the measurement error are misspecified. / Ph. D.
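A quick simulation illustrates why ignoring covariate measurement error matters: in simple linear regression, classical additive error in the covariate attenuates the naive slope estimate toward zero by the factor Var(x)/(Var(x)+Var(u)). The numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)

# True model: y = 1 + 2x + noise, but x is only observed through
# w = x + u with classical additive measurement error u
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, 1.0, n)       # equal signal and error variance

def ols_slope(a, b):
    """Simple least-squares slope of b on a."""
    ac, bc = a - a.mean(), b - b.mean()
    return (ac @ bc) / (ac @ ac)

b_true = ols_slope(x, y)    # regression on the error-free covariate
b_naive = ols_slope(w, y)   # naive regression ignoring the error
# Attenuation: E[b_naive] = 2 * Var(x) / (Var(x) + Var(u)) = 1
```

The naive slope is biased toward zero by roughly half here, which is the kind of distortion the latent-variable and Laplace-approximation approaches are designed to correct.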
195

Long-term Benefits of Extracurricular Activities on Socioeconomic Outcomes and Their Trends in 1988-2012

Long, Thomas Carl 09 November 2015 (has links)
Across the country, budget cuts to education have resulted in decreased funds available for extracurricular activities. This policy trend may have a significant impact on future outcomes, as reflected in student success measures. Using two datasets collected over the last two decades, the researcher assessed the relationship between participation in extracurricular activities and future socioeconomic outcomes in respondents' lives, including post-secondary education, full-time employment status, and income. Two existing large-scale longitudinal studies of U.S. secondary students, the National Education Longitudinal Study of 1988 (NELS:88) and the Education Longitudinal Study of 2002 (ELS:2002), served as data sources. As these surveys were conducted about a decade apart, the information they yielded was suitable for meeting the study aims. Generalized linear models, including multiple regression and logistic regression analyses with sample weights applied, were used to examine the impact of extracurricular activity participation on the aforementioned outcome measures. The implications of the study findings, including the comparison of results from two datasets collected at different time points, were interpreted with respect to school budget policy. Results from NELS:88 and ELS:2002 were also compared to evaluate trends in the characteristics and performance of U.S. high school students during the 1988-2012 period. / Ph. D.
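A weighted logistic regression of the kind described, a binary outcome such as post-secondary attainment regressed on a participation indicator with survey sample weights applied, can be sketched via iteratively reweighted least squares. The data, weights, and effect sizes below are entirely hypothetical, not NELS:88 or ELS:2002 values:

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_logistic(X, y, w, iters=25):
    """Logistic regression fit by iteratively reweighted least squares,
    with per-observation sample weights w entering every sum."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        irls_w = w * p * (1.0 - p)
        grad = X.T @ (w * (y - p))
        hess = X.T @ (X * irls_w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical survey: participation indicator, weights, binary outcome
n = 2000
part = rng.integers(0, 2, n).astype(float)     # extracurricular participation
true_logit = -1.0 + 1.2 * part                 # made-up effect sizes
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
w = rng.uniform(0.5, 2.0, n)                   # made-up sample weights
X = np.column_stack([np.ones(n), part])
beta = weighted_logistic(X, y, w)
```

In practice a survey package would also supply design-based standard errors; this sketch only shows how the weights enter the fit itself.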
196

Randomization analysis of experimental designs under non-standard conditions

Morris, David Dry January 1987 (has links)
Often the basic assumptions of the ANOVA for an experimental design are not met or the statistical model is incorrectly specified. Randomization of treatments to experimental units is expected to protect against such shortcomings. This paper uses randomization theory to examine the impact on the expectations of mean squares, treatment means, and treatment differences for two model mis-specifications: systematic response shifts and correlated experimental units.
Systematic response shifts are presented in the context of the randomized complete block design (RCBD). In particular, fixed shifts, called border shifts, are added to the responses of experimental units in the initial and final positions of each block. It is shown that the RCBD is an unbiased design under randomization theory when border shifts are present. Treatment means are biased but treatment differences are unbiased. However, the estimate of error is biased upwards and the power of the F test is reduced. Alternative designs to the RCBD under border shifts are the Latin square, semi-Latin square, and two-column designs. Randomization analysis demonstrates that the Latin square is an unbiased design with an unbiased estimate of error and of treatment differences. The semi-Latin square has each of the t treatments occurring only once per row and column, but t is a multiple of the number of rows or columns, so each row-column combination contains more than one experimental unit. The semi-Latin square is a biased design with a biased estimate of error even when no border shifts are present; row-column interaction is responsible for the bias. Border shifts do not contaminate the expected mean squares or treatment differences, and thus the semi-Latin square is a viable alternative when the border shift overwhelms the row-column interaction. The two columns of the two-column design correspond to the border and interior experimental units respectively, and results similar to those for the semi-Latin square are obtained. Simulation studies for the RCBD and its alternatives indicate that the power of the F test is reduced for the RCBD when border shifts are present. When no row-column interaction is present, the semi-Latin square and two-column designs provide good alternatives to the RCBD. Similar results are found for the split-plot design when border shifts occur in the sub-plots. A main-effects plan is presented for situations when the number of whole-plot units equals the number of sub-plot units per whole plot.
The analysis of designs in which the experimental units occur in a sequence and exhibit correlation is considered next. The Williams Type II(a) design is examined in conjunction with the usual ANOVA and with the method of first differencing. Expected mean squares, treatment means, and treatment differences are obtained under randomization theory for each analysis. When only adjacent experimental units have non-negligible correlation, the Type II(a) design provides an unbiased error estimate for the usual ANOVA; however, the expectation of the treatment mean square is biased downwards for a positive correlation. First differencing results in a biased test and a biased error estimate, although the test is approximately unbiased if the correlation between units is close to one half. / Ph. D.
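The upward bias of the RCBD error estimate under border shifts is easy to reproduce by direct randomization simulation: with no true treatment effects and unit error variance, the expected error mean square rises well above 1 once fixed shifts are added to the first and last unit of each block. Block size, shift magnitude, and replication counts below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

t, b = 4, 6          # treatments per block, number of blocks
shift = 3.0          # fixed border shift on first and last unit per block

def mean_ms_error(n_rand=2000):
    """Average RCBD error mean square over many randomizations, with no
    true treatment effects and unit error variance (so the value would
    be 1 in expectation without border shifts)."""
    ms = []
    for _ in range(n_rand):
        y = np.zeros((b, t))
        for j in range(b):
            perm = rng.permutation(t)          # treatments assigned to positions
            resp = rng.normal(0.0, 1.0, t)     # pure error, no treatment effect
            resp[0] += shift                   # border positions get the shift
            resp[-1] += shift
            y[j, perm] = resp                  # index responses by treatment label
        grand = y.mean()
        trt = y.mean(axis=0)
        blk = y.mean(axis=1)
        sse = ((y - trt[None, :] - blk[:, None] + grand) ** 2).sum()
        ms.append(sse / ((t - 1) * (b - 1)))
    return float(np.mean(ms))

ms_error = mean_ms_error()   # inflated well above sigma^2 = 1
```

Because treatments are assigned to the shifted border positions at random, the shift leaks into the residual sum of squares, inflating the error estimate and thus reducing the power of the F test, as the dissertation describes.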
197

Hypothesis testing procedures for non-nested regression models

Bauer, Laura L. January 1987 (has links)
Theory often indicates that a given response variable should be a function of certain explanatory variables, yet fails to provide meaningful information as to the specific form of this function. To test the validity of a given functional form with sensitivity toward the feasible alternatives, a procedure is needed for comparing non-nested families of hypotheses. Two hypothesized models are said to be non-nested when one model is neither a restricted case nor a limiting approximation of the other. Such non-nested hypotheses cannot be tested using conventional likelihood ratio procedures. In recent years, however, several new approaches have been developed for testing non-nested regression models. A comprehensive review of the procedures for the case of two linear regression models was presented. Comparisons between these procedures were made on the basis of asymptotic distributional properties, simulated finite-sample performance, and computational ease. A modification to the Fisher and McAleer JA-test was proposed and its properties investigated. As a compromise between the JA-test and the orthodox F-test, it was shown to have an exact non-null distribution, and its properties, both analytically and empirically derived, exhibited the practical worth of such an adjustment. A Monte Carlo study of the testing procedures involving non-nested linear regression models in small-sample situations (n ≤ 40) provided the information necessary for formulating practical guidelines. It was evident that the modified Cox procedure, N̄, was most powerful for providing correct inferences. In addition, there was strong evidence to support the use of the adjusted J-test (AJ) (Davidson and MacKinnon's test with small-sample modifications due to Godfrey and Pesaran), the modified JA-test (NJ), and the orthodox F-test for supplemental information. Similar results were obtained under non-normal disturbances.
An empirical study of spending patterns for household food consumption provided a practical application of the non-nested procedures in a large sample setting. The study provided not only an example of non-nested testing situations but also the opportunity to draw sound inferences from the test results. / Ph. D.
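As a concrete companion to the procedures compared above, here is a sketch of Davidson and MacKinnon's basic J-test (without the small-sample adjustments studied in the dissertation): fit the rival model, append its fitted values to the null model's regressors, and examine the t-statistic on the appended column. The data-generating values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def j_test(y, X1, X2):
    """J-test of H0: y follows model 1 (regressors X1) against the
    non-nested alternative model 2 (regressors X2): augment model 1
    with model 2's fitted values and t-test the extra coefficient."""
    yhat2 = X2 @ ols(X2, y)
    Xa = np.column_stack([X1, yhat2])
    n, k = Xa.shape
    beta = ols(Xa, y)
    resid = y - Xa @ beta
    s2 = (resid @ resid) / (n - k)
    cov = s2 * np.linalg.inv(Xa.T @ Xa)
    return beta[-1] / np.sqrt(cov[-1, -1])

# Hypothetical data actually generated by "model 2"
n = 500
z1, z2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * z2 + rng.normal(0.0, 1.0, n)
X1 = np.column_stack([np.ones(n), z1])
X2 = np.column_stack([np.ones(n), z2])
t_reject = j_test(y, X1, X2)   # testing the wrong model: large |t|
t_keep = j_test(y, X2, X1)     # testing the right model: moderate |t|
```

Run in both directions, the test rejects the misspecified model while leaving the true one standing, which is exactly the asymmetric comparison the non-nested framework is built around.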
198

Superscalar Processor Models Using Statistical Learning

Joseph, P J 04 1900 (has links)
Processor architectures are becoming increasingly complex, and hence architects have to evaluate a large design space consisting of several parameters, each with a number of potential settings. To assist in guiding design decisions, we develop simple and accurate models of the superscalar processor design space using a detailed and validated superscalar processor simulator. First, we obtain precise estimates of all significant micro-architectural parameters and their interactions by building linear regression models from simulation-based experiments. We obtain good approximate models at low simulation cost using an iterative process in which Akaike's Information Criterion is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We use this procedure for model construction and show that it provides a cost-effective scheme for experimenting with all relevant parameters. We also obtain accurate predictors of the processor's performance response across the entire design space by constructing radial basis function networks from sampled simulation experiments: we simulate at a limited set of design points selected by Latin hypercube sampling and then derive the radial basis function networks from the results. We show that these predictors provide accurate approximations to the simulator's performance response, and hence a cheap alternative to simulation while searching for optimal processor design points.
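The sampling-then-interpolation step can be sketched as follows: draw design points by a basic Latin hypercube, evaluate a stand-in "simulator" (a cheap smooth function here, in place of a cycle-accurate processor simulator), and fit a Gaussian radial basis function network to predict the response elsewhere. The kernel width and ridge term are arbitrary choices for this toy surface:

```python
import numpy as np

rng = np.random.default_rng(4)

def latin_hypercube(n, d, rng):
    """n points in [0, 1]^d with one point per stratum in each dimension
    (a basic Latin hypercube, no space-filling optimization)."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def rbf_fit(X, y, gamma=8.0, ridge=1e-6):
    """Gaussian radial basis function network centred at the samples."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    w = np.linalg.solve(np.exp(-gamma * d2) + ridge * np.eye(len(X)), y)
    def predict(Z):
        dz = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * dz) @ w
    return predict

# Stand-in for an expensive simulator: a cheap smooth response surface
def simulator(P):
    return np.sin(3.0 * P[:, 0]) + P[:, 1] ** 2

X = latin_hypercube(80, 2, rng)
predict = rbf_fit(X, simulator(X))
Z = rng.random((200, 2))
mean_err = np.mean(np.abs(predict(Z) - simulator(Z)))
```

With 80 stratified samples the RBF surrogate approximates the smooth surface far better than a constant predictor, which is the economy the thesis exploits at much larger scale.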
199

Statistical Methods for Dating Collections of Historical Documents

Tilahun, Gelila 31 August 2011 (has links)
The problem in this thesis was originally motivated by the Documents of Early England Data Set (DEEDS). The central problem with these medieval documents is the lack of methods to assign accurate dates to those documents which bear no date. With the problems of the DEEDS documents in mind, we present two methods to impute missing features of texts. In the first method, we suggest a new class of metrics for measuring distances between texts and then show how to combine the distances between texts using statistical smoothing. This method can be adapted to settings where the features of the texts are ordered or unordered categoricals (as in, for example, authorship attribution problems). In the second method, we estimate the probability of occurrence of words in texts using the nonparametric regression technique of local polynomial fitting with kernel weights, applied to generalized linear models. We combine the estimated probabilities of occurrence of the words of a text to estimate the probability of occurrence of the text as a function of its feature, the feature in this case being the date on which the text was written. The application of our methods to the DEEDS documents and the results are presented.
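A minimal sketch of the first idea, dating an undated text by smoothing over its distances to dated texts, might use a simple bag-of-words cosine distance and a Nadaraya-Watson kernel average. The distance, bandwidth, and toy "charters" below are invented for illustration and are not the thesis's proposed metric class:

```python
import numpy as np
from collections import Counter

def bow_distance(t1, t2):
    """Cosine distance between bag-of-words vectors (one simple stand-in
    for a text distance; not the metric class proposed in the thesis)."""
    c1, c2 = Counter(t1.split()), Counter(t2.split())
    vocab = sorted(set(c1) | set(c2))
    v1 = np.array([c1[w] for w in vocab], float)
    v2 = np.array([c2[w] for w in vocab], float)
    return 1.0 - (v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def smooth_date(undated, dated_texts, dates, h=0.5):
    """Nadaraya-Watson estimate: kernel-weighted average of known dates,
    with weights decaying in the text distance."""
    d = np.array([bow_distance(undated, t) for t in dated_texts])
    w = np.exp(-((d / h) ** 2))
    return float((w @ np.array(dates, float)) / w.sum())

# Invented toy 'charters' from two vocabulary eras
dated = ["grant land abbey witness", "grant land church witness",
         "rent lease tenant paid", "rent lease money paid"]
dates = [1150, 1160, 1300, 1310]
early = smooth_date("grant abbey land witness", dated, dates)
late = smooth_date("rent lease tenant money", dated, dates)
```

An undated text drawing on the early vocabulary is pulled toward the early dates and vice versa, which is the basic behaviour the smoothing estimator formalizes.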
200

Nonlinearity in Exchange Rates: Evidence from African Economies

Jobe, Ndey Isatou January 2016 (has links)
To assess the predictive ability of exchange rate models when data on African countries are sampled, this paper studies nonlinear modelling and prediction of the nominal exchange rate series of the United States dollar against the currencies of thirty-eight African states using the smooth transition autoregressive (STAR) model. A three-step analysis is undertaken. First, nonlinearity in all the nominal exchange rate series examined is investigated using a chain of credible statistical in-sample tests; significantly, evidence of nonlinear exponential STAR (ESTAR) dynamics is detected across all series. Second, linear models are given another chance, this time fitted to data on African countries, to investigate their predictive power against the tough random walk without drift; linear models again fail significantly. Lastly, the predictive ability of nonlinear models against both the random walk without drift and the corresponding linear models is investigated. Nonlinear models display useful forecasting gains over all contending models.
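The ESTAR mechanism detected in the series can be sketched directly: the exponential transition function G(y; γ, c) = 1 − exp(−γ(y − c)²) is near zero close to the equilibrium c (random-walk-like behaviour) and near one far from it (mean reversion switches on). The parameters below are illustrative, not estimates from the paper:

```python
import numpy as np

def transition(y, gamma, c):
    """Exponential transition function of the ESTAR model."""
    return 1.0 - np.exp(-gamma * (y - c) ** 2)

def estar_step(y_prev, gamma, c, phi, psi):
    """One step of an ESTAR(1): linear AR part plus a nonlinear part
    switched on smoothly as y moves away from the equilibrium c."""
    return phi * y_prev + psi * y_prev * transition(y_prev, gamma, c)

# Simulate a series with illustrative (made-up) parameters: near c the
# process behaves like a random walk, far from c it mean-reverts
rng = np.random.default_rng(5)
gamma, c, phi, psi = 2.0, 0.0, 1.0, -0.5
y = np.zeros(500)
for t in range(1, 500):
    y[t] = estar_step(y[t - 1], gamma, c, phi, psi) + 0.1 * rng.normal()
```

This globally stable but locally unit-root behaviour is exactly why ESTAR dynamics can coexist with linear unit-root test results on exchange rates.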
