Global ETD Search

1	The consolidation of forecasts with regression models Venter, Daniel Jacobus Lodewyk January 2014 The primary objective of this study was to develop a dashboard for the consolidation of multiple forecasts utilising a range of multiple linear regression models. The term dashboard is used to describe with a single word the characteristics of the forecasts consolidation application that was developed to provide the required functionalities via a graphical user interface structured as a series of interlinked screens. Microsoft Excel© was used as the platform to develop the dashboard named ConFoRM (acronym for Consolidate Forecasts with Regression Models). The major steps of the consolidation process incorporated in ConFoRM are: 1. Input historical data. 2. Select appropriate analysis and holdout samples. 3. Specify regression models to be considered as candidates for the final model to be used for the consolidation of forecasts. 4. Perform regression analysis and holdout analysis for each of the models specified in step 3. 5. Perform post-holdout testing to assess the performance of the model with the best holdout validation results on out-of-sample data. 6. Consolidate forecasts. Two data transformations are available: the removal of growth and time-period effects from the time series; and a translation of the time series that subtracts the mean of all the forecasts for data record i from the variable being predicted and its related forecasts, for each data record i. The pre-defined ordinary least squares linear regression models (LRMs) available are: a. A set of k simple LRMs, one for each of the k forecasts; b. A multiple LRM that includes all the forecasts; c. A multiple LRM that includes all the forecasts and as many of the first-order interactions between the input forecasts as allowed by the sample size and the maximum number of predictors provided by the dashboard, with the interactions included in the model being those with the highest individual correlation with the variable being predicted; d. A multiple LRM that includes as many of the forecasts and first-order interactions between the input forecasts as allowed by the sample size and the maximum number of predictors provided by the dashboard, with the forecasts and interactions included in the model being those with the highest individual correlation with the variable being predicted; e. A simple LRM with the predictor variable being the mean of the forecasts; f. A set of simple LRMs with the predictor variable in each case being the weighted mean of the forecasts, with different formulas for the weights. Also available is an ad hoc user-specified model in terms of the forecasts and the predictor variables generated by the dashboard for the pre-defined models. Provision is made in the regression analysis for both forward entry and backward removal regression. Weighted least squares (WLS) regression can optionally be performed based on the age of the forecasts, with smaller weights for older forecasts. Forecasting -- Mathematical models
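As a rough illustration of pre-defined model (e) in the record above, a simple LRM whose predictor is the mean of the forecasts, the following sketch fits the model on an analysis sample and checks it on a holdout record. It is not the ConFoRM implementation (which was built in Microsoft Excel); the function names and the data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

def fit_simple_lrm(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def consolidate(forecasts, a, b):
    """Consolidate k forecasts for one record using the fitted model."""
    return a + b * mean(forecasts)

# Hypothetical history: each row holds k = 3 forecasts for one data record.
history_forecasts = [[10, 12, 11], [20, 19, 21], [30, 33, 30],
                     [40, 41, 39], [50, 52, 48]]
history_actuals = [11.5, 20.5, 31.5, 40.5, 50.5]

# Analysis sample: first four records. Holdout sample: the last record.
analysis_f, analysis_y = history_forecasts[:4], history_actuals[:4]
a, b = fit_simple_lrm([mean(f) for f in analysis_f], analysis_y)

# Holdout error: actual value minus consolidated forecast.
holdout_error = history_actuals[4] - consolidate(history_forecasts[4], a, b)
print(a, b, holdout_error)
```

In this made-up data the actual values sit exactly 0.5 above the mean forecast, so the fit recovers that relationship and the holdout error is essentially zero; real data would of course leave a nonzero residual.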
2	Least median squares algorithm for clusterwise linear regression. January 2009 Fung, Chun Yip. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2009. / Includes bibliographical references (leaves 53-54). / Abstract also in Chinese. / Chapter 1 --- Introduction --- p.1 / Chapter 2 --- The Exchange Algorithm Framework --- p.4 / Chapter 2.1 --- Ordinary Least Squares Linear Regression --- p.5 / Chapter 2.2 --- The Exchange Algorithm --- p.6 / Chapter 3 --- Methodology --- p.12 / Chapter 3.1 --- Least Median Squares Linear Regression --- p.12 / Chapter 3.2 --- Least Median Squares Algorithm for Clusterwise Linear Regression --- p.16 / Chapter 3.3 --- Measures of Performance --- p.20 / Chapter 3.4 --- An Illustrative Example --- p.24 / Chapter 4 --- Monte Carlo Simulation Study --- p.34 / Chapter 4.1 --- Simulation Plan --- p.34 / Chapter 4.2 --- Simulation Results --- p.41 / Chapter 4.2.1 --- Effects of the Six factors --- p.41 / Chapter 4.2.2 --- Comparisons between LMSA and the Exchange Algorithm --- p.47 / Chapter 4.2.3 --- Evaluation of the Improvement of Regression Parameters by Performing Stage 3 in LMSA --- p.50 / Chapter 5 --- Concluding Remarks --- p.51 / Bibliography --- p.52 Regression analysis--Mathematical models Cluster analysis Least squares
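The record above gives no abstract, so the following is only a generic sketch of least median of squares (LMS) regression for a single line, not the thesis's clusterwise algorithm (LMSA). It assumes the common approximation of searching only lines through pairs of data points and keeping the one with the smallest median squared residual; the function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import median

def lms_line(points):
    """Approximate LMS fit: try the line through every pair of points and
    return (criterion, intercept, slope) with the smallest median squared
    residual."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, cannot be written as y = a + b*x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        crit = median((y - (intercept + slope * x)) ** 2 for x, y in points)
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best

# Hypothetical data: seven points on y = 2x + 1 plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(7)] + [(7, 0), (8, -5)]
crit, intercept, slope = lms_line(points)
print(crit, intercept, slope)
```

Because more than half of the points lie on the true line, the median squared residual of that line is zero and the two outliers do not move the fit, which is the robustness property that motivates LMS over ordinary least squares.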
3	Zero-inflated regression models for count data : an application to under-5 deaths Mamun, Md Abdullah Al 03 May 2014 Zero-inflated (ZI) count data models overcome the restriction that the mean and variance be equal, but a functional relationship between the two still exists. For ZI models it is important to know whether the proportion of zeros and the rate of counts have any influence on the fit of the model. In this study we considered three zero-inflated models, namely the ZIP, ZINB, and Hurdle models. We also considered the Poisson and negative binomial models as classical count data models. Our simulation experiment suggests that the proportion of zeros for a given rate parameter does not affect the fit of the models as long as the model is correctly specified. When the model is misspecified, it does not perform well for a large rate parameter. The three zero-inflated models performed better than the classical models as the rate parameter and the proportion of zeros became larger. We applied the five models to the BDHS 2011 survey data to understand the social determinants associated with a mother experiencing under-5 deaths of her children. The classical models failed to differentiate between mothers who have experienced under-5 deaths of their children and those who have never experienced under-5 deaths, while the zero-inflated models were able to differentiate between those two groups of mothers in terms of zero counts and positive counts of the number of under-5 deaths of their children, with the associated covariates having coefficients of opposite sign. Among the three zero-inflated models, the Hurdle model fitted the data better than the ZIP and ZINB models.
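To make the opening claim of the abstract above concrete, the sketch below uses the standard zero-inflated Poisson (ZIP) distribution, in which an extra zero occurs with probability pi and a Poisson(lam) count otherwise. Its mean is (1 - pi) * lam and its variance is (1 - pi) * lam * (1 + pi * lam), so the variance exceeds the mean but remains a function of it. The parameter values are arbitrary and the function names are hypothetical; this is not the thesis's simulation code.

```python
import math

def zip_pmf(k, lam, pi):
    """Probability of count k under a zero-inflated Poisson model."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson  # structural zeros plus Poisson zeros
    return (1 - pi) * poisson

def zip_mean_var(lam, pi):
    """Closed-form mean and variance of the ZIP distribution."""
    m = (1 - pi) * lam
    v = m * (1 + pi * lam)
    return m, v

lam, pi = 2.0, 0.3  # arbitrary illustrative values
probs = [zip_pmf(k, lam, pi) for k in range(60)]  # tail beyond 59 is negligible
m, v = zip_mean_var(lam, pi)

# Numerical moments from the pmf, for comparison with the closed forms.
num_mean = sum(k * p for k, p in enumerate(probs))
num_var = sum(k * k * p for k, p in enumerate(probs)) - num_mean ** 2
print(m, v, num_mean, num_var)
```

Setting pi to 0 recovers the ordinary Poisson case, where the mean and variance coincide.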
4	Smoothing approaches in regression Liu, Baisen. January 2008 No description available. Failure time data analysis. Smoothing (Numerical analysis)
5	Single-index regression models Wu, Jingwei 05 1900 Indiana University-Purdue University Indianapolis (IUPUI) / Useful medical indices play important roles in predicting medical outcomes. Medical indices, such as the well-known Body Mass Index (BMI) and the Charlson Comorbidity Index, have been used extensively in research and clinical practice for the quantification of risks in individual patients. However, the development of these indices is challenging and is primarily based on heuristic arguments. Statistically, most medical indices can be expressed as a function of a linear combination of individual variables and fitted by a single-index model. The single-index model represents a way to retain latent nonlinear features of the data without the usual complications that come with increased dimensionality. In my dissertation, I propose a single-index model approach to analytically derive indices from observed data; the resulting index inherently correlates with specific health outcomes of interest. The first part of this dissertation discusses the derivation of an index function for the prediction of one outcome using longitudinal data. A cubic-spline estimation scheme for the partially linear single-index mixed effect model is proposed to incorporate the within-subject correlations among outcome measures contributed by the same subject. A recursive algorithm based on the optimization of a penalized least squares estimation equation is derived and is shown to work well both in simulated data and in the derivation of a new body mass measure for the assessment of hypertension risk in children. The second part of this dissertation extends the single-index model to a multivariate setting. Specifically, a multivariate version of the single-index model for longitudinal data is presented. 
An important feature of the proposed model is the accommodation of both correlations among multivariate outcomes and among the repeated measurements from the same subject via random effects that link the outcomes in a unified modeling structure. A new body mass index measure that simultaneously predicts systolic and diastolic blood pressure in children is illustrated. The final part of this dissertation shows existence, root-n strong consistency and asymptotic normality of the estimators in multivariate single-index model under suitable conditions. These asymptotic results are assessed in finite sample simulation and permit joint inference for all parameters. Asymptotics Longitudinal data analysis Mixed effect model Multivariate outcomes P-spline Errors-in-variables models Parameter estimation Mathematical statistics Biometry Multivariate analysis Diagnosis
6	Preliminary investigation into estimating eye disease incidence rate from age specific prevalence data Majeke, Lunga January 2011 This study presents the methodology for estimating the incidence rate from the age-specific prevalence data of three different eye diseases. We consider both situations: that in which mortality differs between persons with and without the disease, and that in which it does not. The method used was developed by Marvin J. Podgor for estimating incidence rates from prevalence data. It applies logistic regression to obtain smoothed prevalence rates, which help in obtaining the incidence rate. The study concluded that the use of logistic regression can produce a meaningful model, and that the incidence rates of these diseases were not affected by the assumption of differential mortality. Eye -- Diseases -- Statistics Incidence functions Regression analysis -- Data processing Medical statistics
7	Nitrous oxide emission from riparian buffers in agricultural landscapes of Indiana Fisher, Katelin Rose 25 February 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Riparian buffers have a well-documented capacity to remove nitrate (NO3-) from runoff and subsurface flow paths, but information on field-scale N2O emission from these buffers is lacking. This study monitored N2O fluxes at two agricultural riparian buffers in the White River watershed (Indiana) from December 2009 to May 2011 to assess the impact of landscape and hydrogeomorphologic factors on emission. Soil chemical and biochemical properties were measured and environmental variables (soil temperature and moisture) were monitored in an attempt to identify key drivers of N2O emission. The study sites included a mature riparian forest (WR) and a riparian grass buffer (LWD); adjacent corn fields were also monitored for land-use comparison. With the exception of net N mineralization, most soil properties (particle size, bulk density, pH, denitrification potential, organic carbon, C:N) showed little correlation with N2O emission. Analysis of variance (ANOVA) identified season, land-use (riparian buffer vs. crop field), and site geomorphology as major drivers of N2O emission. At both study sites, N2O emission showed strong seasonal variability; the largest emission peaks in the riparian buffers (up to 1,300% increase) and crop fields (up to 3,500% increase) occurred in late spring/early summer as a result of flooding, elevated soil moisture and N-fertilization. Nitrous oxide emission was found to be significantly higher in crop fields than in riparian buffers at both LWD (mean: 1.72 and 0.18 mg N2O-N m-2 d-1) and WR (mean: 0.72 and 1.26 mg N2O-N m-2 d-1, respectively). 
A significant difference (p=0.02) in N2O emission between the riparian buffers was detected, and this effect was attributed to site geomorphology and the greater potential for flooding at the WR site (no flooding occurred at LWD). More than previously expected, the study results demonstrate that N2O emission in riparian buffers is largely driven by landscape geomorphology and land-stream connection (flood potential). Denitrification -- Research Biomineralization Analysis of variance -- Research
8	Advanced Modeling of Longitudinal Spectroscopy Data Kundu, Madan Gopal January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Magnetic resonance (MR) spectroscopy is a neuroimaging technique. It is widely used to quantify the concentration of important metabolites in brain tissue. Imbalance in the concentration of brain metabolites has been found to be associated with the development of neurological impairment. There has been an increasing trend of using MR spectroscopy as a diagnostic tool for neurological disorders. We established statistical methodology to analyze data obtained from MR spectroscopy in the context of HIV-associated neurological disorder. First, we developed novel methodology to study the association of a marker of neurological disorder with the MR spectrum from the brain and how this association evolves with time. The entire problem fits into the framework of a scalar-on-function regression model, with the individual spectrum being the functional predictor. We extended one of the existing cross-sectional scalar-on-function regression techniques to the longitudinal set-up. Advantages of the proposed method include: (1) the ability to model a flexible time-varying association between the response and the functional predictor and (2) the ability to incorporate prior information. The second part of the research studies the influence of clinical and demographic factors on the progression of brain metabolites over time. In order to understand the influence of these factors in a fully non-parametric way, we proposed the LongCART algorithm to construct a regression tree with longitudinal data. Such a regression tree helps to identify smaller subpopulations (characterized by baseline factors) with differential longitudinal profiles and hence helps us to identify the influence of baseline factors. 
Advantages of the LongCART algorithm include: (1) it maintains the type-I error in determining the best split, (2) it substantially reduces computation time and (3) it is applicable even when observations are taken at subject-specific time-points. Finally, we carried out an in-depth analysis of longitudinal changes in the brain metabolite concentrations in three brain regions, namely, white matter, gray matter and basal ganglia, in chronically infected HIV patients enrolled in the HIV Neuroimaging Consortium study. We studied the influence of important baseline factors (clinical and demographic) on these longitudinal profiles of brain metabolites using the LongCART algorithm in order to identify subgroups of patients at higher risk of neurological impairment. / Partial research support was provided by the National Institutes of Health grants U01-MH083545, R01-CA126205 and U01-CA086368 Spectroscopy Functional Data Analysis Longitudinal Functional Data Analysis Brownian Bridge Longitudinal CART Longitudinal Regression Tree HIV Brain metabolites HIV neuroimaging consortium LongPEER PEER Decomposition based penalty NAA Creatine Myo-inositol Choline Glutamine and Glutamate White matter Gray matter Basal ganglia LongCART neurological disorder Global deficit score GSVD General Singular Value Decomposition Microbial metabolites -- Research HIV infections -- Complications Myelinated neurofibrils Research -- Methodology Trees (Graph theory) -- Research Biometry -- Research -- Methodology Cerebral cortex Central nervous system -- Abnormalities Spectrum analysis -- Research HIV (Viruses) -- Research -- Analysis Creatine Choline Glutamine
9	Spatio-temporal analyses of the distribution of alcohol outlets in California Li, Li January 2014 Indiana University-Purdue University Indianapolis (IUPUI) / The objective of this research is to examine the development of California alcohol outlets over time and the relationship between neighborhood characteristics and the densities of alcohol outlets. Two types of advanced analyses were done after the usual preliminary description of the data. Firstly, fixed and random effects linear regressions were used for the county panel data across time (1945-2010), with a dummy variable added to capture the change in the law regarding limitations on alcohol outlet density. Secondly, a Bayesian spatio-temporal Poisson regression of the census tract panel data was conducted to capture recently available population characteristics affecting outlet density. The spatial Conditional Autoregressive model was embedded in the Poisson regression to detect spatial dependency in the unexplained variance of alcohol outlet density. The results show that alcohol outlet density decreased over time under the limitation law. However, the law was no more effective in reducing the growth of alcohol outlets after the limitation was modified to be more restrictive. Neighborhoods that are poorer, have higher vacancy rates and have a lower percentage of Black residents tend to have higher alcohol outlet density (ratio of the number of alcohol outlets to population) for both on-sale general and off-sale general outlets. Other characteristics, such as the percentage of Hispanics, the percentage of Asians, the percentage of younger population and the median income of adjacent neighborhoods, were associated with the densities of on-sale general and off-sale general alcohol outlets. Some regions, like the San Francisco Bay area and the Greater Los Angeles area, have more alcohol outlets than predicted by the neighborhood characteristics included in the model. Liquor stores -- Location -- California Alcoholism and crime -- California Youth -- Alcohol use -- California
