651

A simulation study of the robustness of the least median of squares estimator of slope in a regression through the origin model

Paranagama, Thilanka Dilruwani January 1900 (has links)
Master of Science / Department of Statistics / Paul I. Nelson / The principle of least squares applied to regression models estimates parameters by minimizing the mean of the squared residuals. Least squares estimators are optimal under normality but can perform poorly in the presence of outliers. This well-known lack of robustness motivated the development of alternatives, such as least median of squares estimators, obtained by minimizing the median of the squared residuals. This report uses simulation to examine and compare the robustness of least median of squares estimators and least squares estimators of the slope of a regression line through the origin, in terms of bias and mean squared error, across a variety of conditions containing outliers created using mixtures of normal and heavy-tailed distributions. It is found that least median of squares estimation is almost as good as least squares estimation under normality and can be much better in the presence of outliers.
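To make the comparison concrete, here is a minimal simulation sketch in the spirit of the report (the sample size, outlier fraction, contamination scheme, and grid search are illustrative assumptions, not the report's actual design):

```python
import numpy as np

rng = np.random.default_rng(0)

def ls_slope(x, y):
    # Least squares through the origin: beta = sum(x*y) / sum(x^2)
    return np.sum(x * y) / np.sum(x * x)

def lms_slope(x, y, n_grid=801):
    # Least median of squares: minimize median((y - beta*x)^2) over a grid;
    # assumes the true slope lies in [-10, 10]
    grid = np.linspace(-10, 10, n_grid)
    med = [np.median((y - b * x) ** 2) for b in grid]
    return grid[int(np.argmin(med))]

beta, n, reps, eps = 2.0, 50, 200, 0.10   # eps = assumed outlier fraction
est = {"LS": [], "LMS": []}
for _ in range(reps):
    x = rng.uniform(1, 10, n)
    outlier = rng.random(n) < eps
    # Errors drawn from a normal / heavy-tailed (scaled t) mixture
    e = np.where(outlier, 10 * rng.standard_t(3, n), rng.standard_normal(n))
    y = beta * x + e
    est["LS"].append(ls_slope(x, y))
    est["LMS"].append(lms_slope(x, y))

for name, v in est.items():
    v = np.asarray(v)
    print(name, "bias:", v.mean() - beta, "MSE:", np.mean((v - beta) ** 2))
```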
652

Eine empirische Analyse des individuellen Verkehrsmittelwahlverhaltens am Beispiel der Stadt Dresden

Schletze, Matthias 15 December 2015 (has links) (PDF)
People's mode-choice behavior is complex: sociodemographic and socioeconomic characteristics as well as spatial and settlement structure all play a role. This thesis examines that behavior. A homogeneous population is constructed comprising all persons who hold both a season ticket for public transportation and a passenger car. Based on this population, a descriptive analysis and a multinomial logistic regression are used to determine whether the respective user groups differ. The group of public-transport users can be characterized as follows: the majority are women, as well as persons with high levels of school and vocational education. Within this group, fewer trips are made by public transport than by car, and the working population tends to choose the passenger car.
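As an illustration of the kind of model used here, the following is a minimal multinomial logistic regression sketch on hypothetical data (the covariates, coefficients, and category coding are invented for illustration and do not come from the thesis's survey):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "female":   rng.integers(0, 2, n),
    "employed": rng.integers(0, 2, n),
    "edu_high": rng.integers(0, 2, n),
})
# Hypothetical outcome: 0 = car, 1 = public transport, 2 = other
util_pt = -0.5 + 0.6 * df.female + 0.8 * df.edu_high - 0.7 * df.employed
p_pt = np.exp(util_pt) / (1 + np.exp(util_pt) + 1)   # car and "other" have utility 0
p_car = 1 / (1 + np.exp(util_pt) + 1)
u = rng.random(n)
mode = np.where(u < p_car, 0, np.where(u < p_car + p_pt, 1, 2))

X = sm.add_constant(df)
fit = sm.MNLogit(mode, X).fit(disp=False)
print(fit.summary())   # coefficients are log-odds relative to the base category
```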
653

Essays on Computational Problems in Insurance

Ha, Hongjun 31 July 2016 (has links)
This dissertation consists of two chapters. The first chapter establishes an algorithm for calculating capital requirements. The calculation of capital requirements for financial institutions usually entails a reevaluation of the company's assets and liabilities at some future point in time for a (large) number of stochastic forecasts of economic and firm-specific variables. The complexity of this nested valuation problem leads many companies to struggle with its implementation. This chapter proposes and analyzes a novel approach to the computational problem based on least-squares regression and Monte Carlo simulations. Our approach is motivated by a well-known method for pricing non-European derivatives. We study convergence of the algorithm and analyze the resulting estimates for practically important risk measures. Moreover, we address the problem of how to choose the regressors and show that an optimal choice is given by the left singular functions of the corresponding valuation operator. Our numerical examples demonstrate that the algorithm can produce accurate results at relatively low computational cost, particularly when relying on the optimal basis functions. The second chapter discusses another application of regression-based methods, in the context of pricing variable annuities. Advanced life insurance products with exercise-dependent financial guarantees present challenging problems for pricing and risk management. In particular, due to the complexity of the guarantees, and because practical valuation frameworks include a variety of stochastic risk factors, conventional methods based on discretizing the underlying (Markov) state space may not be feasible. As a practical alternative, this chapter explores the applicability in this context of Least-Squares Monte Carlo (LSM) methods familiar from American option pricing. Unlike previous literature, we consider optionality beyond surrendering the contract, focusing on popular withdrawal benefits (so-called GMWBs) within variable annuities. We introduce different LSM variants, particularly the regression-now and regression-later approaches, and explore their viability and potential pitfalls. We commence our numerical analysis in a basic Black-Scholes framework, where we compare the LSM results to those from a discretization approach, and then extend the model to include various relevant risk factors, comparing the results to those from the basic framework.
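For readers unfamiliar with the well-known method alluded to, the following is a minimal Longstaff-Schwartz-style LSM sketch for a Bermudan put under Black-Scholes; it illustrates the regression-now idea only, and all parameter values and the polynomial basis are assumptions rather than the dissertation's choices:

```python
import numpy as np

rng = np.random.default_rng(2)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
steps, paths = 50, 100_000
dt = T / steps

# Simulate geometric Brownian motion paths
z = rng.standard_normal((paths, steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

cash = np.maximum(K - S[:, -1], 0.0)          # payoff if held to maturity
for t in range(steps - 2, -1, -1):
    cash *= np.exp(-r * dt)                   # discount one step back
    itm = K - S[:, t] > 0                     # regress only on in-the-money paths
    if itm.sum() > 10:
        x = S[itm, t]
        A = np.column_stack([np.ones(itm.sum()), x, x**2])  # polynomial basis
        coef, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
        cont = A @ coef                       # estimated continuation value
        exercise = (K - x) > cont
        cash[itm] = np.where(exercise, K - x, cash[itm])
price = np.exp(-r * dt) * cash.mean()
print("LSM Bermudan put price ~", round(price, 3))
```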
654

New regression methods for measures of central tendency

Aristodemou, Katerina January 2014 (has links)
Measures of central tendency have been widely used for summarising statistical data, with the mean being the most popular summary statistic. However, in real-life applications it is not always the most representative measure of central location, especially when dealing with data that is skewed or contains outliers. Alternative statistics with less bias are the median and the mode. Median and quantile regression have been used in different fields to examine the effect of factors at different points of the distribution. Mode estimation, on the other hand, has found many applications in cases where the analysis focuses on obtaining information about the most typical value or pattern. This thesis demonstrates that the mode also plays an important role in the analysis of big data, which is becoming increasingly important in many sectors of the global economy. However, mode regression has not been widely applied, despite its clear conceptual benefit, owing to the computational and theoretical limitations of existing estimators. Similarly, despite the popularity of the binary quantile regression model, computationally straightforward estimation techniques do not exist. Driven by the demand for simple, well-founded and easy-to-implement inference tools, this thesis develops a series of new regression methods for mode and binary quantile regression. Chapter 2 deals with mode regression methods from the Bayesian perspective and presents one parametric and two non-parametric methods of inference. Chapter 3 demonstrates a mode-based, fast pattern-identification method for big data and proposes the first fully parametric mode regression method, which effectively uncovers the dependency of typical patterns on a number of covariates. The proposed approach is demonstrated through the analysis of a decade-long dataset on the Body Mass Index and associated factors, taken from the Health Survey for England. Finally, Chapter 4 presents an alternative binary quantile regression approach, based on nonlinear least asymmetric weighted squares, which can be implemented using standard statistical packages and guarantees a unique solution.
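To illustrate the conceptual appeal of mode regression, here is a minimal kernel-based sketch that maximizes a smoothed objective over the residuals; it conveys the general idea only and is not one of the thesis's Bayesian or fully parametric estimators (the data, bandwidth, and starting values are assumptions):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 2, n)
# Skewed (exponential) errors: the conditional mode is 1 + 2x,
# while the conditional mean is shifted up to 2 + 2x
y = 1 + 2 * x + rng.exponential(1.0, n)

def neg_kernel_obj(beta, h=0.3):
    # Negative of the kernel-smoothed objective sum_i K_h(y_i - x_i'beta)
    r = y - (beta[0] + beta[1] * x)
    return -np.mean(np.exp(-0.5 * (r / h) ** 2))   # Gaussian kernel

b_ols = np.polyfit(x, y, 1)[::-1]                  # OLS start values [intercept, slope]
fit = minimize(neg_kernel_obj, b_ols, method="Nelder-Mead")
print("OLS (targets the mean):", np.round(b_ols, 2))
print("Mode regression (targets the mode):", np.round(fit.x, 2))
```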
655

Prognostisering av försäljningsvolym med hjälp av omvärldsindikatorer

Liendeborg, Zaida, Karlsson, Mattias January 2016 (has links)
Background: Forecasts serve as a basis for decision making and mainly affect decisions at the strategic and tactical levels of a company or organization. There are two ways to produce forecasts: a qualitative approach, in which an expert or group of experts assesses the future, and a quantitative approach, in which forecasts are produced by mathematical and statistical models. This study used a quantitative method to build a forecast model that takes external factors into account when forecasting the sales volume of Bosch Rexroth's hydraulic motors. The range of possible external factors is very wide, and only a limited selection was analyzed in this study; the variables were selected based on the markets where Bosch Rexroth products are used, such as mining.

Purpose: This study aimed to develop five predictive models: one for global sales volume, one each for sales volume in the USA and China, and one each for sales volume of the CA engine and the Viking engine. By identifying external factors that show significant relationships, at various time lags, with Bosch Rexroth's sales volume, forecasts for 2016 and 2017 were produced.

Methods: The study combined multiple linear regression with Box-Jenkins ARMA errors to analyze the association of external factors and to produce forecasts. External factors such as commodity prices, inflation, and exchange rates between currencies were taken into account. Using the cross-correlation function between each external factor and the sales volume, significant factors at different time lags were identified and entered into the model. The forecasting method used is a causal forecasting model.

Conclusions: The global sales volume of Bosch Rexroth turned out to be affected by the historical price of copper at three time lags: one, six, and seven months. From 2010 to 2015 the copper price dropped continuously, which explains the downward trend in sales volume. Sales volume in the USA showed a significant association with the price of coal at lags of three and four months, meaning a change in the coal price takes three to four months to affect US sales. The market in China was affected by the price of silver at lags of four and six months. The CA engine also showed an association with the copper price at the same lags as the global sales volume, while the Viking engine showed no significant association with any of the external factors analyzed in this study. The forecast global mean sales volume is between 253 and 309 units per month for May 2016 through December 2017. Mean sales volume in the USA is projected at 24 to 32 units per month, in China at 42 to 81 units per month, and for the CA engine at 175 to 212 units per month, while mean sales of the Viking engine are projected to stay constant at 25 units per month.
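The general workflow, cross-correlating an external indicator with sales to pick lags and then fitting a regression with ARMA errors, can be sketched as follows on synthetic data (the series, lag, and model order are assumptions, not Bosch Rexroth data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T, lag = 200, 3
copper = np.cumsum(rng.standard_normal(T))          # stand-in commodity index
noise = sm.tsa.arma_generate_sample(ar=[1, -0.5], ma=[1], nsample=T, scale=1.0)
sales = 100 + 2.0 * np.roll(copper, lag) + noise    # sales react with a 3-month lag
sales, copper = sales[lag:], copper[lag:]           # drop the wrapped-around start

# Cross-correlate differenced series to spot the dominant lag
dc, ds = np.diff(copper), np.diff(sales)
ccf = [np.corrcoef(dc[:-k], ds[k:])[0, 1] for k in range(1, 13)]
best = int(np.argmax(np.abs(ccf))) + 1
print("strongest cross-correlation at lag", best)

# Regression with AR(1) errors, copper lagged by the chosen amount
exog = copper[:-best]
endog = sales[best:]
fit = sm.tsa.SARIMAX(endog, exog=exog, order=(1, 0, 0)).fit(disp=False)
print(fit.params)
```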
656

Evaluating the Use of Ridge Regression and Principal Components in Propensity Score Estimators under Multicollinearity

Gripencrantz, Sarah January 2014 (has links)
Multicollinearity can be present in the propensity score model when estimating average treatment effects (ATEs). In this thesis, logistic ridge regression (LRR) and principal components logistic regression (PCLR) are evaluated as alternatives to ML estimation of the propensity score model. ATE estimators based on inverse probability weighting (IPW), matching, and stratification are assessed in a Monte Carlo simulation study to evaluate LRR and PCLR. Further, an empirical example of applying LRR and PCLR to real data under multicollinearity is provided. Results from the simulation study reveal that under multicollinearity and in small samples, the use of LRR reduces bias in the matching estimator compared to ML. In large samples PCLR yields the lowest bias and typically the lowest MSE across all estimators. PCLR matched ML in bias under IPW estimation and in some cases had lower bias. The stratification estimator was heavily biased compared to matching and IPW, but both its bias and MSE improved when PCLR was applied, and in some cases under LRR. In the empirical example, the specification with PCLR was usually the most sensitive when a strongly correlated covariate was included in the propensity score model.
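A minimal sketch of the ridge-penalized-propensity-plus-IPW pipeline on simulated collinear data is shown below (the data-generating process and penalty strength are assumptions; the thesis's full simulation design and the PCLR variant are not reproduced):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 2000
# Highly collinear covariates to mimic multicollinearity
x1 = rng.standard_normal(n)
x2 = x1 + 0.05 * rng.standard_normal(n)
X = np.column_stack([x1, x2])

p = 1 / (1 + np.exp(-(0.5 * x1 + 0.5 * x2)))       # true propensity
tr = rng.binomial(1, p)                            # treatment assignment
y = 2.0 * tr + x1 + rng.standard_normal(n)         # true ATE = 2

# Ridge (L2) penalized logistic propensity model; C is the inverse penalty strength
ps = LogisticRegression(penalty="l2", C=1.0).fit(X, tr).predict_proba(X)[:, 1]

# IPW estimator of the ATE
ate = np.mean(tr * y / ps) - np.mean((1 - tr) * y / (1 - ps))
print("IPW ATE estimate ~", round(ate, 3))
```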
657

Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities

Alexander, Erika D. 05 1900 (has links)
The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
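A minimal sketch of computing the improvement-over-chance index, I = (H - C)/(1 - C) with H the observed hit rate and C the chance-level hit rate, for the three classifiers on simulated two-group data (the data, priors, and the use of the maximum-chance criterion for C are illustrative assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n0, n1 = 300, 100                                  # unequal priors
x = np.concatenate([rng.normal(0, 1, n0), rng.normal(1.5, 2, n1)])[:, None]
g = np.concatenate([np.zeros(n0), np.ones(n1)])

chance = max(n0, n1) / (n0 + n1)                   # maximum-chance criterion
for name, clf in [("linear PDA", LinearDiscriminantAnalysis()),
                  ("quadratic PDA", QuadraticDiscriminantAnalysis()),
                  ("LRA", LogisticRegression())]:
    hit = clf.fit(x, g).score(x, g)                # observed hit rate
    print(name, "I =", round((hit - chance) / (1 - chance), 3))
```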
658

Unbiased Recursive Partitioning: A Conditional Inference Framework

Hothorn, Torsten, Hornik, Kurt, Zeileis, Achim January 2004 (has links) (PDF)
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as that of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by the two approaches are structurally different, indicating the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored and multivariate response variables, and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer and mammography experience are re-analyzed. / Series: Research Report Series / Department of Statistics and Mathematics
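The core of the unbiasedness argument, separating variable selection (by an association test with multiplicity adjustment) from the split-point search, can be sketched as follows; this toy permutation-test version is only illustrative and is not the paper's full conditional inference framework (which is implemented in the R packages party/partykit):

```python
import numpy as np

rng = np.random.default_rng(7)

def perm_pvalue(x, y, n_perm=999):
    # Permutation p-value for association between x and y via |Pearson r|
    obs = abs(np.corrcoef(x, y)[0, 1])
    perms = [abs(np.corrcoef(rng.permutation(x), y)[0, 1]) for _ in range(n_perm)]
    return (1 + sum(p >= obs for p in perms)) / (n_perm + 1)

n = 200
X = {"signal": rng.standard_normal(n),
     "irrelevant_continuous": rng.standard_normal(n),   # many possible splits
     "irrelevant_binary": rng.integers(0, 2, n).astype(float)}
y = 1.0 * X["signal"] + rng.standard_normal(n)

# Step 1: test each covariate's association with y; stop if none is significant
pvals = {name: perm_pvalue(x, y) for name, x in X.items()}
best, p = min(pvals.items(), key=lambda kv: kv[1])
alpha = 0.05 / len(X)                              # Bonferroni over covariates
print(pvals)
if p <= alpha:
    print("split on", best)                        # Step 2 would search the cut point
else:
    print("stop: no significant covariate")
```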
659

A Cross-Sectional Analysis of Health Impacts of Inorganic Arsenic in Chemical Mixtures

Hargarten, Paul 01 January 2015 (has links)
Drinking groundwater is the primary way humans accumulate arsenic. Chronic exposure to inorganic arsenic (iAs) over decades has been shown to be associated with multiple health effects even at low levels (5-10 ppb), including cancer, elevated blood pressure and cardiovascular disease, skin lesions, renal failure, and peripheral neuropathy. Using hypertension (high blood pressure) as a surrogate marker for cardiovascular disease, we examined the effect of iAs alone and in a mixture with other metals using a cross-sectional study of adults in the United States (National Health and Nutrition Examination Survey, NHANES, 2005-2010), adjusting for the covariates urinary creatinine level (mg/dL), poverty index ratio (PIR, a measure of socioeconomic status, 1 to 5), age, smoking (yes/no), alcohol usage, gender, non-Hispanic Black race, and overweight (BMI >= 25). A logistic regression model suggests that a one-unit increase in the log of inorganic arsenic increases the odds of hypertension by a factor of 1.093 (95% confidence interval = 0.935, 1.277) adjusted for these covariates, which indicates no significant evidence that inorganic arsenic is a risk factor for hypertension. Biomonitoring data provide evidence that humans are exposed not only to inorganic arsenic but also to mixtures of chemicals including inorganic arsenic, total mercury, cadmium, and lead. We tested for a mixture effect of these four environmental chemicals using weighted quantile sum (WQS) regression, which takes into account the correlation among the chemicals and with the outcome. For a one-unit increase in the weighted sum, the adjusted odds of developing hypertension increase by a factor of 1.027 (95% CI = 0.882, 1.196), which is also not significant after taking into account the same covariates. These nonsignificant findings may be due to the low inorganic arsenic concentrations (8-620 μg/L) in US drinking water compared to countries such as Bangladesh, where concentrations are much higher. The literature provides conflicting evidence on the association of inorganic arsenic with hypertension in low/moderate-exposure regions; future studies, especially a large cohort study, are needed to confirm whether inorganic arsenic alone or with other metals is associated with hypertension in the United States.
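The WQS construction, quantile-scoring each chemical and estimating constrained weights inside a logistic model, can be sketched as follows on simulated data (the data, the softmax weight parameterization, and the single-step fit are simplifying assumptions; the bootstrap step and covariate adjustment of the actual analysis are omitted):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n, k = 1500, 4                                      # 4 chemicals (As, Hg, Cd, Pb)
Z = rng.lognormal(size=(n, k))
Q = np.argsort(np.argsort(Z, axis=0), axis=0) * 4 // n  # quartile scores 0-3

true_w = np.array([0.6, 0.2, 0.1, 0.1])
logit = -1.0 + 0.4 * (Q @ true_w)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def nll(theta):
    b0, b1 = theta[0], theta[1]
    w = np.exp(theta[2:]); w = w / w.sum()          # softmax keeps w >= 0, sum(w) = 1
    eta = b0 + b1 * (Q @ w)
    return np.sum(np.log1p(np.exp(eta)) - y * eta)  # logistic negative log-likelihood

fit = minimize(nll, np.zeros(2 + k), method="BFGS")
w_hat = np.exp(fit.x[2:]); w_hat /= w_hat.sum()
print("index coefficient:", round(fit.x[1], 3), "weights:", np.round(w_hat, 2))
```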
660

Robust mixture regression models using t-distribution

Wei, Yan January 1900 (has links)
Master of Science / Department of Statistics / Weixin Yao / In this report, we propose a robust mixture of regressions model based on the t-distribution, extending the mixture of t-distributions proposed by Peel and McLachlan (2000) to the regression setting. This new mixture regression model is robust to outliers in the y direction but not to outliers at high-leverage points. To combat this, we also propose a modified version of the method, which fits the t-based mixture of regressions after adaptively trimming high-leverage points. We further propose adaptively choosing the degrees of freedom of the t-distribution using the profile likelihood. The proposed robust mixture regression estimate has high efficiency due to this adaptive choice of the degrees of freedom. We demonstrate the effectiveness of the proposed method and compare it with some existing methods through a simulation study.
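A minimal EM sketch for a two-component mixture of linear regressions with t errors is given below, with the degrees of freedom held fixed; the report's adaptive trimming of high-leverage points and profile-likelihood choice of the degrees of freedom are omitted, and all data and starting values are assumptions:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(9)
n, nu = 400, 3
x = rng.uniform(-2, 2, n)
X = np.column_stack([np.ones(n), x])
comp = rng.random(n) < 0.5
y = np.where(comp, 1 + 2 * x, -1 - x) + rng.standard_t(nu, n)

# Initialize two components
beta = np.array([[0.5, 1.5], [-0.5, -0.5]])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

for _ in range(200):
    # E-step: responsibilities tau from the t component densities
    dens = np.stack([pi[j] * t_dist.pdf((y - X @ beta[j]) / sigma[j], nu) / sigma[j]
                     for j in range(2)])
    tau = dens / dens.sum(axis=0)
    for j in range(2):
        r = y - X @ beta[j]
        u = (nu + 1) / (nu + (r / sigma[j]) ** 2)   # downweights outlying residuals
        w = tau[j] * u
        # M-step: weighted least squares for beta, weighted scale update
        W = np.sqrt(w)
        beta[j], *_ = np.linalg.lstsq(X * W[:, None], y * W, rcond=None)
        r = y - X @ beta[j]
        sigma[j] = np.sqrt(np.sum(w * r ** 2) / tau[j].sum())
    pi = tau.mean(axis=1)

print("mixing proportions:", np.round(pi, 2))
print("coefficients:", np.round(beta, 2), "scales:", np.round(sigma, 2))
```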
