Global ETD Search

651	New regression methods for measures of central tendency Aristodemou, Katerina January 2014 Measures of central tendency have been widely used for summarising statistical data, with the mean being the most popular summary statistic. However, in real-life applications it is not always the most representative measure of central location, especially when the data are skewed or contain outliers. Alternative statistics with less bias are the median and the mode. Median and quantile regression have been used in different fields to examine the effect of factors at different points of the distribution. Mode estimation, on the other hand, has found many applications in cases where the analysis focuses on obtaining information about the most typical value or pattern. This thesis demonstrates that the mode also plays an important role in the analysis of big data, which is becoming increasingly important in many sectors of the global economy. However, mode regression has not been widely applied, despite its clear conceptual benefit, because of the computational and theoretical limitations of existing estimators. Similarly, despite the popularity of the binary quantile regression model, computationally straightforward estimation techniques do not exist. Driven by the demand for simple, well-founded and easy-to-implement inference tools, this thesis develops a series of new regression methods for mode and binary quantile regression. Chapter 2 deals with mode regression methods from the Bayesian perspective and presents one parametric and two non-parametric methods of inference. Chapter 3 demonstrates a mode-based, fast pattern-identification method for big data and proposes the first fully parametric mode regression method, which effectively uncovers the dependency of typical patterns on a number of covariates. 
The proposed approach is demonstrated through the analysis of a decade-long dataset on the Body Mass Index and associated factors, taken from the Health Survey for England. Finally, Chapter 4 presents an alternative binary quantile regression approach, based on the nonlinear least asymmetric weighted squares, which can be implemented using standard statistical packages and guarantees a unique solution. 519.5
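The contrast between mean, median and mode drawn in this abstract can be seen on a small skewed sample. The sketch below is a minimal illustration, not any of the thesis's Bayesian or parametric mode regression estimators. It estimates the mode as the peak of a Gaussian kernel density estimate found by grid search. The bandwidth of 1.0, the grid step and the sample are arbitrary assumptions.

```python
import math
import statistics


def kde_mode(data, bandwidth=1.0, step=0.01):
    """Estimate the mode as the peak of a Gaussian kernel density estimate,
    located by a simple grid search between min(data) and max(data)."""
    lo, hi = min(data), max(data)
    n_steps = int(round((hi - lo) / step))
    best_x, best_density = lo, -1.0
    for i in range(n_steps + 1):
        x = lo + i * step
        # Unnormalized density: a constant factor does not move the peak.
        density = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# A small right-skewed sample with two large values (hypothetical data).
sample = [1, 2, 2, 2, 3, 10, 50]
print(statistics.mean(sample))    # pulled upward by the two large values
print(statistics.median(sample))
print(round(kde_mode(sample), 2))
```

On this sample the two large values pull the mean up to 10. The median and the kernel-based mode both sit at the typical value of 2. The kernel-based mode estimate depends strongly on the chosen bandwidth.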
652	Prognostisering av försäljningsvolym med hjälp av omvärldsindikatorer (Forecasting sales volume using external indicators) Liendeborg, Zaida, Karlsson, Mattias January 2016 Background: Forecasts are used as a basis for decision making, and they mainly affect decisions at the strategic and tactical levels of a company or organization. There are two broad approaches to forecasting. The first is qualitative: an expert or a group of experts assesses the future. The second is quantitative: mathematical and statistical models produce the forecasts. This study used a quantitative method to build a forecast model. The model takes external factors into account when forecasting the sales volume of Bosch Rexroth's hydraulic motors. The range of possible external factors is very wide, and this study analyzed only a limited selection. The variables were selected based on the markets in which Bosch Rexroth products are used, such as mining. Purpose: This study aimed to develop five predictive models. One model covers global sales volume, one each covers sales volume in the USA and China, and one each covers the sales volume of the CA engine and the Viking engine. The study identified external factors that showed a significant relationship with Bosch Rexroth's sales volume at various time lags and used them to produce forecasts for 2016 and 2017. Methods: The study combined multiple linear regression with Box-Jenkins ARMA errors to analyze the association with external factors and to produce forecasts. External factors such as commodity prices, inflation and exchange rates between currencies were taken into account. Significant external factors at different time lags were identified using the cross-correlation function between each external factor and the sales volume, and these factors were then entered into the model. The forecasting method is a causal forecasting model.
Conclusions: The global sales volume of Bosch Rexroth turned out to be affected by the historical price of copper at three time lags: one, six and seven months. From 2010 to 2015 the copper price dropped continuously, which explains the downward trend in sales volume. Sales volume in the USA showed a significant association with the price of coal at lags of three and four months. In other words, a change in the coal price affects sales volume in the USA after three and four months. The Chinese market was affected by the price of silver at lags of four and six months. The CA engine also displayed an association with the price of copper at the same time lags as the global sales volume. The Viking engine, on the other hand, showed no significant association with any of the external factors analyzed in this study. Global mean sales volume is forecast to be between 253 and 309 units a month for May 2016 to December 2017. Mean sales volume in the USA is projected to be between 24 and 32 units a month, and China's mean sales volume between 42 and 81 units a month. Mean sales volume of the CA engine is forecast at 175 to 212 units a month. Mean sales of the Viking engine are projected to stay constant at 25 units a month. Forecasting ARIMAX Dynamic regression ARMA error Sales volume
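The lag-identification step described under Methods can be sketched as follows. This is a simplified illustration, not the study's code. It computes the correlation between an external factor shifted by k months and sales, then flags lags outside the rough ±2/√n band used for white noise. The synthetic series, the helper names and the band are assumptions. The correlation is computed on the overlapping pairs only, a slight simplification of the textbook cross-correlation function. With strongly autocorrelated economic series, the series are usually prewhitened before this step, which the sketch omits.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_correlation(x, y, max_lag):
    """Correlation between x lagged by k periods and y, for k = 0..max_lag.
    Lag k pairs x[t - k] with y[t], i.e. x leads y by k periods."""
    return {k: pearson(x[: len(x) - k], y[k:]) for k in range(max_lag + 1)}


def significant_lags(x, y, max_lag):
    """Lags whose correlation lies outside the rough +/- 2/sqrt(n) band."""
    band = 2 / math.sqrt(len(y))
    return [k for k, r in cross_correlation(x, y, max_lag).items() if abs(r) > band]


# Synthetic demo: sales respond to a price series three periods later.
rng = random.Random(1)
price = [rng.gauss(0, 1) for _ in range(120)]
sales = [rng.gauss(0, 1) for _ in range(3)] + price[:-3]
ccf = cross_correlation(price, sales, max_lag=12)
print(max(ccf, key=ccf.get))              # lag with the strongest correlation
print(significant_lags(price, sales, 12))
```

Lags that pass this screen would then enter the regression with ARMA errors as lagged regressors. The screen itself says nothing about the ARMA structure of the remaining error.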
653	Evaluating the Use of Ridge Regression and Principal Components in Propensity Score Estimators under Multicollinearity Gripencrantz, Sarah January 2014 Multicollinearity can be present in the propensity score model when estimating average treatment effects (ATEs). In this thesis, logistic ridge regression (LRR) and principal components logistic regression (PCLR) are evaluated as alternatives to maximum likelihood (ML) estimation of the propensity score model. To evaluate LRR and PCLR, a Monte Carlo simulation study assesses ATE estimators based on inverse probability weighting (IPW), matching and stratification. An empirical example of using LRR and PCLR on real data under multicollinearity is also provided. Results from the simulation study reveal that, under multicollinearity and in small samples, using LRR reduces bias in the matching estimator compared with ML. In large samples, PCLR yields the lowest bias and typically has the lowest MSE for all estimators. PCLR matched ML in bias under IPW estimation and in some cases had lower bias. The stratification estimator was heavily biased compared with matching and IPW. However, both its bias and its MSE improved when PCLR was applied, and in some cases when LRR was applied. In the empirical example, the specification using PCLR was usually the most sensitive when a strongly correlated covariate was included in the propensity score model. Causal Inference Propensity Score IPW estimator Stratification Matching Logistic Ridge Regression Principal Components Logistic Regression
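A minimal sketch of one pipeline evaluated here: propensity scores from a ridge-penalized logistic regression, plugged into the IPW estimator of the ATE. The sketch makes several assumptions. The penalty is lam/2 times the squared norm of the slopes, the fit uses plain gradient descent, and IPW takes a simple Horvitz-Thompson form. PCLR, matching and stratification are not shown. The toy data, with two nearly collinear covariates, are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, t, lam=1.0, lr=0.1, iters=5000):
    """Logistic regression with an L2 (ridge) penalty lam/2 * ||b||^2 on the
    slopes, fitted by batch gradient descent. The intercept is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, ti in zip(X, t):
            # Residual on the probability scale drives the log-likelihood gradient.
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - ti
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * (gj + lam * bj) / n for bj, gj in zip(b, g)]
    return b0, b


def ipw_ate(y, t, e):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    n = len(y)
    treated = sum(ti * yi / ei for yi, ti, ei in zip(y, t, e)) / n
    control = sum((1 - ti) * yi / (1 - ei) for yi, ti, ei in zip(y, t, e)) / n
    return treated - control


# Hypothetical toy data: two nearly collinear covariates, treatment t, outcome y.
X = [[0.0, 0.1], [0.5, 0.4], [1.0, 1.1], [1.5, 1.4],
     [2.0, 2.1], [2.5, 2.4], [3.0, 3.1], [3.5, 3.4]]
t = [0, 0, 1, 0, 1, 0, 1, 1]
y = [1.0, 1.2, 2.1, 1.1, 2.4, 1.3, 2.6, 2.8]
b0, b = fit_ridge_logistic(X, t, lam=0.5)
e = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) for xi in X]
print(b, ipw_ate(y, t, e))
```

Increasing `lam` shrinks the slope coefficients, which stabilizes the fit when covariates are nearly collinear. Propensities close to 0 or 1 still make IPW estimates unstable, and such weights are often trimmed in practice.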
654	Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities Alexander, Erika D. 05 1900 This study examined the distributional properties of improvement-over-chance (I) effect sizes for two-group univariate classification. The effect sizes were derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA). Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect under all the conditions examined. Both accuracy and precision were acceptable under only a small number of conditions. The results indicate that the choice of method is determined primarily by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme, and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a method is more complex, and in many cases more than one method could be used appropriately. Discriminant analysis. Logistic regression analysis. Index effect sizes predictive discriminant analysis logistic regression simulation
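The improvement-over-chance index compares an observed hit rate H_o with the hit rate H_e expected by chance: I = (H_o - H_e) / (1 - H_e). The sketch below implements two common definitions of H_e as assumptions. The first is the proportional chance rate, the sum of squared prior probabilities. The second is the maximum chance rate, the largest prior. The abstract does not state which definition the thesis uses, and the function names are hypothetical.

```python
def chance_hit_rate(priors, rule="proportional"):
    """Hit rate expected by chance: proportional (sum of squared priors) or
    maximum (always predict the largest group)."""
    if rule == "proportional":
        return sum(p * p for p in priors)
    if rule == "maximum":
        return max(priors)
    raise ValueError(f"unknown rule: {rule!r}")


def hit_rate(actual, predicted):
    """Proportion of cases classified into their true group."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def improvement_over_chance(h_observed, priors, rule="proportional"):
    """I = (H_o - H_e) / (1 - H_e): the share of the possible improvement
    over chance classification that is actually achieved."""
    h_e = chance_hit_rate(priors, rule)
    return (h_observed - h_e) / (1 - h_e)


print(improvement_over_chance(0.80, [0.5, 0.5]))
print(improvement_over_chance(hit_rate([0, 0, 1, 1, 1], [0, 1, 1, 1, 1]),
                              [0.4, 0.6], rule="maximum"))
```

Take an observed hit rate of 0.80 with equal priors. Then H_e = 0.5 and I = 0.6, meaning the classifier achieves 60% of the possible improvement over chance.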
655	Unbiased Recursive Partitioning: A Conditional Inference Framework Hothorn, Torsten, Hornik, Kurt, Zeileis, Achim January 2004 Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting, and a selection bias towards covariates with many possible splits or missing values. While pruning procedures can solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. Unbiased procedures have been suggested for some special cases, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning that embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and the predictive performance of the resulting trees is shown to be as good as that of established exhaustive search procedures. However, the partitions, and therefore the models, induced by the two approaches are structurally different, which indicates the need for unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored and multivariate response variables, and to covariates on arbitrary measurement scales. Data from studies on animal abundance, glaucoma classification, node-positive breast cancer and mammography experience are re-analyzed. / Series: Research Report Series / Department of Statistics and Mathematics
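The core idea of the conditional inference framework is to separate variable selection from the split-point search. Each covariate is first tested for association with the response, and the p-values are adjusted for multiplicity. If no adjusted p-value falls below alpha, the node is not split. The sketch below is a simplified stand-in, not the paper's procedure. It uses a permutation test on |Pearson r| with a Bonferroni adjustment (one possible multiplicity correction), whereas the paper uses linear test statistics and their conditional distribution. A reference implementation of the paper's framework is the R function ctree() in the party and partykit packages. The function names, defaults and data here are hypothetical.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation, used as the test statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)


def permutation_pvalue(x, y, n_perm, rng):
    """Permutation p-value for H0: x is independent of y."""
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(columns, y, alpha=0.05, n_perm=199, seed=0):
    """Test every covariate against the response, Bonferroni-adjust, and
    return (index, adjusted p); index is None if the node should not split."""
    rng = random.Random(seed)
    m = len(columns)
    adjusted = [min(1.0, m * permutation_pvalue(col, y, n_perm, rng))
                for col in columns]
    best = min(range(m), key=lambda j: adjusted[j])
    return (best if adjusted[best] <= alpha else None), adjusted[best]


# Hypothetical data: y depends on x1 only; x2 is a shuffled copy of x1.
x1 = [float(i) for i in range(20)]
y = [3 * v + 1 for v in x1]
x2 = list(x1)
random.Random(42).shuffle(x2)
print(select_split_variable([x1, x2], y))
```

Every covariate is judged by a p-value rather than by the best split it can achieve. A covariate with many possible split points therefore gains no advantage, which is the selection bias the abstract describes. The split point is searched only after the variable has been chosen.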
656	A Cross-Sectional Analysis of Health Impacts of Inorganic Arsenic in Chemical Mixtures Hargarten, Paul 01 January 2015 Drinking groundwater is the primary way humans accumulate arsenic. Chronic exposure to inorganic arsenic (iAs) over decades has been shown to be associated with multiple health effects at low levels (5-10 ppb). These effects include cancer, elevated blood pressure and cardiovascular disease, skin lesions, renal failure, and peripheral neuropathy. We used hypertension (high blood pressure) as a surrogate marker for cardiovascular disease. We examined the effect of iAs alone and in a mixture with other metals in a cross-sectional study of adults in the United States (National Health and Nutrition Examination Survey, NHANES, 2005-2010). The analysis adjusted for the following covariates: urinary creatinine level (mg/dL), poverty index ratio (PIR, a measure of socioeconomic status, 1 to 5), age, smoking (yes/no), alcohol use, gender, non-Hispanic Black ethnicity, and overweight (BMI>=25). A logistic regression model suggests that, after adjusting for these covariates, a one-unit increase in the log of inorganic arsenic increases the odds of hypertension by a factor of 1.093 (95% confidence interval 0.935 to 1.277). Because this interval includes 1, there is no significant evidence that inorganic arsenic is a risk factor for hypertension. Biomonitoring data provide evidence that humans are exposed not only to inorganic arsenic but also to mixtures of chemicals, including inorganic arsenic, total mercury, cadmium, and lead. We tested for a mixture effect of these four environmental chemicals using weighted quantile sum (WQS) regression, which takes into account the correlation among the chemicals and with the outcome. For a one-unit increase in the weighted sum, the adjusted odds of developing hypertension increase by a factor of 1.027 (95% CI 0.882 to 1.196). This effect is also not significant after accounting for the same covariates.
The non-significant finding may be due to the low inorganic arsenic concentrations (8-620 μg/L) in US drinking water compared with countries such as Bangladesh, where concentrations are much higher. The literature provides conflicting evidence on the association between inorganic arsenic and hypertension in regions with low to moderate exposure. Future studies, especially a large cohort study, are needed to confirm whether inorganic arsenic, alone or with other metals, is associated with hypertension in the United States. inorganic arsenic hypertension weighted quantile sum regression logistic regression National Health and Nutrition Examination Survey NHANES toxicology
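The significance statements in this abstract follow from whether the 95% confidence interval for the odds ratio contains 1. The sketch below reproduces that reading from the reported numbers. It also shows the quartile scoring and weighted sum behind a WQS index. Several inputs are assumptions. The weights are hypothetical placeholders: WQS regression estimates them from the data, subject to non-negativity and summing to one. The exposure values are invented. The back-calculated standard error assumes a symmetric Wald interval on the log-odds scale.

```python
import bisect
import math
import statistics

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Log-scale standard error implied by a reported 95% CI for an odds ratio."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_excludes_one(lower, upper):
    """An odds-ratio CI that excludes 1 indicates a significant association."""
    return not (lower <= 1.0 <= upper)


def quartile_scores(values):
    """Score each value 0-3 by the sample quartile it falls in."""
    cuts = statistics.quantiles(values, n=4)
    return [bisect.bisect_right(cuts, v) for v in values]


def wqs_index(scores_by_chemical, weights):
    """Weighted quantile sum per subject: sum_j w_j * q_ij, weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w >= 0 for w in weights)
    return [sum(w * q for w, q in zip(weights, row))
            for row in zip(*scores_by_chemical)]


# Reported arsenic result: OR 1.093, 95% CI (0.935, 1.277).
se = se_from_ci(0.935, 1.277)
print(odds_ratio_ci(math.log(1.093), se))
print(ci_excludes_one(0.935, 1.277))

# Hypothetical exposures for two chemicals and eight subjects.
arsenic = [2.1, 5.4, 3.3, 8.9, 1.2, 4.4, 6.7, 3.9]
mercury = [0.3, 0.9, 0.4, 1.5, 0.2, 0.8, 1.1, 0.5]
scores = [quartile_scores(arsenic), quartile_scores(mercury)]
print(wqs_index(scores, [0.6, 0.4]))  # hypothetical weights
```

Both reported intervals, (0.935, 1.277) and (0.882, 1.196), contain 1, which is why neither association is called significant.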
657	Robust mixture regression models using t-distribution Wei, Yan January 1900 Master of Science / Department of Statistics / Weixin Yao / In this report, we propose a robust mixture of regressions based on the t-distribution, extending the mixture of t-distributions proposed by Peel and McLachlan (2000) to the regression setting. This new mixture of regressions model is robust to outliers in the y direction but not to outliers at high-leverage points. To address this, we also propose a modified version of the method, which fits the t-based mixture of regressions after adaptively trimming high-leverage points. We further propose choosing the degrees of freedom of the t-distribution adaptively using the profile likelihood. Owing to this adaptive choice of degrees of freedom, the proposed robust mixture regression estimate has high efficiency. We demonstrate the effectiveness of the proposed method and compare it with some existing methods through a simulation study. EM algorithm Mixture regression models Outliers Robust regression T-distribution Statistics (0463)
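The robustness of t-based regression comes from the EM weights. Each observation receives a weight (nu + 1) / (nu + r^2 / s2), which shrinks as its standardized residual grows. The sketch below fits a single-component linear regression with t errors and fixed degrees of freedom nu = 4, an assumption. The report's method adds a mixture over components, adaptive trimming of high-leverage points and a profile-likelihood choice of nu, none of which is shown. The data are synthetic.

```python
def t_regression(x, y, nu=4.0, iters=200):
    """Simple linear regression with Student-t errors (fixed nu), fitted by EM.
    M-step: weighted least squares for (a, b), then s2 = sum(u_i r_i^2) / n.
    E-step: u_i = (nu + 1) / (nu + r_i^2 / s2), small for large residuals."""
    n = len(x)
    u = [1.0] * n  # with unit weights the first M-step is ordinary least squares
    for _ in range(iters):
        # M-step: weighted least squares for intercept a and slope b.
        sw = sum(u)
        mx = sum(ui * xi for ui, xi in zip(u, x)) / sw
        my = sum(ui * yi for ui, yi in zip(u, y)) / sw
        sxx = sum(ui * (xi - mx) ** 2 for ui, xi in zip(u, x))
        sxy = sum(ui * (xi - mx) * (yi - my) for ui, xi, yi in zip(u, x, y))
        b = sxy / sxx
        a = my - b * mx
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        s2 = sum(ui * ri * ri for ui, ri in zip(u, r)) / n
        # E-step: expected latent precision weights given the new fit.
        u = [(nu + 1) / (nu + ri * ri / s2) for ri in r]
    return a, b


# Synthetic data on y = 1 + 2x with small noise and one gross outlier in y.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [1 + 2 * xi + d for xi, d in zip(x[:9], noise)] + [100.0]
print(t_regression(x, y, iters=1))  # ordinary least squares
print(t_regression(x, y))           # t-based EM fit
```

With one gross outlier in the y direction, the first iteration (ordinary least squares) gives a slope far from the true value of 2, while the EM fit stays close to it. A high-leverage outlier in x would not be downweighted this way, which is the motivation for the trimming step in the report.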
658	A new biased estimator for multivariate regression models with highly collinear variables Wissel, Julia January 2009 It is well known that the least squares estimator performs poorly in the presence of multicollinearity, because its variance becomes large. One way to overcome this problem is to use biased estimators, e.g. ridge regression estimators. This study proposes an estimation procedure based on adding a small quantity omega to some or all of the regressors. The resulting biased estimator is described as a function of omega. Its mean squared error is shown to be smaller than that of the least squares estimator when the regressors are highly correlated. Strong coupling Correlation Regression analysis Collinearity Ridge regression Linear regression ddc:510
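The abstract compares its estimator with ridge regression, which also accepts a small bias in exchange for lower variance under multicollinearity. The omega-based estimator itself is not specified in enough detail here to implement. The sketch below therefore shows only the ridge estimator (X'X + kI)^{-1} X'y for two centred predictors, solved in closed form. The function name and the data are hypothetical.

```python
def ridge_2d(x1, x2, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y for two predictors after centring,
    so the (unpenalized) intercept drops out. k = 0 gives least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    # Invert the 2x2 matrix [[s11 + k, s12], [s12, s22 + k]] explicitly.
    a11, a22 = s11 + k, s22 + k
    det = a11 * a22 - s12 * s12
    beta1 = (a22 * s1y - s12 * s2y) / det
    beta2 = (a11 * s2y - s12 * s1y) / det
    return beta1, beta2


# Hypothetical data with two nearly collinear predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.05, 3.95, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for k in (0.0, 0.1, 1.0):
    print(k, ridge_2d(x1, x2, y, k))
```

Setting k = 0 gives least squares. With nearly collinear predictors, X'X is close to singular, so the least squares coefficients are very sensitive to small changes in the data. A positive k shrinks the coefficients toward zero.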
659	Tackling the Antibiotic Resistant Bacteria Crisis Using Longitudinal Antibiograms Tlachac, Monica 31 May 2018 Antibiotic-resistant bacteria, a growing health crisis, arise from antibiotic overuse and misuse. Resistant infections endanger patients' lives and are financially burdensome. Aggregate antimicrobial susceptibility reports, called antibiograms, are critical for tracking antibiotic susceptibility. They are also used to evaluate how likely different antibiotics are to be effective against an infection before patient-specific susceptibility data are available. This research leverages the Massachusetts Statewide Antibiogram database, a rich dataset of antibiograms for 754 antibiotic-bacteria pairs collected by the Massachusetts Department of Public Health from 2002 to 2016. However, these antibiograms are at least a year old. Antibiotics are therefore prescribed based on outdated data, which unnecessarily furthers resistance. Our objective is to apply data science techniques to these antibiograms to help develop more responsible antibiotic prescription practices. First, we use model selectors with regression-based techniques to forecast current antimicrobial resistance. Next, we develop an assistant that immediately identifies clinically and statistically significant changes in antimicrobial resistance between years once the most recent year of antibiograms is collected. Lastly, we use k-means clustering on resistance trends to detect antibiotic-bacteria pairs whose resistance trends make forecasting ineffective. These three strategies can be implemented to guide more responsible antibiotic prescription practices and thus reduce unnecessary increases in antibiotic resistance. Antibiograms Antimicrobial Resistance ARIMA Clinical Significance Model Selector Outlier Detection Regression Statistical Significance Support Vector Regression
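The year-over-year change detection described here can be sketched as a two-proportion z-test on the share of susceptible isolates, combined with a minimum clinically relevant change. Both the test and the 10-percentage-point clinical threshold are assumptions for illustration. The abstract does not specify how the thesis defines statistical or clinical significance, and the function name is hypothetical.

```python
import math


def susceptibility_change(s_prev, n_prev, s_curr, n_curr,
                          alpha=0.05, clinical_threshold=0.10):
    """Compare the share of susceptible isolates between two yearly antibiograms.
    s_* are susceptible counts, n_* are numbers of isolates tested."""
    p1, p2 = s_prev / n_prev, s_curr / n_curr
    pooled = (s_prev + s_curr) / (n_prev + n_curr)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_prev + 1 / n_curr))
    z = (p2 - p1) / se if se > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    statistically = p_value < alpha
    clinically = abs(p2 - p1) >= clinical_threshold
    return {
        "change": p2 - p1,
        "z": z,
        "p_value": p_value,
        "statistically_significant": statistically,
        "clinically_significant": clinically,
        "flag": statistically and clinically,
    }


print(susceptibility_change(80, 100, 60, 100))
print(susceptibility_change(80, 100, 78, 100))
```

Consider 100 isolates tested in each year. A drop from 80% to 60% susceptible is flagged on both criteria, while a drop from 80% to 78% is not.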
