11

Multicollinearity and the Estimation of Regression Coefficients

Teed, John Charles 01 May 1978
Multicollinearity affects the precision of the estimated regression coefficients in a regression analysis. This study examined the effect of certain factors on multicollinearity and on the estimates. The response variables were the standard error of the regression coefficients and a standardized statistic that measures the deviation of the regression coefficient from the population parameter. The estimates are not influenced by any one factor in particular, but rather by some combination of factors. The larger the sample size, the better the precision of the estimates, no matter how "bad" the other factors may be. The standard error of the regression coefficients proved to be the best indicator of estimation problems.
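The interplay between sample size and collinearity described in this abstract can be illustrated with the textbook two-predictor case. The sketch below is not the thesis's simulation design; it assumes two predictors standardized to sample variance 1, a known error standard deviation `sigma`, and a sample correlation `r` between the predictors, all of which are illustrative names.

```python
import math


def vif(r):
    """Variance inflation factor for two predictors with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


def coef_se(sigma, n, r):
    """Standard error of one slope in a two-predictor least-squares fit.

    Assumption: both predictors are standardized to sample variance 1,
    so the centered sum of squares of each predictor equals n - 1.
    """
    return sigma * math.sqrt(vif(r) / (n - 1))


# Larger n shrinks the standard error even when r is close to 1.
for n in (26, 101, 1001):
    for r in (0.0, 0.9, 0.99):
        print(f"n={n:5d}  r={r:4.2f}  se={coef_se(1.0, n, r):.4f}")
```

Under these assumptions the standard error grows with the variance inflation factor but falls with the square root of the sample size, which is consistent with the abstract's remark that a large sample can compensate for unfavorable settings of the other factors.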
12

A comparison of three prediction based methods of choosing the ridge regression parameter k

Gatz, Philip L., Jr. 15 November 2013
A solution to the regression model y = Xβ + ε is usually obtained using ordinary least squares. However, when multicollinearity exists among the regressor variables, many qualities of this solution deteriorate, including its variances, length, stability, and prediction capabilities. Ridge regression was introduced to combat this deterioration (Hoerl and Kennard, 1970a). The method uses a solution biased by a parameter k. Many methods have been developed to determine an optimal value of k. This study investigated three little-used methods of determining k: the PRESS statistic, Mallows' C_k statistic, and DF-trace. The study compared the prediction capabilities of the three methods, using a Monte Carlo experiment on data that contained various levels of both collinearity and leverage. / Master of Science
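One of the three criteria named above, the PRESS statistic, can be sketched as follows. This is a minimal illustration rather than the study's procedure: it assumes exactly two predictors, data that are already centered (so no intercept is fitted), a brute-force leave-one-out refit, and a user-supplied grid of candidate k values. The function names are hypothetical.

```python
def ridge_2d(X, y, k):
    """Ridge solution (X'X + kI)^(-1) X'y for two centered predictors."""
    a = sum(x1 * x1 for x1, _ in X) + k
    b = sum(x1 * x2 for x1, x2 in X)
    d = sum(x2 * x2 for _, x2 in X) + k
    g1 = sum(x1 * yi for (x1, _), yi in zip(X, y))
    g2 = sum(x2 * yi for (_, x2), yi in zip(X, y))
    det = a * d - b * b  # determinant of the 2x2 matrix X'X + kI
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


def press(X, y, k):
    """Sum of squared leave-one-out prediction errors for a given k."""
    total = 0.0
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b1, b2 = ridge_2d(X_train, y_train, k)
        residual = y[i] - (b1 * X[i][0] + b2 * X[i][1])
        total += residual ** 2
    return total


def choose_k(X, y, grid):
    """Return the candidate k with the smallest PRESS value."""
    return min(grid, key=lambda k: press(X, y, k))


# Noise-free toy data generated by y = 2*x1 + 3*x2.
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [2.0, 3.0, 5.0, 7.0]
print(ridge_2d(X, y, 0.0))
print(choose_k(X, y, [0.0, 0.5, 1.0]))
```

On noise-free data the criterion favors k = 0, since any shrinkage only adds bias; with noisy, collinear data a positive k can give a smaller PRESS value.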
13

Robustifikace statistických a ekonometrických metod regrese / Robustification of statistical and econometrical regression methods

Jurczyk, Tomáš January 2016
Author: Mgr. Tomáš Jurczyk
Department: Department of Probability and Mathematical Statistics
Supervisor: prof. RNDr. Jan Ámos Víšek CSc., IES FSV UK Praha
Abstract: Multicollinearity and the presence of outliers are two data problems that can occur in regression analysis. In this thesis we are mainly interested in situations where a combined outlier-multicollinearity problem is present. We first show the behavior of classical methods developed for overcoming one of these problems. We also investigate how well the methods proposed as robust multicollinearity detectors work. We prove that the proposed two-step procedures (with one step typically based on robust regression methods) fail in outlier detection, and therefore also in multicollinearity detection, if strong multicollinearity is present in the majority of the data. We propose a new one-step method as a candidate for a robust detector of multicollinearity as well as a robust ridge regression estimate. We derive its properties and behavior and propose diagnostic tools derived from that method.
Keywords: multicollinearity, outliers, robust detector of multicollinearity, robust ridge regression
14

WEIGHTED QUANTILE SUM REGRESSION FOR ANALYZING CORRELATED PREDICTORS ACTING THROUGH A MEDIATION PATHWAY ON A BIOLOGICAL OUTCOME

Evani, Bhanu M 01 January 2017
A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Virginia Commonwealth University, 2017. Major Director: Robert A. Perera, Asst. Professor, Department of Biostatistics.
This work examines mediated effects of a set of correlated predictors using the recently developed Weighted Quantile Sum (WQS) regression method. Traditionally, mediation analysis has been conducted using the multiple regression method first proposed by Baron and Kenny (1986), which has since been advanced by several authors such as MacKinnon (2008). Mediation analysis of a highly correlated predictor set is challenging because of multicollinearity. WQS regression can be used as an alternative method to analyze the mediated effects when predictor correlations are high. As part of the WQS method, a weighted quantile sum index (WQSindex) is computed to represent the predictor set as a single entity. The predictor variables in classic mediation are then replaced with the WQSindex, allowing for the estimation of the total indirect effect between all the predictors and the outcome. Predictors with a high relative importance in their association with the outcome can be identified by examining the empirical weights that the WQS regression method estimates for the individual predictors. Other constrained optimization methods (e.g. LASSO) focus on reducing the dimensionality of the correlated predictors to reduce multicollinearity. WQS regression in the context of mediation is studied using Monte Carlo simulation for mediation models with two and three correlated predictors. Its performance is compared to the classic OLS multiple regression and the regularized LASSO regression methods.
An application of these three methods to the National Health and Nutrition Examination Survey (NHANES) dataset examines the effect of serum concentrations of polychlorinated biphenyls (independent variables) on the liver enzyme alanine aminotransferase, ALT (outcome), with chromosomal telomere length as a potential mediator.
Keywords: Multicollinearity, Weighted Quantile Sum Regression, Mediation Analysis
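The index construction mentioned in this abstract can be sketched in a few lines. The example only shows how an index is formed once weights are available; it does not estimate the weights, which in WQS regression come from a constrained optimization that is outside this sketch. The rank-based scoring rule (ties broken by position) and the function names are assumptions made for illustration.

```python
def quantile_scores(values, n_quantiles=4):
    """Score each value 0..n_quantiles-1 by its rank-based quantile bin.

    Assumption: ties are broken by position, which is adequate for a sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = rank * n_quantiles // n
    return scores


def wqs_index(columns, weights, n_quantiles=4):
    """Weighted sum of quantile scores, one index value per subject.

    columns: one list of measurements per predictor.
    weights: non-negative weights that sum to 1, one per predictor.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    scored = [quantile_scores(col, n_quantiles) for col in columns]
    n = len(columns[0])
    return [sum(w * s[i] for w, s in zip(weights, scored)) for i in range(n)]


# Two strongly (negatively) correlated predictors collapsed into one index.
index = wqs_index([[1, 2, 3, 4], [4, 3, 2, 1]], [0.75, 0.25])
print(index)
```

Because the correlated predictors enter the downstream model only through this single index, the mediation model no longer has to separate their individual coefficients, while the weights still indicate each predictor's relative importance.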
15

Generalized autoregressive and moving average models: control charts, multicollinearity, and a new modified model / Modelos generalizados auto-regressivos e de médias móveis: gráficos de controle, multicolinearidade e novo modelo modificado

Albarracin, Orlando Yesid Esparza 24 October 2017
Recently, in the health surveillance area, control charts have been proposed to decide whether the morbidity or mortality of a specific disease has reached an epidemic level. This thesis is composed of three papers. In the first two papers, CUSUM and EWMA control charts were proposed to monitor count time series with seasonal and trend effects using Generalized Autoregressive and Moving Average (GARMA) models, instead of the independent Generalized Linear Model (GLM) usually used in practice. Different statistics based on transformations, for variables that follow a Negative Binomial distribution, were used in these control charts. In the second paper, two new statistics were proposed based on the ratio of the log-likelihood function. Different scenarios describing disease profiles were considered to evaluate the effect of omitting serial correlation in EWMA and CUSUM control charts. The performance of the charts when the serial correlation is neglected in the regression model was measured in terms of average run length (ARL). In summary, when the autocorrelation is neglected, fitting a pure GLM instead of a GARMA model leads to an increase in false alarms. However, none of the tested statistics seems to be robust, in the sense of producing the smallest increase in false alarms in all scenarios. In general, all monitored statistics presented a smaller ARL_0 for higher values of autocorrelation.
In the last paper, GARMA(p, q) models with p and q simultaneously different from zero were studied, since two features were observed in practice. One is multicollinearity, which may lead to non-convergence of maximum likelihood estimation using iteratively reweighted least squares. The second is the inclusion of the same lagged observations in the autoregressive and moving average components, which confounds the interpretation of the parameters. A modified model, GARMA-M, was presented to deal with the multicollinearity and improve the interpretation of the parameters. In a general sense, simulation studies show that the modified model provides estimates closer to the parameters and confidence intervals with higher coverage than those obtained with the GARMA model, but some restrictions on the parameter space are imposed to guarantee the stationarity of the process. Finally, a real data analysis illustrates the GARMA-M fit for daily hospitalization rates of elderly people due to respiratory diseases from October 2012 to April 2015 in the city of São Paulo, Brazil.
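The two chart types named in this abstract follow simple recursions, sketched below. This is a generic illustration and not the thesis's monitoring statistics: it assumes the input `z` is a sequence of already standardized residuals from some fitted model, and the reference value `k`, decision limit `h`, and smoothing constant `lam` are illustrative parameters.

```python
def cusum_upper(z, k, h):
    """One-sided upper CUSUM: C_t = max(0, C_{t-1} + z_t - k).

    Returns the CUSUM path and the index of the first alarm (C_t > h),
    or None if the chart never signals.
    """
    c, path, alarm = 0.0, [], None
    for t, zt in enumerate(z):
        c = max(0.0, c + zt - k)
        path.append(c)
        if alarm is None and c > h:
            alarm = t
    return path, alarm


def ewma(z, lam, start=0.0):
    """EWMA recursion: E_t = lam * z_t + (1 - lam) * E_{t-1}."""
    e, path = start, []
    for zt in z:
        e = lam * zt + (1.0 - lam) * e
        path.append(e)
    return path


# A shift of three standard deviations begins at index 2.
residuals = [0.0, 0.0, 3.0, 3.0, 3.0]
path, alarm = cusum_upper(residuals, k=0.5, h=4.0)
print(path, alarm)
print(ewma([1.0, 1.0], lam=0.5))
```

The run length is the number of observations until the first alarm; averaging it over many simulated in-control series gives ARL_0, the quantity the abstract reports as shrinking when serial correlation is ignored.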
16

Metody dynamické analýzy složení portfolia / Methods of dynamical analysis of portfolio composition

Meňhartová, Ivana January 2012
Author: Ivana Meňhartová
Department: Department of Probability and Mathematical Statistics
Supervisor: Mgr. Tomáš Hanzák, KPMS, MFF UK
Abstract: In this thesis we study methods for the dynamic analysis of a portfolio based on its revenues. The thesis focuses on the Kalman filter and locally weighted regression as two basic methods for dynamic analysis. It describes the theory behind these methods in detail, as well as their use, and discusses their proper settings. Practical applications of both methods to artificial data and to real data from the Prague Stock Exchange are presented. Using artificial data, we demonstrate the practical importance of the Kalman filter's assumptions. We then introduce multicollinearity as a possible complication in applications to real data. At the end of the thesis we compare the results and usage of both methods, and we introduce the possibility of enhancing the Kalman filter by projection of the estimates or by CUSUM tests (change detection tests).
Keywords: Kalman filter, locally weighted regression, multicollinearity, CUSUM test
18

On Some Ridge Regression Estimators for Logistic Regression Models

Williams, Ulyana P 28 March 2018
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was carried out to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio were varied in the design of the experiment. Simulation results show that, under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the health, physical, and social sciences.
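Two of the performance criteria listed in this abstract can be computed from simulation output as sketched below. The definitions are assumptions made for illustration: MSE is taken as the average squared Euclidean distance between an estimated coefficient vector and the true one, and the "percentage of times" criterion is read as a per-replicate comparison of squared errors. The thesis may define these quantities differently, and the function names are hypothetical.

```python
def squared_error(estimate, truth):
    """Squared Euclidean distance between an estimate and the true vector."""
    return sum((b - t) ** 2 for b, t in zip(estimate, truth))


def mse(estimates, truth):
    """Average squared error over Monte Carlo replicates."""
    return sum(squared_error(e, truth) for e in estimates) / len(estimates)


def pct_ridge_worse(ridge_estimates, ml_estimates, truth):
    """Percentage of replicates in which ridge has the larger squared error."""
    worse = sum(
        squared_error(r, truth) > squared_error(m, truth)
        for r, m in zip(ridge_estimates, ml_estimates)
    )
    return 100.0 * worse / len(ridge_estimates)


# Two made-up replicates with true coefficients (1, 2).
truth = (1.0, 2.0)
ridge = [(1.0, 2.0), (2.0, 2.0)]
ml = [(1.0, 3.0), (1.0, 2.0)]
print(mse(ridge, truth), mse(ml, truth), pct_ridge_worse(ridge, ml, truth))
```

In a real study the two lists would hold the estimates from each simulated data set under one experimental condition, and the criteria would be tabulated across conditions.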
19

Παραβιάσεις των βασικών υποθέσεων του γραμμικού μοντέλου παλινδρόμησης / Violations of the basic assumptions of the linear regression model

Γρηγοριάδου, Μαρία 05 February 2015
A statistical model is a formalization of stochastic relationships between variables, expressed as mathematical equations, that aims to describe a system (a phenomenon or an event) as accurately as possible. Almost every system contains quantities that vary. An interesting question is how these variables affect (or appear to affect) others. This is the subject of regression analysis, a widely used statistical technique for detecting and modeling relationships and dependencies between variables. When the relationships between the variables are linear, the result is a linear regression model. Statistical regression models rest on some basic assumptions, which must be checked before the model is analyzed. In practice, however, these assumptions are often violated. With real-world data in particular, violations are so frequent that they are overwhelmingly the rule rather than the exception. This thesis addresses the important issue that arises when some of the basic assumptions of the linear regression model are violated. Its aims are: (a) to analyze the causes of each violation and its effects on the model; (b) to record the main ways of detecting violations in the model; and (c) to find ways of dealing with these "problematic situations". The results show that combining established knowledge of the subject (the theoretical background) with modern methods and ideas can significantly reduce the adverse consequences of the violations for the model, and occasionally even reverse the damage, while still allowing a satisfactory amount of information to be "salvaged".
20

Some Aspects of Propensity Score-based Estimators for Causal Inference

Pingel, Ronnie January 2014
This thesis consists of four papers that are related to commonly used propensity score-based estimators for average causal effects. The first paper starts with the observation that researchers often have access to data containing lots of covariates that are correlated. We therefore study the effect of correlation on the asymptotic variance of an inverse probability weighting and a matching estimator. Under the assumptions of normally distributed covariates, constant causal effect, and potential outcomes and a logit that are linear in the parameters, we show that the correlation influences the asymptotic efficiency of the estimators differently, both with regard to direction and magnitude. Further, the strength of the confounding towards the outcome and the treatment plays an important role. The second paper extends the first paper in that the estimators are studied under the more realistic setting of using the estimated propensity score. We also relax several assumptions made in the first paper, and include the doubly robust estimator. Again, the results show that the correlation may increase or decrease the variances of the estimators, but we also observe that several aspects influence how correlation affects the variance of the estimators, such as the choice of estimator, the strength of the confounding towards the outcome and the treatment, and whether constant or non-constant causal effect is present. The third paper concerns estimation of the asymptotic variance of a propensity score matching estimator. Simulations show that large gains can be made for the mean squared error by properly selecting smoothing parameters of the variance estimator and that a residual-based local linear estimator may be a more efficient estimator for the asymptotic variance. The specification of the variance estimator is shown to be crucial when evaluating the effect of right heart catheterisation, i.e. we show either a negative effect on survival or no significant effect depending on the choice of smoothing parameters.
In the fourth paper, we provide an analytic expression for the covariance matrix of logistic regression with normally distributed regressors. This paper is related to the other papers in that logistic regression is commonly used to estimate the propensity score.
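One of the estimators studied in this thesis, inverse probability weighting, can be sketched in its simplest (Horvitz-Thompson) form. The example is an illustration only: it takes the propensity scores as given rather than estimating them by logistic regression, and the variable names and toy numbers are made up.

```python
def ipw_ate(treated, outcome, propensity):
    """Inverse probability weighting estimate of the average causal effect.

    treated:    1 if the unit received treatment, 0 otherwise.
    outcome:    observed outcome for each unit.
    propensity: probability of treatment for each unit, strictly in (0, 1).
    """
    if any(not 0.0 < e < 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    n = len(outcome)
    total = 0.0
    for t, y, e in zip(treated, outcome, propensity):
        # Treated units are weighted by 1/e, controls by 1/(1 - e).
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / n


# Toy data with a constant propensity score of 0.5.
effect = ipw_ate([1, 0, 1, 0], [3.0, 1.0, 5.0, 2.0], [0.5, 0.5, 0.5, 0.5])
print(effect)
```

Propensity scores close to 0 or 1 produce very large weights, which is one route by which the joint distribution of the covariates, including their correlation, can affect the variance of the estimator.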
