1 |
Réduction de dimension en régression logistique, application aux données actu-palu / Dimension reduction in logistic regression, application to actu-palu data
Kwémou Djoukoué, Marius 29 September 2014
This thesis is devoted to variable selection and model selection in logistic regression. It has two parts, one applied and one methodological. The applied part analyses data from a large socio-epidemiological survey called actu-palu. Such surveys typically involve a considerable number of explanatory variables, so the setting is by nature high-dimensional. Because of the curse of dimensionality, the logistic regression model cannot be applied directly. We proceed in two steps: the first reduces the number of variables using the Lasso, the Group Lasso and random forests; the second fits the logistic model to the subset of variables selected in the first step. These methods made it possible to select the variables relevant for identifying households at risk of a febrile episode in a child aged 2 to 10 years in Dakar.
The methodological part, itself in two sub-parts, establishes technical properties of estimators in the nonparametric logistic regression model. These estimators are obtained by penalized maximum likelihood, in one case with a Lasso or Group Lasso penalty and in the other with an ℓ0-type penalty. First, we propose weighted versions of the Lasso and Group Lasso estimators for the nonparametric logistic model and prove non-asymptotic oracle inequalities for them. Second, we extend the model selection principle introduced by Birgé and Massart (2001) to logistic regression. Selection is carried out through penalized maximum likelihood criteria. In this context we propose model selection criteria and prove non-asymptotic oracle inequalities for the selected estimators. The penalty, which depends only on the data, is calibrated following the slope heuristics. All results of the methodological part are illustrated by numerical simulation studies.
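The two-step procedure described in this abstract (penalized screening, then an ordinary logistic fit on the retained variables) can be sketched as follows. This is a minimal illustration on simulated data, not the thesis's implementation: the proximal-gradient solver, the absence of an intercept, the penalty level `lam=0.05`, the step size and the simulated design are all assumptions chosen for brevity, and the Group Lasso and random-forest variants are not shown.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lam=0.0, step=1.0, iters=300):
    """Minimise mean logistic loss + lam * ||beta||_1 by proximal gradient.

    With lam=0 this reduces to plain gradient descent on the unpenalised
    likelihood. No intercept is fitted (an assumption made for brevity).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += r * xi[j] / n
        for j in range(p):
            z = beta[j] - step * grad[j]
            # Soft-thresholding: the proximal operator of the L1 penalty.
            beta[j] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return beta

# Simulated data (hypothetical): only the first two of ten covariates matter.
rng = random.Random(0)
n, p = 400, 10
true_beta = [2.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(sum(b * v for b, v in zip(true_beta, xi))) else 0
     for xi in X]

# Step 1: Lasso screening keeps the covariates with non-zero coefficients.
lasso_beta = fit_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(lasso_beta) if b != 0.0]

# Step 2: unpenalised logistic fit restricted to the selected covariates.
X_sel = [[xi[j] for j in selected] for xi in X]
refit_beta = fit_logistic(X_sel, y, lam=0.0)
print("selected columns:", selected)
print("refitted coefficients:", refit_beta)
```

Refitting without the penalty removes the shrinkage that the Lasso applies to the retained coefficients. In practice the penalty level would be chosen by cross-validation or a data-driven criterion rather than fixed by hand.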
|
2 |
Dealing with measurement error in covariates with special reference to logistic regression model: a flexible parametric approach
Hossain, Shahadut 05 1900
In many fields of statistical application the fundamental task is to quantify the association between some explanatory variables or covariates and a response or outcome variable through a suitable regression model. The accuracy of such quantification depends on how precisely we measure the relevant covariates. In many instances we cannot measure some of the covariates accurately and can only measure noisy versions of them. In statistical terminology this is known as measurement error, or errors in variables. Regression analyses based on noisy covariate measurements lead to biased and inaccurate inference about the true underlying response-covariate associations.
In this thesis we investigate some aspects of measurement error modelling in the case of binary logistic regression models. We suggest a flexible parametric approach for adjusting for the measurement error bias while estimating the response-covariate relationship through a logistic regression model. We investigate the performance of the proposed flexible parametric approach in comparison with other flexible parametric and nonparametric approaches through extensive simulation studies. We also compare the proposed method with other competing methods on a real-life data set. Though emphasis is put on the logistic regression model, the proposed method is also applicable to other members of the family of generalized linear models and to other types of non-linear regression models. Finally, we develop a new computational technique to approximate the large-sample bias that may arise from exposure model misspecification when estimating the regression parameters in a measurement error scenario.
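The bias described in this abstract can be seen in a small simulation. The sketch below is only an illustration of the problem, not the flexible parametric correction the thesis proposes: it assumes a single covariate, a classical additive error model (observed value = true value + independent normal noise), a true slope of 1.0 and a hand-written Newton-Raphson fit, all of which are choices made for this example.

```python
import math
import random

def fit_logistic_1d(x, y, iters=25):
    """Maximum-likelihood fit of P(y=1) = sigmoid(a + b*x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score for the intercept
            g1 += (yi - p) * xi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

rng = random.Random(1)
n = 5000
x_true = [rng.gauss(0, 1) for _ in range(n)]
# Outcome generated from the true covariate with slope 1.0 (assumed value).
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-xi)) else 0 for xi in x_true]
# Observed covariate: true value plus independent N(0, 1) measurement error.
x_noisy = [xi + rng.gauss(0, 1) for xi in x_true]

_, b_true_x = fit_logistic_1d(x_true, y)
_, b_naive = fit_logistic_1d(x_noisy, y)
print("slope using the true covariate: ", round(b_true_x, 3))
print("slope using the noisy covariate:", round(b_naive, 3))
```

The slope estimated from the noisy covariate is pulled towards zero relative to the slope estimated from the true covariate, which is the attenuation that measurement error adjustment methods aim to undo.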
|
3 |
The factors of influencing people to adopt public animal shelter dogs
Chen, Ying-peng 27 July 2010
none
|
5 |
A combination procedure of universal kriging and logistic regression: a thesis presented to the faculty of the Graduate School, Tennessee Technological University
Wu, Songfei. January 2008
Thesis (M.S.)--Tennessee Technological University, 2008. / Title from title page screen (viewed on Aug. 26, 2009). Bibliography: leaves 24-26.
|
6 |
Dealing with measurement error in covariates with special reference to logistic regression model: a flexible parametric approach
Hossain, Shahadut 05 1900
In many fields of statistical application the fundamental task is to quantify the association between some explanatory variables or covariates and a response or outcome variable through a suitable regression model. The accuracy of such quantification depends on how precisely we measure the relevant covariates. In many instances we cannot measure some of the covariates accurately and can only measure noisy versions of them. In statistical terminology this is known as measurement error, or errors in variables. Regression analyses based on noisy covariate measurements lead to biased and inaccurate inference about the true underlying response-covariate associations.
In this thesis we investigate some aspects of measurement error modelling in the case of binary logistic regression models. We suggest a flexible parametric approach for adjusting for the measurement error bias while estimating the response-covariate relationship through a logistic regression model. We investigate the performance of the proposed flexible parametric approach in comparison with other flexible parametric and nonparametric approaches through extensive simulation studies. We also compare the proposed method with other competing methods on a real-life data set. Though emphasis is put on the logistic regression model, the proposed method is also applicable to other members of the family of generalized linear models and to other types of non-linear regression models. Finally, we develop a new computational technique to approximate the large-sample bias that may arise from exposure model misspecification when estimating the regression parameters in a measurement error scenario. / Science, Faculty of / Statistics, Department of / Graduate
|
7 |
Sample comparisons using microarrays: Application of False Discovery Rate and quadratic logistic regression
Guo, Ruijuan 08 January 2008
In microarray analysis, interest lies in features that behave differently in diseased samples than in normal samples. The usual p-value method of selecting significant genes either gives too many false positives or fails to detect all the significant features. The False Discovery Rate (FDR) method controls false positives while still selecting significant features. We introduce Benjamini's method and Storey's method for controlling the FDR and apply both to human meningioma data. We find that Benjamini's method is more conservative and that, once the number of tests exceeds a threshold, increasing the number of tests decreases the number of significant genes. In the second chapter, we investigate ways to search for interesting gene expression patterns that cannot be detected by linear methods such as the t-test or ANOVA. We propose a novel approach that uses quadratic logistic regression to detect genes in the meningioma data that have a non-linear relationship with phenotype. With quadratic logistic regression, we can find genes whose expression correlates with phenotype both linearly and quadratically. Whether these genes are clinically significant is a very interesting question, since they would most likely be neglected by the traditional linear approach.
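For readers unfamiliar with FDR control, the sketch below shows the Benjamini-Hochberg step-up procedure, assuming that this is what the abstract calls "Benjamini's method". The p-values are made up for illustration and the function name is hypothetical. Storey's method, which additionally estimates the proportion of true null hypotheses, is not shown.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the BH step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Hypothetical p-values for ten genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [i for i, pv in enumerate(pvals) if pv < 0.05]
rejected = benjamini_hochberg(pvals, q=0.05)
print("p < 0.05:        ", naive)     # [0, 1, 2, 3, 4]
print("BH at q = 0.05:  ", rejected)  # [0, 1]
```

Five genes pass the unadjusted 0.05 cut-off, but only two survive the BH thresholds of 0.005, 0.010, 0.015 and so on, which shows how the procedure trades detections for control of the expected proportion of false positives.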
|
8 |
Analys av bortfall i en uppföljningsundersökning av hälsa / Analysis of attrition in a longitudinal health study
Udd, Mattias, Pettersson, Niklas January 2008
The LSH study started in 2003 at the Department of Health and Society at the University of Linköping. Its purpose was to examine the relationship between life conditions, stress and health. A total of 1007 people from ten health centres in Östergötlands län participated. At the follow-up a couple of years later, 795 of the 1007 took part. Of the 212 lost to attrition, 127 declined the follow-up, twelve were not invited (for example because of death) and the rest did not respond at all. The purpose of this paper is to determine to what degree attrition at the follow-up can be predicted from information in the first survey, and which variables are important. Differences between types of attrition have also been examined. Simple and multiple binomial and multinomial logistic regression were used in the analysis. In total 34 variables were examined, and in the final model six remained with a significant relation to attrition. High BMI, regular smoking, high pulse and lack of daily exercise at the first survey were associated with a higher risk of not participating in the follow-up. It is interesting that these are regarded as risk factors for unhealthy living. Other factors related to higher attrition were unemployment in the year before the first survey and having parents born in a country other than Sweden. The risk of attrition increased gradually with the number of risk factors an individual showed. Individuals were more likely to decline the follow-up, rather than not respond at all, if they belonged to the older age groups in the survey or were not active in any type of association.
|
10 |
A Comparison Of Remedy Methods For Logistic Regression When Data Are Collinear
January 2016
Heng Wang
|