91 |
Statistical modelling of ECDA data for the prioritisation of defects on buried pipelines
Bin Muhd Noor, Nik Nooruhafidzi January 2017 (has links)
Buried pipelines are vulnerable to the threat of corrosion. Hence, they are normally coated with a protective coating to isolate the metal substrate from the surrounding environment, with cathodic protection (CP) current additionally applied to the pipeline surface to halt any corrosion activity that might be taking place. With time, this barrier deteriorates, which could potentially lead to corrosion of the pipe. The External Corrosion Direct Assessment (ECDA) methodology was developed with the intention of upholding the structural integrity of pipelines. Above-ground indirect inspection techniques such as the DCVG survey, an essential part of an ECDA, are commonly used to locate coating defects and measure their severity. This is followed by excavation of the identified locations for further examination of the extent of pipeline damage. Any coating or corrosion defect found at this stage is repaired and remediated. The locations of such excavations are determined by the measurements obtained from the DCVG examination, in the form of %IR, together with subjective inputs from experts, who base their justification on the environment and the physical characteristics of the pipeline. Whilst this seems a straightforward process, the factors that come into play and give rise to the initial %IR are not fully understood. This lack of understanding, combined with the subjective inputs of the assessors, has led to unnecessary excavations being conducted, which has put tremendous financial strain on pipeline operators. Additionally, the threat of defects left undiscovered because of the erroneous nature of the current method has the potential to severely compromise the pipeline's safe continual operation. Accurately predicting the coating defect size (TCDA) and interpreting the indication signal (%IR) from an ECDA are therefore important for pipeline operators to promote safety while keeping operating costs at a minimum. Furthermore, with better estimates, the uncertainty in the DCVG indication is reduced and the decisions made on the locations of excavation are better informed. However, ensuring the accuracy of these estimates does not come without challenges. These challenges include (1) the need for proper methods for analysing the large volumes of data from indirect assessment and (2) uncertainty about the probability distributions of the quantities involved. Standard mean regression models, e.g. OLS, have been used but fail to take the skewness of the distributions involved into account. The aim of this thesis is thus to develop statistical models that better predict TCDA and interpret the %IR from the indirect assessment of an ECDA more precisely. The pipeline data used for the analyses are based on a recent ECDA project conducted by TWI Ltd. for the Middle Eastern Oil Company (MEOC). To address the challenges highlighted above, Quantile Regression (QR) was used to comprehensively characterise the underlying distribution of the dependent variable. This is effective, for example, when determining the differing effects of contributing variables on different sizes of TCDA (different quantiles). Another useful advantage is that the technique is robust to outliers owing to its reliance on absolute errors. With traditional mean regression, the effect of contributing variables on other quantiles of the dependent variable is ignored; furthermore, OLS involves squaring the errors, which makes it less robust to outliers.
Other forms of QR, such as Bayesian Quantile Regression (BQR), which has the advantage of supplementing future inspection projects with prior data, and Logistic Quantile Regression (LQR), which ensures the prediction of the dependent variable stays within its specified bounds, were also applied to the MEOC dataset. The novelty of this research lies in the approaches (methods) taken by the author in producing the models highlighted above, summarised as follows:
* The use of non-linear Quantile Regression (QR) with interacting variables for TCDA prediction.
* The application of a regularisation procedure (LASSO) for the generalisation of the TCDA prediction model.
* The use of the Bayesian Quantile Regression (BQR) technique to estimate the %IR and TCDA.
* The use of Logistic Regression as a guideline for the probability of excavation.
* Finally, the use of Logistic Quantile Regression (LQR) to keep the predicted values within bounds for the prediction of the %IR and POPD.
Novel findings from this thesis include:
* Some degree of relationship between the DCVG technique (%IR readings) and corrosion dimensions. The negative trend found in the relationship between TCDA and POPD further supports the idea that %IR has some relation to corrosion.
* The findings from Chapters 4, 5 and 6 suggest that the corrosion activity rate is more prominent than the growth of TCDA at its median depth. It is therefore suggested that, for this set of pipelines (those belonging to MEOC), repair of coating defects should be done before the coating defect has reached its median size.
To the best of the author's knowledge, such approaches have never before been applied to ECDA data. The findings from this thesis also shed some light on the stochastic nature of the evolution of corrosion pits; this was not known before and was only made possible by the approaches highlighted above. The resulting models are also novel, since no previous model has been developed based on the said methods. The contribution to knowledge from this research is therefore a greater understanding of the relationships between the variables stated above (TCDA, %IR and POPD). With this new knowledge, one can better prioritise the locations of excavation and better interpret DCVG indications. With ECDA data available, it is also possible to predict the magnitude of corrosion activity using the models developed in this thesis. Furthermore, the knowledge gained here has the potential to translate into cost-saving measures for pipeline operators while ensuring safety is properly addressed.
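A rough illustration of the quantile regression idea described above is sketched below: several conditional quantiles of a simulated defect-size variable are regressed on a simulated %IR reading using statsmodels. The variable names (ir, tcda), the data and the quantile levels are illustrative assumptions, not values from the MEOC project.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
ir = rng.uniform(1, 60, size=n)                    # hypothetical %IR indications
tcda = 5 + 0.4 * ir + rng.gamma(2.0, 2.0, size=n)  # skewed, simulated defect size

X = sm.add_constant(ir)                            # intercept + %IR column
for q in (0.25, 0.50, 0.75):                       # lower, median and upper quantiles
    fit = sm.QuantReg(tcda, X).fit(q=q)
    print(q, fit.params)                           # quantile-specific intercept and slope

Unlike a single mean fit, each quantile gets its own intercept and slope, which is the property exploited above to compare the effect of covariates on small versus large defects.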
|
92 |
A unified view of high-dimensional bridge regression
Weng, Haolei January 2017 (has links)
In many application areas ranging from bioinformatics to imaging, we are interested in recovering a sparse coefficient vector in the high-dimensional linear model, when the sample size n is comparable to or less than the dimension p. One of the most popular classes of estimators is the Lq-regularized least squares (LQLS), a.k.a. bridge regression. There have been extensive studies towards understanding the performance of best subset selection (q=0), LASSO (q=1) and ridge (q=2), three widely known estimators from the LQLS family. This thesis aims at giving a unified view of LQLS for all non-negative values of q. In contrast to most existing works, which obtain order-wise error bounds with loose constants, we derive asymptotically exact error formulas characterized through a series of fixed point equations. A delicate analysis of the fixed point equations enables us to gain fruitful insights into the statistical properties of LQLS across the entire spectrum of Lq-regularization. Our work not only validates the scope of folklore understanding of Lq-minimization, but also provides new insights into high-dimensional statistics as a whole. We will elaborate on our theoretical findings mainly from a parameter estimation point of view. At the end of the thesis, we briefly mention bridge regression for variable selection and prediction.
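For concreteness, a minimal sketch of the LQLS (bridge) objective, argmin_b ||y - Xb||_2^2 + lam * sum_j |b_j|^q, is given below. The data, lam and q are assumptions, and a generic derivative-free optimiser is used only because it handles any q >= 0 on a toy problem; it is not how these estimators are computed in practice, especially for q = 0.

import numpy as np
from scipy.optimize import minimize

def lqls(X, y, lam, q):
    # Bridge estimate: argmin_b ||y - X b||^2 + lam * sum_j |b_j|^q (toy solver).
    def objective(b):
        resid = y - X @ b
        return resid @ resid + lam * np.sum(np.abs(b) ** q)
    b0 = np.linalg.lstsq(X, y, rcond=None)[0]       # least-squares starting point
    return minimize(objective, b0, method="Nelder-Mead").x

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])         # sparse true coefficient
y = X @ beta + 0.5 * rng.normal(size=50)
print(lqls(X, y, lam=2.0, q=1.0))                   # LASSO-type (q = 1) fit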
We start by considering the parameter estimation problem and evaluate the performance of LQLS by characterizing its asymptotic mean square error (AMSE). The expression we derive for AMSE does not have an explicit form and hence is not directly useful for comparing LQLS across different values of q, or for evaluating the effect of the relative sample size n/p or the sparsity level of the coefficient. To simplify the expression, we first perform the phase transition (PT) analysis of LQLS, a widely accepted analysis framework. Our results reveal some of the limitations and misleading features of the PT framework. To overcome these limitations, we propose the small-error analysis of LQLS. Our new analysis framework not only sheds light on the results of the phase transition analysis, but also describes when phase transition analysis is reliable, and presents a more accurate comparison among different Lq-regularizations.
We then extend our low noise sensitivity analysis to linear models without sparsity structure. Our analysis, as a generalization of phase transition analysis, reveals a clear picture of bridge regression for estimating generic coefficients. Moreover, by a simple transformation we connect our low-noise sensitivity framework to the classical asymptotic regime in which n/p goes to infinity, and give some insightful implications beyond what classical asymptotic analysis of bridge regression can offer.
Furthermore, following the same idea of the new analysis framework, we are able to obtain an explicit characterization of AMSE in the form of second-order expansions under the large noise regime. The expansions provide us some intriguing messages. For example, ridge will outperform LASSO in terms of estimating sparse coefficients when the measurement noise is large.
Finally, we present a short analysis of LQLS, for the purpose of variable selection and prediction. We propose a two-stage variable selection technique based on the LQLS estimators, and describe its superiority and close connection to parameter estimation. For prediction, we illustrate the intricate relation between the tuning parameter selection for optimal in-sample prediction and optimal parameter estimation.
|
93 |
On single-index model and its related topics
Chang, Ziqing 01 January 2009 (has links)
No description available.
|
94 |
Fixed and random effects selection in nonparametric additive mixed models.
January 2010 (has links)
Lai, Chu Shing.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2010.
Includes bibliographical references (leaves 44-46).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 2 --- B-Spline Modeling of Nonparametric Fixed Effects --- p.3
Chapter 3 --- Parameter Estimation --- p.5
Chapter 3.1 --- Fixed Component Estimation using Adaptive Group Lasso --- p.5
Chapter 3.2 --- Random Component Estimation using Newton Raphson --- p.7
Chapter 3.3 --- Combining the Two Algorithms --- p.9
Chapter 4 --- Selection of Model Complexity --- p.10
Chapter 4.1 --- Model Selection Criterion --- p.10
Chapter 4.2 --- Calculating the Degrees of Freedom --- p.10
Chapter 4.3 --- Practical Minimization of (4.1) --- p.12
Chapter 5 --- Theoretical results
Chapter 5.1 --- Consistency of adaptive group lasso --- p.14
Chapter 5.2 --- Consistency of Bayesian Information Criterion --- p.16
Chapter 6 --- Simulations
Chapter 7 --- Real applications
Chapter 7.1 --- Prostate cancer data --- p.23
Chapter 7.2 --- Housing data --- p.25
Chapter 7.3 --- Depression Dataset --- p.27
Chapter 8 --- Summary --- p.31
Chapter A --- Derivation of (3.7) and (3.8) --- p.32
Chapter B --- Lemmas --- p.34
Chapter C --- Proofs of theorems --- p.37
|
95 |
Réduction de dimension en régression logistique, application aux données actu-palu / Dimension reduction in logistic regression, application to actu-palu data
Kwémou Djoukoué, Marius 29 September 2014
This thesis is devoted to variable selection and model selection in logistic regression, and comprises an applied part and a methodological part. The applied part concerns the analysis of data from a large socio-epidemiological survey called actu-palu. Such large socio-epidemiological surveys typically involve a considerable number of explanatory variables, so the setting is high-dimensional by nature. Because of the curse of dimensionality, the logistic regression model cannot be applied directly. We proceed in two steps: a first step reduces the number of variables using the Lasso, Group Lasso and random forest methods, and a second step applies the logistic model to the subset of variables selected in the first step. These methods made it possible to select the variables relevant to identifying households in Dakar at risk of a febrile episode in a child aged 2 to 10. The methodological part, composed of two sub-parts, establishes technical properties of estimators in the nonparametric logistic regression model. These estimators are obtained by penalised maximum likelihood, in one case with a Lasso- or Group Lasso-type penalty and in the other with an ℓ0-type penalty. First, we propose weighted versions of the Lasso and Group Lasso estimators for the nonparametric logistic model and establish non-asymptotic oracle inequalities for these estimators. A second set of results extends the model selection principle introduced by Birgé and Massart (2001) to logistic regression; this selection is carried out via penalised maximum likelihood criteria. In this context we propose model selection criteria that are completely data-driven, with the penalty calibrated according to the slope heuristics, and we establish non-asymptotic oracle inequalities for the selected estimators. All the results of the methodological part are illustrated by numerical simulation studies.
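A minimal sketch of the two-step strategy described above (screen covariates with an L1-penalised logistic model or a random forest, then refit a plain logistic model on the retained covariates) is shown below. The simulated data, tuning constants and scikit-learn tooling are illustrative assumptions, not the pipeline actually used for the actu-palu data.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 200))                     # n = 500 households, p = 200 covariates
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=500) > 0).astype(int)

# Step 1: screen variables with an L1-penalised (Lasso-type) logistic model.
screen = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
X_reduced = screen.fit_transform(X, y)

# Alternatively, rank covariates by random-forest importance and keep the top ones.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top_rf = np.argsort(rf.feature_importances_)[::-1][:10]

# Step 2: fit an ordinary logistic model on the retained covariates only.
final_model = LogisticRegression(max_iter=1000).fit(X_reduced, y)
print(X_reduced.shape[1], final_model.coef_.round(2))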
|
96 |
Multiple maxima of likelihood functions and their implications for inference in the general linear regression model
Yeasmin, Mahbuba, 1965- January 2003 (has links)
Abstract not available
|
97 |
Independent Associations between Psychosocial Constructs and C-Reactive Protein among Healthy Women
Farrell, Kristen Anne 01 January 2007
C-reactive protein (CRP) is associated with an increased risk of cardiovascular disease (CVD), peripheral vascular disease, diabetes, and stroke. In addition to traditional risk factors for CVD, some studies have shown that depression and anger independently predict CRP, but other studies have found null results, and few, if any, studies have considered possible roles of physical activity and diet. The purpose of this study was to investigate the ability of certain psychosocial variables to predict CRP while controlling for traditional CVD risk factors. Cross-sectional data for 300 healthy women who participated in the Stockholm Female Coronary Risk Study were analyzed. Regression analyses were performed to determine whether anger, depression, social support, marital stress, and self-esteem were associated with CRP levels while controlling for relevant covariates. Analyses investigated possible mediating effects of certain aspects of diet and physical activity and whether body composition (measured by waist circumference) and fasting glucose moderate the relationship between psychosocial variables and CRP. We found that anger symptoms were negatively associated with CRP and anger discussion was positively associated with CRP, controlling for several biological variables. Diet and physical activity did not explain the relationship between these anger variables and CRP. Social attachment and social integration, two forms of social support, were positively associated with CRP among women with a larger waist circumference and higher fasting glucose, respectively. Marital stress was positively related to CRP among women with a larger waist circumference. Among women with a smaller waist circumference, marital stress was negatively related to CRP and social integration was positively related to CRP. These findings suggest that having a large waist in addition to less social support and more marital stress is disadvantageous with regard to CRP. Furthermore, it is possible that being quite thin may not necessarily be advantageous with regard to inflammation.
|
98 |
Linear mixed effects models in functional data analysis
Wang, Wei 05 1900 (has links)
Regression models with a scalar response and a functional predictor have been extensively studied. One approach is to approximate the functional predictor using basis function or eigenfunction expansions. In the expansion, the coefficient vector can either be fixed or random. The random coefficient vector is also known as random effects, and thus the regression models are in a mixed effects framework.

The random effects provide a model for the within-individual covariance of the observations. But they also introduce an additional parameter into the model, the covariance matrix of the random effects. This additional parameter complicates the covariance matrix of the observations. Possibly, the covariance parameters of the model are not identifiable.

We study identifiability in normal linear mixed effects models. We derive necessary and sufficient conditions of identifiability, in particular, conditions of identifiability for the regression models with a scalar response and a functional predictor using random effects.

We study the regression model using the eigenfunction expansion approach with random effects. We assume the random effects have a general covariance matrix and that the observed values of the predictor are contaminated with measurement error. We propose methods of inference for the regression model's functional coefficient.

As an application of the model, we analyze a biological data set to investigate the dependence of a mouse's wheel running distance on its body mass trajectory.
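A minimal sketch of the eigenfunction-expansion idea is given below: each observed curve is reduced to a few principal-component scores, which then serve as the expansion coefficients in a linear model for the scalar response. The simulated curves and the truncation level K are assumptions for illustration; the thesis treats such coefficients as random effects observed with measurement error, which this sketch does not attempt.

import numpy as np

rng = np.random.default_rng(3)
n, T = 60, 100                                     # n individuals, T grid points per curve
t = np.linspace(0, 1, T)
curves = (rng.normal(size=(n, 1)) * np.sin(2 * np.pi * t)
          + rng.normal(size=(n, 1)) * np.cos(2 * np.pi * t)
          + 0.1 * rng.normal(size=(n, T)))         # noisy functional predictor

# Eigenfunction (functional principal component) expansion via SVD of the centred curves.
centred = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
K = 2                                              # truncation level (assumed)
scores = U[:, :K] * s[:K]                          # per-individual expansion coefficients

# Scalar response regressed on the expansion coefficients.
y = 1.5 * scores[:, 0] - 0.7 * scores[:, 1] + rng.normal(scale=0.2, size=n)
design = np.column_stack([np.ones(n), scores])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)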
|
99 |
Dealing with measurement error in covariates with special reference to logistic regression model: a flexible parametric approach
Hossain, Shahadut 05 1900 (has links)
In many fields of statistical application the fundamental task is to quantify the association between some explanatory variables or covariates and a response or outcome variable through a suitable regression model. The accuracy of such quantification depends on how precisely we measure the relevant covariates. In many instances, we cannot measure some of the covariates accurately; rather, we can measure noisy versions of them. In statistical terminology this is known as measurement error, or errors in variables. Regression analyses based on noisy covariate measurements lead to biased and inaccurate inference about the true underlying response-covariate associations.
In this thesis we investigate some aspects of measurement error modelling in the case of binary logistic regression models. We suggest a flexible parametric approach for adjusting the measurement error bias while estimating the response-covariate relationship through the logistic regression model. We investigate the performance of the proposed flexible parametric approach in comparison with other flexible parametric and nonparametric approaches through extensive simulation studies. We also compare the proposed method with the other competitive methods on a real-life data set. Though emphasis is put on the logistic regression model, the proposed method is applicable to the other members of the generalized linear model family, and to other types of non-linear regression models too. Finally, we develop a new computational technique to approximate the large sample bias that may arise due to exposure model misspecification in the estimation of the regression parameters in a measurement error scenario.
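As a small illustration of why covariate measurement error matters (not of the flexible parametric correction itself), the simulation below fits a logistic regression once with the true covariate and once with a noisy surrogate; the attenuation of the estimated slope is the kind of bias the thesis aims to adjust for. All numbers are assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, beta = 5000, 1.0
x_true = rng.normal(size=n)
x_noisy = x_true + rng.normal(scale=1.0, size=n)    # noisy surrogate W = X + U
p = 1 / (1 + np.exp(-(0.2 + beta * x_true)))
y = rng.binomial(1, p)

fit_true = sm.Logit(y, sm.add_constant(x_true)).fit(disp=0)
fit_noisy = sm.Logit(y, sm.add_constant(x_noisy)).fit(disp=0)
print(fit_true.params[1], fit_noisy.params[1])      # the noisy fit is biased towards zero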
|
100 |
Methods for longitudinal data measured at distinct time points
Xiong, Xiaoqin January 2010 (has links)
For longitudinal data where the response and time-dependent predictors within each individual are measured at distinct time points, traditional longitudinal models such as generalized linear mixed effects models or marginal models cannot be directly applied. Instead, some preprocessing such as smoothing is required to temporally align the response and predictors.

In Chapter 2, we propose a binning method, which results in equally spaced bins of time for both the response and predictor(s). Hence, after incorporating binning, traditional models can be applied. The proposed binning approach was applied to a longitudinal hemodialysis study to look for possible contemporaneous and lagged effects between occurrences of a health event (i.e., infection) and levels of a protein marker of inflammation (i.e., C-reactive protein). Both Poisson mixed effects models and zero-inflated Poisson (ZIP) mixed effects models were applied to the subsequent binned data, and some important biological findings about contemporaneous and lagged associations were uncovered. In addition, a simulation study was conducted to investigate various properties of the binning approach.

In Chapter 3, asymptotic properties have been derived for the fixed effects association parameter estimates following binning, under different data scenarios. In addition, we propose some leave-one-subject-out cross-validation algorithms for bin size selection.

In Chapter 4, in order to identify levels of a predictor that might be indicative of recently occurred event(s), we propose a generalized mixed effects regression tree (GMRTree) based method which estimates the tree by a standard tree method such as CART and estimates the random effects by a generalized linear mixed effects model. One of the main steps in this method is to use a linearization technique to change the longitudinal count response into a continuous surrogate response. Simulations have shown that the GMRTree method can effectively detect the underlying tree structure in an applicable longitudinal dataset, and has better predictive performance than either a standard tree approach without random effects or a generalized linear mixed effects model, assuming the underlying model indeed has a tree structure. We have also applied this method to two longitudinal datasets, one from the aforementioned hemodialysis study and the other from an epilepsy study.
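A minimal sketch of the binning idea from Chapter 2 is given below: event occurrences and marker measurements recorded at distinct times are aggregated onto a common grid of equally spaced time bins, after which standard longitudinal count models can be applied to the aligned data. The simulated data, the 14-day bin width and the pandas tooling are illustrative assumptions.

import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
events = pd.DataFrame({                             # infection events at irregular times
    "id": np.repeat([1, 2], 20),
    "day": rng.integers(0, 180, size=40),
})
crp = pd.DataFrame({                                # inflammation marker measured at other times
    "id": np.repeat([1, 2], 12),
    "day": rng.integers(0, 180, size=24),
    "crp": rng.gamma(shape=2.0, scale=5.0, size=24),
})

bins = np.arange(0, 183, 14)                        # equally spaced 14-day bins over follow-up
events["bin"] = pd.cut(events["day"], bins, right=False)
crp["bin"] = pd.cut(crp["day"], bins, right=False)

# Count events and average the marker within each (subject, bin) cell, then merge so the
# response and predictor sit on the same time grid.
binned = (events.groupby(["id", "bin"], observed=True).size().rename("n_events").to_frame()
          .join(crp.groupby(["id", "bin"], observed=True)["crp"].mean(), how="outer")
          .reset_index())
print(binned.head())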
|