Influential data cases when the C-p criterion is used for variable selection in multiple linear regression
Uys, Daniel Wilhelm, January 2003
Dissertation (PhD)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973)
is used for variable selection in multiple linear regression. The influence is investigated in
terms of the predictive power and the predictor variables included in the resulting model when
variable selection is applied. In particular, we focus on the importance of identifying and
dealing with these so-called selection influential data cases before model selection and fitting
are performed. For this purpose we develop two new selection influence measures, both based
on the Cp criterion. The first measure is specifically developed to identify individual selection
influential data cases, whereas the second identifies subsets of selection influential data cases.
The success with which these influence measures identify selection influential data cases is
evaluated in example data sets and in simulation. All results are derived in the coordinate-free
context, with special application in multiple linear regression. / AFRIKAANS ABSTRACT (English translation): Influential data cases when the C-p criterion is used for variable selection in multiple linear regression: In this dissertation we investigate the influence of data cases when the Cp criterion of Mallows
(1973) is used for variable selection in multiple linear regression. The
influence of data cases on the predictive power and on the predictor variables that are included
in the final selected model is investigated. In particular, we focus on
the importance of identifying and dealing with so-called selection influential
data cases before model selection and fitting are done. For this purpose two
new influence measures, both based on the Cp criterion, are developed. The first measure
is developed specifically to measure the influence of individual data cases, while the second
measures the influence of subsets of data cases on the selection process. The success
with which these influence measures identify selection influential data cases is
evaluated in example data sets and in simulation. All results are derived within the coordinate-free
context, with special application in multiple linear regression.
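
To make the idea of a selection influential data case concrete, the following is a minimal sketch in Python with NumPy. It is not one of the two influence measures developed in the dissertation, which the abstract does not define. It assumes the usual form of Mallows' criterion, Cp = RSS_p / s^2 - n + 2p, where RSS_p is the residual sum of squares of a candidate model with p parameters (intercept included) and s^2 is the residual variance estimate from the full model. The function names (mallows_cp, best_cp_subset, selection_changes_on_deletion) are hypothetical, and the leave-one-out comparison is only an illustrative stand-in: it flags a case when deleting it changes which predictors the minimum-Cp rule selects.

```python
import itertools

import numpy as np


def _rss(design, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def _full_model_variance(X, y):
    """Residual variance estimate s^2 from the model with all k predictors."""
    n, k = X.shape
    return _rss(np.hstack([np.ones((n, 1)), X]), y) / (n - k - 1)


def mallows_cp(X, y, subset, s2=None):
    """Cp = RSS_p / s^2 - n + 2p for an intercept plus the columns in `subset`."""
    n = X.shape[0]
    if s2 is None:
        s2 = _full_model_variance(X, y)
    design = np.hstack([np.ones((n, 1)), X[:, list(subset)]])
    p = design.shape[1]  # number of fitted parameters, intercept included
    return _rss(design, y) / s2 - n + 2 * p


def best_cp_subset(X, y):
    """Non-empty predictor subset (tuple of column indices) with minimum Cp."""
    k = X.shape[1]
    s2 = _full_model_variance(X, y)
    candidates = (
        subset
        for size in range(1, k + 1)
        for subset in itertools.combinations(range(k), size)
    )
    return min(candidates, key=lambda s: mallows_cp(X, y, s, s2))


def selection_changes_on_deletion(X, y):
    """Indices of cases whose deletion changes the Cp-selected subset.

    Illustrative assumption only: this is a plain leave-one-out check,
    not the dissertation's influence measure.
    """
    reference = best_cp_subset(X, y)
    changed = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        if best_cp_subset(X[keep], y[keep]) != reference:
            changed.append(i)
    return changed
```

The sketch searches all non-empty subsets, so it is only practical for a small number of predictors, and it needs more data cases than parameters in the full model even after one case is deleted. Calling selection_changes_on_deletion(X, y) on an n-by-k predictor array X and a response vector y returns the indices of the flagged cases.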