Global ETD Search

Return to search

On semiparametric regression and data mining

Semiparametric regression is playing an increasingly large role in the analysis of datasets exhibiting various complications (Ruppert, Wand & Carroll, 2003). In particular semiparametric regression a plays prominent role in the area of data mining where such complications are numerous (Hastie, Tibshirani & Friedman, 2001). In this thesis we develop fast, interpretable methods addressing many of the difficulties associated with data mining applications including: model selection, missing value analysis, outliers and heteroscedastic noise. We focus on function estimation using penalised splines via mixed model methodology (Wahba 1990; Speed 1991; Ruppert et al. 2003). In dealing with the difficulties associated with data mining applications many of the models we consider deviate from typical normality assumptions. These models lead to likelihoods involving analytically intractable integrals. Thus, in keeping with the aim of speed, we seek analytic approximations to such integrals which are typically faster than numeric alternatives. These analytic approximations not only include popular penalised quasi-likelihood (PQL) approximations (Breslow & Clayton, 1993) but variational approximations. Originating in physics, variational approximations are a relatively new class of approximations (to statistics) which are simple, fast, flexible and effective. They have recently been applied to statistical problems in machine learning where they are rapidly gaining popularity (Jordan, Ghahramani, Jaakkola & Sau11999; Corduneanu & Bishop, 2001; Ueda & Ghahramani, 2002; Bishop & Winn, 2003; Winn & Bishop 2005). We develop variational approximations to: generalized linear mixed models (GLMMs); Bayesian GLMMs; simple missing values models; and for outlier and heteroscedastic noise models, which are, to the best of our knowledge, new. These methods are quite effective and extremely fast, with fitting taking minutes if not seconds on a typical 2008 computer. We also make a contribution to variational methods themselves. Variational approximations often underestimate the variance of posterior densities in Bayesian models (Humphreys & Titterington, 2000; Consonni & Marin, 2004; Wang & Titterington, 2005). We develop grid-based variational posterior approximations. These approximations combine a sequence of variational posterior approximations, can be extremely accurate and are reasonably fast.

http://handle.unsw.edu.au/1959.4/40913

Data mining

Regression analysis

Identifer	oai:union.ndltd.org:ADTP/279658
Date	January 2008
Creators	Ormerod, John T, Mathematics & Statistics, Faculty of Science, UNSW
Publisher	Awarded by:University of New South Wales. Mathematics & Statistics
Source Sets	Australiasian Digital Theses Program
Language	English
Detected Language	English
Rights	Copyright Ormerod John T., http://unsworks.unsw.edu.au/copyright

Page generated in 0.0016 seconds

On semiparametric regression and data mining

Description

Links & Downloads

Tags

Additional Fields