111 |
Pénalisation et réduction de la dimension des variables auxiliaires en théorie des sondages / Penalization and data reduction of auxiliary variables in survey sampling
Shehzad, Muhammad Ahmed 12 October 2012
Survey sampling is useful for estimating population characteristics such as a total or a mean. This thesis studies techniques for taking a large number of auxiliary variables into account when estimating a population total, in particular when the auxiliary data set is ill-conditioned. The first chapter recalls definitions and properties used in the later chapters: the Horvitz-Thompson estimator, presented as an estimator that does not use auxiliary information, and calibration techniques, which modify the sampling weights so that the weighted sample totals of the auxiliary variables exactly reproduce their population totals, with the aim of improving the estimation of population totals for a fixed sample size. The second chapter, which is part of a review article accepted for publication, presents ridge regression as a possible remedy for multicollinearity among the auxiliary variables, and hence for ill-conditioning. It gives a detailed review of the model-based, design-based and model-assisted approaches to ridge estimation. Ridge estimates give improved results in terms of mean squared error (MSE) compared with ordinary least squares estimates, and in the survey sampling setting the technique can also be interpreted as penalized calibration. Simulations illustrate the improvement over the Horvitz-Thompson estimator. The third chapter presents another way of handling ill-conditioned, high-dimensional auxiliary data: dimension reduction based on principal components. Principal component regression is defined and its use in survey sampling is explored. New principal component calibration techniques are proposed for estimating a population total: calibration on the second moments of the principal components, partial principal component calibration, and calibration on estimated principal components. An application to real data from the company Médiamétrie confirms the value of these dimension reduction techniques for estimating population totals in the presence of a large number of auxiliary variables.
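As a rough illustration of why ridge regression helps with collinear auxiliary variables, the sketch below solves the ridge normal equations (X'X + λI)b = X'y for two nearly collinear predictors. It is not taken from the thesis: the function name `ridge_2d`, the toy data and the penalty value are all assumptions chosen for illustration, and the survey weights that penalized calibration would involve are left out.

```python
def ridge_2d(X, y, lam):
    """Ridge coefficients for two predictors, no intercept.

    Solves (X'X + lam * I) b = X'y with an explicit 2x2 inverse.
    lam = 0 gives ordinary least squares.
    """
    a = sum(r[0] * r[0] for r in X) + lam   # (X'X)[0][0] + lam
    b = sum(r[0] * r[1] for r in X)         # (X'X)[0][1]
    d = sum(r[1] * r[1] for r in X) + lam   # (X'X)[1][1] + lam
    g0 = sum(r[0] * t for r, t in zip(X, y))  # (X'y)[0]
    g1 = sum(r[1] * t for r, t in zip(X, y))  # (X'y)[1]
    det = a * d - b * b
    if det == 0:
        raise ValueError("singular system; use a larger lam")
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


# Hypothetical toy data: the second predictor is almost a copy of the first.
X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
y = [2.1, 3.9, 6.2, 7.8]

ols = ridge_2d(X, y, lam=0.0)    # unstable: large coefficients of opposite sign
ridge = ridge_2d(X, y, lam=1.0)  # shrunk towards zero and more stable
```

With predictors this close to collinear, the least squares solution splits the effect into large offsetting coefficients, while a modest penalty shares it between the two predictors.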
|
112 |
Medium term load forecasting in South Africa using Generalized Additive models with tensor product interactions
Ravele, Thakhani 21 September 2018
MSc (Statistics) / Department of Statistics / Forecasting of electricity peak demand levels is important for decision makers
in Eskom. The overall objective of this study was to develop medium
term load forecasting models to help decision makers in Eskom
plan the operations of the utility company. A frequency table of
hourly demand was compiled, and it shows that most daily peak
loads occurred at 19:00 and 20:00 over the period 2009 to 2013. The
study used generalised additive models with and without tensor product interactions
to forecast electricity demand at 19:00 and 20:00 as well as daily
peak electricity demand. The least absolute shrinkage and selection operator
(Lasso) and the Lasso via hierarchical interactions were used for variable selection,
which improves model interpretability by eliminating irrelevant variables
that are not associated with the response variable and also reduces overfitting.
The parameters of the developed models were estimated using
restricted maximum likelihood and penalized regression. The best models
were selected based on the smallest values of the Akaike information criterion
(AIC), Bayesian information criterion (BIC) and generalized cross-validation
(GCV) score, along with the highest adjusted R². Forecasts from the best models
with and without tensor product interactions were evaluated using mean absolute
percentage error (MAPE), mean absolute error (MAE) and root mean
square error (RMSE). Operational forecasting was proposed to forecast the
demand at hour 19:00 with unknown predictor variables. Empirical results
from this study show that modelling hours individually during the peak period
results in more accurate peak forecasts compared to forecasting daily
peak electricity demand. The performances of the proposed models for hour
19:00 were compared, and the generalized additive model with tensor product
interactions was found to be the best-fitting model. / NRF
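The three forecast accuracy measures named in the abstract (MAE, RMSE and MAPE) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not code from the study; the function names and the sample demand values are assumptions.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


# Hypothetical demand values (not from the study), in arbitrary units.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

scores = {
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
}
```

MAE and RMSE are in the units of the demand series, with RMSE weighting large errors more heavily; MAPE is scale-free, which makes it convenient for comparing forecasts across hours with different load levels.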
|