1 |
A Bayesian approach to energy monitoring optimization
Carstens, Herman. January 2017
This thesis develops methods for reducing energy Measurement and Verification (M&V) costs through
the use of Bayesian statistics. M&V quantifies the savings of energy efficiency and demand-side
projects by comparing the energy use in a given period to what that use would have been, had no
interventions taken place. The case of a large-scale lighting retrofit study, where incandescent lamps
are replaced by Compact Fluorescent Lamps (CFLs), is considered. These projects often need to be
monitored over a number of years with a predetermined level of statistical rigour, making M&V very
expensive.
M&V lighting retrofit projects have two interrelated uncertainty components that need to be addressed,
and which form the basis of this thesis. The first is the uncertainty in the annual energy use of the
average lamp, and the second the persistence of the savings over multiple years, determined by the
number of lamps that are still functioning in a given year. For longitudinal projects, the results from
these two aspects need to be obtained for multiple years.
This thesis addresses these problems by using the Bayesian statistical paradigm. Bayesian statistics is
still relatively unknown in M&V, and presents an opportunity for increasing the efficiency of statistical
analyses, especially for such projects.
After a thorough literature review, especially of measurement uncertainty in M&V, and an introduction
to Bayesian statistics for M&V, three methods are developed. These methods address the three types
of uncertainty in M&V: measurement, sampling, and modelling. The first method is a low-cost energy
meter calibration technique. The second method is a Dynamic Linear Model (DLM) with Bayesian
Forecasting for determining the size of the metering sample that needs to be taken in a given year.
The third method is a Dynamic Generalised Linear Model (DGLM) for determining the size of the
population survival survey sample.
It is often required by law that M&V energy meters be calibrated periodically by accredited laboratories.
This can be expensive and inconvenient, especially if the facility needs to be shut down for meter
installation or removal. Some jurisdictions also require meters to be calibrated in situ, that is, in
their operating environments. However, it is shown that metering uncertainty contributes relatively
little to overall M&V uncertainty in the presence of sampling, and therefore the costs of such laboratory
calibration may outweigh the benefits. The proposed technique uses another commercial-grade meter
(which also measures with error) to achieve this calibration in-situ. This is done by accounting for the
mismeasurement effect through a mathematical technique called Simulation Extrapolation (SIMEX).
The SIMEX result is refined using Bayesian statistics, and achieves acceptably low error rates and
accurate parameter estimates.
The second technique uses a DLM with Bayesian forecasting to quantify the uncertainty in metering
only a sample of the total population of lighting circuits. A Genetic Algorithm (GA) is then applied
to determine an efficient sampling plan. Bayesian statistics is especially useful in this case because
it allows the results from previous years to inform the planning of future samples. It also allows for
exact uncertainty quantification, where current confidence interval techniques do not always do so.
Results show a cost reduction of up to 66%, but this depends on the costing scheme used. The study
then explores the robustness of the efficient sampling plans to forecast error, and finds a 50% chance
of undersampling for such plans, because the standard M&V sampling formula lacks statistical
power.
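The sample-size calculation discussed above can be sketched as follows. This is a minimal illustration that assumes the "standard M&V sampling formula" is the textbook expression n0 = (z * cv / e)^2 with an optional finite population correction; the abstract does not state the formula, and the function name and default values are hypothetical. The formula only targets the desired precision on average and contains no power term, which is one plausible reading of why a plan sized this way could undersample about half the time.

```python
import math


def mv_sample_size(cv, precision, z=1.645, population=None):
    """Sample size for estimating a mean to a given relative precision.

    cv         -- assumed coefficient of variation (std / mean)
    precision  -- target relative precision, e.g. 0.10 for 10%
    z          -- standard normal quantile (1.645 for 90% two-sided confidence)
    population -- optional finite population size (hypothetical input)
    """
    n0 = (z * cv / precision) ** 2
    if population is not None:
        # Finite population correction shrinks the required sample.
        n0 = n0 * population / (n0 + population)
    return math.ceil(n0)


n_infinite = mv_sample_size(cv=0.5, precision=0.10)
n_finite = mv_sample_size(cv=0.5, precision=0.10, population=1000)
```

The Bayesian forecasting approach described in the abstract replaces the fixed cv assumption with a predictive distribution informed by earlier years; that step is not shown here.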
The third technique uses a DGLM in the same way as the DLM, except for population survival
survey samples and persistence studies, not metering samples. Convolving the binomial survey result
distributions inside a GA is problematic, and instead of Monte Carlo simulation, a relatively new
technique called Mellin Transform Moment Calculation is applied to the problem. The technique is
then expanded to model stratified sampling designs for heterogeneous populations. Results show a
cost reduction of 17-40%, although this depends on the costing scheme used.
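As a simplified illustration of how a binomial survival survey can update a belief about the proportion of lamps still functioning, the sketch below uses a static Beta-Binomial conjugate update. It is not the DGLM or the Mellin transform calculation from the thesis; the prior parameters and survey counts are hypothetical.

```python
def update_survival(alpha, beta, surviving, surveyed):
    """Beta(alpha, beta) prior + binomial survey -> Beta posterior parameters."""
    return alpha + surviving, beta + (surveyed - surviving)


def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance ** 0.5


# Hypothetical survey: 90 of 100 sampled lamps still functioning,
# starting from a flat Beta(1, 1) prior.
post_alpha, post_beta = update_survival(1, 1, surviving=90, surveyed=100)
post_mean, post_sd = beta_mean_sd(post_alpha, post_beta)
```

The posterior standard deviation gives a direct handle on how much a larger survey would tighten the estimate, which is the quantity a sampling plan trades off against cost.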
Finally, the DLM and DGLM are combined into an efficient overall M&V plan where metering and
survey costs are traded off over multiple years, while still adhering to statistical precision constraints.
This is done for simple random sampling and stratified designs. Monitoring costs are reduced by
26-40% for the costing scheme assumed.
The results demonstrate the power and flexibility of Bayesian statistics for M&V applications, both in
terms of exact uncertainty quantification, and by increasing the efficiency of the study and reducing
monitoring costs. / This thesis develops methods by which the cost of energy monitoring and verification (M&V)
can be lowered through Bayesian statistics. M&V determines the savings that can be achieved by
energy efficiency and demand-side management projects. This is done by comparing the
energy use in a given period with what it would have been had no intervention taken place.
A large-scale lighting retrofit study, in which incandescent lamps are replaced with compact
fluorescent lamps, serves as a case study. Such projects usually have to be monitored over many years
with a fixed statistical accuracy, which can make M&V expensive.
Two related uncertainty components must be addressed in M&V lighting projects, and form
the basis of this thesis. First, there is the uncertainty in the annual energy use
of the average lamp. Second, there is the persistence of the savings over multiple
years, which is determined by the number of lamps that survive to a given year. For longitudinal
projects, these two components must be determined over multiple years.
This thesis addresses the problem by means of a Bayesian paradigm. Bayesian
statistics is still relatively unknown in M&V, and offers an opportunity to increase the efficiency of
statistical analyses, especially for the above-mentioned projects.
The thesis begins with a thorough literature study, especially regarding measurement uncertainty
in M&V. An introduction to Bayesian statistics for M&V is then presented, and three methods
are developed. These methods address the three main sources of uncertainty in M&V: measurement,
sampling, and modelling. The first method is a low-cost energy meter calibration technique. The
second method is a Dynamic Linear Model (DLM) with Bayesian forecasting, with which metering
sample sizes can be determined. The third method is a Dynamic Generalised Linear Model
(DGLM), with which population survival survey sample sizes can be determined.
By law, M&V energy meters must be calibrated regularly by accredited laboratories. This can be
expensive and inconvenient, especially if the facility has to be shut down during meter removal and
installation. Some jurisdictions also require meters to be calibrated in situ, in their operating environments.
Yet it is shown that measurement uncertainty makes up a small part of the overall M&V uncertainty,
especially when sampling is done. This calls into question the cost benefit of laboratory calibration.
The proposed technique uses another commercial-grade meter
(which itself contains a non-negligible measurement error) to achieve the calibration in situ. This is done
by reducing the measurement error through Simulation Extrapolation (SIMEX). The SIMEX result
is then improved by Bayesian statistics, and achieves acceptable error ranges and accurate
parameter estimates.
The second technique uses a DLM with Bayesian forecasting to estimate the uncertainty in metering
a sample of the overall population. A Genetic Algorithm (GA) is
then applied to find efficient sample sizes. Bayesian statistics is especially useful in this
case since it can use the results of previous years to inform current estimates. It also allows
the exact estimation of uncertainty, whereas standard confidence interval techniques do
not. Results show a cost saving of up to 66%. The study then investigates the robustness of
cost-efficient sampling plans in the presence of forecast errors. It is found
that cost-efficient sample sizes are too small 50% of the time, owing to the lack of statistical
power in the standard M&V formulas.
The third technique uses a DGLM in the same way as the DLM, except that population survival survey
sample sizes are investigated. Convolving the binomial survey results inside the GA creates a
problem, and instead of a Monte Carlo simulation the relatively new Mellin Transform
Moment Calculation is applied to the problem. The technique is then extended to find stratified
sampling designs for heterogeneous populations. The results show a 17-40% cost reduction,
although this depends on the costing scheme.
Finally, the DLM and DGLM are combined to design an efficient overall M&V plan in which metering
and survey costs are traded off against each other. This is done for simple and stratified
sampling designs. Monitoring costs are reduced by 26-40%, but this depends on the assumed
costing scheme.
The results demonstrate the power and flexibility of Bayesian statistics for M&V applications, both for
exact uncertainty quantification, and by increasing the efficiency of data use and
thereby reducing monitoring costs. / Thesis (PhD)--University of Pretoria, 2017. / National Research Foundation / Department of Science and Technology / National Hub for the Postgraduate
Programme in Energy Efficiency and Demand Side Management / Electrical, Electronic and Computer Engineering / PhD / Unrestricted
|
2 |
Calibration of trip distribution by generalised linear models
Shrewsbury, John Stephen. January 2012
Generalised linear models (GLMs) provide a flexible and sound basis for calibrating gravity models for trip distribution, for a wide range of deterrence functions (from steps to splines), with K factors and geographic segmentation. The Tanner function fitted Wellington Transport Strategy Model data as well as more complex functions and was insensitive to the formulation of intrazonal and external costs. Weighting from variable expansion factors and interpretation of the deviance under sparsity are addressed.
An observed trip matrix is disaggregated and fitted at the household, person and trip levels with consistent results. Hierarchical GLMs (HGLMs) are formulated to fit mixed logit models, but were unable to reproduce the coefficients of simple nested logit models.
Geospatial analysis by HGLM showed no evidence of spatial error patterns, either as random K factors or as correlations between them. Equivalence with hierarchical mode choice, duality with trip distribution, regularisation, lorelograms, and the modifiable areal unit problem are considered.
Trip distribution is calibrated from aggregate data by the MVESTM matrix estimation package, incorporating period and direction factors in the intercepts. Counts across four screenlines showed a significance similar to a thousand-household travel survey. Calibration was possible only in conjunction with trip end data. Criteria for validation against screenline counts were met, but only if allowance was made for error in the trip end data.
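To illustrate why a gravity model with a Tanner deterrence function suits GLM calibration, the sketch below assumes the common form f(c) = c^alpha * exp(beta * c) and shows that the logarithm of the expected trips is linear in the unknown parameters. The origin and destination effects, parameter values and function names are hypothetical, and no fitting is performed here.

```python
import math


def tanner(cost, alpha, beta):
    """Tanner deterrence function, assumed form: cost**alpha * exp(beta * cost)."""
    return cost ** alpha * math.exp(beta * cost)


def log_expected_trips(cost, origin_effect, dest_effect, alpha, beta):
    """Linear predictor of a log-link GLM for trips between one zone pair.

    Every unknown (origin_effect, dest_effect, alpha, beta) enters linearly,
    with log(cost) and cost acting as ordinary covariates.
    """
    return origin_effect + dest_effect + alpha * math.log(cost) + beta * cost


# Hypothetical values for a single origin-destination pair.
eta = log_expected_trips(cost=10.0, origin_effect=1.0, dest_effect=2.0,
                         alpha=0.5, beta=-0.1)
expected_trips = math.exp(eta)
```

Because the predictor is linear, standard GLM software with a Poisson-type family and log link could in principle estimate alpha and beta together with the zone effects.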
|
3 |
Combined Actuarial Neural Networks in Actuarial Rate Making / Kombinerade aktuariska neurala nätverk i aktuarisk tariffanalys
Gustafsson, Axel; Hansén, Jacob. January 2021
Insurance is built on the principle that a group of people contributes to a common pool of money which will be used to cover the costs for individuals who suffer from the insured event. In a competitive market, an insurance company will only be profitable if its pricing reflects the covered risks as well as possible. This thesis investigates the recently proposed Combined Actuarial Neural Network (CANN), a model nesting the traditional Generalised Linear Model (GLM) used in insurance pricing into a Neural Network (NN). The main idea of utilising NNs for insurance pricing is to model interactions between features that the GLM is unable to capture. The CANN model is analysed in a commercial insurance setting with respect to two research questions. The first research question, RQ 1, asks whether the CANN model can outperform the underlying GLM with respect to error metrics and actuarial model evaluation tools. The second research question, RQ 2, seeks to identify existing interpretability methods that can be applied to the CANN model and to showcase how they can be applied. The results for RQ 1 show that CANN models are able to consistently outperform the GLM with respect to the chosen model evaluation tools. A literature search is conducted to answer RQ 2, identifying interpretability methods that either are applicable or are possibly applicable to the CANN model. One interpretability method is also proposed in this thesis specifically for the CANN model, using model-fitted averages on two-dimensional segments of the data. Three interpretability methods from the literature search and the one proposed in this thesis are demonstrated, illustrating how these may be applied. / Insurance is built on the principle that a group of people contributes to a common sum of money that is used to cover costs for individuals who suffer the insured event.
In a competitive market, insurance companies will only be profitable if their pricing is as good as possible. This thesis examines the recently proposed Combined Actuarial Neural Network (CANN) model, which builds a Generalised Linear Model (GLM) into a neural network, in a practical and commercial insurance context with respect to two research questions. The main idea of a CANN model is to capture interactions between variables, which a GLM cannot do automatically. Research Question 1 aims to examine whether a CANN model can perform better than a GLM with respect to selected statistical performance measures and model evaluation tools used by actuaries. Research Question 2 aims to identify some interpretability tools that can be applied to the CANN model and to demonstrate how they can be used. The results for Research Question 1 show that the CANN model can perform better than a GLM. A literature search is conducted to answer Research Question 2, and a number of interpretability tools are identified. One interpretability tool is also proposed in this thesis specifically for interpreting the CANN model. Three of the interpretability tools and the developed tool are demonstrated to show how they can be used to interpret the CANN model.
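The nesting of a GLM inside a neural network can be sketched as a skip connection: the GLM linear predictor is added to a network output before the log-link inverse is applied. The code below is a minimal forward pass only, assuming a log link and a single tanh hidden layer; all weights, layer sizes and function names are hypothetical, and no training is shown.

```python
import math


def glm_linear_predictor(x, intercept, coefs):
    """Linear predictor of a previously fitted GLM (coefficients held fixed)."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))


def nn_adjustment(x, hidden_weights, hidden_biases, output_weights):
    """One tanh hidden layer followed by a linear output, returning a scalar."""
    hidden = [
        math.tanh(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden))


def cann_prediction(x, intercept, coefs, hidden_weights, hidden_biases,
                    output_weights):
    """Skip connection: GLM predictor plus network adjustment, then exp()."""
    eta = glm_linear_predictor(x, intercept, coefs)
    eta += nn_adjustment(x, hidden_weights, hidden_biases, output_weights)
    return math.exp(eta)


# Hypothetical two-feature example with two hidden units.
x = [1.0, 0.5]
glm_only = math.exp(glm_linear_predictor(x, -2.0, [0.3, 0.1]))
cann_at_start = cann_prediction(
    x, -2.0, [0.3, 0.1],
    hidden_weights=[[0.4, -0.2], [0.1, 0.7]],
    hidden_biases=[0.0, 0.0],
    output_weights=[0.0, 0.0],  # zero output weights reproduce the GLM
)
```

Starting with zero output weights means the combined model begins exactly at the GLM fit, so any training of the network part can only be judged as an adjustment to that baseline.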
|
5 |
Applications of Spatio-temporal Analytical Methods in Surveillance of Ross River Virus Disease
Hu, Wenbiao. January 2005
The incidence of many arboviral diseases is largely associated with social and environmental conditions. Ross River virus (RRV) disease is the most prevalent arboviral disease in Australia. It has long been recognised that the transmission pattern of RRV is sensitive to socio-ecological factors including climate variation, population movement, mosquito density and vegetation types. This study aimed to assess the relationships between socio-environmental variability and the transmission of RRV using spatio-temporal analytic methods. Computerised data files of daily RRV disease cases and daily climatic variables in Brisbane, Queensland during 1985-2001 were obtained from the Queensland Department of Health and the Australian Bureau of Meteorology, respectively. Available information on other socio-ecological factors was also collected from relevant government agencies as follows: 1) socio-demographic data from the Australian Bureau of Statistics; 2) information on vegetation (littoral wetlands, ephemeral wetlands, open freshwater, riparian vegetation, melaleuca open forests, wet eucalypt, open forests and other bushland) from Brisbane City Council; 3) tidal activities from the Queensland Department of Transport; and 4) mosquito density from Brisbane City Council. Principal components analysis (PCA) was used as an exploratory technique for discovering spatial and temporal patterns of RRV distribution. The PCA results show that the first principal component accounted for approximately 57% of the information, which contained the four seasonal rates and loaded highest and positively for autumn. K-means cluster analysis indicates that the seasonality of RRV is characterised by three groups with high, medium and low incidence of disease, and it suggests that there are at least three different disease ecologies.
The variation in spatio-temporal patterns of RRV indicates a complex ecology that is unlikely to be explained by a single dominant transmission route across these three groupings. Therefore, there is a need to explore socio-economic and environmental determinants of RRV disease at the statistical local area (SLA) level. Spatial distribution analysis and multiple negative binomial regression models were employed to identify the socio-economic and environmental determinants of RRV disease at both the city and local (i.e., SLA) levels. The results show that RRV activity was primarily concentrated in the northeast, northwest and southeast areas in Brisbane. The negative binomial regression models reveal that RRV incidence for the whole of the Brisbane area was significantly associated with Southern Oscillation Index (SOI) at a lag of 3 months (Relative Risk (RR): 1.12; 95% confidence interval (CI): 1.06 - 1.17), the proportion of people with lower levels of education (RR: 1.02; 95% CI: 1.01 - 1.03), the proportion of labour workers (RR: 0.97; 95% CI: 0.95 - 1.00) and vegetation density (RR: 1.02; 95% CI: 1.00 - 1.04). However, RRV incidence for high risk areas (i.e., SLAs with higher incidence of RRV) was significantly associated with mosquito density (RR: 1.01; 95% CI: 1.00 - 1.01), SOI at a lag of 3 months (RR: 1.48; 95% CI: 1.23 - 1.78), human population density (RR: 3.77; 95% CI: 1.35 - 10.51), the proportion of indigenous population (RR: 0.56; 95% CI: 0.37 - 0.87) and the proportion of overseas visitors (RR: 0.57; 95% CI: 0.35 - 0.92). It is acknowledged that some of these risk factors, while statistically significant, are small in magnitude. However, given the high incidence of RRV, they may still be important in practice. The results of this study suggest that the spatial pattern of RRV disease in Brisbane is determined by a combination of ecological, socio-economic and environmental factors.
The possibility of developing an epidemic forecasting system for RRV disease was explored using the multivariate Seasonal Auto-regressive Integrated Moving Average (SARIMA) technique. The results of this study suggest that climatic variability, particularly precipitation, may have played a significant role in the transmission of RRV disease in Brisbane. This finding cannot entirely be explained by confounding factors such as other socio-ecological conditions because they are unlikely to have changed dramatically on a monthly time scale in this city over the past two decades. SARIMA models show that monthly precipitation at a lag of 2 months (coefficient = 0.004, p = 0.031) was statistically significantly associated with RRV disease. It suggests that there may be 50 more cases a year for an increase of 100 mm precipitation on average in Brisbane. The predictive values in the model were generally consistent with actual values (root-mean-square error (RMSE): 1.96). Therefore, this model may have applications as a decision support tool in disease control and risk-management planning programs in Brisbane. Polynomial distributed lag (PDL) time series regression models were fitted to examine the associations between rainfall, mosquito density and the occurrence of RRV after adjusting for season and auto-correlation. The PDL model was used because rainfall and mosquito density can affect RRV occurrence not merely in the same month but also in several subsequent months. The rationale for the use of the PDL technique is that it increases the precision of the estimates. We developed an epidemic forecasting model to predict incidence of RRV disease. The results show that 95% and 85% of the variation in the RRV disease was accounted for by the mosquito density and rainfall, respectively. The predictive values in the model were generally consistent with actual values (RMSE: 1.25). The model diagnosis reveals that the residuals were randomly distributed with no significant auto-correlation.
The results of this study suggest that PDL models may be better than SARIMA models (R-square increased and RMSE decreased). The findings of this study may facilitate the development of early warning systems for the control and prevention of this widespread disease. Further analyses were conducted using classification trees to identify major mosquito species of Ross River virus (RRV) transmission and explore the threshold of mosquito density for RRV disease in Brisbane, Australia. The results show that Ochlerotatus vigilax (RR: 1.028; 95% CI: 1.001 - 1.057) and Culex annulirostris (RR: 1.013, 95% CI: 1.003 - 1.023) were significantly associated with RRV disease cycles at a lag of 1 month. The presence of RRV was associated with average monthly mosquito density of 72 Ochlerotatus vigilax and 52 Culex annulirostris per light trap. These results may also have applications as a decision support tool in disease control and risk management planning programs. As RRV has significant impact on population health, industry, and tourism, it is important to develop an epidemic forecast system for this disease. The results of this study show the disease surveillance data can be integrated with social, biological and environmental databases. These data can provide additional input into the development of epidemic forecasting models. These attempts may have significant implications in environmental health decision-making and practices, and may help health authorities determine public health priorities more wisely and use resources more effectively and efficiently.
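The relative risks and confidence intervals reported in this abstract come from count regression models with a log link, where a relative risk is the exponential of a fitted coefficient. The sketch below shows that conversion under a normal approximation; the coefficient and standard error are hypothetical values chosen for illustration and are not taken from the thesis.

```python
import math


def relative_risk(coef, std_err, z=1.96):
    """Relative risk and approximate 95% CI from a log-link coefficient."""
    point = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    return point, lower, upper


# Hypothetical coefficient on the log scale with an assumed standard error.
rr, ci_low, ci_high = relative_risk(coef=math.log(1.12), std_err=0.025)
```

An interval whose lower bound sits at or very near 1.00, as for several factors above, corresponds to a coefficient only marginally distinguishable from zero, which is consistent with the abstract's remark that some effects are small in magnitude.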
|
6 |
Epidemic models and inference for the transmission of hospital pathogens
Forrester, Marie Leanne. January 2006
The primary objective of this dissertation is to utilise, adapt and extend current stochastic models and statistical inference techniques to describe the transmission of nosocomial pathogens, i.e. hospital-acquired pathogens, and multiply-resistant organisms within the hospital setting. The emergence of higher levels of antibiotic resistance is threatening the long term viability of current treatment options and placing greater emphasis on the use of infection control procedures. The relative importance and value of various infection control practices is often debated and there is a lack of quantitative evidence concerning their effectiveness. The methods developed in this dissertation are applied to data of methicillin-resistant Staphylococcus aureus occurrence in intensive care units to quantify the effectiveness of infection control procedures. Analysis of infectious disease or carriage data is complicated by dependencies within the data and partial observation of the transmission process. Dependencies within the data are inherent because the risk of colonisation depends on the number of other colonised individuals. The colonisation times, chain and duration are often not visible to the human eye making only partial observation of the transmission process possible. Within a hospital setting, routine surveillance monitoring permits knowledge of interval-censored colonisation times. However, consideration needs to be given to the possibility of false negative outcomes when relying on observations from routine surveillance monitoring. SI (Susceptible, Infected) models are commonly used to describe community epidemic processes and allow for any inherent dependencies. Statistical inference techniques, such as the expectation-maximisation (EM) algorithm and Markov chain Monte Carlo (MCMC) can be used to estimate the model parameters when only partial observation of the epidemic process is possible. 
These methods appear well suited for the analysis of hospital infectious disease data but need to be adapted for short patient stays through migration. This thesis focuses on the use of Bayesian statistics to explore the posterior distributions of the unknown parameters. MCMC techniques are introduced to overcome analytical intractability caused by partial observation of the epidemic process. Statistical issues such as model adequacy and MCMC convergence assessment are discussed throughout the thesis. The new methodology allows the quantification of the relative importance of different transmission routes and the benefits of hospital practices, in terms of changed transmission rates. Evidence-based decisions can therefore be made on the impact of infection control procedures, which is otherwise difficult on the basis of clinical studies alone. The methods are applied to data describing the occurrence of methicillin-resistant Staphylococcus aureus within intensive care units in hospitals in Brisbane and London.
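The dependency described in this abstract, where each patient's risk of colonisation rises with the number of colonised patients, together with patient turnover, can be illustrated with a small discrete-time SI simulation. This is a sketch under stated assumptions and not the dissertation's model: the daily time step, the acquisition probability 1 - exp(-beta * colonised / beds), and all parameter names and values are hypothetical.

```python
import math
import random


def simulate_ward(days, beds, beta, discharge_prob, import_prob, seed=0):
    """Simulate daily counts of colonised patients in a fixed-size ward.

    beta           -- transmission rate per day (hypothetical)
    discharge_prob -- daily chance a patient is discharged and the bed refilled
    import_prob    -- chance a new admission is already colonised
    """
    rng = random.Random(seed)
    colonised = [False] * beds
    history = []
    for _ in range(days):
        n_colonised = sum(colonised)
        # Risk to each susceptible depends on how many others are colonised.
        p_acquire = 1.0 - math.exp(-beta * n_colonised / beds)
        next_state = []
        for is_colonised in colonised:
            if rng.random() < discharge_prob:
                # Bed is refilled by a new admission (migration).
                next_state.append(rng.random() < import_prob)
            elif is_colonised:
                next_state.append(True)  # SI model: no clearance during stay
            else:
                next_state.append(rng.random() < p_acquire)
        colonised = next_state
        history.append(sum(colonised))
    return history


counts = simulate_ward(days=60, beds=12, beta=0.3,
                       discharge_prob=0.15, import_prob=0.05)
```

In real surveillance data only periodic swab results are seen, not the true colonisation times generated by a process like this, which is why the abstract turns to EM and MCMC methods for inference.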
|