441 |
Binary classification trees: a comparison with popular classification methods in statistics using different software
Lamont, Morné Michael Connell 12 1900
Thesis (MComm) -- Stellenbosch University, 2002. / ENGLISH ABSTRACT: Consider a data set with a categorical response variable and a set of explanatory
variables. The response variable can have two or more categories and the explanatory
variables can be numerical or categorical. This is a typical setup for a classification
analysis, where we want to model the response based on the explanatory variables.
Traditional statistical methods have been developed under certain assumptions
such as: the explanatory variables are numeric only and/or the data follow a multivariate
normal distribution. In practice such assumptions are not always met. Different research
fields generate data that have a mixed structure (categorical and numeric) and researchers
are often interested in using all these data in the analysis. In recent years robust methods
such as classification trees have become the substitute for traditional statistical methods
when the above assumptions are violated. Classification trees are not only an effective
classification method, but offer many other advantages.
The aim of this thesis is to highlight the advantages of classification trees. In the
chapters that follow, the theory of and further developments on classification trees are
discussed. This forms the foundation for the CART software which is discussed in
Chapter 5, as well as other software in which classification tree modeling is possible. We
will compare classification trees to parametric-, kernel- and k-nearest-neighbour
discriminant analyses. A neural network is also compared to classification trees and
finally we draw some conclusions on classification trees and how they compare with other
methods. / AFRIKAANSE OPSOMMING: Beskou 'n datastel met 'n kategoriese respons veranderlike en 'n stel verklarende
veranderlikes. Die respons veranderlike kan twee of meer kategorieë hê en die
verklarende veranderlikes kan numeries of kategories wees. Hierdie is 'n tipiese opset vir
'n klassifikasie analise, waar ons die respons wil modelleer deur gebruik te maak van die
verklarende veranderlikes.
Tradisionele statistiese metodes is ontwikkel onder sekere aannames soos: die
verklarende veranderlikes is slegs numeries en/of dat die data 'n meerveranderlike
normaal verdeling het. In die praktyk word daar nie altyd voldoen aan hierdie aannames
nie. Verskillende navorsingsvelde genereer data wat 'n gemengde struktuur het
(kategories en numeries) en navorsers wil soms al hierdie data gebruik in die analise. In
die afgelope jare het robuuste metodes soos klassifikasie bome die alternatief geword vir
tradisionele statistiese metodes as daar nie aan bogenoemde aannames voldoen word nie.
Klassifikasie bome is nie net 'n effektiewe klassifikasie metode nie, maar bied baie meer
voordele.
Die doel van hierdie werkstuk is om die voordele van klassifikasie bome uit te
wys. In die hoofstukke wat volg word die teorie en verdere ontwikkelinge van
klassifikasie bome bespreek. Hierdie vorm die fondament vir die CART sagteware wat
bespreek word in Hoofstuk 5, asook ander sagteware waarin klassifikasie boom
modelering moontlik is. Ons sal klassifikasie bome vergelyk met parametriese-, "kernel"-
en "k-nearest-neighbour" diskriminant analise. 'n Neurale netwerk word ook vergelyk
met klassifikasie bome en ten slote word daar gevolgtrekkings gemaak oor klassifikasie
bome en hoe dit vergelyk met ander metodes.
|
442 |
Managing the forecasting function within the fast moving consumer goods industry
Burger, S. (Stephan) 12 1900
Thesis (MBA)--Stellenbosch University, 2003. / ENGLISH ABSTRACT: Forecasting the future has always been one of man's strongest desires. The aim
to determine the future has resulted in scientifically based forecasting models of
human health, behaviour, economics, weather, etc. The main purpose of forecasting
is to reduce the range of uncertainty within which management decisions must be
made. Forecasts are only effective if they are utilized by those who have decision-making
authority. Forecasts need to be understood and appreciated by decision
makers so that they find their way into management of the firm.
Companies still predominantly rely on judgemental forecasting methods, most often
on an informal basis. There is a large literature base that points to the numerous biases
inherent in judgemental forecasting. Most companies know that their forecasts are
incorrect but don't know what to do about it and choose to ignore the issue, hoping
that the problem will solve itself.
The collaborative forecasting process attempts to use history as a baseline, but
supplements it with current knowledge about specific trends, events and other items. This
approach integrates the knowledge and information that exists internally and
externally into a single, more accurate forecast that supports the entire supply chain.
Demand forecasting is not just a matter of duplicating or predicting history into the
future. It is important that one person should lead and manage the process.
Accountability needs to be established.
An audit on the writer's own organization indicated that no formal forecasting process
was present. The company's forecasting process was very political, since values were
entered just to add up to the required targets. The real gap was never fully
understood. Little knowledge existed regarding statistical analysis and forecasting
within the marketing department, which is accountable for the forecast. The forecasting
method was therefore a top-down approach that was never really checked against a bottom-up
approach.
It was decided to learn more about the new demand planning process prescribed by
the head office, and to start implementing the approach. The approach is a form of
collaborative forecasting which aims to involve all stakeholders when generating the
forecast, therefore applying a bottom-up approach.
Statistical forecasting was applied to see how accurate the output was versus that of
the old way of forecasting. The statistical forecast approach performed better with
product groups where little had changed from previous years, while the old way
performed better where new activities were planned or known by the marketing team.
This indicates that statistical forecasting is very important for creating the starting
point or baseline forecast, but requires qualitative input from all stakeholders.
Statistical forecasting is therefore not the solution to improved forecasting, but rather
part of the solution to create robust forecasts. / AFRIKAANSE OPSOMMING: Vooruitskatting van die toekoms was nog altyd een van die mens se grootste
begeertes. Die doel om die toekoms te bepaal het gelei tot wiskundige gebaseerde
modelle van die mens se gesondheid, gedrag, ekonomie, weer, ens. Die hoofdoel van
vooruitskatting is om die reeks van risikos te verminder waarbinne bestuur besluite
moet neem. Vooruitskattings is slegs effektief as dit gebruik word deur hulle wat
besluitnemingsmag het. Vooruitskattings moet verstaan en gewaardeer word deur die
besluitnemers sodat dit die weg kan vind na die bestuur van die firma.
Maatskappye vertrou nog steeds hoofsaaklik op eie oordeel vooruitskatting metodes,
en meestal op 'n informele basis. Daar is 'n uitgebreide literatuurbasis wat daarop dui
dat heelwat sydigheid betrokke is by vooruitskattings wat gebaseer is op eie oordeel.
Baie organisasies weet dat hulle vooruitskattings verkeerd is, maar weet nie wat
daaromtrent te doen nie en kies om die probleem te ignoreer, met die hoop dat die
probleem vanself sal oplos.
Die geïntegreerde vooruitskattingsproses probeer om die verlede te gebruik as 'n
basis, maar voeg huidige kennis rakende spesifieke neigings, gebeurtenisse, en ander
items saam. Hierdie benadering integreer die kennis en informasie wat intern en
ekstern bestaan in 'n enkele, meer akkurate vooruitskatting wat die hele
verskaffingsketting ondersteun. Vraagvooruitskatting is nie alleen 'n duplisering of
vooruitskatting van die verlede in die toekoms in nie. Dit is belangrik dat een persoon
die proses moet lei en bestuur. Verantwoordelikhede moet vasgestel word.
'n Oudit op die skrywer se organisasie het getoon dat geen formele
vooruitskattingsprosesse bestaan het nie. Die maatskappy se vooruitskattingsproses
was hoogs gepolitiseerd, want getalle was vasgestel wat in lyn was met die nodige
teikens. Die ware gaping was nooit werklik begryp nie. Min kennis was aanwesig
rakende statistiese analises en vooruitskatting binne die bemarkingsdepartement wat
verantwoordelik is vir die vooruitskatting. Die vooruitskatting is dus eerder gedoen
op 'n globale vlak en nie noodwendig getoets deur die vooruitskatting op te bou uit
detail nie. Daar is besluit om meer te leer rakende die nuwe vraagbeplanningsproses, wat
voorgeskryf is deur hoofkantoor, en om die metode te begin implementeer. Die
metode is 'n vorm van 'n geïntegreerde model wat beoog om alle aandeelhouers te
betrek wanneer die vooruitskatting gedoen word, dus die vooruitskatting opbou met
detail.
Statistiese vooruitskatting was toegepas om te sien hoe akkuraat die uitset was teenoor
die ou manier van vooruitskatting. Die statistiese proses het beter gevaar waar die
produkgroepe min verandering van vorige jare ervaar het, terwyl die ou manier beter
gevaar het waar bemarking self die nuwe aktiwiteite beplan het of bewus was
daarvan. Dit bewys dat statistiese vooruitskatting baie belangrik is om die basis
vooruitskatting te skep, maar dit benodig kwalitatiewe insette van alle aandeelhouers.
Statistiese vooruitskattings is dus nie die oplossing vir beter vooruitskattings nie, maar
deel van die oplossing om kragtige vooruitskattings te skep.
|
443 |
RELIABILITY GROWTH MODELS FOR ATTRIBUTES (BAYES, SMITH).
SANATGAR FARD, NASSER. January 1982
In this dissertation the estimation of reliability for a developmental process generating attribute type data is examined. It is assumed that the process consists of m stages, and the probability of failure is constant or decreasing from stage to stage. Several models for estimating the reliability at each stage of the developmental process are examined. In the classical area, Barlow and Scheuer's model, Lloyd and Lipow's model and a cumulative maximum likelihood estimation (MLE) model are investigated. In the Bayesian area, A.F.M. Smith's model, an empirical Bayes model and a cumulative beta Bayes model are investigated. These models are analyzed both theoretically and by computer simulation. The strengths and weaknesses of each are pointed out, and modifications are made in an attempt to improve their accuracy. The constrained maximum likelihood estimation model of Barlow and Scheuer is shown to be inaccurate when no failures occur at the final stage. Smith's model is shown to be incorrect and a corrected algorithm is presented. The simulation results of these models with the same data indicate that, with the exception of Barlow and Scheuer's model, they are all conservative estimators. When reliability estimation with growth is considered, it is reasonable to emphasize data obtained at recent stages and de-emphasize data from the earlier stages. A methodology is developed using geometric weights to improve the estimates. This modification is applied to the cumulative MLE model, Lloyd and Lipow's model, Barlow and Scheuer's model and the cumulative beta Bayes model. The simulation results of these modified models show that considerable improvement is obtained in the cumulative MLE model and the cumulative beta Bayes model. For Bayesian models, in the absence of prior knowledge, the uniform prior is usually used. A prior with maximum variance is examined theoretically and through simulation experiments for use with the cumulative beta Bayes model.
These results show that the maximum variance prior results in faster convergence of the posterior distribution than the uniform prior. The revised Smith's model is shown to provide good estimates of the unknown parameter during the developmental process, particularly for the later stages. The beta Bayes model with maximum variance prior and geometric weights also provides good estimates.
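The geometric weighting idea described above can be illustrated with a small Python sketch. It is not the dissertation's algorithm: the function names, the weight value, the stage data and the exact form of the update (discounting the pass/fail counts of stage i by weight^(m - i) before adding them to a beta prior) are assumptions chosen for illustration only. A uniform Beta(1, 1) prior is used as the default.

```python
def weighted_beta_posterior(trials, failures, weight=0.8, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of a beta posterior for reliability after m stages.

    Assumed scheme (hypothetical, for illustration): counts from stage i are
    multiplied by weight ** (m - 1 - i), so the most recent stage has weight 1
    and earlier stages are geometrically de-emphasized.
    """
    if len(trials) != len(failures):
        raise ValueError("trials and failures must have the same length")
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must lie in (0, 1]")
    m = len(trials)
    a, b = prior_a, prior_b
    for i, (n, f) in enumerate(zip(trials, failures)):
        w = weight ** (m - 1 - i)
        a += w * (n - f)  # discounted successes
        b += w * f        # discounted failures
    return a, b


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution, used as the reliability estimate."""
    return a / (a + b)


# Hypothetical data: three stages of 10 trials with a falling failure count.
trials = [10, 10, 10]
failures = [5, 3, 1]

unweighted = weighted_beta_posterior(trials, failures, weight=1.0)
weighted = weighted_beta_posterior(trials, failures, weight=0.5)
print(posterior_mean(*unweighted), posterior_mean(*weighted))
```

With weight=1.0 the sketch reduces to a plain cumulative beta update. With a smaller weight the estimate leans towards the most recent stage, which is the behaviour the abstract motivates when reliability is growing.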
|
444 |
Probabilistic analysis of monthly peak factors in a regional water distribution system
Kriegler, Benjamin Jacobus 12 1900
Thesis (MScEng)--Stellenbosch University, 2013. / ENGLISH ABSTRACT: The design of a water supply system relies on the knowledge of the water demands of its specific end-users.
It is also important to understand the end-users’ temporal variation in water demand. Failure of the system to
provide the required volume of water at the required flow-rate is deemed a system failure. The system
therefore needs to be designed with sufficient capacity to ensure that it is able to supply the required volume
of water during the highest demand periods. In practice, bulk water supply systems do not have to cater for
the high frequency, short duration high peak demand scenarios of the end-user, such as the peak hour or peak
day events, as the impact of these events is reduced by the provision of water storage capacity at the off-take from
the bulk supply system. However, for peak demand scenarios with durations longer than an hour or a day,
depending on the situation, the provision of sufficient storage capacity to reduce the impact on the bulk water
system becomes impractical and could lead to potential water quality issues during low demand periods. It
is, therefore, a requirement that bulk water systems be designed to be able to meet the peak weekly or peak
month end-user demands. These peak demand scenarios usually occur only during a certain portion of the
year, generally concentrated in a two to three month period during the drier months. Existing design
guidelines usually follow a deterministic design approach, whereby a suitable design peak factor (DPF) is applied to the average
annual daily system demand in order to determine the expected peak demand on the system. This DPF does
not account for the potential variability in end-user demand profiles, or the impact that end-storage has on
the required peak design factor of the bulk system.
This study investigated the temporal variations of end-user demand on two bulk water supply systems. These
systems are located in the winter rainfall region of the Western Cape province of South Africa. The data
analysed was the monthly measured consumption figures of different end-users supplied from the two
systems. The data sets spanned 14 years. Actual monthly peak factors were extracted from this
data and used in deterministic and probabilistic methods to determine the expected monthly peak factor for
both the end-user and the system design. The probabilistic method made use of a Monte Carlo analysis,
whereby the actual recorded monthly peak factor for each end-user per bulk system was used as an input into
discrete probability functions. The Monte Carlo analysis executed 1 500 000 iterations in order to produce
probability distributions of the monthly peak factors for each system. The deterministic and probabilistic
results were compared to the actual monthly peak factors as calculated from the existing water use data, as
well as against current DPFs as published in guidelines used in the industry. The study demonstrated that the
deterministic method would overstate the expected peak system demand and result in an oversized system.
The probabilistic method yielded good results and compared well with the actual monthly peak factors. It is
thus deemed an appropriate tool to use to determine the required DPF of a bulk water system for a chosen
reliability of supply. The study also indicated the DPFs proposed by current guidelines to be too low. The
study identified a potential relationship between the average demand of an end-user and the expected
maximum monthly peak factor, whereas in current guidelines peak factors are not indicated as being
influenced by the end-user average demand. / AFRIKAANSE OPSOMMING: Die ontwerp van ‘n watervoorsiening stelsel berus op die kennis van die water aanvraag van sy spesifieke
eindverbruikers. Dit is ook belangrik om ‘n begrip te hê van die tydelike variasie van die eindverbruiker se
water-aanvraag. Indien die voorsieningstelsel nie in staat is om die benodigde volume water teen die
verlangde vloeitempo te kan lewer nie, word dit beskou as ‘n faling. Die stelsel word dus ontwerp met
voldoende kapasiteit wat dit sal in staat stel om die benodigde volume gedurende die hoogste aanvraag
periodes te kan voorsien. In die praktyk hoef grootmaat water-voorsiening stelsels nie te voldoen aan spits
watergebeurtenisse met hoë frekwensie en kort duurtes, soos piek-dag of piek-uur aanvraag nie, aangesien
hierdie gebeurtenisse se impak op die grootmaat stelsel verminder word deur die voorsiening van wateropgaring
fasiliteite by die aftap-punte vanaf die grootmaatstelsels. Nieteenstaande, vir piek-aanvraag
gebeurtenisse met langer duurtes as ‘n uur of dag, raak die voorsiening van voldoende wateropgaring
kapasiteit by die aftap-punt onprakties en kan dit selfs lei tot waterkwaliteits probleme. Dit is dus ‘n vereiste
dat grootmaat watervoorsienings stelsels ontwerp moet word om die piek-week of piek-maand eindverbruiker
aanvrae te kan voorsien. Hierdie piek-aanvraag gebeurtenisse vind algemeen in gekonsentreerde
twee- of drie maand periodes tydens die droeër maande plaas. Bestaande ontwerpsriglyne volg gewoonlik ‘n
deterministiese ontwerp benadering, deurdat ‘n voldoende ontwerp spits faktor toegepas word op die
gemiddelde jaarlikse daaglikse stelsel aanvraag om sodoende te bepaal wat die verwagte spits aanvraag van
die stelsel sal wees. Hierdie ontwerp spits faktor maak nie voorsiening vir die potensiële variasie in die
eindverbruiker se aanvraag karakter of die impak van die beskikbare water-opgaring fasiliteit op die
benodigde ontwerp spits faktor van die grootmaat-stelsel nie.
Hierdie studie ondersoek die tydelike variasie van die eindverbruiker se aanvraag op twee grootmaat watervoorsiening
stelsels. Die twee stelsels is geleë in die winter reënval streek van die Wes-Kaap provinsie van
Suid-Afrika. Die data wat geanaliseer is was die maandelikse gemeterde verbruiksyfers van verskillende
eindverbruikers voorsien deur die twee stelsels. Die datastelle het oor 14 jaar gestrek. Die ware maand piekfaktore
is bereken vanaf die data en is in deterministiese en probabilistiese metodes gebruik om die verwagte
eindverbruiker en stelsel ontwerp se maand spits-faktore te bereken. Die probabilistiese metode het gebruik
gemaak van ‘n Monte Carlo analise metode, waardeur die ware gemeette maand spits-faktor vir elke
eindverbruiker vir elke grootmaatstelsel gebruik is as invoer tot diskrete waarskynlikheids funksies. Die
Monte Carlo analise het 1 500 000 iterasies voltooi om waarskynlikheids-verdelings van elke maand spitsfaktor
vir elke stelsel te bereken. Die deterministiese en probabilistiese resultate is vergelyk met die ware
maand spits faktore soos bereken vanuit die bestaande waterverbruik data, asook teen huidige gepubliseerde
ontwerp spits-faktore, wat in die bedryf gebruik word.
Die studie het aangetoon dat die deterministiese metode te konserwatief is en dat dit die verwagte piekaanvraag
van die stelsel sal oorskat en dus sal lei tot ‘n oorgrootte stelsel. Die probabilistiese metode het
goeie resultate opgelewer wat goed vergelyk met die ware maand piek-faktore. Dit word gereken as ‘n
toepaslike metode om die benodigde ontwerp spits-faktor van ‘n grootmaat-watervoorsiening stelsel te bepaal vir ‘n gekose voorsieningsbetroubaarheid. Die studie het ook aangedui dat die ontwerps piek-faktore
voorgestel deur die huidige riglyne te laag is en dat dit tot die falings van ‘n stelsel sal lei. Die studie het ‘n
moontlike verwantskap tussen die gemiddelde daaglikse wateraanvraag van die eindverbruiker en die
verwagte maksimum maand spits faktor geïdentifiseer, nademaal die piek-faktore soos voorgestel deur die
huidige riglyne nie beïnvloed word deur die eindverbruiker se gemiddelde verbruik nie.
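A minimal Python sketch of the kind of Monte Carlo analysis described in the English abstract follows. It is an illustration under stated assumptions, not the thesis's procedure: the end-user data are invented, end-users are sampled independently of one another, the system peak factor is taken as the demand-weighted average of the sampled end-user peak factors, and the worst-case bound used for comparison is a simple stand-in for a deterministic method. The iteration count is kept far below the 1 500 000 reported in the abstract so that the example runs quickly.

```python
import random


def simulate_system_peak_factors(users, iterations=10_000, seed=0):
    """Return a sorted list of simulated system monthly peak factors.

    users: list of (average_demand, observed_monthly_peak_factors) pairs.
    Each iteration draws one observed peak factor per end-user (an empirical
    discrete distribution) and combines them weighted by average demand.
    Independence between end-users is an assumption of this sketch.
    """
    rng = random.Random(seed)
    total_avg = sum(avg for avg, _ in users)
    results = []
    for _ in range(iterations):
        peak_demand = sum(avg * rng.choice(factors) for avg, factors in users)
        results.append(peak_demand / total_avg)
    results.sort()
    return results


def quantile(sorted_values, p):
    """Empirical quantile of an already sorted list, for 0 <= p <= 1."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]


# Hypothetical end-users: (average demand, recorded monthly peak factors).
users = [
    (100.0, [1.2, 1.3, 1.5, 1.4]),
    (50.0, [1.1, 1.6, 1.8, 1.3]),
    (20.0, [1.0, 1.2, 2.0, 1.5]),
]

system_factors = simulate_system_peak_factors(users)
design_factor = quantile(system_factors, 0.98)  # chosen reliability of supply

# Worst case: every end-user peaks at its maximum in the same month.
total_avg = sum(avg for avg, _ in users)
worst_case = sum(avg * max(factors) for avg, factors in users) / total_avg
print(design_factor, worst_case)
```

Because simultaneous maxima across all end-users are rare under independent sampling, the simulated design factor at a chosen reliability cannot exceed the worst-case bound and will usually sit below it, which is consistent with the abstract's finding that the deterministic method overstates peak system demand.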
|
445 |
Statistical modelling of daily mortality and air pollutant concentrations
馬時樂, Ma, Sze-lok, Stefan. January 2003
published_or_final_version / Community Medicine / Doctoral / Doctor of Philosophy
|
446 |
A study of alcohol pharmacokinetic of local Chinese in Hong Kong
Yang, Chi-ting., 楊志停. January 2003
published_or_final_version / abstract / toc / Statistics and Actuarial Science / Master / Master of Philosophy
|
447 |
Can automated alerts generated from influenza surveillance data reduce institutional outbreaks in Hong Kong
Tam, Yat-hung., 譚一鴻. January 2006
published_or_final_version / Community Medicine / Master / Master of Public Health
|
448 |
Applications of age-period-cohort and state-transition Markov models in understanding cervical cancer incidence trends and evaluating the cost-effectiveness of cytologic screening
Woo, Pao-sun, Pauline., 胡寶璇. January 2006
published_or_final_version / abstract / Community Medicine / Doctoral / Doctor of Philosophy
|
449 |
Applications of Bayesian statistical model selection in social science research
So, Moon-tong., 蘇滿堂. January 2007
published_or_final_version / abstract / Social Sciences / Doctoral / Doctor of Philosophy
|
450 |
Using GIS and statistical models for traffic accidents analysis: a case study of the Tuen Mun town centre
Yau, C. P., Eric., 丘之鵬. January 2006
published_or_final_version / abstract / Transport Policy and Planning / Master / Master of Arts in Transport Policy and Planning
|