111 |
Comparison of methods to calculate measures of inequality based on interval data
Neethling, Willem Francois 12 1900
Thesis (MComm)—Stellenbosch University, 2015. / ENGLISH ABSTRACT: In recent decades, economists and sociologists have taken an increasing interest in the study of
income attainment and income inequality. Many of these studies have used census data, but
social surveys have also increasingly been utilised as sources for these analyses. In these
surveys, respondents’ incomes are most often not measured in true amounts, but in categories
of which the last category is open-ended. The reason is that income is seen as sensitive data
and/or is sometimes difficult to reveal.
Continuous data divided into categories is often more difficult to work with than ungrouped data.
In this study, we compare different methods to convert grouped data to data where each
observation has a specific value or point. For some methods, all the observations in an interval
receive the same value; an example is the midpoint method, where all the observations in an
interval are assigned the midpoint. Other methods include random methods, where each
observation receives a random point between the lower and upper bound of the interval. For
some methods, random and non-random, a distribution is fitted to the data and a value is
calculated according to the distribution.
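The midpoint and random approaches described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names, the example intervals and counts, and the choice of a uniform draw within each interval are assumptions, and the open-ended last category is not handled here.

```python
import random


def midpoint_values(intervals, counts):
    """Give every observation in an interval the midpoint of that interval."""
    values = []
    for (lower, upper), n in zip(intervals, counts):
        values.extend([(lower + upper) / 2.0] * n)
    return values


def random_uniform_values(intervals, counts, seed=0):
    """Give every observation a random point between the interval bounds.

    Assumption: points are drawn uniformly; the abstract only says "random".
    """
    rng = random.Random(seed)
    values = []
    for (lower, upper), n in zip(intervals, counts):
        values.extend(rng.uniform(lower, upper) for _ in range(n))
    return values


# Hypothetical grouped income data: closed intervals and observation counts.
intervals = [(0, 1000), (1000, 5000), (5000, 10000)]
counts = [3, 2, 1]

mid = midpoint_values(intervals, counts)
rnd = random_uniform_values(intervals, counts)
```

The distribution-based variants named in the abstract (Pareto and lognormal) would replace the midpoint or the uniform draw with a value derived from a fitted distribution; they are not shown here because the abstract does not specify the fitting procedure.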
The non-random methods that we use are the midpoint-, Pareto means- and lognormal means
methods; the random methods are the random midpoint-, random Pareto- and random
lognormal methods. Since our focus falls on income data, which usually follows a heavy-tailed
distribution, we use the Pareto and lognormal distributions in our methods.
The above-mentioned methods are applied to simulated and real datasets. The raw values of
these datasets are known, and are categorised into intervals. These methods are then applied
to the interval data to reconvert the interval data to point data. To test the effectiveness of these
methods, we calculate some measures of inequality. The measures considered are the Gini
coefficient, quintile share ratio (QSR), the Theil measure and the Atkinson measure. The
estimated measures of inequality, calculated from each dataset obtained through these
methods, are then compared to the true measures of inequality. / AFRIKAANS SUMMARY (AFRIKAANSE OPSOMMING): Over the past decades, economists and sociologists have shown a growing interest in
studies of income attainment and income inequality. Many of these studies use
census data, but the use of social surveys as sources for these analyses
has also increased markedly. In these surveys, a person's income is mostly
indicated in categories, of which the last interval is open, rather than as numerical values. The reason
for the categories is that income data is regarded as sensitive and is sometimes also difficult to
report.
Continuous data divided into categories is mostly more difficult to work with than
ungrouped data. In this study, various methods are compared for converting grouped data
to data where each observation has a numerical value. For some of the methods,
the same value is given to all the observations in an interval, for example the ‘midpoint’
method, where each value receives the midpoint of the interval. Other methods are random
methods, where each observation receives a random value between the lower and upper bound of the
interval. For some of the methods, random and non-random, a distribution is fitted to the
data and a value is calculated according to the distribution.
The non-random methods used are the ‘midpoint’, ‘Pareto means’ and ‘lognormal
means’ methods, and the random methods are the ‘random midpoint’, ‘random Pareto’ and ‘random
lognormal’ methods. Our focus is on income data, which usually follows a heavy-tailed distribution, and
for this reason we use the Pareto and lognormal distributions in our methods.
All the methods are applied to simulated and real datasets. The raw values of the
datasets are known and are categorised into intervals. The methods are then applied to the interval
data to convert it back to data where each observation has a numerical value.
To test the effectiveness of the methods, several measures of inequality
are calculated. The measures include the Gini coefficient, the quintile share ratio (QSR), and the Theil and
Atkinson measures. The estimated measures of inequality, calculated from the
datasets obtained through the methods, are then compared to the true measures of
inequality.
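The four inequality measures named in the abstract have standard textbook forms, sketched below for unweighted, strictly positive incomes. The function names, the default Atkinson parameter, and the simple handling of quintile boundaries are assumptions made for illustration; the thesis may define or estimate these measures differently.

```python
import math


def gini(values):
    """Gini coefficient from the rank-weighted sum of sorted values."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n


def quintile_share_ratio(values):
    """Total of the top 20% divided by the total of the bottom 20%.

    Assumption: quintile size is len(values) // 5, with no interpolation.
    """
    xs = sorted(values)
    k = len(xs) // 5
    if k == 0:
        raise ValueError("need at least five observations")
    return sum(xs[-k:]) / sum(xs[:k])


def theil(values):
    """Theil T index; requires strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def atkinson(values, epsilon=0.5):
    """Atkinson index with inequality-aversion parameter epsilon."""
    n = len(values)
    mu = sum(values) / n
    if epsilon == 1.0:
        geo_mean = math.exp(sum(math.log(x) for x in values) / n)
        return 1.0 - geo_mean / mu
    inner = sum(x ** (1.0 - epsilon) for x in values) / n
    return 1.0 - inner ** (1.0 / (1.0 - epsilon)) / mu
```

Comparing, for example, `gini(true_values)` with `gini(converted_values)` gives the kind of true-versus-estimated comparison the abstract describes, where `true_values` and `converted_values` are hypothetical lists of raw and reconstructed incomes.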
|
112 |
Machine learning for systems pathology
Verleyen, Wim January 2013
Systems pathology seeks to introduce more holistic approaches to pathology and to integrate clinicopathological information with “-omics” technology. This doctorate investigates two examples of a systems approach to pathology: (1) a personalized patient output prediction for ovarian cancer and (2) an analytical approach that differentiates between individual and collective tumour invasion. In the ovarian cancer study, clinicopathological measurements and proteomic biomarkers are analysed with a set of newly engineered bioinformatic tools. These tools are based on feature selection, survival analysis with Cox proportional hazards regression, and a novel Monte Carlo approach. Clinical and pathological data prove to have highly significant information content, as expected; however, molecular data alone have little information content and are significant only when the most informative selected variables are placed in the context of the patient's clinical and pathological measures. Furthermore, classifiers based on support vector machines (SVMs) that predict one-year progression-free survival (PFS) and three-year overall survival (OS) with high accuracy show how adding carefully selected molecular measures to clinical and pathological knowledge can enable personalized prognosis predictions. Finally, the high performance of these classifiers is validated on an additional data set.
The second study, an analytical approach that differentiates between individual and collective tumour invasion, analyses a set of morphological measures. These measurements are collected with a newly developed process that combines automated imaging analysis for data collection with a Bayesian network analysis to probabilistically connect morphological variables with tumour invasion modes. Cell-cell contact is the morphological feature that best discriminates between the individual and collective invasion modes. Smaller invading groups were typified by smoother cellular surfaces than those invading collectively in larger groups. Interestingly, elongation was evident in all invading cell groups and was not a specific feature of single-cell invasion as a surrogate of epithelial-mesenchymal transition. In conclusion, the combination of automated imaging analysis and Bayesian network analysis provides insight into the morphological variables associated with the transition of cancer cells between invasion modes. We show that only two morphologically distinct modes of invasion exist. The two studies in this thesis illustrate the potential of a systems approach to pathology and the need for quantitative approaches to reveal the system behind pathology.
|
113 |
BAYES RISK ANALYSIS OF REGIONAL REGRESSION ESTIMATES OF FLOODS
Metler, William Arledge 02 1900
This thesis defines a methodology for the evaluation of the
worth of streamflow data using a Bayes risk approach. Using regional
streamflow data in a regression analysis, the Bayes risk can be computed
by considering the probability of the error in using the regionalized
estimates of bridge or culvert design parameters. Cost curves for over-
and underestimation of the design parameter can be generated based on
the error of the estimate. The Bayes risk can then be computed by integrating
the probability of estimation error over the cost curves. The
methodology may then be used to analyze the regional data collection effort
by considering the worth of data for a record site relative to the
other sites contributing to the regression equations.
The methodology is illustrated by using a set of actual streamflow
data from Missouri. The cost curves for over- and underestimation
of the streamflow design parameter for bridges and culverts are hypothesized
so that the Bayes risk might be computed and the results of the
analysis discussed. The results are discussed by demonstrating small
sample bias that is introduced into the estimate of the design parameter
for the construction of bridges and culverts. The conclusions are that
the small sample bias in the estimation of large floods can be substantial
and that the Bayes risk methodology can evaluate the relative worth
of data when the data are used in regionalization.
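The step of integrating the probability of estimation error over the cost curves can be sketched numerically. Everything specific below is an assumption made for illustration: a zero-mean normal error distribution, linear cost curves with different slopes for over- and underestimation, and trapezoidal integration. The abstract itself only states that the cost curves are hypothesized and does not give their form.

```python
import math


def normal_pdf(error, sigma):
    """Density of a zero-mean normal estimation error (assumed distribution)."""
    return math.exp(-0.5 * (error / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def cost(error, c_over, c_under):
    """Hypothetical linear cost curves: one slope for over-, one for underestimation."""
    return c_over * error if error >= 0 else c_under * (-error)


def bayes_risk(sigma, c_over, c_under, width=8.0, steps=20000):
    """Expected cost: trapezoidal integral of cost(e) * pdf(e) over +/- width * sigma."""
    low = -width * sigma
    step = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps + 1):
        error = low + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cost(error, c_over, c_under) * normal_pdf(error, sigma)
    return total * step


# Example with made-up numbers: underestimation is costlier than overestimation.
risk = bayes_risk(sigma=1.0, c_over=2.0, c_under=5.0)
```

Under these particular assumptions the integral has the closed form `(c_over + c_under) * sigma / sqrt(2 * pi)`, which gives a quick check on the numerical result.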
|
114 |
Optimal design for experiments with mixtures
陳令由, Chan, Ling-yau. January 1986
published_or_final_version / Mathematics / Doctoral / Doctor of Philosophy
|
115 |
Construction and testing of causal models in voting behaviour with reference to Hong Kong
Lui, Kwok-man, Richard., 呂國民. January 1996
published_or_final_version / Politics and Public Administration / Doctoral / Doctor of Philosophy
|
116 |
Multilevel models for survival analysis in dental research
Wong, Chun-mei, May., 王春美. January 2005
published_or_final_version / abstract / Dentistry / Doctoral / Doctor of Philosophy
|
117 |
New recursive parameter estimation algorithms in impulsive noise environment with application to frequency estimation and system identification
Lau, Wing-yi., 劉穎兒. January 2006
published_or_final_version / abstract / Electrical and Electronic Engineering / Master / Master of Philosophy
|
118 |
Statistical analysis of the infectivity and fatality of an emerging epidemic
Xu, Ying, 徐穎 January 2009
published_or_final_version / Statistics and Actuarial Science / Doctoral / Doctor of Philosophy
|
119 |
Statistical evaluation of mixed DNA stains
Choy, Yan-tsun., 蔡恩浚. January 2009
published_or_final_version / Statistics and Actuarial Science / Master / Master of Philosophy
|
120 |
Statistical analysis of temporal and spatial variations in suicide data
Yang, Kit-ling., 楊潔玲. January 2009
published_or_final_version / Statistics and Actuarial Science / Master / Master of Philosophy
|