41 |
On ENSO-Modified Hurricane Formation in the North Atlantic
Welty, Joshua Stephen 22 May 2015
No description available.
|
42 |
Discriminant analysis applied to predict success in advanced placement mathematics: calculus AB or calculus BC
Bowers, Francis Andrew Imaikalani January 1984
No description available.
|
43 |
Exploratory Study of Distracted Behaviors of Transit Operators
Arbie, Nurlayla 30 August 2014
Bus transit driving is an occupation that requires high concentration and is demanding because of work overload, time pressure, and responsibility for lives. In 2006, there were 103 fatal crashes involving transit buses. As the number of distraction-related crashes increases, it is important to study transit operator distraction in order to reduce future crashes.
This thesis analyzed the likelihood of operator distraction behaviors and sought a predictive model to classify different distraction categories. An ordinal logistic regression was carried out to evaluate how the operators' age, gender, driving experience, and driving frequency account for the likelihood of 17 potential distracted driving behaviors. Only 5 models met the selection criteria (p-value of model fit less than 0.005 and p-value of the parallel lines test more than 0.005): listening to the radio/CD/DVD/MP3 player (D1); picking up and holding the 2-way radio (D5); listening to the Dispatch Office broadcast (D6); adjusting switches/controls on the dashboard (D15); and utilizing mentor ranger (D16).
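To make the ordinal logistic regression step more concrete, the following minimal sketch shows how a proportional-odds model converts a linear predictor into probabilities for ordered response categories. The function names, coefficients, and cutpoints are hypothetical illustrations and are not taken from the thesis. The parallel lines test mentioned above checks the assumption, built into this form, that a single coefficient vector is shared across all cutpoints.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, beta, cutpoints):
    """Return P(Y = j) for ordered categories under a proportional-odds model.

    Assumes P(Y <= j) = logistic(cutpoints[j] - x . beta), with cutpoints
    given in increasing order (hypothetical parameterisation).
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative probabilities; the last category always has P(Y <= J) = 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical operator with two predictors and a four-level frequency response.
example = category_probabilities([1.0, 2.0], [0.3, -0.1], [-1.0, 0.5, 2.0])
```

With three cutpoints the function returns four category probabilities that sum to one; a larger linear predictor shifts probability toward the higher categories.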
A discriminant analysis was then performed to predict how transit operators' driving behavior differs when they are exposed to 10 different distraction activities; 16 predictors were considered in this analysis. The final results showed that 4 predictors appear able to classify distraction groups across all 4 models: segment length, average duration of idling time/stop delay at speed interval 0-4 km/hr, frequency of speed transitions that deviate by ± 0 to 4 km/hr from its speed, and frequency of speed transitions that deviate by ± 8 to 12 km/hr from its speed. / Master of Science
|
44 |
The Use of Genetic Polymorphisms and Discriminant Analysis in Evaluating Genetic Polymorphisms as a Predictor of Population
Howell, Bruce F. 05 1900
Discriminant analysis is a procedure for identifying the relationships between qualitative criterion variables and quantitative predictor variables. Databases of genetic polymorphisms are currently available that group such polymorphisms by ethnic origin or nationality. Such information could be useful to entities that base financial determinations upon predictions of disease or to medical researchers who wish to target prevention and treatment to population groups. While the use of genetic information to make such determinations is unlawful in states and confidentiality and privacy concerns abound, methods for human “redlining” may occur. Thus, it is necessary to investigate how strongly certain genetic information is related to ethnicity, to determine whether a statistical analysis can provide information about that relationship. The statistical technique of discriminant analysis provides a tool for examining it.
|
45 |
Comparisons of Improvement-Over-Chance Effect Sizes for Two Groups Under Variance Heterogeneity and Prior Probabilities
Alexander, Erika D. 05 1900
The distributional properties of improvement-over-chance, I, effect sizes derived from linear and quadratic predictive discriminant analysis (PDA) and from logistic regression analysis (LRA) for the two-group univariate classification were examined. Data were generated under varying levels of four data conditions: population separation, variance pattern, sample size, and prior probabilities. None of the indices provided acceptable estimates of effect for all the conditions examined. There were only a small number of conditions under which both accuracy and precision were acceptable. The results indicate that the decision of which method to choose is primarily determined by variance pattern and prior probabilities. Under variance homogeneity, any of the methods may be recommended. However, LRA is recommended when priors are equal or extreme and linear PDA is recommended when priors are moderate. Under variance heterogeneity, selecting a recommended method is more complex. In many cases, more than one method could be used appropriately.
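As an illustration of the effect size discussed above, the sketch below computes an improvement-over-chance index in the form commonly attributed to Huberty, I = (H_o - H_e) / (1 - H_e), where H_o is the observed hit rate and H_e the hit rate expected by chance. Whether the dissertation uses exactly this form and these chance criteria is an assumption; the function names are hypothetical.

```python
def chance_hit_rate(priors, criterion="proportional"):
    """Hit rate expected by chance for the given prior probabilities.

    'proportional' uses the sum of squared priors; 'maximum' uses the
    largest prior (always assigning cases to the largest group).
    """
    if criterion == "proportional":
        return sum(q * q for q in priors)
    if criterion == "maximum":
        return max(priors)
    raise ValueError("criterion must be 'proportional' or 'maximum'")


def improvement_over_chance(observed_hit_rate, expected_hit_rate):
    """I = (H_o - H_e) / (1 - H_e); undefined when H_e equals 1."""
    if expected_hit_rate >= 1.0:
        raise ValueError("expected hit rate must be below 1")
    return (observed_hit_rate - expected_hit_rate) / (1.0 - expected_hit_rate)


# Equal priors give a chance hit rate of 0.5 under the proportional criterion.
h_e = chance_hit_rate([0.5, 0.5])
i_index = improvement_over_chance(0.8, h_e)
```

In this hypothetical case a classifier with an 80% hit rate against a 50% chance rate removes about 60% of the errors that chance assignment would make. With extreme priors such as 0.9 and 0.1 the chance rate rises, so the same observed hit rate yields a much smaller index.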
|
46 |
Robustness Against Non-Normality: Evaluating LDA and QDA in Simulated Settings Using Multivariate Non-Normal Distributions
Viktor, Gånheim, Isak, Åslund January 2023
Evaluating classifiers in controlled settings is essential for empirical applications, as extensive knowledge of model behaviour is needed for accurate predictions. This thesis investigates the robustness against non-normality of two prominent classifiers, LDA and QDA. Through simulation, errors in leave-one-out cross-validation are compared for data generated by different multivariate distributions, while also controlling for covariance structures, class separation and sample sizes. Unexpectedly, the classifiers perform better on data generated by heavy-tailed symmetrical distributions than on data generated by the normal distribution. Possible explanations are proposed, but the cause remains unknown. Further studies are needed, investigating more settings as well as mathematical properties, to verify and understand these results.
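The kind of simulation described above can be sketched in a much reduced form. The example below is a univariate, two-class stand-in: it estimates the leave-one-out error of a linear discriminant rule on normal data and on heavy-tailed Student-t data. The thesis itself works with multivariate distributions and also covers QDA, so the function names, sample sizes, degrees of freedom, and class shift here are all hypothetical choices for illustration.

```python
import math
import random
import statistics


def lda_predict(x, train):
    """Classify x with a two-class univariate linear discriminant rule.

    train is a list of (value, label) pairs with labels 0 and 1. Classes are
    modelled as normal with a pooled variance; priors are class frequencies.
    """
    groups = {g: [v for v, y in train if y == g] for g in (0, 1)}
    n = len(train)
    means = {g: statistics.fmean(vals) for g, vals in groups.items()}
    pooled_ss = sum((v - means[g]) ** 2 for g, vals in groups.items() for v in vals)
    pooled_var = pooled_ss / (n - 2)

    def score(g):
        # Linear discriminant score for class g.
        return (x * means[g] / pooled_var
                - means[g] ** 2 / (2.0 * pooled_var)
                + math.log(len(groups[g]) / n))

    return max((0, 1), key=score)


def loocv_error(data):
    """Leave-one-out error rate: refit without each point, then classify it."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors += lda_predict(x, train) != y
    return errors / len(data)


def sample_t(df, rng):
    """Draw from a Student-t distribution as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def simulate(draw, shift, n_per_class, rng):
    """Two classes that differ only by a location shift."""
    data = [(draw(rng), 0) for _ in range(n_per_class)]
    data += [(draw(rng) + shift, 1) for _ in range(n_per_class)]
    return data


rng = random.Random(0)
normal_error = loocv_error(simulate(lambda r: r.gauss(0.0, 1.0), 2.0, 50, rng))
heavy_error = loocv_error(simulate(lambda r: sample_t(3, r), 2.0, 50, rng))
```

A single run like this says little on its own; a study of the kind described would repeat the simulation many times per setting and compare average error rates.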
|
47 |
Nonlinear Generalizations of Linear Discriminant Analysis: the Geometry of the Common Variance Space and Kernel Discriminant Analysis
Kim, Jiae January 2020
No description available.
|
48 |
Predicting business cycle regimes using discriminant analysis
Bowden, Dion Eldred 12 1900
Thesis (MBA)--Stellenbosch University, 2000. / ENGLISH ABSTRACT: The assumption underlying this study is that the regime of the economy imparts certain
characteristics to the business cycle indicators and that by using a discriminant analysis it would
be possible to gain information from the various indicators as to the state of activity in the
economy.
A discriminant analysis was developed on an Excel spreadsheet. The Schwarz Information
Criterion, SIC, was calculated for the models. This value indicates how closely the model
follows the true data generating process. The discriminant analysis was performed using all the
variables or indicators applicable to the model in question. Using a linear programming algorithm
the variables were removed from the model in order to maximise the SIC value for the model.
The result was a variable set that maximised the information about the regime of the economy
available from the various economic indicators. The models' performance was evaluated for post
sample performance in a test data set. Five models were developed. They were:
• the coincident logistic model;
• the one period ahead logistic CLI (composite leading indicator) model;
• the one period ahead logistic component model;
• the three period ahead logistic CLI model; and
• the three period ahead logistic component model.
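The SIC-based comparison of variable sets described above can be sketched as follows. Because the abstract speaks of maximising the SIC, the sketch assumes the sign convention SIC = ln L - (k / 2) ln n, in which larger values are better; the thesis may define it differently. The fitted probabilities, parameter counts, and function names are hypothetical, and the linear programming step used to remove variables is not reproduced here.

```python
import math


def log_likelihood(probabilities, outcomes):
    """Bernoulli log-likelihood of 0/1 outcomes given fitted probabilities.

    Each probability is the model's P(expansion) and must lie strictly
    between 0 and 1.
    """
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(probabilities, outcomes))


def sic(log_lik, n_params, n_obs):
    """Schwarz criterion in the 'larger is better' form (assumed convention)."""
    return log_lik - 0.5 * n_params * math.log(n_obs)


# Hypothetical regimes (1 = expansion) and fitted probabilities from two models.
regimes = [1, 1, 0, 0]
full_model = [0.9, 0.8, 0.2, 0.1]      # three parameters, closer fit
reduced_model = [0.8, 0.7, 0.3, 0.2]   # one parameter, looser fit

sic_full = sic(log_likelihood(full_model, regimes), 3, len(regimes))
sic_reduced = sic(log_likelihood(reduced_model, regimes), 1, len(regimes))
```

In this toy comparison the reduced model has the higher SIC even though it fits less closely, because the penalty for the two extra parameters outweighs the gain in likelihood. This is the trade-off that drives variable removal.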
All the models produced meaningful results in the estimation data set for the United States
economy. In the test data set only the coincident logistic model was found to give a clear signal of the regime switch. All models applied to the US data showed activity around all the regime
switches.
Two of the models did not produce useful results when applied to South African economic data.
For this reason the one and two period ahead logistic component models were not used. The
remaining three models gave clear signals of regime switches for all regime switches in the
estimation and the test data set.
The best overall model in terms of SIC value was the one period ahead logistic CLI model applied
to the South African data. The highest SIC for a model applied to the United States data was that of the
logistic coincident model. The models were also evaluated on the number of wrong
classifications. The best models in this regard were the coincident logistic model and the one period
ahead logistic CLI model applied to the United States data. The most accurate model for the
South African data was the one-month ahead logistic CLI model in the estimation data set and the
logistic coincident model in the test data set. The models were more decisive in the South African
data than in the United States data set having a much lower region of uncertainty. Taking into
consideration the greater decisiveness in conjunction with accuracy the models performed better
with the South African data.
The discriminant analysis generates a probability of expansion, which is used in conjunction with
a classification rule based on observed frequencies in the estimation data set. A plot of the
probability of expansion calculated by the models versus the true data generating process reveals
that the models provide meaningful information as to the regime of the economy. The models
tend to lag the true data generating process but do show activity around the regime switches. The models when applied to the United States data show good correlation with the true data
generating process over the estimation data set but not as good over the test data set.
The models perform better when applied to South African data when evaluated graphically. The
models when applied to the South African data give good clear signals over all regime switches
in all data sets. Indications of regime switches in the estimation data set were clearer than in the
test data set.
The use of a discriminant analysis for regime classification has been proven to be effective. This
method should be used in conjunction with other methods to evaluate business cycle regimes.
Useful information is extracted as regards the state of the economy from the various economic
indicators. For this reason discriminant analysis of business cycles can be used as an additional
tool for the evaluation of business cycle regimes. / AFRIKAANS SUMMARY: The underlying assumption of this study is that the economic regime imparts certain characteristics
to the business cycle, and that a discriminant analysis makes it possible to obtain information
from the various indicators about the state of economic activity.
A discriminant analysis was designed on an Excel spreadsheet. The Schwarz Information Criterion
(SIC) was calculated for the models. This value indicates how faithfully the model follows the true
data generating process. The discriminant analysis was carried out using all the
variables or indicators applicable to the model concerned. The variables were removed from
the model by means of a linear programming algorithm in order to maximise the
SIC value of the model. The result was a set of variables that maximised the information about
the economic regime available from the various economic indicators.
The model was evaluated for out-of-sample performance in a test data set. The following five models
were developed:
• coincident logistic model
• one period ahead logistic composite leading indicator (CLI) model
• one period ahead logistic component model
• three period ahead logistic CLI model
• three period ahead logistic component model.
All the models delivered meaningful results in the estimation data for the economy of the USA.
In the test data set only the coincident logistic model gave a clear indication
of regime change. All models applied to the USA data showed activity
around all the regime changes.
Two of the models applied to South African data did not produce useful results,
and for this reason the one and two period ahead logistic
component models were not used. The remaining three models gave clear indications of
regime changes for all regime changes in the estimation data and the
test data set.
The best overall model in terms of SIC value was the one period ahead logistic
CLI model applied to South African data. The largest SIC value for a model
applied to USA data was that of the coincident logistic model. Models were also evaluated in
terms of wrong classifications. The best models in this regard were the coincident
logistic model and the one period ahead logistic CLI model applied to USA data.
The most accurate model for South African data was the one month ahead logistic
CLI model in the estimation data set and the coincident logistic model in the test data set.
The models were more decisive in the South African data than in the USA data set, because the
South African data showed a much smaller region of uncertainty. Given the greater
decisiveness together with accuracy, the models performed better with South African data.
The discriminant analysis produces a probability of expansion, which is used together with a classification rule
based on the observed frequencies in the estimation data. A plot of the
probabilities of expansion calculated by the models against the true data generating process indicates
that the models offer meaningful information about the economic regime. The models
tend to follow the true data generating process, but nevertheless show movement around
regime changes. On the USA data the models showed good correlation with the true data generating process over
the estimation data set, but not particularly good correlation over the test data set.
The models perform better when applied to South African data, and give good,
clear signals over all regime changes in all data sets. Indications of
regime changes in the estimation data set were clearer than in the test data set.
A discriminant analysis for regime classification proved to be effective. This method
should be used together with other methods to evaluate business cycle regimes. Useful
information about the state of the economy is obtained from the various economic indicators.
For this reason a discriminant analysis of business cycles can be used as an additional instrument
for evaluating business cycles.
|
49 |
Aspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression
Louw, Nelmarie 12 1900
Thesis (PhD)--Stellenbosch University, 1997. / One copy microfiche. / ENGLISH ABSTRACT: Discriminant analysis and logistic regression are techniques that can be used to classify
entities of unknown origin into one of a number of groups. However, the underlying
models and assumptions for application of the two techniques differ. In this study, the
two techniques are compared with respect to classification of entities.
Firstly, the two techniques were compared in situations where no data dependent
variable selection took place. Several underlying distributions were studied: the
normal distribution, the double exponential distribution and the lognormal distribution.
The number of variables, sample sizes from the different groups and the correlation
structure between the variables were varied to obtain a large number of different
configurations. The cases of two and three groups were studied. The most important
conclusions are: for normal and double exponential data linear discriminant analysis
outperforms logistic regression, especially in cases where the ratio of the number of
variables to the total sample size is large. For lognormal data, logistic regression
should be preferred, except in cases where the ratio of the number of variables to the
total sample size is large.
Variable selection is frequently the first step in statistical analyses. A large number of
potentially important variables are observed, and an optimal subset has to be selected
for use in further analyses. Despite the fact that variable selection is often used, the
influence of a selection step on further analyses of the same data, is often completely
ignored. An important aim of this study was to develop new selection techniques for
use in discriminant analysis and logistic regression. New estimators of the post-selection
error rate were also developed. A new selection technique, cross model
validation (CMV) that can be applied both in discriminant analysis and logistic
regression, was developed. This technique combines the selection of variables and the
estimation of the post-selection error rate. It provides a method to determine the
optimal model dimension, to select the variables for the final model and to estimate the
post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation
study comparing the CMV technique to existing procedures in the literature, was
undertaken. In general, this technique outperformed the other methods, especially
with respect to the accuracy of estimating the post-selection error rate.
Finally, pre-test type variable selection was considered. A pre-test estimation
procedure was adapted for use as selection technique in linear discriminant analysis. In
a simulation study, this technique was compared to CMV, and was found to perform
well, especially with respect to correct selection. However, this technique is only valid
for uncorrelated normal variables, and its applicability is therefore limited.
A numerically intensive approach was used throughout the study, since the problems
that were investigated are not amenable to an analytical approach. / AFRIKAANS SUMMARY: Linear discriminant analysis and logistic regression are techniques that can be used for the
classification of items of unknown origin into one of a number of groups. The
underlying models and assumptions for the use of the two techniques are, however,
different. In this study the two techniques were compared with respect to the classification of
items.
First, the two techniques were compared in a setting where no data-dependent selection
of variables takes place. Several underlying distributions were studied: the
normal distribution, the double exponential distribution and the lognormal distribution. The
number of variables, the sample sizes from the respective groups and the
correlation structure between the variables were varied to obtain a large number of
configurations. The cases of two and three groups were studied. The most important
conclusions that can be drawn from the study are: for normal and
double exponential data linear discriminant analysis performs better than logistic regression,
especially in cases where the ratio of the number of variables to the total
sample size is large. In the case of data from a lognormal distribution,
logistic regression should be the method of choice, unless the ratio of the number of
variables to the total sample size is large.
Variable selection is often the first step in statistical analyses. A large number of
potentially important variables are observed, and a subset that is optimal
is chosen for use in the further analyses. Despite the fact that
variable selection is often used, the influence that a selection step has on
further analyses of the same data is often completely ignored. An important
aim of the study was to develop new selection techniques that can be used
in discriminant analysis and logistic regression. Attention was also given to the
development of estimators of the error rate of a discriminant function formed with
selected variables. A new selection technique, cross model validation
(CMV), which can be used for the selection of variables in both
discriminant analysis and logistic regression, was developed. This technique handles the
selection of variables and the estimation of the post-selection error rate in one step, and
provides a method to determine the optimal model dimension, to choose the variables that
should be included in the model, and also to estimate the post-selection error rate of the
discriminant function. An extensive simulation study in which the proposed
CMV technique was compared with other procedures in the literature was undertaken for both
discriminant analysis and logistic regression. In general this technique
performed better than the other methods considered, especially with respect to the accuracy
with which the post-selection error rate is estimated.
Finally, attention was also given to pre-test type selection. A technique was developed
that uses a pre-test estimation method to select variables for inclusion in a
linear discriminant function. The technique was compared with the CMV technique in a simulation study, and performs very well, especially with respect to correct selection. This technique is, however,
only valid for uncorrelated normal variables, which limits its use.
A numerically intensive approach was used throughout the study. This was necessitated
by the fact that the problems investigated cannot be handled by means of an analytical
approach.
|
50 |
Mathematical Programming Approaches to the Three-Group Classification Problem
Loucopoulos, Constantine 08 1900
In the last twelve years there has been considerable research interest in mathematical programming approaches to the statistical classification problem, primarily because they are not based on the assumptions of the parametric methods (Fisher's linear discriminant function, Smith's quadratic discriminant function) for optimality. This dissertation focuses on the development of mathematical programming models for the three-group classification problem and examines the computational efficiency and classificatory performance of proposed and existing models. The classificatory performance of these models is compared with that of Fisher's linear discriminant function and Smith's quadratic discriminant function. Additionally, this dissertation investigates theoretical characteristics of mathematical programming models for the classification problem with three or more groups. A computationally efficient model for the three-group classification problem is developed. This model minimizes directly the number of misclassifications in the training sample. Furthermore, the classificatory performance of the proposed model is enhanced by the introduction of a two-phase algorithm. The same algorithm can be used to improve the classificatory performance of any interval-based mathematical programming model for the classification problem with three or more groups. A modification to improve the computational efficiency of an existing model is also proposed. In addition, a multiple-group extension of a mathematical programming model for the two-group classification problem is introduced. A simulation study on classificatory performance reveals that the proposed models yield lower misclassification rates than Fisher's linear discriminant function and Smith's quadratic discriminant function under certain data configurations. Data configurations, where the parametric methods outperform the proposed models, are also identified. 
A number of theoretical characteristics of mathematical programming models for the classification problem are identified. These include conditions for the existence of feasible solutions, as well as conditions for the avoidance of degenerate solutions. Additionally, conditions are identified that guarantee the classificatory non-inferiority of one model over another in the training sample.
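The idea of an interval-based rule that directly minimizes training-sample misclassifications can be illustrated with a deliberately small stand-in. The sketch below is not the dissertation's mathematical programming model: it assumes each observation has already been reduced to a single score, assumes the three groups are ordered along that score, restricts cutoffs to observed score values, and replaces the optimization model with exhaustive search. All names are hypothetical.

```python
from itertools import combinations


def misclassifications(scores, labels, c1, c2):
    """Count training errors for the interval rule defined by cutoffs c1 < c2.

    Group 0 is assigned when score <= c1, group 1 when c1 < score <= c2,
    and group 2 when score > c2.
    """
    errors = 0
    for s, g in zip(scores, labels):
        predicted = 0 if s <= c1 else (1 if s <= c2 else 2)
        errors += predicted != g
    return errors


def best_cutoffs(scores, labels):
    """Exhaustively search pairs of observed scores for the fewest errors.

    Returns (errors, c1, c2). Requires at least two distinct score values.
    """
    candidates = sorted(set(scores))
    best = None
    for c1, c2 in combinations(candidates, 2):
        e = misclassifications(scores, labels, c1, c2)
        if best is None or e < best[0]:
            best = (e, c1, c2)
    return best


# Hypothetical training sample: perfectly separable along the score.
result = best_cutoffs([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 2, 2])
```

Exhaustive search is only workable for tiny samples and a single score; the appeal of a mathematical programming formulation is that it can choose the weights that define the score and the interval boundaries together.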
|