241.
Multivariate sequential procedures for testing means / Jackson, James Edward, January 1959
We consider a multivariate situation with mean vector μ = (μ₁, ..., μₚ) and covariance matrix Σ. We wish to derive sequential procedures for testing the hypothesis:
H₀: (μ - μ₀) Σ⁻¹ (μ - μ₀)′ = λ₀² (usually zero)
against the alternative:
H₁: (μ - μ₀) Σ⁻¹ (μ - μ₀)′ = λ₁²
both for the case where Σ is known (the sequential χ²-test) and where Σ is unknown and must be estimated from the sample (the sequential T²-test). These sequential procedures should guarantee that the probability of accepting H₁ when H₀ is true is equal to α and the probability of accepting H₀ when H₁ is true is equal to β.
For the case where Σ is known, λ₀² = 0 and λ₁² = λ², the test procedure is as follows: for a sample of n observations form the probability ratio:
P₁ₙ/P₀ₙ = exp(-nλ²/2) ₀F₁(p/2; nλ²χ²ₙ/4)
where p denotes the number of variables, x̄ₙ denotes the vector of the sample means based on n observations, χ²ₙ = n(x̄ₙ - μ₀) Σ⁻¹ (x̄ₙ - μ₀)′, and ₀F₁(c; x) is a type of generalized hypergeometric function.
a. If P₁ₙ/P₀ₙ ≤ β/(1-α), accept H₀;
b. If P₁ₙ/P₀ₙ ≥ (1-β)/α, accept H₁;
c. If β/(1-α) < P₁ₙ/P₀ₙ < (1-β)/α, continue sampling.
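As an illustration of this rule, the probability ratio can be evaluated directly with SciPy's implementation of ₀F₁; the helper below is a minimal sketch (the function name, argument order and the restriction to λ₀² = 0 are illustrative choices, not code from the thesis).

```python
import numpy as np
from scipy.special import hyp0f1  # 0F1(c; x), the confluent hypergeometric limit function

def sequential_chi2_step(xbar_n, mu0, sigma_inv, n, lam2, alpha=0.05, beta=0.05):
    """One step of the sequential chi-square test (Sigma known, lambda_0^2 = 0).
    Returns 'accept H0', 'accept H1' or 'continue'."""
    d = np.asarray(xbar_n) - np.asarray(mu0)
    p = d.size                                    # number of variables
    chi2_n = n * d @ sigma_inv @ d                # chi^2_n = n (xbar_n - mu0) Sigma^-1 (xbar_n - mu0)'
    ratio = np.exp(-n * lam2 / 2) * hyp0f1(p / 2, n * lam2 * chi2_n / 4)
    if ratio <= beta / (1 - alpha):
        return "accept H0"
    if ratio >= (1 - beta) / alpha:
        return "accept H1"
    return "continue"
```

Each new observation updates x̄ₙ, and the step is repeated until one of the two boundaries is crossed.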
For the case where Σ is unknown, the procedure is exactly the same except that the probability ratio is now:
P₁ₙ/P₀ₙ = exp(-nλ²/2) ₁F₁(n/2, p/2; nλ²T²ₙ/[2(n - 1 + T²ₙ)])
where T²ₙ = n(x̄ₙ - μ₀) Sₙ⁻¹ (x̄ₙ - μ₀)′, Sₙ denotes the sample covariance matrix based on n observations, and ₁F₁(a, c; x) is a confluent hypergeometric function. Procedures are also given for the case λ₀² ≠ 0.
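The unknown-Σ ratio can be sketched in the same way with SciPy's ₁F₁; again this is an illustrative helper rather than code from the thesis.

```python
import numpy as np
from scipy.special import hyp1f1  # 1F1(a, b; x), Kummer's confluent hypergeometric function

def t2_probability_ratio(t2_n, n, p, lam2):
    """Probability ratio for the sequential T^2-test (Sigma unknown, lambda_0^2 = 0)."""
    x = n * lam2 * t2_n / (2.0 * (n - 1 + t2_n))
    return np.exp(-n * lam2 / 2) * hyp1f1(n / 2, p / 2, x)
```

The same β/(1-α) and (1-β)/α boundaries are then applied to this ratio.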
Similar procedures are given to test the hypothesis:
H₀: (μ₁ - μ₂ - δ) Σ⁻¹ (μ₁ - μ₂ - δ)′ = λ₀² (usually zero)
against the alternative:
H₁: (μ₁ - μ₂ - δ) Σ⁻¹ (μ₁ - μ₂ - δ)′ = λ₁²
It is shown that these sequential procedures all exist in the sense that the risks of accepting H₁ when H₀ is true and of accepting H₀ when H₁ is true are approximately α and β respectively, and that these sequential procedures terminate with probability unity. Some of these situations have been generalized to give simultaneous tests on the means and covariance matrix of a sample.
No expressions yet exist for the OC (operating characteristic) or ASN (average sample number) functions, although some conjectured values have been determined for the latter; in comparison with the corresponding fixed-sample tests, these suggest substantial reductions in the sample sizes required when either H₀ or H₁ is true.
The general problem of tolerances is discussed and then some of these procedures are demonstrated with a numerical example drawn from the field of ballistic missiles.
The determination of P₁ₙ/P₀ₙ is quite laborious for both the sequential χ²- and T²-tests, since it requires the evaluation of a hypergeometric function each time an observation is made. It would be better, for each value of n, given p, α, β and λ² under H₁, to compute the values of χ²ₙ or T²ₙ which correspond to the boundaries of the tests indicated by β/(1-α) and (1-β)/α. Tables to facilitate both the sequential χ²- and T²-tests are given for p = 2, 3, ..., 9; λ² = 0.5, 1.0, 2.0; α = β = 0.05, with n ranging from the minimum value necessary to reach a decision to 30, 45 and 60 for λ² = 0.5, 1.0, 2.0 respectively. These tables were prepared on the IBM 650 computer using the Newton-Raphson iterative procedure.
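The tabulation idea can be reproduced with a root finder; the sketch below brackets the boundary and uses Brent's method rather than Newton-Raphson, and the numbers in the example call are only illustrative.

```python
import numpy as np
from scipy.special import hyp0f1
from scipy.optimize import brentq

def chi2_boundary(n, p, lam2, bound):
    """Value of chi^2_n at which the probability ratio equals `bound`
    (bound = beta/(1-alpha) for the H0 boundary, (1-beta)/alpha for the H1 boundary).
    Returns None when the ratio at chi^2_n = 0 already exceeds `bound`, i.e. for
    this n a boundary of that type does not yet exist."""
    f = lambda c: np.exp(-n * lam2 / 2) * hyp0f1(p / 2, n * lam2 * c / 4) - bound
    if f(0.0) > 0:
        return None
    hi = 1.0
    while f(hi) < 0:          # the ratio is increasing in chi^2_n, so keep widening the bracket
        hi *= 2.0
    return brentq(f, 0.0, hi)

# Continuation region for n = 10, p = 3, lambda^2 = 1.0, alpha = beta = 0.05:
accept_h0_at = chi2_boundary(10, 3, 1.0, 0.05 / 0.95)
accept_h1_at = chi2_boundary(10, 3, 1.0, 0.95 / 0.05)
```

Values of χ²ₙ below the first boundary lead to acceptance of H₀, values above the second to acceptance of H₁, and anything in between to another observation.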
Finally, a discussion is given of the hypergeometric function ₀F₁(c; x), and a table of this function is given for c = 0.5(0.5)5.0 and x = 0.1(0.1)1(1)10(10)100(50)1000. / Doctor of Philosophy
242.
An empirical evaluation of multivariate sequential procedures for testing means / Appleby, Robert Houston, January 1960
The purpose of this study is to make an empirical evaluation of Multivariate Sequential Procedures for Testing Means as proposed by J. E. Jackson. This was done by simulated sampling from a multivariate normal population with known means, both when the variance-covariance matrix is assumed known (Sequential χ²-Test) and when it is assumed unknown (Sequential T²-Test).
The results indicate that the test procedures, both the Sequential χ²-Test and the Sequential T²-Test, lead to satisfactory decisions. The tests are conservative, but the α and β errors are of an appropriate magnitude. The method (Bhate's conjecture) used to approximate the Average Sample Number appears to give good estimates of the size of sample needed to obtain a decision. / Master of Science
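An evaluation of this kind is straightforward to sketch today; the simulation below is our own illustration (the choices of p, λ² and the number of runs are arbitrary) and estimates the empirical α and the average sample number of the sequential χ²-test under H₀.

```python
import numpy as np
from scipy.special import hyp0f1

rng = np.random.default_rng(0)

def run_sequential_chi2(mu_true, mu0, sigma, lam2, alpha=0.05, beta=0.05, n_max=200):
    """Simulate one sample path of the sequential chi-square test; return (decision, n)."""
    p = len(mu_true)
    sigma_inv = np.linalg.inv(sigma)
    accept_h0, accept_h1 = beta / (1 - alpha), (1 - beta) / alpha
    xs = []
    for n in range(1, n_max + 1):
        xs.append(rng.multivariate_normal(mu_true, sigma))
        d = np.mean(xs, axis=0) - mu0
        chi2_n = n * d @ sigma_inv @ d
        ratio = np.exp(-n * lam2 / 2) * hyp0f1(p / 2, n * lam2 * chi2_n / 4)
        if ratio <= accept_h0:
            return "accept H0", n
        if ratio >= accept_h1:
            return "accept H1", n
    return "no decision", n_max

# Empirical alpha and average sample number under H0 (mu = mu0), 1000 simulated runs:
p, lam2 = 3, 1.0
runs = [run_sequential_chi2(np.zeros(p), np.zeros(p), np.eye(p), lam2) for _ in range(1000)]
print("empirical alpha:", np.mean([d == "accept H1" for d, n in runs]))
print("average sample number:", np.mean([n for d, n in runs]))
```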
243.
On the adequacy of the -2 log λ approximation in multivariate analysis / Bruce, Charles A., 08 September 2012
Exact distributions of statistics for the tests of hypotheses in multivariate analysis of variance and for the test of independence are compared with the asymptotic χ² distribution for -2 log λ.
"Critical" sample sizes have been recorded which indicate the magnitude of a sample needed so that the approximate technique may produce satisfactory results for testing at the .05 and .01 significance levels. / Master of Science
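The question can be illustrated with a small Monte Carlo experiment; the sketch below is our own, uses the likelihood-ratio statistic -n log|R| for the test of independence, and the dimension and replication counts are arbitrary rather than taken from the thesis.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

def empirical_size(n, p, level=0.05, reps=5000):
    """Empirical rejection rate of the asymptotic test of independence,
    -2 log(lambda) = -n log|R| referred to chi-square on p(p-1)/2 df,
    when the p normal variables really are independent."""
    crit = chi2.ppf(1 - level, p * (p - 1) // 2)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal((n, p))
        r = np.corrcoef(x, rowvar=False)
        rejections += -n * np.log(np.linalg.det(r)) > crit
    return rejections / reps

# The empirical size drifts toward the nominal .05 level as n grows:
for n in (10, 25, 50, 100):
    print(n, empirical_size(n, p=4))
```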
244.
A multivariate analysis of work-life balance outcomes from a large-scale telework programme / Maruyama, Takao; Hopkinson, Peter G.; James, P., January 2009
A multivariate analysis identified six predictors of positive work-life balance (WLB) among 1,566 teleworkers. Time flexibility variables were found to be the most dominant. Neither gender nor having dependent children was significant. These results demonstrate that the ability to control working hours was the most important factor enabling the sampled teleworkers to achieve positive WLB.
245.
Three-mode principal component analysis in designed experiments / See, Kyoungah, 21 October 2005
This dissertation is concerned with the application of a method for decomposing multivariate data, three-mode principal component analysis, to a three-way table with one observation per cell. It is based on the class of multiplicative models for three-way tables (s × t × u) whose general form has expectation
E(y_ijk) = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk + Σ_{p=1}^{s} Σ_{q=1}^{t} Σ_{r=1}^{u} c_pqr g_ip h_jq e_kr.
The application is related to tests for nonadditivity in the two-way analysis of variance with no replication.
Three-factor interaction can be assessed for three-way cross classified tables with only one observation per treatment combination. This is done by partitioning the three-factor interaction sum of squares into a portion related to the interaction and a portion associated with random error. In particular, the estimated interaction matrix is decomposed by three-mode principal component analysis to separate significant interaction from random error. Three test procedures are presented for assessing interaction: randomization tests, Monte Carlo methods, and likelihood ratio tests. Examples illustrating the use of these approaches are presented.
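A rough flavour of the partition and the randomization test can be given in a few lines of numpy; the sketch below is a simplified stand-in (a single multiplicative term fitted by alternating least squares and a naive whole-table permutation scheme), not the dissertation's three-mode procedure.

```python
import numpy as np

def interaction_residuals(y):
    """Three-factor-interaction estimates for a complete s x t x u table with one
    observation per cell (grand mean, main effects and two-factor interactions swept out)."""
    g  = y.mean()
    a  = y.mean(axis=(1, 2), keepdims=True)
    b  = y.mean(axis=(0, 2), keepdims=True)
    c  = y.mean(axis=(0, 1), keepdims=True)
    ab = y.mean(axis=2, keepdims=True)
    ac = y.mean(axis=1, keepdims=True)
    bc = y.mean(axis=0, keepdims=True)
    return y - ab - ac - bc + a + b + c - g

def rank1_fraction(z, iters=50):
    """Fraction of the interaction sum of squares captured by the best single
    multiplicative term g_i * h_j * e_k, fitted by alternating least squares."""
    s, t, u = z.shape
    h, e = np.ones(t), np.ones(u)
    for _ in range(iters):
        g = np.einsum('ijk,j,k->i', z, h, e) / ((h @ h) * (e @ e))
        h = np.einsum('ijk,i,k->j', z, g, e) / ((g @ g) * (e @ e))
        e = np.einsum('ijk,i,j->k', z, g, h) / ((g @ g) * (h @ h))
    fit = np.einsum('i,j,k->ijk', g, h, e)
    return (fit ** 2).sum() / (z ** 2).sum()

def permutation_pvalue(y, reps=499, seed=0):
    """Naive randomization test: shuffle cells, re-extract the interaction,
    and compare the observed rank-1 fraction with its permutation distribution."""
    rng = np.random.default_rng(seed)
    observed = rank1_fraction(interaction_residuals(y))
    count = 0
    for _ in range(reps):
        perm = rng.permutation(y.ravel()).reshape(y.shape)
        count += rank1_fraction(interaction_residuals(perm)) >= observed
    return (count + 1) / (reps + 1)
```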
In addition to the above testing approaches, a graphical procedure, joint plots, is investigated for diagnosing the type of model to fit to three-way arrays of data. The plot is a multiway analogue of the biplot graphical analysis for two-way matrices. Each observation is represented by a linear combination of inner products of markers which are obtained from three-mode principal component analysis. The relationship between various models and the geometrical configurations of the plots on Euclidean spaces of such markers allows one to diagnose the type of model which fits the data. An example is given to illustrate the simplicity of the technique and the usefulness of this graphical approach in diagnosing models. / Ph. D.
246.
An empirical examination of the impact of JROTC participation on enlistment, retention and attrition / Days, Janet H.; Ang, Yee Ling, 12 1900
Approved for public release; distribution is unlimited. / Our primary research interest is whether participation in the Junior Reserve Officers' Training Corps (JROTC) program influences youths' propensity to enlist and, for those who subsequently enlist, its influence on retention rates and the propensity to reenlist. The novelty of this thesis lies in conducting multivariate analysis of the impact of JROTC participation on enlistment, retention and reenlistment. Our data sources are (1) the 1980 High School and Beyond (HS&B) survey and (2) Defense Manpower Data Center (DMDC) enlisted personnel cohort files from Fiscal Year (FY) 1980 to 2000. We employ a number of econometric models with the HS&B data, including single-equation PROBIT and LOGIT models, two-stage least squares (2SLS) with instrumental variables (IVs) and a bivariate PROBIT equation. Our results show that JROTC positively influences enlistment when we treat JROTC participation as exogenous, for both high school seniors and sophomores. The impact of JROTC participation on military enlistment decisions becomes negligible, however, when we account for self-selection of high school students into the JROTC program. Using PROBIT and LOGIT models on the DMDC data, we find that enlisted personnel who graduated from JROTC are more likely to reenlist than non-JROTC graduates. Using the Cox proportional hazards survival analysis method, we find that JROTC graduates tend to stay longer and to complete their first term more often than non-JROTC graduates. Synthesizing the results, we conclude that policy-makers might find it worthwhile to actively target JROTC cadets for enlistment because, in the long run, it pays off in terms of higher first-term completion rates, which result in cost savings in the form of enlistment bonuses and training costs. One possible extension of our study is to monetize our results for a cost-benefit analysis of the JROTC program vis-à-vis other recruitment programs. Quantifying the net benefits and costs of the JROTC program will allow policy-makers to make more informed decisions with regard to the future direction of the JROTC program. / Lieutenant, United States Navy / Civilian, Ministry of Defense Singapore
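For concreteness, a single-equation probit of the kind listed above can be sketched with statsmodels; the data frame, variable names and coefficients below are entirely hypothetical stand-ins for the HS&B variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical synthetic data standing in for the survey variables.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "jrotc": rng.integers(0, 2, 1000),
    "test_score": rng.standard_normal(1000),
    "parent_income": rng.standard_normal(1000),
})
df["enlist"] = (0.4 * df["jrotc"] - 0.3 * df["test_score"]
                + rng.standard_normal(1000) > 0.8).astype(int)

# Single-equation probit of enlistment on JROTC participation and controls
# (treating participation as exogenous, as in the first set of models).
X = sm.add_constant(df[["jrotc", "test_score", "parent_income"]])
probit = sm.Probit(df["enlist"], X).fit(disp=0)
print(probit.summary())
```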
247.
Composite nonparametric tests in high dimension / Villasante Tezanos, Alejandro G., 01 January 2019
This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means both the sample size (n) and dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low powers. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined small-sample approaches for high-dimensional data in the multi-group setting, (2) propose and study a fully nonparametric approach, and (3) conduct an extensive comparison of the proposed methods with some existing ones in a simulation study.
When treatment effects can meaningfully be formulated in terms of means, a semiparametric approach under equal and unequal covariance assumptions is investigated. Composites of F-type statistics are used to construct two tests. One test is a moderate-p version, in which the test statistic is centered by its asymptotic mean; the other is a large-p version that uses an asymptotic-expansion-based finite-sample correction for the mean of the test statistic. These tests do not make any distributional assumptions and are therefore nonparametric in a sense. The theory for the tests requires only mild assumptions to regulate the dependence. Simulation results show that, for moderately small samples, the large-p version yields a substantial gain in size with a small power trade-off.
In some situations mean-based inference is not appropriate, for example for data that are on an ordinal scale or heavy-tailed. For these situations, a high-dimensional fully nonparametric test is proposed. In the two-sample situation, a composite of a Wilcoxon-Mann-Whitney type test is investigated. The assumptions needed are weaker than those in the semiparametric approach. Numerical comparisons with the moderate-p version of the semiparametric approach show that the nonparametric test has very similar size but achieves superior power, especially for skewed data with some amount of dependence between variables.
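The variable-by-variable rank idea can be sketched as follows; this is a loose illustration of coordinate-wise Wilcoxon-Mann-Whitney effects averaged across coordinates, not the dissertation's actual composite statistic or its studentization.

```python
import numpy as np
from scipy.stats import rankdata

def coordinatewise_wmw_effects(x, y):
    """Estimated relative effects P(X_j < Y_j) + 0.5 * P(X_j = Y_j), one per variable,
    for two samples x (n1 x p) and y (n2 x p)."""
    n1, p = x.shape
    n2 = y.shape[0]
    effects = np.empty(p)
    for j in range(p):
        ranks = rankdata(np.concatenate([x[:, j], y[:, j]]))  # mid-ranks handle ties
        effects[j] = (ranks[n1:].mean() - (n2 + 1) / 2) / n1
    return effects

def composite_summary(x, y):
    """Average deviation of the coordinate-wise effects from 0.5; near zero when
    there is no group effect in any coordinate."""
    return np.mean(coordinatewise_wmw_effects(x, y) - 0.5)
```

Because only the marginal ranks enter, such a summary is invariant under monotone transformations applied variable by variable, which is the property highlighted in the final paragraph below.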
Finally, we conduct an extensive simulation to compare our proposed methods with other nonparametric tests and rank transformation methods. A wide spectrum of simulation settings is considered. These settings include a variety of heavy-tailed and skewed data distributions, homoscedastic and heteroscedastic covariance structures, various amounts of dependence and choices of tuning (smoothing window) parameter for the asymptotic variance estimators. The fully nonparametric and the rank transformation methods behave similarly in terms of type I and type II errors. However, the two approaches fundamentally differ in their hypotheses. Although there are no formal mathematical proofs for the rank transformations, they have a tendency to provide immunity against effects of outliers. From a theoretical standpoint, our nonparametric method essentially uses variable-by-variable ranking, which naturally arises from estimating the nonparametric effect of interest. As a result, our method is invariant against application of any monotone marginal transformations. For a more practical comparison, real data from an electroencephalogram (EEG) experiment are analyzed.
248.
Aspects of the pre- and post-selection classification performance of discriminant analysis and logistic regression / Louw, Nelmarie, 12 1900
Thesis (PhD)--Stellenbosch University, 1997. / One copy microfiche. / ENGLISH ABSTRACT: Discriminant analysis and logistic regression are techniques that can be used to classify entities of unknown origin into one of a number of groups. However, the underlying models and assumptions for application of the two techniques differ. In this study, the two techniques are compared with respect to classification of entities.

Firstly, the two techniques were compared in situations where no data-dependent variable selection took place. Several underlying distributions were studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, the sample sizes from the different groups and the correlation structure between the variables were varied to obtain a large number of different configurations. The cases of two and three groups were studied. The most important conclusions are: for normal and double exponential data linear discriminant analysis outperforms logistic regression, especially in cases where the ratio of the number of variables to the total sample size is large. For lognormal data, logistic regression should be preferred, except in cases where the ratio of the number of variables to the total sample size is large.

Variable selection is frequently the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset has to be selected for use in further analyses. Despite the fact that variable selection is often used, the influence of a selection step on further analyses of the same data is often completely ignored. An important aim of this study was to develop new selection techniques for use in discriminant analysis and logistic regression. New estimators of the post-selection error rate were also developed. A new selection technique, cross model validation (CMV), that can be applied both in discriminant analysis and logistic regression, was developed. This technique combines the selection of variables and the estimation of the post-selection error rate. It provides a method to determine the optimal model dimension, to select the variables for the final model and to estimate the post-selection error rate of the discriminant rule. An extensive Monte Carlo simulation study comparing the CMV technique to existing procedures in the literature was undertaken. In general, this technique outperformed the other methods, especially with respect to the accuracy of estimating the post-selection error rate.

Finally, pre-test type variable selection was considered. A pre-test estimation procedure was adapted for use as a selection technique in linear discriminant analysis. In a simulation study, this technique was compared to CMV and was found to perform well, especially with respect to correct selection. However, this technique is only valid for uncorrelated normal variables, and its applicability is therefore limited.

A numerically intensive approach was used throughout the study, since the problems that were investigated are not amenable to an analytical approach.
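The flavour of the pre-selection comparison is easy to reproduce with scikit-learn; in the sketch below the simulation design, dimensions and mean shift are arbitrary illustrations rather than the configurations used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def error_rates(p=10, n_train=40, n_test=2000, shift=0.7, lognormal=False):
    """Holdout error of LDA vs. logistic regression for two groups whose means
    differ by `shift` in every coordinate (lognormal case: exponentiated normals)."""
    def draw(n, group):
        z = rng.standard_normal((n, p)) + shift * group
        return np.exp(z) if lognormal else z
    x_tr = np.vstack([draw(n_train, 0), draw(n_train, 1)])
    y_tr = np.repeat([0, 1], n_train)
    x_te = np.vstack([draw(n_test, 0), draw(n_test, 1)])
    y_te = np.repeat([0, 1], n_test)
    lda = LinearDiscriminantAnalysis().fit(x_tr, y_tr)
    lr = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
    return 1 - lda.score(x_te, y_te), 1 - lr.score(x_te, y_te)

print("normal data    (LDA, LR):", error_rates())
print("lognormal data (LDA, LR):", error_rates(lognormal=True))
```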
/ AFRIKAANSE OPSOMMING: Linear discriminant analysis and logistic regression are techniques that can be used for the classification of items of unknown origin into one of a number of groups. The underlying models and assumptions for the use of the two techniques are, however, different. In this study the two techniques are compared with respect to the classification of items.

Firstly, the two techniques were compared in a setting in which no data-dependent variable selection takes place. Several underlying distributions were studied: the normal distribution, the double exponential distribution and the lognormal distribution. The number of variables, the sample sizes from the respective groups and the correlation structure between the variables were varied to obtain a large number of configurations. The cases of two and three groups were studied. The most important conclusions that can be drawn from the study are: for normal and double exponential data linear discriminant analysis performs better than logistic regression, especially in cases where the ratio of the number of variables to the total sample size is large. In the case of data from a lognormal distribution, logistic regression should be the method of choice, unless the ratio of the number of variables to the total sample size is large.

Variable selection is often the first step in statistical analyses. A large number of potentially important variables are observed, and an optimal subset is chosen for use in the further analyses. Despite the fact that variable selection is often used, the influence that a selection step has on further analyses of the same data is often completely ignored. An important aim of the study was to develop new selection techniques that can be used in discriminant analysis and logistic regression. Attention was also given to the development of estimators of the error rate of a discriminant function formed from selected variables. A new selection technique, cross model validation (CMV), which can be used for the selection of variables in both discriminant analysis and logistic regression, was developed. This technique handles the selection of variables and the estimation of the post-selection error rate in one step, and provides a method to determine the optimal model dimension, to choose the variables to be included in the model, and also to estimate the post-selection error rate of the discriminant function. An extensive simulation study, in which the proposed CMV technique was compared with other procedures in the literature, was undertaken for both discriminant analysis and logistic regression. In general this technique performed better than the other methods considered, especially with respect to the accuracy with which the post-selection error rate is estimated.

Finally, attention was also given to pre-test type selection. A technique was developed that uses a pre-test estimation method to select variables for inclusion in a linear discriminant function. This technique was compared with the CMV technique in a simulation study and performs very well, especially with respect to correct selection. However, this technique is only valid for uncorrelated normal variables, which limits its use.

A numerically intensive approach was used throughout the study. This was necessitated by the fact that the problems investigated cannot be handled by means of an analytical approach.
249.
Modelling multivariate survival data using semiparametric models / Lee, Yau-wing (李友榮), January 2000
Published or final version / Statistics and Actuarial Science / Master of Philosophy
250.
Multivariate statistical strategies for the diagnosis of space-occupying liver disease / Stempski, Mark Owen, January 1987
This dissertation investigated the use of a variety of multivariate statistical procedures to answer questions regarding the value of a number of medical tests and procedures in the diagnosis of space-occupying liver disease. Also investigated were some aspects of test-ordering behavior by physicians. A basic methodology was developed to deal with archival data, and a number of methodological problems were addressed. Discriminant function analysis was used to determine which procedures and tests served to provide the best classification of disease entities. Although the results were not spectacular, some variables, including a physical examination variable and a number of laboratory procedures, were identified as being important. A more detailed analysis of the role of the laboratory variables was afforded by the use of stepwise logistic regression. In these analyses pairs of disease classifications were compared. Two of the more specific laboratory tests, total bilirubin and alkaline phosphatase, entered into the equations to provide a fit to the data. Logistic regression analyses employing patient variables mirrored the results obtained with the discriminant function analyses. Liver-spleen scan indicants were also employed as predictor variables in a series of logistic regression analyses. In general, for a range of comparisons, those indicants cited in the literature as being valuable in discriminating between disease entities entered into the equations. Log-linear models were used to investigate test-ordering behavior. In general, test ordering was independent of department, the sole exception being the Gynecology-oncology department, which relies heavily on ultrasound. Log-linear analyses investigating the use of a number of procedures showed differential use of procedures consistent with what is usually suggested in the medical literature for the combination of different imaging and more specialized procedures. Finally, a set of analyses investigated the ordering of a number of procedures relative to specific disease classifications. This set of analyses suffers, as do a number of the other analyses, from insufficient numbers of cases. However, some indications of differential performance of tests for different disease classifications were evident. Suggestions for further study concentrated on the development of experimental procedures given the results of this study.
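As an illustration of the stepwise logistic regression step described above, a simple forward selection by AIC can be sketched with statsmodels; the variable names and synthetic data are hypothetical and do not come from the dissertation's archival data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise_logit(X, y):
    """Greedy forward selection of predictors for a binary logit model,
    adding at each step the variable that most improves AIC."""
    selected, remaining = [], list(X.columns)
    best_aic = sm.Logit(y, np.ones(len(y))).fit(disp=0).aic  # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        aics = {v: sm.Logit(y, sm.add_constant(X[selected + [v]])).fit(disp=0).aic
                for v in remaining}
        v, a = min(aics.items(), key=lambda kv: kv[1])
        if a < best_aic:
            selected.append(v)
            remaining.remove(v)
            best_aic, improved = a, True
    return selected

# Hypothetical usage with synthetic data standing in for the laboratory variables:
rng = np.random.default_rng(3)
X = pd.DataFrame(rng.standard_normal((200, 5)),
                 columns=["bilirubin", "alk_phosphatase", "ast", "albumin", "age"])
y = (0.8 * X["bilirubin"] + 0.6 * X["alk_phosphatase"]
     + rng.standard_normal(200) > 0).astype(int)
print(forward_stepwise_logit(X, y))
```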