Global ETD Search

1	The problem of classifying members of a population into groups Flora, Roger Everette January 1965 A model is assumed in which individuals are to be classified into groups as to their "potential" with respect to a given characteristic. For example, one may wish to classify college applicants into groups with respect to their ability to succeed in college. Although actual values for the "potential," or underlying variable of classification, may be unobservable, it is assumed possible to divide the individuals into groups with respect to this characteristic. Division into groups may be accomplished either by fixing the boundaries of the underlying variable of classification or by fixing the proportion of the individuals which may belong to a given group. For discriminating among the different groups, a set of measurements is obtained for each individual. In the example above, for instance, classification might be based on test scores achieved by the applicants on a set of tests administered to them. Since the value of the underlying variable of classification is unobservable, we may assign, in place of this variable, a characteristic random variable to each individual. The characteristic variable will be the same for every member of a given group. We then consider a choice of characteristic random variable and a linear combination of the observed measurements such that the correlation between the two is a maximum with respect to both the coefficients of the different measurements and the characteristic variable. If a significant correlation is found, one may then use as a discriminant for a randomly selected individual the linear combination obtained by using the coefficients found by the above procedure.
In order to facilitate a test of validity for the proposed discriminant function, the distribution of a suitable function of the above correlation coefficient is found under the null hypothesis of no correlation between the underlying variable of classification and the observed measurements. A test procedure based on the statistic for which the null distribution is found is then described. Special consideration is given in the study to the case of only two classification groups with the proportion of individuals to belong to each group fixed. For this case, in addition to obtaining the null distribution, the distribution of the test statistic is also considered under the alternative hypothesis. Low-order moments of the test criterion are obtained, and the approximate power of the proposed test is found for specific cases of the model by fitting an appropriate density to the moments derived. The behavior of the power function as the sample size increases, and as the population multiple correlation between the underlying variable of classification and the observed measurements increases, is also investigated. Finally, the probability of misclassification, or "the problem of shrinkage" as it is often called, is considered. Possible approaches to the problem and some of the difficulties in investigating this problem are indicated. / Ph. D. LD5655.V856 1965.F467 Mathematical decision Statistical decision
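As a rough illustration of the two-group case with a fixed proportion per group, the sketch below maximizes the correlation between a group indicator and a linear combination of the measurements by least squares. With only two groups the correlation does not depend on which two values the characteristic variable takes, so a 0/1 indicator is used. Everything here is an assumption for illustration: the simulated data, the names `max_correlation`, `top_fraction`, and `potential`, and in particular the permutation test, which stands in for the null distribution derived in the thesis and is not the author's procedure.

```python
import numpy as np

def max_correlation(X, labels):
    """Largest correlation between the group indicator and any linear
    combination of the columns of X, plus the maximizing coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(labels, dtype=float)
    Xc = X - X.mean(axis=0)          # center the measurements
    yc = y - y.mean()                # center the characteristic variable
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return np.corrcoef(Xc @ coef, yc)[0, 1], coef

# Hypothetical data: 60 individuals, 3 measurements each.
rng = np.random.default_rng(1)
n, p, top_fraction = 60, 3, 0.25
X = rng.normal(size=(n, p))

# Simulated "potential"; in practice this variable is unobservable and
# only the resulting group membership is known.
potential = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)
cutoff = np.quantile(potential, 1 - top_fraction)
labels = (potential > cutoff).astype(int)   # fixed proportion in top group

R_obs, coef = max_correlation(X, labels)

# Assumed stand-in for the thesis's null distribution: shuffle the labels
# to break any association with the measurements, then recompute.
perm_R = np.array([max_correlation(X, rng.permutation(labels))[0]
                   for _ in range(500)])
p_value = (1 + np.sum(perm_R >= R_obs)) / (1 + len(perm_R))

# Discriminant score for a new individual (deviation from the sample mean).
x_new = rng.normal(size=p)
score = (x_new - X.mean(axis=0)) @ coef
```

If `p_value` is small, the linear combination given by `coef` would be the candidate discriminant; a larger `score` points toward the top group.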
2	The Problem of classifying members of a population on a continuous scale Barnett, Frederic Charles January 1964 Having available a vector of measurements for each individual in a random sample from a multivariate population, we assume in addition that these individuals can be ranked on some criterion of interest. As an example of this situation, we may have measured certain physiological characteristics (blood pressure, amounts of certain chemical substances in the blood, etc.) in a random sample of schizophrenics. After a series of treatments (perhaps shock treatments, doses of a tranquillizer, etc.) these individuals might be ranked on the basis of favorable response to treatment. We shall in general be interested in predicting which individuals in a new group would respond most favorably. Thus, in the example, we should wish to know which individuals would most likely benefit from the series of treatments. Some difficulties in applying the classical discriminant function analysis to problems of this type are noted. We have chosen to use the multiple correlation coefficient of ranks with measured variates as a statistic in testing whether ranks are associated with measurements. We give to this coefficient the name "quasi-rank multiple correlation coefficient", and proceed to find its first four exact moments under the assumption that the underlying probability distribution is multivariate normal. Two methods are used to approximate the power of tests based on the quasi-rank multiple correlation coefficient in the case of just one measured variate. The agreement for a sample size of twenty is quite good. The asymptotic relative efficiency of the squared quasi-rank coefficient vis-a-vis the squared standard multiple correlation coefficient is 9/π², a result which does not depend on the number of measured variates.
If the null hypothesis that ranks are not associated with measurements is rejected, it is appropriate to use the measurements in some way to predict the ranks. The quasi-rank multiple correlation coefficient is, however, the maximized simple correlation of ranks with linear combinations of the measured variates. The maximizing linear combination of measured variates is taken as a discriminant function, and its values for subsequently chosen individuals are used to rank these individuals in order of merit. A demonstration study is included in which we employ a random sample of size twenty from a six-variate normal distribution of known structure (for which the population multiple correlation coefficient is .655). The null hypothesis of no association of ranks with measurements is rejected in a two-sided size .05 test. The discriminant function is obtained and is used to "predict" the true ranks of the twenty individuals in the sample. The predicted ranks represent the true ranks rather well, with no predicted rank more than four places from the true rank. For other populations in which the population multiple correlation coefficient is greater than .655 we should expect to obtain even better sets of predicted ranks. In developing the moments of the quasi-rank multiple correlation coefficient it was necessary to obtain exact moments of a certain linear combination of quasi-ranges in a random sample from a normal population. Since this quasi-range statistic may be useful in other investigations, we include also its moment generating function and some derivatives of this moment generating function. / Ph. D. LD5655.V856 1964.B375 Mathematical decision Statistical decision
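The ranking procedure described in this abstract can be sketched as follows: regress the observed ranks on the measured variates, take the correlation between fitted values and ranks as the quasi-rank multiple correlation coefficient, and use the fitted linear combination to rank individuals. This is a minimal sketch under stated assumptions, not the thesis's own computation. The function names, the simulated six-variate data, and the weights in `beta` are hypothetical and do not reproduce the population structure of the demonstration study.

```python
import numpy as np

def quasi_rank_multiple_correlation(X, ranks):
    """Multiple correlation of ranks with the columns of X, together with
    the maximizing linear combination (least-squares coefficients)."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(ranks, dtype=float)
    Xc = X - X.mean(axis=0)          # center the measured variates
    rc = r - r.mean()                # center the ranks
    coef, *_ = np.linalg.lstsq(Xc, rc, rcond=None)
    R = np.corrcoef(Xc @ coef, rc)[0, 1]
    return R, coef

def predict_ranks(X_new, coef):
    """Rank individuals by discriminant score; rank 1 is the lowest score."""
    scores = np.asarray(X_new, dtype=float) @ coef
    return scores.argsort().argsort() + 1

# Hypothetical sample: 20 individuals, 6 measured variates.
rng = np.random.default_rng(0)
n, p = 20, 6
X = rng.normal(size=(n, p))

# Assumed criterion of interest; only its ranks are used below.
beta = np.array([1.0, 0.8, 0.5, 0.0, 0.0, 0.0])
criterion = X @ beta + rng.normal(size=n)
true_ranks = criterion.argsort().argsort() + 1

R, coef = quasi_rank_multiple_correlation(X, true_ranks)
predicted = predict_ranks(X, coef)
max_displacement = int(np.abs(predicted - true_ranks).max())
```

Here `max_displacement` plays the role of the abstract's "places from the true rank" comparison. Predicting the ranks of the same sample used to fit the coefficients, as the demonstration study does, will tend to look better than prediction for new individuals.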
