Selected topics in statistical discriminant analysis

Ounpraseuth, Songthip T. Young, Dean M. January 2006
Thesis (Ph.D.)--Baylor University, 2006. / Includes bibliographical references (p. 110-114).

Διαχωριστική ανάλυση - λογιστική παλινδρόμηση [Discriminant analysis - logistic regression]

Χουντής, Βασίλειος 07 July 2010
Today there is a great need to classify observations into known groups (populations) and also to make predictions. Many methods exist whose purpose is to classify observations. In this thesis I describe two of the most important methods widely used in statistics: discriminant analysis and logistic regression. In the first part I explain what discriminant analysis is, briefly give some applications of the method, and describe how it differs from cluster analysis. I then analyze the separation of two populations that follow the normal distribution and the criteria that must be taken into account. The goal is to construct a function that separates the two populations as well as possible. Perfect separation does not exist; that is, the function may misclassify an observation into one of the two groups. For this reason we must take into account the misclassification costs and the prior probabilities. The optimal separation is achieved by minimizing the cost of misclassification. In Section 3 I derive the classification function when the two populations have equal covariance matrices (linear classification rule) and when they have unequal covariance matrices (quadratic classification rule). Once the classification function has been constructed, the next step is to evaluate it. I describe two approaches to evaluation (validation): computation of the error rate and the holdout procedure. In Section 5 I present Fisher's discriminant analysis, the assumptions he made, and how he arrived at the same classification function. I then generalize discriminant analysis to g populations and give the new form of the classification function for equal and unequal covariance matrices (linear and quadratic discriminant scores). 
I interpret the linear discriminant score geometrically. In the last section I study Fisher's method for g populations and prove several theorems. In the second part of the thesis I describe another classification procedure, logistic regression. I briefly give some applications of the method and discuss when it is used. Starting from the simple linear regression model, I describe the problems that arise when the response variable is binary and how they are handled, arriving at the form of the simple logistic function. I describe the properties of the logistic response function and how the logistic regression model is fitted using maximum likelihood estimators. I then give the interpretation of the regression coefficient and the form of the log-likelihood function for repeated observations. In Section 4 I describe the multiple logistic regression model and in Section 5 how the model is built. I test whether some predictor variables can be dropped, using a statistic called the model deviance as well as the likelihood ratio test. Before using the model in practice, however, I examine its adequacy, that is, whether it satisfies the properties of the logistic response function, and I look for outliers and the most influential observations. In Sections 7 and 8 I describe inference for the logistic regression parameters and for the mean response, while in Section 9 I describe how new observations are predicted. Finally, I discuss polytomous logistic regression and briefly describe the similarities and differences between discriminant analysis and logistic regression.
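The linear classification rule described in the abstract above (two normal populations with a common covariance matrix, with prior probabilities and misclassification costs taken into account) can be sketched as follows. This is an illustrative sketch only, not code from the thesis: the function names, the restriction to two variables (so that the matrix inverse can be written by hand), and all numeric values are assumptions made for the example. An observation x is allocated to population 1 when (mu1 - mu2)' S^-1 x - (1/2)(mu1 - mu2)' S^-1 (mu1 + mu2) is at least ln[(c(1|2)/c(2|1)) (p2/p1)].

```python
import math


def inv2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def linear_rule(mu1, mu2, sigma, p1=0.5, p2=0.5, c12=1.0, c21=1.0):
    """Build a two-population linear classifier (hypothetical helper).

    mu1, mu2 : population mean vectors (length 2)
    sigma    : common 2x2 covariance matrix
    p1, p2   : prior probabilities of populations 1 and 2
    c12      : cost c(1|2) of assigning a population-2 item to population 1
    c21      : cost c(2|1) of assigning a population-1 item to population 2
    """
    s_inv = inv2(sigma)
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # a = Sigma^{-1} (mu1 - mu2): coefficients of the linear discriminant
    a = [
        s_inv[0][0] * diff[0] + s_inv[0][1] * diff[1],
        s_inv[1][0] * diff[0] + s_inv[1][1] * diff[1],
    ]
    # (1/2) a' (mu1 + mu2): discriminant value at the midpoint of the means
    midpoint = 0.5 * (a[0] * (mu1[0] + mu2[0]) + a[1] * (mu1[1] + mu2[1]))
    # Costs and priors shift the cut-off away from zero
    threshold = math.log((c12 / c21) * (p2 / p1))

    def classify(x):
        score = a[0] * x[0] + a[1] * x[1] - midpoint
        return 1 if score >= threshold else 2

    return classify


# Hypothetical populations: means (0, 0) and (2, 2), identity covariance.
identity = [[1.0, 0.0], [0.0, 1.0]]
equal = linear_rule((0.0, 0.0), (2.0, 2.0), identity)
skewed = linear_rule((0.0, 0.0), (2.0, 2.0), identity, p1=0.9, p2=0.1)

print(equal((1.2, 1.2)))   # 2: the point lies closer to the second mean
print(skewed((1.2, 1.2)))  # 1: a large prior for population 1 moves the boundary
```

With equal priors and costs the threshold is zero and the rule reduces to assigning each observation to the nearer mean in the metric defined by the covariance matrix; unequal priors or costs only move the boundary, they do not rotate it.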

Talento Motor : estudo dos indicadores somatomotores na seleção de escolares para o futebol [Motor talent: a study of somatomotor indicators in the selection of schoolchildren for soccer]

Santos, Fábio Rosa dos January 2013
Talent identification in sport is a current and widely discussed topic, debated on both conceptual and methodological grounds. For soccer in particular, the subject has generated considerable discussion in academia and in the daily practice of Brazilian clubs. The aim of this study was to describe the profile of children and young people who play soccer and to identify indicators of sports performance that allow parameters and methodologies to be developed for detecting potential motor talents for soccer. The sample comprised 361 young people aged 10 to 13 years (chronological age): 188 schoolchildren randomly selected from the PROESP-Br database and 173 soccer players, selected by accessibility, from two soccer clubs in the state of Rio Grande do Sul (RS). The measurements were height, arm span, body mass, flexibility (sit and reach with bench), abdominal strength/endurance (sit-ups per minute), speed (20 meters), agility (agility square), explosive strength of the lower limbs (horizontal jump), and cardiorespiratory fitness (6 minutes). Descriptive statistics (mean and standard deviation) were used to describe the anthropometric and physical fitness profile. Discriminant analysis was used to identify the predictor variables, and the analyses were run in the SPSS statistical package, version 17.0. The athletes had higher mean values than the schoolchildren on all variables at the ages studied, except for body mass, height, and arm span at 11 years of age. 
The predictor variables that discriminated the athletes from the schoolchildren were: at 10 years of age, flexibility, abdominal strength/endurance, agility, speed, and aerobic endurance; at 11 years, abdominal strength/endurance, agility, aerobic endurance, and speed; at 12 years, aerobic endurance, agility, speed, abdominal strength/endurance, and explosive strength of the lower limbs; and at 13 years, aerobic endurance, agility, flexibility, speed, and abdominal strength/endurance. The results showed that the test battery of Projeto Esporte Brasil (Project Sport Brazil) made it possible to discriminate soccer athletes from schoolchildren and to build models capable of identifying schoolchildren with an athlete profile. These models can therefore form part of the selection criteria for young soccer players and can also serve as an auxiliary tool for school Physical Education teachers.

Geochemical and petrographic characterization of Platreef pyroxenite package P1, P2, P3 and P4 units at the Akanani prospect area, Bushveld Complex, South Africa

Mandende, Hakundwi January 2014
Magister Scientiae - MSc / This study focuses on the Akanani prospect area, approximately 25 km north-west of the town of Mokopane, Limpopo Province, where exploration geologists have classified the ‘pyroxenitic’ units into P1, P2, P3 and P4 units, in upward order of succession, on the basis of their textures, mineralogy and colour. The primary aim of this study is to establish the distinctive geochemical and mineralogical characteristics that can be used to identify each unit (P1 to P4) and, in so doing, to define the major geochemical, petrographic and mineralogical variables that will facilitate exploration for and recovery of PGE and BMS mineralisation. Geochemical and mineralogical variation studies were carried out on cores from ZF044, ZF045, ZF048, ZF057, ZF078, ZF082 and M0023, located in the Platreef at the Akanani prospect area on the farms Moordkopje 813LR and Zwartfontein 814LR. Using a combination of multivariate statistical techniques (factor, cluster and discriminant analysis) and mineralogical studies (CIPW norm, microprobe analysis, petrography), the study demonstrates that the Platreef at Akanani comprises at least four lithological units. The basal pyroxenite portion, referred to as the P1 unit, comprises chromitite, pyroxenites and feldspathic pyroxenites with associated Cr, TiO2, chromite, pyroxenes, hematite and Fe2O3. The mineralized section of the P2 unit is characterized by harzburgite, serpentinized harzburgite and, in places, orthopyroxenites, consistent with high MgO and LOI contents. The feldspathic portion, referred to here as the P3 unit, is characterized by a feldspathic pyroxenite containing higher Al2O3, Na2O, K2O, albite, hypersthene and SiO2. The topmost portion of the P4 unit comprises CaO, diopside, ilmenite, anorthite, apatite and P2O5. These units can be interpreted to have formed by three separate magma pulses. 
If the P4 unit is instead a hybrid melt of assimilated Platreef that interacted with intruding Main Zone magma, the number of magma pulses reduces to two. The classification of the P1, P2, P3 and P4 units of the Platreef at Akanani shows that the criteria used by mining personnel to classify the four lithological units are not definitive and therefore not highly reliable. Although various multivariate statistical techniques were employed, relatively similar elemental associations were obtained, highlighting the importance of this approach. The strongly positive correlation between sulphides, PGEs and chromite at Akanani is consistent with an orthomagmatic deposit that had been disturbed by significant hydrothermal activity, while in places a good BMS-PGE relationship is commonly associated with the main chromitite stringers in P1. Mineral and whole-rock compositions of silicate rocks highlight the strongly magnesian nature of the ultramafic P2 unit. Mineral chemistry studies of chromite, orthopyroxene, olivine, clinopyroxene and plagioclase are consistent with the multi-emplacement model. Convective exchange resulted in the enrichment of iron at the bottom of the stagnant chamber, while incompatible elements migrated upwards, consistent with iron depletion with stratigraphic height. Injection of P1 magma and subsequent mixing with country rocks gave rise to the formation of chromitites and the addition of a plagioclase component to the intruding magma. A normal fractionation trend is suggested between P2 and P3, consistent with enrichment of MgO in P2 and enrichment of Al2O3, Na2O, SiO2 and K2O in P3. The An% of 84.4 of plagioclase, coupled with CaO enrichment in P4, is suggestive of some Main Zone influence; the P4 unit can be interpreted as having formed through partial melting and recrystallization of P3 in response to the intrusion of the Main Zone magma. 
There exists a good correlation between the modal mineralogy and mineral chemistry as determined optically, the norm as determined by the CIPW norm and the whole-rock geochemical results as determined by multivariate statistics and conventional methods.

Postcraniometric analysis of ancestry among modern South Africans

Liebenberg, Leandi January 2015
The primary role of a physical anthropologist is to provide sufficient information to assist in the individualisation of unknown skeletal remains. This is often achieved in establishing a biological profile of the deceased, of which ancestry is an essential aspect. Several successful osteometric and morphological approaches have been developed to facilitate the estimation of ancestry from the cranium. However, the cranium is not always available for analysis, emphasising a need for postcranial alternatives. The postcranial skeleton is frequently labelled as too variable and unreliable to provide an accurate assessment of ancestry. Yet, numerous studies utilise the postcrania for sex and stature estimation, where the a priori knowledge of ancestry results in higher accuracy. Thus, the presence of postcranial differences among populations when investigating other biological parameters inherently demonstrates the potential for the estimation of ancestry. The purpose of this study was to quantify postcranial variation among modern, peer-reported black, white and coloured South Africans. A series of 39 standard measurements were taken from 11 postcranial bones, namely the clavicle, scapula, humerus, radius, ulna, sacrum, pelvis, femur, tibia, fibula and calcaneus. The sample consisted of 360 modern South African individuals (120 black, 120 white, 120 coloured) from the Pretoria Bone and Kirsten Collections housed at the University of Pretoria and the University of Stellenbosch, respectively. Group differences were explored with ANOVA and Tukey’s honestly significant difference test (HSD). Group means were used to create univariate sectioning points for each variable indicated as significant with ANOVA. Where two of the three groups had similar mean values, the groups were pooled for the creation of the sectioning points. Multivariate classification models were employed using linear and flexible discriminant analysis (LDA and FDA, respectively). 
Classification accuracies were compared to evaluate which model yielded the best results. The results demonstrated variable patterns of group overlap. Black and coloured South Africans displayed similar means for breadth measurements, and black and white South Africans showed similar means for the maximum length of distal limb elements. The majority of group variation is attributed to differences in size and robusticity, where white South Africans are overall larger and more robust than black and coloured South Africans. Accuracies for the univariate sectioning points ranged from 43% to 87%, with iliac breadth performing the best. However, the majority of the univariate sectioning points can only classify individuals into two groups rather than three because of similar group means. Multivariate bone models created using all measurements per bone resulted in accuracies ranging from 46% to 62% (LDA) and 41% to 66% (FDA). Multivariate subsets consisting of numerous different measurement combinations from several skeletal elements achieved accuracies as high as 85% (LDA) and 87% (FDA). Ultimately the best results were achieved using combinations of different variables from several skeletal elements. Overall, the multivariate models yielded better results than the univariate approach, as the inclusion of more variables is generally better for maximising group differences. Furthermore, FDA achieved higher accuracies than the more traditional approach of LDA. Despite the significant overlap among the groups, the postcranial skeleton has proven to be proficient in distinguishing the three groups. Thus, even in a heterogeneous population, a multivariate postcraniometric approach can be used to estimate ancestry with high accuracy. / Dissertation (MSc)--University of Pretoria, 2015. / Anatomy / Unrestricted
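The univariate sectioning points mentioned in the abstract above can be illustrated with a small sketch. It assumes that a sectioning point is the midpoint between two group means, which is a common convention but is not confirmed by the abstract; the function names and the measurement values are hypothetical and are not data from the study.

```python
from statistics import mean


def sectioning_point(group_a, group_b):
    """Midpoint between the two group means (assumed definition)."""
    return (mean(group_a) + mean(group_b)) / 2


def classify(value, point, low_label, high_label):
    """Assign the label of the group whose mean lies on the same side of the point."""
    return high_label if value > point else low_label


def accuracy(samples, point, low_label, high_label):
    """Share of (value, true_label) pairs classified correctly."""
    hits = sum(
        classify(value, point, low_label, high_label) == label
        for value, label in samples
    )
    return hits / len(samples)


# Hypothetical measurements (mm) for two groups with different means.
group_1 = [150, 152, 155, 158, 160]  # mean 155
group_2 = [158, 162, 165, 168, 172]  # mean 165

point = sectioning_point(group_1, group_2)
samples = [(v, "group_1") for v in group_1] + [(v, "group_2") for v in group_2]

print(point)                                          # 160.0
print(accuracy(samples, point, "group_1", "group_2"))  # 0.9
```

A single cut-off can separate only two groups, which matches the abstract's remark that groups with similar means had to be pooled before a sectioning point could be created.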

Random Matrix Theory: Selected Applications from Statistical Signal Processing and Machine Learning

Elkhalil, Khalil 06 1900
Random matrix theory is an outstanding mathematical tool that has demonstrated its usefulness in many areas ranging from wireless communication to finance and economics. The main motivation behind its use comes from the fundamental role that random matrices play in modeling unknown and unpredictable physical quantities. In many situations, meaningful metrics expressed as scalar functionals of these random matrices arise naturally. Along this line, the present work leverages tools from random matrix theory in an attempt to answer fundamental questions related to applications from statistical signal processing and machine learning. In the first part, this thesis addresses the development of analytical tools for the computation of the inverse moments of random Gram matrices with one-sided correlation. This question is mainly driven by applications in signal processing and wireless communications, wherein such matrices naturally arise. In particular, we derive closed-form expressions for the inverse moments and show that the obtained results can help approximate several performance metrics of common estimation techniques. Then, we carry out a large dimensional study of discriminant analysis classifiers. Under mild assumptions, we show that the asymptotic classification error approaches a deterministic quantity that depends only on the means and covariances associated with each class as well as the problem dimensions. Such a result permits a better understanding of the underlying classifiers in practical large but finite dimensions, and can be used to optimize the performance. Finally, we revisit kernel ridge regression and study a centered version of it that we call centered kernel ridge regression, or CKRR for short. Relying on recent advances on the asymptotic properties of random kernel matrices, we carry out a large dimensional analysis of CKRR under the assumption that both the data dimension and the training size grow simultaneously large at the same rate. 
In particular, we show that both the empirical and prediction risks converge to a limiting risk that relates the performance to the data statistics and the parameters involved. Such a result is important as it permits a better understanding of kernel ridge regression and allows its performance to be optimized efficiently.

Petrophysical characterization of sandstone reservoirs through boreholes E-S3, E-S5 and F-AH4 using multivariate statistical techniques and seismic facies in the Central Bredasdorp Basin

Mosavel, Haajierah January 2014
Magister Scientiae - MSc / The thesis aims to determine the depositional environments, rock types and petrophysical characteristics of the reservoirs in Wells E-S3, E-S5 and F-AH4 of Area X in the Bredasdorp Basin, offshore South Africa. The three wells were studied using methods including core description, petrophysical analysis, seismic facies and multivariate statistics in order to evaluate their reservoir potential. The thesis includes digital wireline log signatures, 2D seismic data, well data and core analysis from selected depths. Based on core description, five lithofacies were identified as claystone (HM1), fine to coarse grained sandstone (HM2), very fine to medium grained sandstone (HM3), fine to medium grained sandstone (HM4) and conglomerate (HM5). Deltaic and shallow marine depositional environments were also interpreted from the core description based on the sedimentary structures and ichnofossils. The results obtained from the petrophysical analysis indicate that the sandstone reservoirs show a relatively fair to good porosity (range 13-20 %), water saturation (range 17-45 %) and a predicted permeability (range 4-108 mD) for Wells E-S3, E-S5 and F-AH4. The seismic facies model of the study area shows five seismic facies described as parallel, variable amplitude variable continuity, semi-continuous high amplitude, divergent variable amplitude and chaotic seismic facies, as well as a probable shallow marine, deltaic and submarine fan depositional system. Linking lithofacies to seismic facies maps helped to understand and predict the distribution and quality of reservoir packages in the studied wells. Multivariate statistical methods of factor, discriminant and cluster analysis were used. For Wells E-S3, E-S5 and F-AH4, two factors were derived from the wireline log data, reflecting oil and non-oil bearing depths. Cluster analysis delineated oil and non-oil bearing groups with similar wireline properties. 
This thesis demonstrates that the approach taken is useful because petrophysical analysis, seismic facies and multivariate statistics have provided useful information on reservoir quality such as net to gross, depths of hydrocarbon saturation and depositional environment.

Comparative Analysis of Mature Travelers on the Basis of Internet Use

Cho, SeongMin 12 June 2002
Travel and tourism marketers face a highly competitive environment brought on by the changing demographics of the U.S. population, the most significant change being the growth in size of the mature segment of the population. In terms of market size, there are currently 73 million people age 50 and older, comprising nearly one-fourth of the U.S. population (U.S. Census Bureau 2000). That number is expected to rise to 96 million by 2010, representing one-third of the population (Rasmusson 2000). A swelling population is not the only enticement that this age group offers. Many mature consumers have deep pockets and a strong desire to spend. In fact, they control more than three-quarters of the wealth and one-half of the discretionary income in the nation. It is also estimated that they lay claim to three-fourths of the country's financial assets and boast more than $1 trillion in annual buying power. Altogether, this age group accounts for 40 percent of the total consumer demand in the United States (Swartz, 1999). However, despite the recognized significance of the mature market in terms of size and economic potential, little research has been conducted to identify and understand the mature travelers who use the Internet. The main purpose of this study is to profile mature travelers on the basis of Internet use. More specifically, the intention is to compare the demographic and socio-economic characteristics of mature travelers who use the Internet with those of mature travelers who do not. In addition, the study examines whether differences exist between Internet users and Internet non-users among mature travelers with respect to travel behavior. 
To explain the differences between Internet users and Internet non-users in the mature market, the study examines the types of trip selected, the preferred activities during travel, length of stay, travel-related expenditures, type of lodging, type of transportation, number in the travel party, and type of travel party. Data were collected using a mailed questionnaire. A total of 433 responses (23.44 percent of the total target population) were coded and used for data analysis. Data were analyzed using three techniques: chi-square tests of independence, t-tests, and multiple discriminant analysis. The findings suggest that there are numerous differences in demographics, socio-economic characteristics, and travel characteristics between Internet users and Internet non-users among mature travelers. As a whole, for example, the results revealed that mature travelers who use the Internet were more likely to be younger, have higher annual household incomes, and have higher levels of education than mature travelers who do not use the Internet. The results also indicated that mature travelers who are still working are more likely to use the Internet than those who are not working. By understanding and utilizing information gathered from Internet users' and Internet non-users' demographics, socio-economic characteristics, and travel characteristics, tourism planners and marketers can develop appropriate and effective marketing strategies that appeal to mature travelers. / Master of Science
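A chi-square test of independence of the kind named in the abstract above can be sketched as follows. The contingency table is hypothetical (it is not data from the study) and the function names are illustrative; the p-value helper is valid only for one degree of freedom, that is, for a 2x2 table, where the upper tail of the chi-square distribution equals erfc(sqrt(statistic / 2)).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail p-value for 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: rows = still working / not working,
# columns = Internet user / Internet non-user.
table = [[30, 20],
         [20, 30]]

stat = chi_square_statistic(table)
print(stat)                # 4.0
print(p_value_df1(stat))   # about 0.0455
```

For tables larger than 2x2 the statistic is computed the same way, but the p-value requires the chi-square distribution with (r - 1)(c - 1) degrees of freedom, which this sketch does not implement.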

