Global ETD Search

1	Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components Gardner, Sugnet January 2001 Dissertation (PhD)--Stellenbosch University, 2001. / ENGLISH ABSTRACT: Gower and Hand offer a new perspective on the traditional biplot. This perspective provides a unified approach to principal component analysis (PCA) biplots based on Pythagorean distance; canonical variate analysis (CVA) biplots based on Mahalanobis distance; non-linear biplots based on Euclidean embeddable distances; and generalised biplots for use with both continuous and categorical variables. The biplot methodology of Gower and Hand is extended and applied in statistical discrimination and classification. This leads to discriminant analysis by means of PCA biplots, CVA biplots, non-linear biplots and generalised biplots. Properties of these techniques are derived in detail. Classification regions defined for linear discriminant analysis (LDA) are applied in the CVA biplot, leading to discriminant analysis using biplot methodology. Situations where the assumptions of LDA are not met are considered, and various existing alternative discriminant analysis procedures are formulated in terms of biplots: apart from PCA biplots, QDA, FDA and DSM biplots are defined and constructed, and their usage is illustrated. It is demonstrated that biplot methodology naturally provides for managing categorical and continuous variables simultaneously. It is shown through a simulation study that the techniques based on biplot methodology can be applied successfully to the reversal problem with categorical variables in discriminant analysis. Situations occurring in practice where existing discriminant analysis procedures based on distances from means fail are considered.
After discussing self-consistency and principal curves (a form of non-parametric principal components), discriminant analysis based on distances from principal curves (a form of conditional mean) is proposed. This biplot classification procedure based upon principal curves yields much better results. Bootstrapping is considered as a means of describing variability in biplots. Variability in samples as well as of axes in biplot displays receives attention. Bootstrap α-regions are defined, and the ability of these regions to describe biplot variability and to detect outliers is demonstrated. Robust PCA and CVA biplots restricting the role of influential observations on biplot displays are also considered. An extensive library of S-PLUS computer programmes is provided for implementing the various discriminant analysis techniques that were developed using biplot methodology. The application of the above theoretical developments and computer software is illustrated by analysing real-life data sets. Biplots are used to investigate the degree of capital intensity of companies and to serve as an aid in risk management of a financial institution. A particular application of the PCA biplot is the TQI biplot used in industry to determine the degree to which manufactured items comply with multidimensional specifications. A further interesting application is to determine whether an Old-Cape furniture item is manufactured of stinkwood or embuia. A data set provided by the Western Cape Nature Conservation Board consisting of measurements of tortoises from the species Homopus areolatus is analysed by means of biplot methodology to determine if morphological differences exist among tortoises from different geographical regions. Allometric considerations need to be taken into account, and the resulting small sample sizes in some subgroups severely limit the use of conventional statistical procedures.
Biplot methodology is also applied to classification in a diabetes data set, illustrating the combined advantage of using classification with principal curves in a robust biplot, as well as biplot classification where covariance matrices are unequal. A discriminant analysis problem where foraging behaviour of deer might eventually result in a change in the dominant plant species is used to illustrate biplot classification of data sets containing both continuous and categorical variables. As an example of the use of biplots with large data sets, a data set consisting of 16828 lemons is analysed using biplot methodology to investigate differences in fruit from various areas of production, cultivars and rootstocks. The proposed α-bags also provide a means of quantifying the graphical overlap among classes. This method is successfully applied in a multidimensional socio-economic data set to quantify the degree of overlap among different race groups. The application of the proposed biplot methodology in practice has an important byproduct: it provides the impetus for many a new idea, e.g. applying a PCA biplot in industry led to the development of quality regions; α-bags were constructed to represent thousands of observations in the lemons data set, in turn leading to means for quantifying the degree of overlap. This illustrates the enormous flexibility of biplots: biplot methodology provides an infrastructure for many novelties when applied in practice. Multivariate analysis Statistics -- Graphic methods
2	A comparison of the efficiencies of the Gram-Charlier and Pearson frequency functions for fitting certain distributions Bradford, Henry Franklin January 1936 No description available. Algebraic functions. Statistics -- Graphic methods.
3	Analysis of outliers using graphical and quasi-Bayesian methods 馮榮錦, Fung, Wing-kam, Tony. January 1987 published_or_final_version / Statistics / Doctoral / Doctor of Philosophy Outliers (Statistics) - Graphic methods. Bayesian statistical decision theory.
4	Some examples of Pearson's frequency curves Thomson, Mary Gilmore, 1897- January 1940 No description available. Statistics -- Graphic methods. Frequency curves. Mathematical statistics. Pearson, Karl, 1857-1936.
5	Métodos alternativos para análise rápida de parâmetros de qualidade da soja / Alternative methods for rapid analysis of soybean quality parameters Santos, Larissa da Rocha dos 24 February 2017 CAPES; CNPQ / Given the worldwide importance of the soybean crop, it is essential to apply methodologies for the efficient monitoring of the physico-chemical parameters that determine grain quality with adequate agility and reliability. However, the analytical methods used in traditional analyses involve time-consuming techniques and use various equipment and reagents, besides generating chemical residues.
The development of alternative methodologies for this purpose can therefore bring benefits to industries and regulatory bodies as well as to analysts. This study proposes the use of Near Infrared Spectroscopy (NIR) associated with chemometric methods for the construction of multivariate models to predict the percentage of total lipids, acidity index, chlorophyll content, crude protein and moisture in soybean. To build the models, 300 samples of Glycine max (L.) Merrill soybean were evaluated. The spectral data were processed by the method of Partial Least Squares (PLS). The results suggest that the developed models can be used as an alternative methodology to determine physico-chemical parameters and could be applied in quality control in the soybean industry. Soybean Spectrum analysis Statistics - Graphic methods Food Technology
7	Types and levels of data arrangement and representation in statistics as modeled by grade 4 - 7 learners Wessels, Helena Margaretha 28 February 2006 The crucial role of representation in mathematical and statistical modeling and problem solving, as evident in learners' arrangement and representation of statistical data, was investigated with three focus points: data arrangement, data representation and statistical thinking levels. The representation tasks required learners to arrange and represent data through modeling, focusing on spontaneous representations. Successful transnumeration determines the ultimate success of a representation, and the ability to organise data is regarded as critical. Arrangement types increased in sophistication with increased grade level, and the hierarchical nature of arrangement types became apparent when regarded in the context of an adapted SOLO Taxonomy framework. A higher level arrangement strategy pointed to a higher SOLO level of statistical thinking. Learners in the two tasks produced a rich variety of representations, which included idiosyncratic, unsophisticated responses as well as standard statistical representations. The context of the two tasks, the quantitative versus qualitative nature of the data in the tasks, and the statistical tools or representational skills learners have at their disposal played an important role in their representations. Well-planned data handling activities develop representational and higher order thinking skills. The variety of responses and different response levels elicited in the two tasks indicate that the nature of the tasks, rather than the size of the data set, plays a conclusive role in data tasks. Multiple representations by an individual were an indication of successful modeling, are effective in problem solving and are associated with good performance.
The SOLO model, which incorporates a structural approach as well as a multimodal component, proved valuable in the analysis of responses. Using this model, with accompanying acknowledgement of different problem solving paths and the contribution of ikonic support in the concrete symbolic mode, promotes the in-depth analysis of responses. This study contributes to the research in the field of data representation and statistical thinking. The analysis and results led to an integrated picture of Grade 4-7 learners' representation of statistical data and of the statistical thinking levels evident in their representations. / Educational Studies / D. Ed. (Didactics) Statistics education Mathematics education Modeling Representation Arrangement types Transnumeration Statistical thinking levels Multiple representations 372.7
9	The geo-spatial analysis and environmental factors of narcotics hot spots Balchak, Stefanie Wrae 01 January 2005 A mixed methodological approach with two different analytic procedures and multiple data sources was used to examine narcotics hot spots. The first phase compares two methods of hot spot identification: the prediction model and the actual crimes. The second phase involves an intensive study to better understand the phenomenon of drug hot spot areas that are consistently shown to be repeat hot spots. Crime analysis Drug abuse and crime Analysis Narcotics Narcotics Statistics Graphic methods Drug abuse Environmental aspects Crime prevention Drug abuse and crime Social conditions Crime analysis Crime prevention Narcotics. Criminology and Criminal Justice Geographic Information Sciences Substance Abuse and Addiction
10	Web-based geotemporal visualization of healthcare data Bloomquist, Samuel W. 09 October 2014 Indiana University-Purdue University Indianapolis (IUPUI) / Healthcare data visualization presents challenges due to its non-standard organizational structure and disparate record formats. Epidemiologists and clinicians currently lack the tools to discern patterns in large-scale data that would reveal valuable healthcare information at the granular level of individual patients and populations. Integrating geospatial and temporal healthcare data within a common visual context provides a twofold benefit: it allows clinicians to synthesize large-scale healthcare data to provide a context for local patient care decisions, and it better informs epidemiologists in making public health recommendations. Advanced implementations of the Scalable Vector Graphics (SVG), HyperText Markup Language version 5 (HTML5), and Cascading Style Sheets version 3 (CSS3) specifications in the latest versions of most major Web browsers brought hardware-accelerated graphics to the Web and opened the door for more intricate and interactive visualization techniques than have previously been possible. We developed a series of new geotemporal visualization techniques under a general healthcare data visualization framework in order to provide a real-time dashboard for analysis and exploration of complex healthcare data. This visualization framework, HealthTerrain, is a concept space constructed using text and data mining techniques, extracted concepts, and attributes associated with geographical locations. HealthTerrain's association graph serves two purposes. First, it is a powerful interactive visualization of the relationships among concept terms, allowing users to explore the concept space, discover correlations, and generate novel hypotheses. Second, it functions as a user interface, allowing selection of concept terms for further visual analysis.
In addition to the association graph, concept terms can be compared across time and location using several new visualization techniques. A spatial-temporal choropleth map projection embeds rich textures to generate an integrated, two-dimensional visualization. Its key feature is a new offset contour method to visualize multidimensional and time-series data associated with different geographical regions. Additionally, a ring graph reveals patterns at the fine granularity of patient occurrences using a new radial coordinate-based time-series visualization technique. geotemporal healthcare data visualization Information visualization Medical records -- Data processing Epidemiologists Cascading style sheets HTML (Document markup language) SVG (Document markup language) Medical statistics Data mining Evidence-based medicine statistics -- Graphic methods Quantitative research
