1 |
Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components
Gardner, Sugnet (January 2001)
Dissertation (PhD)--Stellenbosch University, 2001. / ENGLISH ABSTRACT: Gower and Hand offer a new perspective on the traditional biplot. This perspective
provides a unified approach to principal component analysis (PCA) biplots based on
Pythagorean distance; canonical variate analysis (CVA) biplots based on Mahalanobis
distance; non-linear biplots based on Euclidean embeddable distances as well as
generalised biplots for use with both continuous and categorical variables.
The biplot methodology of Gower and Hand is extended and applied in statistical
discrimination and classification. This leads to discriminant analysis by means of PCA
biplots, CVA biplots, non-linear biplots as well as generalised biplots. Properties of these
techniques are derived in detail. Classification regions defined for linear discriminant
analysis (LDA) are applied in the CVA biplot leading to discriminant analysis using biplot
methodology. Situations where the assumptions of LDA are not met are considered, and
various existing alternative discriminant analysis procedures are formulated in terms of
biplots: apart from PCA biplots, QDA, FDA and DSM biplots are defined and constructed,
and their usage is illustrated.
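The link between LDA and Mahalanobis distance mentioned above can be illustrated with a minimal sketch. With equal prior probabilities, LDA assigns an observation to the class whose mean is nearest in Mahalanobis distance, computed from the pooled within-class covariance matrix. The function names and the two-dimensional toy data below are illustrative assumptions; this is not the S-PLUS code of the thesis.

```python
from statistics import mean

def class_means(groups):
    """Mean vector of each class (2-D observations only, for brevity)."""
    return {label: (mean(p[0] for p in pts), mean(p[1] for p in pts))
            for label, pts in groups.items()}

def pooled_covariance(groups):
    """Pooled within-class 2x2 covariance matrix."""
    centres = class_means(groups)
    sxx = sxy = syy = 0.0
    for label, pts in groups.items():
        mx, my = centres[label]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    # Degrees of freedom: total observations minus number of classes.
    dof = sum(len(pts) for pts in groups.values()) - len(groups)
    return [[sxx / dof, sxy / dof], [sxy / dof, syy / dof]]

def mahalanobis_sq(point, centre, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def classify(point, groups):
    """Assign the point to the class with the nearest mean (equal priors assumed)."""
    cov = pooled_covariance(groups)
    centres = class_means(groups)
    return min(centres, key=lambda label: mahalanobis_sq(point, centres[label], cov))

# Hypothetical toy data: two well-separated classes.
groups = {
    "A": [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)],
    "B": [(6.0, 7.0), (7.0, 6.0), (7.0, 8.0), (8.0, 7.0)],
}
print(classify((3.0, 3.0), groups))
print(classify((6.0, 5.0), groups))
```

In a CVA biplot the same rule becomes nearest-mean classification in ordinary (Pythagorean) distance, because the canonical transformation turns Mahalanobis distance into Euclidean distance.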
It is demonstrated that biplot methodology naturally provides for managing categorical and
continuous variables simultaneously. It is shown through a simulation study that the
techniques based on biplot methodology can be applied successfully to the reversal
problem with categorical variables in discriminant analysis.
Situations occurring in practice where existing discriminant analysis procedures based on
distances from means fail are considered. After discussing self-consistency and principal
curves (a form of non-parametric principal components), discriminant analysis based on
distances from principal curves (a form of conditional mean) is proposed. This biplot
classification procedure, based upon principal curves, yields much better results.
Bootstrapping is considered as a means of describing variability in biplots. Variability in
samples as well as of axes in biplot displays receives attention. Bootstrap α-regions are
defined and the ability of these regions to describe biplot variability and to detect outliers
is demonstrated. Robust PCA and CVA biplots restricting the role of influential
observations on biplot displays are also considered.
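The bootstrap idea referred to above can be sketched in a few lines: resample the observations with replacement, recompute a statistic on each resample, and use the spread of the replicates to describe variability. The sketch below uses a one-dimensional sample mean and a percentile interval purely for illustration; the function names and data are assumptions, and the α-regions of the thesis are a different, multivariate construction that is not reproduced here.

```python
import random
from statistics import mean

def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Recompute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]

def percentile_interval(replicates, alpha=0.05):
    """Simple percentile interval covering roughly 1 - alpha of the replicates."""
    ordered = sorted(replicates)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper

# Hypothetical sample.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.6]
replicates = bootstrap_replicates(sample, mean)
lower, upper = percentile_interval(replicates)
print(f"sample mean {mean(sample):.3f}, bootstrap interval ({lower:.3f}, {upper:.3f})")
```

For a biplot, the statistic would be the plotted position of a sample point or an axis rather than a scalar mean, so each replicate would be a point or a line in the display.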
An extensive library of S-PLUS computer programmes is provided for implementing the
various discriminant analysis techniques that were developed using biplot methodology.
The application of the above theoretical developments and computer software is illustrated
by analysing real-life data sets. Biplots are used to investigate the degree of capital
intensity of companies and to serve as an aid in risk management of a financial institution.
A particular application of the PCA biplot is the TQI biplot used in industry to determine
the degree to which manufactured items comply with multidimensional specifications. A
further interesting application is to determine whether an Old-Cape furniture item is
manufactured of stinkwood or embuia. A data set provided by the Western Cape Nature
Conservation Board consisting of measurements of tortoises from the species Homopus
areolatus is analysed by means of biplot methodology to determine if morphological
differences exist among tortoises from different geographical regions. Allometric
considerations need to be taken into account and the resulting small sample sizes in some
subgroups severely limit the use of conventional statistical procedures.
Biplot methodology is also applied to classification in a diabetes data set illustrating the
combined advantage of using classification with principal curves in a robust biplot or
biplot classification where covariance matrices are unequal. A discriminant analysis
problem where foraging behaviour of deer might eventually result in a change in the
dominant plant species is used to illustrate biplot classification of data sets containing both
continuous and categorical variables. As an example of the use of biplots with large data
sets, a data set consisting of 16828 lemons is analysed using biplot methodology to
investigate differences in fruit from various areas of production, cultivars and rootstocks.
The proposed α-bags also provide a measure of quantifying the graphical overlap among
classes. This method is successfully applied in a multidimensional socio-economical data
set to quantify the degree of overlap among different race groups.
The application of the proposed biplot methodology in practice has an important byproduct:
It provides the impetus for many a new idea, e.g. applying a PCA biplot in
industry led to the development of quality regions; α-bags were constructed to represent
thousands of observations in the lemons data set, in turn leading to means for quantifying
the degree of overlap. This illustrates the enormous flexibility of biplots - biplot
methodology provides an infrastructure for many novelties when applied in practice.
|
2 |
A comparison of the efficiencies of the Gram-Charlier and Pearson frequency functions for fitting certain distributions
Bradford, Henry Franklin (January 1936)
No description available.
|
3 |
Analysis of outliers using graphical and quasi-Bayesian methods
馮榮錦, Fung, Wing-kam, Tony (January 1987)
Statistics / Doctoral / Doctor of Philosophy
|
4 |
Some examples of Pearson's frequency curves
Thomson, Mary Gilmore, 1897- (January 1940)
No description available.
|
5 |
Métodos alternativos para análise rápida de parâmetros de qualidade da soja / Alternative methods for rapid analysis of soybean quality parameters
Santos, Larissa da Rocha dos (24 February 2017)
CAPES; CNPQ / Given the worldwide importance of the soybean crop, it is essential to apply methodologies that monitor, efficiently and with adequate speed and reliability, the physico-chemical parameters that determine grain quality. However, the analytical methods used in traditional analyses involve time-consuming techniques, require various pieces of equipment and reagents, and generate chemical residues. The development of alternative methodologies for this purpose can therefore bring benefits to industries and regulatory bodies as well as to analysts.
This study proposes the use of Near Infrared Spectroscopy (NIR) associated with chemometric methods for the construction of multivariate models to predict the percentage of total lipids, acidity index, chlorophyll content, crude protein and moisture in soybean. For this, 300 samples of Glycine max (L.) Merrill soybean were evaluated. The spectral data were processed by the method of Partial Least Squares (PLS). The results suggest that the developed models can be used as an alternative methodology to determine physico-chemical parameters and could be applied in quality control in the soybean industry.
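As a rough illustration of the PLS step named above, the following sketch fits a single-response PLS model with one latent component: the weight vector is the direction of maximal covariance between the centred predictors and the centred response, and the response is then regressed on the resulting scores. Everything here is an assumption for illustration (function name, one component, toy data); the study's own models, number of latent variables and spectral preprocessing are not described in the abstract.

```python
from math import sqrt

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model and return a prediction function."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]

    # Weight vector: X'y normalised to unit length
    # (assumes y is not constant and X'y is non-zero).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(value * value for value in w))
    w = [value / norm for value in w]

    # Scores t = Xw, then regress y on t to obtain the loading q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    def predict(row):
        score = sum((row[j] - x_means[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

# Hypothetical toy data: two "wavelengths" and a response equal to 10 + x1 + x2.
X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
y = [10.0, 12.0, 12.0, 14.0]
predict = pls1_one_component(X, y)
print(round(predict([3.0, 1.0]), 6))
```

With real spectra there are far more wavelengths than samples, which is the situation PLS is designed for; further components would be extracted from the deflated data and their number chosen by cross-validation.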
|
7 |
Types and levels of data arrangement and representation in statistics as modeled by grade 4 - 7 learners
Wessels, Helena Margaretha (28 February 2006)
The crucial role of representation in mathematical and statistical modeling and problem solving, as evident in learners' arrangement and representation of statistical data, was investigated, with data arrangement, data representation and statistical thinking levels as focus points. The representation tasks required learners to arrange and represent data through modeling, focusing on spontaneous representations. Successful transnumeration determines the ultimate success of a representation, and the ability to organise data is regarded as critical. Arrangement types increased in sophistication with increased grade level, and the hierarchical nature of arrangement types became apparent when regarded in the context of an adapted SOLO Taxonomy framework. A higher level arrangement strategy pointed to a higher SOLO level of statistical thinking. Learners in the two tasks produced a rich variety of representations, which included idiosyncratic, unsophisticated responses as well as standard statistical representations. The context of the two tasks, the quantitative versus qualitative nature of the data in the tasks, and the statistical tools or representational skills learners have at their disposal played an important role in their representations. Well-planned data handling activities develop representational and higher order thinking skills. The variety of responses and different response levels elicited in the two tasks indicate that the nature of the tasks, rather than the size of the data set, plays a conclusive role in data tasks. Multiple representations by an individual were an indication of successful modeling, are effective in problem solving and are associated with good performance. The SOLO model, which incorporates a structural approach as well as a multimodal component, proved valuable in the analysis of responses. Using this model, with accompanying acknowledgement of different problem solving paths and the contribution of ikonic support in the concrete symbolic mode, promotes the in-depth analysis of responses.
This study contributes to the research in the field of data representation and statistical thinking. The analysis and results led to an integrated picture of Grade 4-7 learners' representation of statistical data and of the statistical thinking levels evident in their representations. / Educational Studies / D. Ed. (Didactics)
|
9 |
The geo-spatial analysis and environmental factors of narcotics hot spots
Balchak, Stefanie Wrae (01 January 2005)
A mixed methodological approach with two different analytic procedures and multiple data sources was used to examine narcotics hot spots. The first phase compares two methods of hot spot identification: the prediction model and the actual crimes. The second phase is an intensive study of areas consistently shown to be repeat hot spots, undertaken to better understand the phenomenon of drug hot spots.
|
10 |
Web-based geotemporal visualization of healthcare data
Bloomquist, Samuel W. (09 October 2014)
Indiana University-Purdue University Indianapolis (IUPUI) / Healthcare data visualization presents challenges due to its non-standard organizational structure and disparate record formats. Epidemiologists and clinicians currently lack the tools to discern patterns in large-scale data that would reveal valuable healthcare information at the granular level of individual patients and populations. Integrating geospatial and temporal healthcare data within a common visual context provides a twofold benefit: it allows clinicians to synthesize large-scale healthcare data to provide a context for local patient care decisions, and it better informs epidemiologists in making public health recommendations.
Advanced implementations of the Scalable Vector Graphics (SVG), HyperText Markup Language version 5 (HTML5), and Cascading Style Sheets version 3 (CSS3) specifications in the latest versions of most major Web browsers brought hardware-accelerated graphics to the Web and opened the door to more intricate and interactive visualization techniques than were previously possible. We developed a series of new geotemporal visualization techniques under a general healthcare data visualization framework in order to provide a real-time dashboard for analysis and exploration of complex healthcare data. This visualization framework, HealthTerrain, is a concept space constructed using text and data mining techniques, extracted concepts, and attributes associated with geographical locations.
HealthTerrain's association graph serves two purposes. First, it is a powerful interactive visualization of the relationships among concept terms, allowing users to explore the concept space, discover correlations, and generate novel hypotheses. Second, it functions as a user interface, allowing selection of concept terms for further visual analysis.
In addition to the association graph, concept terms can be compared across time and location using several new visualization techniques. A spatial-temporal choropleth map projection embeds rich textures to generate an integrated, two-dimensional visualization. Its key feature is a new offset contour method to visualize multidimensional and time-series data associated with different geographical regions. Additionally, a ring graph reveals patterns at the fine granularity of patient occurrences using a new radial coordinate-based time-series visualization technique.
|