About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Its metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Comparação do GGE biplot-ponderado e AMMI-ponderado com outros modelos de interação genótipo x ambiente / Comparison of weighted-GGE biplot and weighted-AMMI with other models of interaction genotype × environment

Hongyu, Kuang 09 April 2015
Genotype × environment interaction (GEI) is an extremely important issue in plant breeding and production. The occurrence of GEI hampers the selection and recommendation of superior genotypes and represents a major challenge for researchers. In this context, biplot analyses have been increasingly used for agronomic data, in which the data are represented as a two-way table of GEI means. However, the particularities of the biplot graphic make it hard to interpret and can lead the researcher to errors. Of the several models in the literature for analysing GEI data, the most widely used are AMMI (Additive Main effects and Multiplicative Interaction) and the GGE biplot (Genotype main effects + Genotype × Environment interaction). The AMMI model is a statistical method for understanding the structure of interactions between genotypes and environments; it combines analysis of variance and principal component analysis to fit, respectively, the main effects (G and E) and the GEI effects. The GGE biplot pools the additive genotype effect with the multiplicative GEI effect and submits both to principal component analysis. There are two problems in using these models: (i) they can only be used to analyse multi-environment trial (MET) data with a single trait, and (ii) the environments in such data are heterogeneous. This work aims to propose new models, the W-GGE biplot (Weighted Genotype main effects + Genotype × Environment interaction) and weighted AMMI, for the analysis of multi-environment data; to compare the existing AMMI and GGE biplot models; to carry out mega-environment analysis; to evaluate genotypes and test environments within each mega-environment; and to understand the causes of GEI.
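The difference between the two unweighted models described in this abstract lies in what is removed from the two-way table of means before principal component analysis is applied. The following is a minimal sketch of that centering step only, not the author's implementation: the table `yields` and the function names are hypothetical, the PCA step itself is omitted, and the weighted variants proposed in the thesis are not covered.

```python
# Hypothetical table of mean yields: rows = genotypes, columns = environments.
yields = [
    [4.0, 5.0, 6.0],
    [5.0, 5.5, 9.0],
    [3.0, 6.0, 4.5],
]

def marginal_means(table):
    """Return the grand mean, the genotype (row) means and the environment (column) means."""
    n_g, n_e = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_g * n_e)
    g_mean = [sum(row) / n_e for row in table]
    e_mean = [sum(row[j] for row in table) / n_g for j in range(n_e)]
    return grand, g_mean, e_mean

def ammi_interaction(table):
    """AMMI: remove the additive G and E main effects; the residual matrix is what PCA is applied to."""
    grand, g_mean, e_mean = marginal_means(table)
    return [[y - g_mean[i] - e_mean[j] + grand for j, y in enumerate(row)]
            for i, row in enumerate(table)]

def gge_matrix(table):
    """GGE: remove only the environment means, so G and GEI stay together in the matrix given to PCA."""
    _, _, e_mean = marginal_means(table)
    return [[y - e_mean[j] for j, y in enumerate(row)] for row in table]

interaction = ammi_interaction(yields)
gge = gge_matrix(yields)
```

Each entry of `gge` equals the genotype main effect plus the corresponding entry of `interaction`, which is the sense in which the GGE biplot pools G with GEI.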
3

Empregando técnicas de visualização de informação para transformação interativa de dados multidimensionais / Transforming multidimensional data using information visualization techniques

Fatore, Francisco Morgani 27 July 2015
Exploring datasets is a frequent task in several fields and aims at a better understanding of simulated or measured phenomena. It is preceded by data collection and storage steps, which seek to record as much detail as possible about an observed phenomenon. Effective exploration of the data, however, involves a number of challenges. One is the difficulty of identifying which of the collected data are actually relevant to the analysis. Another is the lack of any guarantee that the key factors for understanding the problem have been collected. Interactive data transformation is an approach that uses visualization techniques to solve or mitigate these problems. However, the methods available in the literature have limitations, such as overly complex user interfaces and inflexible interaction mechanisms. This master's project therefore set out to develop novel interactive visual techniques for transforming multidimensional data. The proposed methodology is based on biplots and on the combined action of interaction mechanisms to overcome the limitations of state-of-the-art techniques. Results of experiments on several datasets suggest that the proposed methods make it possible to obtain more representative datasets. More specifically, better results were obtained in data classification tasks when the proposed methods were used.
5

PCA and CVA biplots : a study of their underlying theory and quality measures

Brand, Hilmarie 03 1900
Thesis (MComm)--Stellenbosch University, 2013. / The main topics of study in this thesis are the Principal Component Analysis (PCA) and Canonical Variate Analysis (CVA) biplots, with the primary focus falling on the quality measures associated with these biplots. A detailed study of the different routes along which PCA and CVA can be derived precedes the study of the PCA biplot and the CVA biplot respectively. Different perspectives on PCA and CVA highlight different aspects of the theory underlying the two biplots and so contribute to a more solid understanding of these biplots and their interpretation. PCA is studied via the routes followed by Pearson (1901) and Hotelling (1933). CVA is studied from the perspectives of Linear Discriminant Analysis and Canonical Correlation Analysis, as well as a two-step approach introduced in Gower et al. (2011). The close relationship between CVA and Multivariate Analysis of Variance (MANOVA) also receives some attention. An explanation of the construction of the PCA biplot follows the study of PCA. Thereafter follows an in-depth investigation of the quality measures of the PCA biplot and of the relationships between these quality measures. Specific attention is given to the effect of standardisation on the PCA biplot and its quality measures. The study of CVA is followed by an explanation of the construction of the weighted CVA biplot and of two different unweighted CVA biplots based on the two-step approach to CVA. Specific attention is given to how accounting for group sizes in the construction of the CVA biplot affects the representation of the group structure underlying a data set. It was found that larger groups tend to be better separated from other groups in the weighted CVA biplot than in the corresponding unweighted CVA biplots. Similarly, smaller groups tend to be separated to a greater extent from other groups in the unweighted CVA biplots than in the corresponding weighted CVA biplot. A detailed investigation of previously defined quality measures of the CVA biplot follows the study of the CVA biplot. It was found that the accuracy with which the group centroids of larger groups are approximated is usually higher in the weighted CVA biplot than in the corresponding unweighted CVA biplots. Three new quality measures that assess the accuracy of the Pythagorean distances in the CVA biplot are also defined. They assess, respectively, the accuracy of the Pythagorean distances between the group centroids, between the individual samples, and between the individual samples and the group centroids.
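As a reminder of what a biplot quality measure looks like, the overall quality of an r-dimensional PCA biplot is commonly taken to be the proportion of total variance retained by the first r principal components. The notation below is an assumption made for illustration and is not taken from the thesis: the \lambda_k are the ordered eigenvalues of the sample covariance matrix of the p variables (or of the correlation matrix when the variables are standardised, which is one way standardisation changes the measure).

```latex
\text{quality}_r \;=\; \frac{\sum_{k=1}^{r} \lambda_k}{\sum_{k=1}^{p} \lambda_k},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
```

For the usual two-dimensional display r = 2, so a value close to 1 indicates that little of the total variation is lost in the plot.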
6

Estabilidade em análise de agrupamento via modelo AMMI com reamostragem "bootstrap" / Stability in clustering analysis through the AMMI methodology with bootstrap

Godoi, Débora Robert de 11 October 2013
The objective of this work is to propose a new methodology for interpreting the stability of clustering methods for vegetation data, using the AMMI methodology and bootstrap resampling to gain confidence in the clusters formed. The data come from the Department of Genetics of the Luiz de Queiroz College of Agriculture (Escola Superior de Agricultura "Luiz de Queiroz") and concern soybean yield. First the AMMI methodology is applied; then the Euclidean distance matrix is estimated, from both the original data and the data obtained by bootstrap resampling, for use by the clustering methods (nearest neighbor, furthest neighbor, average linkage, centroid, median and Ward). The cophenetic correlation coefficient is used to assess the validity of the clusters formed, and the empirical distribution of the cophenetic correlation coefficients is presented by means of the Mantel test. The clusters obtained by the different methods are mostly similar, indicating that, in principle, any of these methods would be suitable for the representation. The method whose dendrogram differs from the others, for both the original data and the bootstrap data, is Ward's method. This study is promising for analysing the validity of clusters formed in vegetation data.
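The validity check named in this abstract, the cophenetic correlation coefficient, can be sketched in a few lines. The code below is an illustrative simplification, not the thesis's procedure: it implements only nearest-neighbor (single) linkage on Euclidean distances, skips the AMMI, bootstrap and Mantel steps, and uses hypothetical data and function names.

```python
from itertools import combinations
from math import dist, sqrt

def single_linkage_cophenetic(points):
    """Cluster with nearest-neighbor linkage; return original and cophenetic distances per pair."""
    n = len(points)
    d = {(i, j): dist(points[i], points[j]) for i, j in combinations(range(n), 2)}
    clusters = [{i} for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            h = min(d[min(i, j), max(i, j)] for i in clusters[a] for j in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # The merge height is the cophenetic distance of every pair joined by this merge.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[min(i, j), max(i, j)] = h
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return d, coph

def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cophenetic_correlation(points):
    """Correlation between original distances and dendrogram (cophenetic) distances."""
    d, coph = single_linkage_cophenetic(points)
    pairs = sorted(d)
    return pearson([d[p] for p in pairs], [coph[p] for p in pairs])

# Hypothetical data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ccc = cophenetic_correlation(points)
```

A coefficient close to 1 means the dendrogram preserves the original pairwise distances well. Repeating the calculation on bootstrap resamples, as the abstract describes, would give an empirical distribution of the coefficient for each linkage method.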
8

Adaptabilidade e estabilidade fenotípica de clones de cana-de-açúcar em dois ciclos produtivos / Phenotypic adaptability and stability of sugarcane clones in two production cycles

Regis, Jiuli Ani Vilas Boas January 2016
Advisor: João Antonio da Costa Andrade / Abstract: In the final phase of a breeding program, specifically in cultivar recommendation, knowledge of the genotype × environment (G×E) interaction is essential, because it shows whether genotypes perform differently in different environments. The effects of the genotype × environment interaction on adaptability and stability are of great importance, since each genotype has an inherent capacity to respond to environmental changes. Among the strategies used to identify cultivars with low levels of genotype × environment interaction is the selection of genotypes with high adaptability and stability. The objective of this work was to identify productive sugarcane clones with good stability and adaptability, considering two production cycles. Twenty-five early clones plus five checks were evaluated in 24 environments, in a randomized block design with three replications. Adaptability and stability were assessed using the bisegmented regression method and the multivariate AMMI and GGE biplot methods. The clones evaluated do not show the ideal parameters as advocated by the bisegmented regression method. According to the three approaches used, which are complementary in the information they provide, the most promising clones in terms of stability and general adaptability are G13, G12 and G5. / Master's degree
9

Genotype by Environment Interaction Effects on Starch, Fibre and Agronomic Traits in Potato (Solanum tuberosum L.)

Bach, Stephanie 15 December 2011
In this thesis, the relationships between 17 traits including starch, fibre, culinary quality and agronomic parameters of potato were investigated. In two studies, 12 genotypes were grown at three locations in Ontario and 18 genotypes were grown at four locations in Manitoba, Ontario and New Brunswick. Genotype by environment interactions were significant for fibre and agronomic traits, except bake score and specific gravity. Correlations were found between some, but not all, starch, fibre and agronomic parameters. Several genotypes containing desirable starch, fibre and agronomic profiles with high stability were identified. Although no single genotype was superior in all analyzed traits, certain genotypes excelled in specific attributes. CV96044-3 had the best starch and fibre profile, but low yields compared to other cultivars. Three genotypes, CV96044-3, F04037 and Goldrush, may be useful parents in a breeding program to improve starch and fibre characteristics, producing cultivars containing all desirable traits. / AAFC, Agricultural Bioproducts Innovation Program, BioPotato Network
10

Robust principal component analysis biplots

Wedlake, Ryan Stuart 03 1900
Thesis (MSc (Mathematical Statistics))--University of Stellenbosch, 2008. / In this study several procedures for finding robust principal components (RPCs) for low and high dimensional data sets are investigated in parallel with robust principal component analysis (RPCA) biplots. These RPCA biplots are used for the simultaneous visualisation of the observations and variables in the subspace spanned by the RPCs. Chapter 1 contains: a brief overview of the difficulties encountered when graphically investigating patterns and relationships in multidimensional data, and why PCA can be used to circumvent them; the objectives of this study; a summary of the work done to meet these objectives; and certain results in matrix algebra that are needed throughout the study. In Chapter 2 the derivation of the classic sample principal components (SPCs) is first discussed in detail, since they are the "building blocks" of classic principal component analysis (CPCA) biplots. Secondly, the traditional CPCA biplot of Gabriel (1971) is reviewed. Thirdly, modifications to this biplot using the new philosophy of Gower & Hand (1996) are given attention, together with reasons why the modified biplot has several advantages over the traditional biplot, some of which are aesthetic in nature. Lastly, changes that can be made to the Gower & Hand (1996) PCA biplot to optimally visualise the correlations between the variables are discussed. Chapter 3 concerns the stability of the biplot. Because the SPCs determine the position of the observations as well as the orientation of the arrows (traditional biplot) or axes (Gower & Hand biplot) in the PCA biplot subspace, it is useful to give estimates of the standard errors of the SPCs together with the biplot display as an indication of its stability. First, a computer-intensive statistical technique called the bootstrap is discussed, which is used to calculate the standard errors of the SPCs without making underlying distributional assumptions. Secondly, the influence of outliers on bootstrap results is investigated. Lastly, a robust form of the bootstrap is briefly discussed for calculating standard error estimates that remain stable with or without outliers in the sample. In Chapter 4, reasons why a PC analysis should be made robust in the presence of outliers are discussed first. Secondly, different types of outliers are discussed. Thirdly, a method for identifying influential observations and a method for identifying outlying observations are investigated. Lastly, different methods for constructing robust estimates of location and dispersion for the observations receive attention. These robust estimates are used in numerical procedures that calculate RPCs. Chapter 5 first gives an overview of some of the procedures used to calculate RPCs for lower and higher dimensional data sets. Secondly, two numerical procedures that can be used to calculate RPCs for lower dimensional data sets are discussed and compared in detail. Details and examples of robust versions of the Gower & Hand (1996) PCA biplot that can be constructed using these RPCs are also provided. In Chapter 6, five numerical procedures for calculating RPCs for higher dimensional data sets are discussed in detail. Once RPCs have been obtained by these methods, they are used to construct robust versions of the PCA biplot of Gower & Hand (1996). Details and examples of these robust PCA biplots are also provided. An extensive software library has been developed so that the biplot methodology discussed in this study can be used in practice. The functions in this library are given in an appendix at the end of the study. The library is applied to data sets from various fields so that the merit of the theory developed in this study can be visually appraised.
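The bootstrap idea summarised for Chapter 3 can be illustrated with a small sketch. It is a deliberate simplification under stated assumptions, not the thesis's software library: the data are hypothetical and two-dimensional so that the largest eigenvalue of the covariance matrix has a closed form, and the statistic resampled is the variance of the first principal component rather than the component's loadings.

```python
import random
from math import sqrt

def largest_eigenvalue_2d(data):
    """Variance of the first principal component of bivariate data (closed form for a 2x2 matrix)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    # Largest root of the characteristic polynomial of [[sxx, sxy], [sxy, syy]].
    return (sxx + syy) / 2 + sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)

def bootstrap_se(data, statistic, n_boot=500, seed=0):
    """Standard deviation of the statistic over resamples drawn with replacement from the data."""
    rng = random.Random(seed)
    reps = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    mean = sum(reps) / n_boot
    return sqrt(sum((r - mean) ** 2 for r in reps) / (n_boot - 1))

# Hypothetical bivariate sample.
sample = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
se = bootstrap_se(sample, largest_eigenvalue_2d)
```

No distributional assumption enters the calculation, which is the property the abstract highlights. Because every resample is drawn from the observed points, a single outlier can appear several times in a resample and inflate the estimate, which is why the thesis goes on to consider a robust form of the bootstrap.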
