21 |
Εφαρμογή της παραγοντικής ανάλυσης για την ανίχνευση και περιγραφή της κατανάλωσης αλκοολούχων ποτών του ελληνικού πληθυσμού / Application of factor analysis for the detection and description of alcoholic beverage consumption in the Greek population. Ρεκούτη, Αγγελική 21 October 2011 (has links)
The purpose of this paper is to apply Factor Analysis to our sample in order to detect and describe patterns in the consumption of nine categories of alcoholic beverages by the Greek population. For the application of the method we use the statistical program SPSS.
The first chapter presents the available methods for solving this problem and the second one presents the chosen method, namely Factor Analysis. We specify the objective of the analysis, the design and the critical assumptions of the method, as well as the criteria for the evaluation of the results.
In the third chapter we present the source of our data and how the sampling was carried out. We then identify the missing responses and apply Missing Values Analysis to determine their type and to restore them in the sample. We also present our sample using descriptive statistics and, finally, create and describe the final data matrix on which Factor Analysis is performed.
In the fourth and last chapter we investigate the suitability of our sample for Factor Analysis by checking whether the assumptions of the method are satisfied. We then carry out a parallel study of the sample, both including and excluding the extreme values (outliers) that were identified. Having concluded that the outliers do not affect the results of the method, we apply Factor Analysis using principal components extraction and describe in detail all the steps leading to our final conclusions.
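As a rough illustration of what principal-components extraction with the Kaiser (eigenvalue greater than one) criterion involves, the sketch below reproduces the main steps on a small synthetic matrix; it is not the thesis's SPSS workflow, and the data, sample size and retained-component rule are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the survey data: n respondents x 9 beverage categories.
X = rng.normal(size=(200, 9))
X[:, 1] += 0.8 * X[:, 0]            # induce some correlation so components emerge
X[:, 4] += 0.7 * X[:, 3]

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the variables
R = np.corrcoef(Z, rowvar=False)                    # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)                # eigendecomposition
order = np.argsort(eigvals)[::-1]                   # sort eigenvalues descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int(np.sum(eigvals > 1.0))                      # Kaiser criterion: eigenvalue > 1
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])    # component loadings

print("retained components:", k)
print("explained variance (%):", 100 * eigvals[:k] / R.shape[0])
```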
|
22 |
Análise e predição de desembarque de characiformes migradores do município de Santarém-PA / Analysis and prediction of landings of migratory characiforms in the municipality of Santarém-PA. Santana, Isabela Feitosa 19 July 2009 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico / Eleven-year historical series of landings of the species Prochilodus nigricans and Semaprochilodus sp., recorded from January 1992 to December 2002 in the municipality of Santarém-PA, were used for analysis and prediction, together with series of the SOI, SSTs and the hydrological levels of the Amazonas and Tapajós rivers. Unfortunately, the landing series for jaraquis and the Tapajós River level series contained missing values, which made analysis and prediction impossible; Box & Jenkins modelling, however, made it possible to fill these gaps. After estimating the missing values, we performed spectral analysis on all of the variables mentioned and identified cycles related to the El Niño and La Niña phenomena, lasting from 2 to 7 years; these events strongly influenced the variation of the river levels and, consequently, the landings of these species. We also observed increases in landings at periods of 2 to 3 years. These periods may be related to the occurrence of strong floods, which probably led to the reproductive success of these species and to increased catches 2 or 3 years later. Other oscillations, such as semi-annual and intra-seasonal oscillations, were observed in the landings and river levels. We know that these oscillations have some influence on precipitation in the Amazon region and, therefore, on fishing, but more detailed studies are still needed to better understand their effect on the fishery of these species. Box & Jenkins models were also used to model landings in 2003 and 2004, in order to assess the efficiency of this tool for prediction. Using error metrics for the predictions, we found that ARIMA models are efficient for short- and medium-term (12-month) prediction, with the model showing a good fit for the 2003 predictions for both species.
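A minimal sketch of the Box & Jenkins workflow described above, fitting a seasonal ARIMA to a monthly series and scoring a 12-month hold-out with an error metric; the series is synthetic and the model orders are arbitrary choices, not those estimated in the dissertation.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
months = np.arange(132)                               # 11 years of monthly data, as in the study period
landings = 100 + 30 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 132)

train, test = landings[:120], landings[120:]          # hold out the final 12 months

# A seasonal ARIMA; the (p,d,q)(P,D,Q,s) orders here are illustrative, not the thesis's.
model = ARIMA(train, order=(1, 0, 1), seasonal_order=(1, 0, 0, 12)).fit()
forecast = model.forecast(steps=12)

mape = np.mean(np.abs((test - forecast) / test)) * 100  # mean absolute percentage error
print(f"12-month-ahead MAPE: {mape:.1f}%")
```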
|
23 |
Imputação de dados em experimentos multiambientais: novos algoritmos utilizando a decomposição por valores singulares / Data imputation in multi-environment trials: new algorithms using the singular value decomposition. Sergio Arciniegas Alarcon 02 February 2016 (has links)
Biplot analyses based on the additive main effects and multiplicative interaction (AMMI) models require a complete data matrix, but multi-environment trials often contain missing values. This thesis proposes new single and multiple imputation methods that can be used to analyze unbalanced data in experiments with genotype-by-environment (G×E) interaction. The first is a new extension of the cross-validation by eigenvector method (Bro et al., 2008). The second is a new non-parametric algorithm obtained through modifications of the simple imputation method developed by Yan (2013). A study is also included that considers imputation systems recently reported in the literature and compares them with the classic procedure recommended for imputation in G×E trials, namely the combination of the Expectation-Maximization (EM) algorithm with the AMMI model (EM-AMMI). Finally, generalizations are given of the simple imputation described by Arciniegas-Alarcón et al. (2010), which combines regression with a lower-rank approximation of a matrix. All the methodologies are based on the singular value decomposition (SVD) and are therefore free of distributional or structural assumptions. To assess the performance of the new imputation schemes, simulations were carried out based on real data sets from different species, with values deleted at random at different percentages, and the quality of the imputations was evaluated with several statistics. It was concluded that the SVD is a useful and flexible tool for constructing efficient techniques that circumvent the problem of missing information in experimental matrices.
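For orientation, an SVD-based imputation can be as simple as iterating a low-rank reconstruction of the two-way table, as in the sketch below; this is a generic illustration in the spirit of the methods discussed, not any of the specific algorithms proposed in the thesis, and the rank, tolerance and data are assumed for the example.

```python
import numpy as np

def svd_impute(X, rank=2, tol=1e-6, max_iter=500):
    """Fill missing entries of X by iterating a rank-`rank` SVD reconstruction."""
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)     # start from column means
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]  # low-rank approximation
        new = np.where(miss, approx, X)                   # update only the missing cells
        if np.max(np.abs(new - filled)) < tol:
            break
        filled = new
    return filled

rng = np.random.default_rng(2)
G = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 6))    # synthetic genotype x environment table
G_obs = G.copy()
G_obs[rng.random(G.shape) < 0.15] = np.nan                # knock out ~15% of the cells
print(np.round(svd_impute(G_obs) - G, 2))                 # imputation error per cell
```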
|
24 |
Využití data miningových metod při zpracování dat z demografických šetření / Using data mining methods for demographic survey data processing. Fišer, David January 2015 (has links)
The goal of the thesis was to describe and demonstrate the principles of the process of knowledge discovery in databases, i.e. data mining (DM). The theoretical part of the thesis describes selected data mining methods as well as the basic principles of those DM techniques. In the second part of the thesis a DM task is carried out in accordance with the CRISP-DM methodology. The practical part is divided into two parts, with data from the American Community Survey serving as its basis. The first part contains a classification task whose goal was to determine whether the selected DM techniques can be used to handle missing data in surveys. The success rate of the classification and the subsequent prediction of data values for the selected attributes was in the 55-80% range. The second part of the practical work then focused on identifying knowledge of interest using association rules and the GUHA method. Keywords: data mining, knowledge discovery in databases, statistical surveys, missing values, classification, association rules, GUHA method, ACS
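The classification idea used for the missing-data part can be sketched as follows: train a classifier on the records where the attribute is observed and predict it for the records where it is missing. The columns, classifier and missingness rate below are placeholders, not the configuration used in the thesis.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 1000
hours = rng.integers(0, 60, n)
# Toy stand-in for survey microdata; the column names are hypothetical, not real ACS variables.
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "hours_worked": hours,
    "employed": (hours > 0).astype(float),       # target attribute, related to the others
})
df.loc[rng.random(n) < 0.2, "employed"] = np.nan  # pretend 20% of the answers are missing

known = df["employed"].notna()
features = ["age", "hours_worked"]
clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(df.loc[known, features], df.loc[known, "employed"])

# Impute: predict the attribute for the records where it is missing.
df.loc[~known, "employed"] = clf.predict(df.loc[~known, features])
print(df["employed"].value_counts())
```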
|
25 |
The use of weights to account for non-response and drop-out. Höfler, Michael, Pfister, Hildegard, Lieb, Roselind, Wittchen, Hans-Ulrich January 2005 (has links)
Background: Empirical studies in psychiatric research and other fields often show substantial refusal and drop-out rates. Non-participation and drop-out may introduce a bias whose magnitude depends on how strongly its determinants are related to the respective parameter of interest.
Methods: When most information is missing, the standard approach is to estimate each respondent’s probability of participating and assign each respondent a weight that is inversely proportional to this probability. This paper contains a review of the major ideas and principles regarding the computation of statistical weights and the analysis of weighted data.
Results: A short software review for weighted data is provided and the use of statistical weights is illustrated through data from the EDSP (Early Developmental Stages of Psychopathology) Study. The results show that disregarding different sampling and response probabilities can have a major impact on estimated odds ratios.
Conclusions: The benefit of using statistical weights in reducing sampling bias should be balanced against increased variances in the weighted parameter estimates.
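A compact sketch of the weighting idea in the Methods section: estimate each subject's probability of responding from covariates available for everyone, then weight each respondent by the inverse of that estimated probability. The covariates, effect sizes and logistic model below are illustrative assumptions, not the EDSP weighting scheme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 20000
age = rng.normal(40, 12, n)
male = rng.binomial(1, 0.5, n)

# Outcome of interest (observed only for respondents) depends on the register covariates.
p_disorder = 1 / (1 + np.exp(-(-1.5 + 0.03 * (age - 40) - 0.4 * male)))
disorder = rng.random(n) < p_disorder

# Response probability also depends on the same covariates -> selective non-response.
p_respond = 1 / (1 + np.exp(-(0.3 - 0.04 * (age - 40) + 0.5 * male)))
responded = rng.random(n) < p_respond

# Estimate each subject's response probability from covariates known for everyone.
X = np.column_stack([age, male])
p_hat = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]
w = 1.0 / p_hat[responded]                 # weight = inverse estimated response probability

print(f"true prevalence       {disorder.mean():.3f}")
print(f"respondents, naive    {disorder[responded].mean():.3f}")
print(f"respondents, weighted {np.average(disorder[responded], weights=w):.3f}")
```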
|
26 |
Sélection de modèle d'imputation à partir de modèles bayésiens hiérarchiques linéaires multivariés / Imputation model selection based on multivariate hierarchical linear Bayesian models. Chagra, Djamila 06 1900 (has links)
The software used is Splus and R.
The technique known as multiple imputation seems to be the most suitable technique for solving the problem of non-response. The literature mentions methods that model the nature and structure of missing values. One of the most popular methods is the PAN algorithm of Schafer and Yucel (2002). The imputations yielded by this method are based on a multivariate linear mixed-effects model for the response variable. A Bayesian hierarchical, clustered and more flexible extension of PAN is given by the BHLC model of Murua et al. (2005). The main goal of this work is to study the problem of model selection for multiple imputation in terms of efficiency and accuracy of missing-value predictions. We propose a measure of performance linked to the prediction of missing values. The measure is a mean squared error, and hence, in addition to the variance associated with the multiple imputations, it includes a measure of bias in the prediction. We show that this measure is more objective than the most common variance measure of Rubin. Our measure is computed by incrementing by a small proportion the number of missing values in the data and supposing that those values are also missing. The performance of the imputation model is then assessed through the prediction error associated with these pseudo-missing values. In order to study the problem objectively, we have devised several simulations. Data were generated according to different explicit models that assumed particular error structures. Several missing-value prior distributions as well as error-term distributions are then hypothesized. Our study investigates whether the true error structure of the data has an effect on the performance of the different hypothesized choices for the imputation model. We concluded that the answer is yes. Moreover, the choice of missing-value prior distribution seems to be the most important factor for the accuracy of predictions. In general, the most effective choices for good imputations are a Student-t distribution with different cluster variances for the error term, and a missing-value normal prior with data-driven mean and variance, or a missing-value regularizing normal prior with large variance (a ridge-regression-like prior). Finally, we have applied our ideas to a real problem dealing with health outcome observations associated with a large number of countries around the world.
Keywords: Missing values, multiple imputation, Bayesian hierarchical linear model, mixed effects model.
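The proposed performance measure can be illustrated schematically: hide a small extra fraction of observed cells, impute them, and compute the mean squared error over these pseudo-missing cells, which captures both prediction bias and imputation variance. In the sketch below a deliberately simple column-mean imputer stands in for the PAN/BHLC machinery, and the data and fractions are assumptions for the example.

```python
import numpy as np

def column_mean_impute(X):
    """Placeholder imputer; the thesis uses PAN / BHLC multiple imputation instead."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)

def pseudo_missing_mse(X, imputer, extra_frac=0.05, n_rep=20, seed=0):
    """Hide a small extra fraction of observed cells and score the imputer on them."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    errors = []
    for _ in range(n_rep):
        mask = observed & (rng.random(X.shape) < extra_frac)   # pseudo-missing cells
        X_holdout = X.copy()
        X_holdout[mask] = np.nan
        X_hat = imputer(X_holdout)
        errors.append(np.mean((X_hat[mask] - X[mask]) ** 2))   # squared prediction error
    return float(np.mean(errors))

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5)) + np.arange(5)          # synthetic multivariate data
X[rng.random(X.shape) < 0.1] = np.nan                 # original missing values
print("pseudo-missing MSE:", round(pseudo_missing_mse(X, column_mean_impute), 3))
```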
|
27 |
Alternativas de análise para experimentos G × E multiatributo / Alternatives of analysis of G×E trials multi-attribute. Peña, Marisol Garcia 04 February 2016 (has links)
In genotype-by-environment (G×E) experiments it is common to observe the behaviour of the genotypes with respect to several different attributes in the environments considered. The analysis of such experiments has been widely discussed for the case of a single attribute. This thesis presents some alternatives of analysis that consider genotypes, environments and attributes simultaneously. The first is based on the mixture maximum likelihood clustering method (Mixclus) and three-mode principal component analysis (3MPCA), which allow the analysis of three-way tables; these two methods have been used extensively in psychology and chemistry, but little in agriculture. The second is a methodology that combines the additive main effects and multiplicative interaction (AMMI) model, an efficient model for the analysis of G×E experiments with a single attribute, with generalised Procrustes analysis, which compares configurations of points and provides a numerical measure of how much they differ. Finally, an alternative for performing data imputation in G×E experiments is presented, since the presence of missing values is a very frequent situation in these experiments. It is concluded that the proposed methodologies are useful tools for the analysis of multi-attribute G×E experiments.
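For reference, the AMMI component of the proposed methodology can be sketched as additive main effects (grand mean, genotype and environment effects) plus a singular value decomposition of the interaction residuals, of which the first few multiplicative terms are kept. The two-way table below is synthetic, replicates are ignored, and the multi-attribute and Procrustes steps are not shown.

```python
import numpy as np

rng = np.random.default_rng(6)
g, e = 8, 5                                        # genotypes x environments
Y = 10 + rng.normal(0, 1, (g, 1)) + rng.normal(0, 1, (1, e)) + rng.normal(0, 0.3, (g, e))

grand = Y.mean()
gen_eff = Y.mean(axis=1) - grand                   # genotype main effects
env_eff = Y.mean(axis=0) - grand                   # environment main effects

# Interaction residuals after removing the additive main effects.
resid = Y - grand - gen_eff[:, None] - env_eff[None, :]

# AMMI: SVD of the interaction matrix; keep the first k multiplicative terms.
U, s, Vt = np.linalg.svd(resid, full_matrices=False)
k = 2
interaction_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

Y_ammi = grand + gen_eff[:, None] + env_eff[None, :] + interaction_k
print("share of interaction SS in first 2 terms:", round((s[:2] ** 2).sum() / (s ** 2).sum(), 3))
print("max |Y - AMMI2 fit|:", round(np.abs(Y - Y_ammi).max(), 3))
```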
|
30 |
AUTOMATED OPTIMAL FORECASTING OF UNIVARIATE MONITORING PROCESSES: Employing a novel optimal forecast methodology to define four classes of forecast approaches and testing them on real-life monitoring processes. Razroev, Stanislav January 2019 (has links)
This work aims to explore practical one-step-ahead forecasting of structurally changing data, an unstable behaviour that real-life data connected to human activity often exhibit. This setting can be characterized as a monitoring process. Forecast models, methods and approaches range from simple and computationally "cheap" to very sophisticated and computationally "expensive". Moreover, different forecast methods handle different data patterns and structural changes differently: for particular data types or data intervals some forecast methods are better than others, something that is usually not known beforehand. This raises a question: "Can one design a forecast procedure that effectively and optimally switches between various forecast methods, adapting the use of the forecast methods to the changes in the incoming data flow?" The thesis answers this question by introducing an optimality concept that allows optimal switching between simultaneously executed forecast methods, thus "tailoring" the forecast methods to the changes in the data. It is also shown how another forecast approach, combinational forecasting, in which forecast methods are combined using a weighted average, can be incorporated under the optimality principle and can therefore benefit from it. Thus, four classes of forecast results can be considered and compared: basic forecast methods, basic optimality, combinational forecasting, and combinational optimality. The thesis shows that the use of optimality gives results in which, most of the time, optimality is no worse than, or better than, the best of the forecast methods that the optimality is based on. Optimality also reduces the scatter from a multitude of forecast suggestions to a single number or only a few numbers (in a controllable fashion). Optimality additionally gives a lower bound for optimal forecasting: the hypothetically best achievable forecast result. The main conclusion is that the optimality approach makes more or less obsolete the traditional way of treating monitoring processes, namely trying to find the single best forecast method for some structurally changing data. Such a search can still be carried out, of course, but it is best done within the optimality approach as one of its innate components. All this makes the proposed optimality approach for forecasting purposes a valid "representative" of the broader ensemble approach (which likewise motivated the development of the now-popular Ensemble Learning concept as a valid part of the Machine Learning framework).
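A schematic reading of the two ideas compared above: run a few simple forecast methods in parallel, combine them with weights inversely proportional to their recent errors (combinational forecasting), and, as a switching variant, pick at each step the method with the smallest recent error. This is only an illustration of the general principle, not the thesis's optimality procedure; the series and the three base methods are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(300)
# Structurally changing series: a level shift and a change of slope halfway through.
y = np.where(t < 150, 10 + 0.02 * t, 25 - 0.05 * (t - 150)) + rng.normal(0, 0.5, t.size)

def naive(hist):  return hist[-1]                          # last observed value
def drift(hist):  return hist[-1] + (hist[-1] - hist[-2])  # last value plus last change
def mean5(hist):  return np.mean(hist[-5:])                # short moving average

methods = [naive, drift, mean5]
window = 20                                                # recent-error window
errors = {m: [] for m in methods}
switch_pred, combo_pred = [], []

for i in range(window, len(y) - 1):
    hist = y[: i + 1]
    preds = np.array([m(hist) for m in methods])
    recent = np.array([np.mean(np.abs(errors[m][-window:])) if errors[m] else 1.0
                       for m in methods])
    switch_pred.append(preds[np.argmin(recent)])           # switch to the currently best method
    w = 1.0 / (recent + 1e-9)
    combo_pred.append(w @ preds / w.sum())                 # error-weighted combination
    for m, p in zip(methods, preds):
        errors[m].append(y[i + 1] - p)                     # record one-step-ahead errors

actual = y[window + 1:]
for name, p in [("switching", switch_pred), ("combination", combo_pred)]:
    print(name, "MAE:", round(float(np.mean(np.abs(actual - np.array(p)))), 3))
```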
|