  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

以Hot deck插補法推估成就測驗之不完整作答反應 / Inferring feasibility in non response of achievement test by using hot deck imputation method

林曉芳 Unknown Date (has links)
本研究之目的旨在探討成就測驗中,學生的不完整作答反應是否能利用插補法,對不完整作答反應資料進行彌補。研究者藉由試題參數與受試者能力參數的分析討論,期望能獲得支持插補技術應用於成就測驗的結論。研究欲探討的問題有三:(一)利用統計插補法所估算之替代值與實際作答反應之間是否有差異存在;(二)受試者之部分答題反應組型在經過插補後,與完全作答反應組型之分析結果是否有差異存在;(三)能否將統計插補技術應用於成就測驗模式中。 本研究程序包含兩部分,一為模擬資料(N=1000,3000,5000,10000;缺失比例為5%,10%,15%,30%,50%)的分析,模擬研究主要作為實證研究結果的驗證與推論;另一個則為實證資料的分析與討論。針對不完整作答反應,基於IRT的強假設前提,以及成就測驗作答反應的資料型態,研究者選擇熱卡插補法(Hot Deck imputation method)的統計插補技術,分別對於實證資料與模擬資料中之各類樣本數,與不同缺失比率下的作答反應作插補。另又以EM插補法作對照分析。 根據研究結果與討論,提出以下幾點歸納結論:(一)當缺失比例不大時,能符合原本的資料分佈假設,但隨著缺失比例愈高,高至30%以上時,已漸不符合原本假設;(二)當缺失比例愈高時,各項參數之估計標準差值幾乎是最大的;若忽略未作答反應之受試者的表現時,其分析所得的參數估計值亦並未是最佳的,反而是將所有受試者的作答反應進行插補估計後,所得的參數估計標準差值才是最小、最佳的;(三)本研究中,主要以熱卡法為插補方法,而EM插補法並不符合本研究資料之性質,故若採用此法進行插補,則所得的估計標準差會是最大的;(四)經過模擬研究與實證資料的分析後,證明熱卡法所推估的未作答反應,與直接刪除未作答反應或不處理未作答反應的確有差異存在,且經過插補所產生的替代值,對於受試者的能力表現能提供更穩定有效的解釋力。 關鍵詞:熱卡插補法、不完整作答反應、成就測驗 / The purpose of this study is to examine whether examinees' incomplete responses (non-response or missing values) on an achievement test can be made up by statistical imputation. The research design has two parts: a simulation study (sample sizes of 1,000, 3,000, 5,000, and 10,000; non-response rates of 5%, 10%, 15%, 30%, and 50%) and an empirical study. The hot deck imputation method is the main method of interest; to test whether it suits achievement tests, the EM method is used for comparison. The results are as follows: 1. When the non-response rate is below 30%, the distribution of the imputed data matches that of the original data, but as the rate rises to 30% and above, the distribution increasingly departs from the original assumption. 2. When hot deck imputation was applied to achievement tests with different sample sizes and non-response rates, higher non-response rates produced larger standard deviations of the parameter estimates at every sample size. 
Moreover, ignoring or deleting non-responses is not a good way to handle this kind of response pattern. Imputing an appropriate answer for each non-response with the hot deck method yields the smallest standard deviations for the test and ability parameter estimates and the largest test information for examinees. 3. The hot deck imputation method suits the data pattern of achievement tests better than the EM method; the two methods produce different outcomes, and hot deck imputation gives accurate parameter estimates. 4. Based on the above, this study suggests that hot deck imputation can cope well with non-response in achievement tests. Key Words: Hot Deck imputation method, Non response, Achievement test
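The hot deck idea described in this abstract can be illustrated with a minimal sketch. It assumes a random within-class hot deck in which each missing item response is copied from a randomly drawn donor in the same class who answered that item. The function name `hot_deck_impute`, the `classes` grouping, and the toy data are hypothetical and are not taken from the thesis, whose exact donor-selection rule is not stated here.

```python
import random


def hot_deck_impute(responses, classes, seed=0):
    """Fill None entries by copying a randomly drawn donor's answer.

    Donors are other examinees in the same class (an assumed grouping,
    e.g. an ability stratum) who answered the item in question.
    """
    rng = random.Random(seed)
    imputed = [row[:] for row in responses]  # leave the input untouched
    for i, row in enumerate(responses):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [
                responses[k][j]
                for k in range(len(responses))
                if k != i
                and classes[k] == classes[i]
                and responses[k][j] is not None
            ]
            if donors:  # keep the gap when the class has no donor
                imputed[i][j] = rng.choice(donors)
    return imputed


# Toy 0/1 item responses; None marks a non-response (hypothetical data).
responses = [
    [1, 1, 0, 1],
    [1, None, 0, 1],
    [0, 0, None, 0],
    [0, 0, 1, 0],
]
classes = ["high", "high", "low", "low"]
completed = hot_deck_impute(responses, classes)
```

Each gap in this toy data has a single eligible donor, so the result does not depend on the seed; with several donors the draw is random, which is why the seed is exposed.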
232

ESSAYS ON FARMER WILLINGNESS TO PARTICIPATE IN BEST MANAGEMENT PRACTICES IN THE KENTUCKY RIVER WATERSHED

Zhong, Hua 01 January 2016 (has links)
This dissertation explores the adoption of Best Management Practices (BMPs) in the Kentucky River watershed. Through a survey of farmers in the Kentucky River watershed, chapter two investigates farmers’ current BMP adoption and their willingness to engage in additional adoption incentivized through a proposed Water Quality Trading (WQT) program. This chapter has two parts: the first investigates the factors influencing farmers’ current usage of BMPs; the second estimates farmers’ willingness to implement BMPs given different levels of compensation specified in the survey. Farmers’ experience with BMPs makes them more likely to adopt additional BMPs. The use of riparian buffers, fencing off animals, and building waste storage facilities are found to be responsive to the levels of compensation offered. The third chapter discusses farmers’ expected economic benefits from BMP adoption and addresses the missing data issue. In the survey, about 20% of the respondents who indicated that they would accept the offered level of compensation did not answer the follow-up question of how much they would adopt the practice, creating missing data. We compare three methods of handling the missing data: deletion, mean imputation, and multiple imputation. Following these methods, we estimate factors affecting how much farmers may engage in BMPs using a Tobit or Poisson model. The results show that increasing the compensation for using BMPs is more likely to encourage farmers to adopt riparian buffers. Results obtained using multivariate imputation by chained equations are more promising than those obtained using deletion or mean imputation. The fourth chapter examines whether wealth change and local community interaction may affect BMP adoption. Survey data on BMP adoption are combined with local community data from publicly available sources. 
Results show that the decrease in land values between 2007 and 2012 discouraged the adoption of riparian buffers; the equine inventory in local communities has a positive impact on the adoption of animal fences and nutrient management; and the more rural the local communities are, the less likely farmers are to fence off livestock from water resources.
233

Bayesian Cluster Analysis : Some Extensions to Non-standard Situations

Franzén, Jessica January 2008 (has links)
The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions as well as associated cluster probabilities for all objects. We extend this method in several directions to some common but non-standard situations. The first extension covers the case with a few deviant observations not belonging to one of the normal clusters. An extra component/cluster is created for them, which has a larger variance or a different distribution, e.g. is uniform over the whole range. The second extension is clustering of longitudinal data. All units are clustered at all time points separately and the movements between time points are modeled by Markov transition matrices. This means that the clustering at one time point will be affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm. The Bayesian inference of mixture models has many advantages over the classical approach. However, it is not without computational difficulties. A software package, written in Matlab for Bayesian inference of mixture models is introduced. The programs of the package handle the basic cases of clustering data that are assumed to arise from mixture models of multivariate normal distributions, as well as the non-standard situations.
234

Avaliação de métodos de imputação na variável Receita das empresas da Pesquisa Anual de Comércio - PAC-IBGE / An evaluation of imputation methods on the Revenue variable from the Annual Survey of Commerce (PAC-IBGE) companies

Rodrigues, João Carlos Silva 07 June 2019 (has links)
O presente trabalho utiliza as informações da Pesquisa Anual do Comércio - PAC, uma das quatro pesquisas econômicas estruturais do IBGE, para avaliar o Modelo de Imputação atual da pesquisa comparando-o com outros modelos disponíveis na literatura. Foi feito um recorte da base da PAC-IBGE dos anos de 2014 e 2015 e foram testados vinte modelos de imputação. Na PAC, tem sido observado um aumento do impacto das não-respostas nas estimativas de seus totais. Isto deriva da alta assimetria das variáveis econômicas em conjunto com o pequeno número de empresas de alguns estratos, somados ainda ao aumento populacional de algumas atividades econômicas - e, por consequência, dos pesos amostrais - e ainda do elevado número de mortes (fechamento) de empresas pequenas. Tais problemas apresentados geram a necessidade de se estudar alternativas de tratamento para essas empresas não-respondentes. Os modelos foram analisados selecionando algumas empresas aleatoriamente e assumindo que elas não tivessem respondido à pesquisa. Posteriormente, essas empresas foram submetidas aos modelos de imputação selecionados e os resultados foram avaliados utilizando Erro Quadrático Médio (EQM) e Variação Percentual (VP) dos totais estimados contra o real. Foi escolhida a variável de RECEITA para ser usada nos testes. Os modelos utilizados podem ser agrupados em quatro grupos: de médias de respondentes; através de uma regressão com uso de variáveis auxiliares de cadastro; média dos respondentes mais próximos através de uma função distância; e através de uma regressão dos respondentes mais próximos com uso de uma função distância. Ao final das análises, verificou-se que apesar de alguns modelos também terem tido bons desempenhos, não foi observado um fator relevante que indique a troca do modelo atual de imputação utilizado na PAC-IBGE. 
/ This work uses data from the Annual Survey of Commerce (PAC), one of IBGE's four structural economic surveys, to evaluate the survey's current imputation model against other models available in the literature. The dataset is a subset of PAC data from 2014 and 2015, and twenty imputation models were tested. In PAC, the impact of non-response on the estimates of totals has been increasing. This stems from the high skewness of the economic variables combined with the small number of companies in some strata, together with population growth in some economic activities (and, consequently, larger sample weights) and the high number of deaths (closures) of small businesses. These problems create the need to study alternative treatments for non-responding companies. The models were analyzed by randomly selecting some companies and assuming that they had not responded to the survey. These companies were then submitted to the selected imputation models, and the results were evaluated using the Mean Square Error (MSE) and the Percent Variation (PV) of the estimated totals against the real ones. The Revenue variable was chosen for the tests. The models fall into four groups: means of respondents; regression using auxiliary register variables; mean of the nearest respondents according to a distance function; and regression on the nearest respondents according to a distance function. The analyses showed that, although some other models also performed well, no relevant factor was found to justify replacing the imputation model currently used in PAC-IBGE.
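The evaluation protocol in this abstract (hide known values, impute them, compare against the truth) can be sketched as follows. This is only an illustration under stated assumptions: the imputation model is a plain respondent mean (the simplest of the four groups mentioned), sample weights and strata are ignored, and the formulas used for MSE and PV are common textbook definitions rather than the ones confirmed by the thesis. The names `mean_impute`, `evaluate`, and the revenue figures are hypothetical.

```python
def mean_impute(values):
    """Replace None by the mean of the observed (responding) values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def evaluate(true_values, masked_idx):
    """Hide the chosen units, impute them, and score the imputation.

    Returns (mse, pv): the mean squared error over the hidden units and
    the percent variation of the imputed total against the true total.
    Both definitions are assumptions made for this sketch.
    """
    masked = [None if i in masked_idx else v for i, v in enumerate(true_values)]
    imputed = mean_impute(masked)
    mse = sum((imputed[i] - true_values[i]) ** 2 for i in masked_idx) / len(masked_idx)
    true_total = sum(true_values)
    pv = 100.0 * (sum(imputed) - true_total) / true_total
    return mse, pv


# Hypothetical revenue values for one stratum; unit 3 is treated as a
# simulated non-respondent.
revenue = [100.0, 120.0, 80.0, 200.0, 100.0]
mse, pv = evaluate(revenue, {3})
```

The toy data deliberately hides the largest unit, which shows how a skewed variable can pull the imputed total well below the true one.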
235

Droit de la responsabilité des états et arbitrage transnational CIRDI / Law of state responsibility and ICSID transnational arbitration

Kane, Mouhamadou Madana 19 December 2012 (has links)
La prolifération des traités bilatéraux d'investissement a contribué, ces dernières années, à l'augmentation des litiges portés devant les tribunaux d'arbitrage du Centre International pour le Règlement des Différends relatifs aux Investissements (CIRDI). En effet, les clauses de règlement des différends contenus dans ces traités ont permis aux investisseurs étrangers de saisir directement les tribunaux CIRDI en cas de violation par l'État d'accueil de l'investissement des dispositions protectrices ou de traitement prévues dans ces traités. La présence de l'État au contentieux CIRDI fait que les litiges soumis aux tribunaux arbitraux portent par nature sur des questions de responsabilité. Dès lors, l'invocation par les arbitres des règles coutumières du droit de la responsabilité de l'État, telles que codifiées par la Commission du droit international, est quasi systématique. Au regard de la pratique arbitrale, cette thèse se veut un essai sur les interactions entre le droit de la responsabilité de l'État et l'arbitrage CIRDI sur le fondement des traités de protection, l'objectif final étant de parvenir à une conclusion sur l'existence ou non d'un sous-système de responsabilité de l' État sur le fondement des traités de protection des investissements. Pour ce faire, suivant la démarche de codification de la Commission du droit International, elle met l'accent sur l'influence des règles coutumières d'engagement de la responsabilité de l'État sur la pratique des tribunaux d'arbitrage du CIRDI fondée sur les traités de protection ; et, sous l'angle de la mise en œuvre et du contenu de la responsabilité étatique, elle aborde, à la lumière du droit international général, les aspects de compétence des tribunaux d'arbitrage du CIRDI, les éléments de recevabilité des réclamations des investisseurs étrangers, et les questions liées à la réparation du préjudice causé par l'État. 
/ With the proliferation of bilateral investment treaties, many disputes have in recent years been brought before arbitral tribunals under the auspices of the International Centre for Settlement of Investment Disputes (ICSID). By virtue of the dispute settlement clauses of such treaties, foreign investors are able to bring claims directly before ICSID tribunals when the host State breaches its treaty-based protection and treatment obligations. Because of the State's involvement, ICSID disputes raise, by nature, issues of State responsibility. It is therefore not surprising that ICSID arbitrators systematically rely on the customary rules on State responsibility, as codified by the International Law Commission, to form and justify their opinions. This thesis aims to assess, in light of arbitral practice, the interactions between the law of State responsibility and ICSID's treaty-based arbitration, with the objective of determining whether State responsibility under investment protection treaties is a self-contained regime. We have adopted the International Law Commission's codification approach to highlight, on the one hand, the influence of the customary rules on the engagement of State responsibility on the practice of ICSID arbitral tribunals; and, on the other hand, with regard to the invocation and content of the State's responsibility, the relationships between general international law and salient aspects of the jurisdiction of ICSID tribunals, the admissibility of claims, and the reparation of injury caused to the investor by the State.
236

Estudo genético quantitativo e molecular de características de crescimento e carcaça em bovinos da raça Nelore usando inferência bayesiana. / Quantitative and molecular study of growth and carcass traits in Nellore cattle using bayesian inference.

Cucco, Diego de Córdova 22 November 2010 (has links)
Estudos genético quantitativos e moleculares são fundamentais para o melhoramento animal e sua realização com a raça Nelore é de grande importância devido a ampla participação dessa no rebanho de corte nacional. A estimação constante dos parâmetros genéticos das características de produção é necessário para a adequada condução do processo de seleção dos animais. A melhoria de características relacionadas à carcaça bovina é essencial para a eficiência e sustentabilidade da atividade e a implementação de métodos de seleção animal baseados em informações moleculares pode revolucionar a produção zootécnica e deve ser profundamente estudado. Sendo assim, os objetivos do presente estudo foram estimar parâmetros genéticos e componentes de variância através de diferentes modelos matemáticos para um total de 14 características fenotípicas (o peso ao nascimento, peso a desmama, peso ao sobreano, ganho de peso entre a desmama e o sobreano ajustado para um intervalo de 345 dias, perímetro escrotal ao sobreano, altura de garupa ao sobreano, escores visuais avaliados ao sobreano de conformação, precocidade, musculosidade, comprimento de umbigo e ossatura, e ainda características de carcaça mensuradas por ultrassonografia realizada após 30 a 45 dias de confinamento como a área de olho de lombo, espessura de gordura subcutânea, espessura de gordura na picanha). Foram estimadas correlações entre todas estas características com as de carcaça mensuradas por ultrassonografia. Sob o enfoque molecular, desenvolveu-se um programa para imputação de genótipos faltantes e estudaram-se diferentes métodos de associação de marcadores moleculares do tipo mutação de base nitrogenada única (SNP) a características de produção incluídas no índice de seleção de um programa de melhoramento da raça Nelore, utilizando inferência bayesiana. Todas as características estudadas podem ser selecionadas esperando-se progresso genético na população. 
Os efeitos maternos foram importantes em algumas características onde normalmente estes efeitos não têm sido considerados atualmente. A quantidade de escores atribuídos a uma característica categórica assim como o número de observações fenotípicas resultam em diferenças nas estimativas quando avaliadas por modelos lineares ou de limiar. Não deverão ser obtidos resultados satisfatórios na melhoria das características de carcaça se a seleção for baseada nas tradicionais avaliações visuais utilizadas no momento. Os métodos utilizados na análise de associação dos marcadores podem originar diferentes resultados. Os marcadores que apresentaram efeitos altamente relevantes (P<0,01) geralmente apresentaram resultados semelhantes, independentemente do método utilizado. Certos marcadores podem ter efeitos positivos para algumas características componentes do índice de seleção e negativos para as demais. A análise em conjunto com todos os SNP's e todos os dados fenotípicos disponíveis é viável e parece ser a mais adequada. O método desenvolvido de imputação de genótipos faltantes a partir do parentesco de animais genotipados foi eficiente. / Quantitative and molecular genetic studies are very important for animal breeding, and studies with Nellore cattle are especially important because of the breed's large share of the Brazilian beef cattle industry (around 80% of the herd). Constant estimation of genetic parameters for production traits is necessary to conduct the selection of animals properly. Improving carcass traits is essential for the efficiency and profitability of the activity. Implementing animal selection methods based on molecular information could revolutionize animal production and should be studied in depth. 
Thus, the objectives of this study were to estimate genetic parameters and variance components using different mathematical models for a total of 14 traits: birth weight, weaning weight, yearling weight, weight gain between weaning and yearling adjusted to 345 days, yearling scrotal circumference, yearling hip height, and yearling visual scores for conformation, finishing, muscularity, bone structure, and navel length. Carcass traits measured by ultrasound after 30 to 45 days in the feedlot (at approximately 20 months of age), namely rib-eye area, fat thickness, and rump fat thickness, were also evaluated. Correlations between all these traits and the carcass traits measured by ultrasound were estimated. On the molecular side, an algorithm for imputing missing genotypes was developed, and different methods were studied for associating single nucleotide polymorphism (SNP) markers with the production traits included in the selection index of a breeding program applied to the studied population, using Bayesian inference. Genetic progress can be expected from selection on all the traits studied. Maternal effects were important for some traits in which those effects are not usually considered. The number of scores assigned to a categorical trait and the number of observations can lead to different estimates when evaluated by linear or threshold models. Selection on the visual scores traditionally used in that population will not improve carcass traits. The methods used to analyze marker association may lead to different results. SNPs with highly relevant effects (P<0.01) generally gave similar results regardless of the method used. Some markers may have positive effects on some traits in the selection index but negative effects on others. The joint analysis of all SNPs and all available phenotypes is feasible and appears to be the most appropriate. 
The algorithm developed for imputation of missing genotypes from pedigree information of genotyped animals was efficient.
237

Imputação de dados em experimentos multiambientais: novos algoritmos utilizando a decomposição por valores singulares / Data imputation in multi-environment trials: new algorithms using the singular value decomposition

Alarcon, Sergio Arciniegas 02 February 2016 (has links)
As análises biplot que utilizam os modelos de efeitos principais aditivos com interação multiplicativa (AMMI) requerem matrizes de dados completas, mas, frequentemente os ensaios multiambientais apresentam dados faltantes. Nesta tese são propostas novas metodologias de imputação simples e múltipla que podem ser usadas para analisar dados desbalanceados em experimentos com interação genótipo por ambiente (G×E). A primeira, é uma nova extensão do método de validação cruzada por autovetor (Bro et al., 2008). A segunda, corresponde a um novo algoritmo não-paramétrico obtido por meio de modificações no método de imputação simples desenvolvido por Yan (2013). Também é incluído um estudo que considera sistemas de imputação recentemente relatados na literatura e os compara com o procedimento clássico recomendado para imputação em ensaios (G×E), ou seja, a combinação do algoritmo de Esperança-Maximização com os modelos AMMI ou EM-AMMI. Por último, são fornecidas generalizações da imputação simples descrita por Arciniegas-Alarcón et al. (2010) que mistura regressão com aproximação de posto inferior de uma matriz. Todas as metodologias têm como base a decomposição por valores singulares (DVS), portanto, são livres de pressuposições distribucionais ou estruturais. Para determinar o desempenho dos novos esquemas de imputação foram realizadas simulações baseadas em conjuntos de dados reais de diferentes espécies, com valores retirados aleatoriamente em diferentes porcentagens e a qualidade das imputações avaliada com distintas estatísticas. Concluiu-se que a DVS constitui uma ferramenta útil e flexível na construção de técnicas eficientes que contornem o problema de perda de informação em matrizes experimentais. / Biplot analyses using additive main effects and multiplicative interaction (AMMI) models require complete data matrices, but multi-environment trials often have missing values. 
This thesis proposes new single and multiple imputation methods that can be used to analyze unbalanced data in experiments with genotype-by-environment interaction (G×E). The first is a new extension of the eigenvector cross-validation method (Bro et al., 2008). The second is a new non-parametric algorithm obtained by modifying the single imputation method developed by Yan (2013). Also included is a study that compares imputation systems recently reported in the literature with the classic procedure recommended for imputation in G×E trials, namely the combination of the Expectation-Maximization (EM) algorithm with AMMI models, or EM-AMMI. Finally, generalizations are given of the single imputation described by Arciniegas-Alarcón et al. (2010), which combines regression with a lower-rank approximation of a matrix. All the methodologies are based on the singular value decomposition (SVD) and are therefore free of distributional or structural assumptions. To assess the performance of the new imputation schemes, simulations were run on real data sets from different species, with values removed at random in different percentages, and the quality of the imputations was evaluated with several statistics. It was concluded that the SVD is a useful and flexible tool for building efficient techniques that circumvent the problem of missing information in experimental matrices.
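A generic way to see how a low-rank decomposition can fill gaps in a genotype-by-environment table is the iterative scheme sketched below. It is not one of the thesis's algorithms: it is a minimal illustration that starts from column means and repeatedly replaces the missing cells with the values of a rank-1 approximation, with the leading singular triplet computed by power iteration so that the example needs only the standard library. The names `rank1_approx`, `svd_impute`, the fixed iteration counts, and the toy table are assumptions.

```python
import math


def rank1_approx(matrix, sweeps=50):
    """Best rank-1 approximation via power iteration on the matrix.

    Assumes the matrix is not all zeros (no guard against a zero norm).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u = [0.0] * n_rows
    sigma = 0.0
    for _ in range(sweeps):
        # u <- M v, normalised
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- M^T u, normalised; its norm estimates the singular value
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in w))
        v = [x / sigma for x in w]
    return [[sigma * u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def svd_impute(table, iterations=500):
    """Fill None cells by iterating a rank-1 reconstruction."""
    n_rows, n_cols = len(table), len(table[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols) if table[i][j] is None]
    filled = [row[:] for row in table]
    for j in range(n_cols):  # initial guess: column (environment) means
        observed = [table[i][j] for i in range(n_rows) if table[i][j] is not None]
        col_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if filled[i][j] is None:
                filled[i][j] = col_mean
    for _ in range(iterations):
        approx = rank1_approx(filled)
        for i, j in missing:  # observed cells are never overwritten
            filled[i][j] = approx[i][j]
    return filled


# Hypothetical genotype (rows) by environment (columns) table whose
# complete version would be exactly rank 1.
yields = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [3.0, 6.0, None]]
completed = svd_impute(yields)
```

On real trial data one would normally choose the rank by cross-validation and stop on a convergence tolerance rather than a fixed iteration count; both are simplified away here.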
238

Imputação múltipla: comparação e eficiência em experimentos multiambientais / Multiple Imputations: comparison and efficiency of multi-environmental trials

Silva, Maria Joseane Cruz da 19 July 2012 (has links)
Em experimentos de genótipos × ambiente são comuns à presença de valores ausentes, devido à quantidade insuficiente de genótipos para aplicação dificultando, por exemplo, o processo de recomendação de genótipos mais produtivos, pois para a aplicação da maioria das técnicas estatísticas multivariadas exigem uma matriz de dados completa. Desta forma, aplicam-se métodos que estimam os valores ausentes a partir dos dados disponíveis conhecidos como imputação de dados (simples e múltiplas), levando em consideração o padrão e o mecanismo de dados ausentes. O objetivo deste trabalho é avaliar a eficiência da imputação múltipla livre da distribuição (IMLD) (BERGAMO et al., 2008; BERGAMO, 2007) comparando-a com o método de imputação múltipla com Monte Carlo via cadeia de Markov (IMMCMC), na imputação de unidades ausentes presentes em experimentos de interação genótipo (25) × ambiente (7). Estes dados são provenientes de um experimento aleatorizado em blocos com a cultura de Eucalyptus grandis (LAVORANTI, 2003), os quais foram feitas retiradas de porcentagens aleatoriamente (10%, 20%, 30%) e posteriormente imputadas pelos métodos considerados. Os resultados obtidos por cada método mostraram que, a eficiência relativa em ambas as porcentagens manteve-se acima de 90%, sendo menor para o ambiente (4) quando imputado com a IMLD. Para a medida geral de exatidão, a medida que ocorreu acréscimo de dados em falta, foi maior ao imputar os valores ausentes com a IMMCMC, já para o método IMLD estes valores variaram sendo menor a 20% de retirada aleatória. Dentre os resultados encontrados, é de suma importância considerar o fato de que o método IMMCMC considera a suposição de normalidade, já o método IMLD leva vantagem sobre este ponto, pois não considera restrição alguma sobre a distribuição dos dados nem sobre os mecanismos e padrões de ausência. 
/ In genotype × environment trials, missing values are common because of an insufficient quantity of genotypes for application. This hampers, for example, the recommendation of the most productive genotypes, because most multivariate statistical techniques require a complete data matrix. Thus, methods that estimate the missing values from the available data, known as data imputation (single and multiple), are applied, taking into account the pattern and mechanism of the missing data. The goal of this study is to evaluate the efficiency of distribution-free multiple imputation (IMLD) (BERGAMO et al., 2008; BERGAMO, 2007) by comparing it with multiple imputation by Markov chain Monte Carlo (IMMCMC) in imputing missing units in genotype (25) × environment (7) interaction trials. The data come from a randomized block experiment with Eucalyptus grandis (LAVORANTI, 2003), from which values were randomly removed at several percentages (10%, 20%, 30%) and then imputed by the methods considered. The results for each method show that the relative efficiency remained above 90% across the removal percentages, being lowest for environment (4) when imputed with IMLD. The general accuracy measure increased as the amount of missing data increased when the missing values were imputed with IMMCMC, whereas for IMLD it varied, being lowest at 20% random removal. Among the findings, it is very important to note that the IMMCMC method assumes normality, whereas the IMLD method has the advantage of placing no restriction on the distribution of the data or on the missingness mechanisms and patterns.
239

Modeling Patterns of Small Scale Spatial Variation in Soil

Huang, Fang 11 January 2006 (has links)
The microbial communities found in soils are inherently heterogeneous and often exhibit spatial variations on a small scale. Becker et al. (2006) investigate this phenomenon and present statistical analyses to support their findings. In this project, alternative statistical methods and models are considered and employed in a re-analysis of the data from Becker. First, parametric nested random effects models are considered as an alternative to the nonparametric semivariogram models and kriging methods employed by Becker to analyze patterns of spatial variation. Second, multiple logistic regression models are employed to investigate factors influencing microbial community structure as an alternative to the simple logistic models used by Becker. Additionally, the microbial community profile data of Becker were unobservable at several points in the spatial grid. The Becker analysis assumes that the data are missing completely at random and as such have relatively little impact on inference. In this re-analysis, this assumption is investigated and it is shown that the pattern of missingness is correlated with both metabolic potential and spatial coordinates and thus provides useful information that was previously ignored by Becker. Multiple imputation methods are employed to incorporate the information present in the missing data pattern and results are compared with those of Becker.
240

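Algoritmo kNN na imputação de dados de espectros de massa do tipo MALDI-TOF: uma análise da influência da imputação com kNN sobre o desempenho de classificadores logísticos para identificação de bactérias / The kNN algorithm in the imputation of MALDI-TOF mass spectra data: an analysis of the influence of kNN imputation on the performance of logistic classifiers for bacteria identification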

Santos, Fábio dos 14 September 2018 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / O processo de identificação de bactérias relacionadas ao crescimento vegetal, é alvo de diversos estudos na área de bioinformática. Uma das formas para realizar esta identificação é utilizar dados de espectrometria de massa do tipo MALDI-TOF para detectar a presença de proteínas ribossomais em uma amostra, e então, usar classificadores para processar estes dados e selecionar o rótulo com a maior probabilidade. Durante o processo de geração dos espectros de massa para classificação é comum a não detecção de algum dos picos relacionados a proteínas ribossomais. Considerando isto, este trabalho apresenta um estudo sobre o uso do algoritmo kNN para imputação desses casos. O estudo foi desenvolvido com o uso de classificadores logísticos para identificação de bactérias da espécie Staphylococcus aureus e do gênero Bacillus. Durante os experimentos foram testados três técnicas para imputar dados: imputação com zero, imputação com a média do atributo faltante, e a imputação com kNN. Desta última foram usadas duas abordagens: função de agregação de média e função de agregação de mediana. O protocolo experimental implementado possibilitou avaliar a influência da imputação sobre os resultados de classificação sob diferentes cenários no que se refere ao número de variáveis faltantes. 
Os resultados obtidos mostram que o emprego do kNN não levou à uma redução do desempenho dos classificadores, em relação àquele observado quando do uso de dados completos. Além disto, a classificação de dados submetidos a imputação pelo kNN apresentou desempenho superior àquele verificado quando do uso dos demais métodos. / Identifying plant growth-promoting bacteria is the subject of several studies in bioinformatics. One approach is to process a sample’s ribosomal protein data, obtained by MALDI-TOF mass spectrometry, through a classifier and select the label with the highest probability. However, when mass spectra are generated, it is common for some peaks related to ribosomal proteins to go undetected. With this in mind, this work presents a study of data imputation with the kNN algorithm. Logistic classifiers were applied to identify bacteria of the genus Bacillus and the species Staphylococcus aureus, and three data imputation techniques were tested: imputation with zero, with the mean of the missing attribute, and with the kNN algorithm. For the latter, two approaches were considered: a mean aggregation function and a median aggregation function. The experimental protocol investigated the influence of imputation on classification results under different scenarios regarding the number of missing variables. The results show that neither kNN approach significantly reduced classifier performance compared with using complete data, and that classification of data imputed by kNN outperformed classification with the other methods considered.
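The two kNN variants mentioned in this abstract (mean and median aggregation) can be illustrated with a small sketch. It assumes a Euclidean-type distance computed only over the features both rows have observed, which is one common convention and not necessarily the one used in the dissertation. The function name `knn_impute`, the value of `k`, and the toy peak table are hypothetical.

```python
import math
import statistics


def knn_impute(rows, k=2, aggregate=statistics.mean):
    """Fill None entries from the k nearest rows that observed the feature.

    Distance is the root mean squared difference over features observed
    in both rows (an assumption of this sketch).
    """
    completed = [row[:] for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            neighbours = [val for _, val in candidates[:k]]
            if neighbours:  # keep the gap if no neighbour is available
                completed[i][j] = aggregate(neighbours)
    return completed


# Hypothetical peak table: rows are spectra, columns are peak features,
# None marks an undetected peak.
spectra = [
    [1.0, 1.0, 10.0],
    [1.0, 1.2, 12.0],
    [5.0, 5.0, 50.0],
    [1.1, 1.0, None],
]
by_mean = knn_impute(spectra, k=2)
by_median = knn_impute(spectra, k=3, aggregate=statistics.median)
```

With `k=3` the distant third spectrum enters the neighbourhood, and the median keeps the imputed value close to the two near neighbours, which is the usual motivation for offering a median alternative to the mean.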
