  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
141

Bayesian Modeling of Complex High-Dimensional Data

Huo, Shuning 07 December 2020 (has links)
With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional complex data in different forms, such as medical images and genomics measurements. However, acquiring more data does not automatically lead to better knowledge discovery; efficient and reliable analytical tools are needed to extract useful information from complex datasets. The main objective of this dissertation is to develop innovative Bayesian methodologies that enable effective and efficient knowledge discovery from complex high-dimensional data. It contains two parts: the development of computationally efficient functional mixed models, and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part tackles the computational bottleneck in Bayesian functional mixed models. We propose a computational framework called the variational functional mixed model (VFMM), which facilitates efficient data compression and high-performance computing in basis space. We also propose a new multiple testing procedure in basis space that can be used to detect significant local regions. The effectiveness of the proposed model is demonstrated on two datasets: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part concerns modeling data heterogeneity using Dirichlet Diffusion Trees. We propose a Bayesian latent tree model that incorporates subject covariates to characterize heterogeneity and uncover the latent tree structure underlying the data. This model can reveal hierarchical evolution processes through branch structures and estimate systematic differences between groups of samples. We demonstrate its effectiveness through a simulation study and a real brain tumor dataset.
/ Doctor of Philosophy / With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquiring such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts: the development of an ultra-fast functional mixed model, and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part focuses on developing approximate Bayesian methods in functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of the proposed method: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part focuses on modeling data heterogeneity via Dirichlet Diffusion Trees. The method helps uncover underlying hierarchical tree structures and estimate systematic differences between groups of samples. We demonstrate its effectiveness on brain tumor imaging data.
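Entry 141's VFMM works in a compressed basis space. As a rough, generic illustration of why basis-space compression helps (using a discrete cosine transform as a stand-in for the wavelet or spline bases typically used in functional data analysis; the signal, grid size, and threshold below are invented for the sketch):

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)

# Synthetic functional observation: smooth signal plus noise on a fine grid.
t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)
y = signal + 0.1 * rng.standard_normal(t.size)

# Project into basis space (DCT-II) and keep only the K largest coefficients.
coefs = dct(y, norm="ortho")
K = 32
keep = np.argsort(np.abs(coefs))[-K:]
compressed = np.zeros_like(coefs)
compressed[keep] = coefs[keep]

# Reconstruct from the compressed representation.
y_hat = idct(compressed, norm="ortho")
rel_err = np.linalg.norm(y_hat - signal) / np.linalg.norm(signal)
print(f"kept {K}/{t.size} coefficients, relative error {rel_err:.3f}")
```

Modeling on the `K` retained coefficients instead of the 512 grid points is the kind of dimension reduction that makes posterior computation tractable for functional data.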
142

Application and feasibility of visible-NIR-MIR spectroscopy and classification techniques for wetland soil identification

Whatley, Caleb 10 May 2024 (has links) (PDF)
Wetland determinations require visual identification of anaerobic soil indicators by an expert, a complex and subjective task. To eliminate bias, an objective method for identifying wetland soil is needed; currently, no such method exists that is both rapid and easily interpretable. This study proposes a method for wetland soil identification using visible through mid-infrared (MIR) spectroscopy and classification algorithms. Wetland and non-wetland soils (n = 440) were collected across Mississippi, and spectra were measured from fresh and dried soil. Support Vector Classification and Random Forest models were used to classify spectra with a 75%/25% calibration/validation split. POWERSHAP Shapley feature selection and Gini importance were used to locate the highest-contributing spectral features. Average classification accuracy was approximately 91%, with a maximum accuracy of 99.6% on MIR spectra. The most important features were related to iron compounds, nitrates, and soil texture. This study improves the reliability of wetland determinations by providing an objective, rapid wetland soil identification method that removes the need for expert determination.
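The classification protocol in entry 142 (75%/25% split, Support Vector Classification and Random Forest) can be sketched with scikit-learn on synthetic "spectra". Everything below — the data generator, the band where the absorption feature sits, the default hyperparameters — is an assumption for illustration, not the study's actual pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Hypothetical stand-in for soil spectra: "wetland" samples (class 1) get a
# broad absorption feature subtracted around band 120 (purely synthetic).
n, p = 200, 300
X = rng.standard_normal((n, p)).cumsum(axis=1) * 0.01 + 1.0
y = rng.integers(0, 2, n)
bands = np.arange(p)
feature = np.exp(-((bands - 120) ** 2) / 200.0)
X[y == 1] -= 0.3 * feature

# 75%/25% calibration/validation split, as in the study design.
X_cal, X_val, y_cal, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

accs = {}
for name, model in [("SVC", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    model.fit(X_cal, y_cal)
    accs[name] = model.score(X_val, y_val)
print(accs)
```

Feature-importance follow-ups (Gini importance via `RandomForestClassifier.feature_importances_`, or SHAP-style selection) would then point back to the spectral bands driving the decision.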
143

Topics in Modern Bayesian Computation

Qamar, Shaan January 2015 (has links)
Collections of large volumes of rich and complex data have become ubiquitous in recent years, posing new challenges in methodological and theoretical statistics alike. Today, statisticians are tasked with developing flexible methods capable of adapting to the degree of complexity and noise in increasingly rich data gathered across a variety of disciplines and settings. This has spurred the need for novel multivariate regression techniques that can efficiently capture a wide range of naturally occurring predictor-response relations, identify important predictors and their interactions, and do so even when the number of predictors is large but the sample size remains limited.

Meanwhile, efficient model-fitting tools must evolve quickly to keep pace with the rapidly growing dimension and complexity of the data they are applied to. Aided by the tremendous success of modern computing, Bayesian methods have gained great popularity in recent years. These methods provide a natural probabilistic characterization of uncertainty in parameters and predictions, and a practical way of encoding model structure that can lead to large gains in statistical estimation and more interpretable results. However, this flexibility is often hindered in applications to modern data, which are increasingly high dimensional in both the number of observations n and the number of predictors p. Here, computational complexity and the curse of dimensionality typically render posterior computation inefficient. In particular, Markov chain Monte Carlo (MCMC) methods, which remain the workhorse of Bayesian computation owing to their generality and asymptotic accuracy guarantees, typically suffer data-processing and computational bottlenecks as a consequence of (i) the need to hold the entire dataset (or available sufficient statistics) in memory at once, and (ii) having to evaluate the (often expensive) data likelihood at each sampling iteration.

This thesis divides into two parts. The first part develops efficient MCMC methods for posterior computation in the high-dimensional, large-n large-p setting. In particular, we develop an efficient and widely applicable approximate inference algorithm that extends MCMC to the online data setting, and we separately propose a novel stochastic search sampling scheme for variable selection in high-dimensional predictor settings. The second part develops novel methods for structured sparsity in the high-dimensional, large-p small-n regression setting. Here, statistical methods should scale well with the predictor dimension and efficiently identify low-dimensional structure so as to facilitate optimal statistical estimation with limited data. Importantly, these methods must be flexible enough to accommodate potentially complex relationships between the response and its explanatory variables. The first work proposes a nonparametric additive Gaussian process model to learn predictor-response relations that may be highly nonlinear and include numerous lower-order interaction effects, possibly in different parts of the predictor space. A second work proposes a novel class of Bayesian shrinkage priors for multivariate regression with a tensor-valued predictor. Dimension reduction is achieved using a low-rank additive decomposition of the latter, enabling a highly flexible and rich structure within which excellent cell estimation and region selection may be obtained through state-of-the-art shrinkage methods. The methods developed in these works also come with strong theoretical guarantees. / Dissertation
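The MCMC bottleneck the abstract describes — a full-data likelihood evaluation at every sampling iteration — is easy to see in a minimal random-walk Metropolis sampler. This is a generic sketch with synthetic Gaussian data, known variance, and a flat prior, not the thesis's online or stochastic-search algorithms:

```python
import numpy as np

rng = np.random.default_rng(2)

# Full dataset: the likelihood below touches every observation at every
# MCMC iteration -- exactly the large-n bottleneck the abstract names.
data = rng.normal(loc=2.0, scale=1.0, size=10_000)

def log_likelihood(mu):
    # Gaussian log-likelihood with known unit variance (full-data scan).
    return -0.5 * np.sum((data - mu) ** 2)

# Random-walk Metropolis for the mean, flat prior.
mu, chain = 0.0, []
ll = log_likelihood(mu)
for _ in range(2000):
    prop = mu + 0.05 * rng.standard_normal()
    ll_prop = log_likelihood(prop)
    if np.log(rng.uniform()) < ll_prop - ll:
        mu, ll = prop, ll_prop
    chain.append(mu)

post_mean = np.mean(chain[500:])
print(f"posterior mean ~= {post_mean:.3f}")
```

Each of the 2000 iterations scans all 10,000 observations; subsampled, online, or compressed-statistics variants of this loop are the kind of remedy the first part of the thesis pursues.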
144

Novel variable influence on projection (VIP) methods in OPLS, O2PLS, and OnPLS models for single- and multi-block variable selection: VIPOPLS, VIPO2PLS, and MB-VIOP methods

Galindo-Prieto, Beatriz January 2017 (has links)
Multivariate and multiblock data analysis provides useful methodologies for analyzing large data sets in chemistry, biology, psychology, economics, sensory science, and industrial processes; among these methodologies, partial least squares (PLS) and orthogonal projections to latent structures (OPLS®) have become popular. With increasingly computerized instrumentation, a data set can consist of thousands of input variables containing latent information valuable for research and industrial purposes. When analyzing a large number of data sets (blocks) simultaneously, the number of variables and the underlying connections between them grow considerably; at that point, reducing the number of variables while keeping high interpretability becomes a much-needed strategy. The main direction of research in this thesis is the development of a variable selection method, based on variable influence on projection (VIP), to improve the interpretability of OnPLS models in multiblock data analysis. This new method is called multiblock variable influence on orthogonal projections (MB-VIOP), and its novelty lies in being the first multiblock variable selection method for OnPLS models. Several milestones had to be reached to create MB-VIOP. The first was the development of a single-block variable selection method able to handle orthogonal latent variables in OPLS models, i.e. VIP for OPLS (denoted VIPOPLS or OPLS-VIP in Paper I), which proved to increase the interpretability of PLS and OPLS models and was afterwards successfully extended to multivariate time series analysis (MTSA) aimed at process control (Paper II). The second milestone was the development of the first multiblock VIP approach for the enhancement of O2PLS® models, i.e. VIPO2PLS for two-block multivariate data analysis (Paper III). The third milestone, and the main goal of this thesis, was the development of the MB-VIOP algorithm for improving OnPLS model interpretability when analyzing a large number of data sets simultaneously (Paper IV). The results of this thesis and its enclosed papers show that the VIPOPLS, VIPO2PLS, and MB-VIOP methods successfully identify the most relevant variables for model interpretation in PLS, OPLS, O2PLS, and OnPLS models. In addition, predictability, robustness, dimensionality reduction, and other variable selection goals can potentially be improved or achieved by using these methods.
145

Um procedimento para seleção de variáveis em modelos lineares generalizados duplos / A procedure for variable selection in double generalized linear models

Cavalaro, Lucas Leite 01 April 2019 (has links)
Os modelos lineares generalizados duplos (MLGD), diferentemente dos modelos lineares generalizados (MLG), permitem o ajuste do parâmetro de dispersão da variável resposta em função de variáveis preditoras, aperfeiçoando a forma de modelar fenômenos. Desse modo, eles são uma possível solução quando a suposição de parâmetro de dispersão constante não é razoável e a variável resposta tem distribuição pertencente à família exponencial. Considerando nosso interesse em seleção de variáveis nesta classe de modelos, estudamos o esquema de seleção de variáveis em dois passos proposto por Bayer e Cribari-Neto (2015) e, com base neste método, desenvolvemos um esquema para seleção de variáveis em até k passos. Para verificar a performance do nosso procedimento, realizamos estudos de simulação de Monte Carlo em MLGD. Os resultados obtidos indicam que o nosso procedimento para seleção de variáveis apresenta, em geral, performance semelhante ou superior à das demais metodologias estudadas, sem necessitar de um grande custo computacional. Também avaliamos o esquema para seleção de variáveis em até k passos em um conjunto de dados reais e o comparamos com diferentes métodos de regressão. Os resultados mostraram que o nosso procedimento pode ser também uma boa alternativa quando há interesse em realizar previsões. / The double generalized linear models (DGLM), unlike generalized linear models (GLM), allow the dispersion parameter of the response variable to be fit as a function of predictor variables, improving the way phenomena are modeled. Thus, they are a possible solution when the assumption of a constant dispersion parameter is unreasonable and the response variable has a distribution belonging to the exponential family. Considering our interest in variable selection in this class of models, we studied the two-step variable selection scheme proposed by Bayer and Cribari-Neto (2015) and, based on this method, developed a scheme that selects variables in up to k steps. To assess the performance of our procedure, we performed Monte Carlo simulation studies in DGLM. The results indicate that our procedure presents, in general, performance similar or superior to that of the other methods studied, without requiring great computational cost. We also evaluated the up-to-k-step selection scheme on a real dataset and compared it with different regression methods. The results showed that our procedure can also be a good alternative when the interest is in making predictions.
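Stepwise selection schemes like the one described above iterate "add the variable that most improves a criterion, stop when nothing improves it." As a generic sketch (not Bayer and Cribari-Neto's two-step scheme, and using Gaussian OLS with AIC rather than DGLM fits), a greedy forward search might look like:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: the response depends on predictors 0 and 3 only.
n, p = 150, 8
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)

def aic(idx):
    # Gaussian AIC for an OLS fit on the selected columns (plus intercept).
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in idx])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (len(idx) + 1)

selected, current = [], aic([])
for _ in range(p):                      # at most p greedy steps
    candidates = [j for j in range(p) if j not in selected]
    scores = {j: aic(selected + [j]) for j in candidates}
    best = min(scores, key=scores.get)
    if scores[best] >= current:         # stop when AIC no longer improves
        break
    selected.append(best)
    current = scores[best]
print("selected predictors:", sorted(selected))
```

In a DGLM setting, the same loop would run over both the mean and the dispersion submodels, which is where the number of candidate moves per step grows.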
146

Penalized regression models for compositional data / Métodos de regressão penalizados para dados composicionais

Shimizu, Taciana Kisaki Oliveira 10 December 2018 (has links)
Compositional data consist of vectors known as compositions, whose components are positive, lie in the interval (0,1), and represent proportions or fractions of a whole, with the components summing to one. Compositional data are present in different areas, such as geology, ecology, economics, medicine, and many others. Thus, there is great interest in new modeling approaches for compositional data, mainly when covariates influence this type of data. In this context, the main objective of this thesis is to propose a new approach to regression models for compositional data. The central idea is the development of a method based on penalized regression, in particular the Lasso (least absolute shrinkage and selection operator), elastic net, and spike-and-slab Lasso (SSL), for estimating the model parameters. In particular, we envision developing this modeling for compositional data when the number of explanatory variables exceeds the number of observations, in the presence of large databases, and when there are constraints on the dependent variables and covariates. / Dados composicionais consistem em vetores conhecidos como composições, cujos componentes são positivos e definidos no intervalo (0,1), representando proporções ou frações de um todo, sendo que a soma desses componentes totaliza um. Tais dados estão presentes em diferentes áreas, como geologia, ecologia, economia, medicina, entre outras. Desta forma, há um grande interesse em ampliar os conhecimentos acerca da modelagem de dados composicionais, principalmente quando há a influência de covariáveis nesse tipo de dado. Nesse contexto, a presente tese tem por objetivo propor uma nova abordagem de modelos de regressão aplicada a dados composicionais. A ideia central consiste no desenvolvimento de um método balizado por regressão penalizada, em particular Lasso (least absolute shrinkage and selection operator), elastic net e Spike-and-Slab Lasso (SSL), para a estimação dos parâmetros do modelo. Em particular, visionamos o desenvolvimento dessa modelagem para dados composicionais, com o número de variáveis explicativas excedendo o número de observações, na presença de grandes bases de dados e, além disso, quando há restrição na variável resposta e nas covariáveis.
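One common device for penalized regression with compositional covariates is a log-ratio transform followed by a standard penalized fit. A hedged sketch with a centered log-ratio (clr) transform and scikit-learn's Lasso (synthetic data; the thesis's own SSL and constrained formulations are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# Compositional covariates: rows are positive and sum to one.
n, p = 120, 10
raw = rng.gamma(shape=2.0, size=(n, p))
comp = raw / raw.sum(axis=1, keepdims=True)

# Centered log-ratio (clr) transform moves compositions off the simplex
# before penalized fitting (one common workaround for the sum constraint).
logc = np.log(comp)
clr = logc - logc.mean(axis=1, keepdims=True)

# Synthetic response driven by the first two clr coordinates only.
y = 3.0 * clr[:, 0] - 2.0 * clr[:, 1] + 0.2 * rng.standard_normal(n)

lasso = Lasso(alpha=0.05).fit(clr, y)
print("nonzero coefficients:", np.flatnonzero(lasso.coef_))
```

Swapping `Lasso` for `ElasticNet` covers the second penalty named in the abstract; the spike-and-slab Lasso requires a dedicated (usually Bayesian) implementation.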
147

Seleção bayesiana de variáveis em modelos multiníveis da teoria de resposta ao item com aplicações em genômica / Bayesian variable selection for multilevel item response theory models with applications in genomics

Fragoso, Tiago de Miranda 12 September 2014 (has links)
As investigações sobre as bases genéticas de doenças complexas em Genômica utilizam diversos tipos de informação. Diversos sintomas são avaliados de maneira a diagnosticar a doença, os indivíduos apresentam padrões de agrupamento baseados, por exemplo, no seu parentesco ou ambiente comum, e uma quantidade imensa de características dos indivíduos é medida por meio de marcadores genéticos. No presente trabalho, um modelo multiníveis da teoria de resposta ao item (TRI) é proposto de forma a integrar todas essas fontes de informação e caracterizar doenças complexas através de uma variável latente. Além disso, a quantidade de marcadores moleculares induz um problema de seleção de variáveis, para o qual uma seleção baseada nos métodos da busca estocástica e do LASSO bayesiano é proposta. A estimação dos parâmetros do modelo e a seleção de variáveis são realizadas sob um paradigma bayesiano, no qual um algoritmo Monte Carlo via Cadeias de Markov é construído e implementado para a obtenção de amostras da distribuição a posteriori dos parâmetros. O mesmo é validado através de estudos de simulação, nos quais a capacidade de recuperação dos parâmetros, de escolha de variáveis e características das estimativas pontuais dos parâmetros são avaliadas em cenários similares aos dados reais. O processo de estimação apresenta uma recuperação satisfatória dos parâmetros estruturais do modelo e capacidade de selecionar covariáveis em espaços de dimensão elevada, apesar de um viés considerável nas estimativas das variáveis latentes associadas ao traço latente e ao efeito aleatório. Os métodos desenvolvidos são então aplicados aos dados colhidos no estudo de associação familiar 'Corações de Baependi', nos quais o modelo multiníveis se mostra capaz de caracterizar a síndrome metabólica, uma série de sintomas associados com o risco cardiovascular. O modelo multiníveis e a seleção de variáveis se mostram capazes de recuperar características conhecidas da doença e selecionar um marcador associado.
/ Recent investigations of the genetic architecture of complex diseases use different sources of information. Different symptoms are measured to obtain a diagnosis; individuals may not be independent due to kinship or common environment; and their genetic makeup may be measured through a large number of genetic markers. In the present work, a multilevel item response theory (IRT) model is proposed that unifies all these sources of information through a latent variable. Furthermore, the large amount of molecular markers induces a variable selection problem, for which procedures based on stochastic search variable selection and the Bayesian LASSO are considered. Parameter estimation and variable selection are conducted under a Bayesian framework in which a Markov chain Monte Carlo algorithm is derived and implemented to obtain posterior distribution samples. The estimation procedure is validated through a series of simulation studies in which parameter recovery, variable selection, and estimation error are evaluated in scenarios similar to the real dataset. The procedure showed adequate recovery of the structural parameters and the capability to correctly find a large number of the covariates even in high-dimensional settings, although it also produced biased estimates for the incidental latent variables. The proposed methods were then applied to the real dataset collected in the 'Corações de Baependi' familial association study and were able to appropriately model the metabolic syndrome, a series of symptoms associated with elevated heart failure and diabetes risk. The multilevel model produced a latent trait that could be identified with the syndrome, and an associated molecular marker was found.
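The latent-variable backbone of any IRT model is the item response function. A minimal two-parameter logistic (2PL) curve — just a building block, not the multilevel, covariate-adjusted model the thesis proposes:

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Two-parameter logistic IRT item response probability:
    P(symptom present | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)        # latent trait grid
p = irt_2pl(theta, a=1.5, b=0.0)     # discrimination 1.5, difficulty 0
print(np.round(p, 3))
```

In the thesis's setting, `theta` plays the role of the latent disease trait, each symptom is an "item" with its own `a` and `b`, and a further regression layer links `theta` to genetic markers; that layer is where the variable selection problem arises.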
148

Seleção de variáveis aplicada ao controle estatístico multivariado de processos em bateladas / Variable selection applied to multivariate statistical control of batch processes

Peres, Fernanda Araujo Pimentel January 2018 (has links)
A presente tese apresenta proposições para o uso da seleção de variáveis no aprimoramento do controle estatístico de processos multivariados (MSPC) em bateladas, a fim de contribuir com a melhoria da qualidade de processos industriais. Dessa forma, os objetivos desta tese são: (i) identificar as limitações encontradas pelos métodos MSPC no monitoramento de processos industriais; (ii) entender como métodos de seleção de variáveis são integrados para promover a melhoria do monitoramento de processos de elevada dimensionalidade; (iii) discutir sobre métodos para alinhamento e sincronização de bateladas aplicados a processos com diferentes durações; (iv) definir o método de alinhamento e sincronização mais adequado para o tratamento de dados de bateladas, visando aprimorar a construção do modelo de monitoramento na Fase I do controle estatístico de processo; (v) propor a seleção de variáveis, com propósito de classificação, prévia à construção das cartas de controle multivariadas (CCM) baseadas na análise de componentes principais (PCA) para monitorar um processo em bateladas; e (vi) validar o desempenho de detecção de falhas da carta de controle multivariada proposta em comparação às cartas tradicionais e baseadas em PCA. O desempenho do método proposto foi avaliado mediante aplicação em um estudo de caso com dados reais de um processo industrial alimentício. Os resultados obtidos demonstraram que a realização de uma seleção de variáveis prévia à construção das CCM contribuiu para reduzir eficientemente o número de variáveis a serem analisadas e superar as limitações encontradas na detecção de falhas quando bancos de elevada dimensionalidade são monitorados. Conclui-se que, ao possibilitar que CCM, amplamente utilizadas no meio industrial, sejam adequadas para banco de dados reais de elevada dimensionalidade, o método proposto agrega inovação à área de monitoramento de processos em bateladas e contribui para a geração de produtos de elevado padrão de qualidade. 
/ This dissertation presents propositions for the use of variable selection in the improvement of multivariate statistical process control (MSPC) of batch processes, in order to contribute to the enhancement of industrial process quality. There are six objectives: (i) identify MSPC limitations in industrial process monitoring; (ii) understand how variable selection methods are used to improve high-dimensional process monitoring; (iii) discuss methods for alignment and synchronization of batches with different durations; (iv) define the most adequate alignment and synchronization method for batch data treatment, aiming to improve Phase I of process monitoring; (v) propose variable selection for classification prior to establishing multivariate control charts (MCC) based on principal component analysis (PCA) to monitor a batch process; and (vi) validate the fault detection performance of the proposed MCC in comparison with traditional and PCA-based charts. The performance of the proposed method was evaluated in a case study using real data from an industrial food process. Results showed that performing variable selection prior to establishing MCC contributed to efficiently reducing the number of variables and overcoming limitations found in fault detection when high-dimensional datasets are monitored. We conclude that, by adapting control charts widely used in industry to high-dimensional datasets, the proposed method adds innovation to the area of batch process monitoring and contributes to the generation of products with high quality standards.
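A PCA-based multivariate control chart of the kind referenced in objective (v) can be sketched in a few lines. Everything below is synthetic Phase I data; the chi-square control limit is a large-sample simplification (F-based limits are common in practice), and the variable selection step the thesis proposes is not shown:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)

# Phase I reference data: 100 in-control batches, 30 correlated variables.
n, p, k = 100, 30, 3
A = rng.standard_normal((p, k))
X = rng.standard_normal((n, k)) @ A.T + 0.3 * rng.standard_normal((n, p))

# PCA on centered Phase I data; keep the first k principal components.
mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:k].T                       # loadings (p x k)
scores = (X - mu) @ P              # PCA scores of the reference batches
lam = scores.var(axis=0, ddof=1)   # score variances

def hotelling_t2(x):
    # Hotelling T^2 statistic in the reduced PCA space.
    t = (x - mu) @ P
    return float(np.sum(t ** 2 / lam))

# Approximate 99% upper control limit via chi-square with k dof.
ucl = chi2.ppf(0.99, df=k)
t2_vals = np.array([hotelling_t2(x) for x in X])
print(f"in-control fraction: {(t2_vals <= ucl).mean():.2f}")
```

A Phase II point whose T² exceeds `ucl` signals a fault; reducing `p` through prior variable selection shrinks both the chart's blind spots and the model-fitting burden, which is the dissertation's motivation.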
149

Dados hiperespectrais para predição do teor foliar de nitrogênio em cana-de-açúcar / Hyperspectral data to predict sugarcane leaf nitrogen content

Martins, Juliano Araújo 17 February 2016 (has links)
Uma das alternativas bastante abordadas na literatura para a melhoria do gerenciamento da adubação nitrogenada nas culturas é o sensoriamento remoto, tendo destaque a utilização de sensores espectrais na região do visível e infravermelho. Neste trabalho, buscou-se estabelecer as relações existentes entre variações no teor foliar de nitrogênio (TFN) e a resposta espectral da folha de cana-de-açúcar, utilizando um sensor hiperespectral, com avaliações em três áreas experimentais do estado de São Paulo, com diferentes solos e variedades. Cada experimento foi alocado em blocos ao acaso, com parcelas subdivididas e quatro repetições. Foram aplicadas doses de 0, 50, 100 e 150 kg de nitrogênio por hectare. A análise espectral foi realizada na folha "+1" em laboratório, sendo coletadas 10 folhas por subparcela, posteriormente submetidas a análise química para o TFN. Observou-se que existe correlação significativa entre o TFN e as variações na resposta espectral da cana-de-açúcar, sendo que a região do verde e a de transição entre o vermelho e o infravermelho próximo ("red-edge") foram as mais consistentes e estáveis entre as áreas em estudo e safras avaliadas. A análise de componentes principais reforçou estes resultados, uma vez que as pontuações ("scores") dos componentes que apresentaram correlações significativas com o TFN tiveram maiores pesos ("loadings") nas regiões espectrais citadas anteriormente. A partir das curvas espectrais foram também calculados os índices de vegetação já descritos em literatura, submetidos a análise de regressão simples para predição do TFN, sendo os modelos calibrados com dados da safra 2012/13 e validados com os dados da safra 2013/14. Índices espectrais calculados com a combinação dos comprimentos de onda do verde e/ou "red-edge" com comprimentos de onda do infravermelho próximo tiveram bom desempenho na fase de validação, sendo os cinco mais estáveis os índices BNi (500, 705 e 750 nm), GNDVI (550 e 780 nm), NDRE (790 e 720 nm), RI-1db (735 e 720 nm) e VOGa (740 e 720 nm). A variedade SP 81 3250 foi cultivada nas três áreas experimentais, o que permitiu a comparação do potencial de modelos calibrados por área com um modelo generalista para uma mesma variedade cultivada em diferentes condições edáficas. Observou-se que, embora o modelo generalista apresente parâmetros estatísticos significativos, existe redução expressiva da sensibilidade de predição quando comparado aos modelos calibrados por área experimental. Empregou-se também nesta pesquisa a análise de regressão linear múltipla por "stepwise" (RLMS), que gerou modelos com boa precisão na estimativa do TFN, mesmo quando calibrados por área experimental, independentes da variedade, utilizando de 5 a 6 comprimentos de onda. Concluímos com a presente pesquisa que comprimentos de onda específicos estão associados à variação do TFN em cana-de-açúcar, reportados na região do verde (próximos a 550 nm) e na região de transição entre os comprimentos de onda do vermelho e infravermelho próximo (680 a 720 nm). Apesar da baixa correlação entre a região do infravermelho próximo e o TFN, índices de vegetação calculados a partir destes comprimentos de onda, ou a sua inserção na geração de modelos lineares, foram importantes para melhorar a precisão da predição. / An alternative widely discussed in the literature for improving nitrogen fertilization management in crops is remote sensing, notably the use of spectral sensors in the visible and infrared regions.
In this work, we sought to establish the relationship between variations in leaf nitrogen content and the spectral response of sugarcane leaves using a hyperspectral sensor, with assessments in three experimental areas of São Paulo state, Brazil, covering different soils and varieties. Each experiment was laid out in randomized blocks with split plots and four replicates, with doses of 0, 50, 100, and 150 kg of nitrogen per hectare. Spectral analysis was performed on the "+1" leaf in the laboratory; 10 leaves were collected per subplot and subsequently subjected to chemical analysis for leaf nitrogen content. We observed a significant correlation between leaf nitrogen content and variations in the sugarcane spectral response; the green and red-edge regions were the most consistent and stable across the study areas and crop seasons evaluated. Principal component analysis reinforced these results, since the scores of the components that correlated significantly with leaf nitrogen content had the highest loadings in the spectral regions mentioned above. From the spectral curves we also calculated vegetation indices previously described in the literature and submitted them to simple regression analysis for direct prediction of leaf nitrogen content; the models were calibrated with 2012/13 data and validated with 2013/14 crop season data. Spectral indices combining green and/or red-edge wavelengths with near-infrared wavelengths performed well in the validation phase, and the five most stable were BNi (500, 705 and 750 nm), GNDVI (550 and 780 nm), NDRE (790 and 720 nm), RI-1db (735 and 720 nm) and VOGa (740 and 720 nm). The variety SP 81 3250 was cultivated in all three experimental areas, allowing comparison of site-specific calibrated models with a general model for the same variety grown under different soil conditions. Although the general model presents meaningful statistical parameters, there is a significant reduction in prediction sensitivity compared with site-specific calibrated models. We also used stepwise multiple linear regression (SMLR), which generated models with good precision for estimating leaf nitrogen content, even when calibrated per experimental area regardless of variety, using 5 to 6 wavelengths. This study shows that specific wavelengths are associated with variation in the leaf nitrogen content of sugarcane, reported in the green region (near 550 nm) and in the red-edge transition between red and near-infrared wavelengths (680 to 720 nm). Despite the low correlation of the near-infrared region with leaf nitrogen content, vegetation indices calculated from these wavelengths, or their inclusion in linear models, were important for improving prediction accuracy.
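Index-based calibration like the GNDVI models above follows a simple recipe: compute a two-band ratio index, then regress the nitrogen content on it. A synthetic sketch (the reflectance model and nitrogen range below are invented; only the GNDVI band combination comes from the abstract):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical leaf reflectance at the two GNDVI bands (550 and 780 nm),
# plus a synthetic nitrogen content that shifts the green reflectance.
n_leaves = 50
nitrogen = rng.uniform(1.0, 3.0, n_leaves)           # invented units
r550 = 0.20 - 0.03 * nitrogen + 0.005 * rng.standard_normal(n_leaves)
r780 = 0.45 + 0.01 * rng.standard_normal(n_leaves)

# GNDVI = (NIR - green) / (NIR + green), per the bands cited above.
gndvi = (r780 - r550) / (r780 + r550)

# Simple calibration: linear fit of nitrogen on the index.
slope, intercept = np.polyfit(gndvi, nitrogen, 1)
pred = slope * gndvi + intercept
r = np.corrcoef(pred, nitrogen)[0, 1]
print(f"calibration correlation r = {r:.2f}")
```

The study's calibrate-on-one-season, validate-on-the-next protocol corresponds to fitting `slope`/`intercept` on 2012/13 data and scoring `pred` against held-out 2013/14 measurements.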
150

Uso de polinômios fracionários nos modelos mistos / Use of fractional polynomials in mixed models

Garcia, Edijane Paredes January 2019 (has links)
Orientador: Luzia Aparecida Trinca / Resumo: A classe dos modelos de regressão incorporando polinômios fracionários - FPs (Fractional Polynomials), proposta por Royston & Altman (1994), tem sido amplamente estudada. O uso de FPs em modelos mistos constitui uma alternativa muito atrativa para explicar a dependência das medidas intra-unidades amostrais em modelos em que há não linearidade na relação entre a variável resposta e variáveis regressoras contínuas. Tal característica ocorre devido aos FPs oferecerem, para a resposta média, uma variedade de formas funcionais não lineares para as variáveis regressoras contínuas, em que se destacam a família dos polinômios convencionais e algumas curvas assimétricas e com assíntotas. A incorporação dos FPs na estrutura dos modelos mistos tem sido investigada por diversos autores. Porém, não existem publicações sobre: a exploração da problemática da modelagem na parte fixa e na parte aleatória (principalmente na presença de várias variáveis regressoras contínuas e categóricas); o estudo da influência dos FPs na estrutura dos efeitos aleatórios; a investigação de uma adequada estrutura para a matriz de covariâncias do erro; ou, um ponto de fundamental importância para colaborar com a seleção do modelo, a realização da análise de diagnóstico dos modelos ajustados. Uma contribuição, do nosso ponto de vista, de grande relevância é a investigação e oferecimento de estratégias de ajuste dos modelos polinômios fracionários com efeitos mistos englobando os pontos citados acima com o objetiv... (Resumo completo, clicar acesso eletrônico abaixo) / Abstract: The class of regression models incorporating Fractional Polynomials (FPs), proposed by Royston & Altman (1994), has been extensively studied. The use of FPs in mixed models is a very attractive alternative for explaining the within-subject dependence of measurements in models where the relationship between the response variable and continuous covariates is non-linear. This is because FPs offer, for the mean response, a variety of non-linear functional forms for the continuous covariates, among which the family of conventional polynomials and some asymmetric curves with asymptotes stand out. The incorporation of FPs into the structure of mixed models has been investigated by several authors. However, there are no publications on the following issues: the modeling of the fixed and random parts (mainly in the presence of several continuous and categorical covariates); the study of the influence of FPs on the structure of the random effects; the investigation of an adequate structure for the error covariance matrix; or, a point of central importance for model selection, the diagnostic analysis of the fitted models. In our view, a contribution of great relevance is the investigation and proposition of strategies for fitting fractional polynomial models with mixed effects encompassing the points mentioned above, with the goals of filling these gaps and awakening users to the great potential of mixed models, now even mor... (Complete abstract click electronic access below) / Doutor
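The fractional polynomial bases discussed above draw powers from the Royston-Altman set, with power 0 conventionally meaning log(x). A minimal sketch of the basis construction (the full FP rules also add log(x) factors for repeated powers, omitted here):

```python
import numpy as np

# Royston-Altman fractional polynomial powers; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_basis(x, powers):
    """Evaluate one FP basis column per power for positive x. Repeated
    powers would get an extra log(x) factor in the full FP rules."""
    x = np.asarray(x, dtype=float)
    cols = [np.log(x) if p == 0 else x ** p for p in powers]
    return np.column_stack(cols)

x = np.linspace(0.5, 5.0, 100)
B = fp_basis(x, FP_POWERS)
print(B.shape)
```

A degree-2 FP model then amounts to selecting two of these columns (or one column twice, with the log factor) as fixed-effect terms, which is where the model selection and diagnostic questions raised in the abstract come in.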
