Global ETD Search

61	Variable selection in the general linear model for censored data Yu, Lili 08 March 2007 (has links) No description available. Statistics LASSO sieve likelihood model selection right censored data
62	Rôle des répétitions textuelles dans les Psaumes de la Pénitence de LASSUS Lessoil-Daelman, Marcelle January 1993 (has links) No description available.
63	Flood pulse influences on exploited fish populations of the Central Amazon Olsen, Jesse Eric Burle 10 January 2017 (has links) Seasonally fluctuating water levels, known as flood pulses, influence the population dynamics and catches of fishes from river-floodplains. Although different measures of flood pulses, here called flood pulse variables, have been correlated to changes in catches of river-floodplain fishes, the flood pulse variables that have the strongest relationships to catches have not been identified. Furthermore, it is unclear if flood pulses influence catches of river-floodplain fishes with different life history strategies in different ways. Catches of 21 taxa from approximately 18,000 fishing trips were modeled as a function of fishing effort, gear type, seasonal flood pulse variables, and interannual flood pulse variables. These models were analyzed to understand which flood pulse variables had the strongest relationships to catches, and evaluate different flood pulse influences among taxa with different life history strategies. High water flood pulse variables generally had positive influences on catches in future years, while low water flood pulse variables generally had negative influences on catches in future years. Flood pulses generally had stronger influences on the catches of fishes with high fecundities and smaller eggs than on catches of fishes with low fecundities and larger eggs. Variation was observed in strengths and directions of flood pulse influences on catches of fishes with similar and different life history strategies. While my results were generally consistent with prevailing knowledge of how flood pulses influence catches of fishes, other biological factors of specific fish populations may further explain population responses to flood pulses. / Master of Science Multispecies fisheries flood pulse Modeling floodplain ecology LASSO
64	Seleção bayesiana de variáveis em modelos multiníveis da teoria de resposta ao item com aplicações em genômica / Bayesian variable selection for multilevel item response theory models with applications in genomics Fragoso, Tiago de Miranda 12 September 2014 (has links) As investigações sobre as bases genéticas de doenças complexas em Genômica utilizam diversos tipos de informação. Diversos sintomas são avaliados de maneira a diagnosticar a doença, os indivíduos apresentam padrões de agrupamento baseados, por exemplo no seu parentesco ou ambiente comum e uma quantidade imensa de características dos indivíduos são medidas por meio de marcadores genéticos. No presente trabalho, um modelo multiníveis da teoria de resposta ao item (TRI) é proposto de forma a integrar todas essas fontes de informação e caracterizar doenças complexas através de uma variável latente. Além disso, a quantidade de marcadores moleculares induz um problema de seleção de variáveis, para o qual uma seleção baseada nos métodos da busca estocástica e do LASSO bayesiano são propostos. Os parâmetros do modelo e a seleção de variáveis são realizados sob um paradigma bayesiano, no qual um algoritmo Monte Carlo via Cadeias de Markov é construído e implementado para a obtenção de amostras da distribuição a posteriori dos parâmetros. O mesmo é validado através de estudos de simulação, nos quais a capacidade de recuperação dos parâmetros, de escolha de variáveis e características das estimativas pontuais dos parâmetros são avaliadas em cenários similares aos dados reais. O processo de estimação apresenta uma recuperação satisfatória nos parâmetros estruturais do modelo e capacidade de selecionar covariáveis em espaços de dimensão elevada apesar de um viés considerável nas estimativas das variáveis latentes associadas ao traço latente e ao efeito aleatório. 
Os métodos desenvolvidos são então aplicados aos dados colhidos no estudo de associação familiar 'Corações de Baependi', nos quais o modelo multiníveis se mostra capaz de caracterizar a síndrome metabólica, uma série de sintomas associados com o risco cardiovascular. O modelo multiníveis e a seleção de variáveis se mostram capazes de recuperar características conhecidas da doença e selecionar um marcador associado. / Recent investigations into the genetic architecture of complex diseases use different sources of information. Different symptoms are measured to obtain a diagnosis, individuals may not be independent due to kinship or a common environment, and their genetic makeup may be measured through a large number of genetic markers. In the present work, a multilevel item response theory (IRT) model is proposed that unifies all these different sources of information through a latent variable. Furthermore, the large number of molecular markers induces a variable selection problem, for which procedures based on stochastic search variable selection and the Bayesian LASSO are considered. Parameter estimation and variable selection are conducted under a Bayesian framework in which a Markov chain Monte Carlo algorithm is derived and implemented to obtain samples from the posterior distribution. The estimation procedure is validated through a series of simulation studies in which parameter recovery, variable selection and estimation error are evaluated in scenarios similar to the real dataset. The estimation procedure showed adequate recovery of the structural parameters and the capability to correctly find a large number of the covariates even in high-dimensional settings, although it also produced biased estimates for the incidental latent variables. 
The proposed methods were then applied to the real dataset collected in the 'Corações de Baependi' family association study and were able to appropriately model the metabolic syndrome, a series of symptoms associated with elevated heart failure and diabetes risk. The multilevel model produced a latent trait that could be identified with the syndrome, and an associated molecular marker was found. Bayesian LASSO busca estocástica item response theory LASSO bayesiano stochastic search variable selection teoria da resposta ao item
65	Seleção bayesiana de variáveis em modelos multiníveis da teoria de resposta ao item com aplicações em genômica / Bayesian variable selection for multilevel item response theory models with applications in genomics Tiago de Miranda Fragoso 12 September 2014 (has links) As investigações sobre as bases genéticas de doenças complexas em Genômica utilizam diversos tipos de informação. Diversos sintomas são avaliados de maneira a diagnosticar a doença, os indivíduos apresentam padrões de agrupamento baseados, por exemplo no seu parentesco ou ambiente comum e uma quantidade imensa de características dos indivíduos são medidas por meio de marcadores genéticos. No presente trabalho, um modelo multiníveis da teoria de resposta ao item (TRI) é proposto de forma a integrar todas essas fontes de informação e caracterizar doenças complexas através de uma variável latente. Além disso, a quantidade de marcadores moleculares induz um problema de seleção de variáveis, para o qual uma seleção baseada nos métodos da busca estocástica e do LASSO bayesiano são propostos. Os parâmetros do modelo e a seleção de variáveis são realizados sob um paradigma bayesiano, no qual um algoritmo Monte Carlo via Cadeias de Markov é construído e implementado para a obtenção de amostras da distribuição a posteriori dos parâmetros. O mesmo é validado através de estudos de simulação, nos quais a capacidade de recuperação dos parâmetros, de escolha de variáveis e características das estimativas pontuais dos parâmetros são avaliadas em cenários similares aos dados reais. O processo de estimação apresenta uma recuperação satisfatória nos parâmetros estruturais do modelo e capacidade de selecionar covariáveis em espaços de dimensão elevada apesar de um viés considerável nas estimativas das variáveis latentes associadas ao traço latente e ao efeito aleatório. 
Os métodos desenvolvidos são então aplicados aos dados colhidos no estudo de associação familiar 'Corações de Baependi', nos quais o modelo multiníveis se mostra capaz de caracterizar a síndrome metabólica, uma série de sintomas associados com o risco cardiovascular. O modelo multiníveis e a seleção de variáveis se mostram capazes de recuperar características conhecidas da doença e selecionar um marcador associado. / Recent investigations into the genetic architecture of complex diseases use different sources of information. Different symptoms are measured to obtain a diagnosis, individuals may not be independent due to kinship or a common environment, and their genetic makeup may be measured through a large number of genetic markers. In the present work, a multilevel item response theory (IRT) model is proposed that unifies all these different sources of information through a latent variable. Furthermore, the large number of molecular markers induces a variable selection problem, for which procedures based on stochastic search variable selection and the Bayesian LASSO are considered. Parameter estimation and variable selection are conducted under a Bayesian framework in which a Markov chain Monte Carlo algorithm is derived and implemented to obtain samples from the posterior distribution. The estimation procedure is validated through a series of simulation studies in which parameter recovery, variable selection and estimation error are evaluated in scenarios similar to the real dataset. The estimation procedure showed adequate recovery of the structural parameters and the capability to correctly find a large number of the covariates even in high-dimensional settings, although it also produced biased estimates for the incidental latent variables. 
The proposed methods were then applied to the real dataset collected in the 'Corações de Baependi' family association study and were able to appropriately model the metabolic syndrome, a series of symptoms associated with elevated heart failure and diabetes risk. The multilevel model produced a latent trait that could be identified with the syndrome, and an associated molecular marker was found. busca estocástica LASSO bayesiano teoria da resposta ao item Bayesian LASSO item response theory stochastic search variable selection
66	A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites Zeng, Yan 01 August 2011 (has links) Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties (Modulus of Rupture (MOR) and Internal Bond (IB)) of a wood composite manufacturing process. Sensor malfunction and data “send/retrieval” problems led to null fields in the company’s data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models by excluding entire records with null fields or by using summary statistics such as the mean or median in place of the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of predictive modeling methods poses another challenge to many wood composite manufacturers. Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error, or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected after comparing six possible methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and compared with models of non-imputed data using ten-fold cross-validation. Root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection. 
Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets generated more precise prediction results (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models of non-imputed datasets (average NRMSEP of 6.3% for the MOR model and 8.1% for the IB model). The second part finds that Bayesian Additive Regression Tree (BART) produced more precise prediction results (average NRMSEP of 7.7% for the MOR model and 8.6% for the IB model) than the other three models: PLSR, LASSO, and Adaptive LASSO. missing data imputation predictive modeling partial least squares regression LASSO Adaptive LASSO BART Applied Statistics Statistical Methodology Statistical Models
67	Robustní optimalizace v klasifikačních a regresních úlohách / Robust optimization in classification and regression problems Semela, Ondřej January 2016 (has links) In this thesis, we present selected methods of regression and classification analysis in terms of robust optimization, which aim to compensate for data imprecision and measurement errors. In the first part, the ordinary least squares method and its generalizations derived within the context of robust optimization, ridge regression and the Lasso method, are introduced. The connection between robust least squares and these generalizations is also shown. The theoretical results are accompanied by a simulation study investigating the robustness of these methods from a different perspective. In the second part, we define a modern classification method, Support Vector Machines (SVM). Using the obtained knowledge, we formulate a robust SVM method, which can be applied in robust classification. The final part is devoted to the biometric identification of a typing style and of an individual based on keystroke dynamics, using the formulated theory.
68	[pt] LAWIE: DECONVOLUÇÃO EM PICOS ESPARSOS USANDO O LASSO E FILTRO DE WIENER / [en] LAWIE: SPARSE-SPIKE DECONVOLUTION WITH LASSO AND WIENER FILTER FELIPE JORDAO PINHEIRO DE ANDRADE 06 November 2020 (has links) [pt] Este trabalho propõe um algoritmo para o problema da deconvolução sísmica em picos esparsos. Intitulado LaWie, este algoritmo é baseado na combinação do Least Absolute Shrinkage and Selection Operator (LASSO) e a modelagem de blocos usada no filtro de Wiener. A deconvolução é feita traço a traço para estimar o perfil de refletividade e a wavelet original que deu origem as amplitudes sísmicas. Este trabalho apresenta o resultado do método no dataset sintético do Marmousi2, onde existe um ground truth para comparações objetivas. Além disso, também apresenta os resultados no dataset real Netherlands Offshore F3 Block e mostra a aplicabilidade do algoritmo proposto para não apenas delinear o perfil de refletividades como também para ressaltar características como fraturas neste dado. / [en] This work proposes an algorithm for solving the seismic sparse-spike deconvolution problem. Entitled LaWie, this algorithm is based on the combination of Least Absolute Shrinkage and Selection Operator (LASSO) and the block modeling used in the Wiener filter. Deconvolution is done trace by trace to estimate the reflectivity profile and the convolution wavelet that originated the seismic amplitudes. This work presents the results in the synthetic dataset of Marmousi2, where there is a ground truth for objective comparisons. Also, this work presents the results in a real dataset, Netherlands Offshore F3 Block, and shows the applicability of the proposed algorithm to outline the reflectivity profile and highlight characteristics such as fractures in this data. [pt] LASSO [pt] WIENER [pt] INVERSAO [pt] MODELAGEM ESPARSA [pt] DECONVOLUCAO [en] LASSO [en] WIENER [en] INVERSION [en] SPARSE MODELING [en] DECONVOLUTION
69	[en] A THEORY BASED, DATA DRIVEN SELECTION FOR THE REGULARIZATION PARAMETER FOR LASSO / [pt] SELECIONANDO O PARÂMETRO DE REGULARIZAÇÃO PARA O LASSO: BASEADO NA TEORIA E NOS DADOS DANIEL MARTINS COUTINHO 25 March 2021 (has links) [pt] O presente trabalho apresenta uma nova forma de selecionar o parâmetro de regularização do LASSO e do adaLASSO. Ela é baseada na teoria e incorpora a estimativa da variância do ruído. Nós mostramos propriedades teóricas e simulações Monte Carlo que o nosso procedimento é capaz de lidar com mais variáveis no conjunto ativo do que outras opções populares para a escolha do parâmetro de regularização. / [en] We provide a new way to select the regularization parameter for the LASSO and adaLASSO. It is based on the theory and incorporates an estimate of the variance of the noise. We show theoretical properties of the procedure and Monte Carlo simulations showing that it is able to handle more variables in the active set than other popular options for the regularization parameter. [pt] LASSO [pt] PARAMETRO DE REGULARIZACAO [pt] APRENDIZADO POR MAQUINA [pt] ADALASSO [en] LASSO [en] REGULARIZATION PARAMETER [en] MACHINE LEARNING [en] ADALASSO
70	Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n ) / Comparison of estimation methods for problems with collinear and/or high dimensionality (p > n) Casagrande, Marcelo Henrique 29 April 2016 (has links) Este trabalho apresenta um estudo comparativo do poder de predição de quatro métodos de regressão adequados para situações nas quais os dados, dispostos na matriz de planejamento, apresentam sérios problemas de multicolinearidade e/ou de alta dimensionalidade, em que o número de covariáveis é maior do que o número de observações. No presente trabalho, os métodos abordados são: regressão por componentes principais, regressão por mínimos quadrados parciais, regressão ridge e LASSO. O trabalho engloba simulações, em que o poder preditivo de cada uma das técnicas é avaliado para diferentes cenários definidos por número de covariáveis, tamanho de amostra e quantidade e intensidade de coeficientes (efeitos) significativos, destacando as principais diferenças entre os métodos e possibilitando a criação de um guia para que o usuário possa escolher qual metodologia usar com base em algum conhecimento prévio que o mesmo possa ter. Uma aplicação em dados reais (não simulados) também é abordada. / This work presents a comparative study of the predictive power of four regression methods suited to situations in which the data, arranged in the design matrix, exhibit serious multicollinearity and/or high dimensionality, where the number of covariates is greater than the number of observations. In this study, the methods discussed are: principal component regression, partial least squares regression, ridge regression and LASSO. 
The work includes simulations in which the predictive power of each technique is evaluated for different scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and allowing the creation of a guide to help the user choose which method to use based on any prior knowledge they may have. An application to real (non-simulated) data is also addressed. Alta dimensionalidade High dimensionality LASSO LASSO Mínimos quadrados parciais Partial least squares Principal component regression Regressão por componentes principais Regressão ridge Ridge regression
