Global ETD Search

11	Missing Data Imputation Method Comparison in Ohio University Student Retention Database Hening, Dyah A. January 2009 No description available. Higher Education Industrial Engineering Data Imputation Missing data MNAR MCAR MAR student retention
12	Statistical Modeling and Analysis of Bivariate Spatial-Temporal Data with the Application to Stream Temperature Study Li, Han 04 November 2014 Water temperature is a critical factor for the quality and biological condition of streams. Among the factors affecting stream water temperature, air temperature is one of the most important. To appropriately quantify the relationship between water and air temperatures over a large geographic region, it is important to accommodate the spatial and temporal information of the stream temperature. In this dissertation, I devote effort to several statistical modeling techniques for analyzing bivariate spatial-temporal data in a stream temperature study. In the first part, I focus the analysis on individual streams. A time-varying coefficient model (VCM) is used to study the relationship between air temperature and water temperature for each stream. The VCM enables dynamic modeling of the relationship and can therefore be used to enhance the understanding of water and air temperature relationships. The proposed model is applied to 10 streams in Maryland, West Virginia, Virginia, North Carolina and Georgia using daily maximum temperatures. The VCM approach increases the prediction accuracy by more than 50% compared to the simple linear regression model and the nonlinear logistic model. The VCM that describes the relationship between water and air temperatures for each stream is represented by slope and intercept curves from the fitted model. In the second part, I consider water and air temperatures for different streams that are spatially correlated. I focus on clustering multiple streams by using intercept and slope curves estimated from the VCM. Spatial information is incorporated to make clustering results geographically meaningful. I further propose a weighted distance as a dissimilarity measure for streams, which provides a flexible framework to interpret the clustering results under different weights. Real data analysis shows that streams in the same cluster share similar geographic features such as solar radiation, percent forest and elevation. In the third part, I develop a spatial-temporal VCM (STVCM) to deal with missing data. The STVCM takes both spatial and temporal variation of water temperature into account. I develop a novel estimation method that emphasizes the time effect and treats the space effect as a varying coefficient for the time effect. A simulation study shows that the STVCM performs better on missing data imputation than several existing methods such as the neural network and the Gaussian process. The STVCM is also applied to all 156 streams in this study to obtain a complete data record. / Ph. D. Stream Temperature Varying Coefficient Model Functional Data Clustering Missing Data Imputation
13	Ajuste de modelos e comparação de séries temporais para dados de vazão específica em microbacias pareadas / Fitting of models and comparison of time series for specific flow data in paired catchments Amaral, Marcus Vinicius Silva Gurgel do 15 July 2014 The growing concern for the environment presses society as a whole toward more sustainable habits. In the production sector, the push comes from developing more efficient production techniques grounded in research and field experiments. In forestry, besides the concern with management techniques and with the soil, the main resource to be preserved is water. Historical series are collected by monitoring rivers in drainage basins, which makes it possible to use time series theory to fit models by the Box and Jenkins methodology. When paired catchments are monitored, it is also possible to compare time series, as described in this work. Two distinct time series of specific flow data were collected in two paired catchments on a farm in the municipality of Telêmaco Borba, in the central-eastern region of Paraná state. Because the data sets contained gaps, an imputation methodology was applied in two different ways, making it possible to then compare the two time series. The results show that the two series differ under both the test comparing autocorrelation functions and the time series comparison test proposed by Silva, Ferreira and Sáfadi (2000). Therefore, following the characterization of paired-catchment studies, the forest management practices employed at the two sites influence the behavior of the evaluated variable differently. Comparison of time series Data imputation Paired catchments Time series
14	A Study of Missing Data Imputation and Predictive Modeling of Strength Properties of Wood Composites Zeng, Yan 01 August 2011 Problem: Real-time process and destructive test data were collected from a wood composite manufacturer in the U.S. to develop real-time predictive models of two key strength properties (Modulus of Rupture (MOR) and Internal Bond (IB)) of a wood composite manufacturing process. Sensor malfunction and data "send/retrieval" problems lead to null fields in the company's data warehouse, which resulted in information loss. Many manufacturers attempt to build accurate predictive models by excluding entire records with null fields or by using summary statistics such as the mean or median in place of the null field. However, predictive model errors in validation may be higher in the presence of information loss. In addition, the selection of predictive modeling methods poses another challenge to many wood composite manufacturers. Approach: This thesis consists of two parts addressing the above issues: 1) how to improve data quality using missing data imputation; 2) which predictive modeling method is better in terms of prediction precision (measured by root mean square error, or RMSE). The first part summarizes an application of missing data imputation methods in predictive modeling. After variable selection, two missing data imputation methods were selected after comparing six possible methods. Predictive models of imputed data were developed using partial least squares regression (PLSR) and compared with models of non-imputed data using ten-fold cross-validation. Root mean square error of prediction (RMSEP) and normalized RMSEP (NRMSEP) were calculated. The second part presents a series of comparisons among four predictive modeling methods using imputed data without variable selection. Results: The first part concludes that the expectation-maximization (EM) algorithm and multiple imputation (MI) using Markov Chain Monte Carlo (MCMC) simulation achieved more precise results. Predictive models based on imputed datasets generated more precise prediction results (average NRMSEP of 5.8% for the MOR model and 7.2% for the IB model) than models of non-imputed datasets (average NRMSEP of 6.3% for the MOR model and 8.1% for the IB model). The second part finds that Bayesian Additive Regression Tree (BART) produced more precise prediction results (average NRMSEP of 7.7% for the MOR model and 8.6% for the IB model) than the other three models: PLSR, LASSO, and Adaptive LASSO. missing data imputation predictive modeling partial least squares regression LASSO Adaptive LASSO BART Applied Statistics Statistical Methodology Statistical Models
16	Statistical Inference for Multivariate Stochastic Differential Equations Liu, Ge 15 November 2019 No description available. Statistics data imputation Bayesian data augmentation method Bayesian MCMC pseudo marginal MCMC stochastic process
17	Context Similarity for Retrieval-Based Imputation Ahmadov, Ahmad, Thiele, Maik, Lehner, Wolfgang, Wrembel, Robert 30 June 2022 Completeness, one of the four major dimensions of data quality, is a pervasive issue in modern databases. Although data imputation has been studied extensively in the literature, most of the research focuses on inference-based approaches. We propose to harness Web tables as an external data source to effectively and efficiently retrieve missing data while taking into account the inherent uncertainty and lack of veracity that they contain. Existing approaches mostly rely on standard retrieval techniques and out-of-the-box matching methods, which result in very low precision, especially when dealing with numerical data. We therefore propose a novel data imputation approach that applies numerical context similarity measures. This significantly increases the precision of the imputation procedure by ensuring that the imputed values are of the same domain and magnitude as the local values. We use the Dresden Web Table Corpus, which comprises more than 125 million web tables extracted from the Common Crawl, as our knowledge source. The comprehensive experimental results demonstrate that the proposed method clearly outperforms the default out-of-the-box retrieval approach.
18	Análise de resíduos em modelos de tempo de falha acelerado com efeito aleatório / Residual analysis in accelerated failure time models with random effects Rodrigues, Elisângela da Silva 15 April 2013 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / We present residual analysis techniques to assess the fit of Accelerated Failure Time Models (AFTM) with random effects to correlated survival data. We propose an imputation procedure for censored observations and consider three types of residuals to evaluate different model characteristics. We illustrate the proposals by analyzing the fit of an AFTM with random effects to a real data set involving times between failures of oil well equipment.
19	Imputação de dados faltantes via algoritmo EM e rede neural MLP com o método de estimativa de máxima verossimilhança para aumentar a acurácia das estimativas / Missing data imputation via the EM algorithm and MLP neural network with the maximum likelihood estimation method to increase the accuracy of the estimates Ribeiro, Elisalvo Alves 14 August 2015 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Databases with missing values are frequently encountered in the real world, and the problem has several causes (failure of the equipment that transmits and stores the data, handler failure, failure of whoever provides the information, etc.). This can make the data inconsistent and unfit for analysis, leading to heavily biased conclusions. This dissertation explores the use of Multilayer Perceptron Artificial Neural Networks (ANN MLP) with new activation functions, considering two approaches (single imputation and multiple imputation). First, we propose using the Maximum Likelihood Estimation (MLE) method in the activation function of each neuron of the network, in contrast to the approach currently used, which either does not use the method or uses it only in the cost function (at the network output). The results of these approaches are then compared with the Expectation Maximization (EM) algorithm, which is the state of the art for treating missing data. The results indicate that using the ANN MLP with the MLE method, whether in all neurons or only in the output function, leads to imputation with lower error. The experimental results, evaluated mainly by MAE (Mean Absolute Error) and RMSE (Root Mean Square Error), showed that the best results in most experiments occurred when the ANN MLP addressed in this dissertation was used for single and multiple imputation. Artificial Neural Networks MLP Maximum Likelihood Estimation Method EM Algorithm Data imputation Missing data New activation functions Computing Computer algorithms Neural networks (Computing) Random variables Databases
