Global ETD Search

1	The development of a spatial-temporal data imputation technique for the applications of environmental monitoring Huang, Ya-Chen 12 September 2006 In recent years, sustainable development has become one of the most important international issues. Many indicators related to sustainable development have been proposed and implemented, such as Island Taiwan and Urban Taiwan. However, the missing values that accompany environmental monitoring data posed serious problems when we conducted a study on building a sustainable development indicator for the marine environment. Data are the origin of summarized information such as indicators, so when missing values degrade data quality, the accuracy of estimates derived from such a data set is open to doubt. It is therefore important to apply suitable data pre-processing so that reliable information can be obtained from subsequent data analysis. Missing values in environmental monitoring data have several causes, for example: instrument breakdown, spoiled samples, forgotten recordings, mismatched records when merging data, and records lost during data processing. The patterns of missing data are also diverse. For a given sampling time, the records at several sampling sites may be partially or completely missing; conversely, at a given sampling site, part or all of the time series may be missing. The missing values of environmental monitoring data are therefore related to both the spatial and the temporal dimensions. Existing data imputation techniques have been developed for particular types of data, or interpolate missing values based on either geographic data distributions or time-series functions. Analyses that accommodate both spatial and temporal information are rare. 
The current study attempts to integrate the related analysis procedures and develop a computing process that uses both the spatial and temporal dimensions inherent in environmental monitoring data. Such a data imputation process can enhance the accuracy of the estimated missing values. environmental monitoring data data imputation missing values
2	The wild bootstrap resampling in regression imputation algorithm with a Gaussian Mixture Model Mat Jasin, A., Neagu, Daniel, Csenki, Attila 08 July 2018 Unsupervised learning of a finite Gaussian mixture model (FGMM) is used to learn the distribution of population data. This paper proposes the use of wild bootstrapping to create variability in the imputed data in single missing data imputation. We compare the performance and accuracy of the proposed single imputation method with multiple imputation from the R package Amelia II using RMSE, R-squared, MAE and MAPE. The proposed method shows better performance than multiple imputation (MI), which is known as the golden method among missing data imputation techniques. Missing data imputation Gaussian Mixture Model Bootstrap
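As a rough illustration of the idea in this abstract, the sketch below fills missing responses in a simple linear regression with the fitted value plus a randomly signed residual. It is a minimal, standard-library-only sketch and not the authors' algorithm: the helper names (`fit_line`, `wild_bootstrap_impute`), the Rademacher (+1/-1) weights, and the choice to draw the residual from the observed cases are all assumptions, and the Gaussian mixture model step is omitted.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def wild_bootstrap_impute(xs, ys, rng):
    """Replace None entries of ys with fitted value + (+1/-1) * observed residual."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    residuals = [y - (a + b * x) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            residual = rng.choice(residuals)   # assumption: borrow an observed residual
            weight = rng.choice((-1.0, 1.0))   # Rademacher wild-bootstrap weight
            y = a + b * x + weight * residual
        completed.append(y)
    return completed


# Hypothetical toy data: two responses are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, None, 8.2, 9.8, None]
imputed = wild_bootstrap_impute(xs, ys, random.Random(0))
```

Because the residual sign is random, repeated calls with different seeds give different imputed values around the regression line, which is the variability the abstract refers to.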
3	Missing imputation methods explored in big data analytics Brydon, Humphrey Charles January 2018 Philosophiae Doctor - PhD (Statistics and Population Studies) / The aim of this study is to examine the methods and processes involved in imputing missing data and, more specifically, complete missing blocks of data. A further aim is to examine the effect that the imputed data have on the accuracy of various predictive models constructed on them, and hence to determine whether the imputation method involved is suitable. Identifying the missingness mechanism present in the data should be the first step towards identifying a possible imputation method. Identifying a suitable imputation method is easier if the mechanism can be classified as one of the following: missing completely at random (MCAR), missing at random (MAR) or not missing at random (NMAR). Predictive models constructed on data sets completed with a hot-deck imputation method are shown to be less accurate. Data sets imputed with either a single or multiple Markov chain Monte Carlo (MCMC) method or the Fully Conditional Specification (FCS) method are shown to yield predictive models that are more accurate. Adding an iterative bagging technique to the modelling procedure is shown to produce highly accurate prediction estimates. The bagging technique is applied to variants of the neural network, a decision tree and a multiple linear regression (MLR) modelling procedure. A stochastic gradient boosted decision tree (SGBT) is also constructed as a comparison to the bagged decision tree. Final models are constructed from 200 iterations of the various modelling procedures using a 60% sampling ratio in the bagging procedure. 
It is further shown that, under certain conditions, adding the bagging technique to the MLR modelling procedure can produce an MLR model that is more accurate than the other, more advanced modelling procedures. The evaluation of the predictive models constructed on imputed data is shown to vary with the type of fit statistic used. The average squared error reports little difference in accuracy levels when compared to the results of the Mean Absolute Prediction Error (MAPE), whereas the MAPE fit statistic is able to magnify the difference in the prediction errors reported. The Normalized Mean Bias Error (NMBE) results show that all predictive models constructed produced over-predictions, although the extent varied with the data set and modelling procedure used. The Nash-Sutcliffe efficiency (NSE) was used as a comparison statistic for the accuracy of the predictive models in the context of imputed data. The NSE results showed that the estimates of the models constructed on data sets imputed with a multiple imputation method were highly accurate. They also showed that the estimates from the predictive models constructed on the hot-deck imputed data were inaccurate, and that a mean substitution of the fully observed data would have been a better method of imputation. The study concludes that the choice of imputation method, as well as that of the predictive model, depends on the data used. Four unique combinations of imputation methods and modelling procedures were identified for the data considered in this study.
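The fit statistics named in this abstract can be sketched as follows. The abstract does not give the thesis's exact formulas, so the definitions below are assumptions based on common textbook forms: NSE as one minus the ratio of squared error to the variance of the observations around their mean, and NMBE as the summed error divided by the summed observations, with positive values read as over-prediction. The function names and the toy numbers are hypothetical.

```python
def average_squared_error(observed, predicted):
    """Mean of squared differences between observations and predictions."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def nmbe(observed, predicted):
    """Normalized mean bias error; positive means over-prediction (assumed sign convention)."""
    return sum(p - o for o, p in zip(observed, predicted)) / sum(observed)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical toy values, not taken from the thesis.
observed = [10.0, 12.0, 14.0, 16.0]
good = [10.5, 12.5, 13.5, 16.5]
poor = [16.0, 10.0, 17.0, 11.0]

print(nse(observed, good), nse(observed, poor))
print(nmbe(observed, good), average_squared_error(observed, good))
```

Under these definitions, a negative NSE means the model does worse than simply predicting the observed mean, which matches the abstract's remark that mean substitution would have been preferable to the hot-deck result.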
4	As the World Turns Out: Economic Growth and Voter Turnout From a Global Perspective Koch, Luther Allen 11 June 2007 No description available. voter turnout economic growth data imputation global perspective machine politics international political economy democracy declining voter turnout hot deck data imputation economic development public spending on health
5	Anomaly detection in unknown environments using wireless sensor networks Li, YuanYuan 01 May 2010 This dissertation addresses the problem of distributed anomaly detection in Wireless Sensor Networks (WSN). A challenge in designing such systems is that the sensor nodes are battery powered, often have different capabilities and generally operate in dynamic environments. Programming such sensor nodes at a large scale can be a tedious job if the system is not carefully designed. Data modeling in distributed systems is important for determining the normal operation mode of the system. Being able to model the expected sensor signatures for typical operations greatly simplifies the human designer's job by enabling the system to autonomously characterize the expected sensor data streams. This, in turn, allows the system to perform autonomous anomaly detection to recognize when unexpected sensor signals are detected. This type of distributed sensor modeling can be used in a wide variety of sensor network applications, such as detecting the presence of intruders, detecting sensor failures, and so forth. The advantage of this approach is that the human designer does not have to characterize the anomalous signatures in advance. The contributions of this approach include: (1) providing a way for a WSN to autonomously model sensor data with no prior knowledge of the environment; (2) enabling a distributed system to detect anomalies in both sensor signals and temporal events online; (3) providing a way to automatically extract semantic labels from temporal sequences; (4) providing a way for WSNs to save communication power by transmitting compressed temporal sequences; (5) enabling the system to detect time-related anomalies without prior knowledge of abnormal events; and (6) providing a novel missing data estimation method that utilizes temporal and spatial information to replace missing values. 
The algorithms have been designed, developed, evaluated, and validated experimentally on synthesized data and in real-world sensor network applications. wireless sensor network signal processing sensor fusion time-series analysis missing data imputation anomaly detection Robotics
6	Comparação das águas dos rios Jaguari e Atibaia na região de lançamento de efluente de indústria petroquímica / Comparision of the water from rivers Jaguari and Atibaia at the region of wastewater release by a petrochemical industry Oliveira, Eduardo Schneider Bueno de [UNESP] 03 February 2016 Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Human action on nature has been a constant throughout history, but its negative effects are increasingly apparent. Examining these effects, their implications, and what can be done to avoid greater problems is of great importance for keeping our planet in good condition and, consequently, for human quality of life. This study analyzes the water quality of the Jaguari and Atibaia rivers, between which a petrochemical industry discharges its waste, as well as the quality of the water after the industry has used it and before it is returned to the river. This makes it possible to assess the quality of the industry's waste treatment and to analyze possible effects on water quality after the waste is discharged into the river. 
To this end, appropriate statistical techniques are applied to data on the physical, chemical and microbiological characteristics of the water. Because the data are dependent, methods that allow for such dependence must be used, such as the nonparametric block bootstrap (Künsch, 1989; Politis & Romano, 1994). 
Multiple imputation is also performed, since data are missing for several months of the study, using the Distribution-free Multiple Imputation technique (Bergamo, 2007; Bergamo et al., 2008). Water quality Data imputation Blocks bootstrap
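To illustrate the block bootstrap mentioned in this abstract, the sketch below resamples a dependent series in overlapping fixed-length blocks so that short-range dependence inside each block is preserved. It is a minimal sketch in the style of a moving block bootstrap; the abstract does not say which variant or block length the thesis uses, so the function name `moving_block_bootstrap`, the block length of 3 and the toy series are all assumptions.

```python
import random


def moving_block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating randomly chosen overlapping blocks."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)   # every admissible block start
    sample = []
    while len(sample) < n:
        start = rng.choice(starts)      # blocks are drawn with replacement
        sample.extend(series[start:start + block_len])
    return sample[:n]                   # trim to the original length


# Hypothetical monthly measurements (index values stand in for real data).
monthly = [float(i) for i in range(12)]
resampled = moving_block_bootstrap(monthly, 3, random.Random(42))
```

Statistics computed on many such resamples can then be used to approximate their sampling variability without assuming independent observations.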
7	Comparação das águas dos rios Jaguari e Atibaia na região de lançamento de efluente de indústria petroquímica Oliveira, Eduardo Schneider Bueno de January 2016 Advisor: Antonio Carlos Simões Pião / Abstract: Human action on nature has been a constant throughout history, but its negative effects are increasingly apparent. 
Examining these effects, their implications, and what can be done to avoid greater problems is of great importance for keeping our planet in good condition and, consequently, for human quality of life. This study analyzes the water quality of the Jaguari and Atibaia rivers, between which a petrochemical industry discharges its waste, as well as the quality of the water after the industry has used it and before it is returned to the river. This makes it possible to assess the quality of the industry's waste treatment and to analyze possible effects on water quality after the waste is discharged into the river. To this end, appropriate statistical techniques are applied to data on the physical, chemical and microbiological characteristics of the water. Because the data are dependent, methods that allow for such dependence must be used, such as the nonparametric block bootstrap (Künsch, 1989; Politis & Romano, 1994). Multiple imputation is also performed, since data are missing for several months of the study, using the Distribution-free Multiple Imputation technique (Bergamo, 2007; Bergamo et al., 2008). / Master's degree Water - Quality. Water quality Data imputation Blocks bootstrap
8	Understanding Visual Representation of Imputed Data for Aiding Human Decision-Making Thompson, Ryan M. January 2020 No description available. Engineering Experiments data imputation method imputed data human decision process human decision making
9	The Single Imputation Technique in the Gaussian Mixture Model Framework Aisyah, Binti M.J. January 2018 Missing data is a common issue in data analysis, and numerous techniques have been proposed to deal with it. Imputation, the most popular strategy, is the process of replacing missing values with plausible values. The two imputation techniques most frequently cited in the literature are single imputation and multiple imputation. Multiple imputation, also known as the golden imputation technique, was proposed by Rubin in 1987 to address missing data. However, inconsistency is the major problem in the multiple imputation technique. Single imputation is less popular in missing data research because of bias and reduced variability. One way to improve single imputation in the basic regression model is to add a residual to the predicted value, which reduces the bias and improves the variability. The residual is drawn from a normal distribution with a mean of 0 and a variance equal to the residual variance. Although newer single imputation methods, such as the stochastic regression model and hot deck imputation, may improve the variability and bias issues, single imputation techniques still suffer from uncertainty that may cause the R-square or standard error in the analysis results to be underestimated. The research reported in this thesis provides two imputation solutions for the single imputation technique. In the first, the wild bootstrap is proposed to improve the handling of uncertainty in the residual variance of the regression model. In the second, predictive mean matching (PMM) is enhanced: the regression model generates the recipient values, while the donor observations are taken from the observed values. 
The missing values are then imputed by randomly drawing one of the observations in the donor pool. The size of the donor pool is important in determining the quality of the imputed values. Many existing PMM studies use a fixed donor pool size, which might not be appropriate in certain circumstances, such as when the data distribution has a high-density region. Instead of a fixed donor pool size, the proposed method applies a radius-based solution to determine the size of the donor pool. Both proposed imputation procedures are combined with the Gaussian mixture model framework to preserve the original data distribution. The results from experiments on benchmark and artificial data sets confirm improvement for further data analysis. The proposed approaches are therefore worth considering for further investigation and experiments. Missing data imputation Gaussian mixture model Wild bootstrap resampling Predictive mean matching
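The radius-based donor pool described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's procedure: it uses a one-predictor linear regression, the hypothetical helpers `fit_line` and `pmm_radius_impute`, a user-chosen `radius` on the predicted values, and a fallback to the single nearest donor when no donor falls inside the radius. The Gaussian mixture model step is omitted.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def pmm_radius_impute(xs, ys, radius, rng):
    """Predictive mean matching with a radius-based donor pool."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    # Each donor is (predicted value, actually observed value).
    donors = [(a + b * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            target = a + b * x  # predicted value for the recipient
            pool = [obs for pred, obs in donors if abs(pred - target) <= radius]
            if not pool:
                # Assumed fallback: use the single closest donor.
                pool = [min(donors, key=lambda d: abs(d[0] - target))[1]]
            y = rng.choice(pool)  # impute a real observed value
        completed.append(y)
    return completed


# Hypothetical toy data with one missing response.
xs = [1.0, 2.0, 3.5, 4.0, 5.0]
ys = [2.0, 4.1, None, 8.0, 9.9]
imputed = pmm_radius_impute(xs, ys, 2.5, random.Random(0))
```

With a radius, the pool grows in dense regions of the predicted values and shrinks in sparse ones, whereas a fixed pool size always takes the same number of donors regardless of how far away they are.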
10	The Impact of Data Imputation Methodologies on Knowledge Discovery Brown, Marvin Lane 26 November 2008 No description available. Business Education Computer Science Data Mining Knowledge Discovery Data Imputation Neural Networks Transfer Functions Sigmoid

Search results