81

Anomaly detection in unknown environments using wireless sensor networks

Li, YuanYuan 01 May 2010 (has links)
This dissertation addresses the problem of distributed anomaly detection in Wireless Sensor Networks (WSNs). A challenge in designing such systems is that the sensor nodes are battery powered, often have different capabilities, and generally operate in dynamic environments. Programming such sensor nodes at a large scale can be a tedious job if the system is not carefully designed. Data modeling in distributed systems is important for determining the normal operation mode of the system. Being able to model the expected sensor signatures for typical operations greatly simplifies the human designer's job by enabling the system to autonomously characterize the expected sensor data streams. This, in turn, allows the system to perform autonomous anomaly detection and recognize unexpected sensor signals. This type of distributed sensor modeling can be used in a wide variety of sensor networks, such as detecting the presence of intruders, detecting sensor failures, and so forth. The advantage of this approach is that the human designer does not have to characterize the anomalous signatures in advance. The contributions of this approach include: (1) providing a way for a WSN to autonomously model sensor data with no prior knowledge of the environment; (2) enabling a distributed system to detect anomalies in both sensor signals and temporal events online; (3) providing a way to automatically extract semantic labels from temporal sequences; (4) providing a way for WSNs to save communication power by transmitting compressed temporal sequences; (5) enabling the system to detect time-related anomalies without prior knowledge of abnormal events; and (6) providing a novel missing data estimation method that utilizes temporal and spatial information to replace missing values. The algorithms have been designed, developed, evaluated, and validated experimentally on synthesized data and in real-world sensor network applications.
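The abstract describes autonomously modeling expected sensor streams, flagging deviations online, and filling missing values from temporal and spatial neighbors. The sketch below is a rough illustration only, not Li's actual algorithm: a running Gaussian model per sensor (Welford's online update) plus a naive neighbor-averaging fill. All names and thresholds are hypothetical.

```python
import numpy as np

class OnlineAnomalyDetector:
    """Running Gaussian model of one sensor stream; flags readings that
    deviate from the learned normal mode (Welford's online update)."""
    def __init__(self, z_threshold=3.0):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.z = z_threshold

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the model."""
        anomalous = False
        if self.n > 10:  # warm-up period before judging anything
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomalous = std > 0 and abs(x - self.mean) / std > self.z
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

def fill_missing(grid, t, i):
    """Estimate a missing reading from temporal neighbors (same node at
    t-1 and t+1) and spatial neighbors (other nodes at time t).
    Assumes t is an interior time step of grid[time][node]."""
    temporal = [grid[t - 1][i], grid[t + 1][i]]
    spatial = [grid[t][j] for j in range(len(grid[t])) if j != i]
    pool = [v for v in temporal + spatial if not np.isnan(v)]
    return float(np.mean(pool)) if pool else np.nan
```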
82

Study and validation of data structures with missing values. Application to survival analysis

Serrat i Piè, Carles 21 May 2001 (has links)
In this work we approach three different methodologies -- nonparametric, parametric and semiparametric -- to deal with data patterns with missing values in a survival analysis context. The first two approaches have been developed under the assumption that the investigator has enough information to assume that the non-response mechanism is MCAR (Missing Completely at Random) or MAR (Missing at Random). In this situation, we adapt a bootstrap and bilinear multiple imputation scheme to draw the distribution of the parameters of interest. We also analyze the drawbacks encountered in obtaining correct inferences, and propose some strategies to take into account the information provided by other fully observed covariates. However, in many situations it is impossible to assume the ignorability of the non-response probabilities. We therefore develop a semiparametric method for survival analysis under a non-ignorable non-response pattern. First, for right-censored samples with completely observed covariates, we propose the Grouped Kaplan-Meier estimator (GKM) as an alternative to the standard KM estimator when the interest lies in the survival at a finite number of fixed times. However, when the covariates are partially observed, neither the stratified GKM estimator nor the stratified KM estimator can be computed directly from the sample. Hence, we propose a class of estimating equations to obtain semiparametric estimates of these probabilities, and substitute the estimates into the stratified GKM estimator. We refer to this new estimator as the Estimated Grouped Kaplan-Meier estimator (EGKM). We prove that the GKM and EGKM estimators are square-root consistent and asymptotically normally distributed, and derive a consistent estimator for their limiting variance-covariance matrix. The advantage of the EGKM estimator is that it provides asymptotically unbiased estimates of the survival under a flexible selection model for the non-response probabilities. We illustrate the method with a cohort of HIV-infected patients with tuberculosis. At the end of the application, a sensitivity analysis that includes all types of non-response pattern, from MCAR to non-ignorable, allows the investigator to draw conclusions after analyzing all plausible scenarios and evaluating the impact of the non-ignorable non-response assumptions on the resulting inferences. We close the semiparametric approach by exploring the behaviour of the EGKM estimator in finite samples through a simulation study. Simulations performed under scenarios with different levels of censoring, non-response probability patterns and sample sizes illustrate the good properties of the proposed estimator. For instance, the empirical coverage probabilities tend to the nominal ones when the non-response pattern used in the analysis is close to the true non-response pattern that generated the data. In particular, the estimator is especially efficient in the least informative scenario considered (around 80% censoring and 50% missing data).
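Since the GKM and EGKM constructions start from Kaplan-Meier survival estimates evaluated at a finite set of fixed times, a minimal sketch of that building block may help. This is the standard product-limit estimator, not the grouped or estimated-grouped versions the thesis develops.

```python
import numpy as np

def kaplan_meier_at(times, events, fixed_times):
    """Plain Kaplan-Meier (product-limit) survival estimate, evaluated
    at a finite set of fixed time points. events: 1 = event, 0 = censored."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    order = np.lexsort((1 - d, t))  # at ties, process events before censorings
    t, d = t[order], d[order]
    s, at_risk = 1.0, len(t)
    step_times, step_surv = [], []
    for ti, di in zip(t, d):
        if di == 1:
            s *= 1.0 - 1.0 / at_risk
            step_times.append(ti)
            step_surv.append(s)
        at_risk -= 1
    # right-continuous step-function lookup at the requested times
    out = []
    for ft in fixed_times:
        idx = np.searchsorted(step_times, ft, side="right") - 1
        out.append(step_surv[idx] if idx >= 0 else 1.0)
    return out

# e.g. survival at t = 2 and t = 5
print(kaplan_meier_at([1, 2, 3, 4, 6], [1, 0, 1, 1, 0], [2, 5]))
```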
83

A Monte Carlo Study: The Impact of Missing Data in Cross-Classification Random Effects Models

Alemdar, Meltem 12 August 2009 (has links)
Unlike multilevel data with a purely nested structure, data that are cross-classified not only may be clustered into hierarchically ordered units but also may belong to more than one unit at a given level of a hierarchy. In a cross-classified design, students at a given school might be from several different neighborhoods, and one neighborhood might have students who attend a number of different schools. In this type of scenario, schools and neighborhoods are considered to be cross-classified factors, and cross-classified random effects modeling (CCREM) should be used to analyze these data appropriately. A common problem in any type of multilevel analysis is the presence of missing data at any given level. Little research has been conducted in the multilevel literature about the impact of missing data, and none in the area of cross-classified models. The purpose of this study was to examine the effect of data that are missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) on CCREM estimates, while exploring multiple imputation to handle the missing data. In addition, this study examined the impact of including an auxiliary variable that is correlated with the variable with missingness (the level-1 predictor) in the imputation model for multiple imputation. This study expanded on the CCREM Monte Carlo simulation work of Meyers (2004) by studying the effect of missing data and methods for handling these missing data with CCREM. The results demonstrated that, in general, multiple imputation met Hoogland and Boomsma's (1998) relative bias estimation criterion (less than 5% in magnitude) for parameter estimates under different types of missing data patterns. For the standard error estimates, substantial relative bias (defined by Hoogland and Boomsma as greater than 10%) was found in some conditions. When multiple imputation was used to handle the missing data, substantial bias was found in the standard errors in most cells where data were MNAR. This bias increased as a function of the percentage of missing data.
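The study leans on multiple imputation throughout. As a reference point, here is a minimal sketch of Rubin's rules for pooling estimates across imputed data sets; these are the standard combining formulas, assumed here rather than taken from the dissertation.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine per-imputation results via Rubin's rules.
    estimates, variances: length-m sequences, one entry per imputed data set."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                     # pooled point estimate
    ubar = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                   # between-imputation variance
    total_var = ubar + (1 + 1 / m) * b  # Rubin's total variance
    return qbar, total_var

# e.g. five imputed fits of the same coefficient (numbers invented):
est, var = rubin_pool([0.42, 0.45, 0.40, 0.44, 0.43],
                      [0.010, 0.011, 0.009, 0.010, 0.012])
```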
84

Statistical Evaluation of Continuous-Scale Diagnostic Tests with Missing Data

Wang, Binhuan 12 June 2012 (has links)
The receiver operating characteristic (ROC) curve methodology is the standard statistical methodology for assessing the accuracy of diagnostic tests or biomarkers. Currently, the most widely used statistical methods for inference on ROC curves are complete-data-based parametric, semiparametric or nonparametric methods. However, these methods cannot be used in diagnostic applications with missing data. In practice, missing diagnostic data commonly occur for various reasons, such as medical tests being too expensive, too time-consuming or too invasive. This dissertation aims to develop new nonparametric statistical methods for evaluating the accuracy of diagnostic tests or biomarkers in the presence of missing data. Specifically, novel nonparametric statistical methods are developed for different types of missing data for (i) inference on the area under the ROC curve (AUC, a summary index for the diagnostic accuracy of the test) and (ii) joint inference on the sensitivity and the specificity of a continuous-scale diagnostic test. This dissertation provides a general framework that combines empirical likelihood and general estimating equations with nuisance parameters for the joint inference of sensitivity and specificity with missing diagnostic data. The proposed methods have sound theoretical properties. The theoretical development is challenging because the proposed profile log-empirical likelihood ratio statistics are not the standard sum of independent random variables. The new methods combine the power of likelihood-based approaches and the jackknife method in ROC studies. Therefore, they are expected to be more robust, more accurate and less computationally intensive than existing methods for the evaluation of competing diagnostic tests.
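As a reference point for (i), the usual nonparametric AUC estimate on complete data is the Mann-Whitney statistic. The sketch below is a complete-case illustration of that estimate, not the empirical-likelihood machinery the dissertation develops for missing data.

```python
import numpy as np

def auc_mann_whitney(diseased, healthy):
    """Nonparametric AUC: the probability that a diseased score exceeds a
    healthy score, with ties counted as 1/2 (Mann-Whitney U form)."""
    x = np.asarray(diseased, dtype=float)
    y = np.asarray(healthy, dtype=float)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (len(x) * len(y))

# complete-case illustration: drop subjects with missing test results first
scores_d = np.array([2.1, 3.4, np.nan, 4.0])
scores_h = np.array([1.0, 2.2, 1.8])
auc = auc_mann_whitney(scores_d[~np.isnan(scores_d)], scores_h)
```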
85

Second Level Cluster Dependencies: A Comparison of Modeling Software and Missing Data Techniques

Larsen, Ross Allen Andrew August 2010 (has links)
Dependencies in multilevel models at the second level have never been thoroughly examined. For certain designs, first-level subjects are independent over time, but second-level subjects may exhibit nonzero covariances over time. Following a review of relevant literature, the first study investigated which widely used computer programs adequately take these dependencies into account in their analysis. This was accomplished through a simulation study with SAS and examples of analyses with Mplus and LISREL. The second study investigated the impact of two different missing data techniques for such designs when data are missing at the first level, with a simulation study in SAS. The first study simulated data produced in a multiyear study, varying the numbers of subjects at the first and second levels, the number of data waves, the magnitude of effects at both the first and second level, and the magnitude of the second-level covariance. Results showed that SAS and the MULTILEV component in LISREL analyze such data well, while Mplus does not. The second study compared two missing data techniques in the presence of a second-level dependency: multiple imputation (MI) and full information maximum likelihood (FIML). They were compared in a SAS simulation study in which the data were simulated with all the factors of the first study, with the addition of missing data varied in amount and pattern (missing completely at random or missing at random). Results showed that FIML is superior to MI because it produces lower bias and correctly estimates standard errors.
86

A Comparison of Multiple Imputation Methods for Missing Covariate Values in Recurrent Event Data

Huo, Zhao January 2015 (has links)
Multiple imputation (MI) is a commonly used approach to impute missing data. This thesis studies missing covariates in recurrent event data and discusses ways to include the survival outcomes in the imputation model. The MI methods under consideration combine the event indicator D with, respectively, the right-censored event times T, the logarithm of T, and the cumulative baseline hazard H0(T). After imputation, we can then proceed to the complete-data analysis. The Cox proportional hazards (PH) model and the PWP model are chosen as the analysis models, and the coefficient estimates are of substantive interest. A Monte Carlo simulation study is conducted to compare the different MI methods, with relative bias and mean squared error used in the evaluation. Furthermore, an empirical study is conducted based on cardiovascular disease event data that contain missing values. Overall, the results show that MI based on the Nelson-Aalen estimate of H0(T) is preferred in most circumstances.
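The preferred imputation covariate above is the Nelson-Aalen estimate of the cumulative baseline hazard H0(T). A minimal sketch of that standard estimate follows (the assumed textbook form, not the thesis's code): at each distinct event time, add the number of events divided by the number still at risk.

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard H0(t):
    at each event time, add (events at t) / (subjects still at risk)."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    order = np.argsort(t)
    t, d = t[order], d[order]
    n = len(t)
    h, H = 0.0, {}
    i = 0
    while i < n:
        j, deaths = i, 0
        while j < n and t[j] == t[i]:  # group tied times
            deaths += d[j]
            j += 1
        if deaths:
            h += deaths / (n - i)      # n - i subjects still at risk
        H[t[i]] = h
        i = j
    return H  # an imputation model can use H[t_i] as a covariate
```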
87

Partial least squares structural equation modelling with incomplete data: an investigation of the impact of imputation methods

Mohd Jamil, J. B. January 2012 (has links)
Despite considerable advances in missing data imputation methods over the last three decades, the problem of missing data remains largely unsolved. Many techniques have emerged in the literature as candidate solutions. These techniques can be categorised into two classes: statistical methods of data imputation and computational intelligence methods of data imputation. Owing to the longstanding use of statistical methods for handling missing data problems, computational intelligence methods have taken some time to gain serious attention, even though they offer comparable accuracy to other approaches. The merits of both classes have been discussed at length in the literature, but only a limited number of studies make a substantial comparison between them. This thesis contributes to knowledge by, firstly, conducting a comprehensive comparison of standard statistical methods of data imputation, namely mean substitution (MS), regression imputation (RI), expectation maximization (EM), tree imputation (TI) and multiple imputation (MI), on missing completely at random (MCAR) data sets. Secondly, this study compares the efficacy of these methods with a computational intelligence method of data imputation, namely a neural network (NN), on missing not at random (MNAR) data sets. The significant differences in the performance of the methods are presented. Thirdly, a novel procedure for handling missing data is presented. A hybrid combination of each of these statistical methods with an NN, known here as the post-processing procedure, was adopted to approximate MNAR data sets. Simulation studies for each of these imputation approaches have been conducted to assess the impact of missing values on partial least squares structural equation modelling (PLS-SEM), based on the estimated accuracy of both structural and measurement parameters. The best method for dealing with each missing data mechanism is identified. Several significant insights were deduced from the simulation results. For the MCAR problem, among the statistical methods of data imputation, MI performs better than the other methods for all percentages of missing data. Another unique contribution is found when comparing the results before and after the NN post-processing procedure. The improvement in accuracy may result from the neural network's ability to derive meaning from the imputed data set found by the statistical methods. Based on these results, the NN post-processing procedure is capable of assisting MS in producing significant improvements in the accuracy of the approximated values. This is a promising result, as MS is the weakest method in this study. This evidence is also informative, as MS is often the default method available to users of PLS-SEM software.
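To make the MS-versus-model-based contrast concrete, here is a toy comparison of mean substitution and regression imputation on synthetic MCAR data. The data-generating model, missingness rate and seed are invented for the sketch; this is not the thesis's simulation design, which evaluates accuracy through PLS-SEM parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy MCAR setup: x2 depends on x1; knock out 20% of x2 at random
n = 500
x1 = rng.normal(size=n)
x2_true = 0.8 * x1 + rng.normal(scale=0.6, size=n)
x2 = x2_true.copy()
miss = rng.random(n) < 0.2
x2[miss] = np.nan

# mean substitution (MS)
ms = np.where(miss, np.nanmean(x2), x2)

# regression imputation (RI): regress x2 on x1 using complete cases
obs = ~miss
slope, intercept = np.polyfit(x1[obs], x2[obs], 1)
ri = np.where(miss, slope * x1 + intercept, x2)

rmse = lambda imp: np.sqrt(np.mean((imp[miss] - x2_true[miss]) ** 2))
print(f"MS RMSE: {rmse(ms):.3f}  RI RMSE: {rmse(ri):.3f}")  # RI should win here
```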
88

Jackknife Empirical Likelihood-Based Confidence Intervals for Low Income Proportions with Missing Data

Yin, Yanan 18 December 2013 (has links)
The estimation of low income proportions plays an important role in comparisons of poverty between countries. In many countries, social stability and economic development depend on reliable estimates of low income proportions, and accurate estimation of a low income proportion is crucial for economic development and the improvement of people's living standards. In this thesis, the jackknife empirical likelihood method is employed to construct confidence intervals for a low income proportion when the observed data have missing values. Comprehensive simulation studies are conducted to compare the relative performance of two jackknife empirical likelihood based confidence intervals for low income proportions in terms of coverage probability. A real data example is used to illustrate the application of the proposed methods.
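A sketch of the basic jackknife empirical likelihood recipe for such an interval follows: form jackknife pseudo-values of the statistic, then calibrate the empirical likelihood ratio for their mean against a chi-squared limit (after Jing, Yuan and Zhou, 2009). The grid search and the low-income cutoff (60% of the median) are illustrative assumptions, not details from the thesis, and no missing-data adjustment is shown.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_log_ratio(v, theta):
    """-2 log empirical likelihood ratio for the mean of v at value theta."""
    z = v - theta
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # theta outside the convex hull of the data
    g = lambda lam: np.sum(z / (1.0 + lam * z))
    lam = brentq(g, -1.0 / z.max() + 1e-10, -1.0 / z.min() - 1e-10)
    return 2.0 * np.sum(np.log1p(lam * z))

def jel_ci(x, stat, level=0.95, grid=400):
    """Jackknife empirical likelihood CI for stat(x)."""
    n = len(x)
    full = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    v = n * full - (n - 1) * loo          # jackknife pseudo-values
    cut = chi2.ppf(level, df=1)
    thetas = np.linspace(v.min() + 1e-6, v.max() - 1e-6, grid)
    ok = [t for t in thetas if el_log_ratio(v, t) <= cut]
    return min(ok), max(ok)

# low income proportion: share of incomes below 60% of the median
low_income = lambda inc: np.mean(inc < 0.6 * np.median(inc))
incomes = np.random.default_rng(1).lognormal(mean=10, sigma=0.7, size=300)
print(jel_ci(incomes, low_income))
```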
89

A Monte Carlo Study Investigating Missing Data, Differential Item Functioning, and Effect Size

Garrett, Phyllis Lorena 12 August 2009 (has links)
The use of polytomous items in assessments has increased over the years, and as a result, the validity of these assessments has been a concern. Differential item functioning (DIF) and missing data are two factors that may adversely affect assessment validity. Both factors have been studied separately, but DIF and missing data are likely to occur simultaneously in real assessment situations. This study investigated the Type I error and power of several DIF detection methods and methods of handling missing data for polytomous items generated under the partial credit model. The Type I error and power of the Mantel and ordinal logistic regression were compared using within-person mean substitution and multiple imputation when data were missing completely at random. In addition to assessing the Type I error and power of DIF detection methods and methods of handling missing data, this study also assessed the impact of missing data on the effect size measures associated with the Mantel (the standardized mean difference) and ordinal logistic regression (R-squared). Results indicated that the performance of the Mantel and ordinal logistic regression depended on the percent of missing data in the data set, the magnitude of DIF, and the sample size ratio. The Type I error for both DIF detection methods varied based on the missing data method used to impute the missing data. Power to detect DIF increased as DIF magnitude increased, but there was a relative decrease in power as the percent of missing data increased. Additional findings indicated that the percent of missing data, DIF magnitude, and sample size ratio also influenced the effect size measures associated with the Mantel and ordinal logistic regression. The effect size values for both DIF detection methods generally increased as DIF magnitude increased, but as the percent of missing data increased, the effect size values decreased.
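For reference, the standardized mean difference effect size used with the Mantel test is conventionally a focal-group-weighted gap in mean item scores across matching strata. A sketch of that assumed standard form follows (not the study's exact code; variable names are hypothetical).

```python
import numpy as np

def smd(item, group, stratum):
    """Standardized mean difference DIF effect size: focal-weighted gap in
    mean item scores between focal (1) and reference (0) groups per stratum."""
    item, group, stratum = map(np.asarray, (item, group, stratum))
    focal = group == 1
    total_focal = focal.sum()
    out = 0.0
    for s in np.unique(stratum):
        in_s = stratum == s
        f, r = in_s & focal, in_s & ~focal
        if f.sum() == 0 or r.sum() == 0:
            continue  # a stratum contributes nothing without both groups
        w = f.sum() / total_focal
        out += w * (item[f].mean() - item[r].mean())
    return out
```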
90

Empirical Likelihood Confidence Intervals for the Population Mean Based on Incomplete Data

Valdovinos Alvarez, Jose Manuel 09 May 2015 (has links)
The use of doubly robust estimators is key to estimating the population mean response in the presence of incomplete data. Cao et al. (2009) proposed an alternative doubly robust estimator which exhibits strong performance compared to existing estimation methods. In this thesis, we apply the jackknife empirical likelihood, the jackknife empirical likelihood with nuisance parameters, the profile empirical likelihood, and an empirical likelihood method based on the influence function to make inference for the population mean. We use these methods to construct confidence intervals for the population mean, and compare the coverage probabilities and interval lengths obtained using both the "usual" doubly robust estimator and the alternative estimator proposed by Cao et al. (2009). An extensive simulation study is carried out to compare the different methods. Finally, the proposed methods are applied to two real data sets.
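The "usual" doubly robust estimator referenced above conventionally takes the augmented inverse-probability-weighted (AIPW) form. Below is a sketch under simple assumed working models (logistic propensity, linear outcome); it is a generic illustration, not Cao et al.'s specific construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_mean(x, y, observed):
    """Augmented inverse-probability-weighted (doubly robust) estimate of
    E[Y] when y is missing for some subjects (observed == 0)."""
    x = np.asarray(x, dtype=float).reshape(len(observed), -1)
    r = np.asarray(observed, dtype=int)
    y = np.asarray(y, dtype=float)
    # propensity model: probability of being observed given x
    pi = LogisticRegression().fit(x, r).predict_proba(x)[:, 1]
    # outcome model: fit on complete cases, predict for everyone
    m = LinearRegression().fit(x[r == 1], y[r == 1]).predict(x)
    y_filled = np.where(r == 1, y, 0.0)  # missing y never enters the sum
    # AIPW: IPW term plus augmentation; consistent if either model is right
    return np.mean(r * y_filled / pi - (r - pi) / pi * m)
```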
