About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Planned Missing Data Designs in Communication Research

Parsons, Michael M. January 2013
No description available.
22

A Simulation Study On The Comparison Of Methods For The Analysis Of Longitudinal Count Data

Inan, Gul 01 July 2009
The longitudinal nature of the measurements and the counting process of the responses motivate regression models for longitudinal count data (LCD) that take into account phenomena such as within-subject association and overdispersion. One common problem in longitudinal studies is missing data, which adds further difficulty to the analysis. Missingness can be handled with missing data techniques; however, the amount of missingness in the data and the missingness mechanism affect how well those techniques perform. This thesis focuses on two regression models for LCD: the Log-Log-Gamma marginalized multilevel model (Log-Log-Gamma MMM) and the random-intercept model. The performance of the models is compared via a simulation study under three missing data mechanisms (missing completely at random, missing at random conditional on observed data, and missing not at random), two missingness percentages (10% and 20%), and four missing data techniques (complete case analysis and subject, occasion, and conditional mean imputation). The simulation study shows that while the mean absolute error and mean squared error of the Log-Log-Gamma MMM are larger than those of the random-intercept model, both regression models yield parallel results. The results confirm that both the amount of missingness and the missingness mechanism strongly influence the performance of missing data techniques under both regression models. Furthermore, while occasion mean imputation generally displays the worst performance, conditional mean imputation is superior to occasion and subject mean imputation and gives results parallel to complete case analysis.
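To make the setup above concrete, here is a minimal sketch (not the thesis code; the Gamma-distributed random effect of the Log-Log-Gamma MMM is replaced by a simple normal random intercept for brevity) of generating longitudinal counts, imposing MCAR and MAR missingness, and applying three of the four missing data techniques:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate longitudinal count data: n subjects, t occasions, with a
# shared random intercept inducing within-subject association.
n, t = 200, 5
intercepts = rng.normal(0.0, 0.5, size=n)                  # subject-level random effects
rates = np.exp(1.0 + intercepts)[:, None] * np.ones((n, t))
y = rng.poisson(rates).astype(float)

# MCAR: every observation has the same 20% chance of being missing.
mcar = y.copy()
mcar[rng.random((n, t)) < 0.20] = np.nan

# MAR conditional on observed data: dropout at occasion j+1 is more
# likely when the observed count at occasion j was high.
mar = y.copy()
for j in range(t - 1):
    p_drop = 1 / (1 + np.exp(-(y[:, j] - y[:, j].mean())))  # depends on observed y only
    mar[rng.random(n) < 0.3 * p_drop, j + 1] = np.nan

# Three of the simple techniques compared in the thesis:
complete_cases = mar[~np.isnan(mar).any(axis=1)]            # complete case analysis
subject_means = np.where(np.isnan(mar),
                         np.nanmean(mar, axis=1, keepdims=True), mar)   # subject mean imputation
occasion_means = np.where(np.isnan(mar),
                          np.nanmean(mar, axis=0, keepdims=True), mar)  # occasion mean imputation
```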
23

Real-Time Estimation of Aerodynamic Parameters

Larsson Cahlin, Sofia January 2016
Extensive testing is performed when a new aircraft is developed. Flight testing is costly and time-consuming, but there are aspects of the process that can be made more efficient. A program that estimates aerodynamic parameters during flight could be used as a tool when deciding whether to continue or abort a flight, from both a safety and a data-collection perspective. The algorithm of such a program must function in real time, which for this application means a maximum delay of a couple of seconds, and it must handle telemetric data, which might have missing samples in the data stream. Here, a conceptual program for real-time estimation of aerodynamic parameters is developed. Two estimation methods and four methods for handling missing data are compared, using both simulated data and real flight test data. The first estimation method uses the least squares algorithm in the frequency domain and is based on the chirp z-transform. The second estimation method adds boundary terms in the frequency domain differentiation and instrumental variables to the first method. The added boundary terms yield better estimates at the beginning of the excitation, and the instrumental variables yield a smaller bias when noise levels are high. The second method is therefore chosen for the conceptual program, as it is judged to perform better than the first. The sequential property of the transform ensures real-time functionality, and the program has a maximum delay of just above one second. The four methods compared for handling missing data are discarding the missing samples, holding the previous value, using linear interpolation, and regarding the missing samples as variations in the sample time. The linear interpolation method performs best on analytical data and is compared to the variable sample time method using simulated data. The results of that comparison vary depending on other implementation choices, but neither method is found to give unbiased results. The variable sample time method is chosen for the conceptual program, as it gives a lower variance and is preferable from an implementation point of view.
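As an illustration of the missing-sample strategies named above (a sketch, not the thesis code; the function fill_missing is invented here), the following shows discarding, holding the previous value, and linear interpolation on a toy telemetry stream. The fourth strategy, treating missing samples as variations in the sample time, lives inside the estimator itself rather than in a preprocessing step, so it is not shown:

```python
import numpy as np

def fill_missing(t, y, method="interpolate"):
    """Fill gaps (NaNs) in a telemetry stream using one of three strategies."""
    missing = np.isnan(y)
    if method == "discard":
        # Drop the missing samples; note this also changes the time grid.
        return t[~missing], y[~missing]
    if method == "hold":
        filled = y.copy()
        for i in range(1, len(filled)):
            if np.isnan(filled[i]):
                filled[i] = filled[i - 1]        # repeat the last good sample
        return t, filled
    if method == "interpolate":
        filled = y.copy()
        filled[missing] = np.interp(t[missing], t[~missing], y[~missing])
        return t, filled
    raise ValueError(method)

# A 100 Hz stream with a short dropout:
t = np.arange(0, 1, 0.01)
y = np.sin(2 * np.pi * 3 * t)
y[30:35] = np.nan
_, y_lin = fill_missing(t, y, "interpolate")
```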
24

Quantifying Power and Bias in Cluster Randomized Trials Using Mixed Models vs. Cluster-Level Analysis in the Presence of Missing Data: A Simulation Study

Vincent, Brenda January 2016
In cluster randomized trials (CRTs), groups rather than individuals are randomized to treatment arms, while the outcome is assessed on the individuals within each cluster. Individuals within a cluster tend to be more similar than individuals in a randomly selected sample, which introduces dependence that, if ignored, may lead to underestimated standard errors. To adjust for the correlation between individuals within clusters, two main approaches are used to analyze CRTs: cluster-level and individual-level analysis. In a cluster-level analysis, summary measures are obtained for each cluster and the two sets of cluster-specific measures are then compared, for example with a t-test of the cluster means. A mixed model that takes cluster membership into account is an example of an individual-level analysis. We used a simulation study to quantify and compare the power and bias of these two methods, further taking into account the effect of missing data. Complete datasets were generated and data were then deleted to simulate missing completely at random (MCAR) and missing at random (MAR) data. A balanced design with two treatment groups and two time points was assumed. Cluster size, variance components (within-subject, within-cluster, and between-cluster variance), and the proportion of missingness were varied to simulate common scenarios seen in practice. For each combination of parameters, 1,000 datasets were generated and analyzed. The results of our simulation study indicate that cluster-level analysis suffered a substantial loss of power when data were MAR. Individual-level analysis had higher power and remained unbiased, even with a small number of clusters.
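A minimal sketch of the two analysis approaches on synthetic complete data (not the study's code; the effect size and variance components here are arbitrary), using a t-test of cluster means for the cluster-level analysis and statsmodels' MixedLM with a cluster random intercept for the individual-level analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)

# 20 clusters of 30 individuals each, randomized to two arms at the cluster level.
clusters = np.repeat(np.arange(20), 30)
treat = np.repeat(rng.permutation([0] * 10 + [1] * 10), 30)
cluster_effect = rng.normal(0, 1, 20)[clusters]          # between-cluster variance
y = 0.5 * treat + cluster_effect + rng.normal(0, 2, len(clusters))
df = pd.DataFrame({"y": y, "treat": treat, "cluster": clusters})

# Cluster-level analysis: t-test on the cluster means.
means = df.groupby(["cluster", "treat"])["y"].mean().reset_index()
t_stat, p_cluster = stats.ttest_ind(means.loc[means.treat == 1, "y"],
                                    means.loc[means.treat == 0, "y"])

# Individual-level analysis: linear mixed model with a cluster random intercept.
fit = smf.mixedlm("y ~ treat", df, groups=df["cluster"]).fit()
print(p_cluster, fit.pvalues["treat"])
```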
25

Predicting HIV Status Using Neural Networks and Demographic Factors

Tim, Taryn Nicole Ho 15 February 2007
Demographic and medical history information obtained from annual South African antenatal surveys is used to estimate the risk of acquiring HIV. The estimation system consists of a classifier: a neural network trained to perform binary classification, using supervised learning on the survey data. The survey information contains discrete variables such as age, gravidity, and parity, as well as the qualitative variables race and location, which make up the input to the neural network; HIV status is the output. A multilayer perceptron with a logistic function is trained with a cross-entropy error function, giving the output a probabilistic interpretation. Predictive and classification performance is measured, and the sensitivity and specificity are illustrated on the receiver operating characteristic (ROC) curve. An auto-associative neural network is trained on complete datasets, and when presented with partial data, global optimisation methods are used to approximate the missing entries. The effect of the imputed data on the network prediction is investigated.
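As a rough illustration of such a classifier (synthetic stand-in data; the actual antenatal survey variables and network architecture are not reproduced here), one can train a multilayer perceptron with logistic activations under a cross-entropy loss and read sensitivity and specificity off the ROC curve:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Stand-in demographic features (e.g. age, gravidity, parity, encoded
# race/location); the real survey data are not reproduced here.
X = rng.normal(size=(2000, 5))
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 2])))
y = rng.binomial(1, p)                                   # binary outcome label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Multilayer perceptron with logistic activations; scikit-learn trains it
# under log-loss (cross entropy), so the outputs are probabilities.
clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    max_iter=1000, random_state=0).fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, scores)                    # ROC: sensitivity vs 1-specificity
print("AUC:", roc_auc_score(y_te, scores))
```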
26

Techniques to handle missing values in a factor analysis

Turville, Christopher, University of Western Sydney, Faculty of Informatics, Science and Technology January 2000
A factor analysis typically involves a large collection of data, and it is common for some of the data to be unrecorded. This study investigates the ability of several techniques to handle missing values in a factor analysis: complete cases only, all available cases, imputing means, an iterative component method, singular value decomposition, and the EM algorithm. A data set representative of those used in factor analysis is simulated, some of the data are randomly removed to represent missing values, and the performance of the techniques is investigated over a wide range of conditions. Several criteria are used to assess the techniques' ability to handle missing values in a factor analysis. Overall, no single technique performs best for all of the conditions studied. The EM algorithm is generally the most effective technique, except when ill-conditioned matrices are present or when computing time is a concern. Some theoretical concerns are introduced regarding the effects that changes in the correlation matrix have on the loadings of a factor analysis. A complicated expression is derived showing that the change in factor loadings resulting from a change in the elements of a correlation matrix involves components of the eigenvectors and eigenvalues.
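A small sketch of the SVD-based idea (an illustration, not the thesis implementation; svd_impute is a hypothetical helper): start from column means, then alternate a low-rank SVD reconstruction with replacement of the missing cells until the imputations stabilize:

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50, tol=1e-6):
    """Iteratively fill missing entries using a rank-k SVD reconstruction."""
    missing = np.isnan(X)
    # Initialize missing cells with column (variable) means.
    filled = np.where(missing, np.nanmean(X, axis=0, keepdims=True), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]     # rank-k reconstruction
        new = np.where(missing, approx, X)                # keep observed cells fixed
        if np.max(np.abs(new - filled)) < tol:
            break
        filled = new
    return filled

# Example: a two-factor structure with 10% of entries deleted at random.
rng = np.random.default_rng(3)
scores = rng.normal(size=(300, 2))
loadings = rng.normal(size=(2, 8))
X = scores @ loadings + rng.normal(0, 0.3, (300, 8))
X[rng.random(X.shape) < 0.10] = np.nan
X_completed = svd_impute(X, rank=2)
```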
27

Statistical modeling of longitudinal survey data with binary outcomes

Ghosh, Sunita 20 December 2007
Data obtained from longitudinal surveys using complex multi-stage sampling designs contain cross-sectional dependencies among units caused by inherent hierarchies in the data, as well as within-subject correlation arising from repeated measurements. The statistical methods used for analyzing such data should account for stratification, clustering, and unequal probability of selection, as well as within-subject correlations due to repeated measurements.

The complex multi-stage design approach has been used in the longitudinal National Population Health Survey (NPHS). This ongoing survey collects information on health determinants and outcomes in a sample of the general Canadian population.

This dissertation compares the model-based and design-based approaches used to determine the risk factors of asthma prevalence in the Canadian female population of the NPHS (marginal model). Weighted, unweighted, and robust statistical methods were used to examine the risk factors of the incidence of asthma (event history analysis) and of recurrent asthma episodes (recurrent survival analysis). Missing data analysis was used to study the bias associated with incomplete data. To determine the risk factors of asthma prevalence, the Generalized Estimating Equations (GEE) approach was used for marginal modeling (model-based approach), followed by Taylor linearization and bootstrap estimation of standard errors (design-based approach). The incidence of asthma (event history analysis) was estimated using weighted, unweighted, and robust methods. Recurrent event history analysis was conducted using the Andersen and Gill; Wei, Lin, and Weissfeld (WLW); and Prentice, Williams, and Peterson (PWP) approaches. To assess the presence of bias associated with missing data, weighted GEE and pattern-mixture models were used.

The prevalence of asthma in the Canadian female population was 6.9% (6.1-7.7) at the end of Cycle 5. When comparing model-based and design-based approaches for asthma prevalence, the design-based method provided unbiased estimates of the standard errors. The overall incidence of asthma in this population, excluding those with asthma at baseline, was 10.5/1000/year (9.2-12.1). For the event history analysis, the robust method provided the most stable estimates and standard errors.

For the recurrent event history analysis, the WLW method provided stable standard error estimates. Finally, for the missing data approach, the pattern-mixture model produced the most stable standard errors.

To conclude, design-based approaches should be preferred over model-based approaches for analyzing complex survey data, as the former provide the most unbiased parameter estimates and standard errors.
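For the model-based side described above, a marginal logistic model can be fitted by GEE in statsmodels. The sketch below uses toy data standing in for the NPHS records (which are not reproduced here) and an exchangeable working correlation:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)

# Toy longitudinal binary outcome: n subjects, 5 repeated measurements.
n, t = 500, 5
subj = np.repeat(np.arange(n), t)
age = np.repeat(rng.integers(20, 60, n), t) + np.tile(np.arange(t) * 2, n)
frailty = np.repeat(rng.normal(0, 0.8, n), t)            # induces within-subject correlation
logit = -3 + 0.02 * age + frailty
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
df = pd.DataFrame({"asthma": y, "age": age, "id": subj})

# Marginal (population-averaged) logistic model fitted by GEE with an
# exchangeable working correlation: the model-based approach above.
model = sm.GEE.from_formula("asthma ~ age", groups="id", data=df,
                            family=sm.families.Binomial(),
                            cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```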
29

Missing Data Problems in Machine Learning

Marlin, Benjamin 01 August 2008
Learning, inference, and prediction in the presence of missing data are pervasive problems in machine learning and statistical data analysis. This thesis focuses on the problems of collaborative prediction with non-random missing data and classification with missing features. We begin by presenting and elaborating on the theory of missing data due to Little and Rubin. We place a particular emphasis on the missing at random assumption in the multivariate setting with arbitrary patterns of missing data. We derive inference and prediction methods in the presence of random missing data for a variety of probabilistic models including finite mixture models, Dirichlet process mixture models, and factor analysis. Based on this foundation, we develop several novel models and inference procedures for both the collaborative prediction problem and the problem of classification with missing features. We develop models and methods for collaborative prediction with non-random missing data by combining standard models for complete data with models of the missing data process. Using a novel recommender system data set and experimental protocol, we show that each proposed method achieves a substantial increase in rating prediction performance compared to models that assume missing ratings are missing at random. We describe several strategies for classification with missing features including the use of generative classifiers, and the combination of standard discriminative classifiers with single imputation, multiple imputation, classification in subspaces, and an approach based on modifying the classifier input representation to include response indicators. Results on real and synthetic data sets show that in some cases performance gains over baseline methods can be achieved by methods that do not learn a detailed model of the feature space.
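One of the input-representation strategies mentioned above, augmenting the classifier input with response indicators, can be sketched as follows (synthetic data; augment_with_indicators is a hypothetical helper combining mean imputation with an appended missingness mask):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Synthetic classification data with 25% of feature values missing.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.25] = np.nan

def augment_with_indicators(X):
    """Mean-impute, then append one response indicator per feature.

    The indicator columns let the classifier learn from the missingness
    pattern itself, rather than treating imputed values as observed.
    """
    missing = np.isnan(X)
    X_filled = np.where(missing, np.nanmean(X, axis=0, keepdims=True), X)
    return np.hstack([X_filled, missing.astype(float)])

clf = LogisticRegression().fit(augment_with_indicators(X), y)
```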
