Global ETD Search

91	Modeling Patterns of Small Scale Spatial Variation in Soil Huang, Fang 11 January 2006 The microbial communities found in soils are inherently heterogeneous and often exhibit spatial variations on a small scale. Becker et al. (2006) investigate this phenomenon and present statistical analyses to support their findings. In this project, alternative statistical methods and models are considered and employed in a re-analysis of the data from Becker. First, parametric nested random effects models are considered as an alternative to the nonparametric semivariogram models and kriging methods employed by Becker to analyze patterns of spatial variation. Second, multiple logistic regression models are employed to investigate factors influencing microbial community structure as an alternative to the simple logistic models used by Becker. Additionally, the microbial community profile data of Becker were unobservable at several points in the spatial grid. The Becker analysis assumes that the data are missing completely at random and as such have relatively little impact on inference. In this re-analysis, this assumption is investigated and it is shown that the pattern of missingness is correlated with both metabolic potential and spatial coordinates and thus provides useful information that was previously ignored by Becker. Multiple imputation methods are employed to incorporate the information present in the missing data pattern and results are compared with those of Becker. spatial variations nested random effects models semivariogram models kriging methods multiple logistic regression models missing multiple imputation Soil microbiology Mathematical models Spatial analysis (Statistics)
92	Traitement des données manquantes en épidémiologie : application de l’imputation multiple à des données de surveillance et d’enquêtes / Missing data management in epidemiology : Application of multiple imputation to data from surveillance systems and surveys Héraud Bousquet, Vanina 06 April 2012 The management of missing values is a common and widespread problem in epidemiology. The most common technique restricts the analysis to subjects with complete information on the variables of interest, which can substantially reduce statistical power and precision and may also result in biased estimates. This thesis investigates the application of multiple imputation methods to manage missing values in cross-sectional data from epidemiological surveys and from surveillance systems for infectious diseases. The study designs to which multiple imputation was applied were diverse: a risk analysis of HIV transmission through blood transfusion, a case-control study on risk factors for Campylobacter infection, and a capture-recapture study to estimate the number of new HIV diagnoses among children. We then performed a multiple imputation analysis on data from a surveillance system for chronic hepatitis C (HCV) to assess risk factors for severe liver disease among HCV-infected patients who reported drug use. Within this HCV study, we proposed guidelines for applying a sensitivity analysis to test the hypotheses underlying multiple imputation.
Finally, we describe how we developed and applied an ongoing multiple imputation process for the French national HIV surveillance database, how the process evolved over time, and how we evaluated and attempted to validate the multiple imputation procedures. Based on these practical applications, we worked out a strategy for handling missing data in surveillance databases, including a thorough examination of the incomplete database, the building of the imputation model, and the procedures to validate imputation models and examine the hypotheses underlying multiple imputation. Données manquantes Imputation multiple Analyse de sensibilité Enquêtes Systèmes de surveillance VIH Hépatite C chronique Missing data Multiple imputation Sensitivity analysis Surveillance systems Surveys HIV Chronic hepatitis C
93	Bayesian Cluster Analysis : Some Extensions to Non-standard Situations Franzén, Jessica January 2008 The Bayesian approach to cluster analysis is presented. We assume that all data stem from a finite mixture model, where each component corresponds to one cluster and is given by a multivariate normal distribution with unknown mean and variance. The method produces posterior distributions of all cluster parameters and proportions as well as associated cluster probabilities for all objects. We extend this method in several directions to some common but non-standard situations. The first extension covers the case with a few deviant observations not belonging to one of the normal clusters. An extra component/cluster is created for them, which has a larger variance or a different distribution, e.g. is uniform over the whole range. The second extension is clustering of longitudinal data. All units are clustered at all time points separately and the movements between time points are modeled by Markov transition matrices. This means that the clustering at one time point will be affected by what happens at the neighbouring time points. The third extension handles datasets with missing data, e.g. item non-response. We impute the missing values iteratively in an extra step of the Gibbs sampler estimation algorithm. The Bayesian inference of mixture models has many advantages over the classical approach. However, it is not without computational difficulties. A software package, written in Matlab, for Bayesian inference of mixture models is introduced. The programs of the package handle the basic cases of clustering data that are assumed to arise from mixture models of multivariate normal distributions, as well as the non-standard situations. Cluster analysis Clustering Classification Mixture model Gaussian Bayesian inference MCMC Gibbs sampler Deviant group Longitudinal Missing data Multiple imputation Statistics Statistik
94	Comparison Of Missing Value Imputation Methods For Meteorological Time Series Data Aslan, Sipan 01 September 2010 Handling missing data in spatio-temporal time series is an important branch of the general missing data problem. Because the statistical properties of time-dependent data are characterized by the sequential order of the observations, any interruption in the continuity of a time series causes severe problems. To make reliable analyses in this case, missing data must be handled cautiously, without disturbing the statistical properties of the series, chiefly its temporal and spatial dependencies. In this study we compare several imputation methods for completing missing values in spatio-temporal meteorological time series. For this purpose, the imputation methods are assessed on their performance for artificially created missing data in monthly total precipitation and monthly mean temperature series obtained from the climate stations of the Turkish State Meteorological Service. The artificially created missing data are estimated using six methods. Single Arithmetic Average (SAA), Normal Ratio (NR) and NR Weighted with Correlations (NRWC) are the three simple methods used in the study. We also use two computationally intensive methods for missing data imputation: a Multi Layer Perceptron type Neural Network (MLPNN) and a Monte Carlo Markov Chain method based on the Expectation-Maximization Algorithm (EM-MCMC). In addition, we propose a modification of the EM-MCMC method in which the results of the simple imputation methods are used as auxiliary variables. Besides an accuracy measure based on squared errors, we propose the Correlation Dimension (CD) technique, itself an important topic in Nonlinear Dynamic Time Series Analysis, for appropriate evaluation of imputation performance. QA Analysis 299.6-433
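The two simplest methods named in entry 94, SAA and NR, can be sketched as follows. This is a minimal illustration that assumes the usual textbook definitions: SAA averages the values observed at neighboring stations, and NR first rescales each neighbor's value by the ratio of the target station's long-term mean to that neighbor's long-term mean. The function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
def simple_arithmetic_average(neighbor_values):
    """SAA: estimate the missing value as the plain mean of neighbor observations."""
    return sum(neighbor_values) / len(neighbor_values)


def normal_ratio(target_normal, neighbor_normals, neighbor_values):
    """NR: rescale each neighbor observation by the ratio of long-term means, then average."""
    if len(neighbor_normals) != len(neighbor_values):
        raise ValueError("need one long-term mean per neighbor observation")
    scaled = [
        (target_normal / normal) * value
        for normal, value in zip(neighbor_normals, neighbor_values)
    ]
    return sum(scaled) / len(scaled)


# Hypothetical monthly precipitation totals (mm) observed at two neighboring stations.
neighbors_observed = [10.0, 40.0]
neighbors_normals = [50.0, 200.0]  # assumed long-term monthly means at the neighbors
target_station_normal = 100.0      # assumed long-term monthly mean at the station with the gap

saa_estimate = simple_arithmetic_average(neighbors_observed)
nr_estimate = normal_ratio(target_station_normal, neighbors_normals, neighbors_observed)
print(saa_estimate, nr_estimate)
```

NR differs from SAA only when the stations have different climatological levels, which is why it is often preferred for precipitation in uneven terrain.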
95	The role of families in the stratification of attainment : parental occupations, parental education and family structure in the 1990s Playford, C. J. January 2011 The closing decades of the 20th century have witnessed a large increase in the numbers of young people remaining in education post-16 rather than entering the labour market. Concurrently, overall educational attainment in General Certificate of Secondary Education (GCSE) qualifications in England and Wales has steadily increased since their introduction in 1988. The 1990s represent a key period of change in these trends. Some sociologists argue that processes of detraditionalisation have occurred whereby previous indicators of social inequality, such as social class, are less relevant to the transitions of young people from school to work. Sociologists from other traditions argue that inequalities persist in the stratification of educational attainment by the family backgrounds of young people but that these factors have changed during this period. This thesis is an investigation of the influence of family background factors upon GCSE attainment during the 1990s. This includes extensive statistical analysis of measures of parental occupation, parental education and family structure with gender, ethnicity, school type and housing tenure type within the Youth Cohort Study of England and Wales. These analyses include over 100,000 respondents in 6 cohorts of school leavers with the harmonisation of data from cohort 6 (1992) to the Youth Cohort Time Series for England, Wales and Scotland 1984-2002 (Croxford, Ianelli and Shapira 2007). By adding the 1992 data to existing 1990s cohorts, the statistical models fitted apply to the complete set of 1990s cohorts and are therefore able to provide insight for the whole of this period. Strong differentials by parental occupation persist throughout the 1990s and do not diminish despite the overall context of rising attainment. 
This relationship remains net of the other factors listed, irrespective of the measure of parental occupation or the GCSE attainment outcome measure used. This builds upon and supports previous work conducted using the Youth Cohort Study and suggests that stratification in educational attainment remains a significant factor. Gender and ethnicity remain further sources of persistent stratification in GCSE attainment. Following a discussion of the weighting system and features of the Youth Cohort Study as a dataset, a thorough investigation of missing data is included, with the results of multiply imputed datasets used to examine the potential for missing data to bias estimates. This includes a critique of these approaches in the context of survey data analysis. The findings from this investigation suggest the importance of survey data collection methods, the limitations of post-survey bias correction methods and provide a thorough investigation of the data. The analysis then develops and expands previous work by investigating variation in GCSE attainment by subjects studied, through Latent Class Analysis of YCS cohort 6 (1992). Of the four groups identified in the model, a clear division is noted between those middle-attaining groups with respect to attainment in Science and Mathematics. GCSE attainment in combinations of subjects studied is stratified particularly with respect to gender and ethnicity. This research offers new insight into the role of family background factors in GCSE attainment by subject combination. 373
96	Novel computationally intelligent machine learning algorithms for data mining and knowledge discovery Gheyas, Iffat A. January 2009 This thesis addresses three major issues in data mining regarding feature subset selection in large dimensionality domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm: SAGA. SAGA combines the ability to avoid being trapped in local minima of Simulated Annealing with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms and the high computational efficiency of generalized regression neural networks (GRNN). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of Generalized Regression Neural Networks (GRNNs) trained on different subsets of features generated by SAGA and the predictions of base classifiers are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features which make it stand out amongst ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNN is used for both base classifiers and the top level combiner classifier. Because of GRNN, the proposed ensemble is a dynamic weighting scheme. This is in contrast to the existing ensemble approaches which belong to the simple voting and static weighting strategy. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model. 006.3
97	[en] MULTIPLE IMPUTATION IN MULTIVARIATE NORMAL DATA VIA A EM TYPE ALGORITHM / [pt] UM ALGORITMO - EM - PARA IMPUTAÇÃO MÚLTIPLA DE DADOS CENSURADOS FABIANO SALDANHA GOMES DE OLIVEIRA 05 July 2002 An EM-type algorithm was developed to estimate, by maximum likelihood, the parameters of a multivariate truncated normal distribution. Each imputed value is computed as the conditional mean, subject to being greater (or smaller) than the observed value. Because estimation is by maximum likelihood, the information matrix allows confidence intervals to be computed for the parameters and for the imputed values. The proposed algorithm was tested with simulated data and with real data (where, in fact, the normality assumption does not hold). [pt] DADOS CENSURADOS MULTIVARIADOS [en] MULTIVARIATE CENSORED DATA [pt] NORMAL TRUNCADA MULTIVARIADA [en] TRUNCATED NORMAL MULTIVARIATE [pt] ALGORITMO EM [en] EM ALGORITHM [pt] IMPUTACAO MULTIPLA DE DADOS [en] MULTIPLE IMPUTATION DATA.
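The conditional-mean imputation described in entry 97 can be illustrated in one dimension. The sketch below is an assumption-laden simplification, not the thesis algorithm: it treats a single normal variable with known mean and standard deviation, whereas the thesis works with a multivariate distribution whose parameters are re-estimated inside the EM iterations. It uses the standard closed form for the mean of a truncated normal; the function name `censored_mean` and its `side` parameter are hypothetical.

```python
import math


def _normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(mu, sigma, bound, side="greater"):
    """Mean of N(mu, sigma^2) conditional on lying beyond the recorded bound.

    side="greater": the true value is known to exceed `bound`.
    side="smaller": the true value is known to be below `bound`.
    """
    alpha = (bound - mu) / sigma
    if side == "greater":
        return mu + sigma * _normal_pdf(alpha) / (1.0 - _normal_cdf(alpha))
    if side == "smaller":
        return mu - sigma * _normal_pdf(alpha) / _normal_cdf(alpha)
    raise ValueError("side must be 'greater' or 'smaller'")


# Hypothetical example: a standard normal value recorded only as "above 0".
imputed = censored_mean(mu=0.0, sigma=1.0, bound=0.0, side="greater")
print(imputed)
```

In an EM-type scheme, a step of this kind would be repeated with the current parameter estimates until they stabilize. For bounds far in the tail the denominators approach zero, so a production version would need a numerically safer tail computation.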
98	Modélisation statistique de l'impact des environnements académiques sur les croyances et la réussite des élèves au Chili / Statistical modeling of the impact of academic environments on student’s beliefs and achievement in Chile Giaconi Smoje, Valentina 26 September 2016 This PhD thesis is dedicated to the statistical modeling of the impact of academic environments on students’ beliefs and achievement in Chile. We contribute to the field of educational effectiveness with two empirical studies and a statistical discussion of how to combine multilevel models with methods for selection bias and missing data. The statistical discussion was used to make methodological decisions in the empirical studies. The first empirical study evaluates the impact of science courses on students’ beliefs. The second empirical study concerns school effects on students’ trajectories in mathematics and reading scores. In the statistical part, we describe and analyze linear adjustment and propensity score matching as ways to address selection bias. For the missing data problem, we consider multiple imputation. Each of these methods is compatible with multilevel models. However, the problem of addressing selection bias and missing data simultaneously with hierarchical data is not resolved. We present a statistical discussion that classifies and analyzes strategies for combining the methods.
The first empirical study concerns the influence of Life and Non-life science courses (disciplines that study living and non-living objects) on secondary students’ epistemic beliefs and self-efficacy beliefs related to science. We compared students who took summer science courses with a control group at two measurement times: a questionnaire at the end of the courses and a follow-up four months later. We found positive effects of Life courses and of courses with laboratory work, controlling for confounding variables. The results show differences between Life and Non-life scientific disciplines that should be explored further.
The second empirical study concerns school effects on the trajectories of Chilean students. It has two aims. The first is to describe the characteristics of the trajectories in mathematics and reading scores and the variance explained by primary and secondary schools. The second is to measure the effect of public schools, compared with voucher schools (private schools with state funding), on students’ trajectories in mathematics and reading scores. We used a national longitudinal database with measures for the same students in 4th, 8th and 10th grade. Multilevel growth models were used to model the trajectories. We found effects of secondary and primary schools on intercepts and slopes. In addition, we found a negative effect of public education, which became non-significant after controlling for school socioeconomic composition and selection practices. The results illustrate the stratification between the public system and the voucher system and the need to study, within each system, which schools are more effective. Modèles multiniveaux Appariement des scores de propension Imputation multiple Disciplines scientifiques Effets-Écoles Trajectoires des élèves Multilevel modeling Propensity score matching Multiple imputation Science disciplines School effects Students' trajectories 370
99	Rozdílný dopad minimální mzdy na zaměstnanost napříč regiony EU / The Differential Impact of Minimum Wage on Employment across the EU Regions Sklenářová, Tereza January 2018 Several studies have shown that prices differ across regions and substantially affect standards of living. This thesis investigates whether regional price differences cause a differential impact of the minimum wage on employment and hours of work across the European Union NUTS 2 regions. Based on existing regional price estimates for 7 European Union countries and publicly available aggregate regional data, estimates of regional price levels are obtained for another 11 European Union countries with a minimum wage. The method used for this purpose, multiple imputation, makes it possible to use the resulting estimates as an explanatory variable in a further regression, because it accounts for the use of imputed rather than observed values by correcting the variances of the coefficient estimates. The impacts of the minimum wage are investigated for 3 groups of people who are at risk of being affected by an increase: young adults (15-19 years), low-educated individuals and low-skilled individuals. The results indicate that the minimum wage has a negative impact on employment, and that this impact is stronger in regions with higher price levels. A negative effect of the minimum wage on hours of work was not confirmed.
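The variance correction mentioned in entry 99 is commonly done with Rubin's rules for pooling results across imputed datasets. The abstract does not say which pooling rule the thesis uses, so the sketch below is an assumption: it combines one coefficient estimated on each of m imputed datasets, adding the between-imputation variance (inflated by 1 + 1/m) to the average within-imputation variance. The function name `pool_rubin` and the sample numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error of that coefficient on each dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (m - 1 denominator)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total_variance


# Hypothetical regression coefficient from three imputed datasets.
coef, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(coef, total_var)
```

The between-imputation term is what keeps standard errors from being understated when imputed values are treated as if they had been observed.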
100	Multiple Imputation for Two-Level Hierarchical Models with Categorical Variables and Missing at Random Data January 2016 Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used which a) do not accurately capture the structure of relationships in the data such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately compensate for different data types such as categorical data, then the assumptions associated with the model have not been met and the results of the analysis may be inaccurate. In the presence of clustered/nested data, hierarchical linear modeling or multilevel modeling (MLM; Raudenbush & Bryk, 2002) has the ability to predict outcomes for each level of analysis and across multiple levels (accounting for relationships between levels) providing a significant advantage over single-level analyses. When multilevel data contain missingness, multilevel multiple imputation (MLMI) techniques may be used to model both the missingness and the clustered nature of the data. With categorical multilevel data with missingness, categorical MLMI must be used. Two such routines for MLMI with continuous and categorical data were explored with missing at random (MAR) data: a formal Bayesian imputation and analysis routine in JAGS (R/JAGS) and a common MLM procedure of imputation via Bayesian estimation in BLImP with frequentist analysis of the multilevel model in Mplus (BLImP/Mplus). Manipulated variables included interclass correlations, number of clusters, and the rate of missingness. Results showed that with continuous data, R/JAGS returned more accurate parameter estimates than BLImP/Mplus for almost all parameters of interest across levels of the manipulated variables. 
Both R/JAGS and BLImP/Mplus encountered convergence issues and returned inaccurate parameter estimates when imputing and analyzing dichotomous data. Follow-up studies showed that JAGS and BLImP returned similar imputed datasets but the choice of analysis software for MLM impacted the recovery of accurate parameter estimates. Implications of these findings and recommendations for further research will be discussed. / Dissertation/Thesis / Doctoral Dissertation Educational Psychology 2016 Quantitative psychology Statistics Educational tests & measurements Bayesian Estimation Categorical Data Analysis Missing at Random Data Missing Data Theory Multilevel Modeling Multiple Imputation
