  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Estimating Prevalence from Complex Surveys

O'Brien, Sophie 07 November 2014
Massachusetts passed legislation in the fall of 2012 to allow the construction of three casinos and a slot parlor in the state. The prevalence of problem gambling in the state, and in the areas where casinos will be built, is of particular interest. The goal is to evaluate the change in prevalence after construction of the casinos, using a multi-mode address-based sample survey. The objective of this thesis is to evaluate and describe ways of using statistical inference to estimate prevalence rates in finite populations. Four methods were considered for estimating the prevalence of problem gambling in the context of the gambling study: the simple mean, the post-stratified mean, the best linear unbiased predictor (BLUP), and the empirical best linear unbiased predictor (EBLUP). These methods were evaluated in three examples, both unconditionally and conditionally (controlling for gender), using mean squared error (MSE) as the measure of accuracy. In conditional analyses of a population with N=1,000 and a crude problem gambling rate of 1.5, using samples of n=200, the simple mean and the post-stratified mean performed better in certain situations, as measured by their low MSE values. When a sample contains fewer females than expected, the post-stratified mean produces a lower mean MSE over the 10,000 simulations. When a sample contains more females than expected, the simple mean produces a lower mean MSE over the 10,000 simulations. Conditional analysis provided more appropriate results than unconditional analysis.
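The comparison described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the thesis's code: the population layout (500 females with 5 cases, 500 males with 10 cases) is hypothetical, the rate of 1.5 is read as 1.5 percent, sampling is simple random sampling without replacement, and only the simple and post-stratified means are covered (the BLUP and EBLUP are omitted).

```python
import numpy as np

def simulate_mse(n_sims=2000, n=200, seed=0):
    """Compare the simple mean and the post-stratified mean by simulation.

    Returns a dict mapping a condition label to (MSE simple, MSE post-stratified).
    """
    rng = np.random.default_rng(seed)
    N = 1000
    # Hypothetical population: units 0..499 are female, 500..999 are male.
    female = np.repeat([True, False], N // 2)
    y = np.zeros(N)
    y[:5] = 1.0        # 5 female cases (assumed)
    y[500:510] = 1.0   # 10 male cases (assumed), 15 cases in total
    true_mean = y.mean()
    N_f, N_m = female.sum(), (~female).sum()

    err_simple = np.empty(n_sims)
    err_post = np.empty(n_sims)
    n_females = np.empty(n_sims)
    for i in range(n_sims):
        idx = rng.choice(N, size=n, replace=False)  # SRS without replacement
        ys, fs = y[idx], female[idx]
        simple = ys.mean()
        # Post-stratified mean: stratum means weighted by known population sizes.
        # With these sizes an empty stratum is practically impossible.
        post = (N_f * ys[fs].mean() + N_m * ys[~fs].mean()) / N
        err_simple[i] = simple - true_mean
        err_post[i] = post - true_mean
        n_females[i] = fs.sum()

    def mse(mask):
        return (float(np.mean(err_simple[mask] ** 2)),
                float(np.mean(err_post[mask] ** 2)))

    expected_females = n * N_f / N
    return {
        "unconditional": mse(np.ones(n_sims, dtype=bool)),
        "fewer_females": mse(n_females < expected_females),
        "more_females": mse(n_females > expected_females),
    }

if __name__ == "__main__":
    for label, (m_simple, m_post) in simulate_mse().items():
        print(f"{label}: simple={m_simple:.2e}, post-stratified={m_post:.2e}")
```

Which estimator wins in each condition depends on the assumed population, so this sketch should not be expected to reproduce the thesis's findings.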
2

Bayesian Inference in Large Data Problems

Quiroz, Matias January 2015
In the last decade or so, there has been a dramatic increase in storage capacity and in the ability to process huge amounts of data. This has made large, high-quality data sets widely accessible to practitioners. This technological innovation seriously challenges traditional modeling and inference methodology. This thesis is devoted to developing inference and modeling tools to handle large data sets. The four included papers treat various important aspects of this topic, with a special emphasis on Bayesian inference by scalable Markov chain Monte Carlo (MCMC) methods. In the first paper, we propose a novel mixture-of-experts model for longitudinal data. The model and inference methodology allow for manageable computations with a large number of subjects. The model dramatically improves the out-of-sample predictive density forecasts compared to existing models. The second paper aims at developing a scalable MCMC algorithm. Ideas from the survey sampling literature are used to estimate the likelihood from a random subset of the data. The likelihood estimate is used within the pseudo-marginal MCMC framework, and we develop a theoretical framework for such algorithms based on subsets of the data. The third paper further develops the ideas introduced in the second paper. We introduce the difference estimator in this framework and modify the methods for estimating the likelihood from a random subset of the data. This results in scalable inference for a wider class of models. Finally, the fourth paper brings the survey sampling tools for estimating the likelihood developed in the thesis into the delayed acceptance MCMC framework. We compare with an existing approach in the literature and document promising results for our algorithm. / At the time of the doctoral defense, the following papers were unpublished and had the following status: Paper 1: Submitted. Paper 2: Submitted. Paper 3: Manuscript. Paper 4: Manuscript.
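To make the subsampling idea concrete, here is a minimal sketch of estimating a log-likelihood from a random subset of the data, with and without a difference estimator. It is an illustration under assumptions, not the thesis's algorithm: the model (unit-variance Gaussian with unknown mean), the function names, and the control variate (each observation's log-density at a fixed reference value `theta_ref`) are all hypothetical choices. Turning such a log-likelihood estimate into a likelihood estimate suitable for pseudo-marginal MCMC needs further steps that are not shown here.

```python
import numpy as np

def loglik_terms(y, theta):
    """Per-observation log-density of N(theta, 1); an assumed toy model."""
    return -0.5 * (y - theta) ** 2 - 0.5 * np.log(2 * np.pi)

def subsample_estimates(y, theta, theta_ref, m, rng):
    """Return (plain, difference) estimates of the full-data log-likelihood.

    Both use a simple random sample of size m drawn with replacement.
    """
    n = y.size
    idx = rng.integers(0, n, size=m)
    l_sub = loglik_terms(y[idx], theta)

    # Plain expansion estimator: scale the subsample mean up to n terms.
    plain = n * l_sub.mean()

    # Difference estimator: known total of cheap approximations q_i plus an
    # expansion estimate of the residuals l_i - q_i. Here q_i is the term
    # evaluated at theta_ref; its total would be computed once and cached.
    q_total = loglik_terms(y, theta_ref).sum()
    q_sub = loglik_terms(y[idx], theta_ref)
    diff = q_total + n * (l_sub - q_sub).mean()
    return plain, diff

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.normal(1.0, 1.0, size=10_000)
    exact = loglik_terms(y, 1.05).sum()
    plain, diff = subsample_estimates(y, 1.05, y.mean(), m=100, rng=rng)
    print(exact, plain, diff)
```

The residuals `l_i - q_i` vary much less across observations than the terms `l_i` when `theta` is close to `theta_ref`, which is why the difference estimator tends to be less variable for the same subsample size.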
3

Inférence basée sur le plan pour l'estimation de petits domaines / Design-based inference for small area estimation

Randrianasolo, Toky 18 November 2013
The strong demand for results at a detailed geographic level, particularly from national surveys, has highlighted the fragility of estimates for small areas. This thesis addresses the issue with specific methods based on the sampling design. These methods rely on building new weights for each statistical unit. The first method optimizes the reweighting of the subsample of a survey that falls within an area. The second is based on constructing weights that depend on both the statistical units and the areas. It consists of splitting the sampling weights of the overall estimator while satisfying two constraints: (1) the sum of the estimates over any partition into areas equals the overall estimate; (2) the system of weights for a given area satisfies calibration properties on the auxiliary variables known for that area. The split estimator thus obtained behaves almost like the well-known BLUP (best linear unbiased predictor) estimator. The third method rewrites the BLUP estimator, although it depends on a model, as a homogeneous linear estimator under a design-based approach. New modified BLUP estimators are obtained. Their precision, estimated by simulation with an application to real data, is quite close to that of the standard BLUP estimator. The methods developed in this thesis are then applied to the estimation of local mobility indicators from the 2007-2008 French National Travel Survey (Enquête Nationale sur les Transports et les Déplacements). When the sample size in an area is small, the estimates obtained with the first method lose precision, whereas precision remains satisfactory for the two other methods.
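One possible way to write the two constraints on the split weights is given below. The notation is an assumption introduced here for illustration and is not taken from the thesis: $s$ is the sample, $w_k$ the overall weight of unit $k$, $w_{kd}$ its weight for area $d$ (with $d = 1, \dots, D$ a partition into areas), $\mathbf{x}_k$ the auxiliary variables, and $\mathbf{X}_d$ their known total for area $d$.

```latex
\begin{align}
  % Constraint 1: the area weights of each unit add up to its overall weight,
  % so the area estimates add up to the overall estimate for any variable y.
  \sum_{d=1}^{D} w_{kd} &= w_k \quad \text{for every } k \in s
  \quad\Longrightarrow\quad
  \sum_{d=1}^{D} \sum_{k \in s} w_{kd}\, y_k = \sum_{k \in s} w_k\, y_k , \\
  % Constraint 2: calibration on the known auxiliary totals of each area.
  \sum_{k \in s} w_{kd}\, \mathbf{x}_k &= \mathbf{X}_d \quad \text{for each area } d .
\end{align}
```

Under this reading, the first constraint holds for any partition as long as the area weights of each unit sum to its overall weight.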
4

Kalibrační odhady ve výběrových šetřeních / Calibration Estimators in Survey Sampling

Klička, Petr January 2018
This thesis deals with estimating a population total using auxiliary information. It describes the general regression estimator and the assumptions under which this estimator is asymptotically normal. Calibration estimators are then described, together with a note on their asymptotic equivalence to the general regression estimator. The derived results are applied to data from RADIOPROJEKT and compared with the results obtained by the companies that carried out the project. Finally, simulations are used to compare the actual coverage probabilities of confidence intervals for the population total computed using the theory presented in this thesis and using the methods of the companies carrying out RADIOPROJEKT.
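As an illustration of the link between calibration and the general regression estimator, the sketch below computes linear (chi-square distance) calibration weights with NumPy. It is a minimal example under assumptions: the function name, the design weights, and the auxiliary totals are hypothetical and are not taken from the thesis or from RADIOPROJEKT.

```python
import numpy as np

def linear_calibration_weights(d, X, totals):
    """Calibrate design weights d so that the weighted sample totals of the
    auxiliary variables X match the known population totals.

    d      : (n,) design weights
    X      : (n, p) auxiliary variables for the sampled units
    totals : (p,) known population totals of the auxiliary variables
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    T = X.T @ (d[:, None] * X)                  # sum of d_k x_k x_k^T
    lam = np.linalg.solve(T, totals - X.T @ d)  # Lagrange multipliers
    return d * (1.0 + X @ lam)                  # w_k = d_k (1 + x_k^T lambda)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    x = rng.uniform(1, 10, size=n)
    X = np.column_stack([np.ones(n), x])        # intercept and one covariate
    y = 2.0 + 3.0 * x + rng.normal(size=n)
    d = np.full(n, 20.0)                        # hypothetical design weights
    totals = np.array([1000.0, 5400.0])         # hypothetical known totals
    w = linear_calibration_weights(d, X, totals)
    print("calibrated total estimate:", w @ y)
```

With these linear weights the calibrated estimate of the total coincides with the general regression estimate computed from the same design weights and auxiliary variables.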
5

Calibration Adjustment for Nonresponse in Sample Surveys

Rota, Bernardo João January 2016
In this thesis, we discuss calibration estimation in the presence of nonresponse, with a focus on the linear calibration estimator and the propensity calibration estimator, along with the use of auxiliary information at different levels, that is, the sample and population levels. The thesis is based on four papers, two of which discuss estimation in two steps. The two-step estimator suggested here is an improved compromise between the linear calibration and propensity calibration estimators mentioned above. Assuming that the functional form of the response model is known, the model is estimated in the first step using a calibration approach. In the second step, the linear calibration estimator is constructed by replacing the design weights with the products of these weights and the inverses of the response probabilities estimated in the first step. The first step uses sample-level auxiliary information, and we demonstrate that this yields more efficient estimated response probabilities than using population-level information, as suggested earlier. The variance expression for the two-step estimator is derived and an estimator of it is suggested. The two other papers address the use of auxiliary variables in estimation. One of them introduces principal components theory into calibration for nonresponse adjustment and suggests selecting components using canonical correlation theory. Principal components are used as a means of handling estimation in the presence of large sets of candidate auxiliary variables. In addition to the use of auxiliary variables, the last paper also discusses the use of explicit models representing the true response behavior. Usually simple models such as logistic, probit, linear or log-linear models are used for this purpose. However, given the possible complexity of the structure of the true response probability, it is questionable whether these simple models are effective. We use an example of a telephone-based survey data collection process and demonstrate that the logistic model is generally not appropriate.
6

Linhas telefônicas residenciais: uso em inquéritos epidemiológicos no Brasil / Telephone surveys: its use in epidemiologic investigation in Brazil

Bernal, Regina Tomie Ivata 31 October 2006
Objectives: To study the feasibility of using registries of residential telephone lines as sampling frames for sample surveys, and to describe potential biases, associated with telephone coverage rates, in the main variables that usually make up the core information of epidemiological surveys. Methods: Using data from the National Household Sample Survey (PNAD) for the period 1998 to 2003, excluding 2000, means and proportions were estimated with 95% confidence intervals. The complex sampling design was taken into account in the analysis. Results: In Brazil, there was a 50% increase in the households served by residential telephone lines during the period. However, this growth was not uniform across the country. Different profiles of telephone users were identified, the main characteristics being related to education, race, having a health plan, and geographic location. In regions with low telephone coverage, estimates of the prevalence of chronic diseases may be biased. Conclusion: Using residential telephone lines to conduct interviews in epidemiological surveys proved feasible for federative units with coverage rates above 70%.
8

Estimateur bootstrap de la variance d'un estimateur de quantile en contexte de population finie

McNealis, Vanessa 12 1900
This thesis introduces smoothed pseudo-population bootstrap methods for variance estimation and the construction of confidence intervals for finite population quantiles. In the i.i.d. setting, Hall et al. (1989) showed that resampling from a smoothed estimate of the distribution function, rather than from the usual empirical distribution function, improves the convergence rate of the relative error of the bootstrap variance estimator of a sample quantile. We extend the smoothed bootstrap to the finite population setting by implementing it within pseudo-population bootstrap methods. Given a kernel function and a bandwidth, the approach consists of smoothing the pseudo-population from which bootstrap samples are drawn according to the original sampling design. Two designs are discussed, namely simple random sampling without replacement and Poisson sampling. Because the proposed algorithms require the bandwidth to be specified, we develop a plug-in selection method along with selection methods that minimize bootstrap estimates of performance criteria over a grid of bandwidth values. We present the results of a simulation study that provide empirical evidence that the smoothed approach is more efficient than the standard approach for estimating the variance of a quantile estimator, together with more mixed results regarding confidence intervals.
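The following sketch shows one way a smoothed pseudo-population bootstrap could look for simple random sampling without replacement. It is an illustration under several assumptions, not the algorithm of the thesis: the population size is taken to be a multiple of the sample size, the kernel is Gaussian, the kernel noise is redrawn for every replicate, and the default bandwidth is a generic rule of thumb standing in for the plug-in and grid-search selectors described above. The function name is hypothetical.

```python
import numpy as np

def smoothed_pp_bootstrap_var(sample, N, q=0.5, h=None, B=500, seed=0):
    """Bootstrap variance estimate of the sample q-quantile under SRSWOR.

    sample : observed values (size n)
    N      : population size, assumed here to be a multiple of n
    h      : bandwidth; h=0 gives the unsmoothed pseudo-population bootstrap
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(sample, dtype=float)
    n = y.size
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    if h is None:
        # Rule-of-thumb bandwidth, used only as a placeholder (assumption).
        h = 1.06 * y.std(ddof=1) * n ** (-0.2)

    # Pseudo-population: each sampled value repeated N/n times.
    pseudo = np.repeat(y, N // n)
    stats = np.empty(B)
    for b in range(B):
        # Smooth the pseudo-population by adding Gaussian kernel noise.
        smooth = pseudo + h * rng.standard_normal(N)
        # Redraw a sample with the original design (SRSWOR of size n).
        boot = rng.choice(smooth, size=n, replace=False)
        stats[b] = np.quantile(boot, q)
    return stats.var(ddof=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s = rng.lognormal(size=50)
    print("smoothed:  ", smoothed_pp_bootstrap_var(s, N=500))
    print("unsmoothed:", smoothed_pp_bootstrap_var(s, N=500, h=0.0))
```

Setting `h=0.0` recovers an unsmoothed version, which makes it easy to compare the two approaches on the same sample.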
9

Statistiques multivariées pour l'analyse du risque alimentaire / Multivariate statistics for dietary risk analysis

Chautru, Emilie 06 September 2013
At the crossroads of economic, biological, sociological, cultural and health issues, dietary analysis is of major importance for public health institutes. When international trade facilitates the transport of foodstuffs produced under very different environmental conditions, and when mass consumption encourages strategies aimed at reducing costs and maximizing production volume (GMOs, pesticides, etc.), it becomes necessary to quantify the health risks that such practices generate. We are interested in the evaluation of chronic exposure (on a yearly scale) to food contaminants whose long-term toxicity is already well documented. Because dietary risks and benefits are not limited to the ingestion or avoidance of toxic substances, nutritional intakes are also considered. Our work is thus organized along three main lines of research. We first consider the statistical analysis of very high chronic exposure to one or more chemical substances present in food, relying mainly on results from extreme value theory. We then adapt statistical learning methods for minimum volume set estimation in order to identify consumption baskets that achieve a compromise between toxicological risk and nutritional benefit. Finally, we study the asymptotic properties of a number of estimators used to assess the characteristics of exposure, which take into account the survey design used to collect the data.
10

Estimation robuste de courbes de consommation électrique moyennes par sondage pour de petits domaines en présence de valeurs manquantes / Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values

De Moliner, Anne 05 December 2017
In this thesis, we address the robust estimation of mean or total electricity consumption curves by sampling in a finite population, for the entire population and for small areas, with or without partially unobserved curves. Many studies carried out at the French electricity company EDF, whether for marketing purposes or for the management of the distribution grid by Enedis, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of customers sharing common characteristics. Because of privacy concerns and financial costs, the consumption curve of each of the 35 million French residential and business customers cannot be measured, so these mean curves are estimated by sampling, using panels. We extend the work of Lardin (2012) on mean curve estimation by sampling by focusing on specific aspects of this problem: robustness to influential units, small area estimation, and estimation in the presence of partially or totally unobserved curves. To build robust estimators of mean curves, we adapt to functional data the unified approach to robust estimation in finite populations, based on the conditional bias, proposed by Beaumont et al. (2013). To that end we propose three approaches and compare them on real datasets: applying the usual methods for real variables to the discretised curves; projecting onto finite-dimensional bases (wavelets or functional spherical principal components); and functional truncation of the conditional biases based on the notion of the depth of a curve in a functional dataset. Estimators of the instantaneous mean squared error, both explicit and bootstrap-based, are also proposed. We then address small area estimation for functional means or totals. We introduce three methods: unit-level linear mixed models applied to the scores of functional principal components analysis or to wavelet coefficients; functional regression; and aggregation of individual curve predictions obtained with regression trees or random forests for a functional target variable. Robust versions of these estimators are then proposed by following the conditional-bias approach to robust estimation presented above. Finally, we propose four estimators of mean curves in the presence of partially or totally unobserved curves. The first is a reweighting estimator in which the weights are determined by nonparametric temporal kernel smoothing adapted to the context of finite populations and nonresponse; the others rely on imputation of the missing data. The missing portions of the curves are determined either by using the smoothing estimator just mentioned, by nearest-neighbour imputation adapted to functional data, or by a variant of linear interpolation that takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method, and all the estimators are compared on real datasets for various missing data scenarios.
