231

Uncertainty in radar emitter classification and clustering / Gestion des incertitudes en identification des modes radar

Revillon, Guillaume 18 April 2019 (has links)
En Guerre Electronique, l’identification des signaux radar est un atout majeur de la prise de décisions tactiques liées au théâtre d’opérations militaires. En fournissant des informations sur la présence de menaces, la classification et le partitionnement des signaux radar ont alors un rôle crucial assurant un choix adapté des contre-mesures dédiées à ces menaces et permettant la détection de signaux radar inconnus pour la mise à jour des bases de données. Les systèmes de Mesures de Soutien Electronique enregistrent la plupart du temps des mélanges de signaux radar provenant de différents émetteurs présents dans l’environnement électromagnétique. Le signal radar, décrit par un motif de modulations impulsionnelles, est alors souvent partiellement observé du fait de mesures manquantes et aberrantes. Le processus d’identification se fonde sur l’analyse statistique des paramètres mesurables du signal radar qui le caractérisent tant quantitativement que qualitativement. De nombreuses approches mêlant des techniques de fusion de données et d’apprentissage statistique ont été développées. Cependant, ces algorithmes ne peuvent pas gérer les données manquantes et des méthodes de substitution de données sont requises afin d’utiliser ces derniers. L’objectif principal de cette thèse est alors de définir un modèle de classification et partitionnement intégrant la gestion des valeurs aberrantes et manquantes présentes dans tout type de données. Une approche fondée sur les modèles de mélange de lois de probabilités est proposée dans cette thèse. Les modèles de mélange fournissent un formalisme mathématique flexible favorisant l’introduction de variables latentes permettant la gestion des données aberrantes et la modélisation des données manquantes dans les problèmes de classification et de partitionnement. L’apprentissage du modèle ainsi que la classification et le partitionnement sont réalisés dans un cadre d’inférence bayésienne où une méthode d’approximation variationnelle est introduite afin d’estimer la loi jointe a posteriori des variables latentes et des paramètres. Des expériences sur diverses données montrent que la méthode proposée fournit de meilleurs résultats que les algorithmes standards. / In Electronic Warfare, the identification of radar signals is a major asset for tactical decision making in military operations. By providing information about the presence of threats, classification and clustering of radar signals play a crucial role in ensuring that countermeasures against those threats are well chosen and in enabling the detection of unknown radar signals so that databases can be updated. Most of the time, Electronic Support Measures systems receive mixtures of signals from different radar emitters present in the electromagnetic environment. Hence a radar signal, described by a pulse-to-pulse modulation pattern, is often only partially observed because of missing and aberrant measurements. The identification process relies on a statistical analysis of the basic measurable parameters of a radar signal, which constitute both quantitative and qualitative data. Many general and practical approaches based on data fusion and machine learning have been developed; they traditionally proceed through feature extraction, dimensionality reduction, and classification or clustering. However, these algorithms cannot handle missing data, and imputation methods are required before they can be applied.
Hence, the main objective of this work is to define a classification/clustering framework that handles both outliers and missing values for any type of data. An approach based on mixture models is developed, since mixture models provide a mathematically grounded, flexible and meaningful framework for a wide variety of classification and clustering requirements. The proposed approach focuses on the introduction of latent variables that make it possible to handle the model's sensitivity to outliers and to allow a less restrictive modelling of missing data. A Bayesian treatment is adopted for model learning, supervised classification and clustering, and inference is carried out through a variational Bayesian approximation, since the joint posterior distribution of the latent variables and parameters is intractable. Numerical experiments on synthetic and real data show that the proposed method provides more accurate results than standard algorithms.
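
The following is a minimal, hedged sketch of the kind of latent-variable machinery this abstract describes: an EM-fitted Gaussian mixture whose E-step evaluates each component only on the observed pulse parameters. It is not the thesis's variational Bayesian model, and the pulse-descriptor columns and values are hypothetical.

import numpy as np

# Minimal sketch (not the thesis model): EM for a diagonal-covariance Gaussian
# mixture that tolerates missing values (NaN) by evaluating each component's
# likelihood on the observed dimensions only and imputing missing entries with
# the current component means in the M-step.
def em_gmm_missing(X, n_components, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)                               # mask of observed entries
    col_mean = np.nanmean(X, axis=0)
    col_var = np.nanvar(X, axis=0) + 1e-6
    means = col_mean + rng.normal(scale=np.sqrt(col_var), size=(n_components, d))
    variances = np.tile(col_var, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities computed from observed dimensions only
        log_r = np.zeros((n, n_components))
        for k in range(n_components):
            diff = np.where(obs, X - means[k], 0.0)
            ll = -0.5 * (diff ** 2 / variances[k] + np.log(2 * np.pi * variances[k]))
            log_r[:, k] = np.log(weights[k]) + np.where(obs, ll, 0.0).sum(axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: missing entries replaced by the current component means
        for k in range(n_components):
            Xk = np.where(obs, X, means[k])
            rk = resp[:, k][:, None]
            nk = rk.sum()
            means[k] = (rk * Xk).sum(axis=0) / nk
            variances[k] = (rk * (Xk - means[k]) ** 2).sum(axis=0) / nk + 1e-6
            weights[k] = nk / n
    return weights, means, variances, resp

# Hypothetical pulse descriptors: rows = intercepted pulses,
# columns = (frequency in Hz, pulse width in s, PRI in s), NaN = missing measurement.
X = np.array([[9.4e9, 1.1e-6, np.nan],
              [9.4e9, np.nan, 1.0e-3],
              [2.9e9, 20e-6, 2.5e-3]])
weights, means, variances, resp = em_gmm_missing(X, n_components=2)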
232

Imputation of Missing Data with Application to Commodity Futures / Imputation av saknad data med tillämpning på råvaruterminer

Östlund, Simon January 2016 (has links)
In recent years additional requirements have been imposed on financial institutions, including Central Counterparty clearing houses (CCPs), in an attempt to assess quantitative measures of their exposure to different types of risk. One of these requirements results in a need to perform stress tests to check their resilience in the event of a stressed market or crisis. However, financial markets develop over time, and this leads to a situation where some instruments traded today are not present at the chosen date because they were introduced after the considered historical event. Based on current routines, the main goal of this thesis is to provide a more sophisticated method to impute (fill in) historical missing data as preparatory work in the context of stress testing. The models considered in this paper include two methods currently regarded as state-of-the-art techniques, based on maximum likelihood estimation (MLE) and multiple imputation (MI), together with a third alternative approach involving copulas. The different methods are applied to historical return data of commodity futures contracts from the Nordic energy market. Using conventional error metrics and out-of-sample log-likelihood, the conclusion is that it is very hard (in general) to distinguish the performance of the methods, or to draw any conclusion about how good the models are in comparison to each other. Even if the Student’s t-distribution seems (in general) to be a more adequate assumption for the data than the normal distribution, all the models show quite poor performance. However, by analysing the conditional distributions more thoroughly, and evaluating how well each model performs by extracting certain quantile values, the performance of each method improves significantly. By comparing the different models (when imputing more extreme quantile values) it can be concluded that all methods produce satisfactory results, even if the g-copula and t-copula models seem to be more robust than the respective linear models. / På senare år har ytterligare krav införts för finansiella institut (t.ex. Clearinghus) i ett försök att fastställa kvantitativa mått på deras exponering mot olika typer av risker. Ett av dessa krav innebär att utföra stresstester för att uppskatta motståndskraften under stressade marknader/kriser. Dock förändras finansiella marknader över tiden vilket leder till att vissa instrument som handlas idag inte fanns under den dåvarande perioden, eftersom de introducerades vid ett senare tillfälle. Baserat på nuvarande rutiner så är målet med detta arbete att tillhandahålla en mer sofistikerad metod för imputation (ifyllnad) av historisk data som ett förberedande arbete i utförandet av stresstester. I denna rapport implementeras två modeller som betraktas som de bäst presterande metoderna idag, baserade på maximum likelihood estimering (MLE) och multiple imputation (MI), samt en tredje alternativ metod som involverar copulas. Modellerna tillämpas på historisk data för terminskontrakt från den nordiska energimarknaden. Genom att använda väl etablerade mätmetoder för att skatta noggrannheten för respektive modell, är det väldigt svårt (generellt) att särskilja prestandan för varje metod, eller att dra några slutsatser om hur bra varje modell är i jämförelse med varandra. Även om Students t-fördelningen verkar (generellt) vara ett mer adekvat antagande rörande datan i jämförelse med normalfördelningen, så visar alla modeller ganska svag prestanda vid en första anblick.
Däremot, genom att undersöka de betingade fördelningarna mer noggrant, för att se hur väl varje modell presterar genom att extrahera specifika kvantilvärden, kan varje metod förbättras markant. Genom att jämföra de olika modellerna (vid imputering av mer extrema kvantilvärden) kan slutsatsen dras att alla metoder producerar tillfredsställande resultat, även om g-copula- och t-copula-modellerna verkar vara mer robusta än de motsvarande linjära modellerna.
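
As a rough illustration of the copula-based alternative discussed in this abstract (my own simplified sketch, not the thesis implementation), the following imputes a missing return with a Gaussian copula: marginals are mapped to normal scores through the empirical CDF, the latent correlation is estimated on complete rows, and each missing coordinate is filled with its conditional-mean score before being mapped back. The return matrix and contract setup are hypothetical.

import numpy as np
from scipy import stats

def gaussian_copula_impute(returns):
    """returns: (n, d) array of futures returns, with NaN marking missing values."""
    n, d = returns.shape
    # empirical-CDF transform to normal scores, ignoring NaN
    z = np.full_like(returns, np.nan)
    for j in range(d):
        col = returns[:, j]
        ok = ~np.isnan(col)
        ranks = stats.rankdata(col[ok]) / (ok.sum() + 1)
        z[ok, j] = stats.norm.ppf(ranks)
    complete = ~np.isnan(z).any(axis=1)
    R = np.corrcoef(z[complete].T)                  # latent correlation matrix
    out = returns.copy()
    for i in range(n):
        miss = np.isnan(z[i])
        if not miss.any() or miss.all():
            continue
        obs = ~miss
        # conditional mean of the missing scores given the observed scores
        cond = R[np.ix_(miss, obs)] @ np.linalg.solve(R[np.ix_(obs, obs)], z[i, obs])
        for k, j in enumerate(np.where(miss)[0]):
            # back-transform through the empirical quantile of column j
            u = stats.norm.cdf(cond[k])
            out[i, j] = np.nanquantile(returns[:, j], u)
    return out

# hypothetical daily returns for three Nordic power futures contracts
X = np.array([[0.012, -0.004, 0.007],
              [0.003,  0.001, np.nan],
              [-0.010, -0.008, -0.006],
              [0.005,  np.nan, 0.004],
              [0.001,  0.002, 0.001]])
imputed = gaussian_copula_impute(X)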
233

Contribution à la sélection de variables en présence de données longitudinales : application à des biomarqueurs issus d'imagerie médicale / Contribution to variable selection in the presence of longitudinal data : application to biomarkers derived from medical imaging

Geronimi, Julia 13 December 2016 (has links)
Les études cliniques permettent de mesurer de nombreuses variables répétées dans le temps. Lorsque l'objectif est de les relier à un critère clinique d'intérêt, les méthodes de régularisation de type LASSO, généralisées aux Generalized Estimating Equations (GEE) permettent de sélectionner un sous-groupe de variables en tenant compte des corrélations intra-patients. Les bases de données présentent souvent des données non renseignées et des problèmes de mesures, ce qui entraîne des données manquantes inévitables. L'objectif de ce travail de thèse est d'intégrer ces données manquantes pour la sélection de variables en présence de données longitudinales. Nous utilisons la méthode d'imputation multiple et proposons une fonction d'imputation pour le cas spécifique des variables soumises à un seuil de détection. Nous proposons une nouvelle méthode de sélection de variables pour données corrélées qui intègre les données manquantes : le Multiple Imputation Penalized Generalized Estimating Equations (MI-PGEE). Notre opérateur utilise la pénalité group-LASSO en considérant l'ensemble des coefficients de régression estimés d'une même variable sur les échantillons imputés comme un groupe. Notre méthode permet une sélection consistante sur l'ensemble des imputations, et minimise un critère de type BIC pour le choix du paramètre de régularisation. Nous présentons une application sur l'arthrose du genou où notre objectif est de sélectionner le sous-groupe de biomarqueurs qui expliquent le mieux les différences de largeur de l'espace articulaire au cours du temps. / Clinical studies enable us to measure many longitudinal variables. When our goal is to find a link between a response and some covariates, one can use regularisation methods such as the LASSO, which has been extended to Generalized Estimating Equations (GEE). They allow us to select a subgroup of variables of interest while taking into account intra-patient correlations. Databases often contain unrecorded values and measurement problems, resulting in inevitable missing data. The objective of this thesis is to integrate missing data into variable selection in the presence of longitudinal data. We use multiple imputation and introduce a new imputation function for the specific case of variables subject to a detection limit. We provide a new variable selection method for correlated data that integrates missing data: the Multiple Imputation Penalized Generalized Estimating Equations (MI-PGEE). Our operator applies the group-LASSO penalty to the group formed by the estimated regression coefficients of the same variable across multiply-imputed datasets. Our method provides a consistent selection across multiply-imputed datasets, where the optimal shrinkage parameter is chosen by minimizing a BIC-like criterion. We then present an application to knee osteoarthritis, aiming to select the subset of biomarkers that best explains the differences in joint space width over time.
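
A heavily simplified sketch of the grouping idea behind MI-PGEE follows (my own illustration under an independence working correlation and a squared loss, not the thesis's GEE-based estimator): the covariates are multiply imputed, and the coefficients of the same covariate across the imputed datasets form one group that is penalised by block soft-thresholding, so a variable is kept or dropped consistently across imputations. All data and parameter values are hypothetical.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def mi_group_lasso(X_missing, y, n_imputations=5, lam=0.1, n_iter=500, lr=None):
    n, p = X_missing.shape
    # multiple imputation of the covariate matrix
    Xs = [IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X_missing)
          for m in range(n_imputations)]
    B = np.zeros((n_imputations, p))                 # one coefficient vector per imputation
    if lr is None:
        lr = 1.0 / max(np.linalg.norm(X, 2) ** 2 / n for X in Xs)
    for _ in range(n_iter):
        # gradient step on the average squared loss (loss decouples across imputations)
        for m, X in enumerate(Xs):
            grad = X.T @ (X @ B[m] - y) / n
            B[m] -= lr * grad
        # proximal step: block soft-thresholding per covariate group j
        norms = np.linalg.norm(B, axis=0)            # ||(beta_j^(1), ..., beta_j^(M))||_2
        shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
        B *= shrink                                  # a whole group is dropped together
    selected = np.where(np.linalg.norm(B, axis=0) > 1e-8)[0]
    return B, selected

# hypothetical example: 60 patients, 8 candidate biomarkers, 10% missing values
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
X[rng.random((60, 8)) < 0.1] = np.nan
y = np.nan_to_num(X[:, 0]) - 0.5 * np.nan_to_num(X[:, 2]) + rng.normal(scale=0.1, size=60)
B, selected = mi_group_lasso(X, y)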
234

Estimation de l'occupation des sols à grande échelle pour l'exploitation d'images d'observation de la Terre à hautes résolutions spatiale, spectrale et temporelle / Exploitation of high spatial, spectral and temporal resolution Earth observation imagery for large area land cover estimation

Rodes Arnau, Isabel 10 November 2016 (has links)
Les missions spatiales d'observation de la Terre de nouvelle génération telles que Sentinel-2 (préparé par l'Agence Spatiale Européenne ESA dans le cadre du programme Copernicus, auparavant appelé Global Monitoring for Environment and Security ou GMES) ou Venµs, conjointement développé par l'Agence Spatiale Française (Centre National d'Études Spatiales, CNES) et l'Agence Spatiale Israélienne (ISA), vont révolutionner la surveillance de l'environnement d'aujourd'hui en fournissant des volumes inédits de données en termes de richesse spectrale, de revisite temporelle et de résolution spatiale. Venµs livrera des images dans 12 bandes spectrales de 412 à 910 nm, une répétitivité de 2 jours et une résolution spatiale de 10 m; les satellites jumeaux Sentinel-2 assureront une couverture dans 13 bandes spectrales de 443 à 2200 nm, avec une répétitivité de 5 jours, et des résolutions spatiales de 10 à 60 m. La production efficace de cartes d'occupation des sols basée sur l'exploitation de tels volumes d'information pour de grandes surfaces est un défi à la fois en termes de coûts de traitement mais aussi de variabilité des données. En général, les méthodes classiques font soit usage d'approches supervisées (trop coûteuses en termes de travaux manuels pour les grandes surfaces), soit ciblent des modèles locaux spécialisés pour des problématiques précises (non applicables à d'autres terrains ou applications), soit comprennent des modèles physiques complexes avec des coûts de traitement rédhibitoires. Ces approches existantes sont donc inefficaces pour l'exploitation du nouveau type de données que les nouvelles missions fourniront, et un besoin se fait sentir pour la mise en œuvre de méthodes précises, rapides et peu supervisées qui permettent la généralisation à l'échelle de grandes zones avec des résolutions élevées. Afin de permettre l'exploitation des volumes de données précédemment décrits, l'objectif de ce travail est la conception et la validation d'une approche entièrement automatique qui permet l'estimation de la couverture terrestre de grandes surfaces à partir d'imagerie d'observation de la Terre à haute résolution spatiale, spectrale et temporelle, généralisable à des paysages différents, et offrant un temps de calcul opérationnel avec des ensembles de données satellitaires simulés, en préparation des prochaines missions. Cette approche est basée sur l'intégration d'algorithmes de traitement de données, tels que les techniques d'apprentissage de modèles et de classification, et des connaissances liées à l'occupation des sols sur des questions écologiques et agricoles, telles que les variables avec un impact sur la croissance de la végétation ou les pratiques de production. Par exemple, l'introduction nouvelle de la température comme axe temporel pour l'apprentissage ultérieur des modèles intègre un facteur établi de la croissance de la végétation dans des techniques d'apprentissage automatique pour la caractérisation des paysages.
Une attention particulière est accordée au traitement de différentes questions, telles que l'automatisation, les informations manquantes (déterminées par des passages satellitaires, des effets de réflexion des nuages, des ombres ou encore la présence de neige), l'apprentissage et les données de validation limitées, les échantillonnages temporels irréguliers (nombre différent d'images disponibles pour chaque période et région, données inégalement réparties dans le temps), la variabilité des données, et enfin la possibilité de travailler avec différents ensembles de données et nomenclatures. / The new generation Earth observation missions such as Sentinel-2 (a twin-satellite initiative prepared by the European Space Agency, ESA, in the frame of the Copernicus programme, previously known as Global Monitoring for Environment and Security or GMES) and Venµs, jointly developed by the French Space Agency (Centre National d'Études Spatiales, CNES) and the Israeli Space Agency (ISA), will revolutionize present-day environmental monitoring by yielding unprecedented volumes of data in terms of spectral richness, temporal revisit and spatial resolution. Venµs will deliver images in 12 spectral bands from 412 to 910 nm, a revisit time of 2 days, and a spatial resolution of 10 m; the twin Sentinel-2 satellites will provide coverage in 13 spectral bands from 443 to 2200 nm, with a revisit time of 5 days, and spatial resolutions of 10 to 60 m. The efficient production of land cover maps based on the exploitation of such volumes of information for large areas is challenging in terms of both processing costs and data variability. In general, conventional methods either make use of supervised approaches (too costly in terms of manual work for large areas), target specialised local models for precise problem areas (not applicable to other terrains or applications), or include complex physical models with prohibitive processing costs. These existing approaches are thus ill-suited to the exploitation of the new type of data that the new missions will provide, and a need arises for the implementation of accurate, fast and minimally supervised methods that allow for generalisation to large-scale areas with high resolutions. In order to allow for the exploitation of the previously described volumes of data, the objective of this thesis is the conception, design, and validation of a fully automatic approach that allows the estimation of large-area land cover from high spatial, spectral and temporal resolution Earth observation imagery, is generalisable to different landscapes, and offers operational computation times on simulated satellite data sets, in preparation for the upcoming missions.
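
A minimal sketch of the "temperature as a temporal axis" idea mentioned in this abstract (my illustration, not the thesis pipeline): irregularly sampled NDVI observations, with cloudy acquisitions missing, are resampled onto a common thermal-time axis of cumulative growing degree days by linear interpolation. Variable names, the base temperature and all values are hypothetical.

import numpy as np

def to_thermal_time(doy, ndvi, daily_mean_temp, base_temp=5.0, n_steps=20):
    """doy: acquisition days of year; ndvi: values (NaN if cloudy);
    daily_mean_temp: mean temperature for every day of the year."""
    # cumulative growing degree days: the thermal-time coordinate of each day
    gdd = np.cumsum(np.maximum(daily_mean_temp - base_temp, 0.0))
    ok = ~np.isnan(ndvi)
    x = gdd[np.asarray(doy)[ok] - 1]                 # thermal time at each clear acquisition
    y = np.asarray(ndvi)[ok]
    grid = np.linspace(gdd[0], gdd[-1], n_steps)     # common axis shared by all sites
    return grid, np.interp(grid, x, y)

# hypothetical example: 8 acquisitions, two masked out by clouds
doy = [20, 60, 100, 140, 180, 220, 260, 300]
ndvi = [0.21, np.nan, 0.45, 0.67, np.nan, 0.74, 0.52, 0.30]
temps = 10 + 12 * np.sin(np.linspace(0, np.pi, 365))  # toy annual temperature curve
grid, resampled = to_thermal_time(doy, ndvi, temps)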
235

Trendschätzung in Large-Scale Assessments bei differenziellem Itemfunktionieren / Trend estimation in large-scale assessments in the presence of differential item functioning

Sachse, Karoline A. 27 February 2020 (has links)
Differenzielles Itemfunktionieren bedeutet für die Trendschätzung durch Linking in querschnittlich angelegten Large-Scale Assessments eine Voraussetzungsverletzung. Solche Voraussetzungsverletzungen können sich negativ auf die Eigenschaften von Trendschätzern auswirken, woraus sich Einschränkungen für die Interpretierbarkeit der Trendschätzung ergeben können. Die vorliegende Arbeit umfasst, eingebettet in einen Rahmungstext, drei Einzelbeiträge, die sich mit der Prüfung der Auswirkung differenziellen Itemfunktionierens unterschiedlicher Provenienz auseinandersetzen. Im ersten Einzelbeitrag wird die Interaktion von Linkingdesigns und Linkingmethoden mit zwischen Ländern und über die Zeit unsystematisch unterschiedlich funktionierenden Items untersucht. Dabei zeigte sich, dass die Wahl des Designs von großer Bedeutung sein kann, während der Performanzunterschied zwischen gängigen Linkingmethoden marginal war. Darüber hinaus führte der häufig praktizierte Ausschluss von differenziell funktionierenden Items zu einem Effizienzverlust. Im zweiten Einzelbeitrag wird die Unsicherheit der Trendschätzung, die entsteht, wenn Items zwischen Ländern und über die Zeit unsystematisch unterschiedlich funktionieren, quantifiziert und in die Berechnung der zugehörigen Standardfehler integriert. Im dritten Einzelbeitrag wird betrachtet, wie differenziellem Itemfunktionieren begegnet werden kann, das durch fehlende Werte und wechselnde Ausfallmechanismen zustande kommt. Wurden die fehlenden Werte inadäquat behandelt, verloren die Trendschätzer ihre Erwartungstreue und Konsistenz sowie an Effizienz. In der Summe wird in der vorliegenden Arbeit identifiziert und hervorgehoben, dass es in den untersuchten Bedingungen je nach Art des differenziellen Itemfunktionierens effektive Möglichkeiten des Umgangs mit diesem gibt, die potenziellen Einschränkungen bei der validen Interpretation der Trendschätzung zumindest teilweise entgegenwirken können. / Differential item functioning signifies a violation of the prerequisites required for trend estimation, which involves the linking of cross-sectional large-scale assessments. Such violations can negatively affect the properties of the trend estimators. Hence, the interpretability of trend estimates will be limited under such circumstances. Embedded within an overarching framework, three individual contributions that examine and deal with the effects of differential item functioning from different origins are presented in the current dissertation. The first article examines the interactions of linking designs and linking methods with items that show unsystematic and differential functioning between countries and across time. It showed that the choice of the design can be of great importance, whereas the difference in performance between common linking methods was marginal. In addition, the exclusion of differentially functioning items, an approach that is frequently used in practice, led to a loss of efficiency. In the second contribution, the uncertainty for the trend estimation resulting from items that show unsystematic and differential functioning between countries and across time is quantified and incorporated into the calculation of the trends' standard errors. The third article focuses on differential item functioning that is induced by missing values and nonresponse mechanisms that change over time. When the missing values were treated inappropriately, the trend estimators lost their unbiasedness, their consistency, and their efficiency. 
In sum, this dissertation identifies and emphasizes the ideas that, depending on the type of differential item functioning, there are effective ways to deal with it under the investigated conditions, and these can at least partially counteract potential limitations so that the trend estimates can still be interpreted validly.
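
To make the linking setting concrete, here is a minimal sketch (my own illustration, not the dissertation's procedures) of mean-mean linking between two assessment cycles with a jackknife over the link items, so that item-by-time DIF shows up as linking error in the standard error of the trend estimate. All difficulty and mean values are hypothetical.

import numpy as np

def mean_mean_link(b_t1, b_t2):
    """Shift that places cycle-2 item difficulties on the cycle-1 scale."""
    return np.mean(b_t1) - np.mean(b_t2)

def trend_with_linking_error(mean_t1, mean_t2, b_t1, b_t2):
    shift = mean_mean_link(b_t1, b_t2)
    trend = (mean_t2 + shift) - mean_t1
    # jackknife over link items: DIF inflates the variability of the shift
    n = len(b_t1)
    shifts = np.array([mean_mean_link(np.delete(b_t1, i), np.delete(b_t2, i))
                       for i in range(n)])
    linking_se = np.sqrt((n - 1) / n * np.sum((shifts - shifts.mean()) ** 2))
    return trend, linking_se

# hypothetical link-item difficulties (logits) in two cycles; the last item drifts
b1 = np.array([-1.2, -0.4, 0.0, 0.5, 1.1])
b2 = np.array([-1.1, -0.5, 0.1, 0.4, 1.9])
trend, se = trend_with_linking_error(mean_t1=0.00, mean_t2=-0.05, b_t1=b1, b_t2=b2)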
236

Efficient Data Driven Multi Source Fusion

Islam, Muhammad Aminul 10 August 2018 (has links)
Data/information fusion is an integral component of many existing and emerging applications; e.g., remote sensing, smart cars, Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than what any one individual input can provide, often the challenge is to determine the underlying mathematics for aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature- and decision-level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has 2^N variables and N(2^(N-1)) monotonicity constraints for N inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (which is linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables. As such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) for the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints; therefore, there is no need to enforce the constraints explicitly, as is required by traditional GAs. In addition, this algorithm provides an efficient representation of the search space with the minimal set of vertices. Furthermore, I study different strategies for extending the fuzzy integral for missing data, and I propose a goal programming framework to aggregate inputs from heterogeneous sources for ChI learning. Last, my work in remote sensing involves visual-clustering-based band group selection and Lp-norm multiple kernel learning for feature-level fusion in hyperspectral image processing to enhance pixel-level classification.
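
For reference, a minimal sketch of the discrete Choquet integral itself, the aggregation operator the dissertation learns, computed with respect to a small hypothetical fuzzy measure on three sources; inputs are assumed to lie in [0, 1]. This is illustrative only and is not the dissertation's scalable learner.

import numpy as np

def choquet_integral(x, g):
    """x: array of N source values in [0, 1]; g: dict mapping frozensets of source
    indices to measure values, with g[frozenset()] = 0 and g[full set] = 1 (monotone)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)                       # x_(1) <= x_(2) <= ... <= x_(N)
    n = len(x)
    total, prev = 0.0, 0.0
    for i in range(n):
        subset = frozenset(order[i:])           # sources whose value is >= x_(i)
        # difference form: sum_i (x_(i) - x_(i-1)) * g({sources >= x_(i)}), x_(0) = 0
        total += (x[order[i]] - prev) * g[subset]
        prev = x[order[i]]
    return total

# hypothetical fuzzy measure on 3 sources (monotone with respect to set inclusion)
g = {frozenset(): 0.0,
     frozenset({0}): 0.3, frozenset({1}): 0.4, frozenset({2}): 0.2,
     frozenset({0, 1}): 0.8, frozenset({0, 2}): 0.5, frozenset({1, 2}): 0.6,
     frozenset({0, 1, 2}): 1.0}
print(choquet_integral([0.7, 0.2, 0.9], g))     # 0.49 for this measure and input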
237

Stressful Events and Religious Identities: Investigating the Risk of Radical Accommodation

Uzdavines, Alex 30 August 2017 (has links)
No description available.
238

Essays zu methodischen Herausforderungen im Large-Scale Assessment / Essays on methodological challenges in large-scale assessment

Robitzsch, Alexander 21 January 2016 (has links)
Mit der wachsenden Verbreitung empirischer Schulleistungsstudien im Large-Scale Assessment gehen eine Reihe methodischer Herausforderungen einher. Die vorliegende Arbeit untersucht, welche Konsequenzen Modellverletzungen in eindimensionalen Item-Response-Modellen (besonders im Rasch-Modell) besitzen. Insbesondere liegt der Fokus auf vier methodischen Herausforderungen von Modellverletzungen. Erstens implizieren Positions- und Kontexteffekte, dass gegenüber einem eindimensionalen IRT-Modell Itemschwierigkeiten nicht unabhängig von der Position im Testheft und der Zusammenstellung des Testheftes ausgeprägt sind und Schülerfähigkeiten im Verlauf eines Tests variieren können. Zweitens verursacht die Vorlage von Items innerhalb von Testlets lokale Abhängigkeiten, wobei unklar ist, ob und wie diese in der Skalierung berücksichtigt werden sollen. Drittens können Itemschwierigkeiten aufgrund verschiedener Lerngelegenheiten zwischen Schulklassen variieren. Viertens sind insbesondere in Low-Stakes-Tests nicht bearbeitete Items vorzufinden. In der Arbeit wird argumentiert, dass trotz Modellverletzungen nicht zwingend von verzerrten Schätzungen von Itemschwierigkeiten, Personenfähigkeiten und Reliabilitäten ausgegangen werden muss. Außerdem wird hervorgehoben, dass man psychometrisch häufig weder entscheiden kann noch entscheiden sollte, welches IRT-Modell vorzuziehen ist. Dies trifft auch auf die Fragestellung zu, wie nicht bearbeitete Items zu bewerten sind. Ausschließlich Validitätsüberlegungen können dafür Hinweise geben. Modellverletzungen in IRT-Modellen lassen sich konzeptuell plausibel in den Ansatz des Domain Samplings (Item Sampling; Generalisierbarkeitstheorie) einordnen. In dieser Arbeit wird gezeigt, dass die statistische Unsicherheit in der Modellierung von Kompetenzen nicht nur von der Stichprobe der Personen, sondern auch von der Stichprobe der Items und der Wahl statistischer Modelle verursacht wird. / Several methodological challenges emerge in large-scale student assessment studies like PISA and TIMSS. Item response models (IRT models) are essential for scaling student abilities within these studies. This thesis investigates the consequences of several model violations in unidimensional IRT models (especially in the Rasch model). In particular, this thesis focuses on the following four methodological challenges of model violations. First, position effects and contextual effects imply (in comparison to unidimensional IRT models) that item difficulties depend on the item position in a test booklet as well as on the composition of a test booklet. Furthermore, student abilities are allowed to vary across test positions. Second, the administration of items within testlets causes local dependencies, but it is unclear whether and how these dependencies should be taken into account for the scaling of student abilities. Third, item difficulties can vary among different school classes due to different opportunities to learn. Fourth, the number of omitted items is generally non-negligible in low-stakes tests. In this thesis it is argued that estimates of item difficulties, student abilities and reliabilities can be unbiased despite model violations. Furthermore, it is argued that the choice of an IRT model cannot and should not be made (solely) from a psychometric perspective. This also holds true for the problem of how to score omitted items. Only validity considerations provide reasons for choosing an adequate scoring procedure.
Model violations in IRT models can be conceptually classified within the approach of domain sampling (item sampling; generalizability theory). In this approach, the existence of latent variables need not be postulated. It is argued that the statistical uncertainty in modelling competencies depends not only on the sampling of persons, but also on the sampling of items and on the choice of statistical models.
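
As a concrete illustration of the omitted-item scoring question (my own sketch, not the dissertation's analyses), the following estimates a Rasch ability by Newton-Raphson maximum likelihood under two treatments of omitted responses, ignoring them as not administered versus scoring them as incorrect, with item difficulties assumed known from a prior calibration; all values are hypothetical.

import numpy as np

def rasch_ability(responses, difficulties, n_iter=25):
    """responses: 0/1/np.nan per item (np.nan = item not used); Newton-Raphson MLE."""
    r = np.asarray(responses, dtype=float)
    b = np.asarray(difficulties, dtype=float)
    use = ~np.isnan(r)
    r, b = r[use], b[use]
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))   # P(correct | theta, b)
        grad = np.sum(r - p)                     # score function
        info = np.sum(p * (1 - p))               # Fisher information
        theta += grad / info
    return theta

difficulties = np.array([-1.5, -0.5, 0.0, 0.8, 1.6])
observed = np.array([1, 1, np.nan, 0, np.nan])   # two items omitted
theta_ignored = rasch_ability(observed, difficulties)                      # omits ignored
theta_as_wrong = rasch_ability(np.nan_to_num(observed, nan=0.0), difficulties)  # omits = 0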
239

Análise de dados categorizados com omissão em variáveis explicativas e respostas / Categorical data analysis with missingness in explanatory and response variables

Poleto, Frederico Zanqueta 08 April 2011 (has links)
Nesta tese apresentam-se desenvolvimentos metodológicos para analisar dados com omissão e também estudos delineados para compreender os resultados de tais análises. Escrutinam-se análises de sensibilidade bayesiana e clássica para dados com respostas categorizadas sujeitas a omissão. Mostra-se que as componentes subjetivas de cada abordagem podem influenciar os resultados de maneira não-trivial, independentemente do tamanho da amostra, e que, portanto, as conclusões devem ser cuidadosamente avaliadas. Especificamente, demonstra-se que distribuições a priori comumente consideradas como não-informativas ou levemente informativas podem, na verdade, ser bastante informativas para parâmetros inidentificáveis, e que a escolha do modelo sobreparametrizado também tem um papel importante. Quando há omissão em variáveis explicativas, também é necessário propor um modelo marginal para as covariáveis mesmo se houver interesse apenas no modelo condicional. A especificação incorreta do modelo para as covariáveis ou do modelo para o mecanismo de omissão leva a inferências enviesadas para o modelo de interesse. Trabalhos anteriormente publicados têm-se dividido em duas vertentes: ou utilizam distribuições semiparamétricas/não-paramétricas, flexíveis para as covariáveis, e identificam o modelo com a suposição de um mecanismo de omissão não-informativa, ou empregam distribuições paramétricas para as covariáveis e permitem um mecanismo mais geral, de omissão informativa. Neste trabalho analisam-se respostas binárias, combinando um mecanismo de omissão informativa com um modelo não-paramétrico para as covariáveis contínuas, por meio de uma mistura induzida pela distribuição a priori de processo de Dirichlet. No caso em que o interesse recai apenas em momentos da distribuição das respostas, propõe-se uma nova análise de sensibilidade sob o enfoque clássico para respostas incompletas que evita suposições distribucionais e utiliza parâmetros de sensibilidade de fácil interpretação. O procedimento tem, em particular, grande apelo na análise de dados contínuos, campo que tradicionalmente emprega suposições de normalidade e/ou utiliza parâmetros de sensibilidade de difícil interpretação. Todas as análises são ilustradas com conjuntos de dados reais. / We present methodological developments to conduct analyses with missing data and also studies designed to understand the results of such analyses. We examine Bayesian and classical sensitivity analyses for data with missing categorical responses and show that the subjective components of each approach can influence results in non-trivial ways, irrespective of the sample size, concluding that they need to be carefully evaluated. Specifically, we show that prior distributions commonly regarded as slightly informative or non-informative may actually be too informative for non-identifiable parameters, and that the choice of over-parameterized models may drastically impact the results. When there is missingness in explanatory variables, we also need to consider a marginal model for the covariates even if the interest lies only in the conditional model. An incorrect specification of either the model for the covariates or the model for the missingness mechanism leads to biased inferences for the parameters of interest.
Previously published works are commonly divided into two streams: either they use semi-/non-parametric flexible distributions for the covariates and identify the model via a non-informative missingness mechanism, or they employ parametric distributions for the covariates and allow a more general informative missingness mechanism. We consider the analysis of binary responses, combining an informative missingness model with a non-parametric model for the continuous covariates via a Dirichlet process mixture. When the interest lies only in moments of the response distribution, we consider a new classical sensitivity analysis for incomplete responses that avoids distributional assumptions and employs easily interpreted sensitivity parameters. The procedure is particularly useful for analyses of missing continuous data, an area that traditionally assumes normality and/or relies on hard-to-interpret sensitivity parameters. We illustrate all analyses with real data sets.
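
A minimal sketch of the general idea of a sensitivity analysis for a missing binary response (my simplification in a pattern-mixture style, not the thesis's procedures): the non-identifiable quantity P(Y=1 | missing) is swept over a grid through an easily interpreted sensitivity parameter, and the marginal proportion is reported for each assumed value; the data are hypothetical.

import numpy as np

def sensitivity_curve(y_observed, n_missing, p_missing_grid):
    """y_observed: 0/1 responses actually seen; n_missing: count of nonresponses;
    p_missing_grid: assumed values of P(Y=1 | missing)."""
    n_obs = len(y_observed)
    p_obs = np.mean(y_observed)
    w_obs = n_obs / (n_obs + n_missing)
    # pattern-mixture decomposition: P(Y=1) = P(obs)P(Y=1|obs) + P(mis)P(Y=1|mis)
    return {p_mis: w_obs * p_obs + (1 - w_obs) * p_mis for p_mis in p_missing_grid}

y_obs = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # hypothetical observed responses
curve = sensitivity_curve(y_obs, n_missing=5,
                          p_missing_grid=[0.0, 0.25, 0.5, 0.75, 1.0])
# MCAR corresponds to p_mis = p_obs; the spread of the curve shows how strongly
# the conclusion depends on the untestable assumption about the nonrespondents.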
