  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

Biclusterização na análise de dados incertos / Biclustering on uncertain data analysis

França, Fabricio Olivetti de 17 August 2018 (has links)
Advisor: Fernando José Von Zuben / Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Made available in DSpace on 2018-08-17 (GMT). Previous issue date: 2010 / Abstract: The data acquisition process is subject to many inconsistencies and uncertainties. These uncertainties may produce noisy data or even prevent acquisition altogether, thus leading to the missing data problem. Most procedures used to deal with this problem act globally on the dataset and ignore the effect of noise on the analysis. The objective of this thesis is to explore the properties of the biclustering method, which performs a local analysis of the data, creating several imputation models that aim to minimize the prediction error when estimating the missing values of the dataset. First, a new biclustering algorithm is proposed with better performance than that of other traditional approaches, with emphasis on the noise-reduction capability of the models generated by the biclusters. Next, the formulation of a quadratic optimization problem is proposed to impute the missing data by means of the local models produced by a set of biclusters. The results obtained show that the use of biclustering helps to reduce the prediction error of data imputation, besides providing favourable conditions for an a posteriori analysis of the information contained in the data / Doctorate / Computer Engineering / Doctor of Electrical Engineering
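The local-model idea can be illustrated with a deliberately simple stand-in: imputing each missing entry with the mean of an enclosing bicluster. The thesis instead fits a quadratic optimization problem over the biclusters' local models; the data and bicluster below are invented for illustration.

```python
import numpy as np

def impute_with_biclusters(X, biclusters):
    """Fill missing entries (NaN) of X using the mean of each
    entry's enclosing bicluster, i.e. a rows-by-cols submatrix.
    `biclusters` is a list of (row_indices, col_indices) pairs."""
    X = X.copy()
    for rows, cols in biclusters:
        sub = X[np.ix_(rows, cols)]        # local view of the bicluster
        fill = np.nanmean(sub)             # local model: bicluster mean
        sub[np.isnan(sub)] = fill
        X[np.ix_(rows, cols)] = sub
    return X

X = np.array([[1.0, 2.0, 9.0],
              [1.0, np.nan, 9.0],
              [5.0, 6.0, 7.0]])
# one hypothetical bicluster covering the homogeneous top-left block
X_imp = impute_with_biclusters(X, [([0, 1], [0, 1])])
```

Because the model is local, only the values inside the bicluster (here 1, 2 and 1) influence the imputed value, which is the point of the approach.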
192

Problématiques statistiques rencontrées dans l’étude du traitement antirétroviral des adultes infectés par le VIH en Afrique subsaharienne / Statistical problems encountered in the study of antiretroviral treatment of adults infected with HIV in sub-Saharan Africa

Tchatchueng Mbougua, Jules Brice 12 June 2012 (has links)
On the basis of statistical challenges encountered in the study of antiretroviral treatment of adults infected with the human immunodeficiency virus (HIV) in sub-Saharan Africa, this thesis aims, on the one hand, to promote the dissemination of relatively recent methodological tools to a less specialised audience of users and, on the other hand, to contribute to the development of new tools. The first chapter presents various methods for modelling longitudinal data, including methods for analysing the change of a criterion over time (generalized linear mixed models and generalized estimating equations) and the occurrence of an event over time (the semi-parametric Cox model and its extensions to time-dependent covariates and informative censoring). The second chapter focuses on non-inferiority tests and provides two developments of their classical procedure for cases where the non-inferiority margin is relative. The third chapter addresses the question of missing data and proposes an extension of the multiple imputation method based on fully conditional specification that takes into account nonlinear effects of covariates in the imputation models using B-spline functions. These methods are illustrated by studies on HIV in Cameroon and Senegal.
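The B-spline device of the third chapter, expanding a covariate into a spline basis so the imputation model can capture nonlinear effects, can be sketched with the Cox-de Boor recursion. This is an illustration, not the thesis code; the knot vector and degree below are arbitrary choices.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at points x via the
    Cox-de Boor recursion. `knots` is a full (clamped) knot vector."""
    t, k = np.asarray(knots, dtype=float), degree
    n = len(t) - k - 1                      # number of basis functions
    B = np.zeros((len(x), n))
    for i in range(n):
        B[:, i] = _b(x, t, i, k)
    return B

def _b(x, t, i, k):
    if k == 0:
        return ((t[i] <= x) & (x < t[i + 1])).astype(float)
    left = np.zeros_like(x, dtype=float)
    right = np.zeros_like(x, dtype=float)
    if t[i + k] > t[i]:                     # guard the 0/0 convention
        left = (x - t[i]) / (t[i + k] - t[i]) * _b(x, t, i, k - 1)
    if t[i + k + 1] > t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * _b(x, t, i + 1, k - 1)
    return left + right

# quadratic basis on [0, 1]; columns of B would enter the imputation
# model as extra regressors in place of the raw covariate
B = bspline_basis(np.array([0.1, 0.25, 0.6, 0.9]), [0, 0, 0, 0.5, 1, 1, 1], 2)
```

The basis is a partition of unity on the interior of the knot range, which keeps the expanded design matrix well behaved.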
193

Non-response and information bias in population-based psychiatric research: the Northern Finland 1966 Birth Cohort study

Haapea, M. (Marianne) 13 April 2010 (has links)
Abstract: Study samples in medical research are selected according to the objectives of the studies. Researchers seek to collect data as extensively and reliably as possible. In practice, however, data are often missing or may be incorrect. This thesis covers some of the problems concerning missing data and data collection in psychiatric research. Methods for adjusting for missing data and for evaluating the reliability of data are presented. The data originate from the Northern Finland 1966 Birth Cohort (N = 12058). This study explored how participation in an epidemiologic study that includes questionnaires and a clinical examination is affected by mental health (N = 11540), and whether non-participants experience more severe clinical symptoms than participants in a psychiatric field study (N = 145) among subjects with a psychosis. Inverse probability weighting (IPW) was used to adjust for non-participation in comparisons of brain volumes between schizophrenia and control groups. The precision of self-reported medication use was also explored (N = 7625). In the epidemiologic study of all cohort members, subjects with a psychiatric disorder participated less actively than those without one. In the psychiatric field study, non-participants were more often patients with schizophrenia than with other psychoses; their psychiatric symptoms were more severe and they needed more hospital care than participants. The use of IPW led to higher estimates of cerebrospinal fluid volume and lower estimates of grey and white matter volumes in schizophrenia patients, and increased the statistical significance of the differences in brain volume estimates between the schizophrenia and control groups. The precision of self-reported data on psychoactive medication use was substantial. Due to non-participation, the true prevalence of psychiatric disorders is probably higher than the prevalence estimates from field studies that are based on data provided by participants only. In order to reflect the true differences in the target population, weighting methods can be used to improve estimates affected by non-participation. Regarding psychoactive medication use, data collected by postal questionnaire can be assumed accurate enough for study purposes, although it may underestimate the prevalence of medication use due to non-participation.
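Inverse probability weighting, used above to adjust brain-volume comparisons for non-participation, can be sketched in a few lines. The outcome values and participation probabilities below are invented toy numbers chosen so the bias is visible.

```python
import numpy as np

def ipw_mean(y, participated, p_participate):
    """Inverse-probability-weighted mean of y, observed only for
    participants; p_participate is the modelled probability that
    each subject participates in the study."""
    w = participated / p_participate       # weight 1/p for participants, 0 otherwise
    return np.sum(w * np.nan_to_num(y)) / np.sum(w)   # nan_to_num guards unobserved y

# toy data: sicker subjects (y = 1) participate half as often, so the
# naive participant-only mean underestimates the true prevalence (0.5)
y = np.array([1.0, 1.0, 0.0, 0.0])
participated = np.array([1, 0, 1, 1])
p = np.array([0.5, 0.5, 1.0, 1.0])
naive = y[participated == 1].mean()
weighted = ipw_mean(y, participated, p)
```

Up-weighting the one observed sick subject by 1/0.5 recovers the population prevalence that naive averaging over participants misses.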
194

Tratamento de dados faltantes empregando biclusterização com imputação múltipla / Treatment of missing data using biclustering with multiple imputation

Veroneze, Rosana, 1982- 18 August 2018 (has links)
Advisors: Fernando José Von Zuben, Fabrício Olivetti de França / Master's thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Made available in DSpace on 2018-08-18 (GMT). Previous issue date: 2011 / Abstract: The answers provided by recommender systems can be interpreted as missing data to be imputed from the available data and its relation to the missing data. There is a wide range of techniques for data imputation; this work is concerned with multiple imputation. Alternative approaches for multiple imputation have already been proposed, and this work takes biclustering as an effective, flexible and promising strategy. To this end, a parameter sensitivity analysis is first performed on the SwarmBcluster algorithm, recently proposed for biclustering and already adapted in the literature to accomplish single imputation of missing data. This analysis indicated that a proper choice of parameters can significantly improve the performance of the algorithm. Next, SwarmBcluster is extended to implement multiple imputation and compared with the well-known NORM algorithm. The quality of the results is measured with diverse metrics, which reveal that biclustering leads to imputations of better quality in the majority of the experiments / Master's / Computer Engineering / Master of Electrical Engineering
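Whatever method generates the imputations, multiple imputation ends with a pooling step. A minimal sketch of Rubin's combining rules follows; the per-imputation estimates and variances are illustrative numbers, not results from the thesis.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Rubin's rules: combine m per-imputation point estimates and
    their within-imputation variances into one pooled estimate and a
    total variance reflecting both sources of uncertainty."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                       # pooled point estimate
    ubar = u.mean()                       # average within-imputation variance
    b = q.var(ddof=1)                     # between-imputation variance
    total = ubar + (1 + 1 / m) * b        # Rubin's total variance
    return qbar, total

# three hypothetical analyses of the same coefficient, one per imputed dataset
qbar, total = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The between-imputation term is what single imputation discards, which is why it understates uncertainty.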
195

Fast and slow machine learning / Apprentissage automatique rapide et lent

Montiel López, Jacob 07 March 2019 (has links)
The Big Data era has revolutionized the way in which data is created and processed. In this context, multiple challenges arise given the massive amount of data that needs to be efficiently handled and processed in order to extract knowledge. This thesis explores the symbiosis of batch and stream learning, which are traditionally considered in the literature as antagonists. We focus on the problem of classification from evolving data streams. Batch learning is a well-established approach in machine learning based on a finite sequence: first data is collected, then predictive models are created, then the model is applied. On the other hand, stream learning considers data as infinite, rendering the learning problem a continuous (never-ending) task. Furthermore, data streams can evolve over time, meaning that the relationship between features and the corresponding response (the class, in classification) can change. We propose a systematic framework to predict over-indebtedness, a real-world problem with significant implications in modern society. The two versions of the early warning mechanism (batch and stream) outperform the baseline performance of the solution implemented by Groupe BPCE, the second largest banking institution in France. Additionally, we introduce a scalable model-based imputation method for missing data in classification. This method casts the imputation problem as a set of classification/regression tasks which are solved incrementally. We present a unified framework that serves as a common learning platform where batch and stream methods can positively interact, and show that batch methods can be efficiently trained in the stream setting under specific conditions; the proposed hybrid solution relies on these positive interactions between batch and stream methods. We also propose an adaptation of the Extreme Gradient Boosting (XGBoost) algorithm to evolving data streams. The proposed adaptive method generates and updates the ensemble incrementally using mini-batches of data. Finally, we introduce scikit-multiflow, an open source framework that fills the gap in Python for a development/research platform for learning from evolving data streams.
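The mini-batch idea behind the adaptive ensemble can be illustrated with a much simpler stand-in: an ensemble that trains one weak member per incoming mini-batch and forgets the oldest members. This is a toy sketch of the concept, not the thesis's XGBoost adaptation; the weak learner and data are invented.

```python
import numpy as np
from collections import deque

class MiniBatchEnsemble:
    """Toy sketch: one weak member per mini-batch, bounded size.
    Dropping the oldest member is a crude way of forgetting
    outdated concepts in an evolving stream."""
    def __init__(self, capacity=10):
        self.members = deque(maxlen=capacity)

    def partial_fit(self, X, y):
        # weak member: midpoint threshold on the most separating feature
        mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
        j = int(np.argmax(np.abs(mu1 - mu0)))
        thr = (mu0[j] + mu1[j]) / 2
        sign = 1.0 if mu1[j] > mu0[j] else -1.0
        self.members.append((j, thr, sign))

    def predict(self, X):
        votes = np.zeros(len(X))
        for j, thr, sign in self.members:
            votes += np.where(sign * (X[:, j] - thr) > 0, 1, -1)
        return (votes > 0).astype(int)

rng = np.random.default_rng(0)
ens = MiniBatchEnsemble(capacity=5)
for _ in range(3):                          # three mini-batches from the stream
    Xb = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                    rng.normal(1.0, 0.1, (20, 2))])
    yb = np.array([0] * 20 + [1] * 20)
    ens.partial_fit(Xb, yb)
pred = ens.predict(np.array([[0.0, 0.0], [1.0, 1.0]]))
```

The bounded deque is the design choice worth noting: capacity controls the trade-off between stability and speed of adaptation to drift.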
196

Comparison of Imputation Methods for Mixed Data Missing at Random

Heidt, Kaitlyn 01 May 2019 (has links)
A statistician's job is to produce statistical models. When these models are precise and unbiased, we can apply them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the results are biased. The statistician's objective is to implement methods that produce unbiased and accurate results. Research on missing data has become popular as modern methods that produce unbiased and accurate results emerge, such as the MICE package in the statistical software R. Using real data, we compare four common imputation methods from the MICE package at different levels of missingness. The results were compared in terms of the regression coefficients and adjusted R^2 values obtained with the complete data set. The CART and PMM methods consistently performed better than the OTF and RF methods. The procedures were repeated on a second sample of real data and the same conclusions were drawn.
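Of the compared methods, predictive mean matching (PMM) is easy to sketch outside of R: fit a model on the observed rows, then impute each missing value by borrowing an observed value from a donor whose prediction is close. The linear model and toy data below are illustrative; MICE's actual implementation differs in details (e.g. parameter draws per imputation).

```python
import numpy as np

def pmm_impute(x_obs, y_obs, x_mis, k=3, rng=None):
    """Predictive mean matching sketch: fit y ~ a + b*x on observed
    rows, then for each missing y draw one donor among the k observed
    rows whose predicted values are closest."""
    rng = rng or np.random.default_rng(0)
    A = np.column_stack([np.ones_like(x_obs), x_obs])
    coef, *_ = np.linalg.lstsq(A, y_obs, rcond=None)   # least-squares fit
    pred_obs = A @ coef
    pred_mis = np.column_stack([np.ones_like(x_mis), x_mis]) @ coef
    out = np.empty_like(pred_mis)
    for i, p in enumerate(pred_mis):
        donors = np.argsort(np.abs(pred_obs - p))[:k]  # k nearest predictions
        out[i] = y_obs[rng.choice(donors)]             # borrow an observed value
    return out

x_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_obs = np.array([2.1, 4.0, 6.2, 8.1])      # roughly y = 2x
imputed = pmm_impute(x_obs, y_obs, np.array([2.5]), k=2)
```

Because imputed values are always borrowed from the observed data, PMM cannot produce impossible values, one reason it performs well in comparisons like this one.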
197

Tests non paramétriques minimax pour de grandes matrices de covariance / Non parametric minimax tests for high dimensional covariance matrices

Zgheib, Rania 23 May 2016 (has links)
Our work contributes to the theory of non-parametric minimax tests for high dimensional covariance matrices. More precisely, we observe $n$ independent, identically distributed vectors of dimension $p$, $X_1,\ldots,X_n$, with Gaussian distribution $\mathcal{N}_p(0,\Sigma)$, where $\Sigma$ is the unknown covariance matrix. We test the null hypothesis $H_0 : \Sigma = I$, where $I$ is the identity matrix. The alternative hypothesis is given by an ellipsoid from which a ball of radius $\varphi$ centered at $I$ is removed. Asymptotically, $n$ and $p$ tend to infinity. The minimax test theory, other approaches considered for testing covariance matrices, and a summary of our results are given in the introduction. The second chapter is devoted to the case of Toeplitz covariance matrices $\Sigma$. The connection with the spectral density model is discussed. We consider two types of ellipsoids, described by polynomial (Sobolev-type) weights and exponential weights, respectively. We find the minimax separation rate in both cases. We establish sharp asymptotic equivalents of the minimax type II error probability and the minimax total error probability. The asymptotically minimax test procedure is an optimally weighted U-statistic of order 2. The third chapter considers alternative hypotheses containing covariance matrices, not necessarily Toeplitz, that belong to a Sobolev-type ellipsoid of parameter $\alpha$. We obtain the minimax separation rate and give sharp asymptotic equivalents of the minimax type II and total error probabilities. We propose an adaptive test procedure, free of $\alpha$, for $\alpha$ belonging to a compact subset of $(1/2, +\infty)$. Numerical implementation of the procedures introduced in the first two chapters shows that they behave very well for large values of $p$; in particular, they gain significantly over existing methods when $p$ is large and $n$ small. The fourth chapter is dedicated to adaptive tests in a covariance model where the observations are incomplete: each coordinate of the observed vector is missing independently with probability $1-a$, $a \in (0,1)$, where $a$ may tend to 0. We treat this problem as an inverse problem. We establish the minimax separation rates and introduce new adaptive test procedures, whose test statistics use constant weights. We consider Sobolev-type ellipsoids, for both Toeplitz and non-Toeplitz matrices.
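The order-2 U-statistic at the heart of the Toeplitz test can be sketched as an unbiased estimator of a weighted sum of squared lag covariances: averaging products of lag-covariance estimates over distinct pairs of observations removes the diagonal bias. This is a simplified illustration with constant weights and plain centering, not the optimally weighted statistic of the thesis.

```python
import numpy as np

def covariance_test_stat(X, weights):
    """Sketch of a weighted order-2 U-statistic for H0: Sigma = I in
    the Toeplitz case: unbiasedly estimates sum_j w_j * gamma_j^2,
    where gamma_j is the lag-j covariance of the stationary process."""
    n, p = X.shape
    stat = 0.0
    for j, w in enumerate(weights, start=1):
        # per-observation empirical lag-j covariance
        g = np.array([x[:-j] @ x[j:] / (p - j) for x in X])
        # average of g_i * g_k over distinct pairs: unbiased for gamma_j^2
        pair_mean = (g.sum() ** 2 - (g ** 2).sum()) / (n * (n - 1))
        stat += w * pair_mean
    return stat

rng = np.random.default_rng(1)
X_id = rng.normal(size=(200, 100))                  # Sigma = I: all lags ~ 0
X_ar = (X_id[:, :-1] + X_id[:, 1:]) / np.sqrt(2)    # lag-1 covariance 0.5
stat_id = covariance_test_stat(X_id, [1.0, 1.0])
stat_ar = covariance_test_stat(X_ar, [1.0, 1.0])
```

Under the null the statistic concentrates near 0, while the correlated alternative pushes it toward $\gamma_1^2 = 0.25$, which is what the test thresholds against.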
198

Primena mašinskog učenja u problemu nedostajućih podataka pri razvoju prediktivnih modela / Application of machine learning to the problem of missing data in the development of predictive models

Vrbaški Dunja 20 July 2020 (has links)
The problem of missing data is often present when developing predictive models. Instead of removing data containing missing values, imputation methods can be applied. The dissertation proposes a methodology for analysing imputation performance in the development of predictive models. Based on the proposed methodology, results of applying machine learning algorithms as imputation methods in the development of specific predictive models are presented.
199

Missing Data - A Gentle Introduction

Österlund, Vilgot January 2020 (has links)
This thesis provides an introduction to methods for handling missing data. A thorough review of earlier methods and of the development of the field is provided. The thesis presents the methods suggested in today's literature: multiple imputation and maximum likelihood estimation. A simulation study is performed to see whether there are circumstances in small samples under which either of the two methods is to be preferred. To show the importance of handling missing data, multiple imputation and maximum likelihood are also compared to listwise deletion. The results of the simulation study do not show any crucial differences between multiple imputation and maximum likelihood in terms of point estimates. Some differences are seen in the estimation of confidence intervals, speaking in favour of multiple imputation; the difference decreases with increasing sample size, and more studies are needed to draw definite conclusions. Further, the results show that listwise deletion leads to biased estimates under a missing-at-random mechanism. The methods are also applied to a real dataset, the Swedish enrollment registry, to show how they work in a practical application.
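The bias of listwise deletion under a missing-at-random mechanism is easy to reproduce with a small simulation: make the outcome's missingness depend on a fully observed covariate that also drives the outcome. The logistic missingness model and sample size below are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
x = rng.normal(size=n)
y = x + rng.normal(size=n)                  # true population mean of y is 0
# MAR mechanism: the chance that y is missing grows with the observed x
missing = rng.random(n) < 1 / (1 + np.exp(-2 * x))
y_listwise = y[~missing]

listwise_mean = y_listwise.mean()           # biased: high-x rows were dropped
full_mean = y.mean()                        # benchmark from the complete data
```

Because dropped rows are systematically those with large x (and hence large y), the listwise mean lands well below 0, the kind of bias that multiple imputation or maximum likelihood avoids by using the observed x of the incomplete rows.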
200

Methods of Handling Missing Data in One Shot Response Based Power System Control

Dahal, Niraj 08 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The thesis extends the work done in [1] [2] by Rovnyak et al., where the authors describe transient event prediction and response-based one-shot control using decision trees trained and tested on a 176-bus model of the WECC power system network. This thesis contains results from rigorous simulations performed to measure the robustness of the existing one-shot control when subjected to missing PMU data ranging from 0-10%. The thesis can be divided into two parts: the first covers the work done in [2] using another set of one-shot control combinations labelled CC2, and the second measures their robustness under the assumed missing-data scenarios. Previous work in [2] uses decision trees for event detection, based on different indices, to classify a contingency as 'Fault' or 'No fault', and another set of decision trees that decides whether to actuate 'Control' or 'No control'. Actuating control here means applying a one-shot control combination to possibly bring the system to a new equilibrium point when it would otherwise lose synchronism. The work in [2] also assesses the performance of the one-shot control without event detection. The thesis is organized as follows. Chapter 1 highlights the effect of missing PMU data in a power system network and the need to address it appropriately; it also gives a general idea of transient stability and the response to a transient fault in a power system. Chapter 2 forms the foundation of the thesis, describing the work done in [1] [2] in detail: the power system model used, the contingency set, and the different indices used for the decision trees. It also describes the one-shot control combination (CC1) deduced by Rovnyak et al., whose performance is later tested under different missing-data scenarios, along with another control combination (CC2) tested under the same scenarios, and explains the control methodology used in [2]. The performance metrics of the DTs, the same metrics used in [2] to measure the robustness of the one-shot control, are explained at the end of the chapter. Chapter 2 is thus largely a literature review of previous work, plus a few simulation results obtained from CC2 using exactly the same model and control methodology. Chapter 3 describes different techniques for handling missing PMU data, most of which have been used in and are referenced from previous papers. Finally, Chapter 4 presents the results and analysis of the simulations, and the thesis wraps up with future enhancements and room for improvement.
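Two of the simplest gap-handling techniques of the kind surveyed in Chapter 3, holding the last known PMU sample versus linearly interpolating across the gap, can be sketched as follows (illustrative only; real PMU streams are multivariate and timestamped):

```python
import numpy as np

def fill_pmu_gaps(series, method="hold"):
    """Fill dropped PMU samples (marked NaN) either by holding the
    last known value or by linear interpolation across the gap."""
    s = np.asarray(series, dtype=float).copy()
    idx = np.arange(len(s))
    known = ~np.isnan(s)
    if method == "hold":
        # forward-fill: index of the most recent known sample
        last = np.maximum.accumulate(np.where(known, idx, -1))
        return s[last]                      # assumes the first sample is known
    # otherwise: linear interpolation between the surrounding known samples
    s[~known] = np.interp(idx[~known], idx[known], s[known])
    return s

held = fill_pmu_gaps([1.0, np.nan, np.nan, 4.0], "hold")
lerp = fill_pmu_gaps([1.0, np.nan, np.nan, 4.0], "interp")
```

Holding is causal and usable online (as a one-shot control scheme requires), whereas interpolation needs the sample after the gap and so introduces latency; that trade-off is central when choosing a technique for real-time control.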
