171

Robust estimation for spatial models and the skill test for disease diagnosis

Lin, Shu-Chuan 25 August 2008 (has links)
This thesis focuses on (1) statistical methodologies for the estimation of spatial data with outliers and (2) the classification accuracy of disease diagnosis. Chapter I, Robust Estimation for Spatial Markov Random Field Models: Markov Random Field (MRF) models are useful in analyzing spatial lattice data collected from semiconductor device fabrication and printed circuit board manufacturing processes or agricultural field trials. When outliers are present in the data, classical parameter estimation techniques (e.g., least squares) can be inefficient and can potentially mislead the analyst. This chapter extends the MRF model to accommodate outliers and proposes robust parameter estimation methods, namely the robust M- and RA-estimates. Asymptotic distributions of the estimates with differentiable and non-differentiable robustifying functions are derived. Extensive simulation studies explore the robustness properties of the proposed methods under various amounts and patterns of outliers, and analyses of grid data with and without edge information are also provided. Three data sets taken from the literature illustrate the advantages of the methods. Chapter II, Extending the Skill Test for Disease Diagnosis: For diagnostic tests, we present an extension of the skill plot introduced by Mozer and Briggs (2003). The method is motivated by diagnostic measures for osteoporosis in a study. By restricting the area under the ROC curve (AUC) according to the skill statistic, we obtain a diagnostic test better suited to practical applications because misclassification costs are taken into account. We also construct relationships between the diseased and healthy groups, using the Koziol-Green and mean-shift models, to improve the skill statistic. Asymptotic properties of the skill statistic are provided. Simulation studies compare the theoretical results and the estimates under various disease rates and misclassification costs. We apply the proposed method to the classification of osteoporosis data.
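The robustification idea behind M-estimates can be sketched in the simplest possible setting. The following is a minimal illustration assuming an i.i.d. location model rather than the spatial MRF likelihood treated in the thesis; the Huber weight function and the iteratively reweighted scheme are textbook-standard, not taken from the thesis itself.

```python
import numpy as np

def huber_m_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.

    Residuals beyond c * scale are downweighted, so a handful of gross
    outliers cannot drag the estimate the way they drag the sample mean.
    """
    mu = np.median(x)                               # robust start
    scale = np.median(np.abs(x - mu)) / 0.6745      # MAD-based scale
    for _ in range(max_iter):
        r = np.abs(x - mu) / scale
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))   # Huber weights
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 95),    # clean observations
                       rng.normal(30, 1, 5)])    # 5% gross outliers
print("sample mean:      ", data.mean())              # pulled toward 30
print("Huber M-estimate: ", huber_m_location(data))   # stays near 10
```

The same downweighting principle carries over to the spatial setting, where the residuals come from the conditional MRF model rather than a common location.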
172

Analyse des modèles résines pour la correction des effets de proximité en lithographie optique / Resist modeling analysis for optical proximity correction effect in optical lithography

Top, Mame Kouna 12 January 2011 (has links)
The progress made in microelectronics responds to the need to reduce production costs and to reach new markets. This progress has been possible largely thanks to advances in projection optical lithography, the printing process principally used in integrated circuit (IC) manufacturing. The miniaturization of integrated circuits has only been possible by pushing the limits of optical resolution. However, shrinking transistor widths and the spacing between them increases the sensitivity of the pattern transfer to so-called optical proximity effects at the most advanced technology nodes (45 and 32 nm transistor gate sizes). The correction of these effects has become indispensable in optical lithography for advanced technology nodes. Optical proximity correction (OPC) techniques guarantee pattern fidelity on the wafer through corrections applied to the mask. The accuracy of these corrections depends on the quality of the OPC models used, so the quality of these models is paramount. This thesis analyzes and evaluates OPC resist models, which simulate the behavior of the resist after exposure. Data modeling and statistical analysis were used to study these increasingly empirical resist models. Besides improving the reliability of model calibration data, the use of dedicated model-building platforms in an industrial environment and the methodology for creating and validating OPC models were also studied. The thesis presents the results of this analysis and proposes a new methodology for creating, analyzing, and validating OPC resist models.
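As a toy illustration of the kind of model-quality check discussed here, the sketch below computes a standard figure of merit (RMS fit error between measured and simulated critical dimensions) and a robust screen for suspect calibration points. All CD values, thresholds, and variable names are invented for illustration and do not come from the thesis or any specific OPC platform.

```python
import numpy as np

# Hypothetical calibration data: measured vs. model-simulated critical
# dimensions (CD, in nm) for a set of test patterns. In practice these
# would come from SEM metrology and the OPC modeling platform.
measured = np.array([45.1, 44.8, 45.6, 46.0, 44.2, 52.3, 45.3])
simulated = np.array([45.0, 45.0, 45.4, 45.8, 44.5, 45.9, 45.2])

errors = simulated - measured
rms = np.sqrt(np.mean(errors**2))            # global model figure of merit

# Robust screen: flag points whose error deviates from the median error
# by more than 3 robust standard deviations (MAD-based).
robust_scale = 1.4826 * np.median(np.abs(errors - np.median(errors)))
suspect = np.abs(errors - np.median(errors)) > 3 * robust_scale

print(f"RMS fit error: {rms:.2f} nm")
print("suspect calibration points:", np.where(suspect)[0])
```

A point flagged here might indicate a metrology error or an unreliable calibration pattern, which ties into the calibration-data reliability question the thesis addresses.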
173

Impact of unbalancedness and heteroscedasticity on classic parametric significance tests of two-way fixed-effects ANOVA tests

Chaka, Lyson 31 October 2017 (has links)
Classic parametric statistical tests, like the analysis of variance (ANOVA), are powerful tools for comparing population means. These tests produce accurate results provided the data satisfy underlying assumptions such as homoscedasticity and balancedness; otherwise biased results are obtained. These assumptions, however, are rarely satisfied in real life, so alternative procedures must be explored. This thesis investigates the impact of heteroscedasticity and unbalancedness on effect sizes in two-way fixed-effects ANOVA models. A real-life dataset, from which three different samples were simulated, was used to investigate the changes in effect sizes under unequal variances and unbalancedness. The parametric bootstrap approach was proposed for the case of unequal variances and non-normality. The results indicated that heteroscedasticity significantly inflates effect sizes, while unbalancedness has a non-significant impact on effect sizes in two-way ANOVA models; the impact worsens, however, when the data are both unbalanced and heteroscedastic. / Statistics / M. Sc. (Statistics)
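A minimal sketch of the parametric bootstrap idea follows, assuming normally distributed cells resampled with their own estimated means and variances. The design, the effect-size measure (eta-squared for one factor), and all numbers are illustrative assumptions, not the thesis's actual simulation setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def eta_squared(y, a_idx):
    """Eta-squared for factor A: SS_A / SS_total in a two-way layout."""
    grand = y.mean()
    ss_total = np.sum((y - grand) ** 2)
    ss_a = sum((a_idx == a).sum() * (y[a_idx == a].mean() - grand) ** 2
               for a in np.unique(a_idx))
    return ss_a / ss_total

# Hypothetical 2x3 design; factor A's second level has 4x the error SD.
a_idx = np.repeat([0, 1], 30)
b_idx = np.tile(np.repeat([0, 1, 2], 10), 2)
y = 5.0 + 0.5 * a_idx + rng.normal(0, np.where(a_idx == 1, 4.0, 1.0))

obs = eta_squared(y, a_idx)

# Parametric bootstrap: resample each cell from a normal with that
# cell's estimated mean and variance, then recompute the effect size.
boot = []
for _ in range(2000):
    yb = np.empty_like(y)
    for a in (0, 1):
        for b in (0, 1, 2):
            m = (a_idx == a) & (b_idx == b)
            yb[m] = rng.normal(y[m].mean(), y[m].std(ddof=1), m.sum())
    boot.append(eta_squared(yb, a_idx))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"eta^2(A) = {obs:.3f}, bootstrap 95% interval: ({lo:.3f}, {hi:.3f})")
```

Because each cell is resampled with its own variance, the bootstrap distribution reflects the heteroscedasticity instead of assuming it away, which is the point of the approach.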
174

Robust Nonparametric Sequential Distributed Spectrum Sensing under EMI and Fading

Sahasranand, K R January 2015 (has links) (PDF)
Opportunistic use of unused spectrum can be carried out efficiently using the Cognitive Radio (CR) paradigm. A spectrum band remains idle when the primary user (licensee) is not using it. Secondary nodes detect this spectral hole quickly, use it for data transmission during the idle interval, and stop transmitting once the primary starts transmitting again. Detection of spectral holes by the secondary is called spectrum sensing in the CR scenario. Spectrum sensing is formulated as a hypothesis testing problem in which, under H0, the spectrum is free and, under H1, occupied. The samples have different probability distributions, P0 and P1, under H0 and H1 respectively. In the first part of the thesis, a new algorithm, the entropy test, is presented, which performs better than the available algorithms when P0 is known but P1 is not. This is extended to a distributed setting as well, in which different secondary nodes collect samples independently and send their decisions to a Fusion Centre (FC) over a noisy MAC, which then makes the final decision. The asymptotic optimality of the algorithm is also shown. In the second part, the spectrum sensing problem under impediments such as fading, electromagnetic interference, and outliers is tackled. Here the detector has full knowledge of neither P0 nor P1, a more general and practically relevant setting. It is found that a recently developed algorithm (which we call the random walk test) works well under suitable modifications. The performance of the algorithm is established theoretically and via simulations. The same algorithm is extended to the distributed setting as above.
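The sequential-detection framework behind these tests can be illustrated with the classical Wald SPRT, which assumes both P0 and P1 known; the thesis's entropy and random walk tests relax exactly that assumption. Below is a minimal Gaussian sketch of the sequential log-likelihood-ratio mechanics, with illustrative parameters; it is a baseline, not the thesis's algorithm.

```python
import numpy as np

def sprt_gaussian(samples, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean mu0 vs H1: mean mu1, i.i.d. Gaussian samples.

    Returns (decision, samples_used). The cumulative log-likelihood ratio
    is a random walk: positive drift under H1, negative drift under H0.
    """
    upper = np.log((1 - beta) / alpha)   # cross above: decide H1
    lower = np.log(beta / (1 - alpha))   # cross below: decide H0
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            return "H1 (spectrum occupied)", n
        if llr <= lower:
            return "H0 (spectrum free)", n
    return "undecided", len(samples)

rng = np.random.default_rng(2)
print(sprt_gaussian(rng.normal(1.0, 1.0, 500)))   # primary transmitting
print(sprt_gaussian(rng.normal(0.0, 1.0, 500)))   # spectrum free
```

Sequential tests of this kind typically decide with far fewer samples on average than a fixed-sample test at the same error levels, which is what makes them attractive for fast spectrum sensing.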
175

Estudo comparativo de gráficos de probabilidade normal para análise de experimentos fatoriais não replicados / A comparative study of normal probability plots for the analysis of unreplicated factorial experiments

Nóbrega, Manassés Pereira 17 May 2010 (has links)
Two-level factorial designs are widely used in industrial experimentation. However, a design with many factors requires a large number of runs, and replicating the treatments may not be feasible given limitations of resources and time. In these cases, unreplicated designs are used. With only one replicate, however, there is no internal estimate of experimental error from which to judge the significance of the observed effects. One possible solution is to use normal or half-normal plots of the effects. Many experimenters use the normal plot, while others prefer the half-normal plot, often in both cases without justification. The controversy about the use of these two graphical techniques motivates this work, since there is no record of a formal procedure or statistical test that indicates which one is best; the choice between the two plots seems to be a subjective issue. The central objective of this master's thesis is therefore to perform an experimental comparative study of the normal and half-normal plots in the context of the analysis of unreplicated 2^k factorial experiments. The study involves the construction of simulated scenarios, in which the plots' performance in detecting significant effects and identifying outliers is evaluated in order to address the following questions: Can one plot be better than the other? In which situations? What information does one plot add to the analysis of the experiment that might complement that provided by the other? What are the restrictions on the use of each plot? By confronting the two techniques and examining them simultaneously, this work seeks similarities, differences, and relationships that contribute to a theoretical reference to justify or aid the experimenter's decision about which of the two graphical techniques to use and why. The simulation results show that the half-normal plot is better for judging the effects, while the normal plot is recommended for detecting outliers in the data.
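For concreteness, here is a minimal sketch of how the effects and the half-normal plotting coordinates are computed for an unreplicated 2^3 design. The response values are invented; only the plotting coordinates are produced, and feeding them to any plotting library yields the half-normal plot.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical unreplicated 2^3 experiment, runs in standard (Yates) order.
y = np.array([60.0, 72.0, 54.0, 68.0, 52.0, 83.0, 45.0, 80.0])

# Factor columns in -1/+1 coding; interactions are elementwise products.
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
contrasts = {"A": A, "B": B, "C": C,
             "AB": A * B, "AC": A * C, "BC": B * C, "ABC": A * B * C}

# Each effect is its contrast divided by 2^(k-1) = 4.
effects = {name: float(c @ y) / 4.0 for name, c in contrasts.items()}

# Half-normal plot coordinates: ordered |effect| vs. half-normal quantiles
# Phi^{-1}(0.5 + 0.5 * (i - 0.5) / m), i = 1..m.
names, vals = zip(*sorted(effects.items(), key=lambda kv: abs(kv[1])))
m = len(vals)
quantiles = [NormalDist().inv_cdf(0.5 + 0.5 * (i - 0.5) / m)
             for i in range(1, m + 1)]
for name, v, q in zip(names, vals, quantiles):
    print(f"{name:>3}: |effect| = {abs(v):5.2f}, half-normal quantile = {q:.3f}")
# Effects far off the line through the small ones are judged active.
```

The normal plot differs only in using signed effects against full normal quantiles, which is what lets it also reveal asymmetries caused by outlying runs.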
176

O uso de quase U-estatísticas para séries temporais uni e multivariadas / The use of quasi U-statistics for univariate and multivariate time series

Valk, Marcio 17 August 2018 (has links)
Orientador: Aluísio de Souza Pinheiro / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Classification and clustering of time series are problems widely explored in the current literature, and many techniques have been presented to solve them. However, the restrictions these techniques require generally make the procedures specific and applicable only to a certain class of time series; moreover, many of these approaches are empirical. We present methods for classification and clustering of time series based on quasi U-statistics (Pinheiro et al. (2009) and Pinheiro et al. (2010)). Metrics based on tools well known in the time series literature, including the sample autocorrelation and the periodogram, are used as kernels of the U-statistics. Three main situations are considered: univariate time series, multivariate time series, and time series with outliers. The asymptotic normality of the proposed tests is demonstrated for a wide class of metrics and models. The methods are also studied by simulation and applied to a real data set. / Doutorado / Estatística / Doutor em Estatística
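One ingredient mentioned above, a periodogram-based metric between series, can be sketched as follows. This is a minimal illustration of that kind of kernel, assuming a plain log-periodogram distance; the actual quasi U-statistic construction and test statistics follow Pinheiro et al. and are not reproduced here.

```python
import numpy as np

def log_periodogram(x):
    """Log-periodogram at the positive Fourier frequencies."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
    return np.log(spec[1:])                 # drop the zero frequency

def periodogram_distance(x, y):
    """Normalized Euclidean distance between log-periodograms.

    Series with similar second-order (spectral) structure are close,
    regardless of phase -- one common choice of time-series metric.
    """
    px, py = log_periodogram(x), log_periodogram(y)
    return float(np.linalg.norm(px - py)) / np.sqrt(len(px))

def ar1(phi, n, rng):
    """Simulate a Gaussian AR(1) series with coefficient phi."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(3)
a1, a2 = ar1(0.8, 256, rng), ar1(0.8, 256, rng)   # same dynamics
b = ar1(-0.5, 256, rng)                            # different dynamics
print("d(same class):     ", periodogram_distance(a1, a2))
print("d(different class):", periodogram_distance(a1, b))
```

On average the within-class distance is smaller than the between-class one, which is what a U-statistic built on such a kernel exploits to separate groups of series.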
177

Extensões dos modelos de regressão quantílica bayesianos / Extensions of Bayesian quantile regression models

Bruno Ramos dos Santos 29 April 2016 (has links)
This thesis proposes extensions of Bayesian quantile regression models, considering proportion data with zero inflation and also data censored at zero. Initially, an analysis of influential observations is suggested, based on the location-scale mixture representation of the asymmetric Laplace distribution, in which the posterior distributions of the latent variables are compared in order to identify possible outlying observations. Next, a two-part model is proposed to analyze proportion data with zero or one inflation, studying the conditional quantiles and the probability of the response variable being equal to zero. Then, Bayesian quantile regression models are proposed for continuous data with a discrete component at zero, where part of these observations is assumed censored. These models may be considered more complete for this type of data, as the censoring probability is examined at each quantile of interest. Lastly, an application of these models with spatial correlation is considered, studying data from the 2014 Brazilian presidential election; here the quantile regression models incorporate the spatial dependence through the asymmetric Laplace process. For all the proposed models an R package was developed, which is exemplified in the appendix.
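The asymmetric Laplace connection these models rest on can be made concrete: maximizing an asymmetric Laplace likelihood at quantile tau is equivalent to minimizing the check (pinball) loss. Below is a minimal frequentist sketch of that equivalence on simulated data; the Bayesian machinery, the zero-inflation and censoring extensions, and the accompanying R package are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Pinball loss: the kernel of the asymmetric Laplace log-likelihood."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def fit_quantile(X, y, tau):
    """Linear quantile regression by minimizing the check loss.

    This is the posterior mode under an asymmetric Laplace likelihood
    with a flat prior -- the link the Bayesian models exploit.
    """
    obj = lambda b: np.sum(check_loss(y - X @ b, tau))
    return minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1 + 0.3 * x)   # heteroscedastic noise
X = np.column_stack([np.ones(n), x])
for tau in (0.1, 0.5, 0.9):
    b = fit_quantile(X, y, tau)
    print(f"tau={tau}: intercept={b[0]:.2f}, slope={b[1]:.2f}")
```

Because the noise spread grows with x, the fitted slopes increase with tau, illustrating why modeling several quantiles says more than modeling the mean alone.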
178

Algoritmy pro detekci anomálií v datech z klinických studií a zdravotnických registrů / Algorithms for anomaly detection in data from clinical trials and health registries

Bondarenko, Maxim January 2018 (has links)
This master's thesis deals with anomaly detection in data from clinical trials and medical registries. The purpose of this work is to review the literature on data quality in clinical trials and to design an algorithm, based on machine learning methods, for detecting anomalous records in real clinical data from current or completed clinical trials or medical registries. The practical part describes the implemented detection algorithm, which consists of several parts: importing data from the information system; preprocessing and transforming the imported records, whose variables have different data types, into numerical vectors; applying well-known statistical methods to detect outliers; and evaluating the quality and accuracy of the algorithm. The algorithm outputs a vector of parameters containing anomalies, which should make the data manager's work easier. It is designed to extend the set of functions of the information system (CLADE-IS) with automatic monitoring of data quality by detecting anomalous records.
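A minimal sketch of the pipeline described above: mixed-type records are turned into numeric vectors and screened with a simple statistical rule. The column names, values, and the 3.5 robust z-score cutoff are illustrative assumptions; the thesis's actual algorithm and its CLADE-IS integration are not reproduced.

```python
import pandas as pd

# Hypothetical clinical-registry records with mixed variable types.
df = pd.DataFrame({
    "age": [54, 61, 47, 58, 230, 49],          # 230 looks like a typo
    "weight_kg": [80, 75, 68, 300, 77, 71],    # 300 is implausible
    "sex": ["M", "F", "F", "M", "M", "F"],
    "site": ["A", "A", "B", "B", "A", "C"],
})

# Transform mixed-type records into numeric vectors: one-hot encode
# the categorical variables, keep numeric columns as-is.
X = pd.get_dummies(df, columns=["sex", "site"]).astype(float)

# Simple statistical screen: robust z-scores per column (median/MAD);
# flag any record whose score exceeds 3.5 in some variable.
med = X.median()
mad = (X - med).abs().median().replace(0, 1.0)   # guard against zero MAD
rz = 0.6745 * (X - med) / mad
flagged = rz.abs().max(axis=1) > 3.5
print(df[flagged])    # rows a data manager should review
```

Running this flags the records with age 230 and weight 300 kg, the kind of output the thesis's "vector of parameters containing anomalies" is meant to surface for manual review.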
180

Dolovací modul systému pro dolování z dat na platformě NetBeans / Data Mining Module of a Data Mining System on NetBeans Platform

Výtvar, Jaromír January 2010 (has links)
The aim of this work is to give a basic overview of the process of obtaining knowledge from databases (data mining) and to analyze the data mining system developed at FIT BUT on the NetBeans platform in order to create a new mining module. We decided to implement a module for mining outliers and to extend the existing regression module with multiple linear regression using generalized linear models. The new methods build on existing methods of Oracle Data Mining.
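The regression extension described, multiple linear regression expressed as a generalized linear model, amounts to a Gaussian family with identity link. A minimal sketch on simulated data follows, using statsmodels rather than the Oracle Data Mining backend the module actually targets:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
X = rng.normal(size=(n, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(0, 0.5, n)

# A GLM with Gaussian family and identity link reduces to ordinary
# least squares, i.e., multiple linear regression.
Xc = sm.add_constant(X)
model = sm.GLM(y, Xc, family=sm.families.Gaussian()).fit()
print(model.params)    # approximately [2.0, 1.5, -0.7]
```

Framing linear regression as a GLM is what lets a mining module swap in other families (binomial, Poisson) without changing the surrounding fitting code.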
