Global ETD Search

91	Hat Bayes eine Chance? [Does Bayes Stand a Chance?] Sontag, Ralph 10 May 2004 (has links) Workshop "Netz- und Service-Infrastrukturen". For some months or years now, Bayesian filters have increasingly been used to separate wanted e-mail ("ham") from unwanted "spam". These filters, however, quickly reach their limits. A second section analyses in more detail a filter test run by the magazine c't. info:eu-repo/classification/ddc/004 ddc:004 Bayes, Thomas; Bayes test; Bayesian methods; E-mail; Filter; Filter <Stochastik>; Spam mail
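How a Bayesian filter separates ham from spam can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names `train` and `spam_probability`, the whitespace tokenisation and the Laplace smoothing are all assumptions made for illustration, and both classes are assumed to appear in the training data.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class from (text, label) pairs; label is 'ham' or 'spam'."""
    word_counts = {"ham": Counter(), "spam": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, doc_counts


def spam_probability(text, word_counts, doc_counts):
    """Posterior P(spam | words), treating words as independent given the class."""
    vocab = set(word_counts["ham"]) | set(word_counts["spam"])
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in ("ham", "spam"):
        total_words = sum(word_counts[label].values())
        # Start from the log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids zero probabilities (an assumption).
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        log_score[label] = score
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_score.values())
    weights = {label: math.exp(s - top) for label, s in log_score.items()}
    return weights["spam"] / (weights["ham"] + weights["spam"])
```

A filter of this kind only knows the words it was trained on, which is one reason such filters can reach their limits when spam deliberately varies its vocabulary.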
92	Bayesian Networks with Expert Elicitation as Applicable to Student Retention in Institutional Research Dunn, Jessamine Corey 13 May 2016 (has links) The application of Bayesian networks within the field of institutional research is explored through the development of a Bayesian network used to predict first- to second-year retention of undergraduates. A hybrid approach to model development is employed, in which formal elicitation of subject-matter expertise is combined with machine learning in designing model structure and specification of model parameters. Subject-matter experts include two academic advisors at a small, private liberal arts college in the southeast, and the data used in machine learning include six years of historical student-related information (i.e., demographic, admissions, academic, and financial) on 1,438 first-year students. Netica 5.12, a software package designed for constructing Bayesian networks, is used for building and validating the model. Evaluation of the resulting model’s predictive capabilities is examined, as well as analyses of sensitivity, internal validity, and model complexity. Additionally, the utility of using Bayesian networks within institutional research and higher education is discussed. The importance of comprehensive evaluation is highlighted, due to the study’s inclusion of an unbalanced data set. Best practices and experiences with expert elicitation are also noted, including recommendations for use of formal elicitation frameworks and careful consideration of operating definitions. Academic preparation and financial need risk profile are identified as key variables related to retention, and the need for enhanced data collection surrounding such variables is also revealed. For example, the experts emphasize study skills as an important predictor of retention while noting the absence of collection of quantitative data related to measuring students’ study skills. 
Finally, the importance and value of the model development process is stressed, as stakeholders are required to articulate, define, discuss, and evaluate model components, assumptions, and results. Bayes Theorem Bayesian Networks Expert Elicitation Institutional Research Retention
93	Macroeconomic Applications of Bayesian Model Averaging Moser, Mathias 02 1900 (has links) (PDF) Bayesian Model Averaging (BMA) is a common econometric tool to assess the uncertainty regarding model specification and parameter inference and is widely applied in fields where no strong theoretical guidelines are present. Its major advantage over single-equation models is the combination of evidence from a large number of specifications. The three papers included in this thesis all investigate model structures in the BMA model space. The first contribution evaluates how priors can be chosen to enforce model structures in the presence of interaction terms and multicollinearity. This is linked to a discussion in the Journal of Applied Econometrics regarding the question whether being a Sub-Saharan African country makes a difference for growth modelling. The second essay is concerned with clusters of different models in the model space. We apply Latent Class Analysis to the set of sampled models from BMA and identify different subsets (kinds) of models for two well-known growth data sets. The last paper focuses on the application of "jointness", which tries to find bivariate relationships between regressors in BMA. Accordingly, this approach attempts to identify substitutes and complements by linking the econometric discussion on this subject to the field of Machine Learning. (author's abstract) RVK QH 233 ; JEL C11, C21, O47 Bayes-Verfahren
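The way BMA combines evidence from many specifications can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each candidate model is assumed to supply a log marginal likelihood and a point estimate of the coefficient of interest, and the names `bma_weights` and `bma_average` are hypothetical. The priors and sampling schemes studied in the thesis are not reproduced.

```python
import math


def bma_weights(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities, proportional to likelihood times prior."""
    n = len(log_marginal_likelihoods)
    if prior_probs is None:
        # Assumption: a uniform prior over the candidate models.
        prior_probs = [1.0 / n] * n
    log_post = [
        lml + math.log(p) for lml, p in zip(log_marginal_likelihoods, prior_probs)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    unnormalised = [math.exp(v - top) for v in log_post]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def bma_average(estimates, weights):
    """Model-averaged estimate: per-model estimates weighted by model probability."""
    return sum(w * e for w, e in zip(weights, estimates))
```

In this sketch, a model that excludes the regressor would contribute an estimate of zero, so a coefficient that appears only in low-probability models is shrunk toward zero in the average.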
94	A Bayesian expected error reduction approach to Active Learning Fredlund, Richard January 2011 (has links) There has been growing recent interest in the field of active learning for binary classification. This thesis develops a Bayesian approach to active learning which aims to minimise the objective function on which the learner is evaluated, namely the expected misclassification cost. We call this approach the expected cost reduction approach to active learning. In this form of active learning queries are selected by performing a 'lookahead' to evaluate the associated expected misclassification cost. Firstly, we introduce the concept of a query density to explicitly model how new data is sampled. An expected cost reduction framework for active learning is then developed which allows the learner to sample data according to arbitrary query densities. The model makes no assumption of independence between queries, instead updating model parameters on the basis of both which observations were made and how they were sampled. This approach is demonstrated on the probabilistic high-low game, which is a non-separable extension of the high-low game presented by Seung et al. (1993). The results indicate that the Bayes expected cost reduction approach performs significantly better than passive learning even when there is considerable overlap between the class distributions, covering 30% of input space. For the probabilistic high-low game, however, narrow queries appear to consistently outperform wide queries. We therefore conclude the first part of the thesis by investigating whether or not this is always the case, demonstrating examples where sampling broadly is favourable to a single input query. Secondly, we explore the Bayesian expected cost reduction approach to active learning within the pool-based setting. 
This is where learning is limited to a finite pool of unlabelled observations from which the learner may select observations to be queried for class-labels. Our implementation of this approach uses Gaussian process classification with the expectation propagation approximation to make the necessary inferences. The implementation is demonstrated on six benchmark data sets and again demonstrates superior performance to passive learning.
95	Optimización del clasificador “naive bayes” usando árbol de decisión C4.5 [Optimization of the naive Bayes classifier using the C4.5 decision tree] Alarcón Jaimes, Carlos January 2015 (has links) The Naive Bayes classifier is one of the most effective classification models, owing to its simplicity, robustness to noise, short processing time and high predictive power. It makes a strong assumption of independence among the predictor variables given the class, which generally does not hold. Much research seeks to improve the classifier's predictive power by relaxing this independence assumption, for example by choosing a subset of variables that are independent or approximately independent. This work presents a method that seeks to optimize the Naive Bayes classifier using the C4.5 decision tree. The method selects a subset of variables from the data set using the induced C4.5 decision tree and then applies the Naive Bayes classifier to the selected variables. Using the C4.5 decision tree beforehand removes redundant and/or irrelevant variables from the data set and keeps those that are most informative for classification tasks, thereby improving the classifier's predictive power. The method is illustrated on three data sets from the UCI Machine Learning Repository of the University of California, Irvine, and one data set from the National Household Survey of Peru's Instituto Nacional de Estadística e Informática (ENAHO-INEI), and is implemented with the WEKA program. Bayesian networks Bayesian classifier Naive Bayes C4.5 decision tree
96	Using machine learning to classify news articles Lagerkrants, Eleonor, Holmström, Jesper January 2016 (has links) In today’s society a large portion of the world's population gets their news on electronic devices. This opens up the possibility to enhance their reading experience by personalizing news for the readers based on their previous preferences. We have conducted an experiment to find out how accurately a Naïve Bayes classifier can select articles that a user might find interesting. Our experiments were done on two users who read and classified 200 articles as interesting or not interesting. Those articles were divided into four datasets with the sizes 50, 100, 150 and 200. We used a Naïve Bayes classifier with 16 different settings configurations to classify the articles into two categories. From these experiments we could find several settings configurations that showed good results. One settings configuration was chosen as a good general setting for this kind of problem. We found that for datasets with a size larger than 50 there was no significant increase in classification confidence. Machine learning Naive Bayes News articles text classification WEKA
97	Methods for Bayesian inversion of seismic data Walker, Matthew James January 2015 (has links) The purpose of Bayesian seismic inversion is to combine information derived from seismic data and prior geological knowledge to determine a posterior probability distribution over parameters describing the elastic and geological properties of the subsurface. Typically the subsurface is modelled by a cellular grid model containing thousands or millions of cells within which these parameters are to be determined. Thus such inversions are computationally expensive due to the size of the parameter space (being proportional to the number of grid cells) over which the posterior is to be determined. Therefore, in practice approximations to Bayesian seismic inversion must be considered. A particular, existing approximate workflow is described in this thesis: the so-called two-stage inversion method explicitly splits the inversion problem into elastic and geological inversion stages. These two stages sequentially estimate the elastic parameters given the seismic data, and then the geological parameters given the elastic parameter estimates, respectively. In this thesis a number of methodologies are developed which enhance the accuracy of this approximate workflow. To reduce computational cost, existing elastic inversion methods often incorporate only simplified prior information about the elastic parameters. Thus a method is introduced which transforms such results, obtained using prior information specified using only two-point geostatistics, into new estimates containing sophisticated multi-point geostatistical prior information. The method uses a so-called deep neural network, trained using only synthetic instances (or 'examples') of these two estimates, to apply this transformation. The method is shown to improve the resolution and accuracy (by comparison to well measurements) of elastic parameter estimates determined for a real hydrocarbon reservoir. 
It has been shown previously that so-called mixture density network (MDN) inversion can be used to solve geological inversion analytically (and thus very rapidly and efficiently) but only under certain assumptions about the geological prior distribution. A so-called prior replacement operation is developed here, which can be used to relax these requirements. It permits the efficient MDN method to be incorporated into general stochastic geological inversion methods which are free from the restrictive assumptions. Such methods rely on the use of Markov-chain Monte-Carlo (MCMC) sampling, which estimates the posterior (over the geological parameters) by producing a correlated chain of samples from it. It is shown that this approach can yield biased estimates of the posterior. Thus an alternative method which obtains a set of non-correlated samples from the posterior is developed, avoiding the possibility of bias in the estimate. The new method was tested on a synthetic geological inversion problem; its results compared favourably to those of Gibbs sampling (an MCMC method) on the same problem, which exhibited very significant bias. The geological prior information used in seismic inversion can be derived from real images which bear similarity to the geology anticipated within the target region of the subsurface. Such so-called training images, from which this information (in the form of geostatistics) may be extracted, are not always available. In this case appropriate training images may be generated by geological experts. However, this process can be costly and difficult. Thus an elicitation method (based on a genetic algorithm) is developed here which obtains the appropriate geostatistics reliably and directly from a geological expert, without the need for training images. Twelve experts were asked to use the algorithm (individually) to determine the appropriate geostatistics for a physical (target) geological image. 
The majority of the experts were able to obtain a set of geostatistics which were consistent with the true (measured) statistics of the target image. 551.22
98	The prediction of HLA genotypes from next generation sequencing and genome scan data Farrell, John J. 22 January 2016 (has links) Genome-wide association studies have very successfully found highly significant disease associations with single nucleotide polymorphisms (SNP) in the Major Histocompatibility Complex for adverse drug reactions, autoimmune diseases and infectious diseases. However, the extensive linkage disequilibrium in the region has made it difficult to unravel the HLA alleles underlying these diseases. Here I present two methods to comprehensively predict 4-digit HLA types from the two types of experimental genome data widely available. The Virtual SNP Imputation approach was developed for genome scan data and demonstrated a high precision and recall (96% and 97% respectively) for the prediction of HLA genotypes. A reanalysis of 6 genome-wide association studies using the HLA imputation method identified 18 significant HLA allele associations for 6 autoimmune diseases: 2 in ankylosing spondylitis, 2 in autoimmune thyroid disease, 2 in Crohn's disease, 3 in multiple sclerosis, 2 in psoriasis and 7 in rheumatoid arthritis. The EPIGEN consortium also used the Virtual SNP Imputation approach to detect a novel association of HLA-A*31:01 with adverse reactions to carbamazepine. For the prediction of HLA genotypes from next generation sequencing data, I developed a novel approach using a naïve Bayes algorithm called HLA-Genotyper. The validation results covered whole genome, whole exome and RNA-Seq experimental designs in the European and Yoruba population samples available from the 1000 Genomes Project. The RNA-Seq data gave the best results with an overall precision and recall near 0.99 for Europeans and 0.98 for the Yoruba population. 
I then successfully used the method on targeted sequencing data to detect significant associations of idiopathic membranous nephropathy with HLA-DRB1*03:01 and HLA-DQA1*05:01 using the 1000 Genomes European subjects as controls. Using the results reported here, researchers may now readily unravel the association of HLA alleles with many diseases from genome scans and next generation sequencing experiments without the expensive and laborious HLA typing of thousands of subjects. Both algorithms enable the analysis of diverse populations to help researchers pinpoint HLA loci with biological roles in infection, inflammation, autoimmunity, aging, mental illness and adverse drug reactions. Bioinformatics GWAS HLA Naive Bayes Next generation sequencing
99	Discretização para aprendizagem bayesiana: aplicação no auxílio à validação de dados em proteção ao vôo [Discretization for Bayesian learning: application to supporting data validation in flight protection]. Jackson Paul Matsuura 00 December 2003 (has links) The use of Bayesian networks, which are a compact representation of the joint probability distributions of a domain, has been growing in many areas and applications. Bayesian networks can be built from expert knowledge or by Bayesian learning algorithms that infer the relationships among the domain variables from a set of training data. Manual construction of Bayesian networks, which can be laborious, expensive and error-prone, is increasingly being set aside in favour of Bayesian learning algorithms. Learning algorithms, however, generally assume that the variables used in learning are discrete or, if continuous, follow a Gaussian distribution, which is usually not the case in practice. To use Bayesian learning, the variables must therefore be discretized according to some criterion, which in the simplest case can be a uniform discretization. The great majority of existing discretization methods, however, are not suited to Bayesian learning, because they were developed in the context of classification rather than knowledge discovery. This work proposes and applies a variable discretization method that takes the variables' conditional distributions into account during discretization, aiming at a better result from the Bayesian learning process. The proposed method was applied to a real database of flight protection (Proteção ao Vôo) information, and the resulting Bayesian network was used to support data validation by performing an automated screening of the data. The proposed discretization method was compared with one of the most common methods. 
The results obtained show the effectiveness of the proposed discretization method and point to great potential for this new application of Bayesian learning and inference. Bayes' theorem Multivariate statistical analysis Validation Mathematics
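The uniform discretization that the abstract names as the simplest case can be sketched as equal-width binning in Python. This shows only that baseline, not the method proposed in the thesis, which additionally uses the variables' conditional distributions. The function name `uniform_discretize` and the handling of constant inputs are assumptions.

```python
def uniform_discretize(values, n_bins):
    """Map each continuous value to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable falls entirely into bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min() keeps the maximum value inside the last bin instead of overflowing.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Because the bin edges here depend only on the range of one variable, they ignore how that variable relates to the others, which is the limitation the abstract raises for Bayesian learning.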
100	Análise de propagação em vegetação utilizando Bayes e UT [Propagation analysis in vegetation using Bayes and UT] Loureiro, Alexandre José Figueiredo 30 September 2018 (has links) Thesis (doctorate), Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2018. / Vegetation is considered a complex environment for analysing scattering and attenuation in radio wave propagation. This thesis presents a Bayesian predictor of radio wave propagation attenuation in vegetation, based on its correlation with the vegetation pixels of an image and exploiting the computational advantages of the unscented transform (UT). Satellite image processing can refine the planning of radio systems by using vegetation as a predictor of attenuation. In this work, the prediction is based on a correlation of more than 56% between RGB pixel values and vegetation attenuation values obtained from three groups of power measurements in field tests at centimetre wavelengths in two distinct regions of Brazil: Belo Horizonte, in the southeast, with measurements at 18 GHz, and Manaus, in the north, at 24 GHz. Applied to the two groups of measurements in Manaus, this prediction showed correlations of 0.59 and 0.56 respectively, while in Belo Horizonte it showed a correlation of 0.57. Statistical analyses showed that more than 30% of the attenuation variance in these three groups of measurements can be explained by the RGB pixel values. Using this linear correlation model between vegetation RGB pixels and geolocated attenuation values, this work combines the unscented transform (UT) and Bayesian inference to refine the vegetation attenuation distribution. Because the necessary multiplication of the Bayesian prior and sampling distributions is not readily available in the UT, this work presents a method that computes new common sigma points, but with different weights for the UT prior and sampling distributions, thereby enabling the Bayes multiplication. 
/ Vegetation is considered a complex environment for analysis of scattering and attenuation in radio propagation phenomena. This thesis presents a Bayesian predictor for radio propagation attenuation through vegetation based on its correlation with vegetation pixels from an image and utilizing the computational advantages of the unscented transform (UT). Satellite image processing can improve planning of radio systems with a vegetation attenuation predictor. In this research, the prediction is based on the correlation of more than 56% between RGB pixel values and vegetation attenuation taken from three groups of power measurements at centimeter waves at two distinct regions of Brazil: Belo Horizonte, in the southeast region measured at 18 GHz, and Manaus at 24 GHz in the north region. This prediction applied to two groups of power measurements at Manaus showed correlations of 0.62 and 0.56 respectively, while at Belo Horizonte it showed a correlation of 0.57. The statistical analysis showed that more than 30% of the attenuation variance in these three measurement groups was due to the RGB pixel values. Using this linear correlated model between vegetation pixel RGB values and geolocated attenuation values, this work combined the unscented transform (UT) and Bayesian inference to refine the vegetation attenuation distribution. Since the necessary multiplication of the Bayes prior and sampling distributions is not easily available in the UT, this research presents a method that calculates new common sigma points and different new weights for the prior and sampling UT distributions, thus allowing the Bayes multiplication. Radio wave propagation Radio waves Bayes' theorem Uncertainty
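The "Bayes multiplication" of a prior and a sampling distribution has a closed form when both are Gaussian, which a short Python sketch can make concrete. This is a textbook special case offered for orientation only. It does not implement the sigma-point method of the thesis, and the function name `gaussian_posterior` and its parameters are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, obs_mean, obs_var):
    """Multiply a Gaussian prior by a Gaussian likelihood and renormalise.

    The product of two Gaussian densities is proportional to a Gaussian whose
    precision (inverse variance) is the sum of the two precisions.
    """
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    # The posterior mean is the precision-weighted average of the two means.
    post_mean = post_var * (prior_mean / prior_var + obs_mean / obs_var)
    return post_mean, post_var
```

For example, a prior attenuation estimate of 10 with variance 4 combined with a measurement-based estimate of 14 with variance 4 gives a posterior mean of 12 and variance 2 (illustrative numbers, not data from the thesis).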
