241 |
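Classificação de imagens de diversas fontes de informação com o uso de controladores de influência para as imagens e suas classes / Classification of images from multiple information sources using influence controllers for the images and their classes
Orlando Alves Máximo 19 December 2008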
This work addresses supervised image classification techniques that use influence controllers. It evaluates the performance obtained with influence controllers for the images and for the classes present in them. To determine the controller values, methods were proposed for estimating the influence controllers of the images and of their classes. Notable among these are indicators of separability between the classes of an image and indicators derived from the kappa coefficient and the overall accuracy of the classification. A new classifier is also proposed that incorporates the concept of influence controllers through the conditional occurrence probabilities of the classes present in the images. The influence controllers were evaluated on six sets of two SAR images each (the originals and versions filtered with mean filters using 3×3, 5×5, 7×7, 9×9 and 11×11 windows). The proposed classifiers outperformed the Cascade, Euclidean Distance and Mahalanobis Distance classifiers, which do not incorporate influence controllers in their structure. The classifier based on the conditional occurrence probabilities of the classes was tested on four sets of simulated SAR images, and the results show that it outperformed the Cascade Classifier.
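The kappa coefficient and overall accuracy named in this abstract are both computed from a classification confusion matrix. The following is a minimal sketch of those two indicators only; the function names and the example matrix are illustrative assumptions, and how the thesis converts such indicators into influence-controller values is not stated in the abstract.

```python
def overall_accuracy(confusion):
    """Fraction of samples that lie on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
example = [[45, 5], [10, 40]]
accuracy = overall_accuracy(example)
kappa = cohen_kappa(example)
```

For this hypothetical matrix the overall accuracy is 0.85 and kappa is about 0.70, because chance agreement alone would already account for half of the samples.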
|
242 |
Estudo de técnicas em análise de dados de detectores de ondas gravitacionais / A study of data analysis techniques for gravitational wave detectors
Helmo Alan Batista de Araújo 08 July 2008
This work first investigates whether a novel time-frequency transform, known as the S transform, can be used to analyse data from the ALLEGRO gravitational wave detector. Its usefulness for this detector turns out to be limited by the detector's narrow bandwidth, although it is argued that it may be useful for interferometric detectors. Next, a robust data analysis method is presented, based on a hypothesis test known as the Neyman-Pearson criterion, for identifying candidate events for impulsive signals. The method consists of constructing probability distribution functions for the weighted mean energy of the data blocks recorded by the detector, both in the absence of a signal and when a signal is mixed with the noise. From these distributions one can find the probability that the data block in which a candidate event is located does not coincide with a noise block. This way of searching for candidate signals buried in noise agrees with another method used for the same purpose. The method is concluded to be promising because it does not require a more refined process in the search for candidate events, which reduces computational processing time.
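The Neyman-Pearson criterion fixes a false-alarm probability and then chooses the decision threshold that maximises detection probability. The sketch below illustrates this for a deliberately simplified case: it assumes the test statistic is Gaussian with equal variance under both hypotheses, which is an assumption made here for illustration and not the weighted-mean-energy distributions built in the thesis. The function names and parameter values are hypothetical.

```python
from statistics import NormalDist


def np_threshold(noise_only, false_alarm_prob):
    """Threshold t such that P(statistic > t | noise only) equals false_alarm_prob."""
    return noise_only.inv_cdf(1.0 - false_alarm_prob)


def detection_probability(signal_plus_noise, threshold):
    """P(statistic > threshold | signal present)."""
    return 1.0 - signal_plus_noise.cdf(threshold)


# Assumed distributions of the test statistic (illustrative values only).
noise_only = NormalDist(mu=0.0, sigma=1.0)
signal_plus_noise = NormalDist(mu=3.0, sigma=1.0)

threshold = np_threshold(noise_only, false_alarm_prob=0.01)
p_detect = detection_probability(signal_plus_noise, threshold)
```

With equal variances and a larger mean under the signal hypothesis, the likelihood ratio increases with the statistic, so thresholding the statistic itself is equivalent to thresholding the likelihood ratio. For these assumed values the threshold is about 2.33 and the detection probability roughly 0.75.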
|
243 |
Bayesian Nonparametric Models for Multi-Stage Sample Surveys
Yin, Jiani 27 April 2016
It is a standard practice in small area estimation (SAE) to use a model-based approach to borrow information from neighboring areas or from areas with similar characteristics. However, survey data tend to have gaps, ties and outliers, and parametric models may be problematic because statistical inference is sensitive to parametric assumptions. We propose nonparametric hierarchical Bayesian models for multi-stage finite population sampling to robustify the inference and allow for heterogeneity, outliers, skewness, etc. Bayesian predictive inference for SAE is studied by embedding a parametric model in a nonparametric model. The Dirichlet process (DP) has attractive properties such as clustering that permits borrowing information. We exemplify by considering in detail two-stage and three-stage hierarchical Bayesian models with DPs at various stages. The computational difficulties of the predictive inference when the population size is much larger than the sample size can be overcome by the stick-breaking algorithm and approximate methods. Moreover, the model comparison is conducted by computing log pseudo marginal likelihood and Bayes factors. We illustrate the methodology using body mass index (BMI) data from the National Health and Nutrition Examination Survey and simulated data. We conclude that a nonparametric model should be used unless there is a strong belief in the specific parametric form of a model.
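The stick-breaking construction mentioned in this abstract represents a draw from a Dirichlet process as an infinite sequence of weights, which in practice is truncated. The following is a minimal sketch of a truncated stick-breaking draw; the function name, the truncation level and the concentration value are assumptions chosen for illustration and are not taken from the thesis.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Draw truncated stick-breaking weights for a DP with concentration alpha.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    The final atom receives all leftover mass so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_atoms - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights


# Hypothetical settings: concentration 2.0, truncation at 50 atoms, fixed seed.
weights = stick_breaking_weights(alpha=2.0, num_atoms=50, rng=random.Random(0))
```

A smaller `alpha` concentrates mass on the first few atoms, which corresponds to the clustering behaviour that lets areas borrow information from one another.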
|
244 |
Medical data mining using Bayesian network and DNA sequence analysis.
January 2004
Lee Kit Ying. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 115-117). / Abstracts in English and Chinese.
Abstract / Acknowledgement
Chapter 1 --- Introduction / Chapter 1.1 --- Project Background / Chapter 1.2 --- Problem Specifications / Chapter 1.3 --- Contributions / Chapter 1.4 --- Thesis Organization
Chapter 2 --- Background / Chapter 2.1 --- Medical Data Mining / Chapter 2.1.1 --- General Information / Chapter 2.1.2 --- Related Research / Chapter 2.1.3 --- Characteristics and Difficulties Encountered / Chapter 2.2 --- DNA Sequence Analysis / Chapter 2.3 --- Hepatitis B Virus / Chapter 2.3.1 --- Virus Characteristics / Chapter 2.3.2 --- Important Findings on the Virus / Chapter 2.4 --- Bayesian Network and its Classifiers / Chapter 2.4.1 --- Formal Definition / Chapter 2.4.2 --- Existing Learning Algorithms / Chapter 2.4.3 --- Evolutionary Algorithms and Hybrid EP (HEP) / Chapter 2.4.4 --- Bayesian Network Classifiers / Chapter 2.4.5 --- Learning Algorithms for BN Classifiers
Chapter 3 --- Bayesian Network Classifier for Clinical Data / Chapter 3.1 --- Related Work / Chapter 3.2 --- Proposed BN-augmented Naive Bayes Classifier (BAN) / Chapter 3.2.1 --- Definition / Chapter 3.2.2 --- Learning Algorithm with HEP / Chapter 3.2.3 --- Modifications on HEP / Chapter 3.3 --- Proposed General Bayesian Network with Markov Blanket (GBN) / Chapter 3.3.1 --- Definition / Chapter 3.3.2 --- Learning Algorithm with HEP / Chapter 3.4 --- Findings on Bayesian Network Parameters Calculation / Chapter 3.4.1 --- Situation and Errors / Chapter 3.4.2 --- Proposed Solution / Chapter 3.5 --- Performance Analysis on Proposed BN Classifier Learning Algorithms / Chapter 3.5.1 --- Experimental Methodology / Chapter 3.5.2 --- Benchmark Data / Chapter 3.5.3 --- Clinical Data / Chapter 3.5.4 --- Discussion / Chapter 3.6 --- Summary
Chapter 4 --- Classification in DNA Analysis / Chapter 4.1 --- Related Work / Chapter 4.2 --- Problem Definition / Chapter 4.3 --- Proposed Methodology Architecture / Chapter 4.3.1 --- Overall Design / Chapter 4.3.2 --- Important Components / Chapter 4.4 --- Clustering / Chapter 4.5 --- Feature Selection Algorithms / Chapter 4.5.1 --- Information Gain / Chapter 4.5.2 --- Other Approaches / Chapter 4.6 --- Classification Algorithms / Chapter 4.6.1 --- Naive Bayes Classifier / Chapter 4.6.2 --- Decision Tree / Chapter 4.6.3 --- Neural Networks / Chapter 4.6.4 --- Other Approaches / Chapter 4.7 --- Important Points on Evaluation / Chapter 4.7.1 --- Errors / Chapter 4.7.2 --- Independent Test / Chapter 4.8 --- Performance Analysis on Classification of DNA Data / Chapter 4.8.1 --- Experimental Methodology / Chapter 4.8.2 --- Using Naive-Bayes Classifier / Chapter 4.8.3 --- Using Decision Tree / Chapter 4.8.4 --- Using Neural Network / Chapter 4.8.5 --- Discussion / Chapter 4.9 --- Summary
Chapter 5 --- Adaptive HEP for Learning Bayesian Network Structure / Chapter 5.1 --- Background / Chapter 5.1.1 --- Objective / Chapter 5.1.2 --- Related Work - AEGA / Chapter 5.2 --- Feasibility Study / Chapter 5.3 --- Proposed A-HEP Algorithm / Chapter 5.3.1 --- Structural Dissimilarity Comparison / Chapter 5.3.2 --- Dynamic Population Size / Chapter 5.4 --- Evaluation on Proposed Algorithm / Chapter 5.4.1 --- Experimental Methodology / Chapter 5.4.2 --- Comparison on Running Time / Chapter 5.4.3 --- Comparison on Fitness of Final Network / Chapter 5.4.4 --- Comparison on Similarity to the Original Network / Chapter 5.4.5 --- Parameter Study / Chapter 5.5 --- Applications on Medical Domain / Chapter 5.5.1 --- Discussion / Chapter 5.5.2 --- An Example / Chapter 5.6 --- Summary
Chapter 6 --- Conclusion / Chapter 6.1 --- Summary / Chapter 6.2 --- Future Work
Bibliography
|
245 |
Categorização hierárquica de textos em um portal agregador de notícias / Hierarchical text categorization in a news aggregator portal
Borges, Hugo Lima January 2009
Advisor: Ana Carolina Lorena / Dissertation (master's) - Universidade Federal do ABC. Programa de Pós-Graduação em Engenharia da Informação, 2009
|
246 |
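Uma proposta lingüística para a edução dos parâmetros de redes Bayesianas-Fuzzy na estimação da probabilidade de erro humano / A linguistic proposal for eliciting the parameters of Bayesian-fuzzy networks in the estimation of human error probability
SALES FILHO, Romero Luiz Mendonça 31 January 2008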
Previous issue date: 2008 / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / A severe scarcity of data is common in probabilistic risk analysis (PRA). Several methods have been proposed in the literature to work around this problem; they are known as expert opinion elicitation methods. In these methods the analyst turns to experts with extensive knowledge of the problem under analysis. The experts, in turn, give opinions on the parameter under investigation, from which the analyst obtains an estimate of the unknown value. This work proposes an elicitation method that can handle linguistic variables, so that a fuzzy estimate of a given parameter is obtained at the end of the process. Specifically, the idea is to obtain fuzzy estimates of conditional probabilities, which are then used in a Bayesian-fuzzy network to estimate the probability of human error. An application example involving an assistant electrician engaged in replacing insulator strings on transmission lines is discussed at the end of the work.
|
247 |
Uma abordagem bayesiana para mapeamento de QTLs em populações experimentais / A Bayesian approach for mapping QTL in experimental populations
Andréia da Silva Meyer 03 April 2009
Many traits in plants and animals are quantitative in nature and influenced by multiple genes. New molecular techniques have made it possible to map the loci that control quantitative traits, known as QTL (Quantitative Trait Loci). Mapping a QTL means identifying its position in the genome and estimating its genetic effects. The main difficulty in QTL mapping is that the number of QTL is unknown. Bayesian methods combined with Markov chain Monte Carlo (MCMC) have been used to jointly infer the number of QTL, their positions in the genome and their genetic effects. The challenge lies in sampling from the joint posterior distribution of these parameters, since the number of QTL is treated as unknown and the dimension of the parameter space changes with the number of QTL in the model. In this work, a Bayesian approach to QTL mapping that includes multiple QTL and epistatic effects in the model was implemented in the statistical software R. Models with increasing numbers of QTL were fitted, and the Bayes factor was used to select the most suitable model and, consequently, to estimate the number of QTL controlling the phenotypes of interest. To assess the efficiency of the implemented methodology, a simulation study was carried out for two experimental populations, backcross and F2, in each case considering models with and without epistasis. The approach proved very efficient: in every situation considered, the selected model was the one containing the true number of QTL used to simulate the data. In addition, QTL were mapped for three phenotypes of tropical maize (plant height, ear height and grain yield) using the implemented methodology, and the results were compared with those obtained by the CIM method.
|
248 |
Sparse Gaussian process approximations and applications
van der Wilk, Mark January 2019
Many tasks in machine learning require learning some kind of input-output relation (function), for example, recognising handwritten digits (from image to number) or learning the motion behaviour of a dynamical system like a pendulum (from positions and velocities now to future positions and velocities). We consider this problem using the Bayesian framework, where we use probability distributions to represent the state of uncertainty that a learning agent is in. In particular, we will investigate methods which use Gaussian processes to represent distributions over functions. Gaussian process models require approximations in order to be practically useful. This thesis focuses on understanding existing approximations and investigating new ones tailored to specific applications. We advance the understanding of existing techniques first through a thorough review. We propose desiderata for non-parametric basis function model approximations, which we use to assess the existing approximations. Following this, we perform an in-depth empirical investigation of two popular approximations (VFE and FITC). Based on the insights gained, we propose a new inter-domain Gaussian process approximation, which can be used to increase the sparsity of the approximation, in comparison to regular inducing point approximations. This allows GP models to be stored and communicated more compactly. Next, we show that inter-domain approximations can also allow the use of models which would otherwise be impractical, rather than just improving existing approximations. We introduce an inter-domain approximation for the Convolutional Gaussian process - a model that makes Gaussian processes suitable for image inputs, and which has strong relations to convolutional neural networks. This same technique is valuable for approximating Gaussian processes with more general invariance properties. Finally, we revisit the derivation of the Gaussian process State Space Model, and discuss some subtleties relating to its approximation. We hope that this thesis illustrates some benefits of non-parametric models and their approximation in a non-parametric fashion, and that it provides models and approximations that prove to be useful for the development of more complex and performant models in the future.
|
249 |
Automated retrieval and extraction of training course information from unstructured web pages
Xhemali, Daniela January 2010
Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages and there is much need for systems that will locate, extract and integrate the acquired knowledge into organisations' practices. Some commercial, automated web extraction software packages exist; however, their success comes from heavily involving their users in the process of finding the relevant web pages, preparing the system to recognise items of interest on these pages and manually dealing with the evaluation and storage of the extracted results. This research has explored WIE, specifically with regard to the automation of the extraction and validation of online training information. The work also includes research and development in the area of automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; after much consideration, Naïve Bayes Networks were chosen as the most suitable for the development of the classification system. The extraction part of the system used Genetic Programming (GP) for the generation of web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web such as: course names, prices, dates and locations. The experimental results indicate that all three aspects of this research perform very well, with the Web Crawler outperforming existing crawling systems, the Web Classifier performing with an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieving an accuracy of over 94% for the extraction of course titles and an accuracy of just under 67% for the extraction of other course attributes such as dates, prices and locations. Furthermore, the overall work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research works in the background and requires very little, often no, human assistance.
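To make the idea of extracting course attributes with regular expressions concrete, here is a small sketch. The patterns below are hand-written for illustration and are not the expressions evolved by Genetic Programming in the thesis; the pattern names, the sample text and the assumption of prices in pounds are all hypothetical.

```python
import re

# Hand-written, illustrative patterns (assumptions, not evolved expressions).
PRICE_PATTERN = re.compile(r"£\s?(\d+(?:\.\d{2})?)")
DATE_PATTERN = re.compile(
    r"\b(\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4})\b"
)


def extract_course_fields(text):
    """Return every price and date found in a block of page text."""
    return {
        "prices": PRICE_PATTERN.findall(text),
        "dates": DATE_PATTERN.findall(text),
    }


# Hypothetical text as it might appear on a training course page.
sample = "Project Management Basics, 12 March 2010, London, £450.00"
fields = extract_course_fields(sample)
```

These simple patterns ignore thousands separators, other currencies and abbreviated month names. That variety across real pages is one reason an automated search over candidate expressions can be attractive.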
|
250 |
Bayesian compartmental models for zoonotic visceral leishmaniasis in the Americas
Ozanne, Marie Veronica 01 May 2019
Visceral leishmaniasis (VL) is a serious neglected tropical disease that is endemic in 98 countries and presents a significant public health risk. The epidemiology of VL is complex. In the Americas, it is a zoonotic disease that is caused by a parasite and transmitted among humans and dogs through the bite of an infected sand fly vector. The infection also can be transmitted vertically from mother to child during pregnancy. Infected individuals can be classified as asymptomatic or symptomatic; both classes can transmit infection. In part due to its complexity, VL transmission dynamics are not fully understood. Stochastic compartmental epidemic models are a powerful set of tools that can be used to study these transmission dynamics.
Past compartmental models for VL have been developed in a deterministic framework to accommodate complexity while remaining computationally tractable. In this work, we propose stochastic compartmental models for VL, which are simpler than their deterministic counterparts, but also have several advantages. Notably, this framework allows us to: (1) define a probability of infection transmission between two individuals, (2) obtain both parameter estimates and corresponding uncertainty measures, and (3) employ formal model comparisons.
In this dissertation, we develop both population level and individual level Bayesian compartmental models to study both vector and vertical VL transmission dynamics. As part of this model development, we introduce a compartmental model that allows for two infectious classes. We also derive source specific reproductive numbers to quantify the contributions of different species and infectious classes to maintaining infection in a population. Finally, we propose a formal model comparison method for Bayesian models with high-dimensional discrete parameter spaces. These models, reproductive numbers, and model comparison method are explored in the context of simulations and real VL data from Brazil and the United States.
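A stochastic compartmental model assigns each susceptible individual a probability of becoming infected in each time step, rather than moving deterministic fractions between compartments. The sketch below simulates a generic chain-binomial SIR model to show that idea. It is far simpler than the models described in this abstract, which include vectors, vertical transmission and two infectious classes; the function names, the transmission formula and the parameter values are assumptions made for illustration.

```python
import math
import random


def binomial_draw(n, p, rng):
    """Number of successes in n independent trials with success probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_chain_binomial_sir(s0, i0, r0, beta, gamma, steps, rng):
    """Simulate a discrete-time stochastic SIR model.

    Per step, each susceptible is infected with probability 1 - exp(-beta * I / N)
    and each infectious individual recovers with probability 1 - exp(-gamma).
    """
    s, i, r = s0, i0, r0
    n = s0 + i0 + r0
    history = [(s, i, r)]
    for _ in range(steps):
        p_infect = 1.0 - math.exp(-beta * i / n)
        p_recover = 1.0 - math.exp(-gamma)
        new_infections = binomial_draw(s, p_infect, rng)
        new_recoveries = binomial_draw(i, p_recover, rng)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical population and rates, with a fixed seed for repeatability.
history = simulate_chain_binomial_sir(
    s0=990, i0=10, r0=0, beta=0.5, gamma=0.2, steps=30, rng=random.Random(1)
)
```

Because each transition is an explicit random draw with a stated probability, the same structure yields a likelihood for observed counts, which is what makes parameter uncertainty and formal model comparison available in a Bayesian treatment.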
|