311 |
Mapeamento da mortalidade infantil no Rio Grande do Sul: uma comparação entre as abordagens empírica bayesiana e totalmente bayesiana [Mapping infant mortality in Rio Grande do Sul: a comparison between the empirical Bayes and fully Bayesian approaches]. Silva, Sabrina Letícia Couto da. January 2009 (has links)
The infant mortality rate (IMR) has been used as one of the main indicators of the quality of life of a population. It reflects the levels of health and socioeconomic development in a given area and is considered one of the most important epidemiological indicators. The spatial dispersion of the risk of an event, for aggregated data, is usually analysed through incidence-rate maps, in which areas are shaded according to the calculated rate. A major problem with rates, however, is their high instability when expressing the risk of rare events in regions with small populations. Alternatively, spatial statistical methods for disease mapping, namely Empirical Bayes and fully Bayesian estimation, use information from the whole region or from the neighbourhood to estimate the risk of the event in each area. This study applies and compares the two IMR estimation methods in the 496 municipalities of Rio Grande do Sul, using data accumulated from 2001 to 2004 (available in DATASUS); it points out the advantages of the Bayesian estimators over the crude rate and compares the estimates obtained from the crude calculation with those from the Bayesian methods. Comparing the Bayesian estimates with the crude ones showed a substantial gain in the interpretation and detection of patterns of variation in infant mortality risk across the municipalities of Rio Grande do Sul. Comparing the two Bayesian methods, the Empirical Bayes estimates smooth less in high-risk areas than the fully Bayesian estimates. The Empirical Bayes method can be used to reduce the variation observed in classical estimation and is implemented in several geoprocessing and spatial epidemiology software packages that health professionals can use easily. The fully Bayesian method, although it also has important statistical properties, still carries a high computational cost, demanding more time for the analyses.
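As a concrete illustration of the Empirical Bayes smoothing discussed above, here is a minimal sketch of a global shrinkage estimator (a method-of-moments variant often attributed to Marshall). It is not the exact estimator used in the dissertation, and the death and birth counts are invented for illustration:

```python
import numpy as np

def eb_smooth(deaths, births):
    """Global Empirical Bayes smoothing of area-level rates.

    Each crude rate is shrunk toward the overall mean; areas with
    small populations (unstable rates) are shrunk the most.
    """
    deaths = np.asarray(deaths, dtype=float)
    births = np.asarray(births, dtype=float)
    raw = deaths / births                      # crude rates
    m = deaths.sum() / births.sum()            # global mean rate
    # method-of-moments estimate of the between-area variance
    s2 = np.average((raw - m) ** 2, weights=births) - m / births.mean()
    s2 = max(s2, 0.0)
    w = s2 / (s2 + m / births)                 # shrinkage weights in [0, 1]
    return w * raw + (1 - w) * m

# Hypothetical data: infant deaths and live births in four municipalities.
deaths = [3, 30, 4, 150]
births = [50, 2000, 80, 12000]
smoothed = eb_smooth(deaths, births)
# Small areas move strongly toward the global mean; large areas barely move.
```

This is the stability gain the abstract refers to: a municipality with 50 births cannot support a reliable rate on its own, so its estimate borrows strength from the whole state.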
|
312 |
Seleção de características para problemas de classificação de documentos [Feature selection for document classification problems]. Pinheiro, Roberto Hugo Wanderley. 31 January 2011 (has links)
Document classification systems serve, broadly, to ease a user's access to a document base. Such systems can be used to detect spam; to recommend magazine news items, scientific articles, or products in an online store; and to refine searches and direct them by subject. One of the greatest difficulties in document classification is its high dimensionality. The bag-of-words approach, used to extract features and obtain the vectors that represent the documents, generates tens of thousands of features. Vectors of this dimension demand a high computational cost and contain irrelevant and redundant information. Feature selection techniques reduce the dimensionality of the representation, speeding up the system and easing classification. However, the feature selection used in document classification problems requires a parameter m that defines how many features will be selected, and finding a good value for m is a complicated and costly procedure. The idea introduced in this work removes the need for the parameter m and guarantees that the selected features cover every document in the training set. To this end, the proposed algorithm iterates over the training documents and, for each document, picks its most relevant feature. If the chosen feature has already been selected, it is ignored; otherwise, it is selected. The number of features is thus known at the end of the run, with no need to declare a value for m in advance. The proposed methods follow this initial idea with some variations: introducing a parameter f to select several features per document; using class-local information; and restricting which documents take part in the selection process. The new algorithms are compared with a classical method (Variable Ranking). The experiments used three datasets and five feature evaluation functions. The results show that the proposed methods achieve better accuracy rates.
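The core parameter-free selection idea described above can be sketched as follows. The `docs` and `scores` names are hypothetical stand-ins for the per-document term sets and a feature evaluation function, and the sketch omits the f-parameter and class-local variants:

```python
def cover_select(docs, scores):
    """Select features so that every training document is covered.

    docs   : list of sets of feature (term) indices, one set per document
    scores : dict mapping feature index -> relevance score
    For each document, take its highest-scoring term, skipping terms
    already selected (that document is then already covered).  The
    number of selected features falls out of the run instead of being
    fixed in advance by a parameter m.
    """
    selected = []
    chosen = set()
    for terms in docs:
        best = max(terms, key=lambda t: scores[t])
        if best not in chosen:
            chosen.add(best)
            selected.append(best)
    return selected

# Hypothetical toy data: three documents over four terms, with
# invented relevance scores (e.g. from chi-square or information gain).
docs = [{0, 1}, {1, 2}, {0, 3}]
scores = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.1}
picked = cover_select(docs, scores)
```

Here the third document contributes no new feature because its best term was already selected for the first document, which is exactly how the final feature count emerges from the data.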
|
313 |
Classificação de séries temporais via Classificador de Bayes empregando Modelos Lineares Dinâmicos [Time series classification via the Bayes classifier using Dynamic Linear Models]. Aguiar, Diana Dorgam de. 09 August 2017 (has links)
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / In this work we present a new approach to Discriminant Analysis (DA) problems in which the observations in the training set come from time series, using the Bayes classifier and modeling the class distributions with Dynamic Linear Models. Theoretical developments were conducted to obtain an analytic form for the posterior class probabilities. Simulation studies were developed to evaluate the proposed approach, to compare different strategies for estimating the model variance, and to determine classification error rates (ER) for comparison with other usual approaches in DA. Time series were simulated with different class-separation structures and with different training-set sizes. The proposed approach was also applied to data from real problems with varying degrees of difficulty with respect to the number of classes, the length of the time series, and the number of observations in the training set; on real data the proposed classifier was compared with other classifiers in terms of error rate. Although more complete studies are needed, the results suggest that the parametric approach developed here constitutes a promising alternative for DA problems with time series, particularly in the challenging setting where the length of the series is much larger than the number of observations per class.
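A minimal sketch of the Bayes classifier underlying this approach follows, assuming generic class-conditional log-likelihood functions. The dissertation obtains these analytically from each class's Dynamic Linear Model; the Gaussian white-noise likelihoods below are illustrative stand-ins, and all numbers are invented:

```python
import numpy as np

def bayes_classify(y, log_liks, priors):
    """Bayes classifier: assign y to the class maximizing the posterior.

    log_liks : list of functions computing log p(y | class k); in the
               dissertation these would come from the Kalman-filter
               prediction-error decomposition of each class's DLM,
               but any class-conditional model works.
    priors   : prior class probabilities pi_k
    """
    log_post = np.array([np.log(p) + ll(y) for p, ll in zip(priors, log_liks)])
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()                    # normalized posterior probabilities
    return int(np.argmax(post)), post

# Illustrative stand-in likelihoods: white-noise series with
# class-specific means (log-likelihood up to an additive constant).
def gauss_ll(mu, sigma):
    return lambda y: -0.5 * np.sum(((np.asarray(y) - mu) / sigma) ** 2)

y = [0.9, 1.1, 1.0, 0.8]                  # a short observed series
label, post = bayes_classify(y, [gauss_ll(0.0, 1.0), gauss_ll(1.0, 1.0)],
                             [0.5, 0.5])
```

The structure is the point: once an analytic class likelihood is available, classification reduces to evaluating it per class and normalizing.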
|
314 |
Análise probabilística de riscos via Redes Bayesianas: uma aplicação na construção de poços multilaterais [Probabilistic risk analysis via Bayesian networks: an application to the construction of multilateral wells]. SANTOS, Wagner Barbosa dos. January 2005 (has links)
Probabilistic risk analysis is a method that helps identify and evaluate risk in complex technological systems, with the aim of improving safety and performance through cost-benefit analysis. The traditional method uses two modeling and evaluation techniques: fault trees and event trees. These techniques, however, have some limitations. The model sometimes becomes a crude approximation of reality because of the assumptions required when modeling the system, such as assuming independence between variables that are in fact dependent, or describing events as dichotomous when in some cases they have several possible states. Another limitation is the difficulty of updating an existing model when new information arrives. Given these limitations, Bayesian networks were adopted to model systems in a way that better approximates reality, allowing constant updating based on the information obtained over the system's useful life. Probabilistic risk analysis via Bayesian networks was validated by applying the technique to the analysis of multilateral technology, the systems used in multilateral oil wells. The application aimed to evaluate the risk in the construction of multilateral wells and, based on the model, to manage that risk during the execution of the activity.
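To illustrate the kind of belief updating that Bayesian networks enable, here is a toy network evaluated by brute-force enumeration. The structure and every probability are invented for illustration and are not taken from the dissertation:

```python
from itertools import product

def joint_p(assign, cpts):
    """Product of conditional probabilities for one full assignment."""
    p = 1.0
    for var, (parents, table) in cpts.items():
        key = tuple(assign[pa] for pa in parents)
        p *= table[key][assign[var]]
    return p

def posterior(query, evidence, cpts):
    """P(query=True | evidence) by brute-force enumeration."""
    variables = list(cpts)
    num = den = 0.0
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if any(assign[v] != val for v, val in evidence.items()):
            continue
        p = joint_p(assign, cpts)
        den += p
        if assign[query]:
            num += p
    return num / den

# Hypothetical two-cause risk model for a drilling operation:
# failure depends on an equipment fault and a human error (invented numbers).
cpts = {
    "fault": ((), {(): {True: 0.1, False: 0.9}}),
    "error": ((), {(): {True: 0.2, False: 0.8}}),
    "failure": (("fault", "error"), {
        (True, True):   {True: 0.95, False: 0.05},
        (True, False):  {True: 0.60, False: 0.40},
        (False, True):  {True: 0.30, False: 0.70},
        (False, False): {True: 0.01, False: 0.99},
    }),
}
# Updating on new evidence: observing a failure raises P(fault),
# the kind of revision fault trees cannot do once built.
prior_fault = posterior("fault", {}, cpts)
post_fault = posterior("fault", {"failure": True}, cpts)
```

Enumeration is exponential in the number of variables; real applications use dedicated inference engines, but the update semantics are the same.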
|
315 |
Learning words and syntactic cues in highly ambiguous contexts. Jones, Bevan Keeley. January 2016 (has links)
The cross-situational word learning paradigm argues that word meanings can be approximated by word-object associations, computed from co-occurrence statistics between words and entities in the world. Lexicon acquisition involves simultaneously guessing (1) which objects are being talked about (the "meaning") and (2) which words relate to those objects. However, most modeling work focuses on acquiring meanings for isolated words, largely neglecting relationships between words or physical entities, which can play an important role in learning. Semantic parsing, on the other hand, aims to learn a mapping between entire utterances and compositional meaning representations where such relations are central. The focus is the mapping between meaning and words, while utterance meanings are treated as observed quantities. Here, we extend the joint inference problem of word learning to account for compositional meanings by incorporating a semantic parsing model for relating utterances to non-linguistic context. Integrating semantic parsing and word learning permits us to explore the impact of word-word and concept-concept relations. The result is a joint-inference problem inherited from the word learning setting where we must simultaneously learn utterance-level and individual word meanings, only now we also contend with the many possible relationships between concepts in the meaning and words in the sentence. To simplify design, we factorize the model into separate modules, one for each of the world, the meaning, and the words, and merge them into a single synchronous grammar for joint inference. There are three main contributions. First, we introduce a novel word learning model and accompanying semantic parser. Second, we produce a corpus which allows us to demonstrate the importance of structure in word learning. Finally, we also present a number of technical innovations required for implementing such a model.
|
316 |
Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines: An experiment comparing the accuracy of classifications using test sets from 2005 and 2017. Rydberg, Filip; Tornfors, Jonas. January 2017 (has links)
Text categorisation is an important tool for organising text data and making information easier to find on the world wide web. Text data can be categorised using machine learning classifiers, which must be trained with data in order to predict results for future input. The authors chose to investigate how accurately two classifiers label recent news articles when the classifier model is trained on older news articles. To reach a result, the authors chose the Naive Bayes and Support Vector Machine classifiers and conducted an experiment: models of both classifiers were trained on news articles from 2005 and tested on news articles from both 2005 and 2017, and the results were compared. Both classifiers performed considerably worse when classifying the news articles from 2017 than when classifying news articles from the same year as the training data.
|
317 |
Bayes linear variance learning for mixed linear temporal models. Randell, David. January 2012 (has links)
Modelling of complex corroding industrial systems is critical to effective inspection and maintenance for assurance of system integrity. Wall thickness and corrosion rate are modelled for multiple dependent corroding components, given observations of minimum wall thickness per component. At each inspection, partial observations of the system are considered. A Bayes Linear approach is adopted, simplifying parameter estimation and avoiding often unrealistic distributional assumptions. Key system variances are modelled, making exchangeability assumptions to facilitate analysis for sparse inspection time-series. A utility based criterion is used to assess quality of inspection design and aid decision making. The model is applied to inspection data from pipework networks on a full-scale offshore platform.
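The Bayes linear adjustment at the heart of this style of analysis updates expectations and variances using only first- and second-order belief specifications, with no full distributional assumptions. A minimal sketch, with invented numbers for a two-component corrosion example:

```python
import numpy as np

def bayes_linear_adjust(EX, VX, CXD, ED, VD, d):
    """Bayes linear adjustment of beliefs about X given observation d of D.

    E_d(X)   = E(X) + Cov(X, D) Var(D)^-1 (d - E(D))
    Var_d(X) = Var(X) - Cov(X, D) Var(D)^-1 Cov(D, X)
    Only means, variances and covariances are specified, which is the
    appeal of the approach for sparse inspection time-series.
    """
    K = CXD @ np.linalg.inv(VD)
    EX_adj = EX + K @ (d - ED)             # adjusted expectation
    VX_adj = VX - K @ CXD.T                # adjusted (reduced) variance
    return EX_adj, VX_adj

# Hypothetical beliefs: X = wall thickness of two dependent components,
# D = a noisy minimum-thickness reading of component 1 (invented numbers).
EX = np.array([10.0, 10.0])
VX = np.array([[1.0, 0.6], [0.6, 1.0]])    # components corrode dependently
CXD = np.array([[1.0], [0.6]])             # Cov(X, D), with D = X1 + noise
ED = np.array([10.0])
VD = np.array([[1.2]])                     # Var(D) = Var(X1) + noise variance
EX_adj, VX_adj = bayes_linear_adjust(EX, VX, CXD, ED, VD, np.array([9.0]))
# A thin reading on component 1 also lowers the belief about component 2,
# because the prior covariance links the two components.
```

This dependence propagation is what makes partial inspections informative about uninspected components.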
|
318 |
Classifying receipts or invoices from images based on text extraction. Kaci, Iuliia. January 2016 (has links)
Nowadays, most documents are stored in electronic form, and there is high demand to organize and categorize them efficiently. The field of automated text classification has therefore gained significant attention from both science and industry, with applications in information retrieval, information filtering, news classification, and more. The goal of this project is the automated classification of photos as invoices or receipts in Visma Mobile Scanner, based on previously extracted text. First, several OCR tools available on the market were evaluated to find the most accurate one for text extraction, which turned out to be ABBYY FineReader. The machine learning tool WEKA was used for text classification, with a focus on the Naïve Bayes classifier. Since the Naïve Bayes implementation provided by WEKA does not support some advances in the text classification field, such as N-grams and Laplace smoothing, an improved version of the Naïve Bayes classifier, specialized for text classification and for invoice/receipt classification in particular, was implemented. Improving the Naïve Bayes classifier, investigating how it can be adapted to the problem domain, and evaluating the resulting classification accuracy against the generic Naïve Bayes are the main parts of this research. Experimental results show that the specialized Naïve Bayes classifier has the highest accuracy. By applying the Fixed penalty feature, the best result of 95.6522% accuracy in cross-validation mode was achieved. With more accurate text extraction, the accuracy is even higher.
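A from-scratch sketch of a multinomial Naïve Bayes text classifier with Laplace (add-one) smoothing and word bigram features, the two enhancements mentioned above. The tiny invoice/receipt training set is invented, and this is an illustrative implementation, not the one built on WEKA in the project:

```python
import math
from collections import Counter

def ngrams(tokens, n=2):
    """Word unigrams plus n-grams (bigrams by default) as features."""
    return tokens + [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NaiveBayesText:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.totals = {c: 0 for c in self.classes}
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.vocab = set()
        for doc, c in zip(docs, labels):
            feats = ngrams(doc.split())
            self.counts[c].update(feats)
            self.totals[c] += len(feats)
            self.vocab.update(feats)
        return self

    def predict(self, doc):
        V = len(self.vocab)
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.priors[c])
            for f in ngrams(doc.split()):
                # Laplace smoothing: an unseen feature gets a small
                # non-zero probability instead of zeroing the product.
                lp += math.log((self.counts[c][f] + 1) / (self.totals[c] + V))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Invented toy training set for the invoice/receipt distinction.
docs = ["total amount due invoice number",
        "invoice payment due date",
        "cash receipt thank you",
        "receipt total paid cash"]
labels = ["invoice", "invoice", "receipt", "receipt"]
clf = NaiveBayesText().fit(docs, labels)
```

The bigram features let phrases such as "amount due" count as evidence beyond their individual words, which is one of the gaps in the generic WEKA implementation the project set out to close.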
|
319 |
Quelques contributions au filtrage optimal avec l'estimation de paramètres et application à la séparation de la parole mono-capteur / Some contributions to joint optimal filtering and parameter estimation with application to monaural speech separation. Bensaid, Siouar. 06 June 2014 (has links)
The thesis is composed of two parts. In the first part, we deal with the monaural speech separation problem and propose two algorithms. In the first algorithm, we exploit the joint autoregressive model that captures both the short-term and long-term (quasi-periodic) correlations of Gaussian speech signals to formulate a state-space model with unknown parameters. The EM-Kalman algorithm is then used to estimate jointly the sources (carried in the state vector) and the model parameters. In the second algorithm, we use the same speech model, this time in the frequency domain (quasi-periodic Gaussian sources with AR spectral envelopes). The observations are sliced with a well-designed window that guarantees perfect reconstruction of the sources afterwards. The parameters are estimated separately from the sources by optimizing the Gaussian ML criterion expressed with the sample and parameterized covariance matrices. Classical frequency-domain asymptotic methods replace linear convolution by circulant convolution, which introduces approximation errors; we show how windowing leads to slightly more complex frequency-domain techniques, replacing diagonal covariance matrices by banded ones, but with controlled approximation error. The sources are then estimated by Wiener filtering. The second part concerns the relative performance of joint versus marginalized parameter estimation. We consider jointly Gaussian latent data and observations and provide contributions to Cramér-Rao bounds; we then investigate three iterative joint estimation approaches: alternating MAP/ML, which suffers from a parameter bias that persists even asymptotically; EM, which converges to the ML solution; and Variational Bayes (VB), which we prove converges asymptotically to the ML solution for the deterministic parameters.
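The frequency-domain Wiener filtering step of the second algorithm can be sketched as follows, assuming the source power spectral densities are already known. In the thesis they come from the estimated quasi-periodic AR speech models; here, invented oracle PSDs for two test tones stand in:

```python
import numpy as np

def wiener_separate(x, psd_s1, psd_s2):
    """Frequency-domain Wiener filtering of a two-source, single-channel mixture.

    x            : one analysis frame of the mixture
    psd_s1, psd_s2 : model power spectral densities of the two sources
    The two Wiener gains sum to 1 at every frequency, so the two
    source estimates add back up to the mixture exactly.
    """
    X = np.fft.rfft(x)
    H1 = psd_s1 / (psd_s1 + psd_s2)         # Wiener gain for source 1
    s1 = np.fft.irfft(H1 * X, n=len(x))
    s2 = np.fft.irfft((1 - H1) * X, n=len(x))
    return s1, s2

# Toy mixture: a low-frequency tone plus a high-frequency tone, with
# oracle PSDs concentrated on each tone's bin (purely illustrative).
n = 256
t = np.arange(n)
s_lo = np.cos(2 * np.pi * 4 * t / n)
s_hi = np.cos(2 * np.pi * 40 * t / n)
x = s_lo + s_hi
psd1 = np.zeros(n // 2 + 1)
psd1[4] = 1.0
psd2 = np.zeros(n // 2 + 1)
psd2[40] = 1.0
# A tiny floor keeps the gains well defined where both PSDs are zero.
est1, est2 = wiener_separate(x, psd1 + 1e-6, psd2 + 1e-6)
```

In practice the difficulty is of course estimating the speech PSDs from the mixture itself, which is exactly what the ML criterion with banded covariance matrices addresses.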
|
320 |
Essays on the Economics of Risky Health Behaviors. Qiu, Qihua. 15 December 2017 (has links)
This dissertation consists of three essays studying the economics of risky health behaviors. Essay 1 estimates the effects of Graduated Driver Licensing (GDL) restrictions on weight status among adolescents aged 14 to 17 in the U.S. The findings suggest that a night curfew significantly raises adolescents’ probability of being “overweight or obese” by 1.32 percentage points, corresponding to an increase in the “overweight or obesity” rate of 4.8%. A night curfew combined with a passenger restriction increases this rate by 5.8%. Overall, I estimate that nearly 16% of the rise in the “overweight or obesity” rate among teenagers aged 14 to 17 in the U.S. from 1999 to 2015 can be explained by the presence of the GDL restrictions. In addition, the restrictions reduce teenagers’ exercise frequency while increasing their time spent watching TV, which may help to explain the adverse effects on obesity.
Essay 2 explores the effects of the Graduated Driver Licensing (GDL) restrictions on youth smoking and drinking. It finds that being subject to a minimum entry age, a learner stage, or only a night curfew has no statistically significant effect whereas, interestingly, a night curfew combined with a passenger restriction reduces youth smoking and drinking. The estimated effects become more statistically significant and larger in magnitude in the medium run, which is in line with the addictive nature of these substances.
Essay 3 investigates the underlying causes of suicide. It uses county-level data from the U.S., and the primary methodology is a two-level Bayesian hierarchical model with spatially correlated random effects. The results show that the significant effects of observable factors on suicide found by earlier research may partially stem from excluding small-area effects and time trends; without controlling for these, the true contribution of unobserved propensities and time trends can be hidden within observable factors. Most importantly, much can be learned from the unobserved yet persistent propensity toward suicide captured by the spatially correlated county-specific random effects. Resources should be allocated to counties with high suicide rates, but also to counties with low raw suicide rates but high unobserved propensities toward suicide.
|