411 |
Sentiment Analysis of Twitter Data Using Machine Learning and Deep Learning Methods. Manda, Kundan Reddy, January 2019.
Background: Twitter, Facebook, WordPress, etc. act as major sources of information exchange in today's world. Tweets on Twitter mainly express public opinion on a product, event or topic and thus constitute large volumes of unprocessed data. Synthesis and analysis of this data is important but difficult because of the size of the dataset. Sentiment analysis is chosen as the apt method to analyse this data, as it does not go through every tweet individually but instead summarises their sentiments in terms of positive, negative and neutral opinions. Sentiment analysis is normally performed in three ways, namely the machine learning-based approach, the sentiment lexicon-based approach, and the hybrid approach. The machine learning-based approach uses machine learning and deep learning algorithms to analyse the data, whereas the sentiment lexicon-based approach uses lexicons containing vocabularies of positive and negative words. The hybrid approach combines machine learning and sentiment lexicons for classification. Objectives: The primary objectives of this research are to identify the algorithms and metrics for evaluating the performance of machine learning classifiers, and to compare the metrics from the identified algorithms depending on the size of the dataset, which affects the performance of the best-suited algorithm for sentiment analysis. Method: The method chosen to address the research questions is an experiment, through which the identified algorithms are evaluated with the selected metrics. Results: The identified machine learning algorithms are Naïve Bayes, Random Forest and XGBoost, and the deep learning algorithm is CNN-LSTM. The algorithms are evaluated and compared with respect to the metrics precision, accuracy, F1 score and recall. The CNN-LSTM model is best suited for sentiment analysis on Twitter data with respect to the selected size of the dataset. Conclusion: Through the analysis of the results, the aim of this research is achieved in identifying the best-suited algorithm for sentiment analysis on Twitter data with respect to the selected dataset. The CNN-LSTM model achieves the highest accuracy, 88%, among the selected algorithms for the sentiment analysis of Twitter data on the selected dataset.
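For reference (standard definitions, not quoted from the abstract), the four metrics used to compare the classifiers are computed from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN):

    \mathrm{precision} = \frac{TP}{TP+FP}, \qquad \mathrm{recall} = \frac{TP}{TP+FN}, \qquad \mathrm{accuracy} = \frac{TP+TN}{TP+FP+TN+FN}, \qquad F_1 = 2\,\frac{\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}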
|
412 |
Classificação de alvos utilizando atributos cinemáticos / Target classification using kinematic attributes. Mateus de Araujo Fernandes, 13 October 2009.
Decision-support systems for airspace control and surveillance have been increasingly demanded as operators need to handle larger amounts of information and make decisions more quickly and accurately. In this context, a solution is presented for the problem of classifying aerial targets based on kinematic attributes, specifically speed, acceleration and altitude. These data can be estimated by a tracking algorithm from the information provided by a primary surveillance radar. Three classifiers are proposed: the first is based on the relationship between conditional probabilities expressed by Bayes' theorem, the second is implemented as a fuzzy inference system with one-dimensional membership functions, and the third is also based on fuzzy logic but uses two-dimensional membership functions built from the flight envelopes of the expected target classes. The characteristics of these classifiers are compared with each other and with results from similar works found in the literature. All three classifiers are able to provide, in real time, the belief that a target belongs to certain classes, proving useful in the absence of data from a secondary/IFF radar or from imaging sensors.
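As an illustration of the relation underlying the first classifier (a standard statement of Bayes' theorem, not a quotation from the thesis), the posterior probability of a target class c given the estimated speed v, acceleration a and altitude h is

    P(c \mid v, a, h) = \frac{p(v, a, h \mid c)\, P(c)}{\sum_{c'} p(v, a, h \mid c')\, P(c')},

so the classifier can report, in real time, a belief over the candidate classes for each track.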
|
413 |
Uso de redes de crença para seleção de declarações de importação / Using belief networks to select import declarations. Marcos Antonio Cardoso Ferreira, 00 December 2003.
The Integrated Foreign Trade System (SISCOMEX) automates import and export operations and acts as a platform that synchronises the actions of all government agencies involved. The import module uses a technique called Parameterised Selection to select for inspection those import declarations containing infractions and to release or discard those that do not. However, the current performance of this technique is considered unsatisfactory, because it regularly selects import declarations without any infraction and releases declarations that contain one. The main reason for this behaviour is attributed to the way the selection parameters are fixed and to the limited freedom to differentiate and rank the selected declarations. Moreover, the technique is entirely dependent on the user's judgment and ignores the history of infractions. This work proposes a new technique for selecting import declarations for inspection and auditing, which allows them to be ranked according to the likelihood of infraction and which can be integrated into the current system. This technique, called Probabilistic Selection, is based on belief networks. A belief network is represented by a directed acyclic graph (DAG) that expresses the cause-and-effect relationships among several variables. The technique is implemented using the Java language, its database access API, JDBC, and the MySQL database. Two sets of data from real operations were collected and extracted. The larger set was used for modelling and the other was used to evaluate the Probabilistic Selection. The results obtained and the analyses performed on them show the good performance of the proposed technique. Finally, conclusions and some suggestions for future work are presented.
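For orientation (a generic property of belief networks, not specific to the variables modelled in this work), a network whose directed acyclic graph has nodes X_1, ..., X_n factorises the joint distribution as

    P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr),

where pa(X_i) denotes the parent nodes of X_i; the probability of an infraction given the observed attributes of a declaration is then obtained by inference in this network and used to rank the declarations.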
|
414 |
Klasifikace na základě longitudinálních pozorování / Classification based on longitudinal observations. Bandas, Lukáš, January 2012.
The concern of this thesis is the classification of different objects based on longitudinal observations. First, the reader is introduced to the linear mixed-effects model, which is useful for modelling longitudinal data. A description of discriminant analysis methods follows; these methods are usually used for classification based on longitudinal observations. The individual methods are introduced from a theoretical point of view, and the random-effects approach is generalised to continuous time. Subsequently, the methods and the features of the linear mixed-effects model are applied to real data. Finally, the properties of the methods are studied with the help of simulations.
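For reference (a standard formulation rather than a quotation from the thesis), the linear mixed-effects model for the vector of longitudinal observations y_i of subject i is

    y_i = X_i\beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i),

where \beta are the fixed effects shared by all subjects, b_i are the subject-specific random effects with design matrix Z_i, and \varepsilon_i is the residual error.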
|
415 |
Classification Performance Between Machine Learning and Traditional Programming in Java. Alassadi, Abdulrahman; Ivanauskas, Tadas, January 2019.
This study presents a performance comparison between two Java applications built with two different programming approaches: machine learning and traditional programming. A case where both approaches can be applied is a classification problem with numeric values. The data is a heart disease dataset, since heart disease is the leading cause of death in the USA. A performance analysis of both applications is carried out to state the differences on four main points: the development time of each application; the code complexity and the time complexity of the implemented algorithms; the classification accuracy results; and the resource consumption of each application. The machine learning Java application is built with the help of the WEKA library, using its NaiveBayes class to build the model and evaluate its accuracy, as sketched below. The traditional programming Java application is built with the help of a cardiologist, as an expert in the problem domain, to identify the value ranges that indicate the disease. The findings of this study are that the traditional programming application scored better in development time, code complexity and resource consumption. It scored a classification accuracy of 80.2%, while the Naive Bayes algorithm in the machine learning application scored an accuracy of 85.51%, but at the expense of higher resource consumption and execution time.
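A minimal sketch of how the machine-learning application's core could look with WEKA's NaiveBayes class (the dataset file name, the position of the class attribute and the 10-fold cross-validation setup are assumptions, not details taken from the study):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class HeartDiseaseClassifier {
        public static void main(String[] args) throws Exception {
            // Load the heart disease dataset (hypothetical file name).
            Instances data = DataSource.read("heart-disease.arff");
            // Assume the diagnosis is the last attribute in the file.
            data.setClassIndex(data.numAttributes() - 1);

            // Build the Naive Bayes model used on the machine-learning side.
            NaiveBayes model = new NaiveBayes();
            model.buildClassifier(data);

            // Estimate classification accuracy with 10-fold cross-validation.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
        }
    }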
|
416 |
Comparing survival from cancer using population-based cancer registry data - methods and applications. Yu, Xue Qin, January 2007.
Doctor of Philosophy / Over the past decade, population-based cancer registry data have been used increasingly worldwide to evaluate and improve the quality of cancer care. The utility of the conclusions from such studies relies heavily on the data quality and the methods used to analyse the data. Interpretation of comparative survival from such data, examining either temporal trends or geographical differences, is generally not easy. The observed differences could be due to methodological and statistical approaches or to real effects. For example, geographical differences in cancer survival could be due to a number of real factors, including access to primary health care, the availability of diagnostic and treatment facilities and the treatment actually given, or to artefact, such as lead-time bias, stage migration, sampling error or measurement error. Likewise, a temporal increase in survival could be the result of earlier diagnosis and improved treatment of cancer; it could also be due to artefact after the introduction of screening programs (adding lead time), changes in the definition of cancer, stage migration or several of these factors, producing both real and artefactual trends. In this thesis, I report methods that I modified and applied, some technical issues in the use of such data, and an analysis of data from the State of New South Wales (NSW), Australia, illustrating their use in evaluating and potentially improving the quality of cancer care, showing how data quality might affect the conclusions of such analyses. This thesis describes studies of comparative survival based on population-based cancer registry data, with three published papers and one accepted manuscript (subject to minor revision). In the first paper, I describe a modified method for estimating spatial variation in cancer survival using empirical Bayes methods (published in Cancer Causes and Control 2004). I demonstrate in this paper that the empirical Bayes method is preferable to standard approaches and show how it can be used to identify cancer types where a focus on reducing area differentials in survival might lead to important gains in survival. In the second paper (published in the European Journal of Cancer 2005), I apply this method to a more complete analysis of spatial variation in survival from colorectal cancer in NSW and show that estimates of spatial variation in colorectal cancer can help to identify subgroups of patients for whom better application of treatment guidelines could improve outcome. I also show how estimates of the numbers of lives that could be extended might assist in setting priorities for treatment improvement. In the third paper, I examine time trends in survival from 28 cancers in NSW between 1980 and 1996 (published in the International Journal of Cancer 2006) and conclude that for many cancers, falls in excess deaths in NSW from 1980 to 1996 are unlikely to be attributable to earlier diagnosis or stage migration; thus, advances in cancer treatment have probably contributed to them. In the accepted manuscript, I describe an extension of the work reported in the second paper, investigating the accuracy of staging information recorded in the registry database and assessing the impact of error in its measurement on estimates of spatial variation in survival from colorectal cancer. The results indicate that misclassified registry stage can have an important impact on estimates of spatial variation in stage-specific survival from colorectal cancer.
Thus, if cancer registry data are to be used effectively in evaluating and improving cancer care, the quality of stage data might have to be improved. Taken together, the four papers show that creative, informed use of population-based cancer registry data, with appropriate statistical methods and acknowledgement of the limitations of the data, can be a valuable tool for evaluating and possibly improving cancer care. Use of these findings to stimulate evaluation of the quality of cancer care should enhance the value of the investment in cancer registries. They should also stimulate improvement in the quality of cancer registry data, particularly that on stage at diagnosis. The methods developed in this thesis may also be used to improve estimation of geographical variation in other count-based health measures when the available data are sparse.
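As a generic illustration of the empirical Bayes idea (the estimator developed in the first paper may differ in detail), an area-level survival estimate y_i with sampling variance \sigma_i^2 is shrunk towards the overall mean \hat{\mu} in proportion to its unreliability:

    \hat{\theta}_i = (1 - B_i)\, y_i + B_i\, \hat{\mu}, \qquad B_i = \frac{\sigma_i^2}{\sigma_i^2 + \hat{\tau}^2},

where \hat{\tau}^2 is the estimated between-area variance; areas with sparse data (large \sigma_i^2) are pulled strongly towards the mean, which stabilises maps of spatial variation in survival.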
|
417 |
Phénoménologie du LHC, prédictions théoriques et leurs incertitudes dans un contexte Bayesien / LHC phenomenology: theoretical predictions and their uncertainties in a Bayesian context. Houdeau, Nicolas, 30 September 2011.
The recent start-up of the LHC calls for updating and improving the theoretical predictions obtained from the Standard Model. Possible signs of new physics should first appear as small differences between these results and the observations. The cross sections of interest must therefore be re-estimated at the collision energy of the new accelerator and then computed at higher orders in perturbation theory. The complexity of the manipulations involved requires the development of new techniques, in particular numerical ones. Once the theoretical predictions have been obtained, their precision must also be assessed as accurately as possible; only under this condition can a significant difference from the experimental results be established. This thesis presents work on these three aspects. First, the existing numerical tool FONLL was used to update the predictions for heavy-quark production at first order in perturbation theory at the LHC collision energy of 7 TeV. A study of an alternative approach to handling a type of divergence that appears in the calculation of cross sections, the infrared divergences, is also presented. Finally, a credence model (based on Bayesian probabilities) that rigorously describes the contribution of the truncation of perturbative series to the theoretical uncertainty is detailed. A discussion of the notions of probability measure and credence measure introduces this study.
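Schematically (an outline of the general idea rather than the precise model developed in the thesis), a cross section computed to order k in perturbation theory has the form

    \sigma_k = \sum_{n=0}^{k} c_n\, \alpha_s^{\,n},

and the error made by truncating the series is dominated by the first omitted term c_{k+1}\,\alpha_s^{\,k+1}. In a Bayesian (credence) framework, a prior on the magnitude of the unknown coefficients, for example that they share a common scale inferred from the known c_0, ..., c_k, yields a degree-of-belief distribution for this remainder, from which credibility intervals on the prediction can be quoted.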
|
418 |
Approximations of Bayes Classifiers for Statistical Learning of Clusters. Ekdahl, Magnus, January 2006.
It is rarely possible to use an optimal classifier. Often the classifier used for a specific problem is an approximation of the optimal classifier. Methods are presented for evaluating the performance of an approximation in the model class of Bayesian networks. Specifically, for the approximation of class conditional independence, a bound for the performance is sharpened. The class conditional independence approximation is connected to the minimum description length principle (MDL), which is connected to Jeffreys' prior through commonly used assumptions. One algorithm for unsupervised classification is presented and compared against other unsupervised classifiers on three data sets. / Report code: LiU-TEK-LIC 2006:11.
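For orientation (a standard statement, not a quotation from the thesis), the class conditional independence approximation replaces the joint class-conditional distribution of a feature vector x = (x_1, ..., x_d) by the product of its marginals,

    p(x \mid c) \approx \prod_{i=1}^{d} p(x_i \mid c),

which is the approximation underlying the naive Bayes classifier; the sharpened bound quantifies how far such an approximation can fall short of the optimal classifier's performance.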
|
419 |
Stochastic Modeling and Statistical Inference of Geological Fault Populations and Patterns. Borgos, Hilde Grude, January 2000.
The focus of this work is on faults, and the main issue is statistical analysis and stochastic modeling of faults and fault patterns in petroleum reservoirs. The thesis consists of Parts I-V and Appendices A-C. The units can be read independently. Part III is written for a geophysical audience, and its topic is fault and fracture size-frequency distributions. The remaining parts are written for a statistical audience, but can also be read by people with an interest in quantitative geology. The topic of Parts I and II is statistical model choice for fault size distributions, with a sampling algorithm for estimating the Bayes factor. Part IV describes work on spatial modeling of fault geometry, and Part V is a short note on line partitioning. Parts I, II and III constitute the main part of the thesis. The appendices are conference abstracts and papers based on Parts I and IV. / Paper III: reprinted with kind permission of the American Geophysical Union. An edited version of this paper was published by AGU. Copyright [2000] American Geophysical Union
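As a brief reminder (the standard definition, not taken from the thesis), the Bayes factor compares two candidate models M_1 and M_2 for the observed fault-size data y through the ratio of their marginal likelihoods,

    B_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)}, \qquad p(y \mid M_j) = \int p(y \mid \theta_j, M_j)\, p(\theta_j \mid M_j)\, d\theta_j,

and a sampling algorithm is typically needed because these integrals are rarely available in closed form.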
|