31

Using dated training sets for classifying recent news articles with Naive Bayes and Support Vector Machines : An experiment comparing the accuracy of classifications using test sets from 2005 and 2017

Rydberg, Filip, Tornfors, Jonas January 2017
Text categorisation is an important feature for organising text data and making it easier to find information on the World Wide Web. The categorisation of text data can be done with machine learning classifiers. These classifiers need to be trained with data in order to predict a result for future input. The authors chose to investigate how accurately two classifiers, trained on older news articles, classify recent news articles. To reach a result, the authors chose the Naive Bayes and Support Vector Machine classifiers and conducted an experiment. The experiment involved training models of both classifiers with news articles from 2005 and testing the models with news articles from 2005 and 2017 to compare the results. The results showed that both classifiers did considerably worse when classifying the news articles from 2017 than when classifying news articles from the same year as the training data.
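The evaluation protocol described in this abstract (one trained model, scored separately on a same-year test set and a later-year test set) can be sketched roughly as follows. This is only an illustration, not the authors' code: the helper names `accuracy` and `compare_by_year`, the keyword rule standing in for a trained classifier, and the toy headlines are all assumptions made up for the example.

```python
def accuracy(predict, examples):
    """Fraction of (text, label) pairs for which predict(text) equals label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)


def compare_by_year(predict, test_sets):
    """Return {year: accuracy} so same-year and later-year results sit side by side."""
    return {year: accuracy(predict, examples) for year, examples in test_sets.items()}


# Hypothetical stand-in for a model trained on 2005 articles:
# it only knows technology vocabulary that was common in 2005.
TECH_WORDS_2005 = {"ipod", "broadband"}


def predict_topic(text):
    """Label a headline 'tech' if it contains a known 2005 keyword, else 'sport'."""
    words = set(text.lower().split())
    return "tech" if words & TECH_WORDS_2005 else "sport"


# Toy test sets (invented headlines, not data from the thesis).
test_sets = {
    2005: [
        ("new ipod released", "tech"),
        ("broadband prices fall", "tech"),
        ("cup final tonight", "sport"),
    ],
    2017: [
        ("smartphone app update", "tech"),
        ("streaming service launches", "tech"),
        ("cup final tonight", "sport"),
    ],
}

results = compare_by_year(predict_topic, test_sets)
print(results)
```

Any real classifier with a `predict`-style function could be passed in place of `predict_topic`; the point of the sketch is only that the model stays fixed while the test year varies.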
32

Classifying receipts or invoices from images based on text extraction

Kaci, Iuliia January 2016
Nowadays, most documents are stored in electronic form and there is a high demand to organize and categorize them efficiently. Therefore, the field of automated text classification has gained significant attention from both science and industry. This technology has been applied to information retrieval, information filtering, news classification, etc. The goal of this project is the automated classification of photos as invoices or receipts in Visma Mobile Scanner, based on the previously extracted text. First, several OCR tools available on the market were evaluated in order to find the most accurate one for the text extraction, which turned out to be ABBYY FineReader. The machine learning tool WEKA was used for the text classification, with the focus on the Naïve Bayes classifier. Since the Naïve Bayes implementation provided by WEKA does not support some advances in the text classification field, such as N-grams and Laplace smoothing, an improved version of the Naïve Bayes classifier, more specialized for text classification and for invoice/receipt classification, was implemented. Improving the Naïve Bayes classifier, investigating how it can be improved for the problem domain, and evaluating the obtained classification accuracy compared to the generic Naïve Bayes are the main parts of this research. Experimental results show that the specialized Naïve Bayes classifier has the highest accuracy. By applying the Fixed penalty feature, the best result of 95.6522% accuracy in cross-validation mode was achieved. With more accurate text extraction, the accuracy is even higher.
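To make the two techniques named in this abstract concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier that uses word N-grams as features and Laplace (add-alpha) smoothing. It is not the thesis's implementation and does not reproduce its "Fixed penalty" feature; the class name `NaiveBayesText`, the `ngrams` helper, the parameter defaults, and the toy invoice/receipt texts are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayesText:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0, n_max=2):
        self.alpha = alpha  # alpha = 1.0 is classic Laplace smoothing
        self.n_max = n_max

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text, self.n_max)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Return {label: log prior + sum of smoothed log likelihoods}."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for feat in ngrams(text, self.n_max):
                if feat in self.vocab:  # features never seen in training are skipped
                    score += math.log((counts[feat] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (invented, standing in for OCR-extracted text).
train_texts = [
    "invoice number due date payment terms",
    "total amount due invoice customer number",
    "receipt cash change thank you",
    "receipt card payment thank you visit again",
]
train_labels = ["invoice", "invoice", "receipt", "receipt"]

model = NaiveBayesText(alpha=1.0, n_max=2).fit(train_texts, train_labels)
print(model.predict("thank you for your visit"))
print(model.predict("invoice due date"))
```

Smoothing matters here because an n-gram seen in only one class would otherwise receive probability zero in the other class and eliminate it outright, regardless of the remaining evidence.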
33

Production planning of combined heat and power plants with regards to electricity price spikes : A machine learning approach

Fransson, Nathalie January 2017
District heating systems could help manage the expected increase in volatility on the Nordic electricity market by starting a combined heat and power (CHP) production plant instead of a heat-only production plant when electricity prices are expected to be high. Fortum Värme is interested in adjusting the production planning of its district heating system towards high electricity prices, and its system includes a peak-load CHP unit that could be utilised for this purpose. The economic potential of starting the CHP unit, instead of a heat-only production unit, when profitable was approximated for 2013-2016. Three machine learning classification algorithms, Support Vector Machine (SVM), Naive Bayes, and an ensemble of decision trees, were implemented and compared with the purpose of predicting price spikes in price area SE3, where Fortum Värme operates, and of assisting production planning. The results show that the SVM model achieved the highest performance and could be useful in production planning towards high electricity prices. The results also show a potential profit from adjusting production planning, a potential that might increase if the electricity market becomes more volatile.
34

Analýza sentimentu zákaznických recenzí / Sentiment Analysis of Customer Reviews

Hrabák, Jan January 2016
This thesis focuses on sentiment analysis of unstructured text and its practical application to real data downloaded from the website Yelp.com. The objective of the theoretical part of this thesis is to summarise information on the history, methods, and possible applications of sentiment analysis. The reader is acquainted with important terms and processes of sentiment analysis. The theoretical part focuses on the Naive Bayes classifier, which is used in the practical part of the thesis. The practical part gives a detailed description of the data set and of the construction and testing of the model. Finally, the pros and cons of the chosen model are presented and some possibilities for its use are described.
35

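Modelos probabilísticos e não probabilísticos de classificação binária para pacientes com ou sem demência como auxílio na prática clínica em geriatria / Probabilistic and non-probabilistic binary classification models for patients with or without dementia as an aid to clinical practice in geriatrics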

Galdino, Maicon Vinícius January 2020
Advisor: Liciana Vaz de Arruda Silveira / Abstract: The objectives of this work were to present classification models (Logistic Regression, Naive Bayes, Classification Trees, Random Forest, k-Nearest Neighbours, and Artificial Neural Networks), to compare them using resampling procedures on a geriatrics data set (dementia diagnosis), and to analyse the assumptions of each methodology, its advantages and disadvantages, and the scenarios in which each can best be used. The justification and relevance of this project rest on the importance and usefulness of the proposed topic: as the elderly population grows worldwide (in developed countries and in developing ones such as Brazil), classification models can be useful to medical professionals, especially general practitioners, in diagnosing dementia, since the diagnosis is often not straightforward. / Doctorate
36

An Automated Digital Analysis of Depictions of Child Maltreatment in Ancient Roman Writings

Browne, Alexander January 2019
Historians, mostly engaging with written evidence, have argued that the Christianisation of the Roman Empire resulted in changes in both attitudes and behaviour towards children, resulting in a decrease in their maltreatment by society. I begin with a working hypothesis that this attitude change was real and resulted in a reduction in the maltreatment of children, and that this reduction in maltreatment is evident in the literature. The approach to investigating this hypothesis belongs to the emerging field of digital humanities: using programming techniques developed in the field of sentiment analysis, I create two sentiment-analysis-like tools, one a lexicon-based approach, the other an application of a naive Bayes machine learning approach. The latter is favoured as more accurate. The tool is used to automatically tag sentences that mention children, extracted from a corpus of texts written between 100 B.C. and 600 A.D., according to whether or not they feature the maltreatment of children. The results are then quantitatively analysed with reference to the year in which the text was written, with no statistically significant result found. However, the high accuracy of the tool in tagging sentences, at above 88%, suggests that similar tools may be able to play an important role, alongside traditional research techniques, in historical and social-science research in the future.
37

Data Analysis of Minimally-Structured Heterogeneous Logs : An experimental study of log template extraction and anomaly detection based on Recurrent Neural Network and Naive Bayes.

Liu, Chang January 2016
Nowadays, continuous integration and continuous delivery are heavily used to achieve rapid software development and quick delivery of good-quality products to customers. In modern software development, the testing stage has always been of great significance, ensuring that the delivered software meets all requirements with high quality, maintainability, sustainability, scalability, etc. The key task of software testing is to find bugs in every test and resolve them. The developers and test engineers at Ericsson, who work on a large-scale software architecture, rely mainly on the logs generated during testing, which contain important information about system behavior and software status, to debug the software. However, the volume of data is too large and its variety too complex and unpredictable, so manually locating and resolving bugs in such a vast amount of log data is very time-consuming and laborious. The objective of this thesis project is to explore a way to conduct log analysis efficiently and effectively by applying relevant machine learning algorithms, in order to help people quickly detect a test failure and its possible causes. In this project, a method of preprocessing and clustering the original logs is designed and implemented to obtain useful data that can be fed to machine learning algorithms. A comparative log analysis, based on two machine learning algorithms, Recurrent Neural Network and Naive Bayes, is conducted to detect the location of system failures and anomalies. Finally, relevant experimental results are provided and analyzed.
38

Taskfinder : Comparison of NLP techniques for textclassification within FMCG stores

Jensen, Julius January 2022
Natural language processing has many important applications today, such as translation, spam filters, and other useful products. Supervised and unsupervised machine learning models have proven successful in achieving these applications. The most important aspect of these models is what they can achieve with different datasets. This article examines how RNN models compare with Naive Bayes in text classification. The chosen RNN models are long short-term memory (LSTM) and gated recurrent unit (GRU). Both LSTM and GRU are trained using the Flair framework. The models are trained on three separate datasets with different compositions, and the trend within each model is examined and compared with the other models. The results showed that Naive Bayes performed better than the RNN models at classifying short sentences, but worse on longer sentences. When trained on a small dataset, LSTM and GRU had better results than Naive Bayes. The best-performing model was Naive Bayes, which had the highest accuracy score on two out of the three datasets.
39

Exploration of infectious disease transmission dynamics using the relative probability of direct transmission between patients

Leavitt, Sarah Van Ness 06 October 2020
The question “who infected whom” is a perennial one in the study of infectious disease dynamics. To understand characteristics of infectious diseases such as how many secondary cases one case will produce over the course of infection (the reproductive number), how much time passes between the infections of two connected cases (the generation interval), and what factors are associated with transmission, one must ascertain who infected whom. The current best practices for linking cases are contact investigations and pathogen whole genome sequencing (WGS). However, these data sources cannot perfectly link cases, are expensive to obtain, and are often not available for all cases in a study. This lack of discriminatory data limits the use of established methods in many existing infectious disease datasets. We developed a method to estimate the relative probability of direct transmission between any two infectious disease cases. We used a subset of cases that have pathogen WGS or contact investigation data to train a model and then used demographic, spatial, clinical, and temporal data to predict the relative transmission probabilities for all case-pairs using a simple machine learning algorithm called naive Bayes. We adapted existing methods to estimate the reproductive number and generation interval to use these probabilities. Finally, we explored the associations between various covariates and transmission and how they related to the associations between covariates and pathogen genetic relatedness. We applied these methods to a tuberculosis outbreak in Hamburg, Germany, and to surveillance data in Massachusetts, USA. Through simulations we found that our estimated transmission probabilities accurately classified pairs as links and nonlinks and were able to accurately estimate the reproductive number and the generation interval.
We also found that the association between covariates and genetic relatedness captures the direction but not absolute magnitude of the association between covariates and transmission, but the bias was improved by using effect estimates from the naive Bayes algorithm. The methods developed in this dissertation can be used to explore transmission dynamics and estimate infectious disease parameters in established datasets where this was not previously feasible because of a lack of highly discriminatory information, and therefore expand our understanding of many infectious diseases.
40

Email Classification : An evaluation of Deep Neural Networks with Naive Bayes

Michailoff, John January 2019
Machine learning (ML) is an area of computer science that gives computers the ability to learn data patterns without prior programming for those patterns. Using neural networks in this area is based on simulating the biological functions of neurons in brains to learn patterns in data, giving computers a predictive ability to comprehend how data can be clustered. This research investigates the possibilities of using neural networks for classifying email, i.e. working as an email case manager. A Deep Neural Network (DNN) consists of multiple layers of neurons connected to each other by trainable weights. The main objective of this thesis was to evaluate how three input arguments - data size, training time and neural network structure - affect the accuracy of a DNN's pattern recognition; to evaluate how the DNN performs compared to the statistical ML method Naïve Bayes in terms of prediction accuracy and complexity; and finally to assess the viability of the resulting DNN as a case manager. Results show that the accuracy of our networks improves as training time and data size increase. Testing increasingly complex network structures (larger networks of neurons with more layers) shows that overfitting becomes a problem with increased training time, i.e. accuracy decreases after a certain threshold of training time. Naïve Bayes classifiers perform worse than the DNN in terms of accuracy but have lower complexity, making NB viable on mobile platforms. We conclude that our developed prototype may work well in tandem with existing case management systems, pending testing in future research.
