About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD).
Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
61

Automated classification of bibliographic data using SVM and Naive Bayes

Nordström, Jesper January 2018
Classification of scientific bibliographic data is an important and increasingly time-consuming task in a “publish or perish” paradigm where the number of scientific publications is steadily growing. Apart from being resource-intensive, manual classification has also been shown to be performed with a fairly high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a solution for handling the increasing volume of published scientific articles. In this study, automated classification of bibliographic data based on two machine learning methods, Naive Bayes and Support Vector Machine (SVM), was evaluated. The data were collected from the Swedish research database SwePub, and the features used to train the classifiers were extracted from the titles and abstracts of the bibliographic records. Accuracy ranged from a low of 0.54 to a high of 0.84. The SVM classifiers consistently scored higher than the Naive Bayes classifiers. Classification at the second level of the hierarchical classification system clearly yielded lower scores than classification at the first level. Using abstracts as the basis for feature extraction gave better overall results than using titles, although the differences were very small.
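To make the Naive Bayes side of this comparison concrete, here is a minimal multinomial Naive Bayes sketch with Laplace (add-alpha) smoothing, plus the accuracy measure the abstract reports. It is not the author's implementation: the whitespace tokenizer, the smoothing parameter `alpha`, and the toy titles with made-up subject codes `cs` and `bio` are assumptions standing in for SwePub records and their classification codes.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal lowercase whitespace tokenizer (an assumption; real pipelines
    # usually also strip punctuation and stop words).
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, token, c):
        # log P(token | c), smoothed over the training vocabulary.
        num = self.word_counts[c][token] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        scores = {c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
                  for c in self.classes}
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    # Fraction of documents whose predicted class matches the true class.
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


# Hypothetical titles with made-up subject codes.
train = ["deep learning for image recognition", "neural networks for speech",
         "protein folding in cells", "gene expression in yeast cells"]
labels = ["cs", "cs", "bio", "bio"]
model = MultinomialNaiveBayes().fit(train, labels)
```

Scoring in log space avoids underflow when many small word probabilities are multiplied; words never seen in training are simply ignored at prediction time.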
62

Lane Change Intent Analysis for Preceding Vehicles : a Study Using Various Machine Learning Techniques / Analys av framförvarande fordons filbytesintentioner : En studie utnyttjande koncept från maskininlärning

Fredrik, Ljungberg January 2017
In recent years, the level of technology in heavy-duty vehicles has increased significantly. Progress has been made towards autonomous driving, with increased driver comfort and safety, partly through the use of advanced driver assistance systems (ADAS). This thesis studies the possibilities of detecting and predicting lane changes by the preceding vehicle. This information will help improve decision-making in safety systems. Several suitable approaches to the problem are presented, along with an evaluation of their accuracy. Modelling human perception and action is a challenging task. Several thousand kilometers of driving data were available, so a reasonable course of action was to let the system learn from this data off-line. It was therefore decided to investigate the use of a branch of artificial intelligence called supervised learning. The study of driving intentions was formulated as a binary classification problem. To distinguish between lane-change and lane-keep actions, four machine learning techniques were evaluated: naive Bayes, artificial neural networks, support vector machines, and Gaussian processes. The classifiers took as input fused sensor signals from systems that are commercially available in Scania vehicles today. The project was carried out as a Master's thesis in collaboration between Linköping University and Scania CV AB, a leading manufacturer of heavy trucks, buses and coaches, as well as industrial and marine engines.
63

Predicting Political Party Affiliation in the Swedish Parliament using Natural Language Processing

Zetterberg, Johannes January 2022
Text classification is a fundamental part of natural language processing. In this thesis, text classification methods are used to predict the political party affiliation of members of parliament (MPs). The objective is to evaluate the performance of Support Vector Machines (SVM), naive Bayes, and a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model in predicting MPs' party affiliation from speeches given in the Chamber of the Swedish Parliament. The study shows that BERT outperforms SVM and naive Bayes in correctly classifying MPs, while SVM makes better predictions than naive Bayes and performs reasonably well compared to BERT. All models are most accurate for MPs representing the Sweden Democrats. Both BERT and SVM correctly classify roughly every other speech, which is much better than random guessing. These results indicate the potential of methods for automatically classifying political speeches.
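For the SVM side, a linear SVM can be trained without external libraries using the Pegasos stochastic sub-gradient method on the regularized hinge loss, with one-vs-rest to handle more than two parties. This is a sketch, not the thesis setup: the bag-of-words features, the regularization strength `lam`, the epoch count, and the three hypothetical parties `A`, `B`, `C` with their toy "speeches" are all assumptions.

```python
import random
from collections import Counter


def featurize(text, vocab):
    # Bag-of-words counts over a fixed vocabulary, plus a constant 1.0 that
    # acts as the bias feature (so the bias is regularized too; a simplification).
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab] + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_pegasos(xs, ys, lam=0.01, epochs=300, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos: stochastic
    sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                   # decreasing step size
            violates = ys[i] * dot(w, xs[i]) < 1    # inside the margin or misclassified
            w = [(1 - eta * lam) * wj for wj in w]  # step on the regularizer
            if violates:                            # step on the hinge loss
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


class OneVsRestSVM:
    """One binary SVM per class; predict the class with the highest score."""

    def fit(self, texts, labels):
        self.vocab = sorted({tok for t in texts for tok in t.lower().split()})
        xs = [featurize(t, self.vocab) for t in texts]
        self.classes = sorted(set(labels))
        self.weights = {c: train_pegasos(xs, [1 if y == c else -1 for y in labels])
                        for c in self.classes}
        return self

    def predict(self, text):
        x = featurize(text, self.vocab)
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


# Hypothetical speeches from three hypothetical parties.
speeches = ["lower taxes business", "cut taxes spending",
            "climate action now", "protect climate forests",
            "border control", "stricter border immigration control"]
parties = ["A", "A", "B", "B", "C", "C"]
model = OneVsRestSVM().fit(speeches, parties)
```

Folding the bias into a constant feature keeps the update rule to two lines, at the cost of also shrinking the bias during regularization.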
64

Natural language processing for research philosophies and paradigms dissertation (DFIT91)

Mawila, Ntombhimuni 28 February 2021
Research philosophies and paradigms (RPPs) reveal researchers’ assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Various studies highlight the cognitive and comprehension challenges that postgraduate students face with RPP concepts. This study develops a supervised natural language processing (NLP) classification application that guides students in identifying the RPPs applicable to their study. Using algorithms rooted in a quantitative research approach, the study builds a corpus represented with the Bag of Words model and uses it to train naïve Bayes, Logistic Regression, and Support Vector Machine classifiers. Computer experiments evaluating the algorithms show that naïve Bayes achieves the highest accuracy and precision. In practice, user testing results show the varying impact of knowledge, performance expectancy, and effort expectancy. The findings help reduce the difficulties postgraduates encounter in identifying research philosophies and their underlying paradigms. / Science and Technology Education / MTech. (Information Technology)
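The Bag of Words representation can be illustrated in a few lines: each document becomes a vector of token counts over a shared vocabulary, and word order is discarded. The two sentences and the whitespace tokenizer below are hypothetical; the study's actual preprocessing is not described in the abstract.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = {tok for doc in docs for tok in doc.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(doc, vocab):
    """Count vector for one document; tokens outside the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(doc.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


# Hypothetical sentences describing two research paradigms.
docs = ["interpretivism values subjective meaning",
        "positivism values objective measurement"]
vocab = build_vocabulary(docs)
# bag_of_words("values values objective", vocab) -> [0, 0, 0, 1, 0, 0, 2]
```

These count vectors are exactly what a naïve Bayes, logistic regression, or SVM classifier would consume as input features.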
65

How to explain graph-based semi-supervised learning for non-mathematicians?

Jönsson, Mattias, Borg, Lucas January 2019
The large amount of data available on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and must be manually labeled by a human before a machine learning algorithm can use it. Semi-supervised learning (SSL) is a technique in which the algorithm uses a few labeled samples to automatically label the rest of the data. One approach to SSL is to represent the data as a graph, known as graph-based semi-supervised learning (GSSL), and find similarities between the nodes for automatic labeling.
Our goal in this thesis is to simplify the advanced processes and steps involved in implementing a GSSL algorithm. We cover basic tasks such as setting up the development environment as well as more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Finally, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL.
We demonstrate classification performance by classifying documents from the 20 Newsgroups dataset using LP and MNB. The results are reported using two evaluation scores, F1-score and accuracy. MNB was also compared with LP using two different kernels, KNN and RBF, for different amounts of labeled documents. The results show that MNB classifies the dataset better than LP.
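The core of graph-based SSL with Label Propagation can be sketched in pure Python: build an RBF-weighted graph over all points, repeatedly let each node take the weighted average of its neighbours' label distributions, and clamp the labeled nodes back to their known labels after every step. This follows the classic iterative formulation rather than any particular library; the KNN-kernel variant is not shown, and the 2-D toy points, `gamma`, and iteration count are assumptions (in the thesis the nodes would be BOW or TF-IDF document vectors).

```python
import math


def rbf_weights(points, gamma):
    """Dense RBF affinity matrix w_ij = exp(-gamma * ||x_i - x_j||^2), zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-gamma * d2)
    return w


def one_hot(label, n_classes):
    return [1.0 if label == c else 0.0 for c in range(n_classes)]


def label_propagation(points, labels, n_classes, gamma=1.0, iters=200):
    """labels[i] is a class index for labeled points and None for unlabeled ones."""
    n = len(points)
    w = rbf_weights(points, gamma)
    # Row-normalize W into the transition matrix P = D^-1 W.
    p = [[wij / sum(row) for wij in row] for row in w]
    # Start labeled points at their one-hot label and unlabeled points at uniform.
    f = [one_hot(labels[i], n_classes) if labels[i] is not None
         else [1.0 / n_classes] * n_classes for i in range(n)]
    for _ in range(iters):
        f = [[sum(p[i][j] * f[j][c] for j in range(n)) for c in range(n_classes)]
             for i in range(n)]
        # Clamp labeled points back to their known labels.
        for i in range(n):
            if labels[i] is not None:
                f[i] = one_hot(labels[i], n_classes)
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Two toy clusters with one labeled point each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, None, None, 1, None, None]
```

Calling `label_propagation(points, labels, 2)` spreads each cluster's single known label to its unlabeled neighbours, which is the "find similarities between the nodes" step described above.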
66

Klasifikace příspěvků ve webových diskusích / Classification of Web Forum Entries

Margold, Tomáš January 2008
This thesis deals with the classification of text posted on the internet, specifically web forum entries. It describes available methods for classifying and categorizing text messages. Part of the thesis is an implementation of a naive Bayes classifier and of a classifier based on neural networks. The selected methods are compared in terms of their error rate and other classification metrics.
67

Improving Filtering of Email Phishing Attacks by Using Three-Way Text Classifiers

Trevino, Alberto 13 March 2012
The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool into a social engineering attack vector. Much of today's unwanted email tries to trick users into replying with passwords or bank account information, or into visiting malicious sites that steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks, which try to appear legitimate not only to users but also to spam filters. Several sources indicate that traditional content filters have a hard time detecting phishing attacks because these emails lack the traditional features and characteristics of spam messages. This thesis tests the hypothesis that separating messages into three categories (ham, spam, and phish) will give content filters better filtering performance. Even though the experiments showed that three-way classification did not improve performance, several additional premises were tested, including the claim that phishing emails are too similar to legitimate emails and the ability of Naive Bayes classifiers to classify emails properly.
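One practical detail behind the three-way hypothesis: a filter ultimately delivers or blocks, so a three-way classifier's output has to be collapsed back to a binary decision before filtering performance can be measured. The hypothetical helpers below show that mapping and compute precision and recall for the "block" decision; the label strings and the choice of metrics are assumptions, not the thesis's evaluation protocol.

```python
def filter_decision(label):
    """Collapse a three-way label (ham/spam/phish) into a binary filter action."""
    if label not in ("ham", "spam", "phish"):
        raise ValueError(f"unknown label: {label!r}")
    return "deliver" if label == "ham" else "block"


def filtering_metrics(true_labels, predicted_labels):
    """Precision and recall of the 'block' decision after collapsing both label lists."""
    tp = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        t_block = filter_decision(t) == "block"
        p_block = filter_decision(p) == "block"
        tp += int(t_block and p_block)
        fp += int(p_block and not t_block)
        fn += int(t_block and not p_block)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical gold labels and three-way predictions.
truth = ["ham", "spam", "phish", "ham"]
predicted = ["ham", "phish", "ham", "spam"]
```

Here the spam message predicted as phish still counts as a correct block, so confusions between the two unwanted classes cost nothing at filtering time; only ham/unwanted confusions do.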
68

Evaluating Statistical Machine Learning and Deep Learning Algorithms for Anomaly Detection in Chat Messages / Utvärdering av statistiska maskininlärnings- och djupinlärningsalgoritmer för anomalitetsdetektering i chattmeddelanden

Freberg, Daniel January 2018
Automatically detecting anomalies in text is of great interest to surveillance entities, such as companies and authorities that monitor communication, since vast amounts of data can be analysed to find suspicious activity. In this thesis, three distinct machine learning algorithms are evaluated while implementing a chat message classifier for market surveillance. Naive Bayes and Support Vector Machine belong to the statistical class of machine learning algorithms evaluated in this thesis, and both require feature selection; a secondary objective of the thesis is therefore to find a suitable feature selection technique that lets these algorithms achieve high performance. The deep learning algorithm evaluated is a Long Short-Term Memory (LSTM) network; rather than depending on feature selection, it is trained using word embeddings. All of the algorithms achieved high performance, but the findings suggest that Naive Bayes combined with a feature-counting (term frequency) feature selection technique is the most suitable choice for this particular learning problem.
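A feature-counting selection step of the kind this abstract favours might look like the following sketch: rank terms by their total count in the training messages and keep the top `k` as the feature set for Naive Bayes or SVM. The exact counting rule, the alphabetical tie-breaking, the optional `stop_words` set, and the value of `k` are all assumptions; the thesis's precise variant may differ.

```python
from collections import Counter


def select_top_k_terms(docs, k, stop_words=frozenset()):
    """Keep the k terms with the highest total count across the training documents.
    Ties are broken alphabetically so the selection is deterministic."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split()
                     if tok not in stop_words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def project(doc, selected):
    """Represent a document by the counts of the selected terms only."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in selected]


# Hypothetical chat messages.
messages = ["buy now buy", "sell now", "hello team"]
selected = select_top_k_terms(messages, 2)
```

Restricting the vocabulary this way shrinks the feature space that the statistical classifiers must estimate parameters for, which is why they need such a step while the LSTM, working from word embeddings, does not.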
69

Cross-domain sentiment classification using grams derived from syntax trees and an adapted naive Bayes approach

Cheeti, Srilaxmi January 1900
Master of Science / Department of Computing and Information Sciences / Doina Caragea / There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, and kitchen appliances. To make use of such opinions, it is useful to identify their polarity, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text or document as positive, negative, or neutral based on the words it contains. Supervised learning approaches have been used successfully for sentiment classification in domains that are rich in labeled data. Some of these approaches use features such as unigrams, bigrams, sentiment words, adjectives, and syntax trees (or variations of trees obtained using pruning strategies). However, in some domains the amount of labeled data is relatively small, and an accurate classifier cannot be trained with a purely supervised approach. It is therefore useful to study domain adaptation techniques that can transfer knowledge from a source domain that has labeled data to a target domain that has little or no labeled data but a large amount of unlabeled data. We address this problem in the context of product reviews, specifically reviews of movies, DVDs, and kitchen appliances. Our approach uses an Adapted Naive Bayes (ANB) classifier on top of the Expectation Maximization (EM) algorithm to predict the sentiment of a sentence. We use grams derived from complete syntax trees or from syntax subtrees as features when training the ANB classifier. More precisely, we extract grams from the syntax trees corresponding to sentences in either the source or the target domain.
To transfer knowledge from source to target, we identify generalized features (grams) using the frequently co-occurring entropy (FCE) method and represent the source instances using these generalized features. The target instances are represented either with all grams occurring in the target or with a reduced set obtained by removing infrequent grams. We experiment with different types of grams in a supervised framework to identify the most predictive gram types, and then use those grams in the domain adaptation framework. Experimental results on several cross-domain tasks show that domain adaptation approaches that combine source and target data (a small amount of labeled and some unlabeled data) can learn classifiers for the target that are better than those learned from the labeled target data alone.
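EM is what lets unlabeled target-domain data contribute here. The sketch below is the standard semi-supervised EM recipe for a multinomial naive Bayes classifier: fit on labeled data, then alternate between soft-labeling the unlabeled documents (E-step) and re-estimating the model from labeled plus soft-labeled documents (M-step). It is not the thesis's Adapted Naive Bayes: there is no source/target weighting, the features are plain unigrams rather than syntax-tree grams, FCE feature selection is omitted, and the toy movie and kitchen reviews are hypothetical.

```python
import math
from collections import Counter


def m_step(docs, weights, classes, vocab, alpha=1.0):
    """Estimate class priors and per-class word probabilities from documents
    with (possibly fractional) class memberships weights[d][c] in [0, 1]."""
    total = sum(sum(w.values()) for w in weights)
    prior, cond = {}, {}
    for c in classes:
        prior[c] = (sum(w[c] for w in weights) + alpha) / (total + alpha * len(classes))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for tok, n in Counter(doc).items():
                counts[tok] += w[c] * n
        denom = sum(counts.values()) + alpha * len(vocab)
        cond[c] = {t: (counts[t] + alpha) / denom for t in vocab}
    return prior, cond


def e_step(doc, prior, cond, classes):
    """Posterior P(c | doc) under the current model, computed in log space."""
    logs = {c: math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc if t in cond[c])
            for c in classes}
    top = max(logs.values())
    unnorm = {c: math.exp(v - top) for c, v in logs.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def em_naive_bayes(labeled, labels, unlabeled, iters=10):
    classes = sorted(set(labels))
    docs_l = [d.lower().split() for d in labeled]
    docs_u = [d.lower().split() for d in unlabeled]
    vocab = {t for d in docs_l + docs_u for t in d}
    # Labeled documents keep hard (0/1) memberships throughout.
    hard = [{c: float(y == c) for c in classes} for y in labels]
    prior, cond = m_step(docs_l, hard, classes, vocab)  # initialise from labeled data
    for _ in range(iters):
        soft = [e_step(d, prior, cond, classes) for d in docs_u]            # E-step
        prior, cond = m_step(docs_l + docs_u, hard + soft, classes, vocab)  # M-step

    def classify(text):
        post = e_step(text.lower().split(), prior, cond, classes)
        return max(post, key=post.get)

    return classify


# Labeled "source" movie reviews and unlabeled "target" kitchen reviews (hypothetical).
clf = em_naive_bayes(["great plot great acting", "boring plot awful acting"],
                     ["pos", "neg"],
                     ["great blender", "awful blender",
                      "great sturdy knife", "awful flimsy knife"])
```

In this toy run, "sturdy" and "flimsy" never appear in the labeled data, yet they end up associated with the positive and negative class respectively because they co-occur with "great" and "awful" in the unlabeled target reviews.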
70

Σύγχρονες τεχνικές στις διεπαφές ανθρώπινου εγκεφάλου - υπολογιστή / Modern Techniques in Human Brain-Computer Interfaces

Τσιλιγκιρίδης, Βασίλειος 16 June 2011
Brain-Computer Interface (BCI) systems require real-time, efficient processing of measurements of the user's electroencephalographic (EEG) signals in order to translate the user's mental processes or intentions into control signals for external devices or systems. In this work, the theoretical background of the problem was studied and the main techniques in use today were briefly analyzed. In addition, a method was presented for classifying a user's mental intentions of left and right hand movement, and it was applied to real medical data. The extraction of features that differ between the two states was based on time-frequency information, obtained by filtering the raw EEG data with causal Morlet wavelets. Two reliable methods were then developed and compared for classifying these features. The first builds probabilistic normal-distribution models for each class of movement intention, with the final classification decision made by a naive Bayes classifier; the second builds a classification model based on the theoretical framework of Support Vector Machines (SVM). The goal of this binary classification problem is to decide, as quickly and reliably as possible, which of the two classes a given mental intention belongs to, so that the algorithm can provide real-time feedback of the final decision to the user.
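Both stages of the described method can be sketched in plain Python: band power at a chosen frequency from correlation with a complex Morlet wavelet, followed by a Gaussian naive Bayes classifier that fits one normal distribution per feature and class. Assumptions: a single channel, a centred (non-causal) wavelet rather than the causal wavelets the thesis describes, `n_cycles = 7`, truncation at ±3σ, and the toy feature values in the usage; the SVM alternative is not shown.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=7):
    """Mean power of `signal` (sampled at `fs` Hz) around `freq` Hz, from its
    correlation with a unit-energy complex Morlet wavelet. The wavelet is
    centred (non-causal); the thesis describes causal Morlet wavelets."""
    sigma_t = n_cycles / (2 * math.pi * freq)   # Gaussian width in seconds
    half = int(round(3 * sigma_t * fs))          # truncate the wavelet at +/- 3 sigma
    if len(signal) <= 2 * half:
        raise ValueError("signal is shorter than the wavelet")
    wavelet = [cmath.exp(2j * math.pi * freq * (k / fs))
               * math.exp(-((k / fs) ** 2) / (2 * sigma_t ** 2))
               for k in range(-half, half + 1)]
    norm = math.sqrt(sum(abs(w) ** 2 for w in wavelet))
    wavelet = [w / norm for w in wavelet]
    # Slide over every position where the whole wavelet fits inside the signal.
    powers = []
    for start in range(len(signal) - 2 * half):
        acc = sum(signal[start + k] * wavelet[k].conjugate() for k in range(2 * half + 1))
        powers.append(abs(acc) ** 2)
    return sum(powers) / len(powers)


class GaussianNaiveBayes:
    """One independent normal distribution per (class, feature) pair."""

    def fit(self, xs, ys, var_floor=1e-9):
        self.classes = sorted(set(ys))
        self.params = {}
        for c in self.classes:
            rows = [x for x, y in zip(xs, ys) if y == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(col) + var_floor
                         for col, m in zip(cols, means)]
            self.params[c] = (math.log(len(rows) / len(xs)), means, variances)
        return self

    def predict(self, x):
        def log_posterior(c):
            log_prior, means, variances = self.params[c]
            return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                                   for xi, m, v in zip(x, means, variances))
        return max(self.classes, key=log_posterior)


# Usage: a 2-second 10 Hz sine sampled at 250 Hz, and toy two-feature trials.
fs = 250
sine10 = [math.sin(2 * math.pi * 10 * n / fs) for n in range(2 * fs)]
gnb = GaussianNaiveBayes().fit(
    [[1.0, 0.1], [1.2, 0.2], [0.9, 0.15], [0.1, 1.0], [0.2, 1.1], [0.15, 0.9]],
    ["left"] * 3 + ["right"] * 3)
```

In a real pipeline, each trial's feature vector would be a set of such wavelet powers (for example at several frequencies and electrodes) fed to the classifier.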
