Global ETD Search

61	Automated classification of bibliographic data using SVM and Naive Bayes Nordström, Jesper January 2018 Classification of scientific bibliographic data is an important and increasingly time-consuming task in a “publish or perish” paradigm where the number of scientific publications is steadily growing. Apart from being a resource-intensive endeavor, manual classification has also been shown to be often performed with a high degree of inconsistency. Since many bibliographic databases contain a large number of already classified records, supervised machine learning for automated classification might be a solution for handling the increasing volumes of published scientific articles. In this study, automated classification of bibliographic data based on two machine learning methods, Naive Bayes and Support Vector Machine (SVM), was evaluated. The data used in the study were collected from the Swedish research database SwePub, and the features used for training the classifiers were based on abstracts and titles in the bibliographic records. The accuracy achieved ranged from 0.54 to 0.84. The classifiers based on Support Vector Machine consistently received higher scores than the classifiers based on Naive Bayes. Classification performed at the second level of the hierarchical classification system clearly resulted in lower scores than classification performed at the first level. Using abstracts as the basis for feature extraction yielded better results overall than using titles; the differences were, however, very small. automated classification machine learning Naive Bayes Support Vector Machine SVM bibliographic data SwePub Computer and Information Sciences Data- och informationsvetenskap
62	Lane Change Intent Analysis for Preceding Vehicles : a Study Using Various Machine Learning Techniques / Analys av framförvarande fordons filbytesintentioner : En studie utnyttjande koncept från maskininlärning Fredrik, Ljungberg January 2017 In recent years, the level of technology in heavy duty vehicles has increased significantly. Progress has been made towards autonomous driving, with increased driver comfort and safety, partly by use of advanced driver assistance systems (ADAS). In this thesis the possibilities to detect and predict lane changes for the preceding vehicle are studied. This important information will help to improve the decision-making for safety systems. Some suitable approaches to solving the problem are presented, along with an evaluation of their related accuracies. The modelling of human perceptions and actions is a challenging task. Several thousand kilometers of driving data were available, and a reasonable course of action was to let the system learn from this off-line. For the thesis it was therefore decided to review the possibility of utilizing a branch within the area of artificial intelligence called supervised learning. The study of driving intentions was formulated as a binary classification problem. To distinguish between lane-change and lane-keep actions, four machine learning techniques were evaluated, namely naive Bayes, artificial neural networks, support vector machines and Gaussian processes. As input to the classifiers, fused sensor signals from systems commercially available today in Scania vehicles were used. The project was carried out within the boundaries of a Master’s Thesis project in collaboration between Linköping University and Scania CV AB. Scania CV AB is a leading manufacturer of heavy trucks, buses and coaches, alongside industrial and marine engines.
Machine Learning Gaussian Processes Support Vector Machines Artificial Neural Networks Naive Bayes Self-driving car autonomous Scania Control Engineering Reglerteknik
63	Predicting Political Party Affiliation in the Swedish Parliament using Natural Language Processing Zetterberg, Johannes January 2022 Text classification is a fundamental part of natural language processing. In this thesis, methods for text classification are used in an attempt to predict the political party affiliation of members of parliament (MPs). The objective is to evaluate the performance of Support Vector Machines (SVM), naive Bayes, and a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model in predicting MPs' political party affiliation based on speeches given in the Chamber of the Swedish Parliament. This study shows that BERT outperforms SVM and naive Bayes in correctly classifying MPs, and that SVM makes better predictions than naive Bayes and performs reasonably well compared to BERT. The results show that all models correctly predict MPs representing the Sweden Democrats to the highest degree. Both BERT and SVM classify roughly every other speech correctly, which is much better than making random predictions. These results indicate the potential use of methods for automatically classifying political speeches. Machine learning support vector machines naive Bayes transformer BERT text classification NLP Probability Theory and Statistics Sannolikhetsteori och statistik
64	Natural language processing for research philosophies and paradigms dissertation (DFIT91) Mawila, Ntombhimuni 28 February 2021 Research philosophies and paradigms (RPPs) reveal researchers’ assumptions and provide a systematic way in which research can be carried out effectively and appropriately. Different studies highlight cognitive and comprehension challenges of RPP concepts at the postgraduate level. This study develops a natural language processing (NLP) supervised classification application that guides students in identifying RPPs applicable to their study. By using algorithms rooted in a quantitative research approach, this study builds a corpus represented using the Bag of Words model to train the Naïve Bayes, Logistic Regression, and Support Vector Machine algorithms. Computer experiments conducted to evaluate the performance of the algorithms reveal that the Naïve Bayes algorithm presents the highest accuracy and precision levels. In practice, user testing results show the varying impact of knowledge, performance, and effort expectancy. The findings contribute to the minimization of issues postgraduates encounter in identifying research philosophies and the underlying paradigms for their studies. / Science and Technology Education / MTech. (Information Technology) Research Philosophy Paradigm Corpus Algorithm Classification model Classifier Bag of words Naive Bayes Researcher 006.35
65	How to explain graph-based semi-supervised learning for non-mathematicians? Jönsson, Mattias, Borg, Lucas January 2019 The large amount of available data on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and needs to be manually labeled by a human before it can be used by a machine learning algorithm. Semi-supervised learning (SSL) is a technique where the algorithm uses a few labeled samples to automatically label the rest of the data. One approach to SSL is to represent the data in a graph, also called graph-based semi-supervised learning (GSSL), and find similarities between the nodes for automatic labeling. Our goal in this thesis is to simplify the advanced processes and steps needed to implement a GSSL algorithm. We cover basic tasks such as setting up the development environment and more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Lastly, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL. We showcase the classification performance by classifying documents from the 20 Newsgroup dataset using LP and MNB. The results are documented using two different evaluation scores, F1-score and accuracy. A comparison between MNB and the LP algorithm using two different types of kernels, KNN and RBF, was made on different amounts of labeled documents. The results from the classification algorithms show that MNB is better at classifying the data than LP. Graph based SSL Label Propagation Naive Bayes KNN RBF Feature extraction 20 newsgroup preprocessing graph construction Engineering and Technology Teknik och teknologier
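As a rough illustration of the Multinomial Naive Bayes (MNB) classifier over bag-of-words counts mentioned in record 65, the following minimal sketch uses only the Python standard library. The class name `MultinomialNaiveBayes`, the whitespace tokenizer, the add-one smoothing, and the toy documents and labels are all assumptions made for illustration; this is not the thesis's implementation and it does not use the 20 Newsgroup dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace: a deliberately crude bag-of-words tokenizer."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data, invented for illustration only.
train_docs = [
    "the team won the hockey game",
    "a great goal in the final game",
    "the new graphics card renders fast",
    "install the driver for the graphics card",
]
train_labels = ["sport", "sport", "tech", "tech"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("hockey game tonight"))
print(model.predict("graphics driver update"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that is unseen in one class from forcing that class's probability to zero.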
66	Klasifikace příspěvků ve webových diskusích / Classification of Web Forum Entries Margold, Tomáš January 2008 This thesis deals with text ranking in the context of the internet. It describes the available methods for classifying and splitting text reports. Part of the thesis is an implementation of the naive Bayes algorithm and of a classifier based on neural networks. The selected methods are compared with respect to their error rate and other ranking criteria.
67	Improving Filtering of Email Phishing Attacks by Using Three-Way Text Classifiers Trevino, Alberto 13 March 2012 The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool to a social engineering attack vector. Much of today's unwanted email tries to deceive users into replying with passwords or bank account information, or into visiting malicious sites which steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks, which try to appear real not only to users but also, subsequently, to spam filters. Several sources indicate traditional content filters have a hard time detecting phishing attacks because the emails lack the traditional features and characteristics of spam messages. This thesis tests the hypothesis that by separating the messages into three categories (ham, spam and phish) content filters will yield better filtering performance. Even though experimentation showed three-way classification did not improve performance, several additional premises were tested, including the validity of the claim that phishing emails are too much like legitimate emails and the ability of Naive Bayes classifiers to properly classify emails. email spam filtering phish phishing attacks support vector machines maximum entropy naive bayes bayesian filters Information Security
68	Evaluating Statistical Machine Learning and Deep Learning Algorithms for Anomaly Detection in Chat Messages / Utvärdering av statistiska maskininlärnings- och djupinlärningsalgoritmer för anomalitetsdetektering i chattmeddelanden Freberg, Daniel January 2018 Automatically detecting anomalies in text is of great interest for surveillance entities, as vast amounts of data can be analysed to find suspicious activity. In this thesis, three distinct machine learning algorithms are evaluated as a chat message classifier is implemented for the purpose of market surveillance. Naive Bayes and Support Vector Machine belong to the statistical class of machine learning algorithms evaluated in this thesis, and both require feature selection; a side objective of the thesis is thus to find a suitable feature selection technique to ensure that these algorithms achieve high performance. The Long Short-Term Memory network is the deep learning algorithm evaluated in the thesis; rather than depending on feature selection, the deep neural network is trained using word embeddings. Each of the algorithms achieved high performance, but the findings of the thesis suggest that the Naive Bayes algorithm in conjunction with a feature-counting feature selection technique is the most suitable choice for this particular learning problem. machine learning NLP deep learning word vectors naive bayes support vector machine LSTM Computer Sciences Datavetenskap (datalogi)
69	IMPROVING THE UTILITY OF DIFFERENTIALLY PRIVATE ALGORITHMS USING DATA CHARACTERISTICS Farzad Zafarani (11837222) 10 January 2025 As data continues to grow rapidly in volume and complexity, there is an increasing need to extract meaningful insights from it. These datasets often contain sensitive individual information, making privacy protection crucial. Differential privacy has become the de facto standard for protecting individuals' privacy. Many datasets also have known constraints and structures. Can these known constraints or structures be leveraged to design mechanisms with better utility? The focus of this thesis is to demonstrate that by leveraging the inherent structures and constraints within datasets, it may be possible to design differential privacy mechanisms that offer better utility (i.e., more accurate results) while maintaining the required level of privacy. This involves exploring advanced techniques and modifications to the basic mechanisms that take advantage of dataset-specific properties, such as sparsity, distributional assumptions, or other contextual information. This approach aims to minimize the noise added, thereby improving the utility of differentially private outputs. In many scenarios, datasets contain constraints. In this thesis, we show that generating differentially private synthetic data while preserving constraints increases utility across several metrics, including marginal queries, classification task accuracy, and clustering. Smooth sensitivity is a data-dependent sensitivity metric that allows for more precise noise addition based on the actual data distribution, rather than worst-case scenarios. It addresses the limitations of local sensitivity by ensuring robust privacy guarantees, even in the presence of outliers or small changes in the data. We have developed a differentially private Naive Bayes model using smooth sensitivity. By using data-dependent sensitivity measures like smooth sensitivity and incorporating known data constraints, we can reduce the amount of noise added, resulting in a more accurate model. Data and information privacy Naive Bayes Classifier Differential Privacy Privacy Synthetic Data Generation Smooth Sensitivity
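To make the noise-versus-utility trade-off in record 69 concrete, here is a minimal sketch of the standard Laplace mechanism applied to a single count, such as one of the counts a Naive Bayes model is built from. It uses global (worst-case) sensitivity, which is the baseline that smooth sensitivity is meant to improve on; it is not the thesis's smooth-sensitivity method. The function names, the count of 120, and the epsilon values are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy via the Laplace mechanism.

    Assumes adding or removing one record changes the count by at most
    `sensitivity` (the global, worst-case sensitivity).
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
true_count = 120        # hypothetical count of records with one feature value in one class

errors = {}
for epsilon in (0.1, 1.0, 10.0):
    releases = [private_count(true_count, epsilon, rng) for _ in range(2000)]
    errors[epsilon] = sum(abs(r - true_count) for r in releases) / len(releases)
    print(f"epsilon={epsilon}: mean absolute error ~ {errors[epsilon]:.2f}")
```

The noise scale is sensitivity divided by epsilon, so a smaller sensitivity bound directly shrinks the error at a fixed privacy level, which is why data-dependent sensitivity measures can yield a more accurate model.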
70	Cross-domain sentiment classification using grams derived from syntax trees and an adapted naive Bayes approach Cheeti, Srilaxmi January 1900 Master of Science / Department of Computing and Information Sciences / Doina Caragea / There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, kitchen appliances, etc. To make use of such opinions, it is useful to identify the polarity of the opinion, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text/document as either positive, negative or neutral based on the words present in the document. Supervised learning approaches have been successfully used for sentiment classification in domains that are rich in labeled data. Some of these approaches make use of features such as unigrams, bigrams, sentiment words, adjective words, syntax trees (or variations of trees obtained using pruning strategies), etc. However, for some domains the amount of labeled data can be relatively small and we cannot train an accurate classifier using the supervised learning approach. Therefore, it is useful to study domain adaptation techniques that can transfer knowledge from a source domain that has labeled data to a target domain that has little or no labeled data, but a large amount of unlabeled data. We address this problem in the context of product reviews, specifically reviews of movies, DVDs and kitchen appliances. Our approach uses an Adapted Naive Bayes classifier (ANB) on top of the Expectation Maximization (EM) algorithm to predict the sentiment of a sentence. We use grams derived from complete syntax trees or from syntax subtrees as features when training the ANB classifier. More precisely, we extract grams from syntax trees corresponding to sentences in either the source or target domains.
To be able to transfer knowledge from source to target, we identify generalized features (grams) using the frequently co-occurring entropy (FCE) method, and represent the source instances using these generalized features. The target instances are represented with all grams occurring in the target, or with a reduced gram set obtained by removing infrequent grams. We experiment with different types of grams in a supervised framework in order to identify the most predictive types of gram, and further use those grams in the domain adaptation framework. Experimental results on several cross-domain tasks show that domain adaptation approaches that combine source and target data (a small amount of labeled and some unlabeled data) can help learn classifiers for the target that are better than those learned from the labeled target data alone. Adapted naive bayes algorithm Cross domain sentiment classification Grams Domain adaptation Syntax subtrees Computer Engineering (0464) Computer Science (0984) Information Science (0723)
