61

Taskfinder : Comparison of NLP techniques for text classification within FMCG stores

Jensen, Julius January 2022 (has links)
Natural language processing has many important applications today, such as translation, spam filters, and other useful products. Supervised and unsupervised machine learning models have proven successful for these applications. The most important aspect of these models is what they can achieve with different datasets. This article examines how RNN models compare with Naive Bayes in text classification. The chosen RNN models are long short-term memory (LSTM) and gated recurrent unit (GRU); both are trained using the Flair framework. The models are trained on three separate datasets with different compositions, and the trend within each model is examined and compared with the other models. The results showed that Naive Bayes performed better than the RNN models at classifying short sentences, but worse on longer sentences. When trained on a small dataset, LSTM and GRU achieved better results than Naive Bayes. The best-performing model overall was Naive Bayes, which had the highest accuracy score on two of the three datasets.
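A minimal sketch of a comparable Flair training setup is shown below; it is not the authors' code, and the corpus path, embeddings, and hyperparameters are illustrative assumptions.

```python
from flair.datasets import ClassificationCorpus
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# FastText-style corpus folder with train.txt / dev.txt / test.txt,
# where each line looks like: __label__<class> <sentence>
corpus = ClassificationCorpus("data/tasks", label_type="class")

# Recurrent document encoder over GloVe word embeddings; swap rnn_type
# between "GRU" and "LSTM" to compare the two recurrent models.
document_embeddings = DocumentRNNEmbeddings(
    [WordEmbeddings("glove")], hidden_size=128, rnn_type="GRU"
)

classifier = TextClassifier(
    document_embeddings,
    label_dictionary=corpus.make_label_dictionary(label_type="class"),
    label_type="class",
)

ModelTrainer(classifier, corpus).train("models/taskfinder-gru", max_epochs=10)
```

Keeping every other setting fixed and changing only `rnn_type` gives a controlled comparison between the two recurrent encoders.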
62

Machine Learning explainability in text classification for Fake News detection

Kurasinski, Lukas January 2020 (has links)
Fake news detection has gained interest in recent years. This has led researchers to look for models that can classify text for fake news detection. While new models are developed, researchers mostly focus on the accuracy of a model. Little research has been done on the explainability of Neural Network (NN) models constructed for text classification and fake news detection. When trying to add a level of explainability to a Neural Network model, a lot of different aspects have to be taken into consideration. Text length, pre-processing, and complexity play an important role in achieving successful classification. The model's architecture has to be taken into consideration as well. All these aspects are analyzed in this thesis. In this work, an analysis of attention weights is performed to give an insight into NN reasoning about texts. Visualizations are used to show how two models, a Bidirectional Long Short-Term Memory Convolutional Neural Network (BIDir-LSTM-CNN) and Bidirectional Encoder Representations from Transformers (BERT), distribute their attention while training and classifying texts. In addition, statistical data is gathered to deepen the analysis. After the analysis, it is concluded that explainability can positively influence the decisions made while constructing an NN model for text classification and fake news detection. Although explainability is useful, it is not a definitive answer to the problem. Architects should test and experiment with different solutions to be successful in effective model construction.
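As a hedged illustration of the kind of attention analysis described above (not the thesis's own code), the Hugging Face Transformers library can expose per-layer attention weights for inspection; the model name and example sentence are placeholders.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("breaking: experts say the moon is made of cheese", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, heads, seq_len, seq_len).
last_layer = outputs.attentions[-1].mean(dim=1)[0]   # average heads -> (seq_len, seq_len)
cls_attention = last_layer[0]                        # attention paid by the [CLS] position
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, weight in zip(tokens, cls_attention):
    print(f"{token:15s}{weight.item():.3f}")
```

Plotting these weights per token is one common way to visualize which parts of a text the model attends to when classifying it.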
63

Klassificering av transkriberade telefonsamtal med Support Vector Machines för ökad effektivitet inom vården / Classification of transcribed telephone calls with support vector machines for increased efficiency in healthcare

Höglind, Sanna, Sundström, Emelie January 2019 (has links)
The Patient Advisory Committee's administration in Stockholm (Patientnämndens förvaltning) annually receives thousands of calls from people wishing to make complaints about the healthcare in Region Stockholm. The purpose of this work is to investigate how an NLP robot for classifying incoming complaints could contribute to increased efficiency of the operation. The classification of the complaints was carried out using a method based on Support Vector Machines. To optimize the model's accuracy, the effect of the word-vector length on accuracy was investigated. The model achieved a final accuracy of 53.10%. This result was then analyzed with the aim of identifying potential improvements to the model. For future work, it may therefore be interesting to investigate how the number of calls, the number of people recording calls, and the class distribution in the dataset affect the accuracy. A SWOT analysis was used to examine how the efficiency of Patientnämndens förvaltning in Stockholm would be affected by the implementation of an NLP robot. This analysis showed clear advantages of automating complaint handling, but also that such an implementation must be carried out with caution, ensuring that sufficient competence is available to counter potential threats. / Every year Patientnämnden receives thousands of phone calls from patients wishing to make complaints about the health care in Stockholm. The aim of this work is to investigate how an NLP robot for classification of received phone calls would contribute to increased efficiency of the operation. The classification of the complaints has been made using a method based on Support Vector Machines. In order to optimize the accuracy of the model, the impact of the length of the word vectors was investigated. The final result was an accuracy of 53.10%. The result was analyzed with the goal of identifying potential opportunities for improvement of the model. For future work it could be interesting to investigate how the number of calls, the number of people recording the calls, and the distribution between the classes affect the accuracy. A SWOT analysis was performed in order to investigate how the efficiency of Patientnämnden would be affected by the implementation of an NLP robot. The analysis showed apparent benefits of automation of complaint management, but also that such an implementation must be done with great caution in order to ensure that the available competence is high enough to prevent potential threats.
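For readers unfamiliar with the method, a minimal Support Vector Machine text-classification pipeline might look like the sketch below; it uses TF-IDF features as a rough stand-in for the thesis's word vectors, and the transcripts, labels, and vector length are placeholders.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

train_texts = ["...transcribed complaint call one...", "...transcribed complaint call two..."]
train_labels = ["vård och behandling", "kommunikation"]   # placeholder complaint categories

model = make_pipeline(
    TfidfVectorizer(max_features=300),   # feature-vector length, a rough analog of the word-vector length studied
    LinearSVC(),
)
model.fit(train_texts, train_labels)
print(accuracy_score(train_labels, model.predict(train_texts)))
```

Varying `max_features` (or the dimensionality of whatever vector representation is used) and re-measuring accuracy reproduces the kind of length study the abstract describes.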
64

Predicting SNI Codes from Company Descriptions : A Machine Learning Solution

Lindholm, Erik, Nilsson, Jonas January 2023 (has links)
This study aims to develop an automated solution for assigning industry codes to businesses based on the contents of their business descriptions. The Swedish Standard Industrial Classification (SNI) is a system used by Statistics Sweden (SCB) to categorize businesses for its statistical reports. Assignment of SNI codes has so far been done manually by the person registering a new company, but this is a far from optimal solution. Some of the 88 main-group areas of industry are hard to tell apart from one another, which often leads to incorrect assignments. Our approach to this problem was to train machine learning models using the Naive Bayes and SVM classifier algorithms and conduct an experiment. In 2019, Dahlqvist and Strandlund attempted this and reached an accuracy of 52 percent using a gradient boosting classifier, but this was considered too low for real-world implementation. Our main goal was to achieve a higher accuracy than Dahlqvist and Strandlund, which we eventually did: our best-performing SVM model reached 60.11 percent. Like Dahlqvist and Strandlund, we concluded that the low quality of the dataset was the main obstacle to achieving higher scores. The dataset we used was severely imbalanced, and much time was spent investigating and applying oversampling and undersampling as strategies for mitigating this problem. However, we found during the testing phase that none of these strategies had any positive effect on the accuracy scores.
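A rough sketch of such an experiment, assuming scikit-learn and imbalanced-learn, is shown below; the descriptions, SNI main groups, and the choice of random oversampling are illustrative and not taken from the thesis.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

descriptions = ["säljer blommor och växter", "utvecklar mjukvara åt företag"]   # placeholder descriptions
sni_codes = ["47", "62"]                                                        # placeholder SNI main groups

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(descriptions)

# Oversample minority classes so every SNI main group is equally represented
ros = RandomOverSampler(random_state=42)
X_resampled, y_resampled = ros.fit_resample(X, sni_codes)

clf = LinearSVC().fit(X_resampled, y_resampled)
print(accuracy_score(sni_codes, clf.predict(X)))
```

Undersampling can be tested by swapping in `RandomUnderSampler` from `imblearn.under_sampling`, so both mitigation strategies can be compared on the same features.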
65

NOVA: Automated Detection of Violent Threats in Swedish Online Environments

Lindén, Kevin, Moshfegh, Arvin January 2023 (has links)
Social media and online environments have become an integral part of society, allowing for self-expression, information sharing, and discussions online. However, these platforms are also used to express hate and threats of violence. Violent threats online lead to negative consequences, such as an unsafe online environment, self-censorship, and endangering democracy. Manually detecting and moderating threats online is challenging due to the vast amounts of data uploaded daily. Scholars have called for efficient tools based on machine learning to tackle this problem. Another challenge is that few threat-focused datasets and models exist, especially for low-resource languages such as Swedish, making it difficult to identify and detect threats. Therefore, this study aims to develop a practical and effective tool to automatically detect and identify online threats in Swedish. A tailored Swedish threat dataset was generated to fine-tune KBLab's Swedish BERT model. The research question that guides this project is "How effective is a fine-tuned BERT model in classifying texts as threatening or non-threatening in Swedish online environments?". To the authors' knowledge, no existing model can detect threats in Swedish. This study uses design science research to develop the artifact and evaluates the artifact's performance using experiments. The dataset was generated during design and development by manually annotating translated English, synthetic, and authentic Swedish data. The BERT model was fine-tuned using hyperparameters from previous research. The generated dataset comprised 6,040 posts, split into 39% threats and 61% non-threats. The model, NOVA, achieved good performance on the test set and in the wild, successfully differentiating threats from non-threats. NOVA achieved almost perfect recall but lower precision, indicating room for improvement. NOVA might be too lenient when classifying threats, which could be attributed to the complexity and ambiguity of threats and the relatively small dataset. Nevertheless, NOVA can be used as a filter to identify threatening posts online among vast amounts of data.
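A sketch of how such a fine-tuning run could be set up with Hugging Face Transformers follows; the model identifier, placeholder posts, and hyperparameters are assumptions rather than NOVA's actual configuration.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

data = Dataset.from_dict({
    "text": ["exempel på hotfullt inlägg", "exempel på neutralt inlägg"],   # placeholder posts
    "label": [1, 0],                                                        # 1 = threat, 0 = non-threat
})

tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "KB/bert-base-swedish-cased", num_labels=2
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="nova", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data.map(tokenize, batched=True),
)
trainer.train()
```

After training, adjusting the decision threshold on the model's class probabilities is one way to trade the near-perfect recall reported above against higher precision.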
66

Genre classification using syntactic features

Brigadoi, Ivan January 2021 (has links)
This thesis addresses text classification in relation to genre identification using different feature sets, with a focus on syntax-based features. We built our models with traditional machine learning algorithms, i.e. Naive Bayes, K-nearest neighbour, Support Vector Machine, and Random Forest, in order to predict the literary genre of books. We trained our models using bag-of-words (BOW), bigrams, syntactic bigrams, and emotional features as feature sets, as well as combinations of features. Results on the test set obtained with the best features, i.e. BOW combined with bigrams based on syntactic relations between words, showed a 2% improvement in F1-score over the BOW baseline, which translates into a positive impact of using syntactic information in the task of text classification.
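As an illustration of what syntactic bigram features can look like in practice (not the thesis's implementation), dependency head-child pairs can be extracted with a parser and concatenated with bag-of-words counts; spaCy, its small English model, and the example sentence below are assumptions.

```python
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import hstack

nlp = spacy.load("en_core_web_sm")

def syntactic_bigrams(text):
    """Return head_child lemma pairs from the dependency parse as pseudo-tokens."""
    doc = nlp(text)
    return " ".join(f"{tok.head.lemma_}_{tok.lemma_}" for tok in doc if tok.dep_ != "ROOT")

texts = ["The detective slowly followed the mysterious stranger."]   # placeholder book excerpt

bow = CountVectorizer().fit_transform(texts)                         # plain BOW features
syn = CountVectorizer().fit_transform([syntactic_bigrams(t) for t in texts])
features = hstack([bow, syn])   # BOW combined with syntactic bigrams, as in the best setup
print(features.shape)
```

Unlike surface bigrams, pairs such as `follow_detective` and `follow_stranger` link words that are syntactically related even when they are not adjacent in the sentence, which is the intuition behind the reported gain.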
67

Unstructured to Actionable: Extracting wind event impact data for enhanced infrastructure resilience

Pham, An Huy 28 August 2023 (has links)
The United States experiences more extreme wind events than any other country, owing to its extensive coastlines, central regions prone to tornadoes, and a varied climate that together create a wide array of wind phenomena. Despite advanced meteorological forecasts, these events continue to have significant impacts on infrastructure due to the knowledge gap between hazard prediction and tangible impact. Consequently, disaster managers are increasingly interested in understanding the impacts of past wind events, which can assist in formulating strategies to enhance community resilience. However, this data is often unstructured and embedded in various agency documents, which makes it challenging to access and use effectively. Therefore, it is important to investigate approaches that can distinguish and extract impact data from non-essential information. This research explores methods that can identify, extract, and summarize sentences containing impact data. The significance of this study lies in addressing the scarcity of historical impact data related to structural and community damage, given that such information is dispersed across multiple briefings and damage reports. The research has two main objectives. The first is to extract sentences providing information on infrastructure or community damage. This task uses zero-shot text classification with the large version of the Bidirectional and Auto-Regressive Transformers model (BART-large) pre-trained on the Multi-Genre Natural Language Inference (MNLI) dataset. The model identifies impact sentences by evaluating entailment probabilities against user-defined impact keywords. This method addresses the absence of manually labeled data and establishes a framework applicable to various reports. The second objective transforms this extracted data into easily digestible summaries. This is achieved by using a BART-large model pre-trained on the Cable News Network (CNN) Daily Mail dataset to generate abstractive summaries, making it easier to understand the key points of the extracted impact data. This approach is versatile, given its dependence on user-defined keywords, and can adapt to different disasters, including tornadoes, hurricanes, earthquakes, floods, and more. A case study demonstrates this methodology, specifically examining the Hurricane Ian impact data found in the Structural Extreme Events Reconnaissance (StEER) damage report. / Master of Science / The U.S. sees more severe windstorms than any other country. These storms can cause significant damage, despite the availability of warnings and alerts generated from weather forecast systems up to 72 hours before the storm hits. One challenge is the ineffective communication between emergency managers and at-risk communities, which can hinder timely evacuation and preparation. Additionally, data about past storm damages are often mixed up with non-actionable information in many different reports, making it difficult to use the data to enhance future warnings and readiness for upcoming storms. This study tries to solve this problem by finding ways to identify, extract, and summarize information about damage caused by windstorms. It is an important step toward using historical data to prepare for future events. Two main objectives guide this research. The first involves extracting sentences in these reports that provide information on damage to buildings, infrastructure, or communities.
We're using a machine learning model to sort the sentences into two groups: those that contain useful information and those that do not. The second objective revolves around transforming this extracted data into easily digestible summaries. The same machine learning model is then trained in a different way, to create these summaries. As a result, critical data can be presented in a more user-friendly and effective format, enhancing its usefulness to disaster managers.
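A compact sketch of the two-stage approach, assuming off-the-shelf Hugging Face pipelines rather than the author's exact setup, could look like the following; the candidate labels, threshold, and report sentences are illustrative.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

impact_labels = ["infrastructure damage", "community impact", "no impact"]   # user-defined keywords
sentences = [
    "Several single-family homes lost their roofs along the coastline.",
    "The reconnaissance team arrived on site two days after landfall.",
]   # placeholder report sentences

# Stage 1: keep sentences whose top entailment label is an impact category
impact_sentences = []
for sentence in sentences:
    result = classifier(sentence, candidate_labels=impact_labels)
    if result["labels"][0] != "no impact" and result["scores"][0] > 0.5:   # assumed threshold
        impact_sentences.append(sentence)

# Stage 2: abstractive summary of the retained impact sentences
if impact_sentences:
    summary = summarizer(" ".join(impact_sentences), max_length=60, min_length=10)
    print(summary[0]["summary_text"])
```

Because the candidate labels are just strings, the same script can be pointed at tornado, earthquake, or flood reports by editing the keyword list.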
68

Enhanced Content-Based Fake News Detection Methods with Context-Labeled News Sources

Arnfield, Duncan 01 December 2023 (has links) (PDF)
This work examined the relative effectiveness of multilayer perceptron, random forest, and multinomial naïve Bayes classifiers, trained using bag-of-words and term frequency-inverse document frequency transformations of documents in the Fake News Corpus and the Fake and Real News Dataset. The goal of this work was to help meet the formidable challenges posed by the proliferation of fake news, including the erosion of public trust, disruption of social harmony, and endangerment of lives. This training included the use of context-categorized fake news in an effort to enhance the tools' effectiveness. It was found that term frequency-inverse document frequency provided more accurate results than bag-of-words across all evaluation metrics for identifying fake news instances, and that the Fake News Corpus yielded much higher result metrics than the Fake and Real News Dataset. In comparison to state-of-the-art methods, the models performed as expected.
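The comparison described above can be reproduced in outline with scikit-learn; the sketch below is illustrative only, with placeholder documents standing in for the Fake News Corpus and the Fake and Real News Dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

train_texts, train_labels = ["real article text", "fabricated article text"], [0, 1]   # placeholders
test_texts, test_labels = ["another real article"], [0]                                 # placeholders

vectorizers = {"bow": CountVectorizer(), "tfidf": TfidfVectorizer()}
classifiers = {"nb": MultinomialNB(),
               "rf": RandomForestClassifier(),
               "mlp": MLPClassifier(max_iter=500)}

# Train and score every feature-representation / classifier combination
for v_name, vec in vectorizers.items():
    for c_name, clf in classifiers.items():
        model = make_pipeline(vec, clf).fit(train_texts, train_labels)
        preds = model.predict(test_texts)
        print(v_name, c_name, f1_score(test_labels, preds, zero_division=0))
```

Swapping in the real corpora, a proper train/test split, and the full metric set (accuracy, precision, recall, F1) turns this grid into the evaluation harness the abstract describes.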
69

Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches.

AlKhateeb, Jawad H.Y. January 2010 (has links)
The design of a machine which reads unconstrained words remains an unsolved problem. For example, automatic interpretation of handwritten documents by a computer is still under research. Most systems attempt to segment words into letters and read words one character at a time. However, segmenting handwritten words is very difficult, so to avoid this, words are treated as a whole. This research investigates a number of features computed from whole words for the recognition of handwritten words in particular. Arabic text classification and recognition is a complicated process compared to Latin and Chinese text recognition systems. This is due to the cursive nature of Arabic text. The work presented in this thesis proposes word-based recognition of handwritten Arabic scripts. The work is divided into three main stages to provide a recognition system. The first stage is pre-processing, which applies methods that are essential for automatic recognition of handwritten documents. In this stage, techniques for detecting the baseline and segmenting words in handwritten Arabic text are presented. Connected components are then extracted, and distances between different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. The second stage is feature extraction. This stage makes use of the normalized images to extract features that are essential in recognizing the images. Various methods of feature extraction are implemented and examined. The third and final stage is classification. Various classifiers are used, such as the K-nearest neighbour classifier (k-NN), a neural network classifier (NN), Hidden Markov Models (HMMs), and the Dynamic Bayesian Network (DBN). To test this concept, the particular pattern recognition problem studied is the classification of 32,492 words using the IFN/ENIT database. The results were promising and very encouraging in terms of improved baseline detection and word segmentation for further recognition. Moreover, several feature subsets were examined, and a best recognition performance of 81.5% was achieved.
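As a toy illustration of the final classification stage only (the thesis's features and classifiers are far richer), a k-nearest-neighbour classifier over word-image feature vectors might be set up as follows; the random arrays are placeholders for features extracted from normalized word images.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.random((100, 64))      # 100 word images -> 64-dimensional feature vectors (placeholder)
y_train = rng.integers(0, 10, 100)   # 10 word classes out of the lexicon (placeholder)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict(rng.random((1, 64))))   # predicted word class for a new word image
```

The HMM and DBN classifiers mentioned above instead work on sequences of frame-wise features rather than a single fixed-length vector, which is one reason they are widely used for cursive scripts.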
70

Rethinking Document Classification: A Pilot for the Application of Text Mining Techniques To Enhance Standardized Assessment Protocols for Critical Care Medical Team Transfer of Care

Walker, Briana Shanise 09 June 2017 (has links)
No description available.
