Global ETD Search

1	Email Classification with Machine Learning and Word Embeddings for Improved Customer Support Rosander, Oliver, Ahlstrand, Jim January 2018 Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, a system can set up queues containing emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve on the manually defined rule-based algorithm currently implemented at a large telecom company by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce administrative and maintenance work and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and how they work together. A web-based interface was implemented that can classify emails into 33 different labels with a 0.91 F1-score using a Long Short Term Memory network. The authors conclude that Long Short Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails. Email Classification Machine Learning Long Short Term Memory Natural Language Processing Computer Sciences
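As an illustration of the F1-score reported in the abstract above, here is a minimal Python sketch of an F1-score averaged over several labels. The abstract does not say which averaging was used, so macro averaging is an assumption here, and the label names and function names are hypothetical.

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1-score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1-scores (macro averaging, an assumption)."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["billing", "billing", "tech", "tech", "sales"]
y_pred = ["billing", "tech", "tech", "tech", "sales"]
print(round(macro_f1(y_true, y_pred), 3))
```

Macro averaging weights every label equally, which matters when some of the labels are rare; a micro or weighted average would give different numbers on imbalanced data.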
2	Gone Phishing: How Task Interruptions Impact Email Classification Ability Slifkin, Elisabeth 01 January 2024 With the continuous rise in email use, the prevalence and sophistication of phishing attacks have increased. Expanding cybersecurity awareness and strengthening email practices will help reduce the dangers posed by phishing emails, but ultimately, the extent to which a user can accurately detect phishing emails directly impacts the amount of risk to which they are exposed. Being interrupted while reading and replying to emails is a consequence of working in a dynamic world. Interruptions are often found to be disruptive, both in terms of time costs and performance changes; they reliably increase a task's completion time, but their impact on accuracy is less consistent. The present three studies manipulated the length (Experiment 1), difficulty (Experiment 2), and similarity (Experiment 3) of interruptions in accordance with the memory for goals (MFG) model, which aims to explain why interruptions may be disruptive. Participants classified emails as either phishing or legitimate while periodically being interrupted with a secondary task. Across all three experiments, interruptions did not affect classification accuracy, but they did reliably increase classification response time. Oculomotor analyses indicated that interruptions, regardless of type, impaired memory of previously encoded email information. This was evidenced across all three experiments by an increase in refixations and an increase in the distance between fixations pre- and post-interruption. MFG can account for some of these findings, but not all. Interruptions did not impair performance on an email classification task when participants could review the interrupted information, yet overall classification accuracy was still low. 
These results may nevertheless suggest a pathway toward improving email classification performance, as participants exhibited behaviors known to improve performance on other tasks, such as revisiting previously viewed areas of an email. Email classification task interruptions eye tracking phishing Cognitive Psychology Human Factors Psychology
3	Email classification using machine learning algorithms Jonsson, Isak January 2022 The goal of this project is to construct a machine learning algorithm that improves over time. This was done by first constructing a dataset that reflects real-world messages and simulates receiving emails from two different sources. The dataset was constructed by combining data from two different online forums. Two application programming interfaces were used to collect and send data to the program. The dataset was tested on four different methods, of which the best would be used for the final product. The four methods were k-nearest neighbors, adaptive boosting, random forest and an artificial neural network. All of these methods were tested and tuned to achieve the best accuracy. The results made clear that the artificial neural network outperformed the other methods by a large margin and would be most suited for the final product. The final product was an algorithm that improves over time. This was achieved by using a feedback loop on the new data collected over time from the online forums. If the algorithm was sure that a new data point belonged to the right class, it would incorporate it into the dataset; over time the dataset would grow larger and the algorithm would adapt to new data and trends. The final result was a growing dataset that started at 1,000 data points and ended at 8,464 data points, with a total of 74 misclassifications. Machine learning artificial neural networks email classification Other Electrical Engineering and Electronics
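The feedback loop described in the abstract above could be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a nearest-centroid classifier stands in for the artificial neural network, and the confidence measure, the 0.9 threshold and all function names are hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Compute the mean vector of each class (stand-in for training the real model)."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def feedback_loop(points, labels, stream, threshold=0.9):
    """Add confidently classified new points to the dataset and refit after each one."""
    points, labels = list(points), list(labels)
    model = fit_centroids(points, labels)
    for x in stream:
        label, confidence = predict_with_confidence(model, x)
        if confidence >= threshold:      # only trust predictions the model is sure about
            points.append(x)
            labels.append(label)         # the predicted label becomes the training label
            model = fit_centroids(points, labels)
    return points, labels, model


# Hypothetical two-class seed data and a stream of new, unlabeled points.
seed_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seed_labels = ["a", "a", "b", "b"]
new_points = [(0.5, 0.5), (5.0, 5.5), (10.0, 10.5)]
grown_points, grown_labels, _ = feedback_loop(seed_points, seed_labels, new_points)
print(len(grown_points), grown_labels)
```

The threshold is the main design choice in such a loop: because predicted labels are fed back as training labels, a threshold that is too low lets misclassified points into the dataset, where they can reinforce the model's own errors.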
4	Evaluation of the performance of machine learning techniques for email classification / Utvärdering av prestationen av maskininlärningstekniker för e-post klassificering Tapper, Isabella January 2022 Manual categorization of a mail inbox can often become time-consuming. Therefore many attempts have been made to use machine learning for this task. One essential Natural Language Processing (NLP) task is text classification, which is a big challenge since an NLP engine is not a native speaker of any human language. An NLP engine often fails at understanding sarcasm and underlying intent. One of the NLP challenges is to represent text. Text embeddings can be learned, or they can be generated from a pre-trained model. Google’s pre-trained model Sentence Bidirectional Encoder Representations from Transformers (SBERT) is state-of-the-art for generating pre-trained vector representations of longer texts. In this project, different methods of classifying and clustering emails were studied. The performance of three supervised classification models was compared. A Support Vector Machine (SVM) and a Neural Network (NN) were trained with SBERT embeddings, and the third model, a Recurrent Neural Network (RNN), was trained on raw data. The motivation for this experiment was to see whether SBERT embeddings are a good choice of text representation when combined with simpler classification models in an email classification task. The results show that the SVM and NN perform better than the RNN in the email classification task. Since most real data is unlabeled, this thesis also evaluated how well unsupervised methods could perform in email clustering, taking advantage of the available labels and using SBERT embeddings as text representations. Three unsupervised clustering models are reviewed in this thesis: K-Means (KM), Spectral Clustering (SC), and Hierarchical Agglomerative Clustering (HAC). 
The results show that the unsupervised models all performed similarly in terms of precision, recall and F1-score, with performance evaluated using the available labeled dataset. In conclusion, this thesis gives evidence that in an email classification task, it is better for supervised models to train with pre-trained SBERT embeddings than to train on raw data. This thesis also showed that the output of the clustering methods was on par with the output of the selected supervised learning techniques. Natural Language Processing Text Representations Email Classification Text Classification Computer and Information Sciences
5	Inteligentní emailová schránka / Intelligent Mailbox Pohlídal, Antonín January 2012 This master's thesis deals with the use of text classification for sorting incoming emails. First, it describes Knowledge Discovery in Databases and analyzes text classification with selected methods in detail. It then describes email communication and the SMTP, POP3 and IMAP protocols. The next part presents the design of a system that classifies incoming emails and describes the related technologies, i.e. Apache James Server, PostgreSQL and RapidMiner. The implementation of all necessary components is then described. The last part contains experiments with the email server using the Enron dataset.
