

Evaluating Statistical Machine Learning and Deep Learning Algorithms for Anomaly Detection in Chat Messages / Utvärdering av statistiska maskininlärnings- och djupinlärningsalgoritmer för anomalitetsdetektering i chattmeddelanden

Automatically detecting anomalies in text is of great interest to surveillance entities, since vast amounts of data can be analysed to find suspicious activity. In this thesis, three distinct machine learning algorithms are evaluated while implementing a chat message classifier for market surveillance. Naive Bayes and Support Vector Machine belong to the statistical class of machine learning algorithms evaluated in this thesis, and both require feature selection; a secondary objective of the thesis is therefore to find a suitable feature selection technique that lets these algorithms achieve high performance. Long Short-Term Memory (LSTM) network is the deep learning algorithm evaluated in the thesis; rather than depending on feature selection, this deep neural network is trained using word embeddings. Each of the algorithms achieved high performance, but the findings of the thesis suggest that the Naive Bayes algorithm, in conjunction with a feature counting feature selection technique, is the most suitable choice for this particular learning problem.

Swedish abstract (in translation): Being able to automatically detect anomalies in text has major implications for companies and public authorities that monitor various kinds of communication. In this degree project, three different machine learning algorithms are evaluated for chat message classification in a market surveillance system. Naive Bayes and Support Vector Machine both belong to the statistical class of machine learning algorithms evaluated in the study, and both require selecting which features of the text the algorithm should use. A secondary goal of the study is therefore to find a suitable selection technique so that the statistical algorithms perform as well as possible. Long Short-Term Memory Network is the deep learning algorithm evaluated in the study. Instead of using a selection technique, the deep learning algorithm uses word vectors to represent text. The results show that all evaluated algorithms can reach high performance for this purpose, in particular Naive Bayes together with term frequency selection.
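The best-performing combination reported in the abstract, Naive Bayes with a feature counting (term frequency) feature selection step, can be sketched roughly as below. This is a minimal illustration, not the thesis's implementation: it assumes a multinomial Naive Bayes with add-alpha (Laplace) smoothing, lowercase whitespace tokenization, and "feature counting" interpreted as keeping the `k` most frequent terms in the training corpus. The names `select_features_by_count` and `MultinomialNaiveBayes`, the labels `"anomalous"`/`"normal"`, and the toy chat messages are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; the thesis's preprocessing may differ.
    return text.lower().split()


def select_features_by_count(docs, k):
    """Feature counting (assumed form): keep the k most frequent terms in the training corpus."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return {term for term, _ in counts.most_common(k)}


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing over a fixed vocabulary."""

    def __init__(self, vocabulary, alpha=1.0):
        self.vocab = vocabulary
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # log P(c): fraction of training messages with label c
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Count only terms that survived feature selection
            counts = Counter(t for d in class_docs for t in tokenize(d) if t in self.vocab)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(t | c) for every selected term, smoothed so unseen terms stay nonzero
            self.log_likelihood[c] = {
                t: math.log((counts[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Terms outside the selected vocabulary are ignored
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood[c][t] for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy training data (not from the thesis)
TRAIN = [
    ("buy now before the announcement", "anomalous"),
    ("sell before the news goes public", "anomalous"),
    ("lunch at noon today", "normal"),
    ("see you at the meeting today", "normal"),
]
docs = [text for text, _ in TRAIN]
labels = [label for _, label in TRAIN]

vocab = select_features_by_count(docs, k=10)
model = MultinomialNaiveBayes(vocab).fit(docs, labels)
print(model.predict("buy before the news"))  # anomalous
print(model.predict("lunch meeting today"))  # normal
```

Lowering `k` discards rarer terms, so the classifier relies on fewer but better-supported features; the value 10 here is arbitrary and only suits the toy data, whereas the thesis treats the choice of feature selection technique as something to evaluate.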

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235957

support vector machine

LSTM

Computer Sciences


Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-235957
Date	January 2018
Creators	Freberg, Daniel
Publisher	KTH, Skolan för elektroteknik och datavetenskap (EECS)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	TRITA-EECS-EX ; 2018:628


