Using Naive Bayes and N-Gram for Document Classification / Användning av Naive Bayes och N-Gram för dokumentklassificering

The purpose of this degree project is to present, evaluate and improve probabilistic machine-learning methods for supervised text classification. We explore two probabilistic methods, the Naive Bayes algorithm and character-level n-gram models, and then compare them. Probabilistic algorithms such as Naive Bayes and character-level n-grams are among the most effective methods for text classification, but they need a large training set to give accurate results. Because of its overly simple assumptions, Naive Bayes is a poor classifier. To rectify this problem, we try to improve the algorithm by using transformed word and n-gram counts.

http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170757

Bayes

Computer Sciences

Datavetenskap (datalogi)

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:kth-170757
Date	January 2015
Creators	Farah Mohamed, Khalif
Publisher	KTH, Skolan för datavetenskap och kommunikation (CSC)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	Swedish
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
