Global ETD Search

Filtrering av e-post : Binär klassifikation med naiv Bayesiansk teknik / Filtering e-mail : Binary classification with naïve Bayesian technique

In this thesis we compare how different strategies for choosing attribute values affect junk mail filtering. We used two variants of a naïve Bayesian junk mail filter. The first variant classified an e-mail by comparing it to a feature vector containing all attribute values found in junk mails in the part of the e-mail collection used for training the filter. The second variant compared an e-mail to a feature vector consisting only of the attributes found in ten or more junk mails in the training part of the collection. The collection consisted of 300 e-mails: 210 junk mails and 90 legitimate e-mails. We measured the results using SP, SR and F1, and cross-validated the two strategies in order to compare them. The first strategy achieved higher average F1 values than the second. Despite this, we believe that the second strategy is the better one: rather than comparing the e-mail to a feature vector containing all attribute values found in junk mails, the filter will give better results if it compares the e-mail to a feature vector containing a limited number of attribute values. / Thesis level: D
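The two strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenisation, the Bernoulli-style presence model, the Laplace smoothing, and all function names (`build_vocabulary`, `train`, `classify`, `spam_metrics`) are assumptions made for illustration. SP and SR are assumed here to mean spam precision and spam recall. Setting `min_junk_count=1` corresponds to the first variant (all attribute values seen in junk mails), and `min_junk_count=10` to the second.

```python
import math
from collections import Counter


def build_vocabulary(junk_mails, min_junk_count=1):
    """Keep tokens that occur in at least `min_junk_count` junk mails."""
    doc_freq = Counter()
    for mail in junk_mails:
        doc_freq.update(set(mail.lower().split()))  # count each token once per mail
    return {tok for tok, n in doc_freq.items() if n >= min_junk_count}


def train(mails, labels, vocabulary):
    """Estimate smoothed class priors and per-token presence probabilities."""
    n = {True: 0, False: 0}
    counts = {True: Counter(), False: Counter()}
    for mail, label in zip(mails, labels):
        n[label] += 1
        counts[label].update(set(mail.lower().split()) & vocabulary)
    total = n[True] + n[False]
    # Laplace smoothing (an assumption) avoids zero probabilities.
    prior = {c: (n[c] + 1) / (total + 2) for c in n}
    prob = {c: {t: (counts[c][t] + 1) / (n[c] + 2) for t in vocabulary} for c in n}
    return prior, prob


def classify(mail, prior, prob, vocabulary):
    """Return True if the mail is more likely junk than legitimate."""
    tokens = set(mail.lower().split())
    score = {}
    for c in (True, False):
        s = math.log(prior[c])
        for t in vocabulary:
            p = prob[c][t]
            s += math.log(p if t in tokens else 1.0 - p)
        score[c] = s
    return score[True] > score[False]


def spam_metrics(predicted, actual):
    """Return (precision, recall, F1) with junk as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

To compare the strategies as the abstract describes, one would build the vocabulary from the training folds only, classify the held-out fold, and average the F1 values over the folds for each threshold.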

automatic classification

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:hb-18675
Date	January 2007
Creators	Bünger, Sara, Nilsson, Stefan
Publisher	Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan, University of Borås/Swedish School of Library and Information Science (SSLIS)
Source Sets	DiVA Archive at Uppsala University
Language	Swedish
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	Magisteruppsats i biblioteks- och informationsvetenskap vid institutionen Biblioteks- och informationsvetenskap, 1654-0247 ; 2007:132
