Text messaging through SMS (Short Message Service) is now a huge industry with billions of active users. This large user base has attracted many companies that market themselves through unsolicited messages, in the same way as was previously done through email. The phenomenon is so common that SMS spam has become a plague in many countries. This report evaluates several established machine learning algorithms to see how well they can be applied to the problem of filtering unsolicited SMS messages. Each filter is evaluated mainly by analyzing its accuracy on stored message data. The report also discusses and compares hardware requirements against performance, measured as the number of messages that can be evaluated in a fixed amount of time. The results of the evaluation show that a decision tree filter is the best choice among the filters evaluated: it has the highest accuracy as well as a message processing rate high enough to be applicable. The decision tree filter, found to be the most suitable for the task in this environment, has been implemented. The accuracy of this new implementation is shown to be as high as that of the implementation used for the evaluation of this filter. Although the decision tree filter is shown to be the best choice among the filters evaluated, its accuracy turned out not to be high enough to meet the specified requirements. It does, however, show promising results for further testing in this area using improved methods on the best-performing algorithms.
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-94161 |
Date | January 2013 |
Creators | Fredborg, Johan |
Publisher | Linköpings universitet, Institutionen för datavetenskap, Linköpings universitet, Tekniska högskolan |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |