Towards improving e-mail content classification for spam control: architecture, abstraction, and strategies

This dissertation discusses techniques to improve the effectiveness and efficiency of spam control. Specifically, it proposes layer-3 e-mail content classification, which allows e-mail pre-classification for fast spam detection at receiving e-mail servers and distributed processing at network nodes for fast spam detection at spam control points such as e-mail servers. Fast spam detection allows receiving e-mail servers to prioritize e-mail servicing, safeguarding non-spam deliveries even under heavy spam traffic. It also allows spam to be rejected during Simple Mail Transfer Protocol (SMTP) sessions, for both inbound and outbound spam control. The dissertation makes four contributions.

In our first contribution, we propose a hardware architecture for a naive Bayes content classification unit for high-throughput spam detection. We use the logarithmic number system to simplify the naive Bayes computation. Because logarithmic number system computation is fast but lossy, we analyze the noise model of our hardware architecture. Through noise analysis, synthesis, and verification by numerical simulation, we show that the naive Bayes classification unit, implemented on an FPGA, can process more than one hundred million features per second with very low computation noise, an order of magnitude faster than an implementation on a general-purpose processor.
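As a software illustration of why a logarithmic representation simplifies naive Bayes, the sketch below scores a message by summing per-feature log-likelihood ratios, so the product of probabilities becomes a chain of additions. It is not the dissertation's FPGA architecture; the function names, the probability tables, and the `floor` value for unseen features are assumptions made for this example.

```python
import math

def log_odds_score(features, spam_prob, ham_prob, prior_spam=0.5, floor=1e-6):
    """Return log P(spam | features) - log P(ham | features) under naive Bayes.

    Working in the log domain replaces the product of per-feature
    likelihoods with a sum, which is the simplification a logarithmic
    number system exploits.
    """
    score = math.log(prior_spam) - math.log(1.0 - prior_spam)
    for f in features:
        p_s = spam_prob.get(f, floor)  # floor is an assumed value for unseen features
        p_h = ham_prob.get(f, floor)
        score += math.log(p_s) - math.log(p_h)
    return score

def is_spam(features, spam_prob, ham_prob, threshold=0.0):
    """Classify as spam when the log-odds exceed the threshold."""
    return log_odds_score(features, spam_prob, ham_prob) > threshold

# Hypothetical per-feature likelihoods, for illustration only.
SPAM_PROB = {"viagra": 0.20, "free": 0.10, "meeting": 0.01}
HAM_PROB = {"viagra": 0.001, "free": 0.02, "meeting": 0.10}
```

With these made-up tables, `is_spam(["viagra", "free"], SPAM_PROB, HAM_PROB)` is true and `is_spam(["meeting"], SPAM_PROB, HAM_PROB)` is false.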

In our second contribution, we propose e-mail content pre-classification at the network layer (layer 3) instead of at the application layer (layer 7), as is currently practiced, to allow e-mail packet pre-classification and distributed processing for effective spam detection beyond server implementations. By performing e-mail content classification at a lower abstraction level, e-mail packets can be pre-processed, without reassembly, at any network node between sender and receiver. We demonstrate that naive Bayes e-mail content classification can be adapted for layer-3 processing. We also show that fast e-mail class estimation can be performed at receiving e-mail servers. Through simulation using e-mail data sets, we show that layer-3 e-mail content classification detects spam with accuracy and false positive values approximately equal to those at layer 7.
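One way to see why naive Bayes lends itself to per-packet processing is that its log-odds score is additive over features: partial scores computed on individual packet payloads can be summed into a message-level estimate without reassembling the message first. The sketch below illustrates only that property. It assumes whitespace tokenization, assumes no token is split across a packet boundary, and uses a hypothetical table of log-likelihood ratios; it is not the dissertation's layer-3 method.

```python
# Hypothetical per-token log-likelihood ratios (positive leans spam).
LLR = {"winner": 2.0, "prize": 1.5, "agenda": -2.5, "minutes": -1.0}

def packet_score(payload, llr):
    """Partial log-odds contributed by the tokens in one packet payload."""
    return sum(llr.get(tok, 0.0) for tok in payload.lower().split())

def classify_flow(packets, llr, threshold=0.0):
    """Accumulate per-packet partial scores into a message-level estimate.

    Each packet is scored on its own, so a node on the path could compute
    its partial score without holding the rest of the message.
    """
    total = 0.0
    for payload in packets:
        total += packet_score(payload, llr)
    return total, total > threshold

spam_packets = ["you are a winner", "claim your prize now"]
ham_packets = ["agenda attached", "minutes follow"]
```

Under these assumptions, the accumulated per-packet score for `spam_packets` equals the score of the joined message, which is what makes the estimate available at the receiving server as soon as the last packet arrives.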

In our third contribution, we propose a prioritized e-mail servicing scheme that uses priority queuing to improve spam handling at receiving e-mail servers. In this scheme, non-spam e-mails are given higher priority than spam. We study four servicing strategies for the proposed scheme and analyze its performance under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
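The idea of serving non-spam ahead of spam can be sketched with a two-class strict-priority queue. The abstract does not describe the four servicing strategies, so the example below should be read as a generic strict-priority illustration rather than any of them; the class and method names are hypothetical.

```python
import heapq
import itertools

NON_SPAM, SPAM = 0, 1  # lower value is served first

class PriorityMailQueue:
    """Strict-priority queue: non-spam before spam, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order, used as a tie-breaker

    def enqueue(self, message, is_spam):
        cls = SPAM if is_spam else NON_SPAM
        heapq.heappush(self._heap, (cls, next(self._seq), message))

    def dequeue(self):
        _cls, _seq, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)

queue = PriorityMailQueue()
queue.enqueue("s1", is_spam=True)
queue.enqueue("h1", is_spam=False)
queue.enqueue("s2", is_spam=True)
queue.enqueue("h2", is_spam=False)
service_order = [queue.dequeue() for _ in range(len(queue))]
```

Here `service_order` is `["h1", "h2", "s1", "s2"]`: both non-spam messages are served before any spam, even though spam arrived first.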

In our fourth contribution, we propose a spam handling scheme that rejects spam during Simple Mail Transfer Protocol sessions. The proposed scheme allows inbound and outbound spam control. It can reduce server load and hence non-spam queuing delay and loss probability. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.
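Rejecting during the session means the server answers the end of the DATA phase with a permanent-failure reply instead of accepting the message and filtering it later. The sketch below shows only that decision point, assuming a spam score is already available when the message body ends; the function name, the threshold, and the reply texts are illustrative and not taken from the dissertation.

```python
def data_reply(spam_score, threshold=0.0):
    """Choose the SMTP reply sent after the end of the DATA phase.

    A 5yz reply tells the sending side the message was not accepted, so it
    is never queued on the receiving server. A 250 reply accepts it.
    """
    if spam_score > threshold:
        # 554 is a permanent-failure reply; the text after it is illustrative.
        return "554 5.7.1 Message rejected as spam"
    return "250 2.0.0 OK"
```

For example, `data_reply(3.5)` returns the 554 rejection, while `data_reply(-3.5)` returns the 250 acceptance.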

In this dissertation, we present four techniques to improve spam control based on e-mail content classification. We envision that our proposed approaches complement rather than replace current spam control systems. The four proposed approaches can work with existing spam control systems and support proactive control of spam and other e-mail-based threats, such as phishing and e-mail worms, anywhere across the Internet.

Identifier: oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/209
Date: 28 August 2007
Creators: Marsono, Muhammad Nadzir
Contributors: Gebali, Fayez; El-Kharashi, M. Watheq
Source Sets: University of Victoria
Language: English
Detected Language: English
Type: Thesis
Rights: Available to the World Wide Web
