Spelling suggestions: "subject:"span filtering"" "subject:"spa filtering""
1 |
Towards eradication of SPAM : a study on intelligent adaptive SPAM filterst.hassan@aic.wa.edu.au, Tarek Hassan January 2006 (has links)
As the massive increase of electronic mail (email) usage continues, SPAM (unsolicited bulk email), has continued to grow because it is a very inexpensive method of advertising. These unwanted emails can cause a serious problem by filling up the email inbox and thereby leaving no space for legitimate emails to pass through. Currently the only defense against SPAM is the use of SPAM filters. A novel SPAM filter GetEmail5 along with the design rationale, is described in this thesis. To test the efficacy of GetEmail5 SPAM filter, an experimental setup was created and
a commercial bulk email program was used to send SPAM and non-SPAM emails to test the new SPAM filter.
GetEmail5s efficiency and ability to detect SPAM was compared against two highly ranked commercial SPAM filters on different sets of emails, these included all SPAM, non-SPAM, and mixed emails, also text and HTML emails.
The results showed the superiority of GetEmail5 compared to the two commercial SPAM filters in detecting SPAM emails and reducing the users involvement in categorizing the incoming emails.
This thesis demonstrates the design rationale for GetEmail5 and also its greater effectiveness in comparison with the commercial SPAM filters tested.
|
2 |
Personal Email Spam Filtering with Minimal User InteractionMojdeh, Mona January 2012 (has links)
This thesis investigates ways to reduce or eliminate the necessity of user input to
learning-based personal email spam filters. Personal spam filters have been shown in
previous studies to yield superior effectiveness, at the cost of requiring extensive user training which may be burdensome or impossible.
This work describes new approaches to solve the problem of building a personal
spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from different sources of data, as opposed to user’s messages. Our initial studies show that inter-user training yields substantially inferior results to
intra-user training using the best known methods. Moreover, contrary to previous
literature, it is found that transfer learning degrades the performance of spam filters when the source of training and test sets belong to two different users or different times.
We also adapt and modify a graph-based semi-supervising learning algorithm to
build a filter that can classify an entire inbox trained on twenty or fewer user judgments.
Our experiments show that this approach compares well with previous techniques when
trained on as few as two training examples.
We also present the toolkit we developed to perform privacy-preserving user studies
on spam filters. This toolkit allows researchers to evaluate any spam filter that conforms to a standard interface defined by TREC, on real users’ email boxes. Researchers have access only to the TREC-style result file, and not to any content of a user’s email
stream.
To eliminate the necessity of feedback from the user, we build a personal autonomous filter that learns exclusively on the result of a global spam filter. Our laboratory experiments show that learning filters with no user input can substantially
improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the
autonomous filter in a user study.
|
3 |
Personal Email Spam Filtering with Minimal User InteractionMojdeh, Mona January 2012 (has links)
This thesis investigates ways to reduce or eliminate the necessity of user input to
learning-based personal email spam filters. Personal spam filters have been shown in
previous studies to yield superior effectiveness, at the cost of requiring extensive user training which may be burdensome or impossible.
This work describes new approaches to solve the problem of building a personal
spam filter that requires minimal user feedback. An initial study investigates how well a personal filter can learn from different sources of data, as opposed to user’s messages. Our initial studies show that inter-user training yields substantially inferior results to
intra-user training using the best known methods. Moreover, contrary to previous
literature, it is found that transfer learning degrades the performance of spam filters when the source of training and test sets belong to two different users or different times.
We also adapt and modify a graph-based semi-supervising learning algorithm to
build a filter that can classify an entire inbox trained on twenty or fewer user judgments.
Our experiments show that this approach compares well with previous techniques when
trained on as few as two training examples.
We also present the toolkit we developed to perform privacy-preserving user studies
on spam filters. This toolkit allows researchers to evaluate any spam filter that conforms to a standard interface defined by TREC, on real users’ email boxes. Researchers have access only to the TREC-style result file, and not to any content of a user’s email
stream.
To eliminate the necessity of feedback from the user, we build a personal autonomous filter that learns exclusively on the result of a global spam filter. Our laboratory experiments show that learning filters with no user input can substantially
improve the results of open-source and industry-leading commercial filters that employ no user-specific training. We use our toolkit to validate the performance of the
autonomous filter in a user study.
|
4 |
Towards eradication of SPAM : a study on intelligent adaptive SPAM filters /Hassan, Tarek. January 2006 (has links)
Thesis (M. Computer Sci.)--Murdoch University, 2006. / Thesis submitted to the Division of Arts. Includes bibliographical references (leaves 95-102).
|
5 |
Spam Filter Improvement Through MeasurementLynam, Thomas Richard January 2009 (has links)
This work supports the thesis that sound quantitative evaluation for
spam filters leads to substantial improvement in the classification
of email. To this end, new laboratory testing methods and datasets
are introduced, and evidence is presented that their adoption at Text
REtrieval Conference (TREC)and elsewhere has led to an improvement in state of the art
spam filtering. While many of these improvements have been discovered
by others, the best-performing method known at this time -- spam filter
fusion -- was demonstrated by the author.
This work describes four principal dimensions of spam filter evaluation
methodology and spam filter improvement. An initial study investigates
the application of twelve open-source filter configurations in a laboratory
environment, using a stream of 50,000 messages captured from a single
recipient over eight months. The study measures the impact of user
feedback and on-line learning on filter performance using methodology
and measures which were released to the research community as the
TREC Spam Filter Evaluation Toolkit.
The toolkit was used as the basis of the TREC Spam Track, which the
author co-founded with Cormack. The Spam Track, in addition to evaluating
a new application (email spam), addressed the issue of testing systems
on both private and public data. While streams of private messages
are most realistic, they are not easy to come by and cannot be shared
with the research community as archival benchmarks. Using the toolkit,
participant filters were evaluated on both, and the differences found
not to substantially confound evaluation; as a result, public corpora
were validated as research tools. Over the course of TREC and similar
evaluation efforts, a dozen or more archival benchmarks --
some private and some public -- have become available.
The toolkit and methodology have spawned improvements in the state
of the art every year since its deployment in 2005. In 2005, 2006,
and 2007, the spam track yielded new best-performing systems based
on sequential compression models, orthogonal sparse bigram features,
logistic regression and support vector machines. Using the TREC participant
filters, we develop and demonstrate methods for on-line filter fusion
that outperform all other reported on-line personal spam filters.
|
6 |
Spam Filter Improvement Through MeasurementLynam, Thomas Richard January 2009 (has links)
This work supports the thesis that sound quantitative evaluation for
spam filters leads to substantial improvement in the classification
of email. To this end, new laboratory testing methods and datasets
are introduced, and evidence is presented that their adoption at Text
REtrieval Conference (TREC)and elsewhere has led to an improvement in state of the art
spam filtering. While many of these improvements have been discovered
by others, the best-performing method known at this time -- spam filter
fusion -- was demonstrated by the author.
This work describes four principal dimensions of spam filter evaluation
methodology and spam filter improvement. An initial study investigates
the application of twelve open-source filter configurations in a laboratory
environment, using a stream of 50,000 messages captured from a single
recipient over eight months. The study measures the impact of user
feedback and on-line learning on filter performance using methodology
and measures which were released to the research community as the
TREC Spam Filter Evaluation Toolkit.
The toolkit was used as the basis of the TREC Spam Track, which the
author co-founded with Cormack. The Spam Track, in addition to evaluating
a new application (email spam), addressed the issue of testing systems
on both private and public data. While streams of private messages
are most realistic, they are not easy to come by and cannot be shared
with the research community as archival benchmarks. Using the toolkit,
participant filters were evaluated on both, and the differences found
not to substantially confound evaluation; as a result, public corpora
were validated as research tools. Over the course of TREC and similar
evaluation efforts, a dozen or more archival benchmarks --
some private and some public -- have become available.
The toolkit and methodology have spawned improvements in the state
of the art every year since its deployment in 2005. In 2005, 2006,
and 2007, the spam track yielded new best-performing systems based
on sequential compression models, orthogonal sparse bigram features,
logistic regression and support vector machines. Using the TREC participant
filters, we develop and demonstrate methods for on-line filter fusion
that outperform all other reported on-line personal spam filters.
|
7 |
Models to combat email spam botnets and unwanted phone callsHusna, Husain. Dantu, Ram, January 2008 (has links)
Thesis (M.S.)--University of North Texas, May, 2008. / Title from title page display. Includes bibliographical references.
|
8 |
Graph-based email prioritizationNussbaum, Ronald. January 2008 (has links)
Thesis (M.S.)--Michigan State University. Computer Science and Engineering, 2008. / Title from PDF t.p. (viewed on July 29, 2009) Includes bibliographical references (p. 45-47). Also issued in print.
|
9 |
A learning approach to spam detection based on social networks /Lam, Ho-Yu. January 2007 (has links)
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2007. / Includes bibliographical references (leaves 80-88). Also available in electronic version.
|
10 |
Computing with Granular WordsHou, Hailong 07 May 2011 (has links)
Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may be related to other types of uncertainty. For instance, different users search ‘cheap hotel’ in a search engine, they may need distinct pieces of relevant hidden information such as shopping, transportation, weather, etc. Therefore, this research work focuses on studying granular words and developing new algorithms to process them to deal with uncertainty globally. To precisely describe the granular words, a new structure called Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed to cooperate with computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm can get more accurate spam filtering rate than conventional method Naive Bayesian and SVM; computing with granular word also generates better recommendation results based on users’ assessment when applied it to search engine.
|
Page generated in 0.1278 seconds