Detecting opinion spam and fake news using n-gram analysis and semantic similarity

In recent years, deceptive content such as fake news and fake reviews, also known as opinion spam, has become an increasing danger for online users. Fake reviews affect consumers and stores alike. Furthermore, the problem of fake news gained attention in 2016, especially in the aftermath of that year's US presidential election. Fake reviews and fake news are closely related phenomena, as both consist of writing and spreading false information or beliefs. The opinion spam problem was formulated for the first time only a few years ago, but it has quickly become a growing research area due to the abundance of user-generated content. It is now easy for anyone to write fake reviews or fake news on the web. The biggest challenge is the lack of an efficient way to tell a real review from a fake one; even humans are often unable to tell the difference. In this thesis, we have developed an n-gram model to automatically detect fake content, with a focus on fake reviews and fake news. We studied and compared two different feature extraction techniques and six machine learning classification techniques. Furthermore, we investigated the impact of keystroke features on the accuracy of the n-gram model. We also applied semantic similarity metrics to detect near-duplicated content. Experimental evaluation of the proposed approach, using existing public datasets and a newly introduced fake news dataset, indicates improved performance compared to the state of the art. / Graduate

https://dspace.library.uvic.ca//handle/1828/8796

Identifier	oai:union.ndltd.org:uvic.ca/oai:dspace.library.uvic.ca:1828/8796
Date	14 November 2017
Creators	Ahmed, Hadeer
Contributors	Traoré, Issa
Source Sets	University of Victoria
Language	English
Detected Language	English
Type	Thesis
Format	application/pdf
Rights	Available to the World Wide Web
