Return to search

Cyberbullying detection in Urdu language using machine learning

Yes / Cyberbullying has become a significant problem with the surge in the use of social media. The most basic way to prevent cyberbullying on these social media platforms is to identify and remove offensive comments. However, it is hard for humans to read and remove all the comments manually. Current research work focuses on using machine learning to detect and eliminate cyberbullying. Although most of the work has been conducted on English texts to detect cyberbullying, limited to no work can be found in Urdu. This paper aims to detect cyberbullying from the users' comments posted in Urdu on Twitter using machine learning and Natural Language Processing (NLP) techniques. To the best of our knowledge, cyberbullying detection on Urdu text comments has not been performed due to the lack of a publicly available standard Urdu dataset. In this paper, we created a dataset of offensive user-generated Urdu comments from Twitter. The comments in the dataset are classified into five categories. n-gram techniques are used to extract features at character and word levels. Various supervised machine-learning techniques are applied to the dataset to detect cyberbullying. Evaluation metrics such as precision, recall, accuracy and F1 scores are used to analyse the performance of machine learning techniques.

Identiferoai:union.ndltd.org:BRADFORD/oai:bradscholars.brad.ac.uk:10454/19312
Date11 January 2023
CreatorsKhan, Sara, Qureshi, Amna
PublisherIEEE
Source SetsBradford Scholars
LanguageEnglish
Detected LanguageEnglish
TypeConference paper, Accepted manuscript
Rights© 2022 IEEE. Reproduced in accordance with the publisher's self-archiving policy., Unspecified

Page generated in 0.0095 seconds