
Debiasing a Corpus with Semi-Supervised Topic Modelling : Swapping Gendered Words to Reduce Potentially Harmful Associations

Gender biases are present in many NLP models, and such biased models can have serious negative consequences for individuals. This work is a case study in reducing these biases in a corpus of Wikipedia articles about people, so that models trained on the corpus inherit fewer of them. To this end, we apply two methods of modifying the corpus’s documents. Both methods swap gendered words (such as ‘mother’, ‘father’ and ‘parent’) for one another, changing the contexts in which each of them appears. Analysing and comparing the two methods shows that one of them is indeed suited to reducing gender biases: after 35% of the corpus’s documents have been modified, the contexts of gendered words appear to be equal across the three genders considered (feminine, masculine and non-binary). This is confirmed by coreference resolution models trained with word embeddings fine-tuned on the corpus before and after the modification. When these models are evaluated on schemas designed specifically to expose gender bias in coreference resolution, the model based on the modified corpus is less gender-biased than the one based on the original. Our analysis further shows that the method does not compromise the corpus’s overall quality.
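The record does not spell out either swapping method. As a rough illustration of the general idea only, here is a minimal Python sketch that assumes gendered words come in (feminine, masculine, non-binary) triples and rotates each word to another gender's counterpart in a randomly sampled fraction of documents. The word list, the rotation scheme, and the names `TRIPLES`, `swap_document` and `swap_corpus` are hypothetical; the 35% default merely mirrors the figure reported above. The semi-supervised topic modelling named in the title is not modelled here: documents are simply sampled at random.

```python
import random
import re

# Hypothetical (feminine, masculine, non-binary) word triples; the thesis's
# actual word list is not reproduced in this record.
TRIPLES = [
    ("mother", "father", "parent"),
    ("daughter", "son", "child"),
    ("sister", "brother", "sibling"),
    ("wife", "husband", "spouse"),
]

# Lookup table: lowercase word -> (triple index, gender index 0/1/2).
LOOKUP = {w: (t, g) for t, triple in enumerate(TRIPLES) for g, w in enumerate(triple)}
# Whole-word, case-insensitive match on any listed word (plurals are not covered).
PATTERN = re.compile(r"\b(" + "|".join(LOOKUP) + r")\b", re.IGNORECASE)


def swap_document(text: str, shift: int) -> str:
    """Replace each gendered word with the counterpart `shift` (1 or 2) genders along."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        t, g = LOOKUP[word.lower()]
        new = TRIPLES[t][(g + shift) % 3]
        # Preserve a leading capital letter, e.g. at the start of a sentence.
        return new.capitalize() if word[0].isupper() else new

    return PATTERN.sub(repl, text)


def swap_corpus(docs: list[str], fraction: float = 0.35, seed: int = 0) -> list[str]:
    """Rewrite a random `fraction` of documents, each with its own random rotation."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(docs)), round(fraction * len(docs))))
    return [
        swap_document(doc, rng.choice((1, 2))) if i in chosen else doc
        for i, doc in enumerate(docs)
    ]
```

For example, `swap_document("The mother thanked the father.", 1)` returns `"The father thanked the parent."`. Rotating by a per-document shift of 1 or 2, rather than swapping fixed pairs, guarantees that every listed word moves to a different gender while all three genders stay in play; inflected forms such as "mothers" and pronouns are deliberately left out of this sketch.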

http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-226786

Gender Studies


Computer Sciences


Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:umu-226786
Date	January 2024
Creators	Müller, Sal R.
Publisher	Umeå University, Department of Computing Science (Umeå universitet, Institutionen för datavetenskap)
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
Relation	UMNAD ; 1469
