Global ETD Search

Automatic Categorization of News Articles With Contextualized Language Models / Automatisk kategorisering av nyhetsartiklar med kontextualiserade språkmodeller

This thesis investigates how pre-trained contextualized language models can be adapted for multi-label text classification of Swedish news articles. Various classifiers are built on pre-trained BERT and ELECTRA models, exploring global and local classifier approaches. Furthermore, the effects of domain specialization, additional metadata features, and model compression are investigated. Several hundred thousand news articles are gathered to create unlabeled and labeled datasets for pre-training and fine-tuning, respectively. The findings show that a local classifier approach is superior to a global classifier approach and that BERT significantly outperforms ELECTRA. Notably, a baseline classifier built on SVMs yields competitive performance. The effect of further in-domain pre-training varies: ELECTRA’s performance improves while BERT’s is largely unaffected. Combining metadata features with text representations is found to improve performance. Both BERT and ELECTRA are robust to quantization and pruning, allowing model sizes to be cut in half without any performance loss.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177004

Natural Language Processing

Text Classification

Hierarchical Classification

Domain Specialization

Contextualized Language Models

BERT

ELECTRA

News Media

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-177004
Date	January 2021
Creators	Borggren, Lukas
Publisher	Linköpings universitet, Artificiell intelligens och integrerade datorsystem
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
