Global ETD Search


Zpracování češtiny s využitím kontextualizované reprezentace / Czech NLP with Contextualized Embeddings

As the amount of digital data in the form of unstructured text grows, so does the importance of natural language processing (NLP). Deep neural networks have been the most successful technology of recent years. This work applies state-of-the-art methods, namely transfer learning with Bidirectional Encoder Representations from Transformers (BERT), to three Czech NLP tasks: part-of-speech tagging, lemmatization, and sentiment analysis. We applied a BERT model with a simple classification head to three Czech sentiment datasets (mall, facebook, and csfd) and achieved state-of-the-art results. We also explored several possible architectures for tagging and lemmatization and obtained new state-of-the-art results in both tasks with a fine-tuning approach on data from the Prague Dependency Treebank. Specifically, we achieved an accuracy of 98.57% for tagging, 99.00% for lemmatization, and a joint accuracy of 98.19% for both tasks. The best models for all tasks are publicly available.
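The joint accuracy above (98.19%) is lower than both the tagging and the lemmatization accuracy. That fits the common convention that a token counts as jointly correct only when its tag and its lemma are both right. The record itself does not define the metric, so the sketch below simply assumes that definition. The function name `tagging_accuracies` and the short tags in the example are hypothetical and chosen for illustration. They are not the thesis code and not real Prague Dependency Treebank positional tags.

```python
from typing import Sequence


def tagging_accuracies(
    gold_tags: Sequence[str],
    gold_lemmas: Sequence[str],
    pred_tags: Sequence[str],
    pred_lemmas: Sequence[str],
) -> dict[str, float]:
    """Hypothetical helper: tag, lemma, and joint accuracy over a flat token list.

    Assumes "joint" means a token is correct only if BOTH tag and lemma match.
    """
    n = len(gold_tags)
    if not (n == len(gold_lemmas) == len(pred_tags) == len(pred_lemmas)):
        raise ValueError("all sequences must have the same length")
    if n == 0:
        raise ValueError("need at least one token")

    # Per-token correctness for each task separately.
    tag_ok = [g == p for g, p in zip(gold_tags, pred_tags)]
    lemma_ok = [g == p for g, p in zip(gold_lemmas, pred_lemmas)]
    # Joint correctness: both predictions must match the gold annotation.
    joint_ok = [t and l for t, l in zip(tag_ok, lemma_ok)]

    return {
        "tag": sum(tag_ok) / n,
        "lemma": sum(lemma_ok) / n,
        "joint": sum(joint_ok) / n,
    }


if __name__ == "__main__":
    # Toy example with simplified tags (not real PDT positional tags).
    scores = tagging_accuracies(
        gold_tags=["NOUN", "VERB", "ADJ", "NOUN"],
        gold_lemmas=["pes", "běžet", "rychlý", "les"],
        pred_tags=["NOUN", "VERB", "NOUN", "NOUN"],
        pred_lemmas=["pes", "běhat", "rychlý", "les"],
    )
    print(scores)  # {'tag': 0.75, 'lemma': 0.75, 'joint': 0.5}
```

In the toy example each task gets three of four tokens right, but the errors fall on different tokens. Only two tokens are jointly correct. Under this definition, joint accuracy can never exceed the lower of the two individual accuracies.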

http://www.nusl.cz/ntk/nusl-451149

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:451149
Date	January 2021
Creators	Vysušilová, Petra
Contributors	Straka, Milan, Hajič, Jan
Source Sets	Czech ETDs
Language	Czech
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
