Global ETD Search

Robustní parsing zašuměného obsahu / Robust Parsing of Noisy Content

Although parsing performance on in-domain text has improved steadily in recent years, out-of-domain and grammatically noisy text remains an obstacle and often leads to significant drops in parsing accuracy. In this thesis, we focus on parsing noisy content, such as user-generated content on services like Twitter. We investigate whether a preprocessing step based on machine translation techniques and unsupervised models for text normalization can improve parsing performance on noisy data. We evaluate existing data sets and introduce a new data set for dependency parsing of grammatically noisy Twitter data. We show that text normalization, together with a combination of domain-specific and generic part-of-speech taggers, can lead to a significant improvement in parsing accuracy.

http://www.nusl.cz/ntk/nusl-321428

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:321428
Date	January 2013
Creators	Daiber, Joachim
Contributors	Zeman, Daniel; Mareček, David
Source Sets	Czech ETDs
Language	English
Detected Language	English
Type	info:eu-repo/semantics/masterThesis
Rights	info:eu-repo/semantics/restrictedAccess
