Return to search

Robustní parsing zašuměného obsah / Robust Parsing of Noisy Content

While parsing performance on in-domain text has developed steadily in recent years, out-of-domain text and grammatically noisy text remain an obstacle and often lead to significant decreases in parsing accuracy. In this thesis, we focus on the parsing of noisy content, such as user-generated content in services like Twitter. We investigate the question whether a preprocessing step based on machine translation techniques and unsupervised models for text-normalization can improve parsing performance on noisy data. Existing data sets are evaluated and a new data set for dependency parsing of grammatically noisy Twitter data is introduced. We show that text-normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy. Powered by TCPDF (www.tcpdf.org)

Identiferoai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:321428
Date January 2013
CreatorsDaiber, Joachim
ContributorsZeman, Daniel, Mareček, David
Source SetsCzech ETDs
LanguageEnglish
Detected LanguageEnglish
Typeinfo:eu-repo/semantics/masterThesis
Rightsinfo:eu-repo/semantics/restrictedAccess

Page generated in 0.0017 seconds