• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 2
  • Tagged with
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Compound Processing for Phrase-Based Statistical Machine Translation

Stymne, Sara January 2009 (has links)
<p>In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to sparse data problems.</p><p>The adopted strategy for compound processing is to split compounds into their component parts before training and translation. For translation into Swedish and German the parts are merged after translation. I investigate the effect of different splitting algorithms for translation between English and German, and of different merging algorithms for German. I also apply these methods to a different language pair, English--Swedish. Overall the studies show that compound processing is useful, especially for translation from English into German or Swedish. But there are improvements for translation into English as well, such as a reduction of unknown words.</p><p>I show that for translation between English and German different splitting algorithms work best for different translation directions. I also design and evaluate a novel merging algorithm based on part-of-speech matching, which outperforms previous methods for compound merging, showing the need for information that is carried through the translation process, rather than only external knowledge sources such as word lists. Most of the methods for compound processing were originally developed for German. I show that these methods can be applied to Swedish as well, with similar results.</p>
2

Compound Processing for Phrase-Based Statistical Machine Translation

Stymne, Sara January 2009 (has links)
In this thesis I explore how compound processing can be used to improve phrase-based statistical machine translation (PBSMT) between English and German/Swedish. Both German and Swedish generally use closed compounds, which are written as one word without spaces or other indicators of word boundaries. Compounding is both common and productive, which makes it problematic for PBSMT, mainly due to sparse data problems. The adopted strategy for compound processing is to split compounds into their component parts before training and translation. For translation into Swedish and German the parts are merged after translation. I investigate the effect of different splitting algorithms for translation between English and German, and of different merging algorithms for German. I also apply these methods to a different language pair, English--Swedish. Overall the studies show that compound processing is useful, especially for translation from English into German or Swedish. But there are improvements for translation into English as well, such as a reduction of unknown words. I show that for translation between English and German different splitting algorithms work best for different translation directions. I also design and evaluate a novel merging algorithm based on part-of-speech matching, which outperforms previous methods for compound merging, showing the need for information that is carried through the translation process, rather than only external knowledge sources such as word lists. Most of the methods for compound processing were originally developed for German. I show that these methods can be applied to Swedish as well, with similar results.

Page generated in 0.1369 seconds