Return to search

A Hybrid System for Glossary Generation of Feature Film Content for Language Learning

This report introduces a suite of command-line tools created to assist content developers with the creation of rich supplementary material to use in conjunction with feature films and other video assets in language teaching. The tools are intended to leverage open-source corpora and software (the OPUS OpenSubs corpus and the Moses statistical machine translation system, respectively), but are written in a modular fashion so that other resources could be leveraged in their place. The completed tool suite facilitates three main tasks, which together constitute this project. First, several scripts created for use in preparing linguistic data for the system are discussed. Next, a set of scripts are described that together leverage the strengths of both terminology management and statistical machine translation to provide candidate translation entries for terms of interest. Finally, a tool chain and methodology are given for enriching the terminological data store based on the output of the machine translation process, thereby enabling greater accuracy and efficiency with each subsequent application.

Identiferoai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-3237
Date04 August 2010
CreatorsCorradini, Ryan Arthur
PublisherBYU ScholarsArchive
Source SetsBrigham Young University
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceTheses and Dissertations
Rightshttp://lib.byu.edu/about/copyright/

Page generated in 0.0016 seconds