Global ETD Search

Return to search

Evaluation of two word alignment systems

This project evaluates two different systems that generate wordalignments on English-Swedish data. The systems to be used are the Giza++ system, that may generate a variety of statistical translation models, and I*Trix system developed at IDA/NLPLab that generates word pairs with frequencies. The file formats of these two systems, the way of running them and the differences of the two systems are addressed in this paper. Evaluation in this project considers a variety of parameters such as corpus size, characteristics of the corpus, the effect of linguistic knowledge, etc. At the end of this paper, the conclusions of the two systems evaluation are presented. In general, Giza++ is better applying on big corpora while I*Trix is better for small corpora. Especially for corpora with high statistical ratio or special resource, I*Trix has a better performance.

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2215

Datavetenskap (datalogi)

Identifer	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-2215
Date	January 2004
Creators	Wang, Xiaoyang
Publisher	Linköpings universitet, Institutionen för datavetenskap, Institutionen för datavetenskap
Source Sets	DiVA Archive at Upsalla University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess

Page generated in 0.0019 seconds

Evaluation of two word alignment systems

Description

Links & Downloads

Tags

Additional Fields