Graph Similarity, Parallel Texts, and Automatic Bilingual Lexicon Acquisition

In this master's thesis report we present a graph-theoretical method for automatic bilingual lexicon acquisition from parallel texts. We analyze the concept of graph similarity and give an interpretation of the parallel texts in terms of the vector space model. We represent the parallel texts by a directed, tripartite graph and use its adjacency matrix A to compute the similarity of the graph with itself. By solving the eigenvalue problem ρS = ASAᵀ + AᵀSA we obtain the self-similarity matrix S and the Perron root ρ. A rank-k approximation of the self-similarity matrix is computed with implementations of the singular value decomposition and of the non-negative matrix factorization algorithm GD-CLS. We construct an algorithm that extracts the bilingual lexicon from the self-similarity matrix, and we apply a statistical model to estimate the precision, that is, the share of correct translations, in the bilingual lexicon. The best result, a precision of about 80 %, is achieved with an application of the vector space model. This compares favorably with the precision of about 60 % reported in the literature.
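
The abstract does not say how the eigenvalue problem is solved. A common approach for this kind of graph similarity is the normalized power iteration of Blondel et al., started from the all-ones matrix and keeping only the even iterates, which are the ones guaranteed to converge. The pure-Python sketch below illustrates that assumption on a toy graph. The names `self_similarity` and `cosine_lexicon`, the toy corpus, and the cosine-based pairing rule are hypothetical illustrations of a vector-space-model approach, not the thesis's implementation; the rank-k SVD/GD-CLS step and the statistical precision model are omitted.

```python
# Hypothetical sketch: Blondel-style power iteration for rho*S = A S A^T + A^T S A,
# plus a toy vector-space-model pairing rule (not the thesis's extraction algorithm).
import math

Matrix = list[list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def frobenius(m: Matrix) -> float:
    return math.sqrt(sum(v * v for row in m for v in row))


def update(a: Matrix, s: Matrix) -> Matrix:
    """Unnormalized map S -> A S A^T + A^T S A."""
    at = transpose(a)
    left = matmul(matmul(a, s), at)
    right = matmul(matmul(at, s), a)
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(left, right)]


def self_similarity(a: Matrix, tol: float = 1e-10, max_iter: int = 1000):
    """Power iteration from the all-ones matrix. Returns the limit of the even
    iterates (unit Frobenius norm) and rho = ||A S A^T + A^T S A||_F."""
    n = len(a)
    s = [[1.0] * n for _ in range(n)]
    for _ in range(max_iter):
        nxt = s
        for _ in range(2):  # take two steps: even iterates converge, odd ones may oscillate
            nxt = update(a, nxt)
            norm = frobenius(nxt)
            if norm == 0.0:
                raise ValueError("graph has no edges")
            nxt = [[v / norm for v in row] for row in nxt]
        diff = frobenius([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(nxt, s)])
        s = nxt
        if diff < tol:
            break
    return s, frobenius(update(a, s))


def cosine_lexicon(pairs):
    """Hypothetical VSM pairing: each word is a 0/1 vector over aligned sentence
    pairs; each source word gets the target word with the highest cosine."""
    src_occ, tgt_occ = {}, {}
    for k, (src, tgt) in enumerate(pairs):
        for w in src:
            src_occ.setdefault(w, set()).add(k)
        for w in tgt:
            tgt_occ.setdefault(w, set()).add(k)

    def cosine(x, y):
        # |x ∩ y| / sqrt(|x| |y|) is the cosine of two 0/1 occurrence vectors
        return len(x & y) / math.sqrt(len(x) * len(y))

    return {s: max(tgt_occ, key=lambda t: cosine(occ, tgt_occ[t]))
            for s, occ in src_occ.items()}


if __name__ == "__main__":
    path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # directed path 0 -> 1 -> 2
    S, rho = self_similarity(path)
    print(round(rho, 6))  # 1.414214
    corpus = [(("house", "car"), ("hus", "bil")),
              (("car", "dog"), ("bil", "hund")),
              (("house",), ("hus",))]
    print(cosine_lexicon(corpus))  # {'house': 'hus', 'car': 'bil', 'dog': 'hund'}
```

On the directed path the iteration gives S = I/√3 and ρ = √2. Because −√2 is also an eigenvalue of the map S ↦ ASAᵀ + AᵀSA for this graph, the even limit is only guaranteed to satisfy the squared relation, which is why the sketch estimates ρ as the Frobenius norm of ASAᵀ + AᵀSA rather than from a single entry.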

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11550

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-11550
Date	January 2008
Creators	Törnfeldt, Tobias
Publisher	Linköpings universitet, Matematiska institutionen (Department of Mathematics)
Source Sets	DiVA Archive at Uppsala University
Language	English
Type	Student thesis, info:eu-repo/semantics/bachelorThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
