Return to search

Graph Similarity, Parallel Texts, and Automatic Bilingual Lexicon Acquisition

In this masters’ thesis report we present a graph theoretical method used for automatic bilingual lexicon acquisition with parallel texts. We analyze the concept of graph similarity and give an interpretation, of the parallel texts, connected to the vector space model. We represent the parallel texts by a directed, tripartite graph and from here use the corresponding adjacency matrix, A, to compute the similarity of the graph. By solving the eigenvalue problem ρS = ASAT + ATSA we obtain the self-similarity matrix S and the Perron root ρ. A rank k approximation of the self-similarity matrix is computed by implementations of the singular value decomposition and the non-negative matrix factorization algorithm GD-CLS. We construct an algorithm in order to extract the bilingual lexicon from the self-similarity matrix and apply a statistical model to estimate the precision, the correctness, of the translations in the bilingual lexicon. The best result is achieved with an application of the vector space model with a precision of about 80 %. This is a good result and can be compared with the precision of about 60 % found in the literature.

Identiferoai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-11550
Date January 2008
CreatorsTörnfeldt, Tobias
PublisherLinköpings universitet, Matematiska institutionen, Matematiska institutionen
Source SetsDiVA Archive at Upsalla University
LanguageEnglish
Detected LanguageEnglish
TypeStudent thesis, info:eu-repo/semantics/bachelorThesis, text
Formatapplication/pdf
Rightsinfo:eu-repo/semantics/openAccess

Page generated in 0.0023 seconds