A Lexicon for Gene Normalization / Ett lexicon för gennormalisering

Researchers tend to use their own or favourite gene names in scientific literature, even when official names exist, and some names are used for more than one gene. This causes ambiguity when biological literature is mined automatically. Gene normalization is used to disambiguate the gene names. In this thesis, we examine an existing gene normalization system and develop a new method for finding gene candidates for ambiguous gene mentions. For the new method, a lexicon is created from the gene names, symbols and synonyms in three different databases. A gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method improves the overall result of the system, with an increase in precision and a small decrease in recall.
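
The lexicon lookup described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not say how names are matched, so the normalization rule here (lowercasing and dropping non-alphanumeric characters) is an assumption, as are the function names `normalize`, `build_lexicon` and `find_candidates` and the toy records with made-up identifiers `G1` to `G3`.

```python
from collections import defaultdict


def normalize(name):
    """Assumed matching rule: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def build_lexicon(sources):
    """Merge (gene_id, names) records from several sources into one lexicon.

    The lexicon maps each normalized name, symbol or synonym to the set
    of gene identifiers that use it.
    """
    lexicon = defaultdict(set)
    for records in sources:
        for gene_id, names in records:
            for name in names:
                key = normalize(name)
                if key:  # skip names that normalize to an empty string
                    lexicon[key].add(gene_id)
    return lexicon


def find_candidates(lexicon, mention):
    """Return every gene whose lexicon entry matches the mention."""
    return sorted(lexicon.get(normalize(mention), set()))


# Toy stand-ins for the three databases; all content is hypothetical.
db_a = [("G1", ["ABC1", "alpha binding protein 1"]), ("G2", ["XYZ"])]
db_b = [("G1", ["abc-1"]), ("G3", ["ABC1"])]  # "ABC1" is ambiguous
db_c = [("G2", ["Xyz protein", "XYZ-2"])]

lexicon = build_lexicon([db_a, db_b, db_c])
candidates = find_candidates(lexicon, "Abc 1")  # G1 and G3 both match
```

In this sketch an ambiguous mention such as "Abc 1" yields more than one candidate, which is the situation a later disambiguation step would have to resolve; a mention with no matching entry yields an empty list.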

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-20250

Identifier	oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-20250
Date	January 2009
Creators	Lingemark, Maria
Publisher	Linköpings universitet, Institutionen för datavetenskap
Source Sets	DiVA Archive at Uppsala University
Language	English
Detected Language	English
Type	Student thesis, info:eu-repo/semantics/masterThesis, text
Format	application/pdf
Rights	info:eu-repo/semantics/openAccess
