The paper describes a computer system for testing the coherence and adequacy of dictionaries. The system is also well suited to retrieving lexical material in context from computerized text archives. Results are presented from a series of tests made with Kamusi ya Kiswahili Sanifu (KKS), a monolingual Swahili dictionary. The test of the internal coherence of KKS shows that the dictionary text itself contains several hundred words for which there is no entry in the dictionary. Examples and frequency numbers of the most often occurring words are given. The adequacy of KKS was also tested with a corpus of nearly one million words. It was found that 1.32% of the words in book texts were not recognized by KKS, and in newspaper texts the proportion was 2.24%. The higher figure for newspaper texts is partly due to the numerous names occurring in news articles. Some statistical results are given on the frequencies of word forms not recognized by KKS. The tests show that although KKS covers the modern vocabulary quite well, there are several areas where the dictionary should be improved. The internal coherence is far from satisfactory, and there are more than a thousand rather common words in prose text that are not included in KKS. The system described in this article is an effective tool for detecting problems and for retrieving lexical data in context for missing words.
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:10567 |
Date | January 1994 |
Creators | Hurskainen, Arvi |
Contributors | University of Helsinki, Universität zu Köln |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | doc-type:article, info:eu-repo/semantics/article, doc-type:Text |
Source | Swahili Forum; 1 (1994), S. 169-179 |
Rights | info:eu-repo/semantics/openAccess |
Relation | urn:nbn:de:bsz:15-qucosa-94963, qucosa:11611 |