Kvantitativní charakteristiky termínů / Quantitative Characteristics of Terms

The new automatic term recognition method, TERMIT, aims not only to label a high number of terms correctly but also to identify the most important attributes of a term, judged by their role in the automatic term identification process. The method is based on data mining, i.e. finding meaningful information in very large corpus data. It proved able both to identify terms in academic texts successfully and to find the constitutive features of a term as a terminological unit. A single-word term (SWT) can be characterized as a word that has a low frequency in the SYN2010 corpus, occurs considerably more often in specialized texts of a given field than in non-academic texts, and occurs in only a small number of academic disciplines; its distribution in SYN2010 is uneven, as is the distance between its occurrences. A multi-word term (MWT) is a stable collocation of low-frequency words that contains at least one (and often more than one) single-word term. Based on these characteristics of SWTs and MWTs, individual tokens in texts can be classified as terms or non-terms with a success rate of more than 95%. Automatically identified terms can be used to determine the percentage of SWTs and MWTs in different academic disciplines, as well as to find terms shared by two or more domains in order to assess their...
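The SWT and MWT characteristics described in the abstract can be expressed as measurable features. The following is a minimal Python sketch under stated assumptions, not the TERMIT implementation: `general` stands in for a reference corpus such as SYN2010, `fields` maps hypothetical discipline names to tokenized specialized texts, the uneven distance between occurrences is approximated by the coefficient of variation of the gaps, and a "stable collocation" is approximated by a minimum recurrence count. All function names, thresholds and the smoothing scheme are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def gap_cv(word: str, tokens: list[str]) -> float:
    """Coefficient of variation of the gaps between successive occurrences.

    0.0 means perfectly even spacing; larger values mean the word comes in
    bursts. Returns 0.0 when there are fewer than two gaps to compare.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0
    return pstdev(gaps) / mean(gaps)


def swt_features(word: str, general: list[str], fields: dict[str, list[str]]) -> dict:
    """Hypothetical feature vector for a single-word term candidate."""
    gen_rel = Counter(general)[word] / len(general)
    spec = [tok for toks in fields.values() for tok in toks]
    spec_rel = Counter(spec)[word] / len(spec)
    return {
        # Low values expected for SWTs (assumed: relative frequency in general corpus).
        "general_rel_freq": gen_rel,
        # Add one pseudo-occurrence to each corpus so absent words do not divide by zero.
        "spec_to_general": (spec_rel + 1 / len(spec)) / (gen_rel + 1 / len(general)),
        # Number of disciplines whose texts contain the word.
        "n_fields": sum(1 for toks in fields.values() if word in toks),
        # Unevenness of inter-occurrence distances in the general corpus.
        "gap_cv": gap_cv(word, general),
    }


def is_mwt_candidate(
    ngram: tuple[str, ...],
    general: list[str],
    specialized: list[str],
    swts: set[str],
    max_rel_freq: float = 0.05,
    min_count: int = 2,
) -> bool:
    """Hypothetical MWT test: low-frequency words, at least one SWT, recurring n-gram."""
    gen_counts = Counter(general)
    low_freq = all(gen_counts[w] / len(general) <= max_rel_freq for w in ngram)
    has_swt = any(w in swts for w in ngram)
    n = len(ngram)
    # "Stable collocation" approximated by a minimum recurrence count (assumption).
    occurrences = sum(
        1
        for i in range(len(specialized) - n + 1)
        if tuple(specialized[i:i + n]) == tuple(ngram)
    )
    return low_freq and has_swt and occurrences >= min_count
```

A word that is absent from the general sample but recurs in a single field gets a high `spec_to_general` ratio and `n_fields == 1`, matching the SWT profile above. How these features are weighted or combined into the term/non-term decision is not stated in the abstract; in practice they would be passed to a classifier trained on labeled data, which this sketch does not attempt.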

http://www.nusl.cz/ntk/nusl-335637

Identifier	oai:union.ndltd.org:nusl.cz/oai:invenio.nusl.cz:335637
Date	January 2014
Creators	Kováříková, Dominika
Contributors	Čermák, František, Bozděchová, Ivana, Machová, Svatava
Source Sets	Czech ETDs
Language	Czech
Detected Language	English
Type	info:eu-repo/semantics/doctoralThesis
Rights	info:eu-repo/semantics/restrictedAccess