1 |
Frazémy ve dvojjazyčném slovníku / Phrasemes in a Bilingual Dictionary
Ježková, Jaroslava. January 2016
This thesis deals with the treatment of set phrasemes in dictionaries, specifically somatisms. It consists of a theoretical and a practical part. The theoretical part covers phraseology in general, phrasemes (occasionally called phraseologisms), and their treatment in Czech and German linguistics. The first section considers somatic phrasemes with respect to their character as language units, as well as the dependence of phrasemes, and of the explanation of their meaning, on the context in which they appear. It then lists and describes the main characteristics that distinguish phrasemes from other language phenomena. The theoretical part concludes with an overview of corpus linguistics and its application through corpus and co-occurrence analysis. Building on the first part, the practical part deals with the treatment of somatisms in a bilingual dictionary from a lexicographic point of view and proposes specific solutions. The appendix presents the processed search results as database entries, which may be regarded as part of a bilingual dictionary. Keywords: phraseme, bilingual dictionary, corpus lexicography, corpus analysis, somatism
|
2 |
[en] THE CORPUS NEVER LIES: ON THE IDENTIFICATION AND USE OF MULTIWORD EXPRESSIONS / [pt] O CÓRPUS NÃO MENTE JAMAIS: SOBRE A IDENTIFICAÇÃO E USO DE COMBINAÇÕES MULTIVOCABULARES DO TIPO VERBO MAIS SINTAGMA NOMINAL
MILENA DE UZEDA GARRAO. 22 August 2006
Much recent research on the identification and use of multiword expressions (MWEs) adopts a representational view of word meaning. This study argues that it is far more interesting to identify MWEs from a non-representational perspective. This path avoids relying on human intuition, which is time-consuming and controversial, to identify and define MWEs. The proposed methodology was tested on MWEs of the V+NP pattern, which is quite frequent in Brazilian Portuguese (BP). It is a statistical, corpus-based analysis that can be summed up in three steps: 1) a robust BP corpus as the basis of analysis; 2) application of a statistical test to the corpus, namely the Log-Likelihood test (Banerjee and Pedersen, 2003), to detect the most frequent MWEs of the V+NP pattern (such as tomar café) and to exclude chance syntactic co-occurrences of the same lexical items that are not otherwise motivated; 3) application of Similarity Measures (Baeza-Yates and Ribeiro-Neto, 1999) between all the paragraphs containing a given MWE (for example, fazer campanha) and all the paragraphs containing the noun outside the MWE (campanha). This last step was used to assess the degree of compositionality of the MWE. The study concludes that the greater the similarity between the paragraphs containing the MWE and the paragraphs containing the noun outside the expression, the more compositional the MWE. The work therefore has both a theoretical and a practical impact on semantics.
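The three steps in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes Dunning's G² form of the log-likelihood statistic on a 2x2 bigram table, a plain bag-of-words cosine as the similarity measure, and a verb immediately followed by the noun as a stand-in for a V+NP match. The function names `log_likelihood`, `cosine` and `compositionality` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(c12, c1, c2, n):
    """G^2 for a word pair (assumed form: Dunning's log-likelihood ratio).

    c12: count of the pair, c1: count of the first word,
    c2: count of the second word, n: total number of pairs in the corpus.
    """
    observed = [
        [c12, c1 - c12],
        [c2 - c12, n - c1 - c2 + c12],
    ]
    row = [c1, n - c1]
    col = [c2, n - c2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row[i] * col[j] / n  # expected count under independence
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o / e)
    return 2.0 * g2


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two bags of words."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def compositionality(paragraphs, verb, noun):
    """Compare paragraphs containing 'verb noun' with those containing
    the noun outside that pair (toy assumption: adjacency = V+NP match).

    paragraphs: list of token lists. Returns a cosine score in [0, 1].
    """
    with_mwe, noun_only = [], []
    for tokens in paragraphs:
        if noun not in tokens:
            continue
        has_mwe = any(x == verb and y == noun
                      for x, y in zip(tokens, tokens[1:]))
        (with_mwe if has_mwe else noun_only).extend(tokens)
    return cosine(with_mwe, noun_only)
```

Under these assumptions, a higher `log_likelihood` score marks a pair as a stronger MWE candidate, and a higher `compositionality` score suggests a more compositional expression, in line with the conclusion reported in the abstract.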
|
3 |
[en] SUPPORT NOUNS: OPERATIONAL CRITERIA FOR CHARACTERIZATION / [pt] O SUBSTANTIVO-SUPORTE: CRITÉRIOS OPERACIONAIS DE CARACTERIZAÇÃO
CLAUDIA MARIA GARCIA MEDEIROS DE OLIVEIRA. 06 March 2007
The main goal of this work is to provide operational criteria for characterizing nouns in Noun-Adjective combinations in which the noun behaves analogously to the so-called light verbs or support verbs, widely studied in recent years in both Linguistics and Natural Language Processing. The work lies at the confluence of linguistic, lexicographic and computational studies and explores the potential of automatic corpus analysis and quantitative instruments in pursuit of a more objective grounding for the concepts that guide linguistic analysis. The research combines corpus-based investigation with the traditional dictionary in order to survey the main properties of N-Adj combinations, with a particular focus on those containing denominal adjectives. The lexicographic and contextual data demonstrate the existence of a set of nouns that take part in the studied constructions in a way similar to support verbs in V-NP combinations. An automatic method for recognizing support nouns in texts is developed to give researchers an instrument capable of producing convincing evidence, given that intuitive judgments alone are insufficient to justify the delimitation of apparently irregular expressions.
|
4 |
Srovnávací aspekty lotyšského a českého lexikonu (Materiály k sestavení lotyšsko-českého slovníku) / Comparative aspects of Latvian and Czech lexicons: Materials for assembling a Latvian-Czech dictionary
Škrabal, Michal. January 2016
Title: Comparative aspects of Latvian and Czech lexicons: Materials for assembling a Latvian-Czech dictionary
Author: Mgr. Michal Škrabal
Department: Institute of the Czech National Corpus
Supervisor: prof. PhDr. František Čermák, DrSc.
The primary aim of this work is to classify the Latvian lexicon, or rather its relevant segment, into individual groups that can be defined semantically, grammatically, syntagmatically, pragmatically, and so on, to find an ideal method of lexicographic treatment for these classes, and to apply it to an emerging Latvian-Czech dictionary (the very first manual of its kind). To this end, the work draws on modern instruments that have recently and radically altered the methodology of lexicographic work: on the one hand, linguistic corpora, which nowadays represent authentic linguistic usage, and on the other hand, the specialized lexicographic software TshwaneLex, in which a lexical database of Latvian is being built and from which the dictionary itself will subsequently be generated. Because of the limited size of the Latvian corpus, traditional sources could not be eliminated completely, and the author had to combine traditional and modern lexicographic methods. His primary source, however, remained the corpus...
|