Global ETD Search

1	[en] THE CORPUS NEVER LIES: ON THE IDENTIFICATION AND USE OF MULTIWORD EXPRESSIONS / [pt] O CÓRPUS NÃO MENTE JAMAIS: SOBRE A IDENTIFICAÇÃO E USO DE COMBINAÇÕES MULTIVOCABULARES DO TIPO VERBO MAIS SINTAGMA NOMINAL
MILENA DE UZEDA GARRAO, 22 August 2006
[en] Much recent research on the identification and use of multiword expressions (MWEs) assumes a representational view of word meaning. This study argues that it is more productive to approach MWEs from a non-representational perspective, which avoids relying on time-consuming and controversial human intuitions to identify and define them. The proposed methodology was tested on verb plus noun phrase (V+NP) combinations, a very frequent pattern in Brazilian Portuguese. It is a corpus-based statistical analysis in three steps: 1) a robust corpus of Brazilian Portuguese as the basis of analysis; 2) application of a statistical test to the corpus, namely the log-likelihood test (Banerjee and Pedersen, 2003), to detect the most frequent MWEs of the V+NP pattern (such as tomar café) and to discard casual, not otherwise motivated syntactic co-occurrences of the same lexical items; 3) application of similarity measures (Baeza-Yates and Ribeiro-Neto, 1999) between all paragraphs containing a given MWE (for example, fazer campanha) and all paragraphs containing its noun outside the MWE (campanha). This last step is used to assess the MWE's degree of compositionality: the higher the similarity between the two sets of paragraphs, the more compositional the MWE. The study therefore has both theoretical and practical implications for semantics.
[en] MULTIWORD EXPRESSIONS [en] VERBAL COLLOCATIONS [en] CORPUS LEXICOGRAPHY [en] CORPUS SEMANTICS
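Steps 2 and 3 of the methodology in record 1 can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify the exact formulas, so the code assumes the common G² form of the log-likelihood ratio over a 2x2 contingency table and assumes cosine similarity over bag-of-words counts as the similarity measure. The function names, argument names, and toy token lists are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(n11, n1p, np1, npp):
    """G^2 log-likelihood ratio for a word pair (assumed formulation).

    n11: times verb and noun co-occur in the pattern
    n1p: total occurrences of the verb in the pattern slot
    np1: total occurrences of the noun in the pattern slot
    npp: total number of pairs counted in the corpus
    """
    # Observed cells of the 2x2 contingency table.
    observed = [
        n11,
        n1p - n11,
        np1 - n11,
        npp - n1p - np1 + n11,
    ]
    # Expected cells under independence of verb and noun.
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with zero observations contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words (assumed measure)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    print(log_likelihood(n11=50, n1p=60, np1=70, npp=1000))

    # Toy token lists standing in for paragraphs with and without the MWE.
    with_mwe = ["candidato", "fazer", "campanha", "eleitoral"]
    noun_only = ["campanha", "eleitoral", "candidato", "partido"]
    print(cosine_similarity(with_mwe, noun_only))
```

Under these assumptions, a high G² score flags a verb-noun pair as a candidate MWE, and a higher cosine score between the two paragraph sets would point to a more compositional expression.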
2	[en] SUPPORT NOUNS: OPERATIONAL CRITERIA FOR CHARACTERIZATION / [pt] O SUBSTANTIVO-SUPORTE: CRITÉRIOS OPERACIONAIS DE CARACTERIZAÇÃO
CLAUDIA MARIA GARCIA MEDEIROS DE OLIVEIRA, 06 March 2007
[en] The main goal of this work is to provide operational criteria for characterizing nouns in noun-adjective (N-Adj) combinations in which the noun behaves analogously to the so-called light verbs or support verbs, which have been widely studied in recent years in both Linguistics and Natural Language Processing. The work lies at the intersection of linguistic, lexicographic and computational studies, and explores the potential of automatic corpus analysis and quantitative tools to give a more objective grounding to the concepts that guide linguistic analysis. It combines corpus-based research with traditional dictionaries to survey the main properties of N-Adj combinations, focusing on the case of denominal adjectives. The lexicographic and contextual data reveal a set of nouns that take part in the studied constructions in a way similar to support verbs in V-NP combinations. An automatic method for recognizing support nouns in texts is developed to give language specialists an instrument capable of producing solid evidence, since intuitive judgments alone are insufficient to justify the delimitation of apparently irregular expressions.
[en] LINGUISTICS [en] CORPUS LEXICOGRAPHY [en] SUPPORT NOUN [en] DENOMINAL ADJECTIVE [en] PART OF SPEECH
