  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Delimitação e alinhamento de conceitos lexicalizados no inglês norte-americano e no português brasileiro

Di Felippo, Ariani [UNESP] 01 August 2008 (has links) (PDF)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / Because of several factors, such as perceptual salience and semiotic relevance, languages have different inventories of lexicalized concepts (i.e. concepts expressed by lexical units). These lexical-conceptual divergences hinder the computational treatment of natural languages in tasks such as machine translation and cross-language information retrieval. Therefore, the construction of bilingual and multilingual lexical databases, in which the lexical units of different languages are aligned through their underlying concepts, has become an important research topic in Natural Language Processing (NLP). For Brazilian Portuguese (BP), in particular, the construction of such a resource is urgent. In this scenario, this thesis aims to investigate the lexicalization patterns of BP and to develop a lexical-conceptual resource, however small, to support the automatic processing of written BP. Assuming a compromise between NLP and Linguistics, the work follows a three-domain methodology, which divides the research activities into the linguistic, linguistic-computational, and computational domains. This research does not carry out the activities of the computational domain, since they fall outside its scope. In the linguistic domain, a set of concepts lexicalized in North American English (AmE), extracted from the Princeton WordNet (WN.Pr), was delimited through manual analysis of structured resources (lexical databases and standard dictionaries) and unstructured resources (textual corpora). Given those concepts, their lexical and phrasal expressions in BP were manually compiled from bilingual (AmE-BP) dictionaries, with the help of standard monolingual dictionaries, thesauri, and BP textual corpora. In the linguistic-computational domain, the previously identified lexicalized concepts of AmE and BP were aligned by means of a structured semantic interlingua (or ontology)... (Complete abstract: see electronic access below.)
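The alignment strategy this record describes, inter-relating lexical units of different languages only through the concept underlying them, can be sketched as a toy data structure. The concept IDs and lexical entries below are invented for illustration and are not taken from the thesis's actual resource:

```python
# Toy concept-keyed bilingual lexicon: AmE and BP lexical units are linked
# only through the (synset-like) concept they lexicalize, never directly.
CONCEPT_LEXICON = {
    "c001-motion-walk": {"AmE": ["walk", "stroll"], "BP": ["andar", "caminhar"]},
    "c002-food-fruit":  {"AmE": ["fruit"],          "BP": ["fruta"]},
}

def equivalents(unit, src, tgt):
    """Return target-language units that share a concept with `unit`."""
    out = []
    for entry in CONCEPT_LEXICON.values():
        if unit in entry.get(src, []):
            out.extend(entry.get(tgt, []))
    return out

print(equivalents("walk", "AmE", "BP"))  # ['andar', 'caminhar']
```

Because the mapping is mediated by the concept rather than by word-to-word links, a unit with no direct translation still aligns with whatever units the other language uses to lexicalize the same concept.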
2

Delimitação e alinhamento de conceitos lexicalizados no inglês norte-americano e no português brasileiro /

Di Felippo, Ariani. January 2008 (has links)
Advisor: Bento Carlos Dias da Silva / Committee: Stella Esther Ortweiller Tagnin / Committee: Cláudia Zavaglia / Committee: Beatriz Nunes de Oliveira Longo / Committee: Rosane de Andrade Berlinck / Abstract: same bilingual abstract as in record 1 above. / Degree: Doctorate
3

[en] COMPOUNDS IN PORTUGUESE / [pt] OS NOMES COMPOSTOS EM PORTUGUÊS

TANIA VIEIRA GOMES 09 November 2005 (has links)
[en] This work analyzes the criteria normally used to characterize compounds in Portuguese, as opposed to noun phrases and other frequent, stable lexical combinations of the language, such as idioms and collocations. Its main goal is to organize information that allows general patterns of the compounding process to be established. Initially, the treatments of compounding by authors in the Traditional Grammar tradition are examined. Then, once the aspects of this word-formation process left unexplained by grammar are identified, and still aiming to define and characterize the compound word, the structuralist approaches and their attempts to define the word as a linguistic unit are studied. Finally, the views of post-structuralist researchers and the differentiating criteria and tests they propose are analyzed. The analyses reveal that, alongside a few formations that behave as compounds under all four criteria (phonological, morphological, syntactic and semantic), there are many others that differ from common syntactic sequences with respect to only some, or just one, of these parameters, most often the semantic one. The investigation also reveals that many word sequences do not constitute morphological units under a strict reading of morphology, even though, from the lexical point of view, they are crystallized in the language and perceived as lexical units by speakers of Portuguese.
4

Analýza aditivních (zahrnujících) aktualizátorů v češtině a španělštině / Analysis of additive rhematizers in Czech and in Spanish

VLACHOVÁ, Sabina January 2018 (has links)
The aim of this diploma thesis is to compare Czech and Spanish actualizers, i.e. lexical units that emphasize sentence elements. The comparative analysis is based on data obtained from the parallel corpus InterCorp. The work consists of two main parts. The first part, divided into chapters and subchapters, deals with the behaviour of actualizers in particular sentence structures, taking into account the views of Czech and Spanish linguists on the expressions used to emphasize certain lexical units in the sentence. The second part analyzes particular actualizers with the aid of the InterCorp parallel corpus, focusing on how Czech actualizers are expressed in the target language, Spanish.
5

Nové německé výpůjčky v češtině / New German Borrowings in the Czech Language

Neprašová, Renáta January 2014 (has links)
The aim of the thesis New German Borrowings in the Czech Language is to confirm the importance of language contact between Czech and German. The work demonstrates that the adoption of German lexical units into Czech is a productive way of enriching its vocabulary. The use of German borrowings in different kinds of utterances, and their frequency analysis, show that speakers are beginning to view the position of germanisms in Czech neutrally, no longer negatively as in the past. I analyze the foreign lexical units from the integration-adaptation, semantic and frequency points of view. The method of the research is targeted excerption of newspapers, which shows how new German borrowings are increasingly used in the contemporary Czech vocabulary. I focus on the productivity of the individual borrowed parts of speech; in addition, I describe hybrid compounds and deproprial expressions. The number of new German borrowings collected in the neologism excerption database demonstrates that the adoption of German words and word-formation elements is an important and productive process of neologism creation.
6

'Enxergando' as colocações: para ajudar a vencer o medo de um texto autêntico. / Learning collocations: to help read a text.

Louro, Inês da Conceição dos Anjos 27 August 2001 (has links)
This study deals with lexical units composed of more than one word and used with a referential function, i.e. each of these lexical units constitutes a name. In an English-language classroom for Brazilian students, it was observed how a student's ability to 'see' these lexical units may help them read a text.
8

Periferní jednotky staročeského lexikálního systému: jednotky ustupující / Peripheral Units of the Old Czech Lexical System: Retreating Units

Nejedlý, Petr January 2016 (has links)
The thesis deals with Old Czech lexical units retreating from the centre to the periphery of the lexical system. It examines the prerequisites of this process, i.e. the loss of a lexical unit's systemic relations, and describes its individual forms (ultimately, the extinction of the lexical unit). The work gives an overview of the (intra)linguistic and extra-linguistic factors that shape the development of a lexical unit and drive its retreat to the periphery. Attention is paid to systemic changes in the language which modify internal relationships within the lexical system and also lead to the loss of the central position of some lexical units.
9

Des modèles de langage pour la reconnaissance de l'écriture manuscrite / Language Modelling for Handwriting Recognition

Swaileh, Wassim 04 October 2017 (has links)
This thesis is about the design of a complete processing chain for unconstrained handwriting recognition. Three main difficulties are addressed: pre-processing, optical modelling, and language modelling. The pre-processing stage consists in correctly extracting the text lines to be recognized from the document image; an iterative text-line segmentation method using oriented steerable filters was developed for this purpose. The difficulty in the optical modelling stage lies in the stylistic diversity of handwriting scripts. Statistical optical models are traditionally used to tackle this problem, namely hidden Markov models (HMM-GMM) and, more recently, recurrent neural networks (BLSTM-CTC). Using BLSTM networks, we achieve state-of-the-art performance on the RIMES (French) and IAM (English) reference datasets. The language modelling stage integrates a lexicon and a statistical language model into the recognition chain, in order to constrain the hypotheses proposed by the optical model to the most probable word sequence (sentence) from the linguistic point of view. The difficulty at this stage is to find a vocabulary with optimal lexical coverage and a minimum rate of out-of-vocabulary (OOV) words. To this end, we introduce language models over sub-lexical units made of either syllables or multigrams; these units efficiently cover an important portion of the OOV words. Since lexical coverage also depends on the domain of the language-model training corpus, the language model must be trained on in-domain data. With high OOV rates, the recognition system using sub-lexical units outperforms traditional word- or character-based systems; with low OOV rates, it performs on par with them while its language model remains compact. Thanks to this compact sub-lexical language model, a unified multilingual recognition system was designed and evaluated on the RIMES and IAM datasets. The unified multilingual system improves recognition performance over the language-specific systems, especially when a unified optical model is used.
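The core idea of sub-lexical modelling, covering out-of-vocabulary words with a small inventory of syllable-like units, can be illustrated with a greedy longest-match segmenter. The unit inventory below is a toy example, not the syllable or multigram inventory learned in the thesis:

```python
# Segment a word into sub-lexical units by greedy longest-prefix match,
# backtracking when a choice leaves an unsegmentable remainder. A small
# unit inventory can then cover words absent from any word-level lexicon.
UNITS = {"re", "con", "nais", "sance", "cri", "ture", "ma", "nu"}

def segment(word):
    """Split `word` into known units; return None if no segmentation exists."""
    if not word:
        return []
    for length in range(len(word), 0, -1):  # try longest prefix first
        if word[:length] in UNITS:
            rest = segment(word[length:])
            if rest is not None:
                return [word[:length]] + rest
    return None

print(segment("reconnaissance"))  # ['re', 'con', 'nais', 'sance']
```

A word never seen at training time is still representable as a sequence of in-inventory units, which is what keeps the effective OOV rate of a sub-lexical language model low.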
10

Recopilación y estudio de las unidades léxicas de la lengua alemana de la bioquímica. Aspectos terminológicos y lingüísticos

López Mateo, Coral 18 April 2023 (has links)
[EN] The relevance of current research in the field of biochemistry is indisputable. The applications of its advances in other fields, such as medicine and pharmacology, are extremely important. Innovation in experimental techniques, as well as in the development of the basic science of biochemistry, brings enormous benefits to humanity and, with them, new concepts that must be named and characterized. This thesis undertakes a monolingual systematic study of German biochemistry terminology with a twofold general objective: to compile the lexical units of the German language of biochemistry and to study them from a terminological and linguistic perspective. To this end, the theoretical framework of terminology and specialized languages is reviewed to lay the foundations of the study, together with the principles of corpus linguistics used to plan the compilation and construction of the study corpus. A corpus of 528 specialized texts reporting original research, taken from a high-impact applied chemistry journal, was compiled, and the real terms of the field were extracted from it. All texts were processed and reviewed for the semi-automatic extraction of term candidates, and a field tree was created for the selection of terms and their subsequent placement within the conceptual structure. A study was carried out on the availability of online monolingual, bilingual and multilingual lexicographical sources, in order to establish the Spanish equivalents of the terms as well as their definitions in German. Terminology records were designed in a database containing all the data needed for rigorous terminographic work. Finally, the collected terminological units are analyzed and described from the formal point of view, in order to characterize linguistically the specialized language of biochemistry. The results show, on the one hand, the need to create a bilingual German-Spanish lexicographical resource for biochemistry and, on the other, provide 895 terminological units from the subfield of human biochemistry for publication. / López Mateo, C. (2023). Recopilación y estudio de las unidades léxicas de la lengua alemana de la bioquímica. Aspectos terminológicos y lingüísticos [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/192877
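Semi-automatic term-candidate extraction of the kind this record describes can be sketched, in deliberately simplified form, as frequency counting over a specialized corpus filtered against a general-language stoplist. The corpus snippet and stoplist below are invented for illustration; real pipelines add linguistic filters (part-of-speech patterns, lemmatization) and human review of the candidates:

```python
import re
from collections import Counter

# General-language German function words to exclude (toy stoplist).
STOPLIST = {"die", "der", "und", "in", "von", "mit", "wird"}

def term_candidates(corpus, min_freq=2):
    """Return word forms frequent in `corpus` that are not general-language words."""
    words = re.findall(r"[a-zäöüß]+", corpus.lower())
    counts = Counter(w for w in words if w not in STOPLIST)
    return [w for w, c in counts.most_common() if c >= min_freq]

sample = ("Die Polymerase verlängert die DNA. "
          "Die Polymerase wird mit DNA inkubiert.")
print(term_candidates(sample))  # ['polymerase', 'dna']
```

The surviving candidates would then be checked by a terminologist and placed in the field tree, which is the manual half of the "semi-automatic" workflow.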
