Global ETD Search

1	Astrolábio: um corpus de redações escolares do Ceará anotado multidimensionalmente conforme a TEI P5 / Astrolábio: a corpus of school writings of Ceará multi-dimensionally annotated according to TEI P5
Katiuscia de Moraes Andrade, 18 February 2013
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Astrolábio is a compiled corpus with multidimensional annotation, made available electronically under the Creative Commons Attribution-NonCommercial 3.0 Unported licence. It is a corpus of Brazilian Portuguese that applies advanced technologies for text processing and corpus annotation. Its multidimensional annotation follows the TEI P5 guidelines, which prescribe the use of the XML metalanguage. Following these guidelines preserved essential features of the structure and content of the annotated documents, keeping the transcription as faithful as possible to the original. The <choice> tag made it possible to keep, in a single file, linguistic variation phenomena and spelling and punctuation errors alongside their corrected and normalized forms, and to display terms that were added or deleted.
The various annotation levels were integrated automatically with Astro, a program that relies on several Python modules for Natural Language Processing (NLP), including Aelius and Enchant. POS tagging used Aelius, a package built on the Natural Language Toolkit (NLTK) library. Within Aelius, the tagger chosen was AeliusHunPosMacMorpho, which is based on HunPos and trained on MAC-Morpho, a corpus of journalistic texts. Spell checking was done with Enchant, a large library with an API (Application Programming Interface) in C and C++.
The texts were produced during the writing workshops of the second edition of the Rota das Especiarias project, held in the first semester of 2012 with public school students from Camocim, Barroquinha and Jijoca de Jericoacoara, cities in Ceará. At this point in Astrolábio's construction, the completed stages are text selection, scanning, compilation and the first phase of automatic annotation with Astro. The corpus is already partially available on the Rota das Especiarias website (www.rotadasespeciarias.art.br) and will soon be submitted to the University of Oxford Text Archive (OTA). From what we observed of the landscape of Portuguese corpora, no corpus of Brazilian Portuguese has this level of annotation.
Keywords: Computational Linguistics; Corpus Linguistics; TEI P5; NLTK; Automatic text correction; Morphosyntactic tagging
Subject area: Linguistics
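
The role of the TEI <choice> element described in the Astrolábio abstract can be illustrated with a small sketch. It is not the Astro software, and the sample sentence is invented rather than taken from the corpus. It assumes the usual TEI P5 pairings inside <choice> (<sic> with <corr>, <orig> with <reg>) and, for brevity, leaves out the TEI namespace that a real document would declare. The function name `reading` and the view names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample, not from the Astrolábio corpus. A real TEI file would put
# these elements in the namespace http://www.tei-c.org/ns/1.0.
SAMPLE = (
    "<p>Ele <choice><orig>tava</orig><reg>estava</reg></choice> na "
    "<choice><sic>iscola</sic><corr>escola</corr></choice>.</p>"
)

# Which children of <choice> survive in each view of the text.
KEEP = {"original": {"sic", "orig"}, "edited": {"corr", "reg"}}


def reading(elem, view):
    """Flatten elem to plain text, keeping one side of every <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            # Keep only the alternative that belongs to the requested view.
            for alt in child:
                if alt.tag in KEEP[view]:
                    parts.append(reading(alt, view))
        else:
            parts.append(reading(child, view))
        # Text that follows the child element inside the parent.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(reading(root, "original"))  # Ele tava na iscola.
print(reading(root, "edited"))    # Ele estava na escola.
```

Because both forms live in the same file, the student's original wording and the normalized text can each be recovered from one annotated document.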
2	A criação de um sistema híbrido de tradução automática para a conversão de expressões nominais da língua inglesa / The Creation of a Hybrid Machine Translation for the Conversion of Nominal Expressions from English
Tiago Martins da Cunha, 18 December 2013
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Deutscher Akademischer Austausch Dienst
For many years, professional translators questioned much of the credibility of machine translation (MT). Nevertheless, MT systems have become a necessity for organizing and speeding up the translation process. Most users, professional or not, know nothing about the design of the tools that make up the system they use. An MT system is designed as a pipeline of tools that together form its engine. We therefore propose the description and creation of a translation tool able to handle nominal expressions from English into Portuguese. English nominal expressions may contain elements such as genitives and gerunds, which have no Portuguese counterparts and therefore cause difficulties for MT systems. Our goal is to create an MT system that deals satisfactorily with this problem. The system developed and described in this thesis was trained on nominal expressions from the Europarl corpus and tested on nominal expressions discussed in the literature on noun phrase syntax. We consider its results satisfactory, judging by the scores obtained in manual and automatic evaluation when compared with those of other freely available MT systems.
Keywords: Translation; Syntax; Computational Linguistics; Machine Translation; Translation Memories
Subject area: Applied Linguistics
3	Compilação, anotação e análise linguístico-computacional de um corpus de textos literários dos séculos XIX e XX: corpus Coelho Neto / Compilation, annotation and linguistic and computational analysis of corpus Coelho Netto (CCN), a corpus of literary texts of 19th and 20th centuries
Francimary Macêdo Martins, 06 June 2014
Funding: none
This thesis presents the compilation, morphosyntactic annotation and linguistic-computational analysis of a corpus of literary texts from the 19th and 20th centuries: the Corpus Coelho Netto (CCN), containing texts from the novels A Conquista and Turbilhão and short stories from the book Sertão. The work lies at the interface of Corpus Linguistics and Computational Linguistics (BERBER SARDINHA, 2000, 2003, 2004, 2005, 2009; BERBER SARDINHA; ALMEIDA, 2008; OLIVEIRA, 2009; BIDERMAN, 1998, 2001; ALUÍSIO; ALMEIDA, 2006; SHEPHERD, 2012; MACENERY AND WILSON, 2001; LEECH, 2004; ALVES; TAGNIN, 2012; ALENCAR, 2009, 2010a, 2010b, 2011a, 2011b, 2013a, 2013b). The CCN contains 53,080 (fifty-three thousand and eighty) tokens (punctuation and words). Compilation consists of selection, text collection and handling; the handling stage covers cleaning, editing and updating the texts (ALUÍSIO; ALMEIDA, 2006). The corpus then undergoes morphosyntactic annotation and linguistic-computational analysis, with the aim of obtaining data that confirm or refute the "excessive" use of adjectives, verbs and adverbs ending in -mente, demonstrating the lexical diversity of Coelho Netto's texts and establishing whether what modernist critics said about the writer was well founded.
Annotation was performed with the automatic tagger Aelius, AeliusHunPos model. Aelius is free software in Python that uses the Natural Language Toolkit (NLTK) library (BIRD; KLEIN; LOPER, 2009) for text pre-processing, for building morphosyntactic taggers and for annotating corpora with the help of human revision (ALENCAR, 2010a, 2013a, 2013b); the model was trained on the Tycho Brahe Historical Corpus of Portuguese (Corpus Histórico do Português Tycho Brahe, CHPTB). Compiling and annotating the CCN also involved other tasks, such as re-evaluating the accuracy of this tagger on literary texts. The results show that AeliusHunPos reached 97.9% accuracy on the CCN texts, higher than on other texts it had previously annotated; that the AeliusHunPos model performed far better on the corpora than the AeliusMaxEnt model; and that, after selecting and manually correcting 10% of the annotated corpora and generating gold-standard files, we can suggest improvements targeting the roughly 3% of errors made by the tagger, in order to increase its accuracy.
Regarding the analyses carried out on the CCN data, we found that the critics' claim that Coelho Netto's lexical diversity is exaggerated, specifically in verbs, adjectives and adverbs ending in -mente, is unfounded: his texts are rich, but the texts of Aluísio Azevedo and Camilo Castelo Branco, which form the comparison corpus, show vocabulary richness similar to that of the CCN, as set out in the results.
Keywords: Corpus Linguistics; Computational Linguistics; Morphosyntactic tagging; AeliusHunPos; Coelho Netto
Subject area: Applied Linguistics
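
The accuracy figure reported for the tagger in the Coelho Netto abstract is the kind of number obtained by comparing tagger output with a manually corrected gold standard. The abstract does not describe the scoring procedure, so the following is only a minimal sketch assuming plain token-level accuracy. The function name `tagging_accuracy`, the sample tokens and the tags are all hypothetical and do not reproduce the CHPTB tagset.

```python
def tagging_accuracy(predicted, gold):
    """Share of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("both sequences must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sample")
    hits = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return hits / len(gold)


# Hypothetical five-token sample; the tags are illustrative only.
gold = [("O", "D"), ("vento", "N"), ("soprava", "VB"),
        ("docemente", "ADV"), (".", ".")]
predicted = [("O", "D"), ("vento", "N"), ("soprava", "VB"),
             ("docemente", "ADJ"), (".", ".")]

accuracy = tagging_accuracy(predicted, gold)
print(f"{accuracy:.1%}")  # 80.0%
```

Here one of five tokens is mistagged, giving 80.0%. On a corpus-sized sample the same ratio yields figures such as the 97.9% quoted in the abstract.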
4	Adjetivos adverbializados: análise léxico-funcional e implementação computacional [Adverbialized adjectives: lexical-functional analysis and computational implementation]
Daniel de França Brasil Soares, September 2018
Funding: Fundação de Amparo à Pesquisa do Estado do Ceará
In this work, we propose a computational linguistic analysis of the so-called adverbialized adjectives (hereinafter AdjAdvs). We start by asking, on the one hand, whether AdjAdvs belong to the category A(djective) or ADV(erb) and, on the other, which approach is computationally more efficient. We ground our view in Lexical-Functional Grammar (LFG) (KAPLAN and BRESNAN, 1982) and implement in the XLE system (Xerox Linguistic Environment) a fragment of a grammar of Brazilian Portuguese (henceforth PB) capable of analyzing adjectives in adverbial use. Our implementation adapts a grammar fragment of French built by Schwarze and Alencar (2016) and extended in FrGramm by Alencar (2017). The fragment adapted to PB serves as the basis for two versions used in a comparative analysis: G-A and G-ADV. In the first version, AdjAdvs are analyzed as adjectives; in the second, as adverbs. G-A and G-ADV are evaluated by applying a parser to 168 grammatical and 286 ungrammatical sentences. From the tests on both sentence sets, the processing results of the G-A and G-ADV grammars in XLE, and a statistical analysis based on a two-factor analysis of variance, we conclude that there is no significant difference in syntactic treatment between the G-A and G-ADV versions built to parse AdjAdvs. This result reinforces Radford's (1988) argument that adjectives and adverbs belong to a single category.
Keywords: Computational Linguistics; Deep syntactic parsing; Lexical-Functional Grammar; LFG-XLE; Adverbialized adjectives
Subject area: Linguistics
5	Uma gramática LFG-XLE para o processamento sintático profunda do português / A LFG-XLE grammar for Brazilian Portuguese deep parser
Andréa Feitosa dos Santos, 19 December 2014
Funding: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This thesis describes the development of a sentence grammar of Brazilian Portuguese, built within the framework of Lexical Functional Grammar (LFG), a theoretical model with a sophisticated computational formalism, and implemented in the robust Xerox Linguistic Environment (XLE), the state-of-the-art system for deep parsing in the LFG generative model. The grammar's main feature is that it adopts the ParGram annotation system and the methodology agreed on by XLE grammar developers. The grammar fragment models a variety of elements of sentence syntax. Clausal constituents such as IP and CP, which head Portuguese sentences, were modeled, as were certain aspects of verbal subcategorization and argument structure. Among verbal elements, the grammar covers some cases of verbal complexes made up of modal verbs and control verbs. The nominal elements treated centrally were expletive and reflexive pronouns, and noun phrases and determiner phrases with demonstrative and interrogative pronouns. The other aspects modeled are prepositional phrases, whose complexity lies in the distinction between semantic and non-semantic prepositions; adjectival phrases, which can project in the sentence from attributive adjectival forms, from ordinal or cardinal forms, and in the form of intensifiers; and adverbial phrases, whose internal structure was modeled taking into account both intransitive adverbs and transitive adverbs with a PP complement.
Our evaluation shows that, for all 40 sentences tested, our grammar assigns consistent and well-founded analyses, whereas the parser Palavras, the current state of the art in deep syntactic processing of Portuguese, assigns incorrect analyses to 9 sentences. A second evaluation shows that, of 20 ungrammatical sentences tested with both our grammar and Palavras, only 2 received an analysis from our grammar, while Palavras provides analyses for 19. The work essentially aims to give a formal and well-founded description of a broad range of phenomena of Brazilian Portuguese, but above all to contribute a non-trivial sentence grammar of Portuguese in the LFG-XLE formalism, making a Portuguese grammatical resource effectively available for natural language processing.
Keywords: LFG-XLE Grammar; Deep syntactic parsing; Computational linguistics; Natural Language Processing; ParGram
Subject area: Applied Linguistics
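
The two evaluations reported in the LFG-XLE abstract can be turned into comparable rates with a few lines of arithmetic. The counts below come from the abstract. Treating "received no analysis" as a correct rejection of an ungrammatical sentence is an assumption of this sketch, and the names `rates`, `incorrect_on_grammatical` and `parsed_ungrammatical` are hypothetical.

```python
from fractions import Fraction

# Test-set sizes and counts as reported in the abstract.
GRAMMATICAL_TOTAL = 40
UNGRAMMATICAL_TOTAL = 20
incorrect_on_grammatical = {"LFG-XLE grammar": 0, "Palavras": 9}
parsed_ungrammatical = {"LFG-XLE grammar": 2, "Palavras": 19}


def rates(system):
    """Return (share of correct analyses, share of ungrammatical input rejected)."""
    correct = Fraction(
        GRAMMATICAL_TOTAL - incorrect_on_grammatical[system], GRAMMATICAL_TOTAL
    )
    # Assumption: an ungrammatical sentence with no analysis counts as rejected.
    rejected = Fraction(
        UNGRAMMATICAL_TOTAL - parsed_ungrammatical[system], UNGRAMMATICAL_TOTAL
    )
    return correct, rejected


for system in incorrect_on_grammatical:
    correct, rejected = rates(system)
    print(f"{system}: {float(correct):.1%} correct, {float(rejected):.1%} rejected")
```

Under these assumptions the grammar scores 40/40 and 18/20, against 31/40 and 1/20 for Palavras. With samples of 40 and 20 sentences, the percentages are best read as indicative.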
