51 |
Acquisition de schémas prédicatifs verbaux en japonais / Verbal predicate-frame acquisition in Japanese
Marchal, Pierre, 15 October 2015
L'acquisition de connaissances relatives aux constructions verbales est une question importante pour le traitement automatique des langues, mais aussi pour la lexicographie qui vise à documenter les nouveaux usages linguistiques. Cette tâche pose de nombreux enjeux, techniques et théoriques. Dans le cadre de cette thèse, nous nous intéressons plus particulièrement à deux aspects fondamentaux de la description du verbe : la notion d'entrée lexicale et la distinction entre arguments et circonstants. A la suite de précédentes études en traitement automatique des langues et en linguistique nous faisons l'hypothèse qu’il n’y a pas de distinction marquée entre homonymes et quasi-synonymes ; de même, nous posons qu’il existe un continuum entre arguments et circonstants. Nous proposons une chaîne de traitement complète pour l'acquisition de schémas prédicatifs verbaux en japonais à partir d'un corpus non étiqueté de textes journalistiques. Cette chaîne de traitement intègre la notion d'argumentalité au processus de création des entrées lexicales et met en œuvre une modélisation de ces deux continuums. La ressource produite a fait l'objet d'une évaluation comparative qualitative, qui a permis de mettre en évidence la difficulté des ressources linguistiques à décrire de nouvelles données, plaidant par là même pour une lexicologie s'inscrivant dans le cadre épistémologique de la linguistique de corpus. / Lexical knowledge acquisition of verbal constructions is an important issue for natural language processing as well as lexicography, which aims at referencing emerging linguistic usages. Such a task implies numerous challenges, technical as well as theoretical. In this thesis, we had a closer look at two fundamental aspects of the description of the verb: the notion of lexical item and the distinction between arguments and adjuncts. 
Following up on studies in natural language processing and linguistics, we embrace the hypothesis that there is no clear distinction between homonyms and quasi-synonyms, and the hypothesis of a continuum between arguments and adjuncts. We provide a complete approach to lexical knowledge acquisition of verbal constructions from an untagged news corpus. The acquisition process makes use of the notion of argumenthood, and builds models of the two continuums. Our lexicon has been evaluated on a qualitative and comparative basis. Siding with lexicography anchored in the theoretical framework of corpus linguistics, we show the difficulty of using lexical resources to describe as yet unseen data.
|
52 |
Proficiência escrita em inglês especializado : estudo de corpus de abstracts em Medicina, Nutrição e Farmácia
Freitas, Ana Luiza Pires de, January 2016
Este trabalho explora o desenvolvimento da proficiência escrita em língua inglesa no âmbito da produção de abstracts, no campo das Ciências da Saúde. O objetivo é contribuir para a elaboração de materiais instrucionais, para a formação de educadores linguísticos e para os avanços do campo de ensino e aprendizagem de English for Academic Purposes. A pesquisa reuniu, descreveu e analisou um corpus de 180.170 palavras, com abstracts das áreas de Medicina, Nutrição e Farmácia, com base nos fundamentos da Linguística de Corpus, da Linguística das Linguagens Especializadas e dos Estudos em English for Academic Purposes. A unidade analítica do estudo são os pacotes lexicais (lexical bundles), sequências recorrentes de palavras empregadas nos textos. Para o trabalho de extração e identificação de pacotes lexicais, estabeleceu-se o critério de extensão de 4 palavras gráficas e frequência e distribuição mínimas de 5 ocorrências em, pelo menos, 5 textos diferentes, tanto para o acervo internacional, quanto para o brasileiro. Foram extraídos 96 pacotes lexicais do subcorpus internacional, com 90.098 palavras, e 88 sequências recorrentes do subcorpus brasileiro, com 90.072 palavras. Com base nas métricas de frequência e variabilidade lexical, constatam-se distinções nos modos de narrar a ciência entre as duas partes do acervo. O subcorpus brasileiro apresentou maior repetição de associações de palavras e um maior emprego de lexical bundles para expressar a finalidade e registrar a realização do trabalho acadêmico. O subcorpus internacional, por sua vez, caracterizou-se pela diversidade dos pacotes lexicais, pela objetividade da narrativa e pelo uso de feixes de palavras para destacar o fazer científico propriamente dito. 
Embora os resultados obtidos sejam específicos para o corpus reunido, os achados reforçam a importância de educadores linguísticos e desenhistas de programas de ensino e aprendizagem reconhecerem as peculiaridades dos contextos de produção dos abstracts, para que a prática pedagógica seja sintonizada às necessidades do aprendiz. Na conclusão do estudo, são apresentadas sugestões para aproveitamento dos resultados em atividades de ensino. / This research explores the development of written proficiency in English in the production of abstracts in the field of Health Sciences. It aims to contribute to advances in the study of English for Academic Purposes by fostering language teachers’ development and by supporting the creation of instructional materials. Based on Corpus Linguistics, Linguistics for Specialized Languages and English for Academic Purposes, the investigation compiled, described and analyzed a corpus of 180,170 words, comprising abstracts in Medicine, Nutrition and Pharmacy. The analytical units of the study are lexical bundles, recurrent strings of words used in texts. For bundle extraction and identification, the criteria established were a length of 4 orthographic words and a minimum frequency and distribution of 5 occurrences in at least 5 different texts, in each of the two parts of the corpus. 96 lexical bundles were extracted from the international subcorpus, which totals 90,098 words, whilst 88 recurrent word sequences were obtained from the Brazilian subcorpus, which totals 90,072 words. On the metrics of lexical frequency and variability, the two data segments revealed distinct ways of building a scientific narrative. Greater repetition of word associations and a higher use of lexical bundles to express purpose and to highlight the achievement of the academic endeavor were observed in the Brazilian subcorpus. 
The international subcorpus, on the other hand, features more diverse recurrent strings of words, a concise prose and the use of extended collocations to highlight the scientific enterprise in itself. Although these findings are specific to the corpus studied, they bring out the usefulness of language educators’ and program designers’ awareness of the peculiarities of the different abstract production contexts, so that pedagogical practice can be attuned to learners’ needs. Suggestions for the application of the findings in teaching tasks are provided in the concluding part of the investigation.
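The extraction criteria stated in this abstract (4-word sequences occurring at least 5 times across at least 5 different texts) can be illustrated with a minimal sketch. This is not the author's tool: the function name `extract_bundles`, the lowercasing, and the whitespace tokenization are assumptions made for illustration only, and a real study would likely handle punctuation and orthographic words more carefully.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_freq=5, min_texts=5):
    """Return {bundle: (frequency, number_of_texts)} for every n-word
    sequence that meets both the frequency and the dispersion threshold.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace; the thesis does not specify its tokenization.
    """
    freq = Counter()             # total occurrences of each n-gram
    spread = defaultdict(set)    # ids of the texts each n-gram occurs in

    for text_id, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(text_id)

    return {
        " ".join(gram): (count, len(spread[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }
```

The dispersion threshold matters as much as the frequency threshold: a sequence repeated five times inside a single abstract is excluded, because it reflects one writer's habit rather than a recurrent pattern of the genre.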
|
53 |
Analyse de relations de discours causales en corpus : étude empirique et caractérisation théorique / Corpus analysis of causal discourse relations : empirical study and theoretical characterization
Atallah, Caroline, 22 October 2014
Dans cette thèse, nous nous interrogeons sur les réalisations linguistiques des relations causales selon une approche sémantique et pragmatique du discours. Bien que la causalité occupe une place centrale dans les théories du discours, il n’existe pas de consensus quant aux relations qui lui sont associées. Confrontant les propositions faites dans la littérature avec nos observations sur des données attestées, nous proposons de contribuer à l’enrichissement d’une théorie du discours spécifique : la SDRT (Segmented Discourse Representation Theory). Cette thèse se situe donc à l’interface entre linguistique de corpus et linguistique théorique. Les analyses qui y sont menées s’appuient sur le corpus EXPLICADIS, corpus de français écrit constitué spécifiquement pour répondre à l’objectif visé. L’annotation de ce corpus en relations de discours causales nous a ainsi autorisée à procéder à l’analyse de ces relations selon une approche originale qui consiste à prendre pour point de départ la relation elle-même et non ses marqueurs. Cette approche nous a permis d’offrir une vision unificatrice de la causalité en caractérisant les relations de discours qui lui sont liées dans le cadre théorique de la SDRT. Elle nous a également permis de mener des études quantitatives et comparatives sur corpus. Notre travail dresse, en outre, un panorama des moyens d’expression de la causalité observés à l’écrit en français. / The purpose of this thesis is to study the linguistic realizations of causal relations, following a semantic and pragmatic approach to discourse structure. Even though causality is a central phenomenon in most theoretical frameworks of discourse, to date there is no consensus on the relations associated with it. Comparing the hypotheses put forward in the literature with our own observations of attested data, we propose to enrich a specific theoretical model of discourse, namely SDRT (Segmented Discourse Representation Theory). 
Therefore, this study stands at the interface between corpus linguistics and theoretical linguistics. The analyses we carried out are based on the EXPLICADIS corpus, which is a written French corpus built specifically to meet the objective. Annotating this corpus with causal discourse relations allowed us to analyze these using an original approach which consists in starting from the relation itself rather than its markers. This approach provided us with the opportunity to offer a unified vision of causality by characterizing the different discourse causal relations in the framework of SDRT. It also provided us with the opportunity to conduct quantitative and comparative corpus studies. Our work also includes an overview of the different means of expression of causality that are documented in written French.
|
54 |
Discrimination prosodique et représentation du lexique : application aux emplois des connecteurs discursifs / Prosodic discrimination in the representation of the lexicon : an application to discourse connectives
Petit, Mélanie, 28 November 2009
Dans le cadre d’une sémantique linguistique reposant sur la distinction signification/sens et partant du principe que le sens se construit en discours, nos recherches ont pour objectif de rendre compte de la diversité des emplois d’un signe dans une perspective intégrant la prosodie, afin de définir un processus de discrimination prosodique des différents sens d’une même unité tels qu'ils peuvent être décrits sur la base de corpus oraux authentiques. Elles portent sur un ensemble d’objets empiriques, de enfin à quelques ou oui en passant par disons, mais principalement sur des connecteurs discursifs. Après avoir mis au jour des corrélations forme prosodique/sens au niveau du lexique, et en prenant en compte le caractère gradable de la langue ainsi que la notion d’argumentation dans la langue, nous proposons un nouveau format de représentation sémantique distinguant, sur la base de nos résultats, deux niveaux de sens que sont l’interprétation-type et l’emploi-type, ce dernier présentant la particularité de comporter un commentaire exprimé par la prosodie, commentaire qui porte sur le rapport à la situation et/ou à l'énonciation. L’intégration d’un niveau de sens supplémentaire constitue l’originalité de ce nouveau format et présente l’avantage de réduire les phénomènes de surgénéralisations observables dans les caractérisations sémantiques des emplois. Nous présentons ensuite la façon dont nos résultats pourraient être intégrés à une perspective lexicographique, et dont ils pourraient permettre d'obtenir à la fois une plus grande cohérence et une plus grande exhaustivité des articles d’une entrée de dictionnaire, et une prise en compte systématique de la prosodie des emplois. 
/ Within a linguistic approach to semantics based on the distinction between signification and sense (lexical meaning), and on the assumption that sense is built in discourse, our research aims to account for the diversity of uses of a sign in a perspective that integrates the prosodic dimension of the interpretative process. Drawing extensively on authentic oral corpora, its goal is to define a process of prosodic discrimination of the different senses of the same lexical unit. It deals with a set of empirical objects, from French enfin to quelques (some) or oui (yes) by way of disons (so to say, etc.), but mainly with discourse connectives (discourse markers). After establishing correlations between prosodic form and sense at the level of the lexicon, and taking into account the gradable nature of language (la langue) and its argumentative nature, we present a new semantic model in which the classical level of sense or lexical meaning is split into two levels, interpretation-type and use-type. The specificity of the latter is that it includes a prosodically expressed comment about the speaker's relationship with the situation and/or with the enunciation. The integration of an additional level of sense is the originality of the new model; it has the advantage of reducing the over-generalisation observed in semantic characterisations of lexical uses. We further present the way our results could be integrated into a lexicographic perspective in order to obtain both more coherent and more exhaustive descriptions of actual language use in dictionaries and a systematic description of prosody within each entry.
|
55 |
An investigation of students' experiences with corpus technology in second language academic writing
Yoon, Hyunsook, 09 March 2005
No description available.
|
56 |
Corpus callosum thickness on MRI as a surrogate marker of brain volume in children with HIV-related brain disease and its correlation with developmental scores
Andronikou, Savvas, January 2015
A research report submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy, Johannesburg, 2015 /
Background
Objective volumetric assessment of white matter in children with HIV involves post-processing, while corpus callosum (CC) thickness measurement on mid-sagittal MRI may represent a rapid surrogate marker.
Aim
To determine whether the thickness of the CC on mid-sagittal MRI can be used as a surrogate marker of brain volume in children with HIV-related brain disease and in appropriate controls, and to determine whether thickness at particular locations correlates with mental developmental scores and laboratory markers of immunity.
Methods
A retrospective analysis of 33 children with HIV-related neurology (range 7 – 49 months; median 31 months; mean 30 months; 16 boys and 17 girls) and matched controls (range 13 – 48 months; median 34 months; mean 32 months; 6 boys and 5 girls) was performed. A custom software tool imported sagittal MRI images, divided the midline CC contour into 40 segments and measured the thickness of each segment as well as the length of the CC. Brain volume (total brain volume (TBV); white matter volume (WMV); grey matter volume (GMV)) was determined using MATLAB and Statistical Parametric Mapping software. Overall and segmental CC mean and maximum thickness and CC length were checked for correlation with brain volume, Griffiths mental development scores (GMDS) and laboratory parameters.
Results
Griffiths scores in patients were ‘low average’ (mean Griffiths general quotient (GQ) of 84, range 72 – 101; ‘locomotor’ 84, range 59 – 116; ‘language’ 80, range 57 – 118).
There was no statistical difference in overall and regional CC thickness, CC length, TBV, GMV and WMV between patients and controls.
Significant correlation was found in patients for the premotor CC mean with age (p = 0.04). Other significant correlations of CC measurements and laboratory / clinical parameters were the prefrontal CC max with nadir CD4 (p = 0.046) (+ve correlation); motor CC max with GQ (p = 0.028) (-ve correlation); and CC length with CD4 (p = 0.04) (-ve correlation).
Significant correlations between CC thickness and brain volume were found in patients and controls for the CC mean and TBV (p = 0.049) (+ve correlation); premotor CC mean and TBV (p = 0.039) (+ve correlation); sensory CC mean and TBV (p = 0.022) (+ve correlation); prefrontal CC max and WMV (p = 0.019) (+ve correlation); premotor CC mean and WMV (p = 0.019) (+ve correlation); and premotor CC max and WMV (p = 0.023) (+ve correlation).
Conclusion
This research met its objectives in demonstrating a statistically significant, albeit weak, correlation between CC thickness and brain volume in patients and controls, even though patients were not shown to have significantly diminished brain volumes as compared to controls.
|
57 |
Colocações verbais em um corpus de aprendizes brasileiros de inglês / Verbal collocations in a corpus of Brazilian learners of English
Murakami, Danilo Suzuki, 22 March 2016
Muitas pesquisas reconhecem a importância das colocações para o aprendizado da língua inglesa. Contudo, poucos estudos investigaram o tema na escrita de aprendizes brasileiros de inglês. Esta pesquisa examina o papel das colocações verbais em um subcorpus do EF-Cambridge Open Language Database (EFCAMDAT) composto por redações de aprendizes brasileiros de inglês de nível avançado. A abordagem metodológica adotada neste estudo é baseada em técnicas da Linguística de Corpus. Para essa investigação, foi elaborada uma classificação semiautomática de todos os verbos com o auxílio de um programa de anotação de corpora. Em geral, os resultados mostram que praticamente uma em cada cinco combinações entre um verbo e um substantivo é uma colocação. No entanto, os aprendizes não empregam colocações verbais com sucesso mesmo sendo de nível avançado de aprendizado. As colocações verbais apresentaram desvios em 25% dos casos. O principal tipo de inadequação é o uso de um verbo inapropriado causado pela influência do português. Um pequeno número de estruturas sintáticas também pode ser responsável por desvios colocacionais. Mais pesquisas sobre esse tópico precisam ser conduzidas para a total compreensão dos fatores que determinam a taxa de sucesso. Os achados devem contribuir para a área de aprendizagem de inglês por brasileiros. / There is a growing body of literature that recognizes the importance of collocations in English language learning. However, few studies have investigated the use of collocations in the writing of Brazilian learners of English. This research examines the role of verbal collocations in a subcorpus of the EF-Cambridge Open Language Database (EFCAMDAT). The subcorpus comprises writings by advanced learners of English from Brazil. The methodological approach taken in this study is based on Corpus Linguistics. For this investigation, a semi-automatic classification of all verbs was applied with the aid of a computer program for annotation of text. 
Overall, the results indicate that nearly one out of every five combinations between a verb and a noun is a collocation and that learners are not completely successful in the use of verbal collocations despite their advanced level of learning. The use of verbal collocations was found to be deviant in 25% of the cases. The main type of inadequacy was the use of an inappropriate verb caused by the influence of Portuguese. A small number of syntactic patterns may also have been responsible for collocational deviations. More research on this topic needs to be undertaken before full comprehension of the factors that determine success rate. The findings should make a contribution to the field of English learning by Brazilians.
|
58 |
A (Im)possibilidade jurídica do cabimento da ação contitucional do habeas corpus nas punições disciplinares militares / The (im)possibility of legal action pertinence of constitutional habeas corpus in military punishments discipline
Mesquita, Silvio Carlos Leite, 15 December 2011
This research develops a theoretical analysis of the constitutional action of habeas corpus in military disciplinary punishments. It traces the origin of habeas corpus, its historical aspects in Brazilian law, its mediate object, its legal nature, concepts, types, active and passive standing, and its legal prerequisites. It addresses public administration, especially with regard to bound and discretionary administrative acts, their elements or requirements and the forms of their invalidation, covering their existence, validity, effectiveness, vices and defects, and it also comments on the powers conferred on the public administration. More specifically, the study examines the types of disciplinary punishment that exist in military corporations, especially those that may restrict the constitutional right to freedom of movement of military offenders. It then examines whether habeas corpus is admissible when a member of the military has his or her freedom of movement curtailed or threatened, through illegality or abuse of power, by an act of the military administrator, despite the express prohibition in art. 142, § 2 of the Federal Constitution of 1988; it also surveys the positions of legal doctrine and case law on the (un)constitutionality of that paragraph. The study concludes by analyzing the defects of military disciplinary acts that may prompt a request for judicial protection through a habeas corpus petition before the Judiciary, with the consequent issuance of a release order or a safe-conduct.
Keywords: Constitutional action. Habeas corpus. Public administration. Disciplinary punishments. Locomotion freedom. / A presente pesquisa objetiva elaborar uma análise teórica sobre a ação constitucional do habeas corpus nas punições disciplinares militares. Resgata-se, a origem do habeas corpus, seus aspectos históricos no direito brasileiro, seu objeto mediato, sua natureza jurídica, conceitos, espécies, legitimidade ativa, passiva e seus pressupostos jurídicos. Aborda-se a Administração Pública, mormente em relação ao ato administrativo vinculado e discricionário, seus elementos ou requisitos e as formas de sua invalidação, abordando-se sua existência, validade, eficácia, vícios e defeitos, e ainda, comenta-se sobre os poderes conferidos à Administração Pública. De forma mais específica, o estudo analisa as espécies de punições disciplinares existentes nas Corporações Militares, em especial as que podem restringir o Direito Constitucional de locomoção dos militares transgressores. Far-se-á o estudo sobre o cabimento ou não do instituto do habeas corpus, quando o militar tiver a sua liberdade de locomoção coarctada ou ameaçada, por ilegalidade ou abuso de poder, por ato praticado pelo administrador militar, embora haja vedação expressa no art. 142, § 2º, da Constituição Federal de 1988, assim como, verifica-se o posicionamento da doutrina e da jurisprudência sobre a (in) constitucionalidade do referido parágrafo. Conclui-se, analisando os defeitos dos atos disciplinares militares, que podem ocasionar a busca da tutela jurisdicional com a impetração do instituto do habeas corpus no Poder Judiciário, com a conseqüente expedição do alvará de soltura ou de salvo-conduto.
Palavras-chave: Ação constitucional. Habeas corpus. Administração pública. Punições disciplinares. Liberdade de locomoção.
|
59 |
Pun Strategies Across Joke Schemata: A Corpus-Based Study
Crapo, Robert Nishan, 01 April 2018
In the linguistic study of humor, research has largely been centered around the formulation of models and theories or the dissecting and categorization of jokes. Because of the often difficult-to-categorize aspects of verbal jokes, much time has been spent trying to create taxonomies for humor types and mechanisms. Linguists such as Raskin and Attardo have sought to categorize all verbal humor according to various functional elements (Attardo & Raskin, 1991). Such elements include, but are not limited to, the logical mechanism that drives the humor in the joke or the situation where the joke takes place. These categorizations are helpful in understanding the potential components of a given joke. However, relatively few studies have sought to quantify and qualify the distribution of these components across real-world data. This study seeks to understand the distribution of some of these categorizations laid out by Raskin and Attardo across joke topics, namely pun wordplay and narrative strategy. To do this, an original 100,000 word joke corpus was designed and compiled consisting of four joke topics: Marriage, Politics, Animals, and Food. Through some manual sorting and Python programming, jokes were labeled according to wordplay strategy and narrative structure. A subsequent statistical analysis was carried out to determine whether there exists a pattern of specific joke strategies when dealing with children's humor versus adult humor.
|
60 |
Building a Document Corpus for Manufacturing Knowledge Retrieval
Liu, Y., Loh, Han Tong, Tor, Shu Beng, 01 1900
When faced with challenging technical problems, R&D personnel often turn to technical papers to seek inspiration for a solution. Building a corpus of such papers, and making relevant papers easy to retrieve through user queries, is an area that has not been systematically dealt with. This is an attempt to build such a corpus for manufacturing R&D personnel. Manufacturing Corpus Version 1 (MCV1) is an archive of more than 1400 relevant manufacturing engineering papers published between 1998 and 2000. In this paper, the origins and motivation of building MCV1 are discussed. The innovative coding process, which is specially designed for manufacturing companies, is presented. All other relevant issues, such as coding policy, category codes and input documents, are explained. Finally, two quality indicators which integrate all concerns about coding quality are examined. / Singapore-MIT Alliance (SMA)
|