  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

Entrevistas de emprego em inglês: uma análise multidimensional / Job interviews in English: a multidimensional analysis

Diegues, Ulysses Camargo Corrêa 23 August 2018
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In a scenario in which the process of selecting candidates for a job is becoming increasingly demanding (JOSEPH, 2013), the study of job interviews in English is of great importance. However, it has received little attention in language studies. The purpose of this research is to compare the English job interview register with other English language registers along the five dimensions of variation identified by Biber (1988 et seq.) through Multidimensional Analysis (MDA). To do so, this research draws on Corpus Linguistics (CL), which deals with the collection and exploitation of corpora for the purpose of researching a language or part of it (BERBER SARDINHA, 2000; 2004). The study corpus was the Job Interview Corpus (JIC), composed of 40 real job interviews conducted in Germany with native speakers from Australia, Ireland, the United Kingdom and the United States, totaling approximately 50,000 words. To enable the MDA, the JIC was grammatically tagged with the Biber Tagger and then processed by the Biber Tag Count, which calculated the frequency of the 67 linguistic variables considered in this study. The MDA results showed how the English job interviews in the JIC resemble or differ from the other English language registers along the five dimensions of variation (BIBER, 1988 et seq.). Since there is no precedent in CL for a multidimensional analysis of English job interviews, this research intends to fill this gap in the academic field.
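The abstract describes counting linguistic variables and then placing a register on the dimensions of variation. As a minimal sketch of how a dimension score is commonly computed in Biber-style MDA (counts normalized per 1,000 words, standardized against reference means and standard deviations, then summed with positive-loading features added and negative-loading features subtracted), consider the following. The function names, feature names and all numbers are hypothetical illustrations, not values from the thesis or from the Biber Tag Count.

```python
def rate_per_thousand(count, n_words):
    """Normalize a raw feature count to a rate per 1,000 words."""
    return count * 1000.0 / n_words


def dimension_score(rates, ref_stats, positive, negative):
    """Sum of z-scores of positive-loading features minus the sum of
    z-scores of negative-loading features.

    rates     -- {feature: rate per 1,000 words} for one text
    ref_stats -- {feature: (mean, std_dev)} from a reference corpus
    """
    def z(feature):
        mean, std_dev = ref_stats[feature]
        return (rates[feature] - mean) / std_dev

    return sum(z(f) for f in positive) - sum(z(f) for f in negative)


# Hypothetical example: one 2,000-word interview and two features.
rates = {
    "private_verbs": rate_per_thousand(60, 2000),   # 30.0 per 1,000 words
    "nouns": rate_per_thousand(300, 2000),          # 150.0 per 1,000 words
}
ref_stats = {"private_verbs": (18.0, 10.0), "nouns": (180.0, 30.0)}  # assumed
score = dimension_score(rates, ref_stats,
                        positive=["private_verbs"], negative=["nouns"])
```

A register's position on a dimension is then usually reported as the mean of the scores of its texts, which is what allows a new register such as job interviews to be compared with previously described registers.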
152

O que e como escrevemos na web: um estudo multidimensional de variação de registro em língua inglesa / What and how we write on the web: a multidimensional study of register variation in English

Mayer, Cristina 31 August 2018
Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / The main goal of this research was to examine Web text varieties in English, specifically social networks and consumer-generated content such as comments, reviews and complaints, through the Multidimensional (MD) approach to register variation analysis, leading to a set of dimensions of variation across Web registers. Web registers have been the object of several MD investigations (BERBER SARDINHA, 2014; BIBER et al., 2015; BIBER; EGBERT, 2015, 2016); however, these studies have not focused on social networks and consumer-generated content. To fill this gap, a corpus of 15 of these registers, the Corpus of User Generated Content (CoUGC), was designed and compiled, and three separate MD analyses were conducted on it, together with a canonical correlation analysis. The first was an additive MD analysis, in which the registers were mapped onto Biber's (1988) dimensions of variation of English. The second was a ‘mainstream’ functional MD analysis, in which the co-occurrence of lexicogrammatical variables was analyzed and 4 dimensions were interpreted, representing the functional parameters underlying the variation across the registers. The third followed the traditional MD procedure but was conducted with lexical variables (BERBER SARDINHA, 2014; 2017; in press). This analysis revealed 5 thematic dimensions, which reflected semantic groupings. A canonical correlation analysis was then run to examine the relationship between the functional and lexical dimensions. Together, these analyses made it possible to study how Web users use language.
153

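A incidência do princípio idiomático e do princípio da escolha aberta na produção escrita de alunos brasileiros de inglês como língua estrangeira / The incidence of the idiom principle and the open-choice principle in the written production of Brazilian students of English as a foreign language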

Gil, Cristina Borges 16 March 2017
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The aim of this research is to find evidence of both the idiom principle and the open-choice principle in the written production of Brazilian students of English as a foreign language. The theoretical basis of this study is Corpus Linguistics, an area that supports the research and study of language in use and that is based on a view of language as a probabilistic system. Sinclair (1991, 2004) sees language as a probabilistic system with two complementary principles: the idiom principle and the open-choice principle. The idiom principle concerns the use of sequences of words that are at least partly prefabricated and that are appropriate in a given context. The open-choice principle concerns sequences of words that follow a slot-and-filler model, combined according to grammatical rules, so that the only restriction on lexical choice is grammaticalness. The methodology consisted of collecting a corpus of texts written by Brazilian students of English as a foreign language and then analyzing all the sequences of words used in each text of the corpus, comparing them with a reference corpus representative of English. This procedure, known as ‘collocation tracking’, was introduced by Berber Sardinha (2014a). The findings indicate that the two principles coexist in the texts, as proposed by Sinclair (1991). In addition, they reveal nuances of the two principles in the learners' written production, which we called idiom principle I and II, and open-choice principle I and II. The study aims to make an original contribution to Corpus Linguistics and to the study of learner corpora, as it carried out a corpus-based descriptive investigation of learner language and observed variant forms of the principles that are not found in texts written by educated native speakers.
154

Étude comparative des pronoms démonstratifs neutres anglais et français à l'oral : référence indexicale, structure du discours et formalisation en grammaire notionnelle dépendancielle / A comparative study of English and French neutral demonstrative pronouns in face-to-face conversations : indexical reference, discourse structure and Notional Dependancy Grammar

Buscail, Laurie 04 October 2013
This thesis examines, from a comparative perspective, the indexical functioning of English and French demonstrative pronouns, namely this, that and it on the one hand, and celui-ci/là, ceci, cela and ça on the other, with particular emphasis on the neuter pronoun ça. The referential and discursive phenomena observed are traced back to a number of syntactic and semantic features characterizing each demonstrative item, which are then formalized within the framework of Notional Dependency Grammar. As all the key examples of this, that, it and ça are taken from spontaneous face-to-face conversations recorded following the protocol of the PAC and PFC projects, the study also discusses the potential advantages and limits of large spoken corpora for research in theoretical linguistics.
155

Escrita científica em português por hispano falantes: recursos linguísticos-computacionais baseados em métodos de alinhamento de textos paralelos / Scientific writing in portuguese by hispanic speaking: linguistic-computational resources based on alignments methods of parallel text

Torres, Lianet Sepúlveda 24 September 2015
Over the last decade, the number of foreigners interested in learning Portuguese has increased as a result of Brazilian economic growth and the greater presence of multinationals in Brazil. This is reflected in the growing number of candidates enrolled in the CELPE-Bras Portuguese proficiency exam and of foreign students entering Brazilian universities. Most of these students are Spanish speakers and need to write their academic texts, including dissertations and theses, in Portuguese. The closeness of Portuguese and Spanish is both a positive element and an obstacle, because it conceals differences and hinders learners from mastering Portuguese, so that interference from Spanish persists in their spoken and written Portuguese. Most of this interference occurs at the lexical level. One way to deal with problems in learner texts is to use computational post-editing tools and tools that support the writing process. However, very few resources and tools are available to support writing in Portuguese as a foreign language, unlike the situation for English. This research proposed the creation of lexical-level writing support resources and tools as a first step towards improving the linguistic quality of texts written in Portuguese by native speakers of Spanish. Corpus linguistics was used as the methodology for the analysis of learner errors. The writing support tools use bilingual lexicons compiled by means of translation techniques based on the alignment of parallel corpora. Given the insufficient number of previously annotated errors to support automatic error detection, this research proposed methods based on language models and on the artificial generation of errors. Artificial error generation proved to be an efficient method for predicting learners' lexical errors. The contributions of the methodology, which uses machine translation to generate writing aids for closely related languages on the basis of lexical errors extracted from learner corpora, are: (i) from a theoretical point of view, a survey and quantification of the main problems caused by traces of Spanish left in academic texts written in Portuguese by native speakers of Spanish; (ii) from the point of view of the automatic generation of linguistic resources, bilingual lexicons of cognates and false cognates, a bilingual lexicon of discourse markers, a lexicon of formulaic expressions that appear in scientific texts, and a bilingual lexicon of verbs related to scientific research in Portuguese; and (iii) from the point of view of support for scientific writing, the design and evaluation of aids to support scientific writing in Portuguese by native speakers of Spanish.
156

Analyse des marqueurs de relations conceptuelles en corpus spécialisé : recensement, évaluation et caractérisation en fonction du domaine et du genre textuel / Analysis of markers of conceptual relation in specialized corpora : identification, evaluation, and description based on domain and text genre

Lefeuvre, Luce 05 September 2017
The value of markers of conceptual relations for building terminological resources has frequently been emphasized, because they make it possible to move from a triple identified in a corpus as “Term1 – Marker – Term2” to a triple interpreted as “Term1 – Relation – Term2”, which allows knowledge to be represented in relational form. The transition from one triple to the other nevertheless raises the question of how stable this link is, independently of any particular corpus. In this thesis, we study how the functioning of candidate relation markers varies with domain and text genre. To this end, we compiled the list of French markers of hypernymy, meronymy and causal relations and analyzed every occurrence of these candidate markers in a corpus covering two domains (volcanology and breast cancer) and two text genres (scientific and popular science). The systematic description of the contexts containing a candidate marker allowed us to measure the precision of each candidate marker, that is, its capacity to indicate the expected relation. Our analyses demonstrate the relevance of taking domain and text genre into account when describing the functioning of conceptual relation markers.
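The evaluation described above asks, for each context containing a candidate marker, whether the marker really indicates the expected relation. A minimal sketch of how such a per-marker precision could be tabulated by domain and genre is given below. It assumes manually annotated contexts; the function name, the tuple layout and the example annotations are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def marker_precision(annotations):
    """Precision of each candidate marker per (marker, domain, genre).

    annotations -- iterable of (marker, domain, genre, expresses_relation),
                   where expresses_relation is True when the annotated
                   context really conveys the expected relation.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for marker, domain, genre, expresses_relation in annotations:
        key = (marker, domain, genre)
        totals[key] += 1
        hits[key] += int(expresses_relation)
    # Precision = contexts expressing the relation / all contexts observed.
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical annotated contexts for one candidate hypernymy marker.
annotated = [
    ("est un", "volcanology", "scientific", True),
    ("est un", "volcanology", "scientific", True),
    ("est un", "volcanology", "scientific", False),
    ("est un", "breast cancer", "popular science", True),
    ("est un", "breast cancer", "popular science", False),
]
precision = marker_precision(annotated)
```

Keeping the domain and genre in the key is what makes it possible to see whether the same marker behaves differently across sub-corpora, which is the kind of variation the thesis investigates.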
157

Coarticulation C-à-V en français : interaction avec le type de voyelle, la position prosodique et le style de parole / C-to-V coarticulation in French : interaction with vowel type, prosodic position and speech style

Guitard-Ivent, Fanny 12 September 2018
This dissertation examines C-to-V coarticulation in French and its interaction with other sources of variation in order to better understand what modulates it and what governs variation in speech. Based on data from large speech corpora, we tested how C-to-V coarticulation depends on: 1) the articulatory properties of the consonants and vowels involved, using 18.5k vowels /i, e, ɛ, a, x, u, o, ɔ/ (/x/=/ø, œ, ə/) in ALVeolar, UVular and VELar contexts; 2) the prosodic position occupied by the vowels, comparing the degree of coarticulation in 17k CV and VC sequences, V=/i, e, a, ɔ/ and C=ALV|UV, in intonational phrase initial position with that of similar sequences in word-internal position; and 3) speech style, analyzing the degree of coarticulation in 22k CV and VC sequences, V=/i, E, a, u, ɔ/ (/E/=/e, ɛ/) and C=ALV|UV, from journalistic and conversational speech. The thesis shows that coarticulation, in addition to depending on the articulatory characteristics of the segments, is also modulated by linguistic factors related to the prosodic organization of the message and by communicational factors that depend on the communication situation. Indeed, vowels are more resistant to coarticulation in strong prosodic positions, as they are in formal speech. However, some results suggest that the modulations of coarticulation by prosodic position and by speech style serve different linguistic functions, whose implications for speech variation are discussed. Finally, on the basis of the differences observed between vowels, we offer a reflection on sound changes related to the universal preference for fronting close back vowels.
158

語料庫及心理語言學為基礎之研究: 以[Do/Make+Noun] 為例 / Investigating [Do/Make+Noun] constructions: a study based on corpora and psycholinguistic experiments

謝怡箴, Hsieh, Yi Chen Unknown Date
Learners of English in Taiwan have generally acquired a sizeable vocabulary before they enter college. However, they still differ from native speakers in producing common collocational patterns such as [do/make+noun]. To observe the similarities and differences between [do+noun] and [make+noun], and how native speakers and EFL learners use them, this thesis analyzes their senses using two corpora, a Taiwan-based learner corpus and the British National Corpus. The results show that learners differ from native speakers mainly in their use of [make+noun]. Semantically, the sense most frequently used by native speakers is ‘to perform, to carry out’, as in make a speech, make a fine judge, etc., whereas Taiwanese learners prefer ‘to create’, as in make a sushi, make a robot, etc. With respect to the noun following make, native speakers tend to choose abstract nouns, such as comment, progress, etc., whereas learners prefer concrete nouns, such as robot, sushi, etc. In addition to the corpus analysis, a psycholinguistic experiment (a picture-elicitation task in which participants describe actions unfamiliar to them) examines whether learners rely more on patterns with general meanings, such as [do/make+noun], in unfamiliar situations. The results show that learners use many [do+noun] patterns with a general meaning (e.g., do exercise or do sports), whereas native speakers prefer verbs or collocations with a more precise meaning (e.g., sit-up or do sit-up). Almost no [make+noun] constructions are found in the native speakers' language, whereas learners produce numerous [make+noun] constructions, mostly causative uses of make. Based on this analysis, the thesis proposes a model to account for the similarities and differences in how native speakers and learners use [do/make+noun].
159

Effective Techniques for Indonesian Text Retrieval

Asian, Jelita, jelitayang@gmail.com January 2007
The Web is a vast repository of data, and information on almost any subject can be found with the aid of search engines. Although the Web is international, most research on information retrieval has focused on languages such as English and Chinese. In this thesis, we investigate information retrieval techniques for Indonesian. Although Indonesia is the fourth most populous country in the world, little attention has been given to searching Indonesian documents. Stemming is the process of reducing morphological variants of a word to a common stem form. Previous research has shown that stemming is language-dependent. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which gives the best performance. We empirically explore these algorithms, showing that even the best algorithm still has scope for improvement. We propose novel extensions to this algorithm and develop a new Indonesian stemmer, and show that these can improve stemming correctness by up to three percentage points; our approach makes fewer than one error in thirty-eight words. We propose a range of techniques to enhance the performance of Indonesian information retrieval. These techniques include stopping, sub-word tokenisation, identification of proper nouns, and modifications to existing similarity functions. Our experiments show that many of these techniques can increase retrieval performance, with the highest increase achieved when we use grams of size five to tokenise words. We also present an effective method for identifying the language of a document; this allows various information retrieval techniques to be applied selectively depending on the language of target documents. We also address the problem of automatic creation of parallel corpora (collections of documents that are direct translations of each other), which are essential for cross-lingual information retrieval tasks. 
Well-curated parallel corpora are rare, and for many languages, such as Indonesian, do not exist at all. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. Unlike most current approaches, which consider only the context and structure of the documents, our approach is based on the document content itself. Our algorithms do not make any prior assumptions about the documents, and are based on the Needleman-Wunsch algorithm for global alignment of protein sequences. Our approach works well in identifying Indonesian-English parallel documents, especially when no translation is performed. It can increase the separation value, a measure to discriminate good matches of parallel documents from bad matches, by approximately ten percentage points. We also investigate the applicability of our identification algorithms for other languages that use the Latin alphabet. Our experiments show that, with minor modifications, our alignment methods are effective for English-French, English-German, and French-German corpora, especially when the documents are not translated. Our technique can increase the separation value for the European corpus by up to twenty-eight percentage points. Together, these results provide a substantial advance in understanding techniques that can be applied for effective Indonesian text retrieval.
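For readers unfamiliar with the Needleman-Wunsch algorithm named in the abstract above, the following is a minimal sketch of its global-alignment scoring step. It is not the thesis's implementation: the function name `needleman_wunsch_score`, the scoring values (`match=1`, `mismatch=-1`, `gap=-1`), and the idea of feeding it lists of word tokens are all assumptions made for illustration. The abstract does not say how documents are represented or scored.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    a and b may be strings or lists of tokens. The scoring values are
    illustrative assumptions, not taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per item.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # gap inserted in b
            left = score[i][j - 1] + gap            # gap inserted in a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    # Hypothetical token sequences standing in for two documents.
    doc_a = ["jakarta", "2007", "rmit"]
    doc_b = ["jakarta", "in", "2007", "rmit"]
    print(needleman_wunsch_score(doc_a, doc_b))
```

Under these assumed scores, a higher result means the two sequences share more items in the same order. One plausible reading of "no translation is performed" is that untranslated items shared by both documents (names, numbers) are what gets aligned, but that interpretation is also an assumption.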
160

Posture and Space in Virtual Characters : application to Ambient Interaction and Affective Interaction

Tan, Ning 31 January 2012 (has links) (PDF)
Multimodal communication is key to smooth interactions between people. However, multimodality remains limited in current human-computer interfaces. For example, posture is less explored than other modalities, such as speech and facial expressions. The postural expressions of others have a huge impact on how we situate and interpret an interaction. Devices and interfaces for representing full-body interaction are available (e.g., Kinect and full-body avatars), but systems still lack computational models relating these modalities to spatial and emotional communicative functions. The goal of this thesis is to lay the foundation for computational models that enable better use of posture in human-computer interaction. This necessitates addressing several research questions: How can we symbolically represent postures used in interpersonal communication? How can these representations inform the design of virtual characters' postural expressions? What are the requirements of a model of postural interaction for application to interactive virtual characters? How can this model be applied in different spatial and social contexts? In our approach, we start with the manual annotation of video corpora featuring postural expressions. We define a coding scheme for the manual annotation of posture at several levels of abstraction and for different body parts. These representations were used for analyzing the spatial and temporal relations between postures displayed by two human interlocutors during spontaneous conversations. Next, representations were used to inform the design of postural expressions displayed by virtual characters. For studying postural expressions, we selected one promising, relevant component of emotions: the action tendency. Animations were designed featuring action tendencies in a female character. 
These animations were used as a social context in perception tests. Finally, postural expressions were designed for a virtual character used in an ambient interaction system. These postural and spatial behaviors were used to help users locate real objects in an intelligent room (iRoom). The impact of these bodily expressions on the user's performance, subjective perception and behavior was evaluated in a user study. Further studies of bodily interaction are called for involving, for example, motion-capture techniques, integration with other spatial modalities such as gaze, and consideration of individual differences in bodily interaction.
