31 |
Apport de la linguistique de corpus à la lexicographie bilingue (français-arabe) : macrostructure et microstructure d'un dictionnaire de collocations / The contribution of corpus linguistics to bilingual French-Arabic lexicography : macrostructure and microstructure in collocation dictionaries
Al-Qaisi, Fu'ad 07 December 2015
L'objet de la présente étude est d’examiner l’apport de la linguistique de corpus à la lexicographie bilingue français-arabe. L’intérêt est porté tout particulièrement à la collocation. Ainsi, la quête commence dès la compilation du corpus jusqu'à l'intégration des collocations au lexique. Les notions fondamentales telles que la linguistique de corpus, le corpus et la collocation sont examinées. Ensuite, la recherche prend une tournure empirique qui se base sur un corpus. Pour pallier la non-disponibilité des outils de traitement de corpus en langue arabe, une approche a été élaborée au sein de cette étude, que nous avons baptisée stratégie de passerelle. L’idée est de partir d’un corpus parallèle (traduit) français-arabe. Ce corpus est constitué de la version française du journal Le Monde Diplomatique, ainsi que de sa traduction arabe. Le recours à un corpus parallèle a pour vocation de faciliter le repérage des phénomènes contrastifs. Les résultats obtenus seront vérifiés par la suite dans un corpus monolingue arabe (comparable) constitué de trois journaux, à savoir Alrai, Alayam, Algomhuria. Tout au long de cette partie, les résultats sont comparés dans un premier temps entre corpus et dictionnaires, dans un deuxième temps entre types de corpus (parallèle et comparable), et dans un troisième temps entre journaux du corpus comparable (Alrai, Alayam et Algomhuria). Ensuite, un certain nombre de collocations est soumis à un examen structurel et à un examen sémantique. Ces exploitations apportent non seulement des éléments sur l’environnement collocationnel entre langue et discours, mais également sur une éventuelle approche pour la prise en compte des collocations. Des interrogations légitimes naissent au fur et à mesure des exploitations sur la ressemblance entre les collocations des deux langues. Les résultats mettent en évidence des points comme l’enchaînement collocationnel, la synonymie collocationnelle et d’autres aspects. 
L’étude est couronnée par la conception d’un dictionnaire informatique de collocations. Il s’agit d’un dictionnaire actif bilingue, qui s’adresse à un public arabisant et aux traducteurs. / The aim of this study is to examine the contribution of corpus linguistics to bilingual French-Arabic lexicography. We focus particularly on collocations, following them from the compilation of a bilingual corpus through to their integration into the lexicon. Fundamental notions such as corpus linguistics, the corpus and collocation are examined. Our research then takes an empirical, corpus-based turn. To overcome the unavailability of corpus processing tools for Arabic, we developed an approach in this study that we call the footbridge strategy. The idea is to start from a French-Arabic parallel (translated) corpus. This corpus consists of the French version of Le Monde Diplomatique and its Arabic translation. Using a parallel corpus is intended to facilitate the identification of contrastive phenomena. The results obtained in the Arabic component of the translated corpus are subsequently checked in a monolingual (comparable) Arabic corpus consisting of three newspapers: Alrai, Alayyam, Algouhouria. Throughout this exploitation of the corpora, results are compared first between corpora and dictionaries, secondly between corpus types (parallel and comparable), and thirdly between the newspapers of the comparable corpus (Alrai, Alayyam, Algouhouria). A number of collocations are then subjected to structural and semantic examination. This examination sheds light not only on the collocational environment between language and discourse but also on a possible approach to integrating collocations into the dictionary. Legitimate questions gradually arise regarding the resemblance between collocations in French and Arabic. The results highlight phenomena such as collocational chains (clusters), collocational synonymy, etc. 
The study culminates in the design of a computer dictionary of collocations, i.e. an active bilingual dictionary aimed at Arabic language specialists and translators.
|
32 |
The Russian Verbal Prefix v- and Circumfix v- -sja in Space : A Contrastive Study between Russian and Swedish
Samuelsson, Thomas January 2017
Den här studien undersöker det ryska verbprefixet v(o)- och cirkumfixet v(o)- -sja i det konkreta fysiska rummet. Syftet med den kontrastiva studien är att undersöka och beskriva betydelser. Tvåspråkig data från en samtida rysk-svensk ordbok analyseras med Krongauz metod. En lista över verbaffixens betydelser byggs upp genom att jämföra lexikala betydelser och morfosyntaktiska konstruktioner för verben i båda språken. Resultatet visar att affixens betydelser kan delas in i följande kategorier: Spatiala rörelser in i ett slutet rum, Spatiala rörelser till en avgränsad yta, Spatiala rörelser mot en närhet, Vidhäftning och Platser i det fysiska rummet. / The present study investigates the Russian verbal prefix v(o)- and circumfix v(o)- -sja in concrete physical space. The aim of this contrastive study is to explore and describe their meanings. Bilingual data extracted from a contemporary Russian-Swedish dictionary are analysed using Krongauz’s method. A list of meanings of the Russian verbal affixes is built by comparing the lexical meanings and morphosyntactic structures of the verbs in both languages. The results show that the meanings of the affixes can be divided into the following categories: Spatial movements into an enclosed space, Spatial movements onto a delimited surface, Spatial movements towards a vicinity, Adhesion and Locations in physical space.
|
33 |
Lygiagrečių tekstynų kūrimo interaktyvios informacinės sistemos / Interactive information systems for parallel corpus development
Stankevičius, Kęstutis 23 July 2012
Šio magistro darbo užduotis – apžvelgti šiuo metu labiausiai naudojamas vartotojo sąsajas, kurios padeda žmogui sąveikauti su kompiuteriais ir kitais įrenginiais bei sąsajų architektūros būdus, kurie palengvina programų kūrimą. Taip pat išanalizuoti šiuo metu plačiausiai naudojamus metodus interneto paslaugoms įgyvendinti, kad būtų rastas sprendimas, kaip interaktyvios informacinės sistemos galėtų bendrauti tarpusavyje be apribojimų reikiamam funkcionalumui gauti pasirenkant geriausią būdą saugoti ir atvaizduoti reikiamus programos duomenis kuo paprastesniu ir lankstesniu būdu. Sukurti lygiagrečių tekstynų prototipą, kuris leistų matyti gautą rezultatą su galimybe kuo lengviau ir greičiau rasti bei koreguoti automatiškai sugeneruotus netikslumus, jei tokie yra, pritaikant sąsają, kuri būtų patogesnė ir reikalautų kuo mažiau darbo pastangų. Pasinaudojant prototipu atlikti tyrimą, kuris parodytų įvesties įrenginių naudojimo tendencijas. Darbą sudaro 8 dalys: įvadas, vartotojo sąsajų apžvalga, vartotojo sąsajos atskyrimas, interneto paslaugų analizė, XML duomenų bazės, vartotojo sąsajos kūrimas, išvados ir literatūros sąrašas. Darbo apimtis – 48 p. teksto be priedų, 25 paveikslai ir 2 lentelės. Atskirai pridedami 2 darbo priedai. / The purpose of this thesis is to review the most widely used user interfaces that help people interact with computers and other devices, and to begin exploring a new user interface paradigm that allows humans to interact naturally with the computer. Furthermore, it aims to analyze the methods most widely used today for implementing web services, in order to find a way for interactive information systems to communicate with each other without restrictions and obtain the required functionality, choosing the best way to store and display the relevant program data as simply and flexibly as possible. 
A further aim is to create a prototype of an interactive parallel corpus development environment that displays the generated parallel translation and makes it as easy as possible, with as little human labor as possible, to find and correct any automatically generated errors. Using the prototype, a study is performed that shows trends in the use of different interface input devices. The work consists of 8 parts: introduction, overview of user interfaces, user interface separation, web services analysis, XML databases, user interface development, conclusions and references. The thesis comprises 48 pages of text excluding appendices, 25 figures and 2 tables. Two appendices are attached separately.
|
34 |
Induction de lexiques bilingues à partir de corpus comparables et parallèles
Jakubina, Laurent 07 1900
No description available.
|
35 |
Častice v slovenčine a v češtine. Systémová a korpusovolingvistická analýza / Particles in Slovak and Czech. System and Corpus Analysis
Šimková, Mária January 2015
Particles, the youngest word class, aroused great interest and discussion when they entered grammatical description; in some countries (e.g. in Germany) they have been the object of systematic research. However, many other languages still lack a comprehensive description of particles as a word class in its own right, which also makes them suitable material for comparative research. Differences in the functioning and theoretical treatment of particles are found in typologically different languages, but they can also emerge in related languages, even in the case of Slovak and Czech. Lexicographical and grammatical descriptions of these languages provide only small sets of particles (roughly 400 in Slovak, over 200 in Czech), which authors usually divide into small groups and further into even smaller subgroups. Owing to their specific features and to their paradigmatic and syntagmatic relations with other language or speech phenomena, even a single particle, a pair of particles or a narrowly defined group of particles can become the object of an individual research project. Step by step, our thesis presents the development of attitudes towards particles as an independent word class in general and in Russian linguistics in particular, grammar descriptions of particles in Slovak, Czech and other...
|
36 |
Ryska gerundier i översättning till och från svenska : Implicita och explicita betydelser
Mellquist, Simone January 2017
Swedish lacks gerunds and is therefore suitable for studying implicit meanings of Russian gerunds that are forced to become explicit in Swedish translations. The present contrastive parallel corpus study explores translation correspondences in both directions. It is shown that constructions with finite verbs of perfective aspect followed by imperfective gerunds largely correspond to Swedish absolute with-constructions (i.e. Swedish med-constructions). Another finding is the insertion of extra gerunds when Swedish locative constructions are translated. A classification of Swedish explicit time markers corresponding to converb constructions is structured according to time relations (taxis relations): perfective aspect converbs show a clear correspondence to anterior markers, and imperfective converbs correlate with simultaneity markers. Contextual secondary meanings such as means, purpose, cause and consequence are analyzed, and various structures are found.
|
37 |
Méthodes de veille textométrique multilingue appliquées à des corpus de l’environnement et de l’énergie : « Restitution, prévision et anticipation d’événements par poly-résonances croisées » / Textometric Multilingual Information Monitoring Methods Applied to Energy & Environment Corpora : "Restitution, Forecasting and Anticipation of Events by Cross Poly-resonance"
Shen, Lionel 21 October 2016
Cette thèse propose une série de méthodes de veille textométrique multilingue appliquées à des corpus thématiques. Pour constituer ce travail, deux types de corpus sont mobilisés : un corpus comparable et un corpus parallèle, composés de données textuelles extraites des discours de presse, ainsi que ceux des ONG. Les informations récupérées proviennent de trois mondes en trois langues différentes : français, anglais et chinois. La construction de ces deux corpus s’effectue autour de deux thèmes d’actualité ayant pour objet, l’environnement et l’énergie, avec une attention particulière sur trois notions : les énergies, le nucléaire et l’EPR. Après un bref rappel de l’état de l’art en intelligence économique, veille et textométrie, nous avons exposé les deux sujets retenus, les technicités morphosyntaxiques des trois langues dans les contextes nationaux et internationaux. Successivement, les caractéristiques globales, les convergences et les particularités de ces corpus ont été mises en évidence. Les dépouillements et les analyses qualitatives et quantitatives des résultats obtenus sont réalisés à l’aide des outils de la textométrie, notamment grâce aux analyses factorielles des correspondances, réseaux cooccurrentiels et poly-cooccurrentiels, spécificités du modèle hypergéométrique, segments répétés ou encore à la carte des sections. Ensuite, la veille bi-textuelle bilingue a été appliquée sur les trois mêmes concepts dans l’objectif de mettre en évidence les modes selon lesquels les corpus multilingues à caractère comparé et parallèle se complètent dans un processus de veille plurilingue, de restitution, de prévision et d’anticipation. Nous concluons notre recherche en proposant une méthode analytique par Objets-Traits-Entrées (OTE). / This thesis proposes a series of textometric multilingual information monitoring methods applied to thematic corpora (textometry is also called textual statistics or text data analysis). 
Two types of corpora are used in this work: a comparable corpus and a parallel corpus, whose textual data are extracted from press and NGO discourse. The information was retrieved from three parts of the world in three different languages: English, French and Chinese. The two corpora were built around two topical issues, the environment and energy, with a focus on three concepts: energy, nuclear power and the EPR (European Pressurized Reactor or Evolutionary Power Reactor). After a brief review of the state of the art in business intelligence, information monitoring and textometry, we set out the two chosen subjects (the environment and energy) and then the morphosyntactic features of the three languages in national and international contexts. The overall characteristics, similarities and peculiarities of these corpora are then highlighted in turn. The counts and the qualitative and quantitative analyses of the results were carried out using textometric tools, including correspondence factor analysis, co-occurrence and poly-co-occurrence networks, specificities based on the hypergeometric model, repeated segments and section maps. Thereafter, bilingual bitextual information monitoring was applied to the same three concepts with the aim of showing how the comparable corpus and the parallel corpus complement each other in a process of multilingual information monitoring, restitution, forecasting and anticipation. We conclude our research by proposing an analytical method called Objects-Features-Opening (OFO).
|
38 |
Skoner en kleiner vertaalgeheues
Wolff, Friedel 10 1900
Rekenaars kan ’n nuttige rol speel in vertaling. Twee benaderings
is vertaalgeheuestelsels en masjienvertaalstelsels. By
hierdie twee tegnologieë word ’n vertaalgeheue gebruik—’n
tweetalige versameling vorige vertalings. Hierdie proefskrif
bied metodes aan om die kwaliteit van ’n vertaalgeheue te verbeter.
’n Masjienleerbenadering word gevolg om foutiewe inskrywings
in ’n vertaalgeheue te identifiseer. ’n Verskeidenheid leerkenmerke
in drie kategorieë word aangebied: kenmerke wat
verband hou met tekslengte, kenmerke wat deur kwaliteittoetsers
soos vertaaltoetsers, ’n speltoetser en ’n grammatikatoetser
bereken word, asook statistiese kenmerke wat met behulp van
eksterne data bereken word.
Die evaluasie van vertaalgeheuestelsels is nog nie gestandaardiseer
nie. In hierdie proefskrif word ’n verskeidenheid
probleme met bestaande evaluasiemetodes uitgewys, en ’n verbeterde
evaluasiemetode word ontwikkel.
Deur die foutiewe inskrywings uit ’n vertaalgeheue te verwyder,
is ’n kleiner, skoner vertaalgeheue beskikbaar vir toepassings.
Eksperimente dui aan dat so ’n vertaalgeheue beter
prestasie behaal in ’n vertaalgeheuestelsel. As ondersteunende
bewys vir die waarde van ’n skoner vertaalgeheue word ’n
verbetering ook aangedui by die opleiding van ’n masjienvertaalstelsel. / Computers can play a useful role in translation. Two approaches
are translation memory systems and machine translation
systems. With these two technologies a translation memory
is used, that is, a bilingual collection of previous translations.
This thesis presents methods to improve the quality of a translation
memory.
A machine learning approach is followed to identify incorrect
entries in a translation memory. A variety of learning features
in three categories are presented: features associated with text
length, features calculated by quality checkers such as translation
checkers, a spell checker and a grammar checker, as well
as statistical features computed with the help of external data.
The evaluation of translation memory systems is not yet standardised.
This thesis points out a number of problems with existing
evaluation methods, and an improved evaluation method
is developed.
By removing the incorrect entries in a translation memory, a
smaller, cleaner translation memory is available to applications.
Experiments demonstrate that such a translation memory results
in better performance in a translation memory system.
As supporting evidence for the value of a cleaner translation
memory, an improvement is also achieved in training a machine
translation system. / School of Computing / Ph. D. (Rekenaarwetenskap)
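The text-length category of features mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the actual features, thresholds or classifier, so everything below is an assumption made for illustration: the names `TMEntry`, `length_features`, `looks_suspicious` and `clean` are hypothetical, and a fixed ratio threshold stands in for the machine learning model that the thesis trains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TMEntry:
    """One translation memory entry: a source segment and its stored translation."""
    source: str
    target: str


def length_features(entry: TMEntry) -> dict[str, float]:
    """Compute simple text-length features (feature names are hypothetical)."""
    src_chars, tgt_chars = len(entry.source), len(entry.target)
    src_words, tgt_words = len(entry.source.split()), len(entry.target.split())
    return {
        # Ratios fall back to 0.0 when the source segment is empty.
        "char_ratio": tgt_chars / src_chars if src_chars else 0.0,
        "word_ratio": tgt_words / src_words if src_words else 0.0,
        "char_diff": float(abs(tgt_chars - src_chars)),
    }


def looks_suspicious(entry: TMEntry, low: float = 0.5, high: float = 2.0) -> bool:
    """Flag an entry whose character-length ratio lies outside [low, high].

    The thresholds are illustrative assumptions, not values from the thesis,
    and this rule is only a stand-in for a trained classifier.
    """
    ratio = length_features(entry)["char_ratio"]
    return not (low <= ratio <= high)


def clean(memory: list[TMEntry]) -> list[TMEntry]:
    """Return a smaller memory without the flagged entries."""
    return [entry for entry in memory if not looks_suspicious(entry)]


if __name__ == "__main__":
    memory = [
        TMEntry("Die kat sit op die mat.", "The cat sits on the mat."),
        TMEntry("Die kat sit op die mat.", "OK"),  # far too short to be a translation
    ]
    for entry in clean(memory):
        print(entry.source, "->", entry.target)
```

In a setup closer to the one the abstract describes, the feature dictionary would be combined with checker-based and statistical features and passed to a learned model rather than compared against hand-picked thresholds.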
|
39 |
Ryska diminutiv i svensk översättning : En parallellkorpusstudie
Bergstedt, Pontus January 2023
Hur språkspecifika ord ska översättas mellan två språk är inte alltid självklart. Därför har denna uppsats i syfte att undersöka hur ryska diminutiva vanliga substantiv översätts till svenska. Svårigheten vid översättning uppstår bland annat av att ryska är ett språk rikt på diminutiv medan användning av diminutiv i svenska är sällsynt. På grund av denna skillnad är det inte givet att det finns en lexikal motsvarighet i svenska för ryska diminutiv och översättningen måste anpassas om den ska få med den diminutiva innebörden. Genom att undersöka hur ryska diminutiv brukar översättas till svenska, och om ryska diminutiv uppstår vid översättning från svenska, kan den rysk-svenska språkspecificiteten för diminutiv kartläggas. Med hjälp av en parallellkorpus kan genuina texter och deras översättningar undersökas i en kvalitativ analys för att kategorisera de vanligaste översättningsstrategierna och huruvida den diminutiva innebörden bevaras vid översättning. Denna uppsats undersökning visar att ryska diminutiv av vanliga substantiv är språkspecifika gentemot svenska. Detta då innebörden oftast går förlorad vid översättning till svenska, medan diminutiv ofta uppstår vid översättning till ryska, trots att den svenska källtexten helt saknar diminutiva ord. / Каким образом лингвоспецифичные слова должны быть переведены между двумя языками, не всегда очевидно. Цель этой диссертации исследовать, как русские уменьшительно-ласкательные существительные переводятся на шведский язык. Трудности при переводе возникают, помимо прочего, из-за того, что в русском языке диминутивы широко распространены, тогда как в шведском языке они используются редко. По причине этих различий нельзя полноправно утверждать, что в шведском языке существует лексический эквивалент русских диминутивов, следовательно, любой перевод должен быть адаптирован, если он включает уменьшительно-ласкательные слова. 
Путем анализа того, как русские диминутивы обычно переводятся на шведский язык и как возникают русские уменьшительные формы при переводе со шведского, можно выявить русско-шведскую лингвистическую специфику уменьшительно-ласкательных. С помощью параллельного корпуса в этой работе анализируются самые распространенные стратегии перевода. Исследование в данном дипломном сочинении показывает, что русские уменьшительно-ласкательные формы имён нарицательных являются лингвоспецифичными, поскольку значение чаще всего теряется. При переводе со шведского на русский часто появляются диминутивы, хотя в шведских текстах они могут полностью отсутствовать. / How language-specific words should be translated between two languages is not always obvious. This thesis therefore aims to examine how Russian diminutive common nouns are translated into Swedish. The difficulties of translation arise, among other things, from the fact that Russian is rich in diminutives, whereas diminutives are rare in Swedish. Because of this difference, it is not guaranteed that a Russian diminutive has a lexical counterpart in Swedish, and the translation must be adapted if it is to retain the diminutive meaning. By examining how Russian diminutives are usually translated into Swedish, and whether Russian diminutives emerge in translation from Swedish, the Russian-Swedish language specificity of diminutives can be mapped. With the help of a parallel corpus, original texts and their translations are examined in a qualitative analysis to categorise the most common translation strategies and to establish whether the diminutive meaning is retained in translation. The findings presented in this thesis show that Russian diminutives of common nouns are language-specific relative to Swedish, as their meaning is usually lost in translation into Swedish. In translation from Swedish into Russian, diminutives often emerge even though the Swedish source text contains no diminutive words at all.
|
40 |
As equivalências no português e no italiano de verbos suecos com prefixos de origem germânica num corpus paralelo de textos escritos / The equivalences of Swedish verbs with prefixes of a Germanic origin in Portuguese and Italian in a parallel written corpus
Cuofano, Letizia January 2011
Os prefixos germânicos de alguns verbos suecos serão comparados numa análise contrastiva com as relativas equivalências em português e em italiano num corpus paralelo escrito composto por um romance de língua sueca, um de língua portuguesa e um de língua italiana e pelas suas respectivas traduções. As funções desenvolvidas pelos prefixos germânicos dos verbos suecos analisados serão examinadas e depois confrontadas com as relativas equivalências, com o resultado que também nas duas línguas românicas relevam-se, de maneira bastante constante, procedimentos gramaticais parecidos aos desenvolvidos pelos prefixos germânicos. / The Germanic prefixes found in some Swedish verbs are compared, in a contrastive analysis, with their equivalents in Portuguese and Italian in a written parallel corpus made up of one Swedish-language novel, one Portuguese-language novel and one Italian-language novel, together with their respective translations. The functions performed by the Germanic prefixes of the analysed Swedish verbs are examined and then compared with their equivalents, with the result that the two Romance languages also show, fairly consistently, grammatical processes similar to those performed by the Germanic prefixes. / I prefissi germanici di alcuni verbi svedesi saranno comparati in un'analisi contrastiva con le relative equivalenze in portoghese e in italiano in un corpus parallelo scritto composto da un romanzo di lingua svedese, uno di lingua portoghese e uno di lingua italiana e dalle rispettive traduzioni. Le funzioni svolte dai prefissi germanici dei verbi svedesi analizzati saranno esaminate e poi confrontate con le relative equivalenze, con il risultato che anche nelle due lingue romanze si riscontrano in maniera abbastanza costante processi grammaticali simili a quelli svolti dai prefissi germanici. 
/ De germanska prefix som återfinns i vissa svenska verb kommer att jämföras med sina motsvarigheter på portugisiska och italienska. Detta görs med hjälp av en skriven korpus bestående av en roman ursprungligen skriven på svenska, en skriven på portugisiska och en skriven på italienska samt översättningar av dessa romaner till de två andra språken. Funktionen hos de svenska verben med germanska prefix kommer att analyseras och sedan jämföras med verbens motsvarigheter. Resultatet av analysen visar att det är möjligt att finna systematiskt återkommande grammatiska processer i de romanska språken, som liknar de som förekommer i samband med de germanska prefixen på svenska.
|