281 |
Zero-Shot Cross-Lingual Domain Adaptation for Neural Machine Translation : Exploring The Interplay Between Language And Domain Transferability
Shahnazaryan, Lia January 2024
Within the field of neural machine translation (NMT), transfer learning and domain adaptation techniques have emerged as central solutions to overcome the data scarcity challenges faced by low-resource languages and specialized domains. This thesis explores the potential of zero-shot cross-lingual domain adaptation, which integrates principles of transfer learning across languages and domain adaptation. By fine-tuning a multilingual pre-trained NMT model on domain-specific data from one language pair, the aim is to capture domain-specific knowledge and transfer it to target languages within the same domain, enabling effective zero-shot cross-lingual domain transfer. This study conducts a series of comprehensive experiments across both specialized and mixed domains to explore the feasibility and influencing factors of zero-shot cross-lingual domain adaptation. The results indicate that fine-tuned models generally outperform the pre-trained baseline in specialized domains and most target languages. However, the extent of improvement depends on the linguistic complexity of the domain, as well as the transferability potential driven by the linguistic similarity between the pivot and target languages. Additionally, the study examines zero-shot cross-lingual cross-domain transfer, where models fine-tuned on mixed domains are evaluated on specialized domains. The results reveal that while cross-domain transfer is feasible, its effectiveness depends on the characteristics of the pivot and target domains, with domains exhibiting more consistent language being more responsive to cross-domain transfer. By examining the interplay between language-specific and domain-specific factors, the research explores the dynamics influencing zero-shot cross-lingual domain adaptation, highlighting the significant role played by both linguistic relatedness and domain characteristics in determining the transferability potential.
|
282 |
Statistical approaches for natural language modelling and monotone statistical machine translation
Andrés Ferrer, Jesús 11 February 2010
This thesis brings together several contributions to statistical pattern recognition and, more specifically, to several natural language processing tasks. Several well-known statistical techniques are revisited in this thesis, namely parameter estimation, loss function design, and statistical modelling. These techniques are applied to several natural language processing tasks such as document classification, natural language modelling, and statistical machine translation.
Regarding parameter estimation, we address the smoothing problem by proposing a new technique, constrained-domain maximum likelihood estimation (CDMLE). The CDMLE technique avoids the need for the smoothing stage, which causes the loss of the properties of the maximum likelihood estimator. This technique is applied to document classification with the Naive Bayes classifier. The CDMLE technique is then extended to leaving-one-out maximum likelihood estimation and applied to language model smoothing. The results obtained on several natural language modelling tasks show an improvement in terms of perplexity.
Regarding the loss function, the design of loss functions other than the 0-1 loss is studied carefully. The study focuses on those loss functions that, while retaining a decoding complexity similar to that of the 0-1 function, provide greater flexibility. We analyse and present several loss functions on several machine translation tasks and with several translation models. We also analyse some translation rules that stand out for practical reasons, such as the direct translation rule, and we deepen the understanding of log-linear models, which are in fact particular cases of loss functions.
Finally, several monotone translation models based on statistical modelling techniques are proposed. / Andrés Ferrer, J. (2010). Statistical approaches for natural language modelling and monotone statistical machine translation [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/7109
|
283 |
Different Contributions to Cost-Effective Transcription and Translation of Video Lectures
Silvestre Cerdà, Joan Albert 05 April 2016
[EN] In recent years, on-line multimedia repositories have experienced strong
growth that has consolidated them as essential knowledge assets, especially
in the area of education, where large repositories of video lectures have been
built in order to complement or even replace traditional teaching methods.
However, most of these video lectures are neither transcribed nor translated
due to a lack of cost-effective solutions to do so in a way that gives accurate
enough results. Solutions of this kind are clearly necessary in order to make
these lectures accessible to speakers of different languages and to people with
hearing disabilities. They would also facilitate lecture searchability and
analysis functions, such as classification, recommendation or plagiarism
detection, as well as the development of advanced educational functionalities
like content summarisation to assist student note-taking.
For this reason, the main aim of this thesis is to develop a cost-effective
solution capable of transcribing and translating video lectures to a reasonable
degree of accuracy. More specifically, we address the integration of
state-of-the-art techniques in Automatic Speech Recognition and Machine
Translation into large video lecture repositories to generate high-quality
multilingual video subtitles without human intervention and at a reduced
computational cost. Also, we explore the potential benefits of the exploitation
of the information that we know a priori about these repositories, that is,
lecture-specific knowledge such as speaker, topic or slides, to create
specialised, in-domain transcription and translation systems by means of
massive adaptation techniques.
The proposed solutions have been tested in real-life scenarios by carrying out
several objective and subjective evaluations, obtaining very positive results.
The main outcome derived from this thesis, The transLectures-UPV
Platform, has been publicly released as open-source software, and, at the
time of writing, it is serving automatic transcriptions and translations for
several thousands of video lectures in many Spanish and European
universities and institutions. / [ES] Durante estos últimos años, los repositorios multimedia on-line han experimentado un gran
crecimiento que les ha hecho establecerse como fuentes fundamentales de conocimiento,
especialmente en el área de la educación, donde se han creado grandes repositorios de vídeo
charlas educativas para complementar e incluso reemplazar los métodos de enseñanza tradicionales.
No obstante, la mayoría de estas charlas no están transcritas ni traducidas debido a
la ausencia de soluciones de bajo coste que sean capaces de hacerlo garantizando una calidad
mínima aceptable. Soluciones de este tipo son claramente necesarias para hacer que las vídeo
charlas sean más accesibles para hablantes de otras lenguas o para personas con discapacidades auditivas.
Además, dichas soluciones podrían facilitar la aplicación de funciones de
búsqueda y de análisis tales como clasificación, recomendación o detección de plagios, así
como el desarrollo de funcionalidades educativas avanzadas, como por ejemplo la generación
de resúmenes automáticos de contenidos para ayudar al estudiante a tomar apuntes.
Por este motivo, el principal objetivo de esta tesis es desarrollar una solución de bajo
coste capaz de transcribir y traducir vídeo charlas con un nivel de calidad razonable. Más
específicamente, abordamos la integración de técnicas estado del arte de Reconocimiento del
Habla Automático y Traducción Automática en grandes repositorios de vídeo charlas educativas
para la generación de subtítulos multilingües de alta calidad sin requerir intervención
humana y con un reducido coste computacional. Además, también exploramos los beneficios
potenciales que conllevaría la explotación de la información de la que disponemos a priori
sobre estos repositorios, es decir, conocimientos específicos sobre las charlas tales como el
locutor, la temática o las transparencias, para crear sistemas de transcripción y traducción
especializados mediante técnicas de adaptación masiva.
Las soluciones propuestas en esta tesis han sido testeadas en escenarios reales llevando
a cabo numerosas evaluaciones objetivas y subjetivas, obteniendo muy buenos resultados.
El principal legado de esta tesis, The transLectures-UPV Platform, ha sido liberado públicamente
como software de código abierto, y, en el momento de escribir estas líneas, está
sirviendo transcripciones y traducciones automáticas para diversos miles de vídeo charlas
educativas en numerosas universidades e instituciones españolas y europeas. / [CA] Durant aquests darrers anys, els repositoris multimèdia on-line han experimentat un gran
creixement que els ha fet consolidar-se com a fonts fonamentals de coneixement, especialment
a l'àrea de l'educació, on s'han creat grans repositoris de vídeo xarrades educatives per
tal de complementar o inclús reemplaçar els mètodes d'ensenyament tradicionals. No obstant
això, la majoria d'aquestes xarrades no estan transcrites ni traduïdes degut a l'absència de
solucions de baix cost capaces de fer-ho garantint una qualitat mínima acceptable. Solucions
d'aquest tipus són clarament necessàries per a fer que les vídeo xarrades siguen més accessibles
per a parlants d'altres llengües o per a persones amb discapacitats auditives. A més, aquestes
solucions podrien facilitar l'aplicació de funcions de cerca i d'anàlisi tals com classificació,
recomanació o detecció de plagis, així com el desenvolupament de funcionalitats educatives
avançades, com per exemple la generació de resums automàtics de continguts per ajudar a
l'estudiant a prendre anotacions.
Per aquest motiu, el principal objectiu d'aquesta tesi és desenvolupar una solució de baix
cost capaç de transcriure i traduir vídeo xarrades amb un nivell de qualitat raonable. Més
específicament, abordem la integració de tècniques estat de l'art de Reconeixement de la
Parla Automàtic i Traducció Automàtica en grans repositoris de vídeo xarrades educatives
per a la generació de subtítols multilingües d'alta qualitat sense requerir intervenció humana
i amb un reduït cost computacional. A més, també explorem els beneficis potencials que
comportaria l'explotació de la informació de la que disposem a priori sobre aquests repositoris,
és a dir, coneixements específics sobre les xarrades tals com el locutor, la temàtica o
les transparències, per a crear sistemes de transcripció i traducció especialitzats mitjançant
tècniques d'adaptació massiva.
Les solucions proposades en aquesta tesi han estat testejades en escenaris reals duent a
terme nombroses avaluacions objectives i subjectives, obtenint molt bons resultats. El principal
llegat d'aquesta tesi, The transLectures-UPV Platform, ha sigut alliberat públicament
com a programari de codi obert, i, en el moment d'escriure aquestes línies, està servint transcripcions
i traduccions automàtiques per a diversos milers de vídeo xarrades educatives en
nombroses universitats i institucions Espanyoles i Europees. / Silvestre Cerdà, JA. (2016). Different Contributions to Cost-Effective Transcription and Translation of Video Lectures [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62194
|
284 |
Analyse lexicale, morphologique et syntaxique du Thaï en vue de la traduction automatique appliquée au domaine de l'administration publique / The lexical morpho-syntactic analysis of Thai machine translation applied to the domain of public administration
Kiattibutra-Anantalapochai, Raksi 13 September 2011
Cette recherche présente une méthode d'analyse micro-systémique des mots composés thaïs. Le but de notre étude est de trouver une réponse au questionnement suivant « existe-t-il une voie qui permette de traduire automatiquement les mots thaïs vers le français avec un résultat parfait ? ». Ce travail est divisé en cinq chapitres. La première partie concerne une histoire brève de la traduction automatique dont celle du thaï. Les points de vue des autres travaux sont étudiés. Le deuxième chapitre présente les caractéristiques de la langue thaïe qui possède une forme d’écriture typique sans espacement et peut entrainer des difficultés en termes d’ambiguïté dans la traduction. Certaines divergences entre le thaï et le français sont soulignées à l’aide de la théorie micro-systémique du Centre Tesnière. Le troisième chapitre fait l’étude des mots composés thaïs en utilisant une méthode hybride de l’analyse morphosyntaxique et notre système à base de règles conformes à notre modèle d'analyse de données. Le quatrième chapitre met en évidence un contrôle modélisé des unités lexicales codées syntaxiquement et sémantiquement afin d’en définir des algorithmes efficaces. Le dernier chapitre conclut sur les résultats des nouveaux algorithmes par leur informatisation. Sont enfin énoncées les perspectives ouvertes par cette nouvelle recherche. Cette étude est présentée comme un travail fiable à l’élimination des ambiguïtés. Fondée sur une méthode hybride, elle nous a permis d’atteindre notre objectif et de trouver ainsi une voie efficace qui nous autorise à traduire automatiquement les mots thaïs vers le français. Le résultat place cet outil comme l’un des plus accessibles à la recherche internationale où le thaï et le français prennent leurs places de choix. /
This thesis presents a method of micro-systemic linguistic analysis of Thai compound words. The aim of our study is to answer the question: “Is there any method which allows us to translate Thai words into French automatically with a perfect result?” Our work is divided into five chapters. The first chapter gives a brief history of machine translation, including that of Thai, and discusses some notable points of view from other work. The second chapter identifies some essential characteristics of the Thai language, such as a writing style without spaces between words, which can result in ambiguity in machine translation. Certain divergences between Thai and French are underlined by means of the micro-systemic theory of the Centre Tesnière. The third chapter analyses Thai compound words using a hybrid method involving morpho-syntactic parsing and a rule-based system corresponding to our model of data analysis. The fourth chapter employs a technique of lexical-syntactic and semantic control enabling the definition of efficient algorithms. The final chapter concludes our work with some future perspectives. This study is presented as a reliable approach to eliminating word ambiguities in machine translation. This hybrid method allows us to reach our objective and to find an effective way to translate Thai to French automatically. The result could be an accessible tool for international research in the Thai and French languages.
|
285 |
Specifika počítačem podporovaného překladu z němčiny do češtiny / CAT Tools in German - Czech Translation
Handšuhová, Jana January 2013
This thesis deals with specialized translation software, the mastery of which is becoming one of the basic requirements of successful translation work. The theoretical part describes the historical development, classification and main functions of translation memory systems. The thesis further attempts to determine the criteria for the effective use of CAT tools and explores the text types and genres for which translation memory systems are most commonly used in the translation process. It also covers the functional view of language-based text typology and the principles on which translation memory systems work. The practical part compares the result of the translation process (translation as a product) with and without CAT tools. A corpus of parallel texts (original and translation) is subjected to a translation analysis, which identifies the levels affected by differences between translations made with and without CAT tools. Differences in the actual translation process with and without CAT tools that are not empirically verifiable are analysed on the basis of a survey conducted among translators. The findings of the empirical part are then summarized and systematized. The last chapter deals with the expected development in the translation market, the...
|
286 |
Investigating the effectiveness of available tools for translating into tshiVenda
Nemutamvuni, Mulalo Edward 11 1900
Text in English / Abstracts in English and Venda / This study investigated the effectiveness of the available tools for translating from English into Tshivenḓa and vice versa. It addressed the lack of effective translation tools for the English-Tshivenḓa language pair. Tshivenḓa is one of South Africa’s minority languages, and its lack of effective translation tools negatively affects language practitioners’ work and puts translation quality assurance at risk. Translation tools, both computer-based and non-computer-based, abound for developed languages such as English, French and others. Based on the results of this research project, the researcher made recommendations that could remedy the situation. South Africa is a democratic country with a number of language-related policies. This creates a conducive context for stakeholders with a passion for language to develop Tshivenḓa fully in all dimensions. All languages have evolved, and all were once underdeveloped. This shows that developing Tshivenḓa is also possible, just as it was for Afrikaans, which did not exist before 1652. Afrikaans has evolved and overtaken all indigenous South African languages.
This study reviewed the literature on translation and translation tools, drawn from both published and unpublished sources. The study used mixed methods research, i.e. quantitative and qualitative research methods, which complemented each other throughout the research. Data were gathered through questionnaires and interviews in which both open-ended and closed-ended questions were employed. Both purposive (judgemental) and snowball (chain) sampling were applied. Owing to the nature of mixed methods research, data were analysed through a combination of methods. Guided by the analytic comparison approach, related data were grouped together during analysis and presentation, and both statistical and textual analyses were used. Themes were constructed to present the gathered data clearly. In the final chapters, the researcher discussed the findings and evaluated the entire research before making recommendations and drawing conclusions. / Iyi ṱhoḓisiso yo ita tsedzuluso nga ha kushumele kwa zwishumiswa zwi re hone zwine zwa shumiswa u pindulela u bva kha luambo lwa English u ya kha Tshivenḓa na u bva kha Tshivenḓa u ya kha English ndivho I ya u sedzulusa na u lavhelesa kushumele kwa izwi zwishumiswa uri zwi a thusa naa. Ino ṱhoḓisiso yo shumana na thaidzo ya ṱhahelelo ya zwishumiswa zwa u pindulela zwine zwa shumiswa musi hu tshi pindulelwa vhukati ha English na Tshivenḓa. Tshivenḓa ndi luṅwe lwa nyambo dza Afrika Tshipembe dzine dza ambiwa nga vhathu vha si vhanzhi. U shaea ha zwishumiswa zwa u pindulela zwine zwa shuma nga nḓila I thusaho zwi kwama mushumo wa vhashumi vha zwa nyambo nga nḓila I si yavhuḓi. Iyi nyimele I na mulingo u kwamaho khwaḽithi ya zwo pindulelwaho. Zwishumiswa zwa u pindulela, zwa thekhnoḽodzhi ya khomphiyutha na zwi sa shumisi thekhnoḽodzhi ya khomphiyutha zwo ḓalesa kha nyambo dzo bvelelaho u tou fana na kha English, French na dziṅwe.
Zwo sendeka kha mvelelo dza ino thandela ya ṱhoḓisiso, muṱoḓisisi o ita themendelo dzine dza nga fhelisa thaidzo ya nyimele. Afrika Tshipembe ndi shango ḽa demokirasi ḽine ḽa vha na mbekanyamaitele dzo vhalaho nga ha dzinyambo. Izwi zwi ita uri hu vhe na nyimele ine vhafaramikovhe vhane vha funesa nyambo vha kone u bveledza Tshivenḓa kha masia oṱhe. Zwavhukuma ndi zwa uri nyambo dzoṱhe dzi na mathomo nahone dzoṱhe dzo vha dzi songo bvelela. Izwi zwi ita uri zwi vhe khagala uri luambo lwa Tshivenḓa na lwone lu nga bveledzwa u tou fana na luambo lwa Afrikaans lwe lwa vha lu si ho ḽifhasini phanḓa ha ṅwaha wa 1652. Ulu luambo (Afrikaans) lwo vha hone shangoni lwa mbo bveledzwa lwa fhira nyambo dzoṱhe dza fhano hayani Afrika Tshipembe.
Kha ino ṱhoḓisiso ho vhaliwa maṅwalwa ane a amba nga ha u pindulela na nga ha zwishumiswa zwa u pindulela. Maṅwalwa e a vhalwa o wanala kha zwiko zwo kanḓiswaho na zwiko zwi songo kanḓiswaho. Ino ṱhoḓisiso yo shumisa ngona dza ṱhoḓisiso dzo ṱanganyiswaho, idzo ngona ndi khwanthithethivi na khwaḽithethivi. Idzi ngona dzo shumisana zwavhuḓisa kha ṱhoḓisiso yoṱhe. Data yo kuvhanganywa hu tshi khou shumiswa dzimbudziso na u tou vhudzisa hune afho ho shumiswa mbudziso dzo vuleaho na dzo valeaho. Ngona dza u nanga sambula muṱoḓisisi o shumisa khaṱulo yawe uri ndi nnyi ane a nga vha a na data yo teaho na u humbela vhavhudziswa uri vha bule vhaṅwe vhathu vha re na data yo teaho ino ṱhoḓisiso.
Tsenguluso ya data ho ṱanganyiswa ngona dza u sengulusa zwo itiswa ngauri ṱhoḓisiso ino yo ṱanganyisa ngona dza u ita ṱhoḓisiso. Sumbanḓila ho shumiswa tsenguluso ya mbambedzo kha u sengulusa data. Data ine ya fana yo vhewa fhethu huthihi musi hu tshi khou senguluswa na u vhiga. Tsenguluso I shumisaho mbalo/tshivhalo (khwanthithethivi) na I shumisaho maipfi kha ino ngudo dzo shumiswa. Ho vhumbiwa dziṱhoho u itela u ṱana data ye ya kuvhanganywa. Ngei kha ndima dza u fhedza, muṱodisisi o rera nga ha mawanwa, o ṱhaṱhuvha ṱhoḓisiso yoṱhe phanḓa ha u ita themendelo na u vhina. / African Languages / M.A. (African Languages)
|
287 |
Extraction de phrases parallèles à partir d’un corpus comparable avec des réseaux de neurones récurrents bidirectionnels
Grégoire, Francis 12 1900
No description available.
|
288 |
Étude sur l’influence du vocabulaire utilisé pour l’indexation des images en contexte de repérage multilingue
Ménard, Elaine 27 November 2008
Depuis quelques années, Internet est devenu un média incontournable pour la diffusion de ressources multilingues. Cependant, les différences linguistiques constituent souvent un obstacle majeur aux échanges de documents scientifiques, culturels, pédagogiques et commerciaux. En plus de cette diversité linguistique, on constate le développement croissant de bases de données et de collections composées de différents types de documents textuels ou multimédias, ce qui complexifie également le processus de repérage documentaire. En général, on considère l’image comme « libre » au point de vue linguistique. Toutefois, l’indexation en vocabulaire contrôlé ou libre (non contrôlé) confère à l’image un statut linguistique au même titre que tout document textuel, ce qui peut avoir une incidence sur le repérage.
Le but de notre recherche est de vérifier l’existence de différences entre les caractéristiques de deux approches d’indexation pour les images ordinaires représentant des objets de la vie quotidienne, en vocabulaire contrôlé et en vocabulaire libre, et entre les résultats obtenus au moment de leur repérage. Cette étude suppose que les deux approches d’indexation présentent des caractéristiques communes, mais également des différences pouvant influencer le repérage de l’image. Cette recherche permet de vérifier si l’une ou l’autre de ces approches d’indexation surclasse l’autre, en termes d’efficacité, d’efficience et de satisfaction du chercheur d’images, en contexte de repérage multilingue.
Afin d’atteindre le but fixé par cette recherche, deux objectifs spécifiques sont définis : identifier les caractéristiques de chacune des deux approches d’indexation de l’image ordinaire représentant des objets de la vie quotidienne pouvant influencer le repérage, en contexte multilingue et exposer les différences sur le plan de l’efficacité, de l’efficience et de la satisfaction du chercheur d’images à repérer des images ordinaires représentant des objets de la vie quotidienne indexées à l’aide d’approches offrant des caractéristiques variées, en contexte multilingue. Trois modes de collecte des données sont employés : l’analyse des termes utilisés pour l’indexation des images, la simulation du repérage d’un ensemble d’images indexées selon chacune des formes d’indexation à l’étude réalisée auprès de soixante répondants, et le questionnaire administré aux participants pendant et après la simulation du repérage. Quatre mesures sont définies pour cette recherche : l’efficacité du repérage d’images, mesurée par le taux de succès du repérage calculé à l’aide du nombre d’images repérées; l’efficience temporelle, mesurée par le temps, en secondes, utilisé par image repérée; l’efficience humaine, mesurée par l’effort humain, en nombre de requêtes formulées par image repérée et la satisfaction du chercheur d’images, mesurée par son autoévaluation suite à chaque tâche de repérage effectuée.
Cette recherche montre que sur le plan de l’indexation de l’image ordinaire représentant des objets de la vie quotidienne, les approches d’indexation étudiées diffèrent fondamentalement l’une de l’autre, sur le plan terminologique, perceptuel et structurel. En outre, l’analyse des caractéristiques des deux approches d’indexation révèle que si la langue d’indexation est modifiée, les caractéristiques varient peu au sein d’une même approche d’indexation. Finalement, cette recherche souligne que les deux approches d’indexation à l’étude offrent une performance de repérage des images ordinaires représentant des objets de la vie quotidienne différente sur le plan de l’efficacité, de l’efficience et de la satisfaction du chercheur d’images, selon l’approche et la langue utilisées pour l’indexation. / During the last few years, the Internet has become an indispensable medium for the dissemination of multilingual resources. However, language differences are often a major obstacle to the exchange of scientific, cultural, educational and commercial documents. Besides this linguistic diversity, many databases and collections now contain documents in various formats that can also adversely affect their retrieval process. In general, images are considered to be language-independent resources. Nevertheless, the image indexing process using either a controlled or uncontrolled vocabulary gives the image a linguistic status similar to any other textual document and thus leads to the same difficulties in their retrieval.
The goal of our research is to first identify the differences between the indexing approaches using a controlled and an uncontrolled vocabulary for ordinary images of everyday-life objects and to then differentiate between the results obtained at the time of image retrieval. This study supposes that the two indexing approaches show not only common characteristics, but also differences that can influence image retrieval. Thus, this research makes it possible to indicate if one of these indexing approaches surpasses the other in terms of effectiveness, efficiency, and satisfaction of the image searcher in a multilingual retrieval context.
For this study, two specific objectives are defined: to identify the characteristics of each approach used for ordinary image indexing of everyday-life objects that can affect image retrieval in a multilingual context; and to explore the differences between the two indexing approaches in terms of their effectiveness, their efficiency, and the satisfaction of the image searcher when trying to retrieve ordinary images of everyday-life objects indexed according to either approach in a multilingual retrieval context. Three methods of data collection are used: an analysis of the image indexing terms, a simulation of the retrieval of a set of images indexed according to each of the two indexing approaches conducted with sixty respondents, and a questionnaire submitted to the participants during and after the retrieval simulation. Four measures are defined in this research: the effectiveness of image retrieval, measured by the success rate calculated from the number of retrieved images; time efficiency, measured by the average time, in seconds, used to retrieve an image; human efficiency, measured by the human effort expressed as the average number of queries needed to retrieve an image; and the satisfaction of the image searcher, measured by the participant's self-evaluation of the retrieval process after each completed task.
This research shows that, for the indexing of ordinary images representing everyday-life objects, the two approaches investigated are fundamentally distinct in terminological, perceptual, and structural terms. Additionally, the analysis of the characteristics of the two indexing approaches reveals that when the indexing language changes, the characteristics vary little within the same indexing approach. Finally, this research underlines that the two indexing approaches differ in retrieval performance for ordinary images of everyday-life objects, in terms of effectiveness, efficiency, and satisfaction of the image searcher, depending on the approach and the language used for indexing.
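The four measures named in this abstract can be illustrated with a small Python sketch. This is only a hedged illustration, not the study's actual procedure: the `RetrievalTask` record, its field names, the satisfaction scale, and the choice to divide total time and total queries by the number of retrieved images are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrievalTask:
    """One simulated retrieval task (hypothetical record layout)."""
    retrieved: bool     # was the target image found?
    seconds: float      # time spent on the task
    queries: int        # number of queries formulated
    satisfaction: int   # self-evaluation after the task (scale is an assumption)


def summarize(tasks: List[RetrievalTask]) -> Dict[str, float]:
    """Aggregate effectiveness, time efficiency, human efficiency, satisfaction."""
    if not tasks:
        raise ValueError("at least one task is required")
    n_retrieved = sum(1 for t in tasks if t.retrieved)
    total_seconds = sum(t.seconds for t in tasks)
    total_queries = sum(t.queries for t in tasks)
    return {
        # Effectiveness: share of tasks in which the image was retrieved.
        "success_rate": n_retrieved / len(tasks),
        # Time efficiency: seconds spent per retrieved image.
        "seconds_per_image": total_seconds / n_retrieved if n_retrieved else float("inf"),
        # Human efficiency: queries formulated per retrieved image.
        "queries_per_image": total_queries / n_retrieved if n_retrieved else float("inf"),
        # Satisfaction: mean self-evaluation over all tasks.
        "mean_satisfaction": sum(t.satisfaction for t in tasks) / len(tasks),
    }
```

Computing the same summary once per indexing approach and per indexing language would give the kind of side-by-side comparison the abstract describes.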
|
289 |
An exploratory study of translations of the Dewey Decimal Classification system into South African languages
De Jager, Gert Johannes Jacobus 06 1900
This research investigated the feasibility of South African translations of Dewey Decimal Classification (DDC). The study provides an introductory overview of DDC throughout the world, followed by its use in South Africa. The introduction highlights shortcomings and possible solutions – of which translations seem to be the most ideal. This research involved a critical analysis of the literature on DDC translations, a documentary analysis and technology-based research in the form of Google translations and evaluation of parts of Abridged Edition 15 of DDC.
The critical analysis of the literature and the documentary analysis identified problems relating to translations, how translations deal with shortcomings in DDC, the fact that no literature exists on multilingual translations, and the process of translations (including the fact that this is an expensive endeavour). It also revealed information about sponsorship and the mixed translation model.
The technology-based research, using Google Translate for translations of parts of Abridged Edition 15 and the subsequent evaluation of these translations indicated that Google translations were comprehensive and needed minimum editorial effort. Further to this it paved the way for describing a possible workflow for South African translations and indicated that the parts already translated as well as further Google translations can expedite the translation process. A model for South African translations, based on only the cost of the Pansoft translation software was proposed. The mixed model approach, where some languages are used as main languages (schedules, Relative Index terms and the like) and others for Relative Index terms only, was deemed the most appropriate in the South African context.
This led to the conclusion that DDC translations into ten of the official South African languages are indeed feasible. The research supports translations that keep the integrity of DDC intact, with possible expansions based on literary warrant. It is important, though, to get the support of the South African library community and authoritative bodies such as the National Library of South Africa and/or the Library and Information Association of South Africa (LIASA) to negotiate and sign a contract for these translations. / Information Science / D. Litt. et Phil. (Information Science)
|
290 |
Skoner en kleiner vertaalgeheues
Wolff, Friedel 10 1900
Rekenaars kan ’n nuttige rol speel in vertaling. Twee benaderings
is vertaalgeheuestelsels en masjienvertaalstelsels. By
hierdie twee tegnologieë word ’n vertaalgeheue gebruik—’n
tweetalige versameling vorige vertalings. Hierdie proefskrif
bied metodes aan om die kwaliteit van ’n vertaalgeheue te verbeter.
’n Masjienleerbenadering word gevolg om foutiewe inskrywings
in ’n vertaalgeheue te identifiseer. ’n Verskeidenheid leerkenmerke
in drie kategorieë word aangebied: kenmerke wat
verband hou met tekslengte, kenmerke wat deur kwaliteittoetsers
soos vertaaltoetsers, ’n speltoetser en ’n grammatikatoetser
bereken word, asook statistiese kenmerke wat met behulp van
eksterne data bereken word.
Die evaluasie van vertaalgeheuestelsels is nog nie gestandaardiseer
nie. In hierdie proefskrif word ’n verskeidenheid
probleme met bestaande evaluasiemetodes uitgewys, en ’n verbeterde
evaluasiemetode word ontwikkel.
Deur die foutiewe inskrywings uit ’n vertaalgeheue te verwyder,
is ’n kleiner, skoner vertaalgeheue beskikbaar vir toepassings.
Eksperimente dui aan dat so ’n vertaalgeheue beter
prestasie behaal in ’n vertaalgeheuestelsel. As ondersteunende
bewys vir die waarde van ’n skoner vertaalgeheue word ’n
verbetering ook aangedui by die opleiding van ’n masjienvertaalstelsel. / Computers can play a useful role in translation. Two approaches
are translation memory systems and machine translation
systems. With these two technologies a translation memory
is used: a bilingual collection of previous translations.
This thesis presents methods to improve the quality of a translation
memory.
A machine learning approach is followed to identify incorrect
entries in a translation memory. A variety of learning features
in three categories are presented: features associated with text
length, features calculated by quality checkers such as translation
checkers, a spell checker and a grammar checker, as well
as statistical features computed with the help of external data.
The evaluation of translation memory systems is not yet standardised.
This thesis points out a number of problems with existing
evaluation methods, and an improved evaluation method
is developed.
By removing the incorrect entries in a translation memory, a
smaller, cleaner translation memory is available to applications.
Experiments demonstrate that such a translation memory results
in better performance in a translation memory system.
As supporting evidence for the value of a cleaner translation
memory, an improvement is also achieved in training a machine
translation system. / School of Computing / Ph. D. (Rekenaarwetenskap)
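As a rough illustration of the first feature category mentioned in this abstract (features associated with text length), the following Python sketch flags translation memory entries whose source and target lengths differ sharply. It is a simplified stand-in, not the thesis's method: the thesis trains a machine learning model over many features, whereas this example applies a single fixed threshold, and the function names and the `min_ratio` default are hypothetical.

```python
from typing import List, Tuple


def length_ratio(source: str, target: str) -> float:
    """Ratio of the shorter segment length to the longer one (1.0 = equal)."""
    longer = max(len(source), len(target))
    shorter = min(len(source), len(target))
    # Two empty segments are treated as perfectly matched in length.
    return shorter / longer if longer else 1.0


def flag_suspicious(
    entries: List[Tuple[str, str]], min_ratio: float = 0.5
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose length ratio falls below min_ratio."""
    return [(s, t) for s, t in entries if length_ratio(s, t) < min_ratio]
```

In a learned classifier, a value such as `length_ratio` would be one input among many rather than a hard filter; the fixed threshold here only makes the idea concrete.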
|