181

Grammatical Error Correction for Learners of Swedish as a Second Language

Nyberg, Martina January 2022

Grammatical Error Correction refers to the task of automatically correcting errors in written text, typically in texts written by learners of a second language. The work in this thesis implements and evaluates two methods for Grammatical Error Correction (GEC) in Swedish. In addition, the proposed methods are compared to an existing, rule-based system. Previous research on GEC for Swedish is limited and has not yet utilized the potential of neural networks. The first method implemented in this work is based on a neural machine translation approach, training a Transformer model to translate erroneous text into a corrected version; a parallel dataset containing artificially generated errors is created to train the model. The second method uses a Swedish version of the pre-trained language model BERT to estimate the likelihood of potential corrections in an erroneous text. Employing the SweLL gold corpus, consisting of essays written by learners of Swedish, the proposed methods are evaluated using GLEU and through a manual evaluation based on the types of errors and their corresponding corrections found in the essays. The results show that the two methods correct approximately the same number of errors, while differing in which error types they handle best. Specifically, the translation approach covers a wider range of error types and is superior for syntactic and punctuation errors. In contrast, the language model approach yields consistently higher recall and outperforms the translation approach on lexical and morphological errors. To improve the results, future work could investigate the effect of increased model size and training data, as well as the potential of combining the two methods.
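A minimal sketch of the abstract's language-model scoring idea: rank candidate corrections by a masked LM's pseudo-log-likelihood. The model name (KBLab's Swedish BERT) and the scoring details are illustrative assumptions, not the thesis's exact setup.

    # Rank candidate corrections by pseudo-log-likelihood under a masked LM.
    # Model choice and scoring are assumptions; the thesis's setup may differ.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("KB/bert-base-swedish-cased")
    model = AutoModelForMaskedLM.from_pretrained("KB/bert-base-swedish-cased")
    model.eval()

    def pseudo_log_likelihood(sentence: str) -> float:
        """Sum each token's log-probability when it is masked in turn."""
        ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
        total = 0.0
        for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
        return total

    # Toy candidates for an erroneous sentence; the higher-scoring one wins.
    candidates = ["Jag har två hundar.", "Jag har två hund."]
    best = max(candidates, key=pseudo_log_likelihood)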
182

Sequence to Sequence Machine Learning for Automatic Program Repair

Svensson, Niclas, Vrabac, Damir January 2019

Most previous program repair approaches, including machine-learning-based ones, are only able to generate fixes for one-line bugs. This work aims to reveal whether such a system, using state-of-the-art techniques, can make useful predictions when fed whole source files. To verify whether multi-line bugs can be fixed using a state-of-the-art solution, a system was created using existing Neural Machine Translation tools and data gathered from GitHub. The results of the finished system show, however, that the method used in this thesis is not sufficient to obtain satisfying results: no bug was successfully corrected by the system. Although the results are poor, there are still unexplored approaches that could improve the system's performance, one being to narrow the input data down to method-level rather than file-level source code.
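Under the framing the abstract describes, repair is translation over (buggy, fixed) pairs. A hedged sketch of one such training example with a generic encoder-decoder tokenizer; the model and the toy off-by-one bug are stand-ins, not the thesis's NMT tooling:

    # One (buggy, fixed) parallel example, tokenised for an encoder-decoder.
    # t5-small is a generic stand-in; the off-by-one bug is invented.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-small")

    buggy = "for (int i = 0; i <= a.length; i++) s += a[i];"
    fixed = "for (int i = 0; i < a.length; i++) s += a[i];"

    batch = tokenizer(buggy, text_target=fixed, return_tensors="pt")
    # batch["input_ids"] and batch["labels"] form one seq2seq training example.
    # At file granularity such inputs routinely exceed the encoder's length
    # limit, one reason the thesis suggests narrowing to method level.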
183

Practical Morphological Modeling: Insights from Dialectal Arabic

Erdmann, Alexander January 2020
No description available.
184

Who is afraid of MT?

Schmitt, Peter A. 12 August 2022

Machine translation (MT) is experiencing a renaissance. On the one hand, machine translation is becoming more common and is used on an ever larger scale; on the other hand, many translators have an almost hostile attitude towards machine translation programs and towards those translators who use MT as a tool. Either it is assumed that MT can never be as good as a human translation, or machine translation is viewed as the ultimate enemy of the translator and as a job killer. Using various examples, the article discusses the limits and possibilities of machine translation. It demonstrates that machine translation can be better than human translations, even ones made by experienced professional translators. The paper also reports the results of a test showing that translation customers must expect even well-known and expensive translation service providers to deliver a quality on par with poor MT. Overall, it is argued that machine translation programs are no more and no less than an additional tool with which the translation industry can satisfy certain requirements. This abstract, like the entire article, was itself automatically translated into English.
185

Chinese Zero Pronoun Resolution with Neural Networks

Yang, Yifan January 2022

In this thesis, I explored several neural network-based models to resolve zero pronouns in Chinese-English translation tasks. I reviewed previous work that treats resolution as a classification task, e.g. determining whether a candidate in a given set is the antecedent of a zero pronoun, and that can be categorized into rule-based and supervised methods. Existing methods either did not take the relationship between potential zero pronoun candidates into consideration or did not fully utilize attention over zero pronoun representations. In my experiments, I investigated attention-based neural network models as well as their application in a reinforcement learning environment, building on an existing neural model. In particular, I integrated an LSTM-tree-based module into the attention network, encoding syntactic information for the zero pronoun resolution task. In addition, I applied Bi-Attention layers between modules to interactively learn the alignment between syntax and semantics. Furthermore, I leveraged a reinforcement learning framework to fine-tune the proposed model and experimented with different encoding strategies: FastText, BERT, and trained RNN-based embeddings. I found that the attention-based model with the LSTM-tree-based module, fine-tuned in the reinforcement learning framework with FastText embeddings, achieves the best performance, superior to the baseline models. I evaluated model performance on different categories of resources; FastText showed particular promise in encoding web-blog text.
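A minimal sketch of a bi-attention layer of the kind the abstract places between the syntax and semantic modules; dimensions and naming are assumptions, not the thesis's actual code:

    # Bi-attention: each sequence attends over the other via one score matrix.
    import torch
    import torch.nn.functional as F

    def bi_attention(syntax: torch.Tensor, semantic: torch.Tensor):
        """syntax: (m, d), semantic: (n, d) -> attended views of each other."""
        scores = syntax @ semantic.T                    # (m, n) alignment scores
        syn2sem = F.softmax(scores, dim=-1) @ semantic  # (m, d)
        sem2syn = F.softmax(scores.T, dim=-1) @ syntax  # (n, d)
        return syn2sem, sem2syn

    syn = torch.randn(7, 64)   # e.g. LSTM-tree node encodings
    sem = torch.randn(12, 64)  # e.g. token encodings
    s2m, m2s = bi_attention(syn, sem)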
186

Java Syntax Error Repair Using RoBERTa

Xiang, Ziyi January 2022

Deep learning has achieved promising results for automatic program repair (APR). In this paper, we revisit this topic and propose an end-to-end approach, Classfix, to correct Java syntax errors. Classfix uses a RoBERTa classification model to localize the error and a RoBERTa encoder-decoder model to repair the located buggy line. Our work introduces a new localization method that enables us to fix a program of arbitrary length. Our approach categorises errors into symbol errors and word errors. We conduct a large-scale experiment to evaluate Classfix: the results show that Classfix is able to repair 75.5% of symbol errors and 64.3% of word errors. In addition, Classfix achieves 97% and 84.7% accuracy in locating symbol errors and word errors, respectively.
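A schematic of the two-stage localize-then-repair pipeline the abstract describes; both model names are untrained placeholders standing in for the RoBERTa models fine-tuned in the thesis:

    # Stage 1: score each line as buggy; stage 2: rewrite the worst line.
    # Both model names are placeholders, not the thesis's fine-tuned models.
    from transformers import pipeline

    locate = pipeline("text-classification", model="roberta-base")
    repair = pipeline("text2text-generation", model="t5-small")

    def classfix_style_repair(lines):
        scores = [locate(line)[0]["score"] for line in lines]  # per-line suspicion
        i = max(range(len(lines)), key=scores.__getitem__)     # most suspicious line
        lines[i] = repair("fix: " + lines[i])[0]["generated_text"]
        return "\n".join(lines)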
187

Breaking Language Barriers: Enhancing Multilingual Representation for Sentence Alignment and Translation

Mao, Zhuoyuan 25 March 2024

Kyoto University / New-system course doctorate / Doctor of Informatics, Degree No. 25420 (Informatics No. 858) / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / Examination committee: Program-Specific Professor Sadao Kurohashi (chief examiner), Professor Tatsuya Kawahara, Professor Hisashi Kashima / Meets Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
188

Comparing German-Czech translation output of publicly available machine translation engines

Řehořová, Klára January 2022

This diploma thesis deals with the output quality of publicly available machine translation engines: Amazon Translate, Bing Translator, DeepL and Google Translate. The aim of the thesis was to qualitatively compare the machine translation engines and to test their capabilities on several types of texts: expressive, informative and operative. The analysis is carried out on translations of German texts into Czech. The thesis consists of two parts: the theoretical part discusses the origin and development of machine translation, its types, and the machine translation engines that are the object of our research, and presents a two-stage model for evaluating translation quality. In the empirical part, the results of the analysis, based on the models of K. Reiß and A. Torrens, are presented. These results show that the machine translation engines can be ranked from highest to lowest output quality as follows: DeepL, Google Translate, Bing Translator and Amazon Translate. Furthermore, the error rate turns out to correlate with the creativity of the text.
189

On improving natural language processing through phrase-based and one-to-one syntactic algorithms

Meyer, Christopher Henry January 1900

Master of Science / Department of Computing and Information Sciences / William H. Hsu / Machine Translation (MT) is the practice of using computational methods to convert words from one natural language to another. Several approaches have been created since MT's inception in the 1950s and, with the vast increase in computational resources since then, have continued to evolve and improve. In this thesis I summarize several branches of MT theory, introduce newly developed software applications and parsing techniques for improving Japanese-to-English text translation, and present a new key algorithm to correct translation errors when converting Japanese kanji to English. The overall translation improvement is measured using the BLEU metric (an objective, numerical standard in machine translation quality analysis). The baseline translation system was built by combining Giza++, the Thot phrase-based SMT toolkit, the SRILM toolkit, and the Pharaoh decoder. The input and output parsing applications were created as intermediaries to improve the baseline MT system while avoiding artificially high improvement metrics. This baseline was measured with and without the additional parsing provided by the thesis software applications, and also with and without the thesis kanji correction utility. The new algorithm corrected many contextual definition mistakes that are common when converting Japanese to English text. By training the new kanji correction utility on an existing dictionary, identifying source text in Japanese with a high number of possible translations, and checking the baseline translation against other translation possibilities, I was able to increase the translation performance of the baseline system from minimum normalized BLEU scores of .0273 to maximum normalized scores of .081. The preliminary phase of improving Japanese-to-English translation focused on correcting segmentation mistakes that occur when attempting to parse Japanese text into meaningful tokens. The initial increase is artificially high and not indicative of future potential, as the baseline score was very low to begin with, but it was needed to establish a reasonable baseline. The final results confirmed that a significant, measurable improvement had been achieved by improving the initial segmentation of the Japanese text through parsing the input corpora and by correcting kanji translations after the Pharaoh decoding process had completed.
190

Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction

Pettersson, Eva January 2016

Historical text constitutes a rich source of information for historians and other researchers in the humanities. Many texts are, however, not available in electronic format, and even when they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component of the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, since the lack of linguistically annotated historical data, combined with the high variability in historical text, makes it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 of the 100 top-ranked instances are true positives in the best setting.
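A minimal sketch of the pipeline's core idea: normalise historical spelling via a lookup resource, then hand the text to tools trained on the modern language. The lexicon entries here are invented examples, not taken from the thesis:

    # Dictionary-based spelling normalisation; lexicon entries are invented.
    normalisation_lexicon = {"hafver": "haver", "affskedh": "avsked"}

    def normalise(tokens):
        return [normalisation_lexicon.get(t.lower(), t) for t in tokens]

    historical = "Han hafver fått affskedh".split()
    modernised = normalise(historical)
    # The modernised text can now be tagged and parsed with NLP tools
    # trained on modern Swedish, as the succeeding pipeline step requires.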
