1 |
Generating Paraphrases with Greater Variation Using Syntactic Phrases
Madsen, Rebecca Diane, 01 December 2006
Given a sentence, a paraphrase generation system produces a sentence that says the same thing but usually in a different way. The paraphrase generation problem can be formulated in the machine translation paradigm; instead of translation of English to a foreign language, the system translates an English sentence (for example) to another English sentence. Quirk et al. (2004) demonstrated this approach to generate almost 90% acceptable paraphrases. However, most of the sentences had little variation from the original input sentence. Leveraging syntactic information, this thesis project presents an approach that successfully generated more varied paraphrase sentences than the approach of Quirk et al. while maintaining coverage of the proportion of acceptable paraphrases generated. The ParaMeTer system (Paraphrasing by MT) identifies syntactic chunks in paraphrase sentences and substitutes labels for those chunks. This enables the system to generalize movements that are more syntactically plausible, as syntactic chunks generally capture sets of words that can change order in the sentence without losing grammaticality. ParaMeTer then uses statistical phrase-based MT techniques to learn alignments for the words and chunk labels alike. The baseline system followed the same pattern as the Quirk et al. system - a statistical phrase-based MT system. Human judgments showed that the syntactic approach and baseline both achieve approximately the same ratio of fluent, acceptable paraphrase sentences per fluent sentences. These judgments also showed that the ParaMeTer system has more phrase rearrangement than the baseline system. Though the baseline has more within-phrase alteration, future modifications such as a chunk-only translation model should improve ParaMeTer's variation for phrase alteration as well.
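The chunk-label substitution step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `substitute_chunk_labels`, the `<NP>`-style label format, the hand-written chunk spans, and the choice of which labels to substitute are all assumptions made for illustration. In practice the spans would come from a syntactic chunker.

```python
from typing import List, Set, Tuple

# A chunk is (start index, end index exclusive, label); hypothetical format.
Chunk = Tuple[int, int, str]


def substitute_chunk_labels(
    tokens: List[str], chunks: List[Chunk], labels_to_replace: Set[str]
) -> List[str]:
    """Replace each chunk whose label is selected with a single label token.

    Chunks are assumed to be non-overlapping. Tokens outside any selected
    chunk are copied through unchanged.
    """
    by_start = {start: (end, label) for start, end, label in chunks}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in by_start and by_start[i][1] in labels_to_replace:
            end, label = by_start[i]
            out.append(f"<{label}>")  # one placeholder stands for the whole chunk
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hand-annotated example; the spans are illustrative, not chunker output.
tokens = "the committee approved the budget on Tuesday".split()
chunks = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP"), (5, 7, "PP")]
print(substitute_chunk_labels(tokens, chunks, {"NP", "PP"}))
# prints ['<NP>', 'approved', '<NP>', '<PP>']
```

With chunks collapsed to single tokens like this, a phrase-based aligner sees a reordering of a whole prepositional phrase as the movement of one symbol, which is the kind of generalization the abstract attributes to the syntactic approach.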
|
2 |
Automatic Question Paraphrasing in Swedish with Deep Generative Models / Automatisk frågeparafrasering på svenska med djupa generativa modeller
Lindqvist, Niklas, January 2021
Paraphrase generation refers to the task of automatically generating a paraphrase, that is, another text with the same meaning, given an input sentence or text. It is a fundamental yet challenging natural language processing (NLP) task and is used in a variety of applications such as question answering, information retrieval, and conversational systems. In this study, we address the problem of paraphrase generation of questions in Swedish by evaluating two deep generative models that have shown promising results on paraphrase generation of questions in English. The first model is a Conditional Variational Autoencoder (C-VAE). The second model extends the first by introducing a discriminator network, which turns the model into a Generative Adversarial Network (GAN) architecture. In addition to these models, a method not based on machine learning was implemented to act as a baseline. The models were evaluated using both quantitative and qualitative measures, including grammatical correctness and equivalence to the source question. The results show that the deep generative models outperformed the baseline across all quantitative metrics. Furthermore, the qualitative evaluation showed that the deep generative models generated grammatically correct questions more often than the baseline, but there was no noticeable difference between the models in terms of semantic equivalence to the source question.
|