Global ETD Search

31	Generating Wikipedia Articles with Grammatical Framework : A Case Study / Generering av Wikipedia-artiklar med Grammatical Framework : En fallstudie
Matinzadeh, Keivan. January 2023.
Natural language generation is a method for producing understandable text in human languages from data [1]. Grammatical Framework is a grammar formalism and functional programming language that takes a non-statistical approach to building natural language applications. It separates semantics from syntax, achieving multilingualism by mapping the same semantic model to several syntaxes [2]. Grammatical Framework also has a large library, the Resource Grammar Library, which offers programmers ready-made functions for building words and sentences in over 30 languages [3]. This report investigates whether Grammatical Framework can be used successfully for natural language generation, creating Wikipedia articles from data taken from Wikidata. A grammar and a program have been built to generate Swedish articles about urban areas in Sweden. The grammar is built around the structure of the first three sentences of the Swedish article about the urban area Linköping. The grammar and program are then extended to generate the same articles in English and French. The results show that Grammatical Framework can be used with some success to generate short Wikipedia articles in different languages from Wikidata input. While all texts were coherent, the Swedish texts contained the fewest grammatical mistakes. The biggest drawback is the rule forbidding pattern matching on run-time arguments, which severely limits the programmer because many functions in the Resource Grammar Library use pattern matching internally.
Even though Grammatical Framework does not solve the whole problem, it is a powerful enough tool to be suitable for natural language generation. Its main advantage is that it relieves the programmer of grammar-related tasks such as inflection and gender agreement.
Keywords: Grammatical Framework; Computational Linguistics; Natural Language Generation; Computer Science
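The central idea in the abstract above, one semantic model mapped to several concrete syntaxes, can be illustrated with a minimal Python sketch. It is not Grammatical Framework code and does not use the Resource Grammar Library. The record fields, the country ID scheme, the place name "Exempelstad" and the population figure are all hypothetical, and digit grouping uses a plain space as a simplification.

```python
from dataclasses import dataclass

# Hypothetical language-independent ("abstract") record for one urban area.
@dataclass(frozen=True)
class UrbanArea:
    name: str        # proper name, shared by all languages
    country: str     # abstract country ID (hypothetical scheme), e.g. "SWE"
    population: int

# Per-language lexicon for abstract country IDs.
COUNTRY = {
    "eng": {"SWE": "Sweden"},
    "swe": {"SWE": "Sverige"},
    "fra": {"SWE": "Suède"},
}

def _digits(n: int, sep: str) -> str:
    # The "," format spec groups thousands with commas; swap in the language's separator.
    return f"{n:,}".replace(",", sep)

# One "concrete syntax" per language: each maps the same record to a sentence.
def lin_eng(a: UrbanArea) -> str:
    return (f"{a.name} is an urban area in {COUNTRY['eng'][a.country]} "
            f"with {_digits(a.population, ',')} inhabitants.")

def lin_swe(a: UrbanArea) -> str:
    return (f"{a.name} är en tätort i {COUNTRY['swe'][a.country]} "
            f"med {_digits(a.population, ' ')} invånare.")

def lin_fra(a: UrbanArea) -> str:
    return (f"{a.name} est une localité de {COUNTRY['fra'][a.country]} "
            f"qui compte {_digits(a.population, ' ')} habitants.")

LINEARIZERS = {"eng": lin_eng, "swe": lin_swe, "fra": lin_fra}

def generate(area: UrbanArea) -> dict:
    """Linearize one abstract record into every supported language."""
    return {lang: lin(area) for lang, lin in LINEARIZERS.items()}

if __name__ == "__main__":
    for lang, text in generate(UrbanArea("Exempelstad", "SWE", 12345)).items():
        print(lang, text)
```

Plain templates break as soon as agreement matters: a masculine country such as Denmark needs "du Danemark" in French, not "de Danemark". Handling that kind of inflection and agreement is what the abstract credits Grammatical Framework's resource grammars with.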
32	Automating Question Generation Given the Correct Answer / Automatisering av frågegenerering givet det rätta svaret
Cao, Haoliang. January 2020.
In this thesis, we propose an end-to-end deep learning model for question generation. Given a Wikipedia article written in English and a segment of text appearing in it, the model generates a simple question whose answer is that segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for question generation that assesses both the syntactic correctness and the relevance of the generated questions. According to our analysis of sampled data, the new metric evaluates better than other popular metrics for sequence-to-sequence tasks.
Keywords: Natural Language Processing (NLP); Natural Language Generation (NLG); Question Generation; Computer and Information Sciences
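The abstract above does not detail its decoder. As a hedged illustration of what "self-attention" means on the decoder side, the following pure-Python sketch computes masked (causal) scaled dot-product attention, in which position i may attend only to positions 0..i. It is a toy with no learned projections, no multiple heads and no BERT encoder, and it is not the thesis model.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Masked scaled dot-product attention over one sequence.

    Each argument is a list of equal-length vectors, one per position.
    Position i only sees positions 0..i, so future tokens cannot leak
    into the representation used to predict the next token.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scaled dot products against visible (non-future) positions only.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors, one output component at a time.
        outputs.append([sum(w * values[j][c] for j, w in enumerate(weights))
                        for c in range(len(values[0]))])
    return outputs
```

With identical queries and keys, the first position returns its own value vector and the second position returns the average of the first two value vectors.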
33	Génération automatique de lettres de recrutement [Automatic Generation of Recruitment Letters]
Grand'Maison, Philippe. 02 1900.
This master's thesis presents the development of a natural language generation system that automates the first-contact letters sent by professional headhunters. A top-down approach modelled on Ehud Reiter's work handles the text generation portion of the system. Content generation is based on association rules obtained by statistical analysis of a large database of LinkedIn profiles. The system writes letters in English but can easily be extended to French. This project is part of the Butterfly Predictive Project, a collaboration between Université de Montréal and LittleBIGJob.
Keywords: Natural language generation; Data mining; Human resources; Recruitment
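Content selection in the thesis above relies on association rules mined from LinkedIn profiles. The minimal sketch below computes the two standard rule measures, support and confidence, and keeps single-item rules above given thresholds. The profile "skill tags" are hypothetical toy data, and the thesis's actual mining procedure is not reproduced. A full miner such as Apriori would also handle larger itemsets.

```python
from itertools import permutations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    s_a = support(transactions, antecedent)
    return support(transactions, antecedent | consequent) / s_a if s_a else 0.0

def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for single-item rules A -> C."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        s = support(transactions, {a, c})
        if s < min_support:
            continue  # too rare to be worth mentioning in a letter
        conf = confidence(transactions, {a}, {c})
        if conf >= min_confidence:
            rules.append((a, c, s, conf))
    return rules

# Hypothetical toy profiles, one set of skill tags per person.
PROFILES = [
    {"python", "data science"},
    {"python", "data science", "sql"},
    {"python"},
    {"java", "sql"},
]
```

On these toy profiles, "data science -> python" holds with confidence 1.0, while the reverse rule only reaches 2/3, which illustrates why rule direction matters when deciding what to mention.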
34	JSreal : un réalisateur de texte pour la programmation web [JSreal: A Text Realiser for Web Programming]
Daoust, Nicolas. 09 1900.
Website accompanying the thesis: http://daou.st/JSreal
Natural language generation, a branch of artificial intelligence, studies the development of systems that produce text for different applications, for example the textual description of massive datasets or the automation of routine writing. Text generation projects consist of multiple steps: determining the content to be expressed, organising it into structures such as paragraphs and sentences, and producing human-readable character strings. This last step is called realisation, and it is the one this thesis addresses.
The web is a constantly growing platform whose increasingly dynamic content often lends itself to automation by a realiser. However, existing realisers are not designed with the web in mind, and using them requires considerable expertise. This master's thesis presents JSreal, a realiser designed specifically for the web that is easy to learn and use. JSreal lets its user build a variety of French expressions and sentences that respect the rules of grammar and syntax, add HTML tags to them and easily integrate them into web pages.
Keywords: Natural language processing; Natural language generation; Text realisation
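JSreal itself is a JavaScript library with its own API, which is not shown here. The hedged Python sketch below only mimics the idea described above: it builds a French noun phrase whose article and adjective agree with the noun, turns it into a sentence, and wraps the result in an HTML element. All function names are hypothetical, and only regular agreement is covered (plural -s, feminine -e, no elision such as "l'", postposed adjectives only).

```python
from html import escape

# Definite articles by (gender, number); elision to "l'" is not handled.
ARTICLES = {("m", "s"): "le", ("f", "s"): "la", ("m", "p"): "les", ("f", "p"): "les"}

def noun_phrase(noun, gender, number, adjective=None):
    """Article + noun (+ postposed adjective), with regular agreement only."""
    words = [ARTICLES[(gender, number)], noun + ("s" if number == "p" else "")]
    if adjective:
        adj = adjective + ("e" if gender == "f" else "")
        words.append(adj + ("s" if number == "p" else ""))
    return " ".join(words)

def sentence(*parts):
    """Join constituents, capitalise the first letter and add a full stop."""
    text = " ".join(parts)
    return text[0].upper() + text[1:] + "."

def tag(name, content, attrs=None):
    """Wrap content, treated as plain text and escaped, in an HTML element."""
    attrs = attrs or {}
    attr_text = "".join(f' {k}="{escape(str(v))}"' for k, v in attrs.items())
    return f"<{name}{attr_text}>{escape(content)}</{name}>"

if __name__ == "__main__":
    print(tag("p", sentence(noun_phrase("chat", "m", "p", "noir"), "dorment")))
```

Because the content is escaped, this sketch cannot nest tags. A real realiser would keep an element tree instead of strings.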
36	Combining machine learning and rule-based approaches in Spanish syntactic generation
Melero Nogués, Maria Teresa. 02 June 2006.
This thesis describes a Spanish generation grammar that combines hand-written rules and machine learning techniques. The grammar belongs to a full-scale, commercial-quality machine translation (MT) system developed at Microsoft Research. The first part presents the grammar and the main linguistic strategies it implements. Because the MT system must be robust in everyday real-world use, the generator requires extra effort. This is resolved by adding a pre-generation layer that repairs the input to generation without adding ad hoc elements to the grammar rules. In the second part, we explore the use of decision tree (DT) classifiers to learn automatically one of the operations that take place in the pre-generation component, namely lexical selection of the Spanish copula (ser or estar). We show that the contexts for this non-trivial linguistic phenomenon can be inferred from examples with high accuracy.
Keywords: Spanish copula; copular verb; robustness; decision trees; machine learning; statistical methods; sentence realizers; syntactic generation; natural language generation; machine translation; 004; 81
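The abstract above does not specify its decision-tree learner or feature set. As an illustration of the technique only, the following sketch implements ID3-style learning, which splits on the feature with the highest information gain, and applies it to a toy ser/estar choice. The two features and the five labelled examples are hypothetical and far simpler than any real copula-selection model. They encode three textbook patterns: a noun predicate takes ser ("es médico"), the location of an entity takes estar ("el libro está en la mesa"), and the location of an event takes ser ("la fiesta es en casa").

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feat):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feat], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def id3(rows, labels, feats):
    """Return a leaf label or a dict node {feature, branches, default}."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not feats:
        return majority
    best = max(feats, key=lambda f: info_gain(rows, labels, f))
    node = {"feature": best, "branches": {}, "default": majority}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                      [f for f in feats if f != best])
    return node

def predict(tree, row):
    # Walk down the tree; unseen feature values fall back to the node's majority label.
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree

# Hypothetical toy training data (predicate type, whether the subject is an event).
ROWS = [{"pred": "noun", "event": "no"}, {"pred": "noun", "event": "no"},
        {"pred": "location", "event": "no"}, {"pred": "location", "event": "no"},
        {"pred": "location", "event": "yes"}]
LABELS = ["ser", "ser", "estar", "estar", "ser"]
```

On this data the learner splits first on the predicate type (gain ≈ 0.42 bits versus ≈ 0.17 for the event feature), then on the event feature within the location branch.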
37	Semantiska modeller för syntetisk textgenerering - en jämförelsestudie / Semantic Models for Synthetic Textgeneration - A Comparative Study
Åkerström, Joakim; Peñaloza Aravena, Carlos. January 2018.
The ability to express feelings in words, or to move others with them, has always been an admired and rare gift. This project involves creating a text generator able to write in the style of remarkable men and women who had this gift. We trained a neural network on quotes by outstanding writers such as Oscar Wilde, Mark Twain and Charles Dickens. The network works together with two different semantic models, Word2Vec and One-Hot, and these three components make up our text generator. Using the generated texts, we conducted a survey to collect students' opinions on their quality and thereby evaluate the suitability of the two semantic models. Most respondents found the texts coherent and fun to read, and Word2Vec performed significantly better than One-Hot, though not by an order of magnitude.
Keywords: semantic models; word embeddings; natural language processing; natural language generation; synthetic text generation; Information Systems
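One plausible reason for the gap between the two semantic models compared above is that one-hot vectors encode no similarity between words, whereas dense embeddings can. The sketch below shows this with cosine similarity. The dense vectors are hypothetical hand-picked values standing in for trained Word2Vec embeddings, which would be learned from co-occurrence data.

```python
import math

def one_hot(word, vocab):
    """Vector as long as the vocabulary, with a single 1.0 at the word's index."""
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

VOCAB = ["love", "adore", "table"]
# Hypothetical 2-d vectors standing in for trained Word2Vec embeddings.
DENSE = {"love": [0.9, 0.1], "adore": [0.8, 0.2], "table": [0.1, 0.9]}

if __name__ == "__main__":
    # Distinct one-hot vectors are orthogonal, so similarity is 0.0.
    print(cosine(one_hot("love", VOCAB), one_hot("adore", VOCAB)))
    # Dense vectors can place related words close together.
    print(cosine(DENSE["love"], DENSE["adore"]), cosine(DENSE["love"], DENSE["table"]))
```

One-hot vectors also grow with the vocabulary, while embedding dimensions are fixed in advance.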
38	Un dictionnaire de régimes verbaux en mandarin [A Dictionary of Verb Government Patterns in Mandarin]
He, Linna. 12 1900.
This thesis is part of the GenDR project, a multilingual deep realizer that models the semantics-syntax interface for natural language generation (NLG). In NLG, lexical resources are essential for transforming non-linguistic data into natural language, and to a certain extent they determine the accuracy and flexibility of the sentences a realizer generates. Because verbs' government patterns are unpredictable and verbs play a central role in an utterance, a lexical resource describing verb government patterns is key to generating the most precise and natural text possible. We set out to create a dictionary of verb government patterns in Mandarin, a kind of lexical resource that is still missing for NLG in Mandarin. Starting from the Mandarin VerbNet database, we used Python to extract the governed adpositions and build our dictionary. It is a dynamic dictionary whose content can be parameterized according to the user's needs.
Keywords: Natural language generation; Linguistic realization; Mandarin; Verbs; Government patterns
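The abstract above describes a dynamic dictionary built with Python, but not its code or the Mandarin VerbNet data format. The hedged sketch below shows one way such a dictionary could be parameterized. The (verb, governed adposition) records are hypothetical hand-written examples, not extracted VerbNet data, and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical (verb, governed adposition) records.
RECORDS = [
    ("结婚", "跟"),    # jiéhūn 'marry': 跟 X 结婚
    ("结婚", "和"),    # 和 X 结婚
    ("感兴趣", "对"),  # gǎn xìngqù 'be interested': 对 X 感兴趣
    ("打电话", "给"),  # dǎ diànhuà 'phone': 给 X 打电话
]

def build_dictionary(records, adpositions=None):
    """Map each verb to its sorted governed adpositions.

    If `adpositions` is given, only records with those adpositions are kept,
    so the same data can yield different dictionaries for different users.
    """
    entries = defaultdict(set)
    for verb, adp in records:
        if adpositions is None or adp in adpositions:
            entries[verb].add(adp)
    return {verb: sorted(adps) for verb, adps in entries.items()}

if __name__ == "__main__":
    print(build_dictionary(RECORDS))
    print(build_dictionary(RECORDS, adpositions={"对"}))
```

A realizer could then look up a verb before linearizing its prepositional argument.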
39	Génération de phrases multilingues par apprentissage automatique de modèles de phrases / Multilingual Natural Language Generation using sentence models learned from corpora
Charton, Éric. 12 November 2010.
Natural language generation (NLG) is the natural language processing task of generating natural language from a machine representation. In this thesis, we present an NLG system architecture that relies exclusively on statistical methods. Its originality lies in using a corpus as a resource for producing sentences. This method has several advantages. It simplifies the implementation and design of a multilingual NLG system able to produce sentences with the same meaning in several languages, and it improves the system's adaptability to a particular semantic domain. In our proposal, sentences are generated from sentence models obtained from a training corpus. Extracted sentences are abstracted through a labelling step that combines information extraction and text mining methods such as named entity recognition, coreference resolution, semantic labelling and part-of-speech tagging. Generation itself is performed by a sentence realisation module, which uses information retrieval algorithms to select a pre-existing sentence model suited to a communicative intent and then transforms that model into a new sentence. Two transformation methods are proposed, chosen according to the complexity of the semantic content to express. This document describes the complete labelling system applied to encyclopaedic content to obtain the sentence models, and then the two generation models. The first substitutes the semantic content to be expressed for the content of an original sentence. The second finds several Subject-Verb-Object proto-sentences, each covering part of a communicative intent, and aggregates them into a more complex sentence.
Our experiments with various configurations of the system show that this new approach to NLG has promising potential and may open new avenues for investigating how sentences are formed.
Keywords: Natural language generation; Sentence generation; Statistical learning; Syntax; Information extraction; Aggregation
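The two generation models described above can be illustrated in a greatly simplified form. In the thesis, sentence models are learned from a labelled corpus and selected with information retrieval. The sketch below skips learning and selection and shows only the two transformations. The first fills the labelled slots of a hand-written model with new content. The second aggregates Subject-Verb-Object proto-sentences that share a subject. The bracketed slot notation and the function names are hypothetical.

```python
import re

# Hypothetical slot notation: upper-case labels in square brackets, e.g. [PERSON].
SLOT = re.compile(r"\[([A-Z]+)\]")

def realise_model(model, content):
    """Substitute new semantic content into the labelled slots of a sentence model."""
    return SLOT.sub(lambda m: content[m.group(1)], model)

def aggregate(protos):
    """Merge (subject, verb, object) proto-sentences sharing a subject into one sentence each."""
    by_subject = {}  # dicts keep insertion order, so subjects appear in input order
    for subj, verb, obj in protos:
        by_subject.setdefault(subj, []).append(f"{verb} {obj}")
    sentences = []
    for subj, vps in by_subject.items():
        body = vps[0] if len(vps) == 1 else ", ".join(vps[:-1]) + " and " + vps[-1]
        sentences.append(f"{subj} {body}.")
    return " ".join(sentences)

if __name__ == "__main__":
    print(realise_model("[PERSON] was born in [CITY].",
                        {"PERSON": "Ada Lovelace", "CITY": "London"}))
    print(aggregate([("Ada Lovelace", "was born in", "London"),
                     ("Ada Lovelace", "wrote", "notes on the Analytical Engine")]))
```

Aggregation by shared subject is only the simplest case. A statistical system would also have to decide which proto-sentences fit the communicative intent.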
40	L'implémentation des relatives dans un réalisateur profond [Implementing Relative Clauses in a Deep Realizer]
Portenseigne, Charlotte. 10 1900.
This master's thesis is about implementing French relative clauses in the multilingual deep realizer GenDR. Surface realizers (SimpleNLG, JSReal or RealPro) generate relative clauses, but in deep realizers (MARQUIS, Forge or GenDR) their handling remains rudimentary. In a French corpus of 21,461 sentences, 4,505 contain a relative clause, i.e. about one sentence in five, so it is a core linguistic phenomenon that GenDR should handle. Our theoretical framework is Meaning-Text theory, in which relative clauses belong to the semantics-syntax interface. We offer a typology of relative clauses: we define the relative clause and divide it into two main categories, direct and indirect. Our definition of relative pronouns is based on Riegel et al. (2018). We used GREW to analyze a French corpus annotated in SUD. Direct relatives (≈78%) outnumber indirect ones (≈22%). The most frequent pronouns are qui (58.8%), que (13.8%), dont (10.2%) and où (10%), followed by a preposition plus lequel (5.7%), a preposition plus qui (0.7%), lequel (0.4%) and a preposition plus quoi (0.1%). The most frequent syntactic role of the modified noun is direct object. We then implemented in GenDR the rules for direct and indirect relatives and for the relative pronouns qui, que, dont, preposition + qui and preposition + lequel. Our implementation covers the most common types of relatives. Phenomena our rules do not yet handle well are the generation of the pronouns lequel, preposition + quoi, où and object qui; the treatment of modal verbs, including sentences with an infinitive after a modal; and the treatment of support verbs and other collocations. Our implementation is for French, but it can easily be adapted to other languages.
Keywords: relative clause; natural language generation; text realizer; Meaning-Text theory
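GenDR rules are not written in Python, and the thesis's rules are not reproduced here. As a hedged sketch, the function below chooses among the five pronoun types the abstract says were implemented (qui, que, dont, preposition + qui, preposition + lequel), keyed on the relativized noun's role inside the relative clause. The role names are hypothetical. The sketch handles gender and number agreement of lequel and the contractions with à and de, but not elision of que to qu' or the cases the abstract lists as unresolved.

```python
# Forms of "lequel" by (gender, number).
LEQUEL = {("m", "s"): "lequel", ("f", "s"): "laquelle",
          ("m", "p"): "lesquels", ("f", "p"): "lesquelles"}
# Obligatory contractions of "à" and "de" with masculine and plural forms.
CONTRACTIONS = {("à", "lequel"): "auquel", ("à", "lesquels"): "auxquels",
                ("à", "lesquelles"): "auxquelles", ("de", "lequel"): "duquel",
                ("de", "lesquels"): "desquels", ("de", "lesquelles"): "desquelles"}

def relative_pronoun(role, gender="m", number="s", prep=None, human=False):
    """Pick a French relative pronoun from the relativized noun's role in the relative clause."""
    if role == "subject":
        return "qui"
    if role == "direct_object":
        return "que"  # elision to "qu'" before a vowel is not handled
    if role == "de_complement":
        return "dont"
    if role == "prep_object":
        if prep is None:
            raise ValueError("prep_object requires a preposition")
        if human:
            return f"{prep} qui"  # e.g. "la personne avec qui je travaille"
        form = LEQUEL[(gender, number)]
        return CONTRACTIONS.get((prep, form), f"{prep} {form}")
    raise ValueError(f"unsupported role: {role}")

if __name__ == "__main__":
    print("la table sur", relative_pronoun("prep_object", "f", "s", "sur")[4:], "je mange")
    print("le projet", relative_pronoun("prep_object", "m", "s", "à"), "je pense")
```

Keying the choice on syntactic role mirrors the direct/indirect distinction in the abstract: qui and que cover direct relatives, and the other forms cover indirect ones.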
