1 |
Improving the efficiency and capabilities of document structuring
Marshall, Robert, January 2007
Natural language generation (NLG), the problem of creating human-readable documents by computer, is one of the major fields of research in computational linguistics. The task of creating a document is extremely common in many fields of activity. Accordingly, there are many potential applications for NLG: almost any document creation task could potentially be automated by an NLG system. Advanced forms of NLG could also be used to generate a document in multiple languages, or as an output interface for other programs that would ordinarily produce a less manageable collection of data. Such systems may also be able to create documents tailored to the needs of individual users. This thesis deals with document structure, a recent theory which describes those aspects of a document’s layout which affect its meaning. As well as being of theoretical interest, document structure is a useful intermediate representation in the process of NLG. There is a well-defined process for generating a document structure using constraint programming. We show how this process can be made considerably more efficient. This in turn allows us to extend the document structuring task to allow for summarisation and finer control of the document layout. This thesis is organised as follows. Firstly, we review the necessary background material in both natural language processing and constraint programming.
2 |
A Decision Theoretic Approach to Natural Language Generation
McKinley, Nathan D., 21 February 2014
No description available.
3 |
Algorithms and Resources for Scalable Natural Language Generation
Pfeil, Jonathan W., 01 September 2016
No description available.
4 |
Development of MegaTIC as a new tool for genome engineering and analysis of GABAA receptor localization mechanisms in Caenorhabditis elegans / Développement de MegaTIC comme un nouvel outil pour l'ingénierie du génome et analyse des mécanismes de localisation des récepteurs GABA dans Caenorhabditis elegans
Ji, Tingting, 29 September 2015
CRISPR/Cas-9-based techniques have been widely used to engineer genes in C. elegans by homologous recombination. However, the efficiency with which engineered animals are selected needs to be improved. We have developed miniSOG as a counter-selection marker for use in a dual selection strategy. Animals containing the dual selection cassette HySOG are first selected for resistance to the antibiotic hygromycin B. HySOG is then excised and customized gene modifications are inserted at the target site. Recombinant strains are selected for resistance to blue light exposure, which otherwise kills miniSOG-expressing worms. We demonstrate that the meganuclease I-SceI can be used to excise HySOG and to introduce gene modifications in the C. elegans germline as efficiently as CRISPR/Cas-9. Genome-engineering techniques were used to tag the GABAAR subunit UNC-49 with RFP and to analyze the roles of MADD-4/Punctin, UNC-40/DCC and NLG-1/neuroligin in GABAAR clustering at GABAergic neuromuscular junctions (NMJs). We show that MADD-4/Punctin, an extracellular matrix protein secreted by GABAergic motor neurons, is a new ligand of NLG-1/neuroligin and of UNC-40/DCC and functions as a central anterograde organizer of GABAergic synapses. First, the short isoform of MADD-4, MADD-4B, directly binds NLG-1/neuroligin and localizes it to the post-synaptic membrane of GABAergic synapses. Second, MADD-4B binds, recruits and likely activates the netrin receptor UNC-40/DCC, which in turn promotes the interaction of GABAAR with NLG-1/neuroligin and its localization at the synapse.
5 |
Data-driven natural language generation using statistical machine translation and discriminative learning / L'approche discriminante à la génération de la parole
Manishina, Elena, 05 February 2016
Humanity has long been passionate about creating intelligent machines that can communicate freely with us in our own language. Most modern systems that communicate directly with the user share one common feature: they have a dialog system (DS) at their base. Today almost all DS components have embraced statistical methods and use them widely as their core models. Until recently, the natural language generation (NLG) component of a dialog system relied primarily on hand-coded generation templates, which mapped model phrases in a natural language to a particular semantic content. Today data-driven models are making their way into the NLG domain. In this thesis, we follow this new line of research and present several novel data-driven approaches to natural language generation. We focus on two important aspects of NLG system development: building an efficient generator and diversifying its output. We defend two key ideas. First, the task of NLG can be regarded as translation between a natural language and a formal meaning representation, and can therefore be performed using statistical machine translation techniques. Second, corpus extension and diversification, which traditionally involved manual paraphrasing and rule crafting, can be performed automatically using well-known and widely used synonym and paraphrase extraction methods. Concerning the first idea, we investigate the possibility of using an n-gram-based (NGRAM) translation framework and explore the potential of discriminative learning, notably Conditional Random Fields (CRF) models, as applied to NLG; we build a generation pipeline that allows different generation models (NGRAM and CRF) to be included and combined, and that uses an efficient decoding framework (best-path search over finite-state transducers). Regarding the second objective, corpus extension, we propose to enlarge the system's vocabulary and its set of available syntactic structures by integrating automatically obtained synonyms and paraphrases into the training corpus. To our knowledge, there have been no previous attempts to increase the size of the system vocabulary by incorporating synonyms. To date, most studies on corpus extension have focused on paraphrasing and have resorted to crowd-sourcing to obtain paraphrases, which then required additional manual validation, often performed by system developers. We show that automatic corpus extension by means of paraphrase extraction and validation is just as effective as crowd-sourcing, while being less costly in terms of development time and resources. In intermediate experiments, our generation models performed significantly better than the phrase-based baseline model and appeared to be more robust in handling unknown combinations of concepts than the current in-house rule-based generator. The final human evaluation confirmed that our data-driven NLG models are a viable alternative to rule-based generators.
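The idea of corpus extension through synonyms described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synonym table, the meaning representation string and the function name `extend_corpus` are hypothetical, and the thesis obtains synonyms and paraphrases by automatic extraction and validation rather than from a hand-written table.

```python
from itertools import product

# Hypothetical synonym table (assumption); each list includes the original word first.
SYNONYMS = {
    "restaurant": ["restaurant", "place"],
    "cheap": ["cheap", "inexpensive"],
}

def extend_corpus(pairs, synonyms):
    """Return the (meaning, sentence) pairs plus variants with synonyms substituted."""
    extended = []
    for meaning, sentence in pairs:
        tokens = sentence.split()
        # For each token, list the alternatives that may replace it.
        options = [synonyms.get(tok, [tok]) for tok in tokens]
        for variant in product(*options):
            candidate = (meaning, " ".join(variant))
            if candidate not in extended:
                extended.append(candidate)
    return extended

corpus = [("inform(price=cheap)", "this restaurant is cheap")]
for meaning, sentence in extend_corpus(corpus, SYNONYMS):
    print(meaning, "->", sentence)
```

Each training pair keeps its meaning representation while the surface sentence is varied, which is one simple way to enlarge the vocabulary a data-driven generator sees during training.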
6 |
Automatic generation of definitions: Exploring if GPT is useful for defining words
Eriksson, Fanny, January 2023
When reading a text, it is common to get stuck on unfamiliar words that are difficult to understand from the local context. In these cases, we use dictionaries or similar online resources to find the general meaning of the word. However, maintaining a handwritten dictionary is highly resource-demanding because language is constantly developing, and using generative language models to produce definitions could therefore be a more efficient option. To explore this possibility, this thesis uses an online survey to examine whether GPT could be useful for defining words. It also investigates how well the Swedish language model GPT-SW3 (3.5 b) defines words compared with the model text-davinci-003, and how prompts should be formatted when defining words with these models. The results indicate that text-davinci-003 generates high-quality definitions: according to Student's t-test, its definitions received significantly higher ratings from participants than definitions taken from Svensk ordbok (SO). GPT-SW3 (3.5 b) received the lowest ratings, indicating that more investment is needed to keep up with the large models developed by OpenAI. Regarding prompt formatting, the most appropriate prompt format for defining words depends heavily on the model: text-davinci-003 performed well in a zero-shot setting, while GPT-SW3 (3.5 b) required a few-shot setting. Considering both the high quality of the definitions generated by text-davinci-003 and the practical advantages of generating definitions automatically, GPT could be a useful method for defining words.
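The difference between the zero-shot and few-shot prompt settings mentioned in this abstract can be sketched as follows. The prompt wording, the function names and the example word pairs are assumptions made for illustration, not the prompts used in the thesis, and no model is called.

```python
def zero_shot_prompt(word):
    """Build a prompt that asks for a definition without any examples."""
    return f"Define the word '{word}'.\nDefinition:"

def few_shot_prompt(word, examples):
    """Build a prompt that shows (word, definition) examples before the target word."""
    lines = []
    for ex_word, ex_def in examples:
        lines.append(f"Word: {ex_word}\nDefinition: {ex_def}\n")
    # The final entry is left open so the model completes the definition.
    lines.append(f"Word: {word}\nDefinition:")
    return "\n".join(lines)

# Hypothetical demonstration pairs (assumption).
examples = [
    ("bicycle", "a vehicle with two wheels that is moved by pedalling"),
    ("lake", "a large area of water surrounded by land"),
]
print(zero_shot_prompt("glacier"))
print(few_shot_prompt("glacier", examples))
```

A few-shot prompt gives a smaller model a pattern to continue, which is consistent with the abstract's finding that GPT-SW3 (3.5 b) needed examples while text-davinci-003 did not.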
7 |
Automating Question Generation Given the Correct Answer / Automatisering av frågegenerering givet det rätta svaret
Cao, Haoliang, January 2020
In this thesis, we propose an end-to-end deep learning model for a question generation task. Given a Wikipedia article written in English and a segment of text appearing in the article, the model can generate a simple question whose answer is the given text segment. The model is based on an encoder-decoder architecture. Our experiments show that a model with a fine-tuned BERT encoder and a self-attention decoder gives the best performance. We also propose an evaluation metric for the question generation task, which evaluates both the syntactic correctness and the relevance of the generated questions. According to our analysis of sampled data, the new metric gives a better evaluation than other popular metrics for sequence-to-sequence tasks.
8 |
Não-linearidade física e geométrica no projeto de edifícios usuais de concreto armado / Physical and geometrical non-linearity in design of usual reinforced concrete buildings
Pinto, Rivelli da Silva, 26 April 1997
This work discusses simplified procedures for considering physical non-linearity (NLF, from the Portuguese "não linearidade física") and geometrical non-linearity (NLG) in the analysis of reinforced concrete buildings, with the aim of establishing how reliable these procedures are. For physical non-linearity, some prescriptions for reducing the stiffness (inertia) of structural elements are compared with results obtained from finite element models, which allows those prescriptions to be assessed. For geometrical non-linearity, a detailed study is made of the γz parameter as an amplifier of first-order efforts to obtain final second-order efforts, so that the advantages and limitations of the parameter can be established more clearly. The behavior of the parameter along the height of the building and for each effort considered is also shown.
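The abstract does not state how γz is computed. As a hedged sketch, the code below uses the definition commonly given in the Brazilian code NBR 6118, γz = 1 / (1 − ΔM_tot,d / M_1,tot,d), where ΔM_tot,d is the sum of the vertical design loads multiplied by the first-order horizontal displacements of their points of application, and M_1,tot,d is the first-order overturning moment of the horizontal design loads about the base. The function names and the storey data are assumptions made for illustration and do not come from the thesis.

```python
def gamma_z(delta_m_tot_d, m1_tot_d):
    """Return gamma_z from the second-order increment and first-order overturning moment."""
    ratio = delta_m_tot_d / m1_tot_d
    if ratio >= 1.0:
        # The expression is only meaningful while the ratio stays below 1.
        raise ValueError("delta_m_tot_d must be smaller than m1_tot_d")
    return 1.0 / (1.0 - ratio)

def gamma_z_from_storeys(storeys):
    """Compute gamma_z from per-storey data.

    Each storey is a tuple:
    (vertical_load_kN, horizontal_force_kN, height_m, first_order_displacement_m)
    """
    # Vertical loads acting on the first-order horizontal displacements.
    delta_m = sum(p * d for p, _, _, d in storeys)
    # Horizontal forces times their height above the base.
    m1 = sum(h * z for _, h, z, _ in storeys)
    return gamma_z(delta_m, m1)

# Hypothetical two-storey frame (assumption).
storeys = [(1000.0, 20.0, 3.0, 0.002), (1000.0, 20.0, 6.0, 0.004)]
print(round(gamma_z_from_storeys(storeys), 4))
```

Used as an amplifier, the first-order horizontal efforts would be multiplied by the resulting value to estimate the final efforts including second-order effects, which is the role the abstract assigns to the parameter.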
9 |
Avaliação da não linearidade fisica na estabilidade global de edificios de concreto armado / Evaluation of physical no-linearity in the global stability of reinforced concrete buildings
Fontana, Luiz Antonio, 23 February 2006
Advisor: Francisco Antonio Menezes / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Civil, Arquitetura e Urbanismo / Previous issue date: 2006
Abstract: This work presents the mathematical expressions needed to determine the moment of inertia as a function of curvature for rectangular reinforced concrete sections subjected to combined bending and axial compression about one axis. The expressions are obtained by analytical integration of the concrete stress-strain curve (parabola-rectangle diagram). Based on these expressions, a computer program was developed to calculate the moment of inertia as a function of curvature. The objective is to obtain the actual moment of inertia of the section for use in checking deformations. The program requires the geometry, the arrangement of the reinforcement, the material properties and the actions on the section. The input data allow control of the partial safety factors for materials and actions. For the modulus of elasticity, either the values prescribed in NBR6118 or another user-supplied value may be adopted. The tensile strength of concrete was neglected. For validation, two structural models are analyzed. The structural analysis is carried out in several ways, one of which considers physical non-linearity (NLF) through the curvature. In this case the structural elements are discretized more finely, so that the non-linearity along the length of the members is taken into account. The aim is to evaluate the feasibility of a structural analysis procedure, for cases where second-order effects are relevant, that uses a moment of inertia bringing the mathematical model closer to the real physical one.
Master's / Structures / Master in Civil Engineering