581 |
Autoformalization of Mathematical Proofs from Natural Language to Proof Assistants
Cunningham, Garett 04 May 2022
No description available.
|
582 |
Email Classification with Machine Learning and Word Embeddings for Improved Customer Support
Rosander, Oliver, Ahlstrand, Jim January 2018
Classifying emails into distinct labels can have a great impact on customer support. By using machine learning to label emails, the system can set up queues containing emails of a specific category. This enables support personnel to handle requests more quickly and easily by selecting a queue that matches their expertise. This study aims to improve on the manually defined rule-based algorithm currently implemented at a large telecom company by using machine learning. The proposed model should have a higher F1-score and classification rate. Integrating or migrating from a manually defined rule-based model to a machine learning model should also reduce the administrative and maintenance work and make the model more flexible. Using the frameworks TensorFlow, Scikit-learn and Gensim, the authors conduct five experiments to test the performance of several common machine learning algorithms, text representations and word embeddings, and how they work together. In this article, a web-based interface was implemented which can classify emails into 33 different labels with a 0.91 F1-score using a Long Short-Term Memory network. The authors conclude that Long Short-Term Memory networks outperform non-sequential models such as Support Vector Machines and AdaBoost when predicting labels for emails.
|
583 |
Simulating Expert Clinical Comprehension: Adapting Latent Semantic Analysis to Accurately Extract Clinical Concepts From Psychiatric Narrative
Cohen, Trevor, Blatter, Brett, Patel, Vimla 01 December 2008
Cognitive studies reveal that less-than-expert clinicians are less able to recognize meaningful patterns of data in clinical narratives. Accordingly, psychiatric residents early in training fail to attend to information that is relevant to diagnosis and the assessment of dangerousness. This manuscript presents a cognitively motivated methodology for simulating the expert ability to organize relevant findings that support intermediate diagnostic hypotheses. Latent Semantic Analysis is used to generate a semantic space from which meaningful associations between psychiatric terms are derived. Diagnostically meaningful clusters are modeled as geometric structures within this space and compared to elements of psychiatric narrative text using semantic distance measures. A learning algorithm is defined that alters components of these geometric structures in response to labeled training data. Extraction and classification of relevant text segments are evaluated against expert annotation, with system-rater agreement approximating rater-rater agreement. A range of biomedical informatics applications for these methods is suggested.
|
584 |
Locating SQL Injection Vulnerabilities in Java Byte Code Using Natural Language Techniques
Jackson, Kevin A., Bennett, Brian T. 01 October 2018
With so much of our daily lives relying on digital devices like personal computers and cell phones, there is a growing demand for code that not only functions properly, but is also secure and keeps user data safe. However, ensuring this is not an easy task, and many developers do not have the skills or resources required to ensure their code is secure. Many code analysis tools have been written to find vulnerabilities in newly developed code, but this technology tends to produce many false positives and is still not able to identify all of the problems. Other methods of finding software vulnerabilities automatically are required. This proof-of-concept study applied natural language processing to Java byte code to locate SQL injection vulnerabilities in a Java program. Preliminary findings show that, due to the high number of terms in the dataset, single decision trees will not produce a suitable model for locating SQL injection vulnerabilities, while random forest structures proved more promising. Still, further work is needed to determine the best classification tool.
|
585 |
Symbolic Semantic Memory in Transformer Language Models
Morain, Robert Kenneth 16 March 2022
This paper demonstrates how transformer language models can be improved by giving them access to relevant structured data extracted from a knowledge base. The knowledge base preparation process and the modifications to transformer models are explained. We evaluate these methods on language modeling and question answering tasks. The results show that even simple additional knowledge augmentation leads to a 73% reduction in validation loss. These methods also significantly outperform common ways of improving language models, such as increasing the model size or adding more data.
|
586 |
Automatic language identification of short texts
Avenberg, Anna January 2020
The world is growing more connected through the use of online communication, exposing software and humans to all the world's languages. While devices are able to understand and share the raw data between themselves and with humans, the information itself is not expressed in a monolithic format. This causes issues in both human-to-computer interaction and human-to-human communication. Automatic language identification (LID) is a field within artificial intelligence and natural language processing that strives to solve a part of these issues by identifying languages from text, sign language and speech. One of the challenges is to identify the language of the short pieces of text that can be found online, such as messages, comments and posts on social media. This is due to the small amount of information they carry. The goal of this thesis has been to build a machine learning model that can identify the language of these short pieces of text. A long short-term memory (LSTM) machine learning model was built and benchmarked against Facebook's fastText model. The results show that the LSTM model reached an accuracy of around 95%, while the fastText model used for comparison reached an accuracy of 97%. The LSTM model struggled more when identifying texts shorter than 50 characters than with longer texts. The classification performance of the LSTM model was also relatively poor in cases where languages were similar, like Croatian and Serbian. Both the LSTM model and the fastText model reached accuracies above 94%, which can be considered high, depending on how it is evaluated. There are, however, many improvements and possible directions for future work to be considered: looking further into texts shorter than 50 characters, evaluating the model's softmax output vector values, and determining how to handle similar languages.
|
587 |
Automation of support service using Natural Language Processing: Automation of errands tagging
Haglund, Kristoffer January 2020
In this paper, Natural Language Processing and classification algorithms were used to create a program that can automatically tag the different errands that reach the support service of Fortnox (an IT company based in Växjö). Controlled experiments were conducted to find which classification algorithm, combined with which Bag-of-Words pre-processing algorithms, was best suited for this problem. All data were provided by Fortnox and had been manually labeled with tags, serving as training and test data. The final algorithm correctly predicted 69.15% of the errands when using all original data. When looking at the data that were incorrectly predicted, a pattern was noticed where many errands have identical text attached to them. By removing the majority of these errands, the result increased to 94.08%.
|
588 |
Automatically Generating Tests from Natural Language Descriptions of Software Behavior
Sunil Kamalakar, FNU 18 October 2013
Behavior-Driven Development (BDD) is an emerging agile development approach where all stakeholders (including developers and customers) work together to write user stories in structured natural language to capture a software application's functionality in terms of required "behaviors". Developers then manually write "glue" code so that these scenarios can be executed as software tests. This glue code represents individual steps within unit and acceptance test cases, and tools exist that automate the mapping from scenario descriptions to manually written code steps (typically using regular expressions). Instead of requiring programmers to write manual glue code, this thesis investigates a practical approach to convert natural language scenario descriptions into executable software tests fully automatically. To show feasibility, we developed a tool called Kirby that uses natural language processing techniques, code information extraction and probabilistic matching to automatically generate executable software tests from structured English scenario descriptions. Kirby relieves the developer of the laborious work of writing code for the individual steps described in scenarios, so that developers and customers can both focus on the scenarios as pure behavior descriptions (understandable to all, not just programmers). Results from assessing the performance and accuracy of this technique are presented. / Master of Science
|
589 |
Exploiting Linguistic and Statistical Knowledge in a Text Alignment System
Schrader, Bettina 20 February 2009
In machine translation, the alignment of corpora has evolved into a mature research area, aimed at providing training data for statistical or example-based machine translation systems. Moreover, the alignment information can be used for a variety of other purposes, including lexicography and the induction of tools for natural language processing. The alignment techniques used for these purposes fall roughly into two separate classes: sentence alignment approaches that often combine statistical and linguistic information, and word alignment models that are dominated by the statistical machine translation paradigm. Alignment approaches that use linguistic knowledge provided by corpus annotation are rare, as are non-statistical word alignment strategies. Furthermore, parallel corpora are typically not aligned at all text levels simultaneously. Rather, a corpus is first sentence aligned, and in a subsequent step, the alignment information is refined to go below the sentence level. In this thesis, the distinction between the two alignment classes is abandoned. Instead, a system is introduced that can simultaneously align at the paragraph, sentence, word, and phrase level. Furthermore, linguistic as well as statistical information can be combined. This combination of alignment cues from different knowledge sources, as well as the combination of the sentence and word alignment tasks, is made possible by the development of a modular alignment platform. Its main features are that it supports different kinds of linguistic corpus annotation and that it aligns a corpus hierarchically, such that sentence and word alignments are cohesive. Alignment cues are not used within a global alignment model. Rather, different sub-models can be implemented and allowed to interact. Most of the alignment modules of the system have been implemented using empirical corpus studies, aimed at showing how the most common types of corpus annotation can be exploited for the alignment task.
|
590 |
Semantik und Sentiment: Konzepte, Verfahren und Anwendungen von Text-Mining
Neubauer, Nicolas 06 June 2014
This thesis deals with two topic areas of data mining, or more precisely text mining, together with the associated algorithmic methods and concepts, and examines possible application scenarios. On the one hand, the field of semantic similarity is discussed: in short, the question of how to determine algorithmically how much two terms or concepts have to do with each other. The technology built around the knowledge that, for example, "rain" can be a component of "weather" enables a wide variety of applications. This thesis gives an overview of the common literature, divides the research field roughly into the two schools of knowledge-based and statistical methods, and makes a contribution in each by examining existing similarity measures and presenting new ones. A study with human participants, and a dataset resulting from it, finally provides insights into people's preferences regarding their perception of similarity. On the other hand, there is the field of sentiment mining, which attempts to algorithmically identify and classify moods and opinions in large collections of unstructured text, such as messages from Twitter or other social networks. After a discussion of the relevant literature, the construction of a new test dataset is motivated and the results of its collection are described. On this new basis, a large number of approaches and classification methods are evaluated in detail. Finally, the practical usability of the results is demonstrated using various application scenarios involving product presentations as well as media or public events such as the Bundestag election.
|