131 |
A Feature Structure Approach for Disambiguating Preposition Senses
Baglodi, Venkatesh 01 January 2009 (has links)
Word Sense Disambiguation (WSD) continues to be an open research problem in spite of recent advances in the NLP field, especially in machine learning. WSD for open-class words is well understood. However, WSD for closed-class structural words (such as prepositions) is not as well resolved, and their role in frame semantics seems to be a relatively unknown area. This research uses a new method to disambiguate preposition senses through a combined lookup of the FrameNet and TPP databases. Motivated by recent work by Popescu, Tonelli, & Pianta (2007), it extends the concept to provide a deterministic WSD of prepositions using lexical information drawn from the sentences in the local context. While the primary goal of the research is to disambiguate preposition senses, the approach also assigns frames and roles to different sentence elements. The use of prepositions for frame and role assignment seems to be a largely unexplored area which could provide a new dimension to research in lexical semantics.
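The abstract does not reproduce the thesis's implementation, so the following is only a minimal illustrative sketch of the lookup idea: each preposition carries a small inventory of candidate senses (standing in for TPP entries paired with FrameNet frames), and the sense whose cue words overlap most with the local sentence context is selected deterministically. The inventory, cue words, and function are invented for the example.

```python
# Illustrative sketch of a combined-lexicon preposition WSD lookup.
# The sense inventory below is invented; a real system would draw candidate
# senses from the TPP database and frame labels from FrameNet.

SENSE_INVENTORY = {
    "in": [
        {"sense": "in(temporal)", "frame": "Temporal_collocation",
         "cues": {"year", "month", "january", "morning", "2009"}},
        {"sense": "in(spatial)", "frame": "Interior_profile_relation",
         "cues": {"box", "room", "city", "country", "kitchen"}},
    ],
}

def disambiguate(preposition, context_words):
    """Pick the sense whose cue words overlap most with the local context."""
    context = {w.lower() for w in context_words}
    candidates = SENSE_INVENTORY.get(preposition.lower(), [])
    scored = [(len(entry["cues"] & context), entry) for entry in candidates]
    score, best = max(scored, key=lambda pair: pair[0], default=(0, None))
    return best if score > 0 else None

print(disambiguate("in", ["She", "was", "born", "in", "January"]))
# -> the temporal sense, mapped to the Temporal_collocation frame
```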
|
132 |
Exploring Emerging Entities and Named Entity Disambiguation in News Articles / Utforskande av Framväxande Entiteter och Disambiguering av Entiteter i Nyhetsartiklar
Ellgren, Robin January 2020 (has links)
Publicly editable knowledge bases such as Wikipedia and Wikidata have grown tremendously in size over the years. Despite this rapid growth, they can never be fully complete due to the continuous stream of events happening in the world. The task of Entity Linking attempts to link mentions of objects in a document to their corresponding entries in a knowledge base. However, due to the incompleteness of knowledge bases, new or emerging entities cannot be linked. Attempts to solve this issue have created the field referred to as Emerging Entities. Recent state-of-the-art work has addressed the issue with promising results in English. In this thesis, that previous work is examined by evaluating its method in the context of a much smaller language: Swedish. The results reveal an expected drop in overall performance, although the method remains relatively competitive. This indicates that the method is a feasible approach to the problem of Emerging Entities even for less widely used languages. Due to limitations in the scope of the related work, this thesis also suggests a method for evaluating how accurately Emerging Entities are modeled in a knowledge base. The study also provides a comprehensive look into the landscape of Emerging Entities and suggests further improvements.
|
133 |
Mapping medical expressions to MedDRA using Natural Language Processing
Wallner, Vanja January 2020 (has links)
Pharmacovigilance, also referred to as drug safety, is an important science for identifying risks related to medicine intake. Side effects of medicine can be caused by, for example, drug interactions, high dosage and misuse. In order to find patterns in what causes the unwanted effects, information needs to be gathered and mapped to predefined terms. Today this mapping is done manually by experts, which can be a very difficult and time-consuming task. The aim of this thesis is to automate the process of mapping side effects by using machine learning techniques. The model was developed using information from preexisting mappings of verbatim expressions of side effects. The final model made use of the pre-trained language model BERT, which has achieved state-of-the-art results within the NLP field. When evaluated on the test set, the final model reached an accuracy of 80.21%. Some verbatims proved very difficult for the model to classify, mainly because of ambiguity or a lack of information contained in the verbatim. As it is very important for the mappings to be done correctly, a confidence threshold was introduced so that the verbatims that were most difficult to classify were left for manual mapping. This manual step could in turn be supported by suggested terms generated by the model, which can serve as guidance for the specialist responsible for the mapping.
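The thesis does not publish its code in this abstract; the sketch below only illustrates the thresholding idea described above, assuming a fine-tuned sequence-classification model loaded through the Hugging Face transformers library. The checkpoint path, threshold value and label mapping are placeholders, not the thesis's actual artifacts.

```python
# Sketch of threshold-gated mapping of a verbatim side-effect expression to a
# MedDRA term. Checkpoint path and threshold are placeholders for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "path/to/fine-tuned-meddra-bert"   # hypothetical fine-tuned checkpoint
THRESHOLD = 0.80                                # below this, defer to a human coder

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def map_verbatim(verbatim: str, top_k: int = 3):
    inputs = tokenizer(verbatim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
    confidence, best = probs.max(dim=0)
    if confidence.item() >= THRESHOLD:
        # Confident enough: map automatically to the predicted MedDRA term.
        return {"term": model.config.id2label[best.item()], "auto": True}
    # Low confidence: return top-k suggestions to support the manual mapping.
    top = probs.topk(top_k)
    return {"suggestions": [model.config.id2label[i.item()] for i in top.indices],
            "auto": False}
```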
|
134 |
Distributed analyses of disease risk and association across networks of de-identified medical systems
McMurry, Andrew John 09 November 2015
Health information networks continue to expand under the Affordable Care Act, yet little research has been done on querying and analyzing multiple patient populations in parallel. Differences between hospitals relating to patient demographics, treatment approaches, disease prevalences, and medical coding practices all pose significant challenges for multi-site analysis and interpretation. Furthermore, numerous methodological issues arise when attempting to analyze disease association in heterogeneous health care settings. These issues will only continue to increase as greater numbers of hospitals are linked.
To address these challenges, I developed the Shared Health Research Informatics Network (SHRINE), a distributed query and analysis system used by more than 60 health institutions for a wide range of disease studies. SHRINE was used to conduct one of the largest comorbidity studies in Autism Spectrum Disorders. SHRINE has enabled population-scale studies in diabetes, rheumatology, public health, and pathology. Using Natural Language Processing, we de-identify physician notes and query pathology reports to locate human tissues for high-throughput biological validation. Samples and evidence obtained using these methods supported novel discoveries in human metabolism and peripartum cardiomyopathy, respectively.
Each hospital in the SHRINE network hosts a local peer database that cannot be overridden by any federal agency. SHRINE can search both coded clinical concepts and de-identified physician notes to obtain very large cohort sizes for analysis. SHRINE intelligently clusters phenotypic concepts to minimize differences in health care settings.
I then analyzed a statewide sample of all Massachusetts acute care hospitals and found diagnosis codes useful for predicting Acute Myocardial Infarction (AMI). The AMI association methods selected 96 clinical concepts. Manual review of PubMed citations supported the automated associations. AMI associations were most often discovered in the circulatory system and were most strongly linked to background diabetic retinopathy, diabetes with renal manifestations, and hypertension with complications. AMI risks were also strongly associated with chronic kidney failure, liver diseases, chronic airway obstruction, hemodialysis procedures, and medical device complications. Learning the AMI-associated risk factors improved disease predictions for patients in Massachusetts acute care hospitals.
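The abstract does not state which association statistic was used, so the choice of an odds ratio with Fisher's exact test below is an assumption made for illustration; the sketch only shows how the strength of association between one diagnosis code and AMI can be scored from aggregate counts of the kind a distributed cohort query returns.

```python
# Scoring the association between one diagnosis code and AMI from aggregate
# 2x2 counts. The statistic (odds ratio + Fisher's exact test) and the counts
# are assumptions for illustration only.
from scipy.stats import fisher_exact

def code_ami_association(with_code_ami, with_code_no_ami,
                         without_code_ami, without_code_no_ami):
    table = [[with_code_ami, with_code_no_ami],
             [without_code_ami, without_code_no_ami]]
    odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
    return odds_ratio, p_value

# Hypothetical counts for a single diagnosis code across a queried population.
odds, p = code_ami_association(120, 880, 400, 8600)
print(f"odds ratio = {odds:.2f}, p = {p:.3g}")
```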
|
135 |
Künstliche neuronale Netze zur Verarbeitung natürlicher Sprache / Artificial Neural Networks for Natural Language Processing
Dittrich, Felix 21 April 2021 (links)
Natural language processing with computer-based systems has always been an area of active development and research, aimed at solving tasks in the most widely used languages. This thesis describes various approaches to solving problems in this field by means of artificial neural networks. It focuses mainly on more recent architectures such as Transformers and BERT. The goal is to understand these architectures better and to find out what advantages they have over conventional artificial neural networks. The acquired knowledge is then tested on a natural language processing task in which specific information is extracted from texts by means of Named Entity Recognition (NER).
1 Introduction
1.1 Natural Language Processing (NLP)
1.2 Neural Networks
1.2.1 Biological Background
1.3 Structure of the Thesis
2 Fundamentals
2.1 Artificial Neural Networks
2.1.1 Types of Learning
2.1.2 Activation Functions
2.1.3 Loss Functions
2.1.4 Optimizers
2.1.5 Overfitting and Underfitting
2.1.6 Exploding and Vanishing Gradients
2.1.7 Optimization Methods
3 Network Architectures for Natural Language Processing
3.1 Recurrent Neural Networks (RNN)
3.1.1 Long Short-Term Memory (LSTM)
3.2 Autoencoders
3.3 The Transformer
3.3.1 Word Embeddings
3.3.2 Positional Encoding
3.3.3 Encoder Block
3.3.4 Decoder Block
3.3.5 Limitations of the Transformer Architecture
3.4 Bidirectional Encoder Representations from Transformers (BERT)
3.4.1 Pre-training
3.4.2 Fine-tuning
4 Practical Part and Results
4.1 Task
4.2 Libraries, Programming Languages and Software Used
4.2.1 Python
4.2.2 NumPy
4.2.3 pandas
4.2.4 scikit-learn
4.2.5 TensorFlow
4.2.6 Keras
4.2.7 ktrain
4.2.8 Data Version Control (dvc)
4.2.9 FastAPI
4.2.10 Docker
4.2.11 Amazon Web Services
4.3 Data
4.4 Network Architecture
4.5 Training
4.6 Evaluation
4.7 Implementation
5 Concluding Remarks
5.1 Summary and Outlook
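The practical part of the thesis builds its own NER model with ktrain and Keras, which is not reproduced here; as a hedged stand-in, the sketch below shows the same kind of NER extraction step described in the abstract, using the Hugging Face transformers token-classification pipeline with its default public model. The example sentence is invented.

```python
# Minimal sketch of NER-based information extraction with the transformers
# pipeline and its default public model (for illustration only; the thesis
# trains its own model on task-specific data).
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")

text = "Barack Obama visited Berlin in June 2013 to meet Angela Merkel."
for entity in ner(text):
    print(f'{entity["word"]:<15} {entity["entity_group"]:<5} score={entity["score"]:.2f}')
```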
|
136 |
Laff-O-Tron: Laugh Prediction in TED Talks
Acosta, Andrew D 01 October 2016 (links)
Did you hear where the thesis found its ancestors? They were in the "parent-thesis"! This joke, whether you laughed at it or not, contains a fascinating and mysterious quality: humor. Humor is something so incredibly human that if you squint, the two words can even look the same. As such, humor is not often considered something that computers can understand. But that doesn't mean we won't try to teach it to them.
In this thesis, we propose the system Laff-O-Tron, which attempts to predict when the audience of a public speech will laugh by looking only at the text of the speech. To do this, we create a corpus of over 1700 TED Talks retrieved from the TED website. We then adapt various techniques used by researchers to identify humor in text, and we investigate features specific to our public speaking setting. Using supervised learning, we classify whether a chunk of text will cause the audience to laugh based on these features. We examine the effects of each feature, classifier, and size of the text chunk provided. On a balanced data set, we are able to predict laughter with up to 75% accuracy in our best conditions; medium-level conditions reach around 70% accuracy, while our worst conditions result in 66% accuracy.
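The thesis compares several humor-specific feature sets and classifiers, none of which are reproduced here. The sketch below is only a minimal bag-of-words baseline showing the overall setup (transcript chunks labeled by whether laughter followed), with toy chunks and labels invented for the example.

```python
# Minimal baseline sketch for laugh prediction from transcript chunks.
# The chunks and labels are toy placeholders; the thesis uses ~1700 TED Talk
# transcripts and richer, humor-specific features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

chunks = [
    "so I asked my dog for advice and he just stared at me",
    "the data was collected over a period of three years",
    "turns out the only thing I trained was my patience",
    "we then normalized the measurements across all sensors",
] * 25                       # repeat to get a non-trivial toy set
labels = [1, 0, 1, 0] * 25   # 1 = audience laughed after this chunk

X_train, X_test, y_train, y_test = train_test_split(
    chunks, labels, test_size=0.25, stratify=labels, random_state=0)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```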
Computers with humor recognition capabilities would be useful in the fields of human-computer interaction and communications. Humor can make a computer easier to interact with, and humor recognition can serve as a tool to check whether humor was used effectively in an advertisement or speech.
|
137 |
Chatbot pro Smart Cities / Chatbot for Smart Cities
Jusko, Ján January 2019 (links)
The aim of this work is to simplify access to information for citizens of the city of Brno and, at the same time, to innovate the way citizens communicate with their city. The problem is solved by creating a conversational agent, the chatbot Kroko. Using artificial intelligence and a Czech language analyzer, the agent is able to understand and respond to a certain set of textual, natural-language queries. The agent is available on the Messenger platform and has a knowledge base that includes data provided by the city council. After extensive user testing with a total of 76 citizens of the city, it turned out that 97% of respondents like the idea of a city-oriented chatbot and can imagine using it regularly. The main finding of this work is that the general public can easily adopt and effectively use a chatbot. The results of this work motivate further development of practical applications of conversational agents.
|
138 |
Predicting the Vote Using Legislative Speech
Budhwar, Aditya 01 March 2018 (links)
As most dedicated observers of voting bodies like the U.S. Supreme Court can attest, it is possible to guess vote outcomes based on statements made during deliberations or questioning by the voting members. In most forms of representative democracy, citizens can actively petition or lobby their representatives, and that often means understanding their intentions to vote for or against an issue of interest. In some U.S. state legislatures, professional lobby groups and dedicated press members are highly informed and engaged, but the process is essentially closed to ordinary citizens because they do not have enough background and familiarity with the issue, the legislator, or the entire process. Our working hypothesis is that verbal utterances made during the legislative process by elected representatives can indicate their intent on a future vote, and therefore can be used to automatically predict that vote to a significant degree. In this research, we examine thousands of hours of legislative deliberations from the California state legislature's 2015-2016 session to form models of voting behavior for each legislator and use them to train classifiers and predict the votes that occur subsequently. We can achieve legislator vote prediction accuracies as high as 83%. For bill vote prediction, our model can achieve 76% accuracy with an F1 score of 0.83 on balanced bill training data.
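The thesis's exact features and classifiers are not given in this abstract; the sketch below only illustrates the per-legislator setup it describes, training one small text classifier per legislator on past (utterance, vote) pairs and using it to predict an upcoming vote. The utterances, votes, and legislator name are invented.

```python
# Sketch of per-legislator vote models: each legislator's past utterances and
# recorded votes train a small text classifier that then predicts that
# legislator's vote from new deliberation remarks. All data is invented.
from collections import defaultdict
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# (legislator, utterance, vote) triples; vote: 1 = aye, 0 = no
history = [
    ("A. Smith", "this bill finally funds our schools properly", 1),
    ("A. Smith", "I cannot support another unfunded mandate", 0),
    ("A. Smith", "the amendment addresses my earlier concerns", 1),
    ("A. Smith", "the fiscal impact here is simply unacceptable", 0),
]

by_legislator = defaultdict(lambda: ([], []))
for name, utterance, vote in history:
    texts, votes = by_legislator[name]
    texts.append(utterance)
    votes.append(vote)

models = {name: make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, votes)
          for name, (texts, votes) in by_legislator.items()}

new_remark = "I still have serious concerns about the cost of this measure"
print(models["A. Smith"].predict([new_remark]))  # predicted vote for the upcoming roll call
```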
|
139 |
Expanding The NIF Ecosystem - Corpus Conversion, Parsing And Processing Using The NLP Interchange Format 2.0
Brümmer, Martin 26 February 2018 (links)
This work presents a thorough examination and expansion of the NIF ecosystem.
|
140 |
Generating Formal Representations of System Specification from Natural Language Requirements
Irfan, Zeeshan 05 October 2020 (links)
Natural Language (NL) requirements play a significant role in specifying the system design, implementation and testing processes. Nevertheless, NL requirements are generally syntactically ambiguous and semantically inconsistent. Issues with NL requirements can result in inaccurate and unreasonable system design, implementation and testing. Moreover, the informal nature of NL is a major hurdle for machine processing of system requirements specifications. To confront this problem, a requirement template based on controlled NL is introduced to produce a deterministic and consistent representation of the system. The ultimate focus of this thesis is to generate test cases from system specifications derived from requirements communicated in natural language. Manual software system testing is a labour-intensive, error-prone and high-cost activity. Traditionally, model-driven test generation approaches are employed for automated testing; however, the system models are created manually, and the test cases generated from them are generally neither deterministic nor traceable to individual requirements. This thesis proposes an approach for software system testing based on template-driven requirements. This systematic approach is applied to the requirements elicited from system stakeholders. For this purpose, natural language processing (NLP) methods are used: useful information is extracted from the controlled NL requirements and the gathered information is then processed to generate test scenarios. Our initial observations show that this method provides considerable gains in terms of reducing the cost, time and complexity of requirements-based testing.
|