Global ETD Search

51	Rozpoznávání pojmenovaných entit v biomedicínské doméně / Named entity recognition in the biomedical domain Williams, Shadasha January 2021 (has links) Thesis Title: Named Entity Recognition in the Biomedical Domain Named entity recognition (NER) is the task of information extraction that attempts to recognize and extract particular entities in a text. One of the issues that stems from NER is that its models are domain specific. The goal of the thesis is to focus on entities strictly from the biomedical domain. The other issue with NER comes the synonymous terms that may be linked to one entity, moreover they lead to issue of disambiguation of the entities. Due to the popularity of neural networks and their success in NLP tasks, the work should use a neural network architecture for the task of named entity disambiguation, which is described in the paper by Eshel et al [1]. One of the subtasks of the thesis is to map the words and entities to a vector space using word embeddings, which attempts to provide textual context similarity, and coherence [2]. The main output of the thesis will be a model that attempts to disambiguate entities of the biomedical domain, using scientific journals (PubMed and Embase) as the documents of our interest.
52	Künstliche neuronale Netze zur Verarbeitung natürlicher Sprache Dittrich, Felix 21 April 2021 (has links) An der Verarbeitung natürlicher Sprache durch computerbasierte Systeme wurde immer aktiv entwickelt und geforscht, um Aufgaben in den am weitesten verbreiteten Sprachen zu lösen. In dieser Arbeit werden verschiedene Ansätze zur Lösung von Problemen in diesem Bereich mittels künstlicher neuronaler Netze beschrieben. Dabei konzentriert sich diese Arbeit hauptsächlich auf modernere Architekturen wie Transformatoren oder BERT. Ziel dabei ist es, diese besser zu verstehen und herauszufinden, welche Vorteile sie gegenüber herkömmlichen künstlichen neuronalen Netzwerken haben. Anschließend wird dieses erlangte Wissen an einer Aufgabe aus dem Bereich der Verarbeitung natürlicher Sprache getestet, in welcher mittels einer sogenannten Named Entity Recognition (NER) spezielle Informationen aus Texten extrahiert werden.:1 Einleitung 1.1 Verarbeitung natürlicher Sprache (NLP) 1.2 Neuronale Netze 1.2.1 Biologischer Hintergrund 1.3 Aufbau der Arbeit 2 Grundlagen 2.1 Künstliche neuronale Netze 2.1.1 Arten des Lernens 2.1.2 Aktivierungsfunktionen 2.1.3 Verlustfunktionen 2.1.4 Optimierer 2.1.5 Über- und Unteranpassung 2.1.6 Explodierender und verschwindender Gradient 2.1.7 Optimierungsverfahren 3 Netzwerkarchitekturen zur Verarbeitung natürlicher Sprache 3.1 Rekurrente neuronale Netze (RNN) 3.1.1 Langes Kurzzeitgedächtnis (LSTM) 3.2 Autoencoder 3.3 Transformator 3.3.1 Worteinbettungen 3.3.2 Positionscodierung 3.3.3 Encoderblock 3.3.4 Decoderblock 3.3.5 Grenzen Transformatorarchitektur 3.4 Bidirektionale Encoder-Darstellungen von Transformatoren (BERT) 3.4.1 Vortraining 3.4.2 Feinabstimmung 4 Praktischer Teil und Ergebnisse 4.1 Aufgabe 4.2 Verwendete Bibliotheken, Programmiersprachen und Software 4.2.1 Python 4.2.2 NumPy 4.2.3 pandas 4.2.4 scikit-learn 4.2.5 Tensorflow 4.2.6 Keras 4.2.7 ktrain 4.2.8 Data Version Control (dvc) 4.2.9 FastAPI 4.2.10 Docker 4.2.11 Amazon Web Services 4.3 Daten 4.4 Netzwerkarchitektur 4.5 Training 4.6 Auswertung 4.7 Implementierung 5 Schlussbemerkungen 5.1 Zusammenfassung und Ausblick Künstliche neuronale Netze , Transformatoren Named Entity Recognition (NER) NLP maschinelles Lernen
53	Algoritmy pro rozpoznávání pojmenovaných entit / Algorithms for named entities recognition Winter, Luca January 2017 (has links) The aim of this work is to find out which algorithm is the best at recognizing named entities in e-mail messages. The theoretical part explains the existing tools in this field. The practical part describes the design of two tools specifically designed to create new models capable of recognizing named entities in e-mail messages. The first tool is based on a neural network and the second tool uses a CRF graph model. The existing and newly created tools and their ability to generalize are compared on a subset of e-mail messages provided by Kiwi.com.
54	Verbesserung einer Erkennungs- und Normalisierungsmaschine für natürlichsprachige Zeitausdrücke Thomas, Stefan 27 February 2018 (has links) Digital gespeicherte Daten erfreuen sich einer stetig steigenden Verwendung. Insbesondere die computerbasierte Kommunikation über E-Mail, SMS, Messenger usw. hat klassische Kommunikationsmittel nahezu vollständig verdrängt. Einen Mehrwert aus diesen Daten zu generieren, ist sowohl im geschäftlichen als auch im privaten Bereich von entscheidender Bedeutung. Eine Möglichkeit den Nutzer zu unterstützen ist es, seine textuellen Daten umfassend zu analysieren und bestimmte Elemente hervorzuheben und ihm die Erstellung von Einträgen für Kalender, Adressbuch und dergleichen abzunehmen bzw. zumindest vorzubereiten. Eine weitere Möglichkeit stellt die semantische Suche in den Daten des Nutzers dar. Selbst mit Volltextsuche muss man bisher den genauen Wortlaut kennen, wenn man eine bestimmte Information sucht. Durch ein tiefgreifendes Verständnis für Zeit ist es nun aber möglich, über einen Zeitstrahl alle mit einem bestimmten Zeitpunkt oder einer Zeitspanne verknüpften Daten zu finden. Es existieren bereits viele Ansätze um Named Entity Recognition voll- bzw. semi-automatisch durchzuführen, aber insbesondere Verfahren, welche weitgehend sprachunabhängig arbeiten und sich somit leicht auf viele Sprachen skalieren lassen, sind kaum publiziert. Um ein solches Verfahren für natürlichsprachige Zeitausdrücke zu verbessern, werden in dieser Arbeit, basierend auf umfangreichen Analysen, Möglichkeiten vorgestellt. Es wird speziell eine Strategie entwickelt, die auf einem Verfahren des maschinellen Lernens beruht und so den manuellen Aufwand für die Unterstützung neuer Sprachen reduziert. Diese und weitere Strategien wurden implementiert und in die bestehende Architektur der Zeiterkennungsmaschine der ExB-Gruppe integriert. info:eu-repo/classification/ddc/000 ddc:000
55	Convolutional Neural Networks for Named Entity Recognition in Images of Documents van de Kerkhof, Jan January 2016 (has links) This work researches named entity recognition (NER) with respect to images of documents with a domain-specific layout, by means of Convolutional Neural Networks (CNNs). Examples of such documents are receipts, invoices, forms and scientific papers, the latter of which are used in this work. An NER task is first performed statically, where a static number of entity classes is extracted per document. Networks based on the deep VGG-16 network are used for this task. Here, experimental evaluation shows that framing the task as a classification task, where the network classifies each bounding box coordinate separately, leads to the best network performance. Also, a multi-headed architecture is introduced, where the network has an independent fully-connected classification head per entity. VGG-16 achieves better performance with the multi-headed architecture than with its default, single-headed architecture. Additionally, it is shown that transfer learning does not improve performance of these networks. Analysis suggests that the networks trained for the static NER task learn to recognise document templates, rather than the entities themselves, and therefore do not generalize well to new, unseen templates. For a dynamic NER task, where the type and number of entity classes vary per document, experimental evaluation shows that, on large entities in the document, the Faster R-CNN object detection framework achieves comparable performance to the networks trained on the static task. Analysis suggests that Faster R-CNN generalizes better to new templates than the networks trained for the static task, as Faster R-CNN is trained on local features rather than the full document template. Finally, analysis shows that Faster R-CNN performs poorly on small entities in the image and suggestions are made to improve its performance. Convolutional Neural Networks Faster R-CNN Named Entity Recognition Images Documents Engineering and Technology Teknik och teknologier
56	Translation Memory System Optimization : How to effectively implement translation memory system optimization / Optimering av översättningsminnessystem : Hur man effektivt implementerar en optimering i översättningsminnessystem Chau, Ting-Hey January 2015 (has links) Translation of technical manuals is expensive, especially when a larger company needs to publish manuals for their whole product range in over 20 different languages. When a text segment (i.e. a phrase, sentence or paragraph) is manually translated, we would like to reuse these translated segments in future translation tasks. A translated segment is stored with its corresponding source language, often called a language pair in a Translation Memory System. A language pair in a Translation Memory represents a Translation Entry also known as a Translation Unit. During a translation, when a text segment in a source document matches a segment in the Translation Memory, available target languages in the Translation Unit will not require a human translation. The previously translated segment can be inserted into the target document. Such functionality is provided in the single source publishing software, Skribenta developed by Excosoft. Skribenta requires text segments in source documents to find an exact or a full match in the Translation Memory, in order to apply a translation to a target language. A full match can only be achieved if a source segment is stored in a standardized form, which requires manual tagging of entities, and often reoccurring words such as model names and product numbers. This thesis investigates different ways to improve and optimize a Translation Memory System. One way was to aid users with the work of manual tagging of entities, by developing Heuristic algorithms to approach the problem of Named Entity Recognition (NER). The evaluation results from the developed Heuristic algorithms were compared with the result from an off the shelf NER tool developed by Stanford. The results shows that the developed Heuristic algorithms is able to achieve a higher F-Measure compare to the Stanford NER, and may be a great initial step to aid Excosofts’ users to improve their Translation Memories. / Översättning av tekniska manualer är väldigt kostsamt, speciellt när större organisationer behöver publicera produktmanualer för hela deras utbud till över 20 olika språk. När en text (t.ex. en fras, mening, paragraf) har blivit översatt så vill vi kunna återanvända den översatta texten i framtida översättningsprojekt och dokument. De översatta texterna lagras i ett översättningsminne (Translation Memory). Varje text lagras i sitt källspråk tillsammans med dess översättning på ett annat språk, så kallat målspråk. Dessa utgör då ett språkpar i ett översättningsminnessystem (Translation Memory System). Ett språkpar som lagras i ett översättningsminne utgör en Translation Entry även kallat Translation Unit. Om man hittar en matchning när man söker på källspråket efter en given textsträng i översättningsminnet, får man upp översättningar på alla möjliga målspråk för den givna textsträngen. Dessa kan i sin tur sättas in i måldokumentet. En sådan funktionalitet erbjuds i publicerings programvaran Skribenta, som har utvecklats av Excosoft. För att utföra en översättning till ett målspråk kräver Skribenta att text i källspråket hittar en exakt matchning eller en s.k. full match i översättningsminnet. En full match kan bara uppnås om en text finns lagrad i standardform. Detta kräver manuell taggning av entiteter och ofta förekommande ord som modellnamn och produktnummer. I denna uppsats undersöker jag hur man effektivt implementerar en optimering i ett översättningsminnessystem, bland annat genom att underlätta den manuella taggningen av entitier. Detta har gjorts genom olika Heuristiker som angriper problemet med Named Entity Recognition (NER). Resultat från de utvecklade Heuristikerna har jämförts med resultatet från det NER-verktyg som har utvecklats av Stanford. Resultaten visar att de Heuristiker som jag utvecklat uppnår ett högre F-Measure jämfört med Stanford NER och kan därför vara ett bra inledande steg för att hjälpa Excosofts användare att förbättra deras översättningsminnen. NER Named Entity Recognition TM Translation Memory Stanford NER
57	Efficient naming for Smart Home devices in Information Centric Networks Rossland Lindvall, Caspar, Söderberg, Mikael January 2020 (has links) The current network trends point towards a significant discrepancy between the data usage and the underlying architecture; a severely increasing amount of data is being sent from more devices while data usage is becoming more data-centric instead of the previously host-centric. Information Centric Network (ICN) is a new alternative network paradigm that is designed for a data-centric usage. ICN is based on uniquely naming data packages and making it location independent. This thesis researched how to implement an efficient naming for ICN in a Smart Home Scenario. The results are based on testing how the forwarding information base is populated for numerous different scenarios and how a node's duty cycle affects its power usage. The results indicate that a hierarchical naming is optimized for hierarchical-like network topology and a flat naming for interconnected network topologies. An optimized duty cycle is strongly dependent on the specific network and accordingto the results can a sub-optimal duty cycle lead to excessive powerusage. Information Centric Network ICN Named Data Network NDN The Internet of Things IoT Computer and Information Sciences Data- och informationsvetenskap
58	Building a Personally Identifiable Information Recognizer in a Privacy Preserved Manner Using Automated Annotation and Federated Learning Hathurusinghe, Rajitha 16 September 2020 (has links) This thesis explores the training of a deep neural network based named entity recognizer in an end-to-end privacy preserved setting where dataset creation and model training happen in an environment with minimal manual interventions. With the improvement of accuracy in Deep Learning Models for practical tasks, a rising concern is satisfying the demand for training data for these models amidst the concerns on the data privacy. Several scenarios of data protection are suggested in the recent past due to public concerns hence the legal guidelines to enforce them. A promising new development is the decentralized model training on isolated datasets, which eliminates the compromises of privacy upon providing data to a centralized entity. However, in this federated setting curating the data source is still a privacy risk mostly in unstructured data sources such as text. We explore the feasibility of automatic dataset annotation for a Named Entity Recognition (NER) task and training a deep learning model with it in two federated learning settings. We explore the feasibility of utilizing a dataset created in this manner for fine-tuning a stateof- the-art deep learning language model for the downstream task of named entity recognition. We also explore this novel setting of deep learning NLP model and federated learning for its deviation from the classical centralized setting. We created an automatically annotated dataset containing around 80,000 sentences, a manual human annotated test set and tools to extend the dataset with more manual annotations. We observed the noise from automated annotation can be overcome to a level by increasing the dataset size. We also contributed to the federated learning framework with state-of-the-art NLP model developments. Overall, our NER model achieved around 0.80 F1-score for recognition of entities in sentences. Federated Learning Named Entity Recognition BERT Transformer based NLP NLP NER Deep learning Privacy Machine learning
59	Lze to říci jinak aneb automatické hledání parafrází / Automatic Identification of Paraphrases Otrusina, Lubomír January 2009 (has links) Automatic paraphrase discovery is an important task in natural language processing. Many systems use paraphrases for improve performance e.g. systems for question answering, information retrieval or document summarization. In this thesis, we explain basic concepts e.g. paraphrase or paraphrase pattern. Next we propose some methods for paraphrase discovery from various resources. Subsequently we propose an unsupervised method for discovering paraphrase from large plain text based on context and keywords between NE pairs. In the end we explain evaluation metods in paraphrase discovery area and then we evaluate our system and compare it with similar systems.
60	Exploring Data Extraction and Relation Identification Using Machine Learning : Utilizing Machine-Learning Techniques to Extract Relevant Information from Climate Reports Berger, William, Fooladi, Alex, Lindgren, Markus, Messo, Michel, Rosengren, Jonas, Rådmann, Lukas January 2023 (has links) Ensuring the accessibility of data from Swedish municipal climate reports is necessary for examining climate work in Sweden. Manual data extraction is time-consuming and prone to errors, necessitating automation of the process. This project presents machine-learning techniques that can be used to extract data and information from Swedish municipal climate plans, to improve the accessibility of climate data. The proposed solution involves recognizing entities in plain text and extracting predefined relations between these using Named Entity Recognition and Relation Extraction, respectively. The result of the project is a functioning prototype in the medical domain due to the lack of annotated climate datasets in Swedish. Nevertheless, the problem remained the same: how to effectively perform data extraction from reports using machine learning techniques. The presented prototype demonstrates the potential of automating data extraction from reports. These findings imply that the system could be adapted to handle climate reports when a sufficient dataset becomes available. / Tillgängliggörande av information som sammanställs i svenska kommunala klimatplaner är avgörande för att utvärdera och ifrågasätta klimatarbetet i Sverige. Manuell dataextraktion är tidskrävande och komplicerat, vilket understryker behovet av att automatisera processen. Detta projekt utforskar maskininlärningstekniker som kan användas för att extrahera data och information från de kommunala klimatplanerna. Den föreslagna lösningen utnyttjar Named Entity Recognition för att identifiera entiteter i text och Relation Extraction för att extrahera fördefinierade relationer mellan entiteterna. I brist på svenska annoterade dataset inom klimatdomänen, är resultatet av projektet en fungerande prototyp inom den medicinska domänen. Frågeställningen är således densamma, hur maskininlärning kan användas för att utföra dataextraktion på rapporter. Prototypen som presenteras visar potentialen i att automatisera denna typ av dataextrahering. Denna framgång antyder att modellen kan anpassas för att hantera klimatrapporter när ett adekvat dataset blir tillgängligt. Data extraction Named Entity Recognition Relation Extraction Machine Learning Engineering and Technology Teknik och teknologier

Search results