1 |
Rozpoznávání a propojování pojmenovaných entit / Named Entity Recognition and Linking
Taufer, Pavel, January 2017
The goal of this master's thesis is to design and implement a named entity recognition and linking algorithm. Part of this goal is to propose and create a knowledge base that will be used in the algorithm. Because of the limited amount of data for languages other than English, we want to be able to train our method on one language and then transfer the learned parameters to other languages that do not have enough training data. The thesis consists of a description of available knowledge bases and existing methods, followed by the design and implementation of our own knowledge base and entity linking method. Our method achieves state-of-the-art results on a few variants of the AIDA CoNLL-YAGO dataset. It also obtains comparable results on a sample of Czech annotated data from the PDT dataset using the parameters trained on the English CoNLL dataset.
|
2 |
Kolektivní propojování entit pro aplikaci ClueMaker / Collective Entity Matching Solution for ClueMaker Application
Jaroschy, Petr, January 2021
ClueMaker (CM) is a Java desktop application for graph-based data visualisation, used by subjects such as insurance companies (to uncover fraudulent activity), the Czech organisation Hlídač Státu (to identify connections between subjects) and many others. The application currently merges entities from different data sources in a naive way, matching one field by exact string match. The goal of this thesis is to analyse, create and integrate into CM a solution that allows entities to be merged based on entity similarity, and to integrate this solution into the CM GUI. The solution should allow the user to merge two graph entities, show the user potentially identical or very similar entities, and allow a global scan of the graph for potential merges. Furthermore, it should make use of data relationships within CM in addition to the attributes of entities.
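The contrast between exact-match merging and similarity-based merging can be illustrated with a minimal sketch. This is not ClueMaker's implementation: the entity dictionaries, the helper names (`attribute_similarity`, `neighbour_similarity`, `merge_candidates`), the 0.5 neighbour weight and the 0.8 threshold are all assumptions chosen for illustration. The sketch scores a pair of entities by combining string similarity of shared attributes with the overlap of their graph neighbours, reflecting the idea of using relationships in addition to attributes.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a: dict, b: dict) -> float:
    """Mean string similarity over the attribute keys both entities share."""
    shared = set(a["attrs"]) & set(b["attrs"])
    if not shared:
        return 0.0
    scores = [
        SequenceMatcher(None, a["attrs"][k].lower(), b["attrs"][k].lower()).ratio()
        for k in shared
    ]
    return sum(scores) / len(scores)


def neighbour_similarity(a: dict, b: dict) -> float:
    """Jaccard overlap of the two entities' neighbour sets in the graph."""
    union = a["neighbours"] | b["neighbours"]
    if not union:
        return 0.0
    return len(a["neighbours"] & b["neighbours"]) / len(union)


def similarity(a: dict, b: dict, neighbour_weight: float = 0.5) -> float:
    """Weighted mix of attribute and relationship similarity (weight is an assumption)."""
    return ((1 - neighbour_weight) * attribute_similarity(a, b)
            + neighbour_weight * neighbour_similarity(a, b))


def merge_candidates(entities: list, threshold: float = 0.8) -> list:
    """Global scan: return (id_a, id_b, score) for every pair at or above the threshold."""
    found = []
    for a, b in combinations(entities, 2):
        score = similarity(a, b)
        if score >= threshold:
            found.append((a["id"], b["id"], score))
    return found


# Hypothetical entities from two data sources; names differ by one diacritic.
entities = [
    {"id": "a", "attrs": {"name": "Jan Novak"}, "neighbours": {"x", "y"}},
    {"id": "b", "attrs": {"name": "Jan Novák"}, "neighbours": {"x", "y"}},
    {"id": "c", "attrs": {"name": "Acme Ltd"}, "neighbours": {"z"}},
]
candidates = merge_candidates(entities)
```

An exact string match would treat entities `a` and `b` as different, whereas the similarity score flags them as a candidate pair for the user to confirm. The scan above compares every pair, which is quadratic in the number of entities; a real graph would likely need some form of blocking or indexing first.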
|
3 |
Analýza bezpečnostních vazeb v síti entit / Analysis of security relationships in networks of entities
Kuklisová, Anikó, January 2019
The goal of this master's thesis is to design and implement an analytical application for the Security Information Service by providing a software prototype. The solution proposes an enhancement of the existing graph that allows security analysts to analyse, edit and visualize objects and relations that are saved in a relational database. In the thesis, we walk through the development process step by step. First, we investigate the current version of the software and the requirements of the customer. Afterwards, we design the architecture to be easily extendable with new modules and reliable libraries. In the next step, we implement the application, present our solution to the customer and conduct extensive testing. The final step is evaluating our solution by comparing it to the software solution currently in use.
|
4 |
Rozpoznávání pojmenovaných entit v biomedicínské doméně / Named entity recognition in the biomedical domain
Williams, Shadasha, January 2021
Named entity recognition (NER) is the information extraction task that attempts to recognize and extract particular entities in a text. One of the issues with NER is that its models are domain specific. The goal of the thesis is to focus on entities strictly from the biomedical domain. Another issue with NER comes from synonymous terms that may be linked to one entity; these also lead to the problem of entity disambiguation. Due to the popularity of neural networks and their success in NLP tasks, the work should use a neural network architecture for the task of named entity disambiguation, as described in the paper by Eshel et al. [1]. One of the subtasks of the thesis is to map the words and entities to a vector space using word embeddings, which attempt to capture textual context similarity and coherence [2]. The main output of the thesis will be a model that attempts to disambiguate entities of the biomedical domain, using scientific journals (PubMed and Embase) as the documents of interest.
|
5 |
Unsupervised Entity Classification with Wikipedia and WordNet / Klasifikace entit pomocí Wikipedie a WordNetu
Kliegr, Tomáš, January 2007
This dissertation addresses the problem of classification of entities in text represented by noun phrases. The goal of this thesis is to develop a method for automated classification of entities appearing in datasets consisting of short textual fragments. The emphasis is on unsupervised and semi-supervised methods that allow the assigned classes to be fine-grained and require no labeled instances for training. The set of target classes is either user-defined or determined automatically. Our initial attempt to address the entity classification problem is the Semantic Concept Mapping (SCM) algorithm. SCM maps the noun phrases representing the entities, as well as the target classes, to WordNet. Graph-based WordNet similarity measures are used to assign the closest class to the noun phrase. If a noun phrase does not match any WordNet concept, a Targeted Hypernym Discovery (THD) algorithm is executed. The THD algorithm extracts a hypernym from a Wikipedia article defining the noun phrase using lexico-syntactic patterns. This hypernym is then used to map the noun phrase to a WordNet synset, but it can also be taken as the classification result by itself, resulting in an unsupervised classification system. The SCM and THD algorithms were designed for English. While adaptation of these algorithms to other languages is conceivable, we decided to develop the Bag of Articles (BOA) algorithm, which is language agnostic as it is based on the statistical Rocchio classifier. Since this algorithm utilizes Wikipedia as a source of data for classification, it does not require any labeled training instances. WordNet is used in a novel way to compute term weights. It is also used as a positive term list and for lemmatization. A disambiguation algorithm utilizing global context is also proposed. We consider the BOA algorithm to be the main contribution of this dissertation.
Experimental evaluation of the proposed algorithms is performed on the WordSim353 dataset, which is used for evaluation in the Word Similarity Computation (WSC) task, and on the Czech Traveler dataset, the latter being specifically designed for the purpose of our research. BOA performance on WordSim353 achieves a Spearman correlation of 0.72 with human judgment, which is close to the 0.75 correlation for the ESA algorithm, to the author's knowledge the best performing algorithm for this gold-standard dataset among those that do not require training data. The advantage of BOA over ESA is that it has smaller requirements on preprocessing of the Wikipedia data. While SCM underperforms on the WordSim353 dataset, it overtakes BOA on the Czech Traveler dataset, which was designed specifically for our entity classification problem. This discrepancy requires further investigation. In a standalone evaluation of THD on the Czech Traveler dataset, the algorithm returned a correct hypernym for 62% of entities.
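As a rough illustration of the Rocchio classifier on which BOA is said to be based, the following sketch shows its nearest-centroid form: documents describing each class are averaged into one centroid vector, and a new text is assigned to the class whose centroid is closest by cosine similarity. This is not the BOA algorithm itself. It uses plain term frequencies rather than the WordNet-based term weights described in the abstract, and the toy documents, class labels and function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector (no weighting, no lemmatization)."""
    return Counter(text.lower().split())


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labeled_docs: list) -> dict:
    """Average the document vectors of each class into one centroid per class."""
    sums = defaultdict(Counter)
    counts = Counter()
    for label, text in labeled_docs:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {term: weight / counts[label] for term, weight in vec.items()}
        for label, vec in sums.items()
    }


def classify(text: str, centroids: dict) -> str:
    """Assign the class whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical stand-ins for articles that describe each target class.
docs = [
    ("hotel", "hotel room bed breakfast"),
    ("hotel", "hotel reception room"),
    ("restaurant", "restaurant menu food"),
    ("restaurant", "food waiter menu"),
]
centroids = train_centroids(docs)
label = classify("cheap room with breakfast", centroids)
```

Because the centroids are built from texts that describe the classes rather than from hand-labeled entity instances, this style of classifier fits the abstract's point that no labeled training instances are required.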
|
6 |
Klasifikace vztahů mezi pojmenovanými entitami v textu / Classification of Relations between Named Entities in Text
Ondřej, Karel, January 2020
This master's thesis deals with the extraction of relationships between named entities in text. The theoretical part of the thesis discusses the representation of natural language for machine processing. Subsequently, two subtasks of relationship extraction are defined, namely named entity recognition and classification of the relationships between entities, including a summary of state-of-the-art solutions. In the practical part of the thesis, a system for automatic extraction of relationships between named entities from downloaded pages is designed. The classification of relationships between entities is based on pre-trained transformers. In this thesis, four pre-trained transformers are compared, namely BERT, XLNet, RoBERTa and ALBERT.
|
7 |
Rozpoznávání pojmenovaných entit / Named Entity Recognition
Rylko, Vojtěch, January 2014
This master's thesis describes the history and theoretical background of named entity recognition, and the implementation of a system in C++ for named entity recognition and disambiguation. The system uses a local disambiguation method and statistics generated from the Wikilinks web dataset. Various experiments and tests are performed with the implemented system and with alternative implementations. These experiments show that the system is sufficiently successful and fast. The system participates in the Entity Recognition and Disambiguation Challenge 2014.
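The abstract does not say how the link statistics are used, so the following is only a sketch of one common approach, not the thesis's method (which is implemented in C++). It assumes the statistics are counts of how often a mention string links to each target entity, and it disambiguates each mention independently of the others (one reading of "local") by picking the most frequent target. The function names and the example link pairs are hypothetical.

```python
from collections import Counter, defaultdict


def build_link_statistics(links):
    """Count how often each (lower-cased) mention string links to each target entity."""
    stats = defaultdict(Counter)
    for mention, target in links:
        stats[mention.lower()][target] += 1
    return stats


def disambiguate(mention, stats):
    """Return (entity, share of links) for the most frequent target, or None if unseen."""
    targets = stats.get(mention.lower())
    if not targets:
        return None
    entity, count = targets.most_common(1)[0]
    return entity, count / sum(targets.values())


# Hypothetical (mention, target) pairs standing in for a link dataset.
links = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris_Hilton"),
    ("Prague", "Prague"),
]
stats = build_link_statistics(links)
result = disambiguate("paris", stats)
```

A lookup of this kind needs only one dictionary access per mention, which is one reason purely statistical local methods can be fast; the trade-off is that they ignore the other entities mentioned in the same document.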
|
8 |
Využití syntaktické informace pro identifikaci hodnocených entit [Using Syntactic Information to Identify Evaluated Entities]
Glončák, Vladan, January 2019
Opinion Target Extraction (OTE) is a well-established subtask of sentiment analysis. While detecting sentiment polarity is useful in itself, the ability to extract the targets of the opinions allows for more thorough decision making. For example, the owner of a restaurant needs to know whether the guests are complaining about the food, the ambience, or any other aspect of the establishment. Although lexical information is crucial for the task, syntactic structures have the potential to help decide correctly among multiple candidate entities. Rules based on such structures have been used previously for the task. The objective of this thesis is to investigate whether syntactic information influences the behavior of state-of-the-art models, such as recurrent neural networks, for the OTE task. We did not find any substantial evidence to suggest that adding the syntactic information influences the behavior of the models.
|
9 |
Pojmenované entity a ontologie metodami hlubokého učení [Named Entities and Ontologies Using Deep Learning Methods]
Rafaj, Filip, January 2021
In this master's thesis we describe a method for linking named entities in a given text to a knowledge base, a task known as Named Entity Linking. Using a deep neural architecture together with BERT contextualized word embeddings, we created a semi-supervised model that jointly performs Named Entity Recognition and Named Entity Disambiguation. The model outputs a Wikipedia ID for each entity detected in an input text. To compute contextualized word embeddings, we used pre-trained BERT without making any changes to it (no fine-tuning). We experimented with components of our model and various versions of BERT embeddings. Moreover, we tested several different ways of using the contextual embeddings. Our model is evaluated using standard metrics and surpasses the scores of the models that defined the state of the art before the spread of pre-trained contextualized models. The scores of our model are comparable to those of current state-of-the-art models.
|
10 |
Rozpoznání pojmenovaných entit v textu [Named Entity Recognition in Text]
Süss, Martin, January 2019
This thesis deals with named entity recognition (NER) in text, carried out using machine learning techniques. Recently, techniques for creating word embedding models have been introduced. These word vectors can encode many useful relationships between words in text data, such as their syntactic or semantic similarity. Modern NER systems use these vector features to improve their quality. However, only a few of them investigate in greater detail how much impact these vectors have on recognition and whether they can be optimized for even greater recognition quality. This thesis examines various factors that may affect the quality of word embeddings, and thus the resulting quality of the NER system. A series of experiments has been performed to examine these factors, such as corpus quality and size, vector dimensions, text preprocessing techniques, and various algorithms (Word2Vec, GloVe and FastText) and their parameters. The results bring useful findings that can be applied when creating word vectors and thus indirectly increase the resulting quality of NER systems.
|