121 |
Automatically Detecting the Resonance of Terrorist Movement Frames on the Web
Etudo, Ugochukwu O, 01 January 2017
The ever-increasing use of the internet by terrorist groups as a platform for disseminating radical, violent ideologies is well documented. The internet has, in this way, become a breeding ground for potential lone-wolf terrorists; that is, individuals who commit acts of terror inspired by the ideological rhetoric of terrorist organizations. These individuals lack formal affiliation with terror organizations, which makes them difficult to intercept with traditional intelligence techniques. The radicalization of individuals on the internet poses a considerable challenge for law enforcement and national security officials. This new medium of radicalization, however, also presents new opportunities for the interdiction of lone-wolf terrorism. This dissertation is an account of the development and evaluation of an information technology (IT) framework for detecting potentially radicalized individuals on social media sites and Web fora. Unifying Collective Action Framing Theory (CAFT) and a radicalization model of lone-wolf terrorism, this dissertation analyzes a corpus of propaganda documents produced by several radically different terror organizations. This analysis provides the building blocks to define a knowledge model of terrorist ideological framing that is implemented as a Semantic Web ontology. Using several techniques for ontology-guided information extraction, the resultant ontology can be accurately populated from textual data sources. This dissertation subsequently defines several techniques that leverage the populated ontological representation to automatically identify individuals who are potentially radicalized to one or more terrorist ideologies, based on their postings on social media and other Web fora. The dissertation also discusses how the ontology can be queried using intuitive structured query languages to infer triggering events in the news.
The prototype system is evaluated in the context of classification and is shown to provide state-of-the-art results. The main outputs of this research are (1) an ontological model of terrorist ideologies, (2) an information extraction framework capable of identifying and extracting terrorist ideologies from text, (3) a classification methodology for classifying Web content as resonating with the ideology of one or more terrorist groups, and (4) a methodology for rapidly identifying news content of relevance to one or more terrorist groups.
|
122 |
Reconhecimento de entidades mencionadas em português utilizando aprendizado de máquina / Portuguese named entity recognition using machine learning
Wesley Seidel Carvalho, 24 February 2012
Named Entity Recognition (NER), a subtask of information extraction, aims to locate and classify textual elements into predefined categories such as names of people, organizations, places, dates, and other classes of interest. This knowledge enables the execution of more advanced tasks. NER can be considered one of the first steps towards semantic analysis of text and is a crucial subtask for document management, text mining, and information extraction systems, among others. In this thesis, I study several Machine Learning methods applied to NER that relate to the current state of the art, including two methods applied to the Portuguese language. I present three ways of evaluating these types of systems found in the literature. I also develop an NER system for the Portuguese language using Machine Learning, specifically the maximum entropy framework. The results are comparable to those of the best NER systems for the Portuguese language developed with other Machine Learning approaches.
|
123 |
AN EVALUATION OF SDN AND NFV SUPPORT FOR PARALLEL, ALTERNATIVE PROTOCOL STACK OPERATIONS IN FUTURE INTERNETS
Suresh, Bhushan, 09 July 2018
Virtualization on top of high-performance servers has enabled the virtualization of network functions such as caching and deep packet inspection. Such Network Function Virtualization (NFV) is used to adapt dynamically to changes in network traffic and application popularity. We demonstrate how the combination of Software Defined Networking (SDN) and NFV can support the parallel operation of different Internet architectures on top of the same physical hardware. We introduce our architecture for this approach in an actual test setup, using CloudLab resources. We start our evaluation with a small setup, in which we evaluate the feasibility of the SDN and NFV architecture, and incrementally increase the complexity of the setup until it runs a live video streaming application. We use two vastly different protocol stacks, namely TCP/IP and NDN, to demonstrate the capability of our approach. The evaluation shows that our approach introduces a new level of flexibility in operating different Internet architectures on top of the same physical network, and that this flexibility makes it possible to switch between the two protocol stacks depending on the application.
|
124 |
Pseudonymizace textových datových kolekcí pro strojové učení / De-identification of text data collections for machine learning
Mareš, Martin, January 2021
Text data collections enable the deployment of artificial intelligence algorithms for novel tasks. Such collections often contain miscellaneous personal data and other sensitive information that complicates sharing and further processing due to the personal data protection requirements. Searching for personal data is often carried out by sequential passes through the complete text. The objective of this thesis is to create a tool that helps the annotators decrease the risk of data leaks from the text collections. The tool utilizes pseudonymization (replacing a word with a different word, based on a set of rules). During the annotation process, the tool tags the words as "public", "private" and "candidate". The task of the annotator is to determine the role of the candidate words and detect any other untagged private information. The private words then become the subject of the pseudonymization process. The auto-tagging tool utilizes a named entity recognizer and a database of rules. The database is automatically improved based on the decisions of the annotator. Different named entity recognizers were compared for the purpose of personal data search on the collection of the ELITR project. During the comparison, a method was found which increased the sensitivity of the named entities detection which also...
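The tagging workflow described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis tool: the rule database layout, the helper names (`tag_tokens`, `pseudonymize`, `record_decision`), the `[REDACTED]` fallback, and the toy recognizer output are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]


def tag_tokens(tokens: List[str], rules: Dict[str, str], ner_hits: Set[str]) -> Tagged:
    """Tag each token as 'public', 'private', or 'candidate'.

    rules    -- hypothetical rule database: lowercased token -> known tag
    ner_hits -- tokens that a named entity recognizer flagged (assumed given)
    """
    tagged = []
    for tok in tokens:
        key = tok.lower()
        if key in rules:
            tagged.append((tok, rules[key]))    # a stored rule decides
        elif tok in ner_hits:
            tagged.append((tok, "candidate"))   # the annotator must decide
        else:
            tagged.append((tok, "public"))
    return tagged


def pseudonymize(tagged: Tagged, replacements: Dict[str, str]) -> List[str]:
    """Replace private tokens; fall back to a placeholder if no substitute is known."""
    return [
        replacements.get(tok, "[REDACTED]") if tag == "private" else tok
        for tok, tag in tagged
    ]


def record_decision(rules: Dict[str, str], token: str, decision: str) -> None:
    """Store an annotator decision so that later passes tag the token automatically."""
    rules[token.lower()] = decision
```

In this sketch, calling `record_decision` after each annotator choice is what lets the rule database improve over time: a token that was a candidate in one pass is tagged directly in the next.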
|
125 |
Struktury trie pro zpracování rozsáhlých textových dat / Trie Structures for Large Text Data Processing
Rajčok, Andrej, January 2016
This study analyzes natural language processing with emphasis on morphological analysis of inflective languages and systems for named entity recognition. It analyzes effective pattern matching in a dictionary using succinct structures and then analyzes the practical implementation of succinct structures. It describes the design and implementation of a named entity recognition system and a morphological analyzer, and compares and tests their speed and effectiveness.
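Dictionary pattern matching with a trie can be sketched as follows. This is a plain pointer-based trie in Python, not the succinct representation the study examines; the class and function names are hypothetical, and the matcher works on characters without checking word boundaries.

```python
class Trie:
    """Pointer-based trie; each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a dictionary entry ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def longest_prefix(self, text, start=0):
        """Return the longest dictionary entry that begins at text[start], or None."""
        node = self
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.terminal:
                best = text[start:i + 1]
        return best


def find_entries(trie, text):
    """Scan left to right, greedily taking the longest entry at each position."""
    matches = []
    i = 0
    while i < len(text):
        hit = trie.longest_prefix(text, i)
        if hit:
            matches.append((i, hit))
            i += len(hit)
        else:
            i += 1
    return matches
```

Each lookup costs time proportional to the length of the matched entry rather than the size of the dictionary, which is the property that makes tries attractive for large gazetteers.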
|
126 |
Pojmenované entity a ontologie metodami hlubokého učení / Named Entities and Ontologies Using Deep Learning Methods
Rafaj, Filip, January 2021
In this master's thesis we describe a method for linking named entities in a given text to a knowledge base, a task known as Named Entity Linking. Using a deep neural architecture together with BERT contextualized word embeddings, we created a semi-supervised model that jointly performs Named Entity Recognition and Named Entity Disambiguation. The model outputs a Wikipedia ID for each entity detected in an input text. To compute contextualized word embeddings we used pre-trained BERT without making any changes to it (no fine-tuning). We experimented with components of our model and various versions of BERT embeddings. Moreover, we tested several different ways of using the contextual embeddings. Our model is evaluated using standard metrics and surpasses the scores of models that defined the state of the art before the spread of pre-trained contextualized models. The scores of our model are comparable to current state-of-the-art models.
|
127 |
Serviceorientiertes Text Mining am Beispiel von Entitätsextrahierenden Diensten / Service-Oriented Text Mining Using the Example of Entity-Extracting Services
Pfeifer, Katja, 16 June 2014
Most business-relevant knowledge today exists as unstructured information in the form of text data on web pages, in office documents, or in forum posts. A large number of text mining solutions have been developed to extract and exploit this unstructured information. Many of these systems have recently been made available as web services to simplify their use and integration.
Combining several such text mining services to solve concrete extraction tasks appears promising, because existing strengths can be exploited, weaknesses of the individual systems can be minimized, and the use of text mining solutions can be simplified. This thesis addresses the flexible combination of text mining services in a service-oriented system and extends the state of the art with targeted methods for selecting text mining services, aggregating their results, and mapping the classification schemes they use.
First, the existing service landscape is analyzed and, building on that analysis, an ontology for the functional description of the services is provided, enabling function-driven selection and combination of text mining services. In addition, using entity-extracting services as an example, algorithms for the quality-improving combination of extraction results are developed and extensively evaluated. The work is complemented by additional mapping and integration processes that ensure applicability even in heterogeneous service landscapes in which different classification schemes are used. Finally, options for transferring the approach to other text mining methods are discussed.
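The idea of mapping classification schemes and then aggregating the results of several entity-extracting services can be sketched in Python. The abstract does not specify the thesis algorithms, so simple majority voting stands in for them here; the label mapping, the `(start, end, label)` annotation format, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical mapping from service-specific labels to one shared scheme.
LABEL_MAP = {
    "PER": "Person",
    "PERSON": "Person",
    "LOC": "Location",
    "GPE": "Location",
    "ORG": "Organization",
}


def normalize(annotations):
    """Map (start, end, label) tuples onto the shared label scheme."""
    return {(start, end, LABEL_MAP.get(label, label)) for start, end, label in annotations}


def majority_vote(service_outputs, min_votes=2):
    """Keep annotations that at least `min_votes` services agree on."""
    votes = Counter()
    for output in service_outputs:
        # normalize() returns a set, so each service votes at most once per annotation
        votes.update(normalize(output))
    return {ann for ann, n in votes.items() if n >= min_votes}
```

Normalizing labels before voting matters: without it, two services that agree on a span but name its type differently would never reach the vote threshold.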
|
128 |
Extrakce vztahů mezi entitami / Entity Relationship Extraction
Šimečková, Zuzana, January 2020
Relationship extraction is the task of extracting semantic relationships between entities from a text. We create a Czech Relationship Extraction Dataset (CERED) using distant supervision on Wikidata and Czech Wikipedia. We detail the methodology we used and the pitfalls we encountered. Then we use CERED to fine-tune a neural network model for relationship extraction. We base our model on BERT, a language model pre-trained on extensive unlabeled data. We demonstrate that our model performs well on existing English relationship datasets (SemEval 2010 Task 8, TACRED) and report the results we achieved on CERED.
|
129 |
Automatic Extraction and Assessment of Entities from the Web
Urbansky, David, 15 October 2012
The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually-crafted ontology. The precision of the extracted information was found to be between 75–90 % (facts and entities respectively) after using assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval.
|
130 |
Extracting Transaction Information from Financial Press Releases / Extrahering av Transaktionsdata från Finansiella Pressmeddelanden
Sjöberg, Agaton, January 2021
The use cases of Information Extraction (IE) are more or less endless, and IE often consists of a combination of Named Entity Recognition (NER) and Relation Extraction (RE). One use case of IE is the extraction of transaction information from Norwegian insider transaction Press Releases (PRs), where a transaction consists of at most four entities: the name of the owner performing the transaction, the number of shares transferred, the transaction date, and the price of the shares bought or sold. The relationships between the entities define which entity belongs to which transaction, and whether shares were bought or sold. This report investigates how a pair of supervised NER and RE models extract this information. Since these Norwegian PRs were not labeled, two different approaches to annotating the transaction entities and their associated relations were investigated, and it was found that it is better to annotate only entities that occur in a relation than to annotate all occurrences. Furthermore, the number of PRs needed to achieve a satisfactory result in the IE pipeline was investigated. The study shows that training with about 400 PRs is sufficient for the results to converge, at an F1-score of around 0.85. Finally, the report shows that there is not much difference between a complex RE model and a simple rule-based approach when applied to the studied corpus.
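The F1-score reported above is the harmonic mean of precision and recall. A minimal Python sketch of how it can be computed over sets of extracted items follows; the tuple format and the example values are invented for illustration and are not taken from the report.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over two sets of extracted items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: (entity type, value) pairs for one press release.
gold = {("owner", "A"), ("shares", "1000"), ("date", "2021-01-05"), ("price", "12.5")}
predicted = {("owner", "A"), ("shares", "1000"), ("date", "2021-01-05"),
             ("price", "13.0"), ("shares", "500")}
p, r, f1 = precision_recall_f1(gold, predicted)
```

Here three of the five predictions are correct and three of the four gold items are found, so precision is 0.6 and recall is 0.75.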
|