1 |
Rozpoznávání a propojování pojmenovaných entit / Named Entity Recognition and Linking
Taufer, Pavel January 2017
The goal of this master's thesis is to design and implement a named entity recognition and linking algorithm. Part of this goal is to propose and create a knowledge base for the algorithm to use. Because data for languages other than English is limited, we want to be able to train our method on one language and then transfer the learned parameters to other languages that lack sufficient training data. The thesis consists of a description of available knowledge bases and existing methods, followed by the design and implementation of our own knowledge base and entity linking method. Our method achieves state-of-the-art results on a few variants of the AIDA CoNLL-YAGO dataset. It also obtains comparable results on a sample of Czech annotated data from the PDT dataset using parameters trained on the English CoNLL dataset.
2 |
Příprava vyhodnocovací sady pro složité problémy rozpoznávání a zjednoznačňování pojmenovaných entit pomocí crowdsourcingu / Preparing Evaluation Set for Complex Problems of Recognition and Disambiguation of Named Entities through Crowdsourcing
Pastorek, Peter January 2019
This master's thesis prepares an evaluation set for the problems of recognition and disambiguation of named entities. The evaluation set is created using automation and crowdsourcing. It can be used to test edge cases in the recognition and disambiguation of named entities.
3 |
Extracting Salient Named Entities from Financial News Articles / Extrahering av centrala entiteter från finansiella nyhetsartiklar
Grönberg, David January 2021
This thesis explores approaches for extracting company mentions that play a central role in financial news articles. The thesis introduces the task of salient named entity extraction (SNEE): extract all salient named entity mentions in a text document. A neural sequence labeling approach is explored to address the SNEE task in an end-to-end fashion, using both a single-task and a multi-task learning setup. In order to train the models, a new procedure for automatically creating SNEE annotations for an existing news article corpus is explored. The neural sequence labeling approaches are compared against a two-stage approach utilizing NLP parsers, a knowledge base and a salience classifier. Textual features inspired by related work in salient entity detection are evaluated to determine which combination of features results in the highest performance on the SNEE task when used by a salience classifier. The experiments show that the difference in performance between the two-stage approach and the best-performing sequence labeling approach is marginal, demonstrating the potential of the end-to-end sequence labeling approach on the SNEE task.
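As a rough illustration of how SNEE can be framed as sequence labeling, the sketch below turns token-level mention spans into per-token labels, marking only salient mentions. The `B-SAL`/`I-SAL`/`O` label names, the `bio_labels` helper and the toy sentence are assumptions made for this example; the abstract does not state which labeling scheme the thesis uses.

```python
def bio_labels(tokens, mentions):
    """Build one label per token from mention spans.

    mentions: list of (start, end_exclusive, is_salient) token spans.
    Only salient mentions are labeled; everything else stays "O".
    The label names are hypothetical, chosen for illustration.
    """
    labels = ["O"] * len(tokens)
    for start, end, is_salient in mentions:
        if not is_salient:
            continue  # non-salient mentions are treated like ordinary tokens
        labels[start] = "B-SAL"
        for i in range(start + 1, end):
            labels[i] = "I-SAL"
    return labels


# Toy example: one salient company mention and one non-salient mention.
tokens = ["Acme", "Corp", "beat", "forecasts", ",", "unlike", "Globex", "."]
mentions = [(0, 2, True), (6, 7, False)]
labels = bio_labels(tokens, mentions)
```

A sequence labeling model would then be trained to predict `labels` directly from `tokens`, which is what makes the approach end-to-end compared with a separate recognition step followed by a salience classifier.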
4 |
Creating a Graph Database from a Set of Documents / Skapandet av en grafdatabas från ett set av dokument
Nikolic, Vladan January 2015
In the context of search, it may be advantageous in some use cases to have documents saved in a graph database rather than a document-oriented database. Graph databases are able to model relationships between objects, in this case documents, in ways that allow for efficient retrieval as well as search queries that are slightly more specific or complex. This report explores the possibilities of storing an existing set of documents in a graph database. A Named Entity Recognizer was used on a set of news articles in order to extract entities from each news article’s body of text. News articles that contain the same entities are then connected to each other in the graph. Ideas to improve this entity extraction are also explored. The evaluation method used in this report proved not to be ideal for this task, in that it gave only a relative measure, not an absolute one. As such, no absolute answer with regard to the quality of the method can be presented. It is clear that improvements can be made, and the result should be subject to further study. / In a search context, it can be advantageous in some use cases to start from documents stored in a graph database rather than a document-oriented database. Graph databases can model relationships between objects, in this case documents, in a way that increases efficiency for certain more specific or complex search queries. This report explores the possibilities of storing existing documents in a graph database. A Named Entity Recognizer is used to extract entities from a large collection of news articles. News articles that contain the same entities are then connected to each other in the graph. Possibilities for improving the entity extraction are also investigated. The evaluation method used proved less than ideal, since only a relative rather than an absolute assessment can be made of the final graph. Consequently, no definitive answer can be given regarding the quality of the graph and the method, but the result should be of interest for future studies.
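The linking step described in this abstract, connecting news articles that mention the same entities, can be sketched in a few lines. This is a minimal in-memory illustration, not the graph database setup used in the report; the `build_article_graph` helper, the article ids and the entity sets are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_article_graph(article_entities):
    """Map each unordered pair of articles to the set of entities they share.

    article_entities: dict of article id -> set of entity strings,
    assumed to come from a named entity recognizer run beforehand.
    """
    # Invert the mapping: entity -> articles that mention it.
    by_entity = defaultdict(set)
    for article_id, entities in article_entities.items():
        for entity in entities:
            by_entity[entity].add(article_id)

    # Every pair of articles mentioning the same entity gets an edge.
    edges = defaultdict(set)
    for entity, articles in by_entity.items():
        for a, b in combinations(sorted(articles), 2):
            edges[(a, b)].add(entity)
    return dict(edges)


# Toy input standing in for recognizer output.
articles = {
    "a1": {"Stockholm", "Volvo"},
    "a2": {"Volvo", "Gothenburg"},
    "a3": {"Stockholm"},
}
graph = build_article_graph(articles)
```

Here `graph` holds an edge between `a1` and `a2` (shared entity `Volvo`) and between `a1` and `a3` (shared entity `Stockholm`). In a graph database the same pairs would be stored as relationships between document nodes.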
5 |
Using text mining to identify crime patterns from Arabic crime news report corpus
Alruily, Meshrif January 2012
Most text mining techniques have been proposed only for English text, and even here, most research has been conducted on specific texts related to special contexts within the English language, such as politics, medicine and crime. In contrast, although Arabic is a widely spoken language, few mining tools have been developed to process Arabic text, and some Arabic domains have not been studied at all. Arabic has a very complex morphology because it is highly inflectional, and therefore dealing with texts written in Arabic is highly complicated. This research studies the crime domain in the Arabic language, exploiting unstructured text using text mining techniques. Developing a system for extracting important information from crime reports would be useful for police investigators, both for accelerating the investigative process (instead of reading entire reports) and for conducting further or wider analyses. We propose the Crime Profiling System (CPS) to extract crime-related information (crime type, crime location and nationality of persons involved in the event), automatically construct dictionaries for the existing information, cluster crime documents based on certain attributes and utilize visualisation techniques to assist in crime data analysis. The proposed information extraction approach is novel, and it relies on computational linguistic techniques to identify the abovementioned information, i.e. without using predefined dictionaries (e.g. lists of location names) or an annotated corpus. The language used in crime reporting is studied to identify patterns of interest using a corpus-based approach. Frequency analysis, collocation analysis and concordance analysis are used to perform the syntactic analysis in order to discover the local grammar. Moreover, the Self Organising Map (SOM) approach is adopted in order to perform the clustering and visualisation tasks for crime documents based on crime type, location or nationality.
This clustering technique is improved because only refined data containing meaningful keywords extracted through the information extraction process is fed into it, i.e. the data is cleaned by removing noise. As a result, the quantity of data fed into the SOM is greatly reduced, which saves memory, data loading time and the execution time needed to perform the clustering. The computation of the SOM is therefore accelerated. Finally, the quantization error is reduced, which leads to high-quality clustering. The outcome of the clustering stage is also visualised, and the system is able to provide statistical information in the form of graphs and tables about crimes committed within certain periods of time and within a particular area.
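The concordance and collocation analyses mentioned in this abstract can be illustrated with a small sketch. It assumes whitespace-tokenized text and uses a toy English sentence in place of the Arabic crime reports; the `concordance` and `collocates` helpers and the window size are hypothetical choices, not the thesis's implementation.

```python
from collections import Counter


def concordance(tokens, keyword, window=2):
    """Return keyword-in-context triples: (left context, keyword, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2):
    """Count how often each word appears within `window` tokens of the keyword."""
    counts = Counter()
    for left, _, right in concordance(tokens, keyword, window):
        counts.update(left)
        counts.update(right)
    return counts


# Toy sentence (assumption): a preposition followed by a place name.
tokens = "police arrested a man in riyadh after a theft in riyadh market".split()
kwic = concordance(tokens, "in", window=1)
neighbours = collocates(tokens, "in", window=1)
```

Inspecting `kwic` and `neighbours` shows that the word after "in" is repeatedly a place name. Regularities of this kind are what a local grammar for locations would be built from, without relying on a predefined list of location names.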
6 |
Benoemde-entiteitherkenning vir Afrikaans / G.D. Matthew
Matthew, Gordon Derrac January 2013
According to the Constitution of South Africa, the government is required to make all information available to the public in the ten indigenous languages of South Africa (excluding English). For this reason, the government made the information that already existed for these ten languages available to the public, and an effort is also being made to increase the amount of resources available in these languages (Groenewald & Du Plooy, 2010). This release of information further helps to implement Krauwer's (2003) idea that there is an inventory of the minimal number of language-related resources required for a language to be competitive at the level of research and teaching. This inventory is known as the "Basic Language Resource Kit" (BLARK). Since most of the languages in South Africa are resource scarce, it is in the best interest of the cultural growth of the country that each of the indigenous South African languages develops its own BLARK. In Chapter 1, the need for the development of an implementable named entity recogniser (NER) for Afrikaans is discussed by first referring to the Constitution of South Africa’s (Republic of South Africa, 2003) language policy. Secondly, the guidelines of BLARK (Krauwer, 2003) are discussed, which is followed by a discussion of an audit that focuses on the number of resources and the distribution of human language technology for all eleven South African languages (Sharma Grover, Van Huyssteen & Pretorius, 2010). The audit conducted by Sharma Grover et al. (2010) established that there is a shortage of text-based tools for Afrikaans. This study addresses this need for text-based tools by focusing on the development of a NER for Afrikaans. In Chapter 2 a description is given of what an entity and a named entity are.
Later in the chapter the process of technology recycling is explained, by referring to other studies where the idea of technology recycling has been applied successfully (Rayner et al., 1997). Lastly, an analysis is done on the differences that may occur between Afrikaans and Dutch named entities. These differences are divided into three categories, namely: identical cognates, non-identical cognates and unrelated entities.
Chapter 3 begins with a description of Frog (van den Bosch et al., 2007), the Dutch NER used in this study, and the functions and operation of its NER component. This is followed by a description of the Afrikaans-to-Dutch-converter (A2DC) (Van Huyssteen & Pilon, 2009), and finally the various experiments that were completed are explained. The study consists of six experiments, the first of which was to determine the results of Frog on Dutch data. The second experiment evaluated the effectiveness of Frog on unchanged (raw) Afrikaans data. The following two experiments evaluated the results of Frog on “Dutched” Afrikaans data. The last two experiments evaluated the effectiveness of Frog on raw and “Dutched” Afrikaans data with the addition of gazetteers as part of the pre-processing step. In conclusion, a summary is given of the comparisons between the NER for Afrikaans that was developed in this study and the NER component that Puttkammer (2006) used in his tokeniser. Finally, a few suggestions for future research are proposed. / MA (Applied Language and Literary Studies), North-West University, Vaal Triangle Campus, 2013
7 |
Making visible inter-agency working processes in children's services
Octarra, Harla Sara January 2018
Inter-agency working has been promoted as a way forward to improve public services, including children's services. However, the terminology is problematic because it often overlaps with other terminologies, such as partnership or collaboration. As a consequence, when describing working arrangements between people and organisations, a 'terminological quagmire' results (Leathard, 1994, p5), with 'definitional chaos' (Ling, 2000, p83). This definitional chaos is replicated in the on-going challenges found by research on inter-agency working. While much literature has focussed on these challenges and solutions, little attention has been given to the processes that make up inter-agency working. My research explored inter-agency working processes at the frontline of children's services in Scotland. It examined formal mechanisms of working together, such as meetings and referral forms, which organised professionals' work and their relationships with one another. I used institutional ethnography to investigate inter-agency working processes. The research was conducted in one local authority in Scotland over a period of eight months and within the framework of Getting It Right For Every Child (GIRFEC), which is the country's national policy approach for children. One component of GIRFEC is the Named Person. It is a provision that would provide every child in Scotland with a professional (for most children the professional is going to be their health visitor or head teacher) to help safeguard their wellbeing by means of offering advice, support and referral to other services. This service will make teachers in promoted posts responsible for coordinating support for their pupils and will change mechanisms of inter-agency working. The tenets of institutional ethnography allowed me to observe and trace the ways in which professionals worked together.
The research found that when professionals worked together, they shared information and that sharing of information was complicated by the burgeoning use of technology. The working processes involved revealed the power relations between people and between people and organisations: specifically, between teachers and the Children and Families team members of the council, as the latter was responsible for maintaining the formal inter-agency working mechanisms of GIRFEC. The thesis highlights that inter-agency meetings, as formalised ways of working together, can boost professionals' confidence as they wrestle with uncertainty about their actions as professionals and how best to address children and young people's needs. This thesis also shows how policy changes changed the ways in which professionals work together. The Named Person provision of GIRFEC has ignited public debates in Scotland. This thesis is contributing to the debates by providing evidence on how this new role has changed the relationships between the teachers and other professionals. This is pertinent as the Scottish Government is currently redesigning the Named Person policy.
8 |
Space syntax has been considered an important theory and analytical tool for studying the correlation between spatial configuration and human social activities. But its traditional Axial Model has limitations in representing streets. The conclusion drawn from the Axial Model, that the spatial configuration of a street network can predict traffic flow well, has been widely doubted.
In order to test this conclusion, the thesis sets out to use Axial, Stroke and Named Street Models to model and analyze the Hong Kong street network. Our research methodology is first to create and study different models of the street network in a pilot study area, the Kowloon peninsula of Hong Kong, from the perspectives of space syntax theory and the properties of complex networks. Through the pilot study, tentative correlations and conclusions are derived, which are then verified through a case study of the whole street network of Hong Kong by taking samples according to three different sampling criteria.
Through analysis, we find that local integration correlates best with vehicle flow; this correlation is called the predictability of the street network. By comparing the different models in terms of predictability, we conclude that the stroke model has the best ability to predict vehicle flow. By analyzing the axial model of the Hong Kong street network and comparing its result to an earlier study, we show that the axial model does have limitations in representing a street network. From the topological studies of these models, we also find that all models of the street network have small-world and scale-free properties.
In the research for this thesis, we develop an extension of ArcGIS, named Axwoman 4, in order to calculate and extract space syntax parameters from the different models. Important implementation algorithms are introduced in this thesis.
The thesis is summed up at the end, and future research directions are given.
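Space syntax measures such as integration are derived from topological depth in a graph whose nodes are streets (or axial lines, or strokes) and whose edges are intersections. The sketch below computes only mean depth, optionally limited to a radius as in local measures, using breadth-first search. It deliberately leaves out the normalization steps that turn mean depth into an integration value. The `mean_depth` helper and the five-street toy network are assumptions for illustration and have no connection to Axwoman's actual algorithms.

```python
from collections import deque


def mean_depth(graph, source, radius=None):
    """Average number of topological steps from `source` to the streets it reaches.

    graph: dict of street -> list of intersecting streets.
    radius: if given, only streets within that many steps are counted.
    """
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if radius is not None and depths[node] == radius:
            continue  # do not expand beyond the chosen radius
        for neighbour in graph[node]:
            if neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    reached = len(depths) - 1  # exclude the source itself
    return sum(depths.values()) / reached if reached else 0.0


# Hypothetical street network: A is a main road crossing B, C and D.
streets = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "E"],
    "D": ["A", "E"],
    "E": ["C", "D"],
}
global_depth_a = mean_depth(streets, "A")
global_depth_b = mean_depth(streets, "B")
local_depth_b = mean_depth(streets, "B", radius=2)
```

A lower mean depth means a street is topologically closer to the rest of the network, which is the intuition behind a street being more integrated. The connectivity of a street is simply `len(streets[name])`.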
9 |
Stereotypical Gender Roles and their Patriarchal Effects in A Streetcar Named Desire
Bauer, Christian January 2012
Stereotypical gender roles have probably existed as long as human culture and are such a natural part of our lives that we barely take notice of them. Nevertheless, images of what we perceive as typically masculine and feminine in appearance and behavior depend on the individual’s perception. Within each gender one can find different stereotypes. A commonly assumed idea is that men are hard and tough, while women are soft and vulnerable. I find it interesting how stereotypes function and how they are preserved almost without our awareness. Once I started reading and researching the topic of stereotypes it became clear to me that literature contains many stereotypes. The intention of this essay is to critically examine the stereotypical gender roles in the play A Streetcar Named Desire, written by Tennessee Williams in 1947. It is remarkable how the author portrays the three main characters: Stanley, Stella and Blanche. The sharp contrasts and the dynamics between them are fascinating.