201 |
Concept Maps Mining for Text Summarization
AGUIAR, C. Z. 31 March 2017 (has links)
Made available in DSpace on 2018-08-02T00:03:48Z (GMT). No. of bitstreams: 1
tese_11160_CamilaZacche_dissertacao_final.pdf: 5437260 bytes, checksum: 0c96c6b2cce9c15ea234627fad78ac9a (MD5)
Previous issue date: 2017-03-31 /
Abstract
Concept maps are graphical tools for the representation and construction of knowledge. Concepts and relationships form the basis for learning, and so concept maps have been widely used in different situations and for different purposes in education, one of them being the representation of written text. Even a grammatically rich and complex text can be represented by a concept map containing only concepts and relationships that express what was originally stated in a more complicated way. However, the manual construction of a concept map demands considerable time and effort to identify and structure knowledge, especially when the map should represent not the concepts of the author's cognitive structure but the concepts expressed in a text. Thus, several technological approaches have been proposed to facilitate the process of building concept maps from texts.
This dissertation therefore proposes a new approach for the automatic construction of concept maps as summaries of scientific texts. The summarization aims to produce a concept map as a condensed representation of the text, preserving its diverse and most important characteristics. Summarization can ease the comprehension of texts, since students are trying to cope with the cognitive overload caused by the growing amount of textual information available today. This growth can also be detrimental to knowledge construction. We therefore consider the hypothesis that summarizing a text as a concept map can bring out the characteristics that matter for assimilating the knowledge in the text, while reducing its complexity and the time needed to process it.
In this context, we carried out a literature review covering the years 1994 to 2016 on approaches aimed at the automatic construction of concept maps from texts. From this review we built a categorization to better identify and analyze the features and characteristics of these technological approaches. We also sought to identify their limitations and to gather the best features of related work in order to propose our own approach.
We further present a Concept Map Mining process organized along four dimensions: Data Source Description, Domain Definition, Element Identification, and Map Visualization.
With the goal of developing a computational architecture to automatically build concept maps as summaries of academic texts, this research produced the public tool CMBuilder, an online tool for the automatic construction of concept maps from texts, as well as a Java API called ExtroutNLP, which contains libraries for information extraction and public services.
To reach this goal, we directed our efforts to the areas of natural language processing and information retrieval. The key task in achieving our objective is to extract from the text propositions of the form (concept, relation, concept). Under this premise, the research introduces a pipeline comprising: grammatical rules and depth-first search for extracting concepts and relations from the text; preposition mapping, anaphora resolution, and named-entity exploitation for labelling concepts; concept ranking based on element frequency analysis and map topology; and proposition summarization based on graph topology. The approach also proposes the use of supervised clustering and classification learning techniques, combined with a thesaurus, to define the domain of the text and to build a conceptual vocabulary of domains.
Finally, an objective analysis to validate the accuracy of the ExtroutNLP library was performed, yielding 0.65 precision over the corpus. In addition, a subjective analysis to validate the quality of the concept maps built by CMBuilder was carried out, yielding 0.75/0.45 precision/recall for concepts and 0.57/0.23 precision/recall for relations in English, and 0.68/0.38 precision/recall for concepts and 0.41/0.19 precision/recall for relations in Portuguese. Furthermore, an experiment to verify whether the concept maps summarized by CMBuilder influence the comprehension of the subject covered in a text was performed, reaching 60% correct answers for maps extracted from short texts with multiple-choice questions and 77% correct answers for maps extracted from long texts with open-ended questions.
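The central step of the pipeline above is extracting propositions of the form (concept, relation, concept). Below is a minimal Python sketch of that idea using simple subject-verb-object patterns over a spaCy dependency parse; spaCy, the model name, and the helper functions are illustrative assumptions, not part of CMBuilder or ExtroutNLP, whose grammar rules, anaphora resolution, and ranking are not shown.

# Minimal sketch: find (concept, relation, concept) triples with subject-verb-object
# patterns over a spaCy dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def phrase(token):
    """Expand a head token to its full phrase using the dependency subtree."""
    return " ".join(t.text for t in token.subtree)

def extract_propositions(text):
    """Return a list of (concept, relation, concept) triples."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ != "VERB":
                continue
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
            for subj in subjects:
                for obj in objects:
                    triples.append((phrase(subj), token.lemma_, phrase(obj)))
    return triples

print(extract_propositions("The dissertation proposes a new approach."))
# expected: [('The dissertation', 'propose', 'a new approach')]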
|
202 |
Text data analysis for a smart city project in a developing nation
Currin, Aubrey Jason January 2015 (has links)
Increased urbanisation against the backdrop of limited resources is complicating city planning and the management of functions including public safety. The smart city concept can help, but most previous smart city systems have focused on utilising automated sensors and analysing quantitative data. In developing nations, using the ubiquitous mobile phone as an enabler for crowdsourcing qualitative public safety reports from the public is a more viable option, given resource and infrastructure constraints. However, there is no specific best method for the analysis of qualitative text reports for a smart city in a developing nation. The aim of this study, therefore, is the development of a model for enabling the analysis of unstructured natural language text for use in a public safety smart city project. Following the guidelines of the design science paradigm, the resulting model was developed through an inductive review of related literature, and was assessed and refined through observations of a crowdsourcing prototype and conversational analysis with industry experts and academics. The content analysis technique was applied to the public safety reports obtained from the prototype via computer-assisted qualitative data analysis software (CAQDAS). This resulted in the development of a hierarchical ontology, which forms an additional output of this research project. Thus, this study has shown how municipalities or local governments can use CAQDAS and content analysis techniques to prepare large quantities of text data for use in a smart city.
|
203 |
25 Challenges of Semantic Process Modeling
Mendling, Jan, Leopold, Henrik, Pittke, Fabian January 2014 (has links) (PDF)
Process modeling has become an essential part of many organizations for documenting, analyzing and redesigning their business operations and for supporting them with suitable information systems. In order to serve this purpose, it is important for process models to be well grounded in formal and precise semantics. While the behavioural semantics of process models are well understood, there is a considerable research gap concerning the semantic aspects of their text labels and natural language descriptions. The aim of this paper is to make this research gap more transparent. To this end, we clarify the role of textual content in process models and the challenges that are associated with the interpretation, analysis, and improvement of their natural language parts. More specifically, we discuss particular use cases of semantic process modeling to identify 25 challenges. For each challenge, we identify prior research and discuss directions for addressing it.
|
204 |
Personalized Medicine through Automatic Extraction of Information from Medical Texts
Frunza, Oana Magdalena January 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing such challenges. The current work proposes automatic, reliable solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to prove that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems.
From the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to four, as follows:
1. Automatic identification of obesity-related diseases by using only textual clinical data;
2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews;
3. Automatic identification of gene functions based on textual data of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.
This thesis's investigation into automatic solutions for achieving personalized medicine through information identification and extraction focuses on individual problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies that I propose are supported by in-depth research experiments and thorough discussions and conclusions.
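The four tasks above are, at bottom, text-classification problems over clinical or bibliographic text. As a hedged illustration only, the sketch below shows a generic TF-IDF plus linear-classifier baseline of the kind often used for such tasks; it is not the thesis's actual representation or models, and the toy abstracts and labels are invented.

# Generic baseline sketch for abstract/record classification tasks:
# TF-IDF features with a logistic regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; real experiments would use annotated medical texts.
abstracts = [
    "patient shows obesity and hypertension in discharge summary",
    "gene expression measured in yeast cell cultures",
]
labels = [1, 0]  # 1 = relevant to the clinical question, 0 = not relevant

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(abstracts, labels)
print(model.predict(["obesity related clinical note"]))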
|
205 |
Automatic Supervised Thesauri Construction with Roget’s Thesaurus
Kennedy, Alistair H January 2012 (links)
Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful. It is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget’s Thesaurus.
I use the Thesaurus as a source of training data in order to learn from Roget’s for the purpose of updating Roget’s. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques. I improve existing methods by using known sets of synonyms from Roget’s to train a distributional measure to better identify near synonyms. Second, I use the new measure of semantic relatedness to find where in Roget’s to place a new word. Existing words from Roget’s are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word-senses were added using this process.
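As a rough illustration of the distributional starting point described above, the sketch below computes a cosine-based relatedness score over word co-occurrence vectors, with a placeholder for supervised feature weights that known Roget's synonym pairs could be used to learn. The corpus, vocabulary, and weighting hook are assumptions; the thesis's trained measure is more sophisticated.

# Sketch of a distributional relatedness measure: cosine similarity between
# word co-occurrence vectors, with an optional learned feature weighting.
import numpy as np
from collections import Counter

def context_vector(word, corpus_sentences, vocab, window=2):
    """Count context words appearing within `window` tokens of `word`."""
    counts = Counter()
    for sent in corpus_sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tokens[j]] += 1
    return np.array([counts[v] for v in vocab], dtype=float)

def relatedness(u, v, feature_weights=None):
    """Cosine similarity, optionally re-weighting context features."""
    if feature_weights is not None:
        u, v = u * feature_weights, v * feature_weights
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

sentences = ["the ship sailed across the sea", "the boat sailed across the water"]
vocab = sorted({w for s in sentences for w in s.split()})
print(relatedness(context_vector("ship", sentences, vocab),
                  context_vector("boat", sentences, vocab)))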
I conduct two kinds of evaluation on the updated Thesaurus. One is on the procedure for updating Roget’s. This is accomplished by removing some words from the Thesaurus and testing my system's ability to reinsert them in the correct location. Human evaluation of the newly added words is also performed. Annotators must determine whether a newly added word is in the correct location. They found that in most cases the new words were almost indistinguishable from those already existing in Roget's Thesaurus.
The second kind of evaluation is to establish the usefulness of the updated Roget’s Thesaurus on actual Natural Language Processing applications. These applications include determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all these applications.
|
206 |
Evaluating Text Segmentation
Fournier, Christopher January 2013 (links)
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmenters have been created to perform this task, and the question that this thesis answers is how to select the best automatic segmenter for such a task. This requires choosing an appropriate segmentation evaluation metric, confirming the reliability of a manual solution, and then finally employing an evaluation methodology that can select the automatic segmenter that best approximates human performance.
A variety of comparison methods and metrics exist for comparing segmentations (e.g., WindowDiff, Pk), and all save a few are able to award partial credit for nearly missing a boundary. Those comparison methods that can award partial credit unfortunately lack consistency, symmetry, intuitiveness, and a host of other desirable qualities. This work proposes a new comparison method named boundary similarity (B), which is based upon a new minimal boundary edit distance to compare two segmentations. Near misses are frequent, even among manual segmenters (as is exemplified by the low inter-coder agreement reported by many segmentation studies). This work adapts some inter-coder agreement coefficients to award partial credit for near misses using the new metric proposed herein, B.
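For reference, a minimal sketch of WindowDiff, one of the window-based comparison methods mentioned above, is shown below; boundary similarity (B) itself rests on a boundary edit distance and is not reproduced here. The boundary-vector encoding and window-size heuristic are conventional assumptions rather than this thesis's exact formulation.

# Minimal WindowDiff sketch. Segmentations are boundary vectors
# (1 = a boundary follows this unit, 0 = no boundary).
def window_diff(reference, hypothesis, k=None):
    n = len(reference)
    if k is None:
        # conventional choice: half the mean reference segment length
        k = max(1, round(n / (2 * (sum(reference) + 1))))
    errors = 0
    for i in range(n - k):
        # count boundaries inside each window and penalise any mismatch
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / (n - k)

print(window_diff([0, 0, 1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0, 0]))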
The methodologies employed by many works introducing automatic segmenters evaluate them simply by comparing their output to one manual segmentation of a text, and often only by presenting a series of mean performance values (with no standard deviation or standard error, and little if any statistical hypothesis testing). This work asserts that one segmentation of a text cannot constitute a “true” segmentation; specifically, one manual segmentation is simply one sample of the population of all possible segmentations of a text and of that subset of desirable segmentations. This work further asserts that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and set of manual codings, and that statistical hypothesis testing using the specific comparison methods and methodologies demonstrated herein should then be used to select the best automatic segmenter.
This work proposes new segmentation evaluation metrics, adapted inter-coder agreement coefficients, and methodologies. Most importantly, this work experimentally compares the state-of-the-art comparison methods to those proposed herein on artificial data that simulates a variety of scenarios, and chooses the best one (B). The ability of adapted inter-coder agreement coefficients, based upon B, to discern between various levels of agreement in artificial and natural data sets is then demonstrated. Finally, a contextual evaluation of three automatic segmenters is performed using the state-of-the-art comparison methods and B, following the methodology proposed herein, to demonstrate the benefits and versatility of B as opposed to its counterparts.
|
207 |
Topical Structure in Long Informal Documents
Kazantseva, Anna January 2014 (links)
This dissertation describes a research project concerned with establishing the topical structure of long informal documents. In this research, we place special emphasis on literary data, but also work with speech transcripts and several other types of data.
It has long been acknowledged that discourse is more than a sequence of sentences but, for the purposes of many Natural Language Processing tasks, it is often modelled exactly in that way. In this dissertation, we propose a practical approach to modelling discourse structure, with an emphasis on it being computationally feasible and easily applicable. Instead of following one of the many linguistic theories of discourse structure, we attempt to model the structure of a document as a tree of topical segments. Each segment encapsulates a span that concentrates on a particular topic at a certain level of granularity. Each span can be further sub-segmented based on finer fluctuations of topic. The lowest (most refined) level of segmentation is individual paragraphs.
In our model, each topical segment is described by a segment centre -- a sentence or a paragraph that best captures the contents of the segment. In this manner, the segmenter effectively builds an extractive hierarchical outline of the document. In order to achieve these goals, we use the framework of factor graphs and modify a recent clustering algorithm, Affinity Propagation, to perform hierarchical segmentation instead of clustering.
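As a point of reference for the approach described above, the sketch below runs unmodified Affinity Propagation over pairwise paragraph similarities to pick exemplar paragraphs. The TF-IDF similarities and scikit-learn call are illustrative assumptions; the dissertation's factor-graph modification for ordered, hierarchical segmentation is not shown.

# Standard Affinity Propagation choosing exemplar paragraphs ("segment centres")
# from a precomputed paragraph similarity matrix.
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

paragraphs = [
    "The storm gathered over the harbour.",
    "Rain lashed the boats through the night.",
    "By morning the town began repairing the docks.",
    "Workers hauled timber to the waterfront.",
]

tfidf = TfidfVectorizer().fit_transform(paragraphs)
similarity = cosine_similarity(tfidf)

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(similarity)
print("cluster labels:", ap.labels_)
print("exemplar (segment-centre) indices:", ap.cluster_centers_indices_)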
While it is far from being a solved problem, topical text segmentation is not uncharted territory. The methods developed so far, however, perform least well where they are most needed: on documents that lack rigid formal structure, such as speech transcripts, personal correspondence or literature. The model described in this dissertation is geared towards dealing with just such types of documents.
In order to study how people create similar models of literary data, we built two corpora of topical segmentations, one flat and one hierarchical. Each document in these corpora is annotated for topical structure by 3-6 people.
The corpora, the model of hierarchical segmentation and software for segmentation are the main contributions of this work.
|
208 |
Information Density and Persuasiveness in Naturalistic Data
January 2020 (links)
abstract: Attitudes play a fundamental role in critical judgments, and the extremity of people’s attitudes can be influenced by their emotions, beliefs, or past experiences and behaviors. Human attitudes and preferences are susceptible to social influence, and attempts to influence or change another person’s attitudes are pervasive in all societies. Given the importance of attitudes and attitude change, the current project investigated the linguistic aspects of conversations that lead to attitude change by analyzing a dataset mined from Reddit’s Change My View (Priniski & Horne, 2018). The data were analyzed using Natural Language Processing (NLP), specifically information density, to predict attitude change. Top posts from Reddit’s Change My View (N = 510,149) were imported and processed in Python, and information density measures were computed. The results indicate that comments with higher information density are more likely to be awarded a delta and are perceived to be more persuasive. / Dissertation/Thesis / Masters Thesis Psychology 2020
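One simple way to operationalize information density, shown purely as an illustration, is average per-word surprisal under a unigram model estimated from a background corpus; the study's actual measures may differ, and the toy texts below are invented.

# Sketch: average per-word surprisal (bits) under a smoothed unigram model.
import math
from collections import Counter

def unigram_model(background_texts):
    counts = Counter(w for t in background_texts for w in t.lower().split())
    total, vocab = sum(counts.values()), len(counts)
    # add-one smoothing so unseen words still get a probability
    return lambda w: (counts[w] + 1) / (total + vocab)

def information_density(text, prob):
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(-math.log2(prob(w)) for w in words) / len(words)

prob = unigram_model(["the cat sat on the mat", "the dog barked"])
print(information_density("the quantum cat barked", prob))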
|
209 |
Emergency Medical Service EMR-Driven Concept Extraction From Narrative Text
George, Susanna Serene 08 1900 (links)
Indiana University-Purdue University Indianapolis (IUPUI) / In the midst of a pandemic, with cases ranging from patients whose minor symptoms quickly become fatal to patients in situations such as a STEMI heart attack or a fatal accident injury, the importance of medical research to improve the speed and efficiency of patient care has increased. As researchers in the computing domain work to bring automation to the assistance of first responders, the priorities have been decreasing the cognitive load on the field crew, reducing the time taken to document each patient case, and improving the accuracy of report details.
This paper presents an information extraction algorithm that custom-engineers existing extraction techniques built on natural language processing, such as MetaMap, together with a syntactic dependency parser such as spaCy for analyzing sentence structure and regular expressions for matching recurring patterns, to retrieve patient-specific information from medical narratives. These concept-value pairs automatically populate the fields of an EMR form, which can be reviewed and modified manually if needed. The report can then be reused for various medical and billing purposes related to the patient.
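A small sketch of the regex-plus-parser idea is given below, assuming hypothetical field names and patterns rather than the paper's actual EMR schema or its MetaMap configuration.

# Illustrative sketch: regular expressions capture recurring patterns (blood
# pressure, age) and a spaCy dependency parse supplies a rough chief complaint.
import re
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def extract_fields(narrative):
    fields = {}
    bp = re.search(r"\b(\d{2,3})\s*/\s*(\d{2,3})\b", narrative)  # e.g. "132/84"
    if bp:
        fields["blood_pressure"] = f"{bp.group(1)}/{bp.group(2)}"
    age = re.search(r"\b(\d{1,3})[- ]year[- ]old\b", narrative, re.IGNORECASE)
    if age:
        fields["age"] = age.group(1)
    # take a direct-object noun chunk of a complaint-like verb as the chief complaint
    for chunk in nlp(narrative).noun_chunks:
        if chunk.root.dep_ in ("dobj", "obj") and \
                chunk.root.head.lemma_ in ("report", "complain", "present"):
            fields.setdefault("chief_complaint", chunk.text)
    return fields

print(extract_fields("A 67-year-old male reports chest pain, BP 132/84."))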
|
210 |
Strojové učení pro odpovídání na otázky v češtině / Machine Learning for Question Answering in Czech
Pastorek, Peter January 2020 (links)
This Master's thesis deals with teaching neural networks to answer questions in Czech. The neural networks are created in the Python programming language using the PyTorch library and are based on the LSTM architecture. They are trained on the Czech SQAD dataset. Because the Czech dataset is smaller than the English datasets, I opted to extend the neural networks with algorithmic procedures. For easier application of these procedures and better accuracy, I divide question answering into smaller steps.
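Below is a minimal sketch of the LSTM building block described above, written with PyTorch as in the thesis; the dimensions, scoring scheme, and class name are illustrative assumptions rather than the thesis's actual architecture.

# Sketch: encode a question and a candidate answer sentence with a shared LSTM
# and score the pair by dot product of final hidden states.
import torch
import torch.nn as nn

class SentenceScorer(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids):
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        return h_n[-1]                      # final hidden state: (batch, hidden_dim)

    def forward(self, question_ids, answer_ids):
        q = self.encode(question_ids)
        a = self.encode(answer_ids)
        return (q * a).sum(dim=1)           # higher score = better candidate

model = SentenceScorer(vocab_size=1000)
question = torch.randint(0, 1000, (1, 7))   # toy token-id tensors
candidate = torch.randint(0, 1000, (1, 12))
print(model(question, candidate))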
|