1 |
Evaluation on user learning effect in different presentation of news event
Chou, Shang-hua, 19 May 2011
Knowledge-based assets play an important role in the Information Age, and how to organize existing knowledge and present it to users properly is an important research issue for decision support. Previous literature has indicated that multiple documents can be organized in different ways and that different modes of knowledge presentation may result in different learning effects. Typical presentation modes include textual summarization and graphical presentation.
The purpose of this thesis is to evaluate whether textual and graphical presentations of a news event result in different learning effects for the user. In particular, this study compares a textual summary with an ontology-based graphical presentation and uses Bloom's Taxonomy of Educational Objectives to measure the user's learning effect.
An experiment was conducted to assess the knowledge and cognitive-process dimensions of Bloom's taxonomy. We also measured learning time, system quality, content quality, and overall satisfaction. The results show that the textual system performed better for learning factual knowledge, and the ontology-based system performed better for learning conceptual and procedural knowledge.
|
2 |
Evaluation of Event Episode Analysis System
Lee, Ming-yu, 26 July 2008
Knowledge-based assets play a very important role in the Information Age, and their increasing influence on organizational competition has made knowledge management a hot issue in business research. Content analysis of documents is a core function of knowledge management. In previous research, many techniques have been developed to generate textual summaries and/or ontology-based episodic knowledge from multiple documents. However, not much research has been done to compare different ways of organizing and presenting knowledge.
Since different knowledge presentations may have different effects on the user, the purpose of this thesis is to develop a method for evaluating different document summary and presentation systems. In this research, we developed an effect-measurement method based on the extended Bloom's Taxonomy of Educational Objectives. More specifically, we propose evaluation criteria based on the user's memory and cognition.
A field experiment was conducted to compare the graphical and textual systems. Results indicate that the ontology-based system performed significantly better for memorizing conceptual and procedural knowledge, while the textual summary-based system performed better for remembering facts.
|
3 |
Ranked Search on Data Graphs
Varadarajan, Ramakrishna R., 10 March 2009
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality, and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers, where each answer is a substructure of the graph containing all query keywords, illustrating the relationships among the keywords present in the graph. We applied keyword proximity search to the Web and the page graph of web documents to find top-k answers that satisfy the user's information need and increase user satisfaction. Another effective ranking mechanism applied to data graphs is authority-flow based ranking. Given a top-k keyword search query on a graph, an authority-flow based search finds the top-k answers, where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improve authority-flow based search on data graphs by creating a framework to explain and reformulate such searches, taking user preferences and feedback into consideration. We also applied the proposed graph search techniques to information discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality, and the quality of our method was compared to current approaches using user surveys.
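To make the authority-flow idea concrete, the following is a rough, hypothetical sketch of propagating authority from keyword-matching nodes through a data graph, in the spirit of personalized PageRank. The toy graph, damping factor, and iteration count are illustrative assumptions, not the dissertation's actual framework.

```python
# Hypothetical sketch of authority-flow ranking on a data graph:
# authority is injected at nodes matching the query keywords and flows
# along edges until the scores settle (personalized PageRank style).

def authority_flow(graph, keyword_nodes, damping=0.85, iters=50):
    """graph: {node: [neighbors]}; keyword_nodes: nodes matching the query."""
    nodes = list(graph)
    # Restart vector: authority re-enters only at keyword-matching nodes.
    base = {n: (1.0 / len(keyword_nodes) if n in keyword_nodes else 0.0)
            for n in nodes}
    score = dict(base)
    for _ in range(iters):
        nxt = {n: (1 - damping) * base[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if not out:
                continue
            share = damping * score[n] / len(out)
            for m in out:
                nxt[m] += share  # authority flows across the edge n -> m
        score = nxt
    return sorted(score.items(), key=lambda kv: -kv[1])

# Toy data graph: tuples/pages connected by links or foreign keys.
g = {"paper1": ["author1"], "author1": ["paper1", "paper2"],
     "paper2": ["author1", "conf1"], "conf1": ["paper2"]}
print(authority_flow(g, keyword_nodes={"paper1"})[:3])
```

Nodes near the keyword matches accumulate the most authority, which is the intuition behind ranking answers by their relevance and importance to the query.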
|
4 |
Using semantic folding with TextRank for automatic summarization / Semantisk vikning med TextRank för automatisk sammanfattning
Karlsson, Simon, January 2017
This master thesis deals with automatic summarization of text and how semantic folding can be used as a similarity measure between sentences in the TextRank algorithm. The method was implemented and compared with two common similarity measures: cosine similarity of tf-idf vectors and the number of overlapping terms in two sentences. The three methods were implemented with the same linguistic preprocessing: stop-word removal, part-of-speech filtering, and stemming. Five different part-of-speech filters were used, with different mixtures of nouns, verbs, and adjectives. The three methods were evaluated by summarizing documents from the Document Understanding Conference and comparing the output to gold-standard summaries created by human judges. The comparison between system summaries and gold-standard summaries was made with the ROUGE-1 measure. The algorithm with semantic folding performed worst of the three methods, but its F-score was only 0.0096 lower than that of the best-performing method, cosine similarity of tf-idf vectors. For semantic folding, the average precision was 46.2% and the recall 45.7% with the best-performing part-of-speech filter.
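For readers unfamiliar with the underlying algorithm, here is a minimal sketch of TextRank with the tf-idf cosine similarity measure that the thesis uses as one of its baselines. It is a simplified illustration (no stop-word removal, part-of-speech filtering, or semantic folding), and the toy sentences are invented.

```python
import math
import re
from collections import Counter

def tfidf_vectors(sentences):
    """Toy tf-idf: term frequency weighted by inverse sentence frequency."""
    docs = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    df = Counter(t for d in docs for t in d)
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in d.items()} for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Rank sentences by weighted PageRank over a similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    score = [1.0 / n] * n
    for _ in range(iters):
        score = [(1 - d) / n + d * sum(sim[j][i] * score[j] /
                 (sum(sim[j]) or 1.0) for j in range(n)) for i in range(n)]
    return sorted(range(n), key=lambda i: -score[i])

sents = ["The cat sat on the mat.", "Dogs chase cats in the park.",
         "The park has a large mat for dogs.", "Stock prices rose sharply."]
ranking = textrank(sents)
print("Top sentence:", sents[ranking[0]])
```

Swapping in semantic folding, as the thesis does, amounts to replacing the `cosine` of tf-idf vectors with a similarity computed over sentence fingerprints; the graph construction and ranking stay the same.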
|
5 |
Semantics-driven Abstractive Document Summarization
Alambo, Amanuel, 02 August 2022
No description available.
|
6 |
Investigação de modelos de coerência local para sumários multidocumento / Investigation of local coherence models for multi-document summaries
Dias, Márcio de Souza, 10 May 2016
Multi-document summarization is the task of automatically producing a single summary from a set of texts on the same subject. It is essential to handle the phenomena that arise in this scenario, such as: (i) redundancy, complementarity, and contradiction of information; (ii) standardization of writing styles; (iii) treatment of referential expressions; (iv) differing foci and perspectives across the texts; and (v) temporal ordering of information in the summary. Treating these phenomena contributes significantly to producing a final summary that is informative and coherent, characteristics that are difficult to guarantee even for a human. A particular type of coherence studied in this thesis is local coherence, defined by the relations between statements (smaller units) in a sequence of sentences, which contribute to the construction of the meaning of the text as a whole. Assuming that the use of discourse knowledge can improve the evaluation of local coherence, this thesis investigates the use of discourse relations to build local coherence models that can automatically distinguish coherent summaries from incoherent ones.
In addition, a study of the errors that affect the linguistic quality of summaries was conducted to determine which errors affect the local coherence of summaries, whether the coherence models can identify such errors, and whether there is any relationship between the coherence models and the informativeness of the summaries. This research relied on the semantic-discursive information of the CST (Cross-document Structure Theory) and RST (Rhetorical Structure Theory) models annotated in the corpus, on automatic tools such as the Palavras parser, and on algorithms that extract information from the corpus. The results showed that the semantic-discursive information successfully distinguished coherent from incoherent summaries and that the coherence models implemented in this thesis can be used to identify linguistic-quality errors that affect local coherence.
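As a loose illustration of what a local coherence model over discourse relations might look like, the sketch below scores a summary by the likelihood of its sequence of discourse relation labels under bigram statistics gathered from coherent summaries. The relation labels and training data are invented for illustration; the thesis's actual models are more sophisticated.

```python
from collections import Counter

# Hypothetical sketch: treat each summary as the sequence of discourse
# relations (e.g. RST labels) holding between adjacent sentences, and
# score local coherence by how typical those transitions are in
# coherent training summaries.

def train(relation_sequences):
    bigrams, unigrams = Counter(), Counter()
    for seq in relation_sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def coherence_score(seq, bigrams, unigrams, vocab_size, alpha=1.0):
    """Mean smoothed bigram probability; higher = more locally coherent."""
    probs = [(bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size)
             for a, b in zip(seq, seq[1:])]
    return sum(probs) / len(probs)

coherent = [["elaboration", "elaboration", "contrast", "conclusion"],
            ["background", "elaboration", "conclusion"]]
bi, uni = train(coherent)
vocab = {r for seq in coherent for r in seq}
good = ["background", "elaboration", "contrast", "conclusion"]
bad = ["conclusion", "background", "contrast", "background"]
print(coherence_score(good, bi, uni, len(vocab)),
      coherence_score(bad, bi, uni, len(vocab)))
```

The well-ordered sequence scores higher than the shuffled one, which is the basic signal such models use to separate coherent summaries from incoherent ones.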
|
7 |
Investigação de métodos de sumarização automática multidocumento baseados em hierarquias conceituais / Investigation of automatic multi-document summarization methods based on conceptual hierarchies
Zacarias, Andressa Caroline Inácio, 29 March 2016
Automatic Multi-Document Summarization (MDS) aims at creating a single coherent and cohesive summary from a collection of texts, from different sources, on the same topic. Creating such summaries, generally extracts (informative and generic), requires selecting the most important sentences from the collection. For this, one may employ superficial (statistical) linguistic knowledge or deep knowledge. Deep methods, although more expensive and less robust, produce more informative extracts with better linguistic quality. For Portuguese, the only deep methods that use lexical-conceptual knowledge select content based on the frequency of occurrence of concepts in the collection. Given the application potential of semantic-conceptual knowledge, this work investigates MDS methods that represent the lexical concepts of the source texts in a hierarchy and then exploit hierarchical properties capable of distinguishing the most relevant concepts (that is, the topics of the collection) from the rest. Specifically, 3 of the 50 collections of CSTNews, the multi-document reference corpus for Portuguese, were selected, and the nouns occurring in the source texts of each collection were manually indexed to concepts of the Princeton WordNet (WN.Pr), yielding a hierarchy with the concepts found in the collection plus the WN.Pr concepts inherited to complete the hierarchy. The concepts of the hierarchy were characterized by five graph (relevance) metrics potentially useful for identifying the concepts that should compose a summary: Centrality, Simple Frequency, Cumulative Frequency, Closeness, and Level. This characterization was analyzed manually and with machine learning (ML) algorithms to determine which measures best identify the relevant concepts of a collection. As a result, the Centrality measure was discarded and the others were used to propose content selection methods for MDS. Specifically, two extractive sentence selection methods were proposed: (i) CFSumm, whose content selection is based exclusively on the Simple Frequency metric, and (ii) LCHSumm, whose selection is based on rules learned by ML algorithms using all four relevant measures as attributes. The methods were evaluated intrinsically for informativeness, using the ROUGE package of measures, and for linguistic quality, based on the criteria of the TAC conference; for this, the six human abstracts available for each CSTNews collection were used. Furthermore, the summaries generated by the proposed methods were compared with the extracts generated by the GistSumm summarizer, taken as the baseline. Both methods achieved satisfactory results compared with the GistSumm baseline, and CFSumm outperformed LCHSumm.
FAPESP 2014/12817-4
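The graph metrics named in this abstract can be made concrete. The sketch below computes plausible versions of Simple Frequency, Cumulative Frequency, and Level over a toy concept hierarchy; the definitions are illustrative readings of the metric names, not the thesis's exact formulas.

```python
# Toy concept hierarchy: child -> parent links, plus occurrence counts of
# leaf concepts in the source texts. Metric definitions are illustrative
# readings of the names used in the thesis, not its exact formulas.
parent = {"dog": "canine", "wolf": "canine", "canine": "mammal",
          "cat": "mammal", "mammal": "animal", "animal": None}
occurrences = {"dog": 7, "wolf": 2, "cat": 4}  # counts in the collection

def simple_frequency(c):
    """How often the concept itself occurs in the texts."""
    return occurrences.get(c, 0)

def cumulative_frequency(c):
    """Concept occurrences plus those of all its descendants."""
    kids = [k for k, p in parent.items() if p == c]
    return simple_frequency(c) + sum(cumulative_frequency(k) for k in kids)

def level(c):
    """Depth of the concept below the hierarchy root."""
    depth = 0
    while parent.get(c) is not None:
        c = parent[c]
        depth += 1
    return depth

for concept in ["canine", "mammal", "dog"]:
    print(concept, simple_frequency(concept),
          cumulative_frequency(concept), level(concept))
```

A mid-level concept like "canine" can have a high cumulative frequency even with zero direct occurrences, which is what lets hierarchy-based selection surface the topics of a collection rather than only its most frequent words.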
|
8 |
應用文本主題與關係探勘於多文件自動摘要方法之研究:以電影評論文章為例 / Application of text topic and relationship mining for multi-document summarization: using movie reviews as an example
林孟儀, date unknown
The rapid development of information technology over the past decades has dramatically increased the amount of online information, and searching, organizing, and reading it costs users considerable time. This thesis therefore presents a method that uses text topic and relationship mining for multi-document summarization, helping users grasp the theme of multiple documents quickly by reading an accurate summary instead of the whole documents.
We use movie reviews as an example of multi-document summarization and apply the concept of article structure to divide the summary into film data, film orientation, and conclusion, identified by matching against a movie-review thesaurus built for this thesis. We then cluster the paragraphs of the film orientation part into topics with Latent Dirichlet Allocation (LDA). Next, we apply the concept of a text relationship map (a network of paragraphs in which each node is a paragraph and an edge indicates that the corresponding paragraphs are related) to extract the most important paragraph of each topic and order them. Finally, we remove conjunctions and replace pronouns with the names they refer to in each extracted paragraph, generating a bullet-point summary.
The results show that, for the three movies tested, the summaries produced by this method cover more content and improve the diversity of the summary. Similarity to the best-sample summaries increased by 10.8228%, 14.0123%, and 25.8142%, respectively. The method grasps key contents effectively and generates a more comprehensive summary, letting users automatically aggregate movie reviews into a concise summary and reducing the time spent searching for and reading articles.
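A rough sketch of the pipeline's core steps, assuming scikit-learn is available: paragraphs are clustered into topics with LDA, and each topic's representative is the paragraph most strongly connected to its peers in a tf-idf similarity graph, a simplified stand-in for the thesis's text relationship map. The data and parameters are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

paragraphs = [
    "The plot follows a detective hunting a serial killer in the city.",
    "Cinematography is stunning, with long takes and moody lighting.",
    "The detective story builds tension until the final confrontation.",
    "Lighting and camera work give every scene a distinct visual style.",
]

# Step 1: assign each paragraph to its dominant LDA topic.
counts = CountVectorizer(stop_words="english").fit_transform(paragraphs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_of = lda.fit_transform(counts).argmax(axis=1)

# Step 2: build a paragraph similarity graph (edges = tf-idf cosine).
tfidf = TfidfVectorizer(stop_words="english").fit_transform(paragraphs)
sim = cosine_similarity(tfidf)

# Step 3: per topic, pick the paragraph with the strongest total edge
# weight to its peers, as in a text relationship map.
for topic in sorted(set(topic_of)):
    members = [i for i, t in enumerate(topic_of) if t == topic]
    rep = max(members, key=lambda i: sum(sim[i][j] for j in members if j != i))
    print(f"Topic {topic}: {paragraphs[rep]}")
```

The thesis's remaining steps (thesaurus matching for the film data and conclusion parts, conjunction removal, and pronoun resolution) would sit before and after this core, and are omitted here.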
|
9 |
Metody sumarizace textových dokumentů / Methods of Text Document Summarization
Pokorný, Lubomír, January 2012
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization: several stemming algorithms are listed, and lemmatization is described as well. The main part is devoted to Luhn's summarization method and its extension using the WordNet dictionary; the Oswald summarization method is also described and applied. The designed and implemented application automatically generates abstracts using these methods. A set of experiments was conducted to verify the correct functionality of the application and of the extended Luhn method.
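Since the thesis centers on Luhn's method, a compact sketch of the classic heuristic may help: sentences are scored by their densest cluster of statistically significant words, using Luhn's measure of (significant words in cluster) squared over the cluster's span. The thresholds and example text are illustrative, and the WordNet extension is not shown.

```python
import re
from collections import Counter

def luhn_summary(text, top_n=2, min_freq=2, max_gap=4):
    """Rank sentences by Luhn's significant-word cluster score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    # "Significant" words: frequent in the document (toy threshold).
    significant = {w for w, c in freq.items() if c >= min_freq}

    def rate(cluster):
        # Luhn's measure: (significant words)^2 / span of the cluster.
        span = cluster[-1] - cluster[0] + 1
        return len(cluster) ** 2 / span

    def score(sentence):
        toks = re.findall(r"[a-z]+", sentence.lower())
        idx = [i for i, t in enumerate(toks) if t in significant]
        if not idx:
            return 0.0
        best, start = 0.0, 0
        for k in range(1, len(idx)):
            if idx[k] - idx[k - 1] > max_gap:  # break cluster at a wide gap
                best = max(best, rate(idx[start:k]))
                start = k
        return max(best, rate(idx[start:]))

    return sorted(sentences, key=score, reverse=True)[:top_n]

text = ("Automatic summarization selects the most important sentences. "
        "Luhn scored sentences by clusters of frequent words. "
        "Frequent words that cluster together mark important sentences. "
        "The weather was pleasant that day.")
print(luhn_summary(text))
```

The WordNet extension the thesis describes would plausibly widen the significant-word set with synonyms before scoring, leaving the cluster scoring itself unchanged.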
|