11 |
Sentence Compression by Removing Recursive Structure from Parse Tree / Matsubara, Shigeki, Kato, Yoshihide, Egawa, Seiji 04 December 2008 (has links)
PRICAI 2008: Trends in Artificial Intelligence 10th Pacific Rim International Conference on Artificial Intelligence, Hanoi, Vietnam, December 15-19, 2008. Proceedings
12 |
Automatic Video Categorization And Summarization / Demirtas, Kezban 01 September 2009 (has links) (PDF)
In this thesis, we perform automatic video categorization and summarization using the subtitles of videos. We propose two methods for video categorization. The first makes unsupervised categorization by applying natural language processing techniques to video subtitles, using the WordNet lexical database and WordNet Domains. The method starts with text preprocessing; then a keyword extraction algorithm and a word sense disambiguation method are applied. The WordNet domains corresponding to the correct senses of the keywords are extracted, and the video is assigned a category label based on these domains. The second method extracts the WordNet domains of a video in the same way but performs categorization with a learning module. Experiments with documentary videos give promising results in discovering the correct categories of videos.
Video summarization algorithms present condensed versions of a full-length video by identifying its most significant parts. We propose a video summarization method that uses the subtitles of videos together with text summarization techniques: we identify significant sentences in a video's subtitles using text summarization techniques and then compose a video summary by finding the video parts corresponding to these summary sentences.
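The alignment step described in this abstract can be sketched as follows. This is a hypothetical illustration of the idea, not the thesis's implementation: summary sentences are matched back to the subtitle entries that contain them, and the resulting time ranges are merged into playable segments. All function names and the subtitle format are assumptions.

```python
# Hypothetical sketch: given sentences selected by a text summarizer, recover
# the (start, end) times of the subtitle entries containing them, then merge
# overlapping segments into a final shot list for the video summary.

def align_summary_to_video(summary_sentences, subtitles, padding=1.0):
    """subtitles: list of (start_sec, end_sec, text) tuples."""
    segments = []
    for sentence in summary_sentences:
        for start, end, text in subtitles:
            if sentence in text:
                segments.append((max(0.0, start - padding), end + padding))
                break
    # Merge overlapping or adjacent segments so the summary plays smoothly.
    segments.sort()
    merged = []
    for start, end in segments:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With zero padding, two summary sentences drawn from non-adjacent subtitle entries yield two separate video segments.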
13 |
Text Summarization Using Latent Semantic Analysis / Ozsoy, Makbule Gulcin 01 February 2011 (has links) (PDF)
Text summarization addresses the problem of presenting the information needed by a user in a compact form. There are different approaches to creating well-formed summaries in the literature. One of the newer methods in text summarization is Latent Semantic Analysis (LSA). In this thesis, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish and English documents, and their performances are compared using their ROUGE scores.
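One classical LSA-based selection scheme (in the style of Gong and Liu) can be sketched in a few lines: build a term-by-sentence frequency matrix, take its SVD, and for each of the top singular vectors pick the sentence with the largest weight. This is an illustration of the family of algorithms the abstract refers to, not the thesis's specific proposals; it assumes NumPy is available.

```python
import numpy as np

def lsa_summary(sentences, num_sentences=2):
    # Term-by-sentence raw frequency matrix.
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    a = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            a[vocab.index(w), j] += 1.0
    # V^T has one row per latent topic; columns index sentences.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    chosen = []
    for topic in vt:
        j = int(np.argmax(np.abs(topic)))  # strongest sentence for this topic
        if j not in chosen:
            chosen.append(j)
        if len(chosen) == num_sentences:
            break
    return [sentences[j] for j in sorted(chosen)]
```

Each selected sentence represents one dominant latent topic, which is why this style of summary tends to cover distinct themes rather than repeat the strongest one.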
14 |
Generalized Probabilistic Topic and Syntax Models for Natural Language Processing / Darling, William Michael 14 September 2012 (has links)
This thesis proposes a generalized probabilistic approach to modelling document collections along the combined axes of both semantics and syntax. Probabilistic topic (or semantic) models view documents as random mixtures of unobserved latent topics which are themselves represented as probability distributions over words. They have grown immensely in popularity since the introduction of the original topic model, Latent Dirichlet Allocation (LDA), in 2003, and have seen successes in computational linguistics, bioinformatics, political science, and many other fields. Furthermore, the modular nature of topic models allows them to be extended and adapted to specific tasks with relative ease. Despite these successes, however, there remains a gap in combining axes of information from different sources and in developing models that are as useful as possible for specific applications, particularly in Natural Language Processing (NLP). The main contributions of this thesis are two-fold. First, we present generalized probabilistic models (both parametric and nonparametric) that are semantically and syntactically coherent and contain many simpler probabilistic models as special cases. Our models are consistent along both axes of word information in that an LDA-like component sorts words that are semantically related into distinct topics and a Hidden Markov Model (HMM)-like component determines the syntactic parts of speech of words, so that we can group words that are both semantically and syntactically affiliated in an unsupervised manner, leading to such groups as verbs about health care and nouns about sports. Second, we apply our generalized probabilistic models to two NLP tasks. Specifically, we present new approaches to automatic text summarization and unsupervised part-of-speech (POS) tagging using our models and report results commensurate with the state of the art in these two sub-fields.
Our successes demonstrate the general applicability of our modelling techniques to important areas in computational linguistics and NLP.
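The generative story such a combined model tells can be illustrated with a toy simulation. The sketch below is a deliberately simplified, hypothetical two-state version (it is not Darling's parametric or nonparametric models): an HMM chooses a syntactic class for each word position, and only "content" positions draw their word from the document's topic mixture, while "function" positions draw from a class-specific distribution.

```python
import random

random.seed(0)

# Toy vocabularies; real models learn these distributions from data.
TOPICS = {0: ["health", "care", "doctor"], 1: ["sports", "team", "goal"]}
FUNCTION_WORDS = ["the", "of", "and"]

def generate_document(length, topic_weights):
    words, state = [], "function"
    for _ in range(length):
        # Two-state HMM over syntactic classes: switch class with prob. 0.7.
        if random.random() < 0.7:
            state = "content" if state == "function" else "function"
        if state == "content":
            # Content words come from the document-level topic mixture.
            topic = random.choices(list(TOPICS), weights=topic_weights)[0]
            words.append(random.choice(TOPICS[topic]))
        else:
            words.append(random.choice(FUNCTION_WORDS))
    return words
```

Inference inverts this story: given only the words, the model recovers topic assignments and syntactic classes jointly, which is what yields clusters like "verbs about health care."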
15 |
Modelo Cassiopeia como avaliador de sumários automáticos: aplicação em um corpus educacional / Aguiar, Luís Henrique Gonçalves de 05 December 2017 (has links)
Previous issue date: 2017 / Dissertação (Mestrado Profissional), Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017. / Considering the large amount of textual information currently available, especially on the web,
it is becoming increasingly difficult for the user to access and assimilate this content. In this context, it becomes necessary to find techniques that can transform this large amount of information into useful and organized knowledge. One alternative for moderating this problem is to reduce the volume of information by producing abstracts of the original texts through automatic summarization (AS). Automatic text summarization consists of the automatic production of summaries from one or more source texts, such that the summary contains the most relevant information of the source. The evaluation of summaries is an important task in the field of automatic text summarization. The most intuitive approach is human evaluation, but it is costly and unproductive. Another alternative is automatic evaluation; several evaluators have been proposed, the best-known and most widely used being ROUGE (Recall-Oriented Understudy for Gisting Evaluation). A limiting factor in ROUGE evaluation is its use of human reference summaries, which restricts the language and domain covered and requires time-consuming and expensive human work. In view of these difficulties, this dissertation presents the Cassiopeia model as a new evaluation method. The model is a hierarchical text clusterer that uses summarization in its pre-processing stage, so that clustering quality is positively influenced by summarization quality. The simulations performed in this work showed that the evaluations produced by Cassiopeia are similar to those of the ROUGE tool. On the other hand, using the Cassiopeia model as an evaluator of automatic summaries showed some advantages, the main ones being that no human reference summary is needed in the evaluation process and that the method is independent of domain and language.
16 |
Automatic Text Summarization Using Importance of Sentences for Email Corpus / January 2015 (has links)
abstract: With the advent of the Internet, the amount of data added online is increasing at an enormous rate. Though search engines use IR techniques to serve users' search requests, the results are often not effective for the search query: the user has to go through several webpages before reaching the one he or she wanted. This problem of information overload can be addressed with automatic text summarization. Summarization is the process of producing an abridged version of a document so that a user can get a quick view of what the document is about. Email threads from W3C are used in this system. Apart from common IR features like term frequency and inverse document frequency, Term Rank, a variation of PageRank based on a graph model that can cluster words with respect to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words within the corpus and evaluates the rank of a word accordingly. Sentences of email threads are ranked according to these features and summaries are generated. The system implements the concept of pyramid evaluation in content selection, and it can be considered a framework for unsupervised learning in text summarization. / Dissertation/Thesis / Masters Thesis Computer Science 2015
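The TF-IDF half of such a pipeline can be sketched as follows; Term Rank itself is the thesis's graph-based extension and is not reproduced here, so this is only a hedged illustration of the baseline feature: score each sentence by the average TF-IDF weight of its words and keep the top-scoring ones.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word occur?
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length-normalized sum of TF-IDF weights.
        score = sum(tf[w] * math.log(n / df[w]) for w in tf) / max(len(d), 1)
        scores.append(score)
    return scores

def summarize(sentences, k=1):
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep original order
```

Sentences made of rare, specific words outrank those dominated by common words, which is the intuition the richer features build on.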
17 |
Keeping an Eye on the Context : An Eye Tracking Study of Cohesion Errors in Automatic Text Summarization / Med ett öga på sammanhanget : En ögonrörelsestudie av kohesionsfel i automatiska textsammanfattningar / Rennes, Evelina January 2013 (has links)
Automatic text summarization is a growing field due to the modern world's Internet-based society, but automatically creating perfect summaries is not easy, and cohesion errors are common. Using an eye-tracking camera, this thesis studies the nature of four different types of cohesion errors occurring in summaries. A total of 23 participants read and rated four different texts and marked the most difficult areas of each text. Statistical analysis of the data revealed that absent cohesion or context and broken anaphoric reference (pronouns) caused some disturbance in reading, but that the impact was restricted to the effort of reading rather than the comprehension of the text. Erroneous anaphoric reference (pronouns) was not detected by the participants, which poses a problem for automatic text summarizers, and other potentially disturbing factors were detected. Finally, the question of the meaningfulness of keeping absent cohesion or context as a separate error type was raised.
18 |
FINE-TUNE A LANGUAGE MODEL FOR TEXT SUMMARIZATION (BERTSUM) ON EDGAR-CORPUS / Niu, Yijie January 2022 (has links)
Financial reports include a lot of useful information for investors, but extracting this information is time-consuming, and we consider text summarization a feasible remedy. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate the results with ROUGE metrics. The experiment was carried out on a novel, large-scale financial dataset called EDGAR-CORPUS. BERTSUM with a Transformer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked some model-generated summaries that contained common errors and investigated the causes. The results were then compared to previous research; the ROUGE-L F1 value in the previous study was much higher than ours, which we attribute to the length of the financial reports.
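A score like the reported ROUGE-L F1 is based on the longest common subsequence (LCS) between candidate and reference summaries. The sketch below is a plain re-implementation for illustration; published results use the official ROUGE toolkit, which adds stemming and other preprocessing.

```python
def lcs_length(a, b):
    # Classic dynamic-programming LCS over token sequences.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Because recall divides by the reference length, very long reference summaries (as in financial reports) drag the F1 score down even when the candidate text is accurate, consistent with the low absolute scores discussed above.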
19 |
Summarization and keyword extraction on customer feedback data : Comparing different unsupervised methods for extracting trends and insight from text / Skoghäll, Therése, Öhman, David January 2022 (has links)
Polestar has, during the last couple of months, more than doubled its amount of customer feedback, and the forecast is that this amount will increase even more. Manually reading this feedback is expensive and time-consuming, so there is a need to analyse it automatically. The company wants to understand the customer and extract the trends and topics that concern consumers in order to improve the customer experience. Over the last couple of years, as Natural Language Processing has developed immensely, new state-of-the-art language models have pushed the boundaries on all types of benchmark tasks. In this thesis, three different extractive summarization models and three different keyword extraction methods were tested and evaluated, based on two quantitative measures and human evaluation, for extracting information from text. This master thesis has shown that extractive summarization models with a Transformer-based text representation are best at capturing the context in a text. Based on the quantitative results and the company's needs, TextRank with a Transformer-based embedding was chosen as the final extractive summarization model. For keyword extraction, the best overall model was YAKE!, based on the quantitative measures and human validation.
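The TextRank half of the chosen pipeline can be sketched as follows. Here sentence similarity is plain word overlap (Jaccard); the thesis instead uses Transformer-based sentence embeddings, so treat this as a structural illustration only, with all names assumed.

```python
def textrank(sentences, damping=0.85, iterations=50):
    def similarity(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / (len(wa | wb) or 1)

    n = len(sentences)
    # Undirected similarity graph with no self-loops.
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)] for i, a in enumerate(sentences)]
    ranks = [1.0 / n] * n
    for _ in range(iterations):  # power iteration, as in PageRank
        new = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / (sum(sim[j]) or 1) * ranks[j] for j in range(n))
            new.append((1 - damping) / n + damping * incoming)
        ranks = new
    return ranks
```

Swapping the `similarity` function for cosine similarity between sentence embeddings gives the Transformer-based variant the thesis selected; the ranking machinery stays the same.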
20 |
Numerical Optimization Methods based on Discrete Structure for Text Summarization and Relational Learning / 文書要約と関係学習のための離散構造に基づいた数値最適化法 / Nishino, Masaaki 24 September 2014 (has links)
Kyoto University / 0048 / New-system doctorate by coursework / Doctor of Informatics / Doctorate No. 18613 (Kō) / Informatics No. 537 / 新制||情||95 (University Library) / 31513 / Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University / (Examination committee) Prof. Akihiro Yamamoto, Prof. Sadao Kurohashi, Prof. Tatsuya Akutsu / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM