Global ETD Search

11	Multi-document Summarization System Using Rhetorical Information Alliheedi, Mohammed 03 July 2012 (has links) Over the past 20 years, research in automated text summarization has grown significantly in the field of natural language processing. The massive availability of scientific and technical information on the Internet, including journals, conferences, and news articles has attracted the interest of various groups of researchers working in text summarization. These researchers include linguistics, biologists, database researchers, and information retrieval experts. However, because the information available on the web is ever expanding, reading the sheer volume of information is a significant challenge. To deal with this volume of information, users need appropriate summaries to help them more efficiently manage their information needs. Although many automated text summarization systems have been proposed in the past twenty years, none of these systems have incorporated the use of rhetoric. To date, most automated text summarization systems have relied only on statistical approaches. These approaches do not take into account other features of language such as antimetabole and epanalepsis. Our hypothesis is that rhetoric can provide this type of additional information. This thesis addresses these issues by investigating the role of rhetorical figuration in detecting the salient information in texts. We show that automated multi-document summarization can be improved using metrics based on rhetorical figuration. A corpus of presidential speeches, which is for different U.S. presidents speeches, has been created. It includes campaign, state of union, and inaugural speeches to test our proposed multi-document summarization system. Various evaluation metrics have been used to test and compare the performance of the produced summaries of both our proposed system and other system. Our proposed multi-document summarization system using rhetorical figures improves the produced summaries, and achieves better performance over MEAD system in most of the cases especially in antimetabole, polyptoton, and isocolon. Overall, the results of our system are promising and leads to future progress on this research. Rhetorical Summarization Computer Science
12	Discovering and summarizing email conversations Zhou, Xiaodong 05 1900 (has links) With the ever increasing popularity of emails, it is very common nowadays that people discuss specific issues, events or tasks among a group of people by emails. Those discussions can be viewed as conversations via emails and are valuable for the user as a personal information repository. For instance, in 10 minutes before a meeting, a user may want to quickly go through a previous discussion via emails that is going to be discussed in the meeting soon. In this case, rather than reading each individual email one by one, it is preferable to read a concise summary of the previous discussion with major information summarized. In this thesis, we study the problem of discovering and summarizing email conversations. We believe that our work can greatly support users with their email folders. However, the characteristics of email conversations, e.g., lack of synchronization, conversational structure and informal writing style, make this task particularly challenging. In this thesis, we tackle this task by considering the following aspects: discovering emails in one conversation, capturing the conversation structure and summarizing the email conversation. We first study how to discover all emails belonging to one conversation. Specifically, we study the hidden email problem, which is important for email summarization and other applications but has not been studied before. We propose a framework to discover and regenerate hidden emails. The empirical evaluation shows that this framework is accurate and scalable to large folders. Second, we build a fragment quotation graph to capture email conversations. The hidden emails belonging to each conversation are also included into the corresponding graph. Based on the quotation graph, we develop a novel email conversation summarizer, ClueWordSummarizer. The comparison with a state-of-the-art email summarizer as well as with a popular multi-document summarizer shows that ClueWordSummarizer obtains a higher accuracy in most cases. Furthermore, to address the characteristics of email conversations, we study several ways to improve the ClueWordSummarizer by considering more lexical features. The experiments show that many of those improvements can significantly increase the accuracy especially the subjective words and phrases. / Science, Faculty of / Computer Science, Department of / Graduate email summarization data mining
13	A comparative study of automatic text summarization using human evaluation and automatic measures / En jämförande studie av automatisk textsammanfattning med användning av mänsklig utvärdering och automatiska mått Wennstig, Maja January 2023 (has links) Automatic text summarization has emerged as a promising solution to manage the vast amount of information available on the internet, enabling a wider audience to access it. Nevertheless, further development and experimentation with different approaches are still needed. This thesis explores the potential of combining extractive and abstractive approaches into a hybrid method, generating three types of summaries: extractive, abstractive, and hybrid. The news articles used in the study are from the Swedish newspaper Dagens Nyheter(DN). The quality of the summaries is assessed using various automatic measures, including ROUGE, BERTScore, and Coh-Metrix. Additionally, human evaluations are conducted to compare the different types of summaries in terms of perceived fluency, adequacy, and simplicity. The results of the human evaluation show a statistically significant difference between attractive, abstractive, and hybrid summaries with regard to fluency, adequacy, and simplicity. Specifically, there is a significant difference between abstractive and hybrid summaries in terms of fluency and simplicity, but not in adequacy. The automatic measures, however, do not show significant differences between the different summaries but tend to give higher scores to the hybrid and abstractive summaries Extractive summarization Abstractive summarization Hybrid summarization Automatic text summarization
14	Evaluation of Automatic Text Summarization Using Synthetic Facts Ahn, Jaewook 01 June 2022 (has links) (PDF) Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets. It can generate fluent, human-like summaries. However, the unreliability of the existing evaluation metrics hinders its practical usage and slows down its progress. To address this issue, we propose an automatic reference-less text summarization evaluation system with dynamically generated synthetic facts. We hypothesize that if a system guarantees a summary that has all the facts that are 100% known in the synthetic document, it can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, our system is the first system that measures the overarching quality of the text summarization models with factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment with existing N-gram overlap-based metrics such as ROUGE and BLEU and a BERT-based evaluation metric, BERTScore. Our system's experimental evaluation of PEGASUS, BART, and T5 outperforms the current evaluation metrics in measuring factual consistency with a noticeable margin and demonstrates its statistical significance in measuring comprehensiveness and overall summary quality. Natural Language Processing Text summarization Summarization Evaluation Synthetic Facts
15	Supervised machine learning for email thread summarization Ulrich, Jan 11 1900 (has links) Email has become a part of most people's lives, and the ever increasing amount of messages people receive can lead to email overload. We attempt to mitigate this problem using email thread summarization. Summaries can be used for things other than just replacing an incoming email message. They can be used in the business world as a form of corporate memory, or to allow a new team member an easy way to catch up on an ongoing conversation. Email threads are of particular interest to summarization because they contain much structural redundancy due to their conversational nature. Our email thread summarization approach uses machine learning to pick which sentences from the email thread to use in the summary. A machine learning summarizer must be trained using previously labeled data, i.e. manually created summaries. After being trained our summarization algorithm can generate summaries that on average contain over 70% of the same sentences as human annotators. We show that labeling some key features such as speech acts, meta sentences, and subjectivity can improve performance to over 80% weighted recall. To create such email summarization software, an email dataset is needed for training and evaluation. Since email communication is a private matter, it is hard to get access to real emails for research. Furthermore these emails must be annotated with human generated summaries as well. As these annotated datasets are rare, we have created one and made it publicly available. The BC3 corpus contains annotations for 40 email threads which include extractive summaries, abstractive summaries with links, and labeled speech acts, meta sentences, and subjective sentences. While previous research has shown that machine learning algorithms are a promising approach to email summarization, there has not been a study on the impact of the choice of algorithm. We explore new techniques in email thread summarization using several different kinds of regression, and the results show that the choice of classifier is very critical. We also present a novel feature set for email summarization and do analysis on two email corpora: the BC3 corpus and the Enron corpus. Email Summarization Machine learning Corpus
16	Supervised machine learning for email thread summarization Ulrich, Jan 11 1900 (has links) Email has become a part of most people's lives, and the ever increasing amount of messages people receive can lead to email overload. We attempt to mitigate this problem using email thread summarization. Summaries can be used for things other than just replacing an incoming email message. They can be used in the business world as a form of corporate memory, or to allow a new team member an easy way to catch up on an ongoing conversation. Email threads are of particular interest to summarization because they contain much structural redundancy due to their conversational nature. Our email thread summarization approach uses machine learning to pick which sentences from the email thread to use in the summary. A machine learning summarizer must be trained using previously labeled data, i.e. manually created summaries. After being trained our summarization algorithm can generate summaries that on average contain over 70% of the same sentences as human annotators. We show that labeling some key features such as speech acts, meta sentences, and subjectivity can improve performance to over 80% weighted recall. To create such email summarization software, an email dataset is needed for training and evaluation. Since email communication is a private matter, it is hard to get access to real emails for research. Furthermore these emails must be annotated with human generated summaries as well. As these annotated datasets are rare, we have created one and made it publicly available. The BC3 corpus contains annotations for 40 email threads which include extractive summaries, abstractive summaries with links, and labeled speech acts, meta sentences, and subjective sentences. While previous research has shown that machine learning algorithms are a promising approach to email summarization, there has not been a study on the impact of the choice of algorithm. We explore new techniques in email thread summarization using several different kinds of regression, and the results show that the choice of classifier is very critical. We also present a novel feature set for email summarization and do analysis on two email corpora: the BC3 corpus and the Enron corpus. Email Summarization Machine learning Corpus
17	Supervised machine learning for email thread summarization Ulrich, Jan 11 1900 (has links) Email has become a part of most people's lives, and the ever increasing amount of messages people receive can lead to email overload. We attempt to mitigate this problem using email thread summarization. Summaries can be used for things other than just replacing an incoming email message. They can be used in the business world as a form of corporate memory, or to allow a new team member an easy way to catch up on an ongoing conversation. Email threads are of particular interest to summarization because they contain much structural redundancy due to their conversational nature. Our email thread summarization approach uses machine learning to pick which sentences from the email thread to use in the summary. A machine learning summarizer must be trained using previously labeled data, i.e. manually created summaries. After being trained our summarization algorithm can generate summaries that on average contain over 70% of the same sentences as human annotators. We show that labeling some key features such as speech acts, meta sentences, and subjectivity can improve performance to over 80% weighted recall. To create such email summarization software, an email dataset is needed for training and evaluation. Since email communication is a private matter, it is hard to get access to real emails for research. Furthermore these emails must be annotated with human generated summaries as well. As these annotated datasets are rare, we have created one and made it publicly available. The BC3 corpus contains annotations for 40 email threads which include extractive summaries, abstractive summaries with links, and labeled speech acts, meta sentences, and subjective sentences. While previous research has shown that machine learning algorithms are a promising approach to email summarization, there has not been a study on the impact of the choice of algorithm. We explore new techniques in email thread summarization using several different kinds of regression, and the results show that the choice of classifier is very critical. We also present a novel feature set for email summarization and do analysis on two email corpora: the BC3 corpus and the Enron corpus. / Science, Faculty of / Computer Science, Department of / Graduate Email Summarization Machine learning Corpus
18	An Empirical Study Investigating Source Code Summarization Using Multiple Sources of Information Sama, Sanjana 30 May 2018 (has links) No description available. Computer Science Information Systems Information Technology Automatic source code summarization Eye-tracking in code summarization Software engineering Code summarization
19	Topic-focused and summarized web information retrieval Yoo, Seung Yeol, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) Since the Web is getting bigger and bigger with a rapidly increasing number of heterogeneous Web pages, Web users often suffer from two problems: P1) irrelevant information and P2) information overload Irrelevant information indicates the weak relevance between the retrieved information and a user's information need. Information overload indicates that the retrieved information may contain 1) redundant information (e.g., common information between two retrieved Web pages) or 2) too much amount of information which cannot be easily understood by a user. We consider four major causes of those two problems P1) and P2) as follows; ??? Firstly, ambiguous query-terms. ??? Secondly, ambiguous terms in a Web page. ??? Thirdly, a query and a Web page cannot be semantically matched, because of the first and second causes. ??? Fourthly, the whole content of a Web page is a coarse context-boundary to measure the similarity between the Web page and a query. To answer those two problems P1) and P2), we consider that the meanings of words in a Web page and a query are primitive hints for understanding the related semantics of the Web page. Thus, in this dissertation, we developed three cooperative technologies: Word Sense Based Web Information Retrieval (WSBWIR), Subjective Segment Importance Model (SSIM) and Topic Focused Web Page Summarization (TFWPS). ??? WSBWIR allows for a user to 1) describe their information needs at senselevel and 2) provides one way for users to conceptually explore information existing within Web pages. ??? SSIM discovers a semantic structure of a Web page. A semantic structure respects not only Web page authors logical presentation structures but also a user specific topic interests on the Web pages at query time. ??? TFWPS dynamically generates extractive summaries respecting a user's topic interests. WSBWIR, SSIM and TFWPS technologies are implemented and experimented through several case-studies, classification and clustering tasks. Our experiments demonstrated that 1) the comparable effectiveness of exploration of Web pages using word senses, and 2) the segments partitioned by SSIM and summaries generated by TFWPS can provide more topically coherent features for classification and clustering purposes. Topic-focus Summarization Web Information Retrieval
20	Summary-based document categorization with LSI Liu, Hsiao-Wen 14 February 2007 (has links) Text categorization to automatically assign documents into the appropriate pre-defined category or categories is essential to facilitating the retrieval of desired documents efficiently and effectively from a huge text depository, e.g., the world-wide web. Most techniques, however, suffer from the feature selection problem and the vocabulary mismatch problem. A few research works have addressed on text categorization via text summarization to reduce the size of documents, and consequently the number of features to consider, while some proposed using latent semantic indexing (LSI) to reveal the true meaning of a term via its association with other terms. Few works, however, have studied the joint effect of text summarization and the semantic dimension reduction technique in the literature. The objective of this research is thus to propose a practical approach, SBDR to deal with the above difficulties in text categorization tasks. Two experiments are conducted to validate our proposed approach. In the first experiment, the results show that text summarization does improve the performance in categorization. In addition, to construct important sentences, the association terms of both noun-noun and noun-verb pairs should be considered. Results of the second experiment indicate slight better performance with the approach of adopting LSI exclusively (i.e. no summarization) than that with SBDR (i.e. with summarization). Nonetheless, the minor accuracy reduction can be largely compensated for the computational time saved using LSI with text summarized. The feasibility of the SBDR approach is thus justified. Document Categorization Latent Semantic Indexing Text Summarization

Search results