Return to search

Automatic Text Summarization Using Importance of Sentences for Email Corpus

abstract: With the advent of Internet, the data being added online is increasing at enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain webpages before getting at the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining at abridged version of documents so that user can have a quick view to understand what exactly the document is about. Email threads from W3C are used in this system. Apart from common IR features like Term Frequency, Inverse Document Frequency, Term Rank, a variation of page rank based on graph model, which can cluster the words with respective to word ambiguity, is implemented. Term Rank also considers the possibility of co-occurrence of words with the corpus and evaluates the rank of the word accordingly. Sentences of email threads are ranked as per features and summaries are generated. System implemented the concept of pyramid evaluation in content selection. The system can be considered as a framework for Unsupervised Learning in text summarization. / Dissertation/Thesis / Masters Thesis Computer Science 2015

Identiferoai:union.ndltd.org:asu.edu/item:34929
Date January 2015
ContributorsNadella, Sravan (Author), Davulcu, Hasan (Advisor), Li, Baoxin (Committee member), Sen, Arunabha (Committee member), Arizona State University (Publisher)
Source SetsArizona State University
LanguageEnglish
Detected LanguageEnglish
TypeMasters Thesis
Format41 pages
Rightshttp://rightsstatements.org/vocab/InC/1.0/, All Rights Reserved

Page generated in 0.0019 seconds