Automatic text summarization is a long-standing task with its origins in
summarizing scholarly documents by generating their abstracts. While
older approaches focused mainly on extractive summaries, recent
approaches based on neural architectures have advanced the task
towards generating more abstractive, human-like summaries.
Yet, the majority of research in automatic text summarization has focused
on summarizing professionally written news articles, owing to the greater
availability of large-scale datasets with ground-truth summaries in this domain.
Moreover, the inverted-pyramid writing style enforced in news articles
places crucial information in the opening sentences, essentially summarizing
the article, which allows ground truth to be identified more reliably when
constructing datasets. In contrast, user-generated discourse, such as social
media forums or debate portals, has received comparatively little attention,
despite its evident importance. Possible reasons include the challenges posed
by the informal nature of user-generated discourse, which, unlike news
articles, often lacks a rigid structure, and the difficulty of obtaining
high-quality ground-truth summaries for this text register.
This thesis aims to address this existing gap by delivering the following
novel contributions in the form of datasets, methodologies, and evaluation
strategies for automatically summarizing user-generated discourse:
(1) three new datasets for the registers of social media posts and argumentative
texts, containing author-provided ground-truth summaries as well as
crowdsourced summaries for argumentative texts created by adapting theoretical
definitions of high-quality summaries; (2) methodologies for creating informative
as well as indicative summaries for long discussions of controversial
topics; (3) user-centric evaluation processes that emphasize the purpose
and provenance of the summary for the qualitative assessment of summarization
models; and (4) tools for facilitating the development and evaluation
of summarization models that leverage visual analytics and interactive
interfaces to enable a fine-grained inspection of the automatically generated
summaries in relation to their source documents.

1 Introduction
1.1 Understanding User-Generated Discourse
1.2 The Role of Automatic Summarization
1.3 Research Questions and Contributions
1.4 Thesis Structure
1.5 Publication Record
2 The Task of Text Summarization
2.1 Decoding Human Summarization Practices
2.2 Exploring Automatic Summarization Methods
2.3 Evaluation of Automatic Summarization and its Challenges
2.4 Summary
3 Defining Good Summaries: Examining News Editorials
3.1 Key Characteristics of News Editorials
3.2 Operationalizing High-Quality Summaries
3.3 Evaluating and Ensuring Summary Quality
3.4 Automatic Extractive Summarization of News Editorials
3.5 Summary
4 Mining Social Media for Author-provided Summaries
4.1 Leveraging Human Signals for Summary Identification
4.2 Constructing a Corpus of Abstractive Summaries
4.3 Insights from the TL;DR Challenge
4.4 Summary
5 Generating Conclusions for Argumentative Texts
5.1 Identifying Author-provided Conclusions
5.2 Enhancing Pretrained Models with External Knowledge
5.3 Evaluating Informative Conclusion Generation
5.4 Summary
6 Frame-Oriented Extractive Summarization of Argumentative Discussions
6.1 Importance of Summaries for Argumentative Discussions
6.2 Employing Argumentation Frames as Anchor Points
6.3 Extractive Summarization of Argumentative Discussions
6.4 Evaluation of Extractive Summaries via Relevance Judgments
6.5 Summary
7 Indicative Summarization of Long Discussions
7.1 Table of Contents as an Indicative Summary
7.2 Unsupervised Summarization with Large Language Models
7.3 Comprehensive Analysis of Prompt Engineering
7.4 Purpose-driven Evaluation of Summary Usefulness
7.5 Summary
8 Summary Explorer: Visual Analytics for the Qualitative Assessment of the State of the Art in Text Summarization
8.1 Limitations of Automatic Evaluation Metrics
8.2 Designing Interfaces for Visual Exploration of Summaries
8.3 Corpora, Models, and Case Studies
8.4 Summary
9 SummaryWorkbench: Reproducible Models and Metrics for Text Summarization
9.1 Addressing the Requirements for Summarization Researchers
9.2 A Unified Interface for Applying and Evaluating State-of-the-Art Models and Metrics
9.3 Models and Measures
9.4 Curated Artifacts and Interaction Scenarios
9.5 Interaction Use Cases
9.6 Summary
10 Conclusion
10.1 Key Contributions of the Thesis
10.2 Open Problems and Future Work
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:92498 |
Date | 04 July 2024 |
Creators | Syed, Shahbaz |
Contributors | Universität Leipzig |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Type | info:eu-repo/semantics/acceptedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |