Abstract
Recent studies on extractive text summarization formulate it as a combinatorial optimization problem, extracting the optimal subset from a set of the textual units that maximizes an objective function without violating the length constraint. Although these methods successfully improve automatic evaluation scores, they do not consider the discourse structure in the source document. Thus, summaries generated by these methods may lack logical coherence. In previous work, we proposed a method that exploits a discourse tree structure to produce coherent summaries. By transforming a traditional discourse tree, namely a rhetorical structure theory-based discourse tree (RST-DT), into a dependency-based discourse tree (DEP-DT), we formulated the summarization procedure as a Tree Knapsack Problem whose tree corresponds to the DEP-DT. This paper extends the work with a detailed discussion of the approach together with a novel efficient dynamic programming algorithm for solving the Tree Knapsack Problem. Experiments show that our method not only achieved the highest score in both automatic and human evaluation, but also obtained good performance in terms of the linguistic qualities of the summaries.
Original language | English |
---|---|
Pages (from-to) | 2081-2092 |
Number of pages | 12 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 23 |
Issue number | 11 |
DOIs | |
Publication status | Published - 2015 Nov 1 |
Externally published | Yes |
Keywords
- Discourse analysis
- single-document summarization
- tree knapsack problem
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering