Summarizing a Document by Trimming the Discourse Tree

Tsutomu Hirao, Masaaki Nishino, Yasuhisa Yoshida, Jun Suzuki, Norihito Yasuda, Masaaki Nagata

Research output: Contribution to journal / Article / Peer-reviewed

7 Citations (Scopus)

Abstract

Recent studies on extractive text summarization formulate it as a combinatorial optimization problem: extracting, from a set of textual units, the optimal subset that maximizes an objective function without violating a length constraint. Although these methods successfully improve automatic evaluation scores, they do not consider the discourse structure of the source document. Thus, the summaries they generate may lack logical coherence. In previous work, we proposed a method that exploits a discourse tree structure to produce coherent summaries. By transforming a traditional discourse tree, namely a rhetorical structure theory-based discourse tree (RST-DT), into a dependency-based discourse tree (DEP-DT), we formulated the summarization procedure as a Tree Knapsack Problem whose tree corresponds to the DEP-DT. This paper extends that work with a detailed discussion of the approach, together with a novel, efficient dynamic programming algorithm for solving the Tree Knapsack Problem. Experiments show that our method not only achieved the highest scores in both automatic and human evaluations but also performed well in terms of the linguistic quality of the summaries.
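To make the Tree Knapsack Problem concrete, here is a minimal sketch. Each tree node (assumed here to be a textual unit such as an elementary discourse unit of the DEP-DT) has an integer length and a real-valued score; a node may be selected only if its parent is selected, so the result is a subtree containing the root, and its total length must not exceed a budget. This is a generic O(nL) dynamic program over a pre-order traversal, not the algorithm proposed in the paper. The function name `tree_knapsack`, the parent-array encoding, and the toy scores are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def tree_knapsack(parent: Sequence[int], weight: Sequence[int],
                  score: Sequence[float], budget: int) -> Tuple[float, List[int]]:
    """Hypothetical helper: best-scoring rooted subtree within a length budget.

    parent[v] is the parent of node v, or -1 for the root (exactly one root
    is assumed). A node may be selected only if its parent is selected, so
    any solution is either empty or a subtree that contains the root.
    """
    n = len(parent)
    if n == 0:
        return 0.0, []
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p < 0:
            root = v
        else:
            children[p].append(v)

    # Iterative pre-order traversal of the tree.
    order: List[int] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))

    # size[v] = number of nodes in the subtree rooted at v.
    size = [1] * n
    for v in reversed(order):
        for c in children[v]:
            size[v] += size[c]

    # dp[i][c]: best score from pre-order positions i..n-1 with capacity c.
    # Taking order[i] moves to i + 1; skipping it jumps past its whole
    # subtree, so a node is only ever reached when its parent was taken.
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    take = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n - 1, -1, -1):
        v = order[i]
        after_skip = i + size[v]
        for c in range(budget + 1):
            best = dp[after_skip][c]
            if weight[v] <= c and score[v] + dp[i + 1][c - weight[v]] > best:
                best = score[v] + dp[i + 1][c - weight[v]]
                take[i][c] = True
            dp[i][c] = best

    # Follow the recorded decisions forward to recover the selected nodes.
    selected: List[int] = []
    i, c = 0, budget
    while i < n:
        v = order[i]
        if take[i][c]:
            selected.append(v)
            c -= weight[v]
            i += 1
        else:
            i += size[v]
    return dp[0][budget], sorted(selected)


# Toy tree: node 0 is the root, 1 and 2 are its children, 3 is a child of 1.
best, nodes = tree_knapsack(parent=[-1, 0, 0, 1], weight=[2, 3, 4, 1],
                            score=[1.0, 2.0, 3.0, 5.0], budget=6)
print(best, nodes)  # 8.0 [0, 1, 3]
```

In the toy example, node 3 has the highest score but can only be kept together with its ancestors 1 and 0; this dependency constraint is what distinguishes the tree knapsack from the ordinary knapsack formulation used by the extractive methods the abstract contrasts against.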

Original language: English
Pages (from-to): 2081-2092
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 23
Issue number: 11
Publication status: Published (1 November 2015)
Externally published: Yes

Keywords

  • Discourse analysis
  • Single-document summarization
  • Tree knapsack problem

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Instrumentation
  • Acoustics and Ultrasonics
  • Linguistics and Language
  • Electrical and Electronic Engineering
  • Speech and Hearing

