Implementation and Evaluation of Decision Trees with Range and Region Splitting

Yasuhiko Morimoto, Takeshi Fukuda, Shinichi Morishita, Takeshi Tokuyama

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. Conventional methods for handling numeric attributes are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid-regions in the plane associated with the pair of attributes. For R ∈ ℛ, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which ℛ is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. We have confirmed that we can compute the optimal region efficiently. Diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlations among numeric attributes that could not be found without our region splitting.
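The region-splitting criterion described in the abstract can be illustrated with a small sketch for the rectangle case. It is not the authors' algorithm or the SONAR implementation: it assumes a two-class objective attribute, assumes the data have already been bucketed into a grid of per-cell class counts, and uses an exhaustive search over all axis-aligned rectangles in place of the efficient algorithms the paper gives. The names entropy, split_entropy, prefix_sums, rect_sum, and best_rectangle are hypothetical and chosen only for this illustration.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos/neg class counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def split_entropy(in_pos, in_neg, out_pos, out_neg):
    """Size-weighted entropy of the split: data inside R vs. data outside R."""
    n_in = in_pos + in_neg
    n_out = out_pos + out_neg
    n = n_in + n_out
    if n == 0:
        return 0.0
    return (n_in / n) * entropy(in_pos, in_neg) + (n_out / n) * entropy(out_pos, out_neg)


def prefix_sums(grid):
    """2D prefix sums so that any rectangle count is an O(1) lookup."""
    rows, cols = len(grid), len(grid[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = grid[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_sum(ps, r0, c0, r1, c1):
    """Sum of grid cells in rows r0..r1 and columns c0..c1 (inclusive)."""
    return ps[r1 + 1][c1 + 1] - ps[r0][c1 + 1] - ps[r1 + 1][c0] + ps[r0][c0]


def best_rectangle(pos_grid, neg_grid):
    """Brute-force search (an assumption of this sketch, not the paper's
    method) for the rectangle whose inside/outside split has minimum entropy.
    Returns (entropy, (r0, c0, r1, c1))."""
    rows, cols = len(pos_grid), len(pos_grid[0])
    pp, pn = prefix_sums(pos_grid), prefix_sums(neg_grid)
    tot_pos, tot_neg = pp[rows][cols], pn[rows][cols]
    best = (math.inf, None)
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    ip = rect_sum(pp, r0, c0, r1, c1)
                    ineg = rect_sum(pn, r0, c0, r1, c1)
                    h = split_entropy(ip, ineg, tot_pos - ip, tot_neg - ineg)
                    if h < best[0]:
                        best = (h, (r0, c0, r1, c1))
    return best


# Toy grid: positives sit only in the centre cell, negatives everywhere else.
pos_grid = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
neg_grid = [[2, 2, 2], [2, 0, 2], [2, 2, 2]]
best_h, best_rect = best_rectangle(pos_grid, neg_grid)
```

On this toy grid the search selects the single centre cell, since that rectangle separates the two classes exactly. The exhaustive loop examines every pair of row bounds and column bounds, so it is only practical for small grids; it is meant to show what quantity is being minimized, not how to minimize it efficiently.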

Original language: English
Pages (from-to): 401-427
Number of pages: 27
Journal: Constraints
Volume: 2
Issue number: 3-4
Publication status: Published - 1997

Keywords

  • Decision trees
  • Multivariate tests
  • Range splitting
  • Region splitting

ASJC Scopus subject areas

  • Software
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics
  • Artificial Intelligence
