Sign-constrained linear regression for prediction of microbe concentration based on water quality datasets

Tsuyoshi Kato, Ayano Kobayashi, Wakana Oishi, Syun Suke Kadoya, Satoshi Okabe, Naoya Ohta, Mohan Amarasiri, Daisuke Sano

Research output: Contribution to journal, Article, peer-review

2 Citations (Scopus)


This study presents a novel methodology for estimating the concentrations of environmental pollutants in water, such as pathogens, from environmental parameters. The scientific novelty of this study lies in preventing overfitting (excess conformity) during model fitting by applying domain knowledge, that is, the accumulated scientific knowledge about the correlations between the response and explanatory variables. Domain knowledge was expressed as sign constraints, and the effect of these constraints on prediction performance was investigated using censored datasets. We confirmed that the sign constraints made predictions more accurate than conventional sign-free approaches. The most notable technical contribution of this study is the finding that sign constraints can be incorporated into the estimation of the correlation coefficient in Tobit analysis. We developed effective and numerically stable algorithms for fitting models to datasets under the sign constraints. These algorithms are applicable to a wide variety of predictions of pollutant contamination levels, including pathogen concentrations in water.
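The abstract does not spell out the fitting algorithm, so the following is only a minimal sketch of how sign constraints can be combined with a Tobit model, not the authors' method. It assumes left-censoring at a detection limit (a censored value is recorded as the limit, and the true value lies at or below it) and uses projected gradient descent with backtracking, which is our own choice of solver. The model is written in Olsen's reparameterization, gamma = w / sigma and theta = 1 / sigma, under which the Tobit log-likelihood is concave; because sigma > 0, each gamma_j has the same sign as w_j, so clipping gamma enforces the sign constraints on w. The names `tobit_nll` and `fit_sign_constrained_tobit`, the `signs` encoding (+1, -1, 0), and the toy data are hypothetical.

```python
import math


def _log_norm_pdf(r):
    """Log density of the standard normal distribution."""
    return -0.5 * r * r - 0.5 * math.log(2.0 * math.pi)


def _log_norm_cdf(r):
    """log Phi(r); erfc keeps precision in the lower tail, the floor avoids log(0)."""
    return math.log(max(0.5 * math.erfc(-r / math.sqrt(2.0)), 1e-300))


def _residual(gamma, theta, xi, yi):
    # Standardized residual (y - x.w) / sigma, written in Olsen's parameters.
    return theta * yi - sum(g * v for g, v in zip(gamma, xi))


def tobit_nll(gamma, theta, X, y, censored):
    """Mean negative log-likelihood of a left-censored Tobit model.

    censored[i] is True when y[i] is a detection limit (hypothetical encoding).
    """
    if theta <= 0.0:
        return math.inf
    total = 0.0
    for xi, yi, ci in zip(X, y, censored):
        r = _residual(gamma, theta, xi, yi)
        if ci:
            total += _log_norm_cdf(r)  # P(true value <= detection limit)
        else:
            total += math.log(theta) + _log_norm_pdf(r)  # density of an observed value
    return -total / len(y)


def _tobit_grad(gamma, theta, X, y, censored):
    """Gradient of tobit_nll with respect to (gamma, theta)."""
    n = len(y)
    g_gamma = [0.0] * len(gamma)
    g_theta = 0.0
    for xi, yi, ci in zip(X, y, censored):
        r = _residual(gamma, theta, xi, yi)
        if ci:
            # d log Phi(r) / dr is the inverse Mills ratio phi(r) / Phi(r).
            dl_dr, extra = math.exp(_log_norm_pdf(r) - _log_norm_cdf(r)), 0.0
        else:
            dl_dr, extra = -r, 1.0 / theta  # extra comes from the log(theta) term
        # dr/dgamma_j = -x_j and dr/dtheta = y; the NLL flips the sign once more.
        for j, v in enumerate(xi):
            g_gamma[j] += dl_dr * v / n
        g_theta -= (dl_dr * yi + extra) / n
    return g_gamma, g_theta


def fit_sign_constrained_tobit(X, y, censored, signs, n_iter=2000):
    """Projected gradient descent with backtracking.

    signs[j] is +1 (w_j >= 0), -1 (w_j <= 0) or 0 (unconstrained).
    Returns (w, sigma).
    """
    def project(gamma):
        # Clip each gamma_j onto its allowed half-line.
        return [max(g, 0.0) if s > 0 else min(g, 0.0) if s < 0 else g
                for g, s in zip(gamma, signs)]

    gamma, theta = [0.0] * len(signs), 1.0
    f = tobit_nll(gamma, theta, X, y, censored)
    step = 1.0
    for _ in range(n_iter):
        g_gamma, g_theta = _tobit_grad(gamma, theta, X, y, censored)
        grad = g_gamma + [g_theta]
        step *= 2.0  # let the step grow again before backtracking
        while step > 1e-12:
            new_gamma = project([g - step * gg for g, gg in zip(gamma, g_gamma)])
            new_theta = theta - step * g_theta
            delta = [a - b for a, b in zip(new_gamma, gamma)] + [new_theta - theta]
            f_new = tobit_nll(new_gamma, new_theta, X, y, censored)
            # Sufficient-decrease test for a projected gradient step.
            bound = (f + sum(a * b for a, b in zip(grad, delta))
                     + sum(a * a for a in delta) / (2.0 * step))
            if f_new <= bound:
                break
            step *= 0.5
        else:
            break  # no acceptable step: treat the current point as converged
        gamma, theta, f = new_gamma, new_theta, f_new
    return [g / theta for g in gamma], 1.0 / theta


if __name__ == "__main__":
    # Hypothetical toy data: intercept column plus one explanatory variable;
    # the last response is a detection limit (left-censored).
    X = [[1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
    y = [2.5, 1.5, 0.5, -0.5, -0.5]
    censored = [False, False, False, False, True]
    w, sigma = fit_sign_constrained_tobit(X, y, censored, signs=[0, +1])
    print("w =", w, "sigma =", sigma)
```

When every `censored` flag is False the likelihood reduces to the ordinary Gaussian one, so the same routine performs sign-constrained least squares. If a constraint contradicts the data (for example, requiring a non-positive slope when the data show a positive one), the corresponding coefficient is held at exactly zero. The fixed iteration count and the step-size rule are illustrative choices only.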

Original language: English
Pages (from-to): 404-415
Number of pages: 12
Journal: Journal of Water and Health
Issue number: 3
Publication status: Published - 2019


Keywords

  • Censored datasets
  • Environmental regression
  • Sign constraints
  • Tobit analysis
  • Water quality

ASJC Scopus subject areas

  • Water Science and Technology
  • Waste Management and Disposal
  • Public Health, Environmental and Occupational Health
  • Microbiology (medical)
  • Infectious Diseases

