Textmining in support of knowledge discovery for vaccine development

Christian Schönbach, Takeshi Nagashima, Akihiko Konagaya

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Complete genome data of infectious microorganisms permit systematic computational sequence-based predictions and experimental testing of candidate vaccine epitopes. Both predictions and the interpretation of experiments rely on existing information in the literature, which is mostly manually extracted and curated. The growing amount of data and literature information has created a major bottleneck for the interpretation of results and the maintenance of curated databases. The lack of suitable free-text information extraction, processing, and reporting tools prompted us to develop a knowledge discovery support system that enhances the understanding of immune response and vaccine development. The current prototype system, Gene expression/epitopes/protein interaction (GEpi), focusses on molecular functions of HIV-infected T-cells and HIV epitope information, using textmining and the interrelation of biomolecular data from domain-specific databases with MEDLINE abstract-inferred information. Results showed that extraction and processing of molecular interaction, disease associations, and gene ontology-derived functional information generate intuitive knowledge reports that aid the interpretation of host-pathogen interaction. In contrast, epitope (word and sequence) information in MEDLINE abstracts is surprisingly sparse and often lacks necessary context information, such as HLA-restriction. Since the majority of epitope information is found in tables, figures, and legends of full-text articles, its extraction may not require sophisticated natural language processing techniques. Support of vaccine development through textmining therefore requires the timely development of domain-specific extraction rules for full-text articles, and a knowledge model for epitope-related information.
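To illustrate what a simple domain-specific extraction rule of the kind mentioned above might look like, the following is a minimal sketch and not the GEpi implementation. The patterns, the function name `extract_epitopes`, and the 8 to 11 residue length window are assumptions chosen for illustration only; a real rule set would need to handle many more notations and filter false positives.

```python
import re

# Hypothetical rule: a candidate epitope is a run of 8-11 uppercase
# one-letter amino acid codes. This is an assumption for illustration;
# ordinary capitalised words built from these letters will also match.
PEPTIDE_RE = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{8,11}\b")

# Hypothetical rule: HLA restriction written as e.g. HLA-A2, HLA-B27,
# HLA-A*02:01 or HLA-DRB1*04:01.
HLA_RE = re.compile(r"\bHLA-(?:A|B|C|DR|DQ|DP)[A-Z0-9]*\*?\d+(?::\d+)*")


def extract_epitopes(sentence):
    """Return candidate peptides found in a sentence, each paired with
    any HLA alleles mentioned in the same sentence (None if absent)."""
    peptides = PEPTIDE_RE.findall(sentence)
    alleles = HLA_RE.findall(sentence)
    return [{"peptide": p, "hla": alleles or None} for p in peptides]


if __name__ == "__main__":
    text = "The Gag epitope SLYNTVATL is presented by HLA-A2 in infected T-cells."
    for hit in extract_epitopes(text):
        print(hit)
```

A sentence that names a peptide without an allele yields `hla` set to `None`, which mirrors the missing HLA-restriction context noted for MEDLINE abstracts.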

Original language: English
Pages (from-to): 488-495
Number of pages: 8
Issue number: 4
Publication status: Published - 2004 Dec 1


  • Disease association
  • Epitope
  • Gene ontology
  • HIV infection
  • MeSH
  • Molecular interaction
  • T-cell
  • Text information retrieval
  • Textmining
  • Vaccine development

ASJC Scopus subject areas

  • Molecular Biology
  • Biochemistry, Genetics and Molecular Biology(all)

