TY - GEN
T1 - Filling the gaps between tools and users
T2 - 13th Pacific Symposium on Biocomputing, PSB 2008
AU - Kano, Yoshinobu
AU - Nguyen, Ngan
AU - Sætre, Rune
AU - Yoshida, Kazuhiro
AU - Miyao, Yusuke
AU - Tsuruoka, Yoshimasa
AU - Matsubayashi, Yuichiro
AU - Ananiadou, Sophia
AU - Tsujii, Junichi
PY - 2008
Y1 - 2008
N2 - Recently, several text mining programs have reached a near-practical level of performance. Some systems are already being used by biologists and database curators. However, it has also been recognized that current Natural Language Processing (NLP) and Text Mining (TM) technology is not easy to deploy, since research groups tend to develop systems that cater specifically to their own requirements. One of the major reasons for the difficulty of deployment of NLP/TM technology is that re-usability and interoperability of software tools are typically not considered during development. While some effort has been invested in making interoperable NLP/TM toolkits, the developers of end-to-end systems still often struggle to reuse NLP/TM tools, and often opt to develop similar programs from scratch instead. This is particularly the case in BioNLP, since the requirements of biologists are so diverse that NLP tools have to be adapted and re-organized in a much more extensive manner than was originally expected. Although generic frameworks like UIMA (Unstructured Information Management Architecture) provide promising ways to solve this problem, the solution that they provide is only partial. In order for truly interoperable toolkits to become a reality, we also need sharable type systems and a developer-friendly environment for software integration that includes functionality for systematic comparisons of available tools, a simple I/O interface, and visualization tools. In this paper, we describe such an environment that was developed based on UIMA, and we show its feasibility through our experience in developing a protein-protein interaction (PPI) extraction system.
UR - http://www.scopus.com/inward/record.url?scp=40549100761&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=40549100761&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 18229720
AN - SCOPUS:40549100761
SN - 9812776087
SN - 9789812776082
T3 - Pacific Symposium on Biocomputing 2008, PSB 2008
SP - 616
EP - 627
BT - Pacific Symposium on Biocomputing 2008, PSB 2008
Y2 - 4 January 2008 through 8 January 2008
ER -