Artificial intelligence powered statistical genetics in biobanks

Akira Narita, Masao Ueki, Gen Tamiya

Research output: Contribution to journal › Review article › peer-review

Abstract

Large-scale, sometimes nationwide, prospective genomic cohorts that biobank rich biological specimens such as blood, urine, and tissues have been established in several countries and have released vast amounts of data. These genetic and epidemiological resources are expected to allow investigators to disentangle the genetic and environmental components contributing to common complex diseases. There are, however, two major challenges to statistical genetics for this goal: small sample size combined with high dimensionality, and multilayered, heterogeneous endophenotypes. Rather counterintuitively, biobank data generally have a small sample size relative to their dimensionality, which comprises genomic variation, lifestyle questionnaire items, and sometimes their interactions. This is a widely acknowledged difficulty in data analysis, the so-called “p≫n problem” in statistics or “curse of dimensionality” in the machine-learning field. On the other hand, we have too many measurements of individual health status, which are endophenotypes, such as health check-up data, images, and psychological test scores, in addition to metabolomics and proteomics data. These endophenotypes are rich but not so tractable because of their even higher dimensionality and substantial correlation, with sometimes confusing causal relationships among them. We have tried to overcome the problems inherent to biobank data using statistical machine-learning and deep-learning technologies.

Original language: English
Pages (from-to): 61-65
Number of pages: 5
Journal: Journal of Human Genetics
Volume: 66
Issue number: 1
Publication status: Published - 2021 Jan

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)
