We introduce a workflow that applies several data science methods to polymer materials databases to reveal correlations among material properties, structural information, and molecular descriptors. The pipeline combines the unsupervised machine learning (ML) method of self-organizing mapping (SOM) with a polymer molecular descriptor generator, both tailored to polymer materials research. To demonstrate the pipeline, we applied it to an organic photovoltaic (OPV) donor polymer database to investigate which properties or structural factors correlate positively with the power conversion efficiency (PCE) of OPV materials. Among the 8 properties and 11 molecular descriptors studied, only the photon energy loss (Eloss) and the number of fluorine atoms (nF) show strong positive correlations with PCE, which is consistent with results verified in other studies. The method can also visualize research trends statistically. In our case study, most of the OPV donor materials in the database have branched side chains, typically with 7-12 non-hydrogen atoms, and high-PCE materials usually also have 6-9 aromatic rings. These results demonstrate that the proposed data science pipeline provides a fast and effective way to obtain research insights into polymer materials.
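To illustrate the general idea behind SOM-based correlation analysis, the following is a minimal sketch of a generic self-organizing map in plain Python. It is not the tailored SOM or the descriptor generator used in this study. The data are synthetic, and every name (`train_som`, `component_plane_correlation`, `make_toy_data`, the three toy columns) is a hypothetical choice made for this illustration. The sketch trains a small map and then compares component planes: two features whose planes vary together across the map are taken as positively correlated.

```python
import math
import random


def train_som(data, rows=5, cols=5, n_iter=3000, lr0=0.5, sigma0=2.5, seed=0):
    """Train a small rectangular SOM; returns weights[r][c] as feature lists."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks
        x = rng.choice(data)
        # Best-matching unit (BMU): node whose weight vector is closest to x.
        br, bc = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: sum(
                (w - v) ** 2 for w, v in zip(weights[rc[0]][rc[1]], x)
            ),
        )
        # Pull every node toward x, weighted by its grid distance to the BMU.
        for r in range(rows):
            for c in range(cols):
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                node = weights[r][c]
                for k in range(dim):
                    node[k] += lr * h * (x[k] - node[k])
    return weights


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def component_plane_correlation(weights, i, j):
    """Correlate the map-wide values (component planes) of features i and j."""
    plane_i = [node[i] for row in weights for node in row]
    plane_j = [node[j] for row in weights for node in row]
    return pearson(plane_i, plane_j)


def make_toy_data(n=300, seed=1):
    """Synthetic rows scaled to [0, 1]: [target, related feature, unrelated feature]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        target = rng.random()
        related = min(1.0, max(0.0, target + rng.gauss(0.0, 0.05)))
        unrelated = rng.random()
        data.append([target, related, unrelated])
    return data


toy = make_toy_data()
som = train_som(toy)
r_related = component_plane_correlation(som, 0, 1)
r_unrelated = component_plane_correlation(som, 0, 2)
print(f"target vs related feature:   {r_related:+.2f}")
print(f"target vs unrelated feature: {r_unrelated:+.2f}")
```

In a setting like the one described above, the target column would presumably be PCE and the remaining columns the properties and descriptors, each scaled to a common range before training. That mapping is an assumption of this sketch, not a description of the published procedure.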
ASJC Scopus subject areas
- Electronic, Optical and Magnetic Materials
- Physical and Theoretical Chemistry
- Surfaces, Coatings and Films