Faux-Data Injection Optimization for Accelerating Data-Driven Discovery of Materials

A. Ziaullah, S. Chawla, F. El Mellouhi
Hamad Bin Khalifa University,
Qatar

Keywords: Bayesian optimization, artificial intelligence, data-driven machine learning, materials discovery

Summary:

Artificial intelligence (AI) is now used extensively to optimize and discover novel materials through data-driven search. The search space for a target material is usually so large that manual optimization is impractical. Data-driven search and optimization make it possible to locate an optimal or acceptable material configuration with the desired target properties while using resources efficiently. One prominent data-driven optimization technique is Bayesian optimization (BO). At the core of BO is a machine learning (ML) model that learns the problem landscape from data acquired on the fly. As data accumulate, BO becomes better informed and directs the search more precisely toward suitable material candidates for further evaluation. A candidate material is suggested by proposing parameters such as its composition and configuration; it is then evaluated either by physically synthesizing the material and testing its properties or by computational methods such as Density Functional Theory (DFT). DFT calculations can exploit massively parallel architectures such as High-Performance Computing (HPC) systems, which a traditional BO may not fully leverage because it typically acquires data sequentially. Here, we tackle this shortcoming of BO and maximize the utilization of HPC by enabling BO to suggest multiple candidate materials for DFT evaluation at once. We achieve this through a batch optimization technique based on faux-data injection in the BO loop. In this approach, at each candidate suggestion from a typical BO loop, we predict the outcome instead of running the actual experiment or DFT calculation, forming a "faux-data-point" that is injected back to update the ML model.
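The faux-data injection loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the kernel-weighted surrogate, the lower-confidence-bound acquisition rule, the one-dimensional candidate grid, and all function names (`surrogate_predict`, `suggest`, `fdi_batch`, `toy_objective`) are hypothetical stand-ins chosen so that the example runs with the standard library alone. It also assumes that the faux outcome is the surrogate's own prediction at the suggested point.

```python
import math


def surrogate_predict(X, y, x, length_scale=0.15):
    """Stand-in surrogate (assumption): kernel-weighted mean of observed
    outcomes, plus a crude uncertainty that shrinks near observed points."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * length_scale ** 2)) for xi in X]
    total = sum(weights)
    if total < 1e-12:
        mean = sum(y) / len(y)  # far from all data: fall back to the global mean
    else:
        mean = sum(w * yi for w, yi in zip(weights, y)) / total
    uncertainty = 1.0 - max(weights)  # 0 at an observed point, near 1 far away
    return mean, uncertainty


def suggest(X, y, candidates, kappa=1.0):
    """Stand-in acquisition (assumption): choose the unvisited candidate with
    the lowest lower-confidence bound, i.e. we are minimizing the target."""
    best, best_score = None, math.inf
    for c in candidates:
        if c in X:
            continue  # never re-suggest a point that is already in the data
        mean, uncertainty = surrogate_predict(X, y, c)
        score = mean - kappa * uncertainty
        if score < best_score:
            best, best_score = c, score
    return best


def fdi_batch(X, y, candidates, k):
    """Build a batch of up to k candidates by faux-data injection."""
    X_aug, y_aug = list(X), list(y)  # copies: the real data stay untouched
    batch = []
    for _ in range(k):
        x_next = suggest(X_aug, y_aug, candidates)
        if x_next is None:
            break  # candidate pool exhausted
        # Predict the outcome instead of running the expensive evaluation.
        faux_y, _ = surrogate_predict(X_aug, y_aug, x_next)
        # Inject the faux data point so the next suggestion accounts for it.
        X_aug.append(x_next)
        y_aug.append(faux_y)
        batch.append(x_next)
    return batch


def toy_objective(x):
    """Hypothetical cheap stand-in for an expensive DFT evaluation."""
    return (x - 0.3) ** 2


candidates = [i / 20 for i in range(21)]
X_real = [0.0, 1.0]
y_real = [toy_objective(x) for x in X_real]
batch = fdi_batch(X_real, y_real, candidates, k=4)
print(batch)  # k distinct candidates that could be evaluated in parallel
```

In a real workflow, every point in `batch` would be sent to the true evaluation (for example a DFT job), and only the real outcomes would be appended to the data before the next batch is built. The faux values exist solely to steer the suggestions within one batch.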
This methodology simulates the time-consuming sequential data-gathering process and quickly approximates the next k potential candidates. These k candidates can then be distributed to run in parallel on an HPC system. Our objective in this work is to test whether the faux-data injection methodology accelerates our data-driven materials discovery workflow. To this end, we run computational experiments using organic-inorganic halide perovskites as a case study, since the optimality of the results can be easily verified against our previous work. We use two performance indicators to compare this BO-based faux-data injection method (FDI-BO) with different baselines. The results show that, under our design constraints, the FDI-BO approach yielded roughly a 2- to 6-fold acceleration on average compared with sequential BO.

This publication was made possible by funding received for the project An Artificial Intelligence Platform for Accelerating Materials Discovery (AIPAM), awarded by the Hamad Bin Khalifa University Vice President Office, grant number VPR-TG01-006. We would also like to acknowledge Heesoo Park from the University of Oslo for assisting with and troubleshooting the workflow used for Bayesian optimization on organic-cation halide perovskites, and Alex Dunn from the University of California for providing support to implement FDI-BO in Rocketsled.