Identifying Crystal Structure from Open and Accessible Materials

J. Tate, J. Aguiar, M.L. Gong, T. Tasdizen
University of Utah,
United States

Keywords: machine learning, crystallography, material informatics, materials discovery, data analytics


As materials data become more abundant, rapid translation from raw data into readily interpretable scientific information becomes more essential to understanding and discovery. Data-driven models, specifically deep and machine learning, continually disrupt the materials and microscopy communities with automated solutions to narrow and burdensome tasks. However, the ultimate aspiration of high-throughput compressive imaging, where high-resolution, multi-modal images are automatically acquired, processed, characterized, and compressed in a single system, remains unrealized. Crystallographic determination is crucial to many of these high-throughput workflows, yet has traditionally been performed by experts trained to distinguish minute pattern variations. Determining a crystal’s space group is often a lengthy process: even standardized approaches such as Rietveld refinement require fitting a series of non-linear equations and intimate knowledge of the sample to be performed properly. The heavy dependence on complex pattern matching, the time-intensive process, and the potential for dimensionality reduction make crystal classification an ideal case for automation. In this research, we build on our recent work classifying crystal structure from material diffraction data by refactoring our model to account for additional material information. Our original classification model consisted of a convolutional neural network that predicted the crystal space group from the diffraction pattern. Our goal was to improve disambiguation between high-level classifications by including additional materials descriptors, such as chemical composition, in our data-driven model. We found that the additional information provided by chemical composition increased classification accuracy overall, and that it may augment the model’s understanding of higher-level materials classifications drawn from open materials data.
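The fusion of diffraction and composition inputs described above can be sketched as a two-branch classifier whose feature vectors are concatenated before the final prediction. The sketch below is a minimal NumPy stand-in, not the authors' trained network: the convolutional diffraction branch is replaced by a single dense layer, the random weights stand in for trained parameters, and the pattern length, composition encoding, and layer widths are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SG = 230          # number of crystallographic space groups
PATTERN_LEN = 512   # assumed length of a 1-D diffraction profile
N_ELEMENTS = 94     # assumed composition vector: fraction per element

# Hypothetical random weights standing in for trained parameters.
W_diff = rng.normal(0.0, 0.01, (PATTERN_LEN, 64))  # diffraction branch
W_chem = rng.normal(0.0, 0.01, (N_ELEMENTS, 16))   # chemistry branch
W_out = rng.normal(0.0, 0.01, (64 + 16, N_SG))     # fused classifier head

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def predict(pattern, composition):
    """Embed each modality separately, fuse by concatenation,
    then classify over the 230 space groups."""
    f_diff = relu(pattern @ W_diff)
    f_chem = relu(composition @ W_chem)
    fused = np.concatenate([f_diff, f_chem], axis=-1)
    return softmax(fused @ W_out)

pattern = rng.random(PATTERN_LEN)
composition = np.zeros(N_ELEMENTS)
composition[[13, 8]] = [0.4, 0.6]  # hypothetical two-element compound
probs = predict(pattern, composition)
```

Concatenation-style fusion like this is one common way to let a classifier weigh a cheap auxiliary descriptor (composition) alongside the richer diffraction signal; the actual model in this work is convolutional.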
Specifically, the model trained with diffraction and chemistry data showed an improvement over predictions with diffraction alone in predicting crystal family (from 86% to 88%), genera (84.8% to 86.6%), and space group (73.4% to 73.8%). Other input-data augmentation methods demonstrated mixed improvements over diffraction-only classification. The automated nature of our data-driven classification tool lends itself to high-throughput use, including classifying multiple structures within a sample many times over the course of an experiment. We will present the benefits and challenges associated with learning on multi-modal datasets, including a real-time demonstration of classifying data on the fly in a matter of milliseconds per prediction. The overall workflow and the analysis strategies for creating neural networks that incorporate both chemistry and diffraction data will be presented. Results using the models to classify materials with minimal background knowledge of structure or chemistry will be demonstrated and discussed at length.
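The hierarchical labels reported above (crystal family, genera, space group) are nested: each of the 230 space groups maps deterministically onto coarser classes, so a single space-group prediction can be scored at several levels. As one concrete instance of that mapping, the sketch below uses the standard partition of International Tables space-group numbers into the seven crystal systems; the exact intermediate level the abstract calls "genera" is not specified here, so this mapping is illustrative rather than the one used in the model.

```python
# Standard partition of space-group numbers 1-230 into the seven
# crystal systems (upper bound of each contiguous range, inclusive).
CRYSTAL_SYSTEMS = [
    (2, "triclinic"),
    (15, "monoclinic"),
    (74, "orthorhombic"),
    (142, "tetragonal"),
    (167, "trigonal"),
    (194, "hexagonal"),
    (230, "cubic"),
]

def space_group_to_system(sg: int) -> str:
    """Map an International Tables space-group number to its crystal
    system, so a fine-grained prediction can be evaluated coarsely."""
    if not 1 <= sg <= 230:
        raise ValueError(f"space group number out of range: {sg}")
    for upper, system in CRYSTAL_SYSTEMS:
        if sg <= upper:
            return system
```

Scoring a model at each level of such a hierarchy is what allows the separate family-, genus-, and space-group-level accuracies quoted above to be computed from one set of predictions.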