eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Domain-Knowledge-Guided Machine Learning Towards Accurate Materials Property Prediction and Materials Discovery

Abstract

Over the past few decades, first-principles modeling methods, especially density functional theory (DFT), have become important complements to experiments for studying materials properties and designing new materials. Thanks to the success of DFT and the rapid growth of computing power, a vast amount of materials data has been generated. The logical next step is to introduce tools that can make use of these data. Machine learning (ML) techniques are such tools: they extract knowledge from data and make predictions in under a second, and they are steering materials science into a new data-driven paradigm.

In this thesis, guided closely by domain knowledge in materials science, we strive to develop accurate, interpretable ML models that could serve as surrogates for DFT in property prediction and in the design of new materials. A unifying theme that distinguishes the models in this thesis from those in other ML works is the principle of parsimony: we aim to build and explain each model with a minimal set of features.

The thesis is divided into three topics. In the first topic (Chapter 2), we aimed to predict the phase stability of inorganic crystals, which is often the first step in any materials discovery effort. Inspired by Pauling's rules, we show that deep neural networks using only the Pauling electronegativities and ionic radii of the species on the symmetrically distinct sites can predict the DFT formation energies of garnets and perovskites with low mean absolute errors (MAEs) of 7–34 meV atom$^{-1}$. Using a binary encoding scheme, the models can be readily extended to mixed garnets and perovskites with little loss in accuracy, extending the applicability of ML models to the effectively infinite universe of mixed-species crystals.
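
The Chapter 2 descriptor is compact enough to sketch in a few lines. The sketch below is a minimal illustration, not the thesis's code. It assumes the garnet formula $C_3A_2D_3O_{12}$ with dodecahedral C, octahedral A, and tetrahedral D sites, and it builds the six-element input (electronegativity and ionic radius per site) from a small illustrative table of Pauling electronegativities and Shannon radii. It then passes that input through a fully connected network with hypothetical, untrained weights. The helper names (`garnet_features`, `random_layers`, `mlp_forward`), the layer sizes, and the tanh activation are assumptions. The binary encoding for mixed-site compositions is not reproduced.

```python
import math
import random

# Illustrative lookup tables (standard literature values; verify before use).
# Pauling electronegativity of each element.
ELECTRONEGATIVITY = {"Ca": 1.00, "Y": 1.22, "Mg": 1.31, "Al": 1.61, "Si": 1.90}
# Shannon ionic radius (angstrom), keyed by (element, coordination number).
IONIC_RADIUS = {
    ("Ca", 8): 1.12, ("Y", 8): 1.019,
    ("Mg", 6): 0.72, ("Al", 6): 0.535,
    ("Al", 4): 0.39, ("Si", 4): 0.26,
}
# Garnet C3A2D3O12: C dodecahedral (CN 8), A octahedral (CN 6), D tetrahedral (CN 4).
SITE_CN = {"C": 8, "A": 6, "D": 4}


def garnet_features(c: str, a: str, d: str) -> list[float]:
    """Return [chi_C, r_C, chi_A, r_A, chi_D, r_D] for a C3A2D3O12 garnet."""
    features = []
    for site, species in (("C", c), ("A", a), ("D", d)):
        features.append(ELECTRONEGATIVITY[species])
        features.append(IONIC_RADIUS[(species, SITE_CN[site])])
    return features


def random_layers(sizes: list[int], seed: int = 0):
    """Hypothetical untrained (weights, biases) per layer; a real model is fit to DFT data."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def mlp_forward(x: list[float], layers) -> float:
    """Fully connected network: tanh hidden layers, linear scalar output."""
    h = x
    for i, (w, b) in enumerate(layers):
        z = [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(w, b)]
        h = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return h[0]


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


layers = random_layers([6, 24, 24, 1])           # two hidden layers (sizes are assumptions)
x_yag = garnet_features("Y", "Al", "Al")         # Y3Al5O12 written as Y3Al2Al3O12
x_grossular = garnet_features("Ca", "Al", "Si")  # Ca3Al2Si3O12
print(x_yag)  # → [1.22, 1.019, 1.61, 0.535, 1.61, 0.39]
print(mlp_forward(x_yag, layers))  # meaningless until the weights are trained
```

In practice the weights would be fit to DFT formation energies, and `mean_absolute_error` would be reported in meV atom$^{-1}$ on a held-out set.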

In the second topic (Chapter 3), we targeted bandgap prediction. Training on 1,823 data points, we show that an eXtreme Gradient Boosting (XGBoost) model reaches a state-of-the-art MAE of 0.13 eV in predicting the DFT bandgaps of garnets computed with the generalized gradient approximation (GGA) functional. Interpreting the model's behavior reveals that the bandgap is governed mainly by the atomic number of the species occupying the tetrahedral sites of the garnet crystal. Integrating the models from Chapters 2 and 3, we devised a high-throughput screening (HTS) workflow to search the garnet family for \ce{Eu^{2+}}-doped red-emitting phosphors. Two candidates, \ce{Ca(Er$,$Tb)2Mg2Si3O12}, were identified by rapidly traversing 5,554 candidate compositions, a search that is computationally prohibitive for purely DFT-based HTS workflows because of the large unit cells of garnet structures.
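
The screening logic that chains the two models can be sketched as a simple filter. In this minimal sketch, `predict_e_hull` and `predict_bandgap` are hypothetical stand-ins for the Chapter 2 stability model and the Chapter 3 bandgap model. The stability metric is assumed to be an energy above the convex hull derived from the predicted formation energy. The default thresholds (0.05 eV atom$^{-1}$, 3.0 eV) are placeholders, not the criteria used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Hit:
    formula: str
    e_hull: float   # predicted stability metric, eV/atom (lower is more stable)
    bandgap: float  # predicted GGA bandgap, eV


def screen(
    compositions: Iterable[str],
    predict_e_hull: Callable[[str], float],
    predict_bandgap: Callable[[str], float],
    max_e_hull: float = 0.05,   # placeholder threshold
    min_bandgap: float = 3.0,   # placeholder threshold
) -> list[Hit]:
    """Keep compositions predicted to be both stable and wide-gap, most stable first."""
    hits = []
    for formula in compositions:
        e_hull = predict_e_hull(formula)
        if e_hull > max_e_hull:
            continue  # predicted unstable: skip the bandgap model
        gap = predict_bandgap(formula)
        if gap >= min_bandgap:
            hits.append(Hit(formula, e_hull, gap))
    return sorted(hits, key=lambda h: h.e_hull)


# Synthetic stand-in predictions (hypothetical labels and values, not thesis data).
TOY = {
    "garnet-1": (0.01, 4.2),
    "garnet-2": (0.12, 5.0),
    "garnet-3": (0.02, 2.1),
    "garnet-4": (0.00, 3.5),
}
hits = screen(TOY, lambda f: TOY[f][0], lambda f: TOY[f][1])
print([h.formula for h in hits])  # → ['garnet-4', 'garnet-1']
```

Applying the stability filter first means the bandgap model runs only on compositions that survive it. With two sub-second ML models the saving is small, but the same ordering pays off if a costlier DFT check is appended at the end.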

In the last topic (Chapter 4), we investigated grain boundaries (GBs), a class of two-dimensional defects in polycrystalline systems. We show that the GB energy, normalized by the bulk cohesive energy, can be described purely by four geometric features. By training on a large computed database of 369 low-$\Sigma$ ($\Sigma < 10$) GBs across more than 50 metals, we developed a model that predicts GB energies to within 0.12 J m$^{-2}$. This universal GB energy model can be extrapolated to higher-$\Sigma$ GBs with only a modest increase in prediction error.
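
The normalization and the extrapolation test can be illustrated together. The sketch below assumes the simplest reading of "normalized by the bulk cohesive energy": the model learns $\gamma_{\mathrm{GB}}/E_{\mathrm{coh}}$, and its predictions are rescaled by $E_{\mathrm{coh}}$. It treats the four geometric features as opaque placeholders. The data are split at $\Sigma = 10$, so a model fit on low-$\Sigma$ GBs is scored on higher-$\Sigma$ ones. `GBRecord`, `split_by_sigma`, the element label `M1`, and all numbers are hypothetical. The thesis's exact normalization and features may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GBRecord:
    element: str
    sigma: int                  # coincidence-site-lattice Sigma value
    features: Sequence[float]   # four geometric descriptors (opaque placeholders)
    e_coh: float                # bulk cohesive energy of the metal
    gamma: float                # computed GB energy, J m^-2


Model = Callable[[Sequence[float]], float]  # maps features -> gamma / e_coh


def split_by_sigma(records: list[GBRecord], sigma_max: int = 10):
    """Train on low-Sigma GBs (Sigma < sigma_max); hold out the rest for extrapolation."""
    train = [r for r in records if r.sigma < sigma_max]
    test = [r for r in records if r.sigma >= sigma_max]
    return train, test


def predict_gamma(record: GBRecord, model: Model) -> float:
    # The model predicts the normalized energy; rescale by the cohesive energy.
    return record.e_coh * model(record.features)


def mae(records: list[GBRecord], model: Model) -> float:
    return sum(abs(predict_gamma(r, model) - r.gamma) for r in records) / len(records)


def constant_model(feats: Sequence[float]) -> float:
    return 0.25  # trivial stand-in for a fitted regressor


# Synthetic records for one hypothetical metal "M1" (not thesis data).
records = [
    GBRecord("M1", 3, (0.1, 0.2, 0.3, 0.4), 4.0, 0.9),
    GBRecord("M1", 5, (0.2, 0.1, 0.4, 0.3), 4.0, 1.1),
    GBRecord("M1", 13, (0.3, 0.3, 0.2, 0.1), 4.0, 1.3),
]
train, test = split_by_sigma(records)
print(len(train), len(test))  # → 2 1
print(round(mae(train, constant_model), 3), round(mae(test, constant_model), 3))  # → 0.1 0.3
```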
