eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

High-throughput Computations of X-ray Spectroscopy and Machine Learning-assisted Chemical Environment Identification

No data is associated with this publication.
Abstract

X-ray absorption spectroscopy (XAS) is a powerful and versatile characterization technique for probing the local environment, oxidation states, and electronic states, which are critical to understanding the mechanisms of functional materials. Depending on which core states are excited by X-ray photons, there are two major types of XAS measurements, K-edge and L-edge, which correspond to 1s and 2p core states, respectively. Depending on how far the energy is from the absorption onset, an XAS spectrum can be further divided into the X-ray absorption near-edge structure (XANES) and the extended X-ray absorption fine structure (EXAFS), with a cutoff of 50 eV. Conventional interpretation of XANES spectra requires comparing spectra from unknown samples with those from known references. However, the scarcity of high-quality reference spectra poses challenges for effective data analysis. This motivates our first project, in which we establish a computational database of L-edge XANES, which is commonly used to characterize transition metal compounds because of its strong signal. We use the well-known multiple-scattering code FEFF9 to perform high-throughput calculations. As a result, more than 130,000 L-edge XANES spectra were computed and made available to the public through the Materials Project website. This database addresses the shortage of reference data and opens the door to machine learning applications that could further accelerate data analysis.
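The XANES/EXAFS division described above can be illustrated with a minimal sketch. It assumes the cutoff is measured relative to an edge energy `e0`; the function name `split_xas`, the energy grid, and the value of `e0` are hypothetical and chosen only for illustration. Pre-edge points are grouped with the XANES region here.

```python
def split_xas(energies, mu, e0, cutoff_ev=50.0):
    """Split a spectrum into near-edge (XANES) and extended (EXAFS) parts.

    energies  -- photon energies in eV
    mu        -- absorption values on the same grid
    e0        -- absorption onset (edge energy) in eV, assumed to be known
    cutoff_ev -- distance above e0 at which the EXAFS region begins
    """
    xanes, exafs = [], []
    for energy, value in zip(energies, mu):
        if energy - e0 <= cutoff_ev:
            xanes.append((energy, value))
        else:
            exafs.append((energy, value))
    return xanes, exafs


# Synthetic grid from 7100 eV to 7210 eV in 10 eV steps (illustrative only).
energies = [7100.0 + 10.0 * i for i in range(12)]
mu = [0.1 * i for i in range(12)]
xanes, exafs = split_xas(energies, mu, e0=7112.0)
```

With these inputs, every point up to 7160 eV falls in the XANES region and the remaining points fall in the EXAFS region.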

In our second project, we adopt a machine learning (ML) approach to predict target properties directly from spectra. We first carefully selected coordination environment descriptors so that diverse environments were considered. After comparing several state-of-the-art machine learning algorithms, we demonstrate that random forest models achieve the highest accuracy, 85.4%, for top coordination environment classification. This work covers 33 cation elements across more than 22,500 oxide compounds, making it a fairly comprehensive study of ML applications in the XAS field. Beyond classification accuracy, model interpretation has been one of the bottlenecks in relating modeling to knowledge extraction. By performing drop-variable feature importance analysis, we find that the insights from the trained models are consistent with previous findings from XAS experts. As a pioneering work, this project is expected to attract more interest within the materials community in exploring ML applications in XAS, and it addresses a critical gap in the physical interpretation of ML models.
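The idea behind drop-variable feature importance can be sketched as follows, assuming it means retraining the model with one input removed and recording the resulting loss in accuracy. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the random forest used in the thesis; all function names, labels, and data values are hypothetical.

```python
def fit_centroids(X, y):
    """Return the mean feature vector of each class (stand-in model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def accuracy(X_train, y_train, X_test, y_test):
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, row) == label for row, label in zip(X_test, y_test))
    return hits / len(y_test)


def drop_column(X, j):
    return [row[:j] + row[j + 1:] for row in X]


def drop_variable_importance(X_train, y_train, X_test, y_test):
    """Importance of feature j = baseline accuracy - accuracy without feature j."""
    baseline = accuracy(X_train, y_train, X_test, y_test)
    importances = []
    for j in range(len(X_train[0])):
        reduced = accuracy(drop_column(X_train, j), y_train,
                           drop_column(X_test, j), y_test)
        importances.append(baseline - reduced)
    return baseline, importances


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
X_train = [[0.0, 0.5], [0.2, 0.4], [1.0, 0.5], [0.8, 0.6]]
y_train = ["tetrahedral", "tetrahedral", "octahedral", "octahedral"]
X_test = [[0.1, 0.6], [0.3, 0.52], [0.9, 0.4], [0.7, 0.48]]
y_test = ["tetrahedral", "tetrahedral", "octahedral", "octahedral"]

baseline, importances = drop_variable_importance(X_train, y_train, X_test, y_test)
```

In this toy case, removing feature 0 destroys the classifier's accuracy while removing feature 1 leaves it unchanged, so feature 0 receives the larger importance. For spectra, the dropped "variable" could plausibly be a single energy bin or a group of adjacent bins.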

In the final project, we explore how the featurization of XANES spectra affects ML predictions. While most published works in this field use only the original spectrum as input, transforming the XANES spectrum and reducing its dimensionality could potentially enhance model performance. We test a total of 12 features, including baseline ones, on two tasks: classification of oxidation states and regression of bond length. The cumulative distribution function (CDF) feature performs best: it achieves the highest accuracy and suffers the smallest performance decrease after dimensionality reduction. Experimental validation indicates that predictions on unseen experimental data match the known ground truth well.
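One plausible form of a CDF feature is sketched below. It assumes a spectrum sampled on a uniform energy grid with non-negative intensities, and computes the running sum normalized to end at 1. The thesis's exact definition may differ, and the name `cdf_feature` is hypothetical.

```python
def cdf_feature(mu):
    """Return the normalized cumulative sum of a spectrum.

    mu -- non-negative absorption values on a uniform energy grid
    """
    total = 0.0
    running = []
    for value in mu:
        total += value
        running.append(total)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [r / total for r in running]


# Illustrative intensities only.
spectrum = [0.0, 1.0, 3.0, 2.0, 2.0]
feature = cdf_feature(spectrum)
```

The resulting feature is monotonically non-decreasing and bounded between 0 and 1, which makes it smoother than the raw spectrum. That smoothness is one possible reason such a feature might tolerate dimensionality reduction well.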

In summary, the first project serves as a standalone database and a data resource for ML applications in the XAS field. The latter two projects demonstrate how machine learning approaches can improve chemical environment identification from XANES. In-depth analysis, including feature importance analysis and experimental validation, further supports the interpretability and transferability of the trained models. These findings will be crucial in accelerating the analysis of experimental XANES spectra.

Main Content

This item is under embargo until January 4, 2025.