Abstract

The discovery and characterization of novel materials are crucial for the development of new technology. Finding suitable materials for specific applications, however, is challenging due to the diverse and sometimes conflicting requirements for their properties. The decreasing cost of computing material properties and the recent development of data infrastructures have drastically increased the amount of available materials data. Being computed for various purposes, the available data employ different physical approximations and numerical parameters. This heterogeneity poses significant challenges in integrating and comparing data from different sources.

In this thesis, we use descriptors and metrics to quantitatively evaluate the similarity between different materials, represented by individual calculations. To this end, we developed a computational framework that allows users to compose and manage datasets, specify and compute different descriptors and metrics, compute similarity matrices, and apply methods of unsupervised machine learning. We furthermore present a spectral fingerprint, i.e., a novel descriptor that encodes spectra as binary-valued raster images, allowing us to assess similarity in terms of different quantities, such as the electronic density of states or optical absorption spectra.
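The idea of encoding a spectrum as a binary-valued raster image can be illustrated with a minimal sketch. The abstract does not give the grid construction or the metric, so everything below is an assumption made for illustration: the function names `spectral_fingerprint` and `tanimoto`, the uniform grid defined by `e_min`, `e_max`, `n_cols`, `v_max`, and `n_rows`, the column-wise averaging, and the choice of the Tanimoto coefficient as similarity measure.

```python
from typing import List, Sequence


def spectral_fingerprint(
    energies: Sequence[float],
    values: Sequence[float],
    e_min: float,
    e_max: float,
    n_cols: int,
    v_max: float,
    n_rows: int,
) -> List[int]:
    """Rasterize a spectrum into a flat list of 0/1 pixels.

    Hypothetical construction (not taken from the thesis): the energy
    window [e_min, e_max) is split into n_cols columns, the spectrum is
    averaged in each column, and the column is filled from the bottom
    with as many of its n_rows pixels as the mean value reaches.
    """
    width = (e_max - e_min) / n_cols
    col_sum = [0.0] * n_cols
    col_cnt = [0] * n_cols
    for e, v in zip(energies, values):
        if e_min <= e < e_max:
            c = min(int((e - e_min) / width), n_cols - 1)
            col_sum[c] += v
            col_cnt[c] += 1

    bits: List[int] = []
    for c in range(n_cols):
        mean = col_sum[c] / col_cnt[c] if col_cnt[c] else 0.0
        # Clamp so that values above v_max saturate the column.
        filled = min(n_rows, max(0, int(mean / v_max * n_rows)))
        bits.extend([1] * filled + [0] * (n_rows - filled))
    return bits


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    # Two empty fingerprints are treated as identical (an assumption).
    return both / either if either else 1.0


if __name__ == "__main__":
    grid = dict(e_min=0.0, e_max=4.0, n_cols=4, v_max=2.0, n_rows=4)
    energies = [0.0, 1.0, 2.0, 3.0]
    fp_a = spectral_fingerprint(energies, [1.0, 2.0, 2.0, 1.0], **grid)
    fp_b = spectral_fingerprint(energies, [2.0, 2.0, 2.0, 2.0], **grid)
    print(tanimoto(fp_a, fp_b))
```

Because both spectra are mapped onto the same grid, the comparison does not depend on how densely each spectrum was originally sampled, which is one plausible reason for preferring a raster encoding when data come from heterogeneous sources.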

We apply our methodology to assess the quality of materials data and to explore large data spaces. We demonstrate with various examples that the spectral fingerprint can be used to quantitatively describe the differences between theoretical results obtained with different physical approximations or numerical parameters, or between results stemming from independent experiments. By applying our methods to larger datasets, we identify and visualize the correlations between the precision of computational results and the relevant numerical parameters. This also allows us to find calculations that are based on different parameters yet show very similar results. To explore large data spaces, we conduct similarity searches on materials data, which reveal unexpected similarities between materials with different compositions. Furthermore, we use a clustering algorithm to find sets of materials with similar electronic structure. We identify and rationalize the main mechanisms leading to these similarities. Importantly, we find outliers that cannot be explained by simple rules. Finally, we compare the results of clustering with different similarity measures, showcasing correlations between them.
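The workflow of similarity matrix, similarity search, and clustering can be sketched as follows. The abstract does not name the clustering algorithm or the metric, so this sketch substitutes a simple stand-in: materials are grouped whenever they are linked by a chain of pairwise similarities above a threshold (connected components, equivalent to single-linkage clustering cut at that threshold). The function names, the Tanimoto coefficient, and the toy fingerprints are all hypothetical.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 1.0


def similarity_matrix(fps: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of pairwise similarities between fingerprints."""
    n = len(fps)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = tanimoto(fps[i], fps[j])
    return sim


def most_similar(
    sim: Sequence[Sequence[float]], query: int, k: int
) -> List[Tuple[int, float]]:
    """Similarity search: the k entries most similar to `query`."""
    others = [(j, s) for j, s in enumerate(sim[query]) if j != query]
    return sorted(others, key=lambda pair: -pair[1])[:k]


def threshold_clusters(
    sim: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Stand-in clustering: connected components of the graph that links
    i and j whenever sim[i][j] >= threshold."""
    n = len(sim)
    seen = set()
    clusters: List[List[int]] = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Toy binary fingerprints standing in for four materials.
    fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
    sim = similarity_matrix(fps)
    print(most_similar(sim, query=0, k=1))
    print(threshold_clusters(sim, threshold=0.5))
```

In this picture, an entry that ends up alone in its cluster at a given threshold is a natural candidate for the kind of outlier the abstract mentions, although the thesis may identify outliers differently.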

Details

1010268
Business indexing term
Title
Classification of Materials Based on Similarity Measures
Number of pages
146
Publication year
2025
Degree date
2025
School code
5416
Source
DAI-B 87/5(E), Dissertation Abstracts International
ISBN
9798265415042
University/institution
Humboldt Universitaet zu Berlin (Germany)
University location
Germany
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32319789
ProQuest document ID
3275493840
Document URL
https://www.proquest.com/dissertations-theses/classification-materials-based-on-similarity/docview/3275493840/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic