Creating Bespoke Databases using AI to Assist Materials Engineering
19 Jul 2022
- Evan Jones



ISIS-funded PhD students have produced the world’s largest ever auto-generated experimental databases for materials by mining information from the scientific literature.




Materials scientists and engineers are increasingly turning to data-driven techniques to aid their research. The value of these methods lies in connecting fragmented data about materials: once integrated into a custom database, the collated knowledge can seed innovation by enabling a better understanding of material behaviours, for example through the quantitative analysis of data about properties such as the effects of stress and strain on a material, or by comparing the relative efficiencies of all reported photovoltaic devices. This is expected to benefit a wide range of industries, including aeronautics, transportation and sustainable energy. However, the successful implementation of data-driven methods requires large repositories of experimental data, which are seldom available, particularly for the use and design of engineering materials.

Two ISIS-funded PhD students based at STFC Rutherford Appleton Laboratory, on industrial placement from the University of Cambridge, have tackled this problem. They have used artificial intelligence to auto-generate materials databases by extracting experimental information from the scientific literature. The work used the materials-focused text-mining software ChemDataExtractor, developed in-house by their research group at the University of Cambridge. One student targeted stress-strain information by extracting yield-strength and grain-size values; the other focused on extracting photovoltaic device information for solar-cell research. Both studies have been published in the Springer Nature journal Scientific Data.
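To give a flavour of what extracting a property value from a sentence involves, here is a deliberately simplified Python sketch. It is not ChemDataExtractor's API or method: the regular expression, the unit list, the `extract_records` helper and the example sentence are all hypothetical and chosen purely for illustration.

```python
import re

# Toy pattern: a property name, a short gap, then a number and a unit.
# Illustrative only; this is NOT how ChemDataExtractor parses text.
PATTERN = re.compile(
    r"(?P<property>yield strength|grain size)"  # property names from the article
    r"[^.;\d]{0,40}?"                            # short gap: no digits, no sentence break
    r"(?P<value>\d+(?:\.\d+)?)\s*"               # numeric value
    r"(?P<unit>MPa|GPa|nm|mm)\b",                # hypothetical, incomplete unit list
    re.IGNORECASE,
)


def extract_records(text):
    """Return a list of {property, value, unit} dicts found in the text."""
    records = []
    for match in PATTERN.finditer(text):
        records.append({
            "property": match.group("property").lower(),
            "value": float(match.group("value")),
            "unit": match.group("unit"),
        })
    return records


# Made-up sentence for illustration; not taken from any paper.
sentence = "The alloy had a yield strength of 250 MPa and a grain size of 85 nm."
records = extract_records(sentence)
for record in records:
    print(record)
```

Each match becomes one structured record, which is the kind of row a database of property values would hold. Scientific text is far more varied than this single pattern can handle, which is why dedicated text-mining software is needed at scale.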

The yield-strength and grain-size databases are the first of their kind and allow unique large-scale statistical studies of these properties and their relation to each other, which were not possible prior to this work. These databases will be of great use to the worldwide engineering community, as well as industrial users of the ISIS facility. Their PhD supervisor explained:

“The databases realised from this study represent the first key component of a new data-science platform for materials engineering at ISIS that we are developing for the Facility.” – Professor Jacqueline Cole, University of Cambridge and ISIS Neutron and Muon Source

The ISIS-funded PhD student working on photovoltaics has employed this form of AI, using ChemDataExtractor, to generate the world’s largest ever auto-generated experimental database for materials. The database should be useful to the photovoltaic-device community, supporting efforts to mitigate climate change and bringing ISIS closer to its energy sustainability goals: as part of STFC, ISIS aims to reach net zero by 2040.

Two open-source repositories of photovoltaic material and device data have been created: one containing data about perovskite-based solar cells and one about dye-sensitized solar cells. In recent years, these two areas of photovoltaics have shown great promise, and commercial enterprises exist to exploit both technologies. Hence, their technological value will be enhanced by the provision of these centralised databases. In total, these photovoltaic repositories contain over 660,000 data records, about double the number in the largest materials database previously auto-generated by ChemDataExtractor, which was for the field of batteries.

Going forward, AI-based software could also be employed to auto-generate databases that serve many other fields of applied-materials research, thus benefiting a vast range of the ISIS user community.

Contact: Jones, Evan (STFC,RAL,ISIS)