Speaker
Description
Machine learning (ML), and in particular artificial neural networks (ANNs), has pushed the state of the art for many hard problems, e.g., image classification, speech recognition, and time series forecasting. In climate science, ANNs show good prospects for identifying causally linked modes of climate variability. Such modes are key to understanding the climate system and to improving the predictive skill of forecast systems. To attribute climate events in a data-driven way with ANNs, we need sufficient training data, which is often limited for real-world measurements. The data science community provides standard data sets for many applications. As a new data set, we introduce a collection of climate indices typically used to describe Earth System dynamics. This collection is consistent and comprehensive because we derive the climate indices from control simulations of Earth System Models (ESMs) spanning over 1,000 years. The data set is provided as an open-source framework that can be extended and customized to individual needs. It supports the development of new ML methodologies and serves as a benchmark for comparing results with existing methods and models. As examples, we use the data set to predict rainfall in the African Sahel region and the El Niño Southern Oscillation with various ML models. We argue that this new data set makes it possible to thoroughly explore techniques from explainable artificial intelligence and thereby build trustworthy models that domain scientists accept. Our aim is to build a bridge between the data science community and researchers and practitioners in climate science to jointly improve our understanding of the climate system.