This paper presents the development of new soil property maps for the world at 250 m grid resolution, incorporating state-of-the-art practices and adapting them to the challenges of global digital soil mapping with legacy data. It builds on the previous global soil property maps (SoilGrids250m), integrating up-to-date machine learning methods with the increased availability of standardised soil profile data and environmental covariates. The aim is to produce global maps of soil properties, supported by cross-validation, hyper-parameter selection and spatially explicit quantification of uncertainty.
Key Insights
Incorporation of Soil Profile Data
The study incorporates soil profile data derived from ISRIC's World Soil Information Service (WoSIS), significantly expanding the number and spatial distribution of observations. These data are crucial for improving the accuracy and reliability of the soil property maps.
Covariate Selection
A reproducible covariate selection procedure, relying on recursive feature elimination (RFE), is employed to reduce redundancy between covariates and improve model parsimony. This ensures that the most relevant environmental factors are used in the modeling process.
Spatial Stratification for Cross-Validation
An improved cross-validation procedure, based on spatial stratification, is used to address the high spatial variation in observation density. This approach helps to avoid biased results and provides a more robust assessment of model performance.
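The idea of stratifying folds in space can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the square longitude/latitude cells used as strata, the cell size, and the function name `spatial_folds` are all assumptions made for the example.

```python
import random
from collections import defaultdict

def spatial_folds(points, n_folds=10, cell_deg=5.0, seed=0):
    """Assign each (lon, lat) point to a fold, stratified by grid cell.

    Points are grouped into square lon/lat cells (the strata, an assumed
    scheme). Within each cell they are shuffled and dealt out to folds in
    turn, so every cell is spread across as many folds as it has points.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, (lon, lat) in enumerate(points):
        cell = (int(lon // cell_deg), int(lat // cell_deg))
        strata[cell].append(idx)

    folds = [None] * len(points)
    offset = 0  # carried across cells so sparse cells do not all land in fold 0
    for cell in sorted(strata):
        members = strata[cell]
        rng.shuffle(members)
        for idx in members:
            folds[idx] = offset % n_folds
            offset += 1
    return folds

# Synthetic locations, for illustration only
rng = random.Random(42)
points = [(rng.uniform(-180, 180), rng.uniform(-60, 80)) for _ in range(1000)]
folds = spatial_folds(points)
```

Because fold ids are dealt out cell by cell, a densely sampled region cannot dominate a single fold, which is the property the stratification is meant to provide.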
Uncertainty Quantification
The study quantifies prediction uncertainty using quantile regression forests (QRF), allowing for the estimation of prediction intervals and the assessment of model reliability. This is important for understanding the limitations of the maps and for guiding future data collection efforts.
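QRF yields conditional quantiles of the response rather than only its mean. How such quantiles turn into a prediction interval, and how interval coverage can be checked, is sketched below. The 0.05 and 0.95 bounds are an illustrative assumption, a plain list of values stands in for the predictive distribution at one location, and the function names are hypothetical; the sketch does not reproduce the forest itself.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def prediction_interval(values, lower=0.05, upper=0.95):
    """Return (lower quantile, median, upper quantile) of a predictive distribution."""
    return quantile(values, lower), quantile(values, 0.5), quantile(values, upper)

def coverage(observed, intervals):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for y, (lo, _, hi) in zip(observed, intervals) if lo <= y <= hi)
    return inside / len(observed)

# Stand-in predictive distribution: the integers 0..100
dist = list(range(101))
interval = prediction_interval(dist)

# Three hypothetical observations checked against the same interval
cov = coverage([50, 3, 97], [interval] * 3)
```

For a well-calibrated model, the coverage of a 90% interval computed on held-out observations should be close to 0.9.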
Transformation of Texture Data
A transformation was applied to the texture fractions. The relative percentages of sand, silt and clay can be treated as compositional variables, because the components always sum to 100%. These components were therefore transformed using the additive log-ratio (ALR) transformation with the Gauss-Hermite quadrature (Aitchison, 1986).
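A minimal sketch of the forward and inverse ALR transformation for a three-part composition is given below. It assumes clay as the denominator component and strictly positive fractions; neither is stated above. The plain inverse shown here does not include the Gauss-Hermite quadrature step, and the function names are illustrative.

```python
import math

def alr(sand, silt, clay):
    """Additive log-ratio transform of a 3-part composition.

    Assumes all fractions are strictly positive and uses clay as the
    denominator component (an illustrative choice).
    """
    return math.log(sand / clay), math.log(silt / clay)

def alr_inverse(y_sand, y_silt):
    """Map two ALR coordinates back to percentages that sum to 100."""
    e_sand, e_silt = math.exp(y_sand), math.exp(y_silt)
    total = e_sand + e_silt + 1.0  # the denominator component maps to exp(0) = 1
    return 100.0 * e_sand / total, 100.0 * e_silt / total, 100.0 / total

# Round trip for a hypothetical sample: 40% sand, 35% silt, 25% clay
coords = alr(40.0, 35.0, 25.0)
sand, silt, clay = alr_inverse(*coords)
```

Modelling the two unconstrained coordinates instead of the three percentages guarantees that back-transformed predictions are positive and sum to 100%.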
Key Statistics & Data
- The study uses soil observations from about 240,000 locations worldwide.
- Over 400 global environmental covariates are used to describe vegetation, terrain morphology, climate, geology, and hydrology.
- Soil properties are mapped at a spatial resolution of 250 m.
- About 5% of the profiles were sampled before 1960, 14% in 1961-1980, 32% in 1981-2000 and 16% in 2001-2020; the sampling date is unknown for 34% of the shared profiles (Batjes et al., 2020).
- This study considers standardised data for some 240,000 profiles, derived from WoSIS.
Methodology
The methodology involves several key steps:
- Input soil data preparation: Soil property data is derived from the ISRIC World Soil Information Service (WoSIS), which provides consistent, standardised soil profile data.
- Covariate selection: A standardised and reproducible procedure for selecting the covariates used in modelling was implemented to (i) reduce redundancy between covariates, (ii) obtain a more parsimonious and computationally efficient model, (iii) decrease the risk of over-fitting and (iv) avoid a biased assessment of variable importance.
- Model tuning and cross-validation: Model tuning was performed with a 10-fold cross-validation procedure applied to multiple combinations of hyper-parameters.
- Final model fitting for prediction: The final model for each soil property was fitted with all available observations, the covariates and the hyper-parameters selected in the previous steps. Observation depth was included in the model as a covariate.
- Predictions with uncertainty estimation: Models were fitted with the ranger package (Wright and Ziegler, 2017), using its quantreg option to build quantile regression forests (QRF; Meinshausen, 2006).
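The model tuning step above, in which every hyper-parameter combination is scored by cross-validation and the best one is kept, can be sketched as follows. To keep the example self-contained, a toy nearest-neighbour regressor stands in for the random forest; the hyper-parameter names `k` and `power`, the synthetic data and the simple fold assignment are all assumptions made for illustration.

```python
import itertools
import random

def knn_predict(train, x, k, power):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    weights = [1.0 / (abs(px - x) + 1e-9) ** power for px, _ in nearest]
    return sum(w * py for w, (_, py) in zip(weights, nearest)) / sum(weights)

def cv_rmse(data, folds, n_folds, k, power):
    """Root mean squared error over all held-out folds."""
    sq_err = 0.0
    for f in range(n_folds):
        train = [d for d, fold in zip(data, folds) if fold != f]
        test = [d for d, fold in zip(data, folds) if fold == f]
        sq_err += sum((knn_predict(train, x, k, power) - y) ** 2 for x, y in test)
    return (sq_err / len(data)) ** 0.5

def tune(data, folds, n_folds, grid):
    """Score every hyper-parameter combination and return the best one."""
    names = sorted(grid)
    scores = {}
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        scores[combo] = cv_rmse(data, folds, n_folds, **params)
    best = min(scores, key=scores.get)
    return dict(zip(names, best)), scores

# Synthetic 1-D data: y = 2x plus noise
rng = random.Random(1)
data = [(x, 2 * x + rng.gauss(0, 0.1)) for x in (i / 10 for i in range(100))]
folds = [i % 10 for i in range(len(data))]  # simple 10-fold assignment
best, scores = tune(data, folds, 10, {"k": [1, 3, 5], "power": [0, 1]})
```

In practice the fold assignment would come from the spatially stratified procedure rather than the simple modulo rule used here.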
Implications and Conclusions
The study concludes that SoilGrids 2.0 provides a globally consistent product using the best available soil data and machine learning techniques. According to the authors, the maps reproduce well-known patterns.
Key Points
- SoilGrids 2.0 produces global maps of soil properties at 250 m resolution.
- It uses machine learning with soil observations from 240,000 locations and 400+ environmental covariates.
- The spatial uncertainty at a global scale highlights the need for more soil observations, especially in high-latitude regions.
- The models incorporate soil profile data from ISRIC's World Soil Information Service (WoSIS).
- Spatial stratification is implemented in the cross-validation procedure to account for variations in observation density.
- Quantile regression forests are used to quantify prediction uncertainty.
- The mapped soil properties include organic carbon, total nitrogen, pH, cation exchange capacity, texture fractions, and coarse fragments.