
LobellLab/csdl


Corn Soy Data Layer

This repo contains code that walks through the key steps to create and validate the Corn-Soy Data Layer (CSDL), a map that classifies corn and soybean in 13 states of the US Midwest from 1999 to 2018 at 30 m resolution. Although the USDA's Cropland Data Layer (CDL) offers crop type maps across the conterminous US from 2008 onward, before 2008 such maps are missing for many Midwestern states or are uneven in quality. To fill these data gaps, we used the now-public Landsat archive and cloud computing services to map corn and soybean, the primary crops in the Midwest, back to 1999.

Dataset

Our dataset can be accessed in one of two ways:

  • Google Earth Engine asset here
  • Zenodo repo housing GeoTIFFs here

Map legend:

  • 0 = outside study area
  • 1 = corn
  • 5 = soy
  • 9 = other crop
  • 255 = non-crop (masked by NLCD)

Values were chosen to be consistent with CDL values when possible.
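
As a minimal sketch of how the legend above might be applied, the snippet below maps pixel values to class names and converts pixel counts to hectares. It assumes the map has already been read into a NumPy array and that each pixel covers a nominal 30 m x 30 m; the names `CSDL_LEGEND`, `class_areas_ha`, and the small `tile` array are hypothetical and are not part of this repo.

```python
import numpy as np

# Legend values copied from the list above.
CSDL_LEGEND = {
    0: "outside study area",
    1: "corn",
    5: "soy",
    9: "other crop",
    255: "non-crop",
}

# Assumption: nominal 30 m x 30 m pixels, i.e. 900 m^2 = 0.09 ha each.
PIXEL_AREA_HA = 30 * 30 / 10_000


def class_areas_ha(tile):
    """Return {class name: area in hectares} for an array of CSDL values."""
    values, counts = np.unique(tile, return_counts=True)
    return {
        CSDL_LEGEND.get(int(v), f"unknown ({int(v)})"): float(c) * PIXEL_AREA_HA
        for v, c in zip(values, counts)
    }


# Hypothetical 3x3 tile, for illustration only.
tile = np.array([[1, 1, 5], [5, 9, 255], [0, 1, 5]], dtype=np.uint8)
areas = class_areas_ha(tile)
```

Values 0 and 255 are kept as separate classes here, so pixels outside the study area are not confused with masked non-crop pixels when totals are computed.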

When using the dataset, please cite: DOI

Usage Notes

We recommend that users consider metrics such as (1) user's and producer's accuracy relative to CDL and (2) R² with NASS statistics across space and time to determine in which states, counties, and years CSDL is of high quality. This can be done with the CSV file of user's and producer's accuracies and annual county-level statistics we have included in this repo.
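
For readers unfamiliar with the two accuracy metrics, here is a rough sketch of how they could be computed for one class, treating CDL as the reference. The function name `users_producers_accuracy` and the label arrays are hypothetical; the accuracies shipped with this repo may have been computed differently.

```python
import numpy as np


def users_producers_accuracy(mapped, reference, cls):
    """Return (user's, producer's) accuracy of `cls`, with `reference` as truth."""
    mapped = np.asarray(mapped)
    reference = np.asarray(reference)
    agree = np.sum((mapped == cls) & (reference == cls))
    n_mapped = np.sum(mapped == cls)        # pixels the map labels as cls
    n_reference = np.sum(reference == cls)  # pixels the reference labels as cls
    # User's accuracy: share of mapped cls pixels that the reference confirms.
    users = agree / n_mapped if n_mapped else float("nan")
    # Producer's accuracy: share of reference cls pixels that the map recovers.
    producers = agree / n_reference if n_reference else float("nan")
    return float(users), float(producers)


# Hypothetical per-pixel labels (1 = corn, 5 = soy, 9 = other crop).
csdl = np.array([1, 1, 1, 5, 5, 9, 1, 5])
cdl = np.array([1, 1, 5, 5, 5, 9, 9, 5])
corn_ua, corn_pa = users_producers_accuracy(csdl, cdl, cls=1)
soy_ua, soy_pa = users_producers_accuracy(csdl, cdl, cls=5)
```

A low user's accuracy for a class points to over-mapping of that class (commission errors), while a low producer's accuracy points to missed pixels (omission errors).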

Code dependencies

  • To sample training points: R version 3.5.1, dplyr 0.8.0.1, sf 0.6-3, raster 2.6-7, rgdal 1.3-4, salustools 0.1.0, sp 1.3-1

  • To train our classifier and create the final maps: Google Earth Engine

  • To perform analyses: Python 3.7.3, numpy 1.16.4, pandas 0.24.2, matplotlib 3.1.0, scikit-learn 0.21.2, plotly 4.5.0

Map creation

  1. Sample a set of training coordinates. [R Markdown file]
  2. Export Landsat harmonic regression features. [Earth Engine script]
  3. After feature selection, assemble data into a dataframe for ingestion into GEE. [Jupyter notebook]
  4. Train random forest classifier in GEE. [Earth Engine script]
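
To illustrate what "harmonic regression features" in step 2 generally means, the sketch below fits a constant plus sine and cosine terms to a single-pixel time series by least squares and returns the coefficients, which can then serve as classifier inputs. This is a local NumPy illustration, not the repo's Earth Engine script: the number of harmonics, the time units (fraction of a year), and the synthetic series are all assumptions.

```python
import numpy as np


def harmonic_design(t, n_harmonics=2):
    """Design matrix [1, cos(2*pi*k*t), sin(2*pi*k*t)] for k = 1..n_harmonics.

    `t` is time expressed as a fraction of a year (an assumption of this sketch).
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(2 * np.pi * k * t))
        cols.append(np.sin(2 * np.pi * k * t))
    return np.column_stack(cols)


def fit_harmonics(t, y, n_harmonics=2):
    """Least-squares harmonic coefficients: [const, cos1, sin1, cos2, sin2, ...]."""
    X = harmonic_design(t, n_harmonics)
    coefs, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coefs


# Synthetic, noise-free vegetation-index series for illustration only.
t = np.linspace(0.0, 1.0, 24, endpoint=False)
y = 0.4 + 0.3 * np.cos(2 * np.pi * t) - 0.1 * np.sin(4 * np.pi * t)
coefs = fit_harmonics(t, y)
```

With real Landsat observations the sampling is irregular and cloudy dates are missing, so the fitted coefficients summarize the seasonal curve rather than reproduce it exactly.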

Map validation and error analysis

  1. Aggregated CSDL versus county-level NASS statistics. [Jupyter notebook]
  2. County-level CSDL time trends versus NASS time trends. [Jupyter notebook]
  3. Validate CSDL against ARMS crop rotation statistics. [Jupyter notebook]
  4. Landsat availability over the years. [Jupyter notebook]
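
As a rough illustration of the county-level comparison in step 1, the snippet below computes R² between CSDL-derived and NASS-reported crop areas. It takes R² to be the squared Pearson correlation, which is an assumption; the notebooks may instead use the coefficient of determination from a fitted regression. The function name `r_squared` and the area values are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


# Hypothetical county-level corn areas (same counties, same year, same units).
nass_area = np.array([100.0, 250.0, 400.0, 80.0, 310.0])
csdl_area = np.array([110.0, 240.0, 390.0, 95.0, 330.0])
r2 = r_squared(nass_area, csdl_area)
```

Because the squared correlation ignores systematic bias, it is worth checking the slope and intercept of the CSDL versus NASS relationship alongside R².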
