Introduction

This document primarily lists resources for performing deep learning (DL) on satellite imagery. To a lesser extent, machine learning (ML, e.g. random forests, stochastic gradient descent) is also discussed, as are classical image processing techniques.

Datasets

Warning: satellite image files can be LARGE; even a small dataset may comprise 50 GB of imagery
Various datasets listed here and at awesome-satellite-imagery-datasets

WorldView - SpaceNet

https://en.wikipedia.org/wiki/WorldView-3
0.3 m PAN, 1.24 m MS, 3.7 m SWIR. Off-nadir (stereo) available.
Owned by DigitalGlobe
Intro to SpaceNet
SpaceNet dataset on AWS -> see this getting started notebook and this notebook on the off-Nadir dataset
cloud_optimized_geotif here used in the 3D modelling notebook here.
Package of utilities to assist working with the SpaceNet dataset.
For more Worldview imagery see Kaggle DSTL competition.

Sentinel

As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see wikipedia.
Sentinel-2: 13 bands, spatial resolution of 10 m, 20 m and 60 m, 290 km swath, temporal resolution of 5 days
Open access data on GCP
Paid access via sentinel-hub and python-api.
Example loading sentinel data in a notebook
so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Dataset and usage in EuroSAT: Land Use and Land Cover Classification with Sentinel-2, where a CNN achieves a classification accuracy 98.57%.
bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.
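The "variable image size depending on the channel resolution" for BigEarthNet follows directly from the 1.2 km patch footprint and the 10 m, 20 m and 60 m Sentinel-2 band resolutions quoted above. A small worked example of that arithmetic (the function and variable names are illustrative only):

```python
# Ground footprint of a BigEarthNet patch, from the description above.
PATCH_SIZE_M = 1200  # 1.2 km on a side

# Sentinel-2 band resolutions in metres, as listed in this section.
RESOLUTIONS_M = (10, 20, 60)


def patch_pixels(resolution_m: int, patch_size_m: int = PATCH_SIZE_M) -> int:
    """Number of pixels along one side of a patch at the given resolution."""
    return patch_size_m // resolution_m


patch_shapes = {res: (patch_pixels(res), patch_pixels(res)) for res in RESOLUTIONS_M}
for res, shape in patch_shapes.items():
    print(f"{res} m bands -> {shape[0]} x {shape[1]} pixels")
```

So the same patch arrives as arrays of three different shapes, which usually have to be resampled to a common grid before being stacked into a single input tensor.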

Landsat

Long running US program -> see Wikipedia
8 bands, 15 to 60 meters, 185km swath, the temporal resolution is 16 days
Imagery on GCP, see the GCP bucket here, with imagery analysed in this notebook on Pangeo
https://github.com/kylebarron/landsat-mosaic-latest - Auto-updating cloudless Landsat 8 mosaic from AWS SNS notifications
Visualise landsat imagery using Datashader

Shuttle Radar Topography Mission (digital elevation maps)

Data - open access

Aerial imagery (drones)

Stanford Drone Dataset

Kaggle

Kaggle hosts over 60 satellite image datasets, search results here. The kaggle blog is an interesting read.

Kaggle - Amazon from space - classification challenge

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
3-5 meter resolution GeoTIFF images from planet Dove satellite constellation
12 classes including - cloudy, primary + waterway etc
1st place winner interview - used 11 custom CNN
FastAI Multi-label image classification

Kaggle - DSTL - segmentation challenge

https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
Rating - medium, many good examples (see the Discussion as well as kernels), but as this competition was run a couple of years ago many examples use python 2
WorldView 3 - 45 satellite images covering 1km x 1km in both 3 (i.e. RGB) and 16-band (400nm - SWIR) images
10 Labelled classes include - Buildings, Road, Trees, Crops, Waterway, Vehicles
Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for particular target (e.g. roads, trees)
Deepsense 4th place solution
My analysis here

Kaggle - Airbus Ship Detection Challenge

https://www.kaggle.com/c/airbus-ship-detection/overview
Rating - medium, most solutions using deep-learning, many kernels, good example kernel.
I believe there was a problem with this dataset, which led to many complaints that the competition was ruined.

Kaggle - Draper - place images in order of time

https://www.kaggle.com/c/draper-satellite-image-chronology/data
Rating - hard. Not many useful kernels.
Images are grouped into sets of five, each of which has the same setId. Each image in a set was taken on a different day (but not necessarily at the same time each day). The images for each set cover approximately the same area but are not exactly aligned.
Kaggle interviews for entrants who used XGBOOST and a hybrid human/ML approach

Kaggle - Deepsat - classification challenge

Not satellite but airborne imagery. Each sample image is 28x28 pixels and consists of 4 bands - red, green, blue and near infrared. The training and test labels are one-hot encoded 1x6 vectors. Each image patch is size normalized to 28x28 pixels. Data in .mat Matlab format. JPEG?

Imagery source
Sat4 500,000 image patches covering four broad land cover classes - barren land, trees, grassland and a class that consists of all land cover classes other than the above three Example notebook
Sat6 405,000 image patches each of size 28x28 and covering 6 landcover classes - barren land, trees, grassland, roads, buildings and water bodies.
Deep Gradient Boosted Learning article
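Since the labels are one-hot encoded vectors, most training code first needs them as integer class ids. A minimal sketch with NumPy, using synthetic 1x6 rows in place of the real label matrix. Note the class order in `SAT6_CLASSES` is an assumption for illustration; check the dataset's annotation file for the actual column order.

```python
import numpy as np

# ASSUMPTION: this class order is illustrative only; the real column order
# must be taken from the dataset's own annotation file.
SAT6_CLASSES = ["barren land", "trees", "grassland", "roads", "buildings", "water bodies"]


def decode_one_hot(labels: np.ndarray) -> np.ndarray:
    """Convert an (n_samples, n_classes) one-hot array to integer class ids."""
    labels = np.asarray(labels)
    if labels.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_classes)")
    return labels.argmax(axis=1)


# Three synthetic 1x6 label vectors standing in for rows of the label matrix.
example_labels = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
])
class_ids = decode_one_hot(example_labels)
class_names = [SAT6_CLASSES[i] for i in class_ids]
print(class_names)
```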

Kaggle - other

Satellite + loan data -> https://www.kaggle.com/reubencpereira/spatial-data-repo

Alternative datasets

There are a variety of datasets suitable for land classification problems.

Tensorflow datasets

There are a number of remote sensing datasets
resisc45 - RESISC45 dataset is a publicly available benchmark for Remote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class.
eurosat - EuroSAT dataset is based on Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples.
bigearthnet - The BigEarthNet is a new large-scale Sentinel-2 benchmark archive, consisting of 590,326 Sentinel-2 image patches. The image patch size on the ground is 1.2 x 1.2 km with variable image size depending on the channel resolution. This is a multi-label dataset with 43 imbalanced labels.

UC Merced

http://weegee.vision.ucmerced.edu/datasets/landuse.html
Available as a Tensorflow dataset -> https://www.tensorflow.org/datasets/catalog/uc_merced
This is a 21 class land use image dataset meant for research purposes.
There are 100 RGB TIFF images for each class
Each image measures 256x256 pixels with a pixel resolution of 1 foot
Image classification using Keras

AWS datasets

Landsat -> free viewer at remotepixel and libra
Optical, radar, segmented etc. https://aws.amazon.com/earth/
SpaceNet - WorldView-3 and article here. Also example semantic segmentation using Raster Vision

Quilt

Several people have uploaded datasets to Quilt

Google Earth Engine

https://developers.google.com/earth-engine/
Various imagery and climate datasets, including Landsat & Sentinel imagery
Python API but all compute happens on Google's servers

Weather Datasets

UK Met Office -> https://www.metoffice.gov.uk/datapoint
NASA (make request and emailed when ready) -> https://search.earthdata.nasa.gov
NOAA (requires BigQuery) -> https://www.kaggle.com/noaa/goes16/home
Time series weather data for several US cities -> https://www.kaggle.com/selfishgene/historical-hourly-weather-data

UAV datasets

These are mostly referenced on https://www.visualdata.io
AU-AIR dataset -> a multi-modal UAV dataset for object detection.
ERA -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos.

Synthetic data

The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation
RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft.

Interesting deep learning projects

Raster Vision by Azavea

https://www.azavea.com/projects/raster-vision/
An open source Python framework for building computer vision models on aerial, satellite, and other large imagery sets.
Accessible through the Raster Foundry
Example use cases on open data

RoboSat

https://github.com/mapbox/robosat
Generic ecosystem for feature extraction from aerial and satellite imagery.

neat-EO

https://neat-EO.pink
Efficient AI4EO OpenSource framework

DeepOSM

https://github.com/trailbehind/DeepOSM
Train a deep learning net with OpenStreetMap features and satellite imagery.

DeepNetsForEO - segmentation

https://github.com/nshaud/DeepNetsForEO
Uses SegNET for working on remote sensing images using deep learning.

Skynet-data

https://github.com/developmentseed/skynet-data
Data pipeline for machine learning with OpenStreetMap

Techniques

This section explores the different techniques (DL, ML & classical) people are applying to common problems in satellite imagery analysis. Classification problems are the most easily addressed via DL, object detection is harder, and cloud detection is harder still (a niche interest).

Land classification

A very common problem: assign a land classification to each pixel based on its value. It can be addressed with a simple sklearn clustering algorithm or with deep learning.
Land use is related to classification, but we are trying to detect a scene, e.g. housing, forestry. I have tried CNN -> See my notebooks
Land Use Classification using Convolutional Neural Network in Keras
Sea-Land segmentation using DL
Pixel level segmentation on Azure
Deep Learning-Based Classification of Hyperspectral Data
A U-net based on Tensorflow for object detection (or segmentation) of satellite images - DSTL dataset but python 2.7
What’s growing there? Using eo-learn and fastai to identify crops from multi-spectral remote sensing data (Sentinel 2)
FastAI Multi-label image classification
Image classification using Keras
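The pixel-based clustering approach mentioned at the top of this section can be sketched as below. This is a small NumPy k-means written for illustration, standing in for an sklearn clusterer such as `sklearn.cluster.KMeans`; it is not the method used in any of the linked notebooks. The function names and the brightness-quantile initialisation are assumptions chosen to keep the example deterministic.

```python
import numpy as np


def _nearest_centre(pixels: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the closest centre for every pixel (squared Euclidean distance)."""
    dists = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_label_map(image: np.ndarray, n_clusters: int, n_iter: int = 10) -> np.ndarray:
    """Cluster the pixels of a (rows, cols, bands) image into a (rows, cols) label map."""
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    # Deterministic initialisation: pick pixels at evenly spaced brightness quantiles.
    order = np.argsort(pixels.mean(axis=1))
    picks = ((np.arange(n_clusters) + 0.5) / n_clusters * len(pixels)).astype(int)
    centres = pixels[order[picks]].copy()
    for _ in range(n_iter):
        labels = _nearest_centre(pixels, centres)
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):  # keep the old centre if a cluster is empty
                centres[k] = members.mean(axis=0)
    return _nearest_centre(pixels, centres).reshape(rows, cols)


# Synthetic 3-band scene: dark left half, bright right half.
image = np.zeros((8, 8, 3))
image[:, :4] = 0.1
image[:, 4:] = 0.8
label_map = kmeans_label_map(image, n_clusters=2)
print(label_map)
```

The distance computation builds an array of shape (pixels, clusters, bands), so real scenes would need to be processed in tiles or handed to a library implementation.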

Semantic segmentation

Pixel-wise classification
Instance segmentation with keras - links to satellite examples
Semantic Segmentation on Aerial Images using fastai
https://github.com/Paulymorphous/Road-Segmentation

Change detection

Monitor water levels, coast lines, size of urban areas, wildfire damage. Note, clouds change often too..!
Using PCA (python 2, requires updating) -> https://appliedmachinelearning.blog/2017/11/25/unsupervised-changed-detection-in-multi-temporal-satellite-images-using-pca-k-means-python-code/
Using CNN -> https://github.com/vbhavank/Unstructured-change-detection-using-CNN
Siamese neural network to detect changes in aerial images
https://www.spaceknow.com/
LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest
Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery
Change Detection in Hyperspectral Images Using Recurrent 3D Fully Convolutional Networks
PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python

Image registration

Wikipedia article on registration -> register for change detection or image stitching
Traditional approach -> define control points, employ RANSAC algorithm
Phase correlation is used to estimate the translation between two images with sub-pixel accuracy. It allows accurate registration of low resolution imagery onto high resolution imagery, or registration of a sub-image onto a full image. Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. Applied to Landsat images here.
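A minimal sketch of phase correlation with NumPy FFTs, assuming two single-band images of the same shape that differ by a pure (circular) translation. It only recovers whole-pixel shifts; sub-pixel accuracy needs an extra upsampling or peak-interpolation step, which scikit-image provides in `skimage.registration.phase_cross_correlation`. The function name here is illustrative.

```python
import numpy as np


def phase_correlation_shift(reference: np.ndarray, moved: np.ndarray) -> tuple:
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Normalised cross-power spectrum: discard magnitude, keep only phase.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks beyond the midpoint are negative shifts wrapped around by the FFT.
    shift = [p - n if p > n // 2 else p for p, n in zip(peak, correlation.shape)]
    return int(shift[0]), int(shift[1])


rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moved = np.roll(reference, shift=(3, -5), axis=(0, 1))  # known translation
estimated_shift = phase_correlation_shift(reference, moved)
print(estimated_shift)
```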

Object detection

A typical task is detecting boats on the ocean, which should be simpler than land-based challenges owing to the blank background in images, but it is still challenging and no convincingly robust solutions are available.
Intro articles here and here.
DigitalGlobe article - they use a combination of classical techniques (masks, erodes) to reduce the search space (identifying water via NDWI which requires SWIR) then apply a binary DL classifier on candidate regions of interest. They deploy the final algo as a task on their GBDX platform. They propose that in the future an R-CNN may be suitable for the whole process.
Planet use the non-DL Felzenszwalb algorithm to detect ships
Segmentation of buildings on kaggle
Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”.
Deep learning for satellite imagery via image segmentation
Building Extraction with YOLT2 and SpaceNet Data
Find sports fields using Mask R-CNN and overlay on open-street-map
Detecting solar panels from satellite imagery

Cloud detection

A subset of the object detection problem, but surprisingly challenging
From this article on sentinelhub there are three popular classical algorithms that apply thresholds in multiple bands in order to identify clouds. In the same article they propose using semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but state that this requires too much compute resources.
This article compares a number of ML algorithms, random forests, stochastic gradient descent, support vector machines, Bayesian method.
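To make the classical "threshold multiple bands" idea concrete, here is a toy sketch. It flags pixels that are bright in the blue band and nearly flat (whitish) across the visible bands. The rule and both threshold values are assumptions chosen for illustration; they are not taken from any of the algorithms discussed in the linked articles.

```python
import numpy as np

# ASSUMPTION: both thresholds are illustrative placeholders, not values from
# any published cloud-detection algorithm.
BLUE_MIN_REFLECTANCE = 0.3
MAX_VISIBLE_SPREAD = 0.1


def simple_cloud_mask(blue, green, red) -> np.ndarray:
    """Boolean mask of pixels that are bright in blue and spectrally flat in the visible."""
    blue, green, red = (np.asarray(b, dtype=float) for b in (blue, green, red))
    visible = np.stack([blue, green, red])
    spread = visible.max(axis=0) - visible.min(axis=0)
    return (blue > BLUE_MIN_REFLECTANCE) & (spread < MAX_VISIBLE_SPREAD)


# Synthetic reflectances: two whitish bright pixels, one dark, one bluish.
blue = np.array([[0.60, 0.05], [0.35, 0.50]])
green = np.array([[0.62, 0.08], [0.20, 0.52]])
red = np.array([[0.58, 0.04], [0.10, 0.45]])
cloud_mask = simple_cloud_mask(blue, green, red)
print(cloud_mask)
```

A rule like this confuses clouds with snow, sand and bright roofs, which is part of why the problem is harder than it first looks.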

Wealth and economic activity measurement

The goal is to predict economic activity from satellite imagery rather than conducting labour intensive ground surveys

Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> Used CNN on Landsat imagery (night & day) to predict asset wealth of African villages
Combining Satellite Imagery and machine learning to predict poverty -> review article
Measuring Human and Economic Activity from Satellite Imagery to Support City-Scale Decision-Making during COVID-19 Pandemic
Predicting Food Security Outcomes Using CNNs for Satellite Tasking

Super resolution

Pansharpening

Image fusion of low res multispectral with high res pan band. Several algorithms described in the ArcGIS docs, with the simplest being taking the mean of the pan and RGB pixel value.
Does not require DL, classical algos suffice, see this notebook and this kaggle kernel
https://github.com/mapbox/rio-pansharpen
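The simplest scheme mentioned above, averaging the pan and RGB pixel values, can be sketched as follows. This assumes the pan band is an exact integer multiple of the RGB resolution and uses nearest-neighbour upsampling for brevity; both function names are illustrative, and this is not the algorithm used by rio-pansharpen or the linked notebooks.

```python
import numpy as np


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def mean_pansharpen(rgb_lowres: np.ndarray, pan: np.ndarray, factor: int) -> np.ndarray:
    """Average each upsampled colour band with the pan band; returns (rows, cols, bands)."""
    sharpened = []
    for i in range(rgb_lowres.shape[2]):
        band = upsample_nearest(rgb_lowres[:, :, i].astype(float), factor)
        if band.shape != pan.shape:
            raise ValueError("upsampled band and pan band must have the same shape")
        sharpened.append((band + pan) / 2.0)
    return np.stack(sharpened, axis=2)


# Synthetic data: a 2x2 RGB image and a 4x4 pan band (factor of 2).
rgb_lowres = np.full((2, 2, 3), 0.4)
rgb_lowres[:, :, 0] = 0.2
pan = np.full((4, 4), 0.6)
sharpened = mean_pansharpen(rgb_lowres, pan, factor=2)
print(sharpened.shape)
```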

Stereo imaging for terrain mapping & DEMs

Wikipedia DEM article and phase correlation article
Intro to depth from stereo
Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview or GeoEye.
Process of creating a DEM here and here.
ArcGIS can generate DEMs from stereo images
https://github.com/MISS3D/s2p -> produces elevation models from images taken by high resolution optical satellites -> demo code on https://gfacciol.github.io/IS18/
Automatic 3D Reconstruction from Multi-Date Satellite Images
Semi-global matching with neural networks
Predict the fate of glaciers
monodepth - Unsupervised single image depth prediction with CNNs
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package
Phase correlation in scikit-image

Lidar

Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN

NDVI - vegetation index

Simple band math ndvi = np.true_divide((ir - r), (ir + r)) but challenging due to the size of the imagery.
Example notebook local
Landsat data in cloud optimised format analysed for NDVI with medium article here.
Visualise water loss with Holoviews
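The band math for NDVI can be wrapped in a small function. A minimal sketch, assuming the near infrared and red bands are already loaded as NumPy arrays of the same shape; the function name and the choice to return NaN where both bands are zero are illustrative. Casting to float first matters because imagery is often stored as unsigned integers, where `ir - r` would wrap around instead of going negative.

```python
import numpy as np


def ndvi(nir, red) -> np.ndarray:
    """Normalised difference vegetation index, NaN where nir + red == 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other pixels stay NaN.
    np.true_divide(nir - red, denom, out=out, where=denom != 0)
    return out


nir_band = np.array([[0.5, 0.0], [0.8, 0.2]])
red_band = np.array([[0.1, 0.0], [0.2, 0.2]])
ndvi_values = ndvi(nir_band, red_band)
print(ndvi_values)

# Unsigned integer inputs are handled by the float cast.
ndvi_uint16 = ndvi(np.array([100], dtype=np.uint16), np.array([300], dtype=np.uint16))
```

For scenes too large to fit in memory, the same function can be applied block by block, for example on windows read from a cloud optimised GeoTIFF.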

SAR

Removing speckle noise from Sentinel-1 SAR using a CNN
A dataset which is specifically made for deep learning on SAR and optical imagery is the SEN1-2 dataset, which contains corresponding patch pairs of Sentinel 1 (VV) and 2 (RGB) data. It is the largest manually curated dataset of S1 and S2 products, with corresponding labels for land use/land cover mapping, SAR-optical fusion, segmentation and classification tasks. Paper: https://elib.dlr.de/128117/1/SEN12MS_Preprint.pdf Data: https://mediatum.ub.tum.de/1474000
so2sat on Tensorflow datasets - So2Sat LCZ42 is a dataset consisting of co-registered synthetic aperture radar and multispectral optical image patches acquired by the Sentinel-1 and Sentinel-2 remote sensing satellites, and the corresponding local climate zones (LCZ) label. The dataset is distributed over 42 cities across different continents and cultural regions of the world.
Using Machine Learning to Automatically Detect Volcanic Unrest in a Time Series of Interferograms

Aerial imagery (drones)

RetinaNet for pedestrian detection

Image formats and catalogues

We certainly want to consider cloud optimised GeoTiffs https://www.cogeo.org/
https://terria.io/ for pretty catalogues
Remote pixel
Sentinel-hub eo-browser
Large datasets may come in HDF5 format, can view with -> https://www.hdfgroup.org/downloads/hdfview/
Climate data is often in netcdf format, which can be opened using xarray
The xarray docs list a number of ways that data can be stored and loaded.

STAC - SpatioTemporal Asset Catalog

Specification describing the layout of a catalogue comprising static files. The aim is that the catalogue is crawlable so it can be indexed by a search engine and make imagery discoverable, without requiring yet another API interface.
An initiative of https://www.radiant.earth/ in particular https://github.com/cholmes
Spec at https://github.com/radiantearth/stac-spec
Browser at https://github.com/radiantearth/stac-browser
Talk at https://docs.google.com/presentation/d/1O6W0lMeXyUtPLl-k30WPJIyH1ecqrcWk29Np3bi6rl0/edit#slide=id.p
Example catalogue at https://landsat-stac.s3.amazonaws.com/catalog.json
Chat https://gitter.im/SpatioTemporal-Asset-Catalog/Lobby
Several useful repos on https://github.com/sat-utils
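Because a STAC catalogue is just linked static JSON files, crawling it only needs a function that fetches a document and follows its `child` and `item` links. A rough sketch using the standard library: the `crawl_items` helper and the in-memory `FAKE_FILES` catalogue (with example.com URLs) are hypothetical stand-ins so the example runs without network access.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urljoin


def crawl_items(url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item reachable from a catalogue via its child and item links."""
    document = fetch(url)
    for link in document.get("links", []):
        target = urljoin(url, link["href"])  # hrefs may be relative
        if link.get("rel") == "child":
            yield from crawl_items(target, fetch)
        elif link.get("rel") == "item":
            yield fetch(target)


# HYPOTHETICAL in-memory catalogue standing in for static JSON files on a server.
FAKE_FILES: Dict[str, dict] = {
    "https://example.com/catalog.json": {
        "id": "root",
        "links": [{"rel": "child", "href": "landsat/catalog.json"}],
    },
    "https://example.com/landsat/catalog.json": {
        "id": "landsat",
        "links": [
            {"rel": "parent", "href": "../catalog.json"},
            {"rel": "item", "href": "scene-1.json"},
            {"rel": "item", "href": "scene-2.json"},
        ],
    },
    "https://example.com/landsat/scene-1.json": {"id": "scene-1", "type": "Feature", "links": []},
    "https://example.com/landsat/scene-2.json": {"id": "scene-2", "type": "Feature", "links": []},
}

item_ids = [
    item["id"]
    for item in crawl_items("https://example.com/catalog.json", FAKE_FILES.__getitem__)
]
print(item_ids)
```

Against a real catalogue, `fetch` would be replaced by a function that downloads the URL and parses the JSON.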

State of the art

What are companies doing?

Overall trend to using AWS S3 backend for image storage. There are a variety of tools for exploring and having teams collaborate on data on S3, e.g. T4.
Bucking the trend, Descartes & Airbus are using a google backend -> checkout gcsfs for google cloud storage file-system
Just speculating, but a serverless pipeline appears to be where companies are headed for routine compute tasks, whilst providing a Jupyter notebook approach for custom analysis.
Traditional data formats aren't designed for processing, so new standards are developing such as cloud optimised geotiffs and zarr

Batch processing

Google provide training on how to use Apache Spark on Google Cloud Dataproc to distribute a computationally intensive (satellite) image processing task onto a cluster of machines -> https://google.qwiklabs.com/focuses/5834?parent=catalog

Online platforms for Geo analysis

This article discusses some of the available platforms -> TLDR Pangeo rocks, but must BYO imagery
Pangeo - open source resources for parallel processing using Dask and Xarray http://pangeo.io/index.html
Airbus Sandbox -> will provide access to imagery
Descartes Labs -> access to EO imagery from a variety of providers via python API -> not clear which imagery is available (Airbus + others?) or pricing
DigitalGlobe have a cloud hosted Jupyter notebook platform called GBDX. Cloud hosting means they can guarantee the infrastructure supports their algorithms, and they appear to be close/closer to deploying DL. Tutorial notebooks here. Only Sentinel-2 and Landsat data on free tier.
Planet have a Jupyter notebook platform which can be deployed locally and requires an API key (14 days free). They have a python wrapper (2.7..) to their rest API. No price after 14 day trial.
Earth-i Spectrum appears to allow processing of imagery, with the capability to perform segmentation, change detection, object recognition. This promo video contains some screenshots of the application.

Free online computing resources

Generally a GPU is required for DL, and this section lists a couple of free Jupyter environments with GPU available. There is a good overview of online Jupyter envs on the fast.ai site.

Google Colab

Colaboratory notebooks with GPU as a backend for free for 12 hours at a time. Note that the GPU may be shared with other users, so if you aren't getting good performance try reloading.
Also a pro tier for $10 a month -> https://colab.research.google.com/signup
Tensorflow available & pytorch can be installed, useful articles

Kaggle - also Google!

Free to use
GPU Kernels - may run for 1 hour
Tensorflow, pytorch & fast.ai available
Advantage that many datasets are already available
Read

Production

Custom REST API

Tensorflow Serving

https://www.tensorflow.org/serving/
TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. Multiple models, or indeed multiple versions of the same model, can be served simultaneously. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU
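A served model is typically queried over TensorFlow Serving's REST API by POSTing a JSON body with an `instances` list to a `/v1/models/<name>:predict` URL. A sketch that builds such a request with the standard library without sending it. The host, the model name `land_cover` and the input values are hypothetical, and port 8501 is assumed to be where the REST endpoint is exposed.

```python
import json
import urllib.request


def build_predict_request(
    host: str, model_name: str, instances: list, port: int = 8501
) -> urllib.request.Request:
    """Build (but do not send) a predict request for a TensorFlow Serving REST endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# HYPOTHETICAL model name and input; a real model defines its own input shape.
request = build_predict_request("localhost", "land_cover", [[0.1, 0.2, 0.3, 0.4]])
print(request.full_url)
# With a server running, send it with: urllib.request.urlopen(request)
```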

chip-n-scale-queue-arranger by developmentseed

https://github.com/developmentseed/chip-n-scale-queue-arranger
an orchestration pipeline for running machine learning inference at scale
Supports fast.ai models

Useful open source software

QGIS- Create, edit, visualise, analyse and publish geospatial information. Python scripting and plugins.
Orfeo toolbox - remote sensing toolbox with python API (just a wrapper to the C code). Do activities such as pansharpening, ortho-rectification, image registration, image segmentation & classification. Not much documentation.
QUICK TERRAIN READER - view DEMS, Windows
satpy - a python library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats
Pyviz examples include several interesting geospatial visualisations
torchvision-enhance -> Enhance PyTorch vision for semantic segmentation, multi-channel images and TIF file,...
dl-satellite-docker -> docker files for geospatial analysis, including tensorflow, pytorch, gdal, xgboost...
Geowombat -> geo-utilities applied to air- and space-borne imagery, uses Rasterio, Xarray and Dask for I/O and distributed computing with named coordinates
TorchSat is an open-source deep learning framework for satellite imagery analysis based on PyTorch.
AIDE V2 - Tools for detecting wildlife in aerial images using active learning
xarray-spatial: Fast, Accurate Python library for Raster Operations. Implements algorithms using Numba and Dask, free of GDAL

Movers and shakers on Github

Chris Holmes is doing great things at Planet
Christoph Rieke maintains a very popular imagery repo and has published his thesis on segmentation
Robin Wilson is a former academic who is very active in the satellite imagery space

Courses

Manning: Monitoring Changes in Surface Water Using Satellite Image Data

Online communities

fast AI geospatial study group

Geospatial companies

https://github.com/chrieke/geospatial-companies -> List of 500+ geospatial companies by Christoph Rieke

For fun

Style transfer - see the world in a new way

Useful References

https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0
https://github.com/taspinar/sidl/blob/master/notebooks/2_Detecting_road_and_roadtypes_in_sattelite_images.ipynb
Geonotebooks with Docker container
Sentinel NetCDF data
Open Data Cube - serve up cubes of data https://www.opendatacube.org/
Process Satellite data using AWS Lambda functions
OpenDroneMap - generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images.

About the author

My background is optical physics, and I have a PhD from Cambridge on the topic of Plasmon enhanced Raman spectroscopy. After doing a post doc I left academia and took a variety of roles, from industrial research at Sharp Labs Europe, to medical physics, to building optical telescopes at Surrey Satellites (SSTL). It was whilst at SSTL that I started this repo as a personal resource. I left SSTL, actually was made redundant along with 30% of the company, and after a brief stint at an IOT start up, I now work as a data engineer. Deep learning is currently a hobby, but I have ambitions to move into this domain when the right opportunity presents itself. Feel free to connect with me on LinkedIn.
