SAR to RGB image translation using CycleGAN
The ability of SAR data to let us see through clouds makes it especially valuable in cloudy areas and bad weather. These are the conditions in which earth observation can reap maximum benefits, but optical sensors cannot capture usable imagery then. Nowadays, many organizations are investing in SAR data, making it more available to users than before. A key disadvantage of SAR data is the scarcity of labelled data, as it is more difficult for users to understand and label SAR data than optical imagery.
In this sample notebook, we will see how to combine the benefits of SAR and optical imagery to perform all-season earth observation. We will train a deep learning model to translate SAR imagery to RGB imagery, thereby making (translated) optical data available even in extreme weather and over cloudy areas.
We will train a CycleGAN model for this case. It is important to note that the CycleGAN model expects unpaired data and has no information on how SAR pixels map to RGB pixels, so it may map dark pixels in the source image to darker shaded pixels in the other image, which may not always be right (especially in agricultural land areas). If results are mismatched because of such wrong mapping, the Pix2Pix model, which expects paired data, can be used instead.
import os, zipfile
from pathlib import Path

from arcgis.gis import GIS
from arcgis.learn import prepare_data, CycleGAN

# Connect to GIS
gis = GIS('home')
For this use case, we have SAR imagery from Capella Space and world imagery in the form of RGB tiles near the city of Rotterdam in the Netherlands. We exported that data in the new “CycleGAN” metadata format available in the Export Training Data For Deep Learning tool. This tool is available in ArcGIS Pro as well as ArcGIS Image Server.
Input Raster: SAR imagery tile
Additional Raster: RGB imagery
Tile Size X & Tile Size Y: 256
Stride X & Stride Y: 128
Meta Data Format: CycleGAN
Environments: Set optimum
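With a tile size of 256 and a stride of 128, neighbouring tiles overlap by half, which roughly quadruples the number of exported chips compared with non-overlapping tiles. The sketch below illustrates the arithmetic only. The function names and the 1024 x 768 raster are hypothetical, and it counts only tiles that fit fully inside the raster, which may differ from how the export tool treats edges.

```python
def tile_origins(length, tile_size=256, stride=128):
    """Return the start offsets of tiles along one axis of `length` pixels.

    Only tiles that fit entirely inside the raster are counted; how the
    export tool treats partial tiles at the edges is not modelled here.
    """
    if length < tile_size:
        return []
    return list(range(0, length - tile_size + 1, stride))


def count_tiles(width, height, tile_size=256, stride=128):
    """Number of full tiles for a width x height raster."""
    cols = len(tile_origins(width, tile_size, stride))
    rows = len(tile_origins(height, tile_size, stride))
    return cols * rows


# Hypothetical 1024 x 768 pixel raster, used only for illustration.
print(tile_origins(1024)[:4])             # first few column offsets
print(count_tiles(1024, 768))             # overlapping tiles (stride 128)
print(count_tiles(1024, 768, stride=256)) # non-overlapping tiles
```

A smaller stride yields more training chips from the same imagery, at the cost of more redundancy between chips.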
In the exported training data, the 'A' and 'B' folders contain all the image tiles exported from the SAR imagery and the RGB imagery (world imagery cache), respectively. Each folder will also contain other files such as 'esri_accumulated_stats.json', 'esri_model_definition.emd', 'map.txt', and 'stats.txt'. Now, we are ready to train the CycleGAN model.
Alternatively, we have provided a subset of training data containing a few samples that follows the same directory structure mentioned above. You can use the data directly to run the experiments.
training_data = gis.content.get('25ed4a30219e4ba7acb3633e1a75bae1')
training_data

filepath = training_data.download(file_name=training_data.name)

with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)

# Folder with the same name as the downloaded archive, minus its extension
output_path = Path(os.path.splitext(filepath)[0])
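Before training, it can help to confirm that the extracted folder has the 'A' and 'B' layout described above. The helper below is a minimal sketch, not part of arcgis.learn. The name summarize_export is hypothetical, and it assumes the tiles sit in an 'images' sub-folder of each domain folder, as the sample paths later in this notebook suggest.

```python
from pathlib import Path


def summarize_export(root):
    """Count image tiles under the 'A' and 'B' folders of an exported dataset.

    Assumes tiles live in '<root>/<domain>/images'; this layout is an
    assumption based on the sample paths used later in the notebook.
    """
    root = Path(root)
    summary = {}
    for domain in ("A", "B"):
        images_dir = root / domain / "images"
        if not images_dir.is_dir():
            raise FileNotFoundError(f"Missing folder: {images_dir}")
        summary[domain] = sum(1 for p in images_dir.iterdir() if p.is_file())
    return summary


# Example usage (uncomment once the data has been extracted):
# print(summarize_export(output_path))
```

Because CycleGAN works on unpaired data, the two counts do not need to match.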
We will train a CycleGAN model, which performs image-to-image translation by learning a mapping between input and output images from an unpaired dataset. This model is an extension of the GAN architecture and involves simultaneous training of two generator models and two discriminator models. In a GAN, we can generate images of domain Y from domain X, but in CycleGAN, we can also generate images of domain X from domain Y using the same model architecture.
Figure 4. CycleGAN architecture
It has two mapping functions, G : X → Y and F : Y → X, and associated adversarial discriminators Dy and Dx. G tries to generate images that look similar to images from domain Y, while Dy aims to distinguish between translated samples G(x) and real samples y. G aims to minimize the adversarial objective, while its adversary Dy tries to maximize it. The same process happens when generating images of domain X from domain Y, using F as the generator and Dx as the discriminator.
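For reference, the objective described above can be written out following the CycleGAN paper cited at the end of this notebook. The cycle-consistency term and its weight λ come from that paper rather than from the text above, and the loss actually used by the arcgis.learn implementation may differ in detail.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big] + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

The cycle-consistency term is what lets the model learn from unpaired data: translating an image to the other domain and back should return something close to the original.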
We will specify the path to our training data and a few hyperparameters.
path: path of the folder containing training data.
batch_size: Number of images your model will train on in each step inside an epoch. It depends directly on the memory of your graphics card. 4 worked for us on an 11 GB GPU.
data = prepare_data(output_path, batch_size=8)
To get a sense of what the training data looks like, the arcgis.learn.show_batch() method randomly picks a few training chips and visualizes them.
rows: Number of rows to visualize
model = CycleGAN(data)
Learning rate is one of the most important hyperparameters in model training. ArcGIS API for Python provides a learning rate finder that suggests a suitable learning rate for you.
lr = model.lr_find()
We will train the model for a few epochs with the learning rate we have found. For the sake of time, we can start with 25 epochs. Unlike some other models, we train CycleGAN from scratch, keeping the learning rate at 2e-04 for some initial epochs and then linearly decaying it to zero over the remaining epochs.
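The schedule described above (a constant rate followed by a linear decay to zero) can be sketched as a small function. This is an illustration only, not the scheduler used inside arcgis.learn. The function name is hypothetical, and the split at 12 constant epochs is an assumption, since the text does not say where the decay starts.

```python
def cyclegan_lr(epoch, total_epochs=25, constant_epochs=12, base_lr=2e-4):
    """Learning rate for a 0-based `epoch`: constant, then linear decay.

    `constant_epochs` is a hypothetical split chosen for illustration.
    The rate would reach exactly zero at epoch == total_epochs, that is,
    just after the final training epoch.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must be in [0, total_epochs)")
    if epoch < constant_epochs:
        return base_lr
    decay_epochs = total_epochs - constant_epochs
    fraction_done = (epoch - constant_epochs) / decay_epochs
    return base_lr * (1.0 - fraction_done)


# Print the rate at a few epochs of a 25-epoch run.
for epoch in (0, 11, 12, 18, 24):
    print(epoch, f"{cyclegan_lr(epoch):.2e}")
```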
Here, with 25 epochs, we can see reasonable results: both training and validation losses have gone down considerably, indicating that the model is learning to translate SAR imagery to RGB and vice versa.
It is good practice to view the results of the model vis-à-vis the ground truth. The code below picks random samples and shows the ground truth and model predictions side by side. This enables us to preview the results of the model within the notebook.
We will save the trained model as a 'Deep Learning Package' ('.dlpk' format). A Deep Learning Package is the standard format used to deploy deep learning models on the ArcGIS platform.
We will use the save() method to save the trained model. By default, it will be saved to the 'models' sub-folder within our training data folder.
We can translate SAR imagery to RGB and vice versa with the help of the predict() method. Using the predict function, we can apply the trained model to the image we want to translate.
img_path: path to the image file.
convert_to: 'a' or 'b', the type of fake image we want to generate.
# Un-comment the line below to run predict over your desired image.
# model.predict(r"D:\CycleGAN\Data\exported_data_CycleGAN\A\images\000002800.tif", convert_to="b")
In the above step, we are translating an image of type a, i.e. SAR imagery, to an image of type b, i.e. RGB imagery. We can also perform type b to type a translation by changing the image file and the convert_to parameter.
# Un-comment the line below to run predict over your desired image.
# model.predict(r"D:\CycleGAN\Data\exported_data_CycleGAN\B\images\000008007.tif", convert_to="a")
Also, we can make use of the Classify Pixels Using Deep Learning tool, available in both ArcGIS Pro and ArcGIS Enterprise.
Input Raster: The raster layer you want to classify.
Model Definition: The '.emd' file located inside the saved model folder, under the 'models' folder.
Padding: The 'Input Raster' is tiled, and the deep learning model classifies each individual tile separately before producing the final 'Output Classified Raster'. This may lead to unwanted artifacts along the edges of each tile, as the model has little context there to predict accurately. Padding, as the name suggests, allows us to supply some extra information along the tile edges, which helps the model predict better.
Cell Size: Should be close to the size used to train the model. This was specified in the Export training data step.
Processor Type: This allows you to control whether the system's 'GPU' or 'CPU' will be used to classify pixels. By default, the 'GPU' will be used if available.
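To make the role of the Padding parameter more concrete, the sketch below shows one common way to tile an axis with padding: each tile is read with extra context on both sides, but only its inner region is kept in the output. This is an illustrative assumption, not a description of how the Classify Pixels Using Deep Learning tool is implemented, and the function name and the padding value of 64 are hypothetical.

```python
def padded_windows(length, tile_size=256, padding=64):
    """Yield (read_start, keep_start, keep_stop) along one axis.

    Each tile of `tile_size` pixels is read starting at `read_start`, but
    only the inner region [keep_start, keep_stop) is written to the output,
    so the `padding` pixels at each tile edge serve purely as context.
    `read_start` can be negative at the raster border, where the missing
    pixels would have to be filled in some way (for example, mirrored).
    """
    core = tile_size - 2 * padding
    if core <= 0:
        raise ValueError("padding must be less than half the tile size")
    for keep_start in range(0, length, core):
        keep_stop = min(keep_start + core, length)
        yield keep_start - padding, keep_start, keep_stop


# Windows along a hypothetical 512-pixel axis.
for window in padded_windows(512):
    print(window)
```

A larger padding gives the model more context at tile edges but shrinks the useful core of each tile, so more tiles are needed to cover the same raster.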
The GIF below was produced with the model trained in this notebook and shows the generated RGB image overlaid on the original RGB image near Rotterdam.
In this notebook, we demonstrated how to use the CycleGAN model with the ArcGIS API for Python to translate imagery of one type to the other.
 Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks; https://arxiv.org/abs/1703.10593.