SAR to RGB image translation using CycleGAN

Introduction

SAR data can see through clouds, which makes it especially valuable in cloudy areas and in bad weather. These are the conditions in which earth observation can deliver the most benefit, yet optical sensors cannot capture usable imagery in them. Many organizations are now investing in SAR data, making it more widely available than before. A major drawback of SAR data is the scarcity of labelled data, because SAR imagery is harder for users to interpret and label than optical imagery.

In this sample notebook, we will see how to combine the benefits of SAR and optical imagery to perform all-season earth observation. We will train a deep learning model to translate SAR imagery to RGB imagery, thereby making (translated) optical data available even in extreme weather and over cloudy areas.

We will train a CycleGAN model for this case. Note that CycleGAN expects unpaired data and has no information about how SAR pixels map to RGB pixels, so it may map dark pixels in the source image to darker-shaded pixels in the other image, which is not always correct (especially over agricultural land). If results are mismatched because of such wrong mappings, the Pix2Pix model, which expects paired data, can be used instead.

Necessary imports

Input
import os, zipfile
from pathlib import Path

from arcgis.gis import GIS
from arcgis.learn import prepare_data, CycleGAN

Connect to your GIS

Input
# Connect to GIS
gis = GIS('home') 

Export training data

For this use case, we have SAR imagery from Capella Space and world imagery in the form of RGB tiles near the city of Rotterdam in the Netherlands. We exported that data in the new “CycleGAN” metadata format available in the Export Training Data For Deep Learning tool. This tool is available in ArcGIS Pro as well as ArcGIS Image Server.

  • Input Raster: SAR imagery tile
  • Additional Raster: RGB imagery
  • Tile Size X & Tile Size Y: 256
  • Stride X & Stride Y: 128
  • Meta Data Format: CycleGAN
  • Environments: Set optimum Cell Size, Processing Extent.
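
With a tile size of 256 and a stride of 128, neighbouring tiles overlap by half. As a rough illustration of what these two settings imply, the sketch below counts how many full tiles fit along each axis of a raster. The helper name and the 1024 x 768 raster size are hypothetical, and the export tool may treat raster edges differently.

```python
def tiles_along_axis(length_px, tile_size=256, stride=128):
    """Number of full tiles that fit along one axis (assumes no edge padding)."""
    if length_px < tile_size:
        return 0
    return (length_px - tile_size) // stride + 1


# Hypothetical raster of 1024 x 768 pixels (not taken from the notebook).
cols = tiles_along_axis(1024)
rows = tiles_along_axis(768)
overlap = 1 - 128 / 256  # fraction of a tile shared with its neighbour

print(f"{cols} x {rows} = {cols * rows} tiles, overlap {overlap:.0%}")
```

A smaller stride yields more (and more strongly overlapping) tiles from the same raster, at the cost of a larger export.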

In the exported training data, the 'A' and 'B' folders contain all the image tiles exported from the SAR imagery and the RGB imagery (world imagery cache), respectively. Each folder also contains other files such as 'esri_accumulated_stats.json', 'esri_model_definition.emd', 'map.txt', and 'stats.txt'. We are now ready to train the CycleGAN model.

Alternatively, we have provided a subset of training data containing a few samples that follows the same directory structure mentioned above. You can use the data directly to run the experiments.

Input
training_data = gis.content.get('25ed4a30219e4ba7acb3633e1a75bae1')
training_data
Output
sar_to_rgb_image_translation_using_cyclegan
Image Collection by api_data_owner
Last Modified: February 25, 2022
0 comments, 124 views
Input
filepath = training_data.download(file_name=training_data.name)
Input
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
Input
# The archive extracts to a folder named after the zip file (without the extension).
output_path = Path(os.path.splitext(filepath)[0])
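
Before training, it can help to confirm that the extracted folder follows the 'A' and 'B' layout described above. The helper below is a minimal sketch, not part of arcgis.learn: its name is hypothetical, and it assumes the tiles are '.tif' files inside an 'images' sub-folder of each domain folder, as the paths in the inference section suggest.

```python
from pathlib import Path


def count_tiles(root, domains=("A", "B"), pattern="*.tif"):
    """Count image tiles under <root>/<domain>/images for each domain folder."""
    counts = {}
    for domain in domains:
        image_dir = Path(root) / domain / "images"
        # A missing folder is reported as zero tiles rather than raising.
        counts[domain] = len(list(image_dir.glob(pattern))) if image_dir.is_dir() else 0
    return counts


# Example usage with the folder extracted above:
# print(count_tiles(output_path))
```

Because CycleGAN works with unpaired data, the two counts do not need to match, but a count of zero for either domain indicates a problem with the export or the extraction.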

Train the model

We will train the CycleGAN model [1], which performs image-to-image translation by learning a mapping between input and output images from an unpaired dataset. The model extends the GAN architecture and involves the simultaneous training of two generator models and two discriminator models. In a GAN, we can generate images of domain Y from domain X; in CycleGAN, we can also generate images of domain X from domain Y using the same model architecture.


Figure 4. CycleGAN architecture

It has two mapping functions, G : X → Y and F : Y → X, and associated adversarial discriminators Dy and Dx. G tries to generate images that look similar to images from domain Y, while Dy aims to distinguish between translated samples G(x) and real samples y. G aims to minimize the adversarial objective, while its adversary Dy tries to maximize it. The same process applies when generating images of domain X from domain Y, with F as the generator and Dx as the discriminator.
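
For reference, the objective described above can be written out as it appears in the CycleGAN paper [1]. The symbols D_Y and D_X correspond to Dy and Dx in the text; the weight λ and the data distributions p_data are notation from the paper, not from this notebook.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  &= \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D_Y(y)\big]
   + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F)
  &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y)
  &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
   + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
   + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

The cycle-consistency term penalizes the difference between an image and its round-trip translation, which is what allows training without paired samples. The paper also notes that, in practice, the log-likelihood adversarial term is replaced by a least-squares loss for more stable training.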

Prepare data

We will specify the path to our training data and a few hyperparameters.

  • path: path of the folder containing training data.
  • batch_size: Number of images the model trains on in each step within an epoch. It depends directly on the memory of your graphics card; 4 worked for us on an 11 GB GPU, so lower the value used in the cell below if you run out of memory.
Input
data = prepare_data(output_path, batch_size=8)

Visualize training data

To get a sense of what the training data looks like, the show_batch() method of the prepared data object randomly picks a few training chips and visualizes them.

  • rows: Number of rows to visualize
Input
data.show_batch()

Load model architecture

Input
model = CycleGAN(data)

Find an optimal learning rate

The learning rate is one of the most important hyperparameters in model training. ArcGIS API for Python provides a learning rate finder that automatically suggests a suitable learning rate.

Input
lr = model.lr_find()

Fit the model

We will train the model for a few epochs with the learning rate we have found. For the sake of time, we can start with 25 epochs. Unlike some other models, CycleGAN is trained from scratch, with a learning rate of 2e-04 for the initial epochs followed by a linear decay of the rate to zero over the remaining epochs.
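
The constant-then-linear-decay schedule described above can be sketched as a small function. This is an illustration only: the function name is hypothetical, the default split at the halfway point is an assumption (the CycleGAN paper [1] uses 100 constant epochs followed by 100 decay epochs), and how model.fit() applies its schedule internally is not shown in this notebook.

```python
def cyclegan_lr(epoch, total_epochs, base_lr=2e-4, decay_start=None):
    """Constant learning rate until decay_start, then linear decay to zero."""
    if decay_start is None:
        decay_start = total_epochs // 2  # assumption: decay over the second half
    if total_epochs <= 0 or not 0 <= decay_start < total_epochs:
        raise ValueError("need total_epochs > 0 and 0 <= decay_start < total_epochs")
    if epoch < decay_start:
        return base_lr
    # Fraction of the decay phase still remaining; reaches zero at total_epochs.
    remaining = max(total_epochs - epoch, 0)
    return base_lr * remaining / (total_epochs - decay_start)


# Schedule from the paper: 200 epochs, decay starting at epoch 100.
for epoch in (0, 100, 150, 200):
    print(epoch, cyclegan_lr(epoch, total_epochs=200, decay_start=100))
```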

Input
model.fit(25, lr)
Output
epoch train_loss valid_loss id_loss gen_loss cyc_loss D_A_loss D_B_loss time
0 12.721999 11.730223 3.998488 0.769209 7.954296 0.265965 0.281635 13:34
1 7.841636 7.544123 2.355145 0.842276 4.644219 0.168540 0.219886 13:30
2 6.932475 6.646543 2.033799 0.860827 4.037850 0.167215 0.190510 13:31
3 6.302457 6.236442 1.826933 0.885448 3.590077 0.130738 0.217415 13:30
4 6.080861 6.053925 1.707871 0.943970 3.429021 0.126789 0.220006 13:30
5 5.962185 5.660501 1.664525 0.929826 3.367835 0.144954 0.200088 13:31
6 5.532198 5.643837 1.518743 0.907848 3.105606 0.146165 0.197749 13:30
7 5.537426 5.526513 1.510175 0.866212 3.161039 0.183122 0.187084 13:31
8 5.290096 6.090646 1.422879 0.877437 2.989780 0.161592 0.446271 13:31
9 5.662517 5.221590 1.532634 0.893956 3.235927 0.176112 0.197955 13:31
10 5.206953 5.220056 1.381885 0.889010 2.936058 0.189249 0.233941 13:29
11 5.070705 4.976690 1.332413 0.896229 2.842061 0.174638 0.201816 13:30
12 5.005665 4.977567 1.305359 0.902462 2.797843 0.167335 0.181656 13:37
13 5.110228 5.356337 1.337922 0.901490 2.870818 0.215103 0.180093 13:45
14 4.853711 4.679384 1.252533 0.888777 2.712400 0.180137 0.225612 13:35
15 4.977405 4.836682 1.300151 0.905164 2.772090 0.172903 0.229463 13:34
16 4.767528 4.654548 1.225715 0.903952 2.637859 0.189157 0.174896 13:32
17 5.028263 4.956115 1.318882 0.887852 2.821530 0.184559 0.179959 13:30
18 4.728264 4.597402 1.225199 0.874639 2.628427 0.175217 0.168787 13:28
19 4.706354 4.533289 1.217351 0.882405 2.606596 0.189901 0.174052 13:30
20 4.579993 4.584113 1.184290 0.867878 2.527825 0.178014 0.185218 13:31
21 4.523578 4.520947 1.163049 0.870982 2.489547 0.177160 0.175121 13:30
22 4.620153 4.558319 1.195908 0.877545 2.546699 0.176048 0.174974 13:30
23 4.677400 4.536479 1.217801 0.863664 2.595937 0.178214 0.179145 13:31
24 4.555197 4.551400 1.183150 0.861303 2.510743 0.177039 0.175313 13:29

Here, with 25 epochs, we can see reasonable results: training loss fell from about 12.7 to 4.6 and validation loss from about 11.7 to 4.6, indicating that the model is learning to translate SAR imagery to RGB and vice versa.
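
The loss columns in the training log also appear to be related: in the rows checked below, train_loss matches the sum of id_loss, gen_loss and cyc_loss to within rounding. This is an observation from the logged numbers, not documented behaviour of the library; the variable and function names are illustrative.

```python
# Rows copied from the training log above:
# (epoch, train_loss, id_loss, gen_loss, cyc_loss)
log_rows = [
    (0, 12.721999, 3.998488, 0.769209, 7.954296),
    (12, 5.005665, 1.305359, 0.902462, 2.797843),
    (24, 4.555197, 1.183150, 0.861303, 2.510743),
]


def generator_loss_gap(row):
    """Absolute difference between train_loss and the sum of its apparent parts."""
    _, train_loss, id_loss, gen_loss, cyc_loss = row
    return abs(train_loss - (id_loss + gen_loss + cyc_loss))


gaps = [generator_loss_gap(row) for row in log_rows]
print(gaps)
```

Read this way, the cycle-consistency term accounts for more than half of the training loss throughout, so most of the overall improvement comes from better round-trip reconstruction.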

Visualize results in validation set

It is good practice to compare the model's results against the ground truth. The code below picks random samples and shows the ground truth and the model predictions side by side, which lets us preview the results of the model within the notebook.

Input
model.show_results(4)

Save the model

We will save the trained model as a 'Deep Learning Package' ('.dlpk' format). The Deep Learning Package is the standard format used to deploy deep learning models on the ArcGIS platform.

We will use the save() method to save the trained model. By default, it is saved to the 'models' sub-folder within our training data folder. Passing publish=True additionally publishes the package as an item on the connected GIS.

Input
model.save("SAR_to_RGB_25e", publish=True)    
Output
WindowsPath('D:/CycleGAN/Data/data_for_cyclegan_le_3Bands/models/SAR_to_RGB_25e')

Model inference

We can translate SAR imagery to RGB and vice versa with the predict() method, which applies the trained model to the image we want to translate.

  • img_path: path to the image file.
  • convert_to: 'a' or 'b', the type of fake image we want to generate.
Input
# Uncomment the line below to run predict over your desired image.
# model.predict(r"D:\CycleGAN\Data\exported_data_CycleGAN\A\images\000002800.tif", convert_to="b")

In the above step, we translate an image of type a (SAR imagery) to an image of type b (RGB imagery). We can also perform type b to type a translation by changing the image file and the convert_to parameter.

Input
# Uncomment the line below to run predict over your desired image.
# model.predict(r"D:\CycleGAN\Data\exported_data_CycleGAN\B\images\000008007.tif", convert_to="a")
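
To translate every tile in a folder rather than a single image, the predict() call can be wrapped in a loop. The helper below is a minimal sketch: its name and the '*.tif' pattern are assumptions, and it relies only on the predict(img_path, convert_to=...) call shown above.

```python
from pathlib import Path


def translate_tiles(model, tile_dir, convert_to="b", pattern="*.tif"):
    """Run model.predict on every matching tile in tile_dir, in name order."""
    if convert_to not in ("a", "b"):
        raise ValueError("convert_to must be 'a' or 'b'")
    results = {}
    for tile in sorted(Path(tile_dir).glob(pattern)):
        # Keep each prediction keyed by the tile's file name.
        results[tile.name] = model.predict(str(tile), convert_to=convert_to)
    return results


# Example usage with the trained model (folder path is illustrative):
# rgb_tiles = translate_tiles(model, output_path / "A" / "images", convert_to="b")
```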

We can also use the Classify Pixels Using Deep Learning tool, available in both ArcGIS Pro and ArcGIS Enterprise.

  • Input Raster: The raster layer you want to classify.
  • Model Definition: The '.emd' file located inside the saved model's folder within the 'models' folder.
  • Padding: The 'Input Raster' is tiled, and the deep learning model classifies each tile separately before producing the final 'Output Classified Raster'. This may lead to unwanted artifacts along the edges of each tile, where the model has little context to predict accurately. Padding, as the name suggests, supplies some extra information along the tile edges, which helps the model predict better.
  • Cell Size: Should be close to the cell size used to train the model, which was specified in the Export training data step.
  • Processor Type: Controls whether the system's 'GPU' or 'CPU' is used to classify pixels; by default, 'GPU' is used if available.

Results

The GIF below was produced with the model trained in this notebook and shows the generated RGB image overlaid on the original RGB image near Rotterdam.

Conclusion

In this notebook, we demonstrated how to use the CycleGAN model with the ArcGIS API for Python to translate imagery of one type to another.

References

[1] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks; https://arxiv.org/abs/1703.10593.
