Coastline classification using Feature Classifier
- 🔬 Data Science
- 🥠 Deep Learning and Object classification
We have already seen how to extract coastlines using Landsat-8 multispectral imagery and a band ratio technique. Next, we will classify these coastlines into multiple categories. To achieve this, we can train a model that assigns each coastline to one of the categories shown in the screenshot below:
In this sample notebook, we will see how we can classify these coastlines into the categories shown in figure 1 by training a Feature Classifier model.
import os
import glob
import zipfile
from pathlib import Path

from arcgis.gis import GIS
from arcgis.learn import prepare_data, FeatureClassifier
# Connect to GIS
gis = GIS("home")
Using ArcGIS Maritime, we imported NOAA’s Electronic Navigational Charts. The maritime data in these charts contain the Coastline Feature class with the Category of Coastline details. The Sentinel 2 imagery was downloaded from the Copernicus Open Access Hub.
Before exporting the data, we will first create grids along the coastlines that can act as a feature class while exporting the data. For this, we will use the Generate Rectangles Along Lines tool. The parameters required to run the tool are:
Input Line Features: Coastlines (belonging to each category)
Output Feature Class: Output Feature class name
Length Along the Line: 1000
Length Perpendicular to the Line: 1000
Spatial Sort Method: Upper left
Figure 3 shows the output from the tool when run on a feature class belonging to category 3 (sandy shore).
Next, we add a category column to the feature class that can act as the class value field while exporting the data. We created a category column of the long data type named CATCOA and set its value to the feature's coastline category using the Calculate option, as shown in figure 4.
We then repeated this process for each coastline category. Next, we exported this data in the "Labeled Tiles" metadata format for a small extent with multiple categories, using the Export Training Data For Deep Learning tool. This tool is available in ArcGIS Pro, as well as ArcGIS Image Server. The parameters required to run the tool are:
Input Raster: Sentinel2 imagery
Input Feature class Or Classified Raster: Feature class as shown in figure 4.
Tile Size X & Tile Size Y: 256
Stride X & Stride Y: 128
Meta Data Format: 'Labeled Tiles', as we are training a Feature Classifier model.
Environments: Set optimum
import arcpy

# Export labeled 256x256 chips with a 128-pixel stride, using CATCOA as the class value field
with arcpy.EnvManager(extent="MINOF", cellSize=10):
    arcpy.ia.ExportTrainingDataForDeepLearning(
        "Multispectral_MTD_MSIL1C",
        r"D:\Coastline category\Data\Category_1",
        "CoastlineL_Clip_category1_Rectangles",
        "TIFF", 256, 256, 128, 128,
        "ONLY_TILES_WITH_FEATURES", "Labeled_Tiles", 0, "CATCOA", 0,
        None, 0, "MAP_SPACE", "PROCESS_AS_MOSAICKED_IMAGE",
        "NO_BLACKEN", "FIXED_SIZE",
    )
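To make the tile size and stride settings concrete, the sketch below computes where tiles start along one axis when a 256-pixel tile is moved in 128-pixel steps. The helper `tile_origins` and the 1024-pixel strip are hypothetical and only illustrate the arithmetic; the export tool may treat raster edges and feature-based tiling differently.

```python
def tile_origins(extent_px, tile_size=256, stride=128):
    """Return the start offsets of tiles along one axis of a raster.

    Only tiles that fit entirely inside the extent are counted.
    """
    if extent_px < tile_size:
        return []
    return list(range(0, extent_px - tile_size + 1, stride))


# Hypothetical 1024-pixel-wide strip, used purely for illustration.
origins = tile_origins(1024)

# Fraction of each tile shared with its neighbour along this axis.
overlap_fraction = (256 - 128) / 256

print(origins)
print(overlap_fraction)
```

With a stride of half the tile size, neighbouring tiles share half of their pixels, so the same stretch of coastline can appear in more than one chip.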
We also created separate folders for each category to demonstrate the recently added multi-folder training support. Alternatively, you can choose the same folder each time you export the data, in which case the newly exported images are appended to the existing folder.
We have also provided a subset of training data exported from each category. You can use this data directly to run these experiments.
training_data = gis.content.get('9251417cb9ab4a059eb538282f82883c')
training_data
filepath = training_data.download(file_name=training_data.name)
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
# splitext returns a (root, extension) tuple; the root is the extracted folder path
output_path = os.path.join(os.path.splitext(filepath)[0], "*")
output_path = glob.glob(output_path)
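The extract-and-glob step can be tried without a GIS connection. The sketch below builds a small stand-in archive and lists its per-category folders. The helper `extract_and_list_folders`, the archive name, and the folder names are hypothetical, and the sketch assumes the archive contains a top-level folder with the same base name as the zip file.

```python
import glob
import os
import tempfile
import zipfile
from pathlib import Path


def extract_and_list_folders(zip_path):
    """Extract an archive next to itself and list the entries of the
    folder that shares the archive's base name."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(Path(zip_path).parent)
    # splitext returns (root, ext); index 0 drops the ".zip" suffix.
    extracted_root = os.path.splitext(zip_path)[0]
    return sorted(glob.glob(os.path.join(extracted_root, "*")))


# Build a tiny stand-in archive (hypothetical names) to exercise the helper.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "coastline_demo.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("coastline_demo/Category_1/images/chip_0.txt", "placeholder")
    zf.writestr("coastline_demo/Category_3/images/chip_0.txt", "placeholder")

folders = extract_and_list_folders(demo_zip)
print([os.path.basename(f) for f in folders])
```

The resulting list of folders is the kind of value that can be passed as `path` when training from several export folders at once.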
arcgis.learn provides the ability to determine the class of each feature in the form of a FeatureClassifier model. To learn more about its workings and use cases, see the guide "How feature classifier works?".
Here, we will specify the path to our training data and a few hyperparameters.
path: path of the folder/list of folders containing training data.
batch_size: Number of images your model will train on in each step inside an epoch. It depends directly on the memory of your graphics card. 128 worked for us on a 32GB GPU.
# output_path = [
#     r'D:\Coastline_category\Data_generate_rectangles\Category_1',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_2',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_3',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_4',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_6',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_7',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_8',
#     r'D:\Coastline_category\Data_generate_rectangles\Category_10',
# ]
data = prepare_data( path=output_path, batch_size=128, val_split_pct=0.2 )
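As a rough guide to what `batch_size=128` and `val_split_pct=0.2` mean in practice, the sketch below estimates the size of the training and validation sets and the number of steps per epoch. The helper name and the chip count of 5000 are hypothetical. The sketch assumes a simple percentage split and that the last partial batch is kept, which may differ from what `prepare_data()` does internally.

```python
import math


def split_and_batch_counts(n_chips, val_split_pct=0.2, batch_size=128):
    """Estimate training/validation sizes and training steps per epoch."""
    n_val = int(n_chips * val_split_pct)     # chips held out for validation
    n_train = n_chips - n_val                # chips used for training
    steps = math.ceil(n_train / batch_size)  # batches per epoch, last one may be partial
    return n_train, n_val, steps


# Hypothetical total of 5000 exported chips.
n_train, n_val, steps = split_and_batch_counts(5000)
print(n_train, n_val, steps)
```

If the GPU runs out of memory, halving `batch_size` roughly doubles the number of steps per epoch while reducing the memory needed per step.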
To get a sense of what the training data looks like, the show_batch() method of the data object returned by prepare_data() randomly picks a few training chips and visualizes them.
rows: Number of rows to visualize
model = FeatureClassifier(data, oversample=True)
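The `oversample=True` argument is meant to help with imbalanced classes. One common form of oversampling is to randomly duplicate chips from the smaller classes until every class is as large as the largest one. The sketch below shows that idea on toy data. It is an illustration of the general technique only, not a description of how FeatureClassifier implements it, and the helper `oversample` and the chip names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(samples, seed=0):
    """Balance (chip_id, label) pairs by randomly duplicating chips of the
    smaller classes until each class matches the largest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for chip_id, label in samples:
        by_label[label].append((chip_id, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw the missing chips with replacement from the same class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    return balanced


# Toy data: six chips of category 3 and two chips of category 2.
samples = [(f"chip_{i}", 3) for i in range(6)]
samples += [(f"chip_{i}", 2) for i in range(6, 8)]

balanced = oversample(samples)
counts = Counter(label for _, label in balanced)
print(counts)
```

Duplicated chips add no new information, so oversampling complements rather than replaces exporting more data for a rare category.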
The learning rate is one of the most important hyperparameters in model training.
ArcGIS API for Python provides a learning rate finder that suggests a suitable learning rate for you.
lr = model.lr_find()
We will train the model for a few epochs with the learning rate we have found. For the sake of time, we can start with 20 epochs.
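The training step can be sketched as a small helper that finds a learning rate and then fits the model. The helper name `find_lr_and_fit` is hypothetical, and it assumes the model object exposes `lr_find()` and `fit(epochs, lr=...)`, as `arcgis.learn` models do.

```python
def find_lr_and_fit(model, epochs=20):
    """Find a learning rate with lr_find(), then train for `epochs` epochs.

    `model` is assumed to expose lr_find() and fit(epochs, lr=...).
    """
    lr = model.lr_find()
    model.fit(epochs, lr=lr)
    return lr
```

In the notebook this would be called as `find_lr_and_fit(model, epochs=20)` on the FeatureClassifier created earlier; more epochs can be run later if the losses are still falling.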
Here, with only 20 epochs, we can see reasonable results: both training and validation losses have gone down considerably, indicating that the model is learning to classify coastlines.
It is good practice to compare the results of the model with the ground truth. The code below picks random samples and shows us ground truth and model predictions, side by side. This enables us to preview the results of the model within the notebook.
The plot_confusion_matrix() method plots a confusion matrix of the model predictions so that we can evaluate its accuracy.
The confusion matrix confirms that the trained model is learning to classify coastlines. The diagonal values show the number of chips correctly classified into each category. The results are good for all categories except category 2. The row for category 2 shows that the validation set contains very few chips of that category (5 in total). We can increase the number of chips either by increasing the value of the val_split_pct parameter in prepare_data() or by exporting more data for that particular category. If we add more data, the model may need to be re-trained.
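Reading a confusion matrix row by row amounts to computing per-class recall: the diagonal value divided by the row total. The sketch below does this for a small matrix whose rows are the true categories. Apart from the row total of 5 for category 2, the counts are hypothetical and are not the notebook's actual results.

```python
def per_class_recall(matrix, labels):
    """Compute recall per class from a confusion matrix.

    matrix[i][j] is the number of chips whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    recalls = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recalls[label] = matrix[i][i] / row_total if row_total else None
    return recalls


# Hypothetical counts for three coastline categories.
labels = [2, 3, 7]
matrix = [
    [2, 2, 1],   # true category 2: only 5 validation chips
    [1, 45, 4],  # true category 3
    [0, 5, 40],  # true category 7
]

recalls = per_class_recall(matrix, labels)
for label, recall in recalls.items():
    print(label, recall)
```

With so few validation chips, a single misclassified chip shifts the recall of category 2 by 20 percentage points, which is why more chips are needed before drawing conclusions about that category.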
Now, we will save the model that we trained as a 'Deep Learning Package' ('.dlpk' format). A Deep Learning package is the standard format used to deploy deep learning models on the ArcGIS platform.
We will use the save() method to save the trained model. By default, it will be saved to the 'models' sub-folder within our training data folder.
To perform inferencing in ArcGIS Pro, we need to create a feature class along the coastlines using the Generate Rectangles Along Lines tool, as shown in figure 2, for an area that the model has not already seen.
Now, we will use the Classify Objects Using Deep Learning tool for inferencing the results. The parameters required to run the tool are:
Input Raster: Sentinel2 imagery
Input Features: Output from the Generate Rectangles Along Lines tool.
Output Classified Objects Feature Class: Output feature class.
Model Definition: The model that we trained.
Class Label Field: Field name that will contain the detected class number.
Environments: Set optimum
We selected a sandy shoreline (category 3) that the model had not seen and generated the required rectangles along it using the Generate Rectangles Along Lines tool. We then used our model for classification. Below are the results that we got.
You can observe in figure 9 that two rectangles got misclassified into category 7 (mangrove), and that the rest were classified correctly as belonging to category 3. Further training of the model could produce even more accurate results.
In this notebook, we demonstrated how to use the FeatureClassifier model from the ArcGIS API for Python to classify coastlines into multiple categories.