Category-wise shoreline classification using Sentinel-2 imagery
- 🔬 Data Science
- 🥠 Deep Learning and Object Classification
Introduction
We have already explored how to extract coastlines using Landsat-8 multispectral imagery and the band ratio technique. The next step is to classify these coastlines into multiple categories based on their characteristics. To achieve this, we will train a deep learning model that classifies each coastline segment into one of the categories shown below:

Figure 1. Coastline categories
🎯 Objective
The goal of this notebook is to demonstrate how to use deep learning for automated coastline classification using Sentinel-2 satellite imagery and the FeatureClassifier model from the ArcGIS API for Python. The workflow classifies coastal segments into meaningful shoreline categories—such as sandy shores, rocky cliffs, and mangroves—based on their visual characteristics.
📚 What You'll Learn
By the end of this notebook, you will understand how to:
✅ Prepare and label geospatial training data for shoreline classification
✅ Use Sentinel-2 imagery in conjunction with NOAA ENC (Electronic Navigational Charts) datasets
✅ Train a deep learning model to classify coastline features
✅ Evaluate model performance using both quantitative metrics and visual inspection
✅ Apply the trained model to new/unseen coastal areas for inference
🧰 Tools & Technologies
- ArcGIS API for Python
- ArcGIS Pro
- FeatureClassifier deep learning model
- Python libraries: arcgis, glob, etc.
🗂️ Dataset Description
- Imagery: Sentinel-2 optical satellite imagery
- Labels: Coastal segment categories derived from NOAA ENC datasets
- Format: Labeled tiles and metadata generated via the Export Training Data for Deep Learning tool.
- Classes: Multiple shoreline types such as sandy shore, rocky shore, mangrove, etc.
🔁 Workflow Overview
- Data Preparation
  - Create a grid pattern using the Generate Rectangles Along Lines tool.
  - Export labeled tiles from Sentinel-2 imagery using the Export Training Data for Deep Learning tool.
- Model Training
  - Train a FeatureClassifier model on the labeled tiles using arcgis.learn.
  - Monitor training using loss curves and metrics.
- Model Evaluation
  - Assess performance using a confusion matrix and visual validation.
  - Identify strengths and areas for improvement.
- Inference
  - Apply the trained model to new areas.
  - Generate classification maps and analyze results.
🚀 Let’s get started by importing required modules and preparing our data!
Necessary imports
# Standard Python Libraries
# ---------------------------------------------
import os # For interacting with the operating system (e.g., path operations)
import glob # For searching files using wildcard patterns (e.g., "*.zip", "*.tif")
import zipfile # For working with .zip files (extracting or creating archives)
from pathlib import Path # For modern and safe file path handling
# ArcGIS Deep Learning Libraries
# ---------------------------------------------
from arcgis.gis import GIS # For connecting to ArcGIS Online or Enterprise portal
from arcgis.learn import prepare_data, FeatureClassifier # prepare_data: Prepares training data for deep learning workflows, FeatureClassifier: Used to train and infer deep learning models for feature classification
Connect to your GIS
GIS("home") is typically used when you're running the script inside ArcGIS Pro or ArcGIS Enterprise Notebooks, where you're already signed in.
It uses your current active portal and authenticated session, so no need to enter username/password.
# Connect to your GIS (ArcGIS Online or Enterprise)
# -------------------------------------------------------
gis = GIS("home") # Connects using credentials from your active ArcGIS Pro session or ArcGIS Notebook environment
Connect using ArcGIS Online credentials (manual login)
Replace "url" with your portal URL (e.g., "https://www.arcgis.com" for ArcGIS Online). Replace "your_username" and "your_password" with your actual ArcGIS credentials.
gis = GIS("url", "your_username", "your_password")
Read more about the GIS module here.
Export training data
Using ArcGIS Maritime, we imported NOAA’s Electronic Navigational Charts. The maritime data in these charts include a coastline feature class that records the category of each coastline segment. The Sentinel-2 imagery was downloaded from the Copernicus Open Access Hub.
Before exporting the data, we will create a grid pattern—illustrated in Figure 2—along the coastline to serve as a feature class during export. To do this, we will use the Generate Rectangles Along Lines tool. The required parameters for this tool are as follows:
- Input Line Features: Coastline features (categorized by type)
- Output Feature Class: Desired output feature class name
- Length Along the Line: 1000
- Length Perpendicular to the Line: 1000
- Spatial Sort Method: Upper Left
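To make the idea of a grid along the coastline concrete, the sketch below walks along a polyline and emits one centre point per fixed interval. It is only an illustration of the concept: the function name, the sample coordinates, and the choice to start half an interval from the line start are assumptions, and the Generate Rectangles Along Lines tool may place its rectangles differently.

```python
import math

def centers_along_line(coords, spacing):
    """Return points spaced `spacing` units apart along a polyline.

    `coords` is a list of (x, y) vertices in a projected coordinate system
    (metres assumed). Each returned point is the centre of one rectangle;
    the first centre sits half a spacing from the start of the line.
    """
    centers = []
    next_at = spacing / 2.0   # distance along the line of the next centre
    travelled = 0.0           # length of the segments already walked
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        while seg_len > 0 and next_at <= travelled + seg_len:
            t = (next_at - travelled) / seg_len
            centers.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        travelled += seg_len
    return centers

# Hypothetical 3 km coastline made of two straight segments
coastline = [(0.0, 0.0), (2000.0, 0.0), (2000.0, 1000.0)]
centers = centers_along_line(coastline, spacing=1000.0)
print(centers)
```

With a spacing of 1000, the 3 km example line yields three centres, one per 1000 m stretch of coastline.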

Figure 3 displays the output generated by running the tool on a coastline feature class classified as category 3 (sandy shore).
The next step involves adding a category field to the resulting feature class, which will be used as the class value field during data export. To do this, we created a new field named CATCOA with a data type of Long. Using the Add and Calculate options (illustrated in Figure 4), we populated this field with values corresponding to each feature’s coastline category.

Figure 4. Feature class after adding the category column
We then repeated this process for each coastline category to ensure comprehensive coverage. Once all categories were processed, we exported the dataset using the Labeled Tiles metadata format over a small area containing multiple coastline types. This was done using the Export Training Data For Deep Learning tool, available in both ArcGIS Pro and ArcGIS Image Server.
The Parameters used for this export were as follows:
- Input Raster: Sentinel-2 imagery
- Input Feature Class or Classified Raster: The feature class generated earlier, as shown in Figure 4
- Tile Size X: 256
- Tile Size Y: 256
- Stride X: 128
- Stride Y: 128
- Metadata Format: Labeled Tiles (suitable for training a Feature Classifier model)
The Environments settings used for this export were as follows:
- Cell Size and Processing Extent were configured to ensure efficient export and appropriate spatial resolution.
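As a quick sanity check on these settings, the sketch below converts the tile size and stride from pixels to ground distance. It assumes a cell size of 10 m, the value passed as cellSize in the export script in this notebook; the helper name is hypothetical.

```python
def tile_footprint(tile_px, stride_px, cell_size_m):
    """Return (tile width in metres, stride in metres, overlap fraction)."""
    tile_m = tile_px * cell_size_m
    stride_m = stride_px * cell_size_m
    overlap = 1.0 - stride_px / tile_px
    return tile_m, stride_m, overlap

# 256-pixel tiles, 128-pixel stride, 10 m cells (assumed from the export script)
tile_m, stride_m, overlap = tile_footprint(tile_px=256, stride_px=128, cell_size_m=10)
print(f"Tile footprint: {tile_m} m x {tile_m} m")
print(f"Stride: {stride_m} m ({overlap:.0%} overlap between neighbouring tiles)")
```

Under these assumptions each tile covers 2560 m on a side, and a stride of half the tile size means neighbouring tiles overlap by 50%.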

Figure 5. Export Training Data for Deep Learning tool
Below is the Python code snippet that can be used to run the Export Training Data For Deep Learning tool directly within an ArcGIS Notebook. This automates the export process instead of running the tool interface manually. Either method (the ArcGIS Pro tool dialog or the Python script) is valid for exporting data in a format compatible with ArcGIS deep learning workflows.
import arcpy

with arcpy.EnvManager(extent="MINOF", cellSize=10):
    arcpy.ia.ExportTrainingDataForDeepLearning(
        "Multispectral_MTD_MSIL1C",  # Input Raster (Sentinel-2 imagery)
        r"D:\Coastline category\Data\Category_1",  # Output folder
        "CoastlineL_Clip_category1_Rectangles",  # Input Feature Class
        "TIFF",  # Image format
        256, 256,  # Tile Size X, Tile Size Y
        128, 128,  # Stride X, Stride Y
        "ONLY_TILES_WITH_FEATURES",  # Export tiles with features only
        "Labeled_Tiles",  # Metadata Format
        0,  # Start Index
        "CATCOA",  # Class value field
        0, None, 0,  # Buffer radius, input mask polygons, rotation angle
        "MAP_SPACE",  # Reference system
        "PROCESS_AS_MOSAICKED_IMAGE",  # Image processing mode
        "NO_BLACKEN",  # Do not blacken tiles around features
        "FIXED_SIZE"  # Crop Mode
    )
This script sets up the necessary environment settings, including extent and cell size, and then calls the export tool with all relevant parameters—ensuring the output is correctly formatted for training a feature classification model.
We also organized the exported data into separate folders for each coastline category to showcase the multi-folder training support. This structure allows for better separation and management of category-specific data during training.
Alternatively, if preferred, you can export all data into a single shared folder. In this case, the newly exported tiles from each category will be appended to the existing folder contents, maintaining a unified dataset.
To help you get started quickly, we’ve included a subset of training data exported from each category. This prepackaged data can be used directly to run classification experiments without needing to repeat the export process.
# Fetch the hosted training data
# ---------------------------------
# Get the training data item using its Item ID from ArcGIS Online or Enterprise
training_data = gis.content.get('9251417cb9ab4a059eb538282f82883c')
# Display the item (useful in notebooks to verify it's the correct one)
training_data
# Download the hosted dataset to your local system
# -------------------------------------------------------
# Downloads the training data item to the temp directory
# The file will be saved with its original name stored in training_data.name (e.g., "my_dataset.zip")
filepath = training_data.download(file_name=training_data.name)
# Unzip the downloaded ZIP file to the same directory
# -------------------------------------------------------
# Extract the contents of the ZIP file into the same folder where it was downloaded
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
# Get the path of the unzipped folder
# -------------------------------------------------------
# os.path.splitext(filepath)[0] strips the '.zip' extension, giving the unzipped folder path.
# Joining "*" onto it produces a wildcard pattern that matches everything inside that folder.
output_path = os.path.join(os.path.splitext(filepath)[0], "*")
# List the contents of the unzipped folder
# -------------------------------------------------------
# glob.glob() expands the wildcard pattern (e.g., "unzipped_folder/*") into a list of the files and folders it matches.
output_path = glob.glob(output_path)
# Display the list of files/folders
output_path # output_path is now a list, such as:
['~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_1', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_10', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_2', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_3', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_4', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_6', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_7', '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_8']
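Before training, it can help to check how many tiles each category folder contributes. The sketch below counts image files per folder and sorts the folders numerically, so Category_2 comes before Category_10 (glob returns them in plain string order, as in the listing above). The helper names, the .tif extension filter, and the images sub-folder used in the demonstration are assumptions, not something the export tool is guaranteed to produce.

```python
import re
import tempfile
from pathlib import Path

def natural_key(path):
    """Sort key that orders 'Category_2' before 'Category_10'."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", Path(path).name)]

def count_tiles(folders, extensions=(".tif", ".tiff")):
    """Count image files under each folder, searching sub-folders too."""
    counts = {}
    for folder in sorted(folders, key=natural_key):
        folder = Path(folder)
        counts[folder.name] = sum(
            1 for f in folder.rglob("*") if f.suffix.lower() in extensions
        )
    return counts

# Demonstration on a throwaway directory tree (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    for name, n_tiles in [("Category_10", 1), ("Category_2", 3), ("Category_1", 2)]:
        images = Path(tmp, name, "images")
        images.mkdir(parents=True)
        for i in range(n_tiles):
            (images / f"{i:09d}.tif").touch()
    demo_counts = count_tiles(Path(tmp).iterdir())

print(demo_counts)
```

In the notebook, the same helper could be called as count_tiles(output_path) to spot categories with very few tiles before training.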
Train the model
ArcGIS's arcgis.learn module allows you to build a Feature Classifier model that can classify individual features based on training data. To understand the inner workings and potential applications of this model, refer to the official guide: "How feature classifier works?".
📁 Prepare the data
Prepare the data using ArcGIS Learn's prepare_data function
Here, we will specify the path to our training data and a few hyperparameters.
- path: Directory containing the exported labeled tiles. You can use:
  - A single folder (for example, when all categories were exported to one shared folder), or
  - A list of folders (as here, with one folder per category).
- batch_size: Controls how many images are processed at once during training. You may adjust this value based on your system's GPU capacity. 128 worked for us on a 32GB GPU.
- val_split_pct: Fraction of the data held out for validation (0.2 reserves 20% of the tiles).
Once the data is prepared using prepare_data(), you can proceed to initialize and train the Feature Classifier model.
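To get a feel for what these two hyperparameters imply, the sketch below works out the size of the training and validation sets and the number of batches per epoch. The tile count of 10,000 is hypothetical, and the exact rounding (and whether a final partial batch is kept) is an assumption that may differ from what prepare_data does internally.

```python
import math

def split_summary(n_tiles, val_split_pct, batch_size):
    """Return (training tiles, validation tiles, training batches per epoch)."""
    n_val = int(n_tiles * val_split_pct)
    n_train = n_tiles - n_val
    batches_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, batches_per_epoch

# 10,000 tiles is a hypothetical count used only for illustration
n_train, n_val, n_batches = split_summary(10_000, val_split_pct=0.2, batch_size=128)
print(n_train, n_val, n_batches)
```

Halving batch_size roughly doubles the number of batches per epoch, which is the trade-off to keep in mind when reducing it to fit a smaller GPU.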
# Prepare the data for training the deep learning model
# -------------------------------------------------------
data = prepare_data(path=output_path, batch_size=128, val_split_pct=0.2)
The data variable is now ready to be used with a deep learning model like FeatureClassifier.
🖼️ Visualize Training Data
🔍 To inspect the quality and variety of your training data, you can use the show_batch() method provided by arcgis.learn. This method:
- Randomly selects a batch of labeled image chips (or features).
- Displays them in a grid along with their associated class labels.
- Helps you visually verify that:
  - Images are properly aligned with their labels.
  - There is enough class variation and a reasonable balance across categories.
  - Data quality is suitable for training.
  - The tiles are correctly labeled.
- Is especially useful during early experimentation to catch issues like:
  - Mislabeling
  - Missing classes
  - Poor image quality
  - Imbalanced datasets

The rows parameter specifies how many rows of image chips to display.
# Visualize a sample batch from the training dataset
# -------------------------------------------------------
data.show_batch(rows=5)

🧩 Load the Model Architecture
Now that the data is prepared, you can initialize the Feature Classifier model using the FeatureClassifier class from arcgis.learn. Read more about this model here.
- oversample=True: This helps balance the dataset by oversampling minority classes, which is especially helpful if your training data contains class imbalances.
# Create a FeatureClassifier model object for training
# -------------------------------------------------------
model = FeatureClassifier(data, oversample=True)
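The general idea behind oversampling can be sketched in a few lines: samples from smaller classes are repeated until every class is as large as the biggest one. This is a generic illustration with hypothetical label counts; it is not a description of how arcgis.learn implements the oversample option.

```python
import random
from collections import Counter

def oversample(labels, seed=0):
    """Return sample indices in which every class appears equally often.

    Minority classes are topped up by drawing their indices with replacement
    until each class matches the size of the largest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

# Hypothetical, imbalanced label list: category 8 dominates, category 2 is rare
labels = [8] * 90 + [3] * 40 + [2] * 5
balanced = oversample(labels)
print(Counter(labels[i] for i in balanced))
```

Because the rare tiles are repeated rather than newly collected, oversampling balances the class frequencies but does not add new visual variety, so exporting more tiles for rare categories remains worthwhile.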
📈 Find an Optimal Learning Rate
The learning rate is a critical hyperparameter that affects how quickly and how effectively your model learns. The ArcGIS API for Python provides a built-in learning rate finder to help you choose a value that leads to faster convergence without overshooting. After running the learning rate finder, it will generate a plot showing the learning rate versus loss. You can then select the learning rate corresponding to the point just before the loss starts to increase.
model.lr_find() runs a learning rate range test: it trains the model for a few batches while gradually increasing the learning rate and records the loss. It returns a suggested learning rate (lr) that is:
- High enough to train quickly,
- But not so high that it causes unstable training.
To visualize the plot:
# Automatically find an optimal learning rate
# -------------------------------------------------------
lr = model.lr_find()
# Display the recommended learning rate
lr

0.00019054607179632462
This helps you:
- Visually identify the "sweet spot" before the loss starts to increase.
- Optionally pick a slightly lower LR than the steepest drop for stability.
Once you identify the optimal value from the plot, you'll use it when training the model in the next step.
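For intuition, a learning rate range test can be reduced to two pieces: a schedule of geometrically increasing learning rates, and a rule for picking a value from the recorded losses. The sketch below uses a synthetic loss curve and a "steepest drop" rule; both are assumptions for illustration and not necessarily the heuristic that model.lr_find() applies.

```python
import math

def lr_schedule(start_lr, end_lr, n_steps):
    """Learning rates growing geometrically from start_lr to end_lr."""
    ratio = (end_lr / start_lr) ** (1.0 / (n_steps - 1))
    return [start_lr * ratio ** i for i in range(n_steps)]

def steepest_descent_lr(lrs, losses):
    """Return the learning rate at the start of the interval where the
    loss falls fastest per unit of log10(learning rate)."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        step = math.log10(lrs[i]) - math.log10(lrs[i - 1])
        slope = (losses[i] - losses[i - 1]) / step
        if slope < best_slope:
            best_lr, best_slope = lrs[i - 1], slope
    return best_lr

# Synthetic loss curve (not from this notebook): flat, then falling, then diverging
lrs = lr_schedule(1e-6, 1.0, n_steps=7)
losses = [2.30, 2.29, 2.20, 1.60, 1.20, 1.50, 4.00]
suggested = steepest_descent_lr(lrs, losses)
print(f"Suggested learning rate: {suggested:.0e}")
```

On this synthetic curve the steepest drop starts near 1e-4, well before the loss begins to rise again at the largest learning rates.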
🚀 Fit the Model
Here’s how you can fit the model using the chosen learning rate and number of epochs:
- 20: Number of training epochs (you can increase this later based on performance). For the sake of time, we can start with 20 epochs.
- lr=lr: Learning rate (here, the value suggested by the learning rate finder; adjust it based on your learning rate finder plot).
💡 Tips for Monitoring Training: During training, watch:
- Training loss: Should steadily decrease.
- Validation loss: Should also decrease or stay stable (not increase significantly).
If validation loss increases while training loss keeps dropping, overfitting may be happening. If the model is overfitting, consider:
- Reducing the learning rate,
- Increasing the training data,
- Or applying data augmentation.
We will train the model for a few epochs with the learning rate we have found.
# Train the FeatureClassifier model
# -------------------------------------------------------
# Train the model for 20 epochs using the learning rate found earlier.
model.fit(20, lr=lr)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.075467 | 0.124108 | 0.961510 | 08:20 |
1 | 0.068225 | 0.124054 | 0.962610 | 07:42 |
2 | 0.075965 | 0.119657 | 0.964809 | 07:59 |
3 | 0.069717 | 0.119751 | 0.965176 | 08:28 |
4 | 0.064654 | 0.121075 | 0.965176 | 08:22 |
5 | 0.072091 | 0.118563 | 0.966642 | 07:56 |
6 | 0.057890 | 0.117076 | 0.964809 | 08:16 |
7 | 0.060536 | 0.118210 | 0.963343 | 08:14 |
8 | 0.063993 | 0.117165 | 0.965176 | 08:00 |
9 | 0.065712 | 0.114718 | 0.965909 | 07:16 |
10 | 0.058225 | 0.118122 | 0.965909 | 07:18 |
11 | 0.058652 | 0.112426 | 0.966276 | 07:33 |
12 | 0.058043 | 0.110955 | 0.967375 | 07:40 |
13 | 0.050317 | 0.113226 | 0.966642 | 07:53 |
14 | 0.055455 | 0.111161 | 0.969575 | 07:42 |
15 | 0.056420 | 0.111660 | 0.967375 | 07:55 |
16 | 0.055499 | 0.112515 | 0.967375 | 07:40 |
17 | 0.054759 | 0.109142 | 0.969208 | 07:25 |
18 | 0.052774 | 0.109370 | 0.970308 | 07:20 |
19 | 0.056893 | 0.109999 | 0.968842 | 08:22 |
📊 Model Performance After Training
Over these 20 epochs, training loss fell from about 0.075 to 0.057 and validation loss from about 0.124 to 0.110, while validation accuracy rose from roughly 96.2% to 96.9%. The gains are gradual, but both losses trend downward, which indicates the model is learning to classify the different coastline categories. While these are promising early results, further fine-tuning or more training data could improve performance.
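The training log can also be inspected programmatically. The sketch below copies the valid_loss and accuracy columns from the table above and finds the epochs with the lowest validation loss and the highest accuracy; the variable names are illustrative.

```python
# Validation loss and accuracy per epoch, copied from the training log above
valid_loss = [0.124108, 0.124054, 0.119657, 0.119751, 0.121075,
              0.118563, 0.117076, 0.118210, 0.117165, 0.114718,
              0.118122, 0.112426, 0.110955, 0.113226, 0.111161,
              0.111660, 0.112515, 0.109142, 0.109370, 0.109999]
accuracy = [0.961510, 0.962610, 0.964809, 0.965176, 0.965176,
            0.966642, 0.964809, 0.963343, 0.965176, 0.965909,
            0.965909, 0.966276, 0.967375, 0.966642, 0.969575,
            0.967375, 0.967375, 0.969208, 0.970308, 0.968842]

# Index of the epoch with the lowest validation loss / highest accuracy
best_loss_epoch = min(range(len(valid_loss)), key=valid_loss.__getitem__)
best_acc_epoch = max(range(len(accuracy)), key=accuracy.__getitem__)
# How many epochs have passed without beating the best validation loss
epochs_since_best = len(valid_loss) - 1 - best_loss_epoch

print(f"Lowest validation loss at epoch {best_loss_epoch}: {valid_loss[best_loss_epoch]}")
print(f"Highest accuracy at epoch {best_acc_epoch}: {accuracy[best_acc_epoch]}")
print(f"Epochs since best validation loss: {epochs_since_best}")
```

The best validation loss occurs only a couple of epochs before the end of the run, so there is no sign here of the sustained rise in validation loss that would suggest overfitting.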
📉 Plot the Loss Curve
To monitor the model’s learning progress over epochs, you can plot the training and validation loss curves. This helps visualize how well the model is learning and whether it's overfitting or underfitting. model.plot_losses() generates a line plot showing:
- Training loss over each epoch
- Validation loss over each epoch
🧠 What to Look For:
- ✅ Smooth downward trend in both losses → Good training progress.
- ⚠️ Validation loss flattening or rising while training loss decreases → Possible overfitting.
- ❌ Erratic loss behavior → Learning rate may be too high or data may be noisy.
Use this method:
# Plot training and validation loss curves
# -------------------------------------------------------
model.plot_losses()

🖼️ Visualize Results on the Validation Set
To better understand how well the model is performing, it's good practice to visually compare the model’s predictions against the ground truth labels. This allows you to qualitatively assess whether the model is making correct classifications.
You can use the following code to preview the results of the model within the notebook:
- rows: Number of rows of samples to display (4 in the code below). You can increase this to view more samples.
This will display:
- Input tile
- Ground truth class
- Predicted class by the model
🔍 This visual check can reveal patterns in model errors, such as consistent confusion between similar categories (e.g., rocky vs. sandy shores), helping guide further improvements.
# Visualize model predictions vs actual labels
# -------------------------------------------------------
model.show_results(rows=4)

Accuracy Assessment
After training, it’s important to evaluate how accurately the model is predicting each class. One of the best tools for this is the confusion matrix, which shows how often predictions match the true labels—and where the model is making mistakes.
📉 Plot the Confusion Matrix
Use the plot_confusion_matrix() method from arcgis.learn:
# Plot the confusion matrix for model evaluation
# -------------------------------------------------------
model.plot_confusion_matrix()

🧠 How to Interpret the Confusion Matrix
- Each row = Actual class (ground truth label)
- Each column = Predicted class (model's prediction)
- Diagonal cells = Correct predictions
- Off-diagonal cells = Misclassifications (where the model predicted the wrong class)
- 🔍 Strong diagonal dominance suggests the model is performing well overall.
- ⚠️ Consistent misclassification in specific rows/columns may point to:
- Class imbalance
- Visually similar categories
- Too few training/validation samples for those categories
- The goal is to have high numbers on the diagonal and low or zero values elsewhere.
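The row and column convention described above can be reproduced with a small helper that tallies (actual, predicted) pairs. The labels below are hypothetical and the function is a plain-Python sketch, independent of how plot_confusion_matrix() builds its figure.

```python
def confusion_matrix(actual, predicted, classes):
    """Return a nested list: rows are actual classes, columns are predicted."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical labels for six tiles across three coastline categories
classes = [1, 3, 7]
actual = [1, 1, 3, 3, 7, 7]
predicted = [1, 3, 3, 3, 7, 3]

matrix = confusion_matrix(actual, predicted, classes)
for cls, row in zip(classes, matrix):
    print(cls, row)

# Correct predictions sit on the diagonal
correct = sum(matrix[i][i] for i in range(len(classes)))
print(f"{correct} of {len(actual)} tiles classified correctly")
```

Reading along the row for category 1, one tile was predicted correctly and one was predicted as category 3, which is exactly the kind of off-diagonal entry to look for.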
📊 Confusion Matrix Insights
The confusion matrix shows strong performance for most categories, with a majority of predictions falling along the diagonal (i.e., correct classifications).
✅ Category-wise Breakdown
Category | Total Tiles (Validation Set) | Correctly Classified | Misclassified As |
---|---|---|---|
1 | 469 | 450 | 13 as 3, 4 as 4, 1 as 8 |
2 | 8 | 7 | 1 as 1 |
3 | 369 | 333 | Remainder spread across 1, 4, 6, 7, and 8 |
4 | 67 | 63 | 4 as 3 |
6 | 107 | 104 | 1 as 4, 2 as 3 |
7 | 756 | 737 | 2 as 3; 1 each as 1, 2, and 4; 14 as 8 |
8 | 910 | 890 | 6 as 3, 12 as 7, 2 as 1 |
10 | 42 | 42 | None |
⚠️ Category 2 Observations
- There are 8 tiles for Category 2 in the validation set.
- 7 were correctly classified, while 1 was misclassified as Category 1.
- With such a small sample size, the model’s accuracy for this class is less reliable and could benefit from more examples.
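The per-category figures in the table translate directly into per-class accuracy (recall). The sketch below uses the totals and correct counts from the table; the 50-tile threshold for flagging small classes is an arbitrary choice made for illustration.

```python
# (total validation tiles, correctly classified) per category, from the table above
validation = {
    1: (469, 450), 2: (8, 7), 3: (369, 333), 4: (67, 63),
    6: (107, 104), 7: (756, 737), 8: (910, 890), 10: (42, 42),
}

# Recall per category: share of that category's tiles predicted correctly
recall = {cat: correct / total for cat, (total, correct) in validation.items()}
total_tiles = sum(total for total, _ in validation.values())
total_correct = sum(correct for _, correct in validation.values())
overall_accuracy = total_correct / total_tiles

for cat, value in recall.items():
    flag = "  <- few samples" if validation[cat][0] < 50 else ""
    print(f"Category {cat:>2}: {value:.3f}{flag}")
print(f"Overall accuracy: {overall_accuracy:.4f}")
```

Category 3 has the lowest recall among the well-populated classes (about 0.90), while the 0.875 for Category 2 rests on only eight tiles, so a single tile changes it by 12.5 percentage points.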
🔁 What You Can Do
To improve classification performance when you observe significant misclassifications or inconsistent accuracy across categories, especially for underrepresented categories like Category 2:
- 📌 Check for class imbalance.
- ➕ Export more labeled tiles for underrepresented classes.
- 🔄 Increase the validation split (val_split_pct) in prepare_data(), for example:
  data = prepare_data(path=output_path, batch_size=128, val_split_pct=0.3)
- 🧪 Consider data augmentation or oversampling.
- 👀 Examine whether the misclassified categories have visually similar features.
💡 Note: If you modify or expand your dataset, be sure to retrain the model so it can learn from the updated distribution.
💾 Save the Model
Now, we will save the trained model as a Deep Learning Package (.dlpk).
This is the standard format used to deploy deep learning models across the ArcGIS platform.
We use the .save() method to export the model.
By default, the .dlpk file will be saved to the models sub-folder within the training data directory.
# Save the trained model for future use
# -------------------------------------------------------
model.save('model-20e_8classes')
Computing model metrics...
WindowsPath('D:/Work/Repos/Arcgis_python_API/arcgis-python-api/samples/04_gis_analysts_data_scientists/models/model-20e_8classes')
You can also specify a custom path if needed. For example:
model.save(r"D:\Exported_Models\coastline_classifier.dlpk")
💡 Once saved, this .dlpk file can be:
- Shared with colleagues,
- Uploaded to ArcGIS Online or Enterprise,
- Or used for inferencing on new imagery in ArcGIS Pro or in this sample notebook.
Model Inference
To perform inferencing in ArcGIS Pro, we first need to create a set of input features along the coastlines of a new, unseen area—i.e., an area not used during model training.
🧱 Step 1: Prepare Input Features
Use the Generate Rectangles Along Lines tool to create rectangular features along the coastline (similar to the training process; see Figure 2). These rectangles will act as the input objects for classification.
🤖 Step 2: Run the Inference Tool
Use the Classify Objects Using Deep Learning tool in ArcGIS Pro.
📥 Required Parameters:
- Input Raster: The Sentinel-2 imagery of the new area.
- Input Features: Output from the Generate Rectangles Along Lines tool (from Step 1).
- Output Classified Objects Feature Class: The output feature class containing the classified rectangles.
- Model Definition: The .dlpk model saved in the previous step.
- Class Label Field: The field name (e.g., ClassLabel) where the model will write the predicted class.
⚙️ Environments to Set:
- Cell Size: Match the resolution of the Sentinel-2 imagery (e.g., 10 meters).
- Processing Extent: Limit the analysis to your area of interest.
- Processor Type: Choose between CPU or GPU based on your hardware.
💡After running the tool, each input rectangle will be assigned a coastline category label as predicted by the model—enabling spatial analysis and visualization directly in ArcGIS Pro.

Figure 6. Classify Objects Using Deep Learning tool
Results
We selected a sandy shoreline (Category 3) that was unseen by the model during training. Using the Generate Rectangles Along Lines tool, we created rectangular features along this shoreline and ran inference using our trained model.

📊 Inference Results Summary (Figure 9)
In Figure 9, you can observe the classification output.
- ✅ Most rectangles were correctly classified as Category 3 (sandy shore).
- ❌ However, two rectangles were misclassified as Category 7 (mangrove).
Predicted | Count |
---|---|
Category 3 (Correct) | ✅ 6 tiles |
Category 7 (Misclassified) | ❌ 2 tiles |
This indicates the model is performing well overall, though there’s still room for improvement—particularly in distinguishing between visually similar shoreline types.
📌 Additional training data or fine-tuning could further improve model accuracy.
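Once the predicted labels have been read from the output feature class, a tally like the one in the summary table takes only a few lines. The list below is reconstructed from the counts reported above (six rectangles as Category 3, two as Category 7); how the labels are read from the feature class is left out, and the variable names are illustrative.

```python
from collections import Counter

# Predicted category for each of the eight rectangles, per the summary table
predicted = [3] * 6 + [7] * 2
expected = 3  # the test shoreline is known to be sandy shore (Category 3)

counts = Counter(predicted)
share_correct = counts[expected] / len(predicted)

print(counts)
print(f"{share_correct:.0%} of rectangles classified as Category {expected}")
```

With only eight rectangles, this 75% figure is a rough indication rather than a stable accuracy estimate; a longer stretch of unseen coastline would give a more reliable picture.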
Conclusion
In this notebook, we demonstrated a complete workflow for coastline classification using the FeatureClassifier model from the ArcGIS API for Python.
We covered:
- ✅ Preparing training data using NOAA ENC coastline categories and Sentinel-2 imagery
- ✅ Generating labeled training tiles using the Export Training Data for Deep Learning tool
- ✅ Training a deep learning model to classify coastal segments into predefined categories
- ✅ Evaluating model performance using loss curves, confusion matrix, and visual validation
- ✅ Performing inference on unseen areas using the Classify Objects Using Deep Learning tool in ArcGIS Pro
The model showed promising results in correctly identifying most coastline categories, especially when sufficient training samples were available. Some minor misclassifications highlight areas where additional data or fine-tuning could improve accuracy.
🌊 Real-World Impact
By automating coastline classification:
- 🌐 Coastal monitoring becomes more scalable
- 📉 Manual interpretation effort is reduced
- 🛰️ Satellite imagery can be leveraged for large-scale environmental analysis
🔁 Machine learning is an iterative process—as the dataset improves, so will the model.
Thanks for following along!