Coastline Classification using Feature Classifier

Category-wise shoreline classification using Sentinel-2 imagery

🔬 Data Science

🥠 Deep Learning and Object Classification

Introduction

We have already explored how to extract coastlines using Landsat-8 multispectral imagery and the band ratio technique. The next step is to classify these coastlines into multiple categories based on their characteristics. To achieve this, we will train a deep learning model that classifies each coastline segment into one of the categories shown below:

Figure 1. Coastline categories

🎯 Objective

The goal of this notebook is to demonstrate how to use deep learning for automated coastline classification using Sentinel-2 satellite imagery and the FeatureClassifier model from the ArcGIS API for Python. The workflow classifies coastal segments into meaningful shoreline categories—such as sandy shores, rocky cliffs, and mangroves—based on their visual characteristics.

📚 What You'll Learn

By the end of this notebook, you will understand how to:
✅ Prepare and label geospatial training data for shoreline classification
✅ Use Sentinel-2 imagery in conjunction with NOAA ENC (Electronic Navigational Charts) datasets
✅ Train a deep learning model to classify coastline features
✅ Evaluate model performance using both quantitative metrics and visual inspection
✅ Apply the trained model to new/unseen coastal areas for inference

🧰 Tools & Technologies

ArcGIS API for Python
ArcGIS Pro
FeatureClassifier deep learning model
Python libraries: arcgis, glob, etc.

🗂️ Dataset Description

Imagery: Sentinel-2 optical satellite imagery
Labels: Coastal segment categories derived from NOAA ENC datasets
Format: Labeled tiles and metadata generated with the Export Training Data for Deep Learning tool.
Classes: Multiple shoreline types such as sandy shore, rocky shore, mangrove, etc.

🔁 Workflow Overview

Data Preparation
- Create a grid pattern using the Generate Rectangles Along Lines tool
- Export labeled tiles from Sentinel-2 imagery using the Export Training Data for Deep Learning tool.
Model Training
- Train a FeatureClassifier model on the labeled tiles using arcgis.learn.
- Monitor training using loss curves and metrics
Model Evaluation
- Assess performance using a confusion matrix and visual validation
- Identify strengths and areas for improvement
Inference
- Apply the trained model to new areas
- Generate classification maps and analyze results

🚀 Let’s get started by importing required modules and preparing our data!

Necessary imports

# Standard Python Libraries
# ---------------------------------------------

import os                                                 # For interacting with the operating system (e.g., path operations)
import glob                                               # For searching files using wildcard patterns (e.g., "*.zip", "*.tif")
import zipfile                                            # For working with .zip files (extracting or creating archives)
from pathlib import Path                                  # For modern and safe file path handling

# ArcGIS Deep Learning Libraries
# ---------------------------------------------

from arcgis.gis import GIS                                 # For connecting to ArcGIS Online or Enterprise portal
from arcgis.learn import prepare_data, FeatureClassifier   # prepare_data: Prepares training data for deep learning workflows, FeatureClassifier: Used to train and infer deep learning models for feature classification

Connect to your GIS

GIS("home") is typically used when you run the script inside ArcGIS Pro or in ArcGIS Notebooks, where you are already signed in.

It uses your active portal and authenticated session, so there is no need to enter a username or password.

# Connect to your GIS (ArcGIS Online or Enterprise)
# -------------------------------------------------------

gis = GIS("home")  # Connects using credentials from your active ArcGIS Pro session or ArcGIS Notebook environment

Connect using ArcGIS Online credentials (manual login)

Replace "url" with your portal URL (e.g., "https://www.arcgis.com" for ArcGIS Online), and replace "your_username" and "your_password" with your ArcGIS credentials.

gis = GIS("url", "your_username", "your_password")

Export training data

Using ArcGIS Maritime, we imported NOAA's Electronic Navigational Charts (ENCs). The maritime data in these charts include a coastline feature class whose attributes record the category of each coastline segment. The Sentinel-2 imagery was downloaded from the Copernicus Open Access Hub.

Before exporting the data, we will create a grid pattern (illustrated in Figure 2) along the coastline to serve as the input feature class for the export. To do this, we will use the Generate Rectangles Along Lines tool with the following parameters:

Input Line Features: Coastline features (categorized by type)
Output Feature Class: Desired output feature class name
Length Along the Line: 1000
Length Perpendicular to the Line: 1000
Spatial Sort Method: Upper Left
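To make the geometry of this step concrete, here is a minimal pure-Python sketch of "rectangles along a line" for a single straight segment. It rests on simplifying assumptions and is not the tool's implementation. The function name rectangles_along_segment is hypothetical. It handles only one straight segment in a projected coordinate system (units assumed to be meters) and ignores the Spatial Sort Method. It also emits a shorter final rectangle when the segment length is not a multiple of the along-line length, whereas the real tool may treat curved lines and remainders differently.

```python
import math


def rectangles_along_segment(start, end, length_along=1000.0, length_perp=1000.0):
    """Return the four corners of each rectangle tiling a straight line segment.

    Hypothetical helper for illustration only. Each rectangle is centered on the
    line, `length_along` long in the line direction and `length_perp` wide across
    it (units assumed to be meters in a projected coordinate system). If the
    segment length is not a multiple of `length_along`, the last rectangle is
    shorter; the real tool may handle this remainder differently.
    """
    (x0, y0), (x1, y1) = start, end
    seg_len = math.hypot(x1 - x0, y1 - y0)
    if seg_len == 0:
        return []
    ux, uy = (x1 - x0) / seg_len, (y1 - y0) / seg_len  # unit vector along the line
    px, py = -uy, ux                                   # unit vector perpendicular to the line
    half_w = length_perp / 2.0
    rects = []
    d = 0.0
    while d < seg_len:
        d_end = min(d + length_along, seg_len)
        # Points on the line where this rectangle starts and ends
        ax, ay = x0 + ux * d, y0 + uy * d
        bx, by = x0 + ux * d_end, y0 + uy * d_end
        # Offset both points to either side of the line by half the width
        rects.append([
            (ax + px * half_w, ay + py * half_w),
            (bx + px * half_w, by + py * half_w),
            (bx - px * half_w, by - py * half_w),
            (ax - px * half_w, ay - py * half_w),
        ])
        d = d_end
    return rects


rects = rectangles_along_segment((0.0, 0.0), (2500.0, 0.0))
print(len(rects))  # 3: two full 1000 m rectangles and one 500 m remainder
```

With the 1000 × 1000 settings above, a 2.5 km straight stretch of coastline yields two full squares and one shorter rectangle, each straddling the line.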

Figure 3 displays the output generated by running the tool on a coastline feature class classified as category 3 (sandy shore).

Next, we add a category field to the resulting feature class. This field will serve as the class value field during the export. We created a new field named CATCOA with the Long data type and, using the Add and Calculate options illustrated in Figure 4, populated it with the coastline category of each feature.

Figure 4. Feature class after adding the category column

We then repeated this process for each coastline category. Once all categories were processed, we exported the training data in the Labeled Tiles metadata format over a small area containing multiple coastline types. For this we used the Export Training Data For Deep Learning tool, which is available in both ArcGIS Pro and ArcGIS Image Server.

The parameters used for this export were as follows:

Input Raster: Sentinel-2 imagery
Input Feature Class or Classified Raster: The feature class generated earlier, as shown in Figure 4
Tile Size X: 256
Tile Size Y: 256
Stride X: 128
Stride Y: 128
Metadata Format: Labeled Tiles (suitable for training a Feature Classifier model)

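The environment settings used for this export were as follows:

Cell Size: 10 (meters), matching the 10 m resolution of Sentinel-2's visible and near-infrared bands
Processing Extent: Intersection of Inputs (MINOF in Python), so that tiles are exported only where the imagery and the feature class overlap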

Figure 5. Export Training Data for Deep Learning tool
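The tile size, the stride, and the 10 m cell size together determine each chip's ground footprint and the overlap between neighboring chips. The sketch below works through that arithmetic. tile_origins and tile_footprint_m are hypothetical helpers. The sketch counts only full tiles along one raster axis, so it does not model how the export tool handles partial tiles at raster edges, and it does not apply the ONLY_TILES_WITH_FEATURES filter.

```python
def tile_origins(n_pixels, tile=256, stride=128):
    """Pixel offsets of the full tiles that fit along one raster axis.

    Hypothetical helper: counts only complete tiles and ignores edge handling
    and feature-based filtering done by the export tool.
    """
    if n_pixels < tile:
        return []
    return list(range(0, n_pixels - tile + 1, stride))


def tile_footprint_m(tile=256, stride=128, cell_size_m=10.0):
    """Return (tile width on the ground, stride on the ground, overlap fraction)."""
    return tile * cell_size_m, stride * cell_size_m, 1 - stride / tile


print(tile_origins(1000))   # [0, 128, 256, 384, 512, 640]
print(tile_footprint_m())   # (2560.0, 1280.0, 0.5)
```

With these settings, each 256 × 256 chip covers 2,560 m × 2,560 m on the ground, and neighboring chips overlap by 50%.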

Below is a Python snippet that runs the Export Training Data For Deep Learning tool directly within an ArcGIS Notebook. It lets you automate the export instead of using the tool dialog. Either method (the tool in ArcGIS Pro or the Python script) produces data in a format compatible with ArcGIS deep learning workflows.

import arcpy  # Requires ArcPy (for example, the Python environment that ships with ArcGIS Pro)

# Restrict processing to the overlap of the inputs (MINOF) and export at a 10 m cell size
with arcpy.EnvManager(extent="MINOF", cellSize=10):
    arcpy.ia.ExportTrainingDataForDeepLearning(
        "Multispectral_MTD_MSIL1C",                         # Input Raster (Sentinel-2 imagery)
        r"D:\Coastline category\Data\Category_1",           # Output folder
        "CoastlineL_Clip_category1_Rectangles",             # Input Feature Class
        "TIFF",                                             # Image format
        256, 256,                                           # Tile Size X, Tile Size Y
        128, 128,                                           # Stride X, Stride Y
        "ONLY_TILES_WITH_FEATURES",                         # Export tiles with features only
        "Labeled_Tiles",                                    # Metadata Format
        0,                                                  # Start Index
        "CATCOA",                                           # Class value field
        0, None, 0,                                         # Buffer radius, mask polygons, rotation angle
        "MAP_SPACE",                                        # Reference system
        "PROCESS_AS_MOSAICKED_IMAGE",                       # Image processing mode
        "NO_BLACKEN",                                       # Do not blacken tiles around features
        "FIXED_SIZE"                                        # Crop Mode
    )

This script sets up the necessary environment settings, including extent and cell size, and then calls the export tool with all relevant parameters—ensuring the output is correctly formatted for training a feature classification model.

We also organized the exported data into separate folders, one per coastline category, to demonstrate that training data can be read from multiple folders. This structure also makes category-specific data easier to manage during training.

Alternatively, if preferred, you can export all data into a single shared folder. In this case, the newly exported tiles from each category will be appended to the existing folder contents, maintaining a unified dataset.

To help you get started quickly, we’ve included a subset of training data exported from each category. This prepackaged data can be used directly to run classification experiments without needing to repeat the export process.

# Fetch the hosted training data 
# ---------------------------------

# Get the training data item using its Item ID from ArcGIS Online or Enterprise
training_data = gis.content.get('9251417cb9ab4a059eb538282f82883c')

# Display the item (useful in notebooks to verify it's the correct one)
training_data

coastline_classification_using_feature_classifier
Training Data for Coastline Classification using Feature Classifier

Image Collection by api_data_owner
Last Modified: August 13, 2025

# Download the hosted dataset to your local system
# -------------------------------------------------------

# Downloads the training data item to the temp directory
# The file will be saved with its original name stored in training_data.name (e.g., "my_dataset.zip")

filepath = training_data.download(file_name=training_data.name)

# Unzip the downloaded ZIP file to the same directory
# -------------------------------------------------------

# Extract the contents of the ZIP file into the same folder where it was downloaded

with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)

# Get the path of the unzipped folder
# -------------------------------------------------------

# os.path.splitext(filepath)[0] strips the ".zip" extension, giving the folder the archive was extracted to
# (this assumes the archive contains a top-level folder with the same name).
# Appending "*" turns the folder path into a wildcard pattern that glob can expand to its contents.

output_path = os.path.join(os.path.splitext(filepath)[0], "*")

# List the contents of the unzipped folder
# -------------------------------------------------------

# Expand the wildcard into a list of all files/folders inside the unzipped directory
output_path = glob.glob(output_path)

# Display the list (one folder per coastline category)
output_path

['~\\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_1',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_10',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_2',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_3',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_4',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_6',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_7',
 '~\AppData\\Local\\Temp\\coastline_classification_using_feature_classifier\\Category_8']
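glob returns whatever is in the folder, in an order that depends on the file system. In the list above, Category_10 sorts before Category_2. If you want to guard against stray files and get a predictable order, a small helper like the one below can help. This is a sketch: list_category_folders and natural_key are hypothetical names, and we don't assume that prepare_data depends on folder order.

```python
import glob
import os
import re


def natural_key(path):
    """Sort key that orders 'Category_2' before 'Category_10'."""
    name = os.path.basename(os.path.normpath(path))
    # Split into text and digit runs, e.g. "Category_10" -> ["category_", 10, ""]
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


def list_category_folders(root, pattern="Category_*"):
    """Return sub-directories of `root` matching `pattern`, in natural order."""
    matches = glob.glob(os.path.join(root, pattern))
    # Keep directories only, so stray files matching the pattern are ignored
    return sorted((p for p in matches if os.path.isdir(p)), key=natural_key)


# Possible use with the variables from the cells above (hypothetical replacement for the glob call):
# output_path = list_category_folders(os.path.splitext(filepath)[0])
```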

Train the model

ArcGIS's arcgis.learn module allows you to build a Feature Classifier model that can classify individual features based on training data. To understand the inner workings and potential applications of this model, refer to the official guide: "How feature classifier works?".

📁 Prepare the data

Prepare the data using ArcGIS Learn's prepare_data function

Here, we will specify the path to our training data and a few hyperparameters.

path: Directory containing the exported labeled tiles. You can use:
- A single folder (for one category), or
- A list of folders (for multi-category training).
batch_size: Controls how many images are processed at once during training. You may adjust this value based on your system's GPU memory; a batch size of 128 worked for us on a GPU with 32 GB of memory.
val_split_pct: Fraction of the data held out for validation (0.2 means 20%).
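Because val_split_pct is a fraction, 0.2 sends roughly one tile in five to the validation set. The sketch below shows what a per-class (stratified) split at that fraction looks like. This matters for small classes, which may end up with only a handful of validation tiles. stratified_split is a hypothetical helper, and we are not assuming that prepare_data stratifies by class, so treat this only as an illustration of the concept.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices so each class contributes ~val_fraction to validation.

    Hypothetical helper for illustration; returns (train_indices, val_indices).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, val_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_fraction)  # per-class validation count
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)
```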

Once the data is prepared using prepare_data(), you can proceed to initialize and train the Feature Classifier model.

# Prepare the data for training the deep learning model
# -------------------------------------------------------

data = prepare_data(path=output_path, batch_size=128, val_split_pct=0.2)

The data variable is now ready to be used with a deep learning model like FeatureClassifier.

🖼️ Visualize Training Data

🔍 To inspect the quality and variety of your training data, you can use the show_batch() method provided by arcgis.learn. This method:

Randomly selects a batch of labeled image chips.
Displays them in a grid along with their associated class labels.
Helps you visually verify that:
- Tiles are correctly labeled and aligned with their labels.
- There is enough variation within classes, and the categories are reasonably balanced.
- Image quality is suitable for training.
Is especially useful during early experimentation to catch issues such as:
- Mislabeling
- Missing classes
- Poor image quality
- Imbalanced datasets
rows: The number of rows of image chips to display.

# Visualize a sample batch from the training dataset
# -------------------------------------------------------

data.show_batch(rows=5)

🧩 Load the Model Architecture

Now that the data is prepared, you can initialize the Feature Classifier model using the FeatureClassifier class from arcgis.learn. Read more about this model here.

oversample=True: This helps balance the dataset by oversampling minority classes, which is especially helpful if your training data contains class imbalances.

# Create a FeatureClassifier model object for training
# -------------------------------------------------------

model = FeatureClassifier(data, oversample=True)
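For intuition about what oversampling does, here is a generic sketch of random oversampling. Minority-class samples are drawn with replacement until every class matches the size of the largest one. This is an illustration under our own assumptions (oversample_indices is a hypothetical helper), not a description of how FeatureClassifier implements oversample=True internally.

```python
import random
from collections import Counter


def oversample_indices(labels, seed=0):
    """Resample indices so every class appears as often as the largest class.

    Majority-class samples are kept once; minority-class samples are topped up
    by drawing with replacement. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idxs in by_class.values():
        out.extend(idxs)                                       # keep every original sample
        out.extend(rng.choices(idxs, k=target - len(idxs)))    # duplicates for minority classes
    rng.shuffle(out)
    return out


labels = ["sandy"] * 6 + ["mangrove"] * 2
balanced = oversample_indices(labels)
print(Counter(labels[i] for i in balanced))
```

Duplicating minority samples balances what the model sees per epoch, but it adds no new information, so exporting more real tiles for rare classes remains the stronger fix.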

📈 Find an Optimal Learning Rate

The learning rate is a critical hyperparameter that affects how quickly and how stably your model learns. The ArcGIS API for Python provides a built-in learning rate finder to help you choose a value that leads to fast convergence without overshooting. model.lr_find() runs a learning rate range test: it trains the model for a few batches while gradually increasing the learning rate, records the loss, and plots the loss against the learning rate. You can then select a learning rate just before the loss starts to increase. The method also returns a suggested learning rate (lr) that is:

High enough to train quickly,
But not so high that it causes unstable training.

To visualize the plot:

# Automatically find an optimal learning rate
# -------------------------------------------------------

lr = model.lr_find()

# Display the recommended learning rate
lr

0.00019054607179632462

This helps you:

Visually identify the "sweet spot" before the loss starts to increase.
Optionally pick a slightly lower LR than the steepest drop for stability.

Once you identify the optimal value from the plot, you'll use it when training the model in the next step.
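The sketch below illustrates the idea behind a learning rate range test in pure Python. Learning rates are spaced exponentially, and one common heuristic picks the rate at which the loss falls fastest per decade of learning rate. The default range (1e-7 to 10 over 100 steps) and the steepest-descent heuristic are assumptions for illustration. They are not necessarily what model.lr_find() uses to compute its suggestion, and the losses in the test are toy values.

```python
import math


def lr_schedule(start_lr=1e-7, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates, as used in an LR range test."""
    if num_it < 2:
        raise ValueError("num_it must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


def steepest_descent_lr(lrs, losses):
    """Return the LR at which loss falls fastest with respect to log10(LR)."""
    best_i, best_slope = None, math.inf
    for i in range(1, len(lrs)):
        # Change in loss per decade of learning rate between consecutive steps
        slope = (losses[i] - losses[i - 1]) / (math.log10(lrs[i]) - math.log10(lrs[i - 1]))
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]
```

In line with the advice above, you might then pick a value slightly below the steepest-descent point for extra stability.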

🚀 Fit the Model

Here’s how you can fit the model using the chosen learning rate and number of epochs:

20: Number of training epochs. To save time, we start with 20; you can increase this later based on performance.
lr=lr: Learning rate; here, the value suggested by the learning rate finder (adjust it based on the plot if needed).

💡 Tips for Monitoring Training: During training, watch:

Training loss: Should steadily decrease.
Validation loss: Should also decrease or stay stable (not increase significantly).

If validation loss increases while training loss keeps dropping, the model may be overfitting. In that case, consider:

Reducing the learning rate,
Increasing the training data,
Or applying data augmentation.
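The overfitting rule of thumb above can be expressed as a simple check over per-epoch losses. In this sketch, an epoch is flagged when validation loss has risen for several consecutive epochs while training loss kept falling over the same window. overfitting_epochs and the patience of 3 epochs are hypothetical choices, and the loss values in the test are toy numbers.

```python
def overfitting_epochs(train_loss, valid_loss, patience=3):
    """Return epochs at which validation loss has risen for `patience`
    consecutive epochs while training loss fell over the same window."""
    flagged = []
    for e in range(patience, len(valid_loss)):
        window = range(e - patience + 1, e + 1)
        valid_rising = all(valid_loss[i] > valid_loss[i - 1] for i in window)
        train_falling = all(train_loss[i] < train_loss[i - 1] for i in window)
        if valid_rising and train_falling:
            flagged.append(e)
    return flagged
```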

We will now train the model for 20 epochs with the learning rate we found.

# Train the FeatureClassifier model
# -------------------------------------------------------

# Train the model for 20 epochs using the learning rate found earlier.

model.fit(20, lr=lr)

epoch	train_loss	valid_loss	accuracy	time
0	0.075467	0.124108	0.961510	08:20
1	0.068225	0.124054	0.962610	07:42
2	0.075965	0.119657	0.964809	07:59
3	0.069717	0.119751	0.965176	08:28
4	0.064654	0.121075	0.965176	08:22
5	0.072091	0.118563	0.966642	07:56
6	0.057890	0.117076	0.964809	08:16
7	0.060536	0.118210	0.963343	08:14
8	0.063993	0.117165	0.965176	08:00
9	0.065712	0.114718	0.965909	07:16
10	0.058225	0.118122	0.965909	07:18
11	0.058652	0.112426	0.966276	07:33
12	0.058043	0.110955	0.967375	07:40
13	0.050317	0.113226	0.966642	07:53
14	0.055455	0.111161	0.969575	07:42
15	0.056420	0.111660	0.967375	07:55
16	0.055499	0.112515	0.967375	07:40
17	0.054759	0.109142	0.969208	07:25
18	0.052774	0.109370	0.970308	07:20
19	0.056893	0.109999	0.968842	08:22

📊 Model Performance After Training

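After 20 epochs, training loss fell from about 0.075 to 0.057 and validation loss from about 0.124 to 0.110, while validation accuracy rose from 96.2% to 96.9% (peaking at 97.0% in epoch 18). The improvement is modest but steady, and validation loss shows no sustained rise, so the model is still learning to separate the coastline categories without clear signs of overfitting. These are promising results; further fine-tuning or more training data could improve performance further.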
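To check statements like these against the log rather than by eye, you can parse the printed table. The values below are copied from the training output above (epoch, train_loss, valid_loss, accuracy; the time column is omitted). The parsing code is a minimal sketch and not part of arcgis.learn.

```python
LOG = """\
0 0.075467 0.124108 0.961510
1 0.068225 0.124054 0.962610
2 0.075965 0.119657 0.964809
3 0.069717 0.119751 0.965176
4 0.064654 0.121075 0.965176
5 0.072091 0.118563 0.966642
6 0.057890 0.117076 0.964809
7 0.060536 0.118210 0.963343
8 0.063993 0.117165 0.965176
9 0.065712 0.114718 0.965909
10 0.058225 0.118122 0.965909
11 0.058652 0.112426 0.966276
12 0.058043 0.110955 0.967375
13 0.050317 0.113226 0.966642
14 0.055455 0.111161 0.969575
15 0.056420 0.111660 0.967375
16 0.055499 0.112515 0.967375
17 0.054759 0.109142 0.969208
18 0.052774 0.109370 0.970308
19 0.056893 0.109999 0.968842
"""


def parse_log(text):
    """Parse whitespace-separated rows into (epoch, train_loss, valid_loss, accuracy)."""
    rows = []
    for line in text.strip().splitlines():
        epoch, train, valid, acc = line.split()[:4]
        rows.append((int(epoch), float(train), float(valid), float(acc)))
    return rows


rows = parse_log(LOG)
best = min(rows, key=lambda r: r[2])  # epoch with the lowest validation loss
print(f"best valid_loss {best[2]:.6f} at epoch {best[0]}")
# best valid_loss 0.109142 at epoch 17
print(f"valid_loss {rows[0][2]:.3f} -> {rows[-1][2]:.3f}, accuracy {rows[0][3]:.3f} -> {rows[-1][3]:.3f}")
# valid_loss 0.124 -> 0.110, accuracy 0.962 -> 0.969
```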

📉 Plot the Loss Curve

To monitor the model’s learning progress over epochs, you can plot the training and validation loss curves. This helps visualize how well the model is learning and whether it's overfitting or underfitting. model.plot_losses() generates a line plot showing:

Training loss over each epoch
Validation loss over each epoch

🧠 What to Look For:

✅ Smooth downward trend in both losses → Good training progress.
⚠️ Validation loss flattening or rising while training loss decreases → Possible overfitting.
❌ Erratic loss behavior → Learning rate may be too high or data may be noisy.

Use this method:

# Plot training and validation loss curves
# -------------------------------------------------------

model.plot_losses()

🖼️ Visualize Results on the Validation Set

To better understand how well the model is performing, it's good practice to visually compare the model’s predictions against the ground truth labels. This allows you to qualitatively assess whether the model is making correct classifications.

You can use the following code to preview the results of the model within the notebook:

rows=4: The number of rows of results to display; increase it to view more samples.

This will display:

Input tile
Ground truth class
Predicted class by the model

🔍 This visual check can reveal patterns in model errors, such as consistent confusion between similar categories (e.g., rocky vs. sandy shores), helping guide further improvements.

# Visualize model predictions vs actual labels
# -------------------------------------------------------

model.show_results(rows=4)

Accuracy Assessment

After training, it’s important to evaluate how accurately the model is predicting each class. One of the best tools for this is the confusion matrix, which shows how often predictions match the true labels—and where the model is making mistakes.

📉 Plot the Confusion Matrix

Use the plot_confusion_matrix() method from arcgis.learn:

# Plot the confusion matrix for model evaluation
# -------------------------------------------------------

model.plot_confusion_matrix()

🧠 How to Interpret the Confusion Matrix

Each row = Actual class (ground truth label)
Each column = Predicted class (model's prediction)
Diagonal cells = Correct predictions
Off-diagonal cells = Misclassifications (where the model predicted the wrong class)
🔍 Strong diagonal dominance suggests the model is performing well overall.
⚠️ Consistent misclassification in specific rows/columns may point to:
- Class imbalance
- Visually similar categories
- Too few training/validation samples for those categories
The goal is to have high numbers on the diagonal and low or zero values elsewhere.
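The diagonal versus off-diagonal reading can be turned into per-class precision and recall. This sketch assumes the matrix is a list of rows, with rows as actual classes and columns as predicted classes, matching the convention above. per_class_metrics is a hypothetical helper, and the 3 × 3 matrix in the test is a toy example, not model output.

```python
def per_class_metrics(matrix, labels):
    """Precision and recall per class from a confusion matrix
    (rows = actual class, columns = predicted class)."""
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                                # correct predictions (diagonal)
        actual = sum(matrix[k])                          # row sum: tiles truly of class k
        predicted = sum(matrix[r][k] for r in range(n))  # column sum: tiles predicted as k
        metrics[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return metrics
```

Low recall for a class means many of its tiles are predicted as something else (errors along its row). Low precision means other classes are often predicted as it (errors down its column).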

📊 Confusion Matrix Insights

The confusion matrix shows strong performance for most categories, with a majority of predictions falling along the diagonal (i.e., correct classifications).

✅ Category-wise Breakdown

Category	Total Tiles (Validation Set)	Correctly Classified	Misclassified As
1	469	450	13 as Category 3, 4 as Category 4, 1 as Category 8
2	8	7	1 as Category 1
3	369	333	36 in total, spread across Categories 1, 4, 6, 7, and 8
4	67	63	4 as Category 3
6	107	104	1 as Category 4, 2 as Category 3
7	756	737	2 as Category 3; 1 each as Categories 1, 2, and 4; 14 as Category 8
8	910	890	6 as Category 3, 12 as Category 7, 2 as Category 1
10	42	42	None
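Per-class recall (correct ÷ total for each actual class) can be computed directly from the table above. The counts below are copied from it, and the snippet is a small sketch rather than part of the arcgis.learn API.

```python
# category: (total validation tiles, correctly classified), copied from the table above
VALIDATION = {
    1: (469, 450), 2: (8, 7), 3: (369, 333), 4: (67, 63),
    6: (107, 104), 7: (756, 737), 8: (910, 890), 10: (42, 42),
}

recall = {cat: correct / total for cat, (total, correct) in VALIDATION.items()}

# Print categories from weakest to strongest recall
for cat, r in sorted(recall.items(), key=lambda kv: kv[1]):
    print(f"Category {cat:>2}: recall {r:.3f}")
```

Category 2 (0.875) and Category 3 (0.902) have the lowest recall. That is consistent with the observations below.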

⚠️ Category 2 Observations

There are 8 tiles for Category 2 in the validation set.
7 were correctly classified, while 1 was misclassified as Category 1.
With such a small sample size, the model’s accuracy for this class is less reliable and could benefit from more examples.
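To quantify how unreliable an estimate based on 8 tiles is, you can compute a confidence interval for the class's accuracy. The sketch below uses the Wilson score interval, a standard choice for small samples. It is an illustration we're adding (wilson_interval is a hypothetical helper), not a metric reported by arcgis.learn.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clip to [0, 1] to absorb floating-point overshoot at the extremes
    return max(0.0, centre - half), min(1.0, centre + half)


for cat, (correct, total) in {2: (7, 8), 10: (42, 42)}.items():
    lo, hi = wilson_interval(correct, total)
    print(f"Category {cat}: {correct}/{total}, 95% CI {lo:.2f}-{hi:.2f}")
```

For Category 2 (7/8), the interval spans roughly 0.53 to 0.98. For Category 10 (42/42), it spans roughly 0.92 to 1.00. With only 8 validation tiles, the true accuracy for Category 2 is therefore highly uncertain.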

🔁 What You Can Do

To improve classification performance, especially for underrepresented categories such as Category 2, or if you observe significant misclassifications or inconsistent accuracy across categories:

📌 Check for class imbalance.
➕ Export more labeled tiles for underrepresented classes.
🔄 Increase the validation split (val_split_pct) in prepare_data() so that small classes get more validation tiles and their accuracy estimates become more reliable (at the cost of fewer training tiles):
```
data = prepare_data(path=output_path, batch_size=128, val_split_pct=0.3)
```
🧪 Consider data augmentation or oversampling.
👀 Examine whether the misclassified categories have visually similar features.

💡 Note: If you modify or expand your dataset, be sure to retrain the model so it can learn from the updated distribution.

💾 Save the Model

Now, we will save the trained model as a Deep Learning Package (.dlpk), the standard format for deploying deep learning models across the ArcGIS platform.

We use the .save() method to export the model. By default, it is written to a models sub-folder, and the method returns the path of the saved model folder, as shown in the output below.

# Save the trained model for future use
# -------------------------------------------------------

model.save('model-20e_8classes')

Computing model metrics...

WindowsPath('D:/Work/Repos/Arcgis_python_API/arcgis-python-api/samples/04_gis_analysts_data_scientists/models/model-20e_8classes')

You can also pass a full folder path instead of a name. For example:

model.save(r"D:\Exported_Models\coastline_classifier")

💡 Once saved, this .dlpk file can be:

Shared with colleagues,
Uploaded to ArcGIS Online or Enterprise,
Or used for inference on new imagery in ArcGIS Pro or directly in this notebook.

Model Inference

To run inference in ArcGIS Pro, we first need to create a set of input features along the coastlines of a new, unseen area, that is, an area not used during model training.

🧱 Step 1: Prepare Input Features

Use the Generate Rectangles Along Lines tool to create rectangular features along the coastline (similar to the training process; see Figure 2). These rectangles will act as the input objects for classification.

🤖 Step 2: Run the Inference Tool

Use the Classify Objects Using Deep Learning tool in ArcGIS Pro.

📥 Required Parameters:

Input Raster: The Sentinel-2 imagery of the new area.
Input Features: The rectangles generated with the Generate Rectangles Along Lines tool (from Step 1).
Output Classified Objects Feature Class: The output feature class containing the classified rectangles.
Model Definition: The .dlpk model saved in the previous step.
Class Label Field: The field name (e.g., ClassLabel) where the model will write the predicted class.

⚙️ Environments to Set:

Cell Size: Match the resolution of the Sentinel-2 imagery (e.g., 10 meters).
Processing Extent: Limit the analysis to your area of interest.
Processor Type: Choose between CPU or GPU based on your hardware.

💡After running the tool, each input rectangle will be assigned a coastline category label as predicted by the model—enabling spatial analysis and visualization directly in ArcGIS Pro.

Figure 6. Classify Objects Using Deep Learning tool

Results

We selected a sandy shoreline (Category 3) that the model had not seen during training. Using the Generate Rectangles Along Lines tool, we created rectangular features along this shoreline and ran inference with our trained model.

📊 Inference Results Summary (Figure 9)

In Figure 9, you can observe the classification output.

✅ Most rectangles were correctly classified as Category 3 (sandy shore).
❌ However, two rectangles were misclassified as Category 7 (mangrove).

Predicted	Count
Category 3 (Correct)	✅ 6 tiles
Category 7 (Misclassified)	❌ 2 tiles

With 6 of 8 rectangles (75%) classified correctly, the model performs reasonably well on this unseen shoreline. There is still room for improvement, particularly in distinguishing between visually similar shoreline types.

📌 Additional training data or fine-tuning could further improve model accuracy.

Conclusion

In this notebook, we demonstrated a complete workflow for coastline classification using the FeatureClassifier model from the ArcGIS API for Python.

We covered:

✅ Preparing training data using NOAA ENC coastline categories and Sentinel-2 imagery
✅ Generating labeled training tiles using the Export Training Data for Deep Learning tool
✅ Training a deep learning model to classify coastal segments into predefined categories
✅ Evaluating model performance using loss curves, confusion matrix, and visual validation
✅ Performing inference on unseen areas using the Classify Objects Using Deep Learning tool in ArcGIS Pro

The model showed promising results in correctly identifying most coastline categories, especially when sufficient training samples were available. Some minor misclassifications highlight areas where additional data or fine-tuning could improve accuracy.

🌊 Real-World Impact

By automating coastline classification:

🌐 Coastal monitoring becomes more scalable
📉 Manual interpretation effort is reduced
🛰️ Satellite imagery can be leveraged for large-scale environmental analysis

🔁 Machine learning is an iterative process—as the dataset improves, so will the model.

Thanks for following along!