Wildlife Species Identification in Camera Trap Images¶
- 🔬 Data Science
- 🥠 Deep Learning and Image Classification
Table of Contents¶
- Necessary imports
- Connect to your GIS
- Get the data for analysis
- Train the model
- Model Inference
Automatic animal identification can improve biology missions that require identifying species and counting individuals, such as animal monitoring and management, examining biodiversity, and population estimation.
This notebook will showcase a workflow to classify animal species in camera trap images. The notebook has two main sections:
- Training a deep learning model that can classify animal species
- Using the trained model to classify animal species in the images at each camera location
We have used a subset of a community-licensed, open source camera trap dataset, provided under the LILA BC (Labeled Information Library of Alexandria: Biology and Conservation) repository, to train our deep learning model; the dataset is detailed further below. For inferencing, we have taken 5 fictional camera locations in Kruger National Park, South Africa, and attached some images to each of those points. This feature layer simulates a scenario where multiple cameras at different locations have captured images that need to be classified for animal species. The whole workflow enabling this is explained below.
Note: This notebook is supported in ArcGIS Enterprise 10.9 and later.
import os
import json
import zipfile

import pandas as pd
from pathlib import Path

from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats

from arcgis.gis import GIS
from arcgis.learn import prepare_data, FeatureClassifier, classify_objects, Model
# connect to web GIS
gis = GIS("Your_enterprise_profile")  # ArcGIS Enterprise 10.9 or later
In this notebook, we have used the "WCS Camera Traps" dataset, made publicly available by the Wildlife Conservation Society under the LILA BC repository. This dataset contains approximately 1.4M camera trap images representing different species from 12 countries, making it one of the most diverse camera trap datasets publicly available. The dataset can be further explored and downloaded from this link. It is released under the Community Data License Agreement (permissive variant).
This dataset covers 675 different species, of which we have chosen 11 species whose conservation status is either "Endangered", "Near Threatened", or "Vulnerable". These species include jaguars, African elephants, lions, Thomson's gazelles, East African oryxes, gerenuks, Asian elephants, tigers, ocellated turkeys, and great curassows.
The json file (wcs_camera_traps.json) that comes with the data contains the metadata we require.
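For orientation, the metadata follows a COCO-style layout with top-level `images`, `annotations`, and `categories` lists. Below is a minimal sketch with made-up values (the real file describes roughly 1.4M images; the field values here are illustrative only):

```python
import json

# Hypothetical miniature of the metadata layout; ids, paths, and the
# category name are made up for illustration.
sample = """
{
  "images": [{"id": "img1", "file_name": "animals/0001/0001.jpg"}],
  "annotations": [{"id": "ann1", "image_id": "img1", "category_id": 7}],
  "categories": [{"id": 7, "name": "example species"}]
}
"""
metadata = json.loads(sample)

# The preparation cell below only relies on the 'images' and 'annotations'
# lists, joining them on image id to pair each file name with its label.
print(sorted(metadata.keys()))  # ['annotations', 'categories', 'images']
```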
The cell below converts the downloaded data into the format required for training.
# # list containing corrupted filenames from the downloaded data
# corrupted_files = ['animals/0402/0905.jpg', 'animals/0407/1026.jpg', 'animals/0464/0880.jpg',
#                    'animals/0464/0881.jpg', 'animals/0464/0882.jpg', 'animals/0464/0884.jpg',
#                    'animals/0464/0888.jpg', 'animals/0464/0889.jpg', 'animals/0645/0531.jpg',
#                    'animals/0645/0532.jpg', 'animals/0656/0208.jpg', 'animals/0009/0215.jpg']

# # list with the ids of the 11 retained species
# retained_specie_list = [7, 24, 90, 100, 119, 127, 128, 149, 154, 372, 374]

# with open('path_to: wcs_camera_traps.json') as f:
#     metadata = json.load(f)

# # load the annotations and images into dataframes
# annotation_df = pd.DataFrame(metadata['annotations'])
# images_df = pd.DataFrame(metadata['images'])

# img_ann_df = pd.merge(images_df,
#                       annotation_df,
#                       left_on='id',
#                       right_on='image_id',
#                       how='left').drop('image_id', axis=1)

# # select the required columns from the merged dataframe
# train_df = img_ann_df[['file_name', 'category_id']]

# # remove corrupted files and video frames from the dataframe
# train_df = train_df[~train_df['file_name'].isin(corrupted_files)]
# train_df = train_df[~train_df['file_name'].str.contains("avi")]

# # A 'category_id' of 0 indicates an image that does not contain an animal.
# # To reduce the class imbalance, we will only retain
# # ~50% of the empty images in our training dataset.
# new_train_df = train_df[train_df['category_id'] == 0].sample(frac=0.5,
#                                                              random_state=42)
# new_train_df = pd.concat([new_train_df,
#                           train_df[train_df['category_id'].isin(retained_specie_list)]])
Alternatively, we have provided a subset of training data containing a few samples from the 11 species mentioned above.
# connect to ArcGIS Online to fetch the sample training data
agol_gis = GIS("home")
training_data = agol_gis.content.get('677f0d853c85430784169ce7a4a54037')
training_data
filepath = training_data.download(file_name=training_data.name)
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
output_path = Path(os.path.splitext(filepath)[0])  # folder the archive was extracted to
new_train_df = pd.read_csv(str(output_path)+'/retained_data_subset.csv')
1801 rows × 2 columns
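Before preparing the data for training, it can be worth sanity-checking the class balance of the loaded subset. A minimal sketch, using a small made-up frame in place of `new_train_df` (the file names and category ids below are illustrative only):

```python
import pandas as pd

# Made-up stand-in for new_train_df; the real frame is loaded from
# retained_data_subset.csv above and has 'file_name' and 'category_id' columns.
demo_df = pd.DataFrame({
    "file_name": ["animals/0001/0001.jpg",
                  "animals/0001/0002.jpg",
                  "animals/0002/0001.jpg",
                  "animals/0002/0002.jpg"],
    "category_id": [0, 7, 7, 24],
})

# Per-class image counts; a large disparity here would call for more
# aggressive rebalancing than the 50% downsampling of empty images.
counts = demo_df["category_id"].value_counts()
```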
Here, we are not using the prepare_data() function provided by arcgis.learn to prepare the data for analysis. Instead, we will use the ImageDataBunch.from_df method provided by fast.ai to read the necessary information from a dataframe and convert it into a DataBunch object, which will then be used for training. We will apply the standard set of transforms for data augmentation, hold out 20% of the training dataset for validation, and normalize the inputs with ImageNet statistics.
Path_df = Path(output_path)  # path to the downloaded data

data = ImageDataBunch.from_df(path=Path_df,
                              folder='',
                              df=new_train_df,
                              fn_col=0,
                              label_col=1,
                              valid_pct=0.2,
                              seed=42,
                              ds_tfms=get_transforms(),
                              bs=16,
                              size=224,
                              num_workers=2).normalize(imagenet_stats)
When working with large sets of JPEG images, some files may be truncated, with only a partial byte stream available. Reading such images can break the training process. To ensure that training does not fail, we set PIL's LOAD_TRUNCATED_IMAGES flag to True so that whatever portion of the image is available is loaded.
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True