Edge Detection with

Introduction

Edge Detection is the identification of edges and object boundaries in an image. Humans identify edges and object boundaries easily, thanks to highly evolved visual perception and prior knowledge of the world. A computer algorithm, in contrast, identifies edges by detecting changes in contrast, gradient, color, and similar properties, all of which show up as differences in pixel values. Edge Detection has use cases across many domains; for example, it can be used to identify land parcel boundaries in satellite imagery.

Figure 1. Edge Detection [1]

Earlier works

Edge Detection is a classical computer vision problem. Many earlier algorithms worked reasonably well for it, and most of them relied on well-researched filters or operators. The Canny Edge Detection [2] technique has been one of the most popular. It involves multiple stages, including Gaussian blurring, gradient filtering, and non-maxima suppression. Many other techniques use hand-crafted features to detect edges. Later, algorithms such as Structured Forests for Fast Edge Detection, which apply machine learning to hand-crafted features, grew in popularity because they were more accurate and faster than their predecessors [3].
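The gradient-filtering idea behind these classical detectors can be sketched in a few lines of plain Python. This is only an illustration of one stage: it applies the standard 3x3 Sobel kernels and a fixed threshold, and it omits the Gaussian blurring, non-maxima suppression, and other stages that Canny adds. The function names and the list-of-lists image layout are assumptions made for this example.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(image):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


def edge_mask(image, threshold):
    """Mark pixels whose gradient magnitude reaches `threshold` as edges (1)."""
    return [[1 if m >= threshold else 0 for m in row]
            for row in gradient_magnitude(image)]


# A dark left half and a bright right half: the edge is the vertical boundary.
step_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
mask = edge_mask(step_image, threshold=20)
```

In this toy image only the two columns on either side of the intensity step are marked as edges, which is the behaviour that a hand-tuned threshold tries to achieve on real images.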

In the last few years, with advancements in deep learning, many CNN-based models have been developed to solve edge detection problems. With these deep learning models, Edge Detection can be treated as a special case of pixel classification (also known as image segmentation). In arcgis.learn, we have integrated two deep learning models for Edge Detection: HED and BDCN.

Holistically-Nested Edge Detection (HED)

HED is one of the earlier CNN-based models for edge detection. According to the authors, two salient features give the model its name. First, the model is 'holistic': it takes an image as input and outputs another image (the edge map). It does not require hand-crafted features as inputs; instead, its architecture creates these features internally in the hidden layers. This property is inherited from fully convolutional networks. Second, the model is 'nested': it learns at multiple scales using deeply supervised learning, which is achieved by taking side outputs at varying depths [4].

Figure 2. Architecture of Holistically-Nested Edge Detection (HED) [4]

The model uses the VGGNet architecture as its base, with the last pooling layer and everything after it removed. Each of the five convolution blocks has a side output, which helps the model learn features at multiple scales. These side outputs are combined in a fusion layer whose weights are also learnable, and the fusion layer provides a single unified output. This architecture can be seen in Figure 2. The total loss is the sum of the losses calculated at the fusion layer and at each of the side outputs. The loss function is cross-entropy, with a class-balancing weight added for the side outputs. The class-balancing weight is needed because the large majority of pixels in an edge map are not edge pixels [4].
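The loss described above can be sketched in plain Python, following the formulation in [4]: edge pixels are weighted by the fraction of non-edge pixels, and non-edge pixels by the fraction of edge pixels. This is a minimal sketch, not the arcgis.learn implementation. The function names, the flat-list pixel layout, and the use of unweighted cross-entropy for the fused output are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(probs, labels, balanced, eps=1e-7):
    """Binary cross-entropy summed over pixels.

    probs:  predicted edge probabilities, one per pixel
    labels: ground truth, 1 for edge pixels and 0 otherwise
    balanced: if True, weight edge pixels by beta = (#non-edge / #pixels)
              and non-edge pixels by (1 - beta)
    """
    if not labels:
        raise ValueError("labels must not be empty")
    beta = (len(labels) - sum(labels)) / len(labels)
    w_edge, w_background = (beta, 1.0 - beta) if balanced else (1.0, 1.0)
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if y == 1:
            loss -= w_edge * math.log(p)
        else:
            loss -= w_background * math.log(1.0 - p)
    return loss


def total_loss(side_logits, fusion_weights, labels):
    """Sum of class-balanced side-output losses plus the fusion loss.

    side_logits: one list of per-pixel activations per side output
    fusion_weights: one (learnable) weight per side output
    """
    side = sum(
        cross_entropy([sigmoid(a) for a in logits], labels, balanced=True)
        for logits in side_logits
    )
    fused_probs = [
        sigmoid(sum(w * logits[i] for w, logits in zip(fusion_weights, side_logits)))
        for i in range(len(labels))
    ]
    return side + cross_entropy(fused_probs, labels, balanced=False)


# One edge pixel among four: beta = 0.75, so the single edge pixel
# carries as much weight as the three background pixels together.
labels = [1, 0, 0, 0]
balanced = cross_entropy([0.5, 0.5, 0.5, 0.5], labels, balanced=True)
```

With the balancing weight, the rare edge pixels and the abundant background pixels contribute equally to the loss in this example, so the model is not rewarded for predicting "no edge" everywhere.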

Bi-Directional Cascade Network for Perceptual Edge Detection (BDCN)

BDCN is a more recent model that aims to improve edge detection for images containing objects of widely varying scales. Its main proposition is to supervise each layer with an edge map at that layer's own scale, instead of using a common ground truth edge map for all layers. Layers at different depths have different receptive fields: shallower layers have small receptive fields and capture fine, localized details, while deeper layers have larger receptive fields and capture coarser, object-level features. It is therefore more effective to focus the learning in each layer on a particular scale [1].
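How the receptive field depends on depth can be made concrete with a short calculation. The sketch below uses the standard recurrence for stacked layers (each layer adds `(kernel_size - 1) * jump` pixels, and each stride multiplies the jump) on a plain VGG16-style stack of 3x3 convolutions and 2x2 pooling layers. The layer layout and names are assumptions for illustration; the trimmed and modified backbones used by HED and BDCN differ in their details.

```python
def receptive_fields(layers):
    """Return the receptive field (in input pixels) after each layer.

    layers: list of (name, kernel_size, stride) tuples in forward order.
    """
    field, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    result = {}
    for name, kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
        result[name] = field
    return result


def vgg16_style_layers(convs_per_block=(2, 2, 3, 3, 3)):
    """Five blocks of 3x3 convolutions, each followed by 2x2 pooling with stride 2."""
    layers = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for i in range(1, n_convs + 1):
            layers.append((f"conv{block}_{i}", 3, 1))
        layers.append((f"pool{block}", 2, 2))
    return layers


fields = receptive_fields(vgg16_style_layers())
# Receptive field at the last convolution of each block.
block_fields = [fields[name] for name in
                ("conv1_2", "conv2_2", "conv3_3", "conv4_3", "conv5_3")]
print(block_fields)  # [5, 14, 40, 92, 196]
```

The first block sees only a 5x5 patch of the input, while the last block sees a 196x196 patch, which is why a single ground truth edge map is not an equally good target for every depth.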

Figure 3. Architecture of Bi-Directional Cascade Network for Perceptual Edge Detection (BDCN) [1]

Like HED, the BDCN model uses a VGGNet as its base architecture. Each of the five convolutional blocks is followed by a pooling layer and is extended into an Incremental Detection Block (ID Block). An ID Block consists of several convolutional layers, followed by a Scale Enhancement Module (SEM). The SEM is made of multiple parallel convolution layers that use dilation at different rates to create receptive fields at varying scales. Dilation is an effective way of enlarging the receptive field without increasing the number of parameters. Each ID Block produces two edge predictions, one propagated from the shallow layers to the deep layers and the other from the deep layers to the shallow layers, thus forming a bi-directional cascade structure. The final prediction is a fusion of the intermediate edge map predictions [1].
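The claim that dilation widens the receptive field without adding parameters follows from simple arithmetic: a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels along each axis, while its weight count depends only on k and the channel counts. The sketch below shows this for three parallel 3x3 branches. The dilation rates and channel counts are illustrative values, not the ones used in the BDCN implementation.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a 2D convolution (bias ignored).

    The dilation rate does not appear: it spreads the same weights
    over a larger area instead of adding new ones.
    """
    return kernel_size * kernel_size * in_channels * out_channels


# Illustrative dilation rates for three parallel 3x3 branches.
for rate in (1, 2, 4):
    extent = effective_kernel_size(3, rate)
    weights = conv_weight_count(3, in_channels=32, out_channels=32)
    print(f"dilation={rate}: covers {extent}x{extent} pixels with {weights} weights")
```

Each branch costs the same 9216 weights, yet the branches cover 3x3, 5x5, and 9x9 neighbourhoods respectively.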

Implementation in `arcgis.learn`

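After importing the model classes, an Edge Detection model can be initialized in arcgis.learn with a single line of code, using either of the two models:

from arcgis.learn import HEDEdgeDetector, BDCNEdgeDetector

model = HEDEdgeDetector(data)

model = BDCNEdgeDetector(data)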

Here, data is the databunch created in an earlier step using the prepare_data method. You can optionally provide a backbone parameter, which defaults to vgg19. In our implementation of the Edge Detection models, ResNets can also be used as the backbone, in addition to VGGNets.
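Choosing between the two models and passing an optional backbone could be wrapped in a small helper like the one below. This is a hedged sketch: `build_edge_detector` and its `architecture` argument are hypothetical names introduced for this example and are not part of arcgis.learn. Only `HEDEdgeDetector`, `BDCNEdgeDetector`, and the `backbone` parameter come from the text above, and the backbone string in the usage comment is an assumed example of a supported ResNet.

```python
def build_edge_detector(data, architecture="hed", backbone=None):
    """Create an arcgis.learn edge detection model (illustrative helper).

    data:         databunch returned by prepare_data
    architecture: "hed" or "bdcn" (hypothetical argument of this helper)
    backbone:     optional backbone; when None, the library default is used
    """
    if architecture not in ("hed", "bdcn"):
        raise ValueError(f"unknown architecture: {architecture!r}")

    # Imported here so that the helper can be defined without arcgis installed.
    from arcgis.learn import BDCNEdgeDetector, HEDEdgeDetector

    model_class = HEDEdgeDetector if architecture == "hed" else BDCNEdgeDetector
    kwargs = {} if backbone is None else {"backbone": backbone}
    return model_class(data, **kwargs)


# Example usage (requires arcgis and a databunch prepared beforehand):
#   model = build_edge_detector(data)                                   # HED, default backbone
#   model = build_edge_detector(data, "bdcn", backbone="resnet34")      # assumed ResNet name
```

Leaving `backbone` unset keeps the library default, so the helper does not have to hard-code it.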

For more information about the API, please go to the API reference.

References

[1] J. He, S. Zhang, M. Yang, Y. Shan, and T. Huang, "Bi-Directional Cascade Network for Perceptual Edge Detection," 2019, arXiv:1902.10903.

[2] J. Canny, "A Computational Approach to Edge Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986, doi: 10.1109/TPAMI.1986.4767851.

[3] P. Dollár and C. L. Zitnick, "Structured Forests for Fast Edge Detection," 2013 IEEE International Conference on Computer Vision, Sydney, NSW, 2013, pp. 1841-1848, doi: 10.1109/ICCV.2013.231.

[4] S. Xie and Z. Tu, "Holistically-Nested Edge Detection," 2015, arXiv:1504.06375.