In the guide How u-net works, we learned in detail about semantic segmentation using U-net in the ArcGIS API for Python. Many other semantic segmentation algorithms, such as PSPNet and DeepLab, can also perform road extraction. However, extracting road networks from satellite images using semantic segmentation alone often produces fragmented road segments, because satellite images make roads difficult to extract for the following reasons:
- Shadows of clouds and trees.
- Diverse appearance and illumination conditions due to terrain, weather, geography, etc.
- The similarity of road texture with other materials.
In this guide, we explain a recent multi-task approach that improves road connectivity in the generated road masks. This work introduces a novel connectivity task called Orientation Learning, which is motivated by the way humans annotate roads by tracing them at a specific orientation.
Note: To follow this guide, we assume that you have a basic understanding of deep learning and convolutional neural networks (CNNs). For a detailed review of CNNs, see Stanford University's CS231n course, Convolutional Neural Networks for Visual Recognition. We also recommend reading How u-net works before this guide.
Human beings try to find consistent patterns in their experiences and form hypotheses about the features and causes of those patterns. In real-life scenarios, humans do not receive tasks in isolation; instead, they receive a sequence of related tasks over time and transfer knowledge from one scenario to another by drawing on prior experience.
For example, when humans learn to ride a bicycle, they learn traffic rules, how to balance the vehicle, and how and when to apply the brakes. With that experience, they can learn to ride a motorbike faster by transferring their prior knowledge. In general, this ability to share knowledge helps humans learn complex concepts by first learning simpler ones. Machine learning algorithms can imitate this behavior to improve performance on a complex task by learning related smaller tasks in parallel or in sequence.
The multi-task learning mechanism is inspired by the way human beings acquire knowledge of complex tasks by performing different shared sub-tasks simultaneously, that is, in parallel. It improves performance by sharing mutual information between the tasks during the learning process. In deep learning, the common architecture for multi-task learning consists of one encoder and multiple decoders that predict the related tasks (as shown in Figure 1). For the road extraction system, one task is identifying road pixels; the next section introduces the other task.
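To make the encoder and decoder layout concrete, here is a minimal sketch of a forward pass through one shared encoder and one decoder head per task. It is a toy in plain Python with random, untrained weights, not the network used by the ArcGIS API for Python; the class name `MultiTaskModel`, the layer sizes, and the class counts (2 and 5) are all assumptions chosen for illustration.

```python
import math
import random


def linear(inputs, weights, biases):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Random (untrained) weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskModel:
    """Hypothetical toy model: one shared encoder, one decoder per task."""

    def __init__(self, n_inputs, n_features, task_classes, seed=0):
        rng = random.Random(seed)
        self.encoder = random_layer(rng, n_features, n_inputs)
        # Each task gets its own decoder, but all decoders read the same features.
        self.decoders = {task: random_layer(rng, n_classes, n_features)
                         for task, n_classes in task_classes.items()}

    def forward(self, inputs):
        # The shared features are computed once and reused by every decoder.
        features = relu(linear(inputs, *self.encoder))
        return {task: softmax(linear(features, *decoder))
                for task, decoder in self.decoders.items()}


# Class counts are arbitrary: 2 for road / not road, 5 for orientation bins.
model = MultiTaskModel(n_inputs=3, n_features=4,
                       task_classes={"segmentation": 2, "orientation": 5})
outputs = model.forward([0.2, 0.5, 0.1])
for task, probabilities in outputs.items():
    print(task, [round(p, 3) for p in probabilities])
```

Because both heads consume the same `features`, any training signal from one task would also update the shared encoder, which is how the tasks can inform each other.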
Multiple Tasks in Road Extraction:
Humans perform two related tasks while annotating roads: identifying the road pixels and tracing lines to connect them. These two tasks are used in a multi-task learning framework to improve road extraction (as shown in Figure 2):
- Binary Semantic Segmentation: identify road pixels.
- Road Orientation: identify the road orientation for tracing.
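As a rough illustration of what an orientation label could look like, the sketch below computes the direction of each segment of a traced centerline and maps it to a discrete class. The 45-degree bin size, the sample coordinates, and the function names are assumptions made for this example; the guide does not specify how the framework encodes orientation.

```python
import math


def segment_orientation(start, end):
    """Direction from `start` to `end` in degrees, in the range [0, 360)."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def orientation_class(angle_degrees, bin_size=45.0):
    """Map an angle to a class index; `bin_size` is an assumed value."""
    n_bins = int(360 / bin_size)
    # The modulo guards against an angle that rounds up to exactly 360.0.
    return int(angle_degrees // bin_size) % n_bins


# Hypothetical traced centerline as (x, y) points in plain Cartesian axes.
centerline = [(0, 0), (10, 2), (14, 12), (12, 22)]

labels = []
for start, end in zip(centerline, centerline[1:]):
    angle = segment_orientation(start, end)
    labels.append(orientation_class(angle))
    print(start, "->", end, "angle:", round(angle, 1), "class:", labels[-1])
```

Turning a continuous angle into a class index lets orientation be predicted with a classification head, in the same way road pixels are.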
Road network extraction formulated only as binary segmentation often fails to produce a topologically correct road map because of changes in road appearance.
As shown in Figure 3, annotators trace lines through the highlighted nodes along the center of the roads, giving the traversable shortest path (a, c, d, e, b) for the route a → b. Binary segmentation algorithms such as PSPNet and U-Net instead estimate a fragmented road network, which yields the path (a, c, f, g, h, b) for the same route; this is not the shortest route. Annotators rely on additional knowledge of orientation while tracing roads to achieve this connectivity.
By incorporating this orientation knowledge, the Multi-Task Road Extractor framework extracts connected and topologically correct road networks using both segmentation and orientation.
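The effect of a fragmented road mask on routing can be illustrated with a small graph search. The node names follow the route example above, but the edge list and the position of the gap (between d and e) are assumptions made for this sketch, and path length is counted in edges rather than in real distance.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search; returns the path with the fewest edges, or None."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical annotated network: every traced road segment is present.
annotated = [("a", "c"), ("c", "d"), ("d", "e"), ("e", "b"),
             ("c", "f"), ("f", "g"), ("g", "h"), ("h", "b")]

# Hypothetical segmentation output: the d-e segment is missing (a gap).
fragmented = [edge for edge in annotated if edge != ("d", "e")]

annotated_route = shortest_path(annotated, "a", "b")
fragmented_route = shortest_path(fragmented, "a", "b")
print("annotated: ", annotated_route)
print("fragmented:", fragmented_route)
```

A single missing segment is enough to force the longer detour, which is why connectivity matters more for routing than pixel-level accuracy alone.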