Object tracking is the task of following the location of one or more objects across a sequence of video frames.
SiamMask is a deep learning architecture that performs both Visual Object Tracking (VOT) and semi-supervised Video Object Segmentation (VOS). Given the location of the object in the first frame of a sequence, the aim of VOT is to estimate the object's position in each subsequent frame as accurately as possible. Similarly, the main goal of VOS is to output a binary segmentation mask that indicates, for every pixel, whether or not it belongs to the target. In other words, SiamMask takes a single object bounding box as input for initialization and outputs a segmentation mask and an object bounding box for each subsequent frame of the video.
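To make the relationship between the two per-frame outputs concrete, the following minimal sketch shows how a bounding box can be read off a binary segmentation mask. It is an illustration only, not SiamMask's own code: the helper name `mask_to_bbox`, the `(x, y, w, h)` box format, and the choice of a tight axis-aligned box are all assumptions made here, and the model may derive its box differently.

```python
import numpy as np


def mask_to_bbox(mask):
    """Return the tightest axis-aligned box (x, y, w, h) around the
    foreground pixels of a binary mask, or None if the mask is empty.

    Hypothetical helper for illustration; not part of SiamMask.
    """
    # Row (y) and column (x) indices of every pixel marked as target.
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # no target pixels in this frame
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    # Width and height are inclusive of both edge pixels.
    return int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)


# Toy 6x8 "frame" whose target occupies rows 2-4 and columns 3-6.
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True

bbox = mask_to_bbox(mask)
print(bbox)  # (3, 2, 4, 3)
```

In a tracking loop, a function like this would be applied to the mask predicted for each frame, so the mask carries the pixel-level answer and the box summarizes it as a position and size.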