Abstract: Visual object tracking has evolved considerably over the years. Traditionally, it has been addressed by a model that learns only the appearance of the object online, using the video itself as the sole training data. In single object tracking, the target is in most cases a relatively small object and often undergoes severe deformation. Inspired by the Dice loss used in semantic segmentation, we introduce a new objective function based on the Dice coefficient, which is optimized during training. This allows us to handle the strong imbalance between foreground and background patches. To cope with the limited amount of annotated training data, we augment the data with random nonlinear transformations and histogram matching. Our experimental evaluation demonstrates that the method achieves good performance on challenging test data while requiring only a fraction of the processing time of previous methods.
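The abstract does not state the exact form of the Dice-based objective. The sketch below assumes one common soft formulation with squared terms in the denominator, D = 2 Σ p_i g_i / (Σ p_i² + Σ g_i²), and loss 1 − D, where p_i is the predicted foreground probability and g_i the binary ground-truth label of pixel i. The function names `soft_dice_coefficient` and `dice_loss`, the smoothing constant `eps`, and the toy 100x100 map with a 5x5 target are illustrative assumptions, not the authors' implementation. The example shows why a Dice objective copes with foreground/background imbalance: a predictor that labels everything as background reaches 99.75% pixel accuracy yet receives the maximal Dice loss.

```python
import numpy as np


def soft_dice_coefficient(pred, target, eps=1e-6):
    """Soft Dice coefficient between a foreground-probability map and a
    binary ground-truth mask of the same shape (assumed formulation):

        D = 2 * sum(p * g) / (sum(p**2) + sum(g**2))

    `eps` (hypothetical smoothing term) avoids division by zero when both
    maps are empty.
    """
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(target, dtype=np.float64).ravel()
    intersection = np.sum(p * g)
    denominator = np.sum(p ** 2) + np.sum(g ** 2)
    return (2.0 * intersection + eps) / (denominator + eps)


def dice_loss(pred, target, eps=1e-6):
    """Loss to minimize during training: 0 for a perfect overlap, ~1 for none."""
    return 1.0 - soft_dice_coefficient(pred, target, eps)


# Toy example: a 100x100 map containing a small 5x5 target (0.25% foreground).
target = np.zeros((100, 100))
target[40:45, 40:45] = 1.0

# A degenerate predictor that outputs "background" for every pixel.
all_background = np.zeros_like(target)

# Pixel accuracy rewards this predictor because background dominates...
pixel_accuracy = np.mean((all_background > 0.5) == (target > 0.5))
print(f"pixel accuracy: {pixel_accuracy:.4f}")  # 0.9975

# ...whereas the Dice loss is essentially maximal, since no foreground is found.
print(f"dice loss:      {dice_loss(all_background, target):.4f}")  # 1.0000
```

Because the Dice coefficient depends only on the overlap with the foreground and on the sizes of the two foreground regions, the large number of correctly classified background pixels does not dilute the training signal, which is the property the abstract relies on.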