Abstract: Ensuring robustness in image watermarking is crucial for maintaining content integrity under diverse transformations. Recent self-supervised learning (SSL) approaches, such
as DINO, have been leveraged for watermarking but primarily
focus on general feature representation rather than explicitly
learning invariant features. In this work, we propose a novel
text-guided invariant feature learning framework for robust image watermarking. Our approach leverages CLIP’s multimodal
capabilities, using text embeddings as stable semantic anchors
to enforce feature invariance under distortions. We evaluate
the proposed method across multiple datasets, demonstrating
superior robustness against various image transformations. Compared to state-of-the-art SSL methods, our model achieves higher
cosine similarity in feature consistency tests and outperforms
existing watermarking schemes in extraction accuracy under
severe distortions. These results highlight the efficacy of our
method in learning invariant representations tailored for robust
deep learning-based watermarking.
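The core idea of using a text embedding as a stable anchor for invariance can be illustrated with a minimal sketch. This is an assumption-based illustration, not the paper's actual implementation: real CLIP text/image encoders are replaced by random vectors, and `invariance_loss` is a hypothetical name for a simple anchor-alignment objective based on cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def invariance_loss(clean_feat, distorted_feat, text_anchor):
    # Hypothetical objective: pull both the clean and the distorted image
    # features toward the same fixed text embedding, so that the learned
    # representation stays consistent under distortions.
    return (1.0 - cosine_similarity(clean_feat, text_anchor)) + \
           (1.0 - cosine_similarity(distorted_feat, text_anchor))

rng = np.random.default_rng(0)
# Stand-in for a CLIP text embedding (512-d, as in common CLIP variants).
anchor = rng.standard_normal(512)
# Stand-ins for image features before and after a distortion.
clean = anchor + 0.1 * rng.standard_normal(512)
distorted = anchor + 0.3 * rng.standard_normal(512)

loss = invariance_loss(clean, distorted, anchor)
```

Minimizing such a loss encourages the image encoder to map distorted versions of an image near the same semantic anchor, which is the property the feature-consistency (cosine similarity) evaluation above measures.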