What Time Tells Us? An Explorative Study of Time Awareness Learned from Static Images

Published: 22 Oct 2025 · Last Modified: 22 Oct 2025 · Accepted by TMLR · License: CC BY 4.0
Abstract: Time becomes visible through illumination changes in what we see. Inspired by this, we explore the potential of learning time awareness from static images, aiming to answer: *what does time tell us?* To this end, we first introduce the Time-Oriented Collection (TOC) dataset, which contains 130,906 images with reliable timestamps. Leveraging this dataset, we propose a Time-Image Contrastive Learning (TICL) approach that jointly models timestamps and the related visual representations through cross-modal contrastive learning. We find that TICL 1) achieves state-of-the-art performance on the timestamp estimation task across various benchmark metrics, and 2) interestingly, despite being trained only on static images, produces time-aware embeddings that transfer well to several downstream tasks, such as time-based image retrieval, video scene classification, and time-aware image editing. Our findings suggest that time-related visual cues can be learned from static images and are beneficial for various vision tasks, laying a foundation for future research on understanding time-related visual context. Project page: https://rathgrith.github.io/timetells_release/
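The abstract does not spell out the training objective, but the cross-modal contrastive setup it describes can be illustrated with a minimal CLIP-style symmetric InfoNCE sketch between an image encoder and a timestamp encoder. Everything below (the `TimeEncoder` with cyclic hour-of-day features, the temperature value, the loss form) is an illustrative assumption, not the paper's exact TICL design.

```python
# Minimal sketch of time-image contrastive learning (CLIP-style symmetric
# InfoNCE). The time encoding and loss details are illustrative assumptions,
# not claimed to match TICL's actual architecture or objective.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeEncoder(nn.Module):
    """Maps an hour-of-day timestamp to an embedding via cyclic sin/cos features."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, hours: torch.Tensor) -> torch.Tensor:  # hours: (B,) in [0, 24)
        angle = 2 * math.pi * hours / 24.0
        cyclic = torch.stack([torch.sin(angle), torch.cos(angle)], dim=-1)  # (B, 2)
        return self.mlp(cyclic)

def time_image_contrastive_loss(img_emb: torch.Tensor,
                                time_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over matched (image, timestamp) pairs in a batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    time_emb = F.normalize(time_emb, dim=-1)
    logits = img_emb @ time_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

In this sketch, the image embeddings would come from any pretrained vision backbone; the timestamp encoder and image encoder are pulled toward a shared space so that images taken at similar times of day end up close to the corresponding time embeddings.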
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the AE and reviewers for their constructive advice and evaluations, which have helped us greatly improve the manuscript. Below we summarise all the updates made during the rebuttal period and in the camera-ready revision.

Following the suggestions from **Reviewer uEWw**, we made the following updates:
1. Explorations of SegSort-style supervised contrastive learning are presented in **Appendix C.3 (pages 30-31)**.
2. Results of normalising time with respect to seasons are added to **Table 7 (page 23)**, with discussion in **Figure 18** and **Appendix B.4 (page 24)**.

We made the following updates to address the concerns raised by **Reviewer VzWe**:
1. We added more pointers in the main text linking the corresponding quantitative experiments and dataset/methodology descriptions to the appendix.
2. We fixed the typos raised and changed the title of **Section 5.3** and **Table 3 (page 8)** from *"Ablation study"* to *"Detailed component analysis"*.

The following updates resolve concerns from **Reviewer vBRf**:
1. In **Section 3 (pages 3-5)**, we revised the wording to focus on the conceptual novelty of the model design rather than the architectural novelty.
2. We added additional state-of-the-art VLM baseline results to **Table 2 (page 7)**.

Based on the feedback from **Reviewer C7jQ**, we updated the paper as follows:
1. We added visualisations comparing sample-illumination distributions before and after our filtering process in **Figure 13 (page 20)**, with discussion in **Appendix A.1 (pages 18-21)**.
2. We included a dataset-comparison table, with visualisations of issues in previous datasets, in **Figure 11** and **Table 6 (page 19)** to support the discussion in **Appendix A.1 (pages 18-21)**.

Finally, we added a link to the new project page to the abstract for the camera-ready version and open-sourced the code (see the link on the project page). We hope these updates improve our paper, and we thank the reviewers and the AE for their invaluable reviews and suggestions.
Video: https://rathgrith.github.io/timetells_release/static/videos/demo_video.mp4
Code: https://github.com/Rathgrith/TICL-Code
Supplementary Material: zip
Assigned Action Editor: ~Xinlei_Chen1
Submission Number: 4493