Abstract: Candid photography is a relatively recent trend among both professional and amateur photographers. However, capturing decisive candid moments with a natural head pose, body pose, eye gaze, facial expressions, human-object interactions, and background understanding is not a trivial task. It requires the timing and intuition of a professional photographer to capture fleeting moments. In this work, we propose a novel real-time framework for detecting candid moments. The method uses a two-stream network consisting of an Attribute Network and a Visual Embedder Network. The former stream collaboratively learns high-level semantic features from the candid feature pool, and the latter stream focuses on learning visual image features. Lastly, we extend our solution by fine-tuning CandidNet to output candid scores for frames in the range 0 to 4 (0: non-candid; 4: extremely candid). The scoring mechanism allows us to compare images based on their candidness. A detailed ablation study of the proposed framework under various configurations demonstrates the efficacy of the method, with a classification accuracy of 92% on CELEBA-HQ [16] and 94% on CANDID-SCORE [13]. With its high processing speed, the proposed solution is suitable for real-time applications such as indicating candid moments to the user in the camera preview. As a first-in-the-market feature, the solution is being deployed on the Samsung Galaxy S22 flagship phone as part of the Single Take Photo application.
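To illustrate how per-frame candid scores in the range 0 to 4 could be used to compare images, the following is a minimal sketch rather than the authors' implementation. It assumes the scores have already been produced by some scoring model; the names `ScoredFrame`, `score_frames`, and `rank_by_candidness` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Score range stated in the abstract: 0 (non-candid) to 4 (extremely candid).
MIN_SCORE, MAX_SCORE = 0, 4


@dataclass(frozen=True)
class ScoredFrame:
    """Hypothetical container pairing a frame with its candid score."""
    index: int  # position of the frame in the capture sequence
    score: int  # candid score, assumed to come from a scoring model


def score_frames(scores: Sequence[int]) -> List[ScoredFrame]:
    """Wrap raw scores, rejecting values outside the 0..4 range."""
    frames = []
    for index, score in enumerate(scores):
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} at frame {index} is outside 0..4")
        frames.append(ScoredFrame(index, score))
    return frames


def rank_by_candidness(frames: Sequence[ScoredFrame]) -> List[ScoredFrame]:
    """Order frames from most to least candid.

    sorted() is stable, so frames with equal scores keep capture order.
    """
    return sorted(frames, key=lambda frame: frame.score, reverse=True)


if __name__ == "__main__":
    # Example scores are made up for illustration only.
    ranked = rank_by_candidness(score_frames([1, 4, 0, 4, 3]))
    for frame in ranked:
        print(f"frame {frame.index}: score {frame.score}")
```

Ranking by an ordinal score, rather than a binary candid/non-candid label, is what makes it possible to pick the best of several candid frames from the same capture.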
External IDs: dblp:conf/cvip/RamolaMV22