Falcon: Fast Proximal Linearization of Normalized Cuts for Unsupervised Image Segmentation

Published: 26 Jan 2026, Last Modified: 02 Mar 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Unsupervised Segmentation, Graph Cut, Normalized Cut, Proximal Gradient Method, Kurdyka–Łojasiewicz (KL) Convergence
TL;DR: Falcon is a fast proximal-gradient solver for discrete K-way Normalized Cut, with Kurdyka–Łojasiewicz convergence guarantees, outperforming recursive NCut in speed and accuracy on six segmentation benchmarks.
Abstract: Current zero-shot unsupervised segmentation methods based on normalized cuts (NCut) face three key limitations. First, they rely on recursive bipartitions with repeated eigen-decompositions, making them prohibitively expensive at scale. Second, each split requires spectral relaxation followed by rounding, introducing layers of approximation where the final partition may diverge from the true NCut objective. Third, recursive bipartitioning offers no principled assurance of producing a stable $K$-way segmentation, and existing heuristics lack convergence guarantees. We propose \textbf{Falcon}, a proximal-gradient solver that directly optimizes the discrete $K$-way NCut objective without spectral relaxation. We prove linear convergence under the \textit{Kurdyka--\L{}ojasiewicz} (KL) property. Falcon computes closed-form gradient scores weighted by cluster volumes and performs row-wise one-hot proximal updates stabilized by inertia. A monotone backtracking scheme adaptively tunes the proximal parameter, ensuring non-decreasing NCut values. This design preserves discrete feasibility, removes repeated eigen-decomposition, and guarantees convergence. Across six benchmarks, Falcon outperforms the strongest official baseline (DiffCut) by wide margins, e.g., +13.2 mIoU on VOC, +27.7 on COCO-Object, and +3.1 on Cityscapes, while remaining competitive on Pascal Context. It also runs up to an order of magnitude faster than recursive NCut and scales more favorably in memory at high resolution, making it practical for larger token grids. By pairing pretrained foundation models with a principled NCut solver, Falcon sets a new state of the art across six benchmarks and achieves the best performance on 17 of 18 benchmark--encoder pairs, underscoring both its robustness and its generality in bridging the gap between unsupervised and supervised segmentation.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6302
Loading