Source Separation by Flow Matching

Robin Scheibler, Arnaud Doucet, John R. Hershey, Henry Li

Published: 04 Nov 2025, Last Modified: 27 Mar 2026, Venue: WASPAA 2025, Readers: Everyone, License: arXiv.org perpetual, non-exclusive license

Abstract: We consider the problem of single-channel audio source separation, where the goal is to reconstruct K sources from their mixture. We address this ill-posed problem with FLOSS (FLOw matching for Source Separation), a constrained generation method based on flow matching that ensures strict mixture consistency. Flow matching is a general methodology that, given samples from two probability distributions defined on the same space, learns an ordinary differential equation that maps a sample from one distribution to a sample from the other. In our context, we have access to samples from the joint distribution of the K sources, and therefore to the corresponding samples from the lower-dimensional distribution of their mixture. To apply flow matching, we augment these mixture samples with artificial noise components to match the dimensionality of the joint distribution of the K sources. Additionally, as any permutation of the sources yields the same mixture, we adopt an equivariant formulation of flow matching that relies on a neural network architecture that is equivariant by design. We demonstrate the performance of the method on the separation of overlapping speech.
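To make the construction in the abstract concrete, the following is a minimal sketch of how one flow matching training pair could be built with mixture consistency. It is not the authors' implementation. Several choices are assumptions made here for illustration and are not stated in the abstract: the mixture is taken as the plain sum of the sources, the path between the two endpoints is a straight line, and the artificial noise is projected onto the zero-sum subspace so that it does not alter the mixture. The function name `make_training_pair` is hypothetical.

```python
import numpy as np


def make_training_pair(sources, t, rng):
    """Build one flow matching training pair (illustrative sketch only).

    sources: array of shape (K, T), one row per source signal.
    t:       interpolation time in [0, 1].
    Returns (x_t, velocity, mixture).
    """
    num_sources = sources.shape[0]

    # Assumption: the mixture is the plain sum of the K sources.
    mixture = sources.sum(axis=0)

    # Artificial noise lifts the mixture to the K-source dimensionality.
    # Assumption: subtracting the mean across sources makes the noise sum
    # to zero over the source axis, so it cannot change the mixture.
    noise = rng.standard_normal(sources.shape)
    noise -= noise.mean(axis=0, keepdims=True)

    # Starting point: mixture shared equally across sources, plus noise.
    x0 = mixture / num_sources + noise
    x1 = sources

    # Assumption: straight-line path, whose velocity is constant in t.
    x_t = (1.0 - t) * x0 + t * x1
    velocity = x1 - x0
    return x_t, velocity, mixture


rng = np.random.default_rng(0)
sources = rng.standard_normal((3, 16))  # K = 3 toy sources, 16 samples each
x_t, velocity, mixture = make_training_pair(sources, t=0.3, rng=rng)

# Mixture consistency: every point on the path sums to the mixture.
print(np.allclose(x_t.sum(axis=0), mixture))
# The target velocity therefore has no component along the mixture.
print(np.allclose(velocity.sum(axis=0), 0.0))
# Any permutation of the sources yields the same mixture.
print(np.allclose(sources[::-1].sum(axis=0), mixture))
```

Under these assumptions both endpoints sum to the mixture, so the whole path does too, and a network trained to regress `velocity` from `x_t` and `t` would only need to predict zero-sum updates. The last check illustrates why a permutation-equivariant network is a natural fit: the mixture carries no information about the ordering of the sources.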