In the natural sciences, especially in disciplines such as biology and physics, Bayesian inference is becoming increasingly popular due to its ability both to quantify uncertainty in parameter values and to incorporate prior knowledge about quantities of interest. Bayesian statistics infers the posterior distribution $p( \theta | {y}) \propto p(y | \theta) p( \theta)$ of statistical parameters $\theta$ by conditioning a prior distribution $p( \theta)$ on data ${y}$. If the likelihood function $p({y} | \theta)$ is available, i.e., tractable to compute, conventional Bayesian inference using Markov chain Monte Carlo or variational methods can be used for parameter inference \citep{brooks2011handbook,wainwright2008graphical}. However, for many scientific hypotheses, the likelihood is not easy to compute and the experimenter merely has access to a simulator function ${sim}(\theta)$ that can generate synthetic data conditional on a parameter configuration $\theta$.

In the latter case, an emergent family of methods collectively called \textit{simulation-based inference} (SBI, \citet{cranmer2020frontier}) has been proposed. 
Traditionally, approximate Bayesian computation (ABC, \citet{sisson2018handbook}), and most successfully sequential Monte Carlo ABC (SMC-ABC; e.g., \citet{beaumont2009adaptive,lenormand2013adaptive}) or simulated annealing ABC (SABC; e.g., \citet{albert2015simulated}), has been used to infer approximate posterior distributions \citep{pritchard1999population,ratmann2007using}. More recently, methods that are based on neural density or density-ratio estimation have found increased application in the natural sciences due to their reduced computational cost and convincing inferential accuracy \citep{brehmer2018constraining,delaunoy2020lightning,gonccalves2020training,hermans2021towards,brehmer2021simulation,dax2021real}. Among these, several branches of methods exist. Likelihood-based methods \citep{papamakarios2019sequential,glockler2022variational} fit a surrogate model for the likelihood function using neural density estimators \citep{papamakarios2021normalizing}, which allows for conventional Bayesian inference and which has been shown to bring significant performance advantages in comparison to ABC methods with the same computational budget. \citet{cranmer2015approximating,durkan2020contrastive,hermans2020likelihood,thomas2022likelihood,delaunoy2022towards,miller2022contrastive} developed similar methods that target the likelihood-to-evidence ratio rather than the likelihood, while \citet{papamakarios2016fast,lueckmann2017flexible,greenberg2019automatic,deistler2022truncated,wildberger2023flow,sharrock2024sequential} developed methods that try to approximate the posterior distribution directly.

In the case of likelihood-based methods, the accuracy of posterior inferences might suffer due to the inability of neural density estimators to correctly approximate the likelihood, e.g., if the data are very high-dimensional or if the data lie on a low-dimensional manifold that is embedded in a higher-dimensional ambient space \citep{fefferman2016testing,kingma2018glow,greenberg2019automatic,cunningham2020normalizing,dai2020sliced,klein2021funnels}.

To overcome this limitation, we present a new method for simulation-based inference which we call \textit{Surjective Sequential Neural Likelihood} (SSNL) estimation. SSNL uses a surjective dimensionality-reducing normalizing flow to model the surrogate likelihood of a Bayesian model, thereby allowing improved density estimation and consequently improved posterior inferences. We evaluate SSNL on multiple experiments from the SBI, astrophysics and neuroscience literature and demonstrate that it achieves superior performance in comparison to state-of-the-art methods. Conversely, we also demonstrate negative examples in which our method should, both in theory and empirically, not yield a performance gain.
\begin{figure*}[h!]
\centering
\subfloat[Bijection.]{
\includegraphics[width=0.7\columnwidth]{fig/bijection.pdf}
\label{fig:bijection}%
}
\centering
\subfloat[Surjection.]{%
\includegraphics[width=0.7\columnwidth]{fig/surjection.pdf}
\label{fig:surjection}%
}
\caption{Conditional bijective and surjective flow layers illustrated with masked autoregressive flows. (a) A bijective flow layer $f^{-1}(y; \theta)$ transforms an input $y_j$ conditional on all previous values $y_{<j}$ and a parameter vector $\theta$ using a conditioner $c$ and transformer $\tau$. (b) The surjective flow layer $f^{-1}(y_{+}; y_{-}, \theta)$ first splits the vector into two components $y_{+}$ and $y_{-}$ and then uses the component $y_{-}$ as an additional conditioning variable. The implementations of the conditioner and transformer remain the same as for the bijection. To evaluate the likelihood, a surjective layer additionally computes the conditional density $p(y_{-}|z, \theta)$ (which, computationally, is done after the transform).}
\label{fig:bijection-vs-surjection}
\end{figure*}
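
As a minimal sketch of why the additional conditional density in Figure~\ref{fig:surjection} appears, consider a single surjective layer and write $z = f^{-1}(y_{+}; y_{-}, \theta)$. The symbol $J$ for the Jacobian is notation introduced here for illustration, and we assume that $f^{-1}$ is differentiable and invertible in $y_{+}$ for fixed $y_{-}$ and $\theta$. Applying the change-of-variables formula to $y_{+}$ only gives
\[
p(y | \theta) = p(y_{+} | y_{-}, \theta) \, p(y_{-} | \theta) = p(z | y_{-}, \theta) \, \left| \det J \right| \, p(y_{-} | \theta), \qquad J = \frac{\partial f^{-1}(y_{+}; y_{-}, \theta)}{\partial y_{+}},
\]
and rewriting the joint density of $z$ and $y_{-}$ with Bayes' rule, $p(z | y_{-}, \theta) \, p(y_{-} | \theta) = p(z | \theta) \, p(y_{-} | z, \theta)$, yields
\[
\log p(y | \theta) = \log p(z | \theta) + \log p(y_{-} | z, \theta) + \log \left| \det J \right| .
\]
Under these assumptions, the first and last terms have the same form as for a bijective layer, now acting on the lower-dimensional $z$, while the middle term is the conditional density of the component $y_{-}$ that is removed by the dimensionality reduction.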