var documenterSearchIndex = {"docs":
[{"location":"customized_layer/#Defining-Your-Own-Flow-Layer","page":"Customize your own flow layer","title":"Defining Your Own Flow Layer","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"In practice, users might want to define their own normalizing flow.  As briefly noted in What are normalizing flows?, the key is to define a customized normalizing flow layer, including its transformation and inverse, as well as the log-determinant of the Jacobian of the transformation. Bijectors.jl offers a convenient interface to define a customized bijection. We refer users to the documentation of Bijectors.jl for more details. Flux.jl is also a useful package, offering a convenient interface to define neural networks.","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"In this tutorial, we demonstrate how to define a customized normalizing flow layer, an Affine Coupling Layer (Dinh et al., 2016), using Bijectors.jl and Flux.jl.","category":"page"},{"location":"customized_layer/#Affine-Coupling-Flow","page":"Customize your own flow layer","title":"Affine Coupling Flow","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Given an input vector boldsymbolx, the general coupling transformation splits it into two parts: boldsymbolx_I_1 and boldsymbolx_Isetminus I_1. Only one part (e.g., boldsymbolx_I_1) undergoes a bijective transformation f, called the coupling law, whose parameters depend on the values of the other part (e.g., boldsymbolx_Isetminus I_1), which remains unchanged. 
","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"beginarrayllll\nc_I_1(cdot  f theta)  mathbbR^d rightarrow mathbbR^d  c_I_1^-1(cdot  f theta)  mathbbR^d rightarrow mathbbR^d \n boldsymbolx_I backslash I_1 mapsto boldsymbolx_I backslash I_1   boldsymboly_I backslash I_1 mapsto boldsymboly_I backslash I_1 \n boldsymbolx_I_1 mapsto fleft(boldsymbolx_I_1  thetaleft(boldsymbolx_Isetminus I_1right)right)   boldsymboly_I_1 mapsto f^-1left(boldsymboly_I_1  thetaleft(boldsymboly_Isetminus I_1right)right)\nendarray","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Here theta can be an arbitrary function, e.g., a neural network. As long as f(cdot theta(boldsymbolx_Isetminus I_1)) is invertible, c_I_1 is invertible, and the  Jacobian determinant of c_I_1 is easy to compute:","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"lefttextdet nabla_x c_I_1(x)right = lefttextdet nabla_x_I_1 f(x_I_1 theta(x_Isetminus I_1))right","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"The affine coupling layer is a special case of the coupling transformation, where the coupling law f is an affine function:","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"beginaligned\nboldsymbolx_I_1 mapsto boldsymbolx_I_1 odot sleft(boldsymbolx_Isetminus I_1right) + tleft(boldsymbolx_I setminus I_1right) \nboldsymbolx_I backslash I_1 mapsto boldsymbolx_I backslash I_1\nendaligned","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Here, s and t are arbitrary functions (often neural 
networks) called the \"scaling\" and \"translation\" functions, respectively.  They produce vectors of the same dimension as boldsymbolx_I_1.","category":"page"},{"location":"customized_layer/#Implementing-Affine-Coupling-Layer","page":"Customize your own flow layer","title":"Implementing Affine Coupling Layer","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"We start by defining a simple 3-layer multi-layer perceptron (MLP) using Flux.jl,  which will be used to define the scaling function s and the translation function t in the affine coupling layer.","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"using Flux\n\nfunction MLP_3layer(input_dim::Int, hdims::Int, output_dim::Int; activation=Flux.leakyrelu)\n    return Chain(\n        Flux.Dense(input_dim, hdims, activation),\n        Flux.Dense(hdims, hdims, activation),\n        Flux.Dense(hdims, output_dim),\n    )\nend","category":"page"},{"location":"customized_layer/#Construct-the-Object","page":"Customize your own flow layer","title":"Construct the Object","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Following the user interface of Bijectors.jl, we define a struct AffineCoupling as a subtype of Bijectors.Bijector. The functions partition and combine are used to partition a vector into 3 disjoint subvectors and to recombine them.  PartitionMask stores this partition rule.  
These three functions are all defined in Bijectors.jl; see the documentation for more details.","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"using Functors\nusing Bijectors\nusing Bijectors: partition, combine, PartitionMask\n\nstruct AffineCoupling <: Bijectors.Bijector\n    dim::Int\n    mask::Bijectors.PartitionMask\n    s::Flux.Chain\n    t::Flux.Chain\nend\n\n# to apply functions to the parameters that are contained in AffineCoupling.s and AffineCoupling.t, \n# and to re-build the struct from the parameters, we use the functor interface of `Functors.jl` \n# see https://fluxml.ai/Flux.jl/stable/models/functors/#Functors.functor\n@functor AffineCoupling (s, t)\n\nfunction AffineCoupling(\n    dim::Int,  # dimension of input\n    hdims::Int, # dimension of hidden units for s and t\n    mask_idx::AbstractVector, # index of dimension that one wants to apply transformations on\n)\n    cdims = length(mask_idx) # dimension of the transformed part x₁\n    # s and t take the conditioning part x₂ (length dim - cdims) and return length-cdims vectors\n    s = MLP_3layer(dim - cdims, hdims, cdims)\n    t = MLP_3layer(dim - cdims, hdims, cdims)\n    mask = PartitionMask(dim, mask_idx)\n    return AffineCoupling(dim, mask, s, t)\nend","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"By default, we define s and t using the MLP_3layer function, which builds a 3-layer MLP with leaky ReLU activations.","category":"page"},{"location":"customized_layer/#Implement-the-Forward-and-Inverse-Transformations","page":"Customize your own flow layer","title":"Implement the Forward and Inverse Transformations","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"function Bijectors.transform(af::AffineCoupling, x::AbstractVector)\n    # partition vector using `af.mask::PartitionMask`\n    x₁, x₂, x₃ = 
partition(af.mask, x)\n    y₁ = x₁ .* af.s(x₂) .+ af.t(x₂)\n    return combine(af.mask, y₁, x₂, x₃)\nend\n\nfunction Bijectors.transform(iaf::Inverse{<:AffineCoupling}, y::AbstractVector)\n    af = iaf.orig\n    # partition vector using `af.mask::PartitionMask`\n    y_1, y_2, y_3 = partition(af.mask, y)\n    # inverse transformation\n    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)\n    return combine(af.mask, x_1, y_2, y_3)\nend","category":"page"},{"location":"customized_layer/#Implement-the-Log-determinant-of-the-Jacobian","page":"Customize your own flow layer","title":"Implement the Log-determinant of the Jacobian","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Notice that here we wrap the transformation and the log-determinant of the Jacobian into a single function, with_logabsdet_jacobian.","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"function Bijectors.with_logabsdet_jacobian(af::AffineCoupling, x::AbstractVector)\n    x_1, x_2, x_3 = Bijectors.partition(af.mask, x)\n    y_1 = af.s(x_2) .* x_1 .+ af.t(x_2)\n    logjac = sum(log ∘ abs, af.s(x_2))\n    return combine(af.mask, y_1, x_2, x_3), logjac\nend\n\nfunction Bijectors.with_logabsdet_jacobian(\n    iaf::Inverse{<:AffineCoupling}, y::AbstractVector\n)\n    af = iaf.orig\n    # partition vector using `af.mask::PartitionMask`\n    y_1, y_2, y_3 = partition(af.mask, y)\n    # inverse transformation\n    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)\n    logjac = -sum(log ∘ abs, af.s(y_2))\n    return combine(af.mask, x_1, y_2, y_3), logjac\nend","category":"page"},{"location":"customized_layer/#Construct-Normalizing-Flow","page":"Customize your own flow layer","title":"Construct Normalizing Flow","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow 
layer","text":"Now with all the above implementations, we are ready to use the AffineCoupling layer for normalizing flow  by applying it to a base distribution q_0.","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"using Random, Distributions, LinearAlgebra\ndim = 4\nhdims = 10\nLs = [\n    AffineCoupling(dim, hdims, 1:2), \n    AffineCoupling(dim, hdims, 3:4), \n    AffineCoupling(dim, hdims, 1:2), \n    AffineCoupling(dim, hdims, 3:4), \n    ]\nts = reduce(∘, Ls)\nq₀ = MvNormal(zeros(Float32, dim), I)\nflow = Bijectors.transformed(q₀, ts)","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"We can now sample from the flow:","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"x = rand(flow, 10)","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"And evaluate the density of the flow:","category":"page"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"logpdf(flow, x[:,1])","category":"page"},{"location":"customized_layer/#Reference","page":"Customize your own flow layer","title":"Reference","text":"","category":"section"},{"location":"customized_layer/","page":"Customize your own flow layer","title":"Customize your own flow layer","text":"Dinh, L., Sohl-Dickstein, J. and Bengio, S., 2016. Density estimation using real nvp.  
arXiv:1605.08803.","category":"page"},{"location":"api/#API","page":"API","title":"API","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"","category":"page"},{"location":"api/#Main-Function","page":"API","title":"Main Function","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"NormalizingFlows.train_flow","category":"page"},{"location":"api/#NormalizingFlows.train_flow","page":"API","title":"NormalizingFlows.train_flow","text":"train_flow([rng::AbstractRNG, ]vo, flow, args...; kwargs...)\n\nTrain the given normalizing flow flow by calling optimize.\n\nArguments\n\nrng::AbstractRNG: random number generator\nvo: variational objective\nflow: normalizing flow to be trained; we recommend defining flow as <:Bijectors.TransformedDistribution \nargs...: additional arguments for vo\n\nKeyword Arguments\n\nmax_iters::Int=1000: maximum number of iterations\noptimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps\nADbackend::ADTypes.AbstractADType=ADTypes.AutoZygote():    automatic differentiation backend, currently supports   ADTypes.AutoZygote(), ADTypes.AutoForwardDiff(), and ADTypes.AutoReverseDiff(). \nkwargs...: additional keyword arguments for optimize (See optimize for details)\n\nReturns\n\nflow_trained: trained normalizing flow\nopt_stats: statistics of the optimiser during the training process    (See optimize for details)\nst: optimiser state for potential continuation of training\n\n\n\n\n\n","category":"function"},{"location":"api/","page":"API","title":"API","text":"The flow object can be constructed with the transformed function from the Bijectors.jl package. 
For example, for Gaussian VI, we can construct the flow as follows:","category":"page"},{"location":"api/","page":"API","title":"API","text":"using Distributions, Bijectors\nT = Float32\nq₀ = MvNormal(zeros(T, 2), ones(T, 2))\nflow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))","category":"page"},{"location":"api/","page":"API","title":"API","text":"To train this Gaussian VI flow to target a distribution p via ELBO maximization, we can run","category":"page"},{"location":"api/","page":"API","title":"API","text":"using NormalizingFlows\nusing Optimisers\n\n# `logp` is the log-density function of the target distribution p\nsample_per_iter = 10\nflow_trained, stats, _ = train_flow(\n    elbo,\n    flow,\n    logp,\n    sample_per_iter;\n    max_iters=2_000,\n    optimiser=Optimisers.ADAM(0.01 * one(T)),\n)","category":"page"},{"location":"api/#Variational-Objectives","page":"API","title":"Variational Objectives","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"We have implemented two variational objectives, namely, the ELBO and the log-likelihood objective.  Users can also define their own objective functions and pass them to the train_flow function. train_flow will optimize the flow parameters by maximizing vo. The objective function should take the following general form:","category":"page"},{"location":"api/","page":"API","title":"API","text":"vo(rng, flow, args...) ","category":"page"},{"location":"api/","page":"API","title":"API","text":"where rng is the random number generator, flow is the flow object, and args... 
are the additional arguments that users can pass to the objective function.","category":"page"},{"location":"api/#Evidence-Lower-Bound-(ELBO)","page":"API","title":"Evidence Lower Bound (ELBO)","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between q_theta and p, i.e., ","category":"page"},{"location":"api/","page":"API","title":"API","text":"beginaligned\nmin _theta mathbbE_q_thetaleftlog q_theta(Z)-log p(Z)right  quad text(Reverse KL)\n = max _theta mathbbE_q_0left log pleft(T_N circ cdots circ\nT_1(Z_0)right)-log q_0(X)+sum_n=1^N log J_nleft(F_n circ cdots circ\nF_1(X)right)right quad text(ELBO) \nendaligned","category":"page"},{"location":"api/","page":"API","title":"API","text":"Reverse KL minimization is typically used for Bayesian computation,  where one only has access to the log-density (possibly unnormalized) of the target distribution p (e.g., a Bayesian posterior distribution),  and hopes to generate approximate samples from it.","category":"page"},{"location":"api/","page":"API","title":"API","text":"NormalizingFlows.elbo","category":"page"},{"location":"api/#NormalizingFlows.elbo","page":"API","title":"NormalizingFlows.elbo","text":"elbo(flow, logp, xs) \nelbo([rng, ]flow, logp, n_samples)\n\nCompute the ELBO for a batch of samples xs from the reference distribution flow.dist.\n\nArguments\n\nrng: random number generator\nflow: variational distribution to be trained. 
In particular  flow = transformed(q₀, T::Bijectors.Bijector),  where q₀ is a reference distribution that is easy to sample from and whose logpdf is easy to compute\nlogp: log-pdf of the target distribution (not necessarily normalized)\nxs: samples from reference dist q₀\nn_samples: number of samples from reference dist q₀\n\n\n\n\n\n","category":"function"},{"location":"api/#Log-likelihood","page":"API","title":"Log-likelihood","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between q_theta and p, i.e., ","category":"page"},{"location":"api/","page":"API","title":"API","text":"beginaligned\n min_theta mathbbE_pleftlog p(Z)-log q_theta(Z)right quad text(Forward KL) \n = max_theta mathbbE_pleftlog q_theta(Z)right quad text(Expected log-likelihood)\nendaligned","category":"page"},{"location":"api/","page":"API","title":"API","text":"Forward KL minimization is typically used for generative modeling,  where one is given a set of samples from the target distribution p (e.g., images) and aims to learn the density or a generative process that outputs high quality samples.","category":"page"},{"location":"api/","page":"API","title":"API","text":"NormalizingFlows.loglikelihood","category":"page"},{"location":"api/#NormalizingFlows.loglikelihood","page":"API","title":"NormalizingFlows.loglikelihood","text":"loglikelihood(flow::Bijectors.TransformedDistribution, xs::AbstractVecOrMat)\n\nCompute the log-likelihood for variational distribution flow at a batch of samples xs from  the target distribution p. \n\nArguments\n\nflow: variational distribution to be trained. 
In particular  \"flow = transformed(q₀, T::Bijectors.Bijector)\",  where q₀ is a reference distribution that is easy to sample from and whose logpdf is easy to compute\nxs: samples from the target distribution p.\n\n\n\n\n\n","category":"function"},{"location":"api/#Training-Loop","page":"API","title":"Training Loop","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"NormalizingFlows.optimize","category":"page"},{"location":"api/#NormalizingFlows.optimize","page":"API","title":"NormalizingFlows.optimize","text":"optimize(\n    rng::AbstractRNG, \n    ad::ADTypes.AbstractADType, \n    vo, \n    θ₀::AbstractVector{T}, \n    re, \n    args...; \n    kwargs...\n)\n\nIteratively update the parameters θ of the normalizing flow re(θ) by calling grad!  and using the given optimiser to compute the steps.\n\nArguments\n\nrng::AbstractRNG: random number generator\nad::ADTypes.AbstractADType: automatic differentiation backend\nvo: variational objective\nθ₀::AbstractVector{T}: initial parameters of the normalizing flow\nre: function that reconstructs the normalizing flow from the flattened parameters\nargs...: additional arguments for vo\n\nKeyword Arguments\n\nmax_iters::Int=10000: maximum number of iterations\noptimiser::Optimisers.AbstractRule=Optimisers.ADAM(): optimiser to compute the steps\nshow_progress::Bool=true: whether to show the progress bar. The default information printed in the progress bar is the iteration number, the loss value, and the gradient norm.\ncallback=nothing: callback function with signature cb(iter, opt_state, re, θ) which returns a dictionary-like object of statistics to be displayed in the progress bar. re and θ are used for reconstructing the normalizing flow in case the user  wants to further examine the status of the flow.\nhasconverged = (iter, opt_stats, re, θ, st) -> false: function that checks whether the training has converged. 
The default is to always return false.\nprog=ProgressMeter.Progress(           max_iters; desc=\"Training\", barlen=31, showspeed=true, enabled=show_progress       ): progress bar configuration\n\nReturns\n\nθ: trained parameters of the normalizing flow\nopt_stats: statistics of the optimiser\nst: optimiser state for potential continuation of training\n\n\n\n\n\n","category":"function"},{"location":"api/#Utility-Functions-for-Taking-Gradient","page":"API","title":"Utility Functions for Taking Gradient","text":"","category":"section"},{"location":"api/","page":"API","title":"API","text":"NormalizingFlows.grad!\nNormalizingFlows.value_and_gradient!","category":"page"},{"location":"api/#NormalizingFlows.grad!","page":"API","title":"NormalizingFlows.grad!","text":"grad!(\n    rng::AbstractRNG,\n    ad::ADTypes.AbstractADType,\n    vo,\n    θ_flat::AbstractVector{<:Real},\n    reconstruct,\n    out::DiffResults.MutableDiffResult,\n    args...\n)\n\nCompute the value and gradient of the negated variational objective vo  at θ_flat using the automatic differentiation backend ad.  \n\nA default implementation is provided for ad where ad is one of AutoZygote,  AutoForwardDiff, AutoReverseDiff (with no compiled tape), and AutoEnzyme. The result is stored in out.\n\nArguments\n\nrng::AbstractRNG: random number generator\nad::ADTypes.AbstractADType: automatic differentiation backend, currently supports   ADTypes.AutoZygote(), ADTypes.AutoForwardDiff(), and ADTypes.AutoReverseDiff(). 
\nvo: variational objective\nθ_flat::AbstractVector{<:Real}: flattened parameters of the normalizing flow\nreconstruct: function that reconstructs the normalizing flow from the flattened parameters\nout::DiffResults.MutableDiffResult: mutable diff result to store the value and gradient\nargs...: additional arguments for vo\n\n\n\n\n\n","category":"function"},{"location":"api/#NormalizingFlows.value_and_gradient!","page":"API","title":"NormalizingFlows.value_and_gradient!","text":"value_and_gradient!(\n    ad::ADTypes.AbstractADType,\n    f,\n    θ::AbstractVector{T},\n    out::DiffResults.MutableDiffResult\n) where {T<:Real}\n\nCompute the value and gradient of a function f at θ using the automatic differentiation backend ad.  The result is stored in out.  The function f must return a scalar value. The gradient is stored in out as a vector of the same length as θ.\n\n\n\n\n\n","category":"function"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = NormalizingFlows","category":"page"},{"location":"#NormalizingFlows.jl","page":"Home","title":"NormalizingFlows.jl","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Documentation for NormalizingFlows.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The purpose of this package is to provide a simple and flexible interface for  variational inference (VI) and normalizing flows (NF) for Bayesian computation and generative modeling. The key focus is to ensure modularity and extensibility, so that users can easily  construct (e.g., define customized flow layers) and combine various components  (e.g., choose different VI objectives or gradient estimates)  for variational approximation of general target distributions,  without being tied to specific probabilistic programming frameworks or applications. ","category":"page"},{"location":"","page":"Home","title":"Home","text":"See the documentation for more.  
","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"To install the package, run the following command in the Julia REPL:","category":"page"},{"location":"","page":"Home","title":"Home","text":"]  # enter Pkg mode\n(@v1.9) pkg> add git@github.com:TuringLang/NormalizingFlows.jl.git","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then simply run the following command to use the package:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using NormalizingFlows","category":"page"},{"location":"#What-are-normalizing-flows?","page":"Home","title":"What are normalizing flows?","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Normalizing flows transform a simple reference distribution q_0 (sometimes known as base distribution) to  a complex distribution q_theta using invertible functions with trainable parameter theta, aiming to approximate a target distribution p. The approximation is achieved by minimizing some statistical distances between q and p.","category":"page"},{"location":"","page":"Home","title":"Home","text":"In more details, given the base distribution, usually a standard Gaussian distribution, i.e., q_0 = mathcalN(0 I), we apply a series of parameterized invertible transformations (called flow layers), T_1 theta_1 cdots T_N theta_k, yielding that","category":"page"},{"location":"","page":"Home","title":"Home","text":"Z_N = T_N theta_N circ cdots circ T_1 theta_1 (Z_0)  quad Z_0 sim q_0quad  Z_N sim q_theta ","category":"page"},{"location":"","page":"Home","title":"Home","text":"where theta = (theta_1 dots theta_N) are the parameters to be learned, and q_theta is the transformed distribution (typically called the variational distribution or the flow distribution).  
This describes the sampling procedure of normalizing flows, which requires sending draws from the base distribution through a forward pass of these flow layers.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Since all the transformations are invertible (technically diffeomorphic), we can evaluate the density of a normalizing flow distribution q_theta by the change of variables formula:","category":"page"},{"location":"","page":"Home","title":"Home","text":"q_theta(x)=fracq_0left(T_1^-1 circ cdots circ\nT_N^-1(x)right)prod_n=1^N J_nleft(T_n^-1 circ cdots circ\nT_N^-1(x)right) quad J_n(x)=leftoperatornamedet nabla_x\nT_n(x)right","category":"page"},{"location":"","page":"Home","title":"Home","text":"Here we drop the subscript theta_n n = 1 dots N for simplicity.  Density evaluation of a normalizing flow requires computing the inverse and the Jacobian determinant of each flow layer.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Given the feasibility of i.i.d. sampling and density evaluation, normalizing flows can be trained by minimizing some statistical distance to the target distribution p. 
The typical choices of statistical distance are the forward and reverse Kullback-Leibler (KL) divergences, which lead to the following optimization problems:","category":"page"},{"location":"","page":"Home","title":"Home","text":"beginaligned\ntextReverse KLquad\nargmin _theta mathbbE_q_thetaleftlog q_theta(Z)-log p(Z)right \n= argmin _theta mathbbE_q_0leftlog fracq_theta(T_Ncirc cdots circ T_1(Z_0))p(T_Ncirc cdots circ T_1(Z_0))right \n= argmax _theta mathbbE_q_0left log pleft(T_N circ cdots circ T_1(Z_0)right)-log q_0(X)+sum_n=1^N log J_nleft(F_n circ cdots circ F_1(X)right)right\nendaligned","category":"page"},{"location":"","page":"Home","title":"Home","text":"and ","category":"page"},{"location":"","page":"Home","title":"Home","text":"beginaligned\ntextForward KLquad\nargmin _theta mathbbE_pleftlog p(Z)-log q_theta(Z)right \n= argmax _theta mathbbE_pleftlog q_theta(Z)right \nendaligned","category":"page"},{"location":"","page":"Home","title":"Home","text":"Both problems can be solved via standard stochastic optimization algorithms, such as stochastic gradient descent (SGD) and its variants. ","category":"page"},{"location":"example/#Example:-Using-Planar-Flow","page":"Example","title":"Example: Using Planar Flow","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"Here we provide a minimal demonstration of learning a synthetic 2d banana distribution using planar flows (Rezende and Mohamed, 2015) by maximizing the Evidence Lower Bound (ELBO). To complete this task, the two key inputs are:","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"the log-density function of the target distribution, \nthe planar flow. 
","category":"page"},{"location":"example/#The-Target-Distribution","page":"Example","title":"The Target Distribution","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"The Banana object is defined in example/targets/banana.jl, see the source code for details.","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"p = Banana(2, 1.0f-1, 100.0f0)\nlogp = Base.Fix1(logpdf, p)","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"Visualize the contour of the log-density and the sample scatters of the target distribution:  (Image: Banana)","category":"page"},{"location":"example/#The-Planar-Flow","page":"Example","title":"The Planar Flow","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"The planar flow is defined by repeated applying a sequence of invertible transformations to a base distribution q_0.  The building blocks for a planar flow of length N are the following invertible transformations, called planar layers:","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"textplanar layers \nT_n theta_n(x)=x+u_n cdot tanh left(w_n^T x+b_nright) quad n=1 ldots N ","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"where theta_n = (u_n w_n b_n) n=1 dots N are the parameters to be learned.  Thankfully, Bijectors.jl provides a nice framework to define a normalizing flow. 
Here we use the PlanarLayer() from Bijectors.jl to construct a  20-layer planar flow, whose base distribution is a 2d standard Gaussian distribution.","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"using Bijectors, FunctionChains\nusing Flux: f32 # converts layer parameters to Float32\nusing Distributions, LinearAlgebra # for MvNormal and I\n\nfunction create_planar_flow(n_layers::Int, q₀)\n    d = length(q₀)\n    Ls = [f32(PlanarLayer(d)) for _ in 1:n_layers]\n    ts = fchain(Ls)\n    return transformed(q₀, ts)\nend\n\n# create a 20-layer planar flow\nflow = create_planar_flow(20, MvNormal(zeros(Float32, 2), I))\nflow_untrained = deepcopy(flow) # keep a copy of the untrained flow for comparison","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"Notice that here the flow layers are chained together using the fchain function from FunctionChains.jl.  Alternatively, one can do","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"ts = reduce(∘, [f32(PlanarLayer(d)) for i in 1:20]) ","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"However, we recommend using fchain to reduce the compilation time when the number of layers is large. 
See this comment for how the compilation time might be a concern.","category":"page"},{"location":"example/#Flow-Training","page":"Example","title":"Flow Training","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"Then we can train the flow by maximizing the ELBO using the train_flow function as follows: ","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"using NormalizingFlows\nusing ADTypes\nusing Optimisers\n\nsample_per_iter = 10\n# callback function to track the number of samples used per iteration\ncb(iter, opt_stats, re, θ) = (sample_per_iter=sample_per_iter,)\n# stopping criterion: stop when the gradient norm is less than 1e-3\ncheckconv(iter, stat, re, θ, st) = stat.gradient_norm < 1e-3\nflow_trained, stats, _ = train_flow(\n    elbo,\n    flow,\n    logp,\n    sample_per_iter;\n    max_iters=20_000,\n    optimiser=Optimisers.ADAM(),\n    callback=cb,\n    hasconverged=checkconv,\n    ADbackend=AutoZygote(), # using Zygote as the AD backend\n)","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"Examine the loss values during training:","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"using Plots\n\nlosses = map(x -> x.loss, stats)\nplot(losses; xlabel = \"#iteration\", ylabel= \"negative ELBO\", label=\"\", linewidth=2) ","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"(Image: elbo)","category":"page"},{"location":"example/#Evaluating-Trained-Flow","page":"Example","title":"Evaluating Trained Flow","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"Finally, we can evaluate the trained flow by sampling from it and comparing the samples with the target distribution. Since the flow is defined as a Bijectors.TransformedDistribution, one can easily sample from it using the rand function, or examine the density using the logpdf function. 
See the documentation of Bijectors.jl for details.","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"using Random, Distributions\n\nn_samples = 1000\nsamples_trained = rand(flow_trained, n_samples) # 1000 iid samples from the trained flow \nsamples_untrained = rand(flow_untrained, n_samples) # 1000 iid samples from the untrained flow\nsamples_true = rand(p, n_samples) # 1000 iid samples from the target\n\n# plot \nscatter(samples_true[1, :], samples_true[2, :]; label=\"True Distribution\", color=:blue, markersize=2, alpha=0.5)\nscatter!(samples_untrained[1, :], samples_untrained[2, :]; label=\"Untrained Flow\", color=:red, markersize=2, alpha=0.5)\nscatter!(samples_trained[1, :], samples_trained[2, :]; label=\"Trained Flow\", color=:green, markersize=2, alpha=0.5)\nplot!(title = \"Comparison of Trained and Untrained Flow\", xlabel = \"X\", ylabel= \"Y\", legend=:topleft) ","category":"page"},{"location":"example/","page":"Example","title":"Example","text":"(Image: compare)","category":"page"},{"location":"example/#Reference","page":"Example","title":"Reference","text":"","category":"section"},{"location":"example/","page":"Example","title":"Example","text":"Rezende, D. and Mohamed, S., 2015. Variational inference with normalizing flows. International Conference on Machine Learning.","category":"page"}]
}
