['6c6,9', '< The moments of a random variable are arguably the most popular all-purpose statistic. However, cumulants are often more favorable statistics than moments. For example, if µ m := E[X m ] denotes the moments of a real-valued random variable X, then µ 2 = µ 2 1 + Var(X) and hence the variance that directly measures the fluctuation around the mean is a much better statistic for scale than the second moment µ 2 , see Appendix A. Cumulants provide a systematic way to only record the parts of the moment sequence that are not already captured by lower-order moments. While the moment and cumulant sequences (µ m ) m and (κ m ) m carry the same information, cumulants have several desirable properties that generalize to R d -valued random variables (McCullagh, 2018). Among these properties of cumulants, the ones that are important for our paper is that they can characterize distributions and statistical (in)dependence.', '---', '> While moments are widely used as all-purpose statistics for random variables, cumulants often offer more favorable properties, particularly in terms of interpretability, statistical efficiency, and characterization of independence. For instance, the variance, a key measure of fluctuation, is a cumulant (specifically, the second cumulant) and often provides a more robust and intuitive measure of scale than the second moment, which is heavily influenced by the mean (see Appendix A for a detailed discussion). Cumulants systematically capture the novel information in the moment sequence not already described by lower-order moments. Although moments and cumulants carry equivalent information, cumulants possess several desirable properties that extend to R d -valued random variables (McCullagh, 2018), notably their ability to characterize distributions and statistical (in)dependence.', '> ', '> This paper proposes a novel framework for extending cumulants to Reproducing Kernel Hilbert Spaces (RKHSs). 
This extension, which we term "kernelized cumulants," leverages tools from tensor algebras and is made computationally tractable through a kernel trick. Kernelized cumulants introduce a new class of all-purpose statistics for complex data, where classical measures like the Maximum Mean Discrepancy (MMD) and the Hilbert-Schmidt Independence Criterion (HSIC) emerge as specific degree-one instances of our generalized construction. We demonstrate both theoretically and empirically that incorporating higher-degree kernelized cumulants offers significant advantages, such as enhanced statistical power in hypothesis testing, with computational costs comparable to their degree-one counterparts.', '> ', '20,31c23,34', '< Instead of directly considering the law of a tuple of random variables (X 1 , . . . , X d ) in a product space X 1 × • • • × X d , it can be advantageous to use feature maps Φ i : X i → H i and instead study the distribution of the H 1 × • • • × H d -valued random variable Φ 1 (X 1 ), . . . , Φ d (X d ) . Motivated by this lifting, we study here moments of Hilbert-space valued random variables and assume in this subsection (with a slight abuse of notations) that one has already applied the lifting and X i ∈ H i where i = 1, . . . , d. In Section 3 we specialize the construction to RKHSs, and use these moments (Def. 1) to define kernelized cumulants.', '< Moments. In the finite-dimensional case (1) we defined the moment sequence by taking expectations of products of the coordinates of the underlying random variable. For the infinite-dimensional case, it is convenient to develop a coordinate-free definition which can be accomplished by using tensors. To do so we make use of the following results about Hilbert spaces: for real Hilbert spaces H 1 and H 2 the tensor product H 1 ⊗ H 2 is the Hilbert space given by completion of the tensor product of H 1 and H 2 as vector space; we also write H ⊗m', '< 1 : = H 1 ⊗ • • • ⊗ H 1 m-times', '< . 
Similarly, the direct sum', '< H 1 ⊕ H 2 is a Hilbert space. It is natural to consider E X ⊗m 1 ∈ H ⊗m 1', '< as the m-th moment of a H 1 -valued random variable X 1 where the integral in the expectation is meant in Bochner sense. Consequently the natural state space for all moments of a H 1 -valued random variable is the tensor algebra T 1 := m≥0 H ⊗m 1 where by convention H ⊗0 1 := R. See Appendix B for more details on tensor products of Hilbert spaces and tensor algebras.', '< Example 2.1 (H 1 = R d , m = 2). If X 1 = X 1 1 , . . . , X d 1 is H 1 = R d -valued then E X ⊗2 1 ∈ (R d ) ⊗2', '< can be identified with a (d × d)-sized matrix whose (i, j)-th entry is E X i 1 X j 1 .', '< Since we are interested in the general case of a H 1 × • • • × H d -valued random variable X = (X 1 , . . . , X d ) we arrive at the definition below.', '< Definition 1 (Moments in Hilbert spaces). Let γ be a probability measure on H : = H 1 × • • • × H d and let (X 1 , . . . , X d ) ∼ γ. We define', '< µ i (γ) := E[X ⊗i1 1 ⊗ • • • ⊗ X ⊗i d d ] ∈ H ⊗i , H ⊗i : = H ⊗i1 1 ⊗ • • • ⊗ H ⊗i d d (3)', '< for every i ∈ N d whenever the above expectation exists. The moment sequence is defined as the element', '---', "> Instead of directly analyzing the joint distribution of random variables (X 1 , . . . , X d ) in their original product space X 1 × • • • × X d , it is often advantageous to map them into higher-dimensional feature spaces. This is achieved through feature maps Φ i : X i → H i , transforming the original random variables into H 1 × • • • × H d -valued random variables Φ 1 (X 1 ), . . . , Φ d (X d ). This 'lifting' allows us to capture complex nonlinear relationships that might be intractable in the original space. Motivated by this, we first develop the theory of moments for general Hilbert-space valued random variables. For this subsection, we assume that this lifting has already occurred, and thus consider X i ∈ H i for i = 1, . . . , d. 
In Section 3, we will specialize this construction to Reproducing Kernel Hilbert Spaces (RKHSs) and leverage these generalized moments (Definition 1) to define our novel kernelized cumulants.", '> ', '> Moments. In the finite-dimensional setting, moments are typically defined by taking expectations of products of coordinates of the random variable, as shown in (1). For the infinite-dimensional case, a coordinate-free definition is more suitable and can be elegantly formulated using tensor products. To facilitate this, we briefly recall key concepts about Hilbert spaces: for real Hilbert spaces H 1 and H 2 , their tensor product H 1 ⊗ H 2 is the Hilbert space obtained by completing the algebraic tensor product. We also denote H ⊗m', '> 1 : = H 1 ⊗ • • • ⊗ H 1 (m-times)', '> . Similarly, the direct sum H 1 ⊕ H 2 is also a Hilbert space. The m-th moment of an H 1 -valued random variable X 1 is naturally defined as E X ⊗m 1 ∈ H ⊗m 1', '> , where the integral is understood in the Bochner sense. Consequently, the natural state space encompassing all moments of an H 1 -valued random variable is the tensor algebra T 1 := m≥0 H ⊗m 1 , with the convention that H ⊗0 1 := R. Further details on tensor products of Hilbert spaces and tensor algebras are provided in Appendix B.', '> Example 2.1 (H 1 = R d , m = 2). If X 1 = (X 1 1 , . . . , X d 1 ) is H 1 = R d -valued, then E X ⊗2 1 ∈ (R d ) ⊗2', '> can be identified with a (d × d)-sized matrix whose (i, j)-th entry is E X i 1 X j 1 .', '> Since our interest lies in the general case of an H 1 × • • • × H d -valued random variable X = (X 1 , . . . , X d ), we arrive at the following generalized definition.', '> Definition 1 (Moments in Hilbert spaces). Let γ be a probability measure on H : = H 1 × • • • × H d and let (X 1 , . . . , X d ) ∼ γ. 
We define the multi-indexed moments as', '> µ i (γ) := E[X ⊗i1 1 ⊗ • • • ⊗ X ⊗i d d ] ∈ H ⊗i , H ⊗i : = H ⊗i1 1 ⊗ • • • ⊗ H ⊗i d d (3)', '> for every i ∈ N d , provided the expectation exists. The complete moment sequence is defined as the element', '33,35c36,37', '< and for m ∈ N we refer to µ m (γ) = i∈N d :deg(i)=m µ i (γ) as the m-moments of γ.', '< In case of H i = R, both definitions (1) and (3) apply for µ i (γ). Henceforth, we always refer to (3) when we write µ i (γ). Even in the finite-dimensional case, Def. 1 is useful, for instance when', '< X 1 ∈ H 1 and X 2 ∈ H 2 have different state space (H 1 ̸ = H 2 ).', '---', '> and for any m ∈ N, we refer to µ m (γ) = i∈N d :deg(i)=m µ i (γ) as the m-th moments of γ.', '> It is important to note that in the special case where H i = R for all i, both definitions (1) and (3) are applicable for µ i (γ) and yield equivalent results. Henceforth, we exclusively refer to (3) when writing µ i (γ). Even in finite-dimensional scenarios, Definition 1 proves valuable, for instance, when random variables X 1 ∈ H 1 and X 2 ∈ H 2 originate from different state spaces (H 1 ̸ = H 2 ).', '38,40c40,42', '< We lift a random variable', '< X= (X 1 , . . . , X d ) ∈ X = X 1 × • • • × X d via a feature map Φ : X → H into a Hilbert space valued random variable Φ(X).', '< For the rest of the paper (i) X 1 , . . . , X d will denote a collection of Polish spaces, but the reader is invited to think of them as finite-dimensional Euclidean spaces, (ii) H is an RKHS with kernel k and canonical feature map Φ(x) = k(x, •),2 and (iii) all kernels are assumed to be bounded. 3 Our main results (Theorem 2 and Theorem 3) are that in this case the expected kernel trick applies to both items in the kernelized version of Theorem 1. 
The key to these results is an expression for inner products of cumulants in RKHSs (Lemma 1).', '---', '> To extend cumulants to complex data types and capture non-linear dependencies, we lift a random variable', '> X= (X 1 , . . . , X d ) ∈ X = X 1 × • • • × X d via a feature map Φ : X → H into a Hilbert space-valued random variable Φ(X). This allows us to implicitly work in a high-dimensional feature space without explicitly computing coordinates, a crucial aspect for tractability.', '> For the remainder of this paper, we assume: (i) X 1 , . . . , X d are Polish spaces (though the reader may intuitively consider them as finite-dimensional Euclidean spaces); (ii) H is a Reproducing Kernel Hilbert Space (RKHS) equipped with a kernel k and its canonical feature map Φ(x) = k(x, •); 2 and (iii) all kernels are bounded. 3 Our central findings, Theorem 2 and Theorem 3, demonstrate that the expected kernel trick extends to both the characterization of distributions and independence in this kernelized framework, mirroring the properties of classical cumulants in R d (Theorem 1). A pivotal component for these results is the derivation of an expression for inner products of kernelized cumulants within RKHSs (Lemma 1).', '673d674', '< ']
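As a hedged illustration of Example 2.1 (H 1 = R d , m = 2), the sketch below estimates the second moment E[X ⊗ X] as a (d × d)-sized matrix from samples, then subtracts the outer product of the mean to obtain the second cumulant (the covariance). The function names are hypothetical and do not come from the paper, and expectations are replaced by plain sample averages. It is only meant to show that shifting the mean changes the second moment but leaves the second cumulant unchanged.

```python
def mean_vector(samples):
    """Sample estimate of the first moment E[X] for R^d-valued samples."""
    n, d = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(d)]


def second_moment(samples):
    """Sample estimate of E[X (x) X], identified with a d x d matrix
    whose (i, j)-th entry is the average of x[i] * x[j]."""
    n, d = len(samples), len(samples[0])
    return [[sum(x[i] * x[j] for x in samples) / n for j in range(d)]
            for i in range(d)]


def second_cumulant(samples):
    """Second cumulant: E[X (x) X] - E[X] (x) E[X], i.e. the covariance
    (population normalisation 1/n, an assumption of this sketch)."""
    mu1 = mean_vector(samples)
    mu2 = second_moment(samples)
    d = len(mu1)
    return [[mu2[i][j] - mu1[i] * mu1[j] for j in range(d)] for i in range(d)]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 6.0]]
    shifted = [[x[0] + 10.0, x[1] + 10.0] for x in data]
    print(second_moment(data), second_moment(shifted))      # differ
    print(second_cumulant(data), second_cumulant(shifted))  # agree
```

In one dimension this reduces to the identity µ 2 = µ 2 1 + Var(X) quoted in the introduction.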

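For the degree-one case mentioned in the revised introduction, the following minimal sketch shows the kernel trick on the standard biased (V-statistic) estimate of the squared MMD: inner products of mean embeddings are computed from kernel evaluations alone, without feature-space coordinates. The Gaussian kernel, its bandwidth, and all function names are assumptions chosen for illustration. The sketch does not implement the higher-degree kernelized cumulants of the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Bounded kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, kernel):
    """Average of k(x, y) over all pairs; this estimates the inner
    product of the two mean embeddings in the RKHS."""
    total = sum(kernel(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of MMD^2 = <mu_X, mu_X> + <mu_Y, mu_Y> - 2 <mu_X, mu_Y>."""
    return (mean_kernel(xs, xs, kernel)
            + mean_kernel(ys, ys, kernel)
            - 2.0 * mean_kernel(xs, ys, kernel))


if __name__ == "__main__":
    xs = [[0.0], [1.0]]
    ys = [[5.0], [6.0]]
    print(mmd_squared(xs, xs))  # zero up to rounding: identical samples
    print(mmd_squared(xs, ys))  # positive: well-separated samples
```

The cost is quadratic in the sample size because every pair of points is evaluated once per term.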