Kernel Matrix Estimation of a Determinantal Point Process from a Finite Set of Samples: Properties and Algorithms
Abstract: Determinantal point processes (DPPs) on finite sets have recently gained popularity because of their ability to promote diversity among the elements of the subsets they select. The probability distribution of a DPP is defined through the determinant of a positive semi-definite, real-valued matrix. When estimating the DPP parameter matrix, it is often more convenient to express the maximum likelihood criterion using the framework of L-ensembles. However, the resulting optimization problem is non-convex and NP-hard to solve.
In this paper, we establish conditions under which the maximum likelihood criterion has a well-defined optimum for a given finite set of samples. We demonstrate that regularization is generally beneficial for ensuring a proper solution. To solve the resulting optimization problem, we propose a proximal algorithm that minimizes a penalized criterion. Through simulations, we compare our algorithm with previously proposed approaches, illustrating their differing behaviors and providing empirical support for our theoretical findings.
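For readers less familiar with the L-ensemble formulation, the following minimal sketch (our notation and helper names, not code from the paper) spells out the likelihood being maximized: an L-ensemble with kernel L draws a subset A with probability det(L_A) / det(L + I).

```python
# Minimal sketch of the L-ensemble log-likelihood (our notation, not the
# paper's code). For samples A_1, ..., A_n and an N x N PSD kernel L:
#   f(L) = (1/n) * sum_i log det(L_{A_i}) - log det(L + I).
import numpy as np

def log_likelihood(L, samples):
    """Average L-ensemble log-likelihood of `samples` (lists of item indices)."""
    N = L.shape[0]
    _, logdet_norm = np.linalg.slogdet(L + np.eye(N))       # normalization constant
    total = 0.0
    for A in samples:
        _, logdet_sub = np.linalg.slogdet(L[np.ix_(A, A)])  # principal submatrix L_A
        total += logdet_sub
    return total / len(samples) - logdet_norm
```

Maximizing f over positive semi-definite matrices L is the non-convex, NP-hard problem referred to above.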
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their careful reading of the manuscript and their high-quality reviews. Their independent evaluations largely converge, and we found all of their comments relevant and justified. We have therefore made changes to take them into account (shown in blue in the supplementary material PDF file).
We provide a unified answer to the following three questions/requested changes, which are common to the three reviewers.
1. **computational complexity**
- (reviewer 2ckl) "discuss the computational trade-offs of their proximal
method, comparing its per-iteration cost with that of simpler but unstable
fixed-point iteration."
- (reviewer VE63) "How does the algorithm behave when N is very large?"
- (reviewer bWjR) "For both the proposed method and existing approaches,
computational efficiency remains a primary challenge, as these methods
typically have O(N^3) complexity. While this does not weaken the
contribution of the paper, it would be valuable if the authors could
discuss potential directions for improving the computational efficiency or
scalability of the algorithm."
- *Our answer =>* All reviewers mentioned that computational cost/complexity is a
fundamental issue, especially for large N. Following these recommendations,
we slightly clarified Section 4.4 and proposed some potential directions
for improvement; an illustrative sketch of the baseline fixed-point
iteration and its O(N^3) per-iteration cost is given after this list.
[modifications on pp.8-9]
2. **better motivation**
- (reviewer 2ckl) "motivate regularization more clearly and highlight why the
non-coercive property is a more immediate, practical barrier to DPP
learning than the known barriers for NP-hardness."
- (reviewer VE63) "The authors claim that, as stated in the title, the
contributions are the properties and algorithm. However, are these
contributions leading to a better solution to the DPP. Since the
regularized criterion is already studied in the literature and the proximal
algorithm is also a standard approach for solving the regularized
optimization problem, the author is suggested to clarify how this paper can
push forward the frontier."
- *Our answer =>* As suggested by reviewers (2ckl) and (VE63), we improved the
motivation of our work in the introduction. First, we better emphasized
that non-coercivity constitutes a fundamental issue: without regularization,
the maximum likelihood criterion may not attain its optimum. We also
mentioned that proximal methods offer many possibilities. We hope that
making these two facts clear will contribute to pushing forward the
frontier of DPP kernel estimation; the generic form of the penalized
criterion is recalled after this list. [modifications on p.2]
3. **choice of the parameters**
- (reviewer 2ckl) "a sensitivity analysis in the experiments to show how the
choice of the regularization parameter λ affects stability, convergence
speed, or final kernel accuracy"
- (reviewer VE63) "How to choose the regularization parameter of the
regularized criterion?"
- (reviewer bWjR) "In the experimental section, it would be helpful if the
authors could include additional experiments or discussion regarding the
tuning of hyperparameters μ, ε, ν, to provide guidance on their
practical selection."
- *Our answer =>* All reviewers asked for additional elements concerning the choice of
the different parameters μ, ε, ν. Our values were tuned manually, and this
is now mentioned in the text. To provide some guidance, we added a short
simulation (Figure 2) illustrating the choice of μ, together with
associated comments. Finally, we illustrated in Table 1 that the returned
solution is only slightly perturbed for small values of the parameter ε; a
hypothetical held-out tuning loop is also sketched below.
[modifications on pp.10-11]
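For question 1, here is an illustrative sketch of the simpler but unstable fixed-point baseline mentioned by reviewer 2ckl (written in the spirit of the Picard iteration of Mariet & Sha; the helper names are ours, and this is not the paper's proximal algorithm). The N x N inverse and the two N x N products make each iteration cost O(N^3), which is the per-iteration cost at stake:

```python
# Sketch of a Picard-style fixed-point step for DPP maximum likelihood
# (illustrative baseline, NOT the paper's proximal algorithm). Each step
# costs O(N^3): one N x N inverse plus two N x N matrix products.
import numpy as np

def fixed_point_step(L, samples):
    """One update L <- L + L @ grad @ L, where grad is the log-likelihood gradient."""
    N = L.shape[0]
    grad = -np.linalg.inv(L + np.eye(N))                   # d/dL of -log det(L + I)
    for A in samples:
        idx = np.ix_(A, A)
        grad[idx] += np.linalg.inv(L[idx]) / len(samples)  # (1/n) * L_A^{-1}, zero-padded
    return L + L @ grad @ L                                # naive update; may be unstable in practice
```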
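For question 2, the penalized criterion has the generic form below (our notation; we assume μ denotes the regularization weight, the parameter reviewer 2ckl writes as λ, and ψ a penalty whose exact form is specified in the paper):

$$
\min_{L \succeq 0} \;\; -\frac{1}{n}\sum_{i=1}^{n} \log\det\!\left(L_{A_i}\right) \;+\; \log\det(L + I) \;+\; \mu\,\psi(L).
$$

Without the penalty, the first two terms are non-coercive, which is the fundamental issue now emphasized in the introduction.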
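Finally, for question 3, a purely hypothetical tuning loop (not the protocol of the paper) shows how the regularization weight μ could be screened on held-out log-likelihood, reusing the `log_likelihood` helper sketched under the abstract and a `fit_kernel(samples, mu)` stand-in for the proximal algorithm:

```python
# Hypothetical tuning loop (not the paper's protocol): screen the
# regularization weight mu on held-out log-likelihood. `fit_kernel` stands
# in for the proximal algorithm; `log_likelihood` is sketched above.
def select_mu(train, held_out, fit_kernel, mus=(1e-3, 1e-2, 1e-1, 1.0)):
    scores = {mu: log_likelihood(fit_kernel(train, mu), held_out) for mu in mus}
    return max(scores, key=scores.get), scores  # best mu and all scores
```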
Below are two remarks from reviewer (VE63):
+ "In the abstract, the author has the wording "to address this challenge." What
is the challenge?"
- *Our answer =>* Thank you for pointing out this unclear sentence: we modified it and
now write "to solve the resulting optimization problem".
[modifications on p.1]
+ "Please add more details or discussions for Theorem 1."
- *Our answer =>* The paragraph below Theorem 1 has been rewritten to better emphasize
our comments on the theorem. A reference has also been added to justify that
the considered functions satisfy the Kurdyka-Łojasiewicz (KL) assumption, by
considering an o-minimal structure.
[modifications on pp.6-7]
Assigned Action Editor: ~Sylvain_Le_Corff1
Submission Number: 6225