Keywords: regret minimization, bilateral trade, online learning, fairness, theory
TL;DR: We study the online learning problem of brokerage between traders where the objective is to maximize the total number of trades
Abstract: We explore brokerage between traders in an online learning framework.
At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial.
The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations.
A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price.
Previous work provided guidance to a broker aiming at enhancing traders' total earnings by maximizing the _gain from trade_, defined as the sum of the traders' net utilities after each interaction.
This classical notion of reward can be highly unfair to traders with small profit margins, and far from the real-life utility of the broker.
For these reasons, we investigate how the broker should behave to maximize the trading volume, i.e., the _total number of trades_.
We model the traders' valuations as an i.i.d. process with an unknown distribution.
If the traders' valuations are revealed after each interaction (full-feedback), and the traders' valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constants.
If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders' valuations cdf is Lipschitz and show its near-optimality.
We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders' valuations cdf.
If we drop the continuous cdf assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon.
If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case.
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7017
Loading