Keywords: rate-distortion, on-line learning, exploration-exploitation, Blahut algorithm
TL;DR: Adaptive rate-distortion in view of on-line learning, the exploration-exploitation trade-off, and "natural type selection".
Abstract: The inherent trade-off in on-line learning is between exploration and exploitation.
A good balance between these two (conflicting) goals can achieve better
long-term performance. Can we define an optimal balance? We propose to study
this question through an adaptive lossy compression system, which exhibits a
"natural" trade-off between exploration and exploitation.
Submission Number: 17