- Keywords: deep learning, information theory, representation, coding, mutual information estimation
- TL;DR: We take a step towards measuring learning task difficulty and demonstrate that, in practice, performance strongly depends on the match between the representation of the information and the model interpreting it.
- Abstract: Learning can be framed as trying to encode the mutual information between input and output while discarding other information in the input. Since the joint distribution of input and output is unknown, so is the true mutual information. To quantify how difficult it is to learn a task, we calculate an observed mutual information score by dividing the estimated mutual information by the entropy of the input. We substantiate this score analytically by showing that the estimated mutual information has an error that increases with the entropy of the data. Intriguingly, depending on how the data is represented, the observed entropy and mutual information can vary wildly. There needs to be a match between how data is represented and how a model encodes it. Experimentally, we analyze image-based input data representations and demonstrate that the performance outcomes of extensive network architecture searches are well aligned with the calculated score. Therefore, to ensure better learning outcomes, representations may need to be tailored to both task and model to align with the implicit distribution of the model.
- Code: https://drive.google.com/open?id=1D8wICzJVPJRUWB9y5WgceslXZfurY34g
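
The observed mutual information score described in the abstract (estimated mutual information divided by the entropy of the input) can be illustrated with a minimal sketch. It assumes discrete inputs and labels and uses a simple count-based (plug-in) estimator. The function names `entropy_bits`, `mutual_information_bits`, and `observed_mi_score` are hypothetical, and this is not necessarily the estimator used in the linked code.

```python
import numpy as np


def entropy_bits(values):
    """Plug-in (count-based) Shannon entropy, in bits, of discrete symbols."""
    _, counts = np.unique(np.asarray(values).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def mutual_information_bits(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete data."""
    # Map symbols to integer ids so each (x, y) pair can be encoded as one symbol.
    _, x_ids = np.unique(np.asarray(x).ravel(), return_inverse=True)
    _, y_ids = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = x_ids * (y_ids.max() + 1) + y_ids
    return entropy_bits(x_ids) + entropy_bits(y_ids) - entropy_bits(joint)


def observed_mi_score(x, y):
    """Estimated mutual information divided by the estimated input entropy."""
    h_x = entropy_bits(x)
    if h_x == 0.0:
        # Assumption for this sketch: a constant input gets a score of 0.
        return 0.0
    return mutual_information_bits(x, y) / h_x


# Toy data (not from the paper): 4 equiprobable input symbols, so H(X) = 2 bits.
x = np.tile(np.arange(4), 25)
# The label keeps exactly one bit of the input, so I(X;Y) = 1 bit.
y = x % 2
print(observed_mi_score(x, y))  # 0.5 for this toy data
```

In this toy setting, re-encoding the same inputs with additional label-irrelevant symbols would raise the estimated input entropy and lower the score, which mirrors the abstract's point that the representation of the data affects the observed quantities.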