Bad Predictive Coding Activation Functions

Published: 19 Mar 2024 · Last Modified: 07 May 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: predictive coding, energy functions, minima, visualizations
TL;DR: We show on a toy model how different activation functions can affect performance, and highlight this numerically in several experiments.
Abstract: We investigate predictive coding networks (PCNs) by analyzing their performance under different choices of activation function. We expand a previous theoretical discussion of a simple toy PCN example in the training stage. Compared to classic gradient-based empirical risk minimization, we observe differences for the ReLU activation function. This leads us to carry out an empirical evaluation on classification tasks with FashionMNIST and CIFAR-10. We show that while ReLU may be a good baseline for classic machine learning, for predictive coding it performs worse than other activation functions and also exhibits the largest drop in performance relative to gradient-based empirical risk minimization.
Submission Number: 251
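To make the setup described in the abstract concrete, here is a minimal sketch of a two-layer predictive coding network in NumPy, showing where the activation function enters the energy, the inference (relaxation) phase, and the weight update. This is not the paper's implementation; the layer sizes, step sizes, iteration count, and the `kind` switch between ReLU and tanh are illustrative assumptions.

```python
# Minimal PCN sketch (assumed setup, not the paper's exact experiments).
# Energy: E = 0.5*||x1 - W0 f(x0)||^2 + 0.5*||x2 - W1 f(x1)||^2
import numpy as np

def f(x, kind="relu"):
    return np.maximum(x, 0.0) if kind == "relu" else np.tanh(x)

def df(x, kind="relu"):
    return (x > 0).astype(float) if kind == "relu" else 1.0 - np.tanh(x) ** 2

rng = np.random.default_rng(0)
sizes = [4, 8, 3]  # input, hidden, output widths (assumed)
W = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l])) for l in range(2)]

def pcn_step(x_in, y, W, kind="relu", T=20, lr_x=0.1, lr_w=0.01):
    """One PCN training step: relax the hidden nodes, then update the weights."""
    x = [x_in, np.zeros(sizes[1]), y]  # input and target are clamped
    for _ in range(T):  # inference: gradient descent on the hidden state x[1]
        e1 = x[1] - W[0] @ f(x[0], kind)   # prediction error at the hidden layer
        e2 = x[2] - W[1] @ f(x[1], kind)   # prediction error at the output layer
        # dE/dx1 = e1 - f'(x1) * (W1^T e2)
        x[1] -= lr_x * (e1 - df(x[1], kind) * (W[1].T @ e2))
    # learning: gradient step on the weights at the relaxed state
    e1 = x[1] - W[0] @ f(x[0], kind)
    e2 = x[2] - W[1] @ f(x[1], kind)
    W[0] += lr_w * np.outer(e1, f(x[0], kind))
    W[1] += lr_w * np.outer(e2, f(x[1], kind))
    return 0.5 * (e1 @ e1 + e2 @ e2)       # energy after relaxation

# Example usage: compare the relaxed energy under ReLU vs. tanh on one sample.
x_in, y = rng.normal(size=4), np.array([1.0, 0.0, 0.0])
print("relu:", pcn_step(x_in, y, [w.copy() for w in W], kind="relu"))
print("tanh:", pcn_step(x_in, y, [w.copy() for w in W], kind="tanh"))
```

Both inference and learning descend the same energy, so the activation function (and its derivative) shapes the energy landscape twice: once through the predictions and once through the relaxation dynamics, which is where the ReLU-specific differences discussed in the abstract arise.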