Bad Predictive Coding Activation Functions

Published: 19 Mar 2024 · Last Modified: 07 May 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: predictive coding, energy functions, minima, visualizations
TL;DR: We show on a toy model how different activation functions can affect performance, and highlight this numerically in several experiments.
Abstract: We investigate predictive coding networks (PCNs) by analyzing their performance under different choices of activation function. We expand a previous theoretical discussion of a simple toy PCN example in the training stage. Compared to classic gradient-based empirical risk minimization, we observe differences for the ReLU activation function. This leads us to carry out an empirical evaluation on classification tasks with FashionMNIST and CIFAR-10. We show that while ReLU may be a good baseline for classic machine learning, for predictive coding it performs worse than other activation functions and also exhibits the largest drop in performance relative to gradient-based empirical risk minimization.
Submission Number: 251
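To make the setup described in the abstract concrete, here is a minimal sketch of a two-layer predictive coding network in NumPy, showing where the activation function enters the energy, the inference (relaxation) phase, and the weight update. This is not the paper's implementation; the layer sizes, step sizes, iteration count, and the `kind` switch between ReLU and tanh are illustrative assumptions.

```python
# Minimal PCN sketch (assumed setup, not the paper's exact experiments).
# Energy: E = 0.5*||x1 - W0 f(x0)||^2 + 0.5*||x2 - W1 f(x1)||^2
import numpy as np

def f(x, kind="relu"):
    return np.maximum(x, 0.0) if kind == "relu" else np.tanh(x)

def df(x, kind="relu"):
    return (x > 0).astype(float) if kind == "relu" else 1.0 - np.tanh(x) ** 2

rng = np.random.default_rng(0)
sizes = [4, 8, 3]  # input, hidden, output widths (assumed)
W = [rng.normal(scale=0.1, size=(sizes[l + 1], sizes[l])) for l in range(2)]

def pcn_step(x_in, y, W, kind="relu", T=20, lr_x=0.1, lr_w=0.01):
    """One PCN training step: relax the hidden nodes, then update the weights."""
    x = [x_in, np.zeros(sizes[1]), y]  # input and target are clamped
    for _ in range(T):  # inference: gradient descent on the hidden state x[1]
        e1 = x[1] - W[0] @ f(x[0], kind)   # prediction error at the hidden layer
        e2 = x[2] - W[1] @ f(x[1], kind)   # prediction error at the output layer
        # dE/dx1 = e1 - f'(x1) * (W1^T e2)
        x[1] -= lr_x * (e1 - df(x[1], kind) * (W[1].T @ e2))
    # learning: gradient step on the weights at the relaxed state
    e1 = x[1] - W[0] @ f(x[0], kind)
    e2 = x[2] - W[1] @ f(x[1], kind)
    W[0] += lr_w * np.outer(e1, f(x[0], kind))
    W[1] += lr_w * np.outer(e2, f(x[1], kind))
    return 0.5 * (e1 @ e1 + e2 @ e2)       # energy after relaxation

# Example usage: compare the relaxed energy under ReLU vs. tanh on one sample.
x_in, y = rng.normal(size=4), np.array([1.0, 0.0, 0.0])
print("relu:", pcn_step(x_in, y, [w.copy() for w in W], kind="relu"))
print("tanh:", pcn_step(x_in, y, [w.copy() for w in W], kind="tanh"))
```

Both inference and learning descend the same energy, so the activation function (and its derivative) shapes the energy landscape twice: once through the predictions and once through the relaxation dynamics, which is where the ReLU-specific differences discussed in the abstract arise.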