Structured Energy Network as a Dynamic Loss Function: A Case Study with Multi-Label Classification

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: Structured Prediction, Energy network, Energy-based models, Loss-function learning, Dynamic loss function
Abstract: Structured prediction energy networks (SPENs) (Belanger & McCallum, 2016; Gygli et al., 2017) have shown that a neural network (i.e., an energy network) can learn a reasonable energy function over candidate structured outputs. We propose SEAL, which uses such an energy network not as a prediction network but as a trainable loss function for a simple feedforward network. We find that this is not only computationally more efficient than SPENs, in both training and inference time, but also yields higher performance. Because the energy loss function is trainable, we make SEAL dynamic: the energy function adapts to focus on the regions where the feedforward model is affected most. An ablation study (§4) comparing SEAL to a static version, in which the energy function is fixed after pretraining, shows this to be effective. We also relate SEAL to prior work on the joint optimization of an energy network and a feedforward model (INFNET): INFNET is equivalent to SEAL with a margin-based loss if INFNET's loss function is relaxed. Building on SEAL's architecture, we further propose a variant that uses a noise-contrastive estimation (NCE) ranking loss; this loss performs poorly as a structured energy network on its own, but embedded in SEAL it achieves the best performance among the variants we study. We demonstrate the effectiveness of SEAL on 7 feature-based and 3 text-based multi-label classification datasets. The best version of SEAL, using the NCE ranking loss, achieves average gains of +2.85 and +2.23 F1 points over cross-entropy and INFNET, respectively, on the feature-based datasets, excluding one outlier with an excessive gain of +50.0 F1 points. Finally, to examine whether the framework is also effective with a large pre-trained model, we observe that SEAL achieves an average gain of +0.87 F1 points on top of a BERT-based adapter model on the text datasets.
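To make the alternating training described in the abstract concrete, below is a minimal PyTorch sketch of a SEAL-style update: the energy network is trained with an NCE-style ranking loss against negatives sampled near the feedforward model's current predictions (the "dynamic" part), and the feedforward network is then trained using the energy network as its loss. All names here (FeedForwardNet, EnergyNet, nce_ranking_loss, seal_step) and the Bernoulli negative sampler are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardNet(nn.Module):
    """Predicts per-label probabilities for multi-label classification."""
    def __init__(self, in_dim, n_labels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_labels)
        )

    def forward(self, x):
        return torch.sigmoid(self.net(x))


class EnergyNet(nn.Module):
    """Scores (x, y) pairs; lower energy means a better structured output."""
    def __init__(self, in_dim, n_labels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + n_labels, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)


def nce_ranking_loss(energy, x, y_gold, y_negs):
    """Rank the gold label set above K sampled negatives by negative energy."""
    s_gold = -energy(x, y_gold)                                      # (B,)
    s_negs = -torch.stack([energy(x, yn) for yn in y_negs], dim=1)   # (B, K)
    logits = torch.cat([s_gold.unsqueeze(1), s_negs], dim=1)         # (B, K+1)
    targets = torch.zeros(x.size(0), dtype=torch.long, device=x.device)
    return F.cross_entropy(logits, targets)  # gold is always index 0


def seal_step(x, y_gold, ff, energy, opt_ff, opt_e, k_negs=8):
    # 1) Energy update: negatives are perturbations of the *current*
    #    predictions, so the loss adapts to where the feedforward
    #    model currently errs (the dynamic part of SEAL).
    with torch.no_grad():
        y_pred = ff(x)
    y_negs = [torch.bernoulli(y_pred) for _ in range(k_negs)]
    opt_e.zero_grad()
    nce_ranking_loss(energy, x, y_gold, y_negs).backward()
    opt_e.step()

    # 2) Feedforward update: the energy network acts as a trainable loss.
    opt_ff.zero_grad()
    energy(x, ff(x)).mean().backward()
    opt_ff.step()
```

Under this reading, the static ablation in §4 would simply skip the energy update after pretraining, and INFNET's margin-based objective would take the place of the NCE ranking term.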
One-sentence Summary: We propose to use SPEN, a powerful structured prediction model, from another angle: as a dynamic, trainable loss function parameterized by a neural network.