Boosted Long Short-Term Memory with Additional Inner Layers

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Recurrent Neural Networks, Long Short-Term Memory, Sequence classification, Boosted architectures
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: The Boosted LSTM model, enhanced with additional layers inside the unit, outperforms standard LSTM architectures in empirical tasks, often converging in fewer epochs.
Abstract: Long Short-Term Memory (LSTM) is widely known as a powerful type of Recurrent Neural Network that achieves strong results on many difficult sequential-data tasks. Numerous experiments have shown that adding complexity to a neural network architecture can yield performance gains that outweigh the cost of the upgraded structure. In this paper, we propose a Boosted LSTM model, created by adding layers inside the LSTM unit to enhance its memory and reasoning capabilities. We evaluate different versions of the Boosted LSTM architecture on three empirical tasks, studying the impact of the placement of the additional layers, the activation functions used in those layers, and the number of hidden units. The experiments show that the Boosted LSTM unit using the Exponential Linear Unit as the activation function of its boosted layers outperforms comparable models built from simple LSTM units, often reaching similar or better results in fewer training epochs.
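To make the idea concrete, below is a minimal PyTorch sketch of one possible "boosted" LSTM cell. The class name BoostedLSTMCell, the boost submodule, and the placement of the extra layer (on the combined gate pre-activations) are illustrative assumptions rather than the paper's exact design; the only element taken from the abstract is the use of an additional ELU-activated layer inside the unit.

```python
# Hypothetical sketch of a "boosted" LSTM cell: a standard LSTM cell with one
# extra ELU-activated dense layer inserted inside the unit. The placement of
# the extra layer is an assumption for illustration, not the paper's design.
import torch
import torch.nn as nn


class BoostedLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Standard LSTM projections producing pre-activations for the four gates.
        self.input_proj = nn.Linear(input_size, 4 * hidden_size)
        self.hidden_proj = nn.Linear(hidden_size, 4 * hidden_size)
        # Additional "boosted" layer inside the unit (assumed placement):
        # a dense ELU transform applied to the combined gate pre-activations.
        self.boost = nn.Sequential(
            nn.Linear(4 * hidden_size, 4 * hidden_size),
            nn.ELU(),
        )

    def forward(self, x, state):
        h, c = state
        # Boost the summed input/hidden projections before splitting into gates.
        gates = self.boost(self.input_proj(x) + self.hidden_proj(h))
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c_new = f * c + i * g
        h_new = o * torch.tanh(c_new)
        return h_new, (h_new, c_new)


if __name__ == "__main__":
    # One step over a batch of 8 inputs of size 16 with 32 hidden units.
    cell = BoostedLSTMCell(input_size=16, hidden_size=32)
    x = torch.randn(8, 16)
    h0 = c0 = torch.zeros(8, 32)
    out, (h1, c1) = cell(x, (h0, c0))
```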
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9308