- Abstract: Long Short-Term Memory network(LSTM) has attracted much attention on sequence modeling tasks, because of its ability to preserve longer term information in a sequence, compared to ordinary Recurrent Neural Networks(RNN's). The basic LSTM structure assumes a chain structure of the input sequence. However, audio streams often show a trend of combining phonemes into meaningful units, which could be words in speech processing task, or a certain type of noise in signal and noise separation task. We introduce Seq2Tree network, a modification of the LSTM network which constructs a tree structure from an input sequence. Experiments show that Seq2Tree network outperforms the state-of-the-art Bidirectional LSTM(BLSTM) model on the signal and noise separation task, namely CHiME Speech Separation and Recognition Challenge.
- TL;DR: A paper describing our ongoing work on developing a neural network architecture which could extensively and dynamically model multimedia data. Tested on Noise and Signal Separation task.
- Keywords: Deep Learning, Noise and Signal Separation, Recurrent Model, Multimedia Modeling