Keywords: hierarchical decoder, hierarchical text classification, emotional support conversation
TL;DR: We propose a new type of language model, a hierarchical decoder, that performs well on multiple classification and generation tasks.
Abstract: Decoder-only language models, such as GPT and LLaMA, generally decode only at the last layer. Motivated by humans' hierarchical reasoning capability, we propose a hierarchical decoder architecture in which different layers decode text in a streaming manner. Given limited time and computational resources, we adapt a pretrained language model into this form of hierarchical decoder: the language head of the last layer is copied to selected intermediate layers and fine-tuned on different task inputs. Through extensive experiments, we validate that these selected intermediate layers can be adapted to produce meaningful and coherent text, and that this hierarchical decoding paradigm achieves state-of-the-art performance on multiple tasks, including hierarchical text classification, classification-guided generation, and hierarchical text generation. The resulting model, HdLM, outperforms all baselines on WoS, DBpedia, ESConv, EmpatheticDialogues, AQuA, CommonSenseQA, and several cognitive tests. We also provide a thorough theoretical analysis that validates the convergence and computational savings of our methodology. Our code and model are available at https://anonymous.4open.science/r/HdLM-0B90/.
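As a minimal illustration of the adaptation the abstract describes, the sketch below copies a pretrained model's final language head onto selected intermediate layers and decodes next-token logits from their hidden states. The `gpt2` checkpoint, the layer indices `[4, 8]`, and the greedy decode are illustrative assumptions, not the paper's exact configuration; in HdLM each copied head would additionally be fine-tuned on its own task inputs.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint and decoding layers (assumptions, not the
# paper's configuration).
model_name = "gpt2"
intermediate_layers = [4, 8]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One language head per selected intermediate layer, initialized as a
# copy of the final head; in HdLM these would then be fine-tuned on
# different task inputs.
intermediate_heads = torch.nn.ModuleDict(
    {str(i): copy.deepcopy(model.lm_head) for i in intermediate_layers}
)

inputs = tokenizer("Classify this document:", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Decode next-token logits from each selected layer's hidden states.
# hidden_states[0] is the embedding output, so layer i maps to index i.
for i in intermediate_layers:
    hidden = outputs.hidden_states[i]            # (batch, seq, dim)
    logits = intermediate_heads[str(i)](hidden)  # (batch, seq, vocab)
    next_token = logits[:, -1, :].argmax(dim=-1)
    print(f"layer {i}:", tokenizer.decode(next_token))
```

In this sketch the backbone is frozen during the forward pass; only the copied heads (and, per the abstract, the layers they attach to) would be updated during fine-tuning, which is where the claimed computational savings come from.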
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13147