everyone
since 13 Oct 2023">EveryoneRevisionsBibTeX
We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple tasks, resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.