Abstract: With the recent emergence of powerful instruction-tuned large language models (LLMs), various helpful conversational Artificial Intelligence (AI) systems have been deployed across many applications. When prompted by users, these AI systems successfully perform a wide range of tasks as part of a conversation. To provide some form of memory and context, such systems typically condition their output on the entire conversational history. Although this sensitivity to the conversational history can often improve performance on subsequent tasks, we find that performance can also be negatively impacted if there is a \emph{task-switch}. To the best of our knowledge, our work makes the first attempt to formalize the study of such vulnerabilities and of task interference in conversational LLMs caused by task-switches in the conversational history. Our experiments across 5 datasets with 15 task-switches using popular LLMs reveal that many of the task-switches can lead to significant performance degradation.
Paper Type: short
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English
Preprint Status: We are considering releasing a non-anonymous preprint in the next two months (i.e., during the reviewing process).
A1: yes
A1 Elaboration For Yes Or No: Section 6
A2: yes
A2 Elaboration For Yes Or No: Section 7
A3: yes
A3 Elaboration For Yes Or No: Abstract and Section 1
B: yes
B1: yes
B1 Elaboration For Yes Or No: Section 4.1 Experimental Setup
B2: yes
B2 Elaboration For Yes Or No: Section 7
B3: yes
B3 Elaboration For Yes Or No: Section 4.1 Experimental Setup and Appendix A
B4: n/a
B5: n/a
B6: yes
B6 Elaboration For Yes Or No: Appendix A
C: yes
C1: yes
C1 Elaboration For Yes Or No: Section 4.1 Experimental Setup and Section 7
C2: yes
C2 Elaboration For Yes Or No: Section 4.1 Experimental Setup
C3: yes
C3 Elaboration For Yes Or No: We report a single run because our experiments are extremely computationally heavy.
C4: yes
C4 Elaboration For Yes Or No: Appendix A
D: no
D1: n/a
D2: n/a
D3: n/a
D4: n/a
D5: n/a
E: no
E1: n/a