Abstract: Large language models identify patterns in the relations between words and capture their relations in an embedding space. Schramowski and colleagues show that a direction in this space can be identified that separates ‘right’ and ‘wrong’ actions as judged by human survey participants.
0 Replies
Loading