New ACL Paper on GPT-2 Mechanistic Interpretability and Biases in Multiple Choice QA

We’re excited to announce that our paper, *Anchored Answers: Unravelling Positional Bias in GPT-2’s Multiple-Choice Questions*, has been accepted to ACL 2025!

In this study, we investigate how GPT-2 internally represents and reasons over multiple-choice question answering (MCQA) tasks. Our analysis uncovers shortcut behaviors: positional heuristics that influence how the model selects answers, even when those answers appear superficially correct.
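The intuition behind such a positional heuristic can be illustrated with a toy sketch (this is not the paper's actual method): permute the answer options and check whether the chosen *position* stays fixed as the content moves. The `biased_pick` scorer below is a hypothetical stand-in for real GPT-2 logit scoring, hard-coded to exhibit an extreme "anchored" bias.

```python
import itertools
from collections import Counter

# Hypothetical stand-in for a model's answer scorer. A real setup would
# score each option with GPT-2 logits; this stub always returns the first
# position, mimicking an extreme position-anchored heuristic.
def biased_pick(options):
    return 0

def measure_positional_bias(answers, pick=biased_pick):
    """Permute the answer order and count which position gets picked.

    A content-driven model would spread its picks across positions as the
    correct answer moves; a position-anchored model would not.
    """
    counts = Counter()
    for perm in itertools.permutations(answers):
        counts[pick(list(perm))] += 1
    return counts

answers = ["Paris", "London", "Berlin", "Madrid"]
print(measure_positional_bias(answers))  # position 0 picked in all 24 permutations
```

Under this probe, a heavily skewed count over positions (rather than over answer contents) is the signature of the shortcut behavior described above.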

By conducting neuron- and circuit-level analysis, we shed light on how LLMs process structured prompts and where hidden biases emerge. This work contributes to our broader goal of developing safe, interpretable, and trustworthy AI systems in general-domain NLP.

This is a collaboration with Prof. Ruizhe Li from the University of Aberdeen.

Slides and code will be shared soon—stay tuned!

Yanjun Gao
Assistant Professor

My research interests include Natural Language Generation, Semantic Representation, Summarization Evaluation, Graph-based NLP, and AI applications in medicine and education.