New ACL Paper on GPT-2 Mechanistic Interpretability and Biases in Multiple Choice QA

We’re excited to announce that our paper, *Anchored Answers: Unravelling Positional Bias in GPT-2’s Multiple-Choice Questions*, has been accepted to ACL 2025!

In this study, we investigate how GPT-2 internally represents and reasons over multiple-choice question answering (MCQA) tasks. Our analysis uncovers shortcut behaviors: positional heuristics that influence how the model selects answers, even when those answers appear superficially correct.
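The intuition behind such a positional heuristic can be illustrated with a toy sketch (this is not the paper's actual method): permute the answer options and check whether the chosen *position* stays fixed as the content moves. The `biased_pick` scorer below is a hypothetical stand-in for real GPT-2 logit scoring, hard-coded to exhibit an extreme "anchored" bias.

```python
import itertools
from collections import Counter

# Hypothetical stand-in for a model's answer scorer. A real setup would
# score each option with GPT-2 logits; this stub always returns the first
# position, mimicking an extreme position-anchored heuristic.
def biased_pick(options):
    return 0

def measure_positional_bias(answers, pick=biased_pick):
    """Permute the answer order and count which position gets picked.

    A content-driven model would spread its picks across positions as the
    correct answer moves; a position-anchored model would not.
    """
    counts = Counter()
    for perm in itertools.permutations(answers):
        counts[pick(list(perm))] += 1
    return counts

answers = ["Paris", "London", "Berlin", "Madrid"]
print(measure_positional_bias(answers))  # position 0 picked in all 24 permutations
```

Under this probe, a heavily skewed count over positions (rather than over answer contents) is the signature of the shortcut behavior described above.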

By conducting neuron- and circuit-level analysis, we shed light on how LLMs process structured prompts and where hidden biases emerge. This work contributes to our broader goal of developing safe, interpretable, and trustworthy AI systems in general-domain NLP.

This is a collaboration with Prof. Ruizhe Li from the University of Aberdeen.

Slides and code will be shared soon—stay tuned!

Yanjun Gao
Assistant Professor

My research interests include Natural Language Generation, Semantic Representation, Summarization Evaluation, Graph-based NLP, and AI applications in medicine and education.