
Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Leask, Patrick; Al Moubayed, Noura

Authors

Patrick Leask (patrick.leask@durham.ac.uk)
PGR Student, Doctor of Philosophy



Abstract

Sparse Autoencoders (SAEs) are a popular method for decomposing Large Language Model (LLM) activations into interpretable latents; however, they have a substantial training cost, and SAEs learned on different models are not directly comparable. Motivated by relative representation similarity measures, we introduce Inference-Time Decomposition of Activation models (ITDAs). ITDAs are constructed by greedily sampling activations into a dictionary based on an error threshold on their matching pursuit reconstruction. ITDAs can be trained in 1% of the time of SAEs, allowing us to cheaply train them on Llama-3.1 70B and 405B. ITDA dictionaries also enable cross-model comparisons, and they outperform existing methods such as CKA, SVCCA, and a relative representation method on a benchmark of representation similarity. Code is available at https://github.com/pleask/itda.
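
The construction procedure described in the abstract can be illustrated with a short sketch. The following Python code is a minimal, illustrative reading of that description, assuming unit-norm atoms, a relative reconstruction-error criterion, and a fixed sparsity cap k; these names and defaults are assumptions for illustration, not the authors' implementation, which is in the linked repository.

    import numpy as np

    def matching_pursuit(x, atoms, k=8):
        # Greedy matching pursuit: approximate x with at most k atoms
        # (rows of `atoms`, assumed unit-norm) and return the residual.
        residual = x.astype(float).copy()
        coeffs = np.zeros(atoms.shape[0])
        for _ in range(min(k, atoms.shape[0])):
            scores = atoms @ residual            # correlation of each atom with the residual
            j = int(np.argmax(np.abs(scores)))   # pick the best-matching atom
            coeffs[j] += scores[j]
            residual -= scores[j] * atoms[j]
        return coeffs, residual

    def build_itda_dictionary(activations, threshold=0.1, k=8):
        # Greedily grow the dictionary: an activation becomes a new atom
        # whenever its matching-pursuit reconstruction error exceeds `threshold`.
        # `threshold` and the relative-error form are illustrative assumptions.
        atoms = []
        for x in activations:
            if atoms:
                _, residual = matching_pursuit(x, np.stack(atoms), k=k)
                error = np.linalg.norm(residual) / np.linalg.norm(x)
            else:
                error = np.inf                   # empty dictionary: always add the first sample
            if error > threshold:
                atoms.append(x / np.linalg.norm(x))  # store the activation as a unit-norm atom
        return np.stack(atoms)

On a stream of activations, atoms are added only when the current dictionary cannot reconstruct an input to within the threshold, so dictionary growth slows as coverage improves; this is what makes the construction cheap relative to gradient-trained SAEs.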

Citation

Leask, P., & Al Moubayed, N. (2025, July). Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models. Presented at the International Conference on Machine Learning (ICML 2025), Vancouver, Canada.

Presentation Conference Type: Conference Paper (published)
Conference Name: International Conference on Machine Learning (ICML 2025)
Start Date: Jul 13, 2025
End Date: Jul 19, 2025
Acceptance Date: May 26, 2025
Deposit Date: Jun 1, 2025
Peer Reviewed: Yes
Series Title: Proceedings of Machine Learning Research
Series ISSN: 2640-3498
Public URL: https://durham-repository.worktribe.com/output/4012842
Publisher URL: https://proceedings.mlr.press/
External URL: https://icml.cc/Conferences/2025