Transformer Models Do Not Just Learn Surface Statistics

A common criticism of Transformer models, such as ChatGPT, BERT, and Bard, is that they only learn surface statistics. According to this criticism, the predictions of Transformers are superficial because the models do not represent the underlying state. In the case of language, the models would capture only the general co-occurrence patterns of the text on which Transformer LLMs are typically trained, but neither the underlying hierarchical nature of language nor anything about the states of the world.

By now, the evidence strongly suggests that this criticism, in its absolute form, is wrong. Below, I list the papers providing this evidence:

Board Games

Transformer models learn the states of board games (Chess, Othello) when modelling sequences of moves. This is convincing evidence that Transformer models are, in principle, able to recover more than surface statistics.

  • Toshniwal et al. 2022
  • Li et al. 2023

The paper by Li et al. is especially convincing, since they test the role of the state representation using interventions.
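
Claims that a model's internal states encode a game state are commonly tested with a probe: a small classifier trained to read the state off the model's activations. The following is a minimal sketch of a linear probe, not the method of either paper above. The activations are synthetic stand-ins generated with NumPy rather than taken from a real Transformer, and all names (`fit_linear_probe`, `probe_accuracy`, `make_toy_activations`) are hypothetical. It covers only the probing step, not the interventions.

```python
import numpy as np

def fit_linear_probe(hidden, labels, l2=1e-3):
    """Fit a ridge-regularised least-squares probe; returns weights plus a bias term."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])  # append bias column
    targets = 2.0 * labels - 1.0  # map {0, 1} to {-1, +1}
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

def probe_accuracy(weights, hidden, labels):
    """Fraction of examples whose state the probe reads off correctly."""
    X = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    predictions = (X @ weights > 0).astype(int)
    return float(np.mean(predictions == labels))

def make_toy_activations(n_samples, dim, rng, direction):
    """Synthetic stand-in for activations (an assumption, not real model data):
    a binary state, e.g. 'square occupied or not', is encoded along one direction."""
    labels = rng.integers(0, 2, size=n_samples)
    noise = rng.normal(size=(n_samples, dim))
    hidden = noise + 3.0 * np.outer(2.0 * labels - 1.0, direction)
    return hidden, labels

rng = np.random.default_rng(0)
dim = 16
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

train_h, train_y = make_toy_activations(2000, dim, rng, direction)
test_h, test_y = make_toy_activations(500, dim, rng, direction)

# Probe trained on the true state labels.
weights = fit_linear_probe(train_h, train_y)
accuracy = probe_accuracy(weights, test_h, test_y)

# Sanity check: a probe trained on shuffled labels should do much worse.
control_weights = fit_linear_probe(train_h, rng.permutation(train_y))
control_accuracy = probe_accuracy(control_weights, test_h, test_y)

print(f"probe accuracy:   {accuracy:.3f}")
print(f"control accuracy: {control_accuracy:.3f}")
```

High probe accuracy alone shows only that the state is decodable from the activations, not that the model uses it. That is why interventions, which edit the internal representation and check whether the model's predictions change accordingly, make for stronger evidence.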

Hierarchical Syntax

The states of Transformer language models reflect syntax, including a hierarchical structure which is not obvious from the surface of language strings:

  • Lin et al. 2019
  • Tenney et al. 2019
  • Rogers et al. 2020: 843-844

Layer-Wise Operations

Some layer-wise operations in Transformer models appear to reflect human-interpretable concepts. That these operations at least appear to be associated with meaningful concepts suggests that they do not just recover meaningless surface statistics:

  • Geva et al. 2022

(This piece of evidence is perhaps more preliminary than the others.)

Correlations with Psychometric Data

Transformer language models appear to correlate to some degree with psychometric data, including human brain states. Since human cognition presumably reflects an underlying world state when processing language, such correlations suggest that the models capture something of that state as well:

  • Wilcox et al. 2020
  • Merkx & Frank 2021
  • Michaelov et al. 2021
  • Oh et al. 2021
  • Schrimpf et al. 2021
  • Caucheteux et al. 2022
  • Caucheteux & King 2022


The evidence presented in these papers supports a role for the representation of an underlying state. At this point, I consider the statement “Transformer models only learn surface statistics” to be probably wrong. (My subjective credence that they learn something about the underlying states is around 90%.)

I have not presented evidence here concerning the shortcomings of Transformer models. Such shortcomings exist. Specifically, the evidence I have pointed to does not rule out that Transformer models are overreliant on surface statistics (for such a suggestion, see also Rogers et al. 2020: 843-844) and fail to model some aspects of the underlying state. The evidence presented also does not show that Transformer models fully capture compositionality, which I personally doubt they do, or that they can fully grasp meaning in the absence of non-textual data.

