<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>David Strohmaier</title>
    <description>This is the website and blog of David Strohmaier.</description>
    <link>https://dstrohmaier.com/</link>
    <atom:link href="https://dstrohmaier.com/feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Sat, 06 Jun 2026 19:07:06 +0100</pubDate>
    <lastBuildDate>Sat, 06 Jun 2026 19:07:06 +0100</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
     
      <item>
        <title>Upcoming Masterclass on Philosophy of AI: Additional Materials</title>
        <description>&lt;p&gt;I’m offering a philosophy of AI masterclass, “Concepts in Machines”, at the University of Zürich later this month (April 23–25, 2026). In this post, I briefly sketch my intentions for this course and collect extra materials not included in &lt;a href=&quot;https://www.philosophie.uzh.ch/de/research/congresses/master_classes.html&quot;&gt;the syllabus&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;sketch-of-course-topic&quot;&gt;Sketch of Course Topic&lt;/h2&gt;

&lt;p&gt;The course explores the role of concepts in neural models. What are the candidates for conceptual representations in LLMs? Starting from this question, we turn to debates in philosophy of AI. For example, we will discuss whether neural language models and especially LLMs understand meaning and produce meaningful output. Both philosophical contributions and publications in computer science venues will be considered.&lt;/p&gt;

&lt;p&gt;One of the goals of the course is to ensure that philosophy of AI remains connected to the current and rapidly developing state of AI research. Hence, the course includes&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;A hands-on component&lt;/strong&gt;: Participants will interact with the internals of neural models, including small neural language models.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Discussions of recent findings&lt;/strong&gt;: We will debate findings in recent AI research that bear upon philosophy but have received limited attention so far.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;additional-readings&quot;&gt;Additional Readings&lt;/h2&gt;

&lt;p&gt;As the masterclass lasts only three days, we cannot cover all contributions to the central topic. I list here additional readings on the core topic of how LLMs capture meaning/concepts.&lt;/p&gt;

&lt;h3 id=&quot;vector-semantics&quot;&gt;Vector Semantics&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://doi.org/10.1016/j.tics.2024.06.011&quot;&gt;Why concepts are (probably) vectors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;the-cognitive-alignment-of-llms&quot;&gt;The Cognitive Alignment of LLMs&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://pnas.org/doi/full/10.1073/pnas.2105646118&quot;&gt;The Neural Architecture of Language: Integrative Modeling Converges on Predictive Processing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://doi.org/10.1162/TACL.a.58&quot;&gt;Large Language Models Are Human-Like Internally&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://doi.org/10.1146/annurev-psych-030625-040748&quot;&gt;Cognitive Modeling Using Artificial Intelligence&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://aclanthology.org/2024.acl-long.787&quot;&gt;Mission: Impossible Language Models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;See also &lt;a href=&quot;/Transformers-Psychometric/&quot;&gt;the list&lt;/a&gt; I compiled previously, although it is in need of an update.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;more-on-the-grounding-problem-for-llms&quot;&gt;More on the Grounding Problem for LLMs&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://arxiv.org/abs/2408.09605&quot;&gt;Does Thought Require Sensory Grounding? From Pure Thinkers to Large Language Models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://aclanthology.org/2024.emnlp-main.651/&quot;&gt;Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://royalsocietypublishing.org/doi/10.1098/rstb.2023.0149&quot;&gt;Symbol Ungrounding: What the Successes (and Failures) of Large Language Models Reveal About Human Cognition&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;See also &lt;a href=&quot;/The-Symbol-Grounding-Problem/&quot;&gt;the list&lt;/a&gt; I compiled previously.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;additional-talks&quot;&gt;Additional Talks&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://dic.uqam.ca/seminaire-dic&quot;&gt;seminar of the DIC (Doctorat en informatique cognitive) at the Université du Québec à Montréal&lt;/a&gt; has hosted some excellent talks relevant for the masterclass. I recommend at least the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Chris Potts: &lt;a href=&quot;https://dic.uqam.ca/seminaire-dic/seminaire-dic-isc-cria-25-septembre-2025-par-chris-potts/&quot;&gt;Meaning in Large Language Models: Bridging Formal Semantics, Pragmatics, and Learned Representations&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Coelho Mollo: &lt;a href=&quot;https://dic.uqam.ca/seminaire-dic/seminaire-dic-isc-cria-21-septembre-2023-par-dimitri-coelho-molo/&quot;&gt;Grounding in Large Language Models: lessons for building functional ontologies for AI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Raphael Millière: &lt;a href=&quot;https://dic.uqam.ca/seminaire-dic/seminaire-dic-isc-cria-11-janvier-202-par-raphael-milliere/&quot;&gt;Mechanistic Explanation in Deep Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are many more excellent talks on the website!&lt;/p&gt;

&lt;h2 id=&quot;update-24042026-more-on-transformer-architectureself-attention&quot;&gt;UPDATE (24.04.2026): More on Transformer Architecture/Self-Attention&lt;/h2&gt;

&lt;p&gt;Over the course of the masterclass, it became clear that additional materials on the transformer architecture and the self-attention mechanism would be helpful.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Perhaps the best introduction is the series of lessons on neural networks by 3Blue1Brown, including &lt;a href=&quot;https://www.3blue1brown.com/lessons/attention&quot;&gt;a video on attention&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jalammar.github.io/illustrated-transformer/&quot;&gt;The Illustrated Transformer blog post&lt;/a&gt; by Jay Alammar is a classic.&lt;/li&gt;
  &lt;li&gt;Brendan Bycroft has created &lt;a href=&quot;https://bbycroft.net/llm&quot;&gt;a more comprehensive and comparative visualisation&lt;/a&gt; of transformer models.&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;p&gt;Note: I used an LLM to copy-edit this post.&lt;/p&gt;
</description>
        <pubDate>Wed, 15 Apr 2026 07:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Upcoming-Masterclass-Phil-AI/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Upcoming-Masterclass-Phil-AI/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Notes on the Symbol Grounding Problem</title>
        <description>&lt;p&gt;This post collects a set of notes on the Symbol Grounding Problem in the context of LLMs. In this context, the question is whether LLM systems have an appropriate connection to the world. The appropriate connections are presumed to establish that the representations and output of the models are meaningful.&lt;/p&gt;

&lt;p&gt;You can find a &lt;a href=&quot;/The-Symbol-Grounding-Problem/&quot;&gt;list of papers on the topic of symbol grounding on my page&lt;/a&gt;. My notes assume familiarity with the literature, but develop their own framing. I use the term “Grounding Problem” to encompass what others have, for example, called the &lt;a href=&quot;https://arxiv.org/abs/2304.01481&quot;&gt;“Vector Grounding Problem”&lt;/a&gt;. The goal of my notes is to develop how the grounding problem poses challenges to both AI research and philosophy, as well as to describe how these field-specific challenges relate to each other.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/klingerts-diving-machine-1600.jpg&quot; alt=&quot;19th century illustration of a diving machine&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;notes&quot;&gt;Notes&lt;/h2&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Similar to the easy and hard problem of consciousness, a distinction between an easy and a hard problem of grounding in AI systems can be drawn.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;1.1. The easy problem of grounding is the challenge of creating an AI system that can process multi-modal data and integrate the information these data sources provide. It is an engineering problem on the road to fully replicating human cognitive abilities.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;1.2. The hard problem of grounding is the challenge of a) engineering an AI system so that its representations and output carry &lt;em&gt;intrinsic meaning&lt;/em&gt;, and b) &lt;em&gt;explaining&lt;/em&gt; why any computational representations or outputs carry intrinsic meaning.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;
            &lt;p&gt;1.2.1. The hard problem of grounding has both an engineering (a) and an epistemic component (b).&lt;/p&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;p&gt;1.2.2. Establishing the success of the engineering component requires answering the epistemic component.&lt;/p&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;1.3. A solution to the epistemic component of the hard problem should answer whether solving the easy grounding problem is equivalent to solving the engineering component of the hard problem.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;As in the case of the hard problem of consciousness, whether the hard problem of grounding is truly hard is open to debate.&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;2.1. The hard problem of grounding is truly hard if and only if the explanation of why any computational representations or outputs carry intrinsic meaning requires a description that is not computationally-functionalist.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;2.1.1. A description is computationally-functionalist if and only if it is equivalent to the formalisation of a Turing machine and its processes.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;2.2. The hard problem of grounding is truly hard only if its engineering component is not shown to be equivalent to the easy problem of grounding.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;2.2.1. By analogy, the hard problem of consciousness is assumed to be truly hard because it is assumed to be beyond the functionalist theory of mind.&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;2.3. The hardness of the hard problem of grounding has to be argued for by using meta-semantic theories of meaning.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;2.4. One way of establishing that the hard problem is truly hard is to show that having phenomenal consciousness is required for the representations or output of a system to carry intrinsic meaning.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;2.4.1. I assume that solving the easy problem of grounding does not guarantee the AI system to have phenomenal consciousness.&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;2.5. Another way of establishing that the hard problem is truly hard is to show that semantic externalism provides conditions for carrying intrinsic meaning which are not guaranteed to be met by any system described purely in functional terms.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The relevance of the engineering component of the hard problem of grounding for AI research is unresolved.&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;3.1. One goal of AI research is to fully replicate human cognitive abilities.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;3.2. It is conceivable that addressing the engineering component of the hard grounding problem has no relevance for whether an AI system fully replicates human cognitive abilities.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;3.3. Even if the engineering component has no relevance for fully replicating human cognitive abilities, the epistemic component of the hard problem might still have such relevance.&lt;/p&gt;

        &lt;ul&gt;
          &lt;li&gt;
            &lt;p&gt;3.3.1. If we have strong and correct prima facie intuitions about which computing systems have intrinsically meaningful representations and output, then we have a cognitive ability that realises at least a partial solution to the epistemic component of the hard problem of grounding.&lt;/p&gt;
          &lt;/li&gt;
          &lt;li&gt;
            &lt;p&gt;3.3.2. If we have a cognitive ability that realises a (partial) solution to the epistemic component, it is an ability that AI research should aim to replicate (following from 3.1.).&lt;/p&gt;
          &lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Harnad, S. (1990). &lt;a href=&quot;https://doi.org/10.1016/0167-2789(90)90087-6&quot;&gt;The Symbol Grounding Problem&lt;/a&gt;. Physica D: Nonlinear Phenomena, 42(1), 335–346.&lt;/li&gt;
  &lt;li&gt;Mollo, D. C., &amp;amp; Millière, R. (2025). &lt;a href=&quot;https://doi.org/10.48550/arXiv.2304.01481&quot;&gt;The Vector Grounding Problem&lt;/a&gt;. arXiv.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A similar distinction has already been proposed by &lt;a href=&quot;https://philarchive.org/rec/MLLTHA-2&quot;&gt;Vincent Müller&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Following Mollo and Millière (2025), one could call this the “sensori-motor grounding problem”. As they also note, the easy grounding problem is part of the original discussion of the symbol grounding problem by Harnad (1990). &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Compare with the &lt;a href=&quot;https://plato.stanford.edu/entries/functionalism/#MachStatFunc&quot;&gt;formulation of machine-state functionalism in the Stanford Encyclopedia of Philosophy&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See, for example, the &lt;a href=&quot;https://iep.utm.edu/hard-problem-of-conciousness/&quot;&gt;discussion of the hard problem of consciousness in the Internet Encyclopedia of Philosophy&lt;/a&gt;. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 19 Oct 2025 13:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Reflections-on-the-SGP/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Reflections-on-the-SGP/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Poster at Upcoming CoNLL</title>
        <description>&lt;p&gt;I’m presenting a poster at the upcoming &lt;a href=&quot;https://conll.org/&quot;&gt;CoNLL workshop&lt;/a&gt; in Vienna collocated with the main &lt;a href=&quot;https://2025.aclweb.org/&quot;&gt;ACL conference&lt;/a&gt;. The poster summarises our paper on the &lt;em&gt;Cambridge Dictionary Look-Up&lt;/em&gt; dataset and our preliminary results on it. If you are around, come check it out at either of the two poster sessions!&lt;/p&gt;

&lt;p&gt;You can find the paper &lt;a href=&quot;https://openreview.net/forum?id=WLZmO6eFXR&quot;&gt;online on OpenReview&lt;/a&gt;. The public portions of the dataset are available on &lt;a href=&quot;https://englishlanguageitutoring.com/datasets/cambridge-dictionary-look-up-dataset&quot;&gt;the ELiT website&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’ll release more data at a later time. If you have any questions about the dataset, feel free to send me an email!&lt;/p&gt;
</description>
        <pubDate>Fri, 25 Jul 2025 11:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Upcoming-Poster-at-CoNLL/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Upcoming-Poster-at-CoNLL/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Assumptions about Learning: A Hard Lesson</title>
        <description>&lt;p&gt;I’m currently reading Melissa Bowerman’s (2018) “Ten Lectures on Language, Cognition, and Language Acquisition”, and right at the start of the first lecture, she provides a fascinating remark on assumptions historically made about language learning. When we look into her remark and consider the larger consequences, we arrive at a warning about being too sure about one’s assumptions.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/hanging-one-hand.jpg&quot; alt=&quot;A man hanging by one hand between train carriages&quot; /&gt;&lt;/p&gt;

&lt;p&gt;According to Bowerman, when crosslinguistic research into language acquisition by children began, two claims from Chomsky were at the heart of it. For this post, I’m not interested in the second assumption (universal grammar). It is the first one that interests me. Bowerman formulates the first assumption as “children learn implicit rule systems” (p. 4).&lt;/p&gt;

&lt;p&gt;To clarify this assumption and its significance, Bowerman provides some more background and contrasts Chomsky’s contribution with the earlier behaviourist research. The behaviourists had their own assumption (p. 4):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;… the idea was really that children learned in quite simple ways with general mechanisms—they learned by imitation, they learned by association.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;On Bowerman’s telling, a behaviourist approach to learning ruled until Chomsky swooped in and disrupted the discourse by asserting that children learn a very abstract rule system. The same story has been told in different ways many times, but one feature of this retelling struck me.&lt;/p&gt;

&lt;p&gt;The observant reader might notice that there is a gap in Bowerman’s brief retelling. Thinking it through step by step and leaving aside our background knowledge, it is not clear at all why the behaviourists and Chomsky are in disagreement. Chomsky is taken to assert that children acquire a certain kind of knowledge or skill: abstract rule systems. Behaviourists assert that there is a set of general mechanisms for learning. Those assertions seem perfectly compatible.&lt;/p&gt;

&lt;p&gt;One side argues for the learning of a certain kind of object, an abstract rule system, and the other side argues for a particular way of learning, using general mechanisms. It looks as if we had an argument between someone who says we have to get to a specific address (abstract rule systems) and someone else who proposes the very general mechanism of walking to get there. Where is the conflict?&lt;/p&gt;

&lt;p&gt;The conflict arises, of course, from background assumptions that Bowerman left out of her summary, presumably because they seemed too obvious to her to repeat. Two of these assumptions are (a) that there are different types of learning, and (b) that only certain types can let one acquire certain types of knowledge or skill. In particular, the general mechanisms proposed by behaviourists will never get you anywhere close to learning abstract rule systems. Walking never gets you to the address because it is surrounded by a moat. You need to swim.&lt;/p&gt;

&lt;p&gt;The same or at least a similar assumption is made explicit in the conclusion of Fodor and Pylyshyn’s “Connectionism and Cognitive Architecture” (1988, p. 69):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;There is an alternative to the Empiricist idea that all learning consists of a kind of statistical inference, realized by adjusting parameters; it’s the Rationalist idea that some learning is a kind of theory construction, effected by framing hypotheses and evaluating them against evidence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By looking at Fodor and Pylyshyn’s work we arrive at a more sensible if incomplete sketch of the claim:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;There are at least two types of learning: statistical and theory construction.&lt;/li&gt;
  &lt;li&gt;There are at least two types of objects to be learned: correlations and abstract rule systems.&lt;/li&gt;
  &lt;li&gt;The statistical type of learning cannot lead to the acquisition of abstract rule systems.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To turn this sketch into a convincing argument, we still need to know &lt;em&gt;why&lt;/em&gt; statistical learning cannot lead to the acquisition of abstract rule systems. It is here where I’ve ended up less than convinced by Fodor, Pylyshyn, and others on their side of the debate. I still struggle to precisely understand what the reason is supposed to be for this inherent limit of statistical learning (broadly construed). I know that it led them to make statements such as the following (p. 67):&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;association is not a structure sensitive relation&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I can see an interpretation of this statement that construes &lt;em&gt;association&lt;/em&gt; narrowly enough to make the statement true. But to achieve this, we need to construe association on the basis of simple examples, such as correlations between a handful of variables. A more sophisticated construal is required to do justice to what connectionism can achieve (and would go on to achieve once it transformed into deep learning). Association, or perhaps better “statistical learning”, encompasses much more than Fodor and Pylyshyn expected.&lt;/p&gt;

&lt;p&gt;Why did Fodor and Pylyshyn miss it? They made another background assumption, which they were aware of but which often got left aside: scale doesn’t matter.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; Adding parameters does not matter. But how do we know this? How do we know that the nature of the inference doesn’t change with the parameter count?&lt;/p&gt;

&lt;p&gt;A follower of Fodor might be tempted to respond that scaling up is just doing more of the same and that doing more of the same cannot bring about a difference in kind. Adjusting parameters using statistics remains just that: adjusting parameters. One might claim that the behaviour of the system won’t be different in kind just because we are adjusting many more parameters.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; That, however, is a fundamental assumption about which we can disagree.  In fact, empirical evidence speaks against this assumption for the case of neural networks.&lt;/p&gt;

&lt;p&gt;Overparameterization makes a decisive difference to the learning dynamics of the deep learning networks. We know that overparameterization helps with generalization (e.g. see Allen-Zhu et al., 2019) — a key indicator that the system has learned more general, dare I say, abstract rules — although the exact effects are still very much up for debate. &lt;a href=&quot;https://sites.google.com/view/icml2021oppo&quot;&gt;Entire workshops&lt;/a&gt; have discussed overparameterization and how to understand it.&lt;/p&gt;

&lt;p&gt;Neural models appear to be able to learn abstract rule systems using a form of statistical learning (although there are still caveats about compositionality, see my &lt;a href=&quot;https://dstrohmaier.com/transformer-speculations/&quot;&gt;previous&lt;/a&gt; &lt;a href=&quot;https://dstrohmaier.com/compositionality-a-paper/&quot;&gt;blog&lt;/a&gt; &lt;a href=&quot;https://dstrohmaier.com/compositionality-word-meaning/&quot;&gt;posts&lt;/a&gt; on the topic, as well as Kim &amp;amp; Linzen, 2020; Li et al., 2024; Valvoda et al., 2022). I’ve come to this view after co-authoring a number of papers describing how we taught simple transformer networks abstract rules of meaning (see Strohmaier &amp;amp; Wimmer 2022, 2023, 2025). The neural models learned something that would have seemed beyond their reach from Fodor’s position of 1988. There is, of course, a danger of misinterpreting a handful of results like this, but the overall pressure of the ever-expanding literature on deep learning is hard to ignore. The debates from the 60s to the 90s might have gone quite differently if we had known the results of these experiments.&lt;/p&gt;

&lt;p&gt;The lesson is that our underlying assumptions, even our most fundamental assumptions that appear to belong to the realm of philosophy — that doing more of the same cannot bring about a difference in the kind of outcome — have great impact and can mislead us for decades. Our seemingly well-developed views in a debate often rest on assumptions we cannot defend with rigor. Surely, it is not the first time we are taught this lesson in the history of philosophy and science. It will not be the last time. Given all the assumptions we have to make to engage in science in the first place, it remains a difficult lesson to accept. We will not be able to avoid the assumptions,&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; only to spell out our position as far as we can and seek awareness of their foundations.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Allen-Zhu, Z., Li, Y., &amp;amp; Liang, Y. (2019). &lt;a href=&quot;https://proceedings.neurips.cc/paper_files/paper/2019/hash/62dad6e273d32235ae02b7d321578ee8-Abstract.html&quot;&gt;Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers&lt;/a&gt;. Advances in Neural Information Processing Systems, 32.&lt;/li&gt;
  &lt;li&gt;Bowerman, M. (2018). &lt;a href=&quot;https://brill.com/display/title/35672&quot;&gt;Ten Lectures on Language, Cognition, and Language Acquisition&lt;/a&gt;. Brill.&lt;/li&gt;
  &lt;li&gt;Buckner, C. (2019). &lt;a href=&quot;https://doi.org/10.1111/phc3.12625&quot;&gt;Deep Learning: A Philosophical Introduction&lt;/a&gt;. Philosophy Compass, 14(10).&lt;/li&gt;
  &lt;li&gt;Fodor, J. A. (1974). &lt;a href=&quot;https://doi.org/10.1007/BF00485230&quot;&gt;Special sciences (or: The disunity of science as a working hypothesis)&lt;/a&gt;. Synthese, 28(2), 97–115.&lt;/li&gt;
  &lt;li&gt;Fodor, J. A., &amp;amp; Pylyshyn, Z. W. (1988). &lt;a href=&quot;https://doi.org/10.1016/0010-0277(88)90031-5&quot;&gt;Connectionism and cognitive architecture: A critical analysis&lt;/a&gt;. Cognition, 28(1), 3–71.&lt;/li&gt;
  &lt;li&gt;Kim, N., &amp;amp; Linzen, T. (2020). &lt;a href=&quot;https://doi.org/10.18653/v1/2020.emnlp-main.731&quot;&gt;COGS: A Compositional Generalization Challenge Based on Semantic Interpretation&lt;/a&gt;. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 9087–9105.&lt;/li&gt;
  &lt;li&gt;Li, Z., Jiang, G., Xie, H., Song, L., Lian, D., &amp;amp; Wei, Y. (2024). &lt;a href=&quot;https://doi.org/10.18653/v1/2024.findings-acl.576&quot;&gt;Understanding and Patching Compositional Reasoning in LLMs&lt;/a&gt;. In Findings of the Association for Computational Linguistics: ACL 2024 (pp. 9668–9688). Association for Computational Linguistics.&lt;/li&gt;
  &lt;li&gt;Valvoda, J., Saphra, N., Rawski, J., Williams, A., &amp;amp; Cotterell, R. (2022). &lt;a href=&quot;https://aclanthology.org/2022.coling-1.525&quot;&gt;Benchmarking Compositionality with Formal Languages&lt;/a&gt;. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 6007–6018). International Committee on Computational Linguistics.&lt;/li&gt;
  &lt;li&gt;Strohmaier, D., &amp;amp; Wimmer, S. (2023). &lt;a href=&quot;https://philpapers.org/rec/STRCAL-7&quot;&gt;Contrafactives and Learnability: An Experiment with Propositional Constants&lt;/a&gt;. Post-Proceedings of Logic and Engineering of Natural Language Semantics 19.&lt;/li&gt;
  &lt;li&gt;Strohmaier, D., &amp;amp; Wimmer, S. (2025). &lt;a href=&quot;https://philpapers.org/rec/STRCLA-3&quot;&gt;Contrafactives, Learnability, and Production&lt;/a&gt;. Experiments in Linguistic Meaning, 3, 395–410.&lt;/li&gt;
  &lt;li&gt;Wimmer, S., &amp;amp; Strohmaier, D. (2022). &lt;a href=&quot;https://philarchive.org/rec/STRCAL-6&quot;&gt;Contrafactives and Learnability&lt;/a&gt;. In M. Degano, T. Roberts, G. Sbardolini, &amp;amp; M. Schouwstra (Eds.), Proceedings of the 23rd Amsterdam Colloquium (pp. 298–305).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This quote is taken from one of four options Fodor and Pylyshyn offer connectionists in the paper, but I consider it to reflect a general assumption of the authors. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I find it odd that Fodor, of all people, appears to have made such an assumption, since he published a very influential paper that seems to me to argue that scale matters: Fodor’s (1974) “The Special Sciences”. Perhaps, and that is only a guess, Fodor thought that when scale matters, it does so in ways that undermine the unity of the phenomenon. Under this interpretation, the scale of neural networks would only start to matter insofar as it led beyond them being a cognitive theory of learning. (Note the discussion of connectionism as an implementation theory towards the end of Fodor and Pylyshyn 1988.) &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The assumption that scale makes little difference might also have influenced the initial neglect of deep learning by philosophers and even cognitive scientists. See the following quote from Buckner (2019, p. 2): “A commonly encountered attitude in these areas is that deep neural networks are just ‘more of the same’—perhaps an important engineering advance, but incremental rather than game changing—and so recent research developments do not merit the kind of careful scrutiny from philosophers that earlier waves of connectionism received.” &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;There are, of course, throughout the history of philosophy numerous attempts to escape the need for fallible assumptions. Descartes’ cogito is perhaps the best known example. I’m enough of a pragmatist to doubt the success of these approaches. But those pragmatist assumptions are themselves fallible assumptions. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 07 Jun 2025 14:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Assumptions-About-Learning/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Assumptions-About-Learning/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Two Upcoming Talks on Philosophy and AI</title>
        <description>&lt;p&gt;I will give two talks about Philosophy and AI in the next few weeks (February 2025).&lt;/p&gt;

&lt;h2 id=&quot;workshop-on-contrafactives&quot;&gt;Workshop on Contrafactives&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;When: February 6-7, 2025&lt;/li&gt;
  &lt;li&gt;Where: University of Düsseldorf&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The workshop &lt;a href=&quot;https://philevents.org/event/show/125150&quot;&gt;“How we do (not) talk about mistaken beliefs”&lt;/a&gt; will bring
together researchers interested in the phenomenon of contrafactive
predicates: ascription verbs that denote attitudes by which people get
things wrong. According to current knowledge, no such ascription verbs
are lexicalised.&lt;/p&gt;

&lt;p&gt;At this workshop, I will present the most recent results and some
general conclusions from my long-running research project into the
learnability of contrafactives (joint work with Simon Wimmer). We
tested whether transformer networks can acquire contrafactive
predicates. The answer is positive, but attend the talk (or wait for
future publications) to learn about the bigger lessons.&lt;/p&gt;

&lt;p&gt;If you are interested, our previous papers on this topic are:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRCAL-6&quot;&gt;Contrafactives and Learnability&lt;/a&gt; (2022)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRCAL-7&quot;&gt;Contrafactives and Learnability: An Experiment with Propositional Constants&lt;/a&gt; (2023)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRCLA-3&quot;&gt;Contrafactives, Learnability, and Production&lt;/a&gt; (2025)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;non-human-entities-and-ai&quot;&gt;Non-Human Entities and AI&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;When: February 20-22, 2025&lt;/li&gt;
  &lt;li&gt;Where: University of Heidelberg&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At Heidelberg, I will join the symposium on &lt;a href=&quot;https://philevents.org/event/show/125770&quot;&gt;“Artificial Intelligence
and Non-Human Entities: In Search of Synergies from Various
Philosophical Debates”&lt;/a&gt;. I will draw on my previous research on social
ontology to compare organisations and LLMs. Without giving too much
away, my talk will discuss how our ascriptions from the intentional
stance play different roles in the cases of organisations and LLMs.&lt;/p&gt;

&lt;p&gt;Interestingly, at the end of my paper, I found myself discussing some
of the memes swirling around in the Twitter AI debate. So if you want
to see me draw the connection between social ontology, LLMs, and the
Leviathan with a smiling face, this will be your opportunity.&lt;/p&gt;

&lt;p&gt;My previous research in this area includes:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRONN&quot;&gt;Ontology, neural networks, and the social sciences&lt;/a&gt; (2020)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STROAC-2&quot;&gt;Organisations as Computing Systems&lt;/a&gt; (2021)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRSK&quot;&gt;Social-Computation-Supporting Kinds&lt;/a&gt; (2020)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://philpapers.org/rec/STRTTO-16&quot;&gt;Two theories of group agency&lt;/a&gt; (2020)&lt;/li&gt;
&lt;/ol&gt;
</description>
        <pubDate>Mon, 03 Feb 2025 13:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/Two-Talks/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Two-Talks/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Revisiting Politeia</title>
        <description>&lt;p&gt;Being sufficiently arrogant to provide a &lt;a href=&quot;https://dstrohmaier.com/favourites-of-all-time/&quot;&gt;list of all-time favourite books&lt;/a&gt; on my website, it behooves me to give some thought to the books included on it. Thus, I have recently revisited Plato’s &lt;em&gt;Politeia&lt;/em&gt;, better known as &lt;em&gt;The Republic&lt;/em&gt;. To fit more of the engagement into my holidays, I listened to an audiobook production of &lt;em&gt;Politeia&lt;/em&gt;, while reading Julia Annas’ (1981) &lt;em&gt;An Introduction to Plato’s Republic&lt;/em&gt; along with it.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/plato.jpeg&quot; alt=&quot;bust of Plato&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It is certainly advisable to have a companion for this book, not least because many of Plato’s discussions are plain odd. Reading the commentary, it became apparent to me that some (though not all) of the arguments are a bit of an embarrassment to later philosophers. For example, why would Plato think that he can draw such a direct analogy between the structures of state and soul? At various points, and particularly in book VII of &lt;em&gt;Politeia&lt;/em&gt;, Plato appears to put more weight on this analogy than it can bear.&lt;/p&gt;

&lt;p&gt;Due to how underwhelming some of the arguments are, readers have always been tempted to seek a deeper meaning, a hidden esoteric path through the book. I’m very resistant to that, as with such a reading one can easily fool oneself into seeing whatever one wants to see. Usually, it is more productive to keep to the words of the philosophers and their arguments, as there is more than enough in them for us to engage with. I prefer a direct argument taken straight from the page.&lt;/p&gt;

&lt;p&gt;My strategy for making sense of Plato’s more feeble arguments, or &lt;em&gt;mutatis mutandis&lt;/em&gt; the lacking arguments of any other historical philosopher, tends to be one of two:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;To acknowledge that philosophy is a truly challenging task and that with the &lt;em&gt;Politeia&lt;/em&gt; we are looking at very early work, written when many of the conceptual and educational resources available to us had not been established.&lt;/li&gt;
  &lt;li&gt;To remind myself that Plato wrote for a contemporary audience, whose expectations differed greatly from our modern expectations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This strategy leads me to acknowledge weaknesses, while providing excuses for them. Julia Annas’ reading often makes similar moves and thereby reassures me that I’m not completely misguided. But this approach only gets you so far when the text tells you that there is more than what it says directly. For example, in passages where Socrates says something like this (435c-d):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;But you should know, Glaucon, that, in my opinion, we
will never get a precise answer using our present methods of argument—
although there is another longer and fuller road that does lead to such an
answer. But perhaps we can get an answer that’s up to the standard of
our previous statements and inquiries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But what, O Socrates, is this longer and fuller road? Is it the path of “dialectic”, which you name but do not develop much later in the dialogue? It isn’t clear, but Plato hints that there is more than what we can find directly in the text. Plato does not share my preference for direct argument taken straight from the page. Thus, my interpretive preferences are disappointed.&lt;/p&gt;

&lt;p&gt;Even worse, I’m quite confident that whatever Plato sought to express in an esoteric manner would disappoint me as well if I had access to it. Despite years in academia, I have yet to encounter an instance in which the ideas expressed through hints were more interesting or insightful than what you can find on a page of decent analytic philosophy. My understanding of language does not leave space for ineffable information and neither does my metaethics grant it any special value.&lt;/p&gt;

&lt;p&gt;Given all these issues, I have wondered whether Plato’s &lt;em&gt;Politeia&lt;/em&gt; is the right choice for my list of favourite books. It stands in for my general admiration of ancient Greek philosophy and so far I cannot name a better one to replace it. Aristotle’s &lt;em&gt;Metaphysics&lt;/em&gt; would be a contender, if only I had read that work in its entirety, instead of pieces here and there. Julia Annas also appears to prefer Aristotle, whose arguments and positions she compares favourably to Plato’s &lt;em&gt;Politeia&lt;/em&gt; in more than one place. The comparison to Aristotle also brings forward one of the reasons I gained intellectually from revisiting &lt;em&gt;Politeia&lt;/em&gt;: It shed light on Hegel’s philosophy.&lt;/p&gt;

&lt;p&gt;My former supervisor and eminent Hegel scholar Bob Stern, &lt;a href=&quot;https://dstrohmaier.com/Bob-Stern/&quot;&gt;whom we lost last year&lt;/a&gt;, put forward a rather Aristotelian reading of Hegel. Such a reading fit Bob’s approach to Hegel, which made his work seem rather reasonable while remaining close to the text. I, however, have been for years tempted by an interpretation that acknowledges the more Platonist and especially Neo-Platonist elements in Hegel — perhaps due to my encounter with Boethius’ &lt;em&gt;The Consolation of Philosophy&lt;/em&gt;. Two ideas where I see alignment are reason realising itself through the universe and the solution to the problem of evil. I remember mentioning these Neo-Platonist connections to Bob, who was always open to discussing such matters, but he couldn’t really see how it would help make sense of what Hegel said.&lt;/p&gt;

&lt;p&gt;Revisiting Plato’s &lt;em&gt;Politeia&lt;/em&gt;, the lines connecting its arguments to Hegel became once again apparent to me. The connections go beyond the obvious, e.g. that Plato uses the term “dialectic” for the highest form of gaining knowledge,&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and include more subtle elements, such as an obsession with encompassing unity (totality) and belief in a connection between reality and being. Surely, a closer reading would reveal even more links. One day, I might pick up these threads and write about the Platonic Hegel. But given the immense literature around both of these figures, that is an endeavour better left for another time. For now, I rejoice in the intellectual engagement afforded by these texts and reaffirm my inclusion of &lt;em&gt;Politeia&lt;/em&gt; in my list of all-time favourite books.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Annas, J. (1981). &lt;a href=&quot;http://archive.org/details/introductiontopl0000anna&quot;&gt;An Introduction to Plato’s Republic&lt;/a&gt;. Oxford: Clarendon Press; New York: Oxford University Press.&lt;/li&gt;
  &lt;li&gt;Griswold, C. L. (1988). Plato’s Metaphilosophy: Why Plato Wrote Dialogues. In C. L. Griswold (Ed.), Platonic Writings/Platonic Readings. Pennsylvania State University Press.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;But one can make quite a lot out of how Plato’s and Hegel’s method or metaphilosophy relate. One place to start would be Griswold’s (1988) text on why Plato wrote dialogues. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 19 Jan 2025 11:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/Politeia/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Politeia/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>End of 2024: Transformers and Transformation</title>
        <description>&lt;p&gt;2024 was my year of training Transformer Language Models (LMs). The hours I spent on building, training and debugging my models!&lt;/p&gt;

&lt;p&gt;I’ve been building and using NLP models based on LMs for a while, but this year I spent much more time on training them from scratch, trying many variations and learning the hard way how brittle these systems are and how difficult they are to comprehend.&lt;/p&gt;

&lt;p&gt;A well-known Feynman quote goes:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;What I cannot create, I do not understand.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wisely, this sentence is a one-sided conditional. It does not guarantee that you understand what you create. My understanding of the models I train is certainly imperfect. Even when they work, when I’ve created them successfully, they remain opaque.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/washington-hand-press.jpg&quot; alt=&quot;printing press&quot; /&gt;&lt;/p&gt;

&lt;p&gt;One can also question to what extent I created my language models. The data, the PyTorch framework, and the CUDA library all have different authors. While writing the first version of the program, choosing architectural details, and selecting the data sources, it is easy to believe one is in control. But, at least for me, this feeling was rapidly punctured by the failure of the first version. And then the failure of the second version, and the third, and the fourth…&lt;/p&gt;

&lt;p&gt;Debugging a transformer model is hard work, and it is often exceedingly difficult to tell whether one has committed a coding error or just selected the wrong hyperparameters (learning rate, batch size, etc.). I certainly have done both at the same time, making it nearly impossible to tell why the model underperforms. I went through the code again and again and re-read the relevant passages in two papers again and again.&lt;/p&gt;

&lt;p&gt;There have been hours, days, even weeks, in which I wanted to just throw it all out and never touch a piece of Python code again; read a piece of philosophy instead. In those moments, I reminded myself of C.S. Peirce’s claim that some hours in the laboratory would improve philosophers (Peirce 2001, p. 29):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;In my opinion, the present infantile condition of philosophy, — for as long
as earnest and industrious students of it are able to come to agreement upon
scarce a single principle, I do not see how it can be considered as otherwise
than in its infancy,—is due to the fact that during this century it has chiefly
been pursued by men who have not been nurtured in dissecting-rooms and
other laboratories, and who consequently have not been animated by the true
scientific Eros, but who have on the contrary come from theological seminaries,
and have consequently been inflamed with a desire to amend the lives of
themselves and others, a spirit no doubt more important than the love of
science, for men in average situations, but radically unfitting them for the task of
scientific investigation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It was not the scientific Eros that animated me when my models failed to perform. My experience was rather one of frustration. But this frustration, too, might establish habits of scientific work and thinking: neither code nor reality bend easily to your wishes or intuitions. It is all too easy to convince yourself of what you want to believe, especially if you have the argumentation skills of an academic philosopher. Making a computer model perform as you wish — that is another matter.&lt;/p&gt;

&lt;figure&gt;
&lt;img src=&quot;/assets/images/plot_loss.svg&quot; alt=&quot;A rising loss&quot; style=&quot;max-height: 900px; max-width: 100%&quot; /&gt;
  &lt;figcaption&gt;The loss should go down, alas.&lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;When the models finally started to learn, it was less a joyful surprise and more a relief. It should have been working, but for some reason the settings didn’t work — and for some reason, I did not consider one architectural detail for a long time, producing sub-par performance due to an implementation error over and over again.&lt;/p&gt;

&lt;p&gt;I imposed some of this pain on myself. Using slightly higher-level frameworks would have helped, for example, the &lt;a href=&quot;https://lightning.ai/docs/fabric/stable/&quot;&gt;Fabric framework&lt;/a&gt;, which I can recommend based on experience in other projects. But I relied primarily on PyTorch. The hours spent on multi-processing issues as a result! But this choice, while it cost me valuable time, meant that I had more control. The hours of debugging I had to deal with were the result of giving myself the opportunity to commit the errors. Taking control confronted me forcefully with my own limitations. I am in control and I am to blame.&lt;/p&gt;

&lt;p&gt;It is tempting to blame technology for an experienced lack of agency, especially when one receives the technology packaged and sealed off as a consumer product. Building one’s own model, one reclaims some agency and might learn why one traded off the agency in the first place. But once I ventured down this road — and one could go much further than I did, by not relying on frameworks like PyTorch in the first place, by implementing backpropagation from scratch — I was changed by it. I got to know and overcome some of my limitations.&lt;/p&gt;

&lt;p&gt;I’ve described here the personal experience of engaging in my Transformer research, mixed with speculations about its effects on character and habits. But solitary time spent in front of the screen is not the entirety of the scientific effort, far from it. The social exchange of beliefs is a key aspect in Peirce’s motivation for scientific investigation (see its role in &lt;em&gt;The Fixation of Belief&lt;/em&gt; in Peirce 1992).&lt;/p&gt;

&lt;p&gt;My intention for 2025 is to share more of my work and its fruits. I’ve already started by, for example, &lt;a href=&quot;/Two-Upcoming-Events/&quot;&gt;teaching a masterclass&lt;/a&gt; on LLMs for philosophers at the University of Barcelona. In 2025, I’ll be giving a few talks and, reviewers willing, I might publish one or more papers. The insights gained and habits built up during 2024, I want to refine them and put them to use.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Peirce, C. S. (1992). The Essential Peirce, Volume 1: Selected Philosophical Writings (1867–1893). Indiana University Press.&lt;/li&gt;
  &lt;li&gt;Peirce, C. S. (2001). The Essential Peirce, Volume 2: Selected Philosophical Writings (1893–1913). Indiana University Press.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 30 Dec 2024 11:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/End-of-Year-2024/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/End-of-Year-2024/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>LLMs, Symbol Grounding, and other Problems</title>
        <description>&lt;p&gt;To my surprise, the recent 2024 EMNLP conference included multiple papers with a philosophical angle. One of them attracted my attention in particular: &lt;a href=&quot;https://aclanthology.org/2024.emnlp-main.651/&quot;&gt;“Pragmatic Norms Are All You Need”&lt;/a&gt; by Reto Gubelmann, who seeks to address the Symbol Grounding Problem (SGP). As a first approximation, the SGP can be understood as the problem of endowing computational representations with a connection to objects in the real world so that the representations can refer to them. The paper argues that for conceptual reasons the SGP does not apply to LLMs.&lt;/p&gt;

&lt;p&gt;In this post, I gather some general thoughts on this paper and the problem space it discusses. My primary point will be: The argument of the paper leaves many close cousins of the SGP as open possibilities. While I won’t describe these possibilities in detail, I will sketch how they might arise.&lt;/p&gt;

&lt;h3 id=&quot;the-core-of-the-paper&quot;&gt;The Core of the Paper&lt;/h3&gt;

&lt;p&gt;The argument of Gubelmann’s paper goes roughly as follows:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The SGP depends conceptually on a particular theory of meaning and cognition, the Correspondence Theory of Meaning (CTM).&lt;/li&gt;
  &lt;li&gt;This theory of meaning and cognition does not apply to LLMs as deep learning models.&lt;/li&gt;
  &lt;li&gt;Thus, the SGP does not arise for LLMs and we can stop worrying about it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I can believe that the argument is sound on a specific construal of the SGP and the particular theory of meaning, but a lot hangs on those construals. The argument relies on conceptually linking the SGP to a version of the Correspondence Theory of Meaning (CTM) that does not apply to LLMs.&lt;/p&gt;

&lt;p&gt;On this version of the SGP and the CTM, symbols in the cognitive system have to be grounded — whatever that exactly means — so that they can stand in correspondence with real entities for them to carry meaning. Of course, if one denies that such correspondence relations are necessary for carrying meaning, then this version of the SGP does not arise. So far, it is hard to disagree with Gubelmann.&lt;/p&gt;

&lt;p&gt;I’m not particularly invested in whether this is the textually correct interpretation of the SGP (as introduced by Harnad 1990), that is, whether Gubelmann’s formulation of the SGP reflects the original intentions. I’m more interested in whether there are nearby problems not afflicted by this argument. Perhaps we should give them different names to avoid confusion, but they are what we care about when we enter the debate about the SGP.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/an-ocean-of-books.jpg&quot; alt=&quot;Someone losing their grounding amidst symbols.&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;further-problems&quot;&gt;Further Problems&lt;/h3&gt;

&lt;p&gt;To see where such related problems may be found, I look at a few other claims the paper makes along the way:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Other theories of meaning than the CTM might in fact be preferable, in particular a pragmatist norm-based theory of meaning (exemplified by Wittgenstein and Brandom).
As a reason, Gubelmann (p. 11670) points to the success of LLMs:&lt;/p&gt;

    &lt;blockquote&gt;
      &lt;p&gt;Time and again, and even more so after the publication of ChatGPT on November 30, 2022, LLMs have managed to shine in one linguistic challenge after another that was previously thought be beyond them. Hence, for the theoretical and empirical reasons sketched, we should abandon the correspondence theory of meaning in favor of a pragmatic one&lt;/p&gt;
    &lt;/blockquote&gt;

    &lt;p&gt;This passage also brings us to the second claim.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;The pragmatist theory of meaning is more likely to ascribe the understanding of meaning to LLMs. If this wasn’t the case, then the success of LLMs would not speak in favour of this theory.&lt;/li&gt;
  &lt;li&gt;The relevant version of CTM for formulating the SGP, representationalism, and the language of thought hypothesis go hand in hand (cf. p. 11666).&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I will go through these additional claims in turn, trying to show why they can be questioned and how that gives rise to problems similar to the SGP.&lt;/p&gt;

&lt;h4 id=&quot;claim-1-there-are-alternatives-to-ctm&quot;&gt;Claim 1: There Are Alternatives to CTM&lt;/h4&gt;

&lt;p&gt;Gubelmann clearly has a horse in the race when it comes to the correct theory of meaning, and his bet is not on CTM. I want to spend the least time on this claim, because my general response is somewhat trivial: The alternatives exist but are far from generally accepted. While the pragmatist norm-based theory of meaning that Gubelmann points to has important representatives in Wittgenstein, Brandom, and others, it is a minority position in both philosophy and linguistics.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;It is certainly premature to dismiss the SGP, just because the CTM is not the only option and LLMs are in tension with the latter. Some version of the CTM might be correct. Furthermore, the correct version of the CTM might be sufficiently broad to also apply to LLMs and raise the SGP (more on that when I get to the third claim). Then, we would have to address the problem after all.&lt;/p&gt;

&lt;p&gt;Of course, this amounts to little more than saying that we need to have the debate about theories of meaning before we can set other questions aside due to the outcome of this debate.&lt;/p&gt;

&lt;h4 id=&quot;claim-2-a-pragmatist-account-is-more-likely-to-ascribe-the-understanding-of-meaning-to-llms&quot;&gt;Claim 2: A Pragmatist Account Is More Likely to Ascribe the Understanding of Meaning to LLMs&lt;/h4&gt;

&lt;p&gt;Even if the CTM were superseded by a pragmatist theory of meaning as espoused by Brandom, that would not end all problems resembling the SGP. I am sceptical that this type of theory would support ascribing the understanding of meaning to LLMs, even though Gubelmann (p. 11671) writes on this matter:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;All it needs [to understand meaning on this theory] is a norm-governed practice in a society, the patterns of which can be picked up by LLMs from the training data and used to infer said norms.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But what is a norm-governed practice? In the philosophical literature, we can find a reading&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; where normativity determines meaning such that “it is not how we are &lt;em&gt;disposed&lt;/em&gt; to use an expression that determines its meaning, but how we are &lt;em&gt;supposed&lt;/em&gt; to use it” (Glüer, Wikforss &amp;amp; Ganapini 2024). While dispositions of use are something that LLMs would be excellently suited for, it is not clear that they have the correct connection to normativity.&lt;/p&gt;

&lt;p&gt;Brandom’s account is explicitly normative: the ability to commit is central to participating in this norm-governed practice of language use (Brandom 1994: 159). Without this ability, one might question whether the entity in question can understand meaning. But can LLMs commit? It is hard to see how they could be capable of committing if there is no way of holding them accountable, and I am unconvinced that providing them input that chastises them for previous output should be considered holding them accountable. In fact, I would find it easier (albeit challenging) to ground an LLM with sensory information than to change it in such a way that it can be held accountable for its commitments. We end up with a grounding problem, albeit one that concerns the grounding in commitments.&lt;/p&gt;

&lt;p&gt;I assume that Gubelmann would endorse a weaker version of a normative theory of meaning. Based on my limited knowledge of Wittgenstein’s late work and peeking into Gubelmann’s 2023 paper on this issue, I have a vague idea of what this theory might look like. But Wittgenstein’s account of rule-following comes into play, and the issues surrounding that are beyond the scope of a blog post or an EMNLP paper.&lt;/p&gt;

&lt;h4 id=&quot;claim-3-ctm-goes-hand-in-hand-with-various-theories&quot;&gt;Claim 3: CTM Goes Hand-in-Hand with Various Theories&lt;/h4&gt;

&lt;p&gt;Compared to typical philosophy papers, EMNLP papers are brief, which forces Gubelmann to provide a rather quick sketch of all the interconnected theories. The paper is specifically concerned with a representationalist version of CTM, where symbols represent objects in the world. Gubelmann contrasts this with LLMs as “connectionist, statistical devices that have no intrinsic symbolic structure” (p. 11670). The presentation suggests that symbols, representation, and correspondence theory are all on one side, and neural networks, connectionism, and statistics all on the other.&lt;/p&gt;

&lt;p&gt;But while LLMs are not symbolic models, they have some sort of representations in the form of activations. While Fodor serves in this paper as a symbolic-representationalist-correspondence foil, he also holds that connectionism is a representationalist theory (see Fodor &amp;amp; Pylyshyn 1988), just one that has unstructured representations that do not amount to a Language of Thought. Neural activations are not representations as used by symbolic systems, but it is sensible to consider them as encoding information and thereby representing. From this perspective, questions then arise about how and what these activations represent. Thus, we arrive again at a problem that strongly resembles the symbol grounding problem, although it might be better to call it the “representation grounding problem”.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;Gubelmann’s paper brought a large dose of conceptual thinking into EMNLP. It rightly raises awareness of the fact that the conceptual issues surrounding the SGP debate are very fraught and are sometimes treated rather carelessly. That being said, the paper ended up presenting a somewhat narrow picture of the conceptual issues, perhaps due to the page constraints. The goal of this post was to widen the picture.&lt;/p&gt;

&lt;p&gt;We might find ourselves accepting the CTM after all, but in a version that applies to LLMs, or we might end up rejecting that LLMs understand meaning, because we end up with a strongly normativist theory of meaning, or another theory emerges vindicated. These are all live options.&lt;/p&gt;

&lt;p&gt;Given this variety of options, I think it might be better to turn Gubelmann’s main argument on its head, and as I mentioned, he does that himself to a degree: The success of LLMs, however limited, speaks against the LoT hypothesis, somewhat against representationalism, and perhaps even against the correspondence theories of meaning — under some construals. There are interesting arguments to be had in that direction. The conceptual space is wide open, and empirical results might help to constrain our search for truth.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Fodor, J. A., &amp;amp; Pylyshyn, Z. W. (1988). &lt;a href=&quot;https://doi.org/10.1016/0010-0277(88)90031-5&quot;&gt;Connectionism and cognitive architecture: A critical analysis&lt;/a&gt;. Cognition, 28(1), 3–71.&lt;/li&gt;
  &lt;li&gt;Glüer, K., Wikforss, Å., &amp;amp; Ganapini, M. (2024). &lt;a href=&quot;https://plato.stanford.edu/archives/fall2024/entries/meaning-normativity/&quot;&gt;The Normativity of Meaning and Content&lt;/a&gt;. In E. N. Zalta &amp;amp; U. Nodelman (Eds.), The Stanford Encyclopedia of Philosophy (Fall 2024). Metaphysics Research Lab, Stanford University.&lt;/li&gt;
  &lt;li&gt;Gubelmann, R. (2023). &lt;a href=&quot;https://doi.org/10.1163/18756735-00000182&quot;&gt;A Loosely Wittgensteinian Conception of the Linguistic Understanding of Large Language Models like BERT, GPT-3, and ChatGPT&lt;/a&gt;. Grazer Philosophische Studien, 99(4), 485–523.&lt;/li&gt;
  &lt;li&gt;Gubelmann, R. (2024). &lt;a href=&quot;https://aclanthology.org/2024.emnlp-main.651&quot;&gt;Pragmatic Norms Are All You Need – Why The Symbol Grounding Problem Does Not Apply to LLMs&lt;/a&gt;. In Y. Al-Onaizan, M. Bansal, &amp;amp; Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 11663–11678). Association for Computational Linguistics.&lt;/li&gt;
  &lt;li&gt;Harnad, S. (1990). &lt;a href=&quot;https://doi.org/10.1016/0167-2789(90)90087-6&quot;&gt;The symbol grounding problem&lt;/a&gt;. Physica D: Nonlinear Phenomena, 42(1), 335–346.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Gubelmann writes that the “CTM is not necessarily symbolic, or as it is expressed in the field, representational” (p. 11666), but limits himself to the discussion of a symbolic/representational CTM. I gather that he takes it to be the basis of the SGP. By the way, I do not share the view that symbols and representations should be equated, as will become clear in my discussion of the third claim. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The &lt;a href=&quot;https://survey2020.philpeople.org/survey/results/4926&quot;&gt;2020 PhilPapers&lt;/a&gt; survey found that slightly more than half of the surveyed philosophers accept or lean towards a correspondence theory of truth. While philosophers are a creative bunch, I assume that most of those who endorse the correspondence theory of truth also endorse it for meaning. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Other readings can, of course, also be found. My point is that potential other problems closely related to SGP remain worthy of consideration, even if the narrow argument of the paper goes through. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Thu, 05 Dec 2024 13:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/SGP-Pragmatism/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/SGP-Pragmatism/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Two Upcoming Events</title>
        <description>&lt;p&gt;I have two upcoming events to share:&lt;/p&gt;

&lt;h2 id=&quot;nlp4call-workshop&quot;&gt;NLP4CALL Workshop&lt;/h2&gt;

&lt;p&gt;I will present my paper “Semantic Error Prediction: Estimating Word
Production Complexity” (co-authored with Paula Buttery) at this year’s
&lt;a href=&quot;&quot;&gt;NLP4CALL&lt;/a&gt; workshop. The presentation is this Friday, 25th of October
2024.&lt;/p&gt;

&lt;p&gt;The paper is already &lt;a href=&quot;https://ecp.ep.liu.se/index.php/sltc/article/download/1055/961&quot;&gt;available
online&lt;/a&gt;. It
argues that lexical semantic complexity &lt;em&gt;in production&lt;/em&gt; has its own
distinct patterns and proposes semantic error prediction (SEP) as a
task to estimate this kind of complexity. I build some baseline models
for this task, including one using Llama 2, and also provide a
downstream application as an example.&lt;/p&gt;

&lt;p&gt;If you are interested in SEP, I provide the scripts for re-creating my
dataset in a &lt;a href=&quot;https://github.com/dstrohmaier/semantic_error_prediction/&quot;&gt;GitHub
repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any issues or questions, feel free to email me!&lt;/p&gt;

&lt;h2 id=&quot;barcelona-masterclass&quot;&gt;Barcelona Masterclass&lt;/h2&gt;

&lt;p&gt;On the 7th of November 2024, I will hold a &lt;a href=&quot;http://www.ub.edu/grc_logos/colloquium_card.php?idSem=3356&quot;&gt;masterclass on LLMs for
philosophers&lt;/a&gt;
in Barcelona. At this event, I will introduce LLMs in more depth
without requiring any deeper mathematical or programming knowledge (no
linear algebra or Python). I hope to put up some of the materials
online at a later date.&lt;/p&gt;
</description>
        <pubDate>Mon, 21 Oct 2024 12:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Two-Upcoming-Events/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Two-Upcoming-Events/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Bob Stern (1962-2024)</title>
        <description>&lt;blockquote&gt;
  &lt;p&gt;What’s not to like?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Bob asked that question in many of our reading group sessions, usually to conclude one of his discussions of Hegel’s philosophy. Amazingly, he was able to make you think Hegel made sense. That was truly an achievement, even though one was rarely able to recapture that sense later on one’s own. With Bob out of the room, the spark seemed to vanish together with his slightly mischievous smile.&lt;/p&gt;

&lt;p&gt;Bob was the kindest soul I have encountered in my life, and I’m extremely grateful for his supervision, support, and attention during my MA and PhD studies at Sheffield. The department at Sheffield was a special place, full of gentle and caring people, deeply invested in philosophical debate. Bob fit right in and stood out as the kindest among them all. He brought life to the department, not least by (co-)organising a range of extracurricular activities, such as philosophy film nights and the “Philosophy Rocks” events.&lt;/p&gt;

&lt;p&gt;On a hike that was part of the philosophy reading weekend, one more activity that made the department special, we walked next to each other. We debated the concrete universal, which according to Bob was Hegel’s solution to all problems metaphysical, as if it were the most natural and most exciting topic in the world. It is a cliché to describe philosophy as a dialogue, and it does not fit every philosopher, but it was an approach that suited Bob and his character perfectly.&lt;/p&gt;

&lt;p&gt;Bob went beyond what duty called for in his work with students; as a Hegelian, he knew that duty cannot exhaust the good life. I remember sending him a draft of an entire chapter about two or three days prior to one of our meetings, and by then he had already read it. He debated the draft without any complaints. Neither did he complain about the twists and turns my PhD took, not even when I slowly, but surely, pushed Hegel out of my PhD thesis. The number of reference letters with which he provided me, for years after I had finished my PhD! Patient, inquisitive, and kind, Bob managed to be an excellent supervisor and human at the same time.&lt;/p&gt;

&lt;p&gt;The topic of my MA thesis, also supervised by Bob, was the meaning of death, and my argument had a Stern-ish twist to it. It went something like this: Dewey provides a conception of the meaning of an action as being determined by its experiential consequences. That is, to establish whether an action was good or bad, one had to consider the resulting experiences as if one had conducted an experiment. Deadly actions, however, do not have experiences following them, as the subject expires with death. So, how are we to evaluate them? How are we to evaluate the sacrifice of a person for a good cause? But — and this is the twist — Hegel has a solution to Dewey’s problem: The experiences continue in the society even after the agent of the action has passed away. Those experiences give their action meaning.&lt;/p&gt;

&lt;p&gt;I would be a worse philosopher if I did not acknowledge that this argument has many gaps and problems — not least because it would require spelling out this evaluative concept of meaning in more detail — but if there is something to the argument, then Bob’s actions certainly were meaningful. He has left behind groups of former PhD students, colleagues, and acquaintances, whose experiences have been enriched by their encounter with Bob. Some of them I met for the first time, just a few months ago at one of the Bob festivals, organised due to the news of his declining health. We talked about philosophy, about Bob’s life, and we went for a walk together, Bob with us. We continued the conversations that had brought us together and had enriched our lives.&lt;/p&gt;

&lt;p&gt;In his interpretation of Hegel, Bob emphasised the goal of being at home in the world. He was at home in this world and made it a home for others, in a way I continue to admire. I am afraid that I have never achieved that sense of home, and with Bob’s passing, it has drifted further away. There was much to like about Bob, his presence, and the conversations with him.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;I recommend &lt;a href=&quot;https://www.whatisitliketobeaphilosopher.com/robert-stern&quot;&gt;this interview&lt;/a&gt; Bob gave about his life last year. For some of his papers on Hegel, see his volume &lt;a href=&quot;https://philpapers.org/rec/STEHM&quot;&gt;Hegelian Metaphysics&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Tue, 27 Aug 2024 12:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Bob-Stern/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Bob-Stern/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>ELM3: Contrafactives and Interdisciplinary Work</title>
        <description>&lt;p&gt;Last week, I had the pleasure of presenting a poster at the third conference on &lt;a href=&quot;https://www.elm-conference.net/archive/elm-3-2024/elm-3-program/&quot;&gt;Experiments in Linguistic Meaning&lt;/a&gt; (ELM3). My poster presented the latest collaborative work with &lt;a href=&quot;https://simonwimmer.weebly.com/&quot;&gt;Simon Wimmer&lt;/a&gt; on the topic of contrafactives. Click &lt;a href=&quot;/assets/elm3_poster.pdf&quot;&gt;here&lt;/a&gt; to get the poster as a PDF.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; A paper will follow.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/elm3_poster.png&quot; alt=&quot;Picture of the poster&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Walking interested conference participants through the poster, I liked to start as follows:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;We are investigating contrafactives, a type of verb that does not exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I used the paradoxical statement as a hook, but of course it twists the matter a little: Simon and I are interested in the non-existence of contrafactives. Why are contrafactives — verbs that attribute a propositional attitude and presuppose the falsehood of the attitude’s content — not lexicalised in any language?&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; Why are factives such as “know” (near?) universal, while contrafactives are entirely absent from the lexicon? Why is there no word that has the meaning of “falsely believe”?&lt;/p&gt;

&lt;p&gt;Simon and I had proposed that contrafactives might be harder to learn and found some limited evidence for it in &lt;a href=&quot;https://philpapers.org/rec/STRCAL-6&quot;&gt;two&lt;/a&gt; &lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-3-031-43977-3_5&quot;&gt;previous&lt;/a&gt; papers considering comprehension. This time we looked at generation and didn’t find any evidence to that effect at all. What does that tell us about the previous two papers? I don’t know. The previous results might have been flukes, despite the statistical significance of the results. Or there could be an asymmetry between comprehension and production. We hope that future research can solve our puzzlement.&lt;/p&gt;

&lt;h2 id=&quot;observations-on-elm3-and-interdisciplinarity&quot;&gt;Observations on ELM3 and Interdisciplinarity&lt;/h2&gt;

&lt;p&gt;The conference was interdisciplinary, combining linguistics and cognitive science, with a pinch of philosophy added for flavour. Many of the methods were computational: Computer-based corpus linguistics, Bayesian inference, etc. Nonetheless, I stood out as a computer scientist because these days working in NLP is almost a proper subset of working on neural network models. I was a little surprised to find that my poster was one of very few pieces of research using transformer models.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;With all the publicity for LLMs these days, it is almost encouraging to see that other work continues. Given all the resources being committed to LLMs and the like, I’m worried about an excessive concentration of research capital. ELM3 assuaged these fears somewhat, although chatting with PhDs and postdocs about the lack of job openings in formal semantics reminded me of the limitations of a field without direct industry application — philosophy PhDs are familiar with the situation.&lt;/p&gt;

&lt;p&gt;I don’t mind having been the only researcher at the conference to train or fine-tune a transformer for their work. That being said, I’m a bit puzzled by how little impact the relative success of neural models had on the cognitive science aspect of ELM3. I heard more about &lt;a href=&quot;https://plato.stanford.edu/entries/language-thought/&quot;&gt;LoT&lt;/a&gt; than connectionism. Does the success of neural methods in NLP not support a connectionist approach to cognition? The approach might be incorrect, but some more discussion would have been in order. What do we have to rethink about semantics if transformer models capture at least some aspects of it?&lt;/p&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I corrected a typo after the conference. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;At least so far no one has been able to produce an example. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I recall seeing &lt;a href=&quot;https://elm-conference.net/wp-content/uploads/ELM3/Abstracts/paper115.pdf&quot;&gt;one poster&lt;/a&gt; by Karl Mulligan and Kyle Rawlins using &lt;a href=&quot;https://arxiv.org/abs/1904.09675&quot;&gt;BERTScores&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Fri, 21 Jun 2024 08:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/ELM3-Contrafactives/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/ELM3-Contrafactives/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>A Debate about Words</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Due to their lack of success in resolving problems, philosophers like to think that they are failing productively. For example, a philosopher might suggest that while the problem hasn’t gone away, one sees it more clearly after the debate. Such insight is a consolation prize awarded to the readers for the failure of the authors.&lt;/p&gt;

&lt;p&gt;This blog post discusses one unsuccessful debate, a debate about the nature of words. I will summarise a series of papers and how the debate ends up being less than successful. Don’t hope for a final answer to conclude this post. My goal is merely to document one debate on the issue on the way towards the answer — and to warn against being side–tracked by metaphysics.&lt;/p&gt;

&lt;p&gt;To those not engaged in philosophy of language and metaphysics, some terms in my summary and discussion might be unfamiliar. I’ve tried to include links in those cases, but the larger points I make at the end of the post should be accessible even if one never makes sense of those terms.&lt;/p&gt;

&lt;h2 id=&quot;kaplans-original-paper&quot;&gt;Kaplan’s Original Paper&lt;/h2&gt;

&lt;p&gt;In 1990, David Kaplan’s paper &lt;a href=&quot;https://philpapers.org/rec/KAPW&quot;&gt;“Words”&lt;/a&gt; appeared, discussing the philosophy of words: What are they? What is the nature of words?&lt;/p&gt;

&lt;p&gt;Kaplan is moved by issues relating to &lt;a href=&quot;https://plato.stanford.edu/entries/reference/#MillHeir&quot;&gt;direct reference theory&lt;/a&gt;. Consider the statement &lt;a href=&quot;https://plato.stanford.edu/archives/fall2014/entries/frege/#FrePuz&quot;&gt;“Hesperus = Phosphorus”&lt;/a&gt;. As it happens, both names refer to Venus, hence the identity statement is true. At one point in history the identification was a genuine discovery. But if the names refer directly, that is without mediation by e.g. descriptions (“Hesperus is the brightest star in the evening sky”), then how can the identity statement be informative?&lt;/p&gt;

&lt;p&gt;To address such puzzles, Kaplan considers giving words a cognitive role. Hence, he is looking for a theory of words that aligns well with using a difference in names as a cognitive difference that can explain substitution effects in identity statements.&lt;/p&gt;

&lt;p&gt;The target of Kaplan’s criticism is a form–based type–token conception of words, according to which words would be utterance tokens individuated based on their form that instantiate types of words. In its place, Kaplan proposes what he called back then a stage–continuant theory of words, according to which words were objects in this world with an initial creation event followed by repetitions and storage.&lt;/p&gt;

&lt;p&gt;As part of his proposal, Kaplan faces the question of how to individuate words: What makes two utterances a repetition of the same word? Kaplan stresses intent. I might mispronounce a name, but as long as I intend to speak a word I have previously acquired, my utterance will be a stage of it.&lt;/p&gt;

&lt;p&gt;Because Kaplan is primarily interested in semantic puzzles typically phrased using names and their role in identity statements, proper names play an outsized role (Kaplan 1990: 110):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;I have spoken of &lt;strong&gt;words&lt;/strong&gt;, though my examples have often involved &lt;strong&gt;names&lt;/strong&gt;. And truth to tell, it is names at which I aim. It is names that have been thought to challenge direct reference theory.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Due to this focus, Kaplan also discusses how people can be said to share a name. Hume, Kaplan, and yours truly all happen to be called “David”, but the reference seems to differ. In response to such worries, Kaplan distinguishes common currency names from generic names. Only common currency names are used as words, while generic names are cultural artefacts on which we draw for giving proper, i.e. common currency, names. Hume, Kaplan, and I share a name in the sense of having been given different common currency names using (?) one generic name.&lt;/p&gt;

&lt;h2 id=&quot;hawthorne-and-lepores-response&quot;&gt;Hawthorne and Lepore’s Response&lt;/h2&gt;

&lt;p&gt;In 2011, the debate continued with a response by John Hawthorne and Ernie Lepore, entitled &lt;a href=&quot;https://www.jstor.org/stable/23142917&quot;&gt;“On Words”&lt;/a&gt;. The almost 40-page-long paper responds thoroughly to Kaplan, although not always homing in on what bothered Kaplan himself. For example, Hawthorne and Lepore object at length to Kaplan’s stage–continuant proposal, interpreting it as a form of &lt;a href=&quot;https://plato.stanford.edu/entries/temporal-parts/&quot;&gt;four–dimensionalism&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;More on point, Hawthorne and Lepore worry about the role of intent in Kaplan’s account. Surely, at some point intent is not enough to make utterances instances of a word. If you utterly fail by community standards to speak the word you intended to speak, you have not uttered the word?&lt;/p&gt;

&lt;p&gt;Based on the length of discussion, Hawthorne and Lepore’s main target is Kaplan’s distinction between common currency and generic names. They painstakingly go through different possible motivations for the distinction and reject one after the other. Again, this is an issue that is rather specific to proper names and I struggle to see why much would hang on it. Consider the questions: Should we say that Hume, Kaplan, and I have one name? How literally should we take the claim that we share a name? When it comes to the nature of words, these questions are a sideshow. At best, they are getting at something more important: What role does reference play in the individuation of words? To answer that question, we should consider words other than names.&lt;/p&gt;

&lt;p&gt;Interestingly, Hawthorne and Lepore end on a sceptical note, doubting whether metaphysics can provide individuation criteria for words, either because the facts accessible to us are insufficient to establish them, or because words have no proper place in our final ontology. They don’t see much hope even if their &lt;a href=&quot;https://www.oxfordbibliographies.com/display/document/obo-9780199772810/obo-9780199772810-0232.xml&quot;&gt;lexeme&lt;/a&gt;-like conception of words were “supplemented with the tools of theoretical linguistics” (Hawthorne and Lepore 2011: 485).&lt;/p&gt;

&lt;h2 id=&quot;kaplans-response-to-the-response&quot;&gt;Kaplan’s Response to the Response&lt;/h2&gt;

&lt;p&gt;Kaplan responded with a further paper: &lt;a href=&quot;https://philpapers.org/rec/KAPWOW&quot;&gt;“Words on Words”&lt;/a&gt;. A few misunderstandings are cleared up, and reading the paper, it becomes apparent that Kaplan is much less wedded to any metaphysics than his interlocutors presumed — definitely less than to a good joke. Kaplan does not want to commit to four–dimensionalism, i.e. the metaphysical interpretation of his stage–continuant proposal by Hawthorne and Lepore. Kaplan is also willing to accept the type–token distinction as long as one jettisons the form–based criteria for individuating words.&lt;/p&gt;

&lt;p&gt;When it comes to the individuation of names, however, Kaplan sticks to his guns, defending both the role of intent and the common currency vs. generic names distinction. It becomes again apparent that the real subject of interest is not words in general, but names as they figure in arguments lobbed against direct reference theory. The question that troubles him is specifically whether he and Hume share a name (conceived of as a word) or have two different ones. Kaplan cares about the individuation of words insofar as it might bear on the puzzles threatening direct reference theory.&lt;/p&gt;

&lt;p&gt;As an aside, the response to the response is also the funniest paper in the debate. Kaplan cracks jokes on nearly every page. If you don’t solve the problem, you might at least be funny.&lt;/p&gt;

&lt;h2 id=&quot;brombergers-contribution&quot;&gt;Bromberger’s Contribution&lt;/h2&gt;

&lt;p&gt;But there is another paper in the debate, one with a less catchy title: &lt;a href=&quot;https://philpapers.org/rec/BROWAW-8&quot;&gt;“What Are Words? Comments on Kaplan (1990), on Hawthorne and Lepore, and on the Issue”&lt;/a&gt;. In this brief paper, &lt;a href=&quot;https://philosophy.mit.edu/bromberger&quot;&gt;Sylvain Bromberger&lt;/a&gt; confronts the original contribution by Kaplan as well as the response by Hawthorne and Lepore with the linguistic reality.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The key to the paper’s title is at the end: “on the issue”. My impression is that Bromberger is the only one to primarily care about words as units of language. Bromberger is invested in the topic, and in his paper argues that much is missing from the debate (Bromberger 2011: 489–490). For a start, he stresses that “words function as constituents of phrases and sentences” (Bromberger 2011: 490), something that is oddly absent from the debate (except for identity statements). Names do not exhaust the set of words and words generally have their roles in sentences.&lt;/p&gt;

&lt;p&gt;The paper brings in the linguistics that was previously consigned to footnotes, but in Brombergerian fashion, it ends on a sceptical note about our current epistemic status regarding the nature of words. As a result of his scepticism, Bromberger does not even claim to answer the question “what are words”, but instead hands out one of the philosophical consolation prizes (2011: 503):&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;But at least we are at a point where we can appreciate with some precision what we know we do not know.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The question about the nature of words is broad and sprawling, and to answer it we have to integrate large amounts of disparate information. We should not expect to do much better than Kaplan, Hawthorne, Lepore, and Bromberger — all serious philosophers and researchers in their own right! — unless we build upon their work and that of others. Although I did not expect an answer, I looked into the debate, because I want to keep my eyes on the actual prize, the answer to the question of what words are.&lt;/p&gt;

&lt;p&gt;The debate serves as a warning about taking the question too lightly. Kaplan was troubled by a challenge to one of the most influential semantic theories, and thought a quick discussion of the nature of words could help resolve it. But the nature of words, what they are and how words can be individuated, is far too subtle a topic to allow a quick discussion to then be used for other purposes. The danger of ending up with a partial picture — in Kaplan’s case a picture limited primarily to proper names seen through the lens of semantics — is too great.&lt;/p&gt;

&lt;p&gt;While Hawthorne and Lepore are motivated by broader concerns, they set up their response in a rather limiting way (2011: 448):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Our aim in this paper is to further advance an understanding of the nature of words, both by remedying the problems with Kaplan’s account, and also by achieving a suitable perspective on what the metaphysical investigation of word identity can hope to achieve.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hawthorne and Lepore target the shortcomings of Kaplan’s account and otherwise discuss specifically the metaphysical investigation of the individuation criteria of words. Those are issues that can be debated, but they cover just a tiny fraction of the topic circumscribed by “the nature of words” and largely neglect linguistic or cognitive considerations.&lt;/p&gt;

&lt;p&gt;Metaphysical issues are an excellent way to get side–tracked prior to proper engagement with a subject matter. This is not to say that metaphysical questions are nonsensical or that their answers are epistemically inaccessible,&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; but it is a warning about the proper place of metaphysics. The greatest hope for addressing the metaphysical issues surrounding the nature of words lies in accumulating sufficient empirical knowledge about their linguistic nature to then bring it to bear on the metaphysical questions. The Kaplan–Hawthorne–Lepore exchange would have been improved by avoiding any of the issues surrounding four–dimensionalism and discussing e.g. compound nouns at greater length. The role of intent is closer to the subject matter — cognition has some bearing on the nature of words — but the exchange on this issue does not reach very far.&lt;/p&gt;

&lt;p&gt;The lesson is that empirical complexities cannot just be ignored away by focusing on those areas least accessible to empirical investigation. While it might appear more philosophical to debate four–dimensionalism and the role of intent, that does not make it the right approach to uncover the nature of words. That lesson is in line with Kripke’s insight: The nature of water needed to be uncovered using the relevant sciences, in this case chemistry and physics. Why would words be so different?&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Bromberger, S. (2011). &lt;a href=&quot;https://philpapers.org/rec/BROWAW-8&quot;&gt;What Are Words? Comments on Kaplan (1990), on Hawthorne and Lepore, and on the Issue&lt;/a&gt;. The Journal of Philosophy, 108(9), 486–503.&lt;/li&gt;
  &lt;li&gt;Hawthorne, J., &amp;amp; Lepore, E. (2011). &lt;a href=&quot;https://www.jstor.org/stable/23142917&quot;&gt;On Words&lt;/a&gt;. The Journal of Philosophy, 108(9), 447–485.&lt;/li&gt;
  &lt;li&gt;Kaplan, D. (1990). Words. Proceedings of the Aristotelian Society, Supplementary Volumes, 64, 93–119.&lt;/li&gt;
  &lt;li&gt;Kaplan, D. (2011). Words on Words. The Journal of Philosophy, 108(9), 504–529.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The initial footnote of Bromberger’s paper makes clear that he never got proper access to Kaplan’s response to Hawthorne and Lepore. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I gather that this might be the “golly value” described by Bromberger in his paper &lt;a href=&quot;https://philpapers.org/rec/BRORI&quot;&gt;“Rational Ignorance”&lt;/a&gt;. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Although that too might be the case. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Thu, 02 May 2024 11:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/A-Debatte-about-Words/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/A-Debatte-about-Words/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Daniel Dennett (1942-2024)</title>
        <description>&lt;p&gt;Daniel Dennett has passed away.&lt;/p&gt;

&lt;p&gt;While my own connection to Dennett was limited, I want to share a few memories, because these moments spent with and around Dennett impressed me greatly.&lt;/p&gt;

&lt;p&gt;I met Dennett during my visit at Tufts University in 2017, when I was in the middle of my PhD in philosophy. I don’t believe I had been aware of Dennett’s presence at Tufts when I initially planned the trip, but once I caught wind of it, I had to sit in on one of his courses. Dennett was willing to let me attend his course, on the condition that it wasn’t too overbooked and I didn’t take away a place from a registered student.&lt;/p&gt;

&lt;p&gt;It was an undergraduate course on his then just out book “From Bacteria to Bach and Back: The Evolution of Minds”. The crowd thinned out a little as the term progressed, but I stuck around. In addition to Dennett’s own book, I also read Peter Godfrey-Smith’s “Other Minds” for discussions in the class. For a few classes, Dennett was away, engaged in some professional manner, perhaps giving a talk or presenting his book. The weeks he was there were always a highlight. Hopefully, my contributions as a PhD student amongst Bachelor students were not too obnoxious.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/magic-lantern.png&quot; alt=&quot;A magic lantern projecting the Cartesian theatre&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Dennett expressed dissatisfaction with the turns academic philosophy had taken: The all–too–common disciplinary navel–gazing lacking any serious engagement with science. The inability of philosophers to imagine possibilities and their insistence that their lack of imagination was a proof of something. Dennett was impatient with some questions of metaphysics, or other intellectual puzzles that philosophers entertain themselves with — chmess! — because he was aware that actual progress can be made in science.&lt;/p&gt;

&lt;p&gt;By sticking around, I got to know Dennett a little better. I remember having pizza with him, or rather sitting next to him having pizza with the others, since there was no vegan option. It didn’t matter; I was listening to this great mind and his anecdotes, many of which I was happy to read again in his recent autobiography. His life was truly worth a book and more.&lt;/p&gt;

&lt;p&gt;One could just hang out with Dan and the crowd forming around him, and end up having a debate about cognition, or evolution, or AI, or anything else that caught one’s intellectual fancy. Dennett was accessible and made everything around him accessible. Not just philosophy, but science and art as well, the entirety of the intellectual world. One just had to not let oneself be intimidated, be imaginative, and make one’s case. I’m deeply grateful that Dennett was open to such easy engagement and let me be part of it for a few months.&lt;/p&gt;

&lt;p&gt;For more memories of Dennett see:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://philosopherscocoon.typepad.com/blog/2024/04/in-memoriam-daniel-c-dennett-1942-2024.html&quot;&gt;Philosopher’s Cocoon&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://schwitzsplinters.blogspot.com/2024/04/flexible-pluralism-about-others.html&quot;&gt;Schwitz Splinters&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://dailynous.com/2024/04/19/daniel-dennett-death-1942-2024/&quot;&gt;Daily Nous&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://leiterreports.typepad.com/blog/2024/04/in-memoriam-daniel-dennett-1942-2024.html&quot;&gt;Leiter Reports&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Sat, 20 Apr 2024 09:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Daniel-Dennett/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Daniel-Dennett/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Suggestions for Better AI Criticism</title>
        <description>&lt;p&gt;Although I am &lt;a href=&quot;https://dstrohmaier.com/compositionality-word-meaning/&quot;&gt;acutely
aware&lt;/a&gt; of the
shortcomings of the current generation of AI models
(transformer-models in particular), most of the criticisms of AI I
stumble upon on the internet have become repetitive and lacking
insight. I am not interested in picking on anyone here; I’m interested
in reading more interesting criticisms. Therefore, I will provide a
list of proposals for &lt;em&gt;better&lt;/em&gt; AI criticism.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/dog-reading-news.jpg&quot; alt=&quot;Picture of a dog reading a
newspaper&quot; /&gt;&lt;/p&gt;

&lt;p&gt;My suggestions pertain to criticism of the current abilities of AI,
rather than criticism of their broader social consequences (supposed
harm, replacement of human activities etc.). The criticisms I have in
mind are of the blog-post length and formality, not academic. The list
is neither complete, nor beyond dispute. Hopefully, it serves as a
source of sharpening ideas. Here are my suggestions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;In your criticism, be specific about the architecture, or the family of
architectures. Not all neural network technologies are transformer
models, or even more specifically GPT models. Seek to tie your
criticism back to the specifics of the architecture. For example,
“as a transformer, the model lacks a bias about the order in which
to process a sequence and therefore…”.  The more general the
target of the criticism, the stronger the argument needs to
be. Unless you have an excellent argument, I advise against
dismissing neural networks in general.&lt;/li&gt;
  &lt;li&gt;Seek to distinguish whether the shortcomings are due to
architecture, data, training time, or another factor. If you cannot
be sure about the source, avoid claims relying on what the source
is. Fewer claims are possible when the details of the model are
secret, as in the case of most OpenAI models. While this is
frustrating, it limits what diagnoses are warranted.&lt;/li&gt;
  &lt;li&gt;When there are good reasons to be critical of the hype surrounding
AI technologies, avoid policing of emotional reactions. It is fairly
unproductive to scorn people for being impressed by what current
models can do, not least because by the expectation of two decades
ago, the models perform impressively. Instead, provide insight into
the models and their limitations, so that people can adjust their
reactions according to the reasons provided.&lt;/li&gt;
  &lt;li&gt;Avoid reductive claims as a standalone form of criticism,
i.e. claims that models are “just x”. For example, the claim that
language models are just statistical models for predicting the
next/masked word is on its own rather uninteresting, unless it is
embedded in a larger argument. If you go down the route of the
larger argument, consider whether the reductive claim is true due to
the model architecture and therefore general, or only due to a
specific usage of the architecture. For example, in transformers the
tokens can also easily be made to stand for other elements in a
sequence than words. They do not have to be just a statistical model
for predicting the next word.&lt;/li&gt;
  &lt;li&gt;When criticising examples created by models, think in terms of
distributions. From where in the distribution of model output are
the examples taken? Are the samples cherry-picked, that is, are they
examples of especially good performance? Then, stricter criteria for
their evaluation are warranted. Are they representative of the
entire distribution of model output? Then, it is more appropriate to
give a sense of the range of the examples. “Out of 5 examples, all
showed behaviour x” is a very different statement for a
cherry-picked sample of output and a more representative one. Be
open about the sampling.&lt;/li&gt;
  &lt;li&gt;When comparing to human cognitive capacities, provide evidence to
support your comparison. Unchecked by evidence, we are dubious
judges of our abilities. Asserting that people &lt;em&gt;never&lt;/em&gt; make a
certain sort of error — “a real person would never fail to see
that…” — requires empirical data to support the claim.&lt;/li&gt;
  &lt;li&gt;Be aware of human tendencies in processing input, in this case the
output of AI models, and adjust your criticism to it. We tend to be
very generous in our interpretation of text, doing our best to make
sense of it. We might be less forgiving with other forms of input
(e.g. video). The targeting of your AI criticism should
reflect more than our human processing biases.&lt;/li&gt;
  &lt;li&gt;Moving goal posts can be acceptable, but provide a justification for
why the goal posts have to be moved. Often we put the goal posts
where we believed that they would capture something deeper:
Reasoning and understanding. While we might have been mistaken in
that judgement, it needs to be argued &lt;em&gt;why&lt;/em&gt; we were mistaken and
&lt;em&gt;why&lt;/em&gt; the new place for the goal post will do any better. That a
model beat a goal post is, on its own, not a reason to move the
post.&lt;/li&gt;
  &lt;li&gt;Stay curious.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Better in terms of being intellectually enlightening and pushing forward science. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 18 Feb 2024 14:25:00 +0000</pubDate>
        <link>https://dstrohmaier.com/better-ai-criticism/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/better-ai-criticism/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Compositionality and Word Meaning</title>
        <description>&lt;p&gt;Transformer models do not learn compositionality. That is, they do not acquire the ability to construct hierarchical structures from smaller units by repeatedly applying the same rules.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;I speculated about this a while ago, in &lt;a href=&quot;https://dstrohmaier.com/transformer-speculations/&quot;&gt;this post&lt;/a&gt;. More importantly, &lt;a href=&quot;https://doi.org/10.1613/jair.1.11674&quot;&gt;research&lt;/a&gt; has shown that while transformer models perform better on compositionality tasks than previous model types, they still cannot consistently solve them (see also &lt;a href=&quot;http://dstrohmaier.com/compositionality-a-paper/&quot;&gt;this post&lt;/a&gt;). A more &lt;a href=&quot;https://openreview.net/forum?id=Fkckkr3ya8&quot;&gt;recent paper&lt;/a&gt; investigating the OpenAI GPT models, which are larger and more sophisticated than most transformer models, has again found that these models fail to learn to act in accordance with the principle of compositionality.&lt;/p&gt;

&lt;p&gt;What does this limitation mean for whether transformers can capture word meaning?&lt;/p&gt;

&lt;p&gt;The meaning of a word is closely tied to what it can contribute to the meaning of the compositional whole. The degree of dependence might vary. For example, the meaning of a name such as “Tom” might depend little on compositionality. For other words such as privatives, e.g. “fake”, it is hard to see how to understand their meaning in other ways than as their contribution to the compositional whole. It is important that FAKE in “They paid with fake money, taking the painting with them.” has MONEY within its scope. Thus, the argument goes that transformers must in principle be deficient in lexical semantics due to their current inability to learn compositionality.&lt;/p&gt;

&lt;p&gt;Another line of argument, however, goes as follows: Transformers compensate for their lack of compositional abilities by excessively attending to the nuances of lexical meaning. For example, a transformer might pick up that whatever money refers to, it is the kind of thing that is often faked. While we are able to resort to compositional rules to make out the meaning of a sentence, the model has to resort to what information it can glean about the statistics of the words making up their sentences. Of course, transformer models do not operate on mere bags of words: in all standard versions they have access to positional information and appear able to infer &lt;a href=&quot;https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT&quot;&gt;grammatical relations&lt;/a&gt;, but the relations between words might serve as the crucial crutch.&lt;/p&gt;

&lt;p&gt;Transformers are deficient in lexical semantics and transformers are hyper-attentive to lexical semantics. This statement is no contradiction, because word meaning has multiple aspects and can be processed in multiple ways. Investigating lexical semantics in transformers requires awareness of their shortcomings with regard to compositionality as well as the compensation mechanisms they might employ. Given all the challenges of exploring how transformers treat word meaning (the problem of sub-word tokenization, accounting for contextualisation, etc.), this is no trifle.&lt;/p&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The definitions of compositionality differ in detail. Within linguistics, one version goes as follows: The meaning of the whole is a function of the meaning of its parts (as structured by syntax). This is a good guiding gloss for linguistics. Many papers investigating transformers, however, use something close to what I suggest above. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Tue, 06 Feb 2024 12:25:00 +0000</pubDate>
        <link>https://dstrohmaier.com/compositionality-word-meaning/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/compositionality-word-meaning/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Book Publication: Preference Change</title>
        <description>&lt;p&gt;Now available, &lt;a href=&quot;http://doi.org/10.1017/9781009181860&quot;&gt;an open-access introduction to preference change&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/cover.jpg&quot; alt=&quot;Book Cover&quot; /&gt;&lt;/p&gt;

&lt;p&gt;When I got into the topic of preference change, a few years back by now, such an introduction was sorely lacking. I hope many readers find it of value. Much research on preference change remains to be done and I hope the readers can help with that!&lt;/p&gt;

&lt;p&gt;Michael Messerli and I have co-authored the book, which has been published by Cambridge University Press in the Elements Series for Decision Theory and Philosophy. I greatly enjoyed working with Michael and everyone else involved in the process of getting this book together. Thank you!&lt;/p&gt;
</description>
        <pubDate>Thu, 18 Jan 2024 07:25:00 +0000</pubDate>
        <link>https://dstrohmaier.com/book-out-preference-change/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/book-out-preference-change/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Generative Senses: A Prolog Exercise</title>
        <description>&lt;p&gt;Prolog is not the tool of choice for most of NLP nowadays, but this didn’t always use to be the case. The unreasonable effectiveness of neural networks for most practical NLP tasks has led to this shift, since implementing neural networks in Prolog is rather awkward. But some ideas and theories from previous decades are still interesting, for theoretical exploration if not practical application, and they are often well-implemented using Prolog. Along these lines, I have implemented some core operations from &lt;a href=&quot;https://philpapers.org/rec/FRASGA&quot;&gt;Bradley Franks’ 1995 paper “Sense Generation”&lt;/a&gt;. You can find the code on &lt;a href=&quot;https://github.com/dstrohmaier/generative-senses&quot;&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is an intriguing paper, and while it is outdated in some respects, it also prefigures some more recent theories. It suggests a decompositional, quasi-classical approach to concepts. That is, it proposes that the meaning of words can be split into symbolic components resembling definitions. A full defense of this position would require more than a single paper, so Franks considers only one particularly challenging case: Privatives, words such as “fake” or “false” that can radically change what a word means. A fake gun is no gun at all. Some general adjectives can also have a privative effect under the right circumstances: a rubber duck is no duck and a stone lion is no lion. Franks’ paper accounts for such effects by representing concepts using attribute-value structures (AVS). Unusually, the main AVS for a lexical entry is split into two sub-AVS:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The central AVS which includes features of the conceptual core.&lt;/li&gt;
  &lt;li&gt;The diagnostic AVS which includes features used to identify objects falling under the concept.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The operations in privative cases change these AVS. For example, in the case of “fake gun” the conceptual core features of the concept of GUN become negated. In the case of “stone lion”, the operations are more complex because the features for STONE need to be appropriately combined with those of LION. A stone lion has four legs (a diagnostic feature according to Franks), but it is not an organic being or a lion at all (conceptual core features for LION). The operations needed for such combinations are implemented and tested in my Prolog code for three of Franks’ examples: “fake gun”, “stone lion”, and “wild lion”.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; The paper has much more content, but these operations are at the heart of it.&lt;/p&gt;
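
&lt;p&gt;As a rough illustration only, and not a translation of the Prolog code, the “fake gun” case can be sketched in Python. The feature names, the restriction to Boolean values, and the names &lt;code&gt;Concept&lt;/code&gt; and &lt;code&gt;apply_fake&lt;/code&gt; are assumptions made for this sketch; they are not taken from Franks’ paper. The more involved combination needed for “stone lion” is deliberately left out.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A lexical entry split into two attribute-value structures (AVS)."""
    name: str
    central: dict      # features of the conceptual core
    diagnostic: dict   # features used to identify objects falling under the concept


def apply_fake(concept):
    """Negate every central feature and leave the diagnostic features untouched."""
    negated = {feature: not value for feature, value in concept.central.items()}
    return Concept("fake " + concept.name, negated, dict(concept.diagnostic))


# Hypothetical toy features, chosen only for illustration.
GUN = Concept(
    name="gun",
    central={"fires_projectiles": True},
    diagnostic={"has_barrel": True, "has_trigger": True},
)

fake_gun = apply_fake(GUN)
print(fake_gun.central)     # the core feature is now negated
print(fake_gun.diagnostic)  # the identifying features are unchanged
```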

&lt;p&gt;While the theory is from 1995, the Prolog code is more modern. I used the opportunity to try out a number of recent Prolog innovations to make the code simpler and logically purer. Cuts, a Prolog feature that would most certainly have been used in a 1995 implementation, are completely avoided. Franks’ theory invited the use of reification, that is the explicit representation of truth-values, which together with the reif library made the reasoning much easier. It was a great opportunity to showcase some of modern Prolog’s potential.&lt;/p&gt;

&lt;p&gt;My code was tested on &lt;a href=&quot;https://www.scryer.pl/&quot;&gt;Scryer-Prolog&lt;/a&gt;, but should work with minor changes on other implementations as long as they have versions of the libraries I used. If you want to try it out and have any problems, feel free to message me about it.&lt;/p&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The last one, of course, is not a privative case, since wild lions are lions, but it can serve as a test-case anyway. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Wed, 20 Dec 2023 09:25:00 +0000</pubDate>
        <link>https://dstrohmaier.com/generative-senses/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/generative-senses/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Compositionality and Transformers: A Paper</title>
        <description>&lt;p&gt;Compositionality is one of the long-standing challenges to neural NLP. &lt;a href=&quot;/transformer-speculations&quot;&gt;I’m myself a bit sceptical that transformers really offer the kind of compositional processing found in human language processing&lt;/a&gt;. But even formulating the challenge can be a challenge.&lt;/p&gt;

&lt;p&gt;In its formulation by Partee (1995), the principle of compositionality states:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The meaning of a whole is a function of the meanings of the parts and of the way they are syntactically combined.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;NLP researchers have investigated for a while whether neural network models respect the presumed compositionality of language, but even with Partee’s principle of compositionality in hand, it is not clear what “respecting compositionality” would mean. Is it sufficient if a neural model can process sentences the meaning of which is compositional or are there further restrictions on how to process them?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://doi.org/10.1613/jair.1.11674&quot;&gt;A paper that explicitly addresses such questions&lt;/a&gt; has been put forward by Hupkes, Dankers, Mul, and Bruni (2020, e.g. page 759). Their paper, entitled “Compositionality Decomposed: How do Neural Networks Generalise?”, split compositionality into five task descriptions&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; and tested sequence-to-sequence models on them.&lt;/p&gt;

&lt;p&gt;The five task descriptions are:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Systematicity: The ability to process one sentence guarantees the ability to process a compositionally related sentence. Anyone who understands “The black cat hunts the red bird” will also understand “The red cat hunts the black bird”.&lt;/li&gt;
  &lt;li&gt;Productivity: From finite semantic components, arbitrarily long semantic wholes can be formed. We can form and understand the phrase: “The cat hunts the bird, which ate the worm, which crawled through the earth, which…”.&lt;/li&gt;
  &lt;li&gt;Substitutivity: Replacing a semantic component with a synonymous phrase should not affect the overall meaning. For example, it should make no difference to replace “the black cat” with “the cat, which was black”.&lt;/li&gt;
  &lt;li&gt;Localism: The semantic function only depends on the syntactically local constituents. The meaning of “the red bird” does not differ depending on whether it occurs in the sentence “The black cat hunts the red bird” or “The red bird caught the worm”.&lt;/li&gt;
  &lt;li&gt;Overgeneralisation: Faced with a compositional function with some exceptions, at first the function will be wrongly applied even in cases where an exception occurs. For instance, a child learning English might use the standard derivation of the past tense for the verb “run”, arriving at “runned” instead of “ran”.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of these task descriptions can be questioned when it comes to natural language, as the authors of the paper are aware. Using one of their examples, localism appears to be violated when global context is required for disambiguating words. To understand what it means for the bat to fly right into my face, it is important to know whether the event occurred in a cave or on a baseball field. Context beyond the sentence might disambiguate the meaning. But if you disagree with the inclusion of a specific task description, that is no problem for the paper, since you can just ignore those results. If you disagree with all of them, I struggle to see what is left of compositionality.&lt;/p&gt;

&lt;p&gt;This is an excellent paper that avoided many of the pitfalls of previous research. It is certainly hard to look at the results (transformers performing reasonably well on substitutivity and systematicity) and think that neural networks are hopeless when it comes to compositionality, although they are obviously not perfect either. The transformer models seem specifically to struggle with productivity and localism, looking at the numbers in Table 1 on page 774. The limits of productivity especially suggest that the model has been unable to properly derive a rule it can arbitrarily apply. The transformer models only reach 50% accuracy on sequences longer than they have encountered before. That is in line with my scepticism about the compositional abilities of transformers.&lt;/p&gt;

&lt;p&gt;But I am also worried about the external validity of the positive results, that is, whether we can infer from the strong showing of transformers in some of the experiments, e.g. the one investigating substitutivity, that they have the tested-for skills also in the case of natural language. My worries are due to the artificial language employed by Hupkes et al. The language is purely instructional. It describes operations on strings of characters that return strings of characters. For example, an operation might be to reverse a string or to repeat it. The neural models are supposed to apply such functions, the end result always being new strings.&lt;/p&gt;
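
&lt;p&gt;To give a feel for what such a purely instructional language looks like, here is a minimal Python sketch of an interpreter for two unary string operations. The operation names, the prefix notation, and the function names &lt;code&gt;evaluate&lt;/code&gt; and &lt;code&gt;interpret&lt;/code&gt; are assumptions made for illustration; the vocabulary and syntax actually used by Hupkes et al. may well differ.&lt;/p&gt;

```python
# Hypothetical operation names; the paper's actual vocabulary may differ.
OPERATIONS = {
    "reverse": lambda chars: chars[::-1],
    "repeat": lambda chars: chars + chars,
}


def evaluate(tokens):
    """Apply leading operations to the characters that follow them."""
    if tokens and tokens[0] in OPERATIONS:
        # A unary operation takes the value of the rest of the instruction.
        return OPERATIONS[tokens[0]](evaluate(tokens[1:]))
    # No operation left: the remaining tokens are the characters themselves.
    return list(tokens)


def interpret(instruction):
    """Map an instruction such as 'reverse repeat a b' to its output string."""
    return " ".join(evaluate(instruction.split()))


print(interpret("reverse a b c"))
print(interpret("repeat a b"))
print(interpret("reverse repeat a b"))
```

&lt;p&gt;Every instruction in this toy language evaluates to another string of characters. Nothing in it can be true or false, and nothing can be left open by a free variable.&lt;/p&gt;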

&lt;p&gt;The language lacks variables, quantifiers, and negation. As a consequence, the semantic functions are quite different from most of natural language. One cannot even express a thought like “If you switch the first two elements of a string, and then switch the first two elements of the resulting string, you arrive at the original string”. The analysis of this proposition requires quantification and variables, which are not available in the language used by Hupkes et al.&lt;/p&gt;

&lt;p&gt;But do these shortcomings matter? As formulated above, the principle of compositionality requires that the meaning of the whole is a function of the meaning of the parts. It does not specify the functions. But surely the function matters! Otherwise the function which takes any part and returns the value True would be acceptable. Surely, every neural network can learn such a function, but we wouldn’t call the networks compositional on that basis.&lt;/p&gt;

&lt;p&gt;The functions chosen by Hupkes et al. are more interesting. They are not trivial, since different sentences and components map to different string outputs. But they are less complex than the functions required for analysing natural language. Composition turns into a different beast once quantification comes into the picture. Variables need to be dealt with. In natural language, variables of components might be free and therefore have no fixed meaning in the absence of an assignment. Consider a sentence as simple as “A cat hunts a bird”. The analysis of the component “hunts a bird” would include a free variable for the missing agent. Only when the noun phrase of “A cat” is added would this variable be bound.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; There is nothing equivalent to such free variables in the Hupkes et al. paper. The incompleteness that pervades the composition of meaning in natural language is absent. Sceptics of neural models have, as a result, an easy time discounting the positive results the paper suggests.&lt;/p&gt;

&lt;p&gt;Such problems would not arise if one trained the network on first order logic with a model-theoretic semantics.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; Why not ask the neural network about the truth of a sentence such as:&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;\[ \forall s, t \in \text{Strings} (\text{switch_1st_and_2nd_character}(s, t) \leftrightarrow  \text{switch_1st_and_2nd_character}(t, s) )  \]&lt;/p&gt;

&lt;p&gt;Such formal languages are well-understood and training examples can be automatically generated. They can also be decomposed and the components would then include free variables.&lt;/p&gt;
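
&lt;p&gt;As a hedged sketch of how a truth label for such a sentence could be computed automatically, the following Python snippet evaluates the formula above over a small finite domain. The two-letter alphabet, the length bound, and the decision to treat the relation as false for strings shorter than two characters are all assumptions made here to keep the example finite; they are not part of any existing dataset.&lt;/p&gt;

```python
from itertools import product

ALPHABET = "ab"   # assumption: a tiny alphabet keeps the domain finite
MAX_LENGTH = 3    # assumption: only strings of up to three characters


def strings():
    """Enumerate every string over ALPHABET up to MAX_LENGTH characters."""
    for length in range(MAX_LENGTH + 1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)


def switch_1st_and_2nd_character(s, t):
    """Relation: t is s with its first two characters swapped.

    Assumption: the relation is false for strings shorter than two characters.
    """
    if len(s) in (0, 1):
        return False
    return t == s[1] + s[0] + s[2:]


def formula_holds():
    """Check the biconditional for every pair (s, t) in the finite domain."""
    domain = list(strings())
    return all(
        switch_1st_and_2nd_character(s, t) == switch_1st_and_2nd_character(t, s)
        for s in domain
        for t in domain
    )


print(formula_holds())
```

&lt;p&gt;Pairs of a formula and its computed truth value could then serve as training examples, although the interpretation still has to be supplied somehow.&lt;/p&gt;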

&lt;p&gt;I assume that the challenge of interpretation-dependence led Hupkes et al. to avoid such a solution. In model-theoretic semantics the semantic value of a sentence is relative to an interpretation, that is, a description of the states of the world that make the sentence either true or false. Accordingly, if the output is supposed to be the meaning of a sentence, this would require the specification of an interpretation, either explicitly or implicitly through training examples. On the explicit approach, one has to feed an entire model as input into the neural network. On the implicit approach, the network has to learn the interpretation from the inputs it receives. Both approaches are more challenging than the one pursued by Hupkes et al.&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;This challenge, however, can be met. In fact, &lt;a href=&quot;https://philpapers.org/rec/STRCAL-7&quot;&gt;research I have conducted with Simon Wimmer&lt;/a&gt; has provided models with artificial sentences and simple serialisations of a situation that either make the sentence true or false (see Strohmaier &amp;amp; Wimmer forthcoming). Our focus wasn’t on compositional semantics, as we were interested only in the semantic function for attitude ascriptions, and therefore did not include free variables either, but we were able to train a transformer-encoder on something closer to model-theoretic semantics.&lt;/p&gt;

&lt;p&gt;There have been some &lt;a href=&quot;https://arxiv.org/abs/1802.08535&quot;&gt;experiments investigating entailment using logical formulas&lt;/a&gt; (Evans et al. 2018), but for different logical systems. The &lt;a href=&quot;https://doi.org/10.18653/v1/2020.emnlp-main.731&quot;&gt;COGS dataset&lt;/a&gt; (Kim &amp;amp; Linzen 2020), which aims to evaluate compositional abilities using the task of mapping natural language to logical form, is also noteworthy in this context. But as far as I can tell, no one has tested for the task descriptions proposed by Hupkes et al. using first-order logic formulas.&lt;/p&gt;

&lt;p&gt;Assuming I haven’t missed something (if I have, please email me!), there is room for a further empirical test of the compositional abilities and shortcomings of neural networks. Since I am busy with other research projects at the moment, I haven’t yet explored this space further, but I might get around to it (and if you are interested and able to cooperate on this, send me an email).&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Evans, R., Saxton, D., Amos, D., Kohli, P., &amp;amp; Grefenstette, E. (2018). &lt;a href=&quot;https://doi.org/10.48550/arXiv.1802.08535&quot;&gt;Can Neural Networks Understand Logical Entailment?&lt;/a&gt; (arXiv:1802.08535). arXiv.&lt;/li&gt;
  &lt;li&gt;Hupkes, D., Dankers, V., Mul, M., &amp;amp; Bruni, E. (2020). &lt;a href=&quot;https://doi.org/10.1613/jair.1.11674&quot;&gt;Compositionality Decomposed: How do Neural Networks Generalise?&lt;/a&gt; Journal of Artificial Intelligence Research, 67, 757–795.&lt;/li&gt;
  &lt;li&gt;Kim, N., &amp;amp; Linzen, T. (2020). &lt;a href=&quot;https://doi.org/10.18653/v1/2020.emnlp-main.731&quot;&gt;COGS: A Compositional Generalization Challenge Based on Semantic Interpretation.&lt;/a&gt; Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 9087–9105.&lt;/li&gt;
  &lt;li&gt;Partee, B. H. (1995). Lexical semantics and compositionality. In Language: An invitation to cognitive science, Vol. 1, 2nd ed (pp. 311–360). The MIT Press.&lt;/li&gt;
  &lt;li&gt;Strohmaier, D., &amp;amp; Wimmer, S. (forthcoming). &lt;a href=&quot;https://philpapers.org/rec/STRCAL-7&quot;&gt;Contrafactives and Learnability: An Experiment with Propositional Constants.&lt;/a&gt; Post-Proceedings of Logic and Engineering of Natural Language Semantics 19.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I chose the name “task description” over “tasks”, since some of these descriptions concern how a task is fulfilled rather than the nature of the task itself. For example, overgeneralisation describes the trajectory of learning a task. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Linguists typically use the lambda-operator to enable this semantic composition and have the variables bound. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Even if neural networks can learn the semantics of first-order logic, further challenges would remain. Natural language exceeds first-order logic, for example because it involves quantification over predicates. But these further challenges might be less specific to the compositional abilities of neural networks. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;How would this formula evaluate when the string consists of only a single character? One could return a presupposition failure in this case. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A model would not be sufficient to determine the semantic value of phrases including free variables, which would occur in an experiment following exactly the lines of Hupkes et al. One could deal with these free variables by providing assignments, or by allowing the model to return an indeterminate value. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Tue, 29 Aug 2023 12:25:00 +0100</pubDate>
        <link>https://dstrohmaier.com/compositionality-a-paper/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/compositionality-a-paper/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>LLMs and Human Cognition: Shifting Arguments, Same Assumptions</title>
        <description>&lt;p&gt;Large transformer-based language models (LLMs) are performing well on a variety of tasks. This is a reason to reconsider our understanding of language and how humans process it. Those sceptical of neural network approaches, such as Noam Chomsky and his followers, have come under particular pressure. They need to justify their scepticism about the abilities of neural networks in the light of apparent counter-evidence. Many justifications are available, a classic one being that LLMs need much more data than humans, but in this post I’ll discuss an argument by Chomsky I hadn’t heard until recently. I will follow my own progress of thought, which runs from an initial reaction of surprise at Chomsky’s argument to the dawning realisation that it showed less of a change than I had at first suspected.&lt;/p&gt;

&lt;p&gt;A few weeks ago I listened to a Tyler Cowen interview with Chomsky, in which the latter made the following argument (see the &lt;a href=&quot;https://conversationswithtyler.com/episodes/noam-chomsky/&quot;&gt;transcript&lt;/a&gt;):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[An LLM] does exactly as well with impossible systems as with languages. Therefore, in principle, it’s telling you nothing about language.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The argument seems to be that LLMs are not only able to learn actual languages,&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; but also other systems of symbols that no human could learn.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; Therefore, LLMs diverge so much from human cognition as to be uninformative.&lt;/p&gt;

&lt;p&gt;I found that argument surprising, and anyone who was arguing on Chomsky’s side in the 80s and 90s would have found it surprising back then as well. In that heyday of the battle between classical cognitive science and connectionism, philosophers and cognitive scientists like Fodor and Pylyshyn (1988) argued that connectionist models cannot, &lt;em&gt;in principle&lt;/em&gt;, learn language (because their representations lack structure). Back then, that was the position broadly aligned with Chomsky. Today, after the rise of LLMs, Chomsky’s argument claims that neural models are not informative because they can learn roughly every pattern of symbols for which we have sufficient data, not just human language.&lt;/p&gt;

&lt;p&gt;It’s easy to look at this argumentative shift and think it shows that the critics of neural networks are grasping at straws. At first glance, they appear to be forced to completely reverse their original position in response to the rise of LLMs. First they told you that neural networks couldn’t learn enough, and now they are telling you that they learn too much. But that interpretation makes things a little too easy. There is a consistent set of ideas underlying these superficially opposed arguments. These ideas include:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Language acquisition provides core insights about language cognition: The processing of language is tied to how one can acquire language&lt;/li&gt;
  &lt;li&gt;Innate skills are a core part of language acquisition: Humans do not learn language from scratch, but starting with innate skills that make human-like language cognition possible in the first place
&lt;!-- 3. Language and thought are deeply intertwined:[^X] --&gt;
&lt;!-- 4. Thought and language are systematic: Language users must have certain abilities together, e.g. be able to both think relation(a,b) and relation(b,a)[^Y] --&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of these points are trivial. Consider the first point that language learning tells you something about language cognition after the learning has (largely) stopped. A person other than Chomsky might easily take the position that these processes can be treated separately. They might concede that LLMs learn language in a very different way compared to humans and therefore throw little light on language acquisition. At the same time, they might assert that once the LLMs are trained, they process language in a rather similar way to humans. In other words, the model might capture language processing without capturing language learning/acquisition. For someone with this view, it would be of little interest that LLMs can also learn languages humans cannot learn. As long as they process actual languages the same, why worry that LLMs are also able to process other sequence patterns, if trained on those patterns?&lt;/p&gt;

&lt;p&gt;Chomsky’s position rules such a stance out, because the core skill of language cognition is one of hypothesis-driven explanation. In their &lt;a href=&quot;https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html&quot;&gt;New York Times opinion piece&lt;/a&gt; Chomsky et al. write:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Human-style thought is based on possible explanations and error correction, a process that gradually limits what possibilities can be rationally considered.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to Chomsky et al., we are seeking to explain and testing conjectured explanations against limited input. Both in acquisition and processing human thought is supposedly based on this core skill.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Hypothesis-driven rationalist inference is contrasted with mere association resulting from statistical inference. Statistical methods might help in the evaluation of a hypothesis, but the inference process revolves primarily around the symbolic hypotheses themselves, not the statistics. This matters, in Chomsky’s view, because symbolic hypotheses can &lt;em&gt;rule out&lt;/em&gt; options, rather than make them merely unlikely, as statistical inference does.&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;On this picture, language acquisition and processing rely on hypothesis-driven cognition. That is a view Chomsky has long held, probably for most of a century by now. This position is clearly continuous with the critiques of the 80s &amp;amp; 90s. In their influential paper from this period, Fodor and Pylyshyn (1988) ended on a closely related note:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;There is an alternative to Empiricist idea that all learning consists of a kind of statistical inference, realized by adjusting parameters; it’s the Rationalist idea that some learning is a kind of theory construction, effected by framing hypotheses and evaluating them against evidence. We seem to remember having been through this argument before. We find ourselves with a gnawing sense of deja vu.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A deja vu indeed! 35 years later, Chomsky and his collaborators make the same point again: Symbolic hypotheses are conjectured and then tested against limited evidence. Apparently, the point never gets too old to bear repeating. On its own, however, the point has not had its intended force because the defenders of neural network approaches keep wondering&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;whether symbolic hypothesis formation and testing really is at the core of both language acquisition and processing, and&lt;/li&gt;
  &lt;li&gt;whether neural networks cannot implement a form of this process after all?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In light of the new evidence of LLM performance, it might be sensible to review these two underlying hypotheses. Chomsky does not touch upon them in the interview with Cowen; he presumes them. His reaction to LLMs does not go so far as to question these underlying assumptions.&lt;/p&gt;

&lt;p&gt;To put my cards on the table, the performance of LLMs (and their correlation with cognitive measures, see &lt;a href=&quot;/transformers-and-the-brain/&quot;&gt;this post&lt;/a&gt; and &lt;a href=&quot;/Transformers-Converge-Cognition/&quot;&gt;that post&lt;/a&gt;) has led me to believe that less of language processing relies on the kind of processes that Chomsky assumes to be the core and more on processes implemented by LLMs. Hard-coded rules or hypothesis-testing-derived rules drive fewer cognitive sub-processes and statistical matching drives more. I have come to doubt the scope of hypothesis 1. This update does not force me to accept that LLMs have sparks of AGI or implement much of human reasoning skills. It does, nonetheless, imply that LLMs partially converge with human language processing.&lt;/p&gt;

&lt;p&gt;Listening to the Cowen interview, I was at first struck by how different Chomsky’s rationalist argument had become, only to realise I had been mistaken. If there is a problem with Chomsky’s argument, it is not so much that he has changed his tune. The problem is that the argument continues to rest on the same core assumptions and he hasn’t conceded an inch. Pressed to discuss LLMs, Chomsky does not even discuss these assumptions.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;p&gt;Fodor, J. A., &amp;amp; Pylyshyn, Z. W. (1988). &lt;a href=&quot;https://doi.org/10.1016/0010-0277(88)90031-5&quot;&gt;Connectionism and cognitive architecture: A critical analysis&lt;/a&gt;. Cognition, 28(1), 3–71.&lt;/p&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;!-- [^X]: Chomsky makes this point himself at another point in the interview.: &quot;Thought is what is generated by language. Language generates thought. They’re intimately related, if not indistinguishable.&quot; --&gt;

&lt;!-- [^Y]: This claim gets qualified by the notorious competence vs. performance distinction, another complication I will gloss over. --&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;If you want to be specific and use the vocabulary of Chomsky, the models would learn the statistics of external or E-language, not anything about internal or I-language. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;LLMs probably cannot learn to predict just any language. In fact, &lt;a href=&quot;https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00306/43545/Theoretical-Limitations-of-Self-Attention-in&quot;&gt;we know&lt;/a&gt; that standard Transformers have theoretical limits on learning certain languages. It is not relevant for the rest of the post, however, so I’ll gloss over it. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The New York Times opinion piece runs together whether something is impossible to learn and whether something can be learned to be impossible. It might be better to keep them distinct. If there are reasons to run them together, they are not obvious from the opinion piece. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This might be granting too much to Chomsky. Why can statistical processes not rule anything out?  Why can they not represent with sufficient modal force? It’s not as if you cannot force a neural network to give you an output of 0 or 1 for a label that indicates rule-conformance. This counterexample presumably misses the point, but I have trouble understanding the point without making a lot of controversial assumptions that I see no reason to grant. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sat, 12 Aug 2023 08:25:00 +0100</pubDate>
        <link>https://dstrohmaier.com/Shifting-Arguments/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Shifting-Arguments/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Transformers Converging with Cognition: More Papers</title>
        <description>&lt;p&gt;A while ago, I wrote up a number of papers (see &lt;a href=&quot;/transformers-and-the-brain&quot;&gt;this post&lt;/a&gt;), all of which suggested that transformer models have partially converged with human language cognition. Using various correlational measures and predictions the literature leads towards the conclusion that transformers and human language processing resemble each other.&lt;/p&gt;

&lt;p&gt;The rate of publishing in this field being what it is, new papers have come out or have come to my attention:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Goldstein et al.: &lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2020.12.02.403477v4&quot;&gt;Thinking ahead: spontaneous prediction in context as a keystone of language in humans and machines&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Heilbron et al.: &lt;a href=&quot;https://www.pnas.org/doi/full/10.1073/pnas.2201968119&quot;&gt;A hierarchy of linguistic predictions during natural language comprehension&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Lyu et al.: &lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2021.10.25.465687v3&quot;&gt;Finding structure during incremental speech comprehension&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Kumar et al.: &lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2022.06.08.495348v3&quot;&gt;Reconstructing the cascade of language processing in the brain using the internal computations of a transformer-based language model&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Tuckute et al.: &lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2023.04.16.537080v2&quot;&gt;Driving and suppressing the human language network using large language models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Arana et al.: &lt;a href=&quot;https://doi.org/10.1080/23273798.2023.2198245&quot;&gt;Deep learning models to study sentence comprehension in the human brain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, the overall picture I derive from these papers is unchanged: Transformer-based language models exhibit a considerable convergence with measurements of human language cognition. Many of the papers underline this result. For example, Kumar et al. find convergence not just for the contextualised embeddings of transformer models, but also for the weights of the attention heads. The convergence thus extends to another aspect of transformers. While it remains a partial convergence, the finding is increasingly robust.&lt;/p&gt;

&lt;p&gt;The interpretation of the convergence is still a matter of discussion. Both Goldstein et al. and Heilbron et al. provide further evidence for the importance of prediction, a theme that was also strong in the paper by Schrimpf et al., which I discussed in &lt;a href=&quot;/transformers-and-the-brain&quot;&gt;my previous post&lt;/a&gt;. It seems increasingly clear that the human brain engages in a predictive process when processing language. Language modelling, although not necessarily in the exact forms (MLM, CLM etc.) used to pre-train transformers, has been vindicated as a cognitive task.&lt;/p&gt;

&lt;p&gt;That both the brain and transformers predict upcoming words and/or linguistic features cannot be the whole story, however. After all, language models based on LSTMs or other RNN models also engage in such predictions, but have been found to show less (though some) convergence with cognitive measurements. What is it specifically about transformers that leads to the convergence? And to repeat an insight gleaned from papers in the previous post: It cannot be just the number of parameters.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;What, then, is it about transformer models that explains the convergence with human language processing? The best answer I found in this new set of papers is that contextualisation matters. The Goldstein et al. paper provides evidence in that direction, comparing standard contextualised GPT-2 embeddings with de-contextualised GPT-2 embeddings and GloVe embeddings.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; The standard GPT-2 embeddings perform best. But this does not answer all our questions: Why does contextualisation help? Is it because it addresses issues such as polysemy and homonymy? Or do transformers even partially address such issues as compositionality? (On the latter see &lt;a href=&quot;/transformer-speculations&quot;&gt;this post&lt;/a&gt; by me.)&lt;/p&gt;

&lt;p&gt;So far, the convergence finding has held up in the literature. When it comes to interpreting the convergence, however, research is only inching forward with many questions left open. Both sides of the convergence are opaque, hence finding the convergence itself can only be an initial finding, albeit an extremely exciting one!&lt;/p&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;One paper showing this is the one by &lt;a href=&quot;https://doi.org/10.18653/v1/2021.cmcl-1.2&quot;&gt;Merkx and Frank&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The comparison sadly does not include RNN models, which also provide a form of contextualisation. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Fri, 30 Jun 2023 09:05:00 +0100</pubDate>
        <link>https://dstrohmaier.com/Transformers-Converge-Cognition/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Transformers-Converge-Cognition/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Groningen Cognitive Modeling Spring School</title>
        <description>&lt;p&gt;I recently had the pleasure of attending the &lt;a href=&quot;http://www.cognitive-modeling.com/springschool/&quot;&gt;Groningen Cognitive Modeling Spring School&lt;/a&gt;. This spring school is an annual event, but I only recently heard of it and applied soon after. I’ve been interested in the interpretation of neural network models as cognitive models for a while, and so it was time for deeper engagement with dedicated cognitive modelling research.&lt;/p&gt;

&lt;p&gt;The spring school had different tracks for three cognitive modelling frameworks:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://act-r.psy.cmu.edu/&quot;&gt;ACT-R&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.ai.rug.nl/~niels/prims/index.html&quot;&gt;PRIMs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.nengo.ai/&quot;&gt;Nengo&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ACT-R and PRIMs are both classic cognitive frameworks. Writing in ACT-R or PRIMs is similar to writing in a programming language,&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; but the basic constructs are based on a theory about the human cognitive architecture. Symbolic processing is a key part of these frameworks.&lt;/p&gt;

&lt;p&gt;I chose the third track and learned a decent chunk of Nengo, taught by &lt;a href=&quot;http://compneuro.uwaterloo.ca/people/terrence-c-stewart.html&quot;&gt;Terrence Stewart&lt;/a&gt;. In contrast to the two other frameworks, Nengo is a neural network framework. One specifies a neural network in Python, provides it with some input, and lets the neurons compute. But it is not just another competitor to PyTorch or TensorFlow. Nengo’s focus is on &lt;em&gt;neuro-biologically plausible&lt;/em&gt; models of neural networks. The neurons are spiking neurons, backpropagation is discouraged, and the time dimension matters. This is not your standard Deep Learning framework.&lt;/p&gt;

&lt;p&gt;Learning Nengo perfectly suited my goal of better understanding how large the distance between standard NLP neural models and cognitive models is. A look at the literature suggests that Deep Learning models have started to partially converge with human cognitive processes (&lt;a href=&quot;/Transformers-Psychometric/&quot;&gt;see my post on this matter&lt;/a&gt;), but they remain biologically and cognitively implausible in many respects. For example, a standard Transformer model does not take relevantly longer to process a complex sentence than a simple one,&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; while humans certainly do. Time matters in Nengo in ways that I never had to bother with in PyTorch.&lt;/p&gt;

&lt;p&gt;The different nature of Nengo was particularly striking to me due to my background. While I was not the only computer scientist at the spring school, as far as I could tell I was the only one coming from NLP and heavily working with Transformer models. The disciplines of NLP and cognitive modelling have grown apart, despite their shared roots and some valiant research efforts to the contrary (especially the &lt;a href=&quot;https://cmclorg.github.io/&quot;&gt;Workshop on Cognitive Modeling and Computational Linguistics&lt;/a&gt;). The research I am working on aims to bridge this gap. Working with Nengo made the gap more apparent, and hopefully Nengo will be one tool to overcome it.&lt;/p&gt;

&lt;h2 id=&quot;nengo-resources&quot;&gt;Nengo Resources&lt;/h2&gt;

&lt;p&gt;If you want to learn more about &lt;a href=&quot;https://www.nengo.ai/&quot;&gt;Nengo&lt;/a&gt;, you can follow these links:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://www.nengo.ai/nengo/v3.0.0/examples/advanced/nef-algorithm.html&quot;&gt;core algorithm&lt;/a&gt; of the Neural Engineering Framework underlying Nengo&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://compneuro.uwaterloo.ca/publications/stewart2012d.html&quot;&gt;Technical overview&lt;/a&gt; of Nengo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ve also been pointed towards the book “How to Build a Brain” by Chris Eliasmith, who is behind much of the Neural Engineering Framework. So far I haven’t had the time to look at it myself.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Dehghani, M., Gouws, S., Vinyals, O., Uszkoreit, J., &amp;amp; Kaiser, L. (2023, January 23). &lt;a href=&quot;https://openreview.net/forum?id=HyzdRiR9Y7&quot;&gt;Universal Transformers&lt;/a&gt;. International Conference on Learning Representations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;In fact, ACT-R is heavily LISP-based, showing in what era of cognitive science it emerged. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Of course, there are non-standard Transformer models for which this claim is not true, e.g. when one introduces dynamic halting as in the model of Dehghani et al. 2023. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 23 Apr 2023 08:45:00 +0100</pubDate>
        <link>https://dstrohmaier.com/Spring-School-Nengo/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Spring-School-Nengo/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Transformer Models Do Not Just Learn Surface Statistics</title>
        <description>&lt;p&gt;A common criticism of Transformer models, such as &lt;a href=&quot;https://openai.com/blog/chatgpt/&quot;&gt;ChatGPT&lt;/a&gt;, &lt;a href=&quot;https://arxiv.org/abs/1810.04805&quot;&gt;BERT&lt;/a&gt;, and &lt;a href=&quot;https://blog.google/technology/ai/bard-google-ai-search-updates/&quot;&gt;Bard&lt;/a&gt;, is that they only learn surface statistics. According to this criticism, the predictions by transformers are superficial, because they do not represent the underlying state. In the case of language, the models would only capture general co-occurrence, on which transformer LLMs are typically trained, but neither the underlying hierarchical nature of language nor anything about the states of the world.&lt;/p&gt;

&lt;p&gt;Evidence by now strongly suggests that this absolute criticism is wrong. In the following, I list the papers providing the evidence:&lt;/p&gt;

&lt;h2 id=&quot;board-games&quot;&gt;Board games&lt;/h2&gt;

&lt;p&gt;Transformer models learn states of board games (Chess, Othello) when modelling sequences. This evidence is very convincing in showing that Transformer models are in principle able to recover more than surface statistics.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Toshniwal et al. 2022&lt;/li&gt;
  &lt;li&gt;Li et al. 2023&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The paper by Li et al. is especially convincing, since they test the role of the state representation using interventions.&lt;/p&gt;

&lt;h2 id=&quot;hierarchical-syntax&quot;&gt;Hierarchical Syntax&lt;/h2&gt;

&lt;p&gt;The states of Transformer language models reflect syntax, including a hierarchical structure which is not obvious from the surface of language strings:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Lin et al. 2019&lt;/li&gt;
  &lt;li&gt;Tenney et al. 2019&lt;/li&gt;
  &lt;li&gt;Rogers et al. 2020: 843-844&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;layer-wise-operations&quot;&gt;Layer-Wise Operations&lt;/h2&gt;

&lt;p&gt;Some layer-wise operations in Transformer models appear to reflect human-interpretable concepts. That these operations at least appear associated with meaningful concepts suggests that they do not just recover meaningless surface statistics:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Geva et al. 2022&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(This piece of evidence is perhaps more preliminary than the others.)&lt;/p&gt;

&lt;h2 id=&quot;correlations-with-psychometric-data&quot;&gt;Correlations with Psychometric Data&lt;/h2&gt;

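&lt;p&gt;The states and predictions of Transformer language models appear to correlate to some degree with psychometric data, including human brain states. Presumably, human cognition reflects an underlying world state when processing language, so such correlations suggest that the models capture some of that state as well:&lt;/p&gt;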
&lt;ul&gt;
  &lt;li&gt;Wilcox et al. 2020&lt;/li&gt;
  &lt;li&gt;Merkx &amp;amp; Frank 2021&lt;/li&gt;
  &lt;li&gt;Michaelov et al. 2021&lt;/li&gt;
  &lt;li&gt;Oh et al. 2021&lt;/li&gt;
  &lt;li&gt;Schrimpf et al. 2021&lt;/li&gt;
  &lt;li&gt;Caucheteux et al. 2022&lt;/li&gt;
  &lt;li&gt;Caucheteux &amp;amp; King 2022&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The evidence presented in these papers supports a role for the representation of an underlying state. At this point, I consider the statement “Transformer models only learn surface statistics” to be probably wrong. (My subjective credence that they learn something about the underlying states is around 90%.)&lt;/p&gt;

&lt;p&gt;I have not presented evidence here concerning the shortcomings of Transformer models. Such shortcomings exist. Specifically, the evidence I have pointed towards does not rule out that Transformer models are overreliant on surface statistics (for such a suggestion, see also Rogers et al. 2020: 843–844) and fail to model &lt;em&gt;some&lt;/em&gt; aspects of the underlying state. The presented evidence also does not show that Transformer models fully capture compositionality, &lt;a href=&quot;/transformer-speculations/&quot;&gt;which I personally doubt they do&lt;/a&gt;, or that they can fully grasp meaning in the absence of non-textual data.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;Caucheteux, C., Gramfort, A., &amp;amp; King, J.-R. (2022). &lt;a href=&quot;https://doi.org/10.1038/s41598-022-20460-9&quot;&gt;Deep language algorithms predict semantic comprehension from brain activity.&lt;/a&gt; Scientific Reports, 12(1), Article 1.&lt;/li&gt;
  &lt;li&gt;Caucheteux, C., &amp;amp; King, J.-R. (2022). &lt;a href=&quot;https://doi.org/10.1038/s42003-022-03036-1&quot;&gt;Brains and algorithms partially converge in natural language processing.&lt;/a&gt; Communications Biology, 5(1), Article 1.&lt;/li&gt;
  &lt;li&gt;Geva, M., Caciularu, A., Wang, K., &amp;amp; Goldberg, Y. (2022). &lt;a href=&quot;https://aclanthology.org/2022.emnlp-main.3&quot;&gt;Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space.&lt;/a&gt; Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 30–45.&lt;/li&gt;
  &lt;li&gt;Li, K., Hopkins, A. K., Bau, D., Viégas, F., Pfister, H., &amp;amp; Wattenberg, M. (2023). &lt;a href=&quot;https://doi.org/10.48550/arXiv.2210.13382&quot;&gt;Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task&lt;/a&gt; (arXiv:2210.13382). arXiv.&lt;/li&gt;
  &lt;li&gt;Lin, Y., Tan, Y. C., &amp;amp; Frank, R. (2019). &lt;a href=&quot;https://doi.org/10.18653/v1/W19-4825&quot;&gt;Open Sesame: Getting inside BERT’s Linguistic Knowledge.&lt;/a&gt; Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, 241–253.&lt;/li&gt;
  &lt;li&gt;Merkx, D., &amp;amp; Frank, S. L. (2021). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.cmcl-1.2&quot;&gt;Human Sentence Processing: Recurrence or Attention?&lt;/a&gt; Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 12–22.&lt;/li&gt;
  &lt;li&gt;Michaelov, J. A., Bardolph, M. D., Coulson, S., &amp;amp; Bergen, B. K. (2021). &lt;a href=&quot;http://arxiv.org/abs/2107.09648&quot;&gt;Different kinds of cognitive plausibility: Why are transformers better than RNNs at predicting N400 amplitude?&lt;/a&gt; ArXiv:2107.09648 [Cs].&lt;/li&gt;
  &lt;li&gt;Oh, B.-D., Clark, C., &amp;amp; Schuler, W. (2021). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.acl-long.290&quot;&gt;Surprisal Estimators for Human Reading Times Need Character Models.&lt;/a&gt; Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 3746–3757.&lt;/li&gt;
  &lt;li&gt;Rogers, A., Kovaleva, O., &amp;amp; Rumshisky, A. (2020). &lt;a href=&quot;https://doi.org/10.1162/tacl_a_00349&quot;&gt;A Primer in BERTology: What We Know About How BERT Works.&lt;/a&gt; Transactions of the Association for Computational Linguistics, 8, 842–866.&lt;/li&gt;
  &lt;li&gt;Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., &amp;amp; Fedorenko, E. (2021). &lt;a href=&quot;https://doi.org/10.1073/pnas.2105646118&quot;&gt;The neural architecture of language: Integrative modeling converges on predictive processing.&lt;/a&gt; Proceedings of the National Academy of Sciences, 118(45), e2105646118.&lt;/li&gt;
  &lt;li&gt;Tenney, I., Xia, P., Chen, B., Wang, A., Poliak, A., McCoy, R. T., Kim, N., Durme, B. V., Bowman, S. R., Das, D., &amp;amp; Pavlick, E. (2019). &lt;a href=&quot;https://openreview.net/forum?id=SJzSgnRcKX&quot;&gt;What do you learn from context? Probing for sentence structure in contextualized word representations.&lt;/a&gt; International Conference on Learning Representations.&lt;/li&gt;
  &lt;li&gt;Toshniwal, S., Wiseman, S., Livescu, K., &amp;amp; Gimpel, K. (2022). &lt;a href=&quot;https://doi.org/10.48550/arXiv.2102.13249&quot;&gt;Chess as a Testbed for Language Model State Tracking&lt;/a&gt; (arXiv:2102.13249). arXiv.&lt;/li&gt;
  &lt;li&gt;Wilcox, E. G., Gauthier, J., Hu, J., Qian, P., &amp;amp; Levy, R. (2020). &lt;a href=&quot;https://doi.org/10.48550/arXiv.2006.01912&quot;&gt;On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior&lt;/a&gt; (arXiv:2006.01912). arXiv.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 13 Mar 2023 15:08:00 +0000</pubDate>
        <link>https://dstrohmaier.com/transformer-do-not-just-learn-surface-statistics/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/transformer-do-not-just-learn-surface-statistics/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Are Transformer LLMs Minds?</title>
        <description>&lt;p&gt;Transformer LLMs, such as &lt;a href=&quot;https://openai.com/blog/chatgpt/&quot;&gt;ChatGPT&lt;/a&gt;, &lt;a href=&quot;https://arxiv.org/abs/1810.04805&quot;&gt;BERT&lt;/a&gt;, and &lt;a href=&quot;https://blog.google/technology/ai/bard-google-ai-search-updates/&quot;&gt;Bard&lt;/a&gt;, have sufficiently impressed the public that some have described them as AI minds. But is this ascription of a mind justifiable?&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Betteridge’s law of headlines&lt;/em&gt; states:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Any headline that ends in a question mark can be answered by the word &lt;em&gt;no&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I believe this law fails for the present blog post. The correct answer to whether Transformer LLMs are minds is: &lt;em&gt;It is complicated, but it is typically not useful to describe them as minds.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I will make my case by asking and addressing more specific questions:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;What are Transformer LLMs?&lt;/li&gt;
  &lt;li&gt;Do we have a widely accepted set of necessary and jointly sufficient conditions for ascribing a mind?&lt;/li&gt;
  &lt;li&gt;Is there a somewhat plausible cognitive science theory of mental states (e.g. beliefs) that would describe Transformer LLMs?&lt;/li&gt;
  &lt;li&gt;What kind of concept is the concept MIND?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I will address each of these questions in turn.&lt;/p&gt;

&lt;h2 id=&quot;what-are-transformer-llms&quot;&gt;What are Transformer LLMs?&lt;/h2&gt;

&lt;p&gt;The answer to this question is: &lt;em&gt;Transformer LLMs are large language models using the Transformer deep learning architecture.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A language model is a model predicting the occurrence of words (or other linguistic units, such as characters) given a context. Generally, such models excel at predicting what the next element in a linguistic sequence might be. A large language model is simply a language model that has a large number of parameters and has been trained on large amounts of data. For example, GPT-3 had about 175 billion parameters.&lt;/p&gt;

&lt;p&gt;The Transformer architecture (Vaswani et al. 2017) is currently the most prominent neural network architecture in deep learning. Without going into the details of the architecture, it can be said that it has enabled networks to make better use of general contextual information, although often limited to relatively small context windows. For more details, I recommend &lt;a href=&quot;http://nlp.seas.harvard.edu/annotated-transformer/&quot;&gt;this post on the inner workings of the original transformer&lt;/a&gt; and &lt;a href=&quot;https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/&quot;&gt;this post on the family of Transformer models&lt;/a&gt;; for an academic survey of what we know about Transformers, see Rogers et al. 2020.&lt;/p&gt;

&lt;p&gt;Predicting the next element in a sequence is a very general task. It occurs when creating dialogue responses, as in the case of ChatGPT, but it can also serve as a training task to find generalising model parameters, which can then be used by integrating the Transformers in larger computational systems. For example, it is common to use states from the BERT model (Devlin et al. 2019) by putting additional neural network heads on top. Some Transformer LLMs have also been trained on tasks other than predicting words in context. In &lt;a href=&quot;https://openai.com/blog/chatgpt/&quot;&gt;the case of ChatGPT&lt;/a&gt;, the model has also been trained using reinforcement learning, in addition to the more standard deep learning methods. I will not consider the impact of these strategies in detail.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/gramme_machine.jpg&quot; alt=&quot;Picture of a complex machine&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;do-we-have-a-widely-accepted-set-of-necessary-and-jointly-sufficient-conditions-for-ascribing-a-mind&quot;&gt;Do we have a widely accepted set of necessary and jointly sufficient conditions for ascribing a mind?&lt;/h2&gt;

&lt;p&gt;The answer to this question is: &lt;em&gt;no&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;There is no such thing as a scientific consensus on the nature of minds. I personally endorse the computational theory of mind, according to which minds are computational systems. This condition is fulfilled by Transformers, but it is likely insufficient. After all, the chip in a microwave is a computational system and we usually do not consider it a mind.&lt;/p&gt;

&lt;p&gt;There are some standard proposals for additional conditions that might be added to arrive at a jointly sufficient set:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Ability to solve problems&lt;/li&gt;
  &lt;li&gt;Presence of certain mental states such as beliefs (access consciousness)&lt;/li&gt;
  &lt;li&gt;Phenomenal consciousness&lt;/li&gt;
  &lt;li&gt;Presence of a non-physical substance (substance dualism)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This list is by no means exhaustive, but it provides a sense of the range of possible conditions for having a mind. Depending on which conditions one endorses, the case for declaring Transformer LLMs to be minds looks quite different.&lt;/p&gt;

&lt;p&gt;The first condition is the least restrictive, so it is not surprising that a Transformer LLM is likely to fulfil it. LLMs seem to be able to solve &lt;em&gt;some&lt;/em&gt; problems, most obviously predicting the next word in a sentence. With little modification, they can also address other problems, such as certain parentheses matching tasks (Weiss et al. 2021) and disambiguating word senses at a (near-)human level (Conia &amp;amp; Navigli 2021). But then, one might worry: Does a thermostat not solve the problem of keeping the temperature at a certain level? I specify a temperature and it figures out when to turn the heating on and for how long. We will come back to that case, but it suggests that the ability to solve problems, broadly understood, might be necessary but insufficient for being a mind.&lt;/p&gt;

&lt;p&gt;The second condition requires the computational system to have “access consciousness”, which I equate with having mental states, such as beliefs and wants. Colloquially, we say that someone has a mind of their own to underline that they have beliefs and wants of their own. They are not just tools solving problems for us, but they represent the world as being a certain way and try to intervene in the world to make it align more with how they want it to be. Mental states, such as beliefs and wants, characterise the notion of access consciousness.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; It is a restrictive but quite appealing requirement for ascribing a mind. My microwave falls short of it, but humans, most of them anyway, meet it.&lt;/p&gt;

&lt;p&gt;The third condition is the presence of phenomenal consciousness. To have phenomenal consciousness is to experience the world in a qualitative way (Nagel 1974). Common examples are the quality of red and pain. Phenomenal consciousness has been heavily debated and few if any conclusions have been reached. While I strongly doubt that LLMs have phenomenal consciousness,&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; it is less clear that we want to make this a necessary condition for having a mind. Especially in the presence of beliefs and desires, phenomenal qualities appear to be more of an additional feature. Assume that you meet a future AGI-robot and it truthfully confesses that its experience of the world lacks any quality. It would still be sensible to ask this robot what it &lt;em&gt;believes&lt;/em&gt; and &lt;em&gt;wants&lt;/em&gt; and on occasion to ask it what’s on its mind.&lt;/p&gt;

&lt;p&gt;Requiring the presence of a non-physical substance is the fourth and most restrictive condition. It is strongly associated with the kind of dualism proposed by Descartes. I don’t believe in the existence of a non-physical mind substance, so as far as I am concerned humans fail to meet it as well. That makes it, presumably, too restrictive a condition. &lt;a href=&quot;https://plato.stanford.edu/entries/dualism/&quot;&gt;Other formulations&lt;/a&gt; of dualism avoiding a non-physical substance have been put forward, but usually they mainly attempt to capture phenomenal consciousness, which I have already covered in the previous paragraph.&lt;/p&gt;

&lt;p&gt;There is no uncontroversial criterion for mindhood, but I suggest that having mental states is a decent starting point. If we were happy to describe Transformer LLMs as having beliefs and wants, then they would seem excellent candidates for being minds. Conversely, if LLMs lacked these states, we would be more sceptical about their chances. Either way, we could still continue to quibble, but with a much improved understanding of the matter at hand.&lt;/p&gt;

&lt;p&gt;For the sake of this post, I will accept access consciousness as a starting point. Accordingly, I will consider a computational system to be minded if it has states sufficiently similar to mental states such as beliefs. This suggestion would be more helpful, of course, if we had a universally accepted theory of such mental states. Unsurprisingly, we lack such a theory and so I will in the next section resort to providing two samples from the realm of plausible theories.&lt;/p&gt;

&lt;h2 id=&quot;is-there-a-somewhat-plausible-cognitive-science-theory-of-mental-states-that-would-describe-transformer-llms&quot;&gt;Is there a somewhat plausible cognitive science theory of mental states that would describe transformer LLMs?&lt;/h2&gt;

&lt;p&gt;The answer to this question is: &lt;em&gt;yes&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Daniel Dennett has long been a proponent of the so-called “intentional stance” theory of minds. He proposes that mental states such as beliefs can only be discerned by taking a certain predictive strategy, which he calls the “intentional stance”. To take this stance, an observer considers the system, assigns some beliefs and wants to it, and then makes predictions on that basis. For example, you might take the intentional stance when trying to figure out why the neighbour’s dog is barking. You might postulate that it &lt;em&gt;believes&lt;/em&gt; that there is someone entering the place and &lt;em&gt;wants&lt;/em&gt; to scare them away.&lt;/p&gt;

&lt;p&gt;Having introduced the intentional stance, Dennett goes on to argue that any system&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;whose behavior is well predicted [from this stance] is in the fullest sense of the word a believer. (Dennett 1989, p. 15)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to Dennett, there is nothing more to having a mind with beliefs than an observer taking the intentional stance towards the system, ascribing beliefs, and having reasonable success with this approach. If you can start to predict the barking of the dog based on your belief and desire ascriptions, your use of the intentional stance was successful. Next time someone enters the place, the dog barks again. Success!&lt;/p&gt;

&lt;p&gt;The degree of success required to justify the attribution of mental states is debatable, because the strategy also has some success with thermostats. (I promised we would get back to the example.) After all, I can predict that the thermostat will turn up the heater by attributing to it the belief that it is 18° and the desire to have the room be at a temperature of 21°. But this is a questionable use of the intentional stance, because I can predict what will happen relatively simply by describing the thermostat as a mechanism.&lt;/p&gt;

&lt;p&gt;Mechanical descriptions of dogs and Transformer LLMs, by contrast, quickly run into difficulties. They might not be impossible, and important research has been produced (e.g. Voita et al. 2019, Dai et al. 2022, Geva et al. 2022), but the challenges in formulating them and the ease of prediction provided by the intentional stance justify the ascription of beliefs. At least, that is what the intentional stance theory holds.&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; According to Dennett’s intentional stance theory of mind, Transformer LLMs have minds.&lt;/p&gt;

&lt;p&gt;To be clear, Dennett’s intentional stance theory is &lt;em&gt;not&lt;/em&gt; widely accepted within cognitive science. It receives attention and might very well be presented in a standard philosophy of mind or cognitive science course, but is treated as a rather extreme view within the field. Dennett serves as a positive example showing that there is a somewhat plausible cognitive science theory ascribing a mind to Transformer LLMs, not more.&lt;/p&gt;

&lt;p&gt;Like many others, I am more tempted by what is known in the philosophy of mind as &lt;a href=&quot;https://plato.stanford.edu/entries/functionalism/&quot;&gt;“functionalism”&lt;/a&gt;. Functionalism is a family of theories which judge whether something is a mental state by checking whether it has the required &lt;em&gt;functional profile&lt;/em&gt;. For example, something is a pause button if it has the functional profile of stopping a relevant process when you push it. The button itself can be made of wood, metal, or even only exist graphically on a touchscreen. What matters is that it plays the right role in a larger system. Being a belief would then be similar to being a pause button. The states in the LLM or in the human brain have to play the right role in the overall cognitive system to be beliefs.&lt;/p&gt;

&lt;p&gt;Functionalism typically requires more than the kind of predictive success that Dennett declared the criterion for mind ascription. Whatever realises the mental states, be it wetware or hardware, has to fit a certain functional profile, not just lead to predictive success. The question, therefore, becomes what the relevant functional profiles are for mental states. What is their characteristic role in computational systems? These functional profiles would have to be very abstract if we seek to attribute beliefs both to humans and dogs, which after all have quite different cognitive capacities. In the case of belief, the profile would probably require some interaction with wants, so that the system acts &lt;em&gt;on the basis of its beliefs&lt;/em&gt; towards achieving its wants. Whether Transformer LLMs fulfil this aspect of the functional profile is debatable. But one might propose a stricter functional profile and require&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;the candidate state to induce physical engagement with the world under suitable conditions, or&lt;/li&gt;
  &lt;li&gt;to enable forms of reasoning which LLMs still do not reliably exhibit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once again, there is nothing close to a universally accepted answer on what the functional profile of beliefs is. But even if we could agree on the functional profile of belief states, we would still have to wonder whether having states with a similar but not quite the same functional profile is sufficient for access consciousness and therefore for being a mind. Perhaps Transformer LLMs do not have beliefs but almost-something-like-beliefs. If that were so, would that justify declaring them to be minds? How far off can the functional profile be before we draw the line? In light of such issues, I suggest we should ask whether the search for a sufficient condition for being a mind might be based on a misunderstanding of the semantics of MIND.&lt;/p&gt;

&lt;h2 id=&quot;what-kind-of-concept-is-the-concept-mind&quot;&gt;What kind of concept is the concept MIND?&lt;/h2&gt;

&lt;p&gt;My partial answer to this question is: &lt;em&gt;A graded concept to be used with care.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At least so far, no one has found a convincing analysis of what it is to have the mental states that would characterise a mind. Certainly, no one has been able to settle the debates with a set of necessary and jointly sufficient conditions. That might also be because MIND does not allow for an analysis in terms of such conditions.&lt;/p&gt;

&lt;p&gt;The concept of MIND resembles that of BIRD more closely than that of HYDROGEN. An atom is a hydrogen atom if and only if it has exactly one proton in its nucleus. No such biconditional exists for BIRD. Instead we have core examples of what falls under the concept (a robin), further removed examples (a penguin), and as we go further down the spectrum to the archaeopteryx, we do not know at which point to draw the line. The concept of BIRD has something like a prototypical structure, where we have features that are typical for birds, but we have no clear cutoff for being a bird.&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Being a mind might be like being a bird. A computational system might be closer to or further away from being a prototypical mind. If we are arrogant enough to take the human mind as the prototype of a mind, Transformer LLMs are certainly far removed – and I will stress this below again – but this is not to say that they do not fall under the concept at all. Penguins do not fly; they do not live in trees; they eat fish instead of seeds. They are pretty far removed from the typical bird. Penguins are, nonetheless, birds.&lt;/p&gt;

&lt;p&gt;By now I have justified my assertion that part of the correct answer to the original question is: &lt;em&gt;It is complicated.&lt;/em&gt; It is complicated because we don’t know enough about minds and Transformer LLMs, and it is complicated because the semantics of the concept allows some stretching. But I will end on a more definitive note. At least for now, I believe that in most contexts it is not particularly &lt;em&gt;useful&lt;/em&gt; to call LLMs minds.&lt;/p&gt;

&lt;p&gt;When we think of minds, we think of humans and other relatively complex animals that&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;engage with non-textual objects in the world and manipulate them according to their wants,&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
  &lt;li&gt;self-reproduce and have been subjected to long-term evolution,&lt;/li&gt;
  &lt;li&gt;are heavily multi-modal (e.g. have smell, touch, proprioception etc.),&lt;/li&gt;
  &lt;li&gt;are probably not trained by backpropagation,&lt;/li&gt;
  &lt;li&gt;spend a significant amount of their cognitive capacity on securing their energy source,&lt;/li&gt;
  &lt;li&gt;either don’t use language at all or use it &lt;em&gt;as humans&lt;/em&gt;,&lt;/li&gt;
  &lt;li&gt;have phenomenal consciousness.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A standard Transformer LLM lacks these features, and it is therefore usually not helpful to call LLMs minds. The point is not that these features are &lt;em&gt;necessary&lt;/em&gt; conditions for having a mind – I have dismissed this for the case of phenomenal consciousness – but that they are characteristic of having a mind. This might change over time: the concept MIND might come to be recentered closer to Transformer LLMs as AI systems become widespread. For now, however, you and I are the more typical minds in the world.&lt;/p&gt;

&lt;p&gt;Were I to discover the fossil of an archaeopteryx in my garden, I should not tell the museum that I have found some bird bones behind the house. There might be a way to construe the statement as true, but it is not particularly helpful. The same usually applies to calling Transformers minds.&lt;/p&gt;

&lt;p&gt;I propose a broadly contextualist approach. If you sit in a philosophy seminar and try to list all existing kinds of minds, then it might be sensible to mention LLMs as an edge case. For making sense of LLMs as a tool, it is unhelpful to describe them as minds, because human minds cannot be used as tools in the same way as Transformer LLMs. For analysing their inner workings, it is unhelpful to describe LLMs as minds, because we need to look at more specific mechanisms for that purpose. For trying to predict their social impact, it is unhelpful to describe LLMs as minds, because adding Transformer LLMs to the world is not like adding human minds to the world. The similarities are not as relevant as the differences in most contexts.&lt;/p&gt;

&lt;p&gt;My advice is to avoid calling an LLM a mind, unless you find yourself in one of the rare situations in which doing so helps to move forward the discussion. Personally, I’ll try to set this topic aside after this post and focus on more fruitful questions, such as how transformers deal with &lt;a href=&quot;/transformer-speculations/&quot;&gt;compositional meaning&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Block, N. (1995). &lt;a href=&quot;https://doi.org/10.1017/s0140525x00038188&quot;&gt;On a Confusion About a Function of Consciousness.&lt;/a&gt; Behavioral and Brain Sciences, 18(2), 227–247.&lt;/li&gt;
  &lt;li&gt;Conia, S., &amp;amp; Navigli, R. (2021). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.eacl-main.286&quot;&gt;Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration.&lt;/a&gt; Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 3269–3275.&lt;/li&gt;
  &lt;li&gt;Dai, D., Dong, L., Hao, Y., Sui, Z., Chang, B., &amp;amp; Wei, F. (2022). &lt;a href=&quot;https://doi.org/10.18653/v1/2022.acl-long.581&quot;&gt;Knowledge Neurons in Pretrained Transformers.&lt;/a&gt; Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 8493–8502.&lt;/li&gt;
  &lt;li&gt;Dennett, D. C. (1989). The Intentional Stance. MIT Press.&lt;/li&gt;
  &lt;li&gt;Devlin, J., Chang, M.-W., Lee, K., &amp;amp; Toutanova, K. (2019). &lt;a href=&quot;http://arxiv.org/abs/1810.04805&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.&lt;/a&gt; ArXiv:1810.04805 [CS].&lt;/li&gt;
  &lt;li&gt;Geva, M., Caciularu, A., Wang, K., &amp;amp; Goldberg, Y. (2022). &lt;a href=&quot;https://aclanthology.org/2022.emnlp-main.3&quot;&gt;Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space.&lt;/a&gt; Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 30–45.&lt;/li&gt;
  &lt;li&gt;Nagel, T. (1974). &lt;a href=&quot;https://doi.org/10.2307/2183914&quot;&gt;What is It Like to Be a Bat?&lt;/a&gt; Philosophical Review, 83(October), 435–450.&lt;/li&gt;
  &lt;li&gt;Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Barth-maron, G., Giménez, M., Sulsky, Y., Kay, J., Springenberg, J. T., Eccles, T., Bruce, J., Razavi, A., Edwards, A., Heess, N., Chen, Y., Hadsell, R., Vinyals, O., Bordbar, M., &amp;amp; Freitas, N. de. (2023). &lt;a href=&quot;https://openreview.net/forum?id=1ikK0kHjvj&quot;&gt;A Generalist Agent.&lt;/a&gt; Transactions on Machine Learning Research.&lt;/li&gt;
  &lt;li&gt;Rogers, A., Kovaleva, O., &amp;amp; Rumshisky, A. (2020). &lt;a href=&quot;https://doi.org/10.1162/tacl_a_00349&quot;&gt;A Primer in BERTology: What We Know About How BERT Works.&lt;/a&gt; Transactions of the Association for Computational Linguistics, 8, 842–866.&lt;/li&gt;
  &lt;li&gt;Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &amp;amp; Polosukhin, I. (2017). &lt;a href=&quot;https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html&quot;&gt;Attention is All you Need.&lt;/a&gt; Advances in Neural Information Processing Systems, 30.&lt;/li&gt;
  &lt;li&gt;Voita, E., Sennrich, R., &amp;amp; Titov, I. (2019). &lt;a href=&quot;https://doi.org/10.18653/v1/D19-1448&quot;&gt;The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives.&lt;/a&gt; Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 4396–4406.&lt;/li&gt;
  &lt;li&gt;Weiss, G., Goldberg, Y., &amp;amp; Yahav, E. (2021). &lt;a href=&quot;https://proceedings.mlr.press/v139/weiss21a.html&quot;&gt;Thinking Like Transformers.&lt;/a&gt; Proceedings of the 38th International Conference on Machine Learning, 11080–11090.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h2&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I will neglect in this post the difference between Transformer LLMs &lt;em&gt;being&lt;/em&gt; a mind and &lt;em&gt;having&lt;/em&gt; a mind. As long as a mind is realised, I will consider it a positive answer to the present question. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;According to Ned Block, “a state is access-conscious (A-conscious) if, in virtue of one’s having the state, a representation of its content is (1) […] poised for use as a premise in reasoning, (2) poised for rational control of action, and (3) poised for rational control of speech” (Block 1995, p. 231). &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;As far as I can tell, the best way we have to argue for this conclusion is to point towards physical correlates of phenomenal consciousness and suggest neural networks lack them. It is far from clear that this is a decisive strategy, partially because there might be multiple ways of bringing consciousness about, i.e. our human correlates for phenomenal consciousness might not be necessary. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;The commonly known problems of LLMs with factuality do not affect this result in the slightest, since the question is whether the system has beliefs, not whether these beliefs are correct. &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I say “something like”, because I want to avoid commitment to the entire theory of prototypes. I suggest that there are graded areas of falling under a concept, not that they are well-described by the distance to a single point in a semantic space, the prototype. &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I specifically deal with Transformer LLMs in this post; more generalist models, such as GATO (Reed et al. 2023), would require further discussion. I still doubt, however, that they are accurately described as having wants. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Tue, 28 Feb 2023 12:38:00 +0000</pubDate>
        <link>https://dstrohmaier.com/are-transformer-llms-minds/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/are-transformer-llms-minds/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Speculations about Transformers and Compositionality</title>
        <description>&lt;center&gt;&lt;em&gt;Warning: Speculative Content. Expect that parts of it will be proven wrong.&lt;/em&gt;&lt;/center&gt;

&lt;ol&gt;
  &lt;li&gt;The meaning of natural language sentences is compositional.
    &lt;ul&gt;
      &lt;li&gt;The meaning of an expression \( \mathbf{E} \) syntactically derived from the sub-expressions \( \mathbf{E}_1, \mathbf{E}_2, \dots \) is a function of the semantic values of the sub-expressions.
Writing \( |\mathbf{E}| \) for the semantic value of the expression \( \mathbf{E} \), the compositional thesis is that:  \( |\mathbf{E}| = f( |\mathbf{E}_1,  \mathbf{E}_2, \dots | ) \) &lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Transformers (Vaswani et al. 2017) do not correctly implement the compositional semantics of natural language cognition as found in human agents.
    &lt;ul&gt;
      &lt;li&gt;Human cognition includes a dedicated mechanism to derive the meaning of the expression  \( \mathbf{E} \) from its sub-expressions compositionally.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
      &lt;li&gt;There is no dedicated mechanism in transformers to derive the meaning of the overall expression \( \mathbf{E} \).&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Transformers have to compensate for their lack of directly compositional language processing and partially succeed in this.
    &lt;ul&gt;
      &lt;li&gt;Attention allows the transformers to partially compensate for lacking a compositional mechanism.&lt;/li&gt;
      &lt;li&gt;The compensatory role of attention is part of the explanation why some attention heads reflect syntactic connections (see section 4.2.1 of Rogers et al. 2020).&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;The compensation mechanisms of transformers lead to over-contextualisation of later level token embeddings.
    &lt;ul&gt;
      &lt;li&gt;The over-contextualisation is a partial explanation of why transformer embeddings from earlier levels perform better on lexical semantic tasks (cf. Vulić et al. 2020).&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Some limitations of transformers will be overcome by using a mechanism that reflects composition more directly.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;
    &lt;ul&gt;
      &lt;li&gt;A mechanism other than attention will be used.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For feedback, comments, and complaints, email me at  &lt;a href=&quot;mailto:david.strohmaier@cl.cam.ac.uk&quot;&gt;david.strohmaier@cl.cam.ac.uk&lt;/a&gt;. Links to relevant research are appreciated.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Fine, K. (2007). Semantic Relationism. Blackwell.&lt;/li&gt;
  &lt;li&gt;Fodor, J. A., &amp;amp; Lepore, E. (2002). The Compositionality Papers. Oxford University Press, U.S.A.&lt;/li&gt;
  &lt;li&gt;Pylkkänen, L. (2020). &lt;a href=&quot;https://doi.org/10.1098/rstb.2019.0299&quot;&gt;Neural basis of basic composition: What we have learned from the red–boat studies and their extensions.&lt;/a&gt; Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1791), 20190299.&lt;/li&gt;
  &lt;li&gt;Rogers, A., Kovaleva, O., &amp;amp; Rumshisky, A. (2020). &lt;a href=&quot;https://doi.org/10.1162/tacl_a_00349&quot;&gt;A Primer in BERTology: What We Know About How BERT Works.&lt;/a&gt; Transactions of the Association for Computational Linguistics, 8, 842–866.&lt;/li&gt;
  &lt;li&gt;Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., &amp;amp; Polosukhin, I. (2017). &lt;a href=&quot;https://papers.nips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html&quot;&gt;Attention is All you Need.&lt;/a&gt; Advances in Neural Information Processing Systems, 30.&lt;/li&gt;
  &lt;li&gt;Vulić, I., Ponti, E. M., Litschko, R., Glavaš, G., &amp;amp; Korhonen, A. (2020). &lt;a href=&quot;https://doi.org/10.18653/v1/2020.emnlp-main.586&quot;&gt;Probing Pretrained Language Models for Lexical Semantics.&lt;/a&gt; Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 7222–7240.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;I am using here the more general formula of Kit Fine’s (2007) &lt;em&gt;Semantic Relationism&lt;/em&gt;. More commonly the sub-expressions are taken to contribute their semantic values in an atomic fashion, i.e. \( \vert \mathbf{E} \vert = f( \vert \mathbf{E}_1 \vert,  \vert \mathbf{E}_2 \vert, \dots  ) \) &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This has, to my knowledge, not been established yet. It has been argued for, in some form, by cognitive scientists such as Fodor &amp;amp; Lepore (2002), but my more optimistic assessment of neural methods is hard to square with these arguments. According to my understanding, empirical evidence from the level of cognitive neuroscience is missing or weak (e.g. Pylkkänen 2020). &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;This claim does not concern whether dealing with compositionality in the absence of a new mechanism is possible, but instead concerns which development path will be taken due to differences in feasibility. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 19 Feb 2023 10:08:13 +0000</pubDate>
        <link>https://dstrohmaier.com/transformer-speculations/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/transformer-speculations/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>10 Years of word2vec: Motivations and Success</title>
        <description>&lt;p&gt;Once in a while, a publication resets the literature. There is a clear before and after, as researchers cite the new publication while neglecting the earlier literature upon which it built. The &lt;em&gt;word2vec&lt;/em&gt; papers by Mikolov et al., which were published about a decade ago in 2013, are an instance of this.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; As can happen with such papers, the original motivations became overshadowed by later applications.&lt;/p&gt;

&lt;p&gt;In this post I will lay out what those motivations were, how the embedding literature built upon them, and what made word2vec such an outstanding success.&lt;/p&gt;

&lt;p&gt;I will not go into the details of the word2vec algorithms, since numerous blog posts have been written on this topic already. If you need a refresher, the following might be worth a look:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://mccormickml.com/2016/04/27/word2vec-resources/&quot;&gt;Word2Vec Resources&lt;/a&gt; (by Chris McCormick)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html/&quot;&gt;king - man + woman is queen; but why?&lt;/a&gt; (by Piotr Migdał)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://jalammar.github.io/illustrated-word2vec/&quot;&gt;The Illustrated word2vec&lt;/a&gt; (by Jay Alammar)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;why-word2vec&quot;&gt;Why word2vec&lt;/h2&gt;

&lt;p&gt;Mikolov et al. motivated the word2vec algorithms as fulfilling the following goals:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Motivation: Go beyond representing words as atomic units.&lt;/li&gt;
  &lt;li&gt;Motivation: Introduce a way to measure similarity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first motivation was strongly associated with the idea of representation learning (cf. Bengio et al. 2013). Back then, the representations were often used by non-neural systems. We used word2vec embeddings with an SVM for sentiment classification during my MPhil studies. But with the progress of neural NLP, Seq2Seq models such as Transformers have become the standard. The representations inside those models are primarily used to explain their behaviour, and rarely taken as the primary object of research.&lt;/p&gt;

&lt;p&gt;The second motivation had its source in the linguistic notion of a semantic space, which had been explored by computational linguists and NLP researchers for years prior to word2vec (see Erk 2012). These models had been largely motivated by theories of lexical meaning and attempted to implement them.&lt;/p&gt;

&lt;h2 id=&quot;analogies-and-lexical-semantic-properties&quot;&gt;Analogies and Lexical Semantic Properties&lt;/h2&gt;

&lt;p&gt;Famously, word2vec was able to solve analogy problems, at least some of them.&lt;/p&gt;

&lt;p&gt;For example, starting with the embeddings for CAR and DRIVER, one can calculate that for PLANE the analogous concept to DRIVER is that of PILOT. All one had to do was to subtract the vector of CAR from the vector for DRIVER and add the result to the vector of PLANE. The vector closest to the result would, if it worked, be that for PILOT, i.e.&lt;/p&gt;

&lt;p&gt;\[ \vec{v}_{plane} + ( \vec{v}_{driver} - \vec{v}_{car} ) \approx \vec{v}_{pilot} \]&lt;/p&gt;
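
&lt;p&gt;The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the word2vec implementation: the three-dimensional vectors in &lt;code&gt;TOY_EMBEDDINGS&lt;/code&gt; are invented by hand so that the analogy works, and the helper names &lt;code&gt;cosine&lt;/code&gt; and &lt;code&gt;solve_analogy&lt;/code&gt; are hypothetical. Following common practice in analogy evaluation, the three query words are excluded from the candidates.&lt;/p&gt;

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# Real word2vec vectors are learned from a corpus and have far more dimensions.
TOY_EMBEDDINGS = {
    "car":    [1.0, 0.0, 0.0],
    "driver": [1.0, 1.0, 0.0],
    "plane":  [0.0, 0.0, 1.0],
    "pilot":  [0.0, 1.0, 1.0],
    "road":   [1.0, 0.0, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, embeddings):
    """Return the word whose vector is most similar to v_c + (v_b - v_a)."""
    target = [
        vc + (vb - va)
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Assumption: the query words themselves are not allowed as answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# CAR is to DRIVER as PLANE is to ...?
print(solve_analogy("car", "driver", "plane", TOY_EMBEDDINGS))  # prints "pilot"
```

&lt;p&gt;With learned embeddings the match is only approximate, which is why the result is picked by similarity to the target vector rather than by exact equality.&lt;/p&gt;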

&lt;p&gt;We can think of these analogies as capturing relations, e.g. there is a relation that holds both between CAR and DRIVER and between PLANE and PILOT.&lt;/p&gt;

&lt;p&gt;The embeddings appeared to capture such relations in an intuitively interpretable way. In response, researchers sought to explain how these results came about (e.g. Levy &amp;amp; Goldberg 2014; Arora et al. 2019; Hashimoto et al. 2016), and then to improve the linguistic quality of embeddings. Researchers engaged in that second endeavour argued that representing words as simple vectors failed to encode various lexical semantic properties, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Polysemy: A single vector does not appear to exhibit the variety of senses a word might carry.&lt;/li&gt;
  &lt;li&gt;Vagueness: OLD is a vague concept, and OCTOGENARIAN is not, or at least to a much smaller extent. A vector does not carry a specification of the vagueness in an obvious manner.&lt;/li&gt;
  &lt;li&gt;Taxonomical hierarchies: All dogs are mammals, but the vectors are not exhibiting such inclusion relationships.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The claim here is not that embeddings can never be used to detect polysemy or vagueness, e.g. by feeding them into a neural classifier, but that the vectors do not reflect such semantic properties in a straightforward way, similar to the way in which they captured analogies. Highly sophisticated approaches have been proposed, but the resulting models are often hard to train (for a survey and discussion see Emerson 2020).&lt;/p&gt;

&lt;h2 id=&quot;two-perspectives-on-embeddings&quot;&gt;Two Perspectives on Embeddings&lt;/h2&gt;

&lt;p&gt;In its focus on encoding semantic properties, parts of the embedding literature have deviated from the priority ranking of Mikolov et al. While Mikolov et al. referred to “meaningful regularities”, what mattered in the first instance was the downstream application. Vectors were not expected to reflect all regularities of word meanings.&lt;/p&gt;

&lt;p&gt;Reconsidering Mikolov et al.’s original motivation as well as the literature it spawned, both a perspective emphasising downstream application and one focused on encoding semantic properties can be discerned. I will give one argument for each of the two views:&lt;/p&gt;

&lt;p&gt;My argument for the first perspective is that word meanings as cognitive objects in human language users do not exist independently of other practical purposes. Both the word processing in human brains and the vectors we are concerned with have their role within larger computational processes (cf. Gauthier &amp;amp; Ivanova 2018). Encoding linguistic properties, such as taxonomical hierarchy, for their own sake would therefore not reflect human language cognition.&lt;/p&gt;

&lt;p&gt;To support the second view, one can argue that human word meanings are general purpose, and that they achieve this status &lt;em&gt;because&lt;/em&gt; they exhibit the semantic properties in question, e.g. taxonomical hierarchy or logical entailment. Accordingly, working towards encoding such properties brings us closer to improvements on many downstream tasks.&lt;/p&gt;

&lt;p&gt;A problem with the argument for the second view is that the embeddings created to encode semantic properties have found little use so far. For example, region embeddings, which should be able to capture taxonomical hierarchies, have not yet found a purpose in more application-oriented systems. By comparison, word2vec embeddings and contextualised embeddings based on ELMo (Peters et al. 2018) and BERT (Devlin et al. 2019) did so very quickly.&lt;/p&gt;

&lt;p&gt;A key takeaway is that word2vec was able to reset the literature 10 years ago because it made progress both in capturing linguistic regularities &lt;em&gt;and&lt;/em&gt; in supporting downstream applications. With the exception of contextualised embeddings, such success has not been forthcoming since, despite many attempts. Whoever finds another way to make such combined progress has a good chance of changing NLP history.&lt;/p&gt;

&lt;!-- Following word2vec the general, although not only approach, to creating embeddings has been to use distributional information, i.e. how likely is a word to occur in the context of another. That, however, is certainly not a task which only required information about word meaning. Syntactic regularities obviously play a role. The reverse also holds, not all aspects of word meaning might be well captured by distributional information. --&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Arora, S., Li, Y., Liang, Y., Ma, T., &amp;amp; Risteski, A. (2019). &lt;a href=&quot;https://doi.org/10.48550/arXiv.1502.03520&quot;&gt;A Latent Variable Model Approach to PMI-based Word Embeddings (arXiv:1502.03520; Version 4).&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Bengio, Y., Courville, A. C., &amp;amp; Vincent, P. (2013). &lt;a href=&quot;https://doi.org/10.1109/TPAMI.2013.50&quot;&gt;Representation learning: A review and new perspectives.&lt;/a&gt; IEEE Trans. Pattern Anal. Mach. Intell., 35(8), 1798–1828.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Devlin, J., Chang, M.-W., Lee, K., &amp;amp; Toutanova, K. (2019). &lt;a href=&quot;http://arxiv.org/abs/1810.04805&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.&lt;/a&gt; ArXiv:1810.04805.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Emerson, G. (2020). &lt;a href=&quot;https://doi.org/10.18653/v1/2020.acl-main.663&quot;&gt;What are the Goals of Distributional Semantics?&lt;/a&gt; Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7436–7453.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Erk, K. (2012). &lt;a href=&quot;https://doi.org/10.1002/lnco.362&quot;&gt;Vector Space Models of Word Meaning and Phrase Meaning: A Survey.&lt;/a&gt; Language and Linguistics Compass, 6(10), 635–653.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Gauthier, J., &amp;amp; Ivanova, A. (2018). &lt;a href=&quot;https://doi.org/10.48550/arXiv.1806.00591&quot;&gt;Does the brain represent words? An evaluation of brain decoding studies of language understanding (arXiv:1806.00591).&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Hashimoto, T. B., Alvarez-Melis, D., &amp;amp; Jaakkola, T. S. (2016). &lt;a href=&quot;https://doi.org/10.1162/tacl_a_00098&quot;&gt;Word Embeddings as Metric Recovery in Semantic Spaces.&lt;/a&gt; Transactions of the Association for Computational Linguistics, 4, 273–286.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Levy, O., &amp;amp; Goldberg, Y. (2014). &lt;a href=&quot;https://proceedings.neurips.cc/paper/2014/file/feab05aa91085b7a8012516bc3533958-Paper.pdf&quot;&gt;Neural word embedding as implicit matrix factorization.&lt;/a&gt; Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, 2177–2185.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mikolov, T., Chen, K., Corrado, G., &amp;amp; Dean, J. (2013a). &lt;a href=&quot;http://arxiv.org/abs/1301.3781&quot;&gt;Efficient Estimation of Word Representations in Vector Space.&lt;/a&gt; ICLR Workshop.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mikolov, T., Sutskever, I., Chen, K., Corrado, G., &amp;amp; Dean, J. (2013b). &lt;a href=&quot;http://dl.acm.org/citation.cfm?id=2999792.2999959&quot;&gt;Distributed Representations of Words and Phrases and Their Compositionality.&lt;/a&gt; Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, 3111–3119.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mikolov, T., Yih, W., &amp;amp; Zweig, G. (2013c). &lt;a href=&quot;https://www.aclweb.org/anthology/N13-1090&quot;&gt;Linguistic Regularities in Continuous Space Word Representations.&lt;/a&gt; Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., &amp;amp; Zettlemoyer, L. (2018). &lt;a href=&quot;https://doi.org/10.18653/v1/N18-1202&quot;&gt;Deep Contextualized Word Representations.&lt;/a&gt; Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2227–2237.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;On Google Scholar, &lt;a href=&quot;https://scholar.google.com/citations?view_op=view_citation&amp;amp;citation_for_view=oBu8kMMAAAAJ:CB2v5VPnA5kC&quot;&gt;one of the word2vec papers&lt;/a&gt; is cited by more than 37000 publications, while &lt;a href=&quot;https://scholar.google.com/citations?view_op=view_citation&amp;amp;hl=en&amp;amp;citation_for_view=mxiO4IkAAAAJ:9yKSN-GCB0IC&quot;&gt;an important source&lt;/a&gt; of that paper has a mere 1216 citations to its name. Similar disruptions in citation patterns have been used as &lt;a href=&quot;https://www.nature.com/articles/s41586-022-05543-x&quot;&gt;a measure of scientific progress&lt;/a&gt;. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Fri, 10 Feb 2023 13:32:13 +0000</pubDate>
        <link>https://dstrohmaier.com/10-years-of-word2vec/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/10-years-of-word2vec/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Zotero BibLaTeX Style</title>
        <description>&lt;p&gt;I regularly use &lt;a href=&quot;https://www.zotero.org/&quot;&gt;Zotero&lt;/a&gt; for managing my bibliographies. I then usually typeset my writings in LaTeX, citing references with BibLaTeX.&lt;/p&gt;

&lt;p&gt;For exporting BibLaTeX .bib files, I recommend the &lt;a href=&quot;https://github.com/retorquere/zotero-better-bibtex&quot;&gt;“Better BibTeX” extension&lt;/a&gt;. But sometimes I prefer to copy citations to my clipboard rather than have to export an entire .bib file. For BibTeX, there exists a &lt;a href=&quot;https://www.zotero.org/styles?q=bibtex&quot;&gt;bibliography style&lt;/a&gt; that allows Zotero users to do so. I have now started to adapt this file to BibLaTeX. You can find the result &lt;a href=&quot;https://github.com/dstrohmaier/biblatex_csl&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The file is still a work in progress and you should not rely upon it. If you find any problems with the style, feel free to email me or make a pull request.&lt;/p&gt;
</description>
        <pubDate>Sun, 22 Jan 2023 12:00:00 +0000</pubDate>
        <link>https://dstrohmaier.com/Zotero-BibLaTeX-Style/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Zotero-BibLaTeX-Style/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Personal Reflections on 2022</title>
        <description>&lt;p&gt;Another year has passed and I want to take the opportunity to reflect on my research pursuits at a higher level of abstraction. Hence, I will jot down notes on the lessons I cannot but take myself to have learned and sketch an intention of how to live as a researcher.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/reading.jpg&quot; alt=&quot;Picture of a reading man&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;a-conclusion-ive-drawn-rightly-or-wrongly&quot;&gt;A Conclusion I’ve Drawn, Rightly or Wrongly&lt;/h2&gt;

&lt;p&gt;As a human being, it is hard to avoid drawing lessons from one’s life, even though one knows the data underlying this inference process to be limited and biased. On this occasion, I will indulge the impulse. One of the lessons I have drawn for myself is the following: Pursue a limited number of research projects doggedly. As my eclectic CV indicates, this is a lesson I learned the hard way.&lt;/p&gt;

&lt;p&gt;I already &lt;a href=&quot;/end-of-year/&quot;&gt;hinted at this lesson in 2020&lt;/a&gt;. Back then, I pointed out a danger of this lesson: the danger of ending up in a blind alley. Some research paths lead nowhere, no matter the determination with which one follows them. In some cases, there is little to be done about that, because the universe does not signpost the paths to its secrets. Research is usually a gamble. Often, however, a clear-eyed look at where one’s research efforts have led so far allows one to infer that they will not lead to pastures bearing sweeter fruits. That does not necessarily stop those who have already committed themselves to the path. Untold numbers of academics have ended up spending their lives in such a manner; many even enjoyed it.&lt;/p&gt;

&lt;p&gt;Occasional reflection is supposed to fend this danger off. Pursue a limited number of research projects doggedly, but once in a while step back to reconsider them. One can consider a research project from various distances, and this post serves for reflection from the largest distance, that is, at the greatest abstraction: research as a choice of what to spend a life on. So let it be recorded that, from this distance, I am still optimistic with regard to the path I have chosen.&lt;/p&gt;

&lt;p&gt;Research into lexical acquisition in human agents and NLP models remains promising. There is clear progress in NLP, widely advertised, but the progress is not well understood and clearly patchy. For example, how transformers cope with the compositionality of lexical meaning has, as far as I am aware, not yet been explained. I have no reasonable doubt that research in this area will push the epistemic frontier forward – and I want to contribute to it.&lt;/p&gt;

&lt;p&gt;&lt;!-- A further danger of the dogged pursuit is excessive self-denial. While following one interest to the exclusion of other might come natural to some, my interest tend to be manifold. Frustration in one project leads to me seek out another. But the research area of lexical acquisition in human agents and NLP models is vast. Many sub-projects can have their place in this larger area. As a compensation strategy, I try to direct my interests into this area in coordinate them.--&gt;&lt;/p&gt;

&lt;h2 id=&quot;why-writing-such-reflections-infrequently-is-a-good-idea&quot;&gt;Why Writing Such Reflections Infrequently Is a Good Idea&lt;/h2&gt;

&lt;p&gt;As already mentioned, I have written &lt;a href=&quot;/end-of-year/&quot;&gt;another post like this back in 2020&lt;/a&gt;, but then I skipped 2021. Instead I started 2021 with a &lt;a href=&quot;/why-learn-prolog-in-2021/&quot;&gt;post on Prolog&lt;/a&gt;, which has probably been my most successful blog post so far, in terms of engagement but also in what I was able to learn from the results. Generalising from my limited blogging experience, writing about first-order interests, such as a neglected programming language, has proven more productive and more in line with my own goals than obscure ruminations about my research path.&lt;/p&gt;

&lt;p&gt;Second- and higher-order thoughts, such as the reflections in the present post, serve to correct our first-order pursuits or increase their efficiency,[0] but they can become a pursuit of their own that distorts our behaviour. If I intend to live my life as a researcher, then not for the sake of writing about this life. I live it for the sake of the epistemic progress brought about by this research. Any post like the present one should be no more than the rare exception. Special events, such as the passing of a year, provide a limited occasion to engage in such exceptional behaviour. This post fills that role for this year, and now it is done.&lt;/p&gt;

&lt;p&gt;Onward, for scientific progress!&lt;/p&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] What is the justification for this assertion of purpose? Ah, there is the rub. That is a philosophical question I’ll leave for another day.&lt;/p&gt;
</description>
        <pubDate>Fri, 16 Dec 2022 21:08:13 +0000</pubDate>
        <link>https://dstrohmaier.com/Personal-Reflections-on-2022/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Personal-Reflections-on-2022/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Transformers and the Brain: Literature Notes</title>
        <description>&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Neural networks with Transformer-architecture remain the state of the art in natural language processing (NLP). For many tasks the first approach is to throw some version of the BERT model (&lt;a href=&quot;http://arxiv.org/abs/1810.04805&quot;&gt;Devlin et al. 2019&lt;/a&gt;) at it – a practice I’ve participated in (Yuan et al. &lt;a href=&quot;https://doi.org/10.18653/v1/2021.semeval-1.74&quot;&gt;2021a&lt;/a&gt;, &lt;a href=&quot;https://doi.org/10.18653/v1/2021.semeval-1.96&quot;&gt;2021b&lt;/a&gt;). The success of the Transformer-architecture has raised the question of how such models compare to language processing in the human brain, and a literature is growing around this question. In this post, I collect notes on selected papers which try to map representations in Transformer models to brain data. First I’ll list a few conclusions from the literature and then move through the selected papers to substantiate the conclusions and make further points.&lt;/p&gt;

&lt;p&gt;The main conclusions are:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;The Transformer-architecture is better at predicting brain data than previous RNN architectures. That is, mapping Transformer models to brain data allows one to predict more of it than if one uses an RNN architecture, typically LSTMs or GRU networks.&lt;/li&gt;
  &lt;li&gt;Word prediction performance matters, but is not everything. The capacity for predicting the next word given an incomplete sequence does not explain all that is special about Transformers.&lt;/li&gt;
  &lt;li&gt;We do not know why the Transformer-architecture performs so well, but semantics might play a role.&lt;/li&gt;
  &lt;li&gt;We need better brain data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/talking_head.jpg&quot; alt=&quot;Picture of a Talking Head&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;human-sentence-processing-recurrence-or-attention&quot;&gt;Human Sentence Processing: Recurrence or Attention?&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://doi.org/10.18653/v1/2021.cmcl-1.2&quot;&gt;This paper by Merkx and Frank (2021)&lt;/a&gt; explicitly compares GRU-RNN to Transformer models. They implement these models themselves and make them comparable, e.g. the total number of parameters are relatively close. The models are trained on the next-word prediction task. They are evaluated on&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;self-paced reading (SPR),&lt;/li&gt;
  &lt;li&gt;eye-tracking (ET),&lt;/li&gt;
  &lt;li&gt;and electroencephalography (EEG) data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The top-line result is that even when controlling for performance as a language model, i.e. the ability to predict word tokens, Transformer models tend to do better, specifically on the SPR and EEG datasets.[0] Something about the architecture other than its ability to capture statistical information about word distributions appears to make it especially well-suited for predicting brain performance.&lt;/p&gt;

&lt;p&gt;The authors express surprise at the superior performance of the Transformer-architecture, because they “considered the Transformer’s unlimited memory and access to past inputs implausible given current theories on[sic] human language processing” (p. 18). While the authors are not giving up this view and therefore remain more sceptical than the authors of other papers I’ll mention, they consider the possibility that Transformers capture something about human language cognition. Specifically, they entertain the idea that the attention-mechanism resembles cue-based retrieval, but since they do not provide much detail on this hypothesis, I do not feel confident evaluating it.&lt;/p&gt;

&lt;h2 id=&quot;brains-and-algorithms-partially-converge-in-natural-language-processing&quot;&gt;Brains and Algorithms Partially Converge in Natural Language Processing&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://doi.org/10.1038/s42003-022-03036-1&quot;&gt;Caucheteux and King (2022)&lt;/a&gt; look at Transformer models and ask how the performance of such models on a word prediction task[1] relates to their performance in predicting brain measurements. The key findings are:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Performance on predicting words strongly correlates with predicting brain scores.&lt;/li&gt;
  &lt;li&gt;The relationship breaks down at the upper end of next-word prediction performance, that is, the best models the authors have trained for word prediction do somewhat worse at predicting brain scores. This suggests that Transformer models start to overfit to the word-prediction task to the detriment of being able to predict brain measurements.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;different-kinds-of-cognitive-plausibility-why-are-transformers-better-than-rnns-at-predicting-n400-amplitude&quot;&gt;Different Kinds of Cognitive Plausibility: Why Are Transformers Better than RNNs at Predicting N400 Amplitude?&lt;/h2&gt;

&lt;p&gt;Similarly to Merkx and Frank, &lt;a href=&quot;http://arxiv.org/abs/2107.09648&quot;&gt;Michaelov et al. (2021)&lt;/a&gt; compare RNNs and Transformer models in how well they can predict brain data, in this case the N400 amplitude (EEG study). They used an already existing LSTM model and GPT-2. In contrast to the experiments by Merkx and Frank, the models differ in many ways other than the difference between RNN and Transformer, e.g. vocabulary size and number of parameters.&lt;/p&gt;

&lt;p&gt;The paper also shows that the Transformer model does better at predicting the human brain data than its RNN competitor. Additional experiments suggest that part of the reason GPT-2 does better is that the cosine similarity feeds more into the surprisal of the model. Taking cosine similarity as a measure of semantic similarity, the authors hypothesize that ‘bag-of-words’ semantic activation may be part of the neurocognitive system that is measured by the N400 amplitude. But this claim is again to be considered speculative.&lt;/p&gt;

&lt;h2 id=&quot;the-neural-architecture-of-language-integrative-modeling-converges-on-predictive-processing&quot;&gt;The Neural Architecture of Language: Integrative Modeling Converges on Predictive Processing&lt;/h2&gt;

&lt;p&gt;This paper by &lt;a href=&quot;https://doi.org/10.1073/pnas.2105646118&quot;&gt;Schrimpf et al. (2021)&lt;/a&gt; offers one of the most encompassing comparisons across model architectures and datasets. Without going into all the details, GPT-2 stands out as the best model.&lt;/p&gt;

&lt;p&gt;The authors replicate the finding that performance on next-word prediction predicts performance on predicting brain measurements. Importantly, the authors compare the next-word prediction task with tasks from the GLUE benchmark and find that these do not predict brain scores.&lt;/p&gt;

&lt;p&gt;The paper also tests whether the model architecture matters by computing brain scores for models with random weights.[2] The authors show that even under such conditions some models achieve noteworthy correlations. The Transformer architecture alone seems to do some of the work.&lt;/p&gt;

&lt;p&gt;The paper is perhaps the most optimistic one when it comes to the ability of Transformers to predict brain data. On some datasets, the authors come to the conclusion that Transformer models reach the noise ceiling, i.e. that the model does as well as possible. One dataset, however, remains very challenging: the Blank 2014 dataset consists of fMRI measurements where the stimuli are auditorily presented stories. Both the larger narrative context of stories and the auditory transmission stand out.[3]&lt;/p&gt;

&lt;p&gt;The authors of this paper suggest a convergence between neural models in NLP and cognitive science, since (next-)word prediction is a key task in NLP and predictive processing holds increasing sway in cognitive science. While the authors’ comparison with the GLUE tasks is suggestive in this regard, I am not yet sold that we see a proper convergence. The tasks humans did might be biased towards next-word prediction (with perhaps the exception of Blank (2014), where the models did worst). Furthermore, I would not be surprised if the data from the GLUE benchmark were not as reliable as those for next-word prediction, since they rely on challenging annotation by experts; hence, the network might start to model noise to a great extent.&lt;/p&gt;

&lt;p&gt;Be that as it may, a convergence on prediction would not explain why the Transformer-architecture performs so well on both standard NLP tasks and predicting cognitive measures. LSTMs have also been trained on next-word prediction but do not perform as well. To explain the role of the Transformer-architecture, the authors point (amongst other things) towards the role of smoothed multi-scale processing and propose that this might capture something about language structure, but this discussion is merely suggestive.&lt;/p&gt;

&lt;p&gt;As someone coming from NLP rather than neuroscience, I also took from this paper that we need better brain data. The noise ceilings estimated by the authors, that is, their estimates for how well brain measurements can be predicted in general, are rather low. Accordingly, much of the brain measurement data is treated as individualised noise. The authors suggest that such a low ceiling might be due to language processing occurring at a high level of cognition where the brain processing might not be stimulus-driven but top-down. As a result, there might just not be one pattern across individuals to predict. That seems speculative to me and better measurement might help raise the ceilings and thereby&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;I’ve already listed above the conclusions I’ve drawn from this emerging literature. The papers indicate a clear direction: Transformer models do well at predicting brain measurements, usually better than RNNs, and the architecture plays a role. Why they are doing better remains unclear, with multiple hypotheses being considered. It is intriguing that both the hypothesis by Merkx and Frank (cue-based retrieval) and that by Michaelov et al. (‘bag-of-words’ semantic activation) have a semantic tendency, i.e. Transformers are taken to do better because they capture something about semantic processing in the brain. But these discussions remain mostly suggestive, with the experiment by Michaelov et al. concerning the predictive power of cosine similarity being the strongest piece of evidence, as far as I can tell, and that is not particularly strong evidence since cosine similarity doesn’t necessarily just concern semantics. Without a better understanding of language processing in the brain, it might prove difficult to reconstruct why the Transformer-architecture performs so well. Even worse, without better understanding of the human brain, it will become increasingly difficult to compare neural architectures in this way.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] I don’t understand why this literature is so averse to publishing tables. Graphs are good, but being able to check against a table of data provides a way to test whether one has truly understood what is going on.&lt;/p&gt;

&lt;p&gt;[1] From the paper, it is not entirely clear to me whether the next word or a randomly masked word has to be predicted.&lt;/p&gt;

&lt;p&gt;[2] There is still a linear model trained on top of the randomly initialised models.&lt;/p&gt;

&lt;p&gt;[3] The Futrell 2018 dataset used by the authors is also story-based and the Transformer-model does better at predicting it, but it consists of self-paced reading data instead of brain measurements.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Blank, I., Kanwisher, N., &amp;amp; Fedorenko, E. (2014). &lt;a href=&quot;https://doi.org/10.1152/jn.00884.2013&quot;&gt;A functional dissociation between language and multiple-demand systems revealed in patterns of BOLD signal fluctuations&lt;/a&gt;. Journal of Neurophysiology, 112(5), 1105–1118.&lt;/li&gt;
  &lt;li&gt;Caucheteux, C., &amp;amp; King, J.-R. (2022). &lt;a href=&quot;https://doi.org/10.1038/s42003-022-03036-1&quot;&gt;Brains and algorithms partially converge in natural language processing&lt;/a&gt;. Communications Biology, 5(1), 1–10.&lt;/li&gt;
  &lt;li&gt;Devlin, J., Chang, M.-W., Lee, K., &amp;amp; Toutanova, K. (2019). &lt;a href=&quot;http://arxiv.org/abs/1810.04805&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&lt;/a&gt;. ArXiv:1810.04805 [Cs].&lt;/li&gt;
  &lt;li&gt;Futrell, R., Gibson, E., Tily, H. J., Blank, I., Vishnevetsky, A., Piantadosi, S., &amp;amp; Fedorenko, E. (2018, May). &lt;a href=&quot;https://aclanthology.org/L18-1012&quot;&gt;The Natural Stories Corpus&lt;/a&gt;. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).&lt;/li&gt;
  &lt;li&gt;Merkx, D., &amp;amp; Frank, S. L. (2021). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.cmcl-1.2&quot;&gt;Human Sentence Processing: Recurrence or Attention?&lt;/a&gt; Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 12–22.&lt;/li&gt;
  &lt;li&gt;Michaelov, J. A., Bardolph, M. D., Coulson, S., &amp;amp; Bergen, B. K. (2021). &lt;a href=&quot;http://arxiv.org/abs/2107.09648&quot;&gt;Different kinds of cognitive plausibility: Why are transformers better than RNNs at predicting N400 amplitude?&lt;/a&gt; ArXiv:2107.09648 [Cs].&lt;/li&gt;
  &lt;li&gt;Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., &amp;amp; Fedorenko, E. (2021). &lt;a href=&quot;https://doi.org/10.1073/pnas.2105646118&quot;&gt;The neural architecture of language: Integrative modeling converges on predictive processing&lt;/a&gt;. Proceedings of the National Academy of Sciences, 118(45), e2105646118.&lt;/li&gt;
  &lt;li&gt;Yuan, Z., Tyen, G., &amp;amp; Strohmaier, D. (2021a). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.semeval-1.74&quot;&gt;Cambridge at SemEval-2021 Task 1: An Ensemble of Feature-Based and Neural Models for Lexical Complexity Prediction&lt;/a&gt;. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 590–597.&lt;/li&gt;
  &lt;li&gt;Yuan, Z., &amp;amp; Strohmaier, D. (2021b). &lt;a href=&quot;https://doi.org/10.18653/v1/2021.semeval-1.96&quot;&gt;Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation&lt;/a&gt;. Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021), 730–737.&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 01 Aug 2022 17:00:13 +0100</pubDate>
        <link>https://dstrohmaier.com/transformers-and-the-brain/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/transformers-and-the-brain/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>The Unmasking of Dictionaries by Strong Opinion</title>
        <description>&lt;h3 id=&quot;disclaimer&quot;&gt;Disclaimer&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;I am currently funded by money from Cambridge University Press and Assessment, which also stewards various English dictionaries. All opinions in this post are distinctly mine.&lt;/em&gt;&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;The Unmasking of English Dictionaries&lt;/em&gt; (CUP, 2018) is on a mission to change lexicography forever and on the way it tries to insult as many lexicographers as possible. Its author, the linguist R. M. W. Dixon, has a chip on his shoulder. Dictionaries of the English language are all wrong. Their creators misunderstand what a dictionary is for, and are, in general, lazy plagiarists. Surprisingly, the book is not just entertaining, but also makes intriguing suggestions, even though some of the arguments for them have serious gaps.&lt;/p&gt;

&lt;p&gt;Dixon repeats again and again that the purpose of a dictionary is to “tell you when to use one word rather than another” (p. ix). That is the premise, and on its basis Dixon discusses the shortcomings of dictionaries:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;They treat words in isolation, rather than contrastively within their semantic field&lt;/li&gt;
  &lt;li&gt;They rely excessively on definitions&lt;/li&gt;
  &lt;li&gt;They neglect to provide grammatical information that is required for correct word use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The examples Dixon gives suggest that there is at least something to his diagnoses. He proposes that the problem be solved by the construction of a new dictionary organised by semantic fields. The entries for these fields would compare the usage of the words contained in them, including the grammatical constraints on this usage. For illustration, the book contains a few sketches of such comparative discussions of lexical semantics, e.g. the field including “want”, “wish”, “desire”.&lt;/p&gt;

&lt;p&gt;One might, however, wonder about the correctness of Dixon’s premise that the main purpose of dictionaries is to enable the choice between words to use. I don’t think it’s true, at least descriptively. Dixon takes a distinctly productive task as the purpose of a dictionary: choosing one word to use over others. I expect, however, that much of the use of dictionaries is receptive. My expectation is that dictionaries are most frequently consulted when one is stumped by a previously unfamiliar word in a text one tries to comprehend. Of course, that is speculation on my part, but so is Dixon’s claim that the purpose relates to productive use of English. And that leads us to the heart of the problem: Dixon does not systematically engage with users of dictionaries – other than himself, that is – even though the whole point was to propose a new type of dictionary that is better suited to the needs of its users.&lt;/p&gt;

&lt;p&gt;After another swipe against lexicographers as lazy copyists, Dixon proposes the following procedure for producing a dictionary (pp. 25–26):&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Select sets of related words&lt;/li&gt;
  &lt;li&gt;Consult corpora to compare and contrast those related words&lt;/li&gt;
  &lt;li&gt;Work out a conceptual template for the sets of words using “critical notions”&lt;/li&gt;
  &lt;li&gt;Only at the last step should one compare with other scholars and dictionaries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Dixon’s proposed procedure does not include users at any point. There are no user studies, not even a step to incorporate informal feedback. The goal is to work “from first principles and with a fresh viewpoint” (p. 25). These first principles might please a linguistic expert such as Dixon, but surely the average dictionary user has different needs and these should be assessed in the process of constructing a dictionary.&lt;/p&gt;

&lt;p&gt;The lack of consideration for actual users and their needs also shows up in another assumption by Dixon, namely that a contrastive but relatively abstract outline of different usage patterns is sufficient to help with the choice between words. Dixon would have dictionaries present &lt;em&gt;everything&lt;/em&gt; that is required for choosing between words. That includes a lot of rather abstract linguistic information. The distinction of different types of clauses might quickly overwhelm a learner who just wanted to understand what “hanker” meant in a text they were reading, and while Dixon does envisage the usage of sentential examples, the theory-driven contrasting of words in a semantic set comes first as the organising principle (see p. 227).&lt;/p&gt;

&lt;p&gt;I would also like to add that for someone who emphasises “first principles”, Dixon does not spend much time on actually laying out his theoretical framework for lexical semantics. From Dixon’s approach, I would assume that he endorses some sort of lexical relation/frame semantics, but it is not obvious that these approaches correctly reflect word senses as they are cognitively encoded. Surely the first principles for organising a dictionary would be principles that reflect the entries in our mental lexicon? The problem here might be that Dixon does not think highly of many efforts to investigate “the role of language in human cognition” (p. 192). Although my main criticism is that the focus on linguistic “first principles” is to the exclusion of empirically assessing dictionary user needs, it might be noted that even the claimed “first principles” are not exactly fast foundations (using the word “fast” here in the antiquated secondary sense discussed on pp. 131–134).&lt;/p&gt;

&lt;p&gt;Dixon’s book has great entertainment potential, especially for those of us who enjoy academic philippics. As is common for this text genre, the positive argument reveals holes upon closer inspection. Dixon’s assumptions should be considered expert guesses about dictionary use, but guesses they remain. That being said, investigating Dixon’s proposals in actual user studies might be of great interest. The results could show to what extent lexicography really needs to be reborn.&lt;/p&gt;

</description>
        <pubDate>Tue, 31 May 2022 17:00:13 +0100</pubDate>
        <link>https://dstrohmaier.com/The-Unmasking/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/The-Unmasking/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>We Know So Little</title>
        <description>&lt;p&gt;Will machine learning (ML) solve natural language understanding (NLU)? A recent &lt;a href=&quot;https://thegradient.pub/machine-learning-wont-solve-the-natural-language-understanding-challenge/&quot;&gt;essay in &lt;em&gt;The Gradient&lt;/em&gt; by Walid Saba&lt;/a&gt; argues that it won’t. I lack the confidence for either affirming or denying that ML will lead to NLU, especially without much further explanation of what we understand ML and NLU to be, but I am confident that Saba’s arguments are not of the knock-down kind.&lt;/p&gt;

&lt;p&gt;A part of me would like Saba’s arguments to succeed. While I mostly work with neural nets and other ML methods, I have a soft spot for symbolic approaches to NLP. I have read and enjoyed &lt;a href=&quot;https://www.coli.uni-saarland.de/publikationen/softcopies/Blackburn:1997:RIN.pdf&quot;&gt;&lt;em&gt;Representation and Inference for Natural Language&lt;/em&gt;&lt;/a&gt; and I am an avowed admirer of Prolog. When I got into NLP, I began by reading Chomsky’s &lt;em&gt;Syntactic Structures&lt;/em&gt;; only later did I read &lt;em&gt;Neural Network Methods for Natural Language Processing&lt;/em&gt;. Certainly, I am the kind of person Saba’s argument should appeal to, and yet… and yet I can’t say it wins me over. In the end, I’m left unsure, not knowing whether ML will solve NLU.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/fatality.jpg&quot; alt=&quot;Picture of a man measuring words&quot; height=&quot;500&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this post, I won’t try to cover all arguments from &lt;em&gt;The Gradient&lt;/em&gt; essay. For example, I won’t cover what Saba says on intensions, other than to frankly admit being puzzled by his claim that ML is all about extension. I’ll leave those arguments to others. Instead, I’ll argue that we just don’t know enough about how language fundamentally works to adjudicate whether ML can solve NLU.[0] To make this argument, I pick out one of Saba’s claims about language and argue that the situation is more complicated. The claim I will take issue with is the following:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] language understanding does not admit any degrees of freedom. A full understanding of an utterance or a question requires understanding the one and only one thought that a speaker is trying to convey.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;According to Saba, when we speak we are trying to convey one determinate thought, that is, a thought with a determinate content. The understanding of this content does not admit any degrees of freedom. As I interpret Saba, there is a matter of fact whether one correctly understands the other person or not and this fact either obtains or it does not. It doesn’t hold in degrees, only absolutely.&lt;/p&gt;

&lt;p&gt;In response, one might be tempted to point to examples where people misunderstand in degrees. If a speaker utters the sentence “The train is late” and one listener misunderstands it as meaning that the train will not arrive today at all and another listener misunderstands it as meaning that bananas are straight, then both are misunderstanding the sentence but the second listener is doing worse. As the example shows, one can misunderstand someone else more or less badly. But Saba can accept that one can be wrong in degrees, because his point is only that &lt;em&gt;full understanding&lt;/em&gt; does not admit any degrees of freedom. There might be many ways of doing it wrong, but there is only one way of doing it right. According to Saba, when we understand each other, there is one and only one thought with a determinate content to understand for each utterance. That is a much more plausible position. Nonetheless, I will disagree with it.&lt;/p&gt;

&lt;p&gt;Consider Saba’s own example of an utterance:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Do we have a retired BBC reporter that was based in an East European country during the Cold War?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For the sake of illustration, assume that I utter this question as a member of a network of experts and that I want to know whether we, the network, include such a person. Saba suggests that I am expressing one determinate thought, that there is one correct analysis of my utterance, which an NLU system should produce. According to him, there is no degree of freedom in this analysis. I disagree, or at least I see good reasons for disagreement.&lt;/p&gt;

&lt;p&gt;As Saba states, understanding the exact thought of the question requires interpreting the phrase “retired BBC reporter”. This interpretation, however, turns out to be much harder than his gloss “the set of all reporters that worked at BBC and who are now retired” suggests. To see the problem, assume that in response to my question, someone asks me whether I intended to include freelance reporters who worked for the BBC or only its employees. The honest response to this question might very well be that I don’t know. I don’t know whether I meant to include freelancers in the extension of “BBC reporters” or not. Of course, I can make it up on the spot now, but I cannot decide the difference with regard to my prior intentions.&lt;/p&gt;

&lt;p&gt;Contrary to Saba, the difference I cannot decide is semantically significant. It might be that a former BBC freelancer meeting the description belongs to the network, but no reporter employed by the BBC does. Whether my question is to be answered affirmatively depends on a difference in phrasal meaning that
1) I do not know how to resolve,
2) I do not know whether I intended to resolve at all when I uttered the sentence.[1]&lt;/p&gt;

&lt;p&gt;It seems that there is not one determinate content I sought to express.[2] There are at least two propositions that seem to fit my intention. But you might disagree and suggest that I intended to express one specific determinate proposition, I just no longer know, or never knew, which one. In other words, instead of denying the determinacy of the intended thought, you deny the epistemic access to the determinate intended thought. This suggestion seeks to rescue Saba’s argument with an epistemic move.&lt;/p&gt;

&lt;p&gt;I don’t know whether the epistemic move itself can be pulled off – do I really lack this introspection? – but I am confident that, in any case, it won’t achieve the argumentative goal. It cannot rescue Saba’s argument, because if I don’t have access to my determinate thoughts, you certainly don’t either. Even if one of the two interpretations is the truly correct one, you at best have approximately correct access to it. Yet, you have NLU, you understand natural language as well as any other human. You would have NLU without access to the one true thought, human-level NLU rather than super-human-level NLU. If Saba’s arguments only show that ML can lead to no better NLU than human-level NLU, then those working on ML-based NLU won’t be all that worried.&lt;/p&gt;

&lt;p&gt;My overall argument does not depend on whether I am right in the final analysis. Maybe I intended to utter one determinate thought and maybe it is accessible to humans. Even if this were so, we do not know it. What matters is that Saba’s assumption is not safely established. We do not know that&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[…] language understanding does not admit any degrees of freedom. A full understanding of an utterance or a question requires understanding the one and only one thought that a speaker is trying to convey.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We know too little about the foundations of language.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] In my argument, I’m applying relatively high standards of knowledge. (On different standards for knowledge, see David Lewis’ &lt;a href=&quot;https://philpapers.org/rec/LEWEK&quot;&gt;paper &lt;em&gt;Elusive Knowledge&lt;/em&gt;&lt;/a&gt;.) By denying that we have knowledge about how language fundamentally works, I am not denying that we have theories about it and I am not even ruling out that one of these theories is largely correct. I am, instead, suggesting that no theory of language and our understanding of it reaches the level of certainty Saba presumes.&lt;/p&gt;

&lt;p&gt;[1] This state of affairs differs from the missing text phenomenon, the fact that we do not express the fullness of our thoughts in utterances, which Saba happily acknowledges and makes argumentative use of. In my example, I’m not just leaving part of my thought unsaid because the part can be derived from my fragmentary statement together with common knowledge. Otherwise, I would myself be able to recover the left-out part.&lt;/p&gt;

&lt;p&gt;[2] That claim resembles, of course, &lt;a href=&quot;https://plato.stanford.edu/entries/quine/#IndeTran&quot;&gt;Quine’s indeterminacy of translation&lt;/a&gt;. That being said, I am not sure what to make of Quine’s position, because I do not share his behaviourist assumptions and I do not know whether his position can be defended without them.&lt;/p&gt;
</description>
        <pubDate>Sun, 15 Aug 2021 22:30:13 +0100</pubDate>
        <link>https://dstrohmaier.com/We-know-so-little/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/We-know-so-little/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>On the State of Analytic Philosophy</title>
        <description>&lt;p&gt;A debate about the state of analytic philosophy has been developing in the philosophical blogosphere over the last few months, started by &lt;a href=&quot;https://sootyempiric.blogspot.com/2021/05/the-end-of-analytic-philosophy.html&quot;&gt;Liam Bright’s pessimistic assessment of the state&lt;/a&gt;. In this original post, Bright described analytic philosophy as a &lt;a href=&quot;https://plato.stanford.edu/entries/lakatos/#FalsMethScieReseProg1970&quot;&gt;“degenerate research programme”&lt;/a&gt;. No longer was there a shared paradigm, and instead philosophers either took a politically applied turn or just bumbled along not knowing what else to do.&lt;/p&gt;

&lt;p&gt;Bright summarises the situation by describing a threefold lack of confidence:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Lack of confidence that analytic philosophy can solve its own problems.&lt;/li&gt;
  &lt;li&gt;Lack of confidence that analytic philosophy can be modified so as to do better.&lt;/li&gt;
  &lt;li&gt;Lack of confidence that the problems are worth solving in the first place.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Overall, I found myself largely agreeing with the pessimistic sentiments expressed in Bright’s post; otherwise I presumably wouldn’t have switched fields. That being said, I am modestly more optimistic on 1 and 2, as should become clear later in the presentation of my perspective.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/path-moon.jpg&quot; alt=&quot;Picture of a path to the moon&quot; height=&quot;300&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-debate&quot;&gt;The Debate&lt;/h2&gt;

&lt;p&gt;Bright’s post has started a debate, which has underlined for me how different the various sub-groups of academic philosophy are. Representatives of different areas in philosophy give different responses and many disagree with Bright more than I do.[0] One example of that is the recent &lt;a href=&quot;https://sootyempiric.blogspot.com/2021/07/further-reflections-on-analytic.html&quot;&gt;guest post by Preston Stovall&lt;/a&gt; on Bright’s blog.&lt;/p&gt;

&lt;p&gt;Stovall criticises the “march of Kripke” narrative in Bright’s original post, i.e. the narrative that sees Kripke’s work as the high point of the analytic tradition on which the later work relied, and by changing the narrative Stovall suggests a vision for a unified analytic philosophy. While I am vaguely familiar with and attracted to the narrative that Stovall sketches, I cannot say that I recognise much from my own philosophical-academic experience and work in it. To tell the truth, I can recognise it as a distinctly Pittsburghian approach to analytic philosophy, rather than the one I am used to. Perhaps this Pittsburghian view can take over, and unification be achieved behind another tradition within analytic philosophy. For now, however, a diversity of viewpoints prevails.&lt;/p&gt;

&lt;p&gt;To add to this diversity, I want to present my own perspective, that is, the perspective of one particular person who has turned to computer science out of dissatisfaction with philosophy’s current state. Hence, my post will be unabashedly self-centred, focussing on three of my own qualms with academic analytic philosophy:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Dissatisfaction with the methods used.&lt;/li&gt;
  &lt;li&gt;Lack of interest in the questions of the applied turn.&lt;/li&gt;
  &lt;li&gt;Lack of opportunities for career advancement.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;dissatisfaction-with-methods&quot;&gt;Dissatisfaction with Methods&lt;/h2&gt;

&lt;p&gt;My dissatisfaction with the methods of analytic philosophy is an instance of the one described by Bright. I share the lack of confidence that analytic philosophy as it stands can solve its own problems and it troubles me deeply. But not all find this prospect of unsolved problems so dismal. As Bright summarises, one common response to his diagnosis of the lack of confidence suggests that&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;while it is true that philosophers generally cannot plausibly believe they will achieve rational consensus, this is not such a bad thing. The mistake was ever hoping for that in the first place, and once we have gotten over that hangup we can enjoy the sort of pluralistic free play of ideas that comes with a taste for dissensus.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I find the lack of ambition in this response deeply unappealing. We should aspire to solve our problems, or at least to make substantial progress towards such solutions. The rationales for the lack of ambition do not convince me. They seem to turn on the nature of philosophy and suggest that it is an open-ended discipline that can reach no conclusions at all. As is to be expected, I disagree with this view and while settling the nature of philosophy is beyond the reach of a blog post, outlining the difference is not.&lt;/p&gt;

&lt;p&gt;It might be true that philosophy constantly raises new questions. But there is a need to distinguish whether philosophy as a discipline can always raise new unanswered questions, and whether we can answer current questions in philosophy.[1] To give an example, I have published on the nature of social groups and I want philosophers to reach a rational consensus on this issue. Are groups pluralities or not? The common response appears to suggest that we might clarify this question itself, but never quite answer it.&lt;/p&gt;

&lt;p&gt;I am not as pessimistic as those who respond by accepting the problems as unresolvable. In contrast to them, I hold out a modicum of hope that one by one, we could reach widespread consensus in philosophy.[2] Undoubtedly it will be challenging and it might be a never-ending quest, since we are never running out of new questions. Still, that is a far cry from the pessimism of being unable to reach consensus on any of them. In fact, I believe it makes me even more optimistic than Bright, who apparently does not dare to hope for a methodological renewal, at least not one that leads to true problem-solving.&lt;/p&gt;

&lt;p&gt;That being said, I am to be counted amongst the pessimists insofar as I believe in the need for a far-reaching methodological change, crossing disciplinary boundaries, and do not see such change happening at the moment. My pessimism is sustained by a folk-sociological assessment, not by one of philosophy’s nature.&lt;/p&gt;

&lt;h2 id=&quot;applied-turn&quot;&gt;Applied Turn&lt;/h2&gt;

&lt;p&gt;Analytic philosophy in the US and the UK has undergone a sharp turn towards socio-politically hot topics, such as racism and gender. Before I turned to computer science, I primarily worked in social ontology, a sub-field of philosophy in which the applied turn has been especially notable. That is not entirely surprising, since the applied turn has focused on social issues. But my experience of it has differed from that described by Bright:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Many of the projects that seem most exciting to junior philosophers concern injustice, oppression, propaganda, ideology – all things about which it is felt that philosophical analysis might be able to have a real world impact.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am one of the junior philosophers for whom that was not the case. These projects did not excite me. Similar to Bright, I remain sceptical about the ability of philosophers to have “real world impact” in this way. But even if one were to grant that philosophy can have the intended impact, there are subtle and not-so-subtle differences between my political view and the one hegemonic in the applied turn. These differences give the applied turn a direction that does not suit me and what I want.&lt;/p&gt;

&lt;p&gt;As I did not share the political sentiment of the applied turn, I mostly stayed away from it. Over the years, however, its influence in social ontology increased and crowded out the issues I was more interested in, such as the ontological foundations of the social sciences. While my interests still form part of social ontology and are recognised as such, the crowding-out effect matters especially on a difficult job market, where opportunities for career advancement are scarce.&lt;/p&gt;

&lt;h2 id=&quot;lack-of-career-opportunities-and-conclusion&quot;&gt;Lack of Career Opportunities and Conclusion&lt;/h2&gt;

&lt;p&gt;I take a rather naive view on the issue of the philosophy job market. Of course, one can bemoan the state of funding for the humanities and on some days I have sympathies for such a take – usually on days when my attention is not on the relative scarcity of GPU time. But bracketing this issue, there are too many people for too few positions. A solution is for people to leave the academic discipline. From my perspective, this does not happen to a sufficient degree for two reasons:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Lack of confidence in one’s own skills&lt;/li&gt;
  &lt;li&gt;Exaggerated attachment to philosophy as an &lt;em&gt;academic&lt;/em&gt; field&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many philosophy graduates believe they are only suited for philosophy, underestimating their skills and most importantly their capacity to acquire further skills. It is my experience that with few exceptions, notably students working in cognitive science, philosophy graduates tend not to think of themselves as people who can program. The large majority of them most certainly could and they could earn income with this skill. If they gave programming a try, they might also find it intellectually rewarding.&lt;/p&gt;

&lt;p&gt;The second claim rests on my impression that philosophers identify deeply with philosophy as a field. You do not just &lt;em&gt;study philosophy&lt;/em&gt;, you &lt;em&gt;are a philosopher&lt;/em&gt;. Certainly, I have felt that way and to an extent still do. It is deeply appealing to identify with such a rich tradition of more than 2000 years, a tradition that has brought about many great insights. But that identification does not imply that one has to secure a position in &lt;em&gt;academic&lt;/em&gt; philosophy. As is well-known, few of the most famous philosophers of history worked as academics. In addition, philosophy can also be pursued from positions in other fields.&lt;/p&gt;

&lt;p&gt;Of course, I solved my problems by moving into another field, computer science. My actions and the view expressed in this post are coherent. Furthermore, my choice of computer science addresses all three issues I have raised. While analytic philosophy took a socio-politically applied turn, I instead chose an implementational turn. This implementational turn, so I hoped and still do, could help us overcome the methodological impasse in asking the questions of philosophy. Of course, computer science also offers more opportunities for career advancement. I can always sell my Python, NLP, and Deep Learning skills on the job market.&lt;/p&gt;

&lt;p&gt;Clearly, I am advertising the solution of switching to computer science, but I don’t think this post will convince many. That one can earn more with a degree in CS is well-known and widely accepted; my methodological claim and its argumentative justification would have to do the work of changing minds. But because I have not sufficiently argued for how the methodological impasse of philosophy could be overcome by the methods of computer science, my post lacks argumentative power. At this point, I doubt I have an argument that can do the work. Luckily, I only promised to offer my perspective in this post.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] For more of the debate, follow these links to various blog posts: &lt;a href=&quot;https://dailynous.com/2021/05/24/analytic-philosophys-triple-failure-of-confidence/&quot;&gt;1&lt;/a&gt;, &lt;a href=&quot;https://fliegenderbrief.wordpress.com/2021/06/01/dont-let-it-end/&quot;&gt;2&lt;/a&gt;, &lt;a href=&quot;http://lilith.cc/~victor/dagboek/index.php/2021/05/28/on-the-end-of-analytic-philosophy/&quot;&gt;3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;[1] A connecting assumption is a deep holism about philosophical questions, as expressed in &lt;a href=&quot;http://lilith.cc/~victor/dagboek/index.php/2021/05/28/on-the-end-of-analytic-philosophy/&quot;&gt;this blogpost&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Any philosophical problem is all philosophical problems. You will have known nothing if you have not known everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I don’t want to dismiss such assertions entirely, but notice that asserting it leads to an odd incoherence. As far as I am concerned, the question of holism about philosophy is a philosophical question. To know the positive answer to it, that is to know that any philosophical problem is all philosophical problems, would therefore require us to know everything.&lt;/p&gt;

&lt;p&gt;[2] I am not ruling out that some problems in philosophy might not be solvable. There might be a lack of epistemic access of one sort or another. But I do not think that we have justification to act on this assumption with regards to all or most philosophical problems. The background of my position is Peircean, taking hope in success as an important factor for the progress of inquiry.&lt;/p&gt;
</description>
        <pubDate>Wed, 07 Jul 2021 17:30:13 +0100</pubDate>
        <link>https://dstrohmaier.com/state-of-analytic-philosophy/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/state-of-analytic-philosophy/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Using Prolog for Sudoku Variants</title>
        <description>&lt;p&gt;The Sudoku scene has undoubtedly been one of the pandemic winners. Thanks to the Youtube channel &lt;a href=&quot;https://www.youtube.com/channel/UCC-UOdK8-mIjxBQm_ot1T-Q&quot;&gt;“Cracking the Cryptic”&lt;/a&gt;, its viral video on the &lt;a href=&quot;https://www.youtube.com/watch?v=yKf9aUIxdb4&quot;&gt;“Miracle Sudoku”&lt;/a&gt;, and the many entertaining videos that followed, Sudoku puzzles with extended rule-sets have received widespread attention. That is a prime opportunity for Prolog aficionados like myself to show off the power of the language. Many Sudoku puzzles are easily solved with Prolog.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/compositor.jpg&quot; alt=&quot;A Sudoku Setter at Work&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;existing-resources&quot;&gt;Existing Resources&lt;/h2&gt;

&lt;p&gt;A solver for standard Sudokus is a teaching example for the &lt;a href=&quot;https://www.swi-prolog.org/man/clpfd.html&quot;&gt;CLPFD library&lt;/a&gt;. The Power of Prolog has a &lt;a href=&quot;https://www.metalevel.at/sudoku/&quot;&gt;dedicated page and video&lt;/a&gt; for solving standard Sudokus. Puzzles with extended rule-sets have not gone unnoticed either. In fact, the original “Miracle Sudoku” video has been discussed and solved with Prolog in &lt;a href=&quot;https://benjamincongdon.me/blog/2020/05/23/Solving-the-Miracle-Sudoku-in-Prolog/&quot;&gt;a blog post by Benjamin Congdon&lt;/a&gt;. I want to add a little to these solvers.&lt;/p&gt;

&lt;p&gt;The extended solver of Congdon adds three constraints to the classical solver:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;King’s Move: Cells that are removed from each other by the equivalent of a move of a chess king cannot contain the same digit.&lt;/li&gt;
  &lt;li&gt;Knight’s Move: Cells that are removed from each other by the equivalent of a move of a chess knight cannot contain the same digit.&lt;/li&gt;
  &lt;li&gt;Orthogonal Adjacency: Orthogonally adjacent cells cannot contain consecutive digits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to see how to program these constraints, see Congdon’s &lt;a href=&quot;https://benjamincongdon.me/blog/2020/05/23/Solving-the-Miracle-Sudoku-in-Prolog/&quot;&gt;post&lt;/a&gt;. But there are other constraints that often appear on Cracking the Cryptic and I thought I would fill the gap. For a start, I want to address one of the most common constraint types:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Thermo: Numbers on a line are monotonically increasing starting from a thermometer bulb.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;full-solution-thermo&quot;&gt;Full Solution: Thermo&lt;/h2&gt;

&lt;p&gt;For the Thermo constraint, I’ve chosen the great “Spoons” puzzle by the well-known setter Phistomefel. To solve that puzzle yourself, &lt;a href=&quot;https://app.crackingthecryptic.com/sudoku/BnRMNhBr8N&quot;&gt;follow this link&lt;/a&gt;. To solve it with Prolog, all we need beyond a standard solver are the following two lines and the inclusion of the specific constraints:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;smaller(L,Sn,L) :- Sn #&amp;lt; L.
thermo([L|Ls]) :- foldl(smaller,Ls,L,_).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The thermo predicate defined in these lines checks whether a list of integers increases monotonically from left to right.[0]&lt;/p&gt;
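
&lt;p&gt;As a quick sanity check, the predicate can be tried on a short thermometer. The following is a minimal sketch, assuming SWI-Prolog with the CLPFD library. The name thermo_chain is hypothetical and not part of the solver in this post; it states the same constraint with the library predicate chain/2 instead of a fold.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;:- use_module(library(clpfd)).

% The two lines from above.
smaller(L,Sn,L) :- Sn #&amp;lt; L.
thermo([L|Ls]) :- foldl(smaller,Ls,L,_).

% Hypothetical alternative: chain/2 from CLPFD states that
% neighbouring list elements are related by #&amp;lt;.
thermo_chain(Cells) :- chain(Cells, #&amp;lt;).

% Example queries on a three-cell thermometer over the digits 1..3:
% ?- Cells = [A,B,C], Cells ins 1..3, thermo(Cells), label(Cells).
% ?- Cells = [A,B,C], Cells ins 1..3, thermo_chain(Cells), label(Cells).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both queries should leave only the assignment A = 1, B = 2, C = 3, since three strictly increasing digits have to fit into the range 1..3.&lt;/p&gt;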

&lt;p&gt;My complete solution, based on the previous solvers mentioned above, looks as follows:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;:- use_module(library(clpfd)).

puzzle(Rows) :-
	Rows = [
		[A1,A2,A3,A4,A5,A6,A7,A8,A9],
		[B1,B2,B3,B4,B5,B6,B7,B8,B9],
		[C1,C2,C3,C4,C5,C6,C7,C8,C9],
		[D1,D2,D3,D4,D5,D6,D7,D8,D9],
		[E1,E2,E3,E4,E5,E6,E7,E8,E9],
		[F1,F2,F3,F4,F5,F6,F7,F8,F9],
		[G1,G2,G3,G4,G5,G6,G7,G8,G9],
		[H1,H2,H3,H4,H5,H6,H7,H8,H9],
		[I1,I2,I3,I4,I5,I6,I7,I8,I9]
		],
    sudoku(Rows),
	thermo([A3,A4,A5]),
	thermo([B2,C2,D2]),
	thermo([B3,C3,D3]),
	thermo([B4,C4,D4]),
	thermo([B5,C5,D5]),
	thermo([B7,C7,D7]),
	thermo([B8,C8,D8]),
	thermo([B9,C9,D9]),
	thermo([E3,E4,E5]),
	thermo([F1,G1,H1]),
	thermo([F3,G3,H3]),
	thermo([F4,G4,H4]),
	thermo([F6,G6,H6]),
	thermo([F7,G7,H7]),
	thermo([F8,G8,H8]),
	thermo([F9,G9,H9]),
	thermo([I3,I4,I5]),
	thermo([I8,I7,I6]).

sudoku(Rows) :-
	append(Rows, Vs), Vs ins 1..9,
	maplist(all_distinct, Rows),
	transpose(Rows, Columns),
	maplist(all_distinct, Columns),
	[As,Bs,Cs,Ds,Es,Fs,Gs,Hs,Is] = Rows,
	blocks(As, Bs, Cs),
	blocks(Ds, Es, Fs),
	blocks(Gs, Hs, Is).

blocks([], [], []).
blocks([N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]) :-
    all_distinct([N1,N2,N3,N4,N5,N6,N7,N8,N9]),
    blocks(Ns1, Ns2, Ns3).

smaller(L,Sn,L) :- Sn #&amp;lt; L.
thermo([L|Ls]) :- foldl(smaller,Ls,L,_).

:- time((puzzle(Rows), maplist(labeling([ff]), Rows))),
	maplist(portray_clause, Rows).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The solve took 0.141 seconds on my laptop.&lt;/p&gt;

&lt;h2 id=&quot;other-constraints&quot;&gt;Other Constraints&lt;/h2&gt;

&lt;p&gt;To show off the power of Prolog a little more, I’ll finish with the implementation of two more constraints.&lt;/p&gt;

&lt;p&gt;Summing constraints are equally straightforward to handle. There are in fact multiple variations of summing constraints, including summing along arrows and summing along diagonals (little killer clues). The code will usually be the same:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;add(X,Y,S):- S #= X+Y.
sum(Xs,S):- foldl(add,Xs,0,S).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The predicate sum relates a list of integers – order does not matter – to its sum &lt;em&gt;S&lt;/em&gt;. When we implement a Sudoku puzzle, the &lt;em&gt;S&lt;/em&gt; will usually be another variable in the case of arrow clues and in the case of little killer clues, it will usually be a given digit.&lt;/p&gt;
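
&lt;p&gt;To illustrate the two uses, here is a small sketch, again assuming SWI-Prolog with the CLPFD library. The wrapper names arrow and little_killer are hypothetical and only serve to label the two kinds of clue; their cell arguments stand for variables taken from the puzzle grid.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;:- use_module(library(clpfd)).

add(X,Y,S):- S #= X+Y.
sum(Xs,S):- foldl(add,Xs,0,S).

% Arrow clue: the digit in the circle is the sum of the digits on the arrow.
arrow(Circle, Cells) :- sum(Cells, Circle).

% Little killer clue: the digits on the diagonal add up to a given total.
little_killer(Total, Cells) :- sum(Cells, Total).

% Example query: with 1 and 2 already placed, a total of 9 fixes the third cell.
% ?- X in 1..9, little_killer(9, [1,2,X]).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The example query should answer X = 6. The CLPFD library also ships a built-in sum/3, so sum(Cells, #=, Total) is a possible replacement for the fold.&lt;/p&gt;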

&lt;p&gt;The disjoint groups constraint is a further fascinating one. It is &lt;a href=&quot;https://www.funwithpuzzles.com/2014/08/disjoint-groups-sudoku-fun-with-sudoku.html&quot;&gt;defined as follows&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;cells with the same position in 3x3 boxes contains number from 1 to 9 i.e no number can repeat in the same position in 3x3 boxes.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wrote a working implementation for the disjoint group constraint and I post it here for completeness, but it is not very elegant.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;disjoint(Rows) :-
	by3(Rows,First-[],Second-[],Third-[]),
	maplist(distinct_sets,[First,Second,Third]).

distinct_sets(Rows) :- row_sets(Rows,FSet,SSet,TSet),
                       maplist(all_distinct,[FSet,SSet,TSet]).

row_sets([],[],[],[]).
row_sets([H|Rows],L1,L2,L3) :- by3(H,L1-A,L2-B,L3-C),
                               row_sets(Rows,A,B,C).

by3([],A-A,B-B,C-C).
by3([N1,N2,N3|R],[N1|F]-A,[N2|S]-B,[N3|T]-C) :- by3(R,F-A,S-B,T-C).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In a nutshell, the predicate disjoint groups every third row (A, D, G), every third+1 row (B, E, H), and every third+2 row (C, F, I) together and then applies the same grouping within rows to create the disjoint sets. If you have a better implementation of the disjoint group constraint, then email me. And if you think you understand how it works and want to implement a solver yourself, give &lt;a href=&quot;https://app.crackingthecryptic.com/sudoku/LNqP9d8tdj&quot;&gt;this puzzle&lt;/a&gt; a try. I would love to see a good Prolog solver for it.&lt;/p&gt;

&lt;h2 id=&quot;update-22072021&quot;&gt;Update [22.07.2021]&lt;/h2&gt;
&lt;p&gt;I’ve been sent this clever implementation of the disjoint group constraint by Janne U. using a DCG:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;disjoint_groups2(Rows) :-
	phrase(blockrows(Rows), Blocks),
	transpose(Blocks, BlocksT),
	maplist(all_distinct, BlocksT).

blockrows([]) --&amp;gt; [].
blockrows([[],[],[]|R]) --&amp;gt; blockrows(R).
blockrows([[N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]|R]) --&amp;gt;
	[[N1,N2,N3,N4,N5,N6,N7,N8,N9]],
	blockrows([Ns1,Ns2,Ns3|R]).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
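
&lt;p&gt;To see what the DCG produces, one can run it on a grid whose cells are simply numbered from 1 to 81. This is only an illustration of the grouping, not a Sudoku, and it assumes SWI-Prolog; numbered_grid and row_of_nine are hypothetical helper names.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;:- use_module(library(clpfd)).   % provides transpose/2

% blockrows//1 as given above: one list of nine cells per 3x3 box.
blockrows([]) --&amp;gt; [].
blockrows([[],[],[]|R]) --&amp;gt; blockrows(R).
blockrows([[N1,N2,N3|Ns1], [N4,N5,N6|Ns2], [N7,N8,N9|Ns3]|R]) --&amp;gt;
	[[N1,N2,N3,N4,N5,N6,N7,N8,N9]],
	blockrows([Ns1,Ns2,Ns3|R]).

% Hypothetical helpers: a 9x9 grid filled row by row with 1..81.
row_of_nine(Row) :- length(Row, 9).
numbered_grid(Rows) :-
	length(Rows, 9),
	maplist(row_of_nine, Rows),
	numlist(1, 81, Ns),
	append(Rows, Ns).

% Example query: the first box and the first positional group.
% ?- numbered_grid(Rows), phrase(blockrows(Rows), [Box|Boxes]),
%    transpose([Box|Boxes], [Group|_]).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On this grid, Box should be [1,2,3,10,11,12,19,20,21], the top-left box, and Group should be [1,4,7,28,31,34,55,58,61], the top-left cell of every box. The transposition turns the list of boxes into the list of positions, and these are the lists that must each be all distinct.&lt;/p&gt;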

&lt;hr /&gt;
&lt;h3 id=&quot;footnote&quot;&gt;Footnote&lt;/h3&gt;

&lt;p&gt;[0] I consistently use here the predicates from the CLPFD library, rather than the vanilla mathematical predicates available in Prolog.&lt;/p&gt;
</description>
        <pubDate>Fri, 02 Jul 2021 22:30:13 +0100</pubDate>
        <link>https://dstrohmaier.com/sudoku-prolog/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/sudoku-prolog/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Notes on Standing and Occasion Meaning</title>
        <description>&lt;p&gt;Lexical semantics investigates the meaning of words, but one might distinguish multiple levels of meaning for a single word. For example, the word “semantics” might have a very broad and a very narrow meaning at the same time, depending on how much contextual information we take into account. The relative dependence on contextual information created lexical levels of meaning. In this post, I will share a few thoughts on the simplest distinction of lexical levels of meaning, that is the distinction between standing and occasion meaning of a word. The main insight will be that the distinction faces a problem with regard to homonyms and that current NLP approaches can be interpreted as implementing a specific solution to this problem.&lt;/p&gt;

&lt;h2 id=&quot;standing-and-occasion-meaning&quot;&gt;Standing and Occasion Meaning&lt;/h2&gt;

&lt;p&gt;Standing meaning is the meaning of a word in general, while occasion meaning is word meaning for a specific occasion. By drawing this line, we split word meaning on the one side into a generic core without contextual dependence, and on the other into a specification that depends on the linguistic context. To give an example, “student” might broadly denote learners, while in a specific context the denotation of the word might be narrowed down to the registered participants of a class. The one is the standing and the other is the occasion meaning.&lt;/p&gt;

&lt;p&gt;The distinction between standing and occasion meaning has a long history, going back at least to the late 19th century work of the linguist Hermann Paul. Paul distinguished between &lt;em&gt;usueller&lt;/em&gt; and &lt;em&gt;okkasioneller Bedeutung&lt;/em&gt; (usual/standing and occasional meaning) (see Geeraerts 2010: 14-16).[0] It is perhaps the most frequent way of distinguishing levels of meaning for a word.&lt;/p&gt;

&lt;p&gt;Faced with this distinction, we might wonder what exactly instantiates the two types of meaning. So far, I have generically written about the meaning of a &lt;em&gt;word&lt;/em&gt;, but we can be more specific. Either the word type or the word tokens could instantiate the two levels of meaning. A relatively intuitive response is to attribute the standing meaning to word types and the occasion meaning to word tokens, as Recanati (2012) does. After all, the standing meaning and the word type are both more abstract and generic, while the occasion meaning and the word token are more concrete and specific. But this response is not without problems, because it commits us to word types having one specific meaning in the absence of any context. In the case of homonyms, such as “bank”, the standing meaning becomes unclear.&lt;/p&gt;

&lt;h2 id=&quot;the-problem-of-homonyms&quot;&gt;The Problem of Homonyms&lt;/h2&gt;

&lt;p&gt;What is the standing meaning of “bank”? There is arguably not one standing meaning for this string which covers all uses. Prima facie, “bank” in the sense of financial institute does not share a meaning with “bank” in the sense of edge of a watercourse. In such cases, we have multiple options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Assign the word type a single disjunctive meaning.&lt;/li&gt;
  &lt;li&gt;Distinguish two or more standing meanings for the word type.&lt;/li&gt;
  &lt;li&gt;Distinguish two or more homonymous word types each with their own standing meaning.&lt;/li&gt;
  &lt;li&gt;Deny the existence of a standing meaning and only postulate occasion meaning for the word type in question.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The first option sticks with word types having a single standing meaning and just renders it disjunctive. We would think of “bank” as meaning &lt;em&gt;financial-institute-or-edge-of-watercourse&lt;/em&gt;. But when words develop many meanings, then this standing meaning will become unwieldy and increasingly empty. “Bank” can also refer to buildings housing financial institutes and to databanks, and to piggy banks, etc. The resulting disjunction will be incredibly broad.&lt;/p&gt;

&lt;p&gt;On the second option we just accept that a word type, that is one abstract word string, can have multiple standing meanings and the context will then pick out one of them for further specification. In response, one might ask whether “bank” also has the standing meaning of the German “Bank” which means &lt;em&gt;bench&lt;/em&gt;. If we individuate words purely as strings that instantiate meaning, it would seem to be the case, but that might be at odds with an understanding of words that makes them specific to languages.&lt;/p&gt;

&lt;p&gt;Recanati chooses the third option and individuates word types in terms of their standing meaning, but I don’t find it an obvious choice. Individuating word types in terms of standing meaning creates the challenge of changes in word type meaning over time. We might want to say that a word type has undergone (standing) meaning change, but this assertion would become incorrect if word types are individuated in terms of their meaning. Their meaning would become an individuating characteristic, requiring us to postulate a new word whenever we detect a different standing meaning. While that does not rule out the third option, it makes it less appealing.&lt;/p&gt;

&lt;p&gt;The fourth option abandons the distinction between standing and occasion meaning, at least for word types that are homonyms. This move throws into doubt the whole project of having general levels of word meaning. After all, the standing and occasion meaning distinction was supposed to be the simplest possible differentiation between levels.&lt;/p&gt;

&lt;p&gt;A fifth option, not on the list above, would be to diverge from Recanati even further and assign both standing and occasion meaning to word tokens rather than word types. It seems a bit odd, however, to claim that words have a meaning only relative to a specific use.&lt;/p&gt;

&lt;p&gt;I’ll not argue at length for one of these options here – my favourite is the second option, but the differences can be quite subtle – instead I’ll end this post by considering the problem from the perspective of contemporary NLP.&lt;/p&gt;

&lt;h2 id=&quot;a-few-remarks-from-nlp&quot;&gt;A Few Remarks from NLP&lt;/h2&gt;

&lt;p&gt;Some neural architectures, prominently transformer architectures, represent lexical meaning at multiple points. Specifically, they represent meaning at an initial embedding layer and at later points of encoding. As a result, non-contextualised and multiple sets of contextualised word representations can be extracted from e.g. BERT models (Devlin et al. 2019).[1] We could then go on to identify the non-contextualised embeddings with the standing meaning and the final contextualised embeddings with the occasion meaning. This interpretation is sketched in Emerson (2020).&lt;/p&gt;

&lt;p&gt;Of these models, we can then ask how they deal with the problem of homonyms. BERT and similar approaches effectively implement the first option and assign a disjunctive standing meaning. The initial embedding for “bank” does not differ for the two homonyms. That could be changed, of course. We could preprocess the data with a coarse-grained word sense disambiguation (WSD) system and create different initial embeddings based on the results (cf. Trask et al. 2015).&lt;/p&gt;

&lt;p&gt;If you have a preference for either the second option, as I do, or Recanati’s option of individuating word types in terms of standing meanings, then you would not be satisfied with equating the initial embeddings with standing meanings. The introduction of coarse-grained WSD would fit these approaches much better.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] Another important source for this distinction is Quine (2013), who distinguished occasion from standing sentences. But Quine’s approach is much more behaviourist.&lt;/p&gt;

&lt;p&gt;[1] I neglect here that BERT uses sub-tokenization.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;p&gt;Devlin, J., Chang, M.-W., Lee, K., &amp;amp; Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ArXiv:1810.04805 [Cs]. &lt;a href=&quot;http://arxiv.org/abs/1810.04805&quot;&gt;http://arxiv.org/abs/1810.04805&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Emerson, G. (2020). What are the Goals of Distributional Semantics? &lt;em&gt;Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics&lt;/em&gt;, 7436–7453. &lt;a href=&quot;https://doi.org/10.18653/v1/2020.acl-main.663&quot;&gt;https://doi.org/10.18653/v1/2020.acl-main.663&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Geeraerts, D. (2010). &lt;em&gt;Theories of Lexical Semantics&lt;/em&gt;. Oxford University Press.&lt;/p&gt;

&lt;p&gt;Recanati, F. (2012). Compositionality, Flexibility, And Context Dependence. In &lt;em&gt;The Oxford Handbook of Compositionality&lt;/em&gt; (pp. 175–191). Oxford University Press. &lt;a href=&quot;https://doi.org/10.1093/oxfordhb/9780199541072.013.0008&quot;&gt;https://doi.org/10.1093/oxfordhb/9780199541072.013.0008&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Trask, A., Michalak, P., &amp;amp; Liu, J. (2015). sense2vec—A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings. ArXiv:1511.06388 [Cs]. &lt;a href=&quot;http://arxiv.org/abs/1511.06388&quot;&gt;http://arxiv.org/abs/1511.06388&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quine, W. V. O. (2013[1960]). &lt;em&gt;Word and Object&lt;/em&gt; (new edition). MIT Press.&lt;/p&gt;
</description>
        <pubDate>Wed, 03 Feb 2021 15:37:13 +0000</pubDate>
        <link>https://dstrohmaier.com/standing-meaning-polysemy/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/standing-meaning-polysemy/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Follow Up: Why Learn Prolog in 2021?</title>
        <description>&lt;p&gt;My &lt;a href=&quot;http://dstrohmaier.com/why-learn-prolog-in-2021/&quot;&gt;recent blog post&lt;/a&gt; arguing why one should learn Prolog in 2021 made its way to the front page of Hacker News (HN), where it started a &lt;a href=&quot;https://news.ycombinator.com/item?id=25652369&quot;&gt;discussion&lt;/a&gt; with more than 100 comments. I’m glad to see that some saw value in my post and I want to respond to a few comments from this discussion. Given the number of comments, I won’t write an exhaustive response but instead focus on a few themes I care about and try to fill some gaps I left in my original post. This post will be more eclectic and discuss:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;How to evaluate the time investment into Prolog&lt;/li&gt;
  &lt;li&gt;Where the unfulfilled potential of Prolog might lie&lt;/li&gt;
  &lt;li&gt;Why I focussed on Prolog rather than another logic programming language&lt;/li&gt;
  &lt;li&gt;The aesthetic and epistemic reasons for learning Prolog&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I consider this post a contribution to an ongoing debate, not its conclusion, and I hope it will be understood in such a way.&lt;/p&gt;

&lt;h2 id=&quot;how-to-evaluate-the-time-investment&quot;&gt;How to Evaluate the Time Investment&lt;/h2&gt;

&lt;p&gt;I framed my blog post as providing reasons to learn Prolog for university students. I sought to provide them with reasons to invest their time specifically in Prolog. Why should they spend the time on Prolog rather than on other opportunities?&lt;/p&gt;

&lt;p&gt;I believe that the opportunity costs are especially worth considering for students of computer science, because these costs can be larger than in other disciplines. If a CS student learns a skill that is in high demand on the market – and that is not the case for Prolog at this point – they might increase their future salary.&lt;/p&gt;

&lt;p&gt;A student would not be well advised to invest much time into Prolog if they primarily sought to increase their future salary while avoiding risk. There are better &lt;em&gt;risk-averse&lt;/em&gt; time investment opportunities, such as learning more about neural network technologies. Of course, this assumes students who are motivated by low-risk monetary outcomes and would be able to act on this motivation.[0] Instead, students&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;might be motivated by other aims more strongly than by the additional income, or&lt;/li&gt;
  &lt;li&gt;might be willing to take a riskier bet to raise their income, or&lt;/li&gt;
  &lt;li&gt;might find themselves unable to resist watching funny YouTube clips for the sake of earning money but able to resist them for the sake of learning Prolog.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;More needs to be said about the case of students willing to take risky bets. My last post subtly suggested that Prolog might lead to an increased income, if at some future date Prolog realises an unfulfilled potential. Learning Prolog is a risky investment, but on the assumption of unfulfilled potential it is an investment with a potentially large pay-off.&lt;/p&gt;

&lt;h2 id=&quot;what-is-this-unfulfilled-potential&quot;&gt;What Is This Unfulfilled Potential?&lt;/h2&gt;

&lt;p&gt;In response to my claim that Prolog has unfulfilled potential, HN user &lt;em&gt;mths&lt;/em&gt; wrote&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Is there any reason to believe the paradigm will somehow come into its own in the future? The way this question was addressed by the article was way too wishy-washy for my taste.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I freely admit that I did not provide all that many details on this issue, because I feared the challenge of predicting the future direction of Prolog and the embarrassment if my specific predictions turn out to be mistaken. The comments, however, made me realise that I need to say more.&lt;/p&gt;

&lt;p&gt;One area where many Prolog aficionados tend to see unfulfilled potential is constraint logic programming. Programming with constraints is powerful and has been explored relatively little. Constraint programming is also an area with relatively clear applications. It isn’t only helpful for solving Sudokus or logic puzzles – although that is true as well[1] – but e.g. also for various engineering tasks.&lt;/p&gt;

&lt;p&gt;There are other contenders in the space of constraint programming, but Prolog’s design allows us to integrate constraint logic programming seamlessly into larger projects. As The Power of Prolog &lt;a href=&quot;https://www.metalevel.at/prolog/optimization&quot;&gt;website&lt;/a&gt; puts it, constraints “blend in especially seamlessly into logic programming languages like Prolog due to their relational nature and built-in search and backtracking mechanisms”.&lt;/p&gt;

&lt;p&gt;Another area where I believe Prolog has unfulfilled potential is the task of fully interpretable automated reasoning. Fully interpretable reasoning requires the ability to follow the inference process step by step, including its evidential basis, in a way that is comprehensible to the human cognitive architecture. While much work has pried open the black box of neural networks, I don’t see this level of interpretability as reachable without much revision of our neural architectures. Admittedly, for many applications, this lack of full interpretability is acceptable and in some domains it might even be unavoidable. In some domains, however, we might expect such a level of interpretability. The legal domain is a case in point. For at least some parts of legal decision processes which might strip people of their most basic freedoms, we ought to do our best to provide a fully interpretable inference process. In these domains, Prolog or Prolog-like languages have unfulfilled potential.&lt;/p&gt;

&lt;p&gt;In addition to these two specific areas, let me offer a highly abstract reason why one should expect Prolog to have unfulfilled potential. I assume that Prolog is the main example of the logic programming paradigm, which in turn I assume to be one of the three major programming paradigms: imperative, functional, and logic. If those assumptions are granted – and there are plausible reasons to object to them – the question arises why the logic programming paradigm should be the only one of the three paradigms which does not find major areas of application. To my mind, it appears unlikely that there would be an entire approach to programming with a negligible domain of application. This argument is more of a hunch, but such hunches are the best guidance we have when it comes to hard-to-quantify unknowns, such as whether a technology has potential no one has even conceived of yet.&lt;/p&gt;

&lt;h2 id=&quot;but-does-it-have-to-be-prolog&quot;&gt;But Does It Have to Be &lt;em&gt;Prolog&lt;/em&gt;?&lt;/h2&gt;

&lt;p&gt;A few discussion participants saw merit in the logic programming paradigm, but felt less comfortable with Prolog in particular.&lt;/p&gt;

&lt;p&gt;For example, HN user &lt;em&gt;qart&lt;/em&gt; asked:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;I wonder… is it still justified to learn Prolog now? Aren’t there better alternatives for logic programming in many other common programming languages? I mean http://minikanren.org/&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Another example of a logic programming language that popped up repeatedly in the discussion is &lt;a href=&quot;https://mercurylang.org/index.html&quot;&gt;Mercury&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I lack experience with both miniKanren and Mercury and so I won’t argue that they are better or worse realisations of the logic programming paradigm. Instead I want to suggest that one should prefer to learn Prolog, because there are more resources available for Prolog. Many of these resources were linked to in the HN thread itself.&lt;/p&gt;

&lt;p&gt;Furthermore, given the presumed audience of university students, the question is mostly moot. Usually, departments would offer only one course in logic programming or require learning Prolog before learning more about other logic programming languages. The choices are limited by the curriculum.&lt;/p&gt;

&lt;p&gt;I’m open to the thought that the future of logic programming will not be exactly ISO-compliant Prolog. That being said, I’d be surprised if no key elements of Prolog would be available in that assumed future language, be it Horn-clauses, unification with backtracking, or Prolog-style constraint logic programming.&lt;/p&gt;

&lt;h2 id=&quot;aesthetic-and-epistemic-reasons&quot;&gt;Aesthetic and Epistemic Reasons&lt;/h2&gt;

&lt;p&gt;I don’t think one should fool oneself about learning Prolog and the lack of demand for Prolog skills on the job market, which was also a major topic on the HN thread. But I also made two other arguments, one appealing to the aesthetic and the other to the epistemic properties of Prolog. From the comments, I got the impression that these properties carried more weight with some discussion participants than with others. That is to be expected. What stood out to me, however, is that relatively few HN users questioned that Prolog has these properties.&lt;/p&gt;

&lt;p&gt;Perhaps the closest to arguing against my claim that Prolog is intellectually beautiful and epistemically revelatory would be the comments criticising the language for not living up to its own ambitions. For example, &lt;em&gt;infogulch&lt;/em&gt; complained that:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Prolog implementations are too heavily reliant on the stated order of predicate rules in order to make execution progress.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I have sympathy for such criticisms and in my post I admitted that “occasionally Prolog falls short of the programming-by-description-paradigm”. But much of the beauty derives from recognising the paradigm of which Prolog is the main example. Compare the experience to reading a novel in which the author has captured a completely new way of conceiving an aspect of reality, but some chapters fail to reflect this original conception. While one might argue that the novel is uneven, the new conception of reality within its pages is certainly a reason to read it. It seems to me that criticisms such as the one by &lt;em&gt;infogulch&lt;/em&gt; are analogous.&lt;/p&gt;

&lt;p&gt;In addition, Prolog is still being improved with regard to the aforementioned shortcomings. Prolog developers continue to work on bringing the language more in line with the ideas that render it beautiful and epistemically revelatory.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;My former post was intended as the best case I can make for Prolog within a blog post. I had an imagined audience, but I was writing the post to reflect on what might justify my own interaction with Prolog, including offering supervisions for a course. While my main aim was to offer a justification independently of its uptake by anyone else, I won’t deny that I derive great satisfaction from positive responses, such as &lt;em&gt;simongray&lt;/em&gt; writing&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;This inspired me. What’s the best book for modern prolog?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am grateful to everyone who gave my blog post a chance.&lt;/p&gt;

&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;

&lt;p&gt;[0] I am also writing this in the context of a limited departmental curriculum, where students cannot just choose any course, but I won’t go into that here.&lt;/p&gt;

&lt;p&gt;[1] See &lt;a href=&quot;https://www.metalevel.at/sudoku/&quot;&gt;https://www.metalevel.at/sudoku/&lt;/a&gt; and  &lt;a href=&quot;https://www.metalevel.at/prolog/puzzles&quot;&gt;https://www.metalevel.at/prolog/puzzles&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Thu, 07 Jan 2021 15:37:13 +0000</pubDate>
        <link>https://dstrohmaier.com/follow-up-why-learn-prolog-in-2021/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/follow-up-why-learn-prolog-in-2021/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Why Learn Prolog in 2021?</title>
        <description>&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;?- learn(prolog).
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Why should one learn Prolog in 2021? I had better have an answer to this question, because I will soon offer supervisions for a Prolog course. While I’m a personal admirer of this unusual programming language, students might rightfully demand a justification that goes beyond my preferences. Prolog certainly isn’t the most glamorous programming language to learn in 2021. Despite its lack of popularity, there are good reasons to learn Prolog and in the following, I’ll explore three of them.&lt;/p&gt;

&lt;h2 id=&quot;the-sheer-intellectual-beauty&quot;&gt;The Sheer Intellectual Beauty&lt;/h2&gt;

&lt;p&gt;Perhaps my background in philosophy helps explain my fondness for Prolog. Not only is first-order predicate logic taught to virtually all philosophy students as a tool for thought, but it also forms the foundation of Prolog’s logic-programming paradigm. Philosophers aim for a logical description of the world and Prolog goes beyond this ambition by allowing us to manipulate reality via a logical description. We solve problems by writing Horn clauses, and a Horn clause is a logical formula that simplifies resolution. Logical formulas are the tool of problem solving. Once grasped, the idea of logically describing a problem and then having the computer solve it is almost irresistible.&lt;/p&gt;

&lt;p&gt;Of course, occasionally Prolog falls short of the programming-by-description-paradigm. There are cases where Prolog mixes logic and control instead of keeping them apart.[0] Nonetheless, logic programming, the paradigm of which Prolog is the primary example, comes with its own intellectual appeal. From the perspective of intellectual aesthetics, good Prolog code is a sublime experience (&lt;em&gt;erhabene Erfahrung&lt;/em&gt;).[1] Such Prolog code reveals the overwhelming power of logical description and the force of a capacity – the capacity to describe the world in logical terms and thereby solve problems – that resides in all of us.&lt;/p&gt;

&lt;p&gt;In sum, Prolog code has a timeless beauty to it – a claim that I believe is more commonly associated with the S-expressions of LISP – and is therefore worth learning. I am aware that an appeal to beauty has its limits, but the aesthetic properties of a programming language should not be entirely discounted. Our sense for intellectual beauty is an important tool for creation and it needs to be trained. If one understands what makes an approach beautiful, it becomes easier to create beautiful code and to resist the lure of beauty when it distracts from practical concerns. Learning Prolog is a way to tame the power of beautiful code.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;?- beautiful(prolog).
true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;a-different-perspective-on-classical-issues&quot;&gt;A Different Perspective on Classical Issues&lt;/h2&gt;

&lt;p&gt;Recursion, list manipulations, and graph-hopping are standard topics of foundational computer science and Prolog addresses them with a twist.[2] Prolog offers a different perspective on classical issues of computer science, usually right away from the first lessons. As a result, Prolog has a relatively steep learning curve, but the different perspective can also be revelatory. One learns to describe classical problems in the format of Prolog Horn-clauses and thereby solve them, which can lead to a unique way of understanding them – especially, once one has learned to write &lt;em&gt;idiomatic&lt;/em&gt; Prolog.&lt;/p&gt;

&lt;p&gt;Prolog is not only beautiful, but it also reveals another aspect of the core issues of computer science to which it is applied. Occasionally, the aspect Prolog reveals is also the aspect that needs to be seen for solving a problem. Some problems call for Prolog. Having learned Prolog will allow one to address them beautifully and efficiently. To be honest, at the moment such problems are too rare to justify learning Prolog. But I don’t believe that this has to remain so. As my last argument in favour of learning Prolog, I will suggest that it has unfulfilled potential.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;?- unique_perspective(prolog).
true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;unfulfilled-potential&quot;&gt;Unfulfilled Potential&lt;/h2&gt;

&lt;p&gt;As a student of computer science, one can make a decent career by always following the hype, but to stand out one has to diverge from the well-trodden paths. Those willing to explore unpopular territory have a chance of being ahead of the crowd. In 2021, Prolog is such unpopular territory. In my field of NLP, one might instead opt to learn more about neural networks and especially the Transformer architectures such as BERT. Learning about these topics is certainly advisable for a career in NLP, but it won’t make one stand out.&lt;/p&gt;

&lt;p&gt;Prolog is unpopular and, more importantly, I believe that it has not fulfilled its potential so far. The logic-programming paradigm with its separation between logic and control is powerful. Yet it does not find much use in current applications. This unpopularity despite power might deter a student from learning Prolog – perhaps logic-programming has faults which keep it from being successful – but it is also an opportunity. One can make the bet that more will come of Prolog or a language similar to it.[3] If the bet is successful, one will be ahead of the hype.&lt;/p&gt;

&lt;p&gt;Such bets on unpopular options are risky. It is a high-reward bet because of the limited chances of success. That being said, I would advise making a few such bets in the course of one’s life. Even if nothing comes of them, they render life more interesting and help to show individual character. Perhaps one shouldn’t go all in on such a bet, but this consideration should justify the few hours of a Prolog course, when one gets academic credits in addition to being able to assess the unfulfilled potential of Prolog better.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;?- potential(prolog,Y), unfulfilled(Y).
true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Currently, Prolog does not belong to the most popular programming languages. Its logic programming paradigm makes it an outsider. Nonetheless, I’ve argued that there are good reasons to learn Prolog. The language is beautiful, it offers a different perspective on classic computer science issues, and it has unfulfilled potential. Whether you are motivated by aesthetic, academic, or career considerations, you have a reason to learn Prolog in 2021.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;learn(X) :- beautiful(X), unique_perspective(X), potential(X,Y), unfulfilled(Y).

?- learn(prolog).
true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;UPDATE&lt;/em&gt; This blog post made its way to the frontpage of Hacker News where it received a sizeable number of &lt;a href=&quot;https://news.ycombinator.com/item?id=25652369&quot;&gt;comments&lt;/a&gt;. In response, I wrote a &lt;a href=&quot;http://dstrohmaier.com/follow-up-why-learn-prolog-in-2021/&quot;&gt;follow-up post&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;footnote&quot;&gt;Footnote&lt;/h3&gt;

&lt;p&gt;[0] I’m referencing here Robert Kowalski’s formulation of Algorithm = Logic + Control.&lt;/p&gt;

&lt;p&gt;[1] I hope Kantians can forgive me for treating the sublime (&lt;em&gt;das Erhabene&lt;/em&gt;) as a type of beauty, neglecting Kant’s distinction.&lt;/p&gt;

&lt;p&gt;[2] For an example, have a look at the quicksort implementation on &lt;a href=&quot;https://www.metalevel.at/prolog/sorting&quot;&gt;The Power of Prolog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;[3] I’m not &lt;a href=&quot;https://www.youtube.com/watch?v=kGQNeeRp4sM&quot;&gt;the only one who has hopes for Prolog’s future&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Tue, 05 Jan 2021 21:37:13 +0000</pubDate>
        <link>https://dstrohmaier.com/why-learn-prolog-in-2021/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/why-learn-prolog-in-2021/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>End of Year Post: 2020</title>
        <description>&lt;p&gt;What has 2020 brought? In this post I want to offer a selective reflection on my research career and its developments in 2020. I will take a perspective that is at the same time deeply personal and highly abstract. From my personal heights, I’ll gesture at the turns I’ve taken this year, note a few outcomes, and point towards my future commitments.&lt;/p&gt;

&lt;h2 id=&quot;reorientations-and-pivots&quot;&gt;Reorientations and Pivots&lt;/h2&gt;

&lt;p&gt;My 2020 was characterised by multiple research reorientations and pivots. Many of these pivots I made privately, barely discussing them with my closest friends. These changes resulted from my assessment of my career trajectory and the various research fields into which I have dipped my toes. I don’t want to go into too much detail, but I intend to work less on social ontology and more on natural language processing (NLP). Generally, the path I chose is indicated by my most recent blog posts, which focus on the semantic dimensions of natural language, including lexical semantics. That choice, however, came after much trepidation and pivoting and in the last quarter of this year.&lt;/p&gt;

&lt;p&gt;Why the need for a pivot? All too often academia ends up tying intelligent people to a misguided path of research. Various aspects of academic institutions, such as the need to develop a publication record in a narrow subfield, incentivise researchers to stick with their research projects even when emerging evidence weighs against them. While sometimes sticking to one path pays off in the long run – the researchers who stuck to neural networks during the various AI winters are a prime example of success – in many cases it misallocates human talent. Our human capacities are not maximising the scientific progress of humanity, or much else of value.&lt;/p&gt;

&lt;p&gt;The possibility of wasting my limited capacities on fruitless research endeavours frightens me greatly. Hence, I have a history of abandoning research directions with which I have become disillusioned. A few years ago, before the final stages of my PhD, I was working in the history of philosophy and specifically on German Idealism, but by now that seems far removed from my research interests. While I can still derive joy from picking up a book by Hegel and perusing it, I cannot see myself dedicating my life to it. Hegel does not appear in my philosophy PhD thesis and after finishing my thesis, I completed an MPhil in advanced computer science, moving into NLP.&lt;/p&gt;

&lt;p&gt;The opposite worry of wasting my capacities on fruitless endeavours is that my endless pivoting will not lead to any lasting scientific contributions either. Scientific progress relies on risky up-front investments and other than sheer luck, there is no way around that fact. Given the advanced state of most scientific fields, researchers have to delve deep into a field to contribute. Accordingly, I also fear the prospect of my research career flailing endlessly. That being said, I hope that my decisions in the later part of 2020 put me on a promising research path. Directly or indirectly, my future blog posts will reveal whether my hope is misplaced.&lt;/p&gt;

&lt;h2 id=&quot;wrapping-up-projects&quot;&gt;Wrapping up Projects&lt;/h2&gt;

&lt;p&gt;While I kept pivoting between different research interests of mine, I also wrapped up some projects. Academically, these wrapping up events realised themselves as publications. I have published in the &lt;em&gt;Canadian Journal of Philosophy&lt;/em&gt; and &lt;em&gt;Synthese&lt;/em&gt;, both of which are fairly prestigious philosophy journals. Another philosophy paper has been accepted for publication and should appear in the next few months. These three papers are exploratory stepping stones in my research career. Although some of their insights will inform my future inquiries, I will abandon much of them. It would be a great joy to me if someone else picked up the abandoned pieces and developed them into more than I have been able to. If you have any interest in that, feel free to drop me an email.&lt;/p&gt;

&lt;p&gt;Perhaps I should do a better job of advertising and selling these papers – and since I put considerable time and effort into them, I hope that they are of value – but in this post I am trying to reflect on the overall development of my academic career, and I doubt that these papers will be the most remarkable ones of my career. In fact, I would be rather disappointed in myself if they turned out to form the pinnacle of my research. My ambitions have not been realised yet.&lt;/p&gt;

&lt;h2 id=&quot;forward-into-2021&quot;&gt;Forward into 2021&lt;/h2&gt;

&lt;p&gt;I go into 2021 with a renewed sense of commitment to furthering the scientific progress of humanity within the bounds of my limited capacities and interests. Over the last 10 years, I learned, read, and wrote without excessive regard for disciplinary boundaries. Towards the end of my PhD, I started to question my research trajectories – not that I was ever certain about them – and I explored how I might live up to my commitment of furthering scientific progress. As a result, I expanded into computer science in 2018, but I avoided decisions about my career until they became more pressing over the course of this year. In 2021, I hope to build upon the restructured foundations of my research career and start living up to my commitment. Maybe I will read some more for it in the last few hours of this year.&lt;/p&gt;

&lt;p&gt;For the scientific progress of humanity!&lt;/p&gt;
</description>
        <pubDate>Thu, 31 Dec 2020 21:00:13 +0000</pubDate>
        <link>https://dstrohmaier.com/end-of-year/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/end-of-year/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Simulating Basic Logic with Tensors</title>
        <description>&lt;p&gt;Can we simulate basic logic operations, i.e. the operations of first-order predicate logic, using tensors? In his 2013 paper “Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors”, Edward Grefenstette made some suggestions for such simulation.  The paper’s motivation was to take a step towards combining distributional with formal semantics. I’ve explored this paper in a Jupyter Notebook, which I put on &lt;a href=&quot;https://github.com/dstrohmaier/logic_with_tensors/blob/master/logical_calculi_tensors.ipynb&quot;&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At the moment, the GitHub notebook viewer breaks some of the LaTeX formulas, but you can see it in the Jupyter notebook viewer &lt;a href=&quot;https://nbviewer.jupyter.org/github/dstrohmaier/logic_with_tensors/blob/master/logical_calculi_tensors.ipynb&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
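
&lt;p&gt;To give a flavour of the idea, here is a minimal sketch in plain Python. It is not taken from the notebook. It assumes one common encoding: truth values are two-dimensional one-hot vectors, negation is a 2×2 matrix that swaps the components, and a binary connective is a rank-3 tensor contracted with both of its arguments. The names TRUE, FALSE, NOT, AND and the helper functions are my own choices for illustration.&lt;/p&gt;

```python
# Truth values as one-hot vectors (an assumed encoding for illustration).
TRUE = [1, 0]
FALSE = [0, 1]

# Negation as a 2x2 matrix that swaps the two components.
NOT = [[0, 1],
       [1, 0]]

# Conjunction as a rank-3 tensor AND[i][j][k]:
# i indexes the output component, j and k index the two input vectors.
AND = [
    [[1, 0], [0, 0]],  # output "true" component: only true-and-true
    [[0, 1], [1, 1]],  # output "false" component: every other combination
]


def apply_matrix(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def apply_tensor(tensor, left, right):
    """Contract a rank-3 tensor with two vectors."""
    return [
        sum(plane[j][k] * left[j] * right[k]
            for j in range(len(left))
            for k in range(len(right)))
        for plane in tensor
    ]


def neg(p):
    return apply_matrix(NOT, p)


def conj(p, q):
    return apply_tensor(AND, p, q)


def disj(p, q):
    """Disjunction via De Morgan: p or q = not(not p and not q)."""
    return neg(conj(neg(p), neg(q)))


print(conj(TRUE, FALSE), disj(FALSE, TRUE))
```

&lt;p&gt;Deriving disjunction from negation and conjunction keeps the sketch short. One could equally write it down as its own rank-3 tensor.&lt;/p&gt;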
</description>
        <pubDate>Fri, 18 Dec 2020 13:00:13 +0000</pubDate>
        <link>https://dstrohmaier.com/logical-calculi-tensors/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/logical-calculi-tensors/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Exploring Basic Distributional Representations</title>
        <description>&lt;p&gt;I’ve recently been reading up on distributional representations, that is representation of meaning that are based on count vectors. They were the exciting technology before neural networks and the embeddings networks create changed the field of NLP. Nowadays we do not count token occurrences, but let Word2Vec or BERT models create representations.&lt;/p&gt;

&lt;p&gt;While they have decidedly fallen out of favour, distributional representations are clever pieces of technology and I wanted to get some more experience with them. So I’ve put together a Jupyter Notebook that explores key aspects of that technology:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Creating a count matrix&lt;/li&gt;
  &lt;li&gt;Calculating Pointwise Mutual Information&lt;/li&gt;
  &lt;li&gt;Calculating similarity scores&lt;/li&gt;
  &lt;li&gt;Reducing the dimensionality of the representations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can see the &lt;a href=&quot;https://github.com/dstrohmaier/distributional_representations/blob/master/count_matrix.ipynb&quot;&gt;notebook on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course, my notebook is merely an introduction to some of the most basic techniques. For example, I do not explore incorporating syntactic information. Still, I hope it shows that these by now largely neglected techniques are a fascinating application of statistical NLP.&lt;/p&gt;
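
&lt;p&gt;As a rough illustration of the first three steps in the list above, here is a small sketch using only the Python standard library. It is not the notebook’s code. The toy corpus, the window size of two tokens, and all function names are assumptions made for this example. Word pairs that never co-occur are simply left out of the sparse vectors, which is a simplification, and dimensionality reduction is not covered.&lt;/p&gt;

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the dog ate the bone",
]

WINDOW = 2  # assumed context window: two tokens on each side


def count_cooccurrences(sentences, window=WINDOW):
    """Count how often each context word appears near each target word."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[(target, tokens[j])] += 1
    return counts


def pmi_vectors(counts):
    """Turn raw counts into sparse PMI vectors, one dict per target word."""
    total = sum(counts.values())
    target_totals = Counter()
    context_totals = Counter()
    for (target, context), n in counts.items():
        target_totals[target] += n
        context_totals[context] += n
    vectors = {}
    for (target, context), n in counts.items():
        expected = target_totals[target] * context_totals[context]
        vectors.setdefault(target, {})[context] = math.log2(n * total / expected)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(value ** 2 for value in u.values()))
    norm_v = math.sqrt(sum(value ** 2 for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


counts = count_cooccurrences(corpus)
vectors = pmi_vectors(counts)
print(cosine(vectors["dog"], vectors["cat"]))
```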
</description>
        <pubDate>Sat, 28 Nov 2020 19:50:13 +0000</pubDate>
        <link>https://dstrohmaier.com/distributional-representations/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/distributional-representations/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Conceptual Grain</title>
        <description>&lt;p&gt;In this blog post I share some preliminary musings on conceptual grain – how fine-grained concepts such as DOG and MAMMAL are – which have arisen from my work in NLP and specifically word sense disambiguation. The upshot is that we can develop multiple metrics of conceptual grain and that we have to address the question of what we want these metrics to do for us.&lt;/p&gt;

&lt;p&gt;The classic task of word sense disambiguation (WSD) seeks to assign senses to word tokens in contexts. When I give you a tip, I might give you either advice or money for your service. A WSD system should assign the correct sense, but assigning a sense to “tip” relies on a repository of senses from which a WSD system can draw.&lt;/p&gt;

&lt;p&gt;WordNet is the dominant sense repository in automatic word sense disambiguation (cf. Fellbaum 1998), but its shortcomings have been known for a long time. One of these shortcomings is an exceedingly fine grain (cf. Ide &amp;amp; Wilks 2007; Navigli 2009). The concepts are too finely distinguished for current technology to perform well and arguably even for human annotators. WordNet offers 33 senses for the token “head”, so there is a good chance that some of them get confused some of the time.&lt;/p&gt;

&lt;p&gt;Despite the common complaint, the notion of grain on which it turns has remained rather unspecific. On the simplest interpretation, the grain of a sense repository such as WordNet is just the number of senses in it. There are just too many senses in WordNet! While fewer sense labels certainly would make it easier to create a classifier for WordNet, we might also look for a notion of grain with a little more theoretical heft. If we have our concepts organised with semantic relations, can we then describe grain in terms of such a linguistically founded organisation of concepts?&lt;/p&gt;

&lt;p&gt;Consider the concepts[0] of DOG and MAMMAL. You might propose that since dogs are a kind of mammal, the concept DOG is more fine-grained. In more linguistic terms, the hyponymy hierarchy of concepts provides a partial ordering of grain. A hyponym (DOG) is more fine-grained than its hypernym (MAMMAL).[1] Or to be a bit more formal, assume we have a tree of hyponyms and hypernyms, i.e. a taxonomy tree[2], then the depth at which a concept can be found in this tree could be considered its grain. Hence, we can define an order of grain using a function depth(), i.e. grain(DOG) ≥ grain(MAMMAL) if and only if depth(DOG) ≥ depth(MAMMAL).&lt;/p&gt;
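
&lt;p&gt;The depth-based ordering can be sketched in a few lines of Python. The taxonomy below is a toy example invented for illustration, not an excerpt from WordNet, and the function at_least_as_fine() is a hypothetical name for the comparison of grain.&lt;/p&gt;

```python
# A toy taxonomy, invented for illustration: each concept maps to its hypernym.
HYPERNYM = {
    "ENTITY": None,
    "ANIMAL": "ENTITY",
    "MAMMAL": "ANIMAL",
    "BIRD": "ANIMAL",
    "DOG": "MAMMAL",
    "CAT": "MAMMAL",
}


def depth(concept, hypernym=HYPERNYM):
    """Number of hypernym steps from the concept up to the root."""
    steps = 0
    while hypernym[concept] is not None:
        concept = hypernym[concept]
        steps += 1
    return steps


def at_least_as_fine(a, b):
    """grain(a) >= grain(b) under the depth-based metric."""
    return depth(a) >= depth(b)


print(depth("DOG"), depth("MAMMAL"), at_least_as_fine("DOG", "MAMMAL"))
```

&lt;p&gt;Note that depth also ranks concepts that stand in no hyponymy relation to each other, such as DOG and BIRD in this toy tree, whereas the hyponymy hierarchy by itself only orders concepts along a single branch.&lt;/p&gt;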

&lt;p&gt;But there are other ways to specify the notion of grain. Consider again the example of “head” and its 33 senses. Prima facie, the problem here is not that the 33 senses are deep down in the hyponymy tree. The problem is that there are just too many senses that are &lt;em&gt;closely&lt;/em&gt; related. Once again, we can approximate the intuition with features of the taxonomy tree. Specifically, the branching factor of nodes in the tree provides an indication of how many closely related concepts there are.[3] In other words, it would hold that grain(DOG) ≥ grain(MAMMAL) if and only if hyper-branching-factor(DOG) ≥ hyper-branching-factor(MAMMAL), where hyper-branching-factor() returns the branching factor for the closest hypernym.[4] The assumption is that if the senses of “head” are really too close, they will be child nodes of densely populated hypernym nodes in the taxonomy tree.&lt;/p&gt;
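
&lt;p&gt;A sketch of the branching-factor metric, again on an invented toy taxonomy. Returning 0 for the root, which has no hypernym, is an assumption of this example rather than part of the proposal.&lt;/p&gt;

```python
from collections import Counter

# A toy taxonomy, invented for illustration: each concept maps to its hypernym.
HYPERNYM = {
    "ENTITY": None,
    "ANIMAL": "ENTITY",
    "MAMMAL": "ANIMAL",
    "BIRD": "ANIMAL",
    "DOG": "MAMMAL",
    "CAT": "MAMMAL",
    "HORSE": "MAMMAL",
}

# Branching factor of a node = number of its direct hyponyms.
BRANCHING = Counter(parent for parent in HYPERNYM.values() if parent is not None)


def hyper_branching_factor(concept):
    """Branching factor of the concept's closest hypernym."""
    parent = HYPERNYM[concept]
    if parent is None:
        return 0  # assumption: the root has no hypernym to measure
    return BRANCHING[parent]


print(hyper_branching_factor("DOG"), hyper_branching_factor("MAMMAL"))
```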

&lt;p&gt;So far, I considered the taxonomic tree to be constant and then pointed at features of it – depth and branching factor – to suggest metrics of conceptual grain. Instead one could postulate an ordering of increasingly detailed taxonomic trees. Assume you create a taxonomic ontology and you add batches of nodes to it in a natural way. Then the stages of your taxonomic tree will each have a certain grain. At the beginning you will have a very coarse-grained taxonomy and with each step it will be finer-grained. You can now have a function introduction-to-tree() which returns the number of the stage at which a certain concept was introduced. Then, grain(DOG) ≥ grain(MAMMAL) if and only if introduction-to-tree(DOG) ≥ introduction-to-tree(MAMMAL).&lt;/p&gt;
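
&lt;p&gt;In code, this third metric only needs a record of the batches in the order they were added. The stages below are invented for illustration. What counts as a “natural” sequence of batches is exactly what the sketch leaves open.&lt;/p&gt;

```python
# Hypothetical stages of a growing taxonomy: each set is one batch of new nodes.
STAGES = [
    {"ENTITY"},
    {"ANIMAL", "PLANT"},
    {"MAMMAL", "BIRD"},
    {"DOG", "CAT"},
]


def introduction_to_tree(concept, stages=STAGES):
    """Return the 1-based number of the stage at which the concept was added."""
    for number, batch in enumerate(stages, start=1):
        if concept in batch:
            return number
    raise KeyError(concept)


print(introduction_to_tree("DOG") >= introduction_to_tree("MAMMAL"))
```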

&lt;p&gt;Admittedly, this measure has a problem, namely the need for an ordering of node batches in a “natural way” of adding them. Many of the subtleties of conceptual grain are hiding there. It won’t do to just record the steps in which nodes were added to WordNet or any other ontology, since chance and history will not follow such a natural order. A concept might be added later to the tree for many reasons that would not support an inference about its grain – maybe people were too focused on some other topic domain and forgot about the more common and coarser-grained concept.&lt;/p&gt;

&lt;p&gt;All of these metrics have their positive and negative sides, depending on what we want to use them for. The use cases provide criteria for evaluation. One of the original reasons for introducing a notion of grain was to allow us to complain about WordNet as being too fine-grained for word sense disambiguation. It has too many fine-grained concepts, or it has concepts with too high an introduction-to-tree factor for our classifiers. Hence, I propose this first criterion:[5]&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;Conceptual grain correlates with difficulty in addressing WSD as a classification task.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In addition to this NLP-driven criterion, we can also use some linguistic intuitions about grain – in both senses of linguistic – to evaluate the metrics. Such criteria serve the purpose of ensuring that the metric can support linguistic theorizing.&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;More fine-grained concepts should be (pragmatically?) exchangeable in more linguistic contexts than coarser concepts.&lt;/li&gt;
  &lt;li&gt;The grain of a concept should correlate with the length of a naïve definition we might give for it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Further criteria could be proposed to ensure the integration of the metric in other disciplines, e.g. cognitive science. From the question of what grain is, we are driven to the issue of what we want the notion and its metrics to do for us.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h3&gt;
&lt;p&gt;[0] I use “sense” and “concept” interchangeably in this post.&lt;/p&gt;

&lt;p&gt;[1] I assume that the hyponymy relation holds between concepts, not words. Otherwise I am not sure how to handle polysemy.&lt;/p&gt;

&lt;p&gt;[2] Maybe concepts connected by hyponymy edges form a directed graph and not a tree, but let’s not get bogged down in that for now.&lt;/p&gt;

&lt;p&gt;[3] This measure ignores the proximity between hypernyms and hyponyms, but the next one arguably captures it.&lt;/p&gt;

&lt;p&gt;[4] A generalization would be to take the average branching factor of a set of hyponyms.&lt;/p&gt;

&lt;p&gt;[5] A 0th implicit criterion was that a metric of conceptual grain should provide at least a partial order over our concepts.&lt;/p&gt;

&lt;hr /&gt;
&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Fellbaum, C. (Ed.). (1998). WordNet: An Electronic Lexical Database. MIT Press.&lt;/li&gt;
  &lt;li&gt;Ide, N., &amp;amp; Wilks, Y. (2007). Making Sense About Sense. In E. Agirre &amp;amp; P. Edmonds (Eds.), Word Sense Disambiguation: Algorithms and Applications (pp. 47–73). Springer Netherlands. https://doi.org/10.1007/978-1-4020-4809-8_3&lt;/li&gt;
  &lt;li&gt;Navigli, R. (2009). Word Sense Disambiguation: A Survey. ACM Computing Surveys, 41(2), 1–69. https://doi.org/10.1145/1459352.1459355&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Mon, 16 Nov 2020 19:47:13 +0000</pubDate>
        <link>https://dstrohmaier.com/grain/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/grain/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>New Paper (CJP): Social-Computation-Supporting Kinds</title>
        <description>&lt;p&gt;The &lt;em&gt;Canadian Journal of Philosophy&lt;/em&gt; has published &lt;a href=&quot;http://dx.doi.org/10.1017/can.2020.33&quot;&gt;my paper&lt;/a&gt; on what I call “Social-Computation-Supporting Kinds”. This paper is a first attempt to re-describe the role of computation in social ontology. I argue – in a move I would self-servingly love to call “bold” – that there is a kind of social kind which is distinguished by supporting social computations, that is, groups implementing computational processes.&lt;/p&gt;

&lt;p&gt;I want to stress that it is a first attempt and will leave many questions open. Its value lies, hopefully, in doing something different in the social kinds debate and sketching the value this new approach will have. I expect to publish more on this approach.&lt;/p&gt;

&lt;p&gt;The paper is once again published Open Access, thanks to the University of Cambridge.&lt;/p&gt;

&lt;p&gt;I also presented the gist of the paper at the recent online &lt;em&gt;Social Ontology&lt;/em&gt; conference. My video presentation from this conference is &lt;a href=&quot;https://so2020.isosonline.org/conference/social-computation-supporting-kinds/&quot;&gt;still available&lt;/a&gt; for those who don’t want to read the paper.&lt;/p&gt;
</description>
        <pubDate>Tue, 11 Aug 2020 20:47:13 +0100</pubDate>
        <link>https://dstrohmaier.com/New-Paper-on-Social-Kinds-and-Computation/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/New-Paper-on-Social-Kinds-and-Computation/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Presentation at the 2020 Social Ontology Conference</title>
        <description>&lt;p&gt;I am presenting at the 2020 Social Ontology Conference and because it is virtual, you can all watch it online. The &lt;a href=&quot;https://so2020.isosonline.org/&quot;&gt;conference website&lt;/a&gt; provides videos of all talks.&lt;/p&gt;

&lt;p&gt;In my &lt;a href=&quot;https://so2020.isosonline.org/conference/social-computation-supporting-kinds/&quot;&gt;talk&lt;/a&gt;, I discuss the notion of social-computation-supporting kinds. A longer paper exploring the idea will hopefully be published soon. The main advantage of the video is the excellent pixel art.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/pixelated_me.png&quot; alt=&quot;Picture of Myself&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There will also be Q&amp;amp;A sessions, but they haven’t been scheduled yet.&lt;/p&gt;
</description>
        <pubDate>Tue, 07 Jul 2020 11:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/social_ontology2020/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/social_ontology2020/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>More Social Ontology Highlights</title>
        <description>&lt;p&gt;I’ve recently posted a short list of &lt;a href=&quot;http://dstrohmaier.com/philosophy/2019/12/23/Best-of-Social-Ontology-2019.html&quot;&gt;social ontology highlights from 2019&lt;/a&gt;, but &lt;a href=&quot;http://philosophy.indiana.edu/people/ludwig.shtml&quot;&gt;Kirk Ludwig&lt;/a&gt; sent me a much more extensive list. I present it here in rearranged form.&lt;/p&gt;

&lt;h2 id=&quot;publications&quot;&gt;Publications&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Tony Lawson: &lt;a href=&quot;https://www.csog.econ.cam.ac.uk/Publications/Publications&quot;&gt;The Nature of Social Reality: Issues in Social Ontology (Economics as Social Theory)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Angela Condello, Maurizio Ferraris, John Rogers Searle: &lt;a href=&quot;https://www.routledge.com/Money-Social-Ontology-and-Law-1st-Edition/Condello-Ferraris-Rogers-Searle/p/book/9780367191115&quot;&gt;Money, Social Ontology and Law&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Trish Reay, Tammar B. Zilber, Ann Langley, and Haridimos Tsoukas (eds.): &lt;a href=&quot;https://global.oup.com/academic/product/institutions-and-organizations-9780198843818&quot;&gt;Institutions and Organizations. A Process View&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Holly Lawford-Smith: &lt;a href=&quot;https://global.oup.com/academic/product/not-in-their-name-9780198833666&quot;&gt;Not In Their Name. Are Citizens Culpable For Their States’ Actions?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Luka Burazin, Kenneth Einar Himma, and Corrado Roversi (eds.): &lt;a href=&quot;https://global.oup.com/academic/product/law-as-an-artifact-9780198821977&quot;&gt;Law as an Artifact&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;J. Adam Carter, Andy Clark, Jesper Kallestrup, S. Orestis Palermos, and Duncan Pritchard (eds.): &lt;a href=&quot;https://global.oup.com/academic/product/socially-extended-epistemology-9780198801764&quot;&gt;Socially Extended Epistemology&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;sep-revisions&quot;&gt;SEP Revisions&lt;/h2&gt;
&lt;p&gt;The following three articles in the SEP received substantial revisions:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://plato.stanford.edu/entries/social-institutions/&quot;&gt;Social Institutions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://plato.stanford.edu/entries/social-construction-naturalistic/&quot;&gt;Naturalistic Approaches to Social Construction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://plato.stanford.edu/entries/epistemology-social/&quot;&gt;Social Epistemology&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;events&quot;&gt;Events&lt;/h2&gt;
&lt;p&gt;At the top of Kirk’s list was the &lt;a href=&quot;https://events.tuni.fi/socialontology2019/&quot;&gt;Social Ontology/ENSO conference in Tampere&lt;/a&gt;, which I already had on my list. Otherwise, there were three workshops/conferences in Vienna on:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://groupagency.univie.ac.at/fileadmin/user_upload/p_groupagency/Program_Conference_Social_Agency.pdf&quot;&gt;Social Agency, Group Agency &amp;amp; Relational Normativity&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://groupagency.univie.ac.at/events/workshops-and-conferences/workshop-group-agency-and-collective-responsibility/&quot;&gt;Group Agency and Collective Responsibility&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://groupagency.univie.ac.at/events/workshops-and-conferences/workshop-shared-agency-rationality-normativity/&quot;&gt;Shared Agency, Rationality, Normativity&lt;/a&gt; (with Michael Bratman and David Velleman)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were two events in Milan:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Adelaide de Lastic “The Political Dimension of an Enterprise’s Collective Agency”, Thursday 28 March 2019&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.dipafilo.unimi.it/ecm/home/aggiornamenti-e-archivi/tutte-le-notizie/content/28-novembre-2019-italo-testa-a-pragmatist-take-on-social-ontology-habits-social-practices-statuses.0000.UNIMIDIRE-81494&quot;&gt;A Pragmatist take on Social Ontology: Habits, Social Practices, Statuses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other events:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;The &lt;a href=&quot;https://www.apaonline.org/event/2019pacific&quot;&gt;2019 Pacific APA meeting had a symposium&lt;/a&gt; on Kirk’s second book, &lt;a href=&quot;https://global.oup.com/academic/product/from-plural-to-institutional-agency-9780198789994&quot;&gt;From Plural to Institutional Agency: Collective Action 2&lt;/a&gt;. (With Maria Jankovic and Carol Rovane as commentators)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://socialontologyglasgow.wordpress.com/events/&quot;&gt;Workshop on Social Ontology, Normativity, and Philosophy of Law&lt;/a&gt;, Glasgow University Law School, May 30-31&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://enposs.eu/past-enposs-2/&quot;&gt;European Network for Philosophy of the Social Sciences (ENPOSS)&lt;/a&gt; in Athens&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://ppesociety.org/the-2019-ppe-society-meeting/&quot;&gt;PPE Society meeting in spring 2019&lt;/a&gt; also featured some social ontology papers&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;near-future&quot;&gt;Near Future&lt;/h2&gt;
&lt;p&gt;Kirk also had some forthcoming publications on his list:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Anika Fiebich’s edited book &lt;a href=&quot;https://www.springer.com/gp/book/9783030297824&quot;&gt;Minimal Cooperation and Shared Agency&lt;/a&gt; has slid into 2020&lt;/li&gt;
  &lt;li&gt;An issue of &lt;em&gt;Language and Communication&lt;/em&gt; on group speech acts is coming out, but the papers are already &lt;a href=&quot;https://www.sciencedirect.com/journal/language-and-communication/special-issue/10K75XZFFJ3&quot;&gt;available online&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Saba Bazargan-Forward, Deborah Tollefsen (eds.): &lt;a href=&quot;https://www.routledge.com/The-Routledge-Handbook-of-Collective-Responsibility-1st-Edition/Bazargan-Forward-Tollefsen/p/book/9781138092242&quot;&gt;The Routledge Handbook of Collective Responsibility&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is quite a list and I have to admit that I was not aware of much that Kirk found. I hope others can benefit from it as well.&lt;/p&gt;
</description>
        <pubDate>Sat, 04 Jan 2020 13:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/More-Social-Ontology-Highlights/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/More-Social-Ontology-Highlights/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Best of Social Ontology 2019</title>
        <description>&lt;p&gt;Social ontology, by which I mean a subfield of contemporary analytic philosophy, is a comparatively small enterprise so far. That makes gathering a best-of-2019 list difficult. There just aren’t that many great papers coming out each year, or other notable events. Here are five highlights I could find. Feel free to send me an email and suggest other contributions to the field. I might update this entry later.&lt;/p&gt;

&lt;p&gt;Brian Epstein’s paper &lt;a href=&quot;https://philpapers.org/rec/EPSWAS-2&quot;&gt;“What are social groups? Their metaphysics and how to classify them”&lt;/a&gt; has been available as forthcoming for a while, but the official publication date is 2019, which hopefully justifies including it in this list.[0]&lt;/p&gt;

&lt;p&gt;The International Social Ontology Society has started a &lt;a href=&quot;https://www.youtube.com/channel/UCoHANz5VREBjb_TBoWx0a8A&quot;&gt;YouTube channel&lt;/a&gt; this year and published keynotes from last year’s conference.&lt;/p&gt;

&lt;p&gt;The &lt;a href=&quot;https://isosonline.org/SO2019&quot;&gt;Social Ontology conference in Tampere&lt;/a&gt; was organised by the European Network of Social Ontology. I believe the keynotes have been recorded, so there is hope that they will appear on the ISOS YouTube channel at some point.&lt;/p&gt;

&lt;p&gt;Finally, there has been an issue of &lt;em&gt;The Monist&lt;/em&gt; on the topic of &lt;a href=&quot;https://academic.oup.com/monist/issue/102/2&quot;&gt;&lt;em&gt;Collective Responsibility and Social Ontology&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;BONUS: Arto Laitinen has told me that a book symposium on Ásta’s &lt;em&gt;Categories We Live By&lt;/em&gt; will soon appear in the &lt;a href=&quot;https://www.degruyter.com/view/j/jso&quot;&gt;Journal of Social Ontology&lt;/a&gt; (and dated as being from 2019).&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Footnotes&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;[0] Two other publications on the ontology of groups deserve honourable mentions, even though they appeared in 2018 (as forthcoming without a date yet in the first case). The first is Katherine Ritchie’s &lt;a href=&quot;https://philpapers.org/rec/RITSSA-4&quot;&gt;“Social Structures and the Ontology of Social Groups”&lt;/a&gt; and the second is Gabriel Uzquiano’s &lt;a href=&quot;https://philpapers.org/rec/UZQGTA&quot;&gt;“Groups: Toward a Theory of Plural Embodiment”&lt;/a&gt;.&lt;/p&gt;
</description>
        <pubDate>Mon, 23 Dec 2019 18:57:13 +0000</pubDate>
        <link>https://dstrohmaier.com/Best-of-Social-Ontology-2019/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Best-of-Social-Ontology-2019/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>A Different Map of the Tractatus</title>
        <description>&lt;p&gt;Over the years there have been a number of visualisations of Wittgenstein’s Tractatus Logico-Philosophicus. Most of them have made use of the tree structure Wittgenstein imposed on his text. With today’s web-technologies, &lt;a href=&quot;https://homepage.univie.ac.at/noichlm94/posts/tractatus/&quot;&gt;these&lt;/a&gt; &lt;a href=&quot;https://pbellon.github.io/tractatus-tree/&quot;&gt;representations&lt;/a&gt; of the text can be excellent. In this post, however, I present a map of the Tractatus unlike any of these previous experiments.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/visualisation_tractatus.gif&quot; alt=&quot;3D GIF of Tractatus statements&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The picture shows me playing with an interactive representation of all statements in the Tractatus, each represented by an embedding. Embeddings are vector representations of meaning. Usually they are created on the level of tokens, but there are ways of aggregating them to higher levels. I took the relatively easy path of averaging the embeddings for all the tokens in the statements.[0] The result should be a map of how strongly the statements are semantically related.[1] The closer two vectors are, the closer the statements are in their meaning; at least that is the idea.&lt;/p&gt;
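&lt;p&gt;To make the averaging step concrete, here is a minimal sketch in plain Python. It uses made-up three-dimensional token vectors rather than actual BERT output, and it assumes cosine similarity as the measure of closeness; neither the function names nor the numbers come from the code in the repository.&lt;/p&gt;

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into one statement vector."""
    count = len(token_vectors)
    return [sum(values) / count for values in zip(*token_vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumed closeness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up token vectors for three statements; real embeddings have
# hundreds of dimensions and come from a trained model.
statement_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
statement_b = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
statement_c = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]

vec_a = mean_pool(statement_a)  # [0.5, 0.5, 0.0]
vec_b = mean_pool(statement_b)  # [1.0, 1.0, 0.0]
vec_c = mean_pool(statement_c)  # [0.0, 0.0, 2.0]

# a and b point in the same direction, a and c are orthogonal.
print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

&lt;p&gt;In this toy example the first similarity is close to 1 and the second is 0, so statements a and b would land near each other on the map while c would sit apart.&lt;/p&gt;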

&lt;p&gt;There are a variety of ways to create embeddings, typically making use of artificial neural networks. The Word2vec library made embeddings popular, but I wanted to explore something more cutting-edge for this visualisation. So I used a pretrained BERT model to create the vectors. BERT is based on the now fashionable transformer networks (see &lt;a href=&quot;http://nlp.seas.harvard.edu/2018/04/03/attention.html&quot;&gt;here for a technical explanation&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The embeddings are just vectors; to make them visually accessible, I use the online &lt;a href=&quot;http://projector.tensorflow.org/&quot;&gt;projector tool&lt;/a&gt;. For this purpose, the hundreds of dimensions of the embeddings are reduced to three. Information included in the embeddings is lost in this process. Hence, what you see is only an approximation of what the embeddings capture.&lt;/p&gt;

&lt;p&gt;In contrast to a visualisation using the tree structure created by Wittgenstein, this approach can reveal something we haven’t been aware of. It can suggest connections no one has noticed before. I am not sure it does, but that it has the potential is exhilarating.&lt;/p&gt;

&lt;p&gt;The code is available on &lt;a href=&quot;https://github.com/dstrohmaier/tractatus_embeddings/&quot;&gt;GitHub&lt;/a&gt;, including the embeddings in the TSV format needed for the projector tool.[2] Just go to &lt;a href=&quot;http://projector.tensorflow.org/&quot;&gt;the website&lt;/a&gt;, upload the two TSV files and you can explore the Tractatus in 3D.&lt;/p&gt;

&lt;hr /&gt;
&lt;p&gt;[0] It is actually a bit trickier than that, because I use information from multiple layers in the neural network to create the token-embeddings.&lt;/p&gt;

&lt;p&gt;[1] While embeddings capture some aspect of the semantic content of a token, they do not represent it entirely faithfully. Like so much in machine learning, they are best seen as an approximation that works for certain purposes.&lt;/p&gt;

&lt;p&gt;[2] I avoided putting the text of the Tractatus online, since I am not sure what the copyright situation is. If you want it, email me.&lt;/p&gt;
</description>
        <pubDate>Mon, 02 Sep 2019 18:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/A-Different-Map-of-the-Tractatus/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/A-Different-Map-of-the-Tractatus/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Upcoming Talk (August 2019)</title>
        <description>&lt;p&gt;For the fourth year in a row, I will present a paper at a &lt;a href=&quot;https://isosonline.org/SO2019&quot;&gt;social ontology&lt;/a&gt; conference this summer. After the last one in Boston, I thought it would be time to do something more ambitious. While my previous papers went well enough and led to two publications, they made relatively narrow arguments. This year in Tampere my claims will be much bolder. I do not want to give too much away, but I will propose a sweeping change to how we explain the social and what makes it special from a metaphysical perspective. What makes the social interesting should be fundamentally reconceived.&lt;/p&gt;

&lt;p&gt;You should not miss this momentous event. If you do, you can email me at davidstrohmaier92@gmail.com to get an early draft of my paper.&lt;/p&gt;
</description>
        <pubDate>Thu, 01 Aug 2019 16:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/Upcoming-Talk-August-2019/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/Upcoming-Talk-August-2019/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Parsing Hegel</title>
        <description>&lt;p&gt;In another life I read a lot of Hegel, now a mere side-interest of mine. Despite the assurances of my former supervisor Bob Stern to the contrary, Georg Wilhelm Friedrich Hegel’s work is infamously opaque. Making sense of his &lt;em&gt;Phenomenology of Spirit&lt;/em&gt; poses a considerable challenge, and those who claim to understand him often end up with rather different readings.&lt;/p&gt;

&lt;p&gt;In my current life, I am finishing up an MPhil in Advanced Computer Science. My project is in the area of computational semantics, where we seek to make sense of expressions in natural language by automatically producing formal representations of their meaning. For this purpose, I am using the Boxer parser, which uses Discourse Representation Theory (DRT).[0] DRT offers a fancy formalism for capturing action-sentences using a neo-Davidsonian event semantics. One benefit of this theory is that it allows us to represent the meaning in neat little boxes, hence the name of the parser. The boxes specify a number of variables at the top and then contain conditions in the form of predicates below.&lt;/p&gt;
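&lt;p&gt;To give a rough idea of what such a box contains, here is a small Python sketch of a box for the simple sentence “The dog chases the car”. The DRS class, its render() method and the role names Agent and Theme are illustrative assumptions; the sketch makes no claim to match Boxer’s actual output format.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    """A hypothetical, minimal discourse representation structure."""
    referents: list = field(default_factory=list)   # variables at the top
    conditions: list = field(default_factory=list)  # predicates below

    def render(self):
        """Draw the box as plain text."""
        top = " ".join(self.referents)
        width = max(len(line) for line in [top] + self.conditions)
        rule = "+" + "-" * (width + 2) + "+"
        rows = [rule, "| " + top.ljust(width) + " |", rule]
        rows += ["| " + cond.ljust(width) + " |" for cond in self.conditions]
        rows.append(rule)
        return "\n".join(rows)


# Neo-Davidsonian analysis: the chasing is itself an event variable e,
# and the participants are linked to it by role predicates.
box = DRS(
    referents=["x", "y", "e"],
    conditions=["dog(x)", "car(y)", "chase(e)", "Agent(e, x)", "Theme(e, y)"],
)
print(box.render())
```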

&lt;p&gt;If computational semantics enables us to make sense of natural language, then why not use it to make Hegel approachable? Why not run Boxer on the &lt;em&gt;Phenomenology&lt;/em&gt;? I can think of very good reasons to resist the idea for the whole book, but not a single one of them kept me from giving it a try with a few sentences. So I just went ahead and adapted a tiny sliver of what I have learned during my MPhil to turn the opening sentences of the &lt;em&gt;Phenomenology&lt;/em&gt; into a formal representation.&lt;/p&gt;

&lt;p&gt;The challenge should not be underestimated. The first two sentences read as follows:[1]&lt;/p&gt;

&lt;p&gt;“It is customary to preface a work with an explanation of the author’s aim, why he wrote the book, and the relationship in which he believes it to stand to other earlier or contemporary treatises on the same subject. In the case of a philosophical work, however, such an explanation seems not only superfluous but, in view of the nature of the subject-matter, even inappropriate and misleading.”&lt;/p&gt;

&lt;p&gt;This is not exactly “The dog chases the car”, an example much more adapted to the powers of Boxer. But I have to admit that Boxer surprised me. It managed to produce a representation of the first two sentences:[2]&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://raw.githubusercontent.com/dstrohmaier/parsinghegel/master/data/box_first_sent.svg?sanitize=true&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Despite the intuitive character of the boxes, it is not exactly easy to make sense of the jumble. Boxer seems to have produced less than complete parses, hence the repetition of certain elements (e.g. “contemporary treatise”), but I am honestly impressed that I got anything at all. In fact, Boxer did not present a parse when offered the third sentence:&lt;/p&gt;

&lt;p&gt;“For whatever might appropriately be said about philosophy in a preface - say a historical statement of the main drift and the point of view, the general content and results, a string of random assertions and assurances about truth - none of this can be accepted as the way in which to expound philosophical truth.”&lt;/p&gt;

&lt;p&gt;Failing on such Germanic verbosity is nothing of which Boxer has to be ashamed. It ends, however, the hopes of rendering Hegel intelligible with the current technology.[3] If you generously fund me for four to five years, I will try to produce such representations for the whole of the &lt;em&gt;Phenomenology&lt;/em&gt;. The decision whether that is a worthy investment of your money is up to you.&lt;/p&gt;

&lt;p&gt;You can find the code I used in a &lt;a href=&quot;https://github.com/dstrohmaier/parsinghegel&quot;&gt;public GitHub repository&lt;/a&gt;, but you need to install the C&amp;amp;C parser as well as Boxer for it to work, which is a challenge in its own right.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;[0] Kamp, Hans, and Uwe Reyle. &lt;em&gt;From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory&lt;/em&gt;. Studies in Linguistics and Philosophy 42. Dordrecht: Springer-Science+Business Media, B.V, 1993.&lt;/p&gt;

&lt;p&gt;[1] I am using the Miller translation.&lt;/p&gt;

&lt;p&gt;[2] The parse neglects a few niceties, such as representing the word senses as WordNet synsets and the like, but that is not the problem.&lt;/p&gt;

&lt;p&gt;[3] As a sidenote, let me suggest that Hegel’s &lt;em&gt;Phenomenology&lt;/em&gt; in fact works better with the neo-Davidsonian approach of Boxer than other philosophy texts, because it tries to describe the actions and experiences of spirit. What it describes is closer to action than what we find in most philosophy books.&lt;/p&gt;
</description>
        <pubDate>Thu, 23 May 2019 14:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/parsing-hegel/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/parsing-hegel/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
     
      <item>
        <title>Two New Publications</title>
        <description>&lt;p&gt;Two of my publications are finally out. Both of them are related to my PhD research into social ontology and both investigate groups. &lt;a href=&quot;https://philpapers.org/rec/STRGMA-2&quot;&gt;The first&lt;/a&gt; one discusses group membership and argues that reducing it to mereological parthood plus further conditions is a viable option. The paper has an unusual history. Originally, I wrote another paper that argued for the opposite conclusion; that is, I tried to establish that all mereological accounts of groups fail. However, Katherine Hawley published a paper in the debate in 2018, and after reading it I decided that she was right that we cannot take mereological accounts off the map at this point. So instead of publishing my first paper, I wrote a new one, filling a significant gap in the mereological account. You can find out everything in this crisp little piece.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://philpapers.org/rec/STRTTO-16&quot;&gt;My second paper&lt;/a&gt; undertakes a more ambitious project. It defends the conclusion that current interpretivist accounts of group agency fail and have to fail while functionalist accounts have a better shot. Like the first one, this second paper draws on the ontology of groups. Coinciding groups, that is groups which share all their members at all times, pose a special problem to interpretivist accounts, or so I argue.&lt;/p&gt;

&lt;p&gt;I am proud to say that both papers have been published open access. As long as you have internet access, you can read them.&lt;/p&gt;
</description>
        <pubDate>Tue, 16 Apr 2019 16:57:13 +0100</pubDate>
        <link>https://dstrohmaier.com/two-new/</link>
        <guid isPermaLink="true">https://dstrohmaier.com/two-new/</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
  </channel>
</rss>
