Zak Kolar

Why can’t LLMs understand?

November 25, 2024

I have a language made up of the following emojis:

🚓 🚕 🚜
🔺 🔻
🥦 🍋 🍅
🤿 ⛸️ 🏐

Here is a list of "sentences" written in this language. Do you notice any patterns?

🚕🔺🥦
🚜🔻⛸️
🚓🔻🍋
🚜🔺🤿

You may notice:

  • Each sentence is made up of exactly three emojis
  • Each sentence starts with a vehicle
  • Each sentence has a triangle in the middle
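
If you want to make those rules precise, here is a minimal sketch of a checker in Python. The rule set is my own reconstruction from the examples above, not necessarily everything the tester enforces. Sentences are passed as lists of emoji, since some emoji span multiple code points and splitting a string naively is unreliable.

    VEHICLES = ("🚓", "🚕", "🚜")
    TRIANGLES = ("🔺", "🔻")
    OTHERS = ("🥦", "🍋", "🍅", "🤿", "⛸️", "🏐")

    def is_coherent(tokens):
        """Check a sentence against the patterns observed so far."""
        return (
            len(tokens) == 3             # exactly three emojis
            and tokens[0] in VEHICLES    # starts with a vehicle
            and tokens[1] in TRIANGLES   # triangle in the middle
            and tokens[2] in OTHERS      # inferred from the examples, not listed above
        )

    print(is_coherent(["🚜", "🔺", "⛸️"]))  # True
    print(is_coherent(["🔺", "🚜", "🥦"]))  # False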

Click in the blank spaces below to see if you can create your own coherent sentences. Use the "test" button to check. In this context, coherent means a native speaker of the emoji language would believe they were written by another native speaker.

If you're unsure where to start, try guessing random sequences. Use the feedback to refine your guesses.

Open sentence tester in a new window

If you play with the tester long enough, you'll eventually come up with enough patterns to create a coherent sentence every time.
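
Once the patterns are pinned down, producing a coherent sentence every time is purely mechanical. Here is a sketch of a generator, reusing the assumed rule set from the checker above:

    import random

    # Same vocabulary groups as in the checker above.
    VEHICLES = ("🚓", "🚕", "🚜")
    TRIANGLES = ("🔺", "🔻")
    OTHERS = ("🥦", "🍋", "🍅", "🤿", "⛸️", "🏐")

    def generate():
        """Produce a sentence that satisfies every observed pattern,
        with no idea what (if anything) it says."""
        return [random.choice(VEHICLES),
                random.choice(TRIANGLES),
                random.choice(OTHERS)]

    print("".join(generate()))  # e.g. 🚕🔻🏐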

Now consider this sequence:

🚜🔺⛸️

The tester will tell you this sentence is coherent. But is it true?

Missing context

What kind of nonsense question is that? You may have a handle on which emojis are likely to appear in a particular order. But you have no context to understand what they mean.

Maybe each emoji represents a word or concept we're familiar with. Or a pitch in a humpback whale song. Or a frequency in a radio signal originating from somewhere in space.

To construct a coherent sentence, we don't need to know what the emojis represent. We simply recreate the patterns we've observed. But to discern meaning and truth, we’d need additional context.

What if I gave you a bigger vocabulary with more emojis, thousands of example sentences, and unlimited time to look for and test patterns? You might identify more patterns and produce sentences of greater complexity. But you still wouldn't understand what they mean.
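
Automating that pattern hunt is straightforward. Here is a hypothetical sketch that induces the positional patterns from example sentences alone, loosely analogous to how a language model picks up structure from its training data, and just as blind to meaning:

    from collections import defaultdict

    def learn_patterns(corpus):
        """Record which emojis appear at each position. Pure pattern
        induction: the learner never sees a meaning for anything."""
        slots = defaultdict(set)
        for sentence in corpus:
            for position, emoji in enumerate(sentence):
                slots[position].add(emoji)
        return dict(slots)

    corpus = [
        ["🚕", "🔺", "🥦"],
        ["🚜", "🔻", "⛸️"],
        ["🚓", "🔻", "🍋"],
        ["🚜", "🔺", "🤿"],
    ]
    print(learn_patterns(corpus))
    # Position 0 collects what we would call "vehicles", but the learner
    # has no such concept, only co-occurrence statistics.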

Feedback

🚜🔺⛸️ is not true (at least as I’m writing this). You can add this feedback to your list of rules and patterns, avoiding that particular sentence in the future.

But you still don't know what the sentence means or why it's untrue. And unless I enumerate every possible false statement in the language, you'll have to keep relying on external feedback to catch the rest.
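
In code, that kind of feedback amounts to a blocklist of specific sentences, not a general rule. Here is a sketch (my own illustration, not how any real model stores feedback):

    # Sentences humans have reported as false, stored verbatim.
    reported_false = {("🚜", "🔺", "⛸️")}

    def passes_feedback(tokens):
        """True unless this exact sentence was reported false. Nothing
        about WHY it was false is captured, so every untested sentence
        still passes."""
        return tuple(tokens) not in reported_false

    print(passes_feedback(["🚜", "🔺", "⛸️"]))  # False: blocked verbatim
    print(passes_feedback(["🚓", "🔺", "⛸️"]))  # True: slips through unexamined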

Translation

Here are translations from the emoji language into English:

🚓 = The traffic light
🚕 = The banana
🚜 = The weather

🔺 = is
🔻 = is not

🥦 = green
🍋 = yellow
🍅 = red

🤿 = raining
⛸️ = snowing
🏐 = sunny

This means 🚜🔺⛸️ translates to “The weather is snowing”. Even this is not enough context on its own. In Massachusetts in late November, it's false. But in another place or at a different time, the same sentence may be true.
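
With the table in hand, translating is a plain dictionary lookup. Note what the lookup gives you and what it doesn't: words, but not truth. A sketch:

    TRANSLATIONS = {
        "🚓": "the traffic light", "🚕": "the banana", "🚜": "the weather",
        "🔺": "is", "🔻": "is not",
        "🥦": "green", "🍋": "yellow", "🍅": "red",
        "🤿": "raining", "⛸️": "snowing", "🏐": "sunny",
    }

    def translate(tokens):
        """Map emoji to English. Translation supplies words, not truth:
        whether the sentence holds depends on where and when you ask."""
        return " ".join(TRANSLATIONS[t] for t in tokens).capitalize()

    print(translate(["🚜", "🔺", "⛸️"]))  # The weather is snowing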

LLMs

Large language models work by finding patterns to construct coherent sentences. They lack the context necessary to “understand” what they’re generating. As humans provide feedback about problematic or untrue statements, they "learn" specific patterns to avoid. But this is not the same as generalizable understanding.

Regardless of advances in hardware and training data, these models will always be prone to constructing false sentences (or “hallucinating”). In the disclaimer “ChatGPT may make mistakes”, there is no implicit “for now”.