When we see LLMs struggling to demonstrate an understanding of what letters are in each of the tokens that it emits or understand a word when there are spaces between each letter, we should compare it to a human struggling to understand a word written in IPA format (/sʌtʃ əz ðɪs/) even though we can understand the word spoken aloud normally perfectly fine.
I know IPA but I can’t read English text written in pure IPA as fast as I can read English text written normally. I think this is the case for almost anyone who has learned the IPA and knows English.
When we see LLMs struggling to demonstrate an understanding of what letters are in each of the tokens that it emits or understand a word when there are spaces between each letter, we should compare it to a human struggling to understand a word written in IPA format (/sʌtʃ əz ðɪs/) even though we can understand the word spoken aloud normally perfectly fine.
But if you’ve learned IPA you can read it just fine
I know IPA but I can’t read English text written in pure IPA as fast as I can read English text written normally. I think this is the case for almost anyone who has learned the IPA and knows English.