Block party - how AI pieces language together

Do you remember Lego? Of course you do.

Or if you have young children, not only do you remember Lego, you can’t remove it from your memory, because you also can’t remove the imprint from your feet as you tried to walk barefoot through the kitchen late at night without making a sound.

OK, why am I asking you this?

I am really, really, really bad at making anything with Lego. I try it with my nephew, and he always has something much more awesome than I do by the end of the evening. The kid has experience. I, on the other hand, have none.

Or to put it another way, he has training and I have none. Each time he builds something, he gets better at predicting which pieces will work well together.

And that's exactly what makes AI tick - it's all about predicting what comes next. Just like my nephew choosing Lego pieces, AI constructs sentences by carefully selecting each piece based on what it has learned.

Let's see how that works.

Choosing the blocks

As far as the average user is concerned, they ask AI a question and then a magical string of words appears.

Don’t get me wrong, this is almost kind of magical, but it's mathematical magic.

So how do AI responses begin? Let’s stop and wonder how a human brain begins to respond to a question.

Seriously, I’m no cognitive scientist. I can’t answer that last question and I do apologise for leading you down the wrong path.

But I will ask you this, have you ever had a conversation with somebody and while you were talking you suddenly go blank and realise you can’t finish the sentence? You started off, then realised you had no idea what comes next.

That’s kind of what AI is doing, except it will never get to that moment of crisis.

You’re going to mention tokens again, aren’t you?

Yep, this is once more a question of tokens. Sorry, I should’ve warned you in advance that the rest of this is going to be about statistics.

If that’s not your particular brand of vodka, no hard feelings, otherwise come with me.

Statistical turtles all the way down

When you ask a question and an AI assistant replies, it is not giving you a full sentence right away. In fact, it is giving you the sentence word by word as it makes it up.

And, everything I just said is wrong.

It is in fact giving you a sentence, token by token, as it makes it up.

Yes, entire words can be tokens, but so can small parts of words that, when combined, form words you recognise. Punctuation can be a token too.
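To make that concrete, here is a toy sketch. These token boundaries are made up for illustration; real models learn their subword splits (for example, via byte-pair encoding) from enormous amounts of text.

```python
# Toy illustration only: these splits are invented, not from a real tokenizer.
# Tokens can be whole words, fragments of words, or punctuation.
tokens = ["I", " feel", " unwell", "."]               # whole words + punctuation
subword_tokens = ["I", " feel", " un", "well", "."]   # fragments that combine into words

# Joining the pieces back together reproduces the original text.
print("".join(subword_tokens))  # -> I feel unwell.
```

Either way the pieces are cut, gluing them back together gives you the sentence you started with.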

Consider the phrase “I feel ill”.

If you send that to an AI assistant, it might say “You should see a doctor”, or it might point out that you ate nothing but ice cream for a whole month and show you no sympathy.

But which is more likely?

I think we can both agree the sentence starting with “you should” is more likely.

Okay, now we are two words in.

But maybe it’s going to say “you should see a doctor” or maybe even “you should see your face”.

See a doctor, go to bed, take some medicine, see your face… At this point, we have gone down the trouser leg of possibilities.

This is what AI is doing, not just for every word, but for every token.

Let's visualise how an AI actually constructs its responses. The diagram below shows the step-by-step process of building a possible response to 'I feel ill':

Each box represents a token - a word or part of a word - and the lines show possible paths forward, with percentages indicating how likely each path is.

[Diagram: decision tree showing how AI generates responses, with probabilities for each possible word. The primary path “You should see a doctor” has the highest probabilities, with alternative branches showing other possible completions.]

Multiple possible responses are considered, and every response is checked for statistical likelihood.
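Here is a minimal sketch of one of those steps, with made-up probabilities for the token that follows “I feel ill”. A real model scores tens of thousands of possible tokens at every step; this table is pure invention for illustration.

```python
# Hypothetical probabilities for the next token after "I feel ill".
# These numbers are invented; a real model produces them from its training.
next_token_probs = {
    " You": 0.55,
    " Have": 0.20,
    " Oh": 0.15,
    " Ice": 0.10,
}

# Always picking the single most likely token is known as "greedy" decoding.
most_likely = max(next_token_probs, key=next_token_probs.get)
print(most_likely)  # -> " You"
```

Pick a token, append it to the sentence, then repeat the whole scoring exercise for the next position, and the one after that, until the response is done.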

"You should see a doctor" is probably going to be the most accurate response statistically speaking.

But that might not always be the case. The AI doesn't always have to take the most probable token, which is why, if you ask it the same question twice, you'll likely get two different sentences.

When you ask AI to be factual, it will favour higher-probability tokens. And when you ask it to be creative, you're giving it an indication that lower-probability tokens can be considered.
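That knob is often called "temperature", and a sketch of the idea looks like this. The probabilities are invented for illustration; the sharpening trick shown (raising each probability to the power 1/temperature before sampling) is equivalent to the usual way temperature is applied to a model's raw scores.

```python
import random

# Invented probabilities for the token after "You should".
probs = {" see": 0.6, " go": 0.2, " take": 0.15, " eat": 0.05}

def sample(token_probs, temperature=1.0):
    """Sample the next token. Low temperature sharpens the distribution
    towards the likely tokens; high temperature flattens it, letting
    unlikely tokens through more often."""
    weights = [p ** (1.0 / temperature) for p in token_probs.values()]
    return random.choices(list(token_probs), weights=weights)[0]

# Near-zero temperature behaves like always choosing " see";
# temperatures above 1 make " eat" turn up more and more often.
print(sample(probs, temperature=0.7))
```

A "be factual" instruction nudges the model towards the low-temperature end of this behaviour; "be creative" nudges it towards the high end.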

Back to the Lego

Just as with my nephew and me, every Lego piece that comes next is checked to see if it is a good fit. But, again, he is more experienced than I am, so his output is more sensible.

My nephew knows better than I do what the next piece should be. He might not know it but he is applying statistical knowledge to the next step.

And that is what an AI model is doing, but on a massive scale. Just as my nephew has built hundreds of Lego models, seeing which combinations work and which don't, AI models are trained on billions of examples of text.

The probabilities for each possible token combination start as random guesses, but through exposure to vast amounts of text, the model learns which sequences make sense in which contexts - much like how my nephew now instinctively knows that a flat piece works better as a roof than a tall brick.

This is why AI can generate everything from factual responses to creative stories - it has learned the patterns of how language fits together, just like an expert Lego builder knows how the pieces combine to create anything from houses to spaceships.