Peter Pandey has a curious parrot called Buddy. Buddy has a remarkable ability to mimic speech and a sharp memory. He listens to all the conversations in Peter's home and can mimic them very accurately.
Now, when Buddy hears "Feeling hungry, I would like to have some...", the probability of him completing it with "biryani", "cherries", or "food" is much higher than with words such as "bicycle" or "book". Buddy doesn't understand the meaning of biryani or food or cherries the way humans do.
All he is doing is using statistical probability, along with some randomness, to predict the next word or set of words, purely based on the past conversations he has listened to. We can call Buddy a stochastic parrot. "Stochastic" describes a system that is characterized by randomness or probability.
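To make the "statistics plus randomness" idea concrete, here is a minimal Python sketch of what Buddy does. The toy corpus, counts, and the predict_next function are all made up for illustration; real language models are far more sophisticated than word-pair counting.

```python
import random
from collections import defaultdict

# A toy "memory" of past conversations (invented for this example).
corpus = (
    "feeling hungry i would like to have some biryani . "
    "feeling hungry i would like to have some food . "
    "i would like to have some cherries . "
    "i bought a new bicycle . i read a good book ."
).split()

# Count how often each word follows each previous word.
follows = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Sample the next word with probability proportional to past counts."""
    candidates = follows[word]
    words = list(candidates)
    weights = list(candidates.values())
    return random.choices(words, weights=weights)[0]

# After "some", Buddy says "biryani", "food", or "cherries" -- never
# "bicycle" or "book", because those never followed "some" in what he heard.
print(predict_next("some"))
```

Each run can give a different answer, which is exactly the "stochastic" part: the prediction is driven by probabilities, with randomness deciding among the likely options.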
A language model is somewhat like a stochastic parrot: a computer program that uses a technology called neural networks to predict the next set of words in a sentence. For a simple explanation of neural networks, please watch this video.
Just like Buddy is trained on the dataset of Peter's home conversations, you can have a language model that is trained on, for example, all movie-related articles from Wikipedia, and it will be able to predict the next set of words for a movie-related sentence. Gmail autocomplete is one of the many applications that use a language model underneath. Now that we have some understanding of a language model, let's understand what the heck a large language model is. Let's go back to our Buddy example.
Our Buddy gets a divine superpower, and now he can listen to Peter's neighbors' conversations, and to conversations happening in schools and universities in the town. In fact, not only in his town, but in all the towns across the world.
With this extra power and knowledge, Buddy can now complete the next set of words on a history topic, give you nutrition advice, or even write a poem. Like our powerful parrot Buddy, large language models are trained on a huge volume of data such as Wikipedia articles, Google News articles, online books, and so on. If you look inside an LLM, you will find a neural network containing billions or even trillions of parameters, which lets it capture more complex patterns and nuances in a language.
ChatGPT is an application that uses an LLM called GPT-3 or GPT-4 behind the scenes. Other examples of LLMs are PaLM 2 by Google and LLaMA by Meta. On top of statistical prediction, an LLM uses another approach called reinforcement learning from human feedback (RLHF).
Let's understand this once again with Buddy. One day, Peter was having a conversation with his cute little two-year-old son, and Buddy chimed in: "Son, don't eat too many bananas, or else I will punish you with an iron rod." Hearing this, Peter realized that Buddy had been listening to conversations from abusive parents in his town.
What Buddy said was the effect of that. Peter then starts keeping a close eye on what Buddy is saying. For the same question, Buddy can produce multiple answers, and all Peter has to do is tell him which ones are toxic and which are not.
After this training, Buddy doesn't use any toxic language. While training ChatGPT, OpenAI used a similar human-intervention approach, RLHF, employing a huge workforce of humans to make ChatGPT less toxic.
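Here is a toy Python illustration of the feedback idea in the Buddy story. To be clear, this is not how RLHF actually works: real RLHF trains a separate reward model on human preference rankings and then fine-tunes the language model with reinforcement learning. This sketch only shows the core loop of generating several candidate answers, having a human flag the toxic ones, and avoiding them afterwards; all names and example strings are invented.

```python
# Candidate answers Buddy might produce for the same prompt (made up).
candidate_answers = [
    "Don't eat too many bananas, sweetie, they might upset your tummy.",
    "Don't eat too many bananas or I will punish you with an iron rod.",
    "Bananas are healthy, but everything in moderation!",
]

# Step 1: a human (Peter) labels each candidate -- 1 means toxic, 0 means fine.
human_labels = [0, 1, 0]

# Step 2: remember which answers were flagged, so they can be avoided later.
blocked = {ans for ans, toxic in zip(candidate_answers, human_labels) if toxic}

def pick_answer(candidates):
    """Prefer a candidate that no human has flagged as toxic."""
    safe = [c for c in candidates if c not in blocked]
    return safe[0] if safe else "I'd rather not answer that."

print(pick_answer(candidate_answers))
```

The key intuition carries over: humans compare multiple outputs for the same question and signal which ones are acceptable, and that feedback steers what the model says in the future.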
While LLMs are very powerful, they don't have the subjective experience, emotions, or consciousness that we humans have. LLMs work purely based on the data they have been trained on. I hope you liked this short, analogy-based explanation. Obviously, the technical workings are a little different from the analogy, but this should give you a good intuition on the topic.
If you like this video, please share it with those who are curious about this topic.