Spend a vision with me: Gen AI Thing, Part III : finally getting to the crux of the issue

In my previous two blog posts, I explained, as I would explain to my mum, what AI is and how machines can understand our words. In this post, I explain heuristically (non-technical, common sense) how machines can not only understand what we are saying, but also respond intelligently and create answers that were not there before (Gen AI).

One more little detour

For those of you who have played with ChatGPT or something similar (Bard?), one of the things that puzzles people is the concept of ‘tokens’. Some of you may ask, since I claim that machines can understand human words well enough, what is this token thing? Are we gambling when using these ‘tools’?

Gambling? Yes, maybe… Some examples of funky ChatGPT (and Bard) results:

ChatGPT making up cases to support its argument (1)
ChatGPT making up cases of sexual harassment and naming the supposed perpetrator (2)
Bard makes a mistake regarding the James Webb Space Telescope (3)

There are ways to mitigate these issues, but they are beyond the scope of this blog post. Suffice it to say that such models do give information that may be less than reliable. But then again, they were not designed to ‘tell the truth, the whole truth, and nothing but the truth’.

ChatGPT/Bard are not designed to tell the truth?! I thought they were AI!

The answer to the question lies in the question itself. These are AI systems, and as I mentioned in the first blog of the series, such models learn from (are trained on) the data they are fed. Secondly, it may help to understand a bit about how these systems, called LLMs (Large Language Models), work.

How does ChatGPT/Bard… work?

Let me start with a word game. How many of you play Wordle (4)? Basically, every day a 5-letter word is chosen, and you have to guess it without any clue, in 6 tries. All you will ever know is whether a letter you have suggested exists in the answer but is in the wrong slot (yellow), is in the correct slot (green), or does not exist at all (grey). The other condition is that any combination of letters you try has to be an existing word.

The thing is, most people, once they know the position of one letter, will try to guess the letters next to it based on what they know about the English language, for example (5):

‘E’ is the most common letter in English and your best bet if you know nothing about the word; it is followed by ‘T’ and ‘A’.
If there is a ‘Q’, chances are there is a ‘U’, and chances are the ‘U’ follows the ‘Q’.
If there is a ‘T’ anywhere except in 5th position, the most likely next letter is ‘H’, then ‘O’, then ‘I’.
If there is an ‘H’ anywhere except in 5th position, the most likely next letter is ‘E’, then ‘A’, then ‘I’.

Combinations of 2 letters such as ‘QU’, ‘TH’, ‘TO’, ‘TI’ are called bigrams. The idea is that once you know a letter, you use this information to find the most likely following letter. This is known as conditional probability: given the condition that one letter is a ‘T’, the most likely following letter is an ‘H’, not an ‘E’, even though ‘E’ is the most common letter in English. The key is that your choice of letter changes based on the information you have.

These are shortcuts, findings based on the analysis of words, that can help you guess the letters in Wordle.
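For the curious, here is a minimal sketch of how such bigram shortcuts could be counted. The word list is a tiny made-up sample (an assumption for illustration, not a real corpus), so the numbers only show the mechanics of conditional probability.

```python
from collections import Counter, defaultdict

# Tiny illustrative word list (an assumption, not a real corpus).
WORDS = ["THE", "THAT", "THIS", "THEN", "TOP", "TIN", "QUIT", "QUEEN", "HEAT", "HAT"]


def bigram_counts(words):
    """Count how often each letter is immediately followed by each other letter."""
    counts = defaultdict(Counter)
    for word in words:
        for first, second in zip(word, word[1:]):
            counts[first][second] += 1
    return counts


def next_letter_probabilities(counts, letter):
    """Conditional probability P(next letter | current letter)."""
    total = sum(counts[letter].values())
    return {nxt: n / total for nxt, n in counts[letter].items()}


counts = bigram_counts(WORDS)
probs_after_t = next_letter_probabilities(counts, "T")
best_after_t = max(probs_after_t, key=probs_after_t.get)
best_after_q = max(counts["Q"], key=counts["Q"].get)

print("After T:", probs_after_t)
print("Best guess after T:", best_after_t)
print("Best guess after Q:", best_after_q)
```

In this toy sample, ‘H’ is the most frequent follower of ‘T’ and ‘U’ always follows ‘Q’, which is the same kind of shortcut as in the list above.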

As an aside, the most common bigrams in different languages can be very different (6)(7)

Bigram Popularity	English	French	Spanish
1	TH	ES	DE
2	HE	LE	ES
3	IN	DE	EN
4	ER	EN	EL
5	AN	ON	LA

Letters are fine, but Gen AI generates whole documents, not random letters

It’s just an extension of the idea. In the example above, I used bigrams (2 letters). When playing Wordle, some people may use trigrams (3 letters); it’s basically the same thing, just a little more complex.

The next step then is that instead of guessing the next letter (using a bigram), you guess the next word. But why stop there? You can go beyond a bigram and look at several previous items (here words rather than letters). In principle, it’s that straightforward. However, to improve performance, there are a few more tricks.
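To make the jump from letters to words concrete, here is a small sketch that guesses the next word from the previous one. The sentence is made up for illustration, and real LLMs do not work from a simple lookup table like this; it only shows the principle.

```python
from collections import Counter, defaultdict

# Toy text, made up for illustration only.
TEXT = "the cat sat on the mat the cat ate the fish the dog sat on the rug"


def train_next_word(text):
    """For each word, count which words follow it in the text."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    candidates = following.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


model = train_next_word(TEXT)
print(most_likely_next(model, "the"))  # "cat" follows "the" most often in the toy text
print(most_likely_next(model, "sat"))
print(most_likely_next(model, "rug"))  # last word of the text, so nothing follows it
```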

The problem is the size of the data: given the number of words, the number of possible combinations increases exponentially as you add more words. The brilliant, or one of the more brilliant, things about LLMs is that they generate a probability of a combination of words occurring without having to store every combination. They do that by using an underlying model that recognises the patterns.
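A quick back-of-the-envelope calculation shows how fast this blows up. The vocabulary size of 50,000 words is a hypothetical round number I picked for illustration.

```python
def possible_sequences(vocabulary_size, length):
    """Number of distinct word sequences of a given length."""
    return vocabulary_size ** length


VOCABULARY_SIZE = 50_000  # hypothetical round number, for illustration only

for length in range(1, 5):
    print(length, "word(s):", f"{possible_sequences(VOCABULARY_SIZE, length):,}")
```

With these assumptions, two words already give 2.5 billion possible pairs, which is why simply counting every combination stops being practical very quickly.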

AI, NN, and the human brain

As mentioned in Part I of this series, AI is about making a machine think like a human. The way this has been done in Neural Networks is to make a representation (model) of the human brain, with nodes and connections. As is thought to be the case in the human brain, each node does a fairly simple job (one of the simplest is a binary yes/no based on a threshold; such a node is called a perceptron), and the connections between them are given weights based on how important they are.
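Here is a minimal sketch of such a yes/no node. The weights and bias are hand-picked, hypothetical values that make the node behave like a logical AND; in a real network they would be learned from data.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs clears the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0


# Hypothetical, hand-picked values: the node only fires when both inputs are 1.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, "->", perceptron(pair, AND_WEIGHTS, AND_BIAS))
```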

Note that as Neural Nets have progressed, they have taken on a life of their own and the idea of mimicking the structure of the human brain is no longer central; the architecture of neural nets, while still using nodes and connections, can be quite different.

Going back to the chair and table example

When you show the machine a picture, it breaks it down into small parts of the picture (features), maybe the length of the leg or the shape of the back, and assigns weights based on how important these features are. After being trained on many examples, the model is ready to distinguish between a table and a chair.

The illustration above shows a very simple type of Neural Network: one input layer where you start, one hidden layer of nodes with connections in one direction to do the magic, and the output layer. For classifying tables and chairs from images, for example, it has been found that neurons arranged in a grid formation work well, specifically a Convolutional Neural Network (CNN). Basically, a set of filters is applied to detect specific patterns in the picture (convolutions), then these are summarised and combined (more layers) to extract the more salient features without burning enormous resources, and finally pushed to the output layer. In the case of our chair/table classification, there would be 2 nodes in the output layer, the outputs being the probabilities that the image fed in is a chair or a table. (9)
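As a rough sketch of two of these steps, the snippet below slides one filter over a tiny "image" and then turns two scores into probabilities for the two output nodes. The image, the filter and the final scores are all made-up values for illustration; a real CNN learns its filters and has many more layers in between.

```python
import math


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 3x4 "image": dark on the left (0), bright on the right (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Made-up filter that responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]: the filter lights up where the edge is

# Hypothetical final scores for the two output nodes (chair, table).
chair_probability, table_probability = softmax([2.0, 0.5])
print(chair_probability, table_probability)
```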

There are many ways to structure a neural net, and many parameters to play with. You won’t be surprised that one of the important innovations was the realisation that, for processing text, it is important to know what else is in the sentence and not process each word independently. So there was a need to be able to refer to the past. Long Short-Term Memory (LSTM) (10) allowed this to happen by controlling how long some nodes retain information, so that it can be used to provide context.
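To give a feel for "how long a node retains information", here is a sketch of a single LSTM unit working on plain numbers, using the standard forget, input and output gates. The parameter values are hypothetical and set by hand to make the effect visible; in a real LSTM they are learned during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell with scalar input and state."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])  # how much memory to keep
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])   # how much new info to store
    output = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])  # how much memory to expose
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = forget * c_prev + write * candidate
    h = output * math.tanh(c)
    return h, c


def memory_after(params, steps=10, start_memory=1.0):
    """Feed the same input repeatedly and return what is left in the memory cell."""
    h, c = 0.0, start_memory
    for _ in range(steps):
        h, c = lstm_step(0.5, h, c, params)
    return c


# Hypothetical hand-set parameters: a forget gate close to 1 keeps the memory...
KEEP = dict(wf=0.0, uf=0.0, bf=6.0, wi=0.0, ui=0.0, bi=-6.0,
            wo=0.0, uo=0.0, bo=0.0, wg=1.0, ug=0.0, bg=0.0)
# ...while a forget gate close to 0 wipes it almost immediately.
FORGET = dict(KEEP, bf=-6.0)

kept = memory_after(KEEP)
forgotten = memory_after(FORGET)
print("memory kept:", kept)
print("memory forgotten:", forgotten)
```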

However, LSTM is not that fast as it processes information sequentially, like many of us do when we read word by word (11). In 2017, a team from Google came up with the brilliantly titled paper “Attention Is All You Need” (12). This gave rise to the Rise of the Decepticons (13), sorry, to Transformers (14). Basically, when processing a chunk of text, the machine uses an attention network to calculate which words need to be given a higher weight. While Transformers can be run sequentially, they can also be run in parallel (no recurrence), hence the usefulness of GPUs for LLMs.
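Here is a minimal sketch of the weighting idea, using scaled dot-product attention (the form described in the paper) for a single word. The 2-number vectors are made-up stand-ins for word embeddings; a real Transformer learns them and uses many attention "heads" at once.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that add up to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * value[d] for w, value in zip(weights, values)) for d in range(dim)]
    return weights, output


# Made-up vectors standing in for three words in a sentence.
KEYS = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.0]]
VALUES = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
QUERY = [1.0, 0.0]  # the word we are currently processing

weights, output = attention(QUERY, KEYS, VALUES)
print("weights:", weights)  # the first word gets the highest weight
print("output:", output)
```

Each word's weights depend only on the vectors, not on the result for the previous word, which is why all the words can be processed at the same time.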

To answer a friend’s question, GPUs are not necessary in LLMs, but they really speed things up. (15)

Is LLM therefore just a better chatbot?

You must be thinking that LSTM is something that has been used in Chatbots before, and LLMs, as I have explained here, basically just answer your queries…

Actually no. One huge difference between chatbots and LLMs is how they learn. LLMs are first trained to predict the next token on huge amounts of text, and are then often refined with reinforcement learning (I sneakily introduced this in Part I of this series; there is even RLHF, Reinforcement Learning from Human Feedback...). Also, the volume and diversity of the data they have been trained on are vastly different. LLMs can ‘talk’ about many more topics/intents than a traditional chatbot, which is usually more focused.

However, the comparison with a chatbot is an interesting one. The interest in LLMs really took off with GPT-3.5. As the name suggests, it is not the first offering in OpenAI’s GPT family. So what made GPT-3.5 garner so much interest (GPT-1 was released in 2018, GPT-2 in 2019, GPT-3 in 2020, and GPT-3.5 in 2022 (16))? One reason was that it suddenly improved; the second was that a friendly chat interface was included, allowing virtually anybody with an internet connection to play with it and become an instant advocate.

A few more points

GenAI, here LLMs, basically smartly and quickly process word/token embeddings to understand you and produce a response. The key to understanding them, as I mentioned earlier, is to know that they are not designed to give you the truth; instead they answer: “what would a likely answer be?” Actually, not only that, GenAI gives you the likely answer of an average person (thank you, Doc, for pointing this out clearly). Think about it: if it is trained on the whole internet and ranks the most likely answer, then the most likely answer may not be that of people who really know what they are talking about. Hence my thought that LLMs can help so-so coders, but expert coders may not be helped that much; they probably know better.

Questions to ponder:

Do you believe that logic is something that is common in humankind? Is common sense really that common?
How about Maths, do you believe that people are generally good or bad at Maths?
Why am I asking this? Simple, now tell me, do you think, LLMs are good at logic? At Maths?

Is most likely always the best?

Now, there’s one more thing: you can influence what GenAI responds to you. I mentioned that LLMs basically rank all possible words and pick one; maybe your first instinct is to always pick the highest-probability word.

That would give you consistent answers over time. However, always using the highest-probability response often leads to circular and less than satisfactory answers. Hence, most people choose to allow some randomness (ChatGPT calls this temperature (17)).
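Here is a small sketch of how temperature can change the pick. The candidate words and their scores are made up; the common approach, assumed here, is to divide the scores by the temperature before turning them into probabilities, with a temperature of 0 treated as "always take the top word".

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn scores into probabilities; low temperature sharpens, high temperature flattens."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_word(words, scores, temperature, rng):
    """Pick the next word: greedy when temperature is 0, random draw otherwise."""
    if temperature == 0:
        return words[scores.index(max(scores))]
    probabilities = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probabilities, k=1)[0]


# Made-up candidates for "the cat sat on the ..." with hypothetical model scores.
WORDS = ["mat", "sofa", "roof", "moon"]
SCORES = [3.0, 2.0, 1.0, 0.1]

rng = random.Random(0)  # fixed seed so that a rerun gives the same draws
cold = softmax_with_temperature(SCORES, 0.5)
hot = softmax_with_temperature(SCORES, 2.0)

print("temperature 0  :", pick_word(WORDS, SCORES, 0, rng))  # always "mat"
print("temperature 0.5:", [round(p, 3) for p in cold])
print("temperature 2.0:", [round(p, 3) for p in hot])
print("a few draws at 2.0:", [pick_word(WORDS, SCORES, 2.0, rng) for _ in range(5)])
```

At low temperature almost all the probability sits on the top word; at high temperature the less likely words get a real chance of being picked.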

Conclusion:

GenAI is a great tool (what you can do with GenAI, whether you are an SME, an individual looking to make your own life easier, or a large organisation, may be a topic for a future blog). What it does is come up with a possible answer based on the data it has been trained on. (Actually, another blog post could be about why GenAI is not the answer to everything, but that’s probably obvious.)

(1) https://www.channelnewsasia.com/business/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-3581611
(2) https://www.businesstoday.in/technology/news/story/openai-chatgpt-falsely-accuses-us-law-professor-of-sexual-harassment-376630-2023-04-08
(3) https://www.bbc.com/news/business-64576225
(4) https://www.nytimes.com/games/wordle/index.html
(5) http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
(6) http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/french-letter-frequencies/
(7) http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
(8) You can also adjust how you penalise mistakes, known as the loss function; so that’d be a 4th way.
(9) https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network
(10) http://www.bioinf.jku.at/publications/older/2604.pdf
(11) LSTM evolved from Recurrent Neural Networks (RNN), where the idea was that you can look back at information you processed earlier (hence recurrent); however, if the information was far back, there were problems referring to it.
(12) https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
(13) https://tfwiki.net/wiki/Rise_of_the_Decepticons
(14) https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
(15) Memorable demo of CPU vs GPU: https://www.youtube.com/watch?v=-P28LKWTzrI
(16) https://en.wikipedia.org/wiki/GPT-4
(17) https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683


Sunday, 29 October 2023
