Sunday, 22 October 2023

Gen AI thing Part II: Talking in human language, not in machine language

In my previous blog, I explained that AI is about making machines think like humans, and I gave an example of a human task of recognising objects and how you can get a machine to do that. In this blog, I will expand a bit more on how we can all become Dr Dolittle (1) but with machines rather than animals.

A few years ago, someone from LinkedIn asked me what coding language I would recommend a child learn, since he was making the decision for his newborn. I said that, in my view, rather than humans learning how to speak machine language (coding), sooner or later machines would learn how to understand human language (something like NLP, Natural Language Processing), and it would be more important for a child to learn how to think systematically but also creatively rather than learn how to code. I haven’t heard from that person since. Hey, ‘sooner’ has happened (2)(3)(4). But I am jumping the gun.


For the 2nd time, what is Gen AI!?

Gen AI is basically using AI to create something that wasn’t there before. What is created can be text, an image, a sound… But the trick is that first the machine has to learn (that is, be trained on a bunch of data/examples); only then can it produce something.

But what excites most people is that anyone can use Gen AI because the machine speaks human language (no code and you can access the mythical AI!). I will tackle this part first.


The machine understands me!

Another branch of AI/ML is NLP, Natural Language Processing. NLP is precisely concerned with making machines understand what humans are saying. As you can imagine, it’s already quite difficult for humans to understand each other; now imagine machines…

Language is a very complex thing, and a living thing: new words are added all the time, meanings are added to words over time, words may mean different things in different contexts, humans use irony, sarcasm… But tackling it is worth it, because a huge amount of knowledge is kept in language, whether in oral or written form. With the advent of the internet, the digitisation (making it digital – bits and bytes – rather than analogue – printed image) of dictionaries and research papers, and the democratisation of access to the internet (any idiot can write a blog – but smart people know which to read), there is a treasure trove of information on the internet that can be used to train a machine. But language is not that easy to deal with.


Words are all I have

In my previous post I talked about classification, and one of the keys is to measure the distance between things and decide which are similar. How does that apply to words?

But computers are all about numbers, not words…

The first challenge machines have in comparing words is that they do better with numbers, so the first trick is to somehow make the problem one that involves numbers. Once you know how to measure, deciding which is closer is not so hard.

Look at the words “BETTER” and “BUTTER”. How close are they?

There is only one letter of difference, so these two words are quite close: it’s just replacing a letter. There are measures of distance that make exactly this kind of calculation, especially taking into account the number of letters in the word, and these algorithms are quite useful. The idea is that two words are similar if it takes little effort to change one into the other.

Now, let me add the word “BEST” to the comparison. As an English-speaking person, you would say “BEST” is close to “BETTER” but not so close to “BUTTER”, but going purely by replacing letters misses the meaning, as the sketch below shows. So there must be a better way.
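To make the letter-counting idea concrete, here is a minimal Python sketch of one well-known measure of this kind, the Levenshtein (or edit) distance. It is only an illustration of the idea above, not something the original post prescribes.

```python
# Minimal sketch of Levenshtein (edit) distance: how many single-letter
# insertions, deletions or substitutions it takes to turn one word into another.
def edit_distance(a: str, b: str) -> int:
    previous = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete a letter from a
                               current[j - 1] + 1,       # insert a letter into a
                               previous[j - 1] + cost))  # substitute a letter
        previous = current
    return previous[-1]

print(edit_distance("BETTER", "BUTTER"))  # 1 - a single substitution
print(edit_distance("BETTER", "BEST"))    # 3 - far apart, letter-wise
```

Letter-wise, “BUTTER” is the nearest neighbour of “BETTER” while “BEST” sits much further away, which is clearly not what we mean.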

 

Vector Embedding

Just as humans have dictionaries for words they can refer to, there is a source of information that machines can refer to that tells them the relationships between words (humans can use it too). This is called a vector embedding.

 Vector Embedding: Imagine

Imagine a 3-dimensional space in front of you. A point in this space represents a word. A vector for that word is like directions to that point in space (here, the x, y and z coordinates). And each word is embedded in the space so that words that are closer together have similar meaning/context. One of the really popular techniques, called word2vec, has been made public by Google: it basically transforms a word into a vector while preserving the meaning of the word.

So, to follow our example, in the 3D space ‘BETTER’ and ‘BEST’ will be close to each other, and ‘BUTTER’ further away (closer to ‘MARGARINE’ and ‘MARMALADE’).
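To make the ‘points in space’ picture concrete, here is a tiny sketch with made-up 3-dimensional numbers (real embeddings such as word2vec have hundreds of dimensions and are learned from text, not hand-picked); cosine similarity is one common way of measuring how closely two vectors point in the same direction.

```python
import numpy as np

# Invented 3-dimensional "embeddings", purely for illustration.
words = {
    "BETTER":    np.array([0.90, 0.80, 0.10]),
    "BEST":      np.array([0.95, 0.90, 0.05]),
    "BUTTER":    np.array([0.10, 0.20, 0.90]),
    "MARGARINE": np.array([0.05, 0.15, 0.95]),
}

def cosine_similarity(u, v):
    # ~1.0 means the vectors point the same way (similar meaning), ~0 means unrelated
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine_similarity(words["BETTER"], words["BEST"]))       # ~0.99, very close
print(cosine_similarity(words["BETTER"], words["BUTTER"]))     # ~0.30, far apart
print(cosine_similarity(words["BUTTER"], words["MARGARINE"]))  # ~0.99, very close
```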

 Points in space, and more

Not only are words that are similar grouped together so the machine can get the topics in a piece of text, but the relationships between the points in space also have meaning: moving from “BETTER” to “BEST” is the same journey as moving from “WORSE” to “WORST”.

This is something worth thinking about: not only do vector embeddings bring words that are about the same thing close to each other, but from the direction as well as the distance (6), the relationship between the words can be inferred.
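Continuing with made-up numbers purely to illustrate the direction point: the offset (the ‘journey’) from the comparative to the superlative comes out the same for both pairs in this toy space.

```python
import numpy as np

# Invented vectors again - the point is only that the *offset* is shared.
better = np.array([0.90, 0.80, 0.10])
best   = np.array([0.95, 0.90, 0.05])
worse  = np.array([0.20, 0.75, 0.15])
worst  = np.array([0.25, 0.85, 0.10])

print(best - better)  # [ 0.05  0.1  -0.05]
print(worst - worse)  # [ 0.05  0.1  -0.05] - the same "journey" in this toy space
```

In a real embedding the two offsets are only approximately equal, but close enough that the relationship can be read off mathematically.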


What is the big deal with vector embeddings?

The beauty of vector embeddings is that some large organisations like Google have made their pre-trained embeddings available for anyone to use, for example word2vec (5), so we do not have to train the models ourselves. In some cases, say you are dealing with a very specialised topic such as medicine, you should use specialised vector embeddings, but in most cases generic vector embeddings work well enough for the machine to understand what the human is saying.
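As a rough, hedged illustration (the model name, download size and exact neighbours returned all depend on the library and model version): the gensim Python library can fetch such publicly available pre-trained word2vec vectors, so querying them takes only a few lines.

```python
# Sketch: load Google's pre-trained word2vec vectors via the gensim downloader.
# The first call downloads a large file (on the order of 1.5 GB), so be patient.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # pre-trained vectors, no training needed

# Words closest in meaning to "better" in the embedding space
print(model.most_similar("better", topn=5))

# Similarity scores mirror the BETTER / BEST / BUTTER intuition from earlier
print(model.similarity("better", "best"))    # relatively high
print(model.similarity("better", "butter"))  # much lower
```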

Therefore, the machine is able to know what we are saying whether we use the same words or not, because with embeddings it can now see which words are close to each other in meaning and how they relate to others. That’s great!

What this means is that it is possible to train the machine on millions of pieces of text on a bunch of topics, and it will be able to understand that some of them are talking about the same thing even if the words used are different.


Ok, but this is not new, right?

Correct! Vector embeddings aren’t a 2020s thing (7). In the 1950s, John Rupert Firth made a statement that underlies a lot of the thinking today:

               “You shall know a word by the company it keeps” J.R. Firth 1957 (8)

However, back then we did not have the computing resources we have today. So, AI went into winter – people could think about it, but it was very hard to put into practice. For example, imagine the number of words in a language (9) – the English Wiktionary (10) contains around 700k base words and 1.4m definitions – and if you want to put all of this in space along with the meanings, then you will need many groups spread across many dimensions, and even worse, there will be dimensions with few words, making computation really tough (the curse of dimensionality (11)). Our brains can handle 4 dimensions easily (our 3D world + time) (next time someone is late for a meeting, introduce them to the 4th dimension 😊). Some research points to humans being able to handle more (12), but still nowhere near as many as are required to plot even just the common words in English.

Note that not everything stopped; people spent time exploring many other directions.

In the 2000s, research hotted up and some great leaps were made. For example, research by Yoshua Bengio and colleagues in Montreal proposed a path forward: “We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.” (13)

Ooops! Getting too geeky here, so just to summarise the point about vector embeddings: the thing with machines is that they don’t understand language just like that. So, one of the ideas was to convert words into numbers (vectors). Words that are about the same thing are then grouped together, so if you use slightly different words from me but we are saying the same thing, the machine can tell. The neat thing about the numbers is that doing maths on them allows the machine to understand the relationships between words, for example that the relationship between “king” and “man” is the same as that between “queen” and “woman” (14).
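Those same publicly available vectors can reproduce that relationship with simple arithmetic; a hedged sketch using gensim (same assumptions as the earlier sketch):

```python
# Sketch: the classic analogy with pre-trained word2vec vectors.
# king - man + woman should land near "queen" in the embedding space.
import gensim.downloader as api

model = api.load("word2vec-google-news-300")

# positive words are added, negative words subtracted; the nearest word to the
# resulting point is (usually) "queen"
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```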

The machine is now ready to understand you!

Add to this that there exist specialised vector embeddings for specific fields, and the machine can understand you not only in general, but even when you are asking in-depth questions on specialised topics.

So, what this does is help machines store all the information they have access to in a way that is very easy for them to search and make use of, so they can figure out to a large degree what you are talking about. It is not perfect; that is why there is such a role as prompt engineer (someone who speaks the ‘human language’ the machines understand). Personally, I think that with advances in NLP and machines being trained through interactions with humans, sooner or later there will be less need for prompt engineering; we (as in humans and AI) will all speak a ‘common language’, a bit like how some people speak differently to their children (or pets) or ‘foreigners’ compared to their own friends and family.

 

But still this is not Gen AI, where is the Generative part?

True, we are getting there…

In my previous blog and this one, I explained how machines can be made to think like humans, and how advances in technology have made it easier to make training data available to machines so that, to a large extent, they can understand what humans are saying.

The next step is how machines can now create stuff. I will be focusing on how machines can write things that have not been written before. That will be the topic of the 3rd and last part of this loooong blog post.

 

 

  1.  https://www.youtube.com/watch?v=YpBPavEDQCk
  2. https://ai.meta.com/blog/code-llama-large-language-model-coding/
  3. https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
  4. https://cloud.google.com/use-cases/ai-code-generation
  5. https://en.wikipedia.org/wiki/Word2vec
  6. That’s the basic thing about vectors: they are about ‘magnitude and direction’ (https://en.wikipedia.org/wiki/Vector), and the relationships between them can be ‘easily’ calculated mathematically
  7. https://en.wikipedia.org/wiki/Word_embedding
  8. https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
  9. https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
  10. https://en.wiktionary.org/wiki/Wiktionary:Main_Page
  11. https://en.wikipedia.org/wiki/Curse_of_dimensionality
  12. https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
  13. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
  14. https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/



