Sunday, 29 October 2023

Gen AI Thing, Part III : finally getting to the crux of the issue

In my previous 2 blog posts, I explained, as I would explain to my mum, what AI is, and how machines can understand our words. In this blog, I explain heuristically (in a non-technical, common-sense way) how machines can not only understand what we are saying, but also respond intelligently and create answers that were not there before (Gen AI).

One more little detour

For those of you who have played with ChatGPT or something similar (Bard?), one of the things that puzzles people is the concept of ‘tokens’. Some of you may ask, since I claim that machines can understand human words well enough, what is this token thing? Are we gambling when using these ‘tools’?

Gambling? Yes, maybe… Some examples of funky ChatGPT (and Bard) results:

  • ChatGPT making up cases to support its argument (1)
  • ChatGPT making up cases of sexual harassment and naming the supposed perpetrator (2)
  • Bard making a mistake about the James Webb Space Telescope (3)

There are ways to mitigate these issues, but they are beyond the scope of this blogpost. Suffice it to say that such models do give information that may be less than reliable. But then again, they were not designed to ‘tell the truth, the whole truth, and nothing but the truth’.


ChatGPT/Bard are not designed to tell the truth?! I thought they were AI!

The answer to the question lies in the question itself. These are AI systems, and as I mentioned in my 1st blog of the series, such models learn from (are trained on) data they are fed. Secondly, it may help to understand a bit about how these systems, called LLMs (Large Language Models), work.

 

How does ChatGPT/Bard… work?

Let me start with a word game. How many of you play wordle (4)? Basically, every day a 5-letter word is chosen, and you have to guess it without any clue, in 6 tries. All you will ever know is whether a letter you have suggested exists in the answer but is in the wrong slot (yellow), is in the correct spot (green), or does not exist at all (black). The other condition is that any combination of letters you try has to be an existing word.

The thing is, most people, once they know the position of one letter, will try to guess the letters next to it based on what they know about the English language, for example (5):

  • ‘E’ is the most common letter in English and your best bet if you know nothing about the word; it is followed by ‘T’ and ‘A’.
  • If there is a ‘Q’, chances are there is also a ‘U’, and chances are the ‘U’ follows the ‘Q’
  • If there is a ‘T’ anywhere except the 5th position, then the next letter is most likely an ‘H’, followed by ‘O’, then ‘I’
  • If there is an ‘H’ anywhere except the 5th position, then the next letter is most likely an ‘E’, followed by ‘A’, then ‘I’

Combinations of 2 letters such as ‘QU’, ‘TH’, ‘TO’, ‘TI’ are called bigrams. The idea is that once you know a letter, you use this information to find the most likely following letter – this is known as conditional probability: on the condition that one letter is a ‘T’, the most likely following letter is an ‘H’, not an ‘E’ (the most common letter in English overall). The key is that your choice of letter changes based on the information you have.

These are shortcuts, findings based on analysis of words, that can help you guess the letters in wordle.
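If you are curious how such shortcuts are actually computed, here is a tiny sketch in Python. The word list is made up for illustration; a real analysis would run over a full dictionary or a large body of text:

```python
from collections import Counter

# A tiny, made-up word list standing in for a real dictionary.
words = ["THEIR", "OTHER", "QUITE", "QUERY", "TOOTH", "THINK", "QUIET"]

# Count how often each pair of adjacent letters (a bigram) appears.
bigrams = Counter()
for word in words:
    for a, b in zip(word, word[1:]):
        bigrams[(a, b)] += 1

# Conditional probability: given a 'T', which letter follows most often?
after_t = {pair[1]: n for pair, n in bigrams.items() if pair[0] == "T"}
total = sum(after_t.values())
for letter, n in sorted(after_t.items(), key=lambda x: -x[1]):
    print(f"P({letter} | T) = {n / total:.2f}")
```

Run over a real dictionary, this is exactly the kind of analysis behind the wordle tips above.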

As an aside, the most common bigrams in different languages can be very different (6)(7):

Rank   English   French   Spanish
1      TH        ES       DE
2      HE        LE       ES
3      IN        DE       EN
4      ER        EN       EL
5      AN        ON       LA

Letters are fine, but Gen AI generates whole documents, not random letters

It’s just an extension of the idea. In the example above, I used bigrams (2 letters); when playing wordle, some people may use trigrams (3 letters). It’s basically the same thing, just a little more complex.

The next step is that instead of guessing the next letter (using a bigram of letters), you guess the next word. But why stop there? You can actually go beyond a bigram and condition on multiple previous words (an n-gram). It’s, in principle, that straightforward. However, to improve performance, there are a few more tricks.

The problem is the size of the data: given the number of words, the possible combinations increase exponentially as you add more words. The brilliant thing, or one of the more brilliant things, about LLMs is that they generate a probability of a combination of words occurring. They do that by using an underlying model that recognises the patterns.
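To make that concrete, here is a toy word-level version: count which word follows which in some text, then generate by repeatedly sampling a likely next word. The corpus is a made-up single sentence; a real LLM uses a vastly bigger corpus and a far smarter model, but the ‘predict the next word from what came before’ core is the same:

```python
import random
from collections import Counter, defaultdict

# Toy corpus -- in a real model this would be billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a word-level bigram model).
following = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    following[w1][w2] += 1

# Generate text: start somewhere, then repeatedly pick a likely next word.
word = "the"
output = [word]
for _ in range(5):
    candidates = following[word]
    if not candidates:
        break  # no known continuation, stop generating
    # Sample proportionally to how often each continuation was seen.
    word = random.choices(list(candidates), weights=candidates.values())[0]
    output.append(word)
print(" ".join(output))  # e.g. "the cat sat on the mat"
```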

AI, NN, and the human brain

As mentioned in Part 1 of this blog, AI is about making a machine think like a human. The way this has been done in Neural Networks is to make a representation (model) of the human brain, with nodes and connections. And as is thought to happen in the human brain, each node does a fairly simple job (one of the simplest is a binary yes/no decision against a threshold – a node like this is called a perceptron), and the connections between them are given weights based on how important they are.
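A perceptron is simple enough to fit in a few lines. In this sketch the features and weights are made up for illustration; in a real network the weights are learned during training:

```python
# One "node": weigh the inputs, compare to a threshold, answer yes/no.
def perceptron(inputs, weights, threshold):
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two made-up features: [height in metres, has a back (0/1)].
print(perceptron([0.4, 1.0], weights=[0.5, 2.0], threshold=1.0))  # 1 -> "chair"
print(perceptron([0.9, 0.0], weights=[0.5, 2.0], threshold=1.0))  # 0 -> "table"
```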

 


Note that as Neural Nets have progressed, they have taken on a life of their own and the idea of mimicking the structure of the human brain is no longer central; the architecture of a neural net, while still using nodes and connections, can be quite different.

Going back to the chair and table example

When you show the machine a picture, it breaks it down into small parts of the picture (features), maybe the length of the legs or the shape of the back, and assigns weights based on how important these features are. After being trained on many examples, the model is ready to distinguish between a table and a chair.

The illustration above shows a very simple type of Neural Network: one input layer where you start, one hidden layer of nodes with connections in one direction to do the magic, then the output layer. For classifying tables and chairs from images, for example, it has been found that neurons arranged in a grid formation work well, specifically a Convolutional Neural Network. Basically, a set of filters is applied to detect specific patterns in the picture (convolutions), then these are summarised and combined (more layers) to extract the more salient features without burning enormous resources, and finally pushed to the output layer. In the case of our chair/table classification there would be 2 nodes in the output layer, the output being the probability that the image fed in is a chair or a table. (9)
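For the geekier readers, here is what such a network can look like in code. This is a minimal sketch assuming PyTorch is available; the layer sizes are illustrative, and the model is untrained, so its ‘probabilities’ are meaningless until you train it:

```python
import torch
import torch.nn as nn

# A tiny convolutional classifier: filters -> summarise -> classify.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # filters detecting local patterns (convolutions)
    nn.ReLU(),
    nn.MaxPool2d(2),                 # summarise/shrink the feature maps
    nn.Flatten(),
    nn.LazyLinear(2),                # two output nodes: "chair" and "table"
    nn.Softmax(dim=1),               # turn scores into probabilities
)

fake_image = torch.rand(1, 3, 32, 32)  # one 32x32 RGB image, random pixels
print(model(fake_image))               # e.g. tensor([[0.48, 0.52]])
```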

There are many ways to structure a neural net, and many parameters to play with. You wouldn’t be surprised that one of the important innovations was the realisation that, for processing text, it is important to know what else is in the sentence, and not process each word independently. So, there was a need to be able to refer to words seen in the past. Long Short-Term Memory (LSTM) (10) allowed this to happen by letting the user control how long some nodes retain information, which can then be used to provide context.

However, LSTM is not that fast as it processes information sequentially, like many of us do when we read word by word (11). In 2017, a team from Google came up with the brilliantly titled paper “Attention is all you need” (12). This gave rise to the rise of Decepticons (13), sorry, to Transformers (14). Basically, when processing a chunk of text, the machine calculates, through an attention mechanism, which words need to be given a higher weight. While Transformers can be run sequentially, they can also be run in parallel (no recursion), hence the usefulness of GPUs in LLMs.

To answer a friend’s question, GPUs are not necessary in LLMs, but they really speed things up. (15)
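For the curious, the heart of the ‘attention’ paper, scaled dot-product attention, fits in a few lines. In this sketch, random numbers stand in for the learned representations of 4 words; in a real Transformer, Q, K and V come from trained projections:

```python
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # how much each word attends to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax -> attention weights
    return weights @ V                              # weighted mix of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8): one context-aware vector per word
```

Notice there is no loop over positions: all 4 words are processed in one matrix multiplication, which is exactly the kind of work a GPU is good at.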

Is LLM therefore just a better chatbot?

You must be thinking that LSTMs have been used in chatbots before, and that LLMs, as I have explained them here, basically just answer your queries…

Actually, no. One huge difference between chatbots and LLMs is how they learn. LLMs use reinforcement learning (I sneakily introduced this in Part I of this series; there is even RLHF, Reinforcement Learning from Human Feedback). Also, the volume and diversity of data they are trained on is vastly different. LLMs can ‘talk’ about many more topics/intents than a traditional chatbot, which is usually more focused.

However, the comparison with a chatbot is an interesting one. The interest in LLMs really took off with GPT-3.5. As the name suggests, it is not the 1st offering in OpenAI’s GPT family. So what made GPT-3.5 garner so much interest (GPT-1 was released in 2018, GPT-2 in 2019, GPT-3 in 2020, and GPT-3.5 in 2022 (16))? One reason was that it suddenly improved; the second was that a friendly chat interface was included, allowing virtually anybody with an internet connection to play with it and become an instant advocate.

A few more points

GenAI, here LLMs, basically smartly and quickly process word/token embeddings to understand you, and produce a response. The key to understanding them, as I mentioned earlier, is to know they are not designed to give you the truth; they answer the question “what would a likely answer be?”. Actually, not only that: GenAI gives you the likely answer of an average person (thank you Doc for pointing this out clearly). Think about it: if it is trained on the whole internet, and ranks the most likely answer, then the most likely answer may not be that of the people who really know what they are talking about. Hence my thought that LLMs can help so-so coders, but expert coders may not be helped that much; they probably know better.

Questions to ponder:

  • Do you believe that logic is something that is common in humankind? Is common sense really that common?
  • How about Maths: do you believe that people are generally good or bad at Maths?
  • Why am I asking this? Simple. Now tell me: do you think LLMs are good at logic? At Maths?

Is most likely always the best?

Now, there’s one more thing: you can influence what GenAI responds to you. I mentioned that these models basically rank all possible words and pick one; maybe your first instinct is to always pick the highest-probability word.

That would give you consistent answers over time. However, always using the highest-probability response often leads to repetitive and less than satisfactory answers. Hence, most people choose to allow some randomness (ChatGPT calls this temperature (17)).
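A sketch of how temperature works: the model’s raw scores are divided by the temperature before being turned into probabilities (softmax). The scores below are made up; low temperature sharpens the distribution towards the top word, high temperature flattens it:

```python
import numpy as np

def next_word_probabilities(scores, temperature):
    scores = np.array(scores) / temperature
    probs = np.exp(scores - scores.max())  # softmax, numerically stable
    return probs / probs.sum()

scores = [2.0, 1.0, 0.5]  # made-up raw scores for three candidate words
print(next_word_probabilities(scores, temperature=0.1))  # near-greedy: top word dominates
print(next_word_probabilities(scores, temperature=1.5))  # flatter: more randomness
```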

Conclusion:

GenAI is a great tool (what you can do with GenAI, whether you are an SME, an individual looking to make your own life easier, or a large organisation, may be a topic for a next blog). What it does is come up with a possible answer based on the data it has been trained on. (Actually, another blog post could be why GenAI is not the answer to everything, but that’s probably obvious.)

 

  1. https://www.channelnewsasia.com/business/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-3581611
  2. https://www.businesstoday.in/technology/news/story/openai-chatgpt-falsely-accuses-us-law-professor-of-sexual-harassment-376630-2023-04-08
  3. https://www.bbc.com/news/business-64576225
  4. https://www.nytimes.com/games/wordle/index.html
  5. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
  6. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/french-letter-frequencies/
  7. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
  8. you can also adjust how you penalise mistakes, known as the loss function; so that’d be a 4th way.
  9. https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network
  10. http://www.bioinf.jku.at/publications/older/2604.pdf
  11. LSTM evolved from Recurrent Neural Networks (RNN) where the idea was that you can look back at information you processed earlier (hence recurrent), however if the information was far back, there were problems referring to it.
  12. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  13. https://tfwiki.net/wiki/Rise_of_the_Decepticons
  14. https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
  15. Memorable demo of CPU vs GPU https://www.youtube.com/watch?v=-P28LKWTzrI
  16. https://en.wikipedia.org/wiki/GPT-4
  17. https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683


Sunday, 22 October 2023

Gen AI thing Part II: Talking in human language, not in machine language

In my previous blog, I explained that AI is about making machines think like humans, and I gave an example of a human task of recognising objects and how you can get a machine to do that. In this blog, I will expand a bit more on how we can all become Dr Dolittle (1) but with machines rather than animals.

A few years ago, someone from LinkedIn asked me what coding language I would recommend a child to learn, since he was making the decision for his newborn. I said that, in my view, rather than humans learning how to speak machine language (coding), sooner or later machines would learn how to understand human language (something like NLP, Natural Language Processing), and it would be more important for a child to learn how to think systematically but also creatively than to learn how to code. I haven’t heard from that person since. Hey, ‘sooner’ has happened (2)(3)(4). But I am jumping the gun.


For the 2nd time, what is Gen AI!?

Gen AI is basically using AI to create something that wasn’t there before. What is created can be text, an image, a sound… But the trick is that first the machine has to learn (that is, be trained on a bunch of data/examples); then it can produce something.

But what excites most people is that anyone can use Gen AI because the machine speaks human language (no code and you can access the mythical AI!). I will tackle this part first.


The machine understands me!

Another branch of AI/ML is NLP, Natural Language Processing. NLP is precisely concerned with making machines understand what humans are saying. You can imagine: it’s already quite difficult for humans to understand each other; now imagine machines…

Language is a very complex thing, and a living thing: new words are added all the time, meanings are added to words over time, words may mean different things in different contexts, humans use irony, sarcasm… But it is worth the trouble because a huge amount of knowledge is kept in language, whether in oral or written form. With the advent of the internet, the digitisation (making it digital, bits and bytes, rather than analogue, like a printed image) of dictionaries and research papers, and the democratisation of access to the internet (any idiot can write a blog, but smart people know which to read), there is a treasure trove of information on the internet that can be used to train a machine. But language is not that easy to deal with.


Words are all I have

In my previous post I talked about classification, and one of the keys is to measure the distance between things and decide which are similar. How does that apply to words?

But computers are all about numbers, not words…

The first challenge machines have in comparing words is that they do better with numbers, so the first trick is to somehow turn the problem into one that involves numbers. Once you know how to measure, deciding which is closer is not so hard.

Look at the words “BETTER” and “BUTTER”. How close are they?

There is only 1 letter of difference, so these 2 words are quite close: it’s just replacing a letter. There are concepts of distance that make exactly such calculations, especially when taking into account the number of letters in the word. These algorithms are quite useful. The idea is that words are similar if it takes little effort to change one into the other.
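The best known of these is the Levenshtein (edit) distance: the minimum number of single-letter insertions, deletions or substitutions needed to turn one word into the other. A minimal sketch:

```python
def edit_distance(a, b):
    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete a letter
                            curr[j - 1] + 1,            # insert a letter
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

print(edit_distance("BETTER", "BUTTER"))  # 1
print(edit_distance("BETTER", "BEST"))    # 3
```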

Now, let me add the word “BEST” to the comparison. As an English-speaking person, you would say “BEST” is close to “BETTER” but not so close to “BUTTER”; going purely by replacing letters misses the meaning. Therefore there must be a better way.

 

Vector Embedding

Just as humans can refer to a dictionary for words, there is a source of information that machines can refer to that tells them the relationships between words (humans can use it too). It is called a vector embedding.

 Vector Embedding: Imagine

Imagine a 3-dimensional space in front of you. A point in this space represents a word. A vector for that word is like directions to that point in space (here, maybe x, y and z coordinates). Each word is embedded in the space so that words that are closer together have similar meanings/contexts. One of the really popular techniques, made public by Google, is called word2vec: it basically transforms a word into a vector while preserving the meaning of the word.

So, to follow our example, in the 3D space ‘BETTER’ and ‘BEST’ will be close to each other, and ‘BUTTER’ further away (closer to ‘MARGARINE’ and ‘MARMALADE’).
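To make this concrete, here is a toy sketch with made-up 3-dimensional vectors (real embeddings such as word2vec’s have hundreds of learned dimensions), using the standard measure of closeness, cosine similarity:

```python
import numpy as np

# Made-up 3D "embeddings", purely for illustration.
vectors = {
    "BETTER":    np.array([0.9, 0.1, 0.0]),
    "BEST":      np.array([0.8, 0.2, 0.1]),
    "BUTTER":    np.array([0.1, 0.9, 0.3]),
    "MARGARINE": np.array([0.0, 0.8, 0.4]),
}

def similarity(a, b):
    # Cosine similarity: 1 = same direction (similar meaning), 0 = unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(vectors["BETTER"], vectors["BEST"]))    # high (~0.99)
print(similarity(vectors["BETTER"], vectors["BUTTER"]))  # low  (~0.20)
```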

 Points in space, and more

Not only are similar words grouped together so the machine can get the topics in a piece of text, but the relationships between the points in space also have meaning: moving from “BETTER” to “BEST” is the same journey as moving from “WORSE” to “WORST”.

This is something worth thinking about: not only do vector embeddings bring words that are about the same thing close to each other, but from the distance and the direction (6) between points, the relationships between the words can be inferred.


What is the big deal with vector embeddings?

The beauty of vector embeddings is that some large organisations like Google have made their vector spaces available for anyone to use, so we do not have to train the models ourselves, for example word2vec (5). In some cases, say you are dealing with a very specialised topic such as medicine, you should use specialised vector embeddings, but in most cases, for the machine to understand what the human is saying, generic vector embeddings work well enough.

Therefore, the machine is able to know what we are saying whether we use the same words or not, because with embeddings it now sees which words are close to each other in meaning and how they relate to others. That’s great!

What this means is that it is possible to train the machine on millions of pieces of text on a bunch of topics, and it will be able to understand that some of them are talking about the same thing even if the words used are different.


Ok, but this is not new right?

Correct! Vector embeddings aren’t a 2020s thing (7). Back in the 1950s, John Rupert Firth made a statement that underlies a lot of the thinking today:

               “You shall know a word by the company it keeps” J.R. Firth 1957 (8)

However, back then we did not have the computing resources we have now. So AI went into winter: people could think about it, but it was very hard to put into practice. For example, consider the number of words in a language (9) – the English Wiktionary (10) contains around 700k base words and 1.4m definitions – and if you want to put these in a space that captures their meanings, you will need many groups spread across many dimensions; even worse, there will be dimensions with few words, making computation really tough (the curse of dimensionality (11)). Our brains handle 4 dimensions easily (our 3D world + time; next time someone is late for a meeting, introduce them to the 4th dimension 😊). Some research points to humans being able to handle more (12), but still nowhere near as many as required to plot even only the common words in English.

Note that not everything stopped, people spent time in many other directions.

In the 2000s, research heated up and some great leaps were made; for example, research by Yoshua Bengio and colleagues in Montreal proposed the path forward: “We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.” (13)

Ooops! Getting too geeky here; just to summarise the point about vector embeddings. The thing with machines is that they don’t understand language just like that. So, one of the ideas was to convert words into numbers (vectors). Then the words that are about the same thing are grouped together, so if you use slightly different words from me but we are saying the same thing, the machine can tell. The neat thing about the numbers is that doing maths on them allows the machine to understand the relationships between words; for example, the relationship between “king” and “man” is the same as between “queen” and “woman” (14).
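You can try this yourself. A minimal sketch assuming the gensim library is installed (the pretrained Google News vectors it downloads are about 1.6GB):

```python
import gensim.downloader

# Load Google's pretrained word2vec vectors (300 dimensions per word).
model = gensim.downloader.load("word2vec-google-news-300")

# king - man + woman ~= queen: the "journey" between vectors carries meaning.
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# [('queen', 0.71...)]
```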

The machine is now ready to understand you!

Add to this that there exist specialised vector embeddings for specific fields, and the machine can understand you whether you are talking generally or asking in-depth questions on specialised topics.

So, what this enables is for machines to store all the info they have access to in a way that is very easy for them to search and make use of, so they can figure out to a large degree what you are talking about. It is not perfect; that is why the role of prompt engineer (someone who speaks the ‘human language’ the machines understand) exists. Personally, I think that with advances in NLP and machines being trained through interactions with humans, sooner or later there will be less need for prompt engineering; we (as in humans and AI) will all speak a ‘common language’, a bit like how some people speak differently to their children (or pets) or ‘foreigners’ compared to their own friends and family.

 

But still this is not Gen AI, where is the Generative part?

True, we are getting there…

In my previous blog and this one, I explained how machines can be made to think like humans, and how advances in technology have made it easier to make training data available to machines so they can understand, to a large extent, what humans are saying.

The next step is how machines can now create stuff; I will be focusing on how machines can write things that have not been written before. That will be the topic of the 3rd and last part of this loooong blogpost.

 

 

  1.  https://www.youtube.com/watch?v=YpBPavEDQCk
  2. https://ai.meta.com/blog/code-llama-large-language-model-coding/
  3. https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
  4. https://cloud.google.com/use-cases/ai-code-generation
  5. https://en.wikipedia.org/wiki/Word2vec
  6. That’s the basic thing about vectors, they are about ‘magnitude and direction’ https://en.wikipedia.org/wiki/Vector and the relationship between them can be ‘easily’ mathematically calculated
  7. https://en.wikipedia.org/wiki/Word_embedding
  8. https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
  9. https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
  10. https://en.wiktionary.org/wiki/Wiktionary:Main_Page
  11. https://en.wikipedia.org/wiki/Curse_of_dimensionality
  12. https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
  13. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
  14. https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/




Sunday, 15 October 2023

Gen AI thing Part I: Plato’s spark

A friend recently asked me to help him understand the “gen ai thing” at a level that would allow him to have discussions (and since he knows me well, he knows this comes with opinion). I decided to go a level simpler and try to explain Gen AI in a way my mum would understand (she’s in her 80s, and my recent victory was getting her to carry her mobile phone when she is out of the house). I figured it would take me a while, so I broke the explanation into smaller, more digestible pieces. Here is Part 1.

What is Gen AI?

Before we go there….

First, what is AI? (with apologies to my brother, Dr. AI)

Humans are a very arrogant species, so we decided that the way we think is something worth replicating. Hence, if we could make machines think like humans, then we would have something fantastic. Basically, machines don’t get tired easily, and you can expand the capacity of a machine much faster than that of a human (hopefully (1)).

AI is basically that: how do we get machines to think like humans?

So, what does it mean to think like a human?

How do you think?

Let’s take a simple example (a simple application of thinking like a human): you see a piece of furniture in a shop. How do you decide that it is a chair or a table (assuming someone hasn’t written “this chair/table for $xxx”)?

Enter Plato!

This is not a new question. Plato (~428–348 BC, close to 2,500 years ago) came up with the theory of forms, and that made me fall in love with PH102. The basic idea is that there is a world where the perfect form of every item in our world exists. So, I thought, that makes sense! I know whether something is a chair or a table by comparing it to the ideal form: is it closer to the ideal chair, or the ideal table?

What does closer mean?

If you have read other articles by me, you will remember I love talking about distance; closer means smaller distance. An object is more likely to be an A than a B if it is closer to form/ideal A than to form/ideal B. This part is easy; how you define ‘closer’ is where the fun begins 😊
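As a tiny illustration of ‘closer to the ideal form’, here is a sketch where each ideal form is just a point with two made-up features (height in metres, and whether it has a back), and ‘closer’ is plain straight-line (Euclidean) distance:

```python
import math

# Made-up "ideal forms": (height in metres, has a back 0/1).
ideal = {"chair": (0.45, 1.0), "table": (0.75, 0.0)}

def classify(obj):
    # Pick whichever ideal form the object is closest to.
    return min(ideal, key=lambda form: math.dist(obj, ideal[form]))

print(classify((0.5, 1.0)))  # chair: nearer to the ideal chair
print(classify((0.7, 0.1)))  # table
```

How you define ‘closer’ (straight-line distance, or something fancier) is, as I said, where the fun begins.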

Plato’s Theory of Forms

So, when I started playing with data, Plato’s theory of forms helped me a lot. The main difference is that, since I can’t access the world of ideals/forms, I have to base my version of the forms on what I have seen before.

The tables I had seen were 4-legged, came up to waist height (since my teens), and had a large flat surface at the top so you can put stuff on it. Usually they were made of wood, although the legs could be metal. Chairs were shorter, below waist height, but also usually had 4 legs and were made of similar materials. However, chairs also had a back; the flat surface was not the highest point of the chair, the back was, so a person can sit on the flat seat and rest his/her back against the back.

So, when I see a new object, I decide whether it looks more like a chair or a table based on whether it is closer to the typical form I have in mind. Note that I am not comparing just the words I used to describe a table and a chair, but the more complicated concept I have in mind (like an ideal form).

While humans learn from experience, machines can be made to learn. Instead of telling the machine my short, ungainly description of a chair and a table above, the trick is simply to give it thousands of examples of things we know are chairs and tell it these are chairs, and the same for tables. You train the machine so that it comes up with its own view of what a chair is and what a table is. This is the training part of a model.

In this case, we train the model by feeding it images of chairs with the label that these are chairs, and the same for tables. This is called supervised learning, since someone supervised the process by providing these presumably accurate labels.

For now, let’s skip how the machine breaks down the images and just assume that the machine now knows what chairs look like and what tables look like. We then feed it a new image of a piece of furniture without a label, and it will tell us “this is likely a chair” (or a table) depending on what it has learnt. The machine has solved the classification problem by deciding the new unlabelled furniture is a chair or a table accordingly.
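Here is a minimal sketch of this train-then-predict loop, assuming scikit-learn is available and cheating a little: instead of raw images, each object is described by two made-up features (height in metres, has a back):

```python
from sklearn.tree import DecisionTreeClassifier

# Labelled examples: [height, has_back] plus what each one is.
examples = [[0.45, 1], [0.50, 1], [0.40, 1],   # chairs
            [0.75, 0], [0.72, 0], [0.80, 0]]   # tables
labels = ["chair"] * 3 + ["table"] * 3

model = DecisionTreeClassifier().fit(examples, labels)  # the training step
print(model.predict([[0.48, 1]]))  # ['chair'] -- a new, unlabelled object
```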

Now, nobody stops you from training the machine with other pieces of furniture, and animals, and all sorts of other things… After all, that’s how we learnt, no?

Thought experiment:

Imagine you are walking about, and from afar you see something. How do you decide whether this thing with 4 black legs and a black-and-white splotchy pattern on the top and sides is a table, a cow, or maybe a dalmatian?

How would your thinking process go?

Would it be faster if you remembered you were in a field in the middle of a farm, or close to a nature inspired furniture shop?

For me, yes; based on the context (where the object is), I can make the process simpler by focusing on a smaller list of likely choices rather than the whole list.

This is why you get faster and likely better results from a specialised machine (a farm-animal identifier in the first case, a furniture classifier in the second) than from a generic one: a machine trained only on furniture would identify the table much faster and more accurately than one that has also learnt about cows and dalmatians. However, the furniture classifier would fail if someone asked it to identify a dalmatian… Hence, machines/algorithms trained on a specific set of data are usually better at working within that theme/context, but will not do so well on things from different contexts.

It should not be surprising: if someone from the tropics had never even heard of snow, he/she would be flabbergasted the first time, maybe even think it was volcanic ash… But someone who has lived in the snow would even be able to tell you the type of snow (4). It all depends on what you need. Similarly, I know many Mandarin/Cantonese/French speakers who claim that there are nuances in their languages that are not present in English. Again, it depends on what the people who use the language use it for.

If I had not seen a chair and table before, maybe I could check a dictionary:

  • Chair: a piece of furniture for one person to sit on, with a back, a seat and four legs (2)
  • Table: a piece of furniture that consists of a flat top supported by legs (3)

Then based on these definitions try and decide…

But you will tell me: wait, the human has a lot of work to do; he/she has to label the pictures.

Well, yes, for supervised learning, just as a child asks adults: “What is this? And this? And this? How about this?”. But you will recognise the work the child puts in: the child takes in the image he/she sees, commits it to memory in one shape or form, and later, when he/she sees a new object, decides whether it is a chair, a table or something else.

It is also possible to feed the machine unlabelled pictures, and it will decide by itself how many categories of objects there are (you can tell it how many, if you want), create its own view of things and, when presented with a new picture after having been trained, decide whether that object is a chair or a table. This is called unsupervised learning.
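A sketch of the unsupervised version, with the same made-up features and again assuming scikit-learn; note that no labels are given, only the number of groups to look for:

```python
from sklearn.cluster import KMeans

# The same six objects, but this time without any labels.
objects = [[0.45, 1], [0.50, 1], [0.40, 1],
           [0.75, 0], [0.72, 0], [0.80, 0]]

kmeans = KMeans(n_clusters=2, n_init=10).fit(objects)
print(kmeans.labels_)               # e.g. [0 0 0 1 1 1]: two groups found by itself
print(kmeans.predict([[0.48, 1]]))  # which group a new object falls into
```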

There is also reinforcement learning, whereby the machine is given feedback on what it has predicted and can therefore continue learning by analysing what went right and what went wrong.

Now, whether you choose to use supervised or unsupervised learning is up to you; there are reasons for and against using either form. Not only that, but how you choose to learn or group things also makes a difference to the output you will get and the ability of the model/algorithm to properly classify things. This is something I am geeky about, but it is not for this blog post.

You will agree this is a very useful thing to have in your back pocket, and the practical applications are very, very vast. For example, a few years ago, I found it was not too hard to build something that, once you feed it a photo of a piece of meat from a supermarket, can identify the meat with reasonable accuracy, and you can slap on features such as estimating price (after estimating volume), freshness… You could easily do the same for fruit (auntie, no need to press-press anymore!).

Ok, but this is only classification of objects; doesn’t AI do many, many more things? Is this really AI, or is it ML?

AI vs ML

AI is, as mentioned above, focused on making machines think like humans. ML is how we apply specific pieces of this to solve problems. The classification piece I used is a piece of ML, and ML is part of AI. But there is more to it than that.

Classification is just a small piece of what ML can do. ‘Traditionally’, ML has been used to do 3 things: classifying things as I illustrated above (think of the photo app on your phone tagging pictures by recognising what is inside them), finding out what affects what (regression), for example understanding how the weather affects the price of tomatoes, and predicting things, such as the price of tomatoes next week.

A little diagram will illustrate what I am talking about:


So basically, while trying to explain Gen AI as it is today, I used ML, basically applied AI, and took one aspect (classification). I skipped over neural networks, which can be used to classify the images by, say, automatically varying the importance of different aspects – is the height of the horizontal piece more important than the number of legs? – and over deep learning, which is basically a more complex neural network.

But simply by looking at the name, Neural Network, you can get a hint that the original idea was to mimic the layers of the human brain; deep learning adds layers and other complexities. So fret not, I am not misleading you, I am simplifying. Remember, my aim is that even someone like my mum can understand.

In my next blog, I will explain the most common understanding of GenAI, ‘ChatGPT’, or basically LLMs (Large Language Models), because people using them are not coding (speaking machine language) but speaking their own natural language (oops, I slipped in NLP).


  1. Elon Musk’s Neuralink has been approved to have human trials https://www.cnbc.com/2023/09/20/elon-musks-neuralink-is-recruiting-patients-for-its-first-human-trial.html
  2. https://www.oxfordlearnersdictionaries.com/definition/english/chair_1
  3. https://www.oxfordlearnersdictionaries.com/definition/english/table_1
  4. https://en.wikipedia.org/wiki/Eskimo_words_for_snow