Sunday 10 December 2023

NTUC FairPrice shines a new path in AI

Recently, I was having a discussion on the potential effects of large scale adoption of LLMs (and AI in general), and one of the risks was a move towards uniformity/homogeneity, or a loss of randomness in the human experience. (1)

Basically, if algorithms are designed to give you ‘the most likely’ or ‘the best’ answer (this may not always have to be the case (2)), then everyone would get similar answers and be driven to the same things.

Add to this the fact that as more people use LLMs, more and more content on the internet will be LLM generated, and therefore the training data used for LLMs will include a higher percentage of LLM created data as opposed to human created data. 

Fear not!

A data scientist at NTUC FairPrice in Singapore has managed to build a machine (apply an algorithm) that gives very interesting answers:


The AI built by NTUC understands that, after a meal, you can use 2 ‘similar’ products, similar in the sense that you, as a human, have a choice to do 1 of 2 things:

  1. do the dishes using the sponge, or
  2. have a piece of chocolate

There still is hope!

Or despair: “Mummy/Daddy, it’s not me! It’s the machine that told me I could do either one since they are similar.”


  1. https://www.linkedin.com/feed/update/urn:li:activity:7134835884893290497/
  2. https://www.linkedin.com/feed/update/urn:li:activity:7124551149268963328/


Sunday 19 November 2023

You are overpaying for your vehicle insurance. It doesn't have to be this way.

I am sure you have been making this complaint over the years, but didn’t have much choice since prices are around the same and policies are designed to be sticky or not so advantageous to get out of (that’s for another day).


The General Insurance Association of Singapore says so too!

But now, ladies and gentlemen, we have the ultimate confirmation. This comes from the GIA, the association that groups general insurers in Singapore: “About two in 10 motor insurance claims in Singapore are fraudulent, often involving exaggerated injuries and inflated vehicle damage” (1)

 

And the president of Budget Direct

The president of Budget Direct, which usually prides itself on competitive rates, even admitted: “In the end, all motorists are victims of motor insurance fraud as we all end up paying higher premiums as a result”

This is the key, you see: the claims paid out in fraudulent cases simply get translated into increased premiums for ALL vehicle insurance customers. Irrespective of whether you commit fraud, are a scam victim, or are accident/claim-free, you are paying for the fraudsters’ bread, butter, and cake, while the insurers maintain their healthy margins and profits.


The Insurers have no incentive to act on fraud

The thing is, the GIA is saying it is up to you, the customer, to stop the fraud. And that I find laughable. Let’s see what the main causes of fraud are, as per the GIA…

  1. Beware of Phoney Helpers: After an accident, individuals may offer "help" and pressure victims to follow their directions, often leading them to unauthorized repair shops or overpriced towing services.
  2. Staged Accidents: Scammers stage accidents, causing victims to collide with their vehicles and then falsely accuse victims of causing the collision. They often fake injuries and make substantial claims for damage and injuries.
  3. Phoney Witnesses: Suspect convenient witnesses who support the other driver's account, often suggesting a staged accident.

1 Unauthorised repair shops:

Most vehicle owners are aware of the workshops that their insurer accepts, whether from their own bad experience, from friends and family, or from the insurer itself. Plus, most of the time, unauthorised repair shop costs are not paid by the insurer; if they are, it is a bit rich on the part of insurers to honour the claim while complaining about it.

Plus, it is not rocket science to detect highly inflated claims based on the pictures and descriptions that accompany them. I know because I worked in an insurance company in a much less developed country than Singapore, and I know for a fact that insurers have the data needed to deal with this; the question is one of financials and will.

 

2 Staged Accidents

And how is that the fault of the insured? The insured is getting scammed at the same time as the insurer, unless the GIA is claiming that the insured is somehow going along with the scammers… more on this later.

 

3 Phoney Witnesses

Again, how will someone who has just been in an accident be able to detect whether witnesses are phoney or not?

 

Unless Singapore is a nation of scammers (not scam victims (2)), it just doesn’t make sense to think that the individual people involved in accidents are part of the scam. So should victims pay the price twice (once by being scammed, and again via higher premiums and probably the loss of their NCB)?

 

So my arguments that follow assume that Singapore is not a nation of scammers (unlike (3)). After all, Singapore is only beaten by Finland, New Zealand, and Denmark in terms of corruption perception. (4)

 

The fact that the GIA mentions Staged Accidents and Phoney Witnesses seems to indicate syndicates are at play, or at best a group of people who are in the business of scamming accident victims. In fact, it is likely that staged accidents and phoney witnesses occur together, rather than separately.

 

You can have a staged accident without phoney witnesses, but it is very unlikely to have phoney witnesses to a real accident.

So chances are, there are syndicates/gangs/groups of scammers at work. It is ridiculous for the GIA to expect an individual consumer to be able to detect them, don’t you think?

 

So what can be done?

The answer, in most of my blogs, is Analytics!

 

Inflated Claims

I briefly mentioned the solution to the GIA’s issue 1, inflated claims. Analytical models can be built to detect inflated claims. The beauty of this is that it can even be employed to detect which workshops are cheating.
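As a toy sketch of what such a model could look like (the workshops, amounts, and threshold below are entirely invented for illustration), a first pass is simply to flag claims that sit far above the typical amount for the same workshop:

```python
from statistics import median

def flag_inflated(claims, threshold=2.0):
    """Flag claims far above the median claim amount for their workshop.

    `claims` is a list of (workshop, amount) pairs; a claim is flagged
    when it exceeds `threshold` times the workshop's median amount.
    """
    by_shop = {}
    for shop, amount in claims:
        by_shop.setdefault(shop, []).append(amount)
    medians = {shop: median(amounts) for shop, amounts in by_shop.items()}
    return [(shop, amount) for shop, amount in claims
            if amount > threshold * medians[shop]]

claims = [("A", 1200), ("A", 1300), ("A", 5000),   # one suspicious claim
          ("B", 900), ("B", 950), ("B", 1000)]
print(flag_inflated(claims))  # → [('A', 5000)]
```

A real model would of course control for vehicle model, damage type, workshop pricing, and so on, but even a crude baseline like this surfaces candidates for human review.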

But, from experience, there is little willpower in senior management to do something that will rock the boat. It is important for analytics people to learn that not everything that can be done will be done; other factors come into play: whether it is financially viable (in this case I am quite sure it pays for itself quite quickly: a couple of months of work to build, another month to fine-tune, and low running costs for a basic solution), or politically viable (is it worth opening Pandora’s box at your preferred workshops?).

In sum: technically easy to solve, and it pays for itself; management-wise, it depends on management.

It's even worse at the GIA level where, as people in SG know, some workshops are on the panel for multiple insurers.

 

Staged Accidents and Phoney Witnesses

Accidents, by their nature, are (most of the time) unexpected. Simply looking at, say, road and traffic conditions and location, it is not that straightforward to estimate the probability of an accident and highlight the stranger ones. One reason is that humans play a large role, and it is not so easy to get data on all the actors involved: not only all the drivers involved and their data, but also drivers in the immediate vicinity. (5)

 

The easy way to detect staged accidents and phoney witnesses is to focus on the people, not the vehicles. The key assumption is that these are the work of groups of people, who are therefore likely to play different roles at different times. Let me put it this way: how likely is it that the same person is a claimant, a witness to an accident, and at fault in a vehicular accident, all within, say, a year?

 

The idea is that, chances are, a member of the group is likely to play different roles over time, sometimes even with different insurers to make the chances of detection lower. This is something very easy to pick up using social network analytics, especially at the GIA or police level.
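A minimal sketch of this role-counting idea (the names and records below are invented): pool claim records, ideally across insurers, and surface anyone who keeps switching hats:

```python
from collections import defaultdict

def suspicious_people(records, min_roles=3):
    """records: (person, role) pairs pooled across claims.
    Returns people seen in `min_roles` or more distinct roles."""
    roles = defaultdict(set)
    for person, role in records:
        roles[person].add(role)
    return {p: sorted(r) for p, r in roles.items() if len(r) >= min_roles}

records = [
    ("Tan", "claimant"), ("Tan", "witness"), ("Tan", "at-fault driver"),
    ("Lim", "claimant"), ("Lim", "claimant"),
]
print(suspicious_people(records))
# → {'Tan': ['at-fault driver', 'claimant', 'witness']}
```

Real social network analytics would go further, linking people through shared phone numbers, addresses, vehicles, and workshops, but the principle is the same: the scam shows up in the people graph, not in any single claim.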

 

Conclusion

Saying that 20% of claims are likely to be fraudulent and placing the onus on customers/insured in the case of vehicular insurance in Singapore is a joke.

1 The main causes as stated by the GIA are unlikely to be caused by claimants.

2 The GIA itself (or, to a lesser degree, the large insurers) is the one with the data easily at hand to detect potentially fraudulent cases effectively.

3 However, the insurers (and the GIA) have little incentive to do so, since they can simply pass the costs on to customers.

 

However, relatively simple analytics can, right now, help alleviate this problem and allow customers to pay lower premiums since the risk of fraud can be mitigated. It is just a question of will from the insurers’ point of view.

 

  1. https://insuranceasia.com/insurance/news/20-singapores-motor-insurance-claims-are-fraudulent-giaj
  2. https://www.straitstimes.com/world/14-trillion-lost-to-scams-globally-s-pore-victims-lost-the-most-on-average-study
  3. https://www.youtube.com/watch?v=q5PI5ZtJTSY
  4. https://www.transparency.org/en/cpi/2021
  5. That is not actually true anymore in Singapore, I will explain in a subsequent blog.



Sunday 5 November 2023

GenAI thing, bonus: hype cycle

Gartner is an organisation that classifies different technologies into its “hype cycle” framework. (1) Basically, any piece of technology may go through 5 stages:

  1. Technology Trigger: a technology reaches a proof of concept, a successful experiment, and people get excited.
  2. Peak of Inflated Expectations: given the excitement, some companies jump in and experiment; some succeed, most do not.
  3. Trough of Disillusionment: given the failures, some technology versions die off, and investment into the space takes a hit, only recovering if providers iron out the main issues.
  4. Slope of Enlightenment: as the technology becomes production-ready, more successes are created and the usage and limits of the technology are better understood. New-generation products appear.
  5. Plateau of Productivity: mainstream adoption; what was a successful niche spreads.

 

Guess where Gartner placed Gen AI in its 2023 AI hype cycle?


(2)

 

That’s right: right at the Peak of Inflated Expectations. Plus, they only see the Plateau of Productivity being reached in 5 to 10 years.

On the other hand, something like Computer Vision, where we use machines to process images to extract meaningful information, is close to the Plateau of Productivity. There are many pieces of software/APIs that help you analyse images very efficiently, and, very importantly, there are proven use cases in production for computer vision: from facial recognition for access control, to recognising who is not correctly wearing a mask (useful during COVID), to detecting anomalies in X-rays/MRIs, to identifying and tracking people from public cameras (ahem…).

GenAI, on the other hand, has made a big splash; people around the world, especially non-data professionals, are raving about the possibilities that GenAI can bring. AI is already being used whether we are aware of it/like it or not, for example in the UK (3); now imagine GenAI (in an earlier blog I listed a few well-known issues with LLMs).

So what have people been doing with GenAI? One of the avenues being explored is helping humans write code, and there are many, many examples of this, for example the ubiquitous GitHub Copilot (4). But as I asked in an earlier blog, do you think the code that is written is of very high quality, since it is built on ‘everyone’s’ coding…

There have also been efforts to help manage GenAI. Actually, apart from the coding copilot, the other development from Microsoft Build (5) earlier this year was the guardrails Microsoft put around GenAI. And this can be leveraged, as OCBC has done (6) with MS Azure, to allow fact checking rather than blindly following the answers generated: curation! (7)

The reality is, I believe GenAI is a very useful tool to have in your arsenal, but more ‘traditional’/‘tried and tested’ methods may be more suitable for your problem at hand. I have had customers saying “I just want GenAI” whether their use case suits it or not. I would just point to the Peak of Inflated Expectations.

I am someone who enjoys building stuff that works and enables organisations to hit business KPIs; to do that, choosing the right tool is very important, and this is something I can help with. You can use a sledgehammer to open a can of beans, and you can use a can opener too; guess which, currently, gets you to the beans and deals with your hunger more efficiently?


  1. https://en.wikipedia.org/wiki/Gartner_hype_cycle
  2. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
  3. https://www.theguardian.com/technology/2023/oct/23/uk-officials-use-ai-to-decide-on-issues-from-benefits-to-marriage-licences
  4. https://github.blog/2023-03-22-github-copilot-x-the-ai-powered-developer-experience/
  5. https://news.microsoft.com/build-2023/
  6. https://www.straitstimes.com/business/ocbc-to-deploy-generative-ai-bot-for-all-30000-staff-globally
  7. Interestingly, if you look again at the AI hype cycle 2023 diagram above, "Responsible AI" is also at the peak of inflated expectations, humans still, fortunately, have more thinking to do...


Sunday 29 October 2023

Gen AI Thing, Part III : finally getting to the crux of the issue

In my previous 2 blog posts, I explained, as I would explain to my mum, what AI is, and how machines can understand our words. In this blog, I explain heuristically (non-technically / with common sense) how machines can not only understand what we are saying, but also respond intelligently and create answers that were not there before (Gen AI).

One more little detour

For those of you who have played with ChatGPT or something similar (Bard?), one of the things that puzzles people is the concept of ‘tokens’. Some of you may ask: since I claim that machines can understand human words well enough, what is this token thing? Are we gambling when using these ‘tools’?

Gambling? Yes, maybe… Some examples of funky ChatGPT (and Bard) results:

  • ChatGPT making up cases to support its argument (1)
  • ChatGPT making up cases of sexual harassment and naming the supposed perpetrator (2)
  • Bard making a mistake regarding the James Webb Telescope (3)

There are ways to mitigate these issues, but that is beyond the scope of this blog post. Suffice it to say that such models do give information that may be less than reliable. But then again, they were not designed to ‘tell the truth, the whole truth, and nothing but the truth’.


ChatGPT/BARD are not designed to tell the truth?! I thought they were AI!

The answer to the question lies in the question itself. These are AI systems, and as I mentioned in my 1st blog of the series, such models learn from (are trained on) the data they are fed. Secondly, it may help to understand a bit about how these systems, called LLMs (Large Language Models), work.

 

How does ChatGPT/Bard… work?

Let me start with a word game. How many of you play Wordle (4)? Basically, every day a 5-letter word is chosen, and you have to guess the word without any clue, in 6 tries. All that you will ever know is whether a letter you have suggested exists in the answer but is in the wrong slot (yellow), is in the correct spot (green), or does not exist at all (black). The other condition is that any combination of letters you try has to be an existing word.
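For the curious, the feedback rules can be sketched as a small function (a simplified version of the game’s scoring that also handles repeated letters the way the game does: each letter in the answer can only be “consumed” once):

```python
def score(guess, answer):
    """Per-letter feedback: 'G' green, 'Y' yellow, 'B' black."""
    result = ["B"] * len(guess)
    remaining = list(answer)
    # First pass: exact matches (green) consume their letter.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
            remaining.remove(g)
    # Second pass: right letter, wrong spot (yellow), if still unconsumed.
    for i, g in enumerate(guess):
        if result[i] == "B" and g in remaining:
            result[i] = "Y"
            remaining.remove(g)
    return "".join(result)

print(score("REACT", "CRANE"))  # → YYGYB
print(score("CRANE", "CRANE"))  # → GGGGG
```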

The thing is, most people, once they know the position of one letter, will try to guess the letters next to it based on what they know about the English language, for example (5):

  • ‘E’ is the most common letter in English and your best bet if you know nothing about the word, followed by ‘T’ and ‘A’.
  • If there is a ‘Q’, chances are there is a ‘U’, and chances are the ‘U’ follows the ‘Q’.
  • If there is a ‘T’ (except in the 5th position), then the next letter is most likely an ‘H’, then an ‘O’, then an ‘I’.
  • If there is an ‘H’ (except in the 5th position), then the next letter is most likely an ‘E’, then an ‘A’, then an ‘I’.

Combinations of 2 letters such as ‘QU’, ‘TH’, ‘TO’, ‘TI’ are called bigrams. The idea is that once you know a letter, you use this information to find the most likely following letter – this is known as conditional probability: given that one letter is a ‘T’, the most likely following letter is an ‘H’, not an ‘E’ (the most common letter in English). The key is that your choice of letter changes based on the information you have.

These are shortcuts, findings based on analysis of words, that can help you guess the letters in wordle.
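Here is that shortcut as a tiny sketch (the counts are made up, though the relative ordering matches the examples above: TH above TO above TI). Conditional probability here is just: among bigrams starting with my known letter, which continuation is most frequent?

```python
# Toy bigram counts - invented numbers, but the relative order matches
# the examples in the text (TH above TO above TI, QU dominant after Q).
bigram_counts = {
    ("T", "H"): 100, ("T", "O"): 40, ("T", "I"): 30,
    ("Q", "U"): 99,
    ("H", "E"): 90, ("H", "A"): 50, ("H", "I"): 20,
}

def most_likely_next(letter):
    """Conditional probability in action: among bigrams starting with
    `letter`, return the most frequent continuation."""
    candidates = {b: count for (a, b), count in bigram_counts.items()
                  if a == letter}
    return max(candidates, key=candidates.get)

print(most_likely_next("T"))  # → H (not E, despite E being most common overall)
print(most_likely_next("H"))  # → E
```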

As an aside, the most common bigrams in different languages can be very different (6)(7):

Popularity   English   French   Spanish
1            TH        ES       DE
2            HE        LE       ES
3            IN        DE       EN
4            ER        EN       EL
5            AN        ON       LA

 

Letters are fine, but Gen AI generates whole documents, not random letters

It’s just an extension of the idea. In the example above I used bigrams (2 letters); when playing Wordle, some people may use trigrams (3 letters) – basically the same thing, just a little more complex.

The next step is that instead of guessing the next letter, you guess the next word. But why stop there? You can actually go beyond a bigram and condition on multiple preceding letters (here, words). It’s, in principle, that straightforward. However, to improve performance, there are a few more tricks.

The problem is the size of the data: given the number of words, the possible combinations increase exponentially as you add more words. The brilliant thing, or one of the more brilliant things, about LLMs is that they generate a probability of a combination of words occurring. They do that by using an underlying model to recognise the patterns.
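The word-level version of the bigram idea can be sketched in a few lines (a toy corpus and a lookup table of counts, so purely illustrative):

```python
from collections import defaultdict, Counter

corpus = "the cat sat on the mat and the cat slept".split()

# Count which word follows which in the training text.
follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def next_word(word):
    """Pick the most likely continuation seen in training."""
    return follows[word].most_common(1)[0][0]

# Generate a few words starting from "the".
text = ["the"]
for _ in range(4):
    text.append(next_word(text[-1]))
print(" ".join(text))  # → the cat sat on the
```

Real LLMs do the same “guess the next token” job, but with a neural network estimating the probabilities instead of a lookup table, and conditioning on far more than one preceding word; that neural model is precisely how they sidestep the exponential blow-up of counting every combination.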

AI, NN, and the human brain

As mentioned in Part 1 of this blog, AI is about making a machine think like a human. The way this has been done in Neural Networks is to make a representation (model) of the human brain, with nodes and connections. And, as is thought to happen in the human brain, each node does a fairly simple job (one of the simplest jobs is a binary yes/no against a threshold – such a node is called a perceptron), and the connections between them are given weights based on how important they are.
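A perceptron really is that simple. A sketch with hand-picked weights (a real network would learn its weights from training examples; the features and numbers here are invented):

```python
def perceptron(inputs, weights, threshold):
    """Weighted sum of the inputs, then a binary yes/no decision."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Toy "is it a table?" decision on 3 made-up features:
# (has long legs, has a flat top, has a backrest)
weights = [0.2, 0.3, -0.8]  # a backrest argues strongly against "table"
print(perceptron([1, 1, 0], weights, 0.4))  # → 1 (table)
print(perceptron([1, 1, 1], weights, 0.4))  # → 0 (probably a chair)
```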

 


Note that as Neural Nets have progressed, they have taken on a life of their own, and the idea of mimicking the human brain's structure is no longer central; the architecture of neural nets, while still using nodes and connections, can be quite different.

Going back to the chair and table example

When you show the machine a picture, it breaks it down into small parts of the picture (features), maybe the length of a leg or the shape of the back, and assigns weights based on how important these features are. After being trained on many examples, the model is ready to distinguish between table and chair.

The illustration above shows a very simple type of Neural Network: one input layer where you start, one hidden layer of nodes with connections in one direction to do the magic, and then the output layer. For classifying tables and chairs from images, for example, it has been found that neurons arranged in a grid formation work well, specifically a Convolutional Neural Network. Basically, a set of filters is applied to detect specific patterns in the picture (convolutions), then these are summarised and combined (more layers) to extract the more salient features without burning enormous resources, and finally pushed to the output layer; in the case of our chair/table classification there would be 2 nodes in the output layer, the output being the probability that the image fed in is a chair or a table. (9)

There are many ways to structure a neural net, and many parameters to play with. You won't be surprised that one of the important innovations was recognising that, for processing text, it is important to know what else is in the sentence, and not process each word independently. So there was a need to be able to refer back to past words. Long Short-Term Memory (LSTM) (10) allowed this to happen by letting the user control how long some nodes retain information, which can then be used to provide context.

However, LSTM is not that fast, as it processes information sequentially, like many of us do: we read word by word. (11) In 2017, a team from Google came up with the brilliantly entitled paper “Attention is all you need” (12). This gave rise to the rise of the Decepticons (13), sorry, to Transformers (14). Basically, when processing a chunk of text, the machine calculates weights using an attention network, working out which words need to be given a higher weight. While Transformers can be run sequentially, they can also be run in parallel (no recursion), hence the usefulness of GPUs in LLMs.
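A heavily simplified sketch of the attention calculation (toy 2-dimensional ‘embeddings’ and no learned query/key/value matrices, so purely illustrative): score each word against the current word with a dot product, then turn the scores into weights with a softmax.

```python
import math

def attention_weights(query, keys):
    """Softmax of dot products: how much weight each word gets."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-D embeddings for the words in "the cat sat" (invented numbers).
embeddings = {"the": [0.1, 0.0], "cat": [0.9, 0.8], "sat": [0.7, 0.9]}
query = embeddings["sat"]
weights = attention_weights(query, list(embeddings.values()))
for word, w in zip(embeddings, weights):
    print(f"{word}: {w:.2f}")
# "cat" and "sat" get most of the weight; "the" gets very little
```

Notice that every word's score can be computed independently of the others, which is why the whole thing parallelises so well on GPUs.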

To answer a friend’s question, GPUs are not necessary in LLMs, but they really speed things up. (15)

Is LLM therefore just a better chatbot?

You must be thinking that LSTM is something that has been used in Chatbots before, and LLMs, as I have explained here, basically just answer your queries…

Actually, no. One huge difference between chatbots and LLMs is how they learn. LLMs use reinforcement learning (I sneakily introduced this in Part I of this series; there is even RLHF, Reinforcement Learning from Human Feedback...). Also, the volume and diversity of data they have been trained on is vastly different. LLMs can ‘talk’ about many more topics/intents than a traditional chatbot, which is usually more focused.

However, the comparison with a chatbot is an interesting one. Interest in LLMs really took off with GPT-3.5, which, as the name suggests, was not the 1st offering in OpenAI's GPT family. So what made GPT-3.5 garner so much interest (GPT-1 was released in 2018, GPT-2 in 2019, GPT-3 in 2020, and GPT-3.5 in 2022 (16))? One reason is that it suddenly improved; the second is that a friendly chat interface was included, allowing virtually anybody with an internet connection to play with it and become an instant advocate.

A few more points

GenAI, here LLMs, basically smartly and quickly process word/token embeddings to understand you and produce a response. The key to understanding them, as I mentioned earlier, is to know that they are not designed to give you the truth; they answer the question “what would a likely answer be?” Actually, not only that: GenAI gives you the likely answer of an average person (thank you, Doc, for pointing this out clearly). Think about it: if it is trained on the whole internet and ranks the most likely answer, then the most likely answer may not be that of the people who really know what they are talking about. Hence my thought that LLMs can help so-so coders, but expert coders may not be helped that much; they probably know better.

Questions to ponder:

  • Do you believe that logic is common in humankind? Is common sense really that common?
  • How about Maths: do you believe that people are generally good or bad at Maths?
  • Why am I asking this? Simple. Now tell me: do you think LLMs are good at logic? At Maths?

Is most likely always the best?

Now, there’s one more thing: you can influence what GenAI responds to you. I mentioned that these models basically rank all possible words and pick one; maybe your first instinct is to always pick the highest-probability word.

That would give you consistent answers over time. However, always using the highest-probability response often leads to circular and less than satisfactory answers. Hence, most people choose to allow some randomness (ChatGPT calls this temperature (17)).
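Temperature can be sketched like this (the words and scores are made up): dividing the scores by the temperature before the softmax sharpens or flattens the distribution, so near zero you almost always get the top word, while higher values let less likely words through.

```python
import math, random

def sample(word_scores, temperature=1.0):
    """Sample a word from softmax(score / temperature)."""
    words, scores = zip(*word_scores.items())
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(words, weights=probs)[0]

scores = {"cat": 2.0, "dog": 1.0, "pangolin": 0.1}
random.seed(0)
print(sample(scores, temperature=0.1))  # almost certainly "cat"
print(sample(scores, temperature=5.0))  # "dog" or "pangolin" now show up often
```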

Conclusion:

GenAI is a great tool (what you can do with GenAI, whether you are an SME, an individual looking to make your own life easier, or a large organisation, may be the topic of a next blog). What it does is come up with a possible answer based on the data it has been trained on. (Another blog post could be why GenAI is not the answer to everything, but that’s probably obvious.)

 

  1. https://www.channelnewsasia.com/business/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-3581611
  2. https://www.businesstoday.in/technology/news/story/openai-chatgpt-falsely-accuses-us-law-professor-of-sexual-harassment-376630-2023-04-08
  3. https://www.bbc.com/news/business-64576225
  4. https://www.nytimes.com/games/wordle/index.html
  5. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
  6. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/french-letter-frequencies/
  7. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
  8. you can also adjust how you penalise mistakes, known as the loss function; so that’d be a 4th way.
  9. https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network
  10. http://www.bioinf.jku.at/publications/older/2604.pdf
  11. LSTM evolved from Recurrent Neural Networks (RNN) where the idea was that you can look back at information you processed earlier (hence recurrent), however if the information was far back, there were problems referring to it.
  12. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  13. https://tfwiki.net/wiki/Rise_of_the_Decepticons
  14. https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
  15. Memorable demo of CPU vs GPU https://www.youtube.com/watch?v=-P28LKWTzrI
  16. https://en.wikipedia.org/wiki/GPT-4
  17. https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683


Sunday 22 October 2023

Gen AI thing Part II: Talking in human language, not in machine language

In my previous blog, I explained that AI is about making machines think like humans, and I gave an example of the human task of recognising objects and how you can get a machine to do it. In this blog, I will expand a bit more on how we can all become Dr Dolittle (1), but with machines rather than animals.

A few years ago, someone on LinkedIn asked me what coding language I would recommend a child learn, since he was making the decision for his newborn. I said that, in my view, rather than humans learning how to speak machine language (coding), sooner or later machines would learn how to understand human language (something like NLP, Natural Language Processing), and it would be more important for a child to learn how to think systematically but also creatively than to learn how to code. I haven’t heard from that person since. Hey, ‘sooner’ has happened (2)(3)(4). But I am jumping the gun.


For the 2nd time, what is Gen AI!?

Gen AI is basically using AI to create something that wasn’t there before. What is created can be text, an image, a sound… But the trick is that, first the machine has to learn (that is be trained on a bunch of data/examples), then it can produce something.

But what excites most people is that anyone can use Gen AI because the machine speaks human language (no code and you can access the mythical AI!). I will tackle this part first.


The machine understands me!

Another branch of AI/ML is NLP, Natural Language Processing. NLP is precisely concerned with making machines understand what humans are saying. You can imagine: it’s already quite difficult for humans to understand each other, now imagine machines…

Language is a very complex thing, and a living one: new words are added all the time, meanings accrue to words over time, words may mean different things in different contexts, and humans use irony, sarcasm… But it is worth the effort, because a huge amount of knowledge is kept in language, whether in oral or written form. With the advent of the internet, the digitisation (making it digital, bits and bytes, rather than analogue, a printed image) of dictionaries and research papers, and the democratisation of access to the internet (any idiot can write a blog, but smart people know which to read), there is a treasure trove of information on the internet that can be used to train a machine. But language is not that easy to deal with.


Words are all I have

In my previous post I talked about classification, and one of the keys is to measure the distance between things and decide which are similar. How does that apply to words?

But computers are all about numbers, not words…

The first challenge machines have in comparing words is that they do better at numbers, so the first trick is to somehow turn the problem into one that involves numbers. Once you know how to measure, deciding which is closer is not so hard.

Look at the words “BETTER” and “BUTTER”. How close are they?

There is only 1 letter of difference, so these 2 words are quite close: it’s just replacing a letter. There are concepts of distance built on such calculations, especially ones taking into account the number of letters in the word, and these algorithms are quite useful. The idea is that words are similar if it takes little effort to change one into the other.

Now, let me add the word “BEST” to the comparison. As an English-speaking person, you would say “BEST” is close to “BETTER” but not so close to “BUTTER”; going purely by replacing letters misses the meaning. Therefore there must be a better way.
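One concrete instance of the letter-replacing notion of distance alluded to above is the Levenshtein (edit) distance: the minimum number of single-letter insertions, deletions, or replacements needed to turn one word into the other. A standard dynamic-programming sketch:

```python
def edit_distance(a, b):
    """Minimum insertions/deletions/replacements to turn `a` into `b`."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # replace (or match)
        prev = curr
    return prev[-1]

print(edit_distance("BETTER", "BUTTER"))  # → 1
print(edit_distance("BETTER", "BEST"))    # → 3
```

By this measure ‘BUTTER’ is the closer word, which is exactly the problem: edit distance sees spelling, not meaning.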

 

Vector Embedding

Similar to a dictionary for words humans can refer to, there is a source of information that machines can refer to that tells them the relationship between words (humans can use them too). This is called vector embedding.

Vector Embedding: Imagine

Imagine a 3-dimensional space in front of you. A point in this space represents a word; a vector for that word is like directions to that point in space (here, maybe x, y and z coordinates). Each word is embedded in the space, with closer words having similar meaning/context. One really popular technique, made public by Google, is called word2vec: basically, transform a word into a vector while preserving the meaning of the word.

So, to follow our example: in the 3D space, ‘BETTER’ and ‘BEST’ will be close to each other, and ‘BUTTER’ further away (closer to ‘MARGARINE’ and ‘MARMALADE’).

Points in space, and more

Not only are similar words grouped together, so the machine can get the topics in a piece of text, but the relationships between the points in space also have meaning: moving from “BETTER” to “BEST” is the same journey as moving from “WORSE” to “WORST”.

This is something worth thinking about: not only do vector embeddings bring words that are about the same thing close to each other, but from the distance and also the direction (6), the relationship between the words can be inferred.
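A sketch with invented 3-dimensional vectors (real word2vec vectors have hundreds of dimensions and are learned from text, so all the numbers here are made up purely to illustrate the geometry): closeness is measured with cosine similarity, and a ‘journey’ between two words is just vector subtraction.

```python
import math

# Invented toy vectors, chosen only to illustrate the geometry.
vecs = {
    "better": [0.9, 0.8, 0.1],
    "best":   [0.9, 1.0, 0.1],
    "worse":  [-0.9, 0.8, 0.1],
    "worst":  [-0.9, 1.0, 0.1],
    "butter": [0.1, -0.2, 0.9],
}

def norm(w):
    return math.sqrt(sum(x * x for x in w))

def cosine(u, v):
    """Cosine similarity: close to 1 means the words point the same way."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def journey(a, b):
    """The vector you travel along to get from word a to word b."""
    return [y - x for x, y in zip(vecs[a], vecs[b])]

print(cosine(vecs["better"], vecs["best"]))    # high: similar words
print(cosine(vecs["better"], vecs["butter"]))  # low: unrelated words
print(journey("better", "best"))               # same direction as...
print(journey("worse", "worst"))               # ...this journey
```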


What is the big deal with vector embeddings?

The beauty of vector embeddings is that some large organisations like Google have made their vector spaces available for anyone to use, so we do not have to train the models ourselves, for example word2vec (5). In some cases, say you are dealing with a very specialised topic like medicine, you should use specialised vector embeddings, but for most cases, for the machine to understand what the human is saying, generic vector embeddings work well enough.

Therefore, the machine is able to know what we are saying whether we use the same words or not, because it now, with embeddings, sees which words are close to each other in meaning and their relationships with others. That’s great!

What this means is that it is possible to train the machine on millions of pieces of text on a bunch of topics, and it will be able to understand that some of them are talking about the same thing even if the words used are different.


Ok, but this is not new right?

Correct! Vector embeddings aren’t a 2020s thing (7). In the 1950s, John Rupert Firth made a statement that underlies a lot of the thinking today:

               “You shall know a word by the company it keeps” J.R. Firth 1957 (8)

However, some 70 years ago we did not have the computing resources we have today. So, AI went into winter – people could think about it, but it was very hard to put into practice. For example, imagine the number of words in a language (9) – the English Wiktionary (10) contains around 700k base words and 1.4m definitions – and if you want to place all of this in space along with the meanings, you will need many groups spread across many dimensions; even worse, there will be dimensions with few words, making computation really tough (the curse of dimensionality (11)). Our brains can handle 4 dimensions easily (our 3D world + time; next time someone is late for a meeting, introduce them to the 4th dimension 😊). Some research points to humans being able to handle more (12), but still nowhere near as many as required to plot even only the common words in English.

Note that not everything stopped, people spent time in many other directions.

In the 2000s, research heated up and some great leaps were made; for example, research by Yoshua Bengio and colleagues at Montreal proposed the path forward: “We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.” (13)

Ooops! Getting too geeky here, so just to summarise the point about vector embeddings. The thing with machines is that they don’t understand language just like that. So, one of the ideas was to convert words into numbers (vectors). Then the words that are about the same thing are grouped together, so if you use slightly different words from me but we are saying the same thing, the machine can tell. The neat thing about the numbers is that doing maths on them allows the machine to understand the relationship between the words; for example, the relationship between “king” and “man” is the same as “queen” and “woman”.

(14)
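A toy version of that famous example, with made-up 2D vectors (one dimension loosely standing for ‘royalty’, the other for ‘gender’; the numbers are invented for illustration):

```python
# Invented 2D "embeddings": [royalty, gender] -- illustration only
emb = {
    "king":  [0.9, 0.9],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "queen": [0.9, 0.1],
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]

    def dist2(word):
        return sum((t - e) ** 2 for t, e in zip(target, emb[word]))

    # exclude the input words so the answer is a 'new' word
    return min((w for w in emb if w not in (a, b, c)), key=dist2)

print(analogy("king", "man", "woman"))  # queen
```

In other words, "king" minus "man" plus "woman" lands closest to "queen", which is the relationship (14) describes, just with tiny hand-picked numbers instead of a trained model.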

The machine is now ready to understand you!

Add to this that there exist specialised vector embeddings for specific fields, and the machine can understand you generally, or even when you are asking in-depth questions on specialised topics.

So, what this helps with is for machines to store all the info they have access to in a way that is very easy for them to search and make use of, so they can figure out to a large degree what you are talking about. It is not perfect; that is why you have the role of prompt engineer (someone who speaks the ‘human language’ the machines understand). Personally, I think that with advances in NLP and machines being trained by interactions with humans, sooner or later there will be less need for prompt engineering; we (as in humans and AI) will all speak a ‘common language’, a bit like how some people speak differently to their children (or pets) or ‘foreigners’ compared to their own friends and family.

 

But still this is not Gen AI, where is the Generative part?

True, we are getting there…

In my previous blog and this one, I explained how machines can be made to think like humans, and how advances in technology have made it easier to provide machines with training data so they can understand what humans are saying to a large extent.

The next step is how machines can now create stuff; I will be focusing on how machines can write stuff that has not been written before. That will be the topic of the 3rd and last part of this loooong blogpost.

 

 

  1.  https://www.youtube.com/watch?v=YpBPavEDQCk
  2. https://ai.meta.com/blog/code-llama-large-language-model-coding/
  3. https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
  4. https://cloud.google.com/use-cases/ai-code-generation
  5. https://en.wikipedia.org/wiki/Word2vec
  6. That’s the basic thing about vectors, they are about ‘magnitude and direction’ https://en.wikipedia.org/wiki/Vector and the relationship between them can be ‘easily’ mathematically calculated
  7. https://en.wikipedia.org/wiki/Word_embedding
  8. https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
  9. https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
  10. https://en.wiktionary.org/wiki/Wiktionary:Main_Page
  11. https://en.wikipedia.org/wiki/Curse_of_dimensionality
  12. https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
  13. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
  14. https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/




Sunday 15 October 2023

Gen AI thing Part I: Plato’s spark

A friend recently asked me to help him understand the “gen ai thing” at a level that will allow him to have discussions (and since he knows me well, he knows this comes with opinion). I decided to go a level simpler, and try to explain Gen AI in a way my mum would understand (she’s in her 80s and my recent victory was getting her to carry her mobile phone when she is out of the house). I figured it would take me a while, so I broke the explanations into smaller, more digestible pieces. Here is Part 1.

What is Gen AI?

Before we go there….

First, what is AI? (with apologies to my brother, Dr. AI)

Humans are a very arrogant species, so we decided that the way we think is something worth replicating. Hence, if we could make machines think like humans, then we would have something fantastic. Basically, machines don’t get tired easily, and you can expand the capacity of a machine much faster than a human (hopefully (1)).

AI is basically that, how do we get machines to think like humans.

So, what does it mean to think like a human?

How do you think?

Let’s take a simple example (a simple application of thinking like a human), you see a piece of furniture in a shop, how do you decide that it is a chair, or a table (assuming someone hasn’t written: this chair/table for $xxx)?

Enter Plato!

This is not a new question. Plato (~428-348BC, that’s close to 2500 years ago) came up with the Theory of Forms, and that made me fall in love with PH102. The basic idea is that there is this world where the perfect form of every item in our world exists. So, I thought, that makes sense! I know if something is a chair or a table by comparing it to the ideal form: is it closer to the ideal chair, or the ideal table?

What does closer mean?

If you have read other articles by me, you will remember I love talking about distance, closer means smaller distance. An object a is more likely to be an A than a B if it is closer to form A than form/ideal B. This is easy; how you define closer is where the fun begins 😊 
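To show why ‘how you define closer’ matters, here is a tiny sketch with two made-up points: the very same pair of objects is 5 apart under one definition of distance and 7 apart under another.

```python
import math

# Two made-up points (objects described by two features each)
a, b = (0, 0), (3, 4)

# 'Closer' depends on which distance you pick:
euclidean = math.dist(a, b)                        # straight-line distance
manhattan = sum(abs(x - y) for x, y in zip(a, b))  # 'city block' distance

print(euclidean, manhattan)  # 5.0 and 7
```

Different distance definitions can even disagree about which of two forms an object is ‘closer’ to, which is exactly where the fun begins.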

Plato’s Theory of Forms

So, when I started playing with data, Plato’s theory of forms helped me a lot. The main difference is that, since I can’t access the world of ideals/forms, I have to base my version of form on what I had seen before.

The tables I had seen were 4-legged, came up to waist high (since my teens), and had a large flat surface at the top so you can put stuff on it. Usually they were made of wood, although the legs could be made of metal. Chairs were shorter, below waist high, but also usually had 4 legs and were made of similar material. However, chairs also had a back; the flat surface was not the highest point of the chair, the back was, so the person can sit on the flat seat and rest his/her back on the back.

So, when I see a new object, I decide whether it looks more like a chair or a table, based on whether it is closer to the typical form I had in mind. Note that I am not comparing just these words as I described table and chair, but the more complicated concept I have in mind (like an ideal form).

While humans learn from experience, machines can be made to learn. Instead of telling the machine the short ungainly description of a chair and a table above based on what I have seen, the trick is simply to give thousands of examples of things we know are chairs and tell the machine, these are chairs, and same thing for tables. So, you train the machine so that it comes up with its own view of what a chair is and what a table is. This is the training part of a model. 

In this case, we train the model by feeding it images of chairs with the label that these are chairs, and the same for tables. This is called supervised learning, since someone supervised the process by providing these presumably accurate labels.

For now, we skip how the machine breaks down the images, and let’s just assume that the machine now knows what chairs look like, and what tables look like. We then feed it a new image with a picture of a piece of furniture without a label, and it will tell us: this is likely a chair (or a table) depending on what it has learnt. The machine has solved the classification problem, by deciding the new unlabelled furniture is classified as a chair/table accordingly.
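That train-then-classify loop can be sketched in a few lines. The features and numbers below are invented purely for illustration (height in cm, and whether the object has a back), standing in for whatever the machine extracts from the images:

```python
import math

# Labelled training examples: (height_cm, has_back) -> what a human said it is
examples = {
    "chair": [(42, 1), (46, 1), (44, 1)],
    "table": [(74, 0), (78, 0), (72, 0)],
}

# 'Training': the machine forms its own view of each class --
# here, simply the average (centroid) of the labelled examples
forms = {
    label: tuple(sum(f[i] for f in feats) / len(feats) for i in range(2))
    for label, feats in examples.items()
}

def classify(obj):
    """A new, unlabelled object gets the label of the closest learnt form."""
    return min(forms, key=lambda label: math.dist(obj, forms[label]))

print(classify((47, 1)))  # likely 'chair'
```

The learnt ‘form’ here is just an average, but the shape of the process is the same as in the text: humans supply labelled examples, the machine builds its own version of the ideal chair and table, and new objects are classified by whichever form is closer.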

Now, nobody stops you from training the machine with other pieces of furniture, and animals, and all sorts of other things… After all, that’s how we learnt, no?

Thought experiment:

Imagine you are walking about, and from far you see something. How do you decide whether this thing with 4 black legs, and black and white splotchy pattern on the top and sides is a table or a cow or may be a dalmatian?

How would your thinking process go?

Would it be faster if you remembered you were in a field in the middle of a farm, or close to a nature inspired furniture shop?

For me, yes; based on the context (where the object is), I can make the process simpler by focusing on a smaller list of likely choices, than the whole list.

This is why you get faster, likely better results, on a specialised machine (a farm animal identifier in the first case or a furniture classifier in the second) rather than a generic machine: a machine trained only on furniture would identify the table much faster and more accurately than one that has also learnt about cows and dalmatians. However, the furniture classifier would fail if someone asked it to identify a dalmatian… Hence, machines/algos trained on a specific set of data are usually better at working on that theme/context, but will not do so well at things in different contexts. 

It should not be surprising: if someone from the tropics had never even heard of snow, he/she would be flabbergasted the first time, maybe even think it was volcanic ash… But someone who has lived in the snow would even be able to tell you the type of snow (4); it all depends on what you need. Similarly, I know of many Mandarin/Cantonese/French speakers who claim that there are many nuances in their languages that are not present in English. Again, it depends on what the people who use the language use it for.

If I had not seen a chair and table before, maybe I could check out in a dictionary:

  • Chair: a piece of furniture for one person to sit on, with a back, a seat and four legs (2)
  • Table: a piece of furniture that consists of a flat top supported by legs (3)

Then based on these definitions try and decide…

But you will tell me, wait, the human has a lot of work to do, he/she has to label the pictures.

Well, yes, for supervised learning, as a child asks adults: “what is this? And this? And this? How about this?”. But you will recognise the work the child put in: the child takes in the image he/she sees, commits it to memory in one shape or form, then later, when he/she sees a new object decides whether it is a chair, table or something else.

It is also possible to feed the machine unlabelled pictures, and it will decide by itself how many categories of objects there are (you can tell it that if you want), create its own view of things, and, when presented with a new picture after having been trained, decide whether that object is a chair or a table. This is called unsupervised learning.
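A rough sketch of the unsupervised version: the machine gets the same kind of measurements as before, but with no labels at all, and splits them into 2 groups by itself. This is a bare-bones k-means clustering, with invented numbers:

```python
import math
import random

random.seed(0)  # for repeatability

# Unlabelled measurements: (height_cm, has_back) -- no names attached
points = [(42, 1), (46, 1), (44, 1), (74, 0), (78, 0), (72, 0)]

def kmeans(points, k, iters=10):
    """Minimal k-means: group points around k moving centres."""
    centres = list(random.sample(points, k))  # start from k random points
    groups = []
    for _ in range(iters):
        # assign each point to its nearest centre
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            groups[nearest].append(p)
        # move each centre to the average of its group
        for i, g in enumerate(groups):
            if g:
                centres[i] = tuple(sum(v) / len(g) for v in zip(*g))
    return groups

for g in kmeans(points, 2):
    print(sorted(g))
```

Without ever being told ‘chair’ or ‘table’, the machine ends up with one group of short things with backs and one group of tall flat things; a human can then name the groups.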

There also is reinforcement learning, whereby the machine is given feedback on what it has predicted, and therefore can continue learning by analysing what went right and what went wrong.

Now whether you choose to use supervised or unsupervised learning is up to you; there are reasons for and against using either form. Not only that, but how you choose to learn or group things also makes a difference to the output you will get and the ability of the model/algorithm to properly classify things. This is something I am geeky about, but it is not for this blog post.

You will agree this is a very useful thing to have in your back pocket, and the practical applications are very, very vast. For example, a few years ago, I found it was not too hard to build something that, once you feed it a photo of a piece of meat from a supermarket, can identify the meat with reasonable accuracy, and you can slap on features such as estimating price (after estimating volume), freshness… You can easily do the same for fruit (auntie, no need to press-press anymore!)

Ok, but this is only classification of objects, doesn’t AI do many many more things? Is this really AI, or is it ML?

AI vs ML

AI is, as mentioned above, focused on making machines think like humans. ML is how we apply specific pieces of this to solve problems. The classification piece I used is a piece of ML, and ML is part of AI. But there is more to it than that.

Classification is just a small piece of what ML can do. ‘Traditionally’, ML has been used to do 3 things: classifying things as I illustrated above (think of the photo app in your phone tagging the pictures by recognizing what is inside), finding out what affects what (regression), for example understanding how the weather affects the price of tomatoes, and predicting things, such as the price of tomatoes next week.
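The regression piece can also be sketched in a few lines; the rainfall and tomato-price numbers below are invented purely for illustration:

```python
# Made-up data: weekly rainfall (mm) vs tomato price ($/kg) -- illustration only
rain  = [10, 20, 30, 40, 50]
price = [2.0, 2.5, 3.0, 3.5, 4.0]

# Ordinary least squares by hand: slope = cov(rain, price) / var(rain)
n = len(rain)
mx = sum(rain) / n
my = sum(price) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(rain, price))
         / sum((x - mx) ** 2 for x in rain))
intercept = my - slope * mx

# Prediction: expected price if 60 mm of rain falls next week
print(intercept + slope * 60)  # roughly 4.5
```

The fitted slope tells you how the weather affects the price (the ‘what affects what’ part), and plugging in next week’s rainfall gives you the prediction part: two of the three traditional ML jobs from one small model.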

A little diagram will illustrate what I am talking about:


So basically, while trying to explain Gen AI as it is today, I used ML, basically applied AI, and took 1 aspect (classification). I skipped over neural networks, which can be used to classify the images by, say, automatically varying the importance of different aspects – is the height of the horizontal piece more important than the number of legs? – or even deep learning, which basically is a more complex neural network.

But, simply by looking at the name: Neural Network, you can get a hint that the original idea was to mimic the layers of the human brain, deep learning is adding layers and other complexities. So, fret not, I am not misleading, I am simplifying. Remember, my aim is that even someone like my mum can understand.

In my next blog, I will explain the most common understanding of GenAI, ‘ChatGPT’, or basically the LLM (Large Language Model), because people using it are not coding (speaking machine language) but speaking their own Natural Language (oops, I slipped in NLP).


  1. Elon Musk’s Neuralink has been approved to have human trials https://www.cnbc.com/2023/09/20/elon-musks-neuralink-is-recruiting-patients-for-its-first-human-trial.html
  2. https://www.oxfordlearnersdictionaries.com/definition/english/chair_1
  3. https://www.oxfordlearnersdictionaries.com/definition/english/table_1
  4. https://en.wikipedia.org/wiki/Eskimo_words_for_snow


Sunday 17 September 2023

Singapore the land of slow fast food (fried chicken!)

 I love fried chicken, (part of one vision...) When I was a kid, it was an adventure to go to the stadium to watch a football game with my dad, followed by either a vindaye and watercress in maison bread, or the luxury of a 2 piece chicken meal from KFC (at that time popularly known as “kentucky”).

I still love the treat of fried chicken and have my favourite chicken stall at Yishun Ring Road. I also like fast food fried chicken, including KFC of course.

My last 2 attempts at having fast food fried chicken in Singapore were disasters. In both cases I waited more than 30 minutes for my fried chicken. Hence the title: “land of slow fast food (fried chicken!)”

As I am someone who is really into using analytics to make a difference, I was not very happy at how, given the data they have, the 2 fast food chains messed up.

I will go in reverse chronological order, starting with the easier problem to solve.



Texas Chicken Sengkang

We went to Texas Chicken Sengkang. We waited for 35 minutes for a standard chicken set. The thing is that there simply was no visible queue. People who were waiting to eat-in were simply inside at their tables, waiting for their orders to be ready.

It was the day of the presidential elections, so as I was checking on my order, I told a couple who was considering ordering that I had been waiting for close to 30 mins. The guy decided to order and go vote, knowing he’d be done before his order was ready, and he was right. 

The case of Texas Chicken is simple: they had too many orders, including remote (online) orders, so they basically delayed everyone.

This is a clear issue of bad planning. If they have a system to manage their capacity to deliver orders, they obviously forgot that election day was a public holiday (it was announced on August 11 that if there was more than 1 candidate the polls would be on Sep 1, no excuse there).

I also happened to see only 3 and then 4 staff members between the kitchen and front-office. I can easily see how, given the variety of the menu, including delicious biscuits, it would take good planning from a resource perspective, and clearly the manager was overwhelmed. A fellow frustrated customer told me, since I was taking pictures: “if you complain, her name is XXX, that is the manager”, but to me it is not really the manager’s fault. The organization failed at equipping her with tools for planning, or did not train her well enough to deal with the data.

This is a simple resource planning issue. You can make things a little bit more complex by adding the menu variety and the orders. Simple resource optimization with constraints, then you can add more constraints.

But my point is that there is enough information to enable the manager, after all they are the ones facing hangry customers. Whether it is because of process, lack of staff, inability to access relevant data (organization willful blindness), this can be resolved.

You know it will be a public holiday (voting day), you can expect higher demand, not a normal BAU Friday morning. You also know, in advance, that you can ask staff to be available. Then you know what orders are coming in, from where, in what order. It is a simple resource allocation problem.


KFC at AMK

We went to KFC AMK before dinner time (we expected a quick meal since we were hungry) and ordered a meal of 2 pieces of original recipe chicken each. While orders may not be fulfilled in the order they were received, I think you would agree it is ridiculous to spend half an hour watching many people who arrived after you get their orders earlier.

After some observation, I figured out what was wrong.

The chicken was coming out from the kitchen alright, but while we had been waiting 2 batches of chicken that were prepared were both spicy, not original. (1) It was a decision by management of the branch to keep customers who ordered original recipe chicken waiting.

I have no idea how long I would have waited, since I went to the counter to change our order to spicy and my order was fulfilled within the next few minutes.

To me, this is horrible, blind management. There is so much technology that is lying around the branch, from collating the detailed orders (self-serve kiosks) to machines that track orders and display the status, but it did not help my order and that of other people waiting for a long time – presumably original recipe lovers like me. Whether it was a conscious decision on the part of the branch management, or a default setting from HQ, or some preset rules, I have no clue, but to me it is a failure to use the data they have.

Unless KFC Singapore chooses to make some customers ordering their likely flagship product wait for ages, this should not be happening.

There actually are many ways to solve this problem. At a simple level it is about optimizing the orders: minimizing the wait time, given what has been ordered, the capacity of the kitchen and the staff. This can be optimised on the spot as the orders come in, the data is already captured, do something with it...
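As one hedged illustration of the ‘minimise the wait’ idea (all prep times invented): simply serving the quickest orders first, the classic shortest-processing-time-first rule, already cuts the average wait sharply compared with strict first-come-first-served.

```python
# Hypothetical prep times in minutes for the orders currently queued
orders = {"2pc original": 12, "burger": 3, "family bucket": 20, "fries": 2}

def average_wait(sequence):
    """Average time each customer waits before their order starts cooking."""
    wait, total = 0, 0
    for name in sequence:
        total += wait          # this order waited for everything before it
        wait += orders[name]
    return total / len(sequence)

fifo = list(orders)                   # first come, first served
spt = sorted(orders, key=orders.get)  # shortest prep time first

print(average_wait(fifo), average_wait(spt))  # 15.5 vs 6.0 minutes
```

A real kitchen would add constraints (fryer capacity, batch sizes, fairness caps so the family bucket is not starved forever), but the point stands: the order data is already captured, and even a simple rule on top of it changes the customer experience.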

With some more effort, the displays at the kiosks can be enhanced to inform customers about the orders. Frankly, seeing your order being prepared for more than 10 or 15 minutes at a fast food joint is ridiculous.

Failing that, at a minimum, a prediction of how much of what should be prepared and when. You do not have to wait for the order to prepare the meal. I remember the scene in The Founder where the main character checked out the fast food joint and was shocked to be able to just pick up his order without waiting… (2). It is a simple job of predicting the food required and preparing some of it in advance. Of course there is a risk that your prediction is wrong and the food gets cold, but it is a balance between this and extraordinary waiting times. Analytics can work with the business to optimize this.
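The prediction piece can start very simply too, for instance a moving average of past sales plus a small buffer. All numbers below are invented for illustration:

```python
# Made-up counts of original-recipe pieces sold in the same hour on past days
past_sales = [38, 42, 40, 45, 41]

# Naive forecast: average of the last 3 observations
window = past_sales[-3:]
forecast = sum(window) / len(window)

# Pre-cook the forecast plus a 10% buffer against running out
pre_cook = round(forecast * 1.1)

print(pre_cook)
```

The buffer is the cold-food-vs-waiting-time trade-off from the text made explicit: a bigger buffer means fewer stockouts but more food sitting under the lamp, and that is exactly the knob analytics and the business can tune together.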

Now if you combine all 3 components (predict what you are likely to need, optimize your orders, inform customers at the point of order), you get a nice ‘living’ system whereby everyone is informed and makes decisions accordingly. If I really have to wait 30 minutes for my original chicken, at the point of order I am likely to switch to spicy; but at least it is a decision I make and is within my control as a customer. KFC gets food delivered to happy customers.

Many analytics solutions are actually made up of different components supporting each other, and if well designed they can work by themselves, but they work better together. This is how I prefer to work: show results quickly, gain acceptance – which includes training people to use the system – and keep growing the results and the ability of people to use the system, rather than a big bang approach. But then this depends on the ability of the organization, especially people on the ground, to adopt the use of data.


Conclusion

Data is just data, or even worse is noise, unless the people on the ground know how to use it and the organization enables them to. KFC Singapore and Texas Chicken Singapore have obviously invested in technology and generate sufficient data, but do not seem to have enabled decision making on the ground by educating the staff and empowering them to make decisions.

In this day and age, this is really not doing right by their customers and staff.

If any of my contacts or readers agree with this post and know decision makers in KFC/Texas or any other fried chicken/fast food joint that is serving slow food and is unhappy about it, please make the introduction; the rest will be up to us 😊

 

 

 

 

  1. There was a country wide promotion for special flavour chicken, but the promotion was not being advertised at that branch.
  2. https://www.youtube.com/watch?v=KultzqPJaJs
  3. Thanks to hotpot for AI generated image