Wednesday, 17 April 2024

ML/(Gen)AI: get the basics right, unless all you want to do is brag or punish

Recently we have seen a few walk-backs, tail-between-the-legs moments in the field of ML/AI applications; the most interesting one, from my point of view, is Amazon removing ‘Just Walk Out’ from its grocery stores (1). The reason I find it interesting is that behind all the tech, there were a thousand people checking the purchases (2). Hands up if you thought it was all done by machines…

However, I will give credit where it is due: the ability to just walk out of a store after shopping, knowing that your purchases have been accurately tracked and the correct amount deducted from your account, is a pretty useful feature. (3) It saves the consumer an appreciable amount of time and effort, and presumably at low to no cost, especially if it was truly automated.

And to me, the main aim of analytics/ML/AI should be to make people’s lives easier. Saving time at low to no cost while shopping is a good thing.

With this as context, I am sure you will understand my exasperation at HDB and ST Engineering(*).

HDB is using AI to detect power failures (4) – as if these occur frequently in Singapore… ST Engineering even wants to sell its AI capabilities, even actionable intelligence. (5)

However, to me, the basic functions that organisations are hired to perform have to be done properly first; it’s like Maslow’s hierarchy of needs (6): start by getting the basic needs right. And this does not only mean the snazzy, jazzy AI stuff, but the whole implementation process.

One of the HDB carparks we use very often is near a Sheng Siong supermarket, a coffee shop, and a sundry shop… a very well-utilised area within a block of HDB flats. The carpark of this area is managed by ST Engineering.

And the car park system has been erratic for a long while, at least 6 months now. You can tell who is a regular user of the car park because they give the car at the gantry enough space to reverse and reposition a few times to try and get the sensors to detect the vehicle (7). My question is: how is it possible that, with the tools at their disposal, ST Engineering has not detected issues with this gantry? I am not even talking about preventive maintenance, I am talking about usage being affected… That’s even more basic.

To add to this, I have a very specific incident.

It was raining very heavily when we were trying to exit the carpark, and the barriers simply wouldn’t go up. I exited the vehicle to press the button for attention from a human (ST Engineering, I assume, has a number of people who can deal with such situations). The moment a connection was made, it was cut off; basically, the human hung up. Twice. While I was getting drenched.

I called the helpline once we managed to get out of the car park (thank you to the people queuing for their patience, and to the closest vehicle for knowing to leave a large room for manoeuvring). All I got was, basically, “the communication system is faulty, we will fix it”. I asked for written feedback and gave my email address, and nothing came from ST Engineering. To me it sounds like the case was not even lodged, and there may have been some covering for a colleague.

The point is, with simple use of Analytics,

  1. the faulty gantry should have been detected earlier, rather than waiting for complaints
  2. there should be an automated system to ensure cases are raised and closed within SLAs, and this should be tracked automatically. Again, this is simple using today’s tools.
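To make point 1 concrete: detecting a misbehaving gantry can be as simple as monitoring, per gantry, the rate of entries/exits where the IU was not read on the first attempt. A minimal sketch in Python; the field names, window size, and threshold are all illustrative assumptions, not ST Engineering’s actual system:

```python
from collections import deque

def flag_faulty_gantries(events, window=200, threshold=0.05):
    """Flag gantries whose recent first-read failure rate exceeds a threshold.

    `events` is an iterable of (gantry_id, ok) tuples, where `ok` is True
    when the IU was read on the first attempt. Field names are illustrative.
    """
    recent = {}        # gantry_id -> deque of the most recent outcomes
    flagged = set()
    for gantry_id, ok in events:
        window_events = recent.setdefault(gantry_id, deque(maxlen=window))
        window_events.append(ok)
        # Only judge a gantry once we have a full window of evidence
        if len(window_events) == window:
            failure_rate = window_events.count(False) / window
            if failure_rate > threshold:
                flagged.add(gantry_id)
    return flagged
```

Fed from the gantry logs, something like this would surface a failing sensor from usage data long before six months of driver complaints.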

To me ST Engineering has failed in analytics and process, and in customer care.

How about HDB you ask?

Well, HDB outsourced the management of the carpark to ST Engineering. Do they have customer satisfaction reports from ST Engineering, or do they not care? They must be happy with ST Engineering’s reports and performance – although I doubt the contract involves more than $. The design of the car park is also bad. I got drenched attempting to communicate with the human managing gantry issues, yet a couple of metres from the gantry is a nicely covered walkway. I would think that extending the coverage to the gantry would not have cost that much. But hey, who cares?

What I am saying is very simple: before you start talking of GenAI, make sure you have the basics right; take care of Maslow’s hygiene and safety needs before you go for your own self-actualisation. After all, while HDB has a virtual monopoly on parking, customers should matter, don’t you think?

Conclusion

Build useful analytics – useful to your users – make sure your KPIs reflect that, and build them into contracts. On the contractor side, track and analyse your true performance continuously. It is not rocket science, but still so many organisations fail at making their stakeholders’ lives easier. Although, it would seem HDB/contractors are focused on maximising revenue, investing in punishing rather than delivering good service (8)(9).


  1. https://www.theverge.com/2024/4/2/24119199/amazon-just-walk-out-cashierless-checkout-ending-dash-carts
  2. https://www.bloomberg.com/opinion/articles/2024-04-03/the-humans-behind-amazon-s-just-walk-out-technology-are-all-over-ai
  3. this is a very different experience from NTUC supermarket self-checkout, but that’s for another day
  4. https://sbr.com.sg/telecom-internet/news/hdb-eyes-ai-powered-energy-system-in-tengah
  5. https://www.stengg.com/en/digital-tech/data-science-analytics-and-ai/
  6. https://www.simplypsychology.org/maslow.html
  7. Each vehicle in SG has an IU (In-vehicle Unit) and when you get into a car park, the IU number is read, the gantry opens, and upon exit the IU number is read, the time and relevant fee calculated and deducted from your cashcard within the IU, and the gantry opens.
  8. https://blackdotresearch.sg/secret-devices-installed-in-hdb-car-park-gantries-to-catch-tailgaters/
  9. https://tnp.straitstimes.com/news/singapore/hdb-crack-down-carpark-fee-evaders

* HDB is the Housing & Development Board, a government agency responsible for public housing in Singapore; around 77% of residents live in HDB flats. ST Engineering is a government-linked entity specialising in the aerospace, electronics, land systems, and marine sectors.

Sunday, 10 December 2023

NTUC fairprice shines a new path in AI

Recently, I was having a discussion on the potential effects of large scale adoption of LLMs (and AI in general), and one of the risks was a move towards uniformity/homogeneity, or a loss of randomness in the human experience. (1)

Basically, if algorithms are designed to give you ‘the most likely’ or ‘the best’ answer (this may not always have to be the case (2)), then everyone would get similar answers and be driven to the same things.

Add to this the fact that as more people use LLMs, more and more content on the internet will be LLM generated, and therefore the training data used for LLMs will include a higher percentage of LLM created data as opposed to human created data. 

Fear not!

A data scientist at NTUC fairprice in Singapore has managed to build a machine (apply an algo) that gives very interesting answers:


The AI built by NTUC understands that, after a meal, you can use 2 similar products, similar in the sense that you, as a human, have a choice to do 1 of 2 things:

  1. do the dishes using the sponge, or
  2. have a piece of chocolate

There still is hope!

Or despair: “mummy/daddy, it’s not me! It’s the machine that told me I could do either one since they are similar”


  1. https://www.linkedin.com/feed/update/urn:li:activity:7134835884893290497/
  2. https://www.linkedin.com/feed/update/urn:li:activity:7124551149268963328/


Sunday, 19 November 2023

You are overpaying for your vehicle insurance. It doesn't have to be this way.

I am sure you have been making this complaint over the years, but you didn’t have much choice, since prices are around the same and policies are designed to be sticky, or not so advantageous to get out of (that’s for another day).


Singapore General Insurance Association says so too!

But now, ladies and gentlemen, we have the ultimate confirmation. This comes from the GIA, the association that groups general insurers in Singapore: “About two in 10 motor insurance claims in Singapore are fraudulent, often involving exaggerated injuries and inflated vehicle damage”. (1)

 

And the president of Budget Direct

The president of Budget Direct, a company that usually prides itself on competitive rates, even admitted: “In the end, all motorists are victims of motor insurance fraud as we all end up paying higher premiums as a result”.

This is the key, you see: the claims paid out in fraudulent cases simply get translated into increased premiums for ALL vehicle insurance customers. Irrespective of whether you commit fraud, are a scam victim, or are accident/claim-free, you are paying for the fraudsters’ bread, butter, and cake, and the insurers maintain their healthy margins and profits.


The Insurers have no incentive to act on fraud

The thing is, the GIA is saying it is up to you, the customer, to stop the fraud. And that I find laughable. Let’s see what the main causes of fraud are, as per the GIA…

  1. Beware of Phoney Helpers: After an accident, individuals may offer "help" and pressure victims to follow their directions, often leading them to unauthorized repair shops or overpriced towing services.
  2. Staged Accidents: Scammers stage accidents, causing victims to collide with their vehicles and then falsely accuse victims of causing the collision. They often fake injuries and make substantial claims for damage and injuries.
  3. Phoney Witnesses: Suspect convenient witnesses who support the other driver's account, often suggesting a staged accident.

1 Unauthorised repair shops:

Most vehicle owners are aware of the workshops that their insurer accepts, whether from their own bad experience, from hearing from friends and family, or from the insurer. Plus, most of the time, unauthorised repair shops’ costs are not paid by the insurer; if they are, it is a bit rich on the part of insurers to honour the claim while complaining about it.

Plus, it is not rocket science to detect highly inflated claims based on the pictures and descriptions that accompany the claims. I know because I worked in an insurance company in a much less developed country than Singapore, and I know for a fact that they have the data needed to deal with this; the question is one of financials and will.

 

2 Staged Accident

And how is that the fault of the insured? The insured is getting scammed at the same time as the insurer – unless the GIA is claiming that the insured is somehow going along with the scammers… more on this later.

 

3 Phoney Witnesses

Again, how will someone who has just been in an accident be able to detect whether witnesses are phoney or not?

 

Unless Singapore is a nation of scammers (not the scammed/scam victims (2)), it just doesn’t make sense to think that the individual people involved in accidents are part of the scam. So should victims pay the price twice (once by being scammed, and a second time via higher premiums, and probably the loss of their NCB)?

 

So my arguments that follow assume that Singapore is not a nation of scammers (unlike (3)). After all, Singapore is only beaten by Finland, New Zealand, and Denmark in terms of corruption perception. (4)

 

The fact that the GIA mentions Staged Accidents and Phoney Witnesses seems to indicate syndicates are at play, or at best a group of people who are in the business of scamming accident victims. In fact, it is likely that staged accidents and phoney witnesses occur together, rather than separately.

 

You can have a staged accident without phoney witnesses, but very unlikely to have phoney witnesses to a real accident.

So chances are, there are syndicates/gangs/groups of scammers at work. It is ridiculous for GIA to expect an individual consumer to be able to detect them, don’t you think so?

 

So what can be done?

The answer, in most of my blogs, is Analytics!

 

Inflated Claims

I briefly mentioned the solution to the GIA’s issue 1, inflated claims. Analytical models can be built to detect inflated claims. The beauty of this is that it can even be employed to detect which workshops are cheating.
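As a toy illustration of such a model (not any insurer’s actual method), one could flag claims whose amount is a statistical outlier within its damage category; the record fields and the simple z-score rule below are my assumptions:

```python
from statistics import mean, stdev

def flag_inflated_claims(claims, z_cutoff=3.0):
    """Flag claims whose amount is an outlier within their damage category.

    `claims` is a list of (claim_id, category, amount) tuples; the fields
    and the z-score rule are illustrative, not an insurer's actual model.
    """
    by_category = {}
    for _, category, amount in claims:
        by_category.setdefault(category, []).append(amount)
    flagged = []
    for claim_id, category, amount in claims:
        amounts = by_category[category]
        if len(amounts) < 3:
            continue  # not enough history in this category to judge
        mu, sigma = mean(amounts), stdev(amounts)
        if sigma > 0 and (amount - mu) / sigma > z_cutoff:
            flagged.append(claim_id)
    return flagged
```

Group the flagged claims by repair workshop, and the cheating workshops surface by themselves.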

But, from experience, there is little willpower in senior management to do something that will rock the boat. It is important for analytics people to learn that not everything that can be done will be done; other factors come into play – obviously whether it is financially viable (in this case I am quite sure it pays for itself quite quickly: a couple of months of work to build, another month to fine-tune, and low running costs for a basic solution), or politically viable (is it worth opening Pandora’s box at your preferred workshops?).

In sum, technically easy to solve and pays for itself, management wise depends on management.

It's even worse at the GIA level where, as people in SG know, some workshops are on the panel for multiple insurers.

 

Staged Accidents and Phoney Witnesses

Accidents, by their nature, are (most of the time) unexpected. Hence, by simply looking at, say, road and traffic conditions and location, it is not that straightforward to estimate the probability of an accident and highlight the stranger ones; one of the reasons is that humans play a large role, and it is not so easy to get data on all the actors involved – not only all the drivers involved and their data, but also drivers in the immediate vicinity. (5)

 

The easy way to detect staged accidents and phoney witnesses is to focus on the people, not the vehicles. The key assumption is that these are the work of groups of people, who are hence likely to play different roles at different times. Let me put it this way: how likely is it that someone is a claimant, a witness of an accident, and at fault in a vehicular accident, all within, say, a year?

 

The idea is that, chances are, a member of the group is likely to play different roles over time, sometimes even with different insurers to make the chances of detection lower. This is something very easy to pick up using social network analytics, especially at the GIA or police level.
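A minimal sketch of this role-counting idea. Real social network analytics would also link people through shared accidents, addresses, or phone numbers across insurers; the record fields and the “3 roles in 3 accidents” rule here are illustrative assumptions:

```python
from collections import defaultdict

def flag_suspicious_parties(records):
    """Flag people who appear in several different roles across accidents.

    `records` is a list of (person_id, accident_id, role) tuples, where
    role is e.g. "claimant", "witness", or "at_fault". Illustrative only.
    """
    roles_by_person = defaultdict(set)
    accidents_by_person = defaultdict(set)
    for person, accident, role in records:
        roles_by_person[person].add(role)
        accidents_by_person[person].add(accident)
    # An ordinary motorist rarely plays 3 different roles in 3 accidents
    return {
        person
        for person, roles in roles_by_person.items()
        if len(roles) >= 3 and len(accidents_by_person[person]) >= 3
    }
```

At the GIA or police level, where records from multiple insurers can be pooled, even this crude cross-tabulation would make role-switching syndicate members stand out.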

 

Conclusion

Saying that 20% of claims are likely to be fraudulent and placing the onus on customers/insured in the case of vehicular insurance in Singapore is a joke.

  1. The main causes as stated by the GIA are unlikely to be caused by claimants.

  2. The GIA itself (or, to a lesser degree, the large insurers) has the data easily at hand to detect potentially fraudulent cases effectively.

  3. However, the insurers (and the GIA) have little incentive to do so, since they can simply pass the costs on to customers.

 

However, relatively simple analytics can, right now, help alleviate this problem and allow customers to pay lower premiums since the risk of fraud can be mitigated. It is just a question of will from the insurers’ point of view.

 

  1. https://insuranceasia.com/insurance/news/20-singapores-motor-insurance-claims-are-fraudulent-giaj
  2. https://www.straitstimes.com/world/14-trillion-lost-to-scams-globally-s-pore-victims-lost-the-most-on-average-study
  3. https://www.youtube.com/watch?v=q5PI5ZtJTSY
  4. https://www.transparency.org/en/cpi/2021
  5. That is not actually true anymore in Singapore, I will explain in a subsequent blog.



Sunday, 5 November 2023

GenAI thing, bonus: hype cycle

Gartner is an organization that classifies different technologies into its “hype cycle” framework. (1) Basically, any piece of technology may go through 5 stages:

  1. Technology Trigger
     • A technology reaches a proof of concept, a successful experiment; people get excited.
  2. Peak of Inflated Expectations
     • Given the excitement, some companies jump in and experiment; some succeed, most do not.
  3. Trough of Disillusionment
     • Given the failures, some technology versions fail, and investment into the space gets hit; it will only recover if providers iron out the main issues.
  4. Slope of Enlightenment
     • As the technology becomes production-ready, more successes are created and the usage and limits of the technology are better understood. New-generation products appear.
  5. Plateau of Productivity
     • Mainstream adoption; what was a successful niche spreads.

 

Guess where Gartner placed Gen AI in its 2023 AI hype cycle?


(2)

 

That’s right, right at the peak of inflated expectations. Plus, they only see the plateau of productivity being reached in 5 to 10 years.

On the other hand, something like computer vision, where we use machines to process images to extract meaningful information, is close to the plateau of productivity. There are many pieces of software/APIs that help you analyse images very efficiently, and, very importantly, there are proven use cases in production for computer vision: from facial recognition to control access, to recognising who is not correctly wearing a mask (useful during COVID), to detecting anomalies in X-rays/MRIs, to identifying and tracking people via public cameras (ahem…).

GenAI, on the other hand, has made a big splash; people around the world, notably including non-data professionals, are raving about the possibilities GenAI can bring. AI is already being used whether we are aware of it/like it or not, for example in the UK (3); now imagine GenAI (in an earlier blog I listed a few well-known issues with LLMs).

So what have people been doing with GenAI? One of the avenues being explored is helping humans write code, and there are many, many examples of this; for example, the ubiquitous GitHub Copilot (4). But as I asked in an earlier blog, do you think the code that is written is of very high quality, given that it is built on ‘everyone’s’ coding?

There have also been efforts to help manage GenAI. Actually, apart from the coding co pilot, the other development from Microsoft Build (5) earlier this year is the guardrails Microsoft put around GenAI. And this can be leveraged, as OCBC has done (6) with MS Azure to allow fact checking, not blindly following the answers generated: curation! (7)

The reality is, I believe GenAI is a very useful tool to have in your arsenal. But more ‘traditional’/‘tried and tested’ methods may be more suitable for your problem at hand. I have had customers saying “I just want GenAI” whether their use case suits it or not. I would just point to the “peak of inflated expectations”.

I am someone who enjoys building stuff that works and enables organisations to hit business KPIs, and to do that, choosing the right tool is very important – and this is something I can help with. You can use a sledgehammer to open a can of beans, and you can use a can opener too; guess which, currently, more efficiently gets you to the beans and deals with your hunger?


  1. https://en.wikipedia.org/wiki/Gartner_hype_cycle
  2. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
  3. https://www.theguardian.com/technology/2023/oct/23/uk-officials-use-ai-to-decide-on-issues-from-benefits-to-marriage-licences
  4. https://github.blog/2023-03-22-github-copilot-x-the-ai-powered-developer-experience/
  5. https://news.microsoft.com/build-2023/
  6. https://www.straitstimes.com/business/ocbc-to-deploy-generative-ai-bot-for-all-30000-staff-globally
  7. Interestingly, if you look again at the AI hype cycle 2023 diagram above, "Responsible AI" is also at the peak of inflated expectations, humans still, fortunately, have more thinking to do...


Sunday, 29 October 2023

Gen AI Thing, Part III : finally getting to the crux of the issue

In my previous 2 blog posts, I explained, as I would explain to my mum, what AI is, and how machines can understand our words. In this blog, I explain heuristically (non-technically / using common sense) how machines can not only understand what we are saying, but also respond intelligently and create answers that were not there before (Gen AI).

One more little detour

For those of you who have played with ChatGPT or something similar (Bard?), one of the things that puzzles people is the concept of ‘tokens’. Some of you may ask, since I claim that machines can understand human words well enough, what is this token thing? Are we gambling when using these ‘tools’?

Gambling? Yes, maybe… Some examples of funky ChatGPT (and Bard) results:

  • ChatGPT making up cases to support its argument (1)
  • ChatGPT making up cases of sexual harassment and naming the supposed perpetrator (2)
  • Bard making a mistake regarding the James Webb Telescope (3)

There are ways to mitigate these issues, but this is beyond the scope of this blogpost. Suffice to say that such models do give information that may be less than reliable. But then again, they were not designed to ‘tell the truth, the whole truth, and nothing but the truth’.


ChatGPT/Bard are not designed to tell the truth?! I thought they were AI!

The answer to the question lies in the question itself. These are AI systems, and as I mentioned in my 1st blog of the series, such models learn from (are trained on) data they are fed. Secondly, it may help to understand a bit how these systems, they are called LLMs (Large Language Model), work.

 

How does ChatGPT/Bard… work?

Let me start with a word game. How many of you play Wordle (4)? Basically, every day a 5-letter word is chosen, and you have to guess the word without any clue; you have 6 tries. All you will ever know is whether a letter you have suggested exists in the answer but is in the wrong slot (yellow), is in the correct spot (green), or does not exist at all (black). The other condition is that any combination of letters you try has to be an existing word.

The thing is, most people, once they know the position of one letter, will try to guess the letters next to it based on what they know about the English language, for example (5):

  • ‘E’ is the most common letter in English and your best bet if you know nothing about the word; it is followed by ‘T’ and ‘A’.
  • If there is a ‘Q’, chances are there has to be a ‘U’, and chances are the ‘U’ follows the ‘Q’.
  • If there is a ‘T’, except in 5th position, then the next letter is most likely an ‘H’, then an ‘O’, then an ‘I’.
  • If there is an ‘H’, except in 5th position, then the next letter is most likely an ‘E’, then an ‘A’, then an ‘I’.

Combinations of 2 letters such as ‘QU’, ‘TH’, ‘TO’, ‘TI’ are called bigrams. The idea is that once you know a letter, you use this information to find the most likely following letter – this is known as conditional probability. On the condition that one letter is a ‘T’, the most likely following letter is an ‘H’, not an ‘E’ (the most common letter in English). The key is that your choice of letter changes based on the information you have.

These are shortcuts, findings based on analysis of words, that can help you guess the letters in wordle.
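The bigram counting behind these shortcuts can be sketched in a few lines of Python, estimating P(next letter | current letter) from any word list (the three-word corpus in the usage example is just a toy):

```python
from collections import Counter

def next_letter_probabilities(words, letter):
    """Estimate P(next letter | current letter) from a word list.

    A toy version of the conditional-probability idea described above.
    """
    bigrams = Counter()
    for word in words:
        word = word.upper()
        # Count every adjacent pair of letters in the word
        for a, b in zip(word, word[1:]):
            bigrams[(a, b)] += 1
    total = sum(count for (a, _), count in bigrams.items() if a == letter)
    return {
        b: count / total
        for (a, b), count in bigrams.items()
        if a == letter and total
    }
```

Run over a full dictionary instead of a toy list, this recovers exactly the ‘T’ is followed by ‘H’ kind of shortcut a Wordle player uses.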

 As an aside, the most common bigrams in different languages can be very different (6)(7)

Popularity   English   French   Spanish
    1          TH        ES       DE
    2          HE        LE       ES
    3          IN        DE       EN
    4          ER        EN       EL
    5          AN        ON       LA

 

Letters are fine, but Gen AI generates whole documents, not random letters

It’s just an extension of the idea. In the example above, I used bigrams (2 letters), when playing wordle, some people may choose trigrams (3 letters), it’s basically the same thing, just a little bit more complex.

The next step then is that instead of guessing the next letter (using a bi-gram), you guess the next word. But why stop there? You can actually go beyond a bi-gram and use multiple letters (here words). It’s, in principle, that straightforward. However, to improve the performance, there are a few more tricks.

The problem is the size of the data: given the number of words, the possible combinations increase exponentially as you add more words. The brilliant thing, or one of the more brilliant things, about LLMs is that they generate a probability of a combination of words occurring. They do that by using an underlying model that recognises the patterns.
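The word-level extension of the bigram idea can be sketched directly: count which word follows which, then always suggest the most frequent follower. An LLM replaces this brittle lookup table with a model that generalises, precisely because the table explodes as contexts grow longer:

```python
from collections import Counter, defaultdict

def train_word_bigrams(text):
    """Count which word follows which: the word-level analogue of the
    letter bigrams above."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        following[a][b] += 1
    return following

def most_likely_next(following, word):
    """Return the most frequent word observed after `word`, or None."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]
```

Note the failure mode: any word never seen in training has no entry at all, which is one reason a counting table cannot scale the way a model can.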

AI, NN, and the human brain

As mentioned in Part 1 of this blog, AI is about making a machine think like a human. The way this has been done in Neural Networks is to make a representation (model) of the human brain, with nodes and connections. And, as is thought to be the case with the human brain, each node does a fairly simple job (one of the simplest is a binary yes/no threshold – a node doing this is called a perceptron), and the connections between them are given weights based on how important they are.

 


Note that as Neural Nets have progressed, they have taken on a life of their own, and the idea of mimicking the human brain’s structure is no longer central; the architecture of neural nets, while still using nodes and connections, can be quite different.

Going back to the chair and table example

When you show the machine a picture, it breaks the picture down into small parts (features), maybe the length of a leg or the shape of the back, and assigns weights based on how important these features are. After being trained on many examples, the model is ready to distinguish between a table and a chair.

The illustration above shows a very simple type of Neural Network: one input layer where you start, one hidden layer of nodes, with connections in one direction to do the magic, leading into the output layer. For classifying tables and chairs from images, it has been found that neurons arranged in a grid formation work well, specifically a Convolutional Neural Network. Basically, a set of filters is applied to detect specific patterns in the picture (convolutions), then these are summarised and combined (more layers) to extract the more salient features without burning enormous resources, and finally pushed to the output layer; in the case of our chair/table classification there would be 2 nodes in the output layer, the output being the probability that the image fed in is a chair or a table. (9)
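The convolution step described above amounts to sliding a small filter over the image and recording how strongly each patch matches it. A bare-bones sketch, with images as plain lists of numbers; a real CNN learns the filter values during training rather than having them hand-picked as here:

```python
def convolve2d(image, kernel):
    """Slide a small filter (kernel) over an image and record how
    strongly each patch matches it -- the basic convolution step."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Element-wise multiply the patch by the kernel and sum
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out
```

With a left-dark/right-bright kernel such as [[-1, 1], [-1, 1]], the output lights up exactly where the image has a vertical edge, which is the kind of pattern a chair leg produces.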

There are many ways to structure a neural net, and many parameters to play with. You wouldn’t be surprised that one of the important innovations was the realisation that, for processing text, it is important to know what else is in the sentence, and not process each word independently. So there was a need to be able to refer back to past words. Long Short-Term Memory (LSTM) (10) allowed this to happen, by letting the user control how long some nodes retain information, and hence be used to provide context.

However, LSTM is not that fast, as it processes information sequentially, like many of us do: we read word by word. (11) In 2017, a team from Google came up with the brilliantly entitled paper “Attention Is All You Need” (12). This gave rise to the rise of the Decepticons (13), sorry, to Transformers (14). Basically, when processing a chunk of text, the machine calculates weights using an attention network, working out which words need to be given a higher weight. While Transformers can be run sequentially, they can also be run in parallel (no recursion), hence the usefulness of GPUs for LLMs.
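The core attention calculation can be sketched for a single query vector: score each key against the query, turn the scores into weights with a softmax, and return the weighted sum of the values. This is the scaled dot-product attention from the paper, stripped of the learned projection matrices and multiple heads of a real Transformer:

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query: softmax of the
    query-key similarities, used to weight the values."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax (shifted by the max score for numerical stability)
    exp_scores = [math.exp(s - max(scores)) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    # Weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

Every query can be processed independently of the others, which is why the whole computation parallelises so well on GPUs.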

To answer a friend’s question, GPUs are not necessary in LLMs, but they really speed things up. (15)

Is LLM therefore just a better chatbot?

You must be thinking that LSTM is something that has been used in chatbots before, and that LLMs, as I have explained here, basically just answer your queries…

Actually, no. One huge difference between chatbots and LLMs is how they learn. LLMs use reinforcement learning (I sneakily introduced this in Part I of this series; there is even RLHF, Reinforcement Learning from Human Feedback…). Also, the volume and diversity of the data LLMs have been trained on is vastly different from what traditional chatbots are trained on. LLMs can ‘talk’ about many more topics/intents than a traditional chatbot, which is usually more focused.

However, the comparison with a chatbot is an interesting one. The interest in LLMs really took off with GPT3.5. As the name suggests, it is not the 1st offering in the GPT family from OpenAI. So what made GPT3.5 garner so much interest (GPT-1 was released in 2018, GPT2 in 2019, GPT3 in 2020, and GPT3.5 in 2022 (16))? One reason was that it suddenly improved; the second was that a friendly chat interface was included, allowing virtually anybody with an internet connection to play with it and become an instant advocate.

A few more points

GenAI, here LLMs, basically smartly and quickly processes word/token embeddings to understand you, and produces a response. The key to understanding them, as I mentioned earlier, is to know they are not designed to give you the truth; they answer: “what would a likely answer be?” Actually, not only that: GenAI gives you the likely answer of an average person (thank you Doc for pointing this out clearly). Think about it: if it is trained on the whole internet, and ranks the most likely answer, then the most likely answer may not be that of the people who really know what they are talking about. Hence my thought that LLMs can help so-so coders, but expert coders may not be helped that much; they probably know better.

Questions to ponder:

  • Do you believe that logic is something that is common in humankind? Is common sense really that common?
  • How about Maths: do you believe that people are generally good or bad at Maths?
  • Why am I asking this? Simple: now tell me, do you think LLMs are good at logic? At Maths?

Is most likely always the best?

Now, there’s one more thing: you can influence what GenAI responds to you. I mentioned that these models basically rank all possible words and pick one; maybe your first instinct is to always pick the highest-probability word.

That would give you consistent answers over time. However, always using the highest-probability response often leads to circular and less-than-satisfactory answers. Hence, most people choose to allow some randomness (ChatGPT calls this temperature (17)).
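Temperature works by rescaling the model’s probabilities before sampling: low values sharpen the distribution towards the top choice, high values flatten it. A minimal sketch; a real LLM applies this to logits over its whole vocabulary rather than to a small hand-written probability table like this one:

```python
import math
import random

def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Pick a token from a probability map after temperature scaling.

    Low temperature -> almost always the top token (consistent answers);
    high temperature -> more randomness in the choice.
    """
    tokens = list(probs)
    # Divide log-probabilities by the temperature, then re-softmax
    logits = [math.log(probs[t]) / temperature for t in tokens]
    exp = [math.exp(l - max(logits)) for l in logits]
    total = sum(exp)
    weights = [e / total for e in exp]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

At temperature 0.01 the call below behaves almost deterministically; at temperature 2.0 the three tokens become nearly equally likely.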

Conclusion:

GenAI is a great tool (what you can do with GenAI, whether you are an SME, an individual looking to make your own life easier, or a large organisation, may be a topic for a next blog). What it does is come up with a possible answer based on the data it has been trained on. (Another blog post could be why GenAI is not the answer to everything, but that’s probably obvious.)

 

  1. https://www.channelnewsasia.com/business/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-3581611
  2. https://www.businesstoday.in/technology/news/story/openai-chatgpt-falsely-accuses-us-law-professor-of-sexual-harassment-376630-2023-04-08
  3. https://www.bbc.com/news/business-64576225
  4. https://www.nytimes.com/games/wordle/index.html
  5. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
  6. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/french-letter-frequencies/
  7. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
  8. you can also adjust how you penalise mistakes, known as the loss function; so that’d be a 4th way.
  9. https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network
  10. http://www.bioinf.jku.at/publications/older/2604.pdf
  11. LSTM evolved from Recurrent Neural Networks (RNN) where the idea was that you can look back at information you processed earlier (hence recurrent), however if the information was far back, there were problems referring to it.
  12. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  13. https://tfwiki.net/wiki/Rise_of_the_Decepticons
  14. https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
  15. Memorable demo of CPU vs GPU https://www.youtube.com/watch?v=-P28LKWTzrI
  16. https://en.wikipedia.org/wiki/GPT-4
  17. https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683


Sunday, 22 October 2023

Gen AI thing Part II: Talking in human language, not in machine language

In my previous blog, I explained that AI is about making machines think like humans, and I gave an example of a human task of recognising objects and how you can get a machine to do that. In this blog, I will expand a bit more on how we can all become Dr Dolittle (1) but with machines rather than animals.

A few years ago, someone on LinkedIn asked me what coding language I would recommend a child learn, since he was making the decision for his newborn. I said that, in my view, rather than humans learning how to speak machine language (coding), sooner or later machines would learn how to understand human language (think NLP, Natural Language Processing), and it would be more important for a child to learn how to think systematically but also creatively than to learn how to code. I haven’t heard from that person since. Well, ‘sooner’ has happened (2)(3)(4). But I am jumping the gun.


For the 2nd time, what is Gen AI!?

Gen AI is basically using AI to create something that wasn’t there before. What is created can be text, an image, a sound… But the trick is that first the machine has to learn (that is, be trained on a bunch of data/examples), and only then can it produce something.

But what excites most people is that anyone can use Gen AI because the machine speaks human language (no code and you can access the mythical AI!). I will tackle this part first.


The machine understands me!

Another branch in AI/ML is NLP, Natural Language Processing. NLP is precisely concerned with making machines understand what humans are saying. You can imagine, it’s already quite difficult for humans to understand each other, now imagine machines…

Language is a very complex, living thing: new words are added all the time, words gain meanings over time, words may mean different things in different contexts, humans use irony, sarcasm… But it is worth the trouble, because a huge amount of knowledge is kept in language, whether in oral or written form. With the advent of the internet, the digitisation of dictionaries and research papers (making them digital, bits and bytes, rather than analogue, like a printed page), and the democratisation of access (any idiot can write a blog, but smart people know which ones to read), there is a treasure trove of information on the internet that can be used to train a machine. But language is not that easy to deal with.


Words are all I have

In my previous post I talked about classification, and one of the keys is to measure the distance between things and decide which are similar. How does that apply to words?

But computers are all about numbers, not words…

The first challenge machines have in comparing words is that they do better with numbers. So the first trick is to turn the problem into one that involves numbers: once you know how to measure distance, deciding which word is closer is not so hard.

Look at the words “BETTER” and “BUTTER”. How close are they?

There is only 1 letter of difference, so these 2 words are quite close: it’s just replacing a letter. There are distance measures that make exactly this kind of calculation (edit distance is the usual name), often taking into account the number of letters in the words. These algorithms are quite useful. The idea is that words are similar if it takes little effort to change one into the other.

Now, let me add the word “BEST” to the comparison. As an English-speaking person, you would say “BEST” is close to “BETTER” but not so close to “BUTTER”; going purely by replacing letters misses the meaning. So there must be a better way.
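The post doesn’t name the algorithm, but the letter-replacing idea above is usually formalised as edit (Levenshtein) distance. Here is a minimal sketch, using the words from the example:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-letter insertions, deletions or
    substitutions needed to turn word a into word b (Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a letter from a
                            curr[j - 1] + 1,      # insert a letter into a
                            prev[j - 1] + cost))  # substitute (free if equal)
        prev = curr
    return prev[-1]

print(edit_distance("BETTER", "BUTTER"))  # 1: a single substitution
print(edit_distance("BETTER", "BEST"))    # 3: much "further", despite the shared meaning
```

One substitution separates BETTER and BUTTER, while BETTER and BEST are 3 edits apart, which is exactly how this measure misses meaning.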

 

Vector Embedding

Just as humans can refer to a dictionary for words, there is a source of information machines can refer to that tells them the relationships between words (humans can use it too). This is called vector embedding.

 Vector Embedding: Imagine

Imagine a 3-dimensional space in front of you. A point in this space represents a word. A vector for that word is like directions to that point in space (here, the x, y and z coordinates). Each word is embedded in the space such that words that are closer together have similar meanings/contexts. One really popular technique, made public by Google, is called word2vec: it basically transforms a word into a vector while preserving the meaning of the word.

So, to follow our example, in the 3D space, ‘BETTER’ and ‘BEST’ will be close to each other, and ‘BUTTER’ further away (closer to ‘MARGARINE’ and ‘MARMALADE’).

 Points in space, and more

Not only are words that are similar grouped together so the machine can get the topics in a piece of text, but the relationships between the points in space also have meaning: moving from “BETTER” to “BEST” is the same journey as moving from “WORSE” to “WORST”.

This is something worth thinking about: not only do vector embeddings bring words that are about the same thing close to each other, but from the distance and the direction (6), the relationship between the words can be inferred.
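To make the “points in space” idea concrete, here is a toy sketch with hand-made 3-D vectors. Real word2vec embeddings have hundreds of learnt dimensions; these numbers are invented purely for illustration:

```python
import numpy as np

# Hypothetical 3-D embeddings, hand-made for illustration only.
vec = {
    "BETTER": np.array([2.0, 1.0, 0.0]),
    "BEST":   np.array([3.0, 1.0, 0.0]),
    "WORSE":  np.array([2.0, -1.0, 0.0]),
    "WORST":  np.array([3.0, -1.0, 0.0]),
    "BUTTER": np.array([0.0, 0.0, 5.0]),
}

def cosine(u, v):
    """Cosine similarity: 1 when two vectors point the same way, 0 when unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similar meaning = small distance: BETTER is far nearer BEST than BUTTER
print(np.linalg.norm(vec["BETTER"] - vec["BEST"]))    # 1.0
print(np.linalg.norm(vec["BETTER"] - vec["BUTTER"]))  # ~5.48
print(cosine(vec["BETTER"], vec["BEST"]))             # ~0.99
print(cosine(vec["BETTER"], vec["BUTTER"]))           # 0.0

# Same journey: the BETTER -> BEST offset equals the WORSE -> WORST offset
print(np.allclose(vec["BEST"] - vec["BETTER"], vec["WORST"] - vec["WORSE"]))  # True
```

The last line is the “direction” point in action: the step from one word to its superlative is the same vector, wherever you start.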


What is the big deal with vector embeddings?

The beauty of vector embeddings is that some large organisations like Google have made their vector spaces available for anyone to use, so we do not have to train the models ourselves, word2vec (5) being one example. In some cases, say you are dealing with a very specialised topic like medicine, you should use specialised vector embeddings, but in most cases, for the machine to understand what a human is saying, generic vector embeddings work well enough.

Therefore, the machine is able to know what we are saying whether we use the same words or not, because with embeddings it can now see which words are close to each other in meaning and how they relate to others. That’s great!

What this means is that it is possible to train the machine on millions of pieces of text on a bunch of topics, and it will be able to understand that some of them are talking about the same thing even if the words used are different.


Ok, but this is not new right?

Correct! Vector embeddings aren’t a 2020s thing (7). In the 1950s, John Rupert Firth made a statement that underlies a lot of the thinking today:

               “You shall know a word by the company it keeps” J.R. Firth 1957 (8)

However, in the 1950s we did not have the computing resources we have today. So AI went into winter: people could think about it, but it was very hard to put into practice. For example, consider the number of words in a language (9): the English Wiktionary (10) contains around 700k base words and 1.4m definitions. If you want to place all of these in a space that preserves their meanings, you need many groups spread across many dimensions, and worse, there will be dimensions with few words in them, making computation really tough (the curse of dimensionality (11)). Our brains handle 4 dimensions easily (our 3D world + time; next time someone is late for a meeting, introduce them to the 4th dimension 😊). Some research points to humans being able to handle a few more (12), but still nowhere near as many as are required to plot even only the common words in English.
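The computational pain is easy to demonstrate: in high-dimensional spaces, randomly placed points all end up roughly the same distance apart, so “closest” stops being informative. A small experiment of my own (uniform random points, a toy setup, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n=500):
    """Ratio of nearest to farthest distance, measured from one query point
    to n-1 uniform random points in a dim-dimensional unit cube.
    Near 0: distances vary a lot. Near 1: everything is equally far away."""
    points = rng.random((n, dim))
    d = np.linalg.norm(points[1:] - points[0], axis=1)
    return d.min() / d.max()

for dim in (2, 10, 1000):
    print(dim, round(distance_spread(dim), 2))
```

In 2 dimensions the nearest point is far closer than the farthest; by 1000 dimensions the ratio creeps towards 1, and nearest-neighbour reasoning gets much harder.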

Note that not everything stopped; people made progress in many other directions.

In the 2000s, research heated up and some great leaps were made. For example, research by Yoshua Bengio and colleagues in Montreal proposed the path forward: “We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.” (13)

Oops! Getting too geeky here; let me just summarise the point about vector embeddings. The thing with machines is that they don’t understand language just like that. So, one of the ideas was to convert words into numbers (vectors). Words that are about the same thing are then grouped together, so if you use slightly different words from me but we are saying the same thing, the machine can tell. The neat thing about the numbers is that doing maths on them allows the machine to understand the relationships between words: for example, the relationship between “king” and “man” is the same as that between “queen” and “woman” (14).

The machine is now ready to understand you!

Add to this that specialised vector embeddings exist for specific fields, and the machine can understand you generally, or even when you are asking in-depth questions on specialised topics.

So, what this helps with is letting machines store all the info they have access to in a way that is very easy for them to search and make use of, so they can figure out, to a large degree, what you are talking about. It is not perfect; that is why the role of prompt engineer exists (someone who speaks the ‘human language’ the machines understand). Personally, I think that with advances in NLP and machines being trained by interactions with humans, sooner or later there will be less need for prompt engineering; we (as in humans and AI) will all speak a ‘common language’, a bit like how some people speak differently to their children (or pets) or ‘foreigners’ compared to their own friends and family.

 

But still this is not Gen AI, where is the Generative part?

True, we are getting there…

In my previous blog and this one, I explained how machines can be made to think like humans, and how advances in technology have made it easier to make training data available to machines so they can understand, to a large extent, what humans are saying.

The next step is how machines can now create stuff; I will focus on how machines can write things that have not been written before. That will be the topic of the 3rd and last part of this loooong blogpost.

 

 

  1.  https://www.youtube.com/watch?v=YpBPavEDQCk
  2. https://ai.meta.com/blog/code-llama-large-language-model-coding/
  3. https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
  4. https://cloud.google.com/use-cases/ai-code-generation
  5. https://en.wikipedia.org/wiki/Word2vec
  6. That’s the basic thing about vectors, they are about ‘magnitude and direction’ https://en.wikipedia.org/wiki/Vector and the relationship between them can be ‘easily’ mathematically calculated
  7. https://en.wikipedia.org/wiki/Word_embedding
  8. https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
  9. https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
  10. https://en.wiktionary.org/wiki/Wiktionary:Main_Page
  11. https://en.wikipedia.org/wiki/Curse_of_dimensionality
  12. https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
  13. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
  14. https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/




Sunday, 15 October 2023

Gen AI thing Part I: Plato’s spark

A friend recently asked me to help him understand the “gen AI thing” at a level that would allow him to have discussions (and since he knows me well, he knows this comes with opinions). I decided to go a level simpler and try to explain Gen AI in a way my mum would understand (she’s in her 80s, and my recent victory was getting her to carry her mobile phone when she is out of the house). I figured it would take me a while, so I broke the explanation into smaller, more digestible pieces. Here is Part 1.

What is Gen AI?

Before we go there….

First, what is AI? (with apologies to my brother, Dr. AI)

Humans are a very arrogant species, so we decided that the way we think is something worth replicating. Hence, if we could make machines think like humans, we would have something fantastic. Basically, machines don’t get tired easily, and you can expand the capacity of a machine much faster than that of a human (hopefully (1)).

AI is basically that: how do we get machines to think like humans?

So, what does it mean to think like a human?

How do you think?

Let’s take a simple example (a simple application of thinking like a human): you see a piece of furniture in a shop; how do you decide that it is a chair or a table (assuming someone hasn’t written “this chair/table for $xxx”)?

Enter Plato!

This is not a new question. Plato (~428-348BC, that’s close to 2500 years ago) came up with the theory of forms, and that made me fall in love with PH102. The basic idea is that there is a world where the perfect form of every item in our world exists. So, I thought, that makes sense! I know whether something is a chair or a table by comparing it to the ideal form: is it closer to the ideal chair, or the ideal table?

What does closer mean?

If you have read other articles by me, you will remember I love talking about distance: closer means smaller distance. An object is more likely to be an A than a B if it is closer to form/ideal A than to form/ideal B. This part is easy; how you define closer is where the fun begins 😊

Plato’s Theory of Forms

So, when I started playing with data, Plato’s theory of forms helped me a lot. The main difference is that, since I can’t access the world of ideals/forms, I have to base my version of a form on what I had seen before.

The tables I had seen were 4-legged, came up to waist height (since my teens), and had a large flat surface at the top so you can put stuff on it. Usually they were made of wood, although the legs could be metal. Chairs were shorter, below waist height, but also usually had 4 legs and were made of similar materials. However, chairs also had a back; the flat surface was not the highest point of the chair, the back was, so a person can sit on the flat seat and rest his/her back against it.

So, when I see a new object, I decide whether it looks more like a chair or a table based on whether it is closer to the typical form I have in mind. Note that I am not comparing against just the words as I described table and chair above, but against the more complicated concept I have in mind (like an ideal form).

While humans learn from experience, machines can be made to learn too. Instead of telling the machine the short, ungainly descriptions of a chair and a table above, the trick is simply to give it thousands of examples of things we know are chairs and tell it “these are chairs”, and the same for tables. You train the machine so that it comes up with its own view of what a chair is and what a table is. This is the training part of a model.

In this case, we train the model by feeding it images of chairs with the label that these are chairs, and the same for tables. This is called supervised learning, since someone supervised the process by providing these presumably accurate labels.

For now, we skip how the machine breaks down the images; let’s just assume the machine now knows what chairs look like and what tables look like. We then feed it a new image of a piece of furniture without a label, and it will tell us “this is likely a chair” (or a table) depending on what it has learnt. The machine has solved the classification problem by deciding whether the new unlabelled furniture is a chair or a table.
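The whole train-then-classify loop above can be sketched in a few lines. This is a 1-nearest-neighbour rule on made-up furniture measurements (height and whether it has a back); the numbers and features are invented for illustration, not from any real system:

```python
# Each training example: (height in metres, has a back? 1/0), plus a human-given label.
training = [
    ((0.45, 1), "chair"),
    ((0.50, 1), "chair"),
    ((0.42, 1), "chair"),
    ((0.75, 0), "table"),
    ((0.90, 0), "table"),
    ((0.72, 0), "table"),
]

def classify(features):
    """1-nearest-neighbour: label the new object like its closest training example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    _, label = min(training, key=lambda ex: dist(ex[0], features))
    return label

print(classify((0.48, 1)))  # chair
print(classify((0.80, 0)))  # table
```

The “training” here is just remembering labelled examples; the “deciding” is picking the closest remembered form, which is Plato’s idea with a ruler.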

Now, nobody stops you from training the machine on other pieces of furniture, and animals, and all sorts of other things… After all, that’s how we learnt, no?

Thought experiment:

Imagine you are walking about, and from afar you see something. How do you decide whether this thing with 4 black legs and a black-and-white splotchy pattern on the top and sides is a table, or a cow, or maybe a dalmatian?

How would your thinking process go?

Would it be faster if you remembered you were in a field in the middle of a farm, or close to a nature inspired furniture shop?

For me, yes; based on the context (where the object is), I can make the process simpler by focusing on a smaller list of likely choices rather than the whole list.

This is why you get faster and likely better results from a specialised machine (a farm-animal identifier in the first case, or a furniture classifier in the second) than from a generic machine: a machine trained only on furniture will identify the table much faster and more accurately than one that has also learnt about cows and dalmatians. However, the furniture classifier will fail if someone asks it to identify a dalmatian… Hence, machines/algos trained on a specific set of data are usually better at working within that theme/context, but will not do so well on things from different contexts.

It should not be surprising: if someone from the tropics had never even heard of snow, he/she would be flabbergasted the first time, maybe even think it was volcanic ash… But someone who has lived in the snow would even be able to tell you the type of snow (4); it all depends on what you need. Similarly, I know many Mandarin/Cantonese/French speakers who claim that there are nuances in their languages that are not present in English. Again, it depends on what the people who use the language use it for.

If I had not seen a chair and a table before, maybe I could check a dictionary:

  • Chair: a piece of furniture for one person to sit on, with a back, a seat and four legs (2)
  • Table: a piece of furniture that consists of a flat top supported by legs (3)

Then based on these definitions try and decide…

But you will tell me, wait, the human has a lot of work to do, he/she has to label the pictures.

Well, yes, for supervised learning, much as a child asks adults: “what is this? And this? And this? How about this?”. But you will recognise the work the child puts in: the child takes in the image he/she sees, commits it to memory in one shape or form, and later, when he/she sees a new object, decides whether it is a chair, a table or something else.

It is also possible to feed the machine unlabelled pictures, and it will decide by itself how many categories of objects there are (or you can tell it how many you want). It will create its own view of things, and when presented with a new picture after training, it will decide whether that object is a chair or a table. This is called unsupervised learning.

There is also reinforcement learning, whereby the machine is given feedback on what it has predicted, and can therefore continue learning by analysing what went right and what went wrong.

Now, whether you choose supervised or unsupervised learning is up to you; there are reasons for and against each. Not only that, but how you choose to learn or group things also makes a difference to the output you will get and to the ability of the model/algorithm to properly classify things. This is something I am geeky about, but it is not for this blog post.
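For the unsupervised case, one common method is k-means, which finds the groups without ever seeing a label. A minimal sketch on the same kind of made-up furniture measurements (again, my own toy numbers, not from any real system):

```python
import numpy as np

rng = np.random.default_rng(42)

# Unlabelled furniture measurements: (height, has a back? 1/0). No one says which is which.
data = np.array([
    [0.45, 1], [0.50, 1], [0.42, 1],   # (secretly chairs)
    [0.75, 0], [0.90, 0], [0.72, 0],   # (secretly tables)
])

def kmeans(points, k=2, iters=20):
    """Plain k-means: alternately assign points to the nearest centre,
    then move each centre to the mean of its assigned points."""
    centres = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centres[None, :], axis=2), axis=1)
        centres = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(data)
print(labels)  # the first three share one group id, the last three share the other
```

The machine ends up with a “chair-like” group and a “table-like” group on its own; a human only names the groups afterwards.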

You will agree this is a very useful thing to have in your back pocket, and the practical applications are very, very vast. For example, a few years ago, I found it was not too hard to build something that, once fed a photo of a piece of meat from a supermarket, could identify the meat with reasonable accuracy, and you can slap on features such as estimating the price (after estimating volume), freshness… You could easily do the same for fruit (auntie, no need to press-press anymore!).

Ok, but this is only classification of objects, doesn’t AI do many many more things? Is this really AI, or is it ML?

AI vs ML

AI, as mentioned above, is focused on making machines think like humans. ML is how we apply specific pieces of this to solve problems. The classification example I used is a piece of ML, and ML is part of AI. But there is more to it than that.

Classification is just a small piece of what ML can do. ‘Traditionally’, ML has been used to do 3 things: classifying things, as I illustrated above (think of the photo app on your phone tagging pictures by recognising what is inside them); finding out what affects what (regression), for example understanding how the weather affects the price of tomatoes; and predicting things, such as the price of tomatoes next week.
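The regression and prediction pieces can be sketched with made-up numbers: fit a straight line relating rainfall to tomato price by least squares, then use the line to predict next week. The data here is invented purely for illustration:

```python
import numpy as np

# Made-up history: weekly rainfall (mm) and the price of tomatoes ($/kg).
rainfall = np.array([10., 20., 30., 40., 50.])
price    = np.array([2.0, 2.4, 2.9, 3.5, 3.9])

# Regression: fit price ≈ a * rainfall + b by least squares
A = np.vstack([rainfall, np.ones_like(rainfall)]).T
(a, b), *_ = np.linalg.lstsq(A, price, rcond=None)

print(f"each extra mm of rain adds about ${a:.3f}/kg")  # the "what affects what" part
# Prediction: expected price if next week brings 60 mm of rain
print(f"predicted price: ${a * 60 + b:.2f}/kg")
```

The fitted slope answers the “what affects what” question; plugging in a new rainfall value answers the “predict next week” one.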

A little diagram will illustrate what I am talking about:


So basically, while trying to explain Gen AI as it is today, I used ML, basically applied AI, and took 1 aspect (classification). I skipped over neural networks, which can be used to classify the images by, say, automatically varying the importance of different aspects (is the height of the horizontal piece more important than the number of legs?), and over deep learning, which is basically a more complex neural network.

But simply by looking at the name, Neural Network, you can get a hint that the original idea was to mimic the layers of the human brain; deep learning adds layers and other complexities. So, fret not, I am not misleading you, I am simplifying. Remember, my aim is that even someone like my mum can understand.

In my next blog, I will explain the most common understanding of Gen AI, ‘ChatGPT’, or basically the LLM (Large Language Model): people using it are not coding (speaking machine language) but speaking their own natural language (oops, I slipped in NLP).


  1. Elon Musk’s Neuralink has been approved to have human trials https://www.cnbc.com/2023/09/20/elon-musks-neuralink-is-recruiting-patients-for-its-first-human-trial.html
  2. https://www.oxfordlearnersdictionaries.com/definition/english/chair_1
  3. https://www.oxfordlearnersdictionaries.com/definition/english/table_1
  4. https://en.wikipedia.org/wiki/Eskimo_words_for_snow