Sunday 28 April 2024

Songs about AI: Have we been listening or mindlessly humming/headbanging...?

Many people still believe AI is a 21st Century thing. It is not; it has been around since the 1950s (1), but current advances in storage and compute have given it new breath and started democratizing it.

Therefore it should come as no surprise that 20th Century troubadour-philosophers have been informing us about what such technology could mean for us.



Have we been listening? These troubadour-philosophers do not communicate to us like the works of Plato/Aristotle that are unfortunately being pushed into colder storage to save costs. I came across one of these works on the radio recently, and got inspired to write this blog post. How many of you are familiar with these philosophical pieces?



Every Breath You Take (Synchronicity) – The Police (1983)

Let's start with something easy, a song that many of us have heard, hummed, or even sung at a karaoke…

Oh, can't you see
You belong to me?
How my poor heart aches
With every step you take

The chorus of the song is probably what makes many people believe this is a love song: the singer is heartbroken.

However, it is worth looking at the lyrics in more detail.

Every breath you take
And every move you make
Every bond you break
Every step you take
I'll be watching you

The first verse shows that the watching covers basically every waking moment.

Every single day
And every word you say
Every game you play
Every night you stay
I'll be watching you

And the second verse shows it goes on day and night.

There is no escape.

Nowadays it is not so difficult to achieve this, given the digital traces we leave everywhere, and the scary thing is that it has been happening, even in Singapore. True to the song, a police officer searched for his girlfriend in the police databases (2). This is not an isolated incident: another officer searched for details of his mistress (well, the song could be about a mistress too, right?) (3), and there are many other, less romantic reasons people have searched databases for individuals' data (4)(5).

You will notice that these issues have occurred in different years.

I am not picking on the SPF; it's just that the cases of jealousy, surveillance, and even ownership leading to abuse of data-access powers are:

  • not limited to the police force, which is ironic (or prescient) given the name of the band (6)(7)
  • not the only ones that have occurred, just a sample of those that made it to court and the mainstream papers

My main point is that our data exists in so many places that the lyrics of the song can be taken literally.

Another thing to bear in mind is that in all these cases, the people who accessed the data actually had the privilege to do so, and they abused that privilege.

As a side note, I always ask for all identifying information to be stripped out or the data anonymised before I work on any analysis/model. I think it is good practice.

The real question is whether enough is being done to prevent such abuses. And given that they keep occurring, the answer is no.

More and more data is being captured about us, with more and more cameras being placed all over, and technologies like facial recognition make it easy to identify and track individuals.

The singer, Sting, said of it: "I think it's a nasty little song, really rather evil. It's about jealousy and surveillance and ownership" (8).

I wonder what he’d say nowadays. He did warn us about surveillance and ownership.

While The Police warned about what can happen at an individual level, someone else was already singing about a system designed to work at a larger scale.


Eye in the Sky (Eye in the Sky) – The Alan Parsons Project (1982)

This is another song many of us have hummed, “Eye in the sky, looking at you, I can read your mind…”

Casinos were among the first organisations to really look into analytics and human behaviour, going as far as designing the whole layout of casinos with not only feng shui (9) but also the human mind in view. And that is what The Alan Parsons Project wrote about.

But it is worth going into more details of the lyrics.

Many of us are familiar with the chorus (10) that starts:
I am the eye in the sky
Looking at you
I can read your mind

The idea is that surveillance is not just watching but has a predictive element “I can read your mind”. That was in 1982. And it goes even deeper:

I am the maker of rules
Dealing with fools
I can cheat you blind

Those who conduct the surveillance and the predictive analysis also make rules we have to obey, as fools. And since they are in control, they can cheat us if they choose to, so we better behave accordingly.

And the third part of the chorus:

And I don't need to see any more to know that
I can read your mind (looking at you)
I can read your mind (looking at you)
I can read your mind (looking at you)
I can read your mind

The information gathered is sufficient to predict what we do, what we are…

And they make it very clear in the 3rd verse that the system is not something you can easily beat, it is futile to resist:

Don't leave false illusions behind

Cause I ain't gonna live anymore believing
Some of the lies while all of the signs are deceiving

Although the song gets quite dark, earlier on it also tried to use the carrot rather than the stick:

Don't say words you're gonna regret
Don't let the fire rush to your head
I've heard the accusation before
And I ain't gonna take any more
Believe me
The sun in your eyes
Made some of the lies worth believing

This is very true today, where whatever we publish (this included) is captured 'forever' and can come back to bite us: "Don't say words you're gonna regret". But the ending of this verse is an encouragement to believe the lie, "The sun in your eyes made some of the lies worth believing"; if we choose to, we can live contentedly.

Recently I was at a neighbourhood shop queuing for soya bean curd. A boy came in with his father and was looking around the shop. He was pointing out the cameras and counting them. I will ashamedly admit I had not paid attention.

Quick question: which city do you think has the most CCTV cameras per sq km?

If you guessed Beijing or any city in China, you’d be wrong.

Chennai in India has the highest number of CCTV cameras per sq km, 657 (11), more than twice that of Beijing.

Anyway, back to the song… I feel that the song is a warning about surveillance and all that goes with it. That applies to the whole album. Don't take my word for it; their website says so too (12):

The concept behind this album was related to belief systems, whether they be religious beliefs, political beliefs or belief in luck (as in gambling). Generally the concept is related to the universal idea that there is someone looking down on us all. The expression is also used in military and surveillance contexts.

The Alan Parsons Project was direct about surveillance and hinted at how we probably could live an easier life if we complied with the rules.

But in true metal fantasy fashion, Ronnie James Dio sang about not bowing to the system while describing it very aptly, in 1992.


Computer God (Dehumanizer) – Black Sabbath (1992)

In 1992, Black Sabbath came up with the album Dehumanizer, and the most relevant song for today's theme is "Computer God" (13). You can find the full lyrics here (14).

The first verse itself contains the lyrics:

Waiting for the revolution
New clear vision - genocide
Computerize god - it's the new religion
Program the brain - not the heartbeat

The first verse, in a nutshell, warns about the unstoppable march of technology. We have seen how technology is being used today (2024) to choose targets and 'help' decide their fate (15). The song eventually points to the risk of genocide of the human race.

Black Sabbath foresaw the unstoppable influence of technology, almost becoming a religion, and pleaded that the computers should help the brain, not the heart, because the heart is what makes us human. A perfect fit for the album's theme, 'Dehumanizer'.

The bridge (which, by the way, illustrates the amazing talent of RJD) seems to hint at our addiction to social media and the effects it can have on us:

Midnight confessions
Never heal the soul
What you believe is fantasy

Many of us are attached to our devices at midnight, but is what is portrayed on social media real or fantasy? Is it just a way to learn about us and control us? Social media behaviour is also being used to identify people with certain traits, and action is being taken (16).

Your past is your future
Left behind
Lost in time
Will you surrender

The next four lines hint at prediction, where all of your past actions, including those on social media (the midnight confessions and the fantasy), are used to control your future. Everything is calculated and planned, and you just follow the recommendations that are fed to you on social media (literally what your feed is calculated to be) or the ads and offers, based on your profile, that are shown to you. The question is: "will you surrender"?

I, of course, am guilty of helping people surrender. When a 'data scientist' or 'AI' decides what offer to make you, and you take it up, it counts as a success that reinforces the machine and directs what you will see next; you are being learned, and your brain is being 'programmed' in a way. Are you being helped, or controlled? There is a thin line there; has it been crossed? Remember, this warning came more than 30 years ago.

The song ends on a grim note, warning us to think about what it is that makes us human (again, the album’s theme), and whether this is at risk:

Virtual existence
With a superhuman mind
The ultimate creation
Destroyer of mankind

It sounds a lot like “The Matrix” trilogy (17) but preceded it by 7 years.

Would you prefer the blue pill or the red pill?


Conclusion

The nice thing is that people have been thinking of and anticipating changes that advancing technology could bring to our society and the way we live. We are at a stage where the works of these people are all around us, we have been exposed to them. Have we been listening or just hearing?

Like many things in life, it is up to each individual to decide. Hopefully each of us first of all is aware of the choice, and at some point makes it.


  1. https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
  2. https://www.straitstimes.com/singapore/courts-crime/policeman-jailed-for-using-official-portal-to-conduct-illegal-search-on
  3. https://www.straitstimes.com/singapore/courts-crime/cop-fined-4k-for-illegally-accessing-police-computer-system-to-check-on
  4. https://www.straitstimes.com/singapore/courts-crime/police-nsf-illegally-accessed-man-s-files-snapped-photo-of-him-being-handcuffed-and-shared-it-with-chat-group
  5. https://www.todayonline.com/singapore/police-officer-checked-confidential-spf-database-friend-who-turned-out-be-criminal-gets-jail-2204876
  6. https://www.channelnewsasia.com/singapore/mom-officer-asked-ex-dbs-colleague-access-salary-information-3770021
  7. https://www.straitstimes.com/singapore/courts-crime/man-gets-2-weeks-jail-for-abetting-bank-employee-in-making-unlawful-search-on-its-computer-system
  8. https://ig.ft.com/life-of-a-song/every-breath-you-take.html
  9. https://singaporegeomancy.wordpress.com/rws/
  10. https://www.azlyrics.com/lyrics/alanparsonsproject/eyeinthesky.html
  11. https://surfshark.com/surveillance-cities
  12. https://www.the-alan-parsons-project.com/eye-in-the-sky
  13. You can enjoy the studio version (https://www.youtube.com/watch?v=T8bvi1gewB8) or the live version when the singer was 67 a few months before his departure to Metal Heaven due to cancer (2009)( https://www.youtube.com/watch?v=j9syt-i5ju0 )
  14. https://www.azlyrics.com/lyrics/blacksabbath/computergod.html
  15. https://www.theverge.com/2024/4/4/24120352/israel-lavender-artificial-intelligence-gaza-ai
  16. https://www.arabnews.com/node/2495816/media
  17. https://en.wikipedia.org/wiki/The_Matrix

 

 


Wednesday 17 April 2024

ML/(Gen)AI: get the basics right, unless all you want to do is brag or punish

Recently we have seen a few walk-backs, tail-between-the-legs moments in the field of ML/AI applications; the most interesting one from my point of view is Amazon removing its 'Just Walk Out' technology from its grocery stores (1). The reason I find it interesting is that behind all the tech, there were a thousand people checking the purchases (2). Hands up if you thought that it was all done by machines…

However, I will give credit where it is due: the ability to just walk out of a store after shopping, knowing that your purchases have been accurately tracked and the correct amount deducted from your account, is a pretty useful feature (3). It saves the consumer an appreciable amount of time and effort, presumably at low to no cost (especially if it was truly automated).

And to me, the main aim of analytics/ML/AI should be to make people's lives easier. Saving time at low to no cost while shopping is a good thing.

With this as context, I am sure you will understand my exasperation at HDB and ST Engineering(*).

HDB is using AI to detect power failures (4) – as if these occur frequently in Singapore…. ST Engineering even wants to sell its AI capabilities, even actionable intelligence (5)

However, to me, the basic functions these organisations are hired to do have to be done properly first. It is like Maslow's hierarchy of needs (6): start by getting the basic needs right. And this does not only mean the snazzy AI stuff, but the whole implementation process.

One of the HDB carparks we use very often is near a Sheng Siong supermarket, a coffee shop and a sundry shop… a very well utilised area within a block of HDB flats. The carpark of this area is managed by ST Engineering.

And the car park system has been erratic for a long while, at least six months now. You can tell who is a regular user of the car park because they give the car at the gantry enough space to reverse and reposition a few times to try and get the sensors to detect the vehicle (7). My question is: how is it possible that, with the tools at their disposal, ST Engineering has not detected issues with this gantry? I am not even talking about preventive maintenance, I am talking about usage being affected… That's even more basic.

To add to this, I have a very specific incident.

It was raining very heavily when we were trying to exit the carpark, and the barriers simply wouldn't go up. I exited the vehicle to press for attention from a human (ST Engineering, I assume, has a number of people who can deal with such situations). The moment the connection was made, it was cut off; basically, the human hung up. Twice. While I was getting drenched.

I called the helpline once we managed to get out of the car park (thank you to the people queuing for their patience, and to the closest vehicle for knowing to leave ample room for manoeuvring). All I got was, basically, 'the intercom is spoilt, we will fix it'. I asked for written feedback, gave my email address, and nothing came from ST Engineering. To me it sounds like the case was not even lodged, and there may have been some covering for a colleague.

The point is, with simple use of Analytics,

  1. the faulty gantry should have been detected earlier, rather than waiting for complaints (a minimal sketch of how this could work follows this list)
  2. there should be an automated system to ensure cases are raised and closed within SLAs, and this should be tracked automatically. Again, this is simple using today's tools.
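For illustration only, here is a minimal sketch of how gantry transaction logs could be monitored for this kind of degradation. The column names (gantry_id, event_time, dwell_seconds) and the threshold are hypothetical, not anything from ST Engineering's actual systems.

```python
# Minimal sketch on hypothetical data: a log of gantry transactions with columns
# gantry_id, event_time and dwell_seconds (time between the vehicle being
# detected and the barrier opening). Column names are illustrative only.
import pandas as pd

def flag_suspect_gantries(log: pd.DataFrame, last_n: int = 200) -> pd.DataFrame:
    """Flag gantries whose recent dwell times are far above their own history."""
    flagged = []
    for gantry_id, g in log.sort_values("event_time").groupby("gantry_id"):
        baseline = g["dwell_seconds"].median()             # long-run behaviour
        recent = g["dwell_seconds"].tail(last_n).median()  # most recent transactions
        if baseline > 0 and recent > 3 * baseline:         # arbitrary threshold
            flagged.append({"gantry_id": gantry_id,
                            "baseline_s": baseline,
                            "recent_s": recent})
    return pd.DataFrame(flagged)

# Example usage:
# log = pd.read_csv("gantry_log.csv", parse_dates=["event_time"])
# print(flag_suspect_gantries(log))
```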

To me ST Engineering has failed in analytics and process, and in customer care.

How about HDB you ask?

Well, HDB outsourced the management of the carpark to ST Engineering. Do they have customer satisfaction reports from ST Engineering, or do they not care? They must be happy with ST Engineering's reports and performance – although I doubt the contract involves much more than dollars. The design of the car park is also bad: I got drenched attempting to communicate with the human managing gantry issues, yet a couple of metres from the gantry is a nicely covered walkway. I would think that extending the coverage to the gantry would not have cost that much. But hey, who cares?

What I am saying is very simple: before you start talking about GenAI, make sure you have the basics right; take care of Maslow's basic and safety needs before you go for your own self-actualisation. After all, even if HDB has a virtual monopoly on parking, customers should matter, don't you think?

Conclusion

Build useful analytics, useful to your users; make sure your KPIs reflect that, and build them into contracts. On the contractor side, track and analyse your true performance continuously. It is not rocket science, but so many organisations still fail at making their stakeholders' lives easier. Although, it would seem, HDB and its contractors are focused on maximising revenue, investing in punishing rather than delivering good service (8)(9).


  1. https://www.theverge.com/2024/4/2/24119199/amazon-just-walk-out-cashierless-checkout-ending-dash-carts
  2. https://www.bloomberg.com/opinion/articles/2024-04-03/the-humans-behind-amazon-s-just-walk-out-technology-are-all-over-ai
  3. this is a very different experience from NTUC supermarket self-checkout, but that's for another day
  4. https://sbr.com.sg/telecom-internet/news/hdb-eyes-ai-powered-energy-system-in-tengah
  5. https://www.stengg.com/en/digital-tech/data-science-analytics-and-ai/
  6. https://www.simplypsychology.org/maslow.html
  7. Each vehicle in SG has an IU (In-vehicle Unit) and when you get into a car park, the IU number is read, the gantry opens, and upon exit the IU number is read, the time and relevant fee calculated and deducted from your cashcard within the IU, and the gantry opens.
  8. https://blackdotresearch.sg/secret-devices-installed-in-hdb-car-park-gantries-to-catch-tailgaters/
  9. https://tnp.straitstimes.com/news/singapore/hdb-crack-down-carpark-fee-evaders

* HDB is the Housing & Development Board, a government agency responsible for public housing in Singapore; around 77% of residents live in HDB flats. ST Engineering is a government-linked entity specialising in the aerospace, electronics, land systems and marine sectors.

Sunday 10 December 2023

NTUC fairprice shines a new path in AI

Recently, I was having a discussion on the potential effects of large scale adoption of LLMs (and AI in general), and one of the risks was a move towards uniformity/homogeneity, or a loss of randomness in the human experience. (1)

Basically, if algorithms are designed to give you 'the most likely' or 'the best' answer (this may not always have to be the case (2)), then everyone would get similar answers and be driven to the same things.

Add to this the fact that as more people use LLMs, more and more content on the internet will be LLM generated, and therefore the training data used for LLMs will include a higher percentage of LLM created data as opposed to human created data. 

Fear not!

A data scientist at NTUC fairprice in Singapore has managed to build a machine (apply an algo) that gives very interesting answers:


The AI built by NTUC understands that, after a meal, you can use either of 2 'similar' products – similar in the sense that you, as a human, have a choice to do 1 of 2 things:

  1. do the dishes using the sponge, or
  2. have a piece of chocolate

There still is hope!

Or despair: "mummy/daddy, it's not me! It's the machine that told me I could do either one since they are similar"


  1. https://www.linkedin.com/feed/update/urn:li:activity:7134835884893290497/
  2. https://www.linkedin.com/feed/update/urn:li:activity:7124551149268963328/


Sunday 19 November 2023

You are overpaying for your vehicle insurance. It doesn't have to be this way.

I am sure you have been making this complaint over the years, but didn’t have much choice since prices are around the same and policies are designed to be sticky or not so advantageous to get out of (that’s for another day).


Singapore General Insurance Association says so too!

But now, ladies and gentlemen, we have the ultimate confirmation. This comes from the GIA, the association that groups general insurers in Singapore: "About two in 10 motor insurance claims in Singapore are fraudulent, often involving exaggerated injuries and inflated vehicle damage" (1).

 

And the president of Budget Direct

The president of Budget Direct, an insurer that usually prides itself on competitive rates, even admitted: "In the end, all motorists are victims of motor insurance fraud as we all end up paying higher premiums as a result".

This is the key, you see: the claims paid out in fraudulent cases simply get translated into increased premiums for ALL vehicle insurance customers. Irrespective of whether you commit fraud, are a scam victim, or are accident/claim-free, you are paying for the fraudsters' bread, butter and cake, while the insurers maintain their healthy margins and profits.


The Insurers have no incentive to act on fraud

The thing is, the GIA is saying it is up to you, the customer, to stop the fraud. And that I find laughable. Let's look at the main causes of fraud as per the GIA…

  1. Beware of Phoney Helpers: After an accident, individuals may offer "help" and pressure victims to follow their directions, often leading them to unauthorized repair shops or overpriced towing services.
  2. Staged Accidents: Scammers stage accidents, causing victims to collide with their vehicles and then falsely accuse victims of causing the collision. They often fake injuries and make substantial claims for damage and injuries.
  3. Phoney Witnesses: Suspect convenient witnesses who support the other driver's account, often suggesting a staged accident.

1 Unauthorised repair shops:

Most vehicle owners are aware of the workshops that their insurer accepts, whether from their own bad experience, from friends and family, or from the insurer. Plus, most of the time, unauthorised repair shops' costs are not paid by the insurer; if they are, it is a bit rich on the part of insurers to honour the claim while complaining about it.

Plus, it is not rocket science to detect highly inflated claims based on the pictures and descriptions that accompany them. I know because I worked in an insurance company in a much less developed country than Singapore, and I know for a fact that insurers have the data needed to deal with this; the question is one of financials and will.

 

2 Staged Accidents

And how is that the fault of the insured? The insured is getting scammed at the same time as the insurer, unless GIA is claiming that the insured is somehow going along with the scammers… more on this later

 

3 Phoney Witnesses

Again, how will someone who has just been in an accident be able to detect whether witnesses are phoney or not?

 

Unless Singapore is a nation of scammers (not of the scammed/scam victims (2)), it just doesn't make sense to think that the individual people involved in accidents are part of the scam. So should victims pay the price twice (once by being scammed and a second time via higher premiums, and probably loss of NCB)?

 

So my arguments that follow assume that Singapore is not a nation of scammers (unlike (3)). After all, Singapore is only beaten by Finland, New Zealand and Denmark in terms of corruption perception (4).

 

The fact that the GIA mentions staged accidents and phoney witnesses seems to indicate that syndicates are at play, or at the very least groups of people who are in the business of scamming accident victims. In fact, it is likely that staged accidents and phoney witnesses occur together rather than separately.

 

You can have a staged accident without phoney witnesses, but it is very unlikely to have phoney witnesses to a real accident.

So chances are, there are syndicates/gangs/groups of scammers at work. It is ridiculous for GIA to expect an individual consumer to be able to detect them, don’t you think so?

 

So what can be done?

The answer, in most of my blogs, is Analytics!

 

Inflated Claims

I briefly mentioned the solution to GIA issue 1, inflated claims. Analytical models can be built to detect inflated claims. The beauty of this is that such models can even be used to detect which workshops are cheating.
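For illustration only, here is a minimal sketch of one way such a model could be set up, using an isolation forest on a hypothetical claims table. The column names and features are made up; a real model would use far richer data.

```python
# Minimal sketch on hypothetical data: a claims table with columns workshop_id,
# vehicle_age, damage_severity and claim_amount. Columns and model choice are
# illustrative, not any insurer's actual setup.
import pandas as pd
from sklearn.ensemble import IsolationForest

def rank_workshops_by_suspect_claims(claims: pd.DataFrame) -> pd.DataFrame:
    features = claims[["vehicle_age", "damage_severity", "claim_amount"]]
    model = IsolationForest(contamination=0.05, random_state=42)
    claims = claims.copy()
    claims["suspect"] = model.fit_predict(features) == -1   # -1 marks outliers
    # Workshops with an unusually high share of outlier claims deserve a closer look
    return (claims.groupby("workshop_id")["suspect"].mean()
                  .sort_values(ascending=False)
                  .rename("share_of_suspect_claims")
                  .reset_index())
```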

But, from experience, there is little willpower in senior management to do something that will rock the boat. It is important for analytics people to learn that not everything that can be done will be done; other factors come into play, such as whether it is financially viable (in this case I am quite sure it pays for itself quite quickly: a couple of months of work to build, another month to fine-tune, and low running costs for a basic solution) or politically acceptable (is it worth opening Pandora's box at your preferred workshops?).

In sum, technically easy to solve and pays for itself, management wise depends on management.

It's even worse at the GIA level where, as people in SG know, some workshops are on the panel for multiple insurers.

 

Staged Accidents and Phoney Witnesses

Accidents, by their nature, are (most of the time) unexpected. It is therefore not that straightforward to estimate the probability of an accident from, say, road and traffic conditions and location, and then highlight the stranger ones; one of the reasons is that humans play a large role, and it is not so easy to get data on all the actors involved, not only the drivers in the accident but also those in the immediate vicinity (5).

 

The easy way to detect staged accidents and phoney witnesses is to focus on the people, not the vehicles. The key assumption is that these are the work of groups of people, so members are likely to play different roles at different times. Let me put it this way: how likely is it that the same person is a claimant, a witness to an accident, and at fault in a vehicular accident, all within, say, a year?

 

The idea is that, chances are, a member of the group will play different roles over time, sometimes even with different insurers, to lower the chances of detection. This is something very easy to pick up using social network analytics, especially at the GIA or police level.
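To make the idea concrete, here is a minimal sketch assuming a hypothetical set of claim records listing the people involved and their roles. It flags individuals who appear in multiple roles and groups of people who keep appearing together; real GIA or police data would of course look very different.

```python
# Minimal sketch with networkx, on made-up claim records.
import itertools
from collections import defaultdict
import networkx as nx

claims = [
    {"claim_id": "C1", "people": {"alice": "claimant", "bob": "witness"}},
    {"claim_id": "C2", "people": {"bob": "at_fault", "carol": "claimant"}},
    {"claim_id": "C3", "people": {"bob": "witness", "dave": "claimant"}},
]

# 1. Flag individuals who play several different roles across claims
roles = defaultdict(set)
for c in claims:
    for person, role in c["people"].items():
        roles[person].add(role)
suspicious = {p for p, r in roles.items() if len(r) >= 2}
print("Multi-role individuals:", suspicious)  # {'bob'}

# 2. Build a co-appearance graph; dense clusters hint at possible rings
G = nx.Graph()
for c in claims:
    for a, b in itertools.combinations(c["people"], 2):
        G.add_edge(a, b)
for component in nx.connected_components(G):
    if len(component) >= 3:
        print("Connected group worth reviewing:", component)
```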

 

Conclusion

Saying that 20% of claims are likely to be fraudulent and placing the onus on customers/insured in the case of vehicular insurance in Singapore is a joke.

  1. The main causes as stated by the GIA are unlikely to be driven by the ordinary claimants themselves.
  2. The GIA itself (or, to a lesser degree, the large insurers) has the data easily at hand to detect potentially fraudulent cases effectively.
  3. However, the insurers (and the GIA) have little incentive to do so, since they can simply pass the costs on to customers.

 

However, relatively simple analytics can, right now, help alleviate this problem and allow customers to pay lower premiums since the risk of fraud can be mitigated. It is just a question of will from the insurers’ point of view.

 

  1. https://insuranceasia.com/insurance/news/20-singapores-motor-insurance-claims-are-fraudulent-giaj
  2. https://www.straitstimes.com/world/14-trillion-lost-to-scams-globally-s-pore-victims-lost-the-most-on-average-study
  3. https://www.youtube.com/watch?v=q5PI5ZtJTSY
  4. https://www.transparency.org/en/cpi/2021
  5. That is not actually true anymore in Singapore, I will explain in a subsequent blog.



Sunday 5 November 2023

GenAI thing, bonus: hype cycle

Gartner is an organization that classifies different technologies into its "hype cycle" framework (1). Basically, any piece of technology may go through 5 stages:

  1. Technology Trigger: a technology reaches a proof of concept, a successful experiment, and people get excited.
  2. Peak of Inflated Expectations: given the excitement, some companies jump in and experiment; some succeed, most do not.
  3. Trough of Disillusionment: given the failures, some versions of the technology fail, and investment into the space takes a hit and will only recover if providers iron out the main issues.
  4. Slope of Enlightenment: as the technology becomes production-ready, more successes are created and the usage and limits of the technology are better understood. New-generation products appear.
  5. Plateau of Productivity: mainstream adoption; what was a successful niche spreads.

 

Guess where Gartner placed Gen AI in its 2023 AI hype cycle?


(See the 2023 Gartner AI hype cycle chart (2).)

 

That's right, right at the peak of inflated expectations. Plus, they only see the plateau of productivity being reached in 5 to 10 years.

On the other hand, something like computer vision, where we use machines to process images to extract meaningful information, is close to the plateau of productivity. There are many pieces of software/APIs that help you analyse images very efficiently, and, very importantly, there are proven production use cases for computer vision: facial recognition to control access, recognising who is not correctly wearing a mask (useful during COVID), detecting anomalies in X-rays/MRIs, and identifying and tracking people from public cameras (ahem…).

GenAI, on the other hand, has made a big splash; people around the world, including especially non-data professionals, are raving about the possibilities GenAI can bring. AI is already being used whether we are aware of it or like it or not, for example in the UK (3); now imagine GenAI (in an earlier blog I listed a few well-known issues with LLMs).

So what have people been doing with GenAI? One of the avenues being explored is helping humans write code, and there are many, many examples of this, for example the ubiquitous GitHub Copilot (4). But as I asked in an earlier blog, do you think the code that is written is of very high quality, given that it is built on 'everyone's' coding…

There have also been efforts to help manage GenAI. Actually, apart from the coding copilot, the other development from Microsoft Build (5) earlier this year was the guardrails Microsoft put around GenAI. And this can be leveraged, as OCBC has done (6) with MS Azure, to allow fact-checking rather than blindly following the answers generated: curation! (7)

The reality is, I believe GenAI is a very useful tool to have in your arsenal, but more 'traditional'/'tried and tested' methods may be more suitable for the problem at hand. I have had customers saying "I just want GenAI" whether it suits their use case or not. I would just point to the 'peak of inflated expectations'.

I am someone who enjoys building stuff that works and enables organisations to hit business KPIs; to do that, choosing the right tool is very important, and this is something I can help with. You can use a sledgehammer to open a can of beans, and you can use a can opener too; guess which, currently, gets you to the beans and deals with your hunger more efficiently?


  1. https://en.wikipedia.org/wiki/Gartner_hype_cycle
  2. https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
  3. https://www.theguardian.com/technology/2023/oct/23/uk-officials-use-ai-to-decide-on-issues-from-benefits-to-marriage-licences
  4. https://github.blog/2023-03-22-github-copilot-x-the-ai-powered-developer-experience/
  5. https://news.microsoft.com/build-2023/
  6. https://www.straitstimes.com/business/ocbc-to-deploy-generative-ai-bot-for-all-30000-staff-globally
  7. Interestingly, if you look again at the AI hype cycle 2023 diagram above, "Responsible AI" is also at the peak of inflated expectations, humans still, fortunately, have more thinking to do...


Sunday 29 October 2023

Gen AI Thing, Part III : finally getting to the crux of the issue

In my previous two blog posts, I explained, as I would explain to my mum, what AI is and how machines can understand our words. In this blog, I explain heuristically (non-technically / using common sense) how machines can not only understand what we are saying, but also respond intelligently and create answers that were not there before (Gen AI).

One more little detour

For those of you who have played with ChatGPT or something similar (Bard?), one of the things that puzzles people is the concept of ‘tokens’. Some of you may ask, since I claim that machines can understand human words well enough, what is this token thing? Are we gambling when using these ‘tools’?

Gambling? Yes, maybe… Some examples of funky ChatGPT (and Bard) results:

  • ChatGPT making up cases to support its argument (1)
  • ChatGPT making up cases of sexual harassment and naming the supposed perpetrator (2)
  • Bard making a mistake regarding the James Webb Telescope (3)

There are ways to mitigate these issues, but this is beyond the scope of this blogpost. Suffice to say that such models do give information that may be less than reliable. But then again, they were not designed to ‘tell the truth, the whole truth, and nothing but the truth’.


ChatGPT/BARD are not designed to tell the truth?! I thought they were AI!

The answer to the question lies in the question itself. These are AI systems, and as I mentioned in my 1st blog of the series, such models learn from (are trained on) the data they are fed. Secondly, it may help to understand a bit about how these systems, called LLMs (Large Language Models), work.

 

How does ChatGPT/Bard… work?

Let me start with a word game. How many of you play Wordle (4)? Basically, every day a 5-letter word is chosen, and you have to guess it without any clue; you have 6 tries. All that you will ever know is whether a letter you have suggested exists in the answer but is in the wrong slot (yellow), is in the correct spot (green), or does not exist at all (grey). The other condition is that any combination of letters you try has to be an existing word.

The thing is, most people, once they know the position of one letter, will try to guess the letters next to it based on what they know about the English language, for example (5):

  • 'E' is the most common letter in English and your best bet if you know nothing about the word; it is followed by 'T' and 'A'.
  • If there is a 'Q', chances are there is also a 'U', and chances are the 'U' follows the 'Q'.
  • If there is a 'T' anywhere except the 5th position, then the next letter is most likely an 'H', then an 'O', then an 'I'.
  • If there is an 'H' anywhere except the 5th position, then the next letter is most likely an 'E', then an 'A', then an 'I'.

Combinations of 2 letters such as 'QU', 'TH', 'TO', 'TI' are called bigrams. The idea is that once you know a letter, you use this information to find the most likely following letter – this is known as conditional probability: given the condition that one letter is a 'T', the most likely following letter is an 'H', not an 'E' (the most common letter in English overall). The key is that your choice of letter changes based on the information you have.

These are shortcuts, findings based on analysis of words, that can help you guess the letters in Wordle.
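To make this concrete, here is a minimal sketch of how such bigram statistics can be estimated simply by counting, using a tiny made-up word list (a real analysis would use a large corpus):

```python
# Minimal sketch: estimating letter-bigram conditional probabilities by counting.
from collections import Counter

words = ["the", "that", "this", "other", "think", "quote", "quiet", "question"]

pairs = Counter()   # counts of (letter, next letter)
firsts = Counter()  # counts of each letter appearing with a follower
for w in words:
    for a, b in zip(w, w[1:]):
        pairs[(a, b)] += 1
        firsts[a] += 1

# P(next letter = 'h' | current letter = 't')
p = pairs[("t", "h")] / firsts["t"]
print(f"P(h|t) = {p:.2f}")

# Most likely letter to follow 'q' in this sample
following_q = {b: n for (a, b), n in pairs.items() if a == "q"}
print("After 'q':", max(following_q, key=following_q.get))  # 'u'
```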

As an aside, the most common bigrams in different languages can be very different (6)(7):

Rank   English   French   Spanish
1      TH        ES       DE
2      HE        LE       ES
3      IN        DE       EN
4      ER        EN       EL
5      AN        ON       LA

 

Letters are fine, but Gen AI generates whole documents, not random letters

It's just an extension of the idea. In the example above I used bigrams (2 letters); when playing Wordle, some people may work with trigrams (3 letters). It's basically the same thing, just a little bit more complex.

The next step, then, is that instead of guessing the next letter (using a bigram), you guess the next word. But why stop there? You can actually go beyond a bigram and condition on multiple preceding words. It is, in principle, that straightforward. However, to improve performance, there are a few more tricks.
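As a toy illustration of the same idea at the word level, here is a minimal sketch that counts which word tends to follow which in a tiny made-up corpus and then predicts the most likely continuation:

```python
# Minimal sketch: a word-level bigram "language model" built by counting.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

next_word = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word[current][following] += 1

def most_likely_next(word: str) -> str:
    return next_word[word].most_common(1)[0][0]

print(most_likely_next("sat"))  # 'on' (always followed 'sat' in this corpus)
print(most_likely_next("the"))  # 'cat' (ties broken by first occurrence)
```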

The problem is the size of the data: given the number of words in a language, the possible combinations increase exponentially as you add more words. The brilliant thing, or one of the more brilliant things, about LLMs is that they estimate the probability of a combination of words occurring. They do that by using an underlying model that recognises the patterns.

AI, NN, and the human brain

As mentioned in Part I of this blog, AI is about making a machine think like a human. The way this has been done in neural networks is to make a representation (model) of the human brain, with nodes and connections. And, as is thought to happen in the human brain, each node does a fairly simple job (one of the simplest jobs is a binary yes/no threshold – a node doing this is called a perceptron), and the connections between them are given weights based on how important they are.
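As a toy illustration, here is a minimal sketch of a single perceptron: a weighted sum of inputs pushed through a yes/no threshold. The weights here are hand-picked; in a real network they are learned from examples.

```python
# Minimal sketch of a perceptron: weighted sum + threshold.
def perceptron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0   # binary yes/no decision

# Example: hand-picked weights that make it behave like a logical AND
print(perceptron([1, 1], weights=[0.5, 0.5], bias=-0.7))  # 1
print(perceptron([1, 0], weights=[0.5, 0.5], bias=-0.7))  # 0
```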

 


Note that as neural nets have progressed, they have taken on a life of their own and the idea of mimicking the human brain's structure is no longer central; the architecture of neural nets, while still using nodes and connections, can be quite different.

Going back to the chair and table example

When you show the machine a picture, it breaks it down into small parts of the picture (features), maybe the length of the legs or the shape of the back, and assigns weights based on how important these features are. After being trained on many examples, the model is ready to distinguish between a table and a chair.

The illustration above shows a very simple type of neural network: one input layer where you start, one hidden layer of nodes with connections in one direction to do the magic, and the output layer. For classifying tables and chairs from images, for example, it has been found that neurons arranged in a grid formation work well, specifically a Convolutional Neural Network. Basically, a set of filters is applied to detect specific patterns in the picture (convolutions), then these are summarised and combined (more layers) to extract the more salient features without burning enormous resources, and finally pushed to the output layer; in the case of our chair/table classification there would be 2 nodes in the output layer, the output being the probability that the image is a chair or a table (9).
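For the curious, here is a minimal sketch of such a chair/table classifier, written with PyTorch purely as an illustration; the layer sizes are arbitrary and the network is untrained.

```python
# Minimal sketch of a tiny convolutional classifier (chair vs table).
import torch
import torch.nn as nn

class ChairTableNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),  # filters scan the image
            nn.ReLU(),
            nn.MaxPool2d(2),                            # summarise each region
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, 2)    # two outputs: chair, table

    def forward(self, x):                               # x: (batch, 3, 64, 64)
        x = self.features(x)
        x = x.flatten(start_dim=1)
        return torch.softmax(self.classifier(x), dim=1)

model = ChairTableNet()
fake_image = torch.rand(1, 3, 64, 64)                   # stand-in for a real photo
print(model(fake_image))                                # e.g. tensor([[0.49, 0.51]])
```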

There are many ways to structure a neural net and many parameters to play with. You won't be surprised that one of the important innovations came from the realisation that, for processing text, it is important to know what else is in the sentence rather than process each word independently. So there was a need to be able to refer to past words. Long Short-Term Memory (LSTM) (10) allowed this to happen by letting the network control how long some nodes retain information, which can then be used to provide context.

However, LSTM is not that fast, as it processes information sequentially, like many of us do when we read word by word (11). In 2017, a team from Google came up with the brilliantly entitled paper "Attention is all you need" (12). This gave rise to the rise of the Decepticons (13), sorry, of Transformers (14). Basically, when processing a chunk of text, the machine calculates weights using an attention network, working out which words need to be given a higher weight. While Transformers can be run sequentially, they can also be run in parallel (no recursion), hence the usefulness of GPUs in LLMs.
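For those who want to see the core calculation, here is a minimal numpy sketch of the scaled dot-product attention described in that paper, using toy dimensions and random vectors:

```python
# Minimal sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how much each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V

# Toy example: 3 "words", each represented by a 4-dimensional vector
np.random.seed(0)
x = np.random.rand(3, 4)
print(attention(x, x, x).shape)   # (3, 4): each word re-expressed as a weighted mix of all words
```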

To answer a friend’s question, GPUs are not necessary in LLMs, but they really speed things up. (15)

Is LLM therefore just a better chatbot?

You must be thinking that LSTM is something that has been used in Chatbots before, and LLMs, as I have explained here, basically just answer your queries…

Actually, no. One huge difference between chatbots and LLMs is how they learn. LLMs use reinforcement learning (I sneakily introduced this in Part I of this series; there is even RLHF, Reinforcement Learning from Human Feedback...). Also, the volume and diversity of the data they are trained on is vastly different. LLMs can 'talk' about many more topics/intents than a traditional chatbot, which is usually more focused.

However, the comparison with a chatbot is an interesting one. Interest in LLMs really took off with GPT-3.5. As the name suggests, it is not the first offering in OpenAI's GPT family. So what made GPT-3.5 garner so much interest (GPT-1 was released in 2018, GPT-2 in 2019, GPT-3 in 2020, and GPT-3.5 in 2022 (16))? One reason was that it suddenly improved, and the second was that a friendly chat interface was included, allowing virtually anybody with an internet connection to play with it and become an instant advocate.

A few more points

GenAI, here LLMs, basically smartly and quickly process word/token embeddings to understand you and produce a response. The key to understanding them, as I mentioned earlier, is to know that they are not designed to give you the truth; they answer the question "what would a likely answer be?". Actually, not only that: GenAI gives you the likely answer of an average person (thank you, Doc, for pointing this out clearly). Think about it: if it is trained on the whole internet and ranks the most likely answer, then the most likely answer may not be that of the people who really know what they are talking about. Hence my thought that LLMs can help so-so coders, but expert coders may not be helped that much; they probably know better.

Questions to ponder:

  • Do you believe that logic is something that is common in humankind? Is common sense really that common?
  • How about Maths: do you believe that people are generally good or bad at Maths?
  • Why am I asking this? Simple: now tell me, do you think LLMs are good at logic? At Maths?

Is most likely always the best?

Now, there's one more thing you can influence: what GenAI responds with. I mentioned that these models basically rank all possible words and pick one; maybe your first instinct would be to always pick the highest-probability word.

That would give you consistent answers over time. However, always using the highest-probability response often leads to circular and less than satisfactory answers. Hence, most people choose to allow some randomness (ChatGPT calls this temperature (17)).
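To see what temperature does, here is a minimal sketch using made-up scores for three candidate next words: the raw scores are turned into probabilities, and temperature controls how much the distribution is flattened before sampling.

```python
# Minimal sketch of temperature sampling over made-up next-word scores.
import numpy as np

def sample_next(tokens, scores, temperature=1.0):
    scaled = np.array(scores, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()               # softmax
    return np.random.default_rng().choice(tokens, p=probs)

tokens = ["Paris", "London", "bananas"]
scores = [5.0, 3.0, 0.5]                      # invented model scores

print(sample_next(tokens, scores, temperature=0.1))                      # almost always "Paris"
print([sample_next(tokens, scores, temperature=2.0) for _ in range(5)])  # more variety
```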

Conclusion:

GenAI is a great tool (what you can do with GenAI, whether you are an SME, an individual looking to make your own life easier, or a large organisation, may be a topic for a future blog). What it does is come up with a possible answer based on the data it has been trained on. (Another blog post could be about why GenAI is not the answer to everything, but that's probably obvious.)

 

  1. https://www.channelnewsasia.com/business/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-3581611
  2. https://www.businesstoday.in/technology/news/story/openai-chatgpt-falsely-accuses-us-law-professor-of-sexual-harassment-376630-2023-04-08
  3. https://www.bbc.com/news/business-64576225
  4. https://www.nytimes.com/games/wordle/index.html
  5. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
  6. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/french-letter-frequencies/
  7. http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
  8. you can also adjust how you penalise mistakes, known as the loss function; so that’d be a 4th way.
  9. https://www.simplilearn.com/tutorials/deep-learning-tutorial/convolutional-neural-network
  10. http://www.bioinf.jku.at/publications/older/2604.pdf
  11. LSTM evolved from Recurrent Neural Networks (RNN) where the idea was that you can look back at information you processed earlier (hence recurrent), however if the information was far back, there were problems referring to it.
  12. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  13. https://tfwiki.net/wiki/Rise_of_the_Decepticons
  14. https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
  15. Memorable demo of CPU vs GPU https://www.youtube.com/watch?v=-P28LKWTzrI
  16. https://en.wikipedia.org/wiki/GPT-4
  17. https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683


Sunday 22 October 2023

Gen AI thing Part II: Talking in human language, not in machine language

In my previous blog, I explained that AI is about making machines think like humans, and I gave an example of a human task of recognising objects and how you can get a machine to do that. In this blog, I will expand a bit more on how we can all become Dr Dolittle (1) but with machines rather than animals.

A few years ago, someone on LinkedIn asked me what coding language I would recommend a child learn, since he was making the decision for his newborn. I said that, in my view, rather than humans learning to speak machine language (coding), sooner or later machines would learn to understand human language (something like NLP, Natural Language Processing), and it would be more important for a child to learn how to think systematically but also creatively than to learn how to code. I haven't heard from that person since. Hey, 'sooner' has happened (2)(3)(4). But I am jumping the gun.


For the 2nd time, what is Gen AI!?

Gen AI is basically using AI to create something that wasn’t there before. What is created can be text, an image, a sound… But the trick is that, first the machine has to learn (that is be trained on a bunch of data/examples), then it can produce something.

But what excites most people is that anyone can use Gen AI because the machine speaks human language (no code and you can access the mythical AI!). I will tackle this part first.


The machine understands me!

Another branch in AI/ML is NLP, Natural Language Processing. NLP is precisely concerned with making machines understand what humans are saying. You can imagine, it’s already quite difficult for humans to understand each other, now imagine machines…

Language is a very complex thing, and a living thing: new words are added all the time, meanings are added to words over time, words may mean different things in different contexts, and humans use irony, sarcasm… But it is worth the effort, because a huge amount of knowledge is kept in language, whether in oral or written form. With the advent of the internet, the digitisation (making it digital – bits and bytes – rather than analogue – printed images) of dictionaries and research papers, and the democratisation of access to the internet (any idiot can write a blog – but smart people know which to read), there is a treasure trove of information online that can be used to train a machine. But language is not that easy to deal with.


Words are all I have

In my previous post I talked about classification, and one of the keys is to measure the distance between things and decide which are similar. How does that apply to words?

But computers are all about numbers, not words…

The first challenge machines have in comparing words is that they do better with numbers, so the first trick is to somehow turn the problem into one that involves numbers; once you know how to measure, deciding which is closer is not so hard.

Look at the words “BETTER” and “BUTTER”. How close are they?

There is only a 1-letter difference, so these 2 words are quite close; it's just replacing a letter. There are concepts of distance that make such calculations, typically also taking into account the number of letters in the word. These algorithms are quite useful. The idea is that words are similar if it takes little effort to change one into the other.
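One well-known example of such a distance is the Levenshtein (edit) distance: the number of single-letter insertions, deletions or substitutions needed to turn one word into another. Here is a minimal sketch:

```python
# Minimal sketch of Levenshtein (edit) distance with a rolling-row DP table.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

print(edit_distance("BETTER", "BUTTER"))  # 1
print(edit_distance("BETTER", "BEST"))    # 3
```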

Now let me add the word "BEST" to the comparison. As an English-speaking person, you would say "BEST" is close to "BETTER" but not so close to "BUTTER"; going purely by replacing letters misses the meaning. Therefore there must be a better way.

 

Vector Embedding

Similar to a dictionary that humans can refer to for words, there is a source of information that machines can refer to that tells them the relationships between words (humans can use it too). This is called a vector embedding.

 Vector Embedding: Imagine

Imagine a 3-dimensional space in front of you. A point in this space represents a word. A vector for that word is like directions to that point in space (here maybe x, y and z coordinates). Each word is embedded in the space such that words that are closer together have similar meanings/contexts. One of the really popular techniques, called word2vec, has been made public by Google; it basically transforms a word into a vector while preserving the meaning of the word.

So to follow our example, in the 3D space, ‘BETTER’ and ‘BEST’ will be close to each other, and ‘BUTTER’ further (closer to ‘MARGARINE’ and ‘MARMALADE’).

 Points in space, and more

Not only are words that are similar grouped together so the machine can get the topics in a piece of text, but the relationships between the points in space also have meaning: moving from “BETTER” to “BEST” is the same journey as moving from “WORSE” to “WORST”.

This is something worth thinking about: not only do vector embeddings bring words that are about the same thing close to each other, but from the direction as well as the distance (6), the relationship between the words can be inferred.
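As a toy illustration with hand-written 3-dimensional vectors (real embeddings have hundreds of learned dimensions), closeness can be measured with cosine similarity, and the offset between vectors captures the relationship:

```python
# Toy 3-D "embeddings", made up purely for illustration.
import numpy as np

vec = {
    "better": np.array([0.9, 0.1, 0.0]),
    "best":   np.array([0.9, 0.5, 0.0]),
    "worse":  np.array([-0.9, 0.1, 0.0]),
    "worst":  np.array([-0.9, 0.5, 0.0]),
    "butter": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(vec["better"], vec["best"]))    # high: similar meaning
print(cosine(vec["better"], vec["butter"]))  # near 0: unrelated

# The "journey" from better to best is the same as from worse to worst
print(vec["best"] - vec["better"])   # [0.  0.4 0. ]
print(vec["worst"] - vec["worse"])   # [0.  0.4 0. ]
```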


What is the big deal with vector embeddings?

The beauty of vector embeddings is that some large organisations like Google have made their vector spaces available for anyone to use, so we do not have to train the models ourselves, for example word2vec (5). In some cases, say you are dealing with a very specialised topic such as medicine, you should use specialised vector embeddings, but in most cases, for the machine to understand what the human is saying, generic vector embeddings work well enough.

Therefore, the machine is able to know what we are saying whether we use the same words or not because, with embeddings, it now sees which words are close to each other in meaning and how they relate to others. That's great!

What this means is that it is possible to train the machine on millions of pieces of text on a bunch of topics, and it will be able to understand that some of them are talking about the same thing even if the words used are different.


Ok, but this is not new right?

Correct! Vector embeddings aren’t a 2020s thing (7). In the 1950s, John Rupert Firth made a statement that underlies a lot of the thinking today:

               “You shall know a word by the company it keeps” J.R. Firth 1957 (8)

However, back then we did not have the computing resources we have today. So AI went into winter – people could think about it, but it was very hard to put it into practice. For example, imagine the number of words in a language (9) – the English Wiktionary (10) contains around 700k base words and 1.4m definitions – and if you want to put all of this in a space with the meanings preserved, you will need many groups spread across many dimensions; even worse, there will be dimensions with few words, making computation really tough (the curse of dimensionality (11)). Our brains can handle 4 dimensions easily (our 3D world + time) (next time someone is late for a meeting, introduce them to the 4th dimension 😊). Some research points to humans being able to handle more (12), but still not as many as required to plot even just the common words in English.

Note that not everything stopped, people spent time in many other directions.

In the 2000s, research heated up and some great leaps were made; for example, research by Yoshua Bengio and colleagues in Montreal proposed the path forward: "We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences." (13)

Oops! Getting too geeky here; just to summarise the point about vector embeddings: the thing with machines is that they don't understand language just like that. So one of the ideas was to convert words into numbers (vectors). Then the words that are about the same thing are grouped together, so if you use slightly different words from me but we are saying the same thing, the machine can tell. The neat thing about the numbers is that doing maths on them allows the machine to understand the relationships between the words; for example, the relationship between "king" and "man" is the same as that between "queen" and "woman" (14).
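If you want to try this yourself, here is a minimal sketch using gensim's pretrained word2vec vectors. An assumption: gensim is installed, and the first run downloads the large (roughly 1.6 GB) 'word2vec-google-news-300' model.

```python
# Minimal sketch: word analogies with pretrained word2vec vectors via gensim.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")   # downloads on first use

# "king" - "man" + "woman" lands closest to "queen"
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', ...)]
```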

The machine is now ready to understand you!

Add to this that specialised vector embeddings exist for specific fields, and the machine can understand you whether you are talking generally or asking in-depth questions on specialised topics.

So what this helps with is for machines to store all the information they have access to in a way that is very easy for them to search and make use of, so they can figure out to a large degree what you are talking about. It is not perfect; that is why the role of prompt engineer exists (someone who speaks the 'human language' the machines understand). Personally, I think that with advances in NLP and machines being trained through interactions with humans, sooner or later there will be less need for prompt engineering; we (as in humans and AI) will all speak a 'common language', a bit like how some people speak differently to their children (or pets) or to 'foreigners' compared to their own friends and family.

 

But still this is not Gen AI, where is the Generative part?

True, we are getting there…

In my previous blog and this one, I explained how machines can be made to think like humans, and how advances in technology have made it easier to provide machines with training data so they can understand, to a large extent, what humans are saying.

The next step is how machines can now create stuff; I will focus on how machines can write things that have not been written before. That will be the topic of the 3rd and last part of this loooong blog post.

 

 

  1.  https://www.youtube.com/watch?v=YpBPavEDQCk
  2. https://ai.meta.com/blog/code-llama-large-language-model-coding/
  3. https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
  4. https://cloud.google.com/use-cases/ai-code-generation
  5. https://en.wikipedia.org/wiki/Word2vec
  6. That’s the basic thing about vectors, they are about ‘magnitude and direction’ https://en.wikipedia.org/wiki/Vector and the relationship between them can be ‘easily’ mathematically calculated
  7. https://en.wikipedia.org/wiki/Word_embedding
  8. https://cs.brown.edu/courses/csci2952d/readings/lecture1-firth.pdf
  9. https://en.wikipedia.org/wiki/List_of_dictionaries_by_number_of_words
  10. https://en.wiktionary.org/wiki/Wiktionary:Main_Page
  11. https://en.wikipedia.org/wiki/Curse_of_dimensionality
  12. https://www.frontiersin.org/articles/10.3389/fncom.2017.00048/full
  13. https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
  14. https://blogs.mathworks.com/loren/2017/09/21/math-with-words-word-embeddings-with-matlab-and-text-analytics-toolbox/