In my previous blog post (1), I showed the direction Singapore has chosen to take regarding the new world of AI. Singapore chose to weaken traditional labour structures and support personal responsibility instead. I mentioned that for these 2 structures to be comparable, there needs to be transparency and accountability across the board, so that individuals can come close to replicating the informational resources the old structure (Unions) had.
I had left out the obvious fact that, if you want to adopt AI, you need to know what you are doing, or else you will get hurt. Well, it only took a week for this obvious fact to slap me in the face.
One of the crown jewels of Singapore, the Mass Rapid Transit (MRT) train system, has had a major failure, costing Singapore around $100m (2), and the cost keeps rising by the day. I place this failure firmly in the realm of an analytics failure.
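The $100m figure is a back-of-the-envelope estimate, and note (2) walks through it. As a rough sketch, here is the same arithmetic in Python. Every input is one of the rounded figures from note (2); the function name `disruption_cost` is only illustrative.

```python
# Back-of-the-envelope cost of the disruption, using the rounded inputs of note (2).
VALUE_PER_HOUR_SGD = 50        # SGD100,000 a head per year, taken as about $50 an hour
COMMUTERS_PER_DAY = 516_000    # commuters affected each day
HOURS_LOST_PER_COMMUTER = 1    # assumption in note (2): one hour of output lost daily
DAYS_DISRUPTED = 4             # working days elapsed when the estimate was made


def disruption_cost(commuters, hours_lost, value_per_hour, days):
    """Lost output in SGD: commuters x hours lost x hourly value x days."""
    return commuters * hours_lost * value_per_hour * days


daily = disruption_cost(COMMUTERS_PER_DAY, HOURS_LOST_PER_COMMUTER, VALUE_PER_HOUR_SGD, 1)
total = disruption_cost(COMMUTERS_PER_DAY, HOURS_LOST_PER_COMMUTER, VALUE_PER_HOUR_SGD, DAYS_DISRUPTED)
print(f"Daily: SGD {daily / 1e6:.1f}m, total: SGD {total / 1e6:.1f}m")
# Daily: SGD 25.8m, total: SGD 103.2m
```

Unrounded, this gives about SGD25.8m a day and about SGD103m over 4 working days, which is where the round $100m comes from. Changing the assumed hours lost or the number of days scales the total proportionally.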
What happened?
It is very simple.
- A train broke down at 0930 on Wednesday.
- In order to free the tracks, the affected train was dragged towards an appropriate area.
- However, the breakdown had caused a part of the train to be dislodged and fall onto the tracks.
- While the train was being dragged, this damaged at least 1.6km of track (3).
Why do I say it is an analytics failure then?
Before I go there, let me go back in time a little, to the '1st grand MRT failure'.
The 1st grand MRT failure
I am not going to go through the minor issues (if you are interested, you can refer to (4)(5)), but straight to the 2011 case, when 2 incidents occurred within 3 days.
Many fingers were pointed at the then CEO, Ms Saw Phaik Hwa, whose expertise lay in making money, having been regional president of DFS (Duty Free Shop) Ventures; the closest she got to managing transport was probably the DFS shops at Changi Airport. She launched the now ubiquitous shops at MRT stations. Everyone knew she had no engineering degree, but the aim was to monetise the assets of SMRT.
She was criticized for not prioritizing maintenance. But during the Committee of Inquiry into the 2011 failures, when she was accused of neglecting maintenance (the cost of maintenance rose by 3% per annum (6)), she argued that she simply approved the figures proposed by the maintenance team.
She posited a few possible reasons for the 2011 failures (which affected only around 200,000 people (8), compared to half a million now), among them an unexpected rise in ridership.
As the data in (9) clearly show, Singapore's population did indeed increase quite a bit in the years leading up to 2011; the growth rates for 2006-2011 are higher than for 2001-2005. In fact, in 2009, Singapore's population first crossed 5m.
My own non-expert but numbers-driven opinion is that maintenance budgets may not have taken into account the increased wear and tear due to increased ridership.
Indeed, at that time, the focus, as highlighted by expert witness Professor Lim from NTU, was on the strength of ‘preventive maintenance’ on the part of SMRT.
Preventive maintenance is what you do with your vehicle: you have a time-based (yearly, half-yearly) or usage-based (km driven) schedule to maintain it. The basic idea is that, in most cases, issues occur after a set period (of time or usage), so you maintain before that threshold in order to identify and fix issues before they become serious.
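As a minimal sketch of that idea, a preventive schedule boils down to a rule with two thresholds, and service is due as soon as either one is reached. The names `ServiceRule` and `service_due` and the threshold values are made up for illustration; they are not SMRT's actual figures.

```python
from dataclasses import dataclass


@dataclass
class ServiceRule:
    max_days: int    # service at least this often (time-based threshold)
    max_km: float    # or after this much usage (usage-based threshold)


def service_due(rule, days_since_service, km_since_service):
    """Preventive maintenance: due when either the time or the usage threshold is reached."""
    return days_since_service >= rule.max_days or km_since_service >= rule.max_km


rule = ServiceRule(max_days=180, max_km=10_000)   # hypothetical thresholds
print(service_due(rule, days_since_service=90, km_since_service=4_000))    # False
print(service_due(rule, days_since_service=90, km_since_service=12_000))   # True
print(service_due(rule, days_since_service=200, km_since_service=1_000))   # True
```

The second call is the interesting one: with heavier usage, the usage threshold is hit well before the calendar one. A schedule or budget planned on time alone would miss that.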
As I mentioned before, CEOs are an indication of the direction an organization is likely to take. Ms Saw Phaik Hwa was replaced by Mr Desmond Kuek, a career military man and an engineer (10).
So what has changed since then?
You can get some historical perspective by reading what the former Straits Times transport correspondent wrote (11). He highlights the recent completion of the 10-year renewal programme, and the issues that have cropped up since.
But to me, what is more enlightening is what he said SMRT does/did right, that is: “In his post, Tan called for full transparency from the authorities, questioning why the incident occurred despite SMRT’s use of predictive maintenance systems designed to prevent such failures. ‘We have been told SMRT now practices preventive and predictive maintenance… So, what happened to that fateful train?’”
SMRT has included predictive maintenance among the tools at its disposal.
This is totally in line with Singapore adopting the best techniques; the country is now leading the world in GenAI adoption (12).
In fact, the SMRT Chairman, only last year, stressed the need to balance costs and reliability, to avoid “over maintenance” (13). This is exactly where predictive maintenance can help. It is not a replacement for preventive maintenance, but an additional tool that should help manage costs better.
A couple of things I’d like to point out before I go further.
The current SMRT Chairman, Mr Seah Moon Ming, is an engineer by training and had a career in MINDEF (Ministry of Defence) and ST Engineering, among other government-related posts (14).
The current CEO, Mr Ngien Hoon Ping, is also an engineer, also comes from an army background, and is the third former high-ranking Singapore Armed Forces officer to helm SMRT (15).
The focus is squarely on efficiency in maintenance.
So how did this incident occur?
What caused the 2nd grand failure?
To me, it is predictive maintenance.
Yes, analytics is costing Singapore $100m and counting.
Let me explain myself.
I am not saying predictive maintenance is bad.
On the contrary, it is a potential cost saver and even life saver. It is a very useful tool. As with all tools, how it is used matters.
Now, predictive maintenance is not new to SMRT (16); ever since the days of Mr Desmond Kuek, predictive maintenance has been in place, with AI used to make more sense of the data generated.
To be clear, some highlights relevant to the current case:
- “allows real-time monitoring while the trains are in operation”
- “Sensors installed at City Hall MRT will also scan the entire North-South and East-West lines’ train fleet for defects such as wear and tear to the wheels or axel defects.”
- “allow SMRT to tap on multiple streams of data from all of its assets to predict the need for maintenance activities”
The system SMRT installed in 2018 came from Hong Kong Polytechnic University (17) and was the first in the world (as usual for Singapore) (18). To be clear, its capabilities are: “Apart from installing an optical fibre sensing network in tracks to monitor the trains, sensors are also installed in in-service trains to monitor the tracks on which the trains run.”
The system SMRT has ‘listens’ to the trains and to the tracks, and feeds live data for processing: real time, trains and tracks.
The tools therefore do not seem to be the problem.
The 1st thing that SMRT publicized once the incident occurred, even before any possible cause was investigated, was that the train that broke down was 35 years old.
To me, on the contrary, this means that SMRT has an enormous amount of data on this type of train, and the predictive maintenance models for this 35-year-old train should be top notch: more accurate and reliable data means more accurate and reliable models, especially in slowly changing systems.
I am not saying the models failed. There is much, much more to implementing, using and maintaining any model with predictive capabilities than simply signing a document and taking delivery of a system.
Think 3Ps: People, Product, Processes
Product
The collaboration between SMRT and HK Poly is still going strong (Dr Tan Kee Cheong (18) is still with SMRT and was even an adjunct at Hong Kong Poly (19)). Hong Kong Poly is also at the cutting edge of research and application on railways (20). Therefore, there is no reason to believe that the product, that is, the predictive maintenance system from HK Poly, has any major issues.
Rather, I think the issue has everything to do with people and process.
Let me start with process.
Process
Let’s recap what happened (21):
- The train developed a fault
- The train was being moved to the depot
- A component, the axle box, dropped onto the tracks
- The bogie frame dropped and caused the wheels to shift
- This damaged rails and tracks for at least 1.6km as the damaged train was moved.
Axle box:
Predictive maintenance watches the health of axle boxes: “Sensors installed at City Hall MRT will also scan the entire North-South and East-West lines’ train fleet for defects such as wear and tear to the wheels or axel defects.”
The predictive maintenance system should have flagged potential issues with the axle box before it failed, so that the failure could be prevented.
1.6km of damage (at least):
A piece of equipment was dragged along the tracks for at least 1.6km, and there was no alert from any sensor that the noise or the vibrations coming from the track were not normal? I am pretty sure the sensors picked up the issue. But why was the damage allowed to continue for so long?
The slew of sensors along the track should have detected the damage as it was occurring, so that its impact could be minimized.
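To make the point concrete, here is a toy sketch of the kind of check I have in mind. It is in no way a description of SMRT's actual system. It assumes a stream of vibration readings, treats the first few as normal running, and raises an alarm after several abnormal readings in a row. The function name, thresholds and readings are all hypothetical.

```python
from statistics import mean, stdev


def first_alarm(readings, baseline_size=20, z_threshold=4.0, consecutive=3):
    """Return the index at which an alarm would be raised, or None.

    The first `baseline_size` readings are treated as normal running.
    An alarm needs `consecutive` readings in a row above the threshold,
    so that a single noisy sample does not stop a train.
    """
    baseline = readings[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    run = 0
    for i in range(baseline_size, len(readings)):
        z = (readings[i] - mu) / sigma          # how unusual is this reading?
        run = run + 1 if z > z_threshold else 0
        if run >= consecutive:
            return i
    return None


normal = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95] * 4   # made-up vibration levels, normal running
dragging = [3.0] * 10                           # made-up levels once something drags on the rail
readings = normal + dragging
print(first_alarm(readings))   # 26: the third abnormal reading in a row
```

Even a crude rule like this fires within 3 readings of the change. The hard part is not the detection itself; it is deciding who sees the alarm, how fast, and what they are allowed to do about it, which is a process question.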
So what happened?
I think the issue is with the volume of data SMRT deals with: “allow SMRT to tap on multiple streams of data from all of its assets to predict the need for maintenance activities”. The process for dealing with that volume of data is likely flawed.
And this leads me to people.
People:
An analytical system is not a fire-and-forget kind of thing. Its performance has to be measured, the system within which it operates has to be evaluated, and the analytical models adjusted accordingly.
This takes some organizational commitment and some skill on the part of the analytical team. This is where most models, even if properly implemented, degrade and may fail beyond the short run.
Let me explain a little bit.
To predict whether a piece of equipment will fail, the options range from simple survival-analysis-type models, which are sufficient for a single component, all the way to digital twins that account for the interactions within a system. Every time you maintain the piece of equipment, you collect data on its state and, if possible, on its waste, whether exhaust or oil/lubricant, and use the chemical analysis as an input. There are even systems that do predictive maintenance purely based on the sound of the equipment (22).
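For readers who have not met survival analysis, here is a minimal sketch of one standard technique, the Kaplan-Meier estimator, in plain Python. It estimates the share of components still working after a given amount of usage, and it correctly handles components that were still fine when last inspected (censored observations). The data are invented for illustration; a real project would more likely use a dedicated statistics library.

```python
def kaplan_meier(durations, failed):
    """Return [(usage, survival_probability)] at each usage level with a failure.

    durations: usage at failure or at last inspection (e.g. thousands of km)
    failed:    True if the component failed at that point, False if it was
               still working when last observed (a censored observation)
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)          # still under observation at t
        deaths = sum(1 for d, f in zip(durations, failed) if d == t and f)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Six hypothetical components: usage in thousands of km, and whether each failed.
durations = [5, 8, 8, 12, 15, 20]
failed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, failed)
print([(t, round(s, 2)) for t, s in curve])
# [(5, 0.83), (8, 0.67), (12, 0.44), (20, 0.0)]
```

Reading the output: in this made-up fleet, about two thirds of components are estimated to survive past 8 (thousand km) of usage. Every maintenance visit adds a row to the data, which is why an old train type should, in principle, have the best-estimated curve.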
For those of you who have been to Ikea, remember the chair testing machine (23)?
Preventive maintenance counts the number of cycles after which the chair usually breaks and tells you the chair is good for, say, 80% (depending on the risk) of that number; predictive maintenance looks at the wear and tear on the flexible component and advises when it is deteriorating. This is the lab (showroom) world.
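The two approaches to the chair can be put side by side in a few lines. This is only a sketch under simple assumptions: the preventive rule is a fixed fraction of the typical life, and the predictive rule extrapolates the recent wear trend in a straight line. The function names, the wear readings and the wear limit are hypothetical.

```python
def preventive_limit(typical_cycles_to_failure, safety_factor=0.8):
    """Preventive: retire the chair at a fixed fraction of its typical life."""
    return round(typical_cycles_to_failure * safety_factor)


def inspections_left(wear_readings, wear_limit, window=3):
    """Predictive: extrapolate the recent wear trend up to the wear limit.

    Returns the estimated number of further inspections before the limit
    is reached, or None if wear is not currently increasing.
    """
    recent = wear_readings[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two readings to estimate a trend")
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)   # wear added per inspection
    if slope <= 0:
        return None
    return max(0.0, (wear_limit - recent[-1]) / slope)


print(preventive_limit(50_000))                           # 40000 cycles, whatever the chair's state
wear = [0.10, 0.12, 0.15, 0.22, 0.31]                     # made-up wear measurements (mm)
print(f"{inspections_left(wear, wear_limit=0.50):.1f}")   # 2.4 inspections left at the current trend
```

The preventive number never changes, whoever sits on the chair. The predictive number moves with every new reading, which is exactly why it needs someone watching it.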
Now, when the piece of equipment interacts with a changing world, the external factors that affect the equipment also need to be included.
For example, let’s say this chair is at my home. If I suddenly put a large amount of weight on it, my predictions regarding my chair go out of the window: the environment the chair exists in has changed, and I need to adjust my calculations accordingly.
A slightly more complex system has to be built.
And someone needs to know when the parameters within the model have to be adjusted.
This is what people are for: to keep the predictions usable.
Predictive maintenance suffers from the fact that it is hard to model to start with, given that failures are (hopefully) rare; modeling rare occurrences has its own challenges. Add to that the fact that, ideally, you need to model external factors. People become even more crucial: they have to imagine what may affect the model, and try to improve it by testing whether these features make the model perform better.
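One of the challenges with rare failures is easy to show with numbers. In the made-up example below, 2 components out of 1,000 fail, and a "model" that always answers "no failure" looks excellent on accuracy while catching nothing. The function name and the data are illustrative only.

```python
def accuracy_and_recall(actual, predicted):
    """Accuracy: share of correct calls. Recall: share of real failures that were caught."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    caught = sum(a and p for a, p in zip(actual, predicted))
    failures = sum(actual)
    recall = caught / failures if failures else float("nan")
    return correct / len(actual), recall


actual = [False] * 998 + [True] * 2   # 2 real failures among 1,000 components
always_ok = [False] * 1000            # a "model" that never predicts a failure
acc, rec = accuracy_and_recall(actual, always_ok)
print(f"accuracy={acc:.3f}, recall={rec:.1f}")   # accuracy=0.998, recall=0.0
```

A 99.8% accurate model that misses every failure is useless for maintenance. Choosing the right measure of performance, and noticing when the headline number is misleading, is a job for people.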
It is a continuous process.
A model is meant to represent something. As things or the environment change, so must the model. Luckily, in analytics, it is part of the process that people should follow to test for such changes and make sure they are successfully captured in an improved model.
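What such a check might look like, in its crudest form: compare how many failures the model expected over a period with how many actually happened, and call for a review when reality runs well ahead of the model. This is a sketch under my own assumptions; `needs_review`, the ratio of 2 and the numbers are hypothetical, and the check only looks at under-prediction.

```python
def needs_review(predicted_probs, failures_observed, max_ratio=2.0):
    """Flag the model for review when observed failures exceed
    `max_ratio` times the number the model expected for the period."""
    expected = sum(predicted_probs)   # expected number of failures over the period
    return failures_observed > max_ratio * expected


fleet = [0.01] * 200   # 200 components, each given a 1% failure probability by the model
print(needs_review(fleet, failures_observed=2))   # False: in line with the 2 expected
print(needs_review(fleet, failures_observed=7))   # True: the world has changed, revisit the model
```

The code is trivial. What is not trivial is having someone run it every period, own the result, and have the authority to reopen the model when it says True.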
Summary:
In sum, the case of the SMRT incident illustrates the importance of people continually thinking about and improving systems and the processes around them.
It’s the people, not the technology.
While Singapore is leading GenAI adoption, and is putting a structure in place to follow its chosen trajectory, it is crucial that basic steps are not missed, or else the structure may crumble.
The case of SMRT has shown that even in areas where Singapore is world class (24), there still are gaps in the ability to use analytics and keep using it in the medium to long run. And in my view, this case stems from issues with people and processes.
The use of data via the application of analytical models, whether pure statistical models, ML, AI…, needs to be thought through, and people with the right expertise and creativity are needed to ensure these models keep performing as expected.
Just having the IT skills to deploy a model, especially out of the box, or to follow documentation to launch it, is not sufficient.
“Use your blain!”
Let this be a $100m lesson.
- https://www.linkedin.com/feed/update/urn:li:activity:7244502068500045824/
- Singapore GDP per head is USD82,000 yearly (https://www.macrotrends.net/global-metrics/countries/sgp/singapore/gdp-per-capita), let’s say SGD100,000, or around $50 an hour. 516,000 commuters are affected a day (https://www.channelnewsasia.com/singapore/east-west-line-disruption-smrt-faulty-train-timeline-4638131), and let’s be nice and assume each loses 1 hour of output daily, so that is around SGD25m a day. The issue has been going on for 4 days already (excluding weekends), hence SGD100m.
- https://www.lta.gov.sg/content/ltagov/en/newsroom/2024/9/news-releases/update_on_EWL_recovery_works.html
- https://www.straitstimes.com/singapore/transport/water-in-tunnels-human-error-other-major-train-service-disruptions-in-s-pore-s-history
- https://www.nlb.gov.sg/main/article-detail?cmsuuid=0888e6b3-5912-4ceb-b34e-1238a0b2ea8f
- https://sgtransportcritic.wordpress.com/2021/12/16/dec-2011-breakdowns-2021/
- https://ifonlysingaporeans.blogspot.com/2012/05/mrt-breakdown-coi-day-18.html
- https://sg.news.yahoo.com/saw-phaik-hwa-defends-lavish-spending-in-tnp-exclusive.html
- https://www.macrotrends.net/global-metrics/countries/SGP/singapore/population
- https://en.wikipedia.org/wiki/Desmond_Kuek
- https://www.theonlinecitizen.com/2024/09/27/christopher-tan-criticizes-mrt-breakdown-following-decade-long-renewal-program/
- https://www.asiabusinessoutlook.com/news/singapore-tops-generative-ai-adoption-worldwide-nwid-7254.html
- https://www.smrt.com.sg/news-publications/newsroom/smrt-in-the-news/%E2%80%98we-don%E2%80%99t-want-overmaintenance%E2%80%99-smrt-chairman-flags-need-to-balance-rail-reliability-with-costs/4
- https://en.wikipedia.org/wiki/Seah_Moon_Ming
- https://sg.news.yahoo.com/ngien-hoon-ping-third-consecutive-saf-man-smrt-ceo-070030116.html
- https://www.todayonline.com/singapore/smrt-taps-predictive-technology-prioritise-maintenance
- https://www.scmp.com/presented/news/topics/polyu-innovating-better-world/article/2065348/optical-fibre-sensing-technology
- https://www.smartcitiesworld.net/news/worlds-first-onboard-train-track-monitoring-system-in-singapore-1026
- Dr. Chee Keong Tan - Head Network Systems Maintenance - SMRT Trains | LinkedIn
- https://www.globalrailwayreview.com/news/135103/mtr-corporation-mtra-and-hong-kong-polytechnic-university-sign-mou/
- https://www.straitstimes.com/multimedia/graphics/2024/09/ewl-train-breakdown/index.html?shell
- https://www.tandfonline.com/doi/full/10.1080/18824889.2020.1863611
- https://www.youtube.com/watch?v=4s_gyzshNPQ
- https://www.channelnewsasia.com/today/big-read/public-transport-connectivity-mrt-lines-buses-commute-big-read-4445081