“If you don’t have a PhD, don’t call
yourself a data scientist.” With this remark, the government-linked presenter set
the stage for a presentation on AI and the problems that
AI implementation suffers from today.
To flesh out his argument, he
claimed that only a PhD gives you the rigour, and the access to large enough data sets to play
with, needed to become a data scientist.
A very interesting point of view
from someone linked to the government.
Not everything he said was that
controversial, at least to me.
The presenter used my favourite diagram
for data science, the ‘Drew Conway’ diagram (1), acknowledging the importance
of subject matter expertise. Data Science is a balanced combination of “Substantive
Expertise” or subject matter expertise, “Maths and Statistics Knowledge” and “Hacking
Skills” or IT skills.
Furthermore, the presenter
mentioned how hard it is to find all three skills at the required level in one person,
and spoke of data science teams; or, as I like to say: “Data Science is a
team sport”.
The presenter was also at pains
to point out that a 3-month course in data science does not make you a data
scientist, even if you hold a PhD in English Literature or in
Astrophysics; it
takes years.
I am on the fence on this one. I
think “data science”, like every subject, needs practice, and while a 3-month
course will most likely not give you enough experience, it doesn’t have to take
years and years. Any expertise is gained through practice.
Furthermore, the presenter is a
proponent of open source, and advises everyone to eschew classes and learn online
instead, paying tens of dollars rather than hundreds. I am all for learning
online, and have taken classes from DataCamp (2), where I learnt a lot, as well as
from Coursera (3).
But where it gets really weird
(and please remember that the presenter is linked to the government) is that he then
went on to “sell” his classroom courses, of around 3 months, in which he hopes to
provide some practical experience.
Unless he is targeting only PhDs
as students, I find what he is saying quite contradictory...
The reason I mentioned English
Literature and Astrophysics is that the presenter further mentioned that one of the
reasons why the country may be finding it hard to find “data scientists” is the
fault of HR departments. They are looking for a unicorn with degrees in
computer science (let alone PhDs). The advice was that they should loosen the
criteria and accept people from different disciplines who have taken
online courses...
My view is not that dissimilar.
I believe in passion and, without knowing anything else about a person, I would say
that an engineer is more likely to make a good “data scientist” than a Statistician
or a Computer Scientist. The reason is that, to me, “data science” is about
delivering value, and the passion should be for solving problems (the end), not for the
means (AI/ML/Stats)...
Then this goes back to the PhD
question. Do I believe you can’t be a “data scientist” without a PhD? Well, it
may be self-serving since my profile states “data scientist”, but no, I do not
believe a PhD is required.
In fact, quite a few
organisations have found this out. Basically, people with PhDs are great in their
own domain, but “data science” requires a multitude of skills that they may not
have (for example subject matter expertise, or statistics for computer
scientists, or IT skills for Statisticians) or may not want to engage in: the ‘dirty’
work of cleaning and preparing the data. Hence the organisations whose “data
science” department is staffed purely by PhDs find it very difficult to get a
decent RoI (results, results and results).
While I am at it, I will also
mention another way that organisations get their staffing wrong (hey, maybe
that deserves a separate blog, but here goes): some “data
scientists” delegate the data cleaning and preparation to “data preparation” staff or
“data engineers”. It gets worse when the latter do not have a clear career path
to the former, like sous-chefs becoming chefs... Data preparation should be
done with a purpose, and unless the high and mighty “data scientist” can
communicate that purpose effectively and in great detail (which probably also requires
some EQ), there is a risk that the prepared data will not be fit for purpose.
Basically I believe that data
cleaning and preparation is part of the role of a “data scientist” especially
since “data science” is by nature iterative and iterations may involve
obtaining and preparing data that was not included initially.
Quite a while ago I did an easy
to understand view of the work of a unicorn (“data scientist”); as you can see,
data preparation and transformation is part of the process. I can understand
that someone who is good at solving business problems may not be very good at
getting data in the most efficient way from various systems, or at writing production-ready
code, but surely preparing data is part of the role; after all, most
people will tell you that this is 70%-80% of the work... (4)(5)
So why am I upset enough to write
this blog?
Basically, I believe analytics/“Data
Science” has the power to unlock enough value to create win-win (win)
situations (organisation, customer/society, and consultancy/vendor), and
getting the framework for data science right is critical in that regard.
From the presentation I attended,
it would seem that the government has got some things right, some wrong, and
some contradicting each other. I do hope they sort things out; unfortunately,
the Peter principle may be at work (6), or maybe it’s HiPPOs (7), or both, since
there is often a high correlation between the two (a hippo named Peter...)
Actually ya, this might be the
topic of my next blog, although I am also itching to write about AI/Automation and
retraining...
P.S. Did I mention that the presenter said that SLR is part of AI?
(See it pays to read all the way to the end... now please clean the coffee from your device)