Human Resource Analytics is one of the more exciting applications of ‘analytics’/’Big Data’/’Data Science’. A very straightforward application is to predict who is likely to leave within a certain time frame and, to some degree, what can be done to retain the employee if the organisation wishes to do so. (Not all employees who want to leave are worth keeping.) I’ve built models of this type and they work quite well, using only the employee’s behaviour at work.
This becomes even more interesting in the context of “Big Data”: with the increasing use of social networking and our growing online presence, most of us leave traces from which further behavioural clues can be harvested. Recently there has been a trend towards using machine learning more in Human Resources, especially for hiring. It is sold as finally delivering unbiased hiring. But how accurate is that claim?
Quite a few start-ups promise unbiased hiring, claiming to get you the best candidate through proprietary algorithms, very often machine learning algorithms.
Basically, in machine learning, the machine is fed a whole bunch of data; in the context of HR this is likely to be applicants’ CVs, any test answers, interview notes and, in the case of supervised algorithms, whether the person was hired or not (and maybe the subsequent performance of the people hired). The machine then trawls through the data, learns, and when fed a new batch of candidates picks a list of top candidates based on what it has learnt.
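To make this concrete, here is a minimal sketch in Python of the supervised set-up described above: fit a classifier to past hiring decisions, then rank new candidates by its score. The feature names and numbers are made up purely for illustration; a real system would extract far richer features from CVs, tests and interview notes.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Historical applicants: numeric features plus the label "was this person hired?"
history = pd.DataFrame({
    "years_experience": [1, 3, 5, 2, 7, 4, 6, 0],
    "test_score":       [55, 70, 85, 60, 90, 75, 80, 50],
    "interview_score":  [3, 4, 5, 3, 5, 4, 4, 2],
    "hired":            [0, 0, 1, 0, 1, 1, 1, 0],
})

X = history[["years_experience", "test_score", "interview_score"]]
y = history["hired"]

# "Learning": fit a classifier to past hiring decisions.
model = LogisticRegression().fit(X, y)

# New candidates are scored and ranked by the model's predicted probability of being hired.
new_candidates = pd.DataFrame({
    "years_experience": [2, 6, 4],
    "test_score":       [65, 88, 72],
    "interview_score":  [4, 5, 3],
})
scores = model.predict_proba(new_candidates)[:, 1]
print(new_candidates.assign(score=scores).sort_values("score", ascending=False))
```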
My first reaction to this is that, if the data you feed into the algorithm is the data collected by your own organisation over the hiring period, then the machine is likely to replicate what the organisation has been doing. Let’s say most of your hiring managers come from a specific school; they might be biased in favour of that school and hire people accordingly. The machine will, in an entirely unbiased way, learn this behaviour and apply it. Training a machine on biased data is likely to lead to the machine churning out biased outcomes.
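A toy simulation, again with entirely synthetic data, shows the effect: if historical decisions favoured one school, the trained model hands the same advantage to new candidates from that school, even at identical skill levels.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

skill = rng.normal(size=n)                # true ability, independent of school
school_a = rng.integers(0, 2, size=n)     # 1 = attended School A

# Biased historical decisions: School A graduates get an extra boost.
logit = 1.5 * skill + 1.0 * school_a - 0.5
hired = rng.random(n) < 1 / (1 + np.exp(-logit))

X = pd.DataFrame({"skill": skill, "school_a": school_a})
model = LogisticRegression().fit(X, hired)

# Two equally skilled candidates, differing only by school:
pair = pd.DataFrame({"skill": [0.0, 0.0], "school_a": [1, 0]})
print(model.predict_proba(pair)[:, 1])  # the School A candidate scores visibly higher
```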
Of course there are a few ways round this.
One way is not to limit the training dataset to a single organisation, but to take a broader view of the market, trying to remove, or at least dilute, the bias. Maybe your organisation is populated by graduates from School A and your competitor by those from School B; if you train your algorithm on the combined dataset, you are likely to reduce the bias. This is the idea behind credit bureau scores, and is likely what quite a few of the start-ups in the HR space are trying to do.
In this case the organisation-specific biases are likely to be mitigated, as long as the training dataset is diverse enough. However, unless you have the whole universe, you still run the risk of being biased. Ideally you would weight your training set; it is like stratified sampling in a way, but how would you know how to stratify? You’d need to know the biases already.
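For what it’s worth, a crude version of that reweighting looks like the sketch below. It assumes you already know which attribute to balance on (here a hypothetical “school” column), which, as noted, is precisely the hard part.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def balance_weights(group: pd.Series) -> np.ndarray:
    """Weight each row inversely to its group's frequency, so under-represented
    groups count as much as over-represented ones during training."""
    freq = group.map(group.value_counts(normalize=True))
    return (1.0 / freq).to_numpy()

# Made-up historical data: School B is under-represented.
train = pd.DataFrame({
    "test_score": [60, 85, 70, 90, 65, 80, 75, 55],
    "school":     ["A", "A", "A", "A", "A", "A", "B", "B"],
    "hired":      [0, 1, 1, 1, 0, 1, 1, 0],
})

weights = balance_weights(train["school"])
model = LogisticRegression().fit(train[["test_score"]], train["hired"],
                                 sample_weight=weights)
```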
Another approach is to repair the training dataset (see, for example, http://www.datasociety.net/output/hiring-by-algorithm/). Basically, the output of the algorithm has to be analysed, any biases uncovered before implementation, and the training dataset modified to remove those biases. It is not as easy as it seems. For example, imagine that the true reason why people are rejected from the hiring process is not directly apparent in the data; let’s say it is a criminal record. If some section of society, say people with blue hair, is over-represented among people with a criminal record, then the algorithm may well be less likely to recommend people with blue hair for hire, based on the fact that a smaller proportion of them have been hired in the past, without recognising the real reason. Blue hair becomes, in effect, a proxy for a criminal record.
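The same blue-hair scenario can be simulated directly. In this synthetic sketch the criminal record is never shown to the model, yet blue-haired candidates end up with lower scores because hair colour is correlated with the hidden reason for rejection.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000

blue_hair = rng.integers(0, 2, size=n)
# In this toy population a record is more common among blue-haired applicants.
record = rng.random(n) < np.where(blue_hair == 1, 0.30, 0.05)
skill = rng.normal(size=n)

# Historical decisions: rejected if there is a record, otherwise driven by skill.
hired = (~record) & (skill > 0)

# The model only ever sees hair colour and skill, never the record itself.
X = pd.DataFrame({"blue_hair": blue_hair, "skill": skill})
model = LogisticRegression().fit(X, hired)

pair = pd.DataFrame({"blue_hair": [1, 0], "skill": [0.5, 0.5]})
print(model.predict_proba(pair)[:, 1])  # the blue-haired candidate scores lower
```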
To go a level deeper, even an algorithm trained on the whole population is unlikely to be free of societal biases (https://www.theguardian.com/technology/2016/aug/03/algorithm-racist-human-employers-work). For example, if women have rarely been employed in C-level positions, an algorithm is likely to pick that up and not recommend women for these positions, or to show lower-paying jobs online to people it knows to be women (http://www.andrew.cmu.edu/user/danupam/dtd-pets15.pdf).
To make matters worse, the algorithms are often black boxes. Unless the results of the algorithm are analysed and an attempt is made to find the possible drivers using supervised techniques, it is almost impossible to uncover the actual drivers. (There are vendors who do that, but clients need to bear in mind that these are not necessarily the real drivers, and to me it rather defeats the purpose.)
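One generic way to attempt this (not any particular vendor’s method) is to fit a simple, interpretable surrogate model to the black box’s own outputs and read off the rules it recovers. The sketch below does this with a shallow decision tree and a made-up stand-in black box; the caveat above still applies, since the surrogate only shows what the decisions correlate with, not the true internal logic.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

def probe_black_box(black_box_predict, candidates: pd.DataFrame) -> str:
    """Fit a shallow decision tree to the black box's decisions over `candidates`
    and return a human-readable description of the rules it recovered."""
    decisions = black_box_predict(candidates)
    surrogate = DecisionTreeClassifier(max_depth=2).fit(candidates, decisions)
    return export_text(surrogate, feature_names=list(candidates.columns))

def opaque_model(df: pd.DataFrame) -> pd.Series:
    """Stand-in black box for illustration: secretly keen on School A graduates."""
    return (df["test_score"] + 20 * df["school_a"] > 90).astype(int)

rng = np.random.default_rng(2)
candidates = pd.DataFrame({"test_score": rng.uniform(40, 100, 500),
                           "school_a": rng.integers(0, 2, 500)})

print(probe_black_box(opaque_model, candidates))
```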
As an example, imagine a story told to a classroom about a person committing crimes, getting caught by the police and being sent to jail. Everyone agrees that the outcome (ending up in jail) is undesirable. But what was learnt from the story: do not commit crimes, or do not get caught? A teacher can find out by simply asking the students, but a black box is unlikely to answer.
This is what is called algorithmic transparency, one of the solutions the Ford Foundation posits for tackling biases in algorithms (https://www.fordfoundation.org/ideas/equals-change-blog/posts/can-computers-be-racist-big-data-inequality-and-discrimination/). But by definition, black boxes are not transparent.
For those of you who are interested in reading more about the topic, I’d recommend the paper by Barocas and Selbst, Big Data’s Disparate Impact (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899). For an overview of a method to uncover hidden biases and remove them from the training dataset, bearing in mind the trade-offs, see http://arxiv.org/pdf/1412.3756v3.pdf. Basically it takes a lot of effort, and if the actual predictors used are not visible, even more so.
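To give a flavour of the kind of measure used in that line of work, here is a sketch of the disparate impact ratio (the “four-fifths rule”): the selection rate of one group divided by that of everyone else. This is only the measurement side, not the repair method itself, and the data are made up.

```python
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, outcome_col: str,
                     protected_value) -> float:
    """Selection rate of the protected group divided by the rate of everyone else."""
    protected = df[df[group_col] == protected_value][outcome_col].mean()
    others = df[df[group_col] != protected_value][outcome_col].mean()
    return protected / others

# Made-up example: a ratio below 0.8 would usually be flagged for review.
applicants = pd.DataFrame({
    "blue_hair":   [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "recommended": [0, 1, 0, 0, 1, 1, 0, 1, 1, 0],
})
print(disparate_impact(applicants, "blue_hair", "recommended", protected_value=1))
```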
In summary, I’d say that achieving total impartiality in a Human Resources function such as hiring by using machine learning is not as easy as it seems. Humans are generally biased, whole societies have biases, and these biases often taint the machines too, especially via the data the algorithms are trained on. And if the algorithms are black boxes, it becomes that much harder to remove the biases.