Tuesday, 30 July 2019

Ethics, privacy, analytics, "data science", big data, big brother...

Ethics is a very hot topic in the field of analytics, even more so than it is in civic society.




Why are ethics and analytics related? Simply because, in the course of our work, an amazing amount of data is being generated, captured and traded in all sorts of ways by all sorts of players. Add to that the advances in technology, and the amount of information, even personal information, we can grasp has increased much, much faster than people's awareness of it, and therefore faster than the thinking that has gone into setting ethical standards.

In sum, the ability to grasp information has grown much faster than the awareness of it. Since people are not aware of what you and I could know about them, they don't see the ethical issues. For most of us, our ethics are rooted in our past.

Of course there are pioneers who subscribe to “it’s easier to ask for forgiveness than permission”, who may know ethics will change over time and that it is highly profitable to be ahead of the ethical changes. This is especially true in the technology and analytics spaces.

Let me first illustrate some possible ethical issues in famous uses of technology in recent years (in no particular order); I am not including hacking exploits. In my next blog I will illustrate real-life cases people in analytics have faced.

1 The Google car also captured wifi information, and Google kept the data

Remember the Google car? It went around many countries worldwide, taking photos and enabling the maps. However, they also collected more: "It is now clear that we have been mistakenly collecting samples of payload data from open wifi networks, even though we never used that data in any Google products," Google said in 2010 (1). However, if you nowadays want to 'benefit' from high accuracy on your location services (something Grab demands), you will 'benefit' from a service that "calls upon every service available: GPS, Wi-Fi, Bluetooth, and/or cellular networks in whatever combination available, and uses Google's location services to provide the most accurate location." (2)

The fun bit was that Google claimed to have "mistakenly" collected the data. I may be a technical ignoramus, but somehow I think that a camera and something that snoops on wifi networks are quite different. Plus, even the first time they obtained the data, someone should have asked: "Hey! What is this?" (which must have happened if it was a mistake) and hopefully "Should we keep this?".

At the very least, this "should", a question that demands a value judgement, should have been asked.


2 Facebook and the emotion contagion experiment of 2012 (3)

Facebook simply wanted to understand whether people’s mood could spread to their friends and contacts.

So, arbitrarily, for some users they only showed, say, posts from friends who displayed negative sentiments, while suppressing those that showed positive sentiments. Basically you only see that your friends are not in a good way, and nothing from those who are doing well.
Do you think that would dampen your mood too?

Well, Facebook showed it did.
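Mechanically, the intervention is not complicated. Here is a toy sketch of that kind of feed filtering (my own illustration, not Facebook's code), assuming each post carries a hypothetical sentiment score:

```python
# Toy sketch of feed filtering by sentiment (illustrative, not Facebook's code).
from dataclasses import dataclass
from typing import List

@dataclass
class Post:
    author: str
    text: str
    sentiment: float  # hypothetical score: -1 = very negative, +1 = very positive

def filter_feed(posts: List[Post], treatment: str, threshold: float = 0.2) -> List[Post]:
    """Return the feed a user in a given treatment group would see."""
    if treatment == "suppress_positive":
        return [p for p in posts if p.sentiment < threshold]
    if treatment == "suppress_negative":
        return [p for p in posts if p.sentiment > -threshold]
    return posts  # control group sees everything

feed = [
    Post("alice", "Best holiday ever!", 0.9),
    Post("bob", "Lost my job today.", -0.8),
    Post("carol", "Meh, Monday again.", -0.1),
]

for post in filter_feed(feed, "suppress_positive"):
    print(post.author, post.text)   # only the gloomy posts survive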

Interestingly, this experiment affected 689,000 Facebook users (like an old-fashioned cheque, I will specify: six hundred and eighty-nine thousand users).

Interestingly, someone (Clay Johnson of Blue State Digital, who helped Obama's 2008 election campaign (4)) had anticipated Facebook's next move: “Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting Upworthy [a website aggregating viral content] posts two weeks beforehand? Should that be legal?”

3 Facebook's electoral activism in the 2010 mid-term elections (5)

Here Facebook ran an experiment to find out whether it could influence people to vote.

The idea again is very simple: put up an "I voted" button, encourage people who voted to click on it, and publish on their friends' feeds that they had voted. The aim was to see whether people who were shown that their friends had voted were more likely to vote too, and whether the contagion spread.

Do you think it was ethical? Does knowing that 61 million people were affected make a difference to your answer (sixty-one million)?

Would it make a difference if they targeted only a certain group of people, in a specific area, say where elections are expected to be undecided?

I am quite sure that this experiment, in one way or another, played a part in the Cambridge Analytica campaign to get Mr Trump elected in 2016 (6).

4 Target and Pregnancy Prediction (7)

Target identified a combination of products purchased from their stores that indicated that a person was likely to be pregnant.
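Stripped of the details, such a model is just a classifier over basket features. Below is a minimal sketch on synthetic data; the product features and probabilities are illustrative assumptions, not Target's actual model:

```python
# Minimal sketch on synthetic data (not Target's actual model): a logistic
# regression over basket indicator features. Feature names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
pregnant = rng.random(n) < 0.1                       # ~10% of synthetic shoppers

def buy(prob_if_pregnant, prob_otherwise):
    """Simulate a 0/1 'bought this product' column."""
    return (rng.random(n) < np.where(pregnant, prob_if_pregnant, prob_otherwise)).astype(float)

X = np.column_stack([
    buy(0.6, 0.10),   # unscented lotion
    buy(0.5, 0.15),   # vitamin/mineral supplements
    buy(0.4, 0.10),   # large bags of cotton balls
    buy(0.05, 0.30),  # beer
])

model = LogisticRegression().fit(X, pregnant)
basket = np.array([[1, 1, 1, 0]])                    # lotion + supplements + cotton balls
print("pregnancy score:", round(model.predict_proba(basket)[0, 1], 2))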

Pregnancy is expensive, and Target decided that it would be a good idea to give customers discounts to alleviate the burden while forming habits that would continue well beyond the pregnancy period.

Since this was 2012, it wasn't straightforward to identify customers as they walked in, to make them offers. Therefore, Target decided to mail the coupons and discount vouchers to the customers' homes.

In at least one case, a father got to know that Target was sending vouchers about cribs and other baby-related products to his daughter, who was still in high school, and thought it inappropriate.

Do you think Target had violated any ethics in building the model (pregnancy prediction)? How about the way they chose to make use of the information (discounts with a view to forming a long-term habit)? Or the way they chose to implement the campaign (mailers to homes)?

I ask these questions because, to me, an analytics project should encompass all aspects, from data collection to modelling to execution/implementation. This is probably my ethical view, but if there is no responsibility throughout the chain, then ethics can easily disappear: “I just collect the data, for all I know nobody ever looks at it” or “I just build the models as a challenge, it can only potentially cause harm if employed wrongly” or “I have no idea how this thing was built, I am only executing on a plan”…

5 Amazon’s sexist recruiting tool (8)

Some of you may be old enough to remember when, say, the calendar was a separate programme you had to install on your desktop computer instead of it being preloaded into Windows. After a while Microsoft simply decided to move into this space, driving the calendar companies out of business.

Today’s cloud providers are doing the same.

Amazon decided they would get into the recruitment space. Using AI, they figured, they could find the best people for various jobs. After all, they had a treasure trove of data: 10 years' worth of applications.

Note that I am not knocking Amazon specifically; Microsoft via LinkedIn and Google via Google Recruit (9) are also in the business.

What is different with the Amazon effort is that they realised their predictions were heavily biased against women. Their post-hoc analysis found that the mention of the word “women” in a CV, for example in “captain of the women's football team”, tended to give applicants a lower score. So did graduating from a women-only college.
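A tiny synthetic sketch of how that happens, and how it can be spotted: train a text model on biased historical hiring outcomes and inspect the weight it attaches to a gendered token. This is my own illustration, not Amazon's system:

```python
# Synthetic sketch (not Amazon's system): if historical hiring labels are biased,
# a text model trained on them attaches negative weight to gendered tokens.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

cvs = [
    "captain of the women's chess club, python developer",
    "captain of the chess club, python developer",
    "women's coding society lead, java engineer",
    "coding society lead, java engineer",
    "women's college graduate, data analyst",
    "college graduate, data analyst",
]
hired = [0, 1, 0, 1, 0, 1]   # biased historical outcomes

vec = CountVectorizer()
X = vec.fit_transform(cvs)
clf = LogisticRegression().fit(X, hired)

# Audit: the coefficient on the token "women" comes out strongly negative.
idx = vec.vocabulary_["women"]
print("weight for 'women':", clf.coef_[0][idx])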

Amazon pulled the tool.

Do you think Amazon was ethical (or others are) when using CVs and hiring outcomes to score applicants? Or are they unethically perpetuating human biases?

6 Uber and one-night-stands

I have argued before that companies like Grab and Uber are not in the transportation business but in the data business. Their focus is on transforming the data that people give them for free into information they can sell.

In 2012, Uber showed what they can do with the data when they published a research piece entitled “Rides of Glory”. Basically, they identified people who had one-night stands. It is simple really and goes something like this: if you are picked up at night from, say, an area with bars (assuming you consumed alcohol, and given that alcohol lowers inhibitions), go to a residential area you rarely visit (based on your past behaviour), and leave after a short while or at dawn (with or without breakfast), then chances are you had a one-night stand.

Note that even if Uber doesn't have the data on you going to an unfamiliar place (after all, your date could have paid for that ride), the fact that you leave the unfamiliar place is also a very good indicator.
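To show how little sophistication such an analysis needs, here is a rough sketch of the rule described above over hypothetical ride records; the field names and thresholds are my assumptions, not Uber's actual logic:

```python
# Rough sketch of the rule described above, over hypothetical ride records.
# Field names and thresholds are assumptions, not Uber's actual logic.
from datetime import datetime

def looks_like_a_rog(pickup, dropoff, return_pickup, familiar_places):
    """Flag a pair of rides matching the pattern: late-night pickup near bars,
    drop-off at an unfamiliar residential address, departure a few hours later."""
    late_night = pickup["time"].hour >= 22 or pickup["time"].hour <= 4
    nightlife_area = pickup["zone"] == "bar_district"
    unfamiliar = dropoff["address"] not in familiar_places
    hours_there = (return_pickup["time"] - dropoff["time"]).total_seconds() / 3600
    short_stay = 1 <= hours_there <= 10   # leaves between roughly 1am and mid-morning
    return late_night and nightlife_area and unfamiliar and short_stay

ride_out = {"time": datetime(2012, 3, 10, 23, 40), "zone": "bar_district"}
drop = {"time": datetime(2012, 3, 11, 0, 5), "address": "14 Elm St"}
ride_back = {"time": datetime(2012, 3, 11, 6, 30), "address": "14 Elm St"}

print(looks_like_a_rog(ride_out, drop, ride_back, familiar_places={"2 Oak Ave"}))  # True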

Uber pulled the article, but you may still retrieve it from cached data or snapshots or refer to articles that describe the experiment (10).

This becomes more interesting when you consider that since 2016 (11), Uber tracked people who had downloaded the app even when they were not using the app; after some furore they have apparently deactivated this feature (12).

Do you think it is ethical for Uber to use the data from passengers to predict one-night stands, or would you feel this is an invasion of privacy? How about how often you go to a doctor? I bet your insurer would love to know that…

7 Baidu and Wei Zexi (13)

Just in case you thought I was being racist by targeting only organisations based in the US, or that ethical issues were a purely Western concern, I have included the case of Wei Zexi, a Baidu user.

In simple terms, Baidu is the Google of China, the number one search engine used by people in China.

Mr Wei Zexi found out that he had a very rare form of cancer, and he decided to do some research using Baidu. The top recommendation (the post that appeared at the top of the results) was from the XXX Military Academy. The website then proceeded to detail their successes in treating that form of cancer. Mr Wei Zexi was convinced and, with his family's help, moved city to undertake treatment.

He did not survive.

It was only later that it was found that the hospital did not really have the success rates it promoted and, arguably, Baidu placing it at the top of its list added some credibility to the claims.

The Chinese authorities launched an investigation, and news of that investigation caused Baidu’s value on the share market to drop by USD5bn.

Do you think Baidu has an obligation to check on the veracity of the claims made by its clients when being paid to promote certain links to the top of the list? Or should people instead place little value on the rankings (thereby making the ranking algorithm wars obsolete)?

I am not sure how Baidu ranks websites, but in the early years Google differentiated itself by placing websites that were linked to by the most other websites at the top. The underlying assumption was that if other websites refer to yours, then chances are you have something important/relevant to say, or something many people agree with, hence higher credibility/relevance. Google has, of course, moved on from PageRank (14), but the idea is there.
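For the curious, the original PageRank idea fits in a few lines; this is a minimal power-iteration sketch, not how Google ranks pages today:

```python
# Minimal power-iteration sketch of the original PageRank idea (modern ranking
# uses far more signals than this).
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:            # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(pagerank(web))   # C collects the most inbound "votes", so it ranks highest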
Ask yourself: how many times do you use a non-Google search engine? How many times do you go to the second page of results? How many times do you read past the top 3 websites? How much trust are you putting in Google? Now ask the question again: do you think that, ethically, Google (and Baidu and other search engines) should check on the veracity of claims, especially in cases where the price of getting things wrong is high, like in the case of medical treatment?

8 People at Google are actually listening to you (15)

Recently, Google fired a contractor who leaked individuals' “OK Google” conversations. While Google argued that no personally identifiable information is captured, it is possible that some people may have mentioned their names or addresses.

I assume that users of “OK Google” would never have imagined that it is possible for someone to actually be listening to their recordings. For those of you who do use “OK Google”, is my assumption correct? Do you assume your chats with “OK Google” are human-to-machine and thereby relatively private? Do you feel this privacy is violated when a human is listening?

9 Alexa is always listening, and recording, and sharing (16)

Alexa is Amazon's assistant, a device you bring into your home and to whom you can make requests, ranging from doing internet searches (just like OK Google) to controlling things around the house (“Alexa, play romantic music”, “Alexa, dim the lights”…).

Did you know that Alexa is always listening?

Think about it, Alexa is designed to “wake up” and know you are addressing her/it when keywords are spoken such as “Alexa”, “Echo”, “Computer”. But in order to “hear” these words when they are spoken, Alexa needs to always be listening. Makes sense?
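In other words, the always-on microphone is an architectural necessity, not a bug. A simplified sketch of such a wake-word loop is below; the function names and the toy "audio" are illustrative, not Amazon's implementation:

```python
# Simplified sketch of an always-listening wake-word loop (illustrative only).
import collections

def wake_word_loop(chunks, is_wake_word, stream_to_cloud, follow_up=2, buffer_len=4):
    """Score each chunk locally; once the wake word fires, ship the buffered audio
    plus the next few chunks (the actual request) to the cloud."""
    recent = collections.deque(maxlen=buffer_len)
    awake_for = 0
    for chunk in chunks:                       # the mic never stops: that is the point
        recent.append(chunk)
        if is_wake_word(chunk):
            awake_for = follow_up
            stream_to_cloud(list(recent))      # context captured before the wake word
        elif awake_for > 0:
            awake_for -= 1
            stream_to_cloud([chunk])           # the request itself leaves the device

# Toy demo with fake "audio"; is_wake_word and stream_to_cloud are stand-ins.
audio = ["background chatter", "alexa", "dim the lights", "more chatter"]
wake_word_loop(audio,
               is_wake_word=lambda c: "alexa" in c,
               stream_to_cloud=lambda buf: print("sent to cloud:", buf))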

Where it becomes a bit more interesting is that Alexa keeps the conversations, transcribes them and stores them in the cloud, and these are analysed, including by humans.

Do you think this is an invasion of your privacy? What if only machines ‘listened’ to your conversations, would that be ok? Is a line crossed if there are people listening?

10 Smart TV: who is watching whom?

It's not only voice-activated machines that are at it; how smart is your smart TV?

Very.

Google makes millions by selling the idea of placing ads you may be interested in while you are surfing. Wouldn't it be great if someone could do the same for when you watch TV?

The first step is simply to know who is watching TV at a point in time. For example, I spend most of my time watching cartoons/anime. If you only looked at my channel behaviour, you might assume that I live in a household with kids. But at the moment, since each TV gets the same advertisement, the ads are targeted at people with kids.

Once the TV is connected to the internet, advertisements can be personalised, or at least different segments of people can be shown different ads. While judging based on viewing behaviour is better than nothing, wouldn't it be better if you knew for sure, say, how many people are watching the TV, maybe their gender and approximate age group, on top of the viewing behaviour? That's what smart TVs do. (17)

Smart TV manufacturers collect the data and sell it to people who would like to personalise the ads that will be shown on the smart TV, a nice little cycle.

Do you think it is an invasion of privacy if your smart TV is watching you? Is it ethical for the smart TV manufacturers to do this, and even make it really hard to deactivate the features?

11 And it does not stop there… vroom vroom

TVs just sit there in your living room, and presumably voice-activated devices such as Alexa stay where you put them. But there is a device that moves autonomously around many people's homes. I have a friend who proudly declares that he has not swept or mopped his home for two years.

No, he is not a hoarder, nor does he love filth; he uses an autonomous cleaner.

So what is the big deal? Well, recently, the technology has moved in the direction of mapping the internal layout of your home, and this is stored on the cloud, and may be shared with partners (18).

How do you feel about a map of the inside of your home being potentially shareable? It becomes more interesting when some of these devices are also hooked up to Alexa. Are there any ethical issues here?

12 You can run but you cannot hide, facial recognition everywhere

How do you feel about facial recognition? Isn't it great to be recognised and greeted/treated properly? There are offices that grant access to employees using just facial recognition. Do you feel that there is any downside to facial recognition?

My guess is that most people would be ok with the technology. They might not be ok with millions of cameras constantly trying to identify everyone, everywhere. However, if facial recognition is used to send personalised warnings to children heading towards areas with a high drowning risk, it's a good thing, right? But maybe not when the system also automatically sends a message to the student's parents and school (19).

So probably it’s a case of: “it’s not the technology, but the people who apply it”?

Not really. Facial recognition technology isn’t as accurate as you may think.

When the Champions League final was held in Cardiff a few years ago, the authorities thought it would be a great opportunity to identify and catch football fans who had been captured on camera engaging in various offences but were not caught at the time. The system wrongly identified over 2,000 people (20): of 2,470 potential hooligans it pointed out, 2,297 were wrong (a false positive rate of about 93%).
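Here is the arithmetic behind that headline number, plus a rough illustration (with assumed figures, not the real Cardiff operation) of why a tiny base rate of genuine targets makes false alarms dominate even for a seemingly accurate system:

```python
# The arithmetic behind the Cardiff figures, plus an illustration (assumed
# numbers) of why a tiny base rate makes false alarms dominate.
alerts, wrong = 2470, 2297
print(f"share of alerts that were wrong: {wrong / alerts:.2%}")   # about 93%

# Illustrative: 170,000 faces scanned, 500 genuinely wanted, a system that is
# right 99% of the time on both groups still produces mostly false alarms.
crowd, wanted, accuracy = 170_000, 500, 0.99
true_alerts = wanted * accuracy
false_alerts = (crowd - wanted) * (1 - accuracy)
print(f"true alerts: {true_alerts:.0f}, false alerts: {false_alerts:.0f}")
print(f"share of alerts that are wrong: {false_alerts / (true_alerts + false_alerts):.0%}")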

And it gets more interesting: Amazon Rekognition, for example, is actually bad at recognising Asian faces (21); if I were in Washington, I'd be worried (22).

Do you think there are ethical questions around the use of facial recognition? Or does it depend on the use case? For example, in cases where the cost of getting it wrong is high (arresting and questioning the wrong person, or preventing some people from accessing some services), maybe facial recognition should not be used?

There are cities now back-pedalling on the use of facial recognition in law enforcement (23)(24)(25). And there is an interesting map on the usage of facial recognition (26), would you be more or less likely to visit/live in cities that use facial recognition at scale? And why? Anything to do with ethics?

And hot off the press, Google is apparently paying people cash or giving Starbucks vouchers for allowing it to capture and use their faces (27). And in case you think it is fake news, I can offer you another piece of news on the topic (28).

And just in case you think that your face can be captured by any CCTV anywhere, do bear in mind that CCTVs were not designed to collect the kind of information facial recognition needs; specialised devices are. Is your identity worth only USD 5?

What do you think?

Conclusion


Just as an indicator for yourself, take this little test to see where you stand in terms of ethics.


Use case | Score of ethical issues (1 = no issues, 5 = huge issues)
1. Google Street View captured and stored wifi details | ___
2. Facebook's emotion contagion experiment | ___
3. Facebook's electoral activism (2010 mid-terms) | ___
4. Target and pregnancy prediction | ___
5. Amazon's sexist recruiting tool | ___
6. Uber and one-night-stands | ___
7. Baidu and Wei Zexi | ___
8. People at Google are listening to you | ___
9. Alexa is always listening, recording, sharing | ___
10. Smart TV is watching you | ___
11. Autonomous robot cleaner maps the inside of your home | ___
12. Facial recognition everywhere | ___


Do you think there is a positive relationship between your score above and your age?

In my next blog, I will follow up with real-life cases faced at work. Chances are that blog will be shorter :)

My personal view is: "just because you can, doesn't mean you should"; yes, I often sit in corners - anyone reading my blogs has probably guessed it. Anybody want to guess my score in the table above? :)



Monday, 8 July 2019

Data-driven is at risk (Bias, horizons, migration part II)


So ok, people are getting visibly more selfish and extreme in their views, so what?

In my previous blog (1) I used some numbers around the election of Mr Trump and the Brexit vote, and showed that the data clearly pointed towards Mr Trump winning because of white voters, and Brexit happening thanks to older voters. I also explained why it is likely that we will see more and more extreme views being taken and exhibited by people. In this blog, I consider what, as analytics people, we should expect and how we could deal with this new reality.




Data Driven is at risk

People in the analytics field have been waiting a long time for organisations to embrace the aim of becoming data-driven. Using data to help organisations make the 'right'/'optimal' decisions is what we do.

For many years we were confined to very specific business units: marketing/customer management, where we increased product take-up, decreased churn, and increased the value of customers to the organization while decreasing costs; risk, where we help manage risk and find optimal points to balance risk and return, red-flagging potential issues before they happen so that pre-emptive action can be taken, or as they are taken, to help re-evaluate cases relatively objectively; operations, where we increase efficiency by making processes smoother, decreasing TAT, and optimizing resources and resource utilization; and human resource management, where we help select suitable candidates, reduce employee churn, flag potential issues ahead of time, and so on.

Finally, with all the buzz around “data science”, ML, AI and DL, the concept of using data to make decisions became mainstream, and the idea of expanding the use of analytics to the organization as a whole started taking hold. So we have Chief Data Officers, Chief Analytics Officers… with associated teams who actually walk the talk, including specialized analytics consultancies such as Experfy (2) and AlphaZetta (3). These organisations provide expert support in very specialized roles across the analytics spectrum, although less so on the dev-ops side, where the market is very well stocked.

In fact, this emerging gap in the market caused management consultancies to try to create, or buy, their own analytics shops, some more interestingly (4) than others (5).

But this new trend of visible extremism runs counter to being data-driven.

I mentioned the gap in the market above; not only have specialized organisations sprung up to address this gap using human experts, but some organisations, taking advantage of the lack of suitably qualified humans, have gone the systems way and codified specific procedures into generic solutions that may be applied to different cases, for example DataRobot (6). To me this just makes it easier to pick the answer you like; many algorithms are thrown at the data and a user gets to pick the one that suits them. I am not saying that a human wouldn't do this – in fact running algorithms is probably the easiest part of the analytics process – but automated systems allow this to be done more easily. It's not the technology, it's how it is used (and by whom).



The second coming of the HiPPOs

In this murky environment, the HiPPOs are making a return. HiPPO decisions are decisions made by going along with the Highest Paid Person's Opinion. Despite the hype around “data scientists'” remuneration, they are usually not the HiPPOs.

Being driven by data means that, in different circumstances, different positions are optimal and should, in true data-driven fashion, be taken. Extreme and inflexible decision makers instead look for evidence that suits them.

We are in an environment where it is becoming more acceptable to express and push extreme opinions, and where, if you really want to, you can find some interpretation of data that suits your views – usually on the websites/newsfeeds you go to regularly or that are recommended to you. After all, we are living in the world of “fake news” (7) and “deep fakes” (8). The fact is that very few people or organisations have the skills and resources to question the data, and in such an environment it becomes very easy to drive a personal agenda.

A little knowledge

Another related issue is that everyone thinks they are an expert; a little knowledge is a dangerous thing (9). I was once in a meeting when someone said: “It's easy to become a 'data scientist', you can just take a couple of hours' course online! I've done it myself!”

I think online courses are great. Democratisation of knowledge is good because it allows for healthy, informed debates and eventually less bias. However, there are all sorts of courses online, and it is very easy to spend a lot of time, effort and money on courses that are not that useful.

Relying on the wisdom of the crowds (rating systems) is not necessarily the right thing, especially for introductory courses, which create the foundation upon which further knowledge will be added, simply because the majority of people giving ratings would themselves be newbies; hence the ratings would likely reflect how the learning felt rather than informed opinions on its quality. There has been some effort to build curated curricula out of the publicly available courses, but not everyone can benefit from them.

Correlation Co-Co-your-head

How many of us have heard the phrase “correlation does not mean causation” in the office or even at social events? Or even worse: how many of us have heard people use the word “correlation” in everyday conversation, especially to mean any kind of relationship? (The most common measure of correlation, in fact, only describes/measures linear relationships (10).)
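A quick demonstration of that caveat: a perfect but non-linear relationship can show essentially zero Pearson correlation.

```python
# A perfect but non-linear relationship can still have near-zero Pearson correlation.
import numpy as np

x = np.linspace(-1, 1, 101)
y = x ** 2                      # y is fully determined by x, but not linearly
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")   # essentially 0: 'no correlation', yet total dependence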



Hence, paradoxically, while more people believe that being data-driven is a good thing, we are getting less data-driven because:


1 It is getting easier to find “supporting evidence” by choosing algorithms after the fact based on their results. The emergence of 'automated' algorithm-testing software plays a big part in that (see the sketch after this list).

2 The proliferation of “data science” courses combined with the lack of “quality”/”suitability” checks gives the impression that “data science” is all about a couple of catchy algorithms let loose on the data, any data.

3 “Fake news” is becoming more and more common; in some cases, people repost and give credence to stuff without doing any research into whether it is true. Not checking is not new. For example, it took 3 years and countless applications to realise that the evidence in favour of feeding patients 80% oxygen after operations was flawed, and so was the WHO's advice based on those papers (11).
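To illustrate point 1: throw enough 'candidate signals' (or algorithms) at pure noise and the best in-sample result will look like supporting evidence. A tiny sketch with random data:

```python
# Tiny illustration of point 1: try enough random 'predictors' against pure noise
# and the best in-sample result looks like supporting evidence.
import numpy as np

rng = np.random.default_rng(42)
target = rng.normal(size=100)                     # pure noise, nothing to find
candidates = rng.normal(size=(200, 100))          # 200 unrelated 'signals' / models

correlations = [abs(np.corrcoef(target, c)[0, 1]) for c in candidates]
print(f"best of 200 spurious correlations: {max(correlations):.2f}")  # typically around 0.3
```

None of those candidates carries any information about the target; the 'best' one is just the luckiest.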



So what can analytics people who choose to walk the talk do about this?

Unless you are given the mandate and the power, don't even think of trying to change an organisation

Most of us end up working for organisations, whether as freelancers, contractors or employees. Depending on the level or role we are engaged in, there are cases when a nice carrot is dangled: a chance of making an organization truly data-centric, or at least of taking part in such an effort. Here are a few things to look out for.

1 Industry

The industry you are in makes a huge difference. It is arguable that the degree of exposure to digitization in an industry is a good gauge of how likely people in management are to be on board the data-driven train.

A group of people from McKinsey wrote a very interesting article in HBR (12) that uses data and provides possible explanations for the findings. I have reproduced the diagram at the heart of their paper below:
[Diagram reproduced from the HBR article: relative digitization by industry]



One of the interesting things about such surveys is that the financial services industry is one of the most digitized. I think it is necessary to split banking and insurance. 

I have worked in both banking and insurance, and from my experience, insurance has far more dinosaurs alive than banking does. For example, I have met a CIO/CTO of an insurance company who was convinced that the best way to deal with migration of dirty data is to hire students as temps and re-key in the whole database (and I am not talking about only hundreds of records).

But this also means that the rewards for becoming data-driven in the insurance space are huge. And there are start-ups that are trying to bring modernity to insurance. However, there is a long way to go.
In sum, you can use the study by McKinsey as a guide but remember, insurance is much lower down the order.

From a business perspective, it is much easier to ‘sell’ analytics in an industry where it is already quite accepted; in fact many organisations would be looking around for partners; not being left behind is a powerful motivator.

2 Strong leadership

Whatever the level of digital adoption in an industry, if an organisation is to become data-driven, the decision and support have to come from the top. Changing people's mindsets is not easy, and we may disagree on what makes effective leadership, but I am quite sure we would agree that the ability to stay the course is critical in a leader who wants to implement a change towards being data-driven.

You may argue that I am shooting myself in the foot, because I have been arguing that HiPPOs would be winning, and you are very unlikely to have a bigger HiPPO than the CEO. 

However, a HiPPO can easily win any battle, but change is more of a war than a battle. Furthermore, if there are opposing (or even just old-school status quo) voices in the organization – or people who want to become the new HiPPO – then any change will be in a very stop-start/one-step-forwards-two-back fashion. 

Basically without strong leadership, an organization cannot change, or will change at the pace of its slowest/least-willing/biggest bully executive.

In the case of analytics/”data science”, there is simply no point trying to effect change without strong leadership. It will just lead to frustration.

For consultancies on the other hand, doing tactical projects can be a decent source of income, as long as success metrics are defined very clearly at the outset; however any project where delivery cannot be measured objectively should be avoided.

3 Results-based culture

It may sound obvious to analytics/“data science” people, but not all organisations are results-based.
You do not need analytics to be results-based; outcomes can be measured not only in sales but across the organization, such as in operations and even in departments like marketing, PR and comms…

An organization who has the mindset of measuring outcomes, setting KPIs and rewarding their people according to these KPIs will quickly adapt to the use of analytics/”data science”.

But if an organization still rewards people subjectively, focuses on 'effort' (or worse, relationships) rather than outcomes, then the first battle for the use of analytics will be over the need to measure outcomes.

That’d be a lot of change to be done, and implementing analytics is hard enough without having to fight that battle.

Furthermore, an organization that does not have a results-based culture is most likely to be personality driven, HiPPOs will fight for their corner of the murky waters… 

In sum, it would not be a good organization for someone into analytics to join immediately.

As for consulting, it may be worthwhile, but focus on short-term projects with clear, measurable outcomes that minimize bruising from HiPPOs. These may or may not grow into longer, more transformational projects, or even longer-term contracts. But as long as the culture does not change, it might be better not to prioritise such opportunities.

Summary 

Nowadays people can find like-minded people more easily, and this has led to more visible and unbending extreme stances.

Furthermore, the craze around ML/DL/AI/“data science” has increased demand for analysts/“data scientists”, but it has also allowed some not-so-accurate beliefs to seep into the collective mind.

However, the increased demand has also increased the supply of training and courses that supposedly prepare one for a career in analytics. But the quality of these is hard to ascertain.

This sometimes creates people with a little knowledge that may be dangerous. This is worse when that knowledge belongs to executives, or worse HiPPOs.

Therefore, as practitioners, while the goal of a “data-driven” organization is very enticing, most of us end up being disillusioned and frustrated.

In order to minimize chances of this, I have proposed 3 aspects one should be very mindful of before getting involved in a gig/contract/job.

1 Industry – different industries have different maturity in the use of analytics/digitization. In a more mature industry, it is easier to find people to support the efforts to be data-driven. In others, you may find more blockers. As an employee, that can be really frustrating.

2 Leadership – sometimes a good leader can drive change. Finding a strong leader is critical if an organization has to become data-driven. Change has to take place across the organization, and mini HiPPOs, status-quo people will drag the process down, deflect issues and focus on their comfort zone BAU. A leader who cannot cut through this is not someone you should follow as an employee. As a contractor, focus on measurable outcomes, to minimize risk, and report directly to the project sponsor who has to be right at the top.

3 Culture – any organization that wishes to become data-driven has to measure results and reward people accordingly. If an organization is not already doing that, the battle to become data driven will be long and painful. So someone from the analytics field should really reconsider joining such an organization. However, this is good hunting ground for gigs/short contracts as long as you can ensure metrics are used as acceptance criteria; this would help minimize chances of being crushed by HiPPOs.

Conclusion

If you find an opportunity to make an organization data-driven, and this organization is in an industry not traditionally associated with analytics, whose leader is more of a consensus/approval seeking person and where the organization doesn’t focus on results, then it is probably better to look elsewhere if you are looking for a longish work stint.

On the other hand, if an organization is in an industry that has been toying with analytics for a while, has a results-based culture and has strong, supportive leadership, then lucky you.