Tuesday 30 July 2019

Ethics, privacy, analytics, "data science", big data, big brother...

Ethics is a very hot topic in the field of analytics, even more so than it is in civic society at large.




Why are ethics and analytics related? Simply because, in the course of our work, the astonishing amount of data being generated, captured, and traded by all sorts of players, combined with advances in technology, means that the amount of information (even personal information) we can grasp has grown much, much faster than people's awareness of it, and therefore much faster than the thinking that has gone into setting ethical standards.

In sum, our ability to grasp information has grown much faster than the public's awareness of it. Since people are not aware of what you and I could know about them, they don't see the ethical issues. And for most of us, our ethics are rooted in the past.

Of course, there are pioneers who subscribe to "it's easier to ask for forgiveness than permission", who may know that ethics will change over time and that it is highly profitable to be ahead of those changes. This is especially true in the technology and analytics spaces.

Let me first illustrate some possible ethical issues in famous uses of technology in recent years (in no particular order); I am not including hacking exploits. In my next blog I will illustrate real-life cases that people in analytics have faced.

1 The Google car also captured wifi information, and Google kept the data

Remember the Google car? It went around many countries worldwide, taking the photos that power Google Maps and Street View. However, it also collected more: "It is now clear that we have been mistakenly collecting samples of payload data from open wifi networks, even though we never used that data in any Google products", Google said in 2010 (1). And if, nowadays, you want to 'benefit' from high-accuracy location services (something Grab demands), you will 'benefit' from a service that "calls upon every service available: GPS, Wi-Fi, Bluetooth, and/or cellular networks in whatever combination available, and uses Google's location services to provide the most accurate location." (2)

The fun bit was that Google claimed to have "mistakenly" collected the data. I may be a technical ignoramus, but somehow I think that a camera and something that snoops on wifi networks are quite different things. Plus, even the first time they obtained the data, someone should have asked: "Hey, what is this?" (which must have happened if it was a mistake), and hopefully, "Should we keep this?"

At the very least this "should", a question that demands a value judgement, ought to have been asked.


2 Facebook and the emotion contagion experiment of 2012 (3)

Facebook simply wanted to understand whether people’s mood could spread to their friends and contacts.

So, arbitrarily, for some users, they only showed, say, posts from friends who displayed negative sentiments, while suppressing those that showed positive sentiments. Basically, you would only see that your friends are not in a good way, and nothing from those who are doing well.
Do you think that would dampen your mood too?

Well, Facebook showed that it did.

Interestingly, this experiment affected 689,000 Facebook users (like an old-fashioned cheque, I will spell it out: six hundred and eighty-nine thousand users).

Interestingly, someone (Clay Johnston from Blue State Digital, which helped Obama's 2008 election campaign (4)) had anticipated Facebook's next move: "Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting Upworthy [a website aggregating viral content] posts two weeks beforehand? Should that be legal?"

3 Facebook’s electoral activism in the 2010 mid-term elections (5)

Here Facebook ran experiments to find out whether they could influence people to vote.

The idea, again, is very simple: put up an "I Voted" button, encourage people who voted to click on it, and publish on their friends' feeds that they had voted. The aim was to see whether people who were shown that their friends had voted were more likely to vote themselves, spreading the contagion.

Do you think it was ethical? Does knowing that 61 million people were affected make a difference to your answer (sixty-one million)?

Would it make a difference if they targeted only a certain group of people, in a specific area, say one where the election is expected to be close?

I am quite sure that this experiment, in one way or another, played a part in the Cambridge Analytica campaign to get Mr Trump elected in 2016 (6).

4 Target and Pregnancy Prediction (7)

Target identified a combination of products purchased from their stores that indicated that a person was likely to be pregnant.
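The logic can be sketched (very loosely) as a weighted score over a customer's basket. Target never published its model, so the product names, weights, and threshold below are entirely invented for illustration:

```python
# Toy pregnancy-score sketch. Products and weights are invented;
# Target's actual model and coefficients have never been published.
PREGNANCY_SIGNALS = {
    "unscented lotion": 2.0,
    "calcium supplement": 1.5,
    "magnesium supplement": 1.5,
    "large cotton balls": 1.0,
    "hand sanitiser": 0.5,
}

def pregnancy_score(basket):
    """Sum the weights of the signal products present in a purchase history."""
    return sum(w for item, w in PREGNANCY_SIGNALS.items() if item in basket)

def likely_pregnant(basket, threshold=3.0):
    """Flag the customer once the combined signal crosses a threshold."""
    return pregnancy_score(basket) >= threshold

basket = {"unscented lotion", "calcium supplement", "large cotton balls"}
print(pregnancy_score(basket))   # 4.5
print(likely_pregnant(basket))   # True
```

The point of the sketch is that no single purchase is revealing; it is the combination that betrays you.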

Pregnancy is expensive, and Target decided it would be a good idea to give these customers discounts to alleviate the burden, while forming shopping habits that would continue well beyond the pregnancy period.

Since this was 2012, it wasn’t straightforward to identify customers as they walked in and make them offers on the spot. Therefore, Target decided to mail the coupons and discount vouchers to customers’ homes.

In at least one case, a father got to know that Target was sending vouchers for cribs and other baby-related items to his daughter, who was still in high school, and thought it inappropriate.

Do you think Target violated any ethics in building the model (pregnancy prediction)? How about in the way they chose to make use of the information (discounts with a view to forming a long-term habit)? Or the way they chose to implement the campaign (mailers to homes)?

I ask these questions because, to me, an analytics project should encompass all aspects, from data collection to modelling to execution/implementation. This is probably my own ethical view, but if there is no responsibility throughout the chain, then ethics can easily disappear: “I just collect the data; for all I know nobody ever looks at it”, or “I just build the models as a challenge; they can only cause harm if employed wrongly”, or “I have no idea how this thing was built; I am only executing on a plan”…

5 Amazon’s sexist recruiting tool (8)

Some of you may be old enough to remember when, say, the calendar was a separate programme you had to install on your desktop computer instead of coming preloaded with Windows. After a while Microsoft simply decided to move into this space, driving the calendar companies out of business.

Today’s cloud providers are doing the same.

Amazon decided they would get into the recruitment space. Using AI, they figured, they could find the best people for various jobs. After all, they had a treasure trove of data: 10 years’ worth of applications.

Note that I am not knocking Amazon specifically; Microsoft via LinkedIn and Google via Google Recruit (9) are also in the business.

What is different about the Amazon effort is that they realised their predictions were heavily biased against women. Their post-hoc analysis found that the mention of the word “women” in a CV, for example in “captain of the women’s football team”, tended to give applicants a lower score. So did CVs from graduates of women-only colleges.

Amazon pulled the tool.

Do you think Amazon was ethical (or others are) when using CVs and hiring outcomes to score applicants? Or are they unethically perpetuating human biases?

6 Uber and one-night-stands

I have argued before that companies like Grab and Uber are not in the transportation business but in the data business. Their focus is on transforming the data that people give them for free into information they can sell.

In 2012, Uber showed what they could do with the data when they published a research piece entitled “Rides of Glory”. Basically, they identified people who had had one-night stands. It is simple, really, and goes something like this: if you are picked up at night from, say, an area with bars (assuming you consumed alcohol, and given that alcohol lowers inhibitions), go to a residential area you rarely visit (based on your past behaviour), and leave after a short while or at dawn (with or without breakfast), then chances are you had a one-night stand.

Note that even if Uber doesn’t have the data on you going to an unfamiliar place (after all, your date could have paid for that ride), the fact that you leave an unfamiliar place is also a very good indicator.
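The heuristic described above can be sketched as a simple rule-based check. The field names, zone labels, and thresholds are invented for illustration; Uber never released its analysis as code:

```python
# Rule-of-thumb "ride of glory" detector, paraphrasing the heuristic
# above. All field names, zones, and cut-offs are illustrative only.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Ride:
    pickup_time: datetime
    pickup_zone: str        # e.g. "bar district", "quiet suburb"
    dropoff_zone: str
    rider_id: str

def visits_to(zone, history):
    """Count how often this rider has been dropped off in a zone before."""
    return sum(1 for r in history if r.dropoff_zone == zone)

def looks_like_one_night_stand(ride, history):
    late_night = ride.pickup_time.hour >= 22 or ride.pickup_time.hour <= 4
    from_bars = ride.pickup_zone == "bar district"
    unfamiliar = visits_to(ride.dropoff_zone, history) == 0
    return late_night and from_bars and unfamiliar

# A late-night pickup from the bars to a never-visited suburb trips all
# three rules.
ride = Ride(datetime(2019, 7, 27, 23, 30), "bar district", "quiet suburb", "rider-1")
print(looks_like_one_night_stand(ride, []))  # True
```

Three innocuous fields (time, place, history) are enough; no single one of them is sensitive on its own.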

Uber pulled the article, but you may still retrieve it from cached data or snapshots or refer to articles that describe the experiment (10).

This becomes more interesting when you consider that, since 2016 (11), Uber tracked people who had downloaded the app even when they were not using it; after some furore, Uber apparently deactivated this feature (12).

Do you think it is ethical for Uber to use the data from passengers to predict one-night stands, or do you feel this is an invasion of privacy? How about how often you go to a doctor? I bet your insurer would love to know that…

7 Baidu and Wei Zexi (13)

Just in case you thought I was being racist by targeting only organisations based in the US, or that ethical issues were a purely Western concern, I have included the case of Wei Zexi, a Baidu user.

In simple terms, Baidu is the Google of China, the number one search engine used by people in China.

Mr Wei Zexi found that he had a very rare form of cancer, and he decided to do some research using Baidu. The top recommendation (the post that appeared at the top of the results) was from the XXX Military Academy. The website detailed its successes in treating that form of cancer. Mr Wei Zexi was convinced and, with his family’s help, moved city to undertake the treatment.

He did not survive.

It was only later found that the hospital did not really have the success rates it promoted, and, arguably, Baidu’s placing it at the top of the list added credibility to the claims.

The Chinese authorities launched an investigation, and news of it caused Baidu’s market value to drop by USD 5bn.

Do you think Baidu has an obligation to check on the veracity of the claims made by its clients when being paid to promote certain links to the top of the list? Or should people instead place little value on the rankings (thereby making the ranking algorithm wars obsolete)?

I am not sure how Baidu ranks websites, but in its early years Google differentiated itself by placing the websites that were linked to by the most other websites on top. The underlying assumption was that if other websites refer to yours, then chances are you have something important, relevant, or widely agreed with to say, hence higher credibility/relevance. Google has, of course, moved on from PageRank (14), but the idea is there.
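The link-counting idea can be illustrated with a toy power-iteration version of PageRank. The tiny graph and damping factor below are purely illustrative, not Google's actual algorithm:

```python
# Toy PageRank: a link is treated as a vote of credibility, and a page's
# rank is spread over the pages it links to. Graph is illustrative only.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal rank
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = rank[p] / len(outs)      # split rank over out-links
                for q in outs:
                    new[q] += damping * share
            else:                                # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
# Page "c" ends up highest: both "a" and "b" link to it.
```

Notice that the ranking rewards being linked to, not being correct, which is exactly why veracity is a separate question.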
Ask yourself: how many times do you use a non-Google search engine? How many times do you go to the second page of results? How many times do you read past the top three websites? How much trust are you putting in Google? Now ask the question again: do you think that, ethically, Google (and Baidu and other search engines) should check the veracity of claims, especially in cases where the price of getting things wrong is high, as with medical treatment?

8 People at Google are actually listening to you (15)

Recently, Google fired a contractor who leaked individuals’ “OK Google” conversations. While Google argued that no personally identifiable information is captured, it is possible that some people mentioned their names or addresses.

I assume that users of “OK Google” would never have imagined that it is possible for someone to actually be listening to their recordings. For those of you who do use “OK Google”: is my assumption correct? Do you assume your chats with “OK Google” are human-to-machine and thereby relatively private? Do you feel this privacy is violated when a human listens in?

9 Alexa is always listening, and recording, and sharing (16)

Alexa is Amazon’s assistant, a device you bring into your home and to which you can make requests, ranging from internet searches (just like “OK Google”) to controlling things around the house (“Alexa, play romantic music”, “Alexa, dim the lights”…).

Did you know that Alexa is always listening?

Think about it, Alexa is designed to “wake up” and know you are addressing her/it when keywords are spoken such as “Alexa”, “Echo”, “Computer”. But in order to “hear” these words when they are spoken, Alexa needs to always be listening. Makes sense?

Where it becomes a bit more interesting is that Alexa keeps the conversations, transcribes them, and stores them in the cloud, where they are analysed, including by humans.

Do you think this is an invasion of your privacy? What if only machines ‘listened’ to your conversations, would that be ok? Is a line crossed if there are people listening?

10 Smart TV, who is watching who?

It’s not only voice-activated machines that are at it; how smart is your smart TV?

Very.

Google makes millions by selling the idea of placing ads you may be interested in while you are surfing. Wouldn’t it be great if someone could do the same for when you watch TV?

The first step is simply to know who is watching TV at a given time. For example, I spend most of my time watching cartoons/anime. If you only looked at my channel behaviour, you might assume that I live in a household with kids. But at this moment, since every TV gets the same advertisement, I would be shown ads targeted at people with kids.

Once the TV is connected to the internet, advertisements can be personalised, or at least different segments of people can be shown different ads. While judging based on viewing behaviour is better than nothing, wouldn’t it be better if you knew for sure, say, how many people are watching the TV, maybe their gender and approximate age group, on top of the viewing behaviour? That’s what smart TVs do. (17)

Smart TV manufacturers collect the data and sell it to people who would like to personalise the ads that will be shown on that same smart TV; a nice little cycle.

Do you think it is an invasion of privacy if your smart TV is watching you? Is it ethical for the manufacturers to do this, and even to make it really hard to deactivate the features?

11 And it does not stop there… vroom vroom

TVs just sit there in your living room, and presumably voice-activated devices such as Alexa stay where you put them. But there is a device that moves autonomously around many people’s homes. I have a friend who proudly declares that he has not swept or mopped his home for two years.

No, he is not a hoarder, nor does he love filth; he uses an autonomous cleaner.

So what is the big deal? Well, recently the technology has moved in the direction of mapping the internal layout of your home; this map is stored in the cloud and may be shared with partners (18).

How do you feel about a map of the inside of your home being potentially shareable? It becomes more interesting when some of these devices are also hooked up to Alexa. Are there any ethical issues here?

12 You can run but you cannot hide, facial recognition everywhere

How do you feel about facial recognition? Isn’t it great to be recognised and greeted/treated properly? There are offices that grant employees access using just facial recognition. Do you feel there is any downside to facial recognition?

My guess is that most people would be OK with the technology. They might not be with millions of cameras constantly trying to identify everyone, everywhere. Then again, if facial recognition is used to send personalised warnings to children heading towards areas with a high drowning risk, it’s a good thing, right? But maybe not when the system also automatically sends a message to the student’s parents and school (19).

So probably it’s a case of: “it’s not the technology, but the people who apply it”?

Not really. Facial recognition technology isn’t as accurate as you may think.

When the European cup final was held in Cardiff a few years ago, the authorities thought it would be a great opportunity to identify and catch football fans who had been captured on camera engaging in various offences but were not caught at the time. The system wrongly identified over 2,000 people (20): of 2,470 potential hooligans it pointed out, 2,297 were wrong, meaning 92.99% of its alerts were false matches.
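For what it's worth, the rate quoted above checks out as straightforward arithmetic on the reported figures:

```python
# Sanity-checking the Cardiff numbers: the share of the system's
# alerts that turned out to be wrong.
flagged = 2470   # potential hooligans pointed out by the system
wrong = 2297     # of those, wrongly identified

share_wrong = wrong / flagged
print(f"{share_wrong:.4f}")  # 0.9300, i.e. roughly 93% of alerts were false matches
```

Strictly speaking this is the proportion of alerts that were false (sometimes called the false discovery rate), which is the figure that matters to the innocent people being stopped.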

And it gets more interesting: Amazon Rekognition, for example, is actually bad at recognising Asian faces (21). If I were in Washington, I’d be worried (22).

Do you think there are ethical questions around the use of facial recognition? Or does it depend on the use case? For example, in cases where the cost of getting it wrong is high (arresting and questioning the wrong person, or preventing some people from accessing some services), maybe facial recognition should not be used?

There are cities now back-pedalling on the use of facial recognition in law enforcement (23)(24)(25). And there is an interesting map on the usage of facial recognition (26), would you be more or less likely to visit/live in cities that use facial recognition at scale? And why? Anything to do with ethics?

And, hot off the press, Google is apparently paying people cash, or giving them Starbucks vouchers, for allowing it to capture and use their faces (27). In case you think that is fake news, I can offer you another piece of news on the topic (28).

And just in case you think your face can be captured by any CCTV anywhere, do bear in mind that CCTVs were not designed to collect information suitable for facial recognition; specialised devices are. Is your identity worth only USD 5?

What do you think?

Conclusion


Just as an indicator for yourself, take this little test to see where you range in terms of ethics.


Score each use case for ethical issues, from 1 (no issues) to 5 (huge issues):

1. Google Street View captured and stored wifi details: ___
2. Facebook’s emotion contagion experiment: ___
3. Facebook’s electoral activism (2010 mid-terms): ___
4. Target and pregnancy prediction: ___
5. Amazon’s sexist recruiting tool: ___
6. Uber and one-night stands: ___
7. Baidu and Wei Zexi: ___
8. People at Google are listening to you: ___
9. Alexa is always listening, recording, sharing: ___
10. Smart TV is watching you: ___
11. Autonomous robot cleaner maps the inside of your home: ___
12. Facial recognition everywhere: ___

Do you think there is a positive relationship between your score above and your age?

In my next blog, I will follow up with real-life cases faced at work. Chances are that blog will be shorter :-)

My personal view is: "just because you can, doesn't mean you should"; yes, I often sit in corners, as anyone reading my blogs has probably guessed. Anybody want to guess my score in the table above? :-)


