Sunday 18 February 2018

Know how to address your relatives for Chinese New Year, or ontologise like a "data scientist" while visiting



One of the things I usually say about “Data Science” is that a lot of what is being done today has been in existence for a long while, but is being used differently or more extensively, partially aided by technological advances.

Ontologies are no different; they have been around for a long while. Most of us have come across taxonomies.

Put simply, a taxonomy is a way of classifying stuff in a hierarchy of concepts, such as parent-child or class-subclass. A family tree is one example, useful for Chinese New Year (1). GXFC!



And since this is the lunar year of the dog, let’s look at the taxonomy of the Canidae:


So what’s the difference between a taxonomy and an ontology?

Basically an ontology can capture much more than parent-child type relationships. It is designed to capture ideas.
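To make the difference concrete, here is a minimal sketch in Python. It is not how any particular ontology tool stores things; the relation names (`is_a`, `lives_with`, `zodiac_animal_of`) and the helper functions are made up for illustration. The taxonomy only knows one kind of link (child to parent), while the ontology stores typed links, so it can hold ideas other than "is a kind of".

```python
# Taxonomy: every link is the same "is a kind of" relation (child -> parent).
TAXONOMY = {
    "domestic dog": "Canis",
    "grey wolf": "Canis",
    "red fox": "Vulpes",
    "Canis": "Canidae",
    "Vulpes": "Canidae",
    "Canidae": "Carnivora",
}


def ancestors(concept, taxonomy=TAXONOMY):
    """Walk up the hierarchy and return every ancestor, nearest first."""
    chain = []
    while concept in taxonomy:
        concept = taxonomy[concept]
        chain.append(concept)
    return chain


# Ontology: links are typed (subject, relation, object), so relations other
# than parent-child can be stored. Relation names here are illustrative only.
ONTOLOGY = [
    ("domestic dog", "is_a", "Canis"),
    ("grey wolf", "is_a", "Canis"),
    ("domestic dog", "lives_with", "humans"),
    ("domestic dog", "zodiac_animal_of", "lunar year 2018"),
]


def related(subject, relation, triples=ONTOLOGY):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(ancestors("domestic dog"))
print(related("domestic dog", "zodiac_animal_of"))
```

The taxonomy can answer "what is a dog a kind of?", but only the triple store can answer "what is the dog the zodiac animal of?".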

In fact, the earliest known versions of ontologies date back to the ancient Greeks. Ontology is the philosophical study of the nature of being, becoming, existence or reality, as well as the basic categories of being and the relations between them (2).

Ontologies can be physically manifested and used as systems to capture knowledge in a certain area, and to represent the relationships between the concepts in that area. An ontology can be used as a knowledge base, and that’s one form in which it can be useful in “Data Science”, for example.

Another thing I always say about “data science” is that the software doesn’t matter much. Most software will have the most common formulae/algorithms, so what is critical is how you use the algorithms and knowing which to use when, not the software, which is just a tool.

However, not all formulations of these algorithms are identical, so for people new to a piece of software, mistakes can creep in unnoticed and bad decisions can be made.

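A simple example will illustrate what I mean. Say you are doing a simple hypothesis test based on the normal distribution, and you key in the mean and standard deviation as you are used to. The p-values/CI are calculated accordingly. Now imagine the software was expecting the variance rather than the standard deviation; for the computation of the limits, it will take the square root of the ‘variance’ you keyed in. If your standard deviation is less than 1, the square root is larger than the value you intended, the limits are too wide, and you are more likely not to reject the null than you should. If it is greater than 1, the opposite happens: the limits are too narrow and you will reject the null more often than you should.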
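The mix-up can be sketched in a few lines of Python using only the standard library. The function `z_test_p_value` and its `spread_is_variance` flag are hypothetical, written just to mimic a tool that expects a variance, and the numbers are made up. Which way the error goes depends on whether the standard deviation is above or below 1.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean, null_mean, spread, n, spread_is_variance=False):
    """Two-sided p-value for a one-sample z-test.

    `spread` is read as a standard deviation unless spread_is_variance is
    True, in which case its square root is taken first.
    """
    sd = sqrt(spread) if spread_is_variance else spread
    z = (sample_mean - null_mean) / (sd / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up case 1: standard deviation of 10 (greater than 1), n = 25.
correct_big = z_test_p_value(52, 50, 10, 25)
mixed_up_big = z_test_p_value(52, 50, 10, 25, spread_is_variance=True)

# Made-up case 2: standard deviation of 0.5 (less than 1), n = 25.
correct_small = z_test_p_value(50.2, 50, 0.5, 25)
mixed_up_small = z_test_p_value(50.2, 50, 0.5, 25, spread_is_variance=True)

print(f"sd = 10:  intended p = {correct_big:.4f}, mixed-up p = {mixed_up_big:.4f}")
print(f"sd = 0.5: intended p = {correct_small:.4f}, mixed-up p = {mixed_up_small:.4f}")
```

In the first case the mixed-up p-value is far too small (a false rejection at the usual 5% level); in the second it is too large (a missed rejection). Same keystrokes, opposite mistakes.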

Hence enabling a user to quickly reference the parameters of a formula is very useful.


But an ontology can do much more than that. It’s not just about different versions of one formula, but also about knowing which formulae/algorithms are closely related. This allows users to pick the best algorithm for the problem they are trying to solve.


Continuing the example above, what if some of the assumptions of the normal distribution have been violated? Even if you used the correct parameters, the result would likely still be invalid because the formula should not be used. Again, this could lead to wrong conclusions.

This is where ProbOnto (3) can be useful. ProbOnto is an ontology of major probability distributions, clearly enumerating the parameters that each formulation expects and the relationships between the formulae.


For example, if you are planning to use a binomial distribution B(n,p) and n, the number of trials, is large enough, it can be approximated by a normal distribution with mean np and variance np(1-p).
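A quick numerical check of this relationship, as a sketch using only the Python standard library. The helper names and the values of n and p are arbitrary choices for illustration, and the continuity correction of 0.5 is a common refinement rather than something the text above prescribes. Note that `NormalDist` expects the standard deviation, not the variance, which is exactly the kind of detail discussed earlier.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """P(X <= k) using N(np, np(1-p)) with a continuity correction of 0.5."""
    # NormalDist takes sigma (a standard deviation), so take the square root
    # of the variance np(1-p) before passing it in.
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


# Illustrative values only.
p = 0.3
for n in (10, 100, 1000):
    k = round(n * p)
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"n={n:5d}  exact={exact:.4f}  normal approx={approx:.4f}")
```

The gap between the two columns shrinks as n grows, which is what "large enough" means in practice.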


Or if X is a lognormal variable with parameters μ and σ², then log(X) follows a normal distribution with mean μ and variance σ². Note that μ and σ² are the mean and variance of log(X), not of X itself. Lognormal distributions are often used in pricing stocks and options; they are at the heart of the Black-Scholes model.
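One detail worth checking numerically: in the usual parameterisation, the two parameters of a lognormal are the mean and standard deviation of log(X), not of X. The sketch below simulates this with Python’s standard library; the parameter values and the seed are arbitrary.

```python
import random
from math import exp, log
from statistics import fmean, pstdev

MU, SIGMA = 0.5, 0.4  # illustrative parameters of log(X), not of X

rng = random.Random(2018)  # fixed seed so the run is repeatable
xs = [rng.lognormvariate(MU, SIGMA) for _ in range(100_000)]
logs = [log(x) for x in xs]

print("mean of log(X):", round(fmean(logs), 3))   # should be close to MU
print("sd of log(X):  ", round(pstdev(logs), 3))  # should be close to SIGMA
print("mean of X:     ", round(fmean(xs), 3))     # noticeably larger than MU
print("theory for X:  ", round(exp(MU + SIGMA**2 / 2), 3))
```

The mean of X itself is exp(μ + σ²/2), so feeding the mean and variance of X into a routine that expects μ and σ is another way for the wrong-parameter mistake to creep in.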

In sum, an ontology can be used as a great way of storing knowledge, as shown by ProbOnto. But is it only useful to forgetful people with some statistics background? And if so, why this blog from me?

Well, ontologies can also have very practical applications even if used purely as knowledge bases. They can be used to capture human knowledge in a systematic way and be used to solve business problems.

Let me take a simple example. If an organisation wants to automatically assess the impact of news on its business how can it do that quickly? 

Organising large volumes of text, extracting key words, arranging them in groups based on context and making it possible to measure the distance between words is what word2vec (4) does. word2vec has been used in some specialised domains, such as genes and proteins (5) and radiology (6).
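To give a feel for what "distance between words" means, here is a toy sketch. It does not train word2vec; the three-dimensional vectors are made up by hand, whereas a trained model would learn vectors with hundreds of dimensions from a large corpus. Cosine similarity is one common way of comparing such vectors.

```python
from math import sqrt

# Hand-made vectors purely for illustration; not the output of any model.
VECTORS = {
    "sugar": [0.9, 0.1, 0.2],
    "ethanol": [0.8, 0.3, 0.1],
    "radiology": [0.1, 0.9, 0.7],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for word in ("ethanol", "radiology"):
    score = cosine_similarity(VECTORS["sugar"], VECTORS[word])
    print(f"sugar vs {word}: {score:.3f}")
```

With these made-up vectors, "sugar" sits much closer to "ethanol" than to "radiology". In a real model that closeness comes from the words appearing in similar contexts in the training text.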

However, not all organisations have the skill, know-how, large enough dataset and time to do their own implementation. What many organisations do have is in-house know-how and experience from their own people. Ontologies can be used to capture the knowledge of these experts and to represent the words within the context at hand.

For example, think of news of a proliferation of Chilo infuscatellus, the sugar cane shoot borer (7), in India, threatening a large part of Indian sugar output 6 months down the road.

If you are an investment advisor, this means that the output of sugar will fall. You don’t care about the insect causing the issue; prices are expected to rise, so customers should buy sugar. Also, our ontology would tell you that alternate sources of sugar could be Brazil and China, and that Cosan (8) might be a good stock to buy given its importance in the Brazilian market as well as its diversification into ethanol and other by-products (ability to quickly change output mix).

But on the other hand, if you are in the chemicals and fertiliser business, the type of insect matters. You will know that there is little that pesticides can do against this pest, so there is no impact on your business, unlike, say, if the infestation were of Scirpophaga excerptalis, the sugar cane top borer (9).

It is difficult to expect an out-of-the-box algorithm to work equally well in both of these contexts, and that’s where ontologies built using human experience and knowledge can bridge or even adequately fill the gap.
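The sugar example can be sketched as a tiny knowledge base of subject-relation-object triples. Everything about the encoding is hypothetical (the relation names, the `assess` function, the wording of the outputs); the facts simply mirror the example in the text and would in practice come from the organisation’s own experts.

```python
# Expert knowledge written down as (subject, relation, object) triples.
# Relation names are made up; the content mirrors the example in the text.
TRIPLES = [
    ("Chilo infuscatellus", "is_a", "sugar cane pest"),
    ("Scirpophaga excerptalis", "is_a", "sugar cane pest"),
    ("sugar cane pest", "reduces_output_of", "sugar"),
    ("Chilo infuscatellus", "pesticide_effective", "no"),
    ("Scirpophaga excerptalis", "pesticide_effective", "yes"),
    ("sugar", "alternate_source", "Brazil"),
    ("sugar", "alternate_source", "China"),
    ("Cosan", "producer_in", "Brazil"),
]


def objects(subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """All subjects linked to `obj` by `relation`."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]


def assess(pest, business):
    """Read the same infestation report from a given business context."""
    if business == "investment":
        # The species is irrelevant here; only the effect on the crop matters.
        findings = []
        for category in objects(pest, "is_a"):
            for crop in objects(category, "reduces_output_of"):
                sources = objects(crop, "alternate_source")
                producers = [p for src in sources for p in subjects("producer_in", src)]
                findings.append(
                    f"{crop} output falls; alternate sources: {', '.join(sources)}; "
                    f"producers to look at: {', '.join(producers)}"
                )
        return findings
    if business == "pesticides":
        # Here the species is what matters.
        if objects(pest, "pesticide_effective") == ["yes"]:
            return ["possible impact on pesticide demand"]
        return ["no impact expected"]
    raise ValueError(f"unknown business context: {business}")


print(assess("Chilo infuscatellus", "investment"))
print(assess("Chilo infuscatellus", "pesticides"))
print(assess("Scirpophaga excerptalis", "pesticides"))
```

The same piece of news gives different answers depending on who is asking, and the difference comes entirely from the knowledge written into the triples rather than from any learning algorithm.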


  1. https://mustsharenews.com/addressing-relatives-cny/
  2. https://en.wikipedia.org/wiki/Ontology
  3. https://sites.google.com/site/probonto/home
  4. https://arxiv.org/pdf/1301.3781.pdf
  5. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287
  6. https://www.sciencedirect.com/science/article/pii/S1532046417302575
  7. https://en.wikipedia.org/wiki/Chilo_infuscatellus
  8. https://en.wikipedia.org/wiki/Cosan
  9. https://en.wikipedia.org/wiki/Scirpophaga_excerptalis
 

Wednesday 7 February 2018

Tragedy of the commons: what do Elephants and Bicycles have in common?





The other day while crossing the road near the very posh offices at Marina Bay Financial Centre (Home of DBS, SCB, IBM…) I saw a very amusing sight.



A bicycle from one of the bike sharing companies was abandoned, chain out, with the chain and gear rusting at the junction. 

What was even more amusing was the little pink piece of paper.



It is a notice from the Land Transport Authority. It says:
“Please do not park your bicycle/PAB/PMD here. It is an offence to park your bicycle/PAB/PMD other than in a designated parking lot/rack. Your action causes obstruction and/or inconvenience to other path users.
Please remove your bicycle/PAB/PMD from this location and place it in a proper parking lot/rack.
Thank you for your…”

I find this very amusing.
  1. Who was the notice addressed to? The individual who dumped the bicycle, or the owner of the bicycle? These are clearly 2 different people/entities (unless the owner of oBike did this, which is kind of unlikely; they are on the other side of the road).
  2. What is the aim of the piece of paper? To increase litter? Given that it is unlikely that the person who dropped the bike to rust will pick it up, and that no one else would like to pay for a bike that is less than functional, what does this piece of paper achieve?

I wonder why this one didn’t get the pink piece of paper; it was just lying there in the passageway.



And if you look carefully in the background, you will see another bicycle (this one likely owned by an individual) locked against the lamppost near the red umbrellas.


So is this called parking, or is this called littering? I refer you to section 17 of the Environmental Public Health Act (1):
17. No person shall —

(a) deposit, drop, place or throw any dust, dirt, paper, ash, carcase, refuse, box, barrel, bale or any other article or thing in any public place

The above is kind of the only description that might apply; no, it is not swill, or mucus, or food, or a motor vehicle whose registration has expired (hmmm, what about PMDs though? But that’s not my topic). What if I just left a backpack where this bicycle was left, would I be littering? Well, if the bicycle wasn’t litter, why should my backpack be?

There is another little twist I would like to add before I start talking about elephants.



I was having brunch the other day (yutaio and kopio) when I came across this corner of the newly renovated playground that has been reserved as a designated parking lot/rack.

That got me thinking: how much are the companies providing bike sharing services paying the government for this space? It can’t be just normal corporate taxes. Does anyone with ties to the government know? If they are not renting this space from the government, then their costs of business are lower than they should be (basically being subsidised by the government using taxpayer funds).

Oh, and don’t tell me these are public property. They are not; an organisation owns them and is profiting when the public uses them.

And this brings me to elephants.

Many years ago, I helped one of my professors write a paper for a conference about conservation, and our paper was about the tragedy of the commons (yes, economics again (2)). The idea we presented was that one of the reasons why elephants were being poached at such an alarming pace, with nobody protecting them, was the issue of ownership.

Since elephants are wild, they belong to no one. And no one has any incentive (except people with conservationist tendencies or who simply love animals) to protect them. Belonging to no one is akin to belonging to everyone, and if I don’t take the ivory, someone else will, so I better act fast.

To make matters worse, some farmers might dislike the elephants because the pachyderms sometimes destroy their crops (as farmland gets closer to the ‘wild’).

What we proposed was to give the right to earn, for example, tourist dollars from a herd of elephants to some people living in the area where the herd spends a lot of time. Then the elephant becomes valuable and worth defending.

So what does that have to do with the bikes?

The reason the situation with the bikes is so weird is the question of ownership. The person who ‘parked’ the bicycle on the patch of grass does not own the bicycle. The company that owns the bike maintains the story of “sharing” but does own the bike. However, it has ‘sold’ each bike many times over.

What I mean is that every person who subscribes to the service pays for the usage (likely by putting down a deposit from which the usage charges are deducted), and the bike sharing company has managed to sell the services of 1 bicycle to many people. But these people only have the right to a bicycle if they can find one when they want to avail of that service. As long as the company has ‘sold’ the service from the bike to enough people, and that covers the ‘capital outlay’, then what follows is mostly profit (minus repair, maintenance and running costs and any fines haha).

In places like Singapore, which is small and where people tend to be law abiding, it is unlikely that you will find a bike graveyard (3) (the colourful objects at the top and bottom of the photo are cranes and heavy lorries):



So does that mean that the tragedy of the commons does not apply? The bicycles do have owners.

But as long as the costs of the bikes have been recouped, then there is no incentive for the owners of the bike sharing companies to take care of them or be responsible for them (especially if fines are paltry). Then the bicycle becomes a truly common property and I am sure we will see more and more rusty bicycles parked indiscriminately. Remember, the bike companies know exactly who left the bicycle exactly where. If they wanted to do something about it they could; it’s just that it’s not worth the effort.

And if you are wondering what the government is doing about it, they consider it a ‘disamenity’ (not littering) (4) and said: please do not do this, and sign this Memorandum, whereby “Operators will also remove faulty bicycles within a day.” That is a joke, because the bicycle in the picture was there for many days.

So what is the cost? As this article shows (5), “out of the 292 removal notices issued, 62 bicycles ended up being impounded because they were not removed within half a day of the notice”. I wonder how the notices were issued, who was made aware of the pink piece of paper, and how. Or, more pertinently, why use the pink piece of paper at all? The government seems to let people believe that GPS technology is so bad that it cannot be used to pinpoint whether the bicycle is at a “designated parking lot/rack”… Or are we allowed to place all sorts of stuff (not candy wrappers, swill, snot…) anywhere for half a day without any problem?

I just think it is not sharing when one party makes all the $ and other parties pay for it.