January 27, 2013

Taxonomy pt 2

In my last entry I discussed the special constraints and problems that occur when you need to implement a classification system that changes over time.

This time we will take the discussion a little bit deeper and look at the basic API of a taxonomy component.

A taxonomy is a graph where each taxon has a limited life span and is traceable through previous revisions of the taxonomy.
Let me use birds as an example (and I really like to do that). The Armenian Gull, Larus armenicus, is considered a species by the Association of European Rarities Committees.
It was first considered a subspecies of Herring Gull (L. argentatus), but that species was later split into European Herring Gull, Larus argentatus, American Herring Gull, Larus smithsonianus, Caspian Gull, Larus cachinnans, Yellow-legged Gull, Larus michahellis, Vega Gull, Larus vegae, and Armenian Gull, Larus armenicus.

To complicate things further, another taxonomy, namely birdlife.org, doesn't consider it a valid species but lumps it together with Yellow-legged Gull (Larus michahellis).

So... depending on when you see this gull and which taxonomy you use, it can be a Herring Gull, an Armenian Gull or a Yellow-legged Gull. And that is with only two taxonomies checked; believe me, there are more out there...

What does this tell us?

  1. A taxon has a time span.
  2. A taxon is derived from one or more taxa.
  3. A taxon is dependent on its taxonomy, and several parallel taxonomies may exist.
  4. To find a taxon from its key (the Latin name in this case), you will need to know the key, the time and the taxonomy.
To find out that the Armenian Gull is nowadays considered a subspecies of Yellow-legged Gull by BirdLife, I would have to backtrack the taxonomy graph to find the Armenian Gull and then follow it forward to the present to see that it has been included in Yellow-legged Gull.
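
As a rough illustration of the four points above, here is a minimal Python sketch of a taxon record and of the "follow it forward to the present" step. Everything in it is an assumption made for the sake of the example: the field names, the derive and current_forms helpers, the "birdlife" label and the dates, which are placeholders and not real taxonomic history.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


# eq=False: compare taxa by identity, since the graph links are recursive.
@dataclass(eq=False)
class Taxon:
    key: str                         # e.g. the Latin name
    taxonomy: str                    # the taxonomy this taxon belongs to
    valid_from: date                 # start of the taxon's time span
    valid_to: Optional[date] = None  # None means "still valid"
    ancestors: List["Taxon"] = field(default_factory=list)
    successors: List["Taxon"] = field(default_factory=list)


def derive(new: Taxon, ancestors: List[Taxon]) -> Taxon:
    """Link a new taxon to its ancestors and end the ancestors' time spans."""
    for old in ancestors:
        old.valid_to = new.valid_from
        old.successors.append(new)
        new.ancestors.append(old)
    return new


def current_forms(taxon: Taxon) -> List[Taxon]:
    """Follow successor links until reaching taxa that are still valid."""
    if taxon.valid_to is None:
        return [taxon]
    found: List[Taxon] = []
    for successor in taxon.successors:
        for candidate in current_forms(successor):
            if candidate not in found:
                found.append(candidate)
    return found


# Placeholder dates, purely illustrative.
armenian = Taxon("Larus armenicus", "birdlife", date(2000, 1, 1))
yellow_legged = derive(
    Taxon("Larus michahellis", "birdlife", date(2010, 1, 1)), [armenian]
)
print([t.key for t in current_forms(armenian)])
```

A split works the same way: call derive once per new taxon with the same ancestor, and current_forms will return all of the resulting taxa.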

Even though this example is about birds, the same will apply more or less to any other type of taxonomy.

In pseudo-code the core functions would be:
  • Taxon InsertTaxon(Taxon taxon, List<Taxon> ancestors, DateTime validFrom) inserts a taxon based on zero or more ancestors.
  • Taxon FindTaxon(String key, DateTime when, String taxonomy) finds a taxon given a key, a time and a taxonomy.
The signatures can of course differ, but the basic design will be the same.
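
To make the two functions a bit more concrete, here is one possible in-memory sketch in Python. It is not the implementation promised for the next entry, just a guess at the shape. The TaxonStore class, the (taxonomy, key) index, the half-open time span and the rule that inserting a taxon ends the validity of its ancestors are all assumptions of mine, and the "aerc" label and the dates are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


# eq=False: compare taxa by identity rather than field by field.
@dataclass(eq=False)
class Taxon:
    key: str
    taxonomy: str
    valid_from: Optional[date] = None  # set on insert
    valid_to: Optional[date] = None    # None means "still valid"
    ancestors: List["Taxon"] = field(default_factory=list)


class TaxonStore:
    """Hypothetical in-memory store, indexed by (taxonomy, key)."""

    def __init__(self) -> None:
        self._revisions: Dict[Tuple[str, str], List[Taxon]] = {}

    def insert_taxon(
        self, taxon: Taxon, ancestors: List[Taxon], valid_from: date
    ) -> Taxon:
        taxon.valid_from = valid_from
        taxon.ancestors = list(ancestors)
        # Assumption: deriving a new taxon ends its ancestors' time spans.
        for old in ancestors:
            if old.valid_to is None:
                old.valid_to = valid_from
        index_key = (taxon.taxonomy, taxon.key)
        self._revisions.setdefault(index_key, []).append(taxon)
        return taxon

    def find_taxon(self, key: str, when: date, taxonomy: str) -> Optional[Taxon]:
        # A taxon matches if `when` falls in [valid_from, valid_to).
        for taxon in self._revisions.get((taxonomy, key), []):
            started = taxon.valid_from <= when
            not_ended = taxon.valid_to is None or when < taxon.valid_to
            if started and not_ended:
                return taxon
        return None


# Placeholder dates, purely illustrative.
store = TaxonStore()
herring = store.insert_taxon(
    Taxon("Larus argentatus", "aerc"), [], date(1900, 1, 1)
)
armenian = store.insert_taxon(
    Taxon("Larus armenicus", "aerc"), [herring], date(2000, 1, 1)
)
print(store.find_taxon("Larus armenicus", date(2005, 6, 1), "aerc") is armenian)
print(store.find_taxon("Larus armenicus", date(1950, 1, 1), "aerc"))
```

Note that under these assumptions the old Herring Gull taxon stops being valid once something is derived from it, so the post-split European Herring Gull would be inserted as a new taxon with the same key and the old one as its ancestor.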

Next time we will take a look at how we can implement this.

Till then... Bye, bye!
