February 9, 2013

Taxonomy pt 3

In my last adventure in taxonomies I had hopes of using Neo4J and Neo4JClient to implement a taxonomy. However, I could never implement the Cypher-queries that worked in Neo4J through the .Net-library Neo4JClient. I got a "400 bad request"-error, struggled, tore my imaginary hair, gave up and changed the route to QuickGraph.

QuickGraph is a graph engine and not a database. It is not a big issue when it comes to a small taxonomy, but would pose problems for huge taxonomies since it holds all data in memory. For me it was a relief, since QuickGraph is something I know and feel comfortable with.

Installation of QuickGraph is straightforward. Just download, build and set a reference.

As a reminder a taxonomy is something where the meaning of a certain term is valid over a specific time. In this case the time is in which revision the term was defined. Like this:

I am using a directed graph where each revision of the taxonomy is descendant of the last revision.

As a practice set I used the gulls in the last blog (check it out).

Step 1.  Create an object that will hold each gull.

    public class Taxon 
        public Guid TaxonId { get; set; }
        public string ScientificName { get; set; }
        public string CommonName { get; set; }
        public int Revision { get; set; }

Since each scientific name can be used in several revisions I use a guid to know which unique taxon it is.

Step 2. Instantiate a directed graph.

BidirectionalGraph<Taxon, IEdge<Taxon>> graph = new BidirectionalGraph<Taxon, IEdge<Taxon>>();

This basically means that the nodes will be of the type Taxon and that the edges will connect two Taxon objects. Bidirectional means that each nod will no who it is descendent from and who derives from it.

Step 3. Implement the InsertTaxon function.

        public void InsertTaxon(Taxon taxon, List<Taxon> ancestors, BidirectionalGraph<Taxon, IEdge<Taxon>> graph)
            foreach (Taxon item in ancestors)
                TaggedEdge<Taxon, double> outEdge = new TaggedEdge<Taxon, double>(taxon, item, 1);


We're basically inserting the taxon for a new revision of a taxonomy and connects it to all taxons it derives from. By using a TaggedEdge we have a weighted graph. I just default it to one though.

Step 4. Implement the FindTaxon method.

        public IEnumerable<Taxon> FindTaxon(String scientificName, int fromRevision, int toRevision)
            //Find the taxon with the same scientific name from the chosen revision
            Taxon searchTaxon = graph.Vertices.Where(s => s.ScientificName == scientificName && s.Revision == fromRevision).First();
            //use that taxon to find the possible taxons at the to revision
            List<Taxon> results = new List<Taxon>();
            results.AddRange(GetAllDescendantsAtTime(searchTaxon, toRevision));
            return results.Distinct();

        private List<Taxon> GetAllDescendantsAtTime(Taxon searchTaxon, int toRevision)
            List<Taxon> results = new List<Taxon>();
            foreach (Taxon inNode in graph.InEdges(searchTaxon).Where(s => s.Source.Revision <= toRevision && s.Source.Revision > searchTaxon.Revision).Select(n => n.Source))
                if (inNode.Revision != toRevision)
                    results.AddRange(GetAllDescendantsAtTime(inNode, toRevision));
            return results;

So what we basically do is to find what say a Herring Gull was at revision 1 of the taxonomy and then see what it might be at revision 3.
We accomplish this by walking the graph recursively until we find all revision 3 species that are  based on the Herring Gull. The answer would be European Herring Gull, Yellow Legged Gull, Vega Gull, Caspian Gull and American Herring Gull. 

To note: This approach would need a specific graph for each taxonomy and, yeah, there are probably some algorithms that solves the problem faster than mine.

For me: This is perfect! Now I will just make a complete interface to it and use it in all my birding projects!

See you!

No comments:

Post a Comment