October 28, 2013

Six degrees of Kevin Bacon featuring Quickgraph

If you don't know what a Bacon Number is, you can look it up here.

The basic idea is to calculate how many movies there are between Kevin Bacon and another actor; the higher the number, the further away. It is a very geeky calculation, in fact so geeky that it is a part of Google. Just google [actor name] bacon number and...


So how would you do this yourself?

First off we will need some data.

I used this to create a movie database. Basically you get an Actor table, a Dvd table and a Dvd_Actor table to link them.

Next we will just have to loop through all the records and create nodes (vertices) for each actor and movie in the database.

I used QuickGraph and a simple class to keep the nodes:

    public class MovieInformationNode
    {
        public int Id { get; set; }
        public NodeTypes NodeType { get; set; }
        public string Name { get; set; }
        public string PresentationName
        {
            get
            {

                if (NodeType == NodeTypes.Actor)
                    return "Actor: " + Name;
                else
                    return "Movie: " + Name;
            }
        }
    }

I declared the graph as:

    private UndirectedGraph<MovieInformationNode, Edge<MovieInformationNode>> movieGraph =
        new UndirectedGraph<MovieInformationNode, Edge<MovieInformationNode>>();

For each Actor and Dvd I created a node and inserted it into the graph:

                MovieInformationNode node = new MovieInformationNode();
                node.NodeType = NodeTypes.Movie; //or actor if it is an actor, enum to keep track of type
                node.Id = dvd.Id;
                node.Name = dvd.DVD_Title.Trim();
                movieGraph.AddVertex(node);

For each link in Dvd_Actor I found the nodes in the graph and added an edge between them:

                //find actor
                MovieInformationNode actorNode = movieGraph.Vertices
                    .Where(s => s.Id == performance.Id && s.NodeType == NodeTypes.Actor).First();
                //find movie
                MovieInformationNode movieNode = movieGraph.Vertices
                    .Where(s => s.Id == performance.dvdid && s.NodeType == NodeTypes.Movie).First();
                Edge<MovieInformationNode> edge = new Edge<MovieInformationNode>(actorNode, movieNode);
                movieGraph.AddEdge(edge);

Then you just need to get the shortest path:

            Func<Edge<MovieInformationNode>, double> edgeCost = (edge => 1.0D); //no weights
            var tryPath = movieGraph.ShortestPathsDijkstra(edgeCost, sourceNode);
            IEnumerable<Edge<MovieInformationNode>> path;
            if (tryPath(destinationNode, out path))
            {
                foreach (var item in path)
                {
                    listFrom.Items.Add(item.Source);
                    listFrom.Items.Add(item.Target);
                }
            }

To get this working you will need to get QuickGraph here.
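For a feel of what the shortest-path call is doing, here is a minimal sketch without QuickGraph: a plain breadth-first search over an adjacency dictionary. The class name BaconSketch, the string node labels and the dictionary layout are assumptions made for this illustration, not part of the code above. Since every hop between two actors passes through a movie node, the Bacon number is taken as half the number of edges on the path.

```csharp
using System;
using System.Collections.Generic;

public static class BaconSketch
{
    // Breadth-first search over an unweighted graph stored as adjacency lists.
    // Returns the number of edges on the shortest path, or -1 if there is none.
    public static int ShortestPathLength(
        Dictionary<string, List<string>> adjacency, string source, string target)
    {
        var distance = new Dictionary<string, int>();
        distance[source] = 0;
        var queue = new Queue<string>();
        queue.Enqueue(source);

        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            if (current == target)
                return distance[current];

            List<string> neighbours;
            if (!adjacency.TryGetValue(current, out neighbours))
                continue;

            foreach (string next in neighbours)
            {
                if (distance.ContainsKey(next))
                    continue; // already reached by a path that is at least as short
                distance[next] = distance[current] + 1;
                queue.Enqueue(next);
            }
        }
        return -1;
    }

    // Actor and movie nodes alternate along a path, so two edges equal one movie step.
    public static int BaconNumber(
        Dictionary<string, List<string>> adjacency, string actor, string bacon)
    {
        int edges = ShortestPathLength(adjacency, actor, bacon);
        return edges < 0 ? -1 : edges / 2;
    }
}
```

With unit edge costs, as in the Dijkstra call above, breadth-first search finds a path of the same length, which is why it works as a stand-in here.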


October 19, 2013

Converting from an adjacency list to hierarchyid in SQL Server



Many of us have worked with hierarchies in SQL Server, and for those not used to hierarchyid the most common approach has been to create a table with a parent relation to itself.

Consider a company structure where the main company has underlying companies, sections, divisions etc. in a hierarchical manner. Something like this:


  • Company
    • SectionA
      • Division1
      • Division2
    • SectionB
      • Division1
        • GroupA

The structure for representing this in a database using an adjacency list approach would be something like this:

CREATE TABLE [dbo].[Organization](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Parent] [int] NULL,
[Name] [nvarchar](255) NOT NULL)

With this structure you can find the children of node 4 by a simple: 

SELECT * FROM Organization WHERE Parent = 4

The problem with this structure appears when you need to find all descendants of a node. Then you would first need to run SELECT * FROM Organization WHERE Parent = 4, then search for the children of each returned node, then for their children, and so on.

Recursion complicates things, and recursion coupled with database calls can make things go very slow as well.

So from SQL Server 2008 (old technology, yet so few use it) there is a special datatype for handling hierarchies, the hierarchyid.

The new organizational structure would be something like:

CREATE TABLE [dbo].[NewOrganization](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Hierarchy] [hierarchyid] NOT NULL,
[Name] [nvarchar](255) NOT NULL)

With this table you would be able to make queries such as:

DECLARE @ParentOrganization hierarchyid
SELECT @ParentOrganization = Hierarchy FROM NewOrganization
WHERE Id = 4

SELECT * FROM NewOrganization
WHERE Hierarchy.IsDescendantOf(@ParentOrganization) = 1

Giving you the full subtree with only one call. Pretty neat.

So how do you get from the lousy adjacency list to the splendid hierarchyid?

Well... given the two tables above (note that they are a bit pseudo-coded, no primary keys for example) a solution would look like this in C#.

To connect to the database you would need a connection string that points out the correct type library:

    <add name="Organizations" connectionString="Type System Version=SQL Server 2012;Data Source=[server];Initial Catalog=[database];Integrated Security=True" />

See this for more info why and how.

Next up you need a mechanism for reading and writing the data to the database. It is up to you if it's Entity Framework or something else. Note that you would need a reference to Microsoft.SqlServer.Types to use the hierarchyid from C#.

    public class NewOrganizationItem
    {
        public Int32 Id { get; set; }
        public SqlHierarchyId Hierarchy { get; set; }
        public String Name { get; set; }
    }

    public class OrganizationItem
    {
        public Int32 Id { get; set; }
        public Nullable<Int32> Parent { get; set; }
        public String Name { get; set; }
    }

    private void ConvertTree()
    {
        //Load all old organization items from the database
        List<OrganizationItem> items = OrganizationManager.SelectAll();
        //Start the importing
        InsertOrganizations(items, null, SqlHierarchyId.Null);
    }

We will need to keep track of three things when importing: the old organization's parentId, the node it should be placed under in the new hierarchy (newParent) and the last child already inserted under that node (lastSibling), so that we can insert the new node after the last child node.

    private void InsertOrganizations(List<OrganizationItem> oldItems, int? parentId,
                                     SqlHierarchyId newParent)
    {
        SqlHierarchyId lastSibling = SqlHierarchyId.Null;
        //loop through all children
        foreach (var item in oldItems.Where(s => s.Parent == parentId))
        {
            NewOrganizationItem newItem = new NewOrganizationItem();
            newItem.Name = item.Name.Trim();
            if (parentId != null)
            {
                //if not a root item create it under its parent after the last sibling
                newItem.Hierarchy = newParent.GetDescendant(lastSibling, SqlHierarchyId.Null);
            }
            else
            {
                //create it under the root node
                newItem.Hierarchy = SqlHierarchyId.GetRoot().GetDescendant(lastSibling, SqlHierarchyId.Null);
            }
            //Insert it into the database and return the newly created item
            newItem = NewOrganizationManager.Insert(newItem);
            lastSibling = newItem.Hierarchy;
            //recursively continue
            InsertOrganizations(oldItems, item.Id, newItem.Hierarchy);
        }
    }
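To see which positions the recursion hands out without involving a database, the same depth-first walk can be sketched with plain strings. This assumes that GetDescendant(lastSibling, null) numbers the children 1, 2, 3... under their parent, which is how such paths look when printed with ToString(). The OrgRow and PathSketch names are made up for the example, and the StartsWith check is only a rough stand-in for IsDescendantOf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrgRow
{
    public int Id { get; set; }
    public int? Parent { get; set; }
    public string Name { get; set; }
}

public static class PathSketch
{
    // Assigns each row a path string such as "/1/2/", numbering the
    // children of every parent 1, 2, 3... in the order they appear.
    public static Dictionary<int, string> BuildPaths(List<OrgRow> rows)
    {
        var paths = new Dictionary<int, string>();
        Assign(rows, null, "/", paths);
        return paths;
    }

    private static void Assign(List<OrgRow> rows, int? parentId, string parentPath,
                               Dictionary<int, string> paths)
    {
        int ordinal = 0;
        foreach (OrgRow row in rows.Where(r => r.Parent == parentId))
        {
            ordinal++;
            string path = parentPath + ordinal + "/";
            paths[row.Id] = path;
            // recursively continue with the children of this row
            Assign(rows, row.Id, path, paths);
        }
    }

    // A node counts as a descendant of itself here, as it does with hierarchyid.
    public static bool IsDescendantOf(string path, string ancestorPath)
    {
        return path.StartsWith(ancestorPath, StringComparison.Ordinal);
    }
}
```

For the company structure at the top of this post, Company would get "/1/", SectionA "/1/1/", SectionB "/1/2/" and GroupA "/1/2/1/1/", so the whole subtree of SectionB is everything whose path starts with "/1/2/".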

June 7, 2013

Teched day 4

So... the last day at the conference and the first one where the queues for the really cheap Surfaces were short enough for me to go shopping.

I went through the news in SQL Server 2014, some SQLite on Windows Phone and some DirectX stuff in C++, but the best session today was about dependency injection and containers: "Understanding Dependency Injection and Those Pesky Containers" by Miguel Castro, going through the basics of Castle Windsor, Ninject, Unity etc. and focusing on MEF.

It was a nice overview of something I use but that doesn't come naturally to me. I think I will be going for MEF in the future after seeing this presentation.

Tonight we will party here:
(credits to Andrew Bishop)
Rumor has it that there will be some famous artist there... we will see.

[comment added] It was Tina Turner.

So... what have I learned during these four days?

  • Lightswitch can actually be useful.
  • OData is a tool in my toolbox that needs to get sharpened.
  • Stop being lazy and always use a dependency container. (read: MEF)
  • Use Geoflow as much as possible.
  • Stop hiding behind the fact that I am lousy at UX and start doing something about it.
  • Americans are strange but almost always nice.

June 6, 2013

Teched day 3

A bit of an in-between day for me. Attended a few Windows Phone lectures that didn't contain any news and went to a Web API/OData lecture that was a bit over my level. (I did however make a note to self to learn more immediately.)

The gem of the day was a lecture by Billy Hollis called Design or die, about something that I've stubbornly been proclaiming the last few years. Applications today need UI/UX design, and those companies that won't provide this will slowly be wiped off the face of the earth. You simply cannot have the usual design that programmers make (you know, the ugly, cramped, hard to read, bad colour choices and no usability-style) alongside the often fantastic looking apps in the app stores.

We have left the DOS-era and we have left the Windows XP-era... time to adapt!

I must admit that I am one of those programmers who make really ugly applications. I have blamed the fact that I was born without designing talents, which is true, but it is time to change that and acquire some skills.

So I bought this:
Supposedly "the" design book for developers.
I was supposed to attend the community party today, but fell asleep at seven and just woke up now at three o'clock. Will try to sleep some more now.

Good night.

June 5, 2013

Teched day 2

Not only a conference, but an opportunity to sell stuff too...

The weather in New Orleans is so hot that it feels like walking into a wall every time I leave the air conditioned safety of Hilton Riverside. It is such a hard life to be an IT consultant some days... ;)

I read about Microsoft LightSwitch when it just came out and thought:  MS Access applications.
And not in a flattering way.

I gave it a real chance today and went to a session with Beth Massi on creating HTML 5-based business apps with Azure and Visual Studio LightSwitch.
Developers usually scoff at LightSwitch, because, well, you hardly need to program to create something and, well, developers are people who excel in just programming. I don't scoff anymore. Anything that can create a decently good looking data editor and browser with filters, paged data and a UI that scales to work on Android, iOS, Windows Phone and desktops, and that can deploy this to Azure in a one hour session (including some CSS changes, customization and third-party JavaScript controls) is not to scoff at. Period.

Maybe I will not use LightSwitch in many projects, since I have a hunch that there is a point where heavy customization will make a LightSwitch project more expensive and harder to manage than using traditional coding methods, and customer needs usually mean a lot of customization. BUT, for all my fast hobby projects where I need to fix something simple and web-based as quickly as possible, or need a prototype or just a way to manage data online, LightSwitch at least seems perfect at a glance.

Note to self: Start testing LightSwitch!
LightSwitch was a nice acquaintance

Chris Klug held a talk about patterns and architecture in MVVM with a pragmatic approach on how to build stuff in the best way. Chris is good at that stuff and he held an appreciated session. For me it was the deepest in technical aspects so far this conference and packed with good ideas. Check out his blog here.

Good to show those Americans that Swedes know their stuff too... ;)

A Swede in the US
Finally, the dominating crapgadgetgiveaway this year is this: (and it even flashes!)
Who would not want a hat like this?

June 3, 2013

Impressions from the keynote at Teched New Orleans 2013

It started with jazz...


...and continued with Brad Anderson driving an Aston Martin on stage.

Key areas

BYOD

Microsoft are bringing in new tools to handle a company's devices with more control over personalization and policies, and over separating company data and company apps from personal stuff. It seems easy to register your own device for workplace security using two-factor identification, and the new concept of work folders mimics the folder replication of Dropbox in an enterprise fashion.
The concept of simply cleaning out the computer when leaving the workplace was quite impressive.

Azure

Azure gets cheaper and gets a new billing model. Now you won't have to pay for stopped virtual machines and the billing is per minute. The MSDN Azure licence seems to have changed, with a free Azure usage amount of $50, $100 and $150 for Professional, Premium and Ultimate subscriptions respectively. There have been improvements in keeping track of how much money/time you've used up.

Visual Studio 2013 / TFS 2013

More focus on ALM which will be on a higher level than in VS 2012, so that you can have a hierarchical view going from department to projects, subprojects, features, tasks etc...
A new HUD-feature in VS2013 adds information about references, tests and source code changes on a method by method basis. It looked really useful.
VS load testing now has cloud support, so you can run your existing load tests through the cloud.
Microsoft has acquired InRelease, which will bring a workflow view to deployment and releases, handling stuff like approvals and configurations.
Team rooms introduces a new collaboration space in TFS giving project members easier access to what is happening in their project.

Big data

SQL Server 2014 and new additions to Excel give new possibilities to explore big data.
The very, very impressive GeoFlow in Excel gives thematic 4D mapping with drilldown capabilities. See it as Google Earth with graphs and a possibility to animate time series. Looks amazing. There is also a new data explorer in Excel.

Ps. Sweden sent the most attendees to New Orleans of all countries in Europe. Well done!

May 31, 2013

Stranded in Chicago

I missed the ritual preflight-beer at Arlanda, maybe that is why I am stranded in Chicago (United blamed the tornadoes, but I know it was the missed beer...). 
Continuing to Raleigh tomorrow and New Orleans on Sunday.

But right now I am stuck with a complimentary night at an anonymous hotel in the middle of nowhere in Chicago.  Can't wait to move on.


May 24, 2013

Preparing for Teched

Just a week left before I head to Teched in New Orleans. It will be fun! ...and it will probably be hot as well.

I will try to put my impressions on the blog while I am there, so this programming blog will double as a trip report for the first week in June.

'Til then...

Over and out!

March 25, 2013

Tricky stuff with Sql Server Spatial part 2

So you have this Sql Server 2012 with spatial datatypes and you want to read it with C#.

The first thing to do is to add a reference in your project to Microsoft.SqlServer.Types where SqlGeography and the likes reside.

The next step would be to make a query and read the results using a SqlDataReader. Something like this:

                result.Location  = (SqlGeography)reader["Location"];

The problem is that this will result in a System.InvalidCastException with the message:

{"[A]Microsoft.SqlServer.Types.SqlGeography cannot be cast to [B]Microsoft.SqlServer.Types.SqlGeography. Type A originates from 'Microsoft.SqlServer.Types, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' in the context 'Default' at location 'C:\\windows\\assembly\\GAC_MSIL\\Microsoft.SqlServer.Types\\10.0.0.0__89845dcd8080cc91\\Microsoft.SqlServer.Types.dll'. Type B originates from 'Microsoft.SqlServer.Types, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' in the context 'Default' at location 'C:\\windows\\assembly\\GAC_MSIL\\Microsoft.SqlServer.Types\\11.0.0.0__89845dcd8080cc91\\Microsoft.SqlServer.Types.dll'."}

If you read the message you can actually understand what's going on. Even though you put a specific reference to the SQL Server 2012 types (version 11.0.0.0), the data reader still seems to hand back the value using the old SQL Server 2008 version (10.0.0.0).

The easiest way to fix this is through the connection string and the Type System Version keyword, whose SQL Server 2012 value appeared in .Net 4.5.

My first attempt on a connection string was like this (and yeah, my computer is named GAAH):


    <add name="Main" connectionString="data source=GAAH\SQLEXPRESS;initial catalog=SpatialSample;integrated security=True" />

By adding Type System Version we can tell it that we will use the 2012 version of Sql Server Types.

    <add name="Main" connectionString="Type System Version=SQL Server 2012;data source=GAAH\SQLEXPRESS;initial catalog=SpatialSample;integrated security=True" />

Now you're just a read away from your spatial objects.
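If you build the connection string in code rather than in the config file, a sketch like the following should give the same result. It assumes the SqlConnectionStringBuilder class from System.Data.SqlClient and its TypeSystemVersion property; the SpatialConnectionSketch class and the BuildConnectionString method are made up for the example.

```csharp
using System;
using System.Data.SqlClient;

public static class SpatialConnectionSketch
{
    // Builds a connection string that asks the client to use the
    // SQL Server 2012 versions of the spatial and hierarchyid types.
    public static string BuildConnectionString(string server, string database)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = server,
            InitialCatalog = database,
            IntegratedSecurity = true,
            TypeSystemVersion = "SQL Server 2012"
        };
        return builder.ConnectionString;
    }
}
```

Calling BuildConnectionString(@"GAAH\SQLEXPRESS", "SpatialSample") should produce a string equivalent to the second config entry above, although the builder may order and quote the keywords differently.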


February 26, 2013

Tricky stuff with Sql Server Spatial part 1

Dear reader(s),

Last week I held an internal lecture about SQL Server Spatial and the basics of GIS here at Knowit.
I really like talking about GIS as it is about the world and big amounts of data, and I do like the world and big amounts of data.

Well... where am I? Who am I?

Oh, yeah. While preparing I noticed a few things that can be a bit tricky with SQL Server Spatial.

This is the first one.

Preparations:

I use a table called test which basically holds one geography column named spatial and another varchar() called name.

Inserts:


INSERT INTO Test (Name, Spatial)
VALUES(
    'Polygon',
    geography::STGeomFromText(
         'POLYGON((-1 -1, 8 -1 , 8 8 ,-1 8, -1 -1))',4326));

So this basically inserts a polygon into the world. As you can see it is square and crosses the equator. The number 4326 is called the SRID and basically means that the coordinates are expressed in the coordinate system WGS84. (Coordinate systems and projections basically mean that you get different sets of coordinates depending on how you try to put the round globe onto a flat piece of paper (a map). Look here for more info.)

Well, besides the 4326 this is no sweat... now try doing it this way:


INSERT INTO Test (Name, Spatial)
VALUES(
    'Polygon',
    geography::STGeomFromText(
         'POLYGON((-1 -1, -1 8 , 8 8 ,8 -1, -1 -1))',4326))

Exactly the same call, but we list the coordinates in the opposite order. Now it is no longer a square polygon crossing the equator; instead it is everything but that square. It covers the rest of the world.

Depending on whether you add the points clockwise or counterclockwise, you get two totally different polygons.

Why?

Well, the world IS actually round, meaning that any set of points that makes up a polygon on the surface of the world has two meanings: an interior version and an exterior version. Counterclockwise gives the interior version, clockwise the exterior. Easy to forget...

More about it here.
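A quick way to check which way a ring winds before inserting it is the signed area (shoelace) formula. The sketch below treats longitude/latitude as flat x/y, which is only a reasonable assumption for small rings like the ones above that stay away from the poles and the date line. The RingOrientationSketch name is made up for the example. A positive area means counterclockwise, a negative one clockwise.

```csharp
using System;
using System.Collections.Generic;

public static class RingOrientationSketch
{
    // Shoelace formula over a closed ring of { longitude, latitude } pairs.
    // The first and last point are expected to be identical, as in WKT.
    public static double SignedArea(IList<double[]> ring)
    {
        double sum = 0.0;
        for (int i = 0; i < ring.Count - 1; i++)
        {
            sum += ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
        }
        return sum / 2.0;
    }

    // True when the points run counterclockwise (the "interior" reading).
    public static bool IsCounterClockwise(IList<double[]> ring)
    {
        return SignedArea(ring) > 0;
    }
}
```

For the first polygon above the signed area comes out as 81 (counterclockwise) and for the second as -81 (clockwise), matching the two different results of the inserts.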


February 9, 2013

Taxonomy pt 3

In my last adventure in taxonomies I had hopes of using Neo4J and Neo4JClient to implement a taxonomy. However, I could never implement the Cypher-queries that worked in Neo4J through the .Net-library Neo4JClient. I got a "400 bad request"-error, struggled, tore my imaginary hair, gave up and changed the route to QuickGraph.

QuickGraph is a graph engine and not a database. It is not a big issue when it comes to a small taxonomy, but would pose problems for huge taxonomies since it holds all data in memory. For me it was a relief, since QuickGraph is something I know and feel comfortable with.

Installation of QuickGraph is straightforward. Just download, build and set a reference.

As a reminder, a taxonomy is something where the meaning of a certain term is valid over a specific time. In this case the time is the revision in which the term was defined. Like this:



I am using a directed graph where each revision of the taxonomy descends from the previous revision.

As a practice set I used the gulls from the last blog post (check it out).

Step 1.  Create an object that will hold each gull.


    public class Taxon 
    {
        public Guid TaxonId { get; set; }
        public string ScientificName { get; set; }
        public string CommonName { get; set; }
        public int Revision { get; set; }
    }


Since each scientific name can be used in several revisions I use a guid to know which unique taxon it is.

Step 2. Instantiate a directed graph.

    BidirectionalGraph<Taxon, IEdge<Taxon>> graph = new BidirectionalGraph<Taxon, IEdge<Taxon>>();

This basically means that the nodes will be of the type Taxon and that the edges will connect two Taxon objects. Bidirectional means that each node will know which nodes it descends from and which nodes derive from it.

Step 3. Implement the InsertTaxon function.


        public void InsertTaxon(Taxon taxon, List<Taxon> ancestors, BidirectionalGraph<Taxon, IEdge<Taxon>> graph)
        {
            graph.AddVertex(taxon);
            foreach (Taxon item in ancestors)
            {
                TaggedEdge<Taxon, double> outEdge = new TaggedEdge<Taxon, double>(taxon, item, 1);
                graph.AddEdge(outEdge);

            }
        }

We're basically inserting the taxon for a new revision of a taxonomy and connecting it to all the taxons it derives from. By using a TaggedEdge we get a weighted graph. I just default the weight to one though.

Step 4. Implement the FindTaxon method.


        public IEnumerable<Taxon> FindTaxon(String scientificName, int fromRevision, int toRevision)
        {
            //Find the taxon with the same scientific name from the chosen revision
            Taxon searchTaxon = graph.Vertices.Where(s => s.ScientificName == scientificName && s.Revision == fromRevision).First();
            //use that taxon to find the possible taxons at the to revision
            List<Taxon> results = new List<Taxon>();
            results.AddRange(GetAllDescendantsAtTime(searchTaxon, toRevision));
            return results.Distinct();
        }


        private List<Taxon> GetAllDescendantsAtTime(Taxon searchTaxon, int toRevision)
        {
            List<Taxon> results = new List<Taxon>();
            foreach (Taxon inNode in graph.InEdges(searchTaxon).Where(s => s.Source.Revision <= toRevision && s.Source.Revision > searchTaxon.Revision).Select(n => n.Source))
            {
                if (inNode.Revision != toRevision)
                    results.AddRange(GetAllDescendantsAtTime(inNode, toRevision));
                else
                    results.Add(inNode);
            }
            return results;
        }

So what we basically do is to find what, say, a Herring Gull was at revision 1 of the taxonomy and then see what it might be at revision 3.
We accomplish this by walking the graph recursively until we find all revision 3 species that are based on the Herring Gull. The answer would be European Herring Gull, Yellow-legged Gull, Vega Gull, Caspian Gull and American Herring Gull.
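The same walk can be tried out without QuickGraph. The sketch below is a stripped-down stand-in, not the code above: RevisionNode and RevisionWalkSketch are names made up for the example, and each node simply holds a list of the taxa derived from it in later revisions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RevisionNode
{
    public string Name;
    public int Revision;
    // Taxa in later revisions that derive from this one.
    public List<RevisionNode> DerivedTaxa = new List<RevisionNode>();
}

public static class RevisionWalkSketch
{
    // Returns the names of all taxa at toRevision that derive from start.
    public static List<string> NamesAtRevision(RevisionNode start, int toRevision)
    {
        var names = new List<string>();
        Collect(start, toRevision, names);
        return names.Distinct().ToList();
    }

    private static void Collect(RevisionNode current, int toRevision, List<string> names)
    {
        foreach (RevisionNode next in current.DerivedTaxa
            .Where(n => n.Revision > current.Revision && n.Revision <= toRevision))
        {
            if (next.Revision == toRevision)
                names.Add(next.Name);                // reached the wanted revision
            else
                Collect(next, toRevision, names);    // keep walking forward in time
        }
    }
}
```

Linking a revision 1 node to its revision 2 successors, and those to their revision 3 successors, and then calling NamesAtRevision(start, 3) gives the revision 3 names, with duplicates removed in the same way as the Distinct() call in FindTaxon.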

To note: This approach would need a specific graph for each taxonomy and, yeah, there are probably some algorithms that solve the problem faster than mine.

For me: This is perfect! Now I will just make a complete interface to it and use it in all my birding projects!

See you!



January 27, 2013

Taxonomy pt 2

In my last entry I discussed the special constraints and problems that occur when you need to implement a classification system that changes over time.

This time we will take the discussion a little bit deeper and look at the basic API of a taxonomy component.

A taxonomy is a graph where each taxon has a limited life span and is traceable through previous revisions of the taxonomy.
If I use birds as an example (and I really like to do that): the Armenian gull, Larus armenicus, is considered a species by The Association of European Rarities Committees.
It was first considered a subspecies of Herring Gull (L. argentatus), but that species was later split into European Herring Gull, Larus argentatus, American Herring Gull, Larus smithsonianus, Caspian Gull, Larus cachinnans, Yellow-legged Gull, Larus michahellis, Vega Gull, Larus vegae and the Armenian Gull, Larus armenicus.

To complicate stuff further another taxonomy, namely birdlife.org, doesn't consider it to be a valid species but lumps it together with Yellow-legged Gull (Larus michahellis).



So... depending on when you see this gull and which taxonomy you use, it can be either a Herring gull, an Armenian gull or a Yellow-legged gull. And then you're only checking two taxonomies, and believe me, there are more out there...

What does this tell us?

  1. A taxon has a time span.
  2. A taxon is derived from one or more taxons.
  3. A taxon is dependent on its taxonomy and several parallel taxonomies may exist.
  4. To find a taxon from its key (the Latin name in this case), you will need to know the key, the time and the taxonomy.

To find out that the Armenian gull nowadays is considered a subspecies of Yellow-legged gull in Birdlife, I would have to backtrack the taxonomy graph to find the Armenian gull and then follow it to the present to see that it has been included in Yellow-legged gull.

Even though this example is about birds, the same will apply more or less to any other type of taxonomy.

In pseudo-code the core functions would be:
  • Taxon InsertTaxon(Taxon taxon, List<Taxon> ancestors, DateTime validFrom) to insert a taxon based on zero or more ancestors.
  • Taxon FindTaxon(String key, DateTime when, String taxonomy) to find a taxon given a key, time and a certain taxonomy.
The signature can of course differ, but the basic design will be the same.
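To make the lookup rule concrete (key, time and taxonomy must all match), here is a minimal in-memory sketch. It only covers the FindTaxon side and ignores ancestors. TaxonRecord, TaxonLookupSketch, the ValidFrom/ValidTo properties and the pointInTime parameter name are all assumptions for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TaxonRecord
{
    public string Key { get; set; }         // e.g. the Latin name
    public string Taxonomy { get; set; }    // which classification it belongs to
    public DateTime ValidFrom { get; set; }
    public DateTime? ValidTo { get; set; }  // null means still valid
}

public static class TaxonLookupSketch
{
    // Returns the record matching key and taxonomy that is valid at
    // pointInTime, or null when no such record exists.
    public static TaxonRecord FindTaxon(IEnumerable<TaxonRecord> records,
                                        string key, DateTime pointInTime, string taxonomy)
    {
        return records.FirstOrDefault(r =>
            r.Key == key &&
            r.Taxonomy == taxonomy &&
            r.ValidFrom <= pointInTime &&
            (r.ValidTo == null || pointInTime < r.ValidTo.Value));
    }
}
```

The same key can then give different answers depending on the taxonomy and the date passed in, which is exactly the situation with the Armenian gull.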

Next time we will take a look at how we can implement this.

'Til then... Bye, bye!

January 14, 2013

The deadly flat foot

A long time ago I made a system for health statistics and I was demoing it for the stakeholders who of course knew a lot about epidemiology. One of them asked if I could use my system to find out the most common cause of death in Sweden over a time period. I was eager to show off my system and generated the query.

The result was flat foot.
I didn't feel that sure about my system after that.

So what had happened?

I had tons and tons of statistical material with gender, ages and cause of death over several years. The cause of death was marked with a diagnostic number, a so-called ICD code (International Classification of Diseases). This ICD code was versioned, so you had ICD-6, ICD-7, ICD-8 and so on.

Now, what I didn't know was that Sweden made a shift from ICD-7 to ICD-8 at a certain point in time. My stakeholders (stake holders?) knew this of course and set the trap with a smile.

In ICD-7 the code 746 stands for flat foot but in ICD-8 the code 746 stands for congenital anomalies of heart. So when I summarized the statistics using ICD-7 terminology for ICD-8 data... well, I guess you get the idea.

ICD is an interesting example of taxonomy, the science of classification. Other types of classifications are the futile attempt by Yahoo to classify the internet into a hierarchy of subjects, the classification of plants by Linnaeus or the subject classification at a library (the strange combinations of letters like Pcj:k that somehow describe a book's subject).

Classifications are hierarchical by nature, a subdivision from all into smaller and smaller parts. It is an approach that is easy but has drawbacks when something fits equally well in two or more classes. (Consider a book that is both about History and Math, for example.)

Classifications also change over time as we saw with the flat foot case. In ICD-8 flat foot had moved from 746 to 736. A change that is vital to know about in order to get correct statistics.

So each version of classification connects to the previous version. In ICD-7 the flatfoot at code 746 points to the 736 flat foot in ICD-8.

Other diagnoses had one code in ICD-7 and got several codes in ICD-8.

Ischaemic Heart Disease for example was 420 in ICD-7 but was covered by the codes 410-414 in ICD-8.

A fork.

This of course made it impossible to know which ICD-8 diagnosis a person with the ICD-7 diagnosis of Ischaemic Heart Disease had.
All that could be said was that it was one of the diagnoses between 410 and 414. Maybe, maybe, if we knew the relative distribution between the diagnoses 410-414, we could guess that there was a 40% chance of 410, a 15% chance of 411 and so on, stumbling through fuzzy logic.

The opposite could also happen of course: two classes in the old version are represented by a single class in the new. A join.

Similarly some classes may not have a representation in the new version and totally new classes could appear. You will not find HIV in ICD-7 because it originates from 1955, when the disease was unknown.
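The move and the fork described above can be written down as a small translation table. This is only a sketch of the idea: the IcdMappingSketch class and its Translate method are made up for the example, and the table holds just the two mappings mentioned in this post.

```csharp
using System;
using System.Collections.Generic;

public static class IcdMappingSketch
{
    // ICD-7 code -> candidate ICD-8 codes.
    private static readonly Dictionary<int, int[]> Icd7ToIcd8 = new Dictionary<int, int[]>
    {
        { 746, new[] { 736 } },                     // flat foot: a plain move
        { 420, new[] { 410, 411, 412, 413, 414 } }  // ischaemic heart disease: a fork
    };

    // Returns the ICD-8 codes an ICD-7 code may correspond to.
    // An empty array means the sketch has no mapping for that code.
    public static int[] Translate(int icd7Code)
    {
        int[] targets;
        return Icd7ToIcd8.TryGetValue(icd7Code, out targets) ? targets : new int[0];
    }
}
```

A fork returns several candidates, so the caller still has to decide how to distribute the counts between them. A join would simply show up as two ICD-7 keys listing the same ICD-8 code.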

To complicate matters even more there can be several different classifications that each have their own versioning with forks and joins, but whose classes also connect to classes in other taxonomies.

With birds you have the Sibley-Ahlquist classification that sees the species differently from the traditional Clements classification.

The national symbol of New Zealand, the kiwi bird, is considered to be part of the kiwi order Apterygiformes in Clements, but in Sibley-Ahlquist it is seen as a part of the ostrich order Struthioniformes.

Still, a kiwi is a kiwi and there is a very strong relationship between the kiwi class in the Clements classification and its Sibley-Ahlquist sibling.

So how do we fit this thing called classifications into SQL Server?

We have seen that a classification consists of a hierarchy of classes that are connected to other versions of the classification and also to other classes in totally different classifications.

We do have pretty good possibilities to implement hierarchies in SQL Server, but a hierarchy only covers one version of a classification. If we want to be able to track changes and translate between different versions of a taxonomy, a hierarchy is not enough because it is not a hierarchy. It is a graph. And maybe it is a directed acyclic graph and maybe, maybe even a weighted variant.

You can put graphs in a relational database, but it is painful.

Better to use a dedicated graph database such as Neo4J or maybe use a mix of graph engine and a relational database.

More on this next time.

January 6, 2013

A nice example on ink functionality and how to save and load images in Win 8 RT

Hi,

A new year, and although I didn't make a promise to blog more this year, it still feels like I have a little more energy. After all, the world didn't vanish on the 21st of December and the European Union hasn't collapsed yet.

So I got a question on how to load and save bitmaps in Windows 8 RT and planned to blog about it.

However... one of my rules in coding is to never ever try something new without googling first. So I did.

Here is a nice sample on using ink that does a lot more than I would have done if I had made it myself.

Hmm... now to some serious coding!