October 28, 2013

Six degrees of Kevin Bacon featuring Quickgraph

If you don't know what a Bacon Number is you can look it up here.

The basic idea is to calculate between how many movies there are between Kevin Bacon and another actor, the higher the number the further away. It is a very geeky calculation, in fact it so geeky that it is a part of google. Just google [actor name] bacon number and...


So how would you do this yourself?

First off we will need some data.

I used this to create a movie database. Basically you get an Actor table, a Dvd table and a Dvd_Actor table to link them.

Next we will just have to loop through all the records and create nodes(vertices) for each actor and movie in the database.

I used quickgraph and a sinple class to keep the nodes:

    public class MovieInformationNode
    {
        public int Id { get; set; }
        public NodeTypes NodeType { get; set; }
        public string Name { get; set; }
        public string PresentationName
        {
            get
            {

                if (NodeType == NodeTypes.Actor)
                    return "Actor: " + Name;
                else
                    return "Movie: " + Name;
            }
        }
    }

I declared the graph as:

private UndirectedGraph<MovieInformationNode, Edge<MovieInformationNode>> movieGraph                   = new UndirectedGraph<MovieInformationNode, Edge<MovieInformationNode>>();

For each Actor and Dvd I created a node and inserted it into the graph:

                MovieInformationNode node = new MovieInformationNode();
                node.NodeType = NodeTypes.Movie; //or actor if it is an actor, enum to keep track of type
                node.Id = dvd.Id;
                node.Name = dvd.DVD_Title.Trim();
                movieGraph.AddVertex(node);

For each link in Dvd_Actor I found the nodes in the graph and added an edge between them:

                //find actor
                MovieInformationNode actorNode = movieGraph.Vertices.Where(s => s.Id ==                                                               performance.Id && s.NodeType == NodeTypes.Actor).First();
                //find movie
                MovieInformationNode movieNode = movieGraph.Vertices.Where(s => s.Id ==                                                         performance.dvdid && s.NodeType == NodeTypes.Movie).First();
                Edge<MovieInformationNode> edge = new Edge<MovieInformationNode>(actorNode,                                                                               movieNode);
                movieGraph.AddEdge(edge);

Then you just need to get the shortest path:

            Func<Edge<MovieInformationNode>, double> edgeCost = (edge => 1.0D); //no weights
            var tryPath = movieGraph.ShortestPathsDijkstra(edgeCost, sourceNode);
            IEnumerable<Edge<MovieInformationNode>> path;
            if (tryPath(destinationNode, out path))
            {
                foreach (var item in path)
           {
                    listFrom.Items.Add(item.Source);
                    listFrom.Items.Add(item.Target);
           }
            }

To get this working you will need to get quickgraph here.


October 19, 2013

Converting from an adjacency list to hierarchyid in SQL Server



So many of us have worked with hierarchies in SQL Server, and for those not used to the HierarchyId the most common approach was to create a table with a parent relation to it self.

Consider a company structure where the main company has underlying companies, sections, divisions etc in a hierarcical manor. Something like this:


  • Company
    • SectionA
      • Division1
      • Division2
    • SectionB
      • Division1
        • GroupA

The structure for representing this in a database using an adjacency list approach would be something like this:

CREATE TABLE [dbo].[Organization](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Parent] [int] NULL,
[Name] [nvarchar](255) NOT NULL)

With this structure you can find the children of node 4 by a simple: 

SELECT * FROM Organization WHERE Parent = 4

The problem with this structure is when you need to search for all descendants to a node. Then you would need to first make a SELECT * FROM Organization WHERE Parent = 4 and then for each underlying node search for their children and then their children and so on. 

Recursion complicates things, but recursion coupled with database calls can make things go very slow as well.

So in SQL Server from 2008 (old technology, still so few uses it) there is a special datatype for handling hierarchies, the hierarchyid.

The new organizational structure would be something like:

CREATE TABLE [dbo].[NewOrganization](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Hierarchy] [hierarchyid] NOT NULL,
[Name] [nvarchar](255) NOT NULL)

With this table you would be able to make queries such as:

DECLARE @ParentOrganization hierarchyid
SELECT @ParentOrganization = Hierarchy FROM NewOrganization
WHERE Id = 4

SELECT * FROM NewOrganization
WHERE Hierarchy .IsDescendantOf(@ParentOrganization ) = 1

Giving you the full subtree with only one call. Pretty neat.

So how do you get from the lousy adjacency list to the splendid hierachyid?

Well... given the two tables above (note that they are a bit pseudo-coded, no primary keys for example) a solution would be like this in c#.

To connect to the data base you would need a connection string to point out the correct type library:

    <add name="Organizations" connectionString="Type System Version=SQL Server 2012;Data Source=[database];Initial Catalog=[table];Integrated Security=True" /> see this for more info why and how.

Next up you need a mechanism for reading and writing the data to the database up to you if it's entity framework or something else. Note that you would need a reference to Microsoft.SqlServer.Types to use the hierarchyid from c#.

    public class NewOrganizationItem
    {
public Int32 Id { get; set; }
        public SqlHierarchyId Hierarchy { get; set; }
        public String Name { get; set; }
    }
    public class OrganizationItem
    {
        public Int32 Id { get; set; }
        public Nullable<Int32> Parent { get; set; }
        public String Name { get; set; }
    }

        private void ConvertTree()
        {
//Load all old organizationitems from the database
            List<OrganizationItem> items = OrganizationManager.SelectAll();
//Start the importing 
            InsertOrganizations(items, null, SqlHierarchyId.Null);
        }

We will need to keep track on three things when importing. The old organizations parentId and where it should be placed in the new hierarchy. For that we would need both the parent (new parent) and to keep track of the last child under it (lastSibling) so we can insert the new node after the last child node.

        private void InsertOrganizations(List<OrganizationItem> oldItems, int? parentId, 
                                                         SqlHierarchyId newParent)
        {
            SqlHierarchyId lastSibling = SqlHierarchyId.Null;
//loop through all children
            foreach (var item in oldItems.Where(s=>s.Parent == parentId))
            {
                NewOrganizationItem newItem = new NewOrganizationItem();
                newItem.Name = item.Name.Trim();
                if (parentId != null)
//if not a root item create it under its parent after the last sibling
                    newItem.Hierarchy = newParent.GetDescendant(lastSibling, SqlHierarchyId.Null);
                else
//create it under the root node
                    newItem.Hierarchy = SqlHierarchyId.GetRoot().GetDescendant(lastSibling, 
                                                                               SqlHierarchyId.Null);
//Insert it into the database and return the newly created item
                newItem = NewOrganizationManager.Insert(newItem);
                lastSibling = newItem.Hierarchy;
//recursively continue
                InsertOrganizations(oldItems, item.Id, newItem.Hierarchy);
            }
        }