Introduction
Hierarchical and Alphabetical Sorting
To sort the data hierarchically and alphabetically means sorting a tree using Depth-first search algorithm and selecting nodes with the same parent in the alphabetical order of their names.
The hierarchical sorting together with the level of nesting is a good way to emulate an expanded tree or a table of a book contents using just a flat list of objects without the necessity to build a tree of objects.
Just fetch the records, sort them hierarchically and alphabetically, indent each record n
times, where n
is the level of nesting, and you'll get the expanded tree for UI and reports.
Two Approaches
There are several approaches to structure a relational database to store the hierarchical data. The one, which is being used more frequently is the Id
-ParentId
approach.
The Id
-ParentId
approach has a few drawbacks that make it necessary to seek alternative methods of storing hierarchical data.
The most difficult and expensive tasks in Id
-ParentId
approach are those, which require recursive methods, for example:
- find all descendants
- find all ancestors
- calculate the level of nesting
- sort flat records hierarchically and then alphabetically
As a response to these difficulties, Microsoft added the HierarchyId data type to SQL Server since version 2008.
The HierarchyId
data type:
- helps finding descendants and ancestors without recursion
- has fast
GetLevel()
method - contains the information about a node position in hierarchy
- contains the information about a node order within its parent children
If you need to build a clustered index on HierarchyId
, then all the node-records will physically be stored in the order in which they are most frequently fetched.
But the HierarchyId
data type still has drawbacks, which are the trade-offs for its useful properties:
- impossibility to use
HierarchyId
as a foreign key to a parent, i.e., self-referencing
(actually a foreign key to a parent can be substituted with PERSISTED REFERENCES as it is described in this article, but all the pros and cons of such a solution are waiting for their researchers) - difficult ways to generate
HierarchyId
for added and changed records to keep required order, SQL Server does not do this automatically, it is up to the programmer to manage it - difficult use of the
HierarchyId
as a primary and foreign key: it may regularly change - it has no corresponding type in JavaScript (for example)
So what to choose: Id
-ParentId
or HierarchyId
?
An acceptable solution to this dilemma of choice may be the combination of two approaches: Id
-ParentId
and HierarchyId
.
This combination will allow us to:
- use
Id
as an integer, ever increasing and unchangeable primary key - use
ParentId
as a self-referencing foreign key - use
HierarchyId
to keep the records physically sorted in hierarchical and alphabetical order
This article describes the way to unite the both approaches and ensure their normal operation.
Some Facts about HierarchyId
Column and Field Names in this Article and Source Code
Hid
— column of HierarchyId
data type HidPath
— string
column for human-readable path of HierarchyId
HidCode
— a binary presentation of HierarchyId
converted to a string
and without leading 0x
To Human-Readable Format
The HierarchyId
data type is a materialized path encoded into binary format. It can be decoded to human-readable string
path with ToString()
method:
select Hid, Hid.ToString() as HidPath, Hid.GetLevel() as Level from Folders order by Hid;
Hid HidPath Level
0x / 0
0x58 /1/ 1
0x5AC0 /1/1/ 2
0x5AD6 /1/1/1/ 3
0x5AD6B0 /1/1/1/1/ 4
0x5AD6B580 /1/1/1/1/1/ 5
0x5AD6B680 /1/1/1/1/2/ 5
0x5AD6B780 /1/1/1/1/3/ 5
0x5AD6D0 /1/1/1/2/ 4
0x5AD6D580 /1/1/1/2/1/ 5
0x5AD6D680 /1/1/1/2/2/ 5
0x5AD6D780 /1/1/1/2/3/ 5
...
From Human-Readable Format
The opposite operation from string
path to binary format works also:
declare @Hid hierarchyid = '/1/1/1/1/3/';
select @Hid
Results
0x5AD6B780
The above is the implicit conversion. HierarchyId
has the explicit Parse()
method, although a call format is a little strange:
declare @Hid hierarchyid = hierarchyid::Parse('/1/1/1/1/3/');
select @Hid
Results
0x5AD6B780
Tricky Numbering System of Paths
The interesting thing about the HierarchyId
is the way of how it manages the numeration of a node inserted between two other nodes.
To get a Hid
for a new node, HierarchyId
uses GetDescendant()
method from parent Hid
and uses two Hids
of nodes between which the new has to be inserted:
declare @Parent hierarchyid = 0x;
print @Parent.GetDescendant('/1/', '/2/').ToString()
print @Parent.GetDescendant('/1/', '/1.1/').ToString()
print @Parent.GetDescendant('/1/', '/1.0/').ToString()
print @Parent.GetDescendant('/1/', '/1.-1/').ToString()
print @Parent.GetDescendant('/1.1/', '/2/').ToString()
print @Parent.GetDescendant('/1.2/', '/2/').ToString()
print @Parent.GetDescendant('/1.3/', '/2/').ToString()
print @Parent.GetDescendant('/1.3/', '/1.4/').ToString()
print @Parent.GetDescendant_
('/1.2.3.4.5.6.7.8/', '/1.2.3.4.5.6.7.9/').ToString()
declare @Hid hierarchyid = '/1.2.3.4.5.6.7.8.1/';
select @Hid;
declare @Hid hierarchyid = '/-1.-2.-3.-4.-5.-6.-7.-8.-1234567890/';
select @Hid;
print @Parent.GetDescendant(null, null).ToString()
print @Parent.GetDescendant('/1/', null).ToString()
print @Parent.GetDescendant(null, '/1/').ToString()
Why Need Binary Encoding?
If we can build a string
path, why do we need this HierarchyId
binary encoding?
Could we just sort the hierarchical data by this string path?
Look at this example:
select hierarchyid::Parse('/1/') as Hid, '/1/' as HidPath union all
select hierarchyid::Parse('/2/') , '/2/' union all
select hierarchyid::Parse('/10/') , '/10/'
order by HidPath;
Results
Hid HidPath
0x58 /1/
0xAA /10/
0x68 /2/
The same query but ordered by Hid makes the right sorting:
Hid HidPath
0x58 /1/
0x68 /2/
0xAA /10/
Microsoft uses the sophisticated algorithm of HierarchyId
to encode the string
path so that it can be used for hierarchical sorting just by the Hid
column value.
The Solution
Master and SLave
So let's combine Id
-ParentId
self referencing approach with HierarchyId
.
The Id
-ParentId
part will be responsible for Id
primary key, self-reference ParentId
foreign key and be the leading, or master.
The HierarchyId
part will be a calculated field, or slave.
Of course Hid
as a persistent calculated field is the denormalization. But this is a conscious step and a compromise for the advantages of HierarchyId
.
The best place and moment to calculate the HierarchyId
and keep it in coordinated state is a stored procedure and while saving the node.
User Catalog for Experiments
Let's imagine we have a multi-user application where each user keeps its own Catalog
.
This Catalog
keeps information in hierarchical Folders
.
The table for these Folders
might look like this:
create table dbo.Folders
(
UserId int not null ,
Hid hierarchyid not null ,
Id int not null identity,
ParentId int null ,
Name nvarchar(50) not null ,
constraint CU_Folders unique clustered (UserId asc, Hid asc),
constraint PK_Folders primary key nonclustered (Id asc),
constraint FK_Folders_UserId foreign key (UserId) references dbo.Users (Id),
constraint FK_Folders_ParentId foreign key (ParentId) references dbo.Folders (Id)
constraint CH_Folders_ParentId check _
(Hid = 0x and ParentId is null or Hid <> 0x and ParentId is not null)
);
Note that the clustered index is built on UserId
and Hid
, but the primary key is built on Id
column.
This allows to keep all the records of the user's Folders
hierarchically sorted physically. And, at the same time, use the integer primary key as a foreign key for another tables and DTOs.
It is well known, that the hierarchy built on HierarchyId
may have one root only.
Because the clustered key is the combination of UserId
and Hid
columns, a table dbo.Folders
may have one root for each user.
The root node is mandatory, every user's catalog must have one, so the root is mostly a system record that is not edited by users.
Once the root node is a system record, every other nodes must have a parent and ParentId
must be not null
.
The Best Place to Make Calculation
As Microsoft states it here: "It is up to the application to generate and assign HierarchyId values in such a way that the desired relationship between rows is reflected in the values."
The best way to create and keep integrity and consistency of hierarchical data in our case is always use the same stored procedure to insert and update a Folder node.
A user-defined table type that represents saving Folder
may serve as a parameter to the SaveFolder
stored procedure:
create type dbo.FolderTableType as table
(
Id int not null primary key clustered,
ParentId int not null ,
Name nvarchar(50) not null
);
and here is the SaveFolder
stored procedure itself:
create procedure dbo.SaveFolder
@Folder dbo.FolderTableType readonly
as
begin
set nocount on;
begin
declare
@ParamId int ,
@ParentId int ,
@UserId int ,
@ParentHid hierarchyid ,
@StartTranCount int ,
@OldHid hierarchyid ,
@NewHid hierarchyid ;
declare @FolderIds table (
InsertedId int not null,
OldHid hierarchyid null,
NewHid hierarchyid null
);
end;
begin try
set @StartTranCount = @@trancount;
if @StartTranCount = 0
begin
set transaction isolation level serializable;
begin transaction;
end;
begin
select
@ParamId = Id,
@ParentId = ParentId
from
@Folder;
select
@UserId = UserId,
@ParentHid = Hid
from
dbo.Folders
where
Id = @ParentId;
end;
begin
merge into dbo.Folders as target
using
(
select
Hid = @ParentHid.GetDescendant (
LAG (case when t.Id is null then f.Hid end)
over(order by coalesce(t.Name, f.Name)),
LEAD(case when t.Id is null then f.Hid end)
over(order by coalesce(t.Name, f.Name))
),
Id = coalesce( t.Id , f.Id ),
ParentId = coalesce( t.ParentId , f.ParentId ),
Name = coalesce( t.Name , f.Name )
from
(select * from dbo.Folders where ParentId = @ParentId) f
full join @Folder t on t.Id = f.Id
)
as source on source.Id = @ParamId and source.Id = target.Id
when matched and target.UserId = @UserId then
update set
Hid = source.Hid ,
Name = source.Name ,
ParentId = source.ParentId
when not matched by target and source.Id = 0 then
insert (
UserId ,
Hid ,
ParentId ,
Name )
values (
@UserId ,
source.Hid ,
source.ParentId ,
source.Name )
output
inserted.Id ,
deleted.Hid ,
inserted.Hid
into @FolderIds (
InsertedId ,
OldHid ,
NewHid );
end;
begin
select top 1 @OldHid = OldHid, @NewHid = NewHid from @FolderIds;
if @OldHid <> @NewHid
update dbo.Folders set
Hid = Hid.GetReparentedValue(@OldHid, @NewHid)
where
UserId = @UserId
and Hid.IsDescendantOf(@OldHid) = 1;
end;
if @StartTranCount = 0 commit transaction;
end try
begin catch
if xact_state() <> 0 and @StartTranCount = 0 rollback transaction;
declare @ErrorMessage nvarchar(4000) = dbo.GetErrorMessage();
raiserror (@ErrorMessage, 16, 1);
return;
end catch;
end;
If to save each Folder
by the above stored procedure only, then the Hid
column will always have actual and consistent information.
Fetch a Folder With Its Descendants in Hierarchical Order
Thus the following stored procedure fetches a Folder
with all its descendants and sort them hierarchically and then alphabetically as if they are an expanded tree:
create procedure dbo.GetFolderWithSubFolders
@FolderId int
as
begin
set nocount on;
select
d.Id ,
d.ParentId ,
d.Name ,
[Level] = d.Hid.GetLevel(),
HidCode = dbo.GetHidCode(d.Hid),
HidPath = d.Hid.ToString()
from
dbo.Folders f
inner join dbo.Folders d on d.UserId = f.UserId and d.Hid.IsDescendantOf(f.Hid) = 1
where
f.Id = @FolderId
order by
d.Hid;
end;
For example:
exec dbo.GetFolderWithSubFolders @FolderId = 40
Id ParentId Name Level HidCode HidPath
40 13 3A 3 5AD6 /1/1/1/
121 40 4A 4 5AD6B0 /1/1/1/1/
364 121 5A 5 5AD6B580 /1/1/1/1/1/
365 121 5B 5 5AD6B680 /1/1/1/1/2/
366 121 5C 5 5AD6B780 /1/1/1/1/3/
122 40 4B 4 5AD6D0 /1/1/1/2/
367 122 5A 5 5AD6D580 /1/1/1/2/1/
368 122 5B 5 5AD6D680 /1/1/1/2/2/
369 122 5C 5 5AD6D780 /1/1/1/2/3/
123 40 4C 4 5AD6F0 /1/1/1/3/
370 123 5A 5 5AD6F580 /1/1/1/3/1/
371 123 5B 5 5AD6F680 /1/1/1/3/2/
372 123 5C 5 5AD6F780 /1/1/1/3/3/
What is the Use of HidCode?
As it stated above, HidCode
is a binary presentation of HierarchyId
converted to varchar
and without leading 0x
.
Here is the dbo.GetHidCode
scalar SQL function:
create function dbo.GetHidCode
(
@Hid hierarchyid
)
returns varchar(1000)
as
begin
return replace(convert(varchar(1000), cast(@Hid as varbinary(892)), 2), '0x', '');
end;
The HidCode
string
can be used on a client side to sort a flat list of Folders
hierarchically.
Chapter for the Perfectionists
The above method of Hid
calculation works well, but it has a tiny drawback.
-- imagine you have two sibling folders
Id ParentId Name Level HidCode HidPath
----- --------- ----- ------ ---------- ------------
364 121 5A 5 5AD6B580 /1/1/1/1/1/
365 121 5B 5 5AD6B680 /1/1/1/1/2/
-- and you want to insert a new folder
Id ParentId Name
----- --------- -----
0 121 5Aaa
-- the result will be
Id ParentId Name Level HidCode HidPath
----- --------- ----- ------ ---------- ------------
364 121 5A 5 5AD6B580 /1/1/1/1/1/
1234 121 5Aaa 5 5AD6B62C /1/1/1/1/1.1/
365 121 5B 5 5AD6B680 /1/1/1/1/2/
See this sequence in HidPath
: 1 -> 1.1 -> 2
instead of 1 -> 2 -> 3
In time, this ugly sequence may become more uglier.
If you are OK with that, then just skip this chapter.
But for the others, I would suggest one more option for keeping the hierarchical data in actual state.
Here is the stored procedure which recalculates the HierarchyId
so the path consists of integer numbers in a continuous sequence.
create procedure dbo.SaveFolderWithHidReculc
@Folder dbo.FolderTableType readonly
as
begin
set nocount on;
begin
declare
@ParentId int ,
@UserId int ,
@ParentHid hierarchyid ,
@ParentHidStr varchar(1000) ,
@StartTranCount int ,
@OldParentId int ,
@OldParentHid hierarchyid ;
declare @FolderIds table (
InsertedId int not null,
OldParentId int null,
OldParentHid hierarchyid null
);
end;
begin try
set @StartTranCount = @@trancount;
if @StartTranCount = 0
begin
set transaction isolation level serializable;
begin transaction;
end;
begin
select @ParentId = ParentId from @Folder;
select
@UserId = UserId ,
@ParentHid = Hid ,
@ParentHidStr = cast(Hid as varchar(1000))
from
dbo.Folders
where
Id = @ParentId;
end;
begin
merge into dbo.Folders as target
using
(
select
Hid = cast(concat(@ParentHidStr, -1, '/') as varchar(1000)),
Id ,
ParentId ,
Name
from
@Folder
)
as source on source.Id = target.Id
when matched and target.UserId = @UserId then
update set
ParentId = source.ParentId ,
Name = source.Name
when not matched by target and source.Id = 0 then
insert (
UserId ,
Hid ,
ParentId ,
Name )
values (
@UserId ,
source.Hid ,
source.ParentId ,
source.Name )
output
inserted.Id,
deleted.ParentId,
deleted.Hid.GetAncestor(1)
into
@FolderIds (
InsertedId ,
OldParentId ,
OldParentHid );
end
begin
select top 1
@OldParentId = OldParentId ,
@OldParentHid = OldParentHid
from
@FolderIds;
exec dbo.ReculcSubFolderHids @UserId, @ParentId, _
@ParentHid, @OldParentId, @OldParentHid ;
end;
if @StartTranCount = 0 commit transaction;
end try
begin catch
if xact_state() <> 0 and @StartTranCount = 0 rollback transaction;
declare @ErrorMessage nvarchar(4000) = dbo.GetErrorMessage();
raiserror (@ErrorMessage, 16, 1);
return;
end catch;
end;
The trick here is in call of ReculcSubFolderHids
stored procedure after the @Folder
save:
create procedure dbo.ReculcSubFolderHids
@UserId int ,
@ParentId int ,
@ParentHid hierarchyid ,
@OldParentId int = null,
@OldParentHid hierarchyid = null
as
begin
declare @ParentHidStr varchar(1000) = cast(@ParentHid as varchar(1000));
declare @OldParentHidStr varchar(1000) = cast(@OldParentId as varchar(1000));
with Recursion as
(
select
Id ,
ParentId ,
Name ,
OldHid = cast(Hid as varchar(1000)),
NewHid = cast(
concat(
case when ParentId = @ParentId then @ParentHidStr _
else @OldParentHidStr end,
row_number() over (order by Name, Id),
'/'
)
as varchar(1000)
)
from
dbo.Folders
where
ParentId in (@ParentId, @OldParentId)
union all
select
Id = f.Id ,
ParentId = f.ParentId ,
Name = f.Name ,
OldHid = cast(f.Hid as varchar(1000)),
NewHid = cast(
concat(
r.NewHid,
row_number() over _
(partition by f.ParentId order by f.Name, f.Id),
'/'
)
as varchar(1000)
)
from
Recursion r
inner join dbo.Folders f on f.ParentId = r.Id
where
r.OldHid <> r.NewHid
)
update f set
Hid = r.NewHid
from
dbo.Folders f
inner join Recursion r on r.Id = f.Id and f.Hid <> r.NewHid
where
f.UserId = @UserId;
end;
The above stored procedure calculated the paths which consist of integer numbers in a continuous sequence.
-- So the result of the example in the section beginning with SaveFolderWithHidReculc will be
Id ParentId Name Level HidCode HidPath
----- --------- ----- ------ ---------- ------------
364 121 5A 5 5AD6B580 /1/1/1/1/1/
1234 121 5Aaa 5 5AD6B680 /1/1/1/1/2/
365 121 5B 5 5AD6B780 /1/1/1/1/3/
This method has the drawback also. It works well until the recalculated branch consists of up to about 10,000 nodes, when the time for calculation do not exceed a second. 100,000 nodes recalculation may take up to 15 seconds or more.
Read a Tree in C#
The implemented Id
-ParentId
reference allows to build a tree of Folders
in C#.
More over, once the Folders
are sorted hierarchically, the tree can be build for one iteration through the flat list of Folders
.
Suppose we have a Folder
class in C#:
public class Folder
{
public Int32 Id { get; set; }
public Int32? ParentId { get; set; }
public String Name { get; set; }
public IList<Folder> SubFolders { get; set; }
}
So the following method builds a tree for one pass through the hierarchically sorted Folder
list:
public static IList<Folder> ConvertHierarchicallySortedFolderListToTrees
(IEnumerable<Folder> folders)
{
var parentStack = new Stack<Folder>();
var parent = default(Folder);
var prevNode = default(Folder);
var rootNodes = new List<Folder>();
foreach (var folder in folders)
{
if (parent == null || folder.ParentId == null)
{
rootNodes.Add(folder);
parent = folder;
}
else if (folder.ParentId == parent.Id)
{
if (parent.SubFolders == null)
parent.SubFolders = new List<Folder>();
parent.SubFolders.Add(folder);
}
else if (folder.ParentId == prevNode.Id)
{
parentStack.Push(parent);
parent = prevNode;
if (parent.SubFolders == null)
parent.SubFolders = new List<Folder>();
parent.SubFolders.Add(folder);
}
else
{
var parentFound = false;
while(parentStack.Count > 0 && parentFound == false)
{
parent = parentStack.Pop();
if (folder.ParentId != null && folder.ParentId.Value == parent.Id)
{
parent.SubFolders.Add(folder);
parentFound = true;
}
}
if (parentFound == false)
{
rootNodes.Add(folder);
parent = folder;
}
}
prevNode = folder;
}
return rootNodes;
}
The above method works two times faster than the following method for an unsorted Folder
list:
public static IList<Folder> ConvertHierarchicallyUnsortedFolderListToTrees
(IEnumerable<Folder> folders)
{
var dictionary = folders.ToDictionary(n => n.Id, n => n);
var rootFolders = new List<Folder>();
foreach (var folder in dictionary.Select(item => item.Value))
{
Folder parent;
if (folder.ParentId.HasValue &&
dictionary.TryGetValue(folder.ParentId.Value, out parent))
{
if (parent.SubFolders == null)
parent.SubFolders = new List<Folder>();
parent.SubFolders.Add(folder);
}
else
{
rootFolders.Add(folder);
}
}
return rootFolders;
}
Both methods can build multiple roots, so the return type is IList<Folder>
. That is useful when you need to fetch several brunches detached from the tree or several independent trees.
About Source Code
The attached archive contains the Hierarchy
solution, created in Visual Studio 2015,
which consists of two projects:
Hierarchy.DB
- SSDT project to create database for SQL Server 2016 Hierarchy.Tests
- Test project to call the repository methods and see the result
The solution contains the examples of:
- How to fetch a
Folder
by Id
with HidCode
, HidPath
and full path from the root Folder
- How to fetch immediate SubFolders of a
Folder
- How to fetch a
Folder
with all its descendants as a flat list and as a tree - How to fetch a
Folder
with all its parents as a flat list and as a tree - How to add a new
Folder
with Hid
calculation on the fly - How to edit a
Folder
keeping right hierarchical and alphabetical order - How to reparent a
Folder
- assign a Folder
to another Parent
in edit method - How to delete a
Folder
with all its descendants - How to find
Folders
by Name
and build branches from the found Folders
to the root as it realized in Visual Studio is Search Solution Explorer
In order to install the database and run the tests, change the connection string in file Hierarchy.DB.publish.xml and in App.config to yours.
The Database project contains the dbo.GenerateFolders
stored procedure which generates hierarchical data for tests. This generation occurs automatically during the database publish phase.
History
"It is best to erase all personal history, because that would make us free from the encumbering thoughts of other people."
Journey to Ixtlan by Carlos Castaneda