Introduction
Managing proper relation between tables have always been a big issue in the relational database. It plays a big role in terms of modeling and normalization. SQL Server lets us do so many crazy things without any prior warning, even when we are doing improper table relations. Applications may be working just fine, but there could be presents of big mass at the table relation levels.
Today I am trying to make a SQL query which will help to find table Hierarchy or Generation for each table at the relational database.
Background
In words, A Good Relation Database means there is:
- Presence of at least one table that has no dependencies
- An absence of any circulation flow at table relations
Here is a sample good relational database design:
And if you are thinking about the worst one, here it is:
If the database table number is small in size, it’s easy to get all the information with the design diagram. But if you have so many tables and even with different schemas, it’s hard to line them all with a single step.
So let’s start rolling over the query I was mentioning.
Table Relations Detail
First, we have to find the relations between different tables of the database. Basically, it is like finding the dependencies using the foreign key relations. To find the relations, I am using this query and the detail would be like:
CREATE TABLE #tblRelation (ObjectId VARCHAR(100) NOT NULL,
Name VARCHAR(100) NOT NULL,
[Schema] VARCHAR(100) NOT NULL,
[Column] VARCHAR(100) NOT NULL,
FkFromObjectId VARCHAR(100) NOT NULL,
FkFromTbl VARCHAR(100) NOT NULL,
FkFromSchema VARCHAR(100) NOT NULL,
FkFromClm VARCHAR(100) NOT NULL)
INSERT
INTO #tblRelation
SELECT
parent_object_id,
OBJECT_NAME(parent_object_id),
OBJECT_SCHEMA_NAME(parent_object_id),
c.NAME,
referenced_object_id,
OBJECT_NAME(referenced_object_id),
OBJECT_SCHEMA_NAME(referenced_object_id),
cref.NAME
FROM
sys.foreign_key_columns fk
INNER JOIN
sys.columns c
ON fk.parent_column_id = c.column_id
AND fk.parent_object_id = c.object_id
INNER JOIN
sys.columns cref
ON fk.referenced_column_id = cref.column_id
AND fk.referenced_object_id = cref.object_id
SELECT *
FROM #tblRelation
ORDER BY Name, [Schema], FkFromTbl, FkFromSchema
Now if any relation exists, we will see something like this:
If there is no relation between tables, there will be no entity at this table.
Find First Generation Tables and Table Schema Detail
The second step would be to find the first generation tables using the relations. By the word first generation, I meant a table:
- which doesn’t have any dependencies or any foreign key
- or its dependents on its own or applied foreign keys are from his own columns
CREATE TABLE #tblDetail (Name VARCHAR(100) NOT NULL,
[Schema] VARCHAR(100) NOT NULL,
Generation INT NULL)
INSERT
INTO #tblDetail(Name, [Schema], Generation)
(SELECT
TABLE_NAME,
TABLE_SCHEMA,
(CASE
WHEN(
(SELECT COUNT(*)
FROM #tblRelation
WHERE Name = TABLE_NAME AND [Schema] = TABLE_SCHEMA) = 0
)
THEN(SELECT 0)
WHEN(
(SELECT COUNT(*)
FROM #tblRelation
WHERE Name = TABLE_NAME AND [Schema] = TABLE_SCHEMA) =
(SELECT COUNT(*)
FROM #tblRelation
WHERE Name = TABLE_NAME
AND FkFromTbl = TABLE_NAME AND [Schema] = TABLE_SCHEMA)
)
THEN(SELECT 0)
ELSE(SELECT NULL)
END)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND TABLE_NAME != 'sysdiagrams')
SELECT *
FROM #tblDetail
WHERE Generation IS NOT NULL
ORDER BY Generation, Name, [Schema]
As you can see, I am not only comparing the table name but also the schema name. This one is really important because the database could have multiple tables with the same name for different schemes. Now, if we find any first generation table at the database, we are assigning its generation to “0
”.
On,select
we will get something like this:
How To Know If There Is No Independent Table at Database?
It is important that we find at least one first-generation table to continue our work. But if the database doesn’t have any first generation tables, we put a small check for that:
IF NOT EXISTS(SELECT *
FROM #tblDetail
WHERE Generation IS NOT NULL)
BEGIN
PRINT 'No first generation table found at table relations.';
END
If we don’t find any first generation table, we will find a message. But don’t worry, the process will not throw any error for that. It will try its best to go until the finish line.
And no entity would be selected on the previous select
statement.
Finding Generations for Other Tables
Finding generations of other tables is simple.
- Find tables whose generation has not found yet “
where Generation is NULL at @tblDetail
” - And all of its dependencies could be 100% resolved, using tables whose generation is already found
- Apply its generation to
currentMaxGeneration + 1;
- Continue unless generation of all not found.
WHILE(EXISTS(SELECT *
FROM #tblDetail
WHERE Generation IS NULL))
BEGIN
DECLARE @tblNewGeneration TABLE(Name VARCHAR(100) NOT NULL,
[Schema] VARCHAR(100) NOT NULL)
INSERT INTO @tblNewGeneration
SELECT Name, [Schema]
FROM #tblDetail AS dtl
WHERE Generation IS NULL
AND NOT EXISTS (
SELECT DISTINCT FkFromTbl,FkFromSchema
FROM #tblRelation AS rel
WHERE rel.Name = dtl.Name
AND rel.[Schema] = dtl.[Schema]
EXCEPT
SELECT Name, [Schema]
FROM #tblDetail
WHERE Generation IS NOT NULL
)
IF(NOT EXISTS(SELECT * FROM @tblNewGeneration))
BEGIN
PRINT 'Circular flow found at table relations.';
BREAK;
END
DECLARE @crntGeneration INT
SET @crntGeneration = (SELECT MAX(Generation)
FROM #tblDetail
WHERE Generation IS NOT NULL);
UPDATE #tblDetail
SET Generation = @crntGeneration + 1
WHERE [Schema]+'.'+Name
IN (
SELECT [Schema]+'.'+Name
FROM @tblNewGeneration
)
DELETE FROM @tblNewGeneration;
END
But there is a chance, existence of a circular flow on the relations. So we use a check inside the loop to break the loop if any circular flow is found in relations.
How Do We Understand If Any Circular Flow Exists on the Relations?
- There are still tables whose generation need to be found.
- But no one’s dependencies could be resolved, using tables whose generation was already found.
And if we face a situation like this, a message would be displayed.
Result: Success
If we haven’t faced any of those situations, it means all the generation for the tables has been populated. And we will see no "NULL
" entry at generation column.
SELECT *
FROM #tblDetail
ORDER BY Generation, Name, [Schema]
Result: Failure
As I mentioned earlier, this query will try its best to reach the finish line without any error. But there are confidences. Sometimes, at the end of the process, we may find some “NULL
” entries at the generation column which means the database relation faces some problems like:
- First generation tables not found
- Circular flow found in relations
Where to Apply This Query?
I basically wrote this one for self-satisfaction and applied over a few of the relational databases I had worked with. Few databases replied well for feedback.
We could apply it for:
- Quickly review the table hierarchy to understand the system
- Intending to change the relation between tables or changed, now to find out the effects
- After adding or eliminating tables to find out the effects
Limitations
- There could be some errors which I haven’t faced yet. So, if you find any, please let me know.
- There could be some sort of tool which does this thing for you. If you know any of those, please mention them in the comments