Click here to Skip to main content
15,886,799 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
Hello,

In our sql server database we have Id column in each table. This column is not primary key column but it is made as identity column. We have one more column for primary key. The data type of this column is guid.

The purpose of having Id column in each table is:
1) clustered index should not create on guid column.
2) Id column is very useful in case of migration.

This practice was implemented in our last project But our current client said we should remove Id column from each table and we should create clustered index on guid column.

Client also said that if we remain Id column in each table then we should have valid reason for that.

Please let me know should I remove Id column from each table or not.

Below is the example of our table structure:

CREATE TABLE [dbo].[PlanRider](
[Id] [int] IDENTITY(1,1) NOT NULL,
[PlanRiderId] [uniqueidentifier] NOT NULL,
[CompanyId] [uniqueidentifier] NOT NULL,
[PlanDetailId] [uniqueidentifier] NOT NULL,
[BasePlanId] [uniqueidentifier] NOT NULL,
[RiderPlanId] [uniqueidentifier] NOT NULL,
[RiderType] [varchar](10) NOT NULL,
[CreatedBy] [uniqueidentifier] NULL,
[CreatedDate] [datetime] NULL,
[UpdatedBy] [uniqueidentifier] NULL,
[UpdatedDate] [datetime] NULL,
[PageReference] [nvarchar](100) NULL,
[RowStatus] [int] NULL,
CONSTRAINT [PK_PLanRider] PRIMARY KEY NONCLUSTERED
(
[PlanRiderId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

What I have tried:

This practice was implemented in our last project so I implemented the same in current project.
Posted
Updated 28-May-22 17:21pm
v3
Comments
[no name] 10-Feb-20 7:58am    
The IDENTITY column is the "primary key" column, anything else is "secondary", regardless of your take on it.
phil.o 10-Feb-20 11:43am    
Not really. Defining IDENTITY on a column will only cause it to have auto-generated sequence of values. The PRIMARY KEY specification is what will cause the column to have unique values across the table, and it will automatically generate a clustered index on it.
[no name] 10-Feb-20 15:36pm    
True. But what came "first"? That's was my take. And if it's really "PRIMARY" then the IDENTITY is redundant, unless you're using to hang children off of, which again (in my mind) makes the IDENTITY the "primary". Just my "feelings".
:0)

Another "common feeling" is that primary keys should be "nonsense" numbers for which the IDENTITY column is ideal (which includes "client number", invoice numbers, etc.)
Udai Karan Mathur 11-Feb-20 1:58am    
Please let me know what should I do.
1) Remove Id column from all the table and create clustered index on guid coulmn ?
2) Put Id column as identity column and create primary key on guid column with unique non-clustered index ?

Please let me know which is best practice 1 or 2.
[no name] 11-Feb-20 9:04am    
Do both. But people's "primary keys" are usually some "compound key" of attributes dependent on the IDENTITY key, and which later turns out to have been a bad design decision due to a change in company policy. You can "change" primary keys, where in reality it creates a big mess.

Are there "child relations"? Compare the length of the IDENTITY column with your "primary key" column? What are you going to use to link them?
That the "IDENTITY column serves no purpose" is short-sighted and naïve.

And finally, what is easier to remember: a Guid or an int?

If the uniqueidentifier column is the primary key, then that's what needs to be referenced from other tables.

The IDENTITY column doesn't seem to serve any other purpose, unless you're passing it to a query to avoid having to pass the Guid. But that would potentially leave you open to an Insecure Direct Object Reference vulnerability - it's much easier for an attacker to enumerate records based on a sequential number than those based on a Guid.

Insecure Direct Object Reference Prevention Cheat Sheet | OWASP[^]

Without knowing what your data looks like, or how it will be queried, it's hard to say whether either column is suitable for a clustered index. It's quite possible you'd be better off using one or more of the other columns for the clustered index.

Clustered and Nonclustered Indexes Described - SQL Server | Microsoft Docs[^]
SQL Server Index Architecture and Design Guide - SQL Server | Microsoft Docs[^]
 
Share this answer
 
You list your number 1 reason for having the ID column as...
1) clustered index should not create on guid column.

Now, I agree that if the GUID is the Clustered Index Key, they you'll be adding a whole lot more space to your hard-disks and using more RAM because every non-clustered Index also contains the Clustered Index Key.

I also agree that range scans on GUIDs doesn't do a blood thing for you. If you look at it though, range scans on Identity columns usually don't work so well either. For example, even if all the IDs were IDENTITY columns, would a range scan on an IDENTITY column clustered Index for this table help you find every row in the table for a given CompanyID? No... not at all. You'd need a non-clustered index for that and then, even with an IDENTITY column as the clustered index, it would be a either a seek/bookmark lookup or a scan to find the rest of the information in that table after that and it doesn't matter if it's an IDENTITY column or a RANDOM GUID or a NEWSEQUENTIALID() or a SEQUENCE or what.

What about inserts causing fragmentation of non-clustered indexes? It doesn't matter... it'll happen with GUIDs or IDENTITIES. The maddening part about IDENTITIES is that you'll end up with severely logically fragmented indexes where lowering the FILL FACTOR will NOT help prevent fragmentation because the fragmentation will be in the form of "Sequential Silos" and fragmentation cannot be prevented in those. Oddly enough, the fragmentation will be prevented if you lower the FILL FACTOR on such an index if the Clustered Index is a Random GUID.

The exact same is true for IDENTITY columns where an Insert is done and then an "ExpAnsive" update is done (not likely on the table you posted but you DO have two variable width columns the allow the possibility). These types of indexes fragment like mad even though they have an ever-increasing key. Oddly enough, a GUID clustered index with a correct FILL FACTOR will prevent fragmentation of nearly all "ExpAnsive" updates.

What about overall fragmentation? I agree that an IDENTITY keyed clustered index is the absolute king for fragmentation free indexes but ONLY if they never suffer "ExpAnsive" updates. That pretty much limits things to reference/DIM tables and history tables. For GUIDs, randomness is not the actual big problem. I have a test where I insert 100,000 rows PER DAY into a 123 byte wide table with a Random GUID as the clustered index and it takes 58 days before it gets to just 1% fragmentation. That's nearly 6 million rows in 58 days on a Random Guid Clustered index with virtually no page splits and <1% fragmentation... even if it gets hit with some "ExpAnsive" updates.

Except for their size, Random GUIDs are actually the very epitome of how people expect indexes to operate as... lower the Fill Factor and you prevent fragmentation. Whereas that makes zero difference (but does waste a lot of space that will never be used) if the rows inserted into the hot spot are expansively updated before and index rebuild occurs... and that's the way it will happen because Inserts into an IDENTITY clustered index won't cause fragmentation (nor will they follow the Fill Factor... they'll ALWAYS fill the pages to 100% no matter the FILL FACTOR after just one insert has been done).

So, ask yourself... other than the extra size, what does your IDENTITY column have to offer? Speed? Not likely... the hotspot is a killer for OLTP performance. Using less memory for range scans? Nope... you usually won't get a range scan because you'll need to rely on non-clustered indexes to find stuff. Less fragmentation? Maybe on the clustered indexes and only if they're append only inserts and no expansive updates, which is rare except for audit/history tables. Better JOIN performance? It won't be substantially noticeable until you get into millions or even billions of rows RETURNED by a query.

Your second reason for using an IDENTITY is...
2) Id column is very useful in case of migration.

Funny... that's what they say about GUIDs... almost zero chance of a collision... not so with IDENTITY columns.

So, when you present your two "justifications" to the custom and the customer says, "Really? Prove it!", how are you going to do that? By referring to a bunch of bad and seriously inadequate testing that GUIDS fragment a lot?

Before you even think of doing that, you should watch the full 88 minutes (past where you'll think the end is) of the following YouTube because your customer may have watched it.

Black Arts Index Maintenance 1.2 - Guids vs. Fragmentation | Jeff Moden - YouTube[^]
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900