Introduction

There are a few different design patterns for designing a database for a multi-tenant SaaS application. One of the most popular is having a single database shared between multiple tenants (either a single database for all tenants or splitting the tenants among multiple databases which allows us to scale horizontally). This article shows concrete example code to create a sharded multi-tenant database where the tenants are logically isolated using row-level security predicates and individual SQL users for each tenant, in a way that the isolation occurs at the database level without needing to rely on (or trust) application-level solutions.

Background

Having a multi-tenant database (a single database shared between multiple tenants) requires some logic isolation to ensure that one tenant does not get access to the data of other tenants. One of the common solutions for this is adding a TenantID column into each table that requires isolation and then adding some logic to make sure that one tenant cannot see data from others. (if you're wondering why can't we add the TenantID column only to "root" tables and rely on joins for child tables, please check some good reasons in this StackOverflow thread).

When compared to other database patterns (like having isolated databases for each tenant), shared multi-tenant databases have a lower database cost - as you can see in this comparison table. However, the major limitation of having a single database for all your tenants is that it doesn't scale since you're limited by database/server limits. The common solution to this problem is using a hybrid between shared database and isolated databases - it's called database sharding, and basically, it means splitting your data into different databases, according to a sharding criterion (which in our case will by the TenantId) - but without having to keep each tenant on in a dedicated database.

So basically, the Catalog database (where we track all the tenants) should also track the database location for each Tenant: it should have the server name (so we can scale among multiple servers), it should have the database name, and (very important) it should have the SQL user and password for each tenant. The important aspect of a shared design is that multiple tenants can share the same database in the same server, so it's important to have strong logical isolation between tenants, and having individual SQL users and passwords is crucial for the solution that you'll see below.

(The screenshot above was taken from the article, Microsoft - Multi-tenant SaaS database tenancy patterns).

In the screenshot above each database (Catalog, Tenants A-to-D, Tenants E-to-K, etc.) should be hosted on a different server. But you can still benefit from this architecture even if you start/stay small: you can use a single SQL server to host both the Catalog DB and a single Tenants DB for all tenants. The benefit is that you'll get this database-level logical isolation between the tenants (more on this below) and yet in the future you can scale out without changing your architecture.

Creating the Catalog DB

The catalog database contains a list of all tenants, and where they are located.

For the purposes of this article, I'll leave the [Password] column as plain text, but it's a good practice to encrypt and protect that.

CREATE DATABASE [CatalogDB];
GO

USE [CatalogDB];

CREATE TABLE [Tenant] (
    [TenantID] [smallint] IDENTITY(1,1) NOT NULL 
       CONSTRAINT [PK_Tenant] PRIMARY KEY,
    [Subdomain] [nvarchar](256), -- how tenants are identified/resolved
    [Server] [nvarchar](256),
    [Database] [nvarchar](256),
    [UserId] [nvarchar](256),
    [Password] [nvarchar](256),
);

Advanced trick (out of the scope of this article): catalog databases can be a bottleneck (and a single point of failure), so for serious business applications it's a good idea to have some replication and caching for the catalog.

A quick word about the sharding method

The distribution of tenants among multiple servers can be made using different methods. An intuitive way would be like "put the first 10 tenants in this server A, then only when needed provision a new server B and put the next 10 tenants there, etc".

Another method would be starting with a few servers and distributing tenants evenly across those servers: Let's say you have 3 servers called A, B, C, you'd put Tenant1 into A, Tenant2 into B, Tenant3 into C, Tenant4 into A again, Tenant5 into B, etc. So basically tenants are distributed according to (TenantId)%(NumberOfServers).

If you don't want to have a single catalog (which as I said before is both a bottleneck and a single point of failure) you can spread your catalog across multiple servers (exactly like the tenants' data) as long as your requests can be routed directly to the right place, which would require the sharding to be based on something like the tenant domain. Let's say you have 3 servers A, B, and C, and after hashing "mycustomer1.myapp.com" and taking modulo 3 (because we have 3 servers) you get the number 0, so you know that this customer should be stored in server A. Then in server A you would find not only the tenant data for "mycustomer1" and some other tenants, but also you'd find a copy of the respective tenant records with their SQL usernames and passwords. So there's no single catalog, and if one server fails it won't affect all your tenants.

Creating the Shared Multi-Tenant DB

For hosting our first few tenants, we'll create a database called [TenantsA]. When we grow larger, we can put the next few customers into [TenantsB], [TenantsC], etc., which should all have identical structures.

Each tenant will have their own SQL user (like TenantId_1, TenantId_2, etc) to connect to their databases, so we'll create a function CurrentTenantId() that will return the TenantId whenever the current logged SQL user is named like 'TenantId_*'.

CREATE DATABASE [TenantsA];
GO

USE [TenantsA];
GO

CREATE FUNCTION dbo.CurrentTenantId ()
    RETURNS smallint WITH SCHEMABINDING
AS BEGIN
    -- If user is not named like 'TenantId_*', then it's not a Tenant...
    IF SUBSTRING(SYSTEM_USER,1,9) <> 'TenantId_'
        RETURN NULL;
    RETURN CAST(SUBSTRING(SYSTEM_USER,10,9999) AS smallint);
END;
GO
GRANT EXECUTE ON [dbo].[CurrentTenantId] TO public;

And then we create our tables that will be shared among the tenants (in our example a single table):

USE [TenantsA];
GO

CREATE TABLE [Person] (
  [PersonID] [int] NOT NULL IDENTITY(1,1)
      CONSTRAINT [PK_Person] PRIMARY KEY NONCLUSTERED, 
  [TenantID] [smallint] NOT NULL
      DEFAULT(dbo.CurrentTenantId()),
  [FullName] [nvarchar](256) NULL,
);

CREATE UNIQUE CLUSTERED INDEX [CIX_Person] ON [Person] ([TenantID], [PersonID]);

Note that the [TenantID] column has a default value, which uses the CurrentTenantId() function, so the SaaS application (which runs under the context of a tenant) doesn't need to explicitly specify the TenantID. Later in the article we'll also see that they cannot use anything different than their own id.

Advanced trick (out of the scope of this article): Multi-tenant databases can also have a copy of the [Tenant] table, with a copy of the records for its own tenants, and all tables (like [Person]) could have a foreign-key from [TenantId] to [Tenant] table, possibly with a cascade delete to allow easy deletion of a tenant.

A quick word about the Clustered Index

If you are a careful reader (and know a little about databases), you should have noticed that the [Person] table above has a primary key [PersonID], but it's defined as a non-clustered index.

The clustered index of the table (which basically describes how the records are physically ordered in disk) was explicitly defined to include first the [TenantID], and then the primary key ID. By having the clustered index starting with [TenantID], it means that any queries that are filtering for a specific TenantID will seek very quickly into the tenant records (either if it's fetching all tenant records or if it's filtering for some additional criteria).

This design is based on the assumption that most queries on this table will include the [TenantID] as a filter. And this holds true because there's this Security Predicate (below) which will ensure that all tenants will always query the table filtering by their own TenantId.

While this may not hold true for all possible cases, it's a safe assumption that for most multi-tenant systems, the majority of the reads are coming from tenants themselves, which will always be filtering for their own data.

Creating the Security Predicate Function

Row-level security allows us to filter the access to rows according to any criteria. We'll create a Predicate Function that takes the TenantId and decides (according to the current logged SQL user) if the user has access to the TenantId. Basically, any SQL user that is named like "TenantId_X" (for any number X) will only have access to view/modify records where [TenantId]=X.

USE [TenantsA];
GO

CREATE FUNCTION dbo.HasTenantAccess(@TenantId smallint)
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    -- If user is named like 'TenantId_X' 
    -- it can only see/modify records where [TenantId]=X
    -- else it can see/modify all records
    RETURN SELECT 1 AS IsAllowed WHERE 
    dbo.CurrentTenantId() IS NULL
    OR @TenantId=dbo.CurrentTenantId();
GO

CREATE SECURITY POLICY dbo.TenantAccessPolicy;
GO

ALTER SECURITY POLICY dbo.TenantAccessPolicy ADD FILTER PREDICATE 
    dbo.HasTenantAccess([TenantId]) ON [dbo].[Person];

ALTER SECURITY POLICY dbo.TenantAccessPolicy ADD BLOCK PREDICATE 
    dbo.HasTenantAccess([TenantId]) ON [dbo].[Person];

Filter Predicates (ADD FILTER PREDICATE) are used to restrict (filter) read operations, while Block Predicates (ADD BLOCK PREDICATE) are used to restrict (block) write operations.

Creating a DDL Trigger to Enforce Security Predicate

The Security Policy TenantAccessPolicy can be used to apply filters over any number of tables. To make sure that any table with the [TenantId] column will always be filtered, we can create a DDL trigger that will apply the security predicate to any new (or modified) table. This trigger below captures any CREATE TABLE or ALTER TABLE.

CREATE TRIGGER [trApplyTenantAccessPolicy] ON DATABASE FOR DDL_DATABASE_LEVEL_EVENTS 
AS
BEGIN
  SET NOCOUNT ON
  SET ANSI_PADDING ON
  DECLARE @data XML
  DECLARE @ObjectName varchar(500)
  DECLARE @SchemaName varchar(500)
  DECLARE @ObjectType varchar(500)
  DECLARE @EventType varchar(500)

  SET @data = EVENTDATA()
  SET @ObjectName = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(500)')
  SET @SchemaName = @data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'varchar(500)')
  SET @ObjectType = @data.value('(/EVENT_INSTANCE/ObjectType)[1]', 'varchar(500)')
  SET @EventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(500)')
  if (@ObjectType='TABLE' AND (@EventType='CREATE_TABLE' OR @EventType='ALTER_TABLE')) BEGIN
    declare @object_id int
    SET @object_id=OBJECT_ID(@SchemaName+'.'+@ObjectName);
    IF EXISTS (SELECT * FROM sys.columns WHERE  object_id = @object_id AND name = 'TenantID') 
    AND NOT EXISTS (SELECT * FROM sys.security_predicates WHERE target_object_id=@object_id 
    AND predicate_definition='([dbo].[HasTenantAccess]([TenantId]))') BEGIN
        PRINT 'Applying TenantAccessPolicy/HasTenantAccess into '
            + @SchemaName+'.'+@ObjectName + '...'
        EXEC('ALTER SECURITY POLICY dbo.TenantAccessPolicy
            ADD FILTER PREDICATE dbo.HasTenantAccess([TenantId])
            ON [' + @SchemaName+'].['+@ObjectName+']');
        EXEC('ALTER SECURITY POLICY dbo.TenantAccessPolicy
            ADD BLOCK PREDICATE dbo.HasTenantAccess([TenantId])
            ON [' + @SchemaName+'].['+@ObjectName+']');
    END
  END
END

Creating Tenants

Each tenant is basically a new record in the [Tenant] table, but it also requires an individual SQL user to be created (and it's a good idea to have different passwords for each tenant so that if the password of one tenant is compromised the attacker should NOT be able to reach your catalog or other tenants).

USE [CatalogDB];
INSERT INTO [Tenant] ([Subdomain]) VALUES ('customer1.myapp.com');
INSERT INTO [Tenant] ([Subdomain]) VALUES ('customer2.myapp.com');

-- For making this article simple, I'll run the next commands manually
-- but in a real production scenario you'll want to run this as dynamic SQL
-- based on the generated identity (SCOPE_IDENTITY())

-- Let's say the inserts above created TenantID=1 and TenantID=2
CREATE LOGIN [Tenant_1] WITH PASSWORD = '%UltraStrongPassword#!';
CREATE LOGIN [Tenant_2] WITH PASSWORD = '%aDifferentPasswordForEachTenantRight#!';

USE [TenantsA];

CREATE USER [TenantId_1] FOR LOGIN [TenantId_1];
CREATE USER [TenantId_2] FOR LOGIN [TenantId_2];
EXEC sp_addrolemember 'db_datareader', 'TenantId_1'
EXEC sp_addrolemember 'db_datawriter', 'TenantId_1'
EXEC sp_addrolemember 'db_datareader', 'TenantId_2'
EXEC sp_addrolemember 'db_datawriter', 'TenantId_2'

USE [CatalogDB]

UPDATE [Tenant] SET
  [Server]='(local)',
  [Database]='TenantsA',
  [UserId]='TenantId_1',
  [Password]='%UltraStrongPassword#!'
  WHERE [TenantID]=1;

UPDATE [Tenant] SET
  [Server]='(local)',
  [Database]='TenantsA',
  [UserId]='TenantId_1',
  [Password]='%aDifferentPasswordForEachTenantRight#!' -- Right?!
  WHERE [TenantID]=2;

Final Tests

Let's create a new table to make sure that new tables will automatically get the security policy (filter and block predicates) applied.

USE [TenantsA];
GO

CREATE TABLE [Order] (
  [OrderID] [int] NOT NULL IDENTITY(1,1)
      CONSTRAINT [PK_Order] PRIMARY KEY NONCLUSTERED, 
  [TenantID] [smallint] NOT NULL
      DEFAULT(dbo.CurrentTenantId()),
  [Amount] [decimal](15,2) NULL,
);

CREATE UNIQUE CLUSTERED INDEX [CIX_Order] ON [Order] ([TenantID], [OrderID]);

Let's add some records for each tenant

Invoke-Sqlcmd -Database "TenantsA" -Username "TenantId_1" 
-Password "%UltraStrongPassword#!" -Query @"
    INSERT INTO [Person] (FullName) VALUES ('Rick Drizin');
    INSERT INTO [Person] (FullName) VALUES ('Mickey Mouse');
    INSERT INTO [Order] (Amount) VALUES (100);
"@

Invoke-Sqlcmd -Database "TenantsA" -Username "TenantId_2" 
-Password "%aDifferentPasswordForEachTenantRight#!" -Query @"
    INSERT INTO [Person] (FullName) VALUES ('Donald Duck');
    INSERT INTO [Order] (Amount) VALUES (900);
"@

And now let's check that the records automatically got the tenant id and that results are filtered based on the current tenant (current user):

Finally, let's check that tenants cannot explicitly add or modify records from other tenants:

Isn't it cool? I hope you enjoyed this article as much as I did when I wrote it!

Appendix 1 - SUSER_ID() and String Manipulation

For the purpose of this article, I decided to go with an intuitive and simple design, which means that all shared tables contain the column [TenantID] and all SQL users are named like 'Tenant_*', and in order to extract the ID of the current tenant there's this function CurrentTenantId() which extracts the ID based on the current SQL user.

One issue with this design is that the CurrentTenantId() function does string manipulation (substring and cast), which is not so efficient. One important thing to consider is that this function is non-deterministic (we can verify that running SELECT OBJECTPROPERTY(OBJECT_ID('[dbo].[CurrentTenantId]'), 'IsDeterministic')), which means that the function is executed multiple times (e.g. for each row tested/returned/joined/etc). This is because SYSTEM_USER (which we use to get the current user) is also non-deterministic.

I understand this string manipulation doesn't make much of a difference in the overall process, since this is an in-memory operation (no I/O involved). But if we want to avoid this overhead we can make some minor changes:

Each SQL user has a record in table sys.server_principals, and has a unique id principal_id. Instead of storing the TenantID we can store this principal_id. So instead of having [TenantID] columns in tables like [Person] and [Order] we would have a column called [TenantPrincipalID], which is the column to be tested in the predicates.

The default value for this column would be SUSER_ID(), which returns this principal_id.

HasTenantAccess() function would also use this SUSER_ID and would be like this:

CREATE FUNCTION dbo.CurrentTenantId ()
   RETURNS smallint WITH SCHEMABINDING 
AS BEGIN

  -- If user is dbo, then it's not a Tenant... 
  IF IS_MEMBER('db_owner')=1 RETURN NULL;

  RETURN SUSER_ID();
END; 
GO 

GRANT EXECUTE ON [dbo].[CurrentTenantId] TO public;

Please note that SUSER_ID() is also non-deterministic (like SYSTEM_USER and all other security functions), so CurrentTenantId() will still be invoked for each row.

One downside of this SUSER_ID solution is that when connecting with a non-tenant user (e.g. for troubleshooting or maintenance) it gets a little trickier to filter for a tenant:

USE [TenantsA];

-- instead of this 
-- SELECT * FROM [Order] WHERE [TenantID]=2; 

 -- we would need this
SELECT * FROM [Order] WHERE [TenantPrincipalID]=SUSER_ID('TenantId_2');

Another downside is that if we move tenants around (across servers) we would have to update the [TenantPrincipalID] accordingly, since each server would give different IDs for their SQL users.

And even worse: if we restore the database for some tenants using a backup in a new server, we would have the [TenantPrincipalID] but we wouldn't know the respective [TenantId] for each one.

So it's very important to store the [PrincipalID] in the Catalog as soon as we create a new tenant:

USE [TenantsA];
SET @PrincipalID=SUSER_ID('TenantId_2')

USE [CatalogDB];
UPDATE [Tenant] SET [PrincipalID]=@PrincipalID WHERE [TenantID]=2

Appendix 2 - SESSION_CONTEXT and APP_NAME

There are many other articles about row-level security that use SESSION_CONTEXT (or temporary tables, which are similar but in my opinion a little easier to maintain) for setting the current ID. The problem with this solution is that it requires some tweaking in the data layer (when a connection is open you need to set the context). This not only adds an extra roundtrip to the database on every call but also opens the possibility for flaws (in case the code doesn't set the tenant correctly you would see unfiltered results?). In the end, using SESSION_CONTEXT would also add some overhead (if not string manipulation at least you would get I/O overhead, which is even worse).

Another popular solution used in many articles suggests using APP_NAME (which is something that you can pass in the connection string) instead of SYSTEM_USER or SUSER(). The APP_NAME function doesn't add I/O overhead (like SESSION_CONTEXT does), but it's still a nondeterministic function, and it will still require casting varchar to int. In the best case we are avoiding SUBSTRING, but that also could be avoided if we just named our tenant users without the "TenantId_" prefix (like "CREATE LOGIN [1]...").

If for any reason you prefer (or have to) share a single SQL user for all your tenants, then using APP_NAME can be a good workaround. In my opinion, having individual users adds an extra layer of security, and if each tenant will have its own user then there's no point in using APP_NAME instead of SYSTEM_USER/SUSER since the former is provided by the data layer and the latter is provided by SQL itself, so it's safer.

To sum, having tenants with individual users allow us to do the isolation directly in the database layer, without needing to trust the developers or the code.

History

Nov 20^th, 2021: First version
Nov 24^th, 2021: Appendices 1 and 2