New Features for Database Developers in SQL Server 2012: Simpler Paging, Sequences and FileTables

Andrei Ion Rînea

Rate me:

4.25/5 (3 votes)

20 Aug 2012CPOL8 min read

33.8K

Simpler paging, sequences and FileTables in SQL Server 2012

TL;DR

Paging got simpler and more efficient
Sequences have been introduced; better performance for auto-generated IDs and easier to have IDs unique across tables
FileTables have been introduced: building upon the FileStream feature now, we can have non-transactional access to files stored in the DB as a windows share along with transactional access via T-SQL

Lengthier Version

SQL Server 2012, in my opinion does not come with earth-shaking changes, but comes with performance improvements, feature improvements and some new features.

First of all, Management Studio has the same engine as Visual Studio which means you get a nice WPF experience, better font rendering and CTRL-scroll quick zoom-in/zoom-out.

Let’s say you want to retrieve data from the database in a paged way (that means chunks of data of a requested size or smaller). Typically, you would write this in SQL Server 2008 R2 or older:

SQL

DECLARE	@Offset		AS INT = 6
DECLARE @PageSize	AS INT = 5

SELECT	Id,
		Name
FROM
(
	SELECT	Id,
			Name,
			ROW_NUMBER()	OVER (ORDER BY Id)	AS	RowNumber
	FROM	Users
) UsersSelection

WHERE	UsersSelection.RowNumber >  @Offset
	AND	UsersSelection.RowNumber <= @Offset + @PageSize

In SQL Server 2012, the T-SQL syntax has been updated introducing keywords that facilitate a simpler and more efficient paging, keywords such as OFFSET, FETCH, NEXT ROWS and ONLY. A script that would retrieve the same data would be:

SQL

DECLARE	@Offset		AS INT = 6
DECLARE @PageSize	AS INT = 5

SELECT		Id,
			Name
FROM		Users
ORDER BY	Id
OFFSET		@Offset		ROWS
FETCH NEXT	@PageSize	ROWS ONLY

Observe the simpler, clearer syntax. Also, considering that the subselect has been eliminated (the subselect was required because the ROW_NUMBER column could not be addressed in the same select – for the WHERE clause), also the query cost was improved:

2008:

2012:

More details about the paging (some call it pagination) semantics can be found on MSDN.

Next, let’s introduce sequences. MSDN states that:

A sequence is a user-defined schema bound object that generates a sequence of numeric values according to the specification with which the sequence was created. The sequence of numeric values is generated in an ascending or descending order at a defined interval and can be configured to restart (cycle) when exhausted. Sequences, unlike identity columns, are not associated with specific tables. Applications refer to a sequence object to retrieve its next value. The relationship between sequences and tables is controlled by the application. User applications can reference a sequence object and coordinate the values across multiple rows and tables.

…
…
…

Use sequences instead of identity columns in the following scenarios:

The application requires a number before the insert into the table is made.

The application requires sharing a single series of numbers between multiple tables or multiple columns within a table.

The application must restart the number series when a specified number is reached. For example, after assigning values 1 through 10, the application starts assigning values 1 through 10 again.

The application requires sequence values to be sorted by another field. The NEXT VALUE FOR function can apply the OVER clause to the function call. The OVER clause guarantees that the values returned are generated in the order of the OVER clause’s ORDER BY clause.

An application requires multiple numbers to be assigned at the same time. For example, an application needs to reserve five sequential numbers. Requesting identity values could result in gaps in the series if other processes were simultaneously issued numbers. Calling sp_sequence_get_range can retrieve several numbers in the sequence at once.

You need to change the specification of the sequence, such as the increment value.

Now, let’s take a concrete example of a scenario. Let’s say there is some kind of inventory in your company that needs to track certain company assets. Each asset must have its identifier which should be unique across the entire inventory. Let’s suppose we need to track chairs, monitors and other office items and their main characteristics.

One way to do this would be to create a wide table such as:

SQL

CREATE TABLE Inventory (
    Id INT IDENTITY(1,1) NOT NULL,
    ItemTypeId TINYINT NOT NULL,
    InchSize TINYINT NULL, -- applies only to monitors
    HasArmRest BIT NULL, -- applies only to chairs
    AdjustableHeight BIT NULL, --might apply to monitors and chairs

 -- ... and so on

That would be a wrong way to do this because of several reasons:

It would be hard to read, maintain, query
It would violate several database design rules
Most of the columns would be nullable increasing the storage space and query speed

Therefore, we will design several tables, one for each of the inventory items, also getting rid of the ItemTypeId column.

However, generating the ID would no longer be so simple. If we put an ID column as an IDENTITY on each table, then we won’t have unique IDs across the inventory. So the 2008-R2-and-older approach would be to create another table, with a single IDENTITY column, table whose whole purpose would be to generate unique IDs. The script to insert an item in this scenario would be:

SQL

DECLARE @Id INT
BEGIN TRY
	BEGIN TRANSACTION
		INSERT INTO InventoryIds DEFAULT VALUES
		SET @Id = SCOPE_IDENTITY()
		INSERT INTO Monitors
		(
			[Id],
			[Width]
		)
		VALUES
		(
			@Id,
			17
		)
	COMMIT TRANSACTION
END TRY
BEGIN CATCH
	ROLLBACK TRANSACTION;
	DECLARE @ErrorMessage NVARCHAR(4000);
    DECLARE @ErrorSeverity INT;
    DECLARE @ErrorState INT;
    SELECT	@ErrorMessage = ERROR_MESSAGE(),
			@ErrorSeverity = ERROR_SEVERITY(),
			@ErrorState = ERROR_STATE();
    RAISERROR (@ErrorMessage,
               @ErrorSeverity,
               @ErrorState);
END CATCH

Notice the use of SCOPE_IDENTITY() instead of @@IDENTITY (a short and nice comparison can be found in this StackOverflow response) and INSERT INTO [TableName] DEFAULT VALUES, which at least for me, was a new thing.

Seems overly complicated and to an extent, it is. But at least it does a few things well:

It is transactional so if for some reasons, a part of it fails, all fails and no orphan IDs will be left behind
It employs a simpler error-handling through the use of TRY/CATCH. By the way, 2012 introduces the THROW keyword which can be used as is (no parameters) to rethrow the error to the caller.
On error, it rolls back the transaction AND it reports the error to the caller as close as it was.

Let’s see how we can simplify this in SQL Server 2012 using sequences:

First, we don’t need the InventoryIds any more. Then, we’ll create a sequence:

Notice the new “Sequences” node under “Programability”. Next, we’ll set up the sequence attributes:

The cycle option is useless in this and the rest of the majority of scenarios. The cache option can be useful in high-loads scenarios since the database engine doesn’t need to read the disk for each ID generated, as it was the case with IDENTITY. The performance improvement is not huge, but it’s welcome nevertheless.

Next, we’ll write the SQL script to insert the same data:

SQL

INSERT INTO Monitors
(
	Id,
	Width
)
VALUES
(
	NEXT VALUE FOR seqInventory,
	19
)

Notice the simpler syntax, a lot shorter and more elegant. Also, the monitor is larger.

During the presentation at RONUA, I was asked two questions for which I didn’t have a (definitive) answer back then:

Can’t we use UNIQUEIDENTIFIERs and be done with these sequences or sequence-simulation via that table?
Yes, we can but this has some disadvantages: longer key means more I/O and more storage which hampers JOIN performance and a GUID will be harder to manipulate for a human being – let’s say the poor guy that has to stamp all the inventory goods with the IDs
What data types are supported by sequences?
TINYINT, INT, SMALLINT, BIGINT, DECIMAL and NUMERIC

Finally, we approach filetables. This addresses storing (large) binary content in the database. Since this was possible, the database people have been torn in two camps: the ones that see this as the best solution and the ones that hate this to death. Personally, I’m in the first camp.

Typically, a CMS / or some other type of website will allow (some) users to upload pictures. This is a common scenario that is encountered but, of course, it’s not the only one.

The other solution would be to store the files on the disk (in the file system, that is) and have a table in the database which contains all the related data (filename, extension, length, date-modified, date-uploaded, actual-filename, actual-path and so on).

The advantages of storing the files in the database would be, in my opinion:

One data store for all the application’s data; better organization
A single backup, not two or more
Online backup; the app can add/delete/modify files while the backup is running
Incremental backups; yes, rsync could do incremental backups for the filesystem, but it’s harder

Now considering you would opt for this solution – storing the files in the DB – SQL Server 2012 offers you the filetable feature. This is a solution built upon the FILESTREAM feature which means storing the binary data outside of the MDF (Main Data File) of the database in order to avoid degrading the performance of the typical structured data.

The FileTable feature also allows you to access the files stored in the database via a virtual windows share so any app can access the files from there.

Let’s walk this through and we’ll comment on the feature as we go.

First, let’s enable the FILESTREAM support at the server level. Open SQL Server Configuration Manager (requires Administrator privileges), select the SQL Server instance and right-click to properties:

We’ll check all the check-boxes. After this, a server restart might be required. Next, we’ll create a new database (an existing database can be enabled too, but for clarity, we’ll start with a new database).
In the ‘New Database’ dialog, we’ll go through the three tabs from bottom to top. Let’s start with Filegroups and create a new FILESTREAM group:

Next, at the options tab, choose a directory name and select the desired access level to the windows share (‘non-transacted access’). For this demo, I chose ‘Full’:

Finally, we’ll add a new file in the General tab, then choose a name for the file, select FILESTREAM Data as its type, select the FILESTREAM group and choose a path. The path must exist and it will not be created by SQL Server. You need to create it before finalizing the database creation.

Now click ok and the database is finally created. It’s a bit of ‘pain in the side’ but it only hurts the first time (like other things in life..). Let’s create a filetable in this new database:

Notice the new ‘FileTable’ node under ‘Tables’. While doing this, we’ll not be greeted with a dialog box (as I consider we should be) but with a new script like so:

SQL

-- =========================================
-- Create FileTable template
-- =========================================
USE <database, sysname, AdventureWorks>
GO

IF OBJECT_ID('<schema_name, sysname, dbo>.<table_name, sysname, sample_filetable>', 'U') IS NOT NULL
  DROP TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_filetable>
GO

CREATE TABLE <schema_name, sysname, dbo>.<table_name, sysname, sample_filetable> AS FILETABLE
  WITH
  (
    FILETABLE_DIRECTORY = '<file_table_directory_name, sysname, sample_filetable>',
    FILETABLE_COLLATE_FILENAME = <file_table_filename_collation, sysname, database_default>
  )
GO

From all this mess, we’ll strip out the unnecessary and we’ll be left with this:

SQL

CREATE TABLE MyImages AS FILETABLE
  WITH
  (
    FILETABLE_DIRECTORY = 'MyImages'
  )

This little statement will create a table like so:

Right click on the file table (in Object Explorer) and select the Explore FileTable Directory. This will bring up an empty shared folder. Create a new text file and open it, write some text, save it and close the Notepad.

Now run.

SQL

SELECT	*,
		CAST(file_stream as VARCHAR(MAX))	AS [TextContent]
FROM	MyImages

You’ll notice how the files moved, renamed, deleted, content-changed and any kind of change done in the windows share is reflected back in the FileTable and vice-versa!

This is a very powerful feature, and in my opinion, quite welcome.

This article was originally posted at http://blog.andrei.rinea.ro?p=144

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Andrei Ion Rînea

Software Developer (Senior) IBM, Business Analytics

Romania

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

New Features for Database Developers in SQL Server 2012: Simpler Paging, Sequences and FileTables

TL;DR

Lengthier Version

License

Comments and Discussions