
SQL Server JSON: Performance Cookbook

Here's everything you need to know about high-performance JSON parsing for SQL Server
In SQL Server 2017/2019, Microsoft significantly improved the performance of the already fast JSON parser. We will learn more about it in this article.


Introduction

When Microsoft has been knocking everyone’s socks off with weird novelties for many years in a row, swinging from one extreme to the other, you gradually get used to it and start awaiting each update with a certain degree of skepticism. Over time, this feeling only grows stronger, and subconsciously you no longer expect anything good.

However, sometimes everything turns out exactly the opposite way. Out of the blue, Microsoft releases perfectly working functionality that breaks all the existing stereotypes. At first, you brace for the same old rakes and sharp attacks of vicarious embarrassment, but with every passing minute you realize that a properly working JSON parser for SQL Server was exactly what you had been missing all these years.

Such a sumptuous introduction has certain grounds: for a long time, support for working with JSON was one of the most requested SQL Server features on Microsoft Connect. Years passed, and this functionality was unexpectedly shipped with SQL Server 2016. Looking ahead, I will say that it turned out very well, but Microsoft didn’t stop there and in SQL Server 2017/2019 significantly improved the performance of its already fast JSON parser.


1. Datatypes

JSON support on SQL Server is natively available for all editions. At the same time, Microsoft didn’t provide for a separate datatype, as is the case with XML. JSON data on SQL Server is stored as plain text: in Unicode (NVARCHAR/NCHAR) or ANSI (VARCHAR/CHAR) format.

SQL
DECLARE @JSON_ANSI VARCHAR(MAX) =      '[{"Nąme":"Lenōvo モデ460"}]'
      , @JSON_Unicode NVARCHAR(MAX) = N'[{"Nąme":"Lenōvo モデ460"}]'

SELECT DATALENGTH(@JSON_ANSI), @JSON_ANSI
UNION ALL
SELECT DATALENGTH(@JSON_Unicode), @JSON_Unicode

The main thing to remember is how much space each datatype takes (2 bytes per character if we store the data as Unicode, or 1 byte for ANSI strings). Also, don’t forget to put the ‘N’ literal in front of Unicode constants. Otherwise, you can run into a bunch of funny situations:

--- ----------------------------
25  [{"Name":"Lenovo ??460"}]
50  [{"Nąme":"Lenōvo モデ460"}]

Everything seems simple, but it is not. Later, we will see that the chosen datatype affects not only the size, but also the parsing speed.

In addition, Microsoft strongly recommends against using the deprecated datatypes NTEXT/TEXT. For those who still use them out of habit, let's run a small investigative experiment:

SQL
DROP TABLE IF EXISTS #varchar
DROP TABLE IF EXISTS #nvarchar
DROP TABLE IF EXISTS #ntext
GO

CREATE TABLE #varchar  (x VARCHAR(MAX))
CREATE TABLE #nvarchar (x NVARCHAR(MAX))
CREATE TABLE #ntext    (x NTEXT)
GO

DECLARE @json NVARCHAR(MAX) =
    N'[{"Manufacturer":"Lenovo","Model":"ThinkPad E460","Availability":1}]'

SET STATISTICS IO, TIME ON

INSERT INTO #varchar
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)

INSERT INTO #nvarchar
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)

INSERT INTO #ntext
SELECT TOP(50000) @json
FROM [master].dbo.spt_values s1
CROSS JOIN [master].dbo.spt_values s2
OPTION(MAXDOP 1)

SET STATISTICS IO, TIME OFF

The insertion performance in the latter case will differ significantly:

varchar:  CPU time = 32 ms,  elapsed time = 28 ms
nvarchar: CPU time = 31 ms,  elapsed time = 30 ms
ntext:    CPU time = 172 ms, elapsed time = 190 ms

Also, remember that NTEXT/TEXT are always stored on LOB pages:

SQL
SELECT obj_name = OBJECT_NAME(p.[object_id])
     , a.[type_desc]
     , a.total_pages
     , total_mb = a.total_pages * 8 / 1024.
FROM sys.allocation_units a
JOIN sys.partitions p ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (
        OBJECT_ID('#nvarchar'),
        OBJECT_ID('#ntext'),
        OBJECT_ID('#varchar')
    )
obj_name      type_desc      total_pages  total_mb
------------- -------------- ------------ -----------
varchar       IN_ROW_DATA    516          4.031250
varchar       LOB_DATA       0            0.000000
nvarchar      IN_ROW_DATA    932          7.281250
nvarchar      LOB_DATA       0            0.000000
ntext         IN_ROW_DATA    188          1.468750
ntext         LOB_DATA       1668         13.031250

For reference, starting with SQL Server 2005, the rule for choosing which pages variable-length data is stored on has changed: if the value exceeds 8060 bytes, it is placed on LOB pages; otherwise, it stays on IN_ROW pages. This way, SQL Server keeps small values in the row itself and avoids unnecessary LOB lookups.
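
A quick way to see this rule in action is to insert one short and one oversized NVARCHAR(MAX) value and check which kinds of pages were allocated (a minimal sketch along the lines of the queries above):

SQL
DROP TABLE IF EXISTS #row_vs_lob
GO

CREATE TABLE #row_vs_lob (x NVARCHAR(MAX))

INSERT INTO #row_vs_lob
VALUES (N'short value')                                -- fits on an IN_ROW page
     , (REPLICATE(CAST(N'A' AS NVARCHAR(MAX)), 10000)) -- 20,000 bytes -> LOB pages

SELECT a.[type_desc]
     , a.total_pages
FROM tempdb.sys.allocation_units a
JOIN tempdb.sys.partitions p ON p.[partition_id] = a.container_id
WHERE p.[object_id] = OBJECT_ID('tempdb..#row_vs_lob')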

And the last reason not to use NTEXT/TEXT is that all the JSON functions simply refuse to work with the deprecated datatypes:

SQL
SELECT TOP(1) 1
FROM #ntext
WHERE ISJSON(x) = 1
Msg 8116, Level 16, State 1, Line 63
Argument data type ntext is invalid for argument 1 of isjson function.

2. Storage

Now let's see how advantageous storing JSON as NVARCHAR/VARCHAR is compared to similar data presented as XML. In addition, we will try to store XML in a native format, and also represent it as a string:

SQL
DECLARE @XML_Unicode NVARCHAR(MAX) = N'
<Manufacturer Name="Lenovo">
  <Product Name="ThinkPad E460">
    <Model Name="20ETS03100">
      <CPU>i7-6500U</CPU>
      <Memory>16</Memory>
      <SSD>256</SSD>
    </Model>
    <Model Name="20ETS02W00">
      <CPU>i5-6200U</CPU>
      <Memory>8</Memory>
      <HDD>1000</HDD>
    </Model>
    <Model Name="20ETS02V00">
      <CPU>i5-6200U</CPU>
      <Memory>4</Memory>
      <HDD>500</HDD>
    </Model>
  </Product>
</Manufacturer>'

DECLARE @JSON_Unicode NVARCHAR(MAX) = N'
[
  {
    "Manufacturer": {
      "Name": "Lenovo",
      "Product": {
        "Name": "ThinkPad E460",
        "Model": [
          {
            "Name": "20ETS03100",
            "CPU": "Intel Core i7-6500U",
            "Memory": 16,
            "SSD": "256"
          },
          {
            "Name": "20ETS02W00",
            "CPU": "Intel Core i5-6200U",
            "Memory": 8,
            "HDD": "1000"
          },
          {
            "Name": "20ETS02V00",
            "CPU": "Intel Core i5-6200U",
            "Memory": 4,
            "HDD": "500"
          }
        ]
      }
    }
  }
]'

DECLARE @XML_Unicode_D NVARCHAR(MAX) = N'<Manufacturer Name="Lenovo">
        <Product Name="ThinkPad E460"><Model Name="20ETS03100"><CPU>i7-6500U</CPU>
        <Memory>16</Memory><SSD>256</SSD></Model><Model Name="20ETS02W00">
        <CPU>i5-6200U</CPU><Memory>8</Memory><HDD>1000</HDD></Model>
        <Model Name="20ETS02V00"><CPU>i5-6200U</CPU><Memory>4</Memory>
        <HDD>500</HDD></Model></Product></Manufacturer>'
      , @JSON_Unicode_D NVARCHAR(MAX) = N'[{"Manufacturer":{"Name":"Lenovo","Product":
        {"Name":"ThinkPad E460","Model":[{"Name":"20ETS03100","CPU":"Intel Core i7-6500U",
        "Memory":16,"SSD":"256"},{"Name":"20ETS02W00","CPU":"Intel Core i5-6200U",
        "Memory":8,"HDD":"1000"},{"Name":"20ETS02V00","CPU":"Intel Core i5-6200U",
        "Memory":4,"HDD":"500"}]}}}]'

DECLARE @XML XML = @XML_Unicode
      , @XML_ANSI VARCHAR(MAX) = @XML_Unicode
      , @XML_D XML = @XML_Unicode_D
      , @XML_ANSI_D VARCHAR(MAX) = @XML_Unicode_D
      , @JSON_ANSI VARCHAR(MAX) = @JSON_Unicode
      , @JSON_ANSI_D VARCHAR(MAX) = @JSON_Unicode_D

SELECT *
FROM (
    VALUES ('XML Unicode', DATALENGTH(@XML_Unicode), DATALENGTH(@XML_Unicode_D))
         , ('XML ANSI', DATALENGTH(@XML_ANSI), DATALENGTH(@XML_ANSI_D))
         , ('XML', DATALENGTH(@XML), DATALENGTH(@XML_D))
         , ('JSON Unicode', DATALENGTH(@JSON_Unicode), DATALENGTH(@JSON_Unicode_D))
         , ('JSON ANSI', DATALENGTH(@JSON_ANSI), DATALENGTH(@JSON_ANSI_D))
) t(DataType, Delimeters, NoDelimeters)

Running this, we get the following results:

DataType     Delimeters  NoDelimeters
------------ ----------- --------------
XML Unicode  914         674
XML ANSI     457         337
XML          398         398
JSON Unicode 1274        604
JSON ANSI    637         302

It may seem like the best option is native XML. This is partly true, but there are nuances. XML is always stored as Unicode. Furthermore, because SQL Server uses a binary format to store this data, everything is packed into a kind of standardized dictionary with pointers. This is why formatting inside XML doesn’t affect the final data size.

Strings are different, so I wouldn't recommend storing formatted JSON. The best option is to strip all the extra characters when saving and to format the data on demand on the client side.
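
There is no built-in function for minifying JSON in SQL Server, but if the document shape is known in advance, you can strip the formatting simply by re-serializing it - a minimal sketch, assuming a flat structure:

SQL
DECLARE @formatted NVARCHAR(MAX) = N'[
  {
    "Name": "ThinkPad E460",
    "Memory": 16
  }
]'

-- a round trip through OPENJSON ... FOR JSON drops all the formatting characters
DECLARE @compact NVARCHAR(MAX) = (
    SELECT [Name], Memory
    FROM OPENJSON(@formatted) WITH ([Name] NVARCHAR(50), Memory INT)
    FOR JSON PATH
)

SELECT DATALENGTH(@formatted), DATALENGTH(@compact), @compact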

If you want to further reduce the size of JSON data, then we have several options at our disposal.

3. Compress/Decompress

SQL Server 2016 has implemented new COMPRESS/DECOMPRESS functions that add support for GZIP compression:

SQL
SELECT *
FROM (
    VALUES ('XML Unicode', DATALENGTH(COMPRESS(@XML_Unicode)),
                           DATALENGTH(COMPRESS(@XML_Unicode_D)))
         , ('XML ANSI', DATALENGTH(COMPRESS(@XML_ANSI)),
                        DATALENGTH(COMPRESS(@XML_ANSI_D)))
         , ('JSON Unicode', DATALENGTH(COMPRESS(@JSON_Unicode)),
                            DATALENGTH(COMPRESS(@JSON_Unicode_D)))
         , ('JSON ANSI', DATALENGTH(COMPRESS(@JSON_ANSI)),
                         DATALENGTH(COMPRESS(@JSON_ANSI_D)))
) t(DataType, CompressDelimeters, CompressNoDelimeters)

Results for the previous example:

DataType     CompressDelimeters   CompressNoDelimeters
------------ -------------------- --------------------
XML Unicode  244                  223
XML ANSI     198                  180
JSON Unicode 272                  224
JSON ANSI    221                  183

Everything shrinks well, but there is one caveat to keep in mind. Suppose the data initially arrived as ANSI, and then the variable's type was changed to Unicode:

SQL
DECLARE @t TABLE (val VARBINARY(MAX))

INSERT INTO @t
VALUES (COMPRESS('[{"Name":"ThinkPad E460"}]')) -- VARCHAR(8000)
     , (COMPRESS(N'[{"Name":"ThinkPad E460"}]')) -- NVARCHAR(4000)

SELECT val
     , DECOMPRESS(val)
     , CAST(DECOMPRESS(val) AS NVARCHAR(MAX))
     , CAST(DECOMPRESS(val) AS VARCHAR(MAX))
FROM @t

The COMPRESS function returns different binary sequences for ANSI and Unicode input, so on subsequent reads we end up with some of the data stored as ANSI and some as Unicode. It is then extremely difficult to guess which type to cast to:

---------------------------- -------------------------------------------------------
筛丢浡≥∺桔湩偫摡䔠㘴∰嵽      [{"Name":"ThinkPad E460"}]
[{"Name":"ThinkPad E460"}]   [ { " N a m e " : " T h i n k P a d   E 4 6 0 " } ]

If we are building a high-load system, using the COMPRESS function will slow down inserts:

SQL
USE tempdb
GO

DROP TABLE IF EXISTS #Compress
DROP TABLE IF EXISTS #NoCompress
GO

CREATE TABLE #NoCompress (DatabaseLogID INT PRIMARY KEY, JSON_Val NVARCHAR(MAX))
CREATE TABLE #Compress   (DatabaseLogID INT PRIMARY KEY, JSON_CompressVal VARBINARY(MAX))
GO

SET STATISTICS IO, TIME ON

INSERT INTO #NoCompress
SELECT DatabaseLogID
     , JSON_Val = (
            SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
        )
FROM AdventureWorks2014.dbo.DatabaseLog
OPTION(MAXDOP 1)

INSERT INTO #Compress
SELECT DatabaseLogID
     , JSON_CompressVal = COMPRESS((
            SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
         ))
FROM AdventureWorks2014.dbo.DatabaseLog
OPTION(MAXDOP 1)

SET STATISTICS IO, TIME OFF

And the difference is quite significant:

NoCompress: CPU time = 15 ms,  elapsed time = 25 ms
Compress:   CPU time = 218 ms, elapsed time = 280 ms

On the other hand, compression will reduce the size of the table:

SQL
SELECT obj_name = OBJECT_NAME(p.[object_id])
     , a.[type_desc]
     , a.total_pages
     , total_mb = a.total_pages * 8 / 1024.
FROM sys.partitions p
JOIN sys.allocation_units a ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (
        OBJECT_ID('#Compress'),
        OBJECT_ID('#NoCompress')
    )
obj_name       type_desc     total_pages  total_mb
-------------- ------------- ------------ ---------
NoCompress     IN_ROW_DATA   204          1.593750
NoCompress     LOB_DATA      26           0.203125
Compress       IN_ROW_DATA   92           0.718750
Compress       LOB_DATA      0            0.000000

In addition, reading from the compressed data table is then greatly slowed down by the DECOMPRESS function:

SQL
SET STATISTICS IO, TIME ON

SELECT *
FROM #NoCompress
WHERE JSON_VALUE(JSON_Val, '$.Event') = 'CREATE_TABLE'

SELECT DatabaseLogID, [JSON] = CAST(DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX))
FROM #Compress
WHERE JSON_VALUE(CAST(DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event') =
    N'CREATE_TABLE'

SET STATISTICS IO, TIME OFF

Logical reads will be reduced, but execution time will remain extremely slow:

Table 'NoCompress'. Scan count 1, logical reads 187, ...
    CPU time = 16 ms, elapsed time = 37 ms

Table 'Compress'. Scan count 1, logical reads 79, ...
    CPU time = 109 ms, elapsed time = 212 ms

Alternatively, you can add a PERSISTED calculated column:

SQL
ALTER TABLE #Compress ADD EventType_Persisted
    AS CAST(JSON_VALUE(CAST(
            DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event')
        AS VARCHAR(200)) PERSISTED

or create a calculated column and non-clustered index based on it:

SQL
ALTER TABLE #Compress ADD EventType_NonPersisted
    AS CAST(JSON_VALUE(CAST(
            DECOMPRESS(JSON_CompressVal) AS NVARCHAR(MAX)), '$.Event')
        AS VARCHAR(200))

CREATE INDEX ix ON #Compress (EventType_NonPersisted)
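
With either column in place, queries can filter on the computed column directly, letting the non-clustered index do the work instead of decompressing every row (a usage sketch for the columns defined above):

SQL
SELECT DatabaseLogID
FROM #Compress
WHERE EventType_NonPersisted = 'CREATE_TABLE'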

Sometimes network latency has a much stronger impact on performance than everything shown above. Imagine that we compress the JSON data with GZIP on the client and send it to the server:

SQL
DECLARE @json NVARCHAR(MAX) = (
        SELECT t.[name]
             , t.[object_id]
             , [columns] = (
                     SELECT c.column_id, c.[name], c.system_type_id
                     FROM sys.all_columns c
                     WHERE c.[object_id] = t.[object_id]
                     FOR JSON AUTO
                 )
        FROM sys.all_objects t
        FOR JSON AUTO
    )

SELECT InitialSize = DATALENGTH(@json) / 1048576.
     , CompressSize = DATALENGTH(COMPRESS(@json)) / 1048576.

For me, this became a sort of lifesaver when I was trying to reduce network traffic on one of my projects:

InitialSize    CompressSize
-------------- -------------
1.24907684     0.10125923
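
On the server side, this could be a procedure that accepts the already compressed payload and unpacks it only once. The procedure name and the JSON shape below are hypothetical, just to show the idea (CREATE OR ALTER requires SQL Server 2016 SP1 or later):

SQL
CREATE OR ALTER PROCEDURE dbo.usp_ImportCompressedJson
    @payload VARBINARY(MAX) -- GZIP-compressed UTF-16 JSON produced on the client
AS
BEGIN
    SET NOCOUNT ON

    SELECT [name], [object_id]
    FROM OPENJSON(CAST(DECOMPRESS(@payload) AS NVARCHAR(MAX)))
        WITH (
              [name] SYSNAME
            , [object_id] INT
        )
END
GO

DECLARE @demo VARBINARY(MAX) = COMPRESS(N'[{"name":"spt_values","object_id":75}]')
EXEC dbo.usp_ImportCompressedJson @payload = @demo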

4. Compression

You can also use data compression to reduce the size of tables. Previously, compression was only available in the Enterprise edition. But with the release of SQL Server 2016 SP1, you can use this functionality even on Express edition:

SQL
USE AdventureWorks2014
GO

DROP TABLE IF EXISTS #InitialTable
DROP TABLE IF EXISTS #None
DROP TABLE IF EXISTS #Row
DROP TABLE IF EXISTS #Page
GO

CREATE TABLE #None (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
    WITH (DATA_COMPRESSION = NONE))

CREATE TABLE #Row  (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
    WITH (DATA_COMPRESSION = ROW))

CREATE TABLE #Page (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED (ID)
    WITH (DATA_COMPRESSION = PAGE))
GO

SELECT h.SalesOrderID
     , JSON_Data =
           (
                SELECT p.[Name]
                FROM Sales.SalesOrderDetail d
                JOIN Production.Product p ON d.ProductID = p.ProductID
                WHERE d.SalesOrderID = h.SalesOrderID
                FOR JSON AUTO
           )
INTO #InitialTable
FROM Sales.SalesOrderHeader h

SET STATISTICS IO, TIME ON

INSERT INTO #None
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)

INSERT INTO #Row
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)

INSERT INTO #Page
SELECT *
FROM #InitialTable
OPTION(MAXDOP 1)

SET STATISTICS IO, TIME OFF
None: CPU time = 62 ms,  elapsed time = 68 ms
Row:  CPU time = 94 ms,  elapsed time = 89 ms
Page: CPU time = 125 ms, elapsed time = 126 ms

PAGE compression uses algorithms that find similar chunks of data and replace them with smaller values. ROW compression stores each value in the smallest datatype that can hold it, and also trims excess characters. For example, suppose a column is of type INT, which takes 4 bytes, but holds values no greater than 255. For such rows, the type is effectively truncated, and the data on disk takes up as much space as if it were TINYINT.

SQL
USE tempdb
GO

SELECT obj_name = OBJECT_NAME(p.[object_id])
     , a.[type_desc]
     , a.total_pages
     , total_mb = a.total_pages * 8 / 1024.
FROM sys.partitions p
JOIN sys.allocation_units a ON p.[partition_id] = a.container_id
WHERE p.[object_id] IN (OBJECT_ID('#None'), OBJECT_ID('#Page'), OBJECT_ID('#Row'))
obj_name   type_desc     total_pages  total_mb
---------- ------------- ------------ ---------
None      IN_ROW_DATA   1156         9.031250
Row       IN_ROW_DATA   1132         8.843750
Page      IN_ROW_DATA   1004         7.843750
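
Before enabling compression on a permanent table, it is worth estimating what it will actually buy you; SQL Server has a stock procedure for exactly that (shown here against an AdventureWorks table):

SQL
USE AdventureWorks2014
GO

EXEC sys.sp_estimate_data_compression_savings
      @schema_name      = 'Sales'
    , @object_name      = 'SalesOrderDetail'
    , @index_id         = NULL
    , @partition_number = NULL
    , @data_compression = 'PAGE'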

5. ColumnStore

But what I like the most is ColumnStore indexes, which keep improving from version to version of SQL Server.

The main idea of the ColumnStore is to split the data in the table into RowGroups of about 1 million rows each and, within this group, compress the data by columns. This results in significant savings in disk space, reduced logical reads, and faster analytical queries. Therefore, if there is a need to store an archive with JSON information, then you can create a clustered ColumnStore index:

SQL
USE AdventureWorks2014
GO

DROP TABLE IF EXISTS #CCI
DROP TABLE IF EXISTS #InitialTable
GO

CREATE TABLE #CCI (ID INT, Val NVARCHAR(MAX), INDEX ix CLUSTERED COLUMNSTORE)
GO

SELECT h.SalesOrderID
     , JSON_Data = CAST(
           (
                SELECT p.[Name]
                FROM Sales.SalesOrderDetail d
                JOIN Production.Product p ON d.ProductID = p.ProductID
                WHERE d.SalesOrderID = h.SalesOrderID
                FOR JSON AUTO
           )
       AS VARCHAR(8000)) -- SQL Server 2012..2016
INTO #InitialTable
FROM Sales.SalesOrderHeader h

SET STATISTICS TIME ON

INSERT INTO #CCI
SELECT *
FROM #InitialTable

SET STATISTICS TIME OFF

In this case, the speed of insertion into the table will approximately correspond to the PAGE compression. In addition, you can fine-tune the process for OLTP load using the COMPRESSION_DELAY option.

CCI: CPU time = 140 ms, elapsed time = 136 ms

Before SQL Server 2017, ColumnStore indexes didn’t support [N]VARCHAR(MAX) data types, but with the release of the new version, we were allowed to store strings of any length in the ColumnStore.

SQL
USE tempdb
GO

SELECT o.[name]
     , s.used_page_count / 128.
FROM sys.indexes i
JOIN sys.dm_db_partition_stats s ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id
JOIN sys.objects o ON i.[object_id] = o.[object_id]
WHERE i.[object_id] = OBJECT_ID('#CCI')

The gains from this are sometimes very impressive in comparison with traditional compression:

------ ---------
CCI   0.796875
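
To see how the rows were actually split into RowGroups and how well each group was compressed, you can look at the columnstore DMV (a small sketch; the #CCI table lives in tempdb):

SQL
USE tempdb
GO

SELECT rg.row_group_id
     , rg.state_desc
     , rg.total_rows
     , rg.size_in_bytes
FROM sys.dm_db_column_store_row_group_physical_stats rg
WHERE rg.[object_id] = OBJECT_ID('#CCI')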

6. Create JSON

Now let's take a look at how you can generate JSON. If you have worked with XML in SQL Server before, everything is done here by analogy. The easiest way to generate JSON is to use FOR JSON AUTO. In this case, a JSON array will be generated from objects:

SQL
DROP TABLE IF EXISTS #Users
GO

CREATE TABLE #Users (
      UserID INT
    , UserName SYSNAME
    , RegDate DATETIME
)

INSERT INTO #Users
VALUES (1, 'Paul Denton', '20170123')
     , (2, 'JC Denton', NULL)
     , (3, 'Maggie Cho', NULL)

SELECT *
FROM #Users
FOR JSON AUTO

[
    {
        "UserID":1,
        "UserName":"Paul Denton",
        "RegDate":"2029-01-23T00:00:00"
    },
    {
        "UserID":2,
        "UserName":"JC Denton"
    },
    {
        "UserID":3,
        "UserName":"Maggie Cho"
    }
]

It’s important to note that NULL values are omitted from the output by default.

If we want to include them in JSON, we can use the INCLUDE_NULL_VALUES option:

SQL
SELECT UserID, RegDate
FROM #Users
FOR JSON AUTO, INCLUDE_NULL_VALUES

[
    {
        "UserID":1,
        "RegDate":"2017-01-23T00:00:00"
    },
    {
        "UserID":2,
        "RegDate":null
    },
    {
        "UserID":3,
        "RegDate":null
    }
]

If you need to get rid of square brackets, then the WITHOUT_ARRAY_WRAPPER option will help us with this:

SQL
SELECT TOP(1) UserID, UserName
FROM #Users
FOR JSON AUTO, WITHOUT_ARRAY_WRAPPER
{
    "UserID":1,
    "UserName":"Paul Denton"
}

If we want to combine the results with the root element, then the ROOT option is provided for this:

SQL
SELECT UserID, UserName
FROM #Users
FOR JSON AUTO, ROOT('Users')

{
    "Users":[
        {
            "UserID":1,
            "UserName":"Paul Denton"
        },
        {
            "UserID":2,
            "UserName":"JC Denton"
        },
        {
            "UserID":3,
            "UserName":"Maggie Cho"
        }
    ]
}

If you need to create JSON with a more complex structure, assign custom names to properties, or group them, then you need to use the FOR JSON PATH expression:

SQL
SELECT TOP(1) UserID
            , UserName AS [Detail.FullName]
            , RegDate AS [Detail.RegDate]
FROM #Users
FOR JSON PATH

[
    {
        "UserID":1,
        "Detail":{
            "FullName":"Paul Denton",
            "RegDate":"2017-01-23T00:00:00"
        }
    }
]
SQL
SELECT t.[name]
     , t.[object_id]
     , [columns] = (
             SELECT c.column_id, c.[name]
             FROM sys.columns c
             WHERE c.[object_id] = t.[object_id]
             FOR JSON AUTO
         )
FROM sys.tables t
FOR JSON AUTO

[
    {
        "name":"#Users",
        "object_id":1483152329,
        "columns":[
            {
            "column_id":1,
            "name":"UserID"
            },
            {
            "column_id":2,
            "name":"UserName"
            },
            {
            "column_id":3,
            "name":"RegDate"
            }
        ]
    }
]

7. Check JSON

To check the validity of the JSON format, there is an ISJSON function that returns 1 if it is JSON, 0 if not, and NULL if NULL was passed.

SQL
DECLARE @json1 NVARCHAR(MAX) = N'{"id": 1}'
      , @json2 NVARCHAR(MAX) = N'[1,2,3]'
      , @json3 NVARCHAR(MAX) = N'1'
      , @json4 NVARCHAR(MAX) = N''
      , @json5 NVARCHAR(MAX) = NULL

SELECT ISJSON(@json1) -- 1
     , ISJSON(@json2) -- 1
     , ISJSON(@json3) -- 0
     , ISJSON(@json4) -- 0
     , ISJSON(@json5) -- NULL

8. JsonValue

To extract a scalar value from JSON, you can use the JSON_VALUE function; to extract an object or an array (such as the Settings array below), there is JSON_QUERY:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    {
        "UserID": 1,
        "UserName": "JC Denton",
        "IsActive": true,
        "Date": "2016-05-31T00:00:00",
        "Settings": [
             {
                "Language": "EN"
             },
             {
                "Skin": "FlatUI"
             }
          ]
    }'

SELECT JSON_VALUE(@json, '$.UserID')
     , JSON_VALUE(@json, '$.UserName')
     , JSON_VALUE(@json, '$.Settings[0].Language')
     , JSON_VALUE(@json, '$.Settings[1].Skin')
     , JSON_QUERY(@json, '$.Settings')

9. OpenJson

For parsing JSON into tabular form, the OPENJSON table-valued function is used. Note right away that it only works on databases with compatibility level 130 or higher.
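
If OPENJSON complains about an invalid object name, the database is most likely still running under an older compatibility level; checking and raising it is straightforward (a minimal sketch):

SQL
SELECT [name], [compatibility_level]
FROM sys.databases
WHERE [name] = DB_NAME()

-- OPENJSON requires database compatibility level 130 or higher
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130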

There are two modes of operation of the OPENJSON function. The simplest one is without specifying a schema for the result set:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    {
        "UserID": 1,
        "UserName": "JC Denton",
        "IsActive": true,
        "RegDate": "2016-05-31T00:00:00"
    }'

SELECT * FROM OPENJSON(@json)

In the second mode, we can describe ourselves what the returned result should look like: the column names, how many of them there are, and where to take their values from:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    [
        {
            "User ID": 1,
            "UserName": "JC Denton",
            "IsActive": true,
            "Date": "2016-05-31T00:00:00",
            "Settings": [
                 {
                    "Language": "EN"
                 },
                 {
                    "Skin": "FlatUI"
                 }
              ]
        },
        {
            "User ID": 2,
            "UserName": "Paul Denton",
            "IsActive": false
        }
    ]'

SELECT * FROM OPENJSON(@json)
SELECT * FROM OPENJSON(@json, '$[0]')
SELECT * FROM OPENJSON(@json, '$[0].Settings[0]')

SELECT *
FROM OPENJSON(@json)
    WITH (
          UserID INT '$."User ID"'
        , UserName SYSNAME
        , IsActive BIT
        , RegDate DATETIME '$.Date'
        , Settings NVARCHAR(MAX) AS JSON
        , Skin SYSNAME '$.Settings[1].Skin'
    )

If our document has a nested hierarchy, then the following example will help:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    [
        {
            "FullName": "JC Denton",
            "Children": [
                { "FullName": "Mary", "Male": "0" },
                { "FullName": "Paul", "Male": "1" }
            ]
        },
        {
            "FullName": "Paul Denton"
        }
    ]'

SELECT t.FullName, c.*
FROM OPENJSON(@json)
    WITH (
          FullName SYSNAME
        , Children NVARCHAR(MAX) AS JSON
    ) t
OUTER APPLY OPENJSON(Children)
    WITH (
          ChildrenName SYSNAME '$.FullName'
        , Male TINYINT
    ) c

10. String Split

With the release of SQL Server 2016, the STRING_SPLIT function was introduced, and everyone breathed a sigh of relief that there was no longer any need to reinvent the wheel for splitting a string into tokens. However, there is another alternative - the OPENJSON construct, which we considered earlier. Let's test several options for splitting a string:

SQL
SET NOCOUNT ON
SET STATISTICS TIME OFF

DECLARE @x VARCHAR(MAX) = '1' + REPLICATE(CAST(',1' AS VARCHAR(MAX)), 1000)

SET STATISTICS TIME ON

;WITH cte AS
(
    SELECT s = 1
         , e = COALESCE(NULLIF(CHARINDEX(',', @x, 1), 0), LEN(@x) + 1)
         , v = SUBSTRING(@x, 1, COALESCE(NULLIF(CHARINDEX(',', @x, 1), 0), LEN(@x) + 1) - 1)
    UNION ALL
    SELECT s = CONVERT(INT, e) + 1
         , e = COALESCE(NULLIF(CHARINDEX(',', @x, e + 1), 0), LEN(@x) + 1)
         , v = SUBSTRING(@x, e + 1, COALESCE(NULLIF(CHARINDEX(',',  @x, e + 1), 0)
         , LEN(@x) + 1)- e - 1)
    FROM cte
    WHERE e < LEN(@x) + 1
)
SELECT v
FROM cte
WHERE LEN(v) > 0
OPTION (MAXRECURSION 0)

SELECT t.c.value('(./text())[1]', 'INT')
FROM (
    SELECT x = CONVERT(XML, '<i>' + REPLACE(@x, ',', '</i><i>') + '</i>').query('.')
) a
CROSS APPLY x.nodes('i') t(c)

SELECT *
FROM STRING_SPLIT(@x, N',') -- NCHAR(1)/CHAR(1)

SELECT [value]
FROM OPENJSON(N'[' + @x + N']') -- [1,2,3,4]

SET STATISTICS TIME OFF

If you look at the results, you will notice that OPENJSON can in some cases be faster than the STRING_SPLIT function, not to mention the workarounds with XML and a recursive CTE:

              500k    100k   50k    1000
------------- ------- ------ ------ ------
CTE           29407   2406   1266   58
XML           6520    1084   553    259
STRING_SPLIT  4665    594    329    27
OPENJSON      2606    506    273    19

Moreover, with a high-load OLTP pattern (1000 iterations with 10 comma-separated values each), there is no obvious difference between OPENJSON and STRING_SPLIT:

CTE          = 4629 ms
XML          = 4397 ms
STRING_SPLIT = 4011 ms
OPENJSON     = 4047 ms
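
One caveat: the OPENJSON variant above works because the tokens are numeric, i.e. already valid JSON literals. For arbitrary string tokens, you first have to wrap them in quotes (a minimal sketch, assuming the values themselves contain no double quotes):

SQL
DECLARE @s NVARCHAR(MAX) = N'SQL Server,T-SQL,JSON,XML'

SELECT [value]
FROM OPENJSON(N'["' + REPLACE(@s, N',', N'","') + N'"]')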

11. Lax & Strict

Starting with SQL Server 2005, it became possible to validate XML on the database side using XML SCHEMA COLLECTION: we describe a schema for the XML, and the data can then be validated against it. There is no such functionality explicitly for JSON, but there is a workaround.
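
One such workaround is a CHECK constraint that combines ISJSON with the mandatory paths you expect in every document (a minimal sketch with a hypothetical table and constraint name):

SQL
DROP TABLE IF EXISTS dbo.UserProfile
GO

CREATE TABLE dbo.UserProfile (
      UserProfileID INT PRIMARY KEY
    , Profile NVARCHAR(MAX) NOT NULL
    , CONSTRAINT CK_UserProfile_Json CHECK (
          ISJSON(Profile) = 1
          AND JSON_VALUE(Profile, '$.UserID') IS NOT NULL
      )
)

INSERT INTO dbo.UserProfile VALUES (1, N'{"UserID": 1, "UserName": "JC Denton"}') -- ok
INSERT INTO dbo.UserProfile VALUES (2, N'{"UserName": "Paul Denton"}')            -- violates the constraint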

There are two types of path expressions for JSON: strict and lax (used by default). The difference is that if we specify a non-existent or incorrect path when parsing, a lax expression returns NULL, while a strict one raises an error:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    {
        "UserID": 1,
        "UserName": "JC Denton"
    }'

SELECT JSON_VALUE(@json, '$.IsActive')
     , JSON_VALUE(@json, 'lax$.IsActive')
     , JSON_VALUE(@json, 'strict$.UserName')

SELECT JSON_VALUE(@json, 'strict$.IsActive')
Msg 13608, Level 16, State 2, Line 12
Property cannot be found on the specified JSON path.

12. Modify

There is a JSON_MODIFY function for modifying data inside JSON. The examples are quite simple, so there is no point in describing them in detail:

SQL
DECLARE @json NVARCHAR(MAX) = N'
    {
        "FirstName": "JC",
        "LastName": "Denton",
        "Age": 20,
        "Skills": ["SQL Server 2014"]
    }'

-- 20 -> 22
SET @json = JSON_MODIFY(@json, '$.Age', CAST(JSON_VALUE(@json, '$.Age') AS INT) + 2)

-- "SQL 2014" -> "SQL 2016"
SET @json = JSON_MODIFY(@json, '$.Skills[0]', 'SQL 2016')
SET @json = JSON_MODIFY(@json, 'append $.Skills', 'JSON')
SELECT * FROM OPENJSON(@json)

-- delete Age
SELECT * FROM OPENJSON(JSON_MODIFY(@json, 'lax$.Age', NULL))

-- set NULL
SELECT * FROM OPENJSON(JSON_MODIFY(@json, 'strict$.Age', NULL))
GO

DECLARE @json NVARCHAR(100) = N'{ "price": 105.90 }' -- rename
SET @json =
    JSON_MODIFY(
        JSON_MODIFY(@json, '$.Price',
            CAST(JSON_VALUE(@json, '$.price') AS NUMERIC(6,2))),
                '$.price', NULL)

SELECT @json

13. Convert Implicit

And so, we are finally getting to the most exciting part, namely the issues related to performance.

When parsing JSON, you need to remember one nuance: OPENJSON and JSON_VALUE return their results in Unicode unless we override this. In the AdventureWorks database, the AccountNumber column is of the VARCHAR datatype:

SQL
USE AdventureWorks2014
GO

DECLARE @json NVARCHAR(MAX) = N'{ "AccountNumber": "AW00000009" }'

SET STATISTICS IO ON

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = JSON_VALUE(@json, '$.AccountNumber')

SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE AccountNumber = CAST(JSON_VALUE(@json, '$.AccountNumber') AS VARCHAR(10))

SET STATISTICS IO OFF

Difference in logical reads:

Table 'Customer'. Scan count 1, logical reads 37, ...
Table 'Customer'. Scan count 0, logical reads 2, ...

Because the datatypes of the column and of the function result do not match, SQL Server has to perform an implicit conversion based on datatype precedence - in this particular case, to NVARCHAR. Alas, calculations and conversions on an indexed column most often lead to an Index Scan.

If we explicitly cast the value to the column's datatype, we get an Index Seek instead.

14. Indexes

Now let's look at how you can index JSON objects. As I mentioned above, SQL Server 2016 didn’t add a separate data type for JSON, unlike XML. Therefore, you can use any string data types to store it.

If you have experience with XML, you may remember that SQL Server offers several types of indexes for that format which can speed up certain selections. For the string types in which JSON is supposed to be stored, such indexes simply don't exist.

Alas, JSONB was not delivered. The development team was in a hurry to release the JSON functionality and said literally the following: “If you miss the speed, we will add JSONB in the next major version”. But with the release of SQL Server 2017/2019, nothing new has been added.

This is where computed columns come to the rescue: they can expose the specific properties from the JSON documents that you need to search on, and indexes can then be created on those columns.

SQL
USE AdventureWorks2014
GO

DROP TABLE IF EXISTS #JSON
GO

CREATE TABLE #JSON (
      DatabaseLogID INT PRIMARY KEY
    , InfoJSON NVARCHAR(MAX) NOT NULL
)
GO

INSERT INTO #JSON
SELECT DatabaseLogID
     , InfoJSON = (
            SELECT PostTime, DatabaseUser, [Event], [Schema], [Object], [TSQL]
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
         )
FROM dbo.DatabaseLog

It is not very rational to parse the same data every time:

SQL
SET STATISTICS IO, TIME ON

SELECT *
FROM #JSON
WHERE JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object') =
    'Person.Person'

SET STATISTICS IO, TIME OFF
Table 'JSON'. Scan count 1, logical reads 187, ...
    CPU time = 16 ms, elapsed time = 29 ms

Therefore, creating a calculated column and then including it in the index is sometimes justified:

SQL
ALTER TABLE #JSON
    ADD ObjectName AS
        JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object')
GO

CREATE INDEX IX_ObjectName ON #JSON (ObjectName)
GO

SET STATISTICS IO, TIME ON

SELECT *
FROM #JSON
WHERE JSON_VALUE(InfoJSON, '$.Schema') + '.' + JSON_VALUE(InfoJSON, '$.Object') =
    'Person.Person'

SELECT *
FROM #JSON
WHERE ObjectName = 'Person.Person'

SET STATISTICS IO, TIME OFF

That said, the SQL Server optimizer is pretty smart here: it matches the JSON_VALUE expression against the computed column, so you don't even need to change anything in the existing code:

Table 'JSON'. Scan count 1, logical reads 13, ...
    CPU time = 0 ms, elapsed time = 1 ms

Table 'JSON'. Scan count 1, logical reads 13, ...
    CPU time = 0 ms, elapsed time = 1 ms

In addition, you can create both regular indexes and full-text indexes if you want to search over the contents of arrays or whole fragments of objects. Note that the full-text index doesn't have any special rules for processing JSON: it simply breaks the text into tokens, using double quotes, commas, and brackets as separators, which is exactly what the JSON structure itself consists of:

SQL
USE AdventureWorks2014
GO

DROP TABLE IF EXISTS dbo.LogJSON
GO

CREATE TABLE dbo.LogJSON (
      DatabaseLogID INT
    , InfoJSON NVARCHAR(MAX) NOT NULL
    , CONSTRAINT pk PRIMARY KEY (DatabaseLogID)
)
GO

INSERT INTO dbo.LogJSON
SELECT DatabaseLogID
     , InfoJSON = (
            SELECT PostTime, DatabaseUser, [Event], ObjectName = [Schema] + '.' + [Object]
            FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
         )
FROM dbo.DatabaseLog
GO

IF EXISTS(
    SELECT *
    FROM sys.fulltext_catalogs
    WHERE [name] = 'JSON_FTC'
)
    DROP FULLTEXT CATALOG JSON_FTC
GO

CREATE FULLTEXT CATALOG JSON_FTC WITH ACCENT_SENSITIVITY = ON AUTHORIZATION dbo
GO

IF EXISTS (
        SELECT *
        FROM sys.fulltext_indexes
        WHERE [object_id] = OBJECT_ID(N'dbo.LogJSON')
    ) BEGIN
    ALTER FULLTEXT INDEX ON dbo.LogJSON DISABLE
    DROP FULLTEXT INDEX ON dbo.LogJSON
END
GO

CREATE FULLTEXT INDEX ON dbo.LogJSON (InfoJSON) KEY INDEX pk ON JSON_FTC
GO

SELECT *
FROM dbo.LogJSON
WHERE CONTAINS(InfoJSON, 'ALTER_TABLE')

15. Parser Performance

And finally, we have come to, perhaps, the most interesting part of this article. How much faster is JSON parsed compared to XML on SQL Server? To answer this question, I have prepared a series of tests.

We prepare two large files in JSON and XML format:

SQL
/*
    EXEC sys.sp_configure 'show advanced options', 1
    GO
    RECONFIGURE
    GO

    EXEC sys.sp_configure 'xp_cmdshell', 1
    GO
    RECONFIGURE WITH OVERRIDE
    GO
*/

USE AdventureWorks2014
GO

DROP PROCEDURE IF EXISTS ##get_xml
DROP PROCEDURE IF EXISTS ##get_json
GO

CREATE PROCEDURE ##get_xml
AS
    SELECT r.ProductID
         , r.[Name]
         , r.ProductNumber
         , d.OrderQty
         , d.UnitPrice
         , r.ListPrice
         , r.Color
         , r.MakeFlag
    FROM Sales.SalesOrderDetail d
    JOIN Production.Product r ON d.ProductID = r.ProductID
    FOR XML PATH ('Product'), ROOT('Products')
GO

CREATE PROCEDURE ##get_json
AS
    SELECT (
        SELECT r.ProductID
             , r.[Name]
             , r.ProductNumber
             , d.OrderQty
             , d.UnitPrice
             , r.ListPrice
             , r.Color
             , r.MakeFlag
        FROM Sales.SalesOrderDetail d
        JOIN Production.Product r ON d.ProductID = r.ProductID
        FOR JSON PATH
    )
GO

DECLARE @sql NVARCHAR(4000)
SET @sql =
    'bcp "EXEC ##get_xml" queryout "X:\sample.xml" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql

SET @sql =
    'bcp "EXEC ##get_json" queryout "X:\sample.txt" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql

Checking the performance of OPENJSON, OPENXML and XQuery:

SQL
SET NOCOUNT ON
SET STATISTICS TIME ON

DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.xml', SINGLE_BLOB) x

DECLARE @jsonu NVARCHAR(MAX)
SELECT @jsonu = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.txt', SINGLE_NCLOB) x

/*
    XML:      CPU = 891 ms, Time = 886 ms
    NVARCHAR: CPU = 141 ms, Time = 166 ms
*/

SELECT ProductID =     t.c.value('(ProductID/text())[1]', 'INT')
     , [Name] =        t.c.value('(Name/text())[1]', 'NVARCHAR(50)')
     , ProductNumber = t.c.value('(ProductNumber/text())[1]', 'NVARCHAR(25)')
     , OrderQty =      t.c.value('(OrderQty/text())[1]', 'SMALLINT')
     , UnitPrice =     t.c.value('(UnitPrice/text())[1]', 'MONEY')
     , ListPrice =     t.c.value('(ListPrice/text())[1]', 'MONEY')
     , Color =         t.c.value('(Color/text())[1]', 'NVARCHAR(15)')
     , MakeFlag =      t.c.value('(MakeFlag/text())[1]', 'BIT')
FROM @xml.nodes('Products/Product') t(c)

/*
    CPU time = 6203 ms, elapsed time = 6492 ms
*/

DECLARE @doc INT
EXEC sys.sp_xml_preparedocument @doc OUTPUT, @xml

SELECT *
FROM OPENXML(@doc, '/Products/Product', 2)
    WITH (
          ProductID INT
        , [Name] NVARCHAR(50)
        , ProductNumber NVARCHAR(25)
        , OrderQty SMALLINT
        , UnitPrice MONEY
        , ListPrice MONEY
        , Color NVARCHAR(15)
        , MakeFlag BIT
    )

EXEC sys.sp_xml_removedocument @doc

/*
    CPU time = 2656 ms, elapsed time = 3489 ms
    CPU time = 3844 ms, elapsed time = 4482 ms
    CPU time = 0 ms, elapsed time = 4 ms
*/

SELECT *
FROM OPENJSON(@jsonu)
    WITH (
          ProductID INT
        , [Name] NVARCHAR(50)
        , ProductNumber NVARCHAR(25)
        , OrderQty SMALLINT
        , UnitPrice MONEY
        , ListPrice MONEY
        , Color NVARCHAR(15)
        , MakeFlag BIT
    )

/*
    CPU time = 1359 ms, elapsed time = 1642 ms
*/

SET STATISTICS TIME, IO OFF

Now let's check the performance of the JSON_VALUE scalar function relative to XQuery:

SQL
SET NOCOUNT ON

DECLARE @jsonu NVARCHAR(MAX) = N'[
    {"User":"Sergii Syrovatchenko","Age":28,"Skills":["SQL Server","T-SQL","JSON","XML"]},
    {"User":"JC Denton","Skills":["Microfibral Muscle","Regeneration","EMP Shield"]},
    {"User":"Paul Denton","Age":32,"Skills":["Vision Enhancement"]}]'

DECLARE @jsonu_f NVARCHAR(MAX) = N'[
   {
      "User":"Sergii Syrovatchenko",
      "Age":28,
      "Skills":[
         "SQL Server",
         "T-SQL",
         "JSON",
         "XML"
      ]
   },
   {
      "User":"JC Denton",
      "Skills":[
         "Microfibral Muscle",
         "Regeneration",
         "EMP Shield"
      ]
   },
   {
      "User":"Paul Denton",
      "Age":32,
      "Skills":[
         "Vision Enhancement"
      ]
   }
]'

DECLARE @json VARCHAR(MAX) = @jsonu
      , @json_f VARCHAR(MAX) = @jsonu_f

DECLARE @xml XML = N'
<Users>
    <User Name="Sergii Syrovatchenko">
        <Age>28</Age>
        <Skills>
            <Skill>SQL Server</Skill>
            <Skill>T-SQL</Skill>
            <Skill>JSON</Skill>
            <Skill>XML</Skill>
        </Skills>
    </User>
    <User Name="JC Denton">
        <Skills>
            <Skill>Microfibral Muscle</Skill>
            <Skill>Regeneration</Skill>
            <Skill>EMP Shield</Skill>
        </Skills>
    </User>
    <User Name="Paul Denton">
        <Age>28</Age>
        <Skills>
            <Skill>Vision Enhancement</Skill>
        </Skills>
    </User>
</Users>'

DECLARE @i INT
      , @int INT
      , @varchar VARCHAR(100)
      , @nvarchar NVARCHAR(100)
      , @s DATETIME
      , @runs INT = 100000

DECLARE @t TABLE (
      iter INT IDENTITY PRIMARY KEY
    , data_type VARCHAR(100)
    , [path] VARCHAR(1000)
    , [type] VARCHAR(1000)
    , time_ms INT
)

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @int = JSON_VALUE(@jsonu, '$[0].Age')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @int = JSON_VALUE(@jsonu_f, '$[0].Age')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @int = JSON_VALUE(@json, '$[0].Age')
         , @i += 1
INSERT INTO @t
SELECT '@json', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @int = JSON_VALUE(@json_f, '$[0].Age')
         , @i += 1
INSERT INTO @t
SELECT '@json_f', '$[0].Age', 'INT', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @int = @xml.value('(Users/User[1]/Age/text())[1]', 'INT')
         , @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[1]/Age/text())[1]', 'INT', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @nvarchar = JSON_VALUE(@jsonu, '$[1].User')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[1].User', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @nvarchar = JSON_VALUE(@jsonu_f, '$[1].User')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[1].User', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = JSON_VALUE(@json, '$[1].User')
         , @i += 1
INSERT INTO @t
SELECT '@json', '$[1].User', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = JSON_VALUE(@json_f, '$[1].User')
         , @i += 1
INSERT INTO @t
SELECT '@json_f', '$[1].User', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @nvarchar = @xml.value('(Users/User[2]/@Name)[1]', 'NVARCHAR(100)')
         , @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[2]/@Name)[1]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = @xml.value('(Users/User[2]/@Name)[1]', 'VARCHAR(100)')
         , @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[2]/@Name)[1]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @nvarchar = JSON_VALUE(@jsonu, '$[2].Skills[0]')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu', '$[2].Skills[0]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @nvarchar = JSON_VALUE(@jsonu_f, '$[2].Skills[0]')
         , @i += 1
INSERT INTO @t
SELECT '@jsonu_f', '$[2].Skills[0]', 'NVARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = JSON_VALUE(@json, '$[2].Skills[0]')
         , @i += 1
INSERT INTO @t
SELECT '@json', '$[2].Skills[0]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = JSON_VALUE(@json_f, '$[2].Skills[0]')
         , @i += 1
INSERT INTO @t
SELECT '@json_f', '$[2].Skills[0]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT @i = 1, @s = GETDATE()
WHILE @i <= @runs
    SELECT @varchar = @xml.value('(Users/User[3]/Skills/Skill/text())[1]', 'VARCHAR(100)')
         , @i += 1
INSERT INTO @t
SELECT '@xml', '(Users/User[3]/Skills/Skill/text())[1]', 'VARCHAR', DATEDIFF(ms, @s, GETDATE())

SELECT * FROM @t

Results (total time in milliseconds for 100,000 iterations of each loop):

data_type  path                                    type      2016 SP2(CU17) 2017(CU25) 2019(CU11)
---------- -------------------------------------- --------- ------------- ----------- -----------
@jsonu     $[0].Age                               INT       1030          314         250
@jsonu_f   $[0].Age                               INT       1014          333         290
@json      $[0].Age                               INT       1100          393         337
@json_f    $[0].Age                               INT       1166          444         396
@xml       (Users/User[1]/Age/text())[1]          INT       28037         30513       34427

@jsonu     $[1].User                              NVARCHAR  1223          513         490
@jsonu_f   $[1].User                              NVARCHAR  1367          647         673
@json      $[1].User                              VARCHAR   1343          623         604
@json_f    $[1].User                              VARCHAR   1527          790         800
@xml       (Users/User[2]/@Name)[1]               NVARCHAR  22487         25094       26840
@xml       (Users/User[2]/@Name)[1]               VARCHAR   22816         25190       26676

@jsonu     $[2].Skills[0]                         NVARCHAR  1654          856         814
@jsonu_f   $[2].Skills[0]                         NVARCHAR  1926          1124        1103
@json      $[2].Skills[0]                         VARCHAR   1790          960         907
@json_f    $[2].Skills[0]                         VARCHAR   2090          1250        1196
@xml       (Users/User[3]/Skills/Skill/text())[1] VARCHAR   23530         25426       27840

And another interesting nuance: avoid mixing OPENJSON and JSON_VALUE calls, and try to specify only those columns that you really need after parsing. With JSON, everything is extremely simple - the fewer columns you need to parse, the faster you get the result:

SQL
SET NOCOUNT ON
SET STATISTICS TIME ON

DECLARE @json NVARCHAR(MAX)
SELECT @json = BulkColumn
FROM OPENROWSET(BULK 'X:\sample.txt', SINGLE_NCLOB) x

SELECT COUNT_BIG(*)
FROM OPENJSON(@json)
WITH (
      ProductID INT
    , ProductNumber NVARCHAR(25)
    , OrderQty SMALLINT
    , UnitPrice MONEY
    , ListPrice MONEY
    , Color NVARCHAR(15)
)
WHERE Color = 'Black'

SELECT COUNT_BIG(*)
FROM OPENJSON(@json) WITH (Color NVARCHAR(15))
WHERE Color = 'Black'

SELECT COUNT_BIG(*)
FROM OPENJSON(@json)
WHERE JSON_VALUE(value, '$.Color') = 'Black'

/*
    2016 SP2(CU17):

    CPU time = 1140 ms, elapsed time = 1144 ms
    CPU time = 781 ms, elapsed time = 789 ms
    CPU time = 2157 ms, elapsed time = 2144 ms

    2017(CU25):

    CPU time = 1016 ms, elapsed time = 1034 ms
    CPU time = 718 ms, elapsed time = 736 ms
    CPU time = 1282 ms, elapsed time = 1286 ms

    2019(CU11):

    CPU time = 998 ms, elapsed time = 1011 ms
    CPU time = 721 ms, elapsed time = 745 ms
    CPU time = 1301 ms, elapsed time = 1316 ms
*/

Conclusions

  • Parsing data from JSON is 2 to 10 times faster than from XML.
  • JSON storage is often more redundant than XML storage.
  • Parsing JSON data in Unicode is 5-15% faster.
  • Using JSON instead of XML can significantly reduce the load on the server CPU.
  • Starting with SQL Server 2017, the parsing of scalar values from JSON has been significantly accelerated.
  • If you can get rid of XML in favor of JSON - do it!

Git

You can get all the latest code samples from my GitHub. Thanks!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


