Click here to Skip to main content
15,867,330 members
Articles / Database Development / SQL Server

JSON for SQL Server. Part 1

Rate me:
Please Sign up or sign in to vote.
4.97/5 (54 votes)
26 Apr 2016CPOL7 min read 146.1K   5.1K   71   38
Support routines for handling JSON in SQL Server

Introduction

This article presents TSQL routines that provide support in SQL Server to use JSON data. The focus is on performance and flexibility.

Background

From Wikipedia: "JSON or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML."

JSON is widely used as a simpler alternative to XML. In the last 8 years, it has gained a lot of popularity, and it is now present in almost every kind of situation: as a format for data transmission, for data storage, for defining schemas or templates, ...

SQL Server before 2016 does not include native JSON support, for these systems users willing to use JSON at the database level will have to choose:

  1. Add JSON support to the database via CLR routines
  2. Add JSON support via TSQL routines

Using CLR offers flexibility and great performance, it has the following issues:

  • requires enabling CLR Integration, which may (or not) be beyond the authorization of the DB developer
  • requires developing in other languages (usually C#)
  • deployment can be "trickier" as it involves creating and assembly, registering,...

Being as widespread as it is, prior TSQL-JSON work do exist. The most comprehensive one, I have seen, being by Phil Factor (https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server). As he states in his article, there are a number of reasons, for exploring, no matter how ugly it seems, the handling of JSON at the database level.

Storage of JSON

In JSON, there are only two types of structured data: objects and list of objects. The definition of the latter is pretty obvious “a set of zero, one or more objects” and relies on the former. This leads us a definition of object: an object is a container for a set of properties. A property is just a name that holds either a simple value (string, number, bool or NULL) or structured data.

In human readable form, JSON defines some syntax conventions, which are easily understood by seeing a sample:

JavaScript
{
     "firstName": "John",
     "lastName": "Smith",
     "age": 25,
     "address":{
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": "10021"
     },
     "phoneNumber": [
         {"type": "home","number": "212 555-1234"},
         {"type": "fax","number": "646 555-4567"}
     ]
 }

There is only one link between a JSON object and its properties, the "owns" property. As properties can represent additional objects, it is straight forward that an object represents a hierarchy of objects. As there is only one single connectivity path (the “owns” property), the relation among a JSON object and its properties, can be represented by a simple hierarchy table. Modeling this hierarchy only requires two fields: one for identifying objects and properties, and other for defining the “owns” property.

Note: As further on, we will be using a JSON top down parser, I find it more intuitive to name this link “parent” instead of “owns”. Conceptually, they express the same: Object A owns property B, if and only if the parent of property B is object A.

Besides the object-property relation, some properties can hold simple values, so we will extend our hierarchy model with two additional fields: one with the value type (string, number,…) and another with the actual value. This model can be expressed in TSQL using a table type:

SQL
CREATE TYPE [dbo].[pjsonData] AS TABLE(
    [id] [int] NOT NULL,
    [parent] [int] NOT NULL,
    [name] [nvarchar](100) NOT NULL,
    [kind] [nvarchar](10) NOT NULL,
    [value] [nvarchar](max) NOT NULL
)

Using this structure, the previous JSON sample would be stored as:

ID     Parent   Name            Kind     Value
-----------------------------------------------------
1      0                        OBJECT
2      1        firstName       STRING   John
3      1        lastName        STRING   Smith
4      1        age             NUMBER   25
5      1        address         OBJECT
6      5        streetAddress   STRING   21 2nd Street
7      5        city            STRING   New York
8      5        state           STRING   NY
9      5        postalCode      STRING   10021
10     1        phoneNumber     ARRAY
11    10                        OBJECT
12    11        type            STRING   home
13    11        number          STRING   212 555-1234
14    10                        OBJECT
15    14        type            STRING   fax
16    14        number          STRING   646 555-4567

Consuming JSON: json_parse

Given a string representing JSON encoded data, this stored procedure generates a pjsonData table storing the data. The functionality of this procedure is very similar to the other existing parsers, nevertheless the implementation is completely different.

The parsing procedure is called pjsonData and its usage is quite simple. As expected, it receives a string of JSON text.

SQL
select * from json_Parse('{
     "firstName": "John",
     "lastName": "Smith",
     "age": 25,
     "address":{
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": "10021"
     },
     "phoneNumber": [
         {"type": "home","number": "212 555-1234"},
         {"type": "fax","number": "646 555-4567"}
     ]
 }')

and returns a pJsonData table:

Image 1

Generating JSON: json_toJson

This is the counterpart of the previous stored procedure. This procedure generates a JSON string representation for an item in a pjsonData table. Again, its working is better seen with a sample:

First let’s create a pjsonData table:

SQL
declare @data pJsonData

insert into @data select * from json_Parse('[
        {"Name": "John   Smith"   ,
        "address":{"streetAddress": 
        "21 2nd Street","city": "New York"}},
        {"Name": "Jane   Doe"     ,
        "address":{"streetAddress": "22 Madison Ave","city": "New York"}},
        {"Name": "George Williams",
        "address":{"streetAddress": "18 3rd Street","city": "Chicago"}}
]')

that sets the following:

Image 2

JSON data can be generated by executing dbo.json_toJson(@data pjsonData,@id int).

Where @data is a pjsonData table and @id is the ID of the element we want displayed as JSON. So:

Image 3

Notice that there is no object with ID=0 and dbo.json_toJson(@data,0) would raise an error.

Querying JSON: json_value and json_value2

These procedures apply a query expression on JSON data, if a property matching the query expression is found then the value for that property is returned, otherwise they return NULL. There are two versions for this procedure, one that receives the JSON data as a string (which internally gets parsed using json_Parse) and other that receives the JSON data as pJsonData. Use the second version when the same JSON data is queried more than once to avoid repetitive reparsing.

The syntax for them is:

SQL
dbo.json_Value(@json_string NVARCHAR(MAX),@query NVARCHAR(MAX))

dbo.json_Value2(@json_data pJsonData,@query NVARCHAR(MAX))

A query is a path expression detailing a valid route that, starting from the root object, reaches a property. A path expression uses the ‘.’ As a separator and index number or properties names as routing paths.

SQL
declare @json_string nvarchar(max)='[
        {"Name": "John Jr"   ,
         "Family":[
            {"relationship": "brother","Name": "Michael"},
            {"relationship": "father" ,"Name": "John"},
            {"relationship": "mother" ,"Name": "Marge"},
            {"relationship": "sister","Name": "Jane"}
            ]
        },
        {"Name": "June"   ,
         "Family":[
            {"relationship": "brother","Name": "George"},
            {"relationship": "father" ,"Name": "Derek"},
            {"relationship": "mother" ,"Name": "Lucy"}
            ]
        }
]'

declare @json_data pJsonData
insert into @json_data select * from json_Parse(@json_string)

That can be queried, either with json_value or json_value2:

Image 4

A great flexibility can be achieved by in-lining the JSON parsing and joining with a sequence generator. In the following sample, we are querying the table people(name,family). This table is generated on the-fly, with the family cell holding a JSON array.

SQL
select * from
(
select
    people.name
    ,k.num
    ,dbo.json_value(people.family,k.num+'.name') [relative]
    ,dbo.json_value(people.family,k.num+'.relationship') relation
from (values ('John','[ {"relationship": "brother","Name": "Michael"},
                        {"relationship": "father" ,"Name": "John"},
                        {"relationship": "mother" ,"Name": "Marge"},
                        {"relationship": "sister","Name": "Jane"}]'),
            ('Jane','[  {"relationship": "brother","Name": "George"},
                        {"relationship": "father" ,"Name": "Derek"},
                        {"relationship": "mother" ,"Name": "Lucy"}]')) people(name,family)
    ,(select '0' num union select '1' union select '2' union select '3' union select '4' union select '5') k
) md where md.relation is not null order by name,num

that returns:

Image 5

Implementation

Instead of using RegExp (or SQL-ish alternatives: PATINDEX, STUFF, temp string tables,…) this stored procedure implements a full lexer-parser engine that iterates through the input string and writes items into the hierarchy table as they are found. This is done through a heavy use of SQL procedural extensions implemented in TSQL. This has its cons & pros.

CONS

  • This approach is quite far from the "SQL way of getting stuff done". The code has a loops, recursion,... more than enough to raise an eye brow from a purist point of view.
  • Not portable to other SQL dialects (PSQL, PL/SQL, SQL PL,PL/pgSQL,..)
  • Support functions pollute the name space

PROS

  • Execution flow is simple and easy to follow (and to hack and to customize)
  • Trashing of strings is reduced to the bare minimum, as such there is very little memory overhead which improves performance.

To test the performance, using www.json-generator.com I have generated same sample JSON data of different sizes. This data has been inserted into a table (jdata) and the following test has been executed:

  • For each record in the jData table
  • Print the size of the JSON data
  • Parse the JSON data using this article PROCedural parser, and NON-PROCedural parser from the www.simple-talk.com website
  • Print the number of generated JSON elements in both cases (it should be the same for both parsers)
  • Print how many seconds it has taken to parse

Which gives:

Image 6

Conclusion and Future

Use of JSON data at the database level, definitively, is a valid option. It provides a lot of flexibility, and, in some situations, can be a valid technique to simplify the implementation of application-database resources: configuration, simple lookups or joined tables, filters or even dynamic queries.

In doing so, there a number of advantages:

  • Less data traffic in certain cases
  • Logic implementation closer to the database level (this can be good or bad depending on the situation)
  • Simpler data storage design

Although the hierarchy model is robust, and offers the same query possibilities as its underlying relational model, there are two major caveats:

  • Performance: Although the use of procedural techniques improves performance, it is still far away from its C# counterpart. In this current incarnation, I discourage it for large sets of JSON records or for large JSON files.
  • The syntax for querying is rather convoluted, and not easy to follow.

In the next series of this article, we will develop further more the use of JSON on the server by:

  • Improving syntax by creating helper queries
  • Use some JSON-based techniques to offer expanded functionality
  • Let's take a look at CLR alternatives

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Spain Spain
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralMy vote of 5 Pin
Shiv Rathod28-Feb-21 5:26
Shiv Rathod28-Feb-21 5:26 
BugPotential error (bug) with json_Value and json_Value2 (how to fix included) Pin
Matthieu Roy21-Jan-21 5:18
Matthieu Roy21-Jan-21 5:18 
First, thanks for this script, really useful.

I found a bug. In these 2 functions (json_Value and json_Value2), they are supposed to return nvarchar(max), but on some error validations, you try to return a static string converted (cast) to int!?
For example:
if (@kind!='ARRAY') return cast('Using index in non array JSON' as int);

It should be (remove the cast to fix):
if (@kind!='ARRAY') return 'Using index in non array JSON';


Maybe you should check all functions for this kind of error. SQL don't like this when we fall in these cases.

Thanks.

modified 21-Jan-21 11:35am.

GeneralRe: Potential error (bug) with json_Value and json_Value2 (how to fix included) Pin
Jose Segarra6-Jun-21 22:35
Jose Segarra6-Jun-21 22:35 
QuestionOne more way for JSON data generation Pin
JimmyLor8-Jun-17 4:35
JimmyLor8-Jun-17 4:35 
AnswerRe: One more way for JSON data generation Pin
Jose Segarra6-Jul-17 5:28
Jose Segarra6-Jul-17 5:28 
QuestionOFFEST and 2008 Pin
Member 26113088-May-17 9:47
Member 26113088-May-17 9:47 
AnswerRe: OFFEST and 2008 Pin
jpm1234527-May-17 11:06
jpm1234527-May-17 11:06 
Questionis Part 2 coming anytime soon? Pin
Member 98504096-Feb-17 5:02
Member 98504096-Feb-17 5:02 
AnswerRe: is Part 2 coming anytime soon? Pin
Jose Segarra6-Jun-17 13:05
Jose Segarra6-Jun-17 13:05 
GeneralRe: is Part 2 coming anytime soon? Pin
Stewart E. McGuire20-Jun-17 3:28
professionalStewart E. McGuire20-Jun-17 3:28 
QuestionHandling of null Pin
Fridz4-May-16 9:33
Fridz4-May-16 9:33 
AnswerRe: Handling of null Pin
Jose Segarra13-May-16 0:36
Jose Segarra13-May-16 0:36 
AnswerRe: Handling of null Pin
Stewart E. McGuire20-Jun-17 3:40
professionalStewart E. McGuire20-Jun-17 3:40 
GeneralGood Article Pin
Alireza_136229-Apr-16 3:21
Alireza_136229-Apr-16 3:21 
GeneralMy vote of 5 Pin
MarcusCole683326-Apr-16 5:13
professionalMarcusCole683326-Apr-16 5:13 
PraiseExcellet Job Pin
Member 200842311-Jan-16 14:24
Member 200842311-Jan-16 14:24 
SuggestionJSON support is available in SQL Server 2016 Pin
Jovan Popovic(MSFT)14-Nov-15 11:41
Jovan Popovic(MSFT)14-Nov-15 11:41 
GeneralRe: JSON support is available in SQL Server 2016 Pin
Jose Segarra26-Apr-16 4:08
Jose Segarra26-Apr-16 4:08 
Questionhow I can convert a query statement to format pJsonData Pin
samo_reque5-Nov-15 8:52
samo_reque5-Nov-15 8:52 
AnswerRe: how I can convert a query statement to format pJsonData Pin
Member 1315680827-Apr-17 6:41
Member 1315680827-Apr-17 6:41 
QuestionNice Article and Code! Can it read from dynamic web-json? Pin
Member 1191305816-Aug-15 11:34
Member 1191305816-Aug-15 11:34 
AnswerRe: Nice Article and Code! Can it read from dynamic web-json? Pin
Jose Segarra6-Sep-15 11:35
Jose Segarra6-Sep-15 11:35 
GeneralMy vote of 5 Pin
Daniel Miller27-Jul-15 7:53
professionalDaniel Miller27-Jul-15 7:53 
GeneralRe: My vote of 5 Pin
Jose Segarra11-Aug-15 21:18
Jose Segarra11-Aug-15 21:18 
QuestionBest Database Article of June 2015 Pin
Sibeesh Passion9-Jul-15 18:57
professionalSibeesh Passion9-Jul-15 18:57 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.