Migrate Data from SQL Server to Azure Cosmos DB

Srinivasa Dinesh Parupalli

4.91/5 (3 votes)

Nov 8, 2018

CPOL

6 min read

17182

Importing Documents from SQL Server to Azure Cosmos DB

Introduction

This article will provide you information on importing documents from SQL Server to Azure Cosmos DB.

Background

This article is for the developers and architects who already have basic understanding of Azure Cosmos DB. I wanted to make this article as a foundation for my upcoming articles which are towards CRUD operations on the documents/collections at Azure Cosmos DB. In order to perform the CRUD operations, I needed the data. So I started exploring few open source tools. Out of which I started using Azure Cosmos DB Data Migration tool from Microsoft.

This article provides instructions on using the Azure Cosmos DB Data Migration tool, which can import data from various data sources into Azure Cosmos DB.

The primary focus of this article is migration data from the SQL Server to Azure Cosmos DB.

Prerequisites

Azure Cosmos DB emulator: I would recommend this as this is free and it has the same functionalities which you can do at the Azure Cosmos DB at the Azure portal. The Azure Cosmos DB Emulator provides a local environment that emulates the Azure Cosmos DB service for development purposes. Using the Azure Cosmos DB Emulator, you can develop and test your application locally, without creating an Azure subscription or incurring any costs. When you're satisfied with how your application is working in the Azure Cosmos DB Emulator, you can switch to using an Azure Cosmos DB account in the cloud. Azure Cosmos DB emulator can be found here.
Azure Cosmos DB Data Migration tool: By using this tool, you can import from JSON files, CSV files, SQL, MongoDB, Azure Table storage, Amazon DynamoDB, etc. which can be used with Azure Cosmos DB. Azure Cosmos DB Migration tool can be found here.
SQL Server: At my end, I had SQL Server 2017 and I have used the connection of the server in this article.

Data Migration/Importing

This is multi-step process, let's go step by step.

Step 1

Identify the actual data which needs to be imported. For this article, I have considered AdventureWorks2017 database. I want the data of all the employees with their department. For that, I have created the below query.

SELECT 
    e.[BusinessEntityID] 
    ,p.[Title] 
    ,p.[FirstName] 
    ,p.[MiddleName] 
    ,p.[LastName] 
    ,p.[Suffix] 
    ,e.[JobTitle]
    ,d.[Name] AS [Department] 
    ,d.[GroupName] 
    ,edh.[StartDate] 
FROM [HumanResources].[Employee] e
    INNER JOIN [Person].[Person] p
    ON p.[BusinessEntityID] = e.[BusinessEntityID]
    INNER JOIN [HumanResources].[EmployeeDepartmentHistory] edh 
    ON e.[BusinessEntityID] = edh.[BusinessEntityID] 
    INNER JOIN [HumanResources].[Department] d 
    ON edh.[DepartmentID] = d.[DepartmentID] 
WHERE edh.EndDate IS NULL

The above query will result with 290 rows (this is as per my data in my database, it might vary as per the data in your database).

Step 2

Install the Azure Cosmos DB emulator and run it. You would find an image as shown below:

Step 3

Install Azure Cosmos DB Data Migration tool and open the dtui.exe. You would find an image like below:

Step 4

At the dtui.exe, click on the Source Information and choose SQL from the "Import from:" dropdown list. At the Connection String, give the connection string of your SQL Server box. After providing the connection string, you can either have your SQL query or choose SQL script file. I have chosen to provide the query.

Your screen should look like below:

Id is optional. When we migrate data to Azure Cosmos DB, the documents are automatically created with an Id which is GUID. We can control it by adding by ourselves, but we need to make them unique. I have mentioned in the SQL as I know that BusinessEntity is unique so I have added in the query.
Remember that Id property is always a string value in the Azure Cosmos DB.
Notice that query in the above image is different query at Step 1. As documents at the Azure Cosmos DB are represented as a json, we need to structure the SQL query to be outputted as json.
Have period(.) as the Nesting Separator. This would structure the SQL query to a valid json structure.
I just wanted to have employee and in that all the details apart from the joining date, just to show how we can structure our own data while migrating.

Step 5

Click Next after providing each of the items at Step 4. You would be at the Target Information screen.

Leave the default value (DocumentDB - Sequential record import(partitioned collection)) at "Export to:" dropdown list.
At the Connection String, copy the "Primary Connection String" from the Azure Emulator. If you want to use the Azure portal, then you can copy the same Primary Connection String. Once you are done, append it with Database=<some value>. I have provided with Database=EmployeesDB.
At Collection, give it as EmployeesWithDepartment
At Partition Key as /employee/department/name
Leave the Collection Throughput as is
Id Field as blank
Expand the Advance Options. Leave everything as is and at the Indexing Policy, right click and choose "Range".

Your screen should look like below:

Partitioned collection is a separate topic and I would be covering them in my next article on CRUD operations on the documents/collections at Azure Cosmos DB.
Collection at the Azure Cosmos DB are all collection of documents. Each document is a set of json structure defined while creating the document.
Id Field is provided if you have chosen a different identified name other than id.
Indexing Policies are dealt with in my next article on CRUD operations on the documents/collections at Azure Cosmos DB.

Step 6

If you want to generate an Error log file, you can mention it or skip this step.

Step 7

You would get the summary of Step 4 till Step 6. Once you are convinced with the details, click on Import. If it's a successful import, you would see the below screenshot. Else all the errors are displayed at the Errors tab. As mentioned in Step 1, 290 rows have been imported into Azure Cosmos DB.

Step 8

Go the Azure Cosmos DB Emulator and click on the Explorer at the Left Pane. You would find database and documents like the below screenshot:

Step 9

To query the documents at the Azure Cosmos DB, we can have SQL type of queries.

To List all the documents: SELECT * FROM c

Where c is the collection.

To fetch document with id equals 1: SELECT * FROM c WHERE c.id = "1"

To fetch document with department name as Tool Design: 
SELECT * FROM c WHERE c.employee.department.name = "Tool Design"

or:

SELECT * FROM EmployeeWithDepartment e WHERE e.employee.department.name = "Tool Design"

Points of Interest

Azure Cosmos DB Resource Model
- Azure Account
  - Databases
    - Collections
      - Documents
        
        resource/data
        
        attachments (optional)
Indexing Policies are of 3 types, Hash Index, Range Index and Spatial Index.
Resource properties which you find when you fetch data from the collection are listed below:
- id: user-defined unique identifier. It should be string
- _rid: auto-generated resource id
- _ts: auto-generated timestamp (last updated) epoch value
- _etag: auto-generated GUID. Used for optimistic concurrency
- _self: auto-generated URI path to the resource. This is very useful when using with SQL API for Azure Cosmos DB
- _attachments: URI path suffix to the resource attachment

Conclusion

Azure Cosmos DB is not limited to only creation of collections and SQL queries, we can also leverage having:

Stored Procedures
Triggers
User Defined Functions