Click here to Skip to main content
15,886,080 members
Articles / ETL
Tip/Trick

Cinchoo ETL - Flattening Complex, Nested JSON Objects

Rate me:
Please Sign up or sign in to vote.
4.64/5 (3 votes)
26 Sep 2022CPOL3 min read 8.8K   1   2
Quick tutorial about flattening complex, nested JSON objects using Cinchoo ETL
In this tip, you will learn how to flatten complex, nested JSON objects using Cinchoo ETL framework. It is very simple to use. With few lines of code, the conversion can be done. You can perform such conversion process as stream based, quite fast and with low memory footprint.

1. Introduction

ChoETL is an open source ETL (extract, transform and load) framework for .NET. It is a code based library for extracting data from multiple sources, transforming, and loading into your very own data warehouse in .NET environment. You can have data in your data warehouse in no time.

This tip talks about flattening complex, nested JSON file using Cinchoo ETL framework. It is very simple to use, with few lines of code, the conversion can be done. You can convert large files as the conversion process is stream based, quite fast and with low memory footprint.

2. Requirement

This framework library is written in C# using .NET 4.5 / .NET Core 3.x Framework.

3. How to Use

3.1 Sample Data

Let's begin by looking into the below JSON file containing complex, nested structure. Say, for example, we have two companies and the first one has two branches and the second has just one branch.

Listing 3.1.1 Sample JSON File (company.json)
JSON
[
  {
    "ID": "1",
    "Name": "C1",
    "Branches": [
      {
        "City": "New York",
        "Country": "USA"
      },{
        "City": "California",
        "Country": "USA"
      }
    ]
  },
  {
    "ID": "2",
    "Name": "C2",
    "Branches": [
      {
        "City": "Mexico City",
        "Country": "Mexico"
      }
    ]
  }
]

We are going to flatten using Cinchoo ETL Framework to get the result as shown in the below table:

ID Name Branches/0/City Branches/0/Country Branches/1/City Branches/1/Country
1 C1 New York USA California USA
2 C2 Mexico City Mexico null null

Expected JSON output looks as below:

JavaScript
[
  {
    "ID": "1",
    "Name": "C1",
    "Branches/0/City": "New York",
    "Branches/0/Country": "USA",
    "Branches/1/City": "California",
    "Branches/1/Country": "USA"
  },
  {
    "ID": "2",
    "Name": "C2",
    "Branches/0/City": "Mexico City",
    "Branches/0/Country": "Mexico",
    "Branches/1/City": null,
    "Branches/1/Country": null
  }
]

3.2 Install Library

Next, install ChoETL.JSON / ChoETL.JSON.NETStandard nuget package. To do this, run the following command in the Package Manager Console.

.NET Standard Framework

Install-Package ChoETL.JSON

.NET Core

Install-Package ChoETL.JSON.NETStandard

Now add ChoETL namespace to the program.

JavaScript
using ChoETL;

3.3 Quick Conversion

Let's use the library to convert the complex structured JSON file to flatten format. It is as simple as can be done with few lines of code. No POCO class needed. It is fast, stream based, and consumes low memory.

Listing 3.3.1. Quick JSON file conversion
JavaScript
private static void QuickConversion()
{
    using (var r = new ChoJSONReader.LoadText("companies.json"))
    {
        using (var w = new ChoJSONWriter("companies_out.json")
               .Configure(c => c.ThrowAndStopOnMissingField = false)
               .Configure(c => c.FlattenNode = true)
               .Configure(c => c.DefaultArrayHandling = false)
               .Configure(c => c.NestedKeySeparator = '/')
               .WithMaxScanNodes(3)
               )
            w.Write(r);
    }
}

Create an instance of ChoJSONWriter for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader object for reading complex JSON (companies.json) file.

Where:

  • Configure(c => c.FlattenNode = true) - tells the writer to flatten the json
  • Configure(c => c.ThrowAndStopOnMissingField = false) - configure the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null values.
  • Configure(c => c.NestedKeySeparator = '/') - tells the writer to use '/' as nested key separator.
  • Configure(c => c.DefaultArrayHandling = false) - tells the writer to turn off the treating the empty nodes as simple object (not as array).
  • WithMaxScanNodes(3) - tells the writer to scan the number of nodes to discover the optimal maximum number of fields required to output the flat out JSON.

Sample Fiddle: https://dotnetfiddle.net/uzZdKU

3.4 Using POCO Conversion

This approach uses POCO model classes to load the JSON file. First, define the POCO classes matches the companies.json file.

Listing 3.4.1. POCO classes
C#
public class Company
{
    public string ID { get; set; }
    public string Name { get; set; }
    public List<Branch> Branches { get; set; }
}

public class Branch
{
    public string City { get; set; }
    public string Country { get; set; }
}    
Listing 3.4.2. Conversion using POCO classes
JavaScript
private static void SelectiveColumnTest()
{
    using (var r = new ChoJSONReader<Company>("companies.json"))
    {
        using (var w = new ChoJSONWriter<Company>("companies_out.json")
            .Configure(c => c.FlattenNode = true)
            .Configure(c => c.ThrowAndStopOnMissingField = false)
            .Configure(c => c.NestedKeySeparator = '/')
            .WithMaxScanNodes(3)
            )
            w.Write(r);
    }
}

Create an instance of ChoJSONWriter for producing flatten JSON (companies_out.json) output file. Then create an instance of ChoJSONReader object for reading complex JSON (companies.json) file.

Where:

  • Configure(c => c.FlattenNode = true) - tells the writer to flatten the json
  • Configure(c => c.ThrowAndStopOnMissingField = false) - configures the writer to ignore missing field while writing. In the sample above, the second node, comes with 1 branch object, but expect to output the second branch with null values.
  • Configure(c => c.NestedKeySeparator = '/') - tells the writer to use '/' as nested key separator.
  • WithMaxScanNodes(3) - tells the writer to scan the number of nodes to discover the optimal maximum number of fields required output the flat out JSON.

Sample fiddle: https://dotnetfiddle.net/AX02BN

Download the sample attached above, try it.

For more information about Cinchoo ETL, please visit the other CodeProject articles:

History

  • 25th October, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralMy vote of 5 Pin
Ștefan-Mihai MOGA2-Nov-21 10:25
professionalȘtefan-Mihai MOGA2-Nov-21 10:25 
GeneralRe: My vote of 5 Pin
Cinchoo2-Nov-21 11:17
Cinchoo2-Nov-21 11:17 
QuestionMessage Closed Pin
27-Oct-21 0:55
Ghost Kitchen27-Oct-21 0:55 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.