Click here to Skip to main content
15,898,371 members
Articles / SqlBulkCopy

SQL bulk insert with strongly typed data including validation

Rate me:
Please Sign up or sign in to vote.
4.30/5 (7 votes)
7 Oct 2013CPOL3 min read 22.1K   12   17  
SQL bulk copy with strongly typed data including validation.

Introduction

SQLBulkCopy is a really useful and fast way to insert large amounts of data in a few seconds but it only accepts DataTable to insert into tables. Managing DataTables is very problematic, we have to specify columns, rows manually with all string values. If we miss something then all is screwed up and there is no support for any basic validation.

In this tutorial, I am going to create some classes which will act as a API through which we can take advantage of SqlBulkCopy with strongly typed objects and support of basic validation with the help of data annotation.

Custom attribute classes

At first we are going to create custom attributes. Through these, we will be able to specify column names and table name in a model itself.

DataTableAttribute implementation for specifying table name.

C#
    /// <summary>
/// Maps a class to a Database Table or XML Element
/// </summary>
[AttributeUsage(AttributeTargets.Class)]
// Specifying target which will be only applicable for classes
public class DataTableAttribute : Attribute
{
    /// <summary>
    /// Maps a class to a Database Table or XML Element
    /// </summary>
    /// <param name="containerName">Name of the Table
    ///         or XML Parent Element to map</param>
    public DataTableAttribute(string containerName)
        : this()
    {
        TableName = containerName;
    }

    /// <summary>
    /// Prevents a default instance of the <see
    ///       cref="DataTableAttribute"/> class from being
    /// created.
    /// </summary>
    private DataTableAttribute()
    {
        TableName = string.Empty;
    }

    /// <summary>
    /// Gets or sets the name of the table.
    /// </summary>
    /// <value>The name of the table.</value>
    public string TableName { get; set; }
}

DataColumnAttribute implementation for specifying column names.

C#
/// <summary>
/// Definition for identifying column name with type
/// </summary>
[AttributeUsage(AttributeTargets.Property)] // Only applicable for properties
public class DataColumnAttribute : Attribute
{
    /// <summary>
    /// Map a property to a database column or XML element
    /// </summary>
    /// <param name="name">Name of the column or element to map</param>
    /// <param name="type">Underlying DbType of the column</param>
    public DataColumnAttribute(string name, Type type)
        : this()
    {
        ColumnName = name;
        ColumnType = type;
    }

    /// <summary>
    /// Prevents a default instance of the <see cref="DataColumnAttribute"/> class from being
    /// created.
    /// </summary>
    private DataColumnAttribute()
    {
        ColumnName = string.Empty;
        ColumnType = null;
    }

    /// <summary>
    /// Gets or sets the name of the column.
    /// </summary>
    /// <value>The name of the column.</value>
    public string ColumnName { get; set; }

    /// <summary>
    /// Gets or sets the type of the column.
    /// </summary>
    /// <value>The type of the column.</value>
    public Type ColumnType { get; set; }
} 

You might be wondering why do we need to have these custom attributes rather we can get table name, column name and data types from class itself. The advantage of this approach is we can specify more properties which does not take part in DB operation or have a different name in DB and on code level.

Data Model

In second phase, let decorate our DB model class with above custom attributes and data annotation attributes for validation. Data annotation attributes could be skipped if you do not require any validation for bulk insert.

Note: You need add System.ComponentModel.DataAnnotations.dll. If planning for validation.

Data model for bulk insert:

C#
/// <summary>
/// Book information
/// </summary>
[DataTable("TBookInfo")]
public class BookInfo
{
    /// <summary>
    /// Gets or sets the EAN.
    /// </summary>
    /// <value>The EAN.</value>
    [DataColumn("ISBN", typeof(string))]
    [Required(ErrorMessage = "EAN is required")]
    [MaxLength(50, ErrorMessage = "EAN is exceeded max length of 50")]
    public string EAN { get; set; }

    /// <summary>
    /// Gets or sets the title.
    /// </summary>
    /// <value>The title.</value>
    [DataColumn("Title", typeof(string))]
    [Required(ErrorMessage = "Book title is required")]
    [MaxLength(255, ErrorMessage = "Book title is exceeded max length of 255")]
    public string Title { get; set; }

    /// <summary>
    /// Gets or sets the name of the publisher.
    /// </summary>
    /// <value>The name of the publisher.</value>
    [DataColumn("PublisherName", typeof(string))]
    [Required(ErrorMessage = "Publisher name  is required")]
    [MaxLength(255, ErrorMessage = "Publisher name is exceeded max length of 255")]
    public string PublisherName { get; set; }

    /// <summary>
    /// Gets or sets the synopsis.
    /// </summary>
    /// <value>The synopsis.</value>
    [DataColumn("Synopsis", typeof(string))]
    public string Synopsis { get; set; }

    /// <summary>
    /// Gets or sets the release date.
    /// </summary>
    /// <value>The release date.</value>
    [DataColumn("ReleaseDate", typeof(DateTime))]
    public DateTime? ReleaseDate { get; set; }

    /// <summary>
    /// Gets or sets the number of pages.
    /// </summary>
    /// <value>The number of pages.</value>
    [DataColumn("NumberOfPages", typeof(int))]
    public int? NumberOfPages { get; set; }
} 

DataColumn and DataTable is our custom defined attributes which would be used later. Required and MaxLength attributes came from data annotation attributes.

Now our model is decorated with required attributes which would take part in SqlBulkCopy.

Data model conversion implementation

Let's go into the implementation part for automatic conversion from above class to DataTable. Function implementation for getting table name from object:

C#
/// <summary>
/// Gets the table name from object.
/// </summary>
/// <typeparam name="T">Source for DataTable that need to be inserted</typeparam>
/// <returns>Get table name for object</returns>
public static String GetTableNameFromObject<T>()
{
    var arrTableAttrib = typeof(T).GetCustomAttributes(typeof(DataTableAttribute), true);
    return arrTableAttrib.Any() ? ((DataTableAttribute)arrTableAttrib.First()).TableName : null;
} 

So, we have got function for getting table name. Let's write a function to get PropertyInfo which will help us to retrieve property values through reflection and DataColumnAttribute for getting data column name with it's type.

NOTE: I am using simple caching mechanism to store our PropertyInfo and DataColumnAttribute for fast retrieval. You need to add System.Runtime.Caching.dll for using caching.

Let's look into code for getting those two values

C#
/// <summary>
/// Gets all custom property.
/// </summary>
/// <typeparam name="T">Source for DataTable that need to be inserted</typeparam>
/// <returns>Property info and DataColumnAttribute</returns>
public static Dictionary<PropertyInfo, DataColumnAttribute> GetAllCustomProperty<T>()
{
    if (MemoryCache.Default[typeof(T).ToString()] == null)
    {
        var props = (from prop in typeof(T).GetProperties()
                     where prop.GetCustomAttributes(typeof(DataColumnAttribute), true).Any()
                     select new
                     {
                         prop,
                         dataColumn = (DataColumnAttribute)
                         prop.GetCustomAttributes(typeof(DataColumnAttribute), true).FirstOrDefault()
                     }).ToDictionary(item => item.prop, item => item.dataColumn);
 
        // Set Caching key name according to class
        MemoryCache.Default.Set(typeof(T).ToString(), props, null);
 
        return props;
    }
    return (Dictionary<PropertyInfo, DataColumnAttribute>)MemoryCache.Default[typeof(T).ToString()];
}

I have simplified things with above mentioned single function and LINQ queries which are easier to understand. DataTable accepts arrays of DataColumns. So, let's declare a method to get DataColumn arrays which would be used more then once and this will be using above function.

C#
/// <summary>
/// Gets all data table column.
/// </summary>
/// <typeparam name="T">Source for DataTable that need to be inserted</typeparam>
/// <returns>Data column array</returns>
public static DataColumn[] GetAllDataTableColumn<T>()
{
    return (from item in GetAllCustomProperty<T>()
            select new DataColumn(item.Value.ColumnName, item.Value.ColumnType))
            .ToArray();
}

Now, we need to have a function that should return property value by passing value of custom column attribute.

C#
/// <summary>
/// Gets the property.
/// </summary>
/// <typeparam name="T">Source for DataTable that need to be inserted</typeparam>
/// <param name="dataColumnAttributeName">Name of the data column attribute.</param>
/// <returns>PropertyInfo for data attribute column name</returns>
public static PropertyInfo GetPropertyName<T>(string dataColumnAttributeName)
{
    return (from item in GetAllCustomProperty<T>()
            where item.Value.ColumnName == dataColumnAttributeName
            select item.Key).FirstOrDefault();
} 

The above function is again using same GetAllCustomProperty() for performance benefit.

All things are set now. Now, we need to create function that would parse object to DataTable to operate on SqlBulkCopy.

C#
/// <summary>
/// Gets the transformed data.
/// </summary>
/// <typeparam name="T">Source for DataTable that need to be inserted</typeparam>
/// <param name="myDataObjectList">My data object list.</param>
/// <returns>DataTable with mapped columns</returns>
public static DataTable GetTransformedData<T>(IList<T> myDataObjectList)
{
    var dataTable = new DataTable(GetTableNameFromObject<T>());
 
    // Get all Data columns
    var allColumns = GetAllDataTableColumn<T>();
 
    dataTable.Columns.AddRange(allColumns);
 
    foreach (T record in myDataObjectList)
    {
        var row = dataTable.NewRow();
        foreach (var colmnAttrib in allColumns)
        {
            row[colmnAttrib] = GetPropertyName<T>(
                  colmnAttrib.ColumnName).GetValue(record) ?? DBNull.Value;
        }
        dataTable.Rows.Add(row);
    }
    return dataTable;
}

Validation

We have defined custom attributes classes in first phase and in second decorated our model class. Third phase codes deals with conversion of model to DataTable.

Let's suppose you are getting values from some source and saving it to lstBooks object. Code for validating lstBooks and removing items which are having invalid rows:

C#
/// <summary>Validates data and remove any failed rows</summary>
/// <param name="lstBookInfoes">Modified book info list after validation.</param>
private static void ValidateData(ref List<BookInfo> lstBookInfoes)
{
    var results = new List<ValidationResult>();
 
    // Validate all fields of bookInfo
    lstBookInfoes.RemoveAll(bookInfo =>
                !Validator.TryValidateObject(
                    bookInfo,
                    new ValidationContext(bookInfo, null, null),
                    results,
                    true));
 
    // TODO: Log the cause of failed rows
}

Note: If we need some complex logic for validation, then inherit IValidatableObject interface and implement

IEnumerable<validationresult> 
Validate(ValidationContext validationContext)
on model class. In this method we can have any custom implementation.

Implementation of core code for doing SqlBulkInsert

C#
var lstBooks = GetSampleBookInfo();

// Validate and remove invalid rows
ValidateData(ref lstBooks);
 
using (var con = new SqlConnection(
  ConfigurationManager.ConnectionStrings["BulkInsertDemo"].ConnectionString))
{
    using (var bulkCopy = new SqlBulkCopy(con))
    {
        con.Open();

        // Get transformed data into DataTable from given generic list
        var dt = DataTableConverter.GetTransformedData(lstBooks);
 
        // Set the destination table name
        bulkCopy.DestinationTableName = 
          DataTableConverter.GetTableNameFromObject<BookInfo>();
 
        bulkCopy.BatchSize = 5000;
 
        bulkCopy.WriteToServer(dt);
    }
}

Transformation is done in a generic way. So, whenever you need to do bulk insert, create a class with the required attributes and everything else would be taken care of. It is really fast while getting value through reflection as it is using caching mechanism. So, very little or no performance overhead.

Note: In some cases DataTable is having issue related to ordering. You can change your class members order or table column orders. DataColumnAttribute class can be also modified to have ordering sequence model class.

Note: From demo download please modify actual path of DB file from app.config, if DB connection is failing.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior) Mindfire Solutions
India India
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
-- There are no messages in this forum --