Introduction
The current status of this project can be found at https://github.com/mtanneryd/ef6-bulk-operations. There is also a nuget package available at https://www.nuget.org/packages/Tanneryd.BulkOperations.EF6.
The entity framework (EF) is great in many ways but when inserting huge amounts of data, it is not really what you need, unless you have lots of time to spend waiting. In this article, I will present a way to use EF together with the bulk insert functionality without leaving the EF comfort zone. The code is based on an idea that I had some years ago, but I have added some more features to it since then. I figured it was time to write a complete article and provide some example code to go with it.
The latest version of this code can be found as a nuget package here.
There is also an updated version of this article here.
Background
The reason it takes so long is that every single insert
statement is encapsulated in the stored procedure sp_executesql
. So, if you have a million or so inserts to make, you get just as many roundtrips to the database each executing this stored procedure with your insert
statement inside. No wonder it takes eons to finish.
Typically, it could look something like this:
exec sp_executesql N'INSERT [dbo].[Prime]([Number], [CompositeId])
VALUES (@0, @1)
SELECT [Id]
FROM [dbo].[Prime]
WHERE @@ROWCOUNT > 0 AND [Id] = scope_identity()',N'@0 bigint,@1 bigint',@0=7,@1=198
I had the following requirements when I set out to find a way to speed things up a bit.
- dead simple to add the functionality to any EF6 project
- minimize impact on existing code
- handle complete object graphs
Challenges
Extend the Functionality of the db Context
To integrate seamlessly with the derived DbContext
class, I decided to add the BulkInsert
-method in a separate partial class declaration. BulkInsertAll
is the only public
method but there are a couple of private
methods too and it seemed like a practical solution to place them all in a separate file making use of the partial class pattern.
Table Mappings
I wanted to be able to deduct all relevant information such as table names, column names, foreign keys and whatever else I might need directly from the context and the entity objects themselves. But, as far as I know, there are no documented ways of accessing the entity framework mapping data so I had to resort to using reflection. It works but it also means that the code might stop working with a new version of EF. The following code is used to extract the initial mapping data using reflection.
var objectContext = ((IObjectContextAdapter)this).ObjectContext;
var workspace = objectContext.MetadataWorkspace;
var containerName = objectContext.DefaultContainerName;
Type t = entities[0].GetType();
var entityName = t.Name;
var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
dynamic temp = storageMapping.GetType().InvokeMember(
"EntitySetMappings",
BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
null, storageMapping, null);
The temp
variable now holds an enumerable of EntitySetMapping
objects that we cast to a list, just so that we can Single()
out the mapping for our entity type.
var entitySetMaps = new List<EntitySetMapping>();
foreach (var t in temp)
{
entitySetMaps.Add((EntitySetMapping)t);
}
var entitySetMap = entitySetMaps.Single(m => m.EntitySet.ElementType.Name == entityName);
Our next step is to find the mapping between CLR property names and table column names.
var typeMappings = entitySetMap.EntityTypeMappings;
EntityTypeMapping typeMapping = typeMappings[0];
var fragments = typeMapping.Fragments;
var fragment = fragments[0];
var properties = fragment.PropertyMappings;
foreach (var property in properties.Where(p => p is ScalarPropertyMapping).Cast<ScalarPropertyMapping>())
{
var clrProperty = property.Property;
var columnProperty = property.Column;
columnMappings.Add(clrProperty.Name, new CLR2ColumnMapping
{
CLRProperty = clrProperty,
ColumnProperty = columnProperty,
});
}
Finally, we look for any foreign key relationships.
var foreignKeyMappings = new List<ForeignKeyMapping>();
var navigationProperties =
typeMapping.EntityType.DeclaredMembers.Where(
m => m.BuiltInTypeKind == BuiltInTypeKind.NavigationProperty)
.Cast<NavigationProperty>()
.Where(p => p.RelationshipType is AssociationType)
.ToArray();
foreach (var navigationProperty in navigationProperties)
{
var relType = (AssociationType)navigationProperty.RelationshipType;
if (foreignKeyMappings.All(m => m.Name != relType.Name))
{
var fkMapping = new ForeignKeyMapping
{
NavigationPropertyName = navigationProperty.Name,
BuiltInTypeKind = navigationProperty.TypeUsage.EdmType.BuiltInTypeKind,
Name = relType.Name,
FromType = relType.Constraint.FromProperties.Single().DeclaringType.Name,
FromProperty = relType.Constraint.FromProperties.Single().Name,
ToType = relType.Constraint.ToProperties.Single().DeclaringType.Name,
ToProperty = relType.Constraint.ToProperties.Single().Name,
};
foreignKeyMappings.Add(fkMapping);
}
}
Identity Columns
The entity framework will set any generated identity columns when inserting new rows and as you can see in the sp_executesql
code snippet above, it uses scope_identity()
to do this. This works nicely as long as you insert one row at a time just like EF does but now that we want to use bulk insert instead, we need a slightly more sophisticated approach. So, the code below does this for us when our primary key column is an identity column.
var pkColumnName = pkColumn.Name;
var pkColumnType = Type.GetType(pkColumn.PrimitiveType.ClrEquivalentType.FullName);
cmd.CommandText = $@"SELECT COUNT(*) FROM {tableName}";
var result = cmd.ExecuteScalar();
var count = Convert.ToInt64(result);
cmd.CommandText = $"SELECT IDENT_INCR('{tableName}')";
result = cmd.ExecuteScalar();
dynamic identIncrement = Convert.ChangeType(result, pkColumnType);
cmd.CommandText = $"SELECT IDENT_CURRENT('{tableName}')";
result = cmd.ExecuteScalar();
dynamic identcurrent = Convert.ChangeType(result, pkColumnType);
var nextId = identcurrent + (count > 0 ? identIncrement : 0);
bulkCopy.BulkCopyTimeout = 5 * 60;
bulkCopy.WriteToServer(table);
cmd.CommandText = $"SELECT SCOPE_IDENTITY()";
result = cmd.ExecuteScalar();
dynamic lastId = Convert.ChangeType(result, pkColumnType);
cmd.CommandText = $"SELECT {pkColumnName} From {tableName}
WHERE {pkColumnName} >= {nextId} and {pkColumnName} <= {lastId}";
var reader = cmd.ExecuteReader();
var ids = (from IDataRecord r in reader
let pk = r[pkColumnName]
select pk)
.OrderBy(i => i)
.ToArray();
if (ids.Length != entities.Count) throw new ArgumentException
("More id values generated than we had entities. Something went wrong, try again.");
for (int i = 0; i < entities.Count; i++)
{
SetProperty(pkColumnName, entities[i], ids[i]);
}
Foreign Key Relationships
Our search through the entity mappings gave us information about foreign keys and with it, we can implement a recursive pattern that will allow the bulk insert method to save a complete entity graph and set identity columns for all saved objects. The BulkInsert
-method takes three arguments.
public void BulkInsertAll(IList entities, SqlTransaction transaction = null, bool recursive = false)
The list of entities must all be of the same type. The transaction is optional and the recursive flag tells us if we should save only the entities in the list (false
) or if we should also save entities in any navigation properties (true
). We have two types of navigation properties to deal with.
- Single properties, the result of a foreign key in this entity pointing to another entity
- Collection properties, the result of another entity having a foreign key to this entity
We start by finding all the single navigation properties and saving them, if they are new, and we then update the entity with the foreign key value. This will have to be done one navigation property at a time. Having done this, we can safely save all our entities. Finally, we go through all collection navigation properties, setting the proper foreign key to our entity whose primary keys we now have and then we make a recursive call to the BulkInsert
-method for each of these collection navigation properties.
Using the Code
All you have to do is to add the partial
class below after renaming it properly and placing it in the same namespace as your derived DbContext
class.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.Entity.Core.Mapping;
using System.Data.Entity.Infrastructure;
using System.Data.Entity.Core.Metadata.Edm;
using System.Data.SqlClient;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Text.RegularExpressions;
namespace BulkInsert.App
{
public partial class NumberContext
{
public NumberContext(string nameOrConnectionString)
: base($"name={nameOrConnectionString}")
{
}
public void BulkInsertAll(IList entities, SqlTransaction transaction = null,
bool recursive = false)
{
BulkInsertAll(entities, transaction, recursive,
new Dictionary<object, object>(new IdentityEqualityComparer<object>()));
}
private void BulkInsertAll(IList entities, SqlTransaction transaction,
bool recursive, Dictionary<object, object> savedEntities)
{
if (entities.Count == 0) return;
var objectContext = ((IObjectContextAdapter)this).ObjectContext;
var workspace = objectContext.MetadataWorkspace;
Type t = entities[0].GetType();
var mappings = GetMappings(workspace, objectContext.DefaultContainerName, t.Name);
if (recursive)
{
foreach (var fkMapping in mappings.ToForeignKeyMappings)
{
var navProperties = new HashSet<object>();
var modifiedEntities = new List<object[]>();
foreach (var entity in entities)
{
var navProperty = GetProperty(fkMapping.NavigationPropertyName, entity);
var navPropertyKey = GetProperty(fkMapping.ToProperty, entity);
if (navProperty != null && navPropertyKey == 0)
{
var currentValue = GetProperty(fkMapping.FromProperty, navProperty);
if (currentValue > 0)
{
SetProperty(fkMapping.ToProperty, entity, currentValue);
}
else
{
navProperties.Add(navProperty);
modifiedEntities.Add(new object[] { entity, navProperty });
}
}
}
if (navProperties.Any())
{
BulkInsertAll(navProperties.ToArray(), transaction, true, savedEntities);
foreach (var modifiedEntity in modifiedEntities)
{
var e = modifiedEntity[0];
var p = modifiedEntity[1];
SetProperty(fkMapping.ToProperty, e,
GetProperty(fkMapping.FromProperty, p));
}
}
}
}
var validEntities = new ArrayList();
var ignoredEntities = new ArrayList();
foreach (dynamic entity in entities)
{
if (savedEntities.ContainsKey(entity))
{
ignoredEntities.Add(entity);
continue;
}
validEntities.Add(entity);
savedEntities.Add(entity, entity);
}
BulkInsertAll(validEntities, t, mappings, transaction);
if (recursive)
{
foreach (var fkMapping in mappings.FromForeignKeyMappings)
{
var navigationPropertyName = fkMapping.NavigationPropertyName;
var navPropertyEntities = new List<dynamic>();
foreach (var entity in entities)
{
if (fkMapping.BuiltInTypeKind == BuiltInTypeKind.CollectionType ||
fkMapping.BuiltInTypeKind ==
BuiltInTypeKind.CollectionKind)
{
var navProperties = GetProperty(navigationPropertyName, entity);
foreach (var navProperty in navProperties)
{
SetProperty(fkMapping.ToProperty, navProperty,
GetProperty(fkMapping.FromProperty, entity));
navPropertyEntities.Add(navProperty);
}
}
else
{
var navProperty = GetProperty(navigationPropertyName, entity);
if (navProperty != null)
{
SetProperty(fkMapping.ToProperty, navProperty,
GetProperty(fkMapping.FromProperty, entity));
navPropertyEntities.Add(navProperty);
}
}
}
if (navPropertyEntities.Any())
{
BulkInsertAll(navPropertyEntities.ToArray(), transaction, true, savedEntities);
}
}
}
}
private void BulkInsertAll(IList entities, Type t, Mappings mappings,
lTransaction transaction = null)
{
Set(t).ToString();
var tableName = GetTableName(t);
var columnMappings = mappings.ColumnMappings;
var conn = (SqlConnection)Database.Connection;
if (conn.State == ConnectionState.Closed)
conn.Open();
var bulkCopy = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, transaction)
var properties = t.GetProperties().Where
(p => columnMappings.ContainsKey(p.Name)).ToArray();
var table = new DataTable();
foreach (var property in properties)
{
Type propertyType = property.PropertyType;
if (propertyType.IsGenericType &&
propertyType.GetGenericTypeDefinition() == typeof(Nullable<>))
{
propertyType = Nullable.GetUnderlyingType(propertyType);
}
if (columnMappings.ContainsKey(property.Name))
{
table.Columns.Add(new DataColumn(property.Name, propertyType));
var clrPropertyName = property.Name;
var tableColumnName = columnMappings[property.Name].ColumnProperty.Name;
bulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping
(clrPropertyName, tableColumnName));
}
}
foreach (var entity in entities)
{
var e = entity;
table.Rows.Add(properties.Select(property => GetPropertyValue
(property.GetValue(e, null))).ToArray());
}
var cmd = conn.CreateCommand();
cmd.Transaction = transaction;
var pkColumn = columnMappings.Values.Where(m =>
m.ColumnProperty.IsStoreGeneratedIdentity).Select(m => m.ColumnProperty).SingleOrDefault();
if (pkColumn != null)
{
var pkColumnName = pkColumn.Name;
var pkColumnType = Type.GetType(pkColumn.PrimitiveType.ClrEquivalentType.FullName);
cmd.CommandText = $@"SELECT COUNT(*) FROM {tableName}";
var result = cmd.ExecuteScalar();
var count = Convert.ToInt64(result);
cmd.CommandText = $"SELECT IDENT_INCR('{tableName}')";
result = cmd.ExecuteScalar();
dynamic identIncrement = Convert.ChangeType(result, pkColumnType);
cmd.CommandText = $"SELECT IDENT_CURRENT('{tableName}')";
result = cmd.ExecuteScalar();
dynamic identcurrent = Convert.ChangeType(result, pkColumnType);
var nextId = identcurrent + (count > 0 ? identIncrement : 0);
bulkCopy.BulkCopyTimeout = 5 * 60;
bulkCopy.WriteToServer(table);
cmd.CommandText = $"SELECT SCOPE_IDENTITY()";
result = cmd.ExecuteScalar();
dynamic lastId = Convert.ChangeType(result, pkColumnType);
cmd.CommandText = $"SELECT {pkColumnName} From {tableName}
WHERE {pkColumnName} >= {nextId} and {pkColumnName} <= {lastId}";
var reader = cmd.ExecuteReader();
var ids = (from IDataRecord r in reader
let pk = r[pkColumnName]
select pk)
.OrderBy(i => i)
.ToArray();
if (ids.Length != entities.Count) throw new ArgumentException
SetProperty(pkColumnName, entities[i], ids[i]);
}
}
else
{
bulkCopy.BulkCopyTimeout = 5 * 60;
bulkCopy.WriteToServer(table);
}
}
private string GetTableName(Type t)
{
var dbSet = Set(t);
var sql = dbSet.ToString();
var regex = new Regex(@"FROM (?<table>.*) AS");
var match = regex.Match(sql);
return match.Groups["table"].Value;
}
private object GetPropertyValue(object o)
{
if (o == null)
return DBNull.Value;
return o;
}
private Mappings GetMappings(MetadataWorkspace workspace,
string containerName, string entityName)
{
var columnMappings = new Dictionary<string, CLR2ColumnMapping>();
var storageMapping = workspace.GetItem<GlobalItem>(containerName, DataSpace.CSSpace);
dynamic temp = storageMapping.GetType().InvokeMember(
"EntitySetMappings",
BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance,
null, storageMapping, null);
var entitySetMaps = new List<EntitySetMapping>();
foreach (var t in temp)
{
entitySetMaps.Add((EntitySetMapping)t);
}
var entitySetMap = entitySetMaps.Single(m => m.EntitySet.ElementType.Name == entityName);
var typeMappings = entitySetMap.EntityTypeMappings;
EntityTypeMapping typeMapping = typeMappings[0];
var fragments = typeMapping.Fragments;
var fragment = fragments[0];
var properties = fragment.PropertyMappings;
foreach (var property in properties.Where
(p => p is ScalarPropertyMapping).Cast<ScalarPropertyMapping>())
{
var clrProperty = property.Property;
var columnProperty = property.Column;
columnMappings.Add(clrProperty.Name, new CLR2ColumnMapping
{
CLRProperty = clrProperty,
ColumnProperty = columnProperty,
});
}
var foreignKeyMappings = new List<ForeignKeyMapping>();
var navigationProperties =
typeMapping.EntityType.DeclaredMembers.Where
(m => m.BuiltInTypeKind == BuiltInTypeKind.NavigationProperty)
.Cast<NavigationProperty>()
.Where(p => p.RelationshipType is AssociationType)
.ToArray();
foreach (var navigationProperty in navigationProperties)
{
var relType = (AssociationType)navigationProperty.RelationshipType;
if (foreignKeyMappings.All(m => m.Name != relType.Name))
{
var fkMapping = new ForeignKeyMapping
{
NavigationPropertyName = navigationProperty.Name,
BuiltInTypeKind = navigationProperty.TypeUsage.EdmType.BuiltInTypeKind,
Name = relType.Name,
FromType = relType.Constraint.FromProperties.Single().DeclaringType.Name,
FromProperty = relType.Constraint.FromProperties.Single().Name,
ToType = relType.Constraint.ToProperties.Single().DeclaringType.Name,
ToProperty = relType.Constraint.ToProperties.Single().Name,
};
foreignKeyMappings.Add(fkMapping);
}
}
return new Mappings
{
ColumnMappings = columnMappings,
ToForeignKeyMappings = foreignKeyMappings.Where(m => m.ToType == entityName).ToArray(),
FromForeignKeyMappings = foreignKeyMappings.Where
(m => m.FromType == entityName).ToArray()
};
}
private dynamic GetProperty(string property, object instance)
{
var type = instance.GetType();
return type.InvokeMember(property, BindingFlags.GetProperty | BindingFlags.Public |
BindingFlags.Instance, Type.DefaultBinder, instance, null);
}
private void SetProperty(string property, object instance, object value)
{
var type = instance.GetType();
type.InvokeMember(property, BindingFlags.SetProperty | BindingFlags.Public |
BindingFlags.Instance, Type.DefaultBinder, instance, new[] { value });
}
}
class IdentityEqualityComparer<T> : IEqualityComparer<T> where T : class
{
public int GetHashCode(T value)
{
return RuntimeHelpers.GetHashCode(value);
}
public bool Equals(T left, T right)
{
return left == right;
}
}
class Mappings
{
public Dictionary<string, CLR2ColumnMapping> ColumnMappings { get; set; }
public ForeignKeyMapping[] ToForeignKeyMappings { get; set; }
public ForeignKeyMapping[] FromForeignKeyMappings { get; set; }
}
class CLR2ColumnMapping
{
public EdmProperty CLRProperty { get; set; }
public EdmProperty ColumnProperty { get; set; }
}
class ForeignKeyMapping
{
public BuiltInTypeKind BuiltInTypeKind { get; set; }
public string NavigationPropertyName { get; set; }
public string Name { get; set; }
public string FromType { get; set; }
public string FromProperty { get; set; }
public string ToType { get; set; }
public string ToProperty { get; set; }
}
}
You can also download the Bulkinsert.App
tutorial application from this link.
History
- 2017-04-15
- Fixed yet another minor bug related to complex entity graphs
- 2017-04-07
- Changed the software license to Apache License 2.0
- 2017-03-29
- Fixed some minor bugs and added support for complex entity graphs where the same entity object exists in multiple places. Now making sure to save each entity only once!
- 2017-03-17
- Updated version of sample code uploaded. Significant performance improvements when saving complex entity graphs
- 2017-03-10