Repository Pattern, Done Right

26 Mar 2015 | LGPL3 | 9 min read
This post aims to explain why the Repository Pattern can still be a great choice.

The repository pattern has been discussed a lot lately, especially its usefulness since the introduction of OR/M libraries. This post (which is the third in a series about the data layer) aims to explain why it’s still a great choice.

Let’s start with the definition:

A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.

The repository pattern is an abstraction. Its purpose is to reduce complexity and make the rest of the code persistence ignorant. As a bonus, it allows you to write unit tests instead of integration tests. The problem is that many developers fail to understand the pattern’s purpose and create repositories which leak persistence-specific information up to the caller (typically by exposing IQueryable<T>).

By doing so, they get no benefit over using the OR/M directly.

Common Misconceptions

Here are some common misconceptions regarding the purpose of the pattern.

Repositories Are About Being Able to Switch DAL Implementation

Using repositories is not about being able to switch persistence technology (e.g., changing the database or using a web service instead).

The repository pattern does allow you to do that, but it’s not the main purpose.

A more realistic scenario is that UserRepository.GetUsersGroupOnSomeComplexQuery() uses ADO.NET directly while UserRepository.Create() uses Entity Framework. By doing so, you probably save a lot of time compared to struggling with LINQ to get your complex query running.

The repository pattern allows you to choose the technology that fits the current use case.

Unit Testing

When people talk about the repository pattern and unit tests, they are not saying that the pattern allows you to use unit tests for the data access layer.

What they mean is that it allows you to unit test the business layer. That’s possible because you can fake the repository (which is a lot easier than faking NHibernate/EF interfaces) and, by doing so, write clean and readable tests for your business logic.

As you’ve separated business from data, you can also write integration tests for your data layer to make sure that the layer works with your current database schema.

If you use ORM/LINQ in your business logic, you can never be sure why the tests fail. It can be because your LINQ query is incorrect, because your business logic is not correct or because the ORM mapping is incorrect.

If you have mixed them and fake the ORM interfaces, you can’t be sure either, because LINQ to Objects does not work in the same way as LINQ to SQL.

The repository pattern reduces the complexity of your tests and allows you to specialize your tests for the current layer.

How to Create a Repository

Building a correct repository implementation is very easy. In fact, you only have to follow a single rule:

Do not add anything into the repository class until the very moment that you need it.

A lot of coders are lazy and try to make a generic repository and use a base class with a lot of methods that they might need. YAGNI. You write the repository class once and keep it as long as the application lives (can be years). Why mess it up by being lazy? Keep it clean without any base class inheritance. It will make it much easier to read and maintain.

The above statement is a guideline and not a law. A base class can very well be justified. My point is that you should think before you add it, so that you add it for the right reasons.

Mixing DAL/Business

Here is a simple example of why it’s hard to spot bugs if you mix LINQ and business logic.

C#
var brokenTrucks = _session.Query<Truck>().Where(x => x.State == 1);
foreach (var truck in brokenTrucks)
{
   if (truck.CalculateResponseTime().TotalDays > 30)
       SendEmailToManager(truck);
}

What does that give us? Broken trucks?

Well. No. The statement was copied from another place in the code and the developer had forgotten to update the query. Any unit tests would likely just check that some trucks are returned and that they are emailed to the manager.

So we basically have two problems here:

  1. Most developers will likely just check the name of the variable and not the query.
  2. Any unit tests are against the business logic and not the query.

Both of those problems would have been fixed with repositories, since with repositories we have unit tests for the business logic and integration tests for the data layer.
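As a rough illustration, the example could be split as follows. This is only a sketch: FindBroken(), TruckService, the notifyManager callback, FakeTruckRepository and the BrokenSince property are hypothetical names that do not come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Truck
{
    public string Id { get; set; }
    public DateTime BrokenSince { get; set; }

    // Hypothetical stand-in for the response time calculation used above.
    public TimeSpan CalculateResponseTime(DateTime now)
    {
        return now - BrokenSince;
    }
}

// The query hides behind an intention-revealing method. Its correctness
// is checked by integration tests against the real database.
public interface ITruckRepository
{
    IEnumerable<Truck> FindBroken();
}

// Business logic: depends on the abstraction only, so it can be unit tested.
public class TruckService
{
    private readonly ITruckRepository _repository;
    private readonly Action<Truck> _notifyManager;

    public TruckService(ITruckRepository repository, Action<Truck> notifyManager)
    {
        _repository = repository;
        _notifyManager = notifyManager;
    }

    public void EscalateSlowRepairs(DateTime now)
    {
        foreach (var truck in _repository.FindBroken())
        {
            if (truck.CalculateResponseTime(now).TotalDays > 30)
                _notifyManager(truck);
        }
    }
}

// Hand-written fake used by the unit tests of the business logic.
public class FakeTruckRepository : ITruckRepository
{
    private readonly List<Truck> _trucks;

    public FakeTruckRepository(params Truck[] trucks)
    {
        _trucks = trucks.ToList();
    }

    public IEnumerable<Truck> FindBroken()
    {
        return _trucks;
    }
}
```

A unit test can now hand a FakeTruckRepository to TruckService and only assert which trucks were escalated, while a separate integration test verifies that the real FindBroken() returns broken trucks.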

Domain (Business) Entities vs DAL Entities

One discussion always comes up: which kind of objects should the repository return?

As the repository is an abstraction, it should always return whatever the layer above wants to work with, which in most cases are domain entities, i.e., the objects which will encapsulate the logic in your business code.

I usually start by mapping my domain entities directly in the data layer. In the case of this article, that means I'm using either Entity Framework Code First or FluentMappings for NHibernate.

Both NHibernate and Entity Framework Code First can return disconnected entities, i.e., entities which are not wrapped in transparent proxies (aka change tracking). When you turn that off, you get regular POCOs that are neither lazy loaded nor tracked. They work like any other object.

EF and NHibernate support non-public setters (properties), and the only compromise you have to make with Entity Framework is that it cannot initialize lists using fields. Thus you can still protect the state of your objects and use methods to change the state (just like any other well-defined domain entity).

Thus you can map your domain entities directly to NHibernate/Entity Framework without having to adjust them to fit the persistence technology. However, if you have to start adjusting your domain entities, you should of course create a specific class that is used to work with the persistence storage.

What I'm saying is that I try to follow YAGNI here too. Hence, I start by mapping my domain entities to the persistence library. However, I never compromise on the domain model. If I can't build the domain entities exactly as I need them to represent the business, I always create DAL-specific entities. By doing so, I also have to do conversions between domain entities and DAL entities in the repository. But hey, that's what the repository is for.
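Such a conversion could look roughly like the sketch below. Everything in it is assumed for illustration: the TruckRow class, the TruckMapper helper, the IsBroken/ReportBroken() members and the numeric state encoding are hypothetical and not taken from a real schema.

```csharp
using System;

// Domain entity: state can only be changed through methods.
public class Truck
{
    public Truck(string id, string modelName)
    {
        if (id == null) throw new ArgumentNullException("id");
        Id = id;
        ModelName = modelName;
    }

    public string Id { get; private set; }
    public string ModelName { get; private set; }
    public bool IsBroken { get; private set; }

    public void ReportBroken()
    {
        IsBroken = true;
    }
}

// DAL-specific entity, shaped after the (hypothetical) table.
public class TruckRow
{
    public string Id { get; set; }
    public string ModelName { get; set; }
    public int State { get; set; }
}

// Conversion code that would live inside the repository.
public static class TruckMapper
{
    private const int OperationalState = 1; // assumed encoding
    private const int BrokenState = 2;      // assumed encoding

    public static Truck ToDomain(TruckRow row)
    {
        var truck = new Truck(row.Id, row.ModelName);
        if (row.State == BrokenState)
            truck.ReportBroken();
        return truck;
    }

    public static TruckRow ToRow(Truck truck)
    {
        return new TruckRow
        {
            Id = truck.Id,
            ModelName = truck.ModelName,
            State = truck.IsBroken ? BrokenState : OperationalState
        };
    }
}
```

The repository would call ToDomain() on everything it loads and ToRow() on everything it saves, so callers never see TruckRow.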

Implementations

Here are some different implementations with descriptions.

Base Classes

These classes can be reused for all different implementations.

UnitOfWork

The unit of work represents a transaction when used in data layers. Typically, the unit of work will roll back the transaction if SaveChanges() has not been invoked before being disposed.

C#
public interface IUnitOfWork : IDisposable
{
    void SaveChanges();
}
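The calling pattern that this contract is meant for can be sketched as follows. RecordingUnitOfWork is a hypothetical in-memory stand-in that only records what happened, and UnitOfWorkRunner is an assumed helper name; neither is part of the article's code.

```csharp
using System;

public interface IUnitOfWork : IDisposable
{
    void SaveChanges();
}

// In-memory stand-in: remembers whether the work was committed or rolled back.
public class RecordingUnitOfWork : IUnitOfWork
{
    public bool Committed { get; private set; }
    public bool RolledBack { get; private set; }

    public void SaveChanges()
    {
        Committed = true;
    }

    public void Dispose()
    {
        // Disposing without a prior SaveChanges() counts as a rollback.
        if (!Committed)
            RolledBack = true;
    }
}

public static class UnitOfWorkRunner
{
    public static void Run(IUnitOfWork uow, Action work)
    {
        using (uow)
        {
            work();            // an exception here skips SaveChanges()
            uow.SaveChanges();
        }                      // Dispose() rolls back if nothing was saved
    }
}
```

Because the commit is the last statement inside the using block, any exception thrown by the work leaves the unit of work unsaved, and Dispose() takes care of the rollback.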

Paging

We also need paged results.

C#
public class PagedResult<TEntity>
{
    IEnumerable<TEntity> _items;
    int _totalCount;
    
    public PagedResult(IEnumerable<TEntity> items, int totalCount)
    {
        _items = items;
        _totalCount = totalCount;
    }
    
    public IEnumerable<TEntity> Items { get { return _items; } }
    public int TotalCount { get { return _totalCount; } }
}

With the help of that class, we can create methods like:

C#
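public class UserRepository
{
    public PagedResult<User> Find(int pageNumber, int pageSize)
    {
        // Load one page of users and the total number of matches here.
        throw new NotImplementedException();
    }
}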

Sorting

Finally, we want to be able to sort and page items, right?

C#
var constraints = new QueryConstraints<User>()
    .SortBy("FirstName")
    .Page(1, 20);
    
var page = repository.Find("Jon", constraints);

Do note that I used the property name, but I could also have written constraints.SortBy(x => x.FirstName). However, that is a bit hard to write in web applications where we get the sort property as a string.

The class is a bit big, but you can find it on GitHub.

If the data source supports LINQ, we can apply the constraints in our repository like this:

C#
public class UserRepository
{
    public PagedResult<User> Find(string text, QueryConstraints<User> constraints)
    {
        var query = _dbContext.Users.Where
                    (x => x.FirstName.StartsWith(text) || x.LastName.StartsWith(text));

        // Count before paging so that TotalCount covers every match.
        var count = query.Count();

        // Apply sorting and paging, then execute the query.
        var items = constraints.ApplyTo(query).ToList();

        return new PagedResult<User>(items, count);
    }
}

The extension methods are also available on GitHub.
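To give an idea of what such a class does, here is a stripped-down sketch. It is not the GitHub version: the fields, the validation and the choice to make ApplyTo() an instance method are assumptions made for illustration, and the User class is a minimal placeholder.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Minimal placeholder entity for the example.
public class User
{
    public string FirstName { get; set; }
}

// Hypothetical, simplified constraints class.
public class QueryConstraints<T> where T : class
{
    private string _sortProperty;
    private int _pageNumber;
    private int _pageSize;

    public QueryConstraints<T> SortBy(string propertyName)
    {
        // Fail early on names that do not exist on T (e.g. a bad query string).
        if (typeof(T).GetProperty(propertyName) == null)
            throw new ArgumentException("Unknown property: " + propertyName, "propertyName");

        _sortProperty = propertyName;
        return this;
    }

    public QueryConstraints<T> Page(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        _pageNumber = pageNumber;
        _pageSize = pageSize;
        return this;
    }

    public IQueryable<T> ApplyTo(IQueryable<T> query)
    {
        if (_sortProperty != null)
        {
            // Build x => x.<property> and pass it to Queryable.OrderBy.
            var parameter = Expression.Parameter(typeof(T), "x");
            var property = Expression.Property(parameter, _sortProperty);
            var keySelector = Expression.Lambda(property, parameter);
            var orderBy = Expression.Call(
                typeof(Queryable), "OrderBy",
                new[] { typeof(T), property.Type },
                query.Expression, Expression.Quote(keySelector));
            query = query.Provider.CreateQuery<T>(orderBy);
        }

        // Paging is only applied if Page() was called.
        if (_pageSize > 0)
            query = query.Skip((_pageNumber - 1) * _pageSize).Take(_pageSize);

        return query;
    }
}
```

Building the OrderBy call as an expression tree is what makes a string-based SortBy() possible while still letting the LINQ provider translate the sort into the database query.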

Entity Framework

Do note that the repository pattern is only useful if you have POCOs which are mapped using Code First. Otherwise, you’ll just break the abstraction with the entities instead, and the repository pattern isn’t very useful. You can follow this article if you want to get a foundation generated for you.

I usually start with a small repository definition:

C#
public interface IRepository<TEntity, in TKey> where TEntity : class
{
    TEntity Get(TKey id);
    void Save(TEntity entity);
    void Delete(TEntity entity);
}

which I then specialize per domain model:

C#
public interface ITruckRepository : IRepository<Truck, string>
{
    IEnumerable<Truck> FindAll();
    IEnumerable<Truck> Find(string text);
}

That specialization is important. It keeps the contract simple. Only create methods that you know you need.

Then I write the implementation:

C#
public class TruckRepository : ITruckRepository
{
    private readonly TruckerDbContext _dbContext;

    public TruckRepository(TruckerDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public Truck Get(string id)
    {
        return _dbContext.Trucks.FirstOrDefault(x => x.Id == id);
    }

    public void Save(Truck entity)
    {
        _dbContext.Trucks.Attach(entity);
    }

    public void Delete(Truck entity)
    {
        _dbContext.Trucks.Remove(entity);
    }

    public IEnumerable<Truck> FindAll()
    {
        return _dbContext.Trucks.ToList();
    }

    public IEnumerable<Truck> Find(string text)
    {
        return _dbContext.Trucks.Where(x => x.ModelName.StartsWith(text)).ToList();
    }
}

Unit of Work

The unit of work implementation is simple for Entity Framework:

C#
public class EntityFrameworkUnitOfWork : IUnitOfWork
{
    private readonly DbContext _context;

    public EntityFrameworkUnitOfWork(DbContext context)
    {
        _context = context;
    }

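    public void Dispose()
    {
        // Nothing to roll back: Entity Framework only writes to the
        // database when SaveChanges() is invoked.
    }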

    public void SaveChanges()
    {
        _context.SaveChanges();
    }
}

NHibernate

I usually use Fluent NHibernate to map my entities. IMHO, it has a much nicer syntax than the built-in code mappings. You can use the NHibernate mapping generator to get a foundation created for you, but you most often have to clean up the generated files a bit.

We can use the same base definition as for EF:

C#
public interface IRepository<TEntity, in TKey> where TEntity : class
{
    TEntity Get(TKey id);
    void Save(TEntity entity);
    void Delete(TEntity entity);
}

NHibernate is quite similar to Entity Framework, but it has a Get method which we can use. Hence, we create a base class:

C#
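// Note: variance modifiers such as "in" are only allowed on interfaces
// and delegates, so the class declares a plain TKey.
public class NHibernateRepository<TEntity, TKey> : IRepository<TEntity, TKey>
    where TEntity : class
{
    private readonly ISession _session;

    public NHibernateRepository(ISession session)
    {
        _session = session;
    }

    // Exposed so that derived repositories can build their own queries.
    protected ISession Session { get { return _session; } }

    public TEntity Get(TKey id)
    {
        return _session.Get<TEntity>(id);
    }

    public void Save(TEntity entity)
    {
        _session.SaveOrUpdate(entity);
    }

    public void Delete(TEntity entity)
    {
        _session.Delete(entity);
    }
}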

The specialization interface looks the same:

C#
public interface ITruckRepository : IRepository<Truck, string>
{
    IEnumerable<Truck> FindAll();
    IEnumerable<Truck> Find(string text);
}

But the implementation gets smaller:

C#
public class TruckRepository : NHibernateRepository<Truck, string>, ITruckRepository
{
    public TruckRepository(ISession session)
        : base(session)
    {
    }

    public IEnumerable<Truck> FindAll()
    {
        // The session is reached through the protected Session property of the base class.
        return Session.Query<Truck>().ToList();
    }

    public IEnumerable<Truck> Find(string text)
    {
        return Session.Query<Truck>().Where(x => x.ModelName.StartsWith(text)).ToList();
    }
}

Unit of Work

C#
public class NHibernateUnitOfWork : IUnitOfWork
{
    private readonly ISession _session;
    private ITransaction _transaction;

    public NHibernateUnitOfWork(ISession session)
    {
        _session = session;
        _transaction = _session.BeginTransaction();
    }

    public void Dispose()
    {
        if (_transaction != null)
            _transaction.Rollback();
    }

    public void SaveChanges()
    {
        if (_transaction == null)
            throw new InvalidOperationException("UnitOfWork has already been saved.");

        _transaction.Commit();
        _transaction = null;
    }
}

Typical Mistakes

Here are some mistakes that are easy to make when using OR/Ms.

Do Not Expose LINQ Methods

Let’s get it straight. There are no complete LINQ to SQL implementations. They are all either missing features or implement things like eager/lazy loading in their own way. That means that they are all leaky abstractions. So if you expose LINQ outside your repository, you get a leaky abstraction. In that case, you might as well stop using the repository pattern and use the OR/M directly.

C#
public interface IRepository<TEntity>
{
    IQueryable<TEntity> Query();
    
    // [...]
}

Those repositories really do not serve any purpose. They are just lipstick on a pig. Use your ORM directly instead.

Learn About Lazy Loading

Lazy loading can be great, but it’s a curse for everyone who is not aware of it. If you don’t know what it is, Google it.

If you are not careful, you could get 101 executed queries instead of 1 if you traverse a list of 100 items.

Invoke ToList() Before Returning

The query is not executed in the database until you invoke ToList(), FirstOrDefault() etc. So if you want to be able to keep all data related exceptions in the repositories, you have to invoke those methods.
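The effect can be demonstrated without a database. The sketch below uses LINQ to Objects and a hypothetical FailingSource() that simulates a data source which fails while it is being read; an OR/M query behaves analogously in that nothing is read until the result is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Stand-in for a data source that fails while it is being read.
    private static IEnumerable<int> FailingSource()
    {
        yield return 1;
        throw new InvalidOperationException("Simulated database failure");
    }

    // Returns a deferred query: nothing has been read yet, so the
    // failure surfaces later, in the caller's code.
    public static IEnumerable<int> FindDeferred()
    {
        return FailingSource().Where(x => x > 0);
    }

    // Materializes inside the "repository", so the failure surfaces
    // here and can be handled in the data layer.
    public static IList<int> FindMaterialized()
    {
        return FailingSource().Where(x => x > 0).ToList();
    }
}
```

Calling FindDeferred() succeeds and the exception only appears once the caller enumerates the result, whereas FindMaterialized() throws inside the method itself.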

Get Is Not the Same as Search

There are two types of reads which are made in the database.

The first one is to search for items, i.e., the user wants to identify the items that they want to work with.

The second one is when the user has identified the item and wants to work with it.

Those queries are different. In the first one, the user only wants to get the most relevant information. In the second one, the user likely wants to get all information. Hence in the former one, you should probably return UserListItem or similar while the other case returns User. That also helps you to avoid the lazy loading problems.

I usually let search methods start with FindXxxx() while those getting the entire item start with GetXxxx(). Also, don’t be afraid of creating specialized POCOs for the searches. Two searches don’t necessarily have to return the same kind of entity information.
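A minimal sketch of that split is shown below. The property names, FindByName() and InMemoryUserRepository are assumptions made for illustration; a real implementation would do the projection in the database query instead of in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

// Full entity, returned once the user has picked an item to work with.
public class User
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

// Slim POCO for search results: just enough to render a list.
public class UserListItem
{
    public int Id { get; set; }
    public string DisplayName { get; set; }
}

public interface IUserRepository
{
    // GetXxxx: returns the entire item.
    User Get(int id);

    // FindXxxx: returns only what the search view needs.
    IEnumerable<UserListItem> FindByName(string text);
}

// In-memory stand-in used to show the shape of the two reads.
public class InMemoryUserRepository : IUserRepository
{
    private readonly List<User> _users;

    public InMemoryUserRepository(IEnumerable<User> users)
    {
        _users = users.ToList();
    }

    public User Get(int id)
    {
        return _users.FirstOrDefault(x => x.Id == id);
    }

    public IEnumerable<UserListItem> FindByName(string text)
    {
        return _users
            .Where(x => x.FirstName.StartsWith(text) || x.LastName.StartsWith(text))
            .Select(x => new UserListItem
            {
                Id = x.Id,
                DisplayName = x.FirstName + " " + x.LastName
            })
            .ToList();
    }
}
```

Since UserListItem carries no navigation properties, there is nothing left to lazy load when the search result is rendered.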

Summary

Don’t be lazy and try to make repositories that are too generic. It gives you no upsides compared to using the OR/M directly. If you want to use the repository pattern, make sure that you do it properly.

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)