LINQ Part 4: A Deep Dive Into a Queryable Extension Method

Eric Lynch

Apr 25, 2018

CPOL

Part 4 in the LINQ series contrasts the System.Linq.Enumerable and System.Linq.Queryable extension methods and explores how expression trees are produced and consumed.

Download source - 7.1 KB

Introduction

In the previous article, we introduced IQueryable. That article was narrowly focused on introducing the reader to a few basic concepts:

The extension methods in System.Linq.Queryable operate on instances of IQueryable, by building an expression tree (Expression).
IQueryable simply pairs the expression tree (Expression) with a query provider (IQueryProvider).
The query provider (IQueryProvider) is responsible for interpreting the expression tree, executing the query, and fetching the results.
The query provider (IQueryProvider) may limit what can appear in an expression tree.

To explain this, we provided some examples of expression trees. However, that article was intentionally vague about how expression trees get built in the first place.

This article digs slightly deeper into that topic. It contrasts the extension methods in System.Linq.Enumerable with those in System.Linq.Queryable, and it demonstrates how some simple expression trees are actually produced and consumed.

Background

Because this article is quite short, I originally considered including it as a part of LINQ Part 3: An Introduction to IQueryable. However, I was concerned that this additional detail might distract from the basic concepts in that article. This article assumes you have already read the previous article.

This is the fourth in a series of articles on LINQ. Links to other articles in this series are as follows:

LINQ Part 1: A Deep Dive into IEnumerable
LINQ Part 2: Standard Methods - Tools in the Toolbox
LINQ Part 3: An Introduction to IQueryable
LINQ Part 4: A Deep Dive Into a Queryable Extension Method

Getting Ready

In the source included with this article, we examine two different implementations of the Where method. To do this, we first create both an IEnumerable and an IQueryable instance.

var things = new[] { "Red Apple", "Green Apple", "Red Balloon", "Green Balloon" };
var enumerable = things.AsEnumerable();
var queryable = things.AsQueryable();

So that we can take a closer look at the functionality, the source includes approximate equivalents of the standard System.Linq.Enumerable.Where and System.Linq.Queryable.Where methods: MyEnumerable.MyWhere and MyQueryable.MyWhere.

Syntactically, the calls to these two methods are identical:

enumerable = enumerable.MyWhere(item => item.StartsWith("Red"));
queryable = queryable.MyWhere(item => item.StartsWith("Red"));

However, they result in two entirely different method calls. Since enumerable is an instance of IEnumerable, it will call the MyEnumerable.MyWhere method.

In contrast, queryable is an instance of IQueryable, and will call the MyQueryable.MyWhere method.
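
The compiler makes this choice from the static type of the receiver. The following is a minimal sketch of that behavior; the Describe extension methods and the OverloadDemo class are hypothetical names invented for illustration (they are not part of the article's source) and simply mirror the two MyWhere signatures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical extension methods that mirror the two MyWhere signatures.
public static class DescribeExtensions
{
    // Chosen when the static type of the receiver is IEnumerable<T>.
    public static string Describe<T>(this IEnumerable<T> source, Func<T, bool> predicate)
        => "IEnumerable overload: delegate";

    // Chosen when the static type of the receiver is IQueryable<T>.
    public static string Describe<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
        => "IQueryable overload: expression tree (" + predicate.NodeType + ")";
}

public static class OverloadDemo
{
    public static string ForEnumerable()
    {
        IEnumerable<string> enumerable = new[] { "Red Apple", "Green Apple" };
        return enumerable.Describe(item => item.StartsWith("Red"));
    }

    public static string ForQueryable()
    {
        IQueryable<string> queryable = new[] { "Red Apple", "Green Apple" }.AsQueryable();
        return queryable.Describe(item => item.StartsWith("Red"));
    }

    public static void Main()
    {
        Console.WriteLine(ForEnumerable()); // IEnumerable overload: delegate
        Console.WriteLine(ForQueryable());  // IQueryable overload: expression tree (Lambda)
    }
}
```

The lambda text is identical in both calls; only the declared type of the variable differs.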

Help From the Compiler

The first surprising thing we'll note is that these two methods have entirely different parameter types for the predicate.

IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate);
IQueryable<T> MyWhere<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate);

How is it possible that we can (apparently) pass the exact same value for two different parameter types? The short answer is that we can't.

This subtle difference in method signature actually triggers two very different behaviors in the compiler.

In the first case (Func<T, bool>), the compiler simply constructs a delegate for the predicate method. This is what is passed to the MyEnumerable.MyWhere method.

In the second case (Expression<Func<T, bool>>), the compiler constructs an entire expression tree on your behalf and passes it as a parameter to the MyQueryable.MyWhere method. The tree it constructs is a Lambda node with one parameter (item), whose body is a Call to StartsWith on that parameter, with the Constant "Red" as its argument.
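
The two compiler behaviors can be observed without any LINQ method at all, by assigning the same lambda to each of the two target types. This is a small sketch; the LambdaConversionDemo class and its member names are made up for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaConversionDemo
{
    // Converted to a delegate: compiled code that can only be invoked.
    public static readonly Func<string, bool> AsDelegate =
        item => item.StartsWith("Red");

    // Converted to an expression tree: data that can be inspected.
    public static readonly Expression<Func<string, bool>> AsTree =
        item => item.StartsWith("Red");

    public static string DescribeTree()
    {
        var call = (MethodCallExpression)AsTree.Body;
        var argument = (ConstantExpression)call.Arguments[0];
        return AsTree.NodeType + " / " + call.NodeType + " " + call.Method.Name
            + " / " + call.Object.NodeType + " " + AsTree.Parameters[0].Name
            + " / " + argument.NodeType + " " + argument.Value;
    }

    public static void Main()
    {
        Console.WriteLine(AsDelegate("Red Apple"));          // True
        Console.WriteLine(DescribeTree());
        // Lambda / Call StartsWith / Parameter item / Constant Red

        // A tree can still be turned into a delegate on demand.
        Console.WriteLine(AsTree.Compile()("Green Apple"));  // False
    }
}
```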

Same Name - Very Different Methods

We have two very different implementations of the MyWhere method: MyEnumerable.MyWhere and MyQueryable.MyWhere. Let's take a quick look at how they work.

MyEnumerable.MyWhere

In MyEnumerable.MyWhere, we operate directly upon the original IEnumerable. The logic here is very simple.

We create a new instance of IEnumerable that only includes items, from the original IEnumerable, which match the predicate condition. The standard Enumerable.Where method provides similar functionality, but also includes some performance optimizations.

public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
  if (source == null)
    throw new ArgumentNullException(nameof(source));

  if (predicate == null)
    throw new ArgumentNullException(nameof(predicate));

  // Loop and simply yield each item where the predicate is true...
  foreach (T item in source)
    if (predicate(item))
      yield return item;
}
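
One behavioral detail of the listing above: because it is an iterator method (it contains yield return), its null checks run only when the result is first enumerated, not when MyWhere is called. A common way to validate arguments eagerly is to split the method in two. The sketch below shows one possible variation; MyWhereEager and Iterate are hypothetical names, not part of the article's source.

```csharp
using System;
using System.Collections.Generic;

public static class MyEagerEnumerable
{
    // Not an iterator: the argument checks run as soon as the method is called.
    public static IEnumerable<T> MyWhereEager<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        if (predicate == null)
            throw new ArgumentNullException(nameof(predicate));

        return Iterate(source, predicate);
    }

    // The iterator: its body runs lazily, once enumeration begins.
    private static IEnumerable<T> Iterate<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
            if (predicate(item))
                yield return item;
    }

    public static void Main()
    {
        var things = new[] { "Red Apple", "Green Apple", "Red Balloon", "Green Balloon" };
        Console.WriteLine(string.Join(", ", things.MyWhereEager(item => item.StartsWith("Red"))));
        // Red Apple, Red Balloon

        try
        {
            things.MyWhereEager(null); // throws here, before any enumeration
        }
        catch (ArgumentNullException e)
        {
            Console.WriteLine(e.ParamName); // predicate
        }
    }
}
```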

MyQueryable.MyWhere

In MyQueryable.MyWhere, we do something very different. We create a new instance of IQueryable that has a slightly altered version of the original expression tree (passed in as a parameter).

Principally, we wrap the original expression in a MethodCallExpression for the standard Queryable.Where method. The standard Queryable.Where method provides similar functionality, by wrapping the original expression in a self-referencing MethodCallExpression.

public static IQueryable<T> MyWhere<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
{
  if (source == null)
    throw new ArgumentNullException(nameof(source));

  if (predicate == null)
    throw new ArgumentNullException(nameof(predicate));

  // Get the method information for the true Queryable.Where method
  MethodInfo whereMethodInfo = GetMethodInfo<T>((s, p) => Queryable.Where(s, p));

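  // Create arguments for a call to the true Queryable.Where method. Note that
  // we quote the lambda expression for the predicate. Quote gives the argument
  // node the type Expression<Func<T, bool>>, which is the parameter type of
  // Queryable.Where; an unquoted lambda node has the delegate type Func<T, bool>.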
  var callArguments = new[] { source.Expression, Expression.Quote(predicate) };

  // Create an expression that calls the true Queryable.Where method
  var callToWhere = Expression.Call(null, whereMethodInfo, callArguments);

  // Return the new query that wraps the original expression in a call to the Queryable.Where method
  return source.Provider.CreateQuery<T>(callToWhere);
}

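In the new expression tree, a Call node for Where has two arguments: the expression of the source queryable and a Quote node that wraps the predicate Lambda.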
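
The MyWhere listing calls a GetMethodInfo helper that is not shown here. The following sketch is one plausible way to write such a helper, inferred only from how it is called; the version in the article's download may differ. It relies on the compiler to build an expression tree whose body is the method call of interest.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class MethodInfoHelper
{
    // Hypothetical stand-in for the GetMethodInfo helper used by MyWhere.
    public static MethodInfo GetMethodInfo<T>(
        Expression<Func<IQueryable<T>, Expression<Func<T, bool>>, IQueryable<T>>> call)
    {
        // The lambda's body is a MethodCallExpression; its Method property
        // is the (closed generic) method that the lambda calls.
        return ((MethodCallExpression)call.Body).Method;
    }

    public static void Main()
    {
        MethodInfo where = GetMethodInfo<string>((s, p) => Queryable.Where(s, p));
        Console.WriteLine(where.DeclaringType + "." + where.Name); // System.Linq.Queryable.Where
        Console.WriteLine(where.GetGenericArguments()[0]);         // System.String
    }
}
```

Letting the compiler resolve the overload avoids searching for the method by name through reflection, where the two Where overloads would have to be told apart by hand.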

Not a True Method Call

An astute observer may wonder why the Queryable.Where method would wrap the expression in a self-referencing MethodCallExpression. This would seem to be a recipe for infinite recursion and a stack overflow.

In this behavior, Queryable.Where is far from alone. Most of the extension methods in the Queryable class have a similar implementation.

So, what's going on here? Why don't we see infinite recursion / stack overflow?

The fact is that no IQueryProvider worth its salt is ever going to truly call these methods. Think of these MethodCallExpression nodes as an advertisement of intent.

With the MethodCallExpression node referencing the Queryable.Where method, we're simply informing the query provider that it should limit the results to those matching the predicate condition. We do not expect it will actually call this method.

At the time of iteration, the query provider (IQueryProvider) is responsible for interpreting the expression tree, executing the query, and fetching the results.

As demonstrated in this code, it is possible to write your own IQueryable extension methods. However, to ensure that any IQueryProvider can interpret your expression tree, it is important that you limit its contents to those that might appear in an expression tree created by one of the standard methods.

Let's consider two common query providers: LINQ to Objects and LINQ to SQL.

LINQ to Objects (System.Linq.EnumerableQuery)

In our example, we create our IQueryable via a call to the Queryable.AsQueryable method. This creates an instance of System.Linq.EnumerableQuery. This class is really only a thin wrapper around the LINQ to Objects extension methods (in System.Linq.Enumerable).

Note: The System.Linq.EnumerableQuery class implements both IQueryable and IQueryProvider. So, it's doing double duty here: as both a queryable and a query provider.
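
This double duty can be checked with a few lines of code. The sketch below uses only the standard AsQueryable method; the EnumerableQueryDemo class name is invented for this example.

```csharp
using System;
using System.Linq;

public static class EnumerableQueryDemo
{
    public static string QueryableTypeName()
    {
        IQueryable<string> queryable = new[] { "Red Apple", "Green Apple" }.AsQueryable();
        return queryable.GetType().Name;
    }

    public static bool ProviderIsSameObject()
    {
        IQueryable<string> queryable = new[] { "Red Apple", "Green Apple" }.AsQueryable();

        // For EnumerableQuery<T>, the provider is the queryable itself.
        return ReferenceEquals(queryable, queryable.Provider);
    }

    public static void Main()
    {
        Console.WriteLine(QueryableTypeName());    // EnumerableQuery`1
        Console.WriteLine(ProviderIsSameObject()); // True
    }
}
```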

When we begin to iterate this queryable, we make a call to its GetEnumerator method (declared by IEnumerable<T>, which IQueryable<T> extends). This method rewrites the expression tree before compiling and executing it.

Along with other changes, it finds all MethodCallExpression nodes in the tree that reference a method declared in the System.Linq.Queryable class. It then replaces these nodes with MethodCallExpression nodes that reference equivalent methods in the System.Linq.Enumerable class.

So, in our example, Queryable.Where becomes Enumerable.Where.

We're not really calling the Queryable.Where method referenced in the original expression tree. Instead, we're calling the equivalent Enumerable.Where method referenced in the re-written tree.
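
To make the substitution concrete, here is a deliberately simplified sketch of such a rewrite. It is not the rewriter that EnumerableQuery actually uses: the WhereRewriter class is hypothetical, it handles only the two-argument Queryable.Where call, and it leaves every other node untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Simplified illustration only: rewrites Queryable.Where(source, predicate).
public sealed class WhereRewriter : ExpressionVisitor
{
    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType == typeof(Queryable) && node.Method.Name == "Where")
        {
            Expression source = Visit(node.Arguments[0]);

            // Strip the Quote node to recover the predicate lambda.
            var predicate = (LambdaExpression)((UnaryExpression)node.Arguments[1]).Operand;
            Type elementType = node.Method.GetGenericArguments()[0];

            // Build a call to Enumerable.Where<T> with the same source and predicate.
            return Expression.Call(typeof(Enumerable), nameof(Enumerable.Where),
                new[] { elementType }, source, predicate);
        }

        return base.VisitMethodCall(node);
    }
}

public static class RewriteDemo
{
    public static List<string> Run()
    {
        IQueryable<string> queryable =
            new[] { "Red Apple", "Green Apple", "Red Balloon", "Green Balloon" }
                .AsQueryable()
                .Where(item => item.StartsWith("Red"));

        // Rewrite the tree, then compile and execute the rewritten version.
        Expression rewritten = new WhereRewriter().Visit(queryable.Expression);
        Func<IEnumerable<string>> run =
            Expression.Lambda<Func<IEnumerable<string>>>(rewritten).Compile();
        return run().ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // Red Apple, Red Balloon
    }
}
```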

LINQ to SQL (System.Data.Linq.Table)

With LINQ to SQL, something very different occurs. We create an instance of the System.Data.Linq.Table class. This class is remarkably complex.

Note: System.Data.Linq.Table serves as both the IQueryable and IQueryProvider for LINQ to SQL.

When we begin to iterate this queryable, a lot happens. In order to explain it, we'll be leaving out much of the fine detail and instead discussing it at a high level.

Basically, this provider visits each of the nodes in the expression tree, so that it can create the text for a complete SQL command (e.g. SELECT * FROM Items WHERE Color = 'Red').

In this case, the expression tree might include a MethodCallExpression for the Queryable.Where method. This method call would be translated into the text for a SQL WHERE clause (e.g. WHERE Color = 'Red').

Once again, we're not actually calling the Queryable.Where method, referenced in the original expression tree. Instead, that MethodCallExpression simply provides information that is formatted into text.

When the complete SQL command is formatted, the query provider simply uses ADO.NET's DbConnection / DbCommand to execute it. It then returns an IEnumerator that iterates over the result set.

As it iterates, it creates appropriate instances and sets instance properties to their corresponding column values.

Of course, this is an oversimplification, but it does provide a taste of the high-level functionality of this complex provider.
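
As a toy illustration of translating intent into text, the sketch below turns one very specific shape of expression tree into a SQL string. It bears no relation to the real LINQ to SQL implementation: the Item class, the ToySqlTranslator class, and the table name are all hypothetical, and only Where(x => x.Member == constant) is understood.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this illustration.
public class Item
{
    public string Color { get; set; }
}

public static class ToySqlTranslator
{
    // Translates Queryable.Where(source, x => x.Member == constant) into SQL text.
    // Anything else is rejected, much as a real provider may limit what it accepts.
    public static string Translate(Expression expression, string tableName)
    {
        if (expression is MethodCallExpression call
            && call.Method.DeclaringType == typeof(Queryable)
            && call.Method.Name == "Where")
        {
            // The predicate arrives wrapped in a Quote node.
            var lambda = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;

            if (lambda.Body is BinaryExpression binary
                && binary.NodeType == ExpressionType.Equal
                && binary.Left is MemberExpression member
                && binary.Right is ConstantExpression constant)
            {
                // A real provider would pass the value as a command parameter;
                // it is inlined here only to keep the output readable.
                string literal = "'" + Convert.ToString(constant.Value).Replace("'", "''") + "'";
                return "SELECT * FROM " + tableName + " WHERE " + member.Member.Name + " = " + literal;
            }
        }

        throw new NotSupportedException("Only Where(x => x.Member == constant) is supported.");
    }

    public static string Demo()
    {
        // The Where method is never invoked against data; we only read the tree.
        IQueryable<Item> query = new Item[0].AsQueryable().Where(item => item.Color == "Red");
        return Translate(query.Expression, "Items");
    }

    public static void Main()
    {
        Console.WriteLine(Demo()); // SELECT * FROM Items WHERE Color = 'Red'
    }
}
```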

Same Name - Same Results

In our example source code, when we iterate either enumerable or queryable, we get the same results:

WriteItems(enumerable);
// Red Apple, Red Balloon

WriteItems(queryable);
// Red Apple, Red Balloon

To prove that we are truly dealing with an expression tree, in the case of queryable, we can simply examine its Expression property. Note: The source includes a simple derivation of System.Linq.Expressions.ExpressionVisitor (to dump the expression tree to the console).

WriteExpression(queryable.Expression);
// Call Where
//   Constant "System.String[]"
//   Quote
//     Lambda
//       Call StartsWith
//         Parameter item
//         Constant "Red"
//       Parameter item
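
The visitor behind WriteExpression ships with the article's source and is not reproduced here. As a rough idea of how such a dump can be produced, the following sketch records only the node type of each node, indented by depth; NodeTypeDumper is a hypothetical stand-in and prints less detail than the output above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the article's expression dumper.
public sealed class NodeTypeDumper : ExpressionVisitor
{
    private int _depth;

    public List<string> Lines { get; } = new List<string>();

    public override Expression Visit(Expression node)
    {
        if (node == null)
            return null; // e.g. the Object of a static method call

        // Record this node, then let the base class dispatch to its children.
        Lines.Add(new string(' ', _depth * 2) + node.NodeType);
        _depth++;
        Expression result = base.Visit(node);
        _depth--;
        return result;
    }
}

public static class DumpDemo
{
    public static List<string> Run()
    {
        IQueryable<string> queryable =
            new[] { "Red Apple", "Green Apple", "Red Balloon", "Green Balloon" }
                .AsQueryable()
                .Where(item => item.StartsWith("Red"));

        var dumper = new NodeTypeDumper();
        dumper.Visit(queryable.Expression);
        return dumper.Lines;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

The first recorded line is Call (the Where node) and the last is Parameter (the lambda's parameter), mirroring the shape of the dump shown above.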

Summary

From this article, we should now understand the following about most methods in the System.Linq.Queryable class:

The C# compiler actually does a lot of the work. It creates expression trees from Lambda expressions. The expression trees are then passed, as parameters, to the relevant method.
Most of the methods simply wrap the original expression tree in a self-referencing MethodCallExpression. They then create a new IQueryable instance that references the resulting expression tree.
The self-referencing MethodCallExpression, in the created expression tree, is never actually called. Instead, it merely advertises intent to the query provider (IQueryProvider).
The query provider acts upon this advertised intent. In some cases, it may translate the intent into an equivalent method call (e.g. Enumerable.Where). In others, it may translate the intent into the text for some query language (e.g. SQL).

History

4/25/2018 - The original version was uploaded