LINQ and Deferred Execution

As of .NET 3.0, LINQ (and the often related Lambda Expressions) have been available for our use and abuse. LINQ stands for Language INtegrated Query, and is a method of modelling OO data in a more or less relational sense that is not unlike databases. And just like databases, it comes with a cost.

To offset this cost, LINQ uses Deferred Execution. Deferred Execution means that the code is not executed until it is needed. This means that the LINQ code that you write is not actually executed until you NEED to execute it – typically during an enumeration of the results.

An example. Let’s create an array of 1,000,000 integers, all of random values between 1 and 10000, and sort them in an ascending fashion using LINQ:

static void Main(string[] args)
{
    // Anytime that you know the size of your list, specify 
    // it in the constructor. This enables much more efficient 
    // processor and memory usage
    var myIntegers = new List<int>(1000000);

    // Initialize RNG
    var random = new Random();

    // Populate the list with random numbers
    for (int i = 0; i < 1000000; i++)
    {
        myIntegers.Add(random.Next(1, 10000));
    }

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    // LINQ time, let's sort them
    var result = myIntegers.OrderBy(i => i);

    stopwatch.Stop();
    Console.WriteLine("LINQ OrderBy Time: " + stopwatch.ElapsedMilliseconds + " ms");

    Console.ReadKey();
}

And the output:

/img/linq-order-by-result.png

LINQ OrderBy Result

Note how little time it took to Order the results – only 0 ms! Seems a bit fishy that we sorted 1,000,000 integers in less than 1 millisecond, doesn’t it? This is because our application didn’t actually sort them at all. Via the power of Deferred Execution, your .NET application is intelligent enough to know not to actually execute any LINQ queries until you NEED to. Note that since we never did anything with the result, the sort never actually happened.

What did happen, however, was the creation of a .NET object called an Expression Tree. An Expression Tree is used for “meta programming” – basically writing code that writes code. LINQ automatically generates an Expression Tree for your query – which it can build upon as you tag on additional queries – that isn’t executed until it needs to be executed. This allows you to do all of the neat things that LINQ supports like joins, selecting based on a particular class property or value, and so on. The generation of this Expression Tree is actually all that has happened so far in our application, and it took approximately 0 ms – cheap!

Recall now that LINQ can be performed on 2 types of objects: IEnumerable<T> and IQueryable<T>. Since IEnumerable<T> exposes only one method, GetEnumerator, it is safe to say that LINQ queries against IEnumerable<T> objects are only actually executed whenever enumeration actually occurs. To prove this, let’s alter the code slightly by adding a ToList() call to our OrderBy() call – ToList() forces an enumeration of the collection to gather results, so our OrderBy will actually do the work of ordering things:

static void Main(string[] args)
{
    // Anytime that you know the size of your list, specify 
    // it in the constructor. This enables much more efficient 
    // processor and memory usage
    var myIntegers = new List<int>(1000000);

    // Initialize RNG
    var random = new Random();

    // Populate the list with random numbers
    for (int i = 0; i < 1000000; i++)
    {
        myIntegers.Add(random.Next(1, 10000));
    }

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    // LINQ time, let's sort them
    // This time we force the enumeration with ToList()
    var result = myIntegers.OrderBy(i => i).ToList();

    stopwatch.Stop();
    Console.WriteLine("LINQ OrderBy Time: " + stopwatch.ElapsedMilliseconds + " ms");

    Console.ReadKey();
}

And the result:

/img/linq-order-by-then-any-result.png

LINQ OrderBy Then Any Result

A whopping 356 ms – seems a little more realistic for performing a sort than 0 ms!

Why is LINQ done this way? Why not execute each query immediately as the LINQ method is called? The answer is efficiency. I’d love to get further into the nitty gritty details of why, but Charlie Calvert explains it better than I probably would. I will, however, cover Expression Trees in a future post – they’re a ton of fun.


See also