Thursday, July 23, 2009

IEnumerable.Count() is a Code Smell

Way too often I come across code like this:

IEnumerable<Product> products = GetProducts();
if (products.Count() == 0) {
      SomePlaceHolder.Visible = false;
} else {
      SomePlaceHolder.Visible = true;
      SomeRepeater.DataSource = products;
      SomeRepeater.DataBind();
}

What’s wrong with this is that Count() is a LINQ extension method that literally iterates through every item in an enumerable. In other words it won’t short circuit after it finds more than zero items -- it will continue iterating through every single item. And while that may be a fast operation in dev, it may be a really slow operation in production.

But it gets worse. Because of LINQ’s deferred execution the results of .Count() are not cached. So when there is at least one product you are guaranteed to enumerate every single product twice: once for .Count() and once for .DataSource = products.

Do Any() Solutions Exist?

One possibility is to instead use IEnumerable.Any(). So you could rewrite the above code like this:

IEnumerable<Product> products = GetProducts();
bool anyProducts = products.Any();
SomePlaceHolder.Visible = anyProducts;
if (anyProducts) {
      SomeRepeater.DataSource = products;
      SomeRepeater.DataBind;
}

The results will be better regardless of if your data source is in memory or a database, but for illustration purposes LINQ to SQL outputs the following SQL if you run the code smell version when there is at least one product in the database:

SELECT COUNT(*) AS [value]
FROM [dbo].[Product] AS [t0]
SELECT [t0].[ProductId], [t0].[ProductName], ...
FROM [dbo].[Product] AS [t0]

And it outputs this SQL if you run the better version when there is at least one product:

SELECT
    (CASE
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM [dbo].[Product] AS [t0]
            ) THEN 1
        ELSE 0
     END) AS [value]
SELECT [t0].[ProductId], [t0].[ProductName]
FROM [dbo].[Product] AS [t0]

Looks a little uglier, but if you run the execution plan you’ve converted two table scans into an index scan and a table scan (well, depending on your schema of course). Regardless, much better execution-wise.

Undeferring deferred execution

Now what if your data source is a database and you really want to optimize the heck out of this and only perform one SQL statement? You could disable deferred execution by doing this:

IList<Product> products = GetProducts().ToList();
bool anyProducts = products.Any();
SomePlaceHolder.Visible = anyProducts;
if (anyProducts) {
      SomeRepeater.DataSource = products;
      SomeRepeater.DataBind;
}

The .ToList() function forces enumeration to happen immediately and cache the results in a list. This achieves the result of converting two queries into one, but be warned it also destroys one of the nicest benefits of LINQ’s deferred execution: the ability to append .Sort() or .Skip().Take() (paging) statements at the very end and have them execute in your original query, in SQL, in the database, where they belong.

Summary

IEnumerable.Count() seems like a pleasant and friendly convenience function; it feels so similar to the innocuous List.Count -- but don’t be fooled. It’s a mistake. One you (or the person who maintains the system when you leave) may soon regret.

4 comments:

itsliketoobright said...

Yah, I love the contractor we have who writes idiot code like this:

var things = //linq query here

for(var i = 0; i < things.count(); i++)
things.ToList()[i]

I wish I was joking.

Anonymous said...

Thank you very much for posting this one. Would've normally be inclined to use it until I read your article.

Lee Richardson said...

itsliketoobright: Wow! Awesome!

Anonymous said...

A great example of why you should know your data types, and specifically what an enumerator does and does not do. Just because you can count something with an extension method of a class that only guarantees forward movement, doesn't mean you should.

You wouldn't want to count data length with a stream either.