Thursday, March 27, 2008

Expression Trees: Why LINQ to SQL is Better than NHibernate

In my last post I described how the Where() function works for LINQ to Objects via extension methods and the yield statement. That was interesting. But where things get crazy is how the other LINQ technologies, like LINQ to SQL use extension methods. In particular it’s their use of a new C# 3 feature called expression trees that makes them extremely powerful. And it’s an advantage that more traditional technologies like NHibernate will never touch until they branch out from being a simple port of a Java technology. In this post I’ll explain the inherent advantage conferred on LINQ technologies by expression trees and attempt to describe how the magic works.

What’s so Magic about LINQ to SQL?

LINQ to SQL (and it’s more powerful unreleased cousin LINQ to Entities) is a new Object Relational Mapping (ORM) technology from Microsoft. It allows you to write something like the following:

IEnumerable<Product> products = northwindDataContext.Products.Where(
      p => p.Category.CategoryName == "Beverages"
      );

Which as you’d expect returns products from the database whose category is Beverages. But wait, aren’t you impressed? If not read over that code again, you should be very impressed. In the background that C# code is converted into the following SQL:

SELECT [t0].[ProductID], [t0].[ProductName], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1]
ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE [t1].[CategoryName] = @p0

In other words it’s pretty smart. It isn’t just returning all products and filtering them in memory using the LINQ to Objects version of Where() I discussed previously.

Doing something like that using NHibernate Criteria would require something like this:

ICriteria c = session.CreateCriteria(typeof(Product));
c.Add(Expression.Eq("Category.CategoryName", "Beverages"));
IEnumerable<Product> products = c.List<Product>();

You could use HQL too, but both NHibernate options suffer from the same problem. Did you spot it?

The LINQ to SQL version is taking actual strongly typed C# code and somehow smartly converting it to useful SQL. The NHibernate version does the same thing, but always using a weakly typed alternative. In other words the column “CategoryName” in NHibernate is a string. If it or its data type change in NHibernate you won’t find out until runtime. And that is the beauty of LINQ to SQL: you’ll find more errors at compile time. And if you’re like me you want the compiler to find your mistakes before the unit tests that you (or your fellow developers) may or may not have written do.

So you’re probably now wondering if you can put strongly typed C# in your where clause and it somehow magically gets converted to SQL, what’s the limit? If you put in a String.ToLower() or StartsWith() will it get converted to equivalent SQL? What about a loop or conditional? A function call? A recursive function call? At some point it has to break down and either return all products and filter them in memory or just fail right? Before answering those questions we need to understand what’s going on.

Understanding the Magic

The Magic happens in a class called Expression<T>. Expression takes a generic argument that must be a delegate and is usually one of the built in Func methods.  However the class can only be instantiated to a lambda expression. That’s right, not a delegate or anonymous method, only a Lambda expression. So in my deferred execution post where I explained what Lambda expression are, I said they were essentially syntactic sugar for an anonymous methods. Well, the emphasis is on the essentially, because they really aren’t sugar at all. When you assign a lambda expression to an Expression, the compiler, rather than generating the IL to evaluate the expression, generates IL that constructs an abstract syntax tree (AST) for the expression! You can then parse the tree and perform actions based on the code in the lambda expression.

Below is an example adapted from the .Net Developer’s guide on MSDN that shows how this works:

// convert the lambda expression to an abstract syntax tree
Expression<Func<int, bool>> expression = i => i < 5;

ParameterExpression param = (ParameterExpression)expression.Parameters[0];
// this next line would fail if we change the Lambda expression much
BinaryExpression operation = (BinaryExpression)expression.Body;
ParameterExpression left = (ParameterExpression)operation.Left;
ConstantExpression right = (ConstantExpression)operation.Right;

Console.WriteLine("Decomposed expression: {0} => {1} {2} {3}",
      param.Name,
      left.Name,
      operation.NodeType,
      right.Value
      );

This outputs “Decomposed expression: i => i LessThan 5”. The first line is the most important. It defines an Expression that takes a delegate with a single int parameter and a return type of bool. It then instantiates the Expression to a simple lambda expression.  Incidentally this would also work if we defined our own Delegate:

public delegate bool LessThanFive(int i);

public static void DoStuff() {
      Expression<LessThanFive> expression = i => i < 5;
}

It would, however, not work if we used an anonymous method:

Expression<Func<int, bool>> expression = delegate(int i) { return i < 5; };

While that looks legal it actually results in the compile time error “An anonymous method expression cannot be converted to an expression tree.”

There is a lot of complexity in parsing the AST, far beyond the scope of this article. However, the MSDN does have a nice diagram that helps explain how the following slightly more complicated Lambda expression that determines if a string has more letters than a number:

Expression<Func<string, int, bool>> expression =
    (str, num) => num > str.Length;

How Deep Does The Rabbit Hole Go?

So LINQ to SQL uses this Expression Tree technique to parse a plethora of possible code that you could throw at it and turn it into smart SQL. For instance check out a couple of the following conversions that LINQ to SQL will (or will not) perform:

p => p.Category.CategoryName.ToLower() == "beverages"

Results In:

SELECT [t0].[ProductID], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE LOWER([t1].[CategoryName]) = @p0

Not bad, huh? How about:

p => p.Category.CategoryName.Contains("everage")

That results in the following SQL snippet:

WHERE [t1].[CategoryName] LIKE @p0

And it sets @p0 to “%everage%”. Pretty cool. Ok this will get it to fail though, right?

public static string GetCat() {
    return "Beverages";
}

IEnumerable<Product> products = northwindDataContext.Products.Where(
      p => p.Category.CategoryName == GetCat()
      );

It turns out that LINQ to SQL will look inside of other functions! Alright, there’s no way it can do complicated conditionals:

p => p.Category.CategoryName ==
    "Beverages" ? p.UnitsInStock < 5 : !p.Discontinued

This should only pick up Beverages that have fewer than 5 items in stock regardless of whether they are discontinued and any other products that aren’t discontinued. Would you believe that it runs a single SQL statement:

SELECT [t0].[ProductID], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE (
    (CASE
        WHEN [t1].[CategoryName] = @p0 THEN
            (CASE
                WHEN [t0].[UnitsInStock] < @p1 THEN 1
                WHEN NOT ([t0].[UnitsInStock] < @p1) THEN 0
                ELSE NULL
             END)
        ELSE CONVERT(Int,
            (CASE
                WHEN NOT ([t0].[Discontinued] = 1) THEN 1
                WHEN NOT NOT ([t0].[Discontinued] = 1) THEN 0
                ELSE NULL
             END))
     END)) = 1

Wow, it sure isn’t pretty, but it scales to multiple conditionals, and most importantly it didn’t return all products and process them in memory. Not bad.

Conclusion

I asserted up front that using expression trees and the strong typing that comes with them is the reason LINQ to SQL is inherently better that NHibernate. I really can’t make that claim without admitting one of LINQ to SQL’s biggest shortcomings: It currently does not support multiple table inheritance. Ultimately, however, it’s a short term fault since the forthcoming LINQ to Entities does. And I stand by my claim because from a long term perspective as long as technologies like NHibernate remain pure ports of Java code they will never realize the full benefits of equivelant LINQ technologies that take advantage of .Net's native strengths: like expression trees.

11 comments:

Leonardoavs said...

Lo primero NHibernate es un API no es un añadido al compilador si claro LINQ es bueno pero Linq por sí solo no tiene dos niveles de cache, no tiene herencia, no tiene lazy loading, el HQL es un lenguaje de consulta orientado a Objetos que se traduce una ves obtenido el String en un AST o un árbol de sintaxis abstracto. Linq es representado por una serie de añadiduras al lenguaje como lo son las expresiones lamda, los métodos de extensión y muchas otras cosas si Linq tiene verificación de tipos en tiempo de compilación pero lo que no te dicen es que tienes que tener una clase construida (Una clase que represente o mapee un objeto de base de datos una entidad o lo que sea) para que esto suceda como en Linq to SQL o Linq to Entities. Entonces en definitiva y para no arruinarle la fiesta Linq y NHibernate no son lo mismo y me alegro que los MS hayan hecho un ORM cuyo concepto fue popularizado por Hibernate. Por último se está construyendo un Linq to NHibernate así que nunca diga nunca y no solo es un porte es una gran tecnología que MS esta copiando.

dotstop said...

To me , that has almost NOTHING to do with NHibernate.

I mean sure, strongly typed. But you can't compare that LinqToSql to NHibernate .

You can compare the ToSql part with NHibernate and say : it produces better querys and compare mapping styles and what you can do with them.

There already is a project LinqToNhibernate. Then you will have typeSafety.

In my view your point is flawed.

Anonymous said...

Of course, LINQ to SQL has the huge disadvantage of only working with SQL Server. One of the advantages of O/R mappers is database independence, and you give that up with LINQ to SQL.

I always write a unit test for queries against the O/R mapper we're using (Wilson with a custom wrapper) to avoid runtime errors.

JD said...

@anon You might want to check out the beta of LightSpeed 2.0 which includes LINQ support (and therefore LINQ support of Oracle, PostgreSQL, MySQL, SQLite and, of course, SQL Server).

http://www.mindscape.co.nz/blog/index.php/2008/03/17/wanted-linq-to-lightspeed-beta-testers/

- JD

James Gregory said...

As dotstop said, this has barely anything to do with NHibernate.

To say that Linq to SQL is better than NHibernate is misleading. To say that LinqToSql is better than NHibernate's query engine may be fair, but there are many other benefits that NHibernate has over LinqToSql.

Combine those with the LinqToNHibernate project, or the Rhino.Commons additions and you're onto a type-safe winner.

Mike Griffin said...

We at EntitySpaces are pretty on par with LINQ as far as our DynamicQuery API goes, check it out... I'm not trying to spam, just showing you that LINQ is not the only way, or even the first, nor does it go half the places where we do.

http://www.entityspaces.net/Portal/Default.aspx

Subselects
http://www.entityspaces.net/blog/2008/03/03/EntitySpaces2008DynamicSubQueryShowcase.aspx

Joins
http://www.entityspaces.net/blog/2007/09/20/EntitySpacesAddsJoinsAndArithmeticExpressionsII.aspx

And it runs on SQL, Oracle, MySQL, PostgreSQL, VistaDB, Access, Mono, Compact Framework

Mike Griffin said...

Dang it, sorry it didn't post well
=======================

We at EntitySpaces are pretty on par with LINQ as far as our DynamicQuery API goes, check it out... I'm not trying to spam, just showing you that LINQ is not the only way, or even the first, nor does it go half the places where we do.

EntitySpaces

SubSelects

Joins

And it runs on SQL, Oracle, MySQL, PostgreSQL, VistaDB, Access, Mono, Compact Framework

Jimmy Bogard said...

"Simple port of a Java technology"

That's an .... interesting description of NHibernate. Both LINQ to SQL and LINQ to Entities are _years_ behind other ORM's, including NHibernate.

Expression trees? LINQ to NHibernate is fairly complete, lacking support in some edge cases.

NHibernate's advantage is that it supports persistence ignorance, something that LINQ to Entities won't support. Additionally, LINQ to Entities' core architecture (the 3 Models) is deeply flawed.

Not only does LINQ to SQL not support multiple table inheritance, but many, many mapping options that NHibernate does.

LINQ to SQL would be great for a demo app or prototype, but that's about it.

chris said...

@Jimmy Bogard "LINQ to Entities' core architecture (the 3 Models) is deeply flawed."

That's a rather sweepingly broad statement to make with absolutely zero supporting statements to back it up.

Marc said...

I'm essentially an ms guy, using.net since pre-release. For the last year or so I've been using java as I wanted to add some more strings to my bow. The java projects I have worked on tend to use spring and hibernate and for java it works really well. Spring has a HibernateDaoSupport class which gives you basic crud methods out of the box. No need to write anything sqlesque, whether strongly typed or not. Sure, you have to dive into hql or sql if you have anything complex or unusual to do.

Anyways, I felt with 3.5 it was time to check out linq and I'm porting one of my java apps to c#. There is an inheritance hierachy which uses multiple tables. Its a reasonable use of inheritance. Multiple tables is the best way to model it in the database. Using linq I cant do this. Single table inheritance is not the way to model this sort of inheritance and feels like a fudge. I am a great fan of .net, but if ms are going to go to the bother of integrating query functionality into the language they need to get this addressed. And throw in crud for single entites and collections out of the box too. I'm thinking of using nHibernate instead now.

Anonymous said...

NHybernate and DevX ORM are definitely show stoppers. Most large projects only use Linq. If you 'Skip' and 'Take' a few, you know why ;-)