In the past couple of weeks, I’ve had occasion to look fairly closely at the ADO.NET Entity Framework, and compare it to NHibernate. Of course, before I even started, I went out and read what other people had to say on the subject. Many people point to this post by Danny Simmons as approximating the "official" Microsoft position on the subject, and commenters around the web seem to focus specifically on what Danny calls "a [not yet delivered, but] much larger vision of an entity-aware data platform." That’s interesting, but there are differences that exist today and are perhaps even more interesting. As Danny points out, "The EF was specifically structured to separate the process of mapping queries/shaping results from building objects and tracking changes."
The Entity Framework goes to fairly extreme measures to ensure that the bidirectional mappings between the store (usually a database) and the client entity model are correct, and provably so when possible. You can read about that in great detail in this paper, but one very visible implication for the developer is that in the Entity Framework, you supply separate metadata descriptions for the store and the object model, plus a third description to bridge the two. Contrast that with a Hibernate mapping, which rolls all three together into one description. Erik Meijer and José Blakeley elaborate on this point in an interview with ACM Queue’s Terry Coatta:
TC: You mentioned object-relational mappers. A certain portion of our audience has worked with products such as Hibernate or NHibernate and other commercial ORM systems. One of the existing characteristics of LINQ and the Entity Framework is that they divide traditional ORM into two pieces: one part handling mapping and one part handling querying. Is that a correct view and why is this separation reasonable?
JB: Several OR mappers bundle these two concerns together, and that actually makes sense when the only problem you’re trying to solve is how to bridge the gap between the application and the database.
But we should also look at another very broad class of mapping scenarios. We are building database management systems and data services around SQL Server— data services such as replication, reporting services, and OLAP. These all provide services at higher semantic levels of abstraction than does the relational model.
Thus, when we look at the impedance mismatch both of applications and data services, we realized that for the data-services case, you don’t want objects with methods and behaviors. What you want is a value-based, richer structural data model.
By value-based, I mean the ability to have high-level constructs such as entities and relationships but without the behaviors. Just as the relational model is a value-based model, we felt that we needed to provide a layer of abstraction that is richer in terms of entities and relationships. Therefore, the Entity Data Model and the Entity Framework became a natural layer of abstraction that we felt had to be built, and it’s at that level of abstraction where the mapping between richer-level entities and semantic concepts such as inheritance is abstracted.
Now the Entity Data Model, which is the formalism that defines the Entity Framework value-based layer, is very close to the object data model of .NET, modulo the behaviors. We decided to let the Entity Framework take care of all the mapping concerns and then just build thin programming-language veneers, or wrappers, over entities to expose a variety of programming-language bindings over this infrastructure.
EM: I would like to point out that there’s a deep analogy with how I explained LINQ in the beginning. We are trying to extract not one particular case where you go from tables to objects, but rather a wide variety of different things for different uses. So instead of having a one-off thing, we are trying to generalize this concept so that there are many other situations in which it is applicable.
Ironically, one of the places where this distinction is most visible is when attempting to use the Entity Framework designer in the current .NET 3.5 / Visual Studio 2008 Service Pack 1 Beta. I say "attempting" because, at this point, if you do any serious work with the Entity Framework, you’re almost certain to be editing the EDMX file (which is XML) by hand. It’s very easy, at the moment, to make the designer create EDMX which is either not valid or not parsable by the designer. Part of this is, no doubt, because the Entity Framework is quite a bit more mature than its designer. The Entity Framework has its roots in WinFS and Microsoft Research, while the Visual Studio designer appears to be a more recent addition to "productize" the Entity Framework. Presumably, the more glaring bugs in the designer will be fixed before release.
But part of me wonders if the instability in the designer is due not only to its relative immaturity, but also to the fact that it tries to present a single face for a mapping which is fundamentally a three-part system. Indeed, there seems to have been a drive from the "Entity Framework tools team" to roll these three parts into one. For the most part, the designer shows you (graphically) only the conceptual model. Some parts of the storage model filter through here and there, and the mapping can be seen when you click on an individual element. The paper I referenced earlier shows you the three parts in a graphical form, but the Entity Framework designer shows you only a piece of this, mostly the OO side of things.
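To make the three-part structure a little more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the class names (`StoreTable`, `EntityType`, `Mapping`), the `validate` and `flatten` helpers, and the People/Person example are all hypothetical, and none of this reflects the actual EDMX or Hibernate mapping schemas. The point is simply that when the store, the model, and the bridge are described separately, the bridge can be checked against both sides, while a rolled-together description reads as a single unit.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified stand-ins for the three descriptions.
# These are NOT the real EDMX or Hibernate mapping formats.

@dataclass
class StoreTable:
    """Storage description: what the database actually contains."""
    name: str
    columns: tuple

@dataclass
class EntityType:
    """Conceptual description: what the client-side model exposes."""
    name: str
    properties: tuple

@dataclass
class Mapping:
    """Bridge description: which property is stored in which column."""
    entity: str
    table: str
    property_to_column: dict

def validate(store, model, mapping):
    """Check the bridge against both sides and return a list of problems."""
    problems = []
    if mapping.table != store.name:
        problems.append(f"unknown table: {mapping.table}")
    if mapping.entity != model.name:
        problems.append(f"unknown entity: {mapping.entity}")
    for prop, column in mapping.property_to_column.items():
        if prop not in model.properties:
            problems.append(f"unknown property: {prop}")
        if column not in store.columns:
            problems.append(f"unknown column: {column}")
    return problems

def flatten(store, model, mapping):
    """Roll the three descriptions into one, roughly how a combined mapping reads."""
    return {
        "class": model.name,
        "table": store.name,
        "properties": dict(mapping.property_to_column),
    }

# Illustrative data only.
people_table = StoreTable("People", ("PersonId", "FullName"))
person_entity = EntityType("Person", ("id", "name"))
person_mapping = Mapping("Person", "People", {"id": "PersonId", "name": "FullName"})

if __name__ == "__main__":
    print(validate(people_table, person_entity, person_mapping))
    print(flatten(people_table, person_entity, person_mapping))
```

In this toy version, a designer that shows only `person_entity` is hiding two of the three inputs that `validate` needs, which is roughly the situation described above.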
One of the mental barriers that you have to get over when designing a good object relational mapping is the tendency to think primarily in object oriented terms, or relational terms, whichever suits your personality. A good object relational mapping, though, incorporates both a good object model and a good relational model. For example, let’s say you have a database with a table for People, and related tables for Employees and Customers. A single person might have a record in all three tables. Now, from a strictly relational point of view, you could construct a database VIEW for employees and another one for customers, both of which incorporate information from the People table. When using one VIEW or the other, you can temporarily think of an individual person as "just" an Employee or "just" a Customer, even though you know that they are both. So someone coming from this worldview might be tempted to do an OO mapping where Employee and Customer are both (direct) subclasses of Person. But this doesn’t work with the data we have; since a single person has both employee and customer records (and since no Person instance can be of the concrete subtype Employee and Customer simultaneously), the OO relationship between Person and Employee needs to be composition rather than inheritance, and similarly for Person and Customer.
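The composition approach can be sketched in a few lines of Python. This is a minimal illustration, not a mapping for any particular ORM, and the field names (`employee_number`, `account_code`) and the `roles` helper are assumptions I made up for the example. Each role is a separate object that a `Person` may or may not hold, which mirrors the optional related rows in the Employees and Customers tables.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; the real tables would carry their own columns.

@dataclass
class Employee:
    employee_number: str

@dataclass
class Customer:
    account_code: str

@dataclass
class Person:
    name: str
    # Composition: a person *has* zero or one of each role, rather than
    # *being* exactly one concrete subtype.
    employee: Optional[Employee] = None
    customer: Optional[Customer] = None

    def roles(self):
        """List the roles this person currently holds."""
        held = []
        if self.employee is not None:
            held.append("employee")
        if self.customer is not None:
            held.append("customer")
        return held

# One person with a record in all three tables.
alice = Person("Alice", employee=Employee("E-17"), customer=Customer("C-42"))
# Another person who is only a customer.
bob = Person("Bob", customer=Customer("C-43"))

if __name__ == "__main__":
    print(alice.roles())
    print(bob.roles())
```

With direct subclasses, `alice` would have to be constructed as either an `Employee` or a `Customer`, and the other record would have nowhere to go.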
So would the Entity Framework designer be better if it graphically showed you all three facets of your mapping? It’s hard to say. Our model appears complicated enough in just the OO view, even though it represents fewer than 100 database tables, which is significantly smaller than the schema for our production applications. Attempting to add the storage metadata and the mapping to that diagram would be putting more lines on a drawing which already threatens to make the term "spaghetti code" literal.
At the same time, though, certain operations really beg for an explicit representation of the storage metadata, the OO model, and the mapping between them, especially when configuring relationships and inheritance. The latter, in particular, we have found difficult to do in the visual designer without corrupting the EDMX.
Perhaps more importantly, though, it may be true that the "simplification" of the designer is part of the reason that people have to ask what the differences between the Entity Framework and NHibernate are in the first place. If you just look at the XML files for both systems, the difference stands out a lot more.