In this post I will demonstrate that use of the join keyword in LINQ to SQL and LINQ to Entities is nearly always wrong. LINQ queries which you write with the join keyword are harder to read and write than queries you write using associations, and they require knowledge of database metadata which is not required otherwise. This introduces the potential for errors and makes maintenance harder.
Many people ask how to do a "left join" in LINQ to SQL, and unfortunately, the answer they nearly always get — "Use DefaultIfEmpty!" — is, in my opinion, terrible advice. Let’s implement the same query with and without the join keyword, and then compare the readability of the queries, the functionality, the knowledge required to write the query, and the maintainability of the code. I think you will find that using associations wins on every single criterion I examine.
I’m going to use the Northwind demo database for this example, since many people are familiar with its structure. I created a LINQ to SQL model for Northwind by simply dragging all of the tables in the database onto the LINQ to SQL designer. The only change I made to the model generated by the designer is to rename the "Employee1" property on the generated "Employee" type to the more descriptive name "Supervisor." I’m using LINQ to SQL for this demo, but everything I’m saying here applies equally to LINQ to Entities.
"Left" Joins
Let’s imagine that I am asked to produce a web page listing all employees in a company, along with their supervisor, if any. This requires a "left join," in SQL terms, because not all employees have a supervisor. I’ll project onto a presentation model, just like I do in LINQ to Entities. Using the association properties generated by LINQ to SQL, this is quite simple:
This is fairly readable. The one thing that you need to know is that both LINQ to SQL and LINQ to Entities coalesce nulls. This means that on a row where e.Supervisor is null, you will not get a NullReferenceException in the assignment to SupervisorName and SupervisorBirthDate, as you would with LINQ to Objects. Instead, null will be assigned. Therefore, it is important that EmployeeListItem.SupervisorBirthDate is of type DateTime? (a.k.a. Nullable<DateTime>) instead of the non-nullable DateTime.
Let’s compare that to the equivalent query using the join syntax, using the mysteriously popular DefaultIfEmpty trick:
Yuck! This is far less readable than the query above. Yet these two queries produce exactly the same results. If you don’t believe me, download the sample project attached to this post and try it yourself. Perhaps even worse than the general unreadability of the "join version" is the fact that this query requires knowledge of the structure of the database which is already present in the DBML (or EDMX, in the case of the Entity Framework) model. This is a problem for two reasons. First, it’s an opportunity for programmers to make a mistake, which the first query eliminates. Second, it’s a potential maintenance issue if the foreign key definition ever changes in the database.
"Inner" Joins
Now let’s compare an example of an "inner join" using both my recommended method of associations and the LINQ join keyword. Here’s the association version. It’s so readable that there is very little to say about it:
Here’s the join version:
Again, these two queries do exactly the same thing, as you can confirm for yourself by running the demo project. The join version here shares all of the faults of the "left join" version above.
API Consistency
Thus far, I’ve been discussing LINQ to SQL. But what if I have an Employee instance and that like to examine the employee’s supervisor? I might write code like this:
Now compare that code with the "association" and "join" query forms. You will see that using the associations makes the LINQ to SQL queries much more closely resemble how you work with the materialized entity objects in "regular" code. Again, I think this makes your code easier to read.
What About Performance?
Not surprisingly, the SQL generated by the equivalent association and join versions is close to identical. I would not expect to ever see a performance difference between the two syntaxes in terms of query execution time.
Is It Ever Correct to Use join?
The advantages of using associations are so strong that you may wonder why join exists in LINQ at all. Associations, however, are only helpful when they actually exist. There may be times when you need to "join" based on values which are not actually foreign keys. Or you may need to join between LINQ to SQL and LINQ to Objects.
Running the Demo Project
Here’s a the demo project. To build and run it, you’ll need to Visual Studio 2010 Beta, SQL Server (Express is fine), and the Northwind demo database I linked above. Open the project, find the Web.config file in Solution Explorer and open it. Change the connectionString to point to your SQL Server. Now you should be able to run the application.




{ 17 } Comments
Another good post, your answers on SO about EF is always good resource to learn EF.
Craig -
I think a lot of people have missed the significance of:
from s in context.Employees
from e in s.Employees
as a way of doing joins. Once you understand what you’re looking at, it’s obvious and makes perfect sense. It might make sense to highlight this in a future post. And again, thanks for the SO help.
Aren’t we cheating somewhat here ?
In the self referencing Table Employees the field ReportsTo is nullable and actually contains nulls.
It is unusual for a foreign key to be nullable between 2 SEPARATE tables.
Try left joining Customers and Orders.
Although CustomerID in Orders is nullable, there are no Orders without a CustomerID.
If Orders is the left table, an inner join and an outer join produce the same result anyway, and yield only those customers who have placed orders.
If Customer is the left table, the fields of Customer.Order are inaccesible.
It requires a nested foreach to convert the result set to a List.
I wouldn’t describe this as an improvement over ‘join’ where I can immediately chuck the result set into a DataGridView.
Maybe you have an elegant solution to left joining Customers with Orders, i.e showing ALL Customers with or without Order details ?
Thank you for your informative response.
I asked you for a query to left join Customers and Orders and you gave it.
Simply adding DefaultIfEmpty() to the inner join syntax does the trick, although you spent so much time deprecating it in the original article.
I find it quite reasonable to want a result set that looks like
select t1.ContactName, t2.Freight from Customers t1
left join Orders t2 on t1.CustomerID = t2.CustomerID
Craig,
I am trying to write a "testable code" as part of my learning process. Would look forward if you could write a post on writing testable code using C#.
Thanks
my DB has lots of relationship and as you know L2S/L2EF creates a list of…….
i use L2EF in my Silverlight application via WCF and as you know, WCF serialize result of query base on its binding. now when i send a little query(number of result) it takes much time because each object contains many data on its relationship properties.
i try select cluase clause in LINQ query but in L2S/L2EF not supported!! and also i don’t want create DTO object(My DB has more than 200 table)
what’s your opinion?
Hi Craig,
thank you for an effort you’ve put into this post.
I have read all and it refreshed waht already know.
Refreshing is good because it lets you remember the "thing" for longer.
Your effort is appreciated
Regards
Mariusz
Hi Craig,
Thanks for such a great article…
I notice though that you have no place where you join a query to another by way of entity itself, this is what I am having issue with and can not find anywhere validation for…binding inside the join on an entity return type instead of table equivalent type…
Is this valid???
ObjectQuery c2a = db.ADMINISTRATION;
IQueryable AGREEMENT_IDS =
from com2a in c2a
where com2a.DESC.Contains(keyword)
from com2b in com2a.PACK
from com2c in com2b.AGREEMENT_PACK
join com2d in db.AGREEMENT on com2c.AGREEMENT equals com2d <————- valid??
select com2d.AGREEMENT_ID;
Leon, no, that’s not valid. When comparing entities, you have to compare the ID, not the enity itself. So you can do
foo.Id = bar.Id, butfoo = barwon’t work.Great Article
Thanks
Ray Akkanson
Just what I was looking for. Very nice.
-Peter
i searched many sites this ur article only guide cuteee. KeeP Rocking..
Craig, for the left join you did, it is quit simple than using the DefaultValue. Question - how do you do the same using lamda expressions
Craig - this just totally made my life easier. Eliminating joins, and therefore imminent redundancy, is definitely the way to go.
This was a really good article, thank you.
Hello,
I understand that it’s possible to do that only because of foreign keys and primary keys set in the database ? However, if I have a MySQL database how can I do that ?
Thank you
marcantoine:
1) InnoDB supports FKs. If you’re using MyISAM, switch to InnoDB, or, better still, PostgreSQL.
2) You can have navigations/relationships in your model without FKs; it’s just harder. But see (1).
Post a Comment