Skip to content

join in LINQ to SQL and LINQ to Entities Considered Messy, Redundant

In this post I will demonstrate that use of the join keyword in LINQ to SQL and LINQ to Entities is nearly always wrong. LINQ queries which you write with the join keyword are harder to read and write than queries you write using associations, and they require knowledge of database metadata which is not required otherwise. This introduces the potential for errors and makes maintenance harder.

Many people ask how to do a "left join" in LINQ to SQL, and unfortunately, the answer they nearly always get — "Use DefaultIfEmpty!" — is, in my opinion, terrible advice. Let’s implement the same query with and without the join keyword, and then compare the readability of the queries, the functionality, the knowledge required to write the query, and the maintainability of the code. I think you will find that using associations wins on every single criterion I examine.

I’m going to use the Northwind demo database for this example, since many people are familiar with its structure. I created a LINQ to SQL model for Northwind by simply dragging all of the tables in the database onto the LINQ to SQL designer. The only change I made to the model generated by the designer is to rename the "Employee1" property on the generated "Employee" type to the more descriptive name "Supervisor." I’m using LINQ to SQL for this demo, but everything I’m saying here applies equally to LINQ to Entities.

"Left" Joins

Let’s imagine that I am asked to produce a web page listing all employees in a company, along with their supervisor, if any. This requires a "left join," in SQL terms, because not all employees have a supervisor. I’ll project onto a presentation model, just like I do in LINQ to Entities. Using the association properties generated by LINQ to SQL, this is quite simple:

This is fairly readable. The one thing that you need to know is that both LINQ to SQL and LINQ to Entities coalesce nulls. This means that on a row where e.Supervisor is null, you will not get a NullReferenceException in the assignment to SupervisorName and SupervisorBirthDate, as you would with LINQ to Objects. Instead, null will be assigned. Therefore, it is important that EmployeeListItem.SupervisorBirthDate is of type DateTime? (a.k.a. Nullable<DateTime>) instead of the non-nullable DateTime.

Let’s compare that to the equivalent query using the join syntax, using the mysteriously popular DefaultIfEmpty trick:

Yuck! This is far less readable than the query above. Yet these two queries produce exactly the same results. If you don’t believe me, download the sample project attached to this post and try it yourself. Perhaps even worse than the general unreadability of the "join version" is the fact that this query requires knowledge of the structure of the database which is already present in the DBML (or EDMX, in the case of the Entity Framework) model. This is a problem for two reasons. First, it’s an opportunity for programmers to make a mistake, which the first query eliminates. Second, it’s a potential maintenance issue if the foreign key definition ever changes in the database.

"Inner" Joins

Now let’s compare an example of an "inner join" using both my recommended method of associations and the LINQ join keyword. Here’s the association version. It’s so readable that there is very little to say about it:

Here’s the join version:

Again, these two queries do exactly the same thing, as you can confirm for yourself by running the demo project. The join version here shares all of the faults of the "left join" version above.

API Consistency

Thus far, I’ve been discussing LINQ to SQL. But what if I have an Employee instance and that like to examine the employee’s supervisor? I might write code like this:

Now compare that code with the "association" and "join" query forms. You will see that using the associations makes the LINQ to SQL queries much more closely resemble how you work with the materialized entity objects in "regular" code. Again, I think this makes your code easier to read.

What About Performance?

Not surprisingly, the SQL generated by the equivalent association and join versions is close to identical. I would not expect to ever see a performance difference between the two syntaxes in terms of query execution time.

Is It Ever Correct to Use join?

The advantages of using associations are so strong that you may wonder why join exists in LINQ at all. Associations, however, are only helpful when they actually exist. There may be times when you need to "join" based on values which are not actually foreign keys. Or you may need to join between LINQ to SQL and LINQ to Objects.

Running the Demo Project

Here’s a the demo project. To build and run it, you’ll need to Visual Studio 2010 Beta, SQL Server (Express is fine), and the Northwind demo database I linked above. Open the project, find the Web.config file in Solution Explorer and open it. Change the connectionString to point to your SQL Server. Now you should be able to run the application.

{ 16 } Comments

  1. J.W. | January 17, 2010 at 10:59 am | Permalink

    Another good post, your answers on SO about EF is always good resource to learn EF.

  2. chris | January 29, 2010 at 10:58 am | Permalink

    Craig -

    I think a lot of people have missed the significance of:

    from s in context.Employees
    from e in s.Employees

    as a way of doing joins. Once you understand what you’re looking at, it’s obvious and makes perfect sense. It might make sense to highlight this in a future post. And again, thanks for the SO help.

  3. AL | March 2, 2010 at 6:15 pm | Permalink

    Aren’t we cheating somewhat here ?

    Response: I don’t think so, but then again, my goal is not to "immediately chuck the result set into a DataGridView." Yours might be, which is fine, but that just means your needs are different than mine. As for how to meet your needs, read on…

    In the self referencing Table Employees the field ReportsTo is nullable and actually contains nulls.
    It is unusual for a foreign key to be nullable between 2 SEPARATE tables.
    Try left joining Customers and Orders.
    Although CustomerID in Orders is nullable, there are no Orders without a CustomerID.

    Response: Not in the data delivered with Northwind, no, but it’s certainly possible.

    If Orders is the left table, an inner join and an outer join produce the same result anyway, and yield only those customers who have placed orders.
    If Customer is the left table, the fields of Customer.Order are inaccesible.

    Response: That is wrong, on several counts. First, the property is Orders, not Order. It’s 1..*, not 1..1. Second, you certainly can use the properties of Orders in a query. The following query works fine, for example:
    var q = from c in context.Customers
    from o in c.Orders
    select new
    {
    Name = c.ContactName,
    Freight = o.Freight
    };

    It requires a nested foreach to convert the result set to a List.

    Response:No. What I think you’re trying to say is "I don’t know how to convert the result to a "flattened, SQL-like" tabular result set instead of an "object-like," hierarchical tree of customers and orders without a nested foreach (or a join). But it’s quite possible. I’ve already shown the "inner join" version above. Here’s the "left join" version:
    var q = from c in context.Customers
    from o in c.Orders.DefaultIfEmpty()
    select new
    {
    Name = c.ContactName,
    Freight = o.Freight
    };

    I wouldn’t describe this as an improvement over ‘join’ where I can immediately chuck the result set into a DataGridView.

    Response:See the query above. This isn’t how I’d do it, because most use cases of such tabular result forms tend to want to group them back into hierarchies anyway (think of every reporting tool ever created). But it’s certainly possible. And again, the join is just noise. The association properties make it redundant and messy.

    Maybe you have an elegant solution to left joining Customers with Orders, i.e showing ALL Customers with or without Order details ?

  4. AL | March 4, 2010 at 6:12 am | Permalink

    Thank you for your informative response.
    I asked you for a query to left join Customers and Orders and you gave it.

    Simply adding DefaultIfEmpty() to the inner join syntax does the trick, although you spent so much time deprecating it in the original article.

    Response: My goal is to get LINQ users to think in terms of object graphs rather than relational sets. It’s a critical difference which many miss because LINQ looks so much like SQL. But if your goal is to throw the results into a visual control designed for a relational set, then there’s nothing particularly wrong with it.

    I find it quite reasonable to want a result set that looks like

    select t1.ContactName, t2.Freight from Customers t1
    left join Orders t2 on t1.CustomerID = t2.CustomerID

  5. rakesh | June 3, 2010 at 11:26 pm | Permalink

    Craig,

    I am trying to write a "testable code" as part of my learning process. Would look forward if you could write a post on writing testable code using C#.
    Thanks

  6. Meysam | December 7, 2010 at 1:34 pm | Permalink

    my DB has lots of relationship and as you know L2S/L2EF creates a list of…….
    i use L2EF in my Silverlight application via WCF and as you know, WCF serialize result of query base on its binding. now when i send a little query(number of result) it takes much time because each object contains many data on its relationship properties.

    i try select cluase clause in LINQ query but in L2S/L2EF not supported!! and also i don’t want create DTO object(My DB has more than 200 table)
    what’s your opinion?

  7. Mariusz Zaleski | January 3, 2011 at 3:00 pm | Permalink

    Hi Craig,

    thank you for an effort you’ve put into this post.
    I have read all and it refreshed waht already know.
    Refreshing is good because it lets you remember the "thing" for longer.

    Your effort is appreciated :-)

    Regards
    Mariusz

  8. Leon Anderson | November 25, 2011 at 11:26 am | Permalink

    Hi Craig,
    Thanks for such a great article…
    I notice though that you have no place where you join a query to another by way of entity itself, this is what I am having issue with and can not find anywhere validation for…binding inside the join on an entity return type instead of table equivalent type…

    Is this valid???
    ObjectQuery c2a = db.ADMINISTRATION;
    IQueryable AGREEMENT_IDS =
    from com2a in c2a
    where com2a.DESC.Contains(keyword)
    from com2b in com2a.PACK
    from com2c in com2b.AGREEMENT_PACK
    join com2d in db.AGREEMENT on com2c.AGREEMENT equals com2d <————- valid??
    select com2d.AGREEMENT_ID;

  9. Ray Akkanson | December 8, 2011 at 8:38 pm | Permalink

    Great Article

    Thanks
    Ray Akkanson

  10. Peter Trenery | February 27, 2012 at 12:03 pm | Permalink

    Just what I was looking for. Very nice.

    -Peter

  11. SIbi ELango | August 24, 2012 at 3:08 am | Permalink

    i searched many sites this ur article only guide cuteee. KeeP Rocking..

  12. Nate Pat | September 9, 2012 at 2:49 pm | Permalink

    Craig, for the left join you did, it is quit simple than using the DefaultValue. Question - how do you do the same using lamda expressions

  13. Mike Drummond | February 17, 2013 at 9:23 am | Permalink

    Craig - this just totally made my life easier. Eliminating joins, and therefore imminent redundancy, is definitely the way to go.

  14. Allen | February 20, 2013 at 4:24 pm | Permalink

    This was a really good article, thank you.

  15. marcantoine | April 17, 2013 at 2:52 am | Permalink

    Hello,

    I understand that it’s possible to do that only because of foreign keys and primary keys set in the database ? However, if I have a MySQL database how can I do that ?

    Thank you

  16. Majed | June 9, 2013 at 7:36 am | Permalink

    I really hate that join syntax and this the link I will refer to them.Great!

Post a Comment

Your email is never published nor shared. Required fields are marked *

Bad Behavior has blocked 713 access attempts in the last 7 days.

Close