OdeToCode IC Logo

Inner Join, Outer Left Join, Let's All Join Together With LINQ

Tuesday, March 25, 2008

The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…

At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID 
  select new { employee.Name, Department = department.Name };

After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.

var query =
  from department in departments 
  join employee in employees 
      on department.ID equals employee.DepartmentID
      into employeeGroup 
  select new { department.Name, Employees = employeeGroup };

As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.

foreach (var department in query)
    Console.WriteLine("{0}", department.Name);
    foreach (var employee in department.Employees)
        Console.WriteLine("\t{0}", employee.Name);

Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.

var query =
  from department in departments
  join employee in employees
      on department.ID equals employee.DepartmentID
      into employeeGroups
  from employee in employeeGroups.DefaultIfEmpty()
  select new { DepartmentName = department.Name, EmployeeName = employee.Name };

One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.

scott Tuesday, March 25, 2008
And I should mention....

If this was LINQ to SQL / entity framework, you could have an Fkey relationship between employees and departments and avoid writing the group join logic:

from department in departments
from employee in department.Employees.DefaultIfEmpty()
select new { DepartmentName = department.Name, EmployeeName = employee.Name };
Milan Negovan Tuesday, March 25, 2008
Scott, this is the reason I prefer Standard Query Operators---everything is named explicitly without the double and triple meaning of what "select" means in each edge case. :)
Milan Tuesday, March 25, 2008
Thank you very much. This is something I was looking for for quite some time. In one place, concise... Thanks.
scott Tuesday, March 25, 2008

Agreed, I'm preferring them myself for the same reason (plus it makes the sequence pipeline a bit more explicit).

In this case, the lambda expression version is quite ugly though :)

var query =

department => department.ID,
employee => employee.DepartmentID,
(department, employeeGrouping) =>
DepartmentName = department.Name,
Employees = employeeGrouping
group => group.Employees.DefaultIfEmpty(),
(group, employee) => new {
DepartmentName = group.DepartmentName,
EmployeeName = employee != null ? employee.Name : ""

Ruurd Tuesday, March 25, 2008
Yes, very good.
This topic has been unclear for me as well, until now ;-)
Paul Looijmans Saturday, March 29, 2008
One thing to note re:DefaultIfEmpty is that Linq to Entities does not support it:

scott Wednesday, April 2, 2008
Paul: great point. This strategy would be for in-memory collections. Linq to Entities can do the same, but with a different approach.
Gravatar Sam Wheat Wednesday, October 28, 2009
Your articles are the best. Thanks very much!
john Monday, November 16, 2009
So what is the Linq to Entities "different approach"?
Gravatar scott Tuesday, November 17, 2009
@john: it should work the same way
Gravatar Simon Tuesday, November 24, 2009
@scott: but it doesn't. Linq to Entities does NOT support the DefaultIfEmpty() method.

A simple parent child relation can be pulled in like this:

var em = new EntityModel();
var qry = from customer in em.Customers
select new {Customer = customer, customer.Orders, customer.Addresses};

This will return an anonymous type with {Customer Customer, EntityCollection<Order> Orders, EntityCollection<Address> Addresses}

My problem, and I have not solved this yet, is how to get the OrderDetails into that query too. This is a grandchild to Customer and so there is no direct property.
I want all Customers, with or without Orders, and all Orders for those customers, with or without OrderDetails.
Gravatar Simon Tuesday, November 24, 2009
Dohn! To pull the grandchildren too, you can shape the query with .Include(), like this:

var em = new EntityModel();
var qry = em.Customers

Gravatar scott Tuesday, November 24, 2009
@Simon: Oh, right.

Off hand I don't have an answer. EF 1.0 has some limitations! Sorry I don't have a good answer. You might try asking in the forums.
joniba Sunday, January 23, 2011
Just wanted to say thanks for this great post.

Comments are closed.