Inner, Outer, Let's All Join Together With LINQ

The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…

At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
  select new { employee.Name, Department = department.Name };

After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
    
  into employeeGroup
  select new { department.Name, Employees = employeeGroup };

As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.

foreach (var department in query)
{
    
Console.WriteLine("{0}", department.Name);
    
foreach (var employee in department.Employees)
    {
        
Console.WriteLine("\t{0}", employee.Name);
    }
}

Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
    
  into employeeGroups
  from employee in employeeGroups.DefaultIfEmpty()
  select new { DepartmentName = department.Name, EmployeeName = employee.Name };

One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.

Print | posted @ Tuesday, March 25, 2008 3:54 AM

Comments on this entry:

Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by scott at 3/25/2008 11:58 AM

And I should mention....

If this was LINQ to SQL / entity framework, you could have an Fkey relationship between employees and departments and avoid writing the group join logic:

from department in departments
from employee in department.Employees.DefaultIfEmpty()
select new { DepartmentName = department.Name, EmployeeName = employee.Name };
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Milan Negovan at 3/25/2008 1:01 PM

Scott, this is the reason I prefer Standard Query Operators---everything is named explicitly without the double and triple meaning of what "select" means in each edge case. :)
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Milan at 3/25/2008 1:25 PM

Thank you very much. This is something I was looking for for quite some time. In one place, concise... Thanks.
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by scott at 3/25/2008 2:47 PM

Milan:

Agreed, I'm preferring them myself for the same reason (plus it makes the sequence pipeline a bit more explicit).

In this case, the lambda expression version is quite ugly though :)

var query =
departments.GroupJoin(employees,

department => department.ID,
employee => employee.DepartmentID,
(department, employeeGrouping) =>
new
{
DepartmentName = department.Name,
Employees = employeeGrouping
})
.SelectMany(
group => group.Employees.DefaultIfEmpty(),
(group, employee) => new {
DepartmentName = group.DepartmentName,
EmployeeName = employee != null ? employee.Name : ""
}

);
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Ruurd at 3/25/2008 3:31 PM

Yes, very good.
This topic has been unclear for me as well, until now ;-)
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Paul Looijmans at 3/29/2008 3:45 PM

One thing to note re:DefaultIfEmpty is that Linq to Entities does not support it:

msdn2.microsoft.com/en-us/library/bb738638.aspx
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by scott at 4/2/2008 4:49 AM

Paul: great point. This strategy would be for in-memory collections. Linq to Entities can do the same, but with a different approach.
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Sam Wheat at 10/28/2009 2:07 PM

Your articles are the best. Thanks very much!
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by john at 11/16/2009 4:27 PM

So what is the Linq to Entities "different approach"?
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by scott at 11/17/2009 9:37 PM

@john: it should work the same way
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Simon at 11/24/2009 12:32 PM

@scott: but it doesn't. Linq to Entities does NOT support the DefaultIfEmpty() method.

A simple parent child relation can be pulled in like this:

var em = new EntityModel();
var qry = from customer in em.Customers
select new {Customer = customer, customer.Orders, customer.Addresses};

This will return an anonymous type with {Customer Customer, EntityCollection<Order> Orders, EntityCollection<Address> Addresses}

My problem, and I have not solved this yet, is how to get the OrderDetails into that query too. This is a grandchild to Customer and so there is no direct property.
I want all Customers, with or without Orders, and all Orders for those customers, with or without OrderDetails.
  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by Simon at 11/24/2009 1:08 PM

Dohn! To pull the grandchildren too, you can shape the query with .Include(), like this:

var em = new EntityModel();
var qry = em.Customers
.Include("Orders")
.Include("Orders.OrderDetails")
.Include("Addresses");

  
Gravatar # re: Inner, Outer, Let's All Join Together With LINQ
by scott at 11/24/2009 10:25 PM

@Simon: Oh, right.

Off hand I don't have an answer. EF 1.0 has some limitations! Sorry I don't have a good answer. You might try asking in the forums.
  

Your comment:

Title:
Name:
Email:
Website:
 
Italic Underline Blockquote Hyperlink
 
 
Please add 4 and 7 and type the answer here: