Inner, Outer, Let's All Join Together With LINQ

The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…

At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
  select new { employee.Name, Department = department.Name };

After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
    
  into employeeGroup
  select new { department.Name, Employees = employeeGroup };

As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.

foreach (var department in query)
{
    
Console.WriteLine("{0}", department.Name);
    
foreach (var employee in department.Employees)
    {
        
Console.WriteLine("\t{0}", employee.Name);
    }
}

Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.

var query =
  from department in departments
  join employee in employees
      
on department.ID equals employee.DepartmentID
    
  into employeeGroups
  from employee in employeeGroups.DefaultIfEmpty()
  select new { DepartmentName = department.Name, EmployeeName = employee.Name };

One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.

posted on Monday, March 24, 2008 11:54 PM by scott

Comments

Tuesday, March 25, 2008 4:58 AM by scott

# re: Inner, Outer, Let's All Join Together With LINQ

And I should mention....

If this was LINQ to SQL / entity framework, you could have an Fkey relationship between employees and departments and avoid writing the group join logic:

from department in departments
from employee in department.Employees.DefaultIfEmpty()
select new { DepartmentName = department.Name, EmployeeName = employee.Name };
Tuesday, March 25, 2008 6:01 AM by Milan Negovan

# re: Inner, Outer, Let's All Join Together With LINQ

Scott, this is the reason I prefer Standard Query Operators---everything is named explicitly without the double and triple meaning of what "select" means in each edge case. :)
Tuesday, March 25, 2008 6:25 AM by Milan

# re: Inner, Outer, Let's All Join Together With LINQ

Thank you very much. This is something I was looking for for quite some time. In one place, concise... Thanks.
Tuesday, March 25, 2008 7:01 AM by Jason Haley

# Interesting Finds: March 25, 2008

Tuesday, March 25, 2008 7:36 AM by Rocks Thoughts

# LINQ Outer Joins

LINQ Outer Joins
Tuesday, March 25, 2008 7:47 AM by scott

# re: Inner, Outer, Let's All Join Together With LINQ

Milan:

Agreed, I'm preferring them myself for the same reason (plus it makes the sequence pipeline a bit more explicit).

In this case, the lambda expression version is quite ugly though :)

var query =
departments.GroupJoin(employees,

department => department.ID,
employee => employee.DepartmentID,
(department, employeeGrouping) =>
new
{
DepartmentName = department.Name,
Employees = employeeGrouping
})
.SelectMany(
group => group.Employees.DefaultIfEmpty(),
(group, employee) => new {
DepartmentName = group.DepartmentName,
EmployeeName = employee != null ? employee.Name : ""
}

);
Tuesday, March 25, 2008 8:31 AM by Ruurd

# re: Inner, Outer, Let's All Join Together With LINQ

Yes, very good.
This topic has been unclear for me as well, until now ;-)
Saturday, March 29, 2008 8:45 AM by Paul Looijmans

# re: Inner, Outer, Let's All Join Together With LINQ

One thing to note re:DefaultIfEmpty is that Linq to Entities does not support it:

http://msdn2.microsoft.com/en-us/library/bb738638.aspx
Tuesday, April 01, 2008 9:49 PM by scott

# re: Inner, Outer, Let's All Join Together With LINQ

Paul: great point. This strategy would be for in-memory collections. Linq to Entities can do the same, but with a different approach.