The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…
At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.
var query = from department in departments join employee in employees on department.ID equals employee.DepartmentID select new { employee.Name, Department = department.Name };
After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.
var query = from department in departments join employee in employees on department.ID equals employee.DepartmentID into employeeGroup select new { department.Name, Employees = employeeGroup };
As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.
foreach (var department in query) { Console.WriteLine("{0}", department.Name); foreach (var employee in department.Employees) { Console.WriteLine("\t{0}", employee.Name); } }
Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.
var query = from department in departments join employee in employees on department.ID equals employee.DepartmentID into employeeGroups from employee in employeeGroups.DefaultIfEmpty() select new { DepartmentName = department.Name, EmployeeName = employee.Name };
One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.