The least intuitive LINQ operators for me are the join operators. After working with healthcare data warehouses for years, I've become accustomed to writing outer joins to circumvent data of the most … suboptimal kind. Foreign keys? What are those? Alas, I digress…
At first glance, LINQ appears to only offer a join operator with an 'inner join' behavior. That is, when joining a sequence of departments with a sequence of employees, we will only see those departments that have one or more employees.
After a bit more digging, you might come across the GroupJoin operator. We can use GroupJoin like a SQL left outer join. The "left" side of the join is the outer sequence. If we use departments as the outer sequence in a group join, we can then see the departments with no employees. Note: it is the into keyword in the next query that triggers the C# compiler to use a GroupJoin instead of a plain Join operator.
As you might suspect from the syntax, however, the query doesn't give us back a "flat" resultset like a SQL query. Instead, we have a hierarchy to traverse. The projection provides us a department name for each sequence of employees.
Flattening a sequence is a job for SelectMany. The trick is in knowing that adding an additional from clause translates to a SelectMany operator, and just like the outer joins of SQL, we need to project a null value when no employee exists for a given department – this is the job of DefaultIfEmpty.
One last catch – this query does work with LINQ to SQL, but if you are stubbing out a layer using in-memory collections, the query can easily throw a null reference exception. The last tweak would be to make sure you have a non-null employee object before asking for the Name property in the last select.
Comments
If this was LINQ to SQL / entity framework, you could have an Fkey relationship between employees and departments and avoid writing the group join logic:
from department in departments
from employee in department.Employees.DefaultIfEmpty()
select new { DepartmentName = department.Name, EmployeeName = employee.Name };
Agreed, I'm preferring them myself for the same reason (plus it makes the sequence pipeline a bit more explicit).
In this case, the lambda expression version is quite ugly though :)
var query =
departments.GroupJoin(employees,
department => department.ID,
employee => employee.DepartmentID,
(department, employeeGrouping) =>
new
{
DepartmentName = department.Name,
Employees = employeeGrouping
})
.SelectMany(
group => group.Employees.DefaultIfEmpty(),
(group, employee) => new {
DepartmentName = group.DepartmentName,
EmployeeName = employee != null ? employee.Name : ""
}
);
This topic has been unclear for me as well, until now ;-)
msdn2.microsoft.com/en-us/library/bb738638.aspx
A simple parent child relation can be pulled in like this:
var em = new EntityModel();
var qry = from customer in em.Customers
select new {Customer = customer, customer.Orders, customer.Addresses};
This will return an anonymous type with {Customer Customer, EntityCollection<Order> Orders, EntityCollection<Address> Addresses}
My problem, and I have not solved this yet, is how to get the OrderDetails into that query too. This is a grandchild to Customer and so there is no direct property.
I want all Customers, with or without Orders, and all Orders for those customers, with or without OrderDetails.
var em = new EntityModel();
var qry = em.Customers
.Include("Orders")
.Include("Orders.OrderDetails")
.Include("Addresses");
Off hand I don't have an answer. EF 1.0 has some limitations! Sorry I don't have a good answer. You might try asking in the forums.
Cheers