The Standard LINQ Operators

Saturday, October 25, 2008

The standard LINQ operators are a collection of 50 (give or take a few) methods that form the heart and soul of Language Integrated Query. The beauty of these operators is how they can execute against diverse data sources. Not only can you use the operators locally to query objects in memory, but you can use the same operators when working with objects stored in a relational database, in XML, behind a web service, and more. In this article we will take a tour of the query operators and drill into specific topics of interest.

What Is A Standard Query Operator?

The standard query operators are defined as extension methods on the static Enumerable and Queryable classes from the System.Linq namespace. Extension methods on the Enumerable class extend the IEnumerable<T> interface and generally work locally (in-memory). There are many “data sources” that implement the IEnumerable<T> interface, including arrays, lists, and generally any collection of objects.

Extension methods on the Queryable class extend the IQueryable<T> interface. Objects implementing IQueryable<T> are backed by a LINQ provider that can translate the operators into the native query language of some remote data source. When working with LINQ to SQL, for example, the LINQ to SQL provider translates query operators into T-SQL commands that execute against SQL Server.

The operators themselves fall into one of two categories. There are lazy operators that offer deferred execution of a query. When you write a LINQ query that uses only lazy operators, you are only defining the query; it will not execute until you actually iterate or consume the results. Other operators are greedy operators that will execute a query immediately. We’ll be pointing out the greedy operators in this article; otherwise, you can assume an operator is lazy.
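To make the lazy versus greedy distinction concrete, here is a minimal sketch. The list contents and variable names are made up for illustration, and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; names and data are assumptions, not from the article.
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Dash", "Skitty" };

// Lazy: this line only defines the query; nothing is enumerated yet.
IEnumerable<string> shortNames = names.Where(n => n.Length <= 4);

// Greedy: ToList runs the query immediately and stores the results.
List<string> snapshot = names.Where(n => n.Length <= 4).ToList();

// Change the source after both queries were written.
names.Add("Rex");

// The lazy query executes now, during iteration, and sees "Rex".
Console.WriteLine(string.Join(", ", shortNames)); // Dash, Rex

// The greedy result was fixed before "Rex" was added.
Console.WriteLine(string.Join(", ", snapshot));   // Dash
```

The same Where call appears in both queries; only the trailing ToList changes when the work happens.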

Lazy operators can further be categorized as streaming or non-streaming operators. When a streaming operator executes it does not need to look at all of the data a query is producing. Instead, a streaming operator can work on the sequence one element at a time. A non-streaming operator will need to evaluate all the data in a sequence before it can produce a result. A good example is the OrderBy operator. Before OrderBy can produce its first result, it needs to traverse all the data to calculate which element should be first in the result.
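One way to see the difference between streaming and non-streaming operators is to count how many elements each one pulls from its source. The Source function below is a hypothetical helper written only for this sketch, which assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; Source and the counters are assumptions for the demo.
using System;
using System.Collections.Generic;
using System.Linq;

var pulled = new List<int>();

// Hypothetical source that records every element a consumer pulls from it.
IEnumerable<int> Source()
{
    foreach (int n in new[] { 3, 1, 2 })
    {
        pulled.Add(n);
        yield return n;
    }
}

// Where streams: First() stops at the first match, so one element is pulled.
int firstOdd = Source().Where(n => n % 2 == 1).First();
int pulledByWhere = pulled.Count;   // 1

pulled.Clear();

// OrderBy does not stream: it reads the whole source before yielding anything.
int smallest = Source().OrderBy(n => n).First();
int pulledByOrderBy = pulled.Count; // 3

Console.WriteLine($"Where pulled {pulledByWhere}, OrderBy pulled {pulledByOrderBy}");
```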

Some of the standard operators have an associated C# keyword assigned for use in the LINQ comprehension query syntax (the query syntax that purposefully resembles SQL). These include the Where, OrderBy, Select, Join, and GroupBy operators, among others. Most of the standard operators, however, do not have an associated keyword in the C# language.

We’ll be breaking the operators down into the following areas of functionality.

  • Filtering
  • Projecting
  • Joining
  • Ordering
  • Grouping
  • Conversions
  • Sets
  • Aggregation
  • Quantifiers
  • Generation
  • Elements

Filtering Operators

The two filtering operators in LINQ are the Where and OfType operators. An example follows.

ArrayList list = new ArrayList();
list.Add("Dash");
list.Add(new object());
list.Add("Skitty");
list.Add(new object());

var query =
     from name in list.OfType<string>()
     where name == "Dash"
     select name;

The Where operator generally translates to a WHERE clause when working with a relational database and has an associated keyword in C#, as shown in the code above. The OfType operator can be used to coax a non-generic collection (like an ArrayList) into a LINQ query. Since ArrayList does not implement IEnumerable<T>, OfType and Cast are the only LINQ operators we can apply to the list directly. OfType is also useful if you are working with an inheritance hierarchy and only want to select objects of a specific subtype from a collection. This includes scenarios where LINQ to SQL or the Entity Framework is used to model inheritance in the database. Both Where and OfType are deferred.

Sorting Operators

The sorting operators are OrderBy, OrderByDescending, ThenBy, ThenByDescending, and Reverse. When working with a comprehension query the C# orderby keyword will map to a corresponding OrderBy method. For example, the following code:

var query =
    from name in names
    orderby name, name.Length
    select name;

… would translate to …

var query =
    names.OrderBy(s => s)
         .ThenBy(s => s.Length);

The return value of the OrderBy operator is an IOrderedEnumerable<T>. This special interface inherits from IEnumerable<T> and allows the ThenBy and ThenByDescending operators to work, as they extend IOrderedEnumerable<T> instead of IEnumerable<T> like the other LINQ operators. This can sometimes create unexpected errors when using the var keyword and type inference. The query variable in the last code snippet above will be typed as IOrderedEnumerable<string>. If you try to add to your query definition later in the same method …

query = names.Where(s => s.Length > 3);

… you’ll find a compiler error when you build: “IEnumerable<string> cannot be assigned to IOrderedEnumerable<string>”. In a case like this you might want to forgo the var keyword and explicitly declare query as an IEnumerable<T>.
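As a small sketch of that workaround, the snippet below declares the variable with an explicit interface type. The names array is made-up sample data, and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Collections.Generic;
using System.Linq;

string[] names = { "Dash", "Skitty", "Al" };

// Declaring the variable as IEnumerable<string> instead of var keeps it
// assignable to the result of any later operator.
IEnumerable<string> query = names.OrderBy(s => s).ThenBy(s => s.Length);

// With var, this assignment would not compile, because Where returns
// IEnumerable<string> rather than IOrderedEnumerable<string>.
query = query.Where(s => s.Length > 3);

Console.WriteLine(string.Join(", ", query)); // Dash, Skitty
```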

Set Operations

The set operators in LINQ include Distinct (to remove duplicate values), Except (returns the difference of two sequences), Intersect (returns the intersection of two sequences), and Union (returns the unique elements from two sequences).

int[] twos = { 2, 4, 6, 8, 10 };
int[] threes = { 3, 6, 9, 12, 15 };

// 6 
var intersection = twos.Intersect(threes);

// 2, 4, 8, 10 
var except = twos.Except(threes);

// 2, 4, 6, 8, 10, 3, 9, 12, 15 
var union = twos.Union(threes);

It’s important to understand how the LINQ operators test for equality. Consider the following in-memory collection of employees.

var employees = 
        new List<Employee> {
    new Employee() { ID=1, Name="Scott" },
    new Employee() { ID=2, Name="Poonam" },
    new Employee() { ID=3, Name="Scott"}
};

You might think a query of this collection using the Distinct operator would remove the “duplicate” employee named Scott, but it doesn’t.

// yields a sequence of 3 employees 
var query =
      (from employee in employees
       select employee).Distinct();

This version of the Distinct operator uses the default equality comparer which, for reference types that do not override Equals, tests object references. The Distinct operator will therefore see three distinct object references. Most of the LINQ operators that use an equality test (including Distinct) provide an overload that accepts an IEqualityComparer<T> you can pass to perform custom equality tests.
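Here is a minimal sketch of the comparer overload. Rather than writing a custom comparer for Employee, it borrows the built-in StringComparer.OrdinalIgnoreCase, which implements IEqualityComparer<string>; the names array is made-up sample data, and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Linq;

string[] names = { "Scott", "Poonam", "scott" };

// Default comparer: string equality is case-sensitive, so all three differ.
var defaultDistinct = names.Distinct().ToList();

// Comparer overload: "Scott" and "scott" now count as duplicates.
var ignoreCase = names.Distinct(StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine(defaultDistinct.Count); // 3
Console.WriteLine(ignoreCase.Count);      // 2
```

A comparer for Employee would follow the same pattern: implement IEqualityComparer<Employee> with Equals and GetHashCode based on the properties you care about, then pass an instance to Distinct.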

Another issue to be aware of with LINQ and equality is how the C# compiler generates anonymous types. Anonymous types are built to test if the properties of two objects have the same values (see And Equality For All Anonymous Types). Thus, the following query on the same collection, which projects only the Name property, would return only two objects. (If the projection also included ID, all three objects would still be distinct, because every ID is different.)

// yields a sequence of 2 objects 
var query = (from employee in employees
             select new { employee.Name })
             .Distinct();

Quantification Operators

Quantifiers are the All, Any, and Contains operators.

int[] twos = { 2, 4, 6, 8, 10 };

// true 
bool areAllEvenNumbers = twos.All(i => i % 2 == 0);

// true 
bool containsMultipleOfThree = twos.Any(i => i % 3 == 0);

// false 
bool hasSeven = twos.Contains(7);

Personally, I’ve found these operators invaluable when working with collections of business rules. Here is a simple example:

Employee employee =
        new Employee { ID = 1, Name = "Poonam", DepartmentID = 1 };

Func<Employee, bool>[] validEmployeeRules = 
{
    e => e.DepartmentID > 0,
    e => !String.IsNullOrEmpty(e.Name),
    e => e.ID > 0
};


bool isValidEmployee =
      validEmployeeRules.All(rule => rule(employee));

Projection Operators

The Select operator returns one output element for each input element. Select has the opportunity to project a new type of element, however, and it’s often useful to think of it as performing a transformation or mapping.

The SelectMany operator is useful when working with a sequence of sequences. This operator will “flatten” the sub-sequences into a single output sequence – you can think of SelectMany as something like a nested iterator or nested foreach loop that digs objects out of a sequence inside a sequence. SelectMany is used when there are multiple from keywords in a comprehension query.

Here is an example:

string[] famousQuotes = 
{
    "Advertising is legalized lying",
    "Advertising is the greatest art form of the twentieth century"
};

var query = 
        (from sentence in famousQuotes
         from word in sentence.Split(' ')
         select word).Distinct();

If we iterate through the result we will see the query produce the following sequence of strings: “Advertising” “is” “legalized” “lying” “the” “greatest” “art” “form” “of” “twentieth” “century”. The second from keyword will introduce a SelectMany operator into the query, and a translation of the above code into extension method syntax will look like the following.

query =
    famousQuotes.SelectMany(s => s.Split(' '))
                .Distinct();

The above code will produce the same sequence of strings. If we used a Select operator instead of SelectMany …

var query2 =
    famousQuotes.Select(s => s.Split(' '))
                .Distinct();

… then the result would be a sequence that contains two string arrays: { “Advertising”, “is”, “legalized”, “lying” } and { “Advertising”, “is”, “the”, “greatest”, “art”, “form”, “of”, “the”, “twentieth”, “century” }. We started with an array of two strings and the Select operator projected one output object for each input. The SelectMany operator, however, iterated inside of the string arrays produced by Split to give us back one sequence of many strings.

Partition Operators

The Skip and Take operators in LINQ are the primary partitioning operators. These operators are commonly used to produce paged result sets for UI data binding. In order to get the third page of results for a UI that shows 10 records per page, you could apply a Skip(20) operator followed by a Take(10).
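The paging arithmetic can be sketched as follows. The records array, pageSize, and pageNumber are hypothetical names chosen for this example, and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; data and variable names are assumptions.
using System;
using System.Linq;

// Sample data: record numbers 1 through 35.
int[] records = Enumerable.Range(1, 35).ToArray();

int pageSize = 10;
int pageNumber = 3; // 1-based page number

// Skip the records on the earlier pages, then take one page's worth.
var page = records.Skip((pageNumber - 1) * pageSize)
                  .Take(pageSize)
                  .ToList();

// Prints the numbers 21 through 30, separated by commas.
Console.WriteLine(string.Join(", ", page));
```

If fewer than pageSize records remain, Take simply returns what is left, so the last page needs no special handling.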

There are also SkipWhile and TakeWhile operators that accept a predicate that you can use to dynamically express how many items you need.

int[] numbers = { 1, 3, 5, 7, 9 };

// yields 5, 7, 9 
var query = numbers.SkipWhile(n => n < 5)
                   .TakeWhile(n => n < 10);

Join Operators

In LINQ there is a Join operator and also a GroupJoin operator. The Join operator is similar to a SQL INNER JOIN in the sense that it only outputs results when it finds a match between two sequences, and the result itself is still a flat sequence. You typically need to perform some projection into a new object to see information from both sides of the join.

var employees = new List<Employee> {
    new Employee { ID=1, Name="Scott", DepartmentID=1 },
    new Employee { ID=2, Name="Poonam", DepartmentID=1 },
    new Employee { ID=3, Name="Andy", DepartmentID=2}
};

var departments = new List<Department> {
    new Department { ID=1, Name="Engineering" },
    new Department { ID=2, Name="Sales" },
    new Department { ID=3, Name="Skunkworks" }
};

var query =
     from employee in employees
     join department in departments
       on employee.DepartmentID 
           equals department.ID
     select new { 
         EmployeeName = employee.Name, 
         DepartmentName = department.Name 
     };

Notice the Join operator has a keyword for use in comprehension query syntax, but you must use the equals keyword and not the equality operator (==) to express the join condition. If you need to join on a “compound” key (multiple properties), you’ll need to create new anonymous types for the left and right hand side of the equals condition with all the properties you want to compare.
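A compound key join might look like the sketch below. The orders and prices data are invented for illustration (they are not part of the employee example), and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; orders and prices are hypothetical data.
using System;
using System.Linq;

var orders = new[]
{
    new { Region = "East", Product = "Widget", Quantity = 5 },
    new { Region = "West", Product = "Widget", Quantity = 2 },
    new { Region = "West", Product = "Gadget", Quantity = 1 }
};

var prices = new[]
{
    new { Region = "East", Product = "Widget", UnitPrice = 10m },
    new { Region = "West", Product = "Widget", UnitPrice = 12m }
};

// Both sides of equals build an anonymous type with the same property
// names, types, and order, so the two keys are compared by value.
var totals =
    (from o in orders
     join p in prices
       on new { o.Region, o.Product }
           equals new { p.Region, p.Product }
     select new { o.Region, o.Product, Total = o.Quantity * p.UnitPrice })
    .ToList();

foreach (var t in totals)
{
    Console.WriteLine($"{t.Region} {t.Product}: {t.Total}");
}
// East Widget: 50
// West Widget: 24
```

The West Gadget order has no matching price, so it drops out of the result, just as an inner join would drop it.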

Here is the same query written using extension method syntax (no Select operator is needed since the Join operator is overloaded to accept a projector expression).

var query2 = employees.Join(
        departments,         // inner sequence 
        e => e.DepartmentID, // outer key selector 
        d => d.ID,           // inner key selector 
        (e, d) => new {      // result projector 
          EmployeeName = e.Name,
          DepartmentName = d.Name
        });

Note that neither of the last two queries will have the “Skunkworks” department appear in any output because the Skunkworks department never joins to any employees. This is why the Join operator is similar to the INNER JOIN of SQL. The Join operator is most useful when you are joining on a 1:1 basis.

To see Skunkworks appear in some output we can use the GroupJoin operator. GroupJoin is similar to a LEFT OUTER JOIN in SQL but produces a hierarchical result set, whereas SQL still produces a flat result.

Specifically, the GroupJoin operator will always produce one output for each input from the “outer” sequence. Any matching elements from the inner sequence are grouped into a collection that can be associated with the outer element.

var query3 = departments.GroupJoin(
        employees,           // inner sequence 
        d => d.ID,           // outer key selector 
        e => e.DepartmentID, // inner key selector 
        (d, g) => new        // result projector 
        {
            DepartmentName = d.Name,
            Employees = g
        });

In the above query the departments collection is our outer sequence, so we will always see all the available departments. The difference is in the result projector. With a regular Join operator our projector sees two arguments: an employee object and its matching department object. With GroupJoin, the two parameters are a Department object (the d in our result projector lambda expression) and an IEnumerable<Employee> object (the g in our result projector lambda expression). The second parameter represents all the employees that matched the join criteria for this department (and it could be empty, as in the case of Skunkworks). A GroupJoin can be triggered in comprehension query syntax by using a join followed by the into keyword.
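For comparison, here is a sketch of the join … into form in comprehension syntax. To stay self-contained it uses anonymous types in place of the Employee and Department classes, and the range variable name staff is arbitrary; the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; anonymous types stand in for Employee and Department.
using System;
using System.Linq;

var employees = new[]
{
    new { Name = "Scott",  DepartmentID = 1 },
    new { Name = "Poonam", DepartmentID = 1 },
    new { Name = "Andy",   DepartmentID = 2 }
};

var departments = new[]
{
    new { ID = 1, Name = "Engineering" },
    new { ID = 2, Name = "Sales" },
    new { ID = 3, Name = "Skunkworks" }
};

// join ... into compiles to a GroupJoin call; "staff" plays the role of g.
var query =
    from d in departments
    join e in employees on d.ID equals e.DepartmentID into staff
    select new { DepartmentName = d.Name, HeadCount = staff.Count() };

foreach (var row in query)
{
    Console.WriteLine($"{row.DepartmentName}: {row.HeadCount}");
}
// Engineering: 2
// Sales: 1
// Skunkworks: 0
```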

Grouping Operators

The grouping operators are the GroupBy and ToLookup operators. Both operators return a sequence of IGrouping<K,V> objects. This interface specifies that the object exposes a Key property, and this Key property will represent the grouping value.

int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var query = numbers.ToLookup(i => i % 2);

foreach (IGrouping<int, int> group in query)
{
    Console.WriteLine("Key: {0}", group.Key);
    foreach (int number in group)
    {
        Console.WriteLine(number);
    }
}

The above query should produce grouped Key values of 1 and 0 (the two possible answers to the modulo of 2). Inside of each group object will be an enumeration of the objects from the original sequence that fell into that grouping. Essentially the above query will have a group of odd numbers (Key value of 1) and a group of even numbers (Key value of 0).

The primary difference between GroupBy and ToLookup is that GroupBy is lazy and offers deferred execution. ToLookup is greedy and will execute immediately.
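The difference shows up if the source changes after the query is written. The following is a small sketch with made-up numbers, assuming top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };

// Lazy: the grouping is not computed until the query is enumerated.
IEnumerable<IGrouping<int, int>> groups = numbers.GroupBy(i => i % 2);

// Greedy: the lookup is built right here, from the four numbers above.
ILookup<int, int> lookup = numbers.ToLookup(i => i % 2);

numbers.Add(5);

// GroupBy runs now and sees 5; the lookup was built before 5 was added.
int oddsInGroups = groups.Single(g => g.Key == 1).Count(); // 3
int oddsInLookup = lookup[1].Count();                      // 2

Console.WriteLine($"{oddsInGroups} vs {oddsInLookup}");     // 3 vs 2
```

The ILookup returned by ToLookup also has an indexer, so a group can be fetched directly by key, as lookup[1] does above.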

Generation Operators

The Empty operator will create an empty sequence for IEnumerable<T>. Note that we can use this operator using static method invocation syntax since we typically won’t have an existing sequence to work with.

var empty = Enumerable.Empty<Employee>();

The Range operator can generate a sequence of numbers, while Repeat can generate a sequence that repeats any single value.

int start = 1;
int count = 10;
IEnumerable<int> numbers = Enumerable.Range(start, count);

var tenTerminators = Enumerable.Repeat(new Employee { Name ="Arnold" }, 10);

Finally, the DefaultIfEmpty operator will generate a sequence containing a single element, the default value of the element type, when it is applied to an empty sequence. A non-empty sequence passes through unchanged.

string[] names = { }; //empty array 

IEnumerable<string> oneNullString = names.DefaultIfEmpty();
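A few more cases, sketched with made-up arrays, show what DefaultIfEmpty returns for empty and non-empty input, including the overload that takes an explicit default value. The code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Linq;

string[] names = { }; // empty array
int[] scores = { };   // empty array

// One element, null, because default(string) is null.
int nameCount = names.DefaultIfEmpty().Count();                 // 1

// The overload with an argument substitutes a value of your choosing.
int fallback = scores.DefaultIfEmpty(-1).Single();              // -1

// A non-empty sequence passes through unchanged.
int[] unchanged = new[] { 7, 8 }.DefaultIfEmpty(-1).ToArray();  // 7, 8

Console.WriteLine($"{nameCount}, {fallback}, {string.Join(" ", unchanged)}");
```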

Equality

The one equality operator in LINQ is the SequenceEqual operator. This operator will walk through two sequences and compare the objects inside, pair by pair and in order, for equality. This is another operator where you can override the default equality test using an IEqualityComparer<T> object as a parameter to a second overload of the operator. The test below will return false because the first employee objects in each sequence are not the same.

Employee e1 = new Employee() { ID = 1 };
Employee e2 = new Employee() { ID = 2 };
Employee e3 = new Employee() { ID = 3 };

var employees1 = new List<Employee>() { e1, e2, e3 };
var employees2 = new List<Employee>() { e3, e2, e1 };

bool result = employees1.SequenceEqual(employees2);
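The sketch below illustrates that order matters to SequenceEqual and shows the comparer overload. It uses strings and the built-in StringComparer.OrdinalIgnoreCase rather than Employee objects so that it stays self-contained; the arrays are made-up data, and the code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Linq;

string[] first  = { "Scott", "Poonam", "Andy" };
string[] second = { "Andy", "Poonam", "Scott" };
string[] third  = { "SCOTT", "POONAM", "ANDY" };

// Same elements in a different order: not equal.
bool sameOrder = first.SequenceEqual(second);                    // false

// Sorting both sides first makes the comparison order-insensitive.
bool sameItems = first.OrderBy(s => s)
                      .SequenceEqual(second.OrderBy(s => s));    // true

// The second overload takes a comparer; here case is ignored.
bool ignoreCase = first.SequenceEqual(third, StringComparer.OrdinalIgnoreCase); // true

Console.WriteLine($"{sameOrder}, {sameItems}, {ignoreCase}");
```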

Element Operators

Element operators include the ElementAt, First, Last, and Single operators. For each of these operators there is a corresponding “or default” operator that you can use to avoid exceptions when an element does not exist (ElementAtOrDefault, FirstOrDefault, LastOrDefault, SingleOrDefault). Their behavior is demonstrated in the following code.

string[] empty = { };
string[] notEmpty = { "Hello", "World" };

var result = empty.FirstOrDefault(); // null 
result = notEmpty.Last();            // World
result = notEmpty.ElementAt(1);      // World 
result = empty.First();              // InvalidOperationException 
result = notEmpty.Single();          // InvalidOperationException 
result = notEmpty.First(s => s.StartsWith("W")); // World 

The primary difference between First and Single is that the Single operator will throw an exception if a sequence does not contain exactly one element, whereas First is happy to take just the first element from a sequence of 10. You can use Single when you want to guarantee that a query returns just a single item and generate an exception if the query returns 0 or more than 1 item.
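A short sketch of that guarantee, reusing the notEmpty array from the earlier snippet and catching the exception so the program runs to completion. The code assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch of First versus Single.
using System;
using System.Linq;

string[] notEmpty = { "Hello", "World" };

// First takes the first element and ignores the rest.
string first = notEmpty.First();                         // Hello

// Single with a predicate succeeds because exactly one element matches.
string onlyW = notEmpty.Single(s => s.StartsWith("W"));  // World

// Single without a predicate throws: the sequence has two elements.
bool threw = false;
try
{
    notEmpty.Single();
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine($"{first}, {onlyW}, threw: {threw}");  // Hello, World, threw: True
```

Note that SingleOrDefault only returns the default value for an empty sequence; it still throws when more than one element is present.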

Conversions

The first two conversion operators are AsEnumerable and AsQueryable. The AsEnumerable operator is useful when you want to treat a queryable sequence (a sequence backed by a LINQ provider and typically a remote data source, like a database) as an in-memory sequence, so that all operators appearing after AsEnumerable work in memory with LINQ to Objects. For example, when working with the queryable properties of a LINQ to SQL data context class, you can return a query to the upper layers of your application with AsEnumerable on the end, meaning the higher layers will not be able to compose operators into the query that change the SQL that LINQ to SQL will generate.

The AsQueryable operator works in the opposite direction – it makes an in-memory collection appear as if it is backed by a remote LINQ provider. This can be useful in unit tests where you want to “fake” a LINQ to SQL queryable collection with in-memory data.
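Both conversions can be tried without a database. The sketch below uses a plain in-memory list of made-up names, so it shows only how the operators bind, not any SQL generation; it assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the in-memory list stands in for a real data source.
using System;
using System.Collections.Generic;
using System.Linq;

var employees = new List<string> { "Scott", "Poonam", "Andy" };

// AsQueryable wraps the list in an IQueryable<string>; operators now bind
// to the Queryable class and build an expression tree.
IQueryable<string> queryable = employees.AsQueryable()
                                        .Where(n => n.Length > 4);

// AsEnumerable switches back: operators after it bind to Enumerable
// and run in memory as LINQ to Objects.
IEnumerable<string> local = queryable.AsEnumerable()
                                     .Select(n => n.ToUpperInvariant());

Console.WriteLine(string.Join(", ", local)); // SCOTT, POONAM
```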

The OfType and Cast operators both coerce types in a sequence. OfType, which we also listed as a filtering operator, will only return objects that can be cast to a specific type, while Cast will fail if it cannot cast every object in the sequence to that type.

object[] data = { "Foo", 1, "Bar" };

// will return a sequence of 2 strings 
var query1 = data.OfType<string>();

// will create an exception when executed 
var query2 = data.Cast<string>();

The last four conversion operators are ToArray, ToDictionary, ToList, and ToLookup. These are all greedy operators that will execute a query immediately and construct an in-memory data structure. For more on ToList and greediness, see Lazy LINQ and Enumerable Objects.

Concatenation

The Concat operator concatenates two sequences and uses deferred execution. Concat is similar to the Union operator, but it will not remove duplicates.

string[] firstNames = { "Scott", "James", "Allen", "Greg" };
string[] lastNames = { "James", "Allen", "Scott", "Smith" };

var concatNames = firstNames.Concat(lastNames).OrderBy(s => s);
var unionNames = firstNames.Union(lastNames).OrderBy(s => s);

The first query will produce the sequence: “Allen”, “Allen”, “Greg”, “James”, “James”, “Scott”, “Scott”, “Smith”.

The second query will produce the sequence: “Allen”, “Greg”, “James”, “Scott”, “Smith”.

Aggregation Operators

No query technology is complete without aggregation, and LINQ includes the usual Average, Count, LongCount (for big results), Max, Min, and Sum. For example, here are some aggregation operations in a query that produces statistics about the running processes on a machine:

Process[] runningProcesses = Process.GetProcesses();

var summary = new {
    ProcessCount = runningProcesses.Count(),
    WorkerProcessCount = runningProcesses.Count(p => p.ProcessName == "w3wp"),
    TotalThreads = runningProcesses.Sum(p => p.Threads.Count),
    MinThreads = runningProcesses.Min(p => p.Threads.Count),
    MaxThreads = runningProcesses.Max(p => p.Threads.Count),
    AvgThreads = runningProcesses.Average(p => p.Threads.Count)
};

The most interesting operator in LINQ is the Aggregate operator, which can perform just about any type of aggregation you need (you could also implement mapping, filtering, grouping, and projection with aggregation if you wish). I’ll pass you to my blog entry “Custom Aggregations In LINQ” for more information.
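As a quick taste of Aggregate, the sketch below exercises its three overloads on made-up data. It is not taken from the linked blog entry, and it assumes top-level statements (C# 9 or later).

```csharp
// Illustrative sketch; the sample data is an assumption.
using System;
using System.Linq;

int[] numbers = { 1, 2, 3, 4, 5 };
string[] words = { "the", "greatest", "art", "form" };

// No seed: the first element is the starting accumulator value.
int product = numbers.Aggregate((acc, n) => acc * n);              // 120

// With a seed: keep whichever word is longest so far.
string longest = words.Aggregate(
    "", (best, w) => w.Length > best.Length ? w : best);           // greatest

// With a seed and a result selector: sum, then convert to an average.
double average = numbers.Aggregate(
    0, (sum, n) => sum + n, sum => (double)sum / numbers.Length);  // 3

Console.WriteLine($"{product}, {longest}, {average}");
```

Aggregate is a greedy operator, like the other aggregation operators: it enumerates the whole sequence as soon as it is called.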

Summary

I hope you’ve enjoyed this tour of the standard LINQ operators. Knowing about all the options available in the LINQ tool belt will empower you to write better LINQ queries with less code. Don’t forget you can also implement custom operators in LINQ if none of these built-in operators fit your solution – all you need to do is write extension methods for IEnumerable<T> or IQueryable<T> (although be careful extending IQueryable<T> as remote LINQ providers will not understand your custom operator). Happy LINQing!

by K. Scott Allen