Tripping on Code

ToLookup vs. GroupBy

January 31, 2021

GroupBy and ToLookup are two extension methods on IEnumerable, found in the System.Linq namespace, that seem to perform very similar operations. The main difference between them is that GroupBy performs deferred execution while ToLookup performs immediate execution. Take a look at the example below for more details.

The Setup

Let us create a record to hold our data type.

public record Person(string firstName, string lastName, string country)
{
    public override string ToString()
    {
        return $"{firstName} {lastName} - {country}";
    }
}

Then let’s set up a method that returns an endless stream of random Person objects.

private static IEnumerable<Person> GetPersons()
{
    var firstNameRandomizer = RandomizerFactory.GetRandomizer(new FieldOptionsFirstName());
    var lastNameRandomizer = RandomizerFactory.GetRandomizer(new FieldOptionsLastName());
    var countryRandomizer = RandomizerFactory.GetRandomizer(new FieldOptionsCountry());

    while (true)
    {
        yield return new Person(
            firstName: firstNameRandomizer.Generate(),
            lastName: lastNameRandomizer.Generate(),
            country: countryRandomizer.Generate());
    }
}

The GetPersons method is an iterator that will return as many persons as requested by the caller.
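To make the "as many as requested" part concrete, here is a minimal sketch that uses a hypothetical GetNames iterator in place of GetPersons (so it runs without the random data package). The counter is only there to show how many items the iterator actually produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many items the iterator has actually produced.
int produced = 0;

// Hypothetical stand-in for GetPersons: an endless iterator of names.
IEnumerable<string> GetNames()
{
    while (true)
    {
        produced++;
        yield return $"Person {produced}";
    }
}

// Nothing runs yet: Take only wraps the iterator.
IEnumerable<string> firstThree = GetNames().Take(3);
int producedBeforeEnumeration = produced;

// Enumerating pulls three items from the endless iterator, then stops.
List<string> names = firstThree.ToList();

Console.WriteLine(producedBeforeEnumeration);
Console.WriteLine(produced);
Console.WriteLine(string.Join(", ", names));
```

The while (true) loop never becomes a problem because the caller decides when to stop pulling items.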

Note: We are using the RandomDataGenerator.Net package to generate random names and countries.

The differences

Superficial Differences

Let’s try executing the above method with both GroupBy and ToLookup:

// ToLookup
ILookup<string, Person> lookupResult = GetPersons().Take(100).ToLookup(x => x.country);

// GroupBy
IEnumerable<IGrouping<string, Person>> groupResult = GetPersons().Take(100).GroupBy(x => x.country);

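The ToLookup method returns an ILookup<string, Person>. An ILookup<TKey, TElement> is very similar to a Dictionary<TKey, IEnumerable<TElement>>: each key maps to a sequence of values. So lookupResult["United Kingdom"] returns an IEnumerable<Person> containing all the people living in the United Kingdom. Unlike a dictionary, indexing with a key that is not present returns an empty sequence rather than throwing.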
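A small sketch of the indexer behaviour, using hypothetical (name, country) tuples instead of the random Person data so that the output is predictable:

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (name, country) tuples stand in for Person.
var people = new[]
{
    (Name: "Ada", Country: "United Kingdom"),
    (Name: "Alan", Country: "United Kingdom"),
    (Name: "Grace", Country: "United States"),
};

ILookup<string, string> byCountry = people.ToLookup(p => p.Country, p => p.Name);

// The indexer returns every element stored under the key.
Console.WriteLine(string.Join(", ", byCountry["United Kingdom"])); // Ada, Alan

// A missing key yields an empty sequence instead of throwing.
Console.WriteLine(byCountry["France"].Any());    // False
Console.WriteLine(byCountry.Contains("France")); // False
Console.WriteLine(byCountry.Count);              // 2
```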

The GroupBy method, on the other hand, returns an IEnumerable<IGrouping<string, Person>>. An IGrouping<TKey, TElement> also maps a single key to multiple values. However, the result is a plain sequence of groups, so there is no indexer to look up a group by its key. If you want all the people who live in the UK, you will have to do the following.

IGrouping<string, Person> ukPeople = groupResult.FirstOrDefault(x => x.Key == "United Kingdom");

ukPeople is a collection that will contain all the people living in the United Kingdom, or null if nobody in the sequence lives there. It has a property called Key, which holds the key used to group these results: "United Kingdom", in this case.
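The same search can be sketched with hypothetical (name, country) tuples in place of Person. The point to notice is that a key with no matching group gives null, so the result needs a null check before use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (name, country) tuples stand in for Person.
var people = new[]
{
    (Name: "Ada", Country: "United Kingdom"),
    (Name: "Alan", Country: "United Kingdom"),
    (Name: "Grace", Country: "United States"),
};

IEnumerable<IGrouping<string, string>> groups =
    people.GroupBy(p => p.Country, p => p.Name);

// No indexer here: we have to search the sequence of groups by key.
var uk = groups.FirstOrDefault(g => g.Key == "United Kingdom");
var france = groups.FirstOrDefault(g => g.Key == "France");

// The group carries its key and can be enumerated like any collection.
if (uk != null)
{
    Console.WriteLine($"{uk.Key}: {string.Join(", ", uk)}");
}

// FirstOrDefault returns null when no group has the requested key.
Console.WriteLine(france == null);
```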

Immediate Execution vs. Deferred Execution

The main difference is how they are executed. ToLookup is very similar to ToList, ToDictionary, or ToArray: the result is materialized immediately, and you can start working on the items. In the case of GroupBy, nothing is read from the source until the result is enumerated, either with a foreach loop or by a method like ToList or ToArray.

This becomes a factor when we are dealing with Entity Framework or with generator functions that return large datasets.

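ToLookup blocks until the entire source sequence has been consumed. GroupBy does no work until we loop through the result or invoke ToList. Once enumeration starts, though, GroupBy in LINQ to Objects also has to read the whole source before it can hand back the first group.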
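The timing can be observed with a counter. This is a minimal sketch with a hypothetical Numbers iterator that records how many items have been read from it; it covers LINQ to Objects only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical source that records how many items have been read from it.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 10; i++)
    {
        pulled++;
        yield return i;
    }
}

// GroupBy: building the query reads nothing from the source.
var groups = Numbers().GroupBy(n => n % 2 == 0 ? "even" : "odd");
int afterGroupBy = pulled;
Console.WriteLine(afterGroupBy);    // 0

// Asking for just the first group still reads the whole source,
// because a group is not complete until every item has been seen.
var firstGroup = groups.First();
int afterFirstGroup = pulled;
Console.WriteLine(afterFirstGroup); // 10

// ToLookup: the source is read in full before the call returns.
pulled = 0;
var lookup = Numbers().ToLookup(n => n % 2 == 0 ? "even" : "odd");
int afterToLookup = pulled;
Console.WriteLine(afterToLookup);   // 10
```

So the practical difference is when the work happens, not how much of the source gets read.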

Entity Framework and Deferred Execution

The difference between ToLookup and GroupBy is easiest to explain by looking at Entity Framework.

Let’s take a look at the following expressions:

// Query 1
var adultsByGenderGroups = dbContext.People.Where(x => x.Age > 18).GroupBy(x => x.Gender);

// Query 2
var adultsByGenderLookup = dbContext.People.Where(x => x.Age > 18).ToLookup(x => x.Gender);

The above code gets translated into (roughly) the following SQL statements:

-- Query 1
SELECT * FROM People
WHERE Age > 18
GROUP BY Gender

-- Query 2
SELECT * FROM People
WHERE Age > 18

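For Query 1, the grouping is meant to be done in the database server. In EF Core this only happens when the grouping is followed by an aggregate, for example .Select(g => new { g.Key, Count = g.Count() }), which becomes SELECT Gender, COUNT(*) ... GROUP BY Gender. A bare GroupBy has no direct SQL equivalent because a result set cannot represent an IGrouping, so depending on the provider and version it may be rejected or grouped on the client. For Query 2, the filter runs in the database, but every matching row is loaded and the grouping is done in memory in the application and not in the database server.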
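Here is a rough sketch of the two query shapes. It assumes a hypothetical in-memory array with AsQueryable standing in for dbContext.People, so it runs as LINQ to Objects and sends no SQL anywhere; it only illustrates where the query stops being an IQueryable.

```csharp
using System;
using System.Linq;

// Hypothetical rows; AsQueryable stands in for a DbSet in this sketch.
var people = new[]
{
    new { Name = "Ada", Age = 36, Gender = "F" },
    new { Name = "Alan", Age = 41, Gender = "M" },
    new { Name = "Grace", Age = 17, Gender = "F" },
    new { Name = "Linus", Age = 29, Gender = "M" },
}.AsQueryable();

// Grouping followed by an aggregate: the whole chain stays an IQueryable,
// which is the shape a relational provider can turn into GROUP BY.
var countsByGender = people
    .Where(x => x.Age > 18)
    .GroupBy(x => x.Gender)
    .Select(g => new { Gender = g.Key, Count = g.Count() });

foreach (var row in countsByGender)
{
    Console.WriteLine($"{row.Gender}: {row.Count}");
}

// ToLookup exists only for IEnumerable, so the query leaves IQueryable here:
// the filtered rows are enumerated and grouped in application memory.
var adultsByGender = people.Where(x => x.Age > 18).ToLookup(x => x.Gender);
Console.WriteLine(adultsByGender["M"].Count()); // 2
```

With a real DbContext, the first query would return one row per gender, while the second would transfer every adult row to the application before grouping.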

Which should I use?

There are many factors that should be taken into consideration when making such a decision, but I can provide a general rule here that could work well.

  • If you are using an ORM to write a query, GroupBy will generally perform better because it can be translated into SQL and performed in the database server, provided the grouping is followed by an aggregate.
  • If you are dealing with an IEnumerable or IQueryable whose source has not been materialized yet, use GroupBy. An IEnumerable indicates the possibility of deferred execution, so using GroupBy will hold off the execution until it is needed.
  • When dealing with a List, Collection, Dictionary, HashSet, or any other type whose data is already materialized, use ToLookup. It’s easier to use and the syntax is cleaner.
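One more consequence of deferred execution may help when choosing between the two. In this small sketch (with a hypothetical word list), the GroupBy query is re-evaluated against the current contents of the source every time it is enumerated, while the lookup is a snapshot taken at the moment ToLookup was called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, grouped by first letter.
var source = new List<string> { "apple", "avocado", "banana" };

var groups = source.GroupBy(w => w[0]);   // deferred query over the list
var lookup = source.ToLookup(w => w[0]);  // snapshot taken right now

// Change the source after both calls.
source.Add("blueberry");

// The deferred query sees the new item; the lookup does not.
int groupedCount = groups.Single(g => g.Key == 'b').Count();
int lookupCount = lookup['b'].Count();

Console.WriteLine(groupedCount); // 2
Console.WriteLine(lookupCount);  // 1
```

If the grouped result is going to be read several times, ToLookup does the grouping once, whereas each pass over a GroupBy query groups the source again.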

Written by Avinash.