C# LINQ Performance Optimization Techniques

Introduction to LINQ Performance

Language Integrated Query (LINQ) is one of C#’s most powerful features, enabling developers to query collections using a declarative syntax. However, improper use of LINQ can lead to significant performance issues. This guide explores performance optimization techniques, common pitfalls, and best practices for writing efficient LINQ queries.

Understanding when and how to optimize LINQ is crucial for building high-performance applications, especially when dealing with large datasets or performance-critical code paths.

Understanding LINQ Execution Models

Deferred Execution vs Immediate Execution

LINQ queries use deferred execution by default, meaning the query doesn’t execute until you enumerate the results.

// Deferred execution - query not executed yet
var query = numbers.Where(n => n > 5);

// Query executes when enumerated
foreach (var num in query)
{
    Console.WriteLine(num);
}

// Multiple enumerations = multiple executions
var count = query.Count(); // First execution
var first = query.First(); // Second execution - query runs again!
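A quick way to see this double execution is a source sequence that counts how many elements it yields. The sketch below is illustrative (the counter and names are not from any library): Count() pulls all ten elements, then First() re-runs the pipeline until the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns how many elements were produced while running Count() then First()
    public static int CountEvaluations()
    {
        int evaluations = 0;

        // Local iterator that counts every element it yields
        IEnumerable<int> Numbers()
        {
            for (int i = 1; i <= 10; i++)
            {
                evaluations++;
                yield return i;
            }
        }

        var query = Numbers().Where(n => n > 5);

        var count = query.Count();  // first enumeration: all 10 elements
        var first = query.First();  // second enumeration: elements 1..6, stops at first match

        return evaluations;  // 10 + 6 = 16
    }

    public static void Main() =>
        Console.WriteLine(CountEvaluations());  // prints 16
}
```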

Immediate Execution Methods:

var numbers = Enumerable.Range(1, 100);

// These execute immediately
var list = numbers.Where(n => n > 5).ToList();
var array = numbers.Where(n => n > 5).ToArray();
var count = numbers.Count(n => n > 5);
var first = numbers.First(n => n > 5);
var sum = numbers.Sum();
var max = numbers.Max();

IEnumerable vs IQueryable

Understanding the difference between IEnumerable<T> and IQueryable<T> is essential for performance optimization.

// IEnumerable - filtering executes in memory
public IEnumerable<Product> GetExpensiveProducts_Slow()
{
    // Once typed as IEnumerable, enumeration pulls EVERY row from the database
    IEnumerable<Product> products = dbContext.Products;
    
    // Filtering then happens in memory, row by row
    return products.Where(p => p.Price > 100);
}

// IQueryable - executes in database
public IQueryable<Product> GetExpensiveProducts_Fast()
{
    // Query built as expression tree
    IQueryable<Product> products = dbContext.Products;
    
    // Filtering happens in database
    return products.Where(p => p.Price > 100);
}

// SQL generated for IQueryable:
// SELECT * FROM Products WHERE Price > 100

Common Performance Pitfalls

1. Multiple Enumeration

One of the most common and costly mistakes is enumerating a query multiple times.

// BAD - multiple enumerations
public void ProcessData(IEnumerable<int> numbers)
{
    if (numbers.Any())  // First enumeration
    {
        var count = numbers.Count();  // Second enumeration
        var sum = numbers.Sum();      // Third enumeration
        var avg = numbers.Average();   // Fourth enumeration
    }
}

// GOOD - materialize once
public void ProcessData(IEnumerable<int> numbers)
{
    var materializedNumbers = numbers.ToList();  // Single enumeration
    
    if (materializedNumbers.Any())
    {
        var count = materializedNumbers.Count;
        var sum = materializedNumbers.Sum();
        var avg = materializedNumbers.Average();
    }
}

// BETTER - use aggregate for multiple calculations
public void ProcessData(IEnumerable<int> numbers)
{
    var stats = numbers.Aggregate(
        new { Count = 0, Sum = 0 },
        (acc, n) => new { Count = acc.Count + 1, Sum = acc.Sum + n }
    );
    
    var average = stats.Count > 0 ? stats.Sum / (double)stats.Count : 0;
}

2. N+1 Query Problem

The N+1 problem occurs when you query for N items and then make an additional query for each item.

// BAD - N+1 queries
public List<OrderDto> GetOrdersWithCustomers_Slow()
{
    var orders = dbContext.Orders.ToList();
    
    return orders.Select(o => new OrderDto
    {
        OrderId = o.Id,
        // This causes a separate query for EACH order!
        CustomerName = dbContext.Customers
            .First(c => c.Id == o.CustomerId).Name
    }).ToList();
}

// GOOD - single query with join
public List<OrderDto> GetOrdersWithCustomers_Fast()
{
    return dbContext.Orders
        .Join(
            dbContext.Customers,
            order => order.CustomerId,
            customer => customer.Id,
            (order, customer) => new OrderDto
            {
                OrderId = order.Id,
                CustomerName = customer.Name
            }
        )
        .ToList();
}

// BETTER - project through navigation properties (Entity Framework)
public List<OrderDto> GetOrdersWithCustomers_Best()
{
    return dbContext.Orders
        .Select(o => new OrderDto
        {
            OrderId = o.Id,
            // EF translates the navigation into a JOIN; no Include needed
            // when you project into a DTO
            CustomerName = o.Customer.Name
        })
        .ToList();
}

3. Inefficient Filtering Order

The order of LINQ operations significantly impacts performance.

// BAD - process all items then filter
public List<Product> GetTopExpensiveProducts_Slow()
{
    return products
        .Select(p => new
        {
            Product = p,
            // Expensive calculation for ALL products
            TotalCost = CalculateComplexCost(p)
        })
        .Where(x => x.TotalCost > 1000)  // Filter after calculation
        .Take(10)
        .Select(x => x.Product)
        .ToList();
}

// GOOD - filter first, then process
public List<Product> GetTopExpensiveProducts_Fast()
{
    return products
        .Where(p => p.BasePrice > 900)  // Pre-filter with cheap check
        .Select(p => new
        {
            Product = p,
            // Expensive calculation for fewer items
            TotalCost = CalculateComplexCost(p)
        })
        .Where(x => x.TotalCost > 1000)  // Precise filter
        .Take(10)
        .Select(x => x.Product)
        .ToList();
}

4. Unnecessary ToList() Calls

Calling ToList() prematurely materializes the entire collection in memory.

// BAD - unnecessary materialization
public IEnumerable<Product> GetFilteredProducts_Slow()
{
    var allProducts = dbContext.Products.ToList();  // Load everything!
    
    return allProducts
        .Where(p => p.IsActive)
        .Where(p => p.Price > 100)
        .OrderBy(p => p.Name);
}

// GOOD - let the database do the work
public IEnumerable<Product> GetFilteredProducts_Fast()
{
    return dbContext.Products
        .Where(p => p.IsActive)
        .Where(p => p.Price > 100)
        .OrderBy(p => p.Name);
    // ToList() called by consumer only if needed
}

Optimization Techniques

1. Use AsParallel for CPU-Bound Operations

For CPU-intensive operations on large in-memory collections, use PLINQ (Parallel LINQ).

// Sequential processing
public List<ProcessedData> ProcessData_Sequential(List<RawData> data)
{
    return data
        .Where(d => IsValid(d))
        .Select(d => ProcessIntensive(d))  // CPU-intensive
        .ToList();
}

// Parallel processing
public List<ProcessedData> ProcessData_Parallel(List<RawData> data)
{
    return data
        .AsParallel()
        .Where(d => IsValid(d))
        .Select(d => ProcessIntensive(d))
        .ToList();
}

// With degree of parallelism control
public List<ProcessedData> ProcessData_Controlled(List<RawData> data)
{
    return data
        .AsParallel()
        .WithDegreeOfParallelism(4)
        .Where(d => IsValid(d))
        .Select(d => ProcessIntensive(d))
        .ToList();
}

When to use AsParallel:

  • CPU-bound operations
  • Large datasets (1000+ items)
  • Operations that don’t depend on order
  • Operations without side effects

When NOT to use AsParallel:

  • I/O-bound operations
  • Small datasets (overhead outweighs benefits)
  • Operations that require order
  • Database queries (use IQueryable instead)
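When an operation does need ordered results, PLINQ can still help: AsOrdered() preserves source order in the output at some coordination cost. A minimal, self-contained sketch (the method name is illustrative):

```csharp
using System;
using System.Linq;

static class OrderedParallelDemo
{
    // Parallel projection whose results come back in source order
    public static int[] SquaresInOrder(int[] input) =>
        input
            .AsParallel()
            .AsOrdered()         // preserve source order in the output
            .Select(n => n * n)  // work is still distributed across threads
            .ToArray();

    public static void Main() =>
        Console.WriteLine(string.Join(",", SquaresInOrder(new[] { 1, 2, 3, 4, 5 })));
    // prints 1,4,9,16,25
}
```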

2. Optimize Projections with Select

Only select the fields you need, especially with database queries.

// BAD - retrieves all columns
public List<ProductSummary> GetProductSummaries_Slow()
{
    return dbContext.Products
        .Where(p => p.IsActive)
        .ToList()  // Loads ALL columns into memory
        .Select(p => new ProductSummary
        {
            Id = p.Id,
            Name = p.Name
        })
        .ToList();
}

// GOOD - projects only needed columns
public List<ProductSummary> GetProductSummaries_Fast()
{
    return dbContext.Products
        .Where(p => p.IsActive)
        .Select(p => new ProductSummary  // Projection in database
        {
            Id = p.Id,
            Name = p.Name
        })
        .ToList();
}

// SQL generated: SELECT Id, Name FROM Products WHERE IsActive = 1

3. Use Any() Instead of Count()

When checking if a collection has elements, use Any() instead of Count().

// BAD - counts all elements
if (products.Count() > 0)
{
    ProcessProducts(products);
}

if (products.Count() > 10)
{
    // Still counts all elements even if there are millions
}

// GOOD - stops at first element
if (products.Any())
{
    ProcessProducts(products);
}

// For checking minimum count
if (products.Skip(10).Any())
{
    // Efficiently checks if count > 10
}
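On .NET 6 and later, Enumerable.TryGetNonEnumeratedCount complements Any(): it reports a count without enumerating when the source is already a materialized collection, and returns false for lazy pipelines. A small sketch (the wrapper name is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CountDemo
{
    // Returns the count cheaply when available, without forcing enumeration
    public static bool TryCheapCount<T>(IEnumerable<T> source, out int count) =>
        source.TryGetNonEnumeratedCount(out count);

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        var lazy = Enumerable.Range(1, 3).Where(n => n > 0);

        Console.WriteLine(TryCheapCount(list, out var c1) + " " + c1); // True 3
        Console.WriteLine(TryCheapCount(lazy, out _));                 // False - would require enumeration
    }
}
```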

4. Optimize Lookups with Dictionary

When performing multiple lookups, convert to a dictionary first.

// BAD - O(n) lookup for each item
public List<OrderWithCustomer> GetOrdersWithCustomers_Slow(
    List<Order> orders, 
    List<Customer> customers)
{
    return orders.Select(o => new OrderWithCustomer
    {
        Order = o,
        // O(n) lookup for each order!
        Customer = customers.FirstOrDefault(c => c.Id == o.CustomerId)
    }).ToList();
}

// GOOD - O(1) lookup with dictionary
public List<OrderWithCustomer> GetOrdersWithCustomers_Fast(
    List<Order> orders, 
    List<Customer> customers)
{
    var customerDict = customers.ToDictionary(c => c.Id);
    
    return orders.Select(o => new OrderWithCustomer
    {
        Order = o,
        // O(1) lookup
        Customer = customerDict.GetValueOrDefault(o.CustomerId)
    }).ToList();
}
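One caveat with this approach: ToDictionary throws if a key appears twice. When keys can legitimately repeat, such as grouping orders by customer, ToLookup is the safer choice, and a missing key yields an empty sequence instead of throwing. A minimal sketch with illustrative types:

```csharp
using System;
using System.Linq;

static class LookupDemo
{
    public record Order(int Id, int CustomerId);

    // Groups orders by customer once; each lookup afterwards is O(1)
    public static int CountOrdersFor(Order[] orders, int customerId)
    {
        var byCustomer = orders.ToLookup(o => o.CustomerId);
        return byCustomer[customerId].Count();  // empty sequence for unknown keys
    }

    public static void Main()
    {
        var orders = new[] { new Order(1, 10), new Order(2, 10), new Order(3, 20) };
        Console.WriteLine(CountOrdersFor(orders, 10)); // 2
        Console.WriteLine(CountOrdersFor(orders, 99)); // 0
    }
}
```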

5. Use HashSet for Contains Operations

When checking membership multiple times, use HashSet<T> instead of List<T>.

// BAD - O(n) for each Contains check
public List<Product> FilterByCategory_Slow(
    List<Product> products, 
    List<int> categoryIds)
{
    return products
        .Where(p => categoryIds.Contains(p.CategoryId))  // O(n) each time
        .ToList();
}

// GOOD - O(1) for each Contains check
public List<Product> FilterByCategory_Fast(
    List<Product> products, 
    List<int> categoryIds)
{
    var categorySet = new HashSet<int>(categoryIds);
    
    return products
        .Where(p => categorySet.Contains(p.CategoryId))  // O(1) each time
        .ToList();
}

6. Batch Operations

Process large collections in batches to manage memory usage.

public static class BatchExtensions
{
    public static IEnumerable<List<T>> Batch<T>(
        this IEnumerable<T> source, 
        int batchSize)
    {
        var batch = new List<T>(batchSize);
        
        foreach (var item in source)
        {
            batch.Add(item);
            
            if (batch.Count >= batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        
        if (batch.Any())
        {
            yield return batch;
        }
    }
}

// Usage
public async Task ProcessLargeDataset(IEnumerable<DataItem> items)
{
    foreach (var batch in items.Batch(1000))
    {
        await ProcessBatchAsync(batch);
        
        // Each batch goes out of scope here, so its memory becomes
        // eligible for collection before the next batch is built
        // (no need to force GC.Collect)
    }
}

Database Query Optimization

1. Proper Use of Include and ThenInclude

// BAD - lazy loading causes N+1 queries
public List<Order> GetOrdersWithDetails_Slow()
{
    var orders = dbContext.Orders.ToList();
    
    foreach (var order in orders)
    {
        // Separate query for each order!
        var items = order.OrderItems.ToList();
        
        foreach (var item in items)
        {
            // Another query for each item!
            var product = item.Product;
        }
    }
    
    return orders;
}

// GOOD - eager loading with Include
public List<Order> GetOrdersWithDetails_Fast()
{
    return dbContext.Orders
        .Include(o => o.OrderItems)
            .ThenInclude(oi => oi.Product)
        .Include(o => o.Customer)
        .ToList();
}

// BETTER - load only needed data
public List<OrderDto> GetOrdersWithDetails_Best()
{
    return dbContext.Orders
        .Select(o => new OrderDto
        {
            OrderId = o.Id,
            CustomerName = o.Customer.Name,
            Items = o.OrderItems.Select(oi => new OrderItemDto
            {
                ProductName = oi.Product.Name,
                Quantity = oi.Quantity,
                Price = oi.Price
            }).ToList()
        })
        .ToList();
}

2. Avoid Client-Side Evaluation

Ensure operations are performed in the database, not in memory.

// BAD - client-side evaluation
public List<Product> GetProductsByCustomLogic_Slow()
{
    return dbContext.Products
        .Where(p => p.IsActive)
        .ToList()  // Loads into memory
        .Where(p => CustomBusinessLogic(p))  // Executes in memory
        .ToList();
}

// GOOD - database-side evaluation
public List<Product> GetProductsByCustomLogic_Fast()
{
    return dbContext.Products
        .Where(p => p.IsActive && p.Price > 100 && p.Stock > 0)
        .ToList();
}

3. Use Compiled Queries

For frequently executed queries, use compiled queries to avoid repeated query compilation.

// Define compiled query
private static readonly Func<AppDbContext, int, IEnumerable<Product>> 
    GetProductsByCategoryCompiled = 
        EF.CompileQuery((AppDbContext context, int categoryId) =>
            context.Products
                .Where(p => p.CategoryId == categoryId && p.IsActive)
                .OrderBy(p => p.Name)
        );

// Use compiled query
public List<Product> GetProductsByCategory(int categoryId)
{
    return GetProductsByCategoryCompiled(dbContext, categoryId).ToList();
}

Benchmarking LINQ Performance

Using BenchmarkDotNet

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class LinqBenchmarks
{
    private List<int> _numbers;
    
    [GlobalSetup]
    public void Setup()
    {
        _numbers = Enumerable.Range(1, 10000).ToList();
    }
    
    [Benchmark]
    public int ForLoop()
    {
        int sum = 0;
        for (int i = 0; i < _numbers.Count; i++)
        {
            if (_numbers[i] % 2 == 0)
                sum += _numbers[i];
        }
        return sum;
    }
    
    [Benchmark]
    public int LinqWhere()
    {
        return _numbers.Where(n => n % 2 == 0).Sum();
    }
    
    [Benchmark]
    public int LinqAggregate()
    {
        return _numbers.Aggregate(0, (sum, n) => n % 2 == 0 ? sum + n : sum);
    }
}

// Run benchmarks
public class Program
{
    public static void Main()
    {
        BenchmarkRunner.Run<LinqBenchmarks>();
    }
}

Best Practices Summary

1. Know Your Data

// Small collections (< 100 items) - LINQ is fine
var smallList = GetSmallList();
var result = smallList.Where(x => x.IsActive).OrderBy(x => x.Name).ToList();

// Large collections - optimize carefully
var largeList = GetLargeList();
var result = largeList
    .Where(x => x.IsActive)  // Filter early
    .Select(x => new { x.Id, x.Name })  // Project early
    .OrderBy(x => x.Name)
    .Take(100)  // Limit results
    .ToList();

2. Use the Right Collection Type

// For lookups - use Dictionary
var lookup = items.ToDictionary(i => i.Id);

// For uniqueness checks - use HashSet
var uniqueIds = new HashSet<int>(items.Select(i => i.Id));

// For ordered data - use SortedSet or SortedDictionary
var sorted = new SortedSet<int>(items.Select(i => i.Priority));

// For simple iteration - use List or Array
var list = items.ToList();

3. Profile Before Optimizing

// Use diagnostic tools
using System.Diagnostics;

public void MeasurePerformance()
{
    var sw = Stopwatch.StartNew();
    
    var result = PerformLinqOperation();
    
    sw.Stop();
    Console.WriteLine($"Execution time: {sw.ElapsedMilliseconds}ms");
    
    // Memory profiling
    var before = GC.GetTotalMemory(false);
    var result2 = PerformLinqOperation();
    var after = GC.GetTotalMemory(false);
    Console.WriteLine($"Memory used: {(after - before) / 1024}KB");
}

4. When to Avoid LINQ

Sometimes traditional loops are more efficient:

// LINQ version
public int SumEvenNumbers_Linq(int[] numbers)
{
    return numbers.Where(n => n % 2 == 0).Sum();
}

// Loop version - faster for simple operations
public int SumEvenNumbers_Loop(int[] numbers)
{
    int sum = 0;
    for (int i = 0; i < numbers.Length; i++)
    {
        if (numbers[i] % 2 == 0)
            sum += numbers[i];
    }
    return sum;
}

// Use loops when:
// - Simple aggregation operations
// - Performance-critical hot paths
// - Need to break early
// - Working with arrays and known sizes
Real-World Example: Combining Techniques

The search service below ties several of these techniques together: composable IQueryable filters, database-side projection, and pagination.

public class ProductSearchService
{
    private readonly AppDbContext _context;
    
    public async Task<SearchResult> SearchProducts(SearchCriteria criteria)
    {
        var query = _context.Products.AsQueryable();
        
        // Apply filters
        if (!string.IsNullOrEmpty(criteria.SearchTerm))
        {
            query = query.Where(p => 
                p.Name.Contains(criteria.SearchTerm) || 
                p.Description.Contains(criteria.SearchTerm));
        }
        
        if (criteria.MinPrice.HasValue)
        {
            query = query.Where(p => p.Price >= criteria.MinPrice.Value);
        }
        
        if (criteria.MaxPrice.HasValue)
        {
            query = query.Where(p => p.Price <= criteria.MaxPrice.Value);
        }
        
        if (criteria.CategoryIds?.Any() == true)
        {
            var categorySet = new HashSet<int>(criteria.CategoryIds);
            query = query.Where(p => categorySet.Contains(p.CategoryId));
        }
        
        // Get total count for pagination
        var totalCount = await query.CountAsync();
        
        // Apply sorting
        query = criteria.SortBy switch
        {
            "price_asc" => query.OrderBy(p => p.Price),
            "price_desc" => query.OrderByDescending(p => p.Price),
            "name" => query.OrderBy(p => p.Name),
            _ => query.OrderByDescending(p => p.CreatedDate)
        };
        
        // Apply pagination and projection
        var products = await query
            .Skip((criteria.Page - 1) * criteria.PageSize)
            .Take(criteria.PageSize)
            .Select(p => new ProductDto
            {
                Id = p.Id,
                Name = p.Name,
                Price = p.Price,
                ImageUrl = p.ImageUrl,
                CategoryName = p.Category.Name
            })
            .ToListAsync();
        
        return new SearchResult
        {
            Products = products,
            TotalCount = totalCount,
            Page = criteria.Page,
            PageSize = criteria.PageSize
        };
    }
}

Conclusion

Optimizing LINQ performance requires understanding execution models, avoiding common pitfalls, and applying appropriate optimization techniques. Key takeaways include:

  • Understand deferred vs immediate execution
  • Know when to use IEnumerable<T> vs IQueryable<T>
  • Avoid multiple enumerations by materializing once
  • Filter early, project late
  • Use Any() instead of Count() for existence checks
  • Convert to Dictionary or HashSet for frequent lookups
  • Profile before optimizing
  • Use AsParallel for CPU-bound operations on large datasets
  • Optimize database queries with proper projections and eager loading

Remember: readable code is valuable, but understanding performance characteristics helps you make informed decisions when optimization is necessary.

This post is licensed under CC BY 4.0 by the author.