How can I filter a child collection inside an aggregate root in Entity Framework?

I'm trying to access an item in a list inside an aggregate root, but because the list has a lot of entries (40K+), Entity Framework takes a long time to load it: 150.180 ms on my dev machine.

Here's a stripped down example that shows this behavior:

public class Parent
{
    public int Id { get; private set; }
    public virtual ICollection<Child> Children { get; private set; }

    public void Remove(string someProperty)
    {
        var itensToRemove = Children
            .Where(x => x.SomeProperty == someProperty)
            .ToList(); // -> this is where it takes a long time to run

        // remove...
    }
}

public class Child
{
    public int Id { get; set; }
}

Seeding:

INSERT [dbo].[Parent] ([Id]) VALUES (1)
INSERT [dbo].[Child] ([Id], [Parent_Id]) VALUES (1, 1)
...
INSERT [dbo].[Child] ([Id], [Parent_Id]) VALUES (40000, 1)

I also tried casting to List and using .RemoveAll(), but the result is the same.

(Children as List<Child>).RemoveAll(x => x.SomeProperty == someProperty);

Since I'm using lazy loading, I always thought that EF would consider the .Where(...) and create a filtered SQL query, but SQL Profiler tells me it doesn't:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Parent_Id] AS [Parent_Id]
    FROM [dbo].[Child] AS [Extent1]
    WHERE 
        ([Extent1].[Parent_Id] IS NOT NULL) AND 
        ([Extent1].[Parent_Id] = @EntityKeyValue1)
',N'@EntityKeyValue1 int',@EntityKeyValue1=1

What's interesting is that when I run the above query in SSMS it returns all rows instantly.

In terms of design, I'm considering accessing it directly based on this answer, but I feel it would break the DDD design in my case since it involves business logic that belongs in the parent.

1 answer

  • answered 2018-10-11 22:55 Steve Py

    I wouldn't attempt logic like this from within an entity. "Children" is either eagerly loaded as a list at the time that the parent is read, or it is lazy loaded within the scope of the context when referenced.

    When you attempt:

    var itensToRemove = Children
        .Where(x => x.SomeProperty == someProperty)
        .ToList();
    

    ... this lazy-loads all children for the parent before the Where condition executes, so the filter runs in memory rather than in SQL. If this is running across many parents or large sets of children, it will be very inefficient.
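
    To see why the filter never reaches SQL, it helps to look at which Where overload the compiler picks. The sketch below uses only in-memory collections (no Entity Framework), and WhereBindingDemo and its member names are made up for illustration. Because Children is typed as ICollection<Child>, the call resolves to Enumerable.Where, which takes a compiled delegate and has to walk the whole collection. Queryable.Where instead records an expression tree that a query provider could translate.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;

    public static class WhereBindingDemo
    {
        // True when the sequence is an IQueryable whose outermost call is Queryable.Where,
        // i.e. the filter was recorded as an expression tree instead of being run as a delegate.
        public static bool IsTranslatableWhere<T>(IEnumerable<T> source)
        {
            var queryable = source as IQueryable<T>;
            var call = queryable?.Expression as MethodCallExpression;
            return call != null
                && call.Method.DeclaringType == typeof(Queryable)
                && call.Method.Name == "Where";
        }

        // Same static type as the navigation property in the question.
        public static bool CollectionFilterIsTranslatable()
        {
            ICollection<string> children = new List<string> { "a", "b", "a" };
            // Binds to Enumerable.Where: runs in memory over every element.
            IEnumerable<string> filtered = children.Where(x => x == "a");
            return IsTranslatableWhere(filtered);
        }

        // The same predicate against an IQueryable source.
        public static bool QueryableFilterIsTranslatable()
        {
            IQueryable<string> children = new List<string> { "a", "b", "a" }.AsQueryable();
            // Binds to Queryable.Where: the predicate is kept as an expression tree.
            IQueryable<string> filtered = children.Where(x => x == "a");
            return IsTranslatableWhere(filtered);
        }
    }
    ```

    In EF6, one way to get a translatable query over a navigation is context.Entry(parent).Collection(p => p.Children).Query(), which returns an IQueryable<Child>. That call needs the context, which the entity itself does not have.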

    Entities are not business logic; they are data. If you have a requirement such as removing all matching children from a parent entity, or from several/all parent entities, then it should be handled at a service or repository level, which encapsulates the business logic and uses the entities as a representation of the data.

    For instance, say you are given a parent ID and need to remove all children with a "childType" of "Answer". If a parent has only a few children, and the children are relatively small entities, then you can load the parent with its children eager loaded, remove the applicable entities, and save the parent:

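    // In EF6 the lambda overload of Include needs: using System.Data.Entity;
    // Load the parent together with its children in a single query.
    var parent = context.Parents
        .Where(p => p.ParentId == parentId)
        .Include(p => p.Children)
        .Single();

    // Materialize the matches first so the collection is not modified while it is enumerated.
    var childrenToRemove = parent.Children.Where(c => c.ChildType == "Answer").ToList();
    foreach (var child in childrenToRemove)
        parent.Children.Remove(child);

    context.SaveChanges();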
    

    If you are bulk removing children from many/all parents, or child entities are bulky in size, then I would consider keeping children as a top-level entity (with a DbSet in the context) and delete them directly. This is assuming that there is a 1-to-many relationship between parent and child (child contains a parentId) and child does not have any children itself:

    List<Child> childrenToDelete;
    do
    {
        childrenToDelete = context.Children
            .Where(c => c.ChildType == "Answer")
            .Select(c => c.ChildId)
            .Take(1000)
            .ToList() // Run the query: fetch at most 1000 IDs, nothing else.
            // From here on this is LINQ to Objects; EF cannot construct entity types inside a query.
            .Select(id => new Child { ChildId = id })
            .ToList();

        // Attach the ID-only stubs so the context can track and delete them.
        foreach (var child in childrenToDelete)
            context.Children.Attach(child);

        context.Children.RemoveRange(childrenToDelete);
        context.SaveChanges(); // Commits this batch.
    } while (childrenToDelete.Any());
    

    The above loads only the IDs of the applicable children, then composes and attaches new child entity references from those IDs. This avoids loading all of the child data just to perform the removal. We load and delete in batches of 1000 to keep the transaction size reasonable. This needs to be done with a "clean" context in which the parents/children haven't already been loaded, as tracked instances would cause issues with Attach. It also needs error handling, including a plan for failures part-way through, since each batch of 1000 is committed separately.
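
    The shape of the batching loop can be exercised without a database. The sketch below is an in-memory stand-in, not EF code: BatchDeleteDemo, DeleteInBatches and the HashSet<int> playing the role of the table are all hypothetical. It takes at most batchSize matching IDs, deletes them, counts a commit, and stops once a batch comes back empty.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BatchDeleteDemo
    {
        // Deletes every ID matching the predicate from 'table', at most 'batchSize' per round.
        // Returns the number of non-empty batches, i.e. how many commits would have happened.
        public static int DeleteInBatches(HashSet<int> table, Func<int, bool> matches, int batchSize)
        {
            if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

            int committedBatches = 0;
            List<int> batch;
            do
            {
                // Stand-in for the ID-only query: fetch at most one batch of keys.
                batch = table.Where(matches).Take(batchSize).ToList();

                foreach (int id in batch)
                    table.Remove(id);   // stand-in for Attach + RemoveRange

                if (batch.Count > 0)
                    committedBatches++; // stand-in for SaveChanges()
            } while (batch.Count > 0);

            return committedBatches;
        }
    }
    ```

    With 1,250 matching rows and a batch size of 1,000, this commits twice (1,000 rows, then 250) and runs one final empty round to detect completion. If a round fails, the batches committed before it stay deleted, which is the partial-failure case mentioned above.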

    If you're faced with children that have child references of their own, you can look at defining a context and entity model specifically for this operation, where the only data is the required references (PKs and FKs). You can then load children and their related entities from that bounded context and issue the deletes. Again, if this is going to affect a large number of rows, pull the data in batches.

    The last alternative I can suggest is to utilize a stored procedure for the operation, then be sure that any loaded parent entities in the context are re-loaded.

    Hopefully that provides some ideas on how to handle your scenario.