EF4 SaveChanges how to avoid out of memory exception - entity-framework-4

The existing fragment:
foreach (var ownerCandidates in ownerToCandidatesDictionary)
{
    foreach (var candidate in ownerCandidates.Value)
    {
        transactionEntities.AddToSomeEntity(someObject);
    }
}
transactionEntities.SaveChanges(System.Data.Objects.SaveOptions.AcceptAllChangesAfterSave);
is being rewritten to:
int i = 0;
foreach (var ownerCandidates in ownerToCandidatesDictionary)
{
    foreach (var candidate in ownerCandidates.Value)
    {
        transactionEntities.AddToSomeEntity(someObject);
    }
    if (i++ % 1000 == 0)
    {
        transactionEntities.SaveChanges(System.Data.Objects.SaveOptions.AcceptAllChangesAfterSave);
    }
}
transactionEntities.SaveChanges(System.Data.Objects.SaveOptions.AcceptAllChangesAfterSave);
Does this give us the same functionality in the case of successful program termination? My concern is that we keep adding entities: does SaveChanges in a loop work only on what was added since the previous SaveChanges? Do we have saving in batches here? If not, how can the original fragment be changed to avoid the following exception?
12/06/2012 7:50:37 PM : System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.Data.Mapping.Update.Internal.Propagator.Project(DbProjectExpression node, PropagatorResult row, TypeUsage resultType)
at System.Data.Mapping.Update.Internal.Propagator.Visit(DbProjectExpression node)
at System.Data.Common.CommandTrees.DbProjectExpression.Accept[TResultType](DbExpressionVisitor`1 visitor)
at System.Data.Mapping.Update.Internal.Propagator.Propagate(UpdateTranslator parent, EntitySet table, DbQueryCommandTree umView)
at System.Data.Mapping.Update.Internal.UpdateTranslator.<ProduceDynamicCommands>d__0.MoveNext()
at System.Linq.Enumerable.<ConcatIterator>d__71`1.MoveNext()
at System.Data.Mapping.Update.Internal.UpdateCommandOrderer..ctor(IEnumerable`1 commands, UpdateTranslator translator)
at System.Data.Mapping.Update.Internal.UpdateTranslator.ProduceCommands()
at System.Data.Mapping.Update.Internal.UpdateTranslator.Update(IEntityStateManager stateManager, IEntityAdapter adapter)
at System.Data.EntityClient.EntityAdapter.Update(IEntityStateManager entityCache)
at System.Data.Objects.ObjectContext.SaveChanges(SaveOptions options)

A subsequent call to SaveChanges() has no effect if no changes were made to the entity set after the previous SaveChanges(); after each SaveChanges(), only the newly added batch goes to the database. Batching is the way to overcome the OutOfMemoryException during huge inserts with Entity Framework. Ideally SaveChanges() should be called only once, but in your case the data is huge, so it must be divided into batches.
Also, when adding entities in bulk you can get a significant performance improvement by temporarily disabling the automatic detection of changes: context.Configuration.AutoDetectChangesEnabled = false;
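Note that calling SaveChanges() more often does not by itself release memory: with AcceptAllChangesAfterSave the saved entities stay attached to the context, so the state manager keeps growing. A minimal sketch of batching that also releases the already-saved entities, reusing the question's placeholder names (ownerToCandidatesDictionary, someObject) and assuming a hypothetical TransactionEntities ObjectContext type, looks like this (AutoDetectChangesEnabled belongs to the DbContext API, so the sketch relies on disposing and recreating the context instead):

const int batchSize = 1000;
int pending = 0;

var transactionEntities = new TransactionEntities(); // hypothetical context type
try
{
    foreach (var ownerCandidates in ownerToCandidatesDictionary)
    {
        foreach (var candidate in ownerCandidates.Value)
        {
            transactionEntities.AddToSomeEntity(someObject);
            if (++pending % batchSize == 0)
            {
                transactionEntities.SaveChanges(System.Data.Objects.SaveOptions.AcceptAllChangesAfterSave);
                transactionEntities.Dispose();                   // drop the entities tracked for the saved batch
                transactionEntities = new TransactionEntities(); // fresh context with an empty state manager
            }
        }
    }
    // save whatever is left over from the last partial batch
    transactionEntities.SaveChanges(System.Data.Objects.SaveOptions.AcceptAllChangesAfterSave);
}
finally
{
    transactionEntities.Dispose();
}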

Related

EF DbContext Attach works only when proxy creation disabled

I ran into an issue where I was intermittently receiving an error:
An entity object cannot be referenced by multiple instances of IEntityChangeTracker
whenever trying to attach an entity to the DbContext.
UPDATE: Original post is below and is TL;DR. So here is a simplified version with more testing.
First I get the Documents collection. There are 2 items returned in this query.
using (UnitOfWork uow = new UnitOfWork())
{
    // uncomment below line resolves all errors
    // uow.Context.Configuration.ProxyCreationEnabled = false;

    // returns 2 documents in the collection
    documents = uow.DocumentRepository.GetDocumentByBatchEagerly(skip, take).ToList();
}
Scenario 1:
using (UnitOfWork uow2 = new UnitOfWork())
{
    // This errors ONLY if the original `uow` context is not disposed.
    uow2.DocumentRepository.Update(documents[0]);
}
This scenario works as expected. I can force the IEntityChangeTracker error by NOT disposing the original uow context.
Scenario 2:
Iterate through the 2 items in the documents collection.
foreach (Document document in documents)
{
    _ = Task.Run(() =>
    {
        using (UnitOfWork uow3 = new UnitOfWork())
        {
            uow3.DocumentRepository.Update(document);
        }
    });
}
Both items fail to attach to the DbSet with the IEntityChangeTracker error. Sometimes one succeeds and only one fails. I assume this might be due to the exact timing of the Task Scheduler. But even if they are attaching concurrently, they are different document entities, so they shouldn't be tracked by any other context. Why am I getting the error?
If I uncomment ProxyCreationEnabled = false on the original uow context, this scenario works! So how are they still being tracked even though the context was disposed? Why is it a problem that they are DynamicProxies, even though they are not attached to or tracked by any context?
ORIGINAL POST:
I have an entity object called Document, and it's related entity which is a collection of DocumentVersions.
In the code below, the document object and all related entities including DocumentVersions have already been eagerly loaded before being passed to this method - which I will demonstrate after.
public async Task PutNewVersions(Document document)
{
    // get versions
    List<DocumentVersion> versions = document.DocumentVersions.ToList();
    for (int i = 0; i < versions.Count; i++)
    {
        UnitOfWork uow = new UnitOfWork();
        try
        {
            versions[i].Attempt++;
            //... make some API call that succeeds
            versions[i].ContentUploaded = true;
            versions[i].Result = 1;
        }
        finally
        {
            uow.DocumentVersionRepository.Update(versions[i]); // error hit in this method
            uow.Save();
        }
    }
}
The Update method just attaches the entity and changes the state. It is part of a GenericRepository class that all my Entity Repositories inherit from:
public virtual void Update(TEntity entityToUpdate)
{
    dbSet.Attach(entityToUpdate); // error is hit here
    context.Entry(entityToUpdate).State = EntityState.Modified;
}
The document entity, and all related entities are loaded eagerly using a method in the Document entity repository:
public class DocumentRepository : GenericRepository<Document>
{
    public DocumentRepository(MyEntities context) : base(context)
    {
        this.context = context;
    }

    public IEnumerable<Document> GetDocumentByBatchEagerly(int skip, int take)
    {
        return (from document in context.Documents
                    .Include(...)
                    .Include(...)
                    .Include(...)
                    .Include(...)
                    .Include(d => d.DocumentVersions)
                    .AsNoTracking()
                orderby document.DocumentKey descending
                select document).Skip(skip).Take(take);
    }
}
The method description for .AsNoTracking() says that "the entities returned will not be cached in the DbContext". Great!
Then why does the .Attach() method above think that this DocumentVersion entity is already referenced in another IEntityChangeTracker? I am assuming this means it is referenced in another DbContext, i.e. the one calling GetDocumentByBatchEagerly(). And why does this issue only present intermittently? It seems to happen less often when I am stepping through the code.
I resolved this by adding the following line to the above DocumentRepository constructor:
this.context.Configuration.ProxyCreationEnabled = false;
I just don't understand why this appears to resolve the issue.
It also means if I ever want to use the DocumentRepository for something else and want to leverage change tracking and lazy loading, I can't. There doesn't seem to be a 'per query' option to turn off dynamic proxies like there is with 'as no tracking'.
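One possible workaround, which is an assumption on my part rather than something from the original post, is to toggle ProxyCreationEnabled around a single query: the flag only affects entities materialized while it is false, so it can act as a 'per query' switch as long as the query is run to completion before the flag is restored. A sketch against the repository above, keeping only the DocumentVersions include for brevity:

public IEnumerable<Document> GetDocumentByBatchEagerlyNoProxies(int skip, int take)
{
    bool originalSetting = context.Configuration.ProxyCreationEnabled;
    context.Configuration.ProxyCreationEnabled = false;
    try
    {
        // ToList() forces materialization while proxy creation is still disabled
        return (from document in context.Documents
                    .Include(d => d.DocumentVersions)
                    .AsNoTracking()
                orderby document.DocumentKey descending
                select document).Skip(skip).Take(take).ToList();
    }
    finally
    {
        context.Configuration.ProxyCreationEnabled = originalSetting;
    }
}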
For completeness, here is how the GetDocumentByBatchEagerly method is being used, to demonstrate that it uses its own instance of UnitOfWork:
public class MigrationHandler
{
    UnitOfWork uow = new UnitOfWork();

    public async Task FeedPipelineAsync()
    {
        bool moreResults = true;
        do
        {
            // documents retrieved with AsNoTracking()
            List<Document> documents = uow.DocumentRepository.GetDocumentByBatchEagerly(skip, take).ToList();
            if (documents.Count == 0) moreResults = false;
            skip += take;

            // push each record into TPL Dataflow pipeline
            foreach (Document document in documents)
            {
                // Entry point for the data flow pipeline which links to
                // a block that calls PutNewVersions()
                await dataFlowPipeline.DocumentCreationTransformBlock.SendAsync(document);
            }
        } while (moreResults);

        dataFlowPipeline.DocumentCreationTransformBlock.Complete();

        // await completion of each block at the end of the pipeline
        await Task.WhenAll(
            dataFlowPipeline.FileDocumentsActionBlock.Completion,
            dataFlowPipeline.PutVersionActionBlock.Completion);
    }
}

InvalidOperationException when using UpdateModel with EF 4.3.1

When I update my model I get an error on a child relation which I also try to update.
My model, say Order, has a relationship with OrderItem. In my view I have the details of the order together with an editor template for the order items. When I update the data, the link to Order is null but the OrderId is filled, so it should be able to link it. TryUpdateModel returns true; the save, however, fails with:
InvalidOperationException: The operation failed: The relationship could not be changed because one or more of the foreign-key properties is non-nullable. When a change is made to a relationship, the related foreign-key property is set to a null value. If the foreign-key does not support null values, a new relationship must be defined, the foreign-key property must be assigned another non-null value, or the unrelated object must be deleted.
My update method:
public ActionResult ChangeOrder(Order model)
{
    var order = this.orderRepository.GetOrder(model.OrderId);
    if (ModelState.IsValid)
    {
        var success = this.TryUpdateModel(order);
    }
    this.orderRepository.Save();
    return this.View(order);
}
I tried all the solutions I saw on SO and other sources; none succeeded.
I use .Net MVC 3, EF 4.3.1 together with DBContext.
There are a number of code smells here, which I'll try to be elegant with when correcting :)
I can only assume that "Order" is your EF entity? If so, I would highly recommend keeping it separate from the view by creating a view model for your form and copying the data into it. Your view model should really only contain properties that your form will be using or manipulating.
I also presume orderRepository.GetOrder() is a data layer call that retrieves an order from a data store?
You are also declaring potentially unused variables. "var order =" will be loaded even if your model is invalid, and "var success =" is never used.
TryUpdateModel and UpdateModel aren't very robust for real-world programming. I'm not entirely convinced they should be there at all, if I'm honest. I generally use a more abstracted approach, such as the service / factory pattern. It's more work, but gives you a lot more control.
In your case, I would recommend the following pattern. There's minimal abstraction, but it still gives you more control than using TryUpdateModel / UpdateModel:
public ActionResult ChangeOrder(OrderViewModel model) {
    if (ModelState.IsValid) {
        // Retrieve original order
        var order = orderRepository.GetOrder(model.OrderId);

        // Update primitive properties
        order.Property1 = model.Property1;
        order.Property2 = model.Property2;
        order.Property3 = model.Property3;
        order.Property4 = model.Property4;

        // Update collections manually
        order.Collection1 = model.Collection1.Select(x => new Collection1Item {
            Prop1 = x.Prop1,
            Prop2 = x.Prop2
        }).ToList();

        try {
            // Save to repository
            orderRepository.SaveOrder(order);
        } catch (Exception ex) {
            ModelState.AddModelError("", ex.Message);
            return View(model);
        }

        return RedirectToAction("SuccessAction");
    }
    return View(model);
}
Not ideal, but it should serve you a bit better...
I refer you to this post, which is similar.
I assume that the user can perform the following actions in your view:
Modify order (header) data
Delete an existing order item
Modify order item data
Add a new order item
To do a correct update of the changed object graph (order + list of order items) you need to deal with all four cases. TryUpdateModel won't be able to perform a correct update of the object graph in the database.
I write the following code directly using a context. You can abstract the use of the context away into your repository. Make sure that you use the same context instance in every repository that is involved in the following code.
public ActionResult ChangeOrder(Order model)
{
    if (ModelState.IsValid)
    {
        // load the order from DB INCLUDING the current order items in the DB
        var orderInDB = context.Orders.Include(o => o.OrderItems)
            .Single(o => o.OrderId == model.OrderId);

        // (1) Update modified order header properties
        context.Entry(orderInDB).CurrentValues.SetValues(model);

        // (2) Delete the order items from the DB
        //     that have been removed in the view
        foreach (var item in orderInDB.OrderItems.ToList())
        {
            if (!model.OrderItems.Any(oi => oi.OrderItemId == item.OrderItemId))
                context.OrderItems.Remove(item);
            // Omitting this call "Remove from context/DB" causes
            // the exception you are having
        }

        foreach (var item in model.OrderItems)
        {
            var orderItem = orderInDB.OrderItems
                .SingleOrDefault(oi => oi.OrderItemId == item.OrderItemId);
            if (orderItem != null)
            {
                // (3) Existing order item: Update modified item properties
                context.Entry(orderItem).CurrentValues.SetValues(item);
            }
            else
            {
                // (4) New order item: Add it
                orderInDB.OrderItems.Add(item);
            }
        }

        context.SaveChanges();
        return RedirectToAction("Index"); // or some other view
    }
    return View(model);
}

Why is my code not able to update the database?

I am having trouble saving my entities after updating them. I can add new entities like this: add(student); but when I try this:
if (ModelState.IsValid)
{
    db.Entry(student).State = EntityState.Modified;
    db.SaveChanges();
    return RedirectToAction("someView");
}
I get this error message:
System.Data.Entity.Infrastructure.DbUpdateConcurrencyException was unhandled by user code
Message=Store update, insert, or delete statement affected an unexpected number of rows (0). Entities may have been modified or deleted since entities were loaded. Refresh ObjectStateManager entries.
Here’s my controller method:
[HttpPost]
public ActionResult ClassAttendance(InstructorIndexData viewModel, FormCollection frmcol)
{
    var instructorData = new InstructorIndexData();
    string[] AllFstMNames = frmcol["item.Student.FirstMidName"].Split(',');
    string[] AllLstNames = frmcol["item.Student.LastName"].Split(',');
    string[] AllAddresses = frmcol["item.Student.Address"].Split(',');
    string[] AllEnrollmentDates = frmcol["item.Student.EnrollmentDate"].Split(',');
    //more of the same code…

    var student = new Student();
    var enrollment = new Enrollment();
    for (int i = 0; i < AllFstMNames.Count(); i++)
    {
        student.FirstMidName = AllFstMNames[i];
        student.LastName = AllLstNames[i];
        student.Address = AllAddresses[i];
        student.EnrollmentDate = Convert.ToDateTime(AllEnrollmentDates[i]);
        if (!string.IsNullOrEmpty(frmcol["item.Grade"]))
        {
            enrollment.Grade = Convert.ToInt32(AllGrades[i]);
        }
        enrollment.StudentID = Convert.ToInt32(AllStudentIds[i]);
        enrollment.attendanceCode = Convert.ToInt32(AllAttendanceCodes[i]);
        enrollment.classDays = AllclassDays[i];
        enrollment.CourseID = Convert.ToInt32(AllCourseIds[i]);
        //update rows
    }
    if (ModelState.IsValid)
    {
        db.Entry(student).State = EntityState.Modified;
        db.SaveChanges();
        return RedirectToAction("someView");
    }
Can you help me with just being able to update values in the database?
While I was looking at the code here, my initial thought is that it doesn't seem quite right to have a for loop that updates the student and enrollment objects multiple times and then to have only one call to db.SaveChanges outside the loop. This is concerning because only the last iteration of the for loop will be applied when the data is saved to the database. (You have a comment to "update rows" at the end of the for loop - perhaps some code is missing or misplaced?)
Then, I started thinking about why it would be necessary to manually set the Entry(...).State property. Wouldn't the db automatically know that an object is modified and needs to be saved? That led me to this question: where is db defined? What technology stack is being used there?
Finally, after making an assumption that the db object might work something like the MS LINQ-to-SQL feature, I noticed that the student object is newly instantiated before the for loop. This is fine for inserting new data, but if you want to update existing data, I believe you need to first get a copy of the object from the database and then update its properties. This allows the db object to monitor the changes (again, assuming that it has this capability). (If this is not the case, then it leads me to wonder how the db will know which record in the database to update, since you are not setting anything that appears to be a primary key, such as StudentId, on the student object in the loop.)
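A minimal sketch of the load-then-update approach described above, assuming db is an EF DbContext exposing a Students set and that AllStudentIds holds the primary keys posted from the form (both are assumptions based on the names in the question):

for (int i = 0; i < AllFstMNames.Length; i++)
{
    int studentId = Convert.ToInt32(AllStudentIds[i]);
    Student existingStudent = db.Students.Find(studentId); // loads and tracks the existing row
    if (existingStudent == null)
        continue;

    // copy the posted values onto the tracked entity
    existingStudent.FirstMidName = AllFstMNames[i];
    existingStudent.LastName = AllLstNames[i];
    existingStudent.Address = AllAddresses[i];
    existingStudent.EnrollmentDate = Convert.ToDateTime(AllEnrollmentDates[i]);
}

// change tracking produces one UPDATE per modified student; no Entry(...).State juggling needed
db.SaveChanges();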

.Net Entity Framework ObjectContext error

I have a method with this code:
using (var cc = new MyDBContext())
{
    var myList = (from user in cc.Users
                  where user.UserGroup.Name == "smth"
                  orderby user.ID ascending
                  select user);

    if (startIndex != null)
        return View(myList.Skip((int)startIndex).Take(50));
    else
        return View(myList);
}
In the view I get the exception: The ObjectContext instance has been disposed and can no longer be used for operations that require a connection.
Some people say that .ToList() should solve the problem, but it throws the exception with myList.ToList() too. What is my problem?
P.S. In debug mode the exception is thrown at #item.FullName in the view, but if I hover over the FullName property I can see the correct value.
Sorry for my bad English.
Take the "return View()" statements outside of the "using" block completely. That will ensure you have retrieved the complete data sets before your DbContext object is disposed. Like this:
List<User> myList;                // declared outside so it survives the using block
using (var cc = new MyDBContext())
{
    myList = (linq).ToList();     // (linq) = the query from the question
}
return View(myList);
I'm pretty sure the problem is that you are returning an IEnumerable to the View, which means the items haven't actually been retrieved yet. But when you return the object to your View, the DbContext is getting disposed before the view has a chance to retrieve the rows.
The problem was a lazily loaded sub-property of the User entity. I added Include("PropName") to the LINQ statement and it works well.
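Putting both pieces of advice together, a sketch might look like the following. The navigation property name UserGroup is taken from the question's query (substitute whichever property was being lazily loaded); the key point is that ToList() runs the query, and Include() loads the related data, while the context is still alive:

List<User> myList;
using (var cc = new MyDBContext())
{
    var query = from user in cc.Users.Include("UserGroup") // eager-load the related entity
                where user.UserGroup.Name == "smth"
                orderby user.ID ascending
                select user;

    myList = startIndex != null
        ? query.Skip((int)startIndex).Take(50).ToList()    // ToList() executes the query now
        : query.ToList();
}
return View(myList);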

Method not called when using yield return

I'm having a little trouble with a method in which I use yield return; it doesn't work...
public IEnumerable<MyClass> SomeMethod(int aParam)
{
    foreach (DataRow row in GetClassesFromDB(aParam).Rows)
    {
        yield return new MyClass((int)row["Id"], (string)row["SomeString"]);
    }
}
The above code never runs; when the call is made to this method, execution just steps over it.
However if I change to...
public IEnumerable<MyClass> SomeMethod(int aParam)
{
    IList<MyClass> classes = new List<MyClass>();
    foreach (DataRow row in GetClassesFromDB(aParam).Rows)
    {
        classes.Add(new MyClass((int)row["Id"], (string)row["SomeString"]));
    }
    return classes;
}
It works just fine.
I don't understand why the first method never runs, could you help me in understanding what is happening here?
The "yield" version is only "run" when the caller actually starts to enumerate the returned collection.
If, for instance, you only get the collection:
var results = SomeObject.SomeMethod (5);
and don't do anything with it, the SomeMethod will not execute.
Only when you start enumerating the results collection, it will hit.
foreach (MyClass c in results)
{
    /* Now it strikes */
}
yield return methods are actually converted into state machine classes that retrieve information lazily - only when you actually ask for it. That means that in order to actually pull data, you have to iterate over the result of your method.
// Gives you an iterator object that hasn't done anything yet
IEnumerable<MyClass> list = SomeMethod();

// Enumerate over the object
foreach (var item in list) {
    // Only here will the data be retrieved.
    // The method will stop on yield return every time the foreach loops.
}
The reason it runs in the second case is because there's no yield block, and thus the entire method runs in one go.
In this specific case, it's unlikely that you'll have any advantage to use an iterator block over a regular one because your GetClassesFromDb() isn't one either. This means that it will retrieve all the data at the same time first time it runs. Iterator blocks are best used when you can access items one at a time, because that way you can stop if you don't need them anymore.
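If the caller does need the database work to happen eagerly (an assumption about the intent here), the simplest fix is to force enumeration once at the call site rather than abandoning the iterator:

// Materializes the sequence immediately; the iterator body runs to completion here
List<MyClass> classes = SomeMethod(aParam).ToList();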
I had to learn in a near-disastrous way how cool/dangerous yield is when I decided to make our company's parser read incoming data lazily. Fortunately, only one of the handful of our implementing functions actually used the yield keyword. It took a few days to realize it was quietly not doing any work at all.
The yield keyword will be as lazy as it possibly can, including skipping over the method altogether if you don't put it to work with something like .ToList(), .FirstOrDefault(), or .Any().
Below are two variations, one using the keyword and one returning a straight-up list. One won't even bother to execute, while the other will, even though they seem the same.
public class WhatDoesYieldDo
{
    public List<string> YieldTestResults;
    public List<string> ListTestResults;

    [TestMethod]
    public void TestMethod1()
    {
        ListTest();
        Assert.IsTrue(ListTestResults.Any());  // passes: ListTest ran and filled the list

        YieldTest();
        Assert.IsTrue(YieldTestResults.Any()); // fails: the iterator body never ran, so YieldTestResults is still null
    }

    public IEnumerable<string> YieldTest()
    {
        YieldTestResults = new List<string>();
        for (var i = 0; i < 10; i++)
        {
            YieldTestResults.Add(i.ToString(CultureInfo.InvariantCulture));
            yield return i.ToString(CultureInfo.InvariantCulture);
        }
    }

    public IEnumerable<string> ListTest()
    {
        ListTestResults = new List<string>();
        for (var i = 0; i < 10; i++)
        {
            ListTestResults.Add(i.ToString(CultureInfo.InvariantCulture));
        }
        return ListTestResults;
    }
}
Moral of the story: make sure that if you have a method that returns IEnumerable and you use yield in that method, you have something that will iterate over the results, or the method won't execute at all.
