Insert/update bulk data into a SQL table in Entity Framework - entity-framework-6

I have a product table consisting of product number, price and currency.
I upload the data through an Excel file.
When I upload the file I build a list at the back end with all the data from Excel, then send the entire list to the server to update/insert the data in the table.
foreach (Product value in item)
{
    Product p = context.Product.SingleOrDefault(x => x.ProductNumber == value.ProductNumber
                                                  && x.CurrencyId == value.CurrencyId);
    if (p != null)
    {
        context.Entry<Product>(p).Property(x => x.ProductPrice).IsModified = true;
        p.ProductPrice = value.ProductPrice;
        context.SaveChanges();
    }
    else
    {
        context.Product.Add(value);
        context.SaveChanges();
    }
}
But as the data grows (more products), the upload takes a long time.
Any suggestions to optimize this? One thing I am considering is an index on the product number and currency columns.

The performance issue is not caused by a missing index (of course, adding one will also help) but by the number of database round-trips your code performs.
For every product:
You check if the product exists (1 database round-trip)
You insert/update the product (1 database round-trip)
So if you upload 10,000 products from your Excel file, you will perform 20,000 database round-trips, which is insane.
There are some tricks that may help performance a little, like adding the new products with AddRange instead of Add (see the sketch below), but you would still end up with the same performance issue caused by the number of database round-trips.
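For illustration, here is a rough sketch of that batching idea, reusing the names from the question's code (which are assumptions here): one query loads the candidates, updates happen on tracked entities, and the inserts are queued with AddRange before a single SaveChanges. EF6 still issues one statement per row on save, so this only softens the problem:
// Load every product that could match, in a single query.
var numbers = item.Select(v => v.ProductNumber).ToList();
var existing = context.Product
    .Where(p => numbers.Contains(p.ProductNumber))
    .ToList();

var toInsert = new List<Product>();
foreach (Product value in item)
{
    var match = existing.FirstOrDefault(p => p.ProductNumber == value.ProductNumber
                                          && p.CurrencyId == value.CurrencyId);
    if (match != null)
        match.ProductPrice = value.ProductPrice; // tracked entity, flagged as modified automatically
    else
        toInsert.Add(value);
}

context.Product.AddRange(toInsert); // EF6: queue all inserts at once
context.SaveChanges();              // one SaveChanges call instead of one per product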
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library can dramatically improve performance by performing:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
Example:
// Insert or update using ProductNumber && CurrencyId as the key
ctx.BulkMerge(list, operation =>
{
    operation.ColumnPrimaryKeyExpression = x => new { x.ProductNumber, x.CurrencyId };
});

LINQ query times out

My LINQ query times out and the results are never retrieved.
using (var repository = PersistenceService.GetRepository<Organisation>(UserId))
{
    var organisation = repository.GetAll().FirstOrDefault(x => x.Id == organisationId);
    if (organisation != null)
    {
        isCourseApproverRole = organisation.Contacts.FirstOrDefault(con => con.RoleName == "CourseApprover" &&
                                                                           con.Individual.Id == individualId) != null;
    }
}
When I try doing all this in one query it works fine.
Can someone explain why the above query times out?
Note: organisation.Contacts contains about 18,000 rows for the selected organisation.
It's because of massive lazy loading.
The first command...
var organisation = repository.GetAll().FirstOrDefault(x => x.Id == organisationId);
... pulls an Organisation object into memory. That shouldn't be a problem.
But then you access organisation.Contacts. It doesn't matter which LINQ method you apply to this collection, the whole collection is pulled into memory by lazy loading. The LINQ filter is applied afterwards.
However, though highly inefficient, this still shouldn't cause a timeout. Fetching 18,000 records by an indexed search shouldn't take more than 30 seconds (I assume) unless something else is terribly wrong (like low resources or a bad network).
The part con.Individual.Id == individualId is the culprit. If you monitored the executed SQL commands you'd see that this causes one query per Individual until the predicate Id == individualId is matched. These queries run while the query for organisation.Contacts is being read. No doubt this causes the timeout.
You can fix this by replacing the predicate with con.IndividualId == individualId (i.e. using the foreign key property). But I really don't understand why you don't do this in one query, which works fine, as you say. With the current approach you fetch large amounts of data, while in the end you only need one boolean value!
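For the record, a minimal sketch of that single-query version (assuming GetAll() returns an IQueryable and that Contact exposes an IndividualId foreign key property, which is an assumption here):
using (var repository = PersistenceService.GetRepository<Organisation>(UserId))
{
    // One round-trip: the whole predicate is translated to SQL,
    // so only a boolean comes back instead of 18,000 contacts.
    isCourseApproverRole = repository.GetAll()
        .Any(org => org.Id == organisationId &&
                    org.Contacts.Any(con => con.RoleName == "CourseApprover" &&
                                            con.IndividualId == individualId));
}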

How to reduce Azure Table Storage latency?

I have a rather huge (30 mln rows, up to 5–100Kb each) Table on Azure.
Each RowKey is a Guid and PartitionKey is a first Guid part, for example:
PartitionKey = "1bbe3d4b"
RowKey = "1bbe3d4b-2230-4b4f-8f5f-fe5fe1d4d006"
The table has 600 reads and 600 writes (updates) per second with an average latency of 60ms. All queries use both PartitionKey and RowKey.
BUT, some reads take up to 3000ms (!). On average, >1% of all reads take more than 500ms and there's no correlation with entity size (a 100Kb row may be returned in 25ms and a 10Kb one in 1500ms).
My application is an ASP.Net MVC 4 web-site running on 4-5 Large instances.
I have read all the MSDN articles regarding Azure Table Storage performance goals and have already done the following:
UseNagle is turned Off
Expect100Continue is also disabled
MaxConnections for the table client is set to 250 (setting it to 1000–5000 doesn't make any difference)
Also I checked that:
Storage account monitoring counters have no throttling errors
There are some kind of "waves" in performance, though they do not depend on load
What could be the reason for such performance issues and how can I improve things?
I use the MergeOption.NoTracking setting on the DataServiceContext.MergeOption property for extra performance if I have no intention of updating the entity anytime soon. Here is an example:
var account = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var tableStorageServiceContext = new AzureTableStorageServiceContext(account.TableEndpoint.ToString(), account.Credentials);
tableStorageServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
tableStorageServiceContext.MergeOption = MergeOption.NoTracking;
tableStorageServiceContext.AddObject(AzureTableStorageServiceContext.CloudLogEntityName, newItem);
tableStorageServiceContext.SaveChangesWithRetries();
Another problem might be that you are retrieving the entire entity with all its properties even though you intend to use only one or two of them. This is of course wasteful but can't easily be avoided. However, if you use Slazure you can use query projections to retrieve only the entity properties you are interested in from the table storage and nothing more, which gives you better query performance. Here is an example:
using SysSurge.Slazure;
using SysSurge.Slazure.Linq;
using SysSurge.Slazure.Linq.QueryParser;

namespace TableOperations
{
    public class MemberInfo
    {
        public string GetRichMembers()
        {
            // Get a reference to the table storage
            dynamic storage = new QueryableStorage<DynEntity>("UseDevelopmentStorage=true");

            // Build a table query and make sure it only returns members that earn more
            // than $60k/yr by using a "Where" query filter, and make sure that only the
            // "Name" and "Salary" entity properties are retrieved from the table storage
            // to make the query quicker.
            QueryableTable<DynEntity> membersTable = storage.WebsiteMembers;
            var memberQuery = membersTable.Where("Salary > 60000").Select("new(Name, Salary)");

            var result = "";

            // Cast the query result to a dynamic so that we can access its dynamic properties
            foreach (dynamic member in memberQuery)
            {
                // Show some information about the member
                result += "LINQ query result: Name=" + member.Name + ", Salary=" + member.Salary + "<br>";
            }

            return result;
        }
    }
}
Full disclosure: I coded Slazure.
You could also consider pagination if you are retrieving large data sets, for example:
// Skip the first 50 members, then retrieve the next 50
var memberQuery = membersTable.Where("Salary > 60000").Skip(50).Take(50);
Typically, if a specific query requires scanning a large number of rows, it will take longer. Is the behavior you are seeing specific to a query / data set? Or are you seeing the performance vary for the same data and query?

How to include duplicate records in query?

Given an array of part ids containing duplicates, how can I find the corresponding records in my Part model, including the duplicates?
An example array of part ids would be ["1B", "4", "3421", "4"]. Assuming I have a record corresponding to each of those values, I would like to see 4 records returned in total, not 3. If possible, I was hoping to be able to run additional SQL operations on whatever is returned.
Here's what I'm currently using which doesn't include the duplicates:
@parts = Part.where(:part_id => params[:ids])
To give a little background, I'm trying to upload an XML file containing a list of parts used in some item. My application is meant to parse the XML file and compare the parts listed within against my Parts database so that I can see how much the part weighs. These items will sometimes contain duplicates of various parts so that's what I'm trying to account for here.
The only way I can think of doing it is using map...
@parts = params[:ids].map { |id| Part.find_by_id(id) }
Hard to tell exactly what you are doing. Are you looking up the weight from the XML or from your data?
parts_xml = some_method_that_loads_xml
part_ids_from_xml = parts_xml.... # pull out the ids
parts = Part.where("id IN (?)", part_ids_from_xml)
Now you have two arrays (the XML data and your 'matching' database records) and you can use select or detect to do in-memory lookups by part_id:
part_ids_from_xml.each do |part_id|
  weight = parts.detect { |item| item.id == part_id }.weight
  puts "#{part_id} weighs #{weight}"
end
see http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-detect
and http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-select

ASP.NET MVC3 - Performance improvement through the paging concept, I need an example?

I am working on an application built on ASP.NET MVC 3.0 and displaying the data in an MVC WebGrid.
I am using LINQ to get the records from the entities into an EntityViewModel. In doing this I have to convert the records from entity to EntityViewModel.
I have 30K records to be displayed in the grid, and for each record there are 3 flags for which it has to go to 3 other tables, check whether the record exists, and paint it with true or false in the grid.
I am displaying 10 records at a time, but it is very slow because I am fetching all the records and storing them in my application.
Paging is in place (I mean to say only 10 records are displayed in the WebGrid), but all the records are loaded into the application, which takes 15-20 seconds. I have checked where this time is being spent: it's in the painting step (where every record is compared with the 3 other tables).
I have converted the LINQ query to SQL and I can see my SQL query executes in under 2 seconds. From this I can confidently say that I do not want to spend time on SQL indexing, as the speed of the SQL query is good enough.
I have two options to implement:
1) Caching for MVC
2) Paging (where I get only the first ten records).
I want to go with the paging technique for the performance improvement.
Now my question is: how do I pass the number 10 (the number of records) to the service method so that it brings back only ten records? And how do I get the next 10 records when clicking on the next page?
I would post the code, but I cannot as it contains some sensitive data.
Any example of how to tackle this situation would be appreciated, many thanks.
If you're using SQL 2005+ you could use ROW_NUMBER() in your stored procedure:
http://msdn.microsoft.com/en-us/library/ms186734(v=SQL.90).aspx
Or else, if you just want to do it in LINQ, try the Skip() and Take() methods.
As simple as:
int page = 2;
int pageSize = 10;
var pagedStuff = query.Skip((page - 1) * pageSize).Take(pageSize);
You should always, always, always limit the number of rows you get from the database. Unbounded reads kill applications. 30k turns into 300k and then you are just destroying your SQL server.
Jfar is on the right track with .Skip and .Take. The Linq2Sql engine (and most entity frameworks) will convert this to SQL that returns a limited result set. However, this doesn't preclude caching the results as well; I recommend doing that too. The fastest trip to SQL Server is the one you don't have to take. :) I do something like this, where my controller method handles paged or un-paged results and caches whatever comes back from SQL:
[AcceptVerbs("GET")]
[OutputCache(Duration = 360, VaryByParam = "*")]
public ActionResult GetRecords(int? page, int? items)
{
    int limit = items ?? defaultItemsPerPage;
    int pageNum = page ?? 0;
    if (pageNum <= 0) { pageNum = 1; }
    ViewBag.Paged = (page != null);

    List<Record> records; // "Record" stands in for your entity type; var can't be initialized to null
    if (page != null)
    {
        records = myEntities.Skip((pageNum - 1) * limit).Take(limit).ToList();
    }
    else
    {
        records = myEntities.ToList();
    }
    return View("GetRecords", records);
}
If you call it with no params, you get the entire result set (/GetRecords). Calling it with params gets you the restricted set (/GetRecords?page=3&items=25).
You could extend this method further by adding .Contains and .StartsWith functionality, as in the sketch below.
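A rough sketch of that extension (the startsWith parameter and the Name property are hypothetical, just to show the shape); the filter is applied before paging so it still translates to a SQL WHERE clause:
public ActionResult GetRecords(int? page, int? items, string startsWith)
{
    int limit = items ?? defaultItemsPerPage;
    int pageNum = Math.Max(page ?? 1, 1);

    IQueryable<Record> query = myEntities;
    if (!string.IsNullOrEmpty(startsWith))
    {
        // Translates to WHERE Name LIKE 'foo%' in SQL, not an in-memory scan.
        query = query.Where(r => r.Name.StartsWith(startsWith));
    }

    var records = query.Skip((pageNum - 1) * limit).Take(limit).ToList();
    return View("GetRecords", records);
}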
If you do decide to go the custom stored procedure route, I'd recommend using "TOP" and "ROW_NUMBER" to restrict results rather than a temp table.
Personally, I would create a custom stored procedure to do this and then call it through LINQ to SQL, e.g.:
CREATE PROCEDURE [dbo].[SearchData]
(
    @SearchStr NVARCHAR(50),
    @Page int = 1,
    @RecsPerPage int = 50,
    @rc int OUTPUT
)
AS
SET NOCOUNT ON
SET FMTONLY OFF

DECLARE @TempFound TABLE
(
    UID int IDENTITY NOT NULL,
    PersonId UNIQUEIDENTIFIER
)

INSERT INTO @TempFound
(
    PersonId
)
SELECT PersonId FROM People WHERE Surname LIKE '%' + @SearchStr + '%'

SET @rc = @@ROWCOUNT

-- Calculate the final offset for paging --
DECLARE @FirstRec int, @LastRec int
SELECT @FirstRec = (@Page - 1) * @RecsPerPage
SELECT @LastRec = (@Page * @RecsPerPage + 1)

-- Final select --
SELECT p.* FROM People p INNER JOIN @TempFound tf
    ON p.PersonId = tf.PersonId
WHERE (tf.UID > @FirstRec) AND (tf.UID < @LastRec)
The @rc parameter is the total number of records found.
You obviously have to adapt it to your own table, but it should run extremely fast.
To bind it to an object in LINQ to SQL, you just have to make sure that the final select's fields match the fields of the object it is bound to.
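For completeness, a minimal sketch of the LINQ to SQL call site (the MyDataContext name and the row mapping are assumptions here; dragging the procedure onto the O/R designer generates a wrapper method like this):
using (var db = new MyDataContext()) // hypothetical DataContext
{
    int? rc = 0;

    // The designer-generated wrapper executes EXEC [dbo].[SearchData]
    // and maps each returned row to the bound object.
    var people = db.SearchData("smith", 2, 50, ref rc).ToList();

    int totalFound = rc ?? 0; // value of the @rc OUTPUT parameter
}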

Select certain number of records for batch processing

Hi, is it possible using Entity Framework and/or LINQ to select a certain number of rows? For example, I want to select rows 0-500000 and assign these records to the List VariableAList object, then select rows 500001-1000000 and assign them to the List VariableBList object, etc.
The Numbers object looks like ID, Number, DateCreated, DateAssigned, etc.
Sounds like you're looking for the .Take(int) and .Skip(int) methods
using (YourEntities db = new YourEntities())
{
    var VariableAList = db.Numbers
        .Take(500000);

    var VariableBList = db.Numbers
        .Skip(500000)
        .Take(500000);
}
You may want to be wary of the size of these lists in memory.
Note: You will also need an .OrderBy clause prior to using .Skip or .Take; LINQ to Entities only supports .Skip on sorted input. I remember running into this problem in the past. A combined sketch follows.
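Putting the two notes together, a minimal sketch assuming Numbers has an ID column to order by:
using (YourEntities db = new YourEntities())
{
    // Order first so Skip/Take translate to deterministic SQL paging.
    var variableAList = db.Numbers
        .OrderBy(n => n.ID)
        .Take(500000)
        .ToList();

    var variableBList = db.Numbers
        .OrderBy(n => n.ID)
        .Skip(500000)
        .Take(500000)
        .ToList();
}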
