Select certain number of records for batch processing

Select certain number of records for batch processing - asp.net-mvc

Hi is it possible using Entity Framework and/or linq to select a certain number of rows? For example i want to select rows 0 - 500000 and assign these records to the List VariableAList object, then select rows 500001 - 1000000 and assign this to the List VariableBList object, etc. etc.
Where the Numbers object is like ID,Number,DateCreated, DateAssigned, etc.

Sounds like you're looking for the .Take(int) and .Skip(int) methods
using (YourEntities db = new YourEntities())
{
var VariableAList = db.Numbers
.Take(500000);
var VariableBList = db.Numbers
.Skip(500000)
.Take(500000);
}
You may want to be wary of the size of these lists in memory.
Note: You also may need an .OrderBy clause prior to using .Skip or .Take--I vaguely remember running into this problem in the past.

Related

Ruby on rails how to take lowest/biggest number from array

I have an array with products, I need to display only 1 product, with the largest number in available_amount. How can i do this?
How do I iterate to display products with parameters:
- #part.wh_ps.sort_by(&:available_amount).each do |whp|
product number1: available_amount: 2;
product number2: available_amount: 5;

If those objects are mapped from the database being the result of a query and you already have them in memory, then you could use Enumerable#max_by:
#part.wh_ps.max_by(&:available_amount)
It should return the object within that array with the biggest available_amount if any.
If you need the one with the lowest available_amount, then Enumerable#min_by, and if you need both Enumerable#minmax_by.
However if that's not the case and you're hitting the database again, you could consider making the exact query using SQL (ActiveRecord) asking for the row with the biggest value for the given column.

SELECT statement with multiple conditions on TAGs

For considerably long period of time I’ve been struggling the following problem. This is an example of data stored in the DB:
> show series
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=fil
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=std
flights,cycleId=1535,cycleIdx=0,engineId=2,flightId=1696,flightIdx=0,type=raw
...
and my intention is to select a specific one by using a query like this:
SELECT * FROM flights WHERE type='fil' AND engineId= '2' AND flightId = '1696' AND flightIdx = '0' AND cycleId = '1535' AND cycleIdx = '0'
Such query, however, yields always zero results. Zilch.
Selecting the first (and only) tag works fine:
SELECT * FROM flights WHERE cycleId = '1535'
but using this condition on any other tag, like for example
SELECT * FROM flights WHERE type='fil'
does never return a single row. Querying only the first tag and nothing else works.
Could you please give me a hint what am I doing wrong? From all I have found people are always selecting just by a single tag but never more. What is the part that I cannot see?
Many thanks for any ideas!

I believe I have discovered the reason: two keys from the tags made by mistake their way into the fields. I spotted the trouble when listing the tag and fields keys as
show tag keys
show field keys
Deleting all records does not remove the keys from these lists and the problem persists. One need to drop the entire database to restore the order of things.

How to reduce Azure Table Storage latency?

I have a rather huge (30 mln rows, up to 5–100Kb each) Table on Azure.
Each RowKey is a Guid and PartitionKey is a first Guid part, for example:
PartitionKey = "1bbe3d4b"
RowKey = "1bbe3d4b-2230-4b4f-8f5f-fe5fe1d4d006"
Table has 600 reads and 600 writes (updates) per second with an average latency of 60ms. All queries use both PartitionKey and RowKey.
BUT, some reads take up to 3000ms (!). In average, >1% of all reads take more than 500ms and there's no correlation with entity size (100Kb row may be returned in 25ms and 10Kb one – in 1500ms).
My application is an ASP.Net MVC 4 web-site running on 4-5 Large instances.
I have read all MSDN articles regarding Azure Table Storage performance goals and already did the following:
UseNagle is turned Off
Expect100Continue is also disabled
MaxConnections for table client is set to 250 (setting 1000–5000 doesn't make any sense)
Also I checked that:
Storage account monitoring counters have no throttling errors
There are some kind of "waves" in performance, though they does not depend on load
What could be the reason of such performance issues and how to improve it?

I use the MergeOption.NoTracking setting on the DataServiceContext.MergeOption property for extra performance if I have no intention of updating the entity anytime soon. Here is an example:
var account = CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));
var tableStorageServiceContext = new AzureTableStorageServiceContext(account.TableEndpoint.ToString(), account.Credentials);
tableStorageServiceContext.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(1));
tableStorageServiceContext.MergeOption = MergeOption.NoTracking;
tableStorageServiceContext.AddObject(AzureTableStorageServiceContext.CloudLogEntityName, newItem);
tableStorageServiceContext.SaveChangesWithRetries();
Another problem might be that you are retrieving the entire enity with all its properties even though you intend only use one or two properties - this is of course wasteful but can't be easily avoided. However, If you use Slazure then you can use query projections to only retrieve the entity properties that you are interested in from the table storage and nothing more, which would give you better query performance. Here is an example:
using SysSurge.Slazure;
using SysSurge.Slazure.Linq;
using SysSurge.Slazure.Linq.QueryParser;
namespace TableOperations
{
public class MemberInfo
{
public string GetRichMembers()
{
// Get a reference to the table storage
dynamic storage = new QueryableStorage<DynEntity>("UseDevelopmentStorage=true");
// Build table query and make sure it only return members that earn more than $60k/yr
// by using a "Where" query filter, and make sure that only the "Name" and
// "Salary" entity properties are retrieved from the table storage to make the
// query quicker.
QueryableTable<DynEntity> membersTable = storage.WebsiteMembers;
var memberQuery = membersTable.Where("Salary > 60000").Select("new(Name, Salary)");
var result = "";
// Cast the query result to a dynamic so that we can get access its dynamic properties
foreach (dynamic member in memberQuery)
{
// Show some information about the member
result += "LINQ query result: Name=" + member.Name + ", Salary=" + member.Salary + "<br>";
}
return result;
}
}
}
Full disclosure: I coded Slazure.
You could also consider pagination if you are retrieving large data sets, example:
// Retrieve 50 members but also skip the first 50 members
var memberQuery = membersTable.Where("Salary > 60000").Take(50).Skip(50);

Typically, if a specific query requires scanning a large number of rows, that will take longer time. Is the behavior you are seeing specific a query / data? Or, are you seeing the performance varies for the same data and query?

Search records having comma seperated values that contains any element from the given list

I have a domain class Schedule with a property 'days' holding comma separated values like '2,5,6,8,9'.
Class Schedule {
String days
...
}
Schedule schedule1 = new Schedule(days :'2,5,6,8,9')
schedule1.save()
Schedule schedule2 = new Schedule(days :'1,5,9,13')
schedule2.save()
I need to get the list of the schedules having any day from the given list say [2,8,11].
Output: [schedule1]
How do I write the criteria query or HQL for the same. We can prefix & suffix the days with comma like ',2,5,6,8,9,' if that helps.
Thanks,

Hope you have a good reason for such denormalization - otherwise it would be better to save the list to a child table.
Otherwise, querying would be complicated. Like:
def days = [2,8,11]
// note to check for empty days
Schedule.withCriteria {
days.each { day ->
or {
like('username', "$day,%") // starts with "$day"
like('username', "%,$day,%")
like('username', "%,$day") // ends with "$day"
}
}
}

In MySQL there is a SET datatype and FIND_IN_SET function, but I've never used that with Grails. Some databases have support for standard SQL2003 ARRAY datatype for storing arrays in a field. It's possible to map them using hibernate usertypes (which are supported in Grails).
If you are using MySQL, FIND_IN_SET query should work with the Criteria API sqlRestriction:
http://grails.org/doc/latest/api/grails/orm/HibernateCriteriaBuilder.html#sqlRestriction(java.lang.String)
Using SET+FIND_IN_SET makes the queries a bit more efficient than like queries if you care about performance and have a real requirement to do denormalization.

Combining table, web service data in Grails

I'm trying to figure out the best approach to display combined tables based on matching logic and input search criteria.
Here is the situation:
We have a table of customers stored locally. The fields of interest are ssn, first name, last name and date of birth.
We also have a web service which provides the same information. Some of the customers from the web service are the same as the local file, some different.
SSN is not required in either.
I need to combine this data to be viewed on a Grails display.
The criteria for combination are 1) match on SSN. 2) For any remaining records, exact match on first name, last name and date of birth.
There's no need at this point for soundex or approximate logic.
It looks like what I should do is extract all the records from both inputs into a single collection, somehow making it a set on SSN. Then remove the blank ssn.
This will handle the SSN matching (once I figure out how to make that a set).
Then, I need to go back to the original two input sources (cached in a collection to prevent a re-read) and remove any records that exist in the SSN set derived previously.
Then, create another set based on first name, last name and date of birth - again if I can figure out how to make a set.
Then combine the two derived collections into a single collection. The collection should be sorted for display purposes.
Does this make sense? I think the search criteria will limit the number of record pulled in so I can do this in memory.
Essentially, I'm looking for some ideas on how the Grails code would look for achieving the above logic (assuming this is a good approach). The local customer table is a domain object, while what I'm getting from the WS is an array list of objects.
Also, I'm not entirely clear on how the maxresults, firstResult, and order used for the display would be affected. I think I need to read in all the records which match the search criteria first, do the combining, and display from the derived collection.

The traditional Java way of doing this would be to copy both the local and remote objects into TreeSet containers with a custom comparator, first for SSN, second for name/birthdate.
This might look something like:
def localCustomers = Customer.list()
def remoteCustomers = RemoteService.get()
TreeSet ssnFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.ssn <=> c2.ssn}))
ssnFilter.addAll(localCustomers)
ssnFilter.addAll(remoteCustomers)
TreeSet nameDobFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.firstName + c1.lastName + c1.dob <=> c2.firstName + c2.lastName + c2.dob}))
nameDobFilter.addAll(ssnFilter)
def filteredCustomers = nameDobFilter as List
At this point, filteredCustomers has all the records, except those that are duplicates by your two criteria.
Another approach is to filter the lists by sorting and doing a foldr operation, combining adjacent elements if they match. This way, you have an opportunity to combine the data from both sources.
For example:
def combineByNameAndDob(customers) {
customers.sort() {
c1, c2 -> (c1.firstName + c1.lastName + c1.dob) <=>
(c2.firstName + c2.lastName + c2.dob)
}.inject([]) { cs, c ->
if (cs && c.equalsByNameAndDob(cs[-1])) {
cs[-1].combine(c) //combine the attributes of both records
cs
} else {
cs << c
}
}
}

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart