Parallelization of user fetching with the Graph API and delta

I use a delta query to get changes to users for a particular tenant. The algorithm looks like:
Fetch all users and save delta
Use delta to get only changes
Everything works fine; however, the initial call to fetch all users is very slow, because I need to follow nextLink. If a tenant has a huge number of users (> 1,000,000) and the maximum number of items per page is 999, that synchronization takes a lot of time.
I thought I could parallelize it: use a startswith(mail,'{a}') filter and call the API for every letter of the alphabet. The problem is that with this approach I cannot get a delta link (or rather, I would get a separate delta link for every call).
Is there maybe a better way to speed up user fetching?

Delta on users does not currently support filtering objects on any property other than the Id. You could request support for filtering by adding an idea on UserVoice.
As a workaround, you could sync the users in parallel with a filter using the GET API (/users) and then issue a delta query with $deltatoken=latest to get a token from that point on, rather than syncing all the changes sequentially. This doesn't guarantee consistency, though.
Lastly, sync can be made faster (using delta, without parallelization) by selecting only the properties you need.
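The workaround above can be sketched in Python. This is only an illustration, not a complete client: `ACCESS_TOKEN` handling, nextLink paging, and error handling are omitted, and the `session` object is assumed to be a `requests.Session` with the auth header already set. The endpoints and query options (`$filter=startswith(...)`, `$select`, `$deltatoken=latest`) come from the answer; everything else is illustrative.

```python
# Sketch: parallel filtered initial sync, then a delta query seeded
# with $deltatoken=latest so we only receive changes from now on.
import string
from concurrent.futures import ThreadPoolExecutor

GRAPH = "https://graph.microsoft.com/v1.0"

def initial_sync_urls(letters=string.ascii_lowercase):
    """One filtered /users URL per starting letter of the mail address,
    selecting only the properties we need to keep pages small."""
    return [
        f"{GRAPH}/users?$filter=startswith(mail,'{c}')&$select=id,mail"
        for c in letters
    ]

def latest_delta_url():
    """Delta query that returns just a token, skipping full enumeration."""
    return f"{GRAPH}/users/delta?$deltatoken=latest"

def fetch_all(session, max_workers=8):
    """Run the filtered initial syncs in parallel (each worker would also
    follow @odata.nextLink, omitted here), then grab the delta token."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        first_pages = list(pool.map(session.get, initial_sync_urls()))
    token_response = session.get(latest_delta_url())
    return first_pages, token_response
```

Note the ordering caveat from the answer: changes that land between the parallel sync and the `$deltatoken=latest` call can be missed, so consistency is not guaranteed.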

Related

Device Delete event Handling in Rule chain being able to reduce the total device count at Customer Level

I am using the total count of devices as a "server attribute" at the customer entity level, which is in turn used by dashboard widgets such as doughnut charts. To maintain that count, I have put a rule chain in place that handles device addition/assignment events by incrementing the "totalDeviceCount" attribute at the customer level. But when a device is deleted/unassigned, I am unable to get access to the customer entity using an "Enrichment" node, because the relation has already been removed by the time this event is triggered. As a result, I have the challenge of maintaining the right count for the widgets.
Has anyone come across a similar requirement? How do you handle this scenario?
What you could do is count your devices periodically, instead of tracking each individual addition/removal.
You can achieve this using the Aggregate Latest Node, where you can indicate a period (say, every minute), the entity or devices you want to count, and the variable name you want to save the result to.
This node outputs a POST_TELEMETRY_REQUEST. If you are OK with that, just route that node to a Save Timeseries node. If you want an attribute instead, route it to a Script Transformation Node and change the msgType to POST_ATTRIBUTE_REQUEST.

Can I perform a single fetch request which returns independent calculations for subsets of the results?

My data model has a ClickerRecord entity with 2 attributes: date (NSDate) and numberOfBiscuits (NSNumber). Every time a new record is added, a different value for numberOfBiscuits can be entered.
To calculate a daily average for the number of biscuits I'm currently doing a fetch request for each day within range and using the corresponding NSExpression to calculate the sum of all numberOfBiscuits values for that day.
The problem: I'm using asynchronous fetch requests to avoid blocking the main thread, so it ends up being quite slow when there are many days between the first and last record. The fetch requests are performed one after another.
I could also load all records into memory and perform the sorting and calculations, but I'm worried that it could become an issue when the number of records becomes very large.
Therefore, my question: Is it possible to use NSExpressions to add something like sub-predicates for each date interval, in order to do a single fetch request and retrieve a dictionary with an entry for each daily sum of numberOfBiscuits?
If not, what would be the recommended approach for this situation?
I've read about subqueries but as far as I've understood they're not intended for this kind of use.
This is the first question I'm asking on SO, so I hope to have written it in a clear way :)
I think what you are looking for is propertiesToGroupBy (see the Apple docs) on NSFetchRequest, though in your case it is not straightforward to implement, for reasons I will discuss later.
Suppose you could specify the category of biscuit consumed on each occasion, and this is stored in a category attribute of your entity. Then to obtain the total number of biscuits of each category (ignoring the date), you could use an NSExpression with the sum: function and specify:
fetch.propertiesToGroupBy = ["category"]
CoreData will then group the results of the fetch by the category and will calculate the sum for each group separately.
The problem in your case is that (unless you already strip the time information out of your date attribute) there is no attribute representing the date interval you want to group by, and Core Data will not let you group by a computed value. You would need to add a new day attribute to your entity, calculate it whenever you add/update a record, and specify it in the group by. You face the same problem again if you subsequently want to calculate your average over a different interval, weeks or months for example. One other downside is that the results will only include days for which there are ClickerRecords: if the user has a day where they consume no biscuits, the fetch will not show a result for that day (i.e. it will not infer an average of 0). You would need to handle this appropriately when using the results.
It might be better either to tune your asynchronous fetch or, as you suggest, just to read the whole lot into memory to perform the calculations. If your entity only has those two attributes, and assuming your users don't live entirely on biscuits, the volumes should not be too problematic.

Cumulocity data export

I noticed a limit of 2000 records per API call when getting collections out of Cumulocity. Will we be constrained by this limit, or is there another batch API available?
You cannot get more than 2000 records in a single collection request at the moment. But you can narrow the query, e.g. by time, and fetch the data in multiple requests if it exceeds 2000 records.
Example:
/measurement/measurements?dateFrom={dateFrom}&dateTo={dateTo}
Another way would be to have the data pushed to you continuously. You can use the real-time API: http://cumulocity.com/guides/reference/real-time-notifications/
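The time-windowing idea can be sketched as follows (Python; the function names, the window size, and the base URL are illustrative, while the `/measurement/measurements?dateFrom=...&dateTo=...` path comes from the answer). Each window is chosen small enough that its result set stays under the 2000-record limit.

```python
# Sketch: split a time range into fixed windows and build one
# measurement-collection request per window.
from datetime import datetime, timedelta

def date_windows(start, end, step=timedelta(hours=1)):
    """Yield (dateFrom, dateTo) pairs covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)
        yield cur, nxt
        cur = nxt

def window_urls(base, start, end, step=timedelta(hours=1)):
    """One collection request URL per time window."""
    return [
        f"{base}/measurement/measurements"
        f"?dateFrom={a.isoformat()}&dateTo={b.isoformat()}"
        for a, b in date_windows(start, end, step)
    ]
```

If a single window still returns 2000 records, shrink the step (or page within the window) until each response fits.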

Repository for Pagination in ASP.NET

I am fetching around 10,000 records from a service. I need to implement pagination in my ASP.NET UI. I don't want to store the records in the database. I have planned to fetch records in chunks (of 100 records) and put them in a cache.
If I display 10 records per page, I can paginate between 10 pages. Now if a user clicks page number 11, I will call the service again, get the records, and refresh the cache to hold the new set. If the user then clicks the first page index again, I need to hit the service again.
Is this a feasible strategy for pagination in an ASP.NET context? Also, too many records in the cache could impact performance. Could anybody suggest an effective approach for this kind of scenario?
If using pagination, there is no reason to cache. Otherwise, this is generally the right approach.
The flow is select (select from) -> filter (where) -> sort (order by) -> skip (page * pagesize) -> take (pagesize).
This should ALL be passed down to the data layer, so that user code is not actually executing any of it; the DB is. The skip/take part is usually where people have the most issues, as it requires a generated column in the query (a row number), but it is usually doable on the DB side.
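The flow above can be sketched in a few lines. This in-memory Python version only illustrates the order of operations; in ASP.NET the same pipeline would be composed on an IQueryable so the database executes it (e.g. as `ORDER BY ... OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY` on SQL Server), rather than the application.

```python
# Sketch of the filter -> sort -> skip -> take pipeline.
def page(records, predicate, sort_key, page_number, page_size):
    """Return one zero-based page of matching records."""
    filtered = (r for r in records if predicate(r))   # where
    ordered = sorted(filtered, key=sort_key)          # order by
    start = page_number * page_size                   # skip
    return ordered[start:start + page_size]           # take
```

The key point is that skip happens after sorting: without a stable, explicit order, "page 2" is not well defined.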

How do I get current courses for a user in Desire2Learn's Valence API? What can we do to fetch them when courses number in the thousands?

We need to find all the courses for a user whose startDate is earlier than today's date and whose endDate is later than today's date. We are using the API
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3
In one particular case I have more than 18 thousand courses for one user. The service cannot return 18 thousand records in one go; I can only get 100 records at a time, so I need to use the bookmark field to fetch the data in sets of 100 records. The bookmark is the courseId of the last (100th) record fetched, used to get the next set of 100 records.
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3&bookmark=12528
I need to repeat the loop 180 times, which results in a "Request time out" error.
I need to filter the records on the basis of startDate and endDate, and no sorting criterion is available that can sort the data by startDate or endDate. Can anyone help me find a way to sort this data, or suggest another API that can do this type of sorting?
Note: all 18 thousand records have the property "IsActive": true.
Rather than getting to the list of org units by user, you can try getting to the user by the list of org units. You could try using /d2l/api/lp/{ver}/orgstructure/{orgUnitId}/descendants/?ouTypeId={courseOfferingType} to retrieve the entire list of course offering IDs descended from the highest common ancestor known for the user's enrollments. You can then loop through /d2l/api/lp/{ver}/courses/{orgUnitId} to fetch back the course offering info for each one of those org units to pre-filter and cut out all the course offerings you don't care about based on dates. Then, for the ones left, you can check for the user's enrollment in each one of those to figure out which of your smaller set the user matches with.
This will certainly result in more calls to the service, not less, so it only has two advantages I can see:
You should be able to get the entire starting set of course offerings you need off the hop, rather than getting it back in pages (although it's entirely possible that this call will get turned into a paging call in the future, and the "fetch all the org units at once" nature it currently has will be deprecated).
If you need to do this entire use-case for more than one user, you can fetch the org structure data once, cache it, and then only do queries for users on the subset of the data.
In the meantime, I think it's totally reasonable to request an enhancement on the enrollment calls to provide better filtering (active/inactive, start dates, end dates, etc.): I suspect such a request might see more traction than a request to give clients control over paging (i.e. the number of responses in each page frame).
