Diffable data source performance issues with > 20K rows - UITableView

I am running into performance issues with the diffable data source when working with a larger data set, around 22,000 items. I am surprised that applying the snapshot takes so much time when animation is on. See the code below:
let shouldAnimate = tableView.numberOfSections != 0
apply(snapshot as NSDiffableDataSourceSnapshot<String, NSManagedObjectID>, animatingDifferences: shouldAnimate)
Note: there is a good resource here by Jesse Squires.
My question is: am I missing something, or can the diffable data source not handle this any faster, given that applying a snapshot is an O(n) operation?
Turning off animation, as with reloadData, would help somewhat.
The sample code was setup based on this article by the awesome SwiftLee.
Please see the sample project here.
Sample video here.
Update (September 2nd, 2021): A good Twitter discussion here.
Sidenote:
The sample app can be improved by not setting the fetchBatchSize, since the request is used with an NSFetchedResultsController. See link.
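For illustration, here is a minimal sketch of that setup (the Item entity, the sort key, and context are placeholders, not taken from the sample project):

import CoreData

// Fetch request backing an NSFetchedResultsController.
// Leave fetchBatchSize at its default (0); the controller manages batching
// itself, and setting it here is what the sidenote suggests removing.
let request = NSFetchRequest<Item>(entityName: "Item")
request.sortDescriptors = [NSSortDescriptor(key: "name", ascending: true)]
// request.fetchBatchSize = 50   // omit this line when using NSFetchedResultsController

let controller = NSFetchedResultsController(
    fetchRequest: request,
    managedObjectContext: context,   // an existing NSManagedObjectContext is assumed
    sectionNameKeyPath: nil,
    cacheName: nil
)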

In iOS 15 we now have applySnapshotUsingReloadData(_:).
It's a lot faster than apply(_:animatingDifferences:) if you are replacing the data and don't care about animations.
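For example, a minimal sketch (dataSource, the section identifier, and objectIDs are assumed from the surrounding project, not part of the original answer):

var snapshot = NSDiffableDataSourceSnapshot<String, NSManagedObjectID>()
snapshot.appendSections(["main"])
snapshot.appendItems(objectIDs)   // objectIDs: [NSManagedObjectID], assumed to exist

if #available(iOS 15.0, *) {
    // Replaces the table's contents without diffing or animating; much faster for ~22K rows.
    dataSource.applySnapshotUsingReloadData(snapshot)
} else {
    // Pre-iOS 15 fallback: apply without animation.
    dataSource.apply(snapshot, animatingDifferences: false)
}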

Related

Using FIRDatabaseReference more efficiently

Is there a difference in performance between these two pieces of code?
The first one declares a database reference and uses it for every read and write.
The second one declares a database reference for every read and write.
Code 1:
let databaseReference = Database.database().reference()
databaseReference.setValue()
databaseReference.setValue()
databaseReference.setValue()
...
Code 2:
Database.database().reference().setValue()
Database.database().reference().setValue()
Database.database().reference().setValue()
...
Getting a reference to a database location is a purely local operation that is quite heavily optimized.
There is no significant performance difference between those two. Any performance difference there is will be overshadowed by any network call you do (like when you call setValue()).
When you wonder about things like this, I recommend measuring it yourself, by the way. While I can reassure you all I want, nothing beats running a quick test and seeing the difference yourself, and in many cases it's not nearly as complex as you might initially think.
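For instance, a quick timing sketch along those lines (the iteration count is arbitrary, and this only measures reference creation, not any network work):

import Foundation
import FirebaseDatabase

// Time how long it takes to create 10,000 database references.
// Creating a reference is a purely local operation, so this should be
// negligible next to a single setValue() network round trip.
let start = CFAbsoluteTimeGetCurrent()
for _ in 0..<10_000 {
    _ = Database.database().reference()
}
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Creating 10,000 references took \(elapsed) seconds")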

Dataflow to process late and out-of-order data for batch and stream messages?

My company receives both batch and stream-based event data. I want to process the data using Google Cloud Dataflow over a predictable time period. However, I realize that in some instances the data arrives late or out of order. How can I use Dataflow to handle late or out-of-order data?
This is a homework question, and I would like to know which of the options below is the single correct answer.
a. Set a single global window to capture all data
b. Set sliding window to capture all the lagged data
c. Use watermark and timestamps to capture the lagged data
d. Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
My reasoning: I believe C is the answer. But then, a watermark is actually different from late data. Please confirm. Also, since the question mentions both batch- and stream-based data, I also wonder whether D could be the answer, since batch (bounded collection) mode doesn't have timestamps unless they come from the source or are set programmatically. So I am a bit confused about the answer.
Please help. I am a non-native English speaker, so I am not sure whether I have missed some cues in the question.
How to use Dataflow to handle late or out of order
This is a big question. I will try to give some simple explanations and provide some resources that might help you understand.
Bounded data collection
You have gotten a sense of it: bounded data does not have a lateness problem. By the nature of bounded data, you can read the full data set at once before the pipeline starts.
Unbounded data collection
Your C is correct, and a watermark is different from late data. A watermark is, in implementation, a monotonically increasing timestamp. When Beam/Dataflow sees a record with an event timestamp that is earlier than the watermark, the record is treated as late data (this is only conceptual, and you might want to check [1] for a more detailed discussion).
Here are [1], [2], [3], [4] as references for this topic:
https://docs.google.com/document/d/12r7frmxNickxB5tbpuEh_n35_IJeVZn1peOrBrhhP6Y/edit#heading=h.7a03n7d5mf6g
https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
https://www.oreilly.com/library/view/streaming-systems/9781491983867/
https://docs.google.com/presentation/d/1ln5KndBTiskEOGa1QmYSCq16YWO9Dtmj7ZwzjU7SsW4/edit#slide=id.g19b6635698_3_4
B and C may be the answer.
With sliding windows, you have the order of the data, so if you receive the data in position 9 and you don't receive the data in position 8, you know that data 8 is delayed and you wait for it. The problem is, if the latest data is delayed, you can't know that it is delayed and you lose it. https://en.wikipedia.org/wiki/Sliding_window_protocol
A watermark waits a period of time for the lagged data; if this time passes and the data doesn't arrive, you lose this data.
So the answer is C, because B says "capture all the lagged data", while C leaves out the word "all".

Filter DOORS on historical data

Is there a way to filter based on historical data?
For example: "Show me all objects who had "Attribute_X" == True on 01/01/2013"
As Steve stated, this would require an advanced DXL script.
I'm not sure about creating a filter for this, but I might be able to help with identifying the objects you are looking for. Having recently solved a similar task, I recommend starting with Tony Goodman's really excellent Smart History Viewer (this code could be used as a DXL tutorial!), which has almost all the code you need. You just need to find and understand it.
Let me elaborate. Besides other nifty stuff, the history viewer basically does the following:
For all (selected) baselines, explicitly including the un-baselined current version: gather all module changes and put them into two-dimensional Skip lists, one each for module, object, and session changes. Focus on the object changes.
There is an unused function printObjectHistory in the code which helps in understanding the data structures. Have a look at the inner loop:
for hist in skipHistory do
Inside this loop, consider only changes which happened before "01/01/2013" (check hist->HIST_DATE to obtain this information). The history viewer code has already classified the detected changes, so you want to watch out for changes which contain the string "Modify Attribute: Attribute_X". Assign the new value to a buffer. Outside this loop, check whether the buffer contains "True". If so, this is one of the objects you wanted to find.

How to avoid overlap between pages of firebase query results

I am trying to implement infinite scroll (aka paging) using Firebase's relatively new query functionality. I am stuck on one hopefully minor issue.
I ask for the first 10 results as follows:
offersRef.queryOrderedByChild(orderedByChildNamed).queryLimitedToFirst(10).observeEventType(.ChildAdded, andPreviousSiblingKeyWithBlock:childAddedBlock, withCancelBlock:childAddedCancelBlock)
But when I want to get the next 10, I will have to start with the 10th key as my starting value. What I really want is to pass the 10th key and tell Firebase that I want it offset by 1, so that it will observe the next 10. But I think "offset" is old syntax (from before the query functionality was rolled out) and can't be used here.
So I tried asking for 11 and then ignoring the first one, but that is problematic as you may quickly guess, since the results I am observing can (and will) change:
offersRef.queryOrderedByChild(orderedByChildNamed).queryStartingAtValue(startingValue,childKey:startingKey!).queryLimitedToFirst(10+1).observeEventType(.ChildAdded, andPreviousSiblingKeyWithBlock:childAddedBlock, withCancelBlock:childAddedCancelBlock)
And just for clarity, the following are all variables defined in my app and not particularly germane to the question:
offersRef
orderedByChildNamed
childAddedBlock
childAddedCancelBlock
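For what it's worth, here is a minimal sketch of the "ask for one extra and skip it" idea, but skipping by key rather than by position, using the current FirebaseDatabase Swift API names (pageSize is a placeholder; offersRef, orderedByChildNamed, startingValue, and startingKey are the question's variables):

import FirebaseDatabase

let pageSize: UInt = 10

offersRef
    .queryOrdered(byChild: orderedByChildNamed)
    .queryStarting(atValue: startingValue, childKey: startingKey)
    .queryLimited(toFirst: pageSize + 1)
    .observeSingleEvent(of: .value) { snapshot in
        for case let child as DataSnapshot in snapshot.children {
            // Drop the boundary item by key instead of blindly dropping the
            // first result, so a changed ordering does not cause overlap.
            if child.key == startingKey { continue }
            // Append child to the list backing the infinite scroll here.
        }
    }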

Is it possible to optimise SpreadsheetGear performance?

I am using the SpreadsheetGear library in a web application that manages some large spreadsheets, populating cell values and extracting the results. It runs extremely fast for small spreadsheets, but we are noticing problems with more sophisticated ones. One possible improvement springs to mind: as we set each parameter cell value, I imagine the other cell values are being recalculated immediately (possibly?), so if I set a lot of cell values before extracting the results, those calculations are being run redundantly (number of parameters - 1) times. Of course I don't really know how it works; maybe it just calculates the relevant values when they are inspected. So can someone please let me know if that is the case, and if so, whether anything can be done to delay processing?
It looks like values are calculated when they are pulled from a formula cell anyway.
Here is an explanation regarding some configuration settings:
http://www.spreadsheetgear.com/support/help/spreadsheetgear.net.3.0/SpreadsheetGear~SpreadsheetGear.IWorkbookSet~CalculationOnDemand.html

Resources