How to avoid overlap between pages of firebase query results - ios

I am trying to implement infinite scroll (aka paging) using Firebase's relatively new query functionality. I am stuck on one hopefully minor issue.
I ask for the first 10 results as follows:
offersRef.queryOrderedByChild(orderedByChildNamed).queryLimitedToFirst(10).observeEventType(.ChildAdded, andPreviousSiblingKeyWithBlock:childAddedBlock, withCancelBlock:childAddedCancelBlock)
But when I want to get the next 10, I will have to start with the 10th key as my starting value. What I really want is to pass the 10th key and tell firebase that I want it offset by 1, so that it will observe the next 10. But I think "offset" is old syntax (before query functionality was rolled out) and can't be used here.
So I tried asking for 11 and then ignoring the first one, but that is problematic as you may quickly guess, since the results I am observing can (and will) change:
offersRef.queryOrderedByChild(orderedByChildNamed).queryStartingAtValue(startingValue,childKey:startingKey!).queryLimitedToFirst(10+1).observeEventType(.ChildAdded, andPreviousSiblingKeyWithBlock:childAddedBlock, withCancelBlock:childAddedCancelBlock)
And just for clarity, the following are all variables defined in my app and not particularly germane to the question:
offersRef
orderedByChildNamed
childAddedBlock
childAddedCancelBlock

Related

Keeping min/max value in a BigTable cell

I have a problem where it would be very helpful if I was able to send a ReadModifyWrite request to BigTable where it only overwrites the value if the new value is bigger/smaller than the existing value. Is this somehow possible?
Note: I thought of a hacky way where I use the timestamp as my actual value, and have the max number of versions 1, so that would keep the "latest" value which is the higher timestamp. But those timestamps would have values from 1 to 10 instead of 1.5bn. Would this work?
I looked into the existing APIs but haven't found anything that would help me do this. It seems like it is available in DynamoDB, so I guess it's reasonable to ask for BigTable to have it as well https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html#API_UpdateItem_RequestSyntax
Your timestamp approach could probably be made to work, but would interact poorly with stuff like age-based garbage collection.
I also assume you mean CheckAndMutate as opposed to ReadModifyWrite? The former lets you do conditional overwrites, the latter lets you do unconditional increments/appends. If you actually want an increment that only works if the result will be larger, just make sure you only send positive increments ;)
My suggestion, assuming your client language supports it, would be to use a CheckAndMutateRow request with a value_range_filter. This will require you to use a fixed-width encoding for your values, but that's no different than re-using the timestamp.
Example: if you want to set the value to 000768, but only if that would be an increase, use a value_range_filter from 000000 to 000767, inclusive, and do your write in the true_mutation of the CheckAndMutate.

Remove duplicates across window triggers/firings

Let's say I have an unbounded pcollection of sentences keyed by userid, and I want a constantly updated value for whether the user is annoying, we can calculate whether a user is annoying by passing all of the sentences they've ever said into the funcion isAnnoying(). Forever.
I set the window to global with a trigger afterElement(1), accumulatingFiredPanes(), do GroupByKey, then have a ParDo that emits userid,isAnnoying
That works forever, keeps accumulating the state for each user etc. Except it turns out the vast majority of the time a new sentence does not change whether a user isAnnoying, and so most of the times the window fires and emits a userid,isAnnoying tuple it's a redundant update and the io was unnecessary. How do I catch these duplicate updates and drop while still getting an update every time a sentence comes in that does change the isAnnoying value?
Today there is no way to directly express "output only when the combined result has changed".
One approach that you may be able to apply to reduce data volume, depending on your pipeline: Use .discardingFiredPanes() and then follow the GroupByKey with an immediate filter that drops any zero values, where "zero" means the identity element of your CombineFn. I'm using the fact that associativity requirements of Combine mean you must be able to independently calculate the incremental "annoying-ness" of a sentence without reference to the history.
When BEAM-23 (cross-bundle mutable per-key-and-window state for ParDo) is implemented, you will be able to manually maintain the state and implement this sort of "only send output when the result changes" logic yourself.
However, I think this scenario likely deserves explicit consideration in the model. It blends the concepts embodied today by triggers and the accumulation mode.

F# Array.Parallel hanging

I have been struggling with parallel and async constructs in F# for the last couple days and not sure where to go at this point. I have been programming with F# for about 4 months - certainly no expert - and I currently have a series of calculations that are implemented in F# (asp.net 4.5) and are working correctly when executed sequentially. I am running the calculations on a multi-core server and since there are millions of inputs to perform the same calculation on, I am hoping to take advantage of parallelism to speed it up.
The calculations are extremely data parallel - basically the exact calculation on different input data. I have tried a number of different avenues and I continually run into the same issue - it seems as if the parallel looping never gets to the end of the input data set. I have tried TPL, ConcurrentQueues, Parallel.Array.map/iter and all the same result: the program starts out fine and then somewhere in the middle (indeterminate) it just hangs and never completes. For simplicity I actually removed the calculation from the program and I am just calling a print method, and Here is where the code is currently at:
let runParallel =
let ids = query {for c in db.CustTable do select c.id} |> Seq.take(5)
let customerInputArray= getAllObservations ids
Array.Parallel.iter(fun c -> testParallel c) customerInputArray
let key = System.Console.ReadKey()
0
A few points...
I limited the results above to only 5 just for debugging. The actual program does not apply the Take(5).
The testParallel method is just a printfn "test".
The customerInputArray is a complex data type. It is a tuple of lists that contain records. So I am pretty sure my problem must be there...but I added exception handling and no exception is getting raised, so have no idea how to go about finding the problem.
Any help is appreciated. Thanks in advance.
EDIT: Thanks for the advice...I think it is definitely deadlock. When I remove all of the printfn, sprintfn, and string concat operations, it completes. (of course, I need those things in there.)
Is printfn, sprintfn, and string ops not thread-safe?
Another EDIT: Iteration always stops on the last item..So if my input array has 15 items, the processing stops on item 14, or seems to never get to item 15. Then everything just hangs. Does not matter what the size of the input array is..Any ideas what can be causing this? I even switched over to Parallel.ForEach (instead of Array.Parallel) and same behavior.
Update on the situation and how I resolved this issue.
I was unable to upload code from my example due to my company's firewall policy, so in the end my question did not have enough details. I failed to mention that I was using a type provider which was important information in this situation. But here is what I figured out.
I am using the F# type provider for SQL Server and was passing around its Service Types which I suspect are not thread-safe. When I replaced the ServiceTypes with plain old F# Records, the code worked fine - no more deadlocks and everything completed without error.

Is there any point to using Any() linq expression for optimisation purposes?

I have a MVC application which returns 2 types of Json responses from 2 controller methods; AnyRemindersExist() and GetAllUserReminders(). The first returns a boolean, 2nd returns an array, both wrapped as Json.
I have a JavaScript timer checking for calendar reminders against a user. It makes the first call (AnyRemindersExist) to check whether reminders exist and whether the client should then make the 2nd call.
For example, if the result of the Json response is false from the Any() query, it doesn't then make the 2nd controller action which makes a LINQ select call. If there are reminders that exist, it then goes further and then requests them (making use of the LINQ SELECT).
Imagine a system ramped up where 100-1000s users use the system and on the client, every 30-60 seconds a request comes in to load in the reminders. Does this Any() call help in anyway in reducing load on the server?
If you're always going to get the actual values afterwards, then no - it would make more sense to have fewer requests, and just always give the full results. I very much doubt that returning no results is slower than returning an indication that there are no results.
EDIT: tvanfosson's latest comment (at the time of this writing) is worth promoting:
You can really only tell by measuring and I'd only resort to it IFF the performance of the select only option didn't meet the requirements.
That's the most important thing about performance: the value of a guess is much less than the value of test data.
I would say that it depends on how the underlying queries are translated. If the any call is translated into an indexed lookup when the select (perhaps due to a join to get related data) must do some sort of table scan, then it will save some work in the case when there are no reminders to be found. It will cause a little extra work when there are reminders. It might be useful if the majority of the calls don't result in any results.
In the general case, though, I would just select the data and only try to optimize IF that turns out to not be fast enough. The conditions under which it will actually save effort on the server are pretty narrow and might only apply if you hand-craft the SQL rather than depend on your ORM.
Any only checks to see if there is at least one item in the Collection that is being returned. Versus using something like Count > 0 which counts the total amount of items in the collection then yes this is more optimal.
If your AnyRemindersExist method is operating on a similar principle then not calling a second call to the server would reduce your load.
So you are asking if not doing work the application doesn't need to do would reduce the workload on the server?
Of course. How would this answer every be "yes, doing extra work for no reason won't effect the server load".
It ultimately depends on how much faster the Any check is compared to getting the results and how often it will be false.
If the Any call takes near as long as the select then it pretty
much never makes sense.
If the Any call is much faster than the select but 90% of the
time it's true, then it probably isn't worth it (best case you
get 10% improvement, worst case it's actually more work).
If the Any call is much faster than the select and 90% of the
time it's false, then it probably makes sense to check if there
are any before actually getting results.
So the answer is it depends on your specific scenario. Ultimately you're going to need to measure both the relative performance (on different typical loads, maybe some queries are more intensive than others) as well as the frequency that there are no results to return.
Actually it should almost never make sense to check Any in this case.
If Any returns false then you don't need to grab the results.
However this means it would have returned no results anyway, so
unless your Any check is significantly faster than a select
returning 0 results, there's no added benefit here.
On the other hand, if Any returns true, then you'll need to get the
results anyway, so in this case Any is purely additional work done.

Best way to detect and store path combinations for analysing purpose later

I am searching for ideas/examples on how to store path patterns from users - with the goal of analysing their behaviours and optimizing on "most used path" when we can detect them somehow.
Eg. which action do they do after what, so that we later on can check to see if certain actions are done over and over again - therefore developing a shortcut or assembling some of the actions into a combined multiaction.
My first guess would be some sort of "simple log", perhaps stored in some SQL-manner, where we can keep each action as an index and then just record everything.
Problem is that the path/action might be dynamically changed - even while logging - so we need to be able to take care of this fact too, when looking for patterns later.
Would you log everthing "bigtime" first and then POST-process every bit of details after some time or do you have great experience with other tactics?
My worry is that this is going to take up space, BIG TIME while logging 1000 users each day for a month or more.
Hope this makes sense and I am curious to see if anyone can provide sample code, pseudocode or perhaps links to something usefull.
Our tools will be C#, SQL-database, XML and .NET 3.5 - clients could also get .NET 4.0 if needed.
Patterns examples as we expect them
...
User #1001: A-B-A-A-A-B-C-E-F-G-H-A-A-A-C-B-A
User #1002: B-A-A-B-C-E-F
User #1003: F-B-B-A-E-C-A-A-A
User #1002: C-E-F
...
etc. no real way to know what they do next nor how many they will use, how often they will do it.
A secondary goal, if possible, if we later on add a new "action" called G (just sample to illustrate, there will be hundreds of actions) how could we detect these new behaviours influence on the previous patterns.
To explain it better, my thought here would be some way to detect "patterns within patterns", sort of like how compressions work, so that "repeative patterns" are spottet. We dont know how long these patterns might be, nor how often they might come. How do we break this down into "small bits and pieces" - whats the best approach you think?
I am not sure what you mean by path, but, if you gave every action in a path a unique symbol, you could reduce the problem to longest common substring or subsequence.
Or have a map of paths to the number of times that action occurred. Every time a certain path happens, increment the count for that path. Then sort to find the most common.
Pseudo idea/implementation so far
Log ever users action into a list/series of actions, bulk kinda style (textfiles/SQL - what ever, just store the whole thing for post-processing)
start counting every "1 action", "2 actions", "3 actions" up til a certain amount (lets say 30 levels)
sort them all, by giving values of importants to some of the actions (might be those producing end results)
A usefull result perhaps?
If we count all [A], [A-A], [A-B], [A-C], [A-A-A], [A-A-B] etc. its going to make a LONG and fine list of which actions are used in row frequently, and thats in the right direction, because if some of these results gets too high, we might need a shorter path. Problem is then, whats too few actions to be optimized and whats the longest needed actionlist to search for? My guess is that we need to do this counting first, then examine the numbers.
Problem is that this would be part of an analyzing tool we are developing and we dont have data until implementation, so we dont know what to look for before its actually done. hmm... wondering if there really IS an answer to this one.

Resources