NSURLSessionTaskMetrics was introduced in iOS 10, and it is very helpful.
But I am still confused about the transactionMetrics property, which is an array of NSURLSessionTaskTransactionMetrics.
I want to know: if there is more than one transaction metric in the array, how do I measure the performance of the task? Do I simply sum all of their durations, or pick just one of them, and if so, which one should I use?
I have a rough idea that during task execution the session may use more than one transaction to accomplish the task.
But can anyone give a more detailed description of this? Are the transactions executed in order, or concurrently?
Looking forward to any help.
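For reference, here is a minimal sketch of how I am collecting the metrics in the session delegate (the class name is made up; the delegate callback and the properties used are from the URLSessionTaskMetrics API):

```swift
import Foundation

// Minimal sketch: "MetricsLogger" is a made-up name; the delegate callback and
// the properties used below are part of the URLSession API (iOS 10+).
final class MetricsLogger: NSObject, URLSessionTaskDelegate {
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didFinishCollecting metrics: URLSessionTaskMetrics) {
        // taskInterval spans the whole task, covering every transaction
        // (redirects, retries, cache lookups), so it is the end-to-end time.
        print("task took \(metrics.taskInterval.duration) s")

        // Each element describes one request/response exchange.
        for transaction in metrics.transactionMetrics {
            guard let start = transaction.fetchStartDate,
                  let end = transaction.responseEndDate else { continue }
            print("transaction (fetchType \(transaction.resourceFetchType.rawValue)): " +
                  "\(end.timeIntervalSince(start)) s")
        }
    }
}
```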
I am currently implementing a web application in .NET Core (C#) using Entity Framework. While working on the project I have encountered quite a few challenges, but I will start with the ones I think are most important. My questions are as follows:
Instead of frequently loading data from the database, I keep a set of static objects that mirror the data in the database. However, it is tedious and error-prone to ensure that any changes, i.e., adding/deleting/modifying objects, are saved to the database in real time. Is there any good example or advice I can refer to in order to improve my approach?
Another issue is that the values of some objects' properties change on the fly according to the values of other objects' properties, something like a spreadsheet where a cell's value updates automatically when the cell its formula refers to changes. I do not have a solution for this yet, so I would appreciate any example I can refer to. This will also add another layer of complexity to syncing the in-memory changes to the database.
At the moment I am unsure whether there is a better approach. I would appreciate any help. Thanks!
Basically, you're facing a problem called eventual consistency: something changes and two or more systems need to reflect that change. Both updates need to be applied for the operation to be considered successful, and if either one fails, you need to know.
In your case, I would use Azure Service Bus. You can create queues and put messages on them, and an Azure Function would handle these queue messages. You would create two queues: one for the database updates and one for the in-memory update (changing the latter to a cache service may be something to think about). The advantage of these queues is that you can easily drop messages on them from anywhere. Because you mentioned the object is going to evolve, you may need to update these objects either in the database or in memory (cache).
Once you've done that, I'd create a topic with two subscriptions: one forwarding messages to Queue 1 and the other to Queue 2. This solves your primary problem: whenever an object changes, just send it to the topic, and both updates (database and memory) will be executed automagically.
The only problem now is that you mentioned wanting to update the database in real time; with this scenario, you're going to have to give that up.
Also, make sure you have proper alerts in place for the queues, so that if a message is missed, or a function doesn't handle one well, you'll receive an alert and can check and correct the errors.
I totally agree with #nineedm's answer, but there are also other options.
If you introduce a cache, you will always face the cache invalidation problem: you have to mark the cache as invalid when the data changes. Sometimes this is easy, depending on the nature of the cached data and how often it changes.
If you have just a single application instance, MemoryCache with properly specified expiration options can be enough.
If there is a cluster, you have to look at distributed cache solutions, for example Redis. There is a Microsoft article about that: Distributed caching in ASP.NET Core.
We want to submit unique GCS files to our streaming pipeline, each file containing information for multiple events, and each event containing a key (for example, device_id). As part of the processing, we want to shuffle by this device_id so as to achieve some form of worker-to-device_id affinity (more background on why we want to do this is in this other SO question). Once all events from the same file are complete, we want to reduce (GroupBy) by their source GCS file (which we will make a property of the event itself, something like file_id) and finally write the output to GCS (could be multiple files).
The reason we want to do the final GroupBy is that we want to notify an external service once a specific input file has finished processing. The only problem with this approach is that since the data is shuffled by device_id and then grouped at the end by file_id, there is no way to guarantee that all data for a specific file_id has completed processing.
Is there something we can do about this? I understand that Dataflow provides exactly-once guarantees, which means all the events will eventually be processed, but is there a way to set a deterministic trigger that says all data for a specific key has been grouped?
EDIT
I wanted to highlight the broader problem we are facing here. The ability to mark file-level completeness would help us checkpoint different stages of the data as seen by external consumers. For example, it would allow us to trigger per-hour or per-day completeness, which is critical for us to generate reports for that window. Given that these stages/barriers (hour/day) are clearly defined on the input (the GCS files are date/hour partitioned), it is only natural to expect the same of the output. But with Dataflow's model, this seems impossible.
Similarly, although Dataflow guarantees exactly-once processing, there will be cases where the entire pipeline needs to be restarted because something went horribly wrong. In those cases it is almost impossible to restart from the correct input marker, since there is no guarantee that what was already consumed has been completely flushed out. DRAIN mode tries to achieve this, but as mentioned, if the entire pipeline is broken and draining itself cannot make progress, there is no way to know which part of the source should be the starting point.
We are considering using Spark, since its micro-batch-based streaming model seems to fit better. We would still like to explore Dataflow if possible, but it seems we won't be able to achieve this without storing these checkpoints externally from within the application. If there is an alternative way of providing these guarantees with Dataflow, that would be great. The idea behind broadening this question was to see whether we are missing an alternative perspective that would solve our problem.
Thanks
This is actually tricky. Neither Beam nor Dataflow has a notion of a per-key watermark, and it would be difficult to implement that level of granularity.
One idea would be to use a stateful DoFn instead of the second shuffle. This DoFn would need to receive the number of elements expected in the file (from either a side-input or some special value on the main input). Then it could count the number of elements it had processed, and only output that everything has been processed once it had seen that number of elements.
This assumes that the expected number of elements can be determined ahead of time, etc.
This is my first time learning about concurrency and threading in Rails, so any advice is much appreciated.
I currently have an array of 50 strings, and a 3rd-party API call that takes a string and returns a numeric value. Right now I am simply calling the API on each string one at a time, which takes a really long time.
After looking at a few SO posts like this one, this other one, and finally this one, it seems like I have to use some sort of threading to achieve what I want. My plan is to break the array into batches of ten strings and then run 5 API calls on each batch of ten strings concurrently, in the hope that this will drastically reduce the time.
I've never done threading of any kind with Rails before, so I'm just wondering whether I am on the right track following the third SO post above, or whether I should use other techniques that may be better suited to my needs.
The approach you take will depend on your use case. Do you need to wait for all the calls to be made to do something with the result? Can it be asynchronous?
If you are looking into threads to distribute the work then the third SO post you mentioned is a good way to do it.
If your use case permits the process to be async, I'd definitely look into a scheduler, as mentioned in the first SO post. I've used DelayedJob for this purpose; there are some other alternatives as well.
On a related topic, I usually implement a micro-service that receives those requests and processes them asynchronously instead of having DelayedJob in the same app, but that is just a matter of preference.
Something REALLY important to keep in mind if you go with the async approach: if you access ActiveRecord records inside a thread, you need to explicitly check out the database connection, because Rails only handles checking connections in and out in the main thread. Be really careful with this, since it can cause connection leaks that are really hard to track down.
The first answer on this SO post shows how to ensure the DB connection is released.
Hope that helps.
Essentially, each time a visitor reaches the application, the controller performs a database query to determine the most relevant items to show.
Although the items shown vary over time, they are not personally selected for each user.
This means that instead of the result being calculated on every visit, it would be better for the system to perform a single query every 10 minutes or so, store the result, and use it on each visit.
What is the best way to implement this idea? I was thinking of cron jobs and maybe storing the result in Redis, but I'm not sure; any help is appreciated!
There are a number of ways to do this. One way that I've used in the past with success is to have a table in your database that represents the most relevant items and then have a cron job that updates that table.
Fragment caching, as #wesley6j recommended, isn't a bad way to go either, and you can combine the two techniques if you want.
If you want more detailed suggestions, you can provide some more details about what you are trying to achieve.
I was going through this excellent blog post (http://www.humancode.us/2014/08/14/target-queues.html) about target queues in iOS, and I could not help but wonder why we need such a mechanism. In the example, a serial target queue is specified for a custom concurrent queue. Can we not achieve the same thing by executing the blocks on a serial queue instead of the original concurrent queue?
What's the point of having a serial target queue for a concurrent queue?
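For concreteness, here is roughly the setup the article describes, reduced to a sketch (the queue labels are made up):

```swift
import Dispatch

// Illustrative labels; the structure mirrors the blog post's example.
let serialTarget = DispatchQueue(label: "com.example.serial-target")
let workQueue = DispatchQueue(label: "com.example.work",
                              attributes: .concurrent,
                              target: serialTarget)

// Although workQueue is declared concurrent, its blocks are ultimately
// executed by the serial target, so they run one at a time.
workQueue.async { print("first") }
workQueue.async { print("second") }
```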
If I understand you correctly, you're asking why someone would run serial tasks on a concurrent queue.
You need that kind of behaviour when most tasks on some resource can be performed concurrently (i.e., simultaneously), but some tasks are, by their nature, unsafe to perform concurrently with others.
The most common example is the readers/writers problem. Say you are accessing some file-system resource. It's fine to read it, even from different threads: every reader will get exactly what it needs. But then comes the need to update the contents of that file. Modifying it while someone is reading it leads to unpredictable results: the reader is not guaranteed to get the expected data (it may be partly from the old version and partly from the new). Even worse, there can be two writers (say the file's contents are changed both by the application user and by some central storage over the network), and the result will be a crazy mix of the two versions (in fact, the file can even end up corrupted).
So each writer has to wait until all other tasks have finished (no one reads, no one writes), and each reader has to wait until no writing task is in progress (no one writes, no matter how many readers).
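In GCD this is usually expressed as a concurrent queue whose writes are submitted as barrier blocks; a minimal sketch (the type and queue label are made up for illustration):

```swift
import Dispatch

// Illustrative readers-writers wrapper; the names are invented for the example.
final class SynchronizedFile {
    private let queue = DispatchQueue(label: "com.example.file", attributes: .concurrent)
    private var contents = ""

    // Readers run concurrently with one another.
    func read() -> String {
        queue.sync { contents }
    }

    // The barrier flag makes the write wait for in-flight readers and holds
    // back new work until it finishes, i.e. writes are effectively serial.
    func write(_ newContents: String) {
        queue.async(flags: .barrier) {
            self.contents = newContents
        }
    }
}
```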
Wikipedia has a nice article on this. I haven't run into any other practical situations where you would need this, but I believe there are more.
Hope this answers your question.