Do requests in a batch run in parallel? - microsoft-graph-api

As long as I do not use the dependsOn property, do the requests in a batch request run in parallel?
I have heard that they may not, and for performance reasons, it may be better to send individual requests in parallel from my code, so I'm wondering if that's truly the case.

It really depends on which requests are in the batch and which entities they touch. But in general, yes: requests run in parallel if you don't add a dependsOn property. Implementation details may vary, though. It is possible that large batches are sliced into subsets of requests and one subset is executed at a time (with all requests within a subset executing in parallel).
Either way, you'll save HTTP connection handshakes, repeated HTTP request headers, and more when using a batch instead of lots of individual requests.
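For reference, here is a minimal sketch of what such a batch looks like from Ruby, using net/http against the documented v1.0 $batch endpoint (the access token and the three request URLs are placeholders):

require 'net/http'
require 'json'
require 'uri'

uri = URI('https://graph.microsoft.com/v1.0/$batch')

# Three independent requests with no dependsOn, so the service is
# free to execute them in parallel.
payload = {
  requests: [
    { id: '1', method: 'GET', url: '/me' },
    { id: '2', method: 'GET', url: '/me/messages?$top=5' },
    { id: '3', method: 'GET', url: '/me/drive/root' }
  ]
}

req = Net::HTTP::Post.new(uri)
req['Authorization'] = 'Bearer <ACCESS_TOKEN>' # placeholder
req['Content-Type']  = 'application/json'
req.body = payload.to_json

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
JSON.parse(res.body)['responses'].each { |r| puts "#{r['id']}: #{r['status']}" }

One connection, one TLS handshake, and one set of auth headers cover all three requests.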

Related

How to perform parallel processing on a "single" Rails API on the server side?

There are a lot of methods to handle "multiple" API requests on the server side, and parallel processing can be implemented there. But I would like to know how a single API request can be processed in parallel.
For example:
If an API request executes a method, say method1:

def method1
  # long-running work: multiple loops and database queries
end
If method1 is a method which may take a long time to process (involving multiple loops and database queries), is there scope for parallel processing there instead of processing it sequentially?
One way would be using resque for creating background jobs. But is there any other way to do it and if so how should the code be written to accommodate the requirement.
And is there any server-side method to do it that is not Ruby-specific?
Note that there is a huge difference between event-based servers and background jobs.
An event-based server often runs on a single thread and uses callbacks for non-blocking IO. The most famous example is Node.js. For Ruby there is the EventMachine library and various frameworks and simple HTTP servers built on it.
An evented server can start processing one request and then switch to another request while the first is waiting for a database call for example.
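A tiny sketch of that switching behaviour with EventMachine (the timer stands in for a slow database call; purely illustrative):

require 'eventmachine'

EM.run do
  # "Request 1" waits on slow IO; the reactor thread is free meanwhile.
  EM.add_timer(2) do
    puts "request 1: database result arrived"
    EM.stop
  end

  # The same single thread services other work while request 1 waits.
  puts "request 2: handled immediately"
end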
Note that even with an event-based server you can't afford to be slow at processing requests: the user experience will suffer and clients will drop the connection.
That's where background jobs (workers) come in: they let your web process finish fast so that it can send the response and start dealing with the next request. Slow tasks like sending out emails or cleanup, which don't require user feedback or concurrency, are farmed out to workers.
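A minimal sketch of that pattern with Resque (the job class, queue name, model, and mailer are made up for illustration):

class EmailJob
  @queue = :mailers

  # Runs in a worker process, outside the request/response cycle.
  def self.perform(user_id)
    user = User.find(user_id)          # hypothetical model
    UserMailer.welcome(user).deliver   # hypothetical mailer
  end
end

# In the controller: enqueue and send the response immediately.
Resque.enqueue(EmailJob, current_user.id)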
So in conclusion: if your application is slow, then parallel processing is not going to save you. It's not a silver bullet. Instead you should invest in optimising database queries and leveraging caching so that responses are fast.
While you could potentially run database queries or other operations in parallel in Rails, the greatly added complexity is probably not worth the performance gains.
What I mean here is that concurrency is not really applicable to what you usually do in Rails: you fetch something from the DB and use it to build JSON or HTML. You can't really start rendering until you have the results back anyway. While you could potentially fetch data and use it to render partials concurrently, Rails does not support this out of the box, since it would greatly increase complexity while not offering much to the majority of the framework's users.
As always - don't optimise prematurely.

Is it possible to reduce the throughput of my pipeline?

I have a Dataflow job that communicates with external resources. The problem is that these external resources are slower than the Dataflow job, which means they are always saturated. I need some way to reduce the quantity of messages read from PubSub, or otherwise reduce the throughput of the job, in order to reduce the traffic to the external resources.
Thanks.
We currently do not support throttling primitives (such as "make sure this DoFn is throttled to at most X calls per second over the whole job"), however we know it is an important use case and it will most likely be supported sooner or later.
Meanwhile your best bet is, as Ryan said, to limit the number of workers and worker threads: specify --numWorkers (or --maxNumWorkers if you are using autoscaling) and --numberOfWorkerHarnessThreads. However, note that this will lead to creating a backlog of input messages, rather than dropping them. It is hard to tell which is better in your use case.
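As a concrete illustration (the values are made up), capping the job at three single-threaded workers would look like this on the command line:

--numWorkers=3 --numberOfWorkerHarnessThreads=1

or, with autoscaling enabled:

--maxNumWorkers=3 --numberOfWorkerHarnessThreads=1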

Neo4j batching using REST interface locking database?

When batching several queries in an HTTP request for Neo4j, does that cause the graph database to perform all the queries in that HTTP request before moving on to the next request?
Could this potentially mean that a large enough batch would lock the whole database for the time it takes to perform all queries in the batch? Or are they somehow run in parallel?
Does batching via the REST interface (and py2neo) use the batch inserter (so it's non-transactional) or normal transactional insertion?
Thanks
It performs all queries in the batch request, but other requests can come in in parallel and are executed on other threads. Your batch request only affects other queries if it consumes all available CPU, memory, or IO.
I would use the transactional API from 2.x on.
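For example, a minimal sketch of hitting the 2.x transactional Cypher endpoint from Ruby with net/http (server address and statement are illustrative):

require 'net/http'
require 'json'
require 'uri'

# Begin and commit a transaction in a single round trip.
uri = URI('http://localhost:7474/db/data/transaction/commit')

payload = {
  statements: [
    { statement: 'CREATE (n:Person {name: {name}}) RETURN n',
      parameters: { name: 'Alice' } }
  ]
}

req = Net::HTTP::Post.new(uri)
req['Content-Type'] = 'application/json'
req.body = payload.to_json

res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
puts JSON.parse(res.body)['results']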

How do I create a worker daemon which waits for jobs and executes them?

I'm new to Rails and multithreading and am curious about how to achieve the following in the most elegant way.
I couldn't find any nice tutorials which explain in detail the best design decision for the following task:
I have a couple of HTTP requests which will be run for a user in the background, for example parsing a couple of websites and getting information like the HTTP response code and response time, then returning the results. For performance reasons, I decided to split the total number of URLs to parse into batches of 25 each, execute each batch in a thread, join these, and write the result to a database.
I decided to use the following gem (http://rubygems.org/gems/thread) to ensure that there's a maximum number of threads that are run simultaneously. So far so good.
The problem is, if two users start their analysis in parallel, the maximum number of threads is twice the maximum of my thread pool.
My solution (imho) is to create a worker daemon which runs on its own and waits for jobs from the clients.
My question is, what's the best way to achieve this in Rails?
Maybe create a Rake task and use it as a daemon (see: "Daemonising a rake task"), and then (how?) add jobs to it?
Thank you very much in advance!
I'd build a queue in a table in the database, and a bit of code that is periodically started by cron, which walks that table, passing requests to Typhoeus and Hydra.
Here's how the author summarizes the gem:
Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
As users add requests, append them to the table. You'll want fields like:
A "processed" field so you can tell which were handled in case the system goes down.
A "success" field so you can tell which requests were processed successfully, so you can retry if they failed.
A "retry_count" field so you can retry up to "n" times, then flag that URL as unreachable.
A "next_scan_time" field that says when the URL should be scanned again so you don't DOS a site by hitting it continuously.
Typhoeus and Hydra are easy to use, and do make it easy to handle multiple requests.
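A rough sketch of the cron-driven pass over such a table (ScanRequest and its columns mirror the fields suggested above; the model itself is hypothetical):

require 'typhoeus'

hydra = Typhoeus::Hydra.new(max_concurrency: 25)

# Pick up rows that are due and not yet handled.
ScanRequest.where(processed: false)
           .where('next_scan_time <= ?', Time.now)
           .find_each do |row|
  request = Typhoeus::Request.new(row.url, followlocation: true)

  request.on_complete do |response|
    if response.success?
      row.update(processed: true, success: true)
    else
      # leave unprocessed so the next cron pass retries, up to "n" times
      row.update(success: false, retry_count: row.retry_count + 1)
    end
  end

  hydra.queue(request)
end

hydra.run # blocks until every queued request has finished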
There are a bunch of libraries for Rails that can manage queues of long-running background jobs for you. Here are a few:
Sidekiq uses Redis for job storage and supports multiple worker threads.
Resque also uses Redis and a single worker thread.
delayed_job manages a job queue through ActiveRecord (or Mongoid).
Once you've chosen one, I'd recommend using Foreman to simplify launching multiple daemons at once.
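With Foreman, a Procfile along these lines starts the web process and a worker together (the commands are illustrative, assuming Sidekiq was chosen):

web: bundle exec rails server
worker: bundle exec sidekiq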

Concurrent SOAP api requests taking collectively longer time

I'm using the savon gem to interact with a SOAP API. I'm trying to send three parallel requests to the API using the parallel gem. Normally each request takes around 13 seconds to complete, so three sequential requests take around 39 seconds. Sending three parallel requests using 3 threads takes around 23 seconds to complete all three, which is really nice, but I'm not able to figure out why it doesn't complete in something like 14-15 seconds. I really need to lower the total time, as it directly affects the response time of my website. Any ideas on why this is happening? Are network requests blocking in nature?
I'm sending the requests as follows
Parallel.map(["GDSSpecialReturn", "Normal", "LCCSpecialReturn"], :in_threads => 3){ |promo_plan| self.search_request(promo_plan) }
I tried using multiple processes as well, but to no avail.
I have 2 theories:
Part of the workload can't run in parallel, so you don't see a 3x speedup, but a bit less than that. It's very rare to see multithreaded tasks speed up 100% proportionally to the number of CPUs used, because there are always a few bits that have to run one at a time. See Amdahl's Law, which provides equations to describe this and states that:
The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program
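As a rough worked illustration of this first theory, Amdahl's formula gives speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of threads. The observed speedup here is 39 s / 23 s ≈ 1.7 with n = 3, which solves to p ≈ 0.62: if this theory holds, only about 60% of each request's wall time actually overlaps.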
Disk I/O is involved, and it runs slower in parallel because of disk seek time, which limits the IO operations per second. Remember that unless you're on an SSD, the disk has to make part of a physical rotation every time you look for something different on it. With 3 requests at once, the head is seeking repeatedly across the disk trying to fulfill I/O requests in 3 different places. This is why random I/O on hard drives is much slower than sequential I/O. Even on an SSD, random I/O can be a bit slower, especially if small-block reads and writes are involved.
I think option 2 is the culprit if you're running your database on the same system. The problem is that when the SOAP calls hit the DB, they get hit by both of these factors. Even blazing-fast 15,000 RPM server hard drives can only manage ~200 IO operations per second; SSDs will do 10,000-100,000+ IO/s. See the figures on Wikipedia for ballparks. That said, most databases do some clever memory caching to mitigate the problem.
A clever way to test whether factor 2 is involved is to run against an in-memory database (for example an H2 in-memory DB) and test the SOAP calls with that. They'll probably complete much faster, and you should see similar execution times for 1, 3, or $CPU-COUNT requests at once.
That's actually a big question; it depends on many factors.
1. Ruby language implementation
It could differ between MRI, Rubinius, and JRuby, though I am not sure whether the parallel gem supports Rubinius and JRuby.
2. Your Machine
How many CPU cores does your machine have? You can leverage them by using parallel processes. Have you tried using processes instead of threads, given multiple cores?
Parallel.map(["GDSSpecialReturn", "Normal", "LCCSpecialReturn"]){ |promo_plan| self.search_request(promo_plan) } # by default the parallel gem uses one process per CPU core
3. What happens inside self.search_request?
If you run this on MRI, the GIL means your code does not actually run concurrently. To put it more precisely, IO calls don't hold the GIL in the MRI implementation, so only the network-call part runs concurrently, but nothing else does. That's why I'm interested in what other work you do inside self.search_request, because it has an impact on the overall performance.
So I recommend testing your code in different environments and on different machines (behaviour can differ between your local machine and the real production machine, so please do tune and benchmark) to get the best result.
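For instance, a quick benchmark sketch along those lines, comparing threads against processes with the parallel gem (search_request stands in for the asker's method and is assumed to be defined elsewhere):

require 'benchmark'
require 'parallel'

plans = ["GDSSpecialReturn", "Normal", "LCCSpecialReturn"]

Benchmark.bm(10) do |x|
  x.report('threads')   { Parallel.map(plans, :in_threads => 3)   { |plan| search_request(plan) } }
  x.report('processes') { Parallel.map(plans, :in_processes => 3) { |plan| search_request(plan) } }
end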
Btw, if you want to know more about threads and processes in Ruby, I highly recommend Jesse Storimer's Working with Ruby Threads; he did a pretty good job explaining all of these things.
Hope it helps, thanks.
