OPA engine support for concurrent calls - open-policy-agent

How many concurrent REST PUT/PATCH calls can an OPA engine support to update the policy/data?
I tried looking through the documentation, but couldn't find any information pertaining to this.

OPA currently supports multiple concurrent policy queries (which are read-only) and one concurrent write (e.g., an HTTP PUT on /v1/data). Reads and the write can be processed concurrently; when the write is committed, the server blocks until outstanding policy queries complete.
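To make the model concrete, here is a minimal Ruby sketch (assuming a hypothetical OPA server on localhost:8181 and a hypothetical example/allow rule) showing several read-only queries running alongside a single data write:

    require "net/http"
    require "json"

    base = URI("http://localhost:8181")   # hypothetical local OPA server

    # Policy queries are read-only and run concurrently.
    readers = 4.times.map do
      Thread.new do
        10.times { Net::HTTP.get(URI("#{base}/v1/data/example/allow")) }
      end
    end

    # Writes are serialized: OPA processes one write transaction at a
    # time, and the commit waits for outstanding queries to finish.
    writer = Thread.new do
      req = Net::HTTP::Put.new("/v1/data/example",
                               "Content-Type" => "application/json")
      req.body = JSON.dump({ "users" => ["alice", "bob"] })
      Net::HTTP.start(base.host, base.port) { |http| http.request(req) }
    end

    (readers + [writer]).each(&:join)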

Related

Do requests in a batch run in parallel?

As long as I do not use the dependsOn property, do the requests in a batch request run in parallel?
I have heard that they may not, and for performance reasons, it may be better to send individual requests in parallel from my code, so I'm wondering if that's truly the case.
It really depends on what requests are in the batch and which entities they touch. But in general, yes: requests run in parallel if you don't add a dependsOn property. Implementation details may vary, though; it is possible that large batches are sliced into subsets of requests and one subset is executed at a time (with all requests in the subset running in parallel).
Either way, you'll save HTTP connection handshakes, request headers, and more by using a batch instead of many individual requests.
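For illustration, here is a hedged sketch of sending such a batch, in the style of Microsoft Graph JSON batching (assuming that is the API in question; the endpoint, token, and request URLs are placeholders):

    require "net/http"
    require "json"

    # Placeholder endpoint; Microsoft Graph exposes batching at /v1.0/$batch.
    uri = URI("https://graph.microsoft.com/v1.0/$batch")

    batch = {
      "requests" => [
        # No dependsOn: these two are free to run in parallel.
        { "id" => "1", "method" => "GET", "url" => "/me/messages" },
        { "id" => "2", "method" => "GET", "url" => "/me/events" },
        # dependsOn serializes this request after request "1".
        { "id" => "3", "method" => "GET", "url" => "/me/drive",
          "dependsOn" => ["1"] }
      ]
    }

    req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
    req["Authorization"] = "Bearer <token>"   # placeholder token
    req.body = JSON.dump(batch)

    res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
      http.request(req)
    end
    puts res.body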

Why increments are not supported in Dataflow-BigTable connector?

We have a use case in streaming mode where we want to keep track of a counter in BigTable from the pipeline (e.g., the number of items that have finished processing), for which we need the increment operation. From looking at https://cloud.google.com/bigtable/docs/dataflow-hbase, I see that the append/increment operations of the HBase API are not supported by this client. The stated reason is the retry logic in batch mode, but if Dataflow guarantees exactly-once processing, why would supporting it be a bad idea? Since I know for sure the increment was called only once, what am I missing?
Also, is CloudBigTableIO usable in Streaming mode or is it tied to Batch mode only? I guess we could use the BigTable HBase client directly in the pipeline but the connector seems to have nice properties like Connection-pooling which we would like to leverage and hence the question.
The way that Dataflow (and other systems) offer the appearance of exactly-once execution in the presence of failures and retries is by requiring that side effects (such as mutating BigTable) are idempotent. A "write" is idempotent because it is simply overwritten on retry. Inserts can be made idempotent by including a deterministic "insert ID" that deduplicates the insert.
For an increment, that is not the case: a retried increment is applied twice, so it is not idempotent and would break exactly-once execution. That is why it is not supported.
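A toy Ruby sketch of the distinction (this is an illustration, not the Bigtable API):

    # The same mutation applied twice (original call + retry) should
    # leave the same final state.
    row = { "status" => nil, "count" => 0 }

    # Idempotent: a write sets a cell to an absolute value, so a retry
    # just overwrites with the same value.
    write = ->(r) { r["status"] = "done" }
    write.call(row)
    write.call(row)       # retried: final state unchanged

    # Not idempotent: a retried increment counts one logical event twice.
    increment = ->(r) { r["count"] += 1 }
    increment.call(row)
    increment.call(row)   # retried: count is 2 for a single event

    puts row.inspect      # => {"status"=>"done", "count"=>2}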
CloudBigTableIO is usable in streaming mode. We had to implement a DoFn rather than a Sink in order to support that via the Dataflow SDK.

How to perform parallel processing on a "single" rails API in server side?

There are a lot of methods to handle "multiple" API requests on the server side, and parallel processing can be implemented there.
But I would like to know how a "single" API request can be processed in parallel.
For example:
If an API request executes a method, say method1:
    def method1
      # multiple loops and database queries
      # ...
    end
If method1 is a long method which may take a long time to process (including multiple loops and database queries), then instead of processing it sequentially, is there scope for parallel processing there?
One way would be using Resque to create background jobs. But is there any other way to do it, and if so, how should the code be written to accommodate the requirement?
And is there any server-side method to do it which is not Ruby-specific?
Note that there is a huge difference between event-based servers and background jobs.
An event-based server often runs on a single thread and uses callbacks for non-blocking IO. The most famous example is Node.js; for Ruby there is the EventMachine library and various frameworks and simple HTTP servers built on it.
An evented server can start processing one request and then switch to another while the first is waiting on, say, a database call.
Note that even with an event-based server you can't afford to be slow at processing requests: the user experience will suffer and clients will drop the connection.
That's where background jobs (workers) come in: they let your web process finish fast so that it can send the response and start dealing with the next request. Slow tasks like sending out emails or cleanup, which don't require user feedback or concurrency, are farmed out to workers.
So in conclusion: if your application is slow, then parallel processing is not going to save you. It's not a silver bullet. Instead you should invest in optimising database queries and leveraging caching so that responses are fast.
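To make the background-job approach concrete, here is a hedged Sidekiq sketch (the job class, model, and controller call are all hypothetical):

    # app/workers/report_worker.rb
    class ReportWorker
      include Sidekiq::Worker

      def perform(user_id)
        # The slow part (loops, database queries, emails) runs here,
        # outside the request/response cycle.
        user = User.find(user_id)
        # ... do the heavy lifting for this user ...
      end
    end

    # In the controller: enqueue and respond immediately.
    ReportWorker.perform_async(current_user.id)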
While you could potentially run database queries or other operations in parallel in Rails, the added complexity is probably not worth the performance gains.
What I mean is that concurrency is not really applicable to what you usually do in Rails: you fetch something from the DB and use it to render JSON or HTML, and you can't start rendering until you have the results back anyway. While you could in principle fetch data and use it to render partials concurrently, Rails does not support this out of the box, since it would greatly increase complexity while offering little to the majority of the framework's users.
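If you did want to experiment anyway, here is a minimal sketch (Post and Comment are hypothetical models) of running two independent queries on separate threads; note that each thread must check out its own connection from the pool:

    posts_thread = Thread.new do
      ActiveRecord::Base.connection_pool.with_connection do
        Post.order(created_at: :desc).limit(10).to_a
      end
    end
    comments_thread = Thread.new do
      ActiveRecord::Base.connection_pool.with_connection do
        Comment.order(created_at: :desc).limit(10).to_a
      end
    end

    @posts    = posts_thread.value    # Thread#value joins and returns the result
    @comments = comments_thread.value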
As always - don't optimise prematurely.

Parallel asynchronous requests in SOA using a messaging broker

I've been looking at an SOA using a message broker (RabbitMQ / Rails), but there are still a few niggles I can't get my head around.
If I wanted to run parallel requests, as you would using something like Typhoeus with HTTP:
a) In an asynchronous system like this, when you potentially have multiple threads publishing to the same topic exchange, how do you connect the response message with your request? Would you add a unique routing key?
c) What would be the best way of initiating and managing multiple parallel calls of this nature in Ruby?
Many thanks
In answer to a): yes, you use a routing key or, in messaging parlance, a correlation identifier.
In answer to c): sorry, I haven't a clue about Ruby, but messaging by nature supports parallelism by using queues to manage throughput. I assume that whatever broker you choose would provide the appropriate samples and tooling for your needs.
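For what it's worth, the correlation-identifier pattern looks roughly like this with the Bunny gem (the request queue name and payload are hypothetical; assumes RabbitMQ on localhost):

    require "bunny"
    require "securerandom"

    conn = Bunny.new.start
    ch   = conn.create_channel

    # Exclusive, server-named queue for replies to this client.
    reply_q = ch.queue("", exclusive: true)

    correlation_id = SecureRandom.uuid
    ch.default_exchange.publish(
      "do_work",                      # hypothetical payload
      routing_key:    "rpc_queue",    # hypothetical request queue
      reply_to:       reply_q.name,
      correlation_id: correlation_id
    )

    # Match the response back to this request by correlation id.
    response = nil
    reply_q.subscribe do |_delivery, props, body|
      response = body if props.correlation_id == correlation_id
    end
    sleep 0.1 until response          # crude wait, fine for a sketch
    puts "got response: #{response}"
    conn.close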
I would look at Sidekiq or Resque for jobs like that. If your system is larger and distributed, you can create a module/class which takes your job (including a key) as an argument and sends it to RabbitMQ; a worker subscribed to a fanout exchange or channel picks it up and sends the result back as a POST to your app (the webhook approach).
For simplicity you can also just put some sort of Ajax spinner on your view and poll every 10 seconds (or whatever suits you) to check whether the result is back. Either way you should have some kind of ID for every job. If you have questions about it, I could elaborate more; my apps crunch a lot of data in long-running tasks, with up to 500,000,000 items in Rabbit queues.

How do I create a worker daemon which waits for jobs and executes them?

I'm new to Rails and multithreading and am curious about how to achieve the following in the most elegant way.
I couldn't find any nice tutorials explaining in detail the best design decision for the following task:
I have a couple of HTTP requests which will be run for a user in the background, for example parsing a couple of websites and getting some information like the HTTP response code and response time, then returning the results. For performance reasons, I decided to split the total number of URLs into batches of 25 each, execute each batch in a thread, join the threads, and write the results to a database.
I decided to use the following gem (http://rubygems.org/gems/thread) to ensure that there's a maximum number of threads that are run simultaneously. So far so good.
The problem is, if two users start their analysis in parallel, the maximum number of threads is twice the maximum of my thread pool.
My solution (imho) is to create a worker daemon which runs on its own and waits for jobs from the clients.
My question is, what's the best way to achieve this in Rails?
Maybe create a Rake task and use it as a daemon (see: "Daemonising a rake task"), and (how?) add jobs to it?
Thank you very much in advance!
I'd build a queue in a table in the database, and a bit of code that is periodically started by cron, which walks that table, passing requests to Typhoeus and Hydra.
Here's how the author summarizes the gem:
Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
As users add requests, append them to the table. You'll want fields like:
A "processed" field so you can tell which were handled in case the system goes down.
A "success" field so you can tell which requests were processed successfully, so you can retry if they failed.
A "retry_count" field so you can retry up to "n" times, then flag that URL as unreachable.
A "next_scan_time" field that says when the URL should be scanned again so you don't DOS a site by hitting it continuously.
Typhoeus and Hydra are easy to use, and do make it easy to handle multiple requests.
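A hedged sketch of the scanning step (the URLs would come from the queue table; the concurrency limit is arbitrary):

    require "typhoeus"

    hydra = Typhoeus::Hydra.new(max_concurrency: 10)

    # In the real system these rows come from the queue table.
    urls = ["https://example.com", "https://example.org"]

    urls.each do |url|
      request = Typhoeus::Request.new(url, followlocation: true, timeout: 10)
      request.on_complete do |response|
        # Record the response code and time; set processed/success here,
        # or bump retry_count and next_scan_time on failure.
        puts "#{url} -> #{response.code} in #{response.total_time}s"
      end
      hydra.queue(request)
    end

    hydra.run   # blocks until every queued request has finished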
There are a bunch of libraries for Rails that can manage queues of long-running background jobs for you. Here are a few:
Sidekiq uses Redis for job storage and supports multiple worker threads.
Resque also uses Redis, with single-threaded worker processes.
delayed_job manages a job queue through ActiveRecord (or Mongoid).
Once you've chosen one, I'd recommend using Foreman to simplify launching multiple daemons at once.
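For example, a minimal Procfile for Foreman (the commands are illustrative) might be:

    web: bundle exec rails server -p 3000
    worker: bundle exec sidekiq

Running foreman start then launches both processes and interleaves their logs.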