Single Transaction Gets Committed, but a Batch with Multiple Transactions Remains in PENDING State - hyperledger

I am running a Hyperledger Sawtooth network: production has 12 nodes, while the test environment has 2. In the test environment, whenever I send a batch with a single transaction it gets committed, but a batch with multiple transactions remains in the PENDING state forever.
For the HTTP response, I am receiving 202 Accepted from Sawtooth.
The debug and error logs do not show any difference between the single-transaction and multi-transaction batches.
I am using the Sawtooth Golang SDK for development.
The link below explains the possible scenarios:
https://github.com/danintel/sawtooth-faq/blob/master/client.rst#what-does-it-mean-if-a-batch-status-result-remains-pending
However, that does not seem applicable to my case, since a batch with a single transaction always gets COMMITTED successfully.
Does anyone have an idea what the possible reason could be, and where I can at least look for errors?
data": [
{
"id": "bb78d4b5e9cc0ec62b750a6ac0825dff816615d0a968584bc3fa0bfc31de82bc12a1961aa6f81f83b815ca38f2b252ca000a4324831c5c44c913a0a81a28f5b9",
"invalid_transactions": [],
"status": "PENDING"
}
]
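For reference, a status like the one above can be fetched from the REST API's batch_statuses endpoint with a small script along these lines. This is only a sketch: the localhost:8008 address, the wait value, and the batch id are placeholders for whatever your setup uses.

require 'json'
require 'net/http'
require 'uri'

# Placeholder address and batch id; wait=30 asks the REST API to hold the request
# open for up to 30 seconds in case the batch commits in the meantime.
batch_id = 'REPLACE_WITH_BATCH_ID'
uri = URI("http://localhost:8008/batch_statuses?id=#{batch_id}&wait=30")

entries = JSON.parse(Net::HTTP.get(uri)).fetch('data', [])
entries.each do |entry|
  puts "#{entry['id'][0, 10]}... #{entry['status']}"
  # A rejected transaction would show up here along with its error message.
  puts entry['invalid_transactions'].inspect unless entry['invalid_transactions'].to_a.empty?
end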

Do you have debug logging in your Transaction Processor, and is it enabled? If so, are all transactions received and processed successfully, and in the same order, by your Transaction Processor?

Related

Do Cloud Workflows executions really fail once the failed count reaches 2,000?

I have a question about "Concurrent executions" in the following document.
https://cloud.google.com/workflows/quotas?hl=en#request_limit
I would like to either leave executions in a failed state or delete failed executions.
But I read the document, and my understanding of it is as follows:
Executions cannot be deleted.
Once failed executions have accumulated to 2,000, new executions can no longer be created.
I would first like to confirm whether this understanding is correct.
Does every workflow execution have to succeed?
The concurrent executions limit only counts executions that have not yet completed or failed. Executions in a failed state are not counted, so you are fine to leave them (which is usually what you want, so you can inspect and process them later).

How does Dataflow manage in-flight elements while upscaling a streaming job?

When a Dataflow streaming job with autoscaling enabled is deployed, it starts with a single worker.
Let's assume the pipeline reads Pub/Sub messages, runs some DoFn operations, and writes to BigQuery.
Let's also assume the Pub/Sub backlog is already fairly large.
So the pipeline starts and pulls some Pub/Sub messages, processing them on the single worker.
After a couple of minutes, Dataflow realizes that extra workers are needed and creates them.
Many Pub/Sub messages have already been pulled and are being processed, but not yet acked.
And here is my question: how will Dataflow manage those unacked, in-flight elements?
My observations suggest that Dataflow sends many of the messages that are already being processed to a newly created worker, and we can see the same element being processed at the same time on two workers.
Is this expected behavior?
Another question: what happens next? Does the first one win, or the new one?
I mean, we have the same Pub/Sub message still being processed on the first worker and on the new one.
What if the process on the first worker is faster and finishes first? Will the result be acked and go downstream, or will it be dropped because a new process for this element is underway and only the new one can be finalized?
Dataflow provides exactly-once processing of every record. Funnily enough, this does not mean that user code is run only once per record, whether by the streaming or batch runner.
It might run a given record through a user transform multiple times, or it might even run the same record simultaneously on multiple workers; this is necessary to guarantee at-least-once processing in the face of worker failures. Only one of these invocations can “win” and produce output further down the pipeline.
More information here - https://cloud.google.com/blog/products/data-analytics/after-lambda-exactly-once-processing-in-google-cloud-dataflow-part-1

Sidekiq multiple dependent jobs: when to complete or retry?

In my Rails application, I have a model called Report.
A Report has one or many chunks (the Chunk model), each of which generates a piece of content based on external service calls (APIs, etc.).
When a user requests a report, I use Sidekiq to queue the chunk jobs so they run in the background, and I notify the user that we will email them the result once the report is generated.
Report uses a state machine to flag whether or not all the jobs have finished successfully. All the chunks must complete before we flag the report as ready. If one fails, we need to either try again or give up at some point.
I defined the states as draft (default), working, and finished. The finished result is a combination of all the service pieces put together. 'Draft' means the chunks are still in the queue and none of them has started generating any content.
How would you tackle this situation with Sidekiq? How do you keep track (live) of which chunk jobs are finished, working, or failed, so we can flag the report as finished or failed?
I'd like a way to periodically check the jobs to see where they stand, and change the state when they have all finished successfully, or flag the report as failed if all the retries give up!
Thank you
We had a similar need in our application to determine when Sidekiq jobs were finished during automated testing.
What we used is the sidekiq-status gem: https://github.com/utgarda/sidekiq-status
Here's the rough usage:
job_id = Job.perform_async()
You'd then pass the job ID to the place where it will try to check the status of the job
Sidekiq::Status::status job_id #=> :working, :queued, :failed, :complete
Hope this helps.
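To flesh out the rough usage a little, here is a sketch of how it could map onto the report/chunk setup described above. ChunkJob and the finish!/fail! state-machine calls are illustrative names, not part of the gem; per the gem's README, the worker class needs to include Sidekiq::Status::Worker for its status to be tracked.

class ChunkJob
  include Sidekiq::Worker
  include Sidekiq::Status::Worker  # enables status tracking for this job class

  def perform(chunk_id)
    # ... call the external services and store the generated content ...
  end
end

# Enqueue one job per chunk and keep the job ids (e.g. on the Report record).
job_ids = report.chunks.map { |chunk| ChunkJob.perform_async(chunk.id) }

# From a periodic checker, inspect each job and update the report's state.
statuses = job_ids.map { |id| Sidekiq::Status::status(id) }
report.finish! if statuses.all? { |s| s == :complete }
report.fail!   if statuses.any? { |s| s == :failed }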
This is a Sidekiq Pro feature called Batches.
https://github.com/mperham/sidekiq/wiki/Batches
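For completeness, here is a rough sketch of how Batches could map onto the same report/chunk setup. Sidekiq Pro is a commercial add-on, and ChunkJob, ReportCallback, and the finish!/fail! calls are illustrative names rather than anything the library provides.

class ReportCallback
  # Fires once every job in the batch has succeeded.
  def on_success(status, options)
    Report.find(options['report_id']).finish!
  end

  # Fires once every job has run at least once, even if some of them failed.
  def on_complete(status, options)
    Report.find(options['report_id']).fail! if status.failures > 0
  end
end

batch = Sidekiq::Batch.new
batch.description = "Chunks for report #{report.id}"
batch.on(:success, ReportCallback, 'report_id' => report.id)
batch.on(:complete, ReportCallback, 'report_id' => report.id)
batch.jobs do
  report.chunks.each { |chunk| ChunkJob.perform_async(chunk.id) }
end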

Dataflow process has not recovered on failure

Following the recent incidents where a whole availability zone could have been lost to an outage, I would like to better understand Dataflow's failover procedures.
When I manually deleted the worker nodes of a Dataflow job (streaming, Pub/Sub to BigQuery), they were successfully recreated/restarted, yet the Dataflow process itself did not recover.
Even though all the statuses were OK, the data items were not flowing.
The only way to restart the flow was to cancel the job and re-submit it.
Even though I understand that manual deletion is not a valid test, we cannot discount the factor of human error.
My understanding is that the workflow should have restarted automatically, yet that is not what I observed here.
What am I missing?
Dataflow does rely on GCE for resilience to physical failure, so we do not support recovery from manual deletion of a node. Explicit deletion does not simulate a GCE outage, so this will not test the resiliency property you are interested in.

Why can't one of my Rails processes see what the other has committed to the DB?

I'm developing an app on top of Amazon FPS. When you make a payment with FPS, the payment succeeds asynchronously: you make a request and wait for a POST (an Instant Payment Notification) informing you whether the charge completed.
I need the user to see whether the charge completed by the next page load (if possible), so I'm having the server:
Charge the user, then
Spin in a loop checking the database for a status update, and
Time out if it takes too long
Meanwhile, another server process is:
Receiving the IPN and
Noting the success in the database for the other process to see.
I'm running Unicorn with 3 workers. They're all logging to the same terminal window. I see the first process begin to spin, reporting repeatedly that the charge is still pending. Then I see the IPN come in, and the second process pick it up and write to the database that it has succeeded. Then I see the first process continue to see that it's pending.
Why does it never see the success value that was written to the database?
It feels to me like a transaction issue, so I ran a separate process which loops and outputs the status of the latest charge. When the second process reported that it marked the charge successful, this third independent process agreed. It's just the first server process that's failing to see the updated value.
As far as I can tell, the loop in that first process is not inside a transaction, and so it shouldn't be reading an old snapshot. But perhaps it is? How would I tell?
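A stripped-down sketch of that polling loop is below (Charge, its status column, and charge_id are illustrative stand-ins for the real code), with two checks that would help answer this: logging open_transactions to confirm whether a transaction is actually open, and wrapping the read in uncached in case ActiveRecord's per-request query cache is replaying the first SELECT's result on each iteration.

# Illustrative sketch; Charge/#status stand in for the real model,
# and charge_id is the id of the charge being waited on.
deadline = Time.now + 30

loop do
  break if Time.now > deadline

  # 0 here means this code is not running inside an open DB transaction.
  Rails.logger.info "open transactions: #{ActiveRecord::Base.connection.open_transactions}"

  # Bypass ActiveRecord's query cache so every iteration issues a fresh SELECT.
  status = Charge.uncached { Charge.find(charge_id).status }
  break if status == 'succeeded'

  sleep 1
end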
My Stack:
Unicorn 4.6.3
Rails 4.0
Ruby 2.0
Postgres 9.2
