How to Flow Drain Suspended Flows?

I understand that I can call setFlowsDrainingModeEnabled to stop new flows and let the older suspended or in-flight flows complete. I am not worried about the flows that are currently in progress.
The older flows are suspended because they are waiting for counterparty flows elsewhere on the network to complete.
Unless the counterparty node comes back online or sorts out its issues, the flow drain remains incomplete.
CordaRPCOps.stateMachinesFeed.snapshot.size would be non-zero, meaning there are still pending and waiting flows.
If those flows never complete their course, would I end up holding the node upgrade indefinitely?

In Corda 3.1, you would proceed as follows:
Shut down the node
Clear the NODE_CHECKPOINTS table
For each row in the VAULT_STATES table:
Set LOCK_ID to null
Set LOCK_TIMESTAMP to null
Update the node
Restart the node
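As a minimal sketch of that manual cleanup, assuming the node is shut down and you have direct JDBC access to its database (the JDBC URL, driver and credentials below are placeholders for your own node's configuration):

// Hedged sketch of the manual cleanup steps above, run while the node is shut down.
// The connection details are placeholders; the embedded H2 database is assumed here.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DrainCleanup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:h2:file:/opt/corda/persistence", "sa", "");   // placeholder URL and credentials
             Statement stmt = conn.createStatement()) {
            // Remove the checkpoints of the stuck flows
            stmt.executeUpdate("DELETE FROM NODE_CHECKPOINTS");
            // Release any soft locks those flows held on vault states
            stmt.executeUpdate("UPDATE VAULT_STATES SET LOCK_ID = NULL, LOCK_TIMESTAMP = NULL");
        }
    }
}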
In Corda 4+, a CordaRPCOps.killFlow API will be introduced to kill suspended flows.
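A minimal sketch of using that API over RPC on Corda 4, assuming placeholder host and RPC credentials (this kills every suspended flow in the snapshot; in practice you would filter to the flows you actually want to terminate):

import net.corda.client.rpc.CordaRPCClient;
import net.corda.client.rpc.CordaRPCConnection;
import net.corda.core.messaging.CordaRPCOps;
import net.corda.core.messaging.StateMachineInfo;
import net.corda.core.utilities.NetworkHostAndPort;

public class KillStuckFlows {
    public static void main(String[] args) {
        CordaRPCClient client = new CordaRPCClient(new NetworkHostAndPort("localhost", 10006)); // placeholder
        CordaRPCConnection connection = client.start("rpcUser", "rpcPassword");                 // placeholder credentials
        try {
            CordaRPCOps proxy = connection.getProxy();
            proxy.setFlowsDrainingModeEnabled(true);               // stop accepting new flows first
            for (StateMachineInfo flow : proxy.stateMachinesSnapshot()) {
                boolean killed = proxy.killFlow(flow.getId());     // forcibly terminate the suspended flow
                System.out.println("Killed " + flow.getId() + ": " + killed);
            }
        } finally {
            connection.notifyServerAndClose();
        }
    }
}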

Related

Confluent Kafka consumer stuck after broker restarts

I am using:
kafka_image=wurstmeister/kafka
zookeeper_version=3.4.14
kafka_version=2.12-2.4.0
C# client: Confluent.Kafka v1.2.0
We are using 3 brokers and 1 ZooKeeper cluster. As part of deployment we stop all the brokers, ZooKeeper, producers and consumers, delete the Kafka log files, then start the consumers first and later start the brokers and ZooKeeper. In this process the consumer sometimes gets stuck: it does not pick up any messages even though it is alive. If I restart the consumer, it starts picking them up again.
A rebalance can be the reason for such behaviour. When a rebalance starts in a consumer group, all partition assignments in that group are revoked, and during the rebalance the consumers cannot commit offsets or poll data until it finishes and partitions are assigned to the consumers again.
Some important notes to consider:
The rebalance timeout is equal to max.poll.interval.ms. So if your max.poll.interval.ms is very high because of long-running processing, a rebalance can take a very long time.
Reasons to rebalance:
Joining new consumer to consumer group
Clean shutdown of a consumer
Adding new partition(s) to a topic which is subscribed by the consumer group
When a consumer is considered dead by the group coordinator
Expiring session.timeout.ms without sending heartbeat
Not calling poll during max.poll.interval.ms
The reason you face a rebalance after a restart can be the JoinGroup requests that consumers send to the group coordinator by calling poll. Each request can potentially trigger a rebalance, so you end up with lots of rebalances. To overcome this problem, you can increase group.initial.rebalance.delay.ms; the default is 3 seconds. A configuration sketch follows the quoted definition below.
group.initial.rebalance.delay.ms: The amount of time the group coordinator will wait for more consumers to join a new group before performing the first rebalance. A longer delay means potentially fewer rebalances, but increases the time until processing begins.
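The question uses the C# Confluent client, but the same settings exist there; the sketch below uses the Java client purely for illustration, and the broker address, group id and values are placeholders. Note that group.initial.rebalance.delay.ms is a broker-side setting (server.properties), not a consumer property.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RebalanceTuning {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");     // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");         // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");    // consumer is considered dead if no heartbeat within this window
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000"); // also the upper bound on how long a rebalance can take
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // Subscribe and poll as usual. On the broker side, raising
        // group.initial.rebalance.delay.ms (default 3000 ms) in server.properties makes
        // the coordinator wait longer for members before the first rebalance.
        consumer.close();
    }
}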

How to know if the Flow Drain has completed?

Are there any indicators in my node that I can check to know the flow drain is complete, so that I can carry on with the upgrades?
The number of current in-progress/suspended flows on a node is given by:
CordaRPCOps.stateMachinesFeed.snapshot.size
You can also see the in-progress/suspended flows via the shell using:
run stateMachinesSnapshot
Corda 4 will introduce CordaRPCOps.pendingFlowsCount to make checking this easier.
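As a minimal sketch of performing that check over RPC, assuming placeholder host and credentials (on Corda 4 the pendingFlowsCount feed mentioned above is the more convenient option):

import net.corda.client.rpc.CordaRPCClient;
import net.corda.client.rpc.CordaRPCConnection;
import net.corda.core.messaging.CordaRPCOps;
import net.corda.core.utilities.NetworkHostAndPort;

public class DrainCheck {
    public static void main(String[] args) {
        CordaRPCClient client = new CordaRPCClient(new NetworkHostAndPort("localhost", 10006)); // placeholder
        CordaRPCConnection connection = client.start("rpcUser", "rpcPassword");                 // placeholder credentials
        try {
            CordaRPCOps proxy = connection.getProxy();
            int inFlight = proxy.stateMachinesSnapshot().size();
            // Zero means no in-progress/suspended flows remain, i.e. the drain is complete.
            System.out.println("Flows still in progress or suspended: " + inFlight);
        } finally {
            connection.notifyServerAndClose();
        }
    }
}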

Background Tasks in Spring (AMQP)

I need to handle a time-consuming and error-prone task (e.g., invoking a SOAP endpoint that will trigger the delivery of an SMS) whenever a given endpoint of my REST API is invoked, but I'd prefer not to make my users wait for that before sending a response back. Spring AMQP is already part of my stack, so I thought about leveraging it to establish a "work queue" and have a number of worker processes consuming from the queue and taking care of the "work units". I have, however, the following requirements:
A work unit is guaranteed to be delivered, and delivered to exactly one worker.
Should a work unit fail to be completed for any reason, it must be placed back in the queue so that another worker can pick it up later.
Work units survive server reboots and crashes. This is mandatory because I won't be using a DB of any kind to store them.
I know RabbitMQ and Spring AMQP can be configured in such a way that ensures these three requirements, but I've only ever used it to achieve RPC so I don't know much about anything other than that. Is there any example I might follow? What are some of the pitfalls to watch out for?
When creating queues, RabbitMQ gives you two options: transient or durable. Durable messages remain available until you acknowledge them, and messages won't expire if you do not give the queue a TTL. To get started, you can enable the RabbitMQ management plugin and play around a little.
But if you really want to guarantee the safety of your messages against hard resets or hardware problems, I think you need to use a RabbitMQ cluster.
See the RabbitMQ clustering guide; the high-availability topic is covered alongside it, and there are tutorials that explain how to set up a cluster.
By the way, I like beanstalkd too. You can make it write messages to disk, where they will be safe against anything short of disk failure.
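Here is a hedged sketch of the durable-queue and explicit-acknowledgement setup described above, using Spring AMQP. The queue, exchange and routing-key names are placeholders, and full persistence also requires publishers to send messages with delivery mode PERSISTENT (the template's default).

import org.springframework.amqp.core.AcknowledgeMode;
import org.springframework.amqp.core.Binding;
import org.springframework.amqp.core.BindingBuilder;
import org.springframework.amqp.core.DirectExchange;
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.rabbit.annotation.EnableRabbit;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableRabbit
public class WorkQueueConfig {

    @Bean
    public Queue workQueue() {
        return new Queue("sms.work", true);                 // durable = true, survives broker restarts
    }

    @Bean
    public DirectExchange workExchange() {
        return new DirectExchange("sms.exchange", true, false);
    }

    @Bean
    public Binding workBinding() {
        return BindingBuilder.bind(workQueue()).to(workExchange()).with("sms.work");
    }

    @Bean
    public SimpleRabbitListenerContainerFactory workListenerFactory(ConnectionFactory cf) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(cf);
        factory.setAcknowledgeMode(AcknowledgeMode.AUTO);   // ack only after the listener returns; an exception requeues the message
        factory.setPrefetchCount(1);                        // at most one unacknowledged work unit per worker
        return factory;
    }

    @RabbitListener(queues = "sms.work", containerFactory = "workListenerFactory")
    public void handleWorkUnit(String payload) {
        // Invoke the SOAP endpoint here; throwing an exception causes the message
        // to be returned to the queue and redelivered to another worker.
    }
}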

Creating a FIFO queue in SWF to control access to critical code sections

At the moment we have an Amazon Simple Workflow application that has a few tasks that can occur in parallel at the beginning of the process, followed by one path through a critical region where we can only allow one process to proceed.
We have modeled the critical region as a child workflow and we only allow one process to run in the child workflow at a time (though there is a race condition in our code that hasn't caused us issues yet). This is doing the job, but it has some issues.
We have a method that keeps checking whether the child workflow is running; if it isn't, it proceeds (the race condition mentioned above - the "is it running" check and starting the run are not an atomic operation), otherwise it throws an exception and retries. This method has an exponential backoff. The problems are: 1. With multiple workflows entering, which workflow will proceed first is non-deterministic; it would be better if this were a FIFO queue. 2. We can end up waiting a long time for the next workflow to start, so there is wasted time; it would be nice if the workflows proceeded as soon as the last one had finished.
We can address point 2 by reducing the retry interval, but we would still have the non-FIFO problem.
I can imagine modeling this quite easily on a single machine with a queue and locks, but what are our options in SWF?
You can have a "critical section" workflow that is always running, and signal it to "queue" execute requests. Upon receiving a signal, the "critical section" workflow either starts the activity if it is not already running or queues the request in the decider. When the activity execution completes, a "response" signal is sent back to the requester workflow. As the "critical section" workflow is always running, it has to periodically restart itself as new (passing the list of outstanding requests as a parameter), the same way all cron workflows do.
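Below is a simplified, SDK-agnostic sketch of the queueing logic that decider would implement: signals enqueue requests in FIFO order, one activity runs at a time, and a "response" signal goes back to the requester when it finishes. All class and method names here are hypothetical; in practice this logic lives inside an SWF decider (e.g. one built with the AWS Flow Framework).

import java.util.ArrayDeque;
import java.util.Deque;

public class CriticalSectionDecider {
    private final Deque<String> pendingRequests = new ArrayDeque<>(); // requester workflow ids, FIFO
    private boolean activityRunning = false;

    // Called when an "execute request" signal arrives from a requester workflow.
    public synchronized void onExecuteRequested(String requesterWorkflowId) {
        pendingRequests.addLast(requesterWorkflowId);
        startNextIfIdle();
    }

    // Called when the critical-section activity reports completion.
    public synchronized void onActivityCompleted(String requesterWorkflowId) {
        activityRunning = false;
        signalRequester(requesterWorkflowId);       // send the "response" signal back
        startNextIfIdle();
    }

    private void startNextIfIdle() {
        if (!activityRunning && !pendingRequests.isEmpty()) {
            activityRunning = true;
            scheduleActivity(pendingRequests.removeFirst());
        }
    }

    private void scheduleActivity(String requesterWorkflowId) { /* schedule the SWF activity */ }
    private void signalRequester(String requesterWorkflowId)  { /* SignalExternalWorkflowExecution */ }
}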

IMAP Idle Timeout

Let's say I am using IMAP IDLE to monitor changes in a mail folder.
The IMAP spec says that an IDLE connection should only stay alive for 30 minutes at most, and it is recommended to pick a lower number - say 20 minutes - then cancel the IDLE and restart it.
I am wondering what would happen if the mail contents changed between the IDLE being cancelled and the new IDLE being created. An email could potentially be missed. Given that RECENT is a bit vague, this could mean fetching a message list before the old IDLE ends and the new one starts.
But this is almost the same as polling every 20 minutes, and defeats some of the benefit of idle.
Alternatively, a new idle session could be started prior to terminating the expiring one.
But in any case, I think this problem has already been solved so here I am asking for recommendations.
Thanks,
Paul
As you know, the purpose of the IMAP IDLE command (RFC 2177) is to make it possible for the server to transmit status updates to the client in real time. In this context, status updates means untagged IMAP server responses such as EXISTS, RECENT, FETCH or EXPUNGE that are sent when new messages arrive, message status is updated or a message is removed.
However, these IMAP status updates can be returned by any IMAP command, not just IDLE - for example, the NOOP command (see RFC 3501 section 6.1.2) can be used to poll for server updates as well (it predates the IDLE command). IDLE only makes it possible to get these updates more efficiently: if you don't use IDLE, server updates will simply be sent when the client executes another command (or, in some cases, even when no command is in progress) - see RFC 3501 sections 5.2 and 5.3 for details.
This means that if a message is changed between the IDLE canceling and the new IDLE command, the status updates should not be lost, just as they are not lost if you never used IDLE in the first place (and use NOOP every few seconds instead, for example) - they should simply be sent after the new IDLE command is started.
Another approach would be to remember the last (highest) UID of the folder being monitored. Whenever you think there is a chance you missed an update, do a UID search from that UID to *, for example UID SEARCH UID <lastSeenUid+1>:*, to pick up any messages that arrived in the meantime.
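Here is a hedged sketch of that "remember the last UID" approach combined with an IDLE loop, using JavaMail; the host, credentials and folder name are placeholders. After each IDLE cycle ends, any message with a UID above the last one seen is fetched, so updates that arrive between cancelling one IDLE and starting the next are not missed.

import com.sun.mail.imap.IMAPFolder;
import com.sun.mail.imap.IMAPStore;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.UIDFolder;
import java.util.Properties;

public class IdleWatcher {
    public static void main(String[] args) throws Exception {
        Session session = Session.getInstance(new Properties());
        IMAPStore store = (IMAPStore) session.getStore("imaps");
        store.connect("imap.example.com", "user", "password");   // placeholders
        IMAPFolder inbox = (IMAPFolder) store.getFolder("INBOX");
        inbox.open(Folder.READ_ONLY);

        long lastSeenUid = inbox.getUIDNext() - 1;                // highest UID currently in the folder
        while (true) {
            inbox.idle();                                         // blocks until the server pushes an update; the loop re-issues IDLE
            // Catch anything that arrived since the last UID we processed (the UID range n:* always
            // includes the highest-UID message, hence the explicit comparison below).
            Message[] newMessages = inbox.getMessagesByUID(lastSeenUid + 1, UIDFolder.LASTUID);
            for (Message message : newMessages) {
                long uid = inbox.getUID(message);
                if (uid > lastSeenUid) {
                    lastSeenUid = uid;
                    System.out.println("New message UID " + uid + ": " + message.getSubject());
                }
            }
        }
    }
}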
