QuickFIX/J Failover handling process

We are using the QuickFIX/J library for our application. We have received a requirement to handle the failover scenario for the initiator session as follows:
If a disconnection is experienced for the primary host (SocketAcceptHost), we should retry the same connection a configurable number of times.
Only if all the retry attempts fail should we start trying the other hosts (SocketAcceptHost1, SocketAcceptHost2, ..., SocketAcceptHost[N]).
As far as I understand, the above behaviour is not available in the existing code, and it seems doable by introducing new parameters to the class "IoSessionInitiator". Please be kind enough to advise me further on this.
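For context, a rough sketch of how the connection settings could be assembled programmatically (host names, ports, and session identifiers below are placeholders; on the initiator side the corresponding keys are SocketConnectHost/SocketConnectPort). As far as I can tell, stock QuickFIX/J simply rotates to the next configured host on every failed reconnect attempt, which is why retrying the primary a configurable number of times first seems to require changes to IoSessionInitiator:

    import quickfix.SessionID;
    import quickfix.SessionSettings;

    public class FailoverSettingsSketch {
        public static void main(String[] args) {
            SessionSettings settings = new SessionSettings();
            SessionID sessionId = new SessionID("FIX.4.4", "CLIENT", "BROKER");

            settings.setString(sessionId, "ConnectionType", "initiator");
            settings.setString(sessionId, "ReconnectInterval", "5");

            // Primary host
            settings.setString(sessionId, "SocketConnectHost", "primary.example.com");
            settings.setString(sessionId, "SocketConnectPort", "9876");

            // Alternate hosts; the stock initiator moves to the next entry on each
            // failed attempt rather than retrying the primary N times first.
            settings.setString(sessionId, "SocketConnectHost1", "backup1.example.com");
            settings.setString(sessionId, "SocketConnectPort1", "9876");
            settings.setString(sessionId, "SocketConnectHost2", "backup2.example.com");
            settings.setString(sessionId, "SocketConnectPort2", "9876");
        }
    }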

Related

Provider configuration set number of messages to prefetch

I am new to Spring Boot JMS and I saw something like a provider configuration -> "set number of messages to prefetch" function. I could not find much about it, and I don't have a clear idea of what it means.
I have my concurrency set to 5-100. Does that mean that every time a new consumer is spawned, it will get that number of messages? Let's say I set it to 5; will it get 5 messages from the queue at once?
Without Spring JMS we used to make a receive call which could fetch up to 10 messages; here we have a JMS listener. Are these two the same?
Can anyone clarify its real purpose in terms of multiple consumers? What will it do?
Yes; the prefetch applies to each consumer.
Using multiple consumers allows you to process messages in parallel to improve performance, but you can only do that if the order in which you process messages is not important.
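For illustration, a minimal sketch assuming ActiveMQ as the JMS provider (the broker URL and prefetch value are placeholders). Each of the 5-100 consumers created by the listener container gets its own prefetch buffer of 5 messages:

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.ActiveMQPrefetchPolicy;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jms.config.DefaultJmsListenerContainerFactory;

    @Configuration
    public class JmsPrefetchConfig {

        @Bean
        public ActiveMQConnectionFactory connectionFactory() {
            ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory("tcp://localhost:61616");
            ActiveMQPrefetchPolicy prefetch = new ActiveMQPrefetchPolicy();
            prefetch.setQueuePrefetch(5); // each consumer buffers up to 5 unacknowledged messages
            cf.setPrefetchPolicy(prefetch);
            return cf;
        }

        @Bean
        public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(ActiveMQConnectionFactory cf) {
            DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
            factory.setConnectionFactory(cf);
            factory.setConcurrency("5-100"); // 5 to 100 consumers, each with its own prefetch buffer
            return factory;
        }
    }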

Background Tasks in Spring (AMQP)

I need to handle a time-consuming and error-prone task (e.g., invoking a SOAP endpoint that will trigger the delivery of an SMS) whenever a given endpoint of my REST API is invoked, but I'd prefer not to make my users wait for that before sending a response back. Spring AMQP is already part of my stack, so I thought about leveraging it to establish a "work queue" and have a number of worker processes consuming from the queue and taking care of the "work units". I have, however, the following requirements:
A work unit is guaranteed to be delivered, and delivered to exactly one worker.
Should a work unit fail to be completed for any reason, it must be placed back in the queue so that another worker can pick it up later.
Work units survive server reboots and crashes. This is mandatory because I won't be using a DB of any kind to store them.
I know RabbitMQ and Spring AMQP can be configured in such a way that ensures these three requirements, but I've only ever used it to achieve RPC so I don't know much about anything other than that. Is there any example I might follow? What are some of the pitfalls to watch out for?
When creating queues, RabbitMQ gives you two options: transient or durable. Messages on a durable queue will remain available until you acknowledge them, and they won't expire if you do not give the queue a TTL. For starters you can enable the RabbitMQ management plugin and play around a little.
But if you really want to guarantee the safety of your messages against hard resets or hardware problems, I guess you need to use a RabbitMQ cluster.
See RabbitMQ Clustering; you can find the high-availability topic on the right side of the page.
This guy explains how to cluster.
By the way, I like beanstalkd too. You can make it write messages to disk, and they will be safe except for disk failures.
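To make that concrete, here is a minimal sketch of the three requirements in Spring AMQP terms (the queue name "work-queue" is a placeholder): a durable queue, persistent messages, and a listener container that acknowledges only after successful processing and requeues on failure:

    import org.springframework.amqp.core.AcknowledgeMode;
    import org.springframework.amqp.core.MessageDeliveryMode;
    import org.springframework.amqp.core.Queue;
    import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
    import org.springframework.amqp.rabbit.connection.ConnectionFactory;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class WorkQueueConfig {

        @Bean
        public Queue workQueue() {
            // durable = true: the queue definition survives a broker restart
            return new Queue("work-queue", true);
        }

        @Bean
        public RabbitTemplate rabbitTemplate(ConnectionFactory connectionFactory) {
            RabbitTemplate template = new RabbitTemplate(connectionFactory);
            // Make delivery mode explicit (PERSISTENT is also Spring AMQP's default),
            // so each message is written to disk and survives a broker restart.
            template.setBeforePublishPostProcessors(message -> {
                message.getMessageProperties().setDeliveryMode(MessageDeliveryMode.PERSISTENT);
                return message;
            });
            return template;
        }

        @Bean
        public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
            SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
            factory.setConnectionFactory(connectionFactory);
            // AUTO: ack only after the listener returns normally; a thrown exception
            // rejects the message and (with requeue enabled) puts it back on the queue.
            factory.setAcknowledgeMode(AcknowledgeMode.AUTO);
            factory.setDefaultRequeueRejected(true);
            return factory;
        }
    }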

HBase 0.98.1 Put operations never timeout

I am using version 0.98.1 of the HBase server and client. My application has strict response time requirements. As far as HBase is concerned, I would like to abort the HBase operation if the execution exceeds 1 or 2 seconds. This task timeout is useful in case the Region Server is non-responsive or has crashed.
I tried configuring
1) HBASE_RPC_TIMEOUT_KEY = "hbase.rpc.timeout";
2) HBASE_CLIENT_RETRIES_NUMBER = "hbase.client.retries.number";
However, the Put operations never timeout (I am using sync flush). The operations return only after the Put is successful.
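For reference, this is roughly how I apply those settings on the client side (the table name is just an example); hbase.client.operation.timeout is a further, overall per-operation budget that newer clients consult, though I have not verified whether the 0.98.1 Put path honours it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimeoutConfigSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.rpc.timeout", 2000);               // per-RPC timeout in ms
            conf.setInt("hbase.client.retries.number", 2);        // cap on client-side retries
            conf.setInt("hbase.client.pause", 100);               // base pause between retries in ms
            conf.setInt("hbase.client.operation.timeout", 2000);  // overall per-operation budget in newer clients

            HTable table = new HTable(conf, "example_table");
            Put put = new Put(Bytes.toBytes("row-1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            table.put(put); // still blocks past the timeouts on 0.98.1, as described above
            table.close();
        }
    }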
I looked through the code and found that the function receiveGlobalFailure in the AsyncProcess class keeps resubmitting the task without any check on the retries. This is in version 0.98.1.
I do see that in 0.99.1 there have been some changes to AsyncProcess class that might do what I want. I have not verified it though.
My questions are:
1) Is there any other configuration that I missed that can give me the desired functionality?
2) Do I have to use the 0.99.1 client to solve my problem? Does 0.99.1 solve my problem?
3) If I have to use the 0.99.1 client, do I also have to use a 0.99.1 server, or can I still use my existing 0.98.1 region server?

What is a good practice to achieve the "Exactly-once delivery" behavior with Amazon SQS?

According to the documentation:
Q: How many times will I receive each message?
Amazon SQS is engineered to provide "at least once" delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.
Is there any good practice to achieve the exactly-once delivery?
I was thinking about using the DynamoDB "Conditional Writes" as a distributed locking mechanism, but... any better idea?
Some references on this topic:
At-least-once delivery (Service Behavior)
Exactly-once delivery (Service Behavior)
FIFO queues are now available and provide ordered, exactly-once processing out of the box.
https://aws.amazon.com/sqs/faqs/#fifo-queues
Check your region for availability.
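As a small illustration (AWS SDK for Java v2; the queue URL and ids are placeholders), sending to a FIFO queue uses a message group id for ordering and a deduplication id that suppresses duplicates within the roughly five-minute deduplication window:

    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

    public class FifoSendSketch {
        public static void main(String[] args) {
            SqsClient sqs = SqsClient.create();

            sqs.sendMessage(SendMessageRequest.builder()
                    .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo") // placeholder
                    .messageBody("{\"orderId\": 42}")
                    .messageGroupId("order-42")                  // ordering is per message group
                    .messageDeduplicationId("order-42-created")  // duplicates with the same id are dropped
                    .build());
        }
    }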
The best solution really depends on exactly how critical it is that you not perform the action suggested in the message more than once. For some actions, such as deleting a file or resizing an image, it doesn't really matter if it happens twice, so it is fine to do nothing. When it is more critical not to do the work a second time, I use an identifier for each message (generated by the sender) and the receiver tracks duplicates by marking the ids as seen in memcached. That is fine for many things, but probably not if life or money depends on it, especially if there are multiple consumers.
Conditional writes sound like a clever solution, but it has me wondering if perhaps AWS isn't such a great solution for your problem if you need a bulletproof exactly-once solution.
Another alternative for distributed locking is a Redis cluster, which can also be provisioned with AWS ElastiCache. Redis supports transactions, which guarantee that concurrent calls will be executed in sequence.
One of the advantages of using a cache is that you can set expiration timeouts, so if your message processing fails, the lock will be released when the timeout expires.
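A minimal sketch of that idea with Jedis (key names, ids, and the 60-second expiry are assumptions): SET with NX acquires the lock only if nobody holds it, and EX ensures a crashed worker releases it automatically:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.params.SetParams;

    public class RedisLockSketch {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String messageId = "msg-123"; // hypothetical unique id carried in the message

                // NX: set only if the key does not exist; EX: expire after 60 seconds
                String result = jedis.set("lock:" + messageId, "worker-1",
                        SetParams.setParams().nx().ex(60));

                if ("OK".equals(result)) {
                    // lock acquired: process the message
                } else {
                    // another worker already holds (or held) the lock for this message
                }
            }
        }
    }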
In this blog post, the use of a low-latency control database like Amazon DynamoDB is also recommended:
https://aws.amazon.com/blogs/compute/new-for-aws-lambda-sqs-fifo-as-an-event-source/
Amazon SQS FIFO queues ensure that the order of processing follows the message order within a message group. However, it does not guarantee only-once delivery when used as a Lambda trigger. If only-once delivery is important in your serverless application, it's recommended to make your function idempotent. You could achieve this by tracking a unique attribute of the message using a scalable, low-latency control database like Amazon DynamoDB.
In short, we can put or update an item in a DynamoDB table with the condition expression attribute_not_exists (for put) or if_not_exists (for update); please check the example here:
https://stackoverflow.com/a/55110463/9783262
If we get an exception during the put/update operation, we return from the Lambda without further processing; otherwise we process the message (https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/).
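A small sketch of that conditional put with the AWS SDK for Java v2 (table and attribute names are placeholders): the write succeeds only the first time a given message id is seen, so a duplicate delivery surfaces as ConditionalCheckFailedException and can simply be skipped:

    import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
    import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
    import software.amazon.awssdk.services.dynamodb.model.ConditionalCheckFailedException;
    import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

    import java.util.Map;

    public class IdempotencySketch {
        public static void main(String[] args) {
            DynamoDbClient dynamo = DynamoDbClient.create();
            String messageId = "msg-123"; // hypothetical unique id taken from the SQS message

            try {
                dynamo.putItem(PutItemRequest.builder()
                        .tableName("processed-messages")                        // hypothetical table
                        .item(Map.of("messageId", AttributeValue.builder().s(messageId).build()))
                        .conditionExpression("attribute_not_exists(messageId)") // fails if already recorded
                        .build());
                // First time we have seen this id: safe to process the message
            } catch (ConditionalCheckFailedException alreadyProcessed) {
                // Duplicate delivery: return without processing again
            }
        }
    }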
The following resources were helpful for me too:
https://ably.com/blog/sqs-fifo-queues-message-ordering-and-exactly-once-processing-guaranteed
https://aws.amazon.com/blogs/aws/introducing-amazon-sns-fifo-first-in-first-out-pub-sub-messaging/
https://youtu.be/8zysQqxgj0I

Having C# application communicate with Nagios

We are using Nagios to monitor our network with great results. There is now a new requirement we are struggling with:
We want to notify Nagios of a non-fatal but critical application error. The application does not stop running, but there is some sort of issue that needs looking into.
Once the issue has been looked into, we need some way to "unflag" the issue in Nagios.
We tried using the syslog, but the biggest problem was that once an error was logged, the service was put into an error state with no way to recover. Also, while applications would report a critical error to the syslog, most of the time they don't report an "all clear".
I've done this using passive checks: http://nagios.sourceforge.net/docs/3_0/passivechecks.html
Basically, your application just feeds the Nagios core some data via its external command file. Nagios will eventually read the data and update the alerts, execute event handlers, etc.
Exactly how you set this up will be unique for your case, but if you need any other help just let me know. :)
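For reference, the data you write is one line per check result in the PROCESS_SERVICE_CHECK_RESULT external-command format. A rough sketch (shown in Java here, but the same line works from C#; the host name, service description, and command-file path are assumptions that must match your Nagios configuration):

    import java.io.FileWriter;

    public class PassiveCheckSketch {
        public static void main(String[] args) throws Exception {
            long now = System.currentTimeMillis() / 1000L;

            // Return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
            // Send 2 when the error occurs and 0 later to "unflag" the issue.
            String line = String.format(
                    "[%d] PROCESS_SERVICE_CHECK_RESULT;app-server-01;MyApplication;2;Non-fatal error needs investigation%n",
                    now);

            // Append to the external command file configured via command_file in nagios.cfg
            try (FileWriter cmdFile = new FileWriter("/usr/local/nagios/var/rw/nagios.cmd", true)) {
                cmdFile.write(line);
            }
        }
    }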
