ActiveMQ Artemis Error - AMQ224088: Timeout (10 seconds) while handshaking with LB - timeout

My question is related to the question already posted here.
The original post indicates that the timeout happens about once a month. In our setup we are receiving it once every 10 seconds, and our production logs are filled with these handshake exception messages. Would setting the timeout value for the handshake apply to our scenario as well?

Yes. Setting handshake-timeout=0 on the relevant acceptor URL in your broker.xml applies here even with the higher volume of timeouts.

Related

Python Requests POST timeout prematurely closed connection

I'm doing some file uploads that are sent to an nginx reverse proxy. If I set the Python requests timeout to 10 seconds and upload a large file, nginx reports "client prematurely closed connection" and forwards an empty body to the server. If I remove the requests timeout, the file uploads without any issues. As I understand it, the timeout should only apply if the client fails to receive or send any bytes, which I don't believe is the case here, as it's in the middle of uploading the file. Instead it seems to behave like a total time limit, cutting the connection after 10 seconds with no exception raised by requests. Is sending bytes treated differently from reading bytes for the timeout? I haven't set anything for stream or tried any type of multipart upload. I would like to set a timeout, but I'm confused as to why the connection is getting aborted early. Thanks for any help.
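
To make the "idle timeout versus total time limit" distinction concrete, here is a minimal sketch using only the standard library's socket module rather than requests. The helper names trickle and read_all are made up for this illustration. It only shows how a per-operation read timeout behaves on a plain socket; it does not reproduce the nginx upload case and is not a diagnosis of it.

```python
import socket
import threading
import time


def trickle(sock, chunks, delay):
    """Send `chunks` single bytes, pausing `delay` seconds after each, then close."""
    with sock:
        for _ in range(chunks):
            sock.sendall(b"x")
            time.sleep(delay)


def read_all(timeout, chunks=5, delay=0.3):
    """Read from a slow local peer; return (bytes_received, seconds_elapsed)."""
    reader, writer = socket.socketpair()
    reader.settimeout(timeout)  # applies to each blocking call on `reader`
    sender = threading.Thread(target=trickle, args=(writer, chunks, delay))
    sender.start()

    received = b""
    start = time.monotonic()
    with reader:
        while True:
            data = reader.recv(1024)  # every recv() waits up to `timeout` on its own
            if not data:  # empty bytes means the peer closed the connection
                break
            received += data
    elapsed = time.monotonic() - start
    sender.join()
    return received, elapsed


if __name__ == "__main__":
    data, elapsed = read_all(timeout=1.0)
    print(len(data), "bytes received in", round(elapsed, 1), "seconds")
```

With the defaults, the whole transfer takes longer than the 1.0 second timeout, yet no timeout is raised, because each individual recv() gets data well within that limit. With requests itself, the timeout argument accepts either a single number or a (connect, read) tuple, so the two phases can be set separately.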

Prometheus errors and log location

I have a Prometheus service running in a Docker container, and we have a group of servers that alternate between reporting up and down with the error "context deadline exceeded".
Our scrape interval is 15 seconds and the timeout is 10 seconds.
The servers have been polled with no issues for months, and no new changes have been identified. At first I suspected a networking issue, but I have triple-checked the entire path and all containers, and everything is okay. I have even run tcpdump on the destination server and on the Prometheus polling server and can see the connections establish and complete, yet the targets are still reported as down.
Can anyone tell me where I can find logs relating to "context deadline exceeded"? Is there any additional information I can find on what is causing this?
From another thread it seems like this is a timeout issue, but the servers are less than a second away and, again, there is no packet loss occurring anywhere.
Thanks for any help.

Mongo::Error::SocketError: Broken pipe when executing queries in a mongo replica set

I randomly get this error from my rails app connected to a mongo replica set. This then leads to the server description getting changed to 'unknown' and server selection starting all over again.
This doesn't happen when I run the app locally, connected to a standalone mongod server.
For some reason, connecting to a replica set and executing repeated queries on it results in
Read retry due to: Mongo::Error::SocketError EOFError: end of file reached
Is this an issue with the underlying SSL/TLS connection to the replica set? I've tried increasing socket_timeout and connection_timeout in my mongoid.yml with no success.
This happens when the server closes the connection to the client. One case is a replica set election: with MongoDB 4.0 and earlier, the old primary closes all of its connections (this is not the case with 4.2+).
Since the read is retried, the situation does not generally affect the application, apart from the diagnostic message that you see being printed.
You can look into the server logs at the time when the message is printed to see why the server closed the connection. Sometimes connections are closed without server-side logging (for example, when the server process terminates it doesn't log all connection closes).

In what cases does Google Cloud Run respond with "The request failed because the HTTP connection to the instance had an error."?

We've been running Google Cloud Run for a little over a month now and have noticed that we periodically have Cloud Run instances that simply fail with:
The request failed because the HTTP connection to the instance had an error.
This message is nearly always* preceded by the following message (those are the only messages in the log):
This request caused a new container instance to be started and may thus take longer and use more CPU than a typical request.
* I cannot find, nor recall, a case where that isn't true, but I have not done an exhaustive search.
A few things that may be of importance:
Our concurrency level is set to 1 because our requests can take up to the maximum amount of memory available, 2GB.
We have received errors that we've exceeded the maximum memory, but we've dialed back our usage to obviate that issue.
This message appears to occur shortly after 30 seconds (e.g., 32, 35) and our timeout is set to 75 seconds.
In my case, this error was always thrown 120 seconds after receiving the request. I figured out that the cause was Node 12's default request timeout of 120 seconds. So if you are using a Node server, you can either change the default timeout or update Node to version 13, where the default timeout was removed: https://github.com/nodejs/node/pull/27558.
If your logs didn't catch anything useful, most probably the instance is crashing because you run heavy CPU tasks. This is mentioned on the Google Issue Tracker:
A common cause for 503 errors on Cloud Run would be when requests use
a lot of CPU and as the container is out of resources it is unable to
process some requests
For me, the issue was resolved by upgrading Node in the Dockerfile, from "FROM node:13.10.1 AS build" to "FROM node:14.10.1 AS build".

Message published but never reached the broker

My RabbitTemplate is configured to use a CachingConnectionFactory with cache mode CONNECTION. In rare cases, a call to
rabbitTemplate.convertAndSend
completes without any issue, but the message never reaches the RabbitMQ broker.
A few seconds after that, another thread logs:
An unexpected connection driver error occured
and stacktrace:
com.rabbitmq.client.MissedHeartbeatException: Heartbeat missing with heartbeat = 60 seconds
Is there a config option I should enable to be sure that the message has reached the broker, or at least to have an exception thrown in the sending thread?
Consider turning on Publisher Confirms and Returns on the CachingConnectionFactory and setting mandatory on the RabbitTemplate: https://docs.spring.io/spring-amqp/docs/2.1.4.RELEASE/reference/#cf-pub-conf-ret

Resources