I am developing a service that reads messages from SQS, does some processing, and posts the result to a different service.
When I initially developed this, the service worked perfectly fine. However, after about 5 days I got this exception:
com.amazonaws.http.AmazonHttpClient: Unable to execute HTTP request: Connection to https://sqs.us-east-1.amazonaws.com refused { org.apache.http.conn.HttpHostConnectException: Connection to https://sqs.us-east-1.amazonaws.com refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:561)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:280)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:165)
at com.amazonaws.services.sqs.AmazonSQSClient.invoke(AmazonSQSClient.java:869)
at com.amazonaws.services.sqs.AmazonSQSClient.getQueueAttributes(AmazonSQSClient.java:453)
I reactivated the service and it started working again. My question is: has anyone faced this issue? If so, what is the best way to handle such an exception?
Another interesting fact is that the same exception happened on all the hosts of the service at nearly the same time. Could this be because of an SQS outage, or are such transient connection failures expected?
I am working on an application that pushes records to Azure tables and blobs. My application ran perfectly fine for around 8 days, but then it started giving a connection timeout error related to blobs. Can anyone please guide me to a workaround for this?
Error logs below:
java.net.ConnectException: Operation timed out (Connection timed out)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:299)
at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:177)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:474)
at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:569)
at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)
at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:203)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:189)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:115)
at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:744)
at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:731)
at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:705)
at ...
To resolve the error "java.net.ConnectException: Operation timed out (Connection timed out)", try the suggestions below:
Try setting a small timeoutInterval and a large maximumExecutionTime in your blobRequestOptions.
If the above does not work, use Fiddler to verify that you are sending the request and receiving a response as expected.
Check the code you are using to upload files to Azure Blob storage.
Check the system configuration and make sure no firewall is blocking the requests from Java.
Connection timeouts usually occur when too many requests are overloading the server.
Otherwise, try uploading the blob in chunks as a workaround.
If the error still persists, try using the code snippet mentioned in this link.
Check whether the IP address/domain and port are incorrect or down.
For more detailed information, please check the references below:
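The first suggestion (a short per-attempt timeout inside a larger overall execution budget) can be illustrated with a generic sketch. This is not the Azure Storage SDK's blobRequestOptions API. It is written in Ruby purely to show the idea, and `with_time_budget`, its parameters, and the list of retryable errors are hypothetical names chosen for this example.

```ruby
require "timeout"

# Errors treated as transient in this sketch (an assumption, adjust as needed).
RETRYABLE_ERRORS = [Errno::ETIMEDOUT, Errno::ECONNREFUSED, Timeout::Error].freeze

# Retries the given block with exponential backoff. Each attempt is handed a
# short timeout; the whole operation is bounded by max_execution_time seconds.
def with_time_budget(attempt_timeout:, max_execution_time:, base_delay: 0.5)
  now = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = now.call + max_execution_time
  attempt = 0

  loop do
    attempt += 1
    remaining = deadline - now.call
    raise Timeout::Error, "gave up after #{attempt - 1} attempts" if remaining <= 0

    begin
      # The block receives the timeout it should apply to its own network call.
      return yield([attempt_timeout, remaining].min, attempt)
    rescue *RETRYABLE_ERRORS => e
      delay = base_delay * (2**(attempt - 1))
      # Re-raise the last error if waiting again would exceed the budget.
      raise e if now.call + delay >= deadline
      sleep(delay)
    end
  end
end
```

The block is expected to pass the timeout it receives to whatever client call it makes, so a single hung connection fails fast while the overall operation still gets several chances to succeed.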
java - Azure StorageException: An unknown failure occurred : Connection timed out: connect - Stack Overflow
java - What could cause socket ConnectException: Connection timed out? - Stack Overflow
Why would a "java.net.ConnectException: Connection timed out" exception occur when URL is up? - Stack Overflow
Error message in Application Insights:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (OurApiUrlAddress:443) A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
It's always a 21-second TCP timeout. I know this is a very generic error, and from reading all the threads about it, the cause is not always the same. We've been investigating this problem for months with no luck, and we're also in contact with the Azure team.
Important: the same site, written in Ruby, used this same API without any problem in the past. The API is responsive and is called from other sites without any problem. However, this specific site was migrated from Ruby to .NET and, at the same time, moved to Azure hosting; these are the 2 big changes. The error only happens when the site (remember, it's hosted in Azure) calls APIs/services hosted in our company; it doesn't happen when the site calls a service hosted somewhere else. This makes us think the problem may be related to the company infrastructure, but it can't be that alone: it has to be related to .NET and Azure in some way, since these APIs and services respond perfectly to calls from other sites hosted in our network, and they worked fine with the Ruby version of this site. These APIs and services do not throw this error when called in a browser from outside the company network.
The services/APIs are behind a firewall, but the ports are correctly configured (there are no other traffic apps or devices at play).
This error doesn't seem to be related to port exhaustion or SNAT, since sometimes a single developer working alone in the DEV environment gets this socket exception.
Just to give an idea, we're getting around 250 socket exceptions a day in production, and this is just a small percentage of all the calls, so something is making this happen only some of the time.
We know about the well-known HttpClient issue when multiple instances are created, so we decided to use the singleton approach, ensuring only 1 instance per API/service, as shown here. This is the call that produces the most socket exceptions:
In the Startup class/file:
services.AddSingleton<IUploadApi>(new UploadApi(new HttpClient() { BaseAddress = new Uri(appSettings.Endpoints.UploadServicesUrl) }));
Part of appsettings.json:
"Endpoints": {
"UploadServicesUrl": "https://ourApiUrlAddress"
},
UploadApi.cs
public interface IUploadApi
{
    Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration();
}
public class UploadApi : IUploadApi
{
    private readonly HttpClient httpClient;
    public UploadApi(HttpClient client)
    {
        httpClient = client;
    }
    public async Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration()
    {
        var response = await httpClient.GetAsync("api/GetUploadConfiguration").ConfigureAwait(false);
        var json = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        return JsonConvert.DeserializeObject<UploadArtworkViewModel.UploadConfigurationData>(json);
    }
}
Call from controller:
model.UploadConfiguration = await UploadApi.GetUploadConfiguration().ConfigureAwait(false);
Any ideas on things to test or places to look are welcome. Obviously, I've not been able to reproduce this one. We know there's always a 21-second timeout, which is a TCP timeout, but that doesn't help much. Maybe for some reason the connection is dropped, or Azure (sometimes) has problems accessing the company network. I can post more info from Application Insights if needed, but I don't see anything special there about the error.
EDIT - More info: It happens when any API or service is called from this MVC site's controllers, so the problem appears sporadically (still around 300 times per day) when the site server tries to reach an API or service. This makes me believe it's something related to the company infrastructure, but I still have no idea what it could be.
From ASP.NET Monsters:
"the application has exited and yet there are still a bunch of these connections open"
"They are in the TIME_WAIT state which means that the connection has been closed on one side (ours) but we’re still waiting to see if any additional packets come in on it because they might have been delayed on the network somewhere."
Even if you're using a singleton HttpClient, it seems that some of the connections are still waiting for additional packets, which leads to socket exhaustion.
The solution is to change your code to use HttpClientFactory or HttpClientFactoryLite. The reason to use HttpClientFactory is that it produces HttpClient instances that reuse message handlers from a pool of handlers. The handlers are recycled periodically, which also takes care of DNS changes. In summary, when using HttpClientFactory, HttpClient delegates the work to a pooled handler (SocketsHttpHandler by default on recent .NET versions).
We finally got this problem fixed after working together with the Azure team for some time. It was a gateway problem, and the solution was to apply NAT gateway/VNet integration. This is what we did to fix it:
https://learn.microsoft.com/en-us/azure/app-service/networking/nat-gateway-integration
I am working with WSO2 ESB 4.9.0 and have around 160 HTTP services that are called frequently. Initially, when the server is started, everything is fine and all service requests and responses are up to the mark.
After 10-12 days the ESB server hangs and does not process any requests, and no exceptions are seen in the log file. There may be some requests piled up in the server that are preventing new connections from being processed.
When I restart the server, all the connections are released and it works for another 10-12 days.
But restarting the server is not a good long-term solution. Where can I find these connections and close them, if possible, and am I missing any WSO2 ESB config changes?
I am trying to find the different connection counts using JMX, and I would also like to know if anyone has faced this issue and found a possible solution.
I am trying to use Swift Mailer to send emails with the Mandrill API. I was working on one server and it worked great. Then I changed to another server, and it shows this: Fatal error: Uncaught exception 'Swift_TransportException' with message 'Connection could not be established with host smtp.mandrillapp.com [Connection refused #111]'
Can anyone help?
This is typically a result of your hosting provider blocking the port being used or blocking external SMTP connections. You'll likely want to get in touch with the host for the new server you're working with since many shared hosting providers limit or prohibit the use of certain ports or external services.
In my case, I had not specified port number 587. Mandrill has changed from the default port 25 to 587.
I am trying to render a few static pages in my Rails app when the MySQL server is shut down. I tried to catch the Mysql::Error exception and render the corresponding static page for each controller.
When we just stop the MySQL service on the machine where MySQL is installed, the Mysql::Error exception is thrown immediately and I am able to render the pages without any delay. But if I shut down the server itself, the whole website becomes unresponsive.
I traced this down to the function in the Rails framework that takes 3 minutes to complete. It is the Mysql.real_connect call in the active_record gem that takes so long. Is there any way I can set a timeout so that, when the MySQL server is powered off, it returns with the Mysql::Error exception quickly and I can render the pages without any delay?
This is probably coming from the socket timeout within the mysql adapter. When the service is stopped, the server will respond quickly with a connection refused error. When the server itself is down, the socket will have to get a connection timeout before it returns. What you'll probably have to do is monkey patch the #real_connect method so that it first validates that the server is running by attempting a socket connection (with a timeout) before continuing on with the original implementation. This question may be of some help to you there:
How do I set the socket timeout in Ruby?
# Set a 6-second connect timeout on the handle before connecting.
dbh = Mysql.init
dbh.options(Mysql::OPT_CONNECT_TIMEOUT, 6)
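The pre-check suggested in the answer (attempting a plain socket connection with a timeout before calling real_connect) might look roughly like the sketch below. It uses only Ruby's standard socket library. The helper name `server_reachable?` and the default 2-second timeout are assumptions for illustration, and the actual monkey patch of real_connect is left out.

```ruby
require "socket"

# Returns true if a TCP connection to host:port can be opened within
# `timeout` seconds, and false if it is refused, unreachable, or times out.
def server_reachable?(host, port, timeout: 2)
  # Socket.tcp closes the socket when the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  # SystemCallError covers Errno::ECONNREFUSED, Errno::ETIMEDOUT, and similar;
  # SocketError covers name resolution failures.
  false
end

# Hypothetical usage before the real connection attempt:
#   render_static_page unless server_reachable?("db.example.internal", 3306)
```

A stopped MySQL service makes this return quickly with a refused connection, while a powered-off host makes it return after the chosen timeout instead of the operating system's much longer default.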