Vert.x - 3.5.0
I have a single Vert.x verticle using a single web client to consume 2 different APIs on the same destination host.
My connection settings are maxPoolSize=100, keepAlive=true, connectionTimeout=9000, and requestTimeout=10000.
My HTTP call is failing because a null header parameter is being passed in my code. The problem is that when Vert.x throws a NullPointerException while doing the POST (because of the null header parameter), my HTTP connections get exhausted after some time and are never released, and if I make a call later it still times out.
Please suggest what I am missing here, and whether Vert.x is doing something internally to manage connections that is causing this issue.
How should we come up with values for maxPoolSize and the connect timeout, given that I know my requestTimeout is 10000 and keepAlive is true?
I can fix my null header parameter issue, but I am more interested in what happens to the connection pool on a timeout in such scenarios: is it not releasing the connection?
How can I monitor my current connection pool utilization?
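For reference, a minimal sketch of how the configuration described above might look with the Vert.x 3.x WebClient; the host and path are hypothetical, and the option names come from WebClientOptions/HttpClientOptions:
import io.vertx.core.Vertx;
import io.vertx.ext.web.client.WebClient;
import io.vertx.ext.web.client.WebClientOptions;

Vertx vertx = Vertx.vertx();

// Pool and keep-alive settings as described above.
WebClientOptions options = new WebClientOptions()
        .setMaxPoolSize(100)     // maxPoolSize=100
        .setKeepAlive(true)      // keepAlive=true
        .setConnectTimeout(9000) // connection timeout, in ms
        .setSsl(true);
WebClient client = WebClient.create(vertx, options);

// The request timeout is applied per request, in ms.
client.post(443, "destination.example.com", "/api/one")
        .timeout(10000)
        .send(ar -> {
            // ar.failed() is true on timeout or on an exception
            // thrown while sending the request.
        });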
Error message in Application Insights:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (OurApiUrlAddress:443)
It's always a 21-second TCP timeout. I know this is a very generic error, and the reason for it is not always the same; I've been reading all the threads about it. We've been investigating this problem for months with no luck, and we're also in contact with the Azure team.
Important: this same site written in Ruby was using this same API without any problem in the past. The API is responsive and is called from other sites without any problem, but this specific site was migrated from Ruby to .NET and, at the same time, was moved to Azure hosting; these are the 2 big changes. The error only happens when the site (remember, it's hosted in Azure) calls APIs/services hosted inside our company; it doesn't happen when the site calls a service hosted somewhere else. This makes us think the problem may be related to the company infrastructure, but that can't be the whole story: it has to be related to .NET and Azure somehow, since these APIs and services respond perfectly to calls from other sites hosted on our network, and they worked fine with the Ruby version of this site. These APIs and services also don't throw this error when called in-browser from outside the company network.
The services/APIs are behind a firewall, but the ports are correctly configured (there are no other traffic-management apps or devices at play).
This error doesn't seem to be related to port exhaustion or SNAT: sometimes only 1 developer is working alone in the DEV environment and he still gets this socket exception.
To give an idea of scale, we're getting around 250 socket exceptions a day in production, and this is just a small percentage of all the calls, so something is making this happen only some of the time.
We know about the well-known HttpClient issue when multiple instances are created, so we decided to use the singleton approach, ensuring only 1 instance per API/service, as shown here. This is the call that produces the most socket exceptions:
In StartUp class/file:
services.AddSingleton<IUploadApi>(new UploadApi(new HttpClient() { BaseAddress = new Uri(appSettings.Endpoints.UploadServicesUrl) }));
Part of appsettings.json:
"Endpoints": {
"UploadServicesUrl": "https://ourApiUrlAddress"
},
UploadApi.cs
public interface IUploadApi
{
Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration();
}
public class UploadApi : IUploadApi
{
private readonly HttpClient httpClient;
public UploadApi(HttpClient client)
{
httpClient = client;
}
public async Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration()
{
var response = await httpClient.GetAsync("api/GetUploadConfiguration").ConfigureAwait(false);
var json = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
return JsonConvert.DeserializeObject<UploadArtworkViewModel.UploadConfigurationData>(json);
}
}
Call from controller:
model.UploadConfiguration = await UploadApi.GetUploadConfiguration().ConfigureAwait(false);
Any ideas on things to test or places to look are welcome; obviously I've not been able to reproduce this one. We know there's always a 21-second timeout, which is a TCP timeout, but that doesn't help much. Maybe for some reason the connection is dropped, or Azure is (sometimes) having problems reaching the company network. I can post more info from Application Insights if needed, but I don't see anything special there about the error.
EDIT - More info: It happens when any API or service is called from this MVC site's controllers, so the problem appears sporadically (still around 300 times per day) when the site's server tries to reach an API or service. This makes me believe it's something related to the company infrastructure, but I still have no idea what it could be.
From ASP.NET Monsters:
"the application has exited and yet there are still a bunch of these
connections open"
"They are in the TIME_WAIT state which means that the connection has
been closed on one side (ours) but we’re still waiting to see if any
additional packets come in on it because they might have been delayed
on the network somewhere."
Even if you're using a singleton HttpClient, it seems that some of the connections are left waiting for additional packets, which leads to socket exhaustion.
The solution is to change your code and use HttpClientFactory (or HttpClientFactoryLite). The reason to use HttpClientFactory is that it produces HttpClient instances that reuse message handlers from a shared pool; those handlers are recycled periodically, which also takes care of DNS changes. In summary, when using HttpClientFactory, each HttpClient delegates its work to a pooled HttpMessageHandler.
We finally got this problem fixed after working together with the Azure team for some time. It was a gateway problem, and the solution was applying NAT gateway / VNet integration. This is what we did to fix it:
https://learn.microsoft.com/en-us/azure/app-service/networking/nat-gateway-integration
I have a web app that I built. It communicates with the Salesforce API. I have users and administrators. All connections to the API use the same credentials.
I am concerned that my API connection is going to be created multiple times, because each admin who is logged in has their own instance of the connection.
If I hold the API connection in a constant, do all other sessions/users have access to that exact connection, or do I have to connect for each user? How can I share one single API connection for ALL users?
A stateless API has no persistent session, so there's no use in holding these in constants. Each HTTP request is logically a separate exchange, even if the underlying TCP connection happens to be reused.
It's only things like database or WebSocket connections that persist, and if you need to manage those you need a connection pool, not a simple constant. If the connection ever fails it needs to be replaced, and if more than one thread potentially requires it, you have to handle acquisition and locking properly (see the sketch below).
Create your API connectors as necessary. Unless you have a measurable performance problem, don't worry about it.
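To illustrate the pool point (the question is about Ruby, but the idea is language-agnostic), here is a minimal Java sketch of blocking acquisition and release; a real pool would also validate and replace broken connections:
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// C stands for whatever persistent connection type you manage.
class ConnectionPool<C> {
    private final BlockingQueue<C> idle;

    ConnectionPool(List<C> connections) {
        this.idle = new ArrayBlockingQueue<>(connections.size(), true, connections);
    }

    C acquire() throws InterruptedException {
        return idle.take(); // blocks while every connection is in use
    }

    void release(C connection) {
        idle.offer(connection); // hand it back to the next waiting thread
    }
}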
A Ruby constant is like a variable, except that its value is supposed to remain constant for the duration of the program. The Ruby interpreter does not actually enforce the constancy of constants, but it does issue a warning if a program changes the value of a constant.
Reference: http://rubylearning.com/satishtalim/ruby_constants.html
We're using Spring AMQP in the style of Spring Remoting with AMQP. I'm setting x-message-ttl on every message so that it expires immediately if it cannot be delivered immediately to a consumer. This works great; however, it leaves the producer waiting for the full replyTimeout before failing with RemoteProxyFailureException (if I recall correctly). Is there any way to make the producer fail immediately if the message cannot be delivered (only waiting for the timeout if the message is actually received)?
The loose coupling of the architecture means there's no indication to the producer of the expiry.
There used to be an immediate flag, but it was removed in RabbitMQ 3.0.
One possible solution would be to configure a DLX/DLQ so the expired message can be consumed by another consumer, which can return an exception to the client (see the sketch below).
EDIT:
Simply have the fallback consumer implement the same interface and have it throw an exception.
See this updated test case.
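A hedged sketch of that DLX idea with Spring AMQP; every name below is illustrative, and RemoteService stands in for whatever interface you expose over Spring Remoting:
import org.springframework.amqp.core.Queue;
import org.springframework.amqp.core.QueueBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DlxConfig {

    // Expired messages are re-routed to a dead-letter exchange
    // instead of being silently dropped.
    @Bean
    public Queue requestQueue() {
        return QueueBuilder.durable("remote.requests")
                .withArgument("x-dead-letter-exchange", "remote.dlx")
                .withArgument("x-dead-letter-routing-key", "requests.expired")
                .build();
    }
}

// Stand-in for the remote interface exposed via Spring Remoting.
interface RemoteService {
    Object process(Object request);
}

// The fallback consumer is exported on the dead-letter queue with the
// same interface and always throws, so the producer's remoting proxy
// fails fast with this exception instead of waiting out replyTimeout.
class ExpiredRequestService implements RemoteService {
    @Override
    public Object process(Object request) {
        throw new IllegalStateException("No consumer was available for this request");
    }
}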
I am using the Grails WS-Client Plugin,
but my application waits for the SOAP response from the server whose web service I am consuming; it blocks at this code:
def proxy = webService.getClient(wsdlUrl)
This mostly occurs when the server is down or the network connection is slow.
The wait also continues when the web service has been temporarily removed from the server and the URL containing the WSDL redirects to the website's home page when accessed in a browser.
How can I detect whether the WSDL is present, and how can I set a timeout-like property so that the client waits up to 10 seconds for a response, then stops waiting and lets the code continue normally in case of a stall?
I also don't get any exception or error.
It sounds like no read and/or connect timeouts are set on the client by default. This should help if the web service is down: proxy.setConnectionTimeout(value_in_milliseconds)
I'm not sure about setting the read timeout, though, which is what you'd need if the host was up and accepting connections but the web service wasn't available or wasn't responding. The best solution we found for this was to use the Apache Commons HTTP client instead of the default client, which gave us much more granular control over the client's connection settings. It's possible those settings exist in the WS-Client plugin as well, but the relevant documentation (actually the GroovyWS documentation) doesn't appear to mention anything about read timeouts.
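Since the WS-Client plugin is documented via GroovyWS, which is built on Apache CXF, one possibility (assuming the returned proxy really is a CXF proxy) is to set both timeouts on the underlying HTTP conduit:
import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.transport.http.HTTPConduit;
import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

// 'proxy' is the object returned by webService.getClient(wsdlUrl).
Client cxfClient = ClientProxy.getClient(proxy);
HTTPConduit conduit = (HTTPConduit) cxfClient.getConduit();

HTTPClientPolicy policy = new HTTPClientPolicy();
policy.setConnectionTimeout(10000); // give up connecting after 10s
policy.setReceiveTimeout(10000);    // give up waiting for a reply after 10s
conduit.setClient(policy);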
We're using CometD 2 to connect a central data provider with several backends consuming the data. At the moment, when one of the backends fails briefly, all messages posted in the meantime are lost. Now we've heard about the "Acknowledge Extension" for CometD. It is supposed to maintain a server-side list of messages and deliver them when one of the clients reports to be back online. Here are some questions:
1) Does this also work with several clients?
2) The documentation (http://cometd.org/documentation/2.x/cometd-ext/ack) says: "Note that if the disconnected browser is disconnected for in excess of maxInterval (default 10s), then the client will be timed out and the unacknowledged queue discarded." Does this mean that if my client doesn't reconnect within maxInterval, the messages are lost anyway?
Hence:
2.1) What's the maximum maxInterval? What consequences does setting it to a high value have?
2.2) We'd need a reliable mechanism for outages of at least a few minutes. Is this possible? Are there any alternatives?
3) Is it really only necessary to add the two extensions on both the client and the CometD server? We're using Jetty for the server and .NET Oyatel for the client. Does anyone have any experience with this?
I'm sorry for the bunch of questions, but unfortunately the CometD project isn't really well documented. I really appreciate any answers.
Cheers,
Chris
1) Does this also work with several clients?
Yes, it does. There is one message queue allocated for each client (see AcknowledgedMessagesClientExtension).
2) does this mean that in case my client doesn't restore within the maxInterval, the messages are lost anyway?
Yes, it does. When the client can't reach the server for maxInterval milliseconds, the server will throw away all state associated with that client.
2.1) What's the maximum maxInterval? What consequences does setting it to a high value have?
maxInterval is a servlet parameter of the CometD servlet. It is internally treated as a long value, so the maximum value for it is Long.MAX_VALUE.
Example configuration:
<init-param>
    <!-- The max period of time, in milliseconds, that the server will wait
         for a new long poll from a client before that client is considered
         invalid and is removed -->
    <param-name>maxInterval</param-name>
    <param-value>10000</param-value>
</init-param>
Setting it to a high value means the server will wait longer before throwing away the state associated with a client (counted from the time the client stops contacting the server).
I see two problems with this. First, the memory requirements of the server will potentially be higher (which may also make denial of service easier). Second, the RemoveListener isn't called on the server until maxInterval expires, which may require you to implement additional logic that differentiates between "momentarily unreachable" and "disconnected" (see the sketch below).
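For reference, a sketch of that differentiation with the CometD 2 server API; note the listener only fires once maxInterval has already expired:
import org.cometd.bayeux.server.ServerSession;

// 'session' is a ServerSession, e.g. obtained from a
// BayeuxServer.SessionListener when the client first connects.
session.addListener(new ServerSession.RemoveListener() {
    public void removed(ServerSession session, boolean timeout) {
        if (timeout) {
            // the client went silent for maxInterval and was timed out
        } else {
            // the client disconnected deliberately
        }
    }
});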
2.2) We'd need a reliable mechanism for outages of at least a few minutes. Is this possible? Are there any alternatives?
Yes, it is possible to configure the maxInterval to last for a few minutes.
An alternative would be to restore any server-side state on every handshake. This can be achieved by adding a listener to "/meta/handshake" and publishing a message to a "/service/" channel (to make sure only the server receives the message), or by adding an additional property to the "ext" property of the handshake message. Be careful to let the client restore only valid state (sign it on the server if you must).
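A server-side sketch of the handshake-based restore in the CometD 2 API; the restore logic itself is left as a hypothetical helper:
import org.cometd.bayeux.Channel;
import org.cometd.bayeux.server.ServerChannel;
import org.cometd.bayeux.server.ServerMessage;
import org.cometd.bayeux.server.ServerSession;

bayeux.getChannel(Channel.META_HANDSHAKE).addListener(
        new ServerChannel.MessageListener() {
            public boolean onMessage(ServerSession from, ServerChannel channel,
                                     ServerMessage.Mutable message) {
                // Every (re-)handshake passes through here; trigger
                // whatever state restoration this client needs, e.g.
                // a hypothetical restoreStateFor(from).
                return true; // continue processing the handshake
            }
        });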
3) Is it really only necessary to add the two extensions on both the client and the CometD server?
On the server it is sufficient to do something like:
bayeux.addExtension(new AcknowledgedMessagesExtension());
I don't know how you'd do it in Oyatel. In JavaScript, it suffices to simply include the extension (dojo.require, or a script include for jQuery).
When a client with the AckExtension connects to the server, a message similar to the following will be logged (from my Jetty console log):
[qtp959713667-32] INFO org.cometd.server.ext.AcknowledgedMessagesExtension - Enabled message acknowledgement for client 51vkuhps5qgsuaxhehzfg6yw92
Another note, because it may not be obvious: the ack extension only provides a server-to-client delivery guarantee, not client-to-server. That is, when you publish a message from the client to the server, it may not reach the server, in which case it will be lost.
Once the message has made it to the server, the ack extension will ensure that all recipients connected at that time will receive the message (as long as they aren't unreachable for maxInterval milliseconds).
It is relatively straightforward to implement client-side retrying if you listen to notifications on "/meta/unsuccessful" and resend the message (the original message that failed is passed as message.request to the handler).