We have a connection to a PostgreSQL database that is configured with the Tomcat connection pool. The problem is that once a connection becomes active, it never goes back to idle.
When I start my microservice it has 0 active connections and 10 idle ones. After one hour of work there are 7 active and 3 idle. After a weekend there were 100 active; the pool had reached its limit and the service was down.
Is there any way to configure the Tomcat connection pool to check the state of active connections and close them if they are stuck?
It looks like your application is leaking connections. Hibernate with c3p0 provides facilities for detecting leaks; there are two parameters to configure:
5
true
After this, it will print a stack trace for connections that stay active too long and close them.
This is not recommended under high load. If you are using another pool, look for a similar setting.
We have HTTP timeouts inside our cluster, and it seems that these were causing the connection leak. I investigated, and the affected connections always remained active.
The solution for me was to enable verification of abandoned connections.
private DataSource configureDataSource(String url, String user, String password, String driverClassName) {
    DataSource ds = DataSourceBuilder.create()
            .url(url)
            .username(user)
            .password(password)
            .driverClassName(driverClassName)
            .build();
    // The cast only works when the Tomcat JDBC pool is the pool implementation in use.
    org.apache.tomcat.jdbc.pool.DataSource configuredDataSource = (org.apache.tomcat.jdbc.pool.DataSource) ds;
    // some other configurations here
    // ...
    // Treat a connection as abandoned once it has been active for 300 seconds.
    configuredDataSource.getPoolProperties()
            .setRemoveAbandonedTimeout(300);
    configuredDataSource.getPoolProperties()
            .setRemoveAbandoned(true);
    return configuredDataSource;
}

@Bean(name = "qaDataSource")
public JdbcTemplate getQaJdbcTemplate() {
    DataSource ds = configureDataSource(qaURL, qaUsername, qaPassword, qaDriverClassName);
    return new JdbcTemplate(ds);
}
The removeAbandoned and removeAbandonedTimeout settings mean that a connection that stays in the active state longer than the timeout (in seconds) is closed. If you add this to your code, make sure the timeout is greater than the maximum query execution time for your service.
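To make the timeout trade-off concrete, here is a toy model of the bookkeeping behind an abandoned-connection check. It is only a sketch in plain Java and not Tomcat's implementation: the class name, the connection ids, and the explicit `nowMillis` arguments are all made up for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of "remove abandoned" behaviour; NOT Tomcat's actual implementation.
final class AbandonedSweepSketch {
    // Hypothetical bookkeeping: connection id -> time it was borrowed (ms).
    private final Map<String, Long> borrowedAt = new LinkedHashMap<>();
    private final long timeoutMillis;

    AbandonedSweepSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a connection became active at the given time.
    void borrow(String connectionId, long nowMillis) {
        borrowedAt.put(connectionId, nowMillis);
    }

    // A well-behaved caller returns the connection, so it is no longer tracked.
    void giveBack(String connectionId) {
        borrowedAt.remove(connectionId);
    }

    // Reclaim every connection that has been active longer than the timeout
    // and report how many were reclaimed.
    int sweep(long nowMillis) {
        int reclaimed = 0;
        Iterator<Map.Entry<String, Long>> it = borrowedAt.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove(); // a real pool would also close the physical connection here
                reclaimed++;
            }
        }
        return reclaimed;
    }

    int activeCount() {
        return borrowedAt.size();
    }
}
```

With a 300-second timeout, a connection borrowed at t=0 is reclaimed by a sweep at t=350s, while one borrowed at t=200s is left alone. The same logic would also reclaim a legitimate query that runs longer than the timeout, which is why the timeout has to exceed your longest query.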
I am using the Event Store client for .NET and I am struggling to find the correct way to use it. When I register the client as a singleton in .NET dependency injection and run my application over an extended period of time, memory usage grows continuously with each subscription.
I create and register the client in the following way. A full minimal application that experiences the problem can be found here.
var esdbConnectionString = configuration.GetValue("ESDB_CONNECTION_STRING", "esdb://admin:changeit@localhost:2113?tls=false");
var eventStoreClientSettings = EventStoreClientSettings.Create(esdbConnectionString);
var eventStoreClient = new EventStoreClient(eventStoreClientSettings);
services.AddSingleton(eventStoreClient);
My application has a high number of short streams over an extended period of time
To Reproduce
Steps to reproduce the behavior:
Register EventStoreClient as a singleton, as recommended in the documentation.
Subscribe to a very high number of streams over an extended time.
Cancel the CancellationToken sent into the stream subscription and let it be garbage collected.
Watch memory usage of service grow.
How I am creating and subscribing to streams:
var streamName = CreateStreamName();
var payload = new PingEvent { StreamNr = _currentStreamNumber };
var eventData = new EventData(Uuid.NewUuid(), typeof(PingEvent).Name, EventSerialization.SerializeEventData(payload));
await _client.AppendToStreamAsync(streamName, StreamState.Any, new[] { eventData });
var streamCancellationTokenSource = new CancellationTokenSource(TimeSpan.FromMinutes(30));
await _client.SubscribeToStreamAsync(streamName, FromStream.Start, async (sub, evnt, token) =>
{
if (evnt.Event.EventType == "PongEvent")
{
_previousStreamIsDone = true;
streamCancellationTokenSource.Cancel();
}
},
cancellationToken: streamCancellationTokenSource.Token);
Approaches attempted
Registering as Transient or Scoped
If I register the client as Transient or Scoped in .NET DI, it throws thousands of exceptions internally and causes multiple problems.
Manually handling lifetime of client
Using a singleton service that handles the lifetime of the client, I have tried disposing of the client every once in a while and creating a new one, ensuring that only one instance of the client exists at any time. This results in the same problem as registering the client as Transient or Scoped.
I am using version 22.0.0 of the Event Store client in .NET 6 against Event Store Database 21.10.0. The problem happens both when running on Windows and on the standard aspnet:6.0 Linux Docker container.
From inspecting the results of these dotnet-dumps, the memory growth seems to be happening inside this HashSet of ActiveCalls in the gRPC client.
I am hoping to find a way of using the client that does not lead to memory growth.
In your reproduction the leaked calls are coming from the extra read that you are issuing while processing an event received on the subscription.
There is an open issue (https://github.com/EventStore/EventStore-Client-Dotnet/issues/219) at the moment to deal with this better, but currently if you issue a read but don't consume all the events and don't cancel the read, then the call remains open. In your case this is happening if the slave has managed to reply Pong before the master has issued the read that results from receiving its own Ping in the subscription. That read will then contain the Ping and the Pong, only the Ping is read, and the call remains open.
For now, if you cancel those reads by passing the cancellation token that you are cancelling into the ReadStreamAsync call in ReadFromStartOfStreamToEnd, it should resolve your problem.
In case it's helpful for you, you can see the number of Current Calls live rather than waiting a long time to see the effect on memory:
dotnet-counters monitor --counters "Grpc.Net.Client" -p <processid>
I have developed a Quarkus app with which I want to receive and process MQTT messages.
This also works so far.
My problem is that when the internet goes down at the MQTT broker and the app reconnects afterwards, the app reconnects to the broker but no messages are received. I think that the "subscribe" method is not called anymore.
How can I solve this problem?
Here is my Config:
mp.messaging.incoming.smarthome/electricity.connector=smallrye-mqtt
mp.messaging.incoming.smarthome/electricity.host=192.168.1.88
mp.messaging.incoming.smarthome/electricity.port=1883
mp.messaging.incoming.smarthome/electricity.reconnect-attempts=3000
mp.messaging.incoming.smarthome/electricity.reconnect-interval-seconds=10
mp.messaging.incoming.smarthome/electricity.qos=1
mp.messaging.incoming.smarthome/electricity.failure-strategy=ignore
Here is my Controller:
@Incoming("smarthome/electricity")
public void consume(byte[] raw) {
    String price = new String(raw, StandardCharsets.UTF_8);
    String[] parts = price.split(",");
    String watt = parts[0].trim();
    String timeStamp = parts[1].trim();
    byte wattH = Byte.parseByte(watt.replace("WH", ""));
    ZonedDateTime now = ZonedDateTime.now(ZoneId.of("Europe/Vienna"))
            .withHour(Integer.parseInt(timeStamp.split(":")[0]))
            .withMinute(Integer.parseInt(timeStamp.split(":")[1]));
    Message message = new Message(wattH, now);
    System.out.println(message);
    service.addToPackage(message);
    scheudler.check();
}
Log output when I cut the connection:
2022-09-20 07:50:09,683 ERROR [io.sma.rea.mes.mqtt] (vert.x-eventloop-thread-0) SRMSG17105: Unable to establish a connection with the MQTT broker: java.net.SocketException: Connection reset
When the connection is back:
2022-09-20 07:50:26,751 INFO [io.ver.mqt.imp.MqttClientImpl] (vert.x-eventloop-thread Connection with 192.168.1.88:1883 established successfully
So the connection seems to be back, but there are no more incoming messages.
I solved the problem myself.
I set:
quarkus.arc.remove-unused-beans=none
And now it works fine.
I tried many ways to fix the problem, but this seems to be the issue.
I think some bean is removed at runtime when the connection is lost for too long.
If anyone can explain why this happens, please tell me.
We're using Jersey (ver 2.22.2) to execute REST requests, and ApacheConnectorProvider together with PoolingHttpClientConnectionManager to manage our connection pool.
Is there a way to manually release connections from the leased connections list?
PoolingHttpClientConnectionManager provides methods to close expired and idle connections, but these close and remove connections from the available connections list, which is not what I'm looking for.
The reason I want to do this is to avoid leaking connections.
A developer using the above service should always close the connection by calling response.readEntity() or response.close(), and if they forget to do so, I don't think that closing the connections manually is a good solution.
But if a connection wasn't closed because of some unexpected issue and remains in the leased list, I want to be able to close it myself.
Just as Apache advises writing a daemon thread to clear expired connections ("Connection eviction policy"), I want to be able to clear connections from the leased list as well.
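Call response.close() inside the finally block of your code, or use Java try-with-resources (Java 7 or later).
CloseableHttpResponse response = null;
try {
    response = .....;
} catch (Exception e) {
    // .....
} finally {
    // response must be declared outside the try block to be visible here
    if (response != null) {
        response.close();
    }
}
or
try (CloseableHttpResponse response = .....) {
    // use the response; it is closed automatically
} catch (Exception e) {
    // .....
}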
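To show why try-with-resources prevents this kind of leak, here is a small self-contained sketch. `LeasedResponseSketch` is a made-up stand-in for a pooled response, not a Jersey or HttpClient class; the static counter plays the role of the leased list.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a leased HTTP response; not a Jersey or HttpClient class.
final class LeasedResponseSketch implements AutoCloseable {
    // Plays the role of the pool's leased-connections count.
    static final AtomicInteger LEASED = new AtomicInteger();

    LeasedResponseSketch() {
        LEASED.incrementAndGet(); // connection taken from the pool
    }

    // Simulates an unexpected failure while the response is being read.
    String readBody() {
        throw new IllegalStateException("simulated failure while reading");
    }

    @Override
    public void close() {
        LEASED.decrementAndGet(); // connection handed back to the pool
    }

    // Returns the number of connections still leased after a failed read.
    static int leasedAfterFailedRead() {
        try (LeasedResponseSketch response = new LeasedResponseSketch()) {
            response.readBody();
        } catch (IllegalStateException e) {
            // close() has already run by the time this handler executes
        }
        return LEASED.get();
    }
}
```

Even though the read fails, the resource is closed before the catch block runs, so nothing stays leased.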
We are using Lettuce in our project. We have a requirement to monitor the status of the connection.
I know Lettuce can reconnect to Redis when the connection is down. But is there some way to notify the application that the connection is down/up?
Thanks,
Steven
Lettuce provides an event-model for connection events. You can subscribe to the EventBus and react to events published on the bus. There are multiple events, but for your case, you'd want to listen to connected and disconnected events:
ConnectionActivatedEvent: The logical connection is activated and can be used to dispatch Redis commands (SSL handshake complete, PING before activating response received)
ConnectionDeactivatedEvent: The logical connection is deactivated. The internal processing state is reset and the isOpen() flag is set to false.
Both events are fired after receiving transport-related events such as ConnectedEvent and DisconnectedEvent, respectively.
The following example illustrates how to consume these events:
RedisClient client = RedisClient.create();
EventBus eventBus = client.getResources().eventBus();
Disposable subscription = eventBus.get().subscribe(e -> {
    if (e instanceof ConnectionActivatedEvent) {
        // …
    }
});
…
subscription.dispose();
client.shutdown();
Please note that events are dispatched asynchronously. Anything that happens in the event listener should be non-blocking (i.e. if you need to call blocking code such as further Redis interaction, please offload this task to a dedicated Thread).
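One possible way to do that offloading, sketched with plain JDK classes only. The class name, the thread name, and `handleBlocking` are made up for illustration, and the event is just an `Object` here rather than a Lettuce type.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: the listener only hands work to a dedicated thread and returns at once.
final class EventOffloadSketch {
    // Single dedicated (daemon) worker thread for blocking follow-up work.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "redis-event-worker");
        t.setDaemon(true);
        return t;
    });

    // Call this from the event listener; it does not block the calling thread.
    CompletableFuture<String> onEvent(Object event) {
        return CompletableFuture.supplyAsync(() -> handleBlocking(event), worker);
    }

    // Placeholder for blocking work, for example further Redis interaction.
    private String handleBlocking(Object event) {
        return "handled " + event + " on " + Thread.currentThread().getName();
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

Inside the subscriber you would then call something like `offload.onEvent(e)` instead of doing the blocking work inline.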
Read more
Lettuce Reference Documentation: Events
We have a web application running on Azure that performs miscellaneous database maintenance tasks like creating databases, deleting unused databases, and so on. Everything is running on Azure SQL.
This application runs around the clock, and the maintenance tasks are performed every hour. Most of the time, everything goes well. However, a task sometimes ends with errors like these:
HTTP error GatewayTimeout : The gateway did not receive a response from ‘Microsoft.Sql’ within the specified time period
HTTP error ServiceUnavailable : The request timed out
SQLException : Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
SQLException : A connection was successfully established with the server, but then an error occurred during the pre-login handshake
It seems like the database is not reachable when this happens.
We'd be glad if someone could help us to debug the issue.
Thank you in advance.
There are transient errors and other types of errors that are particular to Azure SQL Database. Transient fault errors typically manifest as one of the following error messages from your client programs:
•Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of
•Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of . (Microsoft SQL Server, Error: 40613)
•An existing connection was forcibly closed by the remote host.
•System.Data.Entity.Core.EntityCommandExecutionException: An error occurred while executing the command definition. See the inner exception for details. ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: Session Provider, error: 19 - Physical connection is not usable)
•A connection attempt to a secondary database failed because the database is in the process of reconfiguration and it is busy applying new pages while in the middle of an active transaction on the primary database.
Because of those errors, and more explained here, it is necessary to add retry logic to applications that connect to Azure SQL Database.
public void HandleTransients()
{
    var connStr = "some database";
    var _policy = RetryPolicy.Create<SqlAzureTransientErrorDetectionStrategy>(
        retryCount: 3,
        retryInterval: TimeSpan.FromSeconds(5));
    using (var conn = new ReliableSqlConnection(connStr, _policy))
    {
        // Do SQL stuff here.
    }
}
More about how to create a retry logic here.
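If it helps to see the idea without any library, here is a minimal fixed-interval retry sketch in plain Java. It is not the API used above: `TransientRetrySketch` and its parameters are hypothetical, and deciding which exceptions count as transient is left to the predicate you pass in.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Library-free sketch of a fixed-interval retry loop for transient failures.
final class TransientRetrySketch {
    // Runs the operation and retries up to retryCount times, waiting
    // retryIntervalMillis between attempts, while isTransient accepts the failure.
    static <T> T execute(Supplier<T> operation, int retryCount, long retryIntervalMillis,
                         Predicate<RuntimeException> isTransient) {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // Give up on non-transient errors or once the retries are used up.
                if (!isTransient.test(e) || attempt >= retryCount) {
                    throw e;
                }
                pause(retryIntervalMillis);
            }
        }
    }

    // Sleeps between attempts; restores the interrupt flag if interrupted.
    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

With retryCount 3 the operation runs at most four times in total. A non-transient failure is rethrown immediately, so real errors are not hidden behind retries.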
Throttling is also a cause of timeouts. The following queries may help you understand the impact of workloads on the Azure SQL database.
SELECT
(COUNT(end_time) - SUM(CASE WHEN avg_cpu_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'CPU Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_log_write_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Log Write Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_data_io_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Physical Data Read Fit Percent'
FROM sys.dm_db_resource_stats
--service level objective (SLO) of 99.9% <= go to next tier