I wrote two small programs which tried to acquire the same Remote Mutex named "The Token":
ACE_Remote_Mutex token("The Token", 1, 1);
token.acquire();
ACE_OS::sleep(5);
token.release();
return 0;
Both of them got the following debug output:
(3078597488) acquired The Token
(4243|3078597488) BIG PROBLEMS with get_connection: Connection refused
error on remote acquire, releasing shadow mutex.
(3078597488) released The Token, owner is no owner
(4243|3078597488) BIG PROBLEMS with get_connection: Connection refused
(3078597488) release failed: Permission denied.
(3078597488) shadow: release failed
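Does ACE_Remote_Mutex only work with some sort of "agent", like a CORBA broker? Can I modify my code to make it work?
ACE_Remote_Mutex uses the Token Service to acquire the lock, so a Token Service server has to be running and reachable. The "Connection refused" errors in your output mean that nothing was listening. The Token Service is not a CORBA service, but it plays a similar role. Here is an example of a svc.conf entry that starts the Token Service dynamically: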
dynamic Token_Service Service_Object *
../lib/netsvcs:_make_ACE_Token_Acceptor()
"-p 10202"
Question
Is it possible to further specify which kinds of failures trip a Dapr circuit breaker that targets a component (AWS SQS)?
Use case
I am sending emails via AWS SES. If you reach your sending limits, the AWS SDK throws an error (code 454). In this case, I want a circuit breaker to stop processing the queue and retry sending the emails later.
However, when a different error occurs, e.g. an invalid email address, I don't want it to trip the circuit breaker, because it is not transient. I would still like to send the message to the DLQ so that I can examine these messages manually later (that's why I still throw here instead of failing silently).
Simplified setup
I have a circuit breaker defined that trips when my snssqs-pubsub component has more than 3 consecutive failures:
circuitBreakers:
  pubsubCB:
    # Available variables: requests, totalSuccesses, totalFailures, consecutiveSuccesses, consecutiveFailures
    trip: consecutiveFailures > 3
    # ...
targets:
  components:
    snssqs-pubsub-emails:
      inbound:
        circuitBreaker: pubsubCB
In my application, I want to retry sending emails that failed because the AWS SES sending limit was hit:
try {
  await this.sendMail(options);
} catch (error) {
  if (error.responseCode === '454') {
    // This error should trip the circuit breaker
    throw Object.assign(new Error('Rate limited. Should be retried.'), { status: 429 });
  } else {
    // This error should not trip the circuit breaker.
    // Because status is 404, Dapr puts the message directly into the DLQ and skips retries
    throw new NotFoundError({ status: 404 });
  }
}
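The behaviour the question asks for, where only transient failures count toward the trip condition, can be sketched outside Dapr as a consecutive-failure circuit breaker that classifies errors before counting them. This is a minimal Python sketch under stated assumptions, not Dapr's implementation: `TransientError`, `PermanentError` and `CircuitBreaker` are hypothetical names, and the threshold mirrors `consecutiveFailures > 3` from the resiliency config.

```python
class TransientError(Exception):
    """Hypothetical: a retryable failure, e.g. the SES 454 rate limit."""


class PermanentError(Exception):
    """Hypothetical: a non-retryable failure, e.g. an invalid address."""


class CircuitBreaker:
    """Trips after more than N consecutive *transient* failures."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0
        self.open = False

    def call(self, fn, *args):
        if self.open:
            # While open, stop processing instead of calling fn.
            raise RuntimeError("circuit open: not processing")
        try:
            result = fn(*args)
        except TransientError:
            # Only transient failures count toward the trip condition.
            self.consecutive_failures += 1
            if self.consecutive_failures > self.max_consecutive_failures:
                self.open = True
            raise
        except PermanentError:
            # Permanent failures propagate (e.g. toward a DLQ) without counting.
            raise
        # A success resets the consecutive-failure counter.
        self.consecutive_failures = 0
        return result
```

The design choice is simply to classify before counting; in Dapr itself the trip expression only sees the aggregate counters, which is why the classification has to happen somewhere else.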
You may not have a problem to worry about if your business case does not violate the AWS Terms of Service: you can open a support ticket and get your SES service limits raised.
It does not appear that Dapr retry policies support the customization you need, but .NET does.
If you don't want to process the message, then don't delete it. You can then set the visibility timeout of the message in SQS so that it stays hidden and is not processed again too quickly. Regardless, any exception thrown will end up in the DLQ.
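A minimal sketch of the "don't delete it, hide it for a while" idea, assuming the consumer has direct access to the SQS message and a boto3 SQS client (`boto3.client("sqs")`). `handle_send_failure` and the `response_code` attribute are hypothetical; `change_message_visibility` is the boto3 call for SQS's ChangeMessageVisibility API. The 454 code is assumed to be numeric here.

```python
def handle_send_failure(sqs, queue_url, receipt_handle, error, delay_seconds=300):
    """Hide a rate-limited message for a while instead of deleting it.

    `sqs` is assumed to be a boto3 SQS client; `error.response_code` is a
    hypothetical attribute carrying the SMTP/SES response code.
    """
    if getattr(error, "response_code", None) == 454:
        # Leave the message on the queue; it becomes visible again
        # after delay_seconds and will be received and retried then.
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=delay_seconds,
        )
        return "delayed"
    # Anything else is re-raised and the message is not deleted; with a
    # redrive policy, SQS moves it to the DLQ after maxReceiveCount receives.
    raise error
```

In a Dapr setup the sidecar, not the application, owns the SQS receive loop, so this only applies if you consume the queue yourself.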
I am using the EventStoreDB client for .NET and I am struggling to find the correct way to use it. When I register the client as a singleton in .NET dependency injection and run my application over an extended period of time, memory usage grows continuously with each subscription.
I create and register the client in the following way. A full minimal application that experiences the problem can be found here.
var esdbConnectionString = configuration.GetValue("ESDB_CONNECTION_STRING", "esdb://admin:changeit@localhost:2113?tls=false");
var eventStoreClientSettings = EventStoreClientSettings.Create(esdbConnectionString);
var eventStoreClient = new EventStoreClient(eventStoreClientSettings);
services.AddSingleton(eventStoreClient);
My application creates a high number of short streams over an extended period of time.
To Reproduce
Steps to reproduce the behavior:
1. Register EventStoreClient as a singleton, as recommended in the documentation.
2. Subscribe to a very high number of streams over an extended time.
3. Cancel the CancellationToken passed into the stream subscription and let it be garbage collected.
4. Watch the memory usage of the service grow.
How I am creating and subscribing to streams:
var streamName = CreateStreamName();
var payload = new PingEvent { StreamNr = _currentStreamNumber };
var eventData = new EventData(Uuid.NewUuid(), typeof(PingEvent).Name, EventSerialization.SerializeEventData(payload));
await _client.AppendToStreamAsync(streamName, StreamState.Any, new[] { eventData });
var streamCancellationTokenSource = new CancellationTokenSource(TimeSpan.FromMinutes(30));
await _client.SubscribeToStreamAsync(streamName, FromStream.Start, async (sub, evnt, token) =>
    {
        if (evnt.Event.EventType == "PongEvent")
        {
            _previousStreamIsDone = true;
            streamCancellationTokenSource.Cancel();
        }
    },
    cancellationToken: streamCancellationTokenSource.Token);
Approaches attempted
Registering as Transient or Scoped
If I register the client as Transient or Scoped in .NET DI, it throws thousands of exceptions internally and causes multiple problems.
Manually handling lifetime of client
Using a singleton service that manages the client's lifetime, I have tried periodically disposing of the client and creating a new one, ensuring that only one instance of the client exists at any time. This results in the same problem as registering the service as Transient or Scoped.
I am using version 22.0.0 of the EventStoreDB client on .NET 6 against EventStoreDB 21.10.0. The problem happens both on Windows and in the standard aspnet:6.0 Linux Docker container.
Inspecting these dotnet-dump captures, the memory growth seems to be happening inside the HashSet of ActiveCalls in the gRPC client.
I am hoping to find a way of using the client that does not lead to memory growth.
In your reproduction the leaked calls are coming from the extra read that you are issuing while processing an event received on the subscription.
There is an open issue (https://github.com/EventStore/EventStore-Client-Dotnet/issues/219) at the moment to deal with this better, but currently, if you issue a read and neither consume all the events nor cancel the read, the call remains open. In your case this happens if the slave has managed to reply Pong before the master has issued the read that results from receiving its own Ping in the subscription. That read then contains both the Ping and the Pong, only the Ping is consumed, and the call remains open.
For now, if you cancel those reads by passing the cancellation token that you are cancelling into the ReadStreamAsync call in ReadFromStartOfStreamToEnd, it should resolve your problem.
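The rule described above (a read stays registered until it is either fully consumed or cancelled) can be illustrated with a toy Python model. This is not the gRPC or EventStoreDB client code; `ActiveCalls` and `StreamRead` are hypothetical names that only mimic that rule.

```python
class ActiveCalls:
    """Toy stand-in for a client-side registry of open streaming calls."""

    def __init__(self):
        self.calls = set()


class StreamRead:
    """A read that stays registered until fully consumed or cancelled."""

    def __init__(self, registry, events):
        self._registry = registry
        self._events = list(events)
        self._pos = 0
        registry.calls.add(self)  # the call is "open" from the start

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._events):
            # Stream exhausted: the call completes and is unregistered.
            self._registry.calls.discard(self)
            raise StopIteration
        event = self._events[self._pos]
        self._pos += 1
        return event

    def cancel(self):
        # Cancellation also completes the call.
        self._registry.calls.discard(self)


registry = ActiveCalls()
read = StreamRead(registry, ["Ping", "Pong"])
first = next(read)  # only the Ping is consumed; the call stays registered
```

In the real client, the counterpart of `cancel()` is cancelling the token passed into `ReadStreamAsync`, as suggested above.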
In case it's helpful, you can watch the number of current gRPC calls live rather than waiting a long time to see the effect on memory:
dotnet-counters monitor --counters "Grpc.Net.Client" -p <processid>
The documentation for mqtt.client:connect() states that the last arg is a "callback function for when the connection could not be established".
I have a case where mqtt.client:connect() succeeds, so the "not established" callback is not called (correct behavior). But, later, when my mqtt broker goes down, the "not established" callback function gets unexpectedly activated.
I have the following code:
function handle_mqtt_error (client, reason)
  print("mqtt connect failed, reason = "..reason..". Trying again shortly.")
  tmr.create():alarm(10 * 1000, tmr.ALARM_SINGLE, do_mqtt_connect)
end
function do_mqtt_connect ()
  print("connecting---")
  m:connect(MQTT_HOST, MQTT_PORT, 1, function(client)
      print("mqtt connected")
      client:publish("topic/status", "online", 1, 1)
    end,
    handle_mqtt_error
  )
  print("returning---")
end
-- init mqtt client
m = mqtt.Client(MQTT_CLIENT_ID, 120, MQTT_USER, MQTT_PASS)
-- connect to mqtt
print("Starting Test")
do_mqtt_connect()
I see the output from the test begin, as expected, with:
Starting Test
connecting---
returning---
mqtt connected
At this point, I kill my mqtt broker, and I unexpectedly see:
mqtt connect failed, reason = -5. Trying again shortly.
connecting---
returning---
mqtt connect failed, reason = -5. Trying again shortly.
connecting---
returning---
And, happily, but unexpectedly, when I restart my broker, I see:
mqtt connected
So, it appears that handle_mqtt_error() is not only called "when the connection could not be established". It is also called when mqtt.client:connect() successfully establishes a connection and that connection is later broken.
======= New Information =======
I downloaded the "dev" tree and used the Docker image to build the firmware. Within mqtt.c, I enabled NODE_DBG. The interesting lines are:
enter mqtt_socket_reconnected.
mqtt connect failed, reason = -5. Trying again shortly.
enter mqtt_socket_disconnected.
leave mqtt_socket_disconnected.
leave mqtt_socket_reconnected.
The "mqtt connect failed..." message is printed by handle_mqtt_error(), which is my "connect failed" callback.
Here's my theory. When my test starts, do_mqtt_connect() calls mqtt_socket_connect(), which does this:
espconn_regist_reconcb(pesp_conn, mqtt_socket_reconnected);
This sets reconnect_callback (in app/lwip/app/espconn.c). Later, after my broker goes down and comes back up, espconn_tcp_reconnect() is called (in app/lwip/app/espconn_tcp.c). It calls the reconnect_callback, which is mqtt_socket_reconnected(), which calls handle_mqtt_error().
So, I think the end result doesn't match the documentation, but it works out okay for me. If the behavior did match the documentation, I would just add some Lua code to handle the "offline" event and try to re-establish the MQTT connection. I just thought someone might be interested to know that the behavior doesn't match the documentation.
We have a web application running on Azure that performs miscellaneous database maintenance tasks like creating databases, deleting unused databases, and so on. Everything is running on Azure SQL.
This application runs 24/7, and the maintenance tasks are performed every hour. Most of the time, everything goes well. However, the task sometimes fails with errors like these:
HTTP error GatewayTimeout : The gateway did not receive a response from ‘Microsoft.Sql’ within the specified time period
HTTP error ServiceUnavailable : The request timed out
SQLException : Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
SQLException : A connection was successfully established with the server, but then an error occurred during the pre-login handshake
It seems like the database is not reachable when this happens.
We'd be glad if someone could help us debug the issue.
Thank you in advance.
There are transient errors and other types of errors that are specific to Azure SQL Database. Transient fault errors typically manifest as one of the following error messages in your client programs:
•Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of
•Database on server is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of . (Microsoft SQL Server, Error: 40613)
•An existing connection was forcibly closed by the remote host.
•System.Data.Entity.Core.EntityCommandExecutionException: An error occurred while executing the command definition. See the inner exception for details. ---> System.Data.SqlClient.SqlException: A transport-level error has occurred when receiving results from the server. (provider: Session Provider, error: 19 - Physical connection is not usable)
•A connection attempt to a secondary database failed because the database is in the process of reconfiguration and it is busy applying new pages while in the middle of an active transaction on the primary database.
Because of these errors, and others explained here, it is necessary to implement retry logic in applications that connect to Azure SQL Database.
public void HandleTransients()
{
    var connStr = "some database";
    // Retry transient Azure SQL errors up to 3 times, waiting 5 seconds between attempts
    var policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(
        retryCount: 3,
        retryInterval: TimeSpan.FromSeconds(5));
    using (var conn = new ReliableSqlConnection(connStr, policy))
    {
        conn.Open();
        // Do SQL stuff here.
    }
}
More about how to create retry logic here.
Throttling is also a cause of timeouts. The following query may help you understand the impact of your workload on the Azure SQL database.
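The same fixed-interval retry idea, language-agnostically: retry only errors classified as transient, a bounded number of times, sleeping between attempts. This is a minimal Python sketch, not the Transient Fault Handling block. `SqlError` and `with_retry` are hypothetical, and `TRANSIENT_ERRORS` is an assumed, non-exhaustive set (40613 comes from the messages above); consult Microsoft's documented list for real code.

```python
import time

# Assumed, non-exhaustive set of Azure SQL transient error numbers.
TRANSIENT_ERRORS = {40613, 40501, 40197}


class SqlError(Exception):
    """Hypothetical stand-in for a driver exception carrying an error number."""

    def __init__(self, number):
        super().__init__(f"SQL error {number}")
        self.number = number


def with_retry(operation, retry_count=3, retry_interval=5.0, sleep=time.sleep):
    """Run operation(), retrying transient failures up to retry_count times."""
    attempt = 0
    while True:
        try:
            return operation()
        except SqlError as err:
            # Non-transient errors, or running out of retries, propagate.
            if err.number not in TRANSIENT_ERRORS or attempt >= retry_count:
                raise
            attempt += 1
            sleep(retry_interval)  # fixed interval, like the C# policy above
```

Injecting `sleep` keeps the function testable; production code would often use exponential backoff instead of a fixed interval.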
SELECT
(COUNT(end_time) - SUM(CASE WHEN avg_cpu_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'CPU Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_log_write_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Log Write Fit Percent'
,(COUNT(end_time) - SUM(CASE WHEN avg_data_io_percent > 80 THEN 1 ELSE 0 END) * 1.0) / COUNT(end_time) AS 'Physical Data Read Fit Percent'
FROM sys.dm_db_resource_stats
-- Service level objective (SLO) of 99.9%: if any fit value is below 0.999, consider moving to the next service tier
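Each "fit" column in the query is the share of samples whose metric did not exceed 80%, so despite the "Percent" alias it is a fraction between 0 and 1, and the 99.9% SLO corresponds to 0.999. A small Python sketch of the same arithmetic; `fit_fraction` is a hypothetical helper and the readings are made-up values, not real data.

```python
def fit_fraction(samples, threshold=80.0):
    """Share of samples at or below threshold, as the query computes it.

    Mirrors (COUNT(end_time) - SUM(CASE WHEN x > 80 THEN 1 ELSE 0 END)) / COUNT(end_time).
    """
    if not samples:
        raise ValueError("no samples")
    over = sum(1 for value in samples if value > threshold)
    return (len(samples) - over) / len(samples)


# Hypothetical avg_cpu_percent readings, not real data.
cpu = [35.0, 92.5, 60.0, 78.0]
fit = fit_fraction(cpu)  # 3 of 4 samples are <= 80, so 0.75
needs_next_tier = fit < 0.999  # 99.9% SLO expressed as a fraction
```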
I am passing the authentication token parameter like this:
socket = SocketIOClient(socketURL: URL(string:"http://localhost:3001")!, config: [.log(true), .forcePolling(true),.connectParams(["Authorisation": kToken])])
But it always gives an error:
LOG SocketParser: Decoded packet as: SocketPacket {type: 2; data: [login_ack, {
message = "InValid Token";
result = "";
status = error;
}]; id: -1; placeholders: -1; nsp: /}
I think I am missing something needed to establish an authenticated socket connection. Please tell me what I am missing.
I resolved it myself!
It was an issue on the back-end side. The back-end team had made the socket callback depend on the sending device's information, which they read from the socket callbacks. The actual problem was that all other devices and computers provided their device details, but iOS did not provide them because of its security protections, so the back end's validity check was never satisfied for iOS socket callbacks.
The back-end team handled this by adding a check for an "Other" device type on their end.