HiveMQ MQTT Client - Resubscribe topics on automatic reconnect

I am using the HiveMQ MQTT client in Spring to receive MQTT messages.
My client configuration looks like this:
public Mqtt3AsyncClient mqtt3Client() {
    var mqtt3Client = Mqtt3Client.builder()
            .serverHost("my.host")
            .sslWithDefaultConfig()
            .serverPort(0000)
            .automaticReconnectWithDefaultConfig()
            .buildBlocking();
    mqtt3Client.connect();
    return mqtt3Client.toAsync();
}
After the client is available, another Spring bean is initialized using the client. It subscribes to a topic:
@PostConstruct
public void subscribeTopic() {
    mqtt3AsyncClient.subscribeWith()
            .topicFilter("topicfilter")
            .qos(MqttQos.AT_LEAST_ONCE)
            .callback(message -> {
                /* Handle message */
            })
            .send()
            .whenComplete((mqtt3SubAck, throwable) -> {
                if (throwable != null) {
                    /* Logging */
                } else {
                    /* Logging */
                }
            });
}
Several times I observed that no more messages were delivered to my application, even though I was still able to use the client connection to send messages (so it was connected at the time).
I could not find any documentation on how the HiveMQ MQTT client handles the configured automaticReconnectWithDefaultConfig(). Can anyone point out whether my subscription created in subscribeTopic() is resubscribed?
I also found the method addSubscription(), which may replace the .topicFilter(...).qos(...) part, but I could not find any information on whether it makes the subscription more resilient to connection losses.
I'd appreciate any kind of information on this topic.
Thanks.

Currently the HiveMQ MQTT Client will only continue to receive messages for subscriptions if the broker reports an existing session in the CONNACK of the reconnect. This requires two things: 1) you need to set cleanSession = false when initially connecting, and 2) the broker must not have lost the session between connections.
For 1) you can try adding this to your connect:
client.connectWith().cleanSession(false).send();
With 2) it will depend on the broker and on the cause of the connection loss. If it was 'just' a network outage and the broker kept running normally in the background, it should work fine. If the broker crashed and was restarted, it requires that the broker has persistence configured and was able to re-establish the session after the restart.
There are actually a couple of discussions over on the GitHub project page of the HiveMQ MQTT Client about this issue and whether functionality should be added to automatically re-subscribe even when no pre-existing session was found, and, on a related note, whether messages published while the connection was down should be sent automatically after the reconnect even if no session was found. If these are features you require, maybe hop on over there and chime in on the discussions :)
Lastly, you can also perform re-subscribes manually by adding a MqttClientConnectedListener while building the client, which can then re-create the subscriptions each time the automatic reconnect happens.
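A minimal sketch of that approach, assuming the Mqtt3 API from the question (the AtomicReference holder is just one way to let the listener reach the client that is still being built; the host and topic filter are placeholders):

import com.hivemq.client.mqtt.datatypes.MqttQos;
import com.hivemq.client.mqtt.mqtt3.Mqtt3AsyncClient;
import com.hivemq.client.mqtt.mqtt3.Mqtt3Client;
import java.util.concurrent.atomic.AtomicReference;

public class ResubscribeOnReconnect {

    public static void main(String[] args) {
        // Holder so the connected listener can reach the client built below.
        AtomicReference<Mqtt3AsyncClient> clientRef = new AtomicReference<>();

        Mqtt3AsyncClient client = Mqtt3Client.builder()
                .serverHost("my.host") // placeholder from the question
                .automaticReconnectWithDefaultConfig()
                .addConnectedListener(context -> {
                    // Fires after every successful connect, including automatic reconnects.
                    Mqtt3AsyncClient c = clientRef.get();
                    if (c != null) {
                        c.subscribeWith()
                                .topicFilter("topicfilter") // placeholder from the question
                                .qos(MqttQos.AT_LEAST_ONCE)
                                .callback(message -> { /* handle message */ })
                                .send();
                    }
                })
                .buildAsync();

        clientRef.set(client);
        client.connectWith().cleanSession(false).send();
    }
}

Since the listener only fires after a connect succeeds, the holder is already set by then, and re-subscribing on the initial connect as well does no harm.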
HTH
Cheers,
C

Related

Intermittent Socket Exceptions calling API / Services from ASP.NET Core 5.0 MVC site hosted in Azure

Error message in Application Insights:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (OurApiUrlAddress:443) A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
It's always a 21-second TCP timeout. This is a very generic error, I know, but the reason for it is not always the same; I've been reading all the threads about this. We've been investigating this problem for months with no luck, and we're also in contact with the Azure team.
Important: the same site, written in Ruby, used this same API without any problem in the past. The API is responsive and is called from other sites without any problem, but this specific site was migrated from Ruby to .NET and hosted in Azure at the same time; those are the two big changes. The error only happens when the site (remember, it's hosted in Azure) calls APIs/services hosted in our company, not when it calls a service hosted somewhere else. That makes us think the problem may be related to the company infrastructure, but it can't be that alone: these APIs and services respond perfectly to calls from other sites hosted in our network, and they worked fine with the Ruby version of this site. They also don't throw this error when called in-browser from outside the company network.
The services/APIs are behind a firewall, but the ports are correctly configured (there are no other traffic apps or devices at play).
This error doesn't seem to be related to port exhaustion or SNAT, since sometimes a single developer working alone in the DEV environment still gets this socket exception.
Just to give an idea, we're getting around 250 socket exceptions a day in production, and this is just a small percentage of all the calls, so something, just sometimes, is making this happen.
We know about the well-known HttpClient issue when multiple instances are created, so we decided to use the singleton approach, ensuring only one instance per API/service. As I'll show here, this is the call that produces the most socket exceptions:
In StartUp class/file:
services.AddSingleton<IUploadApi>(new UploadApi(new HttpClient() { BaseAddress = new Uri(appSettings.Endpoints.UploadServicesUrl) }));
Part of appsettings.json:
"Endpoints": {
"UploadServicesUrl": "https://ourApiUrlAddress"
},
UploadApi.cs
public interface IUploadApi
{
    Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration();
}

public class UploadApi : IUploadApi
{
    private readonly HttpClient httpClient;

    public UploadApi(HttpClient client)
    {
        httpClient = client;
    }

    public async Task<UploadArtworkViewModel.UploadConfigurationData> GetUploadConfiguration()
    {
        var response = await httpClient.GetAsync("api/GetUploadConfiguration").ConfigureAwait(false);
        var json = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        return JsonConvert.DeserializeObject<UploadArtworkViewModel.UploadConfigurationData>(json);
    }
}
Call from controller:
model.UploadConfiguration = await UploadApi.GetUploadConfiguration().ConfigureAwait(false);
Any ideas on things to test or places to look are welcome; obviously I've not been able to reproduce this one. We know there's always a 21-second timeout, a TCP timeout, but that doesn't help much. Maybe for some reason the connection is dropped, or Azure is (sometimes) having problems accessing the company network. I can post more info from Application Insights if needed, but I don't see anything special there about the error.
EDIT - More info: It happens when any API or service is called from this MVC site's controllers. The problem appears sporadically (still around 300 times per day) when the site server tries to reach an API or service, which makes me believe it's something related to the company infrastructure, but I still have no idea what it could be.
From asp.net monsters:
"the application has exited and yet there are still a bunch of these connections open"
"They are in the TIME_WAIT state which means that the connection has been closed on one side (ours) but we're still waiting to see if any additional packets come in on it because they might have been delayed on the network somewhere."
Even if you're using a singleton HttpClient, some of the connections remain open waiting for additional packets, which leads to socket exhaustion.
The solution is to change your code to use HttpClientFactory (or HttpClientFactoryLite). The reason to use HttpClientFactory is that it produces HttpClient instances that reuse socket handlers from a pool. The handlers are recycled periodically, which also takes care of DNS changes. In summary, when using HttpClientFactory, HttpClient delegates its work to pooled message handlers (SocketsHttpHandler on modern .NET).
We finally got this problem fixed after working with the Azure team for some time. It was a gateway problem; the solution was applying NAT gateway/VNet integration. This is what we did to fix it:
https://learn.microsoft.com/en-us/azure/app-service/networking/nat-gateway-integration

How to get broker details on connectionLost()

I am using the Eclipse Paho Java client to connect to an MQTT broker.
I have written a subscriber client implementing MqttCallbackExtended.
I am getting the connectionLost() callback.
But how do I find out which broker lost the connection?
I have specified multiple URIs via the setServerURIs() API of MqttConnectOptions.
If you have specified multiple brokers, they should all be part of the same cluster offering the same topic space.
This means you shouldn't need to care on the client side which broker you were connected to, as the client will just move to the next one in the list when it tries to reconnect.
But if you really need to know, you can always log the URI when the connection is created, using the information passed to the connectComplete() callback of the MqttCallbackExtended interface.
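A minimal sketch, assuming the Paho v3 Java client (connectComplete() receives the URI of the broker the client actually connected to, so you can remember it for use in connectionLost()):

import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallbackExtended;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class BrokerAwareCallback implements MqttCallbackExtended {

    private volatile String currentServerUri;

    @Override
    public void connectComplete(boolean reconnect, String serverURI) {
        // Remember which broker from the setServerURIs() list we ended up on.
        currentServerUri = serverURI;
        System.out.println("Connected to " + serverURI + " (reconnect=" + reconnect + ")");
    }

    @Override
    public void connectionLost(Throwable cause) {
        // currentServerUri identifies the broker whose connection was just lost.
        System.out.println("Lost connection to " + currentServerUri + ": " + cause);
    }

    @Override
    public void messageArrived(String topic, MqttMessage message) {
        /* handle message */
    }

    @Override
    public void deliveryComplete(IMqttDeliveryToken token) {
    }
}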

How to sanitize MQTT message payload on the server side?

I made an instant messaging app using the MQTT protocol.
I want to add some extra data about messages to the payload, like sent time (server time, not client time), and also provide some kind of server-side payload sanitizing.
Is it a good idea to add a third-party client with superuser privileges between message sender and message receiver on the broker's local machine to do this job?
Or is there any better idea?
By the way, I'm using EMQTT as the message broker.
From a pure security view, having direct peer-to-peer traffic (without filtering and sanitizing) sounds like a dangerous idea. (At least in the Internet-of-Things domain I would clearly object to it.)
Why? Because the clients are outside of your control (i.e. a hacker can reverse-engineer them) and can inject arbitrary traffic to exploit security holes on the receiving side of other clients.
So sanitizing on the server side sounds like a very good idea.
I would suggest two topics: one (inbound) topic the clients use to publish messages, and another (outbound) topic the clients use to subscribe to messages. A server-side component would then read the messages from the inbound topic, sanitize them, and publish them to the outbound topic; a sketch follows below.
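A minimal sketch of such a server-side component, assuming the Eclipse Paho Java client; the topic names messages/inbound and messages/outbound, the broker URL, and the sanitize logic are hypothetical placeholders:

import java.time.Instant;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

public class SanitizingBridge {

    public static void main(String[] args) throws MqttException {
        // The bridge runs next to the broker, e.g. on its local machine.
        MqttClient client = new MqttClient("tcp://localhost:1883", "sanitizing-bridge");
        client.connect();

        // Read everything clients publish on the inbound topic...
        client.subscribe("messages/inbound", 1, (topic, message) -> {
            String raw = new String(message.getPayload());
            // ...sanitize it and stamp it with server time...
            String enriched = "{\"sentAt\":\"" + Instant.now() + "\",\"body\":\"" + sanitize(raw) + "\"}";
            // ...and republish it on the outbound topic that clients subscribe to.
            client.publish("messages/outbound", enriched.getBytes(), 1, false);
        });
    }

    private static String sanitize(String raw) {
        // Placeholder: strip anything that could be interpreted as markup.
        return raw.replaceAll("[<>\"]", "");
    }
}

Note that publishing from inside a message callback on the same synchronous client can block under load, so a production version might use the Paho async client or a second client instance for publishing; broker-side ACLs should also prevent clients from publishing directly to the outbound topic.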
This decoupling also makes it easier to introduce MQTT payload changes: if you update the payload in a non-compatible way, introduce a new inbound topic and keep the old inbound topic too. This allows you to support old and new clients during the transition phase.

Signalr calling specific client from outside the hub

I'm aware of Chris Fulstow's project log4net.signalr; it is a great idea if you want a non-production log, since it logs all messages from all requests. I would like to have something that discriminates log messages by the request that originated them and sends them back to the proper browser.
Here is what I've done in the appender:
public class SignalRHubAppender : AppenderSkeleton
{
    protected override void Append(log4net.Core.LoggingEvent loggingEvent)
    {
        if (HttpContext.Current != null)
        {
            var cookie = HttpContext.Current.Request.Cookies["log-id"];
            if (null != cookie)
            {
                var formattedEvent = RenderLoggingEvent(loggingEvent);
                var context = GlobalHost.ConnectionManager.GetHubContext<Log4NetHub>();
                context.Clients[cookie.Value].onLog(new { Message = formattedEvent, Event = loggingEvent });
            }
        }
    }
}
I'm trying to attach the connection id to a cookie, but this does not work for multiple pages on the same machine because the cookie is overwritten.
Here is the code I use on the client to attach the event:
//start hubs
$.connection.hub.start()
    .done(function () {
        console.log("hub subsystem running...");
        console.log("hub connection id=" + $.connection.hub.id);
        $.cookie("log-id", $.connection.hub.id);
        log4netHub.listen();
    });
As a result, only the last page that connected shows the log messages. I would like to know whether there are any strategies to obtain the connection id of the browser that originated the current request.
I'm also interested to know whether there is a better design to achieve per-browser logging.
EDIT
I could make a convention-based cookie name (like log-id-someguid), but I wonder if there is something smarter.
BOUNTY
I decided to start a bounty on this question, and I would additionally like to ask about the architecture, in order to see whether my strategy makes sense or not.
My doubt is this: I'm using the hub in a single "direction", from server to client, and I use it to log activities not originating from calls to the hub but from other requests (potentially requests raised on other hubs). Is that a correct approach, given the goal of a browser-visible log4net appender?
The idea for correctly targeting the right browser instance/tab, even when multiple tabs are open on the same SPA, is to differentiate them through the URL. One possible way to implement that is to redirect them on first access from http://foo.com to http://foo.com/hhd83hd8hd8dh3, randomly generated each time. The URL rewriting could be done in other ways too; this is just one way to illustrate the idea. The appender can then inspect the originating URL and, through some mapping you keep server-side, identify the right SignalR connection id. The implementation details may vary, but the basic idea is this one. By tracking some more info available in the HttpContext since the first connection, you could also put additional strategies in place to prevent any hijacking.
About your architecture, I can tell you that this is exactly the way I used it in ElmahR. I have messages originating from outside the notification hub (errors posted from other web apps), and I do a broadcast to all clients connected to that hub (and subscribing to certain groups): it works fine.
I'm not an authoritative source, but I would also guess that such an architecture is fine, even with multiple hubs, because hubs at the end of the day are just an abstraction over a single persistent connection, which allows you to group messaging by context. Behind the scenes (I'm simplifying) you have just one persistent connection with messages going back and forth, so whatever hub structure you define on top of it (which is there just to help you organize things) still goes through that one connection, and you cannot do any harm.
SignalR is good at doing two things: massive broadcast (Clients) and one-to-one communication (Caller). As long as you do not try to do weird things like keeping server-side references to specific callers, you should be fine, whatever number of hubs, and interactions among them, you have.
These are my conclusions, coming from the field. Maybe you can tweet @dfowler about this question and see if he has (much) more authoritative guidelines.

Implementing Acknowledge-Extension for CometD in Jetty/ASP.NET

We're using CometD 2 to achieve the connection between a central data provider and several backends consuming the data. Up to now, when one of the backends fails briefly, all messages posted in the meantime are lost. Now we heard about the "Acknowledge Extension" for CometD. It is supposed to create a server-side list of messages and deliver them when one of the clients reports to be back online. Here are some questions:
1) Does this also work with several clients?
2) The documentation (http://cometd.org/documentation/2.x/cometd-ext/ack) says: "Note that if the disconnected browser is disconnected for in excess of maxInterval (default 10s), then the client will be timed out and the unacknowledged queue discarded." -> Does this mean that if my client doesn't recover within the maxInterval, the messages are lost anyway?
Hence,
2.1) What's the maximal maxInterval? What consequences does setting it to a high value have?
2.2) We'd need a reliable mechanism for outages of at least a few minutes. Is this possible? Are there any alternatives?
3) Is it really only necessary to add the two extensions to both the client and the CometD server? We're using Jetty for the server and .NET Oyatel for the client. Does anyone have experience with this?
I'm sorry for this bunch of questions, but unfortunately, the CometD project isn't really well documented. I really appreciate any answers.
Cheers,
Chris
1) Does this also work with several clients?
Yes, it does. There is one message queue allocated for each client (see AcknowledgedMessagesClientExtension).
2) Does this mean that if my client doesn't recover within the maxInterval, the messages are lost anyway?
Yes, it does. When the client can't reach the server for maxInterval milliseconds, the server will throw away all state associated with that client.
2.1) What's the maximal maxInterval? What consequences does setting it to a high value have?
maxInterval is a servlet parameter of the CometD servlet. It is internally treated as a long value, so its maximal value is Long.MAX_VALUE.
Example configuration:
<init-param>
<!-- The max period of time, in milliseconds, that the server will wait for
a new long poll from a client before that client is considered invalid
and is removed -->
<param-name>maxInterval</param-name>
<param-value>10000</param-value>
</init-param>
Setting it to a high value means that the server will wait longer before throwing away the state associated with a client (from the time the client stops contacting the server).
I see two problems with this. First, the memory requirements of the server will potentially be higher (which may also make denial-of-service attacks easier). Second, the RemoveListener isn't called on the server before the maxInterval expires, which may require you to implement additional logic that differentiates between "momentarily unreachable" and "disconnected" (see the sketch below).
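For reference, a minimal sketch of telling the two cases apart once the removal does happen, assuming the CometD 2.x server API (the timedout flag of BayeuxServer.SessionListener distinguishes a maxInterval expiry from an explicit disconnect):

import org.cometd.bayeux.server.BayeuxServer;
import org.cometd.bayeux.server.ServerSession;

public class SessionRemovalLogger implements BayeuxServer.SessionListener {

    @Override
    public void sessionAdded(ServerSession session) {
        // Nothing to do on add; we only care about removals.
    }

    @Override
    public void sessionRemoved(ServerSession session, boolean timedout) {
        if (timedout) {
            // maxInterval expired: the client was unreachable for too long and
            // its unacknowledged message queue has been discarded.
        } else {
            // The client disconnected explicitly.
        }
    }
}

It would be registered once at startup with bayeux.addListener(new SessionRemovalLogger()).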
2.2) We'd need a secure mechanism for fail outs of at least a few minutes. Is this possible? Are there any alternatives?
Yes, it is possible to configure the maxInterval to last for a few minutes.
An alternative would be to restore any server-side state on every handshake. This can be achieved by adding a listener to "/meta/handshake" and publishing a message to a "/service/" channel (to make sure only the server receives the message), or by adding an additional property to the "ext" property of the handshake message. Be careful to let the client restore only valid state (sign it on the server if you must).
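A minimal sketch of the handshake-listener variant with the Java client, assuming the CometD 2.x client API (the endpoint URL, the /service/restore channel, and the state map are hypothetical):

import java.util.HashMap;
import java.util.Map;
import org.cometd.bayeux.client.ClientSessionChannel;
import org.cometd.client.BayeuxClient;
import org.cometd.client.transport.LongPollingTransport;

public class RestoreOnHandshake {

    public static void main(String[] args) {
        BayeuxClient client = new BayeuxClient("http://localhost:8080/cometd",
                LongPollingTransport.create(null));

        client.getChannel("/meta/handshake").addListener(
                (ClientSessionChannel.MessageListener) (channel, message) -> {
                    // Runs on every handshake, including re-handshakes after the
                    // server has discarded the session.
                    if (message.isSuccessful()) {
                        Map<String, Object> state = new HashMap<>();
                        state.put("lastSeenId", 42); // hypothetical state to restore
                        // Publishing on a /service/ channel ensures only the server sees it.
                        client.getChannel("/service/restore").publish(state);
                    }
                });

        client.handshake();
    }
}

The server side would then have a service listening on /service/restore that rebuilds the per-client state from the (validated) payload.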
3) Is it really only necessary to add the two extensions to both the client and the CometD server?
On the server it is sufficient to do something like:
bayeux.addExtension(new AcknowledgedMessagesExtension());
I don't know how you'd do it in Oyatel. In JavaScript it suffices to simply include the extension (dojo.require, or a script include for jQuery).
When a client with the AckExtension connects to the server, a message similar to the following will be logged (from my Jetty console log):
[qtp959713667-32] INFO org.cometd.server.ext.AcknowledgedMessagesExtension - Enabled message acknowledgement for client 51vkuhps5qgsuaxhehzfg6yw92
Another note, because it may not be obvious: the ack extension only provides server-to-client delivery guarantees, not client-to-server. That is, when you publish a message from the client to the server, it may not reach the server and will be lost.
Once the message has made it to the server, the ack extension will ensure that all recipients connected at that time will receive the message (as long as they aren't unreachable for maxInterval milliseconds).
It is relatively straightforward to implement client-side retrying if you listen to notifications on "/meta/unsuccessful" and resend the message (the original message that failed is passed as message.request to the handler).
