Reasons for Solace CLIENT_CLIENT_DISCONNECT_MQTT - solace

Recently, one of our MQTT clients has been getting disconnected by Solace quite often on our Development Solace appliance, but the same client has no issue on the Test Solace appliance. We have no clue why this happens.
On checking the Solace event log, I noticed quite a number of records for the CLIENT_CLIENT_DISCONNECT_MQTT event, with different reasons given for it. What could cause each of these reasons?
These are the unique reasons for the CLIENT_CLIENT_DISCONNECT_MQTT event that I filtered from the event log:
(1) Client Disconnect Received
(2) Forced Logout
(3) Peer TCP Closed
(4) Peer TCP Reset
I tried to think of the possible causes. For (1), does that mean the client performs a normal MQTT disconnect call? For (2), could it be triggered by our backend application, which has a function that issues a SEMP command to disconnect the client? As for (3) and (4), I am not sure under what circumstances they happen, as our MQTT client does not do anything specific that could cause a disconnection.
Is there any documentation that explains these reasons and their causes?

I found the answer in the Solace syslog documentation: https://docs.solace.com/System-and-Software-Maintenance/Monitoring-Events-Using-Syslog.htm
In addition, I did a simple experiment and found the following:
Client Disconnect Received: when the client makes an MQTT disconnect call
Forced Logout: (a) when Solace disconnects a client because a duplicate client ID is used; (b) when a SEMP command is used to disconnect the client
Peer
Peer TCP Reset: when the client's connection is interrupted (e.g. the client program is killed by pressing Ctrl+C)

Related

MQTT bridge with Sparkplug B -> NDEATH scenarios not working as expected

I have two machines and am testing MQTT bridge connections with Sparkplug B payloads.
I have the bridges working as expected but when I simulate some failure points as annotated in the image below, things are not working as expected. My expectation is an NDEATH will be visible on the broker on Machine B when any of the three points in the image disconnect.
When I kill the publisher or the local MQTT broker on Machine A, I do indeed see the NDEATH as expected when subscribed to the Machine B MQTT broker. But when I pull the plug between Machine A and B, as noted by #3 in the image, I do not see an NDEATH! I have waited long enough for the 60-second keep alive to react, allowing for the 1.5x keep-alive timeout that I understand is typical.
The publisher and Broker on Machine A continue to operate and when the connection at point #3 is brought back online, all messages are delivered, but I was expecting with the bridge connection down, any nodes that had published a last will & testament (LWT) would see an NDEATH due to the connection loss at any point.
I have tested with mosquitto, vernemq and a little with hive-ce, however hive-ce is severely limited in functionality. Am I missing something with my understanding of MQTT bridging? Shouldn't NDEATH be sent in all three scenarios?
From the sparkplug spec:
A critical aspect for MQTT in a real-time SCADA/IIoT application is making sure that the primary MQTT SCADA/IIoT Host Node can know the “STATE” of any EoN node in the infrastructure within the MQTT Keep Alive period (refer to section 3.1.2.10 in the MQTT Specification). To implement the state a known Will Topic and Will Message is defined and specified. The Will Topic and Will Message registered in the MQTT CONNECT session establishment, collectively make up what we are calling the Death Certificate. Note that the delivery of the Death Certificate upon any MQTT client going offline unexpectedly is part of the MQTT protocol specification, not part of this Sparkplug™ specification (refer to section 3.1 CONNECT in the MQTT Specification for further details on how an MQTT Session is established and maintained).
So, in MQTT terms, NDEATH is a 'Will', which, as mentioned above, is defined in section 3.1 of the MQTT spec:
If the Will Flag is set to 1 this indicates that, if the Connect request is accepted, a Will Message MUST be stored on the Server and associated with the Network Connection. The Will Message MUST be published when the Network Connection is subsequently closed unless the Will Message has been deleted by the Server on receipt of a DISCONNECT Packet
In summary, NDEATH is registered as a 'Will', which the MQTT broker publishes if it loses the connection with the publisher (unless a DISCONNECT is received first).
When you establish a bridge, it relays messages published on whatever topic(s) it is configured to relay. The bridge only communicates published messages, not information about which clients are connected (or any 'Will' they may have set). So when the bridged connection goes down, the publisher is still connected to its own broker, no 'Will' is published, and subscribers will not receive the NDEATH.
Monitoring the connection status of bridges is not covered by the spec, so the options vary from broker to broker. For example, Mosquitto can (using a 'Will' on the bridge connection) provide a notification when the connection goes down (see notifications in mosquitto.conf). This may give you some options for getting the information you need.
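To make the bridge behaviour described in this answer concrete, here is a minimal toy model in Go. It is only a sketch and not how Mosquitto or any other broker is implemented: the Broker type, its methods, the forward hook and the topic name are all hypothetical names chosen for illustration. It shows that a Will published by broker A travels over the bridge like any other message, while losing the bridge link itself publishes nothing, because the node is still connected to A.

```go
package main

import "fmt"

// Message is a published MQTT-style message (topic plus payload).
type Message struct {
	Topic, Payload string
}

// Broker is a toy stand-in for an MQTT broker (hypothetical, for illustration):
// it stores one Will per connected client and records what it publishes.
type Broker struct {
	Name      string
	wills     map[string]Message // client ID -> stored Will
	Published []Message          // everything this broker has published
	forward   func(Message)      // bridge hook; nil while the bridge link is down
}

func NewBroker(name string) *Broker {
	return &Broker{Name: name, wills: map[string]Message{}}
}

// Connect stores the client's Will, as happens on CONNECT with the Will Flag set.
func (b *Broker) Connect(clientID string, will Message) { b.wills[clientID] = will }

// Publish records the message locally and relays it if the bridge link is up.
func (b *Broker) Publish(m Message) {
	b.Published = append(b.Published, m)
	if b.forward != nil {
		b.forward(m)
	}
}

// Disconnect models a clean DISCONNECT: the Will is deleted, not published.
func (b *Broker) Disconnect(clientID string) { delete(b.wills, clientID) }

// Drop models the connection closing without a DISCONNECT: the Will is published.
func (b *Broker) Drop(clientID string) {
	if will, ok := b.wills[clientID]; ok {
		delete(b.wills, clientID)
		b.Publish(will)
	}
}

func main() {
	a, b := NewBroker("A"), NewBroker("B")
	a.forward = b.Publish // bridge up: A relays published messages to B
	ndeath := Message{"spBv1.0/group/NDEATH/node1", "offline"} // illustrative topic

	// Publisher dies: A publishes the Will and the bridge relays it to B.
	a.Connect("node1", ndeath)
	a.Drop("node1")
	fmt.Println("messages on B after publisher loss:", len(b.Published)) // 1

	// Bridge link drops: node1 is still connected to A, so no Will fires.
	a.Connect("node1", ndeath)
	a.forward = nil
	fmt.Println("messages on B after bridge loss:", len(b.Published)) // still 1
}
```

In this model the only way for B's subscribers to learn about the bridge outage is a separate signal, which is the role the Mosquitto bridge notifications mentioned above would play.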

Issue with Google IoT MQTT bridge

We have an IoT-based application device that is configured to communicate with our Dashboard via the MQTT bridges of various service providers such as Google, AWS and Azure.
So the flow is:
1. The device starts a TLS session with the service provider.
2. It subscribes to a particular topic and waits for messages from the service provider with a 5-second timeout.
3. The Dashboard publishes messages to the same topic periodically.
4. The IoT service provider broadcasts them to all subscribed devices.
Messages are published and subscribed to with MQTT QoS 1.
Observation:
AWS and Azure work fine with the above flow, but the device stops receiving messages from the Google MQTT bridge after 3-5 successful iterations, even though our dashboard is still publishing messages to the Google IoT MQTT bridge.
For Google, we have found that the control flow differs from Azure and AWS.
For Google, we need to subscribe to and unsubscribe from a given topic every time before waiting to receive a message, while for AWS and Azure we only need to subscribe once when opening the MQTT connection.
Issue:
Sometimes the 5-second device timeout occurs because the device could not receive messages for the subscribed topic from the Google MQTT bridge. Adding multiple retries did not help: the device still cannot receive messages from the Google MQTT bridge after 45-60 seconds of operation after powering on.
Is there a constraint with the Google MQTT bridge that prevents receiving messages periodically without subscribing every time?
How can the device receive messages from the Google MQTT bridge without timing out (5 sec)?
Is there any workaround to recover a device once it has timed out, by re-establishing the MQTT connection?
I am using Google IoT Core as well. The device-side code for my MQTT client is written in Go, using the Paho MQTT package. This client package supports an OnConnect handler, and using this handler I achieve the recovery that I think you are looking for.
In this handler I re-subscribe to the "config" topic.
I think Google does not save the subscriptions that clients have made, and therefore the client needs to re-subscribe upon each successful connection.
Here's the golang code I've used (inspired by gingi007's answer, thank you!)
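// Runs every time the connection is (re)established, so the subscription is renewed.
var onConn MQTT.OnConnectHandler
onConn = func(client MQTT.Client) {
	fmt.Println("connected")
	// QoS 1 subscription to the config topic; handlerFunc receives the updates.
	client.Subscribe(topic.Config, 1, handlerFunc)
}
mqttOpts.SetOnConnectHandler(onConn)
client := MQTT.NewClient(mqttOpts)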
This way config updates keep flowing to my device, whereas if you subscribe outside of the OnConnect handler you'll receive just one config update when you connect.

#Solace - Implementing DR for solace Java JMS publisher

I have an existing application that was running on Solace JAR v7.1.2 in pub/sub mode. We have now upgraded to v10.1.1 and, as part of implementing a DR (Disaster Recovery) setup, I added a second host to the configuration as a comma-separated entry.
The application could connect to the primary host successfully, but during the switch-over (i.e. from primary to DR) the application failed to connect and I received the error below. It does connect to the DR host if I restart my application.
com.solacesystems.jcsmp.JCSMPErrorResponseException: 400: Unknown Flow Name [Subcode:55]
at com.solacesystems.jcsmp.impl.flow.PubFlowManager.doPubAssuredCtrl(PubFlowManager.java:266)
at com.solacesystems.jcsmp.impl.flow.PubFlowManager.notifyReconnected(PubFlowManager.java:452)
at com.solacesystems.jcsmp.protocol.impl.TcpClientChannel$ClientChannelReconnect.call(TcpClientChannel.java:2097)
... 5 more
|EAI-000376|||ERROR| |EAI-000376 JMS Exception occurred, Description: `Error sending message - unknown flow name ((JCSMPTransportException)
I need help understanding whether some configuration is required for the application to reconnect to the DR host for a smooth switch-over.
In Solace JMS API versions earlier than 7.1.2.226, any sessions on which clients have published Guaranteed messages are destroyed after a DR switch-over. To indicate the disconnect and the loss of the publisher flow, the JMS API generates this exception. Upon receiving it, the client application should create a new session. After the new session is established, the client application can republish any Guaranteed messages that had been sent but not acked on the previous session, as these messages might not have been persisted and replicated.
However, this behavior was improved in version 7.1.2.226 and later so that the API handles it transparently. It is no longer necessary to write code to catch this exception. Can you please verify that the application is not using an API version earlier than 7.1.2.226? This can be done by enabling debug-level logs.
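For API versions that predate this change, the republish step described above amounts to keeping a buffer of sent-but-unacked messages. The following is a language-neutral sketch of that bookkeeping written in Go; it does not use the Solace JMS or JCSMP API, and the Pending type, its methods and the send callback are hypothetical names invented for illustration.

```go
package main

import "fmt"

// Pending tracks guaranteed messages that were sent but not yet acknowledged.
// Hypothetical helper for illustration; not part of any Solace API.
type Pending struct {
	order []int          // message IDs in send order
	body  map[int]string // message ID -> payload, present only while unacked
}

func NewPending() *Pending { return &Pending{body: map[int]string{}} }

// Sent records a message as published but still awaiting its ack.
func (p *Pending) Sent(id int, payload string) {
	p.order = append(p.order, id)
	p.body[id] = payload
}

// Acked drops a message once the broker has acknowledged it.
func (p *Pending) Acked(id int) { delete(p.body, id) }

// Republish resends, in original send order, every message that was never
// acked. The send callback stands in for publishing on the new session.
func (p *Pending) Republish(send func(id int, payload string)) {
	for _, id := range p.order {
		if payload, ok := p.body[id]; ok {
			send(id, payload)
		}
	}
}

func main() {
	p := NewPending()
	p.Sent(1, "order-created")
	p.Sent(2, "order-paid")
	p.Sent(3, "order-shipped")
	p.Acked(1) // only message 1 was acked before the switch-over

	// After creating a new session, resend whatever is still unacked.
	p.Republish(func(id int, payload string) {
		fmt.Println("republishing", id, payload)
	})
}
```

Because an unacked message may in fact have been persisted before the switch-over, a scheme like this can deliver duplicates, so consumers would need to tolerate them.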
As Alexandra pointed out, when using guaranteed messaging, as of version 7.1.2 the Solace JMS API guarantees delivery even in the case of failover. It is normal to receive INFO-level log messages that say "Error Response (400) - Unknown Flow Name". These do not indicate a problem, but exceptions (with stack traces) are a problem and indicate that delivery is not guaranteed.
Background: if the connection between the client and the broker (on the Solace server) is terminated unexpectedly, the broker maintains the flow state — but only for three minutes. The state is also copied to the HA mate broker to support failover (but not to the replication mate). If the client reconnects within three minutes, it can resume where it left off. If it reconnects after three minutes, the server will respond with the following (which will be echoed to the logs):
2019-01-04 10:00:59,999 INFO [com.solacesystems.jcsmp.impl.flow.PubFlowManager] (Context_2_Thread_reconnect_service) Error Response (400) - Unknown Flow Name
2019-01-04 10:00:59,999 INFO [com.solacesystems.jcsmp.impl.PubADManager] (Context_2_Thread_reconnect_service) Unknown Publisher Flow (flowId=36) recovered: 1 messages renumbered and resent (lastMessageIdSent =0)
That's okay: the client JMS library will automatically resend whatever messages are necessary, so guaranteed messaging is still guaranteed.
Also, just to confirm, the jar name indicates the version, so sol-jms-10.1.1.jar uses version 10.1.1.

About the usage of the Last will and testament message in MQTT

In my case, I use the MQTT last will message to notify others that a client has disconnected unexpectedly, for example on a "status" topic that they listen to.
My first question: when a client is connected to brokerA, disconnects unexpectedly from brokerA, and then reconnects to brokerA, could a last will message be sent to the "status" topic at that point?
My second question: when exactly is the last will message sent after a client disconnects from the broker unexpectedly?
There is a really good description of the LWT (Last Will and Testament) here:
http://www.hivemq.com/mqtt-essentials-part-9-last-will-and-testament/
But the simple version is as follows:
The broker will deliver the LWT message only in the following circumstances:
An I/O error or network failure is detected by the server.
The client fails to communicate within the Keep Alive time.
The client closes the network connection without sending a DISCONNECT packet first.
The server closes the network connection because of a protocol error.

Losing messages over lost connection xmpp

I went through this question:
Lost messages over XMPP on device disconnected
but it has no answer.
When a connection is lost due to a network issue, the server does not recognize it and keeps sending messages to the disconnected receiver; those messages are permanently lost.
I have a workaround in which I ping the client from the server. When the client gets disconnected, the server recognizes it after 10 seconds and saves further messages in a queue, preventing them from being lost.
My question is: can 100% fail-safe message delivery be achieved in some other way? I know Psi and many other XMPP clients do it.
On the iOS side I am using XMPPFramework.
One way is to employ Advanced Message Processing (AMP) on your server; another is to employ Message Delivery Receipts on your clients.
The former requires an AMP-enabled server implementation, and the initiating client has to be able to tell the server what kind of delivery status reports it wants (it wants an error to be returned if delivery is not possible). Note that this is not bullet-proof anyway, as there is a window between the moment the target client loses its connectivity with the server and the moment the TCP stack on the server's machine detects this and tells the server about it. During this window, everything sent to the client is considered by the server to have been sent okay, because there is no concept of message boundaries in the TCP layer. If the server process managed to stuff a message stanza's XML into the system buffers of its TCP connection, it considers that stanza sent; there is no way for it to know which bits of its stream did not reach the receiver once the TCP stack says the connection is lost.
The latter is bullet-proof, as the clients rely on explicit notifications about message reception. This does increase chattiness, though. In return, no server support for this feature is required: it is implemented solely in the clients.
Go with XEP-0198 and enjoy...
http://xmpp.org/extensions/xep-0198.html
For an XMPP client I'm working on, the following mechanism is used:
Add Reachability to the project, to quickly detect when the phone is having connectivity problems.
Use a modified version of XEP-0198 that adds a confirmation sent by the server. So, the client sends a message and the server confirms with a receipt. Later on, the receiving user will also confirm with a receipt. For each message you send, you get two confirmations, one from the server and one from the client. This requires modifications on the server, of course.
When the app is not connected to the XMPP server, messages are queued.
When the app is logged in to the XMPP server again, it takes all messages that were not confirmed by the server and sends them again.
For this to work, you have to store the messages locally in the app with three possible states: "Not sent", "Confirmed by server", "Confirmed by user".
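The three-state bookkeeping in this answer could look roughly like the sketch below. It is written in Go purely to illustrate the state handling and does not use XMPPFramework or any XMPP library; Outbox, Queue, ServerReceipt, UserReceipt and ToResend are hypothetical names.

```go
package main

import "fmt"

// State is the delivery state stored locally for each outgoing message.
type State int

const (
	NotSent State = iota
	ConfirmedByServer
	ConfirmedByUser
)

// Outbox is a hypothetical local store of outgoing messages and their states.
type Outbox struct {
	ids   []string         // message IDs in the order they were queued
	state map[string]State // message ID -> current state
}

func NewOutbox() *Outbox { return &Outbox{state: map[string]State{}} }

// Queue stores a new message as "Not sent"; it keeps that state until the
// server confirms it, whether or not a send was attempted.
func (o *Outbox) Queue(id string) {
	if _, ok := o.state[id]; !ok {
		o.ids = append(o.ids, id)
		o.state[id] = NotSent
	}
}

// ServerReceipt records the first confirmation, sent by the server.
func (o *Outbox) ServerReceipt(id string) {
	if s, ok := o.state[id]; ok && s == NotSent {
		o.state[id] = ConfirmedByServer
	}
}

// UserReceipt records the second confirmation, sent by the receiving user.
func (o *Outbox) UserReceipt(id string) {
	if _, ok := o.state[id]; ok {
		o.state[id] = ConfirmedByUser
	}
}

// ToResend lists, in queue order, every message the server has not confirmed.
// The app would send these again after logging back in.
func (o *Outbox) ToResend() []string {
	var out []string
	for _, id := range o.ids {
		if o.state[id] == NotSent {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	o := NewOutbox()
	o.Queue("m1")
	o.Queue("m2")
	o.Queue("m3")
	o.ServerReceipt("m1")
	o.UserReceipt("m1")
	o.ServerReceipt("m2")
	fmt.Println(o.ToResend()) // [m3]
}
```

A real app would persist this store to disk so that unconfirmed messages survive a restart, and would keep the message body alongside each ID.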
