I am using two Wi-Fi Direct enabled devices (Let be A with android ICS running and B with Android Jelly Bean )
A has intent value 3 and B has intent value 5 . GO Negotiation happens and B elected as GO. I am using Wi-Fi DirectDemo App for file transfer and able to transfer file from A to B successfully.
Next I changed the intent value of A to 15 and B has intent value 5. GO negotiation happens and A elected as GO.It moves to Provisoning phase then. But the Group Formation fails with no error message except "Group Formation timeout"
Group formation failure happens due to
1) Failed WPS provisioning
2) 15 sec timeout.
Since no error messages are thrown it is hard to debug the issue.I have two queries.
What process actually takes place in this 15 secs.?
Is there any common reason or cause for this timeout?
Expecting answers from Wi-Fi Direct experts.......
Sounds like its EAPOL handshake failure.
We need the sniffer traces to pin point the issue. If you got a sniffer, please share the capture
Related
I have an application that uses mqtt for communication between modules and with a mobile terminal.
In some situations when the messages do not arrive, the node performs a self test of MQTT (sending a msg to itself), and when the selftest fails, tries to reconnect to the broker (mqtt offline not always arrives). And then two problems may arise:
If I perform a mqtt.client:close() to assure that the client is closed (to avoid the second problem) and the client is already closed, the node resets.
If I perform a mqtt.client:connect() and the client is still connected, an exception and a restet occurs.
is there a way to know if the mqtt client is connected or not?
Thanks for your comment. I am going to describe what I am doing, to see if you can help me:
I have two independent system, a master and a slave. The master publish a test message every 10 minutes. If there is no answer from the slave. it publish a test message to itself. If this self test does not arrive, a disconnection from the broker is assumed, and a reconnection is initiated.
And here is where the problem comes, sometimes the client is disconnected and everything go well, but sometimes it is still connected but unresponsive, and the node resets with an exception "already connected".
Performing a mqtt:close() previously to the reconnection, should be safe, but if I send it and the client is truly disconnected, the node resets without any reason (known to me).
All this is happening without receiving any offline message.
Instead of waiting for messages manually sent by the master client (which could fail to send for various reasons, leading a listening device to the wrong conclusion about the state of its connection to the broker), I recommend using MQTT's built-in connection management.
First, you can make sure that each client's initial connection has succeeded by including an error handler in :connect(). If the client really opens, nothing in the NodeMCU documentation indicates that it will close itself; it may go offline.
Once connected, the client only knows that something is wrong when it sends a message and does not receive a response. It sounds like you are not calling :publish() much (which would otherwise let you know by returning false), so pinging may be best. If you expect to receive a message from the broker every n seconds, set a keepalive time slightly higher than that on the client.
Then, failure to get a response to those messages should trigger an event that you can respond to. That might be something like the following (not tested, may work better called outside the callback):
m:on("offline", function(client) m:close() end)
I am using the Twilio Network Traversal Service as part of a native application I am working on to perform peer-to-peer remote desktop connections. We implement a subset of the WebRTC protocol stack that is equivalent to the WebRTC data channels (not the WebRTC video and audio protocols). When using a TURN relay, the TURN allocation seems to be invalidated randomly somewhere between a few minutes and a maximum of 12 minutes from the session start. This issue looks very similar to this one, but the proposed workaround (sending silent audio) is not acceptable in my case, since I do not implement the WebRTC audio/video protocols.
I have been pulling my hair on this problem for the last two weeks, and isolated the issue as being the Twilio service itself. To compare, I have used a web based WebRTC data channel demo using firefox and the Xirsys TURN server cloud. I have wireshark captures showing firefox getting disconnected with Twilio just like my native application, while the exact same firefox demo doesn't get disconnected when using the Xirsys servers.
I was using Xirsys originally, but I experienced some instability with their service that made me switch to Twilio, which is why I would rather have Twilio fix this issue instead of going back with Xirsys. At the bare minimum, I would rather have two WebRTC hosting providers I can choose from that I know should work fine. This is why I am taking the time to explain the issue in detail so it can get fixed.
Here are two wireshark captures (with the peer-to-peer data messages filtered out) showing firefox using WebRTC data channels and the Twilio TURN relay servers:
The traffic stops being relayed after 4 minutes in the first capture, and after about 11 minutes in the second capture. In both captures, firefox detects that traffic stops being relayed (at the data channel level) and attempts a graceful disconnection by sending a Refresh request packet with a lifetime of zero. Both graceful disconnections result in a 437 Allocation Mismatch error, indicating that the server doesn't even know about the allocation firefox is trying to close gracefully.
With my native application, this would often take the form of a CreatePermission Request message that fails with a 438 "Wrong nonce" error, which is basically what should happen if a client tries to update the permission on an allocation that no longer exists. The error code 438 usually means "Stale nonce", which is not really an error, but an indication that the nonce has expired and the client should try again using the new nonce contained in the "error" message. It took me a while to figure out, but even if the error code is 438, the error string is not the same. I have observed a true stale nonce error with Xirsys and successfully updated my permission with the new nonce from the error response, so I know I can properly handle this case in my implementation.
Here is the source code for the WebRTC data channel demo I have used:
https://github.com/devolutions/webrtc-demo
For comparison, here is the same firefox data channel demo using the Xirsys TURN server cloud:
In this capture, I have let the demo run for about 16 minutes (it works for much longer than that, the longest I have tried is two hours). We can see that the traffic keeps getting relayed for the entire duration of the session, and CreatePermission requests keep getting sent by firefox with success. At the end, the graceful disconnection is triggered by firefox closing the WebRTC data channel (instead of being closed due to traffic no longer being relayed). As opposed to the Twilio captures, the Refresh request with a lifetime of zero is successful: the Xirsys TURN server still knows about the allocation and sends back a success response, as expected.
It should be noted that the ICMP unreachable errors are normal because I think in this case firefox is no longer listening on the given port when the response comes back. In other words, it sends the Refresh request with a lifetime of zero and doesn't wait for the answer to come back.
For the time being, I have no other choice but to go back with Xirsys, but I would really like if the Twilio Network Traversal Service could be fixed. Let me know if you have more questions regarding the issue.
I have uploaded the wireshark captures here for reference.
EDIT: I have modified the webrtc demo page such that it doesn't close the connection when the ice connection state is set to 'disconnected'. Now I get the real disconnection when the ice connection state goes to 'failed'. However, it effectively didn't change anything, since in this case it takes just a few seconds more for the state to go from 'disconnected' to 'failed'.
Since I have new relevant screenshots and captures, I am updating the original question to clarify certain problems pointed out by Philipp Hancke:
First, here is a new capture with the ice connection state fix (the browser closes the connection only when the state goes to 'failed'):
It's interesting to see that this time, the session stayed up for a whole 18 minutes. This was taken on a saturday morning, so I'm guessing that the issue could be related to the current workload on the twilio servers. However, it failed in the exact same way as it always does so far for me. As a bonus, we even have a valid stale nonce response that is correctly handled by firefox.
However, if we take a different view of the same capture, we can see that the traffic stops being relayed for a solid 30 seconds before firefox considers the connection as being dropped and sends the Refresh request with a lifetime of zero. As in previous captures, the server responds with an Allocation Mismatch error, indicating it doesn't know which allocation firefox is talking about.
The last eight packets being sent are of the same size, so my guess is that they are retransmissions. After 30 seconds of retransmissions, it is likely that SCTP considers the transport as being dropped.
With regards to the refresh request with a lifetime of zero, I did a test where I close the connection early on, from the browser. In this case, the server recognizes the allocation and returns a success response:
The allocation mismatch is the easiest symptom to observe, but in my testing with my native application, I have seen similar errors with Refresh requests for non-zero lifetimes, and with CreatePermission requests (438 "Wrong nonce" error). However, since the browser closes the connection after 30 seconds of data not being relayed, it is hard to observe these errors with the current webrtc demo. If we could change that timeout to 10 minutes, we would see those errors as well.
Excellent problem description!
Without the server log this is hard to determine what goes wrong. I tried with the appear.in TURN servers which run an up-to-date version of coturn and show the same behaviour as the Twilio servers. Xirsys seems to be running a custom version of coturn (Coturn-0.5 'Xirsys Turn Services' from the software field but coturn never had such a version).
In both captures, firefox detects that traffic stops being relayed (at the data channel level) and attempts a graceful disconnection by sending a Refresh request packet with a lifetime of zero.
Not quite. A refresh request with a lifetime of 0 is used to discard an allocation. At that point it does not matter what the server returns as the connection is beyond repair anyway.
This is caused by peerjs closing the peerconnection if the iceconnectionstate changes to disconnected, here in your bundled library version.
This is overly aggressive (and does not even fix things) and we've had a discussion about what the specification should do wrt to trying to fix things with an ice restart here which also links to a great explanation of the disconnected state.
The disconnected state probably happens because a few packets get lost. But this is something that can happen when there is minor congestion. I'd recommend removing the pc.close() in the disconnected case.
If you are looking for other TURN providers, Tokbox provides the same service. For datachannels the latency of a properly run distributed TURN network does not matter as much as for VoIP so you might run your own servers in a single location instead.
I am using Google IOT core with mongoose os. I wanted to update device connection status to firestore. But i am unable to find event which reports mqtt connection status to pub/sub like when device disconnects or reconnect i.e if device is offline or not.
I am stuck on this problem for days.Any help will be appreciated
Update
As #devunwired mentioned in this response it is now possible to monitor Stackdriver logs for disconnect events. You must have at a minimum enabled INFO level logging on your project in IoT Core > Registries > [your registry] > Edit Registry > Select "Info" log level > Click save.
Original Response
There are a few values you can look at that are tracked in device configuration metadata that you could use to know when a device last was online:
Last Configuration Send time - sent anytime your device connects /
configuration is posted
Last Event Time - Last time an event was sent from the device
Last State Time - Last time state was sent from the device
Last Heartbeat time - Last time MQTT heartbeat was sent
To get you started, here is an example using API explorer that you can fill-in with your project ID, region, registry, and device to query for a specific device's metadata.
For 1...3 you have control over these through device manager and by publishing data. MQTT heartbeat is updated if your device sends an MQTT_PINGREQ message during the "ping period" without other messages getting sent.
At any rate, you could use any of these update time values to see the last time a device was online / functioning. You could query the states of your devices after listing the devices in a registry and could update a Firebase RTDB periodically if that's how you want to report (e.g. using AppEngine TaskQueue). Note that you also just can get these "last connected" values from the Google Cloud Console.
It was said before but we don't have an event for disconnect, just configuration ack, which generally is the connection event. If you want to share state between a device and the device manager, use state messages.
Unfortunately, there's no built in way to do this right now as there aren't events on this state.
However, you could implement a hack by sending a message on connect/disconnect from the device that you have a Cloud Function subscribed to the Pub/Sub topic listening for. It's not perfect as it would fail in the case where the device disconnected unexpectedly.
There currently is no way to do this, that i've been able to find (a year later after this original post). I posted a question here on SO regarding this as well, with more details and link to example code I had to use for handling this:
Google Core IoT Device Offline Event or Connection Status
The AWS IoT platform publishes messages on a special MQTT topic (prefixed with $aws) when your device connects/disconnects. You can easily use these to monitor these events - however, you should be aware that the MQTT protocol is designed to be robust to a poor networking conditions and the broker on the AWS side probably doesn't think it's a bit deal to disconnect a client. The broker expects that the client will just reconnect and queue messages for a moment during that process (which can be a big deal on a microcontroller).
All that being said, the AWS topics you would watch are:
$aws/events/presence/connected/{clientId}
and
$aws/events/presence/disconnected/{clientId}
and the documentation for these (and other) lifecycle events are located: https://docs.aws.amazon.com/iot/latest/developerguide/life-cycle-events.html
I am able to connect to a peripheral device using BLE, but shortly after reading some characteristics, the framework returns:
CoreBluetooth[WARNING] Unknown error: 14
and the peripheral is disconnected.
Looking at the BT Core_V4.0 spec, I am not sure what the error means. Is the 14 a hex value? does it mean the following error according to the spec: (Part D Section 2 - Error Codes)
2.20 REMOTE DEVICE TERMINATED CONNECTION DUE TO LOW RESOURCES (0X14)
The Remote Device Terminated Connection due to Low Resources error code indicates that the remote device terminated the connection because of low resources.
I tried changing the battery but did not have a different effect.
Also, I don't know how to catch these CB errors, I only see them logged, but when the device disconnects, it does not provide an error (it is null).
I do not directly control the source code for the peripheral but can ask for a bug fix. So any hints are appreciated it.
Thanks,
You cannot intercept these CB errors, they are just traces from lower layer BLE.
Error 13 for instance is when the length of written data is not as specified in the GATT database.
Error 14 means the connection was closed by other side (peripheral). I have seen this several times. Some times I read data too fast (You are not allowed to request next access before previous has been answered, there is only 1 "resource" in BLE per connection. Maybe this is what you also see?
As always it is best to get the TI BTLE USB Dongle with sniffer sw installed and then use the TI RF Sniffer tool in BLE mode with that dongle. You get a lot of information you can debug from in those traces. Like see if there are more than one read or write request without response.
I've encountered a pretty large issue and have been trying to find a solution for 2 months now with no luck. I've submitted it as a bug, ( https://bugzilla.xamarin.com/show_bug.cgi?id=4910 ) but was hoping maybe someone here could shed some light on the cause of the problem, or suggest a work-around.
In a nutshell, to encounter the error:
Create a basic .Net socket connection between two devices
Create and initialize a GameKit.GKSession object on a least one device.
What occurs is the transfer of the data on the .NET socket becomes erratic and too slow to be usable. I've performed many tests across different devices (see link below) and it affects all of them (iPad 3 affected the least). I've tested it between an iPhone and a Windows PC and it still occurs. MonoTouch's GameKit code is somehow affecting the Socket code.
As you can see from the spreadsheet, speed drop from a few milliseconds to send 1 MB to several minutes to forever.
As soon as the GameKit.GKSession is set to null, any backedlogged data on the socket flows freely again and the sockets act normally once more.
Sample Windows and iOS/MonoTouch Apps demonstrating problem: https://dl.dropbox.com/u/8617393/SocketBug/SocketBug.zip
Test results across different devices (PDF Spreadsheet):
https://dl.dropbox.com/u/8617393/SocketBug/SocketBugTestResults.pdf
This issue seems so hard even Apple decided to circumvent it: http://developer.apple.com/library/ios/#qa/qa1753/_index.html#//apple_ref/doc/uid/DTS40011315
The revealing sentence is this: "This change was made to reduce interference with Wi-Fi."
To assist with anyone that comes across this problem, the issue only manifests itself when the GKSession.Available is set to true. This is required for the device to be discoverable, but not to maintain a connection. With this in mind, a temporary work-around can be used where the GKSession.Available is only set to true when required, and set to false as soon as the connection has been made.
In my scenario, I have a socket connection to the server on both devices. I use this connection to send a message from device A to device B, asking it to become discoverable via bluetooth for the next 10 seconds. As soon as the message is sent, device A begins looking for device B and connects as soon as it finds it. Once the connection is established (or 10 seconds elapses without a connection), device B turns off discoverability and the socket behaves as normal.
In my circumstances, this is an acceptable (pending no real fix) workaround to the problem.