Vaadin Flow UI stops updating - vaadin

We have some users reporting an odd problem with our Vaadin 23 application in Chrome and Edge. The application randomly "hangs" such that the user can still interact with client-side components, but nothing seems to reach the server. The problem occurs more frequently when the users are connected to their corporate network from home via VPN.
The application is configured to use push with long polling, and is deployed as a WAR file on Tomcat 9 under Java 11.
There are no error messages in the JavaScript console.
A network trace (screenshot below) shows successful heartbeats, push renegotiations, and UI interactions.
The server access logs mirror the requests from the network trace, so we are confident all requests are making it through to the server.
What we are seeing is that the XHR POST requests are being generated when the client interacts with the UI (the ?v-r=uidl requests), but no server updates are ever applied to the UI. The application becomes unresponsive to user input. Interactions that are purely client-side (e.g. selecting a tab in a tabbed layout) still work, but no server-side updates ever get applied.
Any clues as to what is going wrong?
EDITED 10-aug-22:
An interesting observation that indicates the problem may be server-side: The XHR POST requests in the network trace above are triggered by clicking on tabs within a Tabs component. I added logging in the tab event handling to log a message every time a tab is clicked. When the application freezes, I can see the XHR requests in the server's access log, but I do not see the event handler log messages.

Related

Firebase A/B test with server logged conversion event

I'm trying to set up a Firebase A/B test with a conversion event logged from my server, in combination with events fired from my mobile application.
From the diagram below (retrieved here), I would expect that if I properly pass along my App Instance Id and Event Data to my server, and then pass that data along when logging to GA4, the event should be properly attributable to a Firebase A/B test, but that has yet to work.
I have confirmed that A/B tests with events fired from the mobile app directly work as expected, and confirmed that App Instance Id is being passed to my server before being forwarded to GA4 (which is where Firebase is getting my mobile app data).
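For reference, the server-side forwarding described above can be sketched roughly as follows. This assumes the GA4 Measurement Protocol endpoint for app streams, which takes `firebase_app_id` and `api_secret` as query parameters and `app_instance_id` in the JSON body. The function name `buildMeasurementRequest` and the sample values are hypothetical, and the sketch only shows the request shape; it does not settle whether such events are attributed to an A/B test.

```typescript
// One GA4 event as sent through the Measurement Protocol.
interface Ga4Event {
  name: string;
  params?: Record<string, string | number>;
}

// Builds the URL and JSON body for a Measurement Protocol call (app stream).
// The caller is expected to POST `body` to `url` with its own HTTP client.
function buildMeasurementRequest(
  firebaseAppId: string,   // Firebase App ID of the mobile app
  apiSecret: string,       // Measurement Protocol API secret
  appInstanceId: string,   // App Instance Id forwarded by the mobile client
  events: Ga4Event[],
): { url: string; body: string } {
  const query =
    "firebase_app_id=" + encodeURIComponent(firebaseAppId) +
    "&api_secret=" + encodeURIComponent(apiSecret);
  return {
    url: "https://www.google-analytics.com/mp/collect?" + query,
    body: JSON.stringify({ app_instance_id: appInstanceId, events }),
  };
}
```

The key point is that `app_instance_id` must be the value obtained on the device, since that is the only link between the server-logged event and the app instance.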
The reason I am attempting to do this from my server is this event can be logged from either my Apple Watch app, or the iOS app, and has a heavy amount of processing that occurs on the server before being "complete" - I am therefore looking to log that conversion at the end of that server processing, but this processing is hidden from my mobile clients (I just send a 200 back shortly after receiving the payload and spawn child processes to handle the rest of the work on the server).
Is this simply a limitation of the Firebase A/B testing framework?
Thanks!

MQTT.js doesn't connect when mobile browser is not active

I am using MQTT over WebSocket to maintain real-time communication with a server, but I have noticed that my client, mqtt.js, doesn't work when my browser is minimized or the tab is not active on my mobile device. Any help?
That's just the way mobile browsers work, they will suspend anything in the background to save battery.
I suggest you look at something like the page lifecycle events covered in this document from Chrome to see how to handle getting notified when the page is suspended and when it gets focus back and is resumed.

Websocket connections on iPhone get lost when Safari is un-focused/hidden

I use a single websocket connection for my web app's notification and chat system. Everything works fine except that when using an iPhone, after closing/hiding Safari (which is actually just a hide/un-focus of the window I think) the connection gets lost and there will be no automatic re-connect after re-opening the Safari window. This might also occur on all other smartphones when hiding the browser window.
On desktop browsers this problem can't occur, as closing tab/window/browser will reload everything on the user's next visit ... But on mobile it seems to be more like:
Lose focus/hide window -> Cancel all client/server connections
Show window again -> Just show the rendered DOM and call Interval/Timeout functions
A solution I thought of is running an interval function every X minutes to check if a websocket connection exists otherwise create one ... This is ok, but I don't like this approach too much and was wondering if there is something I am doing wrong or missing on websockets as I used XHR Polling till now.
I use "Rails Action Cable" for my web app's websocket connections. As I use Vue.js for my frontend, I wrote a custom package to use Action Cable's client-side functionality instead of Rails' full integration (https://www.npmjs.com/package/vue-action-cable), but I think the problem is more specific to websocket connections on mobile devices which un-focus the app window.
I am experiencing this on an XHR polling app and on a websocket app that uses Action Cable and React. The solution I used for the XHR polling app was to call document.addEventListener('visibilitychange') and check document.hidden, so that an API call is made to the server whenever the user comes back to the tab. Essentially an "away" and "back" trigger. I plan to use that same trigger idea in React to make sure the Action Cable connections are good. I can share that solution with you when it's done if you want.
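A minimal sketch of that "away"/"back" trigger, assuming the standard `visibilitychange` event and `document.hidden` property. The `Connection` interface and the name `reconnectOnReturn` are hypothetical stand-ins for whatever socket client is in use (Action Cable, mqtt.js, a raw WebSocket); the document is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// The part of `document` this sketch relies on.
interface VisibilitySource {
  hidden: boolean;
  addEventListener(type: "visibilitychange", listener: () => void): void;
}

// Hypothetical wrapper around the real socket client.
interface Connection {
  isOpen(): boolean;   // e.g. socket.readyState === WebSocket.OPEN
  reconnect(): void;   // open a fresh connection and resubscribe
}

function reconnectOnReturn(doc: VisibilitySource, conn: Connection): void {
  doc.addEventListener("visibilitychange", () => {
    // Act only on "back" (page visible again), and only if the
    // connection was actually lost while the page was hidden.
    if (!doc.hidden && !conn.isOpen()) {
      conn.reconnect();
    }
  });
}
```

In a page, the real `document` would be passed as `doc`. Compared with polling on an interval, this reacts as soon as the tab is shown again and does nothing while the page is suspended.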

Twilio WebRTC TURN relay randomly stops working after a few minutes

I am using the Twilio Network Traversal Service as part of a native application I am working on to perform peer-to-peer remote desktop connections. We implement a subset of the WebRTC protocol stack that is equivalent to the WebRTC data channels (not the WebRTC video and audio protocols). When using a TURN relay, the TURN allocation seems to be invalidated randomly somewhere between a few minutes and a maximum of 12 minutes from the session start. This issue looks very similar to this one, but the proposed workaround (sending silent audio) is not acceptable in my case, since I do not implement the WebRTC audio/video protocols.
I have been pulling my hair out over this problem for the last two weeks, and have isolated the issue to the Twilio service itself. To compare, I have used a web-based WebRTC data channel demo with Firefox and the Xirsys TURN server cloud. I have Wireshark captures showing Firefox getting disconnected with Twilio just like my native application, while the exact same Firefox demo doesn't get disconnected when using the Xirsys servers.
I was using Xirsys originally, but I experienced some instability with their service that made me switch to Twilio, which is why I would rather have Twilio fix this issue instead of going back with Xirsys. At the bare minimum, I would rather have two WebRTC hosting providers I can choose from that I know should work fine. This is why I am taking the time to explain the issue in detail so it can get fixed.
Here are two wireshark captures (with the peer-to-peer data messages filtered out) showing firefox using WebRTC data channels and the Twilio TURN relay servers:
The traffic stops being relayed after 4 minutes in the first capture, and after about 11 minutes in the second capture. In both captures, firefox detects that traffic stops being relayed (at the data channel level) and attempts a graceful disconnection by sending a Refresh request packet with a lifetime of zero. Both graceful disconnections result in a 437 Allocation Mismatch error, indicating that the server doesn't even know about the allocation firefox is trying to close gracefully.
With my native application, this would often take the form of a CreatePermission Request message that fails with a 438 "Wrong nonce" error, which is basically what should happen if a client tries to update the permission on an allocation that no longer exists. The error code 438 usually means "Stale nonce", which is not really an error, but an indication that the nonce has expired and the client should try again using the new nonce contained in the "error" message. It took me a while to figure out, but even if the error code is 438, the error string is not the same. I have observed a true stale nonce error with Xirsys and successfully updated my permission with the new nonce from the error response, so I know I can properly handle this case in my implementation.
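The distinction described above can be sketched as a small classifier. The codes are the standard ones (437 Allocation Mismatch, 438 Stale Nonce); treating a 438 whose reason phrase is "Wrong nonce" as a lost allocation is an assumption taken from the observation in this post, not from the TURN specification, and reason phrases are not really meant to be parsed by clients. The type names and action labels are hypothetical.

```typescript
type TurnAction = "retry-with-new-nonce" | "allocation-lost" | "fail";

interface TurnErrorResponse {
  code: number;     // number from the ERROR-CODE attribute, e.g. 437 or 438
  reason: string;   // reason phrase as sent by the server
  nonce?: string;   // NONCE attribute carried in the error response, if any
}

function classifyTurnError(err: TurnErrorResponse): TurnAction {
  if (err.code === 437) {
    // Allocation Mismatch: the server does not know this allocation.
    return "allocation-lost";
  }
  if (err.code === 438) {
    // Assumption from the captures above: same code, different phrase,
    // seen when the allocation no longer exists on the server.
    if (/wrong nonce/i.test(err.reason)) {
      return "allocation-lost";
    }
    // A true stale nonce: resend the request with the nonce supplied.
    return err.nonce !== undefined ? "retry-with-new-nonce" : "fail";
  }
  return "fail";
}
```

With this split, a genuine stale nonce leads to a retry of the same request, while the "Wrong nonce" variant is surfaced as a dead allocation instead of being retried indefinitely.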
Here is the source code for the WebRTC data channel demo I have used:
https://github.com/devolutions/webrtc-demo
For comparison, here is the same firefox data channel demo using the Xirsys TURN server cloud:
In this capture, I have let the demo run for about 16 minutes (it works for much longer than that, the longest I have tried is two hours). We can see that the traffic keeps getting relayed for the entire duration of the session, and CreatePermission requests keep getting sent by firefox with success. At the end, the graceful disconnection is triggered by firefox closing the WebRTC data channel (instead of being closed due to traffic no longer being relayed). As opposed to the Twilio captures, the Refresh request with a lifetime of zero is successful: the Xirsys TURN server still knows about the allocation and sends back a success response, as expected.
It should be noted that the ICMP unreachable errors are normal because I think in this case firefox is no longer listening on the given port when the response comes back. In other words, it sends the Refresh request with a lifetime of zero and doesn't wait for the answer to come back.
For the time being, I have no other choice but to go back with Xirsys, but I would really like if the Twilio Network Traversal Service could be fixed. Let me know if you have more questions regarding the issue.
I have uploaded the wireshark captures here for reference.
EDIT: I have modified the webrtc demo page such that it doesn't close the connection when the ice connection state is set to 'disconnected'. Now I get the real disconnection when the ice connection state goes to 'failed'. However, it effectively didn't change anything, since in this case it takes just a few seconds more for the state to go from 'disconnected' to 'failed'.
Since I have new relevant screenshots and captures, I am updating the original question to clarify certain problems pointed out by Philipp Hancke:
First, here is a new capture with the ice connection state fix (the browser closes the connection only when the state goes to 'failed'):
It's interesting to see that this time, the session stayed up for a whole 18 minutes. This was taken on a Saturday morning, so I'm guessing that the issue could be related to the current workload on the Twilio servers. However, it failed in the exact same way as it always has so far for me. As a bonus, we even have a valid stale nonce response that is correctly handled by Firefox.
However, if we take a different view of the same capture, we can see that the traffic stops being relayed for a solid 30 seconds before firefox considers the connection as being dropped and sends the Refresh request with a lifetime of zero. As in previous captures, the server responds with an Allocation Mismatch error, indicating it doesn't know which allocation firefox is talking about.
The last eight packets being sent are of the same size, so my guess is that they are retransmissions. After 30 seconds of retransmissions, it is likely that SCTP considers the transport as being dropped.
With regards to the refresh request with a lifetime of zero, I did a test where I close the connection early on, from the browser. In this case, the server recognizes the allocation and returns a success response:
The allocation mismatch is the easiest symptom to observe, but in my testing with my native application, I have seen similar errors with Refresh requests for non-zero lifetimes, and with CreatePermission requests (438 "Wrong nonce" error). However, since the browser closes the connection after 30 seconds of data not being relayed, it is hard to observe these errors with the current webrtc demo. If we could change that timeout to 10 minutes, we would see those errors as well.
Excellent problem description!
Without the server log it is hard to determine what goes wrong. I tried with the appear.in TURN servers, which run an up-to-date version of coturn and show the same behaviour as the Twilio servers. Xirsys seems to be running a custom version of coturn (Coturn-0.5 'Xirsys Turn Services' according to the software field, but coturn never had such a version).
"In both captures, firefox detects that traffic stops being relayed (at the data channel level) and attempts a graceful disconnection by sending a Refresh request packet with a lifetime of zero."
Not quite. A refresh request with a lifetime of 0 is used to discard an allocation. At that point it does not matter what the server returns as the connection is beyond repair anyway.
This is caused by peerjs closing the peerconnection if the iceconnectionstate changes to disconnected, here in your bundled library version.
This is overly aggressive (and does not even fix things). We've had a discussion here about what the specification should do with respect to trying to fix things with an ICE restart; it also links to a great explanation of the disconnected state.
The disconnected state probably happens because a few packets get lost. But this is something that can happen when there is minor congestion. I'd recommend removing the pc.close() in the disconnected case.
If you are looking for other TURN providers, Tokbox provides the same service. For datachannels the latency of a properly run distributed TURN network does not matter as much as for VoIP so you might run your own servers in a single location instead.

WebExtension redirect and block websites

I have started a simple web extension for Firefox which, in theory, should block access to specific websites based on a response from a remote server. When the user tries to navigate, the new page should not load until confirmation is received from the remote server. Unfortunately, the remote "check" server is limited to a few requests per second for each user, so I can't (and it's unnecessary to) check every request made after the user navigates to a page. Is there any way to listen only for "real" navigation, rather than all those requests, and redirect the whole tab somewhere before any requests are even made?
I've tried add-on API:
tab events fired after content is already received, which is not nice.
"http-on-modify-request" event is fired for each request separately spamming remote check server.
WebExtensions:
browser.webNavigation.onBeforeNavigate seems like what I need, but I can neither send the check request nor redirect from there, and I am not sure I will be able to.
"http-on-modify-request" event is fired for each request separately spamming remote check server.
That observer notification gives you an HTTP channel. The channel has a loadInfo property, which has an externalContentPolicyType property that lets you filter for top-level document loads by matching one of the content policy constants.
WebRequest.jsm and browser.webRequest are abstractions over the http observers and provide similar functionality.
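With browser.webRequest, the equivalent of filtering for top-level document loads is a request filter with `types: ["main_frame"]`. The following is a minimal sketch, assuming a Firefox extension with the "webRequest" and "webRequestBlocking" permissions, where a blocking listener may return a Promise. The interfaces are a trimmed-down stand-in for the real API so the logic can run outside the browser; `isAllowed` (the remote check) and `blockedPageUrl` are hypothetical.

```typescript
// Trimmed-down slice of the webRequest API used by this sketch.
interface RequestDetails { url: string; tabId: number; }
interface BlockingResponse { cancel?: boolean; redirectUrl?: string; }
interface OnBeforeRequest {
  addListener(
    listener: (details: RequestDetails) => Promise<BlockingResponse>,
    filter: { urls: string[]; types: string[] },
    extraInfoSpec: string[],
  ): void;
}

function guardTopLevelNavigation(
  onBeforeRequest: OnBeforeRequest,
  isAllowed: (url: string) => Promise<boolean>,  // hypothetical remote check
  blockedPageUrl: string,                        // hypothetical landing page
): void {
  onBeforeRequest.addListener(
    // The request is held until the promise settles, so the page does not
    // start loading before the remote check has answered.
    (details) =>
      isAllowed(details.url).then((allowed): BlockingResponse =>
        allowed ? {} : { redirectUrl: blockedPageUrl }),
    // "main_frame" limits the listener to top-level document loads, so
    // images, scripts and XHRs never reach the check server.
    { urls: ["<all_urls>"], types: ["main_frame"] },
    ["blocking"],
  );
}
```

In the extension's background script this would be called with `browser.webRequest.onBeforeRequest` as the first argument, which results in one check per navigation instead of one per request.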
