I wonder if others have noticed this issue with the TCP implementation in WSL2 Debian.
I am connecting from a Docker container running on WSL2 Debian v. 20.
The TCP client sends a keep-alive packet every second, which is overkill. Then, after roughly 5 minutes, the client terminates the connection for no apparent reason. Is anybody else seeing this behavior?
You can reproduce this by just opening a telnet session to another host. But the behavior happens on other types of sockets too.
And before you ask: this issue is not caused by the server; it does not occur when opening the same TCP connection from other hosts.
Wireshark dump of the last few seconds of the idle TCP connection
I had the same problem with Ubuntu on WSL2. An outbound SSH connection closed after a period of time if there was no activity on that connection. Particularly annoying if you were running an application that produced no screen output.
I suspect that the internal router that connects WSL to the local network dropped the idle TCP connection.
The solution was to shorten the TCP keep-alive timers in /proc/sys/net/ipv4; the following worked for me:
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 45 > /proc/sys/net/ipv4/tcp_keepalive_intvl
So I figured this out. Unfortunately, the WSL2 implementation of Debian seems to have these timers hardcoded in the stack. I tried changing the parameters of the socket open call, and they didn't change the behavior.
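For reference, here is roughly what per-socket keep-alive overrides look like in Java 11+ via jdk.net.ExtendedSocketOptions (the host and port below are placeholders); tweaks of this kind had no effect here:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.StandardSocketOptions;
import jdk.net.ExtendedSocketOptions;

public class KeepAliveClient {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) {
            // Enable keep-alive and stretch the probe timers (values are in seconds).
            socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 300);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 45);
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 5);

            // Placeholder host/port, e.g. the idle telnet session mentioned above.
            socket.connect(new InetSocketAddress("example.com", 23));
            // ... keep the connection idle and watch the keep-alive traffic ...
        }
    }
}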
I want to check that I have properly implemented HttpClientFactory. I have a desktop application that pings my server every 20 seconds. When I open a command prompt and run "netstat -ano | findstr {My server IP}", I can see there are always 2 or 3 connections. As time goes on and I keep checking, the ports slowly change (the port numbers go up and the older ports disappear), but there are never more than 2 or 3 connections. Does this mean that the old ports are being released and I am not at risk of port exhaustion? Thanks.
As mentioned above, I am going to begin selling my application very soon and need to be sure that I am not going to exhaust my clients' ports and hinder their network.
I currently have a Firebird 2.5 database at a client's premises, installed on a Windows 7 Pro (32-bit) machine. Multiple stations on their local network can connect to the database, and the local machine can connect with our application and IBExpert.
However, for some of our software packages, a remote connection (from outside the local network) is required. This previously worked but no longer does.
When I connect with FlameRobin from my office (I'm located in a different city / different network), I receive the following error message:
IBPP::SQLException
Context: Database::Connect
Message: isc_attach_database failed
SQL Message: -923
Connection not established
Engine Code : 335544421
Engine Message :
connection rejected by remote interface.
Performing this connection attempt with IBExpert, both from my office and from other external networks, fails with the same message.
However, I am getting TCP/IP communication from what I can see. Here are the details of my troubleshooting steps for the last week:
Originally, I was receiving the following error when connecting from outside the network:
"Connection not established
Connection refused by remote interface"
Since that time, we have restarted the router and now get the current "connection rejected by remote interface." error message.
I can telnet to the public IP through port 3050 from my office and other outside networks.
I tested port 3050 on sites like YouGetSignal.com or CanYouSeeMe.org and they appear as open.
Other ports that we communicate on publicly are open and communicating.
The site has Kaspersky antivirus installed but all tests to connect via IBExpert while Kaspersky was in sleep mode behaved the same.
Installing Firebird 2.5 on another workstation in the same local network, pointing to port 3051 (both in Firebird.conf and in the Windows Firewall and router), shows the port as open through telnet and CanYouSeeMe.org, but again it cannot be communicated with from outside via port 3051.
IBExpert works from a workstation in the network to the server
The server currently has no entry for RemoteBindAddress in the Firebird.conf
Wireshark shows that when connecting from outside, packets are coming through.
The TCP/IP test in IBExpert under Communication Diagnostics, with the public IP as the host and the service specified, shows the following test results:
Attempt connecting to XX.YY.ZZ.AAA.
Socket for connection obtained.
Found service 'GDS_DB' at port '3050'
Connection established to host 'XX.YY.ZZ.AAA',
on port 3050.
TCP/IP Communication Test Passed!
Database path, username, and password have all been checked multiple times.
Locally on the server, I've changed the security of the database .FDB and the security2.FDB to have Everyone, Full Control.
At this point, we have a scheduled restart of the ISP's modem happening soon, although the fact that we have full TCP/IP communication over the port makes me doubtful that this is the issue.
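To be clear, what I mean by "full TCP/IP communication" is essentially just a bare connect test like the sketch below (Java, with a placeholder host). A successful connect only proves that something accepts a TCP connection on port 3050, possibly the router itself, and says nothing about whether Firebird actually accepts the database attach:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        String host = "XX.YY.ZZ.AAA"; // placeholder: the server's public IP
        int port = 3050;
        try (Socket socket = new Socket()) {
            // 5-second connect timeout; success only shows TCP-level reachability.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("TCP connect to " + host + ":" + port + " succeeded.");
        } catch (IOException e) {
            System.out.println("TCP connect failed: " + e.getMessage());
        }
    }
}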
If anyone can point me toward recommended next debugging steps, or toward any tools that could help in this situation, that would be greatly appreciated.
This turned out to be a networking issue. We performed the following tests:
We power-cycled the ISP's modem, which showed no change in behavior.
We connected a laptop directly to the ISP's modem but couldn't communicate with Firebird even with proper port-forwarding rules in place on the machine and the firewall.
We ran Wireshark on both sides and, on connection attempts, found many attempts to connect with retransmissions that failed.
The technical team on the client side decided to install a VPN-capable router, and now we're good to go. From what we found, there may be some kind of ISP blocking occurring, as many of the tech team's remote services were failing to connect with similar behavior.
Hopefully this post helps people in the future with remote connectivity debugging and all of the places you can look when you're running into this problem.
I have six containers running in Docker Swarm: Kafka+Zookeeper, MongoDB, A, B, C, and Interface. Interface is the main public access point; it is the only container that publishes a port (5683). The Interface container connects to A, B, and C during startup. I am using a docker-compose file + docker stack deploy, and each service has a name that the Interface uses as its hostname. Everything starts successfully and works fine. After some time (20 minutes, 1 hour, ...), I am no longer able to make requests to the Interface. The Interface receives my requests, but the application has lost its connection to service A, B, C, or all of them. If I restart the Interface, it is able to reconnect to services A, B, and C.
At first I thought it was an application problem, so I exposed two new ports on each service (Interface, A, B, C) and attached a profiler and debugger to them. The applications are running properly: no leaks, no blocked threads, working normally and waiting for connections. The debugger shows that when I make a request to the Interface and the Interface tries to call service A, a "Connection reset by peer" exception is thrown.
During this debugging I found something interesting. I attached the debugger to the Interface when the services started, and the debugger connection was also dropped after some time; I was not able to reconnect it until I made a request to the container's application. The problem: the handshake failed.
Another interesting thing I found was that I could not reach the Interface either. So I used Wireshark to see what was going on: the SYN/ACK was fine, then the application sent some data and the Interface responded with FIN, ACK. I assume the same thing happens when the Interface tries to call service A and the connection gets FINed. The codebase of Interface, A, B, and C is the same as far as the Netty server is concerned.
Finally, I don't think it's an application issue. Why? I tried deploying the containers not as services: I ran each container separately, published the ports of each, and set the service endpoints to localhost (no overlay network). And it works; the containers run without problems. I didn't mention at the beginning that the Java applications (Interface, A, B, C) also run without problems when they run as standalone applications, i.e. not in Docker.
Could you please help me figure out what the issue could be? Why does Docker close the sockets when an overlay network is used?
I am using the newest Docker; I have also tried older versions.
Finally, I was able to solve the problem.
What was happening, one more time: the Interface opens a permanent TCP connection to A, B, and C. When you run services A, B, and C as standalone Java applications, everything works. When we dockerize them and run them in Swarm, it works for only a few minutes. The strange part was that the connection between the Interface and another service was interrupted at the moment a request was made from the client to the Interface.
After many, many unsuccessful tests and debugging of each container, I tried running each Docker container separately, with mapped ports, and specified localhost as the endpoint (each container exposed its ports and the Interface connected to localhost). A funny thing happened: it worked. When you run containers like this, a different network driver is used, the bridge driver. If you run them in Swarm, the overlay network driver is used.
So it had to be something with the Docker network, not with the application itself. The next step was a tcpdump from each container after a couple of minutes, when it should have stopped working. It was very interesting:
Client -> Interface (OK, request accepted)
Interface -> A (forwards the request because it belongs to A)
Interface -> A [POST]
A -> Interface [RST]
A was resetting the open TCP connection after a couple of minutes without traffic. Why?
Docker uses IP Virtual Server and IPVS maintains its own connection table. The default timeout for CLOSE_WAIT connections in IPVS table is 60 seconds. Hence when the server sends something after 60 seconds, the IPVS connection is no longer available and the packet looks invalid for a new TCP session and gets RST. On the client side, the connection remains forever in FIN_WAIT2 state because the app still has the socket open; kernel's fin_wait timer kicks in only for orphaned TCP sockets.
This is what I read about it and how I understand it. I am not sure my explanation of the problem is correct, but based on these assumptions I implemented a ping-pong between the Interface and the A, B, C services whenever there is no communication on the connection for a while (well under 60 seconds). And it's working.
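For illustration, here is a rough sketch of how that ping-pong can be wired into a Netty pipeline using IdleStateHandler (the ping payload and the 30-second interval are my assumptions; the real message is whatever Interface and A, B, C already exchange):

import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;
import java.nio.charset.StandardCharsets;

public class HeartbeatHandler extends ChannelDuplexHandler {

    // Hypothetical application-level ping payload.
    private static final byte[] PING = "PING".getBytes(StandardCharsets.UTF_8);

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent
                && ((IdleStateEvent) evt).state() == IdleState.WRITER_IDLE) {
            // Nothing has been written for a while: send a ping so the IPVS
            // connection entry never sits idle long enough to be expired.
            ctx.writeAndFlush(Unpooled.wrappedBuffer(PING));
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }

    // In the channel initializer: fire WRITER_IDLE after 30 seconds without
    // outbound traffic, comfortably below the ~60-second IPVS idle timeout.
    public static void install(ChannelPipeline pipeline) {
        pipeline.addLast(new IdleStateHandler(0, 30, 0));
        pipeline.addLast(new HeartbeatHandler());
    }
}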
Got the same issue.
Specifying
endpoint_mode: dnsrr
in the properties of the service that plays the "server" role works just fine.
https://forums.docker.com/t/tcp-timeout-that-occurs-only-in-docker-swarm-not-simple-docker-run/58179
Sometimes my ArangoDB goes down with the following error:
Error message 'Could not connect to 'tcp://127.0.0.1:8529' 'connect() failed with #10061
I can't understand the reason. It looks like I just turn on my PC and nothing works.
Previously I fixed this problem by reinstalling, but is there a better solution?
OS Windows
ArangoDB 2.8.7
The V8 version used in pre-3.0 ArangoDB had occasional trouble in its garbage collection, which would in turn make ArangoDB go down.
This is fixed with ArangoDB 3.
Please upgrade your installation, and report back whether the problem still persists.
You can use netstat to check whether ArangoDB is listening on its default port 8529:
netstat -a
Active Connections
Proto  Local Address          Foreign Address        State
...
TCP 127.0.0.1:8529 meschenich:0 LISTEN
...
If that's not the case, your client has nothing to connect to.
This could be due to the firewall of an antivirus.
In my case it was Avast antivirus that was blocking connections to that port.
I disabled all the antivirus shields and checked loading the ArangoDB web server at
http://127.0.0.1:8529
It connected after a few minutes.
Reference : No connection could be made because the target machine actively refused it
I fixed the problem by restarting Windows.
I am trying to create a standalone SCTP Diameter client using jDiameter. The jar libraries I am using are jdiameter-api-1.5.9.0-build538-SNAPSHOT and jdiameter-impl-1.5.9.0-build538-SNAPSHOT.
But I get this error: Unable to create server socket for LocalPeer 'client.test.com' at 127.0.0.1:55555 (org.mobicents.protocols.api.AssociationListener)
It works fine with TCP. I tried to debug but couldn't figure out the problem. Kindly help me with this.
SCTP will not work on Windows systems. For Linux systems, you might have to install the SCTP stack. However, be aware that on some Linux distributions you might run into strange issues with it, e.g. the port still being blocked even after all server sockets and client sockets are closed and the processes have been shut down or killed. In these cases, you need to wait about 5-10 minutes until the SCTP stack recognizes that no one is interested in that port anymore and releases it by itself.
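If you just want to verify whether the JVM can see an SCTP stack at all before digging into jDiameter, a quick check along these lines works (a sketch, assuming the com.sun.nio.sctp API that ships with Oracle/OpenJDK):

import com.sun.nio.sctp.SctpChannel;

public class SctpCheck {
    public static void main(String[] args) {
        // Opening an SCTP channel is enough to see whether the OS exposes an SCTP stack.
        try (SctpChannel channel = SctpChannel.open()) {
            System.out.println("SCTP appears to be available on this system.");
        } catch (UnsupportedOperationException e) {
            // Thrown on Windows and on Linux hosts without the kernel SCTP module.
            System.out.println("SCTP is not supported here: " + e.getMessage());
        } catch (java.io.IOException e) {
            System.out.println("SCTP stack present, but channel setup failed: " + e.getMessage());
        }
    }
}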