Concurrent TCP connections in Linux

I want to know whether I have hit the maximum number of connections allowed on my Linux-based server.
# netstat -an | grep -i time | wc -l
1116
# netstat -an | grep -i estab | wc -l
2137
TCP parameters at the kernel level are as follows:
# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
# cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
TIME_WAIT connections are from the load balancer IP (199.X.X.02)
tcp 0 0 199.X.X.05:8280 199.X.X.02:51884 TIME_WAIT
How can I know whether I have hit the maximum limit? Are there any kernel parameters that will tell me the current number of open connections? Also, how do I calculate the maximum number of concurrent connections that is supported?

"Any kernel parameters which will will tell me the current no of open connections".
Partial answer:
You should be able to see the list of your open sockets under /proc/PID/fd (where PID is your process's PID, of course).
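For example, a minimal sketch of a few checks (assuming you know the server process's PID; replace PID accordingly):
ls /proc/PID/fd | wc -l                           # files and sockets open in that one process
grep 'open files' /proc/PID/limits                # per-process fd limit; each socket costs one fd
ss -s                                             # system-wide socket summary, including TIME_WAIT counts
cat /proc/sys/fs/file-nr /proc/sys/fs/file-max    # file handles in use vs. the system-wide limit
Note that there is no single kernel parameter for "maximum concurrent connections": inbound connections are bounded mainly by file descriptors and memory, while the 32768-61000 ephemeral port range only caps outbound connections from one local IP to one destination IP and port at roughly 28000 at a time.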

Related

nftables rule for queueing udp packets to userspace libnetfilter_queue based program

I have an iptables rule for queueing output packets to a userspace program. The iptables rule works fine and I see the packets being forwarded to the userspace program for processing/mangling.
iptables -A OUTPUT -p udp --dport 8080 -j NFQUEUE --queue-num 0
Since we are planning to use nftables going forward, I used iptables-translate to convert this rule to an nft rule, as below.
iptables-translate -A OUTPUT -p udp --dport 8080 -j NFQUEUE --queue-num 0
nft add rule ip filter OUTPUT udp dport 8080 counter queue num 0
If I add this rule to nftables, I see the following error.
sudo nft add rule ip filter OUTPUT udp dport 8080 counter queue num 0
Error: Could not process rule: No such file or directory
add rule ip filter OUTPUT udp dport 8080 counter queue num 0
^^^^^^
So I did the following so that the queue rule could be added to nftables.
nft add table ip filter
nft add chain ip filter OUTPUT
nft add rule ip filter OUTPUT udp dport 8080 counter queue num 0
Now I see the rule added to nftables...
nft list ruleset
table ip filter {
    chain OUTPUT {
        udp dport 8080 counter packets 0 bytes 0 queue num 0
    }
}
With the above rule in place in nftables, I don't see packets being forwarded to my userspace program, which uses the libnetfilter_queue library.
Please let me know what I am missing in the nftables queue rule.
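One observation that may be the missing piece (a guess based on the listing above, not a confirmed fix): nft add chain ip filter OUTPUT creates a regular chain with no hook, so no packets ever traverse it, unlike iptables where OUTPUT is a built-in chain. A base chain has to declare its hook, along these lines:
nft add table ip filter
nft add chain ip filter OUTPUT '{ type filter hook output priority 0 ; }'
nft add rule ip filter OUTPUT udp dport 8080 counter queue num 0
With the hook in place, nft list ruleset should show a "type filter hook output priority 0;" line inside the chain, and the rule's counter should start increasing.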

How to fix "Connection refused" error on ACME certificate challenge with cookiecutter-django

I have created a simple website using cookiecutter-django (using the latest master cloned today). Running the docker-compose setup locally works. Now I would like to deploy the site on DigitalOcean. To do this, I run the following commands:
$ docker-machine create -d digitalocean --digitalocean-access-token=secret instancename
$ eval "$(docker-machine env instancename)"
$ sudo docker-compose -f production.yml build
$ sudo docker-compose -f production.yml up
In the cookiecutter-django documentation I read
If you are not using a subdomain of the domain name set in the project, then remember to put your staging/production IP address in the DJANGO_ALLOWED_HOSTS environment variable (see Settings) before you deploy your website. Failure to do this will mean you will not have access to your website through the HTTP protocol.
Therefore, in the file .envs/.production/.django I changed the line with DJANGO_ALLOWED_HOSTS from
DJANGO_ALLOWED_HOSTS=.example.com (instead of example.com I use my actual domain)
to
DJANGO_ALLOWED_HOSTS=XXX.XXX.XXX.XX
(with XXX.XXX.XXX.XX being the IP of my digital ocean droplet; I also tried DJANGO_ALLOWED_HOSTS=.example.com and DJANGO_ALLOWED_HOSTS=.example.com,XXX.XXX.XXX.XX with the same outcome)
In addition, I logged in to where I registered the domain and made sure to point the A-Record to the IP of my digital ocean droplet.
With this setup the deployment does not work. I get the following error message:
traefik_1 | time="2019-03-29T21:32:20Z" level=error msg="Unable to obtain ACME certificate for domains \"example.com\" detected thanks to rule \"Host:example.com\" : unable to generate a certificate for the domains [example.com]: acme: Error -> One or more domains had a problem:\n[example.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://example.com/.well-known/acme-challenge/example-key-here: Connection refused, url: \n"
Unfortunately, I was not able to find a solution for this problem. Any help is greatly appreciated!
Update
When I run netstat -antp on the server as suggested in the comments I get the following output (IPs replaced with placeholders):
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:48923 SYN_RECV -
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:17195 FIN_WAIT1 -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 ESTABLISHED 16958/sshd: [accept
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
When I run $ sudo docker-compose -f production.yml up before, netstat -antp returns this:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 0 XXX.XXX.XXX.XX:22 AA.AAA.AAA.A:50098 ESTABLISHED 17046/sshd: [accept
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:55652 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:16750 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:31541 SYN_RECV -
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 FIN_WAIT1 -
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
In my experience, the Droplets come configured as cookiecutter-django needs them; the ports are open properly, so unless you closed them you shouldn't have to do anything.
Usually, when this error happens, it's due to a DNS configuration issue: basically, Let's Encrypt was not able to reach your server using the domain example.com. Unfortunately, you're not giving us the actual domain you've used, so I'll try to guess.
You said you've configured an A record to point to your droplet, which is what you should do. However, this change needs to propagate to most name servers, which may take time. It might already be propagated for you, but if the name server used by Let's Encrypt hasn't picked it up yet, issuing your TLS certificate will fail.
You can check how well it's propagated using an online tool which checks multiple name servers at once, like https://dnschecker.org/.
From your machine, you can do so using dig (for people interested, I recommend this video):
# Using your default name server
dig example.com
# Using 1.1.1.1 as name server
dig @1.1.1.1 example.com
Hope that helps.
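One more check worth making (a sketch; adjust to whatever your production.yml actually defines): "Connection refused" means the challenge reached some IP but nothing answered on port 80 there, so confirm that traefik is actually listening once the stack is up.
ss -ltn | grep -E ':80 |:443 '                                  # run on the droplet; should show listeners on 80 and 443
curl -v http://example.com/.well-known/acme-challenge/test     # run from elsewhere; any HTTP response is fine, refused/timeout is not
If nothing is listening, the traefik container is not up or not publishing its ports, and no amount of DNS propagation will fix the challenge.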

Gitlab exceeds numtcpsock beancounter limit (OpenVZ)

What is the best way to find the problem with GitLab (the only application used on an Ubuntu Plesk Onyx server)? Most of the time, when I look at /proc/user_beancounters, the numtcpsock value is in a normal range (< 100), but sometimes some GitLab processes seem to exceed the numtcpsock limit (3000) more than 2300 times, and then the virtual server (OpenVZ) crashes.
I have already limited the Redis and PostgreSQL connections in /etc/gitlab/gitlab.rb:
postgresql['shared_buffers'] = "30MB"
postgresql['max_connections'] = 100
redis['maxclients'] = "500"
redis['tcp_timeout'] = "20"
redis['tcp_keepalive'] = "10"
sudo gitlab-ctl reconfigure && sudo gitlab-ctl restart
But that does not seem to prevent the server crashes. I need an approach to fix this problem. Do you have any ideas?
Edit:
The server is only used by about 3-5 people. netstat -pnt | wc -l returns about 49 TCP connections, and cat /proc/user_beancounters shows numtcpsock 33 at the moment. All of them except my SSH connection are on the local IP.
Here some examples:
tcp 0 0 127.0.0.1:47280 127.0.0.1:9168 TIME_WAIT -
tcp 0 0 127.0.0.1:9229 127.0.0.1:34810 TIME_WAIT -
tcp 0 0 127.0.0.1:9100 127.0.0.1:45758 TIME_WAIT -
tcp 0 0 127.0.0.1:56264 127.0.0.1:8082 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43670 TIME_WAIT -
tcp 0 0 127.0.0.1:9121 127.0.0.1:41636 TIME_WAIT -
tcp 0 0 127.0.0.1:9236 127.0.0.1:42842 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43926 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:44538 TIME_WAIT -
A firewall and fail2ban with many jails (SSH etc.) are also active on the server.
The numtcpsock value is the number of TCP sockets in use by your OpenVZ virtual server. Exceeding that limit wouldn't crash your server, but it would prevent any new TCP sockets from being created, and if you only have remote access to the virtual server you would effectively be locked out.
I am not sure how GitLab would be reaching your numtcpsock limit of 3000 unless you have a couple of hundred concurrent users. If that is the case, you would simply need to raise your numtcpsock limit.
The more likely cause of your numtcpsock issues, if you have a public IP address, is excessive connections to SSH, HTTP, or some other popular TCP service that attackers like to probe.
When you are having numtcpsock issues, you would want to check the output of netstat -pnt to see what TCP connections are open on your server. That output will show who is connected and on which port.
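For example, a rough one-liner (a sketch; assumes the Linux net-tools netstat and IPv4 peers) that counts established connections per remote address, which makes a single noisy host easy to spot:
netstat -pnt | awk 'NR>2 {split($5, a, ":"); print a[1]}' | sort | uniq -c | sort -rn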
To prevent excessive TCP connections in the first place, if the problem is indeed gitlab, make sure that it is not configured in a way that will eat all your available connections. If the issue turns out to be caused by external connections that you do not want, make sure you have some reasonable firewall rules in place or a tool like fail2ban to do it for you.
Edit: explanation of the netstat flags used in this answer (taken from the netstat man page on Ubuntu 16.04)
-p, --program: show the PID and program to which each socket belongs
-l, --listening: show only listening sockets
-n, --numeric: show numerical addresses instead of trying to determine symbolic host, port or user names
-t, --tcp: show only TCP sockets

rails: port is in use or requires root privileges

I am getting this error when attempting to start up a rails 4.1.1 server:
Listening on 0.0.0.0:3000, CTRL+C to stop
Exiting
/Users/darrenburgess/.rvm/gems/ruby-2.1.2#myflix/gems/eventmachine-1.0.0/lib/eventmachine.rb:526:in `start_tcp_server': no acceptor (port is in use or requires root privileges) (RuntimeError)
I have tried the following commands to find and kill the process; however, none of them reveal any server running on port 3000:
ps ax | grep rails
ps ax | grep ruby
lsof -i TCP | grep LISTEN
lsof -i :3000
These, from my research on Stack Overflow, seem to be all of the available methods for discovering what is running on a port.
In a rails 5 application I am getting the following similar error:
Listening on tcp://0.0.0.0:3000
Exiting
/Users/darrenburgess/.rvm/gems/ruby-2.3.1/gems/puma-3.7.0/lib/puma/binder.rb:269:in `initialize': Address already in use - bind(2) for "0.0.0.0" port 3000 (Errno::EADDRINUSE)
Note that I can start rails servers on other ports.
This error persists even after a machine reboot. It seems I have exhausted all avenues for finding and killing whatever is using the port. What else can I try?
UPDATE:
@hjpotter92 suggests running:
netstat -lntp | grep 3000
This, however, does not work, as a protocol argument is required for the -p option. According to man netstat, the list of protocols is found in /etc/protocols.
I looked in that file and found that tcp is a listed protocol. However, this command does not return any output:
netstat -lntp tcp | grep 3000
Nor does this command return anything either:
netstat -lnt | grep 3000
You can try scanning the port with lsof -i :3000 and then kill the process using sudo kill -9 <PID>.
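If lsof does find a PID, the two steps can be combined into one line (a sketch; lsof -t prints only the PIDs, and -9 should remain a last resort):
kill -9 $(lsof -ti :3000)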
Well, it turns out the answer to this is fairly obscure. The Node instance of FileMaker Server 16 was running on port 3000; I was running a FileMaker server on my Rails development machine.
This command helped to discover that:
sudo lsof -P -i :3000
Result
node 562 fmserver 20u IPv6 0x3ef1908b38776fe5 0t0 TCP *:3000 (LISTEN)
I could have killed that process, but elected instead to disable the Node instance (the FileMaker REST/Data API).
Documentation here shows that FileMaker 16 is using that port.
http://help.filemaker.com/app/answers/detail/a_id/16319

Meaning of SS command output with 3 colons (':::')?

The increasingly popular ss command (/usr/sbin/ss on RHEL) is a replacement for netstat.
I'm trying to parse the output in Python and I'm seeing some odd data that is not explained in the documentation.
$ ss -an | head
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 0 :::14144 :::*
LISTEN 0 0 127.0.0.1:32000 *:*
LISTEN 0 0 :::3233 :::*
LISTEN 0 0 *:5634 *:*
LISTEN 0 0 :::5634 :::*
So, it's obvious what the local address means when it's 127.0.0.1:32000: listening on the loopback interface on port 32000. But what do the 3 colons ::: mean?
Really, I can figure it's two extra colons, since the format is host:port, so what does a host of two colons (::) mean?
I should mention I'm running this on a RHEL/CENTOS box:
Linux boxname 2.6.18-348.3.1.el5 #1 SMP somedate x86_64 x86_64 x86_64 GNU/Linux
This is not explained in any of the online man pages or other discussions I can find.
That's the abbreviated IPv6 address notation: a double colon stands for one or more consecutive groups of zeros, and here :: is the all-zeros (unspecified) address; the third colon is just the usual separator before the port.
:::14144 would be read as 0000:0000:0000:0000:0000:0000:0000:0000 port 14144, i.e. listening on all addresses on port 14144.
:::* would be read as 0000:0000:0000:0000:0000:0000:0000:0000, any port, i.e. all addresses with any port.
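A practical note if you are parsing this output: since IPv6 addresses themselves contain colons, split the address and port at the last colon only (in Python that would be rsplit(':', 1)). A quick shell sketch against the output above, assuming the local address is the fourth column as in your header:
ss -an | awk 'NR>1 {n=split($4, p, ":"); port=p[n]; host=substr($4, 1, length($4)-length(port)-1); print host, port}'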
