Gitlab exceeds numtcpsock beancounter limit (OpenVZ)

What is the best way to find out where the problem with GitLab lies (the only application used on this Ubuntu Plesk Onyx server)? Whenever I look at /proc/user_beancounters, the numtcpsock value is in its normal state (< 100), but sometimes some GitLab processes seem to exceed the numtcpsock limit (3000) more than 2300 times, and the virtual server (OpenVZ) crashes.
I have already limited the Redis and PostgreSQL connections in /etc/gitlab/gitlab.rb:
postgresql['shared_buffers'] = "30MB"
postgresql['max_connections'] = 100
redis['maxclients'] = "500"
redis['tcp_timeout'] = "20"
redis['tcp_keepalive'] = "10"
and applied the changes with:
sudo gitlab-ctl reconfigure && sudo gitlab-ctl restart
But that does not seem to prevent the server crashes. I need an approach to fix this problem. Do you have any ideas?
Edit:
The server is only used by about 3-5 people. netstat -pnt | wc -l returns about 49 TCP connections, and cat /proc/user_beancounters shows numtcpsock at 33 at the moment. All of them except my SSH connection are on the local IP.
Here are some examples:
tcp 0 0 127.0.0.1:47280 127.0.0.1:9168 TIME_WAIT -
tcp 0 0 127.0.0.1:9229 127.0.0.1:34810 TIME_WAIT -
tcp 0 0 127.0.0.1:9100 127.0.0.1:45758 TIME_WAIT -
tcp 0 0 127.0.0.1:56264 127.0.0.1:8082 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43670 TIME_WAIT -
tcp 0 0 127.0.0.1:9121 127.0.0.1:41636 TIME_WAIT -
tcp 0 0 127.0.0.1:9236 127.0.0.1:42842 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:43926 TIME_WAIT -
tcp 0 0 127.0.0.1:9090 127.0.0.1:44538 TIME_WAIT -
A firewall and fail2ban with many jails (SSH etc.) are also active on the server.

The numtcpsock value is the number of TCP connections on your OpenVZ virtual server. Exceeding it wouldn't crash your server, but it would prevent any new TCP sockets from being created, and if you only have remote access to the virtual server you would effectively be locked out.
I am not sure how GitLab would be reaching your numtcpsock maximum of 3000 unless you have a couple hundred concurrent users. If that is the case, you would simply need to raise your numtcpsock limit.
The more likely cause of your numtcpsock issues, if you have a public IP address, is excessive connections to SSH, HTTP or some other popular TCP service that hackers like to probe.
When you are having numtcpsock issues, you would want to check the output of netstat -pnt to see what TCP connections are open on your server. That output will show who is connected and on which port.
To prevent excessive TCP connections in the first place, if the problem is indeed GitLab, make sure it is not configured in a way that eats all your available connections. If the issue turns out to be caused by external connections that you do not want, make sure you have some reasonable firewall rules in place, or a tool like fail2ban to manage them for you.
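If you want a quick overview of what is holding sockets the next time the counter spikes, you can group the connections by state and by peer. Here is a minimal Python sketch of that idea; it assumes ss from iproute2 is available (netstat output can be parsed the same way):

import subprocess
from collections import Counter

# -t TCP, -a all sockets, -n numeric addresses
out = subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout

states = Counter()
peers = Counter()
for line in out.splitlines()[1:]:            # skip the header line
    fields = line.split()
    if len(fields) < 5:
        continue
    states[fields[0]] += 1                   # e.g. ESTAB, TIME-WAIT
    peers[fields[4].rsplit(":", 1)[0]] += 1  # peer address without the port

print("By state:", states.most_common())
print("Top peers:", peers.most_common(10))

Run it while the problem is happening; the peer column tells you whether the connections are local (GitLab's own services talking to each other) or external.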
Edit: Explanation of netstat flags used in answer (taken from netstat man page in Ubuntu 16.04)
-p, --program: show the PID and program to which each socket belongs
-l, --listening: show only listening sockets
-n, --numeric: show numerical addresses instead of trying to determine symbolic host, port or user names
-t, --tcp: show only TCP sockets

Related

docker container not able to reach some of host's ports

I have a stack with docker-compose running on a VM.
Here is a sample output of netstat -tulpn on the VM:
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:9839 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8484 0.0.0.0:* LISTEN
The container is able to communicate with port 9839 (using 172.17.0.1) but not with port 8484.
Why is that?
That's because the program listening on port 8484 is bound to 127.0.0.1 meaning that it'll only accept connections from localhost.
The one listening on 9839 has bound to 0.0.0.0 meaning it'll accept connections from anywhere.
To make the one listening on 8484 accept connections from anywhere, you need to change what it binds to. If it's something you've written yourself, you can change it in code. If it's not, there's probably a configuration setting you can set.
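To illustrate the difference, here is a minimal Python sketch of the two bind modes (the port numbers are simply the ones from the question):

import socket

def listen_on(host, port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))       # the bind address is what matters here
    s.listen()
    return s

local_only = listen_on("127.0.0.1", 8484)  # reachable from localhost only
everywhere = listen_on("0.0.0.0", 9839)    # reachable on every interface, incl. docker0

Only the second socket shows up as 0.0.0.0:9839 in netstat, which is why the container can reach it through the bridge gateway.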

How to fix "Connection refused" error on ACME certificate challenge with cookiecutter-django

I have created a simple website using cookiecutter-django (using the latest master cloned today). Running the docker-compose setup locally works. Now I would like to deploy the site on digital ocean. To do this, I run the following commands:
$ docker-machine create -d digitalocean --digitalocean-access-token=secret instancename
$ eval "$(docker-machine env instancename)"
$ sudo docker-compose -f production.yml build
$ sudo docker-compose -f production.yml up
In the cookiecutter-django documentation I read
If you are not using a subdomain of the domain name set in the project, then remember to put your staging/production IP address in the DJANGO_ALLOWED_HOSTS environment variable (see Settings) before you deploy your website. Failure to do this will mean you will not have access to your website through the HTTP protocol.
Therefore, in the file .envs/.production/.django I changed the line with DJANGO_ALLOWED_HOSTS from
DJANGO_ALLOWED_HOSTS=.example.com (instead of example.com I use my actual domain)
to
DJANGO_ALLOWED_HOSTS=XXX.XXX.XXX.XX
(with XXX.XXX.XXX.XX being the IP of my digital ocean droplet; I also tried DJANGO_ALLOWED_HOSTS=.example.com and DJANGO_ALLOWED_HOSTS=.example.com,XXX.XXX.XXX.XX with the same outcome)
In addition, I logged in to where I registered the domain and made sure to point the A-Record to the IP of my digital ocean droplet.
With this setup the deployment does not work. I get the following error message:
traefik_1 | time="2019-03-29T21:32:20Z" level=error msg="Unable to obtain ACME certificate for domains \"example.com\" detected thanks to rule \"Host:example.com\" : unable to generate a certificate for the domains [example.com]: acme: Error -> One or more domains had a problem:\n[example.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://example.com/.well-known/acme-challenge/example-key-here: Connection refused, url: \n"
Unfortunately, I was not able to find a solution for this problem. Any help is greatly appreciated!
Update
When I run netstat -antp on the server as suggested in the comments I get the following output (IPs replaced with placeholders):
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:48923 SYN_RECV -
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:17195 FIN_WAIT1 -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 ESTABLISHED 16958/sshd: [accept
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
When I run sudo docker-compose -f production.yml up beforehand, netstat -antp returns this:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 0 XXX.XXX.XXX.XX:22 AA.AAA.AAA.A:50098 ESTABLISHED 17046/sshd: [accept
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:55652 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:16750 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:31541 SYN_RECV -
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 FIN_WAIT1 -
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
In my experience, the Droplets are configured as needed by cookiecutter-django and the ports are open properly, so unless you closed them, you shouldn't have to do anything.
Usually, when this error happens, it's due to a DNS configuration issue: Let's Encrypt was not able to reach your server using the domain example.com. Unfortunately, you're not giving us the actual domain you've used, so I'll have to guess.
You said you've configured an A record to point to your droplet, which is what you should do. However, this change needs to propagate to most of the name servers, which may take time. It might look propagated from where you sit, but if the name server used by Let's Encrypt isn't up to date yet, issuing your TLS certificate will fail.
You can check how well it's propagated using an online tool which checks multiple name servers at once, like https://dnschecker.org/.
From your machine, you can do so using dig:
# Using your default name server
dig example.com
# Using 1.1.1.1 as name server
dig @1.1.1.1 example.com
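If you prefer to script the check, here is a minimal Python sketch using only the standard library. Note that it only queries your default resolver, not the one Let's Encrypt uses, and the domain and IP below are placeholders for your actual values:

import socket

DOMAIN = "example.com"          # your actual domain
EXPECTED_IP = "XXX.XXX.XXX.XX"  # your droplet's IP

# Collect every address the resolver returns for the domain
resolved = {info[4][0] for info in socket.getaddrinfo(DOMAIN, 80, proto=socket.IPPROTO_TCP)}
print(DOMAIN, "resolves to:", resolved)
print("Matches droplet:", EXPECTED_IP in resolved)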
Hope that helps.

Docker run cannot publish port range despite netstat indicates that ports are available

I am trying to run a Docker image from inside Google Cloud Shell (i.e. on a courtesy Google Compute Engine instance) as follows:
docker run -d -p 20000-30000:10000-20000 -it <image-id> bash -c bash
Prior to this step, netstat -tuapn reported the following:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:8998 0.0.0.0:* LISTEN 249/python
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13080 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13081 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:34490 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13082 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13083 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13084 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:34490 127.0.0.1:48161 ESTABLISHED -
tcp 0 252 172.17.0.2:22 173.194.92.34:49424 ESTABLISHED -
tcp 0 0 127.0.0.1:48161 127.0.0.1:34490 ESTABLISHED 15784/python
tcp6 0 0 :::22 :::* LISTEN -
So it looks to me as if all the ports between 20000 and 30000 are available, but the run is nevertheless terminated with the following error message:
Error response from daemon: Cannot start container :
failed to create endpoint on network bridge: Timed out
proxy starting the userland proxy
What's going on here? How can I obtain more diagnostic information and ultimately solve the problem (i.e. get my Docker image to run with the whole port range available)?
Opening up a range of ports doesn't currently scale well in Docker. The command above will result in 10,000 docker-proxy processes being spawned, one per port, plus all the file descriptors needed to support those processes and a long list of firewall rules. At some point you'll hit a resource limit on either file descriptors or processes. See issue 11185 on GitHub for more details.
The only workaround when running on a host you control is to not allocate the ports and to update the firewall rules manually. I'm not sure that's even an option with GCE. The best solution is to redesign your requirements to keep the port range small. The last option is to bypass the bridge network entirely and run on the host network with --net=host, where there are no proxies and no per-port firewall rules. The latter removes any network isolation you have in the container, so it tends to be recommended against.
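Incidentally, netstat only shows you ports that are already in use; it doesn't prove the whole range is bindable. If you want to verify that, a quick Python sketch (run on the host itself; this says nothing about the per-port docker-proxy overhead described above):

import socket

def busy_ports(start, stop):
    busy = []
    for port in range(start, stop):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("0.0.0.0", port))   # fails if something already owns the port
        except OSError:
            busy.append(port)
        finally:
            s.close()
    return busy

print("Ports already in use:", busy_ports(20000, 30000))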

How to judge a port is open or closed

How can I tell whether a port is open or closed? What is the exact meaning of an open port and a closed port?
My favorite tool to check whether a specific port is open or closed is telnet. You'll find this tool on practically every operating system.
The syntax is: telnet <hostname/ip> <port>
This is what it looks like if the port is open:
telnet localhost 3306
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
This is what it looks like if the port is closed:
telnet localhost 9999
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
Based on your use case, you may need to do this from a different machine, just to rule out firewall rules being an issue. For example, just because I am able to telnet to port 3306 locally doesn't mean that other machines are able to access port 3306. They may see it as closed due to firewall rules.
As for what open and closed ports mean: an open port allows data to be sent to a program listening on that port. In the examples above, port 3306 is open; MySQL server is listening on that port, which allows MySQL clients to connect to the MySQL database and issue queries and so on.
There are other tools to check the status of multiple ports. You can Google for Port Scanner along with the OS you are using for additional options.
A port that's open is a port to which you can connect (TCP) or send data (UDP). It is open because a process opened it.
There are many different types of ports. Those used on the Internet are TCP and UDP ports.
To see the list of existing connections you can use netstat (available under Unix and MS-Windows). Under Linux, we have the -l (--listening) command line option to limit the list to opened ports (i.e. listening ports).
> netstat -n64l
...
tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN
...
udp 0 0 0.0.0.0:53 0.0.0.0:*
...
raw 0 0 0.0.0.0:1 0.0.0.0:* 7
...
In my example, I show a TCP port 6000 opened. This is generally for X11 access (so you can open windows between computers.)
The other port, 53, is a UDP port used by the DNS system. Notice that UDP ports are "just open": you can always send packets to them, but you cannot establish a client/server connection the way you do with TCP. Hence, in this case, you do not see the LISTEN state.
The last entry here is a raw socket; the 1 in 0.0.0.0:1 is the IP protocol number (1 is ICMP), not a port.
Update:
Since then netstat has been somewhat deprecated and you may want to learn about ss instead:
ss -l4n
-- or --
ss -l6n
Unfortunately, at the moment you have to select either -4 or -6 for the corresponding stack (IPv4 or IPv6).
If you're interested in writing C/C++ code or the like, you can read that information from /proc/net/.... For example, the TCP connections are found here:
/proc/net/tcp (IPv4)
/proc/net/tcp6 (IPv6)
Similarly, you'll find the UDP files and a unix file there.
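For example, here is a minimal Python sketch that lists listening IPv4 sockets by parsing /proc/net/tcp directly. The addresses in that file are hex-encoded 32-bit values in host byte order, hence the byte swap below (this assumes a little-endian machine, which is the common case):

import socket
import struct

with open("/proc/net/tcp") as f:
    lines = f.readlines()[1:]          # skip the header line

for line in lines:
    fields = line.split()
    local, state = fields[1], fields[3]
    if state != "0A":                  # 0A is TCP_LISTEN
        continue
    ip_hex, port_hex = local.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    print("LISTEN", ip + ":" + str(int(port_hex, 16)))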
Programmatically, if you are only checking one port then you can just attempt a connection. If the port is open, then it will connect. You can then close the connection immediately.
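In Python, for example, that boils down to a few lines (reusing the MySQL port from the telnet example above):

import socket

def port_is_open(host, port, timeout=3.0):
    try:
        # create_connection attempts a TCP connect and raises on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True                # connected, so the port is open
    except OSError:                    # refused, timed out, unreachable, ...
        return False

print(port_is_open("localhost", 3306))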
Finally, there is the kernel's direct socket interface for socket diagnostics, like so:
#include <sys/socket.h>    /* socket(), SOCK_RAW, SOCK_CLOEXEC, SOCK_NONBLOCK */
#include <linux/netlink.h> /* AF_NETLINK, NETLINK_SOCK_DIAG */

/* Open a netlink socket to the sock_diag subsystem. */
int s = socket(
      AF_NETLINK
    , SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK
    , NETLINK_SOCK_DIAG);
The main problem I have with that one is that it does not really send you events when something changes. You can, however, read the current state into structures, which is safer than attempting to parse files in /proc/....
I have some code handling such a socket in my eventdispatcher library. It still has to poll to get the data, since the kernel does not generate events on its own (a push would be much better, since it would only have to happen when an event actually occurs).

Does the port change when a server accepts a TCP connection?

When a client connects to a server using TCP, a new socket is created for the TCP stream. Does the connection remain on the port the connection was made to, or does it get moved to some other port?
The new socket is an application-level concept introduced because each established connection needs a unique file descriptor (also distinct from the listening file descriptor), which maps to, but isn't the same as, a TCP session. The session itself is identified by the combination of source and destination address and port. The source (client) port is usually chosen at random, while the destination (server) port is the listen port. No additional port is allocated.
The server uses the same port to listen for and accept new connections and to communicate with the remote client.
Let me give you an example (on a Linux system):
First, start an HTTP server with Python:
xiongyu@ubuntu:~$ sudo python -m SimpleHTTPServer 500
Serving HTTP on 0.0.0.0 port 500 ...
Second, use the nc command to connect to the HTTP server; here we start two clients with:
xiongyu@ubuntu:~$ nc 0.0.0.0 500
Use netstat to see the state of port 500:
xiongyu@ubuntu:~$ netstat -natp | grep ':500'
tcp 0 0 0.0.0.0:500 0.0.0.0:* LISTEN 54661/python
tcp 0 0 127.0.0.1:51586 127.0.0.1:500 ESTABLISHED 57078/nc
tcp 0 0 127.0.0.1:51584 127.0.0.1:500 ESTABLISHED 54542/nc
tcp 0 0 127.0.0.1:500 127.0.0.1:51586 ESTABLISHED -
tcp 0 0 127.0.0.1:500 127.0.0.1:51584 ESTABLISHED 54661/python
As you can see, the HTTP server uses port 500 to LISTEN for clients, and after a new client has connected it still uses port 500 to communicate with that client, just through a new file descriptor.
The socket associated with the new descriptor returned by accept on the server will use the same port on the server side of the connection as the original socket (assuming "normal" definitions where the client initiates the connection). The new socket will have a different client port number (the remote port from the server's point of view).
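You can verify this with a few lines of Python (port 5000 is an arbitrary choice):

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 5000))
server.listen()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 5000))   # completes thanks to the listen backlog

conn, peer = server.accept()
print("listening socket:", server.getsockname())  # ('127.0.0.1', 5000)
print("accepted socket: ", conn.getsockname())    # ('127.0.0.1', 5000), same port
print("client socket:   ", client.getsockname())  # ('127.0.0.1', <ephemeral port>)

The accepted socket reports the same local port as the listening one; only the client side gets a fresh ephemeral port.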
