Securing Redis with Stunnel on Docker Swarm

I have added stunnel to a Redis container and a PHP-FPM container to securely transfer application data between services on a Docker swarm cluster. I haven't been able to find any other similar questions, so I'm wondering if I'm taking the wrong approach here.
I have this working in my local environment; it's when I deploy it to the swarm that it fails.
Problem
When I try to ping from the client container by executing redis-cli -p 8001 ping, I get the following error: Error: Connection reset by peer
When I take a look at the logs for stunnel, I can see that it accepted the connection on the client side and then failed when attempting to forward it to the Redis server container, as seen below:
2018.05.19 16:42:39 LOG5[ui]: Configuration successful
2018.05.19 16:45:19 LOG7[0]: Service [redis-client] started
2018.05.19 16:45:19 LOG5[0]: Service [redis-client] accepted connection from 127.0.0.1:41710
2018.05.19 16:45:19 LOG6[0]: s_connect: connecting 10.0.0.5:6379
2018.05.19 16:45:19 LOG7[0]: s_connect: s_poll_wait 10.0.0.5:6379: waiting 10 seconds
2018.05.19 16:45:19 LOG3[0]: s_connect: connect 10.0.0.5:6379: Connection refused (111)
2018.05.19 16:45:19 LOG5[0]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2018.05.19 16:45:19 LOG7[0]: Local descriptor (FD=3) closed
2018.05.19 16:45:19 LOG7[0]: Service [redis-client] finished (0 left)
Configuration Details
Here's the stunnel configuration on the Redis server
pid = /run/stunnel-redis.pid
output = /tmp/stunnel.log
[redis-server]
cert = /etc/stunnel/redis-server.crt
key = /etc/stunnel/redis-server.key
accept = redis_master:6379
connect = 127.0.0.1:6378
And here's the stunnel configuration for the client
pid = /run/stunnel-redis.pid
output = /tmp/stunnel.log
[redis-client]
client = yes
accept = 127.0.0.1:8001
connect = redis_master:6379
CAfile = /etc/stunnel/redis-server.crt
verify = 4
debug = 7
This is what my docker-stack.yml file looks like for these two services:
php_fpm:
  build:
    context: .
    dockerfile: fpm.Dockerfile
  image: registry.github.com/hidden
  ports:
    - "8001"
redis_master:
  build:
    context: .
    dockerfile: redis.Dockerfile
  image: registry.github.com/hidden
  ports:
    - "6378"
    - "6379"
  sysctls:
    - net.core.somaxconn=511
  volumes:
    - redis-data:/data
Output of netstat -plunt in the fpm client container
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN 208/stunnel4
tcp 0 0 127.0.0.11:45281 0.0.0.0:* LISTEN -
tcp6 0 0 :::9000 :::* LISTEN 52/php-fpm.conf)
udp 0 0 127.0.0.11:43781 0.0.0.0:* -
Output of netstat -plunt in the redis server container
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.11:39294 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:6378 0.0.0.0:* LISTEN 8/redis-server *:63
tcp 0 0 10.0.0.14:6379 0.0.0.0:* LISTEN 37/stunnel4
tcp6 0 0 :::6378 :::* LISTEN 8/redis-server *:63
udp 0 0 127.0.0.11:44855 0.0.0.0:* -
I've confirmed there is no firewall active on the host machine. These services are currently on the same host, but they will soon be on separate hosts, hence the need for stunnel.
These services are deployed with the docker stack command so an overlay network is automatically created and attached to both of these services.
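For reference, service names can be checked from inside either container on the overlay; a minimal sketch (the tasks.<service> lookup is standard swarm DNS, and the example IPs are the ones appearing in the logs and netstat output above):
# Run inside a container attached to the overlay network:
getent hosts redis_master        # the service VIP, e.g. 10.0.0.5
getent hosts tasks.redis_master  # the individual task/container IP(s), e.g. 10.0.0.14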
Anyone have any thoughts on why the request from the client to the server is being refused?

FINALLY got this working! I hope this helps someone else. The problem was the stunnel configuration on the redis-server; the correct configuration is as follows:
[redis-server]
cert = /etc/stunnel/redis-server.crt
key = /etc/stunnel/redis-server.key
accept = 6379
connect = 6378
The problem appears to be that I had used the hostname redis_master in the accept option, which tied stunnel's listener to a single resolved address (the logs above show the client dialing 10.0.0.5 while stunnel was listening on 10.0.0.14). Switching it to only the port makes stunnel listen on all interfaces, which fixed the problem.
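To verify the fix end to end, a quick sketch reusing the commands from the question (PONG is the expected reply once the tunnel works):
# From the php_fpm container: plaintext enters the local stunnel on 8001,
# crosses the overlay encrypted to redis_master:6379, and the server-side
# stunnel forwards it to Redis on 6378.
redis-cli -p 8001 ping
# PONG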

Related

Opening a port to the world on AWS not working

I am having trouble accessing a service that is running in a docker container (port 5005) from the internet over TCP.
The server is an Ubuntu AWS EC2 instance with port 5005 open in the security group (both v4 and v6 addressing).
The docker processes are running fine, appearing to map the port from inside the container to the EC2 instance.
ubuntu@ip-172-31-5-89:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
71e620ea2969 rasa/rasa-sdk:latest "./entrypoint.sh sta…" 15 minutes ago Up 15 minutes 0.0.0.0:5055->5055/tcp, :::5055->5055/tcp emma_action_server_1
533010182ca7 rasa/rasa:latest-full "rasa run --enable-a…" 15 minutes ago Up 15 minutes 0.0.0.0:5005->5005/tcp, :::5005->5005/tcp emma_rasa_1
(Yes, 5005 and 5055 are both valid ports and not a typo, but only 5005 should be exposed to the EC2 instance and up through the firewall out to the web.)
ufw appears to be allowing the port fine.
Status: active
To Action From
-- ------ ----
5005/tcp ALLOW Anywhere
5005 ALLOW Anywhere
22 ALLOW Anywhere
5005/tcp (v6) ALLOW Anywhere (v6)
5005 (v6) ALLOW Anywhere (v6)
22 (v6) ALLOW Anywhere (v6)
and the ec2 instance appears to be listening fine:
ubuntu@ip-172-31-5-89:~$ sudo netstat -plunta | grep LISTEN
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 561/systemd-resolve
tcp 0 0 0.0.0.0:5055 0.0.0.0:* LISTEN 6473/docker-proxy
tcp 0 0 0.0.0.0:5005 0.0.0.0:* LISTEN 6451/docker-proxy
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 810/sshd: /usr/sbin
tcp6 0 0 :::5055 :::* LISTEN 6480/docker-proxy
tcp6 0 0 :::5005 :::* LISTEN 6458/docker-proxy
tcp6 0 0 :::22 :::* LISTEN 810/sshd: /usr/sbin
Yet, when I try accessing public.IP.address:5005 with any online port-checking tool, it says the port is closed. When I actually try to make a POST request via Postman, I get ETIMEDOUT, which I'm not sure is another way of saying it's closed or, in fact, just a timeout... but when I make the same POST request on the server, using local addressing, it works fine.
This works locally on EC2 (outside of the container):
curl -XPOST localhost:5005/webhooks/rest/webhook -d '{"message":"hi"}'
This doesn't work (ETIMEDOUT):
curl -XPOST publicIPAddressHere:5005/webhooks/rest/webhook -d '{"message":"hi"}'
The ACL and Network appear to be setup correctly also.
When I run the Reachability Analyzer, it works, but that's obviously coming from inside the network, from the private IP address (172...), so the issue is clearly with exposing the port to the world.
I was able to get this working by creating a fresh ec2 instance on its own VPC/ACL with the same configuration as above.
Not really an answer, as it is a workaround; gremlins in the system.
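For anyone hitting this, one hedged way to narrow down where packets are being dropped is to watch for inbound SYNs on the instance with standard tcpdump while connecting from outside:
sudo tcpdump -ni any 'tcp port 5005 and tcp[tcpflags] & tcp-syn != 0'
# SYNs show up here: the block is on the host (ufw or Docker).
# No SYNs at all: the block is upstream (security group, NACL, or VPC routing).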

How to fix "Connection refused" error on ACME certificate challenge with cookiecutter-django

I have created a simple website using cookiecutter-django (using the latest master cloned today). Running the docker-compose setup locally works. Now I would like to deploy the site on digital ocean. To do this, I run the following commands:
$ docker-machine create -d digitalocean --digitalocean-access-token=secret instancename
$ eval "$(docker-machine env instancename)"
$ sudo docker-compose -f production.yml build
$ sudo docker-compose -f production.yml up
In the cookiecutter-django documentation I read:
If you are not using a subdomain of the domain name set in the project, then remember to put your staging/production IP address in the DJANGO_ALLOWED_HOSTS environment variable (see Settings) before you deploy your website. Failure to do this will mean you will not have access to your website through the HTTP protocol.
Therefore, in the file .envs/.production/.django I changed the line with DJANGO_ALLOWED_HOSTS from
DJANGO_ALLOWED_HOSTS=.example.com (instead of example.com I use my actual domain)
to
DJANGO_ALLOWED_HOSTS=XXX.XXX.XXX.XX
(with XXX.XXX.XXX.XX being the IP of my digital ocean droplet; I also tried DJANGO_ALLOWED_HOSTS=.example.com and DJANGO_ALLOWED_HOSTS=.example.com,XXX.XXX.XXX.XX with the same outcome)
In addition, I logged in to where I registered the domain and made sure to point the A-Record to the IP of my digital ocean droplet.
With this setup the deployment does not work. I get the following error message:
traefik_1 | time="2019-03-29T21:32:20Z" level=error msg="Unable to obtain ACME certificate for domains \"example.com\" detected thanks to rule \"Host:example.com\" : unable to generate a certificate for the domains [example.com]: acme: Error -> One or more domains had a problem:\n[example.com] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Fetching http://example.com/.well-known/acme-challenge/example-key-here: Connection refused, url: \n"
Unfortunately, I was not able to find a solution for this problem. Any help is greatly appreciated!
Update
When I run netstat -antp on the server as suggested in the comments I get the following output (IPs replaced with placeholders):
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:48923 SYN_RECV -
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:17195 FIN_WAIT1 -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 ESTABLISHED 16958/sshd: [accept
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
When I run $ sudo docker-compose -f production.yml up first, netstat -antp returns this:
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1590/sshd
tcp 0 332 XXX.XXX.XXX.XX:22 ZZ.ZZZ.ZZ.ZZZ:49726 ESTABLISHED 16959/0
tcp 0 0 XXX.XXX.XXX.XX:22 AA.AAA.AAA.A:50098 ESTABLISHED 17046/sshd: [accept
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:55652 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:16750 SYN_RECV -
tcp 0 0 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:31541 SYN_RECV -
tcp 0 1 XXX.XXX.XXX.XX:22 YYY.YY.Y.YYY:57909 FIN_WAIT1 -
tcp6 0 0 :::2376 :::* LISTEN 5120/dockerd
tcp6 0 0 :::22 :::* LISTEN 1590/sshd
In my experience, the Droplets are configured as needed by cookiecutter-django and the ports are open properly, so unless you closed them, you shouldn't have to do anything.
Usually, when this error happens, it's due to a DNS configuration issue. Basically, Let's Encrypt was not able to reach your server using the domain example.com. Unfortunately, you're not giving us the actual domain you've used, so I'll try to guess.
You said you've configured an A record to point to your droplet, which is what you should do. However, this config needs to be propagated to most of the name servers, which may take time. It might be propagated for you, but if the name server used by Let's Encrypt isn't up to date, issuing your TLS certificate will fail.
You can check how well it's propagated using an online tool which checks multiple name servers at once, like https://dnschecker.org/.
From your machine, you can do so using dig (for people interested, I recommend this video):
# Using your default name server
dig example.com
# Using 1.1.1.1 as name server
dig @1.1.1.1 example.com
Hope that helps.
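You can also reproduce the exact fetch Let's Encrypt performs; a sketch with example.com standing in for the real domain (the path is just a probe, not a real challenge token):
# HTTP-01 challenges are fetched over plain HTTP on port 80:
curl -I http://example.com/.well-known/acme-challenge/test
# A "Connection refused" here reproduces the traefik error above: port 80
# isn't reachable at the address the domain resolves to.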

Socket port not opening in Docker Swarm Cluster (Root Cause Identified)

I have the following setup:
Two VMs
Created an overlay network
Created two docker swarm services:
docker service create --name karaf1-service --replicas 1 --network karaf_net karaf1:2.0.0
docker service create --name karaf2-service --replicas 1 --network karaf_net karaf2:2.0.0
Now these containers open a socket port at start; I observed that sometimes it is created successfully, but a lot of the time it fails:
ServerSocketFactory.getDefault().createServerSocket(serverPort)
If both containers start on one node it's mostly successful, but when the containers are created on different nodes it fails almost every time.
Before troubleshooting any network issue, the container should at least be able to create its sockets.
This container is not able to open the socket:
root@bd48643080b2:/opt/apache/apache-karaf-4.1.5# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:8101 0.0.0.0:* LISTEN 61/java
tcp 0 0 127.0.0.1:1099 0.0.0.0:* LISTEN 61/java
tcp 0 0 0.0.0.0:41551 0.0.0.0:* LISTEN 61/java
tcp 0 0 127.0.0.11:44853 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:44444 0.0.0.0:* LISTEN 61/java
The following container is able to create it on port 4550, but sometimes it's vice versa:
root@38d26c7dde1a:/opt/apache/apache-karaf-4.1.5# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:37347 0.0.0.0:* LISTEN 61/java
tcp 0 0 0.0.0.0:8101 0.0.0.0:* LISTEN 61/java
tcp 0 0 0.0.0.0:4550 0.0.0.0:* LISTEN 61/java
tcp 0 0 127.0.0.11:37575 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:1099 0.0.0.0:* LISTEN 61/java
tcp 0 0 127.0.0.1:35321 0.0.0.0:* LISTEN 61/java
tcp 0 0 0.0.0.0:44444 0.0.0.0:* LISTEN 61/java
Root Cause Identified:
As I am creating two services, while creating the first service I pass the second service's name as a hostname to the first service so it can keep verifying the second's status, and Java throws an error on the hostname "karaf2-service":
java.net.UnknownHostException: karaf2-service: Name or service not known
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
Now I can't add an entry for karaf2-service in /etc/hosts so the socket doesn't complain, as I don't know which IP will be assigned to the docker swarm service; on an overlay network we mostly communicate with service names.
Any suggestions to resolve this?
The easiest way to handle this is to check on container startup whether you can reach the other service, and if not, wait a few seconds and then try again.
There are multiple tools to do this, such as wait-for-it: https://github.com/vishnubob/wait-for-it
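A minimal entrypoint-wrapper sketch of that idea (the service name and retry interval are assumptions taken from the question; wait-for-it does the same more robustly):
#!/bin/sh
# Block until the other service's name resolves on the overlay network,
# then hand off to the real command.
until getent hosts karaf2-service >/dev/null 2>&1; do
  echo "waiting for karaf2-service to become resolvable..."
  sleep 3
done
exec "$@"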

Cannot access port on host mapped to docker container port

I have started a docker container using the command
sudo docker run -it -P -d plcdimage
The image is built using a Dockerfile which has the instruction EXPOSE 8080. The container runs a JBoss server with an application deployed on it. The port mappings are:
Command: sudo docker port be1837e849dc
Output: 8080/tcp -> 0.0.0.0:32771
When I try to access the web application running on JBoss in the container from the mapped host port, using the URL:
http://IPAddressOfHost:32771/
I get a connection refused error. The following is the result of the command "netstat -tulpn":
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp6 0 0 :::9999 :::* LISTEN -
tcp6 0 0 :::22 :::* LISTEN -
tcp6 0 0 :::32771 :::* LISTEN -
udp 0 0 0.0.0.0:68 0.0.0.0:* -
I tried doing telnet hostip 32771 and it also results in connection refused.
Docker version 1.12.1
build 23cf638
What could be the possible reason for this?
Thanks in advance
I found that the JBoss server running inside the container was not listening on 0.0.0.0. One option is to bind to all interfaces when starting the standalone server with -b 0.0.0.0:
/bin/standalone.sh -b 0.0.0.0
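A quick way to confirm which address the server is bound to, assuming netstat is available in the image (the container ID is the one from the question):
docker exec be1837e849dc netstat -tlnp | grep 8080
# 127.0.0.1:8080 -> only reachable from inside the container; the mapped host port fails
# 0.0.0.0:8080   -> reachable through the host port published by -P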

Docker run cannot publish port range despite netstat indicates that ports are available

I am trying to run a Docker image from inside Google Cloud Shell (i.e., on a courtesy Google Compute Engine instance) as follows:
docker run -d -p 20000-30000:10000-20000 -it <image-id> bash -c bash
Prior to this step, netstat -tuapn reported the following:
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:8998 0.0.0.0:* LISTEN 249/python
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13080 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13081 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:34490 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13082 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13083 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:13084 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:34490 127.0.0.1:48161 ESTABLISHED -
tcp 0 252 172.17.0.2:22 173.194.92.34:49424 ESTABLISHED -
tcp 0 0 127.0.0.1:48161 127.0.0.1:34490 ESTABLISHED 15784/python
tcp6 0 0 :::22 :::* LISTEN -
So it looks to me as if all the ports between 20000 and 30000 are available, but the run is nevertheless terminated with the following error message:
Error response from daemon: Cannot start container :
failed to create endpoint on network bridge: Timed out
proxy starting the userland proxy
What's going on here? How can I obtain more diagnostic information and ultimately solve the problem (i.e., get my Docker image to run with the whole port range available)?
Opening up ports in a range doesn't currently scale well in Docker. The above will result in 10,000 docker-proxy processes being spawned, one to support each port, including all the file descriptors needed to support all those processes, plus a long list of firewall rules being added. At some point, you'll hit a resource limit on either file descriptors or processes. See issue 11185 on github for more details.
The only workaround when running on a host you control is to not allocate the ports and manually update the firewall rules yourself. I'm not sure that's even an option with GCE. The best solution is to redesign your requirements to keep the port range small. The last option is to bypass the bridge network entirely with --net=host and run on the host network, where there are no proxies and no firewall rules. The latter removes any network isolation you have in the container, so it tends to be recommended against.
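For comparison, a sketch of the --net=host variant of the original command (same placeholder image ID; no -p is needed because the container shares the host's network namespace):
docker run -d --net=host -it <image-id> bash -c bash
# Ports opened by processes in the container bind directly on the host:
# no docker-proxy processes and no per-port firewall rules, but also no
# network isolation for the container.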
