How can I debug Docker name resolution? - docker

I have a very peculiar problem regarding Docker and how containers resolve the IP address of other containers using their container names.
My machine runs many different containers that interact with each other, which makes the problem difficult to reduce to a minimal example.
The problem is that whenever I add a new container, service, or Docker network (by running docker-compose up), my reverse-proxy container for HTTP(S) traffic suddenly starts resolving the IP addresses of other containers incorrectly.
I can verify that the name --> IP resolutions are incorrect by checking the logs to see where the reverse proxy is trying to connect and then running docker network inspect to look up the container I expect it to resolve. The two addresses no longer match, resulting in errors all around.
Whenever I stop the new service that triggered the errors, the name resolution is back to normal and all works as expected again. I have encountered this same problem when starting multiple different images, so the problem is most likely not in the new container.
Since I cannot provide a simple, minimal reproducible example (I would if I could), how can I further debug Docker name resolution to gather more information about this problem?
Any advice is appreciated. I know this is not the optimal Stack Overflow question, but I can't do much more without more information about my problem.
I hope you understand.
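The manual check described in the question (compare the address the proxy used with what docker network inspect reports) can be sketched as a small script. This is only an illustration under assumptions: it expects the JSON printed by docker network inspect <network>, with its Containers, Name, and IPv4Address fields, and a hand-collected mapping of the addresses the proxy actually connected to (for example, copied from its logs). The function names expected_ips and find_mismatches are made up for this sketch.

```python
import json


def expected_ips(inspect_output):
    """Map container name -> IPv4 address from `docker network inspect` JSON."""
    mapping = {}
    for network in json.loads(inspect_output):
        # "Containers" can be missing or null when nothing is attached.
        for endpoint in (network.get("Containers") or {}).values():
            # IPv4Address carries a prefix length, e.g. "172.18.0.3/16".
            mapping[endpoint["Name"]] = endpoint["IPv4Address"].split("/")[0]
    return mapping


def find_mismatches(expected, observed):
    """Return {name: (expected_ip, observed_ip)} for every name that disagrees.

    `observed` is an assumption of this sketch: a dict of name -> IP that
    you collected yourself from the reverse proxy's logs.
    """
    return {
        name: (expected.get(name), ip)
        for name, ip in observed.items()
        if expected.get(name) != ip
    }
```

Capturing the inspect output once before and once after starting the new service, and running both through the comparison, would show whether the addresses on the network changed or whether only the proxy's view of them did.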

Related

All Docker stacks are restarting automatically

I have a multi-service environment hosted on Docker Swarm, split across multiple stacks. Every container runs a Spring Boot application. The issue is that all my stacks restart on their own. I know that my compose file sets the restart_policy condition to on-failure, which is why the services come back up automatically. The problem is that when the services restart, I get errors from one particular service, and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and checked the following:
The Docker daemon was not restarted. I double-checked this against the daemon's uptime.
I checked docker service ps <Service_ID>, which shows the services shutting down and starting again, with no other information.
I checked docker service logs <Service_ID>, but there are no errors there either.
I checked for resource pressure. Plenty of resources were available on the host as well as at the container level.
Can someone tell me where exactly to find logs for this? Any other thoughts on this?
My host is a VM hosted on VMware vCenter.
After a lot of research and going through all the Docker logs, I could not find the cause. Later on, I discovered that a memory snapshot was being taken for backup every 24 hours.
Here is what I observe:
Whenever a snapshot is taken, all Docker services running on the host restart automatically. There are no errors; the services just restart gracefully.
I found other questions describing the same problem with VMware snapshots.
As far as I know, taking a snapshot makes the VM point to a different memory location and saves the previous one. I have not been able to find out why this triggers the restarts, but the snapshot was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.
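One way to confirm this kind of correlation is to line up the restart times against the snapshot schedule. The sketch below is an illustration only: the function name, the five-minute window, and the idea of feeding it timestamps collected by hand (restart times from docker service ps, snapshot times from the backup schedule) are all assumptions, not something taken from the original investigation.

```python
from datetime import timedelta


def restarts_near_snapshots(restarts, snapshots, window=timedelta(minutes=5)):
    """Pair each restart with a snapshot that started shortly before it.

    `restarts` and `snapshots` are lists of datetime objects gathered by
    hand; `window` is an arbitrary tolerance chosen for this sketch.
    Returns a list of (restart, snapshot) tuples.
    """
    pairs = []
    for restart in restarts:
        for snapshot in snapshots:
            # Keep restarts that happened at or after the snapshot,
            # but no later than `window` afterwards.
            if timedelta(0) <= restart - snapshot <= window:
                pairs.append((restart, snapshot))
                break
    return pairs
```

If nearly every restart pairs with a snapshot and few fall outside the window, that supports the snapshot being the trigger, although it does not explain the mechanism.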

Rsyslog can't start inside of a docker container

I've got a docker container running a service, and I need that service to send logs to rsyslog. It's an ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container. I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log which simply contains dmesg: klogctl failed: Operation not permitted. which from what I can tell is because of a docker limitation that cannot really be fixed. And it's unknown if this is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So the cause must be some difference between the hosts. But there are LOTS of differences between them (the working one is my local Windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know where to begin in deciding which differences could cause this issue and which couldn't.
I've exhausted everything that I know to check. My only option left is to come to Stack Overflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about the container, such as the fact that its entrypoint is /sbin/init (which is a very bad practice for Docker containers, and is the root cause of all of my troubles). This is also causing some issues with the Docker logging driver, which is why I'm stuck using syslog as the logging solution instead.

Docker port forwarding is not working anymore

I have multiple Docker containers that I use on my machine for testing, etc., all through port forwarding.
Strangely, for the last four days I have not been able to connect to any of them. I ran some tests with applications outside of containers, and it appears I can still connect to those.
But for every application inside a container, I get a "connection reset by peer" error.
I might have messed with a dangling Docker network interface before this happened, but this is the first time it has had this consequence, and now my work is really impeded.
Does anybody know what could be going on?
It was a problem with the iptables rules. I don't know which one, but after deleting them and reinstalling everything, it started working again.

No response from Docker service

I tried following the tutorial here
https://docs.docker.com/get-started/part3/.
The first issue I ran into was when I called docker swarm init: it asked me to run docker swarm init --advertise-addr with one of two possible IPv6 addresses.
I tried initializing the swarm on each of them and then starting the service. The service starts successfully, but I can't get any response when accessing localhost:4000. It just loads forever.
I have tried rebuilding the image, creating the swarm on both IPs, and checking the logs (there was nothing there), but I have run out of ideas. If it helps, the computer has two operating systems installed, which might affect the networking in ways I am unable to figure out.
How can I receive a response on my request?
The issue I was facing was a connection problem between Google Chrome and Docker in swarm mode, documented better here
https://forums.docker.com/t/google-chrome-and-localhost-in-swarm-mode/32229/9.
There is no apparent solution.

Failing to see how ambassador pattern enhances modularity / simplicity of container architecture in Docker

I fail to see how implementing the ambassador pattern would help us simplify / modularize the design of our container architecture.
Let's say that I have a database container db on host A that is used by a program db-client on host B, and that the two are connected via ambassador containers db-ambassador and db-foreign-ambassador over a network:
[host A (db) --> (db-ambassador)] <- ... -> [host B (db-forgn-ambsdr) --> (db-client)]
Connections between containers on the same machine, e.g. db to db-ambassador and db-foreign-ambassador to db-client, are made via Docker's --link parameter, while db-ambassador and db-foreign-ambassador talk over the network.
But --link is just a fancy way of inserting IP addresses, ports, and other info from one container into another. When a container fails, the container linked to it does not get notified, nor will it know the new IP address of the crashed container when it restarts. In short, if a container that another is linked to dies, the link is dead too.
To take my example, let's say that db crashes and restarts, and thus gets assigned a different IP. db-ambassador would have to be restarted too, in order to update the link between them... except you shouldn't. If db-ambassador is restarted, its IP would change too, and db-foreign-ambassador won't know where to reach it at the new IP address.
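To make the staleness concrete: legacy links expose the target to the linking container through environment variables such as DB_PORT_5432_TCP_ADDR, and environment variables are fixed when the container is created. The sketch below shows how a client might read that injected address. The helper name linked_endpoint, the alias db, and port 5432 are assumptions for illustration, and the sketch assumes a simple alias without hyphens.

```python
import os


def linked_endpoint(alias, port, proto="tcp", env=None):
    """Return the (address, port) that --link injected for `alias`, or None.

    Reads variables of the form <ALIAS>_PORT_<port>_<PROTO>_ADDR. These are
    set once at container creation, so the address reflects where the
    linked container was at that time, not where it is after a restart.
    """
    env = os.environ if env is None else env
    prefix = f"{alias.upper()}_PORT_{port}_{proto.upper()}"
    addr = env.get(f"{prefix}_ADDR")
    if addr is None:
        return None
    # Fall back to the requested port if the _PORT variable is absent.
    return addr, int(env.get(f"{prefix}_PORT", port))
```

Inside db-ambassador, calling linked_endpoint("db", 5432) would keep returning the address recorded at start-up, which is exactly the problem described above when db comes back on a different IP.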
Quoting an article in the Docker docs about the ambassador pattern:
When you need to rewire your consumer to talk to a different Redis
server, you can just restart the redis-ambassador container that the
consumer is connected to.
This pattern also allows you to transparently move the Redis server to
a different docker host from the consumer.
It seems like this is exactly the problem the pattern is trying to solve, which, as far as I understand, it does not. Not if you consider that --link is only useful as long as the linked container doesn't crash. The option to restart a crashed container on its previous IP would have been a good workaround if supported, at least for a small or medium-sized architecture.
Am I missing something obvious?
Jérôme had some good slides (11-33) on how ambassadors are better than other ways of service discovery (i.e. DNS, key-value stores, bind-mount config file, etc.) in his slide deck on "Shipping Applications to Production in Containers with Docker". He also has some suggestions for how to solve the problem I think you are mentioning, especially Docker Grand Ambassador looks promising.
