Rsyslog can't start inside of a docker container - docker

I've got a docker container running a service, and I need that service to send logs to rsyslog. It's an ubuntu image running a set of services in the container. However, the rsyslog service cannot start inside this container. I cannot determine why.
Running service rsyslog start (this image uses upstart, not systemd) returns only the output start: Job failed to start. There is no further information provided, even when I use --verbose.
Furthermore, there are no error logs from this failed startup process. Because rsyslog is the service that can't start, it's obviously not running, so nothing is getting logged. I'm not finding anything relevant in Upstart's logs either: /var/log/upstart/ only contains the logs of a few things that successfully started, as well as dmesg.log which simply contains dmesg: klogctl failed: Operation not permitted. which from what I can tell is because of a docker limitation that cannot really be fixed. And it's unknown if this is even related to the issue.
Here's the interesting bit: I have the exact same container running on a different host, and it's not suffering from this issue. Rsyslog is able to start and run in the container just fine on that host. So obviously the cause is some difference between the hosts. But I don't know where to begin with that: There are LOTS of differences between the hosts (the working one is my local windows system, the failing one is a virtual machine running in a cloud environment), so I wouldn't know where to even begin about which differences could cause this issue and which ones couldn't.
I've exhausted everything that I know to check. My only option left is to come to stackoverflow and ask for any ideas.
Two questions here, really:
Is there any way to get more information out of the failure to start? start itself is a binary file, not a script, so I can't open it up and edit it. I'm reliant solely on the output of that command, and it's not logging anything anywhere useful.
What could possibly be different between these two hosts that could cause this issue? Are there any smoking guns or obvious candidates to check?
Regarding the container itself, unfortunately it's a container provided by a third party that I'm simply modifying. I can't really change anything fundamental about the container, such as the fact that it's entrypoint is /sbin/init (which is a very bad practice for docker containers, and is the root cause of all of my troubles). This is also causing some issues with the docker logging driver, which is why I'm stuck using syslog as the logging solution instead.

Related

Why does Docker randomly throw this a 'Permission Denied' error when trying to stop a container?

I am trying to stop a docker container and get the following error:
This happens randomly on occasion and it is very frustrating to have restart the docker service and relaunch all my containers.
Would anyone know what could be happening to cause this? As far I have seen or know, there has not been any changes made to the container since they have been launched, may some changes in the content of the data in the containers. Also if people need more information, I would be happy to provide.
FYI everything that I am doing I am doing as a root user.
ALSO -- ABSOLUTLEY CANNOT STOP THE DOCKER DAMON OR RESTART IT, THIS MUST BE RESOLVED WHILE KEEPING THE CURRENT CONTAINERS OPEN AND RUNNIN.

All docker stack are restarting automatically

I have a multi-services environment that is hosted with docker swarm. There are multiple stacks that are created. All the docker containers which are running have an inbuild Spring Boot application. The issue is coming that all my stacks get restarted on their own. Now I know that in compose file I have mentioned that restart_policy as on failure. Hence it auto restarted. The issue comes that when services are restarted, I get errors from a particular service and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and found out about these things.
Docker daemon is not restarted. I double-checked this with the uptime of the docker daemon.
I checked the docker service ps <Service_ID> and there I can see service showing shutdown and starting. No other information.
I checked the docker service logs <Service_ID> but no error in there too.
I checked for resource crunch. I can assure you that there was quite a good resource available at the host as well as each container level.
Can someone help where exactly to find logs for this even? Any other thoughts on this?
My host is actually a VM hosted on VMWare Vcenter.
After a lot of research and going through all docker logs, I could not find the solution. Later on, I discovered that there was a memory snapshot taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all docker services running on the host restart automatically. There will be no errors in that but they will just restart gracefully.
I found some questions already having this problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I am not able to find why it's happening but yes Root cause of the problem was this. If anyone is a VMWare snapshots expert, please let us know.

How can I debug Docker name resolution?

I have a very peculiar problem regarding Docker and how containers resolve the IP address of other containers using their container names.
My machine runs many different containers interacting with each other, making the problem difficult to reproduce in a minimal example as would be preferred if possible.
The problem I am encountering is that whenever I add a new container/service/docker-network (by running docker-compose up) all of a sudden, my container for reverse proxying HTTP(S) traffic starts resolving the IP addresses of containers incorrectly.
I can verify that the Name --> IP resolutions are incorrect by checking the logs of where the reverse proxy is trying to connect and then running docker network inspect on the container I expect it to resolve. These are now not the same, resulting in errors all around.
Whenever I stop the new service that triggered the errors, the name resolution is back to normal and all works as expected again. I have encountered this same problem when starting multiple different images, so the problem is most likely not in the new container.
Since I can not provide you with a simple minimally reproducible example (I would if I could), how can I further debug the docker name resolution in an attempt at gathering more information about this problem?
Any advice is appriceated. I know this is not the optimal Stack Overflow question, but I can't do much more without more information about my problem.
I hope you understand.

What's the purpose of the node-modules container in wolkenkit?

That container is built when deploying the application.
Looks like its purpose is to share dependencies across modules.
It looks like it is started as a container but nothing is apparently running, a bit like an init container.
Console says it starts/stops that component when using respective wolkenkit start and wolkenkit stop command.
On startup:
On shutdown:
When you docker ps, that container cannot be found:
Can someone explain these components?
When starting a wolkenkit application, the application is boxed in a number of Docker containers, and these containers are then started along with a few other containers that provide the infrastructure, such as databases, a message queue, ...
The reason why the application is split into several Docker containers is because wolkenkit builds upon the CQRS pattern, which suggests separating the read side of an application from the application's write side, and hence there is one container for the read side, and one for the write side (actually there are a few more, but you get the picture).
Now, since you may develop on an operating system other than Linux, the wolkenkit application may run under a different operating system than when you develop it, as within Docker it's always Linux. This means that the start command can not simply copy over the node_modules folder into the containers, as they may contain binary modules, which are then not compatible (imagine installing on Windows on the host, but running on Linux within Docker).
To avoid issues here, wolkenkit runs an npm install when starting the application inside of the containers. The problem now is that if wolkenkit did this in every single container, the start would be super slow (it's not the fastest thing on earth anyway, due to all the Docker building and starting that's happening under the hood). So wolkenkit tries to optimize this as much as possible.
One concept here is to run npm install only once, inside of a container of its own. This is the node-modules container you encountered. This container is then linked as a volume to all the containers that contain the application's code. This way you only have to run npm install once, but multiple containers can use the outcome of this command.
Since this container now contains data, but no code, it only has to be there, it doesn't actually do anything. This is why it gets created, but is not run.
I hope this makes it a little bit clearer, and I was able to answer your question :-)
PS: Please note that I am one of the core developers of wolkenkit, so take my answer with a grain of salt.

Docker Logs filling up on running container

I've got a Docker container currently running in production on a CentOS 7 VM. We have encountered a problem where the logs of the container are filling up the host drive (the log files found at /var/lib/docker/{continer_name}) over time and causing the container to become unresponsive forcing us to clear logs on the host in order to enable it to continue processing.
We can't take the container down, meaning I can't just bring it back up using the --log-opt flag to set up some log rotation options.
We've tried using logrotate, but the nature of the container means the logs are being written to regularly and what we find is often the logs are rotated, but the original file does not decrease in size due to being written to whilst the rotation is underway.
I'm trying to find a solution to this problem where we can set up some kind of task that will clear the logs down to a specific file size. Any help is greatly appreciated.
i would suggest mounting the containers logs directory to a host directory, and there you can schedule whatever task to zip/move/delete the log files...

Resources