It has been 3 months since I started monitoring. The behaviour of fail2ban I observe is that it stops blocking IPs after a few busy days. Then I restart it and it starts working again, blocking IPs. Things go on this way for a few months, but eventually fail2ban no longer blocks IPs even after a restart. Then I have to do a fresh installation of fail2ban, and it starts blocking again.
Can anyone explain why fail2ban behaves this way?
It's hard to know what is happening without any config files or logs. Anyway, that kind of behaviour can be caused by a clock sync problem.
I don't know what your setup looks like; it could be one machine generating the logs of the services and another machine with fail2ban reading them and banning, or a host machine with fail2ban and the services in Docker containers. In those scenarios the dates have to be in sync, including the timezone. Maybe after restarting, your server synchronizes its clock and changes the date... I don't know. It's a possibility.
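If you want to rule that out, a quick comparison of the clocks is enough. A rough sketch, assuming a systemd host and a container named my-service (the container name is just a placeholder):
# on the host: current time and whether NTP sync is active
timedatectl
date
# inside the container whose logs fail2ban reads (name is illustrative)
docker exec my-service date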
Fail2ban removes the banned hosts from your list according to the time set in /etc/fail2ban/jail.local
# "bantime" is the number of seconds that a host is banned.
bantime = 3600
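(As an aside, not part of the snippet above: a negative value makes the ban permanent instead of expiring.)
# a negative bantime means the ban never expires on its own
bantime = -1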
No fail2ban ban is permanent (unless you configure it that way), so if you restart the service with
sudo /etc/init.d/fail2ban restart
all banned IPs are automatically unbanned.
To reload the service without losing the list of banned IPs, use
sudo fail2ban-client reload
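To check whether a jail is still active and actually banning before resorting to a reinstall, you can also query the client (the sshd jail below is just an example; use whichever jails you have configured):
sudo fail2ban-client status
sudo fail2ban-client status sshd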
I have a multi-service environment hosted with Docker Swarm. There are multiple stacks, and all the running Docker containers contain a Spring Boot application. The issue is that all my stacks get restarted on their own. Now, I know that in the compose file I have set restart_policy to on-failure, hence the auto restart. The problem is that when the services are restarted, I get errors from a particular service, and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and found out about these things.
The Docker daemon is not restarted. I double-checked this with the uptime of the Docker daemon.
I checked docker service ps <Service_ID>, and there I can see the service showing shutdown and starting. No other information.
I checked docker service logs <Service_ID>, but there are no errors there either.
I checked for a resource crunch. I can assure you that there were plenty of resources available at the host as well as at each container level.
Can someone tell me where exactly to find logs for this event? Any other thoughts on this?
My host is actually a VM hosted on VMware vCenter.
After a lot of research and going through all docker logs, I could not find the solution. Later on, I discovered that there was a memory snapshot taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all docker services running on the host restart automatically. There will be no errors in that but they will just restart gracefully.
I found some questions already having this problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I have not been able to find out why this causes the restarts, but yes, this was the root cause of the problem. If anyone is a VMware snapshot expert, please let us know.
I have a website in Laravel where you can click a button that sends a message to a Python daemon isolated in Docker. This works as an easy MVP to prove a concept, but it's not viable in production, because a user would most likely also want to pause, resume and stop that process; the service is designed to never stop otherwise, since it's a scanner running in a loop.
I have thought about a couple of solutions for this, such as handling it in the software layer, but that would add complexity to the program. I have googled around and found that it is actually possible to do what I want with Docker itself, using the pause, unpause, run and kill commands.
It would be optimal if I had a service that interacted with the Docker instances along those lines and could take commands over HTTP. Is Docker Swarm the right solution for this problem, or is there an easier way?
There are both significant security and complexity concerns with using Docker this way, and I would not recommend it.
The core rule of Docker security has always been: if you can run any docker command, then you can easily take over the entire host. (You cannot prevent someone from using docker run to start a container, as container-root, that bind-mounts any part of the host filesystem; so they can reset host-root's password in the /etc/shadow file to something they know, allow remote root ssh access, and reboot the host, as one example.) I'd be extremely careful about connecting this ability to my web tier. Strongly coupling your application to Docker will also make it more difficult to develop and test.
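As a concrete illustration of that rule (a classic demonstration, not something specific to your setup), anyone who can run docker can get a root shell on the host with a single bind mount:
# bind-mount the host's root filesystem and chroot into it: a root shell on the host
docker run --rm -it -v /:/host alpine chroot /host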
Instead of launching a process per crawling job, a better approach might be to set up some sort of job queue (perhaps RabbitMQ) and have a multi-user worker that pulls jobs from the queue to do the work. You could have a queue per user, and a separate control queue that receives the stop/start/cancel messages (a minimal worker sketch follows the list below).
If you do this:
You can run your whole application without needing Docker: you need the front-end, the message queue system, and a worker, but these can all run on your local development system
If you need more crawlers, you can launch more workers (works well with Kubernetes deployments)
If you're generating too many crawl requests, you can launch fewer workers
If a worker dies unexpectedly, you can just restart it, and its jobs will still be in the queue
Nothing needs to keep track of which process or container belongs to a specific end user
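Here is that minimal worker sketch, assuming RabbitMQ accessed via the pika library and a queue named crawl-jobs (the queue name and message format are made up for illustration):

import pika

# connect to the broker and make sure the job queue exists
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="crawl-jobs", durable=True)

def handle_job(ch, method, properties, body):
    # run one crawl job, then acknowledge it so it is removed from the queue
    print("crawling", body.decode())
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_qos(prefetch_count=1)  # hand this worker one job at a time
channel.basic_consume(queue="crawl-jobs", on_message_callback=handle_job)
channel.start_consuming()

A pause/stop request would then just be another message on the control queue that the worker checks between jobs, so nothing in the web tier ever has to talk to the Docker API.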
I am running Docker Desktop version 2.1.0.0 (36874) on a Windows 10 environment.
I am using two separate container compositions, one of these binding to port 8081 on my machine, and the other binding to 9990 and 8787.
After a system restart, I am unable to start these container compositions again, because the ports are already bound.
So far, I have tried multiple approaches to solve this:
manually stop all containers prior to system shutdown
manually stop and remove all containers prior to system shutdown
the above, plus explicitly stopping the docker application prior to system shutdown
removing all containers after system startup and prior to restart
pruning the networks after container removal
restart docker app prior to restarting containers (this worked up until the last update)
I did fiddle around with the compose files and the configuration, but that would be too much detail to go into right now; none of these helped.
What I recently found was that, directly after a system startup and prior to starting any container, the process com.docker.backend was already listening on the bound ports. This is confusing, as the containers were shut down prior to system shutdown and are not run with a restart policy.
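(For anyone who wants to check the same thing: finding which process holds a port on Windows can be done with something like the following; 8081 is just one of the ports from above.)
netstat -ano | findstr :8081
Get-Process -Id (Get-NetTCPConnection -LocalPort 8081).OwningProcess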
So I explicitly quit the Docker Desktop app, and the process still remained, and it still had the ports bound.
After manually killing the process as administrator from the power shell, and restarting the docker desktop application, my containers were able to start again.
Has anyone else had this problem? Does anyone know a "fix" for this at all?
And, of course, is this even the right page to ask? As this is not strictly programming, I am unsure.
Docker setup gets screwed up sometimes, so try deleting %appdata%\Docker.
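From an elevated PowerShell that roughly amounts to the following (quit Docker Desktop first; this wipes the Docker Desktop settings, so treat it as a last resort):
Stop-Process -Name "com.docker.backend" -Force -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "$env:APPDATA\Docker"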
The problem went away after the update to version 2.1.0.1 (37199).
I've set up a Docker swarm mode cluster with two managers and one worker. This is on CentOS 7. They're on machines dkr1, dkr2, dkr3; dkr3 is the worker.
I was upgrading to v1.13 the other day, and wanted zero downtime. But it didn't work exactly as expected. I'm trying to work out the correct way to do it, since this is one of the main goals, of having a cluster.
The swarm is in 'global' mode, that is, one replica per machine. My method for upgrading was to drain the node, stop the daemon, yum upgrade, and start the daemon again. (Note that this wiped out my daemon config settings for ExecStart=...! Be careful if you upgrade.)
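For reference, draining a node and bringing it back is a single command each on a manager (dkr3 being the worker I started with):
docker node update --availability drain dkr3
# ...upgrade docker on dkr3, then bring it back into rotation
docker node update --availability active dkr3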
Our client/ESB hits dkr2, which does its load-balancing magic over the swarm. dkr2 is the leader; dkr1 is 'reachable'.
I brought down dkr3. No issues. Upgraded docker. Brought it back up. No downtime from bringing down the worker.
Brought down dkr1. No issue at first. Still working when I brought it down. Upgraded docker. Brought it back up.
But during startup, it 404'ed. Once up, it was OK.
Brought down dkr2. I didn't actually record what happened then, sorry.
Anyway, while my app was starting up on dkr1, it 404'ed, since the server hadn't started yet.
Any idea what I might be doing wrong? I would suppose I need a health check of some sort, because the container is obviously ok, but the server isn't responding yet. So that's when I get downtime.
You are correct: you need to specify a healthcheck to run against your app inside the container in order to make sure it is ready. Your container will not receive traffic until this healthcheck has passed.
A simple curl to an endpoint should suffice. Use the HEALTHCHECK instruction in your Dockerfile to specify the check to perform.
An example of the healthcheck line in a Dockerfile to check if an endpoint returned 200 OK would be:
HEALTHCHECK CMD curl -f 'http://localhost:8443/somepath' || exit 1
If you can't modify your Dockerfile, then you can also specify your healthcheck manually at deployment time using the compose file healthcheck format.
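A sketch of what that looks like in the compose file, reusing the same curl check (the port and path are just the ones from the example above):
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8443/somepath"]
  interval: 30s
  timeout: 10s
  retries: 3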
If that's not possible either and you need to update a running service, you can do a service update and use a combination of the health flags to specify your healthcheck.
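For a running service that would look roughly like this (my_service is a placeholder for your service name):
docker service update \
  --health-cmd "curl -f http://localhost:8443/somepath || exit 1" \
  --health-interval 30s \
  --health-retries 3 \
  my_service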
I start, stop and remove containers as part of a continuous build process. Each time the build runs, the containers get new IPs.
I'm already at 172.17.0.95 after starting at 172.17.0.2 an hour ago.
Since I remove the old containers on each build, I would also like to reset the IP counter, so that I don't have a time bomb where I run out of IP addresses after, say, a few hundred builds.
Please let me know how I can tell the entity responsible (a DHCP server?) that an IP address is free again, and how to reset the counter.
Thanks in advance SO community!
Docker seems to default to using 172.17.0.0/16 for the docker0 interface. That's 65,536 addresses (2^16), and if you use 100 every hour you'll run through them all in just over 27 days. I think Docker is just being conservative in not recycling them faster, but it will loop around when it reaches the end.
If you need a bigger or different address space, you can use the --bip and --fixed-cidr flags on the Docker daemon to choose your own CIDR. See the Docker documentation on networking.
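For example, on current versions you can set the equivalent keys in /etc/docker/daemon.json and restart the daemon (the addresses below are only an illustration):
{
  "bip": "10.10.0.1/16",
  "fixed-cidr": "10.10.0.0/17"
}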
If you really just want to reset the counter, you would need to restart the docker server. This will have the side-effect of terminating all your running containers.