Unable to follow "Sandbox" links from Mesos UI - docker

I have three physical nodes with docker installed on each of them. I configured Mesos,Marathon,Hadoop,Flink and Zookeeper on them. I can see all UI in my browser and run a Flink application in Marathon.
The problem is that Mesos UI shows me that Flink is running, but when I click on SandBox, I see this error:
Failed to connect to agent '16657705-0573-410a-aef3-e2bb4119092c-S0' on '//50592e835da1:5051/slave(1)/state?jsonp=JSON_CALLBACK'.
Potential reasons:
The agent is not accessible
The agent timed out or went offline
I know it is related to Mesos configuration, but I have no idea what is wrong.
I wrote MESOS_HOSTNAME in /etc/hosts, but it did not work. Also, I sat MESOS_HOSTNAME=IP in mesos-agent-env.sh, but it did not work.
Would you please guide me how I can solve the issue?
Any help would be really appreciated.

Problem solved.
According to the Apache Mesos site, The hostname the agent node should report, or that the master should advertise in ZooKeeper.. I forgot to use --hostname=SlaveIP in slave command; after using it, every thing ran without any errors.
/home/mesos-1.7.2/build/bin/mesos-slave.sh
--master=10.32.0.2:5050,10.32.0.3:5050 --hostname=10.32.0.4
--work_dir=/var/run/mesos --log_dir=/var/log/mesos
--systemd_enable_support=false

Related

All docker stack are restarting automatically

I have a multi-services environment that is hosted with docker swarm. There are multiple stacks that are created. All the docker containers which are running have an inbuild Spring Boot application. The issue is coming that all my stacks get restarted on their own. Now I know that in compose file I have mentioned that restart_policy as on failure. Hence it auto restarted. The issue comes that when services are restarted, I get errors from a particular service and this breaks everything.
I am not able to figure out what actually happens.
I did quite a lot of research and found out about these things.
Docker daemon is not restarted. I double-checked this with the uptime of the docker daemon.
I checked the docker service ps <Service_ID> and there I can see service showing shutdown and starting. No other information.
I checked the docker service logs <Service_ID> but no error in there too.
I checked for resource crunch. I can assure you that there was quite a good resource available at the host as well as each container level.
Can someone help where exactly to find logs for this even? Any other thoughts on this?
My host is actually a VM hosted on VMWare Vcenter.
After a lot of research and going through all docker logs, I could not find the solution. Later on, I discovered that there was a memory snapshot taken for backup every 24 hours.
Here is what I observe:
Whenever we take a snapshot, all docker services running on the host restart automatically. There will be no errors in that but they will just restart gracefully.
I found some questions already having this problem with VMware snapshots.
As far as I know, when we take a snapshot, it points to a different memory location and saves the previous one. I am not able to find why it's happening but yes Root cause of the problem was this. If anyone is a VMWare snapshots expert, please let us know.

what's the purpose of the zabbix officially provided docker image of the zabbix agent?

I used the zabbix official docker-compose yaml to set up a set of zabbix system and I found the server as a monitoring target was not available. I searched the Internet and found there are people also encountered such problem.Someone said the agent container's IP or DNS name should be used as the server's. I tried and found it works. But I'm confused by the agent. Does it monitor the server container,the agent container or the host machine? If it only monitors the agent container itself,what's the purpose of it?
Does it monitor the server container,the agent container or the host machine?
Agent container.
If it only monitors the agent container itself,what's the purpose of it?
For testing. And for monitoring external stuff, with custom commands. Or you can connect stuff from host and monitor it, so just in all the cases you do not want or can't install agent on the host.
Everybody who configures a Dockerized Zabbix installation like yourself bumps into to this issue- and of course find themselves on StackExchange looking for the answers that should have been in the documentation.
The reason that the Zabbix Agent in the docker-compose install you're referring to can't initially connect is that both it and server it monitors both run in isolated containers. Separate containers cannot talk to each other on 127.0.0.1 (localhost) addresses. And that is actually a good thing!
I've reviewed the documentation in the repo you're talking about and it's sparse to say the least; it certainly could be better. But to be fair to Zabbix, their docker-compose install DOES work great when you get it running and can achieve pretty fair results quickly with little effort (and a bit of Googling ;-> ).
I actually found FURTHER pain connecting to containerized Zabbix Agents raised on different hosts outside of the docker-compose install you're referring to. Connectivity was being busted because the host the docker-compose install was raised on was NAT'ing out the traffic and presenting the wrong IP address. I've documented this issue HERE.
Dockerized Zabbix is a good thing; there is a purpose to it. I agree with you though that the documentation could be better though. Stick with it!

Docker install of AZCore results in authserver+worldserver doesn't exist error

I'm trying to spin up a fresh server using the azerothcore docker installation guide. I have completed all of the early installation steps, up until running the containers. Upon running the containers (for worldserver and authserver) i see the following output from the containers. It appears the destination of the world and auth servers in dist/bin is missing, how may i resolve this issue?
Check your docker settings. Make sure you have enough memory. If containers have low memory they will not finish the compile. Check if you have build issues.

Why do Kubernetes' containers fail when ran on runsc (gVisor) as a runtime in Docker?

I am running a single master Kubernetes cluster with Docker. I wanted to try runsc (gVisor) on Kubernetes. I just wanted to start each container in a separate sandbox. So I set runsc as the default runtime and restarted the Docker service. To my surprise, all the Kubernetes' containers were failing (checked with docker ps). What is the exception that causes this? Is there any other way to use gVisor+Docker+Kubernetes?
I am using the right requirements to run each of them.
PS: I am just a beginner.
Thanks for trying gVisor! Sorry it isn't working for you.
Running a Kubernetes Pod inside gVisor is still fairly experimental. It can be made to work, but is a bit difficult to configure right now. We are working to make this easier.
Can you run gVisor with Docker (not Kubernetes)? See the instructions here:
https://github.com/google/gvisor#configuring-docker
If that fails, please file a bug report:
https://github.com/google/gvisor/issues
If you can include debug logs, that will help us diagnose any failure.
https://github.com/google/gvisor#debugging

Timeouts accessing services on swarm published ports

We're using Docker in Swarm mode to host a number of services. Recently we've hit an issue where we get connection timeouts intermittently (sometimes as much as every other request) when trying to access some services.
We've upgraded the environment to the latest version of Docker (currently Docker version 17.03.0-ce, build 3a232c8), done a staggered reboot of all servers (trying to maintain uptime if possible even though this environment is technically a test environment) and tried stopping / starting services as well, but the issue still persists.
I'm confident the issue is not related to the service that's running in Docker, as we're seeing it on various services which have until recently been running without issue, I think it's more likely an environmental issue, or some problem with Docker's internal routing in the overlay network, but not sure how to prove / solve this.
Any advice on how to diagnose or solve this would be greatly appreciated!

Resources