How can I launch the Kafka scheduler using Marathon in minimesos?

I'm trying to launch the kafka-mesos framework scheduler using the Docker container, as prescribed at https://github.com/mesos/kafka/tree/master/src/docker#running-image-in-marathon, against the Marathon instance running in minimesos (I would like to add a minimesos tag, but don't have the points). The app is registered and can be seen in the Marathon console, but it remains in the Waiting state and the Deployment GUI says it is trying to ScaleApplication.
I've tried looking for /var/log files in the marathon and mesos-master containers that might show why this is happening. Initially I thought it may have been because the image was not pulled, so I added "forcePullImage": true to the JSON app configuration, but it still waits. I've also changed the networking from HOST to BRIDGE on the assumption that this is consistent with the minimesos caveats at http://minimesos.readthedocs.org/en/latest/ .
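For reference, a trimmed-down version of the app definition I'm posting to Marathon looks roughly like this; the image name, resources, and app id are from my own setup rather than the README, so treat them as placeholders:

{
  "id": "kafka-mesos-scheduler",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "<kafka-mesos-scheduler-image>",
      "network": "BRIDGE",
      "forcePullImage": true
    }
  }
}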
In the Mesos log I do see:
I0106 20:07:15.259790 15 master.cpp:4967] Sending 1 offers to framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 (marathon) at scheduler-575c233a-8bc3-413f-b070-505fcf138ece#172.17.0.6:39111
I0106 20:07:15.266100 9 master.cpp:3300] Processing DECLINE call for offers: [ 5e1508a8-0024-4626-9e0e-5c063f3c78a9-O77 ] for framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 (marathon) at scheduler-575c233a-8bc3-413f-b070-505fcf138ece#172.17.0.6:39111
I0106 20:07:15.266633 9 hierarchical.hpp:1103] Recovered ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483 (total: ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483, allocated: ) on slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 from framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000
I0106 20:07:15.266770 9 hierarchical.hpp:1140] Framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000 filtered slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 for 2mins
I0106 20:07:16.261010 11 hierarchical.hpp:1521] Filtered offer with ports(*):[33000-34000]; cpus(*):1; mem(*):1001; disk(*):13483 on slave 5e1508a8-0024-4626-9e0e-5c063f3c78a9-S0 for framework 5e1508a8-0024-4626-9e0e-5c063f3c78a9-0000
I0106 20:07:16.261245 11 hierarchical.hpp:1326] No resources available to allocate!
I0106 20:07:16.261335 11 hierarchical.hpp:1421] No inverse offers to send out!
but I'm not sure whether this is relevant, since it does not correlate with the resource settings in the Kafka app config. The GUI shows that no tasks have been created.
I do have ten mesosphere/inky Docker tasks running alongside the attempted Kafka deployment. This may be a configuration issue specific to the Kafka Docker image; I just don't know the best way to debug it. Perhaps it is a case of increasing the log levels in a config file, or it may be an environment variable or network setting. I'm digging into it and will update my progress, but any suggestions would be appreciated.
Thanks!

Thanks for trying this out! I am looking into this, and you can follow progress on these issues: https://github.com/ContainerSolutions/minimesos/issues/188 and https://github.com/mesos/kafka/issues/172

FYI, I got Mesos Kafka installed on minimesos via a quickstart shell script; see this PR on Mesos Kafka: https://github.com/mesos/kafka/pull/183
It does not use Marathon or the minimesos install command yet. That is the next step.

Related

Mysterious Filebeat 7 X-Pack issue using Docker image

I've also posted this as a question on the official Elastic forum, but that doesn't seem super frequented.
https://discuss.elastic.co/t/x-pack-check-on-oss-docker-image/198521
At any rate, here's the query:
We're running a managed AWS Elasticsearch cluster (not ideal, but that's life) and run most of the rest of our stuff with Kubernetes. We recently upgraded our cluster to Elasticsearch 7, so I wanted to upgrade the Filebeat service we have running on the Kubernetes nodes to capture logs.
I've specified image: docker.elastic.co/beats/filebeat-oss:7.3.1 in my daemon configuration, but I still see
Connection marked as failed because the onConnect callback failed:
request checking for ILM availability failed:
401 Unauthorized: {"Message":"Your request: '/_xpack' is not allowed."}
in the logs. Same thing when I've tried other 7.x images. A bug? Or something that's new in v7?
The license file is the Apache License, and the build reported by filebeat version inside the container is a4be71b90ce3e3b8213b616adfcd9e455513da45.
It turns out that starting in one of the 7.x versions, index lifecycle management (ILM) checks are turned on by default. ILM is an X-Pack feature, so turning it on by default means that Filebeat performs an X-Pack check by default.
This can be fixed by adding setup.ilm.enabled: false to the Filebeat configuration. So it is not a bug per se in the OSS Docker build.
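For anyone landing here, the relevant excerpt of filebeat.yml would look something like the following; the Elasticsearch endpoint is a placeholder for your own cluster:

# filebeat.yml (relevant excerpt)
setup.ilm.enabled: false        # skip the ILM check, which hits the X-Pack endpoint
output.elasticsearch:
  hosts: ["https://your-aws-es-endpoint:443"]   # placeholder endpoint
  protocol: "https"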

Unable to follow "Sandbox" links from Mesos UI

I have three physical nodes with Docker installed on each of them. I configured Mesos, Marathon, Hadoop, Flink and Zookeeper on them. I can see all the UIs in my browser and can run a Flink application in Marathon.
The problem is that the Mesos UI shows that Flink is running, but when I click on Sandbox, I see this error:
Failed to connect to agent '16657705-0573-410a-aef3-e2bb4119092c-S0' on '//50592e835da1:5051/slave(1)/state?jsonp=JSON_CALLBACK'.
Potential reasons:
The agent is not accessible
The agent timed out or went offline
I know it is related to the Mesos configuration, but I have no idea what is wrong.
I put MESOS_HOSTNAME in /etc/hosts, but it did not work. I also set MESOS_HOSTNAME=IP in mesos-agent-env.sh, but that did not work either.
Would you please guide me on how I can solve this issue?
Any help would be really appreciated.
Problem solved.
According to the Apache Mesos documentation, --hostname is "The hostname the agent node should report, or that the master should advertise in ZooKeeper." I forgot to use --hostname=SlaveIP in the agent command; after adding it, everything ran without any errors.
/home/mesos-1.7.2/build/bin/mesos-slave.sh \
  --master=10.32.0.2:5050,10.32.0.3:5050 --hostname=10.32.0.4 \
  --work_dir=/var/run/mesos --log_dir=/var/log/mesos \
  --systemd_enable_support=false
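Since the question also mentions mesos-agent-env.sh: Mesos reads any agent flag from an environment variable with the MESOS_ prefix, so a hedged equivalent (using the same placeholder IPs and paths as above) would be:

# mesos-agent-env.sh (hypothetical excerpt)
export MESOS_MASTER=10.32.0.2:5050,10.32.0.3:5050
export MESOS_HOSTNAME=10.32.0.4
export MESOS_WORK_DIR=/var/run/mesos
export MESOS_LOG_DIR=/var/log/mesos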

How to manage dependent Docker services

I have two Docker containers: one containing a Cassandra server and one containing a Jetty server with my application.
Both containers use chef-solo for configuration and for running tasks after Jetty and Cassandra start.
In the Cassandra container, we run the keyspace creation after the server has started, and in the Jetty container, we preload data into the system after the server has started.
The problem is that I need to know when the Cassandra container has completed its initialization before I can start Jetty, because in order to preload the data I need a connection to the Cassandra server.
My question is: how can this be achieved?
Is there a Docker command that can notify the system that my container is ready?
Is there a way to check whether Chef has completed its initialization?
Perhaps I need another approach?
Suggestions are welcome.
No, there's no native way to do this in Docker, and the reason is that it's not how containers should be used. Your image is your application, and provisioning should not be happening when the container comes online.
What you should do here is add your Chef scripts to a Dockerfile and build your image from it; that way you're ready to go when you run the application.
However, if you'd like to control start-up order, you can wait for one container to be running before starting another, but that does not mean it waits until the container is "ready". Further reading on this can be found here.
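If you do take the start-up ordering route, a common workaround is a small wrapper script that polls Cassandra's CQL port before kicking off the Jetty preload. A minimal sketch, assuming the Cassandra container is reachable as cassandra on the default port 9042:

#!/bin/sh
# wait-for-cassandra.sh: block until Cassandra accepts TCP connections, then continue.
HOST=cassandra   # hypothetical hostname of the Cassandra container
PORT=9042        # default CQL native transport port
until nc -z "$HOST" "$PORT"; do
  echo "Waiting for Cassandra at $HOST:$PORT..."
  sleep 2
done
echo "Cassandra is up; starting the Jetty preload."

Note that this only proves the port is open, not that the keyspace creation has finished; for that you would need to poll with something like cqlsh instead.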
Good luck!

Launching jobs with large Docker images in Mesos via Aurora can be slow

When launching a task over Mesos via Aurora that uses a rather large Docker image (~2 GB), there is a long wait before the task actually starts.
Even when the task has been launched before and we would expect the Docker image to already be available on the worker node, there is still a wait, dependent on image size, before the task actually launches. Using Docker directly, you can launch a container almost instantly as long as the image is already in your images list. Does the Mesos containerizer not support this "caching" as well? Is this functionality something that can be configured?
I haven't tried using the Docker containerizer, but it is my understanding that it will be phased out soon anyway, and that GPU resource isolation, which we require, only works with the Mesos containerizer.
I am assuming you are talking about the unified containerizer running Docker images? Which backend are you using? By default the Mesos agents use the copy backend, which is why you are seeing it being slow. You can look at the backend the agent is using by hitting the flags endpoint on the agent. Switch the backend to aufs or overlayfs to see if that speeds up the launch. You can specify the backend through the flag --image_provisioner_backend=VALUE on the agent.
NOTE: There are a few bug fixes related to the aufs and overlayfs backends in the latest Mesos release, 1.2.0-rc1, that you might want to pick up. Not to mention that there is an auto-backend feature in 1.2.0-rc1 that will automatically select the fastest backend available.
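As a concrete sketch of the two steps above (the agent host and the remaining flags are assumptions about your setup, not a prescribed configuration):

# Inspect the backend the agent is currently using (agent host/port are placeholders)
curl -s http://agent-host:5051/flags | grep image_provisioner_backend

# Restart the agent with a different backend, e.g. aufs
mesos-agent --master=master-host:5050 \
  --containerizers=mesos \
  --image_providers=docker \
  --image_provisioner_backend=aufs \
  --work_dir=/var/run/mesos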

How do Mesos and Marathon handle application data persistence?

I have been exploring the Mesos/Marathon framework to deploy applications. My doubt is about how Marathon handles application files when an application is killed.
For example, we run Jenkins through Marathon; if the Jenkins server fails, it will be restarted by Marathon, but the previously defined jobs will be lost.
Now my question is: how can I ensure that if an application restarts, those old application jobs are still available?
Thanks.
As of right now, Mesos/Marathon is great at supporting stateless applications, but the support for stateful applications is increasing.
By default, task data is written into the sandbox and hence will be lost when a task fails or is restarted. Note that usually only a small percentage of tasks fail (e.g. only the tasks on the failed node).
Now let us have a look at different failure scenarios.
Recovering from slave process failures:
When only the Mesos slave process fails (or is upgraded) the framework can use slave checkpointing for reconnecting to the running executors.
Executor failures (e.g. Jenkins process failures):
In this case the framework could persist its own metadata on some persistent medium and use it to restart. Note that this is highly application specific, and hence Mesos/Marathon cannot offer a generic way to do this (and I am actually not sure what that would look like in the case of Jenkins). Persistent data could either be written to HDFS or Cassandra, or you could have a look at the concept of dynamic reservations.
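One concrete direction, to be read as a hedged sketch rather than a recommendation: Marathon's local persistent volumes (built on Mesos dynamic reservations) keep a task's data on the same agent across restarts. For a Jenkins-style app it could look roughly like this, with the image name, data path, and sizes as placeholders:

{
  "id": "/jenkins",
  "instances": 1,
  "cpus": 1,
  "mem": 2048,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "your-jenkins-image" },
    "volumes": [
      { "containerPath": "jenkins_home", "mode": "RW", "persistent": { "size": 1024 } },
      { "containerPath": "/var/jenkins_home", "hostPath": "jenkins_home", "mode": "RW" }
    ]
  },
  "residency": { "taskLostBehavior": "WAIT_FOREVER" },
  "upgradeStrategy": { "minimumHealthCapacity": 0.5, "maximumOverCapacity": 0 }
}

The trade-off is that the persistent volume pins the task to a single agent, so you gain durability across restarts at the cost of scheduling flexibility.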
