Limitations in Mesos and Marathon Regarding Docker

We have the following scenario.
We have a Mesos cluster with 3 masters and 3 slaves.
Each slave is identical: 4 GB of RAM and 4 CPU cores.
We have started 10 Marathon apps, each with 1 CPU core and 1 GB of RAM. The containers are running but not really doing anything; the system reports 97% of the CPU as free.
Now we are trying to start another container with 3 CPU cores and 2 GB of RAM.
Unfortunately, we are not able to start it. The Mesos logs say that Marathon declined the offer, yet none of the slave nodes are doing anything, and the Marathon app stays stuck in deployment.
If Mesos cannot allocate resources to a Marathon app while the existing containers are not using theirs, then what is the use of the Docker integration here?
As I understand it:
Once Marathon accepts an offer for an app, Mesos treats those resources as used by that app, even if Docker is not using them. But if the container is not using its resources, Mesos should reclaim them and allocate them to the next Marathon application.
Instead, once an offer is assigned to a Marathon app, Mesos subtracts the allocated resources from the total.
So we are not fully utilizing Docker's features in Mesos/Marathon.
Please let me know any suggestions or answers.
Thank you

Mesos tracks "allocation" and not the actual usage. If your app is not doing anything, it doesn't mean it won't do anything in the next moment. That means, if your app requested 1 CPU, this CPU is reserved for the app.
Now, if you don't want to estimate precisely the resources your app uses, you may want to look at oversubscription in Mesos. Keep in mind, though, that when the app that was originally allocated those resources starts using them, tasks running on the oversubscribed resources may be terminated.
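To make the allocation-versus-usage point concrete, here is a minimal Python sketch of the question's cluster. It is a toy model, not Mesos's actual allocator: it assumes each slave offers its full 4 CPUs and 4 GB (a real agent usually keeps some memory back for the OS) and that the ten 1 CPU / 1 GB apps were spread round-robin. Whatever the spread, 10 allocated CPUs on three 4-CPU slaves leave at most 2 unallocated CPUs on any single slave, so a 3-CPU task cannot be placed even though the CPUs are idle.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """A Mesos slave tracked by allocation, not by measured usage (toy model)."""
    name: str
    cpus: float
    mem_gb: float
    alloc_cpus: float = 0.0
    alloc_mem_gb: float = 0.0

    def free(self):
        # Free = total minus what has been *allocated*; idle tasks still count.
        return self.cpus - self.alloc_cpus, self.mem_gb - self.alloc_mem_gb

    def fits(self, cpus, mem_gb):
        free_cpus, free_mem = self.free()
        return cpus <= free_cpus and mem_gb <= free_mem

    def allocate(self, cpus, mem_gb):
        self.alloc_cpus += cpus
        self.alloc_mem_gb += mem_gb


def place(agents, cpus, mem_gb):
    """Return the first agent that can hold the whole task, or None.
    A single task cannot be split across several agents."""
    for agent in agents:
        if agent.fits(cpus, mem_gb):
            agent.allocate(cpus, mem_gb)
            return agent
    return None


agents = [Agent(f"slave{i}", cpus=4, mem_gb=4) for i in (1, 2, 3)]

# Ten 1 CPU / 1 GB apps spread round-robin; their real (idle) usage is never consulted.
for i in range(10):
    agents[i % len(agents)].allocate(1, 1)

big_task_agent = place(agents, cpus=3, mem_gb=2)
print([a.free() for a in agents])  # [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
print(big_task_agent)              # None: no single slave has 3 unallocated CPUs
```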

Mesos/Marathon counts the allocated 10 × (1 CPU + 1 GB), because that is the maximum your apps are allowed to use.
So yes, your understanding is correct.
In my opinion you have at least two options:
Assign fewer resources to your tasks.
Use oversubscription, an interesting new feature that seems to fit your use case: it tries to make use of the difference between allocated and actually used resources.
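A minimal sketch of the first option, assuming the apps are deployed through Marathon's REST API: it builds a Marathon app definition for a Docker container with a smaller cpus/mem allocation. The app id, the image name and the 0.25 CPU / 256 MB figures are hypothetical; in practice they would come from the apps' measured usage.

```python
import json


def marathon_app(app_id, image, cpus, mem_mb, instances=1):
    """Build a Marathon app definition for a Docker container.

    cpus and mem_mb are the allocation Mesos sets aside for each instance,
    whether or not the container actually uses it.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,  # Marathon expresses memory in MB
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


# Hypothetical app trimmed from 1 CPU / 1 GB to what it actually needs
app = marathon_app("/idle-service", "example/idle-service:latest", cpus=0.25, mem_mb=256)
payload = json.dumps(app, indent=2)  # request body for a POST to Marathon's /v2/apps
print(payload)

# Ten such apps would reserve 2.5 CPUs in total instead of 10.
print(10 * app["cpus"])  # 2.5
```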

Related

How to set the CPU priority (niceness) of a Docker container?

One of my containers is always busy, and is taking CPU away from other containers (webservers) that need to be responsive and are only active from time to time.
I would like to lower the CPU priority of the CPU-consuming container, so that whenever the other containers need the CPU, it is not clogged.
How do I do this? I have been searching the web for a while now, but I can't find the answer.
I have tried running the container with --entrypoint='nice 10 mybinary', but it turns out --entrypoint can only run binaries, not shell commands.
You can limit CPU resources at the container level. I recommend using --cpu-shares 512 in your case.
https://docs.docker.com/config/containers/resource_constraints/:
Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. --cpu-shares does not prevent containers from being scheduled in swarm mode. It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access.
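The behaviour quoted above can be modelled with a short Python sketch. It is a simplification of the kernel's CFS weighting, assuming every busy container could otherwise saturate all cores; the container names and the 4-core host are hypothetical.

```python
def contended_cpu_split(shares, busy, host_cpus):
    """Approximate CPU each container gets under --cpu-shares weighting.

    shares: container name -> --cpu-shares value (Docker's default is 1024)
    busy: names of containers that currently want as much CPU as they can get
    Idle containers take nothing, so the busy ones split the host by weight.
    """
    total_weight = sum(shares[name] for name in busy)
    if total_weight == 0:
        return {name: 0.0 for name in shares}
    return {
        name: host_cpus * shares[name] / total_weight if name in busy else 0.0
        for name in shares
    }


shares = {"batch": 512, "web": 1024}  # hypothetical container names

# Webservers idle: the batch container may still use the whole 4-core host (soft limit).
print(contended_cpu_split(shares, busy={"batch"}, host_cpus=4))
# Both busy: web has twice the weight, so it gets about 2.67 cores vs about 1.33.
print(contended_cpu_split(shares, busy={"batch", "web"}, host_cpus=4))
```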
Setting the CPU shares is the most direct answer to your request, and it is typically preferred over adding capabilities to the container, which could be used by a malicious actor inside the container to impact the host. The only reason I can think of to add the SYS_NICE capability to the container is if you have multiple processes inside the container and want to give them different priorities, or need to change the priority while the container is running.
The more traditional solution to noisy neighbors is to configure each container with a limit on how much CPU and memory it is allowed to use. This is an upper bound, so be aware that CPU may sit idle if you set the limit low and there are no other tasks for the CPU to run.
The easiest way to set the limit on containers from the docker run command line is with --cpus which allows you to configure a fractional number of cores to be available to the container. Passing an option like --cpus 2.5 allows the container to use as many as 2.5 cores before the kernel scheduler throttles the process. If you had a 4 core host, that would ensure that at least 1.5 cores are always available to other processes.
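The arithmetic behind --cpus, as a small Python sketch: Docker's documentation describes --cpus as equivalent to a CFS quota over the default --cpu-period of 100000 microseconds (100 ms), so --cpus 2.5 corresponds to --cpu-quota=250000. The helper names are illustrative only.

```python
CFS_PERIOD_US = 100_000  # Docker's default --cpu-period, in microseconds (100 ms)


def cpu_quota_us(cpus, period_us=CFS_PERIOD_US):
    """CPU time per period (summed over all cores) that a --cpus value allows."""
    return round(cpus * period_us)


def guaranteed_headroom(host_cores, cpus):
    """Cores left for everything else when one container is capped at --cpus."""
    return host_cores - cpus


print(cpu_quota_us(2.5))            # 250000, i.e. --cpu-quota=250000
print(guaranteed_headroom(4, 2.5))  # 1.5 cores on a 4-core host
```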
Related to these limits, with Swarm Mode you can also configure a reservation for CPU (and memory). The reservation is a lower limit that Docker ensures has not been reserved for any other containers. This is used to select nodes to schedule containers, and may prevent some containers from being scheduled when there are not enough resources available, rather than scheduling so many jobs on a single node that it fails.
--cpu-shares looks like a good answer, although it's not clear to me how to verify that it's working. I'm also curious what the maximum value is; the documentation doesn't say.
As an alternative for trusted containers, that same document also shows --cap-add=sys_nice, which allows changing process priorities within a container. That is, if the nice or renice command is available within the container, it should work once you add the sys_nice capability. You'll only want to grant this capability to trusted containers, because you don't want untrusted programs changing their own priorities willy-nilly.
You can verify by inspecting the NI column for the process in question using top or ps -efl on the host.
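Regarding the original attempt with --entrypoint, here is a minimal Python sketch of a wrapper that starts the real workload with a higher nice value, assuming Python 3 is available in the image. On Linux, raising a process's niceness needs no extra capability; it is lowering it that generally requires SYS_NICE. The function name is hypothetical.

```python
import os
import subprocess
import sys


def run_with_lower_priority(cmd, increment=10, **kwargs):
    """Run cmd as a child process whose niceness is raised by `increment`.

    A higher nice value means a lower CPU priority. Raising it needs no special
    capability on Linux; lowering it (a negative increment) generally needs SYS_NICE.
    """
    def lower_priority():
        os.nice(increment)  # runs in the child, after fork and before exec

    return subprocess.run(cmd, preexec_fn=lower_priority, **kwargs)
```

The same effect should also be reachable without a wrapper, because --entrypoint takes a single executable and any arguments after the image name are passed to it, e.g. docker run --entrypoint nice myimage -n 10 mybinary (myimage and mybinary are placeholders). The resulting priority shows up in the NI column mentioned above.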

How much resources to allocate to docker

I have been playing around with docker for a few months now and we are now ready to run a few production containers, and it got me into researching the infrastructure.
It led me to the question of how many resources I need to allocate to Docker and how much should be left for the OS.
For example, my server has 8 cores and 16 GB of RAM. How much of that should I allocate to Docker? Obviously I want to allocate as much as possible, but at what point would the performance of the server itself degrade?
Your question is hard to answer, and here's why: "docker" itself doesn't really require much in the way of resources. On the other hand, the applications that you run using docker will have their own requirements.
For example, if you're hosting a multi-terabyte database in a docker container, you're going to require more memory (and probably a lot more storage) than you would for, say, a single wordpress site.
If you're hosting some sort of video transcoding pipeline in Docker, you might end up consuming a lot more of your available CPU.
The only resource that Docker really consumes on its own is the storage space for images and volumes...and again, how much space you need is entirely dependent on how you're using Docker.
It all depends on exactly what you plan on doing with your system.

Efficiency of horizontal scaling when Multiple Deployment of same container on a single Host

Do we gain efficiency in terms of load handling when the same container (in this case, a container with an Apache server and a PHP application) is deployed 5 or more times (i.e. 5 or more containers) on the same host or VM?
Here, efficiency means whether the application in such an architecture can serve more requests, or serve requests faster.
As far as I am aware, each request launches a new Apache/PHP thread. If we have 5 containers handling requests, will this be inefficient because the threads launched by Apache will now be context-switched more often?
Scaling an application requires understanding why the application has reached its limit. For this, you need to gather metrics from the application and the host when it is fully loaded. Without testing and gathering metrics, you're only guessing why you're at capacity.
If the application is fully utilizing one or more CPU cores, but not all of them, then it is either not multithreaded or is hitting locks that prevent all the cores from being used. Adding more containers to the host in this scenario may help it scale.
Typically, horizontal scaling is done because a single host is using all of some resource, like disk io, network bandwidth, memory, or cpu. If you find that the app is using all of one or more of these resources when under heavy load, then you need more hosts, not more containers running on the same host.
This all assumes you haven't configured docker to limit resources on the containers. If you reach your capacity with one container, and have resource limits configured, then the easiest way to get further performance is to remove or reduce those limits.
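The decision rules in this answer can be restated as a small Python function, assuming you already have per-core CPU utilization and the utilization of other host resources from a load test at capacity. The 0.95 saturation threshold and the resource names are hypothetical; this is a restatement of the rule of thumb, not a capacity-planning tool.

```python
def scaling_hint(core_util, other_util, limits_configured=False, saturated=0.95):
    """Map load-test metrics (taken at capacity) to the answer's rule of thumb.

    core_util: per-core CPU utilization, 0.0-1.0
    other_util: e.g. {"disk_io": 0.4, "network": 0.7, "memory": 0.5}
    saturated: hypothetical threshold for "using all of a resource"
    """
    if limits_configured:
        # At capacity with limits set: the cheapest gain is relaxing them.
        return "remove or relax container resource limits"
    host_resource_full = (
        all(u >= saturated for u in core_util)
        or any(u >= saturated for u in other_util.values())
    )
    if host_resource_full:
        return "add hosts"
    if any(u >= saturated for u in core_util):
        # Some cores pegged, others idle: not multithreaded or lock-bound.
        return "add containers on this host"
    return "gather more metrics"
```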

How many containers should exist per host in production? How should services be split?

I'm trying to understand the benefits of Docker better and I am not really understanding how it would work in production.
Let's say I have a web frontend, a rest api backend and a db. That makes 3 containers.
Let's say that I want 3 of the front end, 5 of the backend and 7 of the db. (Minor question: Does it ever make sense to have fewer DBs than backend servers?)
Now, given the above scenario, if I package them all on the same host then I gain the benefit of efficiently using the resources of the host, but then I am DOA when that machine fails or has a network partition.
If I separate them into 1 full application (ie 1 FE, 1 BE & 1 DB) per host, and put extra containers on their own host, I get some advantages of using resources efficiently, but it seems to me that I still lose significantly when I have a network partition since it will take down multiple services.
Hence I'm almost leaning to the conclusion that I should be putting in 1 container per host, but then that means I am using my resources pretty inefficiently and then what are the benefits of containers in production? I mean, an OS might be an extra couple gigs per machine in storage size, but most cloud providers give you a minimum of 10 gigs storage. And let's face it, a rest api backend or a web front end is not gonna even come close to the 10 gigs...even including the OS.
So, after all that, I'm trying to figure out if I'm missing the point of containers? Are the benefits of keeping all containers of an application on 1 host, mostly tied to testing and development benefits?
I know there are benefits from moving containers amongst different providers/machines easily, but for the most part, I don't see that as a huge gain personally since that was doable with images...
Are there any other benefits for containers in production that I am missing? Or are the main benefits for testing and development? (Am I thinking about containers in production wrong)?
Note: The question is very broad and could fill an entire book but I'll shed some light.
Benefits of containers
The exciting part about containers is not their use on a single host, but their use across hosts connected in a large cluster. Do not look at your machines as independent Docker hosts, but as a pool of resources to host your containers.
Containers alone are not ground-breaking (Docker's CTO stated at the last DockerCon that "nobody cares about containers"), but coupled with state-of-the-art schedulers and container orchestration frameworks, they become a very powerful abstraction for handling production-grade software.
As to the argument that this also applies to virtual machines: yes, it does, but containers have some technical advantages over VMs (see: How is Docker different from a normal virtual machine) that make them convenient to use.
On a Single host
On a single host, the benefits you can get from containers are (amongst many others):
Use as a development environment mimicking the behavior on a real production cluster.
Reproducible builds independent of the host (convenient for sharing)
Testing new software without bloating your machine with packages you won't use daily.
Extending from a single host to a pool of machines (cluster)
When time comes to manage a production cluster, there are two approaches:
Create a couple of Docker hosts and run/connect containers together "manually" through scripts or using solutions like docker-compose. Monitoring the lifetime of your services/containers is your responsibility, and you should be prepared to handle service downtime.
Let a container orchestrator deal with everything and monitor the lifetime of your services to better cope with failures.
There are plenty of container orchestrators: Kubernetes, Swarm, Mesos, Nomad, Cloud Foundry, and probably many others. They power many large-scale companies and infrastructures, such as eBay's, so these companies have clearly found a benefit in using them.
Pick the right replication strategy
A container is better used as a disposable resource meaning you can stop and restart the DB independently and it shouldn't impact the backend (other than throwing an error because the DB is down). As such you should be able to handle any kind of network partition as long as your services are properly replicated across several hosts.
You need to pick a proper replication strategy, to make sure your service stays up and running. You can for example replicate your DB across Cloud provider Availability Zones so that when an entire zone goes down, your data remains available.
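A minimal Python sketch of the zone idea, using the 7 DB replicas from the question and three hypothetical zone names: replicas are assigned round-robin across zones, so losing any single zone still leaves a majority of them running. Real orchestrators express this with placement constraints or spread preferences rather than a hand-written function.

```python
from collections import Counter
from itertools import cycle


def spread_replicas(service, replicas, zones):
    """Assign replicas of a service round-robin across availability zones."""
    zone_cycle = cycle(zones)
    return {f"{service}-{i}": next(zone_cycle) for i in range(replicas)}


def surviving_replicas(placement, failed_zone):
    """Replicas still reachable after an entire zone goes down."""
    return [name for name, zone in placement.items() if zone != failed_zone]


zones = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names
placement = spread_replicas("db", 7, zones)
print(Counter(placement.values()))  # zone-a: 3, zone-b: 2, zone-c: 2
print(len(surviving_replicas(placement, "zone-a")))  # 4 of 7 survive the worst case
```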
Using Kubernetes, for example, you can put each of your containers (1 FE, 1 BE & 1 DB) in a pod. Kubernetes will replicate this pod on many hosts and monitor that these pods are always up and running; if one is not, a new pod will be created to cope with the failure.
If you want to mitigate the effect of network partitions, specify node affinities, hinting the scheduler to place containers on the same subset of machines and replicate on an appropriate number of hosts.
How many containers per host?
It really depends on the number of machines you use and the resources they have.
The rule is that you shouldn't bloat a host with too many containers if you don't specify any resource constraints (in terms of CPU or memory). Otherwise, you risk compromising the host and exhausting its resources, which in turn will impact all the other services on the machine. A good replication strategy is important not only at the level of a single service, but also to keep the pool of services sharing a host healthy.
Resource constraints should be set according to the type of workload: a DB will probably use more resources than your front-end container, so size each one accordingly.
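Sizing can be turned into simple arithmetic. A minimal Python sketch, assuming identically sized containers and a fixed amount of headroom kept back for the OS; all the numbers below are hypothetical. Integer units (millicores, MiB) avoid floating-point rounding surprises.

```python
def max_containers(host_millicpu, host_mem_mib, req_millicpu, req_mem_mib,
                   reserved_millicpu=0, reserved_mem_mib=0):
    """How many identically sized containers fit on one host.

    The reserved_* amounts are headroom kept back for the OS and daemons.
    Whichever resource runs out first (CPU or memory) sets the limit.
    """
    by_cpu = (host_millicpu - reserved_millicpu) // req_millicpu
    by_mem = (host_mem_mib - reserved_mem_mib) // req_mem_mib
    return min(by_cpu, by_mem)


# 8-core / 16 GiB host, keeping 1 core and 2 GiB back (hypothetical headroom)
print(max_containers(8000, 16384, 500, 1024, 1000, 2048))   # 14 small front ends
print(max_containers(8000, 16384, 2000, 4096, 1000, 2048))  # 3 larger DB containers
```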
As an example, using Swarm, you can explicitly specify the number of CPUs or the amount of memory you need for a given service (see the docker service documentation). There are many possibilities: you can also give an upper or lower bound on CPU or memory usage. Depending on the values chosen, the scheduler will place the service on a machine with enough available resources.
Kubernetes works pretty much the same way and you can specify limits for your pods (See documentation).
Mesos has more fine-grained resource management policies, with frameworks (for specific workloads like Hadoop, Spark, and many more) and with overcommitting capabilities. Mesos is especially convenient for Big Data workloads.
How should services be split?
It really depends on the orchestration solution:
In Docker Swarm, you would create a service for each component (FE, BE, DB) and set the desired replication number for each service.
In Kubernetes, you can either create a pod encompassing the entire application (FE, BE, DB and the volume attached to the DB) or create separate pods for the FE, BE, DB+volume.
Generally: use one service per type of container. Regarding groups of containers, evaluate whether it is more convenient to scale the entire group of containers as an atomic unit (i.e. a pod) than to manage them separately.
Sum up
Containers are better used with an orchestration framework/platform. There are plenty of available solutions to deal with container scheduling and resource management. Pick one that might fit your use case, and learn how to use it. Always pick an appropriate replication strategy, keeping in mind possible failure modes. Specify resource constraints for your containers/services when possible to avoid resource exhaustion which could potentially lead to bringing a host down.
This depends on the type of application you run in your containers. Off the top of my head, I can think of a couple of different ways to look at this:
is your application diskspace heavy?
do you need the application to be fail-safe across multiple machines?
can you run multiple instances of different applications on the same host without decreasing their performance?
do you use software like kubernetes or swarm to handle your machines?
I think most of these questions are interesting to answer even without containers. Containers might free you from thinking about single hosts, but you still have to decide on and measure the load of your host machines yourself.
Minor question: Does it ever make sense to have fewer DBs than backend servers?
Yes.
Consider cases where you issue simple SQL SELECT statements (without many joins) to get data from the database, but your business logic demands a lot of computation. In those cases you might keep your back-end service count high and your database service count low.
It all depends on the use case which is getting solved.
The number of containers per host depends on the design ratio of the host and the workload ratio of the containers. Both are throughput/capacity ratios. In the old days this was called E/B, for execution/bandwidth: execution was CPU and bandwidth was I/O, and solutions were said to be CPU-bound or I/O-bound.
Today memories are very large, so the critical factor is usually CPU/nest capacity, and we describe workloads as CPU-intense or nest-intense. A useful proxy for nest capacity is the size of the highest-level cache. A useful design-ratio estimator is (clock × cores)/cache. For the same core count, the machine with the lower design ratio will hold more containers. In part this is because the machine with more cache will scale better and see less saturation at higher utilization.
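The design-ratio estimator from this answer, written out in Python. The two machines are hypothetical, and the result is only meaningful for comparing hosts against each other, since the units (GHz × cores per MB of cache) are arbitrary.

```python
def design_ratio(clock_ghz, cores, cache_mb):
    """The answer's estimator: (clock x cores) / size of the highest-level cache."""
    return clock_ghz * cores / cache_mb


# Two hypothetical 16-core machines that differ only in last-level cache size
small_cache = design_ratio(3.0, 16, 32)  # 1.5
large_cache = design_ratio(3.0, 16, 64)  # 0.75

# Per the heuristic, the lower ratio (more cache) should hold more containers.
print("prefer large-cache host" if large_cache < small_cache else "prefer small-cache host")
```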

Does Apache Mesos recognize GPU cores?

In slide 25 of this talk by Twitter's Head of Open Source office, the presenter says that Mesos allows one to track and manage even GPU (I assume he meant GPGPU) resources. But I can't find any information on this anywhere else. Can someone please help? Besides Mesos, are there other cluster managers that support GPGPU?
Mesos does not yet provide direct support for (GP)GPUs, but does support custom resource types. If you specify --resources="gpu(*):8" when starting the mesos-slave, then this will become part of the resource offer to frameworks, which can launch tasks that claim to use these resources. Once some of the gpu resources are in use by a task, only the remaining resources will be offered again, until that task completes and the gpu resources become available again. In this way, the Mesos resource allocator can actually schedule the gpu resources you declared, and ensure that only the amount declared are offered/allocated to frameworks.
Mesos does not yet have support for gpu isolation, but with "pluggable isolator modules", you could build your own gpu isolator to enforce gpu resource limits.
Alternately, if you don't want to allocate individual gpu resources, but only want to declare some nodes as having gpus while others do not, you can just use --attributes="hasGpu:true" or something similar to differentiate the nodes that do/do not have gpus. This information is also passed onto the frameworks in resource offers, but these attributes cannot be "consumed" by a running task, so they will always be offered for that node.
For more information, see https://mesos.apache.org/documentation/attributes-resources/
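A toy Python model of what the answer describes: parsing a semicolon-separated --resources string that includes a custom gpu(*) entry, and offering only the unallocated amount until a task finishes. It is not Mesos's allocator; it handles scalar entries only (not ranges such as ports), and the helper names are hypothetical.

```python
import re

# name, optional (role), colon, scalar value, e.g. "gpu(*):8" or "cpus:4"
ENTRY = re.compile(r"^(?P<name>[A-Za-z_][\w-]*)(?:\((?P<role>[^)]*)\))?:(?P<value>\d+(?:\.\d+)?)$")


def parse_scalar_resources(spec):
    """Parse a --resources string into {(name, role): amount}; scalars only."""
    resources = {}
    for entry in filter(None, (e.strip() for e in spec.split(";"))):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"unsupported resource entry: {entry!r}")
        resources[(match["name"], match["role"] or "*")] = float(match["value"])
    return resources


class ScalarPool:
    """Offer only what is unallocated; give resources back when a task finishes."""

    def __init__(self, total):
        self.available = dict(total)

    def offer(self):
        return dict(self.available)

    def launch(self, key, amount):
        if self.available.get(key, 0.0) < amount:
            raise ValueError(f"not enough {key[0]} offered")
        self.available[key] -= amount

    def finish(self, key, amount):
        self.available[key] += amount


pool = ScalarPool(parse_scalar_resources("cpus:4;mem:4096;gpu(*):8"))
pool.launch(("gpu", "*"), 3)
print(pool.offer()[("gpu", "*")])  # 5.0 while the task runs
pool.finish(("gpu", "*"), 3)
print(pool.offer()[("gpu", "*")])  # 8.0 again after it completes
```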
