Kubernetes & Docker Efficiency [closed] - docker

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
Improve this question
I've been looking for information on how efficiently Kubernetes & Docker are in terms of using machine resources, but I haven't found much so far. Here are my three questions, all about Kubernetes+Docker:
If multiple containers on the same node are running the same binary, are the code pages shared between all these instances? That is, is there a single set of physical pages allocated on the node for all these processes? For example, if I'm running a service mesh like Istio, which runs Envoy in every pod, is the system smart enough to only load the Envoy code in memory once, or does all the indirection taking place prevent the Linux kernel from recognizing that sharing is possible?
In a large Kubernetes deployment, there will end up being a considerable number of redundantly downloaded docker images on each node. Instead, it would seem more effective to have a single in-cluster repository for these images that all nodes can fetch from. I saw this about having docker use NFS for a common image store. Is this the only answer?
I heard there's a practical limit to the number of pods Kubernetes will schedule on a single node (30). Such a small limit forces you to use smaller VMs in order to be able to fully saturate them. Anybody know why this limit exists and whether it will eventually be raised? I ask this in the context of trying to run Kubernetes on bare metal where VMs aren't used at all. In such a world, I'd want to be able to pack way more than 30 pods on a (large) physical machine.
Thank you for any insights or pointers.

You state your question in the way that you plan to use docker as container runtime for kubernetes. That is fine - but there are more choices. Depending on the runtime the answers will change.
In general kubernetes provides an abstraction over the actual scheduling and running of pods/containers. Perhaps you invest too much human time into details that can be solved with more metal, which is cheap.
Multiple containers on a single node are usually (docker/containerd/crio) just system processes. Like you launch your Apache httpd multiple times yourself. If the kernel uses memory deduplication, it can indeed share pages.
If you use a container runtime that launches micro-VMs (firecracker,kata, ...) I doubt memory deduplication will be possible.
I would not recommend to share storage for the container images, f.e. with NFS. With some customer setups I had to diagnose issues caused by this. like deadlocks. Basically you would reduce the robustness of your cluster in order to save disk space. Just use more metal.
The usual limit is 110 Pods per node which is usually plenty. You can change this limit using --max-pods parameter to the kubelet process or configuration file for kubelet. The reason for the limit is that the management of a pod incurs effort on the kubelet and etcd/apiserver side.

Related

Can concurrent docker containers speed-up computations?

As a disclaimer let me just say that I am a beginner with Docker and hence the question might sound a bit "dummy".
I am exploring parallelization options to speed-up some computations. I'm working with Python, so I followed the official guidelines to create my first image and then run it as a container.
For the time being, I use a dummy program that generates a very large np random matrix (let's say 4000 x 4000) and then finds how many elements in each row fall into a predefined range [min, max].
I then launched a second container of the same image obviously with a different port and name. I didn't get any speed-ups in the computations which I was expecting somehow, since:
a) I haven't developed any mechanism for the 2 containers to somehow "talk to each other" and share intermediate results
b) I am not sure if the program itself is suitable for speedups in such a way.
So my questions corresponding to a, b above are:
Is parallelism a "feature" supported by docker deployments and in what sense? Is it just load sharing? And if I implement a load balancer, how does docker know how to transfer intermediate results from one container to the other?
If the previous question is not "correct", do I then need to write "parallel" versions of my programs to assign to each container? Isn't this equivalent to writing MPI versions of my program and assign them to different cores in my system? What would be the benefit of a docker architecture then?
thanks in advance
Docker is just a way of deploying your application - it does not in itself allow you to 'support' parallelism just by using Docker. Your application itself needs to support parallelism. Docker (and Kubernetes, etc) can help you scale out in parallel easily but your applications need to be able to support that scaling out.
So if you can run multiple instances of your application in parallel now (however you might do that) and it would not deliver any performance improvement then Docker will not help you. If running multiple instances now does improve performance then Docker will help scale out.

What will be a project level approach in Kubernetes? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
My company has one project which required 3-4 days of deployment time. I thought about it and try to make one deployment modal for this project using Kubernetes.
I read all about it but getting into project-level create some problem.
What is done till now...
Created Kubernetes cluster with one master node and one worker node in ubuntu VM.
Understand I need to create a Deployment file, Service file, Persistent volume, and claim.
Created a custom image with the base image as CentOS7 and python2.7 with certain requirements and uploaded them on the docker hub.
Now I created one Deployment.yml file to pull that image but it is Showing CrashLoopBackOff error and IT IS NOT able to pull the image through Deployment.yml file
Note: I pulled the image separately using docker and it is working.
Thanks in advance :)
It is a very wide area but I could give you certain high level points with respect to kubernetes.
Create different clusters for different projects. Also create different cluster for different environment like QA, Dev, Production.
Set resource quotas for individual projects. Also your deployments should have resource limits for RAM and CPUs. Precisely estimate the resource demand for each and every application.
Use namespaces for logical separation and using tags is always a good approach.
If you want to follow template based approach, you could search about helm charts.
Your k8s nodes, disks, deployments, services, ingress any other kind of kubernetes object you create should have labels.
Use node auto scaling (cloud specific) and horizontal pod auto scaling techniques for better scaling and resilience.
Always try to distribute your k8s deployments across region for fail-over strategy. If anything goes down in some part of your hosted region then your application should sustain it.
In case your want to move project to some reputed cloud provider, try to integrate cloud provided security and firewall rules with your k8s cluster.
I hope this would help.
I'd go with one cluster for production and one cluster for development/testing. Within the cluster you can use namespaces to isolate group of applications. For example every developer has its own namespace for testing.

How big can a GKE container image get before it's a problem?

This question is admittedly somewhat vague. If you have suggestions how to better word it, please by all means, give me feedback...
I want to understand how big a GKE container image can get before there may be problems, either serious or minor. For example, I've built a docker image (not deployed yet) that is 683 MB.
(As an aside, the reason it's so big is that I'm running a computer vision library licensed from a company with certain attributes: (1) uses native libraries that are not compatible with Alpine; (2) uses Java; (3) uses Node.js to run a required licensing daemon in same container; (4) has some very large machine learning model files.)
Although the service will have auto-scaling enabled, I expect the auto-scaling to be fairly light. It might add a new pod occasionally, but not major spikes up and down.
The size of the container will determine how many resources to assign it and thus how much CPU, memory and disk space your nodes.must have. I have seen containers require over 2 GB of memory and still work fine within the cluster.
There probably is an upper limit but the containers would have to be enormous, your container size should not pose any issues aside from possibly container startup
In practice, you're going to have issues pushing an image to GCR before you have issues running it on GKE, but there isn't a hard limit outside the storage capabilities of your nodes. You can get away with O(GB) pretty easily.

Using Docker for a mail server [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Improve this question
I've been interested in docker for a while, but not jumped in yet. I have a need to set up a mail server, so thought maybe I could use this as a reason to learn more about docker. However, I'm unclear how to best go about it.
I've installed a mailserver on a VPS before, but not into multiple containers. I'd like to install Postfix, Dovecot, MySQL or Postgresql, and SpamAssassin, similar to what is described here:
https://www.digitalocean.com/community/tutorials/how-to-configure-a-mail-server-using-postfix-dovecot-mysql-and-spamassasin
However, what would be a good way to dockerize it? Would I simply put everything into a single container? Or would it be better to have MySQL in one container, Postfix in another, and additional containers for Dovecot and SpamAssassin? Or should some containers be shared?
Are there any HOWTOs on installing a mailserver using docker? If there is, I haven't found it yet.
The point of Docker isn't containerization for containerization's sake. It is to put together things that belong together and separate things that don't belong together.
With that in mind, the way I would set this up is with a container for the MySql database and another container for all of the mail components. The mail components are typically integrated with each other by calling each other's executables or by reading/writing shared files, so it does not make sense to separate them in separate containers anyway. Since the database could also be used for other things, and communication with it is done over a socket, it makes more sense for that to be a separate container.
Dovecot, Spamassassin, et al can go in separate containers to postfix. Use LMTP for the connections and it'll all work. This much is practical.
Now for the ideological bit. If you really wanted to do things 'the docker way', what would that look like.
Postfix is the difficult one. It's not one daemon, but rather a cluster of different daemons that talk to each other and do different parts of the mail handling tasks. Some of the interaction between these component daemons is via files (e.g the mail queues), some is via sockets, and some is via signals.
When you start up postfix, you really start the 'master' daemon, which then starts the other daemon processes it needs using the rules in master.cf.
Logging is particularly difficult in this scenario. All the different daemons independently log to /dev/log, and there's really no way to process those logs without putting a syslog daemon inside the container. "Not the docker way!"
Basically the compartmentalisation of functionality in postfix is very much a micro-service sort of approach, but it's not based on containerisation. There's no way for you to separate the different services out into different containers under docker, and even if you could, the reliance on signals is problematic.
I suppose it might be possible to re-engineer the 'master' daemon, giving it access to the docker process in the host, (or running docker within docker), and thus this new master daemon could coordinate the various services in separate containers. We can speculate, but I've not heard of anyone moving on this as an actual project.
That leaves us with the more likely option of choosing a more container friendly daemon than postfix for use in docker. I've been using postfix more or less exclusively for about the past decade, and haven't had much reason to look around options till now. I'd be very interested if anyone can add commentary on possible more docker-friendly MTA options?

What is the best distributed Erlang in-memory cache? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
I need some suggestion for the erlang in-memory cache system.
The cache item is key-value based storage.
key is usually an ASCII string; value is erlang's types include number / list / tuple / etc.
The cache item can be set by any of the node.
The cache item can be get by any of the node.
The cache item is shared cross all nodes even on different servers
dirty-read is permitted, I don't want any lock or transaction to reduce the performance.
Totally distributed, no centralized machine or service.
Good performance
Easy install and deployment and configuration and maintenance
First choice seems to me is mnesia, but I have no experence on it.
Does it meet my requirement?
How the performance can I expect?
Another option is memcached --
But I am afraid the performance is lower than mnesia because extra serialization/deserialization are performed as memcached daemon is from another OS process.
Yes. Mnesia meets your requirements. However, like you said, a tool is good when the one using it understands it in depth. We have used mnesia on a distributed authentication system and we have not experienced any problem thus far. When mnesia is used as a cache it is better off than memcached, for one reason "Memcached cannot guarantee that what you write, you can read at any time, due to memory swap out issues and stuff" (follow here). However, this means that your distributed system is going to be built over Erlang. Indeed mnesia in your case beats most NoSQL cache solutions because their systems are Eventually consistent. Mnesia is consistent, as long as network availability can be ensured across the cluster. For a distributed cache system, you dont want a situation where you read different values for the same key from different nodes, hence mnesia's consistency comes in handy here. Something you should think about, is that, it is possible to have a centralised Memory cache for a distributed system. This works like this: You have RABBITMQ server running and accessible by AMQP clients on each Cluster node. Systems interact over the AMQP interface. Because, the cache is centralised, consistency is ensured by the process/system responsible for writing and reading from the cache. The other systems just place a request for a key, onto the AMQP message bus, and the system responsible for cache receives this message and replies it with the value.
We have used the Message bus Architecture using RABBITMQ for a recent system which involved integration with banking systems, an ERP system and Public online service. What we built was responsible for fusing all these together and we are glad that we used RABBITMQ. The details are many but what we did is to come up with a message format, and a system identification mechanism. All systems must have a RABBITMQ client for writing and reading from the message bus. Then you would create a read Queue for each system, so that other system write their requests into that queue, whose name inside RABBITMQ, is the same as the system owning it. Then, later, you must encrypt the messages passing over the bus. In the end, you have systems bound together over large distance/across states, but with an efficient network, you wont believe how fast RABBITMQ binds these systems. Anyhow, RABBITMQ can also be clustered, and i should tell you that it is Mnesia which powers RABBITMQ (that tells you how good mnesia can be).
Another thing is that, you should do some reading and write many programs until you are comfortable with it.

Resources