I am working on a project to allocate tasks to the containers. Since my major is not Computer science, I am not familiar with the mathematical theory behind that.
The question which I am going to ask is regarding the computation time of Docker containers.
I am copying a part of an article that confuses me.
Delay(t) = lambda(t) / C_max
where C_max is the capacity of resources allocated to the container. The lambda be the
mean of a Poisson Process of the requests arrival.
Now, I have several questions:
What is the unit of lambda and C_max if the resources allocated to this container are Processing resources? Would it be a cycle per sec?
Can more than one task be allocated to a container at the same time or we need a buffer to store future tasks to be processed?
Question 2:
Yes, more than one task can be executed in a container at the same time. A container can have multiple processes inside it. But for ease of working with and reasoning about often people just try to put a single process inside a container.
Related
I am working on a project that is using Openwhisk. I have created a Kubernetes cluster on Google cloud with 5 nodes and installed OW on it. My serverless function is written in Java. It does some processing based on arguments I pass to it. The processing can last up to 30 seconds and I invoke the function multiple times during these 30 seconds which means I want to have a greater number of runtime containers(pods) created without having to wait for the previous invocation to finish. Ideally, there should be a container for each invocation until the resources are finished.
Now, what happens is that when I start invoking the function, the first container is created, and then after few seconds, another one to serve the first two invocation. From that point on, I continue invoking the function (no more than 5 simultaneous invocation) but no containers are started. Then, after some time, a third container is created and sometimes, but rarely, a fourth one, but only after long time. What is even weirded is that the containers are all started on a single cluster node or sometimes on two nodes (always the same two nodes). The other nodes are not used. I have set up the cluster carefully. Each node is labeled as invoker. I have tried experimenting with memory assigned to each container, max number of containers, I have increased the max number of invocations I can have per minute but despite all this, I haven't been able to increase the number of containers created. Additionally, I have tried with different machines used for the cluster (different number of cores and memory) but it was in vain.
Since Openwhisk is still relatively a young project, I don't get enough information from the official documentation unfortunately. Can someone explain how does Openwhisk decide when to start a new container? What parameters can I change in values.yaml such that I achieve greater number of containers?
The reason why very few containers were created is the fact that worker nodes do not have Docker Java runtime image and that it needs be downloaded on each of the nodes the first this environment is requested. This image weights a few hundred MBs and it needs time to be downloaded (a couple of seconds in google cluster). I don't know why Openwhisk controller decided to wait for already created pods to be available instead of downloading the image on other nodes. Anyway, once I downloaded the image manually on each of the nodes, using the same application with the same request rate, a new pod was created for each request that could not be served with an existing pod.
The OpenWhisk scheduler implements several heuristics to map an invocation to a container. This post by Markus Thömmes https://medium.com/openwhisk/squeezing-the-milliseconds-how-to-make-serverless-platforms-blazing-fast-aea0e9951bd0 explains how container reuse and caching work and may be applicable for what you are seeing.
When you inspect the activation record for the invokes in your experiment, check the annotations on the activation record to determine if the request was "warm" or "cold". Warm means container was reused for a new invoke. Cold means a container was freshly allocated to service the invoke.
See this document https://github.com/apache/openwhisk/blob/master/docs/annotations.md#annotations-specific-to-activations which explains the meaning of waitTime and initTime. When the latter is decorating the annotation, the activation was "cold" meaning a fresh container was allocated.
It's possible your activation rate is not fast enough to trigger new container allocations. That is, the scheduler decided to allocate your request to an invoker where the previous invoke finished and could accept the new request. Without more details about the arrival rate or think time, it is not possible to answer your question more precisely.
Lastly, OpenWhisk is a mature serverless function platform. It has been in production since 2016 as IBM Cloud Functions, and now powers multiple public serverless offerings including Adobe I/O Runtime and Naver Lambda service among others.
Say I have an instance with 2G memory, and a task/container with 0.5G soft memory limit, and 0.75G hard memory limit.
The instance is running 3 containers, each consuming 0.6G memory. Now a 4th container needs to be added? What happens to the 3 running containers? Is their memory allocation reduced? Or are they migrated to another instance? What if there is no other instance, will the 4th container be placed?
I understand how soft and hard CPU limits work since CPU is a dynamic resource (the application can handle spikes in free CPU). In case of memory, however, you cannot really take away memory from a container that is already using it.
The 4th container will not be able to spawn and you will get the below error.
(service sample) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance 05016874-f518-4b7a-a817-eb32a4d387f1) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.
You need to add another ecs instance if you want to schedule the 4th container. all other 3 containers will be in the steady state. Nothing like memory allocation reduced happened in the cluster. If there is no instance your service will always be in an unsteady state and continue to give you the above errors.
Ref: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
Actually, memory can be reclaimed from running processes. For example the kernel may evict memory that is backed by files (like the code of the process itself). If the data ends up being needed again the kernel can page it back in. This is explained a little in this blog post: https://chrisdown.name/2018/01/02/in-defence-of-swap.html
If the task is scheduled on that node but the kernel fails to reclaim enough memory to avoid an out-of-memory situation then one of the processes will get killed by the kernel, which docker will detect and kill the container, which ECS will notice. I'm not sure if ECS will try to reschedule the dead task on the same instance or a different one. It probably depends.
Background:
I have a language processing java app that requires about 16MB memory and takes about 40 seconds to initialise resources into that memory before exposing a webservice. I am new to containers and related technologies so apologies if my question is obvious...
Objective:
I want to make available several hundred instances of my app on-demand and in a pre-loaded/ pre-configured state. (eg I could make a call to AWS to stand-up 'n' instances of my app and they would be ready in <10seconds.)
Question:
I'm anticipating that I 'may' be able to create a docker image of the app, initialise it and pause hence be able to clone that on demand and 'un-pause' ? Could you advise whether what I am looking to do is possible and if so, how you would approach it.
AWS is my platform of choice so any AWS flavoured specifics would be super helpful.
I'd split your question in two, if you don't mind:
1. Spinning up N containers (or, more likely, scale on demand)
2. Preloading memory.
#1 is Kubernetes's bread and butter and you can find a ton of resources about it online, so allow me to focus on #2.
The real problem is that you're too focused on a possible solution to see the bigger picture:
You want to "preload memory" in order to speed up launch time (well, what do you think Java is doing in those 40s that the magick memory preloader wouldn't?).
A different approach would be to launch the container, let Java eat up resources for 40s, but not make that container available to the world during that time.
Kubernetes provides tools to achieve exactly that, see here:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Hope this helps!
I have a video editing task that needs to be completed occasionally. The task is relatively intensive and therefore needs a powerful machine to do it. It can take up to about 10 minutes to complete. I might get 10-20 such requests per day, though that will increase in the future.
I have created a docker container that currently is a consumer that pulls jobs from PubSub. I was thinking to have an instance of this container on Google Container Engine. However, as I understand it, I would need to have at least one instance of this (large / powerful / expensive) container running at all times, even if the majority of time it is sat idle. Therefore my cost for running this service would be overly high until my usage increased.
Is there an alternative way of running my container (GCP or otherwise) where I push a job to some service, which then starts an instance of a powerful machine, processes the job, then shuts down? Therefore I am paying for my CPU hours used.
Have a look at the cluster autoscaler: https://cloud.google.com/container-engine/docs/cluster-autoscaler
I wish to know is there any way that I can create the threads on other nodes without starting the process on the nodes.
For example :- lets say I have cluster of 5 nodes I am running an application on node1. Which creates 5 threads on I want the threads not to be created in the same system but across the cluster lets say 1 node 1 thread type.
Is there any way this can be done or is it more depends on the Load Scheduler and does openMP do something like that?
if there is any ambiguity in the question plz let me know I will clarify it.
Short answer - not simply. Threads share a processes' address space, and so therefore it is extremely difficult to relocate them across cluster nodes. And, if it is possible (systems do exist which support this) then getting them to maintain a consistent state introduces a lot of synchronization and communication overhead which impacts on performance.
In short, if you're distributing an application across a cluster, stick with multiple processes and choose a suitable communication mechanism.
generally, leave threads to vm or engine to avoid very inert locks, focus app or transport, if one, create time (200 hz=5ms heuristic), if 2, repaint, good pattern: eventdrive