If I run out of memory, which process will be killed?

I have an important process which I don't want to be killed. This process is using about 10GB of RAM. I have 32GB available. I want to run another process which will take up 18.2GB of RAM. There should be some room left. What happens if I hit the full 32GB? Will the last program I called be killed? That wouldn't be so bad, but the important one cannot die.

It is likely that part of one of your programs will be moved to disk (swapped out), but I'm not sure which one, so I recommend batch processing.
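Whether both processes fit is simple arithmetic, although the OS, the page cache and everything else running also need memory, so the real headroom is smaller than this. On Linux, once RAM and swap are exhausted, the kernel's OOM killer picks a victim based on each process's `oom_score`, and writing -1000 to `/proc/<pid>/oom_score_adj` tells it never to pick that process. The sketch below assumes Linux; both helper names are hypothetical, and lowering the value below its current setting requires root (CAP_SYS_RESOURCE).

```python
from pathlib import Path


def headroom_gb(total_gb, process_sizes_gb):
    """RAM left after the listed processes (ignores the OS and page cache)."""
    return total_gb - sum(process_sizes_gb)


def protect_from_oom_killer(pid, adj=-1000):
    """Linux only: an adj of -1000 means the OOM killer never selects this pid.

    Lowering the value requires root (CAP_SYS_RESOURCE).
    """
    Path(f"/proc/{pid}/oom_score_adj").write_text(f"{adj}\n")


if __name__ == "__main__":
    # 10 GB important process + 18.2 GB new process on a 32 GB machine
    print(f"{headroom_gb(32, [10, 18.2]):.1f} GB left")
```

As root, `protect_from_oom_killer(12345)` would protect the process with (hypothetical) PID 12345; the shell equivalent is `echo -1000 > /proc/<pid>/oom_score_adj`. The OOM killer will then pick another process, such as the newer 18.2 GB one, if memory runs out.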

Related

Dask: Would storage network speed cause a worker to die

I am running a process that writes large files across the storage network. I can run the process using a simple loop and I get no failures. I can run it using distributed and jobqueue during off-peak hours and no workers fail. However, when I run the same command during peak hours, workers kill themselves.
I have ample memory for the task and plenty of workers, so I am not sitting in a queue.
The error logs usually show a series of warnings about going over garbage-collection limits, followed by a "Worker killed with Signal 9" message.
Signal 9 suggests that the process has violated some system limit, not that Dask decided the worker should die. Since this only happens under heavy disk I/O at busy times, I agree that the network storage is the likely culprit: for example, a lot of writes have been buffered but are not being flushed quickly enough over the relatively low bandwidth.
Dask also uses local storage for temporary files, and "local" might actually be the network storage. If the nodes have real local disks, use those; if not, consider turning off disk spilling altogether. https://docs.dask.org/en/latest/setup/hpc.html#local-storage
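A minimal sketch of the two options as Dask configuration, built as a plain dict so it can be passed to `dask.config.set()`. The keys `temporary-directory`, `distributed.worker.memory.target` and `distributed.worker.memory.spill` are Dask configuration keys (setting the memory thresholds to False disables spilling); the helper name and the scratch path are hypothetical.

```python
def dask_storage_config(local_dir=None, disable_spilling=False):
    """Build a dict suitable for dask.config.set()."""
    cfg = {}
    if local_dir is not None:
        # Where workers put spilled data and other temporary files
        cfg["temporary-directory"] = local_dir
    if disable_spilling:
        # False disables these memory thresholds, so nothing is spilled to disk
        cfg["distributed.worker.memory.target"] = False
        cfg["distributed.worker.memory.spill"] = False
    return cfg
```

With Dask installed, `dask.config.set(dask_storage_config("/local/scratch"))` applies it in the current process. Workers started by dask-jobqueue run as separate jobs, so they need the setting in their own configuration; the jobqueue cluster classes also accept a `local_directory` argument for the same purpose.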

What happens to ECS containers that exceed soft memory limit when there is memory contention?

Say I have an instance with 2G memory, and a task/container with 0.5G soft memory limit, and 0.75G hard memory limit.
The instance is running 3 containers, each consuming 0.6G of memory. Now a 4th container needs to be added. What happens to the 3 running containers? Is their memory allocation reduced? Or are they migrated to another instance? If there is no other instance, will the 4th container be placed at all?
I understand how soft and hard CPU limits work since CPU is a dynamic resource (the application can handle spikes in free CPU). In case of memory, however, you cannot really take away memory from a container that is already using it.
The 4th container will not be able to start, and you will get the following error:
(service sample) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance 05016874-f518-4b7a-a817-eb32a4d387f1) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.
You need to add another ECS instance if you want to schedule the 4th container. The other 3 containers will stay in a steady state; their memory allocation is not reduced. If there is no other instance, your service will remain in an unsteady state and keep reporting the error above.
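The error follows from how ECS places tasks: according to the task definition documentation, when a container sets a soft limit (`memoryReservation`), that reservation, not the container's actual usage, is subtracted from the instance's available memory. A minimal sketch of that check, ignoring CPU, ports and other placement constraints, assuming the 2 GB instance registers about 1900 MiB with ECS (a hypothetical figure; the OS and the ECS agent use the rest) and a 512 MiB reservation per container. The function name is hypothetical.

```python
def can_place(registered_mib, reservations_mib, new_reservation_mib):
    """Reservation-based placement check; actual memory usage is ignored."""
    reserved = sum(reservations_mib)
    return reserved + new_reservation_mib <= registered_mib
```

With three containers already reserving 512 MiB each, `can_place(1900, [512] * 3, 512)` is False (2048 > 1900), even though the containers' real usage (0.6G each) never enters the calculation.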
Ref: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
Actually, memory can be reclaimed from running processes. For example the kernel may evict memory that is backed by files (like the code of the process itself). If the data ends up being needed again the kernel can page it back in. This is explained a little in this blog post: https://chrisdown.name/2018/01/02/in-defence-of-swap.html
If the task is scheduled on that node but the kernel fails to reclaim enough memory to avoid an out-of-memory situation, the kernel will kill one of the processes; Docker will detect this and kill the container, and ECS will notice. I'm not sure whether ECS will try to reschedule the dead task on the same instance or a different one. It probably depends.
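To get a rough idea of how much of a node's memory is file-backed, and therefore a candidate for the reclaim described above, you can sum the `Active(file)` and `Inactive(file)` lines of `/proc/meminfo`. Treat this as an upper bound under a simplifying assumption of mine: dirty pages must be written back before they can be dropped, and recently used pages are reclaimed last. The kernel's own `MemAvailable` field is usually the better estimate. The function name is hypothetical.

```python
def file_backed_kib(meminfo_text):
    """Sum the file-backed LRU lists from /proc/meminfo text (values in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0)
```

On a Linux host: `with open("/proc/meminfo") as f: print(file_backed_kib(f.read()))`.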

Identify core an Erlang process

Any way to identify the specific core an Erlang process is scheduled on?
Let's say you spawn a bunch of processes to simply print out the core the process is running on, and then exit. Any way to do this?
I spent some time reading docs and googling but couldn't find anything.
Thanks.
EDIT: "core" = CPU core number (or if not number, another identifier that identifies the CPU core).
There is erlang:system_info(scheduler_id), which in most cases maps to a logical core. But this information is ephemeral, because the process may be suspended and resumed on any other scheduler.
What is your use case that you really need that kind of information?
No, there is not. If you spawn 2000 processes and they terminate quickly, chances are the job will finish before rebalancing occurs, in which case only a single core would be doing work the whole time.
You could, however, look at the scheduler utilization statistics; see erlang:statistics(scheduler_wall_time). It tells you how much work each scheduler is actually doing.
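The Erlang documentation computes utilization from `erlang:statistics(scheduler_wall_time)` by taking two samples (after enabling them with `erlang:system_flag(scheduler_wall_time, true)`) and, for each scheduler, dividing the change in active time by the change in total time. The same arithmetic is sketched in Python below for illustration, assuming the `{SchedulerId, ActiveTime, TotalTime}` tuples have been copied out of the Erlang shell; the function name is hypothetical.

```python
def scheduler_utilization(sample0, sample1):
    """Per-scheduler utilization between two scheduler_wall_time samples.

    Each sample is a list of (scheduler_id, active_time, total_time) tuples,
    mirroring what erlang:statistics(scheduler_wall_time) returns.
    """
    before = {sid: (active, total) for sid, active, total in sample0}
    result = {}
    for sid, active1, total1 in sample1:
        active0, total0 = before[sid]
        elapsed = total1 - total0
        # Fraction of the interval this scheduler spent doing work
        result[sid] = (active1 - active0) / elapsed if elapsed else 0.0
    return result
```

A scheduler close to 1.0 was busy for the whole interval; one near 0.0 was mostly idle, which is how you would see the single-core behaviour described above.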

Need to improve the Linux performance for embedded system

I have an ARM OMAP based embedded system with a 1 GHz processor running Linux 2.6.33, cross-compiled with CONFIG_PREEMPT. One of the processes (process 1) is critical and needs to run every 4 or 8 ms (configurable). A thread in another process (process 2) transfers images over FTP or to any other configured application. To trigger the time-critical process 1, I use a high-resolution timer in a separate thread (FIFO, say priority 60) with the highest real-time priority in the system. Process 2 has a lower RT priority (RR 20) than process 1 (RR 50).
If no image transfer is enabled or configured, I don't see any timeouts for the critical process 1. But if I enable an image transfer, either process 1 times out or the image transfer fails with an error; one of these processes dies, and then the other runs fine.
I see that the higher the image resolution, the sooner process 1 times out.
With higher-resolution images (say SXGA), the NET_RX Ethernet interrupt holds the CPU for a long time, and by the time it gives up the CPU, process 1 has timed out. It looks like the NET_RX interrupt has a higher priority than the timer interrupt used for process 1 and doesn't release the CPU.
I want to make sure both processes keep running and that process 1 never misses its deadline.
How can I debug the system to find exactly where it is waiting, so that I can remove those waits or at least avoid them where possible?
How can I achieve this? Please help.
Linux is not a real-time operating system. It offers no guarantees other than "best efforts" scheduling.
If you have a task which has to run at a particular rate all the time, you need to run that task under a proper RTOS which can make those sorts of guarantees.
Otherwise you have to relax your constraints to "runs every 4ms, mostly".
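Relaxing the requirement to "runs every 4ms, mostly" only works if you measure how often "mostly" fails. A minimal sketch, assuming the critical loop records a monotonic timestamp in nanoseconds at each wake-up (for example `clock_gettime(CLOCK_MONOTONIC)` in C, or `time.monotonic_ns()` in Python); the function name and the tolerance value are hypothetical.

```python
def deadline_report(wakeups_ns, period_ns, tolerance_ns=0):
    """Return (number of late wake-ups, longest interval) for a periodic task."""
    intervals = [cur - prev for prev, cur in zip(wakeups_ns, wakeups_ns[1:])]
    misses = sum(1 for i in intervals if i > period_ns + tolerance_ns)
    worst = max(intervals, default=0)
    return misses, worst
```

Logging the timestamps of late wake-ups and correlating them with image-transfer activity is one way to narrow down where the task is being held up.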
You may want to check "http://www.techonline.com/electrical-engineers/education-training/tech-papers/4402454/Challenges-in-Using-Linux-for-CPU-intensive-real-time-networking-products". It describes network performance under PREEMPT_RT.
I found the solution to this performance issue: I changed the scheduling policy of the thread that sends image data to SCHED_NORMAL and rearranged the source code to avoid unnecessary loops. Now the image transfer no longer affects the performance of the whole system.
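On the target this change would normally be made in C with `pthread_setschedparam()` or `sched_setscheduler()`. The Python sketch below makes the same Linux system call through `os.sched_setscheduler`, purely for illustration. In user space the policy is spelled `SCHED_OTHER`, which is what the kernel calls `SCHED_NORMAL`, and its static priority must be 0. On Linux the ID argument refers to a thread (0 means the calling thread), so from inside the image-transfer thread you could pass `threading.get_native_id()`. The function name is hypothetical.

```python
import os


def demote_to_normal(tid=0):
    """Move a thread (0 = calling thread) to SCHED_OTHER, i.e. SCHED_NORMAL.

    Linux only. SCHED_OTHER threads must use static priority 0.
    """
    os.sched_setscheduler(tid, os.SCHED_OTHER, os.sched_param(0))
```

Once the bulk transfer thread is no longer real-time, it can no longer compete with the RT timer thread for the CPU.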

speed up php-cli

Why is a php cli process using 25% of CPU, is there a way to reduce this? Right now I'm running 3 instances but obviously I would like to run much more to finish the job faster.
Background info: I'm moving data from a transbase db to mysql db.
EDIT: If I run this in a browser there isn't such a noticeable load on the CPU.
More processes don't necessarily mean faster processing. The PHP process takes as much CPU as it can to finish the task as quickly as possible. It's probably 25% because you have a quad-core processor and it's a single-threaded task.
Ideally, you would need 4 processes if you could assign each of them to a different core. Also, because of waiting on the database or disk I/O, a single thread cannot fully use its CPU all the time, so go ahead and run more processes. A 5th process won't crash because all CPU power is used up; it will just take its share while the OS divides processing power among all running processes.
Just don't start too many; every process has a little overhead, and you won't benefit from having 200 simultaneous processes.
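The 25% figure is plain arithmetic: one busy single-threaded process saturates one of four cores. To use the other cores, the migration can be split so that each php-cli process handles its own slice of the data. A sketch in Python for illustration, assuming the source table has a numeric primary key that can be divided into contiguous ranges (a hypothetical setup); each range would then be handed to a separate php-cli invocation. Both function names are hypothetical.

```python
import math
import os


def single_thread_cpu_share(cores):
    """CPU % shown by one fully busy single-threaded process, when usage is
    reported as a share of all cores."""
    return 100.0 / cores


def key_ranges(first_id, last_id, workers=None):
    """Split an inclusive id range into one contiguous chunk per worker."""
    if last_id < first_id:
        return []
    workers = workers or os.cpu_count() or 1
    size = math.ceil((last_id - first_id + 1) / workers)
    return [
        (start, min(start + size - 1, last_id))
        for start in range(first_id, last_id + 1, size)
    ]
```

Setting `workers` slightly above the core count leaves room for the I/O waits mentioned above, without drifting toward hundreds of processes.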

Resources