Current config:
16GO RAM, 4 cpu cores, Apache/2.2 using prefork module (which is set at 700 maxClients, since avg process size ~18MB), with suexec and suphp mods enabled (PHP 5.5).
Back-end of site using CakePHP2 and storing on a MySQL server. The site consists of text / some compressed images in the front and data processing in the back.
I use this shell script to calculate my avg process size:
total_httpd_processes_size=`ps -ylC apache2 --sort:rss | awk '{ sum += $9 } END { print sum }'`
total_http_processes_count=`ps -ylC apache2 --sort:rss | wc -l`
echo "total_http_processes_count=$total_http_processes_count"
AVG_httpd_process_size=$(expr $total_httpd_processes_size / $total_http_processes_count)
httpd_process_size_MB=$(expr $AVG_httpd_process_size / 1024)
echo "httpd_process_size_MB=$total_httpd_process_size_MB"
By disabling unused apache mods, I won ~4mb which is pretty nice (from 22mb to 18mb).
What else can I do to significantly reduce the AVG process size ?
Related
I'm running dask over slurm via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is what could be causing these failures? At first glance the problem is that too many workers are writing to disk at once, or my workers are forking into many other processes, but it's pretty difficult to track that. I can ssh into the node but I'm not seeing an abnormal number of processes, and each node has a 500gb ssd, so I shouldn't be writing excessively.
Everything below this is just information about my configurations and such
My setup is as follows:
cluster = SLURMCluster(cores=1, memory=f"{args.gbmem}GB", queue='fast_q', name=args.name,
env_extra=["source ~/.zshrc"])
cluster.adapt(minimum=1, maximum=200)
client = await Client(cluster, processes=False, asynchronous=True)
I suppose i'm not even sure if processes=False should be set.
I run this starter script via sbatch under the conditions of 4gb of memory, 2 cores (-c) (even though i expect to only need 1) and 1 task (-n). And this sets off all of my jobs via the slurmcluster config from above. I dumped my slurm submission scripts to files and they look reasonable.
Each job is not complex, it is a subprocess.call( command to a compiled executable that takes 1 core and 2-4 GB of memory. I require the client call and further calls to be asynchronous because I have a lot of conditional computations. So each worker when loaded should consist of 1 python processes, 1 running executable, and 1 shell.
Imposed by the scheduler we have
>> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 512
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 1031203
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
And each node has 64 cores. so I don't really think i'm hitting any limits.
i'm using the jobqueue.yaml file that looks like:
slurm:
name: dask-worker
cores: 1 # Total number of cores per job
memory: 2 # Total amount of memory per job
processes: 1 # Number of Python processes per job
local-directory: /scratch # Location of fast local storage like /scratch or $TMPDIR
queue: fast_q
walltime: '24:00:00'
log-directory: /home/dbun/slurm_logs
I would appreciate any advice at all! Full log is below.
FORK BLOCKING IO ERROR
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.131.82:13687'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dbun/.local/share/pyenv/versions/3.7.0/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.dask_worker - INFO - End worker
Aborted!
CANT START NEW THREAD ERROR
https://pastebin.com/ibYUNcqD
BLOCKING IO ERROR
https://pastebin.com/FGfxqZEk
EDIT:
Another piece of the puzzle:
It looks like dask_worker is running multiple multiprocessing.forkserver calls? does that sound reasonable?
https://pastebin.com/r2pTQUS4
This problem was caused by having ulimit -u too low.
As it turns out each worker has a few processes associated with it, and the python ones have multiple threads. In the end you end up with approximately 14 threads that contribute to your ulimit -u. Mine was set to 512, and with a 64 core system I was likely hitting ~896. It looks like the a maximum threads per a process I could have had would have been 8.
Solution:
in .zshrc (.bashrc) I added the line
ulimit -u unlimited
Haven't had any problems since.
In Snakemake, I have 5 rules. For each I set the memory limit by resources mem_mb option.
It looks like this:
rule assembly:
input:
file1 = os.path.join(MAIN_DIR, "1.txt"), \
file2 = os.path.join(MAIN_DIR, "2.txt"), \
file3 = os.path.join(MAIN_DIR, "3.txt")
output:
foldr = dir, \
file4 = os.path.join(dir, "A.png"), \
file5 = os.path.join(dir, "A.tsv")
resources:
mem_mb=100000
shell:
" pythonscript.py -i {input.file1} -v {input.file2} -q {input.file3} --cores 5 -o {output.foldr} "
I want to limit the memory usage of the whole Snakefile by doing something like:
snakamake --snakefile mysnakefile_snakefile --resources mem_mb=100000
So not all jobs would use 100GB each ( if I have 5 rules, meaning as 500GB memory allocation), but all of their executions will be maximum 100GB ( 5 jobs, total of 100 GB allocation?)
The command line argument sets the total limit. The Snakemake scheduler will ensure that for the set of running jobs, the sum of the mem_mb resources will not exceed the total limit.
I think this is exactly what you want, isn't it? You just need to set the per-job expected memory in the rule itself. Note that Snakemake does not measure this for you. You have to define that value yourself in the rule. E.g., if you expect your job to use 100MB memory, put mem_mb=100 into that rule.
I am following this link and trying to implement the scenarios there.
So I need to generate a data for MANET nodes representing their location in this format:
Current time - latest x – latest y – latest update time – previous x –previous y – previous update time
with the use of setdest tool with these options:
1500 by 300 grid, ran for 300 seconds and used pause times of 20s and maximum velocities of 2.5 m/s.
so I come up with this command
./setdest -v 2 -n 10 -s 2.5 -m 10 -M 50 -t 300 -p 20 -x 1500 -y 300 > test1.tcl
which worked and generated a tcl file, but I don't know how can I obtain the data in the required format.
setdest -v 2 -n 10 -s 2.5 -m 10 -M 50 -t 300 -p 20 -x 1500 -y 300 > test1.tcl
Not a tcl file : Is a "scen" / scenario file with 1,700 "ns" commands. Your file was renamed to "test1.scen", and is now used in the manet examples, in the simulation example aodv-manet-20.tcl :
set val(cp) "test1.scen" ;#Connection Pattern
Please be aware that time settings are maximum time. "Long time settings" were useful ~20 years ago when computers were slow. (Though there are complex simulations lasting half an hour to one hour.)
Link, manet-examples-1.tar.gz https://drive.google.com/file/d/0B7S255p3kFXNR05CclpEdVdvQm8/view?usp=sharing
Edit: New example added → manet0-16-nam.tcl → → https://drive.google.com/file/d/0B7S255p3kFXNR0ZuQ1l6YnlWRGc/view?usp=sharing
I am collecting some metrics using graphite, but sometimes there is no data coming into it (probably because the server has gone down, or no network connectivity). I want nagios to send me an alert during such an event. How do i do that?
You could use the check_file_age script from nagios-plugins to check a single known datapoint of interest per system that you are collecting data from.
check_file_age -w 600 -c 1800 /opt/graphite/storage/whisper/servers/$(uname -f)/cpu/idl.wsp
That would alert you if a certain metric was missing within 5 minutes.
Else
You could run a find command over all the points, and report any that have not been updated in n hours.
#!/bin/bash
OLD_GRAPHS=$(find /opt/graphite/storage/whisper -mmin +120 -type f | wc -l)
if [[ OLD_GRAPHS -gt 0 ]];then
echo "Found ${OLD_GRAPHS} graph(s) without an update in 120 minutes"
exit 1
fi
echo "All graphs are up to date"
exit 0
I want to capture all bandwidth value in iperf not only Mbits size but also bits and Kbits as well.
[3] 0.0 - 1.0 sec 128 Kbytes 1.05 Mbits/sec
[3] 1.0 - 2.0 sec 0 Kbytes 0.00 bits/sec
[3] 2.0 - 3.0 sec 90 Kbytes 900.5 Kbits/sec
So far I know about this
iperf -c 10.0.0.1 -i 1 -t 100 | grep -Po '[0-9.]*(?= Mbits/sec)'
but that only captures Mbits value. How to capture bits/sec and Kbits/sec as well at the same time with Mbits/sec?
Thank you
I know this is old, but in case someone stumbles upon it, you could add an optional character class to your grep:
grep -Po '[0-9.]*(?= [KM]*bits/sec)'
This should do it
iperf -c 10.0.0.1 -i 1 -t 100 | awk '{print$5}' FPAT=[.0-9]+
FPAT=[.0-9]+ defines a field as one or more of .0-9
{print$5} prints just the rate
you might want to man iperf to see what's supported. Here's the latest from 2.0.10
-f, --format
[abkmgKMG] format to report: adaptive, bits, Kbits, Mbits, KBytes, MBytes (see NOTES for more)