Unix CPU/Memory Monitoring

I want to develop a program that can monitor the CPU/memory usage of many Unix clients.
A Unix client program written in C uses popen to run a command that collects the CPU/memory information and sends it to the server over a socket.
For example, on Solaris 11, the following commands get the CPU/memory information:
CPU : top -n 1 |grep "CPU"|sed -n 1p|awk '{print $3}'|sed 's/[^0-9.0-9]//g'|awk '{print 100-$1}'
Memory : top -n 1 |grep "Mem" |awk '{print $2, $5}'|sed 's/[^0-9]/ /g'|awk '{print $2/$1*100}'
There is a problem here, though.
Solaris 9 and 10 do not have the top command.
On HP-UX I cannot pipe the output through grep, because the command cannot be run just once (there is no one-shot mode).
AIX's topas has the same problem.
If you know another command or approach, please recommend it.

You know, on HP-UX you can use glance.
You can also use top, but it is not the better choice.
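For the systems without a usable top (Solaris 9/10 in particular), vmstat and sar from the base OS can be parsed instead. The rough sketch below assumes the Solaris column layout (idle CPU is the last vmstat field, sar -r reports free memory in pages); HP-UX and AIX place these fields differently, so treat the field positions as assumptions to verify per platform.
# Run with ksh or a POSIX shell; the legacy Solaris /bin/sh does not support $(...).
# CPU busy %: take the second vmstat sample (the first is the average since boot);
# on Solaris the idle percentage is the last field, so busy = 100 - idle.
cpu=$(vmstat 1 2 | tail -1 | awk '{print 100 - $NF}')
# Memory used %: free pages from sar -r, page size from pagesize(1),
# total physical memory from prtconf (reported in megabytes).
freemem=$(sar -r 1 1 | tail -1 | awk '{print $2}')
pagekb=$(( $(pagesize) / 1024 ))
totalkb=$(prtconf | awk '/Memory size/ {print $3 * 1024}')
mem=$(awk -v f="$freemem" -v p="$pagekb" -v t="$totalkb" 'BEGIN {printf "%.1f", 100 - (f * p / t * 100)}')
echo "CPU=$cpu MEM=$mem"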

Related

Optimizing OpenGrok on a large code base

I have a server instance here with 4 cores, 32 GB RAM and Ubuntu 20.04.3 LTS installed. On this machine an OpenGrok instance is running as a Docker container.
Inside the Docker container it uses AdoptOpenJDK:
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Eclipse OpenJ9 VM AdoptOpenJDK-11.0.11+9 (build openj9-0.26.0, JRE 11 Linux amd64-64-Bit Compressed References 20210421_975 (JIT enabled, AOT enabled)
OpenJ9 - b4cc246d9
OMR - 162e6f729
JCL - 7796c80419 based on jdk-11.0.11+9)
The code base that the opengrok-indexer scans is 320 GB and indexing takes 21 hours.
What I figured out was that when I disable the history option, it takes less time. Is there a possibility to reduce this time while the history flag is set?
Here is my index command:
opengrok-indexer -J=-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -J=-Djava.util.logging.config.file=/usr/share/tomcat10/conf/logging.properties -J=-XX:-UseGCOverheadLimit -J=-Xmx30G -J=-Xms30G -J=-server -a /var/opengrok/dist/lib/opengrok.jar -- -R /var/opengrok/etc/read-only.xml -m 256 -c /usr/bin/ctags -s /var/opengrok/src/ -d /var/opengrok/data --remote on -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress -v -O on -T 3 --assignTags --search --remote on -i *.so -i *.o -i *.a -i *.class -i *.jar -i *.apk -i *.tar -i *.bz2 -i *.gz -i *.obj -i *.zip
Thank you for your help in advance.
Kind Regards
Siegfried
You should try to increase the number of threads using the following options:
--historyThreads number
The number of threads to use for history cache generation on repository level. By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.
--historyFileThreads number
The number of threads to use for history cache generation when dealing with individual files.
By default the number of threads will be set to the number of available CPUs.
Assumes -H/--history.
-T, --threads number
The number of threads to use for index generation, repository scan
and repository invalidation.
By default the number of threads will be set to the number of available
CPUs. This influences the number of spawned ctags processes as well.
Take a look at the "renamedHistory" option too. Theoretically "off" is the default option, but this has a huge impact on the index time, so it's worth checking:
--renamedHistory on|off
Enable or disable generating history for renamed files.
If set to on, makes history indexing slower for repositories
with lots of renamed files. Default is off.
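Concretely, a hedged sketch of how those options could be added to the indexer invocation from the question (the thread counts of 8 are placeholder values to experiment with, not tuned recommendations):
opengrok-indexer -J=-Xmx30G -a /var/opengrok/dist/lib/opengrok.jar -- \
    -R /var/opengrok/etc/read-only.xml -c /usr/bin/ctags \
    -s /var/opengrok/src/ -d /var/opengrok/data \
    -H -P -S -G -W /var/opengrok/etc/configuration.xml --progress \
    -T 8 --historyThreads 8 --historyFileThreads 8 --renamedHistory off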

grep multiple patterns using pattern file

I downloaded a very huge list of hosts to block ads.
The problem is that it breaks the functionality of some sites, like forums/discussions and/or pics. So I want to remove some sites from the hosts file.
Let's say I want to remove a.com and b.com from hosts.
These methods work.
grep -ve a.com -e b.com hosts > new_hosts
or
egrep -v 'a.com|b.com' hosts > new_hosts
Both are working fine. But as the patterns increase, I want to write the patterns in a file.
If I use this
grep -vf pattern.txt hosts > new_hosts
Only the last pattern will be removed.
If pattern.txt contain
a.com
b.com
Only b.com is omitted from new_hosts; a.com is still written to new_hosts.
So which grep command should I use with a pattern file?
If you have a hosts file that you want to compare with another file containing entries you want to eliminate, this will be easier with uniq than with grep.
Just combine the files and run something like this:
cat hosts badfile badfile | sort | uniq -u > new_hosts
badfile is added twice because an entry that is not already present in hosts would otherwise remain (it would appear only once); duplicating badfile guarantees that every one of its entries is eliminated.
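A tiny worked example with made-up file contents shows why:
printf 'a.com\nb.com\nc.com\n' > hosts
printf 'b.com\nx.com\n' > badfile
cat hosts badfile badfile | sort | uniq -u > new_hosts
# sorted stream: a.com b.com b.com b.com c.com x.com x.com
# uniq -u keeps only the lines that occur exactly once, so new_hosts
# contains a.com and c.com; b.com (and the unwanted x.com) are gone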
Thanks for the feedback, guys. Since most of you suspected the error came from pattern.txt, I figured it could be Windows Notepad that introduced it.
A new line from Windows Notepad is terminated by 0D 0A (hex).
I read somewhere that the newline for grep should be 0A (hex).
After editing pattern.txt with Notepad++, this command finally works :-)
grep -vf pattern.txt hosts > new_hosts
Or maybe this is better
fgrep -vf pattern.txt hosts > new_hosts
Both are working perfectly :-)
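If editing the file by hand is not an option, the carriage returns can also be stripped on the Unix side before grep reads the pattern file. A small sketch (pattern_unix.txt is just a throwaway name):
tr -d '\r' < pattern.txt > pattern_unix.txt   # drop the Windows CR (0D) bytes
grep -vFf pattern_unix.txt hosts > new_hosts  # -F matches fixed strings, not regexes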

Use all cores to make OpenCV 3 [duplicate]

Quick question: what is the compiler flag to allow g++ to spawn multiple instances of itself in order to compile large projects quicker (for example 4 source files at a time for a multi-core CPU)?
You can do this with make: with GNU make it is the -j flag (this will also help on a uniprocessor machine).
For example if you want 4 parallel jobs from make:
make -j 4
You can also run gcc in a pipe with
gcc -pipe
This will pipeline the compile stages, which will also help keep the cores busy.
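The -j and -pipe options can be combined in one invocation; a minimal sketch, assuming the Makefile respects the CC variable:
make -j 4 CC="gcc -pipe"   # four parallel jobs, each compile piping between its stages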
If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la
make -j4
Or you can use pmake or similar parallel make systems.
People have mentioned make but bjam also supports a similar concept. Using bjam -jx instructs bjam to build up to x concurrent commands.
We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.
If using make, invoke it with -j. From man make:
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously.
If there is more than one -j option, the last one is effective.
If the -j option is given without an argument, make will not limit the
number of jobs that can run simultaneously.
And most notably, if you want to script or identify the number of cores you have available (depending on your environment, and if you run in many environments, this can change a lot) you may use the ubiquitous Python function cpu_count():
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count
Like this:
make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')
If you're asking why 1.5 I'll quote user artless-noise in a comment above:
The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
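On Linux with GNU coreutils, a shell-only alternative to the Python snippet is nproc; the same 1.5x rule of thumb applied with shell arithmetic:
make -j "$(( $(nproc) * 3 / 2 ))"   # e.g. 6 jobs on a 4-core machine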
make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ is parallelizable.
distcc can also be used to distribute compiles not only on the current machine, but also on other machines in a farm that have distcc installed.
I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (so long as the files do not depend on each other).
GNU parallel
I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:
sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
Explanation:
{.} takes the input argument and removes its extension
-t prints out the commands being run to give us an idea of progress
--will-cite removes the request to cite the software if you publish results using it...
parallel is so convenient that I could even do a timestamp check myself:
ls | grep -E '\.c$' | parallel -t --will-cite "\
if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
gcc -c -o '{.}.o' '{}'
fi
"
xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or run multiple commands with it: Calling multiple commands through xargs
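For completeness, a hedged xargs sketch of the same compile loop; the inner sh -c is what makes the extension handling clumsier than parallel's {.}:
# compile every .c file, up to nproc jobs at once; "$1" is the file name
# and "${1%.c}.o" strips the .c suffix to form the object file name
ls *.c | xargs -P "$(nproc)" -I{} sh -c 'gcc -c -o "${1%.c}.o" "$1"' _ {}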
Parallel linking was asked at: Can gcc use multiple cores when linking?
TODO: I think I read somewhere that compilation can be reduced to matrix multiplication, so maybe it is also possible to speed up single file compilation for large files. But I can't find a reference now.
Tested in Ubuntu 18.10.

How to make output of any shell command unbuffered?

Is there a way to run shell commands without output buffering?
For example, hexdump file | ./my_script will only pass input from hexdump to my_script in buffered chunks, not line by line.
Actually I want to know a general solution how to make any command unbuffered?
Try stdbuf, included in GNU coreutils and thus in virtually any Linux distro. This sets the buffer length for input, output and error to zero:
stdbuf -i0 -o0 -e0 command
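Applied to the pipeline from the question, line buffering on hexdump's stdout is usually enough; a sketch, and note that stdbuf only helps programs that keep stdio's default buffering:
stdbuf -oL hexdump file | ./my_script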
The command unbuffer from the expect package disables the output buffering:
Ubuntu Manpage: unbuffer - unbuffer output
Example usage:
unbuffer hexdump file | ./my_script
AFAIK, you can't do it without ugly hacks. Writing to a pipe (or reading from it) automatically turns on full buffering and there is nothing you can do about it :-(. "Line buffering" (which is what you want) is only used when reading/writing a terminal. The ugly hacks exactly do this: They connect a program to a pseudo-terminal, so that the other tools in the pipe read/write from that terminal in line buffering mode. The whole problem is described here:
http://www.pixelbeat.org/programming/stdio_buffering/
The page has also some suggestions (the aforementioned "ugly hacks") what to do, i.e. using unbuffer or pulling some tricks with LD_PRELOAD.
You could also use the script command to make the output of hexdump line-buffered (hexdump will be run in a pseudo terminal, which tricks hexdump into thinking it is writing its stdout to a terminal, and not to a pipe).
# cf. http://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe/
stty -echo -onlcr
script -q /dev/null hexdump file | ./my_script # FreeBSD, Mac OS X
script -q -c "hexdump file" /dev/null | ./my_script # Linux
stty echo onlcr
One should use grep's or egrep's --line-buffered option to solve this. No other tools are needed.
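A hedged example of where --line-buffered matters: when grep sits in the middle of a pipeline and its own stdout would otherwise be block-buffered (the log path is just illustrative):
# each matching line is flushed to ./my_script immediately instead of in 4K chunks
tail -f /var/log/syslog | grep --line-buffered ERROR | ./my_script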

Run top, print output, then quit OR how to get real memory usage without top

I'm running Mac OS 10.6. I want to run top to get memory usage, but not in interactive mode, or any mode that updates. I just want the memory usage at that point in time and then return to the prompt. I've looked for other utilities to get memory usage... but came up short (vm_stat is for virtual memory). Can someone show me how to get top or something else to print memory usage to stdout?
top -l 1 will put just one sample to standard output (you can redirect it, filter it, etc, as you wish of course). man top for many more details.
You can also use the ps command, e.g.
ps -eo pmem,comm
Check the ps man page for more output formatting, e.g. rss, size, etc.
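For per-process numbers, a rough sketch that sorts by resident set size (RSS is reported in kilobytes; summing RSS over-counts shared pages, so treat any totals as approximate):
ps -eo rss,pmem,comm | sed 1d | sort -rn | head -10   # ten largest processes by resident memory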
I've been using this command to spit out the basic info in the first few lines
top -l 1 -n 0
-l 1 = just one sample
-n 0 = 0 processes
This is a bit of a hack, but if you only want the memory line, you could feed it through head and tail.
top -l 1 -n 0 | head -n 5 | tail -n 2
