Ubuntu 18.04: GNU parallel can't find and use the full number of cores on a local system

I am using GNU parallel (version 20200522) on Ubuntu Linux 18.04.4, running jobs on all cores of a local server minus 2 cores, i.e. with the -j-2 parameter.
find /folder/ -type f -iname "*.pdf" | parallel -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/" :::: -
However, the program shows
Error: Cannot run any jobs.
I tried using the -j100% parameter and saw that it uses just 1 core (job), so I deduce that, for GNU parallel, 100% of the available cores on this system is just one core.
If I use the -j5 parameter (which does not involve autodetecting the number of cores), everything is fine: parallel launches 5 jobs and uses 5 cores.
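A quick way to see how many job slots a given -j setting actually yields is the {%} (job slot) replacement string; a small sketch:
parallel -j-2 'echo {%}' ::: $(seq 20) | sort -n | tail -n1
On a correctly detected 6-core machine, -j-2 should make this print 4.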
The interesting part is that the file /root/.parallel/tmp/sshlogin/MACHINE_NAME/cpuspec contains the following:
1
6
6
which means, I think, that GNU parallel should see 6 available cores.
I have tried deleting the cpuspec file and running parallel again so that it re-detects the number of cores, but the cpuspec file and the behavior of the program remain the same.
On other systems, deleting the cpuspec file solved all issues, but on this particular system it does not work. The virtual machine was copied from another server with a different configuration, which is why I need to delete the cpuspec file.
What should I do to get GNU parallel to correctly detect the number of cores on the system, so that I can use the -j-2 parameter?
Update 21.07:
After deleting the folder with the cpuspec file once again, running the parallel --number-of-sockets/cores/threads commands, and using the -S 6/: parameter just once, the problem seems to have resolved itself. Now GNU parallel correctly detects the number of cores and the -j-2 parameter works.
I am not sure exactly what fixed it, and I am not able to reproduce the bug anymore.
Ole, thank you for your answer. If I meet the bug again or if I am able to reproduce it, I will post it here.
And here is the output of the commands:
parallel --number-of-sockets
1
parallel --number-of-cores
6
parallel --number-of-threads
6
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2397.218
cache size : 15360 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm pti fsgsbase bmi1 avx2 smep bmi2 erms xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4794.43
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:
And it repeats itself for 5 more cores.

You may have found a bug. Please post the output of:
cat /proc/cpuinfo
parallel --number-of-sockets
parallel --number-of-cores
parallel --number-of-threads
Also see if you can make an MCVE.
As a workaround you can use -S 6/: to force GNU Parallel to detect 6 cores on your system.
find /folder/ -type f -iname "*.pdf" |
parallel -S 6/: -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/"
(Also, :::: - can be left out completely: if there is no ::: or ::::, GNU parallel reads from stdin.)
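A quick illustration:
seq 3 | parallel echo
parallel echo ::: 1 2 3
Both run the same three jobs; the first reads its arguments from stdin, the second from the ::: list.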

Related

Will Hyper-Threading decrease the performance of Docker?

I have a server with the configuration below. We run only Docker on it; all applications run as Docker containers. Except for Docker, no other application is installed in the OS.
System Information :
- Server Model : Dell Power Edge R440
- Number of CPU : 2
- Number of Cores Per CPU : 10
- Memory (RAM) : 64 GB
- Hyper Threading : Enabled
- Virtualization : Enabled
- Operating System : CentOS 7
I have gone through the Dell BIOS tuning guide (Dell PowerEdge BIOS tuning for performance/power).
It suggests enabling Hyper-Threading (Logical Processor), and Hyper-Threading (Logical Processor) Enabled is also the default.
Will Hyper-Threading decrease the performance of Docker?
The containers running in Docker are:
- Grafana
- PostgreSQL
- EMQX
- Prometheus
- Golang
Output of lscpu
Architecture : x86_64
CPU op-mode(s) : 32-bit, 64-bit
Byte Order : Little Endian
CPU(s) : 40
On-line CPU(s) list : 0-39
Thread(s) per core : 2
Core(s) per socket : 10
Socket(s) : 2
NUMA node(s) : 2
Vendor ID : GenuineIntel
CPU family : 6
Model : 85
Model name : Intel® Xeon® Silver 4210 CPU @ 2.20GHz
Stepping : 7
CPU MHz : 999.963
CPU max MHz : 3200.0000
CPU min MHz : 1000.0000
BogoMIPS : 4400.00
Virtualization : VT-x
L1d cache : 32K
L1i cache : 32K
L2 cache : 1024K
L3 cache : 14080K
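To measure the effect on this exact workload rather than guess, one option is to toggle Hyper-Threading at runtime and benchmark the containers both ways; a minimal sketch (assuming the kernel exposes the SMT control interface, which recent RHEL/CentOS 7 kernels backport; check that the file exists first):
cat /sys/devices/system/cpu/smt/active                    # 1 = SMT currently in use
echo off | sudo tee /sys/devices/system/cpu/smt/control   # disable HT without a reboot
# ... run the container benchmark here ...
echo on | sudo tee /sys/devices/system/cpu/smt/control    # re-enable HT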

Why is multi-GPU training faster than single-GPU in Caffe?

Same hardware/software environment, same net and solver; the only difference is the command line.
When the command line is:
caffe-master/build/tools/caffe train --solver=solver_base.prototxt --gpu=6
it takes about 50 seconds per 100 iterations.
When the command is:
caffe-master/build/tools/caffe train --solver=solver_base.prototxt --gpu=4,5,6,7
it takes about 48 seconds per 100 iterations.
Normally, multi-GPU training should cost more time per iteration than single-GPU because of overhead such as replication. Can anyone tell me why it doesn't? Thanks very much!
Env:
2 * Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
8 * Nvidia Tesla V100 PCIE 16GB
Caffe 1.0.0 / use_cudnn on
Cuda 9.0.176
Cudnn 6.0.21
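One possible explanation (assuming Caffe 1.0's data-parallel multi-GPU mode, which keeps the batch size per GPU, so the effective batch size scales with the GPU count): with 4 GPUs each iteration processes 4x the samples, so nearly equal per-iteration times mean roughly 4x the throughput. A back-of-the-envelope check with a hypothetical per-GPU batch size of 64:
# images/sec = batch_size * num_gpus * iters / seconds
echo "1 GPU : $(( 64 * 1 * 100 / 50 )) images/sec"    # 128
echo "4 GPUs: $(( 64 * 4 * 100 / 48 )) images/sec"    # 533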

Is it possible to boot FreeRTOS over the network?

First, some words about my system:
I am working on the Xilinx ZC706 development board.
The basic FreeRTOS example is running.
Now the question is: how can I boot the application over the network?
A FreeRTOS application is a bare-metal approach.
Typically a loader like U-Boot is used, but the examples I found were only for the Linux use case.
Addition:
With the XMD console it is possible to load U-Boot into memory:
XMD% source ps7_init.tcl
XMD% ps7_init
XMD% dow u-boot
Processor started. Type "stop" to stop processor
Processor Stop Condition Unknown
Processor Reset .... DONE
Downloading Program -- u-boot
section, .text: 0x04000000-0x040524d7
section, efi_runtime_text: 0x040524d8-0x040524fb
section, .rodata: 0x04052500-0x040650d1
section, .hash: 0x040650d4-0x040650ff
section, .dtb.init.rodata: 0x04065100-0x0406866f
section, .data: 0x04068670-0x0406b31b
section, .got.plt: 0x0406b31c-0x0406b327
section, efi_runtime_data: 0x0406b328-0x0406b3ff
section, .u_boot_list: 0x0406b400-0x0406c71f
section, .rel.dyn: 0x0406c720-0x04077d5f
section, .bss: 0x0406c720-0x040ad29f
Download Progress..10.20.30.40.50.60.70.80.90.Done
Setting PC with Program Start Address 0x04000000
XMD% run
RUNNING> 0
XMD%
The result is seen on a COM port:
U-Boot 2017.01-00012-g374a838 (May 29 2017 - 17:55:04 +0200)
Model: Zynq ZC706 Development Board
Board: Xilinx Zynq
I2C: ready
DRAM: ECC disabled 1 GiB
MMC: sdhci#e0100000: 0 (SD)
SF: Detected s25fl128s_64k with page size 512 Bytes, erase size 128 KiB, total 32 MiB
*** Warning - bad CRC, using default environment
In: serial#e0001000
Out: serial#e0001000
Err: serial#e0001000
Model: Zynq ZC706 Development Board
Board: Xilinx Zynq
Net: ZYNQ GEM: e000b000, phyaddr 7, interface rgmii-id
eth0: ethernet#e000b000
Hit any key to stop autoboot: 0
Device: sdhci#e0100000
Manufacturer ID: 27
OEM: 5048
Name: SD16G
Tran Speed: 50000000
Rd Block Len: 512
SD version 3.0
High Capacity: Yes
Capacity: 14.5 GiB
Bus Width: 4-bit
Erase Group Size: 512 Bytes
reading uEnv.txt
** Unable to read file uEnv.txt **
Copying Linux from SD to RAM...
reading uImage
** Unable to read file uImage **
Zynq>
Addition:
I have built the FSBL with the flag FSBL_DEBUG:
(Project -> Properties -> C/C++ Build -> Settings -> ARM gcc compiler -> Symbols)
Then I built the .bin file with only the boot loader partition and put it on the SD card:
Xilinx Tools->Create Boot Image
Addition:
The problem is that the SDK needs a file named u-boot.elf; that extension was not there after the U-Boot build.
So now I have a TFTP server running on my host and U-Boot finds the uEnv.txt file, but the command in this file does not run.
How can I set up U-Boot and pass the right loadAddress to load the FreeRTOS ELF file?
The tftpboot command seems to be:
tftpboot [loadAddress] [bootfilename]
e.g.
tftpboot 0x80400000 vlm-boards/14726/uImage
What is the load address of the zc706 board?
Addition:
The connection and the download from the TFTP server seem to work, but after starting with the "go" command a reset occurs.
Zynq> setenv ipaddr 192.168.150.101
Zynq> setenv netmask 255.255.255.0
Zynq> setenv gatewayip 192.168.150.1
Zynq> serverip=192.168.150.100
Zynq> ping 192.168.150.100
Using ethernet#e000b000 device
host 192.168.150.100 is alive
Zynq> tftpboot 0x8000 FreeRTOS_HelloWorld.elf
Using ethernet#e000b000 device
TFTP from server 192.168.150.100; our IP address is 192.168.150.101
Filename 'FreeRTOS_HelloWorld.elf'.
Load address: 0x8000
Loading: ###############
2.8 MiB/s
done
Bytes transferred = 205675 (3236b hex)
Zynq> go 0x8000
## Starting application at 0x00008000 ...
undefined instruction
pc : [<0000fa60>] lr : [<3ff443c4>]
reloc pc : [<c40cda60>] lr : [<040023c4>]
sp : 3eb20cf4 ip : 0000001c fp : 3ff4437c
r10: 3eb1f9b0 r9 : 3eb21ee8 r8 : 3ffaef30
r7 : 00000000 r6 : 00008000 r5 : 00000002 r4 : 3eb2f9b4
r3 : 00008000 r2 : 3eb2f9b4 r1 : 3eb2f9b4 r0 : 00001084
Flags: nZcv IRQs off FIQs off Mode SVC_32
Resetting CPU ...
resetting ...
Thx in advance
The solution is:
The Xilinx SDK supplies as output an ELF file, which U-Boot understands:
tftpboot 0x000000 FreeRTOS_ZC706_HelloWorld.elf
bootelf 0x0
Thanks
tftpboot 0x0 hello.elf; bootelf 0x0;
works in U-Boot version 2019.2 with a FreeRTOS .elf.
For the other core, you need to convert it to bin format using arm-none-eabi-objcopy -O binary hello.elf hello.bin, tftpboot it to the correct memory position, and fire it up from the CPU0 code.
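Putting the pieces together, the working sequence at the U-Boot prompt (IP addresses and file name taken from the transcript above) is:
setenv ipaddr 192.168.150.101
setenv serverip 192.168.150.100
tftpboot 0x0 FreeRTOS_ZC706_HelloWorld.elf
bootelf 0x0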

How to speed up sonarqube analysis job?

I have a Java-based application with a huge amount of source code (~1M lines). I am using Jenkins with sonar-runner-2.4 to run analysis with code coverage and test-case counts. I upgraded the SonarQube server from 5.4 to 6.3.1. Before the upgrade this job took 9 hours to complete the whole analysis (still a very long time, but fine); after the upgrade to sonarqube-6.3.1 the same job takes 13 hours for the same analysis.
How do I improve the analysis time, at least back to my earlier 9 hours?
EDIT
Here is my JAVA_OPTS for sonarqube-6.3.1 instance
sonar.web.javaOpts=-Xmx6G -Xms2G -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true
Available Hardware :
$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 5
CPU MHz: 1596.000
BogoMIPS: 3999.44
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Available Memory :
$free -m
              total        used        free      shared  buff/cache   available
Mem:         128714       58945       66232         430        3535       68298
Swap:         32767         957       31810
sonar-project.properties for the long running job:
As you haven't really given many details, I can't really give many details in the answer, but the simple answer is that you have to make the scan do less work.
Look at your codebase. Is your scan processing generated classes? Is it scanning test classes? Is it scanning classes that have little real business logic? If you answer "yes" to any of those, consider excluding those classes.
Look at the SonarQube plugins you're using. Are you running every possible plugin you can run? Are there some heuristics you don't need to run, or perhaps you could run less frequently?
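Concretely, exclusions go in sonar-project.properties; a hypothetical excerpt (the path patterns are placeholders to adapt to your layout):
# skip generated sources and test code during analysis
sonar.exclusions=**/generated-sources/**,**/*Generated*.java
sonar.test.exclusions=src/test/**
Also worth checking after a 5.x-to-6.x upgrade: the background report is now processed by the server's Compute Engine, which has its own memory setting (sonar.ce.javaOpts in sonar.properties), separate from the sonar.web.javaOpts you quoted.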

Tracking memory usage of piped commands with valgrind

I have a couple of processes, joined by pipes, running a tool I've written, and I would like to measure their collective memory usage with valgrind. So far, I have tried something like:
$ valgrind --tool=massif --trace-children=yes --peak-inaccuracy=0.5 --pages-as-heap=yes --massif-out-file=myProcesses.%p myProcesses.script
Where myProcesses.script runs the equivalent of my tool foo twice, e.g.:
foo | foo > /dev/null
Valgrind doesn't seem to capture the collective memory usage of this the way I expect. If I use top to track it, I get (for the sake of argument) 10% memory usage on the first foo, and then another 10% accumulates on the second foo before myProcesses.script completes. This is the sort of thing I want to measure: the usage of both processes. Valgrind instead returns the following error:
Massif: ms_main.c:1891 (ms_new_mem_brk): Assertion 'VG_IS_PAGE_ALIGNED(len)' failed.
Is there a way to collect memory usage data for commands I'm using in a piped fashion (using valgrind)? Or a similar tool that I can use to accurately automate these measurements?
The numbers that top returns while polling seem hand-wavy to me, and I am seeking accurate, repeatable measurements. If you have suggestions for alternative tools, I would welcome those as well.
EDIT - Fixed typo with valgrind option.
EDIT 2 - For some reason, the option --pages-as-heap gives us trouble with the binaries we're testing. Your examples run fine. A new page is created every time we enter a non-inlined function (stack overflows - heh). We wanted to count those, but they're relatively minor on the scale of memory usage we're testing. (Perhaps there aren't function calls in ls or less?) Removing --pages-as-heap got testing working again. Thanks to MrGomez for the great help.
With the correct valgrind version given in the errata, this seems to just work for me in Valgrind 3.6.1. My invocation:
<me>#harley:/tmp/test$ /usr/local/bin/valgrind --tool=massif \
--trace-children=yes --peak-inaccuracy=0.5 --pages-as-heap=yes \
--massif-out-file=myProcesses".%p" ./testscript.sh
==21067== Massif, a heap profiler
==21067== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21067== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21067== Command: ./testscript.sh
==21067==
==21068== Massif, a heap profiler
==21068== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21068== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21068== Command: /bin/ls
==21068==
==21070== Massif, a heap profiler
==21070== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21070== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21069== Massif, a heap profiler
==21069== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21069== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21069== Command: /bin/sleep 5
==21069==
==21070== Command: /usr/bin/less
==21070==
==21068==
(END) ==21069==
==21070==
==21067==
The contents of my test script, testscript.sh:
ls | sleep 5 | less
Sparse contents from one of the files generated by --massif-out-file=myProcesses".%p" (myProcesses.21055):
desc: --peak-inaccuracy=0.5 --pages-as-heap=yes --massif-out-file=myProcesses.%p
cmd: ./testscript.sh
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=110592
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=1
#-----------
time=0
mem_heap_B=118784
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
...
#-----------
snapshot=18
#-----------
time=108269
mem_heap_B=1708032
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=peak
n2: 1708032 (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
n3: 1474560 0x4015E42: mmap (mmap.S:62)
n1: 1425408 0x4005CAC: _dl_map_object_from_fd (dl-load.c:1209)
n2: 1425408 0x4007109: _dl_map_object (dl-load.c:2250)
n1: 1413120 0x400CEEA: openaux (dl-deps.c:65)
n1: 1413120 0x400D834: _dl_catch_error (dl-error.c:178)
n1: 1413120 0x400C1E0: _dl_map_object_deps (dl-deps.c:247)
n1: 1413120 0x4002B59: dl_main (rtld.c:1780)
n1: 1413120 0x40140C5: _dl_sysdep_start (dl-sysdep.c:243)
n1: 1413120 0x4000C6B: _dl_start (rtld.c:333)
n0: 1413120 0x4000855: ??? (in /lib/ld-2.11.1.so)
n0: 12288 in 1 place, below massif's threshold (01.00%)
n0: 28672 in 3 places, all below massif's threshold (01.00%)
n1: 20480 0x4005E0C: _dl_map_object_from_fd (dl-load.c:1260)
n1: 20480 0x4007109: _dl_map_object (dl-load.c:2250)
n0: 20480 in 2 places, all below massif's threshold (01.00%)
n0: 233472 0xFFFFFFFF: ???
#-----------
snapshot=19
#-----------
time=108269
mem_heap_B=1703936
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=20
#-----------
time=200236
mem_heap_B=1839104
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
Massif continues to complain about heap allocations in the remainder of my files. Note this is very similar to your error.
I theorize that your version of valgrind was built in debug mode, causing the asserts to fire. A rebuild from source (I used this with the defaults hanging off ./configure) will fix the issue.
Either way, this seems to be expected with Massif.
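Once the per-process output files exist, ms_print (shipped with valgrind) renders each one as a graph plus the snapshot tables, for example:
ms_print myProcesses.21055 | less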
Some programs allow you to preload the libmemusage.so library and get a report of what memory allocations were recorded:
$ LD_PRELOAD=libmemusage.so less /etc/passwd
Memory usage summary: heap total: 36212, heap peak: 35011, stack peak: 15008
total calls total memory failed calls
malloc| 39 5985 0
realloc| 3 64 0 (nomove:2, dec:0, free:0)
calloc| 238 30163 0
free| 51 11546
Histogram for block sizes:
0-15 128 45% ==================================================
16-31 13 4% =====
32-47 105 37% =========================================
48-63 2 <1%
64-79 4 1% =
80-95 5 1% =
96-111 3 1% =
112-127 3 1% =
160-175 1 <1%
192-207 1 <1%
208-223 2 <1%
256-271 1 <1%
432-447 1 <1%
560-575 1 <1%
656-671 1 <1%
768-783 1 <1%
944-959 1 <1%
1024-1039 2 <1%
1328-1343 1 <1%
2128-2143 1 <1%
3312-3327 1 <1%
7952-7967 1 <1%
8240-8255 1 <1%
Though I must admit that it doesn't always work -- LD_PRELOAD=libmemusage.so ls never reports anything, for example -- and I wish I knew the conditions that allow it to work or not work.
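Another low-overhead option, if you mainly need a repeatable single number, is GNU time's verbose mode (a sketch, assuming /usr/bin/time is GNU time and not the shell builtin; note the figure is the largest single process's peak RSS, not the sum across the pipeline):
/usr/bin/time -v ./myProcesses.script 2>&1 | grep 'Maximum resident'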
