I have a couple processes running a tool I've written that are joined by pipes, and I would like to measure their collected memory usage with valgrind. So far, I have tried something like:
$ valgrind tool=massif trace-children=yes --peak-inaccuracy=0.5 --pages-as-heap=yes --massif-out-file=myProcesses".%p" myProcesses.script
Where myProcesses.script runs the equivalent of my tool foo twice, e.g.:
foo | foo > /dev/null
Valgrind doesn't seem to capture the collected memory usage of this the way I expect. If I use top to track this, I get (for sake of argument) 10% memory usage on the first foo, and then another 10% collects on the second foo before the myProcesses.script completes. This is the sort of thing I want to measure: the usage of both processes. Valgrind instead returns the following error:
Massif: ms_main.c:1891 (ms_new_mem_brk): Assertion 'VG_IS_PAGE_ALIGNED(len)' failed.
Is there a way to collect memory usage data for commands I'm using in a piped fashion (using valgrind)? Or a similar tool that I can use to accurately automate these measurements?
The numbers that top returns while polling seem hand-wavy, to me, and I am seeking accurate and repeatable measurements. If you have suggestions for alternative tools, I would welcome those, as well.
EDIT - Fixed typo with valgrind option.
EDIT 2 - For some reason, it appears that the option --pages-as-heap is giving us troubles with the binaries we're testing. Your examples run fine. A new page is created every time we enter a non-inlined function (stack overflows - heh). We wanted to count those, but they're relatively minor in the scale of memory usage we're testing. (Perhaps there aren't function calls in ls or less?) Removing --pages-as-heap helped get testing working again. Thanks to MrGomez for the great help.
With the correct valgrind version given in the errata, this seems to just work for me in Valgrind 3.6.1. My invocation:
<me>#harley:/tmp/test$ /usr/local/bin/valgrind --tool=massif \
--trace-children=yes --peak-inaccuracy=0.5 --pages-as-heap=yes \
--massif-out-file=myProcesses".%p" ./testscript.sh
==21067== Massif, a heap profiler
==21067== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21067== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21067== Command: ./testscript.sh
==21067==
==21068== Massif, a heap profiler
==21068== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21068== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21068== Command: /bin/ls
==21068==
==21070== Massif, a heap profiler
==21070== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21070== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21069== Massif, a heap profiler
==21069== Copyright (C) 2003-2010, and GNU GPL'd, by Nicholas Nethercote
==21069== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==21069== Command: /bin/sleep 5
==21069==
==21070== Command: /usr/bin/less
==21070==
==21068==
(END) ==21069==
==21070==
==21067==
The contents of my test script, testscript.sh:
ls | sleep 5 | less
Sparse contents from one of the files generated by --massif-out-file=myProcesses".%p" (myProcesses.21055):
desc: --peak-inaccuracy=0.5 --pages-as-heap=yes --massif-out-file=myProcesses.%p
cmd: ./testscript.sh
time_unit: i
#-----------
snapshot=0
#-----------
time=0
mem_heap_B=110592
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=1
#-----------
time=0
mem_heap_B=118784
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
...
#-----------
snapshot=18
#-----------
time=108269
mem_heap_B=1708032
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=peak
n2: 1708032 (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
n3: 1474560 0x4015E42: mmap (mmap.S:62)
n1: 1425408 0x4005CAC: _dl_map_object_from_fd (dl-load.c:1209)
n2: 1425408 0x4007109: _dl_map_object (dl-load.c:2250)
n1: 1413120 0x400CEEA: openaux (dl-deps.c:65)
n1: 1413120 0x400D834: _dl_catch_error (dl-error.c:178)
n1: 1413120 0x400C1E0: _dl_map_object_deps (dl-deps.c:247)
n1: 1413120 0x4002B59: dl_main (rtld.c:1780)
n1: 1413120 0x40140C5: _dl_sysdep_start (dl-sysdep.c:243)
n1: 1413120 0x4000C6B: _dl_start (rtld.c:333)
n0: 1413120 0x4000855: ??? (in /lib/ld-2.11.1.so)
n0: 12288 in 1 place, below massif's threshold (01.00%)
n0: 28672 in 3 places, all below massif's threshold (01.00%)
n1: 20480 0x4005E0C: _dl_map_object_from_fd (dl-load.c:1260)
n1: 20480 0x4007109: _dl_map_object (dl-load.c:2250)
n0: 20480 in 2 places, all below massif's threshold (01.00%)
n0: 233472 0xFFFFFFFF: ???
#-----------
snapshot=19
#-----------
time=108269
mem_heap_B=1703936
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
#-----------
snapshot=20
#-----------
time=200236
mem_heap_B=1839104
mem_heap_extra_B=0
mem_stacks_B=0
heap_tree=empty
Massif continues to complain about heap allocations in the remainder of my files. Note this is very similar to your error.
I theorize that your version of valgrind was built in debug mode, causing the asserts to fire. A rebuild from source (I used this with the defaults hanging off ./configure) will fix the issue.
Either way, this seems to be expected with Massif.
Some programs allow you to preload the libmemusage.so library and get a report of what memory allocations were allocated recorded:
$ LD_PRELOAD=libmemusage.so less /etc/passwd
Memory usage summary: heap total: 36212, heap peak: 35011, stack peak: 15008
total calls total memory failed calls
malloc| 39 5985 0
realloc| 3 64 0 (nomove:2, dec:0, free:0)
calloc| 238 30163 0
free| 51 11546
Histogram for block sizes:
0-15 128 45% ==================================================
16-31 13 4% =====
32-47 105 37% =========================================
48-63 2 <1%
64-79 4 1% =
80-95 5 1% =
96-111 3 1% =
112-127 3 1% =
160-175 1 <1%
192-207 1 <1%
208-223 2 <1%
256-271 1 <1%
432-447 1 <1%
560-575 1 <1%
656-671 1 <1%
768-783 1 <1%
944-959 1 <1%
1024-1039 2 <1%
1328-1343 1 <1%
2128-2143 1 <1%
3312-3327 1 <1%
7952-7967 1 <1%
8240-8255 1 <1%
Though I must admit that it doesn't always work -- LD_PRELOAD=libmemusage.so ls never reports anything, for example -- and I wish I knew the conditions that allow it to work or not work.
Related
I'm currently reading Master Embedded Linux Programming and I'm on the chapter where it goes into bootloaders, more specifically U-Boot for the Beaglebone Black.
I have built a crosscompiler and I'm able to build U-Boot, however I can't make it run the way it is described in the book.
After some experimentation and Google'ing, I can make it work by writing MLO and u-boot.img in raw mode (using these command)
However, if I put the files in a FAT32 MBR boot partition, the Beaglebone will not boot, it will only show a string of C's, which indicate that it is trying to get its bootloader from the serial interface and it has decided it cannot boot from SD card.
I have also studied this answer. According to that answer I should be doing everything correctly. I've tried to experiment with the MMC raw mode options in the U-Boot build configuration, but I've not been able to find a change that works.
I feel like there must be something obvious I'm missing, but I can't figure it out. Are there any things I can try to debug this further?
Update: some more details on the partition tables.
When using the "raw way" of putting LBO and u-boot.img on the SD cards, I have not created any partitions at all. This works:
$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
When trying to use a boot partition, that does not work, I have this configuration:
$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3d985ec3
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 133119 131072 64M c W95 FAT32 (LBA)
Update 2: The contents of the boot partition is the exact same 2 files that I use for the raw writes, so they are confirmed to work:
$ ls -al
total 1000
drwxr-xr-x 2 peter peter 16384 Jan 1 1970 .
drwxr-x---+ 3 root root 4096 Jul 18 08:44 ..
-rw-r--r-- 1 peter peter 108184 Jul 14 13:56 MLO
-rw-r--r-- 1 peter peter 893144 Jul 14 13:56 u-boot.img
Update 3: I have already tried the following U-Boot options to try it go get to work (in the SPL / TPL menu):
"Support FAT filesystems" This is enabled by default. I can't really find a good reference for the U-Boot options, but I am guessing this is what enables booting from a FAT partition (which is what I'm trying to do)
"MCC raw mode: by sector" I have disabled this. As expected, this indeed breaks the booting in raw mode, which is the only thing I got working up till now.
"MCC raw mode: by partition". I have tried to enable this and using partition 1 to load U-Boot from. I'm not sure how to understand this option. I assume raw mode does not require partitions, but this asks for what partition to use...
In general, if any one can point me to a U-Boot configuration reference, that would already by very helpful. Right now, I'm just randomly turning things on and off that sound like they may help.
I am working with
U-boot v2021.10
BeagleBone Black rev C
I've created an uboot.env image with mkenvimage tool from file
loadfromsd=load mmc 0:1 0x82000000 /zImage; load mmc 0:1 0x88000000 /am335x-boneblack.dtb
set_bootargs=setenv bootargs console=ttyS0,115200n8 root=/dev/mmcblk0p2 rw rootfstype=ext4 rootwait
uenvcmd=setenv auotload no; run set_bootargs; run loadfromsd; printenv bootargs; bootz 0x82000000 - 0x88000000
The problem is in files loading to memory with load cmd in first line.
Full message from start is:
U-Boot SPL 2021.10 (Oct 14 2021 - 20:41:20 -0700)
Trying to boot from MMC1
U-Boot 2021.10 (Oct 14 2021 - 20:41:20 -0700)
CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
ti_sysc target-module#9000: failed to get fck clock
WDT: Started with servicing (60s timeout)
NAND: nand_base: timeout while waiting for chip to become ready
nand_base: No NAND device found
0 MiB
MMC: ti_sysc target-module#7000: failed to get fck clock
OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... OK
<ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet#4a100000, eth3: usb_ether
=> run uenvcmd
4295456 bytes read in 282 ms (14.5 MiB/s)
'ailed to load '/am335x-boneblack.dtb
bootargs=console=ttyS0,115200n8 root=/dev/mmcblk0p2 rw rootfstype=ext4 rootwait
Kernel image # 0x82000000 [ 0x000000 - 0x418b20 ]
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
Actual error is
=> run uenvcmd
4295456 bytes read in 282 ms (14.5 MiB/s)
'ailed to load '/am335x-boneblack.dtb
P.S. My u-boot fails to recognize ${} substitutions properly, and usage of
console=ttyS0,115200n8
bootpartition=mmcblk0p2
set_bootargs=setenv bootargs console=${console} root=/dev/${bootpartition} rw rootfstype=ext4 rootwait
caused and error
syntax error:
rootfstype=ext4 rootwait0n8
this 0n8 was appended after rootwait and shouldn't be there. So I've written this "straight" file without variables.
Thanks to sawdust for info that carriage return character matters and overrides first letter of error msg - I've got an idea that it also matters for path to file in load cmd, and it matters.
If I use space+\r, NOT just \r - everything works fine.
I am using GNU parallel (version 20200522) on Ubuntu Linux 18.04.4 and running jobs on all cores of a local server minus 2 cores, that is I am using the -j-2 parameter.
find /folder/ -type f -iname "*.pdf" | parallel -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/" :::: -
However, the program shows
Error: Cannot run any jobs.
I tried using the -j100% parameter and I have seen that it uses just 1 core(job), and I deduce that, for GNU parallel, 100% of the available cores on this system is just one core.
If I use the -j5 parameter (which does not imply autodetection of the total number of cores), everything is alright, parallel launches 5 jobs and uses 5 cores.
The interesting part is that the file /root/.parallel/tmp/sshlogin/MACHINE_NAME/cpuspec contains the following:
1
6
6
which means, I think, that GNU parallel should see 6 available cores.
I have tried deleting the cpuspec file and running parallel again to redetect the total number of cores, but the cpuspec file and the behavior of the program remain the same.
On different systems, deleting the cpuspec file solved all issues, but on this particular system it is not working. The virtual machine is copied from another server with a different configuration, that is why I need deleting the cpuspec file.
What should I do to get GNU parallel to correctly detect the number of cores on the system, so that I can use the -j-2 parameter?
Update 21.07:
After deleting once again the folder with the cpuspec file, running the parallel --number-of-sockets/cores/threads commands and using just once the -S 6/: parameter, the problem seems to have resolved itself. Now GNU parallel correctly detects the number of cores and the -j-2 parameters works.
I am not sure what good things happened, but I am not able to reproduce the bug anymore.
Ole, thank you for your answer. If I meet the bug again or if I am able to reproduce it, I will post it here.
And here is the output to the commands:
parallel --number-of-sockets
1
parallel --number-of-cores
6
parallel --number-of-threads
6
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2620 v3 # 2.40GHz
stepping : 2
microcode : 0xffffffff
cpu MHz : 2397.218
cache size : 15360 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl cpuid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm pti fsgsbase bmi1 avx2 smep bmi2 erms xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 4794.43
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:
And it repeats itself for 5 more cores.
You may have found a bug. Please post the output of:
cat /proc/cpuinfo
parallel --number-of-sockets
parallel --number-of-cores
parallel --number-of-threads
Also see if you can make an MCVE.
As a workaround you can use -S 6/: to force GNU Parallel to detect 6 cores on your system.
find /folder/ -type f -iname "*.pdf" |
parallel -S 6/: -j-2 --nice 2 "script.sh {1} {1/.}; mv -f -v {1} /folder2/; mv -f {1/.}.txt /folder3/"
(Also :::: - can be left out completely: If there is no ::: :::: then GNU Parallel reads from stdin).
I have one java based application which is having huge line of source code(~1m).Now I am using jenkins with sonar-runner-2.4 to run analysis with code coverage and test cases count.I have upgraded sonarqube server from 5.4 to 6.3.1.Before upgrade this job took 9hrs to complete the whole analysis (still it is very much long time but fine) but after upgrade to sonarqube-6.3.1 same job taking 13hrs to complete the same analysis.
How do I improve analysis time at least my earlier time 9hr ?
EDIT
Here is my JAVA_OPTS for sonarqube-6.3.1 instance
sonar.web.javaOpts=-Xmx6G -Xms2G -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError -Djava.net.preferIPv4Stack=true
Available Hardware :
$lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 26
Stepping: 5
CPU MHz: 1596.000
BogoMIPS: 3999.44
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Available Memory :
$free -m
total used free shared buff/cache available
Mem: 128714 58945 66232 430 3535 68298
Swap: 32767 957 31810
sonar-project.properties for the long running job:
sonar-project.properties
As you haven't really given many details, I can't really give many details in the answer, but the simple answer is that you have to make the scan do less work.
Look at your codebase. Is your scan processing generated classes? Is it scanning test classes? Is it scanning classes that have little real business logic? If you answer "yes" to any of those, consider excluding those classes.
Look at the SonarQube plugins you're using. Are you running every possible plugin you can run? Are there some heuristics you don't need to run, or perhaps you could run less frequently?
I have a 64-bit hotspot JDK version 1.7.0 installed on a 64-bit RHEL 6 machine. I use the following JVM options for my tomcat application.
CATALINA_OPTS="${CATALINA_OPTS} -Dfile.encoding=UTF8 -Dorg.apache.catalina.loader.WebappClassLoader.ENABLE_CLEAR_REFERENCES=false -Duser.timezone=EST5EDT"
# General Heap sizing
CATALINA_OPTS="${CATALINA_OPTS} -Xms4096m -Xmx4096m -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:PermSize=512m -XX:MaxPermSize=512m -XX:+UseCompressedOops -XX:+DisableExplicitGC"
# Enable the CMS GC policy
CATALINA_OPTS="${CATALINA_OPTS} -XX:+UseConcMarkSweepGC -XX:CMSWaitDuration=15000 -XX:+CMSParallelRemarkEnabled -XX:+CMSCompactWhenClearAllSoftRefs -XX:+CMSConcurrentMTEnabled -XX:+CMSScavengeBeforeRemark -XX:+CMSClassUnloadingEnabled"
# Verbose Garbage Collection Logging
CURRENT_DATE=`date +%Y%m%d%H%M%S`
CATALINA_OPTS="${CATALINA_OPTS} -verbose:gc -XX:+PrintGCDetails -Xloggc:${CATALINA_BASE}/logs/gc-${CURRENT_DATE}.log -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution"
When I have a Garbage Collection analysis, the GC logs show a maximum available heap of only 3.8GB instead of 4GB allocated to the JVM. Why is that?
New Generation (2048M) consists of 80% Eden (1638.4M) and two Survivor Spaces (10% or 204.8M each):
Heap
par new generation total 1887488K, used 134226K [0x00000006fae00000, 0x000000077ae00000, 0x000000077ae00000)
eden space 1677824K, 8% used [0x00000006fae00000, 0x00000007031148e0, 0x0000000761480000)
from space 209664K, 0% used [0x0000000761480000, 0x0000000761480000, 0x000000076e140000)
to space 209664K, 0% used [0x000000076e140000, 0x000000076e140000, 0x000000077ae00000)
concurrent mark-sweep generation total 2097152K, used 242K [0x000000077ae00000, 0x00000007fae00000, 0x00000007fae00000)
At any time one of survivor spaces is empty (see Generations).
So, the useful heap size is 1638.4 + 204.8 + 2048 = 3891.2 MB