`caffe': malloc(): memory corruption when snapshotting to disk

I am training a simple network. Since I was having trouble getting Caffe to run, I decided to do a test run on only 20 images, but I can't get past the following error message. I have rebuilt Caffe as suggested in other posts, but that didn't solve the issue.
I1008 13:52:01.227901 45606 solver.cpp:454] Snapshotting to binary proto file _iter_10.caffemodel
*** Aborted at 1475952725 (unix time) try "date -d @1475952725" if you are using GNU date ***
PC: @ 0x7f5e0130768c caffe::BlobProto::SerializeWithCachedSizesToArray()
*** SIGSEGV (@0xd70e000) received by PID 45606 (TID 0x7f5e01e0ea00) from PID 225501184; stack trace: ***
@ 0x7f5df32c98d0 (unknown)
@ 0x7f5e0130768c caffe::BlobProto::SerializeWithCachedSizesToArray()
@ 0x7f5e0130d13f caffe::LayerParameter::SerializeWithCachedSizesToArray()
@ 0x7f5e0130f8d7 caffe::NetParameter::SerializeWithCachedSizesToArray()
@ 0x7f5dfb6fd58a (unknown)
@ 0x7f5dfb6fd655 (unknown)
@ 0x7f5dfb6fd7bf (unknown)
@ 0x7f5dfb76815b (unknown)
@ 0x7f5e01389803 caffe::WriteProtoToBinaryFile()
@ 0x7f5e013a1a82 caffe::Solver<>::SnapshotToBinaryProto()
@ 0x7f5e013a1b6f caffe::Solver<>::Snapshot()
@ 0x7f5e013a3219 caffe::Solver<>::Step()
@ 0x7f5e013a34a9 caffe::Solver<>::Solve()
@ 0x409426 train()
@ 0x405c83 main
@ 0x7f5df2f30b45 (unknown)
@ 0x406565 (unknown)
@ 0x0 (unknown)
*** Error in `caffe': malloc(): memory corruption: 0x000000000d4ceac0 ***
I have a feeling that it is caused by my solver file. Here is my solver.
net: "/X/train.prototxt"
test_iter: 5
test_interval: 5
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "step"
stepsize: 5
gamma: 0.1
power: 0.75
display: 5
max_iter: 20
snapshot: 10
snapshot_prefix: "/X/A"
solver_mode: GPU
Do you see any issue with my solver?
Cheers,

Is your model larger than 2 GB?
If so, this error may be due to a limitation of the protobuf format.
Try adding
snapshot_format: HDF5
at the end of your solver.prototxt to snapshot in HDF5 format instead.
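For example, assuming the solver posted above, the tail of the solver.prototxt would then look something like this (just a sketch; everything else stays the same):
snapshot: 10
snapshot_prefix: "/X/A"
snapshot_format: HDF5
solver_mode: GPU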
Related discussion can be found at:
https://github.com/BVLC/caffe/pull/2836

Related

Why does the same task use different amounts of CPU on Linux kernel 4.9 and 5.4?

My application is a compute-intensive task (i.e. video encoding). When it is running on Linux kernel 4.9 (Ubuntu 16.04), the CPU usage is 3300%, but when it is running on Linux kernel 5.4 (Ubuntu 20.04), the CPU usage is just 2850%. I can confirm the processes do the same job.
So I wonder whether the Linux kernel made some CPU-scheduling optimization or related change between 4.9 and 5.4. Could you give any advice on how to investigate the reason?
I am not sure whether the glibc version has an effect; for your information, glibc is 2.23 on the kernel 4.9 system and 2.31 on the kernel 5.4 system.
CPU Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4401.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 14080K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Output of perf stat on Linux Kernel 4.9
Performance counter stats for process id '32504':
3146297.833447 cpu-clock (msec) # 32.906 CPUs utilized
1,718,778 context-switches # 0.546 K/sec
574,717 cpu-migrations # 0.183 K/sec
2,796,706 page-faults # 0.889 K/sec
6,193,409,215,015 cycles # 1.968 GHz (30.76%)
6,948,575,328,419 instructions # 1.12 insn per cycle (38.47%)
540,538,530,660 branches # 171.801 M/sec (38.47%)
33,087,740,169 branch-misses # 6.12% of all branches (38.50%)
1,966,141,393,632 L1-dcache-loads # 624.906 M/sec (38.49%)
184,477,765,497 L1-dcache-load-misses # 9.38% of all L1-dcache hits (38.47%)
8,324,742,443 LLC-loads # 2.646 M/sec (30.78%)
3,835,471,095 LLC-load-misses # 92.15% of all LL-cache hits (30.76%)
<not supported> L1-icache-loads
187,604,831,388 L1-icache-load-misses (30.78%)
1,965,198,121,190 dTLB-loads # 624.607 M/sec (30.81%)
438,496,889 dTLB-load-misses # 0.02% of all dTLB cache hits (30.79%)
7,139,892,384 iTLB-loads # 2.269 M/sec (30.79%)
260,660,265 iTLB-load-misses # 3.65% of all iTLB cache hits (30.77%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
95.615072142 seconds time elapsed
Output of perf stat on Linux Kernel 5.4
Performance counter stats for process id '3355137':
2,718,192.32 msec cpu-clock # 29.184 CPUs utilized
1,719,910 context-switches # 0.633 K/sec
448,685 cpu-migrations # 0.165 K/sec
3,884,586 page-faults # 0.001 M/sec
5,927,930,305,757 cycles # 2.181 GHz (30.77%)
6,848,723,995,972 instructions # 1.16 insn per cycle (38.47%)
536,856,379,853 branches # 197.505 M/sec (38.47%)
32,245,288,271 branch-misses # 6.01% of all branches (38.48%)
1,935,640,517,821 L1-dcache-loads # 712.106 M/sec (38.47%)
177,978,528,204 L1-dcache-load-misses # 9.19% of all L1-dcache hits (38.49%)
8,119,842,688 LLC-loads # 2.987 M/sec (30.77%)
3,625,986,107 LLC-load-misses # 44.66% of all LL-cache hits (30.75%)
<not supported> L1-icache-loads
184,001,558,310 L1-icache-load-misses (30.76%)
1,934,701,161,746 dTLB-loads # 711.760 M/sec (30.74%)
676,618,636 dTLB-load-misses # 0.03% of all dTLB cache hits (30.76%)
6,275,901,454 iTLB-loads # 2.309 M/sec (30.78%)
391,706,425 iTLB-load-misses # 6.24% of all iTLB cache hits (30.78%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
93.139551411 seconds time elapsed
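For reference, per-process counter summaries like the two above can be collected with something along these lines (the PID and the 95-second window are placeholders, not necessarily the exact invocation used here):
perf stat -ddd -p <pid> -- sleep 95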
UPDATE:
It is confirmed that the performance gain comes from Linux kernel 5.4 itself, because performance on kernel 5.3 is the same as on kernel 4.9.
It is confirmed that the performance gain is unrelated to libc, because on a kernel 5.10 system whose libc is 2.23 the performance is the same as on the kernel 5.4 system whose libc is 2.31.
It seems the performance gain comes from this fix:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de53fd7aedb100f03e5d2231cfce0e4993282425
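For anyone who wants to check whether a given kernel release already contains that commit, one way (assuming a local clone of the mainline tree) is:
git tag --contains de53fd7aedb100f03e5d2231cfce0e4993282425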

Thingsboard Performance Test - Gatling

I'm trying to do a performance test using the https://github.com/thingsboard/gatling-mqtt project.
Although the current versions of the tools are a bit different, I managed to get it running.
I've tested a couple of basic scripts with Gatling using HTTP, and Gatling worked as expected.
But when using the scripts provided with the project, which use the MQTT plugin, it does not work.
In fact it runs, but it doesn't do anything: no connections, no logs, no reports, no errors.
In the other tests the global OK count increments as the test progresses, and logs and reports are generated.
But as you can see below, when running with the MQTT plugin the count never increments.
Simulation MqttSimulation_localhost started...
================================================================================
2021-10-14 17:55:02 5s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
---- MQTT Test -----------------------------------------------------------------
[--------------------------------------------------------------------------] 0%
waiting: 0 / active: 10 / done:0
================================================================================
================================================================================
2021-10-14 17:55:07 10s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
---- MQTT Test -----------------------------------------------------------------
[--------------------------------------------------------------------------] 0%
waiting: 0 / active: 10 / done:0
================================================================================
================================================================================
2021-10-14 17:55:12 15s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
---- MQTT Test -----------------------------------------------------------------
[--------------------------------------------------------------------------] 0%
waiting: 0 / active: 10 / done:0
================================================================================
================================================================================
2021-10-14 17:55:17 20s elapsed
---- Requests ------------------------------------------------------------------
> Global (OK=0 KO=0 )
... and it goes on forever
Any ideas on what could be happening?
I would certainly appreciate any help on this!
Thank you!

How to convert task-clock perf-event to seconds or milliseconds?

I am trying to use perf for performance analysis.
When I use perf stat, it provides the execution time:
Performance counter stats for './quicksort_ver1 input.txt 10000':
7.00 msec task-clock:u # 0.918 CPUs utilized
2,679,253 cycles:u # 0.383 GHz (9.58%)
18,034,446 instructions:u # 6.73 insn per cycle (23.56%)
5,764,095 branches:u # 822.955 M/sec (37.62%)
5,030,025 dTLB-loads # 718.150 M/sec (51.69%)
2,948,787 dTLB-stores # 421.006 M/sec (65.75%)
5,525,534 L1-dcache-loads # 788.895 M/sec (48.31%)
2,653,434 L1-dcache-stores # 378.838 M/sec (34.25%)
4,900 L1-dcache-load-misses # 0.09% of all L1-dcache hits (20.16%)
66 LLC-load-misses # 0.00% of all LL-cache hits (6.09%)
<not counted> LLC-store-misses (0.00%)
<not counted> LLC-loads (0.00%)
<not counted> LLC-stores (0.00%)
0.007631774 seconds time elapsed
0.006655000 seconds user
0.000950000 seconds sys
However, when I use perf record, I observe that 45 samples and 14999985 events are collected for task-clock.
Samples: 45 of event 'task-clock:u', Event count (approx.): 14999985
Children Self Command Shared Object Symbol
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] _start
+ 91.11% 0.00% quicksort_ver1 libc-2.17.so [.] __libc_start_main
+ 91.11% 0.00% quicksort_ver1 quicksort_ver1 [.] main
Is there any way to convert task-clock events to seconds or milliseconds?
I got the answer with a little experimentation: the basic unit of the task-clock event is the nanosecond.
stats collected with perf stat
$ sudo perf stat -e task-clock:u ./bubble_sort input.txt 50000
Performance counter stats for './bubble_sort input.txt 50000':
11,617.33 msec task-clock:u # 1.000 CPUs utilized
11.617480215 seconds time elapsed
11.615856000 seconds user
0.002000000 seconds sys
stats collected with perf record
$ sudo perf report
Samples: 35K of event 'task-clock:u', Event count (approx.): 11715321618
Overhead Command Shared Object Symbol
73.75% bubble_sort bubble_sort [.] bubbleSort
26.15% bubble_sort bubble_sort [.] swap
0.07% bubble_sort libc-2.17.so [.] _IO_vfscanf
Observe that in both cases the number of samples has changed, but the event count is approximately the same.
perf stat reports the elapsed time as 11.617480215 seconds, and perf report reports the total task-clock event count as 11715321618.
11715321618 nanoseconds = 11.715321618 seconds, which is approximately equal to the 11.615856000 seconds of user time.
So apparently the basic unit of the task-clock event is the nanosecond.
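As a quick sanity check of that conversion, plain shell arithmetic (nothing perf-specific) gives the same number:
$ echo "scale=9; 11715321618 / 1000000000" | bc
11.715321618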

Cannot create ESXi VMFS datastore with error 'Cannot change the host configuration'

I am running ESXi 7.0 on a Dell 3930 Rack PC. This PC has an NVMe SSD and a 1 TB SATA HDD plugged in. I used the Dell ESXi ISO image during setup.
I can see the NVMe and PCH controllers when I browse storage. The controller is shown as 'Cannon Lake PCH-H AHCI Controller'.
When I go to Devices, I can also see the 'Local ATA Disk' there. Despite all attempts, I am not able to create a VMFS datastore and always receive an error saying 'Cannot change the host configuration'.
I tried clearing the partition from the ESXi web client but wasn't successful either. The vmkernel logs show the following when I try to create a datastore:
2021-05-30T09:48:08.091Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.091Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.092Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.226Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.226Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a74543900) 0x28, CmdSN 0xe from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.226Z cpu12:1049325)0x0.
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x4000000
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_IF_NONFATAL exception.
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Port IRQ Error.
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:48:08.286Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000001, tf status: 0x451
2021-05-30T09:48:08.288Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.288Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.289Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.414Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.414Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a744e2600) 0x28, CmdSN 0x15 from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.414Z cpu12:1049325)0x0.
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:Error port=0, PxIS=0x08000000, PxTDF=0x40,PxSERR=0x00400100, PxCI=0x00000000, PxSACT=0x00000002, ActiveTags=0x00000002
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x0, lbc=0x22
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:cfis->command= 0x61
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:48:08.461Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000008, tf status: 0x84c1
2021-05-30T09:48:08.462Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.618Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.618Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.619Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.661Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.661Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a744aa700) 0x2a, CmdSN 0x2 from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.661Z cpu12:1049325)0x0.
2021-05-30T09:48:12.698Z cpu11:1049281)NMP: nmp_ResetDeviceLogThrottling:3776: last error status from device t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV repeated 6 times
2021-05-30T09:48:42.644Z cpu7:1049176)INFO (ne1000): false RX hang detected on vmnic0
2021-05-30T09:51:12.698Z cpu3:1049363)DVFilter: 6344: Checking disconnected filters for timeouts
2021-05-30T09:52:20.250Z cpu2:1049176)INFO (ne1000): false RX hang detected on vmnic0
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x4000000
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_IF_NONFATAL exception.
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:52:32.136Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2021-05-30T09:52:32.136Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Port IRQ Error.
2021-05-30T09:52:32.137Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:52:32.159Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000001, tf status: 0x451
2021-05-30T09:52:32.161Z cpu2:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:52:32.161Z cpu2:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:52:32.162Z cpu2:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:52:32.283Z cpu2:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:52:32.283Z cpu12:1048622)NMP: nmp_ThrottleLogForDevice:3856: Cmd 0x28 (0x455a733c8440, 0) to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" on path "vmhba0:C0:T0:L0" Failed:
2021-05-30T09:52:32.283Z cpu12:1048622)NMP: nmp_ThrottleLogForDevice:3865: H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE. cmdId.initiator=0x451a20b1a7b8 CmdSN 0x18a60
2021-05-30T09:52:32.283Z cpu12:1048622)ScsiDeviceIO: 4062: Cmd(0x455a733c8440) 0x28, CmdSN 0x18a60 from world 0 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:52:32.283Z cpu12:1048622)0x0.
I had some doubts about whether the computer's AHCI controller (Cannon Lake PCH-H AHCI Controller) is compatible with ESXi 7, but I cannot find any resource that confirms this. I read somewhere that disabling the default AHCI driver with the following SSH command may help:
esxcli system module set --enabled=false --module=vmw_ahci
I tried this, but with the driver disabled the controller does not show up at all after a restart, so it had to be re-enabled.
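For completeness, re-enabling the driver is the same command with the flag flipped:
esxcli system module set --enabled=true --module=vmw_ahci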
I also tried clearing out the partition table, since this drive holds no useful information, but any partedUtil command always throws an 'input/output error'. It seems no write attempt to this device works.
When I try the partedUtil getptbl command, the partition format is reported as 'unknown'.
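For reference, the kind of partedUtil invocations I mean look roughly like this (device name taken from the vmkernel log above; gpt is just an example label type):
partedUtil getptbl /vmfs/devices/disks/t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV
partedUtil mklabel /vmfs/devices/disks/t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV gpt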
FYI, before I set up ESXi, the HDD in question was the disk drive for an Ubuntu OS and was accessible.
Any leads that could help fix this issue would be welcome.

Am I increasing shared memory correctly for GNURadio?

I'm working with GNU Radio, using stream tags (stream tagging to create a burst transmitter), but my flowgraph won't run with around 200 stream tags, failing with the error below.
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::buffer::allocate_buffer: failed to allocate buffer of size 1250000 KB
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::buffer::allocate_buffer: failed to allocate buffer of size 1250000 KB
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
However, sysctl --all | grep shm outputs
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 32147483648
kernel.shmmax = 32147483648
kernel.shmmni = 16777216
This means I should have 32 GB of shared memory, correct? I set kernel.shmall and kernel.shmmax via:
sudo sysctl kernel.shmall=32147483648
sudo sysctl kernel.shmmax=32147483648
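For what it's worth, a sketch of the same settings with the units spelled out: kernel.shmmax is in bytes while kernel.shmall is in pages (typically 4096 bytes each), so an explicit 32 GiB limit would look like this (values illustrative):
sudo sysctl kernel.shmmax=34359738368   # 32 GiB in bytes
sudo sysctl kernel.shmall=8388608       # the same 32 GiB expressed in 4 KiB pages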
The only thing that concerns me is that cat /proc/meminfo | grep Shmem returns
Shmem: 42556 kB
Is there a better way to increase shared memory?
