Cannot Create ESXi VMFS Datastore with error 'Cannot Change the host configuration' - storage

I am running ESXi 7.0 on a Dell 3930 Rack PC. This PC has an NVMe SSD and a 1 TB SATA HDD installed. I used the Dell ESXi ISO image for the installation.
I can see the NVMe and PCH controllers when I browse storage. The controller name shown is 'Cannon Lake PCH-H AHCI Controller'.
When I go to Devices, I can also see the 'Local ATA Disk' there. Despite all attempts, I am not able to create a VMFS datastore and always receive the error 'Cannot change the host configuration'.
I tried clearing the partition from the ESXi web client, but that failed as well. The vmkernel log shows the following when I try to create a datastore:
2021-05-30T09:48:08.091Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.091Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.092Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.226Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.226Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a74543900) 0x28, CmdSN 0xe from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.226Z cpu12:1049325)0x0.
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x4000000
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_IF_NONFATAL exception.
2021-05-30T09:48:08.264Z cpu8:1048723)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Port IRQ Error.
2021-05-30T09:48:08.264Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:48:08.286Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000001, tf status: 0x451
2021-05-30T09:48:08.288Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.288Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.289Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.414Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.414Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a744e2600) 0x28, CmdSN 0x15 from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.414Z cpu12:1049325)0x0.
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:Error port=0, PxIS=0x08000000, PxTDF=0x40,PxSERR=0x00400100, PxCI=0x00000000, PxSACT=0x00000002, ActiveTags=0x00000002
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:SCSI cmd 0x2a on slot 1 lba=0x0, lbc=0x22
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: CompletionBottomHalf:cfis->command= 0x61
2021-05-30T09:48:08.449Z cpu8:1049830)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020) Curr: --|--|--|--|--|TF|--|--|--|--|--|-- (0x0020)
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Task File Error.
2021-05-30T09:48:08.449Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:48:08.461Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000008, tf status: 0x84c1
2021-05-30T09:48:08.462Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.618Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:48:08.618Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:48:08.619Z cpu8:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:48:08.661Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:48:08.661Z cpu12:1049325)ScsiDeviceIO: 4062: Cmd(0x455a744aa700) 0x2a, CmdSN 0x2 from world 1196852 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:48:08.661Z cpu12:1049325)0x0.
2021-05-30T09:48:12.698Z cpu11:1049281)NMP: nmp_ResetDeviceLogThrottling:3776: last error status from device t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV repeated 6 times
2021-05-30T09:48:42.644Z cpu7:1049176)INFO (ne1000): false RX hang detected on vmnic0
2021-05-30T09:51:12.698Z cpu3:1049363)DVFilter: 6344: Checking disconnected filters for timeouts
2021-05-30T09:52:20.250Z cpu2:1049176)INFO (ne1000): false RX hang detected on vmnic0
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: CompletionBottomHalf:strange irq(s), 0x4000000
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: CompletionBottomHalf:PORT_IRQ_IF_NONFATAL exception.
2021-05-30T09:52:32.136Z cpu8:1051618)vmw_ahci[00000017]: LogExceptionSignal:Port 0, Signal: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|--|--|--|--|--|-- (0x0000)
2021-05-30T09:52:32.136Z cpu8:1049247)vmw_ahci[00000017]: LogExceptionProcess:Port 0, Process: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040) Curr: --|--|--|--|--|--|IR|--|--|--|--|-- (0x0040)
2021-05-30T09:52:32.136Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Performing device reset due to Port IRQ Error.
2021-05-30T09:52:32.137Z cpu8:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:hardware stop on slot 0x1, activeTags 0x00000002
2021-05-30T09:52:32.159Z cpu8:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:port status: 0x40000001, tf status: 0x451
2021-05-30T09:52:32.161Z cpu2:1049247)vmw_ahci[00000017]: ExecInternalCommandPolled:FAIL!!: Internal command 2f, 00
2021-05-30T09:52:32.161Z cpu2:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:Fail to get error log for port 0
2021-05-30T09:52:32.162Z cpu2:1049247)vmw_ahci[00000017]: _IssueComReset:Issuing comreset...
2021-05-30T09:52:32.283Z cpu2:1049247)vmw_ahci[00000017]: ExceptionHandlerWorld:fail a command on slot 1
2021-05-30T09:52:32.283Z cpu12:1048622)NMP: nmp_ThrottleLogForDevice:3856: Cmd 0x28 (0x455a733c8440, 0) to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" on path "vmhba0:C0:T0:L0" Failed:
2021-05-30T09:52:32.283Z cpu12:1048622)NMP: nmp_ThrottleLogForDevice:3865: H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE. cmdId.initiator=0x451a20b1a7b8 CmdSN 0x18a60
2021-05-30T09:52:32.283Z cpu12:1048622)ScsiDeviceIO: 4062: Cmd(0x455a733c8440) 0x28, CmdSN 0x18a60 from world 0 to dev "t10.ATA_____ST1000LM0492D2GH172__________________________________ZGS23QAV" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44
2021-05-30T09:52:32.283Z cpu12:1048622)0x0.
I have some doubts about whether the computer's AHCI controller (Cannon Lake PCH-H AHCI Controller) is compatible with ESXi 7, but I cannot find any resource that confirms this either way. I read somewhere that disabling the default AHCI driver with the following SSH command may help:
esxcli system module set --enabled=false --module=vmw_ahci
I tried this, but with the driver disabled the controller does not show up at all after a restart, so I had to re-enable it.
I also tried clearing the partition table, since this drive holds no useful data, but every partedUtil command fails with an 'input/output' error. It seems that no write to this device succeeds.
When I run the partedUtil getptbl command, the partition format is reported as 'unknown'.
FYI, before I set up ESXi, the HDD in question was a disk drive for an Ubuntu installation and was accessible.
Any leads that could help fix this issue would be welcome.
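The status fields in the log above can be decoded with the standard SCSI tables. The following is a minimal sketch, not a VMware tool: the function name and the trimmed lookup tables are illustrative assumptions, and only the codes that actually appear in this log are included. Under the standard tables, D:0x2 is CHECK CONDITION, sense key 0x4 is HARDWARE ERROR, and ASC/ASCQ 0x44/0x00 is INTERNAL TARGET FAILURE, which would suggest the error is reported by the drive or the link rather than by the datastore wizard.

```python
import re

# Standard SCSI lookup tables, trimmed to the codes that appear in the log above.
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEY = {0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x44, 0x00): "INTERNAL TARGET FAILURE"}

# Matches "H:.. D:.. P:.. Valid sense data: key asc [ascq]" as printed by vmkernel.
PATTERN = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) "
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)(?: (0x[0-9a-fA-F]+))?"
)

def decode_scsi_status(line):
    """Return a dict of decoded fields, or None if the line has no status fields."""
    match = PATTERN.search(line)
    if match is None:
        return None
    host, device, plugin, key, asc = (int(g, 16) for g in match.groups()[:5])
    # The ASCQ is sometimes wrapped onto the next log line; treat it as unknown then.
    ascq = int(match.group(6), 16) if match.group(6) else None
    return {
        "host": host,
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": plugin,
        "sense_key": SENSE_KEY.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), "unknown"),
    }

# Status portion of the NMP line from the log above.
sample = "H:0x0 D:0x2 P:0x0 Valid sense data: 0x4 0x44 0x0. Act:NONE."
print(decode_scsi_status(sample))
```

Lines where the final sense byte is wrapped onto the next log line decode with "additional" left as "unknown", since the sketch does not join wrapped lines.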

Related

Dataflow (Beam 2.12) does not start due to ext4 not found

I am seeing all kinds of strange errors when running a Dataflow job (Beam 2.12).
The job takes input from Pub/Sub, reads from and writes to Datastore, and writes the result to Pub/Sub.
Several warnings (W) and errors (E) appear in the Stackdriver logs, and it is unclear how to resolve them. Until now we were using Beam 2.9 and did not experience any of these issues.
A partial (redacted) log dump is available below.
W acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
W ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
W ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
W i8042: Warning: Keylock active
W GPT:Primary header thinks Alt. header is not at the end of the disk.
W GPT:52428799 != 62914559
W GPT:Alternate GPT header not at the end of the disk.
W GPT:52428799 != 62914559
W GPT: Use GNU Parted to correct GPT errors.
W device-mapper: verity: Argument 0: 'payload=PARTUUID=XXX'
W device-mapper: verity: Argument 1: 'hashtree=PARTUUID=XXX'
W device-mapper: verity: Argument 2: 'hashstart=2539520'
W device-mapper: verity: Argument 3: 'alg=sha1'
W device-mapper: verity: Argument 4: 'root_hexdigest=900...'
W device-mapper: verity: Argument 5: 'salt=b113702...'
W [/usr/lib/tmpfiles.d/var.conf:12] Duplicate line for path "/var/run", ignoring.
W Cannot set file attribute for '/var/log/journal', value=0x00800000, mask=0x00800000: Operation not supported
W Cannot set file attribute for '/var/log/journal/2a0c3f0af65e8318a0b8f3eb...', value=0x00800000, mask=0x00800000: Operation not supported
W Could not load the device policy file.
W [WARNING:persistent_integer.cc(96)] cannot open /var/lib/metrics/version.cycle for reading: No such file or directory
W WARNING Could not update the authorized keys file for user root. [Errno 30] Read-only file system: '/root/.ssh'.
W [CLOUDINIT] cc_write_files.py[WARNING]: Undecodable permissions None, assuming 420
...
E Error initializing dynamic plugin prober: Error (re-)creating driver directory: mkdir /usr/libexec/kubernetes: read-only file system
W No api server defined - no node status update will be sent.
W Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
W Unknown healthcheck type 'NONE' (expected 'CMD') in container 7df5acdbd1ad6756e3e409c6e8760d274bdc03f83bf...
E while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
E while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
W Unknown healthcheck type 'NONE' (expected 'CMD') in container 3aa8c92a0b7d746d7004768d5182f0558a0c0c90dfcd5...
W Unknown healthcheck type 'NONE' (expected 'CMD') in container 64b9fb0459f88833dee78943c32598761154e4a49d708...
W Unknown healthcheck type 'NONE' (expected 'CMD') in container d2edf1c5e89b746e8c9c96b2a39a9d7ac7da2ecf52f96d...
W Unknown healthcheck type 'NONE' (expected 'CMD') in container b2448a8792ad63059bb70f1f6f12385caae7a833018d05...
E EXT4-fs (sdb): VFS: Can't find ext4 filesystem
E Error syncing pod c386113... ("dataflow-...-harness-z656_default(c386113...)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 10s restarting failed container=java-streaming pod=dataflow-...-harness-z656_default(c386113...)"
W [WARNING:metrics_daemon.cc(619)] cannot read /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
E EXT4-fs (sdd): VFS: Can't find ext4 filesystem
...
W Unknown healthcheck type 'NONE' (expected 'CMD') in container 675eb66a9e794b3dea03b62c3bdaf539034c998bf11c...
E Error syncing pod c386113... ("dataflow-...-harness-z656_default(c386113...)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 40s restarting failed container=java-streaming pod=dataflow-...-harness-z656_default(c386113...)"
E Error syncing pod c386113... ("dataflow-...-harness-z656_default(c386113...)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 40s restarting failed container=java-streaming pod=dataflow-...-harness-z656_default(c386113...)"
E Error syncing pod c386113... ("dataflow-...-harness-z656_default(c386113...)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 40s restarting failed container=java-streaming pod=dataflow-...-harness-z656_default(c386113...)"
W Unknown healthcheck type 'NONE' (expected 'CMD') in container 7d7536b93cb92562bdd12da3fd25a53caea8c9a9e1cee603b3999dfdd5681a27
E Error syncing pod c386113... ("dataflow-...-harness-z656_default(c386113...)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=java-streaming pod=dataflow-...-harness-z656_default(c386113...)"
I resolved this by upgrading several dependencies.
The Maven versions plugin helped me do this. I installed the plugin by adding the following to my pom.xml file:
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>versions-maven-plugin</artifactId>
  <version>2.5</version>
</plugin>
Then I checked which libraries had updates available and updated them with the commands below. I suspect the culprit was an older Bigtable client, as mentioned here.
mvn versions:display-dependency-updates
mvn versions:use-latest-versions

Job Fails with odd message

I have a job that fails at the very start with the message:
"#*" and "#N" are reserved sharding specs. Filepattern must not contain any of them.
I have changed the destination location from the default (an email address), which would include the # symbol, to something else, but I can see that the job still uses temporary destinations within that path, and I am unable to edit those.
Has anyone experienced this issue before? My file is only 65k rows long and I can preview all of the data in Data Prep, but the job fails when I run it. That is very tedious, and ~3 hours of cleaning go down the drain if it won't run. (I appreciate it's not designed for this, but Excel was being a mare so it seemed like a good solution!)
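To see whether a given destination or temporary path would trip this check, the path can be scanned for the two tokens quoted in the error. This is a minimal sketch that assumes the validation is a literal substring match on "#*" and "#N"; the actual Dataflow check is not shown here, and the function name and example paths are hypothetical.

```python
# Tokens quoted verbatim in the error message.
RESERVED_SHARDING_SPECS = ("#*", "#N")

def find_reserved_specs(file_pattern):
    """Return the reserved tokens that occur literally in file_pattern."""
    return [spec for spec in RESERVED_SHARDING_SPECS if spec in file_pattern]

# Hypothetical paths, for illustration only.
print(find_reserved_specs("gs://example-bucket/jobs/output.csv"))
print(find_reserved_specs("gs://example-bucket/#Newsletter/out.csv"))
```

Running the same check over the full temporary path reported in the job details, rather than only the destination that was typed in, would show whether the offending token comes from a part of the path that cannot be edited.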
Edit - Adding Logs:
2018-03-10 (13:47:34) Value "PTableLoadTransformGCS/Shuffle/GroupByKey/Session" materialized.
2018-03-10 (13:47:34) Executing operation PTableLoadTransformGCS/SumQuoteAndDelimiterCounts/GroupByKey/Read+PTableLoadTran...
2018-03-10 (13:47:38) Executing operation PTableLoadTransformGCS/Shuffle/GroupByKey/Close
2018-03-10 (13:47:38) Executing operation PTableStoreTransformGCS/WriteFiles/GroupUnwritten/Create
2018-03-10 (13:47:39) Value "PTableStoreTransformGCS/WriteFiles/GroupUnwritten/Session" materialized.
2018-03-10 (13:47:39) Executing operation PTableLoadTransformGCS/Shuffle/GroupByKey/Read+PTableLoadTransformGCS/Shuffle/Gr...
2018-03-10 (13:47:39) Executing failure step failure49
2018-03-10 (13:47:39) Workflow failed. Causes: (c759db2a23a80ea): "#*" and "#N" are reserved sharding specs. Filepattern m...
(c759db2a23a8c5b): Workflow failed. Causes: (c759db2a23a80ea): "#*" and "#N" are reserved sharding specs. Filepattern must not contain any of them.
2018-03-10 (13:47:39) Cleaning up.
2018-03-10 (13:47:39) Starting worker pool teardown.
2018-03-10 (13:47:39) Stopping worker pool...
And StackDriver warnings or higher:
W ACPI: RSDP 0x00000000000F23A0 000014 (v00 Google)
W ACPI: RSDT 0x00000000BFFF3430 000038 (v01 Google GOOGRSDT 00000001 GOOG 00000001)
W ACPI: FACP 0x00000000BFFFCF60 0000F4 (v02 Google GOOGFACP 00000001 GOOG 00000001)
W ACPI: DSDT 0x00000000BFFF3470 0017B2 (v01 Google GOOGDSDT 00000001 GOOG 00000001)
W ACPI: FACS 0x00000000BFFFCF00 000040
W ACPI: FACS 0x00000000BFFFCF00 000040
W ACPI: SSDT 0x00000000BFFF65F0 00690D (v01 Google GOOGSSDT 00000001 GOOG 00000001)
W ACPI: APIC 0x00000000BFFF5D10 00006E (v01 Google GOOGAPIC 00000001 GOOG 00000001)
W ACPI: WAET 0x00000000BFFF5CE0 000028 (v01 Google GOOGWAET 00000001 GOOG 00000001)
W ACPI: SRAT 0x00000000BFFF4C30 0000B8 (v01 Google GOOGSRAT 00000001 GOOG 00000001)
W ACPI: 2 ACPI AML tables successfully acquired and loaded
W ACPI: Executed 2 blocks of module-level executable AML code
W acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
W ACPI: Enabled 16 GPEs in block 00 to 0F
W ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
W ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
W i8042: Warning: Keylock active
W GPT:Primary header thinks Alt. header is not at the end of the disk.
W GPT:41943039 != 524287999
W GPT:Alternate GPT header not at the end of the disk.
W GPT:41943039 != 524287999
W GPT: Use GNU Parted to correct GPT errors.
W device-mapper: verity: Argument 0: 'payload=PARTUUID=245B0EEC-6404-8744-AAF2-E8C6BF78D7B2'
W device-mapper: verity: Argument 1: 'hashtree=PARTUUID=245B0EEC-6404-8744-AAF2-E8C6BF78D7B2'
W device-mapper: verity: Argument 2: 'hashstart=2539520'
W device-mapper: verity: Argument 3: 'alg=sha1'
W device-mapper: verity: Argument 4: 'root_hexdigest=244007b512ddbf69792d485fdcbc3440531f1264'
W device-mapper: verity: Argument 5: 'salt=5bacc0df39d2a60191e9b221ffc962c55e251ead18cf1472bf8d3ed84383765b'
E EXT4-fs (dm-0): couldn't mount as ext3 due to feature incompatibilities
W [/usr/lib/tmpfiles.d/var.conf:12] Duplicate line for path "/var/run", ignoring.
W Could not stat /dev/pstore: No such file or directory
W Kernel does not support crash dumping
W Could not load the device policy file.
W [CLOUDINIT] cc_write_files.py[WARNING]: Undecodable permissions None, assuming 420
W [CLOUDINIT] cc_write_files.py[WARNING]: Undecodable permissions None, assuming 420
W [CLOUDINIT] cc_write_files.py[WARNING]: Undecodable permissions None, assuming 420
W [CLOUDINIT] cc_write_files.py[WARNING]: Undecodable permissions None, assuming 420
W [WARNING:persistent_integer.cc(75)] cannot open /var/lib/metrics/version.cycle for reading: No such file or directory
W No API client: no api servers specified
W Unable to update cni config: No networks found in /etc/cni/net.d
W unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp 127.0.0.1:15441: getsockopt: connection refused
W No api server defined - no events will be sent to API server.
W Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
W Unable to update cni config: No networks found in /etc/cni/net.d
E Image garbage collection failed once. Stats initialization may not have completed yet: unable to find data for container /
W No api server defined - no node status update will be sent.
E Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": unable to find data for container /
E Failed to check if disk space is available on the root partition: failed to get fs info for "root": unable to find data for container /
E [ContainerManager]: Fail to get rootfs information unable to find data for container /
W Registration of the rkt container factory failed: unable to communicate with Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp 127.0.0.1:15441: getsockopt: connection refused
E Could not find capacity information for resource storage.kubernetes.io/scratch
W eviction manager: no observation found for eviction signal allocatableNodeFs.available
W Profiling Agent not found. Profiles will not be available from this worker.
E debconf: delaying package configuration, since apt-utils is not installed
W [WARNING:metrics_daemon.cc(598)] cannot read /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
E % Total % Received % Xferd Average Speed Time Time Time Current
E Dload Upload Total Spent Left Speed
E
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 3698 100 3698 0 0 64248 0 --:--:-- --:--:-- --:--:-- 64877

Am I increasing shared memory correctly for GNURadio?

I'm working with GNURadio and using stream tags to create a burst transmitter, but my flowgraph won't run with around 200 stream tags and fails with the error below.
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::buffer::allocate_buffer: failed to allocate buffer of size 1250000 KB
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget (1): Invalid argument
gr::buffer::allocate_buffer: failed to allocate buffer of size 1250000 KB
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
However, sysctl --all | grep shm outputs
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 32147483648
kernel.shmmax = 32147483648
kernel.shmmni = 16777216
This means I should have 32 GB in shared memory, correct? I set kernel.shmall and shmmax via
sudo sysctl kernel.shmall=32147483648
sudo sysctl kernel.shmmax=32147483648
The only thing that concerns me is that cat /proc/meminfo | grep -i shmem returns
Shmem: 42556 kB
Is there a better way to increase shared memory?
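One detail worth checking: on Linux, kernel.shmmax is expressed in bytes, while kernel.shmall is expressed in pages, so the two identical values above do not describe the same quantity. The Shmem line in /proc/meminfo reports shared memory currently in use, not a limit. The sketch below converts both settings to bytes and compares them with the failed request. The function names are illustrative, and a 4096-byte page size and 1 KB = 1024 bytes are assumptions.

```python
def shm_limits(shmmax_bytes, shmall_pages, page_size=4096):
    """Return (largest single segment, system-wide total), both in bytes."""
    # shmmax is already in bytes; shmall counts pages and must be scaled.
    return shmmax_bytes, shmall_pages * page_size

def request_fits(request_bytes, shmmax_bytes, shmall_pages, page_size=4096):
    """True if one segment of request_bytes is within both sysctl limits."""
    max_segment, total = shm_limits(shmmax_bytes, shmall_pages, page_size)
    return request_bytes <= max_segment and request_bytes <= total

# "1250000 KB" from the error message, assuming 1 KB = 1024 bytes.
requested = 1250000 * 1024
print(request_fits(requested, shmmax_bytes=32147483648, shmall_pages=32147483648))
```

With the values printed by sysctl above, the request is within both limits, which hints that the limits may not have been in effect when the flowgraph ran, or that something other than these two settings is rejecting the allocation.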

RabbitMQ Generic server rabbit_disk_monitor terminating / eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap")

RabbitMQ crashed.
It had been working correctly for many days (10-15 days).
I do not understand why it crashed.
I am using RabbitMQ 3.4.0 on Erlang 17.0.
Erlang created a dump file for the crash, which shows:
eheap_alloc: Cannot allocate 229520 bytes of memory (of type "old_heap").
Also note that the RabbitMQ publish-subscribe message load is very low (max 1-2 messages/second), and messages are processed as they arrive, so RabbitMQ is almost empty all the time. Disk space and memory are also sufficient.
More system info:
Limiting to approx 8092 file handles (7280 sockets)
Memory limit set to 6553MB of 16383MB total.
Disk free limit set to 50MB.
The RabbitMQ logs are as below.
=ERROR REPORT==== 18-Jul-2015::04:29:31 ===
** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia",
50000000,28358258688,100,10000,
#Ref<0.0.106.70488>,false}
** Reason for termination ==
** {eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=INFO REPORT==== 18-Jul-2015::04:29:31 ===
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{eacces,[{erlang,open_port,
[{spawn,"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,306}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
17179336704}
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.167.0>
registered_name: rabbit_disk_monitor
exception exit: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
in function gen_server:terminate/6 (gen_server.erl, line 746)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 4185
stack_size: 27
reductions: 481081978
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: child_terminated
Reason: {eacces,
[{erlang,open_port,
[{spawn,
"C:\\Windows\\system32\\cmd.exe /c dir /-C /W \"c:/Users/jasmin.joshi/AppData/Roaming/RabbitMQ/db/rabbit#localhost-mnesia\""},
[stream,in,eof,hide]],
[]},
{os,cmd,1,[{file,"os.erl"},{line,204}]},
{rabbit_disk_monitor,get_disk_free,2,[]},
{rabbit_disk_monitor,internal_update,1,[]},
{rabbit_disk_monitor,handle_info,2,[]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,599}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24989.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,<0.167.0>},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
=CRASH REPORT==== 18-Jul-2015::04:29:31 ===
crasher:
initial call: rabbit_disk_monitor:init/1
pid: <0.24991.51>
registered_name: []
exception exit: unsupported_platform
in function gen_server:init_it/6 (gen_server.erl, line 322)
ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.140.0>]
messages: []
links: [<0.166.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 650
neighbours:
=SUPERVISOR REPORT==== 18-Jul-2015::04:29:31 ===
Supervisor: {local,rabbit_disk_monitor_sup}
Context: start_error
Reason: unsupported_platform
Offender: [{pid,{restarting,<0.167.0>}},
{name,rabbit_disk_monitor},
{mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
{restart_type,{transient,1}},
{shutdown,4294967295},
{child_type,worker}]
Judging from the error message, RabbitMQ cannot open more files because of system limits.
You can raise the maximum number of open files to avoid the problem.
https://serverfault.com/questions/249477/windows-server-2008-r2-max-open-files-limit
There are two unrelated errors here: one is the VM failing to allocate memory, the other is the disk space monitor terminating. The disk space monitor is optional, and it is known to fail on some less common platforms or under specific security restrictions. That does not bring the VM down, and it certainly has nothing to do with heap allocation failures.
Heap allocation failures typically come down to one of two common causes:
A known bug fixed in Erlang 17.x (I don't recall which specific patch release, so use 17.5).
Running 32-bit Erlang/OTP on a 64-bit OS.
Chen Yu's comment about the EACCES system call error is correct.
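The 32-bit case can be illustrated with the numbers from the question. The sketch below assumes the reported limit comes from RabbitMQ's default vm_memory_high_watermark of 0.4 (which matches 6553MB of 16383MB) and compares it with the 4 GiB that a 32-bit process can address at most. The function names are illustrative, and whether the poster actually ran a 32-bit Erlang is not stated.

```python
MIB = 2**20

def watermark_mb(total_mb, fraction=0.4):
    """Memory limit in MB for a given total, using the assumed 0.4 watermark."""
    return int(total_mb * fraction)

def exceeds_32bit_address_space(limit_mb):
    """True if the limit is larger than a 32-bit process can address (4096 MB)."""
    return limit_mb > (2**32) // MIB

limit = watermark_mb(16383)
print(limit, exceeds_32bit_address_space(limit))
```

If the limit exceeds the addressable space, a 32-bit VM would run out of address space before the memory watermark could ever trigger, which fits an allocation failure on a machine that still has free RAM.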
I get a similar error:
systemd unit for activation check: "rabbitmq-server.service"
eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
Crash dump is being written to: erl_crash.dump...done
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 514979
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 514979
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
This is the crash dump:
=erl_crash_dump:0.5
Wed Dec 2 17:16:31 2020
Slogan: eheap_alloc: Cannot allocate 306586976 bytes of memory (of type "heap").
System version: Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:512] [kernel-poll:true]
Compiled: Mon Feb 5 17:34:00 2018
Taints: crypto,asn1rt_nif,erl_tracer,zlib
Atoms: 34136
Calling Thread: scheduler:0
=scheduler:1
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING
Scheduler Sleep Info Aux Work:
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK
Current Process:
=scheduler:2
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING
Scheduler Sleep Info Aux Work: THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process:
=scheduler:3
Scheduler Sleep Info Flags:
Scheduler Sleep Info Aux Work: DELAYED_AW_WAKEUP | DD | THR_PRGR_LATER_OP
Current Port:
Run Queue Max Length: 0
Run Queue High Length: 0
Run Queue Normal Length: 0
Run Queue Low Length: 0
Run Queue Port Length: 0
Run Queue Flags: OUT_OF_WORK | HALFTIME_OUT_OF_WORK | NONEMPTY | EXEC
Current Process: <0.12306.0>
Current Process State: Running
Current Process Internal State: ACT_PRIO_NORMAL | USR_PRIO_NORMAL | PRQ_PRIO_NORMAL | ACTIVE | RUNNING | TRAP_EXIT | ON_HEAP_MSGQ
Current Process Program counter: 0x00007f2f3ab3a060 (unknown function)
Current Process CP: 0x0000000000000000 (invalid)
Current Process Limited Stack Trace:
0x00007f2b50252d68:SReturn addr 0x32A6EC98 (rabbit_channel:handle_method/3 + 6712)
0x00007f2b50252d78:SReturn addr 0x32A69630 (rabbit_channel:handle_cast/2 + 4160)
0x00007f2b50252df8:SReturn addr 0x51102708 (gen_server2:handle_msg/2 + 1808)
0x00007f2b50252e28:SReturn addr 0x3FD85E70 (proc_lib:init_p_do_apply/3 + 72)
0x00007f2b50252e48:SReturn addr 0x7FFB4948 (<terminate process normally>)
=scheduler:4
Scheduler Sleep Info Flags: SLEEPING | TSE_SLEEPING | WAITING

[uwsgi + lua]: no app loaded

I'm trying to set up a uWSGI server with a Lua script.
For now I only have a small test script (more or less the one shown in the uWSGI docs: http://uwsgi-docs.readthedocs.org/en/latest/Lua.html#your-first-wsapi-application).
Here is my script:
function run(wsapi_env)
    local headers = { ["Content-type"] = "text/html" }
    local function hello_text()
        coroutine.yield("<html><body>")
        coroutine.yield("<p>Hello Wsapi!</p>")
        coroutine.yield("<p>PATH_INFO: " .. wsapi_env.PATH_INFO .. "</p>")
        coroutine.yield("<p>SCRIPT_NAME: " .. wsapi_env.SCRIPT_NAME .. "</p>")
        coroutine.yield("</body></html>")
    end
    return 200, headers, coroutine.wrap(hello_text)
end
return run
I launch uWSGI with this command line (only until I manage to launch it successfully once; after that I will use a config file):
uwsgi --socket :63031 --plugins lua --lua main.lua --master
I ran this command from the directory where main.lua is stored (I also tried the full path to main.lua).
But uWSGI doesn't load the Lua script:
*** Starting uWSGI 2.0.7-debian (64bit) on [Thu Feb 5 15:45:00 2015] ***
compiled with version: 4.9.1 on 25 October 2014 19:17:54
os: Linux-3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08)
nodename: ns342653.ip-91-121-135.eu
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /home/vincent/web
detected binary path: /usr/bin/uwsgi-core
your processes number limit is 63906
your memory page size is 4096 bytes
detected max file descriptor number: 65536
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to TCP address :63031 fd 3
Initializing Lua environment... (1 lua_States)
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 145536 bytes (142 KB) for 1 cores
*** Operational MODE: single process ***
*** no app loaded. going in full dynamic mode ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 8148)
spawned uWSGI worker 1 (pid: 8149, cores: 1)
How can I make uWSGI load my script?
Thanks for your answer.
(P.S.: I've successfully launched uWSGI with PSGI and a Perl script using almost the same config.)

Resources