We have a blade server that boots from SAN, which we attempted to image. The image was applied successfully, but the server then failed to boot into the OS. We escalated the issue to the storage team and were told the root cause was "LUN was mapped incorrectly", but not much more detail was given about the cause or the resolution. We do not have much knowledge of SAN. Could someone explain the most probable cause of "LUN was mapped incorrectly" when a server fails to boot into the OS after an image is applied, and how such an issue is resolved?
First off - LUN stands for 'logical unit'; it's essentially a disk as provided by a storage array. Topology and geometry are hidden behind the scenes, and generally shouldn't be visible to the host.
LUN mapping is the process by which a LUN created on a storage array is presented across the SAN to a designated host, or set of hosts. Part of this involves setting a LUN ID (although many storage arrays do this automatically), and this LUN ID is how it 'appears' to the host. The convention for SCSI connectivity is that a LUN is identified by the compound of controller, target, and LUN ID. (After which the host can partition the LUN, although it probably shouldn't on most SAN storage configurations.)
Controller being the card in the host, target being the storage array, and LUN being the number that the storage array has configured.
Many implementations of SCSI check whether LUN 0 exists first, and if it doesn't, don't bother to continue scanning the SCSI bus - scanning a large number of LUNs and hitting timeouts because nothing is connected can take a lot of time.
Your boot device will be 'known' to the host as a particular combination of controller, target, and LUN (and partition). Incorrect mapping means that - probably - this boot LUN was presented under the wrong LUN ID, so your host couldn't find it to boot from.
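The addressing scheme above can be illustrated with a toy model (plain Python; all controller/target/LUN values are invented, and this is not real SCSI code): the host looks its boot device up by the (controller, target, LUN) tuple it recorded at install time, so presenting the same disk under a different LUN ID makes the lookup fail.

```python
# Toy model of SCSI addressing: the fabric presents LUNs keyed by
# (controller, target, lun_id); the host remembers its boot device
# by exactly that tuple. All values below are made up for illustration.

def find_boot_device(presented_luns, boot_address):
    """Return the disk at the recorded boot address, or None."""
    return presented_luns.get(boot_address)

# The host was installed while its boot disk was mapped as LUN 0.
boot_address = ("ctlr0", "array-port-1", 0)

# Correct mapping: the array presents the boot disk at LUN 0.
correct = {("ctlr0", "array-port-1", 0): "boot-disk"}
assert find_boot_device(correct, boot_address) == "boot-disk"

# "LUN was mapped incorrectly": same disk, now presented as LUN 4.
wrong = {("ctlr0", "array-port-1", 4): "boot-disk"}
assert find_boot_device(wrong, boot_address) is None  # host can't find it
```

The fix on the array side is correspondingly simple in this model: re-map the disk back to the LUN ID the host expects.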
I am currently doing a system design with CANopen communication and I am curious about the following question.
In the system, a device is programmed to have no Node-ID assigned (255) on startup. Normally an LSS master would then have to assign a specific Node-ID to the device for it to work properly. However, if no LSS master functionality is implemented in any other bus node, does the CANopen standard allow the unconfigured device to assign itself a predefined ID after a timeout?
In my opinion this is not possible, because it can lead to undefined system states, but I could not find anything on it in the standard documents.
We use Azure IoT with the edgeHub and edgeAgent modules for the Edge runtime. We want to verify that offline storage is configured correctly in our environment.
We have a custom simulator module connected to a custom publisher module that publishes to an API in the cloud. The simulator continuously produces a message of around 10 KB every 2 seconds. The publisher module cannot reach the outside because of a blocking firewall rule. I want to verify that all the memory/RAM allocated to edgeHub is used and that it then overflows to disk, using a reasonable amount of the available disk space.
This exercise is taking a long time to complete, even when I run multiple modules/instances of the simulator.
Queries:
How can I control the size of the memory allocated to edgeHub? What is the correct createOptions to control/reduce the allocated memory? It is currently around 1.8 GB.
I see that during the exercise RAM generally keeps increasing, but at some point it drops a little and then keeps increasing again. Is there some kind of GC or optimization happening inside edgeHub?
b0fb729d25c8 edgeHub 1.36% 547.8MiB / 1.885GiB 28.38% 451MB / 40.1MB 410MB / 2.92GB 221
How can I ensure that none of the messages produced by the simulator are lost? Is there a way to count the number of messages in edgeHub?
I have done the configuration to mount a directory from the VM into the container for persistent storage. Is there a specific folder inside the edgeHub storage folder under which messages are stored when they overflow?
I will document the answers after the input I have received from Azure IoTHub/IoTEdge maintainers.
To control the memory limit of containers/modules on Edge, specify the following createOptions with an appropriate maximum memory limit in bytes:
"createOptions": "{\"HostConfig\": { \"Memory\": 536870912 }}"
Messages are persistent because the implementation uses RocksDB, which stores to disk by default.
To keep messages persistent even across edgeHub container recreation, mount a directory from the host into the container. In the example below, edgeHub is configured to use /data (which should be bind-mounted from the host) as its storage folder:
"env": {
"storageFolder": {
"value": "/data"
}
}
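Putting the two settings together, the edgeHub entry in the deployment manifest might look like the sketch below (simplified; the host path /srv/edgehub and the image tag are assumptions, so adjust them for your environment). The storageFolder environment variable names the path inside the container, and the HostConfig.Binds entry maps the host directory onto that path:

```json
{
  "edgeHub": {
    "env": {
      "storageFolder": { "value": "/data" }
    },
    "settings": {
      "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
      "createOptions": "{\"HostConfig\": {\"Memory\": 536870912, \"Binds\": [\"/srv/edgehub:/data\"]}}"
    }
  }
}
```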
To monitor the memory usage, use the metrics feature, which will be generally available in 1.0.10. See "Monitor IoT Edge Devices At-scale".
We have 2 Windows servers running Windows Server 2012 R2.
We have a shared disk and a witness disk to implement quorum behavior in the shared-disk arbitration.
Both the quorum and data disks are currently configured with Fibre Channel MPIO.
We do not provide the hardware, so our customers work with various SAN vendors.
We are using the SCSI-3 persistent reservation mechanism for the disk arbitration: we reserve the quorum witness disk from one machine and check it from the other (passive) machine.
As part of the reservation flow each machine registers its unique SCSI registration key and uses it to perform the reservation when needed.
The issue occurs when MPIO is configured, since in our current implementation (so it seems) the key is registered on the device via the I/O path that is currently used to access the storage.
Once there is a failover/switch of the I/O path, the reservation fails because the key is not registered for that path.
Is there a way, at the device/code level, to have a SCSI reservation key registered on all I/O paths instead of just the specific path the registration command arrived on?
Thanks.
The PR type needs to be set to "Exclusive Access - Registrants Only", and all paths on the active Windows host must be registered for the PR.
These articles may help:
https://www.veritas.com/support/en_US/article.100016085.html
https://www.veritas.com/support/en_US/article.100018257.html
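For comparison (the question is about Windows, where registration goes through the MPIO/DSM layer, but the SCSI-3 semantics are the same): on Linux this per-path registration can be scripted with sg_persist from sg3_utils, issuing a REGISTER through each path device so every I_T nexus carries the same key. A minimal sketch that only builds the command lines; the /dev/sg* path devices and the key are made up for illustration:

```python
# Sketch: build sg_persist (sg3_utils) command lines that register one
# reservation key on every I/O path to a multipathed LUN, then take the
# reservation (PR type 6 = Exclusive Access, Registrants Only) once.
# Device names and the key below are invented for illustration.

def pr_commands(path_devices, key_hex):
    cmds = [
        f"sg_persist --out --register --param-sark={key_hex} {dev}"
        for dev in path_devices  # the key must be registered per I_T nexus
    ]
    # The reservation itself only needs to be taken on one path.
    cmds.append(
        f"sg_persist --out --reserve --param-rk={key_hex} "
        f"--prout-type=6 {path_devices[0]}"
    )
    return cmds

for cmd in pr_commands(["/dev/sg2", "/dev/sg3"], "0x1234abcd"):
    print(cmd)
```

With type 6, any registrant can do I/O, so a path failover is harmless as long as the key was registered on the path being failed over to.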
I want to understand the performance of one device driver module in the Linux kernel. In this case I am looking at the carl9170 driver.
If I use two physical interfaces, how can the single carl9170 module handle the two different physical interfaces?
So far I have understood that the two physical interfaces result in two instances with separate packet buffers for each, while still using a single carl9170 module, and this confuses me.
Also, in which file in the Linux kernel source code can I find this handling (as it relates to the carl9170 driver)?
Thank you very much for your help
For 2, take a look at the folder:
drivers/net/wireless/ath/carl9170/
This folder is located under your kernel source directory. It contains all the sources of the driver.
For 1:
It is pretty much how classes work in object-oriented programming: how does a method know which instance's data it must work with? The this pointer references the correct in-memory data.
Take a look at the file drivers/net/wireless/ath/carl9170/carl9170.h. Every function exported by the driver is declared in this file. Note that every function takes, as its first parameter, a reference to the struct ar9170 data type. This is exactly the data set that the driver must work with. It specifies everything the driver needs to know about the device and its state: from the USB bus address where the device is connected, to device state such as power and connection status, and any other data the driver itself needs to keep the device working properly.
Note that this is driver-internal data, though. The kernel has its own set of data to keep the driver, the device, and the kernel itself working.
Take a look at line 546 of carl9170.h; that is where the function declarations start. This is as of kernel 3.8.8.
Just like in object-oriented programming you would allocate as many instances of a class as you need, the kernel will allocate as many ar9170 structures as it needs, one referencing each device.
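The pattern can be mimicked in a few lines of Python (purely an analogy, not driver code; all names below are invented): each "driver function" takes the per-device state as its first argument, exactly like the struct ar9170 pointer in the C driver, so one shared set of functions can serve any number of devices.

```python
# Analogy for the struct-ar9170 pattern: one shared set of functions,
# one state object per physical device.

class Ar9170State:
    """Stands in for struct ar9170: everything that is per-device."""
    def __init__(self, usb_address):
        self.usb_address = usb_address
        self.powered = False
        self.tx_queue = []

# "Exported functions" take the device state as their first parameter,
# just as the C driver's functions take a struct ar9170 pointer.
def carl9170_power_on(ar):
    ar.powered = True

def carl9170_tx(ar, frame):
    ar.tx_queue.append(frame)

# Two physical interfaces -> two independent state instances,
# yet only one copy of the driver code.
wifi0 = Ar9170State(usb_address="2-1.4")
wifi1 = Ar9170State(usb_address="2-1.5")
carl9170_power_on(wifi0)
carl9170_tx(wifi0, "frame-A")
assert wifi0.powered and not wifi1.powered
assert wifi0.tx_queue == ["frame-A"] and wifi1.tx_queue == []
```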
The device IDs can be found under the /sys/class/net directory. There will be a soft link for each of the network devices attached to your computer, pointing to something like the following:
$ ls -l eth0
../../devices/pci0000:00/0000:00:04.0/0000:02:00.0/net/eth0
The pci0000:00 part is the bus. The 0000:00:04.0 part, I believe, is the bus address. Finally, 0000:02:00.0 is the device ID. AFAIK, every registered device follows the same logic.
Finally, if you have two carl9170 devices, both will be under the directory /sys/class/net, but one of them will probably be named wifi0 and the other wifi1. Each of them will point to a different device (check with the command ls -l /sys/class/net).
I would just like to note that I haven't used any wireless card in this explanation, so I'm not sure whether wireless cards are shown under /sys/class/net or not. Anyway, it will be something very similar, like /sys/class/wireless.
When working on a NUMA system, memory can be local or remote relative to the current NUMA node.
To make memory more local there is the "first-touch" policy (the default memory-to-node binding strategy):
http://lse.sourceforge.net/numa/status/description.html
Default Memory Binding
It is important that user programs' memory is allocated on a node close to the one containing the CPU on which they are running. Therefore, by default, page faults are satisfied by memory from the node containing the page-faulting CPU. Because the first CPU to touch the page will be the CPU that faults the page in, this default policy is called "first touch".
http://techpubs.sgi.com/library/dynaweb_docs/0640/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch08.html
The default policy is called first-touch. Under this policy, the process that first touches (that is, writes to, or reads from) a page of memory causes that page to be allocated in the node on which the process is running. This policy works well for sequential programs and for many parallel programs as well.
There are also some other, non-local policies. There is also a function to request an explicit move of a memory segment to some NUMA node.
But sometimes (in the context of many threads of a single application) it can be useful to have a "next touch" policy: call some function to "unbind" a memory region (up to 100s of MB) containing some data, and reapply a "first touch"-like handler to this region which, on the next touch (read or write), will migrate each page to the NUMA node of the accessing thread.
This policy is useful when there is a huge amount of data to be processed by many threads and there are different patterns of access to this data (e.g. in the first phase, split a 2D array by columns across threads; in the second, split the same data by rows).
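A toy model of the two policies (plain Python; the real mechanism works at page-fault granularity inside the kernel, and user-space emulations typically combine mprotect(PROT_NONE) with a fault handler that migrates the page): under plain first-touch a page stays on the node that first touched it, while a "next touch" request re-arms the page so it migrates to whichever thread's node touches it next.

```python
# Toy model of first-touch vs. next-touch page placement. A "page" just
# records which NUMA node it currently lives on.

class Page:
    def __init__(self):
        self.node = None              # not yet allocated
        self.migrate_on_touch = True  # fresh pages behave like first-touch

    def touch(self, node):
        """A thread running on `node` reads or writes this page."""
        if self.node is None or self.migrate_on_touch:
            self.node = node          # (re)place on the toucher's node
            self.migrate_on_touch = False

def next_touch(pages):
    """Re-arm pages so the next toucher's node gets them (the 'unbind')."""
    for p in pages:
        p.migrate_on_touch = True

pages = [Page() for _ in range(4)]
for p in pages:
    p.touch(node=0)           # phase 1: a thread on node 0 touches first
assert all(p.node == 0 for p in pages)

pages[0].touch(node=1)        # plain first-touch: no later migration
assert pages[0].node == 0

next_touch(pages)             # request next-touch behavior
pages[0].touch(node=1)        # phase 2: a thread on node 1 touches
assert pages[0].node == 1 and pages[1].node == 0
```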
Such a policy has been supported in Solaris since version 9, via madvise with the MADV_ACCESS_LWP flag:
https://cims.nyu.edu/cgi-systems/man.cgi?section=3C&topic=madvise
MADV_ACCESS_LWP Tell the kernel that the next LWP to touch the specified address range will access it most heavily, so the kernel should try to allocate the memory and other resources for this range and the LWP accordingly.
There was (May 2009) a patch to the Linux kernel named "affinity-on-next-touch", http://lwn.net/Articles/332754/ (thread), but as I understand it, it was not accepted into mainline, was it?
There were also Lee Schermerhorn's "migrate_on_fault" patches: http://free.linux.hp.com/~lts/Patches/PageMigration/.
So, the question: is there some next-touch policy for NUMA in the current vanilla Linux kernel, or in some major fork like the Red Hat or Oracle Linux kernels?
To my understanding, there isn't anything similar in the vanilla kernel. numactl has functions to migrate pages manually, but that's probably not helpful in your case. (The NUMA policy description is in Documentation/vm/numa_memory_policy if you want to check for yourself.)
I think those patches were not merged, as I don't see any of the relevant code showing up in the current kernel.