What does "NFC NDEF 716 byte" mean for tag memory?

What does it mean when someone says an NFC tag's memory capacity is 1024 bytes (NDEF 716 bytes)? If I am only using the unique ID of the NFC tag, how do these sizes affect my selection of the tag?

Since you only intend to use the anti-collision identifier (UID), the actual memory size does not affect your application. Most NFC tags have some form of anti-collision identifier. Depending on the tag type, the memory used to store that identifier may already be counted towards the overall memory size, or it may be stored in a separate memory area that is not counted at all.
The discrepancy between the overall memory size and the memory available for storing NDEF data comes from the fact that not all memory regions are usable for storing NDEF data. Depending on the tag platform, some memory areas may be reserved for storing lock bits, capability information, access keys/passwords, the anti-collision identifier, or other meta information. Consequently, these areas cannot be used for general-purpose NDEF data (e.g. to store a website URL).
However, none of this means that you can safely use just any tag simply because you only need the UID.
First, there are tags that are not compatible with all Android devices (in case you intend to use Android as the reader platform, though similar constraints might apply to other reader platforms as well). In particular, the figures you mentioned in your question (1024 bytes overall memory, 716 bytes NDEF memory) suggest that these are MIFARE Classic 1K tags. These tags use a proprietary protocol that is not available on some Android devices (specifically those without an NFC chipset from NXP). While reading the anti-collision identifier would still work on all devices, some manufacturers (e.g. Samsung) decided to explicitly block those tags on many of their devices. Consequently, MIFARE Classic tags might not be the best choice for your application.
Second, not all tags expose an anti-collision identifier that's suitable for your application. For instance, some tags only expose a random ID that changes with every activation. There might also be tags with duplicate IDs: a 4-byte (N)UID allows only 2^32 (about 4.3 billion) distinct values, and more such tags have been manufactured than that, so duplicates necessarily exist.

The answer to your second question is that these sizes do not affect your selection of the tag. The NFC tag's unique ID (UID) is stored in a separate memory space from the NDEF memory space.
The answer to your first question is that this memory space is where you can store your NFC data. Think of it as being similar to the storage on a USB stick. However, bear in mind that 716 bytes is not much more than a couple of paragraphs of text. The difference between 1024 and 716 bytes exists because the chip also stores other data, such as the UID.
There's an explanation of NFC tag memory capacity and how much you need at https://nfc.today/advice/nfc-tags-how-much-memory
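To illustrate the UID-only approach both answers describe, here is a minimal C sketch (my own illustration, not from either answer) that formats a tag's UID bytes as a hex string so it can serve as a lookup key. ISO/IEC 14443 Type A UIDs are 4, 7, or 10 bytes long depending on the tag type; the example value below is made up:

    /* Format a tag UID as an uppercase hex string for use as a key. */
    #include <stdio.h>
    #include <stddef.h>

    static void uid_to_hex(const unsigned char *uid, size_t len, char *out) {
        for (size_t i = 0; i < len; i++)
            sprintf(out + 2 * i, "%02X", uid[i]);  /* two hex digits per byte */
    }

    int main(void) {
        unsigned char uid[7] = {0x04, 0xA2, 0x3B, 0x5C, 0x6D, 0x7E, 0x80};
        char key[2 * sizeof uid + 1];   /* +1 for the terminating NUL */
        uid_to_hex(uid, sizeof uid, key);
        printf("UID key: %s\n", key);   /* e.g. store this as a database key */
        return 0;
    }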


How is memory loaded from virtual memory?

I know that operating systems usually keep page tables to map a chunk of virtual memory to a chunk of physical memory.
My question is: does the CPU load the whole chunk when it's loading a given byte?
Let's say I have:
ld %r0, 0x4(%r1)
Assuming my page size is 4 KB, does the CPU load all 4 KB at once, or does it manage to load only the byte at the given offset?
Is the page size mandated by the hardware or configurable by software and the OS?
Edit:
I figured out that the page size is mandated by hardware:
The available page sizes depend on the instruction set architecture, processor type, and operating (addressing) mode. The operating system selects one or more sizes from the sizes supported by the architecture
Your question touches many levels of CPU/memory architecture, and the answer depends on the exact CPU and memory architecture you have in mind. In general: while the instruction targets only one byte, it will trigger the memory controller to fetch the whole cache line containing it (at least; prefetching might kick in and fetch more) into the second-/first-level cache. Your data is transferred after the cache line has been filled.
On a typical modern CPU, yes, it loads the whole page.
It couldn't really work any other way, since there are only two states in the page tables for a given page: present and not present. If the page is present, it must be mapped to some page in physical memory. If not present, every access to that page produces a page fault. There is no "partially present" state.
In order for it to be safe for the OS to mark the page present, it has to load the entire page into physical memory and update the page tables to point the virtual page to the physical page. If it only loaded a single byte or a smaller amount, the application might later try to access some other byte on the same page that hadn't been loaded, and it'd read garbage. There's no way for the CPU to generate another page fault in that case to let the OS fix things up, unless the page were marked not present, in which case the original access wouldn't be able to complete either.
The page size is fixed in hardware, though some architectures offer a few different choices that the OS can select from. For instance, recent x86-64 CPUs allow pages to be either 4 KB, 2 MB or 1 GB. The OS can mix-and-match these at runtime; there are bits in the page tables to indicate the size of each page.
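As a concrete illustration of the OS selecting from the hardware-supported sizes, here is a minimal POSIX C sketch (my own addition) that queries the page size chosen at runtime; on a typical x86-64 Linux system it prints 4096:

    /* Query the page size the OS selected from the sizes the hardware offers. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long page_size = sysconf(_SC_PAGESIZE);  /* page size in bytes */
        printf("Page size: %ld bytes\n", page_size);
        return 0;
    }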

Does AArch64 support unaligned access?

Does AArch64 support unaligned access natively? I am asking because currently ocamlopt assumes "no".
Provided the hardware bit for strict alignment checking is not turned on (which, as on x86, no general-purpose OS is realistically going to do), AArch64 does permit unaligned data accesses to Normal (not Device) memory with the regular load/store instructions.
However, there are several reasons why a compiler would still want to maintain aligned data:
Atomicity of reads and writes: naturally-aligned loads and stores are guaranteed to be atomic, i.e. if one thread reads an aligned memory location simultaneously with another thread writing the same location, the read will only ever return the old value or the new value. That guarantee does not apply if the location is not aligned to the access size - in that case the read could return some unknown mixture of the two values. If the language has a concurrency model which relies on that not happening, it's probably not going to allow unaligned data.
Atomic read-modify-write operations: If the language has a concurrency model in which some or all data types can be updated (not just read or written) atomically, then for those operations the code generation will involve using the load-exclusive/store-exclusive instructions to build up atomic read-modify-write sequences, rather than plain loads/stores. The exclusive instructions will always fault if the address is not aligned to the access size.
Efficiency: On most cores, an unaligned access at best still takes at least 1 cycle longer than a properly-aligned one. In the worst case, a single unaligned access can cross a cache line boundary (which has additional overhead in itself), and generate two cache misses or even two consecutive page faults. Unless you're in an incredibly memory-constrained environment, or have no control over the data layout (e.g. pulling packets out of a network receive buffer), unaligned data is still best avoided.
Necessity: If the language has a suitable data model, i.e. no pointers, and any data from external sources is already marshalled into appropriate datatypes at a lower level, then there's really no need for unaligned accesses anyway, and it makes the compiler's life that much easier to simply ignore the idea altogether.
I have no idea what concerns OCaml in particular, but I certainly wouldn't be surprised if it were "all of the above".
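For what it's worth, here is a minimal C sketch (my own illustration, unrelated to ocamlopt's internals) of the portable idiom for reading from a possibly unaligned pointer; with strict alignment checking off, GCC and Clang on AArch64 typically fold the memcpy into a single unaligned load, while the C source itself stays free of undefined behavior:

    /* Portable unaligned 64-bit load via memcpy. */
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    static uint64_t load_u64_unaligned(const void *p) {
        uint64_t v;
        memcpy(&v, p, sizeof v);  /* compilers fold this into one load */
        return v;
    }

    int main(void) {
        unsigned char buf[16] = {0};
        buf[3] = 0x2a;                              /* a byte at an odd offset */
        uint64_t v = load_u64_unaligned(buf + 1);   /* deliberately unaligned */
        printf("%llu\n", (unsigned long long)v);
        return 0;
    }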

How big can the payload be when sending data via WatchConnectivity?

When sending data using the WatchConnectivity framework, either from the phone to the watch or vice-versa, how big can the payload be before the framework gives me the WCErrorCodePayloadTooLarge error?
I couldn't find the answer on Apple's documentation, and there doesn't seem to be much information on this on the internet at this time (in fact, googling WCErrorCodePayloadTooLarge gives me just 4 results).
Has anyone tested to try to find the answer to this? If this question doesn't get an answer, I will try to do it myself and post the results.
So far, all the information I have is that it may be able to support files larger than 30 MB. I think this because I take a lot of raw photos on my iPhone, which are usually ~36 MB in size, and they always show up in my watch's Photos app.
For reference, WCSession's documentation has the following description of WCErrorCodePayloadTooLarge:
An error indicating that the item being sent exceeds the maximum size limit. This type of error can occur for both data dictionaries and files.
Available in watchOS 2.0 and later.
According to the private symbols WCPayloadSizeLimitApplicationContext, WCPayloadSizeLimitMessage, WCPayloadSizeLimitUserInfo, the limits (as of iOS 9.0.2) are:
65,536 bytes (65.5 KB) for a message
65,536 bytes (65.5 KB) for a user info
262,144 bytes (262.1 KB) for an application context
I don't know why Apple wouldn't document this, other than the fact that it can be difficult when sending dictionaries through WatchConnectivity to determine exactly how large they are. Certainly the acceptable sizes may change over time.
I couldn't find (and haven't personally observed) any maximum size limit when sending files, though I've noticed that it seems to get unreliable when you send large files (hundreds of MBs).

Exclusives Reservation Granule (ERG) on Apple's processors

Does anyone know what the ERG is on Apple's A5, A5X, A6 and A6X processors?
We ran into an obscure bug with LDREX/STREX instructions, and the behavior differs between the A5 and the A6. The only explanation I have is that they have different ERGs, but I can't find anything on that. I also could not find a way to retrieve this value; the MRC instruction seems to be prohibited in user mode on iOS.
Thank you!
On the OMAP 4460 (ARM Cortex-A9, same as Apple A5/A5X) the ERG is 32 bytes (which is the same as the cache line size).
I don't know what those values are on the A6/A6X (and there is no way to find out without loading your own driver, which you cannot do on Apple devices), but my guesstimate is that the cache line size increased to 64 bytes, and so did the ERG.
Alternatively, you may design the algorithm for the architectural maximum of 512 words (2048 bytes).
ERG size is a critical consideration when using ldrex/strex.
When an ldrex has been issued, if a memory access occurs within the ERG in which the ldrex read occurred, the strex will fail.
It is not unusual to have a structure which contains an ldrex/strex target and some additional data, where the additional data is accessed between the ldrex/strex pair (for example, to store the value loaded by ldrex).
If the ldrex/strex target has insufficient padding in the structure (i.e. the ERG size assumed is too small), accesses to other members of the structure will therefore cause the strex to ALWAYS fail.
Game over, lights out.
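Here is a minimal C sketch of the padding idea (my own illustration; the 64-byte granule below is an assumption, since the actual ERG is implementation-defined up to the architectural maximum of 2048 bytes):

    /* Pad an ldrex/strex target so no other member shares its granule. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    #define ERG_SIZE 64  /* assumed ERG; architectural maximum is 2048 bytes */

    struct counter {
        _Alignas(ERG_SIZE) volatile uint32_t value;  /* ldrex/strex target */
        char pad[ERG_SIZE - sizeof(uint32_t)];       /* fills out the granule */
        uint32_t scratch;  /* safe to touch between the ldrex and the strex */
    };

    int main(void) {
        /* scratch sits in the next granule, at offset 64 */
        printf("scratch offset: %zu\n", offsetof(struct counter, scratch));
        return 0;
    }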
Regarding ldrex/strex, ARM implements a "local monitor" and a "global monitor". On systems with only a local monitor, the only way an ldrex/strex can fail is if two ldrexs are issued on the same address prior to an strex having been issued - only systems with global monitors actually notice memory bus traffic within the ERG of the ldrex/strex target.
ARM systems vary hugely, and I suspect there are systems which have only local monitors and so do not, in fact, fully support ldrex/strex across multiple bus masters.

Why memory alignment of 4 is needed for efficient access?

I understand why data needs to be aligned (and all the efforts made to accomplish it, like padding) so we can reduce the number of memory accesses, but this assumes that the processor can only fetch from addresses that are multiples of 4 (supposing we are using a 32-bit architecture).
And because of that assumption we need to align memory.
My question is: why can we only access addresses that are multiples of 4 (efficiency, hardware restriction, something else)?
What are the advantages of doing this? Why can't we access all the available addresses?
Memory is constructed from hardware (RAM) that is attached to memory buses. The wider the bus, the fewer cycles are required to fetch data. If memory were one byte wide, you'd need four cycles to read one 32-bit value. Over time memory architectures have evolved, and depending on the class of processor (embedded, low power, high performance, etc.) and the cache design, memory may be quite wide (say, 256 bits).
Given a very wide internal bus between RAM (or cache) and the registers, say twice the width of a register, you could fetch a value in one cycle regardless of alignment if you have a barrel shifter in the data path. Barrel shifters are expensive, so not all processors have them; without one in the path, multiple cycles would be needed to align the value.
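As a small illustration, here is a C sketch (my own addition) showing the padding a compiler inserts so that members sit at offsets the hardware can fetch in one access:

    /* Show the padding a compiler inserts to keep members aligned. */
    #include <stdio.h>
    #include <stddef.h>

    struct sample {
        char c;   /* offset 0 */
        int  i;   /* offset 4 on a typical 32-bit ABI: 3 padding bytes added */
    };

    int main(void) {
        printf("offsetof(i) = %zu, sizeof(struct sample) = %zu\n",
               offsetof(struct sample, i), sizeof(struct sample));
        return 0;
    }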