Does a stuffing bit in CAN count towards the next stuffing group - can-bus

If you have a sequence of bits in CAN data:
011111000001
There will need to be a stuffed 0 after the ones, and a stuffed 1 after the 0s. But I'm not sure where the 1 should go.
The standard seems ambiguous to me because sometimes it talks about "5 consecutive bits during normal operation", but sometimes it says "5 consecutive bits of data". Does a stuffing bit count as data?
i.e.
should it be:
01111100000011
Or
01111100000101

Bit stuffing only applies to the CAN frame until the ACK-bit. In the End-Of-Frame and Intermission fields, no bit stuffing is applied.
It does not matter what is transmitted.
It is simply "after 5 consecutive bits of the same value" one complementary bit is inserted.
The second of your examples is correct. 6 consecutive bits make the message invalid.

From the old Bosch CAN2.0B spec, chapter 5:
The frame segments START OF FRAME, ARBITRATION FIELD, CONTROL FIELD, DATA FIELD and CRC SEQUENCE are coded by the method of bit stuffing.
Meaning everything from the start of the frame to the 15 bit CRC can have bit stuffing, but not the 1 bit CRC delimiter and the rest of the frame.
Whenever a transmitter detects five consecutive bits in the bit stream to be transmitted
This "bit stream" refers to all the fields mentioned in the previously quoted sentence.
...in the actual transmitted bit stream
The actual transmitted bit stream is the original data + appended stuffing bit(s).

Related

Bug in ARCore acquireDepthImage?

The documentation states that it's 16 bits each with top 3 bits set to 0, meaning the range should be about 8,192. My code is calling
depthImage.getPlanes()[0].getBuffer().asShortBuffer().get(x)
The range of these numbers is about the full signed short range and seemingly random. To debug, I tried printing the following values:
depthImage.getPlanes()[0].getBuffer().get(0);
depthImage.getPlanes()[0].getBuffer().get(1);
The first one oscillates randomly and the second one is almost always in the low numbers such as 0-6, but I've seen as high as 21.
It seems like the 2nd byte is the most significant byte of the number and the 1st byte is the least significant byte (i.e. they are reversed).
It's little-endian encoding, use Short.reverseBytes():
depthMm = java.lang.Short.reverseBytes(depthImage.planes[0].buffer.getShort(offset))

Deflate and fixed Huffman codes

I'm trying to implement a deflate compressor and I have to decide whether to
compress a block using the static huffman code or create a dynamic one.
What is the rationale behind the length associated with the static code?
(this is the table included in the rfc)
Lit Value Bits
--------- ----
0 - 143 8
144 - 255 9
256 - 279 7
280 - 287 8
I thought static code was more biased towards plain ascii text, instead it
looks like it prefers by a tiny bit the compression of the rle length
What is a good heuristic to decide whether to use static code?
I was thinking to build a distribution of probabilities from a sample of the
input data and calculate a distance (maybe EMD?) from the probabilities derived
from the static code.
I would guess that the creator of the code took a large sample of literals and lengths from compressed data, likely including executables along with text, and found typical code lengths over the large set. They were then approximated with the table shown. However the author passed away many years ago, so we'll never know for sure.
You don't need a heuristic. Once you have done the work to find matching strings, it is comparatively very fast to compute the number of bits in the block for both a dynamic and static representation. Then simply pick the smaller one. Or the static one if equal (decodes faster).
I don't know about rationale, but there was a small amount of irrationale in choosing the static code lengths:
In the table in your question, the maximum static code number there is 287, but the DEFLATE specification only allows up to code 285, meaning code lengths have wastefully been assigned to two invalid codes. (And not even the longest ones either!) It's a similar story with the table for distance codes, with 32 codes having lengths assigned, but only 30 valid.
So there are some easy improvements that could have been made, but that said, without some prior knowledge of the data, it's not really possible to produce anything that's massively more efficient generally. The "flatness" of the table (no code longer than 9 bits) reduces the worst-case performance to 1 extra bit per byte of uncompressable data.
I think the main rationale behind the groupings is that by keeping group sizes to a multiple of 8, it's possible to tell which group a code belongs to by looking at the 5 most significant bits, which also tells you its length, along with what value to add to immediately get the code value itself
00000 00 .. 00101 11 7 bits + 256 -> (256..279)
00110 000 .. 10111 111 8 bits - 48 -> ( 0..144)
11000 000 .. 11000 111 8 bits + 78 -> (280..287)
11001 0000 .. 11111 1111 9 bits - 256 -> (144..255)
So in theory you could set up a lookup table with 32 entries to quickly read in the codes, but it's an uncommon case and probably not worth optimising for.
There are only really two cases (with some overlap) where Fixed Huffman blocks are likely to be the most efficient:
where the input size in bytes is very small, Static Huffman can be more efficient than Uncompressed, because Uncompressed uses a 32-bit header, while Fixed Huffman needs only a 7-bit footer, plus 1 bit potential overhead per byte.
where the output size is very small (ie. small-ish, highly compressible data), Static Huffman can be more efficient than Dynamic Huffman - again because Dynamic Huffman uses a certain amount of space for an additional header. (A practical minimum header size is difficult to calculate, but I'd say at least 64 bits, probably more.)
That said, I've found they are actually helpful from a developer's perspective, because it's very easy to implement a Deflate-compatible function using Static Huffman blocks, and to iteratively improve from there to get more efficient algorithms working.

Why bytes of one word has opposite order in binary files?

I was reading BMP file in hex editor while discovered something odd. Two first letters "BM" are written in order, however the next word(2B), which is means file size, is 36 30 in hex. Actual size is 0x3036. I've noticed that other numbers are stored the same way.
I'm also using MARS MIPS emulator which can display memory by words. String in.bmp is stored as b . n i / \0 p m.
Why data isn't stored continuously?
It depends not on the data itself but on how you store this data: per byte, per word (2 bytes, usually), or per long (4 bytes -- again, usually). As long as you store data per byte you don't see anything unusual; data appears "continuous". However, with longer units, you are subject to endianness.
It appears your emulator is assuming all words need to have their bytes reversed; and you can see in your example that this assumption is not always valid.
As for the BM "magic" signature: it's not meant to be read as a word value "BM", but rather as "first, a single byte B, then a single byte M". All next values are written in little-endian order, not only 'exchanging' your 36 and 30 but also the 2 zeroes 'before' (or 'after') (the larger values in the BMP header are of 4 bytes long type).

Cache and memory

First of all, this is not language tag spam, but this question not specific to one language in particulary and I think that this stackexchange site is the most appropriated for my question.
I'm working on cache and memory, trying to understand how it works.
What I don't understand is this sentence (in bold, not in the picture) :
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
So let's say I have this two adresses :
[1........0]10
[1........0]00
^
|
same 30 bits for boths [31-12] for the tag and [11-2] for the index (see figure below)
As I understand the first one will result in a MISS (I assume that the initial cache is empty). So one slot in the cache will be filled with the data located in this memory adress.
Now, we took the second one, since it has the same 30 bits, it will result in a HIT in the cache because we access the same slot (because of the same 10 bits) and the 20 bits of the adress are equals to the 20 bits stored in the Tag field.
So in result, we'll have the data located at the memory [1........0]10 and not [1........0]00 which is wrong !
So I assume this has to do with the sentence I quote above. Can anyone explain me why my reasoning is wrong ?
The cache in figure :
In the MIPS architecture, since words are aligned to multiples of four
bytes, the least significant two bits are ignored when selecting a
word in the block.
It just mean that in memory, my words are aligned like that :
So when selecting a word, I shouldn't care about the two last bits, because I'll load a word.
This two last bits will be useful for the processor when a load byte (lb) instruction will be performed, to correctly shift the data to get the one at the correct byte position.

How could I guess a checksum algorithm?

Let's assume that I have some packets with a 16-bit checksum at the end. I would like to guess which checksum algorithm is used.
For a start, from dump data I can see that one byte change in the packet's payload totally changes the checksum, so I can assume that it isn't some kind of simple XOR or sum.
Then I tried several variations of CRC16, but without much luck.
This question might be more biased towards cryptography, but I'm really interested in any easy to understand statistical tools to find out which CRC this might be. I might even turn to drawing different CRC algorithms if everything else fails.
Backgroud story: I have serial RFID protocol with some kind of checksum. I can replay messages without problem, and interpret results (without checksum check), but I can't send modified packets because device drops them on the floor.
Using existing software, I can change payload of RFID chip. However, unique serial number is immutable, so I don't have ability to check every possible combination. Allthough I could generate dumps of values incrementing by one, but not enough to make exhaustive search applicable to this problem.
dump files with data are available if question itself isn't enough :-)
Need reference documentation? A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS is great reference which I found after asking question here.
In the end, after very helpful hint in accepted answer than it's CCITT, I
used this CRC calculator, and xored generated checksum with known checksum to get 0xffff which led me to conclusion that final xor is 0xffff instread of CCITT's 0x0000.
There are a number of variables to consider for a CRC:
Polynomial
No of bits (16 or 32)
Normal (LSB first) or Reverse (MSB first)
Initial value
How the final value is manipulated (e.g. subtracted from 0xffff), or is a constant value
Typical CRCs:
LRC: Polynomial=0x81; 8 bits; Normal; Initial=0; Final=as calculated
CRC16: Polynomial=0xa001; 16 bits; Normal; Initial=0; Final=as calculated
CCITT: Polynomial=0x1021; 16 bits; reverse; Initial=0xffff; Final=0x1d0f
Xmodem: Polynomial=0x1021; 16 bits; reverse; Initial=0; Final=0x1d0f
CRC32: Polynomial=0xebd88320; 32 bits; Normal; Initial=0xffffffff; Final=inverted value
ZIP32: Polynomial=0x04c11db7; 32 bits; Normal; Initial=0xffffffff; Final=as calculated
The first thing to do is to get some samples by changing say the last byte. This will assist you to figure out the number of bytes in the CRC.
Is this a "homemade" algorithm. In this case it may take some time. Otherwise try the standard algorithms.
Try changing either the msb or the lsb of the last byte, and see how this changes the CRC. This will give an indication of the direction.
To make it more difficult, there are implementations that manipulate the CRC so that it will not affect the communications medium (protocol).
From your comment about RFID, it implies that the CRC is communications related. Usually CRC16 is used for communications, though CCITT is also used on some systems.
On the other hand, if this is UHF RFID tagging, then there are a few CRC schemes - a 5 bit one and some 16 bit ones. These are documented in the ISO standards and the IPX data sheets.
IPX: Polynomial=0x8005; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6B: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6C: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
Data must be padded with zeroes to make a multiple of 8 bits
ISO CRC5: Polynomial=custom; 5 bits; Reverse; Initial=0x9; Final=shifted left by 3 bits
Data must be padded with zeroes to make a multiple of 8 bits
EPC class 1: Polynomial=custom 0x1021; 16 bits; Reverse; Initial=0xffff; Final=post processing of 16 zero bits
Here is your answer!!!!
Having worked through your logs, the CRC is the CCITT one. The first byte 0xd6 is excluded from the CRC.
It might not be a CRC, it might be an error correcting code like Reed-Solomon.
ECC codes are often a substantial fraction of the size of the original data they protect, depending on the error rate they want to handle. If the size of the messages is more than about 16 bytes, 2 bytes of ECC wouldn't be enough to be useful. So if the message is large, you're most likely correct that its some sort of CRC.
I'm trying to crack a similar problem here and I found a pretty neat website that will take your file and run checksums on it with 47 different algorithms and show the results. If the algorithm used to calculate your checksum is any of these algorithms, you would simply find it among the list of checksums produced with a simple text search.
The website is https://defuse.ca/checksums.htm
You would have to try every possible checksum algorithm and see which one generates the same result. However, there is no guarantee to what content was included in the checksum. For example, some algorithms skip white spaces, which lead to different results.
I really don't see why would somebody want to know that though.

Resources