Calculate checksum for split file by boost CRC

I wonder whether I can calculate two checksums, by reading the first half of the file to get checksum A and then reading the rest of the file to get checksum B, such that A and B combined form a unique checksum (of longer length).
I am using the boost::CRC library to try to implement it, but I don't know whether I am using it right.
(1) Does the second parameter of process_bytes mean the total buffer length? (2) Is the result accumulated across calls, so that I don't have to worry about the array? Or, when I call process_bytes, does it just calculate a new checksum of a single new byte of the buffer array?
std::ifstream ifs( argv[i], std::ios_base::binary );
if ( ifs )
{
    do
    {
        char buffer[BUFFER_SIZE];
        ifs.read( buffer, BUFFER_SIZE);
        result.process_bytes( buffer, HALF_FILE_SIZE );
    } while (HALF or END of FILE );
}
std::cout << result.checksum() << std::endl;
Please refer to this page for the boost::CRC example code:
http://www.boost.org/doc/libs/1_37_0/libs/crc/crc_example.cpp

I can't figure out what you're asking.
First off, a CRC is not a checksum. The "sum" in checksum means that the data is added. A CRC computes a polynomial remainder over a finite field, which is not a sum. This is important, since you seem to be asking about combining CRCs. The CRCs of two pieces cannot be added to get the CRC of the whole thing.
Second, the way to get the CRC of multiple pieces is to compute a single CRC over those pieces. That is what the example code does. result contains a single CRC that is updated with each piece that is run through it with process_bytes.
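To make the second point concrete for questions (1) and (2): a minimal sketch, assuming a bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR both 0xFFFFFFFF). The class Crc32 and the function crc_in_two_pieces are hypothetical; Crc32 only mimics the process_bytes()/checksum() shape of boost::crc_32_type. The second argument to process_bytes is the number of bytes in this call (with std::ifstream, that is ifs.gcount() after each read), and the register carries over between calls, so feeding the data in two halves gives the same single CRC as one pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in with the same process_bytes()/checksum() shape as boost::crc_32_type.
// Reflected CRC-32: polynomial 0xEDB88320, init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
class Crc32 {
public:
    // Feed `len` bytes; the register persists between calls, so pieces can be fed one at a time.
    void process_bytes(const void* data, std::size_t len) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < len; ++i) {
            reg_ ^= p[i];
            for (int k = 0; k < 8; ++k)
                reg_ = (reg_ & 1u) ? (reg_ >> 1) ^ 0xEDB88320u : reg_ >> 1;
        }
    }
    // CRC of everything fed so far (does not reset the register).
    std::uint32_t checksum() const { return reg_ ^ 0xFFFFFFFFu; }

private:
    std::uint32_t reg_ = 0xFFFFFFFFu;
};

// One CRC over a buffer, fed in two pieces through the SAME Crc32 object.
std::uint32_t crc_in_two_pieces(const std::string& s) {
    Crc32 crc;
    std::size_t half = s.size() / 2;
    crc.process_bytes(s.data(), half);                    // first half
    crc.process_bytes(s.data() + half, s.size() - half);  // rest: continues from the same register
    return crc.checksum();
}
```

Both a single call over "123456789" and crc_in_two_pieces("123456789") give the standard CRC-32 check value 0xCBF43926. Two separate objects, one per half, would instead give two unrelated CRCs.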
Third, it is possible to combine two CRCs, given the two CRCs and the length of the second piece, to get what would have been a single CRC of the two pieces concatenated. The operation is not trivial, but you can find it in zlib's crc32_combine() routine.
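A minimal sketch of why only the second length is needed, assuming the standard CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). The raw register update is linear over GF(2), which gives crc(A followed by B) = Z(crc(A), len(B)) ^ crc(B), where Z runs the raw register over len(B) zero bytes. crc32_raw, crc32_of and crc32_combine_naive are hypothetical names; the naive version costs O(len2), whereas zlib's crc32_combine(crc1, crc2, len2) reaches the same result in logarithmic time.

```cpp
#include <cstddef>
#include <cstdint>

// Raw reflected CRC-32 register update: no initial inversion, no final XOR.
std::uint32_t crc32_raw(std::uint32_t reg, const unsigned char* p, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        reg ^= p[i];
        for (int k = 0; k < 8; ++k)
            reg = (reg & 1u) ? (reg >> 1) ^ 0xEDB88320u : reg >> 1;
    }
    return reg;
}

// Standard CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
std::uint32_t crc32_of(const unsigned char* p, std::size_t len) {
    return crc32_raw(0xFFFFFFFFu, p, len) ^ 0xFFFFFFFFu;
}

// Naive combine: push crc1 through len2 zero bytes of the raw register, then XOR in crc2.
std::uint32_t crc32_combine_naive(std::uint32_t crc1, std::uint32_t crc2, std::size_t len2) {
    const unsigned char zero = 0;
    for (std::size_t i = 0; i < len2; ++i)
        crc1 = crc32_raw(crc1, &zero, 1);
    return crc1 ^ crc2;
}
```

For example, combining the CRCs of "1234" and "56789" with len2 = 5 reproduces the CRC of "123456789". Simply adding or XORing the two CRCs does not.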

Related

Does a stuffing bit in CAN count towards the next stuffing group

If you have a sequence of bits in CAN data:
011111000001
There will need to be a stuffed 0 after the ones, and a stuffed 1 after the 0s. But I'm not sure where the 1 should go.
The standard seems ambiguous to me because sometimes it talks about "5 consecutive bits during normal operation", but sometimes it says "5 consecutive bits of data". Does a stuffing bit count as data?
i.e.
should it be:
01111100000011
Or
01111100000101
Bit stuffing only applies to the CAN frame up to the end of the CRC sequence, i.e. before the CRC delimiter and the ACK field. In the End-Of-Frame and Intermission fields, no bit stuffing is applied either.
It does not matter what is being transmitted.
The rule is simply: after 5 consecutive bits of the same value, one complementary bit is inserted.
The second of your examples is correct. 6 consecutive bits make the message invalid.
From the old Bosch CAN2.0B spec, chapter 5:
The frame segments START OF FRAME, ARBITRATION FIELD, CONTROL FIELD, DATA FIELD and CRC SEQUENCE are coded by the method of bit stuffing.
Meaning everything from the start of the frame to the end of the 15-bit CRC sequence can have bit stuffing, but not the 1-bit CRC delimiter or the rest of the frame.
Whenever a transmitter detects five consecutive bits in the bit stream to be transmitted
This "bit stream" refers to all the fields mentioned in the previously quoted sentence.
...in the actual transmitted bit stream
The actual transmitted bit stream is the original data + appended stuffing bit(s).
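The rule as described in this answer, where an inserted stuff bit counts toward the next run, can be checked against both candidates in the question with a small sketch. can_stuff is a hypothetical helper that works on a string of '0'/'1' characters and assumes its input is only the stuffed part of the frame (start of frame through the CRC sequence).

```cpp
#include <string>

// Hypothetical helper: apply CAN bit stuffing to a string of '0'/'1' characters.
// After five consecutive identical bits in the *transmitted* stream, insert the complement;
// the inserted bit becomes the first bit of the next run.
std::string can_stuff(const std::string& bits) {
    std::string out;
    char last = 0;  // no previous bit yet
    int run = 0;
    for (char b : bits) {
        out += b;
        run = (b == last) ? run + 1 : 1;
        last = b;
        if (run == 5) {
            char stuffed = (b == '0') ? '1' : '0';
            out += stuffed;
            last = stuffed;  // the stuff bit starts a new run
            run = 1;
        }
    }
    return out;
}
```

can_stuff("011111000001") yields "01111100000101", the second candidate: the stuffed 0 plus the next four data zeros already make five zeros, so the stuffed 1 goes after the fourth data zero.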

Efficient whole file CRC computation in the presence of small overwrites

I have a large file and maintain a CRC-32 checksum of its contents. If a fixed portion of the file changes, either at the start or at the end of the file, I can maintain the CRC-32 of the static portion and of the dynamic portion, and use crc32_combine to efficiently calculate the new whole-file checksum. Mark Adler answered this beautifully here: CRC Calculation Of A Mostly Static Data Stream.
But if the content in the middle of the file changes, and not always at a predefined offset (and length), is there a way to efficiently compute the whole-file checksum without reading the whole file?
Yes, so long as you know the before and after values of the bytes changed. And their location, of course.
Compute the exclusive-or of the before and after contents. That is zero where there are no changes, and non-zero where there are changes. Then compute the raw CRC of that exclusive-or over the entire file (raw meaning no initial value and no final exclusive-or), and exclusive-or the result with the original CRC to get the new CRC.
Presumably you will have a long sequence of zeros, and some non-zero values, and then another long sequence of zeros. You can ignore the initial long sequence and just start computing the CRC of the non-zero values. Then use the same trick in the link to apply the long sequence of zeros after that to the raw CRC.

Calculating CRC-CCITT (Kermit) on two different sites, gives different results

I was playing around with CRC-CCITT (Kermit), and I noticed that when calculating the checksum on different sites, I got different results.
On lammertbies.nl/comm/info/crc-calculation.html the result of 123456789 was 0x8921, but on crccalc.com it was 0x2189.
In fact, whatever value you enter, the result on crccalc is the same as on lammertbies, but with the last two characters first. So foobar on lammertbies is 0xF4E3, but on crccalc it is 0xE3F4.
Which site is correct, and what is the other site doing wrong?
This is an issue of big endian versus little endian in the reported CRC value.
You can verify most implementations of CRC by appending the CRC to a string and checking the appended string for CRC == 0. Go back to both of the CRC calculator web sites, change the input to hex, enter the hex string
"3132333435363738398921"
and the Kermit CRC will be 0000, so the CRC as appended to the string would be 0x89, 0x21.
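A minimal sketch of the appended-CRC check described above, assuming the usual CRC-16/KERMIT parameters (poly 0x1021 processed bit-reversed as 0x8408, initial value 0, input and output reflected, no final XOR). crc16_kermit and with_crc are hypothetical helper names. The 16-bit value for "123456789" is 0x2189; because the algorithm is reflected, it is appended least-significant byte first, giving the byte sequence 0x89 0x21.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// CRC-16/KERMIT (assumed parameters): poly 0x1021 processed bit-reversed as 0x8408,
// init 0x0000, reflected input and output, no final XOR.
std::uint16_t crc16_kermit(const unsigned char* p, std::size_t len) {
    std::uint16_t crc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1u) ? static_cast<std::uint16_t>((crc >> 1) ^ 0x8408u)
                             : static_cast<std::uint16_t>(crc >> 1);
    }
    return crc;
}

// Hypothetical helper: message followed by its CRC, least-significant byte first.
std::vector<unsigned char> with_crc(const std::string& msg) {
    std::vector<unsigned char> out(msg.begin(), msg.end());
    std::uint16_t crc = crc16_kermit(out.data(), out.size());
    out.push_back(static_cast<unsigned char>(crc & 0xFFu));  // low byte first
    out.push_back(static_cast<unsigned char>(crc >> 8));     // then high byte
    return out;
}
```

Running crc16_kermit over the framed message (data plus the two appended bytes) gives 0. Reading those two bytes in transmission order gives 0x8921; reading them as the 16-bit value gives 0x2189.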

Reverse engineering checksum from ascii string?

I'm currently working on reverse engineering the serial protocol of a device I have.
I'm mostly there; however, I can't figure out one part of the string.
Every string the machine returns ends with !XXXX, where XXXX is a varying hex value. From what I can find, this may be a CRC-16.
However, I can't figure out how to calculate the CRC myself to confirm that it is correct.
Here are three example responses:
U;0;!1F1B
U;1;!0E92
U;2;!3C09
The number can be replaced with a range of ASCII characters. For example, here's what I'll be using most often:
U;RYAN W;!FF0A
How can I work out how the checksum is generated?
You need more examples with different lengths.
With reveng, you will want to reverse the CRC bytes, e.g. 1b1f, not 1f1b. It appears that the CRC is calculated over what is between the semicolons. With reveng I get that the polynomial is 0x1021, which is a very common 16-bit polynomial, and that the CRC is reflected.
% reveng -w 16 -s 301b1f 31920e 32093c 5259414e20570aff
width=16 poly=0x1021 init=0x1554 refin=true refout=true xorout=0x07f0 check=0xfa7e name=(none)
width=16 poly=0x1021 init=0xe54b refin=true refout=true xorout=0xffff check=0xfa7e name=(none)
With more examples, you will be able to determine the initial value of the CRC register and what the result is exclusive-or'ed with.
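To try the parameter sets that reveng prints against the captured responses, a generic CRC-16 in the same parameter model (poly, init, refin, refout, xorout) can help. This is a minimal bit-at-a-time sketch; Crc16Model, reflect and crc16 are hypothetical names, init is loaded unreflected as in the usual Rocksoft-style model, and the checks below exercise it only against well-known catalogue models, not against the device data.

```cpp
#include <cstddef>
#include <cstdint>

// Parameters in the same form reveng prints (width fixed at 16 here).
struct Crc16Model {
    std::uint16_t poly;
    std::uint16_t init;
    bool refin;
    bool refout;
    std::uint16_t xorout;
};

// Reverse the low `bits` bits of v.
std::uint16_t reflect(std::uint16_t v, int bits) {
    std::uint16_t r = 0;
    for (int i = 0; i < bits; ++i) {
        r = static_cast<std::uint16_t>((r << 1) | (v & 1u));
        v = static_cast<std::uint16_t>(v >> 1);
    }
    return r;
}

// MSB-first CRC-16; refin reflects each input byte, refout reflects the final register.
std::uint16_t crc16(const Crc16Model& m, const unsigned char* p, std::size_t len) {
    std::uint16_t reg = m.init;
    for (std::size_t i = 0; i < len; ++i) {
        std::uint16_t byte = m.refin ? reflect(p[i], 8) : p[i];
        reg = static_cast<std::uint16_t>(reg ^ (byte << 8));
        for (int k = 0; k < 8; ++k)
            reg = (reg & 0x8000u) ? static_cast<std::uint16_t>((reg << 1) ^ m.poly)
                                  : static_cast<std::uint16_t>(reg << 1);
    }
    if (m.refout) reg = reflect(reg, 16);
    return static_cast<std::uint16_t>(reg ^ m.xorout);
}
```

To test a candidate from reveng, run crc16 over the bytes you believe are covered (for example the single byte "0" from U;0;) and compare the result with the transmitted value 0x1F1B; more responses of different lengths make the comparison more convincing.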
There is a tool available to reverse-engineer CRC calculations: CRC RevEng http://reveng.sourceforge.net/
You can give it hex strings of the input and checksum and ask it what CRC algorithm matches the input. Here is the input for the first three strings (assuming the messages are U;0;, U;1; and U;2;):
$ reveng -w 16 -s 553b303b1f1b 553b313b0e92 553b323b3c09
width=16 poly=0xa097 init=0x63bc refin=false refout=false xorout=0x0000 check=0x6327 residue=0x0000 name=(none)
The checksum follows the input messages. Unfortunately this doesn't work if I try the RYAN W message. You'll probably want to try editing the input messages to see which part of the string is being input into the CRC.

JFIF/JPEG parsing: bytes between streams

I'm parsing a JPEG/JFIF file. After the SOI marker (0xFF 0xD8), I parse the different "streams", each starting with 0xFFXX (where XX is a hexadecimal byte value), until I find the EOI marker (0xFF 0xD9). The structure of the different chunks is:
APP0 marker: 2 bytes
Length: 2 bytes
When I parse a chunk, I read until I reach the length given in the 2-byte length field. After that, I expected to find another marker immediately, followed by the length of the next chunk. According to my parser, that is not always true: there can be data between the chunks. I couldn't find out what that data is, or whether it is relevant to the image. Do you have any hints about what it could be and how to interpret those bytes?
I'm lost and would be happy if somebody could point me in the correct direction. Thanks in advance
I've recently noticed this too. In my case it's an APP2 chunk, which holds the ICC profile and doesn't contain the length of the chunk.
In fact, as far as I can see, the length of the chunk needn't be the first 2 bytes (though it usually is).
In JFIF, every 0xFF byte in the entropy-coded data section is written as 0xFF 0x00, so it should just be a matter of calculating the length from that. I just read until I hit another header; however, I've noticed that sometimes (again in the ICC profile) there are byte sequences that don't make sense, such as 0xFF 0x6D, so I may still be missing something.
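A minimal sketch of a marker walker, based on the marker rules of the JPEG standard (ITU-T T.81) as I understand them: most markers are followed by a 2-byte big-endian length that counts itself; SOI, EOI, RST0 to RST7 and TEM (0xFF01) stand alone; extra 0xFF fill bytes may precede a marker; and after SOS (0xFFDA) comes entropy-coded data with no length field, in which a data byte 0xFF is stuffed as 0xFF 0x00. list_markers is a hypothetical helper and not a complete parser; it does not validate segment contents.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the marker codes (the byte after 0xFF) in file order.
std::vector<std::uint8_t> list_markers(const std::vector<std::uint8_t>& d) {
    std::vector<std::uint8_t> markers;
    std::size_t i = 0;
    while (i + 1 < d.size()) {
        if (d[i] != 0xFF) { ++i; continue; }  // byte outside any segment: skip it
        std::uint8_t m = d[i + 1];
        if (m == 0xFF) { ++i; continue; }     // 0xFF fill byte before a marker
        i += 2;
        markers.push_back(m);
        if (m == 0xD9) break;                 // EOI: end of image
        bool standalone = m == 0xD8 || m == 0x01 || (m >= 0xD0 && m <= 0xD7);
        if (standalone) continue;             // SOI, TEM, RSTn carry no length
        if (i + 1 >= d.size()) break;
        std::size_t len = (static_cast<std::size_t>(d[i]) << 8) | d[i + 1];  // includes itself
        if (len < 2) break;                   // malformed length
        i += len;
        if (m == 0xDA) {
            // SOS: entropy-coded data follows with no length; scan for the next real marker.
            while (i + 1 < d.size()) {
                if (d[i] == 0xFF) {
                    std::uint8_t n = d[i + 1];
                    if (n == 0x00 || (n >= 0xD0 && n <= 0xD7)) { i += 2; continue; }  // stuffed 0xFF or RSTn
                    if (n != 0xFF) break;     // a real marker: leave i on its 0xFF
                }
                ++i;
            }
        }
    }
    return markers;
}
```

With this approach, the bytes that appear "between chunks" right after SOS are treated as scan data rather than as unknown data.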
