Calculating CRC-CCITT (Kermit) on two different sites gives different results

I was playing around with CRC-CCITT (Kermit), and I noticed that when calculating the checksum on different sites, I got different results.
On lammertbies.nl/comm/info/crc-calculation.html the result of 123456789 was 0x8921, but on crccalc.com it was 0x2189.
In fact, whatever value you enter, the result on crccalc is the same as on lammertbies but with the two bytes swapped. So foobar on lammertbies is 0xF4E3, but on crccalc it is 0xE3F4.
Which site is correct, and what is the other site doing wrong?

This is an issue of big endian versus little endian in the reported CRC value.
You can verify most implementations of CRC by appending the CRC to a string and checking the appended string for CRC == 0. Go back to both of the CRC calculator web sites, change the input to hex, enter the hex string
"3132333435363738398921"
and the Kermit CRC will be 0000, so the CRC as appended to the string would be 0x89, 0x21.
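The append-and-check test above can also be run offline. The following is a minimal sketch of a bitwise Kermit CRC (polynomial 0x1021 processed LSB first, i.e. the reflected constant 0x8408, initial value 0, no final XOR); the function name is only illustrative.

```python
def crc16_kermit(data: bytes) -> int:
    """Bitwise CRC-16/KERMIT: reflected poly 0x8408, init 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if a 1 fell out.
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc


value = crc16_kermit(b"123456789")                 # 0x2189, as crccalc reports it
swapped = ((value & 0xFF) << 8) | (value >> 8)     # 0x8921, as lammertbies reports it

# Appending the CRC low byte first (0x89, 0x21) drives the CRC to zero.
residue = crc16_kermit(bytes.fromhex("3132333435363738398921"))
```

Under these assumptions both sites agree on the underlying CRC; they differ only in the byte order used when printing it.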

Related

How to convert hexadecimal data (stored in a string variable) to an integer value

Edit (abstract)
I tried to interpret Char/String data as Byte, 4 bytes at a time. This was because I could only get TComport/TDatapacket to interpret streamed data as String, not as any other data type. I still don't know how to get the Read method and OnRxBuf event handler to work with TComport.
Problem Summary
I'm trying to get data from a mass spectrometer (MS) using some Delphi code. The instrument is connected with a serial cable and follows the RS232 protocol. I am able to send commands and process the text-based outputs from the MS without problems, but I am having trouble with interpreting the data buffer.
Background
From the user manual of this instrument:
"With the exception of the ion current values, the output of the RGA are ASCII character strings terminated by a linefeed + carriage return terminator. Ion signals are represented as integers in units of 10^-16 Amps, and transmitted directly in hex format (four byte integers, 2's complement format, Least Significant Byte first) for maximum data throughput."
I'm not sure whether (1) hex data can be stored properly in a string variable. I'm also not sure how to (2) implement 2's complement in Delphi and (3) handle the Least Significant Byte first ordering.
Following @David Heffernan's advice, I went and revised my data types. Attempting to harvest binary data from characters doesn't work, because not all values from 0-255 can be properly represented. You lose data along the way, basically, especially if your data is represented 4 bytes at a time.
The solution for me was to use the Async Professional component instead of Denjan's Comport lib. It handles data streams better and has a built-in log that I could use to figure out how to interpret streamed responses from the instrument. It's also better documented. So, if you're new to serial communications (like I am), rather give that a go.
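For reference, the byte layout quoted from the manual (four-byte integers, 2's complement, least significant byte first) corresponds to a signed 32-bit little-endian integer. The sketch below is in Python rather than Delphi and assumes the ion-current bytes have already been read from the port as raw bytes, not as text; the function name is made up for illustration.

```python
import struct


def decode_ion_currents(raw: bytes) -> list[int]:
    """Decode consecutive 4-byte little-endian two's-complement integers."""
    if len(raw) % 4:
        raise ValueError("buffer length must be a multiple of 4 bytes")
    # "<i" means little-endian signed 32-bit, so the sign is handled for us.
    return [value for (value,) in struct.iter_unpack("<i", raw)]


# Two readings: 0x00002710 and 0xFFFFFFFF (which is -1 in two's complement).
readings = decode_ion_currents(bytes.fromhex("10270000ffffffff"))
# The manual gives the unit as 1e-16 A per count.
amps = [r * 1e-16 for r in readings]
```

The important part is that the buffer stays binary from the serial port to the decoder, which is the same conclusion reached above.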

Reverse engineering checksum from ascii string?

I'm currently working on reverse engineering the serial protocol of a device I have.
I'm mostly there; however, I can't figure out one part of the string.
Each string the machine returns always ends with !XXXX, where XXXX is a hex value that varies. From what I can find this may be CRC16?
However I can't figure out how to calculate the CRC myself to confirm it is correct.
Here's an example of 3 Responses.
U;0;!1F1B
U;1;!0E92
U;2;!3C09
The number can be replaced with a range of ascii characters. For example here's what I'll be using most often.
U;RYAN W;!FF0A
How do I calculate how the checksum is generated?
You need more examples with different lengths.
With reveng, you will want to reverse the CRC bytes, e.g. 1b1f, not 1f1b. It appears that the CRC is calculated over what is between the semicolons. With reveng I get that the polynomial is 0x1021, which is a very common 16-bit polynomial, and that the CRC is reflected.
% reveng -w 16 -s 301b1f 31920e 32093c 5259414e20570aff
width=16 poly=0x1021 init=0x1554 refin=true refout=true xorout=0x07f0 check=0xfa7e name=(none)
width=16 poly=0x1021 init=0xe54b refin=true refout=true xorout=0xffff check=0xfa7e name=(none)
With more examples, you will be able to determine the initial value of the CRC register and what the result is exclusive-or'ed with.
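The hex arguments passed to reveng above are just the message bytes followed by the CRC bytes, with the two CRC bytes swapped. A small sketch of how they can be built (the helper name is hypothetical):

```python
def reveng_arg(message: str, crc_hex: str, swap_crc: bool = True) -> str:
    """Return hex of the ASCII message followed by the (optionally swapped) CRC."""
    crc = bytes.fromhex(crc_hex)
    if swap_crc:
        crc = crc[::-1]  # e.g. 1F 1B becomes 1B 1F
    return message.encode("ascii").hex() + crc.hex()


samples = [("0", "1F1B"), ("1", "0E92"), ("2", "3C09"), ("RYAN W", "FF0A")]
args = [reveng_arg(msg, crc) for msg, crc in samples]
# args can then be passed after: reveng -w 16 -s
```

Calling it with swap_crc=False and a different message span is a quick way to test other guesses about which bytes the CRC covers.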
There is a tool available to reverse-engineer CRC calculations: CRC RevEng http://reveng.sourceforge.net/
You can give it hex strings of the input and checksum and ask it what CRC algorithm matches the input. Here is the input for the first three strings (assuming the messages are U;0;, U;1; and U;2;):
$ reveng -w 16 -s 553b303b1f1b 553b313b0e92 553b323b3c09
width=16 poly=0xa097 init=0x63bc refin=false refout=false xorout=0x0000 check=0x6327 residue=0x0000 name=(none)
The checksum follows the input messages. Unfortunately this doesn't work if I try the RYAN W message. You'll probably want to try editing the input messages to see which part of the string is being input into the CRC.

Why do the bytes of a word have the opposite order in binary files?

I was reading a BMP file in a hex editor when I discovered something odd. The first two letters, "BM", are written in order, but the next word (2 bytes), which holds the file size, is 36 30 in hex. The actual size is 0x3036. I've noticed that other numbers are stored the same way.
I'm also using the MARS MIPS emulator, which can display memory by words. The string in.bmp is stored as b . n i / \0 p m.
Why isn't the data stored continuously?
It depends not on the data itself but on how you store this data: per byte, per word (2 bytes, usually), or per long (4 bytes -- again, usually). As long as you store data per byte you don't see anything unusual; data appears "continuous". However, with longer units, you are subject to endianness.
It appears your emulator is assuming all words need to have their bytes reversed; and you can see in your example that this assumption is not always valid.
As for the BM "magic" signature: it's not meant to be read as a word value "BM", but rather as "first, a single byte B, then a single byte M". All the values that follow are written in little-endian order, which 'exchanges' not only your 36 and 30 but also the 2 zero bytes 'before' (or 'after') them, since the larger values in the BMP header are 4 bytes long.
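Both observations can be reproduced in a few lines. This is only a sketch: the header bytes are the ones quoted in the question, extended with the two zero bytes of the 4-byte size field, and the byte after the string terminator is assumed to be '/' because that is what the emulator display shows.

```python
# First six bytes of the BMP header: "BM" followed by a 4-byte little-endian size.
header = bytes.fromhex("424d 3630 0000")

magic = header[:2]                              # read byte by byte: b"BM"
size = int.from_bytes(header[2:6], "little")    # 36 30 00 00 -> 0x00003036


def as_displayed_words(data: bytes) -> list[bytes]:
    """Show memory the way a little-endian word view does: each 4-byte
    group is printed most significant byte first, i.e. reversed."""
    return [data[i:i + 4][::-1] for i in range(0, len(data), 4)]


# The string "in.bmp" with its terminator, plus one assumed following byte.
words = as_displayed_words(b"in.bmp\x00/")
```

The bytes in memory are still in string order; only the word-sized view reverses them.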

Jfif/jpeg parsing, bytes between streams

I'm parsing a JPEG/JFIF file and I noticed that after the SOI (0xFF D8) I parse the different "streams" starting with 0xFFXX (where XX is a hexadecimal number) until I find the EOI (0xFFD9). Now the structure of the different chunks is:
APP0 marker 2 Bytes
Length 2 Bytes
Now when I parse a chunk, I read until I reach the length written in the 2 bytes of the length field. After that I thought I would immediately find another marker, followed by a length for the next chunk. According to my parser that is not always true; there might be data between the chunks. I couldn't find out what that data is, or whether it is relevant to the image. Do you have any hints as to what this could be and how to interpret those bytes?
I'm lost and would be happy if somebody could point me in the correct direction. Thanks in advance
I've recently noticed this too. In my case it's an APP2 chunk which is the ICC profile which doesn't contain the length of the chunk.
In fact, as far as I can see, the length of the chunk needn't be the first 2 bytes (though it usually is).
In JFIF all 0xFF bytes are replaced with 0xFF 0x00 in the data section, so it should just be a matter of calculating the length from that. I just read until I hit another header. However, I've noticed that sometimes (again in the ICC profile) there are byte sequences which don't make sense, such as 0xFF 0x6D, so I may still be missing something.
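One likely source of the "data between chunks" is the compressed image data itself. In the JPEG format, marker segments such as APPn, DQT, SOFn, DHT and SOS start with a two-byte big-endian length that counts its own two bytes but not the marker. The entropy-coded data that follows the SOS segment has no length field, and it is there that a literal 0xFF is written as 0xFF 0x00. The sketch below walks the segments up to and including SOS; it is a simplification that assumes no 0xFF fill bytes and no standalone markers before SOS, and the function name is made up.

```python
def jpeg_segments(data: bytes) -> list[tuple[int, bytes]]:
    """Return (marker, payload) for each segment up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # Big-endian length; it includes the two length bytes themselves.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        segments.append((marker, data[pos + 4:pos + 2 + length]))
        pos += 2 + length
        if marker == 0xDA:
            # SOS: what follows is entropy-coded data without a length field.
            break
    return segments


# Tiny synthetic stream: SOI, APP0 (2 payload bytes), SOS (1 payload byte),
# some scan data containing a stuffed 0xFF 0x00, then EOI.
sample = (b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xda\x00\x03\x01"
          + b"\x12\xff\x00\x34" + b"\xff\xd9")
found = jpeg_segments(sample)
```

After SOS, a parser has to scan forward for an 0xFF followed by a byte that is neither 0x00 nor a restart marker (0xD0 to 0xD7) to find the next real marker.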

How could I guess a checksum algorithm?

Let's assume that I have some packets with a 16-bit checksum at the end. I would like to guess which checksum algorithm is used.
For a start, from dump data I can see that one byte change in the packet's payload totally changes the checksum, so I can assume that it isn't some kind of simple XOR or sum.
Then I tried several variations of CRC16, but without much luck.
This question might be more biased towards cryptography, but I'm really interested in any easy to understand statistical tools to find out which CRC this might be. I might even turn to drawing different CRC algorithms if everything else fails.
Background story: I have a serial RFID protocol with some kind of checksum. I can replay messages without problems and interpret the results (without checking the checksum), but I can't send modified packets because the device drops them on the floor.
Using existing software, I can change the payload of the RFID chip. However, the unique serial number is immutable, so I can't check every possible combination. I could generate dumps of values incrementing by one, but not enough of them to make an exhaustive search applicable to this problem.
Dump files with data are available if the question itself isn't enough :-)
Need reference documentation? A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS is a great reference which I found after asking the question here.
In the end, after the very helpful hint in the accepted answer that it's CCITT, I used this CRC calculator and XORed the generated checksum with the known checksum to get 0xffff, which led me to the conclusion that the final XOR is 0xffff instead of CCITT's 0x0000.
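The XOR trick can be reproduced with Python's standard library, whose binascii.crc_hqx computes the CCITT polynomial 0x1021 MSB first with a caller-supplied initial value. The payload and the "known" checksum below are made-up sample values, not data from the dumps.

```python
import binascii

payload = b"123456789"     # hypothetical packet payload
known_checksum = 0xD64E    # hypothetical checksum captured from the device

# CCITT with initial value 0xFFFF and no final XOR.
computed = binascii.crc_hqx(payload, 0xFFFF)

# If the device only differs in the final XOR, this constant is that XOR,
# and it should come out the same for every captured packet.
final_xor = computed ^ known_checksum
```

A constant result across several packets of different lengths is the useful signal; a single packet on its own proves little.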
There are a number of variables to consider for a CRC:
Polynomial
No of bits (16 or 32)
Normal (LSB first) or Reverse (MSB first)
Initial value
How the final value is manipulated (e.g. subtracted from 0xffff), or is a constant value
Typical CRCs:
LRC: Polynomial=0x81; 8 bits; Normal; Initial=0; Final=as calculated
CRC16: Polynomial=0xa001; 16 bits; Normal; Initial=0; Final=as calculated
CCITT: Polynomial=0x1021; 16 bits; reverse; Initial=0xffff; Final=0x1d0f
Xmodem: Polynomial=0x1021; 16 bits; reverse; Initial=0; Final=0x1d0f
CRC32: Polynomial=0xedb88320; 32 bits; Normal; Initial=0xffffffff; Final=inverted value
ZIP32: Polynomial=0x04c11db7; 32 bits; Normal; Initial=0xffffffff; Final=as calculated
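The variables listed above map directly onto a parametrised CRC routine. The sketch below uses the parameter names that reveng prints (poly, init, refin, refout, xorout) rather than the Normal/Reverse wording of this answer, always takes the polynomial in its unreflected form (so 0x8005 rather than 0xa001), and assumes a width of at least 8 bits. The model names in the table follow the common CRC catalogue.

```python
def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc_generic(data: bytes, width: int, poly: int, init: int,
                refin: bool, refout: bool, xorout: int) -> int:
    """Bit-by-bit CRC for widths >= 8, parametrised like the reveng output."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    reg = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)       # feed each byte LSB first
        reg ^= byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if reg & top else (reg << 1)
            reg &= mask
    if refout:
        reg = reflect(reg, width)
    return reg ^ xorout


# (width, poly, init, refin, refout, xorout)
MODELS = {
    "CRC-16/ARC":         (16, 0x8005, 0x0000, True, True, 0x0000),
    "CRC-16/KERMIT":      (16, 0x1021, 0x0000, True, True, 0x0000),
    "CRC-16/XMODEM":      (16, 0x1021, 0x0000, False, False, 0x0000),
    "CRC-16/CCITT-FALSE": (16, 0x1021, 0xFFFF, False, False, 0x0000),
    "CRC-32":             (32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF),
}

results = {name: crc_generic(b"123456789", *params)
           for name, params in MODELS.items()}
```

Running every model over a captured packet and comparing against the captured checksum is a cheap first pass before trying non-standard parameters.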
The first thing to do is to get some samples by changing say the last byte. This will assist you to figure out the number of bytes in the CRC.
Is this a "homemade" algorithm? If so, it may take some time. Otherwise try the standard algorithms.
Try changing either the msb or the lsb of the last byte, and see how this changes the CRC. This will give an indication of the direction.
To make it more difficult, there are implementations that manipulate the CRC so that it will not affect the communications medium (protocol).
From your comment about RFID, it implies that the CRC is communications related. Usually CRC16 is used for communications, though CCITT is also used on some systems.
On the other hand, if this is UHF RFID tagging, then there are a few CRC schemes - a 5 bit one and some 16 bit ones. These are documented in the ISO standards and the IPX data sheets.
IPX: Polynomial=0x8005; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6B: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6C: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
Data must be padded with zeroes to make a multiple of 8 bits
ISO CRC5: Polynomial=custom; 5 bits; Reverse; Initial=0x9; Final=shifted left by 3 bits
Data must be padded with zeroes to make a multiple of 8 bits
EPC class 1: Polynomial=custom 0x1021; 16 bits; Reverse; Initial=0xffff; Final=post processing of 16 zero bits
Here is your answer!!!!
Having worked through your logs, the CRC is the CCITT one. The first byte 0xd6 is excluded from the CRC.
It might not be a CRC, it might be an error correcting code like Reed-Solomon.
ECC codes are often a substantial fraction of the size of the original data they protect, depending on the error rate they want to handle. If the size of the messages is more than about 16 bytes, 2 bytes of ECC wouldn't be enough to be useful. So if the message is large, you're most likely correct that it's some sort of CRC.
I'm trying to crack a similar problem here and I found a pretty neat website that will take your file and run checksums on it with 47 different algorithms and show the results. If the algorithm used to calculate your checksum is any of these algorithms, you would simply find it among the list of checksums produced with a simple text search.
The website is https://defuse.ca/checksums.htm
You would have to try every possible checksum algorithm and see which one generates the same result. However, there is no guarantee as to what content was included in the checksum. For example, some algorithms skip white space, which leads to different results.
I really don't see why somebody would want to know that though.
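Trying candidates while also varying which bytes are covered is easy to automate. The following is a small sketch restricted to the CCITT polynomial via the standard library's binascii.crc_hqx; it varies the initial value, the final XOR and the number of leading bytes skipped. The sample packet is invented for illustration, with a leading 0xd6 byte that is excluded from the checksum.

```python
import binascii
from itertools import product


def find_ccitt_variants(samples, max_skip=2):
    """Return every (skip, init, xorout) that reproduces all sample checksums.

    samples: list of (packet_without_checksum, checksum_as_int) pairs.
    """
    hits = []
    for skip, init, xorout in product(range(max_skip + 1),
                                      (0x0000, 0xFFFF),
                                      (0x0000, 0xFFFF)):
        if all((binascii.crc_hqx(packet[skip:], init) ^ xorout) == checksum
               for packet, checksum in samples):
            hits.append((skip, init, xorout))
    return hits


# Hypothetical capture: header byte 0xd6, payload, checksum 0xD64E.
captured = [(b"\xd6" + b"123456789", 0xD64E)]
matches = find_ccitt_variants(captured)
```

With only one sample, several combinations may match by accident, so it helps to feed in packets of different lengths.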

Resources