Question regarding HEX editor and retrieving file content from raw format - digital

How to view the content of a file in raw format in hex editor? and how to find the header offset and tailer offset of a document in raw format in hex editor?

1. How to view the content of a file in raw format in hex editor?
On Linux / Mac you can use xxd, which also has a lot of formatting options of the output, but a simple example:
xxd file.pdf | less
00000000: 2550 4446 2d31 2e37 0d25 e2e3 cfd3 0d0a %PDF-1.7.%......
00000010: 3131 3837 3420 3020 6f62 6a0d 3c3c 2f4c 11874 0 obj.<</L
00000020: 696e 6561 7269 7a65 6420 312f 4c20 3330 inearized 1/L 30
00000030: 3934 3237 392f 4f20 3131 3837 372f 4520 94279/O 11877/E
00000040: 3133 3334 3538 2f4e 2037 362f 5420 3238 133458/N 76/T 28
00000050: 3536 3638 312f 4820 5b20 3136 3733 2034 56681/H [ 1673 4
00000060: 3331 315d 3e3e 0d65 6e64 6f62 6a0d 2020 311]>>.endobj.
...
...
002f36c0: 4134 3534 3437 3444 4434 3337 3e3c 3036 A454474DD437><06
002f36d0: 3839 3542 4133 4234 4341 3434 3044 4232 895BA3B4CA440DB2
002f36e0: 3435 3937 3645 3545 3331 3231 3738 3e5d 45976E5E312178>]
002f36f0: 3e3e 0d73 7461 7274 7872 6566 0d31 3136 >>.startxref.116
002f3700: 0d25 2545 4f46 0d .%%EOF.
You can also open any file using popular hex editor HxD on Windows ( screenshot from https://mh-nexus.de/en/graphics/HxDShotLarge.png )
2. how to find the header offset and tailer offset
Let's take a look at file signatures and magic bytes. As you can see, the lenght of them can differ:
1F 9D .. 0 z tar.z compressed file (often tar zip) using Lempel-Ziv-Welch algorithm
25 50 44 46 2d %PDF- 0 pdf PDF document[16]
ed ab ee db í«îÛ 0 rpm RedHat Package Manager (RPM) package [3]
If you don't want to manually inspect based on the previous list, but rather programatically identify file signatures, there are some libraries for different languages, such as pyfsig, and they maintain a list of current file signatures under current list that they can deal with.

Related

How to filter MQTT traffic on base of topic name in tcpdump

I am capturing MQTT traffic for troubleshooting using below command
tcpdump -i team0 -w mqtt-trace.pcap src 10.x.x.x
But it results in very big file within minutes, Can i filter tcpdump on base of topic name
Following is tcp payload, I want it only capture payload which has PKGCTRL/1/status/frequency or if tcpdump can directly support filter on application layer protocol like wireshark mqtt.topic == PKGCTRL/1/status/frequency
0000 00 13 95 36 2e ef 00 10 7e 07 87 3d 08 00 45 00 ...6....~..=..E.
0010 00 77 2e 0d 40 00 40 06 f6 78 0a 0b 80 f3 0a 0b .w..#.#..x......
0020 80 f2 c3 6a 75 83 e4 f8 f7 7a 0a 89 67 76 50 18 ...ju....z..gvP.
0030 ea 60 59 f8 00 00 30 4d 00 1a 50 4b 47 43 54 52 .`Y...0M..PKGCTR
0040 4c 2f 31 2f 73 74 61 74 75 73 2f 66 72 65 71 75 L/1/status/frequ
0050 65 6e 63 79 0a 11 09 c2 7a 85 14 d0 71 37 16 12 ency....z...q7..
0060 06 08 01 10 01 18 00 12 1c 0a 0d 09 0b 46 25 75 .............F%u
0070 02 f2 48 40 10 21 18 00 11 00 60 76 14 d0 71 37 ..H#.!....`v..q7
0080 16 20 00 28 00 . .(.
I don't think the previously accepted answer necessarily does what you think it does and possibly not even what you want it to do. The original question stated, "But it results in very big file within minutes, Can i filter tcpdump on base of topic name"
If you're trying to limit the size of the capture file, then the previously accepted answer isn't doing that because it uses the exact same capture filter as was originally provided, namely src 10.x.x.x. This means that you're capturing the same amount of data as you were before. Just because a capture file name wasn't specified doesn't mean that packets aren't being written to a file; they are. In the case of tshark, packets are written to a temporary file, which will continue to grow until the capture session is terminated and then ideally it will be deleted, but not always. The location of the temporary file varies depending on the platform that tshark is run on, but you should be able to easily locate the directory by running tshark -G folders | grep "^Temp".
Now, if you want to reduce the size of the capture file, or the number of packets that you see, then you should be able to modify the tcpdump or tshark command-line arguments to accomplish that.
First off, if you don't need the entire payload, you can apply a snaplen to cut the packets short after some appropriate value. This is done using the -s option, and it's the same option for either capture tool.
OK, but whether you decide to apply a snaplen or not, if you want to filter based on the specific topic name, most likely you can achieve this; however there are a couple of caveats that I listed below. The main idea is to use the slice operator, [] (see the pcap-filter man page) to compare various bytes of the TCP payload to specific values. (NOTE: Neither tcpdump itself nor pcap-filter refers to this operator as the slice operator, but wireshark-filter does, so I do as well.) So the filter should:
Match packets only to/from a particular host, in this case 10.x.x.x
Match only MQTT packets (typically by port number, which I'll assume to be the standard tcp/1883 port)
Match only PUBLISH messages with QoS 0
Match only PUBLISH messages where the topic length is 26 bytes
Match only PUBLISH messages where the topic is "PKGCTRL/1/status/frequency"
Here's a command that should work (at least in most cases -> see caveats below):
tcpdump -i team0 -w mqtt-trace.pcap \
"(src host 10.x.x.x) and \
(tcp port 1883) and \
((tcp[20]&0xf6)=0x30) and \
(tcp[22:2]=26) and \
(tcp[24:4]=0x504b4743 and tcp[28:4]=0x54524c2f and \
tcp[32:4]=0x312f7374 and tcp[36:4]=0x61747573 and \
tcp[40:4]=0x2f667265 and tcp[44:4]=0x7175656e and tcp[48:2]=0x6379)"
It should be obvious from the description of the desired filter above what each separate component of the filter is doing for you.
Here, you'll end up with a capture file of the desired packets that you can reference later on if you need to. You can even apply this same filter to the tshark solution too if you prefer as well as writing to the named capture file, because as I explained earlier, tshark is writing packets to a file whether you explicitly specify one or not.
Caveats:
The filter assumes TCP headers are 20 bytes for simplicity in illustrating the solution, but that may not be the case. If you want a more robust solution to accommodate any TCP header size, then you'll need to determine the TCP header size from the data offset field of the TCP header, which is done in the filter using ((tcp[12]&0xf0)>>4)*4, and then replace every occurrence of 20 in the slice operator's offset field with that value. So for example, tcp[22:2]=26 becomes tcp[(((tcp[12]&0xf0)>>4)*4)+2:2]=26, etc.
Because the MQTT remaining message length field is variable-length encoded per MQTT3.1.1 section 2.2.3 Remaining Length, the filter, as provided above, will only work for values of the remaining length field from 0 to 127, i.e., the remaining length field must only be a single byte. Given that the topic in this case is 26 bytes and the topic length itself is 2 bytes, this means the filter will only work for MQTT message payloads of 99 bytes or less (127 - (2 + 26)).
As #hardillb suggested, use tshark instead. Due to tshark's architecture, you can't write a file at the same time as a display filter (it would be too slow).
To get the information you need, it would look something like this
$ tshark -i team0 -f "src 10.x.x.x" \
-Y "mqtt.topic == PKGCTRL/1/status/frequency" -T fields -e mqtt.topic
-i team0: Filter on interface team0
-f "src 10.x.x.x": Use a capture filter, which is the same as tcpdump's filtering. This will speed up processing as it's faster than a display filter (next bullet).
-Y "mqtt.topic == PKGCTRL/1/status/frequency": Filter for packets that match this display filter
-T fields -e mqtt.topic: Output only the mqtt.topic field, as that is the target information.
-T fields will output columnar data vertically delimited by newlines. It won't be horizontally delimited because there is only column, but the default is \t.

Why is there something written in the data section of an ICMPv4 echo ping request?

(My question differs from this one.)I am connected to a AP in a wireless network and I've send a simple ping request to www.google.com. When I analyze the packet in wireshark, I can see, that there are 48 bytes written in the data section of ICMP. After 8 bytes of trash values, the values are sequentially increasing from 0x10 to 0x37.Is there any particular reason, why ICMPv4 fits values instead of just using a bunch of zeroes?
The hexdump of the ICMPv4 data section:
0030 09 d9 0d 00 00 00 00 00 10 11 12 13 14 15 .Ù............
0040 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 .......... !"#$%
0050 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 &'()*+,-./012345
0060 36 37 67
After 8 bytes of trash values
First of all, these are not trash values. In some implementations of ping, the 1st 8 bytes may represent a timestamp.
As #ross-jacobs mentioned, RFC 792 describes the ICMP Echo Request/Reply Packets. For clarity, these two packets are described, in relevant part, as follows:
Echo or Echo Reply Message
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identifier | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data ...
+-+-+-+-+-
...
Description
The data received in the echo message must be returned in the echo
reply message.
Here you can see that the contents of the Data is not described at all; therefore an implementation is free to use whatever data it wishes, including none at all.
Now, since ping is a network test tool, one of the things it can help test is fragmentation/reassembly. Every ping implementation I'm aware of allows the user to specify the size of the payload, and if you exceed the MTU, you should see the ICMP packet fragmented/reassembled. If you examine the payload of the first fragment, you can tell where the second fragment should start just by looking at the sequence of bytes in the payload of the first fragment. If the data was all 0's, it wouldn't be possible to do this. Similarly, if an ICMP packet wasn't reassembled properly, not only would the checksum likely be wrong, but you would most likely be able to tell exactly where the reassembly failed due to a gap or other inconsistency in the payload. This is just one example of why a payload with a sequence of bytes instead of all 0's is more useful to the user.

COM Port Commands CRC XOR

I have a check printer that I want to connect in Delphi 7 by COM port and operate.
I have a command that I extracted with Serial Port Monitor:
STX "PIRI(781" FS NULL ETX "0B" wich is 02 50 49 52 49 28 37 38 31 1c 00 03 30 42 hex
The manual says the following:
CRC (which is the last two digits after the ETX) - packet checksum. It
is calculated by the following algorithm: executing XOR for every
byte of the block including ETX by excluding STX. The data of the
checksum take up two bytes and are a symbolic representation of the
numeric in a hexadecimal calculation system.
I tried to an ONLINE CRC calculator and return a 1B result and a 27 numeric.
How to do it? For "PIRI(781" FS NULL ETX it should be 0B
The documentation incorrectly identifies the check value as a CRC. It is not. It is simply the exclusive-or of the noted bytes. The exclusive-or of 50 49 52 49 28 37 38 31 1c 00 03 is 0b. You then convert the 0b to hex (with an upper case B, i.e. 0B), and get 30 42.

Assembly Language: Memory Bytes and Offsets

I am confused as to how memory is stored when declaring variables in assembly language. I have this block of sample code:
val1 db 1,2
val2 dw 1,2
val3 db '12'
From my study guide, it says that the total number of bytes required in memory to store the data declared by these three data definitions is 8 bytes (in decimal). How do I go about calculating this?
It also says that the offset into the data segment of val3 is 6 bytes and the hex byte at offset 5 is 00. I'm lost as to how to calculate these bytes and offsets.
Also, reading val1 into memory will produce 0102 but reading val3 into memory produces 3132. Are apostrophes represented by the 3 or where does it come from? How would val2 be read into memory?
You have two bytes, 0x01 and 0x02. That's two bytes so far.
Then you have two words, 0x0001 and 0x0002. That's another four bytes, making six to date.
The you have two more bytes making up the characters of the string '12', which are 0x31 and 0x32 in ASCII (a). That's another two bytes bringing the grand total to eight.
In little-endian format (which is what you're looking at here based on the memory values your question states), they're stored as:
offset value
------ -----
0 0x01
1 0x02
2 0x01
3 0x00
4 0x02
5 0x00
6 0x31
7 0x32
(a) The character set you're using in this case is the ASCII one (you can follow that link for a table describing all the characters in that set).
The byte values 0x30 thru 0x39 are the digits 0 thru 9, just as the bytes 0x41 thru 0x5A represent the upper-case alpha characters. The pseudo-op:
db '12'
is saying to insert the bytes for the characters '1' and '2'.
Similarly:
db 'Pax is a really cool guy',0
would give you the hex-dump representation:
addr +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F +0123456789ABCDEF
0000 50 61 78 20 69 73 20 61 20 72 65 61 6C 6C 79 20 Pax is a really
0010 63 6F 6F 6C 20 67 75 79 00 cool guy.
val1 is two consecutive bytes, 1 and 2. db means "direct byte". val2 is two consecutive words, i.e. 4 bytes, again 1 and 2. in memory they will be 1, 0, 2, 0, assuming you're on a big endian machine. val3 is a two bytes string. 31 and 32 in are 49 and 50 in hexadecimal notation, they are ASCII codes for the characters "1" and "2".

How to calculate Internet checksum?

I have a question regarding how the Internet checksum is calculated. I couldn't find any good explanation from the book, so I ask it here.
Have a look at the following example.
The following two messages are sent: 10101001 and 00111001. The checksum is calculated with 1's complement. So far I understand. But how is the sum calculated? At first I thought it maybe is XOR, but it seems not to be the case.
10101001
00111001
--------
Sum 11100010
Checksum: 00011101
And then when they calculate if the message arrived OK. And once again how is the sum calculated?
10101001
00111001
00011101
--------
Sum 11111111
Complement 00000000 means that the pattern is O.K.
It uses addition, hence the name "sum". 10101001 + 00111001 = 11100010.
For example:
+------------+-----+----+----+----+---+---+---+---+--------+
| bin value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | result |
+------------+-----+----+----+----+---+---+---+---+--------+
| value 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 169 |
| value 2 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 57 |
| sum/result | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 226 |
+------------+-----+----+----+----+---+---+---+---+--------+
If by internet checksum you mean TCP Checksum there's a good explanation here and even some code.
When you're calculating the checksum remember that it's not just a function of the data but also of the "pseudo header" which puts the source IP, dest IP, protocol, and length of the TCP packet into the data to be checksummed. This ties the TCP meta data to some data in the IP header.
TCP/IP Illustrated Vol 1 is a good reference for this and explains it all in detail.
The calculation of the internet checksum uses ones complement arithmetic. Consider the data being checksummed is a sequence of 8 bit integers. First you need to add them using ones complement arithmetic and take the ones complement of the result.
NOTE: When adding numbers ones complement arithmetic, a carryover from the MSB needs to be added to the result. Consider for eg., the addition of 3(0011) and 5(0101).
3'->1100
5'->1010
0110 with a carry of 1
Thus we have, 0111(1's complement representation of -8).
The checksum is the 1's complement of the result obtained int he previous step. Hence we have 1000. If no carry exists, we just complement the result obtained in the summing stage.
The UDP checksum is created on the sending side by summing all the 16-bit words in the segment, with any overflow being wrapped around and then the 1's complement is performed and the result is added to the checksum field inside the segment.
at the receiver side, all words inside the packet are added and the checksum is added upon them if the result is 1111 1111 1111 1111 then the segment is valid else the segment has an error.
exmaple:
0110 0110 0110 0000
0101 0101 0101 0101
1000 1111 0000 1100
--------------------
1 0100 1010 1100 0001 //there is an overflow so we wrap it up, means add it to the sum
the sum = 0100 1010 1100 0010
now let's take the 1's complement
checksum = 1011 0101 0011 1101
at the receiver the sum is calculated and then added to the checksum
0100 1010 1100 0010
1011 0101 0011 1101
----------------------
1111 1111 1111 1111 //clearly this should be the answer, if it isn't then there is an error
references:Computer networking a top-down approach[Ross-kurose]
Here's a complete example with a real header of an IPv4 packet.
In the following example, I use bc, printf and here strings to calculate the header checksum and verify it. Consequently, it should be easy to reproduce the results on Linux by copy-pasting the commands.
These are the twenty bytes of our example packet header:
45 00 00 34 5F 7C 40 00 40 06 [00 00] C0 A8 B2 14 C6 FC CE 19
The sender hasn't calculated the checksum yet. The two bytes in square brackets is where the checksum will go. The checksum's value is initially set to zero.
We can mentally split up this header as a sequence of ten 16-bit values: 0x4500, 0x0034, 0x5F7C, etc.
Let's see how the sender of the packet calculates the header checksum:
Add all 16-bit values to get 0x42C87: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + 0000 + C0A8 + B214 + C6FC + CE19'
The leading digit 4 is the carry count, we add this to the rest of the number to get 0x2C8B: bc <<< 'obase=16;ibase=16;2C87 + 4'
Invert¹ 0x2C8B to get the checksum: 0xD374
Finally, insert the checksum into the header:
45 00 00 34 5F 7C 40 00 40 06 [D3 74] C0 A8 B2 14 C6 FC CE 19
Now the header is ready to be sent.
The recipient of the IPv4 packet then creates the checksum of the received header in the same way:
Add all 16-bit values to get 0x4FFFB: bc <<< 'obase=16;ibase=16;4500 + 0034 + 5F7C + 4000 + 4006 + D374 + C0A8 + B214 + C6FC + CE19'
Again, there's a carry count so we add that to the rest to get 0xFFFF: bc <<< 'obase=16;ibase=16;FFFB + 4'
If the checksum is 0xFFFF, as in our case, the IPv4 header is intact.
See the Wikipedia entry for more information.
¹Inverting the hexadecimal number means converting it to binary, flipping the bits, and converting it to hexadecimal again. You can do this online or with Bash: hex_nr=0x2C8B; hex_len=$(( ${#hex_nr} - 2 )); inverted=$(printf '%X' "$(( ~ hex_nr ))"); trunc_inverted=${inverted: -hex_len}; echo $trunc_inverted

Resources