how to use seq2seq to decode concatenated string - machine-learning

I am trying to decode concatenated strings like the ones below (each line shows the input, followed by the segmentation I want):
SQCB7A750BATWE SQ CB 7 A 750 B A T WE
PT05A1219PY023 PT 05 A 12 19 P Y 023
PT55A1019PX02 PT 55 A 10 19 P X 02
PT33SE2215SW023 PT 33 SE 22 15 S W 023
PT05A2216PW023(LC) PT 05 A 22 16 P W 023 (LC)
I am looking for a smarter way than hard-coded rules, because the input will have variations (in the number of characters and digits). I came across the seq2seq model and I want to know whether it can be used for this kind of problem.
I already followed some tutorials to get a taste of it, but the results weren't even close.
It also seems there are two approaches, character level and word level, as per this tutorial.
Character level:
Input sentence: SQCACA333BA71A
Decoded sentence: P 9(PDD366AZ2IDD4K )F)F(L)L)1)1)1) 6A
-
Input sentence: SQCAAC152DA71A
Decoded sentence: P 9(PDD366AZ2IDD4K )F)F(L)L)1)1)1) 6A
I am still trying to implement the word-level version, but I'd like to know whether the problem can be solved with this approach (seq2seq) at all.
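For anyone trying either approach, the sample lines above already pair each concatenated input with its space-separated segmentation. A minimal sketch of turning them into training pairs, in Python, could look like this. The names `SAMPLES`, `parse_pairs` and `to_char_level` are made up for illustration, and this only prepares data; it says nothing about whether a seq2seq model will learn the mapping.

```python
# The five sample lines from the question: input first, then its segmentation.
SAMPLES = """\
SQCB7A750BATWE SQ CB 7 A 750 B A T WE
PT05A1219PY023 PT 05 A 12 19 P Y 023
PT55A1019PX02 PT 55 A 10 19 P X 02
PT33SE2215SW023 PT 33 SE 22 15 S W 023
PT05A2216PW023(LC) PT 05 A 22 16 P W 023 (LC)
"""


def parse_pairs(text):
    """Split each line into (source string, list of target tokens)."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        source, *tokens = line.split()
        # Sanity check: the segmentation must spell out the source exactly.
        if "".join(tokens) != source:
            raise ValueError(f"segmentation does not match input: {line!r}")
        pairs.append((source, tokens))
    return pairs


def to_char_level(pairs):
    """Character-level view: source characters in, target characters (spaces included) out."""
    return [(list(src), list(" ".join(tokens))) for src, tokens in pairs]
```

The sanity check also shows that the target is just the source with spaces inserted, so only the positions of the spaces have to be learned.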

Related

Repetitive regular expression in Lua

I need to find a pattern of 6 pairs of hexadecimal numbers (without 0x), eg.
"00 5a 4f 23 aa 89"
This pattern works for me, but the question is if there any way to simplify it?
[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]%s[%da-f][%da-f]
Lua patterns do not support limiting quantifiers and many more features that regular expressions support (hence, Lua patterns are not even regular expressions).
You can build the pattern dynamically since you know how many times you need to repeat a part of a pattern:
local text = '00 5a 4f 23 aa 89'
local answer = text:match('[%da-f][%da-f]'..('%s[%da-f][%da-f]'):rep(5) )
print (answer)
-- => 00 5a 4f 23 aa 89
The '[%da-f][%da-f]'..('%s[%da-f][%da-f]'):rep(5) can be shortened further with the %x hexadecimal character class:
'%x%x'..('%s%x%x'):rep(5)
Lua supports %x for hexadecimal digits, so you can replace every [%da-f] with %x:
%x%x%s%x%x%s%x%x%s%x%x%s%x%x%s%x%x
Lua doesn't support specific quantifiers {n}. If it did, you could make it quite a lot shorter.
You can also use the "one or more" quantifier (the plus sign) to shorten it further, although this also accepts groups of more than two hex digits:
print(('Your MAC is: 00 5a 4f 23 aa 89'):match('%x+%s%x+%s%x+%s%x+%s%x+%s%x+'))
-- Tested in Lua 5.1 up to 5.4
It is described under "Pattern Item:" in...
https://www.lua.org/manual/5.4/manual.html#6.4.1
final solution:
local text = '00 5a 4f 23 aa 89'
local pattern = '%x%x'..('%s%x%x'):rep(5)
local answer = text:match(pattern)
print (answer)
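For comparison only, here is what the same match looks like in a regex flavour that does have the {n} quantifier mentioned above. This is a Python sketch, not Lua; `HEX_PAIR`, `MAC_BY_REPETITION`, `MAC_BY_QUANTIFIER` and `find_mac` are names invented for the example. The first pattern is built by string repetition, the same trick as :rep(5), and the second uses {5} directly.

```python
import re

# One pair of hexadecimal digits (like Lua's %x%x, which also accepts A-F).
HEX_PAIR = r"[0-9a-fA-F]{2}"

# Built by repeating a sub-pattern, mirroring ('%s%x%x'):rep(5) in Lua.
MAC_BY_REPETITION = HEX_PAIR + (r"\s" + HEX_PAIR) * 5

# The same thing with a limiting quantifier, which Lua patterns lack.
MAC_BY_QUANTIFIER = HEX_PAIR + r"(?:\s" + HEX_PAIR + r"){5}"


def find_mac(text, pattern=MAC_BY_QUANTIFIER):
    """Return the first run of six whitespace-separated hex pairs, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None
```

Both patterns should find the same substring in the example text from the answers.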

Savitzky - Golay filter for 2D Matrices

I am doing some research on implementing a Savitzky-Golay filter for images. As far as I have read, the main application of this filter is signal processing, e.g. smoothing audio files.
The idea is to fit a polynomial through a defined neighbourhood around a point P(i) and to set that point to its new value P_new(i) = polynomial(i).
The problem in 2D space is, in my opinion, that there is more than one direction in which to do the fitting. You can use different "directions" to find a polynomial. For example, for
[51 52 11 33 34]
[41 42 12 24 01]
[01 02 PP 03 04]
[21 23 13 43 44]
[31 32 14 53 54]
It could be:
[01 02 PP 03 04], (horizontal)
[11 12 PP 23 24], (vertical)
[51 42 PP 43 54], (diagonal)
[41 42 PP 43 44], (semi-diagonal?)
but also
[41 02 PP 03 44], (semi-diagonal as well)
(see my illustration)
So my question is: does the Savitzky-Golay filter even make sense in 2D space, and if so, is there a defined, generalized form of this filter for higher dimensions and larger filter masks?
Thank you!
A first option is to use SG filtering in a separable way, i.e. filtering once along the rows, then a second time along the columns.
A second option is to rewrite the equations with a bivariate polynomial (bicubic, for instance) and solve for the coefficients by least squares.
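The first (separable) option can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: it uses the standard 5-point quadratic/cubic smoothing coefficients (-3, 12, 17, 12, -3)/35, it leaves the two samples at each border unchanged instead of handling edges properly, and the names `SG5`, `smooth_1d` and `smooth_2d_separable` are made up here.

```python
# 5-point Savitzky-Golay smoothing coefficients for a quadratic/cubic fit.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def smooth_1d(values, coeffs=SG5):
    """Apply the SG window along one line of samples.

    Border samples (where the window does not fit) are copied unchanged,
    which is a simplification for this sketch.
    """
    half = len(coeffs) // 2
    out = list(values)
    for i in range(half, len(values) - half):
        out[i] = sum(c * values[i + k - half] for k, c in enumerate(coeffs))
    return out


def smooth_2d_separable(image):
    """Filter every row, then every column of the row-filtered result."""
    rows = [smooth_1d(row) for row in image]
    cols = [smooth_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed as result[y][x] again.
    return [list(row) for row in zip(*cols)]
```

An image that is already a low-order polynomial in x and y passes through unchanged, while an isolated spike is spread out and reduced, which is the behaviour one expects from a polynomial-fitting smoother.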

The DHT in JPEG does not contain the actual huffman code, how come?

The DHT contains 16 bytes that hold only counts: how many values were encoded with a Huffman code of each length, from 1 bit all the way to 16 bits. After these come the actual values that were encoded; each of these values is 8 bits in size.
Q: Why is huffman code not stored, how does decoder derive the codes?
Q: If there are say 4 values that have huffman code of 3 bits long, we shall write them as 4 bytes. Does it matter what order they are in or they have to be in ascending or descending order? I do know that the values must be ordered so that the values with 1-bit Huffman codes are followed by the values with 2-bit Huffman codes, etc.
Q: I have used jpegsnoop to look at huffman table of different files, some made in MS paint and some were downloaded. I find that they all have the SAME table.
Here are the DHT entries I got from JPEG snoop:
Destination ID = 1
Class = 1 (AC Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (002 total): 00 01
Codes of length 03 bits (001 total): 02
Codes of length 04 bits (002 total): 03 11
Codes of length 05 bits (004 total): 04 05 21 31
Codes of length 06 bits (004 total): 06 12 41 51
Codes of length 07 bits (003 total): 07 61 71
Codes of length 08 bits (004 total): 13 22 32 81
Codes of length 09 bits (007 total): 08 14 42 91 A1 B1 C1
Codes of length 10 bits (005 total): 09 23 33 52 F0
Codes of length 11 bits (004 total): 15 62 72 D1
Codes of length 12 bits (004 total): 0A 16 24 34
Codes of length 13 bits (000 total):
Codes of length 14 bits (001 total): E1
Codes of length 15 bits (002 total): 25 F1
Codes of length 16 bits (119 total): 17 18 19 1A 26 27 28 29 2A 35 36 37 38 39 3A 43
44 45 46 47 48 49 4A 53 54 55 56 57 58 59 5A 63
64 65 66 67 68 69 6A 73 74 75 76 77 78 79 7A 82
83 84 85 86 87 88 89 8A 92 93 94 95 96 97 98 99
9A A2 A3 A4 A5 A6 A7 A8 A9 AA B2 B3 B4 B5 B6 B7
B8 B9 BA C2 C3 C4 C5 C6 C7 C8 C9 CA D2 D3 D4 D5
D6 D7 D8 D9 DA E2 E3 E4 E5 E6 E7 E8 E9 EA F2 F3
F4 F5 F6 F7 F8 F9 FA
Total number of codes: 162
And
Destination ID = 1
Class = 0 (DC / Lossless Table)
Codes of length 01 bits (000 total):
Codes of length 02 bits (003 total): 00 01 02
Codes of length 03 bits (001 total): 03
Codes of length 04 bits (001 total): 04
Codes of length 05 bits (001 total): 05
Codes of length 06 bits (001 total): 06
Codes of length 07 bits (001 total): 07
Codes of length 08 bits (001 total): 08
Codes of length 09 bits (001 total): 09
Codes of length 10 bits (001 total): 0A
Codes of length 11 bits (001 total): 0B
Codes of length 12 bits (000 total):
Codes of length 13 bits (000 total):
Codes of length 14 bits (000 total):
Codes of length 15 bits (000 total):
Codes of length 16 bits (000 total):
Total number of codes: 012
The AC table compresses RRRRSSSS bytes, which contain the zero-run length and the AC coefficient magnitude category, while the DC table compresses SSSS. Thus, I think the AC table must contain a total of 255 entries (excluding 0) while the DC table must have 15 entries (excluding 0). However, neither table contains that many codes. Why?
Q: Why is huffman code not stored, how does decoder derive the codes?
The reason the Huffman tables are defined as they are, rather than with the actual codes, is that it is much smaller and simpler to encode them that way. PNG uses a similar but different method.
Keep in mind that to store the Huffman codes in the JPEG stream you would need to include both the length and the code itself. The result would be much larger than encoding a count of lengths.
Q: If there are say 4 values that have huffman code of 3 bits long, we shall write them as 4 bytes. Does it matter what order they are in or they have to be in ascending or descending order?
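If the Huffman code has 3 bits, it is written as three bits to the JPEG stream. The codes are generated in ascending order and assigned to the values in the order in which the values are listed in the DHT, so the order of the values within one length determines which value gets which code.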
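To make the "codes are generated in ascending order" point concrete, here is a minimal Python sketch of how a decoder can rebuild the codes from nothing but the 16 counts and the value list. It follows the usual canonical-code procedure (count up within a length, append a zero bit when moving to the next length); the function name `build_codes` is made up for the example, and the input is the DC table quoted in the question.

```python
def build_codes(counts, values):
    """Rebuild Huffman codes from a DHT-style description.

    counts[i] is the number of codes of length i + 1 bits;
    values lists the encoded values in the order they appear in the DHT.
    Returns a dict mapping value -> code as a string of '0'/'1'.
    """
    codes = {}
    code = 0
    remaining = iter(values)
    for length, count in enumerate(counts, start=1):
        for _ in range(count):
            codes[next(remaining)] = format(code, f"0{length}b")
            code += 1
        code <<= 1  # moving to the next length appends a zero bit
    return codes


# DC table from the JPEGsnoop dump in the question.
DC_COUNTS = [0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
DC_VALUES = list(range(12))  # 00 01 02 ... 0B

dc_codes = build_codes(DC_COUNTS, DC_VALUES)
```

For this table the three 2-bit values get 00, 01 and 10, and each later value gets a run of ones followed by a zero, so no code ever needs to be stored explicitly.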
Q: I have used jpegsnoop to look at huffman table of different files, some made in MS paint and some were downloaded. I find that they all have the SAME table.
The encoder is being lazy and using a fixed Huffman table. There is a sample Huffman table in the JPEG standard that they often use. To generate optimal Huffman codes, the encoder must make two passes over the data. With a preset table, the encoder only needs to make one pass.
F.1.2.1.2 and F.1.2.2.1 of the JPEG Specification explain why the Huffman tables are not fully populated. For baseline encoding DC difference values are limited to 11 bits (table F.1) and AC values are limited to 10 bits (table F.2).
Since DC Huffman symbols only need SSSS values from 0 to 11 their Huffman trees need only 12 codes as you've reported.
AC Huffman symbols have a prefix zero-run count from 0 to 15. With 11 possible size values (0 through 10), that works out to 16 * 11 = 176 symbols. However, they don't include the symbols 0x10, 0x20, ... 0xE0 because they are redundant. They encode a run of 1, 2, ... 14 zeros followed by a 0 value. If an encoder has, say, 7 zero values followed by a 3-bit value, it can encode that as 0x73. There would be no point in encoding it with the two symbols 0x60;0x03.
Ignoring those 14 useless symbols we end up with 162 codes as you have reported.
By the way, the 0xF0 (ZRL) value is needed because there isn't a symbol that can express a run of 16 zeros followed by a value, so it cannot be merged.
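The count of 162 can be checked by simply enumerating the symbols as described above. This is a small Python sketch; `baseline_ac_symbols` is a name invented for the example, and the limit of 10 for the size nibble is the baseline figure quoted from table F.2.

```python
def baseline_ac_symbols(max_size=10):
    """List the RRRRSSSS symbols a baseline AC table needs.

    RRRR (run) goes from 0 to 15 and SSSS (size) from 0 to max_size.
    Size 0 is kept only for run 0 (0x00, end of block) and run 15 (0xF0, ZRL);
    the other 14 size-0 symbols are the redundant ones.
    """
    symbols = []
    for run in range(16):
        for size in range(max_size + 1):
            if size == 0 and 1 <= run <= 14:
                continue  # 0x10 ... 0xE0 are never needed
            symbols.append((run << 4) | size)
    return symbols


ac_symbols = baseline_ac_symbols()
```

The list has 16 * 11 - 14 entries, which matches the "Total number of codes: 162" line in the dump.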
I don't know why the JPEG spec limits the DC and AC values to a certain number of bits. I speculate that the extra precision would have no effect or is typically thrown away by quantization. Or maybe it has to do with the mathematics of the Inverse Discrete Cosine Transform. Keep in mind that these Huffman encoded values are (quantized) coefficients for the IDCT and are only indirectly related to the continuous tone RGB output.
The Huffman encoding is almost fully determined by the relative frequencies of all 256 symbols (apart from tiebreaker rules). This means you can choose from many, many formats to encode those relative frequencies; the simplest one would be to store all these frequencies. The receiver can then rebuild the encoding from them.
Background: the two least frequent characters of a Huffman encoding share the same (long) prefix, and differ only in the last bit. This combination is then assigned a joint frequency (sum of both combinations), which is used recursively to determine the prefix. Finally, you end up with two sets, one holding X characters and the other holding 256-X characters. The first set has prefix 0 and the second set has prefix 1.
Yes, that's arbitrary, you could swap those 0 and 1, and have a similar table and the same compression ratio - a 0 is just as long as a one. That's why you have detailed rules (e.g. most common set gets 0, tiebreaker is first byte in set)
Back to encoding. You want to store these relative frequencies efficiently, as we're using compression here. Now, as I pointed out, when we have codes suffix-0 and suffix-1, they're both equally long (namely the suffix length plus one). So we know from the fact that there are 119 unique 16-bit codes that there are 60 unique prefixes of length 15 (the odd count means that one 16-bit slot is left unused). Calculating backwards, we also know that there are two unique symbols with length 15, for a total of 62, so there must be 31 prefixes of length 14. We can backtrack in the same way down to prefixes of length 1.
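The backtracking in the previous paragraph is mechanical enough to write down. A rough Python sketch, using the AC counts from the JPEGsnoop dump in the question: `prefixes_by_length` is a made-up name, and padding an odd node count with one unused slot is an assumption of this sketch (for this table it only happens at length 16).

```python
def prefixes_by_length(counts):
    """Work backwards from the longest codes to the root.

    counts[i] is the number of codes of length i + 1.
    Returns a dict mapping prefix length -> number of distinct prefixes
    of that length that are extended by longer codes.
    """
    prefixes = {}
    nodes_below = 0  # prefixes one level deeper that still need a parent
    for length in range(len(counts), 0, -1):
        nodes = counts[length - 1] + nodes_below
        nodes += nodes % 2  # an odd count leaves one slot unused
        nodes_below = nodes // 2  # two nodes share one prefix
        prefixes[length - 1] = nodes_below
    return prefixes


# AC table counts for code lengths 1..16 from the question.
AC_COUNTS = [0, 2, 1, 2, 4, 4, 3, 4, 7, 5, 4, 4, 0, 1, 2, 119]

ac_prefixes = prefixes_by_length(AC_COUNTS)
```

This gives 60 prefixes of length 15 and 31 of length 14, as stated above, and ends with exactly one prefix of length 0 (the root), which is a quick consistency check on the counts.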
Again, it's necessary to point out that we don't know here the exact values of those prefixes, and the matching symbols. This depends on the tiebreaker rules, as pointed out, but those rules are fixed for JPEG.
JPEG does have a bit of a special case: in a pure Huffman encoding, the codes for very rare symbols would be longer than 16 bits. That's inconvenient, so in building the table you don't choose the two sets with the least combined frequency if either of them already has a long suffix, since combining those two subsets would just make the suffix even longer. You see this with all the 16-bit codes in the example: most would have had longer codes in a pure Huffman encoding.
I think the worst case is if the most frequent character appears 50%, the next 25%, etc. You'll get codes 0, 10, 110, 1110 etcetera. That's unary counting, which is indeed optimal for that case, but the longest code is now 256 bits. You'd need a document with 2^256 bytes to have a frequency of 2^-256, though.

What format is this string in? [closed]

I'm trying to figure out what data an iPhone game app is sending. It uses the GKTurnBasedMatch framework.
I've captured some of its packets and I found a promising XML message with this string as the value for a game-state key:
eyJHYW1lR3VpZCI6ImJkMTIwYWZiLTNjNjQtNGU3Ni05MzdhLWZlMTYyOGUxZGI1NyIsIkdhbWVEYXRhIjoiQWdFVXdxbHA4aUxSdlZ3UWZXK1d0T0hpbkZXalVheHQ1MXlsM01MQ04ydytMZ2Q5aVNPeGJSRnZ2dzZnMnVTcjJSZ1hUY3lxYzlZWHVmZFZGZTFndVM0QjEwZXZcLzRCRXBPNGZQUHFlR0RadDk4WkxKdG83SXZQQkFQVGcyNmh0ZmRMTFB2cm1uM0o0b3NFWmtcL1lkeXNYeDBnMmtMNUdVZWtYXC9US24rYURVTlpJbzk4MGZLTkRwVnU0ays5cDdPOER3RVRCUU8rOVZhdEhPSUh1eDBFR0dUaUtwVGRSS0tEWUFkb3c1bmtnZ2FOakwrYVU4Q1FudXJIajZcLzNXSzZPbzNvVVpRR3o4VU50RFB2QzFvYm90TTNnRHBjdDNaZ250WDZlWkxweDVvc3VpUnN4OFNWcWFpR29hTTg5VE8xeG84ZXRXb0hXQXFmYU13eG5mblJ2TFprVTVxRkZnTGljS0U5dUZDZUg5UCs0cDlXMFFQejg5TXlzaGF5dFdDdFJxOEZHYkg1c3dJeW85UmZ2Q1Y0V1daTE9CWDF1OWVZTXZGVHFkMStmMFwvdm5VSVJRUFp0NlNGY3pHaXhPWGtGaWpPeXNPMEtHT2F2M2JWK1F4UXEzSG95SUwrVjVBQTIyRmExWjB0Mjhub2F4ME1vUUF2T2ZaejZWSlwvY0dyNFNneWZQeCtEdnVVQU9lOFJycDNrK3YxZTlhXC94Vk43WGdKRG1vNzRBUkxcL1wveU5hMGx1MWxqRXN2NnVrVUJcLzZRdkJnZGdsSG1tOGhQa0VLTzJ0M0NveXBHM0FBaEUzVlVHcmNpbm9mTE11OTI0MDFpWmFXWUV5MTJOXC9XOXFKcFNtR2lsaXVYVmNITnA5Wm15U1hIQ05lOEVwNVhqSk1jZ3h4ZnRFRkEwaXlEQkM3b2tJVjdXNmc4ZHZDNGdNcVFnMXNSRjlXTGJxU0l3S3BKM1BGYm51QlZNb2xodDQzSTJLXC9tU2JWRU1VRnhyME5mNitDZnlNYm5PZ2ZOMG9wblFXXC9rNm5RQitkXC8yeVVWY2poVXBxb0RpR05IRFIydldcL09CZXJiQ3JUSmtZK3V3aFZWcDVSNlBqWHdPVXRNek1BenduU0tETGFNK1BIMitvZTY1WXRVZmNRM0p6Wkc5bUNpNk50ZmhZWUNscWtJcHpsNVYrRE1MUXdLR294eXhJVnhcL1F3U094Q3lvTjNGdm90RlhBU0ZEWXVGb1R0ckdHSHlrWlFIYzljYks4K2REUXAycVhHK3VoYSt5VEhseElRSjJUcTNmMU5zY3cyQXlcL3dncitVekdEV0lseVdiYWQzWXNTTVhOVUJucFNpaHYydkxEbEF3MFArUnVsOTMyUzFMeEttSzRKWnRvY08zVlp0UXlpUUpIcklhNVdzUTRvb3JnVHQrc0ZFemxqQytOeXRGQ3h1dlpcL042ZUVTUFROTDZsMFJHZnk4c0RuSnBiR1VvS0lsXC9zMFdnTFRxelp6cXY5bXJGTkpcL2c2WXYwNTNzc1BhSWJkbW1JVG4rcnpmeUhwTnJrVkFKN0NPdGFxd2VnbmRnN3FuMDhidGJjMHF5Mm5rZU5wVkRtejBINGY0eFVXMDROSjYrR0R3QnNoTVNqNUV4YVJVWXBza3F0XC9mQmZcL2t0cGFkTk9nWkt2dmNoOGJUTXBcL2J2TTVtdnptc0NOUUxsalM1cEZwRFI0RDVHQzk3RzZzMnMxNExiNE1Sd05oaXpKMCs1cEc0M1BId29tK1Vna1ZTcG9Td083Qk9HeGQ4VVhGZENCV0hmbmhpanY1SEptajJLcm84N3Fhc0tNT00xS3NOWEtEREduXC9DQmRJUDBqVFJ4a0p5MjFUYUswT2ptYktFbU43RHNWRjF0U09DTU1r
U2FXRlJqNWNVZ3lzVzhSbzNvcnRzOEltQW1HWWVoVzNCTHZYbUdVcUljdXNHSUxVbzhuOW5vK2tjQnlSSzhIU3poVXlPclpBQndPcW5wd1FzZ084YVBjc2hyQW1zZWtqSmpLb2N1NkMxUEluWGhqMzJ4aWhhTVJnVGxwNFVnTjlkNXA2WGVwZ1A4SkZLOURmdjUrRDN2RnBsRFJvZnpLTmVucEVsV3QyM2xKTGg1NzlLUmtCWnpBdlJwOSthWDVEdHh1bDR4NnhZR2pBR04xUUk0Q2hNbXl3SVR3clJqQUNwSHlkK0tsQmNMQWpUNmNZZFhsUldkTW52SEpyV1RFMVUzaTZISkRZakNlbXhGMVg1MWVqR1BVSzZLOE9UKzJzYzNcL2tmQmJjM0I4VUJZMitJSWNGYTd1YWVFMFc1RWJDZnA0c004UzVWZ1dUM3VsdUI2aUJOQ2UyRFJrYnF2VU9IWmVJbVNmODd5VktnSysxMW5OVUNXNmtnMjhVTTdIdysyU1E3T1NXeEI3dXV3Q2tKejc0M0I4cVBlOVJQejJxWGxodGdMTmZTSGpCaWJXSytVVTVcL0UzU1l1dDIrdnJLSDNWS0NvMTNTbCtEQlBMVGFhSHhkUEhLR0pQdEdUMVZobERnMmRlTGlLZVkwaEdPMGhzSUFGWTVEYVJuU0xwWWd5Qmw1cEI3Mmt6MVJmXC9IWGhvRkE4bU04aHhLY3JSKzM4eXZ6YmwzeXZmRzdOTzFFeitSamF3ZVBaT1J1VWRBYkVIMmdUK01Gb3BPSStZR1lYV1BGVjhLVG1JbW5wYzBlNGo3NiszT3c0eGFhQ3dONGd0c3JSbnNiNEdpU2I2TVRoNDJwR2JpOEtMbk53bXJTRVM2NCtWMFoyVkQ1bFwvUUNIV1o2QnppVFhsdUZhYUdCMGVBeUxPWlQzMXF6bTZGMDBhNnVPNEY4M0c4dWUwa1poT2RBNlRIdXU0TVpDVHF0VEpkM3p2WmU4bTdtaG5SNlFhM2tOd3BNZW9lcjY4ZGxiaGRUYmtpVFNYODh1Z3VvNVlcL1IyemRhQ0NYTERuZUFjZWRmNmlDWGFuXC9ONFZsekRhYVRLdUxoY2hnM2NFMllFWGtieGxSN29INWN1Y3hWUnhQTG1VZEo1TFFrdVwvZ0Nwc3pGVExTUUJnVUdkQjdzams4cE1hZUthOGl4K0RXYWVoTjE1ZFBVd1c3NEJVWG5ucTMxdG1NdnJ6M2xUTXF5OGF6NFM5ajRDd1pubjI0QzR5XC9RNTI1emN3djM2RFJyMVpcL2ZyTWl1Zk9tVGNSSjFlYzJ0XC85NU1HWDZITERqUW5waytoajBEaG5xNGw5WHAyblRtZElYUktMM2p1bTNJZFwva21EbkkwTFptQzBTRStsU1doQWI4aXFIaXl2MWxQTGJpYkkrUm15TlVNdXpcL0lxMStiMCtiTlNVRXpYTXg1MDJHK2hJcHhyNWFYQ0ZOYktTWEpXZDlIdGpHNHBFQU94RENQc1BMTjF4bjhkZWZLQkdkcytuV3ZBa3EyZXlOM2NxM0sxU3ByaHZLWXd4enJcL0FEVTlYTXMyVDVkS3F1bzdlclBHaHp2NU5BTUVSUWNnWGNkRjRaZ2NOZ3pmYTBLOGd3aEpsOGR4ZU02cjl5VEROcWQ0YkpQNEgyXC8rczI2eXlwVkRnaUhQVWRxMGxUanFlN1NtbTREWG41VWthV0JHSnBJRDdNWDk5d29hN2N0YnNhOXM4OGVFSWJ0YmRHWDAzMkRxS1wvcUZrSXdRU0I4RXZhTzNJcVlnQ0I5YXBSQlp4ck5mVk9kMGlUNVNYckFseFF6QW1Jd0dxaGtRZVpkVm92M3BJREk4QkxpdCs4eVNQUGMyejQrTjJvM3F2VlJneVk4WmY5clNtdHM1M0tURmxNVzZTM0tIQ2pDc3FWN1poVkZGNDdwN0F4NWV5R05rVFFWUFROZ2xlRWNyeEVXRWszYUQ1SzlLTjM0
UEQ1dGd6eDd5dGM0TUdvb0hmTEpWTDQ1ekJYYTJCVXdKbHVZUGltaXM1TXh6ZUt6NzNHKytJSzllV3haMERoSkV2bnFubUlES29wS1B0S0orbFpYMlFNcEtXNVBObjdLc0FiN2hjQzFrb1oxQjhrcnhvQnFVejZZNnVBYTY5akpjUmhOZlNNbG8rMzA3MG5hbFpST1p3NHRTVVpJclwvYXNUc25nVWZ0XC9JRUNcL3FtSTRvZ0NjNGxLZ0VjXC94K0tzV1p3K0g3Qk4zODVGXC9qaCtVcG9sTGVEU3lTV3BLbXY1Mll1eDRpOHlkaVNNcFc2Y0RNNmFrT2ZUZWhkclNRTlAxd3BnR0NoZGM5NHloWmx2NmZJYVN5T3dFQXRvc3FQQ1RKTmg1UFI4ZVBram5FakR6dXlNRmhcL2pYSEhiWEpFdWkyd2kzSmZWYlNYWWpyTDlNeUd1SFYxXC9YSnZrZ0ZOU1FHZ2RSWllCSjhIckhmWGdraWt6WEFJNzVLc2twYzlud2RzTFNNRHhpVitVRlwvVFo3R2REMFQ2QkxKSHlCSitUTGdtZDZOZUZtR0cyRVZjb0dBRjY0ZThkOUdmS3BcL1pwRlh5N0dnamtRZjNrR0c1MjdlV3FiUkJhNWtldW0xUzd0RE00M0NBRlBqT0hEbnVYang5RUVsanJXbTErck1mWng4bWtzV3h4cmVFczBXUXBPaHZiT1JIcjZcL2RVSGdKY1l3SFk0bkQ3UWdGdlJpTXdndnRtR25tMXB6QzVZU3U3TEt2YWVEVGtyakZMVGhaTFBDZFQreStJRnNsXC9RdzRTaFJlSGN2bWdRT2NBSnlJVnlhNnY0T2RUSEwwSk95Z3F2SGthdWdQeFFjekEwRDV4bXl2XC9zOHhWb2F2M2xjOFwvRjhvWm5JMlB0b2xzMjdESVBsTkhOMkt2aXp1NDdKSmh2VW9WT1lqWlV3bEZ2RzUrbXB2c0hIckhld21zM1FLRVwvN3V1RklxOXlid2J6NFwvUzVrdE5XQ29OUnh3RDNhMVhvSmlSSjdUcVh2V2M1U1ZuUVJ5M3h3Z0lvV0VtbFNaTjREU3JidHFOVzh2UFFpYUV0SlF0azRKeFBvNVwvajlXblFiUjJMeTFJbWRSVldXNEZERG02WGtkSEx3b05KZ0ZIYXQ1RUh1NDJRZGJjcytXclFTTExBdmVsZGJkWHdSMkJHR1FCa2JidkVtTTBqMHhXR2lcL2hKVFg3aFFxWXNiWkpnR3dXeHVheTd4dE00dWw1Q1NWQ1ArZU9NdDJPcHEzQVU0aEJLSnJnRUN3R1FDM3lhM3ZlVENXZnk0M2lEQ3N2dTlcL0Qyanh1bGdRR1pJWTZObDU4c3A4SVFlaGxDTk5scnNQN3haZ0ZxVVZxS3ViblF0R2tLeENvM01nSUtCSVQxQzB6OHpDd3ZHTFNKelBCaEFYOWNENU10djg0bGZXdmpRSFczNWdpNHJmdkc4cUZ3SGw4RlVjWjdcL05udk15U2l3K0pZRlU2S3pMMFVXdnRaVTZrMWtVWmZnaWNNSDdUWlRkTWhodU9ZTzVrMDlSbDNrbHVYNGdzV3hSUTRNVWtkMTJsN1FiajlnUGVhejRTOTRwMXdzNUhqQ1lhUVZEMzhuc1dqb1VsXC9xQVVlWDMrR2FcL1RzRXFkN1NocFYxT3dYOTRzaDYzb3lKb1gzXC9tZmlxRGlMZUdMSEhtQWhqdXZwNFJYUlwvTmJheU01WUdhUzFNaURMYzgrRzczZEtwa1JtM2xTNDRGT0lISHdcL1wvSjlvbXFJaXV2YkM0aGxHTUVcL3lPMW82NVFCOXlPSDEyV0VFSktCbGFDQlFUZzRmTjlMRlFxMU9jVjJkRkd6UDdTZ2hHb0VUMHRTTjhkc2toVWhnUUJvaWpicGlxVWRyNEprOU8rbkI0UXZcL2xFM29BNFwvWG8rblU1eDBFamgzRWFoampSQ1ZzU3RHUzdqTzVuUE9u
a2hCUjROV3Q0Sm9Ca2dOK0lMOHJDb2xUekVXUUFrVFkyTzBoQjB1bldCaHVUR0JHbk9wUHl3amV6ckZ5Wlg3M1hcL3NISGpRYXBYdndSZGJKMm5Wdm05bVc0amZ1b2VlQnpsdVE4Z0NXU1FwOW1TK2VrUHg5ZVNwM2ZhMUFTeVl1V2V3TU5SM3ViUENIUU0yTkw2aFIrdmRvWCtoY2h1dzN2UnpLaDhBZzhaXC9HbEI5cENWYUpIK1QzaHpUMHowSTdJT1dxXC9PWG9JNEIyV1hPa1lMM0ZNRm5FUk9cL2I2bzA4Q0puWU52S1h0YW9lTEtBVnFEWUFmdFFKSTlmV0ZkazZCTDg0c2xXR2F4bVZ1cDF3Vk50NDBxV1wveVI5dmJnSHFrbVwvWmtcL3loTm9cL1ZLaWR6M2JiWVNHcVhoQzJzajFMTGpJS29WZDNKMHV5TDFJZjViZVNYM28yQkhSYmkwWmZMbnR3enNubGFQZmF3MHQ5bkZYbWdMRUVVaDBnR1wvZmk0RU9wQ2hDNDNUNFc0K2hLZDQzSTc2ekQzQzhZOTdlK01icnloekdJWGF4Qmd4NUdPR1FnWlZPNFdlU241VG40WXMyQ01vSkpnVEpudmgwUVE3WTk2MnBwSDF4Y0FjdGdQTG51eFhwVzQ5QnR2MG5NNUljZkwxZ0N5RUNvblo1R2ZuQW9FeUk5amM3UzFXMjF0RUpsdEZ5d3lzVmVBQktxamZjeGlBR0lhNU9rOTJXUkFcL2kzb2V6aHl6NHBvcGNlelcybysrTUgzSUF4RmVOV1JnVm1hXC9FMDRWK05JTlE1RU4rTXpSd1lGVjdzREVJaDlkanBuenJ5cEZycE1IclBlXC9qOGFuYkc4aE1tajZFN0JoN3dvdENTZno0aDhGekxhMTF6am9keUVtNG5SSThLZXZTOHpkYzdqNDFKZ0xhTmRaV1VvdzJ6RnpPY3FiWUNVZDlxaEh1MXQ3eWNlNWR4dkJTMk0wZVZ0eEw4cGRUZnFYTFhNb0diSldWYlIyWWREVTFtRzdlaGtcLzVFVStibUZRWExPNWpnUWp4OVwvRHhzK2EwZFJYRnE4WHlvVEdQd1FTaVRZa3RuVGE4NzhIQlBBN2Y4R1JkUVRlaU8wb08xNGVNWHBNYlZ4RHZcL0lBZnU1QTlFZ1FTQjNjS0xNeEFrSVkyb2UyK01INEdkQkhadDQ4Y1NXRHpLdUtqREFHa3MwMTZHVWFYMHJ5VnhoejQ1blFaa0gxNmFlTGFDS0F1MUs5VVwveFdRMDB6eDBSdWFLOCtCK1wvKzFnUHkwVHh5VkhtOEZ3UWpPYVZxM0lhbEdBV0hodnBjYWtRNk54T3gyaFZzSkN4bEl4TVVCZ3B4djF2VlN3aXA2T2RhVHV4cENYeTZhTEFabUlSeGNINlNRWWVmNERiNzhqT1hCc1NFRWJ5bkJ0OEp6OFRtcmhvWHRPUVwveDhsTlwvM0s0T0k1ZEpBbStLZHN3TzBlVUdMdz09In0=
Base64, of course! So I decoded it and got:
{"GameGuid":"bd120afb-3c64-4e76-937a-fe1628e1db57","GameData":"AgEUwqlp8iLRvVwQfW+WtOHinFWjUaxt51yl3MLCN2w+Lgd9iSOxbRFvvw6g2uSr2RgXTcyqc9YXufdVFe1guS4B10ev\/4BEpO4fPPqeGDZt98ZLJto7IvPBAPTg26htfdLLPvrmn3J4osEZk\/YdysXx0g2kL5GUekX\/TKn+aDUNZIo980fKNDpVu4k+9p7O8DwETBQO+9VatHOIHux0EGGTiKpTdRKKDYAdow5nkggaNjL+aU8CQnurHj6\/3WK6Oo3oUZQGz8UNtDPvC1obotM3gDpct3ZgntX6eZLpx5osuiRsx8SVqaiGoaM89TO1xo8etWoHWAqfaMwxnfnRvLZkU5qFFgLicKE9uFCeH9P+4p9W0QPz89MyshaytWCtRq8FGbH5swIyo9RfvCV4WWZLOBX1u9eYMvFTqd1+f0\/vnUIRQPZt6SFczGixOXkFijOysO0KGOav3bV+QxQq3HoyIL+V5AA22Fa1Z0t28noax0MoQAvOfZz6VJ\/cGr4SgyfPx+DvuUAOe8Rrp3k+v1e9a\/xVN7XgJDmo74ARL\/\/yNa0lu1ljEsv6ukUB\/6QvBgdglHmm8hPkEKO2t3CoypG3AAhE3VUGrcinofLMu92401iZaWYEy12N\/W9qJpSmGiliuXVcHNp9ZmySXHCNe8Ep5XjJMcgxxftEFA0iyDBC7okIV7W6g8dvC4gMqQg1sRF9WLbqSIwKpJ3PFbnuBVMolht43I2K\/mSbVEMUFxr0Nf6+CfyMbnOgfN0opnQW\/k6nQB+d\/2yUVcjhUpqoDiGNHDR2vW\/OBerbCrTJkY+uwhVVp5R6PjXwOUtMzMAzwnSKDLaM+PH2+oe65YtUfcQ3JzZG9mCi6NtfhYYClqkIpzl5V+DMLQwKGoxyxIVx\/QwSOxCyoN3FvotFXASFDYuFoTtrGGHykZQHc9cbK8+dDQp2qXG+uha+yTHlxIQJ2Tq3f1Nscw2Ay\/wgr+UzGDWIlyWbad3YsSMXNUBnpSihv2vLDlAw0P+Rul932S1LxKmK4JZtocO3VZtQyiQJHrIa5WsQ4oorgTt+sFEzljC+NytFCxuvZ\/N6eESPTNL6l0RGfy8sDnJpbGUoKIl\/s0WgLTqzZzqv9mrFNJ\/g6Yv053ssPaIbdmmITn+rzfyHpNrkVAJ7COtaqwegndg7qn08btbc0qy2nkeNpVDmz0H4f4xUW04NJ6+GDwBshMSj5ExaRUYpskqt\/fBf\/ktpadNOgZKvvch8bTMp\/bvM5mvzmsCNQLljS5pFpDR4D5GC97G6s2s14Lb4MRwNhizJ0+5pG43PHwom+UgkVSpoSwO7BOGxd8UXFdCBWHfnhijv5HJmj2Kro87qasKMOM1KsNXKDDGn\/CBdIP0jTRxkJy21TaK0OjmbKEmN7DsVF1tSOCMMkSaWFRj5cUgysW8Ro3orts8ImAmGYehW3BLvXmGUqIcusGILUo8n9no+kcByRK8HSzhUyOrZABwOqnpwQsgO8aPcshrAmsekjJjKocu6C1PInXhj32xihaMRgTlp4UgN9d5p6XepgP8JFK9Dfv5+D3vFplDRofzKNenpElWt23lJLh579KRkBZzAvRp9+aX5Dtxul4x6xYGjAGN1QI4ChMmywITwrRjACpHyd+KlBcLAjT6cYdXlRWdMnvHJrWTE1U3i6HJDYjCemxF1X51ejGPUK6K8OT+2sc3\/kfBbc3B8UBY2+IIcFa7uaeE0W5EbCfp4sM8S5VgWT3uluB6iBNCe2DRkbqvUOHZeImSf87yVKgK+11nNUCW6kg28UM7Hw+2SQ7OSWxB7uuwCkJz743B8qPe9RPz2qXlhtgLNfSHjBibWK+UU5\/E3SYut2+vrKH3VKCo13Sl+DBPLTaaHxdPHKGJPtGT1VhlDg2deLiKeY0hGO0h
sIAFY5DaRnSLpYgyBl5pB72kz1Rf\/HXhoFA8mM8hxKcrR+38yvzbl3yvfG7NO1Ez+RjawePZORuUdAbEH2gT+MFopOI+YGYXWPFV8KTmImnpc0e4j76+3Ow4xaaCwN4gtsrRnsb4GiSb6MTh42pGbi8KLnNwmrSES64+V0Z2VD5l\/QCHWZ6BziTXluFaaGB0eAyLOZT31qzm6F00a6uO4F83G8ue0kZhOdA6THuu4MZCTqtTJd3zvZe8m7mhnR6Qa3kNwpMeoer68dlbhdTbkiTSX88uguo5Y\/R2zdaCCXLDneAcedf6iCXan\/N4VlzDaaTKuLhchg3cE2YEXkbxlR7oH5cucxVRxPLmUdJ5LQku\/gCpszFTLSQBgUGdB7sjk8pMaeKa8ix+DWaehN15dPUwW74BUXnnq31tmMvrz3lTMqy8az4S9j4CwZnn24C4y\/Q525zcwv36DRr1Z\/frMiufOmTcRJ1ec2t\/95MGX6HLDjQnpk+hj0Dhnq4l9Xp2nTmdIXRKL3jum3Id\/kmDnI0LZmC0SE+lSWhAb8iqHiyv1lPLbibI+RmyNUMuz\/Iq1+b0+bNSUEzXMx502G+hIpxr5aXCFNbKSXJWd9HtjG4pEAOxDCPsPLN1xn8defKBGds+nWvAkq2eyN3cq3K1SprhvKYwxzr\/ADU9XMs2T5dKquo7erPGhzv5NAMERQcgXcdF4ZgcNgzfa0K8gwhJl8dxeM6r9yTDNqd4bJP4H2\/+s26yypVDgiHPUdq0lTjqe7Smm4DXn5UkaWBGJpID7MX99woa7ctbsa9s88eEIbtbdGX032DqK\/qFkIwQSB8EvaO3IqYgCB9apRBZxrNfVOd0iT5SXrAlxQzAmIwGqhkQeZdVov3pIDI8BLit+8ySPPc2z4+N2o3qvVRgyY8Zf9rSmts53KTFlMW6S3KHCjCsqV7ZhVFF47p7Ax5eyGNkTQVPTNgleEcrxEWEk3aD5K9KN34PD5tgzx7ytc4MGooHfLJVL45zBXa2BUwJluYPimis5MxzeKz73G++IK9eWxZ0DhJEvnqnmIDKopKPtKJ+lZX2QMpKW5PNn7KsAb7hcC1koZ1B8krxoBqUz6Y6uAa69jJcRhNfSMlo+3070nalZROZw4tSUZIr\/asTsngUft\/IEC\/qmI4ogCc4lKgEc\/x+KsWZw+H7BN385F\/jh+UpolLeDSySWpKmv52Yux4i8ydiSMpW6cDM6akOfTehdrSQNP1wpgGChdc94yhZlv6fIaSyOwEAtosqPCTJNh5PR8ePkjnEjDzuyMFh\/jXHHbXJEui2wi3JfVbSXYjrL9MyGuHV1\/XJvkgFNSQGgdRZYBJ8HrHfXgkikzXAI75Kskpc9nwdsLSMDxiV+UF\/TZ7GdD0T6BLJHyBJ+TLgmd6NeFmGG2EVcoGAF64e8d9GfKp\/ZpFXy7GgjkQf3kGG527eWqbRBa5keum1S7tDM43CAFPjOHDnuXjx9EEljrWm1+rMfZx8mksWxxreEs0WQpOhvbORHr6\/dUHgJcYwHY4nD7QgFvRiMwgvtmGnm1pzC5YSu7LKvaeDTkrjFLThZLPCdT+y+IFsl\/Qw4ShReHcvmgQOcAJyIVya6v4OdTHL0JOygqvHkaugPxQczA0D5xmyv\/s8xVoav3lc8\/F8oZnI2Ptols27DIPlNHN2Kvizu47JJhvUoVOYjZUwlFvG5+mpvsHHrHewms3QKE\/7uuFIq9ybwbz4\/S5ktNWCoNRxwD3a1XoJiRJ7TqXvWc5SVnQRy3xwgIoWEmlSZN4DSrbtqNW8vPQiaEtJQtk4JxPo5\/j9WnQbR2Ly1ImdRVWW4FDDm6XkdHLwoNJgFHat5EHu42Qdbcs+WrQSLLAveldbdXwR2BGGQBkbbvEmM0j0xWGi\/hJTX7hQqYsbZJgGwWxuay7xtM4ul5CSVCP+eOMt2Opq3AU4hBKJrgECwGQC3ya3veTCWf
y43iDCsvu9\/D2jxulgQGZIY6Nl58sp8IQehlCNNlrsP7xZgFqUVqKubnQtGkKxCo3MgIKBIT1C0z8zCwvGLSJzPBhAX9cD5Mtv84lfWvjQHW35gi4rfvG8qFwHl8FUcZ7\/NnvMySiw+JYFU6KzL0UWvtZU6k1kUZfgicMH7TZTdMhhuOYO5k09Rl3kluX4gsWxRQ4MUkd12l7Qbj9gPeaz4S94p1ws5HjCYaQVD38nsWjoUl\/qAUeX3+Ga\/TsEqd7ShpV1OwX94sh63oyJoX3\/mfiqDiLeGLHHmAhjuvp4RXR\/NbayM5YGaS1MiDLc8+G73dKpkRm3lS44FOIHHw\/\/J9omqIiuvbC4hlGME\/yO1o65QB9yOH12WEEJKBlaCBQTg4fN9LFQq1OcV2dFGzP7SghGoET0tSN8dskhUhgQBoijbpiqUdr4Jk9O+nB4Qv\/lE3oA4\/Xo+nU5x0Ejh3EahjjRCVsStGS7jO5nPOnkhBR4NWt4JoBkgN+IL8rColTzEWQAkTY2O0hB0unWBhuTGBGnOpPywjezrFyZX73X\/sHHjQapXvwRdbJ2nVvm9mW4jfuoeeBzluQ8gCWSQp9mS+ekPx9eSp3fa1ASyYuWewMNR3ubPCHQM2NL6hR+vdoX+hchuw3vRzKh8Ag8Z\/GlB9pCVaJH+T3hzT0z0I7IOWq\/OXoI4B2WXOkYL3FMFnERO\/b6o08CJnYNvKXtaoeLKAVqDYAftQJI9fWFdk6BL84slWGaxmVup1wVNt40qW\/yR9vbgHqkm\/Zk\/yhNo\/VKidz3bbYSGqXhC2sj1LLjIKoVd3J0uyL1If5beSX3o2BHRbi0ZfLntwzsnlaPfaw0t9nFXmgLEEUh0gG\/fi4EOpChC43T4W4+hKd43I76zD3C8Y97e+MbryhzGIXaxBgx5GOGQgZVO4WeSn5Tn4Ys2CMoJJgTJnvh0QQ7Y962ppH1xcActgPLnuxXpW49Btv0nM5IcfL1gCyEConZ5GfnAoEyI9jc7S1W21tEJltFywysVeABKqjfcxiAGIa5Ok92WRA\/i3oezhyz4popcezW2o++MH3IAxFeNWRgVma\/E04V+NINQ5EN+MzRwYFV7sDEIh9djpnzrypFrpMHrPe\/j8anbG8hMmj6E7Bh7wotCSfz4h8FzLa11zjodyEm4nRI8KevS8zdc7j41JgLaNdZWUow2zFzOcqbYCUd9qhHu1t7yce5dxvBS2M0eVtxL8pdTfqXLXMoGbJWVbR2YdDU1mG7ehk\/5EU+bmFQXLO5jgQjx9\/Dxs+a0dRXFq8XyoTGPwQSiTYktnTa878HBPA7f8GRdQTeiO0oO14eMXpMbVxDv\/IAfu5A9EgQSB3cKLMxAkIY2oe2+MH4GdBHZt48cSWDzKuKjDAGks016GUaX0ryVxhz45nQZkH16aeLaCKAu1K9U\/xWQ00zx0RuaK8+B+\/+1gPy0TxyVHm8FwQjOaVq3IalGAWHhvpcakQ6NxOx2hVsJCxlIxMUBgpxv1vVSwip6OdaTuxpCXy6aLAZmIRxcH6SQYef4Db78jOXBsSEEbynBt8Jz8TmrhoXtOQ\/x8lN\/3K4OI5dJAm+KdswO0eUGLw=="}
Looks like a dict with another Base64 string in the GameData key. However, Base64 decoding that gives me a bunch of binary data:
02 01 14 c2 a9 69 f2 22 d1 bd 5c 10 7d 6f 96 b4 .....i."..\.}o..
e1 e2 9c 55 a3 51 ac 6d e7 5c a5 dc c2 c2 37 6c ...U.Q.m.\....7l
3e 2e 07 7d 89 23 b1 6d 11 6f bf 0e a0 da e4 ab >..}.#.m.o......
d9 18 17 4d cc aa 73 d6 17 b9 f7 55 15 ed 60 b9 ...M..s....U..`.
which is incompressible:
>>> len(game_data)
4114
>>> len(game_data.encode("zlib"))
4125
It's not zlib-encoded:
>>> game_data.decode("zlib")
Traceback (most recent call last):
  File "<pyshell#126>", line 1, in <module>
    game_data.decode("zlib")
  File "C:\Python27\lib\encodings\zlib_codec.py", line 43, in zlib_decode
    output = zlib.decompress(input)
error: Error -3 while decompressing data: incorrect header check
And it's not even zlib without the header:
>>> def inflate(data):
        import zlib
        decompress = zlib.decompressobj(
            -zlib.MAX_WBITS # see above
        )
        inflated = decompress.decompress(data)
        inflated += decompress.flush()
        return inflated
>>> inflate("roflcopters".encode("zlib")[2:])
'roflcopters'
>>> inflate(game_data)
Traceback (most recent call last):
  File "<pyshell#130>", line 1, in <module>
    inflate(game_data)
  File "<pyshell#128>", line 6, in inflate
    inflated = decompress.decompress(data)
error: Error -3 while decompressing: invalid distance too far back
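The session above is Python 2 (the "zlib" codec on strings no longer exists in Python 3). For anyone repeating these steps today, a rough Python 3 sketch of the same checks might look like this. The key names GameGuid and GameData come from the decoded JSON above; the function names `unwrap`, `try_inflate` and `looks_incompressible` are made up for the example.

```python
import base64
import json
import zlib


def unwrap(outer_b64):
    """Decode the outer Base64 layer, parse the JSON, and decode GameData.

    Returns (GameGuid, raw GameData bytes).
    """
    payload = json.loads(base64.b64decode(outer_b64))
    return payload["GameGuid"], base64.b64decode(payload["GameData"])


def try_inflate(data):
    """Try a zlib-wrapped stream first, then raw deflate; None if both fail."""
    for wbits in (zlib.MAX_WBITS, -zlib.MAX_WBITS):
        try:
            return zlib.decompress(data, wbits)
        except zlib.error:
            continue
    return None


def looks_incompressible(data):
    """True if zlib cannot make the data any smaller."""
    return len(zlib.compress(data, 9)) >= len(data)
```

If `try_inflate` returns None and `looks_incompressible` returns True, that matches what the session above found: the blob is neither zlib nor raw deflate, and it has little redundancy left for a general-purpose compressor to exploit.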
I've tried using this online Objective-C compiler along with various classes like NSUnarchiver, NSKeyedUnarchiver, and NSPropertyListSerialization, but no luck, yet. Those all seem to produce output which at least has recognizable strings in it so even if they are used, something else must be going on as well.
The only similarity between different batches has been that they all start with 0x0201. Everything else seems totally different, even for subsequent updates for the same match, which makes me wonder if there's some obfuscation/encryption going on...
Any tips on where I can go from here?
It's almost certainly some proprietary structure from within the game, serialized out to bytes. 0x0201 could well be versioning for a struct, or just a set of flags that doesn't change across blobs you've seen.
There's no need to assume this is intentionally obfuscated or encrypted data. Standard textual (JSON, XML) and binary (bplist) containers are increasingly ubiquitous and often make one's life easier, but there's nothing nefarious about representing data in a more raw binary format if it's convenient. (See below re: encryption)
To really reverse engineer this in any more detail may be a Sisyphean task: figure out what the values in the binary blob are, numerically or otherwise. Match up the game state data with known (or unknown) values for the game. Do reverse engineering on the code to see what it's writing. That's some varsity stuff, but it's possible.
Re: encryption: encryption, or at least signing, is common in some parts of online gaming to prevent tampering with game state by bots to gain advantage. Whether that's happening here or not is anyone's guess. A bunch of floating point numbers that represent world positions could look similarly random.
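One cheap thing to try before any deeper reverse engineering is to measure how random the bytes look. Below is a small Python sketch that computes the Shannon entropy of a blob in bits per byte; `byte_entropy` is a name chosen for the example. Note the limitation: a value close to 8 is what you would see for encrypted data, but also for compressed data, so this cannot tell the two apart, and a short blob will always score somewhat below 8.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte values in data, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Comparing this figure across several captured blobs, and against the fixed 0x0201 prefix, at least shows which parts of the payload are structured and which are not.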

Addressing memory data in 32 bit protected mode with nasm

So my book says I can define a table of words like so:
table: dw "13,37,99,99"
and that I can fetch values from the table by adding an index to the address of the table like so:
mov ax, [table+2] ; should give me 37
but instead it places 0x2c33 in ax rather than 0x3337.
Is this because of a difference in system architecture? Maybe because the book is for the 386 and I'm running a 686?
0x2C is a comma , and 0x33 is the character 3, and they appear at positions 2 and 3 in your string, as expected. (I'm a little confused as to what you were expecting, since you first say "should give me 37" and later say "rather than 0x3337".)
You have defined a string constant when I suspect that you didn't mean to. The following:
dw "13,37,99,99"
Will produce the following output:
Offset: 00 01 02 03 04 05 06 07 08 09 0A 0B
Byte:   31 33 2C 33 37 2C 39 39 2C 39 39 00
Why? Because:
31 is the ASCII code for '1'
33 is the ASCII code for '3'
2C is the ASCII code for ','
...
39 is the ASCII code for '9'
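The final 0 byte is padding, not a string terminator: NASM does not null-terminate strings, but dw pads its operand with zeros to a whole number of words, and this string has 11 characters. (Single and double quotes are equivalent in NASM, so switching the quote style does not change this.)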
Take into account that ax holds two bytes and it should be fairly clear why ax contains 0x2C33.
I suspect what you wanted was more along the lines of this (no quotes and we use db to indicate we are declaring byte-sized data instead of dw that declares word-sized data):
db 13,37,99,99
This would still give you 0x6363 (ax holds two bytes / conversion of 99, 99 to hex). Not sure where you got 0x3337 from.
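If you don't have an assembler at hand, the three layouts can be reproduced with Python's struct module. This is only a sketch that mimics the bytes NASM emits; the variable names are made up, and the last variant (dw 13,37,99,99 without quotes) is my guess at what the book intended, not something stated in the question.

```python
import struct

# dw "13,37,99,99": the ASCII bytes, zero-padded to a whole number of words.
text = b"13,37,99,99"
as_string = text + b"\x00" * (len(text) % 2)

# db 13,37,99,99: four single bytes.
as_bytes = bytes([13, 37, 99, 99])

# dw 13,37,99,99 (assumed intent): four little-endian 16-bit words.
as_words = struct.pack("<4H", 13, 37, 99, 99)


def word_at(table, offset):
    """Read a 16-bit little-endian word, as mov ax, [table+offset] would on x86."""
    return struct.unpack_from("<H", table, offset)[0]
```

With the quoted string, the bytes at offsets 2 and 3 are 2C 33 (which reads as 0x332C when taken as a little-endian word); with db you get 0x6363; and only the unquoted dw table yields 37 at offset 2.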
I recommend that you install a hex editor and experiment with inspecting the output from NASM.

Resources