I recently downloaded gigabytes of text data in multiple files that I want to process automatically. However, the charset (the actual encoding) of the text is wrong. The problem is that text editors such as Notepad++, Sublime Text 3 or Word simply detect it as ANSI. I've tried all the charsets that were available, but some parts are still wrong across files.
Default ANSI encoding (wrong special characters):
OBJEVUJE SE ZELENÁ KNÍ®KA
Frantík Severýn sedí na prázdných bednách od cukru, pohupuje bosýma
nohama a naslouchá kázání páně Bočanovu. Kázání nepatří jemu, nýbrľ
paní Bílkové, která stojí před pultem. Frantík se tváří, jako by se
nezajímal o nic jiného neľ o své zablácené klátící se nohy. Zatím vąak
napíná uąi, aby mu neuąlo ani slovíčko.
»Tak to dál nepůjde, milá paní,« křičí hokynář a jeho tlustý zátylek
je rudý zlostí. »Jedno zboľí nezaplatíte a uľ zas chcete nové na dluh.
Copak si myslíte, ľe kradu?«
ISO 8859-2 encoding (wrong quotation marks):
OBJEVUJE SE ZELENÁ KNÍŽKA
Frantík Severýn sedí na prázdných bednách od cukru, pohupuje bosýma
nohama a naslouchá kázání páně Bočanovu. Kázání nepatří jemu, nýbrž
paní Bílkové, která stojí před pultem. Frantík se tváří, jako by se
nezajímal o nic jiného než o své zablácené klátící se nohy. Zatím však
napíná uši, aby mu neušlo ani slovíčko.
ťTak to dál nepůjde, milá paní,Ť křičí hokynář a jeho tlustý zátylek
je rudý zlostí. ťJedno zboží nezaplatíte a už zas chcete nové na dluh.
Copak si myslíte, že kradu?Ť
DESIRED OUTPUT:
OBJEVUJE SE ZELENÁ KNÍŽKA
Frantík Severýn sedí na prázdných bednách od cukru, pohupuje bosýma
nohama a naslouchá kázání páně Bočanovu. Kázání nepatří jemu, nýbrž
paní Bílkové, která stojí před pultem. Frantík se tváří, jako by se
nezajímal o nic jiného než o své zablácené klátící se nohy. Zatím však
napíná uši, aby mu neušlo ani slovíčko.
»Tak to dál nepůjde, milá paní,« křičí hokynář a jeho tlustý zátylek
je rudý zlostí. »Jedno zboží nezaplatíte a už zas chcete nové na dluh.
Copak si myslíte, že kradu?«
What character encoding is this?
After reading this I suspect that it might be an older/legacy encoding, but I am not sure how to fix it, as I don't know any software that supports it. Another possibility is that it is simply corrupt, because all quotation marks seem to be encoded as ť/Ť. How can I verify this?
EDIT: hex information:
KNÍ®KA = 4B 4E CD AE 4B 41
»Tak to dál nepůjde = BB 54 61 6B 20 74 6F 20 64 E1 6C 20 6E 65 70 F9 6A 64 65
co má chu» vstát = 63 6F 20 6D E1 20 63 68 75 BB 20 76 73 74 E1 74
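One way to test the suspicion is to decode the hex samples above with both code pages and compare. In these three samples every letter decodes correctly as ISO 8859-2, but byte 0xBB serves both as the opening » and as the letter ť (in "chuť"). That is because Windows-1250 puts »/« at 0xBB/0xAB, exactly where ISO 8859-2 puts ť/Ť. So the files look like ISO 8859-2 text with Windows-1250 (or Latin-1) quotation marks mixed in, rather than a rare legacy encoding. The Python 3 sketch below prints both decodings and then tries a context heuristic. `repair_quotes` is an illustrative name, not a library function, and the heuristic assumes that a ť not preceded by a letter is really » and a Ť not followed by a letter is really «. Words that start with ť (for example "ťukat") would be misread, so spot-check the output before running it over gigabytes of files.

```python
import re

# Hex dumps copied from the question.
SAMPLES = [
    "4B 4E CD AE 4B 41",
    "BB 54 61 6B 20 74 6F 20 64 E1 6C 20 6E 65 70 F9 6A 64 65",
    "63 6F 20 6D E1 20 63 68 75 BB 20 76 73 74 E1 74",
]

# One Unicode letter: a word character that is neither a digit nor "_".
LETTER = r"[^\W\d_]"


def repair_quotes(raw: bytes) -> str:
    """Hypothetical heuristic: decode as ISO 8859-2, then map ť/Ť back to »/« by context."""
    text = raw.decode("iso-8859-2")
    # 0xBB is ť in ISO 8859-2; if no letter precedes it, assume an opening ».
    text = re.sub(rf"(?<!{LETTER})ť", "»", text)
    # 0xAB is Ť in ISO 8859-2; if no letter follows it, assume a closing «.
    text = re.sub(rf"Ť(?!{LETTER})", "«", text)
    return text


for hex_text in SAMPLES:
    raw = bytes.fromhex(hex_text)  # spaces between byte pairs are ignored
    print("cp1250:    ", raw.decode("cp1250"))
    print("iso-8859-2:", raw.decode("iso-8859-2"))
    print("repaired:  ", repair_quotes(raw))
```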
Use UTF-8, not ascii, not iso-..., not latin....
latin1 comes close, but misses the ř.
You say it was "downloaded". Can you show us the hex for the characters in question?
»Žřč converts to hex:
C2BB C5BD C599 C48D in UTF-8 -- the only one that can handle all chars
BB 8E 3F 3F in latin1
BB 8E F8 3F in cp1250
3F AE F8 E8 in latin2
Note: 3F is ?, meaning conversion problems.
Hex BB is ť in latin2.
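The table can be reproduced with Python's built-in codecs (a sketch assuming Python 3.8+, which added the separator argument of `bytes.hex`). Two rows come out differently when run this way. Python's `latin-1` codec is strict ISO 8859-1 and has no Ž at all, so the `BB 8E 3F 3F` row matches Windows-1252 (`cp1252`), which many tools label latin1. And Python's `cp1250` codec encodes č as `E8`, giving `BB 8E F8 E8`, so the `3F` in the cp1250 row presumably came from the tool used rather than from the code page itself.

```python
def encode_table(sample, codecs=("utf-8", "cp1252", "cp1250", "iso-8859-2")):
    """Map each codec name to the upper-case hex of `sample` encoded with it.

    errors="replace" turns characters the codec cannot represent into "?" (3F).
    """
    return {
        codec: sample.encode(codec, errors="replace").hex(" ").upper()
        for codec in codecs
    }


for codec, hex_bytes in encode_table("»Žřč").items():
    print(f"{codec:>10}: {hex_bytes}")
```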
I am trying to create an elliptic curve public key by calculating the point on the curve from a given number (my private key), so I have the coordinates (x, y) of the elliptic curve point.
I get the coordinates by
myPublicKeyCoordinates = myPrivateKeyValue * GPointOnCurve
How can I build the PEM (or DER) file for my public key?
I don't care about the language (Java, Python, JavaScript, ...) because I want to know how to build the file (even if I have to write every single byte...).
Assuming you already know about ITU-T X.680-201508 (the ASN.1 language) and ITU-T X.690-201508 (the BER (and CER) and DER encodings for ASN.1 data), the main defining document for Elliptic Curve Keys and their representation is https://www.secg.org/sec1-v2.pdf from the Standards for Efficient Cryptography Group (not the US Securities and Exchange Commission).
Section C.3 (Syntax for Elliptic Curve Public Keys) says that the general transport container for an EC public key is the X.509 SubjectPublicKeyInfo structure:
SubjectPublicKeyInfo ::= SEQUENCE {
algorithm AlgorithmIdentifier {{ECPKAlgorithms}} (WITH COMPONENTS
{algorithm, parameters}) ,
subjectPublicKey BIT STRING
}
The possible "algorithms" (which really means key encoding types) are the open-ended set
ECPKAlgorithms ALGORITHM ::= {
ecPublicKeyType |
ecPublicKeyTypeRestricted |
ecPublicKeyTypeSupplemented |
{OID ecdh PARMS ECDomainParameters {{SECGCurveNames}}} |
{OID ecmqv PARMS ECDomainParameters {{SECGCurveNames}}},
...
}
ecPublicKeyType ALGORITHM ::= {
OID id-ecPublicKey PARMS ECDomainParameters {{SECGCurveNames}}
}
...
ECDomainParameters comes from C.2:
ECDomainParameters{ECDOMAIN:IOSet} ::= CHOICE {
specified SpecifiedECDomain,
named ECDOMAIN.&id({IOSet}),
implicitCA NULL
}
C.3 mentions about halfway through
The elliptic curve public key (a value of type ECPoint that is an OCTET STRING) is mapped to a subjectPublicKey (a value encoded as type BIT STRING) as follows: The most significant bit of the value of the OCTET STRING becomes the most significant bit of the value of the BIT STRING and so on with consecutive bits until the least significant bit of the OCTET STRING becomes the least significant bit of the BIT STRING.
So we seek backwards and find
An elliptic curve point itself is represented by the following type
ECPoint ::= OCTET STRING
whose value is the octet string obtained from the conversion routines given in Section 2.3.3.
2.3.3 (Elliptic-Curve-Point-to-Octet-String Conversion) has a lot of words, but the best supported format is not using point compression (and P != the point at infinity)
If P = (x_P, y_P) ≠ O and point compression is not being used, proceed as follows:
3.1. Convert the field element x_P to an octet string X of length ⌈(log₂ q)/8⌉ octets using the conversion routine specified in Section 2.3.5.
3.2. Convert the field element y_P to an octet string Y of length ⌈(log₂ q)/8⌉ octets using the conversion routine specified in Section 2.3.5.
3.3. Output M = 04₁₆ || X || Y.
2.3.5 is a whole lot of words for "big endian byte order of a length long enough to hold all values in the field" (aka "leave in leading zeros").
So now we party.
Given the FIPS 186-3 reference key on secp256r1 (d=70A12C2DB16845ED56FF68CFC21A472B3F04D7D6851BF6349F2D7D5B3452B38A),
Q is
(8101ECE47464A6EAD70CF69A6E2BD3D88691A3262D22CBA4F7635EAFF26680A8, D8A12BA61D599235F67D9CB4D58F1783D3CA43E78F0A5ABAA624079936C0C3A9)
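To reproduce Q from d (the `myPrivateKeyValue * GPointOnCurve` step from the question), here is a textbook affine double-and-add sketch in Python 3.8+ (it relies on `pow(x, -1, p)` for modular inverses). The constants are the published P-256 domain parameters, and the function names are illustrative. It is not constant time, so treat it as a way to check numbers, not as something to run on real private keys.

```python
# NIST P-256 (secp256r1) domain parameters.
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
GX = 0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296
GY = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5

# FIPS 186-3 reference private key from the text.
D = 0x70A12C2DB16845ED56FF68CFC21A472B3F04D7D6851BF6349F2D7D5B3452B38A


def on_curve(x, y):
    """Check y^2 = x^3 + a*x + b (mod p)."""
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1, p2):
    """Affine point addition; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # P + (-P) is the point at infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Left-to-right double-and-add: k * point. Not constant time."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result


qx, qy = scalar_mult(D, (GX, GY))
print(f"Q.X = {qx:064X}")
print(f"Q.Y = {qy:064X}")
```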
And the public key DER looks like
// SubjectPublicKeyInfo
30 XA
// AlgorithmIdentifier
30 XB
// AlgorithmIdentifier.id (id-ecPublicKey (1.2.840.10045.2.1))
06 07 2A 86 48 CE 3D 02 01
// AlgorithmIdentifier.parameters, using the named curve id (1.2.840.10045.3.1.7)
06 08 2A 86 48 CE 3D 03 01 07
// SubjectPublicKeyInfo.subjectPublicKey
03 XC 00
// Uncompressed public key
04
// Q.X
81 01 EC E4 74 64 A6 EA D7 0C F6 9A 6E 2B D3 D8
86 91 A3 26 2D 22 CB A4 F7 63 5E AF F2 66 80 A8
// Q.Y
D8 A1 2B A6 1D 59 92 35 F6 7D 9C B4 D5 8F 17 83
D3 CA 43 E7 8F 0A 5A BA A6 24 07 99 36 C0 C3 A9
Count up all the bytes for XA, XB, and XC:
XC = 32 (Q.X) + 32 (Q.Y) + 1 (0x04) + 1 (0x00 for the unused bits) = 66 = 0x42
XB = 19 = 0x13
XA is then 66 + 19 + 2 (tag bytes) + 2 (length bytes) = 89 = 0x59
(And, of course, if any of our length values exceeded 0x7F we would have had to encode them correctly)
So now we are left with
30 59 30 13 06 07 2A 86 48 CE 3D 02 01 06 08 2A
86 48 CE 3D 03 01 07 03 42 00 04 81 01 EC E4 74
64 A6 EA D7 0C F6 9A 6E 2B D3 D8 86 91 A3 26 2D
22 CB A4 F7 63 5E AF F2 66 80 A8 D8 A1 2B A6 1D
59 92 35 F6 7D 9C B4 D5 8F 17 83 D3 CA 43 E7 8F
0A 5A BA A6 24 07 99 36 C0 C3 A9
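The same bytes can be assembled programmatically. The Python sketch below builds the SubjectPublicKeyInfo from Q.X and Q.Y. The helper names are illustrative, the OID encodings are copied from the listing, and the fixed 32-byte coordinate size assumes P-256. `der_len` also emits the long length form for values above 0x7F, as the note above requires. The PEM form is the Base64 of the DER, wrapped at 64 characters between `-----BEGIN PUBLIC KEY-----` and `-----END PUBLIC KEY-----` lines.

```python
import base64


def der_len(n):
    """DER length octets: short form below 0x80, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, value):
    """Encode one tag-length-value element."""
    return bytes([tag]) + der_len(len(value)) + value


# Complete OBJECT IDENTIFIER elements (06 len ...) copied from the listing.
ID_EC_PUBLIC_KEY = bytes.fromhex("06072A8648CE3D0201")  # 1.2.840.10045.2.1
PRIME256V1 = bytes.fromhex("06082A8648CE3D030107")  # 1.2.840.10045.3.1.7


def p256_spki_der(x, y):
    """SubjectPublicKeyInfo for an uncompressed P-256 point (32-byte coordinates)."""
    point = b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")  # SEC 1 2.3.3
    algorithm = tlv(0x30, ID_EC_PUBLIC_KEY + PRIME256V1)  # AlgorithmIdentifier
    subject_public_key = tlv(0x03, b"\x00" + point)  # BIT STRING, 0 unused bits
    return tlv(0x30, algorithm + subject_public_key)


def to_pem(der):
    """Base64 body wrapped at 64 characters between PUBLIC KEY armor lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN PUBLIC KEY-----\n" + "\n".join(lines) + "\n-----END PUBLIC KEY-----\n"


QX = 0x8101ECE47464A6EAD70CF69A6E2BD3D88691A3262D22CBA4F7635EAFF26680A8
QY = 0xD8A12BA61D599235F67D9CB4D58F1783D3CA43E78F0A5ABAA624079936C0C3A9
der = p256_spki_der(QX, QY)
print(der.hex(" ").upper())
print(to_pem(der), end="")
```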
And, we verify:
$ xxd -r -p | openssl ec -text -noout -inform der -pubin
read EC key
<paste, then hit CTRL+D>
30 59 30 13 06 07 2A 86 48 CE 3D 02 01 06 08 2A
86 48 CE 3D 03 01 07 03 42 00 04 81 01 EC E4 74
64 A6 EA D7 0C F6 9A 6E 2B D3 D8 86 91 A3 26 2D
22 CB A4 F7 63 5E AF F2 66 80 A8 D8 A1 2B A6 1D
59 92 35 F6 7D 9C B4 D5 8F 17 83 D3 CA 43 E7 8F
0A 5A BA A6 24 07 99 36 C0 C3 A9
Private-Key: (256 bit)
pub:
04:81:01:ec:e4:74:64:a6:ea:d7:0c:f6:9a:6e:2b:
d3:d8:86:91:a3:26:2d:22:cb:a4:f7:63:5e:af:f2:
66:80:a8:d8:a1:2b:a6:1d:59:92:35:f6:7d:9c:b4:
d5:8f:17:83:d3:ca:43:e7:8f:0a:5a:ba:a6:24:07:
99:36:c0:c3:a9
ASN1 OID: prime256v1
NIST CURVE: P-256
Printing it as "Private-Key: (256 bit)" is just a bug/quirk of the tool; there's no private key there.
Things are harder for specified parameter curves, but those don't interoperate well (https://www.rfc-editor.org/rfc/rfc5480#section-2.1.1 says that a conforming CA MUST NOT use the specified parameter form, or the implicit form, but MUST use the named form).
I want to generate random bytes in Ruby, but I also want to insert some constant values into specific positions of the random bytes.
random_hex_string = SecureRandom.hex(length)
random_hex_string.insert(0,"0102")
random_hex_string.insert(30*1,"36")
So I generate random hex bytes and insert my hex values there. The problem is that I now have a string, not a byte array. So when I write it to a file:
File.open("file.txt", 'w+b') do |f|
  f.write(random_hex_string)
end
It, not surprisingly, writes each hex digit as an ASCII character, so my hex values are not kept. To be clearer: when I write my hex string to the file, I want to see the same hex values when I hex dump the file. How can I do this?
You can turn it into a single element array and pack it as hex. For instance, when I run your code:
require 'securerandom'
length = 2 ** 6
random_hex_string = SecureRandom.hex(length)
random_hex_string.insert(0,"0102")
random_hex_string.insert(30*1,"36")
puts random_hex_string
File.open("file.txt", 'w+b') do |file|
file.write([random_hex_string].pack('H*')) # see also 'h'
end
I get the output
010299e84e9e4541d08cb800462b6f36a87ff118d6291368e96e8907598a2dfd4090658fea1dab6ed460ab512ddc54522329f6b4ddd287e4302ef603ce60e85e631591
and then running
$ hexdump file.txt
0000000 01 02 99 e8 4e 9e 45 41 d0 8c b8 00 46 2b 6f 36
0000010 a8 7f f1 18 d6 29 13 68 e9 6e 89 07 59 8a 2d fd
0000020 40 90 65 8f ea 1d ab 6e d4 60 ab 51 2d dc 54 52
0000030 23 29 f6 b4 dd d2 87 e4 30 2e f6 03 ce 60 e8 5e
0000040 63 15 91
0000043
Unless I'm mistaken, it matches up perfectly with the output from the script.
I've never messed with Array#pack before, and haven't done much with hex, but this seems to be giving the results you require, so should be a step in the right direction at least.
Why ask for hex when you don't actually want hex? Just do this:
random_bytes = SecureRandom.random_bytes(length)
random_bytes.insert(0, "\x01\x02")
random_bytes.insert(15, "\x36")
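For comparison, the same byte-level approach in Python, as a sketch: `secrets.token_bytes` stands in for `SecureRandom.random_bytes`, and the function names are illustrative. `bytes.fromhex` plays roughly the role of `[hex].pack('H*')` for even-length hex strings.

```python
import secrets


def build_payload(length):
    """Random bytes with the constant bytes from the question spliced in."""
    data = bytearray(secrets.token_bytes(length))
    data[0:0] = b"\x01\x02"  # insert 01 02 at offset 0
    data[15:15] = b"\x36"  # insert 36 at byte offset 15 (hex-character offset 30)
    return bytes(data)


def hex_to_bytes(hex_string):
    """Rough counterpart of Ruby's [hex_string].pack('H*') for even-length input."""
    return bytes.fromhex(hex_string)


if __name__ == "__main__":
    payload = build_payload(2 ** 6)
    with open("file.txt", "wb") as f:  # binary mode writes the bytes unchanged
        f.write(payload)
    print(payload.hex())
```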
I am tasked with decoding data stored in an Aztec barcode using an iOS device. I have access to the code that assembles the string sent to the barcode printer, but the printing itself is a black box.
As I step through the process, I can see that the string sent to the printer looks like this (note that, apart from the first 8 characters, this is an encrypted string):
_36_30_30_30_30_34_7c_5d_49_0b_ea_f7_93_ba_89_d2_c6_c2_41_2a_d7_1c_49_8c_6d_4b_5c_07_5a_ca_7a_6a_c6_d5_d0_6c_f7_20_76_5b_e0_18_46_93_7e_2a_30_0d_14_3a_1a_e5_66_7c_05_f9_df_96_8a_f1_45_a5_4a_6e_2f_89_3f_f0_93_1f_bc_3e_77_5b_27_0c_58_df_55_37_4c_ae_8a_e7_c3_c6_16_5b_57_db_7c_2d_2c_8b_1c_e3_a4_44_1b_c4_ba_6a_c6_98_93_ae_2d_20_6e_9f_e8_0f_eb_bc_9f_2e_8a_e7_cf_da_22_96_e1_74_de_b2_f0_29_ec_b1_c1_75_43_1f_b2_e5_1f_a5_f6_06_3e_97_a1_a1_93_f4_51_4a_c4_14_9f_1a_c2_5b_ba_02_45_44_2b_b3_c2_5b_ba_02_45_44_2b_b3_c2_5b_ba_02_45_44_2b_b3_c2_5b_ba_02_45_44_2b_b3_c2_5b_ba_02_45_44_2b_b3_c2_5b_ba_02_45_44_2b_b3_06_0b_12_75_85_8b_07_fb
And the printed barcode looks like this:
However, when I use a generic iOS barcode reader to read it back (I've tried a few), I get the following:
600004|]I�ê÷ºÒÆÂA*×�ImK\�ZÊzjÆÕÐl÷ v[à�F~*0
�:�åf|�ùßñE¥Jn/?ð�¼>w['�XßU7L®çÃÆ�[WÛ|-,�ã¤D�ĺjÆ®- nè�PÐk^¡±xOS5·Óþ�ßá×D¢\���¥ö�>¡¡ôQJÄ��Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³���u�û
This bears a resemblance to the original string (for example, the first few characters). But I have no idea what type of encoding this is, or how to translate it into the hex codes I was expecting to see.
I would love to know:
1) What am I looking at here?
2) How can I convert this string back into the original format?
Note: For clarity, what you refer to as the encrypted string, I will refer to as the hex code, to further differentiate from the random-looking string at the end of your post.
Summary
I believe the encoding you're seeing in the string is a bungled ASCII/ISO-8859-1 encoding. It is omitting some characters, making it impossible to recover your original hex code from that string. After finding a scanner that properly handles the barcode, it turns out the barcode does not match your hex code.
Encoding
Wikipedia says that by default[1], byte codes in Aztec are interpreted as ASCII when between 0 and 127, and as ISO-8859-1 when between 128 and 255. So when you replace the letters and symbols you're getting with the corresponding hex values from those two encodings, you get the following:
36 30 30 30 30 34 7C 5D 49 EA F7 BA D2 C6 C2 41 2A D7 49 6D 4B 5C 5A CA 7A 6A C6 D5 D0 6C F7 20 76 5B E0 46 7E 2A 30 0A 3A E5 66 7C F9 DF F1 45 A5 4A 6E 2F 3F F0 BC 3E 77 5B 27 58 DF 55 37 4C AE E7 C3 C6 5B 57 DB 7C 2D 2C E3 A4 44 C4 BA 6A C6 AE 2D 20 6E E8 50 D0 6B 5E A1 B1 78 4F 53 35 B7 D3 FE DF E1 D7 44 A2 5C
This is similar to your encrypted hex code, but with some bytes omitted, and the data after the E8 byte is different. The omitted bytes are all from the 00 - 1F and 80 - 9F ranges. The 00 - 1F range in ASCII consists of control codes, most of which are rarely used and not well supported by many applications. The other range is undefined in ISO-8859-1[2]. So any application trying to interpret these bytes as ASCII/ISO-8859-1 strings may behave unpredictably.
If you remove bytes from these ranges in your encrypted hex code, you get essentially[3] the same thing I got, up to the E8 byte. The byte you have after E8 is 0F. I've never heard of this control code before, but apparently it's called "Shift In" and its function is to "Return to regular character set after Shift Out." Since we're already having trouble with character sets, I can only assume that this control code is responsible for the interpretation errors after the E8 byte.
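The filtering described above can be checked mechanically. This Python sketch takes the start of your printer string (cut just before the 0F byte), drops 00 - 1F (keeping the line-ending bytes 0A and 0D) and 80 - 9F, and prints what is left. It models the readers' apparent behaviour; it is not a claim about how any particular reader is implemented.

```python
# Start of the printer string from the question, cut just before the 0F byte.
PRINTER_PREFIX = (
    "_36_30_30_30_30_34_7c_5d_49_0b_ea_f7_93_ba_89_d2_c6_c2_41_2a_d7_1c_49_8c"
    "_6d_4b_5c_07_5a_ca_7a_6a_c6_d5_d0_6c_f7_20_76_5b_e0_18_46_93_7e_2a_30_0d"
    "_14_3a_1a_e5_66_7c_05_f9_df_96_8a_f1_45_a5_4a_6e_2f_89_3f_f0_93_1f_bc_3e"
    "_77_5b_27_0c_58_df_55_37_4c_ae_8a_e7_c3_c6_16_5b_57_db_7c_2d_2c_8b_1c_e3"
    "_a4_44_1b_c4_ba_6a_c6_98_93_ae_2d_20_6e_9f_e8"
)


def parse_underscore_hex(s):
    """Turn the printer format '_36_30_7c' into bytes (here b'60|')."""
    return bytes.fromhex(s.replace("_", " "))


def lossy_latin1_view(data):
    """Drop C0 controls except LF/CR, and the C1 range 80-9F."""
    return bytes(
        b for b in data
        if not (b < 0x20 and b not in (0x0A, 0x0D)) and not (0x80 <= b <= 0x9F)
    )


raw = parse_underscore_hex(PRINTER_PREFIX)
visible = lossy_latin1_view(raw)
print(visible.hex(" ").upper())
```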
Edit: One of your recent edits modified the string, and it now contains a few of these characters: �. This is Unicode's replacement character, which often stands in for other characters when there are character encoding issues or when a process has trouble interpreting a particular character. In this case, it is replacing many bytes from the 00 - 1F range, which are the ASCII controls. The original bytes remain impossible to recover. The 80 - 9F range is still omitted.
A better barcode reader
In order to properly interpret the barcode, you'll need a reader that does not interpret the hex code as encoded strings, but as a byte stream. At the very least, you need a reader that will still preserve the 00 - 1F and 80 - 9F ranges.
One such reader that I've found is NeoReader. It is entirely possible you've already tried it, but copy-pasting can cause errors with these special code ranges.
I scanned the code with it on an iOS 7 device, then hit the "Copy to Clipboard" button that the app provides. Then I pasted the string at the top of this converter and hit convert. I usually use this converter for Unicode stuff, but the other dedicated text-to-hex converters I found were not able to handle the string and its special codes. If you scroll down to the "Hexadecimal code points," you should be able to see the needed hexadecimal codes, though they are prefixed with an extra 00[4].
The string it produces (take it with a grain of salt: I had some copy-paste errors, and it appears the special controls were removed upon posting it):
600004|]I ê÷ºÒÆÂA*×ImK\ZÊzjÆÕÐl÷ v[àF~*0
:åf|ùßñE¥Jn/?ð¼>w[' XßU7L®çÃÆ[WÛ|-,ã¤DĺjÆ®- nèPÐk^¡±xOS5·Óþßá×D¢\¥ö>¡¡ôQJÄÂ[ºED+³Â[ºED+³Â[ºED+³Â[ºED+³Â[ºED+³Â[ºED+³ uû
Hex code comparison (differences are marked by < >):
Your hex code: 36 30 30 30 30 34 7C 5D 49 0B EA F7 93 BA 89 D2 C6 C2 41 2A D7 1C 49 8C 6D 4B 5C 07 5A CA 7A 6A C6 D5 D0 6C F7 20 76 5B E0 18 46 93 7E 2A 30 <0D> 14 3A 1A E5 66 7C 05 F9 DF 96 8A F1 45 A5 4A 6E 2F 89 3F F0 93 1F BC 3E 77 5B 27 0C 58 DF 55 37 4C AE 8A E7 C3 C6 16 5B 57 DB 7C 2D 2C 8B 1C E3 A4 44 1B C4 BA 6A C6 98 93 AE 2D 20 6E 9F E8 0F <EB BC 9F 2E 8A E7 CF DA 22 96 E1 74 DE B2 F0 29 EC B1 C1 75 43 1F B2 E5> 1F A5 F6 06 3E 97 A1 A1 93 F4 51 4A C4 14 9F 1A C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 06 0B 12 75 85 8B 07 FB
NeoReader string: 36 30 30 30 30 34 7C 5D 49 0B EA F7 93 BA 89 D2 C6 C2 41 2A D7 1C 49 8C 6D 4B 5C 07 5A CA 7A 6A C6 D5 D0 6C F7 20 76 5B E0 18 46 93 7E 2A 30 <0A> 14 3A 1A E5 66 7C 05 F9 DF 96 8A F1 45 A5 4A 6E 2F 89 3F F0 93 1F BC 3E 77 5B 27 0C 58 DF 55 37 4C AE 8A E7 C3 C6 16 5B 57 DB 7C 2D 2C 8B 1C E3 A4 44 1B C4 BA 6A C6 98 93 AE 2D 20 6E 9F E8 0F <81 50 D0 6B 5E A1 B1 78 4F 53 35 B7 D3 FE 1F DF E1 90 D7 44 A2 5C 00 19> 1F A5 F6 06 3E 97 A1 A1 93 F4 51 4A C4 14 9F 1A C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 C2 5B BA 02 45 44 2B B3 06 0B 12 75 85 8B 07 FB
The difference explained
It turns out that the barcode does not actually match your hex code. Where our two codes diverge, at that 0F byte, the barcode actually follows what NeoReader suggests. This is shown in the image below, which is zoomed in on the bottom-right quadrant of the barcode (the blue lines indicate parts that do not encode data; they are there to help orient the scanner).
I managed to manually[5] decode that section of the barcode, with help from this video tutorial. Your barcode, however, does not use the string encoding method shown there, as it uses a binary shift escape to work with 8-bit values. From there, I believe the one 0A <-> 0D difference is due to a copy-paste error on my part.
Unfortunately, since the printer is a black box to you, it does not appear as though you can fix this problem yourself.
Footnotes
[1] I could not find an Aztec Code specification, but the behaviour seems to be relatively consistent with the default.
[2] ISO-8859-1 is essentially a superset of ASCII, but it technically leaves the ASCII control code range undefined. This is usually ignored in practice.
[3] The only difference is the 0A (line feed) in my string, where your string has 0D (carriage return). Both are line-ending characters. Different systems handle line endings differently, and it's not uncommon for them to change these characters automatically. Unlike most other ASCII control codes, line-ending characters are usually well supported.
[4] The reason for this is complicated. Glossing over a few details, I believe that upon hitting the convert button, the string is first converted to UTF-16 (JavaScript's native string encoding). The ASCII/ISO-8859-1 characters keep the same numeric values in UTF-16. However, UTF-16 is a 16-bit encoding rather than an 8-bit encoding, hence the extra 00.
[5] That was painful.
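If a converter reports "Hexadecimal code points" as described in the footnote about the extra 00, the original bytes can be recovered as long as every code point is at most 00FF, because ISO-8859-1 maps bytes 00 - FF one to one onto U+0000 - U+00FF. This only helps if the reader preserved every byte in the first place. A small Python sketch; the input format (space-separated hex values such as `0036 0030`) is an assumption about how the converter's output gets copied.

```python
def code_points_to_bytes(code_points):
    """Turn space-separated hex code points (e.g. "0036 0030") back into bytes.

    Only valid if every code point is <= 00FF, i.e. the text was Latin-1 to begin with.
    """
    values = [int(cp, 16) for cp in code_points.split()]
    if any(v > 0xFF for v in values):
        raise ValueError("code point above U+00FF cannot be a single raw byte")
    return bytes(values)


def bytes_to_code_points(data):
    """Inverse direction: show each byte as a four-digit code point, like the converter."""
    return " ".join(f"{ord(ch):04X}" for ch in data.decode("latin-1"))


print(code_points_to_bytes("0036 0030 0030 0030 0030 0034 007C 005D").hex(" ").upper())
```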
First of all, I've tried the following online barcode readers:
intbusoft : server error
onlinebarcodereader : no code found
datasymbol : code different from yours
pqscan : no code found
zxing : code longer than yours.
This makes me think that your barcode may not be well constructed...
Here's your output:
600004|]Iê÷ºÒÆÂA*×ImK\ZÊzjÆÕÐl÷ v[àF~*0
:åf|ùßñE¥Jn/?ð¼>w['XßU7L®çÃÆ[WÛ|-,ã¤DĺjÆ®- nèPÐk^¡±xOS5·Óþßá×D¢\
and here's the one from zxing:
600004|]I�ê÷ºÒÆÂA*×�ImK\�ZÊzjÆÕÐl÷ v[à�F~*0
�:�åf|�ùßñE¥Jn/?ð�¼>w['�XßU7L®çÃÆ�[WÛ|-,�ã¤D�ĺjÆ®- nè�PÐk^¡±xOS5·Óþ�ßá×D¢\���¥ö�>¡¡ôQJÄ��Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³Â[º�ED+³���u�û
(maybe this difference is due to a copy/paste manipulation on your side)
This is the matching that I was able to find:
6 0 0 0 0 4 | ] I � ê ÷ º Ò Æ Â
36 30 30 30 30 34 7c 5d 49 0b ea f7 93 ba 89 d2 c6 c2
A * × � I m K \ � Z Ê z j Æ Õ Ð l
41 2a d7 1c 49 8c 6d 4b 5c 07 5a ca 7a 6a c6 d5 d0 6c
÷ v [ à � F ~ * 0 � : � å f |
f7 20 76 5b e0 18 46 93 7e 2a 30 0d 14 3a 1a e5 66 7c
� ù ß ñ E ¥ J n / ? ð � ¼ >
05 f9 df 96 8a f1 45 a5 4a 6e 2f 89 3f f0 93 1f bc 3e
w [ ' � X ß U 7 L ® ç Ã Æ � [ W Û
77 5b 27 0c 58 df 55 37 4c ae 8a e7 c3 c6 16 5b 57 db
| - , � ã ¤ D � Ä º j Æ ® -
7c 2d 2c 8b 1c e3 a4 44 1b c4 ba 6a c6 98 93 ae 2d 20
n è � P
6e 9f e8 0f eb
And this seems to be some Unicode UCS-2 encoding.
After this, I can't explain the difference between the output and the expected hexadecimal values.
I am trying to inspect an H.264 bitstream coming from a hardware encoder on a TI DaVinci board.
00 00 0b c8 25 88 84 27 e4 a2 8e 32 77 87 ec 16 86 37 d7 8e 99 e1 8c 3b 8b ce fe a5 fc e9 9c f3 34 87 9f d7 ff 66 7d c1 ce ed 62 18 05 35 00 08 0f f6 69 12 08 a8 32 5e c7 fe c8 bf 77 e4 62 e4 9e 8b b0 6e f0 39 60 5b e8 26 78 52 d8 24 75 5c 2f 06 ce 71 04 aa cb e3 19 d0 dd 02 b5 e7 0e a7 ce 77 70 a9 7c 46 1e 65 b3 7b 02 c9 d4 72 d7 97 36 f3 59 93 e5 e6 92 ff 8f ba 29 03 d5 da 0a 7a 14 1f 19 b5 88 b1 98 7a 3b e1 58 a2 88 a1 5a 4a
The first 4 bytes seem to be the size of the trailing chunk...
What is the format of this bitstream?
How do I extract nal_unit_type and slice_type/pict_type?
Is the forbidden zero bit present?
This source states that the stream does not need to contain start codes, sequence parameter set (SPS) NALUs or picture parameter set (PPS) NALUs, and that in that case the decoder must obtain the SPS and PPS NALUs externally (some kind of extradata parameter to the decoder).
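The difference that source describes is between Annex B streams, where every NAL unit is preceded by a start code such as 00 00 00 01, and length-prefixed streams like this one. A minimal Python sketch of the conversion, assuming 4-byte big-endian length fields as observed above (`avcc_to_annexb` is an illustrative name). The SPS and PPS still have to come from elsewhere and be placed in front of the first slice.

```python
START_CODE = b"\x00\x00\x00\x01"


def avcc_to_annexb(stream, length_size=4):
    """Replace each big-endian NAL length prefix with an Annex B start code."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        if pos + length_size > len(stream):
            raise ValueError("truncated length prefix")
        nal_size = int.from_bytes(stream[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_size > len(stream):
            raise ValueError("truncated NAL unit")
        out += START_CODE + stream[pos:pos + nal_size]
        pos += nal_size
    return bytes(out)
```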
The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. (http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC)
The H.264 spec can be downloaded for free at:
http://www.itu.int/rec/T-REC-H.264/en
The ISO version currently costs CHF 323.00 at http://webstore.iec.ch/
The bitstream format is defined in ISO/IEC 14496-10:
Information technology — Coding of audio-visual objects — Part 10:
Advanced Video Coding
You can download the standard from itu.int website.
The data you provided looks like NAL unit type 5, a coded slice of an IDR picture (the lower 5 bits of the first payload byte, after skipping the first 4 bytes that hold the length), and it does not carry SPS/PPS units.
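A minimal Python sketch of that parsing, assuming the 4-byte big-endian length prefix and no emulation-prevention bytes (00 00 03) within the first header bytes, which holds for this sample. The NAL header byte is forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits). For slice NAL units (types 1 and 5), slice_header() begins with the Exp-Golomb fields first_mb_in_slice, slice_type and pic_parameter_set_id, and slice_type values 5 - 9 mean the same as 0 - 4 (P, B, I, SP, SI). On the sample bytes this yields forbidden_zero_bit 0, nal_unit_type 5 (IDR slice) and slice_type 7 (I). The class and function names are illustrative.

```python
SLICE_TYPE_NAMES = ("P", "B", "I", "SP", "SI")  # indexed by slice_type % 5


class BitReader:
    """MSB-first bit reader over a byte string.

    Emulation-prevention bytes (00 00 03) are NOT removed; that is fine for the
    first few header bytes of this sample, but not for a general parser.
    """

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def read_bit(self):
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Unsigned Exp-Golomb code ue(v): count zeros, skip the 1, read that many bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_length_prefixed_nal(chunk):
    header = chunk[4]  # first byte after the 4-byte length
    info = {
        "nal_length": int.from_bytes(chunk[:4], "big"),
        "forbidden_zero_bit": header >> 7,
        "nal_ref_idc": (header >> 5) & 0x3,
        "nal_unit_type": header & 0x1F,
    }
    if info["nal_unit_type"] in (1, 5):  # coded slice, non-IDR or IDR
        reader = BitReader(chunk[5:])
        info["first_mb_in_slice"] = reader.read_ue()
        info["slice_type"] = reader.read_ue()
        info["slice_type_name"] = SLICE_TYPE_NAMES[info["slice_type"] % 5]
        info["pic_parameter_set_id"] = reader.read_ue()
    return info


sample = bytes.fromhex("00 00 0b c8 25 88 84 27 e4 a2 8e 32")
print(parse_length_prefixed_nal(sample))
```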
See also previous topics on H.264 decoding, e.g. H.264 stream header
I've just upgraded a project from Delphi 2006 to Delphi XE. Everything is working as expected except I get an exception when I close my app.
It's not breaking on a code line. It breaks to the CPU window on a LEAVE command.
I've attached a EurekaLog report in case it helps.
EurekaLog 6.0.25
Application:
------------------------------------------------------
1.1 Start Date : Fri, 3 Dec 2010 10:44:17 +0100
1.2 Name/Description: LogoTid.exe
1.3 Version Number :
1.4 Parameters :
1.5 Compilation Date: Fri, 3 Dec 2010 10:44:15 +0100
1.6 Up Time : 5 seconds
Exception:
----------------------------------------------------
2.1 Date : Fri, 3 Dec 2010 10:44:22 +0100
2.2 Address : 004062A0
2.3 Module Name : LogoTid.exe
2.4 Module Version:
2.5 Type : EInvalidPointer
2.6 Message : Invalid pointer operation.
2.7 ID : 5E21
2.8 Count : 1
2.9 Status : New
2.10 Note :
User:
-------------------------------------------------------
3.1 ID : oda
3.2 Name :
3.3 Email :
3.4 Company :
3.5 Privileges: SeIncreaseQuotaPrivilege - OFF
SeSecurityPrivilege - OFF
SeTakeOwnershipPrivilege - OFF
SeLoadDriverPrivilege - OFF
SeSystemProfilePrivilege - OFF
SeSystemtimePrivilege - OFF
SeProfileSingleProcessPrivilege - OFF
SeIncreaseBasePriorityPrivilege - OFF
SeCreatePagefilePrivilege - OFF
SeBackupPrivilege - OFF
SeRestorePrivilege - OFF
SeShutdownPrivilege - OFF
SeDebugPrivilege - ON
SeSystemEnvironmentPrivilege - OFF
SeChangeNotifyPrivilege - ON
SeRemoteShutdownPrivilege - OFF
SeUndockPrivilege - OFF
SeManageVolumePrivilege - OFF
SeImpersonatePrivilege - ON
SeCreateGlobalPrivilege - ON
SeIncreaseWorkingSetPrivilege - OFF
SeTimeZonePrivilege - OFF
SeCreateSymbolicLinkPrivilege - OFF
Active Controls:
------------------------------------------------------------------
4.1 Form Class : TAppBuilder
4.2 Form Text : LogoTid - Delphi XE - uMain [Running] [Built]
4.3 Control Class:
4.4 Control Text :
Computer:
------------------------------------------------------------------------------------------------
5.1 Name : OLE-LAPTOP
5.2 Total Memory : 3891 Mb
5.3 Free Memory : 778 Mb
5.4 Total Disk : 120 Gb
5.5 Free Disk : 57,93 Gb
5.6 System Up Time: 1 day, 23 hours, 16 minutes, 56 seconds
5.7 Processor : Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
5.8 Display Mode : 1920 x 1200, 32 bit
5.9 Display DPI : 96
5.10 Video Card : Intel(R) Graphics Media Accelerator HD (driver 8.15.10.2025 - RAM 1721 MB)
5.11 Printer : RICOH Aficio 2232C RPCS (driver 1.0.0)
Operating System:
--------------------------------------------
6.1 Type : Microsoft Windows 7 (64 bit)
6.2 Build # : 7600
6.3 Update :
6.4 Language: Danish
6.5 Charset : 0
Call Stack Information:
-------------------------------------------------------------------
|Address |Module |Unit |Class|Procedure/Method |Line |
-------------------------------------------------------------------
|Running Thread: ID=5632; Priority=0; Class=; [Main] |
|-----------------------------------------------------------------|
|00D171A1|LogoTid.exe |LogoTid.dpr| | |32[5]|
|76A73675|kernel32.dll| | |BaseThreadInitThunk| |
-------------------------------------------------------------------
Assembler Information:
-----------------------------------------------------------------
; System.TObject.FreeInstance
; ----------------------------
00406294 push ebx
00406295 mov ebx, eax
00406297 mov eax, ebx
00406299 call System.TObject.CleanupInstance
0040629E mov eax, ebx
004062A0 call System._FreeMem ; <-- EXCEPTION
004062A5 pop ebx
004062A6 ret
Registers:
-----------------------------
EAX: 02AF8058 EDI: 00000001
EBX: 004062A5 ESI: 004062A5
ECX: 0041D700 ESP: 0018FE98
EDX: 004062A5 EIP: 004062A0
Stack: Memory Dump:
------------------ ---------------------------------------------------------------------------
0018FE98: FFFFFF02 004062A0: E8 3B E7 FF FF 5B C3 90 83 C0 CC 8B 00 C3 8B C0 .;...[..........
0018FE9C: 00404B78 004062B0: 84 D2 74 08 83 C4 F0 E8 54 05 00 00 84 D2 74 0F ..t.....T.....t.
0018FEA0: 02B1CEC0 004062C0: E8 A3 05 00 00 64 8F 05 00 00 00 00 83 C4 0C C3 .....d..........
0018FEA4: 02B1CEC0 004062D0: E8 E3 05 00 00 84 D2 7E 05 E8 82 05 00 00 C3 90 .......~........
0018FEA8: 00404BC2 004062E0: 85 C0 74 07 B2 01 8B 08 FF 51 FC C3 53 56 57 89 ..t......Q..SVW.
0018FEAC: 02B1CEC0 004062F0: C3 89 D7 AB 8B 4B CC 31 C0 51 C1 E9 02 49 F3 AB .....K.1.Q...I..
0018FEB0: 0018FEE8 00406300: 59 83 E1 03 F3 AA 89 D0 89 E2 8B 4B AC 85 C9 74 Y..........K...t
0018FEB4: 004062A5 00406310: 01 51 8B 5B D0 85 DB 74 04 8B 1B EB ED 39 D4 74 .Q.[...t.....9.t
0018FEB8: 03A02F01 00406320: 1D 5B 8B 0B 83 C3 04 8B 73 10 85 F6 74 06 8B 7B .[......s...t..{
0018FEBC: 00406865 00406330: 14 89 34 07 83 C3 1C 49 75 ED 39 D4 75 E3 5F 5E ..4....Iu.9.u._^
0018FEC0: 0045B949 00406340: 5B C3 8B C0 53 56 89 C3 89 C6 8B 36 8B 56 B4 8B [...SV.....6.V..
0018FEC4: 03A02FA0 00406350: 76 D0 85 D2 74 07 E8 85 36 00 00 89 D8 85 F6 75 v...t...6......u
0018FEC8: 03A02F01 00406360: E9 89 D8 E8 78 06 00 00 5E 5B C3 90 87 D1 81 F9 ....x...^[......
0018FECC: 004062EB 00406370: 00 00 00 FF 73 11 81 F9 00 00 00 FE 72 07 0F BF ....s.......r...
0018FED0: 00912606 00406380: C9 03 08 FF 21 FF E1 81 E1 FF FF FF 00 01 C1 89 ....!...........
0018FED4: 00000000 00406390: D0 8B 11 E9 A8 59 00 00 C3 8D 40 00 3B C2 0F 94 .....Y....@.;...
--- Edit
OK, I tried turning off parts of my program until the error went away, and found the troublemaker.
It's my WSDL-generated web service proxy. If I create the proxy object, even without calling any functions on the service, it throws the error.
I've created a test project with no code other than the proxy object creation, and it also throws the error. I've also tried another web service: same error. Both web services were created with Delphi 2006 (.NET 1.1).
Lastly, I tried a .NET 4.0 web service created in VS2010. No problems. So either Delphi XE projects are not compatible with .NET 1.1 web services, or they are not compatible with web services created in Delphi 2006. Either way, it's a mess.
Any thoughts on how to solve this, or maybe a workaround?
The log won't help here. It looks like a memory corruption issue, which can happen if your code performs indexed operations on strings (writing to a string's character position, for example) and you have not fixed all the code where a string is cast to PChar or similar.
In other words, you have to perform careful analysis of your code. Start with turning off some modules and code blocks completely until the exception disappears. Then start adding them one by one.
This is likely related to the fact that string is now a Unicode string (2 bytes per char), and not an AnsiString (1 byte per char). If you play with the raw bytes of strings, this is a major problem. To solve it, simply replace every string with AnsiString and every Char with AnsiChar. Of course, you will lose Unicode support by doing this. A better fix is to rework your string handling routines. Often, all that is needed is a multiplication by SizeOf(Char) (= 2) here and there.
Example (old code):
byteSize := Length(str);
Example (new code):
byteSize := Length(str) * SizeOf(Char);
Found a solution / workaround.
The error occurs if you use a web service directly in a form.
Create an empty VCL Forms project and use the WSDL importer to generate a web service proxy. Add the proxy unit to the uses section. Declare a private field of the proxy type, and then, in the form's OnCreate handler, use the proxy's getXXXXXXX function to initialize it. Run the project.
When you close the form, you get an exception.
The solution / workaround is to create your own class, and talk to the webservice proxy through this class.