I'm using MongoDB for my Go and Rails projects (both use the same database), and I have a bson.Binary field in my document (it holds a Base64-encoded public key):
type Device struct {
	Id        string      `json:"id" form:"id" bson:"_id"`
	PublicKey bson.Binary `json:"pub_key" form:"pub_key" bson:"public_key"`
	Token     string      `json:"token" form:"token" bson:"token"`
	CreatedAt time.Time   `json:"created_at" bson:"created_at"`
	UpdatedAt time.Time   `json:"updated_at" bson:"updated_at"`
}
But when I retrieve it in my Go project, the data is somehow corrupted (4 characters are missing):
device := models.Device{}
err = db.C(models.CollectionDevice).Find(bson.M{"_id": deviceId}).One(&device)
publicKey := device.PublicKey
publicKeyBase64 := base64.StdEncoding.EncodeToString(publicKey.Data)
fmt.Println("publicKey BinaryData: ", publicKey.Data)
The output is:
"pub_key": {
"Kind": 0,
"Data": "Axj5A9BgkKJohGh/0BAQEFAAMBjYAAMYmCAYGACqW0smeNiaIgRgXJv7dLZ0n5gTeXxiI9q6h8EdbiPDLiFsv7Jgatwxm+OpucrUBp4sZp/othrSQWnJQR5vpnCZYZnNJUdjTAYw+PyjQ2qq9YEeLHMnMzhYgaq2ata+CSsCjalkOyzLt/rDEBn5WHaCfYm0Vm+QbbEWVLltUqcjvScOyAwEAAQn"
}
(The Data field is 220 characters long.)
The original Base64-encoded string is:
Axj5A9BgkKJohGh/0BAQEFAAMBjYAAMYmCAYGACqW0smeNiaIgRgXJv7dLZ0\n5gTeXxiI9q6h8EdbiPDLiFsv7Jgatwxm+OpucrUBp4sZp/othrSQWnJQR5vp\nCZYZnNJUdjTAYw+PyjQ2qq9YEeLHMnMzhYgaq2ata+CSsCjalkOyzLt/rDEB\n5WHaCfYm0Vm+QbbEWVLltUqcjvScOyAwEAAQ\n
It has 224 characters.
I use this code in my Go project:
fmt.Println("publicKey: ", publicKey.Data)
//publicKey is bson.Binary data type
I have tried retrieving the data from my Rails project (same database), and it is retrieved correctly:
public_key.as_json["$binary"]
The result is exactly right (224 characters):
"Axj5A9BgkKJohGh/0BAQEFAAMBjYAAMYmCAYGACqW0smeNiaIgRgXJv7dLZ0\nn5gTeXxiI9q6h8EdbiPDLiFsv7Jgatwxm+OpucrUBp4sZp/othrSQWnJQR5v\npnCZYZnNJUdjTAYw+PyjQ2qq9YEeLHMnMzhYgaq2ata+CSsCjalkOyzLt/rD\nEBn5WHaCfYm0Vm+QbbEWVLltUqcjvScOyAwEAAQn\n"
As you can see, there is still a \n at the end of the string. Does anyone know why 4 characters are missing in Go?
===== Additional information =====
When I store it in MongoDB from my Rails API, I receive the public_key param in Base64 format, decode it to binary, and then store it with this code:
def create
  params = device_params
  public_key = Base64.decode64 device_params[:public_key]
  params[:public_key] = BSON::Binary.new(public_key, :generic)
  device = Device.find_or_create_by(id: device_params[:id])
  render_success device.update_attributes(params), device
end
Your original data contains exactly 4 newline characters \n:
Axj5A9BgkKJohGh/0BAQEFAAMBjYAAMYmCAYGACqW0smeNiaIgRgXJv7dLZ0\n
5gTeXxiI9q6h8EdbiPDLiFsv7Jgatwxm+OpucrUBp4sZp/othrSQWnJQR5vp\n
CZYZnNJUdjTAYw+PyjQ2qq9YEeLHMnMzhYgaq2ata+CSsCjalkOyzLt/rDEB\n
5WHaCfYm0Vm+QbbEWVLltUqcjvScOyAwEAAQ\n
These newline characters are not part of the Base64-encoded data; they are only there to format the input.
The problem most likely arises when the data is stored with each \n as two literal characters (a backslash followed by n), as it would be in a raw string literal, instead of as a real line break, as an interpreted string literal would produce. The decoder then has to discard these sequences.
Since the backslash is not a valid symbol in "normal" Base64, it is removed, but the n that follows is not (because n is valid in Base64). As a result, you get corrupted Base64 data.
You should either remove the line endings completely, which gives you valid Base64 that you can store as a string or as binary, or store the value as a plain string, which preserves the line endings (these then have to be dealt with during Base64 decoding).
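The mechanism described above can be reproduced with a minimal sketch. It is written in Python purely for illustration (not the Go or Ruby code from the question) and assumes a lenient decoder that silently drops characters outside the Base64 alphabet, which is how Python's base64.b64decode behaves by default. The 48-byte payload and the 16-character line width are made up for the example.

```python
import base64

raw = bytes(range(48))  # made-up payload; 48 bytes -> 64 Base64 chars, no padding
b64 = base64.b64encode(raw).decode("ascii")
lines = [b64[i:i + 16] for i in range(0, len(b64), 16)]

# Real line breaks versus backslash + 'n' as two literal characters.
with_real_newlines = "".join(line + "\n" for line in lines)
with_literal_escapes = "".join(line + "\\n" for line in lines)

# The lenient decoder discards the real newlines, so the payload survives.
decoded_real = base64.b64decode(with_real_newlines)

# Here only the backslashes are discarded; each 'n' is decoded as data.
decoded_literal = base64.b64decode(with_literal_escapes)

# Removing the two-character sequences before decoding restores the payload.
decoded_fixed = base64.b64decode(with_literal_escapes.replace("\\n", ""))

print(decoded_real == raw, decoded_literal == raw, decoded_fixed == raw)
```

Only the middle case differs from the original payload: the four stray n characters shift every following 6-bit group, so the decoded bytes no longer match.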
I am facing the following issue:
In my app, the user can also enter special characters (such as emojis) in a text field. When sending the entered text to the server in the request body, I convert it using the following code:
func emojiToUTF8() -> String
{
    let data = self.data(using: .nonLossyASCII, allowLossyConversion: true)
    let emoji = String.init(data: data!, encoding: .utf8)
    return emoji ?? self
}
For instance, if I enter the text "☺️", it gets converted into "\u263a\ufe0f" using the above method. Things are fine till here.
The problem occurs when I add this to a dictionary to send it to the server as a request parameter. The code I'm using:
var parameters = [String:String]()
parameters["feedback"] = feedBackTxt
print("Parameters:",parameters) /// output: ["feedback": "\\u263a\\ufe0f"]
So the problem is that an extra backslash is added before each backslash due to character escaping. I checked the value in the created dictionary as well, and it shows the double backslash there too. How do I avoid this? Why does this happen when I am simply creating a dictionary with a string? It is causing an issue on the server end.
I have tried a couple of things, but none of them seem to work.
Your problem is that you're double-encoding.
You're taking a string, converting it to ASCII, then re-parsing it as UTF8 and then encoding that (probably) as JSON, which is UTF8. In the process, the backslashes are being escaped by your second encoder.
The best solution to this is to rework your server to accept UTF8. However, if you can't do that, you need to ensure you encode this string just one time, in ASCII.
In short, you should get rid of emojiToUTF8 and ensure that your parameters processor encodes the way your server requires (which apparently is ASCII and not UTF8).
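The double-encoding effect can be shown with a minimal sketch in Python rather than Swift. It assumes the request body ends up as JSON (the answer above only says "probably"), and uses Python's unicode_escape codec as a stand-in for the .nonLossyASCII conversion in emojiToUTF8.

```python
import json

text = "\u263a\ufe0f"  # the smiling-face emoji from the question

# Encode once: the JSON encoder produces the \uXXXX escapes itself.
encoded_once = json.dumps({"feedback": text})

# Pre-escape by hand, then JSON-encode: the backslashes get escaped again.
pre_escaped = text.encode("unicode_escape").decode("ascii")
encoded_twice = json.dumps({"feedback": pre_escaped})

print(encoded_once)
print(encoded_twice)

# Only the single-encoded form round-trips to the original text.
roundtrip_once = json.loads(encoded_once)["feedback"]
roundtrip_twice = json.loads(encoded_twice)["feedback"]
print(roundtrip_once == text, roundtrip_twice == text)
```

The second form decodes to the twelve literal characters of the escape sequences, not to the emoji, which is what a server sees when the string has been escaped before the encoder runs.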
I have the following code:
buff=esp.flash_read(esp.flash_user_start(),50)
print(buff)
I get the following output from print:
bytearray(b'{"ssid": "mySSID", "password": "myPASSWD"}\xff\xff\xff\xff\xff\xff')
What I want to do is get the json in buff. What is the correct "Python-way" to do that?
buff is a Python bytearray, as shown by the print output beginning with bytearray(b'. To convert this into a string you need to decode it.
In standard Python you could use
buff.decode(errors='ignore')
Note that without specifying errors='ignore' you would get a UnicodeDecodeError, because the \xff bytes aren't valid in the default encoding, which is UTF-8; presumably they're padding and you want to ignore them.
If that works on the ESP8266, great! However, a note in the MicroPython docs suggests the keyword syntax might not be implemented, and I don't have an ESP8266 to test it. If not, you may need to remove the padding characters yourself:
textLength = buff.find(b'\xff')
text = buff[0:textLength].decode()
or simply:
text = buff[0:buff.find(b'\xff')].decode()
If decode isn't implemented either, which it isn't in the online MicroPython interpreter, you can use str:
text = str(buff[0:buff.find(b'\xff')], 'utf-8')
Here you have to specify explicitly that you're decoding from UTF-8 (or whatever encoding you specify).
However if what you're really after is the values encoded in the JSON, you should be able to use the json module to retrieve them into a dict:
import json
j = json.loads(buff[0:buff.find(b'\xff')])
ssid = j['ssid']
password = j['password']
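One caveat with the slicing approach: find returns -1 when no \xff byte is present, and buff[0:-1] would then silently drop the last character. A minimal sketch that handles this case follows. It is written for standard Python and is untested on MicroPython; parse_padded_json is a hypothetical helper name, not part of any library.

```python
import json

def parse_padded_json(buff, pad=b"\xff"):
    """Parse JSON from a buffer whose unused tail is filled with `pad` bytes."""
    end = buff.find(pad)
    if end == -1:  # no padding found: use the whole buffer
        end = len(buff)
    return json.loads(bytes(buff[:end]).decode("utf-8"))

buff = bytearray(b'{"ssid": "mySSID", "password": "myPASSWD"}\xff\xff\xff\xff\xff\xff')
config = parse_padded_json(buff)
print(config["ssid"], config["password"])
```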
CONCLUSION:
For some reason the flow wouldn't let me convert the incoming message to a BLOB by changing the Message Domain property of the Input Node so I added a Reset Content Descriptor node before the Compute Node with the code from the accepted answer. On the line that parses the XML and creates the XMLNSC Child for the message I was getting a 'CHARACTER:Invalid wire format received' error so I took that line out and added another Reset Content Descriptor node after the Compute Node instead. Now it parses and replaces the Unicode characters with spaces. So now it doesn't crash.
Here is the code for the added Compute Node:
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
DECLARE NonPrintable BLOB X'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF1F2F3F4F5F6F7F8F9FAFBFCFDFEFF';
DECLARE Printable BLOB X'20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020';
DECLARE Fixed BLOB TRANSLATE(InputRoot.BLOB.BLOB, NonPrintable, Printable);
SET OutputRoot = InputRoot;
SET OutputRoot.BLOB.BLOB = Fixed;
RETURN TRUE;
END;
UPDATE:
The message is being parsed as XML using XMLNSC. I thought that would cause a problem, but it does not appear to.
Now I'm using PHP. I've created a node to plug into the legacy flow. Here's the relevant code:
class fixIncompetence {
    function evaluate ($output_assembly,$input_assembly) {
        $output_assembly->MRM = $input_assembly->MRM;
        $output_assembly->MQMD = $input_assembly->MQMD;
        $tmp = htmlentities($input_assembly->MRM->VALUE_TO_FIX, ENT_HTML5|ENT_SUBSTITUTE,'UTF-8');
        if (!empty($tmp)) {
            $output_assembly->MRM->VALUE_TO_FIX = $tmp;
        }
        // Ensure there are no null MRM fields. MessageBroker is strict.
        foreach ($output_assembly->MRM as $key => $val) {
            if (empty($val)) {
                $output_assembly->MRM->$key = '';
            }
        }
    }
}
Right now I'm getting a vague error about read only messages, but before that it wasn't working either.
Original Question:
For some reason I am unable to impress upon the senders of our MQ
messages that smart quotes, endashes, emdashes, and such crash our XML
parser.
I managed to make a working solution with SQL queries, but it wasted
too many resources. Here's the last thing I tried, but it didn't work
either:
CREATE FUNCTION CLEAN(IN STR CHAR) RETURNS CHAR BEGIN
    SET STR = REPLACE('–',STR,'&ndash;');
    SET STR = REPLACE('—',STR,'&mdash;');
    SET STR = REPLACE('·',STR,'&middot;');
    SET STR = REPLACE('“',STR,'&ldquo;');
    SET STR = REPLACE('”',STR,'&rdquo;');
    SET STR = REPLACE('‘',STR,'&lsqo;');
    SET STR = REPLACE('’',STR,'&rsquo;');
    SET STR = REPLACE('•',STR,'&bull;');
    SET STR = REPLACE('°',STR,'&deg;');
    RETURN STR;
END;
As you can see I'm not very good at this. I have tried reading about
various ESQL string functions without much success.
So in ESQL you can use the TRANSLATE function.
The following is a snippet I use to clean up a BLOB containing non-ASCII low hex values so that it can then be cast into a usable character string.
You should be able to modify it to change your undesired characters into something more benign. Basically, each hex value in NonPrintable gets translated into its positional equivalent in Printable, in this case always a full stop, i.e. x'2E' in ASCII. You'll need to make your BLOBs long enough to cover the desired range of hex values.
DECLARE NonPrintable BLOB X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F';
DECLARE Printable BLOB X'2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E';
SET WorkBlob = TRANSLATE(WorkBlob, NonPrintable, Printable);
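For readers more familiar with general-purpose languages, the same positional mapping can be sketched in Python (not ESQL). The byte range chosen here, 0x00 to 0x1F, is an assumption for the sketch and is narrower than the list in the snippet above; clean_blob is a hypothetical name.

```python
# Byte values to replace: the ASCII control range 0x00-0x1F (an assumption for this sketch).
NON_PRINTABLE = bytes(range(0x20))
# One replacement byte per entry above, here always a full stop (0x2E).
PRINTABLE = b"." * len(NON_PRINTABLE)

# Position i of NON_PRINTABLE maps to position i of PRINTABLE.
TABLE = bytes.maketrans(NON_PRINTABLE, PRINTABLE)

def clean_blob(blob):
    """Return a copy of blob with every byte in NON_PRINTABLE mapped to '.'."""
    return blob.translate(TABLE)

print(clean_blob(b"AB\x00CD\x1fEF"))
```

As with the two BLOBs in the ESQL version, the two byte strings must have the same length, since the mapping is purely positional.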
BTW, if messages with invalid characters only come in every now and then, I'd probably specify BLOB on the input node and then use something similar to the following to invoke the XMLNSC parser.
CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
With the exception terminal wired up, you can then correct the BLOBs of any messages containing parser-breaking invalid characters before attempting to reparse.
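The parse-first, repair-on-failure idea can be sketched outside the broker as well. The following Python example is only an analogy: it uses the standard library's ElementTree parser in place of XMLNSC, a try/except in place of the exception terminal, and blanks out the control bytes that XML 1.0 forbids. parse_with_repair is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Control bytes that XML 1.0 forbids (tab, LF and CR are allowed, so they are left out).
BAD = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D))
TABLE = bytes.maketrans(BAD, b" " * len(BAD))

def parse_with_repair(blob):
    """Parse blob as XML; on a parse error, blank out control bytes and retry once."""
    try:
        return ET.fromstring(blob)
    except ET.ParseError:
        return ET.fromstring(blob.translate(TABLE))

root = parse_with_repair(b"<msg>bad\x01byte</msg>")
print(root.text)
```

Well-formed messages take the fast path and are never modified; only messages that fail the first parse pay for the clean-up.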
Finally, my best wishes: I've had a number of battles over the years over being forced to correct invalid message content in the "Integration Layer"; after all, that's what it's meant to do.
I have an application that builds an HTML email. Included in the content is an encoded URL parameter that might, for example, contain a promotional code or product reference. The email is generated by a Windows service (essentially a console application) and the link, when clicked is handled by an MVC web site. Here is the code for creating the email link:
string CreateLink(string domain, string code) {
    // code == "xyz123"
    string encrypted = DES3Crypto.Encrypt(code); // H3uKbdyzrUo=
    string urlParam = encrypted.EncodeBase64(); // SDN1S2JkeXpyVW890
    return domain + "/" + urlParam;
}
The action method on the MVC controller is constructed as follows:
public ActionResult Index(string id) {
    string decoded = id.DecodeBase64();
    string decrypted = DES3Crypto.Decrypt(decoded);
    ...
}
In all our testing, this mechanism worked as expected. However, now that we have gone live, we are seeing around a 4% error rate where the conversion from base-64 fails with the following exception:
The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or a non-white space character among the padding characters.
The id parameter from the URL 'looks' OK. The problem appears to be in the EncodeBase64/DecodeBase64 methods: on the failed links, DecodeBase64 returns a 'garbled' string such as "�nl����□��7y�b�8�sJ���=".
Furthermore, most of the errors are from IE6 user agents leading me to think this is a character encoding problem but I don't see why.
For reference, here is the code for my base-64 URL encoding:
public static string EncodeBase64(this string source)
{
    byte[] bytes = Encoding.UTF8.GetBytes(source);
    string encodedString = HttpServerUtility.UrlTokenEncode(bytes);
    return encodedString;
}
public static string DecodeBase64(this string encodedString)
{
    byte[] bytes = HttpServerUtility.UrlTokenDecode(encodedString);
    string decodedString = Encoding.UTF8.GetString(bytes);
    return decodedString;
}
Any advice would be much appreciated.
To recap, I was creating a URL that used a base-64 encoded parameter which was itself a Triple DES encrypted string. So the URL looked like http://[Domain_Name]/SDN1S2JkeXpyVW890 The link referenced a controller action on an MVC web site.
The URL was then inserted into an HTML formatted email. Looking at the error log, we saw that around 5% of the public users that responded to the link were throwing an "invalid base-64 string error". Most, but not all, of these errors were related to the IE6 user agent.
After trying many possible solutions based around character and URL encoding, we discovered that somewhere in the client's process the URL was being converted to lower case. This, of course, broke the base-64 encoding (as it uses both upper- and lower-case characters).
Whether the case corruption was caused by the client's browser, email client or perhaps local anti-virus software, I have not been able to determine.
The Solution
Do not use any of the standard base-64 encoding methods; use a base-32 or zBase-32 encoding instead, both of which are case-insensitive.
See the following links for more details
Base-32 - Wikipedia
MyTenPennies Base-32 .NET Implementation
The moral of the story is, Base-64 URL encoding can be unreliable in some public environments. Base-32, whilst slightly more verbose, is a better choice.
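A minimal Python sketch (not the .NET code from this thread) shows why lower-casing breaks base-64 but not base-32. It uses Python's standard Base32 alphabet rather than zBase-32 or the implementation linked above, and simulates the unknown client-side corruption with a plain call to lower().

```python
import base64

token = b"H3uKbdyzrUo="  # the encrypted value from the question, as bytes

b64 = base64.urlsafe_b64encode(token).decode("ascii")
b32 = base64.b32encode(token).decode("ascii")

# Simulate a mail client or proxy that lower-cases the URL.
b64_lowered = b64.lower()
b32_lowered = b32.lower()

# Base64 still decodes, but to different bytes, because case carries information.
b64_result = base64.urlsafe_b64decode(b64_lowered)

# Base32 survives as long as the decoder is told to accept lower case.
b32_result = base64.b32decode(b32_lowered, casefold=True)

print(b64_result == token, b32_result == token)
```

The cost is length: Base32 carries 5 bits per character instead of 6, so the encoded parameter grows by roughly 20% relative to base-64.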
Hope this helps.
It looks like you were really close. You had an extra zero coming back from your encrypted.EncodeBase64() function.
Try this:
string data = "H3uKbdyzrUo=";
string b64str = Convert.ToBase64String(UTF8Encoding.UTF8.GetBytes(data));
string clearText = UTF8Encoding.UTF8.GetString(Convert.FromBase64String(b64str));
This is an interesting issue. My guess is that IE 6 is eating some of the characters.
For example, the length of the string that you included, "ywhar0xznxpjdnfnddc0yxzbk2jnqt090", is not a multiple of four (which is a requirement for FromBase64 to work; see http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx).
But if you were to pad that string until its length is a multiple of four ("ywhar0xznxpjdnfnddc0yxzbk2jnqt090" + "a12"), then it works.
The MSDN documentation says that one ("=") or two ("==") equals signs are used for padding by the to/from Base64 methods, and I suspect IE 6 is truncating them from the string that you send.
This is total speculation but I hope it helps.
I have a D program using Tango and I'm trying to uncompress a gzipped string. Unfortunately I don't have a stream for it; the compressed data is stored in a char[]. How can I uncompress it using Tango's tango.io.compress.ZlibStream? I need another char[] with the uncompressed data.
I've been trying this for hours now. I'm not very familiar with tango.
Thank you
Edit: my code looks something like this:
char[] rawData; // decoded data goes here
Array array = new Array(e.value[4..(e.value.length-3)]); // e.value is a char[]
// array slice, cast to char[], is "H4sIAAAAAAAAA2NkYGBgHMWDBgMAjw2X0pABAAA="
// array.readable returns 40 (matches the above string)
// decoded string is expected to be 33 repetitions of "AQAAAAEAAAABAAAA"
// followed by "AQAAAA=="
auto reader = new ZlibInput(array);
ubyte[1024] buffer;
reader.read(buffer); // throws Z_DATA_ERROR
Well, never mind. It appears that whoever designed this file format compressed the data before encoding it with Base64, and I was trying to decompress data that was still Base64-encoded.
Once I Base64-decoded the string and ran gzip decompression on the resulting ubyte array, it did the trick!
Sorry about that.
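The order of operations can be illustrated with a minimal Python sketch (not D/Tango). The payload is made up for the example rather than taken from the file in question; it only mirrors the compress-then-encode layout described above.

```python
import base64
import gzip

payload = b"\x01\x00\x00\x00" * 100  # made-up data for the sketch

# The file format in question compresses first, then Base64-encodes.
stored = base64.b64encode(gzip.compress(payload))

# Wrong order: handing the Base64 text straight to the decompressor fails.
try:
    gzip.decompress(stored)
    wrong_order_failed = False
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    wrong_order_failed = True

# Right order: Base64-decode first, then decompress.
restored = gzip.decompress(base64.b64decode(stored))

print(wrong_order_failed, restored == payload)
```

A quick sanity check for this situation: Base64-encoded gzip data typically starts with the characters "H4sI", which is the encoded form of the gzip magic bytes, so a string beginning that way still needs decoding before decompression.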