Base64 decode leads to partial decode followed by non-printable characters - character-encoding

We are trying to decode Base64-encoded strings from some scraped URLs. However, the decoding works only partially and then starts producing non-printable characters. For example, here is the encoded string:
CiQ2MmE5NzI5NC01YTJhLTQzMTctOTU2Yi1lY2E3MjU3NzA2ZWEQyIgCGISAAiIDQ1BBKM609E8whIACOZhn42doAQAAQg10X3NyeGhsODZ2emxnSgJJRFIGQ2hyb21lWgxNb2JpbGUgUGhvbmViB0FuZHJvaWRqeGh0dHBzOi8vcmV2dHRtb2JpbGUuY29tL2NsaWNrLnBocD9rZXk9ZGEyZ3luNGc3bnZtanM4MDk0YWEmQ1BDPXtDUEN9JkNBTVBBSUdOX0lEPXtDQU1QQUlHTl9JRH0mQ1JFQVRJVkVfSUQ9e0NSRUFUSVZFX0lEfXIeCMaUMxD29gEYosk_IPb2ASgAMgYwLjAwNzBAAEgigAGIrpGwB4gB-o7B4wWSAQ8xMTguMTM2LjE1NC4yMjaaAQ8xMTguMTM2LjE1NC4yMjaiAQxKYWthcnRhIFJheWE=
The decoded output looks like this:
i'$fpo%?
$62a97294-5a2a-4317-956b-eca7257706ea"CPA(O09gghB
t_srxhl86vzlgJIDRChromeZMobile PhonebAndroidjxhttps://revttmobile.com/click.php?key=da2gyn4g7nvmjs8094aa&CPC={CPC}&CAMPAIGN_ID={CAMPAIGN_ID}&CREATIVE_ID={CREATIVE_ID}r3=J
b+lh0Y 3bSB##i3bSB##j 'F&
So we are able to get partial data but not everything. Any ideas on why this is happening?
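One thing worth checking, as a sketch rather than a confirmed diagnosis: the encoded string above contains - and _ characters, which belong to the URL-safe Base64 alphabet rather than the standard one (+ and /). A decoder that expects the standard alphabet and silently skips unknown characters would decode correctly up to the first _ and then drift out of alignment, which would match the symptom. The question does not say which decoder was used, so that part is an assumption. Separately, the payload looks like a binary serialization (length bytes between text fields), so some non-printable bytes are expected even after a correct decode. The helper names and the sample payload below are hypothetical.

```python
import base64


def decode_urlsafe(token: str) -> bytes:
    """Decode URL-safe Base64 ('-' and '_' instead of '+' and '/')."""
    # Restore any '=' padding that may have been stripped from the URL.
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)


def printable_view(raw: bytes) -> str:
    """Show printable ASCII as-is and every other byte as a \\xNN escape."""
    return "".join(chr(b) if 32 <= b < 127 else f"\\x{b:02x}" for b in raw)


# Hypothetical payload: a tag byte, a length byte, text, then binary fields.
original = b"\x0a\x05hello\x10\xfb\xff"
sample = base64.urlsafe_b64encode(original).decode("ascii")

print(sample)
print(printable_view(decode_urlsafe(sample)))
```

Viewing the result through something like printable_view makes it easier to tell real text fields apart from binary framing bytes, instead of printing the raw bytes to a terminal.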

Related

UTF-8 Chars in FTP Greeting

I tried to use Unicode characters in my FTP server's greeting, but the client seems to read them as two different characters each. Because of this, I need a way to encode them into UTF-8. For now, I have the greeting HTML encoded because I am displaying it on a webpage, but on any other client it will display the encoding. How can I set the greeting to be parsed as UTF-8? And if I can't, then is there a way I can parse the greeting correctly?
EDIT: Answered my own question, see below.
I found the answer to the question. It was actually UTF-8 encoded already, and I had to decode it from UTF-8. Here is what I did:
decodeURIComponent(escape(greeting))
Don't forget to replace the line breaks with <br> if you are displaying it on a webpage like I am!
decodeURIComponent(escape(greeting)).replace(/\n/g,'<br>')
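For comparison, here is a minimal Python sketch of the same repair. It assumes the greeting arrived as UTF-8 bytes that were read one byte per character, which is what the decodeURIComponent(escape(...)) trick undoes. The function name and the sample greeting are made up for illustration.

```python
def fix_mojibake(text: str) -> str:
    """Re-interpret a string whose characters are really UTF-8 bytes."""
    # Each character is in the range 0-255, so Latin-1 maps it back to the
    # single byte it came from; those bytes are then decoded as UTF-8.
    return text.encode("latin-1").decode("utf-8")


# Hypothetical greeting, garbled the same way the FTP client garbled it.
greeting = "Welcome to the café\n220 ready"
garbled = greeting.encode("utf-8").decode("latin-1")

print(garbled)
print(fix_mojibake(garbled).replace("\n", "<br>"))
```

If the text contains characters above code point 255, the encode step raises UnicodeEncodeError, which usually means the string was not garbled in this particular way.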

How long can GET URL #anchor be?

I'm trying to encode the state of a webpage in the #anchor by Base64-encoding a JSON string. Sometimes this string gets very long, 10KB for example. I'm not certain I can keep all this data in the #anchor.
Does anyone know if #anchor is part of URL length definition?
If it is then is it 2048? or is it 4096? There are conflicting answers about URL length.
If it isn't then how much data can I put in #anchor?

Convert Unicode to UTF-8 before sending json

My Rails app gets certain data in its database from another application. That data is stored as text and may contain some Unicode characters. My Rails app does have UTF-8 set as the default in the config, but when that data is sent as JSON to the Backbone front-end, those Unicode characters are not converted properly, and the front-end displays ? or smart quotes instead of the proper character. How do I force the Rails backend to convert Unicode characters to UTF-8 in the JSON?
You can call .encode('UTF-8') on each field, which is not ideal. Alternatively, you can write your own JSON serializer, where you can encode to any encoding you want:
http://matthewrobertson.org/blog/2013/08/06/active-record-serializers-from-scratch/
or patch the built-in one:
http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html

SHA256 implementation using Base64 for input and output

I've been asked to develop the company's backoffice for the iPad and, while developing the login screen, I've run into an issue with the authentication process.
The passwords are concatenated with a salt, hashed using SHA-256 and stored in the database.
The backoffice is Flash-based and uses the as3crypto library to hash the password+salt. My problem is that the current implementation uses Base64 for both input and output.
This site demonstrates how this can be done: just select Hash and select Base64 for both input and output format and fire away. So far, all my attempts have yielded different results from the ones this site (and the backoffice code) give me.
I think that in theory it should be relatively simple:
1. Base64 encode the pass+salt
2. Hash it using SHA-256
3. Base64 encode the result again
so far I haven't been able to do this and I'm getting quite the headache to be honest.
My code is becoming a living maze; I'll have to redo it tomorrow, I reckon.
Any ideas?
Cheers and thanks in advance
PS: Here's the Backoffice's Flash code for generating hashed passwords by the way:
var currentResult:ByteArray;
var hash:IHash = Crypto.getHash('sha256');
var data:ByteArray = Base64.decodeToByteArray(str + vatel);
currentResult = hash.hash(data);
return Base64.encodeByteArray(currentResult).toString();
The backoffice code does not do what you describe above:
1. Base64 encode the pass+salt
2. Hash it using SHA-256
3. Base64 encode the result again
Instead, what it does is:
1. Base64 decode the pass+salt string into a byte array
2. Hash the byte array using SHA-256
3. Base64 encode the resulting hash bytes, returning a string
Regarding step 1 above, it's unclear what kind of character encoding the input strings use. You need to make sure that both systems use the same encoding for the input strings! UTF-8, UTF-16LE or UTF-16BE makes a world of difference in this case!
Start by finding out the correct character encoding to use on the iOS side.
Oh, and Matt Gallagher has written an easy to use wrapper class for hashes to use on iOS, HashValue.m, I've used it with good results.
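To make the difference between the two procedures concrete, here is a rough Python sketch of both. It is not the as3crypto implementation: the function names are invented, and how Base64.decodeToByteArray treats input that is not valid Base64 (for example a plain password+salt string) is not known from the question. Python's b64decode is strict about padding, so the sample input here is deliberately valid Base64.

```python
import base64
import hashlib


def decode_hash_encode(b64_input: str) -> str:
    """What the Flash code appears to do: decode, hash, then encode."""
    raw = base64.b64decode(b64_input)          # Base64 text -> bytes
    digest = hashlib.sha256(raw).digest()      # SHA-256 over the bytes
    return base64.b64encode(digest).decode("ascii")


def encode_hash_encode(text: str) -> str:
    """What the question describes: encode, hash, then encode again."""
    # Assumption: the text is turned into bytes as UTF-8 first.
    b64 = base64.b64encode(text.encode("utf-8"))
    digest = hashlib.sha256(b64).digest()
    return base64.b64encode(digest).decode("ascii")


sample = "aGVsbG8="  # Base64 for the bytes of "hello"
print(decode_hash_encode(sample))
print(encode_hash_encode("hello"))
```

The two functions produce different strings for the same logical input, which is one way the iPad results could differ from the backoffice results even when SHA-256 itself is implemented correctly.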

Weird characters on HTML page

I am using the Last.fm API to fetch some info about artists. I save the info in a DB and then display it on my webpage.
But characters like “ (double quote) are shown as â€œ.
Example artist info: http://www.last.fm/music/David+Penn
and I got the first line as " Producer, arranger, dj and musician from Madrid-Spain. He has his own record company â€œZen Recordsâ€, and ".
My DB is UTF-8, but I don't know why this error still occurs.
This seems to be a character encoding error. Confirm that you are reading the webpage as the correct encoding and are showing the results in the correct encoding.
You should be using UTF-8 all the way through. Check that:
your connection to the database is UTF-8 (using mysql_set_charset);
the pages you're outputting are marked as UTF-8 (<meta http-equiv="Content-Type" content="text/html;charset=utf-8">);
when you output strings from the database, you HTML-encode them using htmlspecialchars() and not htmlentities().
htmlentities HTML-encodes all non-ASCII characters, and by default assumes you are passing it bytes in ISO-8859-1. So if you pass it “ encoded as UTF-8 (bytes 0xE2, 0x80, 0x9C), you'd get â€œ instead of the expected “ or &ldquo;. This can be fixed by passing in utf-8 as the optional $charset argument.
However, it's usually easier to just use htmlspecialchars() instead, as this leaves non-ASCII characters alone, as raw bytes instead of HTML entity references. This results in a smaller page output, so is preferable as long as you're sure the HTML you're producing will keep its charset information (which you can usually rely on, except in contexts like sending snippets of HTML in a mail or something).
htmlspecialchars() does have an optional $charset argument too, but setting it to utf-8 is not critical since that results in no change of behaviour over the default ISO-8859-1 charset. If you are producing output in old-school multibyte encodings like Shift-JIS you do have to worry about setting this argument correctly, but today that's quite rare as most sane people use UTF-8 in preference.
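The byte-level effect described above can be reproduced outside PHP. The sketch below uses Python only as an illustration: it decodes the UTF-8 bytes of “ with Windows-1252 (the encoding browsers typically apply to content labelled ISO-8859-1), and uses html.escape as a loose analogue of htmlspecialchars(), not an exact equivalent.

```python
import html

left_quote = "\u201c"                      # the character “
utf8_bytes = left_quote.encode("utf-8")    # b'\xe2\x80\x9c'

# Treating each UTF-8 byte as its own single-byte character gives mojibake.
mojibake = utf8_bytes.decode("cp1252")
print(utf8_bytes.hex(), "->", mojibake)

# Escaping only the HTML-special characters leaves non-ASCII text untouched.
print(html.escape('He said ' + left_quote + 'hi" & left'))

# The damage is reversible as long as every byte survived the round trip.
print(mojibake.encode("cp1252").decode("utf-8") == left_quote)
```

This also suggests why the closing quote in the Last.fm example appears as only two characters: its third UTF-8 byte (0x9D) has no printable mapping in Windows-1252.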

Resources