I have an MS Word document that I have converted from a Blob to a Base64-encoded string. The document contains the string <z></z>.
I also have the Base64-encoded form of <z></z> on its own.
But when I search for it in the string converted from the blob data, I cannot find it.
Can you tell me what I am doing wrong?
Blob beforeblob1 = Blob.valueOf(vDovMerge.Merge_Text__c);
String vDovMergeBlob = EncodingUtil.base64Encode(beforeblob1 );
String v = EncodingUtil.base64Encode(vDoc.Body);
system.debug('****v****'+v);
Blob beforeblob = Blob.valueOf('<z></z>');
String rep = EncodingUtil.base64Encode(beforeblob );
system.debug('****rep****'+rep );
v = v.replace(rep ,vDovMergeBlob );
system.debug('****v****'+v);
Base64 encoding converts every 3 bytes of input into 4 characters of output. Only when <z></z> is encoded on its own is it guaranteed to start at the first byte of a 3-byte block. When it is encoded as part of a larger block of data, it may start at the second or third byte of a block, which produces completely different output; the output even depends on the data surrounding your block.
Example:
Assuming ASCII-encoding
encoding <z></z> results in PHo+PC96Pg==
encoding a<z></z> results in YTx6Pjwvej4=
encoding aa<z></z> results in YWE8ej48L3o+
encoding aaa<z></z> results in YWFhPHo+PC96Pg== which again contains the original encoding since it starts on a 3-byte-boundary.
So the only way to search the base64-encoded data would be to treat it as a bit-stream and search for the bit-pattern of <z></z> without respect to byte-boundaries - doesn't sound like a lot of fun to me :-(
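A simpler route, when the decoded bytes are available, is to decode first, replace on the raw bytes, and re-encode. The sketch below is in Python rather than Apex and only illustrates the idea: the helper names `b64` and `replace_in_base64` are made up for this example, and it assumes the marker really appears in the decoded document as contiguous ASCII bytes, which is not guaranteed for a binary Word file.

```python
import base64

MARKER = b"<z></z>"


def b64(data: bytes) -> str:
    """Base64-encode bytes and return the result as text."""
    return base64.b64encode(data).decode("ascii")


# The marker's encoded form depends on its offset modulo 3 in the input.
encodings = {prefix: b64(prefix + MARKER) for prefix in (b"", b"a", b"aa", b"aaa")}


def replace_in_base64(encoded: str, old: bytes, new: bytes) -> str:
    """Decode, replace on the raw bytes, then re-encode (hypothetical helper)."""
    raw = base64.b64decode(encoded)
    return b64(raw.replace(old, new))


# The marker starts at offset 2 here, so a text search for its
# standalone encoding would fail, but the byte-level replace works.
doc = b64(b"aa<z></z> tail")
patched = replace_in_base64(doc, MARKER, b"merged text")
```

Working on the decoded bytes sidesteps the alignment problem entirely, at the cost of decoding and re-encoding the whole document.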
I have this little code:
import 'dart:convert';

void main(List<String> args) {
const data = 'amigo+/=:chesu';
var encoded = base64Encode(utf8.encode(data));
var encoded2 = base64Encode(data.codeUnits);
var decoded = utf8.decode(base64Decode(encoded));
var decoded2 = utf8.decode(base64Decode(encoded2));
print(encoded);
print(encoded2);
print(decoded);
print(decoded2);
}
The output is:
YW1pZ28rLz06Y2hlc3U=
YW1pZ28rLz06Y2hlc3U=
amigo+/=:chesu
amigo+/=:chesu
The codeUnits property gives an unmodifiable list of the UTF-16 code units. Is it OK to use the utf8.decode function on the result, or what function should be used for encoded2?
It's simply not a good idea to do base64Encode(data.codeUnits) because base64Encode encodes bytes, and data.codeUnits isn't necessarily bytes.
Here they happen to be bytes, because all the characters of the string have code points below 256 (they are in fact all ASCII).
Using utf8.encode before base64Encode is good. It works for all strings.
The best way to convert from UTF-16 code units to a String is String.fromCharCodes.
Here you are using base64Encode(data.codeUnits) which only works if the data string contains only code units up to 255. So, if you assume that, then it means that decoding that can be done using either latin1.decode or String.fromCharCodes.
Using ascii.decode and utf8.decode also works if the string only contains ASCII (which it does here, but which isn't guaranteed by base64Encode succeeding).
In short, don't do base64Encode(data.codeUnits). Convert the string to bytes before doing base64Encode, then use the reverse conversion to convert bytes back to strings.
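The same rule can be sketched in Python instead of Dart: convert the string to bytes with an explicit charset before base64 encoding, and reverse with the same charset. The helper names `encode_text`, `decode_text` and `latin1_is_lossless` are invented for this illustration.

```python
import base64


def encode_text(text: str) -> str:
    """String -> UTF-8 bytes -> base64 text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded: str) -> str:
    """base64 text -> bytes -> string, using the same charset as encode_text."""
    return base64.b64decode(encoded).decode("utf-8")


def latin1_is_lossless(text: str) -> bool:
    """True only if every character has a code point below 256."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


data = "amigo+/=:chesu"
encoded = encode_text(data)

# Characters above U+00FF survive the UTF-8 round trip,
# but could not be treated as one byte per code unit.
smart = "use \u201csmart\u201d symbols"
round_trip = decode_text(encode_text(smart))
```

The `latin1_is_lossless` check mirrors the condition under which encoding raw code units happens to work: every code unit must fit in a single byte.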
I tried this
print(utf8.decode('use âsmartâ symbols like â thisâ'.codeUnits));
and got this
use “smart” symbols like ‘ this’
The ” and ‘ are smart characters from iOS keyboard
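What happens in that snippet is that text whose UTF-8 bytes were wrongly read as Latin-1 gets its code units reinterpreted as bytes and decoded as UTF-8 again. A rough Python equivalent, assuming every character of the garbled string has a code point below 256:

```python
original = "use \u201csmart\u201d symbols like \u2018 this\u2019"

# Simulate the damage: UTF-8 bytes wrongly decoded as Latin-1.
mojibake = original.encode("utf-8").decode("latin-1")

# Repair: treat each code point as one byte, then decode as UTF-8.
# bytes() raises ValueError if any code point is above 255.
repaired = bytes(ord(ch) for ch in mojibake).decode("utf-8")
```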
let data = Data(base64Encoded:"aGVsbG8gd29ybGQ=" ,options:.ignoreUnknownCharacters)
What does "aGVsbG8gd29ybGQ=" mean?
Language:Swift 3
aGVsbG8gd29ybGQ= is hello world encoded as base64.
You can check it yourself on this site.
Base64 encoding is a way to encode something (this could be hello world, or an image, or anything) as a string value. You can then decode it to get the original data, so aGVsbG8gd29ybGQ= becomes hello world.
An image would turn into a string and decode back to an image (bits / bytes).
The swift example you are giving is decoding the base64 encoded string to a Data object.
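The decoding can be checked with any base64 library. A short Python sketch, assuming the decoded bytes are UTF-8 text:

```python
import base64

encoded = "aGVsbG8gd29ybGQ="

# base64 text -> raw bytes (the counterpart of the Swift Data object)
decoded_bytes = base64.b64decode(encoded)

# raw bytes -> text, assuming the bytes are UTF-8
decoded_text = decoded_bytes.decode("utf-8")

# Encoding the bytes again gives back the original string.
re_encoded = base64.b64encode(decoded_bytes).decode("ascii")
```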
I'm having issues working with iOS Swift 2.0 to perform an XOR on a [UInt8] and convert the XORd result to a String. I'm having to interface with a crude server that wants to do simple XOR encryption with a predefined array of UInt8 values and return that result as a String.
Using iOS Swift 2.0 Playground, create the following array:
let xorResult : [UInt8] = [24, 48, 160, 212] // XORd result
let result = NSString(bytes: xorResult, length: xorResult.count, encoding: NSUTF8StringEncoding)
The result is always nil. If you remove the 160 and 212 values from the array, NSString is not nil. If I switch to NSUTF16StringEncoding then I do not receive nil, however, the server does not support UTF16. I have tried converting the values to a hex string, then converting the hex string to NSData, then try to convert that to NSUTF8StringEncoding but still nil until I remove the 160 and 212. I know this algorithm works in Java, however in Java we're using a combination of char and StringBuilder and everything is happy. Is there a way around this in iOS Swift?
To store an arbitrary chunk of binary data as a string, you need
a string encoding which maps each single byte (0 ... 255) to some
character. UTF-8 does not have this property: for example, 160 (0xA0)
is a UTF-8 continuation byte and is not valid on its own, and 212 (0xD4)
starts a two-byte sequence that must be followed by a continuation byte.
The simplest encoding with this property is ISO Latin 1, aka
ISO-8859-1, which is the
ISO/IEC 8859-1
encoding when supplemented with the C0 and C1 control codes.
It maps the Unicode code points U+0000 .. U+00FF
to the bytes 0x00 .. 0xFF (compare 8859-1.TXT).
This encoding is available for
(NS)String as NSISOLatin1StringEncoding.
Please note: The result of converting an arbitrary binary chunk to
a (NS)String with NSISOLatin1StringEncoding will contain embedded
NUL and control characters. Some functions behave unexpectedly
when used with such a string. For example, NSLog() terminates the
output at the first embedded NUL character. This conversion
is meant to solve OP's concrete problem (creating a QR-code which
is recognized by a 3rd party application). It is not meant as
a universal mechanism to convert arbitrary data to a string which may
be printed or presented in any way to the user.
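The difference between the two encodings can be seen on the array from the question. This is a Python sketch rather than Swift, and it assumes the receiving side also interprets the string as ISO-8859-1:

```python
xor_result = bytes([24, 48, 160, 212])

# Strict UTF-8 decoding rejects the stray bytes 160 and 212.
try:
    xor_result.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every byte value to the code point with the same number.
as_text = xor_result.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# The mapping is reversible, so no data is lost.
back_to_bytes = as_text.encode("latin-1")
```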
decodedData is nil, but my base64String contains an extremely long string.
Encode
var imgProfile:NSData = UIImagePNGRepresentation(imgUI)
let base64String = imgProfile.base64EncodedStringWithOptions(.allZeros)
Decode
let base64String = prefs.valueForKey("imgDefault") as? String
let decodedData = NSData(base64EncodedString: base64String!, options: NSDataBase64DecodingOptions(rawValue: 0) )
var decodedimage = UIImage(data: decodedData!)
I'm having trouble outputting my image from base64
base64 string ENCODE before inserting into db:
iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAMAAADDpiTIAAAADFBMVEXFxcX////p6enW1tbAmiBwAAAAHGlET1QAAAACAAAAAAAAAQAAAAAoAAABAAAAAQAAAAYAppse6QAABcxJREFUeAHs3et22jAQReEQ3v+di8sikDaxZeGLdPTxpzS202jP9pwRkNWPDw8EEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQiCVwvV4/p8fl9fH3K7cjsau2sBuBqfKvVf/x+ecnDQJtKSn9qw83DQIpDLqktcV/iECCAGFuXf9Rz6o/5UHPEtTe+t9V0Qj6dGCb6t9d4EBvDrzb+b/3gOlvsqAjB97M/f+rf/+KjUEfDuxU/nsb6APBwD/llsn/UycwDTQt1443/9MGSdCqA4eUXxK0Wv6P917xed7hJc8+m6Uw7A922O1/90MOtGXaweWXA8OXnwINKXAtSe1dzvGWcQManND9nzIZBU434MjZ/1n45zP7gVMVOPX2v1ugCZxowHnp/2wBl4tJ4CwFzm7/DwvEwCkGNND+vwzQBI5XoI32/1CAAUcb0Er7fxggBg41oKH2/2WAJnCcAg3Wf/rU4HEABv+Xmqw/Aw6zsq3x7xEB0596wBEStFt/Boxefwbsb0Br27/XBJie2w7u60Dr9WfA6PVnwJ4GtDz/PbPAXmAvB/qov0lw9PozYB8Dern/pyyQAts70Ojrv8/of33mfYHNBeiq/t4X2L
base64 string DECODE when pulling down from db:
Optional("iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAMAAADDpiTIAAAADFBMVEXFxcX////p6enW1tbAmiBwAAAAHGlET1QAAAACAAAAAAAAAQAAAAAoAAABAAAAAQAAAAYAppse6QAABcxJREFUeAHs3et22jAQReEQ3v
di8sikDaxZeGLdPTxpzS202jP9pwRkNWPDw8EEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQQAABBBBAAAEEEEAAAQQQiCVwvV4/p8fl9fH3K7cjsau2sBuBqfKvVf/x
ecnDQJtKSn9qw83DQIpDLqktcV/iECCAGFuXf9Rz6o/5UHPEtTe
t9V0Qj6dGCb6t9d4EBvDrzb b/3gOlvsqAjB97M/f rf/
KjUEfDuxU/nsb6APBwD/llsn/UycwDTQt1443/9MGSdCqA4eUXxK0Wv6P917xed7hJc8
m6Uw7A922O1/90MOtGXaweWXA8OXnwINKXAtSe1dzvGWcQManND9nzIZBU434MjZ/1n45zP7gVMVOPX2v1ugCZxowHnp/2wBl4tJ4CwFzm7/DwvEwCkGNND
vwzQBI5XoI32/1CAAUcb0Er7fxggBg41oKH2/2WAJnCcAg3Wf/rU4HEABv
Xmqw/Aw6zsq3x7xEB0596wBEStFt/Boxefwbsb0Br27/XBJie2w7u60Dr9WfA6PVnwJ4GtDz/PbPAXmAvB/qov0lw9PozYB8Dern/pyyQAts70Ojrv8/of33mfYHNB
There are two different problems:
It would appear that the + characters have been replaced with spaces. That will happen if you submit an application/x-www-form-urlencoded request without percent escaping the + characters. This probably happened when you first sent the base64 string to be stored in the database.
See https://stackoverflow.com/a/24888789/1271826 for a discussion of some percent encoding patterns. The key point here is to not rely upon stringByAddingPercentEscapesUsingEncoding, because that will allow + characters to go unescaped.
The string is also missing the trailing = characters. (The string's length should be a multiple of four, and in this case, it's two characters short, so there should be == at the end of the rendition with the + characters in it, the "before" string.) While that is sometimes a mistake made by poorly designed base64 encoders, this is not a problem that base64EncodedStringWithOptions suffers from.
In this case, it looks like a much longer base64 string must have been truncated somehow. (Your strings are suspiciously close to 1024 characters. lol.) This truncation could happen if you put the parameters in the URL rather than the body of the request. But nothing in this code sample would account for this behavior, so the problem rests elsewhere.
But look at the length of the original NSData. The base64 string should be about 1/3 larger than that (rounded up to the nearest multiple of four characters, once you include the trailing = characters).
And, once you decode the string that you've provided and look at the actual contents, you can also see that the base64 string was truncated. (According to the portion provided, there should be 1484 bytes of IDAT data, and there's not, plus there's no IEND chunk ... don't worry about those details, but rest assured that it's basically saying that the PNG data stream is incomplete.)
If you're getting nil returned, then your base64 string is not valid. NSData(base64EncodedString:options:) requires a base64 string that is padded with = to a multiple of 4.
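The length rule and the two kinds of damage described above (spaces in place of + and missing = padding) can be sketched in Python. The function names are hypothetical, and the repair only makes the string decodable again: bytes lost to truncation cannot be recovered this way.

```python
import base64
import math


def expected_base64_length(byte_count: int) -> int:
    """Padded base64 length: 4 characters for every started group of 3 bytes."""
    return 4 * math.ceil(byte_count / 3)


def repair_form_damage(received: str) -> str:
    """Undo '+' -> ' ' substitution and restore '=' padding (hypothetical helper).

    A length of 1 modulo 4 is never valid base64, and this
    function does not try to handle that case.
    """
    restored = received.replace(" ", "+")
    padding = -len(restored) % 4
    return restored + "=" * padding


# Four bytes whose encoding starts with "++++", as a worst case for form posts.
original = bytes([0xFB, 0xEF, 0xBE, 0x01])
damaged = "    AQ"  # '+' became ' ' and the '==' padding was lost
repaired = repair_form_damage(damaged)
```

Comparing `expected_base64_length` for the original data size with the length of the stored string is a quick way to confirm truncation.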
Here's a similar issue (except in Obj-C).
NSData won't accept valid base64 encoded string
I am having issues parsing text files that have illegal characters (binary markers) in them. An example would be something like the following:
test.csv
^000000^id1,text1,text2,text3
Here the ^000000^ is a textual representation of illegal characters in the source file.
I was thinking about using the java.nio to validate the line before I process it. So, I was thinking of introducing a Validator trait as follows:
import java.nio.charset._
trait Validator{
private def encoder = Charset.forName("UTF-8").newEncoder
def isValidEncoding(line:String):Boolean = {
encoder.canEncode(line)
}
}
Do you guys think this is the correct approach to handle the situation?
Thanks
It is too late once you already have a String: UTF-8 can always encode any string*. You need to go back to the point where you initially decode the file.
ISO-8859-1 is an encoding with interesting properties:
Literally any byte sequence is valid ISO-8859-1
The code point of each decoded character is exactly the same as the value of the byte it was decoded from
So you could decode the file as ISO-8859-1 and strip every character outside printable ASCII:
//Pseudo code
str = file.decode("ISO-8859-1");
str = str.replace( "[\u0000-\u001F\u007F-\u00FF]", "");
You can also iterate line-by-line, and ignore each line that contains a character in [\u0000-\u001F\u007F-\u00FF], if that's what you mean by validating a line before processing it.
It also occurred to me that the binary marker could be a BOM. You can use a hex editor to view the values.
*Except those with illegal surrogates which is probably not the case here.
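A runnable version of the same idea, in Python rather than Scala. The function names are invented for this sketch, the sample bytes are made up, and the character class covers all C0 controls (U+0000 to U+001F), DEL, and the upper half of ISO-8859-1:

```python
import re

# Everything outside printable ASCII in the ISO-8859-1 range.
NON_PRINTABLE = re.compile(r"[\x00-\x1f\x7f-\xff]")


def strip_non_printable(raw_line: bytes) -> str:
    """Decode as ISO-8859-1 (never fails) and drop non-printable characters."""
    text = raw_line.decode("iso-8859-1")
    return NON_PRINTABLE.sub("", text)


def is_valid_utf8(raw_line: bytes) -> bool:
    """Validate at the byte level, before any String exists.

    Note that control characters such as NUL are still valid UTF-8.
    """
    try:
        raw_line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample: binary marker bytes followed by ordinary CSV text.
raw = b"\x00\x00\xfe\xffid1,text1,text2,text3"
cleaned = strip_non_printable(raw)
```

The key design point is that both functions take bytes, not an already decoded string, so the check happens at the point where the file is first decoded.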
Binary data is not a string. Don't try to hack around input sequences that would be illegal upon conversion to a String.
If your input is an arbitrary sequence of bytes (even if many of them conform to ASCII), don't even try to convert it to a String.