swift base64EncodedStringWithOptions us-ascii - ios

I'm trying to Base64-encode data, but as US-ASCII.
The problem is that the Base64 data method uses UTF-8.
Quote from the docs:
"Create a Base-64, UTF-8 encoded NSData from the receiver's contents using the given options."
Is it possible to create Base64 output that is US-ASCII encoded?
Thanks in advance.

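Base64 is used for transferring binary data in text-based formats such as MIME (http://en.wikipedia.org/wiki/Base64), and all characters in a Base64-encoded string fall within ASCII. Since UTF-8 encodes every ASCII character as the same single byte, the UTF-8 output is already valid US-ASCII.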
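To illustrate the point, here is a minimal sketch in Python rather than Swift (the variable names are only for illustration). It checks that Base64 output contains only ASCII bytes, so decoding it as US-ASCII or as UTF-8 gives the same text:

```python
import base64

# Arbitrary binary input, including every byte value outside the ASCII range.
payload = bytes(range(256))

# The encoder returns bytes drawn only from the Base64 alphabet.
encoded = base64.b64encode(payload)

# Every output byte is below 0x80, so the output is valid US-ASCII.
all_ascii = all(b < 0x80 for b in encoded)

# Interpreting the same bytes as US-ASCII or as UTF-8 yields the same string.
as_ascii = encoded.decode("ascii")
as_utf8 = encoded.decode("utf-8")
same_text = as_ascii == as_utf8
```

Both `all_ascii` and `same_text` should come out true for any input, which is why no separate US-ASCII variant of the method is needed.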

Related

How to parse Byte Order Mark UTF-8 text file in objective C ios

I sent a text file as an attachment from WhatsApp, and when I open the sent file in my iPhone app I see =EF=BB=BF at the start, which means it is a UTF-8 file with a BOM. My question is: why is there an '=' character before every code instead of 0x?
Also, all emoji appear in this style: =F0=9F=98=9D. How can I convert this into plain text in Objective-C? Any help is much appreciated.
Thanks
What you're seeing is quoted-printable encoding. This encoding scheme escapes non-printable ASCII or 8-bit characters as =xx where xx is the hex value of the byte. Quoted-printable is mainly used in email transmission. See the question Objective-C decode quoted printable text for tips on decoding.
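As a rough illustration of the decoding step, here is a sketch in Python rather than Objective-C, using the standard `quopri` module. The sample input is made up from the byte sequences quoted in the question; the `utf-8-sig` codec is used so the leading BOM is dropped:

```python
import quopri

# Hypothetical sample: BOM, some text, then the emoji bytes from the question.
raw = b"=EF=BB=BFHello =F0=9F=98=9D"

# Step 1: undo quoted-printable, turning each =xx back into one byte.
decoded_bytes = quopri.decodestring(raw)

# Step 2: interpret the bytes as UTF-8; "utf-8-sig" also strips the BOM.
text = decoded_bytes.decode("utf-8-sig")
```

After these two steps `text` holds "Hello" followed by the emoji U+1F61D, with no BOM in front.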

Why JDK8's Base64 uses ISO-8859-1?

I'm writing my own BASE64 encoder/decoder for some constrained environments.
And I found that Base64.Encoder#encodeToString says it uses ISO-8859-1 to construct a String from the encoded bytes.
I presume that the ISO-8859-1 charset also covers the whole Base64 alphabet.
Is there any possible reason not to use US-ASCII?
I suspect it's more efficient: converting from ISO-8859-1 back to text is just a matter of promoting each byte straight to a char, whereas for ASCII you'd need to check that the byte is valid ASCII. The result for base64 will always be the same, of course.
(That's only a guess, but an educated one. You could always run benchmarks if you want to validate it...)
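The byte-promotion argument can be sketched as follows. This is Python, not the JDK implementation, and it only demonstrates the equivalence, not the performance claim:

```python
import base64

encoded = base64.b64encode(b"any \x00\xff binary input")

# ISO-8859-1 decoding maps each byte to the code point with the same value.
promoted = "".join(chr(b) for b in encoded)
via_latin1 = encoded.decode("iso-8859-1")

# ASCII decoding gives the same text, but has to validate each byte first.
via_ascii = encoded.decode("ascii")

# A byte >= 0x80 shows the difference: ISO-8859-1 accepts it, ASCII rejects it.
latin1_accepts = b"\xe9".decode("iso-8859-1") == "\u00e9"
try:
    b"\xe9".decode("ascii")
    ascii_rejects = False
except UnicodeDecodeError:
    ascii_rejects = True
```

For Base64 output the three strings are identical, since the alphabet never leaves the ASCII range.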

What's the characterset of SHA1?

I need to know which characters SHA1 will generate for me.
Is it possible to know the character set of SHA1? Or, if it's configurable, what is its default character set?
Thank you.
SHA-1 doesn't generate text, it generates a binary hash (like most digests), so it doesn't have a charset (or care about the input's charset for that matter).
You can represent it as text (a string representation of the hex value, and base64 are popular) if you want, especially if you need to transfer it over the network or display it to users. That encoding is up to you.
I'm fairly sure it's just binary data rather than any character encoding. You could then encode that in Base64 if you like.
The hash algorithm SHA1 takes a stream of bytes as input and calculates a 160-bit digest. Command-line versions output the digest as a hexadecimal string. No charsets are involved.
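A short Python sketch of the distinction the answers draw, using `hashlib` (the input `b"abc"` is just an example): the digest itself is 20 raw bytes, and hex or Base64 are text representations chosen afterwards.

```python
import base64
import hashlib

# The input is bytes, not text; any charset decision happens before hashing.
digest = hashlib.sha1(b"abc").digest()

# The raw digest is 160 bits = 20 bytes of binary data.
digest_length = len(digest)

# Two common text representations of the same 20 bytes.
hex_text = digest.hex()
b64_text = base64.b64encode(digest).decode("ascii")
```

Here `hex_text` is 40 hexadecimal characters and `b64_text` is 28 Base64 characters; both decode back to the same 20 bytes.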

invalid token error while parsing an XML file with UTF-8 encoding

I get an invalid token error while parsing an XML file with UTF-8 encoding.
The error occurs when the parser encounters the extended ASCII character 'â' { "â", "â" }.
When I change the encoding from UTF-8 to ISO-8859-1, parsing succeeds. But my application should support UTF-8, ASCII and extended ASCII characters. What should I do?
Any ideas are welcome.
Thanks in Advance for your time and solution.
Telling a parser that a latin-1 file is UTF-8 by setting the encoding attribute of the XML declaration will result in an error similar to that which you report.
If the 'â' character (U+00E2) appears in a UTF-8 encoded file, then that character will be encoded in that file as a two byte sequence. So if you are not changing the bytes in the file when you say you are changing the encoding, you are not changing the encoding of the file, only telling the parser that a non-UTF-8 file is UTF-8.
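The mismatch can be reproduced with a small Python sketch using `xml.etree.ElementTree`. The `parse` helper and the one-element document are made up for illustration; the asker's parser and file are not known:

```python
import xml.etree.ElementTree as ET

# U+00E2 is two bytes in UTF-8 (C3 A2) but a single byte in ISO-8859-1 (E2).
utf8_bytes = "\u00e2".encode("utf-8")
latin1_bytes = "\u00e2".encode("iso-8859-1")

def parse(declared: str, body: bytes):
    """Hypothetical helper: wrap body in a document declaring the given encoding."""
    header = b'<?xml version="1.0" encoding="' + declared.encode("ascii") + b'"?>'
    return ET.fromstring(header + b"<a>" + body + b"</a>").text

# Declaration and bytes agree: both documents parse to the same character.
ok_utf8 = parse("UTF-8", utf8_bytes)
ok_latin1 = parse("ISO-8859-1", latin1_bytes)

# Declaration says UTF-8 but the bytes are ISO-8859-1: the parser rejects it.
try:
    parse("UTF-8", latin1_bytes)
    mismatch_failed = False
except ET.ParseError:
    mismatch_failed = True
```

The fix is therefore to make the declared encoding match the bytes actually in the file, for example by re-saving the file as UTF-8, rather than only editing the declaration.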

Percent Encoded UTF-8 to Ascii(8-bit) conversion

I'm reading in URLs, and they often contain percent-encoded characters.
Example: %C3%A9 is actually é
According to http://www.microsystools.com/products/sitemap-generator/faq/character-percentage-url-encoding/ , characters in the upper half of 8-bit ASCII (128-255) are encoded as UTF-8, and then their bytes are saved as hex. Now, when I get my URL, the %HEX sequences have been re-encoded as 8-bit ASCII, and I need to convert them back to their true 8-bit ASCII characters. Is there a function or library I can use? If not, how would I go about the conversion?
I'm using C/C++.
First you need to URLDecode. Not a function available in cross-platform C++, but, luckily for you, not a hard problem. Copy bytes from source to target. Non-% bytes just get copied. When you hit %xx, convert XX from hex chars to binary, and you have your byte.
This gives you a buffer of text in UTF-8. You say you want 'ASCII' -- ISO-646. Then you can't have an accented e. I can think of several possibilities for what you really want:
ISO-8859-1. You can use ICU to convert UTF-8 to ISO-8859-1.
ISO-646. You can also use ICU, and I believe it will make accented chars into their ISO-646 equivalents.
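The steps above can be sketched as follows. This is Python rather than C/C++, and it uses the standard library in place of ICU, so the accent-stripping step is only a stand-in for what ICU might do. `url_decode` is a hypothetical helper implementing the copy loop described in the answer:

```python
import unicodedata

HEX_DIGITS = "0123456789abcdefABCDEF"

def url_decode(s: str) -> bytes:
    """Copy characters through; turn each %xx into the byte with hex value xx."""
    out = bytearray()
    i = 0
    while i < len(s):
        if (s[i] == "%" and i + 2 < len(s)
                and s[i + 1] in HEX_DIGITS and s[i + 2] in HEX_DIGITS):
            out.append(int(s[i + 1:i + 3], 16))
            i += 3
        else:
            out.extend(s[i].encode("utf-8"))
            i += 1
    return bytes(out)

# Step 1: percent-decode into a buffer of UTF-8 bytes.
utf8_buffer = url_decode("caf%C3%A9")
text = utf8_buffer.decode("utf-8")

# Option A: ISO-8859-1, where the accented e fits in a single byte (0xE9).
latin1_bytes = text.encode("iso-8859-1")

# Option B: plain ASCII, dropping accents by decomposing and removing marks.
decomposed = unicodedata.normalize("NFKD", text)
ascii_text = "".join(c for c in decomposed if not unicodedata.combining(c))
```

For the example URL fragment, option A keeps the é as one byte while option B reduces it to a plain e. A malformed sequence such as a trailing "%" is copied through unchanged in this sketch.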
