AES encrypt string appear \r\n - ios

When I use AES128 encrypt string, if the encrypted string is too long then it will contain \r\n in it. like this
Now I have to use empty string to replace it. Why does the encrypt-string contain the \r\n and any better way to avoid it or fix it.
Thanks.
Answers: it's caused by the Base64 encoding process, every 64 characters will insert a \r\n .

That is a Base64 encoded string.
Actual encryption output is an array of 8-bit bytes, not characters. The code is Base64 encoding the encrypted data with an option to insert line breaks every 64 characters, this is sometimes to allow better printing of the output. When it is decoded use the NSDataBase64DecodingIgnoreUnknownCharacters option to remove line breaks .
In particular for Objective-C the to create a Base64 string from NSData is:
- (NSString *)base64EncodedStringWithOptions:(NSDataBase64EncodingOptions)options
The options include:
NSDataBase64Encoding64CharacterLineLength
Set the maximum line length to 64 characters, after which a line ending is inserted.
Which inserts "\r\n" (carriage return, new line) characters each 64 characters.
If that is not what you want pass 0 as the option value.
To decode Base64 use the Objective-C method:
- (instancetype)initWithBase64EncodedString:(NSString *)base64String options:(NSDataBase64DecodingOptions)options
With the option: NSDataBase64DecodingIgnoreUnknownCharacters.
Apple code:
The default implementation of this method will reject non-alphabet characters, including line break characters. To support different encodings and ignore non-alphabet characters, specify an options value of NSDataBase64DecodingIgnoreUnknownCharacters.
The thing that gives it away as a Base64 string is a length that is a multiple of 4, the characters used "a-zA-Z0-9+/" and the trailing "=" characters.
Historic note: These days on OSX and iOS line breaks are a single "\n" (0x0a) line feed character. Back when we used teletypes as terminals "\r" (0x0d) carriage return moved the carriage or print head back but did not move the paper up to the next line. "\n" newline moved the paper up one line but did move the carriage or print head back. They were two distinct mechanical operations. Later some systems used either "\r\n", "\n\r", "\r" or "\n". Unix choose "\n" and thus OSX and iOS.

Related

What scheme is used to encode unicode characters in a .url shortcut?

What scheme is used to encode unicode characters in a windows url shortcut?
For example, a new shortcut for url "http://Ψαℕ℧▶" produces a .url file with the text:
[{000214A0-0000-0000-C000-000000000046}]
Prop3=19,2
[InternetShortcut]
IDList=
URL=http://?aN??/
[InternetShortcut.A]
URL=http://?aN??/
[InternetShortcut.W]
URL=http://+A6gDsSEVIScltg-/
What is the algorithm to decode "+A6gDsSEVIScltg-" to "Ψαℕ℧▶"?
I am not asking for API code, but I would like to know the encoding scheme details.
Note: The encoding scheme is not utf-8 nor utf-16 nor ucs-2 and no %encoding.
+A6gDsSEVIScltg- is the UTF-7 encoded form of Ψαℕ℧▶.
The correct way to process a .url file is to use the IUniformResourceLocator and IPropertyStorage interfaces from the CLSID_InternetShortcut COM object. See Internet Shortcuts on MSDN for details.
The answer (utf-7) allowed me to successfully develop the url conversion routine.
Let me summarize the steps:
To obtain the unicode url from a InternetShortcut.W found in a .url file.
. Pass ascii chars until crlf, after making them internet safe.
. A none escaped + character starts a utf-7 formatted unicode sequence:
. Collect 6-bit nibbles from base64 coded ascii
. Per collected 16 bits, convert the 16 bits to utf-8 (1,2, or 3 chars)
. Pass the utf8 generated characters as %hh
. Continue until the occurrence of a "-" character
. The bit collector should be zero

Convert Unicode escape sequence into its corresponding character

I'm receiving a string from the server and it has the special characters in code. Here's the example:
"El usuario o las contrase\UOOOOfffda no son v\UOOOOfffdlidos"
The first one should be an "ñ" and the second one "á"
I know it's not complicated but I can't find the answer. How can I get the string with the special characters correctly formatted?
Unicode U+FFFD (in your string, displayed as UTF-32 \U0000fffd) is "�", the replacement character. It is often substituted in strings when a system encounters unrecognized characters.
This character really shouldn't appear in string data since its purpose is to indicate an error in displaying or interpreting the string. Since your server is sending you that character for both ñ and á, there is no way to retrieve the correct character.
How are you "receiving" this string? It could be that you are accessing the server incorrectly so it isn't sending you an unmodified string.
Unicode for those characters should look like this:
#"accented-a is \u00f1, and tilda-n is \u00e1"
But it's not clear what you're getting from the server makes any sense. The objective-c literal must have a lowercase leading "u" followed only by valid hex digits (0-9 and a-f). I don't see a transformation that changes the literals you have to the ones you expect.
Once the characters are formatted properly, the built-in classes will just work, for example, assigning the string to a label's text property will show the user a nice glyph.

iPhone - Localizable.strings - string with single quotes inside

I need to add some french translation into iOS Application. But I don know how to use single qute char in the Localizable.strings file.
For example text :
"Invalid username or password."="Nom d'utilisateur ou mot de passe incorrect.";
Causes an error. I've tried adding backslashes, but it havn't worked as well.
Using Special Characters in String Resources Just as in C, some
characters must be prefixed with a backslash before you can include
them in the string. These characters include double quotation marks,
the backslash character itself, and special control characters such as
linefeed (\n) and carriage returns (\r).
From:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/LoadingResources/Strings/Strings.html
Did you try escaping it with a backslash?
"\'"
As a last resort you could use direct U codes:
You can include arbitrary Unicode characters in a value string by
specifying \U followed immediately by up to four hexadecimal digits.
The four digits denote the entry for the desired Unicode character;
for example, the space character is represented by hexadecimal 20 and
thus would be \U0020 when specified as a Unicode character. This
option is useful if a string must include Unicode characters that for
some reason cannot be typed. If you use this option, you must also
pass the -u option to genstrings in order for the hexadecimal digits
to be interpreted correctly in the resulting strings file. The
genstrings tool assumes your strings are low-ASCII by default and only
interprets backslash sequences if the -u option is specified.
The apostrophe should be \U0027

Mime encoded headers with extra '=' (==?utf-8?b?base64string?=)

This might be a silly question but... here it goes!
I wrote my own MIME parser in native C++. It's a nightmare with the encodings! It was stable for the last 3 months or so but recently I noticed this Subject: header.
Subject: =?UTF-8?B?T2ZpY2luYSBkZSBJbmZvcm1hY2nDs24sIEluaWNpYXRpdmFzIHkgUmVjbGFt?===?UTF-8?B?YWNpb25lcw==?=
which should decode to this:
Subject: Oficina de Información, Iniciativas y Reclamaciones
The problem is there is one extra = (equal) in there which I can't figure out binding the two (why 2?) encoded elements which I don't understand why are separated. In theory the format should be: =?charset?encoding?encoded_string?= but found another subject that starts with two =.
==?UTF-8?B?blahblahlblah?=
How should I handle the extra =?
I could replace ==? with =? (which I am) before doing anything (and it works)... but I'm wondering if there's any kind of spec regarding this so I don't hack my way into proper functionality.
PS: How much I hate these relic protocols! All text communications should be UTF-8 and XML :)
In MIME headers encoded words are used (RFC 2047 Section 2.).
... (why 2?)
To overcome 75 encoded word limit, which is there because of 78 line length limit (or to use 2 different encodings like Chinese and Polish for example).
RFC 2047:
An 'encoded-word' may not be more than 75 characters long,
including 'charset', 'encoding', 'encoded-text', and delimiters.
If it is desirable to encode more text than will fit in an
'encoded-word' of 75 characters, multiple 'encoded-word's
(separated by CRLF SPACE) may be used.
Here's the example from RFC2047 (note there is no '=' in between):
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
Your subject should be decoded as:
"Oficina de Información, Iniciativas y Reclam=aciones"
mraq answer is incorrect. Soft line breaks apply to 'Quoted Printable' Content-Transfer-Encoding only, which can be used in MIME body.
It is called the "Soft Line Break" and it is the heritage of the SMTP protocol.
Quoting page 20 of RFC2045
(Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on a
encoded line indicates such a non-significant ("soft")
line break in the encoded text.
And also Wikipedia on Quoted-printable
A soft line break consists of an "=" at the end of an encoded line,
and does not appear as a line break in the decoded text.
From what I can see in the MIME RFC double equal signs are not valid input (for encoding), but keep in mind you could interpret the first equal sign as what it is and then use the following stuff for decoding. But seriously, those extra equal signs look like artifacts, maybe from an incorrect encoder.

Parsing \"–\" with Erlang re

I've parsed an HTML page with mochiweb_html and want to parse the following text fragment
0 – 1
Basically I want to split the string on the spaces and dash character and extract the numbers in the first characters.
Now the string above is represented as the following Erlang list
[48,32,226,128,147,32,49]
I'm trying to split it using the following regex:
{ok, P}=re:compile("\\xD2\\x80\\x93"), %% characters 226, 128, 147
re:split([48,32,226,128,147,32,49], P, [{return, list}])
But this doesn't work; it seems the \xD2 character is the problem [if I remove it from the regex, the split occurs]
Could someone possibly explain
what I'm doing wrong here ?
why the '–' character seemingly requires three integers for representation [226, 128, 147]
Thanks.
226,128,147 is E2,80,93 in hex.
> {ok, P} = re:compile("\xE2\x80\x93").
...
> re:split([48,32,226,128,147,32,49], P, [{return, list}]).
["0 "," 1"]
As to your second question, about why a dash takes 3 bytes to encode, it's because the dash in your input isn't an ASCII hyphen (hex 2D), but is a Unicode en-dash (hex 2013). Your code is recieving this in UTF-8 encoding, rather than the more obvious UCS-2 encoding. Hex 2013 comes out to hex E28093 in UTF-8 encoding.
If your next question is "why UTF-8", it's because it's far easier to retrofit an old system using 8-bit characters and null-terminated C style strings to use Unicode via UTF-8 than to widen everything to UCS-2 or UCS-4. UTF-8 remains compatible with ASCII and C strings, so the conversion can be done piecemeal over the course of years, or decades if need be. Wide characters require a "Big Bang" one-time conversion effort, where everything has to move to the new system at once. UTF-8 is therefore far more popular on systems with legacies dating back to before the early 90s, when Unicode was created.

Resources