from utf8 to latin1 - dart

I have an utf-8 string like this:
=C3=A0=C3=A8=C3=AC=C3=B2=C3=B9
I need to decode the string to latin1. The expected result is:
àèìòù
This is what I tried without success:
utf8.decode(stringData.runes.toList())

I solved.
Thanks to #lenz comment and the js function found here I solved with this function:
String decodeQuotedPrintable(String input) {
return input
// https://tools.ietf.org/html/rfc2045#section-6.7, rule 3:
// “Therefore, when decoding a `Quoted-Printable` body, any trailing white
// space on a line must be deleted, as it will necessarily have been added
// by intermediate transport agents.”
.replaceAll(RegExp(r'[\t\x20]$', multiLine: true), '')
// Remove hard line breaks preceded by `=`. Proper `Quoted-Printable`-
// encoded data only contains CRLF line endings, but for compatibility
// reasons we support separate CR and LF too.
.replaceAll(RegExp(r'=(?:\r\n?|\n|$)', multiLine: true), '')
// Decode escape sequences of the form `=XX` where `XX` is any
// combination of two hexidecimal digits. For optimal compatibility,
// lowercase hexadecimal digits are supported as well. See
// https://tools.ietf.org/html/rfc2045#section-6.7, note 1.
.replaceAllMapped(RegExp(r'=([a-fA-F\d]{2})'), (match) {
var codePoint = int.parse(match[1], radix: 16);
return String.fromCharCode(codePoint);
});
}
and with this code:
var output = decodeQuotedPrintable(input);
output = utf8.decode(latin1.encode(input));

Related

How to replace unicode escape character in Dart

I need to clean up a string that has an escape character but couldn't do so.
Here is my test code:
test('Replace unicode escape character', () {
String originalText = 'Jeremiah 52:1\\u201334';
String replacedText = originalText.replaceAll(r'\\', r'\');
expect(replacedText, 'Jeremiah 52:1\u201334');
});
It fails with an error:
Expected: 'Jeremiah 52:1–34'
Actual: 'Jeremiah 52:1\\u201334'
Which: is different.
Expected: ... miah 52:1–34
Actual: ... miah 52:1\\u201334
Unicode characters and escape characters aren't stored the way you write them when you wrote the string -- they are converted to their own values. This is evident when you run the following code:
print('\\u2013'.length); // Prints: 6
print('\u2013'.length); // Prints: 1
Here, what happened was: the first stored the following characters: '\', 'u', '2', '0', '1', and '3' -- while the latter stored '–' only.
Hence, your attempt to change the first by replacing two slashes \\ with one slashes \ wouldn't work, as the compiler isn't converting your unicode escape characters any longer.
That doesn't mean that you won't be able to convert your unicode codes into unicode characters though. You could use the following code:
final String str = 'Jeremiah 52:1\\u2013340';
final Pattern unicodePattern = new RegExp(r'\\u([0-9A-Fa-f]{4})');
final String newStr = str.replaceAllMapped(unicodePattern, (Match unicodeMatch) {
final int hexCode = int.parse(unicodeMatch.group(1), radix: 16);
final unicode = String.fromCharCode(hexCode);
return unicode;
});
print('Old string: $str');
print('New string: $newStr');

What kind of encoding is this ? And what script should I use to decode this?

I have an encode string which I have to convert to a form which has HTML tags and display it in a webview. The encoded data I have is of this format :-
Having+a+pet+is+quite+like+parenting%2C+you+just+cannot+stand+the+chance+being+away+from+them.+Still%2C+very+often+it+occurs+that+you+cannot+have+your+pets+around+everywhere+so+you+need+to+shelter+them+elsewhere+for+a+while.+But+at+the+end+of+the+day%2C+one+must+admit+that+it+is+pretty+hard+to+trust+anyone+with+our+furry+babies.+Who+knows+them+better+than+us%2C+their+habits%2C+eating+orders%2C+temperaments+and+socialization+tendencies%3F%0D%0A%0D%0AIt+is+because+of+this+very+reason+that+we+at+%3Ca+href%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%22+target%3D%22_blank%22+rel%3D%22noopener%22%3EWhat%92s+Up+Noida+are+sharing%3C%2Fa%3E+with+you+the+list+of+5+pet+boarding+facilities+which+you+can+actually+trust.%0D%0A%3Ch4%3E%3Cstrong%3EThe+Dog+Resorts%2C+Noida+%3C%2Fstrong%3E%3C%2Fh4%3E%0D%0AAt+the+Dog+Resort+Noida%2C+be+assured+that+your+pet+will+be+cared+for+by+thorough+professionals.+%A0They+resort+boasts+of+a+10%2C000+sq.+ft.+premises+area+where+about+5000+sq+.ft+is+wide+open+green+area+to+make+your+pet+stay+playful+and+cheering.+The+best+aspect+about+the+place+is+that+there+is+complete+medical+care+and+behavioral+apprehending+of+pets+as+they+have+a+dedicated+team+of+veterinary+doctors%2C+dog+trainers%2C+and+other+pet+care+expertise.%0D%0A%0D%0A%3Ca+href%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2FDog-resort-noida.png%22%3E%3Cimg+class%3D%22aligncenter+wp-image-4902+size-full%22+src%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2FDog-resort-noida.png%22+alt%3D%22Dog+resort+noida%22+width%3D%221052%22+height%3D%22326%22+%2F%3E%3C%2Fa%3E%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0+Sector+112%2C+Noida%3C%2Fstrong%3E%0D%0A%3Cstrong%3E+Contact%3A+9911782646%3C%2Fstrong%3E%0D%0A%3Ch4%3E%3Cstrong%3EBest+Pet+Boarding+%3C%2Fstrong%3E%3C%2Fh4%3E%0D%0AThey+are+indeed+one+of+the+highly+rated+and+positively+reviewed+pet+boarding+facilities+in+Noida.+Not+just+for+dogs+but+they+have+a+friendly+providence+of+pet+care+and+day+boarding+services+for+other+pets+like+fish%2C+bird%2C+guinea+pig%2C+hamster%2C+cat%2C+and+others+as+well.%A0+The+idea+of+creating+is+%91no-cage%92+free+space+for+beloved+pets+was+developed+by+Kaveri+Rana+Bhardwaj+and+Yashraj+Bhardwaj.+At+the+Best+Pet+Boarding%2C+the+emphasis+is+on+making+your+pet+socialize+and+have+a+free+will+of+wandering+in+wide+space.+They+believe+in+the+unique+concept+of+letting+in+a+trained+dog+which+said+to+be+their+%91alpha%92+among+other+pet+dogs%2C+this+maintains+the+decorum+all+the+time.%0D%0A%0D%0A%3Ca+href%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fbest-pet-boarding.jpg%22%3E%3Cimg+class%3D%22aligncenter+wp-image-4903+size-full%22+src%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fbest-pet-boarding.jpg%22+alt%3D%22best+pet+boarding+noida%22+width%3D%22960%22+height%3D%22489%22+%2F%3E%3C%2Fa%3E%0D%0A%0D%0A%3Cstrong%3EAddress%3A+B+14+1+Omicron+2%2C+Greater+Noida%3C%2Fstrong%3E%0D%0A%3Cstrong%3E+Phone%3A+09711951179%3C%2Fstrong%3E%0D%0A%3Ch4%3E%3Cstrong%3EPet+Home+Affairs+Day+Boarding+%3C%2Fstrong%3E%3C%2Fh4%3E%0D%0A%3Ca+style%3D%22font-size%3A+14px%3B+font-family%3A+%27Open+Sans%27%2C+Arial%2C+sans-serif%3B+color%3A+%2319232d%3B+text-decoration-line%3A+underline%3B%22+href%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fpet-home-affairs-day-care.jpg%22%3E%3Cimg+class%3D%22alignright+wp-image-4906+size-full%22+style%3D%22font-size%3A+14px%3B%22+src%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fpet-home-affairs-day-care.jpg%22+alt%3D%22pet+boarding+in+noida%22+width%3D%22350%22+height%3D%22263%22+%2F%3E%3C%2Fa%3E%0D%0A%0D%0APet+Home+Affairs+enjoys+a+sterling+reputation+in+pet+care+industry+and+is+one+of+the+remarkable+services+in+Noida.+At+the+boarding%2C+there+are+professionals+who+understand+the+needs+of+your+pet+and+also+your+preferences+about+the+specific+services+you+need+for+your+pet.+Pet+care+system+is+well+scheduled+for+meal+and+medicine+timings+and+under+the+care+of+pet+care+experts+here.%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0+Supertech+Capetown%2C+Sector+74%2C+Noida.%3C%2Fstrong%3E%0D%0A%3Cstrong%3E+Phone%3A+09873204636%3C%2Fstrong%3E%0D%0A%3Ch4%3E%3Cstrong%3EClean+Cute+%3C%2Fstrong%3E%3C%2Fh4%3E%0D%0AWith+their+10%2C000+sq.+ft+wide+space+of+Dog+day+care+and+pet+holiday+home%2C+Clean+Cute+is+known+to+serve+all+sizes%2C+kinds%2C+breeds%2C+and+age+of+pets.+They+are+biggest+and+best+boarding+facility+not+only+in+Noida+but+also+in+NCR.+Clean+Cute+is+managed+by+a+highly+skilled+dog+handling+team+offering+pet+grooming+services%2C+recreational+activities%2C+health+care%2C+therapies%2C+pet+parenting+counseling%2C+dog+behavior+management%2C+skin+care%2C+dog+rehabilitation+and+actually%2C+what+not.%0D%0A%0D%0A%3Ca+href%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2FBoarding-Large-File.jpg%22%3E%3Cimg+class%3D%22aligncenter+wp-image-4907+size-large%22+src%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2FBoarding-Large-File-1024x682.jpg%22+alt%3D%22dog+boarding+in+noida%22+width%3D%22702%22+height%3D%22468%22+%2F%3E%3C%2Fa%3E%0D%0A%0D%0A%26nbsp%3B%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0+Sector+112%2C+Noida%3C%2Fstrong%3E%0D%0A%3Cstrong%3E+Phone%3A+08076548163%3C%2Fstrong%3E%0D%0A%3Ch4%3E%3Cstrong%3EGoDogee+Pet+services+%3C%2Fstrong%3E%3C%2Fh4%3E%0D%0AGoDogee+is+one+of+the+India%92s+leading+pet+care+services+providing+pet+day+care%2C+pet+hotel%2C+pet+grooming%2C+pet+vaccination%2C+pet+treatment%2C+pet+mating+and+other+interesting+and+unique+sort+of+services.%0D%0A%0D%0A%3Cimg+class%3D%22aligncenter+size-full+wp-image-4905%22+src%3D%22https%3A%2F%2Fwww.whatsuplife.in%2Fnoida%2Fblog%2Fwp-content%2Fuploads%2F2017%2F06%2Fgodogee.jpg%22+alt%3D%22%22+width%3D%221920%22+height%3D%22790%22+%2F%3E%0D%0A%0D%0A%3Cstrong%3EAddress%3A+J-10%2C+Noida+Rd%2C+H+Block%2C+Sector+8%2C+Noida%3C%2Fstrong%3E%0D%0A%3Cstrong%3E+Phone%3A+09910664343%3C%2Fstrong%3E
What kind of format is this and how do I convert it into text with html tags ?
I'm currently using this method but I'm getting a nil return value :
func urlDecode(_ htmlString: String) -> String {
var decodedURL = htmlString.replacingOccurrences(of: "+", with: " ")
decodedURL = decodedURL.removingPercentEncoding!
return decodedURL
}
So you are getting nil, which means that there is something in your string that is invalid. How could we find out what? Well it's certainly not spaces, so let's break your string into "words":
var decodedURL = htmlString.replacingOccurrences(of: "+", with: " ")
let chunks = decodedURL.components(separatedBy: " ")
and now try to decode each "word" and see which ones fail:
for chunk in chunks
{
let decoded = chunk.removingPercentEncoding
if decoded == nil
{
print("\(chunk)")
}
}
this outputs:
rel%3D%22noopener%22%3EWhat%92s
%A0They
%2F%3E%3C%2Fa%3E%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0
well.%A0
%91no-cage%92
%91alpha%92
here.%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0
%2F%3E%3C%2Fa%3E%0D%0A%0D%0A%26nbsp%3B%0D%0A%0D%0A%3Cstrong%3EAddress%3A%A0
India%92s
Now removingPercentEncoding decodes UTF-8, so checking each of those % sequences to see if they are valid UTF-8 shows that %91, %92 & %A0 are not. However they are valid Windows-1252 encodings - single curly quotes and the non-breaking space.
So you need to decode the string as Windows-1252 not UTF-8. Time for you to look in the documentation...
HTH
From apple,
A new string with the percent-encoded sequences removed, or nil if the
receiver contains an invalid percent-encoding sequence.You must call
this method only on strings that you know to be percent-encoded.
Calling this method on strings that are not percent-encoded can lead
to misinterpreting a percent character as the beginning of a
percent-encoded sequence.
You get nil because removingPercentEncoding does not recognize some of your percent encoding. I pasted it onto online parser and find out that %A0 and %92 seem to be suspicious.

get length of string in UTF8

How can I get the length (not number of bytes) of a string in its UTF-8 encoded form (PHP's mb_strlen(.., 'UTF-8') equivalent)?
I tried string.characters.count but it does not return the correct length for certain characters like an emoji.
Example:
let s = "✌🏿️"
print(s.characters.count) // prints 2, but should print 3.
You can access the UTF-8 encoding of a string with the .utf8 property. Use count on that to get the number of UTF-8 code units in the string:
let string = "\u{1f603}" // One of the smiley face emojis...
print(string.utf8.count) // prints "4"
Based on your edited question, what you are probably looking for is the number of UnicodeScalars used to encode the string. You access that with the unicodeScalars property:
let s = "✌🏿️"
print(s.unicodeScalars.count) // prints 3
The reason everyone is confused is because your original question asks for the length of the string in its UTF-8 encoded form. The answer that you actually wanted had nothing to do with the length of the string in its UTF-8 encoded form.
I think you are confused about the difference between Unicode "extended grapheme clusters", Unicode code points, and the various encodings (like UTF-8) that can be used to encode a Unicode code point.
A Character in Swift represents what Unicode calls an "extended grapheme cluster". That is to say, it is a single visual character, even if it is made up of multiple Unicode code points.
A Unicode code point is a single linguistic symbol that is given a 32-bit value. Two or more Unicode code points can combine to create a single Character. In Swift, the Unicode code point is represented by the UnicodeScalar type.
When it comes time to store a string, or send it over the internet, or otherwise turn it into data that is represented by bytes, you have to decide how to encode it. There are all kinds of encodings, the most common is probably UTF-8, which encodes the string as a series of UInt8 values.
That's just a brief snippet of the difference between the three concepts. It is actually a really interesting subject and if you Google some of those terms, you will find a lot more good information.
let str = "ačŘ"
print("str has \(str.characters.count) characters") // 3
print("and \(str.utf8.count) bytes as encoded in UTF-8") // 5
update (based on your notes)
let s = "✌🏿️"
let arr:[UInt8] = [226, 156, 140, 240, 159, 143, 191, 239, 184, 143]
var arrCchar = arr.map { (uint8) -> Int8 in
Int8(bitPattern: uint8)
}
arrCchar += [0] // to be null terminated
let str = String.fromCString(&arrCchar)
print(str) // Optional("✌🏿️")
s == str // TRUE !!!!
by characters
s.characters.forEach { (c) -> () in
let str = String(c)
print(str.utf8.map{$0}, "which represents character: ", c)
str.unicodeScalars.forEach({ (u) -> () in
print("composed from unicode scalar(s): ", u.debugDescription)
})
}
/*
[226, 156, 140] which represents character: ✌
composed from unicode scalar(s): "\u{270C}"
[240, 159, 143, 191, 239, 184, 143] which represents character: 🏿️
composed from unicode scalar(s): "\u{0001F3FF}"
composed from unicode scalar(s): "\u{FE0F}"
*/
Every character in Unicode can be represented by one or more unicode scalars. A unicode scalar is a unique 21-bit number (and name) for a character or modifier, such as U+0061 for LOWERCASE LATIN LETTER A("a"), or U+1F425 for FRONT-FACING BABY CHICK ("\U0001f425").
When a Unicode string is written to a text file or some other storage, these unicode scalars are encoded in one of several Unicode-defined formats. Each format encodes the string in small chunks known as code units. These include the UTF-8 format (which encodes a string as 8-bit code units) and the UTF-16 format (which encodes a string as 16-bit code units).
//copy from Apple Developer swift programming guide

Invalid escape sequence in literal: "\b"

I need to be able to create a String that is "\b". But when I try to, Xcode throws a compile-time error: Invalid escape sequence in literal. I don't understand why though, "\r" works just fine. If I put "\\b" then that's what is actually stored in the String, which is not what I need - I only need one backslash. To me, this seems to be a Swift oddity because it works just fine in Objective-C.
let str = "\b" //Invalid escape sequence in literal
NSString *str = #"\b"; //works great
I need to generate this string because "\b" is the only way to detect when the user pressed 'delete' when using UIKeyCommand:
let command = UIKeyCommand(input: "\b", modifierFlags: nil, action: "didHitDelete:")
How can I work around this issue?
EDIT: It really doesn't want to generate a String that is only "\b", this does not work - it stays the original value:
var delKey = "\rb"
delKey = delKey.stringByReplacingOccurrencesOfString("r", withString: "", options: .LiteralSearch, range: nil)
The Swift equivalent of \b is \u{8}. It maps to ASCII control code 8, just like \b in Objective C. I've tested this and found it to work fine with UIKeyCommand, in this earlier answer of mine.
Example snippet:
func keyCommands() -> NSArray {
return [
UIKeyCommand(input: "\u{8}", modifierFlags: .allZeros, action: "backspacePressed")
]
}
I don't believe it is supported.
Based on the Swift documentation https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift_Programming_Language/StringsAndCharacters.html:
String literals can include the following special Unicode characters:
The escaped special characters \0 (null character), \ (backslash), \t
(horizontal tab), \n (line feed), \r (carriage return), \" (double
quote) and \' (single quote)
An arbitrary Unicode scalar, written as
\u{n}, where n is between one and eight hexadecimal digits
The ASCII for the \b is 8. If you do the following, you'll see these results
let bs = "\u{8}"
var str = "Simple\u{8}string"
println(bs) // Prints ""
println("bs length is \(bs.lengthOfBytesUsingEncoding(NSUTF8StringEncoding))") // Prints 1
println(str) // Prints Simplestring
let space = "\u{20}"
println(space) // Prints " "
println("space length is \(space.lengthOfBytesUsingEncoding(NSUTF8StringEncoding))") // Prints 1
str = "Simple\u{20}string"
println(str) // Prints Simple string
It looks like while ASCII 8 "exists", it is "ignored".

golang convert iso8859-1 to utf8

I am trying to convert an ISO 8859-1 encoded string to UTF-8.
The following function works with my testdata which contains german umlauts, but I'm not quite sure what source encoding the rune(b) cast assumes. Is it assuming some kind of default encoding, e.g. ISO8859-1 or is there any way to tell it what encoding to use?
func toUtf8(iso8859_1_buf []byte) string {
var buf = bytes.NewBuffer(make([]byte, len(iso8859_1_buf)*4))
for _, b := range(iso8859_1_buf) {
r := rune(b)
buf.WriteRune(r)
}
return string(buf.Bytes())
}
rune is an alias for int32, and when it comes to encoding, a rune is assumed to have a Unicode character value (code point). So the value b in rune(b) should be a unicode value. For 0x00 - 0xFF this value is identical to Latin-1, so you don't have to worry about it.
Then you need to encode the runes into UTF8. But this encoding is simply done by converting a []rune to string.
This is an example of your function without using the bytes package:
func toUtf8(iso8859_1_buf []byte) string {
buf := make([]rune, len(iso8859_1_buf))
for i, b := range iso8859_1_buf {
buf[i] = rune(b)
}
return string(buf)
}
The effect of
r := rune(expression)
is:
Declare variable r with type rune (alias for int32).
Initialize variable r with the value of expresion.
No (re)encoding is involved and saying which one should be optionally used is possible only by explicitly writing/handling some re-encoding in code. Luckily, in this case no (re)encoding is necessary, Unicode incorporated those codes of ISO 8859-1 in a comparable way as ASCII. (If I checked correctly here)

Resources