I am trying to decode a string that is in UTF-8 format into a normal human-readable string. I have tried many code snippets available on SO, but none of them worked.
My demo UTF-8 string is:
let demoString: String = "यॠपहलॠपहलॠà¤à¤¾à¤¹à¤¤à¤¯à¥ बहà¤à¥ बहà¤à¥ हालत"
Is there any way to decode this UTF-8 string in Swift? Any help would be appreciated.
let demoString: String = "यॠपहलॠपहलॠà¤à¤¾à¤¹à¤¤à¤¯à¥ बहà¤à¥ बहà¤à¥ हालत"
This defines a perfectly fine string containing some rather weird characters like "à", "¤" and so on. There is no decoding that can be done here. The first character is a "Latin Small Letter A With Grave", U+00E0, which is encoded as the bytes C3 A0 in UTF-8.
If you want a string with "Hindi" characters (I suppose you mean Devanagari, or Bengali, Gurmukhi, Gujarati, etc.), type for example
let demoString: String = "ऄइउऋऌऍ"
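The code point and byte values mentioned above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and it says nothing about how the string in the question was actually produced.

```python
import unicodedata

# The character "à" discussed in the answer, written as an escape so the
# source file's own encoding cannot interfere.
a_grave = "\u00e0"

code_point = f"U+{ord(a_grave):04X}"          # code point in U+XXXX notation
char_name = unicodedata.name(a_grave)         # official Unicode character name
utf8_hex = a_grave.encode("utf-8").hex()      # UTF-8 bytes as a hex string

print(code_point, char_name, utf8_hex)

# For comparison, a genuine Devanagari letter (YA, U+092F) takes three
# UTF-8 bytes, and the first of them is 0xE0.
ya_utf8_hex = "\u092f".encode("utf-8").hex()
print(ya_utf8_hex)
```

The byte 0xE0 is also the Latin-1 code for "à", which is one common way such characters end up in a string when UTF-8 bytes are read with the wrong encoding.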
I am facing the following issue:
In my app, the user can enter special characters (like emojis) in a textfield also. So, while sending this entered text to server in request body, I am converting it using the following code:
func emojiToUTF8()->String
{
let data = self.data(using: .nonLossyASCII, allowLossyConversion: true)
let emoji = String.init(data: data!, encoding: .utf8)
return emoji ?? self
}
For instance, if I enter the text "☺️", it gets converted into "\u263a\ufe0f" using the above method. Things are fine till here.
The problem occurs when I add this to a dictionary to send it as a request parameter to the server. The code I'm using:
var parameters = [String:String]()
parameters["feedback"] = feedBackTxt
print("Parameters:",parameters) /// output: ["feedback": "\\u263a\\ufe0f"]
So, the problem here is that an extra backslash is being added before each backslash due to character escaping. I checked the created dictionary value as well, and it shows the double backslash there too. How do I avoid this? Why is this happening when I am simply creating a dictionary with a string? This is causing an issue at the server end.
I have tried a couple of things, but none of them seem to work.
Your problem is that you're double-encoding.
You're taking a string, converting it to ASCII, re-parsing it as UTF-8, and then encoding that (probably) as JSON, which is UTF-8. In the process, the backslashes are being escaped by your second encoder.
The best solution to this is to rework your server to accept UTF-8. However, if you can't do that, you need to ensure you encode this string just once, in ASCII.
In short, you should get rid of emojiToUTF8 and ensure that your parameters processor encodes the way your server requires (which apparently is ASCII and not UTF-8).
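The double-encoding effect can be reproduced outside Swift. The following Python sketch is an analogy, not the asker's code: it assumes the request body is JSON, and it uses Python's unicode_escape codec as a stand-in for the nonLossyASCII step.

```python
import json

feedback = "\u263a\ufe0f"  # the smiley from the question

# Encode once: the JSON encoder itself produces ASCII-only \uXXXX escapes.
encoded_once = json.dumps({"feedback": feedback})

# Encode twice: first turn the text into literal backslash escapes, then
# hand that already-escaped text to the JSON encoder, which escapes the
# backslashes again.
pre_escaped = feedback.encode("unicode_escape").decode("ascii")
encoded_twice = json.dumps({"feedback": pre_escaped})

print(encoded_once)
print(encoded_twice)
```

A server that decodes the first body gets the original characters back; decoding the second body yields the literal backslash-u text instead.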
I have the following code:
buff=esp.flash_read(esp.flash_user_start(),50)
print(buff)
I get the following output from print:
bytearray(b'{"ssid": "mySSID", "password": "myPASSWD"}\xff\xff\xff\xff\xff\xff')
What I want to do is get the json in buff. What is the correct "Python-way" to do that?
buff is a Python bytearray, as shown by the print output beginning with bytearray(b'. To convert this into a string you need to decode it.
In standard Python you could use
buff.decode(errors='ignore')
Note that without specifying errors='ignore' you would get a UnicodeDecodeError because the \xff bytes aren't valid in the default encoding, which is UTF-8; presumably they're padding and you want to ignore them.
If that works on the ESP8266, great! However, the MicroPython docs suggest the keyword syntax might not be implemented, and I don't have an ESP8266 to test it. If it isn't, you may need to remove the padding bytes yourself:
textLength = buff.find(b'\xff')
text = buff[0:textLength].decode()
or simply:
text = buff[0:buff.find(b'\xff')].decode()
If decode isn't implemented either, which it isn't in the online MicroPython interpreter, you can use str:
text = str(buff[0:buff.find(b'\xff')], 'utf-8')
Here you have to specify explicitly that you're decoding from UTF-8 (or whatever encoding you specify).
However if what you're really after is the values encoded in the JSON, you should be able to use the json module to retrieve them into a dict:
import json
j = json.loads(buff[0:buff.find(b'\xff')])
ssid = j['ssid']
password = j['password']
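One caveat with the slicing above: find returns -1 when no \xff byte is present, and buff[0:-1] would then silently drop the last character. A minimal sketch that guards against this is shown here; load_padded_json is a hypothetical helper name, and the code is written for standard Python (on MicroPython the json module may be named ujson and behave slightly differently).

```python
import json

def load_padded_json(buff, pad=b"\xff"):
    """Parse JSON stored at the start of a buffer padded with `pad` bytes."""
    end = buff.find(pad)
    if end == -1:
        # No padding found: use the whole buffer instead of slicing to -1.
        end = len(buff)
    return json.loads(bytes(buff[:end]).decode("utf-8"))

# The buffer from the question, as printed by the ESP8266.
buff = bytearray(b'{"ssid": "mySSID", "password": "myPASSWD"}\xff\xff\xff\xff\xff\xff')
config = load_padded_json(buff)
print(config["ssid"], config["password"])
```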
I'm returning a character vector from a function in R to C# using R.NET. The only problem is that Unicode characters, such as Greek letters, are being lost. The following line gives an example of the code I'm using:
CharacterVector cvAll = results[5].AsList().AsCharacter();
Where results is a list of results returned by the R function. The characters are also written by R to a text file and they display fine in notepad and other editors. Can I get R.Net to return the characters correctly?
Looks like you ran into an open issue with RDotNet: https://github.com/jmp75/rdotnet/issues/25
Unicode characters don't seem to be supported yet. I ran into the same issue while calling the engine.CreateDataFrame() method. It returned a DataFrame with all my accented strings wrong.
There seems to be a workaround, though: when calling RDotNet functions, if I pass strings that have been converted from UTF-8 (important) to my computer's default encoding (Windows ANSI), R accepts them and returns correctly interpreted accented strings to C#. I don't know exactly why this works. It might have something to do with the default .NET string encoding being UTF-16 (cf. http://csharpindepth.com/Articles/General/Strings.aspx), hence the conversion from UTF-8 to default ANSI that seems to work.
Here is an ugly example: when I'm building an RDotNet DataFrame, I convert all strings in a CharacterVector from UTF-8 to ANSI-encoded ones:
try
{
string[] colAsStrings = null;
colAsStrings = Array.ConvertAll<object, string>(uneColonne, s => StringEncodingHelper.EncodeToDefaultFromUTF8((string)s));
correctedDataArray[i] = colAsStrings;
columnConverted = true;
}
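Here is the static method used for the conversion:
// Requires: using System.Text;
public static string EncodeToDefaultFromUTF8(string stringToEncode)
{
    // Encode the .NET string as UTF-8 bytes...
    byte[] utf8EncodedBytes = Encoding.UTF8.GetBytes(stringToEncode);
    // ...then reinterpret those bytes using the system default encoding.
    return Encoding.Default.GetString(utf8EncodedBytes);
}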
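To make visible what this helper does to a string, here is a rough Python equivalent. It is a sketch under one loud assumption: the system default (ANSI) code page is taken to be Windows-1252, which will not be true on every machine.

```python
def encode_to_default_from_utf8(text, default_encoding="cp1252"):
    """Encode text as UTF-8, then reinterpret the bytes in another code page.

    default_encoding stands in for Encoding.Default; cp1252 is an assumption.
    """
    return text.encode("utf-8").decode(default_encoding)

# Greek alpha and beta each become two Windows-1252 characters.
converted = encode_to_default_from_utf8("\u03b1\u03b2")
print(converted)
```

The result looks garbled on the C# side, but it carries the original UTF-8 bytes one character per byte, which may be why R then interprets them correctly.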
I have tried everything to convert a JSON response to Chinese text, without success. I need to display those strings in a UILabel.
This is the response I'm getting:
sentence = "\U00e6\U201a\U00a8\U00e5\U00a5\U00bd\U00e3\U20ac\U201a";
pinyin = "n\U00c3\U00adn h\U00c4\U0192o"
After conversion, the sentence string should read 您好。 but I'm getting æ‚¨å¥½ã€‚ instead.
For pinyin I get exactly the right string (nín hăo) in the label without any conversion, but for sentence I get the wrong value.
I'm using Xcode 7.1 and my deployment target is 8.0.
Hi, thanks all for helping and trying :) I ended up solving my own problem. What I did was assign the dictionary value directly to the label text rather than passing it through an NSString first. Taking it into a string gave me a garbled value instead of 您好。
Here is what I've done:
cell.lblWord.text = [NSString stringWithFormat:@"Word: %@",[[dic objectForKey:@"cat"]objectForKey:@"chart"]];
It's strange but true. I had tried this before and it wasn't working.
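The escapes in the logged response are consistent with UTF-8 bytes that were decoded as Windows-1252 somewhere along the way. Assuming that is what happened (the thread does not confirm it), the damage can be reversed by re-encoding with the wrong code page and decoding as UTF-8. A Python sketch, with repair_mojibake as a hypothetical helper name:

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo a UTF-8 -> wrong_encoding mis-decoding, if that is what occurred."""
    return text.encode(wrong_encoding).decode("utf-8")

# Values copied from the logged response, written as Python escapes.
sentence = "\u00e6\u201a\u00a8\u00e5\u00a5\u00bd\u00e3\u20ac\u201a"
pinyin = "n\u00c3\u00adn h\u00c4\u0192o"

fixed_sentence = repair_mojibake(sentence)
fixed_pinyin = repair_mojibake(pinyin)
print(fixed_sentence, fixed_pinyin)
```

If the text was not produced by this kind of mis-decoding, the encode or decode step raises an error rather than returning a wrong result.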
I have a problem converting a URL-encoded string, which I extract from an XML file, to an NSString.
The string looks like this. It looks odd, but it is in URL-encoded format.
%3CTEXTFORMAT%20LEADING%3D%222%22%3E%3CP%20ALIGN%3D%22LEFT%22%3E%3CFONT%20FACE%3D%22Arial%22%20SIZE%3D%2212%22%20COLOR%3D%22%23000000%22%20LETTERSPACING%3D%220%22%20KERNING%3D%220%22%3E%u53F0%u5317%u7E2323141%u65B0%u5E97%u6C11%u6B0A%u8DEF130%u5DF714%u865F5%u6A13%3C/FONT%3E%3C/P%3E%3C/TEXTFORMAT%3E
However, when I use the stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding method, it returns nil.
After some experiments and research, it seems that the %u sequences in this string cause the problem during conversion. The %u sequences look like Unicode escapes. If I remove all %u occurrences, stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding returns a proper string without any problem.
Does anyone know how I can convert this URL-encoded string to an NSString properly?
The urlString contains Unicode Han characters, which is why it is not converting.
Replace %u with \u and you will get your string.
NSString *str=@"%3CTEXTFORMAT%20LEADING%3D%222%22%3E%3CP%20ALIGN%3D%22LEFT%22%3E%3CFONT%20FACE%3D%22Arial%22%20SIZE%3D%2212%22%20COLOR%3D%22%23000000%22%20LETTERSPACING%3D%220%22%20KERNING%3D%220%22%3E%u53F0%u5317%u7E2323141%u65B0%u5E97%u6C11%u6B0A%u8DEF130%u5DF714%u865F5%u6A13%3C/FONT%3E%3C/P%3E%3C/TEXTFORMAT%3E";
str=[str stringByReplacingOccurrencesOfString:@"%u" withString:@"\\u"];
NSString *convertedStr=[str stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"converted string is %@ \n",convertedStr);
Output:
converted string is <TEXTFORMAT LEADING="2"><P ALIGN="LEFT"><FONT FACE="Arial" SIZE="12" COLOR="#000000" LETTERSPACING="0" KERNING="0">\u53F0\u5317\u7E2323141\u65B0\u5E97\u6C11\u6B0A\u8DEF130\u5DF714\u865F5\u6A13</FONT></P></TEXTFORMAT>
These are Chinese Unicode characters.
Here is some code that shows it:
NSString *newStr=@"\u53F0\u5317\u7E2323141\u65B0\u5E97\u6C11\u6B0A\u8DEF130\u5DF714\u865F5\u6A13";
NSLog(@"chinese string is %@",[newStr stringByReplacingPercentEscapesUsingEncoding:NSUTF16StringEncoding]);
Output:
台北縣23141新店民權路130巷14號5樓
Pasting this string into Google Translate gives you someone's address:
Citizens Xindian, Taipei County 23141 Road 130, 5th Floor, No. 14, Lane
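The replacement above leaves literal \uXXXX text in the result, which still has to be turned into characters. A way to decode both kinds of escapes in one helper can be sketched in Python as follows. decode_percent_u is a hypothetical name, and the sketch assumes every %u is followed by exactly four hex digits, as in the string from the question.

```python
import re
from urllib.parse import unquote

def decode_percent_u(text):
    """Decode non-standard %uXXXX escapes, then ordinary %XX escapes."""
    # Turn each %uXXXX into the character with that code point.
    text = re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
    # Decode the remaining standard percent escapes as UTF-8.
    return unquote(text)

# A shortened sample in the same format as the string from the question.
sample = "%3CP%20ALIGN%3D%22LEFT%22%3E%u53F0%u5317%u7E2323141%3C/P%3E"
decoded = decode_percent_u(sample)
print(decoded)
```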