libxslt: XML to HTML text encoding issue on iOS

I am using the libxslt framework to convert XML to HTML in an iOS project.
I pass entity-encoded XML to the XSLT processor, but the output it produces is not encoded. So when I try to parse the resulting HTML, I get a parser error.
NSString *xml = @"<div>a&lt;b</div>"; // not exactly this, but similar in encoding
NSData *xmlMem = [xml dataUsingEncoding:NSUTF8StringEncoding];
NSString* styleSheetPath = [[NSBundle mainBundle] pathForResource:fileName ofType:fileExtension];
xmlDocPtr doc, res;
xsltStylesheetPtr sty;
xmlSubstituteEntitiesDefault(1);
xmlLoadExtDtdDefaultValue = 1;
sty = xsltParseStylesheetFile((const xmlChar *)[styleSheetPath cStringUsingEncoding: NSUTF8StringEncoding]);
doc = xmlParseMemory([xmlMem bytes], [xmlMem length]);
res = xsltApplyStylesheet(sty, doc, nil);
xmlChar* xmlResultBuffer = nil;
int length = 0;
xsltSaveResultToString(&xmlResultBuffer, &length, res, sty);
NSString* resultHTML = [NSString stringWithCString: (char *)xmlResultBuffer encoding:NSUTF8StringEncoding];
NSLog(@"Result: %@", resultHTML);
Result: <div>a<b<div>
The result is not entity-encoded HTML. Could anyone help me fix this issue?

The problem is the following: In the course of parsing a string of XML, any entity references are expanded, that is, replaced with the string value they reference.
If your input XML contains entities such as &lt;, they will appear as < as soon as they are parsed, before the XML can be processed.
To avoid this, just replace & with its entity, too, that is, &amp;. Change
NSString *xml = @"<div>a&lt;b</div>"
to
NSString *xml = @"<div>a&amp;lt;b</div>"
Then, &amp;lt; is resolved to &lt; but no further replacement is applied, since it is not an iterative process.
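If the XML arrives at run time rather than as a literal, the same replacement can be done in code before the string is handed to the parser. The following is a minimal C sketch (plain C compiles inside an Objective-C .m file); escape_ampersands is a hypothetical helper name, not part of libxml2 or libxslt, and it blindly escapes every ampersand in the input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src in which every
   '&' is replaced by "&amp;". The caller must free() the result. */
char *escape_ampersands(const char *src) {
    assert(src != NULL);

    /* First pass: "&amp;" is 4 bytes longer than "&". */
    size_t extra = 0;
    for (const char *p = src; *p; ++p) {
        if (*p == '&') extra += 4;
    }

    char *out = malloc(strlen(src) + extra + 1);
    if (out == NULL) return NULL;

    /* Second pass: copy, expanding each ampersand. */
    char *w = out;
    for (const char *p = src; *p; ++p) {
        if (*p == '&') {
            memcpy(w, "&amp;", 5);
            w += 5;
        } else {
            *w++ = *p;
        }
    }
    *w = '\0';
    return out;
}
```

The returned buffer and its strlen can then be passed to xmlParseMemory in place of the original bytes.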

Related

Get Unicode chars from a web service and display them in an iOS app

I need some help:
I get only part of the Unicode value from the web service, and then I prepend the prefix \u to complete the value. The .ttf font is fine; I tested it with some hardcoded values.
NSString *cuvant = [[self.catData objectAtIndex:indexPath.row]objectAtIndex:9]; //Get data
// prepend prefix (double \ to escape the \u sequence)
cuvant = [NSString stringWithFormat:@"\\u%@",cuvant];
// cell.catChar.text = [NSString stringWithUTF8String:"\ue674"]; --->this works very well
cell.catChar.text = [NSString stringWithUTF8String:[cuvant UTF8String]]; //---> this doesn't work
I searched the documentation and other sites but didn't find anything useful; all the hints use hardcoded data. I need to take the codes dynamically.
Thanks!
All you need to do is feed this Unicode string in as data first. So make a C string, then
NSData *dataFromUnicodedString = [NSData dataWithBytes:yourCUnicodedString length:strlen(yourCUnicodedString)];
and afterwards the resulting string will be
NSString *unicodedString = [[NSString alloc] initWithData:dataFromUnicodedString encoding:NSUTF8StringEncoding];
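If the value arrives at run time as hexadecimal text such as e674, note that \u is only interpreted by the compiler inside string literals, so prepending it at run time has no effect. One way to build the C string is to parse the digits and encode the code point as UTF-8 by hand. The following is a minimal C sketch under that assumption; utf8_from_hex is a hypothetical helper name, not a Foundation or libc function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parses a hexadecimal code point such as "e674" and
   writes its UTF-8 encoding plus a terminating '\0' into out.
   Returns the number of UTF-8 bytes written, or 0 if the input is invalid. */
size_t utf8_from_hex(const char *hex, char out[5]) {
    assert(hex != NULL && out != NULL);

    char *end = NULL;
    unsigned long cp = strtoul(hex, &end, 16);
    if (end == hex || *end != '\0') return 0;   /* no digits, or trailing junk */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0; /* not a valid code point */

    size_t n;
    if (cp < 0x80) {
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    out[n] = '\0';
    return n;
}
```

With a char buf[5] filled from [cuvant UTF8String], the result can be passed to [NSString stringWithUTF8String:buf] once the function returns a non-zero length.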

Convert Unicode characters while loading data in UIWebView

{ disclaimertxt = "<b>sample tex\U221a\U00a9t n\U221a\U00a9ewerv\U221a\U00a9e iew adults.</b>
\n<br/>
\n<br/>
\nthis is sample \U221a\U2020 an d\U221a\U00a9convertion of the language\U221a\U00a9reduction\U201a\U00c4\U00b6
\n ";
}
The above is a dictionary that contains this value under the key disclaimertxt,
but in the output the Unicode characters are replaced with "?(C)" and "?A".
My code is:
NSData *messagedata = [dictionary[disclaimertxt] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *string = [[NSString alloc]initWithData:messagedata encoding:NSUTF8StringEncoding];
So I want to display those Unicode characters with their proper text/value when loading in a UIWebView. Please give your suggestions. On iOS 7 it works fine and produces the output I need, but when I run it on iOS 6 I get the output above. Please help me find a solution.
Follow these steps:
Manually load the HTML from the page that doesn't include the meta tag into a string.
Using string manipulation, insert the appropriate encoding meta tag into the manually loaded HTML.
Set the HTML in your web view to the modified string using loadHTMLString:
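The string manipulation in the second step can be sketched as follows. This is a minimal C sketch, assuming the page should be declared as UTF-8 and that the markup contains a lowercase <head> tag without attributes; insert_charset_meta is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of html with a UTF-8
   charset <meta> tag inserted right after the first "<head>" tag. If no
   "<head>" is found, the tag is placed at the very start instead.
   The caller must free() the result. */
char *insert_charset_meta(const char *html) {
    assert(html != NULL);

    static const char meta[] = "<meta charset=\"utf-8\">";
    static const char head[] = "<head>";
    const size_t meta_len = sizeof meta - 1;

    /* Number of bytes of the original that go before the inserted tag. */
    const char *pos = strstr(html, head);
    size_t prefix = pos ? (size_t)(pos - html) + (sizeof head - 1) : 0;

    char *out = malloc(strlen(html) + meta_len + 1);
    if (out == NULL) return NULL;

    memcpy(out, html, prefix);
    memcpy(out + prefix, meta, meta_len);
    strcpy(out + prefix + meta_len, html + prefix);
    return out;
}
```

The result can be wrapped with [NSString stringWithUTF8String:] and passed to loadHTMLString:baseURL:.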
Use this to convert Unicode characters to an NSString:
char cString[] = "This isn\u2019t Test String";
NSData *data = [NSData dataWithBytes:cString length:strlen(cString)];
NSString *string = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
Now use this string in your Webview.

Encoding for converting between NSString to NSData and back

I'm trying to encrypt/decrypt an NSString and return the original string in the end. Here's how I convert the string to a data object:
NSData *string_data = [string dataUsingEncoding:NSUTF8StringEncoding];
And after that data has been encrypted/decrypted I want it back to the original string by doing:
NSString *to_string = [NSString stringWithCString:[decrypted_data bytes] encoding:NSUTF8StringEncoding];
The encoding seems to match, but I still get a null when I try to print out to_string to the console. I've tried all sorts of encoding settings. It doesn't seem to work.
Use:
NSString *to_string = [[NSString alloc] initWithData:decrypted_data encoding:NSUTF8StringEncoding];
It is not safe to use stringWithCString because the bytes buffer you get from NSData is not guaranteed to be null-terminated.
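To illustrate the point at the C level: a byte buffer with a known length only becomes a valid C string once a terminating zero byte is added explicitly. The following is a minimal sketch; copy_as_cstring is a hypothetical helper name, and initWithData:encoding: makes it unnecessary in practice because it takes the length from the NSData object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes from bytes into a new buffer and
   appends the '\0' terminator that C string functions rely on.
   The caller must free() the result. */
char *copy_as_cstring(const void *bytes, size_t len) {
    assert(bytes != NULL || len == 0);

    char *out = malloc(len + 1);
    if (out == NULL) return NULL;

    if (len > 0) memcpy(out, bytes, len);
    out[len] = '\0';   /* without this, readers run past the end of the data */
    return out;
}
```

Here the arguments would be [decrypted_data bytes] and [decrypted_data length].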

How do I properly encode Unicode characters in my NSString?

Problem Statement
I create a number of strings, concatenate them together into CSV format, and then email the string as an attachment.
When these strings contain only ASCII characters, the CSV file is built and emailed properly. When I include non-ASCII characters, the result string becomes malformed and the CSV file is not created properly. (The email view shows an attachment, but it is not sent.)
For instance, this works:
uncle bill's house of pancakes
But this doesn't (note the curly apostrophe):
uncle bill’s house of pancakes
Question
How do I create and encode the final string properly so that all valid unicode characters are included and the result string is formed properly?
Notes
The strings are created via a UITextField and then are written to and then read from a Core Data store.
This suggests that the problem lies in the initial creation and encoding of the string: NSString unicode encoding problem
I don't want to have to do this: remove non ASCII characters from NSString in objective-c
The strings are written and read to/from the data store fine. The strings display properly (individually) in the app's table views. The problem only manifests when concatenating the strings together for the email attachment.
String Processing Code
I concatenate my strings together like this:
[reportString appendFormat:@"%@,", category];
[reportString appendFormat:@"%@,", client];
[reportString appendFormat:@"%@\n", detail];
etc.
Replacing curly quotes with boring quotes makes it work, but I don't want to do it this way:
- (NSMutableString *)cleanString:(NSString *)activity {
NSString *temp1 = [activity stringByReplacingOccurrencesOfString:@"’" withString:@"'"];
NSString *temp2 = [temp1 stringByReplacingOccurrencesOfString:@"‘" withString:@"'"];
NSString *temp3 = [temp2 stringByReplacingOccurrencesOfString:@"”" withString:@"\""];
NSString *temp4 = [temp3 stringByReplacingOccurrencesOfString:@"“" withString:@"\""];
return [NSMutableString stringWithString:temp4];
}
Edit:
The email is sent:
NSString *attachment = [self formatReportCSV];
[picker addAttachmentData:[attachment dataUsingEncoding:NSStringEncodingConversionAllowLossy] mimeType:nil fileName:@"MyCSVFile.csv"];
where formatReportCSV is the function that concatenates and returns the csv string.
You seem to be running across a string encoding issue. Without seeing what your Core Data model looks like, I'd assume the issue boils down to the issue reproduced by the code below.
NSString *string1 = @"Uncle bill’s house of pancakes.";
NSString *string2 = @" Appended with some garbage's stuff.";
NSMutableString *mutableString = [NSMutableString stringWithString: string1];
[mutableString appendString: string2];
NSLog(@"We got: %@", mutableString);
// We got: Uncle bill’s house of pancakes. Appended with some garbage's stuff.
NSData *storedVersion = [mutableString dataUsingEncoding: NSStringEncodingConversionAllowLossy];
NSString *restoredString = [[NSString alloc] initWithData: storedVersion encoding: NSStringEncodingConversionAllowLossy];
NSLog(@"Restored string with NSStringEncodingConversionAllowLossy: %@", restoredString);
// Restored string with NSStringEncodingConversionAllowLossy:
storedVersion = [mutableString dataUsingEncoding: NSUTF8StringEncoding];
restoredString = [[NSString alloc] initWithData: storedVersion encoding: NSUTF8StringEncoding];
NSLog(@"Restored string with UTF8: %@", restoredString);
// Restored string with UTF8: Uncle bill’s house of pancakes. Appended with some garbage's stuff.
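Note how the first round trip fails. NSStringEncodingConversionAllowLossy is a conversion option, not an encoding, and its numeric value (1) happens to equal NSASCIIStringEncoding, so the string was effectively encoded as ASCII and couldn't handle the non-ASCII character (it can if you use dataUsingEncoding:allowLossyConversion: with the second parameter being YES, at the cost of losing that character).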
This code should fix the issue:
NSString *attachment = [self formatReportCSV];
[picker addAttachmentData:[attachment dataUsingEncoding: NSUTF8StringEncoding] mimeType:nil fileName:@"MyCSVFile.csv"];
Note: UTF-8 can represent every Unicode character, including Japanese text; you only need one of the UTF-16 string encodings if the program that opens the CSV file expects it.

Encrypted twitter feed

I'm developing an iOS application that fetches tweets from Twitter.
I'm using the following API:
https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&count=2&screen_name=TareqAlSuwaidan
The problem is with feeds in Arabic,
i.e. the text of the feed appears like this:
\u0623\u0646\u0643 \u0648\u0627\u0647\u0645
How can I get the real text (or how do I decode this to get the real text)?
This is not encrypted, it is Unicode. The code points U+0600 to U+06FF are Arabic. NSString handles Unicode.
Here is an example:
NSString *string = @"\u0623\u0646\u0643 \u0648\u0627\u0647\u0645";
NSLog(@"string: '%@'", string);
NSLog output:
string: 'أنك واهم'
The only question is what exactly you are seeing: are you getting the Arabic text? Are you using NSJSONSerialization to deserialize the JSON? If so, there should be no problem.
Here is an example with the question URL (don't use synchronous requests in production code):
NSURL *url = [NSURL URLWithString:@"https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&count=2&screen_name=TareqAlSuwaidan"];
NSData *data = [NSData dataWithContentsOfURL:url];
NSError *error;
NSArray *jsonObject = [NSJSONSerialization JSONObjectWithData:data options:NSJSONReadingMutableContainers error:&error];
NSDictionary *object1 = [jsonObject objectAtIndex:0];
NSString *text = [object1 objectForKey:@"text"];
NSLog(@"text: '%@'", text);
NSLog output:
text: '@Naser_Albdya أيدت الثورة السورية منذ بدايتها وارجع لليوتوب واكتب( سوريا السويدان )'
Those are Unicode literals. I think all that's needed is to use NSString's stringWithUTF8String: method on the string you have. That should use NSString's native Unicode handling to convert the literals to the actual characters. Example:
NSString *directFromTwitter = [twitterInterface getTweet];
// directFromTwitter contains "\u0623\u0646\u0643 \u0648\u0627\u0647\u0645"
NSString *encodedString = [NSString stringWithUTF8String:[directFromTwitter UTF8String]];
// encodedString contains "أنك واهم", or something like it
The method call inside the conversion call ([directFromTwitter UTF8String]) is to get access to the raw bytes of the string, that are used by stringWithUTF8String. I'm not exactly sure on what those code points come out to, I just relied on Python to do the conversion.
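One caveat, as far as I can tell: if the string really holds the six characters of \u0623 (a backslash, a u and four hex digits) rather than the character itself, a round trip through UTF-8 leaves it unchanged, and the escapes have to be decoded explicitly. A JSON parser such as NSJSONSerialization does this for you. For illustration only, here is a minimal C sketch of that decoding step; decode_u_escapes is a hypothetical helper name, and it does not combine surrogate pairs, so it only covers characters in the Basic Multilingual Plane.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated UTF-8 string in which each
   literal "\uXXXX" (backslash, 'u', four hex digits) is replaced by the
   character it names. All other bytes are copied unchanged.
   The caller must free() the result. */
char *decode_u_escapes(const char *src) {
    assert(src != NULL);

    /* The output is never longer than the input: a 6-byte escape
       becomes at most 3 bytes of UTF-8. */
    char *out = malloc(strlen(src) + 1);
    if (out == NULL) return NULL;

    char *w = out;
    const char *p = src;
    while (*p) {
        if (p[0] == '\\' && p[1] == 'u' &&
            isxdigit((unsigned char)p[2]) && isxdigit((unsigned char)p[3]) &&
            isxdigit((unsigned char)p[4]) && isxdigit((unsigned char)p[5])) {
            char hex[5] = { p[2], p[3], p[4], p[5], '\0' };
            unsigned long cp = strtoul(hex, NULL, 16);
            if (cp < 0x80) {
                *w++ = (char)cp;
            } else if (cp < 0x800) {
                *w++ = (char)(0xC0 | (cp >> 6));
                *w++ = (char)(0x80 | (cp & 0x3F));
            } else {
                *w++ = (char)(0xE0 | (cp >> 12));
                *w++ = (char)(0x80 | ((cp >> 6) & 0x3F));
                *w++ = (char)(0x80 | (cp & 0x3F));
            }
            p += 6;
        } else {
            *w++ = *p++;
        }
    }
    *w = '\0';
    return out;
}
```

The decoded buffer can then be turned into an NSString with stringWithUTF8String:.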

Resources