I get a weird text symbol when I output the content of a NSString. A user sent its database and it is stored in core data. This is what the symbol looks like:
I guess it does have some specific code, or is it a not supported character? The user can't explain from where such symbol comes from.
Thanks
Look at NSString's class description. You can use the 'canBeConvertedToEncoding' iteratively, with all possible encodings listed in the 'String Encodings' section at the bottom, to find out which one your strings use.
Related
In the console, for some unprocessable characters it emits this:
Can someone tell me:
1) What exactly does that symbol mean?
2) What's it called?
3) How can I detect which chars will give me that - e.g. if I want to write code to find all the integers for which the string representation is rendered as that, how do I do it:
(60000..70000).select {|i| what_do_i_do_here?(i) }
The symbol means that the font cannot display a character.
The name: check out the sister site: https://english.stackexchange.com/questions/62524/what-do-you-call-the-phenomenon-where-a-rectangle-is-shown-because-a-font-lack , so officially in Unicode: replacement glyph, but often it is called tofu.
It depends on your font, and the font/language setting. Note: you are just trying single code-points, but some glyphs ("letters" as displayed by a font), could be made by several combining code-points (the "char" in many computer languages).
I really do not know how to know which characters are supported by a specific font (just by having the font). Often the font creators tell you which glyphs are implemented.
Is there a simple way (a function, a method...) of validating a character that a user types to see if it's compatible with Mac OS Roman? I've read a few dozen topics to find out why an iOS application crashes in reference to CGContextShowTextAtPoint. I guess an application can crash if it tries to draw on an image a string (i.e. ©) containing a character that is not included in the Mac OS Roman set. There are 256 characters in this set. I wonder if there's a better way other than matching the selected character one by one with those 256 characters?
Thank you
You might give https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_text/dq_text.html a closer read.
You can draw any encoding using CGContextShowGlyphsAtPoint instead of CContextShowTextAtPoint so you can tell it what the encoding is. If the user types it then you'll be getting the string as an NSString which is a Unicode string underneath. Probably the easiest is going to be to get the utf8 encoding of that user entered string via NSString's UTF8String method.
If you really want to stick with the very limited MacRoman for some reason, then use NSString's cStringUsingEncoding: passing in NSMacOSRomanStringEncoding to get a MacRoman string. Read the documentation on this in NSString though. Will return null if the user string can't be encoded in MacRoman losslessly. As it discusses you can use dataUsingEncoding:allowLossyConversion: and canBeConvertedToEncoding: to check. Read the cautions in the Discussion for cStringUsingEncoding: about about lifecycle of the returned strings though. getCString:maxLength:encoding: might end up being a better choice for you. All discussed in the class documentation for NSString.
This doesn't directly answer the question but this answer may be a solution to your problem.
If you have an NSString, instead of using CGContextShowTextAtPoint, you can do:
[someStr drawAtPoint:somePoint withFont:someFont];
where someStr is an NSString containing any Unicode characters a user can type, somePoint is a CGPoint, and someFont is the UIFont to use to render the text.
I'm trying to read a text file that is comprised of stock symbols and the associated company into a dictionary, each line in the text file has one symbol and company like
APPL Apple
GOOG-Q Google
Ultimately, I'm trying to have a searchbar that looks up the corresponding company based on the stock symbol (or vice versa).
So, what is the best way to approach this? read the entire file as a string then try to separate the items with fileString componentsSeparatedByString:#"\n" or is there a better way?
NSScanner is often a good approach for parsing data.
You might want to pay attention to the end-of-line character, depending on where the file comes from, it can be a linefeed (\n), carriage-return (\r), or both characters (\r\n).
Using NSScanner will make it easy for you to scan for lines, while ignoring the actual end-of-line character (see [NSCharacterSet newLineCharacterSet]).
I am using PDFKitten for searching strings within PDF documents with highlighting of the results. FastPDFKit or any other commercial library is no option so i sticked to the most close one for my requirements.
As you can see in the screenshot i searched for the string "in" which is always correctly highlighted except the last one. I got a more complex PDF document where the highlighted box for "in" is nearly 40% wrong.
I read the whole syntax and checked the issues tracker but except line height problems i found nothing regarding the width calculation. For the moment i dont see any pattern where the calculation goes or could be wrong and i hope that maybe someone else had a close problem to mine.
My current expectation is that the coordinates and character width is wrong calculated somewhere in the font classes or RenderingState.m. The project is very complex and maybe someone of you had a similar problem with PDFKitten in the past.
I have used the original sample PDF document from PDFKitten for my screenshot.
This might be a bug in PDFKitten when calculating the width of characters whose character identifier does not coincide with its unicode character code.
appendPDFString in StringDetector works with two strings when processing some string data:
// Use CID string for font-related computations.
NSString *cidString = [font stringWithPDFString:string];
// Use Unicode string to compare with user input.
NSString *unicodeString = [[font stringWithPDFString:string] lowercaseString];
stringWithPDFString in Font transforms the sequence of character identifiers of its argument into a unicode string.
Thus, in spite of the name of the variable, cidString is not a sequence of character identifiers but instead of unicode chars. Nonetheless its entries are used as argument of didScanCharacter which in Scanner is implemented to forward the position by the character width: It is using the value as parameter of widthOfCharacter in Font to determine the character width, and that method (according to the comment "Width of the given character (CID) scaled to fontsize") expects its argument to be a character identifier.
So, if CID and unicode character code don't coincide, the wrong character widths is determined and the position of any following character cannot be trusted. In the case at hand, the /fi ligature has a CID of 12 which is way different from its Unicode code 0xfb01.
I would propose PDFKitten to be enhanced to also define a didScanCID method in StringDetector which in appendPDFString should be called next to didScanCharacter for each processed character forwarding its CID. Scanner then should make use of this new method instead to calculate the width to forward its cursor.
This should be triple-checked first, though. Maybe some widthOfCharacter implementations (there are different ones for different font types) in spite of the comment expect the argument to be a unicode code after all...
(Sorry if I used the wrong vocabulary here or there, I'm a 'Java guy... :))
As already pointed out in the topic, I got the following error:
Character #\u009C cannot be represented in the character set CHARSET:CP1252
trying to print out a string given back by drakma:http-request, as far as I understand the error-code the problem is that the windows-encoding (CP1252) does not support this character.
Therefore to be able to process it, I might/must convert the whole string.
My question is what package/library does support converting strings to certain character-sets efficiently?
An alike question is this one, but just ignoring the error would not help in my case.
Drakma already does the job of "converting strings": after all, when it reads from some random webserver, it just gets a stream of bytes. It then has to convert that to a lisp string. You probably want to bind *drakma-default-external-format* to something else, although I can't remember off-hand what the allowable values are. Maybe something like :utf-8?