I am currently working on a Swift script that lets the compiler type-check localization strings, something that has been sorely needed for a long time. If you are interested, you can check the project on GitHub to get a better understanding.
The problem
Part of it is the creation of methods from strings when the parser encounters special characters that are meant to be replaced at runtime (%d, %f, %@ etc.). A string like this:
"PROFILE_INFO" = "I am %@, I am %d years old and %.2fm in height!"
will get converted to a method with the following signature:
func profileInfo(value1 : String, value2 : Int, value3 : Float) { ...
What I am really curious about, and could not find anywhere, not even in the documentation, is which types are allowed in localization strings. I suspect they go through the default string formatting, so there are a lot of types to cover; in that case, I am curious what people have used and which types can be omitted. I am currently using the following regexp to find the special characters, and then converting them to the appropriate data types:
let regexp = "%([0-9]*.[0-9]*(d|f|ld)|@|d)"
let matches = self.matchesForRegexInText("%([0-9]*.[0-9]*(d|f|ld)|@|d)", text: string)
I know this covers most of the usual cases, but obviously, I would like to have full coverage if possible.
TLDR:
Q1: What format specifiers are allowed in localization strings? Are there any changes from the classic string format, or is everything the same?
Q2: Is there any better way to convert those characters to appropriate data type than using regexp to parse them out?
Thanks!
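For illustration, here is a minimal sketch of the matching step described above, written in Python rather than Swift. The pattern, the `SWIFT_TYPES` mapping and the `placeholder_types` helper are all hypothetical; the sketch covers only a subset of printf-style conversions and assumes the object specifier is `%@`. It also handles positional forms such as `%1$@` and skips the literal `%%`.

```python
import re

# Hypothetical pattern: optional positional index, flags, width, precision,
# length modifier, then one conversion character. "%%" is a literal percent.
SPEC = re.compile(
    r"%(?:\d+\$)?[-+ 0#]*\d*(?:\.\d+)?(?:hh|h|ll|l|q|L|z|t|j)?"
    r"([@dDiuUxXoOfeEgG%])"
)

# Assumed mapping from conversion character to a Swift parameter type.
# This is a deliberately partial subset, not a full list of specifiers.
SWIFT_TYPES = {
    "@": "String",
    "d": "Int", "D": "Int", "i": "Int",
    "u": "UInt", "U": "UInt", "x": "UInt", "X": "UInt", "o": "UInt", "O": "UInt",
    "f": "Float", "e": "Float", "E": "Float", "g": "Float", "G": "Float",
}


def placeholder_types(fmt):
    """Return the Swift type name for each placeholder, in order of appearance."""
    types = []
    for match in SPEC.finditer(fmt):
        conversion = match.group(1)
        if conversion == "%":  # "%%" consumes no argument
            continue
        types.append(SWIFT_TYPES[conversion])
    return types


profile_types = placeholder_types("I am %@, I am %d years old and %.2fm in height!")
```

Matching on the final conversion character, with the length modifier treated as optional, means `%d` and `%ld` do not need separate alternatives in the pattern.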
Assuming I define a variable like this in Lua
local input = "..."
Where the ... comes from a user-provided string. Would that user be able to perform code injection just from a variable definition? Do I need to sanitize the string?
As a general rule, if you ever need to ask yourself if you need to sanitize your inputs, the correct answer is "yes".
As to this particular case, if you just copy/paste the user's string directly into the Lua source file, even in quotes like that, they will be able to execute arbitrary code. It's not even particularly difficult; they can provide some text"; my_code = 20; last = "end of string.
The best way to sanitize this is by using a long-form literal string with [[...]] syntax. But even that can be broken out of, so you need to search through the given string for repeated sequences of the = character. Each time you find a sequence, note how many = characters are in that sequence. After searching, insert a number of = characters into your long-bracket delimiters that isn't one of the lengths found in the user string.
Of course, the internal implementation of Lua may have some limits on the length of the = sequence in a long-form literal string. In such a case, an external user could break your code by forcing you to use a longer sequence than the implementation supports. But it won't be able to cause arbitrary code execution; you'll just get a compile error.
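The search described above can be sketched as follows, in Python on the assumption that the Lua source is generated by some external tool. The helper name `lua_long_literal` is hypothetical. The sketch always uses at least one = so that a plain ]] in the input cannot close the literal, and it emits a newline after the opening bracket because Lua drops a newline in that position. Carriage returns in the input may still be normalized by the Lua lexer, which this sketch does not address.

```python
import re


def lua_long_literal(text):
    """Wrap text in a Lua long-bracket literal that the text cannot close."""
    # Lengths of every maximal run of '=' found in the user-provided text.
    used = {len(run) for run in re.findall(r"=+", text)}
    # Pick the smallest level >= 1 that does not occur in the text.
    level = 1
    while level in used:
        level += 1
    equals = "=" * level
    # The newline after the opening bracket is skipped by Lua, so a leading
    # newline that belongs to the text itself is preserved.
    return "[" + equals + "[\n" + text + "]" + equals + "]"


assignment = "local input = " + lua_long_literal('some text"; my_code = 20; last = "end')
```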
I have a very simple task: I get a UTF-8 string from a server as a byte array, and I need to show all the symbols from this string in upper case. The string can contain really any symbol from the Unicode table.
I know how to do it in a line of code:
NSString* upperStr = [[NSString stringWithCString:utf8str encoding:NSUTF8StringEncoding] uppercaseString];
And it seems to me that it works with all the symbols I have checked. But I don't understand why we need the method uppercaseStringWithLocale. When we work with Unicode, each symbol has a unique place in the Unicode table and we can easily find out whether it has an upper/lower case representation. What trouble might I have if I use uppercaseString instead of uppercaseStringWithLocale?
Uppercasing is locale-dependent. For example, the uppercase version of "i" is "I" in English, but "İ" in Turkish. If you always apply English rules, uppercasing of Turkish words will end up wrong.
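To make the Turkish example concrete, here is a small Python sketch. Python's `str.upper()` takes no locale, so it plays the role of the locale-independent mapping, and the hypothetical helper `upper_for_locale` applies the dotted-i rule by hand. It is an illustration of the rule only, not a replacement for a real locale-aware API, and it ignores special cases in other languages.

```python
def upper_for_locale(text, language):
    """Uppercase text, applying the Turkish/Azerbaijani dotted-i rule if asked."""
    if language in ("tr", "az"):
        # In these languages "i" uppercases to U+0130 (capital I with dot above).
        text = text.replace("i", "\u0130")
    # The default mapping already turns dotless "ı" (U+0131) into "I".
    return text.upper()


english = upper_for_locale("istanbul", "en")
turkish = upper_for_locale("istanbul", "tr")
```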
The docs say:
The following methods perform localized case mappings based on the
locale specified. Passing nil indicates the canonical mapping. For
the user preference locale setting, specify +[NSLocale currentLocale].
Presumably, in some locales the mapping from lowercase to uppercase changes even within a character set. I'm not an expert in every language around the globe, but the people who wrote these methods are, so I use 'em.
Is there a simple way (a function, a method...) of validating a character that a user types to see if it's compatible with Mac OS Roman? I've read a few dozen topics to find out why an iOS application crashes in reference to CGContextShowTextAtPoint. I guess an application can crash if it tries to draw on an image a string (i.e. ©) containing a character that is not included in the Mac OS Roman set. There are 256 characters in this set. I wonder if there's a better way other than matching the selected character one by one with those 256 characters?
Thank you
You might give https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_text/dq_text.html a closer read.
You can draw any encoding using CGContextShowGlyphsAtPoint instead of CGContextShowTextAtPoint so you can tell it what the encoding is. If the user types it then you'll be getting the string as an NSString, which is a Unicode string underneath. Probably the easiest is going to be to get the UTF-8 encoding of that user-entered string via NSString's UTF8String method.
If you really want to stick with the very limited MacRoman for some reason, then use NSString's cStringUsingEncoding:, passing in NSMacOSRomanStringEncoding, to get a MacRoman string. Read the documentation on this in NSString though. It will return NULL if the user string can't be encoded in MacRoman losslessly. As it discusses, you can use dataUsingEncoding:allowLossyConversion: and canBeConvertedToEncoding: to check. Read the cautions in the Discussion for cStringUsingEncoding: about the lifecycle of the returned strings though. getCString:maxLength:encoding: might end up being a better choice for you. All of this is discussed in the class documentation for NSString.
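The idea behind canBeConvertedToEncoding: is to attempt a lossless conversion and see whether it succeeds, rather than comparing against 256 characters one by one. A rough Python analogue, using Python's built-in `mac_roman` codec and a hypothetical helper name, looks like this:

```python
def fits_mac_roman(text):
    """Return True if every character of text can be encoded as Mac OS Roman."""
    try:
        text.encode("mac_roman")  # strict mode raises on unencodable characters
        return True
    except UnicodeEncodeError:
        return False


copyright_ok = fits_mac_roman("\u00a9 2013")       # copyright sign
kanji_ok = fits_mac_roman("\u65e5\u672c\u8a9e")    # CJK text
```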
This doesn't directly answer the question but this answer may be a solution to your problem.
If you have an NSString, instead of using CGContextShowTextAtPoint, you can do:
[someStr drawAtPoint:somePoint withFont:someFont];
where someStr is an NSString containing any Unicode characters a user can type, somePoint is a CGPoint, and someFont is the UIFont to use to render the text.
I am trying to learn iOS programming, and I suppose this is a bit of a reverse question.
I have just completed a tutorial on YouTube using Xcode to create a simple iPhone app that will allow you to store, list and delete data from an SQLite3 database (as the app I want to produce will need a database).
However, the bloke who put the video up didn't seem to explain 'why' he did what he did, so I am now trying to understand what each bit of code does.
(I come from a PHP and SQL web programming background, so I understand accessing databases, calling data rows etc. to show the content on a website.)
The one part of this iOS bit I don't quite understand is the %s and %d values used, as they didn't seem to be declared anywhere.
The code is:
if(sqlite3_open([dbPathString UTF8String], &personDB)==SQLITE_OK) {
NSString *inserStmt = [NSString stringWithFormat:@"INSERT INTO PERSONS(NAME,AGE) values ('%s', '%d')",[self.nameField.text UTF8String],[self.ageField.text intValue]];
Now, %s and %d clearly get their values from self.nameField and self.ageField. However, that implies that I could only ever submit two values into a table? Or are there other % codes for other values, but surely then there is a max of 26?
I would be grateful for any explanation you could give.
In addition, does anyone have any suggestions about other fully explained ways to learn to code for iOS, especially for a starter learning iOS programming for the first time with limited C programming skills beforehand?
What I am looking to do is create an app that will store some text fields and an image; the text will be stored in a database, and the image either in the database or as an appropriately named link.
I'd like to be able to manipulate the image to resize it so it is optimised for the iPhone display (I don't need an HD image in the app).
Later I'd like to work out how to either upload the local database (sqlite3) file to online storage (either my own server or Dropbox), or synchronise it to an SQL database (from initial looks, just exporting the file and embedding the images into a field would be better for this project, even though I know it is not the normal way of doing things).
%s and %d are format specifiers for a null-terminated array of characters and a signed 32-bit integer respectively. You can find the details about specifiers in the String Programming Guide. However, you should not format the string this way for a SQLite statement as it puts you at risk of SQL injection. Instead you should bind the values using ? and the appropriate sqlite3_bind* function. For your situation you would use sqlite3_bind_text for NAME and sqlite3_bind_int for AGE.
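As a sketch of what binding looks like, here is the same insert using Python's standard `sqlite3` module instead of the C API. The ? placeholders play the same role as with sqlite3_bind_text and sqlite3_bind_int; the in-memory database, the column types and the `insert_person` helper are assumptions made for the example.

```python
import sqlite3


def insert_person(conn, name, age):
    # "?" marks a bound parameter: the values are passed separately and
    # never become part of the SQL text, so quotes in name are harmless.
    conn.execute("INSERT INTO PERSONS(NAME, AGE) VALUES (?, ?)", (name, age))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONS(NAME TEXT, AGE INTEGER)")

# A name that would break out of the quotes in a formatted SQL string.
insert_person(conn, "Robert'); DROP TABLE PERSONS;--", 7)
rows = conn.execute("SELECT NAME, AGE FROM PERSONS").fetchall()
```

With string formatting, the same name would have terminated the quoted value and injected a second statement; with binding it is stored verbatim as data.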
Have a look at the class reference:
https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
Here are the string format specifiers:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html#//apple_ref/doc/uid/TP40004265
As you can see, %d is outputting an integer while %s is outputting a string.
Part 1:
The % convention "string format specifiers" is a common standard for string substitution.
They are not variables, but typed substitution placeholders.
%s --> string
%d --> number
Part 2:
You might check out the iTunes U course:
iPhone Application Programming '11
by Prof. Jan Borchers
https://itunes.apple.com/us/itunes-u/iphone-application-programming/id474416629
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. Because the %s specifier causes the characters to be interpreted in the system default encoding, the results can be variable, especially with right-to-left languages. For example, with RTL, %s inserts direction markers when the characters are not strongly directional. For this reason, it’s best to avoid %s and specify encodings explicitly.
%d, %D
Signed 32-bit integer (int).
That is what is called a formatted string, basically, it is a way to inject values into a string. The character after the % sign is used to indicate the datatype that the value should be formatted as. In your case, %s is used to indicate a string value and %d is used to indicate a decimal integral value.
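Python's % operator follows the same printf-style convention, so it can serve as a quick stand-in to show that placeholders are matched to arguments by position and that a string is not limited to two of them. The variable names below are made up for the example.

```python
name, age, height = "Ann", 30, 1.75

# Three placeholders, three arguments, consumed left to right.
# The letter picks the formatting, not a variable: %s string, %d integer,
# %.2f floating point with two decimals.
line = "%s is %d years old and %.2f m tall" % (name, age, height)

# The same specifier can appear as many times as needed.
pair = "%d + %d" % (2, 3)
```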
This type of string formatting is extremely common; many programming languages provide some mechanism for performing it, and the formatting symbols are largely standardized. You can find more information on the C++ website.
Here's my problem. I have a YAML doc that contains the following pair:
run_ID: 2010_03_31_101
When this gets parsed at
org.yaml.snakeyaml.constructor.SafeConstructor.ConstructYamlInt:159
the underscores get stripped and the constructor returns the Long 20100331101 instead of the unmodified String "2010_03_31_101" that I really need.
QUESTION: How can I disable this behavior and force the parser to use the String constructor instead of Long?
OK. Got an answer from their mailing list. Here it is:
Hi, according to the spec (http://yaml.org/type/int.html): Any “_” characters in the number are ignored, allowing a readable representation of large values.
You have a few ways to solve it.
1) Do not rely on implicit types; use quotes (single or double): run_ID: '2010_03_31_101'
2) Turn off the resolver for integers (as it is done here for floats): link 1, link 2
3) Define your own pattern for int: link 3
Please be aware that when you start to deviate from the spec other recipients may fail to parse your YAML document. Using quotes is safe.
Andrey
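To see why quoting helps, here is a toy Python sketch of implicit typing. It is not SnakeYAML's API: the `resolve_scalar` helper is hypothetical, it only covers the decimal form of the integer pattern from the spec page linked above, and it ignores escape sequences inside quotes.

```python
import re

# Decimal form of the YAML 1.1 integer pattern: underscores are allowed
# after the first digit and are ignored when the value is built.
YAML11_DECIMAL_INT = re.compile(r"[-+]?(0|[1-9][0-9_]*)")


def resolve_scalar(raw):
    """Return an int for plain scalars that look like integers, else a str."""
    # Quoted scalars are never implicitly typed; just strip the quotes.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
        return raw[1:-1]
    if YAML11_DECIMAL_INT.fullmatch(raw):
        return int(raw.replace("_", ""))
    return raw


plain = resolve_scalar("2010_03_31_101")
quoted = resolve_scalar("'2010_03_31_101'")
```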