SQLite Order By places umlauts & special chars at end - ios

I'm using Phonegap to do a dictionary app for iOS.
When querying the database for an alphabetical list I use COLLATE NOCASE:
ORDER BY term COLLATE NOCASE ASC
This solved the problem of terms starting with a lower-case letter being appended to the end (I picked it up from that question).
However, non-ASCII characters such as öäüéêè still get sorted to the end. Here are 2 examples:
Expected:          Observed:
Öffnungszeiten     Zuzahlung
Zuzahlung          Öffnungszeiten
(or)
clé                cliquer sur
cliquer sur        clé
I looked around and found similar matters discussed here or here, but it seems the general advice is to install some type of extension:
This extension can probably help you ...
...use ICU either as an extension
SQLite supports integration with ICU ...
But I'm not sure if this is applicable in my situation, where the database is not hosted by me but runs on the customer's device. So I guess I'd have to ship this extension with my app package.
I'm not very familiar with iOS, but I've got the feeling that this would be complicated, at the very least.
In the official forum I also found this hint:
SQLite does not properly handle accented characters.
and a little bit down in the text the poster mentions a bug in SQLite.
None of the links I've found have been active for a year or more, and none of them seem to deal with the mobile environment I'm currently developing in.
So I was wondering if anyone else has found a solution in their iOS projects.
The documentation states there are only 3 built-in collation options:
6.0 Collating Sequences
When SQLite compares two strings, it uses a collating sequence or
collating function (two words for the same thing) to determine which
string is greater or if the two strings are equal. SQLite has three
built-in collating functions: BINARY, NOCASE, and RTRIM.
BINARY - Compares string data using memcmp(), regardless of text encoding.
NOCASE - The same as binary, except the 26 upper case characters of ASCII are folded to their lower case equivalents before the
comparison is performed. Note that only ASCII characters are case
folded. SQLite does not attempt to do full UTF case folding due to the
size of the tables required.
RTRIM - The same as binary, except that trailing space characters are ignored.
For now my best guess would be to do the sorting in JavaScript, but I suspect that this wouldn't do overall performance any good.

The reason is that the SQLite that ships with iOS doesn't come with ICU (International Components for Unicode) enabled. So you need to build your own SQLite version with ICU enabled, build your own ICU version as a static lib, add the ICU .dat file, and make SQLite load this .dat file. Then you can load any collation via a simple SQL command (e.g. SELECT icu_load_collation('de_DE', 'DEUTSCH'); run once after the db was opened) and use it as ORDER BY term COLLATE DEUTSCH.
It doesn't just sound like dirty work, it really is. Try to find a version of SQLite + ICU with all of it done already.
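If a custom SQLite + ICU build is too heavy, a lighter direction is an application-defined collation, which SQLite lets the host program register (sqlite3_create_collation in the C API). The sketch below uses Python's built-in sqlite3 module only because it is easy to run. The collation name FOLDED, the table dict and the helper fold are made up for illustration. It strips accents before comparing, which handles the two examples from the question, but it is not real locale-aware collation the way ICU provides it.

```python
import sqlite3
import unicodedata


def fold(text):
    """Hypothetical helper: decompose, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive(a, b):
    """Collation callback: return negative, zero or positive like memcmp()."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# "FOLDED" is an arbitrary name chosen for this sketch.
conn.create_collation("FOLDED", accent_insensitive)
conn.execute("CREATE TABLE dict (term TEXT)")
conn.executemany(
    "INSERT INTO dict VALUES (?)",
    [("Zuzahlung",), ("Öffnungszeiten",), ("cliquer sur",), ("clé",)],
)

nocase = [r[0] for r in conn.execute(
    "SELECT term FROM dict ORDER BY term COLLATE NOCASE ASC")]
folded = [r[0] for r in conn.execute(
    "SELECT term FROM dict ORDER BY term COLLATE FOLDED ASC")]

print(nocase)  # accented terms land after their unaccented neighbours
print(folded)  # accented terms sort next to their base letters
```

The trade-off is that the callback runs for every comparison, so for large tables it may be worth storing a pre-folded sort key in its own indexed column instead.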

Related

String comparison (>) returns different results on different platforms? [duplicate]

Consider the following predicate
print("S" > "g")
Running this on Xcode yields false, whereas running this on the online compiler of tutorialspoint or e.g. the IBM Swift Sandbox (Swift Dev. 4.0 (Sep 5, 2017) / Platform: Linux (x86_64)), yields true.
Why does the predicate give a different result on the online compilers (Linux?) than in Xcode?
This is a known open "bug" (or perhaps rather a known limitation):
SR-530 - [String] sort order varies on Darwin vs. Linux
Quoting Dave Abrahams' comment to the open bug report:
This will mostly be fixed by the new string work, wherein String's
default sort order will be implemented as a lexicographical ordering
of FCC-normalized UTF16 code units.
Note that on both platforms we rely on ICU for normalization services,
and normalization differences among different implementations of ICU
are a real possibility, so there will never be a guarantee that two
arbitrary strings sort the same on both platforms.
However, for Latin-1 strings such as those in the example, the new
work will fix the problem.
Moreover, from the String Manifesto:
Comparing and Hashing Strings
...
Following this scheme everywhere would also allow us to make sorting
behavior consistent across platforms. Currently, we sort String
according to the UCA, except that--only on Apple platforms--pairs of
ASCII characters are ordered by unicode scalar value.
Most likely, for the OP's particular example (covering solely ASCII characters), comparison according to the UCA (Unicode Collation Algorithm) is used on Linux platforms, whereas on Apple platforms these single-ASCII-character String instances (or String instances starting with ASCII characters) are ordered by Unicode scalar value.
import Foundation // needed for String(format:)
// ASCII value
print("S".unicodeScalars.first!.value) // 83
print("g".unicodeScalars.first!.value) // 103
// Unicode scalar value
print(String(format: "%04X", "S".unicodeScalars.first!.value)) // 0053
print(String(format: "%04X", "g".unicodeScalars.first!.value)) // 0067
print("S" < "g") // 'true' on Apple platforms (comparison by unicode scalar value),
                 // 'false' on Linux platforms (comparison according to UCA)
See also the excellent accepted answer to the following Q&A:
What does it mean that string and character comparisons in Swift are not locale-sensitive?
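To see the two orderings side by side without depending on a Swift toolchain, here is a small Python sketch. Python compares strings by code point, which matches the scalar-value ordering described above. The case-folded sort key is only a rough stand-in for a collation-based ordering and is an assumption of this sketch, not an implementation of the UCA.

```python
s, g = "S", "g"

# Code point (scalar value) comparison: 83 < 103, so "S" sorts before "g".
print(ord(s), ord(g))
by_scalar = s > g

# Rough stand-in for collation: compare case-folded keys, so "s" sorts after "g".
by_folded = s.casefold() > g.casefold()

words = ["g", "S", "a", "Z"]
scalar_order = sorted(words)                    # upper case first, then lower case
folded_order = sorted(words, key=str.casefold)  # alphabetical, ignoring case

print(by_scalar, by_folded)
print(scalar_order, folded_order)
```

The takeaway is the same as in the answer: when a specific order matters, state the comparison explicitly instead of relying on the platform default.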

Unicode filenames in iOS

Is it possible to use the full range of (let's say) the Chinese language in filenames of assets (images) within iOS? If not, what portions of big languages are supported in filenames, string searches and other file handling activities?
iOS and Mac OS currently use the HFS+ filesystem, which supports full Unicode in filenames. This means essentially any character, including Chinese and other human languages. The filesystem allows up to 255 characters, which for most languages is about 255 code points. (I see a note that the length is based on UTF16-encoded characters. There are characters which require more than 16 bits to encode, like emoji, which you can also use, but you'll have fewer characters allowed.)
The file APIs on iOS (NSFileManager, etc) should accommodate Unicode strings without any extra work. Do note that Unicode sequences are canonicalized in a particular way: e.g. an é character can be represented in multiple different ways in Unicode, but will be decomposed in a standardized way as a filename.
The bottom line is, you can feel free to use Unicode strings as your filenames as long as they are of reasonable length. Because superlong Unicode names will start running into length issues in a slightly unpredictable way (really just complicated and unnecessary to compute), you should probably set some sane self-imposed length limits.
APFS is the next-gen filesystem that Apple is developing, and will appear on iOS at some point soon. I can't find info on file name encoding but it's a fair assumption that it will support anything HFS+ supports, if not more so.
The iOS filesystem uses case-sensitive HFSX, which is a variant of HFS Plus and uses the same rules for filenames and character encodings.
Those rules are laid out in several sections of Apple Technote 1150.
The important considerations are:
You may use up to 255 16-bit Unicode characters per file or folder name as described in the HFS Plus Names section of Technote 1150.
The filesystem at its base level uses Unicode v2.0 (this is fixed) and strings must be stored in fully decomposed, canonical order. This precludes the use of some "equivalent forms" -- i.e. they must be converted to decomposed form. This is described in detail in the Unicode Subtleties section of Technote 1150. This section details other issues and should be read carefully.
A list of illegal characters can be found in this Decomposition Table.
The colon character ':' is used as a directory separator and is invalid in file and folder names.
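The two constraints above (decomposed storage and a limit of 255 UTF-16 code units) can be approximated with a quick check. This is a sketch under assumptions: Python's NFD normalization follows the current Unicode tables rather than the fixed decomposition table HFS Plus uses, and the helper names utf16_units and fits_hfs_plus are made up for illustration.

```python
import unicodedata

HFS_PLUS_MAX_UNITS = 255  # limit from the answer above, in UTF-16 code units


def utf16_units(name):
    """Count UTF-16 code units (characters outside the BMP take two)."""
    return len(name.encode("utf-16-le")) // 2


def fits_hfs_plus(name):
    """Approximate check: decompose first, then compare against the limit."""
    decomposed = unicodedata.normalize("NFD", name)
    return utf16_units(decomposed) <= HFS_PLUS_MAX_UNITS


composed = "caf\u00e9"                             # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute accent

print(utf16_units(composed), utf16_units(decomposed))
print(utf16_units("\U0001F600"))      # one emoji, two code units
print(fits_hfs_plus("\u4e2d" * 255))  # 255 Chinese characters
print(fits_hfs_plus("\U0001F600" * 128))  # 128 emoji need 256 code units
```

This shows why the answers recommend a self-imposed limit well below 255: accents and emoji each consume more units than the visible character count suggests.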

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1.
I am trying to find matches between two customer name columns: one is a lookup and the other contains dirty data.
The job runs as expected when the customer names are in English. However, for Arabic names, only exact matches are found, regardless of the underlying match algorithm I used (Levenshtein, Metaphone, Double Metaphone), even with loose bounds for the Levenshtein algorithm (min 1, max 50).
I suspect this has to do with character encoding. How should I proceed? Is there any way I can operate on the Unicode or even UTF-8 interpretation in Talend?
I am using Excel data sources through tFileInputExcel.
I got it resolved by moving the data to MySQL with a UTF-8 collation. Somehow the Excel input wasn't preserving the collation.
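To illustrate why encoding matters for edit-distance matching, here is a plain Levenshtein sketch in Python. It is not Talend's implementation, and the two sample names are invented. The same pair of names gives a distance of 1 when compared as decoded text, a different distance when compared byte by byte, and a much larger one when the bytes are decoded with the wrong charset.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Sample names written with escapes to avoid bidirectional-text issues:
clean = "\u0645\u062d\u0645\u062f"        # Arabic name, 4 letters
dirty = "\u0645\u062d\u0645\u0648\u062f"  # same name with one extra letter

text_distance = levenshtein(clean, dirty)
byte_distance = levenshtein(clean.encode("utf-8"), dirty.encode("utf-8"))

# Simulate a source that hands over UTF-8 bytes decoded with the wrong charset.
mangled = dirty.encode("utf-8").decode("latin-1")
mangled_distance = levenshtein(clean, mangled)

print(text_distance, byte_distance, mangled_distance)
```

If the lookup column and the dirty column pass through different decoding paths, near matches can fall outside the configured distance bounds even though exact matches within one path still succeed.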

what is %s and %d?

I am trying to learn iOS programming, and I suppose this is a bit of a reverse question.
I have just completed a tutorial on YouTube using Xcode to create a simple iPhone app that lets you store, list and delete data from an SQLite3 database (as the app I want to produce will need a database).
However, the bloke who put the video up didn't explain why he did what he did, so I am now trying to understand what each bit of code does.
(I come from a PHP and SQL web programming background, so I understand accessing databases, calling data rows etc. to show the content on a website.)
The one part of this iOS code I don't quite understand is the %s and %d values, as they didn't seem to be declared anywhere.
The code is;
if (sqlite3_open([dbPathString UTF8String], &personDB) == SQLITE_OK) {
NSString *inserStmt = [NSString stringWithFormat:@"INSERT INTO PERSONS(NAME,AGE) values ('%s', '%d')", [self.nameField.text UTF8String], [self.ageField.text intValue]];
Now, %s and %d clearly get their values from self.nameField and self.ageField. However, does that imply that I could only ever submit two values into a table? Or are there other % specifiers for other values? But then surely there is a max of 26.
I would be grateful for any explanation you could give.
In addition, does anyone have suggestions for other fully explained ways to learn to code for iOS, especially for a beginner learning iOS programming for the first time with limited prior C programming skills?
What I am looking to do is create an app that will store some text fields and an image; the text will be stored in a database, and the image either in the database or as an appropriately named link.
I'd like to be able to resize the image so it is optimised for the iPhone display (I don't need an HD image in the app).
Later I'd like to work out how to either upload the local database (sqlite3) file to online storage (either my own server or Dropbox), or synchronise it to an SQL database (from an initial look, just exporting the file and embedding the images into a field would be better for this project, even though I know it is not the normal way of doing things).
%s and %d are format specifiers for a null-terminated array of characters and a signed 32-bit integer respectively. You can find the details about specifiers in the String Programming Guide. However, you should not format the string this way for a SQLite statement as it puts you at risk of SQL injection. Instead you should bind the values using ? and the appropriate sqlite3_bind* function. For your situation you would use sqlite3_bind_text for NAME and sqlite3_bind_int for AGE.
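The difference between formatting values into the SQL text and binding them can be shown in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for the C API (its ? placeholders play the role of sqlite3_bind_text and sqlite3_bind_int). The sample name is chosen to contain a quote character, and the in-memory PERSONS table mirrors the one in the question.

```python
import sqlite3

name, age = "O'Brien", 42

# %s and %d are placeholders filled in order from the arguments that follow.
print("%s is %d years old" % (name, age))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONS (NAME TEXT, AGE INTEGER)")

# Formatting the values into the SQL text breaks on the quote in the name
# (and would let crafted input change the statement).
formatted_sql = "INSERT INTO PERSONS(NAME,AGE) values ('%s', '%d')" % (name, age)
try:
    conn.execute(formatted_sql)
    formatted_failed = False
except sqlite3.OperationalError:
    formatted_failed = True

# Binding keeps the SQL text fixed and passes the values separately.
conn.execute("INSERT INTO PERSONS(NAME,AGE) VALUES (?, ?)", (name, age))
rows = conn.execute("SELECT NAME, AGE FROM PERSONS").fetchall()

print(formatted_failed, rows)
```

The number of placeholders is not limited to two: each ? (or each % specifier in a format string) consumes the next argument in order, so a statement can carry as many values as it has columns to fill.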
Have a look at the class reference:
https://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html
Here are the string format specifiers:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html#//apple_ref/doc/uid/TP40004265
As you can see, %d is outputting an integer while %s is outputting a string.
Part 1:
The % convention "string format specifiers" is a common standard for string substitution.
They are not variables, but typed substitution placeholders.
%s --> string
%d --> number
Part 2:
You might check out the iTunes U course:
iPhone Application Programming '11
by Prof. Jan Borchers
https://itunes.apple.com/us/itunes-u/iphone-application-programming/id474416629
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html
%s
Null-terminated array of 8-bit unsigned characters. Because the %s specifier causes the characters to be interpreted in the system default encoding, the results can be variable, especially with right-to-left languages. For example, with RTL, %s inserts direction markers when the characters are not strongly directional. For this reason, it’s best to avoid %s and specify encodings explicitly.
%d, %D
Signed 32-bit integer (int).
That is what is called a formatted string. Basically, it is a way to inject values into a string. The character after the % sign indicates the datatype that the value should be formatted as. In your case, %s is used to indicate a string value and %d is used to indicate a decimal integer value.
This type of string formatting is extremely common; many programming languages provide some mechanism for performing this type of string formatting and the formatting symbols are largely standardized. You can find more information on the C++ website.

String concatenation in SE 4.1 SELECT statement

I'm using ISQL-SE 4.1 and need to concatenate two CHAR strings in a SELECT statement. I tried using the two pipe symbols ||, but that doesn't work in 4.1. Is there another way to do the trick, maybe using subscripts [a,b] or a temporary file, without having to resort to ESQL?
The concatenation operator was introduced in Version 5.00. The functions were introduced even later.
There isn't a way to directly concatenate strings in the SQL dialect of Informix SE 4.10.
If you need string concatenation in SQL, upgrade to a more recent version of SE. (I note that Version 4.10 was originally released in about 1989.)
Alternatively, if you are writing a report, then you can select the individual strings and concatenate in the ACE report.
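The same "select the pieces, join them outside SQL" idea can be sketched in a client program. The example below is only an illustration in Python: no Informix driver is involved, the rows list stands in for the result of a two-column SELECT, and the column values and the helper full_name are made up. It assumes CHAR values arrive padded with trailing blanks, as fixed-width CHAR columns usually do.

```python
# Stand-in for the rows returned by a two-column SELECT on CHAR(10) columns.
rows = [
    ("Smith     ", "John      "),
    ("Lee       ", "Ann       "),
]


def full_name(last, first):
    """Trim the CHAR padding before joining the two values."""
    return first.rstrip() + " " + last.rstrip()


names = [full_name(last, first) for last, first in rows]
print(names)
```

Without the rstrip() calls the padding would end up in the middle of the joined string, which is the usual surprise when concatenating CHAR columns.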

Resources