iOS: Convert a string from little endiand to big endian - ios

I got a string in little endian that I would like to convert in big endian.
This "647b" should become "7b64". How can I do this in iOS (C++ code welcome)?
PS: I am deriving the string from a NSData object.

You didn't say how you were converting your data into a NSString, so I have to make some assumptions here.
NSStrings (and CFStringRefs, which are toll free bridged) have encodings. Many iOS devs don't need to keep in mind that their strings are UTF8Encoded.
[If you look at the list of string encodings from Apple (CoreFoundation & Foundation), you'll see some of them do specify little-endian and big-endian.
And probably the best way to do what you are trying to do is something like this:
// load it as UTF16 big endian
NSString *str = [NSString alloc] initWithData:yourDataObject encoding:NSUTF16BigEndianStringEncoding];
That line of code I found in this related question.

Related

Anyone have/know of a "universal" string class for C++Builder?

Has anyone built a "universal" string class for C++Builder that manages all of the conversions to/from ASCII and Unicode?
I had a vision of a class that would accept AnsiString, UnicodeString, WideString, char*, wchar_t*, std::string, and variant values, and would provide any of those back out. AND the copy constructor has to do a deep copy, not just provide a pointer to the same buffer space (as AnsiString and UnicodeString do).
I figure someone else besides me must have to pass strings to both old interfaces that use char* and new ones that use (wide) strings. If you have built, or know of, something you're willing to share, please let me know. Most of the time it's not too big a deal, until I have to pass a map<std::string, std::string>, then it starts getting ugly.
We do not, and will not, support any internationalization whatsoever, so I don't need to worry about encoding. I just want a class that will return my little ASCII strings in whatever format makes the compiler happy... sanely.
UPDATE: to address the comments:
So, std::map<std::string, std::string> is ugly, because you can't do:
parammap[AnsiString(widekey).c_str()] = AnsiString(widevalue).c_str();
Oh no no no. You have to do this:
AnsiString akey = widekey;
AnsiString aval = widevalue;
parammap[akey.c_str()] = aval.c_str();
The person who originally wrote this code tried to keep it as port-friendly as possible, so he standardized on char* for all of the function calls he wrote (circa 2000, it wasn't a bad assumption). Sometimes I was trying to convert everything to char *s before I realized that the function was then immediately turning around and converting it back to wide. There are multiple interface layers, and it took me a while to figure out how it all went together.
Add in some creative compiler bugs, where it would get confused, especially when pulling string values out of Variants. In some places, I had to do:
String wstr = passedvariant.AsType(varString);
String astr = wstr;
std::string key = astr.c_str();
Then life happened, we ended up starting the port over (for the 3rd time. Don't ask), and I finally got smart and wrapped the low-level library in a layer that does all of the conversions, and retooled the middle layers to deal in Strings, so the application layer can just use String except for that map. For the map<string, string>, I created a function to do the converting, so it was one line in a bunch of places instead of six (the three line conversion above for both key and value).
Lastly, I wasn't actually asking for anyone to make suggestions on how to make my code better. I was asking if anyone had or knew of a universal string class. Our code is the way it is for reasons, and I'm not rewriting all of it to make it prettier. I just wanted not to have to touch so many lines... again. It would have been so much nicer to have the compiler keep track of which format is needed and convert it.

Replace characters in C string

Given this C string:
unsigned char *temp = (unsigned char *)[#"Hey, I am some usual CString" UTF8String]
How can I replace "usual" with "other" to get: "Hey, I am some other CString".
I cannot use NSString functions (replaceCharactersInRange/replaceOccurencesOfString, etc.) for performance reasons. I have to keep it all at low level, since the strings I'll be dealing with happen to exceed 5MB, and therefore the replacements (there will be a lot of replacements to do) take about 10 minutes on a iOS device.
Objective-C is a just thin layer over C.
If you need to work with native C strings, just go ahead and do it.
This
What is the function to replace string in C?
seems to address your problem fairly well.
The C string returned by UTF8String is const. You can't safely change it by casting it to a non-const string and mutate the bytes. So the only way to do this is by creating a copy.
If you really have reason to use an NSString as the source it might be much faster to do the transformation on the original string.
If you want to get a better answer that helps you to speed up your special case you should provide some more information. How do you create the original string, what's the number and size of search/replacement strings and so on.

How to Validate a Character Compatible with MacRoman

Is there a simple way (a function, a method...) of validating a character that a user types to see if it's compatible with Mac OS Roman? I've read a few dozen topics to find out why an iOS application crashes in reference to CGContextShowTextAtPoint. I guess an application can crash if it tries to draw on an image a string (i.e. ©) containing a character that is not included in the Mac OS Roman set. There are 256 characters in this set. I wonder if there's a better way other than matching the selected character one by one with those 256 characters?
Thank you
You might give https://developer.apple.com/library/mac/#documentation/graphicsimaging/conceptual/drawingwithquartz2d/dq_text/dq_text.html a closer read.
You can draw any encoding using CGContextShowGlyphsAtPoint instead of CContextShowTextAtPoint so you can tell it what the encoding is. If the user types it then you'll be getting the string as an NSString which is a Unicode string underneath. Probably the easiest is going to be to get the utf8 encoding of that user entered string via NSString's UTF8String method.
If you really want to stick with the very limited MacRoman for some reason, then use NSString's cStringUsingEncoding: passing in NSMacOSRomanStringEncoding to get a MacRoman string. Read the documentation on this in NSString though. Will return null if the user string can't be encoded in MacRoman losslessly. As it discusses you can use dataUsingEncoding:allowLossyConversion: and canBeConvertedToEncoding: to check. Read the cautions in the Discussion for cStringUsingEncoding: about about lifecycle of the returned strings though. getCString:maxLength:encoding: might end up being a better choice for you. All discussed in the class documentation for NSString.
This doesn't directly answer the question but this answer may be a solution to your problem.
If you have an NSString, instead of using CGContextShowTextAtPoint, you can do:
[someStr drawAtPoint:somePoint withFont:someFont];
where someStr is an NSString containing any Unicode characters a user can type, somePoint is a CGPoint, and someFont is the UIFont to use to render the text.

Delphi XE - should I use String or AnsiString?

I finally upgraded to Delphi XE. I have a library of units where I use strings to store plain ANSI characters (chars between A and U). I am 101% sure that I will never ever use UNICODE characters in those places.
I want to convert all other libraries to Unicode, but for this specific library I think it will be better to stick with ANSI. The advantage is the memory requirement as in some cases I load very large TXT files (containing ONLY Ansi characters). The disadvantage might be that I have to do lots and lots of typecasts when I make those libraries to interact with normal (unicode) libraries.
There are some general guidelines to show when is good to convert to Unicode and when to stick with Ansi?
The problem with general guidelines is that something like this can be very specific to a person's situation. Your example here is one of those.
However, for people Googling and arriving here, some general guidelines are:
Yes, convert to Unicode. Don't try to keep an old app fully using AnsiStrings. The reason is that the whole VCL is Unicode, and you shouldn't try to mix the two, because you will convert every time you assign a Unicode string to an ANSI string, and that is a lossy conversion. Trying to keep the old way because it's less work (or some similar reason) will cause you pain; just embrace the new string type, convert, and go with it.
Instead of randomly mixing the two, explicitly perform any conversions you need to, once - for example, if you're loading data from an old version of your program you know it will be ANSI, so read it into a Unicode string there, and that's it. Ever after, it will be Unicode.
You should not need to change the type of your string variables - string pre-D2009 is ANSI, and in D2009 and alter is Unicode. Instead, follow compiler warnings and watch which string methods you use - some still take an AnsiString parameter and I find it all confusing. The compiler will tell you.
If you use strings to hold bytes (in other words, using them as an array of bytes because a character was a byte) switch to TBytes.
You may encounter specific problems for things like encryption (strings are no longer byte/characters, so 'character' for 'character' you may get different output); reading text files (use the stream classes and TEncoding); and, frankly, miscellaneous stuff. Search here on SO, most things have been asked before.
Commenters, please add more suggestions... I mostly use C++Builder, not Delphi, and there are probably quite a few specific things for Delphi I don't know about.
Now for your specific question: should you convert this library?
If:
The values between A and U are truly only ever in this range, and
These values represent characters (A really is A, not byte value 65 - if so, use TBytes), and
You load large text files and memory is a problem
then not converting to Unicode, and instead switching your strings to AnsiStrings, makes sense.
Be aware that:
There is an overhead every time you convert from ANSI to Unicode
You could use UTF8String, which is a specific type of AnsiString that will not be lossy when converted, and will still store most text (Roman characters) in a single byte
Changing all the instances of string to AnsiString could be a bit of work, and you will need to check all the methods called with them to see if too many implicit conversions are being performed (for performance), etc
You may need to change the outer layer of your library to use Unicode so that conversion code or ANSI/Unicode compiler warnings are not visible to users of your library
If you convert to Unicode, sets of characters (can't remember the syntax, maybe if 'S' in MySet?) won't work. From your description of characters A to U, I could guess you would like to use this syntax.
My recommendation? Personally, the only reason I would do this from the information you've given is the memory use, and possibly performance depending on what you're doing with this huge amount of A..Us. If that truly is significant, it's both the driver and the constraint, and you should convert to ANSI.
You should be able to wrap up the conversion at the interface between this unit and its clients. Use AnsiString internally and string everywhere else and you should be fine.
In general only use AnsiString if it is important that the Chars are single bytes, Otherwise the use of string ensures future compatibility with Unicode.
You need to check all libraries anyway because all Windows API functions in Delhpi XE replaced by their unicode-analogues, etc. If you will never use UNICODE you need to use Delphi 7.
Use AnsiString explicitly everywhere in this unit and then you'll get compiler warning errors (which you should never ignore) for String to AnsiString conversion errors if you happen to access the routines incorrectly.
Alternately, perhaps preferably depending on your situation, simply convert everything to UTF8.
Stick with Ansi strings ONLY if you do not have the time to convert the code properly. The use of Ansi strings is really only for backward compatibility - to my knowledge C# does not have an equiavalent to Ansi strings. Otherwise use the standard Unicode strings. If you have a look on my web-site I have a whole strings routines unit (about 5,000 LOC) that works with both Delphi 2007 (non-Uniocde) and XE (Unicode) with only "string" interfaces and contains almost all of the conversion issues you might face.

Delphi 2009 + Unicode + Char-size

I just got Delphi 2009 and have previously read some articles about modifications that might be necessary because of the switch to Unicode strings.
Mostly, it is mentioned that sizeof(char) is not guaranteed to be 1 anymore.
But why would this be interesting regarding string manipulation?
For example, if I use an AnsiString:='Test' and do the same with a String (which is unicode now), then I get Length() = 4 which is correct for both cases.
Without having tested it, I'm sure all other string manipulation functions behave the same way and decide internally if the argument is a unicode string or anything else.
Why would the actual size of a char be of interest for me if I do string manipulations?
(Of course if I use strings as strings and not to store any other data)
Thanks for any help!
Holger
With Unicode SizeOf(SomeChar) <> Length(SomeChar). Essentially the length of a string is less then the sum of the size of its chars. As long as you don't assume SizeOf(Char) = 1, or SizeOf(SomeString[x]) = 1 (since both are FALSE now) or try to interchange bytes with chars, then you shouldn't have any trouble. Any place you are doing something creative stuffing Bytes into Chars or Strings, then you will need to use AnsiString.
(SizeOf(SomeString) is still 4 no matter the length since it is essentially a pointer with some compiler magic.)
People often implicitly convert from characters to bytes in old Delphi code without really thinking about it. For example, when writing to a stream. When you write a string to a stream, you have to specify the number of bytes you write, but people often pass the character count instead. See this post from Chris Bensen for another example.
Another way people often make this implicit conversion and older code is by using a "string" to store binary data. In this case, they actually want bytes, but the data type expects characters. D2009 has a better type for this.
I didn't try Delphi 2009, but are using fpc which is also switching to unicode slowly. I'm 95% sure that everything below also holds for Delphi 2009
In fpc (when supporting unicode) it will be so that functions like 'length' take the codepage into consideration. Thus it will return the length of the string as a 'human' would see it. If there are - for example - two chinese characters, that both take two bytes of memory in unicode, length will return 2, since there are two characters in the string. But the string will take 4 bytes of memory. (+the memory for the reference count and the leading #0, but that aside)
What you can not do anymore is this:
var p : pchar;
begin
p := s[1];
for i := 0 to length(string)-1 do
begin
write(p);
inc(p);
end;
end;
Because this code will - in the two chinese-character example - write the wrong two characters. Namely the two bytes which are part of the first 'real' character.
In short: Length() doesn't return the amount of bytes allocated for the string anymore, but the amount of characters. (Before the switch to unicode, those two values were equal to eachother)
The actual size of a character shouldn't matter, unless you are doing the manipulation at the byte level.
(Of course if I use strings as strings and not to store any other data)
That's the key point, YOU don't use strings for other purposes, but some people do. They use strings just like arrays, so they (and that's including me) would need to check all such uses to make sure nothing is broken...
Lets not forget that there are times when this conversion is not really desired. Say for storing a GUID in a record for instance. The guid can only contain hexadecimal characters plus the - and brackets...making them take up twice the space can make quite an impact on existing code. Sure the simple solution is to change them to AnsiString, and deal with the compiler warnings if you do any string manipulation on them.
It can be an issue if you make Windows API calls. Or if you have legacy code that does inc or dec of str[0] to change its length.

Resources