Given this C string:
unsigned char *temp = (unsigned char *)[@"Hey, I am some usual CString" UTF8String];
How can I replace "usual" with "other" to get "Hey, I am some other CString"?
I cannot use NSString functions (replaceCharactersInRange:/replaceOccurrencesOfString:, etc.) for performance reasons. I have to keep it all at a low level, since the strings I'll be dealing with happen to exceed 5MB, and therefore the replacements (there will be a lot of replacements to do) take about 10 minutes on an iOS device.
Objective-C is just a thin layer over C.
If you need to work with native C strings, just go ahead and do it.
This question, "What is the function to replace string in C?", seems to address your problem fairly well.
The C string returned by UTF8String is const. You can't safely change it by casting it to a non-const pointer and mutating the bytes. So the only way to do this is by creating a copy.
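For illustration, here is a minimal sketch of that copy-based approach in plain C. It is only a sketch under stated assumptions: replace_all is a made-up name (not the code from the linked question), the search string is assumed non-empty, and error handling is minimal.

#include <stdlib.h>
#include <string.h>

// Replaces every non-overlapping occurrence of 'search' with 'replace' in 'src'
// and returns a newly malloc'd string the caller must free(), or NULL on failure.
char *replace_all(const char *src, const char *search, const char *replace)
{
    size_t searchLen = strlen(search);
    size_t replaceLen = strlen(replace);

    // First pass: count occurrences so the result buffer can be sized exactly.
    size_t count = 0;
    for (const char *p = strstr(src, search); p != NULL; p = strstr(p + searchLen, search))
        count++;

    size_t resultLen = strlen(src) - count * searchLen + count * replaceLen;
    char *result = malloc(resultLen + 1);
    if (result == NULL)
        return NULL;

    // Second pass: copy, substituting as we go.
    char *dst = result;
    const char *p = src;
    for (const char *hit = strstr(p, search); hit != NULL; hit = strstr(p, search)) {
        memcpy(dst, p, (size_t)(hit - p));   // text before the match
        dst += hit - p;
        memcpy(dst, replace, replaceLen);    // the replacement
        dst += replaceLen;
        p = hit + searchLen;                 // continue after the match
    }
    strcpy(dst, p);                          // remaining tail plus the terminating NUL
    return result;
}

Used with the string from the question it would look something like:

char *fixed = replace_all([@"Hey, I am some usual CString" UTF8String], "usual", "other");
// ... use fixed ...
free(fixed);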
If you really have a reason to use an NSString as the source, it might be much faster to do the transformation on the original string.
If you want to get a better answer that helps you to speed up your special case you should provide some more information. How do you create the original string, what's the number and size of search/replacement strings and so on.
Has anyone built a "universal" string class for C++Builder that manages all of the conversions to/from ASCII and Unicode?
I had a vision of a class that would accept AnsiString, UnicodeString, WideString, char*, wchar_t*, std::string, and variant values, and would provide any of those back out. AND the copy constructor has to do a deep copy, not just provide a pointer to the same buffer space (as AnsiString and UnicodeString do).
I figure someone else besides me must have to pass strings to both old interfaces that use char* and new ones that use (wide) strings. If you have built, or know of, something you're willing to share, please let me know. Most of the time it's not too big a deal, until I have to pass a map<std::string, std::string>, then it starts getting ugly.
We do not, and will not, support any internationalization whatsoever, so I don't need to worry about encoding. I just want a class that will return my little ASCII strings in whatever format makes the compiler happy... sanely.
UPDATE: to address the comments:
So, std::map<std::string, std::string> is ugly, because you can't do:
parammap[AnsiString(widekey).c_str()] = AnsiString(widevalue).c_str();
Oh no no no. You have to do this:
AnsiString akey = widekey;
AnsiString aval = widevalue;
parammap[akey.c_str()] = aval.c_str();
The person who originally wrote this code tried to keep it as port-friendly as possible, so he standardized on char* for all of the function calls he wrote (circa 2000, it wasn't a bad assumption). Sometimes I found myself converting everything to char*s before realizing that the function was then immediately turning around and converting it back to wide. There are multiple interface layers, and it took me a while to figure out how it all fit together.
Add in some creative compiler bugs, where it would get confused, especially when pulling string values out of Variants. In some places, I had to do:
String wstr = passedvariant.AsType(varString);
String astr = wstr;
std::string key = astr.c_str();
Then life happened, we ended up starting the port over (for the 3rd time. Don't ask), and I finally got smart and wrapped the low-level library in a layer that does all of the conversions, and retooled the middle layers to deal in Strings, so the application layer can just use String except for that map. For the map<string, string>, I created a function to do the converting, so it was one line in a bunch of places instead of six (the three line conversion above for both key and value).
Lastly, I wasn't actually asking for anyone to make suggestions on how to make my code better. I was asking if anyone had or knew of a universal string class. Our code is the way it is for reasons, and I'm not rewriting all of it to make it prettier. I just wanted not to have to touch so many lines... again. It would have been so much nicer to have the compiler keep track of which format is needed and convert it.
As the title of the question states I'm looking to take the following string in hexadecimal base:
b9c84ee012f4faa7a1e2115d5ca15893db816a2c4df45bb8ceda76aa90c1e096456663f2cc5e6748662470648dd663ebc80e151d4d940c98a0aa5401aca64663c13264b8123bcee4db98f53e8c5d0391a7078ae72e7520da1926aa31d18b2c68c8e88a65a5c221219ace37ae25feb54b7bd4a096b53b66edba053f4e42e64b63
And convert it to its decimal equivalent string:
130460875511427281888098224554274438589599458108116621315331564625526207150503189508290869993616666570545720782519885681004493227707439823316135825978491918446215631462717116534949960283082518139523879868865346440610923729433468564872249430429294675444577680464924109881111890440473667357213574597524163283811
I've tried to use this code, found at this link:
unsigned result = 0;
NSScanner *scanner = [NSScanner scannerWithString:hexString];
[scanner setScanLocation:1]; // bypass '#' character
[scanner scanHexInt:&result];
NSLog(@" %u", result);
However, I keep getting the following result: 4294967295. Any ideas on how I can solve this problem?
This sounds like a homework/quiz question, and SO isn't here to get code written for you, so here are some hints in the hope that they help.
Your number is BIG, far larger than any standard integer size, so you are not going to be able to do this with long long or even NSDecimal.
Now you could go and source an "infinite" precision arithmetic package, but really what you need to do isn't that hard (though if you are going to be doing more than this, then using such a package would make sense).
Now think back to your school days: how were you taught to do base conversion? The standard method is long division and remainders.
Example: start with BAD in hex and convert to decimal:
BAD ÷ A = 12A remainder 9
12A ÷ A = 1D remainder 8
1D ÷ A = 2 remainder 9
2 ÷ A = 0 remainder 2
now read the remainder back, last first, to give 2989 decimal.
Long division is a digit at a time process, starting with the most significant digit, and carrying the remainder as you move to the next digit. Sounds like a loop.
Your initial number is a string, the most significant digit is first. Sounds like a loop.
Processing characters one at a time from an NSString is, well, painful. So first convert your NSString to a standard C string. If you copy this into a C-array you can then overwrite it each time you "divide". You'll probably find the standard C functions strlen() and strcpy() helpful.
Of course you have characters in your string, not integer values. Include ctype.h in your code and use the digittoint() function to convert each character in your number to its numeric equivalent.
The standard library doesn't have the inverse of digittoint(), so to convert an integer back to its character equivalent you need to write your own code, think indexing into a suitable constant string...
Write a C function, something like int divide(char *hexstring), which does one long division of hexstring, writing the result into hexstring and returning the remainder. (If you wish to write more general code, useful for testing, write something like int divide(char *buf, int base, int divisor) - so you can convert hex to decimal and then back again to check you get back to where you started.)
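As a rough illustration of what such a divide() might look like, here is a sketch of the hex-by-ten case (untested, illustrative code; digittoint() is the ctype.h function mentioned above):

#include <ctype.h>
#include <string.h>

// One long division of the hex string in 'buf' by 10.
// The quotient is written back into 'buf' (leading zeros stripped)
// and the remainder, 0..9, is returned.
int divide(char *buf)
{
    size_t len = strlen(buf);
    int carry = 0;

    for (size_t i = 0; i < len; i++) {
        int value = carry * 16 + digittoint(buf[i]);  // bring down the next hex digit
        buf[i] = "0123456789abcdef"[value / 10];      // the inverse of digittoint()
        carry = value % 10;                           // remainder carried to the next digit
    }

    // Strip leading zeros from the quotient, keeping at least one digit.
    size_t firstNonZero = 0;
    while (firstNonZero < len - 1 && buf[firstNonZero] == '0')
        firstNonZero++;
    memmove(buf, buf + firstNonZero, len - firstNonZero + 1);  // +1 moves the NUL too

    return carry;
}

Running it on "bad" gives "12a" with remainder 9, matching the worked example above.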
Now you can loop calling your divide and accumulating the remainders (as characters) into another string.
How big should your result string be? Well a number written in decimal typically has more digits than when written in hex (e.g. 2989 v. BAD above). If you're being general then hex uses the fewest digits and binary uses the most. A single hex digit equates to 4 binary digits, so a working buffer 4 times the input size will always be long enough. Don't forget to allow for the terminating NUL in C strings in your buffer.
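Putting the pieces together, the calling loop might look roughly like this (again only a sketch; hex_to_decimal is an illustrative name, and it relies on the divide() sketch above):

#include <stdlib.h>
#include <string.h>

// Converts a hex string into a newly malloc'd decimal string (caller frees).
char *hex_to_decimal(const char *hex)
{
    size_t hexLen = strlen(hex);

    char *work = malloc(hexLen + 1);      // working copy we overwrite on each division
    char *out  = malloc(hexLen * 4 + 1);  // 4x the input is always enough, +1 for the NUL
    if (work == NULL || out == NULL) { free(work); free(out); return NULL; }
    strcpy(work, hex);

    // Accumulate remainders; they come out least-significant digit first.
    size_t n = 0;
    do {
        out[n++] = (char)('0' + divide(work));
    } while (strcmp(work, "0") != 0);

    // Reverse so the most significant digit comes first ("read the remainders back, last first").
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        char tmp = out[i]; out[i] = out[j]; out[j] = tmp;
    }
    out[n] = '\0';

    free(work);
    return out;
}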
And as hinted above, for testing make your code general, convert your hex string to a decimal one, then convert that back to a hex one and check the result is the same as the input.
If this sounds complicated don't despair, it only takes around 30 lines of well spaced code.
If you get stuck coding it ask a new question showing your code, explain what goes wrong, and somebody will undoubtedly help you out.
HTH
Your result is the maximum value of a 32-bit unsigned int, the type you are using. As far as I can see in the NSScanner documentation, long long is the biggest supported type.
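For what it's worth, the 64-bit variant would look something like the sketch below; note that even an unsigned long long holds only 64 bits, which is still far too small for the roughly 1024-bit number in the question.

unsigned long long result = 0;
NSScanner *scanner = [NSScanner scannerWithString:hexString];
[scanner scanHexLongLong:&result];   // the long long counterpart of scanHexInt:
NSLog(@"%llu", result);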
I have a string in little endian that I would like to convert to big endian.
This "647b" should become "7b64". How can I do this in iOS (C++ code welcome)?
PS: I am deriving the string from an NSData object.
You didn't say how you were converting your data into an NSString, so I have to make some assumptions here.
NSStrings (and CFStringRefs, which are toll-free bridged) have encodings. Many iOS devs rarely have to keep in mind that their strings are UTF-8 encoded.
If you look at the list of string encodings from Apple (Core Foundation & Foundation), you'll see that some of them do specify little-endian or big-endian.
And probably the best way to do what you are trying to do is something like this:
// load it as UTF16 big endian
NSString *str = [[NSString alloc] initWithData:yourDataObject encoding:NSUTF16BigEndianStringEncoding];
That line of code I found in this related question.
Does it make a difference which one I use in objective-c (particularly on iOS)? I assume it comes from inheriting from C and its types, as well as inheriting the types from Mac OS, which iOS was based on, but I don't know which one I should use:
unsigned char from...well..the compiler?
uint8_t from stdint.h
UInt8 from MacTypes.h
Byte from MacTypes.h
Bytef from zconf.h
I am aware that the various defs are for portability reasons, and using literals like unsigned char is not good future thinking (size might change, and things will end up like the Windows API again). I'd like some advice on how to spot the best ones for my uses. Or a good tongue lashing if I'm just being silly...
EDIT : Just for some more info, if I want something that will always be 1 byte, should I use uint8_t (doesn't seem like it would change, with a name like that)? I'd like to think UInt8 wouldn't change either, but I see that the definition of UInt32 varies depending on whether or not the processor is 64-bit.
FURTHER EDIT : When I say byte, I specifically mean that I want 8 bits. I am doing pixel compression operations (32 bits -> 8 bits) for disk storage.
It makes no real difference. Whichever you use, it will most probably end up being an unsigned char. If you want it to look nice, though, I suggest you use uint8_t from <stdint.h>.
None of them will change with the architecture. char is always 1 byte as per the C standard, and it would be untenable from a user's point of view if, in some implementation, UInt8 suddenly became 16 bits long.
(It is not the case, however, that char is required to be 8 bits wide; it's only that if the name of a type suggests that it's 8 bits long, then any sensible implementation does indeed typedef it as such. Incidentally, a byte (which char is) is most often an 8-bit unit, i.e. an octet.)
As in every programming language derived from the C type model, Objective-C has a handful of equivalent options for declaring an 8-bit integer.
Why did I say equivalent? Because, as the OP correctly stated, all of those options are eventually typedef'ed to the built-in compiler type unsigned char. This is correct for now and, practically speaking, nobody sane will change them to non-8-bit integers in the future.
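For reference, the relevant definitions boil down to roughly the following (paraphrased from the SDK headers; exact spellings and include guards vary between SDK versions):

// <stdint.h>
typedef unsigned char uint8_t;

// MacTypes.h
typedef unsigned char UInt8;
typedef UInt8         Byte;

// zconf.h (zlib) - guarded so it doesn't clash with the MacTypes.h Byte
typedef unsigned char Byte;
typedef Byte          Bytef;   // the 'f' is a leftover "far pointer" qualifier, not "float"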
So the actual question here is: how should we prioritize the considerations when choosing a type name for an 8-bit integer?
Code readability
In basically every codebase with C-language roots, primitive type names are a mess, so probably the most important factor is readability. And by that I mean a clear and uniquely identifiable intent behind choosing this specific type for this specific integer, for the majority of people who would read your code.
So let's take a look at those types from the point of view of an average Objective-C programmer who knows little about the C language.
unsigned char - what's this??? why would a char ever be signed???
uint8_t - ok, unsigned 8 bit integer
UInt8 - hmm, the same as above
Byte - an 8-bit integer, but signed or unsigned?
Bytef - what's this? byte-float? what does that 'f' mean?
It's obvious here that unsigned char and Bytef aren't good choices.
Going further, you can notice another nuisance with the Byte type name: you can't say for sure whether it represents a signed or an unsigned integer, which could be extremely important when you're trying to understand the range of values this integer can hold (-128 .. 127 or 0 .. 255). This doesn't add points to code readability either.
Uniform code style
We're now left with the 2 type names: uint8_t and UInt8. How to choose between them?
Again, looking at them through the eyes of an Objective-C programmer who uses type names like NSInteger and NSUInteger a lot, UInt8 looks much more natural. uint8_t just looks like very low-level, daunting stuff.
Conclusion
Thus, we are eventually left with a single option: UInt8. It is clearly identifiable in terms of number of bits and range, and it looks familiar. So it's probably the best choice here.
I'm converting some legacy code to Delphi 2010.
There are a fair number of old ShortStrings, like string[25]
Why does the assignment below:
var
  S: String;
  ShortS: String[25];
...
S := ShortS;
cause the compiler to generate this warning:
W1057 Implicit string cast from 'ShortString' to 'string'.
There's no data loss occurring here. In what circumstances would this warning be helpful information to me?
Thanks!
Tomw
It's because your code is implicitly converting a single-byte character string to a UnicodeString. It's warning you in case you might have overlooked it, since that can cause problems if you do it by mistake.
To make it go away, use an explicit conversion:
S := string(ShortS);
The ShortString type has not changed. It continues to be, in effect, an array of AnsiChar.
By assigning it to a string type, you are taking what is a group of AnsiChars (one byte) and putting it into a group of WideChars (two bytes). The compiler can do that just fine, and is smart enough not to lose data, but the warning is there to let you know that such a conversion has taken place.
The warning is very important because you may lose data. The conversion is done using the current Windows 8-bit character set, and some character sets do not define all values between 0 and 255, or are multi-byte character sets, and thus cannot convert all byte values.
The data loss can occur on a standard computer in a country with a specific standard character set, or on a computer in the USA that has been set up for a different locale because the user communicates a lot with people in other languages.
For instance, if the local code page is 932, the byte values 129 and 130 will both convert to the same value in the Unicode string.
In addition to this, the conversion involves a Windows API call, which is an expensive operation. If you do a lot of these, it can slow down your application.
It's safe (as long as you're using the ShortString for its intended purpose: to hold a string of characters, and not a collection of bytes, some of which may be 0), but it may have performance implications if you do it a lot. As far as I know, Delphi has to allocate memory for the new Unicode string, extract the characters from the ShortString into a null-terminated string (that's why it's important that it's a properly-formed string) and then call something like the Windows API MultiByteToWideChar() function. Not rocket science, but not a trivial operation either.
ShortStrings don't have a code page associated with them, AnsiStrings do (since D2009).
The conversion from ShortString to UnicodeString can only be done on the assumption that ShortStrings are encoded in the default ANSI encoding which is not a safe assumption.
I don't really know Delphi, but if I remember correctly, ShortStrings are essentially a sequence of characters on the stack, whereas a regular string (AnsiString) is actually a reference to a location on the heap. This may have different implications.
Here's a good article on the different string types:
http://www.codexterity.com/delphistrings.htm
I think there might also be a difference in terms of encoding but I'm not 100% sure.