I'm looking for anything that can help me devise string GetRTTIClassName(IntPtr ProcessHandle, IntPtr StructAddress). The function would use another (third-party) app's process handle to get the names of structures located at specific addresses in its memory (if any are found there).
All the RTTI questions and documentation I can find relate to RTTI being used within the same application and have nothing to do with process interop. The only thing close to what I'm looking for is this module in Cheat Engine's source code (which is also how I found out that it's possible in the first place), but it has over a dozen nested language-specific dependencies, not to mention that Lazarus won't let me build it outside of the project context anyway.
If you know of code examples, libraries, documentation on what I've described, or just info on accessing another app's low-level metadata (pardon my French), please share them. If it makes a difference, I'm targeting C#.
Edit: from what I've gathered, the way runtime information is stored depends on the compiler, so I'll mention that the third-party app I'm "exploring" is an MSVC project.
As I understand, I need to:
1. Get the address of the structure based on the address of its instance;
2. Starting from the structure address, navigate through pointers to find its name (possibly "decorated").
I've also found a more readable C# implementation and a bunch of articles on reversing (works for step 2), but I can't seem to find step 1.
I'll update/comment as I find more info, but right now I'm getting a headache just digging into this low-level stuff.
It's a pretty long pointer ladder. I've transcribed the solution ReClass.NET uses to clean C# without dependencies.
Resulting library can be found here.
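For anyone who wants to see the shape of the ladder before opening the library, here is a minimal sketch in Python. It is not the ReClass.NET code. It assumes the commonly documented 32-bit MSVC layout (the slot just before the vftable points to the RTTICompleteObjectLocator, whose pTypeDescriptor field sits at offset 12, and the TypeDescriptor name starts at offset 8), and it walks a hand-made fake memory buffer instead of calling ReadProcessMemory. The helper names and the addresses in build_demo_memory are made up for illustration; 64-bit images store image-relative offsets instead of pointers and need different handling.

```python
import struct

# Assumed 32-bit MSVC RTTI offsets (x64 images store RVAs instead of pointers).
COL_TYPE_DESCRIPTOR_OFFSET = 12   # RTTICompleteObjectLocator.pTypeDescriptor
TYPE_DESCRIPTOR_NAME_OFFSET = 8   # TypeDescriptor.name


def read_u32(mem, addr):
    """Read a little-endian 32-bit value; stands in for ReadProcessMemory."""
    return struct.unpack_from("<I", mem, addr)[0]


def read_cstring(mem, addr):
    """Read a NUL-terminated ASCII string starting at addr."""
    end = mem.index(b"\x00", addr)
    return bytes(mem[addr:end]).decode("ascii")


def rtti_class_name(mem, instance_addr):
    """Follow instance -> vftable -> locator -> type descriptor -> name."""
    vftable = read_u32(mem, instance_addr)      # step 1: first field of the object
    locator = read_u32(mem, vftable - 4)        # slot just before the vftable
    type_desc = read_u32(mem, locator + COL_TYPE_DESCRIPTOR_OFFSET)
    return read_cstring(mem, type_desc + TYPE_DESCRIPTOR_NAME_OFFSET)


def build_demo_memory():
    """Hypothetical fake address space containing one polymorphic object."""
    mem = bytearray(0x100)
    struct.pack_into("<I", mem, 0x10, 0x44)     # instance at 0x10, vftable at 0x44
    struct.pack_into("<I", mem, 0x40, 0x60)     # vftable[-1] -> locator at 0x60
    struct.pack_into("<I", mem, 0x6C, 0x80)     # locator.pTypeDescriptor -> 0x80
    name = b".?AVFoo@@\x00"                     # decorated name of "class Foo"
    mem[0x88:0x88 + len(name)] = name           # TypeDescriptor.name at 0x80 + 8
    return mem


demo_name = rtti_class_name(build_demo_memory(), 0x10)
print(demo_name)
```

Against a real process, each read_u32/read_cstring call would become a ReadProcessMemory call with validity checks, since any of the pointers may be garbage if the address is not actually a polymorphic object.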
Objective-C has no namespaces; it's much like C, everything is within one global namespace. Common practice is to prefix classes with initials, e.g. if you are working at IBM, you could prefix them with "IBM"; if you work for Microsoft, you could use "MS"; and so on. Sometimes the initials refer to the project, e.g. Adium prefixes classes with "AI" (as there is no company behind it whose initials you could take). Apple prefixes classes with NS and says this prefix is reserved for Apple only.
So far so good. But prepending 2 to 4 letters to a class name makes for a very, very limited namespace. E.g. MS or AI could have entirely different meanings (AI could be Artificial Intelligence, for example), and some other developer might decide to use them and create an equally named class. Bang, namespace collision.
Okay, if this is a collision between one of your own classes and one of an external framework you are using, you can easily change the naming of your class, no big deal. But what if you use two external frameworks, both frameworks that you don't have the source to and that you can't change? Your application links with both of them and you get name conflicts. How would you go about solving these? What is the best way to work around them in such a way that you can still use both classes?
In C you can work around these by not linking directly to the library, instead you load the library at runtime, using dlopen(), then find the symbol you are looking for using dlsym() and assign it to a global symbol (that you can name any way you like) and then access it through this global symbol. E.g. if you have a conflict because some C library has a function named open(), you could define a variable named myOpen and have it point to the open() function of the library, thus when you want to use the system open(), you just use open() and when you want to use the other one, you access it via the myOpen identifier.
Is something similar possible in Objective-C and, if not, is there any other clever, tricky solution you can use to resolve namespace conflicts? Any ideas?
Update:
Just to clarify this: answers that suggest how to avoid namespace collisions in advance or how to create a better namespace are certainly welcome; however, I will not accept them as the answer since they don't solve my problem. I have two libraries and their class names collide. I can't change them; I don't have the source of either one. The collision is already there and tips on how it could have been avoided in advance won't help anymore. I can forward them to the developers of these frameworks and hope they choose a better namespace in the future, but for the time being I'm searching for a solution that lets me work with both frameworks right now within a single application. Any solutions to make this possible?
Prefixing your classes with a unique prefix is fundamentally the only option but there are several ways to make this less onerous and ugly. There is a long discussion of options here. My favorite is the @compatibility_alias Objective-C compiler directive (described here). You can use @compatibility_alias to "rename" a class, allowing you to name your class using FQDN or some such prefix:
@interface COM_WHATEVER_ClassName : NSObject
@end
@compatibility_alias ClassName COM_WHATEVER_ClassName;
// now ClassName is an alias for COM_WHATEVER_ClassName
@implementation ClassName //OK
//blah
@end
ClassName *myClass; //OK
As part of a complete strategy, you could prefix all your classes with a unique prefix such as the FQDN and then create a header with all the @compatibility_alias directives (I would imagine you could auto-generate said header).
The downside of prefixing like this is that you have to enter the true class name (e.g. COM_WHATEVER_ClassName above) in anything that needs the class name as a string, i.e. anywhere other than the compiler. Notably, @compatibility_alias is a compiler directive, not a runtime function, so NSClassFromString(@"ClassName") will fail (return nil); you'll have to use NSClassFromString(@"COM_WHATEVER_ClassName"). You can use ibtool via a build phase to modify class names in an Interface Builder nib/xib so that you don't have to write the full COM_WHATEVER_... in Interface Builder.
Final caveat: because this is a compiler directive (and an obscure one at that), it may not be portable across compilers. In particular, I don't know if it works with the Clang frontend from the LLVM project, though it should work with LLVM-GCC (LLVM using the GCC frontend).
If you do not need to use classes from both frameworks at the same time, and you are targeting platforms which support NSBundle unloading (OS X 10.4 or later, no GNUStep support), and performance really isn't an issue for you, I believe that you could load one framework every time you need to use a class from it, and then unload it and load the other one when you need to use the other framework.
My initial idea was to use NSBundle to load one of the frameworks, then copy or rename the classes inside that framework, and then load the other framework. There are two problems with this. First, I couldn't find a function to rename or copy a class. Second, any other classes in that first framework which reference the renamed class would now reference the class from the other framework.
You wouldn't need to copy or rename a class if there were a way to copy the data pointed to by an IMP. You could create a new class and then copy over ivars, methods, properties and categories. Much more work, but it is possible. However, you would still have a problem with the other classes in the framework referencing the wrong class.
EDIT: The fundamental difference between the C and Objective-C runtimes is, as I understand it, that when C libraries are loaded, the functions in those libraries contain pointers to any symbols they reference, whereas in Objective-C, they contain string representations of the names of those symbols. Thus, in your example, you can use dlsym to get the symbol's address in memory and attach it to another symbol. The other code in the library still works because you're not changing the address of the original symbol. Objective-C uses a lookup table to map class names to addresses, and it's a 1-1 mapping, so you can't have two classes with the same name. Thus, to load both classes, one of them must have its name changed. However, when other classes need to access one of the classes with that name, they will ask the lookup table for its address, and the lookup table will never return the address of the renamed class given the original class's name.
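To make that last point concrete, here is a toy model in Python of a runtime that resolves classes by name. It is only an illustration of the reasoning above, not how the Objective-C runtime is implemented: ToyRuntime, the "Parser" class name and the framework names are all made up, and the real runtime's reaction to a duplicate name differs from simply raising an error.

```python
class NameCollision(Exception):
    """Raised by the toy runtime when a class name is already taken."""


class ToyRuntime:
    """Toy model of a runtime that resolves classes by name, not by address."""

    def __init__(self):
        self._classes = {}          # exactly one entry per class name

    def register(self, name, cls):
        if name in self._classes:
            raise NameCollision(name)
        self._classes[name] = cls

    def lookup(self, name):
        return self._classes[name]


runtime = ToyRuntime()
framework_a_parser = type("Parser", (), {"origin": "FrameworkA"})
framework_b_parser = type("Parser", (), {"origin": "FrameworkB"})

runtime.register("Parser", framework_a_parser)
try:
    runtime.register("Parser", framework_b_parser)
    collided = False
except NameCollision:
    collided = True

# Renaming B's class lets it load, but any code inside FrameworkB that still
# asks for "Parser" by name is handed FrameworkA's class instead.
runtime.register("B_Parser", framework_b_parser)
resolved_for_b = runtime.lookup("Parser").origin
print(collided, resolved_for_b)
```

The rename gets the second class into the table, yet the by-name lookup made on behalf of FrameworkB's own code still resolves to FrameworkA's class, which is exactly the remaining problem described above.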
Several people have already shared some tricky and clever code that might help solve the problem. Some of the suggestions may work, but all of them are less than ideal, and some of them are downright nasty to implement. (Sometimes ugly hacks are unavoidable, but I try to avoid them whenever I can.) From a practical standpoint, here are my suggestions.
In any case, inform the developers of both frameworks of the conflict, and make it clear that their failure to avoid and/or deal with it is causing you real business problems, which could translate into lost business revenue if unresolved. Emphasize that while resolving existing conflicts on a per-class basis is a less intrusive fix, changing their prefix entirely (or using one if they're not currently, and shame on them!) is the best way to ensure that they won't see the same problem again.
If the naming conflicts are limited to a reasonably small set of classes, see if you can work around just those classes, especially if one of the conflicting classes isn't being used by your code, directly or indirectly. If so, see whether the vendor will provide a custom version of the framework that doesn't include the conflicting classes. If not, be frank about the fact that their inflexibility is reducing your ROI from using their framework. Don't feel bad about being pushy within reason — the customer is always right. ;-)
If one framework is more "dispensable", you might consider replacing it with another framework (or combination of code), either third-party or homebrew. (The latter is the undesirable worst-case, since it will certainly incur additional business costs, both for development and maintenance.) If you do, inform the vendor of that framework exactly why you decided to not use their framework.
If both frameworks are deemed equally indispensable to your application, explore ways to factor out usage of one of them to one or more separate processes, perhaps communicating via DO (distributed objects) as Louis Gerbarg suggested. Depending on the degree of communication, this may not be as bad as you might expect. Several programs (including QuickTime, I believe) use this approach to get the more granular security offered by Seatbelt sandbox profiles in Leopard, such that only a specific subset of your code is permitted to perform critical or sensitive operations. Performance will be a tradeoff, but this may be your only option.
I'm guessing that licensing fees, terms, and durations may prevent instant action on any of these points. Hopefully you'll be able to resolve the conflict as soon as possible. Good luck!
This is gross, but you could use distributed objects in order to keep one of the classes only in a subordinate program's address space and RPC to it. That will get messy if you are passing a ton of stuff back and forth (and may not be possible if both classes are directly manipulating views, etc).
There are other potential solutions, but a lot of them depend on the exact situation. In particular, are you using the modern or legacy runtimes, are you fat or single architecture, 32 or 64 bit, what OS releases are you targeting, are you dynamically linking, statically linking, or do you have a choice, and is it potentially okay to do something that might require maintenance for new software updates.
If you are really desperate, what you could do is:
Not link against one of the libraries directly
Implement an alternate version of the objc runtime routines that changes the name at load time (check out the objc4 project; what exactly you need to do depends on a number of the questions I asked above, but it should be possible no matter what the answers are).
Use something like mach_override to inject your new implementation
Load the new library using normal methods, it will go through the patched linker routine and get its className changed
The above is going to be pretty labor intensive, and if you need to implement it against multiple archs and different runtime versions it will be very unpleasant, but it can definitely be made to work.
Have you considered using the runtime functions (/usr/include/objc/runtime.h) to clone one of the conflicting classes to a non-colliding class, and then loading the colliding class's framework? (This would require the colliding frameworks to be loaded at different times to work.)
You can inspect a class's ivars, methods (with names and implementation addresses) and name with the runtime, and dynamically create your own class with the same ivar layout and the same method names and implementation addresses, differing only by name (to avoid the collision).
Desperate situations call for desperate measures. Have you considered hacking the object code (or library file) of one of the libraries, changing the colliding symbol to an alternative name with a different spelling but (strongly recommended) the same length? Inherently nasty.
It isn't clear if your code is directly calling the two functions with the same name but different implementations or whether the conflict is indirect (nor is it clear whether it makes any difference). However, there's at least an outside chance that renaming would work. It might be an idea, too, to minimize the difference in the spellings, so that if the symbols are in a sorted order in a table, the renaming doesn't move things out of order. Things like binary search get upset if the array they're searching isn't in sorted order as expected.
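A rough sketch of that kind of patch in Python, operating on raw bytes. It only illustrates the two constraints mentioned above (same length, minimal change in spelling); rename_symbol and the sample blob are hypothetical, and a real tool would have to locate the name inside the symbol and string tables rather than replacing every matching byte sequence blindly.

```python
def rename_symbol(blob, old, new):
    """Return a copy of blob with every occurrence of old replaced by new.

    The replacement must have the same length so that no offsets in the
    file shift.
    """
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    if old not in blob:
        raise ValueError("symbol not found")
    return blob.replace(old, new)


# Hypothetical stand-in for a library file's contents.
original = b"\x00Widget\x00other data\x00"
# Change only the last letter so the name stays close to its old sort position.
patched = rename_symbol(original, b"Widget", b"Widgex")
print(patched)
```

Work on a copy of the library, and expect code signatures or checksums on the file to be invalidated by the change.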
@compatibility_alias will be able to solve class namespace conflicts, e.g.
@compatibility_alias NewAliasClass OriginalClass;
However, this will not resolve any of the enums, typedefs, or protocol namespace collisions. Furthermore, it does not play well with @class forward decls of the original class. Since most frameworks will come with these non-class things like typedefs, you would likely not be able to fix the namespacing problem with just compatibility_alias.
I looked at a similar problem to yours, but I had access to source and was building the frameworks.
The best solution I found for this was using @compatibility_alias conditionally with #defines to support the enums/typedefs/protocols/etc. You can do this conditionally on the compile unit for the header in question to minimize risk of expanding stuff in the other colliding framework.
It seems that the issue is that you can't reference header files from both systems in the same translation unit (source file). If you create Objective-C wrappers around the libraries (making them more usable in the process), and only #include the headers for each library in the implementation of the wrapper classes, that would effectively separate name collisions.
I don't have enough experience with this in Objective-C (just getting started), but I believe that is what I would do in C.
Prefixing the files is the simplest solution I am aware of.
Cocoadev has a namespace page which is a community effort to avoid namespace collisions.
Feel free to add your own to this list, I believe that is what it is for.
http://www.cocoadev.com/index.pl?ChooseYourOwnPrefix
If you have a collision, I would suggest you think hard about how you might refactor one of the frameworks out of your application. Having a collision suggests that the two are doing similar things as it is, and you likely could get around using an extra framework simply by refactoring your application. Not only would this solve your namespace problem, but it would make your code more robust, easier to maintain, and more efficient.
Over a more technical solution, if I were in your position this would be my choice.
If the collision is only at the static link level then you can choose which library is used to resolve symbols:
cc foo.o -ldog bar.o -lcat
If foo.o and bar.o both reference the symbol rat then libdog will resolve foo.o's rat and libcat will resolve bar.o's rat.
Just a thought, not tested or proven, and it could be way off the mark, but have you considered writing an adapter for the classes you use from the simpler of the frameworks, or at least for their interfaces?
If you were to write a wrapper around the simpler of the frameworks (or the one whose interfaces you access the least), would it not be possible to compile that wrapper into a library? Given that the library is precompiled and only its headers need be distributed, you'd effectively be hiding the underlying framework and would be free to combine it with the second framework without clashing.
I appreciate, of course, that there are likely to be times when you need to use classes from both frameworks at the same time; however, you could provide factories for further class adapters of that framework. On the back of that point, I guess you'd need a bit of refactoring to extract the interfaces you are using from both frameworks, which should provide a nice starting point for you to build your wrapper.
You could build upon the library as and when you need further functionality from the wrapped library, and simply recompile when it changes.
Again, in no way proven, but I felt like adding a perspective. Hope it helps :)
If you have two frameworks that have the same function name, you could try dynamically loading the frameworks. It'll be inelegant, but possible. How to do it with Objective-C classes, I don't know. I'm guessing the NSBundle class will have methods that'll load a specific class.
I'm evaluating the many possibilities for a trial protection system and came up with the following question:
If I use my "trial check" class more than once (scattered several times over the application), will it be compiled just once into the exe?
The reason why I'm asking is that if it's only compiled once into the exe, then patching this single class will invalidate all places where it is used.
If it's compiled just once, are there any viable alternatives to prevent this?
Thanks!
EDIT: I'm actually not trying to roll my own protection system, I'm looking at several existing solutions like OnGuard, mxProtector and TRegWare. It was while looking at the various solutions source-code, that I came up with this question.
Yes, even if you create several instances of the class in different places there is only one copy of the methods (implementation), so if hacker patches the class all instances will be patched.
Do you really want to roll your own protection system? It isn't easy to come up with a good system, and there are several ready-to-use solutions around. If you're on a budget, then perhaps TurboPower OnGuard (which is open source now) will do.
BTW the general wisdom is that if they want to crack your app they will do it, no matter what, so one shouldn't waste too much resources on protection schemes. The only foolproof way is to exclude some of the (key) functionality from trial version, ie
{$IFDEF trial_version}
ShowMessage('Sorry, this function is not available in trial version');
{$ELSE}
// do the thing
{$ENDIF}
but of course, if the full version gets into the wild then it will be cracked...
If you use the inline keyword for functions and methods where possible, the executable code will be "multi-plicated". There are some limitations to the use of inlining, though (see the linked doc).
I agree with Ain and Marco that spending effort on protection schemes may be more bother than benefit, and that it makes more sense to use existing solutions than to roll your own.
Yes. The standard workaround is to put the code in an .inc and include that in multiple units.
But that makes less sense in a security setting. If somebody has learned to search for a pattern, they can simply repeat the search to find the other occurrences, making it a minor nuisance at best.
This is one of the reasons why DIY protection is often a waste of time, and I agree fully with Ain (both on the OnGuard point and on the fact that if functionality IS in the exe, it will be unlocked sooner or later, given sufficient motivation).
Just for the principle: There is actually a possibility to have the "same" class compiled multiple times. If you declare the class with a generic type and later have several instances with different type arguments, the class code is compiled for each type. The generic class doesn't even have to make use of the generic type. If you spread the generic instances over different units, the code will be separated, too.
type
TDummy<T> = class
public
procedure Dummy1;
end;
procedure TDummy<T>.Dummy1;
begin
...
end;
var
FDummy1: TDummy<Integer>;
FDummy2: TDummy<Byte>;
FDummy3: TDummy<TButton>;
FDummy4: TDummy<TLabel>;
We’re rewriting a calculation core from scratch in Delphi, and we’re looking for ways to let other people write code against it.
Automation seems a fairly safe way to get this done. One use we’re thinking of is making it available to VBA/Office, and also generating a .NET assembly (based on the Automation object, that's easy).
But the code should still be easy to use from Delphi, since we’ll be writing our (desktop) UI with that.
Now I’ve been looking into creating an Automation server in Delphi, and it looks like quite a hassle to have to design the components in the Type Library wizard, and then generate the base code.
The calculations we’re having to implement are described in official rules and regulations that are still not ratified, and so could still change before we’re done — they very probably will, perhaps quite extensively. Waiting for the final version is not an option.
An alternative way could be to finish the entire object model first, and write a separate Automation server which only describes the top-level object, switch $METHODINFO ON, and use TObjectDispatch to return all the subordinate objects. As I see it, that would entail having to write wrappers to return the objects by IDispatch interface. Since there are over 100 different classes in there, that doesn’t look like an attractive option.
Edit: TObjectDispatch is smart enough to wrap any objects returned by properties and methods as well; so only the top object(s) would need to be wrapped. Lack of a complete type library does mean only late-binding is possible, however.
Is there another, easier (read: hassle-free) way to write a COM-accessible object model in Delphi?
You don't have to use the type library designer. You can write or generate (e.g. from RTTI of your Delphi classes) a .ridl file and add it to your Automation library project.
Generating the interface description from RTTI is a great idea! After you have your interfaces generated, you can generate a Delphi unit from them and implement them in your classes. Of course, the majority are implemented already, since you generated the interfaces from those classes in the first place. The late-binding resolution can then be done by hand, using RTTI and implementing IDispatch and IDispatchEx in a common base class of the scriptable classes.
Recently, we received a bug report from one of our users: something on the screen was displayed incorrectly in our software. Somehow, we could not reproduce this in our development environment (Delphi 2007).
After some further study, it appears that this bug only manifests itself when "Code optimization" is turned on.
Are there any people here with experience in hunting down such a Heisenbug? Any specific constructs or coding bugs that commonly cause such an issue in Delphi software? Any places you would start looking?
I'll also just start debugging the whole thing in the usual way, but any tips specific to Optimization-related bugs (*) would be more than welcome!
(*) Note: I don't mean to say that the bug is caused by the optimizer; I think it's much more likely some wonky construct in the code is somehow pushed "over the edge" by the optimizer.
Update
It seems the bug boils down to a record being fully initialized with zeros when there's no code optimization, and the same record containing some random data when there is optimization. In this case, the random data seems to cause an enum type to contain invalid data (to my great surprise!).
Solution
The solution turned out to involve an uninitialized local record variable somewhere deep in the code. Apparently, without optimization the record was reset (heap?), and with optimization turned on, the record was filled with the usual garbage. Thanks to you all for your contributions. I learned a lot along the way!
Typically bugs of this form are caused by invalid memory access (reading uninitialised data, reading off the end of a buffer...) or thread race conditions.
The former will be affected by optimisations causing data layout to be rearranged in memory, and/or possibly by debug code that initialises newly allocated memory to some value; causing the incorrect code to "accidentally work".
The latter will be affected due to timings changing between optimisation levels. The former is generally much more likely.
If you have some automated way of making freshly allocated memory be filled with some constant value before it is passed to the program, and this makes the crash go away or become reproducible in the debug build, that'll provide a good point to start chasing things.
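The idea of the fill value can be shown with a small toy model in Python. This is not a real heap manager: alloc, the 0xCC fill byte and the Shape enum are made up for illustration, and in Delphi the same effect would come from a debug memory manager or compiler option rather than from code like this.

```python
import struct
from enum import IntEnum

POISON_BYTE = 0xCC   # arbitrary, deliberately "implausible" fill value


class Shape(IntEnum):
    CIRCLE = 0
    SQUARE = 1


def alloc(size, poison=True):
    """Toy allocator: hand out poison-filled instead of zeroed memory."""
    fill = POISON_BYTE if poison else 0
    return bytearray([fill]) * size


def read_shape(record):
    """Interpret the first 4 bytes as a Shape; raises ValueError if invalid."""
    raw = struct.unpack_from("<I", record, 0)[0]
    return Shape(raw)


zeroed = alloc(8, poison=False)
poisoned = alloc(8, poison=True)

# The code never initialises the record. With zeroed memory it "accidentally
# works"; with the poison fill the same read fails immediately and reproducibly.
shape_from_zeroed = read_shape(zeroed)
try:
    read_shape(poisoned)
    poison_detected = False
except ValueError:
    poison_detected = True
print(shape_from_zeroed, poison_detected)
```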
Could very well be a memory vs register issue: your program runs fine while relying on memory persisting after a free.
I would recommend running your application with FastMM4 in full debug mode to be sure of your memory management.
Another (not free) tool which can be very useful in a case like this is Eurekalog.
Another thing that I've seen: a crash with the FPU registers being botched when calling some outside code (DLL, COM...) while with the debugger everything was OK.
A record that contains different data according to different compiler settings tells me one thing: That the record is not being explicitly initialised.
You may find that the setting of the compiler optimization flag is only one factor that might affect the content of that record - with any uninitialised data structures the one thing that you can rely on is that you can't rely on the initial content of the structure.
In simple terms:
class member data is initialised (to zeros) for new instances of the class
local variables (in functions and procedures) and unit variables are NOT initialised except in a few specific cases: interface references, dynamic arrays and strings, and I think (but would need to check) records if they contain one or more fields of those types that would be initialised (strings, interface references etc).
The question as stated is now a little misleading because it seems you found your "Heisenbug" fairly easily. Now the issue is how to deal with it, and the answer is simply to explicitly initialise your record so that you aren't reliant on whatever behaviour or side-effect of the compiler is sometimes taking care of that for you and sometimes not.
Especially in purely native languages, like Delphi, you should be more than careful not to abuse the freedom to be able to cast anything to anything.
IOW: One thing I have seen is that someone copies the definition of a class (e.g. from the implementation section in the RTL or VCL) into their own code and then casts instances of the original class to their copy.
Now, after upgrading the library where the original class came from, you might experience all kinds of weird stuff, like jumping into the wrong methods or buffer overflows.
There's also the habit of using signed integers as pointers and vice versa (instead of Cardinal).
This works perfectly fine as long as your process has only 2GB of address space. But boot with the /3GB switch and you will see a lot of apps that start acting crazy. Those made the assumption of "pointer = signed integer" at least somewhere.
Is your customer using 64-bit Windows? Chances are, he has a larger address space for 32-bit apps. Pretty tough to debug without having such a test system available.
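A quick way to see why the 2GB boundary matters, sketched in Python. The two addresses are made-up examples; as_signed32 just mimics what happens when a 32-bit pointer is stored in a signed 32-bit integer.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


buffer_start = 0x7FFFF000   # just below the 2 GB boundary
buffer_end = 0x80001000     # just above it, only reachable with a larger address space

# Compared as unsigned values the ordering is what the code expects...
unsigned_ok = buffer_start < buffer_end
# ...but stored in signed integers the end address turns negative and the
# comparison flips, so loops or range checks based on it go wrong.
signed_ok = as_signed32(buffer_start) < as_signed32(buffer_end)
print(unsigned_ok, signed_ok, as_signed32(buffer_end))
```

Below 2GB both comparisons agree, which is why the bug stays hidden on most machines.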
Then, there's race conditions.
Like having 2 threads, where one is very, very slow. So that you instinctively assume it will always be the last one and so there's no code that handles the scenario where "Captn slow" finishes first.
Changes in the underlying technologies can make these assumptions very wrong, very fast indeed.
Take a look at the upcoming breed of Flash-based super-mega-fast server storage.
Systems that can read and write Gigabytes per second. Applications that assume the IO stuff to be significantly slower than some calculations on in-memory values will easily fail on this kind of fast storage.
I could go on and on, but I gotta run right now...
Cheers
Code optimization does not mean necessarily that debug symbols have to be left out. Do a debug build with code optimization, then you can still debug the program and maybe the error occurs now.
One easy thing to do is to turn on compiler warnings and hints, rebuild the project, and then fix all the warnings and hints.
Cheers
If it is Delphi business code, with data-aware components etc., the following might not apply.
I am, however, writing machine vision code, which is fairly computational. Most of the unit tests are console based. I am also involved with FPC, and over the years have tested a lot with FPC, partly as a hobby and partly in desperate situations where I wanted any hunch at all.
Some standard tricks that I tried (in order of decreasing usefulness):
1. use -gv and valgrind the code (practically this means applications are required to run on Linux/FreeBSD, but for computational code and unit tests that can be doable)
2. compile using the fpc param -gt (= trash local vars, randomize local vars on procedure init)
3. modify the heap manager to randomize the data of blocks it hands out (also applicable to Delphi code)
4. try FPC's range/overflow checking and compiler hints
5. run on a Mac Mini (PowerPC) or win64. Due to totally different rules and memory layouts it can catch pretty funky things.
Items 2 and 3 together allow you to find most, if not all, initialization problems.
Try to find any clues, and then go back to Delphi and search more focussed, debug etc.
I do realize this is not easy. I have a lot of FPC experience, and didn't have to find everything out from scratch for these cases. Still, it might be worth a try, and might be a motivation to start making non-visual systems and unit tests FPC-compatible and platform-independent. Most of this work will be needed anyway, seeing the Delphi roadmap.
For such problems I always advise using log files.
Question: Can you somehow detect the incorrect display in the source code?
If not, my answer won't help you.
If yes, check for the incorrectness, and as soon as you find it, dump the stack to a log file (see post-mortem debugging for details about dumping and resymbolizing the stack).
If you see that some data has been corrupted, but you don't know how and when this happened, extract a function that tests for validity (with logging if it fails), and call this function from more and more places during program execution (e.g. after each menu call). If you repeat this approach a few times, you have a good chance of finding the problem.
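The pattern looks roughly like this, sketched in Python rather than Delphi. check_state, VALID_STATES and the record layout are made up for the example; in a Delphi application the stack dump would come from whatever post-mortem debugging library you use.

```python
import logging
import traceback

logger = logging.getLogger("validity")
VALID_STATES = {"idle", "running", "done"}   # hypothetical valid values


def check_state(record, where):
    """Return True if record looks sane; otherwise log where it was found broken."""
    if record.get("state") in VALID_STATES:
        return True
    # Log the call stack so the log shows how execution reached this point.
    stack = "".join(traceback.format_stack(limit=5))
    logger.error("invalid record at %s: %r\n%s", where, record, stack)
    return False


# Call the same check from more and more places to narrow down the culprit.
good = check_state({"state": "idle"}, "after menu: File > Open")
bad = check_state({"state": 0xCCCC}, "after menu: Edit > Paste")
print(good, bad)
```

The first call site at which the check fails brackets the corruption between it and the last call site that still passed.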
Is this a local variable inside a procedure or function?
If so, then it lives on the stack, and will contain garbage. Depending on the execution path and compiler settings the garbage will change, potentially pushing your logic 'over the edge'.
--jeroen
Given your description of the problem I think you had uninitialized data that you got away with without the optimizer but which blew up with the optimization on.