I'm currently reverse engineering a file that appears to be statically compiled, however IDA Pro isn't picking up on any of the signatures! I feel like I am spending a lot of time stepping through functions that should be recognized by IDA, but they're not.
Anyway, maybe I am wrong... does anyone have any ideas? Has anyone run into this before?
IDA is a great disassembler, but it is not perfect. Some code, especially unlined/optimized code, simply cannot be disassembled into coherent functions in an automated fashion. This is what happens during compiling - coherent code is translated into instructions that the machine understands, not humans. IDA can make guesses and estimates, but it can't do everything. Reverse engineering will always involve some amount of manual interpretation to fill in the gaps.
If the compiler is not recognized by IDA (e.g. there were some changes in startup code), signatures won't be applied automatically. And if IDA doesn't know this compiler at all, it won't have any signatures. So:
if it has signatures but the compiler was not recognized automatically, apply them manually. For Delphi/C++ Builder, try b32vcl or bds.
if it doesn't have signatures for this compiler/library, you can create them yourself using FLAIR tools (assuming you have access to the original libraries)
This question is very broad, but I will try to give my opinion.
If the problem is that IDA is not correctly identifying Delphi, then you should try another software. There is a good tool called IDR (Interactive Delphi Reconstructor), however keep in mind that it runs the software before disassembling it and you should not run any not trustworthy programs on your PC (try virtual machine insted)
Otherwise, if the question is about IDA itself, then... IDA is not perfect at all, so it needs a reverse engineer to run it good, this will mean you have to statically identify some code, stack pointers, variables and etc. If it comes to Hex-Rays decompiler there are even more things to look for. For example it can identify not proper convention for a function and you will have to correct it or it can create too many variables that should be mapped by hand.
Also there are some databases for IDA's Flirt functions that could be useful to you. https://github.com/Maktm/FLIRTDB
Related
I have recently started doing some work with COBOL, where I have only ever done work in z/OS Assembler on a Mainframe before.
I know that COBOL will be translated into Mainframe machine-code, but I am wondering if it is possible to see the generated code?
I want to use this to better understand the under workings of COBOL.
For example, if I was to compile a COBOL program, I would like to see the assembly that results from the compile. Is something like this possible?
Relenting, only because of this: "I want to use this to better understand the under workings of Cobol".
The simple answer is that there is, for Enterprise COBOL on z/OS, a compiler option, LIST. LIST will provide what is known as the "pseudo assembler" output in your compile listing (and some other useful stuff for understanding the executable program). Another compiler option, OFFSET, shows the displacement from the start of the program of the code generated for each COBOL verb. LIST (which inherently has the offset already) and OFFSET are mutually exclusive. So you need to specify LIST and NOOFFSET.
Compiler options can be specified on the PARM of the EXEC PGM= for the compiler. Since the PARM is limited to 100 characters, compiler options can also be specified in a data set, with a DDName of SYSOPTF (which, in turn, you use a compiler option to specify its use).
A third way to specify compiler options is to include them in the program source, using the PROCESS or (more common, since it is shorter) CBL statement.
It is likely that you have a "panel" to compile your programs. This may have a field allowing options to be specified.
However, be aware of a couple of things: it is possible, when installing the compiler, to "nail in" compiler options (which means they can't be changed by the application programmer); it is possible, when installing the compiler, to prevent the use of PROCESS/CBL statements.
The reason for the above is standardisation. There are compiler options which affect code generation, and using different code generation options within the same system can cause unwanted affects. Even across systems, different code generation options may not be desirable if programmers are prone to expect the "normal" options.
It is unlikely that listing-only options will be "nailed", but if you are prevented from specifying options, then you may need to make a special request. This is not common, but you may be unlucky. Not my fault if it doesn't work for you.
This compiler options, and how you can specify them, are documented in the Enterprise COBOL Programming Guide for your specific release. There you will also find the documentation of the pseudo-assembler (be aware that it appears in the document as "pseudo-assembler", "pseudoassembler" and "pseudo assembler", for no good reason).
When you see the pseudo-assembler, you will see that it is not in the same format as an Assembler statement (I've never discovered why, but as far as I know it has been that way for more than 40 years). The line with the pseudo-assembler will also contain the machine-code in the format you are already familiar with from the output of the Assembler.
Don't expect to see a compiled COBOL program looking like an Assembler program that you would write. Enterprise COBOL adheres to a language Standard (1985) with IBM Extensions. The answer to "why does it do it likely that" will be "because", except for optimisations (see later).
What you see will depend heavily on the version of your compiler, because in the summer of 2013, IBM introduced V5, with entirely new code-generation and optimisation. Up to V4.2, the code generator dated back to "ESA", which meant that over 600 machine instructions introduced since ESA were not available to Enterprise COBOL programs, and extended registers. The same COBOL program compiled with V4.2 and with V6.1 (latest version at time of writing) will be markedly different, and not only because of the different instructions, but also because the structure of an executable COBOL program was also redesigned.
Then there's opimisation. With V4.2, there was one level of possible optimisation, and the optimised code was generally "recognisable". With V5+, there are three levels of optimisation (you get level zero without asking for it) and the optimisations are much more extreme, including, well, extreme stuff. If you have V5+, and want to know a bit more about what is going on, use OPT(0) to get a grip on what is happening, and then note the effects of OPT(1) and OPT(2) (and realise, with the increased compile times, how much work is put into the optimisation).
There's not really a substantial amount of official documentation of the internals. Search-engineing will reveal some stuff. IBM's Compiler Cafe:COBOL Cafe Forum - IBM is a good place if you want more knowledge of V5+ internals, as a couple of the developers attend there. For up to V4.2, here may be as good a place as any to ask further specific questions.
for a study on genetic programming, I would like to implement an evolutionary system on basis of llvm and apply code-mutations (possibly on IR level).
I found llvm-mutate which is quite useful executing point mutations.
As far as I have understood, the instructions get count/numbered, one can then e.g. delete a numbered instruction.
However, introduction of new instructions seems to be possible as one of the availeable statements in the code.
Real mutation however would allow to insert any of the allowed IR instructions, irrespective of it beeing used in the code to be mutated.
In addition, it should be possible to insert library function calls of linked libraries (not used in the current code, but possibly available, because the lib has been linked in clang).
Did I overlook this in the llvm-mutate or is it really not possible so far?
Are there any projects trying to /already have implement(ed) such mutations for llvm?
llvm has lots of code analysis tools which should allow the implementation of the afore mentioned approach. llvm is huge, so I'm a bit disoriented. Any hints which tools could be helpful (e.g. getting a list of available library functions etc.)?
Thanks
Alex
Very interesting question. I have been intrigued by the possibility of doing binary-level genetic programming for a while. With respect to what you ask:
It is apparent from their documentation that LLVM-mutate can't do what you are asking. However, I think it is wise for it not to. My reasoning is that any machine-language genetic program would inevitably face the "Halting Problem", e.g. it would be impossible to know if a randomly generated instruction would completely crash the whole computer (for example, by assigning a value to a OS-reserved pointer), or it might run forever and take all of your CPU cycles. Turing's theorem tells us that it is impossible to know in advance if a given program would do that. Mind you, LLVM-mutate can cause for a perfectly harmless program to still crash or run forever, but I think their approach makes it less likely by only taking existing instructions.
However, such a thing as "impossibility" only deters scientists, not engineers :-)...
What I have been thinking is this: In nature, real mutations work a lot more like LLVM-mutate that like what we do in normal Genetic Programming. In other words, they simply swap letters out of a very limited set (A,T,C,G) and every possible variation comes out of this. We could have a program or set of programs with an initial set of instructions, plus a set of "possible functions" either linked or defined in the program. Most of these functions would not be actually used, but they will be there to provide "raw DNA" for mutations, just like in our DNA. This set of functions would have the complete (or semi-complete) set of possible functions for a problem space. Then, we simply use basic operations like the ones in LLVM-mutate.
Some possible problems though:
Given the amount of possible variability, the only way to have
acceptable execution times would be to have massive amounts of
computing power. Possibly achievable in the Cloud or with GPUs.
You would still have to contend with Mr. Turing's Halting Problem.
However I think this could be resolved by running the solutions in a
"Sandbox" that doesn't take you down if the solution blows up:
Something like a single-use virtual machine or a Docker-like
container, with a time limitation (to get out of infinite loops). A
solution that crashes or times out would get the worst possible
fitness, so that the programs would tend to diverge away from those
paths.
As to why do this at all, I can see a number of interesting applications: Self-healing programs, programs that self-optimize for an specific environment, program "vaccination" against vulnerabilities, mutating viruses, quality assurance, etc.
I think there's a potential open source project here. It would be insane, dangerous and a time-sucking vortex: Just my kind of project. Count me in if someone doing it.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
Why decompiling a delphi exe, is so easy, compared to others executables built with other programming languages/compilers?
There are a few things that help with reversing delphi programs:
You get the full form data including the name of event handler methods
All members with published visibility have metadata used with RTTI
The compiler is pretty bad at optimizing. It does no whole program optimization and the assembly is usually a straight forward translation of the original source with only minor optimizations. (At least it was in the versions I used, might have improved since then)
All classes, even those compiled with RTTI off have some level of metadata available. In particular it's possible to get the name and inheritance structure of classes. And for any instance of a class you happen to see in the debugger you can get its VMT and thus its class name.
Delphi uses textfiles describing the content of your form and hooks up event handlers by name. This approach obviously needs enough metadata to deserialize that textual representation of a from and hook up the eventhandlers by name.
An alternative some other GUI toolkits use is auto-generating code that initializes the form and hooks up the event handler with code. Since this code directly uses pointers to the eventhandlers and directly assigns to properties/calls setters it doesn't need any metadata. Which has the side-effect that reversing becomes a bit harder.
It shouldn't be too hard to create a program that transforms a dfm file into a series of hardcoded instructions that creates the form instead. So a tool like DeDe won't work that well anymore. But that doesn't gain you much in practice.
But figuring out which evenhandler corresponds to which control/event is still rather easy. Especially since stuff like FLIRT identifies most library functions. So you just need to breakpoint the one you're interested in and then step into the user code.
The statement you make is false. Delphi is not particularly more easy to decompile than code produced by other mainstream compilers.
For .net languages there is Reflector.
C++ is covered in this Stack Overflow question.
Python/Perl/Ruby etc. are interpreted.
If you were able to prove that the results of decompiling a Delphi executable were of significantly higher quality than in other widely used languages then your question would carry more weight.
Story from the trenches: Decompiling a tiny Delphi DLL
I've been through a Delphi decompiling session myself. It was one of those fake-sounding "I lost my sources" thing, I really did lose the sources for a tiny Firebird UDF library. Now I do no better, I didn't jump right into decompiling because the library was so small and I knew a rewrite would be much faster.
This DLL exports a function that looks like this:
function udf_do_some_math(Number1, Number2:Currency): Currency;
After doing the sane thing and rewriting the function and doing some regression tests I discovered some obscure corner-cases where the new function's result wasn't the same as the old function's result! The trouble was, the new function's result was the correct result, the old DLL contained a BUG and I had to reproduce the BUG - with this function consistency is more important then accuracy.
Again, did the sane thing and tried to "guess" at the BUG. I knew it was a rounding issue but simply couldn't figure out what it was. Finally I decided to give decompilers I try. After all this was a small library, the entry-point was straight-forward and I didn't really need re-compilable code, nor 100% decompilation: I only needed enough to figure out the old BUG so I can reproduce it!
Decompiling failed! I tried lots of different decompilers, including a couple of "commercial" ones. Most produced what on the surface looked like good data, but not enough to figure out the old bug. The most promising one, the one with version specific knowledge of the VCL and RTL gave the worst failure: sure, it figured out the RTL calls, gave them names, but failed to locate the exported function! The one function I was interested in wasn't shown int the list of entry points, and it should have been straight forward since it's an exported function.
This decompiling attempt should have been easy because:
The code was fairly simple and not a lot of it.
It was a DLL with an exported function, none of the complexity you'd expect from an event-driven exe.
I wasn't interested in re-compilable code, I simply wanted to find an old bug so I can reproduce it.
I didn't ask for Pascal code, assembler would've been good enough.
I knew precisely what the code was doing and how it was doing it. It wasn't cryptic 3rd party code.
My solution
After decompilers failed me I turned to my own trusty Delphi IDE for debugging. I wrote a small Delphi program that directly imports the function from the DLL, created a fake Firbird memory manager DLL so my DLL can load, called my old function with the parameters I knew would give bad results, steped into the code using the debugger and closely watched the FPU registers. After a few failed attempts I finally noticed a value was popped from the FPU stack as integer where it shouldn't have been Integer so I had my BUG: I mistakenly defined an Integer local variable where I should have used Currency. Armed with that knowledge I was able to reproduce the bug.
Only thing that is easier in Delphi is retrieving VCLs.
After using decompilers like DeDe you will get application user interface but without any logic.
So if you want to retrieve only forms and buttons - Delphi is easier than other compilers, but if you want to know what is going on after clicking on the button you'll need to use ollydbg or other (debugger/disassembler) as for other languages that creates executables.
There are pros and cons. I am not sure what angle your referring to as being easier. There is also a huge difference in a 1 form simple application, versus a very in-depth application that has many forms and tons of classes and functions. It's like Notepad versus Office 2013 (given they were coded in delphi, just an example comparing complexity not language).
In a small app, having the extra information that Delphi apps "usually" contain can make it a breeze. However, in a large application it may "help", but you have a million calls to dig through. They may help you get in the near vicinity, but calls inside of calls inside of calls, then multiple returns used as jumps... makes you dizzy. Then if the app "was" packed or protected, some things can still be a garbled mess. While it may work programming wise, reading it can be a lot harder. I was in one the other day, where all of the strings were encrypted, so "referenced text strings" were no help, and the encryption was not a simple md5 or base64, it was some custom algorithm. Maybe an MD5 with a salt, then base64 encoded? I never could get to the exact method on the strings. I knew what some of them were supposed to be, but couldn't reproduce the method, even though it looked like it was base64, it was the base64 of the string already encrypted some how... I dont rely on text strings, but in a large large app, every little bit helps.
Of course, my interpretation of this question, was looking at a Delphi exe in OllyDbg. I could be off base on where you guys were going with this topic, but I feel in regards to Olly and reversing, I am on point (if that was what you were talking about) lol.
What should I know about code obfuscation in Delphi?
Should I or shouldn't I do it?
How it is done and is there any good tools (commercial/free) to automate it?
Why would you need to?
As a whole Delphi does not decompile back, unlike .net, so, while decompilation is always a bit of a risk, Ive never found a decompiler that actually did it to a useful way, lots of areas got left as assembler and so on.
If people want to rework your work, they can, no matter what, obfuscation or not, heck, some coders write almost naturally obfuscated code (having worked with a few)
My vote therefore, is shouldnt bother. Unless someone can show me a decompiler for delphi that really works, and produces full sets of compilable, and all delphi where it was originally, I wouldnt worry one drop.
Pythia is a program that can obfuscate binaries (not the source) created with Delphi or C++ Builder. Source code for Pythia is here.
Before:
After:
There's no point obfuscating since the compiler already does that for you.
There is no way to re-create the source code from the binary.
And components can be distributed in a useful way without having to distribute the source code.
So there usually is no (technical) reason for distributing the source code.
You could do other things to reduce an attacker's ability to disable your software activation system, for example, but in a native-compiled system like Delphi, you can't recreate source code from the binaries. Another answer (the accepted one at the moment) says exactly this, and someone else pointed out a helpful tool to obfuscate the RTTI information that people might use to gain some insight into the internals of your software.
You could investigate the following hardening techniques to block modification of your system, if that's what you really want:
Self-modifying code, with gating logic that divides critical functions of your code such as software activation, into various levels of inter-operable checksums, and code damage and repair.
Debug detection. You can detect debuggers being used on your software and attempt to block the software from working in this case.
Encrypt the PE binary data on disk, and decrypt it either at load time, or just in time before it runs, so that critical assembler code can not be so easily reverse engineered back to assembly language.
As others have stated, hackers working on your software do not need to restore the original sources to modify it. They will attempt, if they try it at all, to modify your binaries directly, and will use a detailed and expansive knowledge of assembler language to circumvent things you may wish them not to.
You can use free JCF (Jedi Code Formatter) to obfuscate your source code. However, pascal syntax does not allow strong obfuscation and JCF even doesn't do it's best (well, it's a code formatting tool, not obfuscator!)
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
Why decompiling a delphi exe, is so easy, compared to others executables built with other programming languages/compilers?
There are a few things that help with reversing delphi programs:
You get the full form data including the name of event handler methods
All members with published visibility have metadata used with RTTI
The compiler is pretty bad at optimizing. It does no whole program optimization and the assembly is usually a straight forward translation of the original source with only minor optimizations. (At least it was in the versions I used, might have improved since then)
All classes, even those compiled with RTTI off have some level of metadata available. In particular it's possible to get the name and inheritance structure of classes. And for any instance of a class you happen to see in the debugger you can get its VMT and thus its class name.
Delphi uses textfiles describing the content of your form and hooks up event handlers by name. This approach obviously needs enough metadata to deserialize that textual representation of a from and hook up the eventhandlers by name.
An alternative some other GUI toolkits use is auto-generating code that initializes the form and hooks up the event handler with code. Since this code directly uses pointers to the eventhandlers and directly assigns to properties/calls setters it doesn't need any metadata. Which has the side-effect that reversing becomes a bit harder.
It shouldn't be too hard to create a program that transforms a dfm file into a series of hardcoded instructions that creates the form instead. So a tool like DeDe won't work that well anymore. But that doesn't gain you much in practice.
But figuring out which evenhandler corresponds to which control/event is still rather easy. Especially since stuff like FLIRT identifies most library functions. So you just need to breakpoint the one you're interested in and then step into the user code.
The statement you make is false. Delphi is not particularly more easy to decompile than code produced by other mainstream compilers.
For .net languages there is Reflector.
C++ is covered in this Stack Overflow question.
Python/Perl/Ruby etc. are interpreted.
If you were able to prove that the results of decompiling a Delphi executable were of significantly higher quality than in other widely used languages then your question would carry more weight.
Story from the trenches: Decompiling a tiny Delphi DLL
I've been through a Delphi decompiling session myself. It was one of those fake-sounding "I lost my sources" thing, I really did lose the sources for a tiny Firebird UDF library. Now I do no better, I didn't jump right into decompiling because the library was so small and I knew a rewrite would be much faster.
This DLL exports a function that looks like this:
function udf_do_some_math(Number1, Number2:Currency): Currency;
After doing the sane thing and rewriting the function and doing some regression tests I discovered some obscure corner-cases where the new function's result wasn't the same as the old function's result! The trouble was, the new function's result was the correct result, the old DLL contained a BUG and I had to reproduce the BUG - with this function consistency is more important then accuracy.
Again, did the sane thing and tried to "guess" at the BUG. I knew it was a rounding issue but simply couldn't figure out what it was. Finally I decided to give decompilers I try. After all this was a small library, the entry-point was straight-forward and I didn't really need re-compilable code, nor 100% decompilation: I only needed enough to figure out the old BUG so I can reproduce it!
Decompiling failed! I tried lots of different decompilers, including a couple of "commercial" ones. Most produced what on the surface looked like good data, but not enough to figure out the old bug. The most promising one, the one with version specific knowledge of the VCL and RTL gave the worst failure: sure, it figured out the RTL calls, gave them names, but failed to locate the exported function! The one function I was interested in wasn't shown int the list of entry points, and it should have been straight forward since it's an exported function.
This decompiling attempt should have been easy because:
The code was fairly simple and not a lot of it.
It was a DLL with an exported function, none of the complexity you'd expect from an event-driven exe.
I wasn't interested in re-compilable code, I simply wanted to find an old bug so I can reproduce it.
I didn't ask for Pascal code, assembler would've been good enough.
I knew precisely what the code was doing and how it was doing it. It wasn't cryptic 3rd party code.
My solution
After decompilers failed me I turned to my own trusty Delphi IDE for debugging. I wrote a small Delphi program that directly imports the function from the DLL, created a fake Firbird memory manager DLL so my DLL can load, called my old function with the parameters I knew would give bad results, steped into the code using the debugger and closely watched the FPU registers. After a few failed attempts I finally noticed a value was popped from the FPU stack as integer where it shouldn't have been Integer so I had my BUG: I mistakenly defined an Integer local variable where I should have used Currency. Armed with that knowledge I was able to reproduce the bug.
Only thing that is easier in Delphi is retrieving VCLs.
After using decompilers like DeDe you will get application user interface but without any logic.
So if you want to retrieve only forms and buttons - Delphi is easier than other compilers, but if you want to know what is going on after clicking on the button you'll need to use ollydbg or other (debugger/disassembler) as for other languages that creates executables.
There are pros and cons. I am not sure what angle your referring to as being easier. There is also a huge difference in a 1 form simple application, versus a very in-depth application that has many forms and tons of classes and functions. It's like Notepad versus Office 2013 (given they were coded in delphi, just an example comparing complexity not language).
In a small app, having the extra information that Delphi apps "usually" contain can make it a breeze. However, in a large application it may "help", but you have a million calls to dig through. They may help you get in the near vicinity, but calls inside of calls inside of calls, then multiple returns used as jumps... makes you dizzy. Then if the app "was" packed or protected, some things can still be a garbled mess. While it may work programming wise, reading it can be a lot harder. I was in one the other day, where all of the strings were encrypted, so "referenced text strings" were no help, and the encryption was not a simple md5 or base64, it was some custom algorithm. Maybe an MD5 with a salt, then base64 encoded? I never could get to the exact method on the strings. I knew what some of them were supposed to be, but couldn't reproduce the method, even though it looked like it was base64, it was the base64 of the string already encrypted some how... I dont rely on text strings, but in a large large app, every little bit helps.
Of course, my interpretation of this question, was looking at a Delphi exe in OllyDbg. I could be off base on where you guys were going with this topic, but I feel in regards to Olly and reversing, I am on point (if that was what you were talking about) lol.