F# MSIL obfuscation

Two obfuscation-related questions:
1) Is there any tool that can disassemble F# back to its source form, or something close to it, from the MSIL target form? This is not an attempt at security through obscurity but I want to protect some source code from "theft".
2) I looked briefly at some F# compiler output and in general it appears pretty gibberish compared to what you get if you disassemble C# compiled code, presumably because C# is closer to the MSIL intermediate representation. The only partly mangled code I've seen from the C# compiler is iterators (and presumably async as of C# 5.0).
So far my impression is that the F# compiled code is reasonably "obfuscated" but is that true? (I realize this is a somewhat subjective question.)

I haven't heard of anything like this; however, I think it's quite likely such a tool will appear in the relatively-near future.
Assemblies produced by the F# compiler (i.e., MSIL and related metadata) aren't obfuscated in any way. However, some of the code it produces is very different from the code produced by the C# or VB.NET compilers, so it's not going to be as easy to reverse-engineer (simply because the tools to do so aren't available). Of course, as @Craig Stuntz said, this doesn't afford much protection against an experienced, motivated attacker.
If you're really paranoid, you might consider using an obfuscation tool on your compiled assemblies before shipping them. I've been using SmartAssembly with F# since late 2010, so I know that one works with F#; if you go with another tool, make sure you test it against some reasonably complicated F# assemblies before buying it -- at the time I was looking for an obfuscator, many of them didn't work correctly (or at all) with F# assemblies.
I wrote up some notes a while back about obfuscating F# assemblies, if you want to read more: Any experience using .NET obfuscators on F# assemblies?

F# is a .NET language, so it can be decompiled. You can have a look at Red Gate's Reflector if you want to spend money, or at 0xd4d's dnSpy (and yes, it's the same developer as the well-known deobfuscator de4dot). Decompiled code is really close to hand-written code: the logic is still the same, and you can copy and paste the source.
If you want to protect an F# application, you may consider using an obfuscator. Almost all of them are currently handled by de4dot, so it's hard to choose wisely. That said, .NETGuard is really strong: it can handle F# applications, it can produce native output, it has strong constant protection, and de4dot cannot handle it.

Related

What risks exist if I work in a C# shop and attempt to write F# just to rely on ILSpy for conversion?

What risks are involved if I work in a C# shop and I attempt to write a feature in F# and then rely on ILSpy to translate the F# source code to a C# representation?
I would very strongly recommend against doing this.
F# code that has been decompiled into C# tends to be extremely verbose and unreadable. It will be near impossible for anyone who doesn't possess a copy of the original F# code to understand or maintain.
Functional code gives you opportunities for code reuse that you wouldn't have in an OO language. The C# code produced by decompiling probably wouldn't offer (m)any avenues of reuse beyond the boundaries of your decompiled F#.
What's idiomatic in F# sometimes isn't in C#, and that's particularly true after an intermediate stage of decompilation. The code would likely not pass a review process.
Units of measure and inline functions with static type constraints are both features of the F# compiler rather than something provided by .NET. You might gain some advantage from them by using the decompiled C# directly, but any modifications made to the C# source wouldn't be checked for, e.g., dimensional correctness.
I would also second Tomas' suggestion of having a read through this article: http://fsharpforfunandprofit.com/posts/low-risk-ways-to-use-fsharp-at-work/
I would suggest, however, that it could be worth having a conversation with your team/manager(s) about the possibility of introducing F# at your workplace.
My personal experience of using F# commercially is that development time often tends to be shorter (sometimes substantially) compared to the same project done in C# and it's usually easier to verify and test the result. These are advantages that are very appealing commercially.

What's the easiest way to build an F# compiler that runs on the JVM and generates Java bytecode?

The current F# compiler is written in F#, is open source, and runs on .NET and Mono, allowing it to execute on many platforms including Windows, Mac and Linux. F#'s Code Quotations mechanism has been used to compile F# to JavaScript in projects like WebSharper, Pit and FunScript. There also appears to be some interest in running F# code on the JVM.
I believe a version of the OCaml compiler was used to originally Bootstrap the F# compiler.
If someone wanted to build an F# compiler that runs on the JVM would it be easier to:
Modify the existing F# compiler to emit Java bytecode and then compile the F# compiler with it?
Use a JVM based ML compiler like Yeti to Bootstrap a minimal F# compiler on the JVM?
Re-write the F# compiler from scratch in Java as the fjord project appears to be attempting?
Something else?
Another option that should probably be considered is to convert .NET CLR bytecode into JVM bytecode, much as http://www.ikvm.net does in the opposite direction (JVM to CLR), although this approach has been considered and dismissed by the fjord owner.
Getting buy-in from the top for option 1), so that the F# compiler team provides pluggable backends that could emit Java bytecode, sounds in theory like it would produce the most polished solution.
But if you look at other languages that have been ported to different platforms, this is rarely the case. Most of the time it's been a rewrite from scratch. That is likely because the original language team had no interest in supporting alternative platforms themselves, and because the original host implementation might not have been able to support multiple backends, or was already too slow for this to be a feasible option to start with.
My hunch is a combination of re-writing from scratch and being able to do as much code sharing and automation as possible from the original implementation. E.g. if the test suites could be re-used for both implementations it would take a lot of the burden off the JVM port and go a long way in ensuring language parity.
If I really had to do this, I would probably start with the #1 approach - add JVM backend to the existing compiler. But I would also try to argue for a different target VM.
Quotations are not very relevant - as an author of WebSharper I can assure you that while quotations can give you a nice F#-like language to program with, they are restrictive, and not optimized. I imagine that for potential JVM F# users the bar would be a lot higher - full language compatibility and comparable performance. This is very hard.
Take tail calls, for example. In WebSharper we apply heuristics to optimize some local tail calls to loops in JavaScript, but that is not enough - you cannot in general rely on TCO, as you do in general F# libraries. This is ok for WebSharper as our users do not expect to have full F#, but will not be ok for a JVM F# port. I believe most JVM implementations do not do TCO, so it will have to be implemented with some indirection, introducing a performance hit.
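As a rough illustration of the "local tail call to loop" rewrite mentioned above, here is a minimal Java sketch. The class and method names (TailCallToLoop, sumToRecursive, sumToLoop) are made up for this example, and this is not how WebSharper or any F# backend actually implements the optimization; it only shows the general shape of the transformation on a runtime that does not eliminate tail calls.

```java
// Hypothetical example: names are illustrative, not taken from any real compiler.
final class TailCallToLoop {
    // Tail-recursive form: the recursive call is the last thing the method does.
    // Without TCO, every call still consumes a stack frame, so a large n
    // can end in a StackOverflowError.
    static long sumToRecursive(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumToRecursive(n - 1, acc + n);
    }

    // The same function after rewriting the self tail call as a loop:
    // the "call" becomes a reassignment of the parameters plus a jump back
    // to the top, so the stack depth stays constant.
    static long sumToLoop(long n, long acc) {
        while (true) {
            if (n == 0) {
                return acc;
            }
            acc = acc + n;
            n = n - 1;
        }
    }

    public static void main(String[] args) {
        // Small input so the recursive version stays within the stack.
        System.out.println(sumToRecursive(1_000, 0));
        // The loop version handles inputs far beyond any realistic stack depth.
        System.out.println(sumToLoop(10_000_000, 0));
    }
}
```

This rewrite only works when a function calls itself directly in tail position. Mutually recursive functions, or tail calls through function values, need a different mechanism, which is why a heuristic like this is not a substitute for real TCO.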
The bytecode re-compilation approach mentioned by @mythz sounds very attractive as it allows more than just porting F# - ideally it allows porting more .NET software to the JVM. I worked quite a bit with .NET bytecode analysis on an internal WebSharper 3.0 project - we are looking at the option of compiling .NET bytecode instead of F# quotations to JavaScript. But there are huge challenges there:
A lot of code in BCL is opaque (native) - and you cannot decompile it
The generics model is fairly complicated. I have implemented a JavaScript runtime that models class and method generics, instantiation, type generation, and basic reflection with some precision and reasonable performance. This was difficult enough in dynamic JavaScript with closures, and it seems quite difficult to do in a performant way on the JVM - but maybe I just do not see a simple solution.
Value types create significant complications in the bytecode. I have yet to figure this one out for WebSharper 3.0. They cannot be ignored either, as they are used extensively by many libraries you would want ported.
Similarly, basic reflection is used in many real-world .NET libraries - and it is a nightmare to cross-compile in terms of both lots of native code and proper support for generics and value types.
Also, the bytecode approach does not remove the question of how to implement tail calls. AFAIK, Scala does not implement general tail calls; it only rewrites direct self-recursion into loops. They certainly have the talent and the funding to do more - the fact that they do not tells me a lot about how practical it is to do TCO on the JVM. For our .NET->JavaScript port I will probably go a similar route - no TCO guarantees unless you specifically ask for trampolining, which will work but cost you an order of magnitude (or two) in performance.
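For readers unfamiliar with trampolining, the following is a minimal Java sketch of the idea, under the assumption of a hand-rolled Bounce interface (a hypothetical name, not part of the JDK, WebSharper, or any F# runtime). Each tail call is returned as a deferred step instead of being made directly, and a driver loop runs the steps one by one, so the stack stays flat even for mutual recursion.

```java
import java.util.function.Supplier;

// Hypothetical example: mutually recursive functions written against the
// trampoline defined below. Every would-be tail call is wrapped in Bounce.more.
final class TrampolineDemo {
    static Bounce<Boolean> isEven(long n) {
        if (n == 0) {
            return Bounce.done(true);
        }
        return Bounce.more(() -> isOdd(n - 1));
    }

    static Bounce<Boolean> isOdd(long n) {
        if (n == 0) {
            return Bounce.done(false);
        }
        return Bounce.more(() -> isEven(n - 1));
    }

    public static void main(String[] args) {
        // A million alternating calls would overflow the stack if made directly.
        System.out.println(Bounce.run(isEven(1_000_000)));
    }
}

// A computation step: either a finished value or a deferred next step.
interface Bounce<T> {
    boolean isDone();
    T result();
    Bounce<T> next();

    // Wraps a final value.
    static <T> Bounce<T> done(T value) {
        return new Bounce<T>() {
            public boolean isDone() { return true; }
            public T result() { return value; }
            public Bounce<T> next() { throw new IllegalStateException("already done"); }
        };
    }

    // Wraps a deferred call; nothing runs until the driver asks for next().
    static <T> Bounce<T> more(Supplier<Bounce<T>> thunk) {
        return new Bounce<T>() {
            public boolean isDone() { return false; }
            public T result() { throw new IllegalStateException("not done yet"); }
            public Bounce<T> next() { return thunk.get(); }
        };
    }

    // The driver loop: runs one step at a time at constant stack depth.
    static <T> T run(Bounce<T> step) {
        while (!step.isDone()) {
            step = step.next();
        }
        return step.result();
    }
}
```

The cost described above is visible in the sketch: every call allocates a closure and a wrapper object and goes through two interface dispatches, where a native tail call would be a single jump.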
There is a project that compiles OCaml to the JVM, OCaml-Java: it's pretty complete and in particular can compile the sources of the OCaml compiler (itself written in OCaml). I'm not sure which aspects of the F# language you're interested in, but if you're mainly looking at getting a mature, strict, typed functional language onto the JVM, that may be a good option.
I suspect any approach would be a lot of work, but I think your first suggestion is the only one that would avoid introducing lots of additional incompatibilities and bugs. The compiler's pretty complex and there are a lot of corner cases around overload resolution, etc. (and the spec probably has gaps too), so it seems very unlikely that a new implementation would have consistently compatible semantics.
Modify the existing F# compiler to emit Java bytecode and then compile the F# compiler with it?
Use a JVM based ML compiler like Yeti to Bootstrap a minimal F# compiler on the JVM?
Porting the compiler shouldn't be that hard if it is written in F#.
I'd probably go the first way, because this is the only way one could hope to keep the new compiler in sync with the .net F# compiler.
Re-write the F# compiler from scratch in Java as the fjord project appears to be attempting?
This is certainly the least elegant approach, IMHO.
Something else?
When the compiler is done, you'll have 90% of the work left to do.
For example, I don't know much F#, but I assume it is easy to use any .NET library from it.
That means, the basic problem is to port the .NET ecosystem, somehow.
I was looking for something along similar lines, though it was more like an F# to Akka translator/compiler. As far as F# -> JVM is concerned, I came across two not quite production-ready options:
1. F# -> Fjord -> JVM
2. F# -> FunScript -> Vert.X -> JVM

Would it be safe to rely on DeHL for new projects?

I've been browsing the DeHL repository on GoogleCode, and it looks really good to me.
Many interesting features that make basic programming tasks easier; Some neat things that are in the DotNet FCL, but are missing from the Delphi RTL can be found in this library;
Coded in a modern way, making good use of new language features;
Each class, record type, member function and parameter is documented in such a way that it'll show in the code completion of the Delphi IDE;
Well-organized and clean code;
Plenty of unit tests;
Open source and Free;
Basically, it looks like this library should've been included with Delphi, as part of the RTL.
One major drawback: The project has been discontinued. :-(
Now my question is:
Would it be safe to rely on this library for future projects, and use it as a base framework to build upon?
Basically I'd like to hear from somebody who's actually used this library whether or not it's worth it to invest time in getting to know this library, and why.
IIRC the project was discontinued because it was an over-engineered first attempt and a lot of its features turned out really messy and bloated. You should look at Alex Ciobanu's second attempt, which is simply called Collections. It contains most of the interesting features from DeHL, but leaner.
Be careful, though. It still makes heavy use of generics, which will make your binary size really big if you use it a lot, because the compiler team hasn't implemented a way to collapse duplicate code yet.

When and how should I obfuscate my Delphi code?

What should I know about code obfuscation in Delphi?
Should I or shouldn't I do it?
How is it done, and are there any good tools (commercial/free) to automate it?
Why would you need to?
As a whole, Delphi does not decompile back to source, unlike .NET. So while decompilation is always a bit of a risk, I've never found a decompiler that did it in a useful way; lots of areas were left as assembler, and so on.
If people want to rework your work, they can, no matter what, obfuscation or not. Heck, some coders write almost naturally obfuscated code (I have worked with a few).
My vote, therefore, is that you shouldn't bother. Unless someone can show me a decompiler for Delphi that really works and produces a full set of compilable source, all in Delphi where it originally was, I wouldn't worry one drop.
Pythia is a program that can obfuscate binaries (not the source) created with Delphi or C++ Builder. Source code for Pythia is here.
There's no point obfuscating since the compiler already does that for you.
There is no way to re-create the source code from the binary.
And components can be distributed in a useful way without having to distribute the source code.
So there usually is no (technical) reason for distributing the source code.
You could do other things to reduce an attacker's ability to disable your software activation system, for example, but in a native-compiled system like Delphi, you can't recreate source code from the binaries. Another answer (the accepted one at the moment) says exactly this, and someone else pointed out a helpful tool to obfuscate the RTTI information that people might use to gain some insight into the internals of your software.
You could investigate the following hardening techniques to block modification of your system, if that's what you really want:
Self-modifying code, with gating logic that divides critical functions of your code such as software activation, into various levels of inter-operable checksums, and code damage and repair.
Debug detection. You can detect debuggers being used on your software and attempt to block the software from working in this case.
Encrypt the PE binary data on disk, and decrypt it either at load time, or just in time before it runs, so that critical machine code cannot be so easily disassembled back to assembly language.
As others have stated, hackers working on your software do not need to restore the original sources to modify it. They will attempt, if they try it at all, to modify your binaries directly, and will use a detailed and expansive knowledge of assembler language to circumvent things you may wish them not to.
You can use the free JCF (Jedi Code Formatter) to obfuscate your source code. However, Pascal syntax does not allow strong obfuscation, and JCF doesn't even do its best (well, it's a code formatting tool, not an obfuscator!).

Is it possible to convert F# code to C# code?

The reason why I am asking is that I'm learning F# and would like to attend TopCoder competitions. However, F# is not among the list of languages supported there. But C# is on the list (to be honest, this is the case for almost all online coding competitions, except Google Code Jam and Facebook Hacker cup).
The possible workarounds I can think of at this moment are
1) find a translator that can translate F# source code directly into C#
2) compile F# code into .net executable first, then disassemble it back to C# code
The minimum requirement is that the generated C# must compile into a runnable .NET executable, preferably with as few external dependencies as possible.
The first approach seems unlikely; a quick Google search turns up nothing relevant.
Approach two looks more promising, since .NET disassemblers do exist.
I tried the most popular one, Reflector from Red Gate. While it can disassemble C# executables perfectly, it appears to have problems with executables compiled from F#: it happily disassembled them, but the resulting C# code contains special characters, such as a leading $ sign added to a class name, and other weird stuff, so it cannot be compiled. I was using Visual Studio 2010 Professional and the latest Reflector beta version (which is free).
Am I missing anything here? Is it possible?
Update:
It looks like this is still impossible. For now, I'll use C# instead.
As others already pointed out in the comments - if there is some way to do that, there will be quite a few nasty cases where it probably won't quite work and it will be very fragile...
One way to deal with the problem (for you) is to just write the solution in F# and then rewrite it to C#. This may sound stupid, but there are some advantages:
In F#, you can easily prototype the solution, so you'll be able to find the right solution faster.
When translating code to C#, you'll probably find yourself using features like lambda expressions more often, so it may even improve your C# skills...
If you rely on .NET libraries, then this part of code will be easy to translate.
Of course, the best thing would be to convince the organizers that they should support F# (which probably wouldn't be too difficult if they allow C# already), but I understand that this may be a challenge.
