We all know there are situations where you cannot go open source and freely distribute software - and I am in one of those situations.
I have an app that consists of a number of binaries (compiled from C sources) and Python code that wraps it all into a system. This app used to run as a cloud solution, so users had access to its functions over the network but no way to touch the actual server where the binaries and code are stored.
Now we want to deliver a "local" version of our system. The app will be running on PCs that our users physically own. We know that any protection can be broken, but we at least want to guard the app against copying and reverse engineering as much as possible.
I know that Docker is a wonderful deployment tool so I wonder: is it possible to create encrypted Docker containers where no one can see any data stored in the container's filesystem? Is there a known solution to this problem?
Also, maybe there are well known solutions not based on Docker?
The root user on the host machine (where the Docker daemon runs) has full access to all the processes running on the host. That means the person who controls the host machine can always get access to the application's RAM as well as its file system. That makes it impossible to hide a key for decrypting the file system, or to protect RAM from debugging.
Using obfuscation on a standard Linux box, you can make the file system and RAM harder to read, but you can't make them impossible to read - otherwise the container could not run at all.
If you can control the hardware running the operating system, then you might want to look at the Trusted Platform Module which starts system verification as soon as the system boots. You could then theoretically do things before the root user has access to the system to hide keys and strongly encrypt file systems. Even then, given physical access to the machine, a determined attacker can always get the decrypted data.
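To make the first point concrete, here is a minimal sketch (my illustration, not from the answer above) of how root on a Linux host can read any process's RAM through /proc; it assumes the target process has at least one readable mapping:

import sys

# Usage (as root): python3 dump_mem.py <pid>
pid = int(sys.argv[1])

# Find the first readable mapping of the target process.
with open(f"/proc/{pid}/maps") as maps:
    for line in maps:
        addr, perms = line.split()[:2]
        if "r" in perms:
            start, end = (int(x, 16) for x in addr.split("-"))
            break

# Read straight out of the process's address space.
with open(f"/proc/{pid}/mem", "rb") as mem:
    mem.seek(start)
    print(mem.read(min(end - start, 64)))

Anything the containerized process holds in memory, including decryption keys, can be extracted this way.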
What you are asking about is called obfuscation. It has nothing to do with Docker and is a very language-specific problem; for data you can always do whatever mangling you want, but while you can hope to discourage the attacker it will never be secure. Even state-of-the-art encryption schemes can't help since the program (which you provide) has to contain the key.
C is usually hard enough to reverse engineer; for Python you can try pyobfuscate and similar tools.
For data, I found this question (keywords: encrypting files game).
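To illustrate the data-mangling point above, here is a minimal Python sketch (my example; the key and data are placeholders): XOR-obfuscating data with a key that necessarily ships inside the program, so it deters casual inspection only and is not security:

# The key ships inside the program, so it is not actually secret.
KEY = b"not-actually-secret"

def mangle(data: bytes, key: bytes = KEY) -> bytes:
    # XOR with a repeating key; applying it twice restores the original.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

obfuscated = mangle(b"level-1 map data")
assert mangle(obfuscated) == b"level-1 map data"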
If you want a completely secure solution, you're searching for the 'holy grail' of confidentiality: homomorphic encryption. In short, you want to encrypt your application and data, send them to a PC, and have that PC run them without its owner, the OS, or anyone else being able to snoop on the data.
Doing so without a massive performance penalty is an active research area. There has been at least one project that managed this, but it still has limitations:
It's Windows-only.
The CPU has access to the key (i.e., you have to trust Intel).
It's optimised for cloud scenarios. If you want to install this on multiple PCs, you need to provide the key in a secure way (i.e., go there and type it in yourself) on one of the PCs where you're going to install your application, and that PC must be able to securely propagate the key to the other PCs.
Andy's suggestion on using the TPM has similar implications to points 2 and 3.
Sounds like Docker is not the right tool, because it was never intended to be used as a full-blown sandbox (at least based on what I've been reading). Why not use a more full-blown approach such as VirtualBox? At least then you're able to lock the virtual machine up behind logins (as much as a physical installation on someone else's computer can be locked up) and run it isolated, with encrypted filesystems and the whole nine yards.
You can either go lightweight and open, or fat and closed. I don't know that there's a "lightweight and closed" option.
I have exactly the same problem. What I have been able to discover so far is below.
A. Asylo (https://asylo.dev)
Asylo requires programs/algorithms to be written in C++.
The Asylo library is integrated with Docker, and it seems feasible to create a custom Docker image based on Asylo.
Asylo depends on several less common technologies such as protocol buffers and Bazel. To me it seems the learning curve will be steep, i.e. the person creating the Docker images/programs will need a lot of time to understand how to do it.
Asylo is free of charge
Asylo is brand new, with all the advantages and disadvantages that entails.
Asylo is produced by Google, but it is NOT an officially supported Google product, according to the disclaimer on its page.
Asylo promises that data in the trusted environment can be protected even from a user with root privileges. However, documentation is lacking and it is currently unclear how this could be implemented.
B. Scone (https://sconedocs.github.io)
It is tied to Intel SGX technology, but there is also a simulation mode (for development).
It is not free; only a small set of its functionality is free of charge.
It seems to support a lot of security functionality.
It is easy to use.
They seem to have more documentation and instructions on how to build your own Docker image with their technology.
For the Python part, you might consider using PyInstaller. With appropriate options it can pack your whole Python app into a single executable file, which end users can run without a Python installation. It effectively runs a Python interpreter on the packaged code, but it has a cipher option that allows you to encrypt the bytecode.
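For illustration, here is a trimmed sketch of the spec file PyInstaller generates when the cipher option is used. This assumes an older release (the --key option was deprecated and then removed in PyInstaller 6.0), and the key and file names are placeholders:

# Generated by: pyi-makespec --onefile --key=0123456789abcdef app.py
# PyInstaller injects pyi_crypto, Analysis, PYZ and EXE into the spec namespace.
block_cipher = pyi_crypto.PyiBlockCipher(key='0123456789abcdef')

a = Analysis(['app.py'], cipher=block_cipher)          # bytecode gets encrypted
pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)  # archive of encrypted modules
exe = EXE(pyz, a.scripts, a.binaries, a.zipfiles, a.datas, name='app')

Building with pyinstaller app.spec then produces a single executable whose embedded bytecode is encrypted.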
Yes, the key will be somewhere near the executable, and a very savvy customer might have the means to extract it, thus unraveling the not-so-readable code. It's up to you to decide whether your code contains some big secret you need to hide at all costs. I would probably not do it if I wanted to charge big money for bug fixing in the deployed product. I could use it if the client has good compliance standards and is not a potential competitor, nor is expected to pay for more licenses.
While I've done this once, I honestly would avoid doing it again.
Regarding the C code: you can compile it into executables and/or shared libraries, and these can be included in the executable generated by PyInstaller.
My product needs to be deployed and installed on the client's server. How can I make sure that my Ruby code is encrypted and invisible?
You can't. In order to execute the code, the CPU needs to understand the code. CPUs are much stupider than humans, so if the CPU can understand the code, then so can a human.
There are only two possibilities:
Don't give your client the code. (The "Google" model.) Instead, give them a service that runs your code under your control.
Give your client a sealed box. (The "XBox" model.) Give your client the code, pre-installed on a hardened, tamper-proof, secure computer under your control, running hardened, tamper-proof, secure firmware under your control, and a hardened, tamper-proof, secure OS under your control. Note that this is non-trivial: Microsoft employed some of the most brilliant hardware security, information security, and cryptography experts on the planet, and they still made a mistake that made the XBox easy to crack.
Unfortunately, you have excluded both those possibilities, so the answer is: you can't.
Note, however, that copying your code is illegal. So, if you don't do business with criminals, then it may not even be necessary to protect your code.
Here are some examples how other companies solve this problem:
Have a good relationship with your clients. People are less likely to steal from friends they like than from strangers they don't know, or people they actively dislike.
Make the product so good that clients want to pay.
Make the product so cheap that clients have no incentive to copy the code.
Offer additional values that you cannot get by copying the code, e.g. support, services, maintenance, training, customization, and consulting.
Especially in the corporate world, clients often prefer to pay, simply for having someone to sue in case something goes wrong. (You can see this as a special case of the last point.)
Note that copy protection schemes are not free. You at least have to integrate it into your product, which takes developer time and resources. And this assumes that the protection scheme itself is gratis, which is typically not the case. These are either pretty expensive, or you have to develop your own (which is also pretty expensive because experienced cryptographers and infosec specialists are not cheap, and cheap cryptographers and infosec specialists will not be able to create a secure system.)
This in turn increases the price of your product, which makes it more likely that someone can't afford it and will copy it.
Also, I have never seen a copy protection scheme that works. There's always something wrong with them. The hardware dongle is only available with an interface the client doesn't have. (For example, when computers stopped having serial and parallel ports in favor of USB, a lot of copy protection schemes still required serial or parallel ports and didn't work with USB-to-serial or USB-to-parallel adapters.) Or, the client uses a VM, so there is no hardware to plug the dongle into. Or, the copy protection scheme requires Internet access, but that is not available. Or, the driver for the dongle crashes the client's machine. Or, the license key contains characters that can't easily be typed on the client's keyboard. Or, the copy protection scheme has a bug that doesn't allow non-ASCII characters, but you are using the client's name as part of the key. Or, the manufacturer of the copy protection scheme changes the format of the dongle to an incompatible one without telling you, and without changing the type number, or the color and physical form of the dongle, so you don't notice.
Note that none of this is hypothetical: all of these have happened to me as a user. Several of these happened at vendors I know.
This means that a significant amount of resources will be needed in your support department to deal with those problems, which increases the cost of your product even further. It also decreases client satisfaction when they have problems with your product. (Again, I know some companies that use copy protection and get a significant number of support tickets because of that.)
There are industries where it is quite common that people buy the product, but then use a cracked version anyway because the copyright protection schemes are so bad that the risk of losing your data due to a cracked version from an untrusted source is lower than losing your data due to the badly implemented copyright protection scheme.
There is a company that is very successful, and very loved by its users that does not use any copy protection in a market where everybody uses copy protection. This is how they do it:
Because they don't have to invest development resources into copy protection, their products are at least as good as their competition's for less development effort.
Because they don't have to invest development resources into copy protection, their products are cheaper than their competition's.
Because their products are not burdened with the overhead of copy protection, they are more stable and more efficient than their competition's.
They have fair pricing, based on income levels in their target countries, meaning they charge lower prices in poorer countries. This makes it less likely that someone copies their product because they can't afford it.
A single license can be used on as many machines as you like, both Windows and macOS.
There is a no-questions-asked, full-refund return policy.
The lead-developer and the lead-designer personally respond to every single support issue, feature request, and enhancement suggestion.
Of course, they know full well that people abuse their return policy. They buy the product, use it for a project, then give it back. But, they have received messages from people saying "Hey, I copied your software and used it in a project. During this project, I realized how awesome your software is, here's your money, and here's something extra as an apology. Also, I showed it to my friends and colleagues, and they all bought a copy!"
Another example are switch manufacturers. Most of them have very strict license enforcement. However, one of them goes a different route: there is one version of the firmware, and it always has all features enabled. Nothing bad will happen if you use features that you haven't paid for. However, when you need support, they will compare your config to your account, and then say "Hey, we noticed that you are using some features you haven't paid for. We are sure that this is an honest mistake on your part, so we will help you this once, but please don't forget to send us a purchase order as soon as possible, thanks!"
Guess which manufacturer I prefer to work with, and prefer to recommend?
Say I have an Erlang system that sometimes misbehaves and produces a lot of warnings, which get logged to file. The system should still function properly as long as the logging doesn't block disk IO (i.e. the logs are "more or less expected"). (This is a real-world scenario, not something I made up.)
The error_logger that comes with Erlang doesn't have overload protection, so if the volume of logs is really big, the logging will block disk IO and possibly cause the system to malfunction.
My question is: why doesn't error_logger have overload protection by default? Is it because this feature is not actually needed if you design your architecture right? Or is it because this is some kind of advanced feature, and if you need it, you should use another library, like lager?
I would say that overload protection is a feature, and as such it is a matter of implementation (and/or configuration).
Overload protection is perhaps a nice feature to have in a logging toolkit, useful for some people, even if probably most people won't need it, but error_logger is really a logging interface, one designed to support arbitrary implementations (each with disparate configuration depending on their features) that the developer/integrator/user can choose to plug in, depending on their requirements.
This may not be exhaustive, but things that immediately come to mind which change logging requirements are:
application (some applications obviously log more/less/differently than others)
host environment (capabilities are different for traditional or SSD storage for example)
use case of the application (an end user or developer might deploy with configurations that mean more or less logging)
local infrastructure and standards (some organisations might use only local logs, but others may use syslog everywhere, religiously)
external or 3rd party environmental factors (such as another network service which talks to the application/node, causing logging)
It's really important to decouple the logging interface from implementation, because:
a developer might change their mind about the logging implementation part way through a project, and the decoupling means this is easy
a system administrator might need to modify or override the developers default, because of host, use case, local infrastructure and other environmental factors, etc, some of which the developer may not be able to anticipate
Thus, because it's an interface, error_logger doesn't and shouldn't have overload protection; it's outside error_logger's remit.
I freely admit that it may be possible to make plausible arguments for including some implementation in an interface, and I can imagine arguments for including features like overload protection in error_logger despite what I've said, but it's a slippery slope. I would choose purity and simplicity instead; I think it's worth keeping error_logger lean and mean rather than allowing additional bulk to creep into it that will affect the performance of every logging implementation everywhere. Take that path and before you know it the limitations won't be the disk I/O blocking, it'll be error_logger itself because it's become bloated, and there won't be anything to be done about it other than invent a new error logger to use instead.
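To make the interface/implementation split concrete, here is a sketch of what an overload-protecting implementation behind a plain logging interface can look like. It is Python rather than Erlang purely for illustration; the names and threshold are mine, and it mirrors the load-shedding idea in lager rather than any actual error_logger code:

import logging
import queue
import threading

class OverloadProtectingHandler(logging.Handler):
    # Never block the caller: queue records and drop them when the
    # sink (e.g. slow disk IO) cannot keep up.
    def __init__(self, target, max_pending=1000):
        super().__init__()
        self.target = target
        self.pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, record):
        try:
            self.pending.put_nowait(record)  # constant-time, never blocks
        except queue.Full:
            self.dropped += 1                # shed load instead of blocking

    def _drain(self):
        while True:
            self.target.emit(self.pending.get())  # slow IO, off the hot path

log = logging.getLogger("app")
log.addHandler(OverloadProtectingHandler(logging.FileHandler("app.log")))

The logging interface (logging.Handler here, error_logger in Erlang) stays dumb; the overload policy lives entirely in one pluggable implementation.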
Does this patch actually work, and if so, how? I'm talking about this 4GB Patch.
On the face of it, it seems like a pretty nifty idea: on Windows, each 32-bit application normally only has access to 2GB of address space, but if you have 64-bit Windows, you can enable a little flag to allow a 32-bit application to access the full 4GB. The page gives some examples of applications that might benefit from it.
HOWEVER, most applications seem to assume that memory allocation is always successful. Some applications do check if allocations are successful, but even then can at best quit gracefully on failure. I've never in my (short) life come across an application that could fail a memory allocation and still keep going with no loss of functionality or impact on correctness, and I have a feeling that such applications range from extremely rare to essentially non-existent in the realm of desktop computers. With this in mind, it would seem reasonable to assume that any such application would be programmed to not exceed 2GB memory usage under normal conditions, and those few that do would have been built with this magic flag already enabled for the benefit of 64-bit users.
So, have I made some incorrect assumptions? If not, how does this tool help in practice? I don't see how it could, yet I see quite a few people around the internet claiming it works (for some definition of works).
Your troublesome assumptions are these:
Some applications do check if allocations are successful, but even then can at best quit gracefully on failure. I've never in my (short) life come across an application that could fail a memory allocation and still keep going with no loss of functionality or impact on correctness, and I have a feeling that such applications range from extremely rare to essentially non-existent in the realm of desktop computers.
There do exist applications that do better than "quit gracefully" on failure. Yes, functionality will be impacted (after all, there wasn't enough memory to continue with the requested operation), but many apps will at least be able to stay running - so, for example, you may not be able to add any more text to your enormous document, but you can at least save the document in its current state (or make it smaller, etc.)
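For instance, here is a minimal Python sketch of that "degrade, don't die" behavior (my illustration; the names are made up): the app refuses an allocation it cannot satisfy but stays alive so the user can still save:

document = bytearray(b"existing content")

def append_block(doc, size):
    try:
        doc.extend(bytearray(size))  # the allocation that may fail
        return True
    except MemoryError:
        return False                 # refuse the edit, keep running

if not append_block(document, 1 << 50):  # absurdly large request, fails cleanly
    print("Not enough memory for that edit; the document is unchanged - save now.")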
With this in mind, it would seem reasonable to assume that any such application would be programmed to not exceed 2GB memory usage under normal conditions, and those few that do would have been built with this magic flag already enabled for the benefit of 64-bit users.
The trouble with this assumption is that, in general, an application's memory usage is determined by what you do with it. So, as over the past years storage sizes have grown, and memory sizes have grown, the sizes of files that people want to operate on have also grown - so an application that worked fine when 1GB files were unheard of may struggle now that (for example) high definition video can be taken by many consumer cameras.
Putting that another way: applications that used to fit comfortably within 2GB of memory no longer do, because people want to do more with them now.
I think the following extract from the 4GB Patch page you linked pretty much explains how and why it works.
Why things are this way on x64 is easy to explain. On x86 applications have 2GB of virtual memory out of 4GB (the other 2GB are reserved for the system). On x64 these two other GB can now be accessed by 32bit applications. In order to achieve this, a flag has to be set in the file's internal format. This is, of course, very easy for insiders who do it every day with the CFF Explorer. This tool was written because not everybody is an insider, and most probably a lot of people don't even know that this can be achieved. Even I wouldn't have written this tool if someone didn't explicitly ask me to.
And to expand on CFF,
The CFF Explorer was designed to make PE editing as easy as possible, but without losing sight of the portable executable's internal structure. This application includes a series of tools which might help not only reverse engineers but also programmers. It offers a multi-file environment and a switchable interface.
And to quote a Microsoft insider, Larry Miller of Microsoft MCSA on a blog post about patching games using the tool,
Under 32 bit Windows an application has access to 2GB of VIRTUAL memory space. 64 bit Windows makes 4GB available to applications. Without the change mentioned, an application will only be able to access 2GB.
This was not an arbitrary restriction. Most 32 bit applications simply can not cope with a larger than 2GB address space. The switch mentioned indicates to the system that it is able to cope. If this switch is manually set, most 32 bit applications will crash in a 64 bit environment.
In some cases the switch may be useful. But don't be surprised if it crashes.
And finally to add from MSDN - Migrating 32-bit Managed Code to 64-bit,
There is also information in the PE that tells the Windows loader if the assembly is targeted for a specific architecture. This additional information ensures that assemblies targeted for a particular architecture are not loaded in a different one. The C#, Visual Basic .NET, and C++ Whidbey compilers let you set the appropriate flags in the PE header. For example, C# and Visual Basic .NET have a /platform:{anycpu, x86, Itanium, x64} compiler option.
Note: While it is technically possible to modify the flags in the PE header of an assembly after it has been compiled, Microsoft does not recommend doing this.
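To make "a flag in the file's internal format" concrete, here is a small Python sketch (my illustration, not from any of the quoted sources) that reads a PE header and reports whether the LARGEADDRESSAWARE bit, the one the 4GB Patch sets, is present:

import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # Characteristics bit the patch sets

def has_large_address_aware(path):
    with open(path, "rb") as f:
        mz = f.read(64)
        if mz[:2] != b"MZ":
            raise ValueError("not a PE file")
        pe_offset = struct.unpack_from("<I", mz, 0x3C)[0]  # e_lfanew
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":
            raise ValueError("missing PE signature")
        coff = f.read(20)  # COFF file header
        characteristics = struct.unpack_from("<H", coff, 18)[0]
        return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

# e.g. has_large_address_aware("app.exe")

The 4GB Patch essentially just sets this bit in the header.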
Finally to answer your question - how does this tool help in practice?
Since you have malloc in your tags, I believe you are working with unmanaged memory. The patch itself doesn't change any data sizes; it is a full migration to 64-bit that makes pointers twice the size, invalidating any code that assumes 32-bit pointers or pointer-sized integers (under Windows' LLP64 model, the other primitive types mostly keep their sizes).
But for managed code, since all of this is handled by the CLR in .NET, it is really helpful and should not cause many problems unless you are dealing with any of the following:
Invoking platform APIs via p/invoke
Invoking COM objects
Making use of unsafe code
Using marshaling as a mechanism for sharing information
Using serialization as a way of persisting state
To summarize: as a programmer, I would not use the tool to convert my application; rather, I would migrate it myself by changing build targets. That being said, if I have an exe that can make good use of more RAM, like a game, then this is worth a try.
I would like to know: let's say I develop a program and have access to the client's computer - is there no way to develop the program so that it can only run on that machine, by writing the computer's unique identifier (if there is something like that) into the code and compiling the program? I'm using Delphi XE2.
Yes, you can prevent some degree of unauthorized use by binding your executable to machine characteristics. You can do it yourself (problematic) or you can buy an off-the-shelf solution to do it for you (disclaimer--I work for one of the companies that produce solutions for these kinds of problems: Wibu-Systems). There are two problems with machine binding; we can help with one of them:
False positives: Machine characteristics can change due to user upgrades or weird driver behavior. That can cause your licensing system to report that the user is trying to abuse the license (a false positive). This is an endemic problem in these systems. (Shameless self-promotion: we have just released a new method of binding to reduce or eliminate these kinds of errors. We call it SmartBind(tm).)
Crackability: Because any machine binding has to use OS calls to get hardware "fingerprint" info back for validation, a cracker can patch the dlls used to always return known "good" values, allowing for cracked software. These kinds of cracks are rampant on bittorrent sites. Unfortunately there is no great way around it, although our approach uses some crypto mojo to make it harder to do. For the ultimate in anti-piracy, you have to use a crypto device like a CmStick, HASP, or KeyLok. NSA can crack anything, of course, but the degree of difficulty of cracking a top-notch hardware-based solution like CodeMeter makes it unlikely unless the payoff is truly gigantic.
What I strongly suggest is that you look into commercial solutions to carefully study the available options. There are a number of vendors in this space and several good products to choose from (of course, I think our product is the best). Rolling your own solution will cause you lots of grief downstream as you try to deal with various configuration issues and potentially unhappy users.
The short answer is that there is no reliable way to prevent copying a program. Certainly there are techniques for identifying particular instances of the program, identifying machine hardware, etc, but for every one of those techniques, there is a countering technique to bypass it for users who really want to go to the trouble. Whether that is to hack your program and change what it looks for (or disable the checks altogether), to virtualize the hardware you are looking for, etc. There is always a way. It is just a matter of time and effort that someone is willing to put in.
If you want something simple, this will give you the hard disk volume ID as a number, which should be unique to each machine, barring hacking.
function GetHDSerialNumber: DWord;
var
  dw, mc, fl: DWord;
  c: string;
begin
  // Volume that the executable was started from (requires Forms for Application).
  c := ExtractFileDrive(Application.ExeName) + '\';
  // Fetch only the volume serial number; the name and filesystem buffers are skipped.
  GetVolumeInformation(PChar(c), nil, 0, @dw, mc, fl, nil, 0);
  Result := dw;
end;
This works up to Delphi 2007; versions above that are Unicode, and you're on your own with that problem.
While there is no such thing as hack-proof hardware, the Wibu system mentioned has not been hacked yet, and it has strong anti-hack features, including physical design features that make even the most sophisticated hacking all but impossible.
Other solutions like i-Lock have been hacked, but so far Wibu is a good answer. I just bought their starter pack.