Recommended way to distribute Halide-generated functions? - image-processing

I am currently experimenting with Halide, the initial tests show quite promising performance improvements.
I am now wondering what the best strategy is for distributing Halide code. Requiring users to install Halide seems like a heavy barrier at this point in time (since there are no automated install options).
One option would be to use compile_to_c, add the generated C code to the repository, and distribute compilation scripts for that C code. scikit-learn uses a similar strategy for Cython-generated code. For Halide this seems like a no-go, since the generated C code loses all the optimizations, defeating the purpose of Halide.
My current idea would be to use compile_to_bitcode, then distribute the generated bitcode together with compilation scripts that call llc to generate the desired machine code. The only requirement for the user would be to have llc (i.e. LLVM) installed.
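For concreteness, here is a minimal sketch of what the producer side of that idea might look like; the pipeline, the file names and the target string are illustrative assumptions, not something taken from the question:

// blur_bitcode_gen.cpp -- a sketch only; run once on the developer machine,
// then ship the emitted bitcode plus a build script for users.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(8), 2, "input");
    Var x("x"), y("y");
    Func blur("blur");

    // A trivial horizontal average, purely for illustration.
    blur(x, y) = cast<uint8_t>((cast<uint16_t>(input(x, y)) +
                                cast<uint16_t>(input(x + 1, y))) / 2);
    blur.vectorize(x, 16).parallel(y);

    // Emit LLVM bitcode instead of an object file.
    blur.compile_to_bitcode("blur.bc", {input}, "blur",
                            Target("x86-64-linux-sse41"));

    // The user-side build script would then run something like:
    //   llc -O3 -filetype=obj blur.bc -o blur.o
    return 0;
}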
Does anyone have experience on this issue?
What are the pros and cons of my idea of distributing bitcode?
What would you recommend?

Some details on the kind of software distribution would help. The question implies a source code distribution, but there is a big difference between a library where programmers may need to interact with Halide produced code at a fine-grained level, and an application where use of Halide is largely invisible to the end user and the goal is just to get it to build.
Distributing bitcode is doable but problematic. To be portable, you have to use something like the PNaCl backend. (PNaCl is fairly close to a generic LLVM bitcode representation.) If you target a specific architecture, there is no guarantee the bitcode will compile or run on any other one. (Halide can lower to architecture specific intrinsics for example.) The LLVM community discourages using bitcode as a distribution format, though if it is in source form (.ll, not .bc) it is likely fairly stable and seems not much worse than shipping assembly files in terms of long term stability.
Halide links an OS-specific runtime into the generated output, so even with bitcode the result includes a number of target-specific dependencies.
Often one ends up with a design that chooses, at runtime, between one of a number of Halide outputs based on the actual type of processor being used. E.g. using Halide to compile the same algorithm with two different schedules for SSE2 and AVX2 processors. In this model, there are going to be a lot of object files anyway and one can simply choose at build time which ones to include for a given architecture and OS. Distributing the objects as .ll files rather than .o files will likely work, but I'm not sure it buys much.
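As an illustration of that runtime-dispatch pattern, here is a minimal sketch; the header names, function names and generated signatures are assumptions about how the two variants were compiled:

// dispatch.cpp -- a sketch: choose between two AOT-compiled variants of the
// same pipeline at runtime. blur_sse2/blur_avx2 are assumed to have been
// generated from the same algorithm with two different schedules/targets.
#include "HalideBuffer.h"
#include "blur_sse2.h"   // e.g. built for target x86-64-linux-sse41
#include "blur_avx2.h"   // e.g. built for target x86-64-linux-avx2

int blur_dispatch(Halide::Runtime::Buffer<uint8_t> &in,
                  Halide::Runtime::Buffer<uint8_t> &out) {
    // GCC/Clang builtin; on MSVC one would use __cpuid-based detection instead.
    if (__builtin_cpu_supports("avx2")) {
        return blur_avx2(in.raw_buffer(), out.raw_buffer());
    }
    return blur_sse2(in.raw_buffer(), out.raw_buffer());
}

The build system then only has to decide which of the generated objects (or .ll files) to include for a given architecture and OS.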
I would strive to make the full source code available, requiring Halide if one is doing a compilation from the ground up, and look for ways to provide various levels of binary distribution. Certainly for end user software the emphasis should be on how to get the fully built package into the hands of users. For libraries, Halide may be used to surface a higher level programming model to users of the library, in which case the Halide compiler will need to be present anyway.
We strive to make Halide fairly easy to get onto a system and very stable, but have not absolutely nailed either yet. I'd likely try to provide some level of fallback, and using the C backend to generate generic C code might be a decent way to do that without rewriting everything in C directly. (If building from source, one gets a choice between installing Halide or using the prebuilt C code.) This is one of the better use cases for the C backend. (Generating C code from Halide is generally a pretty marginal idea, despite seeming like a good one at first.)

compile_to_c() is definitely not recommended, as the code it generates isn't very optimized; it's useful mostly as a debugging / development tool.
compile_to_bitcode() sounds like it could work, but I'm not aware of anyone using this as a distribution method.
(It would probably be useful to have an automated install available for Halide.)

Related

iOS bitcode - security concerns

We are distributing a software module for iOS online. Apple is advocating bitcode and even making it mandatory for apps on some devices (watchOS/tvOS), which forces us to deliver this software module (a static library) with bitcode.
The concern is how secure bitcode is against reverse engineering and decompilation (like Java bytecode), and how to protect against that. It is easy for anyone to download the library from the website, extract the bitcode (IR) from it, and decompile it. There is some valuable information on this here:
https://lowlevelbits.org/bitcode-demystified/
Bitcode may not be a concern for apps, as Apple will strip it, but it definitely appears to be a concern for static libraries.
Any insights?
As the link notes "malefactor can obtain your app or library, retrieve the [code] from it and steal your ‘secret algorithm.’" Yep. Totally true.
Also, if you ship non-bitcode libraries, a "malefactor can obtain your app or library, retrieve the [code] from it and steal your ‘secret algorithm.’"
Also, if you ship non-bitcode apps, a "malefactor can obtain your app or library, retrieve the [code] from it and steal your ‘secret algorithm.’"
There is no situation where this is not true. Tools as cheap as Hopper (my tool of choice, but there are also some cheaper solutions) and as elaborate as IDA can decompile your functions into passable C code.
If you're working with Cocoa (ObjC or Swift), you have made it even easier to reverse engineer because it's so easy to dynamically introspect Cocoa.
This is not a solvable problem. Both apps and libraries can try to employ obfuscation techniques, but they are complex, fragile, and typically require significant expense or expertise (and often both). In any case, you will need to continually improve your obfuscation as people break it. This is fairly pointless for a library, since there's very little you could re-protect once it leaks, but you can try.
It will leak. That's not solvable. Bitcode doesn't change a whole lot about that. It might be somewhat simpler to read IR than ARM assembly, but not that much, and certainly not if the thing you're protecting is small (like a small algorithm or a key).
There are some obfuscation vendors out there. Product recommendations are off-topic for Stack Overflow (because they attract spam), but search for "ios obfuscation" and you'll find them. In this space, since it's just "tricky hiding" (not security or encryption) you generally get what you pay for. Open source solutions make little sense, since the whole point is to be tricky and hide how you're doing it. I've worked with some open source obfuscation libraries that make it easier to extract secrets from your code (because they're trivial to reverse, and their use marks the parts of the code where you're hiding things).
If this is important to your business plan, then budget for that, and expect it to introduce some challenging bugs, and expect it to be broken anyway (but maybe take longer).
@Rob Napier, you are wrong; you are comparing apples to oranges. Reading assembly code or a disassembly in IDA is worlds apart from reading code produced by decompiling intermediate code. Bitcode is a real nuisance.

Is Electron a Reliable Framework for Enterprise Apps?

We can see good applications (such as Slack and Insomnia) moving to Electron, but is it safe/stable enough to build a big solution (such as an ERP) with it? Thanks.
As far as stability goes, Electron is very stable. In my experience I've had no stability issues or unanticipated behavior while developing some complex software on Electron.
However a bigger concern for some is security. Allow me to explain.
How Electron Packages Applications
Electron packages applications by bundling all of their javascript components into an asar.
Asar is a simple, tar-like archive format: it concatenates all files together without compression while still supporting random access.
Why This is a Security Concern
What this means is that all of your application's code is simply put into an archive. This archive can be explored and extracted quite trivially using the asar command:
npm install asar
asar extract my-app.asar
While this may not be an issue for open source projects or applications like Slack that rely on a paid backend service, license-based or paid products could easily be stolen, as there is none of the code security / obscurity that a traditional compiled application might offer. For some this may be acceptable; for others it may not, especially if business logic lives in the application.
Can This Issue be Mitigated?
One potential solution to this issue would be the ability to encrypt the ASAR. This issue has been brought up to the Electron devs, but they have stated that while they are open to a pull request they will likely not be implementing it themselves.
Another possible technique to mitigate this issue is code obfuscation using something such as UglifyJS. However this is obviously not true protection, just a hiding technique.
A third solution, one used by NW.js, is to compile your JS to a V8 snapshot. However, the Electron devs have indicated that this has significant (50%) performance costs and they will likely not support such a capability.
All of this being said, it is possible to decompile / reverse engineer almost any application in any language. Electron just makes it a little easier to do so by "putting your code out there." However, they have strong reasoning for doing so (performance gains), and unless you have a paid, licensed product it probably doesn't make much difference to you anyway.
Further reading:
https://github.com/electron/electron/issues/3041
https://github.com/electron/electron/issues/2570

Rolling your own code instead of using libraries, avoiding the common approach

I have seen a plethora of projects roll their own things instead of using well tested libraries.
In some other instances I have seen people re-implement Elliptic Curves and Random Number Generators, refusing to use tested libraries, because their code is "better".
Why do people do this, choose to spend their time implementing something instead of using something that has been already done, tested and deployed in a plethora of systems?
For example, the Signal Android messenger app has a whole, full copy of OpenSSL embedded in it for encryption. Ref
Why not use BouncyCastle or java.security.*?
Is it an ego thing? Is it a trust thing, i.e. they don't trust libraries?
It can be for a host of different reasons.
Build vs. buy (or use by reference) should come down to a thorough analysis. That said, many folks get into programming because they like building things. Sometimes it's rewarding to build your own code (even when a third party library exists).
That said, I'll try to list some reasons why you might not want to use third party libraries:
Licensing: Does the third party library licensing conflict or restrict your intended usage of your code? For example, GPL-licensed code may not be the best pick for something used commercially.
Security: Has the third party code been thoroughly analyzed for security vulnerabilities? If it's public-facing, have there been exploits in the past that targeted this code? If so, how quickly have the contributors fixed things (or have they even bothered to issue a patch)?
Ease of use: For example, I may not want to try to use a C++ library in C# code. It's possible, but it's less straightforward than using a C# library.
Bug fixes: Is development ongoing on the third party library? If there's a bug, then how easily can you get it fixed?
Domain knowledge: We can't specialize in everything. Using your example of encryption, I'd strongly discourage attempting to build an encryption library from scratch unless you have an encryption background.
Simplicity: Your use case may be much smaller than what a third party library is built to provide. For example, if you needed a Point class to represent an X,Y,Z point, you could reference a third party graphics library; but if you don't need to do graphics calculations in 3D space, referencing an entire graphics library might be overkill (see the sketch below).
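To make the Simplicity point concrete, here is a minimal sketch of the "tiny need" case: a hand-rolled type with only the one operation actually required, instead of a full graphics library dependency.

// A three-field value type and the single operation we actually need.
struct Point {
    double x, y, z;
};

double squared_distance(const Point &a, const Point &b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}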
All this said, there are many times using a third party library works (and is the appropriate choice). Using your example, I'd never try to implement an encryption stack on my own -- there's no reason to do so with the plethora of open-source options available.

migrate COBOL code

I have a task to convert COBOL code to .NET. Are there any converters available? I am trying to understand the COBOL code at a high level, but I am having trouble doing so. Are there any flowchart generators? I appreciate any help.
Thank you.
Migrating software systems from one language or operating environment to another is always a challenge. Here are a few things to consider:
Legacy code tends to be poorly structured as a result of a long history of quick fixes and problem work-arounds. This really lowers the signal-to-noise ratio when trying to wrap your head around what is really going on.
Converting code leads to further "de-structuring" to compensate for mismatches between the source and target implementation platforms. When you start from a poorly structured base (legacy system), the end result may be totally unintelligible.
Documentation of the legacy architecture and/or business processes is generally so far out of date that it is worse than useless; it may actually be misleading.
The complexity of COBOL code is almost always underestimated.
A number of "features" that were originally built to compensate for things that "couldn't be done" at one time (due to smaller memories, slower computers, etc.) will be carried over into the converted system. Many of these may now be non-issues, and you really don't want them.
There are no obvious or straightforward ways to refactor legacy process-driven systems into an equivalent object-oriented system (at least not in a meaningful way).
There have been successful projects that migrated COBOL directly into Java. See naca.
However, the end result is only something its mother (or another COBOL programmer) could love; see this discussion.
In general I would be suspicious of any product or tool claiming to convert your legacy COBOL system into anything but another version of COBOL (e.g. COBOL.net). Either way, you still end up with what is essentially a COBOL system. If this approach is acceptable, then you might want to review this white paper from Micro Focus.
IMHO, your best bet for replacing COBOL is to re-engineer your system. If you ever find a silver bullet to get from where you are to where you want to be, write a book, become a consultant and make many millions of dollars.
Sorry to have provided such a negative answer, but if you are working with anything but a trivial legacy system, the problem is going to be anything but trivial to solve.
Note: Don't bother with flowcharting the existing system. Try to get a handle on process input/output and program-to-program data transformation and flow. You need to understand the business function here, not a specific implementation of it.
Micro Focus and Fujitsu both have COBOL products that work with .NET. Micro Focus allow you to download a product trial, while the Fujitsu NetCOBOL site has a number of articles and case studies.
Micro Focus
http://www.microfocus.com/products/micro-focus-developer/micro-focus-cobol/windows-and-net/micro-focus-visual-cobol.aspx
Fujitsu
http://www.netcobol.com/products/Fujitsu-NetCOBOL-for-.NET/overview
[Note: I work for Micro Focus]
Actually, making COBOL applications available on the .NET framework is pretty straightforward (contrary to a claim made in one of the earlier responses). Fujitsu and Micro Focus both have COBOL compilers that can create ILASM code for execution in the CLR.
Micro Focus Visual COBOL (http://www.microfocus.com/visualcobol) makes it particularly easy to deploy traditional, procedural COBOL as managed code, with full support for COBOL data types, file systems, etc. It also includes an updated OO COBOL syntax that takes away a lot of the verbosity and complexity, making it easy to write COBOL code based on C# examples. Its unique approach also makes it easy to use all the Visual Studio tools, such as IntelliSense.
The original question mentioned "convert" and I would strongly recommend against any approach that requires the source code to be converted to some other language before being used in a .NET environment. The amount of effort and risk involved is highly unlikely to be worth any benefits accrued. On the contrary, keeping the code in COBOL maintains the existing, working code and allows for the option to deploy onto other platforms in the future. For example, how about having a single set of source code and having the option to deploy into .NET as a native language and into a Java environment without changing a line of source code?
I recommend you get a trial copy of Visual COBOL from the link above and see how you can use your existing code in .NET without making any changes.
This is not an easy task. COBOL has fundamental ideas about data types that do not map well onto the object-oriented .NET framework (e.g. in COBOL, all data types are represented in terms of fixed-size buffers), and in particular the way groups and arrays work does not map well to .NET classes.
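To make the mismatch concrete, here is a rough C++-flavoured sketch (the record layout is invented for illustration) of the point that a COBOL group is essentially a flat, fixed-size byte buffer with fields at fixed offsets, rather than a set of discrete typed properties:

// Illustration only: a COBOL group such as
//   01 CUSTOMER-REC.
//      05 CUST-ID    PIC 9(6).
//      05 CUST-NAME  PIC X(30).
//      05 BALANCE    PIC S9(7)V99 COMP-3.
// is, at runtime, just a fixed-size buffer:
struct CustomerRec {
    char cust_id[6];           // zoned decimal digits, not a native integer
    char cust_name[30];        // blank-padded, fixed width, no length field
    unsigned char balance[5];  // packed decimal (COMP-3), no direct .NET type
};
// A .NET class with an int Id, string Name and decimal Balance looks similar
// on paper, but REDEFINES, OCCURS (arrays inside the buffer) and group-level
// MOVEs all depend on this flat byte layout and do not carry over cleanly.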
I believe there are COBOL compilers that can actually compile to .NET bytecode, but they would have their own runtime libraries to manage all of that. It might be worth looking at one of these compilers and simply leaving the legacy code in COBOL.
Other than that, line-by-line translation is probably not possible. Look at the code at a higher level and translate blocks of code at a time (e.g. at the procedure level or even higher).
There are a number of ways to convert COBOL to modern, scalable environments such as .NET or Java.
The first is migration to a new environment while keeping the existing COBOL code with some minor modifications (e.g. Micro Focus COBOL for .NET).
The second is migration to a new platform with simulation of COBOL statements and constructs, where additional .NET/Java libraries simulate specific COBOL logic: ACCEPT maps to NETLibrary.Accept, and so on.
The third approach is the most valuable one: migrating to "pure" .NET/Java code with all the benefits of the new environment. Such code can easily be maintained and developed in the future.
However, unique expertise and toolkits are required for this approach, and there are only a few players on the global market that can help you in this case.
If we are talking about automatic migration, the number of players decreases greatly and, unfortunately for you, you have to pay for the specific technologies and tools (like ours).
However, it is a better idea to invest your money in your future growth in the modern environment than to spend it on the "simulation" of old technologies.
Translation is not an easy task. Besides Micro Focus and Fujitsu, there is also Raincode, which offers a free COBOL compiler that integrates nicely with Visual Studio.

Datamining models in FORTRAN or C (or managed code)?

We are planning to develop a datamining package for windows. The program core / calculation engine will be developed in F# with GUI stuff / DB bindings etc done in C# and F#.
However, we have not yet decided on the model implementations. Since we need high performance, we probably can't use managed code here (any objections here?). The question is, is it reasonable to develop the models in FORTRAN or should we stick to C (or maybe C++). We are looking into using OpenCL at some point for suitable models - it feels funny having to go from managed code -> FORTRAN -> C -> OpenCL invocation for these situations.
Any recommendations?
F# compiles to the CLR, which has a just-in-time compiler. It's a dialect of ML, which is strongly typed, allowing all of the nice optimisations that go with that type of architecture; this means you will probably get reasonable performance from F#. For comparison, you could also try porting your code to OCaml (IIRC this compiles to native code) and see if that makes a material difference.
If it really is too slow, then see how far scaling the hardware will get you. With the performance available from a modern PC or server, it seems unlikely that you would need anything exotic unless you are working with truly brobdingnagian data sets. Users with smaller data sets may well be fine on an ordinary PC.
Workstations give you perhaps an order of magnitude more capacity than a standard desktop PC. A high-end workstation like an HP Z800 or XW9400 (similar kit is available from several other manufacturers) can take two 4- or 6-core CPU chips, tens of gigabytes of RAM (up to 192 GB in some cases), and has various options for high-speed I/O like SAS disks, external disk arrays or SSDs. This type of hardware is expensive but may be cheaper than a large body of programmer time. Your existing desktop support infrastructure should be able to handle this sort of kit. The most likely problem is compatibility issues running 32-bit software on a 64-bit OS, in which case you have various options like VMs or KVM switches to work around the compatibility issues.
The next step up is a 4- or 8-socket server. Fairly ordinary Wintel servers go up to 8 sockets (32-48 cores) and perhaps 512 GB of RAM, without having to move off the Wintel platform. This gives you a fairly wide range of options within your platform of choice before you have to go to anything exotic.
Finally, if you can't make it run quickly in F#, validate the F# prototype and build a C implementation using the F# prototype as a control. If that's still not fast enough you've got problems.
If your application can be structured in a way that suits the platform, then you could look at a more exotic platform. Depending on what will work with your application, you might be able to host it on a cluster or a cloud provider, or build the core engine on a GPU, Cell processor or FPGA. However, in doing this you're taking on (quite substantial) additional costs and exotic dependencies that might cause support issues. You will probably also have to bring in a third-party consultant who knows how to program the platform.
After all that, the best advice is: suck it and see. If you're comfortable with F# you should be able to prototype your application fairly quickly. See how fast it runs and don't worry too much about performance until you have some clear indication that it really will be an issue. Remember, Knuth said that premature optimisation is the root of all evil about 97% of the time. Keep a weather eye out for issues and re-evaluate your strategy if you think performance really will cause trouble.
Edit: If you want to make a packaged application then you will probably be more performance-sensitive than otherwise. In this case performance will probably become an issue sooner than it would with a bespoke system. However, this doesn't affect the basic 'suck it and see' principle.
For example, at the risk of starting a game of buzzword bingo, if your application can be parallelized and made to work on a shared-nothing architecture you might see if one of the cloud server providers [ducks] could be induced to host it. An appropriate front-end could be built to run locally or through a browser. However, on this type of architecture the internet connection to the data source becomes a bottleneck. If you have large data sets then uploading these to the service provider becomes a problem. It may be quicker to process a large dataset locally than to upload it through an internet connection.
I would advise not to bother with optimizations yet. First try to get a working prototype, then find out where computation time is spent. You can probably move the biggest bottlenecks out into C or Fortran when and if needed -- then see how much difference it makes.
As they say, often 90% of the computation is spent in 10% of the code.
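If it does come to that, the usual shape of the change is small. Here is a minimal sketch of moving one measured bottleneck to native code; the function, its exported name, and the idea of binding it from the F#/C# layer via P/Invoke are assumptions about the setup:

// hotspot.cpp -- one hot loop exported with C linkage so the managed layer
// can call it; everything else stays in F#/C#.
#include <cstddef>

extern "C" double dot_product(const double *a, const double *b, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
// The F#/C# side would bind this with a standard P/Invoke (DllImport)
// declaration and pass pinned arrays.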

Resources