Reduce DeepLearning4j dependency size of exported jar - deeplearning4j

In my application, I would like to use Deeplearning4j. Deeplearning4j has over 120 MB of dependencies, which is a lot considering my own code is only 0.5 MB.
Is it possible to reduce the dependencies required? Would loading an already-trained network allow me to ship my application with a smaller file size?

There are many ways to reduce the size of your jar, depending on your use case. This is now covered in more detail in our docs, but I'll summarize some things to try here:
DL4J is heavily based on JavaCPP. You can add -Djavacpp.platform=$YOUR_PLATFORM (linux-x86_64, windows-x86_64, ...) to your build to pull in the native dependencies for only that platform, rather than all of them.
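For example, with a Maven build that might look something like the following (the platform value should match your deployment target):

mvn clean package -Djavacpp.platform=linux-x86_64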
If you are using deeplearning4j-core, that pulls in a lot of extra dependencies you may not need. In that case, you may only need deeplearning4j-nn for the configuration. The same goes if you are only using SameDiff: you do not need the DL4J APIs. I don't know enough about your use case to confirm what you do and don't need, though.
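As a sketch, swapping the dependency in a Maven pom.xml could look like this (${dl4j.version} is a placeholder for whatever DL4J version you are already using):

<!-- Sketch: depend on deeplearning4j-nn instead of deeplearning4j-core
     if you only need to build and load network configurations. -->
<dependency>
  <groupId>org.deeplearning4j</groupId>
  <artifactId>deeplearning4j-nn</artifactId>
  <version>${dl4j.version}</version>
</dependency>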
If you are deploying on an embedded platform, we now also have the ability to reduce the number of supported operations and data types. This feature is mainly for advanced users right now (it involves building from source), but if you think it could be applicable despite the first two points, please do confirm and I can try to clarify that a bit.

Related

OpenLiberty Docker image with *all* features enabled

The OpenLiberty Docker images tagged as full contain a server.xml that only enables the javaee-8.0 feature. Those tagged as microProfile3 only enable microProfile-3.0.
I want both... even better: I'd like to have all features enabled while I'm developing; I'll optimize for performance when I need it, i.e. maybe not at all.
Is there an easier way than to build another image with both features enabled?
It isn't possible to enable all features at once in Liberty because many of the features intentionally conflict with one another. For example, you can't load two different versions of the same feature at the same time (e.g. servlet-3.1 and servlet-4.0).
You can pretty concisely enable all of the latest JavaEE and MicroProfile features at once by doing this:
<server>
  <featureManager>
    <feature>javaee-8.0</feature>
    <feature>microProfile-3.2</feature>
  </featureManager>
</server>
Doing this will give quite a lot of capabilities (more than a single app typically needs). The features not included in these two umbrella features are pretty specialized, such as JCache session persistence (sessionCache-1.0) or event logging (eventLogging-1.0).
You can think of the tags as indicating which features are installed in the image, more so than which are enabled by default. In other words, 'full' has all of the features installed and available to enable without a separate install step, whereas 'microProfile3' only has the microProfile-3 features installed. Note that some packages, like javaee8, include more than just the single feature, because they also provide other features that users may need alongside it (though only that one feature is enabled by default). You can see the breakdown of features per package here.
Andy's answer explains why you can't enable all the features at once (conflicts). Regarding whether there's an easy way to build with both features enabled, I'd recommend starting with 'full' and updating the Dockerfile to COPY a server.xml with both features (plus any others you'd like) to /config. As you alluded to in your question, this is fine for development, but you would not want to do it in production, since it would include a lot of extra features that you're not using. For production, you'd want the opposite approach: start with the smallest image (perhaps kernel) and add only the features that your application/server needs, ensuring a fit-for-purpose runtime.
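For development, a minimal Dockerfile along those lines might look like this (a sketch, assuming the server.xml shown above sits next to the Dockerfile):

FROM open-liberty:full
# Copy the server.xml enabling javaee-8.0 and microProfile-3.2 into the config directory.
COPY --chown=1001:0 server.xml /config/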

Differences between classes in the backtype.storm, org.apache.storm & com.twitter.heron packages

I want to write some custom schedulers for Apache Heron, and I'm diving a little deeper into the source code. I noticed that the Heron source code contains a couple of packages with similar classes. For example, most of the classes in backtype.storm and org.apache.storm are similar (exactly similar, in that the code inside is identical). There are also some similar classes between these two packages and com.twitter.heron (for example com.twitter.heron.api.tuple.Fields), but some of them have different code inside (such as the Fields class). I know that when writing topologies we can import whichever of these packages we want and choose between them, but I'm curious about the differences between them and why all of these packages were kept together rather than merged. And if the Storm classes are the only choice for writing topologies, what are the classes in the com.twitter.heron package good for?
I know that Heron is designed to be fully backward compatible with Storm, and this might be due to that backward-compatibility requirement, but I have to admit this has confused me a lot, because I need to write my own code inside these classes and I don't know how to choose which one, i.e. which one is actively developed and maintained by the developers and should be my candidate to modify.
Thanks in advance.
Based on the description from the developer team here:
Use of heron api classes is not recommended - since we might change them frequently. They are meant for internal usage only.
backtype.storm is for applications written against pre-1.0.0 Storm. For post-1.0.0 applications, you should use org.apache.storm.
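For illustration, a post-1.0.0 topology would import the org.apache.storm classes, while a pre-1.0.0 one would import the same names from backtype.storm instead. A rough skeleton (the class name and omitted spout/bolt wiring are made up for this sketch):

import org.apache.storm.Config;
import org.apache.storm.topology.TopologyBuilder;

public class ExampleTopology {
    public static void main(String[] args) throws Exception {
        // Post-1.0.0 style: the same builder/config classes also exist under
        // backtype.storm for pre-1.0.0 applications.
        TopologyBuilder builder = new TopologyBuilder();
        Config conf = new Config();
        // builder.setSpout(...) / builder.setBolt(...) and topology submission
        // are omitted in this sketch.
    }
}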

How to perform image processing using the "12 factor app" way in Rails

I'm having a hard time figuring out the best way to do image uploads in my Rails project while still adhering to the '12 factor app' standards.
Basically, I would like to have users upload their own profile images. I know I can use Paperclip to accomplish that, but I still need to use something like ImageMagick to resize the images and prepare the pictures to be used in my application.
According to 12 Factor app's Section 2, Dependencies (http://12factor.net/dependencies):
A twelve-factor app never relies on implicit existence of system-wide packages.
Twelve-factor apps also do not rely on the implicit existence of any system tools. Examples include shelling out to ImageMagick or curl.
It says you shouldn't use ImageMagick locally. That makes sense, but how would you do your image processing then?
Thanks in advance for any advice you can give!
I think the key word there is implicit.
Make dependencies explicit, and account for the behavior if some required component you don't ship isn't present. There are tons of environment-management tools that let you stand up a new environment configured the 'proper' way.
I tend to think of ImageMagick as a backing service, akin to the database your application uses. I don't believe 12-factor is telling you that you have to also ship the RDBMS in order to be totally self-contained.
Perhaps use configuration values: if they're specified, the ImageMagick libraries are used; otherwise, degrade gracefully until the libraries are installed and configured.
Yeah, it's a sticky wicket.

Is setNumThreads(x) parallelizing my OpenCV code?

I really wonder whether using OpenCV's setNumThreads() actually allows my code to run in parallel. I've searched a lot on the internet without finding an answer to my question.
Does anyone have an answer?
The effect depends greatly on the configuration options you select at CMake configure time; see for example the CMakeLists.txt, plus the caveats of the different configuration options:
/* IMPORTANT: always use the same order of defines
1. HAVE_TBB - 3rdparty library, should be explicitly enabled
2. HAVE_CSTRIPES - 3rdparty library, should be explicitly enabled
3. HAVE_OPENMP - integrated to compiler, should be explicitly enabled
4. HAVE_GCD - system wide, used automatically (APPLE only)
5. HAVE_CONCURRENCY - part of runtime, used automatically (Windows only - MSVS 10, MSVS 11)
*/
And with those, you can understand the code itself. All that said, the parallelising engine won't do much if you're running an inherently sequential algorithm, which is practically everything under OpenCV... My guess is that if you had several OpenCV programs running in parallel, you could see a meaningful difference.
I feel the need to build on miguelao's answer: most of OpenCV's functionality is NOT multithreaded. setNumThreads only affects multithreaded functions, such as calcOpticalFlowPyrLK.
By default, OpenCV will already use as many threads as you have cores, so setNumThreads won't give you a speed gain.
My main use for it is disabling multithreading, so that I may do my own with coarser granularity.
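To illustrate that last point, here is a rough sketch using the OpenCV Java bindings (the file names and blur parameters are made up for the example): OpenCV's internal threading is switched off, and the parallelism is applied one whole image per task instead.

import java.util.Arrays;
import java.util.List;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

public class CoarseGrainedParallelism {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        // Run OpenCV's own functions sequentially (0 disables its internal threading).
        Core.setNumThreads(0);
        // Parallelize at a coarser granularity: one image per task.
        List<String> files = Arrays.asList("a.png", "b.png", "c.png");
        files.parallelStream().forEach(path -> {
            Mat src = Imgcodecs.imread(path);
            Mat dst = new Mat();
            Imgproc.GaussianBlur(src, dst, new Size(5, 5), 0);
            Imgcodecs.imwrite("blurred_" + path, dst);
        });
    }
}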

Why is OpenCV releasing "World.dll" in 2.4.7?

I heard OpenCV will have "world.dll", a single library with the combined functionality of all the other modules, in the next release. My question is: why would OpenCV do this now, when past releases have always divided the functionality into categorized modules? Is there any special benefit to this?
Some info here: http://www.programmerfish.com/should-you-be-using-opencv-world-module/
The main aspect is to make deployment of end-user applications easier - you only have one DLL file instead of many.
I guess there is also a slight performance increase in loading times, because loading one DLL means more continuous reading from the HDD.
