I heard that in the next release OpenCV will have "world.dll", a single library that combines the functionality of all the other modules. My question is: why would OpenCV do this now, when in past releases it has always divided the functionality into categorized modules? Is there any special benefit to this?
Some info here: http://www.programmerfish.com/should-you-be-using-opencv-world-module/
The main aspect is to make deployment of end-user applications easier - you only have one DLL file instead of many.
I guess there is also a slight performance increase in loading times, because loading one DLL means more continuous reading from the HDD.
In my application, I would like to use Deeplearning4j. Deeplearning4j has over 120 MB of dependencies, which is a lot considering my own code is only 0.5 MB.
Is it possible to reduce the dependencies required? Would loading an already-trained network allow me to ship my application with a smaller file size?
There are many ways to reduce the size of your JAR depending on what your use case is. We now cover this in more detail in our docs, but I'll summarize some things to try here:
DL4j is heavily based on javacpp. You can add -Djavacpp.platform=$YOUR_PLATFORM (linux-x86_64, windows-x86_64, ...) to your build to reduce the number of native dependencies pulled in.
If you are using deeplearning4j-core, that includes a lot of extra dependencies you may not need. In that case, you may only need deeplearning4j-nn for the configuration. The same goes if you are only using samediff: you do not need the dl4j APIs. I don't know enough about your use case to confirm what you do and don't need, though.
If you are deploying on an embedded platform, we now also have the ability to reduce the number of supported operations and data types. This feature is mainly for advanced users right now (it involves building from source), but if you think it could be applicable in addition to the first two suggestions, please do confirm and I can try to clarify that a bit.
Using the cloud compiling website I created a custom NodeMCU firmware that has a lot of modules, so many that the firmware itself is almost 700 KB in size. I usually only use up to 5 modules for a single project, so I'm wondering whether the inclusion of all the other modules in the firmware has a noticeable negative impact on RAM usage.
There's an excellent explanation of the ESP8266 memory map (and other interesting bits) at https://www.kickstarter.com/projects/214379695/micropython-on-the-esp8266-beautifully-easy-iot/posts/1501224.
Every module baked into the binary consumes memory just by "being there". If you wanted to measure the impact a single module has on the available heap you'd have to build two binaries, one with and one without that module. You'd flash both and calculate the delta of running node.heap() right after start.
Does compiling NodeMCU with lots of modules have an impact on the memory usage?
Yes, it definitely does as you noticed.
I usually only use up to 5 modules for a single project
That's why we recommend using a different (read: minimal) set of modules for every project. The beauty of the NodeMCU firmware is that you only have to do this once, unlike with e.g. Arduino; after that, swapping scripts or even individual functions in and out is super quick.
I suggest you also take a look at https://nodemcu.readthedocs.io/en/dev/en/lua-developer-faq/#techniques-for-reducing-ram-and-spiffs-footprint. A major overhaul is in the making at https://github.com/nodemcu/nodemcu-firmware/pull/1899.
I am currently experimenting with Halide, the initial tests show quite promising performance improvements.
I am now wondering what the best strategy is for distributing Halide code. Requiring users to install Halide seems like a heavy barrier at this point in time (since there are no automated install options).
One option would be to use compile_to_c, add the generated C code to the repository, and distribute compilation scripts for that C code. scikit-learn uses a similar strategy for Cython-generated code. For Halide this seems like a no-go, since the generated C code loses all the optimizations, defeating the purpose of Halide.
My current idea would be to use compile_to_bitcode, distribute the generated bitcode together with compilation scripts that call llc to generate the desired machine code. The only requirement for the user would be to have llc (i.e. LLVM) installed.
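To make this concrete, I mean something along these lines (the brighten pipeline below is just a toy placeholder for my actual code):

#include "Halide.h"

int main() {
    Halide::ImageParam input(Halide::UInt(8), 2, "input");
    Halide::Param<int> offset("offset");
    Halide::Func brighten("brighten");
    Halide::Var x("x"), y("y");

    // Toy pipeline standing in for the real algorithm.
    brighten(x, y) = Halide::cast<uint8_t>(
        Halide::clamp(Halide::cast<int>(input(x, y)) + offset, 0, 255));
    brighten.vectorize(x, 16).parallel(y);

    // Instead of compile_to_file/compile_to_c, emit LLVM bitcode and check
    // brighten.bc plus a small build script into the repository.
    brighten.compile_to_bitcode("brighten.bc", {input, offset}, "brighten");
    return 0;
}

Users would then only need something like llc -filetype=obj brighten.bc -o brighten.o plus a linker, rather than a full Halide install.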
Does anyone have experience on this issue?
What are the pro and cons of my idea of distributing bitcode?
What would you recommend?
Some details on the kind of software distribution would help. The question implies a source code distribution, but there is a big difference between a library where programmers may need to interact with Halide produced code at a fine-grained level, and an application where use of Halide is largely invisible to the end user and the goal is just to get it to build.
Distributing bitcode is doable but problematic. To be portable, you have to use something like the PNaCl backend. (PNaCl is fairly close to a generic LLVM bitcode representation.) If you target a specific architecture, there is no guarantee the bitcode will compile or run on any other one. (Halide can lower to architecture specific intrinsics for example.) The LLVM community discourages using bitcode as a distribution format, though if it is in source form (.ll, not .bc) it is likely fairly stable and seems not much worse than shipping assembly files in terms of long term stability.
Halide includes an OS-specific runtime in the generated output, so even with bitcode the result includes a number of target-specific dependencies.
Often one ends up with a design that chooses, at runtime, between one of a number of Halide outputs based on the actual type of processor being used. E.g. using Halide to compile the same algorithm with two different schedules for SSE2 and AVX2 processors. In this model, there are going to be a lot of object files anyway and one can simply choose at build time which ones to include for a given architecture and OS. Distributing the objects as .ll files rather than .o files will likely work, but I'm not sure it buys much.
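A rough sketch of that dispatch pattern, assuming the same pipeline was compiled ahead of time into process_sse2 and process_avx2 (hypothetical names; the real signatures come from the headers Halide generates for your pipeline, and __builtin_cpu_supports is a GCC/Clang builtin):

#include "HalideRuntime.h"  // for halide_buffer_t

// Hypothetical outputs of compiling the same algorithm twice,
// once with an SSE2 schedule/target and once with an AVX2 one.
extern "C" int process_sse2(halide_buffer_t *in, halide_buffer_t *out);
extern "C" int process_avx2(halide_buffer_t *in, halide_buffer_t *out);

int process(halide_buffer_t *in, halide_buffer_t *out) {
    // Pick the widest variant the machine we are actually running on supports.
    if (__builtin_cpu_supports("avx2")) {
        return process_avx2(in, out);
    }
    return process_sse2(in, out);
}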
I would strive to make the full source code available, requiring Halide if one is doing a compilation from the ground up, and look for ways to provide various levels of binary distribution. Certainly for end user software the emphasis should be on how to get the fully built package into the hands of users. For libraries, Halide may be used to surface a higher level programming model to users of the library, in which case the Halide compiler will need to be present anyway.
We strive to make Halide fairly easy to get onto a system and very stable, but have not absolutely nailed either yet. I'd likely try to provide some level of fallback and using the C backend to generate generic C code might be a decent way to do that without rewriting everything in C directly. (If building from source, one gets a choice between installing Halide or using the prebuilt C code.) This is one of the better use cases for the C backend. (Generating C code from Halide is generally a pretty marginal idea despite it seeming to be a good one at first.)
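As a rough illustration, appending something like this to the toy generator sketched in the question would emit that plain-C fallback alongside the bitcode (it assumes the same brighten, input and offset from that sketch):

// Slow but portable fallback for users without Halide or llc installed.
brighten.compile_to_c("brighten_fallback.c", {input, offset}, "brighten");

A build-time switch can then choose between the checked-in C file and regenerating the optimized output with Halide.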
compile_to_c() is definitely not recommended, as the code it generates isn't very optimized; it's useful mostly as a debugging / development tool.
compile_to_bitcode() sounds like it could work, but I'm not aware of anyone using this as a distribution method.
(It would probably be useful to have an automated install available for Halide.)
I wonder whether using OpenCV's setNumThreads() really allows my code to run in parallel. I've searched a lot on the internet without finding an answer to this.
Does anyone have an answer to my question?
The effect depends greatly on the configuration options you select at CMake configure time, see for example CMakeLists.txt, plus the caveats of the different configuration options:
/* IMPORTANT: always use the same order of defines
1. HAVE_TBB - 3rdparty library, should be explicitly enabled
2. HAVE_CSTRIPES - 3rdparty library, should be explicitly enabled
3. HAVE_OPENMP - integrated to compiler, should be explicitly enabled
4. HAVE_GCD - system wide, used automatically (APPLE only)
5. HAVE_CONCURRENCY - part of runtime, used automatically (Windows only - MSVS 10, MSVS 11)
*/
And with those, you can understand the code itself. All that said, the parallelising engine won't do much if you're running an inherently sequential algorithm, which is practically everything under OpenCV... My guess is that if you had several OpenCV programs running in parallel, you could see a meaningful difference.
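If you want to see which options your particular build picked up and how many threads the parallel engine will use, something along these lines works (the exact text of getBuildInformation() varies between versions):

#include <iostream>
#include <opencv2/opencv.hpp>

int main() {
    // Reports, among other things, which parallel framework
    // (TBB, OpenMP, Concurrency, GCD, ...) this build was configured with.
    std::cout << cv::getBuildInformation() << std::endl;

    std::cout << "Default threads: " << cv::getNumThreads() << std::endl;

    cv::setNumThreads(2);  // cap the parallel framework at 2 threads
    std::cout << "Now using: " << cv::getNumThreads() << std::endl;
    return 0;
}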
I feel the need to build on miguelao's answer: most of OpenCV's functionality is NOT multithreaded. setNumThreads only affects multithreaded functions, such as calcOpticalFlowPyrLK.
By default, OpenCV will use as many threads as you have cores, so setNumThreads won't give you a speed gain.
My main use for it is disabling multithreading, so that I may do my own with coarser granularity.
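A rough sketch of what I mean (the GaussianBlur call is just a stand-in for whatever per-image work you do; with many images you would want a thread pool rather than one thread each):

#include <thread>
#include <vector>
#include <opencv2/opencv.hpp>

void processAll(std::vector<cv::Mat>& images) {
    cv::setNumThreads(0);  // run OpenCV's own functions sequentially

    // Coarser granularity: parallelise across images ourselves instead of
    // letting OpenCV split each individual call across threads.
    std::vector<std::thread> workers;
    for (cv::Mat& img : images) {
        workers.emplace_back([&img] {
            cv::GaussianBlur(img, img, cv::Size(5, 5), 1.5);
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
}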
If I compile my entire Delphi application to a single exe, that file will grow to 5MB, 10MB, maybe more. When is that too big? What are the issues with this? This is a commercial application, currently on Delphi XE.
I'm aware of the option to Build with Runtime Packages. That sounded like a good idea, but I see comments here noting that there are some issues and disadvantages.
A Delphi application is never really too big.
However the larger the exe is, the harder it will be to redistribute the file.
Also, if the executable is located on a network disk, start-up time may suffer.
A number of factors make the exe grow:
enabling debug info (will more or less double the exe size). Disable the inclusion of debug info in the final exe.
including bitmaps (in an imagelist or similar component) will also grow the exe substantially.
including resources (using a custom *.res file) will grow the size.
I would advise against putting resources in a separate dll.
This will complicate your application while not reducing loading time or distribution issues.
Turning off debug info in production code is a must.
If you have Delphi 2010 or newer, you can choose to include images in PNG format.
This will take up much less space than old-school bitmaps.
As long as your app is below 30 MB I would not really worry overmuch about the file size though.
Strip RTTI info
David suggests stripping RTTI info (this will disable live-bindings and some other advanced stuff), see: Reduce exe file
According to David it saves about 30% in exe size.
Exe-size will only increase loading time
Far more important is the amount of data your application allocates as storage.
The amount of space you use (or waste) here will have a far greater impact on the performance of your application than the raw exe size.
Strategy or tools to find "non-leak" memory usage problems in Delphi?
A better way to optimize is to make sure you don't leak resources
How to activate ReportMemoryLeaksOnShutdown only in debug mode?
Windows API calls memory leak detection
Use smart data structures and algorithms
It gets too general to really narrow things down here, but prefer algorithms whose cost grows slowly (in big-O terms) over ones whose cost grows wastefully.
Big-O for Eight Year Olds?
And try to limit memory usage by only fetching the data that you need instead of all the data you might need but probably never will.
Delphi data structures
Etc etc.
I don't know of any issues with the exe size of an application. I'm currently working on an application where the exe is around 60 MB, and there is no problem.
The only limitation I know of is the available memory. An application that uses runtime packages will consume more working memory, because all runtime packages are loaded at application start and the packages contain a lot of code which is probably not used in your application.
I really like the idea of runtime packages, but I don't like the implementation in Delphi. One main disadvantage is that you have to ship your app with a bunch of packages, which makes it hard to maintain.
Use a RELEASE build to reduce executable size and increase performance. You can also use runtime packages to reduce the exe file size, but using runtime packages will increase the package (setup) file size.