Batch processing of images

I have been assigned to write a Windows application which will copy the images from a source folder and its subfolders (there could be any number of subfolders, holding up to 50 GB of images). Each image can vary in size from a few KB to 20 MB. I need to resize and compress the pictures.
I am clueless and wondering whether this can be done without hitting the CPU too hard while still running reasonably fast.
Is it possible? Can you guide me to the best way to implement this?

Image processing is always a CPU-intensive task. You could do little tricks like lowering the priority of the process that's performing the image processing so it impacts your machine less, but there's very little trade-off to be made.
As for how to do it:
Write a script that looks for all the files in the current directory and its sub-directories. If you're not sure how, do a Google search. You could do this in Perl, Python, PHP, C#, or even a BAT file.
Call one of the 10,000,000 free or open-source programs to do the image conversion. The most widely used Linux program is ImageMagick, and there's a Windows version of it available too. A rough sketch of both steps follows below.
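
To make those two steps concrete, here is a minimal sketch in Python (one of the languages mentioned above). It assumes ImageMagick's "magick" command is installed and on the PATH; the source and destination folders, target size, and quality setting are placeholders, and on Windows each conversion is started at below-normal priority, per the note about lowering priority.

import os
import subprocess
import sys

SOURCE = r"C:\images\source"   # placeholder source folder
DEST = r"C:\images\resized"    # placeholder destination folder

# On Windows, run each conversion at below-normal priority so the machine stays responsive.
flags = subprocess.BELOW_NORMAL_PRIORITY_CLASS if sys.platform == "win32" else 0

for root, _dirs, files in os.walk(SOURCE):
    for name in files:
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        src = os.path.join(root, name)
        out_dir = os.path.join(DEST, os.path.relpath(root, SOURCE))  # mirror the sub-folder layout
        os.makedirs(out_dir, exist_ok=True)
        dst = os.path.join(out_dir, name)
        # Shrink to fit within 1920x1080 (the ">" flag never enlarges) and recompress at quality 85.
        subprocess.run(["magick", src, "-resize", "1920x1080>", "-quality", "85", dst],
                       check=True, creationflags=flags)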

Related

Check how much of a docker image is accessed?

How can I measure the efficiency of a container image, in terms of what portion of its contents are actually used (accessed) for the processes therein?
There are various forms of wastage that could contribute to excessively large images, such as layers storing files that are superseded in later layers (which can be analysed using dive), or binaries interlaced with unstripped debug information, or the inclusion of extraneous files (or data) that are simply not needed for the process which executes in the container. Here I'm asking about the latter.
Are there docker-specific tools (analogous to dive) for estimating/measuring this kind of wastage/efficiency, or should I just apply general Linux techniques? Can the filesystem access time (atime) be relied upon inside a container (to distinguish which files have/haven't been read since the container was instantiated) or do I need to instrument the image with tools like the Linux auditing system (auditd)?
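
As a rough illustration of the atime idea (not a definitive answer): assuming the container's filesystem is not mounted noatime, you could record a reference timestamp when the container starts (a marker file is used here as a hypothetical mechanism) and later list every file whose access time is newer. Mount options such as relatime limit how often atime is updated, so treat the result as an approximation.

import os
import sys
import time

# Hypothetical reference point: a marker file touched when the container started.
MARKER = "/tmp/container-start-marker"
reference = os.stat(MARKER).st_mtime if os.path.exists(MARKER) else time.time() - 3600

accessed = []
for root, dirs, files in os.walk("/", topdown=True):
    # Skip virtual filesystems that would skew the results.
    dirs[:] = [d for d in dirs if os.path.join(root, d) not in ("/proc", "/sys", "/dev")]
    for name in files:
        path = os.path.join(root, name)
        try:
            if os.stat(path, follow_symlinks=False).st_atime > reference:
                accessed.append(path)
        except OSError:
            continue

print(f"{len(accessed)} files accessed since the reference time", file=sys.stderr)
for path in accessed:
    print(path)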

OP_JMP opcode scalability

My app uses a customized version of Lua 5.0.2 and some customers feed very large (automatically generated) scripts into it. For example, scripts that are 200 megabytes or more in size. It all seems to work fine but I'm wondering how scalable the Lua VM is.
I'm especially concerned about OP_JMP scalability. In Lua 5.0.2, OP_JMP uses 18 bits to encode the destination PC, so it can jump 262144 instructions at most. Can this limitation be a problem with very large scripts? For example, a 200-megabyte script in which a function at the very bottom of the script calls a function at the very top? Will this hit an OP_JMP limit, or are function calls implemented in a different way, with OP_JMP only used for control structures like for, if, ..., which are very unlikely to have to jump more than 2^18 instructions?
Thanks to anybody who can shed some light onto this!

How important is a small Docker image when running?

There are a number of really tiny Linux Docker images that weigh in at around 4-5 MB, and the "full" distros start around 100 MB and climb to twice that.
Setting aside storage space and download time from a repo, are there runtime considerations to small vs. large images? For example, if I have a compiled Go program, one copy running on BusyBox and the other on Ubuntu, and I run say 10 of them on a machine, in what ways (if any) does it matter that one image is tiny and the other pretty heavy? Does one consume more runtime resources than the other?
I never saw any real difference in resource consumption beyond storage and RAM when the image is bigger. However, since Docker containers should be single-process, why carry the big overhead of unused clutter in your containers?
When trimming things down to small containers, there are some advantages to consider:
Faster transfer when deploying (especially important if you want to do rolling upgrades)
Costs: most of the time I used big containers, I ran straight into storage issues on small VMs
Distributed file systems: when using file storage like GlusterFS or other attached storage, big containers slow things down when booted and updated heavily
Massive data overhead: if you have 500 MB of clutter, you'll have it on your dev machine, your CI/CD server, your registry AND every node of your production servers. This CAN matter depending on your use case.
I would say: if you just run a handful of containers internally, the size matters far less than when you run hundreds of containers in production.

ImageResizer: how to choose the subfolders count

We are using the ImageResizer plugin with the DiskCache plugin for caching: http://imageresizing.net/plugins/diskcache.
Our web site uses millions of images; the total size of all the images is close to 2 TB.
To maintain disk caching, how many subfolders do we have to specify?
As per the documentation I have read here: http://imageresizing.net/plugins/diskcache
"given a desired cache size of 100,000 items, this should be set to 256."
But in our case the cache size will be much larger.
Can we use this plugin?
If yes, how many subfolders do we have to use?
Thanks in advance.
Set the subfolders count to (max images / 400), then round up to the nearest power of two. 16384 will support around 6 million images. You're going to have a hard time scaling NTFS past 4 million files or so; the Windows filesystem gets bogged down.
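A quick sketch of that formula in Python; the image counts below are illustrative, not the asker's actual numbers:

import math

def subfolder_count(max_images: int) -> int:
    # (max images / 400), rounded up to the nearest power of two.
    raw = math.ceil(max_images / 400)
    return 1 << (raw - 1).bit_length()

for images in (100_000, 2_000_000, 6_000_000):
    print(f"{images:>9} images -> subfolders = {subfolder_count(images)}")
# 100,000 -> 256 (matches the documentation), 2,000,000 -> 8192, 6,000,000 -> 16384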
If you can, prefer proxying the images.
A proxy will be faster (no ASP.NET server-side code involved) than creating, filling, and managing a large folder of rarely accessed files.
IIS as a proxy, Varnish, NGINX, or any other proxy can maintain a cache, and it will work better than ImageResizer/ASP.NET MVC.

How to reduce disk read/write overhead?

I have a website mainly composed of JavaScript, hosted on IIS.
The website requests images from a particular folder on the hard disk and displays them to the end user.
The image requests are very frequent and need to be fast.
Is there any way to reduce the overhead of these disk read operations?
I have heard about memory mapping, where a portion of the hard disk can be mapped and used as if it were primary memory.
Can somebody tell me whether I am on the right track, and if so, what the steps are to do this?
If I am wrong, is there any other solution for this?
While memory mapping is a viable idea, I would suggest using memcached. It runs as a distinct process, permits horizontal scaling, and is tried, tested, and in active deployment at some of the most demanding websites. A well-implemented memcached setup can reduce disk access significantly.
It also has bindings for many languages, which talk to it over the network. I assume you want a solution for Java (most of your tags relate to that language). Read this article for installation and other admin tasks; it also has code samples (for Java) that you can start off with.
In pseudocode terms, what you need to do when you receive a request for, let's say, Moon.jpeg is:
String key = md5_hash("Moon.jpeg")   /* or some other key generation mechanism */
IF key IN memcached
    SUPPLY FROM memcached            /* no disk access */
ELSE
    READ "Moon.jpeg" FROM DISK
    STORE IN memcached ASSOCIATED WITH key
    SUPPLY
END
This is a very crude algorithm; you can read more about cache algorithms in this Wikipedia article.
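
The answer above assumes Java, but the cache-aside pattern is only a few lines in any client. Purely for illustration, here is a sketch in Python using the pymemcache client; the host, port, and file name are placeholders:

import hashlib
from pymemcache.client.base import Client

client = Client(("localhost", 11211))   # placeholder memcached host/port

def get_image(filename: str) -> bytes:
    key = hashlib.md5(filename.encode()).hexdigest()   # or some other key generation mechanism
    data = client.get(key)
    if data is None:                     # cache miss: read from disk once...
        with open(filename, "rb") as f:
            data = f.read()
        client.set(key, data)            # ...and store it for subsequent requests
    return data                          # cache hits never touch the disk

image_bytes = get_image("Moon.jpeg")

Keep in mind that memcached's default maximum item size is 1 MB, so very large images need the server's item-size limit raised or a different caching layer.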
The wrong direction. You want to reduce I/O to (relatively) slow disks, which means you want the files cached in physical memory. In simple scenarios the OS will handle this automagically with its file cache. You might check whether Windows provides any tunable parameters, or at least see what performance metrics you can gather.
If I remember correctly (from years ago), IIS handles static files very efficiently thanks to a kernel-mode driver linked to IIS, but only if the request doesn't pass through further ISAPI filters, etc. You can probably find some information related to this on Channel 9.
Longer term, you should look at moving static assets to a CDN such as CloudFront.
Like any problem though... are you sure you have a problem?
