We are using the ImageResizer plugin with the DiskCache plugin for caching: http://imageresizing.net/plugins/diskcache.
Our web sites use millions of images. The total size of all the images is close to 2 TB.
So to maintain disk caching, how many subfolders do we have to specify?
As per the documentation I have read here: http://imageresizing.net/plugins/diskcache,
given a desired cache size of 100,000 items, this should be set to 256.
But in our case, the cache size will be much larger.
Can we use this plugin?
If yes, how many subfolders do we have to use?
Thanks in advance.
Set the subfolders count to (max images)/400, then round up to the nearest power of two. 16384 will support around 6.5 million images. You're going to have a hard time scaling NTFS past 4 million or so; the Windows filesystem gets bogged down.
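For reference, a rough sketch of that arithmetic in Python (the ~400-files-per-folder figure is the one the DiskCache documentation uses; the image counts are just example estimates):

import math

def diskcache_subfolders(expected_images, files_per_folder=400):
    # Divide the expected number of cached images by ~400 files per folder,
    # then round up to the nearest power of two.
    raw = expected_images / files_per_folder
    return 2 ** math.ceil(math.log2(raw))

print(diskcache_subfolders(2_000_000))   # 8192
print(diskcache_subfolders(6_000_000))   # 16384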
If you can, prefer putting a caching proxy in front of the images.
A proxy will be faster (no ASP.NET server-side code runs on a cache hit) than creating, filling, and managing a large folder of rarely-accessed files.
An IIS proxy, Varnish, NGINX, or any other reverse proxy can maintain a cache, and it will work better than ImageResizer/ASP.NET MVC.
Related
Can somebody explain to me why I should possibly use ImageMagick (or the fork GraphicsMagick) in a CMS instead of simply displaying and sizing my images via CSS?
The browser methods cannot fail and are automatically updated on the client side, whereas the ImageMagick binary can fail on the server and I have to maintain it by hand so it does not become obsolete.
ImageMagick offers lots of image manipulations not available via normal HTML/CSS operations. So for some tasks, your browser just can't do the modifications.
One very important task is simply file size: if a user uploads a 20MB image you don't want to deliver that to your clients with a 3G mobile connection and let them scale the image down: you want to have your server do that task once and then serve images that are substantially smaller in size.
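To make that concrete, here is a minimal sketch of the kind of one-time, server-side downscaling being described. It is written with Python's Pillow purely for illustration (in a CMS the same job would typically be done by ImageMagick or GraphicsMagick); the file names, the 1600 px bound, and the JPEG quality are made-up values:

from PIL import Image

def shrink_for_web(src_path, dest_path, max_side=1600, quality=80):
    # Downscale once on the server so a 20 MB upload is never sent to clients.
    with Image.open(src_path) as img:
        img.thumbnail((max_side, max_side))   # keeps aspect ratio, only ever shrinks
        img.convert("RGB").save(dest_path, "JPEG", quality=quality, optimize=True)

shrink_for_web("upload_20mb.jpg", "web_version.jpg")

The browser then only ever downloads the small, recompressed copy, which CSS alone can never achieve.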
We are a small bootstrapped ISP in a third-world country where bandwidth is usually expensive and slow. We recently got a customer who needs a storage solution for tens of TB of mostly video files (it's a TV station). The thing is, I know my way around Linux, but I have never done anything like this before. We have a Backblaze storage pod 3 casing which we are thinking of using as a storage server. The server will be connected to the customer directly, so traffic won't go through the internet, because 100+ Mbps speeds are unheard of in this part of the world.
I was thinking of using 4 TB HDDs, all formatted with ext4, and using LVM to make them one large volume (50-70 TB at least). The customer then logs in with an FTP-like client and dumps whatever files he/she wants, but only ever sees a single volume, and we can add space as the requirements increase. Of course, this is just on paper from preliminary research, as I don't have prior experience with this kind of system. I also have to take cost into consideration, so I can't go for any proprietary solution.
My questions are:
Is this the best way to handle this problem, or are there equally good or better solutions out there?
For large storage solutions (at least, large for me), what are my cost-effective options when it comes to dealing with data corruption and HDD failure?
Would love to hear any other solutions and tips you guys might have. Thanks!
ZFS might be a good option, but there is no native, bug-free implementation for Linux yet. I would recommend another operating system in that case.
Today I would recommend Linux MD RAID 5 on enterprise disks or RAID 6 on consumer/desktop disks. I would not assign more than 6 disks to an array. LVM can then be used to tie the arrays into a logical volume suitable for ext4.
The ext4 filesystem is well tested and stable, while XFS might be better for large-file storage. The downside to XFS is that it is not possible to shrink an XFS filesystem. I would prefer ext4 because of its more flexible nature.
Please also take into consideration that backups are still required even if you are storing your data on RAID arrays. The data can silently corrupt or be accidentally deleted.
In the end, everything depends on what the customer wants. Telling the customer the price of the service usually has an effect on the requirements.
I would like to add to the answer that mingalsuo gave. As he stated, it really comes down to the customer requirements. You don't say what, specifically, the customer will do with this data. Is it for archive only? Will they be actively streaming the data? What is your budget for this project? These types of answers will better determine the proposed solution. Here are some options based on a great many assumptions. Maybe one of them will be a good fit for your project.
CAPACITY:
In this case, you are not that concerned about performance but more interested in capacity, so the number of spindles doesn't really matter much. As Mingalsuo stated, put together a set of RAID-6 SATA arrays and use LVM to produce a large volume.
SMALL BUSINESS PERFORMANCE:
In this case, you need performance. The customer is going to store files but also requires a small number of simultaneous data streams. For streaming, it does little good to focus on the size of the controller cache; focus instead on the number of spindles, and get as many as possible. Keep in mind that the time to rebuild a failed drive increases with the size of the drive, and during a rebuild your performance will suffer. For these reasons I'd suggest smaller drives, maybe 1 TB at most. This will give you faster rebuild times and more spindles for streaming.
ENTERPRISE PERFORMANCE:
Here you need high performance, similar to what an enterprise demands. You require many simultaneous data streams and performance is critical. In this case, I would stay away from SATA drives and use 900 GB or 1.2 TB SAS drives instead. I would also suggest that you consider abstracting the storage layer from the server layer: create a Linux server and use iSCSI (or Fibre Channel) to connect to the storage device. This will allow you to load balance if possible, or at the very least make recovery from disaster easier.
NON TRADITIONAL SOLUTIONS:
You stated that the environment has few high-speed connections to the internet. Again, depending on the requirements, you still might consider cloud storage. Hear me out :) Let's assume that the files will be uploaded today, used for the next week or month, and then rarely read. In this case, these files are sitting on (potentially) expensive disks for no reason except archive. Wouldn't it be better to keep those active files on expensive (local) disk until they "retire" and then move them to less expensive disk? There are solutions that do just that. One, for example, is called StorSimple. This is an appliance that contains SAS (and even flash) drives and uses cloud storage to automatically migrate "retired" data from the local storage to cloud storage. Because this data is retired it wouldn't matter if it took longer than normal to move it to the cloud. And, this appliance automatically pulls it back from the cloud to local storage when it is accessed. This solution might be too expensive for your project but there are similar ones that you might find will work for you. The added benefit of this is that your data is automatically backed up by the cloud provider and you have an unlimited supply of storage at your disposal.
We need to serve the same image in a number of possible sizes in our app. The library consists of tens of thousands of images which will be stored on S3, so storing the same image in all its possible sizes does not seem ideal. I have seen a few mentions on Google that EC2 could be used to resize S3 images on the fly, but I am struggling to find more information. Could anyone please point me in the direction of some more info or, ideally, some code samples?
Tip
It was not obvious to us at first, but never serve images to an app or website directly from S3; it is highly recommended to use CloudFront instead. There are 3 reasons:
Cost - CloudFront is cheaper
Performance - CloudFront is faster
Reliability - S3 will occasionally not serve a resource when queried frequently i.e. more than 10-20 times a second. This took us ages to debug as resources would randomly not be available.
The above are not necessarily failings of S3, as it's meant to be a storage service, not a content delivery service.
Why not store all image sizes, assuming you aren't talking about hundreds of different possible sizes? Storage cost is minimal. You would also then be able to serve your images up through Cloudfront (or directly from S3) such that you don't have to use your application server to resize images on the fly. If you serve a lot of these images, the amount of processing cost you save (i.e. CPU cycles, memory requirements, etc.) by not having to dynamically resize images and process image requests in your web server would likely easily offset the storage cost.
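If you do pre-generate the sizes, it can be done once at upload time. Here is a sketch with Pillow and boto3, where the bucket name, key layout, and list of widths are all assumptions to replace with your own:

import io
import boto3
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "my-image-bucket"     # assumption: your S3 bucket
WIDTHS = [200, 640, 1280]      # assumption: the sizes your app actually needs

def upload_all_sizes(local_path, base_key):
    # Resize once, store every variant; CloudFront/S3 then serve them
    # with no resizing work on your application servers.
    with Image.open(local_path) as original:
        for width in WIDTHS:
            height = max(1, round(original.height * width / original.width))
            variant = original.resize((width, height))
            buf = io.BytesIO()
            variant.convert("RGB").save(buf, "JPEG", quality=85)
            s3.put_object(
                Bucket=BUCKET,
                Key=f"{base_key}/w{width}.jpg",
                Body=buf.getvalue(),
                ContentType="image/jpeg",
            )

upload_all_sizes("photo.jpg", "images/photo-123")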
What you need is an image server. Yes, it can be hosted on EC2. These links should help you get started: https://github.com/adamdbradley/foresight.js/wiki/Server-Resizing-Images
http://en.wikipedia.org/wiki/Image_server
I have one website mainly composed of JavaScript. I host it on IIS.
This website requests images from a particular folder on the hard disk and displays them to the end user.
The image requests are very frequent and rapid.
Is there any way to reduce the overhead of these disk read operations?
I have heard about memory mapping, where a portion of the hard disk can be mapped and used as if it were primary memory.
Can somebody tell me whether I am right or wrong? If I am right, what are the steps to do this?
If I am wrong, is there any other solution for this?
While memory mapping is a viable idea, I would suggest using memcached. It runs as a distinct process, permits horizontal scaling, and is tried, tested, and in active deployment at some of the most demanding websites. A well-implemented memcached setup can reduce disk access significantly.
It also has bindings for many languages, including clients that talk to it over the network. I assume you want a solution for Java (most of your tags relate to that language). Read this article for installation and other admin tasks; it also has code samples (for Java) that you can start off with.
In pseudocode terms, what you need to do when you receive a request for, let's say, Moon.jpeg is:
String key = md5_hash(Moon.jpeg); /* Or some other key generation mechanism */
IF key IN memcached
SUPPLY FROM memcached /* No disk access */
ELSE
READ Moon.jpeg FROM DISK
STORE IN memcached ASSOCIATED WITH key
SUPPLY
END
This is a very crude algorithm, you can read more about cache algorithms in this Wiki article.
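Since the answer assumes Java, the pseudocode maps directly onto a client such as spymemcached; purely for illustration, here is the same read-through flow sketched with the pymemcache client in Python (the server address, key scheme, and image directory are assumptions):

import hashlib
from pathlib import Path
from pymemcache.client.base import Client

IMAGE_DIR = Path("/var/www/images")     # assumption: where the image files live
mc = Client(("localhost", 11211))       # assumption: a local memcached instance

def get_image(filename: str) -> bytes:
    # Read-through cache: serve from memcached when present, otherwise
    # read the file from disk once and cache it for later requests.
    key = hashlib.md5(filename.encode()).hexdigest()   # or any other key scheme
    data = mc.get(key)
    if data is None:
        data = (IMAGE_DIR / filename).read_bytes()     # the only disk access
        mc.set(key, data, expire=3600)                 # keep it for an hour
    return data

image_bytes = get_image("Moon.jpeg")

One caveat worth checking: memcached's default item size limit is 1 MB, so larger images need the server started with a higher -I limit or a different caching layer.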
This is the wrong direction to think about it: you want to reduce I/O to the (relatively) slow disks, which means having the files cached in physical memory. In simple scenarios the OS will handle this automagically with its file cache. You may check whether Windows provides any tunable parameters, or at least see what performance metrics you can gather.
If I remember correctly (from years ago), IIS handles static files very efficiently thanks to a kernel-mode driver linked to IIS, but only if the request doesn't pass through further ISAPI filters etc. You can probably find some info related to this on Channel 9.
Longer term, you should look at moving static assets to a CDN such as CloudFront.
Like any problem though... are you sure you have a problem?
I have been assigned to write a Windows application which will copy the images from a source folder and its subfolders (there could be any number of subfolders, holding up to 50 GB of images). Each image might vary in size from a few KB to 20 MB. I need to resize and compress the pictures.
I am clueless and wondering whether that can be done without hitting the CPU too hard while still being reasonably fast.
Is it possible? Can you guide me on the best way to implement this?
Image processing is always a CPU-intensive task. You could do little tricks like lowering the priority of the process that's performing the image processing so it impacts your machine less, but there's very little tradeoff to be made.
As for how to do it,
Write a script that looks for all the files in the current directory and its sub-directories. If you're not sure how, do a Google search. You could do this in Perl, Python, PHP, C#, or even a BAT file.
Call one of the 10,000,000 free or open-source programs to do image conversion. The most widely used Linux program is ImageMagick and there's a Windows version of it available too.
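As a rough sketch of both steps combined, assuming ImageMagick 7's magick command is installed and on the PATH (the folder paths, the 1920 px bound, and the quality setting are placeholders; older ImageMagick 6 installs use convert instead of magick):

import os
import subprocess

SOURCE = r"C:\images\source"    # assumption: the folder tree to process
DEST = r"C:\images\resized"     # assumption: where resized copies should go

for root, _dirs, files in os.walk(SOURCE):
    for name in files:
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        src = os.path.join(root, name)
        dst = os.path.join(DEST, os.path.relpath(src, SOURCE))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # "1920x1920>" only shrinks images that are larger than the bound;
        # -quality 85 recompresses them to cut the file size.
        subprocess.run(
            ["magick", src, "-resize", "1920x1920>", "-quality", "85", dst],
            check=True,
        )

Running the script from a low-priority process (for example, launching it with start /low on Windows) is one simple way to keep the CPU impact down, as suggested above.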