Shared file storage for a Rails Application - ruby-on-rails

I have a Rails app that accepts file uploads, and I want to know the best way to provide common storage between servers. Since we have a number of Windows applications we have used Samba in the past, but as we build pure Linux apps I would like to do this the best possible way.
We are expecting large amounts of data, so would need to scale this across multiple file servers.

I've used paperclip with an S3 backend.

If you want to keep all the data in-house, then a networked file system might be the way to go. Try setting up AFS; it scales pretty well.

Another good alternative is from the creators of Memcached:
MogileFS
http://www.danga.com/mogilefs/

One easy way to do it is to use attachment_fu with an S3 backend.
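The answers above share one idea: the application hands uploads to a pluggable storage backend (S3, AFS, MogileFS) instead of writing to a server-local disk. A minimal sketch of that idea follows, in Python rather than Ruby and without Paperclip or attachment_fu. `LocalDirStore` and `save_upload` are hypothetical names, and the backend simply assumes a directory that every app server can see (for example a network mount).

```python
import hashlib
import tempfile
from pathlib import Path


class LocalDirStore:
    """Hypothetical backend: writes uploads under one shared directory
    (for example a network mount visible to every app server)."""

    def __init__(self, root):
        self.root = Path(root)

    def _path(self, key):
        # Shard by the first two characters of the key so that no single
        # directory accumulates a huge number of entries.
        return self.root / key[:2] / key

    def put(self, data: bytes) -> str:
        # Use the content hash as the key, so identical uploads share one file.
        key = hashlib.sha256(data).hexdigest()
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def save_upload(store, data: bytes) -> str:
    """Application code depends only on put/get, not on where bytes live."""
    return store.put(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_mount:
        store = LocalDirStore(shared_mount)
        key = save_upload(store, b"hello upload")
        print(key[:8], store.get(key))
```

Swapping in an S3 or MogileFS backend would then mean writing another class with the same `put`/`get` methods, which is roughly the role the storage option plays in the Rails plugins mentioned above.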

Related

What exactly are ModeShape's advantages over writing my own MongoDB storage?

I'm going to develop a file storage system, mainly for text documents. I have read many questions and answers and gathered some information about file management systems on top of which I could build my own.
Alfresco uses the filesystem plus DB references to it, Apache Jackrabbit uses FS or DB, and ModeShape uses FS, DB, or a NoSQL DB (Cassandra, MongoDB).
Blobs are slower than the filesystem, especially for large files (>1MB), but blobs are more reliable and provide backup, migration, and consistency support out of the box. As I don't intend to store many large files, the performance difference between FS and blobs becomes negligible for me.
I decided to store blobs not in a relational DB but in MongoDB, because:
MongoDB has GridFS under the hood, which provides chunked processing of binary data and replication between servers out of the box;
MongoDB is good for storing key/value pairs, which is docid/blob in my case;
AFAIK, Facebook uses MongoDB for storing images and media (but they merge many files into one blob).
Many CMS systems such as Magnolia, Hippo CMS, and LogicalDOC are based on Jackrabbit, which only provides FS or DB storage and so is not relevant for me, since I want MongoDB. Alfresco is too cumbersome for my small requirements and also doesn't support NoSQL DBs, so I decided to choose ModeShape.
Question: What exactly is the benefit of using ModeShape instead of simply creating my own small web app that writes directly to MongoDB and gains the benefits of GridFS?
The only answer I can come up with myself is that ModeShape also comes bundled with a Lucene engine for indexed search. I'm not sure about document versioning: is it implemented specifically in ModeShape, or can I simply rely on MongoDB for this task? Does ModeShape provide additional mechanisms for data integrity and reliable storage, or does it simply rely on the underlying database?
I would also like to use the file storage system as a REST service behind JBoss Keycloak, and I'm not sure whether it is possible to put ModeShape behind Keycloak. So my question is: should I develop my own app and thus gain flexible development, integration with MongoDB, Keycloak protection, and other custom wishes, or should I use ModeShape and gain some advantages? What are those advantages? Will it really reduce the amount of code on my side? Is MongoDB enough for developing a simple file storage system with backups, versioning, and reliable storage of UTF-8 documents?
Answer provided on JBOSS forums
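To make the "write it yourself on top of GridFS" option in the question more concrete, here is a rough sketch of chunked, versioned document storage. It is not MongoDB or ModeShape code: a plain dictionary stands in for the database, `VersionedStore` is a hypothetical name, and the 255 KiB chunk size is an assumption taken from GridFS's documented default. The chunk records borrow the `files_id` / `n` / `data` field names that GridFS uses for its chunks.

```python
CHUNK_SIZE = 255 * 1024  # assumed chunk size, modelled on the GridFS default


def split_into_chunks(doc_id, data, chunk_size=CHUNK_SIZE):
    """Split bytes into ordered chunk records shaped like {files_id, n, data}."""
    return [
        {"files_id": doc_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


class VersionedStore:
    """In-memory stand-in for a document store that keeps every version."""

    def __init__(self):
        self._versions = {}  # doc_id -> list of chunk lists, oldest first

    def save(self, doc_id, text):
        chunks = split_into_chunks(doc_id, text.encode("utf-8"))
        self._versions.setdefault(doc_id, []).append(chunks)
        return len(self._versions[doc_id])  # 1-based version number

    def load(self, doc_id, version=None):
        history = self._versions[doc_id]
        # No version given means "latest"; otherwise versions are 1-based.
        chunks = history[-1] if version is None else history[version - 1]
        return reassemble(chunks).decode("utf-8")


if __name__ == "__main__":
    store = VersionedStore()
    store.save("doc-1", "first draft")
    store.save("doc-1", "second draft")
    print(store.load("doc-1"), "|", store.load("doc-1", version=1))
```

The sketch shows that chunking and simple versioning are small amounts of code; indexing, search, and integrity guarantees are the parts where a repository such as ModeShape may or may not save work, which is what the question asks.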

Image/File hosting storage best practices and standards

We are building an image and file hosting website and we will save these files on our servers, so I want to know if there are any best practices or standards I need to read and follow to make our website scalable and easy to extend in the future.
Are there any books, articles, or videos on this subject? Please share.
In my experience of dealing with large amounts of data, it's always best to opt for the cloud; check out Amazon S3 (Amazon AWS) or Windows Azure.
Features like a CDN (CloudFront) are a big plus.
I believe this is not a simple question that can be answered without knowing:
How many files are expected?
How many users/file accesses per hour/day/minute?
Your usage scenarios for these files (downloading? streaming? how many concurrent files downloaded at once?)
Are you stuck with one particular OS (Windows) and filesystem (NTFS), or is there freedom in this?
My personal note: building your own image/file hosting is not a trivial task; I strongly recommend you hire somebody with experience in this area.
I would recommend that, if possible, you look at a third-party solution that provides an API. You'll then get the benefits of lower cost of ownership, no maintenance costs for the hardware, and continual updates thrown in for free when the third party adds new features to the core offering. I know this from first-hand experience, as we scoped out the options for doing this in a recent project and came to the conclusion that we'd spend 100 times more on our own solution and even then might not get it right. We opted for a company called Razuna, who offer both a hosted and an open source version of their platform. Their API is very straightforward and can be consumed inside your MVC app with potentially only a few days' effort (depending on your use case). The beauty of this approach is that the hosted elements are actually on the Nirvanix backbone and are served via their CDN, so win-win.
You can get the details at:
http://www.razuna.com
and can view the api docs at:
http://wiki.razuna.com/display/ecp/Developer+Guides
Good luck, and if you need any further real-life guidance on this, feel free to come back. Oh, and by the way, we were also able to ask for 'paid for' features to be added to the core offering at pretty much standard market day rates.

Use built-in local database or Isolated storage

Local database is now built into Windows Phone 7.5 Mango. I'm considering a scenario of storing a few unrelated collections with data. Using the local database is pretty straightforward, while using Isolated storage requires a bit more custom development. There are also some alternative solutions like FileDb, mentioned in a pre-mango discussion here https://stackoverflow.com/a/6954250/346995
What would be the best solution of local database/Isolated storage with regards to simplicity and performance? Would local database fit most scenarios?
Unless you are going to be storing relational data (and it sounds like you aren't) I would suggest using the IsoStore. It isn't really that difficult to use.
As for performance: reading from disk on the phone is not going to be fast. That said, any solution you use is going to be saved to disk in the end, so I don't think you will notice much of a difference whether you go with the DB or IsoStore.
Isolated Storage Overview
Isolated Storage Best Practices
31 Days of Mango: Isolated Storage
Using Isolated Storage in Windows Phone 7

Cloud-aware programming and help choosing a good framework

How can I write a cloud-aware application, e.g. an application that benefits from being deployed on the cloud? Is it the same as an application that runs on a VPS/dedicated server? If not, what are the differences? Are there any design changes? What steps do I need to take to migrate an application to be cloud-aware?
Also, I am about to implement a web application idea which would need features like security, performance, and caching, and more importantly be free. I have been comparing some frameworks and found that Django has the lowest RAM/CPU usage and works great in prefork+threaded mode, but I have also read that Django-based sites stop responding under a huge load of connections. Other frameworks that I have seen/know are Zend, CakePHP, Lithium/Cake3, CodeIgniter, Symfony, Ruby on Rails....
So I leave this to your opinion as well: suggest a good free framework based on my needs.
Finally thanks for reading the essay ;)
I feel a matrix moment coming on... "what is the cloud? The cloud is all around us, a prison for your program..." (what? the FAQ said bring your sense of humour...)
Ok so seriously, what is the cloud? It depends on the implementation but usual features include scalable computing resource and a charge per cpu-hour, storage area etc. So yes, it is a bit like developing on your VPS/a normal server.
As I understand it, Google App Engine allows you to consume as much as you want. The back-end resource management is done by Google and billed to you and you pay for what you use. I believe there's even a free threshold.
Amazon EC2 exposes an API that actually allows you to add virtual machine instances (someone correct me please if I'm wrong) having pre-configured them, deploy another instance of your web app, talk between private IP ranges if you wish (slicehost definitely allow this). As such, EC2 can allow you to act like a giant load balancer on the front-end passing work off to a whole number of VMs on the back end, or expose all that publicly, take your pick. I'm not sure on the exact detail because I didn't build the system but that's how I understand it.
I have a feeling (but I know least about Azure) that on Azure, resource management is done automatically, for you, by Microsoft, based on what your app uses.
So, in summary, the cloud is different things depending on which particular cloud you choose. EC2 seems to expose an API for managing resources, while GAE and Azure appear to be environments which grow and shrink in the background based on your use.
Note: I am aware there are certain constraints developing in GAE, particularly with Java. In a minute, I'll edit in another thread where someone made an excellent comment on one of my posts to this effect.
Edit as promised, see this thread: Cloud Agnostic Architecture?
As for a choice of framework, it really doesn't matter as far as I'm concerned. If you are planning on deploying to one of these platforms you might want to check framework/language availability. I personally have just started Django and love it, having learnt python a while ago, so, in my totally unbiased opinion, use Django. Other developers will probably recommend other things, based on their preferences. What do you know? What are you most comfortable with? What do you like the most? I'd go with that. I chose Django purely because I'm not such a big fan of PHP, I like Python and I was comfortable with the framework when I initially played around with it.
Edit: So how do you write cloud-aware code? You design your software in such a way it fits on one of these architectures. Again, see the cloud-agnostic thread for some really good discussion on ways of doing this. For example, you might talk to some services on GAE which scale. That they are on GAE (example) doesn't really matter, you use loose coupling ideas. In essence, this is just a step up from the web service idea.
Also, another feature of the cloud I forgot to mention is the idea of CDNs being provided for you: some cloud implementations might move your data around the globe to make it more efficient to serve, or just because that's where they've got space. If that's an issue, don't use the cloud.
I cannot answer your question, as I'm not experienced in such projects, but I can tell you one thing: both CakePHP and CodeIgniter are designed for PHP4, in other words for really old technology, and it seems nothing is going to change in their case. Symfony (especially version 2.0, which is still in heavy beta) is worth considering, but as I said at the very beginning, I cannot back this up with my own experience.
When designing applications for deployment to the cloud, the main thing to consider is recoverability. If your server is terminated, you may lose all of your data. If you're deploying on Amazon, I'd recommend putting all data that you need persisted onto an Elastic Block Store (EBS) volume. This would be data like user-generated content/files, the database files, and logs. I also take EBS snapshots on a 5-day rotation so that the volume itself is backed up. That said, I've had a cloud server up on AWS for over a year without any issues.
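The 5-day snapshot rotation mentioned above could be implemented along these lines. This is only a sketch of the pruning decision, under the assumption that "5 day rotation" means keeping the five most recent daily snapshots. It makes no AWS calls; `snapshots_to_delete` is a hypothetical helper, and the dates are made up for illustration.

```python
from datetime import date, timedelta


def snapshots_to_delete(snapshot_dates, keep=5):
    """Return the dates to delete so that only the `keep` newest remain."""
    newest_first = sorted(snapshot_dates, reverse=True)
    # Everything after the first `keep` entries is older than the window.
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    today = date(2024, 1, 10)  # arbitrary example date
    existing = [today - timedelta(days=n) for n in range(8)]
    for d in snapshots_to_delete(existing):
        print("would delete snapshot from", d.isoformat())
```

In a real deployment the list of dates would come from the provider's snapshot listing, and each returned date would map to a delete call; those parts are left out here on purpose.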
As for frameworks, I'm giving Grails a try at the minute and I'm quite enjoying it. It is built to be syntactically similar to Rails but runs on the JVM, which means you can take advantage of all the Java goodness, like threading, concurrency, and all the great libraries out there, to build your web application.

Third-Party Storage Solution for a Website Application?

I'm building a website that stores a lot of community-generated multimedia.
I originally wanted to store the data in my own SAN, but for scalability purposes I'm looking for a solution to store all the images outside my network and request them by HTTP request connectors.
I also want a solution that will let me grow in space easily, by clustering or in some other way.
I guess that there is a term for what I'm looking for, but I'm really not sure what it is.
I'm not sure if this question belongs to SO or ServerFault. I'm asking from the perspective of a web developer, but maybe it qualifies best for a networking question. My apologies if I'm wrong.
Best to all and wish you a happy new year.
Well, there are a lot of services offering just that, the most popular being Amazon's S3 (used, for example, at 37signals). Libraries exist for almost any language used in web development, so maybe that's something to get you started!
Here are a few services you could use:
Amazon S3 / Amazon CloudFront
Rackspace CloudFiles
OpSource Cloud Files
Box.net
The terms you are looking for are "cloud storage" and, optionally, "content delivery network", or CDN for short. For example, on Amazon, S3 is the cloud storage and CloudFront is the CDN.
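The split between cloud storage and a CDN can be illustrated with a small sketch: the file is stored once under an object key, and the same key is then addressed either through the storage origin or through the CDN host. Everything named here is an assumption for illustration: the two base URLs are placeholder `example.com` hosts, and the key layout (`users/<id>/<hash>/<name>`) is just one possible convention, not something any of the listed services requires.

```python
import hashlib
from urllib.parse import quote

STORAGE_BASE = "https://storage.example.com/my-bucket"  # hypothetical origin
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


def object_key(user_id, filename, data):
    """Build a storage key that changes whenever the file content changes."""
    digest = hashlib.sha256(data).hexdigest()[:16]
    # safe="" also escapes "/" so a filename cannot add path segments.
    return f"users/{user_id}/{digest}/{quote(filename, safe='')}"


def origin_url(key):
    """URL used by the application when uploading to or reading from storage."""
    return f"{STORAGE_BASE}/{key}"


def cdn_url(key):
    """URL handed to browsers, served from the CDN's edge caches."""
    return f"{CDN_BASE}/{key}"


if __name__ == "__main__":
    key = object_key(42, "my photo.jpg", b"...image bytes...")
    print(origin_url(key))
    print(cdn_url(key))
```

Including a content hash in the key is a design choice that makes cached copies safe to keep for a long time, since an edited file gets a new URL instead of overwriting the old one.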
