How do I get Paperclip image uploads to work on a Rails app running on 8 machines (load-balanced)?
A user can upload an image on the app. The image is stored on one of the machines. The user later requests the image, but it's not found, because it's being requested from another machine.
What's the workaround for this type of problem? I can't use AWS or any cloud service; images have to be stored in-house.
Thanks.
One solution is to use NFS to mount a shared folder that will be the root of your public/system or whatever you called your folder containing paperclip images.
There's a few things to consider to make everything work though :
Use a dedicated server that will only contain assets, this way your hard drive(s) are dedicated to serve your paperclip images
NFS can be expensive. Use it to write files from your App servers to your Asset server only. You'll have to configure your load balancer or reverse proxy or web server to retrieve all images from the asset server directly, without asking an application server to do it over NFS.
a RAID system is recommended on your asset server of course
a second asset server is recommended, with the same specs. You can make it act as a backup server and regularly rsync your paperclip images to it. If the master asset server ever goes down, you'll be able to switch to this one.
When mounting the shared NFS folder, use the soft option, and mount via a high-speed local network connection, for example : mount -o soft 10.0.0.1:/export/shared_image_folder . If you're not specifying the soft option, and the asset server goes down, your Ruby instances will keep waiting for the server to go up. Everything will be stuck, and the website will look down. Learned this one the hard way ...
THese are general guidelines to use NFS. I'm using it on a quite big production website with hundreds of thousands of images and it works fine for me.
If you don't want to use a file share like NFS, you could store the images in your database. Here is a gem that provides a :database storage type for Paperclip:
https://github.com/softace/paperclip_database
Related
I am trying to create a dashboard using CSV files, Highcharts.js, and HTML5. In a local development environment I can render the charts using CSVs both on my file system and hosted on the web. The current goal is to deploy the dashboard live on Heroku.
The CSVs will be updated manually - for now - once per day in a consistent format as required by Highcharts. The web application should be able to render the charts with these new, "standardized" CSVs whenever the dashboard page is requested. My question is: where do I host these CSVs? Do I use S3? Do I keep them on my local file system and manually push the updates to heroku daily? If the CSVs are hosted on another machine, is there a way for my application (and only my application) to access them securely?
Thanks!
Use the gem carrierwave direct to upload the file directly from the client to an Amazon S3 bucket.
https://github.com/dwilkie/carrierwave_direct
You basically give the trusted logged in client a temporary key to upload the file, and nothing else, and then the client returns information about the uploaded file to your web app. Make sure you have set the upload to be private to prevent any third parties from trying to brut force find the CSV. You will then need to create a background worker to do the actually work on the CVS file. The gem has some good docs on how to do this.
https://github.com/dwilkie/carrierwave_direct#processing-and-referencing-files-in-a-background-process
In short in the background process you will download the file temporarily to heroku, parse it out, get the data you need and then discard the copy on heroku, and if you want the copy on S3. This way you get around the heroku issue of permanent file storage, and the issue of tied up dynos with direct uploads, because there is nothing like NGINX for file uploads on heroku.
Also make sure that the file size does not exceed the available memory of your worker dyno, otherwise you will crash. Sense you don't seem to need to worry about concurrency I would suggest https://github.com/resque/resque.
I have a Rails 3 application that needs to display images from another application. Those images change over time, so I have a task that runs hourly to check for changes. For performance reasons, I want to create a sprite with those images that I serve from my own application (so 1 image and 1 css file).
Ideally, I'd like Sprockets to handle these files in the same way it does any of my other images and stylesheets in my application so I don't have to roll my own minification, gzip, caching, etc solutions.
Is there a way to hook into Sprockets at runtime so that I don't have to stop my server, precompile, and start the server again?
I was unable to find any way to hook into sprockets.
For those who are curious, I solved this problem in the following way (but left out the minification/gzip piece, since the tiny performance boost doesn't justify the complexity):
Create the sprite files as usual.
Determine the md5 hash value for the image.
Copy the new files to the public folder using the md5 hash value as part of the file name.
Send the new files to all servers in the cluster.
Update a configuration variable on all servers to know which file to serve (e.g. MyApp::Application.config.tool_icon_md5_key).
Remove old files from all servers in the cluster.
The clustering is actually the most difficult piece. The key there is avoiding the case where one server requests the new file but the server that handles that request doesn't actually have that file yet.
I'm working on a Rails app that accepts file uploads and where users can modify these files later. For example, they can change the text file contents or perform basic manipulations on images such as resizing, cropping, rotating etc.
At the moment the files are stored on the same server where Apache is running with Passenger to serve all application requests.
I need to move user files to dedicated server to distribute the load on my setup. At the moment our users upload around 10GB of files in a week, which is not huge amount but eventually it adds up.
And so i'm going through a different options on how to implement the communication between application server(s) and a file server. I'd like to start out with a simple and fool-proof solution. If it scales well later across multiple file servers, i'd be more than happy.
Here are some different options i've been investigating:
Amazon S3. I find it a bit difficult to implement for my application. It adds complexity of "uploading" the uploaded file again (possibly multiple times later), please mind that users can modify files and images with my app. Other than that, it would be nice "set it and forget it" solution.
Some sort of simple RPC server that lives on file server and transparently manages files when looking from the application server side. I haven't been able to find any standard and well tested tools here yet so this is a bit more theorethical in my mind. However, the Bert and Ernie built and used in GitHub seem interesting but maybe too complex just to start out.
MogileFS also seems interesting. Haven't seen it in use (but that's my problem :).
So i'm looking for different (and possibly standards-based) approaches how file servers for web applications are implemented and how they have been working in the wild.
Use S3. It is inexpensive, a-la-carte, and if people start downloading their files, your server won't have to get stressed because your download pages can point directly to the S3 URL of the uploaded file.
"Pedro" has a nice sample application that works with S3 at github.com.
Clone the application ( git clone git://github.com/pedro/paperclip-on-heroku.git )
Make sure that you have the right_aws gem installed.
Put your Amazon S3 credentials (API & secret) into config/s3.yml
Install the Firefox S3 plugin (http://www.s3fox.net/)
Go into Firefox S3 plugin and put in your api & secret.
Use the S3 plugin to create a bucket with a unique name, perhaps 'your-paperclip-demo'.
Edit app/models/user.rb, and put your bucket name on the second last line (:bucket => 'your-paperclip-demo').
Fire up your server locally and upload some files to your local app. You'll see from the S3 plugin that the file was uploaded to Amazon S3 in your new bucket.
I'm usually terribly incompetent or unlucky at getting these kinds of things working, but with Pedro's little S3 upload application I was successful. Good luck.
you could also try and compile a version of Dropbox (they provide the source) and ln -s that to your public/system directory so paperclip saves to it. this way you can access the files remotely from any desktop as well... I haven't done this yet so i can't attest to how easy/hard/valuable it is but it's on my teux deux list... :)
I think S3 is your best bet. With a plugin like Paperclip it's really very easy to add to a Rails application, and not having to worry about scaling it will save on headaches.
I have multiple applications (an admin application, a "public"/non-admin application and a web service application) that all share a single database.
I've gotten the applications to share models and other code where appropriate, so I don't have multiple copies of the same code in each. However, the one task that I've yet to configure is how to share files that get uploaded between applications. I'm using Paperclip to successfully upload files to my applications, but if it uploads the files to the application doing the upload.
Ideally, I'd like to be able to serve all the files from the web service. My idea was that I'd need some type of task executed every time a new file is uploaded to any of the applications to have the file created in the file structure of the web service.
I know I could easily accomplish serving files from a single application if I loaded the files into the database (which is how I accomplished this in a similar application suite), but I'm not sure if that's the best route to go for managing/serving the files. Another idea I had was storing the files in the database and having the web service manage "serving" them and having it create the file on the disk on the first request. After the first request for the file, the web service would serve the file from the disk rather than from the database.
Does anyone have any ideas on what the best way to accomplish this might be? Or any better ideas?
Thank you in advance for any feedback anyone might have on the subject.
I'd recommend putting them in a shared location that is served directly by your front end webserver (not Rails) if you have that kind of setup, in this example it's serving up a location called files that points at a folder on disk. Then in your paperclip options, change the save location.
has_attached_file :image,
:url => "/files/:basename.:extension",
:path => "/var/htdocs/public/files/:basename.:extension"
Are you running all apps on the same UNIX/Linux system? Have you tried creating symbolic links to share the folder that contains the images? The goal is to save all images to the same location. Eliminating the need to throw in complicated hooks for attachment creation.
Paperclip by default stores things at :rails_root/public/system/:attachment/:id/:style/:filename If you're sharing a database you won't have to worry about collisions. And you just need to create a system folder to be used by each app.
You can use one app's public/system folder as the master, or create an entirely new one. From this point on all other system folders that aren't the master one will be referred to slave folders. Once you've chosen your master it's as simple as moving everything in each slave folder to the master folder. Deleting the slave folders and replacing them with a symbolic link to the master folder.
Sample command set to migrate and replace with symlink given the paperclip defaults. It's probably a good idea to stop the server before attempting this.
$ mv /path/to/slave/project/public/system/* /path/to/master/system
$ mv /path/to/slave/project/public/system.bak
$ ln -s /path/to/master/system /path/to/slave/project/public/system
Once you're sure the migration is sucessful you can remove the backup:
$ rm /path/to/slave/project/public/system.bak
I have a question about hosting large dynamically-generated assets and Heroku.
My app will offer bulk download of a subset of its underlying data, which will consist of a large file (>100 MB) generated once every 24 hours. If I were running on a server, I'd just write the file into the public directory.
But as I understand it, this is not possible with Heroku. The /tmp directory can be written to, but the guaranteed lifetime of files there seems to be defined in terms of one request-response cycle, not a background job.
I'd like to use S3 to host the download file. The S3 gem does support streaming uploads, but only for files that already exist on the local filesystem. It looks like the content size needs to be known up-front, which won't be possible in my case.
So this looks like a catch-22. I'm trying to avoid creating a gigantic string in memory when uploading to S3, but S3 only supports streaming uploads for files that already exist on the local filesystem.
Given a Rails app in which I can't write to the local filesystem, how do I serve a large file that's generated daily without creating a large string in memory?
${RAILS_ROOT}/tmp (not /tmp, it's in your app's directory) lasts for the duration of your process. If you're running a background DJ, the files in TMP will last for the duration of that process.
Actually, the files will last longer, the reason we say you can't guarantee availability is that tmp isn't shared across servers, and each job/process can run on a different server based on the cloud load. You also need to make sure you delete your files when you're done with them as part of the job.
-Another Heroku employee
Rich,
Have you tried writing the file to ./tmp then streaming the file to S3?
-Blake Mizerany (Heroku)