fb_graph upload photo from base64 encoded string - ruby-on-rails

I'm receiving attachments from postmarkapp (described here: http://developer.postmarkapp.com/developer-inbound-parse.html#attachments).
I want to upload those photos to facebook using fb_graph (https://github.com/nov/fb_graph) using its photo! method (https://github.com/nov/fb_graph/wiki/Photo-and-Album).
This is easy and works fine when testing by specifying a :source like in the examples from an actual file.
However I'm trying to not write out to a file but instead just convert the base64 encoded string to StringIO and pass that as the :source argument. This doesn't work and I get this error:
ruby
FbGraph::InvalidRequest: OAuthException :: (#324) Requires upload file
The reason I don't want to write out a file is because I'm using heroku and delayed_job so I'm not sure if a file I write out will still be around when the job is processed. That would be nice however since my current plan is to store the images in the db with delayed job.
Thanks.

I couldn't find a way to make this work with heroku without first uploading it to mongohq with gridfs. You can't use the cedar ephemeral file system because those files, written during a controller action, won't be visible to your delayed_job worker.
So even though it sucks I do this;
Receive upload from postmarkapp (blocks one dyno)
Write to mongohq using grid fs (this involves one upload which blocks one dyno)
Queue job using delayed_job
Read back from mongohq (blocks one worker while downloading)
Re-upload to FB when posting
So why not just post directly to facebook if I'm going to incur that initial blocking cost of uploading to mongohq anyway? Because that upload is way faster than uploading to FB for reasons unkonwn.
The right answer on heroku is to have a node.js dyno handling these callbacks from postmark so the dyno isn't blocked during either the read from postmark or the write to mongohq (or facebook) then to do some extra work to have the node app interact with the rails app to stay synced.

Related

How to speed up image uploading carrierwave and Rails4

I am working with Rails4 and carrierwave, uploading the images and files to S3. But it's taking much time and very slow. How to handle this situation to speed up the server speed!!!
How to handle this using Background Jobs and Handle request from lot of users.
Also getting images is very slow into my application!!!
Can you suggest me how to achieve Rails severe works fast while uploading files?
You might consider uploading directly from the client to S3 via Ajax. This would nearly completely take your server out of the mix.
Uploading Image to Amazon s3 with HTML, javascript & jQuery with Ajax Request (No PHP)
This is a well documented concept elsewhere online.
Amazon S3 now has notifications for newly created objects.
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
You could drop the upload notifications into an Amazon SQS queue. You could then use a gem like Fog to create a background worker to pull events off the queue to create or update records in the database to reflect the newly completed upload.
https://github.com/fog/fog
Regardless of the solution, if you're uploading big files, it's likely your local network's upload speed that is the bottleneck.

Rails/Heroku - How to create a background job for process that requires file upload

I run my Rails app on Heroku. I have an admin dashboard that allows for creating new objects in bulk through a custom CSV uploader. Ultimately I'll be uploading CSVs with 10k-35k rows. The parser works perfectly on my dev environment and 20k+ entries are successfully created through uploading the CSV. On Heroku, however, I run into H12 errors (request timeout). This obviously makes sense since the files are so large and so many objects are being created. To get around this I tried some simple solutions, amping up the dyno power on Heroku and reducing the CSV file to 2500 rows. Neither of these did the trick.
I tried to use my delayed_job implementation in combination with adding a worker dyno to my procfile to .delay the file upload and process so that the web request wouldn't timeout waiting for the file to process. This fails, though, because this background process relies on a CSV upload which is held in memory at the time of the web request so the background job doesn't have the file when it executes.
It seems like what I might need to do is:
Execute the upload of the CSV to S3 as a background process
Schedule the processing of the CSV file as a background job
Make sure the CSV parser knows how to find the file on S3
Parse and finish
This solution isn't 100% ideal as the admin user who uploads the file will essentially get an "ok, you sent the instructions" confirmation without good visibility into whether or not the process is executing properly. But I can handle that and fix later if it gets the job done.
tl;dr question
Assuming the above-mentioned solution is the right/recommended approach, how can I structure this properly? I am mostly unclear on how to schedule/create a delayed_job entry that knows where to find a CSV file uploaded to S3 via Carrierwave. Any and all help much appreciated.
Please request any code that's helpful.
I've primarily used sidekiq to queue asynchronous processes on heroku.
This link is also a great resource to help you get started with implementing sidekiq with heroku.
You can put the files that need to be processed in a specific S3 bucket and eliminate the need for passing file names to background job.
Background job can fetch files from the specific s3 bucket and start processing.
To provide real time update to the user, you can do the following:
use memcached to maintain the status. Background job should keep updating the status information. If you are not familiar with caching, you can use a db table.
include javascript/jquery in the user response. This script should make ajax requests to get the status information and provide updates to user online. But if it is a big file, user may not want to wait for the completion of the job in which case it is better provide a query interface for checking job status.
background job should delete/move the file from the bucket on completion.
In our app, we let users import data for multiple models and developed a generic design. We maintain the status information in db since we perform some analytics on it. If you are interested, here is a blog article http://koradainc.com/blog/ that describes our design. The design does not describe background process or S3 but combined with above steps should give you full solution.

How to post images directly to s3 on a heroku app from a json request?

I have a rails app hosted on heroku and a mobile app made with rhodes.
I'd like to send images from the mobile app to my rails app using an HTTP POST request. Since heroku doesn't allow you to store files, I'm using amazon s3.
I can't send the file from heroku to s3 because it takes more than 30 seconds and causes a timeout. I've seen plenty of examples of uploading a file direct to s3 when the user has a form, but this obviously won't work in this case.
I tried using the suggestion here:
rails 3, heroku, aws-s3, simply trying to upload a file to S3 that is POSTed (http/multipart) to our app
but I still get a 503 request timeout.
I don't want to put my amazon s3 keys on the app.
Right now, I feel like my only option is to host my app on EC2 which I would rather not do as I like the simplicity of Heroku.
Also, it seems strange that these uploads would take so long regardless. I'm only posting images from a mobile phone camera, so they're not huge files.
I was getting the same error in a project in my job. Some people says that the only way to solve this is by uploading files directly to the S3 bucket. This is difficult in our case, because we are using Paperclip Gem for Rails and different size versions of the image.
Some other people says that "The Heroku timeout is a set in stone thing that you need to work around. Direct upload to S3 is the only option, with some sort of post-upload processing required", so I recomend to do the next:
Maybe this is not a solution but, it could be very useful, it was for me in a Rails App:
Worker Dynos, Background Jobs and Queueing
Perhaps you should move this heavy lifting into a background job which can run asynchronously from your web request.
Regards!
So I finally figured out how to do this.
After lots of back and forth with AWS reps and Cloudfiles reps and pulling my hair out, I realized it would be a lot less work to just get another rails server that could write to the filesystem.
So, I started another rails app on openshift. It's just as easy as Heroku to get started (in fact, I might consider moving my rails app there, but it's too new for my taste right now and doesn't have the community around it that Heroku does).
Then, I just had to have communications between my two rails apps.
I know it's not the best/scalable/elegant fix, but it got the job done, and that's what matters in the end!

Rails and Node in the same app on Heroku?

I'm building a Rails application that deals with file uploads through CarrierWave. Currently, larger file uploads block the server for a significant amount of time. I have seen solutions like the s3-swf-upload-plugin gem that skip the local server and send files straight from the browser to S3, but this would require some modifications for pre-generating unique filenames and synchronizing them with the database. I'm sure it wouldn't be too much trouble, but Heroku's new Cedar stack gave me the idea of offloading these long running requests to a node.js instance running in the same app. I'm not very experienced with these kinds of things, so excuse my wording if it's a bit off.
Would something like this be possible? How would you configure things such that certain requests (ones involving file uploads, in this case) would be handled by a node app bundled in the same heroku repository as the main rails app?
I don't think it's possible to mix Rails and Node in the same app. However, you could get roughly the same functionality by using two separate apps that communicate with each other.
You can use ENV['DATABASE_URL'] to determine your database connection string. Use the heroku console to set it as an ENV variable for your Node app (e.g. heroku config:add OTHER_DB=your_connection_string) should then be able to use the same connection string to connect to the same database from your other heroku app. You could even access it outside of heroku if you have a dedicated database, see: http://devcenter.heroku.com/articles/external-database-access
For seamless integration between the two apps, you could have a form rendered by the Rails app post to a URL of the Node app. In addition to the file upload, include in that form via hidden input fields any other variables you need to communicate to the Node app. When the upload to the Node app is done, it could redirect the client back to the Rails app, passing any status or variables as get parameters.
Run the two apps under two subdomains of the same domain and you could even share cookies between them.
You need two apps. I am doing exactly what's described in this question. I wanted large streaming uploads, and since Rack writes downloads to a temp file before passing them through to the handler, it is not possible to do this with Rails.
Node.js, on the other hand, does this beautifully. So there are two Heroku apps, the Rails web app and the Node.js (Express) web app. The Rails web app uses SWFUpload as the client-side solution. The Rails app and the Node.js app both have a secret key as a Heroku config variable. When it's time for the user to upload, client-side Javascript requests an upload URL from the Rails server. The Rails server forms an upload URL with an Expires parameter and computes a signature using the secret key. The client-side Javascript handler passes this URL along to SWFUpload (upload_url property). The user selects the files to upload, and SWFUpload starts posting them to the upload_url. The Node.js app verifies that the URL is not expired and that the signature is valid. It processes the form data with the formidable library.
One other detail. Flash requires the Node.js app to serve a crossdomain.xml that permits the cross-site request.
My Node.js app doesn't touch the database; but if it did I would share DATABASE_URL as previously suggested. Note that you can't share a DATABASE_URL outside of Heroku unless you have a dedicated DB. The DATABASE_URLs for shared databases are not reachable from outside Heroku (unlike some other services like RedisToGo).

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has many associations. For example, if a user had many songs/mp3s, where would you store those and how would you access them.
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file-uploads larger than 4MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using a sort of async web setup like you could achieve with Node.JS or Erlang (just 2 examples), then every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
Edit: fixing typos

Resources