What is ActiveStorage's 'attach' doing behind the scenes when files are uploaded to a service such as AWS? - ruby-on-rails

I can't seem to find a detailed explanation of what process the attach method initiates when uploading a file (not a direct upload) to a configured service such as Amazon S3.
Does the file get uploaded to the application server temporarily first while a job is created to upload it to S3?
If so, is there a job to clear it out or is it removed once the job uploading it to S3 returns successfully?
What if that job fails? Does it just re-enqueue itself, and so on?
I think I understand the process for direct uploads and why a bit of manual error handling is needed, but I'd like to know exactly how ActiveStorage handles uploads through the application server, and I can't find solid documentation on this.
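For concreteness, this is the (non-direct) flow in question, as a hypothetical model and controller snippet rather than code from the docs:
class User < ApplicationRecord
  has_one_attached :avatar
end
# In a controller action: the file has already reached the Rails server as part
# of the request; `attach` is what hands it off to the configured service (e.g. S3).
user = User.create!
user.avatar.attach(params[:avatar]) # params[:avatar] is an ActionDispatch::Http::UploadedFile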

Related

Rails. Seahorse::Client::NetworkingError Amazon S3

This duplicates the thread Seahorse::Client::NetworkingError Amazon S3 file upload with rails,
but I don't understand what's actually going on there, so I'm looking for a better explanation here.
The issue: my app asks a user for a photo upload, then saves the photo to AWS S3 using the aws-sdk-rails gem. Everything works fine most of the time.
But yesterday I got an error notification:
Seahorse::Client::NetworkingError: Connection reset by peer
...
99 File "/app/app/controllers/customers_controller.rb" line 22 in create
...
Here it says it might be a network issue https://github.com/aws/aws-sdk-ruby/issues/1572
But I still can't figure out how to handle this kind of situation.
Should I use some sort of background worker for this? I have no experience with background workers (if that's even the right track).
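If a background worker does turn out to be the right track, a minimal sketch with Active Job might look like the following; the job name and the upload helper are hypothetical, only retry_on and the error class come from Rails and the AWS SDK:
class PhotoUploadJob < ApplicationJob
  queue_as :default
  # Retry transient network failures instead of letting the web request blow up.
  retry_on Seahorse::Client::NetworkingError, wait: :exponentially_longer, attempts: 5
  def perform(customer_id, tempfile_path)
    # `upload_photo_to_s3` is a hypothetical helper wrapping the aws-sdk client.
    upload_photo_to_s3(customer_id, tempfile_path)
  end
end
The controller would then enqueue PhotoUploadJob.perform_later(...) instead of uploading inline.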

How to see the speedup when using Cloudinary "direct upload" method?

I have a RoR web app that allows users to upload images and uses Cloudinary as cloud storage. I read their documentation and found a feature called "direct uploading" which reduces my server's load. To my knowledge, the idea is to change the workflow from
image -> server -> Cloudinary
to
image -> Cloudinary
so that my server only stores a Cloudinary URL in the database, not the image file (tell me if I'm wrong, thanks).
So my question is: how can I check whether I've switched to the "direct uploading" method successfully? Open the browser inspector to see the time cost of each POST and GET request? Are there better options?
I expect a big improvement from this approach, but how can I measure it?
Thanks from a rookie =)
# The app is deployed on Heroku.
# Hasn't switched to the direct uploading method yet.
# This app is private and only serves around 10 people.
You can indeed bypass your server and let Cloudinary handle the upload directly, and it is highly recommended. This reduces your server's work to simply storing the uploaded image's details, while the image itself goes straight into your Cloudinary account, which speeds up the upload process. You can try out the sample project, which demonstrates both server-side and client-side uploads.
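As a rough illustration (assuming the cloudinary gem; this is not code from the answer above), the two workflows differ like this, and you can confirm direct upload is active by watching the browser's network tab: the file POST should go to api.cloudinary.com rather than your Heroku app.
# 1) image -> server -> Cloudinary: the file hits your Rails server first,
#    then the server uploads it with the gem's API.
result = Cloudinary::Uploader.upload(params[:image].tempfile.path)
@photo.update(image_url: result["secure_url"]) # @photo is a hypothetical record
# 2) image -> Cloudinary (direct upload): the browser posts the file straight to
#    Cloudinary via the gem's form helpers / upload widget; your server only
#    receives and stores the returned identifier/URL, never the file bytes.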

How to speed up image uploading with CarrierWave and Rails 4

I am working with Rails 4 and CarrierWave, uploading images and files to S3, but it's taking a long time and is very slow. How can I handle this situation and speed up the server?
How can I handle this using background jobs, and handle requests from a lot of users?
Fetching images into my application is also very slow.
Can you suggest how to make the Rails server work fast while uploading files?
You might consider uploading directly from the client to S3 via Ajax. This would nearly completely take your server out of the mix.
Uploading Image to Amazon s3 with HTML, javascript & jQuery with Ajax Request (No PHP)
This is a well documented concept elsewhere online.
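On the Rails side, the direct-upload approach usually comes down to generating a presigned URL that the browser can upload to. A hedged sketch using aws-sdk-s3; the bucket and key names are placeholders:
require "aws-sdk-s3"
require "securerandom"
s3 = Aws::S3::Resource.new(region: "us-east-1")
obj = s3.bucket("my-uploads-bucket").object("uploads/#{SecureRandom.uuid}")
# The client-side Ajax code PUTs the file directly to this URL,
# so the bytes never pass through your Rails server.
url = obj.presigned_url(:put, expires_in: 15 * 60)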
Amazon S3 now has notifications for newly created objects.
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
You could drop the upload notifications into an Amazon SQS queue. You could then use a gem like Fog to create a background worker to pull events off the queue to create or update records in the database to reflect the newly completed upload.
https://github.com/fog/fog
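A minimal sketch of that worker loop, using aws-sdk-sqs rather than Fog for brevity; the queue URL and the Upload model are placeholders:
require "aws-sdk-sqs"
require "json"
sqs = Aws::SQS::Client.new(region: "us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/s3-upload-events"
loop do
  resp = sqs.receive_message(queue_url: queue_url, max_number_of_messages: 10, wait_time_seconds: 20)
  resp.messages.each do |msg|
    event = JSON.parse(msg.body)
    # S3 event notifications wrap individual events in a "Records" array.
    Array(event["Records"]).each do |record|
      key = record.dig("s3", "object", "key")
      Upload.find_or_create_by!(s3_key: key) # hypothetical model/column
    end
    sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
  end
end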
Regardless of the solution, if you're uploading big files, it's likely your local network's upload speed that is the bottleneck.

Monitor and navigate S3 bucket for new files added by users

I have a Rails app that catalogues recorded music products with metadata & wav files.
Previously, my users had the option to send me files via FTP, which I'd monitor with a cron task for new .complete files; I'd then pick up the associated .xml file and perform a metadata import and transfer the audio files to S3.
I regularly hit capacity limits on the old FTP server, so I decided to move the user 'dropbox' to S3, with an FTP gateway to allow users to send me their files. Now that it's on S3, and because S3 doesn't store objects in folders, I'm struggling to get my head around how to navigate the bucket, find the .complete files, and then perform my imports as usual.
Can anyone recommend how to 'scan' a bucket for new .complete files, read the filename, and pass it back to my app so that I can then pick up its xml, wav and jpg files?
The structure of the files in my bucket is like this. As you can see, there are two products here. I would need to find both and import their associated xml data and wavs/jpgs.
42093156-5060156655634/
42093156-5060156655634/5060156655634.complete
42093156-5060156655634/5060156655634.jpg
42093156-5060156655634/5060156655634.xml
42093156-5060156655634/5060156655634_1_01_wav.wav
42093156-5060156655634/5060156655634_1_02_wav.wav
42093156-5060156655634/5060156655634_1_03_wav.wav
42093156-5060156655634/5060156655634_1_04_wav.wav
42093156-5060156655634/5060156655634_1_05_wav.wav
42093156-5060156655634/5060156655634_1_06_wav.wav
42093156-5060156655634/5060156655634_1_07_wav.wav
42093156-5060156655634/5060156655634_1_08_wav.wav
42093156-5060156655634/5060156655634_1_09_wav.wav
42093156-5060156655634/5060156655634_1_10_wav.wav
42093156-5060156655634/5060156655634_1_11_wav.wav
42093163-5060243322593/
42093163-5060243322593/5060243322593.complete
42093163-5060243322593/5060243322593.jpg
42093163-5060243322593/5060243322593.xml
42093163-5060243322593/5060243322593_1_01_wav.wav
Though Amazon S3 does not formally have the concept of folders, you can actually simulate folders through the GET Bucket API, using the delimiter and prefix parameters. You'd get a result similar to what you see in the AWS Management Console interface.
Using this, you could list the top-level directories, and scan through them. After finding the names of the top-level directories, you could change the parameters and issue a new GET Bucket request, to list the "files" inside the "directory", and check for the existence of the .complete file as well as your .xml and other relevant files.
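A hedged sketch of that listing approach with aws-sdk-s3 (the bucket name is a placeholder; list_objects_v2 postdates the question, but the delimiter/prefix idea is the same with the original GET Bucket call):
require "aws-sdk-s3"
s3 = Aws::S3::Client.new(region: "us-east-1")
bucket = "my-dropbox-bucket"
# List the top-level "directories" by using "/" as the delimiter.
s3.list_objects_v2(bucket: bucket, delimiter: "/").common_prefixes.map(&:prefix).each do |prefix|
  keys = s3.list_objects_v2(bucket: bucket, prefix: prefix).contents.map(&:key)
  next unless keys.any? { |k| k.end_with?(".complete") }
  xml = keys.find { |k| k.end_with?(".xml") }
  # ...hand `prefix`, `xml`, and the wav/jpg keys to the existing import code.
end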
However, there might be a different approach to your problem: did you consider using SQS? You could make the process that receives the uploads post a message to a queue in SQS, say, completed-uploads, with the name of the folder of the upload that just completed. Another process would then consume the queue and process the finished uploads. No need to scan through the directories in S3.
Just note that, if you try the SQS approach, you need to be prepared for the possibility of being notified more than once about a finished upload: SQS guarantees that it will eventually deliver posted messages at least once, so you might receive duplicated messages. (You can identify a duplicated message by saving the id of each received message in, say, a consistent database and checking newly received messages against it.)
Also, remember that if you use the US Standard Region for S3, you don't have read-after-write consistency, only eventual consistency, which means the process receiving messages from SQS might try to GET the object from S3 and get nothing back -- just try again until it sees the object.
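A small sketch of the de-duplication idea above; ProcessedMessage is a hypothetical model with a unique index on message_id:
def already_processed?(message)
  ProcessedMessage.exists?(message_id: message.message_id)
end
def mark_processed!(message)
  ProcessedMessage.create!(message_id: message.message_id)
rescue ActiveRecord::RecordNotUnique
  # Another worker recorded it first, so this is a duplicate; safe to skip.
end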

Does Amazon S3's HTTP Uploads feature support web-hook style callbacks?

When uploading files to Amazon S3 using the browser http upload feature, I know I can specify a success_action_redirect field/value that will tell my browser where to go when the upload is done.
I'm wondering: is it possible to ask Amazon to make a web hook style POST request to my web server whenever a file gets uploaded?
Basically, I want a way of being notified whenever a client uploads a new file, so that my server can process the upload. I'd like to do this without relying on the client to make the request to my server to tell me the file has been uploaded (never trust the client, right?).
They just recently announced AWS Lambda, which lets you run code in response to events, with S3 uploads being one of the supported events.
Amazon can publish a notification to SNS or SQS when an object has been created in your specified S3 bucket.
http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
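That notification can also be wired up from Ruby; a hedged sketch with aws-sdk-s3, where the bucket name and topic ARN are placeholders:
require "aws-sdk-s3"
s3 = Aws::S3::Client.new(region: "us-east-1")
s3.put_bucket_notification_configuration(
  bucket: "my-upload-bucket",
  notification_configuration: {
    topic_configurations: [
      {
        topic_arn: "arn:aws:sns:us-east-1:123456789012:uploads-topic",
        events: ["s3:ObjectCreated:*"]
      }
    ]
  }
)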
There is no support from Amazon for this as yet, but we can get around it with other tools like s3cmd, which let us write cron jobs that notify us of any change to the keys on S3. So if a new key is created (detected via its timestamp), we could have the job send a GET request, with the associated metadata, to a server endpoint listening for updates from S3.
We could use GET or POST here since the data would be minimal; form data with a POST should do.
