Rails synchronise with S3

Does anyone know of a way to synchronise an S3 bucket with rails?
Basically, what I would like is a tool that recognises when a file on S3 is added, renamed, or moved (modified) and can relay data about the changes to my web application, so that I can update my database with the new changes.
If not a tool to do this directly, what would be the best thing to use to interface with S3?

S3 being an object store, operations such as rename and modify are not directly possible. For example, a rename is a combination of
copy A to B + delete A (where A is the old name and B is the new name)
Modify is similar: it amounts to
put A over the existing A (where A is both the old and new name; however, S3 can preserve older versions of A if versioning is enabled)
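To make that concrete, here is a minimal sketch of such a "rename" with the aws-sdk-s3 gem (bucket and key names are hypothetical):
require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: "us-east-1")

# "Rename" A to B: server-side copy to the new key, then delete the original.
s3.copy_object(bucket: "my-bucket", copy_source: "my-bucket/A", key: "B")
s3.delete_object(bucket: "my-bucket", key: "A")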
You can enable access logging on your S3 bucket, then download the logs periodically, parse them for the items you are interested in, and update your local metadata.
See Documentation: S3 Access Logging
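A rough sketch of that log-parsing approach with the aws-sdk-s3 gem, assuming a hypothetical my-app-logs bucket that access logging delivers into under the logs/ prefix:
require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: "us-east-1")

# Fetch each delivered log object and scan it for write operations.
s3.list_objects_v2(bucket: "my-app-logs", prefix: "logs/").contents.each do |log|
  s3.get_object(bucket: "my-app-logs", key: log.key).body.read.each_line do |line|
    # Access log records are space-delimited; the operation field looks
    # like REST.PUT.OBJECT, REST.COPY.OBJECT, or REST.DELETE.OBJECT.
    next unless line =~ /REST\.(PUT|COPY|DELETE)\.OBJECT/
    # Relay the change to the app here, e.g. enqueue a job that updates
    # the database row for the affected key.
  end
end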

Related

How to add an Amazon S3 data source via REST API?

I have CSV files in a directory of an S3 bucket. I would like to use all of the files as a single table in Dremio, I think this is possible as long as each file has the same header/columns as the others.
Do I need to first add an Amazon S3 data source using the UI, or can I somehow add one as a Source using the Catalog API? (I'd prefer the latter.) The REST API documentation doesn't provide a clear example of how to do this (or I just didn't get it), and I have been unable to find the "New Amazon S3 Source" configuration screen shown in the documentation, perhaps because I've not logged in as an administrator.
For example, let's say I have a dataset split over two CSV files in an S3 bucket named examplebucket within a directory named datadir:
s3://examplebucket/datadir/part_0.csv
s3://examplebucket/datadir/part_1.csv
Do I somehow set the S3 bucket/path s3://examplebucket/datadir as a data source and then promote each of the files contained therein (part_0.csv and part_1.csv) as a Dataset? Is that sufficient to allow all the files to be used as a single table?
It turns out that this is only possible for admin users; normal users can't add a source. To do what I proposed above, put the files into an S3 bucket that has already been configured as a Dremio source by an admin user, then promote the files or folder using the Dremio Catalog API, as sketched below.
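A sketch of that flow in Ruby, assuming Dremio's documented Catalog API (the host, credentials, and source path below are placeholders; verify the endpoint shapes against your Dremio version):
require "net/http"
require "json"
require "uri"

Net::HTTP.start("dremio.example.com", 9047) do |http|
  # Authenticate and build the token header Dremio expects.
  login = http.post("/apiv2/login",
                    { userName: "admin", password: "secret" }.to_json,
                    "Content-Type" => "application/json")
  headers = { "Authorization" => "_dremio#{JSON.parse(login.body)['token']}",
              "Content-Type"  => "application/json" }

  # Look up the folder's catalog entry by path.
  entry = JSON.parse(
    http.get("/api/v3/catalog/by-path/examplebucket/datadir", headers).body)

  # Promote the folder to a physical dataset so every CSV inside it
  # is read as one table.
  promotion = { entityType: "dataset",
                id:         entry["id"],
                path:       entry["path"],
                type:       "PHYSICAL_DATASET",
                format:     { type: "Text", fieldDelimiter: ",",
                              extractHeader: true } }
  http.post("/api/v3/catalog/#{URI.encode_www_form_component(entry['id'])}",
            promotion.to_json, headers)
end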

Should I delete uploaded files from the file system on my own?

I have a Rails app where the user can upload files. The files get uploaded to an external cloud service by a background job. It's vital for my app that the files don't get stored in the file system after they've been uploaded -- not necessarily right away, but in general they must not remain in the file system.
Should I delete them on my own? Or will they get deleted automatically?
Also, while debugging my app, I noticed this for an attachment's params:
[2] pry(#<MyController>)> my_params.tempfile.path
"/var/folders/qr/0v5z71xn7x503ykyv1j6lkp00000gn/T/RackMultipart20181007-10937-3ntmgg.png"
That file gets stored not in "/tmp" but in "/var", and that means it won't get deleted automatically, right?
Note that I'm not using paperclip for this task.
You are right: the files won't get deleted automatically.
You have to delete them explicitly at some point.
It depends on how you set it up. If the file was saved via Tempfile, then yes, it will be deleted when the object is garbage collected; if not, it probably won't be. (Note that /var/folders/... is macOS's temporary directory, so the RackMultipart file in your example is in fact a Tempfile.)
If the files end up on an external service anyway, it might be worth setting up ActiveStorage, which lets clients upload directly to external storage providers without the file ever touching your server.
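A minimal sketch of the explicit cleanup, assuming a hypothetical UploadJob that is handed the tempfile's path and a hypothetical CloudService client:
class UploadJob < ApplicationJob
  queue_as :default

  def perform(path)
    # Hypothetical upload call -- replace with your cloud service's API.
    File.open(path, "rb") { |f| CloudService.upload(f) }
  ensure
    # Remove the local copy whether or not the upload succeeded.
    File.delete(path) if File.exist?(path)
  end
end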

Move Images From Parse To S3 AWS

I need help moving the images I have from Parse to S3 on AWS. I have viewed numerous supposed guides and GitHub projects, but everything stops short of giving you all the information. One even says you need a GCS bucket set up, but gives no details on how to set one up. Someone please help me with this. I have the S3 file adapter all set up in my index.js for the app, but none of the images are there; they are still hosted on Parse.
If you are referring to old images that were hosted with parse.com and that you want to move across to your own environment, then it can be done with this utility tool:
Get all files across all classes in a Parse database. Print file URLs
to console OR transfer to S3, GCS, or filesystem. Rename files so that
Parse Server no longer detects that they are hosted by Parse. Update
MongoDB with new file names.
https://github.com/parse-server-modules/parse-files-utils
Moving forward, if you have set up your S3 bucket correctly, all new images from your app will be stored there.
https://github.com/ParsePlatform/parse-server/wiki/Configuring-File-Adapters

How to reference and update a file on S3 from Rails 4

I have a Rails 4 application that needs to use a number of Excel files representing rosters (20 or so, grouped by committee) that have to be read in and editable by the user. Pre-deploy, I had the system working perfectly: the files lived in public/rosters and could be referenced and edited by any authenticated user. Unfortunately, when I deployed to Heroku I could no longer do this.
I have been using an S3 bucket to host the other files necessary for this and related apps, and it's been working wonderfully for what I've been using it for, so I decided to try it as a solution to this problem. Unfortunately, it appears I can only access the files the way I had been by making them publicly accessible, which is not something I want to do.
So my question is this: what would be the best way to reference these files (ideally using my access_key_id and secret_access_key to authenticate) and allow a user to push changes that will overwrite the file on the S3 bucket?
You can use aws-sdk-ruby to write files to S3; it authenticates using your access_key_id and secret_access_key. Check this documentation. Hope this helps!
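A minimal sketch with the aws-sdk-s3 gem, assuming a hypothetical my-rosters bucket (the objects stay private; only your credentials can read or write them):
require "aws-sdk-s3"

s3 = Aws::S3::Resource.new(
  region:            "us-east-1",
  access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"])

obj = s3.bucket("my-rosters").object("rosters/committee_a.xlsx")

# Download the private spreadsheet for the user to work with.
data = obj.get.body.read

# ... apply the user's edits to data ...

# Push the updated contents back, overwriting the file on S3.
obj.put(body: data)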

How do I copy files between buckets using s3 from a rails application?

I am currently developing a Rails application that tries to copy/move videos from one bucket to another in S3. However, I keep getting a 502 proxy error in my Rails application, and the Mongrel log says "failed to allocate memory." Once this error occurs, the application dies and we must restart it.
Seems like your code is reading the entire resource into memory, and that runs your application out of memory. A naïve way to do this (and from your description, you're doing something like this already) would be to download the file and then upload it again; at the very least, download it to a local file rather than into memory. However, Amazon's engineers have thought ahead and provide APIs that can deal with this specific case as well.
If you're using something like the RightAWS gem, you can use its S3Interface like so:
# With s3 being an S3 object acquired via S3Interface.new
# Copies key1 from bucket b1 to key1_copy in bucket b2:
s3.copy('b1', 'key1', 'b2', 'key1_copy')
And if you're using the naked S3 HTTP interface, see Amazon's object copy docs for a solution that uses only HTTP to copy one object from one bucket to another.
Try to stream files instead of loading the whole file into memory and then working with it.
For example, if you're using the aws-s3 gem, do not use:
data = open(file)
S3Object.store file_name, data, BUCKET
Use the following instead:
S3Object.store file_name, open(file), BUCKET
I'm not sure how exactly to "stream-download" the file, though.
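If memory serves, the same aws-s3 gem also exposes a streaming download via S3Object.stream; a sketch under that assumption (local_path is a placeholder, and the exact signature is worth checking against the gem's README):
# Stream the object down in chunks so the whole file never sits in memory.
File.open(local_path, "wb") do |local_file|
  S3Object.stream(file_name, BUCKET) do |chunk|
    local_file.write(chunk)
  end
end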
boto works well; see this thread. Using boto, you copy the objects straight from one bucket to another rather than downloading them to the local machine and then uploading them to another bucket.
You can also copy directly from bucket to bucket using the fog gem:
s3 = Fog::Storage.new(your_aws_credentials)
s3.copy_object('source-bucket', 'source/path', 'dest-bucket', 'dest/path')
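For completeness, the same server-side copy with the current aws-sdk-s3 gem (bucket names are placeholders):
require "aws-sdk-s3"

client = Aws::S3::Client.new(region: "us-east-1")

# S3 performs the copy internally; nothing is downloaded to your machine.
client.copy_object(bucket:      "dest-bucket",
                   copy_source: "source-bucket/source/path",
                   key:         "dest/path")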
