Stream uploading large files using aws-sdk - ruby-on-rails

Is there a way to stream upload large files to S3 using aws-sdk?
I can't seem to figure it out but I'm assuming there's a way.
Thanks

Update
My memory failed me and I didn't read the quote mentioned in my initial answer correctly (see below), as revealed by the API documentation for (S3Object, ObjectVersion) write(data, options = {}) :
Writes data to the object in S3. This method will attempt to
intelligently choose between uploading in one request and using
#multipart_upload.
[...] You can pass :data or :file as the first argument or as options. [emphasis mine]
The data parameter is the one to be used for streaming, apparently:
:data (Object) — The data to upload. Valid values include:
[...] Any object responding to read and eof?; the object must support the following access methods:
read # all at once
read(length) until eof? # in chunks
If you specify data this way, you must also include the
:content_length option.
[...]
:content_length (Integer) — If provided, this option must match the
total number of bytes written to S3 during the operation. This option
is required if :data is an IO-like object without a size method.
[emphasis mine]
The resulting sample fragment might look like so accordingly:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:data => File.open(file_name),
                                           :content_length => File.size(file_name))
puts "Uploading file #{file_name} to bucket #{bucket_name}."
Please note that I still haven't actually tested this, so beware ;)
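To make the "any object responding to read and eof?" requirement concrete, here is a minimal sketch of how such an object can be drained in chunks. The each_chunk helper and the StringIO sample are assumptions for illustration only; they are not part of aws-sdk, and the SDK's internals may well differ.

```ruby
require 'stringio'

# Hypothetical helper (not an aws-sdk method): drains any IO-like object
# using the documented access pattern, read(length) until eof?.
def each_chunk(io, chunk_size)
  return enum_for(:each_chunk, io, chunk_size) unless block_given?

  yield io.read(chunk_size) until io.eof?
end

# StringIO stands in for a File or any other stream here.
io = StringIO.new('a' * 10)
chunks = each_chunk(io, 4).to_a
total_bytes = chunks.sum(&:bytesize)
```

An object that only offers read and eof? has no size method, so a consumer cannot know the total up front, which is presumably why the documentation asks for :content_length in that case.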
Initial Answer
This is explained in Upload an Object Using the AWS SDK for Ruby:
Uploading Objects
Create an instance of the AWS::S3 class by providing your AWS credentials.
Use the AWS::S3::S3Object#write method which takes a data parameter and options hash which allow you to upload data from a file, or a stream. [emphasis mine]
The page contains a complete example as well, which uses a file rather than a stream though, the relevant fragment:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:file => file_name)
puts "Uploading file #{file_name} to bucket #{bucket_name}."
That should be easy to adjust to use a stream instead (if I recall correctly you might just need to replace the file_name parameter with open(file_name) - make sure to verify this though), e.g.:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:file => open(file_name))
puts "Uploading file #{file_name} to bucket #{bucket_name}."

I don't know how big the files you want to upload are, but for large files a 'pre-signed post' allows the user operating the browser to bypass your server and upload directly to S3. That may be what you need - to free up your server during an upload.

Related

When using activestorage in Rails 6, how do I retain a file when redisplaying a form WITHOUT uploading?

What I'd like to know is very simple:
How do I retain uploaded files on form resubmission in Rails 6 with ActiveStorage?
I have looked at this similar question:
When using activestorage in Rails 6, how do I retain a file when redisplaying a form?
The solution suggested there can be summarized like this:
Active Storage store attachments after the record is saved rather than
immediately. So, if you want to persist assigned file after validation
error, you must upload and save the file.
For example,
def update
  ...
  # `obj` is your model that uses `has_one_attached`.
  if obj.update
    redirect_to ...
  else
    obj.attachment_changes.each do |_, change|
      if change.is_a?(ActiveStorage::Attached::Changes::CreateOne)
        change.upload
        change.blob.save
      end
    end
    ...
  end
end
https://medium.com/@TETRA2000/active-storage-how-to-retain-uploaded-files-on-form-resubmission-91b57be78d53
or, use direct_upload:
= f.file_field :doc, direct_upload: true
= f.hidden_field :doc, value: f.object.doc.signed_id if f.object.doc.attached?
With these solutions I did manage to retain a file when redisplaying a form.
However, they go against the intention of this PR, which says:
It’s of little use to identify an invalid file after it’s already been
shipped off to storage:
you might use a size validation to limit the cost that a single file
can add to your AWS bill, but if the file is stored before
validations run, you incur its cost regardless.
So I don't want to upload the file just to persist it after a failed validation.
How can I retain the file without uploading it and saving the blob?
I cannot use CarrierWave.

How do I attach filenames with extensions in Rails 6 ActiveStorage?

I am using ActiveStorage in Rails 6. I understand the concepts of has_one_attached and has_many_attached.
I have a few questions:
Is it possible to upload a file to storage under its original filename and extension instead of the key?
How do I specify the storage path with has_many_attached? For example, I have 5 files that need to be stored under an object-specific folder,
e.g. /path/to/images/<image_id>/
The key is a secure token that points from your application's blob to the right file stored on the service (S3, etc.). Although you can't use something else in place of the key, it is entirely possible to store the original (or any other) filename/path when you attach. For example, given an instance of a class that has_many_attached :images:
message.images.attach(io: File.open('/wherever/thing1.png'),
                      filename: '/path/to/images/thing1.png',
                      content_type: 'image/png')
The filename and content_type are stored with the blob, and can be used when querying:
message.images.blobs.find_by(filename: '/path/to/images/thing1.png')
=> #<ActiveStorage::Blob id: 1, key: "...", filename: "/path/to/images/thing1.png" ...>
So, if you have five files under a specific folder you would simply open/upload them and specify the appropriate :filename when you attach.

How to upload and download image and video to s3 glacier using rails

I want to upload images and videos to S3 Glacier using Ruby on Rails.
I have created a vault in S3 Glacier and set all permissions.
I then created an archive inside the vault like this:
vault.archives.create(:body => File.open(video_path).to_s,
:description => 'my first archive')
After that I create an archive retrieval job like this:
vault.jobs.create(:type => Fog::AWS::Glacier::Job::ARCHIVE,
:archive_id =>"my archive id" )
And I fetch the job with:
vault.jobs.get("my job id")
It gives me a response like:
id="return my job id",
action="ArchiveRetrieval",
archive_id="return my archive id",
archive_size=24,
completed=true,
completed_at=2019-03-05 19:49:36 UTC,
created_at=2019-03-05 15:55:29 UTC,
inventory_size=0,
description=nil,
tree_hash="xxxxxx",
sns_topic=nil,
status_code="Succeeded",
status_message="Succeeded",
vault_arn="xxxxxxxxxx:vaults/myvalutname",
format=nil,
type=nil
My questions are:
Is the approach to uploading images/videos in the code above correct? If not, please suggest the right way to upload.
How do I get the URL of an uploaded image/video from S3 Glacier?
Where can I see the uploaded videos and images stored in Glacier? Right now it only shows the number of archives in my vault.
I would appreciate expert suggestions on this.
Please help me out.
There is a fundamental difference between S3 and Glacier. Glacier is designed for long-term storage such as backups: files you want to keep for a long time but rarely access. S3 is quick to access. Glacier therefore suits backups that you only retrieve in an emergency, and retrieving a file costs more than storing it. I am not sure why you want to store videos in Glacier, but be aware that retrieval from Glacier is slow; for images and videos you may want to consider S3 instead. Replies to your questions follow.
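Yes, the overall approach is right, and this is the way you have to do it. One caveat: File.open(video_path).to_s returns the File object's string representation (something like "#<File:0x...>"), not the file's contents, which is consistent with archive_size=24 in your output. Pass the file's actual contents as the body instead.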
This is a little tricky. You have to save the returned IDs in your database so that you can retrieve the files later. In other words, instead of relying on a directory listing, you keep all the information in the database and use it for retrieval.
id="return my job id", # save it to database
action="ArchiveRetrieval",
archive_id="return my archive id", # better save in relation table of archives
archive_size=24,
completed=true,
completed_at=2019-03-05 19:49:36 UTC,
created_at=2019-03-05 15:55:29 UTC,
inventory_size=0,
description=nil,
tree_hash="xxxxxx",
sns_topic=nil,
status_code="Succeeded",
status_message="Succeeded",
vault_arn="xxxxxxxxxx:vaults/myvalutname",
format=nil,
type=nil
So for every upload you will have the following fields in the database to find it again.
To retrieve files, look everything up in your database, then use the stored ID to get your file.
archive_id           | id               | description
return my archive id | return my job id | my first archive
When you want to list the files, just query the ActiveRecord records and link each one so that an action can retrieve the file using the stored ID.
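The bookkeeping described above could look roughly like this. It is only a sketch: ArchiveRegistry and ArchiveRecord are hypothetical in-memory stand-ins for a database table and its ActiveRecord model, and the ID string is a placeholder, not a real Glacier ID.

```ruby
# Hypothetical stand-in for an ActiveRecord model with the three columns above.
ArchiveRecord = Struct.new(:archive_id, :job_id, :description, keyword_init: true)

# Hypothetical in-memory replacement for the database table.
class ArchiveRegistry
  def initialize
    @records = {}
  end

  # Call this right after the archive is created, with the ID Glacier returned.
  def remember(archive_id:, description:, job_id: nil)
    @records[archive_id] = ArchiveRecord.new(archive_id: archive_id,
                                             job_id: job_id,
                                             description: description)
  end

  # The "directory listing" is simply whatever has been recorded.
  def all
    @records.values
  end

  def find_by_description(description)
    all.find { |record| record.description == description }
  end
end

registry = ArchiveRegistry.new
registry.remember(archive_id: 'archive-123', description: 'my first archive')
record = registry.find_by_description('my first archive')
```

In a real app the same fields would live in a table, and record.archive_id is what you would hand to the retrieval job.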

Active storage seed Rails

I want to seed my db with some instances that have Active Storage attachments, but I don't know how to do it. I tried a few approaches without success.
Here is my seed file.
User.create(email: "test@ok.com", password: "okokok") if User.count.zero?
50.times do |i|
  temp = Template.create(
    title: Faker::Name.name,
    description: Faker::Lorem.paragraph(2),
    user: User.first,
    github_link: Faker::SiliconValley.url,
    category: rand(0...4)
  )
  puts Template.first.photo
  temp.photo.attach(Template.first.photo)
end
Thx for your help
It has also been in the documentation guide for a couple of days:
http://edgeguides.rubyonrails.org/active_storage_overview.html#attaching-file-io-objects
Sometimes you need to attach a file that doesn’t arrive via an HTTP
request. For example, you may want to attach a file you generated on
disk or downloaded from a user-submitted URL. You may also want to
attach a fixture file in a model test. To do that, provide a Hash
containing at least an open IO object and a filename:
@message.image.attach(io: File.open('/path/to/file'), filename: 'file.pdf')
When possible, provide a content type as well. Active Storage attempts
to determine a file’s content type from its data. It falls back to the
content type you provide if it can’t do that.
@message.image.attach(io: File.open('/path/to/file'), filename: 'file.pdf', content_type: 'application/pdf')
If you don’t provide a content type and Active Storage can’t determine
the file’s content type automatically, it defaults to
application/octet-stream.
OK, I found a solution; I'm posting it for anyone in the same situation:
temp.photo.attach(
  io: File.open('storage/3n/GG/3nGGV5K5ucYZDYSYojV8mDcr'),
  filename: 'file.png'
)
If you have an easier solution, share it ;)

Including .xml file to rails and using it

So I have this currency .xml file:
http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
Now, I am wondering, how can I make my rails application read it? Where do I even have to put it and how do I include it?
I am basically making a currency exchange rate calculator.
And I am going to make the dropdown menu have the currency names from the .xml table appear in it and be usable.
First of all you're going to have to be able to read the file--I assume you want the very latest from that site, so you'll be making an HTTP request (otherwise, just store the file anywhere in your app and read it with File.read with a relative path). Here I use Net::HTTP, but you could use HTTParty or whatever you prefer.
It looks like it changes on a daily basis, so maybe you'll only want to make one HTTP request every day and cache the file somewhere along with a timestamp.
Let's say you have a directory in your application called rates where we store the cached xml files, the heart of the functionality could look like this (kind of clunky but I want the behaviour to be obvious):
require 'net/http'
def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today}.xml"
  xml_content = if File.exist? today_path # File.exists? was removed in Ruby 3.2
    # Read it from local storage
    File.read today_path
  else
    # Go get it and store it!
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end
  # Now convert that XML to a hash. Lots of ways to do this, but this is very simple xml.
  currency_list = Hash.from_xml(xml_content)["Envelope"]["Cube"]["Cube"]["Cube"]
  # Now currency_list is an Array of hashes e.g. [{"currency"=>"USD", "rate"=>"1.3784"}, ...]
  # Let's say you want a single hash like "USD" => "1.3784", you could do a conversion like this
  Hash[currency_list.map(&:values)]
end
The important part there is Hash.from_xml. Where you have XML that is essentially key/value pairs, this is your friend. For anything more complicated you will want to look for an XML library like Nokogiri. The ["Envelope"]["Cube"]["Cube"]["Cube"] is digging through the hash to get to the important part.
Now, you can see how sensitive this will be to any changes in the XML structure, and you should make the endpoint configurable, and that hash is probably small enough to cache up in memory, but this is the basic idea.
To get your list of currencies out of the hash just say get_rates.keys.
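Since the goal is an exchange rate calculator, here is a rough sketch of how the resulting hash could be used. It assumes the rates are quoted per one euro (as the eurofxref name suggests) and adds EUR itself at 1.0. The sample array is hand-written in the shape described above, the JPY figure is invented, and the convert helper is hypothetical.

```ruby
# Hand-written sample in the shape of currency_list; the JPY rate is invented.
currency_list = [
  { 'currency' => 'USD', 'rate' => '1.3784' },
  { 'currency' => 'JPY', 'rate' => '134.61' }
]

# Key by name instead of relying on the order of each hash's values,
# and turn the rate strings into floats so they can be used in arithmetic.
rates = currency_list.to_h { |row| [row['currency'], Float(row['rate'])] }
rates['EUR'] = 1.0 # assumption: every rate is quoted per one euro

# Hypothetical helper: converts between any two currencies via the euro.
def convert(amount, from, to, rates)
  amount / rates.fetch(from) * rates.fetch(to)
end

usd = convert(100, 'EUR', 'USD', rates)
eur = convert(137.84, 'USD', 'EUR', rates)
currencies = rates.keys.sort # what the dropdown would list
```

Using fetch rather than [] means an unknown currency code raises a KeyError instead of silently producing nil.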
As long as you understand what's going on, you can make that smaller:
def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today}.xml"
  Hash[Hash.from_xml(if File.exist? today_path
    File.read today_path
  else
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end)["Envelope"]["Cube"]["Cube"]["Cube"].map(&:values)]
end
If you do choose to cache the xml you will probably want to automatically clear out old versions of the cached XML file, too. If you want to cache other conversion lists consider a naming scheme derived automatically from the URI, e.g. eurofxref-daily-2013-10-28.xml.
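Clearing out old cache files could be sketched like this. The prune_old_rates helper, the 7-day default and the temporary demo directory are all assumptions for illustration; it expects files named YYYY-MM-DD.xml, as in the examples above.

```ruby
require 'date'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: deletes cached files named YYYY-MM-DD.xml that are
# at least keep_days old and returns the names of the files it removed.
def prune_old_rates(dir, today: Date.today, keep_days: 7)
  removed = []
  Dir.glob(File.join(dir.to_s, '*.xml')).each do |path|
    begin
      cached_on = Date.iso8601(File.basename(path, '.xml'))
    rescue ArgumentError
      next # not one of the dated cache files; leave it alone
    end
    next if cached_on > today - keep_days

    File.delete(path)
    removed << File.basename(path)
  end
  removed.sort
end

# Demo in a throwaway directory instead of the app's rates directory.
demo_dir = Dir.mktmpdir
%w[2013-10-01 2013-10-27 2013-10-28].each do |day|
  File.write(File.join(demo_dir, "#{day}.xml"), '<xml/>')
end
removed = prune_old_rates(demo_dir, today: Date.new(2013, 10, 28))
remaining = Dir.children(demo_dir).sort
FileUtils.remove_entry(demo_dir)
```

In the app you would call it with Rails.root.join('rates'), for example right after writing today's file.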
Edit: let's say you want to cache the converted xml in memory--why not!
module CurrencyRetrieval
  def get_rates
    if defined?(@@rates_retrieved) && (@@rates_retrieved == Date.today)
      @@rates
    else
      @@rates_retrieved = Date.today
      @@rates = Hash[Hash.from_xml(Net::HTTP.get(URI('http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml')))["Envelope"]["Cube"]["Cube"]["Cube"].map(&:values)]
    end
  end
end
Now just include CurrencyRetrieval wherever you need it and you're golden. @@rates and @@rates_retrieved are class variables resolved lexically, so they live on the CurrencyRetrieval module itself and are shared by every class that includes it. You must test that this persists between calls in your production setup (otherwise fall back to the file-based approach or store those values elsewhere).
Note, if the XML structure changes, or the XML is unavailable today, you'll want to invalidate @@rates and handle exceptions in some nice way...better safe than sorry.
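One way to handle that invalidation and error case is sketched below. RatesCache is a hypothetical class, not part of Rails, and the fetch step is passed in as a block so the example does not touch the network; in the app that block would do the Net::HTTP and Hash.from_xml work.

```ruby
require 'date'

# Hypothetical cache: refetches once per day, and falls back to the last
# good rates if a refresh fails.
class RatesCache
  def initialize(&fetcher)
    @fetcher = fetcher
    @rates = nil
    @retrieved_on = nil
  end

  def rates(today = Date.today)
    return @rates if @rates && @retrieved_on == today

    fresh = @fetcher.call
    # Only record the date once the fetch has succeeded.
    @rates = fresh
    @retrieved_on = today
    @rates
  rescue StandardError
    raise if @rates.nil? # nothing cached yet, so let the caller see the error

    @rates # stale, but better than nothing
  end
end

# Demo with a fake fetcher whose second call fails.
calls = 0
cache = RatesCache.new do
  calls += 1
  raise IOError, 'feed unavailable' if calls == 2

  { 'USD' => '1.3784' }
end
day_one = cache.rates(Date.new(2013, 10, 28))
same_day = cache.rates(Date.new(2013, 10, 28)) # served from memory
day_two = cache.rates(Date.new(2013, 10, 29))  # refresh fails, stale rates returned
```

Because the date is only stored after a successful fetch, a failed refresh is retried on the next call rather than being remembered as today's data.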
