I'm having some problems reading a file from S3. I want to be able to load the ID3 tags remotely, but using open-URI doesn't work; it gives me the following error:
ruby-1.8.7-p302 > c=TagLib2::File.new(open(URI.parse("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514")))
TypeError: can't convert Tempfile into String
from (irb):8:in `initialize'
from (irb):8:in `new'
from (irb):8
However, if I download the same file and put it on my desktop (i.e. no need for open-URI), it works just fine.
c=TagLib2::File.new("/Users/momofwombie/Desktop/blah.mp3")
Is there something else I should be doing to read a remote file?
UPDATE: I just found this link, which may explain a little bit, but surely there must be some way to do this...
Read header data from files on remote server
You might want to check out AWS::S3, a Ruby library for Amazon's Simple Storage Service.
Do an AWS::S3::S3Object.find for the file and then use about to retrieve the metadata.
This solution assumes you have the AWS credentials and permission to access the S3 bucket that contains the files in question.
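A minimal sketch of that approach, assuming the classic aws-s3 gem, credentials in environment variables, and the bucket/key taken from the question's URL:
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => ENV['AMAZON_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
)

# find returns an S3Object; about returns its HTTP/S3 metadata headers
# (content-type, content-length, etc.), not the ID3 tags themselves.
song = AWS::S3::S3Object.find('music/745/original/The Stranger.mp3', 'recordtemple.com')
puts song.about['content-length']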
TagLib2::File.new doesn't take a file handle, which is what you are passing to it when you use open without a read.
Add a read call and you'll get the contents of the URL, but TagLib2::File doesn't know what to do with that either, so you are forced to read the contents of the URL and save them to a file.
I also noticed you are unnecessarily complicating your use of OpenURI. You don't have to parse the URL using URI before passing it to open. Just pass the URL string.
require 'open-uri'

# Build a unique scratch filename from the script name and the process ID.
fname = File.basename($0) << '.' << $$.to_s

# Download the MP3 and write it to disk in binary mode.
File.open(fname, 'wb') do |fo|
  fo.print open("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514").read
end

c = TagLib2::File.new(fname)
# do more processing...
File.delete(fname)
I don't have TagLib2 installed but I ran the rest of the code and the mp3 file downloaded to my disk and is playable. The File.delete would clean up afterwards, which should put you in the state you want to be in.
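For what it's worth, here is a variant sketch using Tempfile, so the cleanup happens even if TagLib2 raises (the ['prefix', '.suffix'] form of Tempfile.new needs Ruby 1.9+):
require 'open-uri'
require 'tempfile'

tmp = Tempfile.new(['stranger', '.mp3'])
begin
  tmp.binmode
  tmp.write open("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514").read
  tmp.flush
  c = TagLib2::File.new(tmp.path)
  # do more processing...
ensure
  tmp.close! # closes and deletes the temp file
end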
This solution isn't going to work much longer. Paperclip > 3.0.0 has removed to_file. I'm using S3 & Heroku. What I ended up doing was copying the file to a temporary location and parsing it from there. Here is my code:
dest = Tempfile.new(upload.spreadsheet_file_name)
dest.binmode
# Paperclip >= 3.0: copy the S3 attachment down to a local temp file.
upload.spreadsheet.copy_to_local_file(:default_style, dest.path)
file_loc = dest.path
...
CSV.foreach(file_loc, :headers => true, :skip_blanks => true) do |row|
This seems to work instead of open-URI:
Mp3Info.open(mp3.to_file.path) do |mp3info|
  puts mp3info.tag.artist
end
Paperclip (before 3.0.0; see the note above) has a to_file method that downloads the file from S3.
I've written a small FilePond example (excellent piece of work BTW) but there is one thing that I can't understand:
When I upload, e.g., "foo.JPG" or "foo.jpeg" it is saved as "foo.jpg" on the server.
I would like to keep the file extension, but don't know how to do it.
I've tried the FileRename plugin, but the files are still saved as .jpg.
I worked out a solution. If anyone is interested I can post the complete code, but basically I stored the real filename (e.g., "foo.jpeg") in metadata, extracted it on the PHP server side, and then renamed the temporary file using that name instead.
In FilePond.create:
beforeAddFile: (fileItem) => new Promise(resolve => {
  // store the original filename so the server can restore it later
  fileItem.setMetadata('true_filename', fileItem.filename);
  resolve(true);
}),
I have two different buckets in the same Amazon S3 account. I can upload fine to one but not the other.
pictures.rb
def self.set_s3_direct_post
  return S3_BUCKET.presigned_post(key: "uploads/#{SecureRandom.uuid}/${filename}", success_action_status: '201', acl: 'public-read')
end
aws.rb
S3_BUCKET = Aws::S3::Resource.new.bucket(ENV['S3_FIRST_BUCKET'])
Trying to change the bucket (with S3_SECOND_BUCKET instead of S3_FIRST_BUCKET) results in a failed upload. I notice that the URL for a failed upload is in the form:
https://s3.amazonaws.com/%E2%80%9Csecond_bucket%E2%80%9D
but a successful upload looks like:
https://first_bucket.s3.amazonaws.com
How do I control this?
Unbelievable. I edited my ~/.profile file in TextEdit. The addition of the line:
export S3_SECOND_BUCKET="second_bucket"
put slightly curly double quotes around second_bucket. It turns out these double quotes are different from the ones Sublime Text 2 uses, which are straight (like above), so the TextEdit versions weren't being recognised as quotes. Cheers TextEdit, you t**t.
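For anyone hitting the same thing, a quick diagnostic sketch from irb makes the stray typographic quotes visible (this assumes the variable is already exported):
puts ENV['S3_SECOND_BUCKET'].inspect
# => "“second_bucket”"  (note the curly quotes are part of the value)

# Strip them if present (assumes the real bucket name contains no quotes):
bucket_name = ENV['S3_SECOND_BUCKET'].to_s.delete("\u201C\u201D")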
I have a problem with the attachment system on my web page. I store attachments on Amazon S3 using Paperclip. I have an option to copy an attachment to a new file. Everything works fine until there are Polish special characters in the title, like ŁĄKA.jpg. Then I get an error:
Saving error: Appendix Paperclip::Errors::NotIdentifiedByImageMagickError
/Users/michal/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/activerecord-4.2.5/lib/active_record/validations.rb:79:in `raise_record_invalid'
/Users/michal/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/activerecord-4.2.5/lib/active_record/validations.rb:43:in `save!'
My code:
instance.appendixes.select { |a| a.temporary? && !a.appendix.exists? }.each do |a|
  a.appendix = S3File.new(a.s3path)
  a.process = false
  a.appendix_url = nil
  puts "CREATING NEW FILE from (temporary?) appendix: #{a.id}, path: #{a.s3path}, is_public: #{a.is_public}, determine_is_public: #{a.determine_is_public}"
  a.is_public = a.determine_is_public
  logger.debug("CREATING NEW FILE from (temporary?) appendix: #{a.id}, path: #{a.s3path}, is_public: #{a.is_public}, determine_is_public: #{a.determine_is_public}")
  a.save! # because of delayed_job
end
I'm getting the error on a.save! when the path is like appendixes/appendixes/242/original/%25C5%2581A%25CC%25A8KA.jpg, but it works like a charm when it is appendixes/appendixes/243/original/laka.jpg or another file name without Polish letters. Has anybody had this kind of problem, or does anyone have suggestions on how to fix it?
OK, I found what was wrong. I had to replace the last part of a.s3path with the original name (łąka.jpg), and everything works fine. So when I have:
S3File.new("appendixes/appendixes/243/original/łąka.jpg")
it works and finds the correct file on the S3 server.
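Note that the failing path in the question is double-URL-encoded (%25 is the escape for the % sign itself). A small sketch of what is going on, using only the standard library:
require 'cgi'

path  = "appendixes/appendixes/242/original/%25C5%2581A%25CC%25A8KA.jpg"
once  = CGI.unescape(path) # => ".../%C5%81A%CC%A8KA.jpg"
twice = CGI.unescape(once) # => ".../ŁA̧KA.jpg" (a decomposed, NFD-style form of ŁĄKA.jpg)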
Using Rails 3.2 and CarrierWave, and recently switched to storing on Amazon S3. My setup and uploads are all working fine.
1. I have image_uploader.rb to upload and store images. Displaying them all works fine
2. I have file_uploader.rb to upload and store files. I've even taken it a step further to upload ZIP files and extract a version so that both the ZIP file and TXT files are stored in the correct place on S3.
My problem is that I need to run a method on the TXT file. In the past, I used storage :file.
With that I was able to:
Dir.chdir("public/uploads/")
import_file = Dir['*.TXT'].first
f = File.new(import_file)
Now that I'm using storage :fog, I can't seem to retrieve/File.new/open the file.
I see the file with the usual commands:
@upload1.team_file # stored file
@upload1.team_file.url # url
@upload1.team_file_url(:data_file).to_s # version created
I've been poring over all kinds of very limited leads on retrieving and/or opening the file, but everything I try seems to return errors, such as:
Errno::ENOENT: No such file or directory - https://teamfiles.s3.amazonaws.com/data_files…
Thoughts on the difference here between retrieving and USING a file from Amazon S3? Thanks!
Pulling from multiple threads, APIs, etc. I'm answering my own question with what I've found. I welcome any corrections or improvements:
To retrieve CarrierWave files uploaded to Amazon S3, you have to understand that open(@upload.file_url) or File.open(@upload.file_url) does NOT open the file; it only opens the PATH to the file. (ref: Ruby OpenURI)
I use: open_uri_url = open(@upload.file_url)
You then have to find the specific file in that path that you want. For me, I then find a ZIP file that was uploaded to Amazon S3 and extract the specific file within it that I want, with a unique *.ABC extension:
zip_content_file = Zip::File.open(open_uri_url).map { |content| content if content.to_s.split('.').last == "ABC" }.compact.first
Now, from here, where to extract to? I create a unique directory in the Rails tmp directory to extract the file to, use it, and then delete the directory:
tmp_directory = "tmp/extracts/#{@upload.parent_id}/"
FileUtils.mkdir_p(tmp_directory) unless File.directory?(tmp_directory)
extract = zip_content_file.extract(tmp_directory + zip_content_file.to_s)
Now, with the file found in the ZIP stored on Amazon S3 and extracted, I can open it, read it, etc.:
f = File.new(tmp_directory + extract.to_s)
I hope this helps with CarrierWave, Amazon S3, and ZIP files, and with using them once uploaded.
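To pull those fragments together, here is a consolidated sketch. It is untested and assumes rubyzip, open-uri, that @upload.file_url points at the ZIP, and the *.ABC extension from above:
require 'open-uri'
require 'zip'
require 'fileutils'

io    = open(@upload.file_url)   # Tempfile for large downloads, StringIO for small ones
zip   = Zip::File.open(io.path)  # assumes a Tempfile (i.e. something with a path) came back
entry = zip.glob('*.ABC').first  # the one file we care about

tmp_directory = "tmp/extracts/#{@upload.parent_id}/"
FileUtils.mkdir_p(tmp_directory)

entry.extract(tmp_directory + entry.name)
f = File.new(tmp_directory + entry.name)
# ... use f, then delete tmp_directory as above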
Firstly, I am aware that there are quite a few questions that are similar to this one in SO. I have read most, if not all of them, over the past week. But I still can't make this work for me.
I am developing a Ruby on Rails app that allows users to upload mp3 files to Amazon S3. The upload itself works perfectly, but a progress bar would greatly improve user experience on the website.
I am using the aws-sdk gem which is the official one from Amazon. I have looked everywhere in its documentation for callbacks during the upload process, but I couldn't find anything.
The files are uploaded one at a time directly to S3 so it doesn't need to load it into memory. No multiple file upload necessary either.
I figured that I may need to use jQuery to make this work and I am fine with that.
I found this that looked very promising: https://github.com/blueimp/jQuery-File-Upload
And I even tried following the example here: https://github.com/ncri/s3_uploader_example
But I just could not make it work for me.
The documentation for aws-sdk also BRIEFLY describes streaming uploads with a block:
obj.write do |buffer, bytes|
  # writing fewer than the requested number of bytes to the buffer
  # will cause write to stop yielding to the block
end
But this is barely helpful. How does one "write to the buffer"? I tried a few intuitive options that would always result in timeouts. And how would I even update the browser based on the buffering?
Is there a better or simpler solution to this?
Thank you in advance.
I would appreciate any help on this subject.
The "buffer" object yielded when passing a block to #write is an instance of StringIO. You can write to the buffer using #write or #<<. Here is an example that uses the block form to upload a file.
file = File.open('/path/to/file', 'r')
obj = s3.buckets['my-bucket'].objects['object-key']
obj.write(:content_length => file.size) do |buffer, bytes|
  buffer.write(file.read(bytes))
  # you could do some interesting things here to track progress
end
file.close
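Building on that, a hypothetical way to fill in the "track progress" comment, counting bytes as they are written to the buffer:
file  = File.open('/path/to/file', 'r')
total = file.size
sent  = 0

obj.write(:content_length => total) do |buffer, bytes|
  chunk = file.read(bytes)
  break unless chunk # EOF safety net
  buffer.write(chunk)
  sent += chunk.bytesize
  printf("\rUploaded %.1f%%", 100.0 * sent / total)
end
file.close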
After reading the source code of the AWS gem, I've adapted (or mostly copied) the multipart upload method to yield the current progress based on how many chunks have been uploaded:
s3 = AWS::S3.new.buckets['your_bucket']

file = File.open(filepath, 'r', encoding: 'BINARY')
file_to_upload = "#{s3_dir}/#{filename}"
upload_progress = 0

opts = {
  content_type: mime_type,
  cache_control: 'max-age=31536000',
  estimated_content_length: file.size,
}

part_size = self.compute_part_size(opts)
parts_number = (file.size.to_f / part_size).ceil.to_i
obj = s3.objects[file_to_upload]

begin
  obj.multipart_upload(opts) do |upload|
    until file.eof? do
      break if (abort_upload = upload.aborted?)
      upload.add_part(file.read(part_size))
      upload_progress += 1.0 / parts_number
      # Yields the Float progress and the current MultipartUpload object,
      # so the caller can report on the file that's being uploaded
      yield(upload_progress, upload) if block_given?
    end
  end
end
The compute_part_size method is defined here and I've modified it to this:
def compute_part_size options
  max_parts = 10000
  min_size = 5242880 # 5 MB
  estimated_size = options[:estimated_content_length]
  [(estimated_size.to_f / max_parts).ceil, min_size].max.to_i
end
This code was tested on Ruby 2.0.0p0
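Assuming the adapted code above is wrapped in a method (upload_with_progress is a name of my own, taking the parameters the snippet already uses), usage could look like this:
def upload_with_progress(filepath, filename, s3_dir, mime_type)
  # ... the adapted multipart code from above ...
end

upload_with_progress('/tmp/song.mp3', 'song.mp3', 'music', 'audio/mpeg') do |progress, upload|
  printf("\r%3d%% uploaded", (progress * 100).round)
end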