Rails, partially download a file from amazon s3 - ruby-on-rails

I have a Rails application and I would like to download part of a file from Amazon S3 with the following code:
url = URI.parse('https://topdisplay.s3-eu-west-1.amazonaws.com/uploads/song/url/15/09_-_No_Goodbyes.mp3?AWSAccessKeyId=dfsfsdf#fdfsd&Signature=fsdfdfdgfvvsersf') # turn the string into a URI
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true # S3 uses SSL, doesn't it?
req = Net::HTTP::Get.new(url.path) # init a request with the url
req.range = (0..4096) # limit the load to only 4096 bytes
res = http.request(req) # load the mp3 file
Mp3Info.open( StringIO.open(res.body) ) do |m| # do the parsing
  puts m
end
The URL is correct; I can download the file through a browser. But I get a 403 error from Amazon at the http.request call:
res = http.request(req)
=> #<Net::HTTPForbidden 403 Forbidden readbody=true>
How can I download that file with Rails? =)
By the way, in the end I found another solution. I needed that code to check the track length after uploading it to the website, so the flow looked like this:
upload track to S3 -> download part of it -> check length
But later I noticed that CarrierWave automatically uploads everything to the tmp folder first, so the uploading process actually looks like this:
upload to tmp -> upload from website to amazon s3 -> save
And if we use a :before_save callback, we can open the track before it's uploaded to S3.
So the code should look like this:
before_save :set_duration

def set_duration
  Mp3Info.open( 'public' + url.to_s ) do |m| # do the parsing
    self.duration = m.length.to_i
    self.name = m.tag.title if self.name == ""
  end
end
In that case I simplified the process a lot :)
Have a sexy day!

Right now you are only making a request to the path; I think you need to include the query portion as well:
full_path = (url.query.blank?) ? url.path : "#{url.path}?#{url.query}"
req = Net::HTTP::Get.new(full_path)
see also - http://house9.blogspot.com/2010/01/ruby-http-get-with-nethttp.html
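Putting it together, a minimal sketch of the corrected request, assuming the same presigned URL (signed_url below is a placeholder for it) and the 4096-byte range from the question:
url = URI.parse(signed_url) # the presigned S3 URL from the question
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
full_path = url.query.blank? ? url.path : "#{url.path}?#{url.query}"
req = Net::HTTP::Get.new(full_path) # keep the signed query string
req.range = (0..4096)               # still only the first 4096 bytes
res = http.request(req)             # should now come back 206 Partial Content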

Related

Do I have to download an image before uploading it to S3?

I have a Rails app with embedded images. What I want is to upload these images to S3 and serve them from there instead of from the original source. Do I have to download the image to my server before uploading it to S3?
Short answer: If you're scraping someone's content, then...yes, you need to pull the file down before uploading it to S3.
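If you go the simple route, here is a minimal sketch of that pull-then-push flow using the aws-sdk-s3 gem; the bucket name, key, region, and source URL are placeholders:
require 'aws-sdk-s3'
require 'open-uri'
source_url = 'https://example.com/image.jpg' # wherever the image currently lives
s3 = Aws::S3::Resource.new(region: 'us-west-2')
obj = s3.bucket('my-bucket').object('images/image.jpg')
# pull the file down, then push the bytes straight to S3
open(source_url) do |file|
  obj.put(body: file.read, content_type: 'image/jpeg')
end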
Long answer: If the other site (the original source) is working with you, you can give them a Presigned URL that they can use to upload to your S3 bucket.
From Amazon's docs: https://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjectPreSignedURLRubySDK.html
# Uploading an object using a presigned URL (SDK for Ruby - Version 3).
require 'aws-sdk-s3'
require 'net/http'
s3 = Aws::S3::Resource.new(region: 'us-west-2')
obj = s3.bucket('BucketName').object('KeyName')
# Replace BucketName with the name of your bucket.
# Replace KeyName with the name of the object you are creating or replacing.
url = URI.parse(obj.presigned_url(:put))
body = "Hello World!"
# This is the contents of your object. In this case, it's a simple string.
# The presigned URL is https, so the request needs SSL.
Net::HTTP.start(url.host, url.port, use_ssl: true) do |http|
  http.send_request("PUT", url.request_uri, body, {
    # This is required, or Net::HTTP will add a default unsigned content-type.
    "content-type" => "",
  })
end
puts obj.get.body.read
# This will print out the contents of your object to the terminal window.

Convert paperclip pdf from S3 to base64 (Rails)

I'm sending a base64 of a PDF to an external API endpoint in a Rails app.
This occurs regularly with different PDFs for different users. I'm currently using the Paperclip gem.
The problem is getting the PDF into a format that I can then convert to base64.
Below works if I start with a PDF locally and .read it, but not when it comes from S3.
Code:
def self.get_pdf(upload_id)
  # get URL for file in S3 (for directly accessing the PDF in browser)
  # `.generic` implemented via `has_attached_file :generic` in model
  # `.expiring_url` is paperclip syntax for generating a URL
  s3_url = Upload
    .find(upload_id)
    .generic
    .expiring_url(100)
  # open file from URL
  file = open(s3_url)
  # read file
  pdf = File.read(file)
  # convert to base64
  base64 = Base64.encode64(File.open(pdf, "rb").read)
end
Error:
OpenURI::HTTPError (404 Not Found):
Ideally this can just occur in memory instead of actually downloading the file.
Streaming in a base64 from S3 while streaming out the API request would be awesome, but I don't think that's an option here.
UPDATE:
signed URLs from Cyberduck + Michael's answer will work
Paperclip URLs + Michael's answer result in the error below
Error:
The specified key does not exist.
Unfortunately I need to use Paperclip so I can generate links and download PDFs on the fly, based on the uploads table records in my db.
Is there a technicality about Paperclip links I don't understand?
base64 = Base64.encode64( get_me(s3_url).body ).gsub("\n", '')

def get_me(url)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req['Any_header_you_might_need'] = 'idem'
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  return res
end
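Wired into the original method, a minimal sketch (assuming the same Upload model and expiring_url call from the question):
def self.get_pdf(upload_id)
  s3_url = Upload.find(upload_id).generic.expiring_url(100)
  # fetch the PDF over the signed URL and encode it entirely in memory
  Base64.encode64(get_me(s3_url).body).gsub("\n", '')
end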

How to parse a very large XML file from a remote server in Rails

I have a very large XML from a remote server which I have to parse and get the data.
I have tried to open the file using the open() function but it is taking more than 15 minutes and still no response.
Then I tried Nokogiri::XML(open(URL)) where URL is the link which contains the data to parse.
Also, I have tried using Net::HTTP::Get but again with no fruitful results.
Can anyone suggest which gem and function can be used to parse the data?
As mentioned before, Nokogiri::XML::Reader is your friend here. The example in the documentation works fine if you have the file locally.
It is also possible to parse the data as soon as it comes in, fully streaming. This involves getting the data in chunks (e.g. using Net::HTTP) and connecting it to the Nokogiri::XML::Reader by means of an IO.pipe.
Example (adapted from this gist):
require 'nokogiri'
require 'net/http'

# setup request
uri = URI("http://example.com/articles.xml")
req = Net::HTTP::Get.new(uri.request_uri)

# read response in a separate thread using a pipe to communicate
rd, wr = IO.pipe
reader_thread = Thread.new do
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req) do |response|
      response.read_body {|chunk| wr.write(chunk) }
    end
    wr.close
  end
end

# parse the incoming data chunk by chunk
reader = Nokogiri::XML::Reader(rd)
reader.each do |node|
  next if node.node_type != Nokogiri::XML::Reader::TYPE_ELEMENT
  next if node.name != "article"
  # now that we have the desired fragment, put it to use
  doc = Nokogiri::XML(node.outer_xml)
  puts("Got #{doc.text}")
end
rd.close

# let the reader thread finish cleanly
reader_thread.join
If you are working with large XML files then you can use the Nokogiri::XML::Reader class. I have successfully opened 1 GB files without any problems. For optimal performance you could download the file first and then parse it with the XML::Reader class locally on your server.
The usage is something like this (replace XML_FILE with your path):
Nokogiri::XML::Reader(File.open(XML_FILE)).each do |node|
  if node.name == 'Node' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    puts node.outer_xml # you could also do something like Nokogiri::XML(node.outer_xml).at('./Node')
  end
end
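If you take the download-first suggestion, a minimal sketch could look like this (the URL and element name are placeholders):
require 'nokogiri'
require 'net/http'
require 'tempfile'
uri = URI("http://example.com/articles.xml")
xml_file = Tempfile.new(['articles', '.xml'])
# stream the response to disk instead of holding it all in memory
Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.request(Net::HTTP::Get.new(uri.request_uri)) do |response|
    response.read_body { |chunk| xml_file.write(chunk) }
  end
end
xml_file.rewind
# parse the local copy node by node
Nokogiri::XML::Reader(xml_file).each do |node|
  if node.name == 'article' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    puts node.outer_xml
  end
end
xml_file.close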
Here is the documentation: http://www.rubydoc.info/github/sparklemotion/nokogiri/master/Nokogiri/XML/Reader
Hope it helps.

Using an S3 presigned URL to upload a file that will then have public-read access

I am using Ruby on Rails and AWS gem.
I can get pre-signed URL for upload and download.
But when I generate the URL there is no file yet, so setting the ACL to 'public-read' on the download URL doesn't work.
The use case is this: (1) the server provides the user a path to upload content to my bucket, which is not readable without credentials; (2) that content needs to be public later: readable by anyone.
To clarify:
I am not uploading the file; I am providing a URL for my users to upload to. At that time, I also want to give the user a URL that is readable by the public. It seems like it would be easier if I uploaded the file myself. Also, the read URL needs to never expire.
When you generate a pre-signed URL for a PUT object request, you can specify the key and the ACL the uploader must use. If I wanted the user to upload an object to my bucket with the key "files/hello.txt" and the file should be publicly readable, I can do the following:
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('files/hello.txt')
put_url = obj.presigned_url(:put, acl: 'public-read', expires_in: 3600 * 24)
#=> "https://bucket-name.s3.amazonaws.com/files/hello.txt?X-Amz-..."
obj.public_url
#=> "https://bucket-name.s3.amazonaws.com/files/hello.txt"
I can give the put_url to someone else. This URL will allow them to PUT an object to the URL. It has the following conditions:
The PUT request must be made within the given expiration. In the example above I specified 24 hours. The :expires_in option may not exceed 1 week.
The PUT request must specify the HTTP header of 'x-amz-acl' with the value of 'public-read'.
Using the put_url, I can upload an object using Ruby's Net::HTTP:
require 'net/http'
uri = URI.parse(put_url)
request = Net::HTTP::Put.new(uri.request_uri, 'x-amz-acl' => 'public-read')
request.body = 'Hello World!'
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
resp = http.request(request)
Now that the object has been uploaded by someone else, I can make a vanilla GET request to the #public_url. This could be done by a browser, curl, wget, etc.
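For example, a plain GET with Ruby's Net::HTTP, with no signing required once the object is public-read:
require 'net/http'
uri = URI.parse(obj.public_url)
puts Net::HTTP.get(uri) #=> "Hello World!"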
You have two options:
Set the ACL on the object to 'public-read' when you PUT the object. This allows you to use the public url without a signature to GET the object.
Let the ACL on the object default to private and provide pre-signed GET urls for users. These expire, so you have to generate new URLs as needed. A pre-signed URL allows someone to send GET request to the object without credentials themselves.
Upload a public object and generate a public url:
require 'aws-sdk'
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('key')
obj.upload_file('/path/to/file', acl: 'public-read')
obj.public_url
#=> "https://bucket-name.s3.amazonaws.com/key"
Upload a private object and generate a GET URL that is good for 1 hour:
s3 = Aws::S3::Resource.new
obj = s3.bucket('bucket-name').object('key')
obj.upload_file('/path/to/file')
obj.presigned_url(:get, expires_in: 3600)
#=> "https://bucket-name.s3.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."

Ruby on Rails - OAuth 2 multipart Post (Uploading to Facebook or Soundcloud)

I am working on a Rails app that uses OmniAuth to gather OAuth/OAuth2 credentials for my users and then posts to those services on their behalf.
Creating simple posts to update status feeds works great. Now I am at the point of needing to upload files. Facebook says "To publish a photo, issue a POST request with the photo file attachment as multipart/form-data." http://developers.facebook.com/docs/reference/api/photo/
So that is what I am trying to do:
I have implemented the module here: Ruby: How to post a file via HTTP as multipart/form-data? to get the headers and data...
if appearance.post.post_attachment_content_type.to_s.include?('image')
  fbpost = "https://graph.facebook.com/me/photos"
  data, headers = Multipart::Post.prepare_query("title" => appearance.post.post_attachment_file_name, "document" => File.read(appearance.post.post_attachment.path))
  paramsarray = {:source => data, :message => appearance.post.content}
  response = access_token.request(:post, fbpost, paramsarray, headers)
  appearance.result = response
  appearance.save
end
But I am getting an OAuth2::HTTPError - HTTP 400 Error.
Any assistance would be incredible, as this information will also be needed for uploading files to SoundCloud.
Thanks,
Mark
Struggled with this myself. The oauth2 library is backed by Faraday for its HTTP interaction. With a little configuration it supports uploaded files out of the box. The first step is to add the appropriate Faraday middleware when building your connection. An example from my code:
OAuth2::Client.new client_id, secret, site: site do |stack|
  stack.request :multipart
  stack.request :url_encoded
  stack.adapter Faraday.default_adapter
end
This adds multipart encoding support to the Faraday connection. Next, when making the request on your access token object, you want to use a Faraday::UploadIO object. So:
upload = Faraday::UploadIO.new io, mime_type, filename
access_token.post('some/url', params: {url: 'params'}, body: {file: upload})
In the above code:
io - An IO object for the file you want to upload. Can be a File object or even a StringIO.
mime_type - The mime type of the file you are uploading. You can either try to detect this server-side or if a user uploaded the file to you, you should be able to extract the mime type from their request.
filename - What you are calling the file you are uploading. This can be of your own choosing, or you can just use whatever the user uploading the file calls it.
some/url - Replace this with the URL you want to post to
{url: 'params'} - Replace this with any URL params you want to provide
{file: upload} - Replace this with your multipart form data. Obviously one (or more) of the key/value pairs should have an instance of your file upload.
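Putting both pieces together, a rough sketch of a photo upload to the Graph API; client_id, secret, token_string, the file path, and the endpoint are placeholders:
require 'oauth2'
client = OAuth2::Client.new client_id, secret, site: 'https://graph.facebook.com' do |stack|
  stack.request :multipart
  stack.request :url_encoded
  stack.adapter Faraday.default_adapter
end
access_token = OAuth2::AccessToken.new(client, token_string)
upload = Faraday::UploadIO.new('/path/to/photo.jpg', 'image/jpeg', 'photo.jpg')
access_token.post('/me/photos', params: {message: 'some message'}, body: {source: upload})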
I'm actually using this code successfully to upload a photo to a Facebook page:
require 'net/http/post/multipart' # from the multipart-post gem

dir = Dir.pwd.concat("/public/system/posts/images")
fb_url = URI.parse("https://graph.facebook.com/#{@page_id}/photos")
img = File.open(File.join(dir, "myfile.jpg"))
req = Net::HTTP::Post::Multipart.new(
  "#{fb_url.path}?access_token=#{@token}",
  "source" => UploadIO.new(img, "image/jpeg", img.path),
  "message" => "some message"
)
n = Net::HTTP.new(fb_url.host, fb_url.port)
n.use_ssl = true
n.verify_mode = OpenSSL::SSL::VERIFY_NONE
n.start do |http|
  @result = http.request(req)
end
