How to write to a tmp file or stream an image object up to S3 in Ruby on Rails

The code below resizes my image, but I am not sure how to write it out to a temp file or blob so I can upload it to S3.
origImage = MiniMagick::Image.open(myPhoto.tempfile.path)
origImage.resize "200x200"
thumbKey = "tiny-#{key}"
# broken: write("tiny.jpg") returns a success boolean (see the docs quoted below), not a file
obj = bucket.objects[thumbKey].write(:file => origImage.write("tiny.jpg"))
I can upload the original file to S3 just fine with the commands below:
obj = bucket.objects[key].write('data')
obj.write(:file => myPhoto.tempfile)
I think I want to create a temp file, read the image file into it and upload that:
thumbFile = Tempfile.new('temp')
thumbFile.write(origImage.read)
obj = bucket.objects[thumbKey].write(:file => thumbFile)
but the origImage object doesn't have a read method.
UPDATE: I was reading the source code and found this about the write method:
# Writes the temporary file out to either a file location (by passing in a String) or by
# passing in a Stream that you can #write(chunk) to repeatedly
#
# @param output_to [IOStream, String] Some kind of stream object that needs to be read or a file path as a String
# @return [IOStream, Boolean] If you pass in a file location [String] then you get a success boolean. If it's a stream, you get it back.
# Writes the temporary image that we are using for processing to the output path
And the s3 api docs say you can stream the content using a code block like:
obj.write do |buffer, bytes|
  # writing fewer than the requested number of bytes to the buffer
  # will cause write to stop yielding to the block
end
How do I change my code so that I can write something like:
origImage.write(s3stream here)
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
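For illustration, here is one way the two quoted APIs might be lined up; a hedged, untested sketch that assumes the aws-sdk v1 block-write form (which requires :content_length) and that still buffers the image in a StringIO rather than streaming end to end:
require "stringio"

io = StringIO.new
origImage.write(io) # per the quoted doc, write accepts a stream it can #write(chunk) to
io.rewind
bucket.objects[thumbKey].write(:content_length => io.size) do |buffer, bytes|
  buffer.write(io.read(bytes))
end
This at least avoids touching the filesystem, even though the whole thumbnail sits in memory for the duration of the upload.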
UPDATE 2
This code successfully uploads the thumbnail file to S3, but I would still love to know how to stream it up; I think it would be much more efficient.
#resize image and upload a thumbnail
smallImage = MiniMagick::Image.open(myPhoto.tempfile.path)
smallImage.resize "200x200"
thumbKey = "tiny-#{key}"
newFile = Tempfile.new("tempimage")
smallImage.write(newFile.path)
obj = bucket.objects[thumbKey].write('data')
obj.write(:file => newFile)

smallImage.to_blob ?
The code below is copied from https://github.com/probablycorey/mini_magick/blob/master/lib/mini_magick.rb:
# Gives you raw image data back
# @return [String] binary string
def to_blob
  f = File.new @path
  f.binmode
  f.read
ensure
  f.close if f
end
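Building on that, a tempfile-free version of the UPDATE 2 code could be as small as this; a sketch assuming the same aws-sdk v1 API used above, whose write accepts a raw string:
smallImage = MiniMagick::Image.open(myPhoto.tempfile.path)
smallImage.resize "200x200"
# to_blob returns the raw bytes as a binary String, which S3Object#write accepts directly
bucket.objects["tiny-#{key}"].write(smallImage.to_blob)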

Have you looked into the paperclip gem? It offers direct S3 support and works great.

Related

Zip File from S3 files

Ruby '2.7.4'
Rails '~> 5.2.2'
I have access to an S3 bucket containing several files of several types, which I am trying to
Download into memory
Put them all together inside a zip file
Upload this zip file into some S3 bucket
I've looked into several issues on the web already, without any success.
Specifically, I'm trying to use the rubyzip gem, but no matter what I do, I always end up with the error message: 'no implicit conversion of StringIO into String'
Here's a summary of my current code
gem 'rubyzip', require: 'zip'
require 'zip'
bucket_name = 'redacted'
zip_filename = "My final complete zip file.zip"
s3_client = Aws::S3::Client.new(region: 'eu-west-3')
s3_resource = Aws::S3::Resource.new(region: 'eu-west-3')
bucket = s3_resource.bucket(bucket_name)
s3_filename = 's3_file_name'
s3_file = s3_client.get_object(bucket: bucket_name, key: s3_filename)
file = s3_file.body
At this point, I have exactly one file, as a StringIO.
However please bear in mind that I'm trying to reproduce this with several files, which means I want to bundle several files inside a final zip.
I'm failing to put this file into a zip and/or put the zip back into s3.
Attempt N°1
stringio = Zip::OutputStream.write_buffer do |zio|
  zio.put_next_entry("test1.zip")
  zio.write(file)
end
stringio.rewind
binary_data = stringio.sysread
Error message: no implicit conversion of StringIO into String
Attempt N°2
zip_file_name = 'my_test_file_name.zip'
File.open(zip_file_name, 'w') { |f| f.puts(file.rewind && file.read) }
final_zip = Zip::File.open(zip_filename, create: true) do |zipfile|
  zf = Zip::File.new(file, create: true, buffer: true)
  zipfile.add(zf.to_s, zip_file_name)
end
really_final_zip = Zip::File.new(final_zip, create: true, buffer: true)
new_object = bucket.object(zip_file_name)
new_object.put(body: final_zip)
Error Message: expected params[:body] to be a String or IO like object that supports read and rewind, got value #<Zip::Entry:0x0000558a06ff42a0
If instead of that last line, I write
new_object.put(body: final_zip.to_s)
A text file is created in S3 (instead of the zip) with the content #<StringIO:0x0000558a06c8c8d8>
You need to read the bytes from the file, so change
s3_file.body
to
s3_file.body.read
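Putting it together, a minimal end-to-end sketch under the question's setup (s3_client, bucket, bucket_name and zip_filename as defined above; the keys list is hypothetical):
keys = ['first_file.pdf', 'second_file.png'] # hypothetical S3 keys to bundle

zip_buffer = Zip::OutputStream.write_buffer do |zio|
  keys.each do |key|
    zio.put_next_entry(key)
    body = s3_client.get_object(bucket: bucket_name, key: key).body
    zio.write(body.read) # read turns the StringIO into the String rubyzip expects
  end
end

zip_buffer.rewind
bucket.object(zip_filename).put(body: zip_buffer.read)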

Zlib gunzip only returning partial file

I have a 27MB .gz file (127MB unzipped). Using ruby's Zlib to ungzip the file returns correctly formatted data, but the file is truncated to a fraction of the expected size (1290 rows of data out of 253,000).
string_io = StringIO.new(body)
file = File.new("test.json.gz", "w+")
file.puts string_io.read
file.close
# string_io.read.length == 26_675_650
# File.size("test.json.gz") == 27_738_775
Using GzipReader:
data = ""
File.open(file.path) do |f|
gz = Zlib::GzipReader.new(f)
data << gz.read
gz.close
end
# data.length = 603_537
Using a different GzipReader method:
data = ""
Zlib::GzipReader.open(file.path) do |gz|
  data << gz.read
end
# data.length == 603_537
Using gunzip:
gz = Zlib.gunzip(string_io.read)
# gz.length == 603_537
The expected size is 127,604,690 but I'm only able to extract 603,537. Using gunzip in my terminal correctly extracts the entire file but I'm looking for a programmatic way to handle this.
Instead of opening a file and passing in a file handle, have you tried using Zlib::GzipReader.open()? It's documented here: https://ruby-doc.org/stdlib/libdoc/zlib/rdoc/Zlib/GzipReader.html
I tested locally and was able to get proper results:
data = ''
=> ""
Zlib::GzipReader.open('file.tar.gz') { |gz|
  data << gz.read
}
data.length
=> 750003
Then checked the file size uncompressed:
gzip -l file.tar.gz
  compressed  uncompressed  ratio  uncompressed_name
      315581        754176  58.1%  file.tar
Edit: Saw your update that you are pulling the data via the S3 API. Make sure you are Base64 decoding your body before writing it to a file.
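Two other things are worth checking, since all three readers stop at the same 603,537 bytes. First, the download code in the question writes the archive with file.puts and no binmode, which appends a newline and can mangle binary data on the way to disk; writing it verbatim looks like this:
# binary mode and File#write, so the gzip bytes land on disk untouched
File.open("test.json.gz", "wb") { |f| f.write(string_io.read) }
Second, Zlib::GzipReader (and Zlib.gunzip) only decode the first member of a concatenated, multi-member gzip file, while command-line gunzip decodes all of them, which would explain the terminal succeeding where Ruby truncates. The Zlib::GzipReader docs show a loop for this case, adapted here:
require 'zlib'

data = +""
File.open("test.json.gz", "rb") do |f|
  loop do
    gz = Zlib::GzipReader.new(f)
    data << gz.read
    unused = gz.unused # bytes read past the end of this member
    gz.finish
    break if unused.nil? # no more members
    f.pos -= unused.length
  end
end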

How to convert a Word file to PDF in RoR

I am using the Libreconv gem to convert Word files to PDF, but it's not working with S3:
bucket = Aws::S3::Bucket.new('bucket-name')
object = bucket.object file.attachment.blob.key
path = object.presigned_url(:get)
Libreconv.convert(path, "public/test.pdf")
If I try to convert this path to PDF using Libreconv, it gives me a "filename too long" error. I have written this code in an ActiveJob, so kindly provide a solution that works there.
Can someone please suggest how I can convert a Word file to PDF?
Here path is https://domain.s3.amazonaws.com/Bf5qPUP3znZGCHCcTWHcR5Nn?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIZ6RZ7J425ORVUYQ%2F20181206%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181206T051240Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=b89c47a324b2aa423bf64dfb343e3b3c90dce9b54fa9fe1bc4efa9c248e912f9
and the error I am getting is:
Error: source file could not be loaded
*** Errno::ENAMETOOLONG Exception: File name too long # rb_sysopen - /tmp/Bf5qPUP3znZGCHCcTWHcR5Nn?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIZ6RZ7J425ORVUYQ%2F20181206%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181206T051240Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=b89c47a324b2aa423bf64dfb343e3b3c90dce9b54fa9fe1bc4efa9c248e912f9.pd
It seems that your PDF's filename is being built from the full presigned URL, including all the params needed to fetch the docx from S3.
I suppose it happens in this line:
target_tmp_file = "#{target_path}/#{File.basename(@source, ".*")}.#{File.basename(@convert_to, ":*")}"
@source is https://domain.s3.amazonaws.com/Bf5qPUP3znZGCHCcTWHcR5Nn?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIZ6RZ7J425ORVUYQ%2F20181206%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181206T051240Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=b89c47a324b2aa423bf64dfb343e3b3c90dce9b54fa9fe1bc4efa9c248e912f9 and
> File.basename(@source, ".*")
=> "Bf5qPUP3znZGCHCcTWHcR5Nn?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIZ6RZ7J425ORVUYQ%2F20181206%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181206T051240Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&X-Amz-Signature=b89c47a324b2aa423bf64dfb343e3b3c90dce9b54fa9fe1bc4efa9c248e912f9"
As a result Libreconv gem tries to create a tmp file with this long name and it's too long - that's why an error is raised.
Possible solution: split the process into separate steps of fetching the file and converting it. Something like:
require "open-uri"
bucket = Aws::S3::Bucket.new('bucket-name')
object = bucket.object file.attachment.blob.key
path = object.presigned_url(:get)
# note: open(path) returns a Tempfile only for larger responses; small bodies
# come back as a StringIO, which has no #path or #delete
doc_file = open(path)
begin
  Libreconv.convert(doc_file.path, "public/test.pdf")
ensure
  doc_file.delete
end
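Alternatively, you can sidestep open-uri and the long presigned URL entirely by downloading the object to a short-named tempfile first; a sketch assuming aws-sdk-s3's Object#get with :response_target:
require "tempfile"

tmp = Tempfile.new(["document", ".docx"])
begin
  object.get(response_target: tmp.path) # downloads the blob straight to a short local path
  Libreconv.convert(tmp.path, "public/test.pdf")
ensure
  tmp.close
  tmp.unlink
end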
The following is my answer using the combine_pdf gem:
tape = Tape.new(file)
result = tape.preview
tempfile = Tempfile.new(['foo', '.pdf'])
File.open(tempfile, 'wb') do |f|
  f.write result
end
path = tempfile.path
combine_pdf(path)
and to load the file from S3 I have used:
object = #bucket.object object_key
path = object.presigned_url(:get)
response = Net::HTTP.get_response(URI.parse(path)).body

How to get a bitmap image in Ruby?

The Google Vision API requires a bitmap sent as an argument. I am trying to convert a png from a URL to a bitmap to pass to the Google API:
require "google/cloud/vision"
PROJECT_ID = Rails.application.secrets["project_id"]
KEY_FILE = "#{Rails.root}/#{Rails.application.secrets["key_file"]}"
google_vision = Google::Cloud::Vision.new project: PROJECT_ID, keyfile: KEY_FILE
img = open("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png").read
image = google_vision.image img
ArgumentError: string contains null byte
This is the source code processing of the gem:
def self.from_source source, vision = nil
  if source.respond_to?(:read) && source.respond_to?(:rewind)
    return from_io(source, vision)
  end
  # Convert Storage::File objects to the URL
  source = source.to_gs_url if source.respond_to? :to_gs_url
  # Everything should be a string from now on
  source = String source
  # Create an Image from a HTTP/HTTPS URL or Google Storage URL.
  return from_url(source, vision) if url? source
  # Create an image from a file on the filesystem
  if File.file? source
    unless File.readable? source
      fail ArgumentError, "Cannot read #{source}"
    end
    return from_io(File.open(source, "rb"), vision)
  end
  fail ArgumentError, "Unable to convert #{source} to an Image"
end
https://github.com/GoogleCloudPlatform/google-cloud-ruby
Why is it telling me string contains null byte? How can I get a bitmap in Ruby?
According to the documentation (which, to be fair, is not exactly easy to find without digging into the source code), Google::Cloud::Vision#image doesn't want the raw image bytes, it wants a path or URL of some sort:
Use Vision::Project#image to create images for the Cloud Vision service.
You can provide a file path:
[...]
Or any publicly-accessible image HTTP/HTTPS URL:
[...]
Or, you can initialize the image with a Google Cloud Storage URI:
So you'd want to say something like:
image = google_vision.image "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
instead of reading the image data yourself.
Instead of using write you want to use IO.copy_stream as it streams the download straight to the file system instead of reading the whole file into memory and then writing it:
require 'open-uri'
require 'tempfile'
uri = URI("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png")
tmp_img = Tempfile.new(uri.path.split('/').last)
IO.copy_stream(open(uri), tmp_img)
Note that you don't need to set the 'r:BINARY' flag as the bytes are just streamed without actually reading the file.
You can then use the file by:
require "google/cloud/vision"
# Use fetch as it raises an error if the key is not present
PROJECT_ID = Rails.application.secrets.fetch("project_id")
# Rails.root is a Pathname object so use `.join` to construct paths
KEY_FILE = Rails.root.join(Rails.application.secrets.fetch("key_file"))
google_vision = Google::Cloud::Vision.new(
  project: PROJECT_ID,
  keyfile: KEY_FILE
)
image = google_vision.image(File.absolute_path(tmp_img))
When you are done you clean up by calling tmp_img.unlink.
Remember to read things in binary format:
open("https://www.google.com/..._272x92dp.png",'r:BINARY').read
If you forget this it might try and open it as UTF-8 textual data which would cause lots of problems.

Rubyzip: Export zip file directly to S3 without writing tmpfile to disk?

I have this code, which writes a zip file to disk, reads it back, uploads it to S3, then deletes the file:
compressed_file = some_temp_path
Zip::ZipOutputStream.open(compressed_file) do |zos|
  some_file_list.each do |file|
    zos.put_next_entry(file.some_title)
    zos.print IO.read(file.path)
  end
end # Write zip file
s3 = Aws::S3.new(S3_KEY, S3_SECRET)
bucket = Aws::S3::Bucket.create(s3, S3_BUCKET)
bucket.put("#{BUCKET_PATH}/archive.zip", IO.read(compressed_file), {}, 'authenticated-read')
File.delete(compressed_file)
This code already works, but what I want is to stop creating the zip file on disk, to save a few steps. I was wondering if there is a way to export the zipfile data directly to S3 without having to first create a tmpfile, read it back, then delete it?
I think I just found the answer to my question.
It's Zip::ZipOutputStream.write_buffer. I'll check this out and update this answer when I get it working.
Update
It does work. My code is like this now:
compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
  some_file_list.each do |file|
    zos.put_next_entry(file.some_title)
    zos.print IO.read(file.path)
  end
end # Outputs zipfile as StringIO
s3 = Aws::S3.new(S3_KEY, S3_SECRET)
bucket = Aws::S3::Bucket.create(s3, S3_BUCKET)
compressed_filestream.rewind
bucket.put("#{BUCKET_PATH}/archive.zip", compressed_filestream.read, {}, 'authenticated-read')
write_buffer returns a StringIO, and you need to rewind the stream before reading it. Now I don't need to create and delete the tmpfile.
I'm just wondering now whether write_buffer is more memory-intensive or heavier than open, or is it the other way around?
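(For what it's worth: write_buffer builds the whole archive in a StringIO in memory, whereas ZipOutputStream.open streams entries to a file on disk as it goes, so for very large archives the tempfile route should be lighter on memory.)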
