Dealing with null bytes when creating EC2 instances with user_data using fog - ruby-on-rails

I am trying to provision an EC2 instance using fog; here is the code I am using:
compute = Fog::Compute.new(
  provider: 'AWS',
  region: 'us-east-1',
  aws_access_key_id: ACCESS_KEY,
  aws_secret_access_key: SECRET_ACCESS_KEY
)

options = {
  image_id: 'ami-xxxxxx',
  flavor_id: 'm1.small',
  # custom security group created in the AWS account with open ports
  groups: ['myGroup'],
  private_key_path: '~/.ssh/id_rsa',
  public_key_path: '~/.ssh/id_rsa.pub',
  username: 'ec2-user',
  user_data: File.read(Rails.root.join('public', 'somefile.zip'))
}

compute.servers.bootstrap options
When I run this, I get the following error:
Fog::JSON::EncodeError: string contains null byte
from /home/gaurish/.rvm/gems/ruby-2.0.0-p247/gems/multi_json-1.8.2/lib/multi_json/adapters/oj.rb:20:in `dump'
As you may notice above, I am supplying a ZIP file for the user_data option, and this is where I think the problem occurs. My guess is that the zip file (or Base64-encoding it) somehow introduces a null byte ("\0"), which Oj cannot encode as JSON.
Now:
Can anyone verify whether this is a bug in fog, or am I doing something wrong?
Are there any workarounds to avoid the null bytes?
Versions used:
Fog 1.19
multi_json-1.8.2
oj-2.2.3

I have solved this issue. Here is how:

file = File.open(path, 'rb') # path => path to the zip file
contents = file.read
file.close
user_data = Base64.encode64(contents)

Now this user_data can be safely passed into options[:user_data] without null-byte errors. The issue is being tracked here:
https://github.com/fog/fog/issues/2506
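Putting the fix together with the bootstrap call from the question, it ends up looking roughly like this (a minimal sketch; the access keys, AMI and security group are the question's placeholders):

require 'base64'
require 'fog'

compute = Fog::Compute.new(
  provider: 'AWS',
  region: 'us-east-1',
  aws_access_key_id: ACCESS_KEY,
  aws_secret_access_key: SECRET_ACCESS_KEY
)

# Read the zip in binary mode and Base64-encode it yourself,
# so the JSON request body never contains raw null bytes.
zip_path  = Rails.root.join('public', 'somefile.zip')
user_data = Base64.encode64(File.binread(zip_path))

compute.servers.bootstrap(
  image_id: 'ami-xxxxxx',
  flavor_id: 'm1.small',
  groups: ['myGroup'],
  private_key_path: '~/.ssh/id_rsa',
  public_key_path: '~/.ssh/id_rsa.pub',
  username: 'ec2-user',
  user_data: user_data
)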

Related

Zip File from S3 files

Ruby '2.7.4'
Rails '~> 5.2.2'
I have access to an S3 bucket containing several files of various types, which I am trying to:
Download into memory
Put them all together inside a zip file
Upload this zip file into some S3 bucket
I've looked into several issues on the web already, without any success.
Specifically, I'm trying to use the rubyzip gem, but no matter what I do, I always end up with the error message: 'no implicit conversion of StringIO into String'
Here's a summary of my current code
gem 'rubyzip', require: 'zip'
require 'zip'
bucket_name = 'redacted'
zip_filename = "My final complete zip file.zip"
s3_client = Aws::S3::Client.new(region: 'eu-west-3')
s3_resource = Aws::S3::Resource.new(region: 'eu-west-3')
bucket = s3_resource.bucket(bucket_name)
s3_filename = 's3_file_name'
s3_file = s3_client.get_object(bucket: bucket_name, key: s3_filename)
file = s3_file.body
At this point, I have exactly one file, in a StringIO format.
However please bear in mind that I'm trying to reproduce this with several files, which means I want to bundle several files inside a final zip.
I'm failing to put this file into a zip and/or put the zip back into s3.
Attempt N°1
stringio = Zip::OutputStream.write_buffer do |zio|
  zio.put_next_entry("test1.zip")
  zio.write(file)
end
stringio.rewind
binary_data = stringio.sysread
Error message : no implicit conversion of StringIO into String
Attempt N°2
zip_file_name = 'my_test_file_name.zip'
File.open(zip_file_name, 'w') { |f| f.puts(file.rewind && file.read) }
final_zip = Zip::File.open(zip_filename, create: true) do |zipfile|
  zf = Zip::File.new(file, create: true, buffer: true)
  zipfile.add(zf.to_s, zip_file_name)
end
really_final_zip = Zip::File.new(final_zip, create: true, buffer: true)
new_object = bucket.object(zip_file_name)
new_object.put(body: final_zip)
Error Message : expected params[:body] to be a String or IO like object that supports read and rewind, got value #<Zip::Entry:0x0000558a06ff42a0
If instead of that last line, I write
new_object.put(body: final_zip.to_s)
A text file is created in S3 (instead of the zip) with the content #<StringIO:0x0000558a06c8c8d8>
You need to read the bytes from the file, so change
s3_file.body to s3_file.body.read
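Putting it all together, the end-to-end flow would look roughly like this (a minimal sketch, assuming the v3 aws-sdk-s3 gem; keys_to_zip is an illustrative list of the S3 keys you want to bundle):

require 'zip'
require 'aws-sdk-s3'

bucket_name  = 'redacted'
zip_filename = 'My final complete zip file.zip'
s3_client = Aws::S3::Client.new(region: 'eu-west-3')
bucket = Aws::S3::Resource.new(region: 'eu-west-3').bucket(bucket_name)

keys_to_zip = ['s3_file_name'] # illustrative: the S3 keys you want inside the archive

# Build the zip entirely in memory; write_buffer returns a StringIO
zip_io = Zip::OutputStream.write_buffer do |zio|
  keys_to_zip.each do |key|
    contents = s3_client.get_object(bucket: bucket_name, key: key).body.read # a String, not a StringIO
    zio.put_next_entry(File.basename(key))
    zio.write(contents)
  end
end

zip_io.rewind
bucket.object(zip_filename).put(body: zip_io.read)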

Getting a datafile from AWS S3 Bucket and parse in Rails?

I'm creating a Ruby script in Rails that will:
1) create an S3 object with AWS S3 SDK
2) iterate through bucket and download (get) each file
3) through iteration, store the file in memory and then convert to a string
4) parse the string for data and re-upload the file, based on the parsed data, to an appropriate folder in the same bucket
So here is the code I have so far in a Rails job:
def aws_get
  io = IO.new(1)
  bucket_col = []
  s3 = Aws::S3::Resource.new(
    region: 'us-east-1',
    access_key_id: Rails.application.credentials.dig(:aws, :access_key_id),
    secret_access_key: Rails.application.credentials.dig(:aws, :secret_access_key)
  )
  s3.bucket('missouridata').objects.each do |object|
    obj = s3.bucket('missouridata').object(object.key)
    file = obj.get(response_target: io)
    ???
  end
end
The question marks are where I don't know what to do next. How do I take the file stored in memory and convert it to a string to be parsed?
I have the perfect solution for you. I have been using the fog gem to manipulate S3 buckets for a while. It can do pretty much anything for you.
Here is the reference link.
https://www.ironin.it/blog/manipulating-files-on-amazon-s3-storage-with-rubys-fog-gem.html
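If you would rather stay with the aws-sdk resource the question already uses, a minimal sketch of the loop could look like this (the parsing step and the target key are placeholders, not part of the original):

def aws_get
  s3 = Aws::S3::Resource.new(
    region: 'us-east-1',
    access_key_id: Rails.application.credentials.dig(:aws, :access_key_id),
    secret_access_key: Rails.application.credentials.dig(:aws, :secret_access_key)
  )

  s3.bucket('missouridata').objects.each do |summary|
    # Without a response_target the body is buffered in memory; #read returns a String
    contents = summary.get.body.read

    # Parse `contents` here, then re-upload under a key derived from the parsed data
    target_key = "sorted/#{summary.key}" # illustrative destination "folder"
    s3.bucket('missouridata').object(target_key).put(body: contents)
  end
end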

Ruby aws-sdk - ".exists?" says the file doesn't exist even though I see it in the bucket

I've been stuck all afternoon checking whether an uploaded file exists in AWS S3. I use Ruby on Rails and the gem called aws-sdk, v2.
First of all - the file exists in the bucket, it is located here:
test_bucket/users/10/file_test.pdf
There's no typo, this is the exact path. Also, the bucket + credentials are set up correctly.
And here's how I try to check the existence of the file:
config = {
  region: 'us-west-1',
  bucket: AWS_S3_CONFIG['bucket'],
  key: AWS_S3_CONFIG['access_key_id'],
  secret: AWS_S3_CONFIG['secret_access_key']
}

Aws.config.update(
  region: config[:region],
  credentials: Aws::Credentials.new(config[:key], config[:secret]),
  s3: { region: 'us-east-1' }
)

bucket = Aws::S3::Resource.new.bucket(config[:bucket])
puts bucket.object("file_test.pdf").exists?
The output is always false.
I also tried puts bucket.object("test_bucket/users/10/file_test.pdf").exists?, but still false.
I also tried making the file public in the AWS S3 dashboard, but no success: still false. The file is visible when I click the generated link, yet whenever I check for its existence with aws-sdk, the output is still false.
What am I doing wrong?
Thank you.
You need to pass the full path to the object (not including the bucket name) - users/10/file_test.pdf
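In other words, using the bucket from the code above:

# The key is the full path within the bucket, without the bucket name itself
puts bucket.object("users/10/file_test.pdf").exists?
# => true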

Rails 4, Fog, Amazon s3 - retrieving all the images as an array from a specific folder in a bucket.

I am using Amazon S3, Rails 4, and the fog gem. I have an Amazon bucket called uipstudy with 100 folders, each containing about 20 images. I use the following to get all the images in a specific folder (in my application_helper.rb, which is included in application_controller.rb).
def get_files(image_folder)
  connection = Fog::Storage.new(
    provider: 'AWS',
    aws_access_key_id: '######',
    aws_secret_access_key: '#######'
  )
  connection.directories.get('uipimages', prefix: image_folder).files.map do |file|
    file.key
  end
end
In my controller I have this (in this example I am looking in the folder "1" in the uipstudy bucket):
# Amazon solution:
@images = get_files('1')
@images.each do |image|
  image = "https://s3.amazonaws.com/uipstudy/#{image}"
  @image_array << image
end
The problem is that it's returning the files inside the folder labelled "1", but also those in 10, 11, 12, 13, etc. I assumed the prefix was an exact match, but apparently it is not. Is there a way to ensure the prefix matches exactly the folder specified?
I think you should be able to make a small change in your script to get the behavior you want. Simply append a forward slash to the prefix so that it clearly shows you want things that are like a directory instead of any/all things that begin with a particular character.
So, that would get you something like:
directory = connection.directories.get('uipimages', prefix: image_folder + '/')
directory.files.map do |file|
  file.key
end
(I just split it into two statements to make it easier to read.)
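Folded back into the get_files helper from the question, that would look roughly like this (credentials elided as in the original):

def get_files(image_folder)
  connection = Fog::Storage.new(
    provider: 'AWS',
    aws_access_key_id: '######',
    aws_secret_access_key: '#######'
  )
  # The trailing slash limits results to keys under "1/" rather than "10/", "11/", etc.
  connection.directories.get('uipimages', prefix: "#{image_folder}/").files.map(&:key)
end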
Below is my solution using the aws-sdk gem.

# Initialize the S3 client
s3 = AWS::S3.new
bucket = s3.buckets[ENV['AWS_BUCKET']]

# Regex for .ipa files in the _inbox folder
regex = %r{_inbox/(?:[^/]+/)*[^/]+\.ipa}i

# Get and process the .ipa files
bucket.objects.select { |o| o.key.match(regex) }.each do |ipa|
  # handle each matching .ipa object here
end

How to copy file across buckets using aws-s3 or aws-sdk gem in ruby on rails

The aws-s3 documentation says:
# Copying an object
S3Object.copy 'headshot.jpg', 'headshot2.jpg', 'photos'
But how do I copy headshot.jpg from the photos bucket to the archive bucket, for example?
Thanks!
Deb
AWS-SDK gem. S3Object#copy_to
Copies data from the current object to another object in S3.
S3 handles the copy so the client does not need to fetch the
data and upload it again. You can also change the storage
class and metadata of the object when copying.
It uses the copy_object method internally, so the copy functionality allows you to copy objects within or between your S3 buckets, and optionally to replace the metadata associated with the object in the process.
Standard method (download/upload)
Copy method
Code sample:
require 'aws-sdk'

AWS.config(
  :access_key_id => '***',
  :secret_access_key => '***',
  :max_retries => 10
)

file = 'test_file.rb'
bucket_0 = { :name => 'bucket_from', :endpoint => 's3-eu-west-1.amazonaws.com' }
bucket_1 = { :name => 'bucket_to', :endpoint => 's3.amazonaws.com' }

s3_interface_from = AWS::S3.new(:s3_endpoint => bucket_0[:endpoint])
bucket_from = s3_interface_from.buckets[bucket_0[:name]]
bucket_from.objects[file].write(open(file))

s3_interface_to = AWS::S3.new(:s3_endpoint => bucket_1[:endpoint])
bucket_to = s3_interface_to.buckets[bucket_1[:name]]
bucket_to.objects[file].copy_from(file, { :bucket => bucket_from })
Using the right_aws gem:
# With s3 being an S3 object acquired via S3Interface.new
# Copies key1 from bucket b1 to key1_copy in bucket b2:
s3.copy('b1', 'key1', 'b2', 'key1_copy')
The gotcha I ran into is that if you have pics/1234/yourfile.jpg, the bucket is only pics and the key is 1234/yourfile.jpg.
I got the answer from here: How do I copy files between buckets using s3 from a rails application?
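Applied to the right_aws call above, copying that object into an archive bucket would be something like the following (the archive bucket name is illustrative):

# The bucket is just "pics"; the key carries the rest of the path
s3.copy('pics', '1234/yourfile.jpg', 'archive', '1234/yourfile.jpg')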
For anyone still looking, AWS has documentation for this. It's actually very simple with the aws-sdk gem:
bucket = Aws::S3::Bucket.new('source-bucket')
object = bucket.object('source-key')
object.copy_to(bucket: 'target-bucket', key: 'target-key')
When using the AWS SDK gem's copy_from or copy_to, there are three things that aren't copied by default: the ACL, the storage class, and server-side encryption. You need to specify them as options.
from_object.copy_to from_object.key, {:bucket => 'new-bucket-name', :acl => :public_read}
https://github.com/aws/aws-sdk-ruby/blob/master/lib/aws/s3/s3_object.rb#L904
Here's a simple ruby class to copy all objects from one bucket to another bucket: https://gist.github.com/edwardsharp/d501af263728eceb361ebba80d7fe324
Multiple images can easily be copied with the aws-sdk gem as follows:
require 'aws-sdk'

image_names = ['one.jpg', 'two.jpg', 'three.jpg', 'four.jpg', 'five.png', 'six.jpg']

Aws.config.update({
  region: "destination_region",
  credentials: Aws::Credentials.new('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
})

# Create the client once, outside the loop
s3 = Aws::S3::Client.new

image_names.each do |img|
  s3.copy_object({
    bucket: "destination_bucket_name",
    copy_source: URI.encode_www_form_component("/source_bucket_name/path/to/#{img}"),
    key: "path/where/to/save/#{img}"
  })
end
If you have a lot of images to copy, it is a good idea to put the copying process in a background job.
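A minimal ActiveJob sketch along those lines, reusing the copy_object call above (the job class and queue names are illustrative):

class CopyImagesJob < ApplicationJob
  queue_as :default

  def perform(image_names)
    s3 = Aws::S3::Client.new
    image_names.each do |img|
      s3.copy_object(
        bucket: "destination_bucket_name",
        copy_source: URI.encode_www_form_component("/source_bucket_name/path/to/#{img}"),
        key: "path/where/to/save/#{img}"
      )
    end
  end
end

# Enqueue it instead of copying inline:
# CopyImagesJob.perform_later(image_names)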
I believe that in order to copy between buckets you must read the file's contents from the source bucket and then write them back to the destination bucket through your application's memory space. There's a snippet showing this using aws-s3 here, and another approach using right_aws here.
The aws-s3 gem does not have the ability to copy files in between buckets without moving files to your local machine. If that's acceptable to you, then the following will work:
AWS::S3::S3Object.store 'dest-key', open('http://url/to/source.file'), 'dest-bucket'
I ran into the same issue that you had, so I cloned the source code for AWS-S3 and made a branch that has a copy_to method that allows for copying between buckets, which I've been bundling into my projects and using when I need that functionality. Hopefully someone else will find this useful as well.
View the branch on GitHub.
