Zlib gunzip only returning partial file - ruby-on-rails

I have a 27 MB .gz file (127 MB unzipped). Using Ruby's Zlib to gunzip the file returns correctly formatted data, but the output is truncated to a fraction of the expected size (1,290 rows of data out of 253,000).
string_io = StringIO.new(body)
file = File.new("test.json.gz", "w+")
file.puts string_io.read
file.close
# string_io.read.length == 26_675_650
# File.size("test.json.gz") == 27_738_775
Using GzipReader:
data = ""
File.open(file.path) do |f|
  gz = Zlib::GzipReader.new(f)
  data << gz.read
  gz.close
end
# data.length == 603_537
Using a different GzipReader method:
data = ""
Zlib::GzipReader.open(file.path) do |gz|
  data << gz.read
end
# data.length == 603_537
Using gunzip:
gz = Zlib.gunzip(string_io.read)
# gz.length == 603_537
The expected size is 127,604,690 but I'm only able to extract 603,537. Using gunzip in my terminal correctly extracts the entire file but I'm looking for a programmatic way to handle this.
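One frequent cause of this exact symptom is a gzip file made up of several concatenated members: command-line gunzip decompresses all of them, while Zlib::GzipReader (and Zlib.gunzip) stops after the first member. A sketch of reading every member, using GzipReader#unused to rewind to the start of the next one (the method name gunzip_all is made up for illustration):

```ruby
require "zlib"

# Read every member of a possibly multi-member gzip file.
# Zlib::GzipReader stops at the end of the first member; the bytes it
# over-read are available via #unused, so we rewind the underlying IO
# and start a fresh reader for the next member.
def gunzip_all(path)
  data = +""
  File.open(path, "rb") do |f|
    until f.eof?
      gz = Zlib::GzipReader.new(f)
      data << gz.read
      unused = gz.unused              # raw bytes read past this member's end
      gz.finish                       # finish the stream without closing f
      f.pos -= unused.bytesize if unused
    end
  end
  data
end
```

Recent Ruby versions also ship Zlib::GzipReader.zcat, which handles multi-member files directly; the manual loop above works on older versions as well.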

Instead of opening a file and passing a file handle, have you tried using Zlib::GzipReader.open()? It's documented here: https://ruby-doc.org/stdlib/libdoc/zlib/rdoc/Zlib/GzipReader.html
I tested locally and was able to get proper results:
data = ''
=> ""
Zlib::GzipReader.open('file.tar.gz') { |gz|
  data << gz.read
}
data.length
=> 750003
Then checked the file size uncompressed:
gzip -l file.tar.gz
compressed  uncompressed  ratio  uncompressed_name
    315581        754176  58.1%  file.tar
Edit: Saw your update that you are pulling the data via S3 API. Make sure you are Base64 decoding your body before writing it to a file.
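If the body really is Base64-encoded, a minimal sketch of decoding it and writing the result in binary mode, so no text-mode translation or trailing newline corrupts the gzip data (save_s3_body is a hypothetical helper name):

```ruby
require "base64"

# Decode the Base64 body and write it with "wb" (binary mode):
# File#puts in text mode can alter or append bytes, which corrupts
# a compressed stream.
def save_s3_body(body, path)
  File.open(path, "wb") { |f| f.write(Base64.decode64(body)) }
end
```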

Related

Can't upload image to active storage

I am unable to store a base64 file using ActiveStorage. I am receiving a base64 string from my client:
params["image"] = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAREAAABMCAYAAABK84MTAAAABHNCSVQICAgIfAhkl0RVh"
and when I try to attach it I get:
ActiveSupport::MessageVerifier::InvalidSignature(ActiveSupport::MessageVerifier::InvalidSignature):
I've followed many tutorials, tried decoding it first:
decoded_image = Base64.decode64(params["image"])
post.image.attach(decoded_image)
As well as removing the data:image/png;base64 part from the string with:
decoded_image = Base64.decode64(params["image"]['data:image/png;base64,'.length .. -1])
And then attaching the image, with no success. When I do it directly from a file with:
file = open("image.png")
post.image.attach(io: file, filename: "post.png")
It works perfectly, so I think my mistake is in the parsing of the string.
I'm not sure if it will work, but give this approach a try, creating a temporary file with Tempfile:
encoded_image = params["image"]['data:image/png;base64,'.length .. -1]
decoded_image = Base64.decode64(encoded_image)

file = Tempfile.new
file.binmode
file.write(decoded_image)
file.rewind

post.image.attach(
  io: file,
  filename: "post.png" # The filename should be taken from the parameters as well
)

file.close
file.unlink
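As an alternative to the Tempfile, ActiveStorage's attach accepts any IO-like object, so the decoded bytes can be wrapped in a StringIO directly. A sketch, where decode_data_uri is a hypothetical helper:

```ruby
require "base64"
require "stringio"

# Hypothetical helper: strip a "data:<mime>;base64," prefix (if present)
# and decode the remaining Base64 payload.
def decode_data_uri(data_uri)
  Base64.decode64(data_uri.sub(%r{\Adata:[\w/+.-]+;base64,}, ""))
end

# With ActiveStorage (not shown here), the result can then be attached
# without any temporary file:
#   post.image.attach(
#     io: StringIO.new(decode_data_uri(params["image"])),
#     filename: "post.png",
#     content_type: "image/png"
#   )
```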

iTMSTransporter metadata.xml md5 utility ios

So I have 100 achievements to upload, rather than using the website I thought it may be faster to create a metadata.xml file and use iTMSTransporter to upload the data. Unfortunately one snag is a MD5 checksum must be computed for each image file, or Apple rejects the entire itmsp package. Requiring this almost invalidates the whole "ease" of using iTMSTransporter.
Is there a utility to parse the metadata file and update it with the checksums? Or perhaps something which generates a metadata file and does it?
There is a command line program that will generate the metadata.xml file and compute the files' checksums. It requires you to put your metadata in a YAML file which it turns into a metadata.xml: https://github.com/colinhumber/itunes_transporter_generator
You can use this script to update a directory containing a metadata.xml file (or files) and assets:
require "rexml/document"
require "digest"

def set_checksum(path)
  xml = File.read(path)
  doc = Document.new(xml)
  doc.get_elements("//achievement//file_name").each do |e|
    next unless e.text =~ /\S/
    file = File.join($source, e.text.strip)
    puts "Computing checksum for #{file}"
    $md5.file(file)
    checksum = $md5.hexdigest!
    node = e.parent.elements["checksum"]
    node = Element.new("checksum", e.parent) unless node
    node.text = checksum
    node.add_attribute("type", "md5")
  end
  puts "Saving updated file"
  File.write(path, doc.to_s)
end

include REXML

$source = ARGV.shift || Dir.pwd
$md5 = Digest::MD5.new

Dir["#$source/*.xml"].each do |path|
  puts "Processing #{path}"
  set_checksum(path)
end
Use it as follows:
> ruby script.rb
or
> ruby script.rb /path/to/metadata/directory

Ruby create tar ball in chunks to avoid out of memory error

I'm trying to re-use the following code to create a tar ball:
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar", "w")
Gem::Package::TarWriter.new(tarfile) do |tar|
  Dir[File.join(path, "**/*")].each do |file|
    mode = File.stat(file).mode
    relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
    if File.directory?(file)
      tar.mkdir relative_file, mode
    else
      tar.add_file relative_file, mode do |tf|
        File.open(file, "rb") { |f| tf.write f.read }
      end
    end
  end
end
tarfile.rewind
tarfile
It works fine as long as only small folders are involved, but anything large fails with the following error:
Error: Your application used more memory than the safety cap
How can I do it in chunks to avoid the memory problems?
It looks like the problem could be in this line:
File.open(file, "rb") { |f| tf.write f.read }
You are "slurping" your input file by doing f.read. Slurping means the entire file is read into memory, which isn't scalable at all, and is the result of calling read without a length.
Instead, I'd do something to read and write the file in blocks so you have a consistent memory usage. This reads in 1MB blocks. You can adjust that for your own needs:
BLOCKSIZE_TO_READ = 1024 * 1000

File.open(file, "rb") do |fi|
  while buffer = fi.read(BLOCKSIZE_TO_READ)
    tf.write buffer
  end
end
Here's what the documentation says about read:
If length is a positive integer, it tries to read length bytes without any conversion (binary mode). It returns nil or a string whose length is 1 to length bytes. nil means it met EOF at the beginning. A string of 1 to length-1 bytes means it met EOF after reading the result. A string of length bytes means it did not meet EOF. The resulting string is always ASCII-8BIT encoding.
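The same semantics are easy to see with a StringIO standing in for a file:

```ruby
require "stringio"

io = StringIO.new("abcde")
p io.read(3)  # => "abc"  (got the full 3 bytes; EOF not reached)
p io.read(3)  # => "de"   (EOF met after reading fewer than 3 bytes)
p io.read(3)  # => nil    (EOF met at the beginning)
```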
An additional problem is it looks like you're not opening the output file correctly:
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")
You're writing it in "text" mode because of "w". Instead, you need to write in binary mode, "wb", because tarballs contain binary data:
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","wb")
Rewriting the original code to be more like I'd want to see it, results in:
BLOCKSIZE_TO_READ = 1024 * 1000

def create_tarball(path)
  tar_filename = Pathname.new(path).realpath.to_path + '.tar'
  File.open(tar_filename, 'wb') do |tarfile|
    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, '**/*')].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub(/^#{ Regexp.escape(path) }\/?/, '')
        if File.directory?(file)
          tar.mkdir(relative_file, mode)
        else
          tar.add_file(relative_file, mode) do |tf|
            File.open(file, 'rb') do |f|
              while buffer = f.read(BLOCKSIZE_TO_READ)
                tf.write buffer
              end
            end
          end
        end
      end
    end
  end
  tar_filename
end
BLOCKSIZE_TO_READ should be at the top of your file since it's a constant and is a "tweakable" - something more likely to be changed than the body of the code.
The method returns the path to the tarball, not an IO handle like the original code. Using the block form of File.open automatically closes the output, so any subsequent open starts from the beginning of the file. I much prefer passing around path strings rather than IO handles for files.
I also wrapped some of the method parameters in enclosing parentheses. While parentheses aren't required around method parameters in Ruby, and some people eschew them, I think they make the code more maintainable by delimiting where the parameters start and end. They also avoid confusing Ruby when you're passing parameters and a block to a method, a well-known cause of bugs.
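A quick illustration of that block-binding ambiguity (first_arg is a made-up single-argument method):

```ruby
# A made-up method that takes one argument and no block;
# a block passed to it is silently ignored.
def first_arg(arg)
  arg
end

# With braces, the block binds to the nearest call (`map`):
with_braces = first_arg [1, 2, 3].map { |x| x * 2 }

# With do...end, the block binds to the outer call (`first_arg`),
# so `map` runs with no block and returns an Enumerator instead:
with_do_end = first_arg [1, 2, 3].map do |x|
  x * 2
end
```

Wrapping the argument in parentheses, first_arg([1, 2, 3].map { |x| x * 2 }), removes the ambiguity entirely.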
minitar looks like it writes to a stream so I don't think memory will be a problem. Here is the comment and definition of the pack method (as of May 21, 2013):
# A convenience method to pack files specified by +src+ into +dest+. If
# +src+ is an Array, then each file detailed therein will be packed into
# the resulting Archive::Tar::Minitar::Output stream; if +recurse_dirs+
# is true, then directories will be recursed.
#
# If +src+ is not an Array, it will be treated as the argument to Find.find;
# all files matching will be packed.
def pack(src, dest, recurse_dirs = true, &block)
  Output.open(dest) do |outp|
    if src.kind_of?(Array)
      src.each do |entry|
        pack_file(entry, outp, &block)
        if dir?(entry) and recurse_dirs
          Dir["#{entry}/**/**"].each do |ee|
            pack_file(ee, outp, &block)
          end
        end
      end
    else
      Find.find(src) do |entry|
        pack_file(entry, outp, &block)
      end
    end
  end
end
Example from the README to write a tar:
# Packs everything that matches Find.find('tests')
File.open('test.tar', 'wb') { |tar| Minitar.pack('tests', tar) }
Example from the README to write a gzipped tar:
tgz = Zlib::GzipWriter.new(File.open('test.tgz', 'wb'))
# Warning: tgz will be closed!
Minitar.pack('tests', tgz)

How to write to tmp file or stream an image object up to s3 in ruby on rails

The code below resizes my image. But I am not sure how to write it out to a temp file or blob so I can upload it to s3.
origImage = MiniMagick::Image.open(myPhoto.tempfile.path)
origImage.resize "200x200"
thumbKey = "tiny-#{key}"
obj = bucket.objects[thumbKey].write(:file => origImage.write("tiny.jpg"))
I can upload the original file just fine to s3 with the below command:
obj = bucket.objects[key].write('data')
obj.write(:file => myPhoto.tempfile)
I think I want to create a temp file, read the image file into it and upload that:
thumbFile = Tempfile.new('temp')
thumbFile.write(origImage.read)
obj = bucket.objects[thumbKey].write(:file => thumbFile)
but the origImage class doesn't have a read command.
UPDATE: I was reading the source code and found this out about the write command
# Writes the temporary file out to either a file location (by passing in a String) or by
# passing in a Stream that you can #write(chunk) to repeatedly
#
# @param output_to [IOStream, String] Some kind of stream object that needs to be read or a file path as a String
# @return [IOStream, Boolean] If you pass in a file location [String] then you get a success boolean. If it's a stream, you get it back.
# Writes the temporary image that we are using for processing to the output path
And the s3 api docs say you can stream the content using a code block like:
obj.write do |buffer, bytes|
  # writing fewer than the requested number of bytes to the buffer
  # will cause write to stop yielding to the block
end
How do I change my code so
origImage.write(s3stream here)
http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html
UPDATE 2
This code successfully uploads the thumbnail file to s3. But I would still love to know how to stream it up. It would be much more efficient I think.
#resize image and upload a thumbnail
smallImage = MiniMagick::Image.open(myPhoto.tempfile.path)
smallImage.resize "200x200"
thumbKey = "tiny-#{key}"
newFile = Tempfile.new("tempimage")
smallImage.write(newFile.path)
obj = bucket.objects[thumbKey].write('data')
obj.write(:file => newFile)
smallImage.to_blob ?
The code below is copied from https://github.com/probablycorey/mini_magick/blob/master/lib/mini_magick.rb:
# Gives you raw image data back
# @return [String] binary string
def to_blob
  f = File.new @path
  f.binmode
  f.read
ensure
  f.close if f
end
Have you looked into the Paperclip gem? It offers direct compatibility with S3 and works great.

Converting binary IOstream into file

I am using a Rails server and sending a raw HTTP request. request.body contains a file that I want to upload; it is a StringIO object. I want to upload this file to my server.
This writes the file to disk in 1 MB (1024**2) chunks; it opens the output in binary mode ("wb") since the body may contain binary data. Reading the whole file in at once can leave you open to a DoS with huge files.
File.open("where-you-want-the-file", "wb") do |f|
  while blk = request.body.read(1024**2)
    f << blk
  end
end
