I have generated many PDF files in memory and I want to compress them into one zip file before sending it as an email attachment. I have looked at Rubyzip and it does not seem to allow me to create a zip file without saving it to disk (maybe I am wrong).
Is there any way I can compress those files without creating a temp file?
I had a similar problem, which I solved using the rubyzip gem and a StringIO object.
It turns out that rubyzip provides a method that returns a StringIO: ZipOutputStream.write_buffer.
You can create the zip file structure as you like using put_next_entry and write, and once you are finished you can rewind the StringIO and read the binary data using sysread.
See the following simple example (works for rubyzip 0.9.x):
require 'zip/zip'

# In rubyzip 0.9.x the stream class still carries the Zip prefix
# (it was renamed to Zip::OutputStream in 1.0.0)
stringio = Zip::ZipOutputStream.write_buffer do |zio|
  zio.put_next_entry("test.txt")
  zio.write "Hello world!"
end
stringio.rewind
binary_data = stringio.sysread
Tested on jruby 1.6.5.1 (ruby-1.9.2-p136) (2011-12-27 1bf37c2) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_29) [Windows Server 2008-amd64-java]
The following example works for rubyzip >= 1.0.0:
require 'rubygems'
require 'zip'

stringio = Zip::OutputStream.write_buffer do |zio|
  zio.put_next_entry("test.txt")
  zio.write "Hello world!"
end
binary_data = stringio.string # no rewind needed; #string returns the whole buffer
Tested on jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on OpenJDK 64-Bit Server VM 1.7.0_79-b14 +jit [linux-amd64] and rubyzip gem 1.1.7
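If the goal is to email the result, the binary string can be attached directly. Here is a minimal sketch using ActionMailer; the mailer class and names are my own assumptions, not from the question:

class ReportMailer < ActionMailer::Base
  def report(recipient, zip_data)
    # zip_data is the binary string produced by write_buffer above
    attachments['reports.zip'] = {
      mime_type: 'application/zip',
      content:   zip_data
    }
    mail(to: recipient, subject: 'Your PDF reports')
  end
end

ReportMailer.report('user@example.com', binary_data).deliver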
Ruby comes with a very convenient StringIO library - it can be used to treat a String as an output IO object, or to fake reading from a file backed by a String.
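For example, in plain Ruby:

require 'stringio'

io = StringIO.new
io.write("hello")   # use it as an output IO object
io.rewind
io.read             # => "hello"

fake_file = StringIO.new("line 1\nline 2\n")
fake_file.gets      # => "line 1\n", just like reading a File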
The challenge here is that RubyZip does not directly accept an IO object when creating a Zip::ZipOutputStream. But if you look at the implementation of initialize, and depending on your willingness to experiment, you may be able to extend the class so that its constructor takes either an IO object or a file name.
There are two RubyZip libraries that I was able to find.
Chilkat's Ruby Zip Library
rubyzip on Sourceforge
Chilkat's library definitely allows one to create a zip file in memory instead of automatically writing it to disk, as seen in these links: Zip to Memory, Zip from in memory data
The one on SourceForge, on the other hand, may provide an option for zipping a file in memory, but I'm not entirely certain since I'm very new to Ruby. The SourceForge rubyzip is based on java.util.zip, which has led to it having a class called ZipOutputStream. I don't know how good the rubyzip implementation is, but with the java.util.zip implementation the OutputStream can be set to ByteArrayOutputStream, FileOutputStream, FilterOutputStream, ObjectOutputStream, OutputStream, PipedOutputStream....
If that holds true for the rubyzip implementation, then it should be a matter of using ZipOutputStream to pass in a ByteArrayOutputStream of sorts, which would result in the output going to memory.
If that doesn't exist in rubyzip, then I'm sure you could always write your own implementation and submit it for inclusion in rubyzip, seeing as it is open source.
If you're on Linux, and depending upon how much RAM you have, and how large your files are, you could always use tmpfs (shared memory). Then, the rubyzip disk-based methods will work. http://www.mjmwired.net/kernel/Documentation/filesystems/tmpfs.txt
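For instance, on most Linux distributions /dev/shm is a tmpfs mount, so a sketch like the following keeps the "file" entirely in RAM (the path is an assumption about your system's mounts):

require 'zip'

path = "/dev/shm/attachments-#{Process.pid}.zip"
Zip::OutputStream.open(path) do |zio|  # the ordinary disk-based API
  zio.put_next_entry('test.txt')
  zio.write 'Hello world!'
end
binary_data = File.binread(path)
File.delete(path)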
The accepted answer works well, but it didn't solve my problem: I didn't want to use the write_buffer method because it automatically closes the stream when the block ends. The code snippet below gives you more control over when the stream is created and closed.
require 'stringio'
require 'zip'
io = StringIO.new
zip_io = Zip::OutputStream.new(io, true) # 'true' indicates 'io' is a stream
zip_io.put_next_entry('test.txt')
zip_io.write('Hello world!')
# Close the zip stream first (this writes the central directory),
# then read the data back from the underlying StringIO
zip_io.close_buffer
io.rewind
binary_data = io.read
io.close
I want to extract images from an m4v video sent from mobile to my Rails server. These images will later be used for face recognition purposes. There is a gem called "streamio-ffmpeg" that does this job nicely and easily, but the problem is that it does not support JRuby 1.7.13, which I am currently using on my server. It's a big application and upgrading the JRuby version is not desirable at this moment.
Can someone please suggest JRuby 1.7.13-compatible alternative solutions/gems to extract the images from a video file?
From the source code, it looks like streamio-ffmpeg logs the underlying command by default:
FFMPEG.logger.info("Running transcoding...\n#{command}\n")
So all you have to do is execute:
movie.screenshot("screenshot_%d.jpg", { vframes: 50, frame_rate: '6/2' }, validate: false)
on a system where streamio-ffmpeg is installed.
You look at the output, extract the command, and use it somewhere else with:
system("ffmpeg arguments_you_extracted_from_the_logs")
without having to install streamio-ffmpeg.
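As a rough sketch, the extracted command could then be run directly; the paths below are placeholders, and the options mirror the screenshot call above:

input  = 'upload.m4v'
output = 'screenshot_%d.jpg'

# -r sets the frame rate and -vframes caps the number of extracted frames,
# matching frame_rate: '6/2' and vframes: 50 from the gem call
ok = system('ffmpeg', '-i', input, '-r', '6/2', '-vframes', '50', output)
raise 'ffmpeg failed' unless ok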
I need to cache an FTP folder locally in Ruby. Right now I'm using ftp_sync to download the FTP folder, but it's painfully slow. Do you guys know of any library that can download the folder's files in parallel?
Thanks!
The syncftp gem may help you:
http://rubydoc.info/gems/syncftp/0.0.3/frames
Ruby has a decent built-in FTP library in case you want to roll your own:
http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/ftp/rdoc/Net/FTP.html
To download files in parallel, you can use multiple threads with timeouts:
Ruby Net::FTP Timeout Threads
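A minimal sketch of that approach, with placeholder host, credentials, and file names:

require 'net/ftp'
require 'timeout'

files = %w[a.pdf b.pdf c.pdf]  # remote files to fetch
threads = files.map do |name|
  Thread.new do
    Timeout.timeout(60) do  # give up on a stalled transfer
      # each thread gets its own connection, so transfers run in parallel
      Net::FTP.open('ftp.example.com', 'user', 'password') do |ftp|
        ftp.passive = true
        ftp.getbinaryfile(name, File.join('cache', name))
      end
    end
  end
end
threads.each(&:join)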
A great way to get parallel work done is Celluloid, the concurrent framework:
https://github.com/celluloid/celluloid
All that said, if the download speed is limited to your overall network bandwidth, then none of these approaches will help much.
To speed up the transfers in this case, be sure you're only downloading the information that's changed: new files and changed sections of existing files.
Segmented downloading can give massive speedups in some cases, such as downloading big log files where only a small percentage of the file has changed, and the changes are all appends at the end of the file.
You can also consider shelling out to the command line. There are many tools that can help you with this. A good general-purpose one is "curl", which supports simple ranges for FTP files as well. For example, you can get the first 100 bytes of a document using FTP like this:
curl -r 0-99 ftp://www.get.this/README
Are you open to other protocols besides FTP? Take a look at the "rsync" command, which is excellent for download synchronization. The rsync command has many optimizations to transfer just the changed data. For example rsync can sync a remote directory to a local directory like this:
rsync -auvC me@my.com:/remote/foo/ /local/foo/
Take a look at Curb. It's a wrapper around libcurl, and can do multiple connections in parallel.
This is a modified version of one of their examples:
require 'curb'
urls = %w[
  http://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p286.tar.bz2
  http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
]
responses = {}
m = Curl::Multi.new

# add a few easy handles
urls.each do |url|
  responses[url] = Curl::Easy.new(url)
  puts "Queuing #{ url }..."
  m.add(responses[url])
end

spinner_counter = 0
spinner = %w[ | / - \\ ]  # '\\' yields a literal backslash inside %w
m.perform do
  print 'Performing downloads ', spinner[spinner_counter], "\r"
  spinner_counter = (spinner_counter + 1) % spinner.size
end
puts

urls.each do |url|
  print "[#{ url } #{ responses[url].total_time } seconds] Saving #{ responses[url].body_str.size } bytes..."
  File.open(File.basename(url), 'wb') { |fo| fo.write(responses[url].body_str) }
  puts 'done.'
end
That'll pull in both the Ruby and Python source (which are pretty big so they'll take about a minute, depending on your internet connection and host). You won't see any files appear until the last block, where they get written out.
Firstly, I am aware that there are quite a few questions on SO similar to this one. I have read most, if not all, of them over the past week, but I still can't make this work for me.
I am developing a Ruby on Rails app that allows users to upload mp3 files to Amazon S3. The upload itself works perfectly, but a progress bar would greatly improve user experience on the website.
I am using the aws-sdk gem which is the official one from Amazon. I have looked everywhere in its documentation for callbacks during the upload process, but I couldn't find anything.
The files are uploaded one at a time directly to S3 so it doesn't need to load it into memory. No multiple file upload necessary either.
I figured that I may need to use jQuery to make this work, and I am fine with that.
I found this that looked very promising: https://github.com/blueimp/jQuery-File-Upload
And I even tried following the example here: https://github.com/ncri/s3_uploader_example
But I just could not make it work for me.
The documentation for aws-sdk also BRIEFLY describes streaming uploads with a block:
obj.write do |buffer, bytes|
  # writing fewer than the requested number of bytes to the buffer
  # will cause write to stop yielding to the block
end
But this is barely helpful. How does one "write to the buffer"? I tried a few intuitive options that would always result in timeouts. And how would I even update the browser based on the buffering?
Is there a better or simpler solution to this?
Thank you in advance.
I would appreciate any help on this subject.
The "buffer" object yielded when passing a block to #write is an instance of StringIO. You can write to the buffer using #write or #<<. Here is an example that uses the block form to upload a file.
file = File.open('/path/to/file', 'rb') # binary mode avoids corruption on Windows
obj = s3.buckets['my-bucket'].objects['object-key']
obj.write(:content_length => file.size) do |buffer, bytes|
  buffer.write(file.read(bytes))
  # you could do some interesting things here to track progress
end
file.close
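For example, a sketch of simple progress reporting inside that block (the percentage output is my addition, not part of the aws-sdk API):

file  = File.open('/path/to/file', 'rb')
total = file.size
sent  = 0

obj.write(:content_length => total) do |buffer, bytes|
  chunk = file.read(bytes).to_s  # the final read may return fewer bytes
  buffer.write(chunk)
  sent += chunk.bytesize
  puts format('%.1f%% uploaded', 100.0 * sent / total)
end
file.close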
After reading the source code of the AWS gem, I've adapted (or mostly copied) the multipart upload method to yield the current progress based on how many chunks have been uploaded:
s3 = AWS::S3.new.buckets['your_bucket']

file = File.open(filepath, 'r', encoding: 'BINARY')
file_to_upload = "#{s3_dir}/#{filename}"
upload_progress = 0

opts = {
  content_type: mime_type,
  cache_control: 'max-age=31536000',
  estimated_content_length: file.size,
}

part_size = self.compute_part_size(opts)
parts_number = (file.size.to_f / part_size).ceil.to_i
obj = s3.objects[file_to_upload]

begin
  obj.multipart_upload(opts) do |upload|
    until file.eof?
      break if upload.aborted?
      upload.add_part(file.read(part_size))
      upload_progress += 1.0 / parts_number
      # Yields the Float progress and the current upload object
      yield(upload_progress, upload) if block_given?
    end
  end
ensure
  file.close # release the file handle even if the upload fails
end
The compute_part_size method is defined here and I've modified it to this:
def compute_part_size options
  max_parts = 10000
  min_size = 5242880 # 5 MB
  estimated_size = options[:estimated_content_length]
  [(estimated_size.to_f / max_parts).ceil, min_size].max.to_i
end
This code was tested on Ruby 2.0.0p0
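If you wrap the snippet above in a method, usage could look like this (upload_with_progress is a hypothetical name for such a wrapper):

upload_with_progress(filepath) do |progress, upload|
  printf("\r%5.1f%% uploaded", progress * 100)
end
puts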
I'm having some problems reading a file from S3. I want to be able to load the ID3 tags remotely, but using open-URI doesn't work; it gives me the following error:
ruby-1.8.7-p302 > c=TagLib2::File.new(open(URI.parse("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514")))
TypeError: can't convert Tempfile into String
from (irb):8:in `initialize'
from (irb):8:in `new'
from (irb):8
However, if I download the same file and put it on my desktop (i.e. no need for open-URI), it works just fine.
c=TagLib2::File.new("/Users/momofwombie/Desktop/blah.mp3")
Is there something else I should be doing to read a remote file?
UPDATE: I just found this link, which may explain a little bit, but surely there must be some way to do this...
Read header data from files on remote server
Might want to check out AWS::S3, a Ruby library for Amazon's Simple Storage Service.
Do an AWS::S3::S3Object.find for the file and then use about to retrieve the metadata.
This solution assumes you have the AWS credentials and permission to access the S3 bucket that contains the files in question.
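A short sketch with the aws-s3 gem; the bucket name and key here are guesses based on the URL in the question:

require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => 'YOUR_ACCESS_KEY',
  :secret_access_key => 'YOUR_SECRET_KEY'
)

song = AWS::S3::S3Object.find('music/745/original/The Stranger.mp3', 'recordtemple')
song.about  # metadata such as content-type and content-length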
TagLib2::File.new doesn't take a file handle, which is what you are passing to it when you use open without a read.
Add on read and you'll get the contents of the URL, but TagLib2::File doesn't know what to do with that either, so you are forced to read the contents of the URL, and save it.
I also noticed you are unnecessarily complicating your use of OpenURI. You don't have to parse the URL using URI before passing it to open. Just pass the URL string.
require 'open-uri'
fname = File.basename($0) << '.' << $$.to_s
File.open(fname, 'wb') do |fo|
  fo.print open("http://recordtemple.com.s3.amazonaws.com/music/745/original/The%20Stranger.mp3?1292096514").read
end
c = TagLib2::File.new(fname)
# do more processing...
File.delete(fname)
I don't have TagLib2 installed but I ran the rest of the code and the mp3 file downloaded to my disk and is playable. The File.delete would clean up afterwards, which should put you in the state you want to be in.
This solution isn't going to work much longer. Paperclip > 3.0.0 has removed to_file. I'm using S3 & Heroku. What I ended up doing was copying the file to a temporary location and parsing it from there. Here is my code:
dest = Tempfile.new(upload.spreadsheet_file_name)
dest.binmode
upload.spreadsheet.copy_to_local_file(:default_style, dest.path)
file_loc = dest.path
...
CSV.foreach(file_loc, :headers => true, :skip_blanks => true) do |row|
  # process each row...
end
This seems to work instead of open-URI:
Mp3Info.open(mp3.to_file.path) do |mp3info|
  puts mp3info.tag.artist
end
Paperclip has a to_file method that downloads the file from S3.
All I want to do is get all the content from a local file and store it in a variable. How?
File.read(@icon.full_filename).each {|l| r += l}
only gives me a part of it. In PHP, I just used file_get_contents.
data = File.read("/path/to/file")
I think you should consider using IO.binread("/path/to/file") if you have a recent Ruby interpreter (i.e. >= 1.9.2).
You can find the IO class documentation here: http://www.ruby-doc.org/core-2.1.2/IO.html
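For example:

data  = IO.binread('/path/to/file')           # whole file, binary mode
head  = IO.binread('/path/to/file', 100)      # first 100 bytes
chunk = IO.binread('/path/to/file', 100, 50)  # 100 bytes from offset 50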
Answering my own question here... it turns out it's a Windows-only quirk that happens when reading binary files (in my case a JPEG): the open or File.open call requires an additional mode flag. I revised it to open("/path/to/file", 'rb') {|io| a = a + io.read} and all was fine.