How to make Ruby Net::SFTP not block, so I can begin streaming a remote download immediately?

How can I acquire a remote file using Net::SFTP and stream it without waiting for the entire file to download?
My test.zip file is 1GB in size. When I run this code my browser does nothing for several minutes and then the download finally begins. I'd like it to begin streaming the file sooner than that. I have to use Net::SFTP or something like it since the file is only available via SSH or SFTP.
I've also tried Net::SFTP's download and download! methods and got the same result.
headers['Content-Type'] = 'application/zip'
headers['Content-Disposition'] = "attachment; filename=test.zip"
self.response_body = Enumerator.new do |lines|
  Net::SFTP.start('example.com', 'foo', keys: ['~/.ssh/id_rsa.pub']) do |sftp|
    file = sftp.file.open('/path/to/test.zip')
    lines << file.read(4096) until file.eof?
  end
end

Related

How to read a pdf response from Rails app and save to file with parallel tests?

Ok - I have the following in my test/test_helper.rb:
def read_pdf_from_response(response)
  file = Tempfile.new
  file.write response.body.force_encoding('UTF-8')
  begin
    reader = PDF::Reader.new(file)
    reader.pages.map(&:text).join.squeeze("\n")
  ensure
    file.close
    file.unlink
  end
end
I use it like this in an integration test:
get project_path(project, format: 'pdf')
read_pdf_from_response(@response).tap do |pdf|
  assert_match(/whatever/, pdf)
end
This works fine as long as I run a single test or run all tests with only one worker, e.g. PARALLEL_WORKERS=1. But tests that use this method fail intermittently when I run my suite with more than one parallel worker. My laptop has 8 cores, so that's normally what it runs with.
Here's the error:
PDF::Reader::MalformedPDFError: PDF malformed, expected 5 but found 96 instead
or sometimes: PDF::Reader::MalformedPDFError: PDF file is empty
The PDF reader is https://github.com/yob/pdf-reader, which hasn't otherwise given me any problems.
The controller that sends the PDF returns like so:
send_file out_file,
  filename: "#{@project.name}.pdf",
  type: 'application/pdf',
  disposition: (params[:download] ? 'attachment' : 'inline')
I can't see why this isn't working. No two files should ever have the same name at the same time, since I'm using Tempfile, right? How can I make all this run with parallel tests?
While I cannot confirm why this is happening, the issue may be that:
You are forcing the encoding to "UTF-8", but PDF documents are binary files, so this conversion could be damaging the PDF.
Some of the responses you are receiving are genuinely empty or malformed.
Maybe try this instead:
def read_pdf_from_response(response)
  doc = StringIO.new(response.body.to_s)
  begin
    PDF::Reader.new(doc)
      .pages
      .map(&:text)
      .join
      .squeeze("\n")
  rescue PDF::Reader::MalformedPDFError => e
    # handle issues with the PDF itself
  end
end
This will avoid the file system altogether while still using a compatible IO object and will make sure that the response is read as binary to avoid any conversion conflicts.

How can I ZIP and stream many files without appending to memory on Rails5/Ruby2.4? [duplicate]

I need to serve some data from my database in a zip file, streaming it on the fly such that:
I do not write a temporary file to disk
I do not compose the whole file in RAM
I know that I can do streaming generation of zip files to the filesystem using ZipOutputStream as here. I also know that I can do streaming output from a Rails controller by setting response_body to a Proc as here. What I need (I think) is a way of plugging those two things together. Can I make Rails serve a response from a ZipOutputStream? Can I get ZipOutputStream to give me incremental chunks of data that I can feed into my response_body Proc? Or is there another way?
Short Version
https://github.com/fringd/zipline
Long Version
so jo5h's answer didn't work for me in rails 3.1.1
i found a youtube video that helped, though.
http://www.youtube.com/watch?v=K0XvnspdPsc
the crux of it is creating an object that responds to each... this is what i did:
class ZipGenerator
  def initialize(model)
    @model = model
  end

  def each(&block)
    output = Object.new
    output.define_singleton_method :tell, Proc.new { 0 }
    output.define_singleton_method :pos=, Proc.new { |x| 0 }
    output.define_singleton_method :<<, Proc.new { |x| block.call(x) }
    output.define_singleton_method :close, Proc.new { nil }
    Zip::IoZip.open(output) do |zip|
      @model.attachments.all.each do |attachment|
        zip.put_next_entry "#{attachment.name}.pdf"
        file = attachment.file.file.send :file
        file = File.open(file) if file.is_a? String
        while buffer = file.read(2048)
          zip << buffer
        end
      end
    end
    sleep 10
  end
end

def getzip
  self.response_body = ZipGenerator.new(@model)
  # this is a hack to prevent middleware from buffering
  headers['Last-Modified'] = Time.now.to_s
end
EDIT:
the above solution didn't ACTUALLY work... the problem is that rubyzip needs to jump around the file to rewrite the headers for entries as it goes. in particular it needs to write the compressed size BEFORE it writes the data. this is just not possible in a truly streaming situation... so ultimately this task may be impossible. there is a chance that it might be possible to buffer a whole file at a time, but that seemed less worth it. ultimately i just wrote to a tmp file... on heroku i can write to Rails.root/tmp. it's less instant feedback, and not ideal, but necessary.
ANOTHER EDIT:
i got another idea recently... we COULD know the compressed size of the files if we do not compress them. the plan goes something like this:
subclass the ZipStreamOutput class as follows:
always use the "stored" compression method, in other words do not compress
ensure we never seek backwards to change file headers, get it all right up front
rewrite any code related to TOC that seeks
I haven't tried to implement this yet, but will report back if there's any success.
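for reference, a minimal sketch of a stored (uncompressed) entry, assuming a rubyzip version whose put_next_entry accepts a compression method argument; note that stock rubyzip may still seek back to fill in the CRC, which is exactly what the subclassing above would have to avoid:
require 'zip'

# write an entry with the STORED method, so its size is knowable up front
Zip::OutputStream.open('archive.zip') do |zos|
  zos.put_next_entry('file.txt', nil, nil, Zip::Entry::STORED)
  zos.print File.read('file.txt')
end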
OK ONE LAST EDIT:
In the zip standard: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
they mention that there's a bit you can flip to put the size, compressed size and crc AFTER a file. so my new plan was to subclass ZipOutputStream so that it:
sets this flag
writes sizes and CRCs after the data
never rewinds output
furthermore i needed to fix up all the hacks required to stream output in rails...
anyways it all worked!
here's a gem!
https://github.com/fringd/zipline
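for the curious, the trailing record that bit 3 enables (the "data descriptor") looks roughly like this; this is a sketch of the layout from the ZIP APPNOTE, not zipline's actual code:
# with general purpose bit 3 set, the CRC-32 and sizes are written here,
# after the entry body, so the stream never needs to rewind.
# 0x08074b50 is the optional data descriptor signature; 'V' packs a
# 32-bit little-endian unsigned integer.
def write_data_descriptor(io, crc32, compressed_size, uncompressed_size)
  io << [0x08074b50, crc32, compressed_size, uncompressed_size].pack('VVVV')
end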
I had a similar issue. I didn't need to stream directly, but only had your first case of not wanting to write a temp file. You can easily modify ZipOutputStream to accept an IO object instead of just a filename.
module Zip
  class IOOutputStream < ZipOutputStream
    def initialize(io)
      super '-'
      @outputStream = io
    end

    def stream
      @outputStream
    end
  end
end
From there, it should just be a matter of using the new Zip::IOOutputStream in your Proc. In your controller, you'd probably do something like:
self.response_body = proc do |response, output|
  Zip::IOOutputStream.open(output) do |zip|
    my_files.each do |file|
      zip.put_next_entry file
      zip << IO.read(file)
    end
  end
end
It is now possible to do this directly:
class SomeController < ApplicationController
  def some_action
    compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
      zos.put_next_entry "some/filename.ext"
      zos.print data
    end
    compressed_filestream.rewind
    respond_to do |format|
      format.zip do
        send_data compressed_filestream.read, filename: "some.zip"
      end
    end
    # or some other return of send_data
  end
end
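Note that format.zip only works if the ZIP MIME type has been registered, e.g. in config/initializers/mime_types.rb:
# register the :zip format so respond_to can use format.zip
Mime::Type.register 'application/zip', :zip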
This is the link you want:
http://info.michael-simons.eu/2008/01/21/using-rubyzip-to-create-zip-files-on-the-fly/
It builds and generates the zipfile using ZipOutputStream and then uses send_file to send it directly out from the controller.
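In outline, the pattern looks something like this (a sketch with illustrative names, using current rubyzip naming rather than the article's exact code):
require 'zip'

def download
  # build the archive on disk first, then hand it to send_file
  zip_path = Rails.root.join('tmp', "export-#{SecureRandom.hex(8)}.zip").to_s
  Zip::OutputStream.open(zip_path) do |zos|
    zos.put_next_entry('report.txt')
    zos.print 'hello'
  end
  send_file zip_path, type: 'application/zip', filename: 'export.zip'
end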
Use chunked HTTP transfer encoding for the output: send the HTTP header "Transfer-Encoding: chunked" and structure the output according to the chunked encoding specification, so there is no need to know the resulting ZIP file size at the beginning of the transfer. This can be coded fairly easily in Ruby with the help of Open3.popen3 and threads.
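An untested sketch of that idea, shelling out to the zip command (which writes the archive to stdout when given '-' as its name) and streaming its output; the paths here are illustrative:
require 'open3'

def download
  headers['Transfer-Encoding'] = 'chunked'
  self.response_body = Enumerator.new do |out|
    Open3.popen3('zip', '-r', '-', 'some/dir') do |stdin, stdout, _stderr, _wait|
      stdin.close
      # relay the child's stdout to the client in chunks
      while (chunk = stdout.read(16 * 1024))
        out << chunk
      end
    end
  end
end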

Concatenate bytes - Writing file to FTP folder as chunks

I'm writing a Rails app that uploads files to storage. Big files are split into chunks on the client (with JS) and the parts are uploaded to the server.
In development, I could simply open the existing file and write the following bytes into it.
(I'm using the CarrierWave gem)
File.open(@up_file.link.path, "ab") do |f|
  f.write(up_file_params[:link].read)
end
# This code worked when I uploaded to the '/public' folder in development
However, now I want to use an FTP server to store files, but I can't concatenate new bytes with the existing bytes.
def get_ftp_connection # create a new FTP connection
  ftp = Net::FTP.new
  ftp.connect(ENV['ftp_host'], ENV['ftp_port'])
  begin
    ftp.passive = ENV['ftp_passive']
    ftp.login(ENV['ftp_user'], ENV['ftp_passwd'])
    yield ftp
  ensure
    ftp.quit
  end
end
.....
def create
  .....
  get_ftp_connection do |ftp|
    full_path = ::File.dirname "#{ENV['ftp_folder']}/#{@up_file.link.path}"
    base_name = File.basename(@up_file.link.to_s)
    ftp.chdir(full_path)
    ftp.putbinaryfile(up_file_params[:link].read, base_name)
  end
end
I get ArgumentError (string contains null byte) at putbinaryfile. Any help? :(
As mentioned in the comments you could completely download the file first and then upload to the ftp server. If that's not an option for whatever reason, you could append to the remote file as it's being uploaded:
ftp.storbinary("APPE #{base_name}", up_file_params[:link], Net::FTP::DEFAULT_BLOCKSIZE)
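Tied back to the question's code, the create action might look something like this (a sketch; it assumes up_file_params[:link] is the uploaded IO, since storbinary reads directly from whatever object you pass it):
def create
  get_ftp_connection do |ftp|
    full_path = ::File.dirname "#{ENV['ftp_folder']}/#{@up_file.link.path}"
    base_name = File.basename(@up_file.link.to_s)
    ftp.chdir(full_path)
    # APPE appends to the remote file instead of replacing it
    ftp.storbinary("APPE #{base_name}", up_file_params[:link], Net::FTP::DEFAULT_BLOCKSIZE)
  end
end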

Rubyzip: Export zip file directly to S3 without writing tmpfile to disk?

I have this code, which writes a zip file to disk, reads it back, uploads to s3, then deletes the file:
compressed_file = some_temp_path
Zip::ZipOutputStream.open(compressed_file) do |zos|
  some_file_list.each do |file|
    zos.put_next_entry(file.some_title)
    zos.print IO.read(file.path)
  end
end # Write zip file

s3 = Aws::S3.new(S3_KEY, S3_SECRET)
bucket = Aws::S3::Bucket.create(s3, S3_BUCKET)
bucket.put("#{BUCKET_PATH}/archive.zip", IO.read(compressed_file), {}, 'authenticated-read')
File.delete(compressed_file)
This code already works, but I want to skip creating the zip file on disk, to save a few steps. I was wondering if there is a way to export the zipfile data directly to S3 without having to first create a tmpfile, read it back, then delete it?
I think I just found the answer to my question.
It's Zip::ZipOutputStream.write_buffer. I'll check this out and update this answer when I get it working.
Update
It does work. My code is like this now:
compressed_filestream = Zip::ZipOutputStream.write_buffer do |zos|
  some_file_list.each do |file|
    zos.put_next_entry(file.some_title)
    zos.print IO.read(file.path)
  end
end # Outputs zipfile as StringIO

s3 = Aws::S3.new(S3_KEY, S3_SECRET)
bucket = Aws::S3::Bucket.create(s3, S3_BUCKET)
compressed_filestream.rewind
bucket.put("#{BUCKET_PATH}/archive.zip", compressed_filestream.read, {}, 'authenticated-read')
write_buffer returns a StringIO, and you need to rewind the stream before reading it. Now I don't need to create and delete the tmpfile.
I'm just wondering now whether write_buffer would be more memory intensive or heavier than open. Or is it the other way around?

Why does using OpenURI to download a file result in a partial file?

I'm trying to use OpenURI to download a file from S3, and then save it locally so I can send the file as an attachment with ActionMailer.
Something strange is going on: the downloaded and attached images are corrupt; the bottom parts of the images are missing.
Here's the code:
require 'open-uri'
open("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}", "wb") do |file|
source_url = a.authenticated_url()
io = open(URI.parse(source_url).to_s)
file << io.read
attachments[a.attachment_file_name] = File.read("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}")
end
a is the attachment from ActionMailer.
What can I try next?
It looks like you're trying to read the file before it's been closed, which could leave part of the file buffer unwritten.
I'd do it like this:
require 'open-uri'

source_url = a.authenticated_url()
attachment_file = "#{Rails.root.to_s}/tmp/#{a.attachment_file_name}"
open(attachment_file, "wb") do |file|
  file.print open(source_url, &:read)
end
attachments[a.attachment_file_name] = File.read(attachment_file)
It looks like source_url = a.authenticated_url() will be a string, so parsing the string into a URI and then calling to_s on it is redundant, unless URI is doing some normalizing, which I don't think it does.
Based on my sysadmin experience: A side task is cleaning up the downloaded/spooled files. They could be deleted immediately after being attached, or you could have a cron job that runs daily, deleting all spooled files over one day old.
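For the cron-job option, something along these lines would do (a sketch; the spool directory is an assumption):
# delete spooled files more than one day old
Dir.glob(Rails.root.join('tmp', 'attachments', '*')).each do |f|
  File.delete(f) if File.file?(f) && File.mtime(f) < 1.day.ago
end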
An additional concern is that there is no error handling in case the URL can't be read, which would cause the attachment to fail. With a temp spool file you could at least check for the existence of the file. Even better, be prepared to handle an exception if the server returns a 400 or 500 error.
To avoid using a temporary spool file try this untested code:
require 'open-uri'

source_url = a.authenticated_url()
attachments[a.attachment_file_name] = open(source_url, &:read)
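For the error handling mentioned above, OpenURI raises OpenURI::HTTPError on 4xx/5xx responses, so a hedged sketch would be:
require 'open-uri'

begin
  attachments[a.attachment_file_name] = open(source_url, &:read)
rescue OpenURI::HTTPError => e
  # e.message looks like "404 Not Found"
  Rails.logger.warn "attachment download failed: #{e.message}"
end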
