Why does using OpenURI to download a file result in a partial file? - ruby-on-rails

I'm trying to use OpenURI to download a file from S3, and then save it locally so I can send the file as an attachment with ActionMailer.
Something strange is going on: the downloaded and attached images are corrupt, with the bottom parts of the images missing.
Here's the code:
require 'open-uri'
open("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}", "wb") do |file|
source_url = a.authenticated_url()
io = open(URI.parse(source_url).to_s)
file << io.read
attachments[a.attachment_file_name] = File.read("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}")
end
a is the attachment from ActionMailer.
What can I try next?

It looks like you're trying to read the file before it's been closed, which could leave part of the file buffer unwritten.
I'd do it like this:
require 'open-uri'
source_url = a.authenticated_url()
attachment_file = "#{Rails.root.to_s}/tmp/#{a.attachment_file_name}"
open(attachment_file, "wb") do |file|
  file.print open(source_url, &:read)
end
attachments[a.attachment_file_name] = File.read(attachment_file)
It looks like source_url = a.authenticated_url() is already a string, so parsing it into a URI and then calling to_s on the result is redundant, unless URI does some normalizing, which I don't think it does.
Based on my sysadmin experience: A side task is cleaning up the downloaded/spooled files. They could be deleted immediately after being attached, or you could have a cron job that runs daily, deleting all spooled files over one day old.
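A minimal sketch of that cleanup idea (the tmp path and the one-day cutoff are just the values mentioned above; in a Rails app this could live in a rake task invoked by cron):
# Delete spooled attachment files older than one day.
Dir.glob("#{Rails.root}/tmp/*").each do |path|
  File.delete(path) if File.file?(path) && File.mtime(path) < 1.day.ago
end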
An additional concern is that there is no error handling in case the URL can't be read, which would cause the attachment to fail. With a temp spool file you could at least check for the file's existence. Better still, be prepared to handle an exception if the server returns a 400 or 500 error.
To avoid using a temporary spool file try this untested code:
require 'open-uri'
source_url = a.authenticated_url()
attachments[a.attachment_file_name] = open(source_url, &:read)
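For the error-handling concern mentioned above, a minimal sketch (open-uri raises OpenURI::HTTPError for 4xx/5xx responses; the logging calls are just one way to surface the failure):
require 'open-uri'
source_url = a.authenticated_url()
begin
  attachments[a.attachment_file_name] = open(source_url, &:read)
rescue OpenURI::HTTPError => e
  # e.message carries the status line, e.g. "403 Forbidden"
  Rails.logger.error("Could not fetch #{a.attachment_file_name}: #{e.message}")
rescue SocketError, Errno::ECONNREFUSED => e
  Rails.logger.error("Network error fetching #{source_url}: #{e.message}")
end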

Related

How to read a pdf response from Rails app and save to file with parallel tests?

Ok - I have the following in my test/test_helper.rb:
def read_pdf_from_response(response)
  file = Tempfile.new
  file.write response.body.force_encoding('UTF-8')
  begin
    reader = PDF::Reader.new(file)
    reader.pages.map(&:text).join.squeeze("\n")
  ensure
    file.close
    file.unlink
  end
end
I use it like this in an integration test:
get project_path(project, format: 'pdf')
read_pdf_from_response(@response).tap do |pdf|
  assert_match(/whatever/, pdf)
end
This works fine as long as I run a single test or run all tests with only one worker, e.g. PARALLEL_WORKERS=1. But tests that use this method fail intermittently when I run the suite with more than one parallel worker. My laptop has 8 cores, so that's what it normally runs with.
Here's the error:
PDF::Reader::MalformedPDFError: PDF malformed, expected 5 but found 96 instead
or sometimes: PDF::Reader::MalformedPDFError: PDF file is empty
The PDF reader is https://github.com/yob/pdf-reader which hasn't given any problems.
The controller that sends the PDF returns like so:
send_file out_file,
          filename: "#{@project.name}.pdf",
          type: 'application/pdf',
          disposition: (params[:download] ? 'attachment' : 'inline')
I can't see why this isn't working. No files should ever have the same name at the same time, since I'm using Tempfile, right? How can I make all this run with parallel tests?
While I can't confirm why this is happening, the issue may be one of the following:
You are forcing the encoding to "UTF-8", but PDF documents are binary files, so that conversion could be damaging the PDF.
Some of the responses you are receiving are genuinely empty or malformed.
Maybe try this instead:
def read_pdf_from_response(response)
  doc = StringIO.new(response.body.to_s)
  begin
    PDF::Reader.new(doc)
               .pages
               .map(&:text)
               .join
               .squeeze("\n")
  rescue PDF::Reader::MalformedPDFError => e
    # handle issues with the pdf itself
  end
end
This will avoid the file system altogether while still using a compatible IO object and will make sure that the response is read as binary to avoid any conversion conflicts.
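If some responses really are empty or malformed, asserting on the response before parsing gives a clearer failure than a MalformedPDFError. A sketch for the integration test (response.media_type is Rails 6+; use response.content_type on Rails 5):
get project_path(project, format: 'pdf')
assert_response :success
assert_equal 'application/pdf', response.media_type
read_pdf_from_response(response).tap do |pdf|
  assert_match(/whatever/, pdf)
end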

Rails FTP OPEN CSV

I have the following code to connect my rails app to my FTP. This works great. However, I want to use open-uri to open the csv file so I can parse it. Any ideas how to do this? I think it's an easy thing to do but I'm missing something.
require 'net/ftp'
ftp = Net::FTP.new
ftp.connect("xxx.xxx.xx.xxx",21)
ftp.login("xxxxx","xxxx")
ftp.chdir("/")
ftp.passive = true
puts ftp.list("TEST.csv")
You'll need to use #gettextfile.
A) Get the file to a local temporary file and read its content
require 'tmpdir' # Dir::Tmpname lives in the tmpdir stdlib
# Creating a tmp file can be done differently as well.
# It may also be omitted, in which case `gettextfile`
# will create a file in the current directory.
Dir::Tmpname.create(['TEST', '.csv']) do |file_name|
  ftp.gettextfile('TEST.csv', file_name)
  content = File.read(file_name)
end
B) Pass a block to gettextfile and get the content one line at a time
content = ''
ftp.gettextfile('TEST.csv') do |line|
  content << line
end
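Either way, once content holds the file's text, parsing it is just a matter of handing it to Ruby's standard CSV library (headers: true and the column name below are assumptions about the file's layout):
require 'csv'
rows = CSV.parse(content, headers: true)
rows.each do |row|
  # Access fields by header name; 'email' is a hypothetical column.
  puts row['email']
end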

Heroku: Unpacking a Gzip file through a rake task fails

I'm using Rails 5.2 with ruby 2.5.1 and am deploying my app to Heroku.
I ran into problems when I tried running my rake task on Heroku. The task calls an API which responds with a *.gz file, saves it, unzips it, uses the retrieved JSON to populate the database, and finally deletes the *.gz file. The task runs smoothly in development but fails when called in production. The last line printed to the console is 'Unzipping the file...', so my guess is that the issue originates in the zlib library.
companies_list.rake
require 'json'
require 'open-uri'
require 'zlib'
require 'openssl'
require 'action_view'

include ActionView::Helpers::DateHelper

desc 'Updates Company table'
task update_db: :environment do
  start = Time.now
  zip_file_url = 'https://example.com/api/download'
  TEMP_FILE_NAME = 'companies.gz'

  puts 'Creating folders...'
  tempdir = Dir.mktmpdir
  file_path = "#{tempdir}/#{TEMP_FILE_NAME}"

  puts 'Downloading the file...'
  open(file_path, 'wb') do |file|
    open(zip_file_url, { ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE }) do |uri|
      file.write(uri.read)
    end
  end
  puts 'Download complete.'

  puts 'Unzipping the file...'
  gz = Zlib::GzipReader.new(open(file_path))
  output = gz.read
  @companies_array = JSON.parse(output)
  puts 'Unzipping complete.'

  (...)
end
Has anyone else run into similar issues and knows how to get it to work?
Your code snippet never closes the GzipReader. It is usually best to wrap IOs in blocks to ensure they are closed appropriately. Also, Kernel#open may not be the method you want here; let GzipReader.open handle opening (and closing) the file for you and just pass in the file_path.
Zlib::GzipReader.open(file_path) do |gz|
  output = gz.read
  @companies_array = JSON.parse(output)
end
The issue turned out to be linked to the memory limit rather than to Gzip unpacking (which is why the problem only occurred in production).
The solution was to use Json::Streamer so that the whole file is not loaded into memory at once.
This is the crucial part: (goes after the code posted in the question)
puts 'Updating the Company table...'
streamer = Json::Streamer.parser(file_io: file, chunk_size: 1024) # customize your chunk_size
streamer.get(nesting_level: 1) do |company|
  # do your stuff with the API data here...
end
end
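The file variable and the trailing end suggest the snippet sits inside a block that opens the downloaded file. A sketch of how the pieces might fit together, assuming Json::Streamer can consume the GzipReader as its file_io and that each company object sits at nesting level 1 (Company and its attribute are hypothetical):
require 'zlib'
require 'json/streamer'

Zlib::GzipReader.open(file_path) do |gz|
  streamer = Json::Streamer.parser(file_io: gz, chunk_size: 1024)
  streamer.get(nesting_level: 1) do |company|
    # Each parsed company hash is yielded as soon as it is complete,
    # so the whole JSON document never has to fit in memory at once.
    Company.create!(name: company['name'])
  end
end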

How can I download an image from a website using Rails?

I'm using Selenium-Webdriver, OpenUri and Nokogiri to scrape a website. I want to download a particular image from said website to my Ubuntu computer. I tried a few different methods but each of them gives a different error message.
Here's my base code, which opens the website and gets the image url (everything after this I ran in my pry console):
require 'open-uri'
require 'selenium-webdriver'
require 'nokogiri'
require 'uri'
url = "https://www.google.com/"
browser = Selenium::WebDriver.for :chrome
document = open(url).read
parsed_content = Nokogiri::HTML(document)
image = "https://www.google.com" + parsed_content.css('#hplogo').attr('src').value
binding.pry
1) Here's the first thing I tried to download the image:
download = open(image)
IO.copy_stream(download, '~/image.png')
For this, I got the following error:
Errno::ENOENT: No such file or directory # rb_sysopen - ~/image.png from (pry):44:in 'initialize'
As per this question, I tried adding a directory in the code:
FileUtils.mkdir_p(image) unless File.exist?(image)
But I got the same error.
2) Next I tried this:
open('image.png', 'wb') do |file|
  file << open(image).read
end
and this returns
#<File:image.png (closed)>
but the file isn't anywhere on my computer and I can't figure out what that message means.
3) Next I tried
IO.copy_stream(open(image), 'image.png')
which simply returned this:
5482
but again, I have no idea what that means and the file isn't anywhere.
4) Finally I tried
read_image = open(image).read
File.open(image, 'image.png') do |file|
  file.puts read_image
end
which outputs
ArgumentError: invalid access mode image.png
from (pry):53:in 'initialize'
What am I doing wrong? Was I close with any of my approaches?
File.open's second argument is the mode for opening the file.
read_image = open(image).read
File.open('image.png', 'wb') do |file|
  file.write read_image
end
Your third variant works fine.
5482 is the length of the copied file in bytes. The file 'image.png' ends up in the current working directory, alongside your .rb file.
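As for the Errno::ENOENT in the first attempt: Ruby does not expand ~ in paths, so '~/image.png' points at a literal ~ directory that doesn't exist. Expanding the path first should make that variant work too (a sketch):
require 'open-uri'
download = open(image)
IO.copy_stream(download, File.expand_path('~/image.png'))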

How do I copy a file onto a separate server using Net::FTP?

I'm building a Rails app which creates a bookmarklet file for each user upon sign-up. I'd like to save that file onto a remote server, so I'm trying Ruby's Net::FTP, based on "Rails upload file to ftp server".
I tried this code:
require 'net/ftp'
FileUtils.cp('public/ext/files/script.js', 'public/ext/bookmarklets/'+resource.authentication_token )
file = File.open('public/ext/bookmarklets/'+resource.authentication_token, 'a') {|f| f.puts("cb_bookmarklet.init('"+resource.username+"', '"+resource.authentication_token+"', '"+resource.id.to_s+"');$('<link>', {href: '//***.com/bookmarklet/cb.css',rel: 'stylesheet',type: 'text/css'}).appendTo('head');});"); return f }
ftp = Net::FTP.new('www.***.com')
ftp.passive = true
ftp.login(user = '***', psswd = '***')
ftp.storbinary("STOR " + file.original_filename, StringIO.new(file.read), Net::FTP::DEFAULT_BLOCKSIZE)
ftp.quit()
But I'm getting an error that the file variable is nil. I may be doing several things wrong here. I'm pretty new to Ruby and Rails, so any help is welcome.
The block form of File.open does not return the file handle (and even if it did, it would be closed at that point). Perhaps change your code to roughly:
require '…'
FileUtils.cp …
File.open('…', 'a') do |file|
  ftp = …
  ftp.storbinary("STOR #{file.original_filename}", StringIO.new(file.read))
  ftp.quit
end
Alternatively:
require '…'
FileUtils.cp …
filename = '…'
contents = IO.read(filename)
ftp = …
ftp.storbinary("STOR #{filename}", StringIO.new(contents))
ftp.quit
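As a side note, Net::FTP#putbinaryfile uploads a local file directly, so the StringIO step can be skipped entirely when the content is already on disk. A sketch in the style of the code above (host, credentials, and the local path are placeholders):
require 'net/ftp'

local_path = 'public/ext/bookmarklets/' + resource.authentication_token
ftp = Net::FTP.new('www.example.com')
ftp.passive = true
ftp.login('user', 'password')
ftp.putbinaryfile(local_path, File.basename(local_path))
ftp.quit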
