I have a model with a Paperclip attachment (an image or PDF file). I process this model in a background job with Sidekiq. In the job I convert the PDF file into a PDF string with ImageMagick's convert command in order to get rid of any password protection or garbage in the file format.
The conversion works in Sidekiq in my development environment and on my staging server, but it does not work in Sidekiq on the production server. On production it does work if I run the converting method from the Rails console. When it fails, the dst file is empty and the returned string is ''.
The model looks like this
class Receipt < ActiveRecord::Base
  has_attached_file :image, styles: { original: {} }, path: ':rails_root/attachments/:class/:attachment/:id_partition/:style/:filename'
  validates_attachment_content_type :image, :content_type => IMAGE_MIME_TYPES

  def convert_pdf_to_pdf_string
    if %w(application/pdf application/octet-stream).include?(image.content_type)
      begin
        dst = Tempfile.new([File.basename(image.path, File.extname(image.path)), '.pdf'], :encoding => 'ascii-8bit')
        dst.flush
        `convert -density 200 #{image.path.shellescape} #{dst.path.shellescape}`
        dst
        str = File.open(dst.path, 'rb') {|f| f.read}.to_s
      ensure
        dst.close
        dst.unlink
      end
    end
  end
end
I'm out of ideas on how to debug this problem.
No exceptions are raised. I also tried to log the return value of the backticks call, but it doesn't return anything.
Please suggest anything I should check.
I started by working out how to make convert produce some logging. After I changed the backticks call to
`convert -density 200 #{image.path.shellescape} #{dst.path.shellescape} 2>log/magick_error.log 1>log/magick.log`
the code started to work. It writes the following to magick_error.log:
**** Warning: File has some garbage before %PDF- .
So I guess this output to STDERR was somehow breaking things in my production environment. The staging and development environments tolerated it.
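The STDERR output can also be collected from Ruby itself rather than redirected to log files. The following is a minimal sketch, assuming a hypothetical helper named run_and_capture (it is not part of the original model). It uses Open3.capture3 from the standard library, which returns stdout, stderr and the exit status separately, so a warning like the one above can be logged instead of being lost:

```ruby
require 'open3'

# Hypothetical helper: runs a command without going through a shell and
# returns its stdout, stderr and success flag separately.
def run_and_capture(*argv)
  stdout, stderr, status = Open3.capture3(*argv)
  { stdout: stdout, stderr: stderr, success: status.success? }
end

# Usage inside the model might look like this (paths as in the question):
#   result = run_and_capture('convert', '-density', '200', image.path, dst.path)
#   Rails.logger.warn(result[:stderr]) unless result[:stderr].empty?
```

Because the arguments are passed as separate strings, no shell is involved and shellescape is not needed in this variant.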
Ok - I have the following in my test/test_helper.rb:
def read_pdf_from_response(response)
  file = Tempfile.new
  file.write response.body.force_encoding('UTF-8')
  begin
    reader = PDF::Reader.new(file)
    reader.pages.map(&:text).join.squeeze("\n")
  ensure
    file.close
    file.unlink
  end
end
I use it like this in an integration test:
get project_path(project, format: 'pdf')
read_pdf_from_response(@response).tap do |pdf|
  assert_match(/whatever/, pdf)
end
This works fine as long as I run a test singly or when running all tests with only one worker, e.g. PARALLEL_WORKERS=1. But tests that use this method will fail intermittently when I run my suite with more than 1 parallel worker. My laptop has 8 cores, so that's normally what it's running with.
Here's the error:
PDF::Reader::MalformedPDFError: PDF malformed, expected 5 but found 96 instead
or sometimes: PDF::Reader::MalformedPDFError: PDF file is empty
The PDF reader is https://github.com/yob/pdf-reader which hasn't given any problems.
The controller that sends the PDF returns like so:
send_file out_file,
  filename: "#{@project.name}.pdf",
  type: 'application/pdf',
  disposition: (params[:download] ? 'attachment' : 'inline')
I can't see why this isn't working. No files should ever have the same name at the same time, since I'm using Tempfile, right? How can I make all this run with parallel tests?
While I cannot confirm why this is happening, the issue may be one of the following:
You are forcing the encoding to "UTF-8" but PDF documents are binary files so this conversion could be damaging the PDF.
Some of the responses you are receiving are truly empty or malformed.
Maybe try this instead:
def read_pdf_from_response(response)
  doc = StringIO.new(response.body.to_s)
  begin
    PDF::Reader.new(doc)
      .pages
      .map(&:text)
      .join
      .squeeze("\n")
  rescue PDF::Reader::MalformedPDFError => e
    # handle issues with the pdf itself
  end
end
This will avoid the file system altogether while still using a compatible IO object and will make sure that the response is read as binary to avoid any conversion conflicts.
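To make the binary intent explicit, the body can be tagged as binary before it is wrapped. This is a small sketch, assuming a hypothetical helper named binary_io. String#b returns a copy of the string marked as ASCII-8BIT and leaves the bytes untouched:

```ruby
require 'stringio'

# Hypothetical helper: wraps a response body in a binary-encoded StringIO.
# String#b only changes the encoding tag of the copy, not the bytes.
def binary_io(body)
  StringIO.new(body.to_s.b)
end

io = binary_io("%PDF-1.4\n\xE2\xE3\xCF\xD3\n")
puts io.string.encoding
puts io.string.bytesize
```

In the helper above, binary_io(response.body) could then be passed to PDF::Reader.new in place of the plain StringIO.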
There is this well known issue with Paperclip.
https://github.com/thoughtbot/paperclip/issues/1924
How do I configure my model so that this stupid spoofing validation will work?
Before the problem was discovered I was using:
validates_attachment_content_type :csv_import, :content_type => 'text/csv'
But that would not work on some versions of Windows. On Windows 7 Professional I get this error:
[paperclip] Content Type Spoof: Filename delivery_detail.csv (application/octet-stream from Headers, [#<MIME::Type:0x00000005077f38 @content_type="text/csv", @raw_media_type="text", @raw_sub_type="csv", @simplified="text/csv", @media_type="text", @sub_type="csv", @extensions=["csv"], @encoding="8bit", @system=nil, @registered=true, @url=["IANA", "RFC4180"], @obsolete=nil, @docs=nil>, #<MIME::Type:0x000000050c7f60 @content_type="text/comma-separated-values", @raw_media_type="text", @raw_sub_type="comma-separated-values", @simplified="text/comma-separated-values", @media_type="text", @sub_type="comma-separated-values", @extensions=["csv"], @encoding="8bit", @system=nil, @registered=false, @url=nil, @obsolete="!", @docs="use-instead:text/csv", @use_instead=["text/csv"]>] from Extension), content type discovered from file command: text/plain. See documentation to allow this combination.
Has anyone ever succeeded in making Paperclip upload CSV files?
I tried every workaround I could find in the GitHub issue reports and nothing has worked. I need to see a working example solution.
update 1
sonianand11 commented on 2 Oct 2014
https://github.com/thoughtbot/paperclip/issues/1470
This works, but it involves switching off content validation. Is there a better way to do it?
I came up with the following solution:
add to the model:
validates_attachment_content_type :my_csv_uploaded_file, content_type: ['text/plain', 'text/csv', 'application/vnd.ms-excel']
and to the initializer:
Paperclip.options[:content_type_mappings] = { csv: 'application/vnd.ms-excel' }
It worked for me. It was tested using Windows 7 Professional
I can receive emails via imap through the mail gem and would like to add the attachments to my model (called message). I receive this error based on my code taken from this blog post:
Encoding::UndefinedConversionError ("\xC1" from ASCII-8BIT to UTF-8):
My code:
mail.attachments.each_with_index do | attachment, index |
  fake_file = AttachmentFile.new("test.jpg")
  fake_file.write(attachment.decoded)
  fake_file.flush
  fake_file.original_filename = attachment.filename
  fake_file.content_type = "image/gif"
  @message.doc1 = fake_file if index == 0 and attachment.content_type.start_with?("image/")
end
Not sure what I am doing wrong to cause the error - maybe because the file is not read in binary mode? Another alternative was given in the same post:
file = StringIO.new(attachment.decoded)
file.class.class_eval { attr_accessor :original_filename, :content_type }
file.original_filename = attachment.filename
file.content_type = attachment.mime_type
This worked once with a gif but failed with a pdf, producing rollbacks. Also, I am on Windows for development, which has led to problems with Paperclip in the past (file size not read properly, etc.).
The model has the following validations:
validates_attachment_content_type :doc1, :content_type => [ 'application/pdf', /image/ ], :message => "only pdf or img"
My log files are not that helpful, only normal output for the first and second option:
Command :: SET PATH=/usr/bin;%PATH% & file -b --mime "C:/Users/FOUNDA~1/AppData/Local/Temp/c81e728d9d4c2f636f067f89cc14862c20150412-5216-1utzh29.gif"
Not sure what kind of initialization there is for the mail gem - it is able to pull the emails correctly. Sorry, maybe I misunderstood your reply.
I made a mistake by having a validation that requested a certain structure for the filename, which was causing problems. The second option I listed works flawlessly now. Can't speak for the first one - the problems there would be the same, probably.
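The StringIO option above adds the two accessors to the StringIO class itself, so every StringIO in the process gains them. A small subclass keeps the change local. This is a minimal sketch, assuming a hypothetical class named NamedStringIO. It also tags the decoded attachment as binary, which seems a reasonable precaution given the ASCII-8BIT to UTF-8 error, although that error's cause was not confirmed in this thread:

```ruby
require 'stringio'

# Hypothetical subclass: carries the upload metadata used in the snippet
# above without adding accessors to every StringIO in the process.
class NamedStringIO < StringIO
  attr_accessor :original_filename, :content_type

  def initialize(bytes, original_filename:, content_type:)
    super(bytes.b) # keep the attachment as raw bytes (ASCII-8BIT)
    @original_filename = original_filename
    @content_type = content_type
  end
end

# Usage with the mail gem might look like:
#   file = NamedStringIO.new(attachment.decoded,
#                            original_filename: attachment.filename,
#                            content_type: attachment.mime_type)
#   @message.doc1 = file
```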
I am using Paperclip in my Rails application for attaching images.
I declared validation for content_type in my model as
validates_attachment :image,
:content_type => { :content_type => ["image/jpg", "image/gif", "image/png"] }
I have two examples, one with a valid image and the other with an invalid image.
For the invalid image, I just renamed a .txt file to .png.
it "Image is valid" do
  image = File.new("#{Rails.root}/spec/support/right.png")
  expect(FactoryGirl.build(:pin, image: image)).to be_valid
end
it "Image is invalid" do
  image = File.new("#{Rails.root}/spec/support/wrong.png")
  expect(FactoryGirl.build(:pin, image: image)).to have(1).errors_on(:image_content_type)
end
I expected both examples to pass, but the second one fails.
I don't get any error for the content_type of wrong.png.
I thought that Paperclip's content_type validation would actually check the file format (the binary data) of an uploaded file, but it seems that it just checks the file extension. Does this validation only check the extension of an uploaded file?
I may be missing something here (configuration?). Is there any other validation available in Paperclip to achieve this? Or should I opt for a custom validator in this case?
This issue is resolved in Paperclip's latest version, 4.1.1, released on February 21, 2014.
Both of the following examples now pass.
it "Image is valid" do
  image = File.new("#{Rails.root}/spec/support/right.png")
  expect(FactoryGirl.build(:pin, image: image)).to be_valid
end
it "Image is invalid" do
  image = File.new("#{Rails.root}/spec/support/wrong.png")
  expect(FactoryGirl.build(:pin, image: image)).to have(1).errors_on(:image_content_type)
end
After a little research, I found the following.
Suppose I upload an invalid image, for example a wrong.txt file renamed (spoofed) to wrong.png.
In prior releases of Paperclip, wrong.png passed the content_type validation without any error, because Paperclip only checked the extension of the uploaded file and not its content.
In the current release, Paperclip 4.1.1, the same spoofed wrong.png fails the validation and shows the following error in the view:
Image has an extension that does not match its contents
Upon investigating server log entries, I found the following:
Command :: file -b --mime-type
'/var/folders/tg/8sxl1vss4fb0sqtcrv3lzcfm0000gn/T/a7f21d0002b0d9d91eb158d702cd930320140317-531-swkmb8'
[paperclip] Content Type Spoof: Filename wrong.png (["image/png"]),
content type discovered from file command: text/plain. See
documentation to allow this combination.
Here you can see that Paperclip actually checked the content of the uploaded file, identified it as text/plain, and reported a Content Type Spoof.
I hope these findings help others understand how Paperclip's content-type validation has improved over time.
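To illustrate the difference between checking the extension and checking the content, here is a minimal sketch that looks at the first bytes of a file. It is not how Paperclip does it (the log above shows Paperclip calling the file command). The helper name png_signature? is hypothetical, and the check covers only the fixed 8-byte signature that PNG files start with:

```ruby
require 'stringio'

# The 8-byte signature every PNG file starts with.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b.freeze

# Hypothetical helper: true when the IO starts with the PNG signature,
# regardless of what the file is called.
def png_signature?(io)
  io.rewind
  header = io.read(PNG_SIGNATURE.bytesize)
  io.rewind
  !header.nil? && header.b == PNG_SIGNATURE
end

# A text file renamed to wrong.png still starts with plain text:
puts png_signature?(StringIO.new("just some text"))
```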
On a RoR app that I've inherited, a test that involves a file upload is failing. The failing assertion looks like this:
assert_equal File.size("#{RAILS_ROOT}/test/fixtures/#{filename}"), @candidate.picture.length
It fails with (the test file is 69 bytes):
<69> expected but was <5>.
This is after a post using:
fixture_file_upload(filename, content_type, :binary)
In the candidate model, the uploaded file is assigned to a property that is then saved to a mediumblob in MySQL. It looks to me like the uploaded file is 69 bytes, but after it is assigned to the model property (using UploadedFile.read), the length is showing as only 5 bytes.
So this code:
puts "file.length=" + file.length.to_s
self.picture = file.read
puts "self.picture.length=" + self.picture.length.to_s
results in this output:
file.length=69
self.picture.length=5
I'm at a bit of a loss as to why this is, any ideas?
This came down to a Windows/Ruby idiosyncrasy, where reading the file appeared to be happening in text mode. There is an extension in this app in test_helper, something like:
class ActionController::TestUploadedFile
  # Awkward but necessary for testing since an ActionController::UploadedFile subtype is expected
  include ActionController::UploadedFile
  def read
    tempfile = File.new(self.path)
    tempfile.read
  end
end
And apparently, on Windows, there is a specific IO method that can be called to force the file into binary mode. Calling this method on the tempfile, like so:
tempfile.binmode
caused everything to work as expected, with the read from the UploadedFile matching the size of the fixture file on disk.
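The same effect can be had by opening the file in binary mode from the start. This is a minimal sketch, assuming a hypothetical helper named read_binary. One plausible explanation for the short read, offered here as an assumption and not confirmed in the thread, is that Windows text mode treats byte 0x1A (Ctrl-Z) as end of file, so a fixture containing that byte at offset 5 would be cut to 5 bytes:

```ruby
require 'tempfile'

# Hypothetical helper: reads a file as raw bytes on every platform.
# The 'rb' mode disables newline and end-of-file translation on Windows.
def read_binary(path)
  File.open(path, 'rb') { |f| f.read }
end

# Build a 69-byte sample with a 0x1A byte at offset 5 and read it back.
sample = ("GIF89" + "\x1A" + "x" * 63).b
Tempfile.create('fixture') do |tmp|
  tmp.binmode
  tmp.write(sample)
  tmp.flush
  puts read_binary(tmp.path).bytesize
end
```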