Carrierwave image extensions - ruby-on-rails

I'm trying to determine whether a remote URL is an image. Most URLs end in .jpg, .png, etc., but some images, such as Google Images thumbnails, have no extension, e.g.
https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSbK2NSUILnFozlX-oCWQ0r2PS2gHPPF7c8XaxGuJFGe83KGJkhFtlLXU_u
I've tried using FastImage to determine whether a URL is an image. It works for any URL that is fed into it.
How can I ensure that remote URLs are checked with FastImage while uploaded files use the whitelist? Here is what I have in my uploader. avatar_remote_url isn't recognized there. What do I do in the uploader to test only remote URLs and not regular files?
def extension_white_list
if defined? avatar_remote_url && !FastImage.type(CGI::unescape(avatar_remote_url)).nil?
# ok to process
else # regular uploaded file should detect the following extensions
%w(jpg jpeg gif png)
end
end

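If all you have to work with is a URL like that, you can send a HEAD request to the server to obtain the content type of the image. From that you can obtain the extension:
require 'net/http'
require 'mime/types'
# Returns the preferred extension for the Content-Type the server reports,
# or nil when the header is missing or the type is unknown.
def get_extension(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true if uri.scheme == 'https'
  request = Net::HTTP::Head.new(uri.request_uri)
  response = http.request(request)
  # Drop parameters such as "; charset=binary" before the lookup
  content_type = response['Content-Type'].to_s.split(';').first
  return nil if content_type.nil?
  mime_type = MIME::Types[content_type.strip].first
  mime_type && mime_type.extensions.first
end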
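If you only need a yes/no answer about whether the URL points at an image, you can test the header value directly. Here is a minimal sketch, assuming a helper named image_content_type? (a hypothetical name, not part of Net::HTTP or FastImage) that receives the raw Content-Type value from the HEAD response:

```ruby
# Hypothetical helper: decide from a raw Content-Type header value
# whether the response looks like an image.
def image_content_type?(header_value)
  return false if header_value.nil?

  # Drop parameters such as "; charset=binary" and normalise case.
  media_type = header_value.split(';').first.to_s.strip.downcase
  media_type.start_with?('image/')
end

puts image_content_type?('image/jpeg; charset=binary')
puts image_content_type?('text/html')
```

Keep in mind that this trusts whatever the server claims, so it is a quick filter rather than proof that the body really is an image.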

I'm working with the code you provided and some of the code provided in the CarrierWave Wiki for validating remote URLs.
You can create a new validator in lib/remote_image_validator.rb.
require 'fastimage'
class RemoteImageValidator < ActiveModel::EachValidator
def validate_each(object, attribute, value)
raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? || options[:format].is_a?(Regexp)
configuration = { :message => "is invalid or not responding", :format => URI::regexp(%w(http https)) }
configuration.update(options)
if value =~ configuration[:format]
begin
if FastImage.type(CGI::unescape(value))
true
else
object.errors.add(attribute, configuration[:message]) and false
end
rescue
object.errors.add(attribute, configuration[:message]) and false
end
else
object.errors.add(attribute, configuration[:message]) and false
end
end
end
Then in your model
class User < ActiveRecord::Base
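# CarrierWave names the remote URL accessor remote_<column>_url
validates :remote_avatar_url,
:remote_image => {
:format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix,
# Use a lambda so the condition is evaluated per record, not at class load
:unless => lambda { |user| user.remote_avatar_url.blank? }
}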
end
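If the long regular expression is hard to maintain, the format check can also be done by parsing the value. This is only a sketch of that idea; http_url? is a hypothetical helper name, and it could be called from validate_each before the FastImage check:

```ruby
require 'uri'

# Hypothetical helper: true when the string parses as an absolute
# http or https URL with a non-empty host.
def http_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts http_url?('https://example.com/a.png')
puts http_url?('ftp://example.com/a.png')
```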

I was having a similar issue where creating the different versions from the original was failing because ImageMagick could not figure out the correct encoder to use due to the missing extension. Here is a monkey-patch I applied in Rails that fixed my problem:
module CarrierWave
module Uploader
module Download
class RemoteFile
def original_filename
value = File.basename(file.base_uri.path)
mime_type = Mime::Type.lookup(file.content_type)
unless File.extname(value).present? || mime_type.blank?
value = "#{value}.#{mime_type.symbol}"
end
value
end
end
end
end
end
I believe this will address the problem you are having as well since it ensures the existence of a file extension when the content type is set appropriately.
UPDATE:
The master branch of carrierwave has a different solution to this problem that uses the Content-Disposition header to figure out the filename. Here is the relevant pull request on github.
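To give an idea of what reading the filename from that header involves, here is a rough sketch. It is not the code from the CarrierWave pull request; filename_from_content_disposition is a hypothetical helper, and it only handles the plain filename= parameter, not the encoded filename*= form:

```ruby
# Hypothetical helper: extract the filename parameter from a
# Content-Disposition header value, or nil if none is present.
def filename_from_content_disposition(header_value)
  return nil if header_value.nil?

  # Matches filename="cat.png" as well as the unquoted filename=cat.png.
  match = header_value.match(/filename="?([^";]+)"?/i)
  # File.basename drops any directory part a server might send.
  match && File.basename(match[1].strip)
end

puts filename_from_content_disposition('attachment; filename="cat.png"')
```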

Related

Carrierwave sets mime type to invalid/invalid

I recently upgraded from CarrierWave 1.3 to 2.1, and a couple of specs started failing because of an invalid MIME type.
I store CSV uploads in the database, and the model validates that the MIME type is text/csv.
validates :file, presence: true, file_content_type: {
allow: [
'text/csv',
'application/vnd.ms-excel',
'application/vnd.ms-office',
'application/octet-stream',
'text/comma-separated-values'
]
}
In the spec, I created a fixture:
let(:file) { fixture_file_upload('files/fixture.csv', 'text/csv') }
When I debug, I see:
@file=
#<CarrierWave::SanitizedFile:0x00007f8c731791f0
@content=nil,
@content_type="invalid/invalid",
@file="/Users/tiagovieira/code/work/tpc/public/uploads/csv_file_upload/file/1/1605532759-308056149220914-0040-7268/fixture.csv",
@original_filename="fixture.csv">,
@filename="fixture.csv",
@identifier="fixture.csv",
Is this related to the fact that carrierwave stopped using mime-types gem as a dependency?
It seems the problem has been found.
In the previous CarrierWave version, the content_type of CarrierWave::SanitizedFile was derived from the file extension:
https://github.com/carrierwaveuploader/carrierwave/blob/1.x-stable/lib/carrierwave/sanitized_file.rb
def content_type
  return @content_type if @content_type
  if @file.respond_to?(:content_type) and @file.content_type
    @content_type = @file.content_type.to_s.chomp
  elsif path
    @content_type = ::MIME::Types.type_for(path).first.to_s
  end
end
Now it works in a more complicated way: it tries to recognize the file type from the data the file contains.
https://github.com/carrierwaveuploader/carrierwave/blob/master/lib/carrierwave/sanitized_file.rb
def content_type
  @content_type ||=
    existing_content_type ||
    mime_magic_content_type ||
    mini_mime_content_type
end
I get the "invalid/invalid" content type from mime_magic_content_type, which apparently cannot determine the file type using MimeMagic.by_magic.
PS: I see that the "plain/text" content_type is returned for an ordinary css file.
https://github.com/minad/mimemagic/blob/master/lib/mimemagic/tables.rb#L1506
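To see why a plain-text format such as CSV can slip through content sniffing, it helps to look at a toy version of the same fallback order. The sketch below is not CarrierWave's or MimeMagic's code; guess_content_type and both tables are made up for illustration:

```ruby
# Leading bytes ("magic numbers") for a few common image formats.
MAGIC_SIGNATURES = {
  "\x89PNG\r\n\x1A\n".b => 'image/png',
  "\xFF\xD8\xFF".b      => 'image/jpeg',
  'GIF8'.b              => 'image/gif'
}.freeze

# Tiny extension table used as the last resort.
EXTENSION_TYPES = { '.csv' => 'text/csv', '.png' => 'image/png' }.freeze

# Hypothetical helper following the order of the chain above:
# declared type first, then content sniffing, then the extension.
def guess_content_type(data, filename, declared = nil)
  return declared if declared && !declared.empty?

  bytes = data.to_s.b
  MAGIC_SIGNATURES.each do |signature, type|
    return type if bytes.start_with?(signature)
  end

  EXTENSION_TYPES[File.extname(filename).downcase]
end

puts guess_content_type("a,b\n1,2\n", 'fixture.csv')
```

A CSV file has no signature bytes, so in this toy version only the extension step can identify it.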
Use Rack::Test::UploadedFile: assign the mounted model a Rack::Test::UploadedFile object.
Assume your model is:
class User < ApplicationRecord
mount_uploader :file, FileUploader
end
To test the uploader you can use something like:
user.file = Rack::Test::UploadedFile.new(File.open('test_file.csv'), "text/csv")
user.save
Whitelist with CarrierWave's content_type_whitelist or extension_whitelist:
class FileUploader < CarrierWave::Uploader::Base
private
def extension_whitelist
%w(csv xlsx xls)
end
def content_type_whitelist
[
'text/csv',
'application/vnd.ms-excel',
'application/vnd.ms-office',
'application/octet-stream',
'text/comma-separated-values'
]
end
end
Also check:
https://til.codes/testing-carrierwave-file-uploads-with-rspec-and-factorygirl/

Reading and writing file attributes

In the rails console:
ActionDispatch::Http::UploadedFile.new tempfile: 'tempfilefoo', original_filename: 'filename_foo.jpg', content_type: 'content_type_foo', headers: 'headers_foo'
=> #<ActionDispatch::Http::UploadedFile:0x0000000548f3a0 @tempfile="tempfilefoo", @original_filename=nil, @content_type=nil, @headers=nil>
I can write a string to @tempfile, and yet @original_filename, @content_type and @headers remain nil.
Why is this and how can I write information to these attributes?
And how can I read these attributes from a file instance?
i.e.
File.new('path/to/file.png')
It's not documented (and doesn't make much sense), but it looks like the options UploadedFile#initialize takes are :tempfile, :filename, :type and :head:
def initialize(hash) # :nodoc:
  @tempfile = hash[:tempfile]
  raise(ArgumentError, ':tempfile is required') unless @tempfile
  @original_filename = encode_filename(hash[:filename])
  @content_type = hash[:type]
  @headers = hash[:head]
end
Changing your invocation to this ought to work:
ActionDispatch::Http::UploadedFile.new tempfile: 'tempfilefoo',
filename: 'filename_foo.jpg', type: 'content_type_foo', head: 'headers_foo'
Or you can set them after initialization:
file = ActionDispatch::Http::UploadedFile.new tempfile: 'tempfilefoo', filename: 'filename_foo.jpg'
file.content_type = 'content_type_foo'
file.headers = 'headers_foo'
I'm not sure I understand your second question, "And how can I read these attributes from a file instance?"
You can extract the filename (or last component) from any path with File.basename:
file = File.new('path/to/file.png')
File.basename(file.path) # => "file.png"
If you want to get the Content-Type that corresponds to a file extension, you can use Rails' Mime module:
type = Mime["png"] # => #<Mime::Type:... @synonyms=[], @symbol=:png, @string="image/png">
type.to_s # => "image/png"
You can put this together with File.extname, which gives you the extension:
ext = File.extname("path/to/file.png") # => ".png"
ext = ext.sub(/^\./, '') # => "png" (drop the leading dot)
Mime[ext].to_s # => "image/png"
You can see a list of all of the MIME types Rails knows about by typing Mime::SET in the Rails console, or looking at the source, which also shows you how to register other MIME types in case you're expecting other types of files.
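Outside of Rails, the basic attributes of a plain File instance can be read with the standard library alone. A small sketch, where file_attributes is a hypothetical helper name:

```ruby
require 'tempfile'

# Hypothetical helper: collect basic attributes from any object that
# responds to #path (File, Tempfile, ...).
def file_attributes(file)
  path = file.path
  {
    filename:  File.basename(path),
    extension: File.extname(path).delete_prefix('.'),
    size:      File.size(path)
  }
end

# Tempfile.create yields a File and removes it when the block ends.
Tempfile.create(['sample', '.png']) do |tmp|
  tmp.write('12345')
  tmp.flush
  p file_attributes(tmp)
end
```

Note that a File knows nothing about a content type; that has to be looked up from the extension or detected from the contents.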
The following should help you:
upload = ActionDispatch::Http::UploadedFile.new({
:tempfile => File.new("#{Rails.root}/relative_path/to/tempfilefoo") , #make sure this file exists
:filename => "filename_foo" # use this instead of original_filename
})
upload.headers = "headers_foo"
upload.content_type = "content_type_foo"
I didn't understand exactly what you want to do by "And how can I read these attributes from a file instance?".
Perhaps if you want to read the tempfile, you can use:
upload.read # -> content of tempfile
upload.rewind # -> rewinds the pointer back so that you can read it again.
Hope it helps :) And let me know if I have misunderstood.

Paperclip is not supporting .doc file

In Rails 4.0.2, I am using the paperclip gem to upload files, but it does not accept .doc files. Below the file upload field it shows the error message "has an extension that does not match its contents".
In the model, the content type validation is:
validates_attachment_content_type :document, :content_type => ['application/txt', 'text/plain',
'application/pdf', 'application/msword',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'application/vnd.oasis.opendocument.text',
'application/x-vnd.oasis.opendocument.text',
'application/rtf', 'application/x-rtf', 'text/rtf',
'text/richtext', 'application/doc', 'application/docx', 'application/x-soffice', 'application/octet-stream']
Gems currently installed:
rails (4.0.2, 4.0.0, 3.2.13, 3.2.8, 3.0.4, 3.0.3)
paperclip (3.5.2, 2.3.11, 2.3.8)
How can I solve this issue?
Add this to an initializer to disable spoofing protection:
require 'paperclip/media_type_spoof_detector'
module Paperclip
class MediaTypeSpoofDetector
def spoofed?
false
end
end
end
For CentOS:
module Paperclip
  class MediaTypeSpoofDetector
    def type_from_file_command
      begin
        Paperclip.run("file", "-b --mime :file", :file => @file.path)
      rescue Cocaine::CommandLineError
        ""
      end
    end
  end
end
from https://github.com/thoughtbot/paperclip/issues/1429
It's a bad idea to skip the spoofing check, because Paperclip adds it for security reasons. See this article for details:
http://robots.thoughtbot.com/prevent-spoofing-with-paperclip
The spoof validation checks whether the file's extension matches its MIME type. For example, a txt file's MIME type is text/plain, so uploading it to Paperclip works fine. But if you change the extension to jpg and then upload it, the validation fails because a jpg file's MIME type should be image/jpeg.
Note that this validation is a security check, so there is no normal way to skip it. Even do_not_validate_attachment_file_type does not skip it. For some files, however, Paperclip can't map the file to its MIME type correctly.
In this case the right way is to add a content type mapping to the Paperclip configuration, like this:
# Add it to initializer
Paperclip.options[:content_type_mappings] = {
pem: 'text/plain'
}
This way it works without breaking the spoofing validation. If you don't know a file's MIME type, you can use the file command:
file -b --mime-type some_file.pdf # -> application/pdf
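The idea behind the check and the mapping can be shown with a toy version. This is not Paperclip's implementation; spoofed? here is a standalone function, EXPECTED_TYPES is a made-up table, and extra_mappings merely plays the role that content_type_mappings plays in the real configuration:

```ruby
# Illustrative extension-to-type table; a real application would use a
# fuller registry.
EXPECTED_TYPES = {
  '.txt' => ['text/plain'],
  '.jpg' => ['image/jpeg'],
  '.pdf' => ['application/pdf']
}.freeze

# Hypothetical check: the upload counts as spoofed when the type
# detected from the content is not one the extension allows.
def spoofed?(filename, detected_type, extra_mappings = {})
  ext = File.extname(filename).downcase
  allowed = EXPECTED_TYPES.fetch(ext, []) + Array(extra_mappings[ext])
  !allowed.include?(detected_type)
end

puts spoofed?('notes.jpg', 'text/plain')
puts spoofed?('key.pem', 'text/plain', '.pem' => 'text/plain')
```

An extension the table does not know is treated as spoofed until a mapping for it is added, which is the situation described above for files such as .pem.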
You can authorize all content types using do_not_validate_attachment_file_type :file.
You can allow spoofing by passing validate_media_type: false to has_attached_file :file:
class Upload < ActiveRecord::Base
  # validate_media_type: false means "authorize spoofing"
  has_attached_file :file, validate_media_type: false
  # authorize all content types
  do_not_validate_attachment_file_type :file
end
The error in the server log means that your OS file command cannot get you the MIME type for a .doc file. This happens for me with ubuntu 12.04.
To get around this, I slightly altered MediaTypeSpoofDetector to use mimetype if file --mime didn't work.
module Paperclip
  class MediaTypeSpoofDetector
    private
    def type_from_file_command
      # -- original code removed --
      # begin
      #   Paperclip.run("file", "-b --mime-type :file", :file => @file.path)
      # rescue Cocaine::CommandLineError
      #   ""
      # end
      # -- new code follows --
      file_type = ''
      begin
        file_type = Paperclip.run('file', '-b --mime-type :file', file: @file.path)
      rescue Cocaine::CommandLineError
        file_type = ''
      end
      if file_type == ''
        begin
          file_type = Paperclip.run('mimetype', '-b :file', file: @file.path)
        rescue Cocaine::CommandLineError
          file_type = ''
        end
      end
      file_type
    end
  end
end
Try putting do_not_validate_attachment_file_type :document validation in model.

Migrating paperclip S3 images to new url/path format

Is there a recommended technique for migrating a large set of paperclip S3 images to a new :url and :path format?
The reason for this is because after upgrading to rails 3.1, new versions of thumbs are not being shown after cropping (previously cached version is shown). This is because the filename no longer changes (since asset_timestamp was removed in rails 3.1). I'm using :fingerprint in the url/path format, but this is generated from the original, which doesn't change when cropping.
I was intending to insert :updated_at in the url/path format, and update attachment.updated_at during cropping, but after implementing that change all existing images would need to be moved to their new location. That's around half a million images to rename over S3.
At this point I'm considering copying them to their new location first, then deploying the code change, then moving any images which were missed (ie uploaded after the copy), but I'm hoping there's an easier way... any suggestions?
I had to change my paperclip path in order to support image cropping, I ended up creating a rake task to help out.
namespace :paperclip_migration do
desc 'Migrate data'
task :migrate_s3 => :environment do
# Make sure that all of the models have been loaded so any attachments are registered
puts 'Loading models...'
Dir[Rails.root.join('app', 'models', '**/*')].each { |file| File.basename(file, '.rb').camelize.constantize }
# Iterate through all of the registered attachments
puts 'Migrating attachments...'
attachment_registry.each_definition do |klass, name, options|
puts "Migrating #{klass}: #{name}"
klass.find_each(batch_size: 100) do |instance|
attachment = instance.send(name)
unless attachment.blank?
attachment.styles.each do |style_name, style|
old_path = interpolator.interpolate(old_path_option, attachment, style_name)
new_path = interpolator.interpolate(new_path_option, attachment, style_name)
# puts "#{style_name}:\n\told: #{old_path}\n\tnew: #{new_path}"
s3_copy(s3_bucket, old_path, new_path)
end
end
end
end
puts 'Completed migration.'
end
#############################################################################
private
# Paperclip Configuration
def attachment_registry
Paperclip::AttachmentRegistry
end
def s3_bucket
ENV['S3_BUCKET']
end
def old_path_option
':class/:id_partition/:attachment/:hash.:extension'
end
def new_path_option
':class/:attachment/:id_partition/:style/:filename'
end
def interpolator
Paperclip::Interpolations
end
# S3
def s3
AWS::S3.new(access_key_id: ENV['S3_KEY'], secret_access_key: ENV['S3_SECRET'])
end
def s3_copy(bucket, source, destination)
source_object = s3.buckets[bucket].objects[source]
destination_object = source_object.copy_to(destination, {metadata: source_object.metadata.to_h})
destination_object.acl = source_object.acl
puts "Copied #{source}"
rescue StandardError => e
puts "*Unable to copy #{source} - #{e.message}"
end
end
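Before copying half a million objects it is worth printing a few old and new keys to check the patterns. The sketch below is a much-reduced stand-in for Paperclip::Interpolations, written only for that kind of dry run; interpolate and the sample values are hypothetical, while id_partition follows the zero-padded three-digit layout Paperclip uses for integer ids:

```ruby
# Mirrors the zero-padded layout of the :id_partition placeholder,
# e.g. 42 becomes "000/000/042".
def id_partition(id)
  format('%09d', id).scan(/\d{3}/).join('/')
end

# Hypothetical, much-reduced interpolation: replaces only the
# placeholders passed in `values`.
def interpolate(pattern, values)
  # Replace longer keys first so a short key never clobbers a longer one.
  sorted_keys = values.keys.sort_by { |key| -key.to_s.length }
  sorted_keys.inject(pattern) do |path, key|
    path.gsub(":#{key}", values[key].to_s)
  end
end

values = { class: 'users', attachment: 'avatars',
           id_partition: id_partition(42), style: 'thumb', filename: 'me.png' }
puts interpolate(':class/:attachment/:id_partition/:style/:filename', values)
```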
I didn't find a feasible method for migrating to a new URL format. I ended up overriding Paperclip::Attachment#generate_fingerprint so that it appends :updated_at.

Storing image using open URI and paperclip having size less than 10kb

I want to import some icons from my old site. The icons are smaller than 10 KB, so when I try to import them I get a stringio.txt file instead.
require "open-uri"
class Category < ActiveRecord::Base
has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/:basename.:extension"
def icon_from_url(url)
self.icon = open(url)
end
end
In rake task.
category = Category.new
category.icon_from_url "https://xyz.com/images/dog.png"
category.save
Try:
def icon_from_url(url)
extname = File.extname(url)
basename = File.basename(url, extname)
file = Tempfile.new([basename, extname])
file.binmode
open(URI.parse(url)) do |data|
file.write data.read
end
file.rewind
self.icon = file
end
To override the default filename of a "fake file upload" in Paperclip (stringio.txt on small files or an almost random temporary name on larger files) you have 2 main possibilities:
Define an original_filename on the IO:
def icon_from_url(url)
io = open(url)
io.original_filename = "foo.png"
self.icon = io
end
You can also get the filename from the URI:
io.original_filename = File.basename(URI.parse(url).path)
Or replace :basename in your :path:
has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/foo.png", :url => "/:attachment/:id/:style/foo.png"
Remember to always change the :url when you change the :path, otherwise the icon.url method will be wrong.
You can also define your own custom interpolations (e.g. :rails_root/public/:whatever).
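When taking the filename from the URI, URLs with a query string or without any path need a little care. A minimal sketch, where filename_from_url and its default value are assumptions made for illustration:

```ruby
require 'uri'

# Hypothetical helper: take the last path segment of a URL, ignoring
# any query string, and fall back to a default when the path is empty.
def filename_from_url(url, default = 'download')
  name = File.basename(URI.parse(url).path.to_s)
  (name.empty? || name == '/') ? default : name
end

puts filename_from_url('https://xyz.com/images/dog.png?size=2')
```

The result can then be assigned to io.original_filename as in the first option above.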
You are almost there, I think. Try opening the parsed URI, not the string.
require "open-uri"
class Category < ActiveRecord::Base
  has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/:basename.:extension"
  def icon_from_url(url)
    self.icon = open(URI.parse(url))
  end
end
Of course, this doesn't handle errors.
You can also disable OpenURI from ever creating a StringIO object, and force it to create a temp file instead. See this SO answer:
Why does Ruby open-uri's open return a StringIO in my unit test, but a FileIO in my controller?
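If you would rather leave OpenURI's behaviour alone, another option is to copy whatever IO you got into a Tempfile yourself. A minimal sketch using only the standard library; to_tempfile is a hypothetical helper name:

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: copy any readable IO (for example the StringIO
# that open-uri returns for small downloads) into a Tempfile whose name
# keeps the original extension.
def to_tempfile(io, filename)
  ext  = File.extname(filename)
  base = File.basename(filename, ext)
  file = Tempfile.new([base, ext])
  file.binmode
  file.write(io.read)
  file.rewind
  file
end

tmp = to_tempfile(StringIO.new('tiny icon bytes'), 'dog.png')
puts File.extname(tmp.path)
tmp.close!
```

Because the Tempfile path ends in the real extension, the attachment no longer ends up named stringio.txt.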
In the past, I found the most reliable way to retrieve remote files was by using the command line tool "wget". The following code is mostly copied straight from an existing production (Rails 2.x) app with a few tweaks to fit with your code examples:
require 'open3'
require 'tempfile'
class CategoryIconImporter
  # Class-level state: path to the wget binary and the current download target
  @@wget = nil
  @@tempfile = nil
  def self.download_to_tempfile(url)
    system(wget_download_command_for(url))
    @@tempfile.path
  end
  def self.clear_tempfile
    @@tempfile.delete if @@tempfile && @@tempfile.path && File.exist?(@@tempfile.path)
    @@tempfile = nil
  end
  def self.set_wget
    # used for retrieval in NrlImage (and in future from other sites?)
    if !@@wget
      stdin, stdout, stderr = Open3.popen3('which wget')
      @@wget = stdout.gets
      @@wget ||= '/usr/local/bin/wget'
      @@wget.strip!
    end
  end
  def self.wget_download_command_for(url)
    set_wget
    @@tempfile = Tempfile.new url.sub(/\?.+$/, '').split(/[\/\\]/).last
    command = [ @@wget ]
    command << '-q'
    if url =~ /^https/
      command << '--secure-protocol=auto'
      command << '--no-check-certificate'
    end
    command << '-O'
    command << @@tempfile.path
    command << url
    command.join(' ')
  end
  def self.import_from_url(category_params, url)
    clear_tempfile
    filename = url.sub(/\?.+$/, '').split(/[\/\\]/).last
    found = MIME::Types.type_for(filename)
    content_type = !found.empty? ? found.first.content_type : nil
    download_to_tempfile url
    nicer_path = RAILS_ROOT + '/tmp/' + filename
    File.copy @@tempfile.path, nicer_path
    Category.create(category_params.merge({:icon => ActionController::TestUploadedFile.new(nicer_path, content_type, true)}))
  end
end
The rake task logic might look like:
[
['Cat', 'cat'],
['Dog', 'dog'],
].each do |name, icon|
CategoryIconImporter.import_from_url({:name => name}, "https://xyz.com/images/#{icon}.png")
end
This uses the mime-types gem for content type discovery:
gem 'mime-types', :require => 'mime/types'
