Rails: Preventing Duplicate Photo Uploads with Paperclip?

Is there any way to throw a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality...
I'm using Rails 2.3.5 and Paperclip (obviously).
SOLUTION: (or one of them, at least)
Using Beerlington's suggestion, I decided to go with an MD5 Checksum comparison:
require 'digest/md5'

class Photo < ActiveRecord::Base
  #...
  has_attached_file :image #, ...
  before_validation_on_create :generate_md5_checksum
  validate :unique_photo
  #...

  # Store the MD5 of the uploaded file so duplicates can be looked up by digest.
  def generate_md5_checksum
    self.md5_checksum = Digest::MD5.hexdigest(image.to_file.read)
  end

  # Reject the upload if this user already has a photo with the same digest.
  def unique_photo
    photo_digest = self.md5_checksum
    errors.add_to_base "You have already uploaded that file!" unless User.find(self.user_id).photos.find_by_md5_checksum(photo_digest).nil?
  end
  # ...
end
Then I just added a column to my photos table called md5_checksum, and voila! Now my app throws a validation error if you try to upload the same photo!
No idea how efficient/inefficient this is, so refactoring's welcome!
Thanks!

What about doing an MD5 on the image file? If it is the exact same file, the MD5 hash will be the same for both images.
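As a rough illustration of that idea outside of any Rails model, the comparison can be sketched in plain Ruby with the standard `digest` library. The helper names `file_md5` and `same_file?` are made up for this example and are not part of Paperclip.

```ruby
require "digest"

# Hypothetical helper: hex MD5 of a file's bytes. Digest::MD5.file reads the
# file in chunks, so large images are not loaded into memory at once.
def file_md5(path)
  Digest::MD5.file(path).hexdigest
end

# Hypothetical helper: true when both files have the same MD5 digest,
# which is what two byte-for-byte identical uploads will produce.
def same_file?(path_a, path_b)
  file_md5(path_a) == file_md5(path_b)
end
```

Note that this only catches exact copies: any change to the bytes, including metadata edits, produces a different digest.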

For anyone else trying to do this: Paperclip now has MD5 hashing built in. If you have an [attachment]_fingerprint column in your model, Paperclip will populate it with the MD5.
Since I already had a column named hash_value, I made a 'virtual' attribute called picture_fingerprint:
# Virtual attribute to have Paperclip generate the MD5
def picture_fingerprint
  self.hash_value
end

def picture_fingerprint=(md5Hash)
  self.hash_value = md5Hash
end
And, with Rails 3, using sexy_validations, I was able to simply add this to the top of my model to ensure that the hash_value is unique before the model is saved:
validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }

You might run into a problem when your images have amended EXIF metadata. This happened to me, and I had to extract pixel values and calculate MD5s from them, to ignore changes made by WordPress etc. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to get the pixel data out of the image with some tool (like RMagick), concatenate it into a string, and calculate the MD5 of that.
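A minimal sketch of the hashing half of that approach, assuming the pixel values have already been extracted by an image library into a flat array of 0..255 channel values. The function name `pixel_md5` and the choice to mix the dimensions into the digest are assumptions for this example, not something the answer specifies.

```ruby
require "digest"

# Hypothetical helper: digest of the decoded pixel data only.
# width/height are mixed in so that the same values laid out as a
# different shape (e.g. 2x1 vs 1x2) do not collide.
# pixels: flat Array of Integers in 0..255, as produced by an image library.
def pixel_md5(width, height, pixels)
  header = [width, height].pack("N2") # two 32-bit big-endian integers
  body   = pixels.pack("C*")          # one unsigned byte per channel value
  Digest::MD5.hexdigest(header + body)
end
```

Because EXIF never enters the digest, two files that differ only in metadata hash the same. A lossy re-encode changes the pixel values themselves, so it would still produce a different digest.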

As Stephen indicated, your biggest issue is how to determine if a file is a duplicate, and there is no clear answer for this.
If these are photos taken with a digital camera, you would want to compare the EXIF data. If the EXIF data matches, then the photo is most likely a duplicate, and you can inform the user of this. You'll have to accept the upload initially, though, so that you can examine the EXIF data.
I should mention that EXIFR is a nice ruby gem for examining the EXIF data.

Related

Rails 5.2 ActiveStorage save and then read Exif data

On Rails 5.2 I am trying to save an avatar via ActiveStorage, but it seems as though no image orientation data is being saved in the Active Storage blob.
I am saving the avatar via a file_field on a create action. My user model:
# user model
has_one_attached :avatar

private

def avatar_validation
  if avatar.attached?
    if avatar.blob.byte_size > 1000000
      avatar.purge
      errors.add(:avatar, 'file is too large')
    elsif !avatar.blob.content_type.in?(%w[image/png image/jpg image/jpeg])
      avatar.purge
      errors.add(:avatar, 'file type needs to be JPEG, JPG, or PNG')
    end
  end
end
I have been reading some documentation for minimagick https://github.com/minimagick/minimagick but have not figured out how I can associate
user.avatar.blob
with
image = MiniMagick::Image.open("input.jpg")
I have tried
image = MiniMagick::Image.open("user.avatar.blob")
but have had no luck.
I need to figure this out because some avatars stored in Active Storage are being displayed rotated 90 degrees.
The guide at https://edgeguides.rubyonrails.org/active_storage_overview.html talks about image processing, but I have also had no luck with the gem Rails recommends.
I think you want to use a variant when displaying the image rather than trying to edit the stored image. To fix the orientation, you could say:
user.avatar.variant(auto_orient: true)
And if you want to do several operations at once (rather than in a pipeline), use combine_options:
user.avatar.variant(combine_options: {
  auto_orient: true,
  gravity: 'center',
  resize: '23x42', # Using real dimensions of course.
  crop: '23x42+0+0'
})
The edited image will be cached so you only do the transformation work on first access. You might want to put your variants into view helpers (or maybe even a model concern depending on your needs) so that you can isolate the noise.
You might want to refer to the API docs as well as the guide:
ActiveStorage::Variant
ActiveStorage::Variation

ActiveStorage Thumbnail Persistence

I have migrated my Rails app to 5.2.0. Before I was using Paperclip. Paperclip generates different variants like thumbnail and avatar when an image is uploaded. How can I achieve this with ActiveStorage? I know we can do this user.avatar.variant(resize_to_fit: [100, 100]) but to me it's like doing this over and over again. I'm aiming to do pre-processing of these variants once it's uploaded.
Also you guys can suggest a better technique if this is bad from your experience.
Using .processed is the correct way to check if that variant was already processed and uploaded to the storage service.
One thing that Paperclip did nicely was the styles: {} object, in which you could list all the different transformations you wanted to do for thumbnails, etc, and name them.
Here's how I am handling named & stored transformations. This also keeps my template syntax shorter:
class Image < ActiveRecord::Base
  has_one_attached :image_file

  def self.sizes
    {
      thumbnail: { resize: "100x100" },
      hero1: { resize: "1000x500" }
    }
  end

  def sized(size)
    self.image_file.variant(Image.sizes[size]).processed
  end
end
Then in a template, say I have @image, I can simply call @image.sized(:hero1)
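One thing to watch with a lookup like Image.sizes[size] is that a misspelled name returns nil, which would then be handed to variant. A small plain-Ruby sketch of a stricter lookup follows. The constant `VARIANT_SIZES` and the method `variant_options` are hypothetical names for this example; the sizes are copied from the answer.

```ruby
# Hypothetical registry of named variant options, mirroring Image.sizes.
VARIANT_SIZES = {
  thumbnail: { resize: "100x100" },
  hero1:     { resize: "1000x500" }
}.freeze

# Look up options by name (Symbol or String) and fail loudly on a typo
# instead of silently returning nil.
def variant_options(name)
  VARIANT_SIZES.fetch(name.to_sym) do
    raise ArgumentError,
          "unknown size #{name.inspect}; expected one of #{VARIANT_SIZES.keys.inspect}"
  end
end
```

In the model above, sized(size) could then pass variant_options(size) to variant in place of Image.sizes[size].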
@aguardientico is correct: adding the .processed method to your variant object uses the blob key to check whether the file already exists on your service before running the whole transformation again.
Also note that resize_to_fit is an ImageProcessing gem transformation and is not yet supported by Rails 5.2. Rails 5.2 uses MiniMagick instead, where you append > to the resize argument, as you would with Paperclip,
so rewritten it would look like user.avatar.variant(resize: "100x100>")

Carrierwave, creating a duplicate attachment when duplicating its containing model

I would like to duplicate a model. The original model contains an attachment through Carrierwave. Ideally, a new attachment would be created, that is a copy of the original image, for the new model object.
I have looked through the Carrierwave documentation, and googled this problem, but have not found a solution that creates a new duplicate of the original image. Is this reasonable? Possible?
I don't believe Carrierwave has this option. However, you can make use of the *_remote_url= method to set the new model's picture to be a duplicate of the first.
Here's a brief example
Say I have a model which has_one :photo attached with Carrierwave. I can duplicate the model, set the photo to the previous one, and save it. Example:
first_model = User.first
duplicate_model = first_model.dup #(where the dup code duplicates everything else you need)
duplicate_model.remote_photo_url = first_model.photo_url
duplicate_model.save
This would then "copy" the photo from the first object into your second as a new carrierwave attachment.
While copy_carrierwave_file is a neat gem, it is not necessary as long as you use local storage.
Carrierwave can use local files as the source of attachments, and you can use this to duplicate the attachment:
first_user = User.first
duplicate_user = first_user.dup
duplicate_user.photo = File.open(first_user.photo.file.file) if first_user.photo.present?
duplicate_user.save
This is more efficient than routing the image twice through your web server.
Try this gem https://github.com/equivalent/copy_carrierwave_file , it handles both local storage and Fog storage
original_resource = User.last
new_resource = User.new
CopyCarrierwaveFile::CopyFileService.new(original_resource, new_resource, :avatar).set_file
new_resource.save
new_resource.avatar.url # https://...image.jpg
For me with CarrierWave 0.10 this works just fine:
user = User.first
dup_user = user.dup
dup_user.photo = user.photo
dup_user.save
Although I'm not sure how this works out when using cloud storage like S3
Extracted from the Carrierwave wiki page:
YourModel.find_each do |ym|
  begin
    ym.process_your_uploader_upload = true # only if you use carrierwave_backgrounder
    ym.your_uploader.cache_stored_file!
    ym.your_uploader.retrieve_from_cache!(ym.your_uploader.cache_name)
    ym.your_uploader.recreate_versions!(:version1, :version2)
    ym.save!
  rescue => e
    puts "ERROR: YourModel: #{ym.id} -> #{e.to_s}"
  end
end
I needed to fully duplicate the whole version set on S3, while some of the versions were cropped.
Unfortunately, remote_#{column}_url= method was of no help, because by the time the versions are recreated, there are no crop params on the model:
I used RailsCasts approach using attr_accessor to crop the avatar, and those params weren't stored in the DB.
After some research and a lot of failures, I found this answer and noticed that copy_to method.
It turned out that both SanitizedFile and Storage::Fog have it, so it's possible to use it for local and S3 files. However, I didn't investigate how exactly it works and decided to give Carrierwave a chance to take care of it.
class AvatarUploader
  …
  # Copy every stored version of this avatar over the target's versions.
  def duplicate_to(target)
    return unless file.present? && target.avatar.file.present?
    versions.keys.each do |version|
      public_send(version).file.copy_to(target.avatar.public_send(version).path)
    end
  end
end
That's all it takes to fully duplicate the images, no matter if they are cropped or not.
There's a catch, however: you should only call duplicate_to after the model has already been saved with another avatar, or the target path would be nil. Thus, one useless round of processing takes place for the new record.
new_user.assign_attributes(old_user.slice(:avatar, :avatar_alignment))
# Won't work!
old_user.avatar.duplicate_to(new_user) # => as the `new_user` hasn't persisted yet, its avatar files are Tempfiles
new_user.save # => will recreate the versions from the original image, losing the cropped versions!
# But this works
new_user.save # => the avatar will be stored as a set of versions created from the original (useless processing)
old_user.avatar.duplicate_to(new_user) # => the avatar files will be rewritten by the copies of old_user files
I think it's a good idea to store the crop params somewhere in the DB in a JSON-like object for such cases (and to be protected from losing cropping data when you have to recreate_versions!), but if that's not an option, this solution might be what you seek.
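A minimal sketch of what storing the crop params as JSON could look like, in plain Ruby with the standard `json` library. The attribute names `crop_x`, `crop_y`, `crop_w`, `crop_h` and both helper names are assumptions for this example; the answer does not list its actual crop attributes.

```ruby
require "json"

# Hypothetical crop attribute names; adjust to whatever the cropper uses.
CROP_KEYS = %i[crop_x crop_y crop_w crop_h].freeze

# Serialize only the crop values so they can be kept in a text/JSON column.
def dump_crop_params(params)
  JSON.generate(params.slice(*CROP_KEYS))
end

# Restore the crop values as a symbol-keyed hash of integers.
# Returns an empty hash when nothing was stored.
def load_crop_params(json)
  return {} if json.nil? || json.empty?

  JSON.parse(json, symbolize_names: true)
      .slice(*CROP_KEYS)
      .transform_values { |value| Integer(value) }
end
```

With the values persisted this way, they would still be available when recreate_versions! runs later, which is the failure mode described above.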
As this thread is the first G-link when searching for carrierwave duplicate, I decided to post this answer exactly here.
Carrierwave 1.3.2 with fog-aws 1.2.0.
Hope this helps someone or the future me!

Using carrierwave to upload one image to multiple storage location

I would like to be able to upload one image into two different locations: one location would be on the local filesystem (of the server) and the other would be Amazon S3 (the Amazon S3 location would be optional).
My current environment is Rails 3.2.8, Ruby 1.9.3, with Carrierwave used for uploading the file.
I've had some success using the following method:
Model
class Image < ActiveRecord::Base
  attr_accessor :remote
  before_save :configure_for_remote
  mount_uploader :image, ImageUploader # stores images locally
  mount_uploader :image_remote, ImageRemoteUploader # stores images on S3

  def configure_for_remote
    if self.remote == "1"
      self.image_remote = self.image.dup
    end
  end
end
Relevant view form fields (simple form syntax)
<p><%= f.input :image, as: :file %></p>
<p><%= f.input :remote, as: :boolean %></p>
The user checks the "remote" checkbox in the form and chooses the image to upload. The before_save callback stores a duplicate of image into image_remote, the file is processed by their respective uploaders, and I have my desired result.
However, I'm starting to run into problems when I want to update that field. For example, suppose the user first uploads the file locally and not to S3 (does not check the remote checkbox), then later comes back to the form and checks the remote checkbox. In this case, the before_save callback does not run because no real Active Record column has been changed (only the remote flag). I've tried to use before_validation, but this fails to work (the image_remote uploader stores the proper filename in the image_remote column, but the image does not get uploaded to S3). Obviously something is changing between before_validation and before_save (is the image attribute being converted to an uploader?), but I can't seem to figure out why this doesn't work.
With all this being said, I think my approach with using dup is a bit of a hack, and I'm hoping someone can advise me in a more elegant way of reaching my goal.
Thanks for your help.
I was able to solve this, although I'm still not sure if it's the most elegant solution.
First off, I mentioned in my question that when I registered config_for_remote_upload with the before_validation callback, the file was not uploaded to S3, but the image_remote column was populated. Upon further inspection, the situation is even worse. When initializing the image_remote uploader within the before_validation callback, all files were deleted from the S3 storage bucket! I replicated this a couple of times. I only tested with store_dir set to nil in the uploader, which puts the files at the root of the bucket.
Initializing the image_remote column during the before_save callback does not have this problem. In order to force the record to save (it wouldn't save, because only a non-DB attribute was being changed), I added a before_validation callback that changes the updated_at field of the record.
before_validation :change_record_updated_at
...
def change_record_updated_at
  self.updated_at = Time.current
end
I also moved away from using dup, not because it didn't work, but rather because I didn't know why it worked. Instead I created a StringIO object for the file and assigned that to the image_remote column.
def config_for_remote_upload
  if self.remote.to_i == 1
    # self.image_remote = self.image.dup
    # Read the local file in binary mode.
    img_binary = File.open(self.image.file.path, "rb") { |i| i.read }
    # Round-trip through Base64; the decoded bytes equal img_binary.
    img_encoded = Base64.encode64(img_binary)
    io = FilelessIO.new(Base64.decode64(img_encoded))
    io.original_filename = self.image.file.original_filename
    self.image_remote = io
  elsif self.remote.to_i == 0
    # Delete the remote image and clear the field.
    self.remove_image_remote = true
  end
end
See here for further info on FilelessIO (StringIO with original_filename).
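The link target is not preserved here, so the following is only a minimal sketch based on the parenthetical description above (a StringIO that also carries an original_filename), not necessarily the exact class the answer used.

```ruby
require "stringio"

# A StringIO that remembers a filename, so an uploader that asks the
# IO object for original_filename has something to use.
class FilelessIO < StringIO
  attr_accessor :original_filename
end
```

An instance is built from the raw bytes and given a name before being assigned to the mounted column, as config_for_remote_upload does.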
With this configuration, the file can be uploaded to the second storage location (S3 in my case) after the initial upload.
Hope this helps someone else out.

Carrierwave Rails 3 S3, save the file size to the database

Using Carrierwave with Rails 3.2.6. All fine, except I need to sort a table where some attachments are displayed by file size. I'm using S3 for storage with fog.
Let's say I display a Carrierwave attachment like this:
<%= @project.attachment %>
I am able to show the size of the file by using '.size' after the field name:
<%= @project.attachment.size %>
This shows the file size in bytes, but as I need to use an order clause when getting the records from the database, I cannot sort on this.
Is there any way to write the file size to a particular column in the database after it has been uploaded so I can sort on this?
many thanks
this worked for me
before_save :update_project_attributes

private

def update_project_attributes
  if project.present? && project_changed?
    self.file_size = project.file.size
  end
end
You should add a virtual attribute to the model and define a custom getter method that returns the file size. You can then sort with respect to this virtual attribute as you usually would. Let me know if you need more details and I will try to provide them!
Ok,
got this to work with before_save
before_save :set_size

def set_size
  self.size = self.upload.size
end
where upload is the mounted field and size is a new db column to store the size.
