Storing files from Imgur in Postgres using Rails

I am trying to set up a database to store PNG files extracted from Imgur, and I want to know whether it is possible to store the images themselves in my database rather than just the links. I currently store image links from a txt file of selected Imgur links using ActiveRecord in my seed.rb file, as seen in the code below:
image = []
x = 0
while x < 25 do
  # one row hash per Imgur URL; chomp strips the trailing newline from the file line
  image_hash = { :url => url_file_lines[x].chomp }
  image << image_hash
  x += 1
end
# bulk-insert the collected rows with activerecord-import
Image.import image, validate: false
I have a good idea of how to retrieve the files via the Imgur links, but I am having a hard time figuring out how to convert an image file into a data type that can be stored in the Postgres database.

Storing images in the database as blobs is a bad idea, simply because it bloats the database unnecessarily. Instead of storing files in the database, I recommend using a gem for file storage, such as Shrine. If you don't have any cloud storage, you can save the files on local storage. You just need to add an image_data column of type jsonb to your table (as you use Postgres) and then mount an uploader class in your model (as shown in the gem's documentation).
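A minimal sketch of that setup, assuming a local disk store and an images table (all names here are placeholders; see the Shrine documentation for the authoritative setup):
# config/initializers/shrine.rb
require "shrine"
require "shrine/storage/file_system"

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new("public", prefix: "uploads/cache"), # temporary
  store: Shrine::Storage::FileSystem.new("public", prefix: "uploads")        # permanent
}
Shrine.plugin :activerecord

# migration: add the metadata column Shrine writes to
add_column :images, :image_data, :jsonb

# app/uploaders/image_uploader.rb
class ImageUploader < Shrine
end

# app/models/image.rb
class Image < ActiveRecord::Base
  include ImageUploader::Attachment(:image) # reads/writes image_data
end

# usage
Image.create(image: File.open("some.png", "rb"))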

Related

Carrierwave, creating a duplicate attachment when duplicating its containing model

I would like to duplicate a model. The original model contains an attachment through Carrierwave. Ideally, a new attachment would be created, that is a copy of the original image, for the new model object.
I have looked through the Carrierwave documentation, and googled this problem, but have not found a solution that creates a new duplicate of the original image. Is this reasonable? Possible?
I don't believe Carrierwave has this option. However, you can make use of the remote_*_url= method to set the new model's picture to be a duplicate of the first.
Here's a brief example
Say I have a model with a photo attached via Carrierwave. I can duplicate the model, set the photo from the previous one, and save it. Example:
first_model = User.first
duplicate_model = first_model.dup #(where the dup code duplicates everything else you need)
duplicate_model.remote_photo_url = first_model.photo_url
duplicate_model.save
This would then "copy" the photo from the first object into your second as a new carrierwave attachment.
While copy_carrierwave_file is a neat gem, it is not necessary as long as you use local storage.
Carrierwave can use local files as the source of attachments, and you can use this to duplicate the attachment:
first_user = User.first
duplicate_user = first_user.dup
duplicate_user.photo = File.open(first_user.photo.file.file) if first_user.photo.present?
duplicate_user.save
This is more efficient than routing the image twice through your web server.
Try the copy_carrierwave_file gem (https://github.com/equivalent/copy_carrierwave_file); it handles both local storage and Fog storage:
original_resource = User.last
new_resource = User.new
CopyCarrierwaveFile::CopyFileService.new(original_resource, new_resource, :avatar).set_file
new_resource.save
new_resource.avatar.url # https://...image.jpg
For me with CarrierWave 0.10 this works just fine:
user = User.first
dup_user = user.dup
dup_user.photo = user.photo
dup_user.save
Although I'm not sure how this works out when using cloud storage like S3.
Extracted from the Carrierwave wiki page:
YourModel.find_each do |ym|
  begin
    ym.process_your_uploader_upload = true # only if you use carrierwave_backgrounder
    ym.your_uploader.cache_stored_file!
    ym.your_uploader.retrieve_from_cache!(ym.your_uploader.cache_name)
    ym.your_uploader.recreate_versions!(:version1, :version2)
    ym.save!
  rescue => e
    puts "ERROR: YourModel: #{ym.id} -> #{e.to_s}"
  end
end
I needed to fully duplicate the whole version set on S3, while some of the versions were cropped.
Unfortunately, the remote_#{column}_url= method was of no help, because by the time the versions are recreated, there are no crop params on the model: I used the RailsCasts approach of cropping the avatar via attr_accessor, so those params weren't stored in the DB.
After some research and a lot of failures, I found this answer and noticed the copy_to method.
It turned out that both SanitizedFile and Storage::Fog have it, so it's possible to use it for both local and S3 files. I didn't investigate how it works internally, though, and decided to give Carrierwave a chance to take care of it.
class AvatarUploader
  …
  def duplicate_to(target)
    return unless file.present? && target.avatar.file.present?

    versions.keys.each do |version|
      public_send(version).file.copy_to(target.avatar.public_send(version).path)
    end
  end
end
That's all it takes to fully duplicate the images, no matter if they are cropped or not.
There's a catch, however: you should only call duplicate_to after the target model has already been saved with some avatar, or the target paths would be nil. Thus, one useless round of processing takes place for the new record.
new_user.assign_attributes(old_user.slice(:avatar, :avatar_alignment))

# Won't work!
old_user.avatar.duplicate_to(new_user) # as new_user hasn't persisted yet, its avatar files are Tempfiles
new_user.save # will recreate the versions from the original image, losing the cropped versions!

# But this works
new_user.save # the avatar will be stored as a set of versions created from the original (useless processing)
old_user.avatar.duplicate_to(new_user) # the avatar files will be rewritten by copies of old_user's files
I think it's a good idea to store the crop params somewhere in the DB in a JSON-like object for such cases (and to be protected from losing cropping data when you have to recreate_versions!), but if that's not an option, this solution might be what you seek.
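For instance, a hypothetical sketch of persisting them (the crop_data column and its shape are made up):
# hypothetical migration
add_column :users, :crop_data, :jsonb, default: {}

# store the params when the user crops
user.update(crop_data: { 'x' => 10, 'y' => 20, 'w' => 300, 'h' => 300 })

# the uploader's processing step can then read model.crop_data,
# so recreate_versions! reproduces the same crop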
As this thread is the first Google result when searching for carrierwave duplicate, I decided to post this answer exactly here.
Carrierwave 1.3.2 with fog-aws 1.2.0.
Hope this helps someone or the future me!

Carrierwave Rails 3 S3, save the file size to the database

Using Carrierwave with Rails 3.2.6. All fine, except I need to sort a table where some attachments are displayed by file size. I'm using S3 for storage with fog.
Let's say I display a Carrierwave attachment like this:
<%= @project.attachment %>
I am able to show the size of the file by using .size after the field name:
<%= @project.attachment.size %>
This shows the file size in bytes, but since I need an ORDER clause when fetching the records from the database, I cannot sort on it.
Is there any way to write the file size to a particular column in the database after the file has been uploaded, so I can sort on that?
many thanks
This worked for me:
before_save :update_project_attributes

private

def update_project_attributes
  if project.present? && project_changed?
    self.file_size = project.file.size
  end
end
You could add a virtual attribute to the model with a custom getter method that returns the file size, and then sort by that attribute in Ruby (not in SQL, since there is no backing column). Let me know if you need more details and I will try to provide them!
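A minimal sketch of that idea (the uploader name is assumed; sorting happens in memory, not in the database):
class Project < ActiveRecord::Base
  mount_uploader :attachment, AttachmentUploader # assumed uploader

  # Virtual attribute: no backing column, so it cannot appear in an ORDER clause
  def file_size
    attachment.size
  end
end

# Sorting then has to happen in Ruby:
Project.all.sort_by(&:file_size)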
OK, got this to work with before_save:
before_save :set_size

def set_size
  self.size = self.upload.size
end
where upload is the mounted field and size is a new db column to store the size.
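With the size persisted in a real column, the sort can now be done in SQL, e.g. (assuming the model is named Project):
Project.order("size DESC")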

Stream uploading large files using aws-sdk

Is there a way to stream upload large files to S3 using aws-sdk?
I can't seem to figure it out but I'm assuming there's a way.
Thanks
Update
My memory failed me and I didn't read the quote mentioned in my initial answer correctly (see below), as revealed by the API documentation for write(data, options = {}) on S3Object and ObjectVersion:
Writes data to the object in S3. This method will attempt to
intelligently choose between uploading in one request and using
#multipart_upload.
[...] You can pass :data or :file as the first argument or as options. [emphasis mine]
The data parameter is the one to be used for streaming, apparently:
:data (Object) — The data to upload. Valid values include:
[...] Any object responding to read and eof?; the object must support the following access methods:
read # all at once
read(length) until eof? # in chunks
If you specify data this way, you must also include the
:content_length option.
[...]
:content_length (Integer) — If provided, this option must match the
total number of bytes written to S3 during the operation. This option
is required if :data is an IO-like object without a size method.
[emphasis mine]
Accordingly, the resulting sample fragment might look like this:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:data => File.open(file_name),
                                           :content_length => File.size(file_name))
puts "Uploading file #{file_name} to bucket #{bucket_name}."
Please note that I still haven't actually tested this, so beware ;)
Initial Answer
This is explained in Upload an Object Using the AWS SDK for Ruby:
Uploading Objects
Create an instance of the AWS::S3 class by providing your AWS credentials.
Use the AWS::S3::S3Object#write method which takes a data parameter and options hash which allow you to upload data from a file, or a stream. [emphasis mine]
The page contains a complete example as well, though it uses a file rather than a stream; the relevant fragment:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:file => file_name)
puts "Uploading file #{file_name} to bucket #{bucket_name}."
That should be easy to adjust to use a stream instead (if I recall correctly you might just need to replace the file_name parameter with open(file_name) - make sure to verify this though), e.g.:
# Upload a file.
key = File.basename(file_name)
s3.buckets[bucket_name].objects[key].write(:file => open(file_name))
puts "Uploading file #{file_name} to bucket #{bucket_name}."
I don't know how big the files you want to upload are, but for large files a 'pre-signed post' allows the user operating the browser to bypass your server and upload directly to S3. That may be what you need - to free up your server during an upload.
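As a rough illustration, a sketch with the modern aws-sdk-s3 gem (a presigned PUT URL, a close cousin of the pre-signed POST; region, bucket, and key are placeholders):
require 'aws-sdk-s3'

# Build a URL the browser can PUT the file to directly, bypassing the app server
object = Aws::S3::Resource.new(region: 'us-east-1')
                          .bucket('my-bucket')
                          .object('uploads/large-file.bin')
upload_url = object.presigned_url(:put, expires_in: 900) # valid for 15 minutes
puts upload_url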

Rails: Preventing Duplicate Photo Uploads with Paperclip?

Is there any way to throw a validation error if a user tries to upload the same photo twice to a Rails app using Paperclip? Paperclip doesn't seem to offer this functionality...
I'm using Rails 2.3.5 and Paperclip (obviously).
SOLUTION: (or one of them, at least)
Using Beerlington's suggestion, I decided to go with an MD5 Checksum comparison:
class Photo < ActiveRecord::Base
  #...
  has_attached_file :image #, ...
  before_validation_on_create :generate_md5_checksum
  validate :unique_photo
  #...

  def generate_md5_checksum
    self.md5_checksum = Digest::MD5.hexdigest(image.to_file.read)
  end

  def unique_photo
    photo_digest = self.md5_checksum
    errors.add_to_base "You have already uploaded that file!" unless User.find(self.user_id).photos.find_by_md5_checksum(photo_digest).nil?
  end

  # ...
end
Then I just added a column to my photos table called md5_checksum, and voila! Now my app throws a validation error if you try to upload the same photo!
No idea how efficient/inefficient this is, so refactoring's welcome!
Thanks!
What about doing an MD5 on the image file? If it is the exact same file, the MD5 hash will be the same for both images.
For anyone else trying to do this: Paperclip now has MD5 hashing built in. If you have an [attachment]_fingerprint column in your model, Paperclip will populate it with the MD5.
Since I already had a column named hash_value, I made a 'virtual' fingerprint attribute:
# Virtual attribute to have Paperclip generate the MD5
def picture_fingerprint
  self.hash_value
end

def picture_fingerprint=(md5_hash)
  self.hash_value = md5_hash
end
And with Rails 3, using sexy_validations, I was able to simply add this to the top of my model to ensure that the hash_value is unique before the model saves:
validates :hash_value, :uniqueness => { :message => "Image has already been uploaded." }
You might run into a problem when your images have amended EXIF metadata. This happened to me, and I had to extract the pixel values and calculate MD5s from them, to ignore changes made by WordPress etc. You can read about it on our blog: http://www.amberbit.com/blog/2013/12/20/similar-images-detection-in-ruby-with-phash/ but essentially you want to get the pixel data out of the image with some tool (like RMagick), concatenate it into a string, and calculate the MD5 of that.
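A rough sketch of that idea with RMagick (the file name is a placeholder):
require 'rmagick'
require 'digest/md5'

# Hash only the pixel data, so EXIF edits don't change the digest
img = Magick::Image.read('photo.jpg').first
pixels = img.export_pixels(0, 0, img.columns, img.rows, 'RGB')
pixel_digest = Digest::MD5.hexdigest(pixels.join(','))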
As Stephen indicated, your biggest issue is how to determine if a file is a duplicate, and there is no clear answer for this.
If these are photos taken with a digital camera, you would want to compare the EXIF data. If the EXIF data matches then the photo is most likely a duplicate. If it is a duplicate then you can inform the user of this. You'll have to accept the upload initially though so that you examine the EXIF data.
I should mention that EXIFR is a nice ruby gem for examining the EXIF data.
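For example, a quick sketch (the require path matches current exifr gem versions):
require 'exifr/jpeg'

exif = EXIFR::JPEG.new('photo.jpg')
if exif.exif?
  puts exif.date_time # when the photo was taken
  puts exif.model     # camera model, handy for duplicate comparison
end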

File upload in Rails- data is an object, how do I return it in my view?

When doing an upload in my Rails project, the database stores
--- !ruby/object:File
content_type: application/octet-stream
original_path: my.numbers
How do I get it to return just my.numbers in my view?
Thanks a bunch!
Marco
ps. I don't want to use attachment_fu or any other plugin preferably.
A file upload is actually received by your controller as a File object, not as data, so it is your responsibility to read it in. Typically uploaded files are saved in a temporary directory and an open filehandle to it is present in the params.
You could do something like the following to retrieve the data:
def create
  # Read the uploaded file's data into the parameter before creating anything
  if params[:model] && params[:model][:file]
    params[:model][:file] = params[:model][:file].read
  end
  @model = MyModel.create(params[:model])
end
You would probably need to be sure that the column in the database can store binary data. In MySQL migrations this is the :binary column type.
You can access the name of the uploaded file with the original_filename method:
params[:model][:file].original_filename
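Putting the two together, a hypothetical sketch (the file_name column is an assumption):
def create
  if params[:model] && params[:model][:file]
    upload = params[:model][:file]
    params[:model][:file]      = upload.read              # raw bytes for a binary column
    params[:model][:file_name] = upload.original_filename # hypothetical string column
  end
  @model = MyModel.create(params[:model])
end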
