Carrierwave: file hash and model id in filename/store_dir

I'm using carrierwave in a Rails 4 project with the file storage for development and testing and the fog storage (for storing on Amazon S3) for production.
I would like to save my files with paths like this:
/model_class_name/part_of_hash/another_part_of_hash/hash-model_id.file_extension
(example: /images/12/34/1234567-89.png where 1234567 is the SHA1 hash of the file content and 89 is the id of the associated image model in the database).
What I tried so far is this:
class MyUploader < CarrierWave::Uploader::Base
  def store_dir
    "#{model.class.name.underscore}/#{sha1_for(file)[0..1]}/#{sha1_for(file)[2..3]}"
  end
  def filename
    "#{sha1_for(file)}-#{model.id}.#{file.extension}" if original_file
  end
  private
  def sha1_for file
    Digest::SHA1.hexdigest file.read
  end
end
This does not work because:
model.id is not available when filename is called
file is not always available when store_dir is called
So, coming to my questions:
is it possible to use model ids/attributes within filename? This link says it should not be done; is there a way to work around it?
is it possible to use file content/attributes within store_dir? I found no documentation on this but my experiences so far say "no" (see above).
how would you implement file/directory naming to get something as close as possible to what I outlined in the beginning?

Including the id in the filename on create may not be possible, since the filename is stored in the database but the id isn't available yet. An (admittedly rather extreme) workaround would be to use a temporary value on create, and then after_commit on: :create, move the file and change the name in the database. It may be possible to optimize this with an after_create, but I'll leave that up to you. (This is where carrierwave actually uploads the file.)
Including file attributes directly within the store_dir isn't possible, since store_dir is used to calculate the url—url would require knowing the sha1, which requires having access to the file, which requires knowing the url, etc. The workaround is pretty obvious: cache the attributes in which you're interested (in this case the sha1) in the model's database record, and use that in the store_dir.
The simpler variant on the id-in-filename approach is to use some other value, such as a uuid, and store that value in the database. There are some notes on that here.

Taavo's answer strictly answers my questions. But I want to quickly detail the final solution I implemented, since it may help someone else, too...
I gave up the idea to use the model id in the filename and replaced it with a random string instead (the whole idea of the model id in the filename was to just ensure that 2 identical files associated with different models end up with different file names; and some random characters ensure that as well).
So I ended up with filenames like filehash-randomstring.extension.
Since carrierwave saves the filename in the model, I realized that I already have the file hash available in the model (in the form of the first part of the filename). So I just used this within store_dir to generate a path in the form model_class_name/file_hash_part/another_file_hash_part.
My final implementation looks like this:
class MyUploader < CarrierWave::Uploader::Base
  def store_dir
    # file name saved on the model. It is in the form:
    # filehash-randomstring.extension, see below...
    filename = model.send(:"#{mounted_as}_identifier")
    "#{model.class.name.underscore}/#{filename[0..1]}/#{filename[3..4]}"
  end
  def filename
    if original_filename
      existing = model.send(:"#{mounted_as}_identifier")
      # reuse the existing file name from the model if present.
      # otherwise, generate a new one (and cache it in an instance variable)
      @generated_filename ||= if existing.present?
        existing
      else
        "#{sha1_for file}-#{SecureRandom.hex(4)}.#{file.extension}"
      end
    end
  end
  private
  def sha1_for file
    Digest::SHA1.hexdigest file.read
  end
end
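The naming scheme itself can be tried outside CarrierWave. The following is a minimal plain-Ruby sketch, assuming two hypothetical helpers (hashed_filename and hashed_store_dir are illustrative names, not CarrierWave methods). It takes characters 0-1 and 2-3 of the hash for the directory segments, as in the path layout from the question, whereas the uploader above slices 3..4 for the second segment.

```ruby
require "digest"
require "securerandom"

# Hypothetical helper (not a CarrierWave API): builds a name of the form
# "filehash-randomstring.extension" from the raw file content.
def hashed_filename(content, extension)
  "#{Digest::SHA1.hexdigest(content)}-#{SecureRandom.hex(4)}.#{extension}"
end

# Hypothetical helper: derives the directory from the stored file name alone,
# so the file content is not needed again once the name has been saved.
def hashed_store_dir(prefix, filename)
  "#{prefix}/#{filename[0..1]}/#{filename[2..3]}"
end

name = hashed_filename("hello", "png")
dir  = hashed_store_dir("image", name)
puts "#{dir}/#{name}"
```

Because the directory is computed from the saved name rather than from the file, it can be rebuilt at any time from the database record, which is the point of the approach described above.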

I came across the same problem recently, where the model.id was not available yet when storing the filename in the DB, upon creation of the uploader record. I found this workaround. I am not sure if it is respecting RESTful principles, I am open to suggestions.
I modified the controller, so that right after the creation of the image, an update_attributes is executed, so that the filename including the now existing model.id value is saved in the DB.
def create
  @uploader = Uploader.new(uploader_params)
  if @uploader.save
    if @uploader.update_attributes(uploader_params)
      render json: @uploader, status: :created
    end
  else
    render json: @uploader.errors, status: :unprocessable_entity
  end
end

Related

Carrierwave, creating a duplicate attachment when duplicating its containing model

I would like to duplicate a model. The original model contains an attachment through Carrierwave. Ideally, a new attachment would be created, that is a copy of the original image, for the new model object.
I have looked through the Carrierwave documentation, and googled this problem, but have not found a solution that creates a new duplicate of the original image. Is this reasonable? Possible?

I don't believe Carrierwave has this option. However, you can make use of the *_remote_url= method to set the new model's picture to be a duplicate of the first.
Here's a brief example
Say I have a model which has_one :photo attached with carrierwave. I can duplicate the model, set its photo to the original's, and save it. Example:
first_model = User.first
duplicate_model = first_model.dup #(where the dup code duplicates everything else you need)
duplicate_model.remote_photo_url = first_model.photo_url
duplicate_model.save
This would then "copy" the photo from the first object into your second as a new carrierwave attachment.

While copy_carrierwave_file is a neat gem, it is not necessary as long as you use local storage.
carrierwave can use local files as source of attachments and you can use this to duplicate the attachment:
first_user = User.first
duplicate_user = first_user.dup
duplicate_user.photo = File.open(first_user.photo.file.file) if first_user.photo.present?
duplicate_user.save
This is more efficient than routing the image twice through your web server.

Try this gem https://github.com/equivalent/copy_carrierwave_file , it handles both local storage and Fog storage
original_resource = User.last
new_resource = User.new
CopyCarrierwaveFile::CopyFileService.new(original_resource, new_resource, :avatar).set_file
new_resource.save
new_resource.avatar.url # https://...image.jpg

For me with CarrierWave 0.10 this works just fine:
user = User.first
dup_user = user.dup
dup_user.photo = user.photo
dup_user.save
Although I'm not sure how this works out when using cloud storage like S3

Extracted from the Carrierwave wiki page:
YourModel.find_each do |ym|
begin
ym.process_your_uploader_upload = true # only if you use carrierwave_backgrounder
ym.your_uploader.cache_stored_file!
ym.your_uploader.retrieve_from_cache!(ym.your_uploader.cache_name)
ym.your_uploader.recreate_versions!(:version1, :version2)
ym.save!
rescue => e
puts "ERROR: YourModel: #{ym.id} -> #{e.to_s}"
end
end

I needed to fully duplicate the whole version set on S3, while some of the versions were cropped.
Unfortunately, remote_#{column}_url= method was of no help, because by the time the versions are recreated, there are no crop params on the model:
I used RailsCasts approach using attr_accessor to crop the avatar, and those params weren't stored in the DB.
After some research and a lot of failures, I found this answer and noticed that copy_to method.
It turned out that both SanitizedFile and Storage::Fog have it, so it's possible to use it for local and S3 files. I didn't, however, investigate exactly how it works and decided to give Carrierwave a chance to take care of it.
class AvatarUploader
  …
  def duplicate_to(target)
    return unless file.present? && target.avatar.file.present?
    versions.keys.each do |version|
      public_send(version).file.copy_to(target.avatar.public_send(version).path)
    end
  end
end
That's all it takes to fully duplicate the images, no matter if they are cropped or not.
There's a catch, however: you should only call duplicate_to after the model is already saved with other avatar, or the target path would be nil. Thus, one useless round of processing takes place for the new record.
new_user.assign_attributes(old_user.slice(:avatar, :avatar_alignment))
# Won't work!
old_user.avatar.duplicate_to(new_user) # => as the `new_user` hasn't persisted yet, its avatar files are Tempfiles
new_user.save # => will recreate the versions from the original image, losing the cropped versions!
# But this works
new_user.save # => the avatar will be stored as a set of versions created from the original (useless processing)
old_user.avatar.duplicate_to(new_user) # => the avatar files will be rewritten by the copies of old_user files
I think it's a good idea to store the crop params somewhere in the DB in a JSON-like object for such cases (and to be protected from losing cropping data when you have to recreate_versions!), but if that's not an option, this solution might be what you seek.
As this thread is the first G-link when searching for carrierwave duplicate, I decided to post this answer exactly here.
Carrierwave 1.3.2 with fog-aws 1.2.0.
Hope this helps someone or the future me!

This worked for me:
user = User.first
dup_user = user.dup
dup_user.photo = user.photo
dup_user.save
Reference: https://codeutility.org/ruby-on-rails-carrierwave-creating-a-duplicate-attachment-when-duplicating-its-containing-model-stack-overflow/

Including .xml file to rails and using it

So I have this currency .xml file:
http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
Now, I am wondering, how can I make my rails application read it? Where do I even have to put it and how do I include it?
I am basically making a currency exchange rate calculator.
And I am going to make the dropdown menu have the currency names from the .xml table appear in it and be usable.

First of all you're going to have to be able to read the file--I assume you want the very latest from that site, so you'll be making an HTTP request (otherwise, just store the file anywhere in your app and read it with File.read with a relative path). Here I use Net::HTTP, but you could use HTTParty or whatever you prefer.
It looks like it changes on a daily basis, so maybe you'll only want to make one HTTP request every day and cache the file somewhere along with a timestamp.
Let's say you have a directory in your application called rates where we store the cached xml files, the heart of the functionality could look like this (kind of clunky but I want the behaviour to be obvious):
require 'net/http'

def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today.to_s}.xml"
  xml_content = if File.exist? today_path
    # Read it from local storage
    File.read today_path
  else
    # Go get it and store it!
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end
  # Now convert that XML to a hash. Lots of ways to do this, but this is very simple xml.
  currency_list = Hash.from_xml(xml_content)["Envelope"]["Cube"]["Cube"]["Cube"]
  # Now currency_list is an Array of hashes e.g. [{"currency"=>"USD", "rate"=>"1.3784"}, ...]
  # Let's say you want a single hash like "USD" => "1.3784", you could do a conversion like this
  Hash[currency_list.map(&:values)]
end
The important part there is Hash.from_xml. Where you have XML that is essentially key/value pairs, this is your friend. For anything more complicated you will want to look for an XML library like Nokogiri. The ["Envelope"]["Cube"]["Cube"]["Cube"] is digging through the hash to get to the important part.
Now, you can see how sensitive this will be to any changes in the XML structure, and you should make the endpoint configurable, and that hash is probably small enough to cache up in memory, but this is the basic idea.
To get your list of currencies out of the hash just say get_rates.keys.
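For the calculator itself, here is a minimal sketch of what could be done with that hash. The sample rates are made up, and convert is a hypothetical helper. It relies on the ECB feed quoting every rate per 1 EUR, so a conversion between two other currencies goes through the euro.

```ruby
# Sample data in the shape described above; the rates are made up.
currency_list = [
  { "currency" => "USD", "rate" => "1.25" },
  { "currency" => "GBP", "rate" => "0.8" }
]

# Same conversion as in get_rates: [["USD", "1.25"], ...] becomes a Hash.
rates = Hash[currency_list.map(&:values)]

# Hypothetical helper: every rate is quoted per 1 EUR, so convert via EUR.
# The feed does not list EUR itself, hence the special case.
def convert(amount, from, to, rates)
  per_eur = ->(code) { code == "EUR" ? 1.0 : Float(rates.fetch(code)) }
  (amount / per_eur.call(from) * per_eur.call(to)).round(2)
end

currencies = ["EUR"] + rates.keys # options for the dropdown menu
usd_to_gbp = convert(100, "USD", "GBP", rates)
puts "100 USD = #{usd_to_gbp} GBP"
```

Note that the rates arrive as strings, so they have to be converted to numbers before any arithmetic.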
As long as you understand what's going on, you can make that smaller:
def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today.to_s}.xml"
  Hash[Hash.from_xml(if File.exist? today_path
    File.read today_path
  else
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end)["Envelope"]["Cube"]["Cube"]["Cube"].map(&:values)]
end
If you do choose to cache the xml you will probably want to automatically clear out old versions of the cached XML file, too. If you want to cache other conversion lists consider a naming scheme derived automatically from the URI, e.g. eurofxref-daily-2013-10-28.xml.
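A rough sketch of that naming scheme and of the clean-up, using only the standard library. Both cache_name_for and prune_cache are hypothetical helper names, and the pruning rule (keep only the file for one given date) is an assumption about what "clearing out old versions" should mean.

```ruby
require "uri"
require "date"

# Hypothetical helper: derives the cache file name from the URI and a date,
# e.g. ".../eurofxref-daily.xml" on 2013-10-28 -> "eurofxref-daily-2013-10-28.xml".
def cache_name_for(uri, date)
  path = URI(uri).path
  ext  = File.extname(path)
  "#{File.basename(path, ext)}-#{date}#{ext}"
end

# Hypothetical helper: deletes cached files of the same feed in `dir`,
# except the one belonging to `keep_date`. Other files are left alone.
def prune_cache(dir, uri, keep_date)
  keep = cache_name_for(uri, keep_date)
  stem = File.basename(URI(uri).path, ".*")
  Dir.children(dir).each do |name|
    next unless name.start_with?("#{stem}-") && name != keep
    File.delete(File.join(dir, name))
  end
end

url = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"
puts cache_name_for(url, Date.new(2013, 10, 28)) # prints "eurofxref-daily-2013-10-28.xml"
```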
Edit: let's say you want to cache the converted xml in memory--why not!
module CurrencyRetrieval
  def get_rates
    if defined?(@@rates_retrieved) && (@@rates_retrieved == Date.today)
      @@rates
    else
      @@rates_retrieved = Date.today
      @@rates = Hash[Hash.from_xml(Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml')["Envelope"]["Cube"]["Cube"]["Cube"].map(&:values)]
    end
  end
end
Now just include CurrencyRetrieval wherever you need it and you're golden. @@rates and @@rates_retrieved are class variables; because they are written inside the module body, they belong to CurrencyRetrieval itself and are shared by every class that includes it. You must test that this persists between calls in your production setup (otherwise fall back to the file-based approach or store those values elsewhere).
Note, if the XML structure changes, or the XML is unavailable today, you'll want to invalidate @@rates and handle exceptions in some nice way...better safe than sorry.
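One possible shape for that invalidation and error handling is sketched below. It deliberately swaps the class variables for instance variables on a small object, and DailyCache is a hypothetical class name. The fetch is passed in as a block, so the HTTP call and the XML parsing stay where they are; the policy of serving the previous day's value when a refresh fails is an assumption, not something the answer prescribes.

```ruby
require "date"

# Hypothetical class: caches the block's result for one calendar day and
# keeps serving the previous value if a later refresh raises.
class DailyCache
  def initialize(&fetcher)
    @fetcher = fetcher
    @value = nil
    @fetched_on = nil
  end

  def get(today = Date.today)
    return @value if @fetched_on == today
    begin
      @value = @fetcher.call
      @fetched_on = today
    rescue StandardError => e
      warn "rates refresh failed: #{e.message}"
      raise if @value.nil? # nothing cached yet, so let the caller decide
    end
    @value
  end
end

calls = 0
cache = DailyCache.new do
  calls += 1
  { "USD" => "1.3784" } # stand-in for the real fetch-and-parse step
end
cache.get(Date.new(2013, 10, 28))
cache.get(Date.new(2013, 10, 28)) # served from memory, no second fetch
```

Since a failed refresh does not update the stored date, the next call simply tries again.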

Using carrierwave to upload one image to multiple storage location

I would like to be able to upload one image into two different locations: one location would be on the local filesystem (of the server) and the other would be Amazon S3 (the Amazon S3 location would be optional).
My current environment is Rails 3.2.8, Ruby 1.9.3, with Carrierwave used for uploading the file.
I've had some success using the following method:
Model
class Image < ActiveRecord::Base
  attr_accessor :remote
  before_save :configure_for_remote
  mount_uploader :image, ImageUploader #stores images locally
  mount_uploader :image_remote, ImageRemoteUploader #store images on S3
  def configure_for_remote
    if self.remote=="1"
      self.image_remote = self.image.dup
    end
  end
end
Relevant view form fields (simple form syntax)
<p><%= f.input :image, as: :file %></p>
<p><%= f.input :remote, as: :boolean %></p>
The user checks the "remote" checkbox in the form and chooses the image to upload. The before_save callback stores a duplicate of image into image_remote, the file is processed by their respective uploaders, and I have my desired result.
However, I'm starting to run into problems when I want to update that field. For example, the user first uploads the file locally and not to S3 (does not check the remote checkbox), then later comes back to the form and checks the remote checkbox. In this case, the before_save callback does not get run because no real Active Record column has been changed (only the remote flag). I've tried to use before_validation, but this fails to work (the image_remote uploader stores the proper filename in the image_remote column, but the image does not get uploaded to S3). Obviously something is changing between before_validation and before_save (is the image attribute being converted to an uploader?), but I can't seem to figure out why this doesn't work.
With all this being said, I think my approach with using dup is a bit of a hack, and I'm hoping someone can advise me in a more elegant way of reaching my goal.
Thanks for your help.

I was able to solve this, although I'm still not sure if it's the most elegant solution.
First off, I mentioned in my question that when I registered config_for_remote_upload with the before_validation callback, the file was not uploaded to S3, but the image_remote column was populated. Upon further inspection, the situation is even worse. When initializing the image_remote uploader within the before_validation callback, all files were deleted on the S3 storage bucket! I replicated this a couple of times. I only tested with store_dir set to nil in the uploader, which puts the files at the root of the bucket.
Initializing the image_remote column during the before_save callback does not have this problem. In order to force the record to save (it wouldn't save, because only a non-db attribute was being changed), I added a before_validation that changes the updated_at field of the record.
before_validation :change_record_updated_at
...
def change_record_updated_at
  self.updated_at = Time.current
end
I also moved away from using dup, not because it didn't work, but rather because I didn't know why it worked. Instead I created a StringIO object for the file and assigned that to the image_remote column.
def config_for_remote_upload
if self.remote.to_i==1
#self.image_remote = self.image.dup
#this will open the file as binary
img_binary = File.open(self.image.file.path){ |i| i.read }
img_encoded = Base64.encode64(img_binary)
io = FilelessIO.new(Base64.decode64(img_encoded))
io.original_filename = self.image.file.original_filename
self.image_remote = io
elsif self.remote.to_i==0
#delete remote image and clear field
self.remove_image_remote = true
end
end
See here for further info on FilelessIO (StringIO with original_filename).
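The linked page is not reproduced here, so the following is only a sketch of the usual shape of such a class, assuming all it needs is the original_filename accessor that the callback above sets on the object.

```ruby
require "stringio"

# StringIO has no original_filename, which the callback above assigns,
# so this subclass adds it as a plain accessor.
class FilelessIO < StringIO
  attr_accessor :original_filename
end

io = FilelessIO.new("binary image data".b)
io.original_filename = "photo.png"
puts "#{io.original_filename}: #{io.size} bytes"
```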
With this configuration, the file can be uploaded to the second storage location (S3 in my case) after the initial upload.
Hope this helps someone else out.

What is the best way to obfuscate numerical IDs in an application

Given I've got a site where most of the resources have numerical IDs (i.e. user.id question.id etc.) but that like the Germans looking back on WWII I'd rather not reveal these to the observers, what's the best way to obfuscate them?
I presume the method is going to involve the .to_param and then some symmetric encryption algorithm but I'm not sure what's the most efficient encryption to do and how it'll impact lookup times in the DB etc.
Any advice from the road trodden would be much appreciated.

I published a Rails plugin that does this called obfuscate_id. I didn't need it to be secure, but just to make the id in the url non-obvious to the casual user. I also wanted it to look cleaner than a long hash.
It also has the advantage of needing no migrations or database changes. It's pretty simple.
Just add the gem to your Gemfile:
gem 'obfuscate_id'
And call obfuscate_id in your model:
class Post < ActiveRecord::Base
obfuscate_id
end
This will create urls like this:
# post 7000
http://example.com/posts/5270192353
# post 7001
http://example.com/posts/7107163820
# post 7002
http://example.com/posts/3296163828
You also don't need to look up the records in any special way, ActiveRecord find just works.
Post.find(params[:id])
More information here:
https://github.com/namick/obfuscate_id

I usually use a salted Hash and store it in the DB in an indexed field. It depends on the level of security you expect, but I use one salt for all.
This method makes the creation a bit more expensive, because you are going to have an INSERT and an UPDATE, but your lookups will be quite fast.
Pseudo code:
class MyModel < ActiveRecord::Base
  MY_SALT = 'some secret string'
  after_create :generate_hashed_id
  def to_param
    self.hashed_id
  end
  def generate_hashed_id
    self.update_attributes(:hashed_id => Digest::SHA1.hexdigest("--#{MY_SALT}--#{self.id}--"))
  end
end
Now you can look up the record with MyModel.find_by_hashed_id(params[:id]) without any performance repercussions.
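The digest step can be checked in isolation. This is a minimal sketch in plain Ruby, where a Hash stands in for the indexed hashed_id column and hashed_id is a hypothetical function name; the salt value is the placeholder from the pseudo code.

```ruby
require "digest"

SALT = "some secret string" # assumption: a real app would load this from configuration

# Same digest construction as generate_hashed_id above, as a plain function.
def hashed_id(id, salt = SALT)
  Digest::SHA1.hexdigest("--#{salt}--#{id}--")
end

# A Hash stands in for the indexed hashed_id column: digest -> numeric id.
index = (1..3).to_h { |id| [hashed_id(id), id] }

puts hashed_id(1).length # SHA1 hex digests are always 40 characters long
```

The digest is deterministic for a given salt and id, which is why it can be stored once and then used for lookups.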

Here's a solution. It's the same concept as Wukerplank's answer, but there's a couple of important differences.
1) There's no need to insert the record then update it. Just set the uuid before inserting by using the before_create callback. Also note the set_uuid callback is private.
2) There's a handy library called SecureRandom. Use it! I like to use uuid's, but SecureRandom can generate other types of random numbers as well.
3) To find the record use User.find_by_uuid!(params[:id]). Notice the "!". That will raise an error if the record is not found just like User.find(params[:id]) would.
class User < ActiveRecord::Base
  before_create :set_uuid
  def to_param
    uuid
  end
  private
  def set_uuid
    self.uuid = SecureRandom.uuid
  end
end
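For reference, a short sketch of what SecureRandom produces, in case a UUID is longer than you want in a URL. The variable names are only illustrative.

```ruby
require "securerandom"

uuid  = SecureRandom.uuid           # random (version 4) UUID, 36 characters
token = SecureRandom.urlsafe_base64 # 16 random bytes, URL-safe, unpadded by default
short = SecureRandom.hex(4)         # 4 random bytes as 8 hex characters

puts uuid, token, short
```

Shorter values collide more easily, so a unique index on the column is worth having whichever form you pick.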

Hashids is a great cross-platform option.

You can try using this gem,
https://github.com/wbasmayor/masked_id
It obfuscates your id while giving each model its own obfuscated code, so the records with id 1 in different models won't all have the same hash. Also, it does not override anything on the Rails side; it just provides new methods, so it doesn't mess up your Rails if you're also extending them.

Faced with a similar problem, I created a gem to handle the obfuscation of Model ids using Blowfish. This allows the creation of nice 11 character obfuscated ids on the fly. The caveat is, the id must be within 99,999,999, e.g. a max length of 8.
https://github.com/mguymon/obfuscate
To use with Rails, create an initializer in config/initializers with:
require 'obfuscate/obfuscatable'
Obfuscate.setup do |config|
config.salt = "A weak salt ..."
end
Now add to models that you want to be Obfuscatable:
class Message < ActiveRecord::Base
obfuscatable # a hash of config overrides can be passed.
end
To get the 11 character obfuscated_id, which uses the Blowfish single block encryption:
message = Message.find(1)
obfuscated = message.obfuscated_id # "NuwhZTtHnko"
clarified = message.clarify_id( obfuscated ) # "1"
Message.find_by_obfuscated_id( obfuscated )
Or obfuscate a block of text using Blowfish string encryption, allowing longer blocks of text to be obfuscated:
obfuscated = message.obfuscate( "if you use your imagination, this is a long block of text" ) # "GoxjVCCuBQgaLvttm7mXNEN9U6A_xxBjM3CYWBrsWs640PVXmkuypo7S8rBHEv_z1jP3hhFqQzlI9L1s2DTQ6FYZwfop-xlA"
clarified = message.clarify( obfuscated ) # "if you use your imagination, this is a long block of text"

Rails non-image file upload to DB without using server-side temp files?

I'm looking into the feasibility of adding a function to my Rails-based intranet site that allows users to upload files.
Two purposes:
My users are widely distributed geographically and linking to documents on the shared network storage doesn't always work (different addresses, DNS entries and stuff outside my control or interest) so I'm thinking about providing a database-oriented alternative.
We have a number of files from which we parse data at the client end. I'd rather like to be able to push that up to the server.
I've looked at attachment_fu, Paperclip and another one (forgotten the name!) all of which seem very image-oriented, although attachment_fu at least can work without a image processing library present, thank goodness.
The big problem is that my server does not permit my application to write files locally, and these plugins all seem to want to create a Tempfile.
The questions (finally!)
Is there a reasonable way to upload binary data and process it in memory and/or store it as a BLOB without any server-side file saves?
Or should I give up on the file distribution idea and give the users a second-best option of copy-and-paste text boxes where possible?
(Closest I could find on SO was this which doesn't really help)

You could read the data from the params object, and write it straight to your model.
For example, you could have a form like this.
<% form_for :upload, :url => {:action=>:upload}, :html=>{:multipart=>true} do |f| %>
<%= f.file_field :file %>
<%= f.submit 'Upload' %>
<% end %>
Then you can easily get the original filename and the binary data.
class TestController < ApplicationController
  def upload
    file_param = params[:upload][:file]
    filename = file_param.original_filename
    filedata = file_param.read
    @data = UploadedFile.create(:name => filename, :data => filedata)
    render :text => "created #{@data.id}"
  end
end
Of course your model needs to have the proper columns.
class CreateUploadedFiles < ActiveRecord::Migration
def self.up
create_table :uploaded_files do |t|
t.string :name
t.binary :data
t.timestamps
end
end
def self.down
drop_table :uploaded_files
end
end
Hope this helps!

The big problem is that my server does not permit my application to write files locally, and these plugins all seem to want to create a Tempfile.
Yes it does, or you wouldn't be able to upload the files at all.
Rails itself creates tempfiles if the uploaded file is larger than 15k or so.
<%= f.file_field :file %>
....
file_param = params[:upload][:file]
As soon as you upload something bigger than 15k, params[:upload][:file] is going to be an ActionController::UploadedTempFile.
What's the difference? Rails is likely writing its tempfiles to the global temp directory (which everyone can write to) but the plugins are probably trying to write to RAILS_ROOT/tmp, which your server disallows. The good news is you can just configure those things to use a different temp dir so they can write their tempfiles, and it should all work.
For example, attachment_fu's default temp path is under the Rails root. You should be able to change it like this:
Technoweenie::AttachmentFu.tempfile_path = Dir::tmpdir
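To check that your application can in fact write to that directory, a quick sketch with the standard library alone (nothing here is specific to attachment_fu) could look like this:

```ruby
require "tmpdir"
require "tempfile"

# Dir.tmpdir is the system-wide temp directory (it honours TMPDIR and similar variables).
tmp_dir = Dir.tmpdir

# With a block, Tempfile.create removes the file again when the block returns.
written_to = Tempfile.create("upload", tmp_dir) do |f|
  f.binmode
  f.write("uploaded bytes")
  f.flush
  File.dirname(f.path)
end

puts "tempfile was written under #{written_to}"
```

If this raises a permission error, the server really does block all local writes, and reading the upload straight from params is the remaining option.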
PS: pulling the file data straight out of the params and putting it into the database may still be the best way to go. I personally dislike attachment_fu and its ilk, as they try to do too many things, but either way, it's very useful to know how the whole uploaded file/temp file thing works in Rails :-)

This HowTo for Rails includes a section (near the end of the page) on how to upload directly to the database. That section is sort of messed up, but the gist of it is that you just read the uploaded file contents into your BLOB field on your ActiveRecord object and save as normal. Since I don't know how you use the file inside your application, I can't really give any advice on how to use it from the database, though there is also a section on downloading from the DB in the HowTo.
It may be easier just to see if you can get permission to write to a single directory, perhaps inside your web app folder, on the server.

So this code in my controller:
def upload
file_content = params[:upload][:file]
render :text => [
[:original_path, :content_type, :local_path, :path, :original_filename].collect {|m| file_content.send(m)},
file_content.class,
file_content.size.to_s].flatten.join("<br/>")
end
gives this for a smaller file:
b_wib.xls
application/vnd.ms-excel
b_wib.xls
ActionController::UploadedStringIO
13824
and this for a larger one:
a_wib.xls
application/vnd.ms-excel
/tmp/CGI.10029.1
/tmp/CGI.10029.1
a_wib.xls
ActionController::UploadedTempfile
27648
...which is exactly as Orion described.

For anyone else reading this: while just saving the File/IO in params to the database is a nice solution (why complicate matters?), Paperclip, and I would suspect attachment_fu, are not image-specific. Since uploading and resizing images is very common, Paperclip comes bundled with a processor to resize images, but it is not enabled by default, and you can easily add your own processors.
