How to cache files over 1MB with rack/cache on Heroku? - ruby-on-rails

I'm using a combination of Dragonfly and rack/cache hosted on Heroku.
I'm using Dragonfly for uploaded assets. Thumbnails are processed on-the-fly and stored in rack/cache for fast delivery from memcached (via the Memcachier addon).
Regular static assets are also cached in memcached via rack/cache.
My problem is that any uploaded files over 1MB are causing a 500 error in my application.
2013-07-15T10:38:27.040992+00:00 app[web.1]: DalliError: Value too large, memcached can only store 1048576 bytes per key [key: d49c36d5db74ef45e957cf169a0b27b83b9e84de, size: 1502314]
2013-07-15T10:38:27.052255+00:00 app[web.1]: cache: [GET /media/BAhbBlsHOgZmSSIdNTA3Njk3ZWFiODBmNDEwMDEzMDAzNjA4BjoGRVQ/WTW_A5Flyer_HealthcareMedicalObsGynae_WEB.pdf] miss, store
2013-07-15T10:38:27.060583+00:00 app[web.1]: !! Unexpected error while processing request: undefined method `each' for nil:NilClass
Memcache has a limit of 1MB, so I can understand why my asset was not cached, but I would rather it didn't break serving assets.
I'm not even sure where this error is coming from. Presumably from one of the other rack middlewares?
Increasing the maximum value size doesn't seem to have any effect:
config.cache_store = :dalli_store, ENV["MEMCACHIER_SERVERS"].split(","), {
  :username => ENV["MEMCACHIER_USERNAME"],
  :password => ENV["MEMCACHIER_PASSWORD"],
  :value_max_bytes => 5242880 # 5MB
}
Long term, I know that moving this sort of asset off of Heroku is a sensible move, but that won't be a quick job.
What can I do to serve these assets on Heroku in the meantime without errors?

Contrary to @jordelver's experience, I find that setting Dalli's :value_max_bytes option does work. I'm setting up Rack::Cache in a slightly different way, which may be what makes the difference.
This is what my production.rb contains to configure Rack::Cache:
client = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"],
                           :username => ENV["MEMCACHIER_USERNAME"],
                           :password => ENV["MEMCACHIER_PASSWORD"],
                           :value_max_bytes => 10485760)

config.action_dispatch.rack_cache = {
  :metastore => client,
  :entitystore => client
}
config.static_cache_control = "public, max-age=2592000"
With the above, some errors will be printed to the logs for values over 1MB, but they won't cause a 5xx error for the client, just a cache miss.
P.S. I work for MemCachier :) so we're interested in sorting this out. Please let me know if it works.

I had the same issue as @jordelver and managed to get around MemCachier's limit by monkey patching Dragonfly::Response:
module Dragonfly
  class Response
    private

    def cache_headers
      if job.size > 1048576
        {
          "Cache-Control" => "no-cache, no-store",
          "Pragma" => "no-cache"
        }
      else
        {
          "Cache-Control" => "public, max-age=31536000", # (1 year)
          "ETag" => %("#{job.signature}")
        }
      end
    end
  end
end
Essentially, if the size is over 1048576 bytes, send a no-cache header.

My application.js was too big for rack-cache so I did:
# in config/environments/development.rb
config.action_dispatch.rack_cache = {
  metastore: 'file:/var/cache/rack/meta',
  entitystore: 'file:tmp/cache/rack/body'
}
And it works!
With this config, both the cache metadata and the actual file bodies are stored on the filesystem rather than in memcached, so large files no longer hit the 1MB limit.
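If you'd rather keep the small metadata entries in memcached and only move the large bodies to disk, a variation along these lines should also work (a sketch reusing the Dalli client setup from the earlier answer; the paths are illustrative):
client = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"],
                           :username => ENV["MEMCACHIER_USERNAME"],
                           :password => ENV["MEMCACHIER_PASSWORD"])
config.action_dispatch.rack_cache = {
  metastore: client,                       # small metadata entries fit comfortably in memcached
  entitystore: 'file:tmp/cache/rack/body'  # large response bodies stay on disk
}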

Related

Rails cached value lost/nil despite expires_in 24.hours

I am using ruby 2.3.3 and Rails 4.2.8 with Puma (1 worker, 5 threads) and on my admin (i.e. not critical) page I want to show some stats (integer values) from my database. Some requests take quite a long time to perform so I decided to cache these values and use a rake task to re-write them every day.
Admin#index controller
require 'timeout'
begin
  timeout(8) do
    @products_a = Rails.cache.fetch('products_a', :expires_in => 24.hours) { Product.where(heavy_condition_a).size }
    @products_b = Rails.cache.fetch('products_b', :expires_in => 24.hours) { Product.where(heavy_condition_b).size }
    @products_c = Rails.cache.fetch('products_c', :expires_in => 24.hours) { Product.where(heavy_condition_c).size }
    @products_d = Rails.cache.fetch('products_d', :expires_in => 24.hours) { Product.where(heavy_condition_d).size }
  end
rescue Timeout::Error
  @products_a = 999
  @products_b = 999
  @products_c = 999
  @products_d = 999
end
Admin#index view
<li>Products A: <%= @products_a %></li>
<li>Products B: <%= @products_b %></li>
<li>Products C: <%= @products_c %></li>
<li>Products D: <%= @products_d %></li>
Rake task
task :set_admin_index_stats => :environment do
  Rails.cache.write('products_a', Product.where(heavy_condition_a).size, :expires_in => 24.hours)
  Rails.cache.write('products_b', Product.where(heavy_condition_b).size, :expires_in => 24.hours)
  Rails.cache.write('products_c', Product.where(heavy_condition_c).size, :expires_in => 24.hours)
  Rails.cache.write('products_d', Product.where(heavy_condition_d).size, :expires_in => 24.hours)
end
I am using this in production and use Memcachier (on Heroku) as a cache store. I also use it for page caching on the website and it works fine there. I have:
production.rb
config.cache_store = :dalli_store
The problem I am experiencing is that the cached values disappear almost instantly, and quite intermittently, from the cache. In the console I have tried:
If I Rails.cache.write one value (e.g. products_a) and check it a minute later, it is still there. Although crude, I can see "Set cmds" increment by one in the MemCachier admin tool.
However, when I add the next value (e.g. products_b), the first one disappears (becomes nil)! Sometimes if I add all 4 values, 2 seem to stick. It is not always the same values that survive. It is like whack-a-mole!
If I run the rake task to write the values and then try to read them, typically only two values are left and the others are lost.
I have seen a similar question where the explanation was a multithreaded server: the cached value was saved in one thread and could not be reached from another, and the solution was to use memcached, which I already do.
It is not only the console. If I just reload admin#index view to store the values or run the rake task, I experience the same problem. The values do not stick.
My suspicion is that I am either not using the Rails.cache commands properly or that these commands do not in fact use MemCachier. I have not been able to determine whether my values are actually stored in MemCachier, but when I use my first command in the console I do get the following:
Rails.cache.read('products_a')
Dalli::Server#connect mc1.dev.eu.ec2.memcachier.com:11211
Dalli/SASL authenticating as abc123
Dalli/SASL authenticating as abc123
Dalli/SASL: abc123
Dalli/SASL: abc123
=> 0
but I do not get that output for subsequent writes (which I assume is a matter of console verbosity rather than proof that MemCachier is not being used).
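One way to check whether values are actually reaching MemCachier, bypassing Rails.cache entirely, would be to talk to Dalli directly from the console (a rough sketch; the key name is made up):
client = Dalli::Client.new(ENV["MEMCACHIER_SERVERS"],
                           :username => ENV["MEMCACHIER_USERNAME"],
                           :password => ENV["MEMCACHIER_PASSWORD"])
client.set('probe_key', 42)  # write straight to memcached
client.get('probe_key')      # => 42 if the round trip works
client.stats                 # per-server hash of counters (cmd_set, get_hits, ...)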
What am I missing here? Why won't the values stick in my cache?
The Heroku Dev Center suggests a slightly different cache config and advises threaded Rails app servers like Puma to use the connection_pool gem (the :pool_size option below):
config.cache_store = :mem_cache_store,
                     (ENV["MEMCACHIER_SERVERS"] || "").split(","),
                     { :username => ENV["MEMCACHIER_USERNAME"],
                       :password => ENV["MEMCACHIER_PASSWORD"],
                       :failover => true,
                       :socket_timeout => 1.5,
                       :socket_failure_delay => 0.2,
                       :down_retry_delay => 60,
                       :pool_size => 5
                     }

AWS S3 file uploads returning 403:Forbidden with Carrierwave/fog gems in Rails

I've been scratching my head over this for several days. Currently working through Chapter 13 (user microposts) of the Rails Tutorial, and while my app works fine in development, I cannot seem to get the image uploads to AWS S3 working in production (details here in the tutorial). The app utilizes AWS S3 buckets for storage and the CarrierWave/fog gems for file upload.
When I open my production heroku app, everything works fine except the image upload. When I try to upload an image on a new micropost, I get a generic 'We're sorry, but something went wrong.' error in the browser. Heroku logs show a :status 403 error "Forbidden" when trying to reach the bucket (detailed logs below).
Others seem to have had similar problems. In most cases the solution is to set proper IAM user permissions or the S3 bucket policy, but I am fairly confident I have those set up correctly for two reasons:
I am able to use the web GUI on that user account to upload and manage files in the bucket, so I know it has access.
I installed the aws command line tool, configured it with the same user access key and secret key used in my web app, and I am able to upload and retrieve information from the bucket through the CLI.
Still I have tried several bucket policies and user permissions, including the Amazon S3 full access permission (I believe this is the most general), and several more specific versions (see latest below).
Other things I have tried that seem to work for others:
Updating CORS configuration for the bucket (although honestly I do not fully understand CORS or if it is necessary for accessing buckets. Again I started with a general config, not a specific one, and still can't get it to work).
Using different gems for storage, such as carrierwave-aws in lieu of fog.
Setting fog's config.delete_tmp_file_after_storage = false -- two examples of success for this: 1 | 2
Being a new developer, I feel I don't have the technical sophistication yet to diagnose this one, and frankly it is a bit discouraging, so I am hoping for help. Here are some questions I have and possible lines of inquiry I would like to look further into:
Question: If I am able to access the bucket with the aws-cli, shouldn't I also be able to use heroku with carrierwave/fog using the same credentials, or am I misunderstanding something about how heroku reaches for the S3 bucket?
Possible issue: Carrierwave or ImageMagick creates a temp file when the image is uploaded. In some cases it seems that may interfere with the bucket upload. Someone gave a vague answer about this here, but I do not understand what action might help get rid of that problem.
Possible issue: Someone seemed to have issues with Rails strong params while doing this. I could not quite understand how to debug this... When I use byebug to open an interactive console in the server at the error location (in my microposts_controller #create method), right after the file upload, the image is in the params hash, but the :picture key of the instance variable @micropost is nil (at the same time, the :content key is not nil; it contains whatever text I submitted with the picture, as it should).
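For reference, the strong parameters whitelist from the tutorial is supposed to permit :picture alongside :content; a sketch of what that method looks like (names follow the Rails Tutorial, so treat this as an assumption about the actual code):
# app/controllers/microposts_controller.rb (sketch based on the Rails Tutorial)
def micropost_params
  # If :picture is missing from this list, @micropost.picture comes back nil
  # even though the uploaded file shows up in the raw params hash.
  params.require(:micropost).permit(:content, :picture)
end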
Link to the rest of the code on GitHub if it helps.
Sorry for the long-windedness. Any guidance would be greatly appreciated.
Errors:
2019-01-07T08:11:37.684069+00:00 app[web.1]: F, [2019-01-07T08:11:37.683926 #22] FATAL -- : [363c5115-f872-43b4-857a-10d1d7d11737] Excon::Error::Forbidden (Expected(200) <=> Actual(403 Forbidden)
2019-01-07T08:11:37.684074+00:00 app[web.1]: excon.error.response
2019-01-07T08:11:37.684077+00:00 app[web.1]: :body => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>***</RequestId><HostId>***=</HostId></Error>"
2019-01-07T08:11:37.684079+00:00 app[web.1]: :cookies => [
2019-01-07T08:11:37.684081+00:00 app[web.1]: ]
2019-01-07T08:11:37.684082+00:00 app[web.1]: :headers => {
2019-01-07T08:11:37.684085+00:00 app[web.1]: "Connection" => "close"
2019-01-07T08:11:37.684086+00:00 app[web.1]: "Content-Type" => "application/xml"
2019-01-07T08:11:37.684088+00:00 app[web.1]: "Date" => "Mon, 07 Jan 2019 08:11:36 GMT"
2019-01-07T08:11:37.684090+00:00 app[web.1]: "Server" => "AmazonS3"
2019-01-07T08:11:37.684092+00:00 app[web.1]: "x-amz-id-2" => "***"
2019-01-07T08:11:37.684094+00:00 app[web.1]: "x-amz-request-id" => "***"
2019-01-07T08:11:37.684096+00:00 app[web.1]: }
2019-01-07T08:11:37.684098+00:00 app[web.1]: :host => "bucket-name.s3-us-west-1.amazonaws.com"
2019-01-07T08:11:37.684099+00:00 app[web.1]: :local_address => "*********"
2019-01-07T08:11:37.684101+00:00 app[web.1]: :local_port => ******
2019-01-07T08:11:37.684103+00:00 app[web.1]: :path => "/uploads/micropost/picture/306/ocean2.jpeg"
2019-01-07T08:11:37.684104+00:00 app[web.1]: :port => 443
2019-01-07T08:11:37.684106+00:00 app[web.1]: :reason_phrase => "Forbidden"
2019-01-07T08:11:37.684108+00:00 app[web.1]: :remote_ip => "*******"
2019-01-07T08:11:37.684110+00:00 app[web.1]: :status => 403
2019-01-07T08:11:37.684111+00:00 app[web.1]: :status_line => "HTTP/1.1 403 Forbidden\r\n"
2019-01-07T08:11:37.684113+00:00 app[web.1]: ):
2019-01-07T08:11:37.684217+00:00 app[web.1]: F, [2019-01-07T08:11:37.684147 #22] FATAL -- : [363c5115-f872-43b4-857a-10d1d7d11737]
2019-01-07T08:11:37.684392+00:00 app[web.1]: F, [2019-01-07T08:11:37.684328 #22] FATAL -- : [363c5115-f872-43b4-857a-10d1d7d11737] app/controllers/microposts_controller.rb:7:in `create'
Bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::***********:user/user-name"
      },
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name/*",
        "arn:aws:s3:::bucket-name"
      ]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::*******:user/user-name"
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::bucket-name",
      "Condition": {}
    }
  ]
}
CORS:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
Current Initializer: carrier_wave.rb
if Rails.env.production?
  CarrierWave.configure do |config|
    config.fog_credentials = {
      # Configuration for Amazon S3
      :provider => 'AWS',
      :aws_access_key_id => ENV['S3_ACCESS_KEY'],
      :aws_secret_access_key => ENV['S3_SECRET_KEY'],
      :region => ENV['S3_REGION']
    }
    config.fog_directory = ENV['S3_BUCKET']
  end
end
picture_uploader.rb
class PictureUploader < CarrierWave::Uploader::Base
  include CarrierWave::MiniMagick

  process resize_to_limit: [400, 400]

  if Rails.env.production?
    storage :fog
  else
    storage :file
  end

  # Override the directory where uploaded files will be stored.
  # This is a sensible default for uploaders that are meant to be mounted:
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end

  # Add a white list of extensions which are allowed to be uploaded.
  def extension_whitelist
    %w(jpg jpeg gif png)
  end
end
Just had this problem and found out that for private buckets with CarrierWave and fog you need to add config.fog_public = false to your CarrierWave.configure block.
After setting fog_public to false, the returned URLs will be signed as well.
Docs: https://www.rubydoc.info/gems/carrierwave/CarrierWave/Storage/Fog
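Applied to the initializer from the question, that would look roughly like this (only the fog_public line is new; everything else is unchanged):
CarrierWave.configure do |config|
  config.fog_credentials = {
    :provider => 'AWS',
    :aws_access_key_id => ENV['S3_ACCESS_KEY'],
    :aws_secret_access_key => ENV['S3_SECRET_KEY'],
    :region => ENV['S3_REGION']
  }
  config.fog_directory = ENV['S3_BUCKET']
  config.fog_public = false # serve objects through signed URLs instead of public ACLs
end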
All the tutorials I saw on uploading a file to S3 with CarrierWave/fog had public access enabled on their buckets. I had not, for obvious reasons, and so needed config.aws_acl = :private in my carrierwave.rb config file. I think this will help others who face the same problem after following online tutorials.
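For anyone on the carrierwave-aws gem rather than fog, the equivalent spot would look something like this (a sketch; the option names come from carrierwave-aws, and the ENV keys mirror the question's initializer):
CarrierWave.configure do |config|
  config.storage    = :aws
  config.aws_bucket = ENV['S3_BUCKET']
  config.aws_acl    = :private # keep objects private; returned URLs are signed
  config.aws_credentials = {
    access_key_id:     ENV['S3_ACCESS_KEY'],
    secret_access_key: ENV['S3_SECRET_KEY'],
    region:            ENV['S3_REGION']
  }
end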

Setting public_file_server.headers except for some files

I use this in production.rb:
config.public_file_server.headers = {
  'Cache-Control' => 'public, s-maxage=31536000, maxage=31536000',
  'Expires' => "#{1.year.from_now.to_formatted_s(:rfc822)}"
}
I serve public files through cdn.mydomain.com, which reads from www.mydomain.com and copies the Cache-Control headers I set with public_file_server.headers.
The issue is that I want some files in /public, for example my service-worker.js, not to have those cache-control headers.
Is there a way to set those cache-control headers for only one folder in /public, for example?
The other solution would be to remove the public_file_server.headers configuration and set the cache control at the CDN level (I use cdn.mydomain.com/publicfile), keeping www.mydomain.com/serviceworker without cache control for the service worker.
But maybe there is a way to configure this at the Rails level?
I had exactly the same problem: a PWA built with Rails, served through a CDN (CloudFront). For the assets I want cache headers with far-future expires, but the ServiceWorker needs Cache-Control: no-cache.
Because CloudFront doesn't let you add or change headers by itself, I needed a solution at the app level. After some research I found one in a blog post: set the headers via public_file_server.headers and add a middleware that overrides them for the ServiceWorker file.
Also, you wrote maxage=, it should be max-age=.
Here is the code I use:
production.rb:
config.public_file_server.enabled = ENV['RAILS_SERVE_STATIC_FILES'].present?
config.public_file_server.headers = {
  'Cache-Control' => 'public, s-maxage=31536000, max-age=15552000',
  'Expires' => 1.year.from_now.to_formatted_s(:rfc822)
}

if ENV['RAILS_SERVE_STATIC_FILES'].present?
  config.middleware.insert_before ActionDispatch::Static, ServiceWorkerManager, ['sw.js']
end
app/middleware/service_worker_manager.rb:
# Taken from https://codeburst.io/service-workers-rails-middleware-841d0194144d
#
class ServiceWorkerManager
  # We’ll pass 'service_workers' when we register this middleware.
  def initialize(app, service_workers)
    @app = app
    @service_workers = service_workers
  end

  def call(env)
    # Let the next middleware classes & app do their thing first…
    status, headers, response = @app.call(env)
    dont_cache = @service_workers.any? { |worker_name| env['REQUEST_PATH'].include?(worker_name) }

    # …and modify the response if a service worker was fetched.
    if dont_cache
      headers['Cache-Control'] = 'no-cache'
      headers.except!('Expires')
    end

    [status, headers, response]
  end
end

Warning! ActionDispatch::Session::CacheStore failed to save session. Content dropped. in logfile

I'm running a Rails 4 application and I'm seeing this in my production logs on Heroku. I see it frequently, and some users report getting logged out, but I don't see anything in the logs that would indicate why it's happening. I don't see any information from Dalli that it cannot save. For most users it works just fine. Any tips on how to hunt this down?
Error Message in Production Logs
Warning! ActionDispatch::Session::CacheStore failed to save session. Content dropped.
My configuration in production
config.cache_store = :dalli_store,
                     (ENV["MEMCACHIER_SERVERS"] || "").split(","),
                     { :username => ENV["MEMCACHIER_USERNAME"],
                       :password => ENV["MEMCACHIER_PASSWORD"],
                       :failover => true,
                       :socket_timeout => 1.5,
                       :socket_failure_delay => 0.2
                     }
config.session_store = :mem_cache_store
I found the problem that was causing this error. I'm using a custom header to set the session ID, and the client was sending nil, which meant Rack could not store the session under a nil ID. I wrapped the call in a blank? check to ensure the ID is only set when there is a valid value, and that fixed my issue.
if !request.headers["Session"].blank?
  request.session_options[:id] = request.headers["Session"]
end

Zip up all Paperclip attachments stored on S3

Paperclip is a great upload plugin for Rails. Storing uploads on the local filesystem or Amazon S3 seems to work well. I'd just as soon store files on the local filesystem, but S3 is required for this app as it will be hosted on Heroku.
How would I go about getting all of my uploads/attachments from S3 in a single zipped download?
Getting a zip of files from the local filesystem seems straightforward. It's getting the files from S3 that has me puzzled. I think it may have something to do with the way that rubyzip handles files referenced by URL. I've tried various approaches but can't seem to avoid errors.
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
                 Time.now.to_f.to_s <<
                 ".zip"
  # rubyzip gem version 0.9.1
  # rdoc http://rubyzip.sourceforge.net/
  Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE) do |zip|
    # get all of the attachments
    # attempt to get files stored on S3
    # FAIL
    registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.url(:original, false)) }
    # => No such file or directory - http://s3.amazonaws.com/bucket/original/abstract.txt
    # Should note that these files in the S3 bucket are publicly accessible. No ACL.

    # works with local storage. Thanks to Henrik Nyh
    # registrations_with_attachments.each { |e| zip.add("abstracts/#{e.abstract.original_filename}", e.abstract.path(:original)) }
  end
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}
You almost certainly want to use e.abstract.to_file.path instead of e.abstract.url(...).
See:
Paperclip::Storage::S3::to_file (should return a TempFile)
TempFile::path
UPDATE
From the changelog:
New in 3.0.1:
API CHANGE: #to_file has been removed. Use the #copy_to_local_file method instead.
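With Paperclip 3.0.1 and later, a rough equivalent using #copy_to_local_file could look like this (an untested sketch; the variable names are borrowed from the question's code):
# Untested sketch for Paperclip >= 3.0.1, where #to_file no longer exists.
registrations_with_attachments.each do |e|
  tempfile = Tempfile.new('abstract')
  e.abstract.copy_to_local_file(:original, tempfile.path) # pull the file down from S3
  Zip::ZipFile.open(tmp_filename) do |zip|
    zip.add("abstracts/#{e.abstract.original_filename}", tempfile.path)
  end
end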
@vlard's solution is OK. However, I've run into some issues with to_file: it creates a tempfile, and the garbage collector sometimes deletes that file before it has been added to the zip, so I was getting random Errno::ENOENT: No such file or directory errors.
So I'm using the following code now (I've kept the original variable names for consistency with the initial question):
format.zip {
  registrations_with_attachments = Registration.find_by_sql('SELECT * FROM registrations WHERE abstract_file_name NOT LIKE ""')
  headers['Cache-Control'] = 'no-cache'
  # Please note that using the nanoseconds option in strftime reduces the risk of collisions
  # when 2 or more users initiate the download at the same time.
  tmp_filename = "#{RAILS_ROOT}/tmp/tmp_zip_" <<
                 Time.now.strftime('%Y-%m-%d-%H%M%S-%N').to_s <<
                 ".zip"
  # rubyzip gem version 0.9.4
  zip = Zip::ZipFile.open(tmp_filename, Zip::ZipFile::CREATE)
  zip.close
  registrations_with_attachments.each { |e|
    file_to_add = e.abstract.to_file
    zip = Zip::ZipFile.open(tmp_filename)
    zip.add("abstracts/#{e.abstract.original_filename}", file_to_add.path)
    zip.close
    puts "added #{file_to_add.path} to #{tmp_filename}" # force the garbage collector to keep file_to_add until after the file has been added to the zip
  }
  send_data(File.open(tmp_filename, "rb+").read, :type => 'application/zip', :disposition => 'attachment', :filename => tmp_filename.to_s)
  File.delete tmp_filename
}
