I am configuring gzip compression for a server via Jetty, and there are PUT/POST endpoints whose response payloads I would like to compress. The default GzipHandler configuration for Jetty specifically only includes GET; that this is the default is documented, but I'm unable to find documentation as to why this is the default. Is there a downside to applying gzip when the method is non-GET?
The reason comes down to responses from PUT and POST are, in a general sense, not suitable for being put into a cache.
GET was selected as the default back when gzip compression was first introduced, (and back before Jetty moved to Eclipse, back before Servlet 2.0, back when it was called the GzipFilter in Jetty) and in that era if the content couldn't be cached it wasn't compressed.
Why?
Well, back then using system resources to compress the same content over and over and over again was seen as a negative, it was more important to serve many requests vs a few optimized requests.
The GzipHandler can be configured to use any method, even nonsensical ones like HEAD.
Don't let a default value with historical reasons prevent you from using the GzipHandler, use it, configure it, and be happy.
Feel free to file an issue requesting the defaults be changed at https://github.com/eclipse/jetty.project/issues
Related
I have a show view, that uses a 'Universal Viewer' to load images. The image dimensions come from a json file that comes from a IIIF image server.
I fixed a bug and a new json file exists, but the user's browser is still using the old info.json file.
I understand that I could just have them do a hard-reload, like I myself did on my machine, but many users may be affected, and I'm just damn curious now.
Modern browsers all ship with cache control functionality baked into it. Using a combination of ETags and Cache-Control headers, you can accomplish what you seek without having to change the file names or use cache busting query parameters.
ETags allow you to communicate a token to a client that will tell their browser to update the cached version. This token can be created based on the content creation date, content length, or a fingerprint of the content.
Cache-Control headers allow you to create policies for web resources about how long, who, and how your content can be cached.
Using ETags and Cache-Control headers is a useful way to communicate to users when to update their cache when serving IIIF or any other content. However, adding ETags and Cache-Control this can be quite specific to your local implementation. Many frameworks (like Ruby on Rails) have much of this functionality baked into it. There are also web server configurations that may need to be modified, some sample configurations are available from the HTML5 Boilerplate project that use these strategies.
Sample Apache configurations for:
ETags https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/etags.conf
Cache expiration https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/expires_headers.conf
It depends on where the JSON file is being served from, and how it's being cached.
The guaranteed way to expire the cache on the file is to change the filename every time it changes. This is typically done be renaming it filename-MD5HASH.ext, where the MD5HASH is the MD5 hash of the file.
If you can't change the file name (it comes from a source you can't control, you might be able to get away with adding a caching busting query key to the URL. Something like http://example.com/file.ext?q=123456.
We are using Twilio's API (using TwiML), and are serving ".wav" files to them via our standard Rails/Passenger/nginx stack. However, I'm running into an issue -- according to Twilio, our server is sending "application/octet-stream" for .wav files, instead of the required "audio/wav".
I've made sure both Rails (in the mime_types initializer) and nginx (in the mime.types file) have the appropriate mime type. Yet, roughly half the time, Twilio reports that it cannot retrieve the file due to an "application/octet-stream" mime type. The other half of the time, it works fine.
Has anyone experienced behavior like this?
I may have finally cleared this issue up... I modified our verbs to use full URLs instead of a path, and I used HTTP protocol in those URLs.
# Old
<Play>/assets/path/to/file.wav</Play>
# New
<Play>http://server.com/assets/path/to/file.wav</Play>
I'm not sure whether it's the shift to HTTP, or the shift to URL, but my error rate immediately dropped from around 50% to 0%. I'll leave this open for other answers in case someone knows why this might be the case or they have another solution.
If I include in my application cache manifest:
/example.html
and this redirects to
https://s3.amazonaws.com/longURL/example.html?dynamicauthenticationparameters
will this work?
The current draft HTML5 specification seems to be silent on redirects for content files (as opposed to the manifest itself) apart from referring to a manual redirect flag, which apparently is set but (as far as I can tell) never actually used.
(The intention is to avoid proxying some S3 content, but to still make it available offline using the cache mechanism. JavaScript and LocalStorage would presumably be a workaround if the above can't be done).
Any pointers to the relevant part of a spec and/or current browser implementation behavior would be helpful.
The current specification now states that if the resource is redirected to a different origin, then this is treated as a failure and the local cached copy (or fallback) is used instead.
In section 5.6.4 of http://www.w3.org/TR/2011/WD-html5-20110525/offline.html it states that:
Redirects are fatal because they are either indicative of a network
problem (e.g. a captive portal); or would allow resources to be added
to the cache under URLs that differ from any URL that the networking
model will allow access to, leaving orphan entries; or would allow
resources to be stored under URLs different than their true URLs. All
of these situations are bad.
So sadly you can't serve some pages from Amazon S3 or Cloudfront.
Note: Please correct me if any of my assumptions are wrong. I'm not very sure of any of this...
I have been playing around with HTTP caching on Heroku and trying to work out
a nice way to differentiate between mobile and desktop requests when caching using Varnish
on Heroku.
My first idea was that I could set a Vary header so the cache is Varied on If-None-Match. As Rails automatically sends back etags generated from a hash of the content the etag would vary between desktop and mobile requests (different templates) and so it would eventually cache two versions (not fact, just my original thoughts). I have been playing around with this but I don't think it works.
Firstly, I can't wrap my head around when/if anything gets cached as surely requests with If-None-Match will be conditional gets anyway? Secondly, in practice fresh requests (ones without If-None-Match) sometimes receive the mobile site. Is this because the cache doesn't know whether to serve up the mobile or desktop cached version as the If-None-Match header isn't there?
As it probably sounds, I am rather confused. Will this approach work in any way or am I being silly? Also, is there anyway to achieve separate cached versions if I am unable to reach the Varnish config at all (as I am on Heroku)?
The exact code I am using in Rails to set the cache headers is:
response.headers['Cache-Control'] = 'public, max-age=86400'
response.headers['Vary'] = 'If-None-Match'
Edit: I am aware I can use Vary: User-Agent but trying to avoid it if possible due to it have a high miss rate (many, many user agents).
You could try Vary: User-Agent. However you'll have many cached versions of a single page (one for each user agent).
An other solution may be to detect mobile browsers directly in the reverse proxy, set a X-Is-Mobile-Browser client header before the reverse proxy attempts to find a cached page, set a Vary: X-Is-Mobile-Browser on the backend server (so that the reverse proxy will only cache 2 versions of the same page) and replace that header with Vary: User-Agent before sending to client.
If you can not change your varnish configuration, you have to make different urls for mobile and desktop pages. You can add some url-parameter (?mobile=true), add a piece in your path (yourdomain.com/mobile/news) or use a different host (like m.yourdomain.com).
This makes a lot of sense because (I've seen this many times, both in CMSs and applications) at some point in time you want to differentiate content and structure for mobile devices. People just do different things or are looking for different information on mobile devices...
I am trying to reduce the load on my webservers by adding an "Image server" (a dedicated server for handling image requests), and redirecting all requests for .gif,.jpg,.png etc., to it.
My question is, what is the best way to handle the redirection?
At the firewall level? (can I do this using iptables?)
At the load balancer level? (can ldirectord handle this?)
At the apache level - using rewrite rules?
Thanks for any suggestions on the best way to do this.
--Update--
One thing I would add is that these are domains that are hosted for 3rd parties, so I can't expect all the developers to modify their code and point their images to another server.
The further up the chain you can do it, the better.
Ideally, do it at the DNS level by using a different domain for your images (eg imgs.example.com)
If you can afford it, get someone else to do it by using a CDN (Content delivery network).
-Update-
There are also 2 featuers of apache's mod_rewrite that you might want to look at. They are all described well at http://httpd.apache.org/docs/1.3/misc/rewriteguide.html.
The first is under the heading "Dynamic Miror" in the above document, that uses the mod_rewrite Proxy flag [p]. This lets your server silently fetch files from another domain and return them.
The second is to just redirect the request to the new domain. This second option puts less strain on your server, but requests still need to come in and it slows down the final rendering of the page, as each request needs to make an essentially redundant request to your server first.
i agree with rikh. If you want images to be served from a different webserver, then serve them on a different web-server. For example:
<IMG src="images/Brett.jpg">
becomes
<IMG src="http://brettnesbitt.akamia-technologies.com/images/Brett.jpg">
Any kind of load balancer will still feed the image from the web-server's pipe, which is what you're trying to avoid.
i, of course, know what you really want. What you really want is for any request like:
GET images/Brett.jpg HTTP/1.1
to automatically get converted into:
HTTP/1.1 307 Temporary Redirect
Location: http://brettnesbitt.akamia-technologies.com/images/Brett.jpg
this way you don't have to do any work, except copy the images to the other web-server.
That i really don't know how to do.
By using the phrase "NAT", it implies that the firewall/router receives HTTP requests, and you want to forward the request to a different internal server if the HTTP request was for image files.
This then begs the question about what you're actually trying to save. No matter which internal web-server services the HTTP request, the data is still going to have to flow through the firewall/router's pipe.
The reason i bring it up is because the common scenario when someone wants to serve images from a different server is because they want to split up high-bandwidth, mostly static, low-CPU cost content from their actual logic.
Only using NAT to re-write the packet and send it to a different server will not work towards that common issue.
The other reason might be because images are not static content on your system, and a request to
GET images/Brett.jpg HTTP/1.1
actually builds an image on the fly, with a high-CPU cost, or only using with data available (i.e. SQL Server database) to ServerB.
If this is the case then i would still use a different server name on the image request:
GET http://www.brettsoft.com/default.aspx HTTP/1.1
GET http://imageserver.brettsoft.com/images/Brett.jpg HTTP/1.1
i understand what you're hoping for, with network packet inspection to override the NAT rule and send it to another server - i've never seen any such thing that can do that.
It sounds more "proxy-ish", where the web-proxy does this. (i.e. pfSense and m0n0wall can't do it)
Which then leads to a kind of solution we used once: a custom web-server that analyzes the request, makes the appropriate request off some internal server, and binary writes the response to the client.
That pain in the ass solution was insisted upon by a "security consultant", who apparently believes in security through obscurity.
i know IIS cannot do such things for you itself - i don't know about other web-server products.
i just asked around, and apparently if you wanted to write a custom kernel module for you linux based router, you could have it inspect packets and take appropriate action. Such a module might exist. There are, apparently, plenty of other open-sourced modules to use as a starting point.
But i'd rather shoot myself in the head.