How to handle docker API /images/create? - docker

Docker API image creation / pull (/v1.6/images/create) apparently always return
HTTP/1.1 200 OK
Content-Type: application/json
no matter if the process is a success or a failure.
Furthermore, the payload is not valid json.
eg: /v1.6/images/create?fromImage=whatevertheflush
returns:
{"status":"Pulling repository whatevertheflush"}{"error":"Server error: 404 trying to fetch remote history for whatevertheflush","errorDetail":{"code":404,"message":"Server error: 404 trying to fetch remote history for whatevertheflush"}}
Not being valid json, and the HTTP error not being forwarded / used makes it awkward to handle errors for clients.
Indeed, docker-py just puke the payload (https://github.com/dotcloud/docker-py/blob/master/docker/client.py#L374). And DockerHTTPClient from openstack tries to return a value based on the http error code, which is always 200... (https://github.com/openstack/nova/blob/master/nova/virt/docker/client.py#L191)
Now, I understand the pull might take a long time, and that it somewhat make sense to start streaming an answer to the client, but I can't help thinking something is wrong here.
So, this is three fold:
am I missing something entirely here?
if not: if you are implementing a client application (say, in Python), how would you handle this (elegantly, if possible :))? try to detect valid json blocks, load them, and exit whenever we "think" something is wrong?
if not: is this going to change (for the better) in future docker versions?

This question is a bit old, but for the future reader who has landed on this page, I'd like to let you know you're not alone, we feel your pain. This API is indeed as terrible as it looks.
The TL;DR answer is "the /images/create response format is undocumented; discard the output and query /images/XXX/json after your create call completes."
I wrote some orchestration tools a few years ago, and I found the /images/create API to be extremely annoying. But let's dive in:
There is no documented schema of the 200 response; the v1.19 docs simply gave examples of a few records. The v1.37 (latest at the time I write this) docs don't even go that far, no details are provided at all of the response.
The response is sent as Transfer-Encoding: chunked, and each record sent is preceded by the byte count in hex. Here's a low-level exerpt (bypassing curl, so we can see what actually gets sent on the wire):
host-4:~ rg$ telnet localhost 2375
Trying ::1...
Connected to localhost.
Escape character is '^]'.
POST /images/create?fromImage=jenkins/jenkins:latest HTTP/1.1
Host: localhost:2375
User-Agent: foo/1.0
Accept: */*
HTTP/1.1 200 OK
Api-Version: 1.39
Content-Type: application/json
Docker-Experimental: true
Ostype: linux
Server: Docker/18.09.1 (linux)
Date: Wed, 06 Feb 2019 16:53:19 GMT
Transfer-Encoding: chunked
39
{"status":"Pulling from jenkins/jenkins","id":"latest"}
5e
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
45
{"status":"Status: Image is up to date for jenkins/jenkins:latest"}
0
Yes, it streams the image download progress -- client libraries that don't give low-level access to the chunked records may just concatenate the data before it's provided to you. As you encountered, early versions of the API returned JSON records with the only delimiter being the chunked transfer encoding, so client code received a concatenated block of undelimited JSON and had to parse it by tracking curlies/quotes/escape chars! It has since been updated to now emit records delimited by newlines, but can we count on them always being there? Who knows! This behavior changed without ceremony, and was not preserved if you call older versions of the API on newer daemons.
It returns 200 OK immediately, which doesn't represent success or failure. (Given the nature of the call, I'd imagine it should probably return 202 Accepted instead. Ideally, we'd get a Location header pointing to a new URL that we could use to query the progress/status.)
The response data returned is huge, spammy, and just... silly. If you have a docker instance listening on TCP, try curl -Nv -X POST http://yourdocker:2375/images/create?fromImage=jenkins/jenkins:latest -o /tmp/omgwtf.txt. You'll be amazed. A ton of bandwidth is wasted transferring server-rendered ASCII bar graphs!!!. In fact, the records return each layer's progress three different ways, as numeric fields for current and total bytes, as a bar graph, and as a pretty-printed string with MB or GB units. Why isn't this just rendered on the client? Great question.
Instead, you need your client to parse kilobytes or megabytes of spam.
The bar graph has a randomly escaped unicode rep of the > character, despite being safely inside a JSON string. Someone was just throwing escape calls at the wall to see what stuck? ¯\_(ツ)_/¯
The records themselves are pretty arbitrary. There's an id field that changes what it references, and the only way to know what kind of record it is to parse the human-readable string. Pulling from XXX vs Pulling fs layer vs Downloading etc. As far as I can tell, the only real way to know if it's done is to track all the ids, and ensure you get a Pull complete for each at the time that the socket closes.
You might be able to look for Status: Downloaded newer image for XXX but I'm not sure if there are multiple possible responses for this.
As I mentioned at the start, you'll probably have the best luck requesting /images/XXX/json after /images/create claims to be complete. The combination of the two calls will give a pretty reliable indication of whether /images/create worked or not.
Here's a longer block of concatenated client response that shows a few different record types. Edited for brevity:
{"status":"Pulling from jenkins/jenkins","id":"latest"}
{"status":"Pulling fs layer","progressDetail":{},"id":"ab1fc7e4bf91"}
{"status":"Pulling fs layer","progressDetail":{},"id":"35fba333ff52"}
{"status":"Pulling fs layer","progressDetail":{},"id":"f0cb1fa13079"}
{"status":"Pulling fs layer","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Pulling fs layer","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Pulling fs layer","progressDetail":{},"id":"4346341d3c49"}
..
"status":"Waiting","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Waiting","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Waiting","progressDetail":{},"id":"4346341d3c49"}
{"status":"Waiting","progressDetail":{},"id":"006f2208d67a"}
{"status":"Waiting","progressDetail":{},"id":"fb85cf26717d"}
{"status":"Waiting","progressDetail":{},"id":"52ca068dbca7"}
{"status":"Waiting","progressDetail":{},"id":"82f4759b8d12"}
...
{"status":"Downloading","progressDetail":{"current":110118,"total":10780995},"progress":"[\u003e ] 110.1kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":457415,"total":45344749},"progress":"[\u003e ] 457.4kB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":44427,"total":4340040},"progress":"[\u003e ] 44.43kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":817890,"total":10780995},"progress":"[===\u003e ] 817.9kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1833671,"total":45344749},"progress":"[==\u003e ] 1.834MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":531179,"total":4340040},"progress":"[======\u003e ] 531.2kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":1719010,"total":10780995},"progress":"[=======\u003e ] 1.719MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":3205831,"total":45344749},"progress":"[===\u003e ] 3.206MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":1129195,"total":4340040},"progress":"[=============\u003e ] 1.129MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":2640610,"total":10780995},"progress":"[============\u003e ] 2.641MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1719019,"total":4340040},"progress":"[===================\u003e ] 1.719MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":4586183,"total":45344749},"progress":"[=====\u003e ] 4.586MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":3549922,"total":10780995},"progress":"[================\u003e ] 3.55MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":2513643,"total":4340040},"progress":"[============================\u003e ] 2.514M
...
{"status":"Pull complete","progressDetail":{},"id":"6d9b49fc8a28"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Pull complete","progressDetail":{},"id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Pull complete","progressDetail":{},"id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Pull complete","progressDetail":{},"id":"c651ee7bd59e"}
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
{"status":"Status: Downloaded newer image for jenkins/jenkins:latest"}
This code runs the Internet now. =8-O

This particular endpoint actually returns chunked encoding. An example via curl:
$ curl -v -X POST http://localhost:4243/images/create?fromImage=base
* About to connect() to localhost port 4243 (#0)
* Trying ::1...
* Connection refused
* Trying 127.0.0.1...
* connected
* Connected to localhost (127.0.0.1) port 4243 (#0)
> POST /images/create?fromImage=base HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8y zlib/1.2.5
> Host: localhost:4243
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Fri, 07 Feb 2014 04:21:59 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"status":"Pulling repository base"}{"status":"Pulling image (ubuntu-quantl) from base","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling dependent layers","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"27cf78414709"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}* Closing connection #0
Now I'm not sure how you go about parsing this in Python, but in Ruby, I can use Yajl like so:
parts = []
Yajl::Parser.parse(body) { |o| parts << o }
puts parts
{"status"=>"Pulling repository base"}
{"status"=>"Pulling image (ubuntu-quantl) from base", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling dependent layers", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"27cf78414709"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}

Using Docker v1.9 I still having this problem to deal with.
Also have found an issue on Docker Github repository: Docker uses invalid JSON format in some API functions #16925
Where some contributor suggests to use Content-Type HTTP header like this: application/json; boundary=NL
This not worked for me.
Then, while struggling with my custom parser, found this question StackOverflow: How to handle a huge stream of JSON dictionaries?

Related

Inserting more than one point into Influx DB on a single from command prompt

From a windows command prompt i'm attempting to enter 2 points on a single line, but I always get a parsing error.
If I try to separate the points with a \r\n, I get the same parse error
INSERT temperature,location=cityname value=-6.01 1575378000temperature,location=cityname value=-5.99 1575381600
ERR: {"error":"unable to parse 'temperature,location=cityname value=-6.01 1575378000temperature,location=cityname value=-5.99 1575381600': bad timestamp"}
Anyone have any experience with doing multi point inserts?
Update: using a file as input with curl
.$ curl -i -XPOST 'http://myserver:8086/write?db=testing' --data-binary #example.txt
HTTP/1.1 400 Bad Request
Content-Type: application/json
Request-Id: f005e1c8-1613-11ea-82ff-00155d0968c8
X-Influxdb-Build: OSS
X-Influxdb-Error: unable to parse 'temperature,location=cityname value=-6.00 1575378000 ': bad timestamp unable to parse 'temperature,location=cityname value=-5.00 1575381600 ': bad timestamp
X-Influxdb-Version: 1.7.7
X-Request-Id: f005e1c8-1613-11ea-82ff-00155d0968c8
Date: Tue, 03 Dec 2019 21:29:07 GMT
Content-Length: 189
{"error":"unable to parse 'temperature,location=cityname value=-6.00 1575378000\r': bad timestamp\nunable to parse 'temperature,location=cityname value=-5.00 1575381600\r': bad timestamp"}
example.txt file contents:
temperature,location=cityname value=-6.00 1575378000
temperature,location=cityname value=-5.00 1575381600
There's no multiple points in the INSERT clause.
You have to either use separate INSERTS (what's the issue with that, by the way? why would you insist to have it it in a single clause?)
Or use batches of line protocol records another way (like this, or send it through REST endpoint).

Problem at tweeting with ESP8266 via Thingspeak

I programmed my ESP8266 to read the soil moisture. Depending on the moisture a water pump gets activated. Now I wanted the ESP to tweet different sentences, depending on the situation.
Therefore I connected my twitter account to thingspeak.com and followed this code
Connecting to the internet works fine.
Problems:
It does not tweet every time and if it tweets, only the first word from a sentence shows up at twitter.
According to the forum, where I found the code, I already tried to replace all the spaces between the words with "%20". However then nothing shows up at twitter at all. Also single words are not always posted to twitter.
This is the code I have problems with:
// if connection to thingspeak.com is successful, send your tweet!
if (client.connect("184.106.153.149", 80))
{
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
client.print("Host: api.thingspeak.com\r\n");
client.print("Accept: */*\r\n");
client.print("User-Agent: Mozilla/4.0 (compatible; esp8266 Lua; Windows NT 5.1)\r\n");
client.print("\r\n");
Serial.println("tweeted " + tweet);
}
I don't get any error messages.
Maybe you could help me to make it visible if the tweet was really sent and how I manage to tweet a whole sentence.
I am using the Arduino IDE version 1.8.9 and I am uploading to this board
The rest of the code works fine. The only problem is the tweeting.
Update
I now tried a few different things:
Checking server response
Works and helps a lot. The results are:
Single words as String don't get any response at all
Same for Strings like "Test%20Tweet"
Strings with multiple words like "Test Tweet" get the following response and the first word of the String shows up as a tweet
HTTP/1.1 200 OK
Server: nginx/1.7.5
Date: Wed, 19 Jun 2019 18:44:22 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 1
Connection: keep-alive
Status: 200 OK
X-Frame-Options: SAMEORIGIN
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS, DELETE, PATCH
Access-Control-Allow-Headers: origin, content-type, X-Requested-With
Access-Control-Max-Age: 1800
ETag: W/"RANDOM_CHARS"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: THE_ID
1
I think the Content-Length might be the problem?
But I don't know how to change it in this code.
Checking if the connection succeded
I implemented this into my code an it never shows up on the monitor. So I think i never have a problem with not connecting.
Use a hostname instead of IP address
I tried it and never got a bad request. On the other hand nothing shows up on twitter at all.
Check if your tweet variable contains any new-line characters (carriage return or line feed). For example, the following variable would cause problems
String tweet = "Tweet no. 1\r\n";
due to the new-line characters at the end. These characters will cause the first line of the HTTP request to be cut short. I.e., instead of
GET /apps/thingtweet/1/statuses/update?key=api_key&status=Tweet no. 1 HTTP/1.1\r\n
it would become
GET /apps/thingtweet/1/statuses/update?key=api_key&status=Tweet no. 1\r\n
and the server would reject it with a 400 (Bad request) error.
On the other hand
String tweet = "Tweet no. 1";
would be fine.
If your tweets may contain such characters, then try encoding them before passing them to client.print():
tweet.replace("\r", "%0D");
tweet.replace("\n", "%0A");
Use a hostname instead of IP address
According to https://uk.mathworks.com/help/thingspeak/writedata.html, the relevant hostname for the API you are using is api.thingspeak.com. Use that instead of the IP address. This is preferable because the IP address a hostname points to can change regularly. (The IP address you are using doesn't even seem to be correct - and may already be out of date.)
I.e., change
if (client.connect("184.106.153.149", 80)) {
to
if (client.connect("api.thingspeak.com", 80)) {
API endpoint
Are you sure you are using the correct API endpoint? According to the link above, it looks like the API endpoint you need is https://api.thingspeak.com/update.json - so you may need to change
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
to
client.print("GET /update.json?api_key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
Check if the connection succeeded
Presently, your device sends the HTTP request if connects to the server successfully - but doesn't give any indication if the connection fails! So add an else block to handle that scenario and notify the user via the serial console.
if (client.connect("api.thingspeak.com", 80)) {
client.print("GET /apps/thingtweet/1/statuses/update?key=" + API + "&status=" + tweet + " HTTP/1.1\r\n");
// etc.
}
else {
Serial.println("Connection to the server failed!");
}
Checking server response
To check the response from the server, add the following block to your main loop - which will print the server response via the serial console.
delay(50);
while (client.available()) {
String response_line = client.readString();
Serial.println(response_line);
}
To clarify: that code should go inside your loop() function.
The response should include a status line - such as HTTP/1.1 200 OK if the request was successful, or HTTP/1.1 400 Bad Request if there was a problem.
In the case of a Bad request response, the full message will quite likely contain more information about the precise reason the request failed.
HTTP vs HTTPs
Lastly, are you sure that the API supports (plain, unencrypted) HTTP as well as HTTPs? If not, that may be your problem.

Race between socket accept and receive

I am using nodemcu with an esp-32 and recently came across an annoying problem. I refer to this sample from the NodeMCU Github page:
-- a simple HTTP server
srv = net.createServer(net.TCP)
srv:listen(80, function(conn)
conn:on("receive", function(sck, payload)
print(payload)
sck:send("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<h1> Hello, NodeMCU.</h1>")
end)
conn:on("sent", function(sck) sck:close() end)
end)
This doesn't seem to work in every case.
If I try it with telnet, there is no issue:
$ telnet 172.17.10.59 80
Trying 172.17.10.59...
Connected to 172.17.10.59.
Escape character is '^]'.
GET / HTTP/1.1
HTTP/1.0 200 OK
Content-Type: text/html
<h1> Hello, NodeMCU.</h1>
Connection closed by foreign host.
But when using wget, it hangs most of the time:
$ wget http://172.17.10.59/
--2017-05-12 15:00:09-- http://172.17.10.59/
Connecting to 172.17.10.59:80... connected.
HTTP request sent, awaiting response...
After some research, the root cause seems to be, that the receive callback is registered after the first data was received from the client. This doesn't happen when testing manually with telnet, but with a client like wget or a browser, the delay between connecting and receiving the first data seems to be too small to register the receive handler first.
I have looked into the nodemcu code and there doesn't seem to be an easy way to work around this problem. Or do I miss something here?
In HTTP/1.0, you need the "Content-Length" in the HTTP header when there is a message-body.
For example:
"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nContent-Length: 25 \r\n\r\n<h1> Hello, NodeMCU.</h1>"
ref: https://www.w3.org/Protocols/HTTP/1.0/spec.html#Content-Length

NodeMCU - Lua - HTTP Post or luasocket - Need guidance

This is my first time here and thought of joining the forum because I am new to Lua programming and have almost given up on HTTP Post method.
I am trying my hands on IOT using ESP8266 (running on NodeMCU) and using ESPlore to send the Lua program to ESP8266.
So, my program's objective is to call an API and post few parameters using my Lua program running on ESP8266.
I tried the following approaches -
1. Using HTTP Post
conn=net.createConnection(net.TCP, 0)
conn:on("receive", display)
conn:connect(80,HOST)
conn:on("connection",function(obj)
local post_request = build_post_request(HOST,URI)
obj:send(post_request)
end
----function as below ----------------------------------------------------
function build_post_request(host, uri)
local data = ""
data = "param1=1&param2=2"
request = "POST uri HTTP/1.1\r\n"..
"Host: example.com\r\n"..
"apiKey: e2sss3af-9ssd-43b0-bfdd-24a1dssssc46\r\n"..
"Cache-Control: no-cache\r\n"..
"Content-Type: application/x-www-form-urlencoded\r\n"..data
return request
end
----------------Response --------------------------------------
HTTP/1.1 400 Bad Request
Date: Sun, 11 Oct 2015 16:10:55 GMT
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=utf-8
Content-Language: en
Content-Length: 968
Connection: close
Apache Tomcat/7.0.54 - Error report
The request sent by the client was syntactically incorrect.
I don't understand what is wrong with it.
2. Using Luasocket
I have included following in my program -
local http = require"socket.http"
local ltn12 = require"ltn12"
and it throws following errors-
script1.lua:3: module 'socket.http' not found:
no field package.preload['socket.http']
no file 'socket/http.lc'
no file 'socket/http.lua'
I don't know how to get these libs and send to ESP8266 and not sure if that will suffice.
Question :
Which is the best method to post data to a server using an API.
a. If HTTP Post, what is the problem with my code.
b. If Luasocket then how do I send it to ESP8266 as I am not using a compiler on my laptop.
"Content-Type: application/x-www-form-urlencoded\r\n"..data
I don't understand what is wrong with it.
In HTTP,the headers are always delimited by \r\n\r\n. Without the second CR-LF pair, the following data with cause a header error as reported dy Tomcat.
Second you can't use the standard socket libraries on the ESP8266. You must use the net library which is a nodemcu wrapper around the Espressif SDK.

Heroku & Rails - Varnish is only caching very occasionally

I have an issue similar to Heroku & Rails - Varnish HTTP Cache Not Working, but the solution (wait for a while, then everything works) doesn't seem to apply - I've had the setup below for several days.
This thread on the Heroku Google group has some users with the same problem. They mention that it takes a while for everything to be cached, but my understanding is that after a while, everything should get cached, no? Or does that only apply if there is a Lot of traffic?
I need some advice on where I should be looking/what I can try changing in order to get caching working properly.
My setup:
I have http://www.swingoutlondon.co.uk running on Heroku (Rails 3.0.3, Ruby 1.9.2, bamboo-mri-1.9.2) and the main index page performs a lot of database queries to return what is essentially a static page - usually taking about 2-3 seconds (yes, that's something I really do need to address, but I figure varnish caching is a quick win).
I've set the Cache-Control response header as described here, and indeed that does seem to have been set on the page:
>> curl -I http://swingoutlondon.co.uk
HTTP/1.1 200 OK
Server: nginx
Date: Sun, 13 May 2012 00:01:05 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Cache-Control: public, max-age=300
Etag: "2565201f3ae39c6a9a1f6b1fb8bbbe0a"
X-Ua-Compatible: IE=Edge,chrome=1
X-Runtime: 1.699667
Content-Length: 44224
Accept-Ranges: bytes
X-Varnish: 681634826
Age: 0
Via: 1.1 varnish
Note: Cache-Control: public, max-age=300
I assume that Age: 0 indicates that it hasn't retrieved a cached copy, and indeed the command returns in the normal slow 2-3 seconds.
If keep repeatedly trying that curl, I can occasionally a cached copy (the page loads in under half a second and Age is greater than 0).
I must confess to not fully understanding HTTP headers, but one clue might be: when Age is greater than 0, I get two lots of digits in X-Varnish (in all other cases I only get one set):
X-Varnish: 848670407 848650521
Here's what I've checked:
the source of is identical each time.
I have one before_filter on that page, which sets the time the page was last updated as an instance variable.
there are a number of cookies - as far as I can see they are all set by either Google Analytics or the Twitter or Facebook buttons.
For good measure, here are my Request headers:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Cookie:__utma=264326157.189257391.1336869624.1336869624.1336869624.1; __utmb=264326157.2.10.1336869624; __utmc=264326157; __utmz=264326157.1336869624.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Host:www.swingoutlondon.co.uk
If-None-Match:"2565201f3ae39c6a9a1f6b1fb8bbbe0a"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19
Ah well - turns out that because Heroku uses multiple independent Varnish servers, and because traffic to Swing Out London is relatively low, I shouldn't expect to have many pages served by the caches if my max-age is only 5 minutes. Setting it to 20 or 30 minutes results in much more caching.
I've written a detailed blog post collecting my learnings. Thanks to Garry Shulter for helping me out with this one.

Resources