Docker API image creation / pull (/v1.6/images/create) apparently always returns
HTTP/1.1 200 OK
Content-Type: application/json
no matter whether the process succeeds or fails.
Furthermore, the payload is not valid JSON.
e.g. /v1.6/images/create?fromImage=whatevertheflush
returns:
{"status":"Pulling repository whatevertheflush"}{"error":"Server error: 404 trying to fetch remote history for whatevertheflush","errorDetail":{"code":404,"message":"Server error: 404 trying to fetch remote history for whatevertheflush"}}
The payload not being valid JSON, and the HTTP error code not being forwarded, makes error handling awkward for clients.
Indeed, docker-py just pukes the payload back (https://github.com/dotcloud/docker-py/blob/master/docker/client.py#L374). And DockerHTTPClient from OpenStack tries to return a value based on the HTTP status code, which is always 200... (https://github.com/openstack/nova/blob/master/nova/virt/docker/client.py#L191)
Now, I understand the pull might take a long time, and that it somewhat makes sense to start streaming an answer to the client, but I can't help thinking something is wrong here.
So, this is threefold:
am I missing something entirely here?
if not: if you are implementing a client application (say, in Python), how would you handle this (elegantly, if possible :))? Try to detect valid JSON blocks, load them, and bail out whenever we "think" something is wrong (see the sketch right after this list)?
if not: is this going to change (for the better) in future docker versions?
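For concreteness, that "detect valid JSON blocks" idea could look something like the sketch below: buffer whatever arrives, peel complete objects off the front with json.JSONDecoder.raw_decode, and bail as soon as a record carries an error key. (Purely illustrative; the class name is made up and nothing here is part of docker-py.)

import json

class JSONStreamParser:
    """Accumulates text chunks and yields every complete JSON object found."""

    def __init__(self):
        self._buf = ""
        self._decoder = json.JSONDecoder()

    def feed(self, chunk):
        self._buf += chunk
        records = []
        while True:
            stripped = self._buf.lstrip()
            if not stripped:
                break
            try:
                obj, end = self._decoder.raw_decode(stripped)
            except ValueError:
                break  # partial record -- wait for more data
            if "error" in obj:
                raise RuntimeError(obj["error"])
            records.append(obj)
            self._buf = stripped[end:]
        return records

parser = JSONStreamParser()
print(parser.feed('{"status":"Pulling repository whatevertheflush"}{"error":"Server err'))
print(parser.feed('or: 404 trying to fetch remote history for whatevertheflush"}'))  # raises RuntimeError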
This question is a bit old, but for the future reader who has landed on this page, I'd like to let you know you're not alone; we feel your pain. This API is indeed as terrible as it looks.
The TL;DR answer is "the /images/create response format is undocumented; discard the output and query /images/XXX/json after your create call completes."
I wrote some orchestration tools a few years ago, and I found the /images/create API to be extremely annoying. But let's dive in:
There is no documented schema of the 200 response; the v1.19 docs simply gave examples of a few records. The v1.37 docs (latest at the time I write this) don't even go that far; no details of the response are provided at all.
The response is sent as Transfer-Encoding: chunked, and each record sent is preceded by its byte count in hex. Here's a low-level excerpt (bypassing curl, so we can see what actually gets sent on the wire):
host-4:~ rg$ telnet localhost 2375
Trying ::1...
Connected to localhost.
Escape character is '^]'.
POST /images/create?fromImage=jenkins/jenkins:latest HTTP/1.1
Host: localhost:2375
User-Agent: foo/1.0
Accept: */*
HTTP/1.1 200 OK
Api-Version: 1.39
Content-Type: application/json
Docker-Experimental: true
Ostype: linux
Server: Docker/18.09.1 (linux)
Date: Wed, 06 Feb 2019 16:53:19 GMT
Transfer-Encoding: chunked
39
{"status":"Pulling from jenkins/jenkins","id":"latest"}
5e
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
45
{"status":"Status: Image is up to date for jenkins/jenkins:latest"}
0
Yes, it streams the image download progress -- client libraries that don't give low-level access to the chunked records may just concatenate the data before it's provided to you. As you encountered, early versions of the API returned JSON records with the only delimiter being the chunked transfer encoding, so client code received a concatenated block of undelimited JSON and had to parse it by tracking curlies/quotes/escape chars! It has since been updated to now emit records delimited by newlines, but can we count on them always being there? Who knows! This behavior changed without ceremony, and was not preserved if you call older versions of the API on newer daemons.
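For example, a client that trusts the newline delimiting can just read the stream line by line and decode each record as it arrives. A minimal sketch, assuming Python with the requests library, a daemon on tcp://localhost:2375 as in the session above, and one JSON record per line -- none of which the API actually promises:

import json
import requests

# Endpoint and image taken from the telnet session above; adjust to taste.
url = "http://localhost:2375/images/create"
params = {"fromImage": "jenkins/jenkins:latest"}

with requests.post(url, params=params, stream=True) as resp:
    # requests reassembles the chunked transfer encoding; iter_lines() then
    # yields one (hopefully complete) JSON record at a time.
    for line in resp.iter_lines():
        if not line:
            continue
        record = json.loads(line)
        if "error" in record:
            raise RuntimeError(record["error"])
        print(record.get("status"), record.get("id"))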
It returns 200 OK immediately, which doesn't represent success or failure. (Given the nature of the call, I'd imagine it should probably return 202 Accepted instead. Ideally, we'd get a Location header pointing to a new URL that we could use to query the progress/status.)
The response data returned is huge, spammy, and just... silly. If you have a docker instance listening on TCP, try curl -Nv -X POST http://yourdocker:2375/images/create?fromImage=jenkins/jenkins:latest -o /tmp/omgwtf.txt. You'll be amazed: a ton of bandwidth is wasted transferring server-rendered ASCII bar graphs! In fact, the records report each layer's progress three different ways: as numeric fields for current and total bytes, as an ASCII bar graph, and as a pretty-printed string with MB or GB units. Why isn't this just rendered on the client? Great question.
Instead, you need your client to parse kilobytes or megabytes of spam.
The bar graph contains a needlessly escaped Unicode representation (\u003e) of the > character, despite it sitting safely inside a JSON string. Someone was just throwing escape calls at the wall to see what stuck? ¯\_(ツ)_/¯
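(For what it's worth, the escaping is merely cosmetic: any compliant JSON parser, Python's json module included, turns \u003e back into a plain >. A quick sanity check:)

import json

# The doubled backslash keeps the \u003e escape intact for the JSON parser.
record = json.loads('{"progress":"[\\u003e     ]  110.1kB/10.78MB"}')
print(record["progress"])  # prints: [>     ]  110.1kB/10.78MB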
The records themselves are pretty arbitrary. There's an id field that changes what it references, and the only way to know what kind of record you're looking at is to parse the human-readable status string: Pulling from XXX vs. Pulling fs layer vs. Downloading, etc. As far as I can tell, the only real way to know it's done is to track all the ids and ensure you get a Pull complete for each by the time the socket closes.
You might be able to look for Status: Downloaded newer image for XXX, but I'm not sure if there are multiple possible responses for this (the telnet session above got Status: Image is up to date for ... instead, for instance).
As I mentioned at the start, you'll probably have the best luck requesting /images/XXX/json after /images/create claims to be complete. The combination of the two calls will give a pretty reliable indication of whether /images/create worked or not.
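Putting the last two points together, here's a rough sketch of that belt-and-braces approach in Python: drain the /images/create stream while tracking per-layer Pull complete records, then confirm with /images/XXX/json. It assumes the requests library, a daemon on tcp://localhost:2375, and newline-delimited records; the helper names (pull_image, image_exists) are mine, not part of any Docker client.

import json
import requests

DOCKER = "http://localhost:2375"  # assumption: daemon exposed on TCP

def pull_image(name):
    """POST /images/create and drain the stream, tracking per-layer progress."""
    started = set()   # layer ids we saw begin
    finished = set()  # layer ids that reported "Pull complete"
    with requests.post(DOCKER + "/images/create",
                       params={"fromImage": name}, stream=True) as resp:
        for line in resp.iter_lines():
            if not line:
                continue
            rec = json.loads(line)
            if "error" in rec:
                raise RuntimeError(rec["error"])
            layer, status = rec.get("id"), rec.get("status", "")
            if layer and status == "Pulling fs layer":
                started.add(layer)
            elif layer and status == "Pull complete":
                finished.add(layer)
    return started <= finished  # every layer that started also completed

def image_exists(name):
    """GET /images/{name}/json -- the reliable check once the pull returns."""
    return requests.get(DOCKER + "/images/" + name + "/json").status_code == 200

name = "jenkins/jenkins:latest"
print("pull ok" if pull_image(name) and image_exists(name) else "pull failed")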
Here's a longer block of concatenated client response that shows a few different record types. Edited for brevity:
{"status":"Pulling from jenkins/jenkins","id":"latest"}
{"status":"Pulling fs layer","progressDetail":{},"id":"ab1fc7e4bf91"}
{"status":"Pulling fs layer","progressDetail":{},"id":"35fba333ff52"}
{"status":"Pulling fs layer","progressDetail":{},"id":"f0cb1fa13079"}
{"status":"Pulling fs layer","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Pulling fs layer","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Pulling fs layer","progressDetail":{},"id":"4346341d3c49"}
..
"status":"Waiting","progressDetail":{},"id":"3d1dd648b5ad"}
{"status":"Waiting","progressDetail":{},"id":"a9f886e483d6"}
{"status":"Waiting","progressDetail":{},"id":"4346341d3c49"}
{"status":"Waiting","progressDetail":{},"id":"006f2208d67a"}
{"status":"Waiting","progressDetail":{},"id":"fb85cf26717d"}
{"status":"Waiting","progressDetail":{},"id":"52ca068dbca7"}
{"status":"Waiting","progressDetail":{},"id":"82f4759b8d12"}
...
{"status":"Downloading","progressDetail":{"current":110118,"total":10780995},"progress":"[\u003e ] 110.1kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":457415,"total":45344749},"progress":"[\u003e ] 457.4kB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":44427,"total":4340040},"progress":"[\u003e ] 44.43kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":817890,"total":10780995},"progress":"[===\u003e ] 817.9kB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1833671,"total":45344749},"progress":"[==\u003e ] 1.834MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":531179,"total":4340040},"progress":"[======\u003e ] 531.2kB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":1719010,"total":10780995},"progress":"[=======\u003e ] 1.719MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":3205831,"total":45344749},"progress":"[===\u003e ] 3.206MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":1129195,"total":4340040},"progress":"[=============\u003e ] 1.129MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":2640610,"total":10780995},"progress":"[============\u003e ] 2.641MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":1719019,"total":4340040},"progress":"[===================\u003e ] 1.719MB/4.34MB","id":"f0cb1fa13079"}
{"status":"Downloading","progressDetail":{"current":4586183,"total":45344749},"progress":"[=====\u003e ] 4.586MB/45.34MB","id":"ab1fc7e4bf91"}
{"status":"Downloading","progressDetail":{"current":3549922,"total":10780995},"progress":"[================\u003e ] 3.55MB/10.78MB","id":"35fba333ff52"}
{"status":"Downloading","progressDetail":{"current":2513643,"total":4340040},"progress":"[============================\u003e ] 2.514M
...
{"status":"Pull complete","progressDetail":{},"id":"6d9b49fc8a28"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":380,"total":380},"progress":"[==================================================\u003e] 380B/380B","id":"6302e8b6563c"}
{"status":"Pull complete","progressDetail":{},"id":"6302e8b6563c"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":1548,"total":1548},"progress":"[==================================================\u003e] 1.548kB/1.548kB","id":"7348f018cf93"}
{"status":"Pull complete","progressDetail":{},"id":"7348f018cf93"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Extracting","progressDetail":{"current":3083,"total":3083},"progress":"[==================================================\u003e] 3.083kB/3.083kB","id":"c651ee7bd59e"}
{"status":"Pull complete","progressDetail":{},"id":"c651ee7bd59e"}
{"status":"Digest: sha256:abd3e3f96fbc3445c420fda590f37e2bd3377f69affd47b63b3d826d084c5ddc"}
{"status":"Status: Downloaded newer image for jenkins/jenkins:latest"}
This code runs the Internet now. =8-O
This particular endpoint actually returns chunked encoding. An example via curl:
$ curl -v -X POST http://localhost:4243/images/create?fromImage=base
* About to connect() to localhost port 4243 (#0)
* Trying ::1...
* Connection refused
* Trying 127.0.0.1...
* connected
* Connected to localhost (127.0.0.1) port 4243 (#0)
> POST /images/create?fromImage=base HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8y zlib/1.2.5
> Host: localhost:4243
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Fri, 07 Feb 2014 04:21:59 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host localhost left intact
{"status":"Pulling repository base"}{"status":"Pulling image (ubuntu-quantl) from base","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/","progressDetail":{},"id":"b750fe79269d"}{"status":"Pulling dependent layers","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"27cf78414709"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}{"status":"Download complete","progressDetail":{},"id":"b750fe79269d"}* Closing connection #0
Now I'm not sure how you go about parsing this in Python, but in Ruby, I can use Yajl like so:
parts = []
Yajl::Parser.parse(body) { |o| parts << o }
puts parts
{"status"=>"Pulling repository base"}
{"status"=>"Pulling image (ubuntu-quantl) from base", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling image (ubuntu-quantl) from base, endpoint: https://cdn-registry-1.docker.io/v1/", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Pulling dependent layers", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"27cf78414709"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}
{"status"=>"Download complete", "progressDetail"=>{}, "id"=>"b750fe79269d"}
Using Docker v1.9 I am still having to deal with this problem.
I also found an issue in the Docker GitHub repository: Docker uses invalid JSON format in some API functions #16925
where a contributor suggests using a Content-Type HTTP header like this: application/json; boundary=NL
This did not work for me.
Then, while struggling with my custom parser, I found this question on Stack Overflow: How to handle a huge stream of JSON dictionaries?
I want to build a small script in Python which needs to fetch a URL. The server is kind of crappy though and replies with pure ASCII without any headers.
When I try:
import urllib.request
response = urllib.request.urlopen(url)
print(response.read())
I get an http.client.BadStatusLine: 100 error because this isn't a properly formatted HTTP response.
Is there another way to fetch a URL and get the raw content, without trying to parse the response?
Thanks
It's difficult to answer your direct question without a bit more information, not knowing exactly how the (web) server in question is broken.
That said, you might try using something a bit lower-level, a socket for example. Here's one way (python2.x style, and untested):
#!/usr/bin/env python
import socket
from urlparse import urlparse

def geturl(url, timeout=10, receive_buffer=4096):
    parsed = urlparse(url)
    try:
        host, port = parsed.netloc.split(':')
    except ValueError:
        host, port = parsed.netloc, 80
    sock = socket.create_connection((host, port), timeout)
    sock.sendall('GET %s HTTP/1.0\r\n\r\n' % parsed.path)
    # Read until the server closes the connection (recv returns '').
    response = [sock.recv(receive_buffer)]
    while response[-1]:
        response.append(sock.recv(receive_buffer))
    return ''.join(response)

print geturl('http://www.example.com/')  # <- the trailing / is needed if no other path element is present
And here's a stab at a python3.2 conversion (you may not need to decode from bytes, if writing the response to a file for example):
#!/usr/bin/env python
import socket
from urllib.parse import urlparse

ENCODING = 'ascii'

def geturl(url, timeout=10, receive_buffer=4096):
    parsed = urlparse(url)
    try:
        host, port = parsed.netloc.split(':')
    except ValueError:
        host, port = parsed.netloc, 80
    sock = socket.create_connection((host, port), timeout)
    request = 'GET %s HTTP/1.0\r\n\r\n' % parsed.path
    sock.sendall(bytes(request, ENCODING))
    # Read until the server closes the connection (recv returns b'').
    response = [sock.recv(receive_buffer)]
    while response[-1]:
        response.append(sock.recv(receive_buffer))
    return ''.join(r.decode(ENCODING) for r in response)

print(geturl('http://www.example.com/'))
HTH!
Edit: You may need to adjust what you put in the request, depending on the web server in question. Guanidene's excellent answer provides several resources to guide you on that path.
What you need to do in this case is send a raw HTTP request using sockets.
You would need to do a bit of low-level network programming using Python's socket module in this case. (Network sockets return all of the information sent by the server as it is, so you can interpret the response however you wish. For example, the HTTP protocol frames requests and responses with standard request lines and headers - GET, POST, HEAD, and so on - whereas the high-level urllib module hides this header information from you and just returns the data.)
You also need some basic knowledge of HTTP headers. For your case, you just need to know about the GET HTTP request. See its definition here - http://djce.org.uk/dumprequest, and an example of it here - http://en.wikipedia.org/wiki/HTTP#Example_session. (If you wish to capture live traces of the HTTP requests sent by your browser, you would need packet-sniffing software like Wireshark.)
Once you know the basics of the socket module and HTTP headers, you can go through this example - http://coding.debuntu.org/python-socket-simple-tcp-client - which shows how to send an HTTP request over a socket to a server and read the reply back. You can also refer to this unclear question on SO.
(You can google python socket http to get more examples.)
(Tip: I am not a Java fan, but still, if you don't find enough convincing examples on this topic under Python, try looking for them under Java and then translate them to Python accordingly.)
urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')