Docker stalls on run - Wait Condition Exit

Docker stalls on run - Wait Condition Exit - docker

Trying to run a small 100mb docker image. It does run in the end, but takes about 5 minutes to successfully run. When it does run it gives me the below error:
time="2018-09-25T10:20:28+01:00" level=error msg="error waiting for container: error during connect: Post http://XYZ/v1.32/containers/933e895a7a1429199f053ab6f384589307c927ebe9833f368352e196246308a0/wait?condition=next-exit: EOF"
I've googled it and it doesn't seem to come up in search results. Is there any way I can find out what is happening?

From the provided URL:
http://XYZ/v1.32/containers/{...container_ID...}/wait?condition=next-exit
one may assume that what gives you this error tries to talk to the Docker Engine API and more specifically to this endpoint:
POST /containers/{id}/wait
Wait for a container
Block until a container stops, then returns the exit code.
query Parameters:
condition string "not-running"
Wait until a container state reaches the given condition, either 'not-running' (default), 'next-exit', or 'removed'.
...but has a connection problem.

Related

Docker login to Gitea registry fails even though curl succeeds

I'm using Gitea (on Kubernetes, behind an Ingress) as a Docker image registry. On my network I have gitea.avril aliased to the IP where it's running. I recently found that my Kubernetes cluster was failing to pull images:
Failed to pull image "gitea.avril/scubbo/<image_name>:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "gitea.avril/scubbo/<image_name>:<tag>": failed to resolve reference "gitea.avril/scubbo/<image_name>:<tag>": failed to authorize: failed to fetch anonymous token: unexpected status: 530
While trying to debug this, I found that I am unable to login to the registry, even though curling with the same credentials succeeds:
$ curl -k -u "scubbo:$(cat /tmp/gitea-password)" https://gitea.avril/v2/_catalog
{"repositories":[...populated list...]}
# Tell docker login to treat `gitea.avril` as insecure, since certificate is provided by Kubernetes
$ cat /etc/docker/daemon.json
{
"insecure-registries": ["gitea.avril"]
}
$ docker login -u scubbo -p $(cat /tmp/gitea-password) https://gitea.avril
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get "https://gitea.avril/v2/": received unexpected HTTP status: 530
The first request shows up as a 200 OK in the Gitea logs, the second as a 401 Unauthorized.
I get a similar error when I kubectl exec onto the Gitea container itself, install Docker, and try to docker login localhost:3000 - after an error indicating that server gave HTTP response to HTTPS client, it falls back to the http protocol and similarly reports a 530.
I've tried restart Gitea with GITEA__log__LEVEL=Debug, but that didn't result in any extra logging. I've also tried creating a fresh user (in case I have some weirdness cached somewhere) and using that - same behaviour.
EDIT: after increasing log level to Trace, I noticed that successful attempts to curl result in the following lines:
...rvices/auth/basic.go:67:Verify() [T] [638d16c4] Basic Authorization: Attempting login for: scubbo
...rvices/auth/basic.go:112:Verify() [T] [638d16c4] Basic Authorization: Attempting SignIn for scubbo
...rvices/auth/basic.go:125:Verify() [T] [638d16c4] Basic Authorization: Logged in user 1:scubbo
whereas attempts to docker login result in:
...es/container/auth.go:27:Verify() [T] [638d16d4] ParseAuthorizationToken: no token
This is the case even when doing docker login localhost:3000 from the Gitea container itself (that is - this is not due to some authentication getting dropped by the Kubernetes Ingress).
I'm not sure what could be causing this - I'll start up a fresh Gitea registry to compare.

EDIT: in this Github issue, the Gitea team pointed out that standard docker authentication includes creating a Bearer token which references the ROOT_URL, explaining this issue.
Text below preserved for posterity:
...Huh. I have a fix, and I think it indicates some incorrect (or, at least, unexpected) behaviour; but in fairness it only comes about because I'm doing some pretty unexpected things as well...
TL;DR attempting to docker login to Gitea from an alternative domain name can result in an error if the primary domain name is unavailable; apparently because, while doing so, Gitea itself makes a call to ROOT_URL rather than localhost
Background
Gitea has a configuration variable called ROOT_URL. This is, among other things, used to generate the copiable "HTTPS" links from repo pages. This is presumed to be the "main" URL on which users will access Gitea.
I use Cloudflared Tunnels to make some of my Kubernetes services (including Gitea) available externally (on <foo>.scubbo.org addresses) without opening ports to the outside world. Since Cloudflared tunnels do not automatically update DNS records when a new service is added, I have written a small tool[0] which can be run as an initContainer "before" restarting the Cloudflared tunnel, to refresh DNS[1].
Cold-start problem
However, now there is a cold-start problem:
(Unless I temporarily disable this initContainer) I can't start Cloudflared tunnels if Gitea is unavailable (because it's the source for the initContainer's image)
Gitea('s public address) will be unavailable until Cloudflared tunnels start up.
To get around this cold-start problem, in the Cloudflared initContainers definition, I reference the image by a Kubernetes Ingress name (which is DNS-aliased by my router) gitea.avril rather than by the public (Cloudflared tunnel) name gitea.scubbo.org. The cold-start startup sequence then becomes:
Cloudflared tries to start up, fails to find a registry at gitea.avril, continues to attempt
Gitea (Pod and Ingress) start up
Cloudflared detects that gitea.avril is now responding, pulls the Cloudflared initContainer image, and successfully deploys
gitea.scubbo.org is now available (via Cloudflared)
So far, so good. Except that testing now indicates[2] that, when trying to docker login (or docker pull, or presumably, many other docker commands) to a Gitea instance will result in a call to the ROOT_URL domain - which, if Cloudflared isn't up yet, will result in an error[3].
So what?
My particular usage of this is clearly an edge case, and I could easily get around this in a number of ways (including moving my "Cloudflared tunnel startup" to a separately-initialized, only-privately-available registry). However, what this reduces to is that "docker API calls to a Gitea instance will fail if the ROOT_URL for the instance is unavailable", which seems like unexpected behaviour to me - if the API call can get through to the Gitea service in the first place, it should be able to succeed in calling itself?
However, I totally recognize that the complexity of fixing this (going through and replacing $ROOT_URL with localhost:$PORT throughout Gitea) might not be worth the value. I'll open an issue on the Gitea team, but I'd be perfectly content with a WILLNOTFIX.
Footnotes
[0]: Note - depending on when you follow that link, you might see a red warning banner indicating "_Your ROOT_URL in app.ini is https://gitea.avril/ but you are visiting https://gitea.scubbo.org/scubbo/cloudflaredtunneldns_". That's because of this very issue!
[1]: Note from the linked issue that the Cloudflared team indicate that this is unexpected usage - "We don't expect the origins to be dynamically added or removed services behind cloudflared".
[2]: I think this is new behaviour, as I'm reasonably certain that I've done a successful "cold start" before. However, I wouldn't swear to it.
[3]: After I've , the error is instead error parsing HTTP 404 response body: unexpected end of JSON input: "" rather than the 530-related errors I got before. This is probably a quirk of Cloudflared's caching or DNS behaviour. I'm working on a minimal reproducing example that circumvents Cloudflared.

OWASP ZAP Docker Full Scan Fails with Proxy Error

I am attempting to perform a full scan against my application using the OWASP ZAP docker packaged scan, however the scans are failing to connect to the proxy with ProxyError, and the error seems to be inconsistent. Sometimes it will be NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f89c980ee80>: Failed to establish a new connection: [Errno 111] Connection refused'), other times it will be ConnectionResetError(104, 'Connection reset by peer').
The command I am using is basic one from the documentation:
docker run -t owasp/zap2docker-stable zap-full-scan.py -t https://my.webapp.com
Running the baseline-scan works without issue, and running the full-scan against other demo sites works fine too.
As suggested I have run curl against my application from inside the container and 200 is returned as expected, so I am unsure what needs to be configured to enable the connection.

Failed to pull image "mcr.microsoft.comoss/calico/pod2daemon-flexvol:v3.18.1 (missing "/")

Executive summary
For several weeks we sporadically see the following error on all of our AKS Kubernetes clusters:
Failed to pull image "mcr.microsoft.comoss/calico/pod2daemon-flexvol:v3.18.1
Obviously there is a missing "/" after "mcr.microsoft.com".
The problem started after upgrading the clusters from 1.17 to 1.20.
Where does this spelling error come from? Is there anything WE can do about it?
Some details
The full error is:
Failed to pull image "mcr.microsoft.comoss/calico/pod2daemon-flexvol:v3.18.1": rpc error: code = Unknown desc = failed to pull and unpack image "mcr.microsoft.comoss/calico/pod2daemon-flexvol:v3.18.1": failed to resolve reference "mcr.microsoft.comoss/calico/pod2daemon-flexvol:v3.18.1": failed to do request: Head https://mcr.microsoft.comoss/v2/calico/pod2daemon-flexvol/manifests/v3.18.1: dial tcp: lookup mcr.microsoft.comoss on 168.63.129.16:53: no such host
In 50% of the cases the following is logged also:
Pod 'calico-system/calico-typha-685d454c58-pdqkh' triggered a Warning-Event: 'FailedMount'. Warning Message: Unable to attach or mount volumes: unmounted volumes=[typha-ca typha-certs calico-typha-token-424k6], unattached volumes=[typha-ca typha-certs calico-typha-token-424k6]: timed out waiting for the condition
There seems to be no measurable effect on cluster health apart from the warnings - I see no correlating errors in any services.
We did not find a trigger which causes the behavior. It does not seem to be correlated to any change we do from our side (deployments, scaling, ...).
Also there seems to be no pattern as to the frequency. Sometimes there is no problem for several days and then we have the error pop up 10 times per day.
Another observation is that the calico-kube-controller and several pods were restarted. Replicaset and deployments did not change.
Restart time
Since all the pods of the daemonset are running eventually, the problem seems to be solving itself after some time.

Are you behind a firewall, and used this link to set it up
https://learn.microsoft.com/en-us/azure/aks/limit-egress-traffic
If so add HTTP to the mcr.microsoft.com, looks like MS missed the 's' in an update recently
Paul

WSO2 MI Infinite loop on invalid request line

I run a very simple micro integrator service that only has 1 proxy service and a single sequence. In this sequence the incoming XML message is transferred to amazon SQS service.
If I run this in the Integration Studio on the instance that comes built in I have no problems. However, when I package the file into a CAR and feed it to the docker instance it will boot up and instantly gets bombarded with requests? That is to say, the following logs take over and the container can no longer be manually stopped:
[2020-04-15 12:45:44,585] INFO
{org.apache.synapse.transport.passthru.SourceHandler} - Writer null
when calling informWriterError ^[[?62;c^[[?62;c[2020-04-15
12:45:46,589] ERROR
{org.apache.synapse.transport.passthru.SourceHandler} - HttpException
occurred org.apache.http.ProtocolException: Invalid request line:
ÇÃ^ú§ß¡ðO©%åË*29xÙVÀ$À(=À&À*kjÀ at
org.apache.http.impl.nio.codecs.AbstractMessageParser.parse(AbstractMessageParser.java:208)
at
org.apache.synapse.transport.http.conn.LoggingNHttpServerConnection$LoggingNHttpMessageParser.parse(LoggingNHttpServerConnection.java:407)
at
org.apache.synapse.transport.http.conn.LoggingNHttpServerConnection$LoggingNHttpMessageParser.parse(LoggingNHttpServerConnection.java:381)
at
org.apache.http.impl.nio.DefaultNHttpServerConnection.consumeInput(DefaultNHttpServerConnection.java:265)
at
org.apache.synapse.transport.http.conn.LoggingNHttpServerConnection.consumeInput(LoggingNHttpServerConnection.java:114)
at
org.apache.synapse.transport.passthru.ServerIODispatch.onInputReady(ServerIODispatch.java:82)
at
org.apache.synapse.transport.passthru.ServerIODispatch.onInputReady(ServerIODispatch.java:39)
at
org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:113)
at
org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:159)
at
org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:338)
at
org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:316)
at
org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:277)
at
org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:105)
at
org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:586)
at java.lang.Thread.run(Thread.java:748) Caused by:
org.apache.http.ParseException: Invalid request line:
ÇÃ^þvHÅFmÉ
(#ë¸'º¯æ¦V
I made sure there were no outside connections possible and I also found the older threads of someone describing this problem, but their solution (changing something in the keystore) did not work.
Also, I made sure to include the SQS certificate in the container as well.
I have no connections setup to connect to the container so that will be out of the equation as well.
What am I missing here?

I have no idea why, but I have identified the culprit to be none other than Portainer. When I shutdown Portainer the stream of requests stops.
According to Wireshark, the requests are all made towards
GET
http://172.17.0.1:9000/api/endpoints/< containerID >/docker/< someId >/logs
It seems that because the WSO2 container I'm trying to run is an ESB that uses endpoints and returns 400 status codes on non-existing endpoints portainer will retry until it succeeds. This is just my observation, so I could be wrong.
I have confirmed my findings by uploading my container to AWS where the problem did not exist.

How to add health check for python code in docker container

I have just started exploring Health Check feature in docker. All the tutorials online are showing same type of health check examples. Like this link1 link2. They are using this same command:
HEALTHCHECK CMD curl --fail http://localhost:3000/ || exit 1
I have a python code which I have converted into docker image and its container is running fine. I have service in container which runs fine but I want to put a health check on this service. It is started/stopped using :
service <myservice> start
service <myservice> stop
This service is responsible to send data to server. I need to put a health check on this but don't know how to do it. I have searched for this and didn't found any examples. Can anyone please point me to the right link or can explain it.?
Thanks

The health check command is not something magical, but rather something you can automate to get a better status on your service.
Some questions you should ask yourself before setting the healthcheck:
How would i normally verify that the service is running ok, assuming i'm running it normally instead of inside of a container and it's not an automated process, but rather i check the status doing something myself
If the service has no open ports it can be interrogated on, does it rather write it's success/failure status on disk inside a file?
If the service has open ports but it communicates on a custom protocol, do i have any tools that i use to interrogate the open ports
Let's take the curl command you listed: It implies that the healthcheck listed is monitoring a http service started on port 3000. The curl command will fail if the http status code returned is not 200. That's pretty straight forward to demonstrate the health check usage.
Assuming you write success or failure to a file every 30 seconds from your service then your healthcheck would be a script that exits abnormally when encountering the failure text
Assuming that your service has an open port but is communicating via some custom protocol like protocol buffers, then all you have to do is call it with a script that encodes a payload with proto buf then checks the output received
And so on...

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart