Issues running multiple build jobs in parallel using DIND - docker

We have a local gitlab instance using 3 runners which works fine when we have a single build job running.
Sadly, when launching 3 build jobs using dind in parallel, it fails with a multitude of errors:
sometimes we are unable to log in to Docker to pull the image used for the cache
sometimes the login succeeds and the job fails during the build
but in both cases it complains about the certificate:
failed to dial gRPC: cannot connect to the Docker daemon. Is 'docker daemon' running on this host?: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")
Suspecting that the certificates were getting clobbered by the other build jobs, we separated the folder used for certificates so that it is unique to each runner; sadly, the issue remains.
We have also noticed that with DOCKER_HOST="tcp://docker:2376" the resolved Docker address is random, and it often returns the same value for different jobs, which again means they are using the same resources.
I have found a guide on how to manually use a script to ensure each job is connected to its own dind service (HERE), but since the article is over five years old, I wonder whether that is still applicable, or whether I am doing something wrong.
Please share any advice or guidance on where to look.
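For reference, GitLab's Docker-in-Docker documentation suggests mounting /certs/client as a per-job anonymous volume in the runner's config.toml (volumes = ["/certs/client"]), so each concurrent job gets its own copy of the TLS certificates rather than a shared host directory. A minimal sketch of a matching job definition in .gitlab-ci.yml (the job name and image tags are placeholders):

build:
  image: docker:24.0
  services:
    - docker:24.0-dind
  variables:
    # With TLS enabled, dind listens on 2376 and writes its certificates under /certs.
    DOCKER_HOST: "tcp://docker:2376"
    DOCKER_TLS_CERTDIR: "/certs"
    DOCKER_TLS_VERIFY: "1"
    DOCKER_CERT_PATH: "/certs/client"
  script:
    - docker info                # fails with the x509 error above when certificates clash
    - docker build -t myimage .

If /certs is instead bound to a shared host directory, concurrent dind services can overwrite each other's CA, which would produce exactly the certificate error quoted above.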

Related

Google Cloud Run - create service task is loading forever

I'm trying to deploy a Node.js application to Google Cloud Run.
I pushed a Docker image to Container Registry - that seems to be successful.
But when I try to deploy it to Google Cloud Run, to make it public and accessible via WAN, it fails for unknown reasons.
While it loads, the console shows:
this step can take 10-15 minutes...
When it fails:
Resource readiness deadline exceeded.
The solution is provided in the GCP documentation:
https://cloud.google.com/run/docs/troubleshooting#service-agent
They suggest that you verify that the service agent has the Cloud Run Service Agent role and, if it does not, grant it.
Additionally, you should check the logs for the Cloud Run app; you might see clues to what the cause is.
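If the role is missing, it can be granted from the CLI; a sketch, where PROJECT_ID and PROJECT_NUMBER are placeholders for your own project:

# Grant the Cloud Run Service Agent role to the project's Cloud Run service agent.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@serverless-robot-prod.iam.gserviceaccount.com" \
    --role="roles/run.serviceAgent"

# Read recent Cloud Run logs for clues about the failure.
gcloud logging read 'resource.type="cloud_run_revision"' --limit=50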

Duplicate image registries 'index.docker.io' found in the 'imageRegistryCredentials' of container group

PS C:\Source\VelocityAzurev0.10.0\credentialagent-docker-compose> docker compose up
[+] Running 0/1
 - Group credentialagent-docker-compose  Error  1.7s
containerinstance.ContainerGroupsClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="DuplicateImageRegistries" Message="Duplicate image registries 'index.docker.io' found in the 'imageRegistryCredentials' of container group 'credentialagent-docker-compose'."
PS C:\Source\VelocityAzurev0.10.0\credentialagent-docker-compose>
This was working for me until this morning, with no apparent YAML changes. If I tweak the YAML to use local rather than Azure resources and use a local Docker context, the compose up works. The prior successful runs were removed with "compose down". I double-checked the Azure subscription, and there appear to be no Container Instances or Groups present in the resource group.
I can't seem to find any pertinent questions or solutions for this particular Code="DuplicateImageRegistries" error.
I had the same problem today. After a lot of unsuccessful research, I was able to deploy my container again after running docker logout.
$ docker logout
Removing login credentials for https://index.docker.io/v1/
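As for why the logout helps: Docker's stored registry credentials live in ~/.docker/config.json, and my assumption (not confirmed anywhere by Azure) is that two stored entries resolving to the same index.docker.io registry end up duplicated in the container group's imageRegistryCredentials. A quick way to inspect and clear them:

# List which registries have stored credentials (requires jq).
jq '.auths | keys' ~/.docker/config.json

# Remove the Docker Hub entry, then retry `docker compose up`.
docker logout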
I have seen this issue in the past, and the Azure Technical Support team helped to resolve it. I would recommend creating an Azure technical support ticket to get the issue resolved.

GKE can't pull image from GCR

This one is a real head-scratcher, because everything had worked fine for years until yesterday. I have a Google Cloud account and billing is set up correctly. I have private images in my GCR registry which I can 'docker pull' and 'docker push' from my laptop (MacBook Pro with Big Sur 11.4) with no problems.
The problem I detail here started happening yesterday, after I deleted a project in the Google Cloud console and then created it again from scratch with the same name. The previous project had no problem pulling GCR images; the new one couldn't pull the same images. I have since used the Cloud console to create new, empty test projects with a variety of names, with new clusters using default GKE values, but this new problem persists with all of them.
When I use kubectl to create a deployment on GKE that uses any of the GCR images in the same project, I get ErrImagePull errors. When I 'describe' the pod that won't load the image, the error (with project id obscured) is:
Failed to pull image "gcr.io/test-xxxxxx/test:1.0.0": rpc error: code = Unknown desc = failed to pull and unpack image "gcr.io/test-xxxxxx/test:1.0.0": failed to resolve reference "gcr.io/test-xxxxxx/test:1.0.0": unexpected status code [manifests 1.0.0]: 401 Unauthorized.
This happens when I use kubectl from my laptop (including after wiping out and creating a new .kube/config file with proper credentials), but it happens in exactly the same way when I use the Cloud console to set up a deployment by choosing 'Deploy to GKE' for the GCR image... no kubectl involved.
If I ssh into a node in any of these new clusters and try to 'docker pull' a GCR image (in the same project), I get a similar error:
Error response from daemon: unauthorized: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication
My understanding from numerous articles is that no special authorization needs to be set up for GKE to pull GCR images from within the same project, and I've NEVER had this issue in the past.
I hope I'm not the only one on this deserted island. Thanks in advance for your help!
I tried implementing the setup and faced the same error, both on the GKE cluster and on the cluster's nodes. This happens when access to the Cloud Storage API is "Disabled" on the cluster nodes, which can be verified in the node (VM instance) details under the "Cloud API access scopes" section.
We can rectify this by changing "Access scopes" to "Set access for each API" and modifying access to the specific API under Node Pools -> default-pool -> Security when creating the cluster. In our case we need at least "Read Only" access to the Storage API, since that is what grants access to Cloud Storage, where the image is stored. See "Changing the service account and access scopes for an instance" for more information.
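Since access scopes are set when a node pool is created, a common fix is to create a replacement node pool with the right scope; a sketch with placeholder names (new-pool, my-cluster, and the zone):

# Create a node pool whose nodes can read from Cloud Storage, where GCR image layers live.
gcloud container node-pools create new-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --scopes=gke-default

# gke-default includes devstorage.read_only; scopes can also be listed explicitly,
# e.g. --scopes=storage-ro,logging-write,monitoring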

Hyperledger Composer setting up connection.json

Hi, and thank you to everyone reading this question.
Since I want to use Hyperledger Composer, I deployed the orderer, peers, CA, and other components, and everything succeeded up to creating and joining the channel.
(I believe this is true because I finished creating the channels, joining the peers to that channel, and installing and instantiating the chaincode.)
After that I ran the "composer network install" command and got an error saying there was no response from the peers:
"Response from attempted peer comms was an error: Error: 14 UNAVAILABLE: EOF"
So I started to think there is a problem in the file named "connection.json", but I don't know specifically how to edit that file.
Below are the outputs of "docker service ls" and "docker network inspect fabric", followed by my connection.json file (both attached as screenshots).
I referred to this page to run Hyperledger Fabric on multiple hosts:
https://medium.com/@malliksarvepalli/hyperledger-fabric-on-multiple-hosts-using-docker-swarm-and-compose-f4b70c64fa7d
And this is a screenshot taken after installing the business network (attached).
I think your Fabric network is not running!
Open a terminal, go to your fabric-dev-servers directory, and run ./startFabric.sh.
If you face an error there, such as "some container already exists", run ./teardownFabric.sh first and then run the start command again.
Once the network is running successfully, you need to create the admin card by running ./createPeerAdminCard.sh.
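Putting those steps together, the restart sequence looks like this (a sketch assuming the standard fabric-dev-servers scripts from the Composer development environment):

cd ~/fabric-dev-servers
./teardownFabric.sh        # only needed if containers already exist
./startFabric.sh
./createPeerAdminCard.sh   # creates the PeerAdmin business network card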
Could you confirm that the orderer, peers, and CAs have all launched successfully on each machine? The 'docker ps' command shows which services are running; with 'docker ps -a' you can also find which services have stopped.
Across all of the docker-compose files, the following container names should be listed by 'docker ps':
orderer:
- orderer

org1:
- ca1
- org1peer0
- org1peer1
- org1cli

org2:
- ca2
- org2peer0
- org2peer1
- org2cli
Could you check this is correct?
Are you running this project on 3 machines or 3 cloud instances?
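As a quick check on each machine, something like this prints just the container names and their status, plus any containers that have exited:

docker ps --format 'table {{.Names}}\t{{.Status}}'
docker ps -a --filter status=exited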

How to debug failing docker image signing with the Docker Hub registry/notary?

Since today I haven't been able to push new signed images to a Docker Hub private repository, because image signing fails. I have Docker Content Trust enabled. I don't know of any significant changes in my environment that could affect this, except routinely installing the latest security updates on Ubuntu a couple of days ago; signed image pushing did still work after those upgrades.
My question is: how does one go about debugging signing-related problems? There seems to be little available by googling or duckduckgoing.
I tried running the notary CLI, but it didn't seem to provide much help to me; the various options of the different commands are not very well documented.
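For example, Docker Hub's notary server is at https://notary.docker.io and the local trust data lives under ~/.docker/trust, so listing the signed tags for a repository looks roughly like this (the repository name is a placeholder):

notary -s https://notary.docker.io -d ~/.docker/trust list docker.io/myorg/myproject

If that works, the server-side trust data is at least reachable and intact.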
Environment:
OS: Ubuntu 18.04.1 LTS
Docker version 18.06.1-ce, build e68fc7a
relevant environment variables:
DOCKER_CONTENT_TRUST=1
DOCKER_CONTENT_TRUST_REPOSITORY_PASSPHRASE=[undisclosed]
DOCKER_CONTENT_TRUST_ROOT_PASSPHRASE=[undisclosed]
Failing command:
docker push xyz/abc:def
Sample output (with the irrelevant parts obfuscated):
user@machine:~$ source .docker-signing-credentials
user@machine:~$ export DOCKER_CONTENT_TRUST=1
user@machine:~$ docker push myorg/myproject:myimage_v1.38.0
The push refers to repository [docker.io/myorg/myproject]
c72506834af4: Layer already exists
043ae531d76e: Layer already exists
... 8< ... snip ... 8< ...
af840f32f0a2: Layer already exists
8decd5535924: Layer already exists
myimage_v1.38.0: digest: sha256:baa3e1148e0100df8cbb0aab46200be2bdf600d7802d7cddb3a23c12053af82d size: 8883
Signing and pushing trust metadata
failed to sign docker.io/myorg/myproject:myimage_v1.38.0: An error occurred during validation: rpc error: code = 14 desc = grpc: RPC failed fast due to transport failure
When I unset DOCKER_CONTENT_TRUST, there is no problem with pushing the images.
There is an open issue with this exact same description at:
https://github.com/docker/hub-feedback/issues/1646
It might be a good idea to join that issue.
The root cause was degraded performance of the Docker Hub Notary service; see the resolution by Docker support in that issue.
