I'm using Gitea (on Kubernetes, behind an Ingress) as a Docker image registry. On my network I have gitea.avril aliased to the IP where it's running. I recently found that my Kubernetes cluster was failing to pull images:
Failed to pull image "gitea.avril/scubbo/<image_name>:<tag>": rpc error: code = Unknown desc = failed to pull and unpack image "gitea.avril/scubbo/<image_name>:<tag>": failed to resolve reference "gitea.avril/scubbo/<image_name>:<tag>": failed to authorize: failed to fetch anonymous token: unexpected status: 530
While trying to debug this, I found that I am unable to login to the registry, even though curling with the same credentials succeeds:
$ curl -k -u "scubbo:$(cat /tmp/gitea-password)" https://gitea.avril/v2/_catalog
{"repositories":[...populated list...]}
# Tell docker login to treat `gitea.avril` as insecure, since certificate is provided by Kubernetes
$ cat /etc/docker/daemon.json
{
"insecure-registries": ["gitea.avril"]
}
$ docker login -u scubbo -p $(cat /tmp/gitea-password) https://gitea.avril
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Error response from daemon: Get "https://gitea.avril/v2/": received unexpected HTTP status: 530
The first request shows up as a 200 OK in the Gitea logs, the second as a 401 Unauthorized.
I get a similar error when I kubectl exec onto the Gitea container itself, install Docker, and try to docker login localhost:3000 - after an error indicating that server gave HTTP response to HTTPS client, it falls back to the http protocol and similarly reports a 530.
I've tried restart Gitea with GITEA__log__LEVEL=Debug, but that didn't result in any extra logging. I've also tried creating a fresh user (in case I have some weirdness cached somewhere) and using that - same behaviour.
EDIT: after increasing log level to Trace, I noticed that successful attempts to curl result in the following lines:
...rvices/auth/basic.go:67:Verify() [T] [638d16c4] Basic Authorization: Attempting login for: scubbo
...rvices/auth/basic.go:112:Verify() [T] [638d16c4] Basic Authorization: Attempting SignIn for scubbo
...rvices/auth/basic.go:125:Verify() [T] [638d16c4] Basic Authorization: Logged in user 1:scubbo
whereas attempts to docker login result in:
...es/container/auth.go:27:Verify() [T] [638d16d4] ParseAuthorizationToken: no token
This is the case even when doing docker login localhost:3000 from the Gitea container itself (that is - this is not due to some authentication getting dropped by the Kubernetes Ingress).
I'm not sure what could be causing this - I'll start up a fresh Gitea registry to compare.
EDIT: in this Github issue, the Gitea team pointed out that standard docker authentication includes creating a Bearer token which references the ROOT_URL, explaining this issue.
Text below preserved for posterity:
...Huh. I have a fix, and I think it indicates some incorrect (or, at least, unexpected) behaviour; but in fairness it only comes about because I'm doing some pretty unexpected things as well...
TL;DR attempting to docker login to Gitea from an alternative domain name can result in an error if the primary domain name is unavailable; apparently because, while doing so, Gitea itself makes a call to ROOT_URL rather than localhost
Background
Gitea has a configuration variable called ROOT_URL. This is, among other things, used to generate the copiable "HTTPS" links from repo pages. This is presumed to be the "main" URL on which users will access Gitea.
I use Cloudflared Tunnels to make some of my Kubernetes services (including Gitea) available externally (on <foo>.scubbo.org addresses) without opening ports to the outside world. Since Cloudflared tunnels do not automatically update DNS records when a new service is added, I have written a small tool[0] which can be run as an initContainer "before" restarting the Cloudflared tunnel, to refresh DNS[1].
Cold-start problem
However, now there is a cold-start problem:
(Unless I temporarily disable this initContainer) I can't start Cloudflared tunnels if Gitea is unavailable (because it's the source for the initContainer's image)
Gitea('s public address) will be unavailable until Cloudflared tunnels start up.
To get around this cold-start problem, in the Cloudflared initContainers definition, I reference the image by a Kubernetes Ingress name (which is DNS-aliased by my router) gitea.avril rather than by the public (Cloudflared tunnel) name gitea.scubbo.org. The cold-start startup sequence then becomes:
Cloudflared tries to start up, fails to find a registry at gitea.avril, continues to attempt
Gitea (Pod and Ingress) start up
Cloudflared detects that gitea.avril is now responding, pulls the Cloudflared initContainer image, and successfully deploys
gitea.scubbo.org is now available (via Cloudflared)
So far, so good. Except that testing now indicates[2] that, when trying to docker login (or docker pull, or presumably, many other docker commands) to a Gitea instance will result in a call to the ROOT_URL domain - which, if Cloudflared isn't up yet, will result in an error[3].
So what?
My particular usage of this is clearly an edge case, and I could easily get around this in a number of ways (including moving my "Cloudflared tunnel startup" to a separately-initialized, only-privately-available registry). However, what this reduces to is that "docker API calls to a Gitea instance will fail if the ROOT_URL for the instance is unavailable", which seems like unexpected behaviour to me - if the API call can get through to the Gitea service in the first place, it should be able to succeed in calling itself?
However, I totally recognize that the complexity of fixing this (going through and replacing $ROOT_URL with localhost:$PORT throughout Gitea) might not be worth the value. I'll open an issue on the Gitea team, but I'd be perfectly content with a WILLNOTFIX.
Footnotes
[0]: Note - depending on when you follow that link, you might see a red warning banner indicating "_Your ROOT_URL in app.ini is https://gitea.avril/ but you are visiting https://gitea.scubbo.org/scubbo/cloudflaredtunneldns_". That's because of this very issue!
[1]: Note from the linked issue that the Cloudflared team indicate that this is unexpected usage - "We don't expect the origins to be dynamically added or removed services behind cloudflared".
[2]: I think this is new behaviour, as I'm reasonably certain that I've done a successful "cold start" before. However, I wouldn't swear to it.
[3]: After I've , the error is instead error parsing HTTP 404 response body: unexpected end of JSON input: "" rather than the 530-related errors I got before. This is probably a quirk of Cloudflared's caching or DNS behaviour. I'm working on a minimal reproducing example that circumvents Cloudflared.
I am using this Github repo and folder path I found: https://github.com/entechlog/kafka-examples/tree/master/kafka-connect-standalone to run Kafka connect locally in standalone mode. I have made some changes to the Docker compose file but mainly changes that pertain to authentication.
The problem I am now having is that when I run the Docker image, I get this error multiple times, for each partition (there are 10 of them, 0 through 9):
[2021-12-07 19:03:04,485] INFO [bq-sink-connector|task-0] [Consumer clientId=connector- consumer-bq-sink-connector-0, groupId=connect-bq-sink-connector] Found no committed offset for partition <topic name here>-<partition number here> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1362)
I don't think there are any issues with authenticating or connecting to the endpoint(s), I think the consumer (connect sink) is not sending the offset back.
Am I missing an environment variable? You will see this docker compose file has CONNECT_OFFSET_STORAGE_FILE_FILENAME: /tmp/connect.offsets, and I tried adding CONNECTOR_OFFSET_STORAGE_FILE_FILENAME: /tmp/connect.offsets (CONNECT_ vs. CONNECTOR_) and then I get an error Failed authentication with <Kafka endpoint here>, so now I'm just going in circles.
I think you are focused on the wrong output.
That is an INFO message
The offsets file (or topic in distributed mode) is for source connectors.
Sink connectors use consumer groups. If there is no found offset found for groupId=connect-bq-sink-connector, then the consumer group didn't commit it.
Our Composer instance dropped all its active workers in the middle of the day. Node memory and cpu utilization disappeared for 2 out of 3 nodes.
First errors were:
_mysql_exceptions.OperationalError: (2006, "Can't connect to MySQL server on 'airflow-sqlproxy-service.default.svc.cluster.local' (110))"
Restarting Composer instance (with a dummy env variable) does not help, gives the below error:
Killing GKE workers in error does not help either. Stackdriver has this:
ERROR: (gcloud.container.clusters.describe) You do not currently have an active account selected.)
And another error seems to point to internal Google authentication service problem:
ERROR: (gcloud.container.clusters.get-credentials) There was a problem refreshing your current auth tokens: Unable to find the server at metadata.google.internal)
The Composer storage bucket seems to have 'Storage Legacy Bucket ...' permissions for some service accounts. Some changes going on in the authentication backend or what could be the underlying cause of the sudden and weird freeze?
Versions are composer-1.8.2 and airflow-1.10.3.
We have the following scenario:
Current working setup
Web API project using a single DockerFile
A release pipe line with an 'Azure App Service deploy' task.
Proposed new setup
Web API project using multi container Docker Compose file
A release pipe line with an 'Azure Web App for Containers' task.
Upon deploying the new setup we receive the below error message:
ERROR - multi-container unit was not started successfully
Unhandled exception. System.AggregateException: One or more errors occurred.
(Parameters: Connection String: XXX, Resource: https://vault.azure.net, Authority:
https://login.windows.net/xxxxx. Exception Message:
Tried to get token using Managed Service Identity.
Access token could not be acquired. Connection refused)
The exception thrown is because it can't connect to Azure MSI (Managed Service Identity). It does this to obtain a token before connecting to key vault.
I have tried the following based upon some research and solutions others have found:
Connecting with "RunAs=App" (this seems to be the default parameter-less constructor anyway)
Building up the connection string myself manually by pulling the "MSI_SECRET" environment variable from the machine. This is always blank.
Restarting MSI.
Upgrading and downgrading AppAuthentication package
MSI appears to be configured correctly as it works perfectly with our current working setup so we can rule that out.
It's worth noting that this is System assigned identity not a user assigned one.
The documentation that states which services support managed identites only mentions 'Azure Container Instances' not 'Azure Managed Container Instances' and that is for Linux/Preview too so that it could be not supported.
Services that support managed identities for Azure resources
We've spent a considerable amount of time getting to this point with the configuration and deployment and it would be great if we could resolve this last issue.
Any help appreciated.
Unfortunately, there currently is no multi-container support for managed identities. The multi-container feature is in preview and so does not have all its functionality working yet.
However, the documentation you linked to is also not as clear about the supported scenarios, so I am working on getting this documentation updated to better clarify this. I can update this answer once that's done.
I'm gonna set up sample elasticbeanstalk environment for multi-container docker.
But it is not created due to error.
environment tier: web-server
other configuration info: https://i.stack.imgur.com/gKwBn.png
I want to create sample elasticbeanstalk environment for multi-container docker.
But the actual is not created.
Here is the error statement.
WARN Environment health has transitioned from Pending to Severe.
Initialization in progress (running for 15 minutes). None of the instances are sending data.
ERROR Stack named 'awseb-e-at4dw9xg2u-stack' aborted operation.
Current state: 'CREATE_FAILED' Reason: The following resource(s) failed to create: [AWSEBInstanceLaunchWaitCondition].
ERROR LaunchWaitCondition failed.
The expected number of EC2 instances were not initialized within the given time.
Rebuild the environment. If this persists, contact support.
This issue is solved by allowing inbound & outbound traffic for default ACL network.
inbound
outbound