Unable to delete gcloud composer environment - google-cloud-composer

I'm trying to delete two Cloud Composer environments. One did not create successfully (no associated Airflow or bucket) and one did. When I attempt to delete them, I get an error message (after a really long time) of RPC Skipped due to required preoperation not finished yet. The logs don't provide any valuable information, and I wasn't able to find anything wrong in the cluster. The only solution I have found so far is to delete the entire project, but I would prefer not to. Any suggestions would be greatly appreciated!

Follow the steps below to delete the environment's resources manually:
Delete the GKE cluster that corresponds to the environment
Delete the Cloud Storage bucket used by the environment
Delete the related Deployment Manager deployment with:
gcloud deployment-manager deployments delete <DEPLOYMENT_NAME> --delete-policy=ABANDON
Then try again to delete the Composer environment with:
gcloud composer environments delete <ENVIRONMENT_NAME> --location <LOCATION>
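For reference, the first two steps can also be done from the command line. A minimal sketch, where the cluster name, zone and bucket name are placeholders you would look up in the Console (Composer typically names them after the environment's region plus a random suffix):
# Find and delete the GKE cluster backing the environment (name and zone are placeholders)
gcloud container clusters list
gcloud container clusters delete <CLUSTER_NAME> --zone <ZONE>
# Remove the environment's Cloud Storage bucket and its contents (bucket name is a placeholder)
gsutil -m rm -r gs://<BUCKET_NAME>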

I would like to share what worked for me in case someone else runs into this problem, as I followed all the steps above and still could not delete the Composer environment.
My 'gcloud composer environments list' command was returning '0', but I could still see my environment in the console view, and when I tried to delete it I would get the same error message as honlicious. Additionally, I ran 'gcloud projects add-iam-policy-binding' to try to give my Compute Engine service account the composer.serviceAgent role, but this still did not resolve my issue. What eventually worked was disabling the Cloud Composer API and then re-enabling it. This removed the old environment I was previously unable to delete.
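For anyone who prefers the command line over the Console, the same disable/re-enable cycle can be done with gcloud services; a minimal sketch (note that --force also disables any services that depend on the API, so check your project first):
# Disable the Cloud Composer API, then re-enable it once the stuck environment is gone
gcloud services disable composer.googleapis.com --force
gcloud services enable composer.googleapis.com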

I got this issue when I tried to create and delete Cloud Composer with Terraform.
I had created a Service Account separately from the Composer environment, and this led to the service account being deleted first during a terraform destroy operation.
So the correct order is:
Delete Composer environment
Delete Composer’s Service Account
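One way to enforce that order from the CLI is a targeted destroy of the environment before destroying everything else. A minimal sketch, assuming the environment resource is addressed as google_composer_environment.composer in your configuration (adjust the address to match yours):
# Destroy the Composer environment first (resource address is an assumption)
terraform destroy -target=google_composer_environment.composer
# Then destroy the remaining resources, including the Service Account
terraform destroy
In configuration, referencing the service account's email from the environment's node_config (or adding an explicit depends_on) gives Terraform the same ordering information, so a plain terraform destroy tears down the environment before the account.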

Related

GoogleCloudPlatform/gcr-cleaner cannot delete the stale images

I am trying to use gcr-cleaner, which is recommended by Google, to clean up my stale images. However, it does not delete anything even though it reports that it executed successfully.
I have granted the browser, Cloud Run admin, service account user, and storage admin roles to the service account. The Docker configuration is successful as well. I have tried both a GitHub Action and Cloud Run, and neither of them works.
Even if I give it a wrong repo name, it will show
Deleting refs older than 2022-10-25T20:08:16Z on 1 repo(s)...
gcr.io/project-id/my-repo
✗ no refs were deleted
But there are a bunch of images older than that timestamp.
Has anyone had the same issue before? How should I solve it?
I just had this issue; my problem was that without the -tag-filter-any flag it does not delete tagged images, and all the images I wanted to delete are tagged. What solved it for me was setting this flag with a regex matching my tags:
docker run -v "${HOME}/.config/gcloud:/.config/gcloud" -it us-docker.pkg.dev/gcr-cleaner/gcr-cleaner/gcr-cleaner-cli -grace 720h -keep 5 -repo gcr.io/[MY-PROJECT]/[MY-REPO] -tag-filter-any "^(\d+).(\d+).(\d+)$"

Puppet Code Manager setup issue with Bitbucket

I have just installed Puppet Enterprise server and successfully added a few nodes, and also got some custom modules running. I now want to move to Code Manager before we get too deep into it.
I have followed the instructions for creating an empty Bitbucket repo here and initializing it with one single file environment.conf on a production branch as described in that link.
I have then followed the steps here to configure Code Manager but when I get to Test the control repository section to test the connection with puppet-code deploy --dry-run I get the following error:
--dry-run implies --all.
--dry-run implies --wait.
Dry-run deploying all environments.
2021/12/21 20:21:12 ERROR - [POST /deploys][500] Errors while collecting a list of environments to deploy (exit code: 1).
"/opt/puppetlabs/puppet/lib/ruby/gems/2.7.0/gems/rugged-0.27.7/lib/rugged/repository.rb:258: warning: Using the last argument as keyword parameters is deprecated\nERROR\t -\u003e Unable to determine current branches for Git source 'puppet' (/etc/puppetlabs/code-staging/environments)\nOriginal exception:\nFailed to authenticate SSH session: Unable to send userauth-publickey request at /opt/puppetlabs/server/data/code-manager/git/git#git.company.com-1234-in-puppet-control-repo.git\n"
I have added the puppet server's SSH pub key to the bitbucket repo's access tokens.
There are a few things in that error message I'm not fully understanding.
Unable to determine current branches for Git source 'puppet' - what is meant by source 'puppet'? My repo is called puppet-control-repo...
Failed to authenticate SSH session: Unable to send userauth-publickey request - my Puppet master's SSH keys are in the token list for that repo, so I'm confused here as well.
Any guidance would be appreciated.
UPDATE (13-01-2022):
I can successfully clone on the Puppet server using the command
git clone ssh://git@git.example.com:1234/project/puppet-control-repo.git --config core.sshCommand="ssh -i /etc/puppetlabs/puppetserver/ssh/id-control_repo.rsa"
Not sure why Puppet is still returning:
Failed to authenticate SSH session: Unable to send userauth-publickey request
I don't know if you saw the instructions here https://puppet.com/docs/pe/2021.4/control_repo.html#managing_environments_with_a_control_repository but you can run
puppet infrastructure configure
which makes sure the files have the right permissions.
I would also test that a clone with the keys works outside of Code Manager, e.g.
git clone <your_git_url> --config core.sshCommand="ssh -i /etc/puppetlabs/puppetserver/ssh/id-control_repo.rsa"
If this works, it may be worth being aware of an issue we experienced with GitHub https://puppet.com/blog/how-githubs-protocol-changes-impact-your-puppet-code-deployments/ which, depending on Bitbucket's approach to the protocol, may be having a similar effect.
We are updating the docs to recommend the use of more secure ed25519 keys, created as per the article.
If a manual clone doesn't work, it suggests Bitbucket doesn't have your public key set up correctly.
Also, a more complete debugging command is
runuser -u pe-puppet -- /opt/puppetlabs/puppet/bin/r10k -c /opt/puppetlabs/server/data/code-manager/r10k.yaml deploy environment production --puppetfile --verbose debug2
FOLLOWUP
On investigation we found https://support.puppet.com/hc/en-us/articles/227829007, which showed that ssh:// is required at the start of r10k_remote, giving an example remote of ssh://git@bitbucket.org:davidsandilands/control-repo.git
I have requested updates to https://support.puppet.com/hc/en-us/articles/227829007 to highlight that this is not a version-confined issue, and have asked for the Puppet Code Manager configuration docs to be updated to reflect that this may be required.
I see that you have a .pub file in the ssh directory. I believe it's expecting a private key there.
Also, do you have the master class set up to point to your repo inside the Puppet Enterprise web UI?
You'll want to set the following parameters on that class.
code_manager_auto_configure = true
r10k_private_key = $PRIVATE_KEY_IN_SSH_FOLDER_ABSOLUTE_PATH
r10k_remote = Your git URL
The PE Master node group can be found in the PE web UI under Node Groups -> PE Infrastructure -> PE Master.
Thanks to @david-sandilands for helping me resolve this and guiding me to this article via the Puppet community Slack. Top guy!
EDIT 1:
The solution was documented here: https://support.puppet.com/hc/en-us/articles/227829007-Fix-your-Bitbucket-Stash-Code-Manager-configuration-in-Puppet-Enterprise-2015-3-to-2017-2
However, the documentation was out of date, as the issue affects version 2021.4 as well.
In short:
r10k_remote = "ssh://git#git.company.com:1234/project/control-repo.git"
Not
r10k_remote = "git#git.company.com:1234/project/control-repo.git"
When working with Bitbucket Server.
EDIT 2:
Puppet have since updated their documentation:
https://puppet.com/docs/pe/2021.5/code_mgr_config.html#code_mgr_enable

Connecting to different ARN/Role/Amazon Account when trying to deploy

I previously had Serverless installed on a server, and then when I tried to edit the function and package it back up to edit the zip file I broke it, so I have to start all over. So to begin: I had Serverless running and was using it with this package - https://github.com/adieuadieu/serverless-chrome/tree/master/examples/serverless-framework/aws
When I run sudo npm run deploy, I get this ServerlessError:
ServerlessError: User: arn:aws:sts::XXX:assumed-role/EC2CodeDeploy/i-268b1acf is not authorized to perform: cloudformation:DescribeStackResources on resource: arn:aws:cloudformation:us-east-1:YYY:stack/aws-dev/*
I'm not sure why it is trying to connect with a role and not an IAM user. So I check the role, and it is in an entirely different AWS account than the account I've configured. Let's call this Account B.
When it comes to configuration, I've installed the AWS CLI and entered the key, ID, and region for my Account A in AWS, not touching Account B whatsoever. When I run aws s3 ls I see the correct S3 buckets for the account with that key/ID/region, so I know the CLI is working with the correct account. Sounds good. I check the ~/.aws/credentials file and it has just one profile, [default], which seems normal. No other profiles are in there. I copied this over to the ~/.aws/config file so now both files are the same. Works great.
I then go into my SSH session where I've installed Serverless and run npm run deploy, and it gives me the same message as above. I think maybe somehow it is not using the correct account for whatever reason, so I manually set the access key and secret with the following command:
serverless config credentials --provider aws --key XXX --secret YYY
It tells me there is already a profile in the AWS creds file, so I then add --o to the end to overwrite. I run sudo npm run deploy and get the same error.
I then run this command to manually set a profile in the creds file for Serverless, with the profile name matching the IAM user name:
serverless config credentials --provider aws --key XXX --secret YYY --profile serverless-agent
Where "serverless-agent" is the name of the IAM user I've been trying to deploy with. I run this; it tells me there is already an existing profile in the AWS creds file, so I run it with --o and it tells me the AWS file is now updated. In bash I open the file in Vim and I only see the single "[default]" section, as if nothing has changed. I run sudo npm run deploy and it gives me the same error.
I then go and manually set the access and secret:
export AWS_ACCESS_KEY_ID=XXX
export AWS_SECRET_ACCESS_KEY=YYY
I run sudo npm run deploy and it gives me the same Error.
I even removed the AWS CLI and the directory that holds the credentials and config files - and when I manually set my account creds via serverless config it tells me there is already a profile set up in my AWS file, prompting me to use the overwrite command - how is this possible when the file is literally not on my computer?
So I then think that Serverless itself has a cache or something, calling the wrong file or whatever for creds, so I uninstall Serverless via sudo npm uninstall -g serverless so that I can start from zero again. I then do all of the above steps and more all over again, and nothing has changed. Same error message.
I do have Apex.run set up, but that should be using my AWS CLI config file, so I'm not sure if that is causing any problems. But then again I have no deep knowledge of this subject, and I can't find anything in their docs about removing Apex itself.
In the package I am trying to deploy, I do not have profile: XXX set in the serverless.yml file, because I've read that if you don't, it just defaults to the [default] profile you have set in the AWS creds file on your computer. Just to check, I go into the serverless.yml file and set profile: default, and the error I now get when I run npm run deploy is
Profile default does not exist
How is that possible when I have the "default" profile set in my creds file? Then I remember that previously I ran the serverless config credentials command and added the profile name serverless-agent to it (yet it didn't save in the AWS creds file, as I mentioned above), so I add that profile name to the serverless.yml file just to see if this works, and get the same error of "Profile default does not exist".
So back to the error message. The role is in an account not even related to the IAM user in my AWS creds. Without knowing a lot about this, it's as if the Serverless config over SSH isn't correct or something. Is it using old creds I had set up in Apex.run? Why is the AWS creds file not updated with the profile when I manually set it with the serverless config command? I am using the same user account (but with a new key and secret) that I used a few weeks ago when I correctly deployed and my Lambda and API were set up for me on AWS. Boy do I miss those times and wish I hadn't messed up my existing Lambda functions without setting a version number beforehand, forcing me to start all over.
I am so confused. Any help would be greatly appreciated.
If you are using an IAM role, then you have to assume that role explicitly (for example through the AWS CLI or PowerShell) and deploy with the temporary credentials it returns.
I was also facing the same issue earlier, when we moved from a user to a role.
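For reference, a minimal sketch of assuming a role with the AWS CLI; the role ARN and session name are placeholders, and the three exported values come from the Credentials block of the response:
aws sts assume-role --role-arn arn:aws:iam::YYY:role/deploy-role --role-session-name serverless-deploy
export AWS_ACCESS_KEY_ID=<AccessKeyId from the response>
export AWS_SECRET_ACCESS_KEY=<SecretAccessKey from the response>
export AWS_SESSION_TOKEN=<SessionToken from the response>
With those environment variables exported in the same shell, the Serverless deploy should pick up the role's temporary credentials instead of whatever is in ~/.aws/credentials.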

Openshift : pods not being deleted

I am using OpenShift 3 and have been trying to get Fabric8 set up.
Things haven't been going too well, so I decided to remove all services and pods.
When I run
oc delete all -l provider=fabric8
The CLI output claims to have deleted a lot of pods; however, they are still showing in the web console, and I can run the same command again and get the exact same list of pods that the CLI claims it deleted.
How do I actually delete these pods?
Why is this not working as designed?
Thanks
Deletion is graceful by default, meaning the pods are given an opportunity to terminate themselves. You can force a graceless delete with oc delete all --grace-period=0 ...
You can also delete the pods forcefully as below; it works fine:
oc delete all -l provider=fabric8 --grace-period=0 --force
Sadly Jordan's response did not work for me on OpenShift 3.6.
Instead I used the option --now, which is equivalent to --grace-period=1.
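If only a single pod is stuck, the same flags can be applied to it directly; a minimal sketch, where the pod name is a placeholder:
# List the remaining pods, then force-delete a specific one
oc get pods
oc delete pod <POD_NAME> --grace-period=0 --force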

Error deploying rails app to heroku

I am following the Rails tutorial, and I am at the point where it instructs me to deploy the app to Heroku for the second time. I have successfully deployed an app in the past, but it will not work now.
I get this error : Permission denied (public key)
fatal: could not read from remote repository.
The remote exists and is correct, and when using "heroku keys" my key appears. I can add a new stack to Heroku as well. I also tried re-adding the key, and that did not work.
I am very confused; all the solutions I have found have not worked.
Sounds like you need to configure your SSH keys (usually located at ~/.ssh). Are you using GitHub? If so, your SSH keys should already be set up (you won't be able to push to github.com without setting them up).
If you haven't already set up your SSH keys, follow the instructions from GitHub to do so.
Once your SSH keys are set up, running 'git push heroku' should do the trick. Make sure Heroku is set up correctly by following the instructions from the tutorial.
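If the key exists but Heroku doesn't know about it, re-uploading it is quick. A minimal sketch, assuming an RSA key at the default path (adjust the filename if yours differs):
# Generate a key only if you don't already have one
ssh-keygen -t rsa
# Upload the public key to Heroku and check it is listed
heroku keys:add ~/.ssh/id_rsa.pub
heroku keys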
You are probably not deploying as the same user you deployed the first app as. If you are in a Linux environment, this probably means you deployed as root one time and tried to deploy as a regular user the other time; maybe you used sudo.
Or possibly you deleted your SSH public keys... or maybe you changed the permissions of your SSH keys.
I am not highly rated enough to comment, so please navigate to ~/.ssh and type "ls -l" so I can see your permissions. Then navigate one directory up to ~/ and type "ls -la" so I can also see the permissions on the .ssh folder itself.
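For comparison, these are the permissions SSH typically expects; a minimal sketch assuming a default id_rsa key pair (adjust the filenames to your setup):
# The .ssh directory should be accessible only by you
chmod 700 ~/.ssh
# Private keys must not be readable by other users
chmod 600 ~/.ssh/id_rsa
# Public keys can be world-readable
chmod 644 ~/.ssh/id_rsa.pub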
