Packer-built OpenStack instance stuck in "Spawning" state - jenkins

I'm trying to build a new Debian image with Packer, but the build process halts at ==> openstack: Waiting for server to become ready..., while Packer's build instance is stuck in the Spawning state.
(Edit: My last test build was stuck for ~45 minutes, and exited with this error message: Build 'openstack' errored: Error waiting for server ({uuid}) to become ready: unexpected state 'ERROR', wanted target '[ACTIVE]')
The source image is a cloud image of Debian, and my template file looks like this:
{
  "variables": {
    "os_auth_url": " ( Keystone URL ) ",
    "os_domain_name": " ( Domain Name ) ",
    "os_tenant_name": " ( Project Name ) ",
    "os_region_name": " ( Region Name ) "
  },
  "builders": [
    {
      "type": "openstack",
      "flavor": "b.tiny",
      "image_name": "packer-openstack-{{timestamp}}",
      "source_image": "cd8da3bf-66cd-4847-8970-447533b86b30",
      "ssh_username": "debian",
      "username": "{{user `username`}}",
      "password": "{{user `password`}}",
      "identity_endpoint": "{{user `os_auth_url`}}",
      "domain_name": "{{user `os_domain_name`}}",
      "tenant_name": "{{user `os_tenant_name`}}",
      "region": "{{user `os_region_name`}}",
      "floating_ip_pool": "internet",
      "security_groups": [
        "deb_test_uni"
      ],
      "networks": [
        "a4151f4e-fd88-4df8-97e1-2b113f149ef8",
        "71b10496-2617-47ae-abbc-36239f0863bb"
      ]
    }
  ]
}
The username and password variables are supplied from a separate file located on the (Jenkins) build server.
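For reference, this is roughly how such a variable file is passed on the command line (a minimal sketch; credentials.json and debian-template.json are hypothetical file names for the variable file and the template shown above):
# Validate the template together with the external variable file, then build.
packer validate -var-file=credentials.json debian-template.json
packer build -var-file=credentials.json debian-template.json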
The build process managed to get past this point once, but then exited with an SSH timeout error. I have no idea why that happened, or why it only happened that one time.
Is there anything blindingly obvious that I'm missing? Or has anyone else suffered the same problem, but managed to find a solution?
Thanks in advance!

It turns out that, in my case, there was nothing I (personally) could do. It was neither the Packer template nor the environment variables (as I suspected it might be), but a fault in the server-side configuration.
I'm sorry that I don't know what the bug or fix was, as I wasn't the one who found or fixed the problem, but knowing that it can be a good idea to double-check the server setup might help someone in the future.
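For anyone debugging a similar ERROR state, a hedged suggestion: if the failed instance is still around, the fault reason reported by the compute service can be inspected with the OpenStack CLI (the UUID below is a placeholder):
# Show the server details, including the fault field, for the errored instance.
openstack server show <server-uuid> -f json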

Related

Pushing an image to ECR, getting "Retrying in ... seconds"

I recently created a new repository in AWS ECR, and I'm attempting to push an image. I'm copy/pasting the directions provided via the "View push commands" button on the repository page. I'll copy those here for reference:
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-west-2.amazonaws.com
("Login succeeded")
docker build -t myorg/myapp .
docker tag myorg/myapp:latest 123456789.dkr.ecr.us-west-2.amazonaws.com/myorg/myapp:latest
docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/myorg/myapp:latest
However, when I get to the docker push step, I see:
> docker push 123456789.dkr.ecr.us-west-2.amazonaws.com/myorg/myapp:latest
The push refers to repository [123456789.dkr.ecr.us-west-2.amazonaws.com/myorg/myapp]
a53c8ed5f326: Retrying in 1 second
78e16537476e: Retrying in 1 second
b7e38d172e62: Retrying in 1 second
f1ff72b2b1ca: Retrying in 1 second
33b67aceeff0: Retrying in 1 second
c3a550784113: Waiting
83fc4b4db427: Waiting
e8ade0d39f19: Waiting
487d5f9ec63f: Waiting
b24e42eb9639: Waiting
9262398ff7bf: Waiting
804aae047b71: Waiting
5d33f5d87bf5: Waiting
4e38024e7e09: Waiting
EOF
I'm wondering if this has something to do with the permissions/policies associated with this repository. Right now there are no statements attached to this repository. Is that the missing part? If so, what would that statement look like? I've tried this, but it had no effect:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowPutImage",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:root"
      },
      "Action": "ecr:PutImage"
    }
  ]
}
Bonus Points:
I eventually want to use this in a CDK CodeBuildAction. I was getting the same error as above, so I checked whether I was getting the same result in my local terminal, which I am. So if the policy statement needs to be different for use in the CDK CodeBuildAction, those details would be appreciated as well.
Thank you in advance for any advice.
I was having the same problem when trying to upload the image manually using the AWS and Docker CLIs. I was able to fix it by going into ECR -> Repositories -> Permissions, then adding a new policy statement with principal:* and the following actions:
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"ecr:CompleteLayerUpload",
"ecr:GetDownloadUrlForLayer",
"ecr:InitiateLayerUpload",
"ecr:PutImage",
"ecr:UploadLayerPart"
Be sure to add more restrictive principals. I was just trying to see if permissions were the problem in this case and sure enough they were.
The accepted answer works correctly in resolving the issue. However, as has been mentioned in the answer, allowing principal:* is risky and can get your ECR compromised.
Be sure to add specific principal(s), i.e. IAM users/roles, so that only those users/roles are allowed to execute the mentioned "Actions". The following JSON policy can be added under Amazon ECR >> Repositories >> (select the required repository) >> Permissions >> Edit policy JSON to resolve this quickly:
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AccountNumber>:role/<RoleName>"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchGetImage",
        "ecr:CompleteLayerUpload",
        "ecr:GetDownloadUrlForLayer",
        "ecr:InitiateLayerUpload",
        "ecr:PutImage",
        "ecr:UploadLayerPart"
      ]
    }
  ]
}
I had this issue when the repository didn't exist in ECR - I assumed that pushing would create it, but it didn't.
Creating it before pushing solved the problem.
It turns out it was a missing/misconfigured policy. I was able to get it working within CodeBuild by adding a role with the AmazonEC2ContainerRegistryPowerUser managed policy:
new CodeBuildAction({
  actionName: "ApplicationBuildAction",
  input: this.applicationSourceOutput,
  outputs: [this.applicationBuildOutput],
  project: new PipelineProject(this, "ApplicationBuildProject", {
    vpc: this.codeBuildVpc,
    securityGroups: [this.codeBuildSecurityGroup],
    environment: {
      buildImage: LinuxBuildImage.STANDARD_5_0,
      privileged: true,
    },
    environmentVariables: {
      ECR_REPO_URI: {
        value: ECR_REPO_URI,
      },
      ECR_REPO_NAME: {
        value: ECR_REPO_NAME,
      },
      AWS_REGION: {
        value: this.region,
      }
    },
    buildSpec: BuildSpec.fromObject({
      version: "0.2",
      phases: {
        pre_build: {
          commands: [
            "echo 'Logging into Amazon ECR...'",
            "aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REPO_URI",
            "COMMIT_HASH=$(echo \"$CODEBUILD_RESOLVED_SOURCE_VERSION\" | head -c 8)"
          ]
        },
        build: {
          commands: [
            "docker build -t $ECR_REPO_NAME:latest ."
          ]
        },
        post_build: {
          commands: [
            "docker tag $ECR_REPO_NAME:latest $ECR_REPO_URI/$ECR_REPO_NAME:latest",
            "docker tag $ECR_REPO_NAME:latest $ECR_REPO_URI/$ECR_REPO_NAME:$COMMIT_HASH",
            "docker push $ECR_REPO_URI/$ECR_REPO_NAME:latest",
            "docker push $ECR_REPO_URI/$ECR_REPO_NAME:$COMMIT_HASH",
          ]
        }
      }
    }),
    // * * ADDED THIS ROLE HERE * *
    role: new Role(this, "application-build-project-role", {
      assumedBy: new ServicePrincipal("codebuild.amazonaws.com"),
      managedPolicies: [ManagedPolicy.fromAwsManagedPolicyName("AmazonEC2ContainerRegistryPowerUser")]
    })
  }),
});
In my case, the repo was not created on ECR. Creating it fixed it.
The same message ("Retrying in ... seconds" in a loop) may be seen when running "docker push" without first creating the corresponding repo in ECR ("myorg/myapp" in your example). Run:
aws ecr create-repository --repository-name myorg/myapp --region us-west-2
The problem is that your IAM user does not have permission for full access to ECR, so attach a policy granting full ECR access to your IAM user.
Follow the photo for the policy attachment.
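For reference, the AWS managed full-access policy can also be attached from the CLI, roughly like this (a sketch; my-iam-user is a placeholder user name):
# Attach the AWS managed ECR full-access policy to the IAM user.
aws iam attach-user-policy \
  --user-name my-iam-user \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess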
For anyone running into this issue, my problem was having the wrong AWS profile/account configured in my AWS CLI.
Run aws configure and add the keys of the account that has access to the ECR repository.
If you have multiple AWS accounts using the CLI, then check out this solution.
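For example, a named profile can be configured and then used explicitly for the ECR login, roughly like this (a sketch; prod is a hypothetical profile name):
# Configure a separate named profile for the account that owns the ECR repo.
aws configure --profile prod
# Log in to ECR using that profile explicitly.
aws ecr get-login-password --region us-west-2 --profile prod | \
  docker login --username AWS --password-stdin 123456789.dkr.ecr.us-west-2.amazonaws.com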
Just had this problem. It was permission related. In my case I was using CDKv2, which assumes a specific role in order to upload assets. Because the user I was deploying as did not have permission to assume that role, it failed. The hint was these warning messages that appeared during the deploy:
current credentials could not be used to assume 'arn:aws:iam::12345:role/cdk-abcde1234-image-publishing-role-12345-ap-southeast-2', but are for the right account. Proceeding anyway.
current credentials could not be used to assume 'arn:aws:iam::12345:role/cdk-abcde1234-file-publishing-role-12345-ap-southeast-2', but are for the right account. Proceeding anyway.
Yes, updating the permissions on your ECR repo would fix it, but since CDK is supposed to maintain this for you, the proper solution is to allow your user to assume the CDK role so you don't need to mess with ECR permissions yourself.
In my case I did this by granting the sts:AssumeRole permission for the resource arn:aws:iam::*:role/cdk-*. This allowed my user to assume both the file upload role and the image upload role.
After granting this permission, the CDK errors about being unable to assume the role went away, and I was able to deploy successfully.
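A rough sketch of how such a grant could be attached as an inline policy from the CLI (deploy-user and allow-cdk-assume-roles are hypothetical names):
# Allow the deploying user to assume the CDK bootstrap roles (cdk-*).
aws iam put-user-policy \
  --user-name deploy-user \
  --policy-name allow-cdk-assume-roles \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::*:role/cdk-*"
    }]
  }'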
For me, the problem was that the repository name on ECR had to be the same as the name of the app/repository I was pushing. I tried all the fixes here; they didn't work. This did!
Browse ECR -> Repositories -> Permissions
Edit JSON Policy.
Add these actions.
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"ecr:CompleteLayerUpload",
"ecr:GetDownloadUrlForLayer",
"ecr:InitiateLayerUpload",
"ecr:PutImage",
"ecr:UploadLayerPart"
And add "*" in Resources.
Save it.
You're good to go; now you can push the image to ECR.
If you have an MFA enforcement policy on your account, that might be the problem, because you need a session token before these actions are allowed. Take a look at this AWS document on getting a token via the CLI.
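Roughly, getting a session token with MFA looks like this (a sketch; the MFA device ARN and token code are placeholders, and the returned temporary credentials then need to be exported before running docker login/push):
# Request temporary credentials using the MFA device and the current token code.
aws sts get-session-token \
  --serial-number arn:aws:iam::123456789012:mfa/my-user \
  --token-code 123456
# Export the returned AccessKeyId, SecretAccessKey and SessionToken as
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.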
I was uploading from an EC2 instance and had forgotten to specify the region for my AWS CLI. The login was successful, but the docker push command kept retrying, even though I had set the correct permissions on the ECR repo side.
This line fixed the issue for me:
aws configure set default.region us-west-1
In my case I had used the wrong AWS credentials, and running aws configure with the correct credentials resolved the issue.

VScode remote container debug unable to find malloc.c

I have been trying to debug C++ code via VScode on a remote Docker container. While this works for two of my colleagues, it doesn't for me. We all use the same Docker image, so I suspect it's something in my VScode, but I don't know what.
I get the following error when debugging the source code.
Unable to open 'malloc.c': Unable to read file 'vscode-remote://attached-container+7b22636f6e7461696e65724e616d65223a222f637070616e74227d/build/glibc-S9d2JN/glibc-2.27/malloc/malloc.c' (Error: Unable to resolve non-existing file 'vscode-remote://attached-container+7b22636f6e7461696e65724e616d65223a222f637070616e74227d/build/glibc-S9d2JN/glibc-2.27/malloc/malloc.c').
I can "fix" this by extracting glibc in /build/, but I would rather have it fix forever and not have the same issue with another docker container (possible). Glibc is installed in the Docker container at /usr/src/glibc. I found it by running find / -iname glibc.
To run the application from VScode on the remote docker container, I use this launch.json file:
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "(gdb) Launch Program",
      "type": "cppdbg",
      "request": "launch",
      "program": "${workspaceFolder}/build/src/application",
      "args": [],
      "stopAtEntry": false,
      "cwd": "${workspaceFolder}/build/src",
      "environment": [],
      "externalConsole": false,
      "MIMode": "gdb",
      "setupCommands": [
        {
          "description": "Enable pretty-printing for gdb",
          "text": "-enable-pretty-printing",
          "ignoreFailures": true
        }
      ]
    }
  ]
}
Not sure if this information is necessary, but it can't do harm.
host: Windows 10
docker container: ubuntu 18:04
visual studio code version: 1.55.0
Hopefully, this is enough information to resolve the issue I'm facing.

How to provide a valid crumb in ansible jenkins_script module

I am using Ansible to check the status of several Jenkins servers. The playbook that I have created checks the disk space, uptime, and Jenkins version perfectly fine. However, when I tried to add a task that prints out a list of the installed Jenkins plugins for each server using the jenkins_script module, I kept receiving a '403' error message.
Playbook:
- name: Obtaining a list of Jenkins Plugins
  jenkins_script:
    script: 'println(Jenkins.instance.pluginManager.plugins)'
    url: 'http://server.com:8080/'
    user: '*****'
    password: '*****'
Output:
fatal: [server]: FAILED! => {
    "changed": false,
    "failed": true,
    "invocation": {
        "module_args": {
            "args": null,
            "force_basic_auth": true,
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "script": "println(Jenkins.instance.pluginManager.plugins)",
            "url": "http://server.com:8080/",
            "url_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "url_username": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "user": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "validate_certs": true
        }
    },
    "msg": "HTTP error 403 HTTP Error 403: No valid crumb was included in the request"
}
-- I believe I have narrowed down the issue: it looks like I wasn't providing a crumb. I have since generated the crumb, but there is no 'crumb' argument for the jenkins_script module. Does anyone know how to successfully provide a crumb?
Will gladly clarify anything stated above if needed, and any assistance is greatly appreciated.
https://github.com/ansible/ansible/pull/20207
-- if you're on Ansible 2.3, the changes have already been committed; all you have to do is make sure 'cross site request forgery' protection is enabled on the Jenkins servers (Manage Jenkins > Configure Global Security).
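For reference, a crumb can also be fetched and used manually with curl, roughly like this (a sketch; user:apitoken and the URL are placeholders, and this assumes the crumb issuer is enabled):
# Fetch a crumb in "HeaderName:value" form from the crumb issuer.
CRUMB=$(curl -s -u user:apitoken \
  'http://server.com:8080/crumbIssuer/api/xpath?xpath=concat(//crumbRequestField,":",//crumb)')
# Pass the crumb as a header when calling the script console endpoint.
curl -s -u user:apitoken -H "$CRUMB" \
  --data-urlencode 'script=println(Jenkins.instance.pluginManager.plugins)' \
  http://server.com:8080/scriptText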

Consul watch with critical consul checks

So I have a consul check that watches over a container and is designed to go critical when the container is stopped. I want to create a consul watch that will run a script after the check has gone critical, or after several critical responses (for example if my check sends 5 critical responses I want it to run a script).
Here is the JSON for my working check and my guess as to what my watch might look like:
{
  // this check works
  "checks": [
    {
      "id": "docker_stuff",
      "name": "curl test",
      "notes": "curls the docker container",
      "script": "/scripts/docker.py",
      "interval": "1s"
    }
  ],
  // this watch doesn't work
  "watches": [
    {
      "Node": "client2",
      "CheckID": "docker-stuff",
      "Name": "docker-stuff-watch",
      "Status": "critical",
      "Status_amt": "5",
      "handler": "/scripts/new-docker.sh",
      "Output": "container relaunched",
    }
  ]
}
What do I need to change in my watch to get it working?
Would I also need to use a consul event to watch my health check and then trigger a consul watch (of the event type) that runs my /scripts/new-docker.sh script? If so, how would I make a consul event that watches over my health check? For example, if this was my consul check, watch, and event, what would I need to change to get this working?
{
  "checks": [
    {
      "id": "docker_stuff",
      "name": "curl test",
      "notes": "curls the docker container",
      "script": "/scripts/docker.py",
      "interval": "1s"
    }
  ],
  "watches": [
    {
      "type": "event",
      "name": "docker-stuff-watch",
      "handler": "/scripts/new-docker.sh"
    }
  ],
  "events": [
    {
      "Node": "client2",
      "CheckID": "docker-stuff",
      "Name": "docker-stuff-event",
      "Status": "critical",
      "Status_amt": "5",
      "Output": "container relaunched",
    }
  ]
}
What do I need to change in my watch to get it working?
Are there any errors? Make sure your watch handler '/scripts/new-docker.sh' is consuming the STDIN that Consul will be sending, even if it is just throwing it away to /dev/null; otherwise the process will wait forever for it to be consumed.
Something like
while read -r -t 0; do read -r; done
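A minimal handler sketch along those lines (assuming the handler just relaunches a hypothetical container named myapp; the details of your /scripts/new-docker.sh will differ):
#!/bin/sh
# Drain the JSON payload Consul writes to STDIN so the handler never blocks.
cat > /dev/null
# Take the corrective action, e.g. restart the container being monitored.
docker restart myapp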
I would recommend considering an upgrade to the next version of Docker, 1.12 (a release candidate at the moment). The new concept of services can be used to state the desired number of containers to be run.
https://docs.docker.com/engine/swarm/swarm-tutorial/deploy-service/
There's also a new HEALTHCHECK directive in the Dockerfile that enables you to bundle a check script with the container image.
These new features might enable you to replace the functionality you've had to implement using consul.
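For example, with swarm mode initialized, a service that keeps a fixed number of replicas of a (hypothetical) image running looks roughly like this:
# Swarm restarts containers for this service if they stop, keeping 3 replicas up.
docker service create --name myapp --replicas 3 myorg/myapp:latest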

How do I use ${workspaceRoot} for my Electron app in Visual Studio Code?

I have an Electron app that I was able to debug in Visual Studio Code. After I upgraded to version 0.10.8, it no longer runs.
I am getting the error message below in my launch.json file:
Relative paths will no longer be automatically converted to absolute ones. Consider using ${workspaceRoot} as a prefix.
Absolute path to the runtime executable to be used. Default is the runtime executable on the PATH.
Here is my launch.json file:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "My First Electron App",
      "type": "node",
      "request": "launch",
      "program": "$(workspaceRoot}/app/main.js", //ERROR
      "stopOnEntry": false,
      "args": [],
      "cwd": "$(workspaceRoot}",
      "runtimeExecutable": "$(workspaceRoot}/node_modules/electron-prebuilt/dist/electron.app/Contents/MacOS/Electron", //ERROR
      "runtimeArgs": [
        "--nolazy"
      ],
      "env": {
        "NODE_ENV": "development"
      },
      "externalConsole": false,
      "sourceMaps": false,
      "outDir": null
    },
    {
      "name": "Attach",
      "type": "node",
      "request": "attach",
      "port": 5858
    }
  ]
}
I am getting the green squiggly line mentioned for the two lines with //ERROR at the end.
I saw this article, but honestly I'm not familiar enough with VS Code to understand how this should be implemented: https://code.visualstudio.com/Docs/editor/tasks#_variable-substitution
UPDATE
I replaced the value for "cwd" with "${workspaceRoot}" as recommended by Isidor. The green squiggly line went away.
I updated the error message that I am still seeing on the other two lines.
When I hit F5 I get this error message:
request 'launch': runtime executable '/private/var/git/electron-vs-code/$(workspaceRoot}/node_modules/electron-prebuilt/dist/electron.app/Contents/MacOS/Electron' does not exist
There is a typo in your json. Change the parenthesis after the $ in $(workspaceRoot} to a curly brace. This should at least fix the warning.
Even though you are getting the relative-path warning, VS Code still automatically converts relative paths to absolute ones in 0.10.8. To get rid of the warning for "cwd", put "${workspaceRoot}" instead of ".".
What happens when you try to debug your Electron app? Do you see some other error, since the relative-to-absolute conversion cannot be the true cause of this? If you open the command palette and choose Open Developer Tools, do you see an error in the console?
