I'm running the community Gerrit cookbook in Docker using chef-solo.
If I run the cookbook as a build step in a Dockerfile, it throws an error (see the log below). But if I run the image, go inside the container, and run the same command, it works fine.
Any idea what's going on?
It's complaining about sudo, yet it continues and creates the symbolic links. 'target_mode = nil' should not be a problem, since the same message appears when I run the command inside the container, where everything works fine. It ends up complaining about the init.d script, which does not make sense.
chef-solo as a build step:
RUN chef-solo --log_level debug -c /resources/solo.rb -j /resources/node.json
Logs:
[ :08+01:00] INFO: Processing ruby_block[gerrit-init] action run (gerrit::default line 225)
sudo: sorry, you must have a tty to run sudo
[ :08+01:00] INFO: /opt/gerrit/war/gerrit-2.7.war exist....initailizing gerrit
[ :08+01:00] INFO: ruby_block[gerrit-init] called
[ :08+01:00] INFO: Processing link[/etc/init.d/gerrit] action create (gerrit::default line 240)
[ :08+01:00] DEBUG: link[/etc/init.d/gerrit] created symbolic link from /etc/init.d/gerrit -> /opt/gerrit/install/bin/gerrit.sh
[ :08+01:00] INFO: link[/etc/init.d/gerrit] created
[ :08+01:00] DEBUG: found target_mode == nil, so no mode was specified on resource, not managing mode
[ :08+01:00] DEBUG: found target_uid == nil, so no owner was specified on resource, not managing owner
[ :08+01:00] DEBUG: found target_gid == nil, so no group was specified on resource, not managing group
[ :08+01:00] INFO: Processing link[/etc/rc3.d/S90gerrit] action create (gerrit::default line 244)
[ :08+01:00] DEBUG: link[/etc/rc3.d/S90gerrit] created symbolic link from /etc/rc3.d/S90gerrit -> ../init.d/gerrit
[ :08+01:00] INFO: link[/etc/rc3.d/S90gerrit] created
[ :08+01:00] DEBUG: found target_mode == nil, so no mode was specified on resource, not managing mode
[ :08+01:00] DEBUG: found target_uid == nil, so no owner was specified on resource, not managing owner
[ :08+01:00] DEBUG: found target_gid == nil, so no group was specified on resource, not managing group
[ :08+01:00] INFO: Processing service[gerrit] action enable (gerrit::default line 248)
[ :08+01:00] DEBUG: service[gerrit] supports status, running
================================================================================
Error executing action `enable` on resource 'service[gerrit]'
================================================================================
Chef::Exceptions::Service
-------------------------
service[gerrit]: unable to locate the init.d script!
Resource Declaration:
---------------------
# In /var/chef/cookbooks/gerrit/recipes/default.rb
248: service 'gerrit' do
249: supports :status => false, :restart => true, :reload => true
250: action [ :enable, :start ]
251: end
252:
Compiled Resource:
------------------
# Declared in /var/chef/cookbooks/gerrit/recipes/default.rb:248:in `from_file'
service("gerrit") do
action [:enable, :start]
supports {:status=>true, :restart=>true, :reload=>true}
retries 0
retry_delay 2
guard_interpreter :default
service_name "gerrit"
pattern "gerrit"
cookbook_name :gerrit
recipe_name "default"
end
Containers are not virtual machines: they run a single process and do not have a process manager running. This explains why chef-solo has trouble with service resources.
I would suggest reading about some of the emerging support that chef is designing for containers:
https://docs.getchef.com/containers.html
https://github.com/opscode/chef-init
I won't pretend it makes a lot of sense on first read. I have yet to be convinced that Chef is the best way to build a container.
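To illustrate the usual container pattern (my sketch, not part of the cookbook): skip init scripts and service resources entirely and launch the daemon as the container's foreground process, assuming gerrit.sh supports a foreground run mode like Jetty-style scripts do:
# Dockerfile sketch: run Gerrit as PID 1 instead of enabling an init.d service
CMD ["/opt/gerrit/install/bin/gerrit.sh", "run"]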
The actual error was sudo: sorry, you must have a tty to run sudo. No terminal (tty) is assigned during a docker build, for security reasons.
By default Docker runs as root, so there is no need to use sudo. The cookbook I was running created a 'gerrit' user, which is what forced the sudo calls. I removed the user and ran everything as root. Solved!
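If you do need sudo during a build (for example, to keep the cookbook's 'gerrit' user), one hedged workaround on RHEL/CentOS-style base images is to relax sudo's tty requirement before the chef-solo step; the sed pattern assumes /etc/sudoers contains a Defaults requiretty line, which is not shown in the original:
# Sketch: disable sudo's tty requirement, then run chef-solo as before
RUN sed -i 's/^Defaults\s\+requiretty/Defaults !requiretty/' /etc/sudoers
RUN chef-solo --log_level debug -c /resources/solo.rb -j /resources/node.json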
I want to use buildah from gitlab-ci, in order to build an image, run a container from it and do some tests against it.
My current gitlab-ci is:
tests:
  tags:
    - docker
  image: quay.io/buildah/stable
  stage: test
  variables:
    STORAGE_DRIVER: "vfs"
    BUILDAH_FORMAT: "docker"
    BUILDAH_ISOLATION: "rootless"
  only:
    refs:
      - merge_requests
    changes:
      - "**/*"
  script:
    - buildah info --debug
    - buildah unshare docker/test/run.sh
My runner is a private GitLab runner; I don't want to change its configuration (so as not to break other CI jobs).
The content of run.sh is:
#!/usr/bin/env bash
set -euo pipefail
container=$(buildah from --ulimit nofile=8192 --name my-container phusion/baseimage:bionic-1.0.0-amd64)
The error is:
level=warning msg="error reading allowed ID mappings: error reading subuid mappings for user \"root\" and subgid mappings for group \"root\": No subuid ranges found for user \"root\" in /etc/subuid"
level=warning msg="Found no UID ranges set aside for user \"root\" in /etc/subuid."
level=warning msg="Found no GID ranges set aside for user \"root\" in /etc/subgid."
No buildah sali-container already exists... Package Sali Creating sali-container
Completed short name "phusion/baseimage" with unqualified-search registries (origin: /etc/containers/registries.conf)
Getting image source signatures
Copying blob sha256:36505266dcc64eeb1010bd2112e6f73981e1a8246e4f6d4e287763b57f101b0b
Copying blob sha256:1907967438a7f3c5ff54c8002847fe52ed596a9cc250c0987f1e2205a7005ff9
Copying blob sha256:23884877105a7ff84a910895cd044061a4561385ff6c36480ee080b76ec0e771
Copying blob sha256:2910811b6c4227c2f42aaea9a3dd5f53b1d469f67e2cf7e601f631b119b61ff7
Copying blob sha256:bc38caa0f5b94141276220daaf428892096e4afd24b05668cd188311e00a635f
Copying blob sha256:53c90fd859186b7b770d65adcb6ae577d4c61133f033e628530b1fd8dc0af643
Copying blob sha256:d039079bb3a9bf1acf69e7c00db0e6559a86148c906ba5dab06b67c694bbe87c
Copying config sha256:32c929dd2961004079c1e35f8eb5ef25b9dd23f32bc58ac7eccd72b4aa19f262
Writing manifest to image destination
Storing signatures
level=error msg="Error while applying layer: ApplyLayer exit status 1 stdout: stderr: potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/gshadow): Check /etc/subuid and /etc/subgid: lchown /etc/gshadow: invalid argument"
4 errors occurred while pulling:
* Error initializing source docker://registry.fedoraproject.org/phusion/baseimage:bionic-1.0.0-amd64: Error reading manifest bionic-1.0.0-amd64 in registry.fedoraproject.org/phusion/baseimage: manifest unknown: manifest unknown
* Error initializing source docker://registry.access.redhat.com/phusion/baseimage:bionic-1.0.0-amd64: Error reading manifest bionic-1.0.0-amd64 in registry.access.redhat.com/phusion/baseimage: name unknown: Repo not found
* Error initializing source docker://registry.centos.org/phusion/baseimage:bionic-1.0.0-amd64: Error reading manifest bionic-1.0.0-amd64 in registry.centos.org/phusion/baseimage: manifest unknown: manifest unknown
* Error committing the finished image: error adding layer with blob "sha256:23884877105a7ff84a910895cd044061a4561385ff6c36480ee080b76ec0e771": ApplyLayer exit status 1 stdout: stderr: potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/gshadow): Check /etc/subuid and /etc/subgid: lchown /etc/gshadow: invalid argument
level=error msg="exit status 125"
level=error msg="exit status 125"
The result of buildah info --debug:
{
"debug": {
"buildah version": "1.18.0",
"compiler": "gc",
"git commit": "",
"go version": "go1.15.2"
},
"host": {
"CgroupVersion": "v1",
"Distribution": {
"distribution": "fedora",
"version": "33"
},
"MemFree": 9021378560,
"MemTotal": 15768850432,
"OCIRuntime": "runc",
"SwapFree": 0,
"SwapTotal": 0,
"arch": "amd64",
"cpus": 4,
"hostname": "runner-cvBUQadt-project-2197143-concurrent-0",
"kernel": "4.14.83+",
"os": "linux",
"rootless": false,
"uptime": "6391h 28m 15.45s (Approximately 266.29 days)"
},
"store": {
"ContainerStore": {
"number": 0
},
"GraphDriverName": "vfs",
"GraphOptions": [
"vfs.imagestore=/var/lib/shared"
],
"GraphRoot": "/var/lib/containers/storage",
"GraphStatus": {},
"ImageStore": {
"number": 0
},
"RunRoot": "/var/run/containers/storage"
}
}
I read other posts about these errors and arrived at this configuration, which is not enough. I chose buildah thinking it would be easy to use from CI, since it is supposed to run rootless, but this is a real nightmare... I am a poor lonesome developer, not a sysadmin, and I don't understand how to set up Linux for buildah... Can somebody help me?
Buildah needs to run either as root or within a user namespace with sufficient UIDs/GIDs available to install files owned by different UIDs.
It looks like buildah decided it should run within a user namespace and then found no subordinate ID ranges set aside for root in that namespace. This usually happens when you run without enough privileges.
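For reference, a minimal sketch of what setting those ranges aside looks like; the account name and range values are illustrative assumptions (the log above complains about root, so the account in your case may differ):
# Set aside subordinate UID/GID ranges so buildah's user namespace
# has more than one ID to map (run on the host or in the runner image)
echo "build:100000:65536" >> /etc/subuid
echo "build:100000:65536" >> /etc/subgid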
I have an App Engine service that I deploy as a custom runtime in the flexible environment. Deployments functioned normally on 11/20. On 11/21, gcloud app deploy stopped using the Dockerfile and began treating the service as a non-custom runtime. Neither the app.yaml nor the Dockerfile has changed.
Below are sample logs from 11/20 and 11/21, respectively. You will note that the Using Dockerfile found in... line from the first log is not present in the second.
First log, 11/20:
2020-11-20 11:12:02,202 DEBUG root Loaded Command Group: ['gcloud', 'app']
2020-11-20 11:12:02,547 DEBUG root Loaded Command Group: ['gcloud', 'app', 'deploy']
2020-11-20 11:12:02,551 DEBUG root Running [gcloud.app.deploy] with arguments: [--project: "distributed-computing-qa", --version: "9-2-0rc9"]
2020-11-20 11:12:02,621 INFO oauth2client.client Refreshing access_token
2020-11-20 11:12:03,043 DEBUG root Loading runtimes experiment config from [gs://runtime-builders/experiments.yaml]
2020-11-20 11:12:03,076 INFO root Reading [<googlecloudsdk.api_lib.storage.storage_util.ObjectReference object at 0x0000021920ECA548>]
2020-11-20 11:12:03,526 DEBUG root API endpoint: [https://appengine.googleapis.com/], API version: [v1]
2020-11-20 11:12:04,419 INFO ___FILE_ONLY___ Services to deploy:
2020-11-20 11:12:04,420 INFO ___FILE_ONLY___ descriptor: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci\app.yaml]
source: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]
target project: [distributed-computing-qa]
target service: [default]
target version: [9-2-0rc9]
target url: [https://distributed-computing-qa.uc.r.appspot.com]
2020-11-20 11:12:05,272 DEBUG root No bucket specified, retrieving default bucket.
2020-11-20 11:12:05,274 DEBUG root Using bucket [gs://staging.distributed-computing-qa.appspot.com].
2020-11-20 11:12:05,941 DEBUG root Service [appengineflex.googleapis.com] is already enabled for project [distributed-computing-qa]
2020-11-20 11:12:06,109 INFO ___FILE_ONLY___ Beginning deployment of service [default]...
2020-11-20 11:12:06,123 INFO root Ignoring directory [node_modules]: Directory matches ignore regex.
2020-11-20 11:12:09,085 INFO root Ignoring directory [server\node_modules]: Directory matches ignore regex.
2020-11-20 11:12:09,679 INFO root Using Dockerfile found in C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci
2020-11-20 11:12:09,679 INFO ___FILE_ONLY___ Building and pushing image for service [default]
2020-11-20 11:12:10,305 DEBUG root Could not call git with args ('config', '--get-regexp', 'remote\\.(.*)\\.url'): Command '['git', 'config', '--get-regexp', 'remote\\.(.*)\\.url']' returned non-zero exit status 1.
2020-11-20 11:12:10,305 INFO root Could not generate [source-context.json]: Could not list remote URLs from source directory: C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci
2020-11-20 11:12:37,592 INFO root Uploading [C:\Users\BENJAM~1\AppData\Local\Temp\tmpwbdhi28f\src.tgz] to [staging.distributed-computing-qa.appspot.com/us.gcr.io/distributed-computing-qa/appengine/default.9-2-0rc9:latest]
2020-11-20 11:13:03,413 DEBUG root Using builder image: [gcr.io/cloud-builders/docker]
Second log, 11/21:
2020-11-21 05:10:39,041 DEBUG root Loaded Command Group: ['gcloud', 'app']
2020-11-21 05:10:39,177 DEBUG root Loaded Command Group: ['gcloud', 'app', 'deploy']
2020-11-21 05:10:39,181 DEBUG root Running [gcloud.app.deploy] with arguments: [--project: "distributed-computing-qa", --version: "9-2-0rc10"]
2020-11-21 05:10:39,203 DEBUG root Loading runtimes experiment config from [gs://runtime-builders/experiments.yaml]
2020-11-21 05:10:39,231 INFO root Reading [<googlecloudsdk.api_lib.storage.storage_util.ObjectReference object at 0x000001E60B3ED208>]
2020-11-21 05:10:39,522 DEBUG root API endpoint: [https://appengine.googleapis.com/], API version: [v1]
2020-11-21 05:10:40,196 INFO ___FILE_ONLY___ Services to deploy:
2020-11-21 05:10:40,198 INFO ___FILE_ONLY___ descriptor: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci\app.yaml]
source: [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]
target project: [distributed-computing-qa]
target service: [default]
target version: [9-2-0rc10]
target url: [https://distributed-computing-qa.uc.r.appspot.com]
2020-11-21 05:10:44,749 DEBUG root No bucket specified, retrieving default bucket.
2020-11-21 05:10:44,758 DEBUG root Using bucket [gs://staging.distributed-computing-qa.appspot.com].
2020-11-21 05:10:45,460 DEBUG root Service [appengineflex.googleapis.com] is already enabled for project [distributed-computing-qa]
2020-11-21 05:10:45,645 INFO ___FILE_ONLY___ Beginning deployment of service [default]...
2020-11-21 05:10:45,658 INFO root Ignoring directory [node_modules]: Directory matches ignore regex.
2020-11-21 05:10:48,255 INFO root Ignoring directory [server\node_modules]: Directory matches ignore regex.
2020-11-21 05:10:57,261 DEBUG root Could not call git with args ('config', '--get-regexp', 'remote\\.(.*)\\.url'): Command '['git', 'config', '--get-regexp', 'remote\\.(.*)\\.url']' returned non-zero exit status 1.
2020-11-21 05:10:57,261 INFO root Could not find any remote repositories associated with [C:\Users\Benjamin Filkins\Documents\Projects\Deployment\QA\dci]. Cloud diagnostic tools may not be able to display the correct source code for this deployment.
2020-11-21 05:11:19,099 DEBUG root Skipping upload of [.env]
2020-11-21 05:11:19,099 INFO root Incremental upload skipped 100.0% of data
This is now occurring on four separate projects, and a co-worker confirms the same behavior. What I have tried and can confirm:
Updated Google Cloud SDK to latest version (319.0.0)
Confirmed Cloud Build API is active
Confirmed the Cloud Build service account has the App Engine Admin, Cloud Build Service Account and Service Account User roles
App.yaml and Dockerfile present in root and unchanged between attempts
App.yaml contains runtime: custom and env: flex
What I cannot confirm with certainty or prove did not have an impact:
Changes in OS (Windows 10), though no update had occurred during this time period
Changes in my GCP service account roles/permissions, though given the spread across four distinct projects and multiple affected users, this seems incredibly unlikely
Any additional insight into this issue or additional items I may have missed would be greatly appreciated.
I solved the issue by downgrading to SDK version 271.0.0. My machine has both Python 2.7 and 3, and I noted that 274 and above began supporting Python 3.
Upgrading to 274 or above reproduces the reported issue; 273 and below (I went as far back as 267) does not. While I am currently unable to provide concrete evidence, my suspicion is that it comes down to how the SDK decides which version of Python to prefer. As noted here, support for Python 2 was deprecated on 09/30/2020.
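If you want to test that suspicion without downgrading, a hedged sketch: CLOUDSDK_PYTHON is the SDK's documented way to pin the interpreter it runs under (the interpreter path below is an assumption about your machine):
REM Windows: force gcloud to use a specific Python, then deploy as usual
set CLOUDSDK_PYTHON=C:\Python27\python.exe
gcloud app deploy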
BACKGROUND:
We are trying to deploy an app as a Docker container through the AWS Greengrass Docker application deployment connector to an edge device (running Greengrass Core as a container in a Linux environment).
We configure the Greengrass group connector in the cloud for the Docker app deployment.
ISSUES:
When deploying from the Greengrass group (AWS cloud), we see a successful deployment message, but the application is not deployed to the edge device (running Greengrass Core as a container).
LOGS:
DockerApplicationDeploymentLog:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
[2020-11-05T10:35:44.789Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][WARN]-ipc_client.py:162,deprecated arg port=8000 will be ignored
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:41,docker deployer starting up
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:45,checking inputs
[2020-11-05T10:35:45.012Z][INFO]-docker_deployer.py:52,docker group permissions
[2020-11-05T10:35:45.02Z][FATAL]-lambda_runtime.py:141,Failed to import handler function "handlers.function_handler" due to exception: "getgrnam(): name not found: 'docker'"
RuntimeSystemLog:
[2020-11-05T10:31:49.78Z][DEBUG]-Restart worker because it was killed. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Reserve worker. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Doing start attempt: {"Attempt count": 0, "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6"}
[2020-11-05T10:31:49.78Z][DEBUG]-Creating directory. {"dir": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.78Z][DEBUG]-changed ownership {"path": "/greengrass/ggc/packages/1.11.0/var/lambda/8b0ee21d-e481-4d27-5e30-cb4d912547f5", "new uid": 121, "new gid": 121}
[2020-11-05T10:31:49.782Z][DEBUG]-Resolving environment variable {"Variable": "PYTHONPATH=/greengrass/ggc/deployment/lambda/arn.aws.lambda.ap-south-1.aws.function.DockerApplicationDeployment.6"}
[2020-11-05T10:31:49.79Z][DEBUG]-Resolving environment variable {"Variable": "PATH=/usr/bin:/usr/local/bin"}
[2020-11-05T10:31:49.799Z][DEBUG]-Resolving environment variable {"Variable": "DOCKER_DEPLOYER_DOCKER_COMPOSE_DESTINATION_FILE_PATH=/home/ggc_user"}
[2020-11-05T10:31:49.82Z][DEBUG]-Creating new worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.82Z][DEBUG]-Starting worker process. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:49.829Z][DEBUG]-Worker process started. {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:49.83Z][DEBUG]-Start work result: {"workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "funcArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "state": "Starting", "initDurationSeconds": 0.012234454}
[2020-11-05T10:31:49.831Z][INFO]-Created worker. {"functionArn": "arn:aws:lambda:ap-south-1:aws:function:DockerApplicationDeployment:6", "workerId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5", "pid": 20471}
[2020-11-05T10:31:53.155Z][DEBUG]-Received a credential provider request {"serverLambdaArn": "arn:aws:lambda:::function:GGTES", "clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager getting work {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "funcArn": "arn:aws:lambda:::function:GGTES", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully GET work. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager putting work result. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-WorkManager put work result successfully. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "invocationId": "955c2c43-1187-4001-7988-4213b95eb584"}
[2020-11-05T10:31:53.156Z][DEBUG]-Successfully POST work result. {"invocationId": "955c2c43-1187-4001-7988-4213b95eb584", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.157Z][DEBUG]-Handled a credential provider request {"clientId": "8b0ee21d-e481-4d27-5e30-cb4d912547f5"}
[2020-11-05T10:31:53.158Z][DEBUG]-GET work item. {"fromWorkerId": "148f7a1a-168f-40a5-682d-92e00d56a5df", "ofFunction": "arn:aws:lambda:::function:GGTES"}
[2020-11-05T10:31:53.158Z][DEBUG]-Worker timer doesn't exist. {"workerId": "148f7a1a-168f-40a5-682d-92e00d56a5df"}
Did you double-check that you meet the requirements listed in
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html
https://docs.aws.amazon.com/greengrass/latest/developerguide/docker-app-connector.html#docker-app-connector-linux-user
I don't know this particular error, but it complains about a missing basic user/group setting:
[2020-11-05T10:35:42.632Z][FATAL]-lambda_runtime.py:381,Failed to initialize Lambda runtime due to exception: "getgrnam(): name not found: 'docker'"
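Going by that getgrnam failure and the linked requirements page, the 'docker' group the connector expects is missing, or the Greengrass user is not a member of it. A hedged sketch of the usual fix on the core host (the ggc_user name comes from those docs; adjust to your setup):
# Create the docker group if it does not exist, then add the Greengrass user
sudo groupadd docker
sudo usermod -aG docker ggc_user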
I'm new to EB and AWS. My Docker images build fine but fail to run on Elastic Beanstalk. My suspicion is that they are not connecting to the database correctly; however, I'm not getting anything useful when I run eb logs from the command line. Here are the errors:
{
"status": "FAILURE",
"api_version": "1.0",
"results": [
{
"status": "FAILURE",
"msg": "(TRUNCATED)...rrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:.
Check snapshot logs for details.
Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/04run.sh failed.
For more detail, check /var/log/eb-activity.log using console or EB CLI",
"returncode": 1,
"events": [
{
"msg": "Successfully pulled node:0.12.2-slim",
"severity": "TRACE",
"timestamp": 1432142064
},
{
"msg": "Successfully built aws_beanstalk/staging-app",
"severity": "TRACE",
"timestamp": 1432142094
},
{
"msg": "Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details.",
"severity": "ERROR",
"timestamp": 1432142102
}
]
}
],
"truncated": "true"
}
And after the build completes:
[2015-05-20T17:15:02.694Z] INFO [8603] - [CMD-AppDeploy/AppDeployStage0/AppDeployPreHook/04run.sh] : Activity execution failed, because: cat: /var/app/current/Dockerrun.aws.json: No such file or directory
cat: /var/app/current/Dockerrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details. (ElasticBeanstalk::ExternalInvocationError)
caused by: cat: /var/app/current/Dockerrun.aws.json: No such file or directory
cat: /var/app/current/Dockerrun.aws.json: No such file or directory
73927c49adff622a1a229d9369bdd80674d96d20f3eb99a9cdea786f4411a368
Docker container quit unexpectedly after launch: Docker container quit unexpectedly on Wed May 20 17:15:02 UTC 2015:. Check snapshot logs for details. (Executor::NonZeroExitStatus)
The Docker containers work locally, so what else can I do to figure out what's going wrong? I keep hearing about "snapshot logs", but where do I check them? Are they the same output I'm already getting from eb logs?
I had this issue for a day or two. I managed to see the logs by going to AWS Console > Elastic Beanstalk > Environment > ${YOUR_APPLICATION_ENV}.
In the left pane:
Logs > Request Logs > Download > open in any text editor, then look at
/var/log/eb-docker/containers/eb-current-app/
Follow that path and you will see what is causing the error and can fix it.
Assuming you have SSH access to the EC2 instance running your container, these are a few log files useful for debugging single container Docker instances in Beanstalk:
/tmp/docker_build.log
/tmp/docker_pull.log
/tmp/docker_run.log
To look at the error logs for the running process, first read the
/tmp/docker_run.log file. It contains the Docker container id. Something like this:
c6ae58e4ad77e926f6a8230237acf95771c6b5d80d48fb1bc20591f964fd690c
The first few characters should match the container id listed by the docker ps command. Use this value to find the corresponding log file in the following directory:
/var/log/eb-docker/containers/eb-current-app/
The format of the file name is eb-docker-ps-id-stdouterr.log
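Putting those pieces together, a small sketch; the exact file-name glob is inferred from the description above, so treat it as an assumption:
# Tail the stdout/stderr log of the container recorded in /tmp/docker_run.log
CID=$(cat /tmp/docker_run.log)
sudo tail -n 100 /var/log/eb-docker/containers/eb-current-app/*${CID:0:12}*-stdouterr.log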
I had this issue when my containers were crashing because no traffic was allowed between the Elastic Beanstalk instance and RDS. If you use a database, try curling it. Also, you might try sudo docker logs CONTAINER_ID to catch something useful. It might also help to launch the container manually from the instance; there's a slight possibility something will come up.
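For the manual-launch idea, a sketch using the image name that appears in the build events above (aws_beanstalk/staging-app); the flags are illustrative:
# Run the EB-built image in the foreground to watch it crash
sudo docker images
sudo docker run -it --rm aws_beanstalk/staging-app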
I get the following error while starting one of the instances on OpsWorks. Does anyone have any ideas about this error?
This is what is printed before the actual error (added at sethvargo's request):
[2014-08-13T17:27:08+00:00] INFO: Processing directory[/srv/www/instance/shared/cached-copy] action delete (opsworks_delayed_job::deploy line 48)
[2014-08-13T17:27:08+00:00] INFO: Processing ruby_block[change HOME to /home/deploy for source checkout] action run (opsworks_delayed_job::deploy line 56)
[2014-08-13T17:27:08+00:00] INFO: ruby_block[change HOME to /home/deploy for source checkout] called
[2014-08-13T17:27:08+00:00] INFO: Processing deploy[/srv/www/instance] action deploy (opsworks_delayed_job::deploy line 65)
[2014-08-13T17:27:09+00:00] INFO: deploy[/srv/www/instance] cloning repo git@github.com:xx/xx.git to /srv/www/instance/shared/cached-copy
[2014-08-13T17:27:17+00:00] INFO: deploy[/srv/www/instance] checked out branch: master onto: deploy reference: 714153bbb6a37f0484526cf4da3eda4fcd8df977
[2014-08-13T17:27:17+00:00] INFO: deploy[/srv/www/instance] synchronizing git submodules
[2014-08-13T17:27:17+00:00] INFO: deploy[/srv/www/instance] enabling git submodules
[2014-08-13T17:27:18+00:00] INFO: deploy[/srv/www/instance] set user to deploy
[2014-08-13T17:27:18+00:00] INFO: deploy[/srv/www/instance] set group to www-data
[2014-08-13T17:27:22+00:00] INFO: deploy[/srv/www/instance] copied the cached checkout to /srv/www/instance/releases/20140813172708
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] set user to deploy
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] set group to www-data
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] running callback before_migrate
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] created directories before symlinking: tmp,public,config
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] linked shared paths into current release: system => public/system, pids => tmp/pids, log => log
[2014-08-13T17:27:23+00:00] INFO: deploy[/srv/www/instance] made pre-migration symlinks
[2014-08-13T17:27:24+00:00] INFO: deploy[/srv/www/instance] set user to deploy
[2014-08-13T17:27:24+00:00] INFO: deploy[/srv/www/instance] set group to www-data
[2014-08-13T17:27:24+00:00] INFO: Gemfile detected. Running bundle install.
[2014-08-13T17:27:24+00:00] INFO: sudo su - deploy -c 'cd /srv/www/instance/releases/20140813172708 && /usr/local/bin/bundle install --path /home/deploy/.bundler/instance --without=test development'
Here is the error:
================================================================================
Error executing action `deploy` on resource 'deploy[/srv/www/instance]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '127'
Cookbook Trace:
---------------
/var/lib/aws/opsworks/cache.stage2/cookbooks/opsworks_commons/libraries/shellout.rb:8:in `shellout'
/var/lib/aws/opsworks/cache.stage2/cookbooks/rails/libraries/rails_configuration.rb:41:in `bundle'
/var/lib/aws/opsworks/cache.stage2/cookbooks/deploy/definitions/opsworks_deploy.rb:103:in `block (3 levels) in from_file'
Resource Declaration:
---------------------
# In /var/lib/aws/opsworks/cache.stage2/cookbooks/deploy/definitions/opsworks_deploy.rb
65: deploy deploy[:deploy_to] do
66: provider Chef::Provider::Deploy.const_get(deploy[:chef_provider])
67: keep_releases deploy[:keep_releases]
68: repository deploy[:scm][:repository]
69: user deploy[:user]
70: group deploy[:group]
71: revision deploy[:scm][:revision]
72: migrate deploy[:migrate]
73: migration_command deploy[:migrate_command]
74: environment deploy[:environment].to_hash
75: create_dirs_before_symlink( deploy[:create_dirs_before_symlink] )
76: symlink_before_migrate( deploy[:symlink_before_migrate] )
77: action deploy[:action]
78:
79: if deploy[:application_type] == 'rails'
80: restart_command "sleep #{deploy[:sleep_before_restart]} && #{node[:opsworks][:rails_stack][:restart_command]}"
81: end
82:
With credit to Seth Vargo: the problem was that the bundler gem was not being installed by OpsWorks (the Chef version is 11.10). We had to add the bundler gem manually to the default Chef setup file.
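For completeness, a hedged sketch of the kind of setup step that fixes it; the gem binary path is an assumption, and the version matches the custom JSON in the answer below:
# Install bundler during Setup so the deploy recipe's `bundle install` succeeds
sudo /usr/local/bin/gem install bundler --version 1.16.3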
I faced the same issue while booting an instance under OpsWorks.
After debugging, I found the reason: the Chef version wasn't specified anywhere in the Stack or Layer settings, so while running the recipes a default version of Chef was picked up that did not have bundler installed by default. When the recipe tried to run bundle install, it exited with an error.
The simple solution is to explicitly add the bundler configuration, along with other settings (if any), as below under the stack or layer settings:
{
<other settings>
"opsworks_bundler": {
"manage_package": "true",
"version": "1.16.3"
}
}