Concourse pending for long time before running task - docker

I have a Concourse pipeline with a task that uses a Docker image stored on our local Artifactory server. Every time I start the pipeline it takes about 5 minutes until the tasks finally run.
I assume that Concourse somehow checks for newer versions of the Docker image. Unfortunately I have no way to debug this, since none of the log files on the Concourse worker VM offer usable information.
My Questions:
How can I debug what is going on while Concourse says "preparing build" and the status is "pending"?
Is there any way to stop Concourse from checking for a newer version of the Docker image? I tagged the Docker image with latest - might this be an issue?
Any further ideas on how I could speed things up?
Here is the detailed configuration of my pipeline and tasks:
pipeline.yml:
---
resources:
- name: concourse-image
  type: docker-image
  source:
    repository: OUR_DOMAIN/subpath/concourse
    username: ...
    password: ...
    insecure_registries:
    - OUR_DOMAIN
# ...
jobs:
- name: deploy
  public: true
  plan:
  - get: concourse-image
  - task: create-manifest
    image: concourse-image
    file: concourse/tasks/create-manifest/task.yml
    params:
      # ...
task.yml:
---
platform: linux
inputs:
- name: git
- name: concourse
outputs:
- name: deployment-manifest
run:
  path: concourse/tasks/create-and-upload-cloud-config/task.sh

The reason for this problem was that we pulled the Docker image from an internal Docker registry that runs on HTTP only. Concourse tried to pull the image over HTTPS, and it took around 5 minutes until it fell back to HTTP (that's what a tcpdump on the worker showed us).
Changing the resource configuration to the following solved the problem:
resources:
- name: concourse-image
  type: docker-image
  source:
    repository: OUR_SERVER:80/subpath/concourse
    username: docker-readonly
    password: docker-readonly
    insecure_registries:
    - OUR_SERVER:80
So basically the fix was adding the port explicitly to both the repository and the insecure_registries entries.
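If part of the delay comes from tracking a moving latest tag, the docker-image resource also accepts a pinned tag in its source, which makes the check unambiguous. A minimal sketch (the tag value 1.0.0 is hypothetical):

resources:
- name: concourse-image
  type: docker-image
  source:
    repository: OUR_SERVER:80/subpath/concourse
    tag: 1.0.0            # hypothetical pinned tag instead of the implicit "latest"
    username: docker-readonly
    password: docker-readonly
    insecure_registries:
    - OUR_SERVER:80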

Related

GitHub Actions - Running DotNetCore Tests with Neo4j service dependency

I need some help with the steps to run integration tests on GitHub. My project needs Neo4j, so I have declared neo4j as a service. However, while running my test project, I am seeing the following error:
Connection with the server breaks due to ExtendedSocketException:
Resource temporarily unavailable Please ensure that your database is listening on the correct
host and port and that you have compatible encryption settings both on Neo4j server and driver.
Note that the default encryption setting has changed in Neo4j 4.0.
I am not sure if the issue is with how I am specifying the actions or something to do with Neo4j.
Here's my GitHub CI File: dotnetcore.yml
name: .NET Core

on:
  push:
    branches: [ master, GithubActions ]
  pull_request:
    branches: [ master ]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      NEO4J_HOST: neo4j

    # Service containers to run with `container-job`
    services:
      # Label used to access the service container
      neo4j:
        # Docker Hub image
        image: neo4j:4.0.1
        ports:
          - 7474:7474 # used for http
          - 7687:7687 # used for bolt
        env:
          NEO4J_AUTH: neo4j/password
          NEO4J_dbms_connector_http_advertised__address: "NEO4J_HOST:7687"
          NEO4J_dbms_connector_bolt_advertised__address: "NEO4J_HOST:7687"

    steps:
    - uses: actions/checkout@v2
    - name: Setup .NET Core
      uses: actions/setup-dotnet@v1
      with:
        dotnet-version: 3.1.101
    - name: Install dependencies
      run: dotnet restore ./src/BbcCorp.Neo4j.NeoGraphManager.sln
    - name: Build
      run: dotnet build --configuration Release --no-restore ./src/BbcCorp.Neo4j.NeoGraphManager.sln
    - name: Run Integration Tests
      env:
        NEO4J_SERVER: NEO4J_HOST
      run: |
        cd ./src/BbcCorp.Neo4j.Tests
        docker ps
        dotnet test --no-restore --verbosity normal BbcCorp.Neo4j.Tests.csproj
I am trying to connect to: bolt://NEO4J_HOST:7687 but it just doesn't connect.
Error Details:
Using Neo4j Server NEO4J_HOST:7687 as neo4j/password
Error executing query. Connection with the server breaks due to ExtendedSocketException: Resource temporarily unavailable Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0.
[xUnit.net 00:00:00.83] NeoGraphManagerIntegrationTests.SimpleNodeTests [FAIL]
[xUnit.net 00:00:00.83] Neo4j.Driver.ServiceUnavailableException : Connection with the server breaks due to ExtendedSocketException: Resource temporarily unavailable Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0.
[xUnit.net 00:00:00.83] ---- System.Net.Internals.SocketExceptionFactory+ExtendedSocketException : Resource temporarily unavailable
The settings file for my C# test project looks like this:
{
  "NEO4J_SERVER": "localhost",
  "NEO4J_PORT": 7687,
  "NEO4J_DB_USER": "neo4j",
  "NEO4J_DB_PWD": "password"
}
The tests run fine on my local machine but fail when executed on GitHub.
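One detail worth knowing here: GitHub Actions passes whatever is in a service container's options field to docker create, so you can attach a Docker health check and the job will wait until the service reports healthy before the steps run. A hedged sketch of the neo4j service with such options (the cypher-shell probe is an assumption about the tools shipped in the image; adjust it to whatever readiness check fits):

services:
  neo4j:
    image: neo4j:4.0.1
    ports:
      - 7474:7474
      - 7687:7687
    env:
      NEO4J_AUTH: neo4j/password
    # Docker health-check flags; the job only starts once the container is healthy
    options: >-
      --health-cmd "cypher-shell -u neo4j -p password 'RETURN 1'"
      --health-interval 10s
      --health-timeout 5s
      --health-retries 10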

Openshift: any deployment resulted in Application is not available

First time deploying to OpenShift (actually Minishift on my Windows 10 Pro). Every sample application I deployed successfully still resulted in "Application is not available".
From the Web Console I see a weird message "Build #1 is pending", although I saw it complete successfully from PowerShell.
I found someone fixing a similar issue by changing the listen address to 0.0.0.0, but I gave it a try and it isn't the solution in my case.
Here are the full logs and how I am deploying:
PS C:\to_learn\docker-compose-to-minishift\first-try> oc new-app https://github.com/openshift/nodejs-ex
warning: Cannot check if git requires authentication.
--> Found image 93de123 (16 months old) in image stream "openshift/nodejs" under tag "10" for "nodejs"

    Node.js 10.12.0
    ---------------
    Node.js available as docker container is a base platform for building and running various Node.js applications and frameworks. Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

    Tags: builder, nodejs, nodejs-10.12.0

    * The source repository appears to match: nodejs
    * A source build using source code from https://github.com/openshift/nodejs-ex will be created
      * The resulting image will be pushed to image stream tag "nodejs-ex:latest"
      * Use 'start-build' to trigger a new build
    * WARNING: this source repository may require credentials.
      Create a secret with your git credentials and use 'set build-secret' to assign it to the build config.
    * This image will be deployed in deployment config "nodejs-ex"
    * Port 8080/tcp will be load balanced by service "nodejs-ex"
      * Other containers can access this service through the hostname "nodejs-ex"

--> Creating resources ...
    imagestream.image.openshift.io "nodejs-ex" created
    buildconfig.build.openshift.io "nodejs-ex" created
    deploymentconfig.apps.openshift.io "nodejs-ex" created
    service "nodejs-ex" created
--> Success
    Build scheduled, use 'oc logs -f bc/nodejs-ex' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/nodejs-ex'
    Run 'oc status' to view your app.
PS C:\to_learn\docker-compose-to-minishift\first-try> oc get bc/nodejs-ex -o yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2020-02-20T20:10:38Z
  labels:
    app: nodejs-ex
  name: nodejs-ex
  namespace: samplepipeline
  resourceVersion: "1123211"
  selfLink: /apis/build.openshift.io/v1/namespaces/samplepipeline/buildconfigs/nodejs-ex
  uid: 1003675e-541d-11ea-9577-080027aefe4e
spec:
  failedBuildsHistoryLimit: 5
  nodeSelector: null
  output:
    to:
      kind: ImageStreamTag
      name: nodejs-ex:latest
  postCommit: {}
  resources: {}
  runPolicy: Serial
  source:
    git:
      uri: https://github.com/openshift/nodejs-ex
    type: Git
  strategy:
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: nodejs:10
        namespace: openshift
    type: Source
  successfulBuildsHistoryLimit: 5
  triggers:
  - github:
      secret: c3FoC0RRfTy_76WEOTNg
    type: GitHub
  - generic:
      secret: vlKqJQ3ZBxfP4HWce_Oz
    type: Generic
  - type: ConfigChange
  - imageChange:
      lastTriggeredImageID: 172.30.1.1:5000/openshift/nodejs@sha256:3cc041334eef8d5853078a0190e46a2998a70ad98320db512968f1de0561705e
    type: ImageChange
status:
  lastVersion: 1
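For context, "Application is not available" is the default page the OpenShift router returns when no route resolves to a running pod, so besides waiting for the build and deployment to finish, the service also needs a route. A sketch of the route that 'oc expose svc/nodejs-ex' creates (the target port name 8080-tcp is assumed from the 8080/tcp service generated above):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: nodejs-ex
  namespace: samplepipeline
spec:
  to:
    kind: Service
    name: nodejs-ex
  port:
    targetPort: 8080-tcp   # assumed name of the service port created by oc new-app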

scdf 2.1 k8s config security context non root no fs writable

I need to configure the SCDF 2 Skipper server, the SCDF server and the app pods to run without root and without writing to the pod filesystem.
I made changes to the config YAMLs:
data:
  application.yaml: |-
    spring:
      cloud:
        skipper:
          server:
            platform:
              kubernetes:
                accounts:
                  default:
                    namespace: default
                    deploymentServiceAccountName: scdf2-server-data-flow
                    securityContext:
                      runAsUser: 2000
                      allowPrivilegeEscalation: false
                    limits:
                      Colla
And the SCDF server starts and runs as user "2000" (there was a problem with the writable local Maven repo, fixed with an NFS PVC)...
But the app pods always start as root, not as user 2000.
I've changed the Skipper config with the securityContext shown above... any clues?
Thanks.
What you set as deploymentServiceAccountName is one of the Kubernetes deployer properties that can be used for deploying streaming applications or launching task applications.
It looks like the above configuration is not applied to your SCDF or Skipper server configuration properties, as it should at least get applied when deploying applications.
For the SCDF server and Skipper servers, in your SCDF/Skipper server deployment configurations, you need to explicitly set your serviceAccountName (not as deploymentServiceAccountName as its name suggests, the deploymentServiceAccountName is internally converted into the actual serviceAccountName for the respective stream/task apps when they get deployed).
We got it. We use it in the Skipper/SCDF deployment, not in the pod deployments.
As you requested, the SCDF/Skipper deployment config has:
spec:
  containers:
  - name: {{ template "scdf.fullname" . }}-server
    image: {{ .Values.server.image }}:{{ .Values.server.version }}
    imagePullPolicy: {{ .Values.server.imagePullPolicy }}
    volumeMounts:
    ...
  serviceAccountName: {{ template "scdf.serviceAccountName" . }}
Are you telling me to change the SCDF/Skipper config map for tasks and streams? Is there another property inside, or besides, the deployment config?
What is the relation between the "serviceaccount" and the user running the process inside the pod?
How is the service account related to running the process as user "2000"?
I can't understand it.
Please help, it is very important to run without root and without using the local filesystem from the pod, except for "tmp" files.
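One point worth separating out: the service account only controls what the pod may do against the Kubernetes API; the UID the container process runs as comes from the pod security context that the deployer attaches to the deployed apps. If I read the Kubernetes deployer properties correctly, the account settings would carry a podSecurityContext block rather than securityContext; a hedged sketch (the podSecurityContext property name and the fsGroup value are my assumptions):

spring:
  cloud:
    skipper:
      server:
        platform:
          kubernetes:
            accounts:
              default:
                namespace: default
                deploymentServiceAccountName: scdf2-server-data-flow
                # assumed deployer property, applied to the pods of deployed apps
                podSecurityContext:
                  runAsUser: 2000
                  fsGroup: 2000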

using ansible with docker-compose

I am trying to deploy a docker setup using Ansible playbook. For this, I am using docker_service.
My Playbook looks like:
---
- name: Run Docker compose
  hosts: all
  gather_facts: no
  tasks:
    - debug: msg="Container - {{ inventory_hostname }}"
    - docker_service:
        project_src: "compose"
        state: absent
    - docker_service:
        project_src: "compose"
        state: present
Upon running this simple playbook as:
ansible-playbook -v playbook.yml --ask-sudo-pass
I added --ask-sudo-pass to ensure that it was not a permission issue.
OUTPUT
SUDO password:

PLAY [Run Docker compose] ******************************************************

TASK [debug] *******************************************************************
ok: [prolims-staging] => {
    "msg": "Container - prolims-staging"
}

TASK [docker_service] **********************************************************
fatal: [prolims-staging]: FAILED! => {"changed": false, "msg": "Error connecting: Error while fetching server API version: ('Connection aborted.', error(13, 'Permission denied'))"}
        to retry, use: --limit @/data/prolims-provision/provision-docker.retry

PLAY RECAP *********************************************************************
prolims-staging            : ok=1    changed=0    unreachable=0    failed=1
I did try looking for this issue on other forums (and similar questions here on Stack Overflow too), but those were not helpful.
Note: I am able to run docker-compose successfully in the target machine from its CLI (using sudo).
Also, I tried playing around with docker_container. I executed a playbook with the contents below:
...
- name: check container status
  command: docker ps
  register: result

- name: Create a container
  docker_container:
    name: db_pg
    image: "postgres:latest"
    state: present
    recreate: yes
...
and running this playbook works perfectly fine.
I assume posting my docker-compose file is not relevant here.
I followed this example, but it did not work. Maybe I am missing some stupid or really important thing here.
Any help on understanding and resolving this issue would be appreciated.
I am able to run docker-compose successfully in the target machine from its CLI (using sudo).
So you need to use the become declaration for the task.
I added --ask-sudo-pass to ensure that it was not a permission issue.
Just adding --ask-sudo-pass to the ansible-playbook parameters doesn't have any effect unless the relevant tasks/plays have the become declaration (and become_method is set to sudo, which it is by default).
Reference.
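A minimal sketch of the playbook from the question with become applied at play level (module names and paths are as in the question; with this in place, --ask-become-pass, or the older --ask-sudo-pass, supplies the password used for escalation):

---
- name: Run Docker compose
  hosts: all
  gather_facts: no
  become: true              # run the tasks with privilege escalation (sudo by default)
  tasks:
    - debug: msg="Container - {{ inventory_hostname }}"
    - docker_service:
        project_src: "compose"
        state: absent
    - docker_service:
        project_src: "compose"
        state: present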

How to use wait_for in a two node cluster docker container deployment?

I have a cluster with two machines. I have a playbook to run a docker container in each one:
---
- hosts: machines_in_the_cluster
  tasks:
    - name: Run app container
      docker:
        name: "name"
        image: whatever:1.0
        pull: always
        state: reloaded
        ports:
          - "8080:8080"
It starts a Tomcat server on each machine, but I don't want to execute the task on the second machine until the first has finished starting Tomcat.
How can I solve it? Is there any kind of health checking via http? Is there a solution using wait_for?
You can run an entire playbook in serial mode to make sure that Ansible completes the entire playbook against a subset of hosts before moving on to another set.
You can do this simply by adding the serial parameter to the playbook like so:
---
- hosts: machines_in_the_cluster
  serial: 1
  tasks:
    - name: Run app container
      ...
You can specify either an absolute number of hosts to do at a time (the above example just does one at a time) or a percentage of the available hosts. Running this:
- hosts: machines_in_the_cluster
  serial: 50%
  tasks:
    - name: Run app container
      ...
when you have 5 hosts in the targeted group will run 3 times, targeting the first 2 hosts, then the next 2 hosts, and finally the last host.
I have solved it by adding a task that waits for the Tomcat startup:
- name: Wait for server startup
  local_action: wait_for host={{inventory_hostname}} port={{port}} timeout=50 delay=10
This will wait 10 seconds doing nothing and then start polling to check whether the port is available. If the timeout is reached, the task will fail.
I have also added serial: 1 to the playbook (as ydaetskcoR mentions) in order to run the tasks sequentially.
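Putting both answers together, a sketch of the full playbook with serial: 1 and the wait_for task (the port variable is assumed to be defined elsewhere, e.g. in group_vars):

---
- hosts: machines_in_the_cluster
  serial: 1                      # one host at a time
  tasks:
    - name: Run app container
      docker:
        name: "name"
        image: whatever:1.0
        pull: always
        state: reloaded
        ports:
          - "8080:8080"

    - name: Wait for server startup
      local_action: wait_for host={{inventory_hostname}} port={{port}} timeout=50 delay=10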
