I have a Jenkins server which builds Docker images (using the Docker Maven plugin).
These builds normally take about 40 seconds, but sometimes they take up to 1.5 hours.
Now I am wondering why, and how I can debug the situation.
The build output in the Jenkins console only tells me that it hangs during the Maven Docker build. Example output at the point where it hangs:
[INFO] ------------------------------------------------------------------------
[INFO] Building MY Docker Image MY Image 0.5.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> docker-maven-plugin:0.15.14:build (default-cli) > initialize @ docker-image-my-image >>>
[INFO]
[INFO] <<< docker-maven-plugin:0.15.14:build (default-cli) < initialize @ docker-image-my-image <<<
[INFO]
[INFO] --- docker-maven-plugin:0.15.14:build (default-cli) @ docker-image-my-image ---
[INFO] Building tar: /usr/home/jenkinshome/workspace/JOB_NAME/source-repo/docker/image-my-image/docker-build.tar
[INFO] DOCKER> docker-build.tar: Created [my-image] in 69 milliseconds
And now it hangs.
Can I somehow watch which operation Docker is hanging on?
UPDATE
The build also hangs on the host machine when started directly via docker build.
This is the end of the output when running it under strace:
futex(0xc820028908, FUTEX_WAKE, 1) = 1
clock_gettime(CLOCK_REALTIME, {1486567620, 329675667}) = 0
clock_gettime(CLOCK_REALTIME, {1486567620, 329756643}) = 0
socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
connect(3, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, 23) = 0
clock_gettime(CLOCK_REALTIME, {1486567620, 330059768}) = 0
epoll_create1(EPOLL_CLOEXEC) = 4
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3892185696, u64=139659048782432}}) = 0
getsockname(3, {sa_family=AF_LOCAL, NULL}, [2]) = 0
getpeername(3, {sa_family=AF_LOCAL, sun_path="/var/run/docker.sock"}, [23]) = 0
futex(0xc820028908, FUTEX_WAKE, 1) = 1
read(3, 0xc820349000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
write(3, "POST /v1.24/build?buildargs=%7B%"..., 349) = 349
futex(0xc820028d08, FUTEX_WAKE, 1) = 1
write(3, "7ff\r\nockerfile\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2054) = 2054
ioctl(0, TIOCGWINSZ, {ws_row=57, ws_col=105, ws_xpixel=0, ws_ypixel=0}) = 0
ioctl(0, TIOCGWINSZ, {ws_row=57, ws_col=105, ws_xpixel=0, ws_ypixel=0}) = 0
write(1, "Sending build context to Docker "..., 48Sending build context to Docker daemon 2.048 kB
) = 48
write(1, "\r\n", 2
) = 2
write(3, "0\r\n\r\n", 5) = 5
futex(0xc820028d08, FUTEX_WAKE, 1) = 1
futex(0x12d3a48, FUTEX_WAIT, 0, NULLStep 1 : FROM jboss-db
---> 74a0020a9922
Step 2 : MAINTAINER ***
---> Using cache
---> 5d38cbd5501b
Step 3 : USER root:root
---> Running in 64b14554d8be
And at this point it hangs.
You have a bunch of options here, and you'll have to feel out which one works best for you. You can:
strace the process on the Docker host.
Use something like sysdig on the host, which is easier to use and more detailed.
Do printf-style debugging, or some sort of logging to stdout/stderr that indicates your build is still alive, and then docker logs -f <build container> to see whether it's the build that's stuck or the Jenkins slave/master processes.
Any more detailed advice will probably need more information about your setup, what you mean by "hang", and exactly which component is hung.
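For the first two options, a minimal sketch (assuming root on the host and a daemon process named dockerd; older Docker versions run it as docker daemon instead):

# Attach strace to the running Docker daemon; -f follows child
# processes and -tt timestamps each syscall so stalls stand out.
sudo strace -f -tt -p "$(pidof dockerd)" -o /tmp/dockerd.trace

# Or, with sysdig, watch everything the Docker processes do.
sudo sysdig proc.name contains docker

# In parallel, follow the build container's stdout/stderr.
docker logs -f <build container>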
Related
My documentation tests are silently not executed in my Docker environment, while everything works on both Windows and Ubuntu/Debian hosts.
I created a minimal GitHub repository to demonstrate the issue. I tried two different versions of Rust nightly and Rust stable, debug/release, all without success. See my Dockerfile and the complete build output.
Example code:
/// Adds two numbers.
/// # Examples
///
/// ```
/// assert_eq!(cargo_test_doc_docker::add(1, 2), 3);
/// ```
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
Result when executing on Debian:
arturh@host:~/projects/cargo-test-doc-docker$ cargo test
Compiling cargo-test-doc-docker v0.1.0 (/home/arturh/projects/cargo-test-doc-docker)
Finished test [unoptimized + debuginfo] target(s) in 2.39s
Running target/debug/deps/cargo_test_doc_docker-9d5ae146cd4c3628
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/cargo_test_doc_docker-2a696d2579128ce1
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests cargo-test-doc-docker
running 1 test
test src/lib.rs - add (line 4) ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
The problem occurs when executing the build on Docker. This is a minimal Dockerfile that reproduces the problem:
FROM ekidd/rust-musl-builder:nightly-2020-01-26-openssl11 as build
COPY --chown=rust:rust . .
RUN cargo test; echo $?
Result for every Rust toolchain I tried:
Step 6/17 : RUN cargo test; echo $?
---> Running in b266fc72f3c1
Compiling cargo-test-doc-docker v0.1.0 (/home/rust/src)
Finished test [unoptimized + debuginfo] target(s) in 0.32s
Running target/x86_64-unknown-linux-musl/debug/deps/cargo_test_doc_docker-7b40e7e5b47f49eb
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/x86_64-unknown-linux-musl/debug/deps/cargo_test_doc_docker-0bfec9752a7bec14
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
0
It does not even try to execute any doc tests, and it exits with zero, so the problem is not easily noticed. I guess it must be something the Docker base image does, but what could that be?
Cross Compilation
This is a logical, if surprising, outcome of cross-compilation.
To understand why, imagine that you:
Compile on a Linux x64 machine (Host).
Target a Windows ARM machine.
The generated code cannot be executed on the current host (Linux x64): it is prepared for a different CPU (instruction set) and OS (system calls).
Since the tests -- unit tests, integration tests, and documentation tests -- are also generated for the target architecture, they cannot be executed on the host either.
What to do with the tests?
If your code has no dependency on a specific platform, then you can content yourself with compiling the tests for the host and running those.
Otherwise, you will need access to a machine that can actually run the cross-compiled binaries. You can still use cross-compilation to speed up building those binaries, and then upload them to either a physical or virtual machine to run them.
AFAIK Cargo does not help with the latter, so you'll need your own scripts.
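A hedged sketch of such a script (the target triple, host name, and crate name are illustrative):

# Cross-compile the test binaries without running them.
cargo test --no-run --target aarch64-unknown-linux-gnu

# Copy one of the compiled test binaries to a machine that can run it,
# then execute it there; the hash suffix differs per build.
scp target/aarch64-unknown-linux-gnu/debug/deps/mycrate-<hash> user@arm-box:/tmp/
ssh user@arm-box /tmp/mycrate-<hash>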
Shepmaster was right: when I target x86_64-unknown-linux-musl, it also does not work locally on Debian:
arturh@host:~/projects/cargo-test-doc-docker$ cargo test --target=x86_64-unknown-linux-musl; echo $?
Compiling cargo-test-doc-docker v0.1.0 (/home/arturh/projects/cargo-test-doc-docker)
Finished test [unoptimized + debuginfo] target(s) in 0.28s
Running target/x86_64-unknown-linux-musl/debug/deps/cargo_test_doc_docker-8dfff5631875d404
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/x86_64-unknown-linux-musl/debug/deps/cargo_test_doc_docker-eb877250b708174b
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
0
I guess I need a separate build step that runs the doc tests against the target x86_64-unknown-linux-gnu.
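A sketch of that split in the Dockerfile (assuming the base image also has the x86_64-unknown-linux-gnu toolchain installed, which is not guaranteed):

# Run the test suite, including doc tests, against the host (gnu) target...
RUN cargo test --target x86_64-unknown-linux-gnu
# ...then build the deployable binary for the musl target.
RUN cargo build --release --target x86_64-unknown-linux-musl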
OS: Ubuntu 16.04, NVIDIA Driver
I followed the Drake installation procedure as described on the Drake website (I have also installed the NVIDIA driver). After installation, as per the instructions, when I run:
$ xhost +local:root; nvidia-docker run -i --rm -e DISPLAY -e QT_X11_NO_MITSHM=1 -v /tmp/.X11-unix:/tmp/.X11-unix --privileged -t drake; xhost -local:root
I get the following error (the simulation is not displayed, but the build is successful):
non-network local connections being added to access control list
+ [[ 0 -eq 0 ]]
+ bazel build //tools:drake_visualizer //examples/acrobot:run_passive
Starting local Bazel server and connecting to it...
INFO: Analysed 2 targets (95 packages loaded, 18023 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 89.206s, Critical Path: 1.58s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
+ sleep 2
+ ./bazel-bin/tools/drake_visualizer
+ bazel run //examples/acrobot:run_passive
INFO: Analysed target //examples/acrobot:run_passive (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/acrobot:run_passive up-to-date:
bazel-bin/examples/acrobot/run_passive
INFO: Elapsed time: 1.031s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
process 297: D-Bus library appears to be incorrectly set up; failed to read machine uuid: UUID file '/etc/machine-id' should contain a hex string of length 32, not length 0, with no other text
See the manual page for dbus-uuidgen to correct this issue.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Could not initialize OpenGL for RasterGLSurface, reverting to RasterSurface.
Could not initialize OpenGL for RasterGLSurface, reverting to RasterSurface.
Could not initialize OpenGL for RasterGLSurface, reverting to RasterSurface.
Could not initialize GLX
./setup/ubuntu/docker/entrypoint.sh: line 15: 297 Aborted (core dumped) ./bazel-bin/tools/drake_visualizer
non-network local connections being removed from access control list
We are in the process of updating the instructions to use nvidia-docker 2.0. Please check the Drake repo again later this week for an update. In the meantime, you may wish to try the open-source driver instructions on the same page.
I set up a Command Line phase in a TFS build to execute a Robocopy command, and it returns exit code 1, although there are no errors during the Robocopy execution.
If I run the Robocopy command directly in cmd it works, and the job log shows that Robocopy works properly until the end:
2019-02-27T10:21:58.3234459Z                Total    Copied   Skipped  Mismatch    FAILED    Extras
2019-02-27T10:21:58.3234459Z Dirs : 1688 0 1688 0 0 0
2019-02-27T10:21:58.3234459Z Files : 6107 6 6101 0 0 0
2019-02-27T10:21:58.3234459Z Bytes : 246.01 m 299.2 k 245.71 m 0 0 0
2019-02-27T10:21:58.3234459Z Times : 0:00:17 0:00:00 0:00:00 0:00:17
2019-02-27T10:21:58.3234459Z
2019-02-27T10:21:58.3234459Z
2019-02-27T10:21:58.3234459Z Speed : 3879329 Bytes/sec.
2019-02-27T10:21:58.3234459Z Speed : 221.976 MegaBytes/min.
2019-02-27T10:21:58.3234459Z
2019-02-27T10:21:58.3234459Z Ended : Wed Feb 27 11:21:58 2019
2019-02-27T10:21:58.3702460Z ##[error]Process completed with exit code 1.
(The question included an image of the build configuration.)
Robocopy has exit codes > 0 even on success.
In your example, exit code 1 means "One or more files were copied successfully (that is, new files have arrived)."
To fix this you could create a PowerShell script which executes the copy and overwrites the exit code, like this:
param(
    [String] $sourcesDirectory,
    [String] $destinationDirectory,
    [String] $attributes
)

robocopy $sourcesDirectory $destinationDirectory $attributes

# Robocopy exit codes below 8 mean success (e.g. 1 = files copied).
if ($LASTEXITCODE -ge 8)
{
    throw ("An error occurred while copying. [RoboCopyCode: $($LASTEXITCODE)]")
}
else
{
    $global:LASTEXITCODE = 0
}
exit 0
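A hypothetical invocation from the TFS Command Line phase (the script path and arguments are placeholders):

powershell.exe -ExecutionPolicy Bypass -File copy-files.ps1 "$(Build.SourcesDirectory)" "\\server\share\drop" "/E"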
Robocopy uses exit codes differently: exit code 1 is not a real error, it just says that one or more files were copied successfully.
TFS treats exit code 1 as a real error and fails the build.
To solve that you need to change the Robocopy exit code:
(robocopy c:\dirA c:\dirB *.*) ^& IF %ERRORLEVEL% LEQ 1 exit 0
The ^& IF %ERRORLEVEL% LEQ 1 exit 0 converts exit code 1 to 0, and then the TFS build will not fail.
I am trying to use the Yocto tools for the first time, for my BeagleBone Black.
First I ran this bash script to set up Yocto:
#!/bin/bash
WKDIR=/work
mkdir -p $WKDIR/beaglebone-black/yocto/sources
mkdir -p $WKDIR/beaglebone-black/yocto/builds
cd $WKDIR/beaglebone-black/yocto/sources
git clone -b morty git://git.yoctoproject.org/poky.git poky-morty
cd $WKDIR/beaglebone-black/yocto/
source sources/poky-morty/oe-init-build-env builds/build-bbb-morty
Then I edited the file local.conf in the "build-bbb-morty/conf" directory:
MACHINE ?= "beaglebone"
and added
DL_DIR ?= "${TOPDIR}/../dl"
IMAGE_INSTALL_append = " kernel-modules kernel-devicetree"
Then I ran bitbake: bitbake core-image-minimal
After about 8 hours on my fifth-generation Core i7, I got this result in my terminal output, and I have no idea what I need to do to fix it:
bitbake core-image-minimal
Parsing recipes: 100% |########################################################################################################| Time: 0:02:55
Parsing of 864 .bb files complete (0 cached, 864 parsed). 1318 targets, 67 skipped, 0 masked, 0 errors.
NOTE: Resolving any missing task queue dependencies
Build Configuration:
BB_VERSION = "1.32.0"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "Ubuntu-16.04"
TARGET_SYS = "arm-poky-linux-gnueabi"
MACHINE = "beaglebone"
DISTRO = "poky"
DISTRO_VERSION = "2.2.1"
TUNE_FEATURES = "arm armv7a vfp neon callconvention-hard cortexa8"
TARGET_FPU = "hard"
meta
meta-poky
meta-yocto-bsp = "morty:a3fa5ce87619e81d7acfa43340dd18d8f2b2d7dc"
NOTE: Fetching uninative binary shim from http://downloads.yoctoproject.org/releases/uninative/1.4/x86_64-nativesdk-libc.tar.bz2;sha256sum=101ff8f2580c193488db9e76f9646fb6ed38b65fb76f403acb0e2178ce7127ca
--2017-01-18 15:51:09-- http://downloads.yoctoproject.org/releases/uninative/1.4/x86_64-nativesdk-libc.tar.bz2
Resolving downloads.yoctoproject.org (downloads.yoctoproject.org)... 198.145.20.127
Connecting to downloads.yoctoproject.org (downloads.yoctoproject.org)|198.145.20.127|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2473216 (2.4M) [application/octet-stream]
Saving to: ‘/work/beaglebone-black/yocto/builds/build-bbb-morty/../dl/uninative/101ff8f2580c193488db9e76f9646fb6ed38b65fb76f403acb0e2178ce7127ca/x86_64-nativesdk-libc.tar.bz2’
2017-01-18 15:51:18 (297 KB/s) - ‘/work/beaglebone-black/yocto/builds/build-bbb-morty/../dl/uninative/101ff8f2580c193488db9e76f9646fb6ed38b65fb76f403acb0e2178ce7127ca/x86_64-nativesdk-libc.tar.bz2’ saved [2473216/2473216]
Initialising tasks: 100% |#####################################################################################################| Time: 0:00:14
NOTE: Executing SetScene Tasks
NOTE: Executing RunQueue Tasks
WARNING: attr-native-2.4.47-r0 do_fetch: Failed to fetch URL http://download.savannah.gnu.org/releases/attr/attr-2.4.47.src.tar.gz, attempting MIRRORS if available
WARNING: libpng-native-1.6.24-r0 do_fetch: Failed to fetch URL http://distfiles.gentoo.org/distfiles/libpng-1.6.24.tar.xz, attempting MIRRORS if available
ERROR: core-image-minimal-1.0-r0 do_image_wic: Function failed: do_image_wic (log file is located at /work/beaglebone-black/yocto/builds/build-bbb-morty/tmp/work/beaglebone-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_wic.23788)
ERROR: Logfile of failure stored in: /work/beaglebone-black/yocto/builds/build-bbb-morty/tmp/work/beaglebone-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_wic.23788
Log data follows:
| DEBUG: Executing python function set_image_size
| DEBUG: Python function set_image_size finished
| DEBUG: Executing shell function do_image_wic
| Checking basic build environment...
| Done.
|
| Build artifacts not found, exiting.
| (Please check that the build artifacts for the machine
| selected in local.conf actually exist and that they
| are the correct artifacts for the image (.wks file))
|
| The artifact that couldn't be found was kernel-dir:
| /work/beaglebone-black/yocto/builds/build-bbb-morty/tmp/deploy/images/beaglebone
| WARNING: exit code 1 from a shell command.
| ERROR: Function failed: do_image_wic (log file is located at /work/beaglebone-black/yocto/builds/build-bbb-morty/tmp/work/beaglebone-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_wic.23788)
ERROR: Task (/work/beaglebone-black/yocto/sources/poky-morty/meta/recipes-core/images/core-image-minimal.bb:do_image_wic) failed with exit code '1'
NOTE: Tasks Summary: Attempted 1771 tasks of which 6 didn't need to be rerun and 1 failed.
Summary: 1 task failed:
/work/beaglebone-black/yocto/sources/poky-morty/meta/recipes-core/images/core-image-minimal.bb:do_image_wic
Summary: There were 2 WARNING messages shown.
Summary: There was 1 ERROR message shown, returning a non-zero exit code.
While I am not sure this is the reason for the problem, the preferred method to add packages to the image in the local.conf context is the CORE_IMAGE_EXTRA_INSTALL variable.
Therefore change:
IMAGE_INSTALL_append = " kernel-modules kernel-devicetree"
to
CORE_IMAGE_EXTRA_INSTALL += "kernel-modules kernel-devicetree"
I think there is no problem with your working method.
It seems to be a build environment problem, and the error log seems to confirm that.
Your log is located at "/work/beaglebone-black/yocto/builds/build-bbb-morty/tmp/work/beaglebone-poky-linux-gnueabi/core-image-minimal/1.0-r0/temp/log.do_image_wic.23788".
Your error log indicates that fetching binaries from the URLs failed.
You can try tunneling through a proxy, or you can run bitbake again, because fetches can also fail intermittently due to network conditions.
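A minimal sketch of the retry, reusing the environment setup from the question:

# Re-enter the build environment...
cd /work/beaglebone-black/yocto/
source sources/poky-morty/oe-init-build-env builds/build-bbb-morty

# ...and rerun the build; completed tasks are restored from shared
# state, so only the failed fetches and do_image_wic are retried.
bitbake core-image-minimal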
I am trying to build a Docker image for a Play 2.2 project. I am using Docker version 1.2.0 on Ubuntu Linux.
My Docker-specific settings in Build.scala look like this:
dockerBaseImage in Docker := "dockerfile/java:7"
maintainer in Docker := "My name"
dockerExposedPorts in Docker := Seq(9000, 9443)
dockerExposedVolumes in Docker := Seq("/opt/docker/logs")
Generated Dockerfile:
FROM dockerfile/java:latest
MAINTAINER
ADD files /
WORKDIR /opt/docker
RUN ["chown", "-R", "daemon", "."]
USER daemon
ENTRYPOINT ["bin/device-guides"]
CMD []
The output looks like dockerBaseImage is being ignored, and the default
(dockerfile/java:latest) is not handled correctly:
[project] $ docker:publishLocal
[info] Wrote /..../project.pom
[info] Step 0 : FROM dockerfile/java:latest
[info] ---> bf7307ff060a
[info] Step 1 : MAINTAINER
[error] 2014/10/07 11:30:12 Invalid Dockerfile format
[trace] Stack trace suppressed: run last docker:publishLocal for the full output.
[error] (docker:publishLocal) Nonzero exit value: 1
[error] Total time: 2 s, completed Oct 7, 2014 11:30:12 AM
[project] $ run last docker:publishLocal
java.lang.RuntimeException: Invalid port argument: last
at scala.sys.package$.error(package.scala:27)
at play.PlayRun$class.play$PlayRun$$parsePort(PlayRun.scala:52)
at play.PlayRun$$anonfun$play$PlayRun$$filterArgs$2.apply(PlayRun.scala:69)
at play.PlayRun$$anonfun$play$PlayRun$$filterArgs$2.apply(PlayRun.scala:69)
at scala.Option.map(Option.scala:145)
at play.PlayRun$class.play$PlayRun$$filterArgs(PlayRun.scala:69)
at play.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$1.apply(PlayRun.scala:97)
at play.PlayRun$$anonfun$playRunTask$1$$anonfun$apply$1.apply(PlayRun.scala:91)
at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:35)
at scala.Function7$$anonfun$tupled$1.apply(Function7.scala:34)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Invalid port argument: last
[error] Total time: 0 s, completed Oct 7, 2014 11:30:16 AM
What needs to be done to make this work?
I am able to build the image using Docker from the command line:
docker build --force-rm -t device-guides:1.0-SNAPSHOT .
Packaging/publishing settings are per-project settings, rather than per-build settings.
You were using a Build.scala-style build, with a format like this:
object ApplicationBuild extends Build {
val main = play.Project(appName, appVersion, libraryDependencies).settings(
...
)
}
The settings should be applied to this main project. This means that you call the settings() method on the project, passing in the appropriate settings to set up the packaging as you wish.
In this case:
object ApplicationBuild extends Build {
val main = play.Project(appName, appVersion, libraryDependencies).settings(
dockerBaseImage in Docker := "dockerfile/java:7",
maintainer in Docker := "My name",
dockerExposedPorts in Docker := Seq(9000, 9443),
dockerExposedVolumes in Docker := Seq("/opt/docker/logs")
)
}
To reuse similar settings across multiple projects, you can either create a val of type Seq[sbt.Setting[_]], or extend sbt.Project to provide the common settings. See http://jsuereth.com/scala/2013/06/11/effective-sbt.html for some examples of how to do this (e.g. Rule #4).
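For example, a sketch of the first approach (the val name is illustrative):

// Docker packaging settings defined once as a plain value...
val dockerSettings: Seq[sbt.Setting[_]] = Seq(
  dockerBaseImage in Docker := "dockerfile/java:7",
  maintainer in Docker := "My name",
  dockerExposedPorts in Docker := Seq(9000, 9443)
)

// ...and applied to any project that packages a Docker image.
val main = play.Project(appName, appVersion, libraryDependencies)
  .settings(dockerSettings: _*)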
This placement of settings is not necessarily obvious if one is used to build.sbt-style builds, because in that file a line that evaluates to an sbt setting (or sequence of settings) is automatically appended to the root project's settings.
You executed the wrong command; I didn't see it the first time:
run last docker:publishLocal
Remove the run last:
docker:publishLocal
Now you get your Docker image built as expected.
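With the corrected command, a typical sequence might be (assuming the image is tagged device-guides:1.0-SNAPSHOT, as in the question's manual docker build):

[project] $ docker:publishLocal

and then, from the host shell:

docker images
docker run -p 9000:9000 device-guides:1.0-SNAPSHOT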