travis-ci: matrix.exclude ignored? - travis-ci

I'm setting up travis-ci for my C++ project, and want to have three different jobs per build:
linux (native=64bit)
osx/64bit (native)
osx/32bit
To achieve this, I've configured travis to build on linux and osx, and created an environment variable ARCH that is set either to a specific architecture (e.g. i386) or left empty (for native builds).
here's my .travis.yml:
language: cpp
env:
  matrix:
    - ARCH=
    - ARCH=i386
  global:
    - secure: ...
os:
  - linux
  - osx
matrix:
  exclude:
    - os: linux
before_install:
  - ./travis-ci/install-dependencies.sh
script:
  - ./travis-ci/build.sh
The script and before_install scripts are set up to honour the ARCH envvar.
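For illustration, the dispatch inside build.sh might look something like this (a sketch; the flag handling is an assumption, not taken from the actual scripts):

```shell
# Sketch: honour the ARCH envvar the way build.sh might.
# ARCH=i386 selects a 32-bit build; empty/unset means native.
if [ -n "${ARCH:-}" ]; then
  CFLAGS="-m32"   # assumed flag for a 32-bit cross build
else
  CFLAGS=""       # native build
fi
echo "ARCH='${ARCH:-}' CFLAGS='${CFLAGS}'"
```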
Now, for reasons unknown to me, when I push to github, the build matrix includes:
OS:linux, env:ARCH=
OS:linux, env:ARCH=i386
and indeed, I get two jobs for linux.
So it seems that my exclude statement is ignored.
Any hints on what I should do to not build linux/ARCH=i386?

So it seems that the problem was that my particular project did not yet have OSX support enabled (currently this needs to be done manually).
Thus the os axis of the matrix did not really exist.
Once the osx builds were enabled (and therefore the os axis was established properly), the exclude statement started to work as expected.
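For reference, an exclude entry only removes jobs that match every key it lists, so the single linux/i386 job can be dropped by pinning both axes (a sketch based on the matrix above):

```yaml
matrix:
  exclude:
    - os: linux
      env: ARCH=i386
```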

Related

How to make Bazel correctly cache dependencies built by itself?

I have a (relatively small) Bazel rule for some configure/make based project, say xmlsec1 (you could take any other; the important thing seems to be the external tooling behind foreign_cc):
xmlsec1.BUILD:
load("@rules_foreign_cc//foreign_cc:defs.bzl", "configure_make")

filegroup(
    name = "all_srcs",
    srcs = glob(["**"]),
)

configure_make(
    name = "xmlsec1",
    lib_name = "xmlsec1",
    lib_source = ":all_srcs",
    configure_command = "configure",
    configure_in_place = True,
    out_binaries = ["xmlsec1"],
    targets = ["install"],
)
xmlsec1.bzl:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def xmlsec1():
    http_archive(
        name = "xmlsec1",
        url = "https://www.aleksey.com/xmlsec/download/xmlsec1-1.2.37.tar.gz",
        sha256 = "5f8dfbcb6d1e56bddd0b5ec2e00a3d0ca5342a9f57c24dffde5c796b2be2871c",
        build_file = "@//:xmlsec1.BUILD",
    )
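For completeness, wiring this up from the WORKSPACE would presumably look like this (a sketch; the load path assumes xmlsec1.bzl sits in the repository root, as the build_file label suggests):

```python
# WORKSPACE (sketch)
load("//:xmlsec1.bzl", "xmlsec1")

xmlsec1()  # declares the @xmlsec1 external repository
```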
All works fine for me until Bazel's remote cache gets activated and I'm building among different Linux distributions.
To avoid cache collisions I'm running bazel build with --action_env=SYSTEM_DIGEST="$(cat /etc/os-release)", resulting in different hashes on different distros.
While this approach seems to work for the artifacts defined in xmlsec1 (I can see this from the execution logs and from observing the expected re-builds), the foreign_cc part seems to be built without those action_env variables.
This is what I get when I try to build @xmlsec1//:xmlsec1 (line breaks for readability):
+ /home/me/.cache/bazel/_bazel_me/8f6a55c898f3ec22f87d9cee5890b9e5/sandbox/processwrapper-sandbox/5/execroot/my_project_packages/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains\
/make/bin/make install
/home/me/.cache/bazel/_bazel_me/8f6a55c898f3ec22f87d9cee5890b9e5/sandbox/processwrapper-sandbox/5/execroot/my_project_packages/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains\
/make/bin/make: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /home/me/.cache/bazel/_bazel_me/8f6a55c898f3ec22f87d9cee5890b9e5/sandbox/processwrapper-sandbox/5/execroot/my_project_packages/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make/bin/make)
/home/me/.cache/bazel/_bazel_me/8f6a55c898f3ec22f87d9cee5890b9e5/sandbox/processwrapper-sandbox/5/execroot/my_project_packages/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains\
/make/bin/make: /lib64/libc.so.6: version `GLIBC_2.33' not found (required by /home/me/.cache/bazel/_bazel_me/8f6a55c898f3ec22f87d9cee5890b9e5/sandbox/processwrapper-sandbox/5/execroot/my_project_packages/bazel-out/k8-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make/bin/make)
I get this linker error only with bazel remote cache being activated and (this is the interesting part) having xmlsec1 built on a recent distribution (say Ubuntu 22.04) and then trying to build it on Centos-8.
So I guess this is what's going on:
make from foreign_cc gets built and linked against a recent version of GLIBC, ignoring values provided with --action_env
foreign_cc artifacts are being stored in the bazel remote cache
another build on an older distro (Centos-8) also tries to build make and has no reason not to take the artifacts from the cache, since --action_env values are also ignored here, resulting in the same hash
since the binaries are linked against a version of GLIBC which is not available yet on Centos-8 they are not compatible and crash with the error you see above.
So my question(s) is(/are):
Is that intended behavior? Why is --action_env ignored for builds Bazel runs implicitly (and not for those defined explicitly)?
Is there a way to apply those rules to Bazel's own dependencies?
Is there a better way to define system properties that affect all builds?
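One knob that may be worth trying: --host_action_env is the exec-configuration counterpart of --action_env, and tools Bazel builds for itself (such as foreign_cc's make) are built in the exec configuration. A .bazelrc sketch (SYSTEM_DIGEST is the variable from above; whether the foreign_cc tool builds actually honour it is an assumption to verify):

```
# .bazelrc (sketch): make the digest part of the action key in both
# the target configuration and the exec configuration.
build --action_env=SYSTEM_DIGEST
build --host_action_env=SYSTEM_DIGEST
```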

Maven upgrade unable to run package appengine:stage

I have a Java 8 Spring Boot app that is deployed to Google App Engine and being built via GCP CloudBuild. I am trying to upgrade it from Java 8 to Java 11.
In the cloudbuild.yaml file, I changed:
- id: 'Build and Test'
  name: 'gcr.io/cloud-builders/mvn:3.5.0-jdk-8'
  args: ['package', 'appengine:stage']
to:
- id: 'Build and Test'
  name: 'maven:3.8.3-jdk-11'
  args: ['package', 'appengine:stage']
When I run the CloudBuild, this step now suddenly fails with the following error:
docker.io/library/maven:3.8.3-jdk-11
/usr/local/bin/mvn-entrypoint.sh: 50: exec: package: not found
In its previous configuration, it was running just fine. The entire cloudbuild.yaml file is:
steps:
- id: 'copy file'
  name: 'ubuntu'
  args: ['cp', 'src/main/appengine/app.yaml', 'src/main/appengine/app.yaml']
- id: 'Build and Test'
  name: 'maven:3.8.3-jdk-11'
  args: ['package', 'appengine:stage']
What is going on here? Does the gcr.io/cloud-builders/mvn:3.5.0-jdk-8 image somehow understand mvn package appengine:stage, whereas the maven:3.8.3-jdk-11 image doesn't? Mainly I just need someone to help me understand why I'm getting the error. If anyone could also lend some suggestions for how to fix or circumvent it, that'd be greatly appreciated as well. Thanks in advance!
Reviewing Google documentation about migrating to the Java 11 runtime, I found that App Engine standard environment allows you to use several of App Engine's packaged services and APIs in the Java 11 runtime, reducing runtime conversion effort and complexity.
The App Engine API JAR allows your Java 11 app to contact the bundled services APIs and access most of the same functionality as the Java 8 runtime.
You can also use Google Cloud products that are equivalent to the App Engine packaged services in functionality.
Adding overview of the migration process documentation.
The process to follow is:
Download the Cloud SDK.
Migrate from the standalone App Engine Maven plugin to the Cloud SDK-based Maven plugin or the Cloud SDK-based Gradle plugin.
Install the App Engine API JAR if you are using the App Engine bundled services.
Migrate your XML files to the equivalent yaml files.
Now, regarding the issue you are getting: you should specify the whole path of the script, because /usr/src/app may not be in your PATH. You must also ensure that your entrypoint.sh is executable; however, depending on your circumstances, docker will duplicate the permissions exactly as they are on your build host, so this step may not be necessary.
An additional suggestion: you can't use single quotes ' for the entrypoint/command; you can try double quotes " instead.
It was so obvious in hindsight. I needed to specify mvn as the entrypoint command in the step, like so:
- id: 'Build and Test'
  entrypoint: mvn
  name: 'maven:3.8.3-jdk-11'
  args: ['package', 'appengine:stage']

Building tensorflow 2.2.0 pip wheel file, for use in CentOS system (older libc)

Introduction:
I have to create a pip wheel of Tensorflow 2.2.0 with the cuda libraries dynamically linked (specifically cudart.so). To accomplish this I am currently using the tensorflow-dev docker image.
I am able to build the tf wheel file, and able to install and use it while inside the build container.
Issue:
The issue is that when importing the generated wheel file on a CentOS server, I get the following error:
ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home1/private/mavridis/Vineyard/tensorflowshared/test/lib64/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
Having looked around, the issue is caused by the build container using a newer libc:
ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
Compared to CentOS older version:
ldd --version
ldd (GNU libc) 2.17
Expected behavior:
Having already tried the 'vanilla' tensorflow 2.2.0 version with no issues, installed using pip:
pip install tensorflow==2.2.0
I expected my own build to also work.
So I assume there is some configuration option or docker configuration that would allow me to use the docker-built wheel file in a CentOS setup, just like the pip-installed version. As this wheel file is intended to be deployed to setups beyond my control, solutions involving alternate OSes and/or libc replacement are not applicable.
Build configuration:
During build i use the following configuration/ command line:
export TF_NEED_CUDA=1
export TF_USE_XLA=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
bazel build --config=opt --config=cuda --output_filter=DONT_MATCH_ANYTHING --linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart --linkopt=-static-libstdc++ //tensorflow/tools/pip_package:build_pip_package
Regarding options used:
--output_filter=DONT_MATCH_ANYTHING : Silence warnings
--linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart : Dynamic linking of cudart.so
--linkopt=-static-libstdc++ : Statically link libstdc++, as libstdc++ also caused the libc error; this however is not possible for libm
I expected my own build to also work.
That expectation is (obviously) incorrect. The symbols your program or library requires from GLIBC depend on exactly which functions you call.
Consider the following program:
#include <stdlib.h>
int main() { exit(0); }
When compiled/linked on a GLIBC-2.30 system, this program only depends on GLIBC_2.2.5 (because it doesn't call any newer symbols).
Now change the program slightly:
#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
int main() { gettid(); exit(0); }
Compile/link it again, and all of a sudden this program now requires GLIBC_2.30 (because that's where gettid() was added to GLIBC), and will not work on any system which has older GLIBC.
So I assume there is some configuration option or docker configuration
Sure: your Docker image must have a GLIBC that is not newer than what your target system has, i.e. GLIBC-2.17. Your current image contains GLIBC-2.27 (or newer).
You need a different Docker image, and you'll likely have to build it yourself, since GLIBC-2.17 is over 7 years old, and predates TensorFlow by many years.
Update:
What I don't understand is how come the pip tensorflow package (which I assumed was built with the docker image I am using) works with CentOS?
It works by accident, just like my first program would work on CentOS, but the second one wouldn't.
In short I wanted to generate a pip package that would work on 'any' linux/libc version
That is an impossible goal: Linux predates GLIBC, and it is impossible to build a single package that will work on a Linux distribution which didn't include GLIBC and on a distribution that did.
You have to draw a line somewhere. The developers of tensorflow-dev docker image drew a line at GLIBC-2.27. Packages built on this image should work on any system with 2.27 or later, and might (but are not at all guaranteed to) work on older systems.
just like the pip installed version.
You claim that the pip installed version has no "only GLIBC-xx or later" requirement, but that is not true. I am 99.9% sure that it requires at least GLIBC-2.14.
To find which GLIBC versions that package requires, run this command:
readelf -WV _pywrap_tensorflow_internal.so | grep GLIBC_
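To reduce that output to just the highest version required, the list can be sorted (a sketch; /bin/ls stands in here for _pywrap_tensorflow_internal.so):

```shell
# Extract every GLIBC_x.y version reference and keep the highest.
# Replace /bin/ls with the .so you are actually inspecting.
readelf -WV /bin/ls | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -n 1
```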
I assumed the pip-installed version was built using the publicly available tensorflow-devel docker image.
That is quite likely. And like I said, it happens to work on CentOS, but minute changes may make it not work anymore.
Update 2:
So running the readelf command as you suggested does show the most recent required versions to be:
pip version: GLIBC_2.12
mine: GLIBC_2.27
So from what I understand, the pip version requires a version older even than CentOS's, which explains why it works.
It doesn't "use" an older version, it uses whatever version is available.
It requires a minimum version 2.12, while your build requires a minimum version 2.27.
How do they achieve this? Do they use a different image that has an older libc? If so, where can I get it? Or do they use the public image, but build with some bazel flag that 'limits' symbols to the ones contained up to libc 2.12?
You are still not getting it.
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
So no: they (most likely) didn't use a different Docker image, and didn't use "magic" bazel flags. They just happened to not call any functions which require GLIBC version > 2.12, and you did.
P.S. You can find which symbol(s) are causing "bad" dependency in your build like so:
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.2[0-9]'
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.1[89]'
This would produce output similar to (using my second program):
readelf -Ws a.out | egrep 'GLIBC_2.[23][0-9]'
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@GLIBC_2.30 (2)
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@@GLIBC_2.30
The output above shows that the only symbol my binary requires from GLIBC 2.20 or above is gettid.
To make a counter point to what Employed Russian wrote:
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
I don't think that's quite accurate. My understanding, which is corroborated by https://github.com/wheybags/glibc_version_header, is that things work like so (quoting that project, emphasis mine):
Glibc uses something called symbol versioning. This means that when you use e.g., malloc in your program, the symbol the linker will actually link against is malloc@GLIBC_YOUR_INSTALLED_VERSION (actually, it will link to malloc from the most recent version of glibc that changed the implementation of malloc, but you get the idea).
So my guess (I have not checked) would be that the Tensorflow releases are built against an older glibc (perhaps by way of being built on an older release of their target Linux distro).

Build dkms module for specific kernel versions only

How do you define dkms.conf such that a DKMS module will only be built for a specific kernel version or range of versions?
Background:
A buggy driver is present in the current kernels we are using (e.g. 4.4) but fixed in 4.10. I produced a dkms package with the 4.10 source code in it, which all works fine on kernel 4.4. But as we update to later OS releases (or HWE releases) with later kernel releases, e.g. 4.15, I want to avoid rebuilding the (now possibly older) 4.10 kernel driver when the kernel version is 4.10 or higher.
Here's my base dkms.conf file
PACKAGE_NAME="cp210x"
PACKAGE_VERSION="#MODULE_VERSION#"
BUILT_MODULE_NAME[0]="$PACKAGE_NAME"
DEST_MODULE_LOCATION[0]="/updates/dkms"
AUTOINSTALL="YES"
REMAKE_INITRD="YES"
I tried BUILD_EXCLUSIVE_KERNEL matching to 4.N kernel versions
BUILD_EXCLUSIVE_KERNEL="^4\.[0-9]\.*"
Expected behaviour: will not install the kernel module for kernel 4.15.0-43-generic. Actual behaviour: installs as normal.
My reading suggests an alternative might work (for this test I'm just matching my current kernel version): change the compile rule to be a no-op.
MAKE_MATCH[1]="^4\.15\.*"
MAKE[1]=":"
I'm on Debian/Ubuntu platforms if that makes any difference.
Ok - the problem was between keyboard and chair - my BUILD_EXCLUSIVE_KERNEL regexp had an error in it: the .* suffix got mixed up with the \. number separator. But I'll document a working example here, since google didn't find any good examples before I posted here:
Firstly, I wasn't sure which regexp dialect I needed to be using (grep, pcre, etc.), especially since there is shell escaping mixed in, so I thought perhaps the mismatch was there.
It turns out dkms is a bash script and so uses [[ $ver =~ $match_regexp ]]. So to test the matching, this worked:
re="^(3\.[0-9]+\.|4\.[0-9]\.)" ; [[ "4.15.0-43-generic" =~ $re ]] && echo true
# but this didn't
[[ "4.15.0-43-generic" =~ "^(3\.[0-9]+\.|4\.[0-9]\.)" ]] && echo true
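Extending that check over a few kernel versions shows the pattern doing what I wanted (the version strings are just illustrative):

```shell
# bash regex test, same mechanism dkms uses internally
re='^(3\.[0-9]+\.|4\.[0-9]\.)'
for v in 3.16.0-4-amd64 4.4.0-21-generic 4.9.0-3 4.10.1 4.15.0-43-generic; do
  if [[ $v =~ $re ]]; then echo "$v: build"; else echo "$v: skip"; fi
done
```

3.x and 4.0-4.9 match (build), while 4.10 and 4.15 fall through to skip.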
Here's the config file I ended up using:
PACKAGE_NAME="cp210x"
PACKAGE_VERSION="#MODULE_VERSION#"
BUILT_MODULE_NAME[0]="$PACKAGE_NAME"
DEST_MODULE_LOCATION[0]="/updates/dkms"
AUTOINSTALL="YES"
REMAKE_INITRD="YES"
# Since this code comes from 4.10 only update kernels 4.9 and earlier
BUILD_EXCLUSIVE_KERNEL="^(3\.[0-9]+\.|4\.[0-9]\.)"
Which looks like this when installed via dpkg.
First Installation: checking all kernels...
Building only for 4.15.0-43-generic
Building initial module for 4.15.0-43-generic
Error! The dkms.conf for this module includes a BUILD_EXCLUSIVE directive which
does not match this kernel/arch. This indicates that it should not be built.
Skipped.
But installs correctly against lower kernel versions.
Additionally, the wording of the BUILD_EXCLUSIVE_KERNEL documentation suggests it is an error if the kernel mismatches, which might not be desirable; however, if you check the output above you'll see that the "Error" does not cause a package installation failure - the build is just marked as skipped.

How do I install LLVM/Clang/libc++ version 3.9 on Travis-CI?

I know how to install LLVM/Clang/libc++ 3.8 on Travis CI, through the whitelisted llvm-toolchain-trusty-3.8, but this doesn't exist (or work) for 3.9.
Note the thing I need is libc++experimental.a, which contains the implementation of std::experimental::filesystem for libc++.
I really find the Travis-CI way of doing things kind of inflexible, so if there is a wholly alternative way of getting specific versions of things installed on a build machine, please enlighten me and free me from these stupid limitations. I also don't want to build every single toolchain dependency on Travis, that would be overkill.
The best way to get new libc++ in Travis-CI is to build it from source after installing LLVM/Clang.
Here is the script I wrote to download, build and install libc++ for Travis, and here is an example usage in Google Benchmark's .travis.yml. The script takes about 120 seconds to complete.
PS. I'm happy to see people using libc++'s std::experimental::filesystem :-)
You can install packages with the apt addon in your container-based image.
Add the following lines to your .travis.yml:
addons:
  apt:
    sources:
      - llvm-toolchain-trusty-3.9
    packages:
      - clang-3.9
      - libc++-dev
      - libc++abi-dev
Side note: at the moment you posted your question, llvm-toolchain-trusty-3.9 was whitelisted.
