With a Composer repository, how can one determine reverse package requirements?

For a given version of a package, I need to find those packages which Require it.
I can re-invent the wheel by crafting some kind of parser to go against our Satis repo's packages.json, but surely there's an easier way that's already present in the Composer API?
The use case for this is a build pipeline I am constructing on our Jenkins CI server. It responds to commits to our top-level master Composer project, which is the moving version, so we need to retrieve and assemble (via composer require) each package in our Satis repo that has a dependency on it, applying fuzzy version matching.

I don't have internal knowledge of the Composer API - my approach is to use the published CLI interface to get the job done, trying to avoid fancy stuff, because the Composer project is still heavily a work in progress. That being said:
You cannot compile a list of "this package A is being used by all these packages" in the general case, because that would mean that you have to scan ALL packages existing in the world. That job would never end.
However, Satis will compile a list of packages detected to use a certain package, which is rendered in the HTML template just for information. So for a reasonably small world it is possible to detect the dependencies and create a reverse dependency map. But I don't think this is a reliable feature within Composer, because for the usual use case Composer only evaluates forward dependencies, never reverse. There is no use case within Composer to create these relations, so chances are high that Satis doesn't expose them in the regular case either.
You can, however, try to output something more machine-readable than the rendered HTML to get the "packages using this" info - it is passed as an array variable into the template. It shouldn't be too difficult to output something else.
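If parsing the repository index directly turns out to be the simplest route after all, the reverse map is only a few lines of code. A minimal sketch in Python (assuming the classic single packages.json layout with a top-level "packages" map; newer Satis versions may split the data into files listed under "includes", which would need the same treatment, and acme/library is just a placeholder name):

import json
from collections import defaultdict

# Load the repository index generated by Satis (path is an assumption).
with open("packages.json") as fh:
    repo = json.load(fh)

# Map "vendor/package" -> packages in the repo that require it.
reverse = defaultdict(set)
for name, versions in repo.get("packages", {}).items():
    for version, meta in versions.items():
        for required in meta.get("require", {}):
            if required == "php" or required.startswith("ext-"):
                continue  # skip platform requirements
            reverse[required].add(name)

# Example: which packages require acme/library in any version?
print(sorted(reverse["acme/library"]))

The resulting map could then drive the Jenkins pipeline's list of downstream composer require builds.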


What is a "jobset" in the parlance of the Hydra continuous integration tool?

I found definitions (see below) and how they are usually used (i.e., a Hydra jobset tracking a Git branch), but I still couldn't figure out what they are in the general sense. Maybe you could explain it in layman's terms with specific examples?
Eelco Dolstra, Eelco Visser: "Hydra: A Declarative Approach to Continuous Integration"
a specification of the location of a Nix expression, along with possible values for the function arguments of the jobs defined by the Nix expression.
NixOS Wiki: "Hydra"
Job Set
A list of jobs which will be run. Often a Jobset fits to a certain branch (master, staging, stable). A jobset is defined by its inputs and will trigger if these inputs change, e.g. like a new commit onto a branch is added. Job sets may depend on each other
Hydra User's Guide
3.2 Job Sets
A project can consist of multiple job sets (hereafter jobsets), separate tasks that can be built separately, but may depend on each other (without cyclic dependencies, of course).
My question may seem pointless after listing all these definitions, but here's an example to demonstrate my confusion: I looked at the project listing at https://hydra.nixos.org/ and I was under the impression that a project is a channel, and jobsets are the branches in a repo. (I know, there is no mention of "channel" in there, and on the channel page it even says that "Nix channels are not supported by this Hydra server." :)
I could fool myself with that when looking at the Hydra project, but this argument fell apart when I clicked on the flakes one (that is, I couldn't find a supporting GitHub repo, and generally the jobset names didn't feel like branch names).
Also, in the Dolstra/Visser paper, Hydra was set up using SVN; I don't know if SVN even uses branches (mostly because the paper didn't mention them), but this does prove that Hydra can be set up with a VCS/SCM other than Git, where the underlying concepts can be fundamentally different. Again, I could easily be wrong.
I think I found the best definition in the flox documentation:
Within channels, jobsets allow flox to build packages against multiple versions of dependencies simultaneously.
flox is a framework built around the Nix ecosystem to use Nix without having to install it; it seamlessly relays Nix commands to a remote Nix store and Hydra build farm as if they were issued on the local machine, and thus one would consistently get the same results everywhere flox is installed.
If I understand this correctly,
the stable jobset would then specify as inputs the latest stable releases of all the dependencies of the channel's packages (e.g., from their release branch)
staging / unstable would take the latest revision of a development branch (e.g., master / main) of the dependencies (a.k.a., bleeding edge)
and so on.
Note to self: jobset specification applies to every package in a channel.
Some lingering questions if the above is correct:
What does a jobset specification look like? For example, would stable take the latest commits (i.e. HEAD) of release branches, whereas staging/unstable would do the same but for master/main (or other development) branches?

What is the best option for building Kubeflow components?

I have been reading about Kubeflow, and there are two ways to create components:
Container-Based
Function-Based
But there isn't an explanation of why I should use one or the other. For example, to load a container-based component I need to build and push a Docker image and load the YAML with the specification into the pipeline, but with a function-based component I only need to import the function.
And in order to apply CI/CD with the latest version: if I have container-based components, I can have a repo with all the YAML files and load them with load_by_url, but if they are functions, I can have a repo with all of them and load them as a package too.
So which do you think is the better approach: container-based or function-based?
Thanks.
The short answer is that it depends, but a more nuanced answer is that it depends on what you want to do with the component.
As background, when a KFP pipeline is compiled, it's actually a series of different YAMLs that are launched by Argo Workflows. All of these need to be container-based to run on Kubernetes, even if the container itself contains only Python.
Converting a function to a Python container op is a quick way to get started with Kubeflow Pipelines. It was designed to model Airflow's Python-native DSL. It will take your Python function and run it within a defined Python container. You're right that it's easier to encapsulate all your work within the same Git folder. This setup is great for teams that are just getting started with KFP and don't mind some boilerplate to get going quickly.
Components really become powerful when your team needs to share work, or when you have an enterprise ML platform that creates template logic for how to run specific jobs in a pipeline. The components can be separately versioned and built to use on any of your clusters in the same way (the underlying container should be stored in Docker Hub or ECR, if you're on AWS). There are inputs/outputs that prescribe how the run will execute using the component. You can imagine a team at Uber might use a KFP pipeline to pull data on the number of drivers in a certain zone. The inputs to the component could be a geo-coordinate box and the time of day for which to load the data. The component saves the data to S3, which is then loaded into your model for training. Without the component, there would be quite a bit of boilerplate code that would need to be copied across multiple pipelines and users.
I'm a former PM at AWS for SageMaker and open-source ML integrations, and I am sharing from my experience looking at enterprise setups.
But there isn't an explanation of why I should use one or the other. For example, to load a container-based component I need to build and push a Docker image and load the YAML with the specification into the pipeline, but with a function-based component I only need to import the function.
There are some misconceptions here.
There is only one kind of component under the hood - container-based component (there are also graph components, but this is irrelevant here).
However, most of our users like Python and do not like building containers. This is why I've developed a feature called "Lightweight Python components", which generates the ComponentSpec/component.yaml from a Python function's source code. The generated component basically runs python3 -u -c '<your function>; <command-line parsing>' arg1 arg2 ....
There is a misconception that "function-based components are different from component.yaml files".
No, it's the same format. You're supposed to save the generated component into a file for sharing: create_component_from_func(my_func, output_component_file='component.yaml'). After your code stabilizes, you should upload the code and the component.yaml to GitHub or some other place and use load_component_from_url to load that component.yaml in your pipelines.
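As a rough illustration of that workflow with the KFP v1 SDK (the add function, base image, and the GitHub URL below are placeholders, not anything from the question):

import kfp
from kfp import components

def add(a: float, b: float) -> float:
    """Toy function that becomes a component."""
    return a + b

# Generate the component and write component.yaml so it can be shared.
add_op = components.create_component_from_func(
    add,
    output_component_file='component.yaml',  # commit this next to the function's source
    base_image='python:3.9',
)

# Elsewhere, load the published component.yaml instead of importing the function.
add_from_url = components.load_component_from_url(
    'https://raw.githubusercontent.com/your-org/your-components/main/add/component.yaml'
)

@kfp.dsl.pipeline(name='add-example')
def my_pipeline(a: float = 1.0, b: float = 2.0):
    add_from_url(a=a, b=b)

# Compiles down to the Argo YAML mentioned earlier.
kfp.compiler.Compiler().compile(my_pipeline, 'add-example.yaml')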
Check the component.yaml files in the KFP repo. More than half of the component.yaml files are Lightweight components - they're generated from python functions.
component.yaml files are intended for sharing components. They're declarative, portable, indexable, safe, language-agnostic, etc. You should always publish component.yaml files. If a component.yaml is generated from a Python function, then it's good practice to put the component.py alongside it so that the component can be easily regenerated when making changes.
The decision whether or not to create a component using the Lightweight Python component feature is very simple:
Is your code in a self-contained Python function (not a CLI program yet)? Do you want to avoid building, pushing and maintaining containers? If yes, then the Lightweight Python component feature (create_component_from_func) can help you and generate the component.yaml for you.
Otherwise, write component.yaml yourself.

Paket + FAKE + swapping dependencies in CI tool

I'm messing about with some FAKE and Paket (on F#) and Jenkins. I'm not really sure I know what I'm doing, but I know what I WANT to do.
The short description is I want the build server to build a whole family of related services against a referenced package, but the package comes in different flavours (but share the same basic namespace/module names).
The long description;
I have a family of services that sit on top of an external API.
i.e.
they all reference some external package and access it through modules etc.
e.g.
ServiceA.fsproj
...
let f (x : ExternalApi.Foo) = ....
---------------
ServiceB.fsproj
...
let g (x : ExternalApi.Foo) = ....
The developer will probably develop against the most common flavour, let's say ExternalApiVanilla.
The developer will be using Paket and FAKE for build tools, and Jenkins.
When the code is checked in, though, I want the build server to attempt to build it against the vanilla flavour... but also against chocolate, strawberry and banana.
The flavours are not "versions" in the sense of a version number; they are distinct products with their own NuGet packages. So I think (somehow) I want to parameterise a Jenkins folder containing all the jobs with the name of the API package, pass that into the build script, and then get the build script to swap out whatever the engineer has referenced and reference the parameter instead.
Of course some compilations will fail; we have to develop different variants of services to handle some variants of the API. But 90% of our stuff works on all versions; we just need an automated way to check the build and then create new variants of services and jobs to handle them.
As an aside, we are doing some things with C# and Cake/NuGet, but controlling the versioning by passing the NuGet folder in and forcing the build to find specific versions of one flavour... I understand this, though I wouldn't be able to write it, but I want to go one step further and replace the reference itself with a different one.
——————-
I'll try looking at the paket.dependencies/paket.references files in the build script, removing the existing reference and adding the Jenkins-defined ones from a shell with Paket, and see what happens. I don't especially like it; I'm dependent on the format of these files, and I was hoping this would be mainstream.
I have solved this, at least in the context of Cake + NuGet (and the same solution will apply), by simply search-replacing the package reference (using XDocument) in the Cake script with a reference parameter set up in the job parameters.
I'll now implement it in the FAKE version of this build, though I may simply drop Paket altogether.
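For what it's worth, here is a minimal sketch of that substitution applied to the Paket files instead of a Cake/MSBuild project (Python purely for illustration; the EXTERNAL_API_PACKAGE job parameter and the flavour names are assumptions from the question, and it is a plain text replace, so it carries the same dependence on the file format mentioned above):

import os
from pathlib import Path

# Jenkins job parameter naming the flavour to build against (assumption).
target = os.environ.get("EXTERNAL_API_PACKAGE", "ExternalApiVanilla")
default = "ExternalApiVanilla"  # what developers normally reference

def swap(path: Path) -> None:
    """Replace the default package name with the target flavour in a Paket file."""
    text = path.read_text()
    if default in text and target != default:
        path.write_text(text.replace(default, target))
        print(f"patched {path}")

# paket.dependencies at the repo root plus every project's paket.references.
for candidate in [Path("paket.dependencies"), *Path(".").rglob("paket.references")]:
    if candidate.is_file():
        swap(candidate)

A paket install run after the swap would then resolve and lock the substituted flavour before the FAKE build compiles.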

How to display configuration differences between two Jenkins builds?

I want to display non-code differences between current build and the latest known successful build on Jenkins.
By non-code differences I mean things like:
Environment variables, including Jenkins parameters (set), maybe with some filter
Version of system tool packages (rpm -qa | sort)
Versions of python packages installed (pip freeze)
While I know how to save and archive these files as part of the build, the only part that is not clear is how to generate the diff/change report of differences found between the current build and the last successful build.
Please note that I am looking for a pipeline compatible solution and ideally I would prefer to make this report easily accessible on Jenkins UI, like we currently have with SCM changelogs.
Or to rephrase this: how do I create a build manifest and diff it against the last known successful one? If anyone knows a standard manifest format that can easily combine all this information, that would be great.
You always ask the most baller questions, nice work. :)
We always try to push as many things into code as possible because of the same sort of lack of traceability you're describing with non-code configuration. We start by using Jenkinsfiles, so we capture a lot of the build configuration there (in a way that still shows changes in source control). For system tool packages, we get those into the app by using Docker and by inheriting from a specific tag of the Docker base image. So even if we want to change system packages or even the Python version, for example, that would manifest as an update of the FROM line in the app's Dockerfile. Even environment variables can be micromanaged by Docker, to address your other example. There's more detail about how we try to sidestep your question at https://jenkins.io/blog/2017/07/13/speaker-blog-rosetta-stone/.
There will always be things that are hard to capture as code, and builds will therefore still fail and be hard to debug occasionally, so I hope someone pipes up with a clean solution to your question.
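In case it helps, here is a rough sketch of the manifest idea (the file names, job URL and choice of sections are assumptions; the manifest is archived as a build artifact, and the previous copy is pulled from the lastSuccessfulBuild artifact URL that Jenkins exposes):

import difflib
import subprocess
import urllib.request
from pathlib import Path

MANIFEST = Path("build-manifest.txt")

# Collect the non-code state of the current build, sorted for stable diffs.
sections = {
    "environment": ["env"],
    "system packages": ["rpm", "-qa"],    # assumes an RPM-based agent
    "python packages": ["pip", "freeze"],
}
lines = []
for title, cmd in sections.items():
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    lines.append(f"## {title}")
    lines.extend(sorted(out.splitlines()))
MANIFEST.write_text("\n".join(lines) + "\n")

# Fetch the manifest archived by the last successful build (URL is a placeholder).
base = "https://jenkins.example.com/job/my-job"
previous = urllib.request.urlopen(
    f"{base}/lastSuccessfulBuild/artifact/{MANIFEST.name}").read().decode()

# Produce a unified diff that can itself be archived or published on the build page.
diff = difflib.unified_diff(previous.splitlines(), lines,
                            fromfile="last-successful", tofile="current", lineterm="")
Path("manifest-diff.txt").write_text("\n".join(diff) + "\n")

Archiving both files with archiveArtifacts makes them available to the next build, and the diff could be surfaced on the build page with something like the HTML Publisher plugin.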

Ant: Is it possible to create a dynamic ant script?

So, at work, I frequently have to create virtually identical ant scripts. Basically the application we provide to our clients is designed to be easily extensible, and we offer a service of designing and creating custom modules for it. Because of the complexity of our application, with lots of cross dependencies, I tend to develop the module within our core dev environment, compile it using IntelliJ, and then run a basic ant script that does the following tasks:
1) Clean build directory
2) Create build directory and directory hierarchy based on package paths.
3) Copy class files (and source files to a separate sources directory).
4) Jar it up.
The thing is, to do this I need to go through the script line by line and change a bunch of property names, so it works for the new use case. I also save all the scripts in case I need to go back to them.
This isn't the worst thing in the world, but I'm always looking for a better way to do things. Hence my idea:
For each specific implementation I would provide an ant script (or other file) of just properties. Key-value pairs, which would have specific prefixes for each key based on what it's used for. I would then want my ant script to run the various tasks, executing each one for the key-value pairs that are appropriate.
For example, copying the class files. I would have a property with a name like "classFile.filePath". I would want the script to call the task for every property it detects that starts with "classFile...".
Honestly, from my current research so far, I'm not confident that this is possible. But... I'm super stubborn, and always looking for new creative options. So, what options do I have? Or are there none?
It's possible to dynamically generate Ant scripts; for example, the following does this using an XML input file:
Use pure Ant to search if list of files exists and take action based on condition
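For illustration, the generation does not even have to happen in Ant itself; a small script can read the per-implementation properties file and emit the build file. A sketch in Python (the module.properties name and the classFile. prefix follow the question's convention; the emitted targets are deliberately simplified):

import xml.etree.ElementTree as ET

# Read the key=value properties file for one implementation (file name is an assumption).
props = {}
with open("module.properties") as fh:
    for line in fh:
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()

# Emit one <copy> per property whose key starts with the agreed prefix.
project = ET.Element("project", name="generated", default="copy-classes")
target = ET.SubElement(project, "target", name="copy-classes")
for key, value in props.items():
    if key.startswith("classFile."):
        ET.SubElement(target, "copy", file=value, todir="build/classes")

ET.ElementTree(project).write("build-generated.xml", xml_declaration=True)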
Personally I would always try and avoid this level of complexity. Ant is not a programming language.
Looking at what you're trying to achieve, it does appear you could benefit from packaging your dependencies as JARs and using a Maven repository manager like Nexus or Artifactory for storage. This would simplify each sub-project's build. When building projects that depend on these published libraries, you can use a dependency management tool like Apache Ivy to download them.
Hope that helps; your question is fairly broad.
