Different checksum results for jar files compiled on subsequent builds?

I am working on verifying that the jar files on remote Unix boxes match the ones built on my local machine (Windows with Cygwin) using the same JVM.
As a proof of concept, I am checking whether consecutive builds on my machine produce jar files with the same checksum. I tried the following:
1. Generated the jar file for the first time using an Ant script.
2. Calculated the checksum (e.g. "xyz abc").
3. Generated the jar file again with the same Ant script, without changing anything.
4. Got a different checksum but the same byte count (e.g. "xvw abc").
I am not sure how Java internally produces the class files and then the jar files. Can someone please help me understand the points below?
1. Does the cksum utility on Unix/Cygwin take the file's timestamp into account when computing its value?
2. Will the checksum differ for the compiled class files or the jar file if everything else stays the same (compiler version, source code, machine, environment)?

Answer to question 1: cksum computes its value only from the bytes of the file, so the timestamp of the archive itself (e.g. the jar file) does not matter. The timestamps of the files inside the jar are stored in the archive's bytes, however, so they do affect the checksum.
Answer to question 2: The checksums of the individual class files will be the same if everything else is the same (source code, compiler, etc.). The checksums of the jar files will generally differ. Possible causes are the timestamps of the entries inside the jar, or entries being added to the archive in a different order (e.g. because of parallel builds).
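The effect described in the two answers can be reproduced outside of any build tool. The following is a minimal Python sketch, not what Ant or the jar tool actually does: it builds a zip archive (a jar is a zip file) in memory twice with different entry timestamps and once more with the original timestamp. The entry names, contents, and the helper names build_archive and entry_crcs are made up for illustration.

```python
import io
import zipfile


def build_archive(files, timestamp):
    """Build a zip (jar-style) archive in memory.

    files: dict mapping entry name -> bytes
    timestamp: (year, month, day, hour, minute, second) stored for every entry
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=timestamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


def entry_crcs(archive_bytes):
    """Map entry name -> CRC-32 of its content, ignoring entry timestamps."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


# Made-up entries standing in for the real build output.
files = {
    "com/example/A.class": b"\xca\xfe\xba\xbe fake class A",
    "META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n",
}

first = build_archive(files, (2020, 1, 1, 12, 0, 0))
second = build_archive(files, (2020, 1, 1, 12, 0, 10))  # "rebuilt" 10 s later
fixed = build_archive(files, (2020, 1, 1, 12, 0, 0))    # same timestamp again

print(len(first) == len(second))                  # same byte count
print(first == second)                            # archive bytes differ
print(first == fixed)                             # fixed timestamps: identical
print(entry_crcs(first) == entry_crcs(second))    # entry contents are identical
```

If the goal is only to verify that two jars contain the same files, comparing the per-entry CRCs (as entry_crcs does) sidesteps the timestamp problem without changing the build.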
If you want to create a reproducible build with Gradle, you can use the configuration below:
tasks.withType(AbstractArchiveTask) {
    preserveFileTimestamps = false
    reproducibleFileOrder = true
}
Maven allows something similar. Sorry, I don't know how to do this with Ant.
More info here:
https://dzone.com/articles/reproducible-builds-in-java
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74682318

Related

Does a BEAM file remember whether it was built with -Werror?

I am working on a tool that deals with BEAM files, and we want to be able to assume the code was compiled with -Werror, so we don't have to repeat validations that are already done by the erl_lint compiler pass.
Is there a way to figure out if the BEAM was built with -Werror?
I'd expect beam_lib:chunks/2 to help here, but unfortunately it doesn't seem to have what I'm looking for:
beam_lib:chunks("sample.beam", [debug_info, attributes, compile_info]).
% the stuff returned says nothing about -Werror, even if I compile with -Werror
It seems that this information is always stripped.
However, if you are in control of the compilation process, you can put additional info into the BEAM files, which will then be accessible through M:module_info(compile) and via BEAM chunks as well.
For example, in rebar:
{erl_opts, [debug_info, {compile_info, [{my_key, my_value}]}]}.
And then:
1> my_module:module_info(compile).
[{version,"7.6.6"},
{options,[debug_info, ...
{my_key,my_value}]
The same key is also discoverable directly from the BEAM chunks:
2> beam_lib:chunks("my_beam.beam", [compile_info]).
{ok, ... {my_key,my_value}]}]}}
This means that you can easily "stamp" your BEAM files with some meta-information. So a workaround may be to stamp the BEAM files with such a mark at compile time.

In pyspark, reading csv files gets failed if even 1 path does not exist. How can we avoid this?

In PySpark, reading CSV files from several paths fails if even one of the paths does not exist.
Logs = spark.read.load(Logpaths, format="csv", schema=logsSchema, header="true", mode="DROPMALFORMED");
Here Logpaths is an array that contains multiple paths, which are created dynamically from a given startDate and endDate range. If Logpaths contains 5 paths and the first 3 exist but the 4th does not, the whole extraction fails. How can I avoid this in PySpark, or how can I check that the paths exist before reading?
In Scala I did this by checking file existence and filtering out the non-existent paths, using the globStatus function of the Hadoop HDFS FileSystem.
val path = "/bilal/2018.12.16/logs.csv"
val hadoopConf = new org.apache.hadoop.conf.Configuration()
val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
val fileStatus = fs.globStatus(new org.apache.hadoop.fs.Path(path))
So I found what I was looking for. The code I posted in the question checks file existence in Scala; the code below does the same in PySpark.
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("bilal/logs/log.csv"))
This is exactly the same code as in Scala: in both cases we call the Hadoop Java library, and the Java code runs on the JVM on which Spark is running.
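To apply that check to the whole Logpaths list from the question, the paths can be filtered before they are handed to spark.read.load. The following is a rough sketch: keep_existing and hadoop_exists are hypothetical helper names, and hadoop_exists only wraps the fs.exists call shown above. The filter takes the existence check as a parameter so it can be tried without a running Spark session.

```python
def keep_existing(paths, exists):
    """Return the paths for which exists(path) is true, preserving order."""
    return [p for p in paths if exists(p)]


def hadoop_exists(sc):
    """Build an existence check backed by the Hadoop FileSystem of a SparkContext.

    Uses the same JVM calls as the snippet above; sc is assumed to be an
    active SparkContext.
    """
    jvm = sc._jvm
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
    return lambda p: fs.exists(jvm.org.apache.hadoop.fs.Path(p))


# Intended use with Spark (not executed here):
#   existing = keep_existing(Logpaths, hadoop_exists(sc))
#   Logs = spark.read.load(existing, format="csv", schema=logsSchema,
#                          header="true", mode="DROPMALFORMED")
```

If none of the paths exist, the filtered list is empty, so it is probably worth handling that case separately before calling spark.read.load.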

Jenkins PyLint Warnings tool parses log files but reports 'found 0 issues'

I have set up Jenkins to run pylint on all Python source files, and all the log files are generated (apparently correctly) into a sub-directory as follows:
Source\pylint_logs\pylint1.log, pylint2.log, ..., pylint75.log
I have included a --msg-template definition based on the instructions on my Jenkins Configure page: Post-build Actions->Record compiler warnings and static analysis results->Static Analysis Tools. The template is shown as:
msg-template={path}:{line}: [{msg_id}, {obj}] {msg} ({symbol})
An example of one of the log files being generated by Jenkins/pylint is as follows:
************* Module FigureView
myapp\Views\FigureView.py:1: [C0103, ] Module name "FigureView" doesn't conform to snake_case naming style (invalid-name)
myapp\Views\FigureView.py:30: [C0103, FigureView.__init__] Attribute name "ax" doesn't conform to snake_case naming style (invalid-name)
------------------------------------------------------------------
Your code has been rated at 8.57/10 (previous run: 8.57/10, +0.00)
For the PyLint Report File Pattern, I have: Source/pylint_logs/pylint*.log
It appears that PyLint Warnings is parsing the files because the console output looks like this:
[PyLint] Searching for all files in 'D:\Jenkins\workspace\PROJECT' that match the pattern 'Source/pylint_logs/pylint*.log'
[PyLint] -> found 75 files
[PyLint] Successfully parsed file D:\Jenkins\workspace\PROJECT\Source\pylint_logs\pylint1.log
[PyLint] -> found 0 issues (skipped 0 duplicates)
[PyLint] Successfully parsed file D:\Jenkins\workspace\PROJECT\Source\pylint_logs\pylint10.log
[PyLint] -> found 0 issues (skipped 0 duplicates)
This repeats for all 75 files, even though there are plenty of issues in the log files.
What is odd is that when I was first prototyping the use of Jenkins on this project, I set it up to run pylint on just a single file. I came across another StackOverflow post (unable to get pylint output to populate the violations graph) that showed a msg-template which got it working. I even got the graph to show up for the PyLint Warnings Trend. I used the following definition from that post:
msg-template={path}:{line}: [{msg_id}({symbol}), {obj}] {msg}
Note that this format is slightly different from the one recommended by my Jenkins page (shown earlier). Even though this worked for a single file, neither template now seems to work for multiple files, or else there is something other than the template causing the problem. My graph has flat-lined, and I always get 0 issues reported.
I have had trouble finding useful documentation on the Jenkins PyLint Warnings tool. Does anyone have any ideas or pointers to documentation I can research further? Thanks much!
Make sure you pass the output-format parameter in the pylint command. Example:
pylint --exit-zero --output-format=parseable module1 module2 > pylint.report
You have to set Pylint's msg-template option in .pylintrc as follows:
msg-template={path}: {line}: [{msg_id} ({symbol}), {obj}] {msg}
output-format=text
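When several templates are in play, it can help to check which one the generated log lines actually follow before looking at Jenkins itself. The following Python sketch does that for the two templates quoted in the question. The regular expressions and the names TEMPLATES and count_matches are written for this illustration only; they are not the parser used by the Jenkins plugin.

```python
import re

# Hypothetical regexes mirroring the two msg-template strings from the
# question; they are NOT the Jenkins plugin's own parser.
TEMPLATES = {
    # {path}:{line}: [{msg_id}, {obj}] {msg} ({symbol})
    "jenkins_page": re.compile(
        r"^(?P<path>.+?):(?P<line>\d+): "
        r"\[(?P<msg_id>[A-Z]\d+), (?P<obj>[^\]]*)\] "
        r"(?P<msg>.*) \((?P<symbol>[\w-]+)\)$"
    ),
    # {path}:{line}: [{msg_id}({symbol}), {obj}] {msg}
    "older_post": re.compile(
        r"^(?P<path>.+?):(?P<line>\d+): "
        r"\[(?P<msg_id>[A-Z]\d+)\((?P<symbol>[\w-]+)\), (?P<obj>[^\]]*)\] "
        r"(?P<msg>.*)$"
    ),
}


def count_matches(lines):
    """Count how many lines match each template."""
    counts = {name: 0 for name in TEMPLATES}
    for line in lines:
        text = line.rstrip("\r\n")
        for name, pattern in TEMPLATES.items():
            if pattern.match(text):
                counts[name] += 1
    return counts


# Intended use on one of the generated logs (path taken from the question):
#   with open(r"Source\pylint_logs\pylint1.log") as fh:
#       print(count_matches(fh))
```

A log whose lines all fall under one template while the parser expects the other would be one possible explanation for "found 0 issues".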

Continuous Integration with Blue Ocean, Github and Nuget causes path too long

I am trying to get a build chain to work with Jenkins Blue Ocean, where the sources are in GitHub and additional dependencies come from NuGet.
When I restore packages, I get the following error after the package NUnit.Extension.VSProjectLoader.3.7.0:
Errors in packages.config projects
The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
On the agent machine the root path is very short (C:\guinode\), but additional length is added on top of it, so the packages folder ends up at the following path (MyGitProject replaces my actual project name; the length is equal):
C:\guinode\workspace\MyGitProject_master-CFRRXMXQEUULVB4YKQOFGB65CQNC4U5VJKTARN2A6TSBK5PBATBA\packages
Checking the package on the agent machine shows that NUnit.Extension.VSProjectLoader.3.7.0 was loaded completely.
Checking a local installation and substituting the first part of the package path, I can find two files whose full paths are 260 characters or longer.
They belong to an internal project, so I have a chance of influencing that.
None of the directories are 248 characters or more.
So the immediate solution for me is to redeploy the internal reference package.
My question for future reference is whether I can change the packages location, or do something about workspace\MyGitProject_master-CFRRXMXQEUULVB4YKQOFGB65CQNC4U5VJKTARN2A6TSBK5PBATBA, so that I save some characters by default.
According to the Microsoft documentation, it is possible to lift the 260-character limit.
If you prefix your path with '\\?\', e.g. '\\?\C:\guinode\workspace...', then extended-length paths are used (up to roughly 32,767 characters). I hope that setting JENKINS_HOME to this kind of path makes all processes use it, but I'm not sure.
On recent Windows versions (Windows 10 version 1607, Windows Server 2016) there is a registry option to enable long paths. Set the value LongPathsEnabled (type REG_DWORD) under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem to 1 and restart the process.
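The check described in the question (taking a local copy of the packages and substituting the workspace prefix used on the agent) can be automated. The following Python sketch lists the files and directories that would reach the limits quoted in the error message. The function name projected_long_paths and its parameters are made up for illustration; the lengths are computed on the projected Windows-style string, so the script also runs on a non-Windows machine.

```python
import os


def projected_long_paths(local_root, target_root, file_limit=260, dir_limit=248):
    """List paths that would reach the limits if local_root lived at target_root.

    local_root: directory to scan on the local machine
    target_root: Windows-style path the tree would have on the agent
    file_limit / dir_limit: thresholds taken from the error message
    """
    offenders = []
    base = target_root.rstrip("\\")
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel = os.path.relpath(dirpath, local_root)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        projected_dir = "\\".join([base] + parts)
        if len(projected_dir) >= dir_limit:
            offenders.append(projected_dir)
        for name in filenames:
            projected_file = projected_dir + "\\" + name
            if len(projected_file) >= file_limit:
                offenders.append(projected_file)
    return offenders


# Intended use: pass the local packages folder as local_root and the
# packages path from the agent (quoted in the question) as target_root.
```

Running this against each candidate workspace prefix shows how many characters need to be saved before the restore stops failing.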

How to "reduce" Jenkins Pipeline output path

Until recently we were building our solution without any "Pipeline" in Jenkins, so I'm currently in the process of moving our build to multibranch pipelines.
The issue I'm running into is that our solution has a deep structure (a lot of subfolders, some with long names).
Currently, the Jenkins pipeline checks everything out into a folder that looks like:
D:\ws\ght-build_feature_pipelines-TMQ33LB5OQIQ5VXVMFKFDG2HWCD4MUOGEGUWJUOMZ5D2GI42BIQA
This is very long, and now we are reaching the 260-character limit of MSBuild:
C:\Program Files (x86)\Microsoft Visual
Studio\2017\Professional\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets(2991,5):
error MSB3553: Resource file
"obj\Release\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.dddddddddd.Resources.resources"
has an invalid name. The item metadata "%(FullPath)" cannot be applied
to the path
"obj\Release\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.dddddddddd.Resources.resources".
The specified path, file name, or both are too long. The fully
qualified file name must be less than 260 characters, and the
directory name must be less than 248 characters.
[D:\ws\ght-build_feature_pipelines-TMQ33LB5OQIQ5VXVMFKFDG2HWCD4MUOGEGUWJUOMZ5D2GI42BIQA\Src\bbbbbb\dddddd\dddddddddddddd\yyyyyyy\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv\xx.aaaaaaaaaa.yyy.bbbbbb.dddddddddddddd.yyyyyyy.vvv.csproj]
We have so many cases where the path is long that it would be a really big job to refactor everything, so I'm looking for a way to tell Jenkins to use a shorter path.
What I finally did:
pipeline {
    agent {
        node {
            label 'windows-node'
            customWorkspace "D:\\ws\\${env.BRANCH_NAME}"
        }
    }
    options {
        skipDefaultCheckout()
    }
    ...
}
And I have a step that does the checkout. It was easier for me to have "per-job" behavior, without touching the Jenkins global settings.
Update (for any recent Jenkins instances)
It turns out that recent Jenkins versions seem to ignore PATH_MAX.
The only thing it does is issue a warning in the Jenkins log when it is smaller than a certain value, which does not actually matter, as the setting itself is ignored anyway (as seen on Jenkins 2.249.3). See also: JENKINS-2111
As far as I can tell, the new setting was introduced in jenkins-branch-api 2.0.21:
There is a new property: MAX_LENGTH.
It defaults to 32 characters.
You can set it the same way as PATH_MAX:
As a Java property, to ensure that Jenkins starts with the right setting, e.g.:
-Djenkins.branch.WorkspaceLocatorImpl.MAX_LENGTH=40
or at run time, using the script console:
jenkins.branch.WorkspaceLocatorImpl.MAX_LENGTH=40
For older Jenkins instances
There is a Java property you can set to specify the maximum length of the directory name, e.g.:
-Djenkins.branch.WorkspaceLocatorImpl.PATH_MAX=20
To make it permanent, you have to specify this property in the Jenkins Java startup configuration file.
You can also read and write this property from the Jenkins script console, either for temporary changes or just to give it a try, since it takes effect immediately, e.g.:
println jenkins.branch.WorkspaceLocatorImpl.PATH_MAX
jenkins.branch.WorkspaceLocatorImpl.PATH_MAX = 20
println jenkins.branch.WorkspaceLocatorImpl.PATH_MAX
Setting this value to 0 changes the path generation behavior.
For details please check:
https://issues.jenkins-ci.org/browse/JENKINS-34564
https://issues.jenkins-ci.org/browse/JENKINS-38706

Resources