spark-submit: add multiple jars in classpath

I am trying to run a Spark program that needs multiple jar files; with only one jar I am not able to run it. I want to add both jar files, which are in the same location. I have tried the command below, but it shows a dependency error:
spark-submit \
--class "max" maxjar.jar Book1.csv test \
--driver-class-path /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar
How can I add another jar file that is in the same directory?
I want to add /usr/lib/spark/assembly/lib/hive-serde.jar.

Just use the --jars parameter. Spark will share those jars (comma-separated) with the executors.

Specifying the full path for each additional jar works:
./bin/spark-submit --class "SparkTest" --master local[*] --jars /fullpath/first.jar,/fullpath/second.jar /fullpath/your-program.jar
Or add the jars in conf/spark-defaults.conf with lines like:
spark.driver.extraClassPath /fullpath/first.jar:/fullpath/second.jar
spark.executor.extraClassPath /fullpath/first.jar:/fullpath/second.jar
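Ordering matters as well: spark-submit expects spark-submit [options] <app jar> [app arguments], so anything placed after the application jar (like --driver-class-path after Book1.csv test in the question) is handed to your program as an argument rather than read by Spark. A minimal Python sketch that assembles the command in the right order; build_spark_submit is a hypothetical helper, and the jar paths are the ones from the question:

```python
import shlex


def build_spark_submit(main_class, app_jar, app_args, extra_jars):
    """Return a spark-submit argv list with every option before the application jar."""
    cmd = [
        "spark-submit",
        "--class", main_class,
        # --jars takes one comma-separated list (no spaces between entries)
        "--jars", ",".join(extra_jars),
    ]
    # Everything after the application jar is passed to the program's main() as arguments.
    cmd.append(app_jar)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        "max",
        "maxjar.jar",
        ["Book1.csv", "test"],
        [
            "/usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar",
            "/usr/lib/spark/assembly/lib/hive-serde.jar",
        ],
    )
    # Print a copy-pasteable shell command (shlex.join needs Python 3.8+).
    print(shlex.join(cmd))
```

The printed line can be run as-is, or the list can be passed straight to subprocess.run(cmd, check=True).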

You can use * to include all jars in a folder when adding them in conf/spark-defaults.conf:
spark.driver.extraClassPath /fullpath/*
spark.executor.extraClassPath /fullpath/*

I was trying to connect to MySQL from Python code executed with spark-submit.
I was using the HDP sandbox, which uses Ambari. I tried a lot of options such as --jars, --driver-class-path, etc., but none worked.
Solution
Copy the jar into /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/
I'm not yet sure whether this is a solution or a quick hack, but since I'm working on a POC, it works for me.
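The path above is specific to that miniconda install. A minimal sketch of the same quick hack that works wherever pyspark lives, assuming (as the path above suggests) that the installed pyspark package directory contains a jars/ folder; copy_into_pyspark_jars is a hypothetical helper:

```python
import os
import shutil


def copy_into_pyspark_jars(jar_path, pyspark_pkg_dir):
    """Copy a jar into <pyspark package dir>/jars and return the destination path."""
    jars_dir = os.path.join(pyspark_pkg_dir, "jars")
    if not os.path.isdir(jars_dir):
        # Assumption: pip/conda installs of pyspark ship their bundled jars here.
        raise FileNotFoundError(f"no jars directory under {pyspark_pkg_dir}")
    # copy2 keeps file metadata and returns the path of the new file.
    return shutil.copy2(jar_path, jars_dir)
```

For example, after import pyspark you could call copy_into_pyspark_jars("/path/to/your-connector.jar", os.path.dirname(pyspark.__file__)).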

In Spark 2.3 you just need to set the --jars option. The file paths should be prefixed with the scheme, though, i.e. file:// followed by the absolute path to the jars (file:///home/... for a path under /home).
E.g.: file:///home/hadoop/spark/externaljars/* or file:///home/hadoop/spark/externaljars/abc.jar,file:///home/hadoop/spark/externaljars/def.jar
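For explicit jar paths (not the * form), Python's pathlib can produce the URI for you; it prefixes file:// to the absolute path, which gives three slashes for a path such as /home/hadoop. A small sketch assuming POSIX paths; jars_as_file_uris is a hypothetical helper:

```python
from pathlib import PurePosixPath


def jars_as_file_uris(paths):
    """Turn absolute POSIX jar paths into a comma-separated list of file URIs for --jars."""
    uris = []
    for p in paths:
        path = PurePosixPath(p)
        if not path.is_absolute():
            raise ValueError(f"not an absolute path: {p}")
        # as_uri() produces file:// + the absolute path, e.g. file:///home/...
        uris.append(path.as_uri())
    return ",".join(uris)
```

It is meant for explicit file names; keep writing the wildcard form by hand.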

Pass --jars to spark-submit with the paths of the jar files separated by commas.
For reference:
--driver-class-path is used to specify "extra" jars to add to the classpath of the Spark job's "driver"
--driver-library-path is used to set extra library path entries for the Spark driver (a native library path, not the jar classpath)
--driver-class-path only adds the jars to the driver's classpath. If you also need the jars on the "executors", use --jars
To set the jars programmatically, set the following config (this one applies when running on YARN):
spark.yarn.dist.jars with a comma-separated list of jars.
E.g.:
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Spark config example") \
.config("spark.yarn.dist.jars", "<path-to-jar/test1.jar>,<path-to-jar/test2.jar>") \
.getOrCreate()

You can use --jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') to include an entire folder of jars.
So,
spark-submit --class com.yourClass \
--jars $(echo /Path/To/Your/Jars/*.jar | tr ' ' ',') \
...
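The same comma-joined list can be built in Python if you launch spark-submit from a script. A minimal sketch; jars_arg is a hypothetical helper. Unlike the echo | tr version, paths containing spaces are not split apart, and an empty folder raises an error instead of passing the literal *.jar pattern through:

```python
import glob
import os


def jars_arg(folder):
    """Comma-join every *.jar in folder, like $(echo folder/*.jar | tr ' ' ',')."""
    # glob.escape guards against [ ] * ? in the folder name itself.
    jars = sorted(glob.glob(os.path.join(glob.escape(folder), "*.jar")))
    if not jars:
        raise FileNotFoundError(f"no .jar files in {folder}")
    return ",".join(jars)
```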

For the --driver-class-path option, you can use : as the delimiter to pass multiple jars.
Below is an example with the spark-shell command, but I expect the same to work with spark-submit as well:
spark-shell --driver-class-path /path/to/example.jar:/path/to/another.jar
Spark version: 2.2.0

If you are using a properties file, you can add the following line there:
spark.jars=jars/your_jar1.jar,...
assuming this layout:
<your root from where you run spark-submit>
|
|-jars
   |-your_jar1.jar

Related

Jenkins Console Print Encoded Characters

When outputting characters from a declarative pipeline running inside a Linux container, is it possible to change the encoding to match the true output from the terminal? E.g.:
Formatting I want:
├── file1
├── file2
└── file3
Formatting I get:
+-- file1
+-- file2
+-- file3
I tried passing the following arguments to my Docker Agent:
-e JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8"
-e LC_ALL="en_US.UTF-8"
Combined with:
sh returnStdout: true, script: " "
And got ├── in place of the "+--", which seems to be the ANSI encoding for the "├──".
I am using the ansiColor Option but that didn't seem to help much.
I saw this similar question, but I was unsure how to implement the solution in the pipeline.
Jenkins: console output characters
You can use the Jenkins UI to change the encoding to UTF-8.
Go to
Jenkins -> Manage Jenkins -> Configure System -> Global properties
and add two environment variables, JAVA_TOOL_OPTIONS and LANG, with the values -Dfile.encoding=UTF-8 and en_US.UTF-8 respectively.
After adding these you may need to restart Jenkins.
Reference: https://www.linkedin.com/pulse/how-resolve-utf-8-encoding-issue-jenkins-ajuram-salim/
UPDATE:
Or you can update <arguments> in the jenkins.xml file.
e.g.
<arguments>-Xrs -Xmx256m -Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle -Dfile.encoding=UTF-8 -jar "%BASE%\jenkins.war" --httpPort=8080 --webroot="%BASE%\war"</arguments>
Here is the official answer from CloudBees. Unfortunately, none of these worked for me.
https://support.cloudbees.com/hc/en-us/articles/360004397911-How-to-address-issues-with-unmappable-characters-
Add these to the JVM arguments on the master and also on the agents:
-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
For me, the fix was to specify the optional 'encoding' parameter of the 'sh' pipeline step (sh: Shell Script).
Of course, this will only work provided the file.encoding is set properly as described in other posts here.
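To see why the charset settings above matter, here is a minimal Python sketch (not Jenkins-specific) of the underlying problem: the box-drawing characters are written as UTF-8 bytes, and a reader that decodes those bytes with an "ANSI" code page shows garbage. Windows-1252 (cp1252) is an assumption about which code page is in play; as_misread and repair are hypothetical helpers:

```python
def as_misread(text, wrong_charset="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def repair(garbled, wrong_charset="cp1252"):
    """Undo as_misread: re-encode with the wrong charset, then decode as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


if __name__ == "__main__":
    garbled = as_misread("├── file1")
    print(garbled)          # what a cp1252 reader displays
    print(repair(garbled))  # the original tree line again
```

The fix is always on the reading side: make every step that decodes the output (JVM, agent, sh step) agree that the bytes are UTF-8.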

How to package a single Python script with nix?

I have a single Python script called myscript.py and would like to package it up as a nix derivation with mkDerivation.
The only requirement is that my Python script has a run-time dependency, say, on the consul Python library (which itself depends on the requests and six Python libraries).
For example for myscript.py:
#!/usr/bin/env python3
import consul
print('hi')
How to do that?
I can't figure out how to pass mkDerivation a single script (its src seems to always want a directory, or fetchgit or similar), and also can't figure out how to make the dependency libraries available at runtime.
When you have a single Python file as your script, you don't need src in your mkDerivation and you also don't need to unpack any source code.
The default mkDerivation will try to unpack your source code; to prevent that, simply set dontUnpack = true.
myscript-package = pkgs.stdenv.mkDerivation {
  name = "myscript";
  propagatedBuildInputs = [
    (pkgs.python36.withPackages (pythonPackages: with pythonPackages; [
      consul
      six
      requests2
    ]))
  ];
  dontUnpack = true;
  installPhase = "install -Dm755 ${./myscript.py} $out/bin/myscript";
};
If your script is executable (which we ensure with install -m above), Nix will automatically replace your #!/usr/bin/env python3 line with one that invokes the right specific Python interpreter (the one for python36 in the example above), in an environment where the Python packages you've specified in propagatedBuildInputs are available.
If you use NixOS, you can then also put your package into environment.systemPackages, and myscript will be available in shells on that NixOS.
This helper function is really nice:
pkgs.writers.writePython3Bin "github-owner-repos" { libraries = [ pkgs.python3Packages.PyGithub ]; } ''
  import os
  import sys
  from github import Github
  if __name__ == '__main__':
      gh = Github(os.environ['GITHUB_TOKEN'])
      for repo in gh.get_user(login=sys.argv[1]).get_repos():
          print(repo.ssh_url)
''
https://github.com/nixos/nixpkgs/blob/master/pkgs/build-support/writers/default.nix#L319

swagger-codegen custom generator ClassNotFound

I'm writing a custom generator for swagger-codegen. When I attempt to run the generator with
java -jar modules/swagger-codegen-cli/target/swagger-codegen-cli.jar generate -i path/to/swagger.json -l com.my.company.codegen.MyGenerator -o outputlocation
it fails with
Can't load config class with name com.my.company.codegen.MyGenerator
... list of built-in generators...
at io.swagger.codegen.CodegenConfigLoader.forName(CodegenConfigLoader.java:31)
at io.swagger.codegen.config.CodegenConfigurator.toClientOptInput(CodegenConfigurator.java:286)
at io.swagger.codegen.cmd.Generate.run(Generate.java:186)
at io.swagger.codegen.SwaggerCodegen.main(SwaggerCodegen.java:35)
Caused by: java.lang.ClassNotFoundException: com.my.company.codegen.MyGenerator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at io.swagger.codegen.CodegenConfigLoader.forName(CodegenConfigLoader.java:29)
... 3 more
I'm not having trouble with any of the built-in generators.
What I did to get here (following the readme):
cloned the project
mvn package
java -jar modules/swagger-codegen-cli/target/swagger-codegen-cli.jar meta -o output/myLibrary -n myGenerator -p com.my.company.codegen
and then ran the command above
I also tried running mvn package again after making my custom generator (which did not produce a .jar file anywhere I could find), and tried creating the .jar file myself. I got the same error.
Also FYI, my confusion was definitely increased by some apparent documentation inconsistencies: expected location for my module differs between here and the classname expected here (end of that section). Also, the command for making your own module specifies modules/swagger-codegen-distribution... when I believe it should specify modules/swagger-codegen-cli.... And the guidance in the project readme doesn't seem very congruent with the custom module readme that is generated here.
I don't normally work with Java, so apologies if I'm just missing something super obvious. Thanks in advance for any help!
After trying a bunch of things / internetting, here is what worked:
java -cp output/myLibrary/target/myCustomCodegen-swagger-codegen-1.0.0.jar:modules/swagger-codegen-cli/target/swagger-codegen-cli.jar io.swagger.codegen.SwaggerCodegen generate -i path/to/swagger.json -l com.my.company.codegen.MyCustomCodegenGenerator -o outputlocation
Here are the steps I had to take, from start to finish, to create a custom generator:
1. git clone from source
2. cd swagger-codegen
3. mvn package
4. java -jar modules/swagger-codegen-cli/target/swagger-codegen-cli.jar meta -o output/myLibrary -n myCustomCodegen -p com.my.company.codegen
   This creates output/myLibrary and its subdirectories, where you should find MyCustomCodegenGenerator.java ("Generator" is appended to the class name you specify in the command). You should also find the mustache templates in the resources subdirectory.
5. Make whatever changes you want to MyCustomCodegenGenerator.java and the templates.
6. cd output/myLibrary
7. mvn package
8. cd ../..
9. Now generate your custom library (building in step 7 should have generated target/myCustomCodegen-swagger-codegen-1.0.0.jar for you):
   java -cp output/myLibrary/target/myCustomCodegen-swagger-codegen-1.0.0.jar:modules/swagger-codegen-cli/target/swagger-codegen-cli.jar io.swagger.codegen.SwaggerCodegen generate -i path/to/swagger.json -l com.my.company.codegen.MyCustomCodegenGenerator -o outputlocation
Notes:
The cd commands reflect where I put things; I just wanted to be clear about where I was, relative to the project, when running each command.
If you are just using the default generated base class for your generator (instead of subclassing an existing language), you will get a FileNotFound exception for myCustomCodegen/myFile.mustache. It comes from this optional block, which you can simply comment out of your custom generator class.
Remember to run mvn package on your custom module whenever you make changes.
You'll need to include your custom library in the java command. Because -cp is ignored when -jar is used, name the main class instead. For example:
java -cp path/to/your.jar:modules/swagger-codegen-cli/target/swagger-codegen-cli.jar \
io.swagger.codegen.SwaggerCodegen \
{args}
If you are creating a generator on Windows and running it from PowerShell, note that I had to modify @baylee's steps as follows:
mvn install
and
java -cp 'output/myLibrary/target/myCustomCodegen-swagger-codegen-1.0.0.jar;modules/swagger-codegen-cli/target/swagger-codegen-cli.jar' io.swagger.codegen.Codegen -i path/to/swagger.json -l my-language -o outputlocation
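One platform difference in these commands is the classpath separator: Java expects : between classpath entries on Linux/macOS and ; on Windows, which is exactly what Python's os.pathsep holds. A minimal sketch that builds either variant; codegen_command is a hypothetical helper, and the main class defaults to io.swagger.codegen.SwaggerCodegen from the earlier answer (the PowerShell variant above uses io.swagger.codegen.Codegen, so pass that instead if your build needs it):

```python
import os


def codegen_command(custom_jar, cli_jar, args, sep=os.pathsep,
                    main_class="io.swagger.codegen.SwaggerCodegen"):
    """Build a java argv that puts both jars on the classpath and names the main class."""
    classpath = sep.join([custom_jar, cli_jar])
    # -cp would be ignored together with -jar, so the main class is given explicitly.
    return ["java", "-cp", classpath, main_class, *args]
```

Passing the list to subprocess.run(cmd, check=True) avoids the shell entirely, so the PowerShell quoting around the classpath is not needed.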

How to re-build jna-4.1.0.jar with linux-s390x specific libjnidispatch.so

How can I rebuild the jna-4.1.0.jar file to include the linux-s390x-specific libjnidispatch.so file?
One of my applications needs it and fails on its dependency on this libjnidispatch.so file.
Did try to follow this question: How to use JNAerator with multiple dynamic libraries under one header?
Command used:
java -jar jnaerator-0.11-shaded.jar \
-arch linux-s390x linux-s390x/libjnidispatch.so \
-mode jna-3.3.0-jenkins-3.jar \
-jar jna-3.3.0-jenkins-3_updated.jar
I get the following error:
ERROR: JNAeration failed !
#
# Error parsing arguments :
# -arch linux-s390x linux-s390x/libjnidispatch.so -mode jna-3.3.0-jenkins-3.jar -jar jna-3.3.0-jenkins-3_updated.jar : com.ochafik.lang.jnaerator.JNAerator$CommandLineException: Argument 'linux_s390x' is not one of the expected values :
# linux_x64,
# linux_x86,
# armeabi,
# sunos_x86,
# sunos_sparc,
# darwin_universal,
# win32,
# win64
# Please use -h for help on the command-line options available.
Clone the git repository.
Make sure you have a JDK installed, along with Apache Ant and native build tools (make, gcc, grep, etc.).
Then just run ant native; ant jar.
Note that s390x may not be recognized out of the box, but if you look through the build files for how the other platforms are handled it should be straightforward to add in a switch for s390x (see build.xml and native/Makefile).
If you have a package distribution for a previous JNA version, that package may include the necessary native libraries; if so, consider submitting them to the JNA project as a GitHub pull request to have them incorporated into the project proper.
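Once the jar is rebuilt, you can check that the native library actually made it in. A minimal sketch, assuming JNA's usual layout of bundling natives as com/sun/jna/<os-arch>/libjnidispatch.so inside the jar and that the s390x directory is named linux-s390x (the name used in the question); has_native_lib is a hypothetical helper:

```python
import zipfile


def has_native_lib(jar_path, resource_prefix="linux-s390x"):
    """Return True if the jar contains com/sun/jna/<prefix>/libjnidispatch.so."""
    # Assumed entry layout; adjust if your JNA version stores natives elsewhere.
    entry = f"com/sun/jna/{resource_prefix}/libjnidispatch.so"
    # A jar is a zip archive, so the standard zipfile module can list its entries.
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, has_native_lib("path/to/rebuilt/jna.jar") should return True after a successful build.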

jenkins plugin for triggering build whenever any file changed in a given directory

I am looking for functionality where we have a directory with some files in it.
Whenever anyone makes a change to any of the files in the directory, Jenkins should trigger a build.
Is there any plugin or method for this functionality? Please advise.
Thanks in advance.
I have not tried it myself, but the FSTrigger plugin seems to do what you want:
FSTrigger provides polling mechanisms to monitor a file system and
trigger a build if a file or a set of files have changed.
If you can monitor the directory with a script, you can trigger the build with an HTTP GET, for example with wget or curl:
wget -O- $JENKINS_URL/job/JOBNAME/build
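If you go the script route, the monitoring part can be a simple polling loop that compares snapshots of the directory. A minimal Python sketch; snapshot, changed_files and trigger_build are hypothetical helpers, the URL mirrors the wget example, and depending on your Jenkins security settings the build URL may need credentials or a POST request instead of a plain GET:

```python
import os
import urllib.request


def snapshot(directory):
    """Map each file path under directory to its (mtime in ns, size)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def changed_files(before, after):
    """Return sorted paths that were added, removed or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def trigger_build(jenkins_url, job_name):
    """GET <jenkins_url>/job/<job_name>/build, like the wget example (hypothetical URL)."""
    url = f"{jenkins_url}/job/{job_name}/build"
    with urllib.request.urlopen(url) as resp:
        return resp.status
```

Take a snapshot, sleep (for example 60 seconds), take another, and call trigger_build whenever changed_files returns a non-empty list.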
This is only slightly related: the question seems to be about monitoring static files on a system, but there are many version control systems built for exactly this purpose.
I answered this in another post if you're using git to track changes on the files themselves:
#!/bin/bash
set -e
job_name="whatever"
JOB_URL="http://myserver:8080/job/${job_name}/"
FILTER_PATH="path/to/folder/to/monitor"
# Python snippet: print every affected path from the build's change set
python_func="import json, sys
obj = json.loads(sys.stdin.read())
ch_list = obj['changeSet']['items']
_list = [j['affectedPaths'] for j in ch_list]
for outer in _list:
    for inner in outer:
        print(inner)
"
_affected_files=$(curl --silent "${JOB_URL}${BUILD_NUMBER}/api/json" | python -c "$python_func")
if [ -z "$(echo "$_affected_files" | grep "${FILTER_PATH}")" ]; then
    echo "[INFO] no changes detected in ${FILTER_PATH}"
    exit 0
else
    echo "[INFO] changed files detected: "
    for a_file in $(echo "$_affected_files" | grep "${FILTER_PATH}"); do
        echo "    $a_file"
    done
fi
You can add this check at the top of the job's Execute shell step; it exits 0 if no changes are detected. That way, you can always poll the top level of the repo for check-ins to trigger a build, and only complete the build when the files in question change.
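If you'd rather keep the filtering in one place, the embedded python_func and the grep can be collapsed into a single Python function. A sketch assuming the same changeSet/items/affectedPaths JSON shape the script reads from the build's /api/json (other job types may expose change sets differently); affected_paths_under is a hypothetical helper, and like grep it matches filter_path as a substring:

```python
import json


def affected_paths_under(build_json, filter_path):
    """Return the affected paths in a build's changeSet that contain filter_path."""
    obj = json.loads(build_json)
    items = obj.get("changeSet", {}).get("items", [])
    return [
        path
        for item in items
        for path in item.get("affectedPaths", [])
        if filter_path in path  # substring match, like grep "${FILTER_PATH}"
    ]
```

Feed it the body fetched with curl or urllib.request, and exit early when it returns an empty list.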
