I've implemented Twitter streaming using Twitter's streaming user API and Spark Streaming. It runs successfully on my local machine, but when I run the program on the cluster in local mode, it only runs successfully the very first time. After that it throws the following exception:
"Exception in thread "main" org.apache.spark.SparkException: Found both spark.executor.extraClassPath and SPARK_CLASSPATH. Use only the former."
And SPARK_CLASSPATH is already unset!
I have to create a new checkpoint directory each time to make it run successfully; otherwise it throws the exception above.
Can anyone help me to resolve this issue?
Thanks :)
I faced a similar issue.
Setting SPARK_CLASSPATH causes problems because it is deprecated, so don't use it. Pass the dependency jars via --jars instead:
export LIB_JARS=dependency/jcodings-1.0.8.jar,dependency.....etc
spark-submit --deploy-mode client --master local --class org.xyz.spark.driver.SomeClass --num-executors 10 --jars ${LIB_JARS}
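For completeness, a hedged sketch of the full submit command; the application jar at the end is a hypothetical name, so replace it with your own artifact:
# Hedged sketch: same options as above, plus the application jar that
# spark-submit expects as its final argument (hypothetical name).
export LIB_JARS=dependency/jcodings-1.0.8.jar,dependency/...
spark-submit --deploy-mode client --master local \
--class org.xyz.spark.driver.SomeClass \
--num-executors 10 \
--jars ${LIB_JARS} \
your-application.jar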
Try using:
#!/bin/bash
HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase
SPARK_CLASSPATH="$HBASE_HOME/conf/:$HBASE_HOME/hbase-client.jar:$HBASE_HOME/hbase-protocol.jar:$HBASE_HOME/lib/htrace-core.jar:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar"
spark-submit --num-executors 2 --executor-cores 2 --executor-memory 10G --conf spark.executor.extraClassPath=$SPARK_CLASSPATH your_spark_program.jar --class your_entry_class
The most important part is --conf spark.executor.extraClassPath=$SPARK_CLASSPATH.
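If the exception still shows up, it is also worth checking that nothing in your shell or in the cluster's own Spark config exports SPARK_CLASSPATH behind your back (a minimal sketch, assuming $SPARK_HOME points at the installation used by spark-submit):
# Hedged sketch: look for a stray SPARK_CLASSPATH export or a conflicting
# spark.executor.extraClassPath entry in the cluster configuration.
env | grep SPARK_CLASSPATH
grep -RE "SPARK_CLASSPATH|spark\.executor\.extraClassPath" "$SPARK_HOME/conf/"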
I am trying to set a custom number of retries for a failed task in Google Cloud Run. According to the documentation, I should use --max-retries to set the number of retries. I tried to set it with the following command:
gcloud beta run deploy ${SAMPLE} \
--set-env-vars GOOGLE_CLOUD_PROJECT=${GOOGLE_CLOUD_PROJECT} \
--image gcr.io/${GOOGLE_CLOUD_PROJECT}/${SAMPLE} \
--timeout=30m --cpu 4 --memory 4Gi --concurrency 1 \
--execution-environment gen2 \
--max-retries 2
But I got an error
unrecognized arguments:
--max-retries
The documentation also mentions that the value can be modified in the console ("Click Container, Variables, Connections, Security to expand the job properties page."), but I am not able to find this in the console either.
Sometimes flags are only available via the CLI and not via the GUI. In this case, --max-retries is in beta, which is why it's not in the GUI yet. It's also possible the GUI features haven't rolled out everywhere yet.
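If in doubt, you can check directly whether the flag is exposed by your installed gcloud version (a hedged sketch; it just searches the built-in help text):
# Hedged sketch: update the SDK, then look for the flag in the help output.
gcloud components update
gcloud beta run deploy --help | grep -A2 -e '--max-retries' \
|| echo "--max-retries not available in this gcloud version/track"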
When using Google Endpoints with Cloud Run to provide the container service, one creates a YAML file (Swagger 2.0 format) to specify the paths with all configurations. For EVERY change, the following is what I do (based on the documentation: https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-functions):
Step 1: Deploying the Endpoints configuration
gcloud endpoints services deploy openapi-functions.yaml \
--project ESP_PROJECT_ID
This gives me the following output:
Service Configuration [CONFIG_ID] uploaded for service [CLOUD_RUN_HOSTNAME]
Then,
Step 2: Download the script to the local machine and run it
chmod +x gcloud_build_image
./gcloud_build_image -s CLOUD_RUN_HOSTNAME \
-c CONFIG_ID -p ESP_PROJECT_ID
Then,
Step 3: Redeploy the Cloud Run service
gcloud run deploy CLOUD_RUN_SERVICE_NAME \
--image="gcr.io/ESP_PROJECT_ID/endpoints-runtime-serverless:CLOUD_RUN_HOSTNAME-CONFIG_ID" \
--allow-unauthenticated \
--platform managed \
--project=ESP_PROJECT_ID
Is this the process for every API path change? Or is there a simpler direct method of updating the YAML file and uploading it somewhere?
Thanks.
Based on the documentation, yes, this is the process for every API path change. However, this may change in the future, as the feature is currently in beta, as stated in the documentation you shared.
You may want to look over here to create a feature request to GCP so they can improve this feature in the future.
In the meantime, I would advise scripting this process, since it is always the same steps; a small bash script that runs these commands, as sketched below, would help you automate the task.
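For example, a minimal bash sketch that chains the three steps from your question (all ALL_CAPS values are the same placeholders you used and need to be filled in):
#!/bin/bash
# Hedged sketch: redeploy the Endpoints config and the ESP Cloud Run service in one go.
set -euo pipefail

# Step 1: deploy the Endpoints configuration.
gcloud endpoints services deploy openapi-functions.yaml \
--project ESP_PROJECT_ID

# Step 2: rebuild the ESP image; pass the CONFIG_ID printed by step 1
# as the first argument of this script.
CONFIG_ID="$1"
./gcloud_build_image -s CLOUD_RUN_HOSTNAME -c "${CONFIG_ID}" -p ESP_PROJECT_ID

# Step 3: redeploy the Cloud Run service with the freshly built image.
gcloud run deploy CLOUD_RUN_SERVICE_NAME \
--image="gcr.io/ESP_PROJECT_ID/endpoints-runtime-serverless:CLOUD_RUN_HOSTNAME-${CONFIG_ID}" \
--allow-unauthenticated \
--platform managed \
--project=ESP_PROJECT_ID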
Hope you find this useful.
When you use the default Cloud Endpoints image as described in the documentation, the parameter --rollout_strategy=managed is set automatically.
You have to wait up to one minute for the new configuration to take effect. At least, that is what I observe in my deployments. Give it a try!
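To see whether the new configuration has been picked up, you can also list the configurations known to Endpoints for your service (a hedged sketch using the placeholder names from the question):
# Hedged sketch: list the uploaded service configurations and their CONFIG_IDs.
gcloud endpoints configs list \
--service CLOUD_RUN_HOSTNAME \
--project ESP_PROJECT_ID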
I want to run model training in the cloud. I am following this link, which runs sample code to train a model on the flower dataset. The tutorial consists of 4 stages:
Set up your Cloud Storage bucket
Preprocessing training and evaluation data in the cloud
Run model training in the cloud
Deploying and using the model for prediction
I was able to complete steps 1 and 2; however, in step 3 the job is submitted successfully but an error occurs and the task exits with a non-zero status of 1. Here is the log of the task.
A screenshot of the expanded log is:
I used following command:
gcloud ml-engine jobs submit training test${JOB_ID} \
--stream-logs \
--module-name trainer.task \
--package-path trainer \
--staging-bucket ${BUCKET_NAME} \
--region us-central1 \
--runtime-version=1.2 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"
Thanks in advance!
Can you please confirm that the input files (eval_data_paths and train_data_paths) are not empty? Additionally, if you are still having issues, please file an issue at https://github.com/GoogleCloudPlatform/cloudml-samples since it's easier to handle the issue on GitHub.
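For example, a quick check using the same variables from your question (a hedged sketch):
# Hedged sketch: confirm the preprocessed inputs exist and are non-empty.
gsutil ls -l "${GCS_PATH}/preproc/eval*"
gsutil ls -l "${GCS_PATH}/preproc/train*"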
I ran into the same issue and couldn't figure it out, then I followed this, starting again from git clone, and there was no error after running on GCS.
It is clear from your error message
The replica worker 1 exited with a non-zero status of 1. Termination reason: Error
that you have some programming error (a syntax error, an undefined name, etc.).
For more information, check the return codes and their meanings:
Return code | Meaning               | Cloud ML Engine response
0           | Successful completion | Shuts down and releases job resources.
1-128       | Unrecoverable error   | Ends the job and logs the error.
You need to find the bug first and fix it, then try again.
I recommend running your task locally (if your configuration supports it) before submitting it to the cloud, as sketched below. If you find any bugs, you can fix them easily on your local machine.
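For example, a minimal local run with the same module and arguments (a hedged sketch, assuming the gcloud ml-engine local train command is available in your SDK):
# Hedged sketch: exercise the trainer locally before submitting the cloud job.
gcloud ml-engine local train \
--module-name trainer.task \
--package-path trainer \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*"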
I am configuring an Apache Spark cluster.
When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page:
Memory
2.0 GB (512.0 MB Used)
2.0 GB (512.0 MB Used)
6.0 GB (512.0 MB Used)
I want to increase the memory used by the workers, but I could not find the right configuration for this. I have changed spark-env.sh as follows:
export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g
export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g"
export JAVA_OPTS="-Xms6G -Xmx6G"
But the used memory is still the same. What should I do to change used memory?
In Spark 1.0.0 and later, when using spark-shell or spark-submit, use the --executor-memory option, e.g.:
spark-shell --executor-memory 8G ...
0.9.0 and under:
When you start a job or start the shell, change the memory. We had to modify the spark-shell script so that it would carry command-line arguments through as arguments for the underlying Java application. In particular:
OPTIONS="$@"
...
$FWDIR/bin/spark-class $OPTIONS org.apache.spark.repl.Main "$@"
Then we can run our spark shell as follows:
spark-shell -Dspark.executor.memory=6g
When configuring it for a standalone jar, I set the system property programmatically before creating the Spark context and pass the value in as a command-line argument (I can make it shorter than the long-winded system props that way).
System.setProperty("spark.executor.memory", valueFromCommandLine)
As for changing the default cluster-wide: sorry, I'm not entirely sure how to do it properly.
One final point: I'm a little worried by the fact that you have two nodes with 2 GB and one with 6 GB. The memory you can use will be limited by the smallest node, so here 2 GB.
In Spark 1.1.1, to set the max memory of workers, write this in conf/spark-env.sh:
export SPARK_EXECUTOR_MEMORY=2G
If you have not used the config file yet, copy the template file
cp conf/spark-env.sh.template conf/spark-env.sh
Then make the change and don't forget to source it
source conf/spark-env.sh
In my case, I use an IPython notebook server to connect to Spark, and I want to increase the executor memory.
This is what I do:
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster(CLUSTER_URL).setAppName('ipython-notebook').set("spark.executor.memory", "2g")
sc = SparkContext(conf=conf)
According to the Spark documentation, you can change the memory per node with the command-line argument --executor-memory when submitting your application, e.g.:
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://master.node:7077 \
--executor-memory 8G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
I've tested it and it works.
The default configuration for the worker is to allocate host memory minus 1 GB for each worker. The configuration parameter to manually adjust that value is SPARK_WORKER_MEMORY, as in your question:
export SPARK_WORKER_MEMORY=6g
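A minimal sketch of applying that on a standalone cluster managed with the bundled sbin scripts (an assumption; the change has to be made on every worker node):
# Hedged sketch: set the worker memory in conf/spark-env.sh on each worker,
# then restart the standalone cluster so the master UI reports the new total.
echo 'export SPARK_WORKER_MEMORY=6g' >> conf/spark-env.sh
sbin/stop-all.sh
sbin/start-all.sh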
I'm using Icinga to monitor some servers and services. Most of them run fine, but now I'd like to monitor a JBoss AS on one server via NRPE, using the check_jboss plugin from MonitoringExchange. However, each time I try running a test command from my Icinga server via NRPE, I get a "NRPE: Unable to read output" error. When I execute the command directly on the monitored server, it runs fine. It's strange that the execution on the monitored server takes around 5 seconds to return an acceptable result, while the NRPE execution returns the error immediately. Raising the NRPE timeout didn't solve the problem. I also checked the permissions of the check_jboss plugin and set them to "777", so there should be no permission error.
I don't think there's a general issue with NRPE, because some other checks (e.g. check_load, check_disk, ...) also run via NRPE and they all work fine. The permissions of those plugins are analogous to my check_jboss plugin's.
Here is a sample execution on the monitored server, which runs fine:
/usr/lib64/nagios/plugins/check_jboss.pl -T ServerInfo -J jboss.system -a MaxMemory -w 3000: -c 2000: -f
JBOSS OK - MaxMemory is 4049076224 | MaxMemory=4049076224
Here are two command executions via NRPE from my Icinga server. Both commands are configured correctly:
./check_nrpe -H xxx.xxx.xxx.xxx -c check_hda1
DISK OK - free space: / 47452 MB (76% inode=97%);| /=14505MB;52218;58745;0;65273
./check_nrpe -H xxx.xxx.xxx.xxx -c jboss_MaxMemory
NRPE: Unable to read output
Does anyone have a hint for me? If further config information is needed, please ask :)
Try to rule out SELinux either by disabling it globally or by changing the SELinux type to nagios_unconfined_plugin_exec_t.
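A hedged sketch of how to test that; the plugin path is the one from your question and the SELinux type is the one mentioned above:
# Hedged sketch: check the current SELinux mode, test in permissive mode, and
# if the check works then, relabel the plugin instead of leaving SELinux off.
getenforce
sudo setenforce 0
sudo chcon -t nagios_unconfined_plugin_exec_t /usr/lib64/nagios/plugins/check_jboss.pl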