Creation of DSEGraphFrames in Java or Scala using a SparkSession - datastax-enterprise

I am trying to obtain a DseGraphFrame for my DSE graphs in either Java or Scala. I am following the blog documentation:
//load a graph in Java
DseGraphFrame graph = DseGraphFrameBuilder.dseGraph("test", spark);
// load a graph in scala
val graph = spark.dseGraph("test_graph")
Both require a SparkSession (in Scala it is implicit). My question is: how do you create the SparkSession spark? I already tried creating it myself, but the builder could not parse the master URL:
val spark = SparkSession
  .builder
  .master("dse://<ip_address>")
  .appName("DseGraphFrames")
  .getOrCreate()

Only applications launched with dse spark-submit can understand dse:// master addresses, so make sure your application is launched with dse spark-submit.
DSE Doc Reference

Related

GraalVM native image reflection doesn't work

I'm trying to create a GraalVM native image using the maven plugin but having some issues.
Here the config for the maven plugin
I'm using GraalVM JDK (installed through Sdkman):
$ java -version
openjdk version "16.0.1" 2021-04-20
OpenJDK Runtime Environment GraalVM CE 21.1.0 (build 16.0.1+9-jvmci-21.1-b05)
OpenJDK 64-Bit Server VM GraalVM CE 21.1.0 (build 16.0.1+9-jvmci-21.1-b05, mixed mode, sharing)
I have written a simple main class like this:
package it.r;

public class Main {
    public static void main(String[] args) {
        System.out.println("********");
        System.out.println(Main.class.getConstructors().length);
        System.out.println("********");
    }
}
When executing it using mvn exec:java -Dexec.mainClass=it.r.Main I get as a result:
********
1
********
But when doing mvn package and then executing the created executable, I have as result:
********
0
********
Why is this happening?
Here the git repo to reproduce
This issue seems to impact Jackson deserialization, as in another example I have an error from jackson that cannot deserialize a yaml file because it can't find constructors for my class.
When GraalVM native image builds your application into a native binary it statically analyzes your application.
The analysis is static, so several dynamic features your application might use require explicit configuration, for example:
reflection
serialization
method handles
using resources (like classloader.getResource())
JNI
This explicit configuration is provided as JSON configuration files.
You can provide the config files manually, but you can also run your application using a javaagent which will record usages of features requiring configuration.
In a nutshell, you run your application like this:
java -agentlib:native-image-agent=config-output-dir=/path/to/config-dir/
and exercise the code paths that use the code you want to be configured. This is important because the tracing agent can only record the config for the code it actually saw running.
The output directory will then contain a JSON file that looks, for example, like this:
[
  {
    "name":"StringCapitalizer",
    "methods":[{"name":"capitalize","parameterTypes":["java.lang.String"] }]
  },
  {
    "name":"StringReverser",
    "methods":[{"name":"reverse","parameterTypes":["java.lang.String"] }]
  }
]
This file lists the classes that need to be included into the analysis and the binary result and their members that need to be accessed.
It’s fairly straightforward but a bit tedious to create manually, which is why the agent approach is preferred.
There’s also a programmatic way to configure classes and members to be registered for reflection, but using it means you need to include a dependency on the GraalVM code in your app.
Classes accessed through reflection need to be registered in order to be included in the native image that is built; more info is in the docs.
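For the Main class from the question, a hand-written configuration could be generated as in the sketch below. It is only a minimal illustration, not the agent's actual output: the helper names are made up for this example, and it assumes the entry uses the allPublicConstructors flag of the reflection configuration format so that getConstructors() can see the public constructors. The file name reflect-config.json and the META-INF/native-image location on the classpath are assumptions to check against the docs for your GraalVM version.

```python
import json
from pathlib import Path
from typing import List


def reflection_entry(class_name: str) -> dict:
    """Build one reflection-config entry exposing a class's public constructors."""
    return {"name": class_name, "allPublicConstructors": True}


def write_reflect_config(config_dir: Path, class_names: List[str]) -> Path:
    """Write reflect-config.json into config_dir and return the file's path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    target = config_dir / "reflect-config.json"
    entries = [reflection_entry(name) for name in class_names]
    target.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return target


# Show the entry for the class from the question without touching the disk.
print(json.dumps([reflection_entry("it.r.Main")], indent=2))
```

Calling write_reflect_config(Path("src/main/resources/META-INF/native-image"), ["it.r.Main"]) would place the file where the build is assumed to pick it up.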

Tensorflow Serving - Not found: Op type not registered 'GatherTree'

I am just a newbie. I have a problem when serving a TensorFlow model in this case:
I. Using this http://opennmt.net/OpenNMT-tf/quickstart.html to train the model.
II. Serving the model with following steps:
Create docker image with:
docker build --pull -t $USER/tensorflow-serving-devel -f tensorflow_serving/tools/docker/Dockerfile.devel .
Run docker container:
docker run --name=tf_container -it $USER/tensorflow-serving-devel
Serving the model:
tensorflow_model_server --port=9000 --model_name=model_name --model_base_path=/model_file &> result_log &
III. The result_log file content:
2019-10-21 02:46:12.840258: I tensorflow_serving/core/loader_harness.cc:155] Encountered an error for servable version {name: ente version: 1569320347}: Not found: Op type not registered 'GatherTree' in binary running on 1b79e5fb3ac4. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2019-10-21 02:46:12.840280: E tensorflow_serving/core/aspired_versions_manager.cc:359] Servable {name: ente version: 1569320347} cannot be loaded: Not found: Op type not registered 'GatherTree' in binary running on 1b79e5fb3ac4. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
2019-10-21 02:46:13.664569: I tensorflow_serving/core/basic_manager.cc:280] Unload all remaining servables in the manager.
Failed to start server. Error: Unknown: 1 servable(s) did not become available: {{{name: ente version: 1569320347} due to error: Not found: Op type not registered 'GatherTree' in binary running on 1b79e5fb3ac4. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.}, }
I have searched Google and tried updating some services, but the problem is still there. Does anyone have any idea, please?
Thanks so much for any suggestions!
With the transition to TensorFlow 2.0, the GatherTree op that is used in beam search is currently not available in TensorFlow Serving.
If you exported your model with OpenNMT-tf 1.x, it uses the op GatherTree from tf.contrib which was removed in recent versions of TensorFlow Serving. You should use a previous version of TensorFlow Serving such as 1.15.0.
If you exported your model with OpenNMT-tf 2.x, it uses the op Addons>GatherTree from TensorFlow Addons which is presently not integrated in TensorFlow Serving. This is a work in progress. There are currently 2 workarounds:
use opennmt/tensorflow-serving:2.1.0 which is a custom Serving build that includes this op.
disable beam search in OpenNMT-tf by exporting your model with this configuration:
params:
  beam_width: 1
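To check which case applies, the op name can be pulled out of the server log. The sketch below is only an illustration: the function names are made up, and the regular expression assumes the "Op type not registered '...'" wording shown in the log above.

```python
import re
from typing import List

# Matches the op name in messages such as: Op type not registered 'GatherTree'
_MISSING_OP = re.compile(r"Op type not registered '([^']+)'")


def missing_ops(log_text: str) -> List[str]:
    """Return the distinct unregistered op names, in order of first appearance."""
    seen: List[str] = []
    for name in _MISSING_OP.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def suggested_fix(op_name: str) -> str:
    """Map an op name to the workaround described in the answer above."""
    if op_name == "GatherTree":
        return "exported with OpenNMT-tf 1.x: use TensorFlow Serving 1.15.0"
    if op_name == "Addons>GatherTree":
        return ("exported with OpenNMT-tf 2.x: use opennmt/tensorflow-serving:2.1.0 "
                "or export with beam_width: 1")
    return "unknown op: check which library registers it"


sample = ("Servable {name: ente version: 1569320347} cannot be loaded: Not found: "
          "Op type not registered 'GatherTree' in binary running on 1b79e5fb3ac4.")
for op in missing_ops(sample):
    print(op, "->", suggested_fix(op))
```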

Running AWS SAM build from within Python script

I'm in the process of migrating entire CloudFormation stacks to Troposphere, including Lambda and Lambda-reliant CFN Custom Resources.
One of my goals is to circumvent the creation of template files altogether, making the Python code the sole "source of truth" (i.e. without template files that are created and can therefore be edited, causing config drift).
This requires the ability to:
Pass a file-like object to the SAM builder (instead of a file name)
Call the AWS SAM builder from Python and not from the CLI
My first naive idea was that I would be able to import a few modules from aws-sam-cli, put a wrapper for io.StringIO around it (to hold the template as a file-like object) and presto! Then I looked at the source code for sam build and all hope left me:
I may not be able to use Docker/containers for building, as it will map the build environment, including template files.
AWS SAM CLI is not designed to have a purely callable set of library functions, similar to boto3. Close, but not quite.
Here is the core of the Python source
with BuildContext(template,
                  base_dir,
                  build_dir,
                  clean=clean,
                  manifest_path=manifest_path,
                  use_container=use_container,
                  parameter_overrides=parameter_overrides,
                  docker_network=docker_network,
                  skip_pull_image=skip_pull_image,
                  mode=mode) as ctx:
    builder = ApplicationBuilder(ctx.function_provider,
                                 ctx.build_dir,
                                 ctx.base_dir,
                                 manifest_path_override=ctx.manifest_path_override,
                                 container_manager=ctx.container_manager,
                                 mode=ctx.mode
                                 )
    try:
        artifacts = builder.build()
        modified_template = builder.update_template(ctx.template_dict,
                                                    ctx.original_template_path,
                                                    artifacts)
        move_template(ctx.original_template_path,
                      ctx.output_template_path,
                      modified_template)
        click.secho("\nBuild Succeeded", fg="green")
        msg = gen_success_msg(os.path.relpath(ctx.build_dir),
                              os.path.relpath(ctx.output_template_path),
                              os.path.abspath(ctx.build_dir) == os.path.abspath(DEFAULT_BUILD_DIR))
        click.secho(msg, fg="yellow")
    # (the excerpt ends here; the matching except clause is not shown)
This relies on a number of imports from an aws-sam-cli internal library, with the build-focused ones being
from samcli.commands.build.build_context import BuildContext
from samcli.lib.build.app_builder import ApplicationBuilder, BuildError, UnsupportedBuilderLibraryVersionError, ContainerBuildNotSupported
from samcli.lib.build.workflow_config import UnsupportedRuntimeException
It's clear that this means it's not as simple as creating something like a boto3 client and away I go! It looks more like I'd have to fork the whole thing and throw out nearly everything to be left with the build command, context and environment.
Interestingly enough, sam package and sam deploy, according to the docs, are merely aliases for aws cloudformation package and aws cloudformation deploy, meaning those can be used in boto3!
Has somebody possibly already solved this issue? I've googled and searched here, but haven't found anything.
I use PyCharm and the AWS Toolkit, which is great for development and debugging, and from there I can run SAM builds, but it's "hidden" in the PyCharm plugins - which are written in Kotlin!
My current work-around is to create the CFN templates as temp files and pass them to the CLI commands which are called from Python - an approach I've always disliked.
I may put in a feature request with the aws-sam-cli team and see what they say, unless one of them reads this.
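For reference, the temp-file workaround described above could look roughly like the sketch below. It is an illustration under assumptions, not the SAM team's recommended approach: the function names are made up, the template is assumed to be available as a plain dict (for instance from Troposphere's Template.to_dict()), and it is written as JSON into the project directory so that relative paths in the template resolve against it.

```python
import json
import subprocess
import tempfile
from pathlib import Path
from typing import List


def write_template(template: dict, directory: Path) -> Path:
    """Serialize the template to a temporary JSON file and return its path."""
    # delete=False keeps the file on disk so the sam CLI can read it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", dir=str(directory), delete=False
    ) as handle:
        json.dump(template, handle, indent=2)
        return Path(handle.name)


def sam_build_command(template_path: Path, build_dir: Path) -> List[str]:
    """Assemble the sam build invocation as an argument list (no shell involved)."""
    return ["sam", "build", "--template", str(template_path), "--build-dir", str(build_dir)]


def build(template: dict, base_dir: Path, build_dir: Path) -> None:
    """Write the template, run sam build on it, and always remove the temp file."""
    template_path = write_template(template, base_dir)
    try:
        subprocess.run(sam_build_command(template_path, build_dir), cwd=str(base_dir), check=True)
    finally:
        template_path.unlink()


print(sam_build_command(Path("template.json"), Path("build_artifact")))
```

The template file still exists for the duration of the build, but it never outlives the call, which limits the window for the config drift mentioned above.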
I've managed to launch sam local start-api from a python3 script.
Firstly, pip3 install aws-sam-cli
Then the individual command can be imported and run.
import sys
from samcli.commands.local.start_api.cli import cli
sys.exit(cli())
... provided there's a template.yaml in the current directory.
What I haven't (yet) managed to do is influence the command-line arguments that cli() would receive, so that I could tell it which -t template to use.
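One way to influence those arguments, assuming cli() falls back to reading sys.argv when it is called without arguments (Click commands do this by default), is to swap sys.argv temporarily. The sketch below uses a stand-in function instead of the real command so that it runs without aws-sam-cli installed; patched_argv and fake_cli are names made up for this example.

```python
import contextlib
import sys
from typing import Iterator, List


@contextlib.contextmanager
def patched_argv(args: List[str], prog: str = "sam") -> Iterator[None]:
    """Temporarily replace sys.argv so an argv-reading entry point sees args."""
    original = sys.argv
    sys.argv = [prog, *args]
    try:
        yield
    finally:
        # Restore the real arguments even if the entry point raises or exits.
        sys.argv = original


def fake_cli() -> List[str]:
    """Stand-in for an entry point that reads its arguments from sys.argv."""
    return sys.argv[1:]


with patched_argv(["-t", "other-template.yaml"]):
    seen = fake_cli()
print(seen)  # ['-t', 'other-template.yaml']
```

With the real command, cli() would take the place of fake_cli() inside the with block. Click commands also accept an explicit argument list through their main() method, for example cli.main(args=["-t", "other-template.yaml"]); I have not verified either route against start-api itself, and keep in mind that Click calls sys.exit() when the command finishes unless standalone mode is turned off.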
Edit
Looking at the way aws-sam-cli integration tests work it seems that they actually kick off a process to run the CLI. So they don't actually pass a parameter to the cli() call at all :-(
For example:
class TestSamPython36HelloWorldIntegration(InvokeIntegBase):
    template = Path("template.yml")
    def test_invoke_returncode_is_zero(self):
        command_list = self.get_command_list(
            "HelloWorldServerlessFunction", template_path=self.template_path, event_path=self.event_path
        )
        process = Popen(command_list, stdout=PIPE)
        return_code = process.wait()
        self.assertEquals(return_code, 0)
.... etc
from https://github.com/awslabs/aws-sam-cli/blob/a83aa9e620ff679ca740496a3f1ff4872b88894a/tests/integration/local/invoke/test_integrations_cli.py
See also start_api_integ_base.py in the same repo.
I think on the whole this is to be expected because the whole thing is implemented in terms of the click command-line application framework. Unfortunately.
See for example http://click.palletsprojects.com/en/7.x/testing/ which says "The CliRunner.invoke() method runs the command line script in isolation ..." -- my emphasis.
I am using the following Python script to run sam cli commands. This should work for you too.
import os
import subprocess
import sys

LAMBDA_S3_BUCKET = "s3-bucket-name-in-same-region"
AWS_REGION = "us-east-1"
API_NAME = "YourAPIName"
BASE_PATH = "/path/to/your/project/code/dir"
STACK_NAME = "YourCloudFormationStackName"
BUILD_DIR = os.path.join(BASE_PATH, "build_artifact")

def run(*args):
    # Run one sam command inside the project directory.
    # check=True raises CalledProcessError when the command exits with a non-zero status.
    subprocess.run(args, cwd=BASE_PATH, check=True)

try:
    os.makedirs(BUILD_DIR, exist_ok=True)
    run("sam", "build", "--template", "template.yaml", "--build-dir", BUILD_DIR)
    run("sam", "package", "--template-file", os.path.join(BUILD_DIR, "template.yaml"),
        "--output-template-file", "packaged.yaml", "--s3-bucket", LAMBDA_S3_BUCKET)
    run("sam", "deploy", "--template-file", "packaged.yaml", "--stack-name", STACK_NAME,
        "--capabilities", "CAPABILITY_IAM", "--region", AWS_REGION)
except (OSError, subprocess.CalledProcessError) as e:
    print(e)
    sys.exit(1)

Is there a way to convert thrift IDL into wsdl spec?

Is there any open source library or online service that could automagically generate a WSDL spec from a Thrift IDL?
The goal is to build a facade API on top of an existing Thrift API that would allow coupling with ancient systems via the SOAP protocol.
There are a couple of ready-to-use tools that can convert Thrift IDL into WSDL. The rest of the answer assumes we live in the Java world, with a JDK and Maven at hand and an internet connection available.
The first one is the Swift Code Generator Tool. As its readme states, one has to:
download the latest version:
mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:get -DremoteRepositories=central::default::http://repo1.maven.apache.org/maven2 -Dartifact=com.facebook.swift:swift-generator-cli:RELEASE:jar:standalone -Ddest=/tmp/
run downloaded jar in the directory containing thrift files:
java -jar /tmp/swift-generator-cli-0.23.1-standalone.jar -use_java_namespace -out ../java *.thrift
assuming the standard Maven project layout:
- src
  - main
    - java
    - thrift
Swift Code Generator will generate a Java interface for each Thrift service entry. Every Thrift source file must declare a 'java' namespace, like this:
namespace java com.acme
The generated interface will include a nested Async interface for asynchronous invocation. Remove the Async subinterface. Automation of the Async removal is left as an exercise for the reader.
Compile the generated Java files with javac or your favourite build tool (ant, maven, gradle, etc.). Do not forget to include com.facebook.swift:swift-annotations:0.23.1 as a compile dependency.
Finally, use Apache Axis2's java2wsdl utility, available within the Axis2 binary distribution, like this:
/tmp/axis2-1.7.4/bin/java2wsdl.sh -cn com.acme.TargetService -cp build/classes/main
to generate wsdl for the Thrift service TargetService {...} entry.
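The Async removal step mentioned above could be automated along these lines. This is a rough sketch, not part of either tool: the function name is made up, and it assumes the generated source declares the nested interface as "interface Async { ... }", optionally preceded by annotations, with no unbalanced braces inside strings or comments.

```python
import re


def strip_nested_interface(java_source: str, name: str = "Async") -> str:
    """Remove every `interface <name> { ... }` block, found by brace matching."""
    pattern = re.compile(
        r"(?:[ \t]*@\w+(?:\([^)]*\))?\s*)*"   # annotations directly before it
        r"[ \t]*(?:public\s+)?interface\s+" + re.escape(name) + r"\b[^{]*\{"
    )
    while True:
        match = pattern.search(java_source)
        if match is None:
            return java_source
        depth = 1                 # the pattern already consumed the opening brace
        pos = match.end()
        while depth > 0 and pos < len(java_source):
            if java_source[pos] == "{":
                depth += 1
            elif java_source[pos] == "}":
                depth -= 1
            pos += 1
        if depth != 0:
            raise ValueError("unbalanced braces in interface " + name)
        if java_source[pos:pos + 1] == "\n":
            pos += 1              # drop the line break after the closing brace
        java_source = java_source[:match.start()] + java_source[pos:]


example = (
    "public interface TargetService\n"
    "{\n"
    "    @ThriftService(\"TargetService\")\n"
    "    public interface Async\n"
    "    {\n"
    "        ListenableFuture<String> ping();\n"
    "    }\n"
    "    String ping();\n"
    "}\n"
)
print(strip_nested_interface(example))
```

Imports that were only needed by the Async interface stay in the file; they are harmless for compilation as long as the corresponding jars are on the classpath.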

Data in Hbase are not structured as it should be - Twitter Flume

Greetings, users!
I have installed Flume on my Cloudera 4.6, and I am trying to get tweets from Twitter.
So I created an HDFS sink and an HBase sink, and they are gathering tweets... But the data in HBase is not well structured.
As the data is not structured, I can't run queries on it with Impala.
I created a table tweets {NAME => 'tweet'}, {NAME => 'retweet'}, {NAME => 'entities'}, {NAME => 'user'}
and my flume configuration is : http://pastebin.com/4b5d3R8Q
I am following this tutorial, but I don't know what to do with its serializer.
https://github.com/AronMacDonald/Twitter_Hbase_Impala
Do I have to build it into a jar?
This is what I currently have in HBase: http://pastebin.com/aNGBsvB7
Everything is in the column tweets...
I recompiled and used the flume-sources-1.0-SNAPSHOT.jar from the Git repository https://github.com/cloudera/cdh-twitter-example, and then there was no problem when using 'TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource'
Install Maven, then download the cdh-twitter-example repository.
Unzip it, then execute inside it (as mentioned in the repository):
$ cd flume-sources
$ mvn package
$ cd ..
This problem appeared when twitter4j was updated from 2.2.6 to 3.X: the method setIncludeEntities was removed, and the prebuilt JAR is not up to date.
PS: Do not download the prebuilt version; it is still the old one.
