Spring Cloud Data Flow Java DSL: container properties

I have an SCDF local deployment where I want to deploy a custom Docker-based sink. This sink internally consists of a Java part that acts as a translation wrapper between SCDF and another bit of non-Java code.
I need to be able to control:
(1) the name of the container
(2) the number of instances
(3) volumes mounted to the container
(4) ports mapped to the container
(5) environment variables passed to the non-Java code
Looking at LocalAppDeployer and DockerCommandBuilder, it seems I should be able to do (1) and (2) with something like:
Map<String, String> params = new HashMap<>();
params.put(AppDeployer.COUNT_PROPERTY_KEY, "2");
params.put(AppDeployer.GROUP_PROPERTY_KEY, "foo");
Stream.builder(scdf)
      .name("mystream")
      .definition("file|bar")
      .create()
      .deploy(params);
which I expect to give me 2 containers: foo-bar-1 and foo-bar-2
My question is: how can I achieve (3), (4) and (5)?

For any future searches:
TL;DR: use deployer.<appName>.local.docker.volume-mounts and deployer.<appName>.local.docker.port-mappings
e.g.:
Map<String, String> properties = new HashMap<>();
properties.put(String.format("deployer.%s.local.docker.volume-mounts", "myApp"), "/tmp/foo:/bar");
properties.put(String.format("deployer.%s.local.docker.port-mappings", "myApp"), "9090:80");
Stream.builder(scdf).name("myStream").definition("time|log").create().deploy(properties);
See PR. Thank you to the SCDF team for their help
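Putting it together, here is a minimal sketch of a deployment that covers (2), (3) and (4) for the bar sink in mystream, using the property keys above plus the standard deployer.<appName>.count deployment property (the mount and port values are placeholders, and scdf is the same Data Flow client used in the question):
import java.util.HashMap;
import java.util.Map;

Map<String, String> props = new HashMap<>();
// (2) number of instances of the "bar" app
props.put("deployer.bar.count", "2");
// (3) host:container volume mount for the "bar" app
props.put("deployer.bar.local.docker.volume-mounts", "/tmp/foo:/bar");
// (4) host:container port mapping for the "bar" app
props.put("deployer.bar.local.docker.port-mappings", "9090:80");

Stream.builder(scdf)
      .name("mystream")
      .definition("file|bar")
      .create()
      .deploy(props);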

Related

How to associate a Jupyter Workspace Custom Container with its Vertex AI Workbench instance

I have multiple User Managed Vertex AI Workbench instances running in my GCP Project.
Each can run one or more Jupyter Workspaces by clicking OPEN JUPYTERLAB. Each JupyterLab opens in a new browser tab.
From one of the JupyterLab tabs, how can I tell which Workbench instance or VM is hosting it?
EDIT: The first answer by @kiran mathew is not working for me because I have a custom Docker container, and that solution returns the hostname of the container, which is not set to the Workbench instance name. I changed the title of the question to be specific to custom containers.
Python code:
import socket
instance_name = socket.gethostname()
print(instance_name)
I have two notebooks, stacckkk and stackoverflow2. When I ran the above code in these two notebooks individually, I got the results below.
Result: (output screenshots for the 1st and 2nd notebooks omitted)
Using the URL solution in the comment by @KiranMathew, here is a function to return the workbook name that works with custom Docker containers.
import urllib.request

def get_workbook_name():
    url = "http://metadata.google.internal/computeMetadata/v1/instance/name"
    req = urllib.request.Request(url)
    req.add_header("Metadata-Flavor", "Google")
    workbook_name = urllib.request.urlopen(req).read().decode()
    return workbook_name
Still anticipating that a simpler approach will become available, as foreshadowed by @gogasca.
You can also get the project id with: http://metadata.google.internal/computeMetadata/v1/project/project-id

What is the correct way to identify production & development environment in AWS CDK?

I am learning AWS Cloud Development Kit (CDK).
As part of this learning, I am trying to understand how I am supposed to correctly handle production and development environment.
I know AWS CDK provides the environment parameter to allow deploying stacks to a specific account.
But then, how do I have specific options for development versus production stacks? It does not seem to be provided by default by AWS CDK, or am I missing/misunderstanding something?
A very simple example could be that I want an S3 bucket called my-s3-bucket-dev for my development account and one named my-s3-bucket-prod for my production account. But then how do I get e.g. a stage variable correctly handled in AWS CDK?
I know I can add parameters in the cdk.json file, but again, I don't know how to correctly use this file to depend on the deployed stack, i.e. production vs. development.
Thanks for the support
Welcome to AWS CDK.
Enjoy the ride. ;)
Actually, an account itself carries no semantics (in your case, the stage).
This has nothing to do with CDK or Cloud Formation.
You need to take care of this.
You're right that you could use the CDK context in the cdk.json.
There's no schema enforcement in the context, except for some internally used variables by CDK.
You could define your dev and prod objects within.
There are other ways of defining the context.
Here is an example of what it could look like:
{
  "app": "node app",
  // usually there's some internal definition for your CDK project
  "context": {
    "dev": {
      "accountId": "dev_account",
      "accountRegion": "us-east-1",
      "name": "dev",
      "resourceConfig": {
        // here you could differentiate the config per AWS resource type
        // e.g. dev has lower hardware specs
      }
    },
    "prod": {
      "accountId": "prod_account",
      "accountRegion": "us-east-1",
      "name": "prod",
      "resourceConfig": {
        // here you could differentiate the config per AWS resource type
        // prod has higher hardware specs or more cluster nodes
      }
    }
  }
}
With this being defined, you need to run your CDK application with the -c flag to specify which configuration object (dev or prod) you want to use.
For instance, you could run it with cdk synth -c stage=prod.
This sets the stage variable in your context and makes it available.
Once that succeeds, you can access the context again and fetch the appropriate config object.
const app = new cdk.App();
const stage = app.node.tryGetContext('stage');
// the following step is only needed, if you have a different config per account
const stageConfig = app.node.tryGetContext(stage);
// ... do some validation and pass the config to the stacks as constructor argument
As I said, the context is one way of doing this.
However, there are drawbacks to it.
It's JSON, not code.
What I prefer is to have TypeScript types per resource configuration (e.g. S3) and wire them all together as a plain object.
The object maps the account/region information and the corresponding resource configurations.
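The answer above describes this with TypeScript types; purely as an illustration of the same idea in Java (all class and field names here are hypothetical, not part of the CDK API), the per-stage wiring could be a typed config object keyed by stage:
import java.util.Map;

// Hypothetical typed configuration for one resource type (here S3)
class S3Config {
    final String bucketName;
    S3Config(String bucketName) { this.bucketName = bucketName; }
}

// Hypothetical per-stage configuration: account/region plus resource configs
class StageConfig {
    final String accountId;
    final String region;
    final S3Config s3;
    StageConfig(String accountId, String region, S3Config s3) {
        this.accountId = accountId;
        this.region = region;
        this.s3 = s3;
    }
}

class StageConfigs {
    // One plain object mapping the stage name to its account/region and resource configs
    static final Map<String, StageConfig> BY_STAGE = Map.of(
            "dev",  new StageConfig("dev_account",  "us-east-1", new S3Config("my-s3-bucket-dev")),
            "prod", new StageConfig("prod_account", "us-east-1", new S3Config("my-s3-bucket-prod")));
}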

No such property: ToInputStream for class: Script4

I have a situation where I want to import my graph data into the database. I am running JanusGraph (latest version) with Cassandra (version 3) and Elasticsearch (version 6.6.0) using Docker. I have been advised to use the Gryo format, so I tried this command:
graph.io(IoCore.gryo()).reader().create().readGraph(ToInputStream.from("my_graph.kryo"), graph);
but ended up with an error
No such property: ToInputStream for class: Script4
The documentation I am following is here. Please take a look and point me to the right procedure. Thanks in advance!
ToInputStream is not a function of Gremlin or JanusGraph. I believe that it is only a function of IBM Compose, so unless you are running JanusGraph on that specific platform, this command will not work.
Versions of JanusGraph that utilize TinkerPop 3.4.x will support the io() step and this is the preferred manner in which to load gryo (as well as graphson and graphml) files.
Graph graph = ... // setup JanusGraph instance
GraphTraversalSource g = traversal().withGraph(graph); // might use withRemote() here instead depending on how you are connecting I suppose
g.io("graph.kryo").read().iterate();
Note that if you are connecting remotely - it seems you are sending scripts to the Docker instance given your error - then be sure that the "graph.kryo" file path is accessible to Docker. That's what's nice about ToInputStream from Compose, as it allows you to access remote sources.
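For an embedded (non-remote) JanusGraph instance, a more self-contained sketch of the io() read might look like the following (the properties file and the kryo path are assumptions for illustration):
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

// Open an embedded JanusGraph instance (adjust the properties file to your Cassandra/ES setup)
JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-cql-es.properties");
GraphTraversalSource g = traversal().withGraph(graph);

// TinkerPop 3.4+ io() step: the .kryo extension selects the Gryo reader
g.io("my_graph.kryo").read().iterate();

graph.close();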

Launching composed task built by DSL from stream application

Every example I've seen (the task-launcher sink and the triggertask source) shows how to launch the task defined by the uri attribute.
My task definitions look like this:
sampleTask <t2: timestamp || t1: timestamp>
sampleTask-t1 timestamp
sampleTask-t2 timestamp
sampleTaskRunner composed-task-runner --graph=sampleTask
My question is: how do I launch the composed task runner (sampleTaskRunner, defined by DSL) from a stream application?
Thanks
UPDATE
I ended up with the solution below, which triggers the task using the SCDF REST API (a Java DSL sketch of the same deployment follows the properties).
composedTask definition:
<timestamp || mySampleTask>
Stream definition:
http | httpclient | log
Deployment properties:
app.http.port=81
app.httpclient.body=name=composedTask&arguments=--increment-instance-enabled=true
app.httpclient.http-method=POST
app.httpclient.url=http://localhost:9393/tasks/executions
app.httpclient.headers-expression={'Content-Type':'application/x-www-form-urlencoded'}
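As a rough sketch, the same stream and deployment properties could be wired up with the Java DSL from the first question (the stream name is arbitrary, and scdf is the same Data Flow client as above):
import java.util.HashMap;
import java.util.Map;

Map<String, String> props = new HashMap<>();
props.put("app.http.port", "81");
props.put("app.httpclient.body", "name=composedTask&arguments=--increment-instance-enabled=true");
props.put("app.httpclient.http-method", "POST");
props.put("app.httpclient.url", "http://localhost:9393/tasks/executions");
props.put("app.httpclient.headers-expression", "{'Content-Type':'application/x-www-form-urlencoded'}");

Stream.builder(scdf)
      .name("composedTaskTrigger")
      .definition("http | httpclient | log")
      .create()
      .deploy(props);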
Though it's easy to implement an http sink component, it would be great if the stream application starters provided one out of the box.
Another concern I have is about discovering the SCDF REST URL when deployed in a distributed environment.
Here's a quick take from one of the SCDF's R&D team members (Glenn Renfro).
stream create foozer --definition "trigger --fixed-delay=5 | tasklaunchrequest-transform --uri=maven://org.springframework.cloud.task.app:composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT --command-line-arguments='--graph=sampleTask-t1||sampleTask-t2 --increment-instance-enabled=true --spring.datasource.url=jdbc:mariadb://localhost:3306/test --spring.datasource.username=root --spring.datasource.password=password --spring.datasource.driverClassName=org.mariadb.jdbc.Driver' | task-launcher-local" --deploy
In the foozer stream definition,
1) "trigger" source happens to trigger an upstream event every 5s
2) "tasklaunchrequest-transform" processor takes a few arguments; more specifically, it uses "composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT" to launch a composed-task graph (i.e., sampleTask-t1||sampleTask-t2)
3) Pay attention to --increment-instance-enabled. This was recently added to CTR application and this provides the ability to re-launch a composed-task in a recurring cadence
4) Since the CTR and SCDF must share the same database, we are also passing datasource properties as command-line args. (SCDF-server is already started with the same datasource credentials)
Hope this helps.
Lastly, we will add a sample to the reference guide via: spring-cloud/spring-cloud-dataflow#1780

Retrieve vm summary on Jconsole

I need to access the VM arguments from the VM Summary tab in JConsole programmatically, i.e. using Java.
I am using JMX to create a connection to a remote server, using the method:
MBeanServerConnection connection = jmxConnector.getMBeanServerConnection();
When I used JConsole to monitor the JMX connection, there was a lot of information on the VM Summary tab. I particularly need the VM arguments to use in my program.
Please guide!
I got the answer to the question. In case someone needs it, use this code:
ObjectName objName = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
// mbsc is the MBeanServerConnection obtained from the JMXConnector
String[] vmArguments = (String[]) mbsc.getAttribute(objName, "InputArguments");
This will return an array of strings containing the VM arguments. You can query other RuntimeMXBean attributes in the same way.
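For completeness, here is a minimal end-to-end sketch (the JMX service URL host and port are placeholders for your remote server):
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class VmArgumentsReader {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: replace host/port with your remote JMX agent's values
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://remote-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // java.lang:type=Runtime backs the data shown on JConsole's VM Summary tab
            ObjectName runtime = new ObjectName(ManagementFactory.RUNTIME_MXBEAN_NAME);
            String[] vmArguments = (String[]) mbsc.getAttribute(runtime, "InputArguments");
            for (String arg : vmArguments) {
                System.out.println(arg);
            }
        }
    }
}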
