Using an existing Pub/Sub subscription from Google Dataflow - google-cloud-dataflow

I am using Google Dataflow, and in one of the steps I am subscribing to a Pub/Sub topic using an already created subscription.
Here is the code snippet:
CustomPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(CustomPipelineOptions.class);
Pipeline p = Pipeline.create(options);
PCollection<TableRow> datastream = p.apply(PubsubIO.Read.named("Read device data from PubSub")
    .subscription("projects/<projectID>/subscriptions/<subscriptionname>")
    .topic(String.format("projects/%s/topics/%s", options.getSourceProject(), options.getSourceTopic()))
    .timestampLabel("ts")
    .withCoder(TableRowJsonCoder.of()));
The above code, when executed, results in the following error:
Error processing pipeline. Causes: (b5e276ef8c76419f): Unrecognized input pubsub_subscription for step s1.
I am passing the right subscription name and project ID, so I am not sure why I am still getting the above error.
Please kindly help.

Specifying just one of the two sources should be enough: either a topic or a subscription, not both.
I suggest you try:
PCollection<TableRow> datastream = p
.apply(PubsubIO.Read.named("Read device data from PubSub")
.topic(String.format("projects/%s/topics/%s", options.getSourceProject(), options.getSourceTopic()))
.timestampLabel("ts")
.withCoder(TableRowJsonCoder.of()));
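Alternatively, since you already created the subscription, you can keep the .subscription(...) call and drop the .topic(...) call instead; a sketch with the same Dataflow 1.x SDK, reusing the placeholders from your snippet:
PCollection<TableRow> datastream = p
    .apply(PubsubIO.Read.named("Read device data from PubSub")
        .subscription("projects/<projectID>/subscriptions/<subscriptionname>")
        .timestampLabel("ts")
        .withCoder(TableRowJsonCoder.of()));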
Also: I suppose you are using the Dataflow 1.9 SDK? You might want to think about moving to the new Beam 2.0.0 release. You can find the reference for PubSub in that SDK here.
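For comparison, the equivalent read with the Beam 2.x SDK looks roughly like this (a sketch; note that PubsubIO in Beam 2.x reads strings or messages rather than TableRow, so parsing the JSON into TableRow becomes a separate step):
PCollection<String> datastream = p
    .apply("Read device data from PubSub",
        PubsubIO.readStrings()
            .fromSubscription("projects/<projectID>/subscriptions/<subscriptionname>")
            .withTimestampAttribute("ts"));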

Related

Nested rows using STRUCT are not supported in Dataflow SQL (GCP)

With Dataflow SQL I would like to read a Pub/Sub topic, enrich the message and write the message to a Pub/Sub topic.
Which Dataflow SQL query will create my desired output message?
Pub/Sub input message: {"event_timestamp":1619784049000, "device":{"ID":"some_id"}}
Desired Pub/Sub output message: {"event_timestamp":1619784049000, "device":{"ID":"some_id", "NAME":"some_name"}}
What I get is: {"event_timestamp":1619784049000, "device":{"ID":"some_id"}, "NAME":"some_name" }
but I need the NAME inside the "device" attribute.
SELECT message_table.device as device, devices.name as NAME
FROM pubsub.topic.project_id.`topic` as message_table
JOIN bigquery.table.project_id.dataflow_sql_dataset.devices as devices
ON devices.device_id = message_table.device.id
Unfortunately, Dataflow SQL does not currently support STRUCT/Sub queries, but we are working on it. Since there are some Apache Beam dependencies preventing its progress (Nested Rows Support, Upgrading Calcite), we cannot provide an ETA at the moment, but you can follow its progress on this issue tracker.
You need to create a struct in the projection (the SELECT part):
SELECT STRUCT(message_table.device.ID as ID, devices.name as NAME) as device
FROM pubsub.topic.project_id.`topic` as message_table
JOIN bigquery.table.project_id.dataflow_sql_dataset.devices as devices
ON devices.device_id = message_table.device.id
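With the struct built in the projection, the output message should now have the desired shape (event_timestamp is carried through just as in your original query):
{"event_timestamp":1619784049000, "device":{"ID":"some_id", "NAME":"some_name"}}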

Exported Dataflow Template Parameters Unknown

I've exported a Cloud Dataflow template from Dataprep as outlined here:
https://cloud.google.com/dataprep/docs/html/Export-Basics_57344556
In Dataprep, the flow pulls in text files via wildcard from Google Cloud Storage, transforms the data, and appends it to an existing BigQuery table. All works as intended.
However, when trying to start a Dataflow job from the exported template, I can't seem to get the startup parameters right. The error messages aren't overly specific but it's clear that for one thing, I'm not getting the locations (input and output) right.
The only Google-provided template for this use case (found at https://cloud.google.com/dataflow/docs/guides/templates/provided-templates#cloud-storage-text-to-bigquery) doesn't apply as it uses a UDF and also runs in Batch mode, overwriting any existing BigQuery table rather than append.
Inspecting the original Dataflow job details from Dataprep shows a number of parameters (found in the metadata file) but I haven't been able to get those to work within my code. Here's an example of one such failed configuration:
import time
from google.cloud import storage
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials

def dummy(event, context):
    pass

def process_data(event, context):
    credentials = GoogleCredentials.get_application_default()
    service = build('dataflow', 'v1b3', credentials=credentials)
    data = event
    gsclient = storage.Client()
    file_name = data['name']
    time_stamp = time.time()
    GCSPATH = "gs://[path to template]"
    BODY = {
        "jobName": "GCS2BigQuery_{tstamp}".format(tstamp=time_stamp),
        "parameters": {
            "inputLocations": '{{\"location1\":\"[my bucket]/{filename}\"}}'.format(filename=file_name),
            "outputLocations": '{{\"location1\":\"[project]:[dataset].[table]\", [... other locations]}}',
            "customGcsTempLocation": "gs://[my bucket]/dataflow"
        },
        "environment": {
            "zone": "us-east1-b"
        }
    }
    print(BODY["parameters"])
    request = service.projects().templates().launch(projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
    response = request.execute()
    print(response)
The above example results in an "invalid field" error for "location1", which I pulled from a completed Dataflow job. I know I need to specify the GCS location, the template location, and the BigQuery table, but I haven't found the correct syntax anywhere. As mentioned above, I found the field names and sample values in the job's generated metadata file.
I realize that this specific use case may not ring any bells, but in general, if anyone has had success determining and using the correct startup parameters for a Dataflow job exported from Dataprep, I'd be most grateful to learn more about that. Thx.
I think you need to review this document; it explains exactly the syntax required for passing the various pipeline options available, including the location parameters you need.
Specifically, the following line in your code snippet does not follow the correct syntax:
"inputLocations" : '{{\"location1\":\"[my bucket]/{filename}\"}}'.format(filename=file_name)
In addition to that document, you should also review the available pipeline options and their correct syntax.
Please use the links; they are the official documentation links from Google. They are actively monitored and maintained by a dedicated team, so they will not go stale or be removed.
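As an illustration only (the exact parameter names come from your template's metadata file, and the bucket, project, dataset, and table names here are placeholders), building the location maps with json.dumps avoids hand-escaping mistakes and makes sure each value is a valid JSON string that uses a gs:// path for the input:
import json

# file_name and time_stamp as in the snippet above
input_locations = json.dumps({"location1": "gs://[my bucket]/{}".format(file_name)})
output_locations = json.dumps({"location1": "[project]:[dataset].[table]"})

BODY = {
    "jobName": "GCS2BigQuery_{tstamp}".format(tstamp=time_stamp),
    "parameters": {
        "inputLocations": input_locations,
        "outputLocations": output_locations,
        "customGcsTempLocation": "gs://[my bucket]/dataflow"
    },
    "environment": {"zone": "us-east1-b"}
}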

BuildHTTPClient not able to get Build Definition Steps?

We are using the BuildHTTPClient to programmatically create a copy of a build definition, update the variables in memory and then save the updated object as a new definition.
I'm using Microsoft.TeamFoundation.Build2.WebApi.BuildHTTPClient 16.141. The TFS version is TFS 2017 Update 3 (REST API 3.x).
This is a similar question to https://serverfault.com/questions/799607/tfs-buildhttpclient-updatedefinition-c-example but I'm trying to stay within using the BuildHttpClient libraries and not go directly to the RestAPIs.
The problem is the Steps list is always null along with other properties even though we have them in the build definition.
UPDATE: Posted as an answer below.
First Attempt - GetDefinitionAsync:
VssConnection connection = new VssConnection(DefinitionTypesDTO.serverUrl, new VssCredentials());
BuildHttpClient bdClient = connection.GetClient<BuildHttpClient>();
Task<BuildDefinition> resultDef = bdClient.GetDefinitionAsync(DefinitionTypesDTO.teamProjectName, buildID);
resultDef.Wait();
BuildDefinition updatedDefinition = UpdateBuildDefinitionValues(resultDef.Result, dr, defName);
updatedTask = bdClient.CreateDefinitionAsync(updatedDefinition, DefinitionTypesDTO.teamProjectName);
The update works on the variables and we can save the updated definition back to TFS but there are not any tasks in the newly created build definition. When we look at the object that is returned from GetDefinitionAsync we see that the Steps list is empty. It looks like GetDefinitionAsync just doesn't get the full object.
Second Attempt - Specific Revision:
int rev = 9;
Task<BuildDefinition> resultDef = bdClient.GetDefinitionAsync(DefinitionTypesDTO.teamProjectName, buildID, revision: rev);
resultDef.Wait();
BuildDefinition updatedDefinition = UpdateBuildDefinitionValues(resultDef.Result, dr, defName);
Based on SteveSims' post, we were thinking we were not getting the correct revision, so we added the revision to the request. I see the same issue with the correct revision. Similarly to SteveSims' post, I can open the DefinitionURL in a browser and see that the tasks are in the JSON, but the BuildDefinition object is not populated with them.
Third Attempt - GetFullDefinition:
So then I thought to try GetFullDefinition; maybe that's what "Full" means. Of course, without any documentation on these libraries, I have no idea.
var task2 = bdClient.GetFullDefinitionsAsync(DefinitionTypesDTO.teamProjectName, "MyBuildDefName","$/","TfsVersionControl");
task2.Wait();
Still no luck, the Steps list is always null even though we have steps in the build definition.
Fourth Attempt - Save As Template
var task2 = bdClient.GetTemplateAsync(DefinitionTypesDTO.teamProjectName, "1_Batch_Dev");
task2.Wait();
I tried saving the Build Definition off as a template. So in the Web UI I chose "Save as Template", still no steps.
Fifth Attempt: Using the URL as mentioned in SteveSims post:
Finally I said OK, I'll try the solution SteveSims used: using the WebClient to get the object from the URL.
var client = new WebClient();
client.UseDefaultCredentials = true;
var json = client.DownloadString(LastDefinitionUrl);
//Convert the JSON to an actual builddefinition
BuildDefinition result = JsonConvert.DeserializeObject<BuildDefinition>(json);
This also didn't work; the build definition steps are null. Even though I can see the steps when looking at the JSON object (var json), the object is not loaded with them.
I've seen this post, which seems to add the Steps to the base definition. I've tried it, but honestly I'm having trouble understanding how he has modified the BuildDefinition object when referencing it via NuGet:
https://dennisdel.com/blog/getting-build-steps-with-visual-studio-team-services-.net-api/
After looking at @Daniel Frost's attempt below, we started looking at using older versions of the NuGet package. Surprisingly, the supported version 15.131.1 does not support this, but we have found that version="15.112.0-preview" does.
After rolling back all of our DLLs to match that version, the steps were cloned when saving the new copy of the build.
All of the code examples above work when you are using this package. We were unable to get Daniel's example working but we didn't try hard as we had working code.
We need to create a GitHub issue for this.
Found this in my code, which works.
Use this package, not sure if it could have an impact (joke).
...packages\Microsoft.TeamFoundationServer.Client.15.112.1\lib\net45\Microsoft.TeamFoundation.Build2.WebApi.dll
private Microsoft.TeamFoundation.Build.WebApi.BuildDefinition GetBuildDefinition(string projectName, string buildDefinitionName)
{
    var buildDefinitionReferences = _buildHttpClient.GetFullDefinitionsAsync(projectName, "*", null, null, DefinitionQueryOrder.DefinitionNameAscending, top: 1000).Result;
    return buildDefinitionReferences.SingleOrDefault(x => x.Name == buildDefinitionName && x.DefinitionQuality != DefinitionQuality.Draft);
}
With the newer clients, Steps will always be empty. In newer API versions (which are used by the newer clients) the steps have moved to Phases. If you use GetDefinition or GetFullDefinitions and look in
definition.Process.Phases[0].Steps
you'll find them. (GetDefinitions only returns shallow references, so the process won't be included.)
The Steps collection still exists for compatibility reasons (we don't want apps to crash with things like MethodNotFoundException), but it won't be populated.
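For illustration, here is a minimal sketch of that lookup, assuming the definition came from GetDefinitionAsync or GetFullDefinitionsAsync (so the full process is populated) and uses the designer (non-YAML) process model; the DesignerProcess cast is an assumption based on the newer Microsoft.TeamFoundation.Build.WebApi object model:
using System;
using Microsoft.TeamFoundation.Build.WebApi;

// 'definition' is a BuildDefinition retrieved with the full process populated.
if (definition.Process is DesignerProcess designerProcess)
{
    foreach (var phase in designerProcess.Phases)
    {
        // Each phase carries the steps that used to live on definition.Steps.
        foreach (var step in phase.Steps)
        {
            Console.WriteLine(step.DisplayName);
        }
    }
}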
I was having this problem too: I was able to get the Phases[0] information at runtime, but could not get it at design time. I solved this problem using the dynamic type.
dynamic process = buildDefTemplate.Process;
foreach (BuildDefinitionStep tempStep in process.Phases[0].Steps)
{
    // do some work here
}
Now it is working!
With Microsoft.TeamFoundationServer.Client version 16.170.0, I can get build steps through process.Phases[0].Steps only with process and step being dynamic, as @whitecore stated above:
var definitions = buildClient.GetFullDefinitionsAsync(project: project.Name);
foreach (var definition in definitions.Result)
{
    Console.WriteLine(string.Format("\n {0} - {1}:", definition.Id, definition.Name));
    dynamic process = definition.Process;
    foreach (dynamic step in process.Phases[0].Steps)
    {
        Console.WriteLine(step.DisplayName);
    }
}

Launching composed task built by DSL from stream application

Every example I've seen (the task-launcher sink and the triggertask source) shows how to launch the task defined by the uri attribute.
My task definitions look like this:
sampleTask <t2: timestamp || t1: timestamp>
sampleTask-t1 timestamp
sampleTask-t2 timestamp
sampleTaskRunner composed-task-runner --graph=sampleTask
My question is: how do I launch the composed task runner (sampleTaskRunner, defined by DSL) from a stream application?
Thanks
UPDATE
I ended up with the below solution, which triggers the task using the SCDF REST API.
composedTask definition:
<timestamp || mySampleTask>
Stream definition:
http | httpclient | log
Deployment properties:
app.http.port=81
app.httpclient.body=name=composedTask&arguments=--increment-instance-enabled=true
app.httpclient.http-method=POST
app.httpclient.url=http://localhost:9393/tasks/executions
app.httpclient.headers-expression={'Content-Type':'application/x-www-form-urlencoded'}
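For reference, with those properties the httpclient sink ends up issuing the equivalent of this manual request against the SCDF server (host and port taken from the deployment above):
curl -X POST http://localhost:9393/tasks/executions \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'name=composedTask&arguments=--increment-instance-enabled=true'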
Though it's easy to implement an http sink component, it would be great if the stream application starters provided one out of the box.
Another concern I have is discovering the SCDF REST URL when deployed in a distributed environment.
Here's a quick take from one of the SCDF R&D team members (Glenn Renfro).
stream create foozer --definition "trigger --fixed-delay=5 | tasklaunchrequest-transform --uri=maven://org.springframework.cloud.task.app:composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT --command-line-arguments='--graph=sampleTask-t1||sampleTask-t2 --increment-instance-enabled=true --spring.datasource.url=jdbc:mariadb://localhost:3306/test --spring.datasource.username=root --spring.datasource.password=password --spring.datasource.driverClassName=org.mariadb.jdbc.Driver' | task-launcher-local" --deploy
In the foozer stream definition:
1) The "trigger" source triggers an upstream event every 5s.
2) The "tasklaunchrequest-transform" processor takes a few arguments; more specifically, it uses "composedtaskrunner-task:1.1.0.BUILD-SNAPSHOT" to launch a composed-task graph (i.e., sampleTask-t1||sampleTask-t2).
3) Pay attention to --increment-instance-enabled. This was recently added to the CTR application, and it provides the ability to re-launch a composed-task on a recurring cadence.
4) Since the CTR and SCDF must share the same database, we are also passing the datasource properties as command-line args. (The SCDF-server is already started with the same datasource credentials.)
Hope this helps.
Lastly, we will add a sample to the reference guide via: spring-cloud/spring-cloud-dataflow#1780

Windev Quickbooks SDK OpenConnection2

I've been trying to find a way to connect my Windev application using the Quickbooks SDK.
I wish to connect to my local QB instance using the qbXML API.
I've been able to get a reference to the library using:
myconnection = new object Automation "QBXMLRP2.RequestProcessor"
However, when it comes to the OpenConnection2 method, I only get errors: either "missing parameter" or "invalid parameter". I am aware that I should pass a "localQBD" type to the function, but I have not found out how to reference it. The following represents my invalid script.
myconnection>>OpenConnection2("","My Test App", localQBD)
How can I achieve a connection to QB through Windev?
After much searching, I have found that I was on the right path using the Automation variable type.
However, I have yet to find how to reference the constants provided by the library. Instead, I declare them beforehand, like so:
CONSTANT
    omSingleUser = 0
    omMultiUser = 1
    omDontCare = 2
    qbStopOnError = 0
    qbContinueOnError = 1
    ctLocalQBD = 1
    ctLocalQBDLaunchUI = 3
FIN
Which gives us this working example (note that the connection must be opened with OpenConnection2 before starting a session; this is where the ctLocalQBD constant declared above is used):
myconnection = new object Automation "QBXMLRP2.RequestProcessor"
// Open the connection using the local QBD connection type constant
myconnection>>OpenConnection2("", "My Test App", ::ctLocalQBD)
ticket = myconnection>>BeginSession("",::omDontCare)
XMLresponse = myconnection>>ProcessRequest(ticket,XMLrequest)
myconnection>>EndSession(ticket)
myconnection>>CloseConnection()
delete myconnection
A huge thanks goes to Frank Cazabon for showing me the proper constant values.
I have a complete external WinDev component that accesses QB and a helper program that can generate the WinDev calls in the correct order with the correct spelling and provides an OSR for all the QuickBooks fields and modules.
I have a similar product for the Clarion language and am in the final stages of the WinDev version. Contact me if you are interested. qbsnap at wybatap.com
