The current Dataflow documentation and the referenced templates (see the link below) use BigQueryIO.Write.Method.STREAMING_INSERTS as the method for writing into BigQuery.
https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/HEAD/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java
Are there any code examples that show how to use the new STORAGE_WRITE_API with Dataflow?
There is an example in the Apache Beam repository, where we have an integration test:
Pipeline p = Pipeline.create(options);
final int payloadSizeBytes = options.getPayloadSizeBytes();

// Generate input.
PCollection<Value> values =
    p.apply(
            GenerateSequence.from(1)
                .to(1000000)
                .withRate(options.getRecordsPerSecond(), Duration.standardSeconds(1)))
        .apply(
            MapElements.into(TypeDescriptor.of(Value.class))
                .via(
                    l -> {
                      byte[] payload = "".getBytes(StandardCharsets.UTF_8);
                      if (payloadSizeBytes > 0) {
                        payload = new byte[payloadSizeBytes];
                        ThreadLocalRandom.current().nextBytes(payload);
                      }
                      return new AutoValue_BigQueryStorageAPIStreamingIT_Value(
                          l, ByteBuffer.wrap(payload));
                    }));

values.apply(
    "writeVortex",
    BigQueryIO.<Value>write()
        .useBeamSchema()
        .to(options.getTargetTable())
        .withMethod(Write.Method.STORAGE_WRITE_API)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND)
        .withNumStorageWriteApiStreams(options.getNumShards())
        .withTriggeringFrequency(Duration.standardSeconds(options.getTriggerFrequencySec())));

p.run();
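If the goal is to wire this into a Pub/Sub-to-BigQuery pipeline like the template referenced in the question, a minimal sketch could look like the following. The subscription, table and schema strings are placeholders, and the JSON-to-TableRow mapping is only illustrative; with an unbounded input, STORAGE_WRITE_API also needs a triggering frequency and a number of write streams.
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.joda.time.Duration;

public class PubSubToBigQueryStorageWriteApi {
  public static void main(String[] args) {
    // Placeholder names; substitute your own subscription, table and schema.
    String subscription = "projects/my-project/subscriptions/my-subscription";
    String table = "my-project:my_dataset.my_table";
    String jsonSchema = "{\"fields\":[{\"name\":\"payload\",\"type\":\"STRING\"}]}";

    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply("ReadFromPubSub", PubsubIO.readStrings().fromSubscription(subscription))
        .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                // Illustrative mapping only; parse your real message format here.
                .via(message -> new TableRow().set("payload", message)))
        .setCoder(TableRowJsonCoder.of())
        .apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(table)
                .withJsonSchema(jsonSchema)
                .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                // Required when writing an unbounded source with STORAGE_WRITE_API.
                .withTriggeringFrequency(Duration.standardSeconds(5))
                .withNumStorageWriteApiStreams(4));

    p.run();
  }
}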
Using Timestamper plugin 1.11.2 with globally enabled timestamps and the default format, I get the following console output:
00:00:41.097 Some Message
In Blue Ocean, the output shows up like this:
[2020-04-01T00:00:41.097Z] Some Message
How can I make it so that Blue Ocean uses the short timestamp format? The long format is somewhat unreadable and clutters the details view of the steps.
I've looked at the Pipeline Options too, but there is only the timestamps option, which doesn't have a parameter to specify the format.
Note: This question isn't a dupe, because the linked question asks about time zone differences only.
Edit:
⚠️ Unfortunately, this workaround doesn't work in the context of a node; see JENKINS-59575. It looks like I finally have to get my hands dirty with plugin development to do this kind of thing in a supported way.
Anyway, I won't delete this answer, as the code may still be useful in other scenarios.
Original answer:
As a workaround, I have created a custom ConsoleLogFilter. It can be applied as a pipeline option, a stage option or at the steps level. If you have the timestamp plugin installed, you should disable the global timestamp option to prevent duplicate timestamps.
Typically you would define the low-level code in a shared library. Here is a sample that can be copy-pasted right into the pipeline script editor (you might have to disable Groovy sandbox):
import hudson.console.LineTransformationOutputStream
import hudson.console.ConsoleLogFilter
import java.nio.charset.Charset
import java.nio.charset.StandardCharsets
pipeline {
    agent any
    /*
    options {
        // Enable timestamps for the whole pipeline, using default format
        //withContext( myTimestamps() )

        // Enable timestamps for the whole pipeline, using custom format
        //withContext( myTimestamps( dateFormat: 'HH:mm:ss', prefix: '', suffix: ' - ' ) )
    }
    */
    stages {
        stage('A') {
            options {
                // Enable timestamps for this stage only
                withContext( myTimestamps() )
            }
            steps {
                echo 'Hello World'
            }
        }
        stage('B') {
            steps {
                echo 'Hello World'
                // Enable timestamps for some steps only
                withMyTimestamps( dateFormat: 'HH:mm:ss' ) {
                    echo 'Hello World'
                }
            }
        }
    }
}

//----- Code below should be moved into a shared library -----

// For use as option at pipeline or stage level, e. g.: withContext( myTimestamps() )
def myTimestamps( Map args = [:] ) {
    return new MyTimestampedLogFilter( args )
}

// For use as block wrapper at steps level
void withMyTimestamps( Map args = [:], Closure block ) {
    withContext( new MyTimestampedLogFilter( args ), block )
}

class MyTimestampedLogFilter extends ConsoleLogFilter {
    String dateFormat
    String prefix
    String suffix

    MyTimestampedLogFilter( Map args = [:] ) {
        this.dateFormat = args.dateFormat ?: 'YY-MM-dd HH:mm:ss'
        this.prefix = args.prefix ?: '['
        this.suffix = args.suffix ?: '] '
    }

    @NonCPS
    OutputStream decorateLogger( AbstractBuild build, OutputStream logger )
            throws IOException, InterruptedException {
        return new MyTimestampedOutputStream( logger, StandardCharsets.UTF_8, this.dateFormat, this.prefix, this.suffix )
    }
}

class MyTimestampedOutputStream extends LineTransformationOutputStream {
    OutputStream logger
    Charset charset
    String dateFormat
    String prefix
    String suffix

    MyTimestampedOutputStream( OutputStream logger, Charset charset, String dateFormat, String prefix, String suffix ) {
        this.logger = logger
        this.charset = charset
        this.dateFormat = dateFormat
        this.prefix = prefix
        this.suffix = suffix
    }

    @NonCPS
    void close() throws IOException {
        super.close();
        logger.close();
    }

    @NonCPS
    void eol( byte[] bytes, int len ) throws IOException {
        def lineIn = charset.decode( java.nio.ByteBuffer.wrap( bytes, 0, len ) ).toString()
        def dateFormatted = new Date().format( this.dateFormat )
        def lineOut = "${this.prefix}${dateFormatted}${this.suffix}${lineIn}\n"
        logger.write( lineOut.getBytes( charset ) )
    }
}
Example output for stage "B":
Credits:
I got the idea from this answer.
I am trying to write a pipeline that periodically checks a Google Storage bucket for new .gz files, which are actually compressed .csv files, and then writes those records to a BigQuery table. The following code was working in batch mode before I added the .watchForNewFiles(...) and .withMethod(STREAMING_INSERTS) parts. I am expecting it to run in streaming mode with those changes. However, I am getting an exception for which I can't find anything related on the web. Here is my code:
public static void main(String[] args) {
    DataflowDfpOptions options = PipelineOptionsFactory.fromArgs(args)
            //.withValidation()
            .as(DataflowDfpOptions.class);

    Pipeline pipeline = Pipeline.create(options);

    Stopwatch sw = Stopwatch.createStarted();
    log.info("DFP data transfer from GS to BQ has started.");

    pipeline.apply("ReadFromStorage", TextIO.read()
            .from("gs://my-bucket/my-folder/*.gz")
            .withCompression(Compression.GZIP)
            .watchForNewFiles(
                    // Check for new files every 30 seconds
                    Duration.standardSeconds(30),
                    // Never stop checking for new files
                    Watch.Growth.never()
            )
    )
            .apply("TransformToTableRow", ParDo.of(new TableRowConverterFn()))
            .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
                    .to(options.getTableId())
                    .withMethod(STREAMING_INSERTS)
                    .withCreateDisposition(CREATE_NEVER)
                    .withWriteDisposition(WRITE_APPEND)
                    .withSchema(TableSchema)); //todo: use withJsonSchema(String json) method instead

    pipeline.run().waitUntilFinish();
    log.info("DFP data transfer from GS to BQ is finished in {} seconds.", sw.elapsed(TimeUnit.SECONDS));
}

/**
 * Creates a TableRow from a CSV line
 */
private static class TableRowConverterFn extends DoFn<String, TableRow> {

    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        String[] split = c.element().split(",");
        // Ignore the header line.
        // Since this runs in parallel, we can't guarantee that the first line passed to this method is the header.
        if (split[0].equals("Time")) {
            log.info("Skipped header");
            return;
        }
        TableRow row = new TableRow();
        for (int i = 0; i < split.length; i++) {
            TableFieldSchema col = TableSchema.getFields().get(i);
            // String is the most common type, so it goes in the first if clause as a small optimization.
            if (col.getType().equals("STRING")) {
                row.set(col.getName(), split[i]);
            } else if (col.getType().equals("INTEGER")) {
                row.set(col.getName(), Long.valueOf(split[i]));
            } else if (col.getType().equals("BOOLEAN")) {
                row.set(col.getName(), Boolean.valueOf(split[i]));
            } else if (col.getType().equals("FLOAT")) {
                row.set(col.getName(), Float.valueOf(split[i]));
            } else {
                // Simply try to write it as a String if the type is not handled above.
                // todo: Consider other BQ data types.
                row.set(col.getName(), split[i]);
            }
        }
        c.output(row);
    }
}
And the stack trace:
java.lang.IllegalArgumentException: Not expecting a splittable ParDoSingle: should have been overridden
at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$PayloadTranslator.payloadForParDoSingle(PrimitiveParDoSingleFactory.java:167)
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$PayloadTranslator.translate(PrimitiveParDoSingleFactory.java:145)
at org.apache.beam.runners.core.construction.PTransformTranslation.toProto(PTransformTranslation.java:206)
at org.apache.beam.runners.core.construction.SdkComponents.registerPTransform(SdkComponents.java:86)
at org.apache.beam.runners.core.construction.PipelineTranslation$1.visitPrimitiveTransform(PipelineTranslation.java:87)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:668)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.runners.core.construction.PipelineTranslation.toProto(PipelineTranslation.java:59)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:165)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:684)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:173)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at com.diply.data.App.main(App.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
at java.lang.Thread.run(Thread.java:748)
Here is my command to submit the job to Dataflow:
clean compile exec:java -Dexec.mainClass=com.my.project.App "-Dexec.args=--runner=DataflowRunner --tempLocation=gs://my-bucket/tmp --tableId=Temp.TestTable --project=my-project --jobName=dataflow-dfp-streaming" -Pdataflow-runner
I use Apache Beam version 2.5.0. Here is the relevant section from my pom.xml:
<properties>
    <beam.version>2.5.0</beam.version>
    <bigquery.version>v2-rev374-1.23.0</bigquery.version>
    <google-clients.version>1.23.0</google-clients.version>
    ...
</properties>
Running the code with Dataflow 2.4.0 gives a more explicit error: java.lang.UnsupportedOperationException: DataflowRunner does not currently support splittable DoFn
However, this answer suggests that splittable DoFn has been supported since 2.2.0. This is indeed the case, and following this remark you need to add the --streaming option to your -Dexec.args to force the pipeline into streaming mode.
I tested it with the code I supplied in the comments, with both your pom and mine, and it 1. produces your error without --streaming and 2. runs fine with --streaming.
You might want to open a GitHub issue against Beam, since this behavior is not documented anywhere officially as far as I know.
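For reference, a minimal sketch of the programmatic equivalent of the --streaming flag (DataflowDfpOptions is the asker's own options interface; the rest uses standard Beam options):
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;

// Inside main(), before Pipeline.create(options):
DataflowDfpOptions options = PipelineOptionsFactory.fromArgs(args)
        .as(DataflowDfpOptions.class);
// Equivalent to passing --streaming on the command line.
options.as(StreamingOptions.class).setStreaming(true);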
I am trying to pass arguments to MSBuild 2.0. After some research it appears that I need to do this using variables, but I cannot figure out how to incorporate this into my queue request below. I have tried Parameters, but that does not seem to work. Here is what I am trying to pass to MSBuild: @" /p:OctoPackPackageVersion=" + releaseNumber. This worked with the XAML build using IBuildRequest.ProcessParameters.
var buildClient = new BuildHttpClient(new Uri(collectionURL), new VssCredentials(true));
var res = await buildClient.QueueBuildAsync(new Build
{
    Definition = new DefinitionReference
    {
        Id = targetBuild.Id
    },
    Project = targetBuild.Project,
    SourceVersion = ChangeSetNumber,
    Parameters = buildArg
});
return res.Id.ToString();
The vNext build system is different from the legacy XAML build system: you cannot pass variables to build tasks in the build definition directly when queuing the build. The code you used updates the build definition before queuing the build, which means the build definition may keep changing whenever the variable changes.
The workaround for this would be to add a variable in your build definition, for example "var1", and then reference this variable (as $(var1)) in the Arguments field of the MSBuild task.
With this, you will be able to pass the value of the "var1" variable when queuing the build, without updating the build definition:
Build build = new Build();
build.Parameters = "{\"var1\":\"/p:OctoPackPackageVersion=version2\"}";
// OR using Newtonsoft.Json.JsonConvert
var dict = new Dictionary<string, string>{{"var1", "/p:OctoPackPackageVersion=version2"}};
build.Parameters = JsonConvert.SerializeObject(dict);
I have found this solution and it works excellently for me. I set custom parameters in the build definition for convenience, without updating it on the server:
foreach (var variable in targetBuildDef.Variables.Where(p => p.Value.AllowOverride))
{
    var customVar = variables.FirstOrDefault(p => p.Key == variable.Key);
    if (customVar == null)
        continue;
    variable.Value.Value = customVar.Value.TrimEnd('\\');
}
And then set the variable values in the build parameters:
using (TfsTeamProjectCollection ttpc = new TfsTeamProjectCollection(new Uri(tFSCollectionUri)))
{
    using (BuildHttpClient buildServer = ttpc.GetClient<BuildHttpClient>())
    {
        var requestedBuild = new Build
        {
            Definition = targetBuildDef,
            Project = targetBuildDef.Project
        };
        var dic = targetBuildDef.Variables
            .Where(z => z.Value.AllowOverride)
            .Select(x => new KeyValuePair<string, string>(x.Key, x.Value.Value));
        var paramString = $"{{{string.Join(",", dic.Select(p => $@"""{p.Key}"":""{p.Value}"""))}}}";
        var jsonParams = HttpUtility.JavaScriptStringEncode(paramString).Replace(@"\""", @"""");
        requestedBuild.Parameters = jsonParams;
        var queuedBuild = buildServer.QueueBuildAsync(requestedBuild).Result;
    }
}
First, the new build system on TFS 2015 is called vNext build, not MSBuild 2.0.
What you are looking for are build variables. Variables give you a convenient way to get key bits of data into various parts of your build process. For a variable with the "Allow at queue time" box checked, you allow your team to modify its value when they manually queue a build.
Some tutorials may be helpful for using variables:
TFS Build 2015 (vNext) – Scripts and Variables
Passing Visual Studio Team Services build properties to MSBuild
Patrick, I was able to find a workaround for my issue by updating the build definition. This is definitely not ideal, but it works. As you can see below, I am trying to add to the MSBuild args already present. If you know a better way, let me know. I really appreciate you taking the time to look at my question.
public static async Task<string> QueueNewBuild(string project, BuildDefinitionReference targetBuild, string collectionURL, string ChangeSetNumber, string ReleaseNumber, bool CreateRelease)
{
    var buildClient = new BuildHttpClient(new Uri(collectionURL), new VssCredentials(true));
    await Task.Delay(1000).ConfigureAwait(false);

    // Temporarily append the extra MSBuild argument to the build definition.
    var buildDef = await buildClient.GetDefinitionAsync(targetBuild.Project.Id, targetBuild.Id);
    BuildDefinitionVariable OrigMSbuildvar = buildDef.Variables["MSBuildArgs"];
    buildDef.Variables["MSBuildArgs"].Value = OrigMSbuildvar.Value + " /p:OctoPackPackageVersion=" + ReleaseNumber.ToString();
    await Task.Delay(1000).ConfigureAwait(false);
    buildDef = await buildClient.UpdateDefinitionAsync(buildDef);
    await Task.Delay(1000).ConfigureAwait(false);

    Build build = new Build
    {
        Definition = new DefinitionReference
        {
            Id = targetBuild.Id
        },
        Project = targetBuild.Project,
        SourceVersion = ChangeSetNumber
    };

    await Task.Delay(1000).ConfigureAwait(false);
    var res = await buildClient.QueueBuildAsync(build);

    // Restore the original value after queuing.
    buildDef.Variables["MSBuildArgs"].Value = OrigMSbuildvar.Value;
    await Task.Delay(1000).ConfigureAwait(false);
    buildDef = await buildClient.UpdateDefinitionAsync(buildDef);

    return res.Id.ToString();
}
I am receiving messages in Dataflow via Pub/Sub in streaming mode (which is required for my use case).
Each message should be stored in its own file in GCS.
Since unbounded collections are not supported in TextIO.Write, I tried to divide the PCollection into windows that contain one element each,
and write each window to Google Cloud Storage.
Here is my code:
public static void main(String[] args) {
    DataflowPipelineOptions options = PipelineOptionsFactory.create()
            .as(DataflowPipelineOptions.class);
    options.setRunner(BlockingDataflowPipelineRunner.class);
    options.setProject(PROJECT_ID);
    options.setStagingLocation(STAGING_LOCATION);
    options.setStreaming(true);

    Pipeline pipeline = Pipeline.create(options);

    PubsubIO.Read.Bound<String> readFromPubsub = PubsubIO.Read.named("ReadFromPubsub")
            .subscription(SUBSCRIPTION);

    PCollection<String> streamData = pipeline.apply(readFromPubsub);

    PCollection<String> windowedMessage = streamData.apply(
            Window.<String>triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
                    .discardingFiredPanes());

    windowedMessage.apply(TextIO.Write.to("gs://pubsub-outputs/1"));

    pipeline.run();
}
I still receive the same error I got before windowing:
The DataflowPipelineRunner in streaming mode does not support TextIO.Write.
What is the code for executing what is described above?
TextIO works with bounded PCollections; you could write into GCS with the Storage API instead.
You could do:
PipeOptions options = data.getPipeline().getOptions().as(PipeOptions.class);

data.apply(WithKeys.of(new SerializableFunction<String, String>() {
        public String apply(String s) { return "mykey"; }
    }))
    .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(options.getTimeWrite()))))
    .apply(GroupByKey.create())
    .apply(Values.<Iterable<String>>create())
    .apply(ParDo.of(new StorageWrite(options)));
You create a window with a GroupByKey operation and can then write the resulting iterable into Storage. Here is the processElement of StorageWrite:
PipeOptions options = c.getPipelineOptions().as(PipeOptions.class);
String date = ISODateTimeFormat.date().print(c.window().maxTimestamp());
String isoDate = ISODateTimeFormat.dateTime().print(c.window().maxTimestamp());
String blobName = String.format("%s/%s/%s", options.getBucketRepository(), date, options.getFileOutName() + isoDate);
BlobId blobId = BlobId.of(options.getGCSBucket(), blobName);
WriteChannel writer = storage.writer(BlobInfo.builder(blobId).contentType("text/plain").build());
for (Iterator<String> it = c.element().iterator(); it.hasNext();) {
    writer.write(ByteBuffer.wrap(it.next().getBytes()));
}
writer.close();
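For completeness, here is a minimal sketch of what the full StorageWrite DoFn could look like, written against the Beam 2.x DoFn API and the google-cloud-storage client. PipeOptions and its getters (getGCSBucket, getBucketRepository, getFileOutName) are the answer's custom options interface, and the exact client factory methods depend on your library version.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.google.cloud.WriteChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.format.ISODateTimeFormat;

/** Writes each window's batch of lines into its own GCS object (sketch only). */
class StorageWrite extends DoFn<Iterable<String>, Void> {

    // Plain strings copied out of PipeOptions (the answer's custom options interface);
    // PipelineOptions themselves should not be held as DoFn fields.
    private final String bucket;
    private final String repository;
    private final String fileOutName;
    private transient Storage storage;

    StorageWrite(PipeOptions options) {
        this.bucket = options.getGCSBucket();
        this.repository = options.getBucketRepository();
        this.fileOutName = options.getFileOutName();
    }

    @Setup
    public void setup() {
        // Create the GCS client once per DoFn instance.
        storage = StorageOptions.getDefaultInstance().getService();
    }

    @ProcessElement
    public void processElement(ProcessContext c, BoundedWindow window) throws IOException {
        String date = ISODateTimeFormat.date().print(window.maxTimestamp());
        String isoDate = ISODateTimeFormat.dateTime().print(window.maxTimestamp());
        String blobName = String.format("%s/%s/%s", repository, date, fileOutName + isoDate);

        BlobId blobId = BlobId.of(bucket, blobName);
        BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();

        // Write all lines of the window's iterable into one object.
        try (WriteChannel writer = storage.writer(blobInfo)) {
            for (String line : c.element()) {
                writer.write(ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)));
            }
        }
    }
}
Copying the needed option values into plain String fields keeps the DoFn serializable without capturing the whole options object.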
I am looking for UFT and TFS integration (running tests from TFS like we did with HP QC).
I searched on Google but found no help. If anyone knows how to do this, please let me know the steps.
Thanks
You can use a Generic Test to call QTP during testing in TFS. Make sure QTP is installed on the test agent. See the code here for reference:
QTP TFS Generic Test Integration.
One more link for reference: Executing remote QTP scripts via Test Agents and Test Controllers.
Take a look at a solution from OpsHub.
More details:
Announcement:
http://blogs.msdn.com/b/visualstudioalm/archive/2013/05/16/enabling-seamless-integration-with-team-foundation-server-microsoft-test-professional-and-hp-alm-with-opshub-v5-3.aspx
Video:
http://opshub.com/ohrel/Resources/Videos/QTP_MTM_Video/QTP_MTM_Video.mp4
Case study:
https://customers.microsoft.com/Pages/CustomerStory.aspx?recid=17218
Take a look at this code:
Add a reference to the QTObjectModelLib dll from the C:\Program Files (x86)\HP\Unified Functional Testing\bin location in your solution.
public void Fn_QTP()
{
    qtApp.Launch();
    qtApp.Visible = true;
    qtApp.Options.Run.RunMode = "Fast";
    qtApp.Options.Run.StepExecutionDelay = 0;
    qtApp.Options.Run.ViewResults = false;
    qtApp.Test.Settings.Run.OnError = "Stop";

    // Iterate over all test cases under the selected module.
    // oTestSuiteDict: this dictionary contains all the test suites from TFS which are meant to be executed.
    // Keys hold their IDs.
    foreach (var item in oTestSuiteDict.Keys)
    {
        foreach (var TestCase in oTestSuiteDict[item].Keys)
        {
            Console.WriteLine("Executing TestCase : {0}", TestCase);

            // Update the XML file and upload it in QTP.
            // This XML file is used to provide the data to QTP as environment variables.
            Fn_UpdateXMLFile(item, TestCase);

            // Open the test case.
            string scriptPath = @"path of script that will be opened in QTP (Action)";
            qtApp.Open(scriptPath, true, false);

            // Get a reference to the test object opened/created by the application.
            qtTest = qtApp.Test;
            qtTest.Settings.Run.OnError = "NextStep";

            // Check if the library is already associated.
            if (qtTest.Settings.Resources.Libraries.Find(@"library path") == 1)
            {
                qtTest.Settings.Resources.Libraries.RemoveAll();
            }
            qtTest.Settings.Resources.Libraries.Add(@"Library Path");
            //Console.WriteLine("Library is associated with Test");

            // Get a reference to the Results Object for the test results location.
            QTObjectModelLib.RunResultsOptions qtRRO = new QTObjectModelLib.RunResultsOptions();

            // Run the test.
            // Create and start a Stopwatch instance just to track how long the test case execution takes.
            Stopwatch stopwatch = Stopwatch.StartNew();
            qtTest.Run(qtRRO, true, null); // run the test
            stopwatch.Stop();
            string oTime = stopwatch.Elapsed.ToString();
            oTestCaseTime.Add(TestCase, oTime);

            string ostatus = qtTest.LastRunResults.Status;
            oResults.Add(TestCase, ostatus);
            qtTest.Close(); // Close the test
        }
    }

    System.Runtime.InteropServices.Marshal.FinalReleaseComObject(qtTest); // Cleanly release COM object
    qtTest = null; // set object to null
    //break;
    //qtApp.Quit(); // Quit QTP
    GC.Collect(); // Garbage collect
    GC.WaitForPendingFinalizers(); // Wait for GC
    System.Runtime.InteropServices.Marshal.FinalReleaseComObject(qtApp); // Cleanly release COM object
    qtApp = null; // set to null
}
// Fn_UpdateXMLFile: function to update environment variables for QTP.
// modulename: the test suite name (contains the list of test cases); testcasename: a test case listed in modulename (test suite).
public void Fn_UpdateXMLFile(string modulename, string testcasename)
{
    string oPath = @"path of xml file";
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(oPath);
    XmlNodeList nodes = xmlDoc.SelectNodes("Environment/Variable/Value");
    nodes[0].InnerText = modulename;
    nodes[1].InnerText = testcasename;
    xmlDoc.Save(oPath);
}
// Format of the XML file:
<Environment>
  <Variable>
    <Name>ModuleName</Name>
    <Value>ToolsMenu</Value>
  </Variable>
  <Variable>
    <Name>""</Name>
    <Value>""</Value>
  </Variable>
</Environment>