Flume is truncating Characters When I use the source type as logger -- It just shows first 20 Characters and ignores the rest

Flume is truncating Characters When I use the source type as logger -- It just shows first 20 Characters and ignores the rest - flume

Here is my Test config (used netcat+ logger as console)
\#START OF CONFIG FILE
\#Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
\# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 4444
\# Describe the sink
a1.sinks.k1.type = logger
\#Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
\# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
\#====END OF CONFIG FILE
Now I issued the following command to use my specific config:
$bin/flume-ng agent --conf conf --conf-file conf/netcat_dump.conf --name a1 -Dflume.root.logger=DEBUG,console
Use netcat command and enter the following text:
$netcat localhost 4444
This is First Event being sent to flume through netcat
Now, If you look at the Flume Console, you see the truncated log line.
2013-11-25 15:33:20,862 ---- Event: { headers:{} body: 54 68 69 73 20 69 73 20 46 69 72 73
74 20 45 76 **This is First Ev** }
2013-11-25 15:33:20,862 ---- Events processed = 1
Note: I'd tried with most of the channel parameters but didn't help.

Your output is working as expected as the default logger sink will truncate the body content to 16 bytes. I don't believe you can override this behavior without creating your own custom LoggerSink as the current LoggerSink does not have any configuration parameters. I modified the existing LoggerSink below and named it AdvancedLoggerSink (a bit of a misnomer as it's not all that advanced).
The advanced logger sink adds a configuration parameter called maxBytes which you can use to set how much of your log message gets output. The default is still 16 bytes but you can now overwrite this with whatever you want. If you set it to 0 then it will print the entire log message.
To get this working you need to download the flume binaries and then create a JAR file with the AdvancedLoggerSink class in it. You'll need to include the following flume jars which are located in the lib directory of the flume binary download when you compile and create your jar file:
flume-ng-configuration-1.4.0.jar
flume-ng-core-1.4.0.jar
flume-ng-sdk-1.4.0.jar
slf4j-api-1.6.1.jar
Assuming you create a jar file called advancedLoggerSink.jar you would then place that into your flume plugin directory inside a directory called lib. The plugins directory defaults to $FLUME_HOME/plugins.d but you can create it anywhere. Your directory structure should looks like this:
plugins.d/advanced-logger-sink/lib/advancedLoggerSink.jar
(Make sure you put the jar inside a directory called 'lib'. See the flume user guide for more info on the plugin directory layout http://flume.apache.org/FlumeUserGuide.html)
To run the flume agent use the following command:
flume-ng agent --plugins-path /path/to/your/plugins.d --conf /conf/directory --conf-file /conf/logger.flume --name a1 -Dflume.root.logger=INFO,console
Notice how I specified the plugins-path (the path where the plugins.d directory exists). Flume will automatically load the advancedLoggerSink inside the plugins.d directory.
Here's the AdvancedLoggerSink class:
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Sink;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventHelper;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class AdvancedLoggerSink extends AbstractSink implements Configurable {
private static final int defaultMaxBytes = 16;
private int maxBytesProp;
private static final Logger logger = LoggerFactory
.getLogger(AdvancedLoggerSink.class);
#Override
public void configure(Context context) {
// maxBytes of 0 means to log the entire event
int maxBytesProp = context.getInteger("maxBytes", defaultMaxBytes);
if (maxBytesProp < 0) {
maxBytesProp = defaultMaxBytes;
}
this.maxBytesProp = maxBytesProp;
}
#Override
public Status process() throws EventDeliveryException {
Status result = Status.READY;
Channel channel = getChannel();
Transaction transaction = channel.getTransaction();
Event event = null;
try {
transaction.begin();
event = channel.take();
if (event != null) {
if (logger.isInfoEnabled()) {
logger.info("Event: " + EventHelper.dumpEvent(
event,
this.maxBytesProp == 0 ? event.getBody().length : this.maxBytesProp
));
}
} else {
// No event found, request back-off semantics from the sink
// runner
result = Status.BACKOFF;
}
transaction.commit();
} catch (Exception ex) {
transaction.rollback();
throw new EventDeliveryException("Failed to log event: " + event,
ex);
} finally {
transaction.close();
}
return result;
}
}
Your config file should then look like:
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = AdvancedLoggerSink
# maxBytes is the maximum number of bytes to output for the body of the event
# the default is 16 bytes. If you set maxBytes to 0 then the entire record will
# be output.
a1.sinks.k1.maxBytes = 0
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The above answer written by Sarus works except change the a1.sinks.k1.type to the whole class path name which should include the package name. In addition, for Flume 1.6.0, copy your compiled jar to the lib folder under the installed flume path. You can also use System.out.pritnln instead of using log. Something like below
if(event!=null){
System.out.println(EventHelper.dumpEvent(event,event.getBody().length));
status = Status.READY;
}else{
System.out.println("Event is null");
status = Status.BACKOFF;
}

Related

How do I resolve the "Decompressor is not installed for grpc-encoding" issue?

I'm getting this error when I call my gRPC Golang server from Dart:
Caught error: gRPC Error (code: 12, codeName: UNIMPLEMENTED, message: grpc: Decompressor is not installed for grpc-encoding "gzip", details: [], rawResponse: null, trailers: {})
I have read https://github.com/bradleyjkemp/grpc-tools/issues/19, and it doesn't appear to apply to my issue.
The server is running 1.19.2 on Gcloud Ubuntu.
Dart is running 2.18.2 on Mac Monterey
I have a Dart client calling a Go server. Both appear to be using GZIP for compression.
Dart proto
syntax = "proto3";
option java_multiple_files = true;
option java_package = "io.grpc.examples.helloworld";
option java_outer_classname = "HelloWorldProto";
option objc_class_prefix = "HLW";
package helloworld;
// The greeting service definition.
service Greeter {
// Sends a greeting
rpc SayHello (HelloRequest) returns (HelloReply) {}
}
// The request message containing the user's name.
message HelloRequest {
string name = 1;
}
// The response message containing the greetings
message HelloReply {
string message = 1;
}
GO proto:
syntax = "proto3";
option go_package = "google.golang.org/grpc/examples/helloworld/helloworld";
option java_multiple_files = true;
option java_package = "io.grpc.examples.helloworld";
option java_outer_classname = "HelloWorldProto";
package helloworld;
// The greeting service definition.
service Greeter {
// Sends a greeting
rpc SayHello (HelloRequest) returns (HelloReply) {}
}
// The request message containing the user's name.
message HelloRequest {
string name = 1;
}
// The response message containing the greetings
message HelloReply {
string message = 1;
}
Dart Client code:
import 'package:grpc/grpc.dart';
import 'package:helloworld/src/generated/helloworld.pbgrpc.dart';
Future<void> main(List<String> args) async {
final channel = ClientChannel(
'ps-dev1.savup.com',
port: 54320,
options: ChannelOptions(
credentials: ChannelCredentials.insecure(),
codecRegistry:
CodecRegistry(codecs: const [GzipCodec(), IdentityCodec()]),
),
);
final stub = GreeterClient(channel);
final name = args.isNotEmpty ? args[0] : 'world';
try {
final response = await stub.sayHello(
HelloRequest()..name = name,
options: CallOptions(compression: const GzipCodec()),
);
print('Greeter client received: ${response.message}');
} catch (e) {
print('Caught error: $e');
}
await channel.shutdown();
}
The Go gRPC server works fine with a Go gRPC client and BloomRPC.
I'm new to gRPC in general and very new to Dart.
Thanks in advance for any help resolving this issue.

That error that you shared shows that your server doesn't support gzip compression.
The quickest fix is to not use gzip compression in the client's call options, by removing the line:
options: CallOptions(compression: const GzipCodec()),
from your Dart code.
The go-grpc library has an implementation of a gzip compression encoding in package github.com/grpc/grpc-go/encoding/gzip, but it's experimental, so likely not wise to use it in production; or at least you should pay close attention to it:
// Package gzip implements and registers the gzip compressor
// during the initialization.
//
// Experimental
//
// Notice: This package is EXPERIMENTAL and may be changed or removed in a
// later release.
If you want to use it in your server, you just need to import the package; there is no user-facing code in the package:
import (
_ "github.com/grpc/grpc-go/encoding/gzip"
)
The documentation about compression for grpc-go mentions this above package as an example of how your implement such a compressor.
So you may also want to copy the code to a more stable location and take responsibility for maintaining it yourself, until there is a stable supported version of it.

How can I enable Caddy plugins in NixOS?

I've just started playing with NixOS, and have so far managed to edit /etc/nixos/configuration.nix in my NixOS 18.09 VM to have PHP-FPM and the Caddy webserver enabled.
{ config, pkgs, ... }:
{
imports = [ <nixpkgs/nixos/modules/installer/virtualbox-demo.nix> ];
users = {
mutableUsers = false;
groups = {
caddy = { };
php-project = { };
};
users = {
hello = {
group = "php-project";
};
};
};
environment.systemPackages = [
pkgs.htop
pkgs.httpie
pkgs.php # for PHP CLI
];
services.caddy = {
enable = true;
email = "david#example.com";
agree = true;
config = ''
(common) {
gzip
header / -Server
header / -X-Powered-By
}
:8080 {
root /var/www/hello
fastcgi / /run/phpfpm/hello.sock php
log syslog
import common
}
'';
};
services.phpfpm = {
phpOptions = ''
date.timezone = "Europe/Berlin"
'';
poolConfigs = {
hello = ''
user = hello
listen = /run/phpfpm/hello.sock
; ...
pm.max_requests = 500
'';
};
};
}
A PHP-processed response is available at at localhost:8080. (Yay!)
To enable Caddy plugins when compiling from source, Go imports are added to caddy's run.go, e.g.:
_ "github.com/mholt/caddy/caddyhttp" // plug in the HTTP server type
// This is where other plugins get plugged in (imported)
_ "github.com/nicolasazrak/caddy-cache" // added to use another plugin
)
How can I set such line insertion to be performed after the source is downloaded and before the build takes place? (If this is a reasonable approach when using Nix?)
The NixOS 18.09 caddy package.
The NixOS 18.09 caddy service.
I believe that when writing a package a builder script (Bash or otherwise) can be assigned, and I'm thinking the line insertion could be done in it. But I'm lost as to how to assign a script to an existing package in this situation (override an attribute/use an overlay?) and where to put the script on the disk.
Status update
I've been doing some reading on customising packages in general and it sounds like overlays might be what I need. However, I don't seem to be able to get my overlay evaluated.
I'm using overriding of the package name as a test as it's simpler than patching code.
Overlay attempt 1
/etc/nixos/configuration.nix:
{ config, pkgs, options, ... }:
{
imports = [ <nixpkgs/nixos/modules/installer/virtualbox-demo.nix> ];
nix.nixPath = options.nix.nixPath.default ++ [
"nixpkgs-overlays=/etc/nixos/overlays-compat/"
];
# ...
}
/etc/nixos/overlays-compat/overlays.nix:
self: super:
with super.lib;
let
# Using the nixos plumbing that's used to evaluate the config...
eval = import <nixpkgs/nixos/lib/eval-config.nix>;
# Evaluate the config,
paths = (eval {modules = [(import <nixos-config>)];})
# then get the `nixpkgs.overlays` option.
.config.nixpkgs.overlays
;
in
foldl' (flip extends) (_: super) paths self
/etc/nixos/overlays-compat/caddy.nix:
self: super:
{
caddy = super.caddy.override {
name = "caddy-override";
};
}
Overlay attempt 2
/etc/nixos/configuration.nix:
nixpkgs.overlays = [ (self: super: {
caddy = super.caddy.override {
name = "caddy-override";
};
} ) ];
error: anonymous function at /nix/store/mr5sfmz6lm5952ch5q6v49563wzylrkx-nixos-18.09.2327.37694c8cc0e/nixos/pkgs/servers/caddy/default.nix:1:1 called with unexpected argument 'name', at /nix/store/mr5sfmz6lm5952ch5q6v49563wzylrkx-nixos-18.09.2327.37694c8cc0e/nixos/lib/customisation.nix:69:12
overrideAttrs
I previously managed to override the package name with this:
{ config, pkgs, options, ... }:
let
caddyOverride = pkgs.caddy.overrideAttrs (oldAttrs: rec {
name = "caddy-override-v${oldAttrs.version}";
});
in {
{
# ...
services.caddy = {
package = caddyOverride;
# ...
}
}
I could see in htop that the caddy binary was in a folder called /nix/store/...-caddy-override-v0.11.0-bin/. But I understand that overriding in this way has been superseded by overlays.

In order to add plugins to Caddy, it seems that the method is to modify the source.
You will need to adapt the Nixpkgs expression for Caddy to make that possible. That can be done outside the Nixpkgs tree, using services.caddy.package = callPackage ./my-caddy.nix {} for example, or by forking the Nixpkgs repository and pointing your NIX_PATH to your clone.

There is an issue for Caddy plugins: https://github.com/NixOS/nixpkgs/issues/14671
PR welcome!

wsdl2java output produces only the package name

i have used a sample wsdl in my java code. when i try to print the output it returns only the package name like:
com.holidaywebservice.holidayservice_v2.CountryCode#6b6478
This happens only when the output was a list.
Part of my code:
HolidayService2 hs1= new HolidayService2();
HolidayService2Soap hss1= hs1.getHolidayService2Soap();
ArrayOfCountryCode acc = hss1.getCountriesAvailable();
system.out.println(acc.getCountryCode());
wsdl url:http://holidaywebservice.com/HolidayService_v2/HolidayService2.asmx?WSDL

With this com.holidaywebservice.holidayservice_v2.CountryCode#6b6478 you're trying to print the ArrayOfCountryCode object. Your code instead should be:
package com.holidaywebservice.holidayservice_v2.clientsample;
import com.holidaywebservice.holidayservice_v2.*;
public class ClientSample {
public static void main(String[] args) {
//Create Web Service Client..."
HolidayService2 service1 = new HolidayService2();
//Create Web Service...
HolidayService2HttpGet port1 = service1.getHolidayService2HttpGet();
//call WS
ArrayOfCountryCode acc = port1.getCountriesAvailable();
for(CountryCode cc : acc.getCountryCode()){
System.out.println("Country code is: " + cc.getCode());
System.out.println("Country code Description is: " + cc.getDescription());
}
}
}
Update Try just adding the below
for(CountryCode cc : acc.getCountryCode()){
System.out.println("Country code is: " + cc.getCode());
System.out.println("Country code Description is: " + cc.getDescription());
}
After the line ArrayOfCountryCode acc = hss1.getCountriesAvailable(); in your current code. But you see the gist of it, acc is an array of country codes.

Apache Beam IllegalArgumentException on Google Dataflow with message `Not expecting a splittable ParDoSingle: should have been overridden`

I am trying to write a pipeline which periodically checks a Google Storage bucket for new .gz files which are actually compressed .csv files. Then it writes those records to a BigQuery table. The following code was working in batch mode before I added the .watchForNewFiles(...) and .withMethod(STREAMING_INSERTS) parts. I am expecting it to run in streaming mode with those changes. However I am getting an exception that I can't find anything related on the web. Here is my code:
public static void main(String[] args) {
DataflowDfpOptions options = PipelineOptionsFactory.fromArgs(args)
//.withValidation()
.as(DataflowDfpOptions.class);
Pipeline pipeline = Pipeline.create(options);
Stopwatch sw = Stopwatch.createStarted();
log.info("DFP data transfer from GS to BQ has started.");
pipeline.apply("ReadFromStorage", TextIO.read()
.from("gs://my-bucket/my-folder/*.gz")
.withCompression(Compression.GZIP)
.watchForNewFiles(
// Check for new files every 30 seconds
Duration.standardSeconds(30),
// Never stop checking for new files
Watch.Growth.never()
)
)
.apply("TransformToTableRow", ParDo.of(new TableRowConverterFn()))
.apply("WriteToBigQuery", BigQueryIO.writeTableRows()
.to(options.getTableId())
.withMethod(STREAMING_INSERTS)
.withCreateDisposition(CREATE_NEVER)
.withWriteDisposition(WRITE_APPEND)
.withSchema(TableSchema)); //todo: use withJsonScheme(String json) method instead
pipeline.run().waitUntilFinish();
log.info("DFP data transfer from GS to BQ is finished in {} seconds.", sw.elapsed(TimeUnit.SECONDS));
}
/**
* Creates a TableRow from a CSV line
*/
private static class TableRowConverterFn extends DoFn<String, TableRow> {
#ProcessElement
public void processElement(ProcessContext c) throws Exception {
String[] split = c.element().split(",");
//Ignore the header line
//Since this is going to be run in parallel, we can't guarantee that the first line passed to this method will be the header
if (split[0].equals("Time")) {
log.info("Skipped header");
return;
}
TableRow row = new TableRow();
for (int i = 0; i < split.length; i++) {
TableFieldSchema col = TableSchema.getFields().get(i);
//String is the most common type, putting it in the first if clause for a little bit optimization.
if (col.getType().equals("STRING")) {
row.set(col.getName(), split[i]);
} else if (col.getType().equals("INTEGER")) {
row.set(col.getName(), Long.valueOf(split[i]));
} else if (col.getType().equals("BOOLEAN")) {
row.set(col.getName(), Boolean.valueOf(split[i]));
} else if (col.getType().equals("FLOAT")) {
row.set(col.getName(), Float.valueOf(split[i]));
} else {
//Simply try to write it as a String if
//todo: Consider other BQ data types.
row.set(col.getName(), split[i]);
}
}
c.output(row);
}
}
And the stack trace:
java.lang.IllegalArgumentException: Not expecting a splittable ParDoSingle: should have been overridden
at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$PayloadTranslator.payloadForParDoSingle(PrimitiveParDoSingleFactory.java:167)
at org.apache.beam.runners.dataflow.PrimitiveParDoSingleFactory$PayloadTranslator.translate(PrimitiveParDoSingleFactory.java:145)
at org.apache.beam.runners.core.construction.PTransformTranslation.toProto(PTransformTranslation.java:206)
at org.apache.beam.runners.core.construction.SdkComponents.registerPTransform(SdkComponents.java:86)
at org.apache.beam.runners.core.construction.PipelineTranslation$1.visitPrimitiveTransform(PipelineTranslation.java:87)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:668)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.runners.core.construction.PipelineTranslation.toProto(PipelineTranslation.java:59)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:165)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:684)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:173)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at com.diply.data.App.main(App.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
at java.lang.Thread.run(Thread.java:748)
Here is my command to publish the job on Dataflow:
clean compile exec:java -Dexec.mainClass=com.my.project.App "-Dexec.args=--runner=DataflowRunner --tempLocation=gs://my-bucket/tmp --tableId=Temp.TestTable --project=my-project --jobName=dataflow-dfp-streaming" -Pdataflow-runner
I use apache beam version 2.5.0. Here is the relevant section from my pom.xml.
<properties>
<beam.version>2.5.0</beam.version>
<bigquery.version>v2-rev374-1.23.0</bigquery.version>
<google-clients.version>1.23.0</google-clients.version>
...
</properties>

Running the code with Dataflow 2.4.0 gives a more explicit error: java.lang.UnsupportedOperationException: DataflowRunner does not currently support splittable DoFn
However, this answer suggests that this is supported since 2.2.0. This is indeed the case, and following this remark you need to add the --streaming option in your Dexec.args to force it into streaming mode.
I tested it with the code I supplied in the comments with both your pom and mine and both 1. produce your error without --streaming 2. run fine with --streaming
You might want to open a github beam issue since this behavior is not documented anywhere offically as far as I know.

Writing to Google Cloud Storage from PubSub using Cloud Dataflow using windowing

I am receiving messages to dataflow via pubsub in streaming mode (which is required for my desires).
Each message should be stored in its own file in GCS.
Since unbounded collections in TextIO.Write is not supported I tried to divide the PCollection into windows which contain one element each.
And writes each window to google-cloud-storage.
Here is my code:
public static void main(String[] args) {
DataflowPipelineOptions options = PipelineOptionsFactory.create()
.as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject(PROJECT_ID);
options.setStagingLocation(STAGING_LOCATION);
options.setStreaming(true);
Pipeline pipeline = Pipeline.create(options);
PubsubIO.Read.Bound<String> readFromPubsub = PubsubIO.Read.named("ReadFromPubsub")
.subscription(SUBSCRIPTION);
PCollection<String> streamData = pipeline.apply(readFromPubsub);
PCollection<String> windowedMessage = streamData.apply(Window.<String>triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1))).discardingFiredPanes());
e
windowedMessage.apply(TextIO.Write.to("gs://pubsub-outputs/1"));
pipeline.run();
}
I still receive the same error got before windowing.
The DataflowPipelineRunner in streaming mode does not support TextIO.Write.
What is the code for executing the described above.

TextIO work with Bound PCollection, you could write into GCS with API Storage.
You could do :
PipeOptions options = data.getPipeline().getOptions().as(PipeOptions.class);
data.apply(WithKeys.of(new SerializableFunction<String, String>() {
public String apply(String s) { return "mykey"; } }))
.apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(options.getTimeWrite()))))
.apply(GroupByKey.create())
.apply(Values.<Iterable<String>>create())
.apply(ParDo.of(new StorageWrite(options)));
You create a Window with an operation of groupBy and you could write with iterable into Storage. the processElement of StorageWrite :
PipeOptions options = c.getPipelineOptions().as(PipeOptions.class);
String date = ISODateTimeFormat.date().print(c.window().maxTimestamp());
String isoDate = ISODateTimeFormat.dateTime().print(c.window().maxTimestamp());
String blobName = String.format("%s/%s/%s", options.getBucketRepository(), date, options.getFileOutName() + isoDate);
BlobId blobId = BlobId.of(options.getGCSBucket(), blobName);
WriteChannel writer = storage.writer(BlobInfo.builder(blobId).contentType("text/plain").build());
for (Iterator<String> it = c.element().iterator(); it.hasNext();) {
writer.write(ByteBuffer.wrap(it.next().getBytes()));
}
writer.close();

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Flume is truncating Characters When I use the source type as logger -- It just shows first 20 Characters and ignores the rest - flume

Related

How do I resolve the "Decompressor is not installed for grpc-encoding" issue?

How can I enable Caddy plugins in NixOS?

wsdl2java output produces only the package name

Apache Beam IllegalArgumentException on Google Dataflow with message `Not expecting a splittable ParDoSingle: should have been overridden`

Writing to Google Cloud Storage from PubSub using Cloud Dataflow using windowing

Categories

Resources