Dataflow GroupBy -> multiple outputs based on keys - google-cloud-dataflow

Is there a simple way to redirect the output of GroupBy into multiple output files based on the group keys?
Bin.apply(GroupByKey.<String, KV<Long, Iterable<TableRow>>>create())
   .apply(ParDo.named("Print Bins").of( ... ))
   .apply(TextIO.Write.to(*output file based on key*));
If Sink is the solution, would you please share some sample code with me?
Thanks!

Beam 2.2 will include an API to do just that - TextIO.write().to(DynamicDestinations); see the source. For now, if you'd like to use this API, you can use the 2.2.0-SNAPSHOT version. Note that this API is experimental and might change in Beam 2.3 or later.
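For a rough idea of the shape of per-key dynamic writes, here is a minimal sketch using FileIO.writeDynamic(), the form this API later stabilized into (Beam 2.3+). The output path and key/value types are hypothetical, not taken from the question:

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// keyedLines holds one formatted text line per element, keyed by its bin.
static void writeByKey(PCollection<KV<String, String>> keyedLines) {
  keyedLines.apply(FileIO.<String, KV<String, String>>writeDynamic()
      .by(KV::getKey)                                   // choose a destination per element
      .withDestinationCoder(StringUtf8Coder.of())
      .via(Contextful.fn(KV::getValue), TextIO.sink())  // write each value as a text line
      .to("gs://my-bucket/bins")                        // hypothetical output directory
      .withNaming(key -> FileIO.Write.defaultNaming("bin-" + key, ".txt")));
}

Each distinct key then ends up in its own set of files named after the key.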

Related

How to create docs for spring-integration endpoint

I'm using a Spring Boot 2.2.6 web application with Maven 3. I'm also using spring-integration-http for my endpoints, which means they look similar to the following:
@Bean
public IntegrationFlow test(CommonTransformer<TestDTO, String, TestMapper> testTransformer, Jackson2JsonObjectMapper obj) {
    return IntegrationFlows.from(Http.inboundGateway("/foo/{name}")
                .requestMapping(m -> m.methods(HttpMethod.GET))
                .payloadExpression("#pathVariables.name")
                .replyChannel(Constants.REPLY)
                .requestPayloadType(String.class))
            .transform(testTransformer)
            .transform(new ObjectToJsonTransformer(obj))
            .channel(Constants.HTTP_REQUEST)
            .get();
}
Now I would like to create OpenAPI docs for my endpoints and, if possible, a Swagger UI to test them.
I have read several official and unofficial docs; I found an interesting doc here and another, much more interesting, example here.
My concern is that many of these articles date from before 2020 (for example, one of them uses deprecated annotations like @EnableSwagger2Mvc), and I haven't managed to find anything more up to date.
Is anyone aware of a more up-to-date procedure?
-------------------------- UPDATE --------------------------
First of all, thanks @ArtemBilan for your response.
Yes, I read that article, and I'm not new to documenting my REST APIs. With springdoc-openapi-ui I'm able to create a .json file that, if pasted into an editor like http://swagger.io or used with a specific Maven plugin, can generate a ready-to-use client (in both Spring Java and Angular).
I have tried the Springfox way (above) to document my spring-integration-http endpoints, but it sucks: it just generates some useless files to reproduce the call via cURL.
That's not what I'm looking for. I must (the STO asks) document my endpoints like the .yaml you can find for the Swagger Pet Store example.
And it seems there's no way to do so with spring-integration-http.
Any help is appreciated.

Is it possible to use TypeORM with Druid?

Currently I cannot find any information on a Node-compatible ORM that works with Druid.
Druid is not officially supported by TypeORM.
Druid accepts SQL ("Druid SQL"), so hypothetically I should be able to send raw SQL queries to Druid, correct?
I've not seen TypeORM used with Druid directly; rather, it's super common for apps to query the Apache Calcite-powered SQL API directly:
https://druid.apache.org/docs/latest/querying/sql.html
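To illustrate the shape of that call, here is a minimal Java sketch that POSTs a SQL query to that endpoint (the router address and datasource name are hypothetical; the same POST works from Node with any HTTP client):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidSqlQuery {
    public static void main(String[] args) throws Exception {
        // Request body per the Druid SQL HTTP API.
        String body = "{\"query\": \"SELECT COUNT(*) AS cnt FROM wikipedia\"}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8888/druid/v2/sql/"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // a JSON array of result rows
    }
}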
Some people build an additional layer with application logic on top first - e.g. what Target have done.
https://imply.io/virtual-druid-summit/enterprise-scale-analytics-platform-powered-by-druid-at-target
Note the bit on NULL handling in case that's important to ya :) https://druid.apache.org/docs/latest/querying/sql.html#null-values

Extracting CCD document values using MDHT via Mirth

I am trying to use the MDHT tools to extract values from a CCD document via Mirth. I am doing this in the following way.
I downloaded the Java runtime libraries, placed them in Mirth's custom-lib folder, and wrote sample code in Mirth's transformer and Deploy sections to extract some patient values.
Code in transformer.
// Load the CCD document
var doc = org.openhealthtools.mdht.uml.cda.util.CDAUtil.load(
    new java.io.ByteArrayInputStream(messageObject.getRawData().getBytes("UTF-8")));

// Get the CCD document sections to be parsed
var docPatientRole = doc.getRecordTargets().get(0).getPatientRole();
var docPatient = docPatientRole.getPatient();
var docPatientName = docPatient.getNames().get(0);

// Map patient identity fields to Mirth channel map variables
channelMap.put('patientFirstName', docPatientName.getGivens().get(0).getText());
channelMap.put('patientLastName', docPatientName.getFamilies().get(0).getText());
channelMap.put('patientGenderCode', docPatient.getAdministrativeGenderCode().getCode());
channelMap.put('patientDateOfBirth', docPatient.getBirthTime().getValue()); // YYYYMMDD
Can anyone help me with the code? I am new to JavaScript, and I am not aware of all the functions in the .jar files needed to access the other components in a CCD.
I am currently stuck at this point. I need to access all the sections/components in a CCD. Can anyone please point me to any examples or tutorials (via Mirth) related to each section? I have already looked at some links (the user guide / developer's guide), but all the links are dead.
Any help is appreciated.
"but all links are dead and not working"
I know that feel... it's frustrating.
For a start, you need to define the type and version of the document that you want to consume. Check out the article "What version of CCDA Document is this?". Then you need to find an Implementation Guide (IG) for this type of document so you know its structure (for example, the HL7 C-CDA Release 1.1 IG is available here). If you know the document type, you know what data can be extracted from the document.
I'm not sure which programming language you're using in your question. Is it Java or JavaScript? My examples are in Java (see also the sketch after this list):
CCDA REST API - package com.appliedinformatics.cdaapi.parser (RecordTarget, Medications, Problems, Results).
MDHT Developers Guide: Consume CDA Content using MDHT API (Allergies).
MDHT Consolidated CDA Validator - GitHub
A reference C-CDA Validator - GitHub
MDHT CDA Maven example - GitHub
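As a starting point for discovering what a given CCD actually contains, here is a minimal Java sketch that walks the document's sections via the MDHT API and prints their codes and titles (the file path is hypothetical, and the method names come from MDHT's generated model, so verify them against your MDHT version):

import java.io.FileInputStream;
import org.openhealthtools.mdht.uml.cda.ClinicalDocument;
import org.openhealthtools.mdht.uml.cda.Section;
import org.openhealthtools.mdht.uml.cda.util.CDAUtil;

public class SectionLister {
    public static void main(String[] args) throws Exception {
        // Load the CCD from disk (path is hypothetical).
        ClinicalDocument doc = CDAUtil.load(new FileInputStream("ccd.xml"));

        // Print each section's LOINC code and title to see what the document holds.
        for (Section section : doc.getSections()) {
            var code = section.getCode() != null ? section.getCode().getCode() : "(no code)";
            var title = section.getTitle() != null ? section.getTitle().getText() : "(no title)";
            System.out.println(code + " : " + title);
        }
    }
}

The same calls work inside a Mirth transformer in JavaScript, using fully qualified class names as in your existing code.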

How to get a dataflow job's step details using Java Beam SDK?

I'm using the Java Beam SDK for my dataflow job, and the com.google.api.services.dataflow.model.Job class gives details about a particular job. However, it doesn't provide any method/property to get dataflow step information such as Elements Added, Estimated Size, etc.
Below is the code I'm using to get the job's information,
PipelineResult result = p.run();
String jobId = ((DataflowPipelineJob) result).getJobId();
DataflowClient client = DataflowClient.create(options);
Job job = client.getJob(jobId);
I'm looking for something like,
job.getSteps("step name").getElementsAdded();
job.getSteps("step name").getEstimatedSize();
Thanks in advance.
The SinkMetrics class provides a bytesWritten() method and an elementsWritten() method. In addition, the SourceMetrics class provides an elementsRead() and a bytesRead() method.
If you use the classes in the org.apache.beam.sdk.metrics package to query for these metrics and filter by step, you should be able to get the underlying metrics for the step (i.e., elements read).
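For example, here is a minimal sketch of querying counters for a single step via the Beam metrics API (the step name is hypothetical; on Dataflow, prefer getAttempted(), since committed values may be unsupported):

import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.MetricsFilter;

static void printStepCounters(PipelineResult result, String stepName) {
    // Filter the job's metrics down to those reported by one step.
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder().addStep(stepName).build());
    for (MetricResult<Long> counter : metrics.getCounters()) {
        System.out.println(counter.getName() + " = " + counter.getAttempted());
    }
}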
I would add that, if you're willing to look outside the Beam Java SDK, since you're running on Google Cloud Dataflow you can use the Google Dataflow API. In particular, you can use projects.jobs.getMetrics to fetch the detailed metrics for the job, including the number of elements written/read. You will need to do some parsing, as there are hundreds of metrics for even a simple job, but the underlying data you're looking for is present via this API call (I just tested).
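A rough sketch of that route, reusing the DataflowClient from the question (method and model names are per the google-api-services-dataflow client, worth verifying against your client version):

import com.google.api.services.dataflow.model.JobMetrics;
import com.google.api.services.dataflow.model.MetricUpdate;
import org.apache.beam.runners.dataflow.DataflowClient;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;

static void printServiceMetrics(DataflowPipelineOptions options, String jobId) throws Exception {
    DataflowClient client = DataflowClient.create(options);
    // Same data as projects.jobs.getMetrics: hundreds of structured metric updates.
    JobMetrics jobMetrics = client.getJobMetrics(jobId);
    for (MetricUpdate metric : jobMetrics.getMetrics()) {
        // metric.getName() carries the metric name plus context (step, output, etc.);
        // filter on it to find per-step values such as ElementCount.
        System.out.println(metric.getName() + " = " + metric.getScalar());
    }
}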

Get LOC count for classes and methods using Jenkins

I have installed Jenkins for my projects. The automatic build and deployment is happening successfully.
I would like to get following data:
No. of classes with lines in the range 0-50
No. of classes with lines in the range 301-500
No. of classes with lines in the range 501-1000
No. of classes with lines > 1000 etc.
I'd like the same breakdown for methods, e.g. the no. of methods with lines in the range 0-50.
How can I get this data?
Please let me know.
I suggest you use cloc: http://cloc.sourceforge.net/
You can then export the data as SQL, import it into an in-memory H2 database, and group it according to your needs.
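A minimal sketch of that route (note that cloc counts per file, a reasonable proxy for classes under Java's one-class-per-file convention; the table name t and the nCode column are what cloc's --sql dump defines, but verify against your cloc version):

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.h2.tools.RunScript;

public class ClocBuckets {
    public static void main(String[] args) throws Exception {
        // First run: cloc --by-file --sql 1 src/ > cloc.sql
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:cloc")) {
            RunScript.execute(conn, new FileReader("cloc.sql"));
            ResultSet rs = conn.createStatement().executeQuery(
                "SELECT bucket, COUNT(*) AS files FROM ("
                + "  SELECT CASE WHEN nCode <= 50 THEN '0-50'"
                + "              WHEN nCode <= 300 THEN '51-300'"
                + "              WHEN nCode <= 500 THEN '301-500'"
                + "              WHEN nCode <= 1000 THEN '501-1000'"
                + "              ELSE '> 1000' END AS bucket FROM t"
                + ") b GROUP BY bucket");
            while (rs.next()) {
                System.out.println(rs.getString("bucket") + ": " + rs.getInt("files"));
            }
        }
    }
}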
Perhaps more than you need, but have you looked at Sonar? (http://www.sonarsource.org/) It integrates with your build, and can provide the metrics you're looking for and a lot more besides.
There are several other useful and easy to use tools:
javancss: A Source Measurement Suite for Java http://www.kclee.de/clemens/java/javancss/
ckjm: Chidamber and Kemerer Java Metrics http://www.spinellis.gr/sw/ckjm/
Some relevant tools:
classycle - http://classycle.sourceforge.net/
jdepend - http://clarkware.com/software/JDepend.html
You can also use XRadar to aggregate all these reports and get something called "Project health". XRadar also supports the previously mentioned cloc.
I do not know if the issue is still relevant to you. The other responses do not address Jenkins. There are several plugins for Jenkins, such as the SLOCCount plugin (http://www.dwheeler.com/sloccount/). After you install the plugin, you can retrieve code metrics via the Jenkins REST API.
