Parse Stackdriver LogEntry JSON in Dataflow pipeline - google-cloud-dataflow

I'm building a Dataflow pipeline to process Stackdriver logs. The data is read from Pub/Sub and the results are written to BigQuery.
When I read from Pub/Sub I get JSON strings of LogEntry objects, but what I'm really interested in is the protoPayload.line records, which contain the user log messages. To get those I need to parse the LogEntry JSON object, and I found a two-year-old Google example of how to do it:
try {
    JsonParser parser = new JacksonFactory().createJsonParser(entry);
    LogEntry logEntry = parser.parse(LogEntry.class);
    logString = logEntry.getTextPayload();
} catch (IOException e) {
    LOG.error("IOException parsing entry: " + e.getMessage());
} catch (NullPointerException e) {
    LOG.error("NullPointerException parsing entry: " + e.getMessage());
}
Unfortunately this doesn't work for me: logEntry.getTextPayload() returns null. I'm not even sure it's supposed to work, as the library is not mentioned anywhere in the Google Cloud docs; the current logging library seems to be google-cloud-logging.
Could anyone suggest the right or simplest way to parse LogEntry objects?

I ended up manually parsing the LogEntry JSON with the gson library, using the tree-traversal approach in particular.
Here is a small snippet:
static class ProcessLogMessages extends DoFn<String, String> {
    public void processElement(ProcessContext c) {
        String entry = c.element();
        JsonParser parser = new JsonParser();
        JsonElement element = parser.parse(entry);
        // Only traverse entries that did not parse to JSON null
        if (!element.isJsonNull()) {
            JsonObject root = element.getAsJsonObject();
            JsonArray lines = root.get("protoPayload").getAsJsonObject().get("line").getAsJsonArray();
            for (int i = 0; i < lines.size(); i++) {
                JsonObject line = lines.get(i).getAsJsonObject();
                String logMessage = line.get("logMessage").getAsString();
                // Do what you need with the logMessage here
            }
        }
    }
}
This is simple enough and works fine for me since I'm only interested in the protoPayload.line.logMessage values. But I guess it is not an ideal way of parsing LogEntry objects if you need to work with many attributes.


Parsing the swagger API doc (swagger.json) to Java objects

I want to parse any complex Swagger API document (swagger.json) into Java objects.
Maybe a List<Map<String,Object>>?
What are the available options?
I am trying io.swagger.parser.SwaggerParser, but I want to make sure that I know the other available options and that I use a parser that can handle any complex document.
Currently we are trying the following.
public List<Map<String,Object>> parse(String swaggerDocString) throws SwaggerParseException {
    try {
        Swagger swagger = new SwaggerParser().parse(swaggerDocString);
        return processSwagger(swagger);
    } catch (Exception ex) {
        String exceptionRefId = OSGUtil.getExceptionReferenceId();
        logger.error("exception ref id " + exceptionRefId + " : Error while loading swagger file " + ex);
        throw new SwaggerParseException("", ex.getLocalizedMessage(), exceptionRefId);
    }
}

public List<Map<String,Object>> processSwagger(Swagger swagger) {
    List<Map<String,Object>> finalResult = new ArrayList<>();
    Map<String, Model> definitions = swagger.getDefinitions();
    // loop all the available paths of the swagger
    if (swagger.getPaths() != null && !swagger.getPaths().isEmpty()) {
        for (String group : swagger.getPaths().keySet()) {
            // get the path
            Path path = swagger.getPath(group);
            // list all the operations of the path
            Map<HttpMethod,Operation> mapList = path.getOperationMap();
        }
    }
    return finalResult;
}
whats the differences between
Swagger has implementations for all the technologies, and the details for parsing Swagger into Java are here.

How to create HL7 message ORU_R01 type using HAPI 2.4

I am a newbie to HL7. I am trying to construct an HL7 message of type ORU_R01 using HAPI 2.4. I get an incorrect message format when I add the patient details in the code below; otherwise the format is OK. How can I fix this? Is there an example of constructing an HL7 ORU message with PID, ORC, OBR and OBX?
Output without patient
Output with patient (If I comment the patient details in the code)
import ca.uhn.hl7v2.model.v24.message.ORM_O01;
import ca.uhn.hl7v2.model.v24.group.ORM_O01_PATIENT;
import ca.uhn.hl7v2.HapiContext;
import ca.uhn.hl7v2.DefaultHapiContext;
import ca.uhn.hl7v2.parser.Parser;
import ca.uhn.hl7v2.model.v24.segment.MSH;

public class CreateORUMessage {
    private String sendingApplication = "IM";
    private String sendingFacility = "ABC-ClinPath";
    private String receivingApplication = "ABC-vet";
    private String receivingFacility = "ABC-VetMed";

    private void createHL7Message() {
        try {
            ORM_O01 order = new ORM_O01();
            //ORU_R01 oru = new ORU_R01();
            // Populate the MSH Segment
            // Example - MSH|^~\&|HISA_8592|HISF_2603|||200706081131||ADT^A04|HL7O.1.11379|D|2.1
            MSH mshSegment = order.getMSH();
            //PID - patient details
            ORM_O01_PATIENT orm_pid = order.getPATIENT();
            // Now, let's encode the message and look at the output
            HapiContext context = new DefaultHapiContext();
            Parser parser = context.getPipeParser();
            String encodedMessage = parser.encode(order);
            System.out.println("Printing ER7 Encoded Message:");
            System.out.println(encodedMessage);
            //String msg = order.encode();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String args[]) {
        new CreateORUMessage().createHL7Message();
    }
}
I tried another way too, but it did not work either:
String msg = order.encode();
Your problem is most likely that the segment separator character in HL7 is CR (carriage return), which on a console just resets the cursor to the start of the line, so each segment overwrites the previous one. This only affects writing the message to the console. Writing to a file or sending over TCP should be fine without any further conversion.
I had the same problem in an application once; my solution is below.
ORU_R01 outMessage = new ORU_R01();
outMessage.initQuickstart("ORU", "R01", "T");
MSH mshSegment = outMessage.getMSH();
/* some code removed */
PID pidSegment = outMessage.getRESPONSE().getPATIENT().getPID();
/* some more code removed */
LOGGER.trace("Generated message contents:\n" + replaceNewlines(outMessage.encode()));
And the code for replaceNewlines() is quite simple:
private static String replaceNewlines(String input) {
    return input.replaceAll("\\r", "\n");
}
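To make the CR behaviour concrete, here is a minimal, self-contained Java sketch. It does not use HAPI: the sample message is hand-written for illustration, and the names Hl7ConsoleDemo, forConsole and segments are made up for this example. It shows that an encoded string holds one segment per CR, and that swapping CR for LF is only needed for display.

```java
import java.util.Arrays;
import java.util.List;

class Hl7ConsoleDemo {
    /** Replaces the HL7 segment separator (CR) with LF so each segment prints on its own line. */
    static String forConsole(String er7) {
        return er7.replace('\r', '\n');
    }

    /** Splits an ER7-encoded message into its segments; trailing empty pieces are dropped by split(). */
    static List<String> segments(String er7) {
        return Arrays.asList(er7.split("\r"));
    }

    public static void main(String[] args) {
        // Hand-written sample, not real HAPI output
        String er7 = "MSH|^~\\&|IM|ABC-ClinPath|ABC-vet|ABC-VetMed\r"
                   + "PID|||12345\r"
                   + "OBR|1\r";
        System.out.println(forConsole(er7));
        System.out.println(segments(er7).size() + " segments"); // 3 segments
    }
}
```

The string passed to a file or socket should stay untouched; only the copy that goes to the console or log needs the conversion.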

Groovy DAO Variable Scope Issue

In our Grails project we use a common Groovy DAO that accesses an Amazon Oracle database through a PooledDataSource. Things were not working, and I suspect it was because the scope of some of the variables was incorrect. I have trimmed the code down, and changed the names, to a small subset of what we are doing in several locations. Some of the code in question was written by another developer with much more Java experience than me. I am a relative Java/Groovy novice, so forgive the basic questions.
class SomeDAO {
    MyPooledDataSource ds = new MyPooledDataSource()
    Connection conn
    PreparedStatement stmt
    String queryText

    public String getUserCount() {
        String jsonOne
        PojoOne one = new PojoOne()
        conn = ds.getPooled()
        queryText = getQuery("SomeQuery")
        try {
            stmt = conn.prepareStatement(queryText)
            stmt.setString(1, 'YTD')
            stmt.setString(2, '2014')
            ResultSet rs = stmt.executeQuery()
            while (rs.next()) {
                // populate 'one' from the current row
            }
        } catch (SQLException e) {}
        jsonOne = (one as JSON).toString()
        return jsonOne
    }

    public String getUserMetrics() {
        String jsonTwo
        ArrayList objArray = new ArrayList()
        conn = ds.getPooled()
        try {
            queryText = getQuery("SomeOtherQuery")
            stmt = conn.prepareStatement(queryText)
            stmt.setString(1, 'YTD')
            stmt.setString(2, '2014')
            ResultSet rsQuery = stmt.executeQuery()
            while (rsQuery.next()) {
                PojoTwo two = new PojoTwo()
                // populate 'two' from the current row and add it to objArray
            }
        } catch (SQLException e) {}
        jsonTwo = (objArray as JSON).toString()
        return jsonTwo
    }

    public String getQuery(String operationName) {
        String query = "select QRY_TXT from T_SVC_QRY where OPERATION_NM = '" + operationName + "'"
        ResultSet rs
        conn = ds.getPooled()
        stmt = conn.prepareStatement(query)
        rs = stmt.executeQuery(query)
        while (rs.next()) {
            queryText = rs.getString("QRY_TXT")
        }
        return queryText
    }
}
I have some concerns about the code we've written.
It seems like...
Connection conn
PreparedStatement stmt
String queryText
...should not be at the class level, but declared in each method, to avoid mutation of the variables by another method (or even by the same method handling a different request) causing side effects. Correct? Please explain.
PooledDataSource should be declared at the class level for re-use. Correct? Please explain.
Do we need to do the ds.setDataSource() in each method, or should that be done once for the class, and why?
It seems like if try/catch is appropriate for the getUserCount and getUserMetrics methods, then it should be used in the getQuery method as well. Correct? Please explain.
That's funky code. Only the DataSource should be shared. Do a Google search with your favorite search engine for groovy.sql.Sql - it's your best bet for working directly with JDBC in a Groovy or Grails project. It has lots of helper methods that let you write intuitive code and let it do the heavy lifting.
You might want to start by checking out the Javadoc page for the class.
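To illustrate why only the DataSource should be shared, here is a small plain-Java sketch (Java rather than Groovy, and with no real JDBC involved). FakePool, FieldDao and LocalDao are hypothetical stand-ins invented for this example, and unlike the code in the question they release their connections, which is an assumption added so the effect is visible. FieldDao keeps the connection in a field, as the question's DAO does, so the nested getQuery call silently replaces the caller's connection; LocalDao keeps it in a local variable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a connection pool; tracks which connection ids are still open. */
class FakePool {
    final List<Integer> open = new ArrayList<>();
    private int next = 1;

    int acquire() { open.add(next); return next++; }
    void release(int id) { open.remove(Integer.valueOf(id)); }
}

/** Mirrors the question's layout: the connection is a field shared by every method. */
class FieldDao {
    private final FakePool pool;
    private int conn;

    FieldDao(FakePool pool) { this.pool = pool; }

    String getQuery(String operationName) {
        conn = pool.acquire();               // overwrites whatever the caller stored in the field
        return "query for " + operationName;
    }

    String getUserCount() {
        conn = pool.acquire();               // connection 1
        String sql = getQuery("SomeQuery");  // field now holds connection 2
        pool.release(conn);                  // releases 2, so connection 1 is never released
        return sql;
    }
}

/** Same logic with method-local variables: each method releases exactly what it acquired. */
class LocalDao {
    private final FakePool pool;             // only the pool is shared

    LocalDao(FakePool pool) { this.pool = pool; }

    String getQuery(String operationName) {
        int conn = pool.acquire();
        try { return "query for " + operationName; } finally { pool.release(conn); }
    }

    String getUserCount() {
        int conn = pool.acquire();
        try { return getQuery("SomeQuery"); } finally { pool.release(conn); }
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        FakePool a = new FakePool();
        new FieldDao(a).getUserCount();
        System.out.println("field-based DAO leaves open: " + a.open);          // [1]

        FakePool b = new FakePool();
        new LocalDao(b).getUserCount();
        System.out.println("local-variable DAO leaves open: " + b.open);       // []
    }
}
```

The same overwriting can happen between two requests if one DAO instance serves both, which is why per-call state such as the connection, statement and query text is safer as local variables.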

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system using neo4j and the spatial plugin. We start from an OSM file, read it, and import nodes and relationships into our graph (a custom graph model).
Without the neo4j batch inserter, importing a compressed OSM file (around 140MB compressed, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64bit, quad core, 8GB RAM. Please note that most of the time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with neo4j, the file is read in around 7 minutes (I'm sure about this because in our process we first read the file to store the correct OSM node ids, and then read it again to create the neo4j graph).
Obviously we need to improve the import process, so we are trying to use the batchInserter. So far, so good (I still need to check how well the batchInserter performs, but I guess it will be faster). The first thing I did was try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
    public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
    private static File basePath = new File("/home/angelo/Scrivania/neo4j");
    private static File dbPath = new File(basePath, "db");
    private GraphDatabaseService graphDb;
    private BatchInserter batchInserter;
    // private BatchInserterIndexProvider batchIndexService;
    private SpatialDatabaseService spatialDb;
    private SimplePointLayer spl;

    static
    {
        NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
        NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
        NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
        NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
        NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NEO4J_CFG.put( "dump_configuration", "true" );
    }

    public void initialize(Map<String, Object> arg0)
    {
        batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
        graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
        spatialDb = new SpatialDatabaseService(graphDb);
        spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
        //batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
    }

    public void complete()
    {
        // TODO Auto-generated method stub
    }

    public void release()
    {
        // TODO Auto-generated method stub
    }

    public void process(EntityContainer ec)
    {
        Entity entity = ec.getEntity();
        if (entity instanceof Node) {
            Node osmNodo = (Node)entity;
            org.neo4j.graphdb.Node graphNode = graphDb.createNode();
            graphNode.setProperty("osmId", osmNodo.getId());
            graphNode.setProperty("latitudine", osmNodo.getLatitude());
            graphNode.setProperty("longitudine", osmNodo.getLongitude());
        } else if (entity instanceof Way) {
            //do something with the way
        } else if (entity instanceof Relation) {
            //do something with the relation
        }
    }
}
Then I wrote the following test case:
public class BatchInserterTest
{
    private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());

    public void batchInserter()
    {
        try {
            File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
            boolean pbf = false;
            CompressionMethod compression = CompressionMethod.None;
            if (file.getName().endsWith(".pbf"))
                pbf = true;
            else if (file.getName().endsWith(".gz"))
                compression = CompressionMethod.GZip;
            else if (file.getName().endsWith(".bz2"))
                compression = CompressionMethod.BZip2;
            RunnableSource reader;
            if (pbf)
                reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
            else
                reader = new XmlReader(file, false, compression);
            reader.setSink(new BatchInserterSinkTest());
            Thread readerThread = new Thread(reader);
            readerThread.start();
            // Wait for the reader thread to finish
            while (readerThread.isAlive()) {
                try {
                    readerThread.join();
                } catch (InterruptedException e) {
                    /* do nothing */
                }
            }
        } catch (Exception e) {
            logger.error("Errore nella creazione di neo4j con batchInserter", e);
        }
    }
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the batchInserter for my case? I have to add the created nodes to the layer; how can I create it using the batchInserter graph db service?
Is there a simple example?
Any tip is really appreciated.
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.

Accessing encoded stream in OpenRasta

I have a need to access the encoded stream in OpenRasta before it gets sent to the client. I have tried using a PipelineContributor and registering it before KnownStages.IEnd, tried after KnownStages.IOperationExecution and after KnownStages.AfterResponseConding but in all instances the context.Response.Entity stream is null or empty.
Anyone know how I can do this?
Also, I want to find out the requested codec fairly early on, yet when I register after KnownStages.ICodecRequestSelection it returns null. I get the feeling I am missing something about these pipeline contributors.
Without writing your own Codec (which, by the way, is really easy), I'm unaware of a way to get the actual stream of bytes sent to the browser. The way I'm doing this is serializing the ICommunicationContext.Response.Entity before the IResponseCoding known stage. Pseudo code:
class ResponseLogger : IPipelineContributor
{
    public void Initialize(IPipeline pipelineRunner)
    {
        // Run LogResponse just before the response is encoded
        pipelineRunner.Notify(LogResponse).Before<KnownStages.IResponseCoding>();
    }
    PipelineContinuation LogResponse(ICommunicationContext context)
    {
        string content = Serialize(context.Response.Entity);
        return PipelineContinuation.Continue;
    }
    string Serialize(IHttpEntity entity)
    {
        if ((entity == null) || (entity.Instance == null))
            return String.Empty;
        try
        {
            using (var writer = new StringWriter())
            using (var xmlWriter = XmlWriter.Create(writer))
            {
                Type entityType = entity.Instance.GetType();
                XmlSerializer serializer = new XmlSerializer(entityType);
                serializer.Serialize(xmlWriter, entity.Instance);
                return writer.ToString();
            }
        }
        catch (Exception exception)
        {
            return exception.ToString();
        }
    }
}
This ResponseLogger is registered the usual way:
As mentioned, this doesn't necessarily give you the exact stream of bytes sent to the browser, but it is close enough for my needs, since the stream of bytes sent to the browser is basically just the same serialized entity.
By writing your own codec, you can, with no more than 100 lines of code, tap into the IMediaTypeWriter.WriteTo() method, which I would guess is the last line of defense before your bytes are transferred into the cloud. Within it, you basically just do something simple like this:
public void WriteTo(object entity, IHttpEntity response, string[] parameters)
{
    using (var writer = XmlWriter.Create(response.Stream))
    {
        XmlSerializer serializer = new XmlSerializer(entity.GetType());
        serializer.Serialize(writer, entity);
    }
}
If, instead of writing directly to the IHttpEntity.Stream, you write to a StringWriter and call ToString() on it, you'll have the serialized entity, which you can log or otherwise process before writing it to the output stream.
While all of the above example code is based on XML serialization and deserialization, the same principle should apply no matter what format your application is using.
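The buffer-then-write idea is not tied to .NET or to XML. As a rough illustration, here is the same pattern as a self-contained Java sketch; it is not OpenRasta code. BufferedEntityWriter, its serialize method and the in-memory log list are all hypothetical names invented for this example, and the "serializer" is a trivial placeholder for whatever format your codec really produces.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BufferedEntityWriter {
    /** Stand-in for a real logger, so the example needs no logging library. */
    final List<String> log = new ArrayList<>();

    /** Placeholder serializer (no escaping); a real codec would emit proper XML, JSON, etc. */
    static void serialize(Object entity, StringWriter out) {
        out.write("<entity>" + entity + "</entity>");
    }

    /** Serializes into a buffer first, logs the text, then copies it to the response stream. */
    void writeTo(Object entity, OutputStream response) throws IOException {
        StringWriter buffer = new StringWriter();
        serialize(entity, buffer);
        String body = buffer.toString();
        log.add(body);                                          // inspect or log before sending
        response.write(body.getBytes(StandardCharsets.UTF_8));  // then send exactly what was logged
    }

    public static void main(String[] args) throws IOException {
        BufferedEntityWriter writer = new BufferedEntityWriter();
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        writer.writeTo("hello", response);
        System.out.println("logged: " + writer.log.get(0));
        System.out.println("sent:   " + new String(response.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The trade-off is that the whole entity is held in memory before anything is sent, which is fine for typical API responses but worth keeping in mind for very large ones.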
