Writer behavior when using partitions in JSR-352 Java Batch processing

I have implemented a Java batch program using the JSR-352 framework implementation available in the WebSphere Liberty server.
The job is chunk-processing oriented:
Reader - reads the data
Processor - processes it
Writer - writes data to a file and updates data in another file
Will the write operations in the Writer overwrite each other when I create two partitions for this job, or will the output written by both partitions remain in the file without being overwritten by the other?
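For context, here is a minimal sketch (with hypothetical class and property names, not the asker's actual code) of the kind of chunk writer described above; the partition-scoped fileName property is only an illustration of how each partition could be pointed at its own output file via the partition plan:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Serializable;
import java.util.List;
import javax.batch.api.BatchProperty;
import javax.batch.api.chunk.AbstractItemWriter;
import javax.inject.Inject;
import javax.inject.Named;

// Hypothetical writer for the chunk step described above. The "fileName"
// batch property could be set per partition in the partition plan so that
// each partition writes to its own file.
@Named("outputWriter")
public class OutputWriter extends AbstractItemWriter {

    @Inject
    @BatchProperty(name = "fileName")
    private String fileName;

    private BufferedWriter writer;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        writer = new BufferedWriter(new FileWriter(fileName, true)); // append mode
    }

    @Override
    public void writeItems(List<Object> items) throws Exception {
        for (Object item : items) {
            writer.write(item.toString());
            writer.newLine();
        }
        writer.flush();
    }

    @Override
    public void close() throws Exception {
        if (writer != null) {
            writer.close();
        }
    }
}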

Related

Apache Beam: wait for the AvroIO write step to finish before starting the ImportTransform Dataflow template

I'm using Apache Beam to create a pipeline that basically reads an input file, converts it to Avro, writes the Avro files to a bucket, and then imports those Avro files into Spanner using a Dataflow template.
The problem I'm facing is that the last step (importing the Avro files into the database) starts before the previous step (writing the Avro files to the bucket) is done.
I tried to add Wait.on, but that only works with a PCollection, and when I write the files to Avro it returns PDone.
Example of the code:
// Step 1: Read Files
PCollection<String> lines = pipeline.apply("Reading Input Data exported from Cassandra",
        TextIO.read().from(options.getInputFile()));

// Step 2: Convert to Avro
lines.apply("Write Item Avro File",
        AvroIO.writeGenericRecords(spannerItemAvroSchema)
                .to(options.getOutput())
                .withSuffix(".avro"));

// Step 3: Import to the Database
pipeline.apply(new ImportTransform(
        spannerConfig,
        options.getInputDir(),
        options.getWaitForIndexes(),
        options.getWaitForForeignKeys(),
        options.getEarlyIndexCreateFlag()));
Again, the problem is that Step 3 starts before Step 2 is done. Any ideas?
This is a flaw in the API; see, e.g., a recent discussion of this on the Beam dev list. The only solutions for now are to either fork AvroIO to return a PCollection or run two pipelines sequentially.
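For the second workaround, a minimal sketch of running the two pipelines sequentially, assuming hypothetical helper methods buildAvroWritePipeline and buildImportPipeline that apply Steps 1-2 and Step 3 respectively; the key call is waitUntilFinish(), which blocks until the first pipeline has written the Avro files:
// Hypothetical helpers: buildAvroWritePipeline applies Steps 1-2,
// buildImportPipeline applies the ImportTransform (Step 3).
Pipeline writePipeline = buildAvroWritePipeline(options);
writePipeline.run().waitUntilFinish();   // blocks until the Avro files exist in the bucket

Pipeline importPipeline = buildImportPipeline(options);
importPipeline.run().waitUntilFinish();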

JMeter: get pass/fail count of samples from a JTL file

I am using JMeter for functional testing and have two different JMX files.
The first JMX has all APIs automated, and the second JMX is used to send the HTML report (generated using the Ant-JMeter task) through an SMTP Sampler.
Now I want to send the total, pass, and fail sample counts in the same email by parsing the JTL file generated by the first JMX.
Here is what I can see in the JTL file: s="true" and s="false".
I want to count these and save the counts as properties to use later in the SMTP Sampler.
Example from the JTL:
<sample t="2" it="0" lt="2" ct="0" ts="1565592433268" s="false" lb="Verify Latest Patch" rc="200" rm="OK" tn="Tenant_Login 3-1" dt="text" by="9" sby="0" ng="1" na="1">
Any help will be appreciated.
Add the following line to the user.properties file:
jmeter.save.saveservice.autoflush=true
It will instruct JMeter to write results to the file as soon as they're available.
Add a tearDown Thread Group to your Test Plan.
Add an HTTP Request sampler to the tearDown Thread Group.
Configure it as follows:
Protocol: file
Path: location of your .jtl result file
Add an XPath Extractor as a child of the HTTP Request sampler.
Configure it as follows:
Reference Name: anything meaningful, e.g. successCount
XPath query: count(//sample[@s='true'])
That's it - now you should be able to refer to the successful samples count as ${successCount} where required.
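For illustration only, here is a standalone Java sketch of the same XPath count applied directly to an XML-format JTL file; the file name results.jtl is hypothetical, and this is not how JMeter evaluates the extractor internally:
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class JtlCounter {
    public static void main(String[] args) throws Exception {
        // Parse the XML-format JTL file (hypothetical path).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new File("results.jtl"));
        XPathFactory xpf = XPathFactory.newInstance();
        // Same expressions as the XPath Extractor above.
        double pass = (Double) xpf.newXPath()
                .evaluate("count(//sample[@s='true'])", doc, XPathConstants.NUMBER);
        double fail = (Double) xpf.newXPath()
                .evaluate("count(//sample[@s='false'])", doc, XPathConstants.NUMBER);
        System.out.println("pass=" + (int) pass + " fail=" + (int) fail);
    }
}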

Error creating vocabulary from big text file on disk

I am trying to run the example from https://cran.r-project.org/web/packages/text2vec/vignettes/files-multicore.html, but with my own file "text" - 3.7 GB of plain text built from a Wikipedia XML dump with the Perl script from http://mattmahoney.net/dc/textdata.html.
setwd("c:/rtest")
library(text2vec)
library(doParallel)
N_WORKERS = 2
registerDoParallel(N_WORKERS)
it_files_par = ifiles_parallel(file_paths = "text")
it_token_par = itoken_parallel(it_files_par, preprocessor = tolower, tokenizer = word_tokenizer)
vocab = create_vocabulary(it_token_par)
This causes the error:
Error in unserialize(socklist[[n]]) : error reading from connection
I have 8 GB of RAM; a word2vec model is created from this file without any errors.
First of all, it doesn't make sense to use parallel iterators on a single file - each file is processed in a separate R worker process, so here it will be worse than plain itoken. It also involves sending the result from each worker to the master process, and here we see that the result is too big to be sent through a socket.
Long story short - just use itoken or split your file into several smaller files.

zlib inflate giving data_error in Erlang

I have a Java client which is sending a message to an Erlang server process listening on TCP. The Java client sends the data using an OutputStream. On the server side I am using the following call to uncompress the data after initialising zlib:
zlib:inflate(ZStream, Data),
where Data is a binary. I am getting data_error on this call.
Under what conditions do I get data_error with zlib?
Try setting WindowBits to 0 or -15. It would help if you pasted more code, such as the zlib:inflateInit call, a binary dump of the Data variable, and the Java-side zlib init.
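For reference, a minimal sketch (with hypothetical names, not the asker's actual code) of what the Java side might look like; the nowrap flag below is what decides whether the Erlang side should inflate with WindowBits = 15 (zlib-wrapped stream) or -15 (raw deflate):
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressExample {
    public static void main(String[] args) {
        byte[] input = "hello from the java client".getBytes(StandardCharsets.UTF_8);
        // false -> zlib-wrapped output (header + checksum), inflate with WindowBits = 15
        // true  -> raw deflate output, inflate with WindowBits = -15
        boolean nowrap = false;
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int len = deflater.deflate(buffer); // number of compressed bytes produced
        deflater.end();
        // buffer[0..len) is what would be written to the TCP OutputStream.
        System.out.println("compressed " + input.length + " bytes into " + len);
    }
}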
If you are streaming the data in relatively small chunks, you can use my ezlib on GitHub.
Performance-wise it's around 69% faster than the Erlang driver, and it also works better when you have concurrent sessions.
To integrate it, use rebar as you would for any other Erlang app. To run a small example:
StringBin = <<"this is a string compressed with zlib nif library">>,
{ok, DeflateRef} = ezlib:new(?Z_DEFLATE),
{ok, InflateRef} = ezlib:new(?Z_INFLATE),
CompressedBin = ezlib:process(DeflateRef, StringBin),
DecompressedBin = ezlib:process(InflateRef, CompressedBin).
Do not use it to compress large blocks, because you can block the Erlang scheduler. I will change this in subsequent versions.

Is it possible to compile a LaTeX document via Node.js?

I'm new to Node.js, but I think it could be a good fit for an asynchronous LaTeX compile engine.
In other words, I'd like to know whether it is possible, and how, to compile a document via Node.js and pdflatex.
The remote application would send the document as a JSON data structure, together with a template name for the final document layout.
Node.js would handle the compilation to PDF, taking the template from the file system.
Do you know if something similar already exists?
You can spawn your own child processes and thus also start LaTeX processing. By registering the appropriate listeners, you can detect process completion or failure output:
var sys = require('sys'),
    spawn = require('child_process').spawn,
    pdflatex = spawn('pdflatex', ['-output-directory', '/target/dir/', 'input.tex']);

pdflatex.on('exit', function (code) {
    console.log('child process exited with code ' + code);
});
EDIT: For creating the intermediate LaTeX file from the provided data, I'd suggest using a Node.js template engine like mu/mustache.
You could then pipe the output chunks from the template engine as stdin to your spawned pdflatex process.
