Hazelcast Jet with java.util.stream

I really like how Hazelcast Jet works with java.util.stream, but when I run those streams I am confused about how this actually runs in a distributed way.
import com.hazelcast.jet.IListJet;
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.stream.DistributedCollectors;
import com.hazelcast.jet.stream.DistributedStream;

public class IstreamHazelcastDemo {

    public static void main(String[] args) {
        // Start one Jet member in this JVM, plus a second one so a cluster forms.
        JetInstance jet = Jet.newJetInstance();
        Jet.newJetInstance();

        IListJet<String> list = jet.getList("list");
        for (int i = 0; i < 50; i++) {
            list.add("test" + i);
        }

        DistributedStream.fromList(list)
                .map(word -> {
                    System.out.println("word: " + word);
                    return word.toUpperCase();
                })
                .collect(DistributedCollectors.toIList("sink"))
                .forEach(System.out::println);
    }
}
This is a simple example where I first create a Jet instance by running another main program and then run this code, so it forms a cluster of 2 nodes. When I run the above code I was expecting the print statement inside the map function to be printed on both nodes, since I thought it is distributed and the work would be sent to multiple nodes. But it always executes the whole flow on only one node. I am trying to understand how this is distributed, or whether it is my understanding of Hazelcast Jet that is lacking.
Thanks

Try this change and you should see a difference
IMapJet<String, String> map = jet.getMap("map");
for (int i = 0; i < 50; i++) {
    map.put("test" + i, "test" + i);
}

DistributedStream.fromMap(map)
        .map(entry -> {
            System.out.println("word: " + entry.getKey());
            return entry.getKey().toUpperCase();
        })
        .collect(DistributedCollectors.toIList("sink"))
        .forEach(System.out::println);
The difference here is around distribution and partitioning.
A list is distributed, meaning it is sent to the grid for hosting, but it is still a single object. One grid member holds it, so you'll see a single stream of sysout from the mapper.
A map is distributed, but is also partitioned, meaning the hosting is split across the grid members. If there are two grid members they'll have roughly half of the map content each. So you'll see multiple streams of sysout.
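If you want to verify the partitioning yourself, the underlying IMDG API can tell you which member owns the partition for each key. A minimal sketch (not part of the original answer), assuming the jet and map variables from the snippets above:
// Print the owning member for every key (com.hazelcast.core.Partition);
// with a two-member cluster you should see both addresses appear.
for (String key : map.keySet()) {
    Partition partition = jet.getHazelcastInstance().getPartitionService().getPartition(key);
    System.out.println(key + " -> partition " + partition.getPartitionId()
            + " owned by " + partition.getOwner());
}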

Related

EMQX - Publishing to an MQTT topic with a unique identifier takes much more time than a static MQTT topic

I was trying to publish messages to an EMQX broker on different topics. Publishing on a dynamic topic with one client takes much more time; if the topic name is static it takes much less time.
I have posted the results and code below.
I am using the EMQX broker with the Eclipse Paho client version 3 and QoS level 1.
Time for 100 simple publish messages per pattern (consider id as dynamic here):
Total time pattern 1: /config/{id}/outward :: 36 sec (dynamic topic; {id} is a variable whose value changes in the loop shown in the code below)
Total time pattern 2: /config/test :: 1.2 sec (static topic)
How should I publish messages with a different id in the topic so that it does not take so much time?
import java.math.BigDecimal;

import org.eclipse.paho.client.mqttv3.IMqttClient;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MwttPublish {

    static IMqttClient instance = null;

    public static IMqttClient getInstance() {
        try {
            if (instance == null) {
                // mqttHostUrl is defined elsewhere in the original class
                instance = new MqttClient(mqttHostUrl, "SimpleTestMQTT");
            }
            if (!instance.isConnected()) {
                MqttConnectOptions options = new MqttConnectOptions();
                options.setUserName("test");
                options.setPassword("test".toCharArray());
                options.setAutomaticReconnect(true);
                options.setCleanSession(false);
                options.setConnectionTimeout(10);
                instance.connect(options);
            }
        } catch (final Exception e) {
            System.out.println("Exception in mqtt: " + e.getMessage());
        }
        return instance;
    }

    public static void publishMessage() throws MqttException {
        IMqttClient iMqttClient = getInstance();
        MqttMessage mqttMessage = new MqttMessage("Hello".getBytes());
        mqttMessage.setQos(1);
        mqttMessage.setRetained(true);

        System.out.println("Publish Start for pattern 1");
        int i = 0;
        final BigDecimal mqttmsgPublishstartTime = new BigDecimal(System.currentTimeMillis());
        do {
            iMqttClient.publish("/config/" + i + "/outward", mqttMessage);
            i++;
        } while (i < 100);
        System.out.println("Total time pattern 1 /config/i/outward::"
                + (new BigDecimal(System.currentTimeMillis())).subtract(mqttmsgPublishstartTime));

        System.out.println("Publish Start for pattern 2");
        final BigDecimal mqttmsgPublishstartTime1 = new BigDecimal(System.currentTimeMillis());
        i = 0;
        do {
            iMqttClient.publish("/config/test", mqttMessage);
            i++;
        } while (i < 100);
        System.out.println("Total time pattern 2 /config/test::"
                + (new BigDecimal(System.currentTimeMillis())).subtract(mqttmsgPublishstartTime1));
    }
}
This is not a valid test; you've fallen into many of the classic micro-benchmark traps, e.g.:
Way too small a sample size
No account taken of JVM JIT warm-up or GC overhead
Not comparing like with like, e.g. the time taken to concatenate the strings for the topics
Please check out the following: https://stackoverflow.com/a/2844291/504554
Also, from an MQTT point of view, topics are ephemeral: they only really "exist" for the instant a message is published, while the broker checks for subscribed clients with a matching pattern.
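As a rough sketch of a fairer comparison (it reuses the getInstance() helper from the question; the topic names and iteration counts below are placeholders, and for serious numbers you'd use a proper harness such as JMH): warm the client up first, pre-build the dynamic topic strings outside the timed loop, and use a much larger sample.
// Sketch only: run inside a method that declares throws MqttException.
IMqttClient client = getInstance();
MqttMessage msg = new MqttMessage("Hello".getBytes());
msg.setQos(1);

int warmup = 1_000;
int samples = 10_000;

// Build the dynamic topics up front so string concatenation is not part of the measurement.
String[] topics = new String[samples];
for (int i = 0; i < samples; i++) {
    topics[i] = "/config/" + i + "/outward";
}

// Warm-up: let the JIT, the client and the broker connection settle before timing.
for (int i = 0; i < warmup; i++) {
    client.publish("/config/warmup", msg);
}

long start = System.nanoTime();
for (int i = 0; i < samples; i++) {
    client.publish(topics[i], msg);
}
System.out.println("dynamic topics: " + (System.nanoTime() - start) / 1_000_000 + " ms");

start = System.nanoTime();
for (int i = 0; i < samples; i++) {
    client.publish("/config/test", msg);
}
System.out.println("static topic: " + (System.nanoTime() - start) / 1_000_000 + " ms");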

PowerPoint VSTO allocates more and more memory

I'm writing a VSTO add-in which loops through all slides and all shapes and sets the title to a value.
I noticed that the memory consumption goes up after each run.
So I minimized my code and let it run 100 times, which ends up allocating about 20 MB of memory for every 100 runs.
My code is executed from a sidebar button; the presentation has about 30 slides with titles.
My code looks like this for the button:
private void button1_Click(object sender, EventArgs e)
{
    SetTitle_Direct();

    Stopwatch watch = new Stopwatch();
    watch.Start();
    SetTitle_Direct();
    watch.Stop();
    //MessageBox.Show("Time spend: " + watch.Elapsed);

    AMRefreshProgress.Maximum = 100;
    AMRefreshProgress.Step = 1;
    AMRefreshProgress.UseWaitCursor = true;
    AMRefreshProgress.ForeColor = System.Drawing.ColorTranslator.FromHtml(ThisAddIn.amColor);

    for (int i = 1; i <= 100; i++)
    {
        SetTitle_Direct();
        AMRefreshProgress.PerformStep();
    }
    AMRefreshProgress.Value = 0;
    AMRefreshProgress.UseWaitCursor = false;

    Stopwatch watch2 = new Stopwatch();
    watch2.Start();
    SetTitle_Direct();
    watch2.Stop();
    MessageBox.Show("Time 1st run: " + watch.Elapsed + "\n Time 11th run: " + watch2.Elapsed);
}
The SetTitle_Direct() loops through the slides:
public void SetTitle_Direct()
{
    PowerPoint.Presentation oPresentation = Globals.ThisAddIn.Application.ActivePresentation;
    foreach (PowerPoint.Slide oSlide in oPresentation.Slides)
    {
        if (oSlide.Shapes.HasTitle == OFFICECORE.MsoTriState.msoTrue)
        {
            oSlide.Shapes.Title.TextFrame.TextRange.Text = "Test Main Title";
        }
        for (int iShape = 1; iShape <= oSlide.Shapes.Count; iShape++)
        {
            if (oSlide.Shapes[iShape].Type == Microsoft.Office.Core.MsoShapeType.msoPlaceholder)
            {
                if (oSlide.Shapes[iShape].PlaceholderFormat.Type == PowerPoint.PpPlaceholderType.ppPlaceholderSubtitle)
                {
                    oSlide.Shapes[iShape].TextFrame.TextRange.Text = "Test Sub Title";
                }
            }
        }
    }
}
What causes the AddIn to allocate more and more memory - or how could this be avoided?
When you develop a VSTO based add-in you typically deal with COM objects (PowerPoint is a COM server). Each property or method call that returns a COM object typically increases a reference count, which keeps objects in memory until the reference counter is decreased and reaches zero. So I'd recommend using the Marshal.ReleaseComObject method to decrease an object's reference count, let the runtime reclaim the memory, and keep object lifetimes shorter.
To be able to release every COM object you must split long chains of property and method calls: declare each property or method call on a separate line of code. That way you can release every object and debug the code efficiently if anything strange happens.
You may take a look at the When to release COM objects in Office add-ins developed in .NET article for more information.
An alternative is to use the garbage collector:
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();
The sequence is run twice so that RCWs whose finalizers release their COM references in the first pass are themselves collected in the second.

Where do you call CallActivityAsync in an orchestration method?

I have just started using Durable Functions and need some advice on how to do the fan-out pattern correctly. I have an FTP server from which I read all the files. I want to start an activity function for each file. As I understand it, the orchestrator function will be called every time an activity function is executed, but I only want to read the files once. To avoid calling the code that reads the files and starts the activity functions multiple times, what is the recommended approach? Is it having an activity function that adds all the other activity functions, or is it using the IsReplaying property, or something different?
[FunctionName("OrchestrationMoveFilesToBlob")]
public static async Task<List<string>> RunOrchestrator(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    var outputs = new List<string>();

    if (!context.IsReplaying)
    {
        // Do you call your database here and make a call to CallActivityAsync for each row?
    }

    // doing it here is probably very wrong as it will be called multiple times
    var tasks = new Task<string>[7];
    for (int i = 0; i < 7; i++)
    {
        tasks[i] = context.CallActivityAsync<string>("E2_CopyFileToBlob", "");
    }
    await Task.WhenAll(tasks);

    return outputs;
}
Looking at the sample in the link below, it actually calls CallActivityAsync directly in the orchestrator function. Is this not really bad? Doesn't it keep adding the same activities again and again?
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-cloud-backup
I'm not sure I understand what you are trying to achieve, but your code doesn't look bad so far. An orchestration is just called once (and maybe a few more times for replay, but that is not your problem here). From your orchestration you can call all your activity functions as a fan-out (each activity function gathering one file from the FTP server). await Task.WhenAll(tasks) is your fan-in. (You can use a List<Task> instead of the array and call .Add(task) on it if you want.) In order not to edit your code I copied it here and added some comments and questions (feel free to edit here):
[FunctionName("OrchestrationMoveFilesToBlob")]
public static async Task<List<string>> RunOrchestrator(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    var outputs = new List<string>();

    if (!context.IsReplaying)
    {
        // just needed for things that should not happen twice, like logging...
    }

    // if your work isn't a fixed list, just call an activity here
    // which replies with the list of work (e.g. the list of file names)
    var tasks = new Task<string>[7]; // can be a List<Task> too
    for (int i = 0; i < 7; i++)
    {
        tasks[i] = context.CallActivityAsync<string>("E2_CopyFileToBlob", "");
    }
    await Task.WhenAll(tasks);

    return outputs; // currently an empty list. What do you want to give back?
}

From Bigtable To GCS (and vice versa) via Dataflow

We are trying to run a daily Dataflow pipeline that reads off Bigtable and dumps data into GCS (using HBase's Scan and HBaseResultCoder as the coder), as follows (just to highlight the idea):
Pipeline pipeline = Pipeline.create(options);

Scan scan = new Scan();
scan.setCacheBlocks(false).setMaxVersions(1);
scan.addFamily(Bytes.toBytes("f"));

CloudBigtableScanConfiguration btConfig = new CloudBigtableScanConfiguration.Builder()
    .withProjectId("aaa").withInstanceId("bbb").withTableId("ccc").withScan(scan).build();

pipeline.apply(Read.from(CloudBigtableIO.read(btConfig)))
    .apply(TextIO.Write.to("gs://bucket/dir/file").withCoder(HBaseResultCoder.getInstance()));

pipeline.run();
This seems to run perfectly as expected.
Now, we want to be able to use the dumped file in GCS for a recovery job if needed. That is, we want a Dataflow pipeline which reads the dumped data (a PCollection of Result objects) from GCS and creates Mutations ('Put' objects, basically). For some reason, the following code fails with a bunch of NullPointerExceptions. We are unsure why that would be the case; the if-statements below which check for null or zero-length values were added to see if that would make a difference, but it did not.
// Part of DoFn<Result, Mutation>
@Override
public void processElement(ProcessContext c) {
    Result result = c.element();
    byte[] row = result.getRow();
    if (row == null || row.length == 0) { // NullPointerException at this line
        return;
    }
    Put mutation = new Put(result.getRow());
    // go through the column/value entries of this row, and create a corresponding put mutation.
    for (Entry<byte[], byte[]> entry : result.getFamilyMap(Bytes.toBytes(cf)).entrySet()) {
        byte[] qualifier = entry.getKey();
        if (qualifier == null || qualifier.length == 0) {
            continue;
        }
        byte[] val = entry.getValue();
        if (val == null || val.length == 0) {
            continue;
        }
        mutation.addImmutable(cf_bytes, qualifier, entry.getValue());
    }
    c.output(mutation);
}
The error we get is the following (line 83 is marked above):
(2a6ad6372944050d): java.lang.NullPointerException at some.package.RecoveryFromGcs$CreateMutationFromResult.processElement(RecoveryFromGcs.java:83)
I have two questions:
1. Has anyone experienced something like this when they apply a ParDo to a PCollection of Results to get a PCollection of Mutations to be written to Bigtable?
2. Is this a reasonable approach? The end goal is to leave a daily snapshot of our Bigtable table (for a specific column family) on a regular basis by means of a back-up, in case something bad happens. We wish to be able to read the back-up data via Dataflow and write it to Bigtable when we need to.
Any suggestions and help will be really appreciated!
-------- Edit
Here is the code that scans Bigtable and dumps data to GCS:
(Some details are hidden if they are not relevant.)
public static void execute(Options options) {
    Pipeline pipeline = Pipeline.create(options);
    final String cf = "f"; // some specific column family.

    Scan scan = new Scan();
    scan.setCacheBlocks(false).setMaxVersions(1); // Disable caching and read only the latest cell.
    scan.addFamily(Bytes.toBytes(cf));

    CloudBigtableScanConfiguration btConfig =
        BigtableUtils.getCloudBigtableScanConfigurationBuilder(options.getProject(), "some-bigtable-name")
            .withScan(scan).build();

    PCollection<Result> result = pipeline.apply(Read.from(CloudBigtableIO.read(btConfig)));
    PCollection<Mutation> mutation =
        result.apply(ParDo.of(new CreateMutationFromResult(cf))).setCoder(new HBaseMutationCoder());
    mutation.apply(TextIO.Write.to("gs://path-to-files").withCoder(new HBaseMutationCoder()));

    pipeline.run();
}
The job that reads the output of the above code has the following code:
(This is the one throwing exception when reading from GCS)
public static void execute(Options options) {
    Pipeline pipeline = Pipeline.create(options);

    PCollection<Mutation> mutations = pipeline.apply(TextIO.Read
        .from("gs://path-to-files").withCoder(new HBaseMutationCoder()));

    CloudBigtableScanConfiguration config =
        BigtableUtils.getCloudBigtableScanConfigurationBuilder(options.getProject(), btTarget).build();

    if (config != null) {
        CloudBigtableIO.initializeForWrite(pipeline);
        mutations.apply(CloudBigtableIO.writeToTable(config));
    }

    pipeline.run();
}
The error I am getting (https://jpst.it/Qr6M) is a bit confusing, as the mutations are all Put objects, but the error is about a 'Delete' object.
It's probably best to discuss this issue on the cloud bigtable client github issues page. We are currently working on import / export features like this one, so we'll respond quickly. We'll also explore this approach on our own, even if you don't add the github issue. The github issue will allow us to communicate better.
FWIW, I don't understand how you could get an NPE on the line you highlighted. Are you sure you have the right line?
EDIT (12/12):
The following processElement() method should work to convert a Result to a Put:
@Override
public void processElement(DoFn<Result, Mutation>.ProcessContext c) throws Exception {
    Result result = c.element();
    byte[] row = result.getRow();
    if (row != null && row.length > 0) {
        Put put = new Put(row);
        for (Cell cell : result.rawCells()) {
            put.add(cell);
        }
        c.output(put);
    }
}

Vaadin Grid Row Index

In a vaadin table if we do
table.setRowHeaderMode(RowHeaderMode.INDEX);
we get a column with the row index.
Is it possible to do the same with a Vaadin Grid?
So far I haven't seen such an option, but you should be able to fake it with a generated column. Please see below a naive implementation which should get you started (improvements and suggestions are more than welcome):
// our grid with a bean item container
Grid grid = new Grid();
BeanItemContainer<Person> container = new BeanItemContainer<>(Person.class);

// wrap the bean item container so we can generate a fake header column
GeneratedPropertyContainer wrappingContainer = new GeneratedPropertyContainer(container);
wrappingContainer.addGeneratedProperty("rowHeader", new PropertyValueGenerator<Long>() {
    private long index = 0;

    @Override
    public Long getValue(Item item, Object itemId, Object propertyId) {
        return index++;
    }

    @Override
    public Class<Long> getType() {
        return Long.class;
    }
});

// assign the data source to the grid and set the desired column order
grid.setContainerDataSource(wrappingContainer);
grid.setColumnOrder("rowHeader", "name", "surname");

// tweak it a bit - definitely needs more tweaking
grid.getColumn("rowHeader").setHeaderCaption("").setHidable(false).setEditable(false).setResizable(false).setWidth(30);

// freeze the fake header column to prevent it from scrolling horizontally
grid.setFrozenColumnCount(1);

// add dummy data
layout.addComponent(grid);
for (int i = 0; i < 20; i++) {
    container.addBean(new Person("person " + i, "surname " + i));
}
This will generate a grid with the row index shown in an extra first column.
There is a Grid Renderer that can be used to do this now. It is in the grid renderers add-on https://vaadin.com/directory/component/grid-renderers-collection-for-vaadin7. It is compatible with Vaadin 8 as well.
Here is how it could be used (there are a few different options for how to render the index).
grid.addColumn(value -> "", new RowIndexRenderer()).setCaption("Row index");
Worth mentioning that I use the following with Vaadin 18 (Flow) and it works perfectly.
grid.addColumn(TemplateRenderer.of("[[index]]")).setHeader("#");
Ok, it took me more than a while to figure this out. I don't know why you need this, but if your purpose is to find which grid row was clicked, then you can get the index from the datasource of your control via the itemClick event of your listener.
In my case, my datasource is an SQLContainer, and I already had it available (see ds var) so I did it this way:
grid.addListener(new ItemClickEvent.ItemClickListener() {
    @Override
    public void itemClick(ItemClickEvent event) {
        Object itemId = event.getItemId();
        int indexOfRow = ds.indexOfId(itemId);
    }
});
You usually add a datasource to your control when you initialize it, via the constructor or by setting the property. If you got your Grid from somewhere with an already-attached datasource, you can always get it with something like this:
SQLContainer ds = (SQLContainer) grid.getContainerDataSource();
I use this trick:
// the counter must be effectively final to be captured by the lambda
AtomicInteger i = new AtomicInteger();
grid.addComponentColumn(object -> {
    return new Label("" + i.incrementAndGet());
}).setCaption("");
