Turn off Konva.js warnings

When a stage has more than five layers, Konva logs a warning to the console along the lines of:
"The stage has n layers. Recommended maximum number of layers is 3-5. Adding more layers into the stage may drop the performance. Rethink your tree structure, you can use Konva.Group."
That's fair, but everything works fine and there are no performance issues, so I would like to turn the warning off.
Is there a way to stop the Konva library from emitting certain warnings?

That is possible via the library's showWarnings flag (https://konvajs.org/api/Konva.html#.showWarnings):
Konva.showWarnings = false;

How to avoid data averaging when logging to a metric across multiple runs?

I'm trying to log data points for the same metric across multiple runs (wandb.init is called repeatedly in between data points) and I'm unsure how to avoid the behavior seen in the attached screenshot.
Instead of getting a line chart with multiple points, I'm getting a single data point with associated statistics. In the attached example, the first data point was generated at step 1,470 and the second at step 2,940; rather than seeing two points, I get a single point that is the average of the two and appears at step 2,205.
My hunch is that using the resume run feature may address my problem, but even testing this hunch is proving cumbersome given the constraints of the system I'm working with.
Before I invest more time in my hypothesized solution, could someone confirm that the behavior I'm seeing is indeed the result of logging data to the same metric across separate runs without using the resume feature?
If this is the case, can you confirm or deny my conception of how to use resume?
Initial run:
1. run = wandb.init()
2. wandb_id = run.id
3. Cache wandb_id for successive runs.
Successive run:
1. Retrieve wandb_id from the cache.
2. wandb.init(id=wandb_id, resume="must")
Is it also acceptable / preferable to replace steps 1 and 2 of the initial run with:
wandb_id = wandb.util.generate_id()
wandb.init(id=wandb_id)
It looks like you're grouping runs, which could be why values appear averaged across steps. This might not be the case, but it's worth trying: turn off grouping by clicking the button in the centre above your runs table on the left (it's highlighted in purple in the image below).
Both of the ways you're suggesting to resume runs seem fine.
My hunch is that using the resume run feature may address my problem,
Indeed, providing a cached id in combination with resume="must" fixed the issue.
Corresponding snippet:
import wandb
# wandb run associated with evaluation after first N epochs of training.
wandb_id = wandb.util.generate_id()
wandb.init(id=wandb_id, project="alrichards", name="test-run-3/job-1", group="test-run-3")
wandb.log({"mean_evaluate_loss_epoch": 20}, step=1)
wandb.finish()
# wandb run associated with evaluation after second N epochs of training.
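# Reusing the cached id with resume="must" makes this call log to the same run as above.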
wandb.init(id=wandb_id, resume="must", project="alrichards", name="test-run-3/job-2", group="test-run-3")
wandb.log({"mean_evaluate_loss_epoch": 10}, step=5)
wandb.finish()

Apache Camel XPathBuilder performance

I have the following question. I set up a Camel project to parse certain XML files, picking out certain nodes from each file.
I have two files, 246 KB and 347 KB in size, and I am extracting a parent-child pair of 250 nodes in the example given below.
With the default factory the times are 77 s for the 246 KB file and 106 s for the larger one. Wanting to improve the performance, I switched to Saxon, and the times came down to 47 s and 54 s, cutting the time roughly in half.
Is it possible to cut the time further? Any other factory or optimizations I can use would be appreciated.
I am using XPathBuilder to evaluate the XPaths; here is an example. Is it possible to avoid creating the XPathBuilder repeatedly? It seems like it has to be constructed for every XPath; if I could keep one instance and keep pumping XPaths into it, maybe that would improve performance further.
return XPathBuilder.xpath(nodeXpath)
.saxon()
.namespace(Consts.XPATH_PREFIX, nameSpace)
.evaluate(exchange.getContext(), exchange.getIn().getBody(String.class), String.class);
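To illustrate, here is a rough, untested sketch of the reuse I have in mind: caching one builder per distinct XPath expression (imports assume Camel 2.x, where XPathBuilder lives in org.apache.camel.builder.xml; Consts.XPATH_PREFIX and nameSpace are the same values as in the snippet above, and whether the builder is actually safe to share like this is part of my question):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.camel.Exchange;
import org.apache.camel.builder.xml.XPathBuilder;

// Compile each distinct XPath once and reuse the builder afterwards.
private final Map<String, XPathBuilder> builders = new ConcurrentHashMap<>();

private String evaluateCached(String nodeXpath, Exchange exchange) {
    XPathBuilder builder = builders.computeIfAbsent(nodeXpath,
            xp -> XPathBuilder.xpath(xp)
                    .saxon()
                    .namespace(Consts.XPATH_PREFIX, nameSpace));
    return builder.evaluate(exchange.getContext(),
            exchange.getIn().getBody(String.class), String.class);
}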
Adding more details based on Michael's comments. I am in effect joining the two paths, which should become clear with the example below: I am combining them into a JSON.
So here we go. Let's say we have the following mappings for the first and second path:
pData.tinf.rexd: bm:Document/bm:xxxxx/bm:PmtInf[{0}]/bm:ReqdExctnDt/text()
pData.tinf.pIdentifi.instId://bm:Document/bm:xxxxx/bm:PmtInf[{0}]/bm:CdtTrfTxInf[{1}]/bm:PmtId/bm:InstrId/text()
This would result in a JSON like the following:
pData: {
    tinf: {
        rexd: <value_from_xml>,
        pIdentifi: {
            instId: <value_from_xml>
        }
    }
}
Hard to say without seeing your actual XPath expression, but given the file sizes and execution time my guess would be that you're doing a join which is being executed naively as a cartesian product, i.e. with O(n*m) performance. There is probably some way of reorganizing it to have logarithmic performance, but the devil is in the detail. Saxon-EE is quite good at optimizing join queries automatically; if not, there are often ways of doing it manually -- though XSLT gives you more options (e.g. using xsl:key or xsl:merge) than XPath does.
Actually I was able to bring the time down to 10 seconds. I am using Apache Camel, so I added threads there so that multiple files can be read in separate threads. Once a file was being read, the processing was serial, based on the number of nodes that had to be traversed. I realized that it was not necessary to be serial here, so I introduced parallelStream, and that gave it enough power. One thing to guard against is a proliferation of threads, since that can degrade performance, so I try to restrict the number of threads to two or three times the number of cores on the machine.
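As a minimal sketch of the thread-capping idea (nodes and processNode stand in for the real traversal), submitting the parallel stream from inside a dedicated ForkJoinPool bounds its concurrency:
import java.util.concurrent.ForkJoinPool;

// A parallel stream submitted from inside a ForkJoinPool runs its work in
// that pool, so this caps the concurrency at roughly 2x the core count
// instead of using the unbounded default common pool.
int cores = Runtime.getRuntime().availableProcessors();
ForkJoinPool pool = new ForkJoinPool(2 * cores);
try {
    pool.submit(() ->
            nodes.parallelStream().forEach(node -> processNode(node)))
        .join();
} finally {
    pool.shutdown();
}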

Dask - Understanding diagnostics - memory:list

I am working on a fairly complex application that makes use of the Dask framework, and I am trying to increase its performance. To that end I am looking at the diagnostics dashboard. I have two use cases: in the first I have a 1 GB Parquet file split into 50 parts, and in the second I have just the first part of that file, split into 5 parts, which is what was used for the following charts:
The red node is called "memory:list" and I do not understand what it is.
When running the bigger input, this seems to block the whole operation.
Finally, this is what I see when I go inside those nodes:
I am not sure where I should start looking to understand what is generating this memory:list node, especially given that there is no stack button inside the task, as there usually is. Any suggestions?
Red nodes are in memory. So this computation has occurred, and the result is sitting in memory on some machine.
It looks like the type of that piece of data is a Python list object. Also, the name of the task is list-159..., so this is probably the result of calling the Python list function.

GPUImage - Does filter chain order matter?

I've set up a bunch of sliders to manipulate the values of various GPUImageFilters targeted by a GPUImagePicture. My current chain order looks like this:
self.gpuImagePicture = [[GPUImagePicture alloc] initWithImage:self.image];
[self.gpuImagePicture addTarget:self.toneCurveFilter];
[self.toneCurveFilter addTarget:self.exposureFilter];
[self.exposureFilter addTarget:self.contrastFilter];
[self.contrastFilter addTarget:self.saturationFilter];
[self.saturationFilter addTarget:self.highlightShadowFilter];
[self.highlightShadowFilter addTarget:self.whiteBalanceFilter];
[self.whiteBalanceFilter addTarget:self.gpuImageView];
[self.whiteBalanceFilter setInputRotation:[self gpuImageRotationModeForImage:self.image] atIndex:0];
[self.gpuImagePicture processImage];
When I remove the tone curve filter, everything works smoothly. If I use the tone curve filter alone, I have no issues either. But with the above implementation, processing slows down tremendously.
Does the order of the filter chaining matter when it comes to memory management and processing, or did adding the tone curve filter to the rest of the chain just push this setup over the edge?
EDIT:
I've realized it might be worth mentioning how the sliders change the filter values. If the exposure slider is moved, for example, it triggers this code:
[self.exposureFilter setExposure:sender.value];
[self.gpuImagePicture processImage];
Sometimes, filter order doesn't matter, but it usually does. Some color-adjustment operations work in such a way that they can be rearranged without changing the output, but most filter calculations will produce slightly to significantly different results if you rearrange them. Think about it like arithmetic where you change the order of operations or move some parentheses around.
Now, when it comes to performance or memory usage, order usually doesn't matter. Branching operations are the only case where this comes into play (having a filter that targets multiple outputs, that at some point are blended or combined into one output). You don't have that here, though.
You do have a number of steps in the above, and there is overhead in every filter you chain. However, I'm surprised you even see a difference in performance, because the bottleneck in the above should be the creation of the GPUImagePicture. Instantiating one of those is far slower than any filter you'd perform on it, due to the pass through Core Graphics needed to re-render and upload the picture as a texture.
If you are reusing your toneCurveFilter or others, make sure that they are fully disconnected from all their previous targets before using -addTarget: again. It's possible that you're switching out pictures while leaving all the filters connected to their previous targets, meaning that each new picture will keep adding targets. This will lead to tremendous slowdown.
I bet something like the above is what's slowing you down, but when in doubt fire up Time Profiler and / or the OpenGL profiler and see where you're really spending all your time.

Constructing Dataflow pipeline with same transforms on side outputs

We are building a streaming pipeline where the data may encounter different errors at several steps, such as serialization errors, validation errors, and runtime errors when writing to storage. Whenever an error happens, we direct the data to a side output. The error handling logic is the same on these side outputs: we write the data to a common error storage for post-processing / reporting.
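For concreteness, each step emits failures to its side output roughly along these lines (a simplified sketch in the Beam-style Java API; input, parse, and the ErrorMessage constructor are illustrative placeholders, not our real code):
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<String> mainTag = new TupleTag<String>() {};
final TupleTag<ErrorMessage> errorTag = new TupleTag<ErrorMessage>() {};

PCollectionTuple results = input.apply(ParDo
    .of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            try {
                c.output(parse(c.element()));                          // main output
            } catch (Exception e) {
                c.output(errorTag, new ErrorMessage(c.element(), e));  // side output
            }
        }
    })
    .withOutputTags(mainTag, TupleTagList.of(errorTag)));

PCollection<ErrorMessage> sideOutput1 = results.get(errorTag);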
There are at least three options for constructing the pipeline (pseudo-code below).
1. Handle each side output with a new instance of the transform.
sideOutput1.apply(new HandleErrorTransform());
sideOutput2.apply(new HandleErrorTransform());
2. Handle each side output with a single instance of the transform.
Transform errorTransform = new HandleErrorTransform();
sideOutput1.apply(errorTransform);
sideOutput2.apply(errorTransform);
3. Flatten the output from these side outputs and use a single transform to handle all the errors.
PCollectionList.of(sideOutput1).and(sideOutput2)
    .apply(Flatten.<ErrorMessage>pCollections())
    .apply(new HandleErrorTransform());
Is there any advice on which one to use, for better scalability and performance? Or maybe it doesn't matter?
1 and 2 are basically the same -- since the pipeline is serialized the sharing doesn't offer any advantages.
Option 3 may have some advantages, since it is easier to add more logic to that path. It will likely be a bit easier to scale, since there will be only one source writing elements to the final location, and that means fewer buffers, more opportunities to batch elements, etc.
One downside of 3 is that the use of Flatten will hold up any windows created in the HandleErrorTransform until all of the main pipeline has processed those timestamps. This may be desirable -- all errors from records in this window -- but if not, it can be addressed using triggers.
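To make the trigger suggestion concrete, here is a hedged sketch in the Beam-style Java API, applied to the flattened error collection from option 3 (allErrors is the result of the Flatten above; the window size and early-firing delay are arbitrary placeholders, not a recommendation):
import org.joda.time.Duration;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;

// Fire early panes for errors instead of waiting for the main pipeline's
// watermark to close the window behind the Flatten.
PCollection<ErrorMessage> triggered = allErrors.apply(
    Window.<ErrorMessage>into(FixedWindows.of(Duration.standardMinutes(5)))
        .triggering(AfterWatermark.pastEndOfWindow()
            .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(1))))
        .withAllowedLateness(Duration.ZERO)
        .accumulatingFiredPanes());
triggered.apply(new HandleErrorTransform());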