Assigning labels to triples - jena

I am currently trying to do stream reasoning using Jena, so I want to be able to reason over a certain set of triples that have occurred in a particular window of time, also taking into account some background static knowledge.
My problem is that I have an ontology that I read from several files, however I wish for the triples I obtain to have time stamps for when I receive them, which I thought I could just do by applying labels to the triples (I am just giving them all random time stamps for the moment as this is only a test).
While I didn't think that this would be problem, I am struggling at the initial step of just applying a label to an existing triple and selecting it. I cannot not seem to be able to access triples from the ontModel without having to transform it into a Graph, and while I could then create quads with the extra value being some literal for time, I can't find a way to then reason over this graph.
Any light that people can shed on this issue would help. I hope I am being clear.

I'm not sure exactly how you're putting labels on your triples, but you can get Statements from an OntModel, and Statement implements FrontsTriple through which you can access a corresponding Triple.

Related

OPENCV OPENVINO cv2.rectangle

I am using opencv and openvino and am trying to figure out when I have a face detected, use the cv2.rectangle and have my coordinates sent but only on the first person bounded by the box so it can move the motors because when it sees multiple people it sends multiple coordinates and thus causing the servo and stepper motors to go crazy. Any help would be appreciated. Thank you
Generally, each code would run line by line. You'll need to create a proper function for each scenario so that the data could be handled and processed properly. In short, you'll need to implement error handling and data handling (probably more than these, depending on your software/hardware design). If you are trying to implement multiple threads of executions at the same time, it is better to use multithreading.
Besides, you are using 2 types of motors. Simply taking in all data is inefficient and prone to cause missing data. You'll need to be clear about what servo motor and stepper motor tasks are, the relations between coordinates, who will trigger what, if something fails or some sequence is missing then do task X, etc.
For example, the sequence of Data A should produce Result A but it is halted halfway because Data B went into the buffer and interfered with Result A and at the same time screwed Result B which was anticipated to happen. (This is what happened in your program)
It's good to review and design your whole process by creating a coding flowchart (a diagram that represents an algorithm). It will give you a clear idea of what should happen for each sequence of code. Then, design a proper handler for each situation.
Can you share more insights of your (pseudo-)code, please?
It sounds easy - you trigger a face-detection inference-request and you get a list/vector with all detected faces (the region-of-interest for each detected face) (including false-positive and false-positives, requiring some consistency-checks to filter those).
If you are interested in the first detected face only - then it could be to just process the first returned result from the list/vector.
However, you will see that sometimes the order of results might change, i.e. when 2 faces A and B were detected, in the next run it could still return faces, but B first and then A.
You could add object-tracking on top of face-detection to make sure you always process the same face.
(But even that could fail sometimes)

Abaqus: efficiently export large xy data set from Abaqus

I am trying to export XY data objects from sets of the size of 20-40k elements, but Abaqus is slowing down considerably, and even crashing. In fact, when I create the xy data, Abaqus gives me a warning saying that "the number of xyDataObjects is very large, and might cause performance issues". And so it does.
My usual procedure to is to create the xy data and then export in rpt format. Can someone suggest another method less prone to crashing? Would it be more efficient to divide the output element set into two or more subsets, and concatenate them after exporting?
The method recommended by #hgazibara in the comments is certainly sufficient, but it is laborious.
An easier method, I found, is a package called Abaqus2Matlab, which scrapes any variable you want from the odb. See here: http://www.abaqus2matlab.com/

Apache-camel Xpathbuilder performance

I have following question. I set up an camel -project to parse certain xml files. I have to selecting take out certain nodes from a file.
I have two files 246kb and 347kb in size. I am extracting a parent-child pair of 250 nodes in the above given example.
With the default factory here are the times. For the 246kb file respt 77secs and 106 secs. I wanted to improve the performance so switched to saxon and the times are as follows 47secs and 54secs. I was able to cut the time down by at least half.
Is it possible to cut the time further, any other factory or optimizations I can use will be appreciated.
I am using XpathBuilder to cut the xpaths out. here is an example. Is it possible to not to have to create XpathBuilder repeatedly, it seems like it has to be constructed for every xpath, I would have one instance and keep pumping the xpaths into it, maybe it will improve performance further.
return XPathBuilder.xpath(nodeXpath)
.saxon()
.namespace(Consts.XPATH_PREFIX, nameSpace)
.evaluate(exchange.getContext(), exchange.getIn().getBody(String.class), String.class);
Adding more details based on Michael's comments. So I am kind of joining them, will become clear with my example below. I am combining them into a json.
So here we go, Lets say we have following mappings for first and second path.
pData.tinf.rexd: bm:Document/bm:xxxxx/bm:PmtInf[{0}]/bm:ReqdExctnDt/text()
pData.tinf.pIdentifi.instId://bm:Document/bm:xxxxx/bm:PmtInf[{0}]/bm:CdtTrfTxInf[{1}]/bm:PmtId/bm:InstrId/text()
This would result in a json as below
pData:{
tinf: {
rexd: <value_from_xml>
}
pIdentifi:{
instId: <value_from_xml>
}
}
Hard to say without seeing your actual XPath expression, but given the file sizes and execution time my guess would be that you're doing a join which is being executed naively as a cartesian product, i.e. with O(n*m) performance. There is probably some way of reorganizing it to have logarithmic performance, but the devil is in the detail. Saxon-EE is quite good at optimizing join queries automatically; if not, there are often ways of doing it manually -- though XSLT gives you more options (e.g. using xsl:key or xsl:merge) than XPath does.
Actually I was able to bring the time down to 10 secs. I am using apache-camel. So I added threads there so that multiple files can be read in separate threads. Once the file was being read, it had serial operation to based on the length of the nodes that had to be traversed. I realized that it was not necessary to be serial here so introduced parrallelStream and that now gave it enough power. One thing to guard agains is not to have a proliferation of threads since that can degrade the performance. So I try to restrict the number of threads to twice or thrice the number of cores on the operating machine.

Apache Beam - Delta between windows

Apologies, in trying to be concise and clear my previous description of my question turned into a special case of the general case I'm trying to solve.
New Description
I'm trying to Compare the last emitted value of an Aggregation Function (Let's say Sum()) with a each element that I aggregate over in the current window.
Worth noting, that the ideal (I think) solution would include
The T2(from t-1) element used at time = t is the one that was created during the previous window.
I've been playing with several ideas/experiments but I'm struggling to find a way to accomplish this in a way is elegant and "empathetic" to Beam's compute model (which I'm still trying to fully Grock after many an article/blog/doc and book :)
Side inputs seem unwieldy because It looks like I have to shift the emitted 5M#T-1 Aggregation's timestamp into the 5M#T's window in order to align it with the current 5M window
In attempting this with side inputs (as I understand them), I ended up with some nasty code that was quite "circularly referential", but not in an elegant recursive way :)
Any help in the right direction would be much appreciated.
Edit:
Modified diagram and improved description to more clearly show:
the intent of using emitted T2(from t-1) to calculate T2 at t
that the desired T2(from t-1) used to calculate T2 is the one with the correct key
Instead of modifying the timestamp of records that are materialized so that they appear in the current window, you should supply a window mapping fn which just maps the current window on to the past one.
You'll want to create a custom WindowFn which implements the window mapping behavior that you want paying special attention to overriding the getDefaultWindowMappingFn function.
Your pipeline would be like:
PCollection<T> mySource = /* data */
PCollectionView<SumT> view = mySource
.apply(Window.into(myCustomWindowFnWithNewWindowMappingFn))
.apply(Combine.globally(myCombiner).asSingletonView());
mySource.apply(ParDo.of(/* DoFn that consumes side input */).withSideInputs(view));
Pay special attention to the default value the combiner will produce since this will be the default value when the view has had no data emitted to it.
Also, the easiest way to write your own custom window function is to copy an existing one.

Store Redundant Info vs. Repeated Conversions

Is it preferable to store redundant information, (which can be otherwise generated from existing data,) or to instead convert the existing data each time you need access?
I've simplified my specific problem as best as I can below, hoping that the provided answers are useful as future-reference material.
Example:
Let's say we've developed a program that places data into Squares on a grid (like a super-descriptive game of Tic-Tac-Toe or something) and assigns various details, and a unique identification number to each:
Throughout our program, we often perform logic based on a square's X and/or Y coordinates (checking for 3 in a row) and other times we only need the ID (perhaps to access a string at "SquareName[ID]") - We aren't exactly certain which of these two is accessed more often, but it's a rather close competition.
Up until now we've simply stored the ID inside the square class, and converted it with some simple formulas whenever just the X or Y are needed. Say we want to get coordinates for one square in particular:
int CurrentX = (this.Square.ID - 1) % 3) + 1; // X coordinate, 1 through 3
int CurrentY = (this.Square.ID + 1) / 3; // Y, 1 through 3
Since the squares don't move around or change ID after setup, part of me believes it would be simpler just to store all 3 values inside the Square class, but my other part cringes at the redundancy since access to X and Y is already easy enough to calculate from the existing ID.
(Note, This program itself is not very memory or resource intensive, nor does the size of the grid get much larger, so it mostly comes down to which option is a better practice or rule of thumb.)
What would you do?
As a rule of thumb, for a system where the data is read/write, store your basic data without redundancy.
When performance or other considerations become a practical issue, then you should denormalize as necessary. (i.e. wait for it to be a problem, don't pre-optimize overly much).
Your goal should be the most maintainable code possible. That usually means writing the least code possible. Having extra code to maintain redundant copies of data points will make your code more brittle.
If those are values which can be determined at the moment of creation and then do not change anymore, I would go for variables populated in the constructor. It's not redundant info in so far as that it isn't stored anywhere else, but that's not my main point. When reading my code, I'd usually expect that whenever something is computed at the time of request, it might change per request. It is easy to find the point in the source where the field is populated and where it is changed, especially if it does never change, but you might end up slightly confused when looking at some calculation which will return always the same result, as it's variables can't change, and wonder whether you're just missing a case or this is really static.
Also, using a descriptive variable name, you can get rid of the comments. Not that I generally aim at not commenting, but source code which doesn't even need comments is a pretty save signal for easy to understand code, which might (/should) be your aim.

Resources