Is it possible to perform I/O from within Jena rules?

I'm interested in using Jena to build a fault-diagnostic, ontology-based expert system. Is it possible to perform I/O from within forward- or backward-chaining rules? For example, to prompt the user for further facts, or to access a database?
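Jena's rule engine does support custom builtins written in Java, and a builtin body is free to perform arbitrary I/O, so prompting the user or hitting a database from a rule is possible in principle. A minimal sketch, assuming Jena's BaseBuiltin API; the askUser builtin and the example rule are made up for illustration:

import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

public class AskUser extends BaseBuiltin {
    @Override
    public String getName() { return "askUser"; }

    @Override
    public int getArgLength() { return 1; }

    // Invoked when askUser(?x) appears in a rule body; free to do any I/O.
    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        checkArgs(length, context);
        Node subject = getArg(0, args, context);
        System.out.println("Rule needs confirmation for: " + subject);
        String reply = new java.util.Scanner(System.in).nextLine();
        return "yes".equalsIgnoreCase(reply); // the rule clause succeeds only on "yes"
    }
}

Register it once before parsing rules, e.g. BuiltinRegistry.theRegistry.register(new AskUser()), and a rule can then use it:

[r1: (?f rdf:type eg:Fault) askUser(?f) -> (?f eg:confirmed "true")]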

Related

Jena Query Optimization

I am pretty new to SPARQL and Apache Jena, so please forgive my naivety.
I loaded the Wikidata dump (705 GB) using the TDB2 loader and executed some example queries from the Wikidata Query Service.
Most of the queries take longer in Jena compared to the Wikidata Query Service.
My machine has 750 GB of RAM and 80 CPUs.
My questions are:
Why is the Wikidata Query Service faster than Jena?
How can I improve query performance without rewriting the queries? Maybe some indexing techniques, or specific server configurations?
I looked through all Stack Overflow questions with the [jena] tag and didn't find anything about this. If you can point me to tutorials or resources beyond the official Jena website, that would be great.
You can try the next-generation TDB2 (instead of TDB1):
tdb2.tdbloader --loc /path/to/tdb2/ /path/to/some.ttl
Also, building a TDB2 store like that does not generate statistics by default; you have to create them manually. First cd into the TDB2 directory you created (following the example above, that is /path/to/tdb2) and run (in bash):
tdb2.tdbstats --loc=`pwd` > /tmp/stats.opt
mv /tmp/stats.opt /path/to/tdb2/Data-0001/
The statistics "guide the optimizer in choosing one execution plan over another", which could help you achieve better query performance.
https://jena.apache.org/documentation/tdb/optimizer.html#running-tdbstats
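Once the store is built, querying it from Java goes through the regular Jena query API; a minimal sketch, reusing the /path/to/tdb2 location from above (the path and the query are placeholders):

import org.apache.jena.query.*;
import org.apache.jena.tdb2.TDB2Factory;

public class QueryTdb2 {
    public static void main(String[] args) {
        Dataset dataset = TDB2Factory.connectDataset("/path/to/tdb2");
        dataset.begin(ReadWrite.READ); // TDB2 requires transactions
        try (QueryExecution qexec = QueryExecutionFactory.create(
                "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }", dataset)) {
            ResultSetFormatter.out(qexec.execSelect());
        } finally {
            dataset.end();
        }
    }
}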

What is the best way to implement a B2B communication?

For my bachelor's degree I was given the task of implementing B2B communication for an ERP system developed by the company I currently work for. Because it should also be able to communicate with other software, I am considering using EDI messages (EDIFACT) or maybe cXML. What is the best way to approach this task?
My idea was to translate each EDIFACT message into XML defined by a single XSD describing every EDIFACT message type.
Then I would write the XML into the database or map it to business objects using a self-written mapper (sketched below).
For writing EDIFACT messages I would just use the same methods in reverse.
I thought transforming to XML first would make the mapping easier and would let the XML be reused for other purposes, such as writing other EDI formats.
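To make the mapper idea concrete: with classes generated from the XSD (e.g. via xjc), the XML-to-business-object step can be plain JAXB unmarshalling. A rough sketch; the Invoice class below is a hand-written stand-in for a generated one, invoice.xml stands for the output of the EDIFACT-to-XML translation, and older stacks use javax.xml.bind instead of jakarta.xml.bind:

import jakarta.xml.bind.JAXBContext;
import jakarta.xml.bind.annotation.XmlRootElement;
import java.io.File;

@XmlRootElement(name = "Invoice")
class Invoice {
    public String documentNumber; // in practice, generated from the XSD
    @Override public String toString() { return "Invoice " + documentNumber; }
}

public class XmlToBusinessObject {
    public static void main(String[] args) throws Exception {
        // invoice.xml is the XML previously translated from an EDIFACT INVOIC message.
        Invoice invoice = (Invoice) JAXBContext.newInstance(Invoice.class)
                .createUnmarshaller()
                .unmarshal(new File("invoice.xml"));
        // From here the object can be persisted or handed to the ERP layer.
        System.out.println(invoice);
    }
}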
The other idea is to just use cXML and map it.
What is the best approach to this task?
You're essentially designing and implementing a public-facing API for the ERP, so you need to consider security, reliability, non-repudiation, and the impact on business processes under normal and abnormal conditions.
You'll also need to consider (ask) what sorts of information your customers will need to exchange with their partners (master data, transactional messages, financial information, etc.).
I'd start by looking at the most commonly exchanged messages in the industry most representative of the ERP's users - look at message content and structure.
Whether you choose EDIFACT, ANSI X12, cXML, xCBL, GS1 XML, ebXML or something else is less important than good documentation and flexibility. It's unlikely that your choice will be exactly what any of your customers need without further transformation. You don't want to invent a new any-to-any transformation tool, and you probably don't even want to bundle an existing one.

Neo4j REST vs Neo4j JDBC

What are the comparative advantages of querying a Neo4j DB via:
- REST API
- JDBC
- a Spring Data plugin
Performance will be better within Java using JDBC as opposed to a REST API. Here's a good explanation of why:
When you add complexity, the code will run slower. Introducing a REST service where it's not required will slow execution down, as the system is doing more.
Abstracting the database is good practice. If you're worried about speed, you could look into caching the data in memory so that the database doesn't need to be touched to handle the request.
Before optimizing for performance, though, I'd look into what problem you're trying to solve and the architecture you're using; I'm struggling to think of a situation where the options would come down to direct database access vs REST.
Regarding using Neo4j as a Spring Data plugin: you can certainly do so, but I have to imagine the performance would not be as good as with JDBC.
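For reference, the JDBC route is plain java.sql code with Cypher as the query text. A minimal sketch; the URL, credentials, and query are placeholders, and the exact jdbc:neo4j URL scheme depends on the driver version:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Neo4jJdbcExample {
    public static void main(String[] args) throws Exception {
        // Assumes the Neo4j JDBC driver is on the classpath.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:neo4j:bolt://localhost:7687", "neo4j", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "MATCH (p:Person) RETURN p.name AS name LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}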
From the book "Graph Databases" by Ian Robinson:
Queries run fastest when the portions of the graph needed to satisfy them reside in main memory (that is, in the filesystem cache and the object cache). A single graph database instance today can hold many billions of nodes, relationships, and properties, meaning that some graphs will be just too big to fit into main memory.
If you add another layer to the app, this will be reflected in performance: the more directly you consume your data, the better the performance, but also the greater the complexity and the harder the code is to understand.

CEP with shared memory for fallback

I'm having difficulty finding the best CEP product for our problem. We need a distributed CEP solution with shared memory. The main reason for distribution isn't speeding up processing, but having a fallback in case of hardware or software problems on nodes. Because of that, all nodes should keep their own copy of the event history.
Some less important requirements to the CEP product are:
- Open source is a big plus.
- It should run on a Linux system.
- Running in a Java environment would be nice.
Which CEP products are recommended?
A number of commercial, non-open-source products employ a distributed data grid to store the stateful event-processing data in a fault-tolerant manner. My personal experience is with TIBCO BusinessEvents, which internally uses TIBCO ActiveSpaces. Other products claim to do similar things; e.g., Oracle Event Processing uses Oracle Coherence.
As for open source solutions, I'm not aware of any that offer functionality like this out of the box. With the right skills you might be able to use one in conjunction with a data grid (I've seen people try to use Drools Fusion together with Infinispan), but there are quite a number of complexities you need to think about that a pre-integrated product would take care of for you (transaction boundaries, data access, keeping track of changes, data modeling).
An alternative you might consider, if performance doesn't dictate a distributed/load-balanced setup, is to just run a hot standby: two engines performing the same CEP logic, but with only one engine (the active one) actually triggering outgoing actions. The hot-standby engine would evaluate the CEP logic so the data is in its memory, ready to take over in case of failure, but would not trigger outgoing actions as long as the other engine is running (see the sketch below).
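The gating itself can be a thin wrapper around the engine's output, independent of the CEP product. A minimal sketch of the idea; the names and the promotion hook (wired to whatever failure detection or leader election you use) are made up for illustration:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class HotStandbyGate<E> implements Consumer<E> {
    private final AtomicBoolean active = new AtomicBoolean(false);
    private final Consumer<E> outgoingAction;

    public HotStandbyGate(Consumer<E> outgoingAction) {
        this.outgoingAction = outgoingAction;
    }

    // Called by the failure detector / leader election when roles change.
    public void promote() { active.set(true); }
    public void demote()  { active.set(false); }

    // Both engines feed their matches through here; only the active one acts.
    @Override
    public void accept(E matchedEvent) {
        if (active.get()) {
            outgoingAction.accept(matchedEvent);
        }
    }
}

Both nodes evaluate the same CEP logic and keep their state warm; on failover the standby's gate is promoted and it starts emitting actions with its event history already in memory.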

Can I connect directly to the output of a Mahout model with other data-related tools?

My only experience with Machine Learning / Data Mining is via SQL Server Analysis Services.
Using SSAS, I can set up models and fire direct singleton queries against them to do things like real-time market-basket analysis and product suggestions. I can grab the "results" from the model as a flattened resultset and visualize them elsewhere.
Can I connect directly to the output of a Mahout model with other data-related tools in the same manner? For example, is there any way I can pull out a tabular resultset so I can render it with the visualization tool of my choice? An ODBC driver, maybe?
Thanks!
The output of Mahout is generally a file on HDFS, though you could dump it out anywhere Hadoop can put data. With another job to translate it into whatever form you need, it's readable. And if you can find an ODBC driver for the data store you put it in, then yes.
So I suppose the answer is: no, there is by design no integration with any particular consumer, but you can probably hook up whatever you imagine.
There are some bits that are designed to be real-time systems queried via API, but I don't think that's what you mean.
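To make that concrete: many Mahout jobs write Hadoop SequenceFiles, which you can dump into a tabular form yourself. A rough sketch; the HDFS path and the key/value writable types are assumptions and depend on which Mahout job produced the output:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.VectorWritable;

public class DumpMahoutOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("hdfs:///user/me/mahout-output/part-r-00000");
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            Text key = new Text();                       // key type varies by job
            VectorWritable value = new VectorWritable(); // value type varies by job
            while (reader.next(key, value)) {
                // Emit CSV rows for the visualization tool of your choice.
                System.out.println(key + "," + value.get());
            }
        }
    }
}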
