Best way to query Individuals from an Inferred Jena Ontology

I created an ontology based on security alerts.
After reading in some data (individuals) it got pretty big, so I decided to use a Jena rule reasoner to determine some facts. I mostly give individuals types and attributes and use some regex. Here's a small (constructed) example which gives an individual the type "Multiple" when its information matches the regex:
[testRuleContent: (?X ns:hasClassification ?Y), (?Y ns:hasText ?Z), regex(?Z, '.*Multiple.*')
-> (?X rdf:type ns:Multiple)]
To use the reasoner, I create an InfModel based on my previously loaded ontology:
//read rules from file (rd is a BufferedReader over the rule file)
List<Rule> ruleList = Rule.parseRules(Rule.rulesParserFromReader(rd));
com.hp.hpl.jena.reasoner.Reasoner reasoner = new GenericRuleReasoner(ruleList);
//jenaOntology is the ontology with the data
InfModel inferredOntology = ModelFactory.createInfModel(reasoner, jenaOntology);
inferredOntology.prepare();
This works without a problem, and I can write the InfModel to a file with the added types.
What's the preferred method to query the inferred ontology for certain individuals (in this example, those with the type ns:Multiple)?
At the moment I use listStatements() on the inferred model:
Resource multiple = inferredOntology.getResource("file:/C:/ns#Multiple");
StmtIterator iter = inferredOntology.listStatements(null, RDF.type, multiple);
while (iter.hasNext()) {
    Resource subject = iter.next().getSubject();
    //Individual ind = subject.as(Individual.class);
    String indUri = subject.getURI();
}
The cast throws an exception (the result is only a plain resource with the URI). But I get the valid URI of the individual and could work with the basic ontology model without the new properties (I only need them to find the individual I'm searching for, so that is a possible solution).
A similar attempt would be to use getDeductionsModel() on the inferred model to get a Model (or OntModel) and query it (potentially with SPARQL).
But I'd prefer an easy way to query the inferred model directly. Is there such a solution? Or can you give me a tip on how best to handle this situation?
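A SPARQL query against the InfModel itself would look roughly like this; a minimal sketch using ARQ, where the ns: prefix is assumed to match the example URI above:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

String queryString =
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
    "PREFIX ns: <file:/C:/ns#> " +
    "SELECT ?ind WHERE { ?ind rdf:type ns:Multiple }";
// run the query against the InfModel so the inferred types are visible
QueryExecution qe = QueryExecutionFactory.create(queryString, inferredOntology);
try {
    ResultSet results = qe.execSelect();
    while (results.hasNext()) {
        QuerySolution sol = results.nextSolution();
        System.out.println(sol.getResource("ind").getURI());
    }
} finally {
    qe.close();
}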

I will just work with the resources for now. It provides all the functionality I need. I should have taken a closer look at the API.
I answered my own question so I can mark it as solved tomorrow.

Related

Ontology comparison in owlapi

I am using OWLAPI for a project, and I need to compare two ontologies for differences between them. This would ignore blank nodes so that, for instance, I can determine whether the same OWL restrictions are in both ontologies. Not only do I need to know whether there are differences, but I need to find out what those differences are. Does such functionality exist in the OWLAPI, or is there a relatively simple way to do this?
The equality between anonymous class expressions is not based on the blank node ids - anonymous class expressions only have blank nodes in the textual output; in memory the ids are ignored. So checking if an axiom exists in an ontology will by default match expressions correctly for your diff.
This is not true for individuals - anonymous individuals will not be found to be the same across ontologies, and this is by spec. An anonymous individual in one ontology cannot be found in another, because anonymous individual ids are scoped to the containing ontology.
Note: the unit tests for OWLAPI have to carry out a very similar task, to verify that an ontology can be parsed, written and parsed again without change (i.e., roundtripped between input syntax and output syntax), so there is code you can look at for inspiration. See the equal() method in TestBase.java for more details. This includes code to deal with different ids for anonymous individuals.
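In practice the diff can be as small as two set operations over the axioms; a minimal sketch, assuming OWLAPI 3.x/4.x-style getAxioms() and two already-loaded ontologies:

import java.util.HashSet;
import java.util.Set;
import org.semanticweb.owlapi.model.OWLAxiom;
import org.semanticweb.owlapi.model.OWLOntology;

public class OntologyDiff {
    // Axioms asserted in left but missing from right; OWLAxiom.equals
    // ignores blank node ids, so anonymous class expressions match up.
    public static Set<OWLAxiom> missingFrom(OWLOntology left, OWLOntology right) {
        Set<OWLAxiom> diff = new HashSet<OWLAxiom>(left.getAxioms());
        diff.removeAll(right.getAxioms());
        return diff;
    }
}

Run it in both directions to get the full difference; as noted above, axioms involving anonymous individuals will always show up as differences.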

F# using XML Type Provider to modify xml

I need to process a bunch of XML documents. They are quite complex in their structure (i.e. loads of nodes), but the processing consists of changing the values of a few nodes and saving the file under a different name.
I am looking for a way to do that without having to reconstruct the output XML by explicitly instantiating all the types and passing all of the unchanged values in, but simply by copying them from the input. If the types generated automatically by the type provider were record types, I could simply create the output with let output = { input with changedNode = myNewValue }, but with the type provider I have to do let output = MyXml.MyRoot(input.UnchangedNode1, input.UnchangedNode2, myNewValue, input.UnchangedNode3, ...). This is further complicated by my changed values being in some of the nested nodes, so I have quite a lot of fluff to pass in to get to them.
The F# Data type providers were primarily designed to provide easy access when reading the data, so they do not have a very good story for writing data (partly, the issue is that the underlying JSON representation is quite different from the underlying XML representation).
For XML, the type provider just wraps the standard XElement types, which happen to be mutable. This means that you can actually navigate to the elements using provided types, but then use the underlying LINQ to XML to mutate the value. For example:
open FSharp.Data
open System.Xml.Linq

type X = XmlProvider<"<foos><foo a=\"1\" /><foo a=\"2\" /></foos>">

// Change the 'a' attribute of all 'foo' nodes to 1234
let doc = X.GetSample()
for f in doc.Foos do
    f.XElement.SetAttributeValue(XName.Get "a", 1234)

// Prints the modified document
doc.ToString()
This is likely not perfect - sometimes, you'll need to change the parent element (like here, the provided f.A property is not mutable), but it might do the trick. I don't know whether this is the best way of solving the problem in general, or whether something like XSLT might be easier - it probably depends on the concrete transformations.

How can I distinguish an axiom from an inferred statement in a Jena RDFS-INF model?

When I create an RDFS_MEM_RDFS_INF model in Jena and read some RDFS file, a number of statements that were not explicitly stated in the file are added. E.g. if we have a triple
a p b
and p is an rdfs:subPropertyOf q, then
a q b
is also in the model. A concrete example is the following: if
a skos:related b
is in the file, then
a skos:semanticRelation b
is also in the model.
Is there any way to check whether a statement in the model is an axiom or an inferred one? There are such methods for OWL models, but I use the RDFS model. A trivial solution would be to build two models, one without and one with inference, but I would prefer a less memory-consuming solution.
To preserve djthequest's answer from a comment:
Jena InfModel has a method getRawModel(). This Model won't contain the inferred statements; it will contain only the axioms in the file. Use a check against that. If you are using the OntModel, it has got a method getBaseModel().
and Christian Wartena's response indicating that this was a solution:
Thanks. This works fine! I didn't find that method when I was reading the documentation last week.
(I'll remove this answer if djthequest posts one.)
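To make the check concrete, a minimal sketch (assuming inf is the InfModel built over the file's data):

import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.Statement;
import com.hp.hpl.jena.rdf.model.StmtIterator;

Model raw = inf.getRawModel();  // only the statements read from the file
StmtIterator it = inf.listStatements();
while (it.hasNext()) {
    Statement s = it.next();
    if (raw.contains(s)) {
        // asserted in the file
    } else {
        // added by the RDFS reasoner
    }
}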

Statically typed Context

I am looking to encode (in some .NET language; F# seems most likely to support it) a class of types that form a current context.
The rules would be that we start with an initial context of type 'a. As we progress through the computations, the context is added to, but in a way that lets me always get at previous values. So assume an operation that adds information to the context, 'a -> 'b, such that all elements of 'a are also in 'b.
The idea is similar to an immutable map, but I would like it to be statically typed. Is that feasible? How, or why not? TIA.
Update: The answer appears to be that you cannot quite do this at this time, although I have some good suggestions for modeling what I am looking for in a different way. Thanks to all who tried to help with my poorly worded question.
Separate record types in F# are distinct, even if superficially they have similar structure. Even if the fields of record 'a form a subset of the fields of record 'c, there's no way of enforcing that relationship statically. If you have a valid reason to use distinct record types there, the best you could do would be to use reflection to get the fields using FSharpType.GetRecordFields and check whether one forms a subset of the other.
Furthermore, introducing a new record type for each piece of data added would result in horrendous amounts of boilerplate.
I see two ways to model it that would feel more at place in F# and still allow you some way of enforcing some form of your 'a :> 'c constraint at runtime.
1) If you foresee a small number of records, all of which are useful in other parts of your program, you can use a discriminated union to enumerate the steps of your process:
type NameAndAmountAndFooDU =
    | Initial of Name
    | Intermediate of NameAndAmount
    | Final of NameAndAmountAndFoo
With that, records that previously were unrelated types 'a and 'c, become part of a single type. That means you can store them in a list inside Context and easily go back in time to see if the changes are going in the right direction (Initial -> Intermediate -> Final).
2) If you foresee a lot of changes like 'adding' a single field, and you care more about the final product than the intermediate ones, you can define a record of option fields based on the final record:
type NameAndAmountAndFooOption =
    {
        Name: string option
        Amount: decimal option
        Foo: bool option
    }
and have a way to convert it to a non-option NameAndAmountAndFoo (or the intermediate ones like NameAndAmount if you need them for some reason). Then in Context you can set the values of individual fields one at a time, and again, collect the previous records to keep track of how changes are applied.
Something like this?
type Property =
    | Name of string
    | Amount of float

let context = Map.empty<string, Property>
//Parse or whatever
let context = Map.add "Name" (Name("bob")) context
let context = Map.add "Amount" (Amount(3.14)) context
I have a feeling that if you could show us a bit more of your problem space, there may be a more idiomatic overall solution.

I have to read invoice data from a convoluted ASCII file, how would you guard against future changes?

I have to read invoice ASCII files that are structured in a really convoluted way, for example:
55651108 3090617.10.0806:46:32101639Example Company Construction Company Example Road. 9 9524 Example City
There's actually additional stuff in there, but I don't want to confuse you any further.
I know I'm doomed if the client can't offer a better structure. For instance, 30906 is an iterative number that grows. 101639 is the CustomerId. The whitespace between "Example Company" and "Construction Company" is of variable length. The field "Example Company" could itself contain whitespace of variable length, for instance "Microsoft Corporation Redmond". Same with the other fields. So there's no clear way to extract data from the latter part.
But that's not the question; I got carried away. My question is as follows:
If the input was somewhat structured and well defined, how would you guard against future changes in its structure? How would you design and implement a reader?
I was thinking of using a simple EAV model in my DB and using text or XML templates that describe the input, the entity names, and their value types. I would parse the invoice files according to the templates.
"If the input was somewhat structured and well defined, how would you guard against future changes in its structure. How would you design and implement a reader?"
You must define the layout in a way you can flexibly pick it apart.
Here's a Python version:
class Field(object):
    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.offset = None

class Record(object):
    def __init__(self, *fieldList):
        self.fields = fieldList
        self.fieldMap = {}
        offset = 0
        for f in self.fields:
            f.offset = offset
            offset += f.size
            self.fieldMap[f.name] = f
    def parse(self, aLine):
        self.buffer = aLine
    def get(self, aField):
        fld = self.fieldMap[aField]
        return self.buffer[fld.offset:fld.offset + fld.size]
    def __getattr__(self, aField):
        return self.get(aField)
Now you can define records
myRecord = Record(
    Field('aField', 8),
    Field('filler', 1),
    Field('another', 5),
    Field('somethingElse', 8),
)
This gives you a fighting chance of picking apart some input in a reasonably flexible way.
myRecord.parse(input)
myRecord.get('aField')
Once you can parse, adding conversions is a matter of subclassing Field to define the various types (dates, amounts, etc.)
I believe that a template describing the entity names and the value types is a good one. Something like a "schema" for a text file.
What I would try to do is to separate the reader from the rest of the application as much as possible. So the question really is, how to define an interface that can accommodate changes in the parameter list. This may not always be possible, but if you rely on an interface to read the data, you can change the implementation of the reader without affecting the rest of the system.
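A minimal sketch of that decoupling (the names here are hypothetical, not from the question):

import java.io.IOException;
import java.io.Reader;
import java.util.List;

// The rest of the system depends only on this interface, so the
// fixed-width parsing behind it can change without affecting callers.
interface InvoiceReader {
    List<InvoiceRecord> read(Reader source) throws IOException;
}

// Placeholder value type for one parsed invoice line.
class InvoiceRecord {
    String invoiceNumber;
    String customerId;
    String companyName;
}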
Well, your file format looks much like the French protocol called Etebac used between banks and their customers.
It's a fixed-width text format.
The best you can do is use some kind of unpack function:
$ perl -MData::Dumper -e 'print Dumper(unpack("A8 x A5 A8 A8 A6 A30 A30", "55651108 3090617.10.0806:46:32101639Example Company Construction Company Example Road. 9 9524 Example City"))'
$VAR1 = '55651108';
$VAR2 = '30906';
$VAR3 = '17.10.08';
$VAR4 = '06:46:32';
$VAR5 = '101639';
$VAR6 = 'Example Company';
$VAR7 = 'Construction Company';
What you should do is check every field against what it's supposed to look like, that is, XX.XX.XX or YY:YY:YY, or that it does not start with a space, and abort if it doesn't.
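A minimal sketch of that kind of guard, in Java here (the patterns are assumptions based on the sample line):

import java.util.regex.Pattern;

final class FieldValidator {
    private static final Pattern DATE = Pattern.compile("\\d{2}\\.\\d{2}\\.\\d{2}");
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2}");

    // Abort loudly as soon as a field no longer looks the way it should.
    static void check(String date, String time, String company) {
        if (!DATE.matcher(date).matches()
                || !TIME.matcher(time).matches()
                || company.startsWith(" ")) {
            throw new IllegalArgumentException("unexpected invoice line layout");
        }
    }
}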
I'd have a database of invoice data, with tables such as Company, Invoices, Invoice_Items. Depends on complexity, do you wish to record your orders as well, and then link invoices to the orders, and so on? But I digress...
I'd have an in-memory model of the database model, but that's a given. If XML output and input was needed, I would have an XML serialisation of the model if I needed to supply the invoices as data elsewhere, and a SAX parser to read it in. Some APIs might make this trivial to do, or maybe you just want to expose a web service to your repository if you are going to have clients reading from you.
As for reading in the text files (and there isn't much information relating to them: why would the format of these change? Where are they coming from? Are you replacing this system, or will it keep on running, with you as a new backend that they're feeding?) You say the number of spaces is variable - is that just because the format is fixed-width columns? I would create a reader that reads them into your model, and hence your database schema.
